
THE UNIVERSITY OF NEW SOUTH WALES Thesis/Dissertation Sheet

Surname: Adie

First name: Sam

Abbreviation for degree as given in the University calendar: PhD

School: South Western Sydney Clinical School

Faculty: Medicine

Title: iQuEST: Investigating the Quality and Epidemiology of Surgical Trials

Abstract

Randomised controlled trials (RCTs) provide clinicians with the best evidence for interventions, but are subject to systematic errors (bias) when methodology is not optimal. These biases can arise at any stage of a trial, from inception through execution, data collection and analysis to the dissemination of results. Performing RCTs for surgical interventions is additionally challenging, given the relative complexities of surgical interventions and patients, and the culture of surgical training. This thesis examined the epidemiology and quality characteristics of RCTs of surgical interventions.

A systematic search was conducted to locate recently published surgical RCTs and meta-analyses, in order to obtain a sample that would be reflective of the current state of surgical evidence. Data extraction was piloted, and data were collected according to a proforma. The first study assessed the epidemiology and methodological quality of surgical RCTs, and compared these characteristics with what is known about general medical RCTs. The second study assessed reporting quality by compliance with the Consolidated Standards of Reporting Trials (CONSORT) statement. The third study investigated the association between methodological quality and treatment effects in surgical RCTs. The fourth and fifth studies examined patterns of outcome reporting. The association between statistical significance and reporting of outcomes (outcome reporting bias) was explored. The extent to which outcomes measured in surgical RCTs are patient important was also assessed. Finally, the sixth study assessed the epidemiology, reporting and methodological quality of meta-analyses of surgical RCTs.

The results show that there is substantial room for improvement in the conduct and reporting of RCTs of surgical interventions. Inadequate methodology was common, and was associated with an exaggeration of treatment effects. There was concerning evidence of unreported outcomes, and complete outcome reporting was associated with statistical significance. Only two thirds of primary outcomes were patient important.

If the truth about surgical interventions is to be discerned, the conduct and reporting of surgical trials must improve. Much of this responsibility lies with study authors, but journal editors and reviewers, and the funders of research also have an important role. Existing guidelines need to be promoted and imposed, and existing multicentre models for the conduct of surgical trials should be further explored.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).

1 June 2014

Signature Witness

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.

FOR OFFICE USE ONLY Date of completion of requirements for Award:

THIS SHEET IS TO BE GLUED TO THE INSIDE FRONT COVER OF THE THESIS

iQuEST: Investigating the Quality and Epidemiology of Surgical Trials

Sam Adie BSc(Med) MBBS (Hons) MSpMed MPH

A thesis in fulfilment of the requirements for the degree of Doctor of Philosophy

South Western Sydney Clinical School Faculty of Medicine, University of New South Wales Sydney, Australia

2014

Table of contents

Statement of originality
Copyright statement
Authenticity statement
List of figures
List of tables
List of appendices
List of abbreviations
Acknowledgements
Statement of contribution
Declaration of funding
Publications arising from this thesis
Presentations arising from this thesis

1. The evolution of evidence in surgical practice
1.1 Abstract
1.2 A short history of evidence based medicine
1.3 The rationale of randomised controlled trials
1.4 Bias in randomised controlled trials
1.4.1 Randomisation- Sequence generation
1.4.2 Randomisation- Allocation concealment
1.4.3 Blinding
1.4.4 Attrition
1.4.5 Publication and selective reporting bias
1.4.6 Funding bias
1.5 Assessing trial quality
1.6 The challenges of surgical trials
1.6.1 Challenges for surgeons
1.6.2 Challenges for trial conduct
1.6.3 Challenges for trial participants
1.7 Summarising the evidence from surgical randomised trials: systematic review and meta-analysis of surgical interventions
1.7.1 Narrative vs. systematic reviews
1.7.2 Historical development
1.7.3 The rationale of systematic reviews and meta-analyses
1.7.4 Bias in systematic reviews and meta-analyses
1.7.5 Measuring the quality of systematic reviews and meta-analyses
1.7.6 The importance of systematic reviews and meta-analyses to surgery
1.8 Development of iQuEST: Investigating the Quality and Epidemiology of Surgical Trials

2. Epidemiology and scientific quality of randomised trials of surgical interventions
2.1 Abstract
2.2 Introduction
2.3 Aims
2.4 Methods
2.4.1 Study design
2.4.2 Eligibility criteria
2.4.3 Sources of RCTs
2.4.4 Electronic search strategy
2.4.5 Study identification method
2.4.6 Data extraction (including pilot)
2.4.7 Items related to scientific quality
2.4.8 General characteristics of trials
2.4.9 Checking of data
2.4.10 Author survey
2.4.11 Data analyses
2.4.12 Sample size calculation
2.5 Results
2.5.1 Results of search
2.5.2 Epidemiology of published surgical RCTs
2.5.3 Scientific quality of surgical RCTs
2.5.4 Comparison with general medical RCTs
2.5.5 Survey responses
2.5.6 Discordance between survey responses and published report
2.6 Discussion

3. CONSORT compliance in surgical randomised trials: are we there yet? A systematic review
3.1 Abstract
3.2 Introduction
3.3 Aims
3.4 Methods
3.4.1 Study design
3.4.2 Eligibility criteria
3.4.3 Study identification
3.4.4 Data extraction
3.4.5 CONSORT checklist
3.4.6 Items related to external validity
3.4.7 Associations with reporting quality
3.4.8 Data analysis
3.5 Results
3.5.1 Study inclusion
3.5.2 Characteristics of included studies
3.5.3 Reporting of CONSORT items
3.5.4 Reporting of items related to external validity
3.5.5 Associations with reporting quality
3.5.6 Inter-observer and intra-observer reliability assessment
3.6 Discussion

4. The association between quality and effect estimates in surgical randomised trials
4.1 Abstract
4.2 Introduction
4.3 Aims and hypothesis statements
4.4 Methods
4.4.1 Study design
4.4.2 Inclusion of studies for the review
4.4.3 Data extraction- primary outcomes
4.4.4 Data extraction- quality domains
4.4.5 Standardisation of effect sizes
4.4.6 Statistical analysis
4.4.7 The influence of quality domains on objective vs. subjective outcomes
4.4.8 Sensitivity analyses
4.4.9 Publication bias
4.5 Results
4.5.1 Inclusion and characteristics of trials
4.5.2 Univariable analyses
4.5.3 Multivariable analyses
4.5.4 Publication bias
4.6 Discussion

5. Outcome reporting bias in surgical randomised trials: A systematic review and meta-analysis
5.1 Abstract
5.2 Introduction
5.3 Aims and hypothesis statements
5.4 Methods
5.4.1 Study design
5.4.2 Inclusion of studies for the review
5.4.3 Data extraction
5.4.4 Data items
5.4.5 Survey of authors
5.4.6 Search of trial registries
5.4.7 Data analysis
5.4.8 Subgroup and sensitivity analyses
5.5 Results
5.6 Discussion

6. Are outcomes reported in surgical randomised trials important to patients? A systematic review and meta-analysis
6.1 Abstract
6.2 Introduction
6.3 Aims and hypothesis statements
6.4 Methods
6.4.1 Study design
6.4.2 Inclusion of studies for the review
6.4.3 Data extraction
6.4.4 Data items
6.4.5 Data analysis
6.5 Results
6.6 Discussion

7. Characteristics and reporting of meta-analyses of surgical interventions: A systematic review
7.1 Abstract
7.2 Introduction
7.3 Aims
7.4 Methods
7.4.1 Study design
7.4.2 Eligibility criteria
7.4.3 Search for meta-analyses
7.4.4 Study identification methods
7.4.5 Data extraction
7.4.6 General characteristics of meta-analyses
7.4.7 Reporting of meta-analyses
7.4.8 Sample size calculation
7.4.9 Data analysis
7.5 Results
7.5.1 Results of search
7.5.2 Characteristics of surgical meta-analyses
7.5.3 Inter-observer reliability testing
7.5.4 Reporting of surgical meta-analyses: compliance with the PRISMA statement
7.5.5 Reporting of issues related to methodological quality: compliance with the AMSTAR checklist
7.5.6 Variables associated with PRISMA reporting
7.6 Discussion

8. A discussion of the findings of iQuEST

References

Appendix 1. Syntax of electronic search strategies employed to identify randomized trials
Appendix 2. Text of initial email invitation to participate in author survey
Appendix 3. First reminder email invitation to participate in author survey
Appendix 4. Second reminder email invitation to participate in author survey
Appendix 5. Third and final reminder email invitation to participate in author survey
Appendix 6. Copy of online author survey
Appendix 7. References to included surgical RCTs
Appendix 8. Operational definitions of data items collected for Chapter 2, including the CONSORT 2001 checklist, items related to external validity, and general study characteristics
Appendix 9. Syntax of electronic search strategy employed to identify meta-analyses
Appendix 10. References to included surgical meta-analyses

Statement of originality

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed

Date: 1 June 2014

Copyright statement

‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for partial restriction of the digital copy of my thesis or dissertation.’

Signed

Date: 1 June 2014

Authenticity statement

‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.’

Signed

Date: 1 June 2014

List of figures

Figure 1.1 The rationale of a randomised controlled trial, with sources of bias in red, and methodological safeguards against them in green
Figure 1.2 The complexity of a surgical intervention (adapted from Ergina et al)
Figure 2.1 Flow diagram depicting search and inclusion of eligible surgical RCTs
Figure 2.2 Forest plot depicting comparison of adequate scientific quality domains for surgical RCTs vs. general medical RCTs from the year 2000. A higher risk ratio favours surgical RCTs
Figure 2.3 Forest plot depicting comparison of adequate scientific quality domains for surgical RCTs vs. general medical RCTs from the year 2006. A higher risk ratio favours surgical RCTs
Figure 2.4 Flow diagram depicting author survey invitations and responses received
Figure 3.1 Flow diagram of randomised trial inclusion for this chapter
Figure 4.1 Flow diagram of studies included in the analysis for this chapter
Figure 4.2 Effect exaggeration in surgical RCTs. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.
Figure 4.3 Effect exaggeration in surgical RCTs: Subjective outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.
Figure 4.4 Effect exaggeration in surgical RCTs: Objective outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.
Figure 4.5 Effect exaggeration in surgical RCTs: Specified primary outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.
Figure 4.6 Effect exaggeration in surgical RCTs: Incorporating survey responses. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.
Figure 4.7 Multivariable analysis of effect exaggeration in surgical RCTs. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.
Figure 4.8 Multivariable analysis of effect exaggeration in surgical RCTs: Subjective outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.
Figure 4.9 Multivariable analysis of effect exaggeration in surgical RCTs: Objective outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.
Figure 4.10 Multivariable analysis of effect exaggeration in surgical RCTs: Specified primary outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.
Figure 4.11 Multivariable analysis of effect exaggeration in surgical RCTs: Incorporating survey responses. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.
Figure 4.12 Funnel plot with pseudo 95% confidence limits. Standard error of effect estimate on y-axis plotted against effect estimates (logarithmic scale) on x-axis.
Figure 4.13 Funnel plot with pseudo 95% confidence limits- Specified primary outcomes only. Standard error of effect estimate on y-axis plotted against effect estimates (logarithmic scale) on x-axis.
Figure 4.14 Contour enhanced funnel plot. Inverse standard error of effect estimate on y-axis plotted against effect estimates on x-axis. Shaded areas represent different levels of statistical significance for each effect estimate.
Figure 4.15 Contour enhanced funnel plot- Specified primary outcomes only. Inverse standard error of effect estimate on y-axis plotted against effect estimates on x-axis. Shaded areas represent different levels of statistical significance for each effect estimate.
Figure 4.16 Egger regression plot. Standard normal deviate (SND) of effect estimate on y-axis plotted against precision on x-axis.
Figure 4.17 Egger regression plot- Specified primary outcomes only. Standard normal deviate (SND) of effect estimate on y-axis plotted against precision on x-axis.
Figure 5.1 Flow diagram of study inclusion for this chapter
Figure 5.2 Flow diagram of survey responses
Figure 5.3 Frequency histogram of number of outcomes in surgical trials
Figure 5.4 Frequency histogram of proportion of outcomes fully reported in surgical trials
Figure 5.5 Frequency histogram of number of unreported outcomes in surgical trials
Figure 5.6 Forest plot: Association between level of reporting and statistical significance- All outcomes
Figure 5.7 Forest plot: Association between level of reporting and statistical significance- Efficacy outcomes
Figure 5.8 Forest plot: Association between level of reporting and statistical significance- Harm outcomes
Figure 6.1 Association between patient important outcomes and primary specification
Figure 6.2 Association between patient important outcomes and statistical significance- All outcomes
Figure 6.3 Association between patient important outcomes and statistical significance- Efficacy outcomes
Figure 6.4 Association between patient important outcomes and statistical significance- Harm outcomes
Figure 7.1 Flow diagram of surgical meta-analysis inclusion
Figure 7.2 Frequency histogram of PRISMA scores
Figure 7.3 Star chart depicting proportions of surgical meta-analyses adequately reporting each PRISMA item
Figure 7.4 Star chart depicting proportions of surgical meta-analyses adequately reporting each AMSTAR item
Figure 7.5 Forest plot depicting univariate regression results with regression point estimates and 95% confidence intervals for each associate variable
Figure 7.6 Forest plot depicting multivariate regression results with regression point estimates and 95% confidence intervals for each associate variable

List of tables

Table 1.1 Oxford Centre for Evidence Based Medicine 2011 Levels of Evidence
Table 1.2 Differences between pharmaceutical and surgical interventions
Table 2.1 Subspecialties recognised by the Royal Australasian College of Surgeons
Table 2.2 Operational definitions of scientific quality domains
Table 2.3 Operational definitions of general characteristics of surgical RCTs
Table 2.4 Epidemiology of surgical RCTs
Table 2.5 Reporting of scientific quality domains in surgical RCTs
Table 2.6 Survey responses related to scientific quality
Table 2.7 Discordance in trial report and author survey response: Randomisation sequence
Table 2.8 Discordance in trial report and author survey response: Allocation concealment
Table 2.9 Discordance in trial report and author survey response: Any blinding
Table 2.10 Discordance in trial report and author survey response: Funding
Table 3.1 Characteristics of randomised trials included in this chapter
Table 3.2 Reporting of individual CONSORT items
Table 3.3 Reporting of items related to external validity
Table 3.4 Results of exploratory regression for variables associated with CONSORT score
Table 5.1 Levels of outcome reporting (Adapted from Chan et al. 2004)
Table 5.2 Information required for classification as a fully reported outcome (or sufficient outcome data for inclusion in meta-analysis). Adapted from Chan et al. 2004.
Table 5.3 Characteristics of RCTs analysed for issues related to outcome reporting
Table 5.4 Mean proportions of outcome characteristics per trial, stratified by trial characteristics
Table 5.5 Pooled odds ratios of selective outcome reporting bias
Table 5.6 Results of exploratory meta-regression examining for variables with associations with selective outcome reporting bias
Table 6.1 Operational definitions of patient important outcomes with examples
Table 6.2 Mean proportions of patient important outcomes per trial, stratified by trial characteristics
Table 6.3 Proportion of trials reporting patient important outcomes as primary outcomes, stratified by trial characteristics. Data presented only from trials that explicitly specified a primary outcome. Trials that did not specify primary/secondary outcomes were not included in this descriptive analysis
Table 6.4 Results of exploratory metaregression of association between patient important outcomes and specification as primary
Table 6.5 Pooled odds ratios of association between patient important outcomes and statistical significance
Table 7.1 Operational definitions of general characteristics of surgical meta-analyses
Table 7.2 Preferred reporting items for systematic reviews and meta-analyses (PRISMA) 2009 checklist with operational definitions
Table 7.3 AMSTAR 2007 (A measurement tool to assess the methodological quality of systematic reviews) items with operational definitions
Table 7.4 Characteristics of surgical meta-analyses
Table 7.5 Interobserver reliability assessments for each PRISMA item using kappa statistics
Table 7.6 Interobserver reliability assessments for each AMSTAR item using kappa statistics

List of appendices

Appendix 1. Syntax of electronic search strategies employed to identify randomized trials
Appendix 2. Text of initial email invitation to participate in author survey
Appendix 3. First reminder email invitation to participate in author survey
Appendix 4. Second reminder email invitation to participate in author survey
Appendix 5. Third and final reminder email invitation to participate in author survey
Appendix 6. Copy of online author survey
Appendix 7. References to included surgical RCTs
Appendix 8. Operational definitions of data items collected for Chapter 2, including the CONSORT 2001 checklist, items related to external validity, and general study characteristics
Appendix 9. Syntax of electronic search strategy employed to identify meta-analyses
Appendix 10. References to included surgical meta-analyses

List of abbreviations

RCT – Randomised Controlled Trial

EBM – Evidence Based Medicine

CONSORT – Consolidated Standards of Reporting Trials

PICO – Participants, Interventions, Controls, Outcomes

BMJ – British Medical Journal

ORBIT – Outcome Reporting Bias in Trials (Study)

OR – Odds Ratio

logOR – Natural logarithm of the odds ratio

RR – Risk Ratio

ROR – Ratio of Odds Ratios

95% CI – 95% Confidence Interval

BRANDO – Bias in Randomised and Observational Studies (Study)

ISRCTN – International Standard Randomised Controlled Trials Number

EU-CTR – European Union Clinical Trials Register

ANZ-CTR – Australia and New Zealand Clinical Trials Register

SPIRIT – Standard Protocol Items: Recommendations for Interventional Trials

PRISMA – Preferred Reporting Items for Systematic Reviews and Meta-analyses

AMSTAR – A Measurement Tool to Assess the Methodological Quality of Systematic Reviews

QUOROM – Quality of Reporting of Meta-analyses

SDGC – Studienzentrum der Deutschen Gesellschaft für Chirurgie (The Study Centre of the German Surgical Society)


Acknowledgements

To the One- Who created Knowledge,

To my parents- who raised me to love it.

To my wife- always there for the highs and lows,

To my family- who encouraged me as I got through it.

To my research supervisors- who guided me on the windy track,

To my clinical supervisors- who were sympathetic as I walked along it.

Statement of contribution

Dr. Sam Adie is the guarantor for this thesis, and was responsible for the conception, search for studies, data extraction, data analysis, interpretation of results, and writing all sections of the thesis.

Professor Ian Harris was responsible for the conception and interpretation of the research, and was involved in the duplicate search for studies found in chapter two of this thesis.

Professor Jonathan Craig was involved in the conception and interpretation of the research.

Dr. Justine Naylor was involved in the duplicate data extraction found in chapters two and three of this thesis.

Dr. Rajat Mittal was involved in the duplicate data extraction found in chapters two, three, four, five and six of this thesis.

Dr. David Ma was involved in the duplicate search for studies and data extraction in chapter seven of this thesis.

Declaration of funding

Dr. Sam Adie was supported by scholarship grants from the National Health and Medical Research Council of Australia (Biomedical Postgraduate Scholarship), and the Royal Australasian College of Surgeons (Sir Roy McCaughey Research Fellowship).

The funders had no role in the design, data collection or analysis of this study.

Publications arising from this thesis

Sam Adie, Ian Harris, Justine Naylor, and Rajat Mittal. CONSORT compliance in surgical randomized trials: are we there yet? A systematic review. Annals of Surgery. 2013 Dec;258(6):872-8. (Bound into this thesis as Chapter 3).

David Ma, Sam Adie, Ian Harris, Justine Naylor, Jonathan Craig. Quality of conduct and reporting of meta-analyses of surgical interventions. Annals of Surgery. Accepted on 12 May 2014. (Bound into this thesis as Chapter 7).

Presentations arising from this thesis (presenter underlined)

Adie S, Harris IA, Craig J, Naylor JM, Mittal R. An examination of the epidemiology, methodologic quality, and outcome effect exaggeration in published randomised trials of surgical interventions. Read at the Surgical Research Society 49th Annual Meeting, Queen Elizabeth Hospital, Adelaide, SA, Australia, 9th Nov 2012.

Adie S, Harris IA, Craig J, Naylor JM, Mittal R. CONSORT compliance in surgical randomized trials: are we there yet? Read at the Australian Orthopaedic Association Annual Scientific Meeting, Sydney, Australia, 9th Oct 2012.

Adie S, Harris IA, Craig J, Naylor JM, Mittal R. Are reports of surgical trials generalizable? A systematic review. Read at the Australian Orthopaedic Association Annual Scientific Meeting, Sydney, Australia, 9th Oct 2012.

Adie S, Harris IA, Naylor JM. The association between quality and effect estimate in surgical randomised trials. Read at the Australian Orthopaedic Association Annual Scientific Meeting, Darwin, Australia, 8th Oct 2013.


“Once you start to get trials which show, for example, that 50 years of radical mastectomies have not been justified, then you can start to see what a powerful weapon such evidence is for those who wish to challenge authority”

–Sir Iain Chalmers1

1. The evolution of evidence in surgical practice

1.1 Abstract

For thousands of years, physicians have been attempting to decipher the truth about which treatments are beneficial for their patients. As the methods for determining these truths have evolved, so have the nature and landscape of medicine. In this narrative review, a brief history of evidence based medicine is recalled, eventually leading to the adoption of the randomised controlled trial (RCT) as the most trusted method for determining the effects of treatments. The rationale and methodology of the RCT are discussed, along with its different sources of bias. Consideration is given to the particular challenges of conducting RCTs of surgical interventions. The combination of RCTs in systematic reviews and meta-analyses is also discussed, along with implications for surgical research. The review concludes with the research questions of this thesis, which tells a story of the biases that may be present in randomised trials and meta-analyses of surgical interventions.

1.2 A short history of evidence based medicine

“All who drink of this treatment recover in a short time, except those whom it does not help, who all die. It is obvious therefore, that it fails only in incurable cases” –Galen2

For physicians, it’s about the pursuit of truth. For millennia, ailing patients were subjected to treatments that were believed to be effective, but may have indeed been harmful or inconvenient. The truth was often based on personal experience, or on consulting authority. For medieval physicians, the prescription of treatments was based on current humoral theories, and experimentation was interpreted as a physician’s personal experience with a treatment.3 The production of medical texts was laborious and painstaking, taking months to complete. The transmission of medical knowledge was primarily from master to student, and scientific developments were based on the discovery of texts from Ancient Greek intellectual giants like Hippocrates and Aristotle.3 Thus for hundreds of years, complex and enigmatic treatments of apothecaries were prescribed – often to the patient’s misery. It is easy to view such practices with disdain, but it is here that the roots of evidence based medicine lie. For example, Galen strongly argued for a close union between empiricism and theory,4 and the medieval Arab physician Avicenna (ibn Sina) had such a rigorous method of experimentation5 that his Canon of Medicine is widely considered to be the most influential medical text in history.

The development of evidence based medicine has been divided into four distinct phases.6,7 In the first, ancient phase, anecdotes can be found that are in line with a modern understanding of science in medicine, but a practical application of the scientific method to the bedside was largely absent. The second, renaissance phase, took place in the enlightenment period of the 1700s. This period was characterised by the development of scepticism of established authorities and traditions. Medical texts were becoming more common, and personal medical journals were used to document specific data. Age old procedures like blood letting for fevers were being questioned8 and subsequently, slowly abandoned. The success of new surgical procedures was documented so that surgeon colleagues could be convinced of their efficacy. An example was William Cheselden’s data on his lithotomy approach for bladder stones, which were remarkable not just because they accurately documented the mortality of 213 consecutive patients, but because he analysed them using an age stratification approach.9 Importantly, this period also saw the emergence of the formal use of clinical trials. Lind’s momentous 1753 work, A Treatise of Scurvy, was perhaps the first systematic clinical experiment, characterised by a meticulous collection of data and a direct comparison of different treatment groups. In the third period, the transitional era from 1900 to the 1970s, texts were commonplace, and the publications of results were widely available in medical journals. Two leaps in evidence based medicine (EBM) took place during this period. The first was Ernest Codman’s “end result of medical care” idea. Codman, the “improper Bostonian” surgeon, standardised pre- and postoperative surgical management on 5x8 inch “cards”, and also developed a classification of safety outcomes in order to improve care.7,10 His approach often agitated the status quo, since it emphasised a surgeon’s status based on patient outcomes, rather than seniority.11 The second was the development of what would become the strongest method for assessing treatment effects – the randomised controlled trial (RCT). Perhaps the first published RCT was a blinded trial of sanocrysin for tuberculosis in 1931,12 where “matched” patients were allocated to treatment groups by the flip of a coin. But the experiment that firmly placed RCTs in the repertoire of evidence based medicine was the 1948 British Medical Research Council trial of streptomycin for tuberculosis.13 In this trial, random numbers were concealed in envelopes to determine patient allocation. Bradford-Hill, considered the father of modern medical statistics, oversaw the trial, and recognised the importance of both the genesis and implementation of randomisation in an unbiased manner.14

The modern era of EBM commenced in the late 20th century. With the onset of the digital age, medical information is now widely available to share. Novel medical ideas are expected to have a scientific rationale. With this explosion of new information came a challenge to understand and to appropriately place it in context, and the concept of bias was developed and explored.

Several key intellectuals are synonymous with modern EBM. The first was Archie Cochrane. A physician by training, he had a “fiery independence of mind”1 that did not allow him to accept traditional modes of thinking. In his practice as a clinician, he was preoccupied with the thought that his treatments were causing more harm than good. As a prisoner of war to the Germans during World War II, he used his limited resources to care for his sick and malnourished comrades, even conducting trials to determine the most effective therapy, while being viewed as “überflüssig” (superfluous) by his German captors.15 After the War he was awarded a Rockefeller fellowship, and was trained in medical statistics by Bradford-Hill at the London School of Hygiene and Tropical Medicine. His work earned him a legendary reputation for his meticulous approach to data. He was a strong advocate of social medicine and the United Kingdom National Health Service, but realised that many resources were being wasted on approaches to treatment that had little scientific evidence. In 1972, he wrote his most widely known work – “Effectiveness and Efficiency: Random Reflections on Health Services”16 which encapsulated in writing what he had been urging for many years: treatments should be based on the best scientific evidence and those that do not have any evidence should be discarded in the best interests of the patient and scarce health resources. Focussing on the elimination of bias, Cochrane recognised that the best method for the evaluation of interventions was the randomised controlled trial.16 His ideas are what eventually sparked an international movement to improve the use of scientific evidence in healthcare, posthumously named after him: the Cochrane Collaboration.

Around the same time in North America, the field of clinical epidemiology rapidly evolved with the work of Alvin Feinstein in the 1960s. Feinstein was frustrated by what he saw as the gap between clinical and research practice. Researchers, he observed, rejected clinical data as an unreliable form of measurement in clinical trials. But clinicians were often preoccupied with the nuances and heterogeneity of their clinical findings, and this often informed their judgments.17 His work on “clinical taxonomy”18 and the classification of disease allowed clinicians to analyse their own practice, and gave birth to a new basic science, termed clinical epidemiology.17,19,20 While he recognised the tremendous achievements of the randomised trial, he also criticised its limitations19 and was frustrated that a new generation of clinical epidemiologists emphasised its use. In 1967 at McMaster University, a new medical school was being established and David Sackett was appointed as director of the newly established Department of Clinical Epidemiology and Biostatistics. Sackett married the approaches of Cochrane (who emphasised the role of the randomised controlled trial and the elimination of bias) and Feinstein (who emphasised the role of clinicians in evaluating their own work).1 Here, a healthy scepticism of medical authority was encouraged, and an emphasis placed on a scientific approach to clinical medicine, rather than unsystematic clinical experience or pathophysiological rationales.21 In 1992, Gordon Guyatt called this approach “evidence based medicine”, and described it as a paradigm shift in the approach to healthcare.22 Evidence based medicine, an iconoclastic concept,1 is now accepted as a standard for the teaching and application of healthcare in most modern institutions around the world.

1.3 The rationale of randomised controlled trials

“The RCT is a very beautiful technique, of wide applicability, but as with everything else there are snags.” – Archie Cochrane16

RCTs are widely accepted as the best study design to assess the effects of interventions.23 Before discussing this, a number of definitions are warranted. A randomised controlled trial (RCT) may be defined as a prospective clinical experiment where individuals are allocated by chance to study groups in order to assess the effects of one or more health care interventions.24 Bias may be defined as systematic errors in a study that favour one outcome or intervention group over another.25 Bias may be differentiated from random (or chance) error, which is a function of how imprecise our measuring tools are, and which does not favour one outcome or intervention over another. Many mechanisms for bias have been described, but all have one thing in common: they result in a predictable deviation from the truth. Figure 1.1 is a schematic representation of the rationale of an RCT, with its different sources of bias and their methodological safeguards.

RCTs have a number of strengths. They are prospective in nature, and therefore do not suffer from issues related to recall bias, temporal changes, and ascertainment bias due to differential measurements between groups.26,27 RCTs also elegantly deal with the issue of confounding. When selection bias results in an imbalance of prognostic factors, observed effects may be due to the influence of a prognostic variable rather than the intervention of interest.28 Confounding is a significant problem in non-randomised or observational studies, and it is common practice to deal with confounding at both the design and analysis stage of those studies. Even then, unmeasured (unknown) confounding may still occur. It is far less likely that RCTs suffer from this problem, since the process of random allocation of eligible patients should equally distribute prognostic variables into treatment groups, if an appropriate sample size is used. The only characteristic differentiating treatment groups is whether or not the intervention of interest is received. Observed effects should therefore be due to those interventions.
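The balancing effect of random allocation can be illustrated with a small simulation. The sketch below is not drawn from any study cited here; the prognostic factor ("severe" disease), the allocation probabilities and the outcome rates are all hypothetical values chosen for illustration. The treatment is given no true effect, so any observed difference between groups is produced by confounding or by chance alone.

```python
import random

def simulate(randomised, n=20000, seed=1):
    """Observed difference in good-outcome rates (treated minus control)
    for a treatment that, by construction, has no true effect."""
    rng = random.Random(seed)
    treated = {"n": 0, "good": 0}
    control = {"n": 0, "good": 0}
    for _ in range(n):
        severe = rng.random() < 0.5  # hypothetical prognostic factor
        if randomised:
            gets_treatment = rng.random() < 0.5  # chance alone decides
        else:
            # Hypothetical clinician preference: severe patients are
            # rarely offered the treatment.
            gets_treatment = rng.random() < (0.2 if severe else 0.8)
        # The outcome depends only on severity, never on the treatment.
        good = rng.random() < (0.3 if severe else 0.7)
        arm = treated if gets_treatment else control
        arm["n"] += 1
        arm["good"] += good
    return treated["good"] / treated["n"] - control["good"] / control["n"]

confounded_difference = simulate(randomised=False)
randomised_difference = simulate(randomised=True)
```

Under these assumed values, the non-randomised comparison shows a large apparent benefit, because the treated group contains mostly patients with a good prognosis, while the randomised comparison shows a difference close to zero.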

Recent history has shown how confounded effects in non-randomised studies can profoundly affect health care after reporting of results that “get you into the BMJ (British Medical Journal) and the Friday papers”.29 For example, non-randomised studies found cardioprotective effects for hormone replacement therapy, beta-carotene, and vitamin C, but subsequent RCTs found no benefits (and in the case of hormone replacement therapy even suggested a harmful effect). Surgical practice is no stranger to this issue. Examples include internal mammary artery ligation for angina pectoris,30 which for many years was widely used based on non-randomised study evidence until subsequent RCTs found it to be ineffective; and arthroscopic surgery for osteoarthritis of the knee, which is still widely practised31 despite being ineffective in large RCTs.32,33

Figure 1.1 The rationale of a randomised controlled trial, with sources of bias in red, and methodological safeguards against them in green

[Figure: flow from the eligible population, through the treatment and control groups, to outcomes. Each source of bias is paired with its safeguard: selection bias (randomisation and allocation concealment); performance bias (blinding of participants and care providers); measurement bias (blinding of outcome assessors and data analysts); attrition bias (adequate follow up and intention to treat analysis); outcome reporting bias (trial registration and publication of protocols). Funding bias is marked alongside the study.]

Despite their strengths, RCTs are not immune to bias. Methodological safeguards in a trial are designed to minimise the risk of bias, but these may not always be performed adequately.34-37 While RCTs are generally regarded as the best form of evidence for examining the effects of interventions (Table 1.1), a poorly executed RCT may in fact represent a lower tier of evidence.38 For many years, the basis for adequate RCT design and execution was largely theoretical, but in 1995, an investigation by Schulz and colleagues demonstrated this bias in an empirical study.39 A total of 250 controlled trials from 33 different meta-analyses were divided into those with adequate vs. inadequate or unreported methodological domains, and their effect sizes compared. Trials with inadequate methods of allocation concealment and blinding were found to exaggerate treatment effects compared with adequately performed trials. Since then, several studies have replicated these findings, and the field has continued to evolve with the exploration of new domains of bias. The following is a summary of these domains, and a brief history of their evolution.

Table 1.1 Oxford Centre for Evidence Based Medicine 2011 Levels of Evidence40

How common is the problem?
Step 1 (Level 1*): Local and current random sample surveys (or censuses)
Step 2 (Level 2*): Systematic review of surveys that allow matching to local circumstances**
Step 3 (Level 3*): Local non-random sample**
Step 4 (Level 4*): Case-series**
Step 5 (Level 5): n/a

Is this diagnostic or monitoring test accurate? (Diagnosis)
Step 1 (Level 1*): Systematic review of cross sectional studies with consistently applied reference standard and blinding
Step 2 (Level 2*): Individual cross sectional studies with consistently applied reference standard and blinding
Step 3 (Level 3*): Non-consecutive studies, or studies without consistently applied reference standards**
Step 4 (Level 4*): Case-control studies, or “poor or non-independent reference standard”**
Step 5 (Level 5): Mechanism-based reasoning

What will happen if we do not add a therapy? (Prognosis)
Step 1 (Level 1*): Systematic review of inception cohort studies
Step 2 (Level 2*): Inception cohort studies
Step 3 (Level 3*): Cohort study or control arm of randomized trial*
Step 4 (Level 4*): Case-series or case-control studies, or poor quality prognostic cohort study**
Step 5 (Level 5): n/a

Does this intervention help? (Treatment Benefits)
Step 1 (Level 1*): Systematic review of randomized trials or n-of-1 trials
Step 2 (Level 2*): Randomized trial or observational study with dramatic effect
Step 3 (Level 3*): Non-randomized controlled cohort/follow-up study**
Step 4 (Level 4*): Case-series, case-control studies, or historically controlled studies**
Step 5 (Level 5): Mechanism-based reasoning

What are the COMMON harms? (Treatment Harms)
Step 1 (Level 1*): Systematic review of randomized trials, systematic review of nested case-control studies, n-of-1 trial with the patient you are raising the question about, or observational study with dramatic effect
Step 2 (Level 2*): Individual randomized trial or (exceptionally) observational study with dramatic effect

What are the RARE harms? (Treatment Harms)
Step 1 (Level 1*): Systematic review of randomized trials or n-of-1 trial
Step 2 (Level 2*): Randomized trial or (exceptionally) observational study with dramatic effect

COMMON and RARE harms, Steps 3 to 5
Step 3 (Level 3*): Non-randomized controlled cohort/follow-up study (post-marketing surveillance) provided there are sufficient numbers to rule out a common harm. (For long-term harms the duration of follow-up must be sufficient.)**
Step 4 (Level 4*): Case-series, case-control, or historically controlled studies**
Step 5 (Level 5): Mechanism-based reasoning

Is this (early detection) test worthwhile? (Screening)
Step 1 (Level 1*): Systematic review of randomized trials
Step 2 (Level 2*): Randomized trial
Step 3 (Level 3*): Non-randomized controlled cohort/follow-up study**
Step 4 (Level 4*): Case-series, case-control, or historically controlled studies**
Step 5 (Level 5): Mechanism-based reasoning

* Level may be graded down on the basis of study quality, imprecision, indirectness (study PICO does not match questions PICO), because of inconsistency between studies, or because the absolute effect size is very small; Level may be graded up if there is a large or very large effect size. ** A systematic review is generally better than an individual study.

1.4 Bias in randomised controlled trials

1.4.1 Randomisation- Sequence generation

“The lot causes disputes to cease, and it decideth between the mighty” – Proverbs 18:18.

Humans have used randomisation for thousands of years to ensure that selection is fair and reasonable. Biblical passages speak of “drawing lots” to cast judgments or for divination. Stories abound of lost sailors taking life or death decisions based on drawing lots, and in more recent times lotteries were used to determine which men were conscripted to war.41 Physicians have long recognised the problem of “imputation of selection” in their trials of treatment, and alternation was used for many years to ensure groups were as alike as possible.42 Early trials used dice43 and drawing lots,44 but the first published RCT is credited to Bradford-Hill for his role in the British Medical Research Council’s trial of streptomycin for tuberculosis.13

Randomisation ensures that the effects observed reflect the treatments given, rather than the effects of nature and the passage of time, and minimises the risk that confounders, both known and unknown, are unequally distributed between groups. Perhaps the best empirical evidence for the use of randomisation comes from studies comparing the treatment effects from overlapping randomised and non-randomised studies. Ioannidis and colleagues assessed meta-analyses that included both randomised and non-randomised studies. While both study designs were consistent in the direction of treatment effect, effects in non-randomised studies were consistently exaggerated by at least 50%.45 In their Cochrane review of this issue, Kunz and colleagues found 15 studies containing 35 randomised/non-randomised comparisons. There was a wide variation in the deviation of effect estimates in non-randomised studies, from 76% smaller, to 400% larger.46 The authors concluded that, on average, non-randomised studies exaggerate treatment effects, but the magnitude and direction of the exaggeration were difficult to determine, and must be placed in the context of the research question.

An adequate method of random sequence generation is one where the next allocation is completely unpredictable, although the probability of that allocation is usually known. An inadequate method of sequence generation is one where the sequence may be predictable, such as allocation by medical record number, date of birth (or other identifying characteristic), by alternation, or day of the week. These methods are not only non-random, but also have implications for allocation concealment (see next section), and may introduce selection bias.47 The simplest analogy for random sequence generation is a coin toss for allocation into two groups, but other methods include the use of shuffled cards or lotteries, and more commonly in trials, random number tables or computer generated sequences.48 These methods are collectively termed simple randomisation, and sometimes may result in an imbalance of group numbers due to chance. Restricted randomisation minimises this risk by randomising in blocks that are often random and permuted in size, but care must be taken by investigators to ensure block patterns remain unpredictable.49 Stratified randomisation is another form of restricted randomisation, where eligible participants are divided into strata according to an important characteristic. This approach is useful for minimising the risk of an uneven distribution of an important confounding variable that can threaten the validity of trial results. Finally, minimisation is a useful method of sequence generation whereby several important characteristics may be balanced in trial groups, but this relies on a mathematical algorithm that may not be entirely random in nature, which has led to some criticism.50 Nevertheless, minimisation techniques may include a random component, and are now generally accepted as an effective method of maintaining similarities between groups, particularly in trials with small numbers.51
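The methods of sequence generation described above can be sketched in a few lines of code. The following is a minimal illustration only: the function names, arm labels ("A" and "B") and block sizes are assumptions made for this example, and Python's general purpose pseudo-random generator stands in for whatever validated randomisation service a real trial would use. Minimisation is not shown.

```python
import random

def simple_randomisation(n, rng):
    """Each allocation is an independent coin toss, so arm sizes may
    drift apart by chance."""
    return [rng.choice("AB") for _ in range(n)]

def permuted_blocks(n, rng, block_sizes=(2, 4, 6)):
    """Restricted randomisation: blocks of randomly chosen size, each
    balanced 1:1 and shuffled, so that block boundaries are hard to guess."""
    sequence = []
    while len(sequence) < n:
        size = rng.choice(block_sizes)
        block = ["A"] * (size // 2) + ["B"] * (size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n]

def stratified(participants, rng):
    """Stratified randomisation: a separate permuted-block sequence is
    generated within each stratum. `participants` is a list of
    (participant_id, stratum) pairs."""
    by_stratum = {}
    for pid, stratum in participants:
        by_stratum.setdefault(stratum, []).append(pid)
    allocation = {}
    for ids in by_stratum.values():
        for pid, arm in zip(ids, permuted_blocks(len(ids), rng)):
            allocation[pid] = arm
    return allocation

rng = random.Random(2024)  # fixed seed only so the example is reproducible
simple_sequence = simple_randomisation(20, rng)
block_sequence = permuted_blocks(100, rng)
```

With the block sizes assumed here, the two arms of `block_sequence` can differ by at most three participants at any stopping point, whereas `simple_sequence` carries no such guarantee.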

1.4.2 Randomisation- Allocation concealment

Adequate randomisation is contingent upon allocation of participants being unpredictable. A method of sequence generation may be completely random, but if that sequence is known, it is subject to manipulation. Therefore adequate randomisation may be divided into two important components: sequence generation, and allocation concealment. Random sequence generation balances known and unknown confounders; allocation concealment preserves the integrity of that process.52

In their review of the development of fair treatment allocation schedules, Chalmers and colleagues document several instances in the late 19th and early 20th centuries where alternate allocation was used. In 1933, after a trial for serum treatment of lobar pneumonia revealed imbalances in treatment groups, Bradford Hill noted how alternate allocation could be subverted by those recruiting patients.42 The result of this and other similar developments was the realisation that unbiased allocation schedules had to be both random in nature, as well as concealed from personnel. The 1948 United Kingdom Medical Research Council trial of streptomycin for tuberculosis was a landmark, not just because it used random number tables, but also because it clearly documented concealment of its allocation sequence, thus setting the standard for future RCTs.13

The concept of allocation concealment has been poorly understood for years. Researchers may erroneously accept a random sequence as “randomisation” but may not always stress concealment.47 This is what allowed Schulz and colleagues to produce strong empirical evidence of biased trial results in one of the first meta-epidemiology studies.39 Treatment effects were exaggerated by 41% for inadequately concealed trials, and 30% for trials where concealment methods were unclear, compared to trials with adequate concealment. These findings were replicated by other authors using trials in different clinical areas and with different statistical approaches.53-56

Adequate methods of concealment will protect the sequence before and until treatments are assigned. In surgical trials, or any other trial, concealment (and therefore adequate randomisation) is always possible to carry out in a valid manner. Perhaps the most reliable method is the randomisation of eligible participants by a third party otherwise uninvolved with the project.57 Other popular methods involve the use of sealed envelopes containing allocation details, or in the case of drug interventions, sequentially numbered containers. But since the randomisation of medical treatment is “slightly odd”58 to human nature, manipulation of this process is always possible, and there are reports of envelopes scrutinised under intense light, or drug containers examined for the shape and smell of their contents.47 The motivations for such behaviour need not be nefarious, and anecdotes exist of surgical residents assigning patients in order to gain experience in a particular procedure, or because they felt uncomfortable doing one. Any method used should have safeguards to protect against this subversion, and any method of randomisation that is predictable, such as alternate allocation, or allocation by date of birth or record number, cannot be adequately concealed.57

1.4.3 Blinding

When participants, health care providers, outcome assessors, or data analysts are aware of the assigned interventions, differential performance or measurement errors may occur. Blinding is designed to prevent this. When a trial is adequately blinded, it is impossible for the expectations of participants or trial personnel to influence outcomes.25

The need for blinded assessment of results was realised hundreds of years ago. The earliest record of blinding is found in the work of Lavoisier and Franklin, who tested the healing effects of magnetic fields (mesmerism) in a carefully blinded trial in 1784.59 In the 19th century, it was common for homeopaths to use blinding in their “provings” of treatments. By the early 20th century, driven mostly by the work of physiologists and pharmacologists, blinding was well entrenched in research practice, albeit for different reasons. German investigators used blinding to prevent bias, while those in the United Kingdom and United States used placebo interventions as a tool to maintain participant follow up.59 In early investigations of surgical procedures, the need for blinding was not apparent, since mortality was the main measure of success or failure, and was unlikely to be influenced by biased assessments.60 But surgical practice has evolved, and with advances in life expectancy in the 20th century, surgical interventions were increasingly used to improve symptoms – the measurements of which are often highly subjective, and prone to bias.

An adequately blinded trial is one where interventions are not distinguishable in any sense – taste, smell, shape or method of administration. When a control arm involves an inactive intervention, blinding involves the use of a placebo. For drug trials, creating a placebo drug is often a simple process, and the placebo may even be designed to have active properties that mimic the side effects of the intervention group, such as a dry mouth or dizziness.61

For surgical interventions, however, blinding is difficult or even impossible.62 A placebo controlled surgical trial requires the use of a sham procedure – a surgery that is externally identical but omits any therapeutic steps. Sham controlled surgical trials are controversial, and have issues with informed consent and subsequent maintenance of recruitment.63 Surgical trials involving the use of sham surgery are rare, but have been performed in arthroscopic surgery for osteoarthritis,32 internal mammary artery ligation for angina,30 and stem cell transplants for Parkinson’s disease,64 all of which have generated much discussion and controversy in the medical community. The proponents of sham controlled surgical trials argue it is the only way that entrenched practices, which may in truth be harmful, can be debunked.65 Indeed, the placebo effect, the observed benefit of an intervention with no known therapeutic action, is a real and tangible thing. For example, analgesic medication that is perceived to be novel and expensive, or that is directly administered by a doctor (rather than a nurse), is more effective for subjective pain relief.66 Further, physiological effects of placebo are consistent with radiological findings, which are more objective in nature.67

Empirical evidence of bias exists in trials that are not blinded. Studies following the landmark sham controlled study of internal mammary artery ligation estimated the “placebo effect” of surgical procedures at approximately 35%,68 but these studies could not account for improvements in the natural history of the disease, or regression to the mean.69 In a study that pooled results from several meta-epidemiology studies, Wood and colleagues found outcomes were exaggerated in unblinded trials that used subjective outcomes. When only objective outcomes (such as mortality) were considered, no evidence of an exaggeration was found.70

Surgery may be the ideal placebo. Any surgical procedure exists within a therapeutic “aura”71 made up of the patient (and their expectations), the surgeon (and his/her personality), the anaesthetic, as well as issues related to the surgery itself, which may involve the use of new technology and expensive devices.72 Any or all of these may influence the effects of treatment. In addition, this therapeutic aura is overstated in a surgical procedure, when compared to its presence in a drug intervention. Beecher68 argued that the personality of a surgeon was the most important component of the surgical placebo aura, citing evidence that surgeon enthusiasts had superior clinical results to surgeon sceptics. Regardless, history is replete with surgical procedures abandoned after many decades (or even centuries) of use.68 This underscores the need for blinding in surgical trials.

1.4.4 Attrition

When participants in a trial are withdrawn or lost to follow up, systematic differences may occur in trial groups, leading to attrition bias.25 Attrition of patients reflects a deviation from the planned protocol and follows two main patterns. The first is loss to follow up, where outcome data are missing at certain timepoints. Little can be done to address this problem, as it is impossible to make a correct determination about the outcome(s) of lost patients.73 It is common, but erroneous, for trial data to be presented with lost patients included in denominators.74 Although statistical approaches exist to deal with lost data, including the use of regression models, best and worst case scenarios, and imputation, all of these make unverifiable assumptions about the data, and may lead to bias.73 The second source of attrition is withdrawal from the prescribed trial treatment. When a lack of adherence results in a patient receiving the same treatment as an opposing trial arm, crossover occurs. Withdrawals and crossovers may be associated with prognostic variables, leading to systematic differences in the resultant groups.57 Most researchers and statisticians agree that the primary analysis of trial data should use the intention to treat principle,73 which states that trial groups should be analysed according to the group to which they were randomised (regardless of what intervention was actually administered). This approach preserves the integrity of the randomisation process, and prevents inferences based on sub-group characteristics. Since it provides a conservative estimate of treatment effect, Type 1 errors (rejecting the null hypothesis when it is actually true) are also unlikely to occur.75 On the other hand, a per protocol analysis, where analysis is according to the treatment received, is useful as it may provide an estimate of efficacy, but often exaggerates treatment effects beyond that observed in practical scenarios.
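The difference between the two analyses can be made concrete with a small worked example. The trial below is entirely hypothetical: 100 patients are randomised to each arm, and 20 patients allocated to surgery cross over to the control treatment, most of them with a poor prognosis. The figures and names are assumptions chosen for illustration, and the second analysis groups patients by treatment received, following the definition of per protocol analysis given above.

```python
# Each record: (randomised_arm, received_arm, bad_outcome)
records = (
    [("surgery", "surgery", False)] * 70
    + [("surgery", "surgery", True)] * 10
    + [("surgery", "control", True)] * 15   # crossed over, poor prognosis
    + [("surgery", "control", False)] * 5   # crossed over
    + [("control", "control", True)] * 30
    + [("control", "control", False)] * 70
)

def risk_difference(records, group_index):
    """Risk of a bad outcome in the surgery group minus the control group.
    group_index=0 groups patients by randomised arm (intention to treat);
    group_index=1 groups them by the treatment actually received."""
    counts = {"surgery": [0, 0], "control": [0, 0]}  # [bad outcomes, total]
    for record in records:
        arm = record[group_index]
        counts[arm][0] += record[2]
        counts[arm][1] += 1
    return (counts["surgery"][0] / counts["surgery"][1]
            - counts["control"][0] / counts["control"][1])

itt = risk_difference(records, 0)          # 25/100 - 30/100, about -0.05
as_received = risk_difference(records, 1)  # 10/80 - 45/120, about -0.25
```

In this constructed example the analysis by treatment received suggests a risk reduction five times larger than the intention to treat analysis, solely because the patients who crossed over took their poor prognosis with them into the control group.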

There is evidence that authors misunderstand and misapply the intention to treat principle. In their analysis of 249 trials published in high impact general medical journals, Hollis and colleagues found that although 119 (48%) trials reported that intention to treat was used, a large proportion of these trials (45%) reported evidence to the contrary.74 There is also empirical evidence of bias in treatment effects when the intention to treat approach is not adopted. A review of published results from trials indexed on PubMed showed that effect estimates from per protocol analyses were systematically larger than intention to treat analyses, although a wide variation in error exists.76 Three meta-epidemiology studies have assessed the influence of attrition bias on treatment effects, with conflicting results.39,55,77 However, these studies relied on the reporting of attrition by authors, rather than an assessment of whether methodology was adequate. Tierney and Stewart studied this issue using a more accurate approach. Using their access to oncology clinical trial data, they produced intention to treat results from individual patient data in different oncology trials, and compared them to results published by authors. It was found that 69% of trials excluded between 0.3% and 38% of patients randomised, and in most cases the published results were more in favour of research treatment.78 However, there was no evidence of a systematic bias in one direction or another. Clearly, the effects of an analysis must be assessed in its individual context.

1.4.5 Publication and selective reporting bias

Evidence based changes in practice are contingent upon the publication of research results, but when results are unfavourable they are less likely to be disseminated. This bias may occur at two levels. Within-study bias can occur when only a subset of outcomes is reported and published: selective reporting bias. At the aggregate level, bias can occur when studies are published based on the direction or significance of their results: publication bias. Selective reporting or publication of trials with positive results may lead to an exaggeration of treatment effects when results are combined in systematic reviews.79

Empirical evidence of publication bias is gathered from cohort studies that followed protocols through to their publication. Easterbrook and colleagues identified 487 projects approved by the Central Oxford Research Ethics Committee between 1984 and 1987. Studies with statistically significant results were more likely to be published, be published in higher impact journals, and lead to a larger number of publications.80 Ioannidis followed 109 National Institutes of Health funded trials in human immunodeficiency virus infection. Positive trials were published much faster (a difference of over 2 years) than negative trials, and most of the time difference occurred in the period between trial completion and publication.81 Stern and colleagues had similar findings in their Australian study, which also included non-randomised studies.82 Conversely, a cohort of studies submitted to the Journal of the American Medical Association found no evidence of publication bias, either in the submission of manuscripts to the journal, or in editorial decisions,83 but these findings may only be generalisable to high impact general medical journals. A recent systematic review found strong, direct empirical evidence of publication bias when summarising the evidence from eleven studies,84 and a Cochrane review from Hopewell and colleagues quantified the increased likelihood of publication of positive results at 78%.85 Although no direct evidence from cohort studies (that followed protocols to completion) exists, there is no reason to believe the above findings do not apply to surgical research. Evidence of publication bias in surgery is restricted to the examination of published results, which are found to be disproportionately positive.86,87

A correlate of publication bias at the study level is the selective reporting of outcomes. This bias occurs in published studies, and may take several forms, including the selective reporting of analyses conducted, the selective presentation of data with different measures, or the selective reporting of a subset of outcomes from those that were measured, termed outcome reporting bias.84 The strongest evidence of outcome reporting bias stems from the work of Chan and colleagues. In their examination of trial protocols from Danish ethics committees and the Canadian Institutes of Health, it was found that statistically significant outcomes were more likely to be reported in full. There was also a concerning prevalence of primary outcomes that were changed or omitted.24,88

The effect of publication bias and selective reporting bias may be appreciated when the evidence from RCTs is summarised in systematic reviews and meta-analyses. Sutton and colleagues used a statistical approach to estimate the number of “missing” studies using the trim and fill method in Cochrane reviews.89 Most reviews showed evidence of missing studies, but inferences were affected only in a minority. Two other studies compared summary effects from published trials to those that also included unpublished (grey) literature. The results were consistent: pooled effects from published trials were 7-9% greater than pooled effects from unpublished trials.54,90 The ORBIT (Outcome Reporting Bias in Trials) study examined the impact of outcome reporting on the conclusions of Cochrane reviews. It was found that 64% of reviews included at least one trial with a high suspicion of outcome reporting bias, and in 10% of reviews, conclusions were not robust to outcome reporting bias (pooled results changed from favouring the treatment to a non-significant result).91

1.4.6 Funding bias

In an ideal setting, the primary interest of those conducting a trial would be to establish the truth regarding the efficacy or safety of an intervention. In reality, there are often competing interests. Trial investigators often have vested academic interests, and research groups are strongly encouraged to “publish or perish”.92 But the most concerning competing interests are funding sources, particularly when trials are funded by for-profit organisations, as trial conclusions may have a direct impact on financial gains and losses.93 Industry funding may take the form of direct payments, or donations of material or medical devices.

Given that a substantial proportion (approximately one third) of both medical and surgical trials are industry funded,34,94 there has been much interest in the relationship between industry funding and trial conclusions. Two large systematic reviews have been conducted: Lexchin and colleagues included 30 studies, and found that while funded trials were of good methodological quality, they were also four times more likely to have outcomes favouring the sponsor.95 Bekelman and colleagues included 37 studies, and found a high prevalence of financial relationships between academic institutions and industry start-ups. Their results were consistent, with industry funded studies 3.6 times more likely to support the sponsor’s product.96 Another study that had a particular focus on surgical trials found that surgical trials were less likely to report industry funding than pharmaceutical trials, but were significantly more likely to have conclusions that favour the sponsor’s product.94 However, it is also possible that sponsored interventions are chosen based on their expected (larger) treatment effects. This potential confounder was assessed in a study by Als-Nielsen and colleagues, who examined the association between trial funding and conclusions, adjusted for treatment effect size and trial methodological quality.97 Despite these adjustments, it was found that conclusions in funded trials were 5.3 times more likely to favour the sponsor’s intervention, which suggests that bias may be in the interpretation of study results.

The mechanisms for bias in industry sponsored research are not clear. Funded research often results in larger sample sizes and higher methodological quality, so these are unlikely to be the sources of bias.93,98 Industry sponsors may be more likely to sponsor research that will show favourable treatment effects, a violation of the so called “uncertainty principle”.99 It is also likely that industry sponsored research is subject to publication bias. Meta-analysis of the effects (including harms) of treatments manufactured by for-profit industry is often different when unpublished studies are included. Recent examples include anti-depressant medication,100 chemotherapy for ovarian cancer,101 and the harmful cardiovascular effects of rofecoxib.102 Evidence from these case studies is supported by two cohorts of trial protocols followed through to publication. In both cohorts, industry funded trials were less likely to be published than other studies.80,103

Competing interests are common in surgical practice. In a recent study by Okike and colleagues, it was found that almost 30% of physicians did not adequately disclose their related interests when presenting on a topic at the American Academy of Orthopaedic Surgeons annual meeting. Further, 50% of physicians did not disclose interests that were perceived as unrelated to their presentation.104 Controversy has also surrounded industry funded trials of a bone healing product commonly used in spine surgery, where it was found that harms were grossly underestimated,105,106 and public outrage resulted when surgeons did not adequately disclose the extent of payments received from a medical device company in exchange for the use of their products.107

1.5 Assessing trial quality

The quality of a randomised trial may be defined as the confidence that the design, conduct, report and analysis restricts bias in the comparison of interventions.25 An assessment of trial quality is synonymous with a critical appraisal of the study, and the evaluation of how accurately the results reflect the truth. It has been shown that there is mounting empirical evidence of bias when different aspects of quality are not methodologically sound. Quality assessments are essential for both individual appraisal and understanding of RCTs, as well as the synthesis of studies in systematic reviews and meta-analyses.

There has been much debate as to how the quality of RCTs should be assessed.108 Meta-epidemiology studies that defined the relationship between quality domains and the exaggeration of treatment effects assessed domains on an individual basis,39,53,55 and therefore this approach is supported empirically. However, another commonly used approach is the use of checklists and scales, which provide an overall assessment of the quality of a trial. Checklists are qualitative assessments of the components of trial quality, while scales numerically score components and provide a quantitative summary score.109 By definition, scales assign a weight to each individual quality item in order to calculate a score. In their 1999 Health Technology Assessment of quality assessment in systematic reviews, Moher and colleagues found that 83% of non-Cochrane systematic reviews used either a scale or a checklist, while 8% used an individual domain approach. Cochrane reviews on the other hand opted for an individual domain approach in almost all cases.109

A plethora of trial quality scales and checklists exist. A systematic review conducted in 1995 identified twenty-five scales and nine checklists (although others may have since been developed or were not captured in that review).110 It was noted that almost none of the scales and checklists were developed using appropriate techniques, and that they suffered from major weaknesses. Most scales did not report inter-observer reliability data, and issues of construct validity were largely ignored in their development. These issues become important when considering the use of scales and checklists in meta-analysis, particularly where scales may be used to assign weight to included studies, or to exclude them based on a predetermined cut off. Using a meta-analysis of low vs. high molecular weight heparin as an example, Juni and colleagues examined the effect of quality score weighting on conclusions. The size and direction of the pooled effect were starkly different and unpredictable when different scales were used to weight studies. A more recent study by Herbison and colleagues examined the utility of 43 different quality scores in identifying high and low quality studies. None of the scores was better at identifying high vs. low quality studies, or at bringing calculated results closer to the reference standard.111 The authors recommended that the practice be abandoned.
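The sensitivity of a pooled estimate to the choice of quality scale can be illustrated with a minimal sketch. It assumes one simple weighting scheme (each trial's inverse-variance weight multiplied by its quality score), which is not necessarily the scheme used in the studies cited above. The function name `pooled_effect`, the trial data and both sets of quality scores are hypothetical and chosen purely for illustration.

```python
def pooled_effect(effects, std_errors, quality=None):
    """Weighted mean effect; weight = quality score / variance (assumed scheme)."""
    if quality is None:
        quality = [1.0] * len(effects)  # plain inverse-variance weighting
    weights = [q / se ** 2 for q, se in zip(quality, std_errors)]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


# Hypothetical data: three equally precise trials (not taken from any cited study)
effects = [-0.50, 0.20, -0.10]   # e.g. log odds ratios
std_errors = [0.25, 0.25, 0.25]
scale_a = [0.9, 0.3, 0.5]        # hypothetical scores from one quality scale
scale_b = [0.2, 0.9, 0.6]        # hypothetical scores from a second scale

print("no quality weighting:", round(pooled_effect(effects, std_errors), 3))
print("weighted by scale A: ", round(pooled_effect(effects, std_errors, scale_a), 3))
print("weighted by scale B: ", round(pooled_effect(effects, std_errors, scale_b), 3))
```

With these made-up inputs the pooled effect is negative under scale A and slightly positive under scale B, even though the underlying trial results are identical.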

The Cochrane Collaboration recommends an individual domain approach to the assessment of quality, and terms this an assessment of “risk of bias” rather than quality. The reasons for this distinction are that it is impossible to know whether poor methods have indeed had an impact on the results of a given study, hence use of the term “risk”.57 Further, while studies may be designed to the highest possible standard, they may still have important risk of bias.57 For example, the design of a trial comparing a surgical to a drug intervention may be excellent, but the lack of blinding of participants may introduce an important risk of performance or measurement bias, particularly for subjective outcomes.

Another dimension of quality is the reporting of randomised trials in publications. While the Cochrane Collaboration’s risk of bias tool has a focus on internal validity, reporting quality may be defined as the extent to which a trial report provides information on the design, conduct and analysis of a trial.112 This necessarily includes information on domains that are essential for readers to know, but are unlikely to result in biased results, such as a sample size calculation, the description of statistical analyses, or a discussion of trial generalisability. Empirical evidence consistently shows that trials from several subspecialty areas are poorly reported.113-115 In recognition of this, an international group of researchers, journal editors, statisticians and epidemiologists formed the Consolidated Standards of Reporting Trials (CONSORT) group. In 1996, the first reporting guideline was produced,116 and was eventually adopted by a large number of journals. Since then, with the evolving nature of new methodological evidence,117 the CONSORT group has published a number of updates and extensions. The current checklist contains 25 items divided among the components of a trial report. Some items are related to bias, but many others are related to clarifying trial aims, details on the analyses conducted, and placing the trial findings in the appropriate context,23 all of which are essential for readers to know.

1.6 The challenges of surgical trials

RCTs in surgery are rare. Instead, surgical research is characterised by a high prevalence of retrospective studies and single arm case series,118 resulting in criticism of surgical research as a “comic opera”.119 When surgical RCTs are performed, they are often poorly executed and reported,35,120 and surgical clinical practice is less likely to be based on evidence when compared to clinical medicine.121,122 It is clear that the nature of surgical interventions is different to pharmaceutical ones (see Table 1.2), but the challenges of conducting RCTs in surgery are far more complex. In the following overview, these challenges are broadly divided into surgeon, trial and patient factors.

Table 1.2 Differences between pharmaceutical and surgical interventions

Pharmaceutical interventions    Surgical interventions
No learning curve               Learning curve
Static                          Changing
Placebo simple                  Placebo difficult / impossible
No technique                    Skill required
Attrition less problematic      High risk of cross over

1.6.1 Challenges for surgeons

Surgical training is a predominantly master-apprentice model. A large amount of credence is given to the practices utilised during training or “the way I have learned it”.123 This approach has important advantages in the development of practical surgical technique, but once adopted, change is difficult. In contrast to novel pharmaceutical interventions, many surgical procedures existed prior to the advent of randomised trials, and their assessment in randomised trials would require a radical change in surgical culture and thinking. Surgeons often argue that much of what they do has such an obvious benefit that conducting a trial would be unethical, citing the parachute experiment as an analogy.124,125 In reality, few surgical interventions have such large treatment effects that randomised trials are unwarranted.126

A related issue is the equipoise (or lack of it) among surgeons. Equipoise may be defined as a state of uncertainty among clinicians as to the comparative benefits of treatment alternatives. For equipoise to exist, an individual clinician may have a preferred approach, but must acknowledge alternate options may also be valid.127 This trait is perceived to be antithetical to surgical training.128 A surgeon’s ethos is characterised by a strong belief that an invasive procedure performed will be of benefit to their patient. On the other hand, obtaining consent for a trial with alternative options requires uncertainty, and may undermine the confidence that patients have in their surgeon.129

Surgeons are often defined by their specific sub-specialty interests, with the skills required often cultivated over many years of training. Further, these interests may be a major source of private practice income. It may represent a financial risk to expose these specialised surgical procedures to scientific testing, which may or may not support their practice.130 McCulloch and colleagues cite the example of laparoscopic cholecystectomy. Shortly after its introduction, this procedure became highly prevalent in private practice. Surgeons were reluctant to perform RCTs, although early evidence did not support the new procedure.131 But in order for randomised trials to be done, skilled surgeons are required to participate as investigators. Another problem may arise when surgeons have vested interests in trial results. Two recent examples have generated much controversy: the first involved the underestimation of adverse events of recombinant bone morphogenetic protein for use in spine fusion surgery,106 and the second involved the recruitment of patients into a trial of vertebral disc replacement by surgeons who stood to financially benefit from the technology.132

Surgeons often complain about a lack of funding for surgical research, and the resultant lack of opportunity to develop skills as surgical trialists.133 In the United States, National Institutes of Health funding for surgical research awards increased by 41% over the last 20 years, a substantial lag behind competing awards, which rose by 79%.134 The professional development of surgical trialists has been described as “painful and haphazard”,133 characterised by limited access to surgeon mentors and experiential learning in trials. Surgeons make substantial commitments to their clinical training with long periods spent in the operating theatre. Further, academic pursuits are not conducive to rapid career development,135 and academic qualities are less likely to be seen as traits in surgical opinion leaders.136 As a result, formal training in epidemiology and biostatistics is rare amongst surgeons,135 and this may lead to a vicious cycle of poorly conducted and poorly funded research.

1.6.2 Challenges for trial conduct

Surgical procedures are complex and often involve a number of components performed by the surgeon, but also anaesthetists, physicians, operating theatre and ward staff (Figure 1.2). Co-interventions are common, and the interactions between these and the surgical procedure may lead to confounding. The surgical procedure itself may be subject to substantial variation in technique, and may lead to disagreement amongst surgeons as to the optimum technique to evaluate in trials. Further, surgical interventions are subject to learning curves, where earlier attempts are characterised by higher complication rates, and can distort the results of randomised trials.137 In a trial setting, addressing learning curves with minimum training standards and quality control measures adds another barrier to the conduct of the trial, but also may limit the generalisability of trial results.131 An example of the differential effects of learning curves can be seen in gastric surgery trials, where poor results from Western surgeons (as compared to their Japanese colleagues) were criticised as learning curve effects.138

Figure 1.2 The complexity of a surgical intervention (adapted from Ergina et al)135
[Figure labels: preoperative and postoperative care; operating theatre; surgical procedure; surgeon(s); anaesthesia team; medical team; nursing team]

Outcome selection in surgical trials can also be challenging. While surgical procedures are often evaluated using short term outcomes such as complications,138 a variety of patient reported and long term outcomes are also needed. Outcomes often have issues related to validity and reliability. For example, in the field of orthopaedic trauma research, there is no consensus as to which clinical and radiographic definitions of fracture union (a common outcome) achieve high accuracy standards for use in research.139 Further, McCulloch and colleagues point out that there is discordance between surgical development and research. Surgical modifications usually produce small incremental outcome changes, which are hard to detect in randomised trials, and do not necessarily result in practice change amongst a sceptical surgical community.131

As previously noted, placebo controlled surgical intervention trials are extremely rare and controversial. There is good empirical evidence that a lack of blinding exaggerates the effects of treatments, particularly when subjective outcomes are used.70 Previous studies have noted that a minority of surgical trials are blinded, and in most of these cases medical treatments are compared in a surgical context.115,140 Anecdotally, surgical interventions in trials are usually compared to another form of surgery. However, such comparisons do not answer whether surgery is effective at all. Non-operative comparisons are uncommon, and are difficult or impossible to adequately blind.62,135

1.6.3 Challenges for trial participants

Patients faced with a choice between starkly different interventions in a surgical trial may hesitate to participate. The nature of surgical interventions and their risk profile is different to non-surgical treatments, even when alternatives are non-pharmacological, such as radiotherapy or shock wave therapy. Patients may also have a lack of equipoise, informed by the views of health professionals, but also their family, friends, and society. Mills and colleagues conducted a qualitative study of the attitudes of participants in the PROTECT study, a trial comparing surgical and non-surgical interventions for prostate cancer. Patients often had a good understanding of the concepts of chance and the need for comparisons in a trial, but clinical equipoise was identified as the key factor in determining their participation.141 Solomon and colleagues identified patient preference as the most common barrier precluding the conduct of surgical RCTs,142 and concluded that only 39% of surgical treatment questions can be answered by an RCT.

Surgical patients often have problems of an emergency nature. The timely provision of health care and informed consent in these scenarios is already challenging and complicates recruitment into trials. Unlike pharmacological treatments, the performance of most surgical procedures is also dependent on the availability of a hospital operating theatre, placing further health care system and funding constraints onto trial conduct.130

1.7 Summarising the evidence from surgical randomised trials: systematic review and meta-analysis of surgical interventions

“Although science is cumulative, scientists rarely cumulate scientifically” – Sir Iain Chalmers143

1.7.1 Narrative vs. systematic reviews

Every day a large number of new articles are published, and it is impossible for the clinician to stay up to date with the latest scientific evidence.144 It is also unlikely that answers to clinical problems are yielded from a single published investigation, given that false negative results are common in the scientific literature, and articles often provide conflicting results and conclusions.145 A summary of all the available evidence is also necessary when publishing practice guidelines, formulating policies, and deciding whether further RCTs investigating a clinical problem are necessary.

Traditionally, a summary of the evidence was provided by narrative reviews performed by experts in the field. However, narrative reviews do not specify reproducible methods and are therefore unclear regarding the source of conclusions,146,147 which often reflect the biases of the author rather than published empirical evidence.148 Multiple narrative reviews on the same subject are also often found to have conflicting recommendations. The systematic review may be seen as a rigorous scientific approach to cumulate data, and may be defined as “an attempt to collate all empirical evidence that fits pre-specified criteria in order to answer a specific research question”.57 It includes clearly stated objectives (and is therefore prospective in nature), elaborates reproducible methods and searches for data, includes a critical appraisal of the validity of the evidence, and systematically presents the findings of included studies. A meta-analysis is performed in the context of a systematic review, and is a statistical method that combines the results of independent studies in a quantitative manner.149 Meta-analysis is a powerful tool that identifies the size, direction and consistency of treatment effects across studies.
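The quantitative combination of results described above can be sketched in a few lines. The example below assumes a fixed-effect, inverse-variance model, which is only one of several pooling methods and is not drawn from the cited sources. The function name `pool_fixed_effect` and the three trial results are hypothetical.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]   # more precise trials weigh more
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical log odds ratios and standard errors from three trials
log_ors = [-0.40, -0.10, -0.25]
ses = [0.20, 0.10, 0.40]

pooled, se = pool_fixed_effect(log_ors, ses)
low, high = pooled - 1.96 * se, pooled + 1.96 * se   # approximate 95% interval
print(f"Pooled OR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

Pooling on the log odds ratio scale and exponentiating at the end is a common convention. A random-effects model would be an alternative choice where the included trials are expected to estimate different underlying effects.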

1.7.2 Historical development

The idea of combining data from different studies has existed for hundreds of years. In the 17th century astronomers noted that their measurements would be associated with an error, even when taken by the same observer in similar conditions.150 Laplace and Gauss devised methods of combining measurements and calculating the probability of associated errors, rather than choosing from individual measurements. This laid the foundation for current meta-analytic techniques. Chalmers and Trohler note that physicians and scientists have always been too busy to keep up to date with scientific progress, and in the 18th century several periodical “commentaries” were published, akin to current evidence reviews such as the ACP Journal Club.151

British statistician Karl Pearson is credited as the first to use formal techniques to combine data from different studies, in 1904.152 In the 1930s Tippett and Fisher devised methods for combining p values from different studies asking similar research questions.153 It took another 50 years for meta-analysis techniques to become mainstream. In 1975, Glass described the “primary, secondary, and meta-analysis of research”, and was the first to coin the term.154 Shortly thereafter, Archie Cochrane would embark on his “maverick” crusade to encourage the scientific community to gather the scientific evidence on effective treatments.155 In the 1980s, the collection and indexation of clinical trials in perinatal medicine by Chalmers’ group led to the production of 600 systematic reviews, and laid the foundation for the Cochrane Collaboration in 1992. The Cochrane Collaboration quickly spread internationally, and was a major factor in the recognition of systematic review and meta-analysis as a mainstream method of study.156 This led to an explosion in the conduct of systematic reviews and meta-analyses indexed in the Cochrane database and elsewhere, with a current annual publication rate of 2,500 reviews.157
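As an illustration of combining p values across studies, the following is a minimal sketch of what is commonly known as Fisher's method. The statistic -2 Σ ln(p) is referred to a chi-square distribution with 2k degrees of freedom for k independent p values. The function name `fisher_combined_p` and the example p values are hypothetical, and the closed-form tail probability relies on the degrees of freedom being even.

```python
import math


def fisher_combined_p(p_values):
    """Return Fisher's combined statistic and its p value for independent p values."""
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For a chi-square variable with 2k degrees of freedom, the upper tail is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical p values from three independent studies of the same question
stat, combined = fisher_combined_p([0.08, 0.11, 0.06])
print(f"chi-square = {stat:.2f} on 6 df, combined p = {combined:.4f}")
```

The method uses only p values, so it says nothing about the size or direction of the treatment effect, which is one reason effect-size pooling later became the standard approach.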

1.7.3 The rationale of systematic reviews and meta-analyses

The use of systematic reviews is not only restricted to a summary of the available evidence for the convenience of clinicians and researchers. Often it is the timely identification of the evidence that has the potential to improve clinical outcomes. Meta-analysis can enhance the precision of estimates and reduce the probability of false negative results.150 Lau and colleagues recount several examples of how cumulative evidence may show clear, statistically significant benefits of treatments long before they are taken up by the medical community.158 Had the use of beta-blockers and streptokinase following myocardial infarction coincided with evidence from meta-analyses, thousands of lives may have been saved. A more recent example is the use of rofecoxib, whose cardiovascular adverse effects may have been identified several years before it was pulled from the market.102

Systematic reviews not only include a summary of the evidence, but an appraisal of the quality of the evidence. Even if a pooled result from multiple studies shows a significant effect, it may be due to the bias inherent in the included studies, or what some authors have coined “garbage in – garbage out”.54 The results of systematic reviews and meta-analyses may be interpreted in the context of the quality of the evidence, and appropriate conclusions made.38

The differences, or heterogeneity, between included studies may also be explored in a systematic review and meta-analysis. Heterogeneity may be due to known clinical differences between included studies (clinical heterogeneity), or may be a statistical finding (as a result of a combination of known and unknown clinical and methodological differences).149 Thompson recounts several examples of how exploring heterogeneity may have an important effect on the conclusions drawn.159 Further, exploring heterogeneity may be an important source of further research questions.
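Statistical heterogeneity is commonly quantified with Cochran's Q and the I² statistic. The sketch below is a minimal illustration of those two quantities under a fixed-effect weighting, and is not taken from the cited sources. The function name `heterogeneity` and the trial data are hypothetical.

```python
def heterogeneity(effects, std_errors):
    """Return Cochran's Q, its degrees of freedom, and I-squared (as a percentage)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation beyond chance, truncated at zero
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical log odds ratios and standard errors from four trials
q, df, i2 = heterogeneity([-0.60, -0.10, 0.15, -0.35], [0.20, 0.15, 0.25, 0.30])
print(f"Q = {q:.2f} on {df} df, I-squared = {i2:.0f}%")
```

A high I² does not explain where the differences come from. It only signals that subgroup or sensitivity analyses, of the kind described above, may be worth pursuing.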

1.7.4 Bias in systematic reviews and meta-analyses

The rigour of a systematic review (and meta-analysis) lies in the logical, stepwise progression of pre-specified methods. Like all studies, however, bias may be introduced when these methods are not adequate. Over 20 years ago, the different sources of bias were recognised in the search, selection, and extraction of data from included studies.160 Since then, the empirical evidence for these sources of bias in meta-analytic research has developed.161 The following is a brief overview of the methodological issues of systematic reviews and meta-analyses.

Given that systematic reviews summarise evidence from pre-existing data, it is possible for judgments during the review process to be biased by pre-existing knowledge of the data. For example, study eligibility criteria, the choice of comparisons, and the choice of outcomes may all be manipulated to attain the desired results.162 A key component of a systematic review is therefore the transparent formulation of a protocol prior to the execution of any part of the review. Ideally, the protocol would be peer reviewed, as is the practice of the Cochrane Collaboration, and iterative changes would be minimised and transparent in nature.163 It is recommended that the protocol contain clear objectives regarding the included patients, interventions, and outcomes, which then guide the eligibility criteria for the review.164

The search for included studies may be an important source of “sampling bias”.160 First, given that systematic reviews aim to encompass all sources of evidence, multiple electronic databases should be searched using a sensitive, reproducible search strategy.165 Whenever possible, languages other than English should be included.166 Unpublished studies, or “grey literature”, may be very difficult to find. An attempt may be made by searching trial registries, conference proceedings and contacting experts in the field.90 Publication bias is a well recognised problem in systematic reviews, and may lead to an overestimation of effects (and an underestimation of harms).84,85

The search results are assessed to determine which studies are included. The process of inclusion of studies should be reproducible and based on pre-specified eligibility criteria, rather than the whims of the review author (an important distinction between systematic and narrative reviews).160 The accuracy and transparency of this process may be improved by the use of at least two researchers working independently,167 with an arbitration process for disagreement. A similar process may also minimise bias in the extraction of data from included studies.168

An essential component of a systematic review is a quality assessment of included studies. Problems with the internal validity of included studies are difficult to correct for in a systematic review or meta-analysis, since the direction and extent of bias is unpredictable for a given clinical scenario.48 However, the results of the review should be explicitly placed in the context of its quality. Despite the empirical evidence for bias in poorly conducted trials,56 only 60% of non-Cochrane reviews incorporate the methodological quality of included studies into their conclusions.169

Like RCTs, funding bias may also be an issue in the conduct of reviews. Lexchin and colleagues examined both clinical trials and systematic reviews, and found that industry sponsored reviews were more likely to favour the intervention.95 Another study of industry funded meta-analyses found that this bias was not due to favourable treatment effects, but rather the interpretation and conclusions of the review.170 As with clinical trials, the declaration of funding sources for the conduct of the review is essential, although over 40% of published systematic reviews still fail to do so.157

1.7.5 Measuring the quality of systematic reviews and meta-analyses

In recognition of the bias that may occur in the conduct of systematic reviews and meta-analyses, much work has been done to create tools for their methodological assessment. Shea and colleagues conducted a review of critical appraisal instruments, and found twenty-four instruments (including 21 checklists and three scales).171 However, problems of validity and reliability were commonly found, and there was little evidence of appropriate development amongst the tools. Like RCTs, two dimensions of systematic review quality may be identified: methodological and reporting quality.172

Methodological appraisal tools emphasise the items related to the conduct of the review. Most of the tools identified by Shea and colleagues assessed this dimension.171 More recently, a group of clinicians and methodologists developed AMSTAR (A measurement tool to assess the methodological quality of systematic reviews).173 The development of AMSTAR draws upon items included in previous checklists, but also uniquely used factor analysis methods to remove multiple overlapping concepts among eligible items. The result was an 11 item tool that has since been validated.174,175

Reporting quality refers to how clearly authors present what was planned, done, and found in a systematic review.176 Until recently, no guidelines existed for this purpose. Empirical assessments of reporting showed poor descriptions of essential systematic review domains.177,178 Building upon the considerable success of the CONSORT group in promoting RCT reporting guidelines, a similar international group of methodologists produced the QUOROM (QUality of Reporting Of Meta-analyses) statement in 1996.179 In recognition of the advances made in this field, QUOROM was substantially revised in 2009 and renamed PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses).180 The PRISMA statement has been validated in a number of empirical assessments of reporting quality,181,182 and has been widely adopted by journals as an ideal for systematic review reporting.183

1.7.6 The importance of systematic reviews and meta-analyses to surgery

Systematic reviews may have specific advantages when used to explore surgical research questions. Surgical trials are often performed in emergency settings, and have issues with surgeon and patient equipoise. Recruitment issues are common, and surgical trials may have smaller sample sizes than pharmaceutical comparators and a higher risk of false negative (Type 2 error) findings.184 The pooling of surgical trial results in meta-analyses may remedy this. Further, surgical intervention trials have inherent methodological challenges, and may be of lesser quality than a matched sample of medical trials.62,184 A quality assessment of the evidence may be particularly important in this context. Finally, it has been noted that surgical interventions are multidimensional, surgical patients are often heterogeneous, and controversy exists as to the most appropriate use of outcomes. The exploration of this heterogeneity in systematic reviews is the ideal approach to clarify and interpret what may otherwise be a confusing picture.

1.8 Development of iQuEST: Investigating the Quality and Epidemiology of Surgical Trials

The field of trial methodology, reporting, and bias has rapidly developed in the last two decades, but much of it relates to pharmaceutical interventions and general medical trials. Assessments of surgical studies exist, but these often compare medical interventions administered in a surgical setting. At the heart of surgical practice is the performance of operative procedures, and little is known about the characteristics of trials evaluating them.

This thesis explores the bias that may be present in randomised trials and meta-analyses of surgical interventions.

The first part of this thesis examines the epidemiology, methodological quality, and reporting quality of surgical intervention RCTs. A sample of recently published randomised trials of surgical interventions was obtained using systematic methods. The epidemiology of these publications is explored, and the scientific quality characteristics evaluated. There is a view that, given the difficulties in conducting surgical research, surgical trials may be of poorer quality than those performed in internal medicine. A direct comparison of the quality characteristics of both groups of trials was performed. To examine the distinct dimension of reporting quality, the compliance of surgical RCTs with CONSORT was examined, and trial level characteristics associated with improved reporting were explored.

The second part of this thesis examines whether an empirical basis for bias exists. The association of methodological quality with treatment effects is examined using metaregression modeling. Evidence for publication bias is explored with plots and statistical techniques.

The third part of this thesis explores issues related to outcome reporting. All outcomes were extracted from the included RCTs, and described. The association between statistical significance and complete reporting (or outcome reporting bias) is investigated using meta-analysis techniques. Issues related to the selection of patient/clinically important outcomes in surgical RCTs are also evaluated. Are patient important outcomes commonly selected? Are they more likely to be the main focus of the trial?

The fourth and final investigation of this thesis evaluates meta-analyses of surgical intervention RCTs, given their importance in influencing clinical practice. The epidemiology of surgical meta-analyses is explored, and their methodological and reporting characteristics are evaluated.

2. Epidemiology and scientific quality of randomised trials of surgical interventions

2.1 Abstract

Background: In clinical research, randomised controlled trials (RCTs) are the best study design to evaluate the efficacy and safety of interventions, as methodological safeguards are used to minimise the risk of bias. But not all trials may report (or perform) these methods, and little is known about the scientific quality of recently published trials in surgery. Here, a systematic review and author survey of published trials examining a surgical intervention were performed, and the scientific quality was compared to published standards and to general medical trials.

Methods: MEDLINE, EMBASE and CENTRAL were searched in May 2009 for RCTs examining a surgical intervention using a comprehensive electronic search strategy developed by the Cochrane Collaboration. Trial selection and data collection were piloted utilising a specific pro-forma. Quality characteristics (primary outcome specification, sample size calculation, random sequence generation, allocation concealment, blinding, handling of attrition, and declaration of funding) were graded as adequate, inadequate, or unclear. This was supplemented with general characteristic data related to the authors, the study, and the journal. A comparison of the proportions of adequate reporting was made with published standards of trials in other specialties, and risk ratios (RRs) were calculated. An online survey of authors was conducted to examine whether a discrepancy existed between the trial report and what authors reported actually took place; the discrepancy was quantified as a proportion of discordance between trial report and survey response.

Results: 400 surgical trials published between August 2008 and May 2009 were included. The adequate reporting of quality domains was low, with only 42% reporting a sample size calculation, 42% a method of random sequence generation, 43% an adequate method of allocation concealment, 35% any form of blinding, and 27% reporting an adequate method to deal with attrition. Surgical trials compared well to a sample of general medical trials from 2006, with superior reporting of a primary outcome (RR = 1.24, 95% CI 1.12 – 1.38), random sequence generation (RR = 1.24, 95% CI 1.06 – 1.45), and allocation concealment (RR = 1.71, 95% CI 1.43 – 2.04). However, an adequate method of blinding (RR = 0.60, 95% CI 0.51 – 0.69) and trial funding source (RR = 0.67, 95% CI 0.59 – 0.76) were less likely to be reported in surgical trials. When unclear in the published report, 68% of responding authors stated the use of an adequate method for random sequence generation, 77% for allocation concealment, and 53% for any form of blinding.

Conclusions: The rates of adequate reporting of key scientific quality domains remain low in surgical trials, but compared well with trials published in other medical specialties. Discrepancies existed in the information contained in published reports and what was claimed to have taken place by authors. Further research is needed to demonstrate the effect of inadequate reporting on outcome estimates.

2.2 Introduction

In clinical research, the randomised clinical trial (RCT) is the ideal study design to investigate the efficacy and safety of interventions. Randomisation and unbiased allocation of interventions allow prognostic variables (or confounders) to be equally distributed within compared groups, and statistical theory to be applied.185 Methods are also adopted in RCTs as safeguards against different sources of bias, such as allocation concealment, blinding, and analysis according to the “intention to treat principle”.25 However, like all research designs, bias may be introduced if these safeguards are not adequately performed, and empirical evidence exists that outcome estimates may be exaggerated when this occurs.56

Few studies have performed a broad assessment of the scientific quality and general characteristics of RCTs. Chan and Altman performed one of the earliest, using a cross sectional sample of trials indexed on PubMed in December 2000.34 Only 18% reported adequate methods of allocation concealment, while blinding, random sequence generation and handling of attrition were adequately reported in 60%, 21% and 34% of trials, respectively. A similar study was performed six years later (by the same authors), and showed that, while the reporting of adequate methods had improved somewhat, deficiencies remained highly prevalent.186 Current evidence suggests the quality of RCTs in surgery is generally poorer than in other specialties,35,187 although a direct comparison of general medical and surgical intervention trials has not been performed. Surgical research is faced with particular challenges. Where surgical interventions are tested, blinding is difficult, unethical or impossible,62 patients and/or surgeons usually have a preference for one treatment,188 and surgical interventions are difficult to standardise.131 In addition, surgery often takes place in an emergency setting, making informed consent and patient recruitment more challenging. In recognition of the particular difficulties faced in the assessment of procedural interventions, the IDEAL collaboration of clinicians and methodologists has developed broad recommendations on how surgical innovation may be assessed at all its stages, including increasing awareness of issues related to clinical trials.10,135,189

The methodology used in these and other empirical assessments of quality relied primarily on the reports of included RCTs. When reports did not provide an adequate description of methodology, it was assumed that the safeguards against bias provided by the methodological domain(s) did not occur: a “guilty until proven innocent approach”.190 But lack of reporting may not always reflect the truth about how the trial was actually conducted. Three studies have compared reported (published) methodology with what was actually conducted by contacting authors directly,190-192 with consistent improvements in quality between 7% and 96% after clarification with the authors. In a more robust cohort design, Pildal and colleagues assessed approved trial protocols submitted to ethics committees, and their corresponding publications.193 Examination of submitted protocols for method of allocation concealment did not add substantially to what was reported in the final manuscripts, with most trials having unclear methods based on both reports and protocols. This discrepancy between what was reported and what was conducted is important to explore, as the deficiency in scientific quality may originate at different stages of the RCT: either at the trial design/protocol stage, or the reporting/publication stage.

To the best of the authors’ knowledge, a broad assessment of the characteristics and scientific quality of RCTs assessing a surgical intervention has not been performed. Further, a direct comparison of surgical and non-surgical trials has not been performed, and it is unknown whether surgical trials indeed lag behind non-surgical trials. An investigation of reported vs. actual methodology has not been performed specifically for surgical trials, and may clarify whether the “guilty until proven innocent approach” is appropriate. Knowledge of the epidemiology of surgical trials will influence perceptions, will hopefully encourage improved quality and reporting of trials, and may inform future methodological research in this area.

2.3 Aims

The primary aim was to assess the characteristics of published RCTs evaluating a surgical intervention. These were broadly divided into characteristics related to scientific quality (methodology), as well as other general (author, journal and published report) characteristics.

The secondary aims were i) to compare the methodology characteristics of surgical trials to a previously published cross sectional sample of trials in a mixture of general medical specialties, using identical definitions of scientific quality, and ii) to assess the discrepancy between the information that authors provided in the published report of their RCT, and what authors state was actually done.

2.4 Methods

The following five chapters present an assessment of issues related to quality in surgical RCTs. This chapter provides a detailed description of the methods used for obtaining this representative sample of recently published RCTs, along with relevant rationale. Further methods specific to the aims of this chapter are also presented following the description of the search. In the following chapters, where the same sample of published RCTs was often used, only methods specific to that chapter are presented.

2.4.1 Study design

A systematic review methodology was adopted. It should be noted that the unit of analysis in this and the following chapters is the published report of a surgical RCT. The intention was to include a sample of RCTs that reflects the current evidence base (and therefore were recently published), and reflects the procedural / interventional practices of surgeon clinicians.

2.4.2 Eligibility criteria

To be eligible, a study met the following criteria:

i) A randomised controlled trial, according to the definition of the U.S. National Institutes of Health: “A study in which participants are randomly (i.e., by chance) assigned to one of two or more treatment arms of a clinical trial.”194 A study was included if it was described as a randomised trial, even if a specific description of the randomisation method was not given. Quasi-randomised trials, such as those with alternate allocation, or allocation based on birth date, were excluded as those allocation methods are not due to chance;195

ii) Published as a full text article. Studies published as abstracts or conference proceedings were excluded, based on the rationale that this thesis focused on aspects of quality, and sufficient detail had to be presented about the methods and results of included studies;

iii) Published in the English language. The main rationale for this was a logistic one, as English was the first language of the author of this thesis. English remains by far the most common language of biomedical publication,196 and many non-English publications are often translated into English for the purpose of distribution to a wider audience.197 Furthermore, previous evaluations of the methodology of RCTs in general medicine used English language as an inclusion criterion, allowing a matched comparison with our surgical RCT sample;

iv) The primary publication of an investigation, where multiple publications from one investigation were found. The primary publication was defined as the first (earliest) publication from an investigation, or the publication where the methods of the trial were described in full;

v) Conducted on humans (not cadavers);

vi) Compared a surgical intervention to any other intervention. We defined a surgical intervention as any procedure that requires surgical training and is usually performed by a surgeon of any subspecialty recognised by the Royal Australasian College of Surgeons (Table 2.1). Obstetric/gynaecologic interventions, ophthalmic surgery (usually performed by an ophthalmologist), dental surgery (usually performed by a dental or maxillofacial surgeon), injections of any material, application of splints, and interventions for diagnostic purposes were excluded.

2.4.3 Sources of RCTs

A search on MEDLINE, EMBASE, and the Cochrane Central Register of Controlled Trials (CENTRAL) was executed. While a variety of other electronic databases exist, a previous empirical study has shown that the vast majority (over 97%) of available records for systematic reviews may be found by searching a combination of MEDLINE, EMBASE and CENTRAL.198

2.4.4 Electronic search strategy

The electronic search strategy was formulated in collaboration with two medical librarians. One was associated with a hospital library and the other associated with a Cochrane review group. The search strategy was piloted to ensure the relevance of search terms and to obtain information about the precision of the search (and the number of “hits” that had to be reviewed in order to obtain the required sample). The strategy contained two filters to identify studies. The first, a randomised trial filter, was based on the Cochrane highly sensitive search strategies for MEDLINE (Phase 1) and EMBASE,199 and was used for many years by the Cochrane Collaboration to identify trials for indexation on CENTRAL and inclusion in Cochrane systematic reviews. This filter was slightly modified by the removal of the term “drug therapy.fs” as the records retrieved from this term did not meet the eligibility criteria for this thesis. The second, a surgery filter, aimed to retrieve all studies of relevance to the surgical specialties of interest to this thesis. The two filters were combined with the Boolean operator “AND”. The syntax of the electronic search strategy is presented in Appendix 1. Since only clinical trials were indexed in CENTRAL,200 only the surgery filter was used for this database.

Table 2.1 Subspecialties recognised by the Royal Australasian College of Surgeons

Cardiothoracic Surgery
General Surgery. Includes
   a. Upper Gastrointestinal / Hepatobiliary
   b. Colorectal
   c. Transplant Surgery
Neurosurgery
Otolaryngology / Head and Neck Surgery
Paediatric Surgery
Plastic / Reconstructive Surgery
Vascular Surgery
Orthopaedic Surgery
Urology

2.4.5 Study identification method

Records identified using the search strategy were imported into Endnote Version X Reference Management Software (Thomson, NY, USA). Using software functions, duplicates were first removed, and then records were ordered according to date of publication, with the most recently published studies (in May 2009) placed first. Assessment of studies for inclusion thus proceeded in reverse chronological order until the required sample was obtained.
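
The de-duplication and ordering step can be illustrated with a minimal Python sketch. It is not the reference manager's own procedure: the record fields, the example titles, and the rule of matching duplicates on normalised title plus publication date are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical records; the field names and titles are illustrative only.
records = [
    {"id": "r1", "title": "Mesh versus suture repair", "published": date(2009, 3, 1)},
    {"id": "r2", "title": "Mesh versus suture repair ", "published": date(2009, 3, 1)},
    {"id": "r3", "title": "Stapled haemorrhoidopexy trial", "published": date(2009, 5, 1)},
    {"id": "r4", "title": "Early versus delayed cholecystectomy", "published": date(2008, 11, 1)},
]


def deduplicate(recs):
    """Keep the first record seen for each (normalised title, date) pair."""
    seen = set()
    unique = []
    for rec in recs:
        key = (rec["title"].strip().lower(), rec["published"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def screening_order(recs):
    """Most recently published first, so screening runs backwards in time."""
    return sorted(deduplicate(recs), key=lambda r: r["published"], reverse=True)


ordered_ids = [r["id"] for r in screening_order(records)]
print(ordered_ids)
```

Screening would then walk this list from the top and stop once the required sample size is reached.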

Titles and abstracts of retrieved records were assessed according to the above inclusion criteria. When a reference did not meet one of the criteria, it was excluded and a reason specified in the following order of priority: i) not a RCT, ii) no surgical intervention assessed, iii) not English language or, iv) secondary publication.

Full texts of abstracts that appeared to meet the eligibility criteria were retrieved, and eligibility was assessed using the same process as for abstracts. Full text articles that did not meet eligibility criteria were excluded, and a reason specified as above.
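
The priority rule for recording a single exclusion reason can be sketched as follows. This is an illustrative Python sketch only: the boolean field names are hypothetical, and the reason labels simply mirror the four reasons listed in the text.

```python
# Exclusion reasons in the order of priority given in the text.
# The field names on the left are hypothetical screening flags.
EXCLUSION_PRIORITY = [
    ("is_rct", "not a RCT"),
    ("surgical_intervention", "no surgical intervention assessed"),
    ("english", "not English language"),
    ("primary_publication", "secondary publication"),
]


def exclusion_reason(record):
    """Return the first failed criterion in priority order, or None if eligible."""
    for field, reason in EXCLUSION_PRIORITY:
        if not record[field]:
            return reason
    return None


# A record failing two criteria is assigned only the higher-priority reason.
example = {
    "is_rct": True,
    "surgical_intervention": True,
    "english": False,
    "primary_publication": False,
}
print(exclusion_reason(example))
```

Recording only the first failed criterion keeps the exclusion counts mutually exclusive, so they sum to the total number of excluded records.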

Study identification methods were piloted to resolve any issues with the interpretation of the eligibility criteria. Two researchers (the author of the thesis and the primary supervisor), both with training in clinical epidemiology, used a sample of 1000 records identified using the search strategy. The titles / abstracts of these records, followed by their full texts when appropriate, were reviewed according to the above methods. This pilot resulted in almost perfect agreement201 between the two reviewers (Kappa statistic = 0.85, 95% confidence interval 0.77 – 0.93). Anecdotally, there was little or no disagreement on the inclusion of individual articles (four articles were discussed and issues were quickly resolved). Following this pilot, study identification was carried out individually by the author of this thesis.
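
For readers unfamiliar with the kappa statistic, the sketch below shows how chance-corrected agreement between two reviewers on include / exclude decisions can be computed. It assumes the unweighted Cohen's kappa for two raters (the text does not name the variant used), and the counts are invented for illustration; they are not the pilot data.

```python
def cohen_kappa(both_include, only_a, only_b, both_exclude):
    """Unweighted Cohen's kappa for two raters making include/exclude decisions."""
    n = both_include + only_a + only_b + both_exclude
    observed = (both_include + both_exclude) / n   # raw agreement
    a_inc = (both_include + only_a) / n            # reviewer A inclusion rate
    b_inc = (both_include + only_b) / n            # reviewer B inclusion rate
    # Agreement expected by chance if the two reviewers decided independently.
    expected = a_inc * b_inc + (1 - a_inc) * (1 - b_inc)
    return (observed - expected) / (1 - expected)


# Hypothetical 2 x 2 agreement table for 100 records.
kappa = cohen_kappa(both_include=20, only_a=5, only_b=5, both_exclude=70)
print(round(kappa, 3))
```

In this invented example the raw agreement is 90%, but kappa is lower because a large share of that agreement would be expected by chance when most records are excluded.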

2.4.6 Data extraction (including pilot)

An electronic proforma for data extraction was created, which was piloted with the help of three researchers. First, a round table discussion took place clarifying the definition and rationale of each data item. Second, the data form was calibrated using a random sample of 15 included studies, in a two stage process (five studies, followed by ten studies). Study information (such as author and journal information) was not masked as there is inconsistent evidence that masking reduces bias related to data extraction in methodological research.202,203 Agreement between the authors on individual items was high (median kappa statistic = 0.75, range 0.63 – 1.0). Following this pilot, data extraction was carried out by the author of this thesis.

2.4.7 Items related to scientific quality

Although a plethora of criteria are available for the assessment of bias in RCTs,204 similar domains (and definitions) were chosen as in two published studies that systematically assessed a cross section of RCTs.34,186 There were several reasons for this. First, this allowed a direct comparison of surgical RCTs with general healthcare RCTs. Second, these domains had the strongest empirical basis56,70 for bias of intervention effects, and these data were used in investigations of bias in the following chapters of this thesis. Third, these domains formed the default for a risk of bias assessment in Cochrane reviews, and were the most pertinent in the current literature on this topic.57

The domains assessed were: “Randomised” in the title of the study, specification of primary outcome(s), sample size (power) calculation, method of random sequence generation, method of allocation concealment, blinding of patients, caregivers, and/or outcome assessors, handling of attrition, and source of funding. While the Cochrane Collaboration has defined the gold standard of reporting of these domains,57 we used the more liberal definitions of Chan and Altman34 in order to make meaningful comparisons between the two groups of RCTs (general healthcare and surgical). The Cochrane guideline definitions were also more stringent than those used in previous methodological research. For example, a sample size calculation was deemed to be adequate if the RCT authors stated one was done, whereas the Cochrane guidelines specifically require an effect size, an alpha and a beta value to be stated.57 Blinding was deemed to be adequate if the RCT authors described the trial as blinded, whereas the Cochrane guidelines required additional information as to whether blinding may have been effective for each outcome measured, and then a judgment on the part of the appraiser as to whether any deficiency in blinding puts that outcome at risk of bias.57 Allocation concealment was deemed to be adequate if envelopes were used to contain the allocation sequence, whereas the Cochrane guidelines require that “sealed”, “opaque” and “numbered” envelopes be specified. Detailed operational definitions of each scientific quality domain used in this thesis are contained in Table 2.2. Most domains were recorded as “adequate”, or “inadequate / unclear” when no information was found in the published report. During the protocol and data collection stages, authors of both previous assessments of general healthcare RCTs34,186 were contacted to clarify the definitions used and ensure consistency of interpretation.

2.4.8 General characteristics of trials

Information related to the authors (number, affiliation, education, country of origin), the study (trial registration, design, type, comparisons, study arms, sample size, subspecialty, number of centres), and the published journal report (type, impact factor, length in words) was collected. The World Health Organisation Clinical Trial Portal (which indexed the majority of online trial registries) was searched when trial registration details were not reported.205

Apart from providing useful information on the epidemiology of surgical RCTs, these items had a theoretical and empirical basis for their association with RCT quality and the bias of outcome estimates,77,206 which were explored in further detail in the following chapters. Detailed operational definitions of each of these data items are presented in Table 2.3.

Table 2.2 Operational definitions of scientific quality domains

“Randomised” in title of study
• Adequate: study contained the word “randomised” or any of its variants in the title of the study
• Inadequate: “randomised” not in title

Specification of primary outcome(s)
• Adequate: the primary outcome was defined explicitly in the text (using the word “primary” or any of its synonyms), was stated explicitly in an aims/hypothesis statement, or was the outcome used in a sample size calculation. Other outcomes were regarded as secondary outcomes.
• Unclear: primary outcome(s) not specified

Power calculation
• Adequate: authors stated a sample size calculation was performed to determine included study numbers
• Unclear: sample size calculation not reported

Generation of random sequence
• Adequate: reported method that is completely unpredictable, such as computer random number generation, random number tables, coin flip, or lottery. If adequate, the method will be specified
• Unclear: method of random sequence generation not reported

Concealment of treatment allocation
• Adequate: reported method where researchers and trial participants are prevented from knowing which study arm they have been allocated in advance, such as a separate central allocation service, coded containers, or envelopes. If adequate, the method will be specified. Envelopes were regarded as adequate without specification as “sealed”, “opaque” and “numbered”, although this information was also collected
• Unclear: method of allocation concealment not reported

Blinding
• Adequate: patients, caregivers, and/or outcome assessors were unaware of intervention arms, or the study was described as “blinded”, or the term “placebo” is used to describe a control, or sham surgery is used. The blinded party will be specified as patients, caregivers and/or outcome assessors
• Inadequate: none of the above reported or the trial was described as non-blinded or open label

Handling of attrition
• Adequate: follow up rates are reported for intervention and control arms, and the “intention to treat principle” was used as the primary method of analysis. A description of analysis according to randomised group, or a flow diagram representing analysed groups, was considered adequate for “intention to treat”
• Follow up: follow up is reported in intervention and control groups, but analysis according to intention to treat not reported
• Inadequate: none of the above reported

Source of funding
• Full industry: the only source(s) of funding is/are stated as an industry (for-profit) source
• Part industry: one source of funding or one section of the trial supported by an industry (for-profit) source
• Non-industry: the only source(s) of funding is/are stated as a not-for-profit source (such as government grants, charitable trusts, or scholarships)
• No external: where no external source of funding is declared (i.e. it is declared the trial was internally funded by the authors’ institutions / department)
• Unclear: the funding source is not declared
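
To make the grading rules concrete, the attrition and funding definitions of Table 2.2 can be sketched as two small Python functions. This is a hypothetical encoding rather than the extraction proforma itself: the argument names are invented, a trial reporting intention to treat without follow up by arm is assumed to fall under "Inadequate", and "Part industry" is simplified to mean a mix of for-profit and not-for-profit support.

```python
def grade_attrition(follow_up_by_arm, intention_to_treat):
    """Grade handling of attrition following the three levels in Table 2.2."""
    if follow_up_by_arm and intention_to_treat:
        return "Adequate"
    if follow_up_by_arm:
        return "Follow up"
    # Assumption: anything else, including ITT without follow up, is inadequate.
    return "Inadequate"


def grade_funding(declared, industry, not_for_profit):
    """Grade source of funding; 'Part industry' is simplified to mixed support."""
    if not declared:
        return "Unclear"
    if industry and not_for_profit:
        return "Part industry"
    if industry:
        return "Full industry"
    if not_for_profit:
        return "Non-industry"
    # Funding statement present, but no external source named.
    return "No external"


print(grade_attrition(follow_up_by_arm=True, intention_to_treat=False))
print(grade_funding(declared=True, industry=True, not_for_profit=True))
```

Writing the rules as functions makes the order of the checks explicit, which is where two extractors are most likely to diverge.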

2.4.9 Checking of data

Since a single researcher collected the data, a computer generated random sample of 100 included RCTs was checked by a second researcher for accuracy. Any disagreements were discussed and resolved, and a third researcher acted as arbitrator if necessary. This resulted in 16 items changed in 15 different RCTs, or 1.5% of the total data points that were checked.
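
A computer generated checking sample of this kind can be sketched in Python as below. The seed, the trial identifiers, and the function names are assumptions for illustration. The number of data points checked is not stated in this section, so the denominator used in the example is back-calculated from the reported 16 changes and 1.5% and should be read as a placeholder.

```python
import random


def checking_sample(trial_ids, k=100, seed=2009):
    """Draw k distinct trials for second-reviewer checking (the seed is arbitrary)."""
    rng = random.Random(seed)
    return sorted(rng.sample(trial_ids, k))


def change_rate(items_changed, data_points_checked):
    """Proportion of checked data points that were changed after review."""
    return items_changed / data_points_checked


trial_ids = list(range(1, 401))  # 400 included trials, numbered 1 to 400
sample = checking_sample(trial_ids)

# Placeholder denominator: 16 / 0.015 is roughly 1067 data points.
rate_percent = round(100 * change_rate(16, 1067), 1)
print(len(sample), rate_percent)
```

Fixing the seed makes the sample reproducible, so the same 100 trials can be identified again if the check needs to be audited.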

2.4.10 Author survey

The name and email address of the corresponding author were extracted for each included trial, and an online, email-initiated author survey was designed using a widely recognised website called “Survey Monkey”.207 A personalised invitation email was sent to each corresponding author, addressing them by name and title, and included information about the investigators and the reason for the research (Appendix 2). This email contained a hyperlink to the Survey Monkey website, and each author was able to complete a uniquely identified survey, which allowed their answers to be compared with the information provided in their published report. If authors did not respond within 7 days, we sent a reminder email (Appendices 3 to 5). We repeated this twice (a total of three reminders), and the survey was open for a total of four weeks following this schedule.

Table 2.3 Operational definitions of general characteristics of surgical RCTs

AUTHOR CHARACTERISTICS

Number of authors: continuous variable recorded as an integer.
Stated affiliation of the first author: categorical variable recorded as i) department of surgery (any specialty recognised by the Royal Australasian College of Surgeons) ii) department of epidemiology / statistics / public health OR clinical trials unit OR Cochrane collaboration affiliate iii) department of medicine iv) other department.
Author background: citation of epidemiology, biostatistics, public health or trials unit background by any one of the authors. Binary variable recorded as yes / no.
Country of origin of first author: categorical variable dichotomised into i) research country, defined as USA, Canada, Australia, New Zealand, Japan, Israel, and Western Europe77 and ii) other country.

STUDY CHARACTERISTICS

Trial registration: categorical variable recorded as i) trial registered and registration details provided in text ii) trial registered but not mentioned in text, and found by performing a search of the World Health Organisation Clinical Trial Portal iii) not stated in text and not found online. The trial registry name was also recorded.
Study design186: categorical variable recorded as i) parallel groups ii) split body iii) crossover iv) factorial.
Study type34: categorical variable recorded as i) efficacy / superiority ii) equivalence / non-inferiority. This was determined by explicit definition by authors, from the stated aim / hypothesis, or from the sample size calculation.
Type of comparison: categorical variable recorded as i) surgical intervention vs. surgical intervention ii) surgical intervention vs. non-surgical intervention. The primary study aim statement or sample size calculation was used to determine the main comparison.
Study arms: number of intervention / control arms in the trial. Continuous variable recorded as an integer.
Sample size: total number of patients examined in the study. Continuous variable.
Subspecialty of the intervention: categorical variable recorded as one of the nine specialties in Table 2.1. If the intervention was associated with two subspecialties, then the affiliations of the authors were used to determine the subspecialty.
Number of centres: multicentre trial status (dichotomous variable defined as a trial conducted in two or more separate centres) was recorded. For multicentre trials, the number of centres the trial was conducted in was recorded as a continuous integer variable.

JOURNAL AND REPORT CHARACTERISTICS

Journal name and type: categorised into i) general surgical journal ii) subspecialty surgical journal iii) general medical journal iv) subspecialty medical journal. A “general” journal was regarded as one publishing from multiple non-overlapping specialties.
Journal impact factor: recorded as a continuous variable. The Thomson ISI Journal Citation Reports (JCR) was used to reflect the impact of that journal, with the JCR edition prior to the year of publication used to reflect the time lag from submission to publication of the trial.
Article length in words: number of words in the published journal report, excluding abstracts, tables and figures.

A copy of the complete survey is attached in Appendix 6. The questions were a combination of single answer multiple choice, multiple answer multiple choice, and free text when necessary. The main components of the survey were enquiries related to scientific quality domains (randomisation sequence, allocation concealment, blinding of patients/carers/outcomes assessors, source of funding), the registration of the trial, and any unreported outcomes (these data were used in the following chapters of this thesis). In order to prevent ambiguity regarding the definitions of each scientific quality domain, a short description of that domain was provided, but no indication was given as to what was considered adequate. A number of different choices (different trial methods used for that domain) were presented, and these were randomised to eliminate any perceived order of preference.

2.4.11 Data analyses

i) Characteristics of surgical trials: a descriptive analysis with summary statistics (proportions, means, medians, standard deviations and ranges) was performed for each scientific quality domain and general characteristic.

ii) Comparison of surgical trials and general healthcare trials: the proportion of surgical trials that adequately reported each methodological characteristic was compared to the proportion adequately reported in general healthcare trials published in 2006186 (the primary comparison) and 2000.34 A risk ratio (RR) was calculated along with its 95% confidence interval (CI), where an RR > 1 favoured adequate methodology in surgical RCTs.

iii) Reported versus conducted methodology: survey response rate was calculated as the proportion of invitees who responded. A descriptive analysis of methodological domains (adequate vs. inadequate) was performed using author survey responses, and a comparison was made to the information provided in the published report. These data were presented in 2x2 tables for each methodological domain, which quantified the discordance between information provided in the published report and that obtained from the author survey.
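The risk ratio comparison described in item ii) can be sketched in a few lines of Python. The thesis does not state how the confidence interval was computed; the sketch below assumes the common Wald interval on the log scale, and the function name `risk_ratio` is hypothetical. The counts used (261/400 surgical RCTs and 232/519 general medical RCTs specifying a primary outcome) are those shown in Figure 2.2.

```python
import math


def risk_ratio(events_a, total_a, events_b, total_b, z=1.959964):
    """Risk ratio of group A relative to group B with a log-scale Wald CI.

    Assumption: this CI method is not confirmed by the thesis; it is a
    standard textbook approach used here for illustration.
    """
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Primary outcome specified: surgical RCTs vs. the year-2000 PubMed sample
rr, lower, upper = risk_ratio(261, 400, 232, 519)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
# prints: RR = 1.46 (95% CI 1.30 to 1.64)
```

Under this assumption the sketch gives the same value as reported for the primary outcome domain, where an RR above 1 favours surgical RCTs.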

2.4.12 Sample size calculation

The main measures of interest in this investigation are dichotomous quality measures. To determine sample size, allocation concealment (which has theoretical and empirical importance as a quality indicator of randomised trials) was used. Hopewell and colleagues186 demonstrated that 25% of a sample of trials indexed on PubMed adequately reported allocation concealment. In order to detect a difference from Hopewell’s sample of trials if 33% of surgical trials adequately reported allocation concealment (compared with 25%), with an alpha of 0.05 and a power of 0.80 (beta of 0.20), 397 trials were required.

2.5 Results

2.5.1 Results of search

The search strategy was executed in May 2009 on MEDLINE via Ovid (Week 3, May 2009), EMBASE via Ovid (Week 21, 2009) and CENTRAL via Wiley InterScience (Issue 2, 2009). A total of 12,674 records were assessed based on title and/or abstract, and 11,659 were excluded. A total of 1,015 records were assessed based on their full text. Of these, 615 were excluded, leaving the required sample of 400 RCTs (Figure 2.1). Twenty-nine records were excluded (based on their full text assessment) because they did not meet the criteria for RCTs, although they were described as “randomised” in their title and/or abstract. A complete reference list of included RCTs is presented in Appendix 7.
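The record counts in this section can be checked against one another with simple arithmetic. The following Python sketch is only a consistency check; the counts are transcribed from Figure 2.1, and the dictionary keys and variable names are labels chosen for this example.

```python
# Counts transcribed from Figure 2.1 (labels are for illustration only)
hits = {"MEDLINE": 8214, "EMBASE": 6269, "CENTRAL": 1613}
duplicates = 3422

excluded_on_abstract = {
    "not randomised trial": 8276,
    "not surgical intervention": 3247,
    "not English language": 112,
    "not full text": 24,
}

excluded_on_full_text = {
    "not randomised trial": 351,  # includes 29 described as "randomised"
    "not surgical intervention": 125,
    "secondary publication": 64,
    "not English language": 24,
    "duplicate publication": 11,
    "not full text": 11,
    "prior to cut off date": 29,
}

combined_hits = sum(hits.values())
abstracts_assessed = combined_hits - duplicates
full_text_reviewed = abstracts_assessed - sum(excluded_on_abstract.values())
included = full_text_reviewed - sum(excluded_on_full_text.values())

print(combined_hits, abstracts_assessed, full_text_reviewed, included)
# prints: 16096 12674 1015 400
```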

2.5.2 Epidemiology of published surgical RCTs

The included surgical trials spanned a period of eight months from August 2008 to May 2009, and were published in 197 unique journals. Four journals published 10 or more RCTs during that period: Annals of Surgery (n=11), Journal of Bone and Joint Surgery (British Volume) (n=10), Journal of Endourology (n=10) and Journal of Thoracic and Cardiovascular Surgery (n=10). Most trials were published in subspecialty journals (n=338, 85%) as opposed to general journals that published articles from any subspecialty. The most common subspecialties represented were general surgery (including upper/lower gastrointestinal and transplant surgery) (n=108, 27%), orthopaedics (n=88, 22%), cardiothoracic (n=58, 15%), and otolaryngology/head and neck surgery (n=46, 12%). The median impact factor of the journals was 2.3, but ranged from 0.11 to 50, and the mean length of each publication was 2720 words.

RCT first authors were mostly affiliated with a surgical department (n=316, 79%), while only 2% of first authors were affiliated with a clinical trial unit, epidemiology or statistics department. However, 75 RCTs (19%) had at least one author with a stated background or affiliation in epidemiology or statistics. Authors were most commonly affiliated with a department based in the United States (n=52, 13%), Italy (n=39, 10%), United Kingdom (n=31, 8%), Germany (n=28, 7%), China (n=27, 7%), Netherlands (n=25, 6%) and Turkey (n=24, 6%).

Figure 2.1 Flow diagram depicting search and inclusion of eligible surgical RCTs

MEDLINE via Ovid (Week 3, May 2009): 8,214 hits
EMBASE via Ovid (Week 21, 2009): 6,269 hits
CENTRAL via Ovid (Issue 2, 2009): 1,613 hits

Combined hits = 16,096
    Excluded: Duplicates = 3,422

Abstracts assessed = 12,674
    Excluded: Not randomised trial = 8,276
    Excluded: Not surgical intervention = 3,247
    Excluded: Not English language = 112
    Excluded: Not full text = 24

Full text articles reviewed = 1,015
    Excluded: Not randomised trial = 351 (of which described as "randomised" = 29)
    Excluded: Not surgical intervention = 125
    Excluded: Secondary publication = 64
    Excluded: Not English language = 24
    Excluded: Duplicate publications = 11
    Excluded: Not full text = 11
    Excluded: Prior to cut off date = 29

Randomised trials included = 400

Only 15% of trials reported their registration details. A further 10% were found by performing an online search of the World Health Organisation Clinical Trial Portal. No record of registration was found for 75% of trials.

The vast majority of trials had a parallel group design (n=380, 95%) and only 4% of trials had a stated aim of non-inferiority or equivalence, with the majority being superiority or efficacy trials. Few trials compared a surgical intervention to a non-surgical one (n=40, 10%), with most trials comparing surgery to surgery. Most had two intervention arms (n=368, 92%), and had a relatively small median total sample size of 76 participants. Twenty-one percent of trials were multicentre, and these were conducted in a median of four (range 2 to 96) centres.

A detailed description of the general characteristics of surgical trials is presented in Table 2.4.

Table 2.4 Epidemiology of surgical RCTs*

Characteristic                                   Total N=400
Number of authors, mean (SD)                     6.3 (3.1)
Stated affiliation of first author, n (%)
    Surgical department                          316 (79)
    Medical department                           70 (18)
    Research department                          6 (2)
    Other                                        8 (2)
Author with epi/stats degree, n (%)              75 (19)
Country of affiliation,^ n (%)
    USA                                          52 (13)
    Italy                                        39 (10)
    UK                                           31 (8)
    Germany                                      28 (7)
    China                                        27 (7)
    Netherlands                                  25 (6)
    Turkey                                       24 (6)
    India                                        14 (3)
    Canada                                       13 (3)
    Iran                                         12 (3)
Research country,# n (%)                         261 (65)
Trial registration, n (%)
    Reported in text                             61 (15)
    Found online                                 41 (10)
    Unclear                                      298 (75)
Registry,** n (%)
    Clinicaltrials.gov                           66 (65)
    ISRCTN                                       26 (25)
    ANZ-CTR                                      4 (4)
Study design, n (%)
    Parallel                                     380 (95)
    Split body                                   18 (4.5)
    Crossover                                    2 (0.5)
Study type, n (%)
    Superiority / efficacy                       385 (96)
    Non-inferiority / equivalence                15 (4)
Type of comparison, n (%)
    Surgery vs. surgery                          360 (90)
    Surgery vs. non-surgery                      40 (10)
Study arms, n (%)
    Two                                          368 (92)
    Three                                        20 (5)
    Four                                         8 (2)
    More than four                               4 (1)
Total sample size, median (Range, IQR)           76 (5 – 2352, 93)
Subspecialty, n (%)
    General (incl. upper/lower gi)               108 (27)
    Orthopaedic                                  88 (22)
    Cardiothoracic                               58 (15)
    ENT/head and neck                            46 (12)
    Urology                                      33 (8)
    Vascular                                     30 (8)
    Neurosurgery                                 19 (5)
    Plastic and reconstructive                   11 (3)
    Paediatric                                   7 (2)
Multicentre, n (%)                               85 (21)
Centres, if multicentre, median (Range)          4 (2 – 96)
Journal name,^^ n (%)
    Ann Surg                                     11 (3)
    J Bone Joint Surg Br                         10 (2)
    J Endourol                                   10 (2)
    J Thorac Cardiovasc Surg                     10 (2)
    Br J Surg                                    9 (2)
    J Bone Joint Surg Am                         8 (2)
    J Vasc Surg                                  8 (2)
    Eur J Vasc Endovasc Surg                     7 (2)
    J Urol                                       7 (2)
    Surg Endosc                                  7 (2)
Type of journal, n (%)
    General surgery                              40 (10)
    General medicine                             22 (6)
    Subspecialty surgery                         250 (63)
    Subspecialty medicine                        88 (22)
Journal Impact factor, median (IQR)              2.3 (1.9)
Length of article in words, mean (SD)            2720 (894)

* Items are reported as frequency (proportion) for categorical variables, mean (standard deviation) or median (range / interquartile range). SD = standard deviation. IQR = interquartile range.
^ Only the top ten countries presented for brevity.
# USA, Canada, Australia, New Zealand, Japan, Israel, and Western Europe.
** Only top three registries presented for brevity.
^^ Only top ten journals presented for brevity. Index medicus abbreviations used.

2.5.3 Scientific quality of surgical RCTs

Most scientific quality domains were poorly reported (Table 2.5). Approximately one third of trials (35%) did not specify a primary outcome, and more than half (58%) did not report a sample size calculation. Less than half of surgical trials described an adequate method to generate a randomisation sequence, with the most common methods being computer generation (n=111, 66%) or random number tables (n=29, 17%). Less than half of trials (43%) described an adequate method of allocation concealment, most commonly using envelopes. While envelopes were usually described as “sealed” (80%), only 23% of these trials specified using opaque envelopes, and only 29% specified them as being numbered or consecutive. Blinding was reported in 140 (35%) trials, most commonly of the outcome assessor (n=123, 31%). Few trials specified the blinding of patients/participants (n=67, 17%), or their caregivers (n=32, 8%). Only one quarter (n=107, 27%) of trials specified the intention to treat principle as the primary method of analysis, as well as providing follow up rates for intervention groups. Most trials only provided follow up rates (n=256, 64%), while a tenth of trials did not report on follow up rates at all. Most trials (n=235, 59%) did not report their source of funding. Of those that did report funding, trials were supported partially by industry (n=64, 39%), supported by non-industry sources (n=55, 33%), or relied on internal funding (n=36, 22%); only ten trials (6%) reported full funding from a commercial source.

Table 2.5 Reporting of scientific quality domains in surgical RCTs

Quality characteristic                           n (%) (Total N=400)
“Randomised” in title                            222 (56)
Specification of primary outcome
    Adequate                                     261 (65)
    Unclear                                      139 (35)
Sample size calculation
    Adequate                                     167 (42)
    Unclear                                      233 (58)
Generation of random sequence
    Adequate                                     168 (42)
        Computer generated                       111 (66)
        Random number table                      29 (17)
        Cards                                    11 (7)
        Lots / lottery                           10 (6)
        Coin flip                                4 (2)
        Other                                    3 (2)
    Unclear                                      232 (58)
Allocation concealment
    Adequate                                     173 (43)
        Envelopes                                128 (74)
            “Sealed”                             102 (80)
            “Opaque”                             30 (23)
            “Numbered”                           37 (29)
        Central / third party allocation         45 (26)
    Unclear                                      227 (57)
Blinding
    Any blinding                                 140 (35)
        Patient                                  67 (17)
        Carer                                    32 (8)
        Outcome assessor                         123 (31)
        Primary outcome blinded                  106 (27)
    Unclear                                      260 (65)
Handling of attrition
    Intention to treat                           107 (27)
    Follow up only                               256 (64)
    Inadequate                                   37 (9)
Source of funding
    Reported                                     165 (41)
        Full industry                            10 (6)
        Partial industry                         64 (39)
        Non-industry                             55 (33)
        No external source                       36 (22)
    Unreported                                   235 (59)

2.5.4 Comparison with general medical RCTs

Forest plots were used to compare the scientific quality domain reporting in surgical trials versus samples of trials indexed on PubMed in the years 2000 and 2006 (Figures 2.2 and 2.3). Compared to a PubMed sample from the year 2000,34 surgical trials were 46% more likely to specify a primary outcome (RR=1.46, 95% CI 1.30 – 1.64), 53% more likely to perform a sample size calculation (RR=1.53, 95% CI 1.27 – 1.83), twice as likely to specify an adequate method to generate a randomisation sequence (RR=2.00, 95% CI 1.63 – 2.45), and more than twice as likely to conceal allocation (RR=2.39, 95% CI 1.93 – 2.96). However, surgical trials were 41% less likely to report any form of blinding (RR=0.59, 95% CI 0.51 – 0.68) and 54% less likely to report their source of funding (RR=0.46, 95% CI 0.41 – 0.52).

Figure 2.2 Forest plot depicting comparison of adequate scientific quality domains for surgical RCTs vs. general medical RCTs from the year 2000. A higher risk ratio favours surgical RCTs

Domain                     RR (95% CI)          Events, surgical RCTs    Events, general medical RCTs
Primary outcome            1.46 (1.30, 1.64)    261/400                  232/519
Sample size calculation    1.53 (1.27, 1.83)    167/400                  142/519
Randomisation sequence     2.00 (1.63, 2.45)    168/400                  109/519
Allocation concealment     2.39 (1.93, 2.96)    173/400                  94/519
Blinding                   0.59 (0.51, 0.68)    140/400                  309/519
Source of funding          0.46 (0.41, 0.52)    165/400                  466/519

Axis: risk ratio (ticks at 0.5, 1 and 2); left of 1 favours general medical RCTs, right of 1 favours surgical RCTs.

Figure 2.3 Forest plot depicting comparison of adequate scientific quality domains for surgical RCTs vs. general medical RCTs from the year 2006. A higher risk ratio favours surgical RCTs

Domain                     RR (95% CI)          Events, surgical RCTs    Events, general medical RCTs
Randomisation in title     1.67 (1.45, 1.92)    222/400                  205/616
Primary outcome            1.24 (1.12, 1.38)    261/400                  324/616
Sample size calculation    0.92 (0.80, 1.07)    167/400                  279/616
Randomisation sequence     1.24 (1.06, 1.45)    168/400                  209/616
Allocation concealment     1.71 (1.43, 2.04)    173/400                  156/616
Blinding                   0.60 (0.51, 0.69)    140/400                  362/616
Source of funding          0.67 (0.59, 0.76)    165/400                  380/616

Axis: risk ratio (ticks at 0.5, 1 and 2); left of 1 favours general medical RCTs, right of 1 favours surgical RCTs.

The reporting of trials indexed on PubMed somewhat improved by 2006,186 yet surgical trials still compared favourably to that sample. Surgical trials were 67% more likely to use the term “randomisation” in the title (RR=1.67, 95% CI 1.45 – 1.92), 24% more likely to describe their primary outcome (RR=1.24, 95% CI 1.12 – 1.38), 24% more likely to report an adequate method of random sequence generation (RR=1.24, 95% CI 1.06 – 1.45), and 71% more likely to report an adequate method of allocation concealment (RR=1.71, 95% CI 1.43 – 2.04). There was no difference in the reporting of a sample size calculation (RR=0.92, 95% CI 0.80 – 1.07), but blinding (RR=0.60, 95% CI 0.51 – 0.69) and source of funding (RR=0.67, 95% CI 0.59 – 0.76) were less likely to be reported in surgical RCTs.

2.5.5 Survey responses

A total of 345 RCT authors had a valid email available (either listed in the RCT report or found after an online search), and were sent invitations and reminders to complete the online survey. Of these, 124 (36%) completed the survey. A flow diagram illustrates how survey responses were obtained (Figure 2.4).

Most respondents (n=113, 91%) stated that an adequate method of random sequence generation was used (Table 2.6), although 15 (13%) of these disclosed quasi-random methods. Such quasi-random methods were classified as inadequate methods in the current study (Table 2.7). Most respondents also stated they used an adequate method of allocation concealment (n=98, 79%), and any form of blinding (n=82, 66%), but few answered that any industry source of funding was obtained (n=10, 8%).

Most authors (n=86, 69%) did not disclose any trial registration details.

Figure 2.4 Flow diagram depicting author survey invitations and responses received

Total randomised trial sample = 400
    Excluded: Invalid email / email unavailable = 55

Invitation surveys sent, N = 345
    Excluded: No response after three reminders = 221

Survey responses, n = 124 (36%)

Table 2.6 Survey responses related to scientific quality

Response variable                                        Total N=124
Generation of random sequence, n (%)
    “Yes”                                                113 (91)
        Computer generated                               54 (48)
        Random number table                              23 (20)
        Cards                                            9 (8)
        Lots / lottery                                   9 (8)
        Coin flip                                        2 (2)
        Dice roll                                        1 (1)
        “Yes” but quasi-random method specified          15 (13)
            Alternate allocation                         7 (47)
            Based on patient record number               4 (27)
            Based on admission day                       3 (20)
            Availability of equipment                    1 (7)
    “No”                                                 11 (9)
Allocation concealment, n (%)
    “Yes”                                                98 (79)
        Sealed envelopes                                 48 (49)
        Internet / web based                             18 (18)
        List kept with third party                       17 (17)
        Telephone system                                 9 (9)
        Sealed containers                                5 (5)
    “No”                                                 26 (21)
Blinding, n (%)
    Patient                                              60 (48)
    Carer                                                20 (16)
    Outcome assessor                                     46 (37)
    Any blinding                                         82 (66)
    Unclear / no blinding                                42 (34)
Source of funding, n (%)
    Full industry                                        1 (1)
    Partial industry                                     9 (7)
    Non-industry                                         15 (12)
    No external source                                   99 (80)
Trial registration, n (%)
    “Yes”                                                38 (31)
        Clinicaltrials.gov                               23 (61)
        ISRCTN                                           9 (24)
        ANZ-CTR                                          1 (3)
        Other registry                                   5 (13)
    “No”                                                 86 (69)

2.5.6 Discordance between survey responses and published report

Discordance existed between information provided in the published RCT report and that provided by the author survey (Tables 2.7 – 2.10). There was a trend for authors to claim adequate methods in the survey when the published report was unclear or did not report that domain. When unclear in the published report, 68% of respondents stated the use of an adequate method for random sequence generation (Table 2.7), 77% for allocation concealment (Table 2.8), and 53% for any form of blinding (Table 2.9). Most respondents confirmed an adequate method for these domains when they were reported in the journal article, although a small number actually stated they did not use an adequate method despite the trial report stating so.

When asked about source of funding, 65% of respondents whose trial report had clear evidence of industry support stated they did not obtain any source of industry support (Table 2.10). When the trial report was unclear regarding the source of funding, the majority (97%) confirmed no industry support.
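The percentages quoted for trials with unclear published reports are row percentages of the 2x2 tables. As a minimal sketch, the Python snippet below recomputes them from the counts in Tables 2.7 to 2.9; the dictionary layout and the function name `percent_claiming_adequate` are assumptions made for illustration.

```python
# (survey "No", survey "Yes") counts for trials whose published report was
# unclear, transcribed from the "Unclear" rows of Tables 2.7 to 2.9
unclear_reports = {
    "Randomisation sequence": (20, 42),
    "Allocation concealment": (15, 49),
    "Any blinding": (34, 39),
}


def percent_claiming_adequate(no, yes):
    """Share of respondents stating an adequate method, as a percentage."""
    return 100 * yes / (no + yes)


for domain, (no, yes) in unclear_reports.items():
    print(f"{domain}: {percent_claiming_adequate(no, yes):.0f}%")
```

Row percentages are used because the question of interest is conditional on the published report being unclear.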

Table 2.7 Discordance in trial report and author survey response: Randomisation sequence

Trial report     Survey response: No    Survey response: Yes    Total
Unclear          20 (32%)               42 (68%)                62
Reported         6 (10%)                56 (90%)                62
Total            26                     98                      124

Table 2.8 Discordance in trial report and author survey response: Allocation concealment

Trial report     Survey response: No    Survey response: Yes    Total
Unclear          15 (23%)               49 (77%)                64
Reported         7 (12%)                53 (88%)                60
Total            22                     102                     124

Table 2.9 Discordance in trial report and author survey response: Any blinding

Trial report     Survey response: No    Survey response: Yes    Total
Unclear          34 (47%)               39 (53%)                73
Reported         8 (16%)                43 (84%)                51
Total            42                     82                      124

Table 2.10 Discordance in trial report and author survey response: Funding

Trial report       Survey response: Industry funded    Survey response: Not industry funded    Total
Industry funded    8 (35%)                             15 (65%)                                23
Non industry       0                                   43 (100%)                               43
Unclear            2 (3%)                              56 (97%)                                58
Total              10                                  114                                     124

2.6 Discussion

Few studies have assessed the general characteristics and quality of surgical randomised trials. Given that randomised trials are generally regarded as the most scientifically valid and reliable source of evidence for the efficacy (and safety) of interventions,25 the data presented here shed light on the interpretation of surgical evidence, and inform the further exploration of the methodology of surgical trials found in the following chapters of this thesis.

Concerning gaps remain in the reporting of important domains related to the internal validity of randomised trials, but most of these domains compared favourably to a similar assessment of randomised trials in general medicine.

A survey of authors revealed discrepancies between survey answers regarding study methodology and published reports, with most respondents describing an adequate scientific method when their trial report was unclear.

Most trials were published in subspecialty journals with a moderate impact, most had no available registration details, and few had authors with reported epidemiology or statistical expertise. Median total sample size was relatively small (when compared to non-surgical trials),186 and most trials had a superiority and parallel group design in which one form of surgery was compared to another.

Compared to the Cochrane Collaboration guidelines (which are often regarded as the gold standard for methodological assessment),208 less stringent definitions were used to define an adequate scientific domain in this study. Despite this, information related to most domains was often not provided. 35% of trials did not specify a primary outcome, and 58% of trials did not report a sample size calculation. This has a number of implications. First, the aims and hypotheses of the trial are difficult to elucidate by the reader, and therefore placing the trial in the correct context is more difficult. Second, the lack of a target sample size may result in an increased incidence of false negative results (Type 2 error), or (of greater concern) an excess number of patients recruited into a trial with interventions of unproven efficacy and/or safety.209 Third, without a priori specified primary and secondary outcomes, selective reporting of outcomes may be more prevalent.88,210 While most trials were described as “randomised” in the title (allowing them to be readily identified as such), the method of random sequence generation was adequately described in only 42% of studies. Of equal concern is the number of trials described as randomised but where quasi-random methods were used to determine patient allocation.

During the study identification stage, it was found that 29 trials that were described as “randomised” in either the title or the abstract did not use adequate methods to determine randomisation. Alternate allocation, birth date or patient record number were often erroneously described as “random” methods, although they are essentially predictable.47 This was confirmed in the survey of authors, where 13% of respondents who reported a random method actually used quasi-random methods (Table 2.6). It is unclear to what extent this issue applied to RCTs where the method of sequence generation was not described and where authors did not respond to the survey, but it is likely to be a common occurrence.

The integrity of the randomisation sequence is preserved by its concealment prior to the allocation of patients. However, concealment also had a high rate (57%) of unclear reporting. Given that adequate randomisation and concealment are essential safeguards against selection and confounding bias,52 and that empirical evidence exists for the exaggeration of outcome estimates when these are not adequately reported,39,53 users of the research have a right and responsibility to know how a trial was performed, and authors and journals should insist that these are described with sufficient detail in order for a judgment to be made.

Blinding was uncommon in surgical RCTs, with only 35% of studies reporting any form of blinding. Since patients and caregivers are difficult to blind in the context of a trial comparing different surgeries, and more difficult if not impossible in a trial comparing a surgical to a non-surgical intervention,62 the most commonly reported blinded party was an outcome assessor. Without adequate blinding, trials are subject to performance and observer bias,211 and due to the very nature of surgical interventions, trials of surgery may be particularly susceptible to these forms of bias.212

Only one quarter of trials specified the intention to treat principle as the method of primary data analysis, the main purpose of which is to preserve the integrity of the randomisation of participants to intervention groups,74 as the reasons for participants not receiving their randomly allocated intervention may confound RCT results.213

Most trials also did not report their source of funding. The effect of commercial interest on trial results is inconsistent. Well funded trials may be more appropriately designed and of higher overall quality, but may be subject to biased (pro-industry) interpretation of results.95

The reporting of scientific quality domains in this surgical RCT sample is similar to that found in previous assessments of surgical trials. Jacquier and colleagues examined a sample of 158 articles, and using the CLEAR-NPT criteria, found poor rates of reporting for sequence generation (41%), allocation concealment (25%), blinding (17%) and intention to treat (36%). Other studies that have been journal specific36 or specialty specific206 found similar rates of poor reporting.

Broadly, surgical research has been criticised in the past as a “comic opera” for its reliance on lower quality study types, such as retrospective reviews and case series.119 Much has been written on the challenges facing the conduct of RCTs in surgery. Surgical interventions are relatively complex and multilayered, are subject to a lack of equipoise, and often take place in an emergency setting.135 It is therefore not surprising that there is often a delay of many years before a widely used surgical intervention is subjected to an RCT.10 The IDEAL collaboration (and its recommendations) was established to improve on the status quo,189 and to provide guidelines for how the evidence on any surgical intervention can evolve from the most basic research to more stringent study designs such as RCTs. For example, the early assessment of procedures may take the form of structured case reports and prospective development studies. Where RCTs may not be feasible due to rare clinical scenarios or strong patient or surgeon preferences, controlled interrupted time series studies or expertise based trials may be more appropriate.189

It is partly due to the known challenges of conducting RCTs of surgical interventions that the quality of surgical trials is thought to lag behind trials of other interventions. Boutron and colleagues explored this issue in one clinical trial subject area, osteoarthritis, and found that pharmacologic intervention trials had higher quality scores than non-pharmacologic trials.184 The same authors also conducted a similar study concluding that outcome blinding was often more difficult if not impossible for non-pharmacologic trials, when compared to pharmacologic trials in the same disease area.62

In order to allow a broader comparison of surgical and non-surgical trials, we intentionally adopted similar quality reporting definitions to two prior broad assessments of general medical trials indexed in PubMed.34,186 Both reviews had the same authors and originated in the same department at Oxford University, and therefore adopted very similar protocols. Of note, the electronic search strategy, inclusion criteria, methods of the search for studies, and definitions of trial characteristics were identical, allowing a direct comparison of trial quality between the time periods,34,214 and an examination of the effect of the CONSORT guidelines on trial reporting.215 The methods used in this chapter had many similarities, but also some important differences. First, while the same RCT search filter was used, an additional “surgical intervention” filter was added (Appendix 1, page A-1). Second, for logistic purposes, three researchers piloted the search for studies and data extraction until a high level of agreement was reached. A single researcher then extracted the rest of the data. This was similar to the methods used by Hopewell and colleagues,214 while Chan and Altman used a single reviewer to assess studies several months apart.34 Most importantly, this chapter adopted the same liberal definitions for adequate trial quality as the two previous studies, and these were clarified by direct communication with the author (personal email communication with Sally Hopewell, 17th September, 2012).

Surgical RCTs were more likely to adequately report a specified primary outcome, the generation of a randomisation sequence, and allocation concealment. There are several explanations for why surgical RCTs were more adequately reported for some domains. First, there was a time lag between the different trial samples: the surgical trials in this sample were from August 2008 to May 2009, compared with the trials of Chan and Altman from December 2000, and Hopewell et al from December 2006. Hopewell and colleagues demonstrated that the quality of trials has improved over time. When compared to Chan’s sample six years earlier, there was a 62% improvement in the description of random sequence generation, a 40% improvement in allocation concealment, and an 18% improvement in primary outcome specification.214 However, these improvements are not sufficient to explain the differences between this sample of surgical trials and Hopewell and colleagues’ sample from two years earlier. We demonstrated a 24% improvement in the description of the randomisation sequence, a 71% improvement in allocation concealment, and a 24% improvement in primary outcome specification. Second, it is possible that (given surgical RCTs are more challenging to perform), authors of surgical RCTs are more aware of the importance of methodological safeguards. It was noted that approximately one in five surgical trials had an author with an epidemiology and / or statistics background, and this figure is likely to be under-reported.

Conversely, the reporting of any form of blinding was 40% less likely in surgical trials. While outcome assessors can always be blinded, participants and carers are often more difficult to blind in surgical trials in the absence of a sham intervention, which has both ethical and logistic problems.216

The source(s) of support were also less likely to be reported in surgical RCTs. There is consistent evidence that a trial supported by industry is likely to have pro-industry conclusions,96 but the bulk of this evidence relates to pharmaceutical industry sponsorship of drug trials. More recently, after several high profile cases, attention has been directed at author conflicts of interest with medical device companies.104 While authors are usually obliged by journals to report sources of trial financial support, their relationships with industry, and the receipt of non-financial sources of trial support, may not always be disclosed.217

The associated survey of authors found that authors were likely to under-report scientific quality domains in their published RCT. This finding is consistent with previous investigations of the same issue,190,192,193 and is problematic, since the judgment of internal validity (or truth) of a trial is primarily sourced from the published report. Furthermore, previous empirical assessments of bias that have demonstrated exaggerated effects in trials with inadequate or unclear scientific quality domains have relied on the information provided in published reports.56 While it is possible that survey responses may have been biased towards providing adequate scientific quality responses, this does not explain why a substantial proportion of respondents admitted that allocation was not concealed (21%), or that blinding was not performed (34%). Nevertheless, seeking further information from authors is often time consuming and expensive, and cannot routinely be relied upon for the assessment of the internal validity of a trial. The provision of adequate information in the published report should be the accepted standard, but authors should be contacted to clarify unclear data when conducting a systematic review and meta-analysis.

The main strengths of the investigation in this chapter were the protocol driven, systematic review design, which was piloted at different stages to ensure validity, and the large representative sample size of included RCTs. In addition, similar quality domain definitions as prior studies were used, allowing direct comparison with other specialties. The main weakness of this investigation was that a single researcher (for logistical reasons) performed most of the search and data extraction from included RCTs. However, the pilot quality control data indicated that the reproducibility of the search and data extraction between the single data extractor and other persons was high. It is also possible that the time lag necessary to conduct this study means the surgical trial sample used (from 2008-2009) does not reflect current standards. However, this is an inherent issue with research of this type. For example, Chan and Altman’s published review of PubMed randomised trials had a time lag of around five years,34 while Hopewell and colleagues’ results were published four years later.214 Another weakness is the low author response rate, which is a threat to the validity of the survey findings.

In conclusion, this chapter described the general and scientific quality characteristics of the surgical RCT sample used in much of the remainder of this thesis. We found that scientific quality domains were often not reported adequately, but still compared well with previous assessments of RCTs in a variety of specialties. Our survey of the authors of the RCTs suggested that methodological safeguards, although performed, were often not reported in publications.

While this chapter investigated key scientific quality domains, no conclusions can be drawn about other important dimensions of clinical trial reporting. These issues include placing the trial (and its findings) in the appropriate context, a description of statistical methods (and their results), and issues related to external validity (or generalisability). The CONSORT statement recommends these items, and others, and is the most widely validated and supported reporting guideline in the academic community. An investigation of the compliance of surgical RCTs with CONSORT is therefore warranted.

Furthermore, while this study investigated the reporting of methodological domains, the effects of inadequate scientific domain reporting on outcome estimates remain unanswered, and this may provide empirical evidence of any bias. These issues are explored in the following two chapters.

3. CONSORT compliance in surgical randomised trials: are we there yet? A systematic review

3.1 Abstract

Background: Randomised controlled trials (RCTs) provide clinicians with the best evidence for the effects of interventions, but may not be reported with necessary detail. A systematic review was performed assessing the reporting quality of trials of surgical interventions, and associated trial level variables were explored.

Methods: In May 2009, three databases (MEDLINE, EMBASE and CENTRAL) were searched for RCTs that assessed a surgical intervention using a comprehensive electronic strategy developed by the Cochrane Collaboration. The Consolidated Standards of Reporting Trials (CONSORT) checklist was used as a measure of reporting quality. An overall CONSORT score was calculated and expressed as a proportion. This was supplemented with domains related to external validity. Data was also collected on characteristics hypothesised to improve reporting quality, and exploratory regression was performed to determine associations.

Results: 150 of the most recently published RCTs were included. Overall reporting quality was low, with only 55% of CONSORT items addressed. Less than half of trials described adequate methods for sample size calculation (45%), random sequence generation (43%), allocation concealment (45%), and blinding (37%). The strongest associations with reporting quality were adequate methods related to methodological domains, an author with an epidemiology/statistics degree, and a longer report length.

Conclusions: There remains much room for improvement in the reporting of surgical intervention trials. Authors and journal editors should apply existing reporting guidelines, and guidelines specific to the reporting of surgical interventions should be developed.

3.2 Introduction

The randomised controlled trial (RCT) is recognised as the gold standard design for comparing the effectiveness of different interventions.218 Randomisation and unbiased allocation of interventions allow prognostic variables to be distributed equally within comparison groups and statistical theory to be applied.185 The problem is that the quality of reporting of randomised controlled trials is consistently low in different specialties,35,113-115,206,219,220 and without the presentation of sufficient information, an assessment of the risk of bias in a trial is difficult or impossible. To appraise and apply the results of a RCT, a clinician has the right and responsibility to know the details of how it was performed, regardless of methodological quality. The absence of such details has been viewed as unsatisfactory221 and prompted an international group of researchers, statisticians, epidemiologists and editors to produce a checklist and flow diagram for the ideal reporting of RCTs in 1996: the Consolidated Standards of Reporting Trials (CONSORT).116 This has since been updated and in its current format consists of a 25-item checklist.215

Surgical research has previously been criticised as a “comic opera”119 for its focus on poor quality research. Randomised trials make up only 12% of the published surgical literature in the most cited surgical journals.36 Given the rarity of surgical randomised trials, the surgical scientific community has been urged to take CONSORT seriously if the surgical evidence base is to be reliable.222 Some progress has been made since, and more recently the focus of the community has been directed towards surgical interventions.10,135,189 While there is evidence that the introduction and endorsement of CONSORT among journals has resulted in an overall improvement in reporting quality,223,224 journals may not always enforce the guidelines.225 In addition, only 43 of over 400 CONSORT endorsing journals have a surgical subject matter,226 a small proportion of the 562 journals indexed in the National Library of Medicine with “Surgery” as a subject heading.227

Surgical RCTs are, however, becoming more common.186 The conduct of surgical trials is often challenging.135 Surgical interventions and patient characteristics and settings are often more complex, recruitment is often difficult, and trials may be difficult to generalise where these aspects are not reported. Previous reviews have assessed the reporting quality of trials, but these were often limited to a specific specialty,228,229 were not approached systematically,120 and may not reflect the current published evidence base.140 Further, most trials with a “surgical” subject matter may not have assessed a surgical or procedural intervention.115,140,229 This study was performed to address these deficits.

3.3 Aims

The aims of this study are threefold: i) to evaluate the extent to which recent publications of randomised trials examining a surgical intervention comply with the recommendations of the CONSORT statement, ii) to evaluate the extent to which these trials report factors relating to external validity (or generalisability), and iii) to explore the associated variables of compliance with CONSORT.

3.4 Methods

3.4.1 Study design

A systematic review was performed in order to identify 150 of the most recently published randomised trials assessing a surgical intervention. Ethics approval for this study was sought and granted by the local research committee.

3.4.2 Eligibility criteria

To be eligible for inclusion, a study had to be: i) a randomised controlled trial, using the definition of the U.S. National Institutes of Health: “A study in which participants are randomly (i.e., by chance) assigned to one of two or more treatment arms of a clinical trial”;194 ii) published as a full text article; iii) published in the English language; iv) the primary publication where multiple publications from one investigation are available, defined as the first publication from an investigation or the presentation of results where the methods of the trial were first described in full; v) conducted on humans (not cadavers), and; vi) a comparison between a surgical intervention and any other intervention. A surgical intervention was defined as any procedure that required surgical training and is usually performed by a surgeon of any subspecialty recognised by the Royal Australasian College of Surgeons. This included upper and lower gastrointestinal, transplant, cardiothoracic, neuro, ear nose and throat, paediatric, plastic and reconstructive, urology, vascular and orthopaedic surgery. Obstetric/gynaecologic, ophthalmic and dental surgeries were excluded. Injections of any material, applications of splints, and interventions purely for diagnostic purposes were also excluded.

3.4.3 Study identification

An electronic search strategy was formulated in collaboration with a medical librarian associated with the Cochrane Collaboration. The search strategy was designed with a “randomised trial filter” based on the Cochrane highly sensitive search strategy230,231 and a “surgical intervention filter”. MEDLINE via Ovid (2005-Week 3, May 2009), EMBASE via Ovid (2005-Week 21, 2009) and CENTRAL via Wiley Interscience (2005-Issue 2, 2009) were searched and references imported into EndNote (Thomson Reuters, California, USA) reference management software. Since the aim was to include the most recent trials, study identification began with the most recent reference and proceeded backwards in time. Using 1000 references, study identification was piloted by two researchers in order to resolve issues with interpretation of the eligibility criteria. Titles and abstracts of references were screened and the full text of potentially eligible articles was obtained. Studies were only included after assessment of their full text. The pilot search resulted in almost perfect agreement201 between the two assessors (kappa = 0.85, 95% confidence interval 0.77 – 0.93) and thereafter study identification was performed by one researcher, in an identical process.
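For readers unfamiliar with the kappa statistic, the following is a minimal illustrative sketch of how chance-corrected agreement between two assessors can be calculated as (observed agreement − expected agreement) / (1 − expected agreement). The screening decisions shown are hypothetical and are not the pilot data from this study; the function name is likewise an assumption made for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions for 100 screened references.
rater_a = ["include"] * 40 + ["exclude"] * 60
rater_b = (["include"] * 35 + ["exclude"] * 5
           + ["include"] * 5 + ["exclude"] * 55)

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

The confidence interval around kappa is not computed in this sketch.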

3.4.4 Data extraction

An electronic data form was created and piloted by three researchers. First, a round table discussion took place clarifying the definition of each item. Second, the data form was calibrated using a random sample of 15 reports. Thereafter, data extraction was performed by one researcher (the PhD candidate) and was checked for quality assurance by another researcher.

3.4.5 CONSORT checklist

Compliance with CONSORT was assessed using the 2001 version215 (the most recent checklist published when the search for this study was executed). The checklist consists of 22 items divided among the different sections of the publication and is available in Appendix 8. The accompanying explanatory statement was used to clarify the definition for each item, which was graded as adequate, inadequate or not applicable. As a measure of overall reporting quality, a CONSORT score112 was calculated for each study, with a maximum possible score of 22, and then expressed as a proportion. Across all studies, the proportion of adequately reported individual CONSORT items was also calculated.
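The scoring described above can be sketched as follows. This is an illustration under stated assumptions rather than the scoring procedure used in this study: in particular, it assumes that items graded “not applicable” are excluded from the denominator when the score is expressed as a proportion, and the example gradings are hypothetical.

```python
VALID_GRADES = {"adequate", "inadequate", "not applicable"}


def consort_score(gradings):
    """Return (adequate count, applicable count, proportion) for one trial.

    `gradings` maps a CONSORT item number to its grade. Assumption: items
    graded "not applicable" are left out of the denominator.
    """
    for item, grade in gradings.items():
        if grade not in VALID_GRADES:
            raise ValueError(f"item {item}: unknown grade {grade!r}")
    adequate = sum(1 for g in gradings.values() if g == "adequate")
    applicable = sum(1 for g in gradings.values() if g != "not applicable")
    proportion = adequate / applicable if applicable else float("nan")
    return adequate, applicable, proportion


# Hypothetical trial: items 1-12 adequate, 13-22 inadequate,
# except item 18 (ancillary analyses), which is not applicable.
gradings = {item: "adequate" for item in range(1, 13)}
gradings.update({item: "inadequate" for item in range(13, 23)})
gradings[18] = "not applicable"

adequate, applicable, proportion = consort_score(gradings)
print(f"{adequate}/{applicable} items adequate ({proportion:.0%})")
```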

3.4.6 Items related to external validity

Since the earlier versions of the CONSORT statement were not specifically designed to address issues related to surgical randomised trials, additional items related to external validity (generalisability) were added.232 These include descriptions and justifications233 for eligibility criteria, details of the surgical intervention, anaesthetic type, preoperative and postoperative care, the number and experience level of surgeons involved, setting details including location, number and expertise of centres, and methods of recruitment. Operational definitions are available in Appendix 8.

3.4.7 Associations with reporting quality

Several factors have been previously hypothesised to influence the reporting quality of a published article.77,206 These are author, report and journal characteristics, including the number of authors, author expertise in epidemiology or statistics, type of comparison (surgical vs. non-surgical), type of journal (general vs. subspecialty), journal impact factor, total sample size, multicentre status, and length of the article in words. They also include the reporting by trial authors of adequate methodological factors that are used to reduce the risk of bias in a randomised trial: description of sample size calculations, specifying primary outcomes, random sequence generation, allocation concealment, blinding and source of funding. These variables are referred to here as “methodological domains”.

3.4.8 Data analysis

A descriptive analysis was performed with frequencies and proportions for categorical variables and means, medians and standard deviations for continuous variables. Exploratory multiple linear regression was modeled to examine whether author, report, journal and methodological characteristics are associated with reporting quality as measured by the CONSORT score. The CONSORT score met assumptions of normality and was treated as a continuous outcome variable. Univariate regression was performed for each potentially associated variable, and variables with a p value <0.25 were then included in stepwise backward linear regression. Subsequently, a p value of <0.05 was regarded as statistically significant. Since some CONSORT items overlapped with methodological domains (see Appendix 8), a pre-planned sensitivity analysis was also performed using a CONSORT score without these items. Removal of these six items resulted in a maximum CONSORT score of 16 for this analysis. A sample size of 150 studies was selected to power the exploratory linear regression, allowing for ten studies per regression variable.
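The arithmetic behind these analysis decisions can be illustrated with a short sketch. The function names and the p values below are hypothetical and serve only to show the ten-studies-per-variable rule of thumb, the univariate screening threshold, and the reduced maximum score; the backward elimination step itself is not reproduced here.

```python
def max_predictors(n_studies, studies_per_variable=10):
    """Largest number of regression variables allowed by the rule of thumb."""
    if n_studies < 0 or studies_per_variable <= 0:
        raise ValueError("invalid sample size or ratio")
    return n_studies // studies_per_variable


def screen_univariate(p_values, threshold=0.25):
    """Names of variables whose univariate p value is below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical univariate p values for three candidate variables.
hypothetical_p = {"variable_a": 0.03, "variable_b": 0.24, "variable_c": 0.40}
candidates = screen_univariate(hypothetical_p)

# Sensitivity analysis: six overlapping items removed from the 22-item score.
reduced_max_score = 22 - 6

print(max_predictors(150), candidates, reduced_max_score)
```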

3.5 Results

3.5.1 Study inclusion

The process for study inclusion was conducted in an identical fashion to the description provided in Section 2.5.1 of this thesis, but only the 150 most recently published articles were selected for evaluation in this chapter, spanning a time period from August 2008 to May 2009. Figure 3.1 depicts a flow diagram of study inclusion, with reasons for exclusion at each stage.

3.5.2 Characteristics of included studies

Table 3.1 depicts the general characteristics of the included studies. Most (75%) were single centre studies comparing a surgical intervention with another surgical intervention. General surgery was the most common subspecialty (29%), followed by orthopaedic (23%), cardiothoracic (13%) and ear nose and throat surgery (12%). Most were published in a subspecialty journal with a median impact factor of 2.4.

Methodological domains were poorly reported. Less than half of trials adequately reported a sample size calculation (45%), a randomisation method (43%) or an adequate method of allocation concealment (45%), or declared their source of funding (44%), and only just over half (55%) specified a primary outcome. Only 37% of trials attempted any blinding (usually of the outcome assessor), highlighting the difficulty of blinding trials with a surgical intervention.

Figure 3.1 Flow diagram of randomised trial inclusion for this chapter

Table 3.1 Characteristics of randomised trials included in this chapter

Characteristic                               Total (N=150)
Number of authors, median (IQR)              6 (4)
Author epi/stats degree, n (%)               24 (16)
Subspecialty, n (%)
  General (incl. upper/lower GI)             43 (29)
  Orthopaedic                                35 (23)
  Cardiothoracic                             20 (13)
  Other subspecialties                       52 (35)
Type of comparison, n (%)
  Surgery vs. surgery                        131 (87)
  Surgery vs. non-surgery                    19 (13)
Type of journal, n (%)
  General surgery                            14 (9)
  General medicine                           9 (6)
  Subspecialty surgery                       97 (65)
  Subspecialty medicine                      30 (20)
Impact factor, median (IQR)                  2.4 (1.8)
Country, n (%)^
  USA                                        21 (14)
  Italy                                      18 (12)
  UK                                         16 (11)
Total sample size, median (IQR)              68 (110)
Multicentre, n (%)                           37 (25)
Length of article in words, median (IQR)     2799 (1316)
Power calculation, n (%)                     67 (45)
Outcome specification, n (%)                 82 (55)
Random sequence, n (%)                       64 (43)
Allocation concealment, n (%)                68 (45)
Blinding, n (%)
  Participant                                24 (16)
  Care givers                                7 (5)
  Outcome assessor                           51 (34)
  Any blinding                               55 (37)
Funding source, n (%)
  Industry                                   37 (25)
  Non-industry/none                          29 (19)
  Unclear                                    84 (56)

IQR = interquartile range. ^ Only the top three countries are presented for brevity.

3.5.3 Reporting of CONSORT items

The mean adjusted CONSORT score was 12.2 (SD=3.8) out of 22 items (55%). Table 3.2 depicts the rate of adequate reporting for each CONSORT item. The majority of trials justified the reason for the study (Item 2), described their statistical methods (Item 12), reported on adverse events (Item 19) and gave a conclusion statement consistent with their findings in the context of other research (Item 22). Other domains were poorly reported, such as the implementation of allocation concealment and blinding (Items 10 and 11), specification of the intention to treat principle as the primary method of analysis (Item 16), the appropriate reporting of outcome estimations with measures of precision (Item 17), or the discussion of any issues related to the generalisability (external validity) of the findings (Item 21).

3.5.4 Reporting of items related to external validity

Table 3.3 summarises the reporting of external validity items. Most, but not all, trials described their surgical interventions with sufficient detail for replication (74%) or reported well defined inclusion/exclusion criteria (78.7%). Most inclusion/exclusion criteria were strongly justified (78.7%). Other items were poorly reported, such as descriptions of preoperative or postoperative care (17.3% and 50% respectively), or details of the centres where the study was conducted.

Table 3.2 Reporting of individual CONSORT items

CONSORT item                     n (N=150)   %
1. Title and abstract            142         95
2. Background                    150         100
3. Participants                  59          39.3
4. Interventions                 110         73.3
5. Objectives                    86          57.3
6. Outcomes                      75          50.0
7. Sample size                   66          44.0
8. Sequence generation           65          43.3
9. Allocation concealment        67          44.7
10. Implementation               19          12.7
11. Blinding                     18          12.0
12. Statistical methods          133         88.7
13. Participant flow             69          46.0
14. Recruitment                  109         72.7
15. Baseline data                125         83.3
16. Numbers analysed             41          27.3
17. Outcomes and estimation      21          14.0
18. Ancillary analyses *         40          56.3
19. Adverse events               120         80.0
20. Interpretation               79          52.7
21. Generalisability             31          20.7
22. Overall evidence             133         88.7

* Total N = 71, since not all trials performed or reported any ancillary analyses.


Table 3.3 Reporting of items related to external validity

External validity item                             n (N=150)   %
Inclusion/exclusion criteria reported              118         78.7
  Proportion of criteria strongly justified        -           79.4
  Proportion of criteria potentially justified     -           17.9
  Proportion of criteria poorly justified          -           3.2
Details of surgical interventions                  111         74.0
Anaesthetic details                                61          40.7
Preoperative care                                  26          17.3
Postoperative care                                 75          50.0
Number of surgeons                                 71          47.3
Experience level of surgeons                       59          39.3
Number of centres                                  99          66.0
Details of each centre                             57          38.0
Location of centre                                 57          38.0
Method of recruitment                              30          20.0
Number of patients recruited at each centre        64          42.7

3.5.5 Associations with reporting quality

Results of exploratory regression are depicted in Table 3.4. After adjustment for other variables, the reporting by trial authors of methodological domains (sample size calculation, outcome specification, random sequence, and allocation concealment) was strongly associated with CONSORT score, with each variable associated with an increase of 1.7 to 2.5 points in the total CONSORT score. The length of the article in words also had a strong association, but had a smaller effect, with an additional 0.6 in adjusted CONSORT score (95% CI, 0.2 to 1.0) for every 1000 words presented in the report. A pre-planned sensitivity analysis that removed CONSORT items that overlapped with methodological domains from the CONSORT score had similar results, with the same significant variables included in the final model. The final model had an R² value of 70%. Diagnostics revealed that regression assumptions were valid, and there was no evidence of multicollinearity.

3.5.6 Inter-observer and intra-observer reliability assessment

Overall inter-observer agreement on CONSORT scores was high (intraclass correlation = 0.98). Agreement on individual items was more variable (kappa statistic range 0.6 to 1), but median kappa showed substantial agreement (kappa = 0.73).

Intra-observer agreement showed a similar pattern, with agreement on CONSORT scores very high (intraclass correlation = 0.97) but some variability on individual items (kappa range 0.63 to 1, median kappa = 0.75).

Table 3.4 Results of exploratory regression for variables associated with CONSORT score

                                   Univariate                      Multivariate
Predictor                          Beta (95% CI)         p value   Beta (95% CI)           p value
Power calculation                  4.3 (3.3 – 5.4)       <0.001    1.7 (0.8 – 2.6)         <0.001
Outcome specification              5.1 (4.1 – 6.0)       <0.001    2.5 (1.6 – 3.4)         <0.001
Random sequence                    3.4 (2.3 – 4.5)       <0.001    2.2 (1.4 – 3.0)         <0.001
Allocation concealment             4.0 (2.9 – 5.0)       <0.001    1.7 (0.9 – 2.5)         <0.001
Any blinding                       0.9 (-0.3 – 2.2)      0.15      0.3 (-0.5 – 1.1)        0.49
Industry funding                   -1.4 (-3.0 – 0.1)     0.07      -0.7 (-1.7 – 0.2)       0.12
Number of authors                  0.4 (0.2 – 0.6)       <0.001    -0.05 (-0.2 – 0.1)      0.50
Epidemiology degree                4.2 (2.7 – 5.7)       <0.001    1.7 (0.5 – 2.8)         0.004
Non-surgical control               1.2 (-0.7 – 3.0)      0.22      -0.2 (-1.4 – 0.9)       0.67
General medical/surgical journal   2.0 (0.3 – 3.7)       0.02      0.2 (-1.1 – 1.5)        0.75
Impact factor                      0.2 (0.1 – 0.3)       <0.001    0.02 (-0.1 – 0.1)       0.66
Research country                   2.2 (0.9 – 3.5)       0.001     -0.4 (-1.3 – 0.5)       0.40
Total sample size ^                0.5 (0.3 – 0.7)       <0.001    0.02 (-0.2 – 0.2)       0.82
Multicentre trial                  3.0 (1.6 – 4.3)       <0.001    0.8 (-0.2 – 1.8)        0.10
Length of article *                1.5 (1.0 – 2.1)       <0.001    0.6 (0.2 – 1.0)         0.006

^ Beta results expressed per 100 patients. * Beta results expressed per 1000 words.

3.6 Discussion

Existing studies assessing the reporting of randomised trials in surgery are deficient. Studies were either specialty specific,228,229 were not conducted systematically,120 or assessed trials of non-surgical interventions.115,140 The extent to which recently published surgical trials comply with CONSORT, which may be regarded as the current standard of trial reporting, is therefore unknown. A systematic review was completed to identify randomised trials assessing any surgical intervention, in order to assess the extent of compliance with the CONSORT statement. It was found that just over half of CONSORT items were reported adequately in a typical trial, with a concerning gap in the reporting of several items. The variables most strongly associated with CONSORT score were the reporting of adequate scientific methods (the adequate reporting of methodological domains) as well as the length of the printed article. It was also found that issues related to the external validity of the trials were under-reported. These findings are consistent with studies in other subspecialties that have used the CONSORT checklist as a measure of reporting quality,234,235 although in randomised trials published in the highest impact general medical journals in 1998, higher proportions of CONSORT items were addressed (68% vs. 55% in this review) after the adoption of the statement.236 This study updates what is known about compliance with CONSORT as a reporting guideline, and is the first to systematically address trials that specifically examine a surgical procedure.

While the rate of reporting of most CONSORT items was moderate overall, of particular concern was the unacceptably poor reporting of the implementation of randomisation and blinding. Few trials actually provided details of how the integrity of their adopted methods was preserved. It is incorrect to assume that a trial that has been described as “randomised” has adopted truly random allocation methods. In the assessment of records for inclusion in this study, a concerning proportion of reports were described as randomised trials in the title and/or abstract, but did not meet the criteria for being a randomised trial after an inspection of their methods (see Figure 3.1), such as the use of alternate allocation,237 or allocation by record number.238

Furthermore, results from exploratory regression reinforce the importance of methodological domains. Variables most strongly associated with overall reporting quality were adequately reported methodology related to a sample size calculation, outcome specification, randomisation and allocation concealment, even when CONSORT items that may overlap with these methodological domains were removed from the overall reporting score. It should be noted that CONSORT reporting and methodological reporting were differentiated:112 an item that was reported in the trial may not necessarily reduce the risk of bias. For example, some surgical trials reported that blinding of participants and/or investigators was not possible, with reasons. Other trials described methods of patient allocation that were not concealed. While these were regarded as adequate reporting, the risk of measurement and/or performance bias was often high.70 Nevertheless, these findings are consistent with a previous smaller review that concluded reporting quality (using a CONSORT score) was associated with adequate scientific methodology,112 although the sample in this study was larger and allowed adjustment for multiple factors. Other variables associated with reporting quality included at least one author with a degree in epidemiology (or similar qualification), and the length of the article in words. These findings may have implications for both journal users and editors.

Almost half of the trials did not present a clear objective statement. Current recommendations are that objective statements follow the PICO format,239 with clear information on the “P”-patients, “I”-intervention(s), “C”-control and “O”-outcomes studied. A clear objective immediately guides the reader to the main focus of the study, allowing a more effective interpretation and appraisal of the study results. Further, it may provide information as to what the study was statistically powered to find by specifying the primary outcome(s). Details for both of these items (primary outcomes and sample size calculation) are contained in other items (Items 6 and 7), and these were reported at similarly low rates. Type 2 errors (which can occur when studies are underpowered) are common in published reports, and are likely to be more common in surgical trials where sample sizes tend to be low.209

Another area requiring improvement is the presentation of outcome results. It is recommended that outcomes display an effect estimate (such as a mean difference, or risk ratio) as well as a measure of precision, such as a 95% confidence interval. The use of p values was common, and measures of precision were often displayed for each intervention group separately, but the reporting of confidence intervals for outcomes was uncommon. Confidence intervals provide important information (beyond that conveyed by p values alone), such as the power of a study to detect a clinically important difference between groups.240
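As an illustration of the recommended presentation, the sketch below calculates a risk ratio with a 95% confidence interval from a two-group trial, using the standard large-sample (Wald) interval on the log scale. The event counts are hypothetical, the function name is an assumption made for this example, and other interval methods may be preferable when events are sparse.

```python
import math


def risk_ratio_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio (group A vs. group B) with a Wald interval on the log scale."""
    if min(events_a, events_b) <= 0:
        raise ValueError("both groups need at least one event for this method")
    if events_a > total_a or events_b > total_b:
        raise ValueError("events cannot exceed group size")
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical trial: 10/50 events with intervention A, 20/50 with B.
rr, lower, upper = risk_ratio_ci(10, 50, 20, 50)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Reporting the interval alongside the point estimate shows the range of effects compatible with the data, which a p value alone does not convey.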

Reporting of items related to external validity is essential for readers to determine whether the results of a study apply to their own practice. While the CONSORT statement recommends reporting on these items, it may be argued that further detail is required for surgical interventions, which are often far more complex than drug interventions.135 For this reason separate criteria were assessed to determine whether surgical trials reported on issues of generalisability. Readers determine who the study population is (and therefore to which patients the study applies) from the reported inclusion/exclusion criteria,233 as well as from details on how patients were recruited to the study. While most trials reported detailed criteria, it was concerning that one in four trials did not. Reported inclusion criteria were often strongly justified, which compares well with previous studies.233 Criteria were often justified based on the safety or efficacy of invasive surgical interventions, highlighting one of the difficulties in recruiting patients for randomised trials of surgical interventions. While most studies described their surgical interventions in sufficient detail for replication (74%), it may be argued that this falls short of what is appropriate for what is essentially the main purpose of the trial. Unlike pharmacological interventions, surgical interventions are subject to a high degree of variability in terms of approach, technique, equipment and prostheses used. It may not always be valid to assume that the intervention examined in a trial is similar to the reader’s practice.

Other external validity domains were poorly reported, despite the fact that lax requirements were used for reporting, with any description deemed adequate. Preoperative and postoperative care protocols may involve important co-interventions that may bias effects, and are often the subject of randomised trial assessment themselves.241 Surgeons also need to know who did the procedures in the trial. The experience level of the surgeons involved is commonly cited as a reason why an intervention is not applicable to their practice. Trial surgeons may either have additional training in a procedure (or practice in a high volume centre), or may be inexperienced trainees. A similar logic applies to the details of the centres where the trial took place, while the location of the centre indicates who the study population was. This review reinforces findings from other external validity assessments of non-pharmacologic intervention trials232 and emphasises the need for specific reporting guidelines for trials of surgical interventions.

The main strengths of this study were the protocol driven design, the use of a pilot period to standardise data collection, and the assessment of trial reports from all subspecialties and journals, which maximises the generalisability of these findings to the surgical evidence base. However, this study has some limitations. First, a single researcher performed most of the search and data collection, which introduces the possibility of measurement error. However, interobserver reliability (assessed using a pilot sample) was high and compared well with similar studies.113,114,228 The data was also checked for its integrity, with few corrections made. Second, the purpose of this study was an assessment of reporting quality, not methodological quality, or risk of bias. It is possible that trial authors actually established adequate methodological standards to reduce the risk of bias, but then failed to report these in the published report.

Previous studies suggest that a lack of reporting does not necessarily mean inadequate methodology was adopted.242 However, empirical assessments of treatment effect bias were based on published reports,39 not what was actually performed by trial investigators, which is difficult if not impossible to verify. It was also possible that trials that described inadequate approaches to methodological domains were graded as adequately reported. An example of this would be an unblinded trial that described the blinding status of those involved in the study, or a trial that had inadequate methods regarding allocation concealment, but which described the process of patient allocation.

Thus, CONSORT reporting and adequate methodological domain reporting were differentiated. Readers have a right to readily accessible information regarding a trial from a single report or other reference, without having to search for a protocol or contact trial authors, and that information should be reported in a trial regardless of its methodological quality. Third, the (now outdated) 2001 version of the CONSORT statement was used, since that was the current version at the time the study was executed. During the conduct of this study two updates of the CONSORT statement were published. One was an explanatory statement of CONSORT for trials of non-pharmacological interventions,243 and another was a general update of the checklist in 2010.244 However, it was noted that both updates contain the same items as the 2001 version, with changes in wording aimed at simplifying use and understanding. The additional external validity recommendations for non-pharmacologic trials were addressed in a separate section.

In conclusion, these findings add to the evidence that randomised trials examining a surgical intervention are not adequately reported. Trial authors need to be made aware of existing guidelines for reporting, and journal editors should insist on, and assess, compliance with these guidelines. In addition, a guideline that addresses issues specific to trials of surgical interventions should be developed and promoted by the surgical research community.

4. The association between quality and effect estimates in surgical randomised trials

4.1 Abstract

Background: Randomised controlled trials (RCTs) provide a high level of evidence for the effects of interventions, but bias may be introduced when methodology is not adequate. Bias may also be introduced at the aggregate level if studies are not published based on the significance of their results (publication bias). Several studies have provided empirical evidence for this bias, but the evidence is inconsistent. Furthermore, the influence of bias on effect estimates has not been explored specifically in trials of surgical interventions. This study stratified surgical trials according to whether or not methodology was adequately reported, and quantified the extent to which outcomes were distorted based on this categorisation. Publication bias was also explored.

Methods: A systematic review of 400 recently published surgical intervention RCTs was conducted using a comprehensive online search strategy of MEDLINE, EMBASE and CENTRAL. Data related to quality (randomisation method, allocation concealment, blinding, dealing with attrition, and source of funding) was extracted using predefined criteria for adequate methodology. The primary outcome was also extracted from each study and standardised using log odds ratios. Random effects meta-regression was used to examine differences in outcome estimates between RCTs with adequate methods and RCTs with inadequate or unclear methods, and this was expressed as a ratio of odds ratios. Publication bias was explored with funnel plots and the Egger regression test.

Results: RCTs had a high prevalence of inadequate or unclear reporting of methods: 57% of RCTs had unclear randomisation sequences, 55% had inadequate or unclear allocation concealment, 73% were unclear regarding blinding of the primary outcome, 76% had inadequate or unclear methods of dealing with attrition, and 19% were funded partially or wholly by an industry source. Trials with inadequate or unclear methods of random sequence generation exaggerated effects by 17% on average (ROR= 0.83, 95% CI 0.70 – 0.98, p= 0.04, I2= 72%). Trials that did not report follow up rates and an analysis according to intention to treat exaggerated effects by 24% on average (ROR= 0.76, 95% CI 0.63 – 0.92, p= 0.005, I2= 72%). In a sensitivity analysis including a subset of trials that specified their primary outcomes, inadequate or unclear allocation concealment was also associated with an exaggeration of effects by 19% on average (ROR= 0.81, 95% CI 0.66 – 0.99, p= 0.04). No significant associations were found for outcome blinding, or trials with industry funding. There were similar findings in trials that had subjective primary outcomes. However, in trials that had objective outcomes, an inadequate or unclear randomisation sequence was not associated with an exaggeration of effects.

Conclusions: This study provides empirical evidence for the exaggeration of outcome estimates when inadequate RCT methodology is employed. Trialists should minimise the risk of bias by using the most appropriate methods, and users of research should be mindful of the potential for outcomes to be exaggerated when RCTs are of poor quality.

4.2 Introduction

In clinical research, “bias” refers to structural errors that favour one outcome over another.25 For the comparison of interventions, the randomised controlled trial (RCT) aims to reduce or eliminate bias by adopting a number of methodological safeguards, such as the random allocation of participants (to reduce selection bias52), blinding (to reduce performance or measurement bias61), and intention to treat (in order to preserve the integrity of the randomisation process74). These methods have a strong theoretical basis for the reduction in bias, and there is accumulating empirical evidence indicating a distortion of outcome estimates in RCTs when trial authors do not report these methodological safeguards.

The empirical evidence for bias is primarily sourced from so-called “meta-epidemiology” studies. Several of these have been published (in different clinical areas) on the association between quality in trials and estimation of effect,39,54,77,245 and their findings have been variable.77,245 The Bias in Randomized and Observational Studies (BRANDO) project, a collaboration of several authors of meta-epidemiology studies who subsequently combined their data into one large dataset, attempted to resolve these discrepancies.

The BRANDO project had findings consistent with an exaggeration of effects when the randomisation sequence (exaggeration of 11%), allocation concealment (exaggeration of 7%) and blinding (exaggeration of 13%) were not adequately reported.56 While the BRANDO study has some limitations, such as confounding by other methodological characteristics, it is the most comprehensive quantification of the bias in RCTs that is available. It should also be noted that most meta-epidemiology studies have relied on trial reports for the grading of that trial’s methodology: the “guilty until proven innocent” approach.190 This may not be valid, given that previous studies (as well as this thesis) have demonstrated trial authors often perform these safeguards, but fail to report them in the trial publication. It is therefore important to attempt to clarify what actually took place in their investigation prior to any analysis of bias.

The effects of bias may differ depending on the relative subjectivity of a measured outcome. Objective outcomes are less prone to human influence or manipulation, such as mortality or automated laboratory measurements, and are therefore less subject to measurement and/or performance bias.

More recent meta-epidemiologic studies have examined the differential influences of bias on different types of outcomes,70 and empirical evidence from the BRANDO project supports this hypothesis, with an increase in the exaggeration of effects in subjective outcomes (compared to objective outcomes), when blinding was not present.56

While methodological characteristics describe bias in RCTs at the study level, the publication and summary of RCT results in systematic reviews and meta-analyses is also subject to bias.84 Studies have shown that RCT results that are favourable (or statistically significant) are more likely to be published, be published earlier,81 and be cited than studies with unfavourable results.246 The implications are that systematic reviews and meta-analyses of RCTs contain only studies that have shown favourable results, while those that are unfavourable remain unpublished and difficult if not impossible to access, and thus summary effect estimates are also more favourable towards the intervention group.247 This form of bias, called publication bias, takes place on a different level than study level methodological domains - the review or dissemination level - and warrants investigation in any systematic review of the evidence, such as that presented in this thesis.

The challenges faced by surgical trialists are well documented,131 but little is known about the effects of bias on outcome estimates in surgical trials.

Previous meta-epidemiology studies (such as the BRANDO project) combined trials from a wide variety of clinical specialties, and did not examine differential effects of bias on different types of interventions.56

Surgical interventions may be particularly prone to the effects of bias (compared to other non-surgical interventions). Surgery is often perceived as expensive and highly technical, requiring substantial training and expertise, and often associated with “cutting edge” technology. These factors may combine to produce an additive “placebo effect”,68,71 and when methodological safeguards are not present, outcome estimates may be particularly prone to exaggeration. Surgical trials may also be more prone to publication bias, given the high prevalence of industry sponsorship, and the relative under-reporting of sources of funding seen in Chapter 2. Surgeon investigators are also subject to a lack of equipoise,131 which may compound the file drawer problem248 when unfavourable results are found in a trial.

4.3 Aims and hypothesis statements

The primary aim of this study was to assess whether trials with adequate methods to reduce bias (defined as adequate random sequence generation, allocation concealment, blinding, analysis by intention to treat, and freedom from industry funding) have, on average, different effect estimates than those without adequate methodology. The hypothesis was that trials with inadequate methodology would, on average, exaggerate effect estimates when compared with trials with adequate methodology.

Another primary aim of this study was to assess the evidence of publication bias in surgical trials. The hypothesis was that surgical trials would exhibit evidence consistent with publication bias, in the form of visually asymmetric funnel plots and significant regression tests of publication bias.

A secondary aim of this study was to assess whether subjective outcomes are more prone to effect exaggeration due to poor methodology, than objective outcomes. The hypothesis was that subjective outcomes are more highly exaggerated than objective outcomes in the absence of adequate methodology.

4.4 Methods

4.4.1 Study design

Systematic review and meta-analysis (incorporating metaregression) of published RCTs assessing a surgical intervention.

4.4.2 Inclusion of studies for the review

The same sample of studies used in Chapter 2 (Epidemiology and quality of randomised trials of surgical interventions) was used for this study. For details on the eligibility criteria, search strategy, and study identification, refer to Chapter 2 methods.

4.4.3 Data extraction- primary outcomes

Pre-specified criteria were used to define the primary outcome of each included surgical RCT for use in the analyses. In order of priority, these criteria were: the outcome was used in the sample size calculation of the study; or the outcome was explicitly described as the “primary” outcome (or an appropriate synonym) in an aims or hypothesis statement. Where these were not clear, the most appropriate outcome was chosen based on information in the title, abstract, or introduction. Whenever possible, a patient centred249 outcome was chosen. A second researcher checked the outcomes selected, when they were not explicitly specified as primary, and any disagreements were resolved by discussion.

4.4.4 Data extraction- quality domains

The quality domains of random sequence generation, allocation concealment, blinding (of the extracted primary outcome), analysis according to intention to treat, and a non-industry source of funding were examined for their effects on outcome estimates. These domains have been previously defined in detail in Chapter 2 of this thesis (refer to Table 2.2), but were dichotomised for the purposes of this analysis into adequate vs. inadequate or unclear. Funding source was dichotomised into non-industry vs. other source of funding.

4.4.5 Standardisation of effect sizes

For each primary outcome, effect sizes were standardised using the natural logarithm of the odds ratio (logOR) and its standard error. Outcomes were coded such that an odds ratio greater than one (or a positive logOR) favoured the intervention, and an odds ratio less than one (or a negative logOR) favoured the control group.

Since outcomes were presented in different ways in the included RCTs, the following methods were used to standardise effect estimates into logORs for the purposes of the analysis:

For dichotomous outcomes, the events and non-events in the intervention and control groups were extracted, and the odds ratio (and its standard error) was calculated using usual methods.250 When a zero count was present in one cell of a row or column in the 2 x 2 table, 0.5 was added to all cells to avoid computational problems,251 as per the procedures of most standard statistical packages. When results were presented as a calculated effect estimate (such as an odds ratio) this was extracted and transformed into a logOR. An associated measure of precision was also extracted, such as a 95% confidence interval, or exact p value, and the standard error of the logOR was calculated using one of these.252
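The calculations described above can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from the software used in this thesis; the p value conversion additionally assumes that the reported p value came from a two-sided Wald (normal approximation) test, which may not hold for every trial. Whether a positive logOR favours the intervention also depends on whether the event counted is desirable, so the sign may need to be reversed during coding.

```python
import math
from statistics import NormalDist


def log_odds_ratio(events_trt, nonevents_trt, events_ctl, nonevents_ctl):
    """Return (logOR, SE) from a 2 x 2 table.

    If any cell is zero, 0.5 is added to all four cells before calculation.
    """
    cells = [events_trt, nonevents_trt, events_ctl, nonevents_ctl]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def se_from_ci(lower, upper, z=1.96):
    """SE of the logOR from a 95% CI reported on the odds ratio scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def se_from_p(log_or, p_two_sided):
    """SE of the logOR from an exact two-sided p value (assumes a Wald test)."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return abs(log_or) / z
```

For example, `log_odds_ratio(10, 90, 20, 80)` returns the logOR and standard error for a trial with 10/100 events in the intervention group and 20/100 in the control group.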

Continuous outcomes were extracted for each group along with a measure of precision, such as a standard error or standard deviation for the effect estimate in each group, or an exact p value or 95% confidence interval where continuous effect sizes were presented. This data was used to calculate a standardised mean difference (Cohen’s d) along with its standard error,253 which was transformed into a logOR by the method described by Chinn et al254 and recommended by the Cochrane Collaboration.250 This method assumes that the continuous measures in the intervention and control groups follow a logistic distribution, and the variability in both groups is the same.

Since this assumption was not met for some trials with low group numbers and/or non-parametric distributions, data from these trials was excluded.

RCTs that did not present sufficient data for pooling in a meta-analysis were also excluded. Data was extracted into an electronic spreadsheet, and Comprehensive Meta-Analysis software (Version 2.2) was used to compute effect sizes.255
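The conversion of a continuous outcome into a logOR described in this section can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of the software's calculations: the function names are hypothetical, the standard error of Cohen's d uses a common large-sample approximation, and the conversion multiplies both d and its standard error by π/√3, which is the scaling that follows from the logistic distribution assumption.

```python
import math


def cohens_d(mean_trt, sd_trt, n_trt, mean_ctl, sd_ctl, n_ctl):
    """Standardised mean difference (Cohen's d) and an approximate SE."""
    pooled_sd = math.sqrt(
        ((n_trt - 1) * sd_trt ** 2 + (n_ctl - 1) * sd_ctl ** 2)
        / (n_trt + n_ctl - 2)
    )
    d = (mean_trt - mean_ctl) / pooled_sd
    # Large-sample approximation to the variance of d (an assumption here).
    var_d = (n_trt + n_ctl) / (n_trt * n_ctl) + d ** 2 / (2 * (n_trt + n_ctl))
    return d, math.sqrt(var_d)


def smd_to_log_or(d, se_d):
    """Convert an SMD and its SE to a logOR and SE (logistic assumption)."""
    factor = math.pi / math.sqrt(3)
    return d * factor, se_d * factor
```

Because the same constant scales both the estimate and its standard error, the conversion does not change the statistical significance of an individual trial result.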

4.4.6 Statistical analysis

Effect sizes were combined in random effects meta-analysis,256 and heterogeneity was estimated using the I2 statistic.257 Random effects metaregression was used to assess the influence each quality domain has on effect sizes, while controlling for potential study level confounders, including study design (parallel vs. split body/other), study type (efficacy vs. equivalence) and outcome objectivity (subjective vs. objective outcomes).

Objective outcomes were defined as mortality, automated laboratory measurements, or outcomes based on a need for a procedure70 (see Chapter 6). The primary analysis was univariable metaregression (for each quality domain individually). Multivariable analysis was also performed combining all quality domains in a single metaregression model, which allowed the computation of an adjusted association for each quality domain in the presence of other quality measures. The influence of each quality domain on effect sizes was expressed as a ratio of odds ratios (ROR), where a ROR less than one represents a reduction in effect estimates for studies with adequate quality (or an exaggeration of treatment effects in studies with inadequate quality). RORs and their 95% confidence intervals and p values are presented here.
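To illustrate how a ratio of odds ratios is read, the sketch below pools logORs within two strata (adequate vs. inadequate or unclear) and compares them. It is a deliberate simplification and not the metaregression model used in this chapter: it assumes DerSimonian-Laird random effects pooling within each stratum, estimates heterogeneity separately per stratum, and makes no adjustment for confounders. All function names are hypothetical.

```python
import math


def random_effects_pool(effects, ses):
    """DerSimonian-Laird pooled estimate, its SE, and I2 (as a percentage)."""
    weights = [1.0 / se ** 2 for se in ses]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), i2


def ratio_of_odds_ratios(adequate, inadequate, z=1.96):
    """ROR (adequate / inadequate) with an approximate 95% CI.

    Each argument is a list of (logOR, SE) pairs. With ORs coded so that
    values above one favour the intervention, a ROR below one means larger
    effects in the inadequate or unclear stratum.
    """
    log_a, se_a, _ = random_effects_pool(*zip(*adequate))
    log_i, se_i, _ = random_effects_pool(*zip(*inadequate))
    diff = log_a - log_i
    se = math.sqrt(se_a ** 2 + se_i ** 2)
    return math.exp(diff), math.exp(diff - z * se), math.exp(diff + z * se)
```

Under this coding, a ROR of 0.83 corresponds to an average exaggeration of effects of 17% (1 − 0.83) in trials with inadequate or unclear methods.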

4.4.7 The influence of quality domains on objective vs. subjective outcomes

To assess the potentially different influences that quality domains have on objective and subjective outcomes, the data were stratified and separate metaregression analyses were performed for studies with objective outcomes and, separately, studies with subjective outcomes. Identical procedures were followed, with RORs calculated for each quality domain in univariable and multivariable analyses, and these were expressed along with their 95% confidence intervals and p values.

4.4.8 Sensitivity analyses

In Chapter 2 it was found that 35% of RCTs did not specify a primary outcome either as an explicit description or the use of an outcome for a sample size calculation (see Table 2.5). For these RCTs, the most appropriate outcome was selected based on the information presented in the abstract and introduction, and was checked by two researchers as per the methods described above. To assess whether the analyses were robust to the selection of primary outcomes used, subgroup analyses were performed including only trials that specified a primary outcome using a sample size calculation, or an explicit definition. Identical procedures were followed, with RORs calculated for each quality domain in univariable and multivariable analyses.

The primary analyses described here assess study quality using the reports of included RCTs. However, in Chapter 2 it was also found that there was discordance in quality domains between what was reported in the published report of the RCT, and the methods that authors claim to have used (via an author survey). A sensitivity analysis was therefore conducted using quality domain classifications that incorporate author survey responses. Since the author survey enquired about methods related to randomisation sequence, allocation concealment, blinding, and sources of funding, the sensitivity analysis was restricted to these domains.

4.4.9 Publication bias

Evidence of publication bias was examined visually using funnel plots.258,259

Funnel plots are modifications of the Galbraith plot, initially developed in psychological research to examine the scatter of different outcome measures that measure the same quantity.260 Treatment effects are plotted on the horizontal axis, against a measure of study precision on the vertical axis.

Since treatment effects from smaller (more imprecise) studies vary more widely at the bottom of the plot, the plot normally resembles an inverted funnel in the absence of bias.247 Several estimates of precision are available for the y axis plot, but since precision relies on both the sample size and event rates in a study, the inverse of the standard error was chosen as a more accurate measure of precision.259 Since asymmetric funnel plots may also be produced in the presence of study level bias (with or without the presence of publication bias), or when treatment effects tend to be visually unidirectional,247 contour enhanced funnel plots were also used. These plot treatment effects within contour lines representing different degrees of statistical significance.261
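As an illustration of how a study is placed on a contour enhanced funnel plot, the sketch below computes its plotting coordinates and the significance region it falls into. The function names are hypothetical, the two-sided p value assumes a normal approximation (effect estimate divided by its standard error), and the region boundaries of 1%, 5% and 10% are an assumption chosen to mirror conventional contour levels.

```python
from statistics import NormalDist


def two_sided_p(log_or, se):
    """Two-sided p value for a logOR under a normal approximation."""
    z = abs(log_or) / se
    return 2 * (1 - NormalDist().cdf(z))


def contour_region(log_or, se):
    """Label the significance contour in which a study falls."""
    p = two_sided_p(log_or, se)
    if p < 0.01:
        return "p < 1%"
    if p < 0.05:
        return "1% < p < 5%"
    if p < 0.10:
        return "5% < p < 10%"
    return "p > 10%"


def funnel_point(log_or, se):
    """(x, y) coordinates: effect estimate against inverse standard error."""
    return log_or, 1.0 / se
```

If studies are missing mainly from the non-significant regions, publication bias is a plausible explanation for asymmetry; if they are missing from significant regions, other causes are more likely.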

Publication bias was also explored statistically using the Egger test.262,263

The Egger test is a statistical analogue of a funnel plot,247 and is a weighted linear regression approach that tests the treatment effect against its standard error, with weights inversely proportional to the standard error. The regression line is then superimposed on the funnel plot. When there is evidence of bias, the regression line will not run through the origin, indicating smaller, less precise studies have systematically different effects than larger studies.258
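A minimal sketch of the Egger regression is given below, using the form in which the standard normal deviate (effect estimate divided by its standard error) is regressed on precision (the inverse of the standard error) by ordinary least squares. The function name is hypothetical and the sketch is not the routine used by the statistical software in this thesis. It returns the intercept, its standard error, and the slope; at least three studies with differing standard errors are required.

```python
import math


def egger_regression(effects, ses):
    """Regress SND (effect / SE) on precision (1 / SE).

    Returns (intercept, se_intercept, slope). An intercept that differs
    from zero suggests small-study effects such as publication bias.
    """
    snd = [y / s for y, s in zip(effects, ses)]
    precision = [1.0 / s for s in ses]
    n = len(snd)
    mean_x = sum(precision) / n
    mean_y = sum(snd) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, snd))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(precision, snd))
    s2 = rss / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept, slope
```

The intercept divided by its standard error can be compared against a t distribution with n − 2 degrees of freedom to obtain a p value.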

4.5 Results

4.5.1 Inclusion and characteristics of trials

Of the 400 RCTs included, 13 trials had insufficient data presented for their primary outcome for inclusion in meta-analysis, nine studies had non-parametric data, and three studies had zero events, leaving 375 studies available for analysis. Figure 4.1 depicts a flow diagram of these trials.

Figure 4.1 Flow diagram of studies included in the analysis for this chapter

RCTs had a high prevalence of unclear/inadequate reporting of quality domains: 215 trials (57%) had unclear random sequence generation, 208 trials (55%) had inadequate or unclear allocation concealment, 273 trials (73%) were unclear regarding blinding of their primary outcome, 272 trials (76%) had inadequate or unclear methods of dealing with attrition, and 73 trials (19%) were funded partially or wholly by an industry source.

The majority of studies were efficacy trials (360 trials, 96%), and most were parallel trials (355 trials, 95%). Most trial outcomes were graded as subjective in nature (319 trials, 85%), 205 outcomes were continuous (55%), and 243 outcomes (65%) were specified as primary either explicitly or by use in a sample size calculation. As previously reported in Chapter 2 (pages 69-71), a response rate of 35% was achieved in an emailed author survey, and substantial discordance was observed between survey answers and published reports (Tables 2.6 – 2.10).

4.5.2 Univariable analyses

Figure 4.2 is a forest plot of ratios of odds ratios (RORs) for each quality domain. Trials with inadequate or unclear methods of random sequence generation exaggerated effects by 17% on average (ROR= 0.83, 95% CI 0.70 – 0.98, p= 0.04, I2= 72%). Trials that did not report follow up rates and an analysis according to intention to treat exaggerated effects by 24% on average (ROR= 0.76, 95% CI 0.63 – 0.92, p= 0.005, I2= 72%). No significant association was found for allocation concealment, blinding, or trials with industry funding (Figure 4.2).

Figure 4.2 Effect exaggeration in surgical RCTs. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.

Similar associations were found between quality and effect estimates in a subgroup analysis of trials with subjective outcomes (Figure 4.3). Trials with inadequate or unclear randomisation sequence generation exaggerated outcomes by 19% on average (ROR= 0.81, 95% CI 0.67 – 0.98, p= 0.03, I2= 73%) and trials with inadequate methods of dealing with attrition exaggerated effects by 20% on average (ROR= 0.80, 95% CI 0.65 – 0.99, p= 0.04, I2= 73%). No associations were found for allocation concealment, blinding, or industry funding.

Figure 4.3 Effect exaggeration in surgical RCTs: Subjective outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.

In a subgroup analysis of trials with objective outcomes, an exaggeration of effect estimates was no longer noted for trials with inadequate or unclear randomisation sequences (ROR= 1.03, 95% CI 0.65 – 1.64, p= 0.90, I2= 70%) but trials with inadequate methods of dealing with attrition were still observed to exaggerate effects (ROR= 0.58, 95% CI 0.36 – 0.94, p= 0.03, I2= 64%). No other significant associations were noted (Figure 4.4).

Figure 4.4 Effect exaggeration in surgical RCTs: Objective outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.

A subgroup analysis was conducted for 243 trials (65% of the sample) that specified their primary outcome either explicitly or by the use of a sample size calculation (Figure 4.5). While an association was no longer observed for trials with inadequate or unclear randomisation sequences, trials with inadequate or unclear allocation concealment exaggerated effects by 19% on average (ROR= 0.81, 95% CI 0.66 – 0.99, p= 0.04, I2= 71%). Inadequate methods of dealing with attrition were again observed to exaggerate effects (ROR= 0.73, 95% CI 0.60 – 0.90, p= 0.003, I2= 71%).

A sensitivity analysis was also conducted using the total RCT sample, but incorporating answers received in a survey of authors (see Table 2.6). Since the survey did not enquire about issues related to attrition, this domain is not represented here (Figure 4.6). No evidence was found for effect estimate bias for any quality domain when survey responses were taken into account.

Figure 4.5 Effect exaggeration in surgical RCTs: Specified primary outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.

Figure 4.6 Effect exaggeration in surgical RCTs: Incorporating survey responses. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.

4.5.3 Multivariable analyses

When all quality domains were combined in metaregression analysis (giving adjusted estimates of quality domains in the presence of each other), no statistically significant associations were found (Figure 4.7). Similar findings were observed when only subjective outcomes were considered (Figure 4.8).

When only objective outcomes were analysed, an association was noted only for trials with inadequate methods of dealing with attrition (ROR= 0.55, 95% CI 0.33 – 0.92, p= 0.02, I2= 64%) (Figure 4.9). In sensitivity analyses, no associations were noted when only trials that specified their primary outcomes were analysed, nor when survey responses were taken into account (Figures 4.10 and 4.11).

Figure 4.7 Multivariable analysis of effect exaggeration in surgical RCTs. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.

Figure 4.8 Multivariable analysis of effect exaggeration in surgical RCTs: Subjective outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.

Figure 4.9 Multivariable analysis of effect exaggeration in surgical RCTs: Objective outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.


Figure 4.10 Multivariable analysis of effect exaggeration in surgical RCTs: Specified primary outcomes. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.

Figure 4.11 Multivariable analysis of effect exaggeration in surgical RCTs: Incorporating survey responses. A ratio of odds ratios of less than one indicates larger treatment effects in trials of inadequate methodology for that domain.


4.5.4 Publication bias

A standard funnel plot is presented in Figure 4.12, with effect estimates plotted on the horizontal axis and a corresponding measure of precision (the standard error of the effect estimate) on the y-axis. On visual inspection, most studies had similar levels of precision. There was an approximately “inverted funnel” shape, with evidence of asymmetry, and a higher concentration of studies significantly favouring the intervention group (the right side of the plot). A similar pattern was observed in a sensitivity plot including only outcomes specified as being primary (Figure 4.13).

To further elucidate this finding, a contour enhanced funnel plot is presented that plots effect estimates within contour lines representing different levels of statistical significance (Figure 4.14). An approximate inverted funnel shape is noted, but a much higher concentration of studies were noted in an area of the plot representing significant effects (p<0.05 and p<0.01) that favour the intervention group. Similar findings were observed in a sensitivity plot that included only outcomes specified as primary (Figure 4.15).

The Egger test was consistent with publication bias (Egger coefficient= 1.33, 95% CI 1.00 – 1.65, p<0.001), and is demonstrated graphically in Figure 4.16.

The standard normal deviate of the effect estimate and its 95% CI do not cross the origin. A sensitivity plot including only outcomes specified as primary (Figure 4.17) revealed a similar result.

Figure 4.12 Funnel plot with pseudo 95% confidence limits. Standard error of effect estimate on y-axis plotted against effect estimates (logarithmic scale) on x-axis.

[Funnel plot: y-axis, SE of log odds ratio; x-axis, odds ratio.]

Figure 4.13 Funnel plot with pseudo 95% confidence limits- Specified primary outcomes only. Standard error of effect estimate on y-axis plotted against effect estimates (logarithmic scale) on x-axis.

[Funnel plot: y-axis, SE of log odds ratio; x-axis, odds ratio.]

Figure 4.14 Contour enhanced funnel plot. Inverse standard error of effect estimate on y-axis plotted against effect estimates on x-axis. Shaded areas represent different levels of statistical significance for each effect estimate.

[Contour enhanced funnel plot: y-axis, inverse standard error; x-axis, effect estimate. Legend: studies; p < 1%; 1% < p < 5%; 5% < p < 10%; p > 10%.]

Figure 4.15 Contour enhanced funnel plot- Specified primary outcomes only. Inverse standard error of effect estimate on y-axis plotted against effect estimates on x-axis. Shaded areas represent different levels of statistical significance for each effect estimate.

[Contour enhanced funnel plot: y-axis, inverse standard error; x-axis, effect estimate. Legend: studies; p < 1%; 1% < p < 5%; 5% < p < 10%; p > 10%.]

Figure 4.16 Egger regression plot. Standard normal deviate (SND) of effect estimate on y axis plotted against precision on x-axis.

[Egger regression plot: y-axis, SND of effect estimate; x-axis, precision. Legend: study; regression line; 95% CI for intercept.]

Figure 4.17 Egger regression plot- Specified primary outcomes only. Standard normal deviate (SND) of effect estimate on y axis plotted against precision on x-axis.

[Egger regression plot: y-axis, SND of effect estimate; x-axis, precision. Legend: study; regression line; 95% CI for intercept.]

4.6 Discussion

A systematic review and metaregression analysis was conducted to examine whether surgical trials of low scientific quality exaggerate intervention effects.

It was found that trials that did not report an adequate method of random sequence generation exaggerated effects by 17% on average, and trials with inadequate methods of dealing with attrition exaggerated effects by 24% on average. There was also evidence that inadequate allocation concealment biased intervention effects, with a subgroup analysis of specified primary outcomes consistent with a 19% exaggeration of intervention effects. The objective nature of an outcome may also make it less prone to bias: inadequate reporting of randomisation sequences was associated with an exaggeration for subjective outcomes, but not for objective outcomes. It was also found that funnel plots were asymmetrical, with a higher concentration of studies significantly favouring the intervention group.

An adequate method of random sequence generation allows the allocation of patients to intervention groups by chance, while minimising the risk that participants are selected based on prognostic factors (selection bias).25 This method is the basis of the RCT, and it is not surprising that this investigation as well as most meta-epidemiology studies have found empirical evidence of effect exaggeration.39,53,55,264 Other methodological safeguards are used to preserve the integrity of the randomisation sequence. One of these is allocation concealment. Although a sequence may be totally unpredictable in nature, if known to those involved in the study it may be subject to manipulation.47 While other studies found evidence for exaggeration of effects in inadequately/unclearly concealed trials, the primary analysis here did not. There are a number of explanations for this. First, it is possible that these two methodological domains are conflated, with trials describing adequate methods of sequence generation also adopting adequate methods of concealment when the latter was not reported. The corollary to this may also apply, as trials that do not adopt adequate methods for sequence generation (such as alternate allocation or other methods that may be anticipated) are unlikely to be able to conceal that sequence. Second, in the context of poor reporting, it is likely that a portion of trials that did not report their method of sequence generation adopted sequence generation methods that are non-random in nature (exacerbating bias), but trials that did not report methods of concealment adopted adequate methods (a reduction in bias). This finding was confirmed by the survey of authors reported in Chapter 2. In addition, a sensitivity analysis including only outcomes specified as primary by trial authors found that inadequate allocation concealment was associated with an exaggeration of effects, while inadequate sequence generation was not.

Another methodological safeguard used to preserve the integrity of randomisation is the analysis according to the “intention to treat” principle. In this study, an adequate method of dealing with attrition was defined as the reporting of follow up rates for each group, as well as an analysis of participants based in the group to which they were randomised, or a statement by trial authors that the analysis was according to intention to treat.

The basis for bias in this domain is still largely theoretical, although one empirical study found that secondary analyses that exclude patients tend to favour the intervention group.265 Other studies have looked for evidence of bias in the context of missing outcome data, but no clear empirical evidence exists for an exaggeration of effect estimates.39,55,56,77 Ruiz-Canela et al found that intention to treat analysis was associated with an overall improvement of trial methodology, which may explain the effect estimate exaggeration found in this study.266 There is evidence, however, of a wide variation in the interpretation of the intention to treat principle,74 which may limit the findings presented here, which often relied on author reporting to determine adequate methods of dealing with attrition.

No associations were found for other quality domains, including blinding, and the declaration of funding. Blinding has previously been noted as difficult or impossible in trials of surgical interventions. Lack of blinding puts the trial at risk of performance bias through different behaviour patterns and/or expectations in the groups, and the degree of bias may be different depending on the nature of the outcome.70 The focus of this study was on whether the primary outcome was blinded (since that was the effect of interest), but since blinding methods were generally poorly reported (see Chapters 2 and 3), it was difficult to assess the success of blinding of participants and carers (as well as the outcome assessors). It was noted however, that when subjective and objective outcomes were separately assessed, the effect estimate in subjective outcome studies favoured the intervention group to a larger extent than objective studies. Previous empirical studies have found variable results in the extent of bias caused by the lack of blinding, with some demonstrating an association,39,53,54 while others did not.77,245 However, the BRANDO study, which pooled results from hundreds of meta-analyses from several different meta-epidemiology studies, and allowed adjustment for the objective nature of an outcome, found the strongest exaggeration of treatment effects in unblinded studies.56

The effects of sources of funding on publication of significant or favourable results are well documented,80,103 but the mechanisms of bias within a commercially funded trial remain unclear. Bias may be more likely due to a lack of equipoise, or a subjective interpretation of trial results.25 While some studies have concluded treatment effects are biased towards their commercial funders93,95-97 others have noted that funded trials are associated with an improvement in scientific quality.98 In this study, no consistent evidence was found that commercial sources of support biased intervention effects. It should be noted that a broad definition of support was used, such as the donation of prostheses or equipment to the trial from a commercial source (defined as partial industry funding), as well as financial or monetary support. It is possible that partial forms of support do not influence outcome estimates.

Funnel plots used in this study demonstrated asymmetry, with a majority of studies significantly favouring the intervention group. This finding was robust to our selection of the primary outcome for each trial. While this may be evidence of publication bias, there are also valid alternative explanations for this finding. Since funnel plots do not convey information about the methodological quality of the trial, the plots may also be due to biases resulting from the inadequate methodologies discussed above. Thus funnel plots should be interpreted as a means of examining “small study effects”, since smaller studies tend to have larger effect estimates, and are also more likely to be subject to different forms of bias.247 Given that the search and selection of studies for this investigation was comprehensive and systematic, the plots indicate that bias exists, either due to publication bias or otherwise. This was confirmed statistically by the Egger regression test, which showed strong evidence of funnel plot asymmetry, although the Egger test may be limited by a high false positive rate.262
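As an illustration of the Egger regression test referred to above, the following is a minimal sketch only, not the code used in this thesis. It assumes each trial contributes one effect estimate and its standard error, and it fits an unweighted least squares line of the standard normal deviate (effect divided by standard error) on precision (inverse standard error). The function name and the returned keys are hypothetical. An intercept far from zero suggests funnel plot asymmetry.

```python
import math


def egger_regression(effects, std_errors):
    """Regress SND (effect / SE) on precision (1 / SE) by ordinary least squares.

    Illustrative sketch: names and return structure are assumptions.
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three studies with matching inputs")

    snd = [e / s for e, s in zip(effects, std_errors)]
    precision = [1.0 / s for s in std_errors]
    n = len(snd)

    mean_x = sum(precision) / n
    mean_y = sum(snd) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("all studies have the same precision")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, snd))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance and the standard error of the intercept.
    residual_ss = sum((y - intercept - slope * x) ** 2
                      for x, y in zip(precision, snd))
    residual_var = residual_ss / (n - 2)
    se_intercept = math.sqrt(residual_var * (1.0 / n + mean_x ** 2 / sxx))

    # The t statistic is undefined when the fit is exact (zero standard error).
    t_stat = intercept / se_intercept if se_intercept > 0 else None
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_intercept, "t": t_stat}
```

The intercept and its standard error would then be compared against a t distribution with n - 2 degrees of freedom to judge asymmetry.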

Statistical techniques to investigate the influence of trial level characteristics on effect estimates are relatively new.267 Several approaches have previously been used. Perhaps the most common, initially described by Schulz and colleagues in 1995, involves the analysis of binary data from collections of meta-analyses.268 Trial level characteristics were compared within each meta-analysis, and their pooled odds ratios compared. Thus, for each meta-analysis a “ratio of odds ratios” (ROR) was calculated as a measure of the exaggeration of effect estimates in trials with and without a particular characteristic. RORs calculated from each included meta-analysis may then be pooled in either logistic regression models, or pooled further in meta-analysis to obtain an overall estimate of bias in the included sample.
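The ratio of odds ratios can be sketched for a single meta-analysis as follows. This is a simplified illustration rather than the method of any cited study: it assumes fixed effect inverse variance pooling within each group of trials and 2 x 2 tables with no zero cells, and the function names are hypothetical. Whether a value below or above one indicates exaggeration depends on how events are coded.

```python
import math


def pooled_log_or(tables):
    """Inverse variance pooled log odds ratio.

    Each table is (a, b, c, d): events and non-events in the intervention
    group, then events and non-events in the control group.
    Assumes no zero cells (an illustrative simplification).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for a, b, c, d in tables:
        log_or = math.log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d
        weighted_sum += log_or / variance
        weight_total += 1 / variance
    return weighted_sum / weight_total


def ratio_of_odds_ratios(inadequate_tables, adequate_tables):
    """Pooled OR in trials with inadequate methods divided by the pooled OR
    in trials with adequate methods, within one meta-analysis."""
    return math.exp(pooled_log_or(inadequate_tables)
                    - pooled_log_or(adequate_tables))
```

For example, with events coded as unfavourable outcomes, `ratio_of_odds_ratios([(10, 90, 20, 80)], [(15, 85, 20, 80)])` gives a value below one, meaning the trial with inadequate methods shows the more favourable odds ratio.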

This approach has gained a firm place in methodology research, and studies using it are termed “meta-epidemiology studies”.53,268-270 However, this approach requires the search and selection of meta-analyses, and these were not the units of study in this section of the thesis. Instead, a meta-regression approach was adopted in a similar fashion to previous studies that have compared effect sizes across broad categories of healthcare trials.271-273 The main advantage of this approach is that it does not rely on the availability of eligible randomised trials within meta-analyses. Indeed, any group of trials of interest may be assessed.273 In order to coherently compare studies, effect sizes may be standardised (allowing the comparison of both binary and continuous data), but this requires that effect sizes meet the assumption of a normal distribution.254 In this chapter, nine studies were excluded as their effect sizes were clearly not normally distributed. Another advantage of the meta-regression approach is that a larger number of covariates may be examined in the presence of each other (as in the multivariable analysis found within this chapter), whereas the meta-epidemiology approach is limited in this respect by the number of studies contained within each meta-analysis.

The strengths of this study lie in the protocol driven systematic review design, and in the piloting and checking of the data. The main limitation is the observational nature of the study design, and the risk of confounding. Indeed, the use of metaregression methods to determine treatment effect exaggeration may itself be confounded by the size of the treatment effect. Smaller studies (as demonstrated by funnel plots) tend to have larger effect sizes (in either direction) and may also be more likely to be of lower scientific quality,274 resulting in confounding. However, there is empirical evidence that trial methodology rather than study size is responsible for exaggeration of effects. A meta-epidemiology study by Kjaergard and colleagues found that intervention effects in small trials with adequate methodological characteristics did not significantly differ from large trials. However, inadequate methodology (which is more prevalent in small trials) was associated with effect exaggeration, and is likely to be the main reason why effect estimates are exaggerated in small trials.55 A substantial amount of residual heterogeneity was also observed in multivariable analyses. The scientific quality domains included in these analyses explained only a small proportion of heterogeneity. Clearly, there were many sources for study differences, since a wide variety of interventions, patient groups and subspecialties were included.

In conclusion, this chapter demonstrates that bias exists in published surgical RCTs. This bias exists at the trial level: inadequate methodology related to random sequence generation, allocation concealment, and dealing with attrition was associated with an exaggeration of treatment effect. It may also exist at an aggregate level in the form of publication bias. This research adds to what is known on the theoretical and empirical importance of adequate methodology in randomised trials, and emphasises the need for well designed studies, the need for appropriate review and reporting in peer reviewed journals, and the need for careful appraisal of methodological quality prior to introducing practice change. However, little is known about whether bias exists at the outcome level in surgical trials. The issue of selective reporting of outcomes, based on statistical significance or other factors unrelated to their clinical importance, is perhaps the next challenge in the efforts to improve trial methodology, and is investigated in further detail in the following two Chapters.

5. Outcome reporting bias in surgical randomised trials: A systematic review and meta-analysis

5.1 Abstract

Background: In recent years, much attention has been directed at the publication of research based on the significance of results, but few investigations have focussed on within-study reporting biases. The reporting of study outcomes based on their significance, or outcome reporting bias, is akin to publication bias, and may overestimate the benefits and underestimate the harms of interventions. This study aimed to quantify the extent to which outcomes measured in surgical randomised trials are completely reported, as well as the association between statistical significance and completeness of reporting.

Methods: A systematic review of 350 recently published surgical intervention RCTs was conducted using a comprehensive online search strategy of MEDLINE, EMBASE and CENTRAL, and this was supplemented with data obtained from a search of online trial registries, and a survey of authors. Each trial’s outcomes were extracted and characteristics related to level of reporting, statistical significance, and harm/efficacy status were recorded. An outcome was defined as fully reported when sufficient data was reported for inclusion in a meta-analysis, and otherwise graded as partially reported (some but not all data required for meta-analysis), qualitatively reported (only a statement of significance) or completely unreported (where there was evidence of measurement but no data presented). At the trial level, a 2x2 table was populated with outcomes graded as fully reported (yes/no) and statistically significant (yes/no), and an odds ratio was calculated. Odds ratios were pooled in random effects meta-analysis to obtain an overall estimate of outcome reporting bias. Efficacy and harm outcomes were also explored separately in subgroup analysis.

Results: A total of 8,258 outcomes (4,141 efficacy and 4,117 harm) were extracted from 350 RCTs. As a proportion of outcomes reported in an average trial, 74% were fully reported, 13% were partially reported, 5% were qualitatively reported, and 8% were entirely unreported. Evidence of at least one unreported outcome was found in 123 trials (35%). Statistically significant outcomes were 2.4 times more likely to be fully reported than non-significant outcomes (OR = 2.4, 95% CI 1.7 – 3.3). This bias was larger when only efficacy outcomes were considered (OR = 3.3, 95% CI 2.0 – 5.6), but smaller when only harm outcomes were considered (OR = 2.2, 95% CI 1.3 – 3.8).

Conclusions: In a systematically obtained sample of surgical RCTs, there was a high prevalence of incomplete reporting of outcomes, and evidence of outcome reporting bias. Registration of trial protocols should be mandatory prior to trial commencement, and made accessible at the time of peer review and publication.

5.2 Introduction

In recent years, much attention has been focused on issues related to the internal validity of clinical trials56 as well as the bias related to the publication and dissemination of the evidence. There is considerable evidence that publication of studies favours those with statistically significant or favourable intervention results.84 Furthermore, studies with significant results are more likely to be published earlier81,82 and cited by other authors.246,275 The implication is that the pooled evidence from published studies incorporates this bias, and leads to an exaggeration of effect estimates.247

A correlate of publication bias that has been widely suspected, but only recently explored, is the selective reporting of outcomes. Selective outcome reporting is defined as the publication of a subset of the original outcome variables measured, on the basis of their results.276 Given the difficulty in identifying unpublished outcomes, evidence of selective outcome reporting was previously restricted to editorials277,278 or small case reports.279 More recently however, the work of Chan and colleagues has provided more concrete evidence. In two separate investigations that compared trial protocols submitted to ethics committees, a high prevalence of incompletely reported outcomes was found.24,88 In addition, outcomes that were statistically significant were more likely to be reported in full. The effects of selective outcome reporting are akin to publication bias, in that pooled treatment effects may be biased. This has been explored in a number of statistical models276,280, which demonstrate that in meta-analysis, an inverse relationship exists between the proportion of trials contributing to the meta-analysis, and effect size.280 The findings of Chan and colleagues have since been supported by a cohort of pharmaceutical trial protocols,281 but otherwise few empirical studies have documented the nature and extent of outcome reporting bias.

Little is known about how outcome reporting bias affects trials of surgical interventions. Hall and Hall reported the patterns of outcome reporting in surgical trial publications and found they often fail to specify outcomes and statistical comparisons of interest, but they did not investigate whether outcomes are selectively reported.282 Surgical trials may have smaller sample sizes than drug trials, and therefore be underpowered to detect differences in a range of measured outcomes. Furthermore, surgical trials are often subject to the equipoise of their surgeon investigators, which may exacerbate the problems of publication bias, as well as the selective reporting of outcomes.

In this Chapter, the association between outcome reporting and statistical significance in trials of surgical interventions is reported.

5.3 Aims and hypothesis statements

A primary aim of this study was to assess the prevalence of incompletely reported (and unreported) outcomes in recently published trials of surgical interventions. The hypothesis was that a high prevalence of incomplete outcome reporting (at least 50% as per previous evidence) exists in trials of surgical interventions.

A primary aim of this study was to assess the association between statistical significance and incomplete outcome reporting. The hypothesis was that outcomes that are statistically significant are more likely to be completely reported than outcomes that are not statistically significant.

A secondary aim was to examine the association between statistical significance and incomplete outcome reporting in a subgroup of efficacy and harm outcomes. The hypothesis was that the association between selective outcome reporting and statistical significance was larger for efficacy outcomes than harm outcomes.

A secondary aim was to explore the predictors and reasons for incomplete outcome reporting in surgical trials, through a survey of authors and exploratory metaregression analysis of trial characteristics as explanatory variables.

5.4 Methods

5.4.1 Study design

Systematic review and meta-analysis of published RCTs assessing a surgical intervention. This chapter is reported according to the PRISMA statement guidelines.283 The protocols for this thesis were not registered, but were peer reviewed through the usual process of the University of New South Wales. A copy of the protocols is available from the author.

5.4.2 Inclusion of studies for the review

For details on the eligibility criteria, search strategy, and study identification, refer to Chapter 2 methods. The most recent 350 trials were used for this investigation. Since little is known about the appropriate sample size calculation given the hypotheses above, this number was chosen to adequately power the metaregression analyses conducted, and was comparable to previous studies in this area.24,281

5.4.3 Data extraction

An electronic proforma was created for data extraction, and was piloted by three researchers using a random sample of ten trials. Trials were not masked as no evidence was found that this reduces bias in methodological studies.202,203 Data points and definitions were calibrated after discussion. The remaining data were then extracted from each trial by one researcher. Another researcher then checked a random sample of 100 included RCTs, and disagreements were resolved by discussion. One data point from five trials (5% of the checked sample) was changed after data checking took place.

5.4.4 Data items

General and scientific quality reporting characteristics were extracted and used both to describe the trial sample as well as for the exploratory metaregression analyses described below. This included author background, study type, study design, impact factor of journal, total sample size, multicentre status, and trial registration. Scientific quality characteristics extracted were the reporting of random sequence generation, allocation concealment, blinding, attrition, and source of funding. Tables 2.2 and 2.3 contain the operational definitions (see Chapter 2) of these variables used in this thesis.

Outcomes were defined as variables used to compare the randomised groups in a trial, in order to assess the efficacy or harm of an intervention.24 Variables that were only measured in one group, or which were measured before an intervention was administered (e.g. a baseline characteristic) were not regarded as outcomes. For each trial, all outcomes were extracted along with their level of reporting, statistical significance and efficacy or harm status.

The definition of Chan and colleagues was used to class the level of reporting of outcomes. Table 5.1 presents the hierarchy of outcome reporting used. Level of reporting was classed as fully reported, defined as the presentation of sufficient information to include the outcome in meta-analysis (see Table 5.2) or not fully reported. Outcomes that were not fully reported (referred to here as incompletely reported outcomes) were further classed into three groups based on the amount of information presented for that outcome. A partially reported outcome contained some, but not all, necessary information for inclusion in meta-analysis, such as an effect size without a measure of precision. A qualitatively reported outcome was one where only a statement of statistical significance (or a p value alone) was reported in the trial. An unreported outcome was one where no results whatsoever were presented for that outcome, despite clear evidence that the outcome was measured, gained either from the trial methods section, a published/online trial protocol, or information gained from the trial’s registry. Incompletely reported outcomes were a composite of partially reported, qualitatively reported, and unreported outcomes.24
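The hierarchy of reporting levels described above can be sketched as a small decision rule. This is an illustration under stated assumptions, not the extraction proforma itself: it covers only unpaired continuous outcomes, it assumes the outcome is already known to have been measured, and the enum, function and parameter names are hypothetical.

```python
from enum import Enum


class ReportingLevel(Enum):
    FULL = "fully reported"
    PARTIAL = "partially reported"
    QUALITATIVE = "qualitatively reported"
    UNREPORTED = "unreported"


def classify_reporting(has_group_sizes, has_effect_size, has_variability,
                       has_exact_p, has_significance_statement):
    """Grade one measured outcome (unpaired continuous data assumed).

    has_variability: a confidence interval, standard deviation,
    standard error or range accompanies the result.
    """
    # Full reporting: enough data for inclusion in a meta-analysis.
    if has_group_sizes and has_effect_size and (has_variability or has_exact_p):
        return ReportingLevel.FULL
    # Partial reporting: some numerical results, but not all required data.
    if has_effect_size or has_variability:
        return ReportingLevel.PARTIAL
    # Qualitative reporting: only a p value or a statement of significance.
    if has_exact_p or has_significance_statement:
        return ReportingLevel.QUALITATIVE
    # No results at all for an outcome known to have been measured.
    return ReportingLevel.UNREPORTED


def is_incompletely_reported(level):
    """Incomplete reporting is the composite of all levels below FULL."""
    return level is not ReportingLevel.FULL
```

An effect size given without group sizes or any measure of precision would therefore be graded as partially reported, and a bare "p = 0.03" as qualitatively reported.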

The statistical significance of each outcome was also recorded based on what was reported by the authors, or the p value (or 95% confidence interval) presented. For the vast majority of studies, a p value <0.05 (or a 95% confidence interval that did not include the null effect size) was regarded as statistically significant. For some studies which pre-specified a level of statistical significance, that level was regarded as significant. Outcomes that did not have any information presented related to statistical significance were marked as unclear, and were not included in the analyses presented here that relied on this information.
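The rule for recording statistical significance can be sketched as below. The function and its parameters are hypothetical, and the order of precedence (author statement, then p value, then confidence interval) is an assumption made for illustration. The null value would be 0 for differences and 1 for ratio measures.

```python
def classify_significance(p_value=None, ci=None, null_value=0.0,
                          alpha=0.05, author_statement=None):
    """Return True (significant), False (not significant) or None (unclear).

    alpha defaults to 0.05 but may be set to a trial's pre-specified level.
    ci is a (lower, upper) tuple for the effect estimate.
    """
    if author_statement is not None:
        return bool(author_statement)
    if p_value is not None:
        return p_value < alpha
    if ci is not None:
        lower, upper = ci
        # Significant when the interval excludes the null effect size.
        return not (lower <= null_value <= upper)
    # No information on significance: recorded as unclear and
    # left out of analyses that depend on significance.
    return None
```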

Finally, the efficacy or harm status of an outcome was also recorded. Efficacy outcomes were defined as those measuring any intended or desirable effects of the intervention. A harm outcome, its direct opposite, compared any harmful or undesirable effects of the intervention.284 The efficacy/harm status of outcomes was used to conduct the planned subgroup analyses described below.

Table 5.1 Levels of outcome reporting (Adapted from Chan et al. 2004)24

FULL REPORTING

Outcome has sufficient data for inclusion in meta-analysis, including the sample size in each group, effect size, and a measure of precision for continuous outcomes (see Table 5.2)

INCOMPLETE REPORTING

Partially reported: Only some (but not all) data required for inclusion in meta-analysis is provided, such as an effect size alone or precision alone (with or without a sample size or p value).

Qualitatively reported: Only a statement of significance is provided, or a p value alone (with or without a sample size).

Unreported outcomes: Outcome measurements that are entirely unreported in the published report, despite evidence they were actually measured. This was determined in one of three ways:
i) The outcome was mentioned in the methods section but not the results section
ii) The outcome was identified in a survey of authors
iii) A discrepancy existed between the outcomes listed in the trial’s registry, and the published report.

Table 5.2 Information required for classification as a fully reported outcome (or sufficient outcome data for inclusion in meta-analysis). Adapted from Chan et al. 2004.24

Unpaired continuous data
• Group numbers and
• Size of treatment effect (group means or medians or difference in means or medians) and
• Measure of precision or variability (confidence interval, standard deviation, or standard error for means; range for medians) or the exact P value

Unpaired binary data
• Group numbers and
• Numbers of events or event rates in each group

Paired continuous data
• Mean difference between groups and a measure of its precision or exact P value or
• Raw data for each participant

Paired binary data
• Paired numbers of participants with and without events

Survival data
• Kaplan-Meier curve with numbers of patients at risk over time or
• Hazard ratio with a measure of precision

5.4.5 Survey of authors

A survey of authors was also carried out as per the methods described in Chapter 2. A copy of the online survey may be found in Appendix 6. In brief, all authors were sent a personalised email inviting them to participate in a study investigating the quality of published randomised trials. Authors were asked whether their published report contained all the outcomes measured, and whether any outcomes were not reported. If any outcomes were not reported, authors were asked to indicate the main reason(s) why that outcome was not reported, including journal/word restrictions, publication in another paper, not statistically significant, not clinically important, or any other reason, which they were asked to specify.

Unreported outcomes found using the author survey were included in the analyses below when the significance was specified in the survey, except if the authors indicated the unreported outcome was published in another article. The reasons provided for not reporting outcomes were noted.

5.4.6 Search of trial registries

When trial registry details were provided in the published report, these were retrieved. When registry details were not provided in the published report, a keyword search was used on the World Health Organisation (WHO) Clinical Trial Portal (http://apps.who.int/trialsearch/) in order to find any registry record. The WHO Clinical Trial Portal was selected as it combines metadata from all major international trial registries (such as clinicaltrials.gov, ISRCTN, EU-CTR, and the ANZ-CTR), as well as many country specific registries.205 It is also updated at least monthly. When a trial registration number was reported, its registration information was readily identified using the Portal. If the authors provided no registration details, a combination of keywords from the title and subject were used to identify relevant registry records. These were then cross-referenced with author names, their institutions, the interventions used, and the study design, in order to select the correct record.210

For each registered trial, the list of planned outcome measurements contained in the registry was cross-referenced with the published report. Any unreported outcomes were noted and described in the analyses below.

5.4.7 Data analysis

The total number of outcomes presented in each trial was summarised in frequency histograms, and proportions for each level of reporting (Table 5.1), statistical significance (yes/no) and efficacy/harm status were calculated.

For each trial, a contingency (2 x 2) table was populated with that trial’s outcomes, describing whether each outcome was fully reported vs. incompletely reported, and whether each outcome was statistically significant (yes vs. no). Outcomes that were unclear regarding their statistical significance were not included in this analysis. If the contingency table contained a single zero cell, or two diagonal zero cells, 0.5 was added to all four cells as per the default of standard meta-analysis statistical packages.251 When a whole row or column contained zero cells, an odds ratio was incalculable and that trial was excluded from this statistical analysis. Calculations were performed such that an odds ratio greater than one meant that a statistically significant outcome had higher odds of being fully reported when compared to an outcome that was not statistically significant,284 implying a higher degree of outcome reporting bias. Odds ratios were then combined in random effects meta-analysis and a summary odds ratio (along with its 95% confidence interval and I² as a measure of heterogeneity) was calculated as an overall indicator of the level of selective outcome reporting bias in surgical RCTs.
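The trial level calculation and pooling described above can be sketched as follows. This is a minimal illustration, not the statistical package used in this thesis. The DerSimonian-Laird estimator is assumed for the random effects model, since the text specifies only "random effects meta-analysis", and the cell labels a to d and all function names are hypothetical.

```python
import math


def trial_log_odds_ratio(a, b, c, d):
    """Log odds ratio and variance for one trial's outcomes.

    a: significant and fully reported      b: significant, incompletely reported
    c: non-significant and fully reported  d: non-significant, incompletely reported
    Returns None when a whole row or column is zero (trial excluded).
    """
    if a + b == 0 or c + d == 0 or a + c == 0 or b + d == 0:
        return None
    if 0 in (a, b, c, d):
        # Single zero cell or two diagonal zero cells: add 0.5 to every cell.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def pool_random_effects(estimates):
    """DerSimonian-Laird pooling of (log_or, variance) pairs (assumed method)."""
    y = [e for e, _ in estimates]
    v = [var for _, var in estimates]
    w = [1 / var for var in v]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_star = [1 / (var + tau2) for var in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci_95": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "i2_percent": i2,
        "tau2": tau2,
    }
```

A pooled odds ratio above one from `pool_random_effects` would correspond to statistically significant outcomes having higher odds of full reporting.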

In order to explore whether (trial level) author, journal or report characteristics were associated with bias, exploratory random effects metaregression was carried out using the restricted maximum likelihood method.285 Several variables were identified based on their theoretical association with selective reporting of outcomes, including industry funding, an author with an epidemiology/statistics background, study type (superiority vs. non-inferiority), study design (parallel vs. cross-over/split-body), journal impact factor,286 total sample size, multicentre status (yes/no), trial registration (yes/no) and whether authors responded to the survey (yes/no).

The exponential of the coefficient was calculated for each explanatory variable in univariable metaregression models, and a p value and 95% confidence interval calculated for the coefficient. Eligible variables with a p value <0.25 were then combined in multivariable metaregression models and a statistically significant association was defined for any variable with a p value <0.05.

5.4.8 Subgroup and sensitivity analyses

The primary analysis included all outcomes, regardless of efficacy/harm status, since harm outcomes are often used as primary outcomes in superiority trials of surgical interventions, and previous studies24,284 found evidence of bias for both efficacy and harm outcomes. However, a planned subgroup analysis was also conducted for efficacy and harm outcomes, since the theoretical effect of selective reporting bias is different depending on the outcome type. For efficacy outcomes, statistical significance is more likely to be reported than non-significance, but for harm outcomes, non-significance may be more favourable to the intervention and therefore more likely to be reported than significance. Efficacy and harm outcomes were considered separately at the trial level, and separate 2 x 2 contingency tables were constructed in order to calculate an odds ratio in an identical fashion to the above. Odds ratios were pooled (in random effects meta-analysis) in order to obtain an overall estimate of bias for efficacy and harm outcomes separately, and 95% confidence intervals and I² values as a measure of heterogeneity were estimated.

To examine whether the estimates of bias were robust to the classifications of outcome reporting level used, a different dichotomy for reporting level was used in the 2 x 2 contingency tables. Instead of full reporting vs. incomplete reporting (which included partial, qualitative, and unreported outcomes), a dichotomy of full or partial reporting vs. qualitative or unreported outcomes was used. Odds ratios were calculated and pooled in an identical fashion to the above in order to estimate bias for this sensitivity analysis.
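The two dichotomies can be illustrated by building the 2 x 2 table for one trial under each scheme. This is a sketch only: the level labels, dictionary names and function name are hypothetical, and outcomes with unclear significance are passed as None and skipped.

```python
# Which reporting levels count as "reported" under each dichotomy.
PRIMARY_SCHEME = {"full": True, "partial": False,
                  "qualitative": False, "unreported": False}
SENSITIVITY_SCHEME = {"full": True, "partial": True,
                      "qualitative": False, "unreported": False}


def build_table(outcomes, scheme):
    """Return (a, b, c, d) for one trial.

    outcomes: iterable of (level, significant) pairs, where significant is
    True, False, or None when significance was unclear.
    a: significant and reported       b: significant and not reported
    c: non-significant and reported   d: non-significant and not reported
    """
    a = b = c = d = 0
    for level, significant in outcomes:
        if significant is None:
            continue  # unclear significance is excluded from the table
        reported = scheme[level]
        if significant and reported:
            a += 1
        elif significant:
            b += 1
        elif reported:
            c += 1
        else:
            d += 1
    return a, b, c, d
```

Moving partially reported outcomes into the "reported" group changes only cells within each row, so the same odds ratio calculation applies to both schemes.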

5.5 Results

Execution of the search strategy took place in May 2009, as per the methods outlined in Chapter 2. Three hundred and fifty RCTs were included as per the flow diagram presented in Figure 5.1. Three hundred and one invitations were sent to corresponding authors. Of these, 110 responses were received (31% response rate) after three reminders (Figure 5.2).

Figure 5.1 Flow diagram of study inclusion for this chapter

Database searches:
MEDLINE via Ovid (Week 3, May 2009): 8,214 hits
EMBASE via Ovid (Week 21, 2009): 6,269 hits
CENTRAL via Ovid (Issue 2, 2009): 1,613 hits

Combined hits = 16,096
Excluded: Duplicates = 3,422

Abstracts assessed = 12,674
Excluded: Not randomised trial = 8,276; Not surgical intervention = 3,247; Not English language = 112; Not full text = 24

Full text articles reviewed = 1,015
Excluded: Not randomised trial = 351 (described as "randomised" = 29); Not surgical intervention = 125; Secondary publication = 64; Not English language = 24; Duplicate publications = 11; Not full text = 11; Prior to cut off date = 79

Randomised trials included = 350

Figure 5.2 Flow diagram of survey responses

Most trials were superiority (96%), and of parallel design (94.5%). Median total sample size was 76, and median journal impact factor was 2.3. Trial registry details were reported in 16% of trials and found online in a further 11% of trials (27% total). Most trials had inadequate or unclear reported methods for random sequence generation, allocation concealment, blinding, handling of attrition, and source of funding. Table 5.3 presents the trial level characteristics of included RCTs.

Table 5.3 Characteristics of RCTs analysed for issues related to outcome reporting

Characteristic                          n* (Total N=350)
Author with epi/stats degree            62 (18%)
Study type
  Superiority / efficacy                335 (96%)
  Non-inferiority / equivalence         15 (4%)
Study design
  Parallel                              331 (94.5%)
  Split body                            17 (5%)
  Crossover                             2 (0.5%)
Median journal impact factor (IQR)      2.3 (1.9)
Median total sample size (Range, IQR)   76 (8 – 2352, 91)
Multicentre                             78 (22%)
Trial registration
  Reported in text                      57 (16%)
  Not reported but found online         37 (11%)
  Unclear                               256 (73%)
Primary outcome(s) specified            225 (64%)
Generation of random sequence
  Adequate                              147 (42%)
  Unclear                               203 (58%)
Allocation concealment
  Adequate                              155 (44%)
  Inadequate or unclear                 195 (56%)
Blinding
  Any blinding                          125 (36%)
  Patient                               60 (17%)
  Carer                                 28 (8%)
  Outcome assessor                      110 (31%)
  Primary outcome blinded               97 (28%)
  Unclear                               225 (64%)
Handling of attrition
  Intention to treat                    93 (27%)
  Only follow up reported               228 (65%)
  Inadequate                            29 (8%)
Source of funding
  Reported                              153 (44%)
    Full industry                       9 (6%)
    Partial industry                    61 (40%)
    Non-industry                        49 (32%)
    No external source                  34 (22%)
  Unreported                            197 (56%)

A total of 8,258 outcomes were reported in the included RCTs, comprised of 4,141 efficacy outcomes and 4,117 harm outcomes. On average, 24 outcomes were reported per trial (standard deviation = 22), with a range between 1 and 231 outcomes. Figure 5.3 depicts a frequency histogram of the total number of outcomes reported in surgical trials.

Figure 5.3 Frequency histogram of number of outcomes in surgical trials (x-axis: number of outcomes; y-axis: number of trials)

As a proportion of outcomes reported in each trial, approximately three quarters of outcomes were fully reported, 13% were partially reported, 5% were qualitatively reported, and 8% were unreported (Table 5.4). Figure 5.4 shows a wide variation in the proportion of outcomes fully reported in surgical trials. Most outcomes were either continuous or binary, were not statistically significant (approximately 52%), and were equally distributed between harm and efficacy status, per trial. There were small differences in these proportions when stratified for trial level characteristics (see Table 5.4).

Table 5.4 Mean proportions of outcome characteristics per trial, stratified by trial characteristics

                        All           Superiority   Non-          Parallel      Split-body/   Full or part  Non-
                        trials        trials        inferiority   trials        cross-over    industry      industry
                                                    trials                      trials        funding       funding

Number of trials        350           335           15            331           19            70            83

Variable type
  Continuous (%)        48            49            39            50            12            44            52
  Binary (%)            42            41            57            44            5             40            39
  Paired continuous (%) 3             3             1             1             44            6             4
  Paired binary (%)     2             2             0             0             37            3             1
  Survival (%)          2             2             1             2             0             2             0
  Unclear (%)           3             3             3             3             2             5             3

Level of reporting
  Full (%)              74            75            68            75            68            71            74
  Part (%)              13            13            13            13            13            14            13
  Qualitative (%)       5             5             6             5             5             4             6
  Unreported (%)        8             8             13            8             14            11            7

Statistical significance
  Yes (%)               22            22            23            22            23            18            23
  No (%)                52            52            43            52            52            53            53
  Unclear (%)           26            26            33            26            25            29            24

Efficacy / harm
  Efficacy (%)          49            49            60            49            48            60            53
  Harm (%)              51            51            40            51            52            40            47

Figure 5.4 Frequency histogram of proportion of outcomes fully reported in surgical trials (x-axis: proportion of trial outcomes fully reported; y-axis: number of trials)

Figure 5.5 Frequency histogram of number of unreported outcomes in surgical trials (x-axis: number of unreported outcomes; y-axis: number of trials)

Evidence of at least one unreported outcome was found in 123 trials. Figure 5.5 depicts the distribution of these outcomes amongst these trials. In total 812 outcomes were unreported, with a median of 3 outcomes per trial (interquartile range = 6, range 1 – 87). Eighteen trial authors indicated at least one outcome was unreported in their publication, with a total of 46 unreported outcomes found using the survey. The main reasons given for unreported outcomes were publication in another paper (19/46, 41%), word limits (17/46, 37%), not statistically significant (11/46, 24%), and not clinically important (6/46, 13%). (Authors could select more than one reason per outcome.)

There was evidence of an association between level of reporting and statistical significance. In the primary analysis including all outcomes, statistically significant outcomes had 2.4 times the odds of being fully reported compared with non-significant outcomes (OR = 2.4, 95% CI 1.7 – 3.3, Figure 5.6). This bias was somewhat larger when only efficacy outcomes were considered (OR = 3.3, 95% CI 2.0 – 5.6, Figure 5.7), but smaller when only harm outcomes were considered (OR = 2.2, 95% CI 1.3 – 3.8, Figure 5.8).

These findings were also robust to a sensitivity analysis using a different cutoff of reporting in the 2 x 2 contingency tables (full and partial reporting vs. qualitative and unreported outcomes). Table 5.5 contains summary statistics for these analyses. It should be noted that not all trials contributed to the analyses, since many had zero cells in a whole row or column in 2 x 2 tables, and therefore odds ratios were not calculable. It was also noted that individual trials had calculated odds ratios with very wide confidence intervals (see Figures 5.6 – 5.8), but pooled estimates had much higher precision.

Figure 5.6 Forest plot: Association between level of reporting and statistical significance - All outcomes (pooled OR 2.4, 95% CI 1.7 – 3.3; x-axis: odds ratio on a log scale from 0.001 to 1000, with values below 1 favouring reporting of non-significance and values above 1 favouring reporting of significance)

Figure 5.7 Forest plot: Association between level of reporting and statistical significance - Efficacy outcomes (pooled OR 3.3, 95% CI 2.0 – 5.6; axes as for Figure 5.6)

Figure 5.8 Forest plot: Association between level of reporting and statistical significance - Harm outcomes (pooled OR 2.2, 95% CI 1.3 – 3.8; axes as for Figure 5.6)

Table 5.5 Pooled odds ratios of selective outcome reporting bias

Analysis / outcome group      No. trials    OR (95% CI)      I2
Primary analysis (full vs. incompletely reported outcomes)
  All outcomes                127           2.4 (1.7–3.4)    35%
  Efficacy outcomes           69            3.3 (2.0–5.6)    46%
  Harm outcomes               52            2.2 (1.3–3.8)    26%
Sensitivity analysis (full/partially reported vs. qualitative/unreported outcomes)
  All outcomes                77            3.2 (2.2–4.7)    4%
  Efficacy outcomes           40            4.4 (2.6–7.4)    0%
  Harm outcomes               30            3.5 (1.8–6.6)    9%

Table 5.6 Results of exploratory meta-regression examining for variables with associations with selective outcome reporting bias

                              All outcomes                Efficacy outcomes           Harm outcomes
Variable                      OR (95% CI)        P value  OR (95% CI)        P value  OR (95% CI)        P value
Industry funding              1.1 (0.4 – 3.0)    0.8      1.3 (0.3 – 5.3)    0.7      1.0 (0.1 – 7.4)    1.0
Author epi/stats degree       1.1 (0.5 – 2.7)    0.8      1.2 (0.3 – 4.4)    0.8      0.5 (0.1 – 2.6)    0.4
Non-inferiority trial         1.4 (0.4 – 4.9)    0.6      1.7 (0.4 – 7.8)    0.5      *                  *
Split-body/crossover trial    1.4 (0.2 – 8.4)    0.7      1.9 (0.1 – 30.1)   0.6      1.0 (0.1 – 12.5)   1.0
Journal impact factor         1.0 (0.9 – 1.1)    0.1      1.0 (0.9 – 1.1)    0.1      0.9 (0.8 – 1.1)    0.2
Total sample size             1.0 (0.9 – 1.1)    0.2      1.0 (0.9 – 1.1)    0.1      1.0 (0.9 – 1.1)    1.0
Multicentre trial             1.0 (0.65 – 1.5)   0.9      1.1 (0.6 – 2.3)    0.7      1.0 (0.5 – 1.8)    0.9
Registered trial              0.9 (0.6 – 1.4)    0.8      0.9 (0.5 – 1.6)    0.7      0.7 (0.4 – 1.5)    0.4
Responded to survey           1.4 (0.6 – 2.9)    0.4      0.8 (0.2 – 2.5)    0.7      2.8 (0.8 – 9.8)    0.1

* Evidence of substantial collinearity - odds ratio not calculable.

Table 5.6 presents results of exploratory metaregression to examine for associations between selective outcome reporting bias and trial level characteristics. There were no significant associations found for the primary analysis, or the subgroup analyses of efficacy and harm outcomes, for any of the variables examined.

5.6 Discussion

This chapter examined a sample of surgical trials using a systematic search of the literature, and assessed the completeness of reporting of outcomes and its association with statistical significance. It was found that there was a high prevalence of incomplete reporting. One in four outcomes did not have enough information for combination in meta-analysis. There was evidence of at least one unreported outcome in 123 trials (35% of the sample), with a median number of three unreported outcomes per trial. Using an online author survey, a response rate of 31% was achieved, and provided further information regarding reasons for unreported outcomes. Pooled analyses showed a consistent association favouring complete reporting for statistically significant outcomes, when compared to non-significant outcomes.

Incomplete reporting of outcomes in trials is a substantial problem, and has implications for the synthesis and dissemination of evidence.91,276 Assuming all randomised trials are identified for a particular systematic review of a surgical intervention, these findings suggest that approximately 25% of outcomes require further information (from the authors or otherwise) in order to be included in pooled statistical analyses, and therefore their use may be limited to a qualitative or narrative description. Split body trials, and industry funded trials had lower rates of complete reporting. Split body trials may face particular challenges with the presentation of data, since outcomes are usually paired in nature, and often require more information for inclusion in pooled analyses. Industry funded trials also had slightly lower rates of complete reporting, due to a higher prevalence of unreported outcomes than other trials. This data is of particular concern given the historical issues with the dissemination of evidence sourced from commercial funding.101,287 It was noted that this study showed lower rates of incomplete outcome reporting than previous work conducted by Chan and colleagues,284 who reported rates of incomplete reporting of 42% for efficacy outcomes and 50% for harm outcomes. However, Chan et al used a sample of general medical trials published in December 2000 (compared to the sample used in this study from 2008-2009), representing a substantial time lag. Furthermore, harm outcomes were often incompletely reported in their study, but were often the focus of surgical trials (see Chapter 6).

Completely unreported outcomes represent a particular problem, since they are often difficult or impossible to detect. In this study, the trial report, an author survey, and registry details were used to detect unreported outcomes, since trial protocols are difficult to obtain and often unavailable.288 It is therefore likely that the prevalence of unreported outcomes reported here has been underestimated. Although considerable effort may have gone into the planning and measurement of outcomes in trials, their lack of reporting essentially implies a loss of potentially valuable data, and since reviewers are unaware of their existence, they are unlikely to be included in systematic reviews and meta-analyses. The main reasons provided by authors for the unreported outcomes were publication in another journal article (which implies those outcomes may be accessible elsewhere), word limitations, a lack of statistical significance (in 24% of cases) and a perceived lack of clinical significance. The reasons provided were very similar to those found in a previous study.284

Outcome reporting bias exists in this systematically obtained sample of surgical trials. The implications of selective reporting are similar to publication bias, since statistically significant data is more likely to be published and available for use in evidence synthesis.91 Furthermore, trials that find non-significant outcomes are likely to also be subject to publication bias, thus compounding the bias at both the outcome and the trial level.84 In a systematic review of the empirical evidence for outcome reporting bias, only three studies have measured the association between significance and reporting, with a range of odds ratios between 2.2 and 4.7.84 While this study did not follow a sample of trial protocols from their inception to publication, the analyses conducted here were consistent with previous findings examining this association. For all outcomes, statistically significant outcomes had 2.4 times greater odds, on average, of being fully reported compared to non-significant outcomes. This bias was greater for efficacy outcomes than for harm outcomes (which is consistent with the theoretical direction of effect and the hypotheses of this paper). Furthermore, the association was robust to a sensitivity analysis using a different cut-off in 2 x 2 tables for level of reporting: statistically significant outcomes were also more likely to be fully or partially reported compared to non-significant outcomes, when all outcomes were considered, as well as in a subgroup analysis of efficacy and harm outcomes. These findings confirm selective outcome reporting bias in trials examining a surgical intervention.

This study has several strengths, including the protocol driven design, and the systematic search of the surgical trial literature. There are several limitations. First, this study did not follow a cohort of trial protocols from inception through to publication, as this would have required permission and approval for the release of those records from one or more ethics committees. This in itself may be challenging to obtain.288 Furthermore, it would also be difficult or impossible to obtain a sufficient number of records given the overall aim of this thesis to examine issues of bias only in trials of surgical interventions. It is thus likely that the prevalence of unreported outcomes was underestimated in this study, as well as the degree of selective outcome reporting bias. Another limitation was that the data was collected by a single researcher, with a subset of the data checked by a second researcher. Thus measurement errors are possible. Finally, it was noted in the analyses that trial level odds ratios had very wide confidence intervals, but pooled odds ratios were much more precise. While this was consistent with previous similar analyses, it also implies that the pooled odds ratios may not be generalisable to an individual trial. Instead, pooled odds ratios should be regarded as an overall, average level of bias across all published surgical RCTs.

It is difficult to remedy the selective reporting of outcomes without transparent access to protocols. Selective outcome reporting compounds the issue of publication bias, and limits the complete understanding of an individual trial’s findings. Systematic reviews and meta-analyses of the evidence may therefore be biased towards interventions, and may underestimate their harms, with implications for patient health care.210

International efforts have recently resulted in the publication of SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials).289 These are general guidelines for the content of clinical trial protocols that need to be submitted to an ethics committee for approval. However, protocols need to be accessible in order to determine the presence of selective outcome reporting. Trial registries provide the ideal platform for these to be made available, and should be expanded to compulsorily include more detailed descriptions of trial methods, such as those suggested by SPIRIT. There may be resistance to the availability of protocols in the public domain, such as ethics committee requirements290 and intellectual property issues.291 However, trial registries may also provide an option to limit general access to a protocol, holding those records in escrow until they are submitted for publication.

In conclusion, this study found that the prevalence of incomplete outcome reporting in surgical trials is high, and there is consistent evidence for the selective reporting of outcomes based on their statistical significance. However, this study does not report on the pattern of outcomes reported in surgical trials, and whether those outcomes are clinically relevant, or important to patients. This represents another potential form of bias, in that outcomes may be selected for reasons other than their clinical relevance. This subject is examined in further detail in Chapter 6.

6. Are outcomes reported in surgical randomised trials important to patients? A systematic review and meta-analysis

6.1 Abstract

Background: Randomised trials ideally should measure outcomes that are patient important and clinically relevant, but researchers may instead choose to measure surrogates of important outcomes. Surrogate outcomes are often easier to measure, require fewer participants, and require shorter follow up periods, which makes them appealing alternatives, but their use may lead to spurious conclusions. This study assessed whether surgical trials measured outcomes that were important to patients, and whether patient important outcomes were more likely to be primary. A secondary aim explored the association between patient important outcomes and statistical significance.

Methods: A systematic review of 350 recently published surgical intervention RCTs was conducted using a comprehensive online search strategy of MEDLINE, EMBASE and CENTRAL, and this was supplemented with data obtained from a search of online trial registries. Trial outcome characteristics related to patient importance (important/surrogate/laboratory based outcome), primary/secondary specification, statistical significance and harm/efficacy status were extracted. The mean proportion of patient important outcomes was calculated. At trial level, a 2x2 table was populated with outcomes graded as patient important (yes/no) and primary specification (yes/no), and an odds ratio calculated. Odds ratios were pooled in random effects meta-analysis to obtain an overall estimate of the association between patient importance and primary specification. A similar analysis was conducted to determine the association between patient importance and statistical significance, stratified by harm/efficacy status.

Results: A total of 8,258 outcomes (4,141 efficacy and 4,117 harm outcomes) were reported in the 350 included RCTs, with a mean of 24 outcomes per trial. Of the total, 4,939 outcomes (60%) were patient important, 2,174 outcomes were surrogates of patient important outcomes (26%), and 1,145 were laboratory or physiological outcomes (14%). The mean proportion (per trial) of patient important outcomes was 60%. The most commonly reported patient important outcomes were morbid events (mean proportion 29%), followed by intervention outcomes (10%), pain (7%) and function (7%). The mean proportions of surrogate outcomes and laboratory outcomes were 29% and 10%, respectively. In trials that specified a primary outcome, 66% specified a patient important, primary outcome. Patient important outcomes were not associated with a primary outcome status (odds ratio = 0.82, 95% CI 0.63 – 1.1, I2 = 21%). Patient important outcomes were less likely to be statistically significant than other outcomes (odds ratio = 0.61, 95% CI 0.47 – 0.79, I2 = 35%).

Conclusions: A substantial proportion of surgical randomised trials specify primary outcomes that are not patient important. This makes clinically relevant decisions based on surgical trial data difficult to achieve. Authors, journals, and trial funders should insist that clinically relevant, patient important outcomes are the focus of study, and reporting guidelines should encourage authors to report on the clinical relevance of surrogate and laboratory based outcomes.

6.2 Introduction

In evaluating the effects of treatment, clinicians usually rely on subjective feedback from their patients to inform their decisions, but a more scientifically rigorous process is required for the evaluation of treatments in research. Ideally, the outcome measurements in well designed clinical trials are important to patients, and therefore immediately relevant to clinical practice. Outcome measurements must be valid and reliable, which poses some difficulty when measuring patient centred outcomes such as function or health related quality of life.292,293 Although a plethora of validated scales and checklists may be available, they are often lengthy, difficult to administer, and may still have issues of validity when applied in different language and culture settings.294 Other patient important end points such as death and morbidity require trials with large sample sizes and long term follow up.249,293 Thus researchers or companies may choose surrogate outcomes that are easier to measure and manage, but only reflect an issue important to the patient, and are not necessarily important themselves.

Surrogate outcomes are defined as “a laboratory measurement or a physical sign used as a substitute for a clinically meaningful end point that measures directly how a patient feels, functions or survives”.249 Surrogate outcomes are often more convenient and easier to measure than their patient important correlates, and are often the focus of clinical trials. For example, in ongoing diabetes trials, only 18% of primary outcomes were patient important,295 and empirical reviews in other specialties found similarly low rates.296,297 Surrogate outcomes are assumed to correlate with an important outcome, but the dangers of using them are well documented. In the more benign case, surrogate outcomes may have little or no association with their patient important correlates, leading to the approval and use of interventions that lack efficacy. More worrying consequences are the result of approved interventions that are in fact harmful, and the use of surrogate outcomes has been blamed for unnecessary deaths.298-300

Little is known about the patterns of outcome reporting in surgical trials. In Chapter 2, it was demonstrated that a large proportion of trials do not specify a primary outcome. In the previous chapter, evidence of selective outcome reporting was demonstrated. However, the content and applicability of measured outcomes in surgical trials is unknown. Given that surgical interventions are often associated with morbidity, their evaluation in clinical trials using appropriate outcome measurements is essential.

6.3 Aims and hypothesis statements

The primary aim of this chapter was to evaluate whether published trials of surgical interventions report outcomes that are of importance to patients. The hypothesis was that surgical trials contain low proportions (<50%) of patient important outcomes.

Another primary aim was to evaluate whether patient important outcomes were more likely to be specified as primary outcomes rather than secondary outcomes. The hypothesis was that in surgical trials, patient important outcomes were more likely to be secondary outcomes, while surrogate outcomes (which are often easier and more convenient to measure) are more likely to be primary outcomes.

A secondary aim was to evaluate whether patient important outcomes were less likely to be statistically significant. Since it is assumed patient important outcomes such as mortality and morbidity require large trial numbers, the hypothesis was that patient important outcomes are less likely to be reported as statistically significant, when compared to surrogate outcomes.

6.4 Methods

6.4.1 Study design

Systematic review and meta-analysis of published RCTs assessing a surgical intervention. This chapter is reported according to the PRISMA statement guidelines.283 The protocols for this thesis were not registered, but were peer reviewed through the usual process of the University of New South Wales. A copy of the protocols is available from the author.

6.4.2 Inclusion of studies for the review

The same trial sample was used for this outcome assessment as the previous chapter (Chapter 5). For details on the eligibility criteria, search strategy, and study identification, refer to Chapter 2 methods (page 48).

Since little is known about the appropriate sample size calculation given the hypotheses above, this number was chosen to adequately power the metaregression analyses conducted.

6.4.3 Data extraction

Data extraction took place in a similar fashion to Chapter 5. In brief, an electronic proforma was created for data extraction, after piloting by three researchers using a sample of ten trials. After calibration of the data, one researcher extracted the data. Another researcher checked a random sample of 100 included RCTs, and disagreements were resolved by discussion. One data point from five trials (5% of the checked sample) was changed after double-checking took place, and none of the data relating to classification of patient importance was changed.

6.4.4 Data items

Similar general and scientific quality reporting characteristics were extracted as per Chapter 5, and were used for descriptive purposes, as well as for the exploratory metaregression described below. These included author background, study type, study design, journal impact factor, total sample size, multicentre status, and trial registrations, as well as reported scientific methodology items including random sequence generation, allocation concealment, blinding, attrition, and source of funding. Tables 2.2 and 2.3 present the operational definitions of these variables.

An outcome was defined as a variable used to compare the randomised groups in a trial, in order to assess the efficacy or harm of an intervention.24 Variables that were only measured in one group, or which were measured before an intervention was administered (e.g. a baseline characteristic) were not regarded as outcomes for this study. For each trial, all outcomes were extracted along with a classification of their patient importance, primary or secondary specification, statistical significance, and the status of the outcome regarding efficacy or harm.

Individual trial outcomes were classed as patient important, surrogate, or laboratory measurements. Table 6.1 presents the operational definitions used along with examples from various surgical specialties. Patient important outcomes included measurements of mortality/survival, pain, function, health related quality of life, any morbid event, patient satisfaction, or any procedure or intervention used to address these.295 Surrogate outcomes did not meet the criteria for being patient important, but instead were indicators of progression or an increased risk of a patient important outcome.249 Laboratory outcomes were defined as non-clinical tests that measure physiological parameters without any direct or tangible effects on patients. When composite outcomes were reported, the components of that outcome were graded separately according to the above criteria.295

Table 6.1 Operational definitions of patient important outcomes with examples

PATIENT IMPORTANT OUTCOMES

Outcomes that are likely to be informative to patients and clinicians, and measure factors directly related to patient health. Includes the following:

a) Mortality / Survival Examples: 30-day mortality or 5-year survival

b) Pain Examples: Visual analogue score pain, or incidence of a painful symptom (such as dysuria or headache), or a questionnaire resulting in an aggregate score of pain (such as the Western Ontario and McMaster Universities Arthritis Index “WOMAC” pain subscale).

c) Function A validated measure of function. Examples: the International Index of Erectile Function, or the New York Heart Association Functional Classification.

d) Quality of life A validated measure of quality of life. Examples: the Short Form 36 “SF-36” survey, or the EuroQol 5 dimension survey “EQ-5D”.

e) Any morbid event or symptom Examples: incidence of wound infection, fracture non-union, or incontinence, or a measure of their opposites, such as wound healing, fracture union or continence.

f) Patient satisfaction Examples: a survey of overall patient satisfaction with their surgical procedure, or satisfaction with cosmesis.

g) Any intervention to address the above outcomes (b-f) Examples: use of analgesia for pain, catheterisation for urinary retention, interventions to restore vascular patency, or revision surgery.

SURROGATE OUTCOMES

Outcomes that may indicate progression or an increased risk of a patient important outcome, but are not intrinsically important to patients.

Examples: operative time for any procedure, urine flow rate for patients with prostatism, hemodynamic measurements after coronary bypass, fracture alignment after operative fixation, or surgeon reported “success” of a procedure.

LABORATORY OUTCOMES

Non-clinical tests that measure physiologic parameters without any direct or tangible effects on patients.

Examples: inflammatory markers after surgery, troponin after coronary bypass, cholesterol levels after obesity surgery, or amylase/lipase after pancreatic surgery.

The specification of each outcome was also recorded as primary, secondary, or unclear, using a definition consistent with that used in Chapter 2. A primary outcome was either used in a sample size calculation, defined explicitly as such in the text (using the word “primary” or one of its synonyms), or was stated explicitly in an aims/hypothesis statement, in that order of priority. When a primary outcome was specified, other outcomes were regarded as secondary outcomes. When no primary or secondary outcomes were specified in a trial, that trial's outcomes were marked as “unclear specification”.
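The priority rule can be illustrated with a short sketch. It rests on one reading of "in that order of priority", namely that the highest-ranked criterion met by any outcome in a trial determines which outcomes are primary. That reading, the criterion labels and the function name are assumptions made for illustration.

```python
# Criteria for a primary outcome, highest priority first (labels are hypothetical)
CRITERIA = ("sample_size", "labelled_primary", "aims_statement")


def specify_outcomes(outcomes):
    """Classify one trial's outcomes as primary, secondary or unclear.

    outcomes -- dict mapping outcome name to the set of criteria it meets
    """
    for criterion in CRITERIA:
        primaries = {name for name, met in outcomes.items() if criterion in met}
        if primaries:
            # Once a primary outcome is identified, all others are secondary
            return {
                name: "primary" if name in primaries else "secondary"
                for name in outcomes
            }
    # No outcome met any criterion
    return {name: "unclear specification" for name in outcomes}
```

Under this reading, an outcome used in the sample size calculation takes precedence over one that is merely labelled "primary" in the text.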

The statistical significance of each outcome was also recorded based on what was reported by the authors, or the p value (or 95% confidence interval) presented. For the vast majority of studies, a p value <0.05 (or a 95% confidence interval that did not include the null effect size) was regarded as statistically significant. For some studies which pre-specified a level of statistical significance, that level was regarded as significant. Outcomes that did not have any information presented relating to statistical significance were marked as “unclear significance”.
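A minimal sketch of this classification rule is given below. The function name and parameters are hypothetical; the null value is assumed to be 1 for ratio measures and 0 for differences, and an author's explicit statement of significance (which the rule above also allows) is not modelled.

```python
def classify_significance(p_value=None, ci=None, null_value=1.0, alpha=0.05):
    """Classify an outcome from its p value or 95% confidence interval.

    p_value    -- reported p value, if any
    ci         -- (lower, upper) confidence limits, if any
    null_value -- null effect size (assumed 1.0 for ratios, 0.0 for differences)
    alpha      -- significance level; 0.05 unless the trial pre-specified another
    """
    if p_value is not None:
        return "significant" if p_value < alpha else "not significant"
    if ci is not None:
        lower, upper = ci
        # An interval that contains the null effect is not significant
        return "not significant" if lower <= null_value <= upper else "significant"
    return "unclear significance"
```

A trial that pre-specified a stricter threshold could be handled by passing, for example, `alpha=0.01`.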

Finally, the efficacy or harm status of an outcome was also recorded. Efficacy outcomes were defined as those measuring any intended or desirable effects of the intervention. A harm outcome, its direct opposite, measured any harmful or undesirable effects of the intervention.24 The efficacy/harm status of outcomes was used to conduct the planned subgroup analyses described below.

6.4.5 Data analysis

The total number of outcomes per trial was identified. Proportions (per trial) were calculated for each patient important outcome (as well as a composite of any patient important outcome), surrogate outcomes, and laboratory outcomes. Similarly, a descriptive analysis of specified primary and secondary outcomes was performed. Mean proportions were then calculated for each of the above for the whole sample of trials.
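The difference between a proportion of all outcomes and a mean of per-trial proportions can be illustrated with a minimal Python sketch. The two trials below are invented purely for illustration, and the category labels and function name are hypothetical.

```python
from statistics import mean

CATEGORIES = ("patient_important", "surrogate", "laboratory")

def per_trial_proportions(outcomes):
    """Proportion of one trial's outcomes that fall in each category."""
    total = len(outcomes)
    return {c: outcomes.count(c) / total for c in CATEGORIES}

# Two invented trials, each listed as the category of every reported outcome.
trials = [
    ["patient_important", "patient_important", "surrogate", "laboratory"],
    ["patient_important", "surrogate"],
]

proportions = [per_trial_proportions(t) for t in trials]

# Mean proportion across trials: every trial counts equally,
# regardless of how many outcomes it reported.
mean_proportions = {c: mean(p[c] for p in proportions) for c in CATEGORIES}

# Proportion of the total of all outcomes: every outcome counts equally.
all_outcomes = [o for t in trials for o in t]
pooled_proportions = per_trial_proportions(all_outcomes)
```

In this invented example the mean per-trial proportion of surrogate outcomes is 0.375, whereas surrogates make up one third of all outcomes, which shows why the two summaries need not agree.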

In order to investigate whether patient important outcomes were more likely to be specified as primary outcomes, the following analysis was conducted. For each trial, a contingency (2 x 2) table was populated with that trial's outcomes, describing whether each outcome was patient important or not, and whether each outcome was specified as primary or secondary. Trials that did not specify primary and/or secondary outcomes (either using a sample size calculation, explicitly using the word “primary” or one of its synonyms, or describing the primary outcome in an aims/hypothesis statement) were not eligible for this analysis. If the contingency table contained a single zero cell, or two diagonal zero cells, 0.5 was added to all four cells as per the default of standard meta-analysis statistical packages.251

When a whole row or column contained zero cells, an odds ratio was incalculable and that trial was excluded from this statistical analysis. An odds ratio greater than one meant that a patient important outcome was more likely to be specified as a primary outcome. Odds ratios were then combined in random effects meta-analysis and a summary odds ratio (along with its 95% confidence interval and I² as a measure of heterogeneity) was calculated as an overall indicator of whether surgical trials use patient important outcomes as primary outcomes.
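The trial level calculation and pooling described above may be sketched as follows. This is an illustrative sketch rather than the analysis code used for this thesis: the function names are hypothetical, the variance of the log odds ratio uses the usual sum of reciprocal cell counts, and the DerSimonian-Laird estimator of between-trial variance is an assumption, since the text states only that a random effects model was used.

```python
import math

def trial_log_or(a, b, c, d):
    """Log odds ratio and variance for one trial's 2 x 2 table of outcomes.

    a: patient important, primary        b: patient important, secondary
    c: not patient important, primary    d: not patient important, secondary
    Returns None when a whole row or column is zero (odds ratio incalculable).
    """
    if a + b == 0 or c + d == 0 or a + c == 0 or b + d == 0:
        return None
    if 0 in (a, b, c, d):
        # A single zero cell or two diagonal zero cells: add 0.5 to all four cells.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

def pool_random_effects(estimates):
    """Pool (log_or, variance) pairs; DerSimonian-Laird estimator assumed."""
    y = [e[0] for e in estimates]
    v = [e[1] for e in estimates]
    w = [1 / vi for vi in v]                      # fixed effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    df = len(y) - 1
    scale = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / scale) if scale > 0 else 0.0   # between-trial variance
    w_re = [1 / (vi + tau2) for vi in v]          # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "i2": i2,
    }
```

Trials for which `trial_log_or` returns `None` would simply be left out of the list passed to `pool_random_effects`, mirroring the exclusion rule above.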

In order to investigate whether patient important outcomes are less likely to be statistically significant (a secondary aim of this Chapter), a similar analysis was conducted. For each trial, a 2 x 2 table was populated using that trial's outcomes, describing whether each outcome was patient important or not, and whether the outcome was statistically significant or not. An outcome that was unclear regarding its statistical significance was not eligible for this analysis. Odds ratios were generated in a similar fashion to the above, and pooled in random effects meta-analysis. An odds ratio greater than one meant that a patient important outcome was more likely to be statistically significant than other outcomes. The primary analysis was conducted for all outcomes, provided the statistical significance of the outcome was known.

However, as seen in Chapter 5, statistical significance may be related to the efficacy or harm status of an outcome. For example, the statistical significance of an efficacy outcome is more likely to be reported than its non-significance, but the opposite may apply to harm outcomes. Therefore, a subgroup analysis was performed for efficacy and harm outcomes separately. Separate 2 x 2 tables were constructed at the trial level for efficacy and harm outcomes, and pooled in an identical manner to the above to determine an overall association between patient important outcomes and statistical significance.

Exploratory metaregression was modelled using the restricted maximum likelihood method, in order to explore trial level variables associated with the reporting of patient important, primary outcomes. Several variables were identified based on their theoretical association with the reporting of outcomes, including industry funding,93 an author with an epidemiology/statistics background,206 study type (superiority or non-inferiority), study design (parallel or cross-over/split-body), journal impact factor, total sample size, multicentre status (yes/no), and trial registration (yes/no). The exponentiated coefficient was calculated for each potential explanatory variable in univariable metaregression models, along with a p value and 95% confidence interval for the coefficient. Eligible variables with a p value <0.25 were then combined in multivariable metaregression models, and an association was considered present for any variable with a p value <0.05.
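The two-stage selection of variables can be sketched as below. The sketch does not fit the metaregression models themselves; it only shows how a log-scale coefficient is exponentiated into an odds ratio and how the p<0.25 screen is applied. The function names are hypothetical, and the p values are those reported for the univariable models in Table 6.4.

```python
import math

def exponentiate(coef, se, z=1.96):
    """Odds ratio and 95% confidence limits from a log-scale coefficient and its standard error."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

def select_for_multivariable(univariable_p, threshold=0.25):
    """Names of variables whose univariable p value is below the threshold."""
    return [name for name, p in univariable_p.items() if p < threshold]

# Univariable p values as reported in Table 6.4.
univariable_p = {
    "Industry funding": 0.5,
    "Author epi/stats degree": 0.03,
    "Non-inferiority trial": 0.2,
    "Split-body/crossover trial": 0.2,
    "Journal impact factor": 0.8,
    "Total sample size": 0.2,
    "Multicentre trial": 0.5,
    "Registered trial": 0.9,
}

selected = select_for_multivariable(univariable_p)
```

Applied to these values, the screen retains the author background, non-inferiority, split-body/crossover and total sample size variables, which are the four variables carried into the multivariable model in Table 6.4.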

6.5 Results

The search and inclusion of surgical RCTs for this Chapter was identical to that of Chapter 5 (Figure 5.1). Three hundred and fifty trials examining a surgical intervention were included. Table 5.3 presents the characteristics of included trials. Of note, most were superiority (335/350 trials, 96%), parallel arm (331/350 trials, 94.5%) trials. Sixty-two trials (18%) had an author with a declared epidemiology and/or statistics background. The primary outcome was specified (either using a sample size calculation, explicitly in the text, or described in an aims/hypothesis statement) in 225 trials (64%).

A total of 8,258 outcomes (4,141 efficacy and 4,117 harm outcomes) were reported in the included RCTs, with a mean of 24 outcomes per trial. Of the total, 4,939 outcomes (60%) were patient important, 2,174 outcomes were surrogates of patient important outcomes (26%), and 1,145 were laboratory or physiological outcomes (14%).

The mean proportion, per trial, of patient important outcomes reported was 60%, with slight variations when stratified for trial level characteristics (Table 6.2). The most commonly reported patient important outcomes were morbid events (mean proportion 29%), followed by intervention outcomes (10%), pain (7%) and function (7%). The mean proportions of surrogate outcomes and laboratory outcomes were 29% and 10%, respectively.

A median of one primary outcome (interquartile range = 3) was identified for each of the 225 trials that specified primary and/or secondary outcomes. Of these, 148 (66%) specified a patient important outcome as a primary outcome. The most common patient important primary outcome was a morbid event or symptom (92 trials, 41%), followed by intervention outcomes (25 trials, 11%) and function outcomes (24 trials, 11%). Seventy-four trials (33%) specified a surrogate outcome as a primary outcome, and 17 trials (8%) specified a laboratory outcome as a primary outcome (Table 6.3).

Table 6.2 Mean proportions of patient important outcomes per trial, stratified by trial characteristics

| | All trials | Superiority trials | Non-inferiority trials | Parallel trials | Split-body / Cross-over trials | Full or part industry funding | Non-industry funding |
|---|---|---|---|---|---|---|---|
| Number of trials | 350 | 335 | 15 | 331 | 19 | 70 | 83 |
| Any patient important (%) | 60 | 60 | 67 | 60 | 60 | 58 | 56 |
| Mortality / survival (%) | 2 | 2 | 3 | 2 | 1 | 2 | 2 |
| Pain (%) | 7 | 7 | 6 | 7 | 9 | 7 | 6 |
| Function (%) | 7 | 7 | 10 | 7 | 7 | 10 | 6 |
| Quality of life (%) | 4 | 3 | 8 | 4 | 0 | 5 | 6 |
| Morbid events / symptoms (%) | 29 | 28 | 30 | 28 | 38 | 25 | 25 |
| Patient satisfaction (%) | 2 | 2 | 0 | 2 | 2 | 0 | 3 |
| Intervention outcomes (%) | 10 | 10 | 10 | 10 | 2 | 10 | 9 |
| Surrogate outcomes (%) | 29 | 29 | 26 | 28 | 38 | 32 | 31 |
| Laboratory outcomes (%) | 10 | 10 | 7 | 10 | 3 | 8 | 12 |

Table 6.3 Proportion of trials reporting patient important outcomes as primary outcomes, stratified by trial characteristics. Data presented only from trials that explicitly specified a primary outcome. Trials that did not specify primary/secondary outcomes were not included in this descriptive analysis

| | All trials | Superiority trials | Non-inferiority trials | Parallel trials | Split-body / Cross-over trials | Full or part industry funding | Non-industry funding |
|---|---|---|---|---|---|---|---|
| Number of trials | 225 | 212 | 13 | 213 | 12 | 58 | 58 |
| Any patient important | 148 (66%) | 139 (66%) | 9 (69%) | 138 (65%) | 10 (83%) | 35 (60%) | 34 (59%) |
| Mortality / survival | 10 (4%) | 10 (5%) | 0 | 10 (5%) | 0 | 0 | 2 (3%) |
| Pain | 20 (9%) | 20 (9%) | 0 | 18 (8%) | 2 (17%) | 4 (7%) | 4 (7%) |
| Function | 24 (11%) | 23 (11%) | 1 (8%) | 22 (10%) | 2 (17%) | 9 (16%) | 4 (7%) |
| Quality of life | 8 (4%) | 7 (3%) | 1 (8%) | 8 (4%) | 0 | 3 (5%) | 2 (3%) |
| Morbid events or symptoms | 92 (41%) | 86 (41%) | 7 (54%) | 85 (40%) | 7 (58%) | 20 (34%) | 22 (38%) |
| Patient satisfaction | 1 (0.5%) | 1 (0.5%) | 0 | 1 (0.5%) | 0 | 0 | 0 |
| Intervention outcomes | 25 (11%) | 25 (12%) | 0 | 23 (11%) | 2 (17%) | 7 (12%) | 8 (14%) |
| Surrogate outcomes | 74 (33%) | 71 (33%) | 3 (23%) | 69 (32%) | 5 (42%) | 25 (43%) | 20 (34%) |
| Laboratory outcomes | 17 (8%) | 15 (7%) | 2 (15%) | 17 (8%) | 0 | 5 (9%) | 3 (5%) |

Figure 6.1 depicts a forest plot of the random effects meta-analysis that pooled trial level odds ratios of the association between patient importance and primary specification. Patient important outcomes were not significantly associated with primary outcome status (odds ratio = 0.82, 95% CI 0.63 – 1.1, I² = 21%). Exploratory random effects metaregression was modelled to determine whether trial level variables were associated with the specification of patient important outcomes as primary outcomes (Table 6.4). Trials that had an author with a declared epidemiology and/or statistics background had twice the odds of other trials of specifying a patient important, primary outcome (odds ratio = 2.1, 95% CI 1.1 – 4.0, p = 0.03).

A similar random effects meta-analysis was performed to explore the association between patient important outcomes and statistical significance. Figures 6.2-6.4 depict forest plots that pooled trial level odds ratios of this association. Across all studies, patient important outcomes were less likely to be statistically significant than other outcomes (odds ratio 0.61, 95% CI 0.47 – 0.79, I² = 35%, Figure 6.2). This analysis was also performed separately for efficacy and harm outcomes, due to the theoretical association between efficacy/harm status and statistical significance. No association between patient importance and significance was found for efficacy outcomes (odds ratio = 0.8, 95% CI 0.5 – 1.2, I² = 42%, Figure 6.3). However, when only harm outcomes were considered, patient important outcomes were less likely to be statistically significant (odds ratio = 0.6, 95% CI 0.4 – 0.8, I² = 18%, Figure 6.4). These results are summarised in Table 6.5.

Figure 6.1 Association between patient important outcomes and primary specification

[Forest plot. Pooled OR 0.82, 95% CI 0.63 – 1.1. Horizontal axis: odds ratio (log scale, 0.001 to 1000); values below 1 favour NOT patient important as primary, values above 1 favour patient important as primary.]

Table 6.4 Results of exploratory metaregression of association between patient important outcomes and specification as primary

| Variable | Univariate OR (95% CI) | Univariate P value | Multivariate* OR (95% CI) | Multivariate* P value |
|---|---|---|---|---|
| Industry funding | 1.3 (0.6 – 2.7) | 0.5 | - | - |
| Author epi/stats degree | 2.1 (1.1 – 4.0) | 0.03 | 2.1 (1.1 – 4.0) | 0.03 |
| Non-inferiority trial | 0.5 (0.2 – 1.5) | 0.2 | 0.6 (0.2 – 1.6) | 0.3 |
| Split-body/crossover trial | 2.1 (0.6 – 7.0) | 0.2 | 2.0 (0.6 – 6.4) | 0.3 |
| Journal impact factor | 1.0 (0.9 – 1.1) | 0.8 | - | - |
| Total sample size | 0.99 (0.98 – 1.1) | 0.2 | 0.99 (0.98 – 1.1) | 0.6 |
| Multicentre trial | 0.9 (0.6 – 1.2) | 0.5 | - | - |
| Registered trial | 1.0 (0.7 – 1.4) | 0.9 | - | - |

*Variables only included in multivariate analysis if p<0.25 in univariate analysis

Figure 6.2 Association between patient important outcomes and statistical significance: All outcomes

[Forest plot. Pooled OR 0.61, 95% CI 0.47 – 0.79. Horizontal axis: odds ratio (log scale, 0.001 to 1000); values below 1 favour patient important outcomes as non-significant, values above 1 favour patient important outcomes as significant.]

Figure 6.3 Association between patient important outcomes and statistical significance: Efficacy outcomes

[Forest plot. Pooled OR 0.77, 95% CI 0.50 – 1.18. Horizontal axis: odds ratio (log scale, 0.001 to 1000); values below 1 favour patient important outcomes as non-significant, values above 1 favour patient important outcomes as significant.]

Figure 6.4 Association between patient important outcomes and statistical significance: Harm outcomes

[Forest plot. Pooled OR 0.57, 95% CI 0.39 – 0.84. Horizontal axis: odds ratio (log scale, 0.001 to 1000); values below 1 favour patient important outcomes as non-significant, values above 1 favour patient important outcomes as significant.]

Table 6.5 Pooled odds ratios of association between patient important outcomes and statistical significance

| Outcomes | No. trials | OR (95% CI) | I² |
|---|---|---|---|
| All outcomes | 193 | 0.6 (0.5–0.8) | 35% |
| Efficacy outcomes | 96 | 0.8 (0.5–1.2) | 42% |
| Harm outcomes | 84 | 0.6 (0.4–0.8) | 18% |

6.6 Discussion

In this chapter, the extent to which surgical trials reported outcomes that were patient important was examined. It was found that a mean proportion of 60% of outcomes per trial were patient important, but substantial proportions of outcomes were surrogate (29%) or laboratory based (10%). It was also found that only two thirds of surgical trials specified a patient important primary outcome, and patient important outcomes were not more likely to be specified as primary outcomes.

Surrogate outcomes have been accepted as proxy measures of patient important outcomes in clinical trials for many years. They are often easier and quicker to measure,249 and statistical inferences can be made with smaller numbers of patients owing to larger treatment effect sizes.301 Thus they are useful in the early evaluation of the bioactivity of an intervention,293 and many drug interventions have been approved by regulatory bodies on the basis of a positive effect on surrogate endpoints.302 However, assumptions are often made that surrogate endpoints lie on a causal pathway to a patient important, clinically relevant outcome.303 These assumptions of causality often rely on observational evidence such as laboratory studies, ecologic and cohort studies,249 and have resulted in grave misinterpretations of the benefits of some interventions. For example, Class I anti-arrhythmic medications were approved for use after myocardial infarction, based on a proven reduction in ventricular ectopic beats (a surrogate outcome of mortality), but a subsequent large clinical trial304 showed an increase in mortality with these agents, with thousands of patients likely to have been harmed (or killed) in the intervening period. Hormone replacement therapy was shown to improve cholesterol levels in women, but subsequent evidence suggested an increase in the incidence of myocardial infarction and stroke.305 On the other hand, there are many examples of trials for which surrogate and patient important outcomes have been aligned, such as an increase in bone mineral density and a reduction in fracture risk in trials of bisphosphonates.306,307 Some commentators have argued strongly against the adoption of interventions based on surrogate endpoints.293,302,303 Clearly, a balance needs to be struck between the need for innovation and the need for sound evidence.

Surgical innovation has been criticised for not requiring such rigorous evaluation prior to its availability to patients,10 and it is likely that its early evaluation suffers from similar problems with the use of surrogate outcomes.

In this chapter it was found that 60% of all outcomes were patient important, and that the mean proportion of patient important outcomes per trial was also 60%. Further, 66% of trials specified patient important primary outcomes. Studies in other specialty areas have found smaller proportions of patient important outcomes in their samples of randomised trials. Gandhi and colleagues295 assessed ongoing randomised trials of diabetes care and found that only 46% of trials reported any patient important outcome, and only 18% of trials reported patient important, primary outcomes. However, in their assessment of recent randomised trials published in six high impact general medical journals, Ciani and colleagues301 found that 27% of trials specified a surrogate primary outcome. It is possible that surgical interventions (when compared to drug interventions) are more conducive to assessment by practical, clinically relevant outcomes, since surgeons are often directly exposed to pathological processes. It is also likely that surgeons focus on clinically relevant outcomes to monitor their patients after surgery (such as morbid/adverse events or functional outcomes) rather than laboratory measures. Nevertheless, a substantial proportion of outcomes (both primary and otherwise) in surgical trials remain non-patient important, and this warrants some concern. This study is the first to systematically assess the patient importance of outcomes in surgical trials. Other initiatives have encouraged the more transparent definition of morbid events in surgical trials,308 while Hall and Hall found evidence consistent with the subversive reporting of multiple statistical comparisons, without pre-specification of outcomes of interest, in their systematic review.282

When all outcomes were considered, it was found that patient important outcomes were less likely to be significant when compared with non-patient important outcomes. Ciani and colleagues confirmed the association between larger treatment effects and surrogate outcomes in their meta-regression analysis of trials published in high impact general medical journals.301 On average, surrogate outcomes resulted in treatment effects 50% greater than patient important outcomes, after adjustment for trial characteristics associated with exaggerated treatment effects. There are several alternative explanations for these findings. First, trials may be subject to selective reporting of outcomes. In the previous Chapter, it was demonstrated that an association exists between the statistical significance of an outcome, and the level of reporting of that outcome. In addition, a high prevalence of unreported outcomes was found. Previous work24 found a concerning proportion of discrepancies in the primary outcomes described in trial protocols and their published results. It is likely that a proportion of surgical trials in this cohort selectively reported outcome results based on effect size and statistical significance. Without readily available access to prospectively published trial protocols, this trend is impossible to verify.

Second, the efficacy/harm status of an outcome may also be associated with its statistical significance. Indeed, in a subgroup analysis presented in this Chapter, patient important harm outcomes were less likely to be statistically significant, while no association was found for efficacy outcomes. This finding is consistent with the theoretical association between harm/efficacy status and statistical significance, and provides an alternate explanation for why patient important outcomes are less likely to be statistically significant.

This study has a number of strengths. First, the study had a protocol driven systematic review design, which allows generalizability of the results to recently published surgical randomised trials. Second, the definitions of patient important outcomes were clear and unambiguous, and there was minimal disagreement between two researchers when the data was checked.

This study also has a number of weaknesses. First, it does not account for the occurrence of selective outcome reporting. It is likely that a certain proportion of outcomes remain unreported (possibly based on each outcome's statistical significance). Thus the proportions of patient important outcomes presented here may not be entirely accurate. Furthermore, some studies may have retrospectively selected primary outcomes based on their statistical significance, which will affect the analyses presented here on the association between primary outcomes and patient importance. Without access to trial protocols, clarification of these issues is difficult or impossible. While the registration of trials is becoming more commonplace, an earlier Chapter found only 27% of the surgical trials included here were registered. Furthermore, there are known issues with the presentation and inclusiveness of trial registry data.210 Second, the analysis presented here examining the association between patient importance and significance does not take into account whether trials were powered for the outcomes they measured. Thus the findings here may not be applicable to individual trials and should be interpreted with some caution.

The importance of appropriate outcome selection in clinical trials is not a new issue. Authors from various subspecialties have discussed the interpretation of trial findings, and have made recommendations on what outcomes should be measured in future studies.139,309 The Cochrane Collaboration recommends authors include surrogate outcomes with caution, but current guidance on the issue for British health technology assessment reviews is relatively brief and is currently under review.310 Indeed, there is evidence that a large proportion of technology reviews are based on surrogate outcome evidence.311

This Chapter has broadly clarified how authors of surgical trials report patient important outcomes, and provides empirical data for authors and readers of surgical trial publications. Investigators should select outcomes (particularly primary outcomes) based on whether they are patient important and/or clinically relevant, and surrogate outcomes should only be used when a strong, evidence based link exists between them and a patient important outcome. Trial reporting guidelines (such as the CONSORT Statement23) should include reporting on the clinical relevance of any surrogate outcomes measured, and decisions about trial funding and publication should take into account the importance of outcomes to patients.

7. Characteristics and reporting of meta-analyses of surgical interventions: A systematic review

7.1 Abstract

Background: Surgical patients and interventions are often complex and heterogeneous, making surgical trials difficult to undertake. Meta-analyses can be useful tools for summarising surgical evidence as they aim to encompass all sources of information on a particular research question, providing quantitative summary effects for interventions. However, they may be prone to methodological and reporting biases. We assessed the epidemiology and reporting characteristics of meta-analyses of surgical interventions.

Methods: We conducted a systematic review of 150 meta-analyses that included randomised trials of surgical interventions published between January 2010 and June 2011. A comprehensive search strategy was executed using the Medline, Embase and Cochrane databases. Data were independently extracted by two authors, and included author, study, journal and report characteristics, items from the PRISMA statement (reporting quality), and items from the AMSTAR tool (methodological quality). Descriptive statistics were used for individual items, and PRISMA and AMSTAR scores were calculated as a measure of overall compliance. Exploratory linear regression was performed to explore variables associated with improved reporting of surgical meta-analyses.

Results: A median of eight trials (interquartile range = 8) was included in each meta-analysis. One third had an author with a background in epidemiology and/or statistics. Forty-four percent were published in PRISMA-endorsing journals with a median impact factor of 3.5. There was moderate compliance with PRISMA, with an average of 71% of items reported, but poorer compliance with AMSTAR, with 48% of items adequately described, on average. Cochrane reviews and report length were independently associated with an improved level of reporting.

Conclusions: Deficiencies remain in the reporting and methodology of published surgical meta-analyses. This may bias the results of meta-analyses of randomised trials, which are themselves regarded as the highest level of evidence for the effects of interventions. Editorial insistence on using reporting and methodological guidelines would improve this situation.

7.2 Introduction

Thousands of articles are published in biomedical journals every year, with the number increasing annually.171 In order to keep up to date with published research, a clinician would be required to collate, read and interpret a large amount of information on a regular basis. Fortunately, part of the armoury of modern day research is the systematic review, which is a comprehensive search and objective appraisal of the evidence for a particular intervention. In contrast to traditional narrative reviews, systematic reviews approach the evidence objectively, with a priori hypotheses, search strategies, inclusion criteria and data extraction techniques.150 Systematic reviews can resolve conflicting evidence, identify gaps in current research, and may provide valuable and timely conclusions.150 In order to derive an estimate of the effect size on the outcome of interest from separate studies, a quantitative assessment, known as a meta-analysis, may also be carried out within a systematic review. This method involves the use of statistical methods to pool results from individual studies into a ‘summary effect estimate’. Since studies are often different in design and methodology, meta-analyses face particular challenges as the “combinability” of individual studies often needs to be justified and sources of heterogeneity explored.159

Since a well conducted systematic review (whether or not it includes a meta-analysis) aims to encompass all available sources of qualifying information, it is recognised as a high level of evidence,40 and is consequently frequently cited.312 However, systematic reviews may be retrospective in nature,54 and thus prone to bias introduced in the identification, selection or statistical analysis of included studies.160 The methodology of published systematic reviews is generally poorly reported,313-316 and while systematic reviews published in the Cochrane library are of a higher quality and are considered the gold standard,317,318 there is still scope for improvement.172,319 The implication of poor reporting is that systematic reviews may reach incorrect conclusions, and since systematic reviews are often used to guide clinical practice, this may seriously affect the delivery of health care.

In 1996, a collaboration of leading authorities on the conduct of clinical trials produced the Consolidated Standards of Reporting Trials (CONSORT) Statement,116 a checklist and flow diagram aimed at improving the reporting of randomised controlled trials. This was primarily in response to accumulating empirical evidence of bias when trials were not adequately designed and/or reported.39,53,55 In recognition of a similar problem in the reporting of systematic reviews, the Quality of Reporting of Meta-analyses (QUOROM) statement was produced in 1999,109 which consists of 18 items for the reporting of meta-analyses of randomised controlled trials, and a flow diagram which describes the identification and inclusion of studies.

“Substantive changes” were made to QUOROM in light of new guidelines for the reporting of systematic reviews and meta-analyses, and in 2009 it was updated to the PRISMA statement (Preferred Reporting Items for Systematic reviews and Meta-Analyses).180 An accompanying explanatory statement was published by the same group of authors,176 and PRISMA has since been adopted by many journals as the gold standard for how a systematic review and/or meta-analysis should be reported. Indeed, there is some evidence to suggest journals that endorse the PRISMA statement (or its predecessor, QUOROM) publish meta-analyses that are better reported than non-endorsers.183

An earlier separate initiative has been the development of AMSTAR: A measurement tool to assess the methodological quality of systematic reviews.173 Rather than being a reporting guideline, AMSTAR was specifically developed as a critical appraisal tool to identify the extent of bias in methodology at the review level. AMSTAR was developed using a synthesis of existing critical appraisal tools by a team of clinicians, methodologists and epidemiologists, as well as less experienced reviewers, and thus has good face and content validity. It is also more prescriptive in nature (when compared to PRISMA), which reflects its purpose as a tool to assess adequate methodology (or risk of bias at the review level).

Research in surgery presents particular challenges for providing high-quality evidence. Surgical patients may be heterogeneous, often suffer from problems of an emergency nature, and may have trouble participating in a trial where interventions are perceived to have markedly different risks and efficacy. Surgical practice is difficult to standardise owing to differences in experiential evidence and preferences for techniques between surgeons. Surgical research practice may also be stuck in a vicious cycle, where a history of poor quality research attracts less funding, leading to more poor quality and underpowered research,131 making Type I or II errors more likely. The implications of these challenges are two-fold. First, investigators in surgery need to circumvent the pitfalls of such research. Second, they highlight the particular importance of well conducted systematic reviews in surgery, which include a quality assessment of the evidence (therefore reducing the chance of erroneous conclusions based on Type I errors), as well as the ability to pool results from underpowered trials (therefore addressing the problem of Type II errors).

Few studies have performed a broad assessment of the general characteristics and reporting of meta-analyses. Previous articles were limited to a specific database,317 included only reviews from a particular subspecialty,313,320 or were not approached systematically (and therefore may not reflect the current published evidence base).202 Given that meta-analyses are often influential and highly cited publications, this study was performed to address the gaps in what is known about the epidemiology and quality of meta-analyses in surgery.

7.3 Aims

The primary aim of this study was to evaluate the reporting characteristics of published meta-analyses assessing a surgical intervention. Two dimensions of reporting were explored: first, compliance with the PRISMA statement, regarded as a comprehensive guideline for how a meta-analysis should be reported; and second, compliance with the AMSTAR tool, which specifically focuses on adequate review methodology.

A secondary aim was to examine the general (author, journal and report) characteristics of surgical meta-analyses, and to explore whether these characteristics were associated with an improved level of reporting, measured by compliance with the PRISMA statement.

7.4 Methods

7.4.1 Study design

Systematic review methodology was adopted. The unit of analysis was an individual published meta-analysis examining a surgical intervention. It was intended that a sample of meta-analyses that represented the current evidence base be included, and therefore only the most recently published articles identified in the search were included.

7.4.2 Eligibility criteria

To be eligible for inclusion, a study had to meet the following criteria: i) be described as a "meta-analysis", "data synthesis" or "quantitative overview" of randomised trials, or include a statistical pooling of results from a review article of randomised trials. The definition of the National Library of Medicine for meta-analysis was used: “systematic methods that use statistical techniques for combining results from different studies to obtain a quantitative estimate of the overall effect of a particular intervention or variable on a defined outcome.”321 Articles that reviewed both randomised and non-randomised studies were included only if statistical pooling of results was presented for randomised trials alone; ii) be published as a full text article. Studies published as abstracts or conference proceedings were excluded; iii) be published in the English language; iv) include studies that were conducted on humans (not cadavers), and; v) be a comparison between a surgical intervention and another intervention. A surgical intervention was defined as any procedure that requires surgical training, and is usually performed by a surgeon of any subspecialty recognised by the Royal Australasian College of Surgeons (see Table 2.1). Obstetric/gynaecologic interventions, ophthalmic surgery (usually performed by an ophthalmologist), dental surgery (usually performed by a dental or maxillofacial surgeon), injections of any material, application of splints and interventions for diagnostic purposes were excluded.

7.4.3 Search for meta-analyses

A search on MEDLINE, EMBASE, and the Cochrane Database of Systematic Reviews was executed. A pilot was performed of an electronic search strategy formed with the expertise of both a medical librarian and a search specialist associated with a Cochrane Review Group (Appendix 9).

7.4.4 Study identification methods

Records identified using the search strategy were imported into Endnote Version X- Reference Management Software (Thomson, NY, USA). Using software functions, duplicates were removed and then articles were ordered according to date of publication, with the most recently published studies placed first. Assessment of studies for inclusion therefore proceeded in reverse chronology until the required sample was obtained. The titles and abstracts of retrieved records were assessed according to the above inclusion criteria independently by two researchers. When a reference did not meet one of the criteria, it was excluded and a reason specified for the exclusion in the following order of priority: i) not a meta-analysis, ii) no surgical intervention assessed, and iii) other reasons (including duplication and not English language). The full texts of abstracts that appeared to meet the eligibility criteria were then retrieved. An assessment of eligibility was then performed using the same process as for abstracts. Disagreements at any stage were resolved by discussion, and arbitration by a third author was not required.

7.4.5 Data extraction

An electronic data form was piloted by two researchers using a sample of ten reports. The researchers then met at various time points during data collection in order to clarify the definitions for each data item. The data form was thus calibrated using a total sample of 60 reports.

Once data extraction was completed for all included reports, inter-observer reliability was assessed using the kappa statistic (which describes the level of agreement beyond that due to chance) for each individual (binary) data item collected, and concordance correlation coefficients for the (continuous) summary scores used.322 A priori definitions for the adequacy and level of agreement were used, according to Landis et al.201
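The two agreement measures named above can be illustrated with a minimal sketch. It assumes binary item grades coded as 0/1 and uses the standard textbook formulas for Cohen's kappa and Lin's concordance correlation coefficient; the function names and example data are hypothetical and are not taken from the thesis analysis.

```python
from statistics import fmean


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters grading the same items as 0 or 1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must grade the same non-empty set of items")
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired continuous scores."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("both raters must score the same non-empty set of reports")
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) variances and covariance, as in Lin's definition.
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical grades for one checklist item across eight reports.
kappa = cohen_kappa([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0, 1, 1])
```

Unlike a simple correlation, the concordance coefficient penalises a systematic shift between raters, which is why it suits summary scores that should be identical rather than merely related.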

7.4.6 General characteristics of meta-analyses

In order to explore the epidemiology of published meta-analyses in surgery, data was collected related to the authors (number of authors, affiliation, background in epidemiology/statistics and country of origin), the study (type of comparison, number of trials included, subspecialty of the intervention, funding source and conclusion), the journal (journal type, impact factor and PRISMA endorsement) and the published report (length in words). These characteristics have also been previously hypothesised as being associated with measures of quality,183,313,323 and were used in the exploratory analyses described below. The general characteristics collected and their operational definitions are listed in Table 7.1.

7.4.7 Reporting of meta-analyses

Two dimensions of the reporting of meta-analyses were assessed. First, the PRISMA statement was used to assess the reporting quality of included meta-analyses.180 PRISMA includes 27 items divided amongst the components of a published report (title/abstract, introduction, methods, results and discussion) and is the suggested format for how meta-analyses should be reported. For each included meta-analysis, the 27 PRISMA items were graded as “adequately” or “inadequately” reported. The PRISMA explanatory statement was used as a guideline for the interpretation of each item during the pilot phase outlined above.176 As an overall measure of reporting quality, a PRISMA score was calculated for each meta-analysis, with one point for each adequately reported PRISMA item, out of a maximum of 27. This was expressed as a proportion and used in the exploratory analyses outlined below. Table 7.2 contains the PRISMA guideline items along with their operational definitions, as recommended by the PRISMA group.

Second, the AMSTAR critical appraisal tool was used to assess the review level methodological quality of included meta-analyses. AMSTAR is an 11 item tool that reflects the current knowledge on what may bias the conduct of a meta-analysis. It is prescriptive in what should be regarded as good methodology. While the PRISMA statement is a comprehensive guideline on how to structure a report of a published meta-analysis, it is not necessarily prescriptive in terms of what should be regarded as adequate methodology for the conduct of a meta-analysis. For this reason, AMSTAR was used as a more appropriate tool for the assessment of review level methodological quality. Each of the 11 AMSTAR checklist items was graded as “adequately” or “inadequately” reported. As an overall measure of reported methodological quality, an AMSTAR score was calculated, with one point for each adequately described AMSTAR item, out of a maximum score of 11, and this was expressed as a proportion. The AMSTAR tool items along with their operational definitions are presented in Table 7.3.
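The scoring rule used for both tools (one point per adequately reported item, expressed as a proportion of the maximum) can be sketched as follows. This is an illustrative sketch only: the function name and the example grades are hypothetical, and only the item counts of 27 and 11 come from the text.

```python
PRISMA_ITEMS = 27  # number of PRISMA checklist items
AMSTAR_ITEMS = 11  # number of AMSTAR checklist items


def checklist_score(item_grades, n_items):
    """Return (points, proportion) for one meta-analysis.

    item_grades holds one boolean per checklist item,
    True meaning the item was graded as adequately reported.
    """
    if len(item_grades) != n_items:
        raise ValueError(f"expected {n_items} grades, got {len(item_grades)}")
    points = sum(1 for grade in item_grades if grade)
    return points, points / n_items


# Hypothetical report with 19 adequately reported PRISMA items.
points, proportion = checklist_score([True] * 19 + [False] * 8, PRISMA_ITEMS)
```

Each item carries equal weight under this rule, so the score summarises how many items were reported without distinguishing which ones.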

7.4.8 Sample size calculation

A sample size of 150 articles was chosen to power the exploratory regression analyses, with 15 articles included for each of the 10 explanatory variables.

7.4.9 Data analysis

A descriptive analysis was performed for characteristics, PRISMA and AMSTAR items. Frequencies and proportions were calculated for binary variables, including each individual adequately reported PRISMA and AMSTAR item, and basic descriptive statistics were calculated for continuous variables.

Table 7.1 Operational definitions of general characteristics of surgical meta-analyses

| Characteristic | Operational definition |
|---|---|
| AUTHOR CHARACTERISTICS | |
| Number of authors | Continuous variable recorded as an integer |
| Stated affiliation of the first author | Categorical variable recorded as i) Department of surgery (any specialty recognized by the Royal Australasian College of Surgeons) ii) Department of epidemiology / statistics / public health OR Clinical trials unit OR Cochrane collaboration affiliate iii) Department of medicine iv) Other department |
| Author background | Citation of epidemiology, biostatistics, public health or trials unit background by any one of the authors. Binary variable recorded as yes / no |
| Country of origin of first author | Categorical variable, dichotomised into i) “research country” (REF) defined as USA, Canada, Australia, New Zealand, Japan, Israel, and Western Europe and ii) “other country” |
| STUDY CHARACTERISTICS | |
| Type of comparison | Categorical variable recorded as i) surgical intervention vs. surgical intervention ii) surgical intervention vs. non-surgical intervention. The study aim statement or sample size calculation was used to determine the main comparison |
| Number of trials included | Total number of trials included in the meta-analysis (for any comparison). Continuous variable recorded as integer |
| Subspecialty of the intervention | Categorical variable recorded as one of the nine specialties represented by the Royal Australasian College of Surgeons, including general (including upper and lower gastrointestinal), orthopaedic, cardiothoracic, vascular, urology, otolaryngology/head and neck, neuro, plastic and paediatric surgery. If the intervention was associated with two subspecialties, the affiliations of the authors were used to determine the subspecialty |
| Source of funding | Categorical variable recorded as: i) Full industry: the only source(s) of funding was stated as an industry (for-profit) source, ii) Part industry: one source of funding or one section of the research supported by an industry (for-profit) source, iii) Non-industry: the only source(s) of funding was stated as a not-for-profit source (such as government grants, charitable trusts, or scholarships) iv) No external: where no external source of funding was declared (or it was declared the trial was internally funded by the authors’ institutions / department), or v) Unclear: the funding source was not declared |
| Conclusions of the meta-analysis | Categorical variable recorded as: i) supports and recommends the use of the intervention (even if conditions / caveats are mentioned), ii) calls for more research on the topic or, iii) does not support the intervention. If more than one intervention is examined, then the primary research question was used to determine which conclusion will be used for this item |
| JOURNAL AND REPORT CHARACTERISTICS | |
| Journal name and type | Categorised into i) general surgical journal ii) subspecialty surgical journal iii) general medical journal iv) subspecialty medical journal. A “general” journal was regarded as one publishing from multiple non-overlapping specialties |
| Journal impact factor | Recorded as continuous variable. The Thomson ISI Journal Citation Reports (JCR) was used to reflect the impact of that journal with the JCR edition prior to year of publication used to reflect the time lag in submission to publication of the article |
| PRISMA endorsement | Categorical variable recorded as i) PRISMA endorsing if the journal was listed in the PRISMA website list of endorsing journals/organisations as of December 2012 (http://www.prisma-statement.org/endorsers.htm), or the journal website’s instructions to authors explicitly stated that authors should follow the PRISMA guidelines when submitting a systematic review/meta-analysis for publication, OR ii) Not PRISMA endorsing, if the above criteria were not met. |
| Article length in words | Number of words in the published journal report, excluding abstracts, tables and figures |

Table 7.2 Preferred reporting items for systematic reviews and meta-analyses (PRISMA) 2009 checklist180 with operational definitions

| Section | Item | Operational definition |
|---|---|---|
| Title | 1. Title | Identify the report as a systematic review, meta-analysis, or both. |
| Abstract | 2. Structured summary | Provide a structured summary including, as applicable: background; objectives; data sources; study eligibility criteria, participants, and interventions; study appraisal and synthesis methods; results; limitations; conclusions and implications of key findings; systematic review registration number. |
| Introduction | 3. Rationale | Describe the rationale for the review in the context of what is already known. |
| Introduction | 4. Objectives | Provide an explicit statement of questions being addressed with reference to participants, interventions, comparisons, outcomes, and study design (PICOS). |
| Methods | 5. Protocol and registration | Indicate if a review protocol exists, if and where it can be accessed (e.g., Web address), and, if available, provide registration information including registration number. |
| Methods | 6. Eligibility criteria | Specify study characteristics (e.g., PICOS, length of follow-up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale. |
| Methods | 7. Information sources | Describe all information sources (e.g., databases with dates of coverage, contact with study authors to identify additional studies) in the search and date last searched. |
| Methods | 8. Search | Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated. |
| Methods | 9. Study selection | State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and, if applicable, included in the meta-analysis). |
| Methods | 10. Data collection process | Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators. |
| Methods | 11. Data items | List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made. |
| Methods | 12. Risk of bias in individual studies | Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis. |
| Methods | 13. Summary measures | State the principal summary measures (e.g., risk ratio, difference in means). |
| Methods | 14. Synthesis of results | Describe the methods of handling data and combining results of studies, if done, including measures of consistency (e.g., I²) for each meta-analysis. |
| Methods | 15. Risk of bias across studies | Specify any assessment of risk of bias that may affect the cumulative evidence (e.g., publication bias, selective reporting within studies). |
| Methods | 16. Additional analyses | Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre-specified. |
| Results | 17. Study selection | Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram. |
| Results | 18. Study characteristics | For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations. |
| Results | 19. Risk of bias within studies | Present data on risk of bias of each study and, if available, any outcome level assessment (see item 12). |
| Results | 20. Results of individual studies | For all outcomes considered (benefits or harms), present, for each study: (a) simple summary data for each intervention group (b) effect estimates and confidence intervals, ideally with a forest plot. |
| Results | 21. Synthesis of results | Present results of each meta-analysis done, including confidence intervals and measures of consistency. |
| Results | 22. Risk of bias across studies | Present results of any assessment of risk of bias across studies (see Item 15). |
| Results | 23. Additional analysis | Give results of additional analyses, if done (e.g., sensitivity or subgroup analyses, meta-regression [see Item 16]). |
| Discussion | 24. Summary of evidence | Summarize the main findings including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policy makers). |
| Discussion | 25. Limitations | Discuss limitations at study and outcome level (e.g., risk of bias), and at review-level (e.g., incomplete retrieval of identified research, reporting bias). |
| Discussion | 26. Conclusions | Provide a general interpretation of the results in the context of other evidence, and implications for future research. |
| Funding | 27. Funding | Describe sources of funding for the systematic review and other support (e.g., supply of data); role of funders for the systematic review. |

Table 7.3 AMSTAR 2007 (A measurement tool to assess the methodological quality of systematic reviews) items with operational definitions173

| Item | Operational definition |
|---|---|
| 1. Was an ‘a priori’ design provided? | The research question and inclusion criteria should be established before the conduct of the review. |
| 2. Was there duplicate study selection and data extraction? | There should be at least two independent data extractors and a consensus procedure for disagreements should be in place. |
| 3. Was a comprehensive literature search performed? | At least two electronic sources should be searched. The report must include years and databases used (e.g. Central, EMBASE, and MEDLINE). Key words and/or MESH terms must be stated and where feasible the search strategy should be provided. All searches should be supplemented by consulting current contents, reviews, textbooks, specialized registers, or experts in the particular field of study, and by reviewing the references in the studies found. |
| 4. Was the status of publication (i.e. grey literature) used as an inclusion criterion? | The authors should state that they searched for reports regardless of their publication type. The authors should state whether or not they excluded any reports (from the systematic review), based on their publication status, language etc. |
| 5. Was a list of studies (included and excluded) provided? | A list of included and excluded studies should be provided. |
| 6. Were the characteristics of the included studies provided? | In an aggregated form such as a table, data from the original studies should be provided on the participants, interventions and outcomes. The ranges of characteristics in all the studies analyzed e.g. age, race, sex, relevant socioeconomic data, disease status, duration, severity, or other diseases should be reported. |
| 7. Was the scientific quality of the included studies assessed and documented? | ‘A priori’ methods of assessment should be provided (e.g., for effectiveness studies if the author(s) chose to include only randomized, double-blind, placebo controlled studies, or allocation concealment as inclusion criteria); for other types of studies alternative items will be relevant. |
| 8. Was the scientific quality of the included studies used appropriately in formulating conclusions? | The results of the methodological rigor and scientific quality should be considered in the analysis and the conclusions of the review, and explicitly stated in formulating recommendations. |
| 9. Were the methods used to combine the findings of studies appropriate? | For the pooled results, a test should be done to ensure the studies were combinable, to assess their homogeneity (i.e. Chi-squared test for homogeneity, I²). If heterogeneity exists a random effects model should be used and/or the clinical appropriateness of combining should be taken into consideration (i.e. is it sensible to combine?). |
| 10. Was the likelihood of publication bias assessed? | An assessment of publication bias should include a combination of graphical aids (e.g., funnel plot, other available tests) and/or statistical tests (e.g., Egger regression test). |
| 11. Was the conflict of interest stated? | Potential sources of support should be clearly acknowledged in both the systematic review and the included studies. |

The overall PRISMA score (out of a maximum of 27) was examined for whether it met the characteristics of a normally distributed continuous variable prior to further analysis. PRISMA score was then included in exploratory multiple linear regression as the outcome variable, and ten exploratory general characteristics related to authors (epidemiology/statistics background and research country of origin), the study (subspecialty of the intervention, number of trials included, and the conclusion of the meta-analysis), the journal (type of journal, impact factor, PRISMA endorsement status and Cochrane vs. other reviews) and the report (length of the article in words) were used as explanatory variables. Univariate linear regression was first performed for each explanatory variable, and variables with a p value <0.25 were then combined in stepwise backward multiple regression analysis. A p value <0.05 was considered statistically significant in multivariate modelling, and significant associations were reported with their effect size and 95% confidence intervals. Regression diagnostics were then performed on the final regression model to examine for normality and homoscedasticity of residuals, and collinearity.
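The two-stage variable selection described above can be sketched in outline. The sketch below shows only the control flow (univariate screening at p < 0.25, then backward elimination); it assumes that the least significant variable is dropped at each step until all remaining variables have p < 0.05, which is one common stopping rule and is not stated in the text. The model-fitting step is passed in as a hypothetical callable, and all p values shown are invented for illustration.

```python
from typing import Callable, Dict, List


def screen_univariate(p_values: Dict[str, float], threshold: float = 0.25) -> List[str]:
    """Keep variables whose univariate p value is below the screening threshold."""
    return [name for name, p in p_values.items() if p < threshold]


def backward_eliminate(
    candidates: List[str],
    fit_p_values: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Repeatedly refit the model and drop the least significant variable.

    fit_p_values is a placeholder for a routine that fits the multiple
    regression on the given variables and returns one p value per variable.
    """
    selected = list(candidates)
    while selected:
        p = fit_p_values(selected)
        worst = max(selected, key=lambda name: p[name])
        if p[worst] < alpha:
            break  # every remaining variable is significant
        selected.remove(worst)
    return selected


# Invented p values, for illustration only.
univariate_p = {"cochrane": 0.001, "length_words": 0.01,
                "impact_factor": 0.20, "research_country": 0.60}
multivariate_p = {"cochrane": 0.001, "length_words": 0.02, "impact_factor": 0.30}

screened = screen_univariate(univariate_p)
final = backward_eliminate(screened, lambda names: {n: multivariate_p[n] for n in names})
```

The liberal screening threshold reduces the chance of discarding a variable whose association only becomes apparent after adjustment for the others.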

7.5 Results

7.5.1 Results of search

The electronic search strategy was executed in July 2011. Figure 7.1 depicts a flow diagram of article inclusion. A total of 1,215 unique citations were reviewed by two researchers, and after a stepwise review of abstracts and full texts, the required sample of 150 meta-analyses was included. Articles were published between January 2010 and June 2011. A list of the included surgical meta-analyses may be found in Appendix 10.

Figure 7.1 Flow diagram of surgical meta-analysis inclusion

Records identified through database searching (n = 1215)

Records screened (n = 1215)

Records excluded (n = 793): Not meta-analysis: 155; Not surgical: 511; Other reason: 127

Full-text articles assessed for eligibility (n = 422)

Full-text articles excluded (n = 244): Not meta-analysis: 218; Not surgical: 9; Other reason: 17

Studies assessed for inclusion (n = 178)

Full-text articles excluded (n = 28): Publications assessed backwards in time until the required sample size was reached

Studies included in analyses (n = 150)
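The counts in the flow diagram (Figure 7.1) can be checked with simple arithmetic. The short sketch below uses the figures reported in the diagram; the variable names are hypothetical, and the 28 articles are read here as eligible studies that were not needed once the sample of 150 was reached.

```python
# Counts taken from Figure 7.1.
records_screened = 1215
abstract_exclusions = {"not meta-analysis": 155, "not surgical": 511, "other reason": 127}
full_text_exclusions = {"not meta-analysis": 218, "not surgical": 9, "other reason": 17}
not_required_for_sample = 28  # excluded once the target sample was reached

# Each stage is the previous stage minus the exclusions at that stage.
full_text_assessed = records_screened - sum(abstract_exclusions.values())
assessed_for_inclusion = full_text_assessed - sum(full_text_exclusions.values())
included = assessed_for_inclusion - not_required_for_sample

print(full_text_assessed, assessed_for_inclusion, included)
```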

7.5.2 Characteristics of surgical meta-analyses

Table 7.4 depicts the general characteristics of the included studies. There was a mean of 4.8 authors per published meta-analysis, with most (67%) first authors affiliated with a surgical department. One third of published meta-analyses had at least one author with an epidemiology/statistics background. The most common countries of publication were the United Kingdom (21%), China (19%), and the United States (9%). A median of 8 randomised trials (interquartile range, 8) were included in each meta-analysis, and the most commonly published subspecialties were general surgery (including upper and lower gastrointestinal, 47%), orthopaedics (17%) and cardiothoracic surgery (12%). Approximately half of meta-analyses did not specify from where they obtained funding to conduct their review (52%), 23% were funded by a not-for-profit source, and 22% were funded internally by the authors’ department(s). Only 3% of meta-analyses declared funding from a commercial source. Approximately half of meta-analyses (49%) had conclusions that supported the use of the surgical intervention, 15% supported the control group intervention, and in 36% of meta-analyses there was a call for further research in order for conclusions to be made about the comparison(s). Meta-analyses were most often published in subspecialty surgery journals (41%), followed by general medical journals (29%). Cochrane reviews (which were regarded as general medical journals as per their classification in the Thomson Journal Citation Reports) made up 24% of the included meta-analyses. The articles were published in 81 unique journals, and 18 (22%) of those endorsed the PRISMA statement with explicit instructions to authors submitting a systematic review/meta-analysis to their journal. Journals had a moderately high Thomson Impact Factor, with a median impact factor of 3.5, and the average length of the published meta-analysis was 3,862 words, excluding the abstract, tables and figures.

7.5.3 Inter-observer reliability testing

Overall inter-observer agreement on PRISMA scores was high according to Lin’s concordance coefficient (0.94, 95% CI 0.91 – 0.97). Agreement on most individual PRISMA items was very high or near perfect.201 Only two items resulted in kappa statistics less than 0.5 (Items 16 and 23). Similarly, there was high inter-observer agreement on assessments using the AMSTAR tool (Lin’s concordance coefficient for AMSTAR scores = 0.93, 95% CI 0.89 – 0.96). Agreement on individual AMSTAR items was also very high, with only one item resulting in a kappa statistic less than 0.5 (Item 9). Tables 7.5 and 7.6 depict agreement statistics for individual PRISMA and AMSTAR items, respectively.

7.5.4 Reporting of surgical meta-analyses: compliance with the PRISMA statement

The mean PRISMA score was 19.0 (SD=4.4) out of a maximum of 27 (71% of items adequately reported, on average). PRISMA scores were normally distributed (with some negative skewness, Figure 7.2) and were subsequently used as an outcome variable in exploratory linear regression. Cochrane reviews were more compliant with the PRISMA statement, with a mean PRISMA score of 23.1 (SD=2.2) compared to other reviews (mean = 17.8, SD = 4.1).

Table 7.4 Characteristics of surgical meta-analyses

| Characteristic | Value |
|---|---|
| Number of authors, n (%) | |
| 1-3 | 31 (20) |
| 4-6 | 79 (53) |
| >6 | 40 (27) |
| Department of first author, n (%) | |
| Surgical | 101 (67) |
| Medical | 37 (25) |
| Epi/stats | 11 (7) |
| Other | 1 (1) |
| Author with epi/stats degree, n (%) | 49 (33) |
| Country [a], n (%) | |
| United Kingdom | 32 (21) |
| China | 29 (19) |
| United States | 14 (9) |
| Research country [b], n (%) | 110 (73) |
| Type of comparison, n (%) | |
| Surgery vs. surgery | 115 (77) |
| Surgery vs. non-surgery | 35 (23) |
| Number of trials included, median (IQR, range) | 8 (8, 2–67) |
| Subspecialty of the intervention, n (%) | |
| General (incl. upper/lower GI) | 71 (47) |
| Orthopaedic | 25 (17) |
| Cardiothoracic | 18 (12) |
| Vascular | 12 (8) |
| Urology | 10 (7) |
| ENT/head and neck | 8 (5) |
| Neurosurgery | 5 (3) |
| Plastic and reconstructive | 1 (1) |
| Funding source, n (%) | |
| Industry | 4 (3) |
| Unclear | 78 (52) |
| No external source | 33 (22) |
| Non-industry | 35 (23) |
| Conclusion, n (%) | |
| Supports surgical intervention | 74 (49) |
| Further research required | 54 (36) |
| Supports control group | 22 (15) |
| Type of journal, n (%) | |
| General surgical | 23 (15) |
| General medical | 44 (29) |
| Subspecialty surgical | 62 (41) |
| Subspecialty medical | 21 (14) |
| Cochrane reviews, n (%) | 36 (24) |
| Impact factor, median (IQR) | 3.5 (3.5) |
| PRISMA endorsing journal, n (%) | 66 (44) |
| Length of article in words, mean (SD) | 3862 (2174) |

[a] Only the top three countries presented for brevity
[b] USA, Canada, Australia, New Zealand, Japan, Israel, and Western Europe
GI = gastrointestinal; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses; SD = standard deviation; IQR = interquartile range

Table 7.5 Interobserver reliability assessments for each PRISMA item using kappa statistics

| PRISMA Item | Agreement (%) | Kappa statistic |
|---|---|---|
| 1. Title | 97 | 0.73 |
| 2. Structured summary | 93 | 0.86 |
| 3. Rationale | 98 | 0.66 |
| 4. Objectives | 92 | 0.62 |
| 5. Protocol and registration | 97 | 0.93 |
| 6. Eligibility criteria | 95 | 0.70 |
| 7. Information sources | 98 | 0.91 |
| 8. Search | 92 | 0.82 |
| 9. Study selection | 92 | 0.81 |
| 10. Data collection process | 95 | 0.83 |
| 11. Data items | 90 | 0.57 |
| 12. Risk of bias in individual studies | 98 | 0.95 |
| 13. Summary measures | 98 | 0.79 |
| 14. Synthesis of results | 90 | 0.71 |
| 15. Risk of bias across studies | 97 | 0.93 |
| 16. Additional analyses | 87 | 0.44 |
| 17. Study selection | 93 | 0.86 |
| 18. Study characteristics | 100 | 1.00 |
| 19. Risk of bias within studies | 92 | 0.81 |
| 20. Results of individual studies | 92 | 0.69 |
| 21. Synthesis of results | 98 | 0.92 |
| 22. Risk of bias across studies | 93 | 0.87 |
| 23. Additional analysis | 85 | -0.03 |
| 24. Summary of evidence | 90 | 0.45 |
| 25. Limitations | 87 | 0.72 |
| 26. Conclusions | 93 | 0.57 |
| 27. Funding | 97 | 0.93 |

Table 7.6 Interobserver reliability assessments for each AMSTAR item using kappa statistics

| AMSTAR Item | Agreement (%) | Kappa statistic |
|---|---|---|
| 1. “a priori” design | 97 | 0.93 |
| 2. Duplicate study selection and data extraction | 88 | 0.77 |
| 3. Comprehensive literature search | 85 | 0.69 |
| 4. Grey literature search | 80 | 0.60 |
| 5. List of included (and excluded) studies | 100 | 1.00 |
| 6. Characteristics of included studies | 85 | 0.56 |
| 7. Scientific quality assessed | 92 | 0.80 |
| 8. Scientific quality used in conclusions | 87 | 0.72 |
| 9. Appropriate methods to combine studies | 83 | 0.48 |
| 10. Publication bias assessment | 87 | 0.74 |
| 11. Conflict of interest | 95 | 0.81 |

Figure 7.2 Frequency histogram of PRISMA scores (x-axis: PRISMA score; y-axis: density)
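As a rough consistency check on the subgroup means for Cochrane and other reviews, the overall mean can be recovered as a sample-size-weighted average. The sketch below assumes 36 Cochrane reviews and 114 other reviews (the Cochrane count is given in Table 7.4; the remainder follows from the sample of 150); the function name is hypothetical.

```python
def pooled_mean(groups):
    """Weighted mean of subgroup means; groups is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# (number of reviews, mean PRISMA score) for Cochrane and other reviews.
overall = pooled_mean([(36, 23.1), (114, 17.8)])
print(round(overall, 2))
```

The result is close to the reported overall mean of 19.0; a small difference is expected because the subgroup means are themselves rounded to one decimal place.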

Figure 7.3 depicts the proportion of meta-analyses that adequately reported each PRISMA domain. Items that were reported well included identifying the report as a systematic review / meta-analysis in the "title" (Item 1, 91%), justifying a "rationale" for the study (Item 3, 97%), specifying the "eligibility criteria" for the inclusion of studies in the review (Item 6, 92%), specifying the "information sources" that were searched (Item 7, 89%), providing a list of "data items" extracted from included studies (Item 11, 89%), specifying the "summary measures" that were used in their statistical analysis (Item 13, 97%), providing a descriptive analysis of "study characteristics" (Item 18, 93%), providing a "synthesis of results" with appropriate outcome estimates and confidence intervals for each analysis (Item 21, 93%), and providing a "summary of the evidence" in the context of health care providers, users and/or policy makers (Item 24, 94%). Other domains were poorly reported, such as providing a "structured abstract" (Item 2, 34%), reporting a well defined "objective(s)" statement (Item 4, 15%), describing whether a "protocol and/or registration" of the review was available (Item 5, 34%), presenting the full electronic "search" strategy for at least one database (Item 8, 41%), reporting on the "risk of bias across studies" including publication and/or reporting biases in the included trials (Item 22, 47%), a description of the "limitations" at the study/outcome and review level (Item 25, 35%), and declaration of "funding" sources and their role in the review (Item 27, 43%).

Figure 7.3 Star chart depicting proportions of surgical meta-analyses adequately reporting each PRISMA item

7.5.5 Reporting of issues related to methodological quality: compliance with the AMSTAR checklist

Figure 7.4 depicts the reporting of individual AMSTAR items. Better reported items included the use of appropriate methods to combine studies in meta-analysis (Item 9, 83%), providing detailed characteristics of the included studies relating to included patients, the interventions, and outcomes examined (Item 6, 69%), and a detailed presentation of the scientific quality of the included studies (Item 7, 67%). Other items were poorly reported, including a description of any conflict(s) of interest in both the included studies and the performance of the review (Item 11, 9%), providing a list of included and excluded studies, even as appendices (Item 5, 27%), describing a comprehensive literature search (Item 3, 29%), declaring an “a priori” design was put in place for the review and the use of a protocol (Item 1, 35%), and the reporting of duplicate study selection and data extraction methods (Item 2, 41%).

AMSTAR items were less likely to be reported than PRISMA items, and this was reflected in the mean AMSTAR score of 5.2 (SD=2.9) out of a maximum of 11 (48% of items adequately reported, on average). Cochrane reviews reported on methodological issues more than twice as often as other reviews (mean Cochrane AMSTAR score = 9.0 [SD = 1.5] vs. non-Cochrane review mean AMSTAR score = 4.1 [SD = 2.1]).

Figure 7.4 Star chart depicting proportions of surgical meta-analyses adequately reporting each AMSTAR item

7.5.6 Variables associated with PRISMA reporting

Results of exploratory univariate and multivariate linear regression are presented in Figures 7.5 and 7.6, respectively. After adjustment for other variables, publication of the article in the Cochrane Database of Systematic Reviews was associated with an increase of 3.76 points in the total PRISMA score (95% CI, 2.12 to 5.40, p<0.001). The length of the article in words was also associated with PRISMA score, but with a smaller effect: an additional 0.40 points (95% CI, 0.08 to 0.72, p<0.001) for every 1000 words presented in the meta-analysis report. Diagnostics indicated that regression assumptions were met, and there was no evidence of heteroscedasticity of residuals or multicollinearity.
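To make the two adjusted coefficients concrete, the following sketch computes the expected difference in PRISMA score between two otherwise comparable reports. It uses only the point estimates reported above (3.76 points for Cochrane publication and 0.40 points per 1000 words); the intercept is not reported, so only differences, not absolute scores, can be computed. The function name and the example report are hypothetical.

```python
COCHRANE_COEF = 3.76        # adjusted points for publication as a Cochrane review
PER_1000_WORDS_COEF = 0.40  # adjusted points per additional 1000 words


def adjusted_prisma_difference(is_cochrane: bool, extra_words: float) -> float:
    """Expected difference in PRISMA score relative to a non-Cochrane
    report of the reference length, holding other variables constant."""
    return COCHRANE_COEF * int(is_cochrane) + PER_1000_WORDS_COEF * (extra_words / 1000)


# Hypothetical comparison: a Cochrane review that is 2000 words longer.
difference = adjusted_prisma_difference(True, 2000)
print(round(difference, 2))
```

In this reading, Cochrane publication is worth roughly as much as nine to ten thousand additional words, which illustrates why the length effect is described as the smaller of the two.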

Figure 7.5 Forest plot depicting univariate regression results with regression point estimates and 95% confidence intervals for each associated variable

a Change in PRISMA score expressed per one impact factor unit
b Change in PRISMA score expressed per one RCT
c Change in PRISMA score expressed per 1000 words

Figure 7.6 Forest plot depicting multivariate regression results with regression point estimates and 95% confidence intervals for each associated variable

a Change in PRISMA score expressed per 1000 words

7.6 Discussion

A systematic review of recently published surgical meta-analyses was conducted in order to examine compliance with the PRISMA statement as a reporting tool, and compliance with the AMSTAR checklist as a critical appraisal tool. It was found that overall compliance with the PRISMA statement was moderate: on average, 19 of 27 items (71%) were adequately reported. Of concern, items related to the validity of a meta-analysis, including protocol and registration, the search protocol for identifying eligible studies, assessments of risk of bias across included studies, a discussion of limitations, and the reporting of funding sources were poorly reported. Overall compliance with the AMSTAR checklist was poor, with less than half of items (48%) adequately addressed, on average. Of particular concern was the lack of reporting of conflicts of interest at both study and review level, the performance and description of a comprehensive literature search, describing whether any a priori design was implemented, and the lack of duplicate study selection and data extraction. It was also found that compliance with PRISMA was positively associated with publication in the Cochrane Database of Systematic Reviews, and an increased length of the article in words. There was no evidence of an association with reporting for other variables examined, such as author affiliation or reported educational background, journal endorsement of the PRISMA guidelines, or the impact factor of the journal. To the best of the author's knowledge, this study is the first to examine compliance of surgical meta-analyses with the PRISMA statement, and sets the standard for what is known about current reporting.

A number of PRISMA items were well reported amongst the included sample of surgical meta-analyses. The majority of studies were described as a systematic review and/or meta-analysis in the title (Item 1), which allows instant identification by the reader (as a potential high level of evidence) and convenient indexation in the appropriate categories by medical databases.

Most authors also provided a sufficient rationale for their review (Item 3), allowing readers to place the questions being asked in their appropriate context and state of knowledge. Of greater importance was the high rate of reporting of study eligibility criteria (Item 6), which has implications for the validity of the meta-analysis and its ability to answer the research question(s) posed. For example, questions related to specific patients, interventions, comparisons and outcomes (or “PICO”) characteristics can only be answered if those data are represented in the included trials.239 Further, the inclusion of certain studies based on their scientific quality,204 publication status,90 or language,166 may result in bias. The reporting of details of these characteristics in the results section of included meta-analyses was also very good (Item 18, “Study characteristics”). Meta-analyses often listed the sources of information searched along with their dates of coverage (Item 7), allowing readers to assess how likely it is that all sources of evidence were found, and whether the meta-analysis was up to date.324 Since only meta-analyses were included in this sample, it was essential that quantitative results were presented succinctly. Most authors adequately reported the results of meta-analyses by providing summary effect estimates, measures of precision, and measures of consistency, which allows readers to readily assess the clinical and statistical significance of findings.240

Other PRISMA items were poorly reported, and may reflect a lack of knowledge amongst review authors as to their importance and potential implications. Most meta-analyses were not summarised in a structured abstract (Item 2). Abstracts provide convenient, essential information about the published article, and for many readers (who are restricted by time and/or access to full texts of journals), the abstract of the article may be all that is available. Most authors provided abstracts with standard subheadings (introduction/background, methods, results and conclusions/discussion), which do not stipulate the reporting of some essential information for a meta-analysis, such as data sources, study eligibility criteria, and limitations of the review. While the PRISMA statement does not recommend a specific format for abstracts of meta-analyses,176 guidelines are available for other study types,325 and many general medical journals now stipulate that authors provide abstracts using specific subheadings. Cochrane reviews have strict protocols for the reporting of their reviews, which include a structured abstract, and comprised the majority of meta-analyses that adequately reported this item.

In any study (not just a meta-analysis), providing clear and unambiguous research questions guides the reader as to scope and relevance, and allows the appraisal of methods used to answer those questions.239 Conversely, a study that does not report an objective statement is often difficult to understand and cumbersome to appraise, and may indicate a lack of author focus and a risk of reporting bias. Only 15% of meta-analyses reported their study objectives in a clear and concise manner (Item 4). Apart from Cochrane reviews, which are obligated to publish a peer-reviewed protocol prior to the conduct of the review, few authors (only 13% of non-Cochrane review authors) reported whether a protocol was planned or available for their meta-analysis (Item 5). A lack of a protocol leaves a review open for a posteriori modification of methods and selective reporting of results and conclusions. A survey of Cochrane reviews found that most had important changes (that were likely to alter the review’s results) when compared to the published protocol.163 It is highly likely this practice also occurs commonly in non-Cochrane reviews. In recognition of the importance of protocols in randomised trials, most high impact journals require that trials be registered for acceptance for publication.289 It remains to be seen whether a similar initiative takes place for systematic reviews and meta-analyses, although publicly available registries already exist.326

Less than half of meta-analysis authors reported an assessment of bias across included studies (Items 15 and 22). Publication and outcome reporting biases (where studies and/or outcomes are published according to their results) are common in the literature,84 and are likely to affect summary effect estimates.91,280 Authors should therefore carefully consider whether their results are subject to these common biases by the examination of funnel plots or the performance of statistical tests.259 Most did not adequately discuss the limitations of their review (Item 25). While it was commonly found that the limitations of included studies (or their risk of bias) were included in the discussion, a discussion of the review level risk of bias was uncommon.
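One commonly used statistical test for funnel plot asymmetry is Egger's regression, in which each study's standardised effect (the effect divided by its standard error) is regressed on its precision (the reciprocal of the standard error); an intercept that departs from zero suggests small-study effects such as publication bias. The Python sketch below is a minimal illustration of that approach. It was not part of the analyses in this chapter, the function name is hypothetical, and the input values are invented for demonstration only.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger's regression: return (intercept, standard error of the intercept).

    effects    -- study effect estimates on an additive scale (e.g. log odds ratios)
    std_errors -- their standard errors (must not all be equal)
    """
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are required")
    y = [e / s for e, s in zip(effects, std_errors)]  # standardised effects
    x = [1.0 / s for s in std_errors]                 # precisions
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, se_intercept


# Invented example: hypothetical log odds ratios and their standard errors
log_or = [-0.10, -0.25, -0.40, -0.70, -0.95]
se = [0.10, 0.20, 0.30, 0.45, 0.60]
b0, b0_se = egger_intercept(log_or, se)
print(f"Egger intercept = {b0:.2f} (SE {b0_se:.2f})")
```

The intercept is usually compared with zero using a t distribution with n - 2 degrees of freedom. The test has low power when few trials are included, so a non-significant result does not rule out bias.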

A meta-analysis, like any other study design, is subject to bias, which may be introduced in the selection of studies, their appraisal, data extraction and analysis stages.161 These limitations should be reported in a transparent manner, in line with what is appropriate for reports of other study designs.

AMSTAR was additionally used as a critical appraisal tool to assess the potential for bias in the included meta-analyses. Some AMSTAR items were found to overlap with PRISMA items, such as the use of a protocol (PRISMA Item 5 and AMSTAR Item 1), the assessment of risk of bias in included studies (PRISMA Item 19 and AMSTAR Item 7), and the assessment of publication bias (PRISMA Item 22 and AMSTAR Item 10). However, other AMSTAR items had more stringent requirements, and were generally poorly reported. For example, most meta-analyses did not involve duplicate study selection and data extraction, which may reduce measurement biases.327 Only 30% of meta-analyses performed a “comprehensive search”, with at least two electronic databases searched, supplemented by other sources, and the syntax of the search strategy provided in detail. Unpublished “grey literature” was only searched half the time, but meta-analyses based solely on published data may have biased summary estimates.90 One third of meta-analyses did not perform a comprehensive individual study assessment of the risk of bias of included studies. A large body of evidence now exists for the exaggeration of treatment effects in randomised trials of poor methodological quality,56 and the summary effect estimates provided in meta-analyses are subject to the same distortion of the truth. The use of checklists or summary scores as an aggregate measure of bias is problematic204 and was not regarded as adequate reporting of this item. Finally, the vast majority of meta-analyses did not report on potential conflicts of interest both at the trial and review level. The impact of funding source on author conclusions has long been recognised in randomised trials97,287 and it is likely that the same relationship exists for meta-analyses, with at least one empirical investigation showing an association between meta-analysis funding and conclusions which are favourable to the funder.95

Several studies have assessed the reporting of systematic reviews in various specialties, but few have used PRISMA as a reporting checklist.328 Moher and colleagues performed a broad review of 300 published systematic reviews indexed on Medline in 2004. This sample of surgical meta-analyses compared well, with higher rates of reporting for several characteristics, such as use of the terms “systematic review and/or meta-analysis” in the title, the inclusion of grey literature in the search for studies, provision of the syntax for the search strategy, and the assessment of heterogeneity and publication bias.157 Surgical meta-analyses also compared favourably to those published in other sub-specialties, including those published in complementary and alternative medicine329 (although that sample did not include Cochrane reviews), and Chinese medicine.328 However, samples of systematic reviews from anaesthesiology183 and renal medicine323 had superior reporting when compared to this review of surgical meta-analyses. It should be noted, however, that sampling methods and data item definitions might have differed in all these studies, limiting the validity of any comparisons.

Multivariate regression was modelled to explore characteristics associated with overall reporting using the PRISMA guidelines. After adjustment for other variables, Cochrane reviews had a PRISMA score approximately four points higher than other reviews. Cochrane reviews follow strict guidelines and peer reviewed protocols, and have been shown to be consistently superior to other reviews in reporting and methodology.157,317,318 Manuscript length was also associated with higher compliance with the PRISMA statement; for every 1000 words, the PRISMA score increased by 0.40. There is conflicting evidence regarding the significance of this result. Biondi-Zoccai and colleagues found an association between the length of the article and compliance with QUOROM323 but more recent analyses failed to confirm such associations.183 The methodology used in this study was more robust, with larger samples than the aforementioned studies, and employed manuscript length in words, rather than pages, to reflect true journal constraints. Word limits imposed by journals are one of the major hurdles to complete reporting, and these findings suggest that reporting quality is compromised when inadequate space is provided in all sections. Of note, journal endorsement of PRISMA or QUOROM was not associated with improved reporting, despite a significant association being found in another study.183 It is likely that endorsement of the PRISMA statement is not always enforced by editors and journals, reflecting a similar experience to the introduction and enforcement of the CONSORT guidelines for randomised trials.225

The strengths of this research include a protocol driven design, a comprehensive search with duplicate study selection and data extraction, and a prolonged pilot period during which discrepancies in data items were resolved. The main weakness of this study related to the use of aggregate scores indicating compliance with PRISMA and AMSTAR. While these have been used previously as measures of overall reporting, they may not be valid. Nevertheless, the reporting of individual items was described in this paper. Second, assessment of reporting relied on the descriptions and methods conveyed by authors, and may not be an accurate reflection of what actually occurred during the review. Finally, only published surgical meta-analyses of randomised trials were included in this review, representing the highest level of evidence available in clinical research. The reporting patterns found here may not be generalisable to other specialties, to unpublished articles or articles indexed in other databases, or to systematic reviews of non-randomised trials.

Systematic reviews and meta-analyses are increasingly being recognised as a means to cope with the large amounts of information available to clinicians, patients, and policy makers. Meta-analyses may be particularly useful in surgical specialties given the relatively smaller sample sizes of surgical trials, and the heterogeneous nature of surgical clinical scenarios. Despite the availability and endorsement by journals of reporting guidelines, deficits remain in published surgical meta-analyses. This paper has highlighted these deficits. The implications are that the results of meta-analyses may be biased, which is of particular concern given that meta-analyses are often highly accessed and cited. Further research is needed to explore to what extent the methodology of meta-analyses influences effect estimates. In the meantime, reporting guidelines such as PRISMA should be widely adopted and enforced by authors, journals and readers of research.

8. A discussion of the findings of iQuEST *

*Investigating the Quality and Epidemiology of Surgical Trials

This thesis has examined the different sources of bias that may occur in randomised trials and meta-analyses of surgical interventions, an area that until recently has received relatively little attention. In the first study (Chapter 2), the method of attaining the sample of surgical RCTs was explained, followed by a description of their epidemiology and methodological characteristics. A direct comparison of these quality characteristics was made to previous samples of general healthcare RCTs. In the second study (Chapter 3), a different dimension of quality was assessed: reporting quality. Compliance with the CONSORT statement was used to measure this, and given the complexities of surgical interventions and patients, this was supplemented with different items related to external validity. The third study (Chapter 4) examined whether quality domains were associated with treatment effects, and whether there was visual (using plots) or statistical evidence of bias. The fourth and fifth studies examined issues related to outcome reporting in surgical trials. In Chapter 5, the association between completeness of outcome reporting and statistical significance was explored, while in Chapter 6 the patterns and clinical relevance of outcomes reported in surgical trials were assessed. Finally, in the sixth study (Chapter 7) the epidemiology, methodological quality, and reporting quality of meta-analyses of surgical interventions were assessed.

While some aspects of bias have been examined in the existing surgical literature, this project fills a number of important gaps. First, a large sample of recently published RCTs and meta-analyses was obtained using systematic review methods. Trials were not limited to a particular surgical specialty or journal. Second, only studies assessing a surgical intervention were included, thus the findings of this thesis are generalisable exclusively to surgical practice. Third, a broad assessment of the different sources of bias was performed, as opposed to the narrow focus of previous assessments of bias. Fourth, this study is the first to assess issues of outcome selection bias. Fifth, this study is the first to examine compliance of surgical meta-analyses with the current guidelines on reporting (PRISMA) and methodology (AMSTAR).

It was found that surgical RCTs had low rates of reporting key scientific quality domains. Less than half of trials described an adequate random sequence generation or allocation concealment, while only one third described any form of blinding, or used the intention to treat principle as the primary method of analysis. Few other studies have assessed the quality of surgical trials in a similar manner. Hall and colleagues330 assessed 346 trials published in several popular surgical journals, and found that 75% gave “an account of the selection process”, 48% of trials performed an “unbiased assessment of outcome events”, and only 27% described methods of randomisation. It is unclear, however, what the authors specifically regarded as adequate criteria for each of these domains. In 2005, the same authors performed an updated quality assessment of surgical RCTs using several quality criteria adapted from the CONSORT checklist.140 Their findings were consistent with the findings of this thesis, with 58% of RCTs reporting an adequate method of randomisation, 29% with adequate allocation concealment, and 30% with adequate blinding. Manterola and colleagues36 found low methodological quality in their sample of 500 studies published in surgical journals, but included other study types in addition to RCTs. Further, a quality scale with little validity was used to assess quality. A key difference between this thesis and the above studies was the inclusion criteria for studies assessed: it is likely that a substantial proportion of the trials assessed in those studies examined non-surgical interventions in surgical patients, rather than having a pure focus on surgical interventions. Jacquier and colleagues have, thus far, performed the most thorough quality evaluation of surgical intervention trials using the CLEAR NPT checklist.35 The authors found slightly lower rates of adequate randomisation, allocation concealment, and blinding than in this thesis. Other important issues, such as a sample size calculation, specification of primary/secondary outcomes, or declarations of funding, were not assessed in the above studies.

Commentators have criticised the quality of surgical research for lagging behind other specialties, particularly in terms of the quality and volume of RCTs.119,331 This thesis found that surgical trial quality compared favourably to other samples of general healthcare trials34,186 for most domains. Predictably, adequate forms of blinding were less common in surgical trials. It was concerning that surgical trials had significantly higher rates of industry funding, given that it is common for funded trials to favour intervention groups,95,97 and at the same time a large proportion of surgical trials were unclear regarding their source of funding. When compared to trials in other disciplines, it appears the problem with surgical trials is not their quality, but their quantity. The focus on case series and retrospective studies remains, and such studies cannot settle important clinical questions in a credible manner.

This project was unique in that different dimensions of trial reporting were assessed. In addition to domains of methodological quality, the reporting of surgical trials was examined using compliance with the CONSORT statement. On average, only 55% of the 22 CONSORT 2001 items were adequately reported. For a large proportion of trials it was unclear, for example, what the objectives were, whether a sample size calculation was done, or what approach to statistical analysis was conducted. Further, items related to the external validity (generalisability) of surgical trials were poorly reported. This information is particularly important for trials of surgical interventions where there is often wide variation in the approach to a surgical procedure, as well as perioperative management. These items are unlikely to bias treatment effects, but are basic components of a trial, and without them understanding and appraising what was done can be quite difficult. The findings of this thesis are consistent with two other recent assessments of CONSORT compliance. Agha and colleagues120 found that approximately half of CONSORT items were reported, but they were not systematic in their inclusion of RCTs (and therefore their assessment had questionable generalisability). Balasubramanian found slightly higher compliance, but only included studies from high impact general medical and surgical journals,332 and therefore their findings are restricted to those sources. It appears that the most reliable solution to improve surgical trial reporting is to make compliance with the CONSORT statement mandatory prior to acceptance by a journal. A study is underway to examine whether this will indeed improve reporting.333

This thesis provides empirical evidence for the association between inadequate methodology and exaggerated treatment effects in surgical trials. Inadequately randomised surgical trials were found to exaggerate effects by almost 20%, while inadequate methods to deal with attrition consistently exaggerated effects by approximately 25%. It is interesting to note these findings were consistent with those of Savovic and colleagues,56 whose review of meta-epidemiology studies found similar rates of outcome exaggeration for these domains. Thus, surgical interventions do not seem more or less prone to biased effect estimates than other types of interventions. It is unclear why blinding was not found to influence outcome estimates in this thesis, but it is possible the lack of blinding of other parties confounded these results (since only blinding of the outcome of interest was considered). It is also possible these findings are confounded by “small study effects”,79 that is, the tendency for smaller (and poorer quality) studies to show larger treatment effects.

The reporting of meta-analyses of surgical interventions also has deficits. Individual domains had similar rates of reporting when compared to an assessment of general healthcare meta-analyses performed by Moher and colleagues.157 Bias in the conduct of a systematic review and meta-analysis is particularly problematic, since findings are often highly influential.312 There may also be less awareness of the potential for bias in meta-analyses amongst the medical community. Tricco found few studies on the bias that may exist in systematic reviews, and little empirical data on how this bias can influence effect estimates.161 The influence of meta-epidemiology studies on methodological research in RCTs has been substantial. Similar research is now needed to explore the association between different quality domains and treatment effects in meta-analyses.

Outcomes measured in surgical trials are reported based on their statistical significance. Further, there is a concerning proportion of outcomes that remain unreported. This finding is highly consistent with previous investigations of outcome reporting bias,24,88,284 although the bias was somewhat larger in this thesis. The empirical basis for outcome reporting bias is now starting to be understood,91 and given the difficulty in detecting unreported outcomes, the impact is probably understated. The only way to reliably prevent it is the publication of protocols prior to trial commencement, a proposition that is fraught with legal and ethical challenges.288 Efforts are already underway, with the recent publication of guidelines for the publication of trial protocols.289 It remains to be seen whether the wider research community adopts these ideals.

One in three surgical trials had a surrogate/laboratory based primary outcome, which is concerning given the uncertain link between surrogate measures and their clinical correlates,302 as well as their tendency to have relatively larger treatment effects.301 However, the use of surrogate outcomes in surgical trials compares favourably to an assessment of diabetes trials by Gandhi and colleagues, where over 80% of primary outcomes were surrogates. It is likely that surrogate outcomes are selected based on their ease of measurement and their expected effect size. Regardless, the use of surrogate and laboratory based outcomes may limit how useful trials are in clinical practice, and advice regarding their use should be integrated into reporting guidelines.

It is clear from existing evidence, and the findings of this thesis, that the status quo for the conduct and reporting of surgical trials should be improved. Readers and users of the research “need and deserve to know” about what actually took place in a trial.215 The onus is not just on authors, but also journal editors, reviewers and funders to stress the importance of this issue.

The multiple, coinciding challenges of surgical trials are well recognised. The need for greater emphasis and funding has been recognised,334 but slow recruitment is often a problem even in large, well-funded surgical trials.129,335

More comprehensive solutions are required. In Germany, a multicentre network of academic surgeons was established in 2003 in conjunction with the German Cochrane Centre, in order to facilitate the conduct of important surgical RCTs. The Study Centre of the German Surgical Society (Studienzentrum der Deutschen Gesellschaft für Chirurgie, or SDGC) has thus far produced five multicentre trials and a number of systematic reviews, an impressive achievement, and may be a model for how surgical evidence should be produced.336 In the United Kingdom, important steps have been taken by the IDEAL collaboration, a group of surgeons and methodologists aiming to improve the quality of research in surgery. After a series of meetings, the group has recently co-written a series of articles outlining the nature of surgical development, the challenges of surgical research, and a framework for surgical innovation.10,135,189 The IDEAL recommendations recognise that RCTs remain the gold standard for comparing interventions, but that alternate study designs are required in the early stages of surgical innovation or when RCTs are not feasible at all. It remains to be seen whether these IDEALs gain traction in the surgical community.

References

1. Daly J. A Short History of Evidence-Based Medicine. In: Montori V (editor). Evidence-Based Endocrinology. Totowa, USA. Humana Press; 2006:11–24.

2. Gordis L. Epidemiology. 4 ed. Philadelphia, USA: Saunders Elsevier; 2008.

3. Daly WJ, Brater DC. Medieval Contributions to the Search for Truth in Clinical Medicine. Perspectives in Biology and Medicine. 2000;43(4):530–540.

4. Galen on medical experience. JAMA 1945;129(6):483–483.

5. Nasser M, Tibi A, Savage-Smith E. 11th century rules for assessing the effects of drugs. JLL Bulletin: Commentaries on the history of treatment evaluation (www.jameslindlibrary.org).

6. Tröhler U. To improve the evidence of medicine. Edinburgh, Scotland. Royal College of Physicians of Edinburgh, Metro Press; 2000.

7. Claridge JA, Fabian TC. History and development of evidence-based medicine. World J Surg. 2005;29:547–553.

8. Best M, Neuhauser D. Pierre Charles Alexandre Louis: Master of the spirit of mathematical clinical science. BMJ Quality & Safety. 2005;14(6):462–464.

9. Tröhler U. Cheselden's 1740 presentation of data on age-specific mortality after lithotomy. JLL Bulletin: Commentaries on the history of treatment evaluation (www.jameslindlibrary.org).

10. Barkun JS, Aronson JK, Feldman LS, Maddern GJ, Strasberg SM. Evaluation and stages of surgical innovations. The Lancet. 2009;374(9695):1089–1096.

11. Kaska SC, Weinstein JN. Historical perspective. Ernest Amory Codman, 1869-1940. A pioneer of evidence-based medicine: the end result idea. Spine. 1998;23(5):629–633.

12. Amberson JB, McMahon BT, Pinner M. A clinical trial of sanocrysin in pulmonary tuberculosis. American Review of Tuberculosis. 1931;24:401–435.

13. British Medical Research Council. Streptomycin Treatment of Pulmonary Tuberculosis. BMJ. 1948;2(4582):769–782.

14. Chalmers I. Why the 1948 MRC trial of streptomycin used treatment allocation based on random numbers. Journal of the Royal Society of Medicine. 2011;104(9):383–386.

15. Cochrane AL. Sickness in Salonica: my first, worst, and most successful clinical trial. BMJ (Clinical research ed). 1984;289(6460):1726.

16. Cochrane AL. Effectiveness and efficiency: random reflections on health services: Oxford, UK: Oxford University Press; 1973.

17. Feinstein AR. An additional basic science for clinical medicine: I. The constraining fundamental paradigms. Ann Intern Med. 1983;99(3):393-397.

18. Feinstein AR. Boolean Algebra and Clinical Taxonomy. N Engl J Med. 1963;269(18):929–938.

19. Feinstein AR. An Additional Basic Science for Clinical Medicine: II. The Limitations of Randomized Trials. Ann Intern Med. 1983;99(4):544–550.

20. Feinstein AR. An additional basic science for clinical medicine: III. The challenges of comparison and measurement. Ann Intern Med. 1983;99(5):705-712.

21. Guyatt GH, Haynes RB, Jaeschke RZ, et al. for the Evidence-Based Medicine Working Group. Users' Guides to the Medical Literature: XXV. Evidence-based medicine: principles for applying the Users' Guides to patient care. JAMA. 2000;284(10):1290–1296.

22. Guyatt G and the Evidence-Based Medicine Working Group. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA. 1992;268(17):2420–2425.

23. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 Explanation and Elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869.

24. Chan A-W, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004;291(20):2457–2465.

25. Gluud LL. Bias in clinical intervention research. Am J Epidemiol. 2006;163(6):493–501.

26. Sacks H, Chalmers TC, Smith HJ. Randomized versus historical controls for clinical trials. Am J Med. 1982;72(2):233–240.

27. Wacholder S. Design issues in case-control studies. Statistical Methods in Medical Research. 1995;4(4):293–309.

28. Chalmers TC, Celano P, Sacks HS, Smith HJ. Bias in treatment assignment in controlled clinical trials. N Engl J Med. 1983;309(22):1358–1361.

29. Smith GD, Ebrahim S. Data dredging, bias, or confounding. BMJ. 2002;325:1437-1438.

30. Cobb LA, Thomas GI, Dillard DH, Merendino KA, Bruce RA. An evaluation of internal-mammary-artery ligation by a double-blind technic. N Engl J Med. 1959;260(22):1115–1118.

31. Bohensky MA, Sundararajan V, Andrianopoulos N, et al. Trends in elective knee arthroscopies in a population-based cohort, 2000-2009. Med J Aust. 2012;197(7):399–403.

32. Moseley JB, O'Malley K, Petersen NJ. A Controlled Trial of Arthroscopic Surgery for Osteoarthritis of the Knee. N Engl J Med. 2002;347(2):81-88.

33. Kirkley A, Birmingham TB, Litchfield RB, et al. A Randomized Trial of Arthroscopic Surgery for Osteoarthritis of the Knee. N Engl J Med. 2008;359(11):1097–1107.

34. Chan A-W, Altman DG. Epidemiology and reporting of randomised trials published in PubMed journals. Lancet. 2005;365(9465):1159–1162.

35. Jacquier I, Boutron I, Moher D, Roy C, Ravaud P. The reporting of randomized clinical trials using a surgical intervention is in need of immediate improvement: a systematic review. Ann Surg. 2006;244(5):677–683.

36. Manterola C, Pineda V, Vial M, Losada H, the MG. What is the methodologic quality of human therapy studies in ISI surgical publications? Ann Surg. 2006;244(5):827–832.

37. Bhandari M, Guyatt GH, Lochner H, Sprague S, Tornetta P3. Application of the Consolidated Standards of Reporting Trials (CONSORT) in the Fracture Care Literature. J Bone Joint Surg Am. 2002;84-A(3):485–489.

38. Oxman AD and the Evidence-Based Medicine Working Group. Grading quality of evidence and strength of recommendations. BMJ. 2004;328(7454):1490–1494.

39. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273(5):408–412.

40. Howick J, Chalmers I, Glasziou P, et al., editors. OCEBM Levels of Evidence Working Group. “The Oxford 2011 Levels of Evidence.” Oxford Centre for Evidence-Based Medicine Available at: http://www.cebm.net/index.aspx?o=5653. Accessed December 8, 2013.

41. Silverman WA, Chalmers I. Casting and drawing lots: a time-honoured way of dealing with uncertainty and for ensuring fairness. The James Lind Library (www.jameslindlibrary.org). 2006:1–7.

42. Chalmers I, Dukan E, Podolsky S, Davey Smith G. The advent of fair treatment allocation schedules in clinical trials during the 19th and early 20th centuries. Journal of the Royal Society of Medicine. 2012;105(5):221–227.

43. Doull JA, Hardy M, Clark JH. The effect of irradiation with ultra-violet light on the frequency of attacks of upper respiratory disease (common colds). Am J Epidemiol. 1931;13(2):460–477.

44. Colebrook D. Irradiation and Health, Medical Research Council Special Report No. 131. 1929:p 4–5,12–13.

45. Ioannidis JP, Haidich AB, Pappa M, et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. JAMA. 2001;286(7):821–830.

46. Kunz R, Vist G, Oxman AD. Randomisation to protect against selection bias in healthcare trials. Cochrane Database Syst Rev. 2007;(2):MR000012.

47. Schulz KF. Subverting randomization in controlled trials. JAMA. 1995;274(18):1456–1458.

48. Higgins JPT, Altman DG, Sterne JAC. Assessing risk of bias in included studies. In: Higgins JPT, Green S, (editors). Cochrane Handbook for Systematic Reviews of Interventions. 5th edition. Chichester, UK: John Wiley & Sons, Ltd; 2008: pp187-241.

49. Schulz KF, Grimes DA. Unequal group sizes in randomised trials: guarding against guessing. The Lancet. 2002;359(9310):966–970.

50. Scott NW, McPherson GC, Ramsay CR, Campbell MK. The method of minimization for allocation to clinical trials: a review. Control Clin Trials. 2002;23(6):662–674.

51. Treasure T, MacRae KD. Minimisation: the platinum standard for trials?: Randomisation doesn’t guarantee similarity of groups; minimisation does. BMJ. 1998;317(7155):362.

52. Altman DG, Schulz KF. Statistics notes: Concealing treatment allocation in randomised trials. BMJ. 2001;323(7310):446–447.

53. Moher D, Pham B, Jones A, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? The Lancet. 1998;352(9128):609–613.

54. Egger M, Juni P, Bartlett C, Holenstein F, Sterne J. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? Empirical study. Health Technol Assess. 2003;7(1):1–76.

55. Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med. 2001;135(11):982–989.

56. Savovic J, Jones HE, Altman DG, et al. Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Ann Intern Med. 2012;157(6):429–438.

57. Green S, Higgins JPT, Alderson P, Clarke M, Mulrow C, Oxman AD. Introduction. In: Higgins JPT, Green S, (editors). Cochrane Handbook for Systematic Reviews of Interventions. 5th edition. Chichester, UK: John Wiley & Sons, Ltd; 2008: pp 1-9.

58. Hill AB. Suspended judgment. Memories of the British Streptomycin Trial in Tuberculosis. The first randomized clinical trial. Control Clin Trials. 1990;11(2):77–79.

59. Schulz KF, Chalmers I, Altman DG. The landscape and lexicon of blinding in randomized trials. Ann Intern Med. 2002;136(3):254–259.

60. Editorial commentary. Differences in the way treatment outcomes are assessed. The James Lind Library (www.jameslindlibrary.org). 2007.

61. Schulz KF, Grimes DA. Blinding in randomised trials: hiding who got what. The Lancet. 2002;359(9307):696–700.

62. Boutron I, Tubach F, Giraudeau B, Ravaud P. Blinding was judged more difficult to achieve and maintain in nonpharmacologic than pharmacologic trials. J Clin Epidemiol. 2004;57(6):543–550.

63. Macklin R. The ethical problems with sham surgery in clinical research. N Engl J Med. 1999;341(13):992–996.

64. Freed CR, Greene PE, Breeze RE, et al. Transplantation of embryonic dopamine neurons for severe Parkinson's disease. N Engl J Med. 2001;344(10):710–719.

65. Freeman TB, Vawter DE, Leaverton PE, et al. Use of placebo surgery in controlled trials of a cellular-based therapy for Parkinson's disease. N Engl J Med. 1999;341(13):988–992.

66. Waber RL, Shiv B, Carmon Z, Ariely D. Commercial features of placebo and therapeutic efficacy. JAMA. 2008;299(9):1016–1017.

67. Petrovic P, Kalso E, Petersson KM, Ingvar M. Placebo and opioid analgesia- imaging a shared neuronal network. Science. 2002;295(5560):1737–1740.

68. Beecher HK. Surgery as placebo: A quantitative study of bias. JAMA. 1961;176(13):1102.

69. Hróbjartsson A, Gotzsche PC. Is the Placebo Powerless? N Engl J Med. 2001;344(21):1594-1602.

70. Wood L, Egger M, Gluud LL, et al. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ. 2008;336(7644):601–605.

71. Johnson AG. Surgery as a placebo. The Lancet. 1994;344(8930):1140–1142.

72. Kaptchuk TJ, Goldman P, Stone DA. Do medical devices have enhanced placebo effects? J Clin Epidemiol. 2000;53(8):786-792.

73. Montori VM, Guyatt GH. Intention-to-treat principle. CMAJ. 2001;165(10):1339–1341.

74. Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ. 1999;319(7211):670–674.

75. Fergusson D, Aaron SD, Guyatt G, Hébert P. Post-randomisation exclusions: the intention to treat principle and excluding patients from analysis. BMJ. 2002;325(7365):652–654.

76. Porta N, Bonet C, Cobo E. Discordance between reported intention-to-treat and per protocol analyses. J Clin Epidemiol. 2007;60(7):663–669.

77. Balk EM, Bonis PAL, Moskowitz H, et al. Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA. 2002;287(22):2973–2982.

78. Tierney JF. Investigating patient exclusion bias in meta-analysis. Int J Epidemiol. 2004;34(1):79–87.

79. Sterne JAC, Egger M, Smith GD. Systematic reviews in health care: Investigating and dealing with publication and other biases in meta-analysis. BMJ. 2001;323(7304):101–105.

80. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. The Lancet. 1991;337(8746):867–872.

81. Ioannidis JP. Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA. 1998;279(4):281–286.

82. Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ. 1997;315(7109):640–645.

83. Olson CM, Rennie D, Cook D, et al. Publication bias in editorial decision making. JAMA. 2002;287(21):2825–2828.

84. Dwan K, Altman DG, Arnaiz JA, et al. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PloS one. 2008;3(8):e3081.

85. Hopewell S, Loudon K, Clarke MJ, Oxman AD, Dickersin K. Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database Syst Rev. 2009;(1):MR000006.

86. Hasenboehler EA, Choudhry IK, Newman JT, Smith WR, Ziran BH, Stahel PF. Bias towards publishing positive results in orthopedic and general surgery: a patient safety issue? Patient Saf Surg. 2007;1(1):4.

87. Pitak-Arnnop P, Sader R, Rapidis AD, et al. Publication bias in oral and maxillofacial surgery journals: an observation on published controlled trials. Journal of cranio-maxillo-facial surgery. 2010;38(1):4–10.

88. Chan A-W, Krleza-Jeric K, Schmid I, Altman DG. Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. CMAJ. 2004;171(7):735–740.

89. Sutton AJ, Duval SJ, Tweedie RL, Abrams KR, Jones DR. Empirical assessment of effect of publication bias on meta-analyses. BMJ. 2000;320(7249):1574–1577.

90. Hopewell S, McDonald S, Clarke M, Egger M. Grey literature in meta-analyses of randomized trials of health care interventions. Cochrane Database Syst Rev. 2007;(2):MR000010.

91. Kirkham JJ, Dwan KM, Altman DG, et al. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ. 2010;340:c365.

92. Silen W. Publish or Perish. Arch Surg. 1971;103(1):1–1.

93. Kjaergard LL, Als-Nielsen B. Association between competing interests and authors' conclusions: epidemiological study of randomised clinical trials published in the BMJ. BMJ. 2002;325(7358):249.

94. Bhandari M, Busse JW, Jackowski D, et al. Association between industry funding and statistically significant pro-industry findings in medical and surgical randomized trials. CMAJ. 2004;170(4):477–480.

95. Lexchin J, Bero LA, Djulbegovic B, Clark O. Pharmaceutical industry sponsorship and research outcome and quality: systematic review. BMJ. 2003;326(7400):1167–1170.

96. Bekelman JE, Li Y, Gross CP. Scope and impact of financial conflicts of interest in biomedical research: a systematic review. JAMA. 2003;289(4):454–465.

97. Als-Nielsen B, Chen W, Gluud C, Kjaergard LL. Association of funding and conclusions in randomized drug trials: a reflection of treatment effect or adverse events? JAMA. 2003;290(7):921–928.

98. Kjaergard L. Funding, disease area, and internal validity of hepatobiliary randomized clinical trials. The American Journal of Gastroenterology. 2002;97(11):2708–2713.

99. Djulbegovic B, Lacevic M, Cantor A, et al. The uncertainty principle and industry-sponsored research. Lancet. 2000;356(9230):635–638.

100. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med. 2008;358(3):252–260.

101. Egger M, Smith GD. Bias in location and selection of studies. BMJ. 1998;316(7124):61–66.

102. Juni P, Nartey L, Reichenbach S, Sterchi R, Dieppe PA. Risk of cardiovascular events and rofecoxib: cumulative meta-analysis. The Lancet. 2004;364(9450):2021-2029.

103. Dickersin K, Min YI, Meinert CL. Factors influencing publication of research results. Follow-up of applications submitted to two institutional review boards. JAMA. 1992;267(3):374–378.

104. Okike K, Kocher MS, Wei EX. Accuracy of Conflict-of-Interest Disclosures Reported by Physicians. N Engl J Med. 2009;361:1466-1474.

105. Carragee EJ, Hurwitz EL, Weiner BK. A critical review of recombinant human bone morphogenetic protein-2 trials in spinal surgery: emerging safety concerns and lessons learned. The Spine Journal. 2011;11(6):471-491.

106. Rodgers MA, Brown JVE, Heirs MK, et al. Reporting of industry funded study outcome data: comparison of confidential and published data on the safety and effectiveness of rhBMP-2 for spinal fusion. BMJ. 2013;346:f3981.

107. Carreyrou J, McGinty T. Top spine surgeons reap royalties, Medicare bounty. Wall Street Journal. 20th December 2010.

108. Juni P, Altman DG, Egger M. Assessing the quality of controlled clinical trials. BMJ. 2001;323(7303):42–46.

109. Moher D, Cook DJ, Jadad AR, et al. Assessing the quality of reports of randomised trials: implications for the conduct of meta-analyses. Health Technol Assess. 1999;3(12):i–iv.

110. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995;16(1):62–73.

111. Herbison P, Hay-Smith J, Gillespie WJ. Adjustment of meta-analyses on the basis of quality scores should be abandoned. J Clin Epidemiol. 2006;59(12):1249–1256.

112. Huwiler-Muntener K, Juni P, Junker C, Egger M. Quality of reporting of randomized trials as a measure of methodologic quality. JAMA. 2002;287(21):2801–2804.

113. Latronico N, Botteri M, Minelli C, Zanotti C, Bertolini G, Candiani A. Quality of reporting of randomised controlled trials in the intensive care literature. A systematic analysis of papers published in Intensive Care Medicine over 26 years. Intensive Care Med. 2002;28(9):1316–1323.

114. Farrokhyar F, Chu R, Whitlock R, Thabane L. A systematic review of the quality of publications reporting coronary artery bypass grafting trials. Can J Surg. 2007;50(4):266–277.

115. Bhandari M, Guyatt GH, Lochner H, Sprague S, Tornetta P. Application of the Consolidated Standards of Reporting Trials (CONSORT) in the Fracture Care Literature. J Bone Joint Surg Am. 2002;84-A(3):485–489.

116. Begg C, Cho M, Eastwood S, et al. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA. 1996;276(8):637–639.

117. Moher D. CONSORT: an evolving tool to help improve the quality of reports of randomized controlled trials. Consolidated Standards of Reporting Trials. JAMA. 1998;279(18):1489–1491.

118. Obremskey WT, Pappas N, Attallah-Wasif E, Tornetta P, Bhandari M. Level of evidence in orthopaedic journals. J Bone Joint Surg Am. 2005;87(12):2632–2638.

119. Horton R. Surgical research or comic opera: questions, but few answers. The Lancet. 1996;347(9007):984–985.

120. Agha R, Cooper D, Muir G. The reporting quality of randomised controlled trials in surgery: a systematic review. Int J Surg. 2007;5(6):413–422.

121. Howes N, Chagla L, Thorpe M. Surgical practice is evidence based. Br J Surg. 1997;84(9):1220-1223.

122. Sackett DL, Ellis J, Mulligan I, Rowe J. Inpatient general medicine is evidence based. The Lancet. 1995;346(8972):407-410.

123. Brandt CT. Evidence and experience: what is the balance in surgeons' training? Acta Cir Bras. 2007;22(4):239–242.

124. Maier RV. What the surgeon of tomorrow needs to know about evidence-based surgery. Arch Surg. 2006;141(3):317–323.

125. Smith GCS, Pell JP. Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials. BMJ. 2003;327(7429):1459–1461.

126. Glasziou P, Chalmers I, Rawlins M, McCulloch P. When are randomised trials unnecessary? Picking signal from noise. BMJ. 2007;334(7589):349–351.

127. McDonald PJ, Kulkarni AV, Farrokhyar F, Bhandari M. Ethical issues in surgical research. Can J Surg. 2010;53(2):133–136.

128. Rudicel S, Esdaile J. The randomized clinical trial in orthopaedics: obligation or option? J Bone Joint Surg Am. 1985;67(8):1284–1293.

129. Taylor KM, Margolese RG, Soskolne CL. Physicians' reasons for not entering eligible patients in a randomized clinical trial of surgery for breast cancer. N Engl J Med. 1984;310(21):1363–1367.

130. Stirrat GM, Farrow SC, Farndon J. The challenge of evaluating surgical procedures. Ann R Coll Surg Engl. 1992;74:80-84.

131. McCulloch P, Taylor I, Sasako M, Lovett B, Griffin D. Randomised trials in surgery: problems and possible solutions. BMJ. 2002;324(7351):1448–1451.

132. Abelson R. Financial ties are cited as issue in spine study. The New York Times. 30th January 2008.

133. Jarman AF, Wray NP, Wenner DM, Ashton CM. Trials and tribulations: the professional development of surgical trialists. Am J Surg. 2012;204(3):339–346.e5.

134. Mann M, Tendulkar A, Birger N, Howard C, Ratcliffe MB. National institutes of health funding for surgical research. Ann Surg. 2008;247(2):217–221.

135. Ergina PL, Cook JA, Blazeby JM, et al. Challenges in evaluating surgical innovation. The Lancet. 2009;374(9695):1097–1104.

136. Young JM, Hollands MJ, Ward J, Holman CDJ. Role for opinion leaders in promoting evidence-based surgery. Arch Surg. 2003;138(7):785–791.

137. Cook JA, Ramsay CR, Fayers P. Statistical evaluation of learning curve effects in surgical trials. Clin Trials. 2004;1(5):421–427.

138. Cook JA. The challenges faced in the design, conduct and analysis of surgical randomised controlled trials. Trials. 2009;10(9).

139. Morshed S, Corrales L, Genant H, Miclau T. Outcome assessment in clinical trials of fracture-healing. J Bone Joint Surg Am. 2008;90 Suppl 1:62–67.

140. Ellis C, Hall JL, Khalil A, Hall JC. Evolution of methodological standards in surgical trials. ANZ J Surg. 2005;75(10):874–877.

141. Mills N, Donovan JL, Smith M, Jacoby A, Neal DE, Hamdy FC. Perceptions of equipoise are crucial to trial participation: a qualitative study of men in the ProtecT study. Control Clin Trials. 2003;24(3):272–282.

142. Solomon MJ, McLeod RS. Should we be performing more randomized controlled trials evaluating surgical operations? Surgery. 1995;118(3):459–467.

143. Swanson JA, Schmitz D, Chung KC. How to practice evidence-based medicine. Plast Reconstr Surg. 2010;126(1):286–294.

144. Davidoff F, Haynes B, Sackett D, Smith R. Evidence based medicine. BMJ. 1995;310:1085-1086.

145. Lau J, Ioannidis J, Schmid CH. Summing up evidence: one answer is not always enough. The Lancet. 1998;351(9096):123-127.

146. Mulrow CD. The medical review article: state of the science. Ann Intern Med. 1987;106(3):485.

147. McAlister FA. The Medical Review Article Revisited: Has the Science Improved? Ann Intern Med. 1999;131(12):947.

148. Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA. 1992;268(2):240–248.

149. Deeks JJ, Higgins JPT, Altman DG. Analysing Data and Undertaking Meta-Analyses. In: Higgins JPT, Green S, (editors). Cochrane Handbook for Systematic Reviews of Interventions. 5th edition. Chichester, UK: John Wiley & Sons, Ltd; 2008: pp243–296.

150. Egger M, Smith GD, O'Rourke K. Introduction: Rationale, Potentials, and Promise of Systematic Reviews. In: Egger M, Smith GD, Altman DG (editors). Systematic Reviews in Health Care: Meta-Analysis in Context. 3rd edition. London, UK: BMJ Publishing Group; 2001: pp1–19.

151. Chalmers I, Tröhler U. Helping physicians to keep abreast of the medical literature: Medical and Philosophical Commentaries, 1773-1795. Ann Intern Med. 2000;133(3):238–243.

152. Pearson K. Report on certain enteric fever inoculation statistics. Br Med J. 1904;2(2288):1243–1246.

153. Bartolucci AA, Hillegass WB. Overview, Strengths, and Limitations of Systematic Reviews and Meta-Analyses. In: Evidence-Based Practice: Toward Optimizing Clinical Outcomes. Chiappelli, F. (editor). 1st edition. Springer Berlin Heidelberg. Berlin, Germany; 2010: pp17–33.

154. Glass GV. Primary, secondary, and meta-analysis of research. Educational researcher. 1976;10:3-8.

155. Greenhalgh T. Effectiveness and efficiency: Random reflections on health services. BMJ. 2004;328(7438):529.

156. Antes G, Oxman AD. The Cochrane Collaboration in the 20th Century. In: Egger M, Smith GD, Altman DG (editors). Systematic Reviews in Health Care: Meta-Analysis in Context. 3rd edition. London, UK: BMJ Publishing Group; 2001: pp447–458.

157. Moher D, Tetzlaff J, Tricco AC, Sampson M, Altman DG. Epidemiology and reporting characteristics of systematic reviews. PLoS Med. 2007;4(3):e78.

158. Lau J, Antman EM, Jimenez-Silva J. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med. 1992;327(4):248-254.

159. Thompson SG. Why sources of heterogeneity in meta-analysis should be investigated. BMJ. 1994;309(6965):1351–1355.

160. Felson DT. Bias in meta-analytic research. J Clin Epidemiol. 1992;45(8):885–892.

161. Tricco AC, Tetzlaff J, Sampson M, et al. Few systematic reviews exist documenting the extent of bias: a systematic review. J Clin Epidemiol. 2008;61(5):422–434.

162. Green S, Higgins JPT. Preparing a Cochrane Review. In: Higgins JPT, Green S, (editors). Cochrane Handbook for Systematic Reviews of Interventions. 5th edition. Chichester, UK: John Wiley & Sons, Ltd; 2008: pp11–30.

163. Silagy CA, Middleton P, Hopewell S. Publishing protocols of systematic reviews: comparing what was done to what was planned. JAMA. 2002;287(21):2831–2834.

164. Counsell C. Formulating questions and locating primary studies for inclusion in systematic reviews. Ann Intern Med. 1997;127(5):380–387.

165. Sampson M, McGowan J, Cogo E, Grimshaw J, Moher D, Lefebvre C. An evidence-based practice guideline for the peer review of electronic search strategies. J Clin Epidemiol. 2009;62(9):944–952.

166. Moher D, Pham B, Klassen TP, et al. What contributions do languages other than English make on the results of meta-analyses? J Clin Epidemiol. 2000;53(9):964–972.

167. Edwards P, Clarke M, DiGuiseppi C, Pratap S, Roberts I, Wentz R. Identification of randomized controlled trials in systematic reviews: accuracy and reliability of screening records. Stat Med. 2002;21(11):1635–1640.

168. Buscemi N, Hartling L, Vandermeer B, Tjosvold L, Klassen TP. Single data extraction generated more errors than double data extraction in systematic reviews. J Clin Epidemiol. 2006;59(7):697–703.

169. Moja LP, Telaro E, D'Amico R, Moschetti I, Coe L, Liberati A. Assessment of methodological quality of primary studies by systematic reviews: results of the metaquality cross sectional study. BMJ. 2005;330(1):1053.

170. Yank V, Rennie D, Bero LA. Financial ties and concordance between results and conclusions in meta-analyses: retrospective cohort study. BMJ. 2007;335(7631):1202–1205.

171. Shea B, Dubé C, Moher D. Assessing the quality of reports of systematic reviews: The QUOROM statement compared to other tools. In: Egger M, Smith GD, Altman DG (editors). Systematic Reviews in Health Care: Meta-Analysis in Context. 3rd edition. London, UK: BMJ Publishing Group; 2001: pp122–139.

172. Shea B, Bouter LM, Grimshaw JM, et al. Scope for improvement in the quality of reporting of systematic reviews. From the Cochrane Musculoskeletal Group. J Rheumatol. 2006;33(1):9–15.

173. Shea BJ, Grimshaw JM, Wells GA, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol. 2007;7:10.

174. Kang D, Wu Y, Hu D, Hong Q, Wang J, Zhang X. Reliability and external validity of AMSTAR in assessing quality of TCM systematic reviews. Evidence-based Complementary and Alternative Medicine. 2012; 2012: Article ID 732195.

175. Shea BJ, Bouter LM, Peterson J, et al. External validation of a measurement tool to assess systematic reviews (AMSTAR). PloS one. 2007;2(12).

176. Liberati A, Altman DG, Tetzlaff J, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. 2009;339:b2700.

177. Jadad AR, McQuay HJ. Meta-analyses to evaluate analgesic interventions: A systematic qualitative review of their methodology. J Clin Epidemiol. 1996;49(2):235–243.

178. Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC. Meta-analyses of randomized controlled trials. N Engl J Med. 1987;316(8):450–455.

179. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. The Lancet. 1999;354(9193):1896–1900.

181. Willis BH, Quigley M. The assessment of the quality of reporting of meta-analyses in diagnostic research: a systematic review. BMC Med Res Methodol. 2011;11:163.

182. Ma B, Qi GQ, Lin XT, Wang T, Chen ZM, Yang KH. Epidemiology, quality, and reporting characteristics of systematic reviews of acupuncture interventions published in Chinese journals. Journal of alternative and complementary medicine. 2012;18(9):813–817.

183. Tao KM, Li XQ, Zhou QH, Moher D, Ling CQ, Yu WF. From QUOROM to PRISMA: a survey of high-impact medical journals' instructions to authors and a review of systematic reviews in anesthesia literature. PloS one. 2011;6(11):e27611.

184. Boutron I, Tubach F, Giraudeau B, Ravaud P. Methodological differences in clinical trials evaluating nonpharmacological and pharmacological treatments of hip and knee osteoarthritis. JAMA. 2003;290(8):1062–1070.

185. Vandenbroucke JP. When are observational studies as credible as randomised trials? Lancet. 2004;363(9422):1728–1731.

186. Hopewell S, Dutton S, Yu L-M, Chan A-W, Altman DG. The quality of reports of randomised trials in 2000 and 2006: comparative study of articles indexed in PubMed. BMJ. 2010;340(c723).

187. Balasubramanian SP, Wiener M, Alshameeri Z, Tiruvoipati R, Elbourne D, Reed MW. Standards of reporting of randomized controlled trials in general surgery: can we do better? Ann Surg. 2006;244(5):663–667.

188. Boutron I, Ravaud P, Nizard R. The design and assessment of prospective randomised, controlled trials in orthopaedic surgery. Journal of Bone and Joint Surgery-British Volume. 2007;89(7):858–863.

189. McCulloch P, Altman DG, Campbell WB, et al. No surgical innovation without evaluation: the IDEAL recommendations. The Lancet. 2009;374(9695):1105–1112.

190. Devereaux PJ, Choi PTL, El-Dika S, et al. An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. J Clin Epidemiol. 2004;57(12):1232–1236.

191. Hill CL, LaValley MP, Felson DT. Discrepancy between published report and actual conduct of randomized clinical trials. J Clin Epidemiol. 2002;55(8):783–786.

192. Liberati A, Himel HN, Chalmers TC. A quality assessment of randomized control trials of primary treatment of breast cancer. J Clin Oncol. 1986;4(6):942–951.

193. Pildal J, Chan A-W, Hrobjartsson A, Forfang E, Altman DG, Gotzsche PC. Comparison of descriptions of allocation concealment in trial protocols and the published reports: cohort study. BMJ. 2005;330(7499):1049.

194. ClinicalTrials.gov glossary of clinical trials terms. Available at http://clinicaltrials.gov/ct2/info/glossary. Accessed January 2008.

195. Altman DG, Bland JM. Statistics notes. Treatment allocation in controlled trials: why randomise? BMJ. 1999;318(7192):1209.

196. Tsay MY, Yang YH. Bibliometric analysis of the literature of randomized controlled trials. J Med Libr Assoc. 2005;93(4):450–458.

197. Egger M, Zellweger-Zahner T, Schneider M, Junker C, Lengeler C, Antes G. Language bias in randomised controlled trials published in English and German. The Lancet. 1997;350(9074):326–329.

198. Slobogean GP, Verma A, Giustini D, Slobogean BL, Mulpuri K. MEDLINE, EMBASE, and Cochrane index most primary studies but not abstracts included in orthopedic meta-analyses. J Clin Epidemiol. 2009;62(12):1261–1267.

199. Lefebvre C, Manheimer E, Glanville J. Searching for studies. In: Higgins JPT, Green S, (editors). Cochrane Handbook for Systematic Reviews of Interventions. 5th edition. Chichester, UK: John Wiley & Sons, Ltd; 2008: pp 95-150.

200. Dickersin K, Manheimer E, Wieland S, Robinson KA, Lefebvre C, McDonald S. Development of the Cochrane Collaboration's CENTRAL Register of controlled clinical trials. Eval Health Prof. 2002;25(1):38–64.

201. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.

202. Verhagen AP, de Vet HC, de Bie RA, Kessels AG, Boers M, Knipschild PG. Balneotherapy and quality assessment: interobserver reliability of the Maastricht criteria list and the need for blinded quality assessment. J Clin Epidemiol. 1998;51(4):335–341.

203. Verhagen AP, de Vet HC, de Bie RA, Boers M, van den Brandt PA. The art of quality assessment of RCTs included in systematic reviews. J Clin Epidemiol. 2001;54(7):651–654.

204. Juni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999;282(11):1054–1060.

205. International Clinical Trials Search Portal. World Health Organization. Available at: http://apps.who.int/trialsearch. Accessed June 15, 2012.

206. Bhandari M, Richards RR, Sprague S, Schemitsch EH. The quality of reporting of randomized trials in the Journal of Bone and Joint Surgery from 1988 through 2000. J Bone Joint Surg Am. 2002;84-A(3):388–396.

207. SurveyMonkey. SurveyMonkey Inc. Available at: http://www.surveymonkey.com. Accessed June 1, 2010.

208. Lundh A, Gotzsche PC. Recommendations by Cochrane Review Groups for assessment of the risk of bias in studies. BMC Med Res Methodol. 2008;8:22.

209. Moher D, Dulberg CS, Wells GA. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA. 1994;272(2):122–124.

210. Mathieu S, Boutron I, Moher D, Altman DG, Ravaud P. Comparison of registered and published primary outcomes in randomized controlled trials. JAMA. 2009;302(9):977–984.

211. Day SJ, Altman DG. Statistics notes: Blinding in clinical trials and other studies. BMJ. 2000;321(7259):504.

212. Poolman RW, Struijs PAA, Krips R, et al. Reporting of outcomes in orthopaedic randomized trials: Does blinding of outcome assessors matter? J Bone Joint Surg Am. 2007;89(3):550–558.

213. Newell DJ. Intention-to-treat analysis: Implications for quantitative and qualitative research. Int J Epidemiol. 1992;21(5):837–841.

214. Hopewell S, Dutton S, Yu L-M, Chan A-W, Altman DG. The quality of reports of randomised trials in 2000 and 2006: comparative study of articles indexed in PubMed. BMJ. 2010;340(c723).

215. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. The Lancet. 2001;357(9263):1191–1194.

216. Boutron I, Guittet L, Estellat C, Moher D, Hrobjartsson A, Ravaud P. Reporting Methods of Blinding in Randomized Trials Assessing Nonpharmacological Treatments. PLoS Med. 2007;4(2):e61.

217. Chimonas S, Frosch Z, Rothman DJ. From Disclosure to Transparency: The Use of Company Payment Data. Archives of Internal Medicine. 2011;171(1):81–86.

218. Phillips B, Ball C, Sackett D, et al. Levels of Evidence. Oxford Centre for Evidence Based Medicine. Available at: http://www.cebm.net/index.aspx?o=1025. Accessed 13th 2008.

219. Chan S, Bhandari M. The quality of reporting of orthopaedic randomized trials with use of a checklist for nonpharmacological therapies. J Bone Joint Surg Am. 2007;89(9):1970–1978.

220. Gluud C, Nikolova D. Quality assessment of reports on clinical trials in the Journal of Hepatology. J Hepatol. 1998;29(2):321–327.

221. Altman DG. The scandal of poor medical research. BMJ. 1994;308(6924):283–284.

222. Liem MS, van der Graaf Y, van Vroonhoven TJ. CONSORT, randomized trials and the surgical scientific community. Br J Surg. 1997;84(6):769–770.

223. Kane RL, Wang J, Garrard J. Reporting in randomized clinical trials improved after adoption of the CONSORT statement. J Clin Epidemiol. 2007;60(3):241–249.

224. Plint AC, Moher D, Morrison A, et al. Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review. Med J Aust. 2006;185(5):263–267.

225. Mills E, Wu P, Gagnier J, Heels-Ansdell D, Montori VM. An analysis of general medical and specialist journals that endorse CONSORT found that reporting was not enforced consistently. J Clin Epidemiol. 2005;58(7):662–667.

226. The CONSORT Statement Website. Accessed online at http://www.consort-statement.org/ on 13th May 2008.

227. National Library of Medicine Catalog of Journals. Available at http://www.ncbi.nlm.nih.gov/nlmcatalog#. Accessed November 2012.

228. Cowan J, Lozano-Calderon S, Ring D. Quality of prospective controlled randomized trials. Analysis of trials of treatment for lateral epicondylitis as an example. J Bone Joint Surg Am. 2007;89(8):1693–1699.

229. Karri V. Randomised clinical trials in plastic surgery: survey of output and quality of reporting. J Plast Reconstr Aesthet Surg. 2006;59(8):787–796.

230. Lefebvre C, Eisinga A, McDonald S, Paul N. Enhancing access to reports of randomized trials published world-wide - The contribution of EMBASE records to the Cochrane Central Register of Controlled Trials (CENTRAL) in the Cochrane Library. Emerg Themes Epidemiol. 2008;5(13).

231. Robinson KA, Dickersin K. Development of a highly sensitive search strategy for the retrieval of reports of controlled trials using PubMed. Int J Epidemiol. 2002;31(1):150–153.

232. Ahmad N, Boutron I, Moher D, Pitrou I, Roy C, Ravaud P. Neglected external validity in reports of randomized trials: The example of hip and knee osteoarthritis. Arthritis Care & Research. 2009;61(3):361-369.

233. Van Spall HGC, Toren A, Kiss A, Fowler RA. Eligibility criteria of randomized controlled trials published in high-impact general medical journals: A systematic sampling review. JAMA. 2007;297(11):1233–1240.

234. Adetugbo K, Williams H. How well are randomized controlled trials reported in the dermatology literature? Arch Dermatol. 2000;136(3):381–385.

235. Halpern SH, Darani R, Douglas MJ, Wight W, Yee J. Compliance with the CONSORT checklist in obstetric anaesthesia randomised controlled trials. International journal of obstetric anesthesia. 2004;13(4):207–214.

236. Moher D, Jones A, Lepage L, for the CONSORT Group. Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA. 2001;285(15):1992–1995.

237. Kiyama T, Naito M, Shitama H, Shinoda T, Maeyama A. Comparison of skin blood flow between mini- and standard-incision approaches during total hip arthroplasty. J Arthroplasty. 2008;23(7):1045–1049.

238. Lozano LM, Segur JM, Macule F, et al. Intramedullary versus extramedullary tibial cutting guide in severely obese patients undergoing total knee replacement: a randomized study of 70 patients with body mass index >35 kg/m2. Obesity surgery. 2008;18(12):1599–1604.

239. Richardson WS, Wilson MC, Nishikawa J, Hayward RS. The well-built clinical question: a key to evidence-based decisions. ACP J Club. 1995;123(3):A12–3.

240. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med. 1994;121(3):200–206.

241. Spanjersberg WR, Reurings J, Keus F, van Laarhoven CJ. Fast track surgery versus conventional recovery strategies for colorectal surgery. Cochrane Database Syst Rev. 2011;(2):CD007635. doi:10.1002/14651858.CD007635.pub2.

242. Soares HP, Daniels S, Kumar A, et al. Bad reporting does not mean bad methods for randomised trials: observational study of randomised controlled trials performed by the Radiation Therapy Oncology Group. BMJ. 2004;328(7430):22–24.

243. Boutron I, Moher D, Altman DG, Schulz KF, Ravaud P, for the CONSORT Group. Extending the CONSORT Statement to Randomized Trials of Nonpharmacologic Treatment: Explanation and Elaboration. Ann Intern Med. 2008;148(4):295–309.

244. Schulz KF, Altman DG, Moher D. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340.

245. Als-Nielsen B, Chen W, Gluud LL, Siersma V, Hilden J, Gluud C. Are trial size and reported methodological quality associated with treatment effects? Observational study of 523 randomised trials. Cochrane Collaboration Colloquium. P-003. Ottawa, Canada: Cochrane Collaboration; 2004.

246. Gotzsche PC. Reference bias in reports of drug trials. Br Med J. 1987;295(6599):654.

247. Sterne JAC, Egger M, Smith GD. Investigating and dealing with publication and other biases in meta-analysis. BMJ. 2001;323(7304):101–105.

248. Ioannidis JPA. Why Science Is Not Necessarily Self-Correcting. Perspectives on Psychological Science. 2012;7(6):645-654.

249. Bucher HC, Guyatt GH, Cook DJ, Holbrook A, McAlister FA. Users' guides to the medical literature: XIX. Applying clinical trial results. A. How to use an article measuring the effect of an intervention on surrogate end points. JAMA. 1999;282(8):771–778.

250. Deeks JJ, Higgins JPT, Altman DG. Analysing Data and Undertaking Meta-Analyses. In: Higgins JPT, Green S, (editors). Cochrane Handbook for Systematic Reviews of Interventions. 5th edition. Chichester, UK: John Wiley & Sons, Ltd; 2008: pp243–296.

251. Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Stat Med. 2007;26(1):53–77.

252. Higgins JPT, Deeks JJ. Selecting studies and collecting data. In: Higgins JPT, Green S, (editors). Cochrane Handbook for Systematic Reviews of Interventions. 5th edition. Chichester, UK: John Wiley & Sons, Ltd; 2008: pp151–185.

253. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, USA: Lawrence Erlbaum Associates; 1988.

254. Chinn S. A simple method for converting an odds ratio to effect size for use in meta-analysis. Stat Med. 2000;19:3127–3131.

255. Borenstein M, Hedges LV, Higgins JPT, Rothstein H. Comprehensive Meta-analysis Version 2. Biostat, Englewood, USA. 2011.

256. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177–188.

257. Higgins JPT. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–560.

258. Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ. 1997;315(7109):629–634.

259. Sterne JA, Egger M. Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. J Clin Epidemiol. 2001;54(10):1046–1055.

260. Galbraith RF. A note on graphical presentation of estimated odds ratios from several clinical trials. Stat Med. 1988;7(8):889–894.

261. Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. J Clin Epidemiol. 2008;61(10):991–996.

262. Harbord RM, Egger M, Sterne JA. A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Stat Med. 2006;25(20):3443–3457.

263. Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Comparison of two methods to detect publication bias in meta-analysis. JAMA. 2006;295(6):676–680.

264. Siersma V, Als-Nielsen B, Chen W, Hilden J, Gluud LL, Gluud C. Multivariable modelling for meta-epidemiological assessment of the association between trial quality and treatment effects estimated in randomized clinical trials. Stat Med. 2007;26(14):2745–2758.

265. Tierney JF, Stewart LA. Investigating patient exclusion bias in meta-analysis. Int J Epidemiol. 2005;34(1):79–87.

266. Ruiz-Canela M. Intention to treat analysis is related to methodological quality. BMJ. 2000;320(7240):1007–1007.

267. Sterne JAC, Juni P, Schulz KF, Altman DG, Bartlett C, Egger M. Statistical methods for assessing the influence of study characteristics on treatment effects in "meta-epidemiological" research. Stat Med. 2002;21(11):1513–1524.

268. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273(5):408–412.

269. Naylor CD. Meta-analysis and the meta-epidemiology of clinical research. BMJ. 1997;315(7109):617–619.

270. Ciani O, Buyse M, Garside R, et al. Comparison of treatment effect sizes associated with surrogate and final patient relevant outcomes in randomised controlled trials: meta-epidemiological study. BMJ. 2013;346:f457.

271. Shang A, Huwiler-Müntener K, Nartey L, et al. Are the clinical effects of homoeopathy placebo effects? Comparative study of placebo-controlled trials of homoeopathy and allopathy. Lancet. 2005;366(9487):726–732.

272. Linde K, Clausius N, Ramirez G, et al. Are the clinical effects of homoeopathy placebo effects? A meta-analysis of placebo-controlled trials. The Lancet. 1997;350(9081):834–843.

273. Hartling L, Ospina M, Liang Y, et al. Risk of bias versus quality assessment of randomised controlled trials: cross sectional study. BMJ. 2009;339:b4012.

274. Sterne JAC, Juni P, Schulz KF, Altman DG, Bartlett C, Egger M. Statistical methods for assessing the influence of study characteristics on treatment effects in “meta-epidemiological” research. Stat Med. 2002;21(11):1513–1524.

275. Ravnskov U. Cholesterol lowering trials in coronary heart disease: frequency of citation and outcome. BMJ. 1992;305:15–19.

276. Hutton JL, Williamson PR. Bias in meta-analysis due to outcome variable selection within studies. J R Statist Soc C. 2000;49(3):359–370.

277. Chalmers I. Underreporting research is scientific misconduct. JAMA. 1990;263(10):1405–1408.

278. Mills JL. Data Torturing. N Engl J Med. 1993;329(16):1196–1199.

279. Gotzsche PC. Methodology and overt and hidden bias in reports of 196 double-blind trials of nonsteroidal antiinflammatory drugs in rheumatoid arthritis. Control Clin Trials. 1989;10(1):31–56.

280. Furukawa TA, Watanabe N, Omori IM, Montori VM, Guyatt GH. Association between unreported outcomes and effect size estimates in Cochrane meta-analyses. JAMA. 2007;297(5):468–470.

281. von Elm E, Rollin A, Blumle A, Huwiler K, Witschi M, Egger M. Publication and non-publication of clinical trials: longitudinal study of applications submitted to a research ethics committee. Swiss Med Wkly. 2008;138(13-14):197–203.

282. Hall JL, Hall JC. Use of outcome events in surgical trials: a systematic review. ANZ J Surg. 2012;82(11):771–774.

283. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.

284. Chan A-W, Altman DG. Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. BMJ. 2005;330(7494):753.

285. Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Stat Med. 1999;18(20):2693–2708.

286. Journal of Citation Reports. 2008. Available at: http://info.library.unsw.edu.au/cgi-bin/local/access/access.cgi?url=http://isiknowledge.com/JCR. Accessed through the University of New South Wales on 20th August 2008.

287. Bero L, Oostvogel F, Bacchetti P, Lee K. Factors associated with findings of published trials of drug-drug comparisons: why some statins appear more efficacious than others. PLoS Med. 2007;4(6):e184.

288. Chan A-W, Upshur R, Singh JA, Ghersi D, Chapuis F, Altman DG. Waiving confidentiality for the greater good. BMJ. 2006;332(7549):1086–1089.

289. Chan A-W, Tetzlaff JM, Altman DG, et al. SPIRIT 2013 Statement: Defining Standard Protocol Items for Clinical Trials. Ann Intern Med. 2013;158(3):200-207.

290. Savulescu J, Chalmers I, Blunt J. Are research ethics committees behaving unethically? Some suggestions for improving performance and accountability. BMJ. 1996;313(7069):1390.

291. Krleža-Jerić K. Clinical Trial Registration: The Differing Views of Industry, the WHO, and the Ottawa Group. PloS medicine. 2005;2(11):e378.

292. Korolija D, Wood-Dauphinee S, Pointner R. Patient-reported outcomes. How important are they? Surg Endosc. 2007;21(4):503–507.

293. Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med. 1996;125(7):605–613.

294. Guyatt GH. Measuring Health-Related Quality of Life. Ann Intern Med. 1993;118(8):622.

295. Gandhi GY, Murad MH, Fujiyoshi A, et al. Patient-important outcomes in registered diabetes trials. JAMA. 2008;299(21):2543–2549.

296. Efficace F, Horneber M, Lejeune S, et al. Methodological quality of patient-reported outcome research was low in complementary and alternative medicine in oncology. J Clin Epidemiol. 2006;59(12):1257–1265.

297. Mathers SA, Chesson RA, Proctor JM, McKenzie GA, Robertson E. The use of patient-centered outcome measures in radiology: a systematic review. Acad Radiol. 2006;13(11):1394–1404.

298. Giaccone G. Gefitinib in Combination With Gemcitabine and Cisplatin in Advanced Non-Small-Cell Lung Cancer: A Phase III Trial--INTACT 1. J Clin Oncol. 2004;22(5):777–784.

299. Barter PJ, Caulfield M, Eriksson M, et al. Effects of Torcetrapib in Patients at High Risk for Coronary Events. N Engl J Med. 2007;357(21):2109–2122.

300. The Cardiac Arrhythmia Suppression Trial Investigators. Preliminary report: Effect of Encainide and Flecainide on mortality in a randomized trial of arrhythmia suppression after myocardial infarction. N Engl J Med. 1989;321:406–412.

301. Ciani O, Buyse M, Garside R, et al. Comparison of treatment effect sizes associated with surrogate and final patient relevant outcomes in randomised controlled trials: meta-epidemiological study. BMJ. 2013;346:f457.

302. Moynihan R. Surrogates under scrutiny: fallible correlations, fatal consequences. BMJ. 2011;343:d5160.

303. Yudkin JS, Lipska KJ, Montori VM. The idolatry of the surrogate. BMJ. 2011;343:d7995.

304. Echt DS, Liebson PR, Mitchell LB. Mortality and morbidity in patients receiving Encainide, Flecainide, or placebo: the Cardiac Arrhythmia Suppression Trial. N Engl J Med. 1991;324(12):781–788.

305. Rossouw JE, Anderson GL, Prentice RL, et al. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results From the Women's Health Initiative randomized controlled trial. JAMA. 2002;288(3):321–333.

306. Storm T, Thamsborg G, Steiniche T. Effect of intermittent cyclical etidronate therapy on bone mass and fracture rate in women with postmenopausal osteoporosis. N Engl J Med. 1990;322(18):1265–1271.

307. Liberman UA, Weiss SR, Bröll J, et al. Effect of Oral Alendronate on Bone Mineral Density and the Incidence of Fractures in Postmenopausal Osteoporosis. N Engl J Med. 1995;333(22):1437–1444.

308. Clavien PA, Barkun J, de Oliveira ML, et al. The Clavien-Dindo classification of surgical complications: five-year experience. Ann Surg. 2009;250(2):187–196.

309. Urschel JD, Goldsmith CH, Tandan VR, Miller JD. Users guide to evidence-based surgery: How to use an article evaluating surgical interventions. Evidence-Based Surgery Working Group. Can J Surg. 2001;44(2):95–100.

310. UK National Institute for Health and Clinical Excellence. 2011/12 review of the guide to the methods of technology appraisal. TA methods guide review: supporting documents. Available at: http://www.nice.org.uk/media/C67/40/TAMethodsGuideReviewSupportingDocuments.pdf. Accessed January 20, 2013.

311. Elston J, Taylor RS. Use of surrogate outcomes in cost-effectiveness models: a review of United Kingdom health technology assessment reports. Int J Technol Assess Health Care. 2009;25(1):6–13.

312. Bhandari M, Montori VM, Devereaux PJ, et al. Doubling the impact: publication of systematic review articles in orthopaedic journals. J Bone Joint Surg Am. 2004;86-A(5):1012–1016.

313. Bhandari M, Morrow F, Kulkarni AV, Tornetta P 3rd. Meta-analyses in orthopaedic surgery. A systematic review of their methodologies. J Bone Joint Surg Am. 2001;83-A(1):15–24.

314. Delaney A, Bagshaw SM, Ferland A, Manns B, Laupland KB, Doig CJ. A systematic evaluation of the quality of meta-analyses in the critical care literature. Crit Care. 2005;9(5):R575–82.

315. Hemels MEH, Vicente C, Sadri H, Masson MJ, Einarson TR. Quality assessment of meta-analyses of RCTs of pharmacotherapy in major depressive disorder. Curr Med Res Opin. 2004;20(4):477–484.

316. Hind D, Booth A. Do health technology assessments comply with QUOROM diagram guidance? An empirical study. BMC Med Res Methodol. 2007;7:49.

317. Jadad AR, Cook DJ, Jones A, et al. Methodology and reports of systematic reviews and meta-analyses: a comparison of Cochrane reviews with articles published in paper-based journals. JAMA. 1998;280(3):278–280.

318. Jorgensen AW, Hilden J, Gotzsche PC. Cochrane reviews compared with industry supported meta-analyses and other meta-analyses of the same drugs: systematic review. BMJ. 2006;333(7572):782.

319. Olsen O, Middleton P, Ezzo J, et al. Quality of Cochrane reviews: assessment of sample from 1998. BMJ. 2001;323(7317):829–832.

320. Moher D, Soeken K, Sampson M, Ben-Porat L, Berman B. Assessing the quality of reports of systematic reviews in pediatric complementary and alternative medicine. BMC Pediatr. 2002;2:3.

321. Meta-analysis definition- National Library of Medicine. Available at: http://www.nlm.nih.gov/nichsr/hta101/ta101014.html. Accessed March 21, 2013.

322. Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989;45(1):255–268.

323. Biondi-Zoccai GGL, Lotrionte M, Abbate A, et al. Compliance with QUOROM and quality of reporting of overlapping meta-analyses on the role of acetylcysteine in the prevention of contrast associated nephropathy: case study. BMJ. 2006;332(7535):202–209.

324. Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D. How Quickly Do Systematic Reviews Go Out of Date? A Survival Analysis. Ann Intern Med. 2007;147(4):224–233.

325. Hopewell S, Clarke M, Moher D, et al. CONSORT for reporting randomized controlled trials in journal and conference abstracts: explanation and elaboration. PLoS Med. 2008;5(1):e20.

326. Booth A, Clarke M, Dooley G, et al. The nuts and bolts of PROSPERO: an international prospective register of systematic reviews. Syst Rev. 2012;1:2.

327. Jones AP, Remmington T, Williamson PR, Ashby D, Smyth RL. High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. J Clin Epidemiol. 2005;58(7):741–742.

328. Ma B, Guo J, Qi G, et al. Epidemiology, quality and reporting characteristics of systematic reviews of traditional Chinese medicine interventions published in Chinese journals. PloS one. 2011;6(5):e20185.

329. Turner L, Galipeau J, Garritty C, et al. An evaluation of epidemiological and reporting characteristics of complementary and alternative medicine (CAM) systematic reviews (SRs). PloS one. 2013;8(1):e53536.

330. Hall JC, Mills B, Nguyen H, Hall JL. Methodologic standards in surgical trials. Surgery. 1996;119(4):466–472.

331. Solomon MJ, McLeod RS. Surgery and the randomised controlled trial: past, present and future. Med J Aust. 1998;169(7):380–383.

332. Ball C, Wiener M, Alshameeri Z, Tiruvoipati R, Elbourne D, Reed MW. Standards of reporting of randomized controlled trials in general surgery: can we do better? Ann Surg. 2006;244(5):663–667.

333. Ravaud P, Hopewell S, eds. Web-based Tool to Improve Reporting of Randomized Controlled Trials (WebCONSORT). Trial registry. Available at: http://www.clinicaltrials.gov/ct2/show/NCT01891448. Accessed December 11, 2013.

334. Weil RJ. The Future of Surgical Research. PLoS Med. 2004;1(1):e13.

335. Hamdy FC, ed. The ProtecT trial - Evaluating the effectiveness of treatment for clinically localised prostate cancer. Trial registry. Available at: http://www.controlled-trials.com/ISRCTN20141297. Accessed December 11, 2013.

336. Knebel P, Kühn S, Ulrich AB, Büchler MW, Diener MK. The Study Centre of the German Surgical Society: current trials and results. Langenbecks Arch Surg. 2012;397(4):611–618.

Appendix 1. Syntax of electronic search strategies employed to identify randomized trials

MEDLINE via Ovid (2005 - Week 3, May 2009)

1 randomized controlled trial.pt.
2 controlled clinical trial.pt.
3 randomi*ed.ab.
4 placebo.ab.
5 randomly.ab.
6 trial.ab.
7 groups.ab.
8 or/1-7
9 animals/ not (humans/ and animals/)
10 8 not 9
11 exp Specialties, Surgical/
12 exp Surgical Procedures, Operative/
13 surger$.tw.
14 surgical$.tw.
15 operative$.tw.
16 or/11-15
17 16 and 10
18 limit 17 to yr="2005 -Current"

EMBASE via Ovid (2005 – Week 21, 2009)

1 randomized controlled trial/
2 crossover procedure/
3 double-blind procedure/
4 single-blind procedure/
5 random$.tw.
6 factorial$.tw.
7 (crossover$ or cross-over$).tw.
8 placebo$.tw.
9 (double$ adj blind$).tw.
10 (singl$ adj blind$).tw.
11 assign$.tw.
12 allocat$.tw.
13 volunteer$.tw.
14 or/1-13
15 exp surgery/
16 surger$.tw.
17 surgical$.tw.
18 operative$.tw.
19 or/15-18
20 14 and 19
21 limit 20 to yr="2005 -Current"

CENTRAL via Wiley Interscience (2005 – Week 3, May 2009)

#1 MeSH descriptor “Specialties, Surgical” explode all trees
#2 MeSH descriptor “Surgical Procedures, Operative” explode all trees
#3 surger*:ti,ab
#4 surgical*:ti,ab
#5 operative*:ti,ab
#6 (#1 OR #2 OR #3 OR #4 OR #5), from 2005 to 2009
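The way the numbered lines of the MEDLINE strategy in this appendix combine can be illustrated with a small sketch. It is not the Ovid implementation and was not part of the original search: the records, field names and regular expressions below are hypothetical stand-ins, subject heading explosion ("exp") is reduced to a plain heading match, and truncation is approximated with `\w*`. The sketch only shows the Boolean logic, where "or/1-7" is a union, "8 not 9" is a set difference and "16 and 10" is an intersection.

```python
import re

# Hypothetical toy records; every field name and value is illustrative only.
RECORDS = [
    {"id": 1, "pub_types": {"randomized controlled trial"},
     "abstract": "Patients were randomly assigned to two groups.",
     "mesh": {"humans", "surgical procedures, operative"},
     "text": "Laparoscopic surgery for hernia", "year": 2007},
    {"id": 2, "pub_types": {"journal article"},
     "abstract": "Rats were randomly allocated to a sham or treatment group.",
     "mesh": {"animals"},
     "text": "Surgical model of hernia repair in rats", "year": 2008},
    {"id": 3, "pub_types": {"randomized controlled trial"},
     "abstract": "A placebo controlled trial of a statin.",
     "mesh": {"humans"},
     "text": "Statin therapy", "year": 2008},
    {"id": 4, "pub_types": {"controlled clinical trial"},
     "abstract": "A trial of operative fixation.",
     "mesh": {"humans", "specialties, surgical"},
     "text": "Operative fixation of fractures", "year": 2003},
]


def run_medline_style_search(records):
    """Return a dict mapping each search line number to a set of record ids."""
    by_id = {r["id"]: r for r in records}

    def matching(predicate):
        # One search line: the ids of all records satisfying the predicate.
        return {r["id"] for r in records if predicate(r)}

    def word(pattern, field):
        # Free-text match in one field (rough stand-in for .ab. / .tw.).
        rx = re.compile(pattern, re.IGNORECASE)
        return lambda r: bool(rx.search(r[field]))

    s = {}
    s[1] = matching(lambda r: "randomized controlled trial" in r["pub_types"])
    s[2] = matching(lambda r: "controlled clinical trial" in r["pub_types"])
    s[3] = matching(word(r"\brandomi\w*ed\b", "abstract"))
    s[4] = matching(word(r"\bplacebo\b", "abstract"))
    s[5] = matching(word(r"\brandomly\b", "abstract"))
    s[6] = matching(word(r"\btrial\b", "abstract"))
    s[7] = matching(word(r"\bgroups\b", "abstract"))
    s[8] = set().union(*(s[i] for i in range(1, 8)))        # or/1-7
    # animals/ not (humans/ and animals/): animal-only records
    s[9] = matching(lambda r: "animals" in r["mesh"] and "humans" not in r["mesh"])
    s[10] = s[8] - s[9]                                     # 8 not 9
    s[11] = matching(lambda r: "specialties, surgical" in r["mesh"])
    s[12] = matching(lambda r: "surgical procedures, operative" in r["mesh"])
    s[13] = matching(word(r"\bsurger\w*", "text"))
    s[14] = matching(word(r"\bsurgical\w*", "text"))
    s[15] = matching(word(r"\boperative\w*", "text"))
    s[16] = set().union(*(s[i] for i in range(11, 16)))     # or/11-15
    s[17] = s[16] & s[10]                                   # 16 and 10
    s[18] = {i for i in s[17] if by_id[i]["year"] >= 2005}  # year limit
    return s


if __name__ == "__main__":
    result = run_medline_style_search(RECORDS)
    print(sorted(result[18]))
```

In this toy data the animal-only record is removed at line 10, the non-surgical trial fails the surgical filter at line 16, and the 2003 trial is dropped by the year limit at line 18.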

Appendix 2. Text of initial email invitation to participate in author survey

Dear (author name),

We are a team of researchers conducting a large and important study investigating the quality and epidemiology of published randomised trials (IQuEST). You are listed as the contact author for an eligible published paper titled (insert title) in (insert Journal name).

Please click the link below to access a short survey regarding your published paper. We know as a researcher you may be very busy, so the survey has been designed to be completed in about 5 minutes.

(unique weblink) If clicking the link does not work, please copy and paste the link into a browser. The link is unique to your email address so please do not forward this message.

Your responses are completely confidential and will only be used for research purposes.

The study is part of a thesis for Dr. Sam Adie, a surgical trainee and PhD student at the University of New South Wales in Sydney, Australia. His supervisors are Prof. Ian Harris, Professor of Orthopaedic Surgery at the University of New South Wales, and Prof. Jonathan Craig, Professor of Clinical Epidemiology at the University of Sydney. Please click here if you would like to know more about the investigators.

Dr. Sam Adie’s research has been funded by the National Health and Medical Research Council of Australia, and the Royal Australasian College of Surgeons.

Thank you in advance for completing the survey. Please reply to this email ([email protected]) if you have any queries, or call +61 2 87389254.

Dr. Sam Adie; BSc(Med), MBBS (Hons), MSpMed, MPH

Prof. Ian Harris; MBBS, MMed(Clin Epi), PhD, FRACS(Orth), FAOrthA

Prof. Jonathan Craig; MBChB, MMed(Clin Epi), PhD, FRACP, DipCH

Appendix 3. First reminder email invitation to participate in author survey

Dear (author name),

You recently received an email invitation to participate in a large and important study investigating the quality and epidemiology of published randomised trials (iQuEST). We have not received a response from you yet, and we wanted to send you a friendly reminder.

You are the contact author for an eligible published paper titled (insert title) in (insert Journal name).

Please click the link below to access a short survey regarding your published paper. We know as a researcher you may be very busy, so the survey has been designed to be completed in about 5 minutes.

(unique weblink) If clicking the link does not work, please copy and paste the link into a browser. The link is unique to your email address so please do not forward this message.

Your responses are completely confidential and will only be used for research purposes.

The study is part of a thesis for Dr. Sam Adie, a surgical trainee and PhD student at the University of New South Wales in Sydney, Australia. His supervisors are Prof. Ian Harris, Professor of Orthopaedic Surgery at the University of New South Wales, and Prof. Jonathan Craig, Professor of Clinical Epidemiology at the University of Sydney. Please click here if you would like to know more about the investigators.

Dr. Sam Adie’s research has been funded by the National Health and Medical Research Council of Australia, and the Royal Australasian College of Surgeons.

Thank you in advance for completing the survey. Please reply to this email ([email protected]) if you have any queries, or call +61 2 87389254.

Dr. Sam Adie; BSc(Med), MBBS (Hons), MSpMed, MPH

Prof. Ian Harris; MBBS, MMed(Clin Epi), PhD, FRACS(Orth), FAOrthA

Prof. Jonathan Craig; MBChB, MMed(Clin Epi), PhD, FRACP, DipCH

Appendix 4. Second reminder email invitation to participate in author survey

Dear (author name),

You recently received an email invitation to participate in a large and important study investigating the quality and epidemiology of published randomised trials (iQuEST). We have not received a response from you yet, and we wanted to send you another friendly reminder.

We know you must be very busy, so we can assure you that this survey only takes about 5 minutes to complete.

Your eligible paper is titled (insert title) in (insert Journal name).

Please click the link below to access a short survey regarding your published paper.

(unique weblink) If clicking the link does not work, please copy and paste the link into a browser. The link is unique to your email address so please do not forward this message.

Your responses are completely confidential and will only be used for research purposes.

The study is part of a thesis for Dr. Sam Adie, a surgical trainee and PhD student at the University of New South Wales in Sydney, Australia. His supervisors are Prof. Ian Harris, Professor of Orthopaedic Surgery at the University of New South Wales, and Prof. Jonathan Craig, Professor of Clinical Epidemiology at the University of Sydney. Please click here if you would like to know more about the investigators.

Dr. Sam Adie’s research has been funded by the National Health and Medical Research Council of Australia, and the Royal Australasian College of Surgeons.

Thank you in advance for completing the survey. Please reply to this email ([email protected]) if you have any queries, or call +61 2 87389254.

Dr. Sam Adie; BSc(Med), MBBS (Hons), MSpMed, MPH

Prof. Ian Harris; MBBS, MMed(Clin Epi), PhD, FRACS(Orth), FAOrthA

Prof. Jonathan Craig; MBChB, MMed(Clin Epi), PhD, FRACP, DipCH

Appendix 5. Third and final reminder email invitation to participate in author survey

Dear (author name),

We are sending you this final reminder to participate in “iQuEST”, a large and important study investigating the quality and epidemiology of published randomised trials. We have not received a response from you yet, and we are really hoping you will help by completing the survey.

We can assure you that this survey only takes about 5 minutes to complete.

Your eligible paper is titled (insert title) in (insert Journal name).

Please click the link below to access a short survey regarding your published paper.

(unique weblink) If clicking the link does not work, please copy and paste the link into a browser. The link is unique to your email address so please do not forward this message.

Your responses are completely confidential and will only be used for research purposes.

The study is part of a thesis for Dr. Sam Adie, a surgical trainee and PhD student at the University of New South Wales in Sydney, Australia. His supervisors are Prof. Ian Harris, Professor of Orthopaedic Surgery at the University of New South Wales, and Prof. Jonathan Craig, Professor of Clinical Epidemiology at the University of Sydney. Please click here if you would like to know more about the investigators.

Dr. Sam Adie’s research has been funded by the National Health and Medical Research Council of Australia, and the Royal Australasian College of Surgeons.

Thank you in advance for completing the survey. Please reply to this email ([email protected]) if you have any queries, or call +61 2 87389254.

Dr. Sam Adie; BSc(Med), MBBS (Hons), MSpMed, MPH

Prof. Ian Harris; MBBS, MMed(Clin Epi), PhD, FRACS(Orth), FAOrthA

Prof. Jonathan Craig; MBChB, MMed(Clin Epi), PhD, FRACP, DipCH

Appendix 6. Copy of online author survey

iQuEST Author Survey

Investigating the Quality and Epidemiology of Trials (IQuEST)

Thank you for participating in this author survey. Your answers will provide important information about the conduct and reporting of published randomised trials.

Please be assured that all answers are completely confidential and will only be used for research purposes.

We anticipate the survey will take about 5 MINUTES to complete.

Please click Next to continue.

Page 1

1. Was the method used to generate the patient allocation sequence completely random?

Explanation: When planning the allocation of patients to different intervention groups, was the sequence of patients planned in a completely random, unpredictable manner?

No

Yes

Page 2

2. Please specify the method you used to generate the patient allocation sequence (select one answer)

Drawing lots / Lottery

Shuffled cards

Table or list of random numbers

Coin flip

Alternate allocation

Allocation based on patient date of birth

Allocation based on admission day

Dice roll

Allocation based on patient medical record

Computer / internet generated number

Other form of sequence generation (please specify)

Page 3

3. Did you use any method to conceal the allocation sequence of participants/patients in your study?

Explanation: Before patients were assigned to a group, was the patient allocation sequence hidden from study personnel, so that it was not possible to determine which group the next patient was going to be assigned to?

No

Yes

Page 4

4. Please specify the method of allocation concealment used (select one answer)

Sealed containers

List kept with third party not involved with study

Internet / web based system

Telephone system

Pharmacy controlled allocation

Sealed envelopes

Other method (please specify)

Page 5

5. Please tick if any of the following were blinded (masked) to intervention groups in your trial.

Explanation: Once patients were assigned to a group, were the patients and/or study personnel completely unaware of which group that patient belonged to? Please tick all the people that were blinded below

Patients/Participants

Carers/Investigators

Outcome assessors

Statisticians

None of the above

Page 6

6. Please briefly describe how blinding (masking) was achieved for each party that was blinded

Page 7

7. What was the source of financial support for the clinical trial? (select one answer)

Completely funded by commercial industry

Partially funded by commercial industry

No external funding received / Internal department funding

Completely funded by a not-for-profit or government organisation

Partially funded by a not-for-profit or government organisation

Funded by both commercial and not-for-profit sources

Other source of funding (please specify)

Page 8

8. Please indicate if the trial was registered in any of the registries below. (Please tick all that apply)

Not registered

Registered on Clinicaltrials.gov

Registered on ISRCTN

Registered on EU-CTR

Registered on Chinese Clinical Trials Registry

Registered on ANZ Clinical Trials Registry

Registered on another registry (please specify it below)

Other registry name

Page 9

9. Please provide your trial's registration number, if available

I don't have it right now

Trial registration number

Page 10

10. Sometimes, journal publications may not contain all the outcomes that were actually measured. There may be a variety of reasons for this. Were all outcomes measured in your study described in the published report?

No

Yes

Page 11

11. Please provide the reason(s) for any outcomes not being published. (Please tick as many outcomes and as many reasons that apply)

Reasons (columns): Journal / word restrictions; Published in another paper; Not statistically significant; Not clinically important; Other (please specify below)

Outcomes (rows): Outcome 1; Outcome 2; Outcome 3; Outcome 4; Outcome 5; Outcome 6; Outcome 7; Outcome 8; Outcome 9; Outcome 10

Specify any other reason below

Page 12

Appendix 7. References to included surgical RCTs

1. Porpiglia F, Fiori C, Grande S et al. Selective versus Standard Ligature of the Deep Venous Complex during Laparoscopic Radical Prostatectomy: Effects on Continence, Blood Loss, and Margin Status. European Urology 2009;55(6):1377-1385.
2. Lingeman JE, Preminger GM, Goldfischer ER et al. Assessing the Impact of Ureteral Stent Design on Patient Comfort. Journal of Urology 2009;181(6):2581-2587.
3. Hesham A. Bipolar diathermy versus cold dissection in paediatric tonsillectomy. International Journal of Pediatric Otorhinolaryngology 2009;73(6):793-795.
4. Gao F, Henricson A, Nilsson KG. Cemented versus uncemented fixation of the femoral component of the NexGen CR total knee replacement in patients younger than 60 years. A Prospective Randomised Controlled RSA Study. Knee 2009;16(3):200-206.
5. Bridgman SA, Walley G, MacKenzie G et al. Sub-vastus approach is more effective than a medial parapatellar approach in primary total knee arthroplasty: A randomized controlled trial. Knee 2009;16(3):216-222.
6. Berlucchi M, Castelnuovo P, Vincenzi A et al. Endoscopic outcomes of resorbable nasal packing after functional endoscopic sinus surgery: A multicenter prospective randomized controlled study. European Archives of Oto-Rhino-Laryngology 2009;266(6):839-845.
7. Adhikari P. Nasal leech infestation in children: Comparison of two different innovative techniques. International Journal of Pediatric Otorhinolaryngology 2009;73(6):853-855.
8. Nasso G, Coppola R, Bonifazi R et al. Arterial revascularization in primary coronary artery bypass grafting: Direct comparison of 4 strategies-Results of the Stand-in-Y Mammary Study. Journal of Thoracic and Cardiovascular Surgery 2009;137(5):1093-1100.
9. Lepantalo M, Laurila K, Roth WD et al. PTFE Bypass or Thrupass for Superficial Femoral Artery Occlusion? A Randomised Controlled Trial. European Journal of Vascular and Endovascular Surgery 2009;37(5):578-584.
10. Jacobs LG, Smith MG, Khan SA et al. Manipulation or intra-articular steroids in the management of adhesive capsulitis of the shoulder? A prospective randomized trial. Journal of Shoulder and Elbow Surgery 2009;18(3):348-353.
11. Garutti I, Gonzalez-Aragoneses F, Biencinto MT et al. Thoracic paravertebral block after thoracotomy: comparison of three different approaches. European Journal of Cardio-thoracic Surgery 2009;35(5):829-832.
12. Formica F, Broccolo F, Martino A et al. Myocardial revascularization with miniaturized extracorporeal circulation versus off pump: Evaluation of systemic and myocardial inflammatory response in a prospective randomized study. Journal of Thoracic and Cardiovascular Surgery 2009;137(5):1206-1212.
13. Davis JW, Chang DW, Chevray P et al. Randomized Phase II Trial Evaluation of Erectile Function after Attempted Unilateral Cavernous Nerve-Sparing Retropubic Radical Prostatectomy With Versus Without Unilateral Sural Nerve Grafting for Clinically Localized Prostate Cancer. European Urology 2009;55(5):1135-1144.
14. D'Andrilli A, Andreetti C, Ibrahim M et al. A prospective randomized study to assess the efficacy of a surgical sealant to treat air leaks in lung surgery. European Journal of Cardio-thoracic Surgery 2009;35(5):817-821.
15. Colao A, Cappabianca P, Caron P et al. Octreotide LAR vs. surgery in newly diagnosed patients with acromegaly: A randomized, open-label, multicentre study. Clinical Endocrinology 2009;70(5):757-768.
16. Calder N, Kang S, Fraser L et al. A double-blind randomized controlled trial of management of recurrent nosebleeds in children. Otolaryngology - Head and Neck Surgery 2009;140(5):670-674.
17. Blomqvist EH, Lundblad L, Bergstedt H et al. A randomized prospective study comparing medical and medical-surgical treatment of nasal polyposis by CT. Acta Otolaryngol 2009;129(5):545-549.
18. Berger AC, Howard TJ, Kennedy EP et al. Does Type of Pancreaticojejunostomy after Pancreaticoduodenectomy Decrease Rate of Pancreatic Fistula? A Randomized, Prospective, Dual-Institution Trial. Journal of the American College of Surgeons 2009;208(5):738-747.
19. Amin M, Glynn F, Timon C. Randomized trial of tissue adhesive vs staples in thyroidectomy integrating patient satisfaction and Manchester score. Otolaryngology - Head and Neck Surgery 2009;140(5):703-708.
20. Wong TC, Chiu Y, Tsang WL et al. A double-blind, prospective, randomised, controlled clinical trial of minimally invasive dynamic hip screw fixation of intertrochanteric fractures. Injury 2009;40(4):422-427.
21. von Oppell UO, Masani N, O'Callaghan P et al. Mitral valve surgery plus concomitant atrial fibrillation ablation is superior to mitral valve surgery alone with an intensive rhythm control strategy. European Journal of Cardio-thoracic Surgery 2009;35(4):641-650.
22. van Det RJ, Vriens BHR, van der Palen J et al. Dacron or ePTFE for Femoro-popliteal Above-Knee Bypass Grafting: Short- and Long-term Results of a Multicentre Randomised Trial. European Journal of Vascular and Endovascular Surgery 2009;37(4):457-463.
23. Unalp HR, Erbil Y, Akguner T et al. Does near total thyroidectomy offer advantage over total thyroidectomy in terms of postoperative hypocalcemia? International Journal of Surgery 2009;7(2):120-125.
24. Stephens J, Singh A, Hughes J et al. A prospective multi-centre randomised controlled trial comparing PlasmaKnife with bipolar dissection tonsillectomy: Evaluating an emerging technology. International Journal of Pediatric Otorhinolaryngology 2009;73(4):597-601.

25. Simon P, Burkhardt U, Sack U et al. Inflammatory response is no different in children randomized to laparoscopic or open appendectomy. Journal of Laparoendoscopic and Advanced Surgical Techniques 2009;19(1):S71-S76.
26. Sheng W-C, Li J-Z, Chen S-H et al. A new technique for lag screw placement in the dynamic hip screw fixation of intertrochanteric fractures: decreasing radiation time dramatically. International Orthopaedics 2009;33(2):537-542.
27. Serra-Aracil X, Bombardo-Junca J, Moreno-Matias J et al. Randomized, controlled, prospective trial of the use of a mesh to prevent parastomal hernia. Annals of Surgery 2009;249(4):583-587.
28. Seiler CM, Bruckner T, Diener MK et al. Interrupted or continuous slowly absorbable sutures for closure of primary elective midline abdominal incisions: a multicenter randomized trial. Annals of Surgery 2009;249(4):576-582.
29. Papp M, Szabo L, Lazar I et al. Combined High Tibial Osteotomy Decreases Biomechanical Changes Radiologically Detectable in the Sagittal Plane Compared With Closing-Wedge Osteotomy. Arthroscopy - Journal of Arthroscopic and Related Surgery 2009;25(4):355-364.
30. Murrey D, Janssen M, Delamarter R et al. Results of the prospective, randomized, controlled multicenter Food and Drug Administration investigational device exemption study of the ProDisc-C total disc replacement versus anterior discectomy and fusion for the treatment of 1-level symptomatic cervical disc disease. Spine Journal 2009;9(4):275-286.
31. McCalden RW, MacDonald SJ, Rorabeck CH et al. Wear rate of highly cross-linked polyethylene in total hip arthroplasty. A randomized controlled trial. J Bone Joint Surg Am 2009;91(4):773-782.
32. Ko PJ, Liu YH, Hung YN et al. Patency rates of cuffed and noncuffed extended polytetrafluoroethylene grafts in dialysis access: A prospective, randomized study. World Journal of Surgery 2009;33(4):846-851.
33. Kim Y-H, Choi Y, Kwon O-R et al. Functional outcome and range of motion of high-flexion posterior cruciate-retaining and high-flexion posterior cruciate-substituting total knee prostheses. A prospective, randomized study. J Bone Joint Surg Am 2009;91(4):753-760.
34. Jung YB, Lee YS, Lee EY et al. Comparison of the modified subvastus and medial parapatellar approaches in total knee arthroplasty. International Orthopaedics 2009;33(2):419-423.
35. Jones RH, Velazquez EJ, Michler RE et al. Coronary bypass surgery with or without surgical ventricular reconstruction. New England Journal of Medicine 2009;360(17):1705-1717.
36. Hyodo M, Nishikubo K, Motoyoshi K. Laterofixation of the vocal fold using an endo-extralaryngeal needle carrier for bilateral vocal fold paralysis. Auris Nasus Larynx 2009;36(2):181-186.
37. Hofmeijer J, Kappelle LJ, Algra A et al. Surgical decompression for space-occupying cerebral infarction (the Hemicraniectomy After Middle Cerebral Artery infarction with Life-threatening Edema Trial [HAMLET]): a multicentre, open, randomised trial. The Lancet Neurology 2009;8(4):326-333.
38. Henkus HE, De Witte PB, Nelissen RG et al. Bursectomy compared with acromioplasty in the management of subacromial impingement syndrome: A prospective randomised study. Journal of Bone and Joint Surgery - Series B 2009;91(4):504-510.
39. Gupta PJ, Heda PS, Kalaskar S. Randomized controlled study between suture ligation and radio wave ablation and suture ligation of grade III symptomatic hemorrhoidal disease. International Journal of Colorectal Disease 2009;24(4):455-460.
40. Geerdink CH, Grimm B, Vencken W et al. Cross-linked compared with historical polyethylene in THA: an 8-year clinical study. Clin Orthop 2009;467(4):979-984.
41. Falahatkar S, Neiroomand H, Akbarpour M et al. One-shot versus metal telescopic dilation technique for tract creation in percutaneous nephrolithotomy: Comparison of safety and efficacy. Journal of Endourology 2009;23(4):615-618.
42. Damgaard S, Wetterslev J, Lund JT et al. One-year results of total arterial revascularization vs. conventional coronary surgery: CARRPO trial. European Heart Journal 2009;30(8):1005-1011.
43. Dalenback J, Andersson C, Anesten B et al. Prolene Hernia System, Lichtenstein mesh and plug-and-patch for primary inguinal hernia repair: 3-year outcome of a prospective randomised controlled trial. The BOOP study: Bi-layer and connector, On-lay, and On-lay with Plug for inguinal hernia repair. Hernia 2009;13(2):121-129.
44. Csendes A, Maluenda F, Burgos AM. A prospective randomized study comparing patients with morbid obesity submitted to laparotomic gastric bypass with or without omentectomy. Obesity Surgery 2009;19(4):490-494.
45. Chibbaro S, Tacconi L. Use of skin glue versus traditional wound closure methods in brain surgery: A prospective, randomized, controlled study. Journal of Clinical Neuroscience 2009;16(4):535-539.
46. Chen Z, Lu G, Li X et al. Better Compliance Contributes to Better Nocturnal Continence With Orthotopic Ileal Neobladder Than Ileocolonic Neobladder After Radical Cystectomy for Bladder Cancer. Urology 2009;73(4):838-843.
47. Carradice D, Mekako AI, Hatfield J et al. Randomized clinical trial of concomitant or sequential phlebectomy after endovenous laser therapy for varicose veins. British Journal of Surgery 2009;96(4):369-375.
48. Aslanabadi S, Jamshidi M, Tubbs RS et al. The role of prophylactic chest drainage in the operative management of esophageal atresia with tracheoesophageal fistula. Pediatric Surgery International 2009;25(4):365-368.
49. Ziegeler S, Raddatz A, Schneider SO et al. Effects of haemofiltration and mannitol treatment on cardiopulmonary-bypass induced immunosuppression. Scandinavian Journal of Immunology 2009;69(3):234-241.

50. Wardlaw D, Cummings SR, Van Meirhaeghe J et al. Efficacy and safety of balloon kyphoplasty compared with non-surgical care for vertebral compression fracture (FREE): a randomised controlled trial. The Lancet 2009;373(9668):1016-1024.
51. Wang T, Cui Y, Huang WS et al. The role of postoperative colonoscopic surveillance after radical surgery for colorectal cancer: a prospective, randomized clinical study. Gastrointestinal Endoscopy 2009;69(3):609-615.
52. van den Broek FJC, Fockens P, Van Eeden S et al. Clinical evaluation of endoscopic trimodal imaging for the detection and differentiation of colonic polyps. Clin Gastroenterol Hepatol 2009;7(3):288-295.
53. Ulrich AB, Seiler C, Rahbari N et al. Diverting stoma after low anterior resection: more arguments in favor. Dis Colon Rectum 2009;52(3):412-418.
54. Ullah AS, Dias JJ, Bhowal B. Does a 'firebreak' full-thickness skin graft prevent recurrence after surgery for Dupuytren's contracture? A prospective, randomised trial. Journal of Bone and Joint Surgery - Series B 2009;91(3):374-378.
55. Toniato A, Merante-Boschin I, Opocher G et al. Surgical versus conservative management for subclinical Cushing syndrome in adrenal incidentalomas: a prospective randomized study. Annals of Surgery 2009;249(3):388-391.
56. Timperley AJ, Whitehouse SL, Hourigan PG. The influence of a suction device on fixation of a cemented cup using RSA. Clinical orthopaedics and related research 2009;467(3):792-798.
57. Tarnoff M, Rodriguez L, Escalona A et al. Open label, prospective, randomized controlled trial of an endoscopic duodenal-jejunal bypass sleeve versus low calorie diet for pre-operative weight loss in bariatric surgery. Surgical Endoscopy 2009;23(3):650-656.
58. Serruys PW, Morice MC, Kappetein AP et al. Percutaneous coronary intervention versus coronary-artery bypass grafting for severe coronary artery disease. New England Journal of Medicine 2009;360(10):961-972.
59. Seon JK, Park SJ, Lee KB et al. Range of motion in total knee arthroplasty: a prospective comparison of high-flexion and standard cruciate-retaining designs. J Bone Joint Surg Am 2009;91(3):672-679.
60. Schurr MJ, Foster KN, Centanni JM et al. Phase I/II clinical evaluation of StrataGraft: a consistent, pathogen-free human skin substitute. J Trauma 2009;66(3):866-873.
61. Rousseau JA, Girard K, Turcot-Lemay L et al. A randomized study comparing skin closure in cesarean sections: staples vs subcuticular sutures. American Journal of Obstetrics and Gynecology 2009;200(3):265.e261-265.e264.
62. Rossetto LA, Garcia EB, Abla LF et al. Quilting suture in the donor site of the transverse rectus abdominis musculocutaneous flap in breast reconstruction. Ann Plast Surg 2009;62(3):240-243.
63. Park CM, Lee WY, Cho YB et al. Sodium hyaluronate-based bioresorbable membrane (Seprafilm) reduced early postoperative intestinal obstruction after lower abdominal surgery for colorectal cancer: The preliminary report. International Journal of Colorectal Disease 2009;24(3):305-310.
64. Onorati F, Santarpino G, Presta P et al. Pulsatile perfusion with intra-aortic balloon pumping ameliorates whole body response to cardiopulmonary bypass in the elderly. Crit Care Med 2009;37(3):902-911.
65. Mustafa M, Ali-El-Dein B. Stenting in extracorporeal Shockwave lithotripsy; may enhance the passage of the fragments. Journal of the Pakistan Medical Association 2009;59(3):141-143.
66. Musleh GS, Datta SS, Yonan NN et al. Association of IL6 and IL10 with renal dysfunction and the use of haemofiltration during cardiopulmonary bypass. European Journal of Cardio-thoracic Surgery 2009;35(3):511-514.
67. Mavuduru RM, Mandal AK, Singh SK et al. Comparison of HoLEP and TURP in terms of efficacy in the early postoperative period and perioperative morbidity. Urologia Internationalis 2009;82(2):130-135.
68. Manci N, Marchetti C, Esposito F et al. Inguinofemoral lymphadenectomy: Randomized trial comparing inguinal skin access above or below the inguinal ligament. Annals of Surgical Oncology 2009;16(3):721-728.
69. Lu P, Zhu X-Q, Xu Z-L et al. Increased infiltration of activated tumor-infiltrating lymphocytes after high intensity focused ultrasound ablation of human breast cancer. Surgery 2009;145(3):286-293.
70. Liu Y, Yang J, Liu J et al. Surgical treatment of primary palmar hyperhidrosis: a prospective randomized study comparing T3 and T4 sympathicotomy. European Journal of Cardio-thoracic Surgery 2009;35(3):398-402.
71. Li CZ, Cheng LF, Wang ZQ et al. Attempt of photodynamic therapy on esophageal varices. Lasers Med Sci 2009;24(2):167-171.
72. Kwon BK, Curt A, Belanger LM et al. Intrathecal pressure monitoring and cerebrospinal fluid drainage in acute spinal cord injury: a prospective randomized trial. J Neurosurg Spine 2009;10(3):181-193.
73. Kouhia STH, Huttunen R, Silvasti SO et al. Lichtenstein hernioplasty versus totally extraperitoneal laparoscopic hernioplasty in treatment of recurrent inguinal hernia--a prospective randomized trial. Annals of Surgery 2009;249(3):384-387.
74. Khan MA, Smythe A, Globe J et al. Randomized controlled trial of laparoscopic Nissen versus Lind fundoplication for gastro-oesophageal reflux disease. Scandinavian Journal of Gastroenterology 2009;44(3):269-275.
75. Karakayali F, Karagulle E, Karabulut Z et al. Unroofing and marsupialization vs. rhomboid excision and Limberg flap in pilonidal disease: a prospective, randomized, clinical trial. Dis Colon Rectum 2009;52(3):496-502.
76. Kanotra SP, Lateef M. Pseudocyst of pinna: a recurrence-free approach. Am J Otolaryngol 2009;30(2):73-79.
77. Jamal A, Shamim M, Hashmi F et al. Open excision with secondary healing versus rhomboid excision with limberg transposition flap in the management of sacrococcygeal pilonidal disease. Journal of the Pakistan Medical Association 2009;59(3):157-160.
78. Hackel M, Masopust V, Bojar M et al. The epidural steroids in the prevention of epidural fibrosis: MRI and clinical findings. Neuroendocrinology Letters 2009;30(1):51-55.
79. Guarino N, Vallasciani SA, Marrocco G. A New Suture Material for Hypospadias Surgery: A Comparative Study. Journal of Urology 2009;181(3):1318-1323.
80. Fujino Y, Matsumoto I, Shinzeki M et al. Impact of internal biliary drainage after pancreaticoduodenectomy. Journal of Hepato-Biliary-Pancreatic Surgery 2009;16(2):160-164.
81. Fattouch K, Guccione F, Dioguardi P et al. Off-pump versus on-pump myocardial revascularization in patients with ST-segment elevation myocardial infarction: A randomized trial. Journal of Thoracic and Cardiovascular Surgery 2009;137(3):650-657.
82. Farid M, Youssef T, Mahdy T et al. Comparative study between botulinum toxin injection and partial division of puborectalis for treating anismus. International Journal of Colorectal Disease 2009;24(3):327-334.
83. Dauri M, Fabbi E, Mariani P et al. Continuous femoral nerve block provides superior analgesia compared with continuous intra-articular and wound infusion after anterior cruciate ligament reconstruction. Regional Anesthesia and Pain Medicine 2009;34(2):95-99.
84. D'Arrigo C, Speranza A, Monaco E et al. Learning curve in tissue sparing total hip replacement: Comparison between different approaches. Journal of Orthopaedics and Traumatology 2009;10(1):47-54.
85. Dar GN, Tak SR, Kangoo KA et al. Bridge plate osteosynthesis using dynamic condylar screw (DCS) or retrograde intramedullary supracondylar nail (RIMSN) in the treatment of distal femoral fractures: Comparison of two methods in a prospective randomized study. Ulusal Travma ve Acil Cerrahi Dergisi 2009;15(2):148-153.
86. Cennamo V, Fuccio L, Repici A et al. Timing of precut procedure does not influence success rate and complications of ERCP procedure: a prospective randomized comparative study. Gastrointestinal Endoscopy 2009;69(3):473-479.
87. Braumann C, Gutt CN, Scheele J et al. Taurolidine reduces the tumor stimulating cytokine interleukin-1beta in patients with resectable gastrointestinal cancer: A multicentre prospective randomized trial. World Journal of Surgical Oncology 2009;7(32):
88. Barczynski M, Konturek A, Cichon S. Randomized clinical trial of visualization versus neuromonitoring of recurrent laryngeal nerves during thyroidectomy. British Journal of Surgery 2009;96(3):240-246.
89. Balzer KM, Pfeiffer T, Rossbach S et al. Prospective randomized trial of operative vs interventional treatment for renal artery ostial occlusive disease (RAOOD). Journal of Vascular Surgery 2009;49(3):667-675.
90. Wang W-Z, Jiang B, Liu H-M et al. Minimally invasive craniopuncture therapy vs. conservative treatment for spontaneous intracerebral hemorrhage: results from a randomized clinical trial in China. Int j 2009;4(1):11-16.
91. Wang W, Zhu L, Lang J. Transobturator tape procedure versus tension-free vaginal tape for treatment of stress urinary incontinence. International Journal of Gynecology and Obstetrics 2009;104(2):113-116.
92. Villanueva C, Aracil C, Colomo A et al. Clinical trial: A randomized controlled study on prevention of variceal rebleeding comparing nadolol + ligation vs. hepatic venous pressure gradient-guided pharmacological therapy. Alimentary Pharmacology and Therapeutics 2009;29(4):397-408.
93. Tierney EP, Moy RL, Kouba DJ. Rapid absorbing gut suture versus 2-octylethylcyanoacrylate tissue adhesive in the epidermal closure of linear repairs. J Drugs Dermatol 2009;8(2):115-119.
94. Sun K, Tian SQ, Zhang JH et al. ACL reconstruction with BPTB autograft and irradiated fresh frozen allograft. Journal of Zhejiang University: Science B 2009;10(4):306-316.
95. Smith ME, Elstad M. Mitomycin C and the endoscopic treatment of laryngotracheal stenosis: are two applications better than one? Laryngoscope 2009;119(2):272-283.
96. Smekal V, Irenberger A, Struve P et al. Elastic stable intramedullary nailing versus nonoperative treatment of displaced midshaft clavicular fractures-a randomized, controlled, clinical trial. J Orthop Trauma 2009;23(2):106-112.
97. Sillanpaa PJ, Mattila VM, Maenpaa H et al. Treatment with and without initial stabilizing surgery for primary traumatic patellar dislocation. A prospective randomized study. The Journal of bone and joint surgery American volume. 2009;91(2):263-273.
98. Shoman N, Gherian H, Flamer D et al. Prospective, double-blind, randomized trial evaluating patient satisfaction, bleeding, and wound healing using biodegradable synthetic polyurethane foam (nasopore) as a middle meatal spacer in functional endoscopic sinus surgery. Journal of Otolaryngology - Head and Neck Surgery 2009;38(1):112-118.
99. Shang E, Hasenberg T, Magdeburg R et al. First experiences with a circular stapled gastro-jejunostomy by a new transorally introducible stapler system in laparoscopic roux-en-y gastric bypass. Obesity Surgery 2009;19(2):230-236.
100. Schneeberger S, Biebl M, Steurer W et al. A prospective randomized multicenter trial comparing histidine-tryptophane-ketoglutarate versus University of Wisconsin perfusion solution in clinical pancreas transplantation. Transplant International 2009;22(2):217-224.
101. Sakwa MP, Emery RW, Shannon FL et al. Coronary artery bypass grafting with a minimized cardiopulmonary bypass circuit: a prospective, randomized trial. The Journal of thoracic and cardiovascular surgery 2009;137(2):481-485.
102. Rua JFM, Jatene FB, De Campos JRM et al. Robotic versus human camera holding in video-assisted thoracic sympathectomy: A single blind randomized trial of efficacy and safety. Interactive Cardiovascular and Thoracic Surgery 2009;8(2):195-199.
103. Rifaie O, Abdel-Dayem MK, Ramzy A et al. Percutaneous mitral valvotomy versus closed surgical commissurotomy. Up to 15 years of follow-up of a prospective randomized study. Journal of Cardiology 2009;53(1):28-34.
104. Reid LA, Cahoon N, Stewart KJ. A prospective randomized control trial comparing one monofilament absorbable suture to a braided absorbable suture in children. Journal of Plastic, Reconstructive and Aesthetic Surgery 2009;62(2):270-272.
105. Regis J, Tamura M, Guillot C et al. Radiosurgery with the world's first fully robotized leksell gamma knife perfeXion in clinical use: A 200-patient prospective, randomized, controlled comparison with the gamma knife 4C. Neurosurgery 2009;64(2):346-355.
106. Pandit H, Jenkins C, Beard DJ et al. Cementless Oxford unicompartmental knee replacement shows reduced radiolucency at one year. Journal of Bone and Joint Surgery - Series B 2009;91(2):185-189.
107. Nunley PD, Jawahar A, Kerr IEJ et al. Choice of plate may affect outcomes for single versus multilevel ACDF: results of a prospective randomized single-blind trial. Spine Journal 2009;9(2):121-127.
108. Nordon IM, Senapati A, Cripps NPJ. A prospective randomized controlled trial of simple Bascom's technique versus Bascom's cleft closure for the treatment of chronic pilonidal disease. American Journal of Surgery 2009;197(2):189-192.
109. Man SY, Wong EML, Ng YC et al. Cost-Consequence Analysis Comparing 2-Octyl Cyanoacrylate Tissue Adhesive and Suture for Closure of Simple Lacerations: A Randomized Controlled Trial. Annals of Emergency Medicine 2009;53(2):189-197.
110. Liu C-M, Tan C-D, Lee F-P et al. Microdebrider-assisted versus radiofrequency-assisted inferior turbinoplasty. Laryngoscope 2009;119(2):414-418.
111. Lerner T, Bullmann V, Schulte TL et al. A level-1 pilot study to evaluate of ultraporous beta-tricalcium phosphate as a graft extender in the posterior correction of adolescent idiopathic scoliosis. European Spine Journal 2009;18(2):170-179.
112. Krukhaug Y, Ugland S, Lie SA et al. External fixation of fractures of the distal radius: A randomized comparison of the Hoffman compact II non-bridging fixator and the Dynawrist fixator in 75 patients followed for 1 year. Acta Orthopaedica 2009;80(1):104-108.
113. Kossi J, Gronlund S, Uotila-nieminen M et al. The effect of 4% icodextrin solution on adhesiolysis surgery time at the Hartmann's reversal: A pilot, multicentre, randomized control trial vs lactated Ringer's solution. Colorectal Disease 2009;11(2):168-172.
114. Knebel P, Fischer L, Huesing J et al. Randomized clinical trial of a modified Seldinger technique for open central venous cannulation for implantable access devices. British Journal of Surgery 2009;96(2):159-165.

115. Klem TMAL, Schnater JM, Schutte PR et al. A randomized trial of cryo stripping versus conventional stripping of the great saphenous vein. Journal of Vascular Surgery 2009;49(2):403-409.
116. Kim YH, Yoon SH, Kim JS. Early outcome of TKA with a medial pivot fixed-bearing prosthesis is worse than with a PFC mobile-bearing prosthesis. Clinical orthopaedics and related research 2009;467(2):493-503.
117. Karakan T, Cindoruk M, Alagozlu H et al. EUS versus endoscopic retrograde cholangiography for patients with intermediate probability of bile duct stones: a prospective randomized trial. Gastrointestinal Endoscopy 2009;69(2):244-252.
118. Johansson E, Engervall P, Bjorvell H et al. Patients' perceptions of having a central venous catheter or a totally implantable subcutaneous port system-results from a randomised study in acute leukaemia. Supportive Care in Cancer 2009;17(2):137-143.
119. Jiya T, Smit T, Deddens J et al. Posterior lumbar interbody fusion using nonresorbable poly-ether-ether-ketone versus resorbable poly-L-lactide-co-D,L-lactide fusion devices: a prospective, randomized study to assess fusion and clinical outcome. Spine 2009;34(3):233-237.
120. Jin C, Yao L, Long J et al. Effect of multiple-phase regional intra-arterial infusion chemotherapy on patients with resectable pancreatic head adenocarcinoma. Chinese Medical Journal 2009;122(3):284-290.
121. Ilizaliturri VM, Jr., Chaidez C, Villegas P et al. Prospective randomized study of 2 different techniques for endoscopic iliopsoas tendon release in the treatment of internal snapping hip syndrome. Arthroscopy 2009;25(2):159-163.
122. Hirao M, Kurokawa Y, Fujitani K et al. Randomized controlled trial of Roux-en-Y versus Rho-Shaped-Roux-en-Y reconstruction after distal gastrectomy for gastric cancer. World Journal of Surgery 2009;33(2):290-295.
123. Gill G, for the SNAC Trial Group. Sentinel-lymph-node-based management or routine axillary clearance? One-year outcomes of sentinel node biopsy versus axillary clearance (SNAC): a randomized controlled surgical trial. Annals of Surgical Oncology 2009;16(2):266-275.
124. Franciosi JP, Mascarenhas M, Semeao E et al. Randomised controlled trial of paediatric magnetic positioning device assisted colonoscopy: a pilot and feasibility study. Dig Liver Dis 2009;41(2):123-126.
125. Dunklebarger J, Rhee D, Kim S et al. Video rigid laryngeal endoscopy compared to laryngeal mirror examination: an assessment of patient comfort and clinical visualization. Laryngoscope 2009;119(2):269-271.
126. Diab MA, Fernandez GN, Elsorafy K. Time and cost savings in arthroscopic subacromial decompression: The use of bipolar versus monopolar radiofrequency. International Orthopaedics 2009;33(1):175-179.
127. Csendes A, Burgos AM, Smok G et al. Latest results (12-21 years) of a prospective randomized study comparing Billroth II and Roux-en-Y anastomosis after a partial gastrectomy plus vagotomy in patients with duodenal ulcers. Annals of Surgery 2009;249(2):189-194.
128. Coban A, Hanagasi HA, Karamursel S et al. Comparison of unilateral pallidotomy and subthalamotomy findings in advanced idiopathic Parkinson's disease. British Journal of Neurosurgery 2009;23(1):23-29.
129. Chheda N, Katz AE, Gynizio L et al. The pain of nasal tampon removal after nasal surgery: A randomized control trial. Otolaryngology - Head and Neck Surgery 2009;140(2):215-217.
130. Barbaro NM, Quigg M, Broshek DK et al. A multicenter, prospective pilot study of gamma knife radiosurgery for mesial temporal lobe epilepsy: seizure response, adverse events, and verbal memory. Ann Neurol 2009;65(2):167-175.
131. Adler A, Aschenbeck J, Yenerim T et al. Narrow-Band Versus White-Light High Definition Television Endoscopic Imaging for Screening Colonoscopy: A Prospective Randomized Trial. Gastroenterology 2009;136(2):410-416.
132. Zhibo X, Miaobo Z. Effect of Sustained-Release Lidocaine on Reduction of Pain After Subpectoral Breast Augmentation. Aesthetic Surgery Journal 2009;29(1):32-34.
133. Yu H, Yang X-Y, Liu B. EMLA Cream coated on the rigid bronchoscope for tracheobronchial foreign body removal in children. Laryngoscope 2009;119(1):158-161.
134. Wood PL, Sutton C, Mishra V et al. A randomised, controlled trial of two mobile-bearing total ankle replacements. The Journal of bone and joint surgery British volume. 2009;91(1):69-74.
135. Wohlrab D, Hube R, Zeh A et al. Clinical and radiological results of high flex total knee arthroplasty: a 5 year follow-up. Archives of orthopaedic and trauma surgery 2009;129(1):21-24.
136. Wilson YL, Merer DM, Moscatello AL. Comparison of three common tonsillectomy techniques: a prospective randomized, double-blinded clinical study. Laryngoscope 2009;119(1):162-170.
137. Weaver FM, Follett K, Stern M et al. Bilateral deep brain stimulation vs best medical therapy for patients with advanced parkinson disease: A randomized controlled trial. JAMA. 2009;301(1):63-73.
138. Suda AJ, Knahr K. Early results with the cementless Variall hip system. Expert Rev Med Devices 2009;6(1):21-25.
139. St. Peter SD, Tsao K, Harrison C et al. Thoracoscopic decortication vs tube thoracostomy with fibrinolysis for empyema in children: a prospective, randomized trial. Journal of Pediatric Surgery 2009;44(1):106-111.
140. Shikora SA, Bergenstal R, Bessler M et al. Implantable gastric stimulation for the treatment of clinically severe obesity: results of the SHAPE trial. Surgery for Obesity and Related Diseases 2009;5(1):31-37.
141. Senay S, Toraman F, Gunaydin S et al. The impact of allogenic red cell transfusion and coated bypass circuit on the inflammatory response during cardiopulmonary bypass: A randomized study. Interactive Cardiovascular and Thoracic Surgery 2009;8(1):93-99.

142. Rosenthal MD, Moore JH, Stoneman PD et al. Neuromuscular excitability changes in the vastus medialis following anterior cruciate ligament reconstruction. Electromyogr Clin Neurophysiol 2009;49(1):43-51. 143. Pohl J, Lotterer E, Balzer C et al. Computed virtual chromoendoscopy versus standard colonoscopy with targeted indigocarmine chromoscopy: a randomised multicentre trial. Gut 2009;58(1):73-78. 144. Natale F, La Penna C, Padoa A et al. A prospective, randomized, controlled study comparing Gynemesh(R), a synthetic mesh, and Pelvicol(R), a biologic graft, in the surgical treatment of recurrent cystocele. International Urogynecology Journal 2009;20(1):75-81. 145. Naseri MH, Pishgou B, Ameli J et al. Comparison of post-operative neurological complications between on-pump and off-pump coronary artery bypass surgery. Pakistan Journal of Medical Sciences 2009;25(1):137-141. 146. Morfa GM, Blazquez JT, Cordero AMP et al. A comparison of beating heart and arrested heart techniques for mitral valve replacement surgery. MEDICC Review 2009;11(1):36-41. 147. Miccoli P, Materazzi G, Fregoli L et al. Modified lateral neck lymphadenectomy: Prospective randomized study comparing harmonic scalpel with clamp-and-tie technique. Otolaryngology - Head and Neck Surgery 2009;140(1):61-64. 148. McQuade K, Gable D, Hohman S et al. Randomized comparison of ePTFE/nitinol self-expanding stent graft vs prosthetic femoral-popliteal bypass in the treatment of superficial femoral artery occlusive disease. Journal of Vascular Surgery 2009;49(1):109-116. 149. McKee MD, Veillette CJH, Hall JA et al. A multicenter, prospective, randomized, controlled trial of open reduction-internal fixation versus total elbow arthroplasty for displaced intra-articular distal humeral fractures in elderly patients. Journal of Shoulder and Elbow Surgery 2009;18(1):3-12. 150. Matsushita M, Takakuwa H, Shimeno N et al. Epinephrine sprayed on the papilla for prevention of post-ERCP pancreatitis. 
J Gastroenterol 2009;44(1):71-75. 151. Ma J, Li XH, Yan ZX et al. Effect of myocardial protection during beating heart surgery with right sub-axiliary approach. Chinese Medical Journal 2009;122(2):150-152. 152. Lorusso R, De Cicco G, Totaro P et al. Effects of phosphorylcholine coating on extracorporeal circulation management and postoperative outcome: A double-blind randomized study. Interactive Cardiovascular and Thoracic Surgery 2009;8(1):7-11. 153. Liao GS, Wu MH, Yu JC et al. Transection of the esophagus is optional in the Modified Sugiura procedure. Hepato-Gastroenterology 2009;56(89):133-138. 154. Liang H, Zhao Y, Wang D et al. Evaluation of the quality of processed blood salvaged during craniotomy. Surgical Neurology 2009;71(1):74-80. 155. Li JY, Lai YJ, Yeh HI et al. Atrial gap junctions, NF-kappaB and fibrosis in patients undergoing coronary artery bypass surgery: The
relationship with postoperative atrial fibrillation. Cardiology 2009;112(2):81-88. 156. Li B, Chen R, Huang R et al. Clinical benefit of cardiac ischemic postconditioning in corrections of tetralogy of Fallot. Interactive Cardiovascular and Thoracic Surgery 2009;8(1):17-21. 157. Lemaire SA, Jones MM, Conklin LD et al. Randomized comparison of cold blood and cold crystalloid renal perfusion for renal protection during thoracoabdominal aortic aneurysm repair. Journal of vascular surgery. 2009;49(1):11-19. 158. Lee YT, Lai LH, Hui AJ et al. Efficacy of cap-assisted colonoscopy in comparison with regular colonoscopy: a randomized controlled trial. Am J Gastroenterol 2009;104(1):41-46. 159. Lee S, Pham AM, Pryor SG et al. Efficacy of Crosseal fibrin sealant (human) in rhytidectomy. Arch Facial Plast Surg 2009;11(1):29-33. 160. Koksoy C, Demirci RK, Balci D et al. Brachiobasilic versus brachiocephalic arteriovenous fistula: a prospective randomized study. Journal of vascular surgery. 2009;49(1):171-177. 161. Koivusalo AI, Korpela R, Wirtavuori K et al. A single-blinded, randomized comparison of laparoscopic versus open hernia repair in children. Pediatrics 2009;123(1):332-337. 162. Klarenbeek BR, Veenhof AA, Bergamaschi R et al. Laparoscopic sigmoid resection for diverticulitis decreases major morbidity rates: a randomized control trial: short-term results of the Sigma Trial. Annals of Surgery 2009;249(1):39-44. 163. Kim Y-S, Lee J-Y, Yang S-C et al. Comparative study of the influence of room-temperature and warmed fluid irrigation on body temperature in arthroscopic shoulder surgery. Arthroscopy 2009;25(1):24-29. 164. Kim ES, Jeon SW, Park SY et al. Comparison of double-layered and covered Niti-S stents for palliation of malignant dysphagia. J Gastroenterol Hepatol 2009;24(1):114-119. 165. Huang X, Zheng Y, Liu X et al. A comparison between endoscope- assisted partial parotidectomy and conventional partial parotidectomy. 
Otolaryngology - Head and Neck Surgery 2009;140(1):70-75. 166. Heller JG, Sasso RC, Papadopoulos SM et al. Comparison of BRYAN cervical disc arthroplasty with anterior cervical decompression and fusion: clinical and radiographic results of a randomized, controlled, clinical trial. Spine 2009;34(2):101-107. 167. Heider P, Wildgruber M, Wolf O et al. Improvement of microcirculation after percutaneous transluminal angioplasty in the lower limb with prostaglandin E1. Prostaglandins Other Lipid Mediat 2009;88(1-2):23-30. 168. Hall NJ, Pacilli M, Eaton S et al. Recovery after open versus laparoscopic pyloromyotomy for pyloric stenosis: a double-blind multicentre randomised controlled trial. Lancet 2009;373(9661):390-398. 169. Group KATT, Johnston L, MacLennan G et al. The Knee Arthroplasty Trial (KAT) design features, baseline characteristics, and two-year functional outcomes after alternative approaches to knee replacement.

The Journal of bone and joint surgery American volume. 2009;91(1):134-141. 170. Grasso A, Milano G, Salvatore M et al. Single-row versus double-row arthroscopic rotator cuff repair: a prospective randomized clinical study. Arthroscopy 2009;25(1):4-12. 171. Gonen M, Ozturk B, Ozkardes H. Double-J stenting compared with one night externalized ureteral catheter placement in tubeless percutaneous nephrolithotomy. Journal of Endourology 2009;23(1):27- 31. 172. Gisbertz SS, Ramzan M, Tutein Nolthenius RP et al. Short-Term Results of A Randomized Trial Comparing Remote Endarterectomy and Supragenicular Bypass Surgery for Long Occlusions of the Superficial Femoral Artery [The REVAS Trial]. European Journal of Vascular and Endovascular Surgery 2009;37(1):68-76. 173. Ghoniem G, Corcos J, Comiter C et al. Cross-linked polydimethylsiloxane injection for female stress urinary incontinence: results of a multicenter, randomized, controlled, single-blind study. The Journal of urology 2009;181(1):204-210. 174. Garcia-Olmo D, Herreros D, Pascual I et al. Expanded adipose- derived stem cells for the treatment of complex perianal fistula: a phase II clinical trial. Dis Colon Rectum 2009;52(1):79-86. 175. Engh CA, MacDonald SJ, Sritulanondha S et al. 2008 John Charnley award: metal ion levels after metal-on-metal total hip arthroplasty: a randomized trial. Clinical orthopaedics and related research 2009;467(1):101-111. 176. El-Awadi S, El-Nakeeb A, Youssef T et al. Laparoscopic versus open cholecystectomy in cirrhotic patients: A prospective randomized study. International Journal of Surgery 2009;7(1):66-69. 177. El Moghazy WM, Hedaya MS, Kaido T et al. Two different methods for donor hepatic transection: cavitron ultrasonic surgical aspirator with bipolar cautery versus cavitron ultrasonic surgical aspirator with radiofrequency coagulator-A randomized controlled trial. Liver Transpl 2009;15(1):102-105. 178. Conroy JL, Chawda M, Kaushal R et al. 
Does Use of a "Rim Cutter" Improve Quality of Cementation of the Acetabular Component of Cemented Exeter Total Hip Arthroplasty? Journal of Arthroplasty 2009;24(1):71-76. 179. Chijiiwa K, Imamura N, Ohuchida J et al. Prospective randomized controlled study of gastric emptying assessed by 13C-acetate breath test after pylorus-preserving pancreaticoduodenectomy: Comparison between antecolic and vertical retrocolic duodenojejunostomy. Journal of Hepato-Biliary-Pancreatic Surgery 2009;16(1):49-55. 180. Chemla ES, Morsy M. Randomized clinical trial comparing decellularized bovine ureter with expanded polytetrafluoroethylene for vascular access. British journal of surgery 2009;96(1):34-39. 181. Chauhan A, Tiwari S, Mishra V et al. Comparison of internal sphincterotomy with topical diltiazem for post-hemorrhoidectomy pain relief: A prospective randomized trial. Journal of Postgraduate Medicine 2009;55(1):22-26.

182. Camboni D, Schmidt S, Philipp A et al. Microbubble activity in miniaturized and in conventional extracorporeal circulation. Asaio J 2009;55(1):58-62. 183. Camazzola D, Hammond T, Gandhi R et al. A Randomized Trial of Hydroxyapatite-Coated Femoral Stems in Total Hip Arthroplasty. A 13- Year Follow-Up. Journal of Arthroplasty 2009;24(1):33-37. 184. Bronner MP, Overholt BF, Taylor SL et al. Squamous overgrowth is not a safety concern for photodynamic therapy for Barrett's esophagus with high-grade dysplasia. Gastroenterology 2009;136(1):56-64. 185. Brisinda G, Vanella S, Cadeddu F et al. End-to-end versus end-to-side stapled anastomoses after anterior resection for rectal cancer. Journal of surgical oncology 2009;99(1):75-79. 186. Blattert TR, Jestaedt L, Weckbach A. Suitability of a calcium phosphate cement in osteoporotic vertebral body fracture augmentation: a controlled, randomized, clinical trial of balloon kyphoplasty comparing calcium phosphate versus polymethylmethacrylate. Spine 2009;34(2):108-114. 187. Biffi R, Orsi F, Pozzi S et al. Best choice of central venous insertion site for the prevention of catheter-related complications in adult patients who need cancer therapy: A randomized trial. Annals of Oncology 2009;20(5):935-940. 188. Bhansali M, Patankar S, Dobhada S et al. Management of large (>60 g) prostate gland: PlasmaKinetic Superpulse (bipolar) versus conventional (monopolar) transurethral resection of the prostate. Journal of Endourology 2009;23(1):141-145. 189. Bergsland J, Lingaas PS, Skulstad H et al. Intracoronary shunt prevents ischemia in off-pump coronary artery bypass surgery. The Annals of thoracic surgery 2009;87(1):54-60. 190. Allen GS. Intraoperative temperature control using the Thermogard system during off-pump coronary artery bypass grafting. The Annals of thoracic surgery 2009;87(1):284-288. 191. Onorati F, Santarpino G, Rubino AS et al. 
Body perfusion during adult cardiopulmonary bypass is improved by pulsatile flow with intra-aortic balloon pump. International Journal of Artificial Organs 2009;32(1):50-61. 192. Yoo J, Roth K, Hughes B et al. Evaluation of postoperative drainage with application of platelet-rich and platelet-poor plasma following hemithyroidectomy: a randomized controlled clinical trial. Head & neck 2008;30(12):1552-1558. 193. White RR, Pitzer KD, Fader RC et al. Pharmacokinetics of topical and intravenous cefazolin in patients with clean surgical wounds. Plastic and reconstructive surgery 2008;122(6):1773-1779. 194. Varadarajulu S, Christein JD, Tamhane A et al. Prospective randomized trial comparing EUS and EGD for transmural drainage of pancreatic pseudocysts (with videos). Gastrointestinal Endoscopy 2008;68(6):1102-1111. 195. van Manen CJ, Dekker ML, van Eerten PV et al. Bio-resorbable versus metal implants in wrist fractures: a randomised trial. Archives of orthopaedic and trauma surgery 2008;128(12):1413-1417.

196. Tsuchiya M, Sato EF, Inoue M et al. Open abdominal surgery increases intraoperative oxidative stress: can it be prevented? Anesthesia and analgesia 2008;107(6):1946-1952. 197. Sywak MS, Yeh MW, McMullen T et al. A randomized controlled trial of minimally invasive thyroidectomy using the lateral direct approach versus conventional hemithyroidectomy. Surgery 2008;144(6):1016- 1021. 198. Stolzenburg J-U, Wasserscheid J, Rabenalt R et al. Reduction in incidence of lymphocele following extraperitoneal radical prostatectomy and pelvic lymph node dissection by bilateral peritoneal fenestration. World J Urol 2008;26(6):581-586. 199. Spronk S, Bosch JL, den Hoed PT et al. Cost-effectiveness of endovascular revascularization compared to supervised hospital- based exercise training in patients with intermittent claudication: a randomized controlled trial. Journal of vascular surgery. 2008;48(6):1472-1480. 200. Smietanski M, Polish Hernia Study G. Randomized clinical trial comparing a polypropylene with a poliglecaprone and polypropylene composite mesh for inguinal hernioplasty. British journal of surgery. 2008;95(12):1462-1468. 201. Slappendel R, Horstmann W, Dirksen R et al. Wound drainage with or without blood salvage? An open, prospective, randomized and single- center comparison of blood loss, postoperative hemoglobin levels and allogeneic blood transfusions after major hip surgery. Transfusion Alternatives in Transfusion Medicine 2008;10(4):174-181. 202. Siddiqui FG, Shaikh JM, Soomro AG et al. Outcome of ileostomy in the management of ileal perforation. Journal of the Liaquat University of Medical and Health Sciences 2008;7(3):168-172. 203. Shida T, Katsuura Y, Teramoto O et al. Transparent hood attached to the colonoscope: does it really work for all types of colonoscopes? Surgical endoscopy 2008;22(12):2654-2658. 204. Sherwinter DA, Ghaznavi AM, Spinner D et al. 
Continuous infusion of intraperitoneal bupivacaine after laparoscopic surgery: a randomized controlled trial. Obesity surgery 2008;18(12):1581-1586. 205. Shemesh D, Goldin I, Zaghal I et al. Angioplasty with stent graft versus bare stent for recurrent cephalic arch stenosis in autogenous arteriovenous access for hemodialysis: A prospective randomized clinical trial. Journal of Vascular Surgery 2008;48(6):1524-1531. 206. Schimmer C, Reents W, Berneder S et al. Prevention of sternal dehiscence and infection in high-risk patients: a prospective randomized multicenter trial. The Annals of thoracic surgery 2008;86(6):1897-1904. 207. Schierlitz L, Dwyer PL, Rosamilia A et al. Effectiveness of tension-free vaginal tape compared with transobturator tape in women with stress urinary incontinence and intrinsic sphincter deficiency: a randomized controlled trial. Obstetrics and gynecology 2008;112(6):1253-1261. 208. Saariniemi KM, Keranen UH, Salminen-Peltola PK et al. Reduction mammaplasty is effective treatment according to two quality of life

instruments. A prospective randomised clinical trial. Journal of plastic, reconstructive & aesthetic surgery. 2008;61(12):1472-1478. 209. Ronnberg K, Lind B, Zoega B et al. Peridural scar and its relation to clinical outcome: a randomised study on surgically treated lumbar disc herniation patients. European spine journal. 2008;17(12):1714-1720. 210. Rebecchi F, Giaccone C, Farinella E et al. Randomized controlled trial of laparoscopic Heller myotomy plus Dor fundoplication versus Nissen fundoplication for achalasia: long-term results. Annals of Surgery 2008;248(6):1023-1030. 211. Radmehr H, Soleimani A, Tatari H et al. Does Combined Antegrade- Retrograde Cardioplegia Have Any Superiority Over Antegrade Cardioplegia? Heart Lung and Circulation 2008;17(6):475-477. 212. Patel AR, Jones JS, Babineau D. Impact of real-time visualization of cystoscopy findings on procedural pain in female patients. Journal of endourology 2008;22(12):2695-2698. 213. Ozcan C, Vayisoglu Y, Kilic S et al. Comparison of rapid rhino and merocel nasal packs in endonasal septal surgery. J Otolaryngol Head Neck Surg 2008;37(6):826-831. 214. Ouerghi S, Frikha N, Mestiri T et al. A prospective, randomised comparison of continuous paravertebral block and continuous intercostal nerve block for post-thoracotomy pain. Southern African Journal of Anaesthesia and Analgesia 2008;14(6):19-23. 215. Mosterd K, Krekels GA, Nieman FH et al. Surgical excision versus Mohs' micrographic surgery for primary and recurrent basal-cell carcinoma of the face: a prospective randomised controlled trial with 5-years' follow-up. The lancet oncology 2008;9(12):1149-1156. 216. Morak MJ, van der Gaast A, Incrocci L et al. Adjuvant intra-arterial chemotherapy and radiotherapy versus surgery alone in resectable pancreatic and periampullary cancer: a prospective randomized controlled trial. Annals of Surgery 2008;248(6):1031-1041. 217. Moisala A-S, Jarvela T, Paakkala A et al. 
Comparison of the bioabsorbable and metal screw fixation after ACL reconstruction with a hamstring autograft in MRI and clinical outcome: a prospective randomized study. Knee Surg Sports Traumatol Arthrosc 2008;16(12):1080-1086. 218. Mehta Y, Arora D, Sharma KK et al. Comparison of continuous thoracic epidural and paravertebral block for postoperative analgesia after robotic-assisted coronary artery bypass surgery. Annals of cardiac anaesthesia 2008;11(2):91-96. 219. Leyba JL, Llopis SN, Isaac J et al. Laparoscopic gastric bypass for morbid obesity-a randomized controlled trial comparing two gastrojejunal anastomosis techniques. J Soc Laparoendosc Surg 2008;12(4):385-388. 220. Lewis PM, Moore CA, Olsen M et al. Comparison of mid-term clinical outcomes after primary total hip arthroplasty with Oxinium vs cobalt chrome femoral heads. 2008;12:2. 221. Krejsek J, Kunes P, Kolackova M et al. Expression of Toll-like receptors 2 and 4 on innate immunity cells modulated by cardiac surgical operation. Scand J Clin Lab Invest 2008;68(8):749-758.

222. Koksal H, Rahman A, Burma O et al. The effects of low dose N- acetylcysteine (NAC) as an adjunct to cardioplegia in coronary artery bypass surgery. Anadolu Kardiyoloji Dergisi 2008;8(6):437-443. 223. Khan ZH, Hamidi S, Miri M et al. Post-operative pain relief following intrathecal injection of acetylcholine esterase inhibitor during lumbar disc surgery: A prospective double blind randomized study. Journal of Clinical Pharmacy and Therapeutics 2008;33(6):669-675. 224. Kastl KG, Betz CS, Siedek V et al. Control of bleeding following functional endoscopic sinus surgery using carboxy-methylated cellulose packing. Eur Arch Otorhinolaryngol 2008;22(8):1239-1243. 225. Kashkouli MB, Kaghazkanai R, Mirzaie AZ et al. Clinicopathologic comparison of radiofrequency versus scalpel incision for upper blepharoplasty. Ophthal Plast Reconstr Surg 2008;24(6):450-453. 226. Jung H, Norby B, Frimodt-Moller PC et al. Endoluminal Isoproterenol Irrigation Decreases Renal Pelvic Pressure During Flexible Ureterorenoscopy: A Clinical Randomized, Controlled Study. European Urology 2008;54(6):1404-1413. 227. Jin ZX, Zhang SL, Wang XM et al. The myocardial protective effects of a moderate-potassium adenosine-lidocaine cardioplegia in pediatric cardiac surgery. The Journal of thoracic and cardiovascular surgery 2008;136(6):1450-1455. 228. Jarvela T, Moisala AS, Paakkala T et al. Tunnel Enlargement After Double-Bundle Anterior Cruciate Ligament Reconstruction: A Prospective, Randomized Study. Arthroscopy. 2008;24(12):1349-1357. 229. Huvenne W, Zhang N, Tijsma E et al. Pilot study using doxycycline- releasing stents to ameliorate postoperative healing quality after sinus surgery. Wound Repair and Regeneration 2008;16(6):757-767. 230. Greenhalgh RM, Belch JJ, Brown LC et al. 
The adjuvant benefit of angioplasty in patients with mild to moderate intermittent claudication (MIMIC) managed by supervised exercise, smoking cessation advice and best medical therapy: results from two randomised trials for stenotic femoropopliteal and aortoiliac arterial disease. European journal of vascular and endovascular surgery. 2008;36(6):680-688. 231. Grant AM, Wileman SM, Ramsay CR et al. Minimal access surgery compared with medical management for chronic gastro-oesophageal reflux disease: UK collaborative randomised trial. BMJ 2008;337:a2664. 232. Goossens GA, Verbeeck G, Moons P et al. Functional evaluation of conventional 'Celsite' venous ports versus 'Vortex' ports with a tangential outlet: A prospective randomised pilot study. Supportive Care in Cancer 2008;16(12):1367-1374. 233. Girardi PBMA, Hueb W, Nogueira CRSR et al. Comparative costs between myocardial revascularization with or without extracorporeal circulation. Arquivos Brasileiros de Cardiologia 2008;91(6):340-346. 234. Gaudino M, Anselmi A, Glieca F et al. Assessment of the position of retrograde cardioplegia catheter: Comparison of hemodynamic versus manual evaluation in a prospective randomized trial. Journal of Cardiac Surgery 2008;23(6):638-641.

235. Fernandez-Cruz L, Cosa R, Blanco L et al. Pancreatogastrostomy with gastric partition after pylorus-preserving pancreatoduodenectomy versus conventional pancreatojejunostomy: a prospective randomized study. Annals of Surgery 2008;248(6):930-938. 236. Duzgun AP, Satir HZ, Ozozan O et al. Effect of Hyperbaric Oxygen Therapy on Healing of Diabetic Foot Ulcers. Journal of Foot and Ankle Surgery 2008;47(6):515-519. 237. Dumfarth J, Zimpfer D, Vogele-Kadletz M et al. Prophylactic low- energy shock wave therapy improves wound healing after vein harvesting for coronary artery bypass graft surgery: a prospective, randomized trial. The Annals of thoracic surgery 2008;86(6):1909- 1913. 238. Disselhoff BCVM, der Kinderen DJ, Kelder JC et al. Randomized Clinical Trial Comparing Endovenous Laser Ablation of the Great Saphenous Vein with and without Ligation of the Sapheno-femoral Junction: 2-year Results. European Journal of Vascular and Endovascular Surgery 2008;36(6):713-718. 239. Deng L, Li J, Shen YY. Calcium sulfate versus calcium phosphate in treating traumatic fractures. Journal of Clinical Rehabilitative Tissue Engineering Research 2008;12(49):9783-9786. 240. Denaro V, Di Martino A, Longo UG et al. Effectiveness of a mucolythic agent as a local adjuvant in revision lumbar spine surgery. European spine journal. 2008;17(12):1752-1756. 241. Deenik A, van Mameren H, de Visser E et al. Equivalent correction in scarf and chevron osteotomy in moderate and severe hallux valgus: a randomized controlled trial. Foot Ankle Int 2008;29(12):1209-1215. 242. Chung HJ, Chung KJ, Yoon HS et al. Comparative study of balloon kyphoplasty with unilateral versus bilateral approach in osteoporotic vertebral compression fractures. International Orthopaedics 2008;32(6):817-820. 243. Chaudhary R, Beaupre LA, Johnston DW. Knee range of motion during the first two years after use of posterior cruciate-stabilizing or posterior cruciate-retaining total knee prostheses. 
A randomized clinical trial. The Journal of bone and joint surgery American volume. 2008;90(12):2579-2586. 244. Calori GM, Tagliabue L, Gala L et al. Application of rhBMP-7 and platelet-rich plasma in the treatment of long bone non-unions. A prospective randomised clinical study on 120 patients. Injury 2008;39(12):1391-1402. 245. Calik A, Yucel Y, Topaloglu S et al. Umbilical trocar site closure with Berci's needle after laparoscopic cholecystectomy. Hepato-Gastroenterology 2008;55(88):1958-1961. 246. Bush RG, Shamma HN, Hammond K. Histological changes occurring after endoluminal ablation with two diode lasers (940 and 1319 nm) from acute changes to 4 months. Lasers in Surgery and Medicine 2008;40(10):676-679. 247. Birbicer H, Doruk N, Yapici D et al. Percutaneous tracheostomy: a comparison of PercuTwist and multi-dilatators techniques. Annals of cardiac anaesthesia 2008;11(2):131.

248. Bhandari M, Guyatt G, Walter SD et al. Randomized trial of reamed and unreamed intramedullary nailing of tibial shaft fractures. Journal of Bone and Joint Surgery - Series A 2008;90(12):2567-2578. 249. Basiri A, Simforoosh N, Ziaee A et al. Retrograde, antegrade, and laparoscopic approaches for the management of large, proximal ureteral stones: A randomized clinical trial. Journal of Endourology 2008;22(12):2677-2680. 250. Barbaros U, Erbil Y, Aksakal N et al. Electrocautery for cutaneous flap creation during thyroidectomy: a randomised, controlled study. The Journal of laryngology and otology 2008;122(12):1343-1348. 251. Bakhtiary F, Moritz A, Kleine P et al. Leukocyte depletion during cardiac surgery with extracorporeal circulation in high risk patients. Inflammation Research 2008;57(12):577-585. 252. van Boven W-JP, Gerritsen WB, Driessen AH et al. Myocardial oxidative stress, and cell injury comparing three different techniques for coronary artery bypass grafting. European Journal of Cardio- Thoracic Surgery 2008;34(5):969-975. 253. Torzilli G, Donadon M, Marconi M et al. Monopolar floating ball versus bipolar forceps for hepatic resection: a prospective randomized clinical trial. Journal of gastrointestinal surgery. 2008;12(11):1961-1966. 254. Taradaj J, Franek A, Cierpka L et al. Failure of low-level laser therapy to boost healing of venous leg ulcers in surgically and conservatively treated patients. Phlebologie 2008;37(5):241-246. 255. Szeimies RM, Ibbotson S, Murrell DF et al. A clinical study comparing methyl aminolevulinate photodynamic therapy and surgery in small superficial basal cell carcinoma (8-20 mm), with a 12-month follow-up. Journal of the European Academy of Dermatology and Venereology 2008;22(11):1302-1311. 256. Stoffel EM, Turgeon DK, Stockwell DH et al. Missed adenomas during colonoscopic surveillance in individuals with Lynch Syndrome (hereditary nonpolyposis colorectal cancer). Cancer Prev Res. 2008;1(6):470-475. 257. 
Stepic N, Novakovic M, Martic V et al. Effects of perineural steroid injections on median nerve conduction during the carpal tunnel release. Vojnosanitetski pregled. 2008;65(11):825-829. 258. Solakovic E, Totic D, Solakovic S. Femoro-popliteal bypass above knee with saphenous vein vs synthetic graft. Bosnian journal of basic medical sciences. 2008;8(4):367-372. 259. Sistla SC, Sibal AK, Ravishankar M. Intermittent wound perfusion for postoperative pain relief following upper abdominal surgery: A surgeon's perspective. Pain Practice 2008;9(1):65-70. 260. Singh I, Saran RN, Jain M. Does sealing of the tract with absorbable gelatin (Spongostan) facilitate tubeless PCNL? A prospective study. Journal of Endourology 2008;22(11):2485-2493. 261. Silecchia G, Boru CE, Mouiel J et al. The use of fibrin sealant to prevent major complications following laparoscopic gastric bypass: results of a multicenter, randomized trial. Surgical endoscopy 2008;22(11):2492-2497.

262. Shen JW, Tong PJ, Qu HB. A three-dimensional reconstruction plate for displaced midshaft fractures of the clavicle. The Journal of bone and joint surgery British volume. 2008;90(11):1495-1498. 263. Sezen OS, Kaytanci H, Kubilay U et al. Comparison between tonsillectomy with thermal welding and the conventional 'cold' tonsillectomy technique. ANZ journal of surgery 2008;78(11):1014- 1018. 264. Pohl J, Nguyen-Tat M, Manner H et al. "Dry biopsies" with spraying of dilute epinephrine optimize biopsy mapping of long segment Barrett's esophagus. Endoscopy 2008;40(11):883-887. 265. Pisello F, Geraci G, Li Volsi F et al. Permanent stenting in "unextractable" common bile duct stones in high risk patients. A prospective randomized study comparing two different stents. Langenbeck's Archives of Surgery 2008;393(6):857-863. 266. Peyser A, Weil YA, Brocke L et al. A prospective randomised study comparing the percutaneous compression plate and the compression hip screw for the treatment of intertrochanteric fractures of the hip Journal of Bone and Joint Surgery - Series B. 2008;90(11):1533. 267. Pegg TJ, Selvanayagam JB, Francis JM et al. A randomized trial of on-pump beating heart and conventional cardioplegic arrest in coronary artery bypass surgery patients with impaired left ventricular function using cardiac magnetic resonance imaging and biochemical markers. Circulation 2008;118(21):2130-2138. 268. Pearce C, Torres C, Stallings S et al. Elective appendectomy at the time of cesarean delivery: a randomized controlled trial. American journal of obstetrics and gynecology 2008;199(5):491.e491-495. 269. Miller KA, Pump A. Mechanical versus suture fixation of the port in adjustable gastric banding procedures: a prospective randomized blinded study. Surgical endoscopy 2008;22(11):2478-2484. 270. Lezoche E, Guerrieri M, Crosta F et al. Flank approach versus anterior sub-mesocolic access in left laparoscopic adrenalectomy: a prospective randomized study. 
Surgical endoscopy 2008;22(11):2373-2378. 271. Lehur PA, Stuto A, Fantoli M et al. Outcomes of stapled transanal rectal resection vs. biofeedback for the treatment of outlet obstruction associated with rectal intussusception and rectocele: a multicenter, randomized, controlled trial. Diseases of the colon and rectum 2008;51(11):1611-1618. 272. Kovacs AF, Sauer SN, Stefenelli U et al. Growth of the orbit after frontoorbital advancement using nonrigid suture vs rigid plate fixation technique. Journal of pediatric surgery 2008;43(11):2075-2081. 273. Klein AA, Nashef SA, Sharples L et al. A randomized controlled trial of cell salvage in routine cardiac surgery. Anesthesia and analgesia 2008;107(5):1487-1495. 274. Kim YW, Baik YH, Yun YH et al. Improved quality of life outcomes after laparoscopy-assisted distal gastrectomy for early gastric cancer: results of a prospective randomized clinical trial. Annals of Surgery 2008;248(5):721-727.

275. Husmann M, Dorffler-Melly J, Kalka C et al. Successful lower extremity angioplasty improves brachial artery flow-mediated dilation in patients with peripheral arterial disease. Journal of vascular surgery. 2008;48(5):1211-1216. 276. Hewett PJ, Allardyce RA, Bagshaw PF et al. Short-term outcomes of the Australasian randomized clinical study comparing laparoscopic and conventional open surgical treatments for colon cancer: the ALCCaS trial. Annals of Surgery 2008;248(5):728-738. 277. Guilleminault C, Davis K, Huynh NT. Prospective randomized study of patients with insomnia and mild sleep disordered breathing. Sleep 2008;31(11):1527-1533. 278. Gu YJ, Vermeijden WJ, de Vries AJ et al. Influence of Mechanical Cell Salvage on Red Blood Cell Aggregation, Deformability, and 2,3- Diphosphoglycerate in Patients Undergoing Cardiac Surgery With Cardiopulmonary Bypass. Annals of Thoracic Surgery 2008;86(5):1570-1575. 279. Falk V, Seeburger J, Czesla M et al. How does the use of polytetrafluoroethylene neochordae for posterior mitral valve prolapse (loop technique) compare with leaflet resection? A prospective randomized trial. The Journal of thoracic and cardiovascular surgery 2008;136(5):1205. 280. Falahatkar S, Moghaddam AA, Salehi M et al. Complete supine percutaneous nephrolithotripsy comparison with the prone standard technique. Journal of endourology. 2008;22(11):2513-2517. 281. Emmiler M, Kocogullari CU, Ela Y et al. Influence of intracoronary shunt on myocardial damage: a prospective randomized study. European Journal of Cardio-thoracic Surgery 2008;34(5):1000-1004. 282. Danielsen P, Jorgensen B, Karlsmark T et al. Effect of topical autologous platelet-rich fibrin versus no intervention on epithelialization of donor sites and meshed split-thickness skin autografts: a randomized clinical trial. Plastic and reconstructive surgery 2008;122(5):1431-1440. 283. Czibik G, Wu Z, Berne GP et al. 
Human adaptation to ischemia by preconditioning or unstable angina: involvement of nuclear factor kappa B, but not hypoxia-inducible factor 1 alpha in the heart. European Journal of Cardio-Thoracic Surgery 2008;34(5):976-984. 284. Coron E, Sebille V, Cadiot G et al. Clinical trial: Radiofrequency energy delivery in proton pump inhibitor-dependent gastro-oesophageal reflux disease patients. Alimentary pharmacology & therapeutics 2008;28(9):1147-1158. 285. Cornel EB, Oosterwijk E, Kiemeney LA. The effect on pain experienced by male patients of watching their office-based flexible cystoscopy. BJU international 2008;102(10):1445-1446. 286. Cheung J, Bailey R, Veldhuyzen van Zanten S et al. Early experience with unsedated ultrathin 4.9 mm transnasal gastroscopy: a pilot study. Canadian journal of gastroenterology. 2008;22(11):917-922. 287. Bayar A, Tuncay I, Atasoy N et al. The effect of watching live arthroscopic views on postoperative anxiety of patients. Knee Surg Sports Traumatol Arthrosc 2008;16(11):982-987.

288. Baldini A, Adravanti P. Less invasive TKA: extramedullary femoral reference without navigation. Clinical orthopaedics and related research 2008;466(11):2694-2700. 289. Awan MS, Iqbal M. Nasal packing after septoplasty: a randomized comparison of packing versus no packing in 88 patients. Ear Nose Throat J 2008;87(11):624-627. 290. Assadian A, Wickenhauser G, Hubl W et al. Traditional versus endoscopic saphenous vein stripping: a prospective randomized pilot trial. European journal of vascular and endovascular surgery. 2008;36(5):611-615. 291. Adnan MT, Abdel-Fattah MM, Makhdoom NK et al. Does the use of radiofrequency ultrasonic dissector in tonsillectomy have a beneficial effect over the use of laser? Saudi Medical Journal 2008;29(12):1775- 1778. 292. Wong JCH, Yau KK, Cheung HYS et al. Towards painless colonoscopy: A randomized controlled trial on carbon dioxide- insufflating colonoscopy. ANZ Journal of Surgery 2008;78(10):871- 874. 293. Winterborn RJ, Foy C, Heather BP et al. Randomised trial of flush saphenofemoral ligation for primary great saphenous varicose veins. European journal of vascular and endovascular surgery. 2008;36(4):477-484. 294. Ulrich AB, Seiler CM, Z'Graggen K et al. Early results from a randomized clinical trial of colon J pouch versus transverse coloplasty pouch after low anterior resection for rectal cancer. British journal of surgery. 2008;95(10):1257-1263. 295. Suh KT, Park WW, Kim SJ et al. Posterior lumbar interbody fusion for adult isthmic spondylolisthesis: A comparison of fusion with one or two cages. Journal of Bone and Joint Surgery - Series B 2008;90(10):1352-1356. 296. Steward DL, Huntley TC, Woodson BT et al. Palate implants for obstructive sleep apnea: Multi-institution, randomized, placebo- controlled study. Otolaryngology - Head and Neck Surgery 2008;139(4):506-510. 297. Spradlin NM, Wise PE, Herline AJ et al. 
A randomized prospective trial of endoscopic ultrasound to guide combination medical and surgical treatment for Crohn's perianal fistulas. The American journal of gastroenterology 2008;103(10):2527-2535. 298. Soleimanpour J, Feizi HH, Mohseni MA et al. Comparison between ender and unreamed interlocking nails in tibial shaft fractures. Saudi medical journal 2008;29(10):1458-1462. 299. Skolarikos A, Papachristou C, Athanasiadis G et al. Eighteen-month results of a randomized prospective study comparing transurethral photoselective vaporization with transvesical open enucleation for prostatic adenomas greater than 80 cc. Journal of endourology. 2008;22(10):2333-2340. 300. Shao Y, Zhuo J, Sun X-W et al. Nonstented versus routine stented ureteroscopic holmium laser lithotripsy: a prospective randomized trial. Urol Res 2008;36(5):259-263.

301. Salami A, Bavazzano M, Mora R et al. Harmonic scalpel in pharyngolaryngectomy with radical neck dissection. Journal of Otolaryngology - Head and Neck Surgery 2008;37(5):633-637.
302. Russell TA, Leighton RK. Comparison of autogenous bone graft and endothermic calcium phosphate cement for defect augmentation in tibial plateau fractures. A multicenter, prospective, randomized study. Journal of Bone and Joint Surgery - Series A 2008;90(10):2057-2061.
303. Rocchi L, Canal A, Fanfani F et al. Articular ganglia of the volar aspect of the wrist: arthroscopic resection compared with open excision. A prospective randomised study. Scandinavian journal of plastic and reconstructive surgery and hand surgery. 2008;42(5):253-259.
304. Radwan YA, ElSobhi G, Badawy WS et al. Resistant tennis elbow: shock-wave therapy versus percutaneous tenotomy. International orthopaedics 2008;32(5):671-677.
305. Pring CM, Tran V, O'Rourke N et al. Laparoscopic versus open ventral hernia repair: a randomized controlled trial. ANZ journal of surgery 2008;78(10):903-906.
306. Pellise M, Fernandez-Esparrach G, Cardenas A et al. Impact of wide-angle, high-definition endoscopy in the diagnosis of colorectal neoplasia: a randomized controlled trial. Gastroenterology 2008;135(4):1062-1068.
307. Pastor AC, Phillips JD, Fenton SJ et al. Routine use of a SILASTIC spring-loaded silo for infants with gastroschisis: a multicenter randomized controlled trial. Journal of pediatric surgery 2008;43(10):1807-1812.
308. Pace A, Yousef A. The effect of patient position on blood loss in primary cemented total hip arthroplasty. Archives of orthopaedic and trauma surgery 2008;128(10):1209-1212.
309. Ozkara A, Hatemi A, Mert M et al. The effects of internal thoracic artery preparation with intact pleura on respiratory function and patients' early outcomes. The Anatolian journal of cardiology 2008;8(5):368-373.
310. Nogueira CRSR, Hueb W, Takiuti ME et al. Quality of life after on-pump and off-pump coronary artery bypass grafting surgery. Arquivos Brasileiros de Cardiologia 2008;91(4):238-244.
311. Narang S, Satsangi DK, Banerjee A et al. Stentless valves versus stented bioprostheses at the aortic position: midterm results. The Journal of thoracic and cardiovascular surgery 2008;136(4):943-947.
312. Mosler P, Aziz AMA, Hieston K et al. Evaluation of supplemental cautery during endoluminal gastroplication for the treatment of gastroesophageal reflux disease. Surgical Endoscopy 2008;22(10):2158-2163.
313. Moser C, Opitz I, Zhai W et al. Autologous fibrin sealant reduces the incidence of prolonged air leak and duration of chest tube drainage after lung volume reduction surgery: a prospective randomized blinded study. The Journal of thoracic and cardiovascular surgery 2008;136(4):843-849.
314. Mickevicius A, Endzinas Z, Kiudelis M et al. Influence of wrap length on the effectiveness of Nissen and Toupet fundoplication: a prospective randomized study. Surgical endoscopy 2008;22(10):2269-2276.
315. Meknas K, Odden-Miland A, Mercer JB et al. Radiofrequency microtenotomy: a promising method for treatment of recalcitrant lateral epicondylitis. The American journal of sports medicine 2008;36(10):1960-1965.
316. Markovic DM, Davidovic LB, Cvetkovic DD et al. Single-center prospective, randomized analysis of conventional and eversion carotid endarterectomy. The Journal of cardiovascular surgery 2008;49(5):619-625.
317. Luring C, Beckmann J, Haibock P et al. Minimal invasive and computer assisted total knee replacement compared with the conventional technique: a prospective, randomised trial. Knee Surg Sports Traumatol Arthrosc 2008;16(10):928-934.
318. Lindfors NC, Heikkila JT, Aho AJ. Long-term evaluation of blood silicon and ostecalcin in operatively treated patients with benign bone tumors using bioactive glass and autogenous bone. Journal of biomedical materials research. 2008;87(1):73-76.
319. Koch A, Bringman S, Myrelid P et al. Randomized clinical trial of groin hernia repair with titanium-coated lightweight mesh compared with standard polypropylene mesh. British journal of surgery 2008;95(10):1226-1231.
320. Kim YH, Kim JS, Yoon SH. A recession of posterior cruciate ligament in posterior cruciate-retaining total knee arthrosplasty. The Journal of arthroplasty 2008;23(7):999-1004.
321. Katsinelos P, Kountouras J, Paroutoglou G et al. A comparative study of 50% dextrose and normal saline solution on their ability to create submucosal fluid cushions for endoscopic resection of sessile rectosigmoid polyps. Gastrointestinal endoscopy 2008;68(4):692-698.
322. Kaltenbach T, Friedland S, Soetikno R. A randomised tandem colonoscopy trial of narrow band imaging versus white light examination to compare neoplasia miss rates. Gut 2008;57(10):1406-1412.
323. Jakubietz RG, Gruenert JG, Kloss DF et al. A randomised clinical study comparing palmar and dorsal fixed-angle plates for the internal fixation of AO C-type fractures of the distal radius in the elderly. The Journal of hand surgery, European volume 2008;33(5):600-604.
324. Itoh S, Ohta T, Sekino Y et al. Treatment of distal radius fractures with a wrist-bridging external fixation: the value of alternating electric current stimulation. The Journal of hand surgery, European volume 2008;33(5):605-608.
325. Hu K-H, Lin K-N, Li W-T et al. Effects of Meropack in the middle meatus after functional endoscopic sinus surgery in children with chronic sinusitis. International Journal of Pediatric Otorhinolaryngology 2008;72(10):1535-1540.
326. Hernandez-Castanos DM, Ponce VV, Gil F. Release of ischaemia prior to wound closure in total knee arthroplasty: a better method? International orthopaedics 2008;32(5):635-638.


327. Hammond TM, Huang A, Prosser K et al. Parastomal hernia prevention using a novel collagen implant: a randomised controlled phase 1 study. Hernia. 2008;12(5):475-481.
328. Gopal SC, Gangopadhyay AN, Mohan TV et al. Use of fibrin glue in preventing urethrocutaneous fistula after hypospadias repair. Journal of pediatric surgery 2008;43(10):1869-1872.
329. Forauer AR, Hoffer EK, Homa K. Dialysis access venous stenoses: Treatment with balloon angioplasty-1- Versus 3-minute inflation times. Radiology 2008;249(1):375-381.
330. Evonich RF, Stephens JC, Merhi W et al. The role of temporary biventricular pacing in the cardiac surgical patient with severely reduced left ventricular systolic function. The Journal of thoracic and cardiovascular surgery 2008;136(4):915-921.
331. Eljamel MS, Goodman C, Moseley H. ALA and Photofrin fluorescence-guided resection and repetitive PDT in glioblastoma multiforme: a single centre Phase III randomised controlled trial. Lasers Med Sci 2008;23(4):361-367.
332. Disselhoff BCVM, Der Rinderen DJ, Kelder JC et al. Randomized clinical trial comparing endovenous laser with cryostripping for great saphenous varicose veins. British Journal of Surgery 2008;95(10):1232-1238.
333. Corry J, Poon W, McPhee N et al. Randomized study of percutaneous endoscopic gastrostomy versus nasogastric tubes for enteral feeding in head and neck cancer patients treated with (chemo)radiation. Journal of Medical Imaging and Radiation Oncology 2008;52(5):503-510.
334. Casalino S, Tesler UF, Novelli E et al. The efficacy and safety of extending the ischemic time with a modified cardioplegic technique for coronary artery surgery. Journal of Cardiac Surgery 2008;23(5):444-449.
335. Capello WN, D'Antonio JA, Feinberg JR et al. Ceramic-on-ceramic total hip arthroplasty: update. The Journal of arthroplasty 2008;23(7):39-43.
336. Braga-Silva J, Peruchi FM, Moschen GM et al. A comparison of the use of distal radius vascularised bone graft and non-vascularised iliac crest bone graft in the treatment of non-union of scaphoid fractures. The Journal of hand surgery, European volume 2008;33(5):636-640.
337. Berlucchi M, Castelnuovo P, Vincenzi A et al. Endoscopic outcomes of resorbable nasal packing after functional endoscopic sinus surgery: a multicenter prospective randomized controlled study. Eur Arch Otorhinolaryngol 2008;
338. Bednar F, Osmancik P, Vanek T et al. Platelet activity and aspirin efficacy after off-pump compared with on-pump coronary artery bypass surgery: results from the prospective randomized trial PRAGUE 11-Coronary Artery Bypass and REactivity of Thrombocytes (CABARET). The Journal of thoracic and cardiovascular surgery 2008;136(4):1054-1060.


339. Barth M, Tuettenberg J, Thome C et al. Watertight dural closure: is it necessary? A prospective randomized trial in patients with supratentorial craniotomies. Neurosurgery 2008;63(4):352-358.
340. Au WK, Chiu SW, Sun MP et al. Improved leg wound healing with endoscopic saphenous vein harvest in coronary artery bypass graft surgery: A prospective randomized study in Asian population. Journal of Cardiac Surgery 2008;23(6):633-637.
341. Ang K-L, Chin D, Leyva F et al. Randomized, controlled trial of intramuscular or intracoronary injection of autologous bone marrow cells into scarred myocardium during CABG versus CABG alone. Nat Clin Pract Cardiovasc Med 2008;5(10):663-670.
342. Allweis TM, Kaufman Z, Lelcuk S et al. A prospective, randomized, controlled, multicenter study of a real-time, intraoperative probe for positive margin detection in breast-conserving surgery. American journal of surgery 2008;196(4):483-489.
343. Abela R, Liamis A, Prionidis I et al. Reverse foam sclerotherapy of the great saphenous vein with sapheno-femoral ligation compared to standard and invagination stripping: a prospective clinical series. European journal of vascular and endovascular surgery. 2008;36(4):485-490.
344. Wylde V, Learmonth I, Potter A et al. Patient-reported outcomes after fixed- versus mobile-bearing total knee replacement: a multi-centre randomised controlled trial using the Kinemax total knee replacement. The Journal of bone and joint surgery British volume. 2008;90(9):1172-1179.
345. Werk M, Langner S, Reinkensmeier B et al. Inhibition of restenosis in femoropopliteal arteries: paclitaxel-coated versus uncoated balloon: femoral paclitaxel randomized pilot trial. Circulation 2008;118(13):1358-1365.
346. Tuzuner S, Inceoglu S, Bilen FE. Median nerve excursion in response to wrist movement after endoscopic and open carpal tunnel release. The Journal of hand surgery 2008;33(7):1063-1068.
347. Slepavicius A, Beisa V, Janusonis V et al. Focused versus conventional parathyroidectomy for primary hyperparathyroidism: A prospective, randomized, blinded trial. Langenbeck's Archives of Surgery 2008;393(5):659-666.
348. Sartori PV, De Fina S, Colombo G et al. Ligasure versus Ultracision in thyroid surgery: A prospective randomized study. Langenbeck's Archives of Surgery 2008;393(5):655-658.
349. Rasmussen S, Krum-Moller DS, Lauridsen LR et al. Epidural steroid following discectomy for herniated lumbar disc reduces neurological impairment and enhances recovery: a randomized study with two-year follow-up. Spine 2008;33(19):2028-2033.
350. Pryor SG, Sykes J, Tollefson TT. Efficacy of fibrin sealant (human) (Evicel) in rhinoplasty: a prospective, randomized, single-blind trial of the use of fibrin sealant in lateral osteotomy. Archives of facial plastic surgery. 2008;10(5):339-344.


351. Peng B, Zheng J-H, Li H. Effect of retroperitoneal laparoscopic radical nephrectomy of renal carcinoma (nephroma) on perioperative cell immunity. Journal of Endourology 2008;22(9):2161-2164.
352. Ost D, Shah R, Anasco E et al. A randomized trial of CT fluoroscopic-guided bronchoscopy vs conventional bronchoscopy in patients with suspected lung cancer. Chest 2008;134(3):507-513.
353. Noguchi M, Kakuma T, Suekane S et al. A randomized clinical trial of suspension technique for improving early recovery of urinary continence after radical retropubic prostatectomy. BJU international 2008;102(8):958-963.
354. Ng SSM, Leung KL, Lee JFY et al. Laparoscopic-assisted versus open abdominoperineal resection for low rectal cancer: A prospective randomized trial. Annals of Surgical Oncology 2008;15(9):2418-2425.
355. Misra MC, Kumar S, Bansal VK. Total extraperitoneal (TEP) mesh repair of inguinal hernia in the developing world: comparison of low-cost indigenous balloon dissection versus direct telescopic dissection: a prospective randomized controlled study. Surgical endoscopy 2008;22(9):1947-1958.
356. Minervini A, Davenport K, Pefanis G et al. Prospective study comparing the bladeless optical access trocar versus hasson open trocar for the establishment of pneumoperitoneum in laparoscopic renal procedures. Archivio Italiano di Urologia e Andrologia 2008;80(3):95-98.
357. Mik M, Rzetecki T, Sygut A et al. Open and closed haemorrhoidectomy for fourth degree haemorrhoids--comparative one center study. Acta chirurgica Iugoslavica 2008;55(3):119-125.
358. Metz R, Verleisdonk E-JMM, van der Heijden GJMG et al. Acute Achilles tendon rupture: minimally invasive surgery versus nonoperative treatment with immediate full weightbearing-a randomized controlled trial. Am J Sports Med 2008;36(9):1688-1694.
359. Menon M, Muhletaler F, Campos M et al. Assessment of early continence after reconstruction of the periprostatic tissues in patients undergoing computer assisted (robotic) prostatectomy: results of a 2 group parallel randomized controlled trial. The Journal of urology 2008;180(3):1018-1023.
360. Meneghini RM, Smits SA, Swinford RR et al. A randomized, prospective study of 3 minimally invasive surgical approaches in total hip arthroplasty: comprehensive gait analysis. The Journal of arthroplasty 2008;23(6):68-73.
361. Mahadeva S, Chia YC, Vinothini A et al. Cost-effectiveness of and satisfaction with a Helicobacter pylori "test and treat" strategy compared with prompt endoscopy in young Asians with dyspepsia. Gut 2008;57(9):1214-1220.
362. Macaulay W, Nellans KW, Garvin KL et al. Prospective randomized clinical trial comparing hemiarthroplasty to total hip arthroplasty in the treatment of displaced femoral neck fractures: winner of the Dorr Award. The Journal of arthroplasty 2008;23(6 Suppl 1):2-8.
363. Lundell L, Attwood S, Ell C et al. Comparing laparoscopic antireflux surgery with esomeprazole in the management of patients with chronic gastro-oesophageal reflux disease: a 3-year interim analysis of the LOTUS trial. Gut 2008;57(9):1207-1213.
364. Lombardi CP, Raffaelli M, Cicchetti A et al. The use of "harmonic scalpel" versus "knot tying" for conventional "open" thyroidectomy: Results of a prospective randomized study. Langenbeck's Archives of Surgery 2008;393(5):627-631.
365. Licameli G, Johnston P, Luz J et al. Phosphorylcholine-coated antibiotic tympanostomy tubes: are post-tube placement complications reduced? International journal of pediatric otorhinolaryngology 2008;72(9):1323-1328.
366. Kunisaki C, Makino H, Takagawa R et al. Prospective randomized controlled trial comparing the use of 3.5-mm and 4.8-mm staples in gastric surgery. Hepato-gastroenterology 2008;55(86-87):1943-1947.
367. Ko S-H, Lee C-C, Friedman D et al. Arthroscopic single-row supraspinatus tendon repair with a modified mattress locking stitch: a prospective, randomized controlled comparison with a simple stitch. Arthroscopy 2008;24(9):1005-1012.
368. Ko MT, Chuang KC, Su CY. Multiple analyses of factors related to intraoperative blood loss and the role of reverse Trendelenburg position in endoscopic sinus surgery. The Laryngoscope 2008;118(9):1687-1691.
369. Kirkley A, Birmingham TB, Litchfield RB et al. A randomized trial of arthroscopic surgery for osteoarthritis of the knee. The New England journal of medicine. 2008;359(11):1097-1107.
370. Kargi E, Babuccu O, Altunkaya H et al. Tramadol as a local anaesthetic in tendon repair surgery of the hand. The Journal of international medical research 2008;36(5):971-978.
371. Ibrahim HM, Al-Kandari AM, Shaaban HS et al. Role of ureteral stenting after uncomplicated ureteroscopy for distal ureteral stones: a randomized, controlled trial. The Journal of urology 2008;180(3):961-965.
372. Hubner M, Demartines N, Muller S et al. Prospective randomized study of monopolar scissors, bipolar vessel sealer and ultrasonic shears in laparoscopic colorectal surgery. British journal of surgery 2008;95(9):1098-1104.
373. Horiuchi A, Nakayama Y, Tanaka N et al. Prospective randomized trial comparing the direct method using a 24 Fr bumper-button-type device with the pull method for percutaneous endoscopic gastrostomy. Endoscopy 2008;40(9):722-726.
374. Hallgrimsson P, Loven L, Westerdahl J et al. Use of the harmonic scalpel versus conventional haemostatic techniques in patients with Grave disease undergoing total thyroidectomy: A prospective randomised controlled trial. Langenbeck's Archives of Surgery 2008;393(5):675-680.
375. Gurbet A, Bekar A, Bilgin H et al. Pre-emptive infiltration of levobupivacaine is superior to at-closure administration in lumbar laminectomy patients. European spine journal. 2008;17(9):1237-1241.
376. Glineur D, Hanet C, Poncelet A et al. Comparison of bilateral internal thoracic artery revascularization using in situ or Y graft configurations: a prospective randomized clinical, functional, and angiographic midterm evaluation. Circulation 2008;118(14):S216-221.
377. Ezer A, Nursal TZ, Colakoglu T et al. The impact of gallbladder aspiration during elective laparoscopic cholecystectomy: a prospective randomized study. American journal of surgery 2008;196(3):456-459.
378. Ezeome ER, Adebamowo CA. Closed suction drainage versus closed simple drainage in the management of modified radical mastectomy wounds. South African medical journal. 2008;98(9):712-715.
379. Egol K, Walsh M, Tejwani N et al. Bridging external fixation and supplementary Kirschner-wire fixation versus volar locked plating for unstable fractures of the distal radius: a randomised, prospective trial. The Journal of bone and joint surgery, British volume. 2008;90(9):1214-1221.
380. Costantini E, Lazzeri M, Bini V et al. Burch colposuspension does not provide any additional benefit to pelvic organ prolapse repair in patients with urinary incontinence: a randomized surgical trial. The Journal of urology 2008;180(3):1007-1012.
381. Cooper DJ, Rosenfeld JV, Murray L et al. Early decompressive craniectomy for patients with severe traumatic brain injury and refractory intracranial hypertension-A pilot randomized trial. Journal of Critical Care 2008;23(3):387-393.
382. Chude GG, Rayate NV, Patris V et al. Defunctioning loop ileostomy with low anterior resection for distal rectal cancer: should we make an ileostomy as a routine procedure? A prospective randomized study. Hepato-gastroenterology 2008;55(86-87):1562-1567.
383. Chotanaphuti T, Ongnamthip P, Teeraleekul K et al. Comparative study between computer assisted-navigation and conventional technique in minimally invasive surgery total knee arthroplasty, prospective control study. Journal of the Medical Association of Thailand. 2008;91(9):1382-1388.
384. Chimona T, Proimos E, Mamoulakis C et al. Multiparametric comparison of cold knife tonsillectomy, radiofrequency excision and thermal welding tonsillectomy in children. International journal of pediatric otorhinolaryngology 2008;72(9):1431-1436.
385. Celik SE, Altan T, Celik S et al. Mitomycin protection of peridural fibrosis in lumbar disc surgery. Journal of neurosurgery. Spine 2008;9(3):243-248.
386. Barczynski M, Konturek A, Cichon S. Minimally invasive video-assisted thyreoidectomy (MIVAT) with and without use of harmonic scalpel - A randomized study. Langenbeck's Archives of Surgery 2008;393(5):647-654.
387. Bansal P, Gupta A, Mongha R et al. Laparoscopic versus open pyeloplasty: Comparison of two surgical approaches- a single centre experience of three years. Journal of Minimal Access Surgery 2008;4(3):76-79.
388. Aghamir SMK, Mohammadi A, Mosavibahar SH et al. Totally tubeless percutaneous nephrolithotomy in renal anomalies. Journal of Endourology 2008;22(9):2131-2134.


389. Navali AM, Rouhani. Zone 2 flexor tendon repair in young children: A comparative study of four-strand versus two-strand repair. Journal of Hand Surgery: European Volume 2008;33(4):424-429.
390. Moreno M, Wiltgen JE, Bodanese B et al. Radioguided breast surgery for occult lesion localization - Correlation between two methods. Journal of Experimental and Clinical Cancer Research 2008;27(1)(29):
391. Moonen A, Thomassen BJW, Knoors NT et al. Pre-operative injections of epoetin-α versus post-operative retransfusion of autologous shed blood in total hip and knee replacement: a prospective randomised clinical trial. Journal of Bone and Joint Surgery - British Volume 2008;90(8):1079-1083.
392. Blond L, Vendel Jensen N, Soe Nielsen NH. Clinical consequences of different exsanguination methods in hand surgery. A double-blind randomised study. Journal of Hand Surgery: European Volume 2008;33(4):475-477.
393. Wang YF, Wu JS, Mao Y et al. The optimal time-window for surgical treatment of spontaneous intracerebral hemorrhage: result of prospective randomized controlled trial of 500 cases. Acta Neurochir Suppl 2008;105:141-145.
394. Ucak A, Inan BK, Guler A et al. Comparison of supragenuar and infragenuar incision in preparation of saphenous vein graft during coronary bypass surgery. Anatolian Journal of Clinical Investigation 2008;2(2):70-73.
395. Rai RS, Patrulu KSK, Rai R et al. Lithoclast* master in intracorporeal lithotripsy during percutaneous nephrolithotomy: Our experience. Medical Journal Armed Forces India 2008;64(3):232-233.
396. Navali AM, Rouhani AR, Mortazavi SMJ. A comparative study of two suture configurations in zone II flexor tendon repair in adults. Acta Medica Iranica 2008;46(3):207-212.
397. Morgan T, Zuccarello M, Narayan R et al. Preliminary findings of the minimally-invasive surgery plus rtPA for intracerebral hemorrhage evacuation (MISTIE) clinical trial. Acta Neurochir Suppl 2008;105:147-151.
398. Gyawali KR, Pokharel M, Amatya RCM. Short duration anterior nasal packing after submucosal resection of nasal septum. Kathmandu University Medical Journal 2008;6(22):173-175.
399. Crook TJ, Lockyer CR, Keoghane SR et al. A Randomized Controlled Trial of Nephrostomy Placement Versus Tubeless Percutaneous Nephrolithotomy. The Journal of urology 2008;180(2):612-614.
400. Pokorny H, Klingler A, Schmid T et al. Recurrence and complications after laparoscopic versus open inguinal hernia repair: results of a prospective randomized multicenter trial. Hernia 2008;12(4):385-389.


Appendix 8. Operational definitions of data items collected for Chapter 2, including the CONSORT 2001 checklist, items related to external validity, and general study characteristics

CONSORT (2001) checklist

1. Title and abstract: How participants were allocated to interventions (e.g., "random allocation", "randomised", or "randomly assigned").
2. Background: Scientific background and explanation of rationale.
3. Participants: Eligibility criteria for participants and the settings and locations where the data were collected.
4. Interventions: Precise details of the interventions intended for each group and how and when they were actually administered.
5. Objectives: Specific objectives and hypotheses.
6. Outcomes*: Clearly defined primary and secondary outcome measures and, when applicable, any methods used to enhance the quality of measurements (e.g., multiple observations, training of assessors).
7. Sample size*: How sample size was determined and, when applicable, explanation of any interim analyses and stopping rules.
8. Sequence generation*: Method used to generate the random allocation sequence, including details of any restriction (e.g., blocking, stratification).
9. Allocation concealment*: Method used to implement the random allocation sequence (e.g., numbered containers or central telephone), clarifying whether the sequence was concealed until interventions were assigned.
10. Implementation: Who generated the allocation sequence, who enrolled participants, and who assigned participants to their groups.
11. Blinding*: Whether or not participants, those administering the interventions, and those assessing the outcomes were blinded to group assignment. If done, how the success of blinding was evaluated.
12. Statistical methods: Statistical methods used to compare groups for primary outcome(s); methods for additional analyses, such as subgroup analyses and adjusted analyses.
13. Participant flow: Flow of participants through each stage (a diagram is strongly recommended). Specifically, for each group report the numbers of participants randomly assigned, receiving intended treatment, completing the study protocol, and analyzed for the primary outcome. Describe protocol deviations from study as planned, together with reasons.
14. Recruitment: Dates defining the periods of recruitment and follow-up.
15. Baseline data: Baseline demographic and clinical characteristics of each group.
16. Numbers analysed*: Number of participants (denominator) in each group included in each analysis and whether the analysis was by "intention-to-treat". State the results in absolute numbers when feasible (e.g., 10/20, not 50%).
17. Outcomes and estimation: For each primary and secondary outcome, a summary of results for each group, and the estimated effect size and its precision (e.g., 95% confidence interval).
18. Ancillary analyses: Address multiplicity by reporting any other analyses performed, including subgroup analyses and adjusted analyses, indicating those pre-specified and those exploratory.
19. Adverse events: All important adverse events or side effects in each intervention group.
20. Interpretation: Interpretation of the results, taking into account study hypotheses, sources of potential bias or imprecision and the dangers associated with multiplicity of analyses and outcomes.
21. Generalisability: Generalisability (external validity) of the trial findings.
22. Overall evidence: General interpretation of the results in the context of current evidence.

Items related to external validity

Reporting of inclusion/exclusion criteria: Were detailed criteria reported? Each reported criterion was graded as “strongly justified”, “potentially justified” or “poorly justified” according to the methods described elsewhere.
Details of surgical interventions: Procedural details must be described in sufficient detail to be replicated, or cite other reports which provide the necessary detail.
Anaesthetic details: Anaesthetic procedures must be described in sufficient detail to be replicated.
Preoperative care: The preoperative care protocol must be described in sufficient detail to be replicated.
Postoperative care: The postoperative care protocol must be described in sufficient detail to be replicated.
Number of surgeons: The number of surgeons who performed the surgical intervention(s) in the study must be reported.
Experience level of surgeons: Any description of the experience level of the surgeons who performed the surgical intervention(s). This may include actual number of previous operations performed, training necessary for surgeons to participate in the trial, or the level of training of the surgeon (consultant, registrar/resident, house officer etc).
Number of centres: The actual number of centres where the study was conducted.
Details of each centre: Any description of the centres where the study was conducted. This may include a description of teaching status, centre expertise, or affiliated organisations.
Location of centre: The city and country where each study centre is located.
Method of recruitment: Any description of how patients entered into the trial, e.g. presentation to hospital, recruitment via clinics, public advertisement etc.
Number of patients recruited at each centre: Where more than one centre was involved, the number of patients recruited at each centre must be reported.

Study characteristics

Number of authors: The number of authors listed for each publication.
Author degree: The citation of any degree in epidemiology, public health, or biostatistics by any of the authors.
Type of comparison: This was dichotomized as surgical vs. surgical intervention, or surgical vs. non-surgical intervention.
Type of journal: This was categorised as general surgical, general medical, subspecialty surgical or subspecialty medical journal.
Impact factor: The impact factor was recorded for each publication’s journal according to the 2008 Thomson Journal Citation Reports.
Sample size: The total sample size of included participants was recorded.
Multicentre: The study was recorded as multicentre if more than one centre was involved in the conduct of the trial.
Length of the article: The number of words in the published report, excluding abstracts, tables and figures.
Sample size calculation: Was a power calculation performed a priori to determine the required number of participants?
Outcome specification: A clear and explicit definition of the primary/main outcome(s), and/or a specific outcome used for power calculation.
Random sequence generation: A method described which is completely unpredictable (i.e. due to chance) in nature.
Concealment of treatment allocation: A reported method where researchers and trial participants are prevented from knowing which study arm they have been allocated to prior to the intervention being administered.
Blinding: Methods to prevent the knowledge of which intervention group each participant belongs to. Efforts to blind participants, care providers and/or data collectors were recorded.
Source of funding: Any declared sources of support were recorded. These were categorised as: any industry (for profit) support, non-industry (not for profit) support, no external funding received (internal department funding), or unclear source of support.

Legend *CONSORT items that overlap with methodological domains and were removed from the total score in sensitivity analysis.
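The sensitivity analysis described in the legend can be illustrated with a minimal sketch. It assumes the total score is a simple count of checklist items adequately reported, which this appendix does not state explicitly; the function name `consort_score` and the example trial are hypothetical. The starred item numbers (6, 7, 8, 9, 11, 16) are taken from the checklist above.

```python
# Item numbers follow the CONSORT (2001) checklist in this appendix.
ALL_ITEMS = range(1, 23)                 # items 1 to 22
STARRED_ITEMS = {6, 7, 8, 9, 11, 16}     # items marked * in the checklist

def consort_score(reported, exclude_starred=False):
    """Return (items met, items scored) for one trial report.

    Assumption: the score is an unweighted count of reported items.
    """
    scored = [i for i in ALL_ITEMS
              if not (exclude_starred and i in STARRED_ITEMS)]
    met = sum(1 for i in scored if i in reported)
    return met, len(scored)

# Hypothetical trial report: checklist items judged adequately reported.
example_reported = {1, 2, 3, 4, 5, 6, 8, 12, 15, 17, 20}

full = consort_score(example_reported)
sensitivity = consort_score(example_reported, exclude_starred=True)
print(full)         # (11, 22)
print(sensitivity)  # (9, 16)
```

In this illustration the denominator falls from 22 to 16 items once the starred items are removed, so scores from the two analyses are best compared as proportions rather than raw counts.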

A -53

Appendix 9. Syntax of electronic search strategy employed to identify meta-analyses

Medline via Ovid (1st January 2010 – 30th June 2011)
1. meta analysis.pt.
2. meta-analysis/
3. (systematic$ and (review$ or overview$)).tw.
4. meta?analy*.tw.
5. meta analy*.tw.
6. review.pt. and (medline or pubmed).tw.
7. review.pt. and embase.tw.
8. or/1-7
9. random*.tw.
10. rct*.tw.
11. trial*.tw.
12. or/9-11
13. exp Obstetric Surgical Procedures/
14. exp body modification, non-therapeutic/
15. exp ophthalmologic surgical procedures/
16. exp oral surgical procedures/
17. or/13-16
18. exp Surgical Procedures, Operative/
19. 18 not 17
20. 8 and 12 and 19
21. letter.pt.
22. comment.pt.
23. editorial.pt.
24. or/21-23
25. animal/
26. human/
27. 25 not (25 and 26)
28. or/24,27
29. 20 not 28
30. limit 29 to english language
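The way the numbered lines of the Medline strategy combine can be sketched with Python sets, where each line number maps to the set of records it retrieves. The record IDs below are hypothetical and chosen only to show the Boolean logic; this is an illustration, not a reproduction of the actual search results.

```python
# Hypothetical record IDs retrieved by selected lines of the Medline strategy.
lines = {
    8: {1, 2, 3, 4, 5, 6},   # or/1-7: meta-analysis / systematic review terms
    12: {2, 3, 4, 5, 6, 7},  # or/9-11: randomised trial terms
    17: {5},                 # or/13-16: excluded procedure categories
    18: {3, 4, 5, 6, 8},     # exp Surgical Procedures, Operative/
    24: {6},                 # or/21-23: letters, comments, editorials
    25: {4, 9},              # animal/
    26: {3, 4, 5, 6},        # human/
}

lines[19] = lines[18] - lines[17]                 # 19. 18 not 17
lines[20] = lines[8] & lines[12] & lines[19]      # 20. 8 and 12 and 19
lines[27] = lines[25] - (lines[25] & lines[26])   # 27. 25 not (25 and 26)
lines[28] = lines[24] | lines[27]                 # 28. or/24,27
lines[29] = lines[20] - lines[28]                 # 29. 20 not 28

print(sorted(lines[27]))  # [9]
print(sorted(lines[29]))  # [3, 4]
```

Line 27 removes only records indexed as animal and not also as human, so a record carrying both index terms (record 4 in this sketch) survives to line 29.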

EMBASE via Ovid (1st January 2010 – 30th June 2011)
1. meta analysis/
2. "systematic review"/
3. (systematic$ and (review$ or overview$)).tw.
4. meta?analy*.tw.
5. meta analy*.tw.
6. review.pt. and (medline or pubmed).tw.
7. review.pt. and embase.tw.
8. or/1-7
9. random*.tw.
10. rct*.tw.
11. trial*.tw.
12. or/9-11
13. SURGICAL TECHNIQUE/
14. 8 and 12 and 13
15. letter.pt.
16. editorial.pt.
17. or/15-16
18. 14 not 17
19. 18
20. limit 19 to english language

Cochrane via Wiley (1st January 2010 – 30th June 2011)
1. meta analysis/
2. "systematic review"/
3. (systematic$ and (review$ or overview$)).tw.
4. meta?analy*.tw.
5. meta analy*.tw.
6. review.pt. and (medline or pubmed).tw.
7. review.pt. and embase.tw.
8. or/1-7
9. random*.tw.
10. rct*.tw.
11. trial*.tw.
12. or/9-11
13. SURGICAL TECHNIQUE/
14. 8 and 12 and 13
15. letter.pt.
16. editorial.pt.
17. or/15-16
18. 14 not 17
19. 18
20. limit 19 to english language


Appendix 10. References to included surgical meta-analyses

1. Jacobs W, Willems PC, Kruyt M et al. Systematic review of anterior interbody fusion techniques for single- and double-level cervical degenerative disc disease. Spine 2011;36(14):E950-960.
2. Ni S, Qiyin C, Tao W et al. Tubeless percutaneous nephrolithotomy is associated with less pain and shorter hospitalization compared with standard or small bore drainage: a meta-analysis of randomized, controlled trials. Urology 2011;77(6):1293-1298.
3. Mosges R, Hellmich M, Allekotte S et al. Hemorrhage rate after coblation tonsillectomy: a meta-analysis of published trials. Eur Arch Otorhinolaryngol 2011;268(6):807-816.
4. Li LY, Liu QS, Li L et al. A meta-analysis and systematic review of prophylactic endoscopic treatments for postpolypectomy bleeding. Int J Colorectal Dis 2011;26(6):709-719.
5. He J-Y, Jiang L-S, Dai L-Y. Is patellar resurfacing superior than nonresurfacing in total knee arthroplasty? A meta-analysis of randomized trials. Knee 2011;18(3):137-144.
6. Alexiou VG, Salazar-Salvia MS, Jervis PN et al. Modern technology-assisted vs conventional tonsillectomy: a meta-analysis of randomized controlled trials. Arch Otolaryngol Head Neck Surg 2011;137(6):558-570.
7. Wen Y, Meng L, Xie J et al. Direct autologous bone marrow-derived stem cell transplantation for ischemic heart disease: a meta-analysis. Expert Opin Biol Ther 2011;11(5):559-567.
8. Tou S, Malik AI, Wexner SD et al. Energy source instruments for laparoscopic colectomy. Cochrane Database Syst Rev 2011;5:CD007886.
9. Singer AJ, Thode HC, Jr., Chale S et al. Primary closure of cutaneous abscesses: a systematic review. Am J Emerg Med 2011;29(4):361-366.
10. Aly O, Green A, Joy M, Wong CH, Al-Kandari A, Cheng S, et al. Is laparoscopic inguinal hernia repair more effective than open repair? J Coll Physicians Surg Pak 2011;21(5):291-296.
11. Murtuza B, Pepper JR, Jones C et al. Does stentless aortic valve implantation increase perioperative risk? A critical appraisal of the literature and risk of bias analysis. Eur J Cardiothorac Surg 2011;39(5):643-652.
12. Memon MA, Subramanya MS, Khan S et al. Meta-analysis of D1 versus D2 gastrectomy for gastric adenocarcinoma. Ann Surg 2011;253(5):900-911.
13. Markar SR, Karthikesalingam A, Vyas S et al. Hand-sewn versus stapled oesophago-gastric anastomosis: systematic review and meta-analysis. J Gastrointest Surg 2011;15(5):876-884.
14. Lovrics PJ, Cornacchi SD, Vora R et al. Systematic review of radioguided surgery for non-palpable breast cancer. Eur J Surg Oncol 2011;37(5):388-397.

15. Liu HP, Zhang YC, Zhang YL et al. Drain versus no-drain after gastrectomy for patients with advanced gastric cancer: systematic review and meta-analysis. Dig Surg 2011;28(3):178-189.
16. Kuzyk PRT, Saccone M, Sprague S et al. Cross-linked versus conventional polyethylene for total hip replacement: a meta-analysis of randomised controlled trials. J Bone Joint Surg Br 2011;93(5):593-600.
17. Huang W-d, Jiang J-k, Lu Y-q. Value of T-tube in biliary tract reconstruction during orthotopic liver transplantation: a meta-analysis. J Zhejiang Univ Sci B 2011;12(5):357-364.
18. Harling L, Warren OJ, Martin A et al. Do miniaturized extracorporeal circuits confer significant clinical benefit without compromising safety? A meta-analysis of randomized controlled trials. ASAIO J 2011;57(3):141-151.
19. Fasunla AJ, Greene BH, Timmesfeld N et al. A meta-analysis of the randomized controlled trials on elective neck dissection versus therapeutic neck dissection in oral cavity cancers with clinically node-negative neck. Oral Oncol 2011;47(5):320-324.
20. Diener MK, Fitzmaurice C, Schwarzer G et al. Pylorus-preserving pancreaticoduodenectomy (pp Whipple) versus pancreaticoduodenectomy (classic Whipple) for surgical treatment of periampullary and pancreatic carcinoma. Cochrane Database Syst Rev 2011;(5):CD006053.
21. Ansaloni L, Catena F, Coccolini F et al. Surgery versus conservative antibiotic treatment in acute appendicitis: a systematic review and meta-analysis of randomized controlled trials. Dig Surg 2011;28(3):210-221.
22. Wei B, Qi C-L, Chen T-F et al. Laparoscopic versus open appendectomy for acute appendicitis: a metaanalysis. Surg Endosc 2011;25(4):1199-1208.
23. Tian HL, Tian JH, Yang KH et al. The effects of laparoscopic vs. open gastric bypass for morbid obesity: a systematic review and meta-analysis of randomized controlled trials. Obes Rev 2011;12(4):254-260.
24. Tan G, Yang Z, Wang Z. Meta-analysis of laparoscopic total (Nissen) versus posterior (Toupet) fundoplication for gastro-oesophageal reflux disease based on randomized clinical trials. ANZ J Surg 2011;81(4):246-252.
25. Rerkasem K, Rothwell PM. Carotid endarterectomy for symptomatic carotid stenosis. Cochrane Database Syst Rev 2011;(4):CD001081.
26. Kahokehr A, Sammour T, Soop M et al. Intraperitoneal local anaesthetic in abdominal surgery - a systematic review. ANZ J Surg 2011;81(4):237-245.
27. Jiang Y, Zhang K, Die J et al. A systematic review of modern metal-on-metal total hip resurfacing vs standard total hip arthroplasty in active young patients. J Arthroplasty 2011;26(3):419-426.
28. Huang M-J, Liang J-L, Wang H et al. Laparoscopic-assisted versus open surgery for rectal cancer: a meta-analysis of randomized controlled trials on oncologic adequacy of resection and long-term oncologic outcomes. Int J Colorectal Dis 2011;26(4):415-421.
29. Hu X, Zhao Q. Systematic comparison of the effectiveness of radial artery and saphenous vein or right internal thoracic artery coronary bypass grafts in non-left anterior descending coronary arteries. J Zhejiang Univ Sci B 2011;12(4):273-279.
30. Colvin A, Sharma C, Parides M et al. What is the best femoral fixation of hamstring autografts in anterior cruciate ligament reconstruction?: a meta-analysis. Clin Orthop 2011;469(4):1075-1081.
31. Surgery for shoulder osteoarthritis: a Cochrane systematic review. Journal of Rheumatology 2011;38(4):598-605.
32. Emond CE, Woelber EB, Kurd SK et al. A comparison of the results of anterior cruciate ligament reconstruction using bioabsorbable versus metal interference screws: a meta-analysis. J Bone Joint Surg Am 2011;93(6):572-580.
33. Yavin D, Roberts DJ, Tso M et al. Carotid endarterectomy versus stenting: a meta-analysis of randomized trials. Can J Neurol Sci 2011;38(2):230-235.
34. Sauerland S, Walgenbach M, Habermalz B et al. Laparoscopic versus open surgical techniques for ventral or incisional hernia repair. Cochrane Database Syst Rev 2011;(3):CD007781.
35. Reid S, Cawthon PM, Craig JC et al. Non-immunosuppressive treatment for IgA nephropathy. Cochrane Database Syst Rev 2011;(3):CD003962.
36. Pinder DK, Wilson H, Hilton MP. Dissection versus diathermy for tonsillectomy. Cochrane Database Syst Rev 2011;(3):CD002211.
37. Ogah J, Cody DJ, Rogerson L. Minimally invasive synthetic suburethral sling operations for stress urinary incontinence in women: a short version Cochrane review. Neurourol Urodyn 2011;30(3):284-291.
38. Murad MH, Shahrour A, Shah ND et al. A systematic review and meta-analysis of randomized trials of carotid endarterectomy vs stenting. J Vasc Surg 2011;53(3):792-797.
39. Li S, Chen Y, Su W et al. Systematic review of patellar resurfacing in total knee arthroplasty. Int Orthop 2011;35(3):305-316.
40. Li L, Yu C, Li Y. Endoscopic band ligation versus pharmacological therapy for variceal bleeding in cirrhosis: a meta-analysis. Can J Gastroenterol 2011;25(3):147-155.
41. Jia WQ, Tian JH, Yang KH et al. Open versus laparoscopic pyloromyotomy for pyloric stenosis: a meta-analysis of randomized controlled trials. Eur J Pediatr Surg 2011;21(2):77-81.
42. Gurusamy KS, Koti R, Pamecha V et al. Veno-venous bypass versus none for liver transplantation. Cochrane Database Syst Rev 2011;(3):CD007712.
43. Economopoulos KP, Sergentanis TN, Tsivgoulis G et al. Carotid artery stenting versus carotid endarterectomy: a comprehensive meta-analysis of short-term and long-term outcomes. Stroke 2011;42(3):687-692.
44. Cheng T, Zhang G, Zhang X. Clinical and radiographic outcomes of image-based computer-assisted total knee arthroplasty: an evidence-based evaluation. Surg Innov 2011;18(1):15-20.
45. Brin YS, Nikolaou VS, Joseph L et al. Imageless computer assisted versus conventional total knee replacement. A Bayesian meta-analysis of 23 comparative studies. Int Orthop 2011;35(3):331-339.
46. Thakur V, Schlachta CM, Jayaraman S. Minilaparoscopic versus conventional laparoscopic cholecystectomy: a systematic review and meta-analysis. Ann Surg 2011;253(2):244-258.
47. Morgan J, Thomas K, Lee-Robichaud H et al. Transparent Cap Colonoscopy versus Standard Colonoscopy for Investigation of Gastrointestinal Tract Conditions. Cochrane Database Syst Rev 2011;(2):CD008211.
48. Choudhary A, Bechtold ML, Arif M et al. Pancreatic stents for prophylaxis against post-ERCP pancreatitis: a meta-analysis and systematic review. Gastrointest Endosc 2011;73(2):275-282.
49. Bonati LH, Fraedrich G, Carotid Stenting Trialists C. Age modifies the relative risk of stenting versus endarterectomy for symptomatic carotid stenosis--a pooled analysis of EVA-3S, SPACE and ICSS. Eur J Vasc Endovasc Surg 2011;41(2):153-158.
50. Biere SSAY, Maas KW, Cuesta MA et al. Cervical or thoracic anastomosis after esophagectomy for cancer: a systematic review and meta-analysis. Dig Surg 2011;28(1):29-35.
51. Bangalore S, Kumar S, Wetterslev J et al. Carotid artery stenting vs carotid endarterectomy: meta-analysis and diversity-adjusted trial sequential analysis of randomized trials. Arch Neurol 2011;68(2):172-184.
52. Mocellin S, Pasquali S, Nitti D. The impact of surgery on survival of patients with cutaneous melanoma: Revisiting the role of primary tumor excision margins. Ann Surg 2011;253(2):238-243.
53. Zhu QD, Tao CL, Zhou MT et al. Primary closure versus T-tube drainage after common bile duct exploration for choledocholithiasis. Langenbecks Arch Surg 2011;396(1):53-62.
54. Wu Q, Xing Y, Zhou X et al. Meta-analysis of the efficacy and safety of bronchial thermoplasty in patients with moderate-to-severe persistent asthma. J Int Med Res 2011;39(1):10-22.
55. Theologou T, Bashir M, Rengarajan A et al. Preoperative intra aortic balloon pumps in patients undergoing coronary artery bypass grafting. Cochrane Database Syst Rev 2011;(1):CD004472.
56. Siddiqui MRS, Sajid MS, Nisar A et al. A meta-analysis of outcomes after routine aspiration of the gallbladder during cholecystectomy. Int Surg 2011;96(1):21-27.
57. Sammour T, Kahokehr A, Srinivasa S et al. Laparoscopic colorectal surgery is associated with a higher intraoperative complication rate than open surgery. Ann Surg 2011;253(1):35-43.
58. Rerkasem K, Rothwell PM. Systematic review of randomized controlled trials of patch angioplasty versus primary closure and different types of patch materials during carotid endarterectomy. Asian J 2011;34(1):32-40.
59. Rehman H, Bezerra Carlos CB, Bruschini H et al. Traditional suburethral sling operations for urinary incontinence in women. Cochrane Database Syst Rev 2011;(1):CD001754.
60. Mowatt G, N'Dow J, Vale L et al. Photodynamic diagnosis of bladder cancer compared with white light cystoscopy: Systematic review and meta-analysis. Int J Technol Assess Health Care 2011;27(1):3-10.
61. Kahokehr A, Sammour T, Srinivasa S et al. Systematic review and meta-analysis of intraperitoneal local anaesthetic for pain reduction after laparoscopic gastric procedures. Br J Surg 2011;98(1):29-36.
62. Jacobs W, Willems PC, van Limbeek J et al. Single or double-level anterior interbody fusion techniques for cervical degenerative disc disease. Cochrane Database Syst Rev 2011;(1):CD004958.
63. Gurusamy KS, Pamecha V, Davidson BR. Piggy-back graft for liver transplantation. Cochrane Database Syst Rev 2011;(1):CD008258.
64. Gandhi R, Smith H, Lefaivre KA et al. Complications after minimally invasive total knee arthroplasty as compared with traditional incision techniques: a meta-analysis. J Arthroplasty 2011;26(1):29-35.
65. Gaitan HG, Reveiz L, Farquhar C. Laparoscopy for the management of acute lower abdominal pain in women of childbearing age. Cochrane Database Syst Rev 2011;(1):CD007683.
66. Dasari BV, McKay D, Gardiner K. Laparoscopic versus Open surgery for small bowel Crohn's disease. Cochrane Database Syst Rev 2011;(1):CD006956.
67. Dai Z, Li Y, Jiang D. Meta-analysis comparing arthroplasty with internal fixation for displaced femoral neck fracture in the elderly. J Surg Res 2011;165(1):68-74.
68. Cheng T, Liu T, Zhang G et al. Computer-navigated surgery in anterior cruciate ligament reconstruction: are radiographic outcomes better than conventional surgery? Arthroscopy 2011;27(1):97-100.
69. Birch DW, Manouchehri N, Shi X et al. Heated CO(2) with or without humidification for minimally invasive abdominal surgery. Cochrane Database Syst Rev 2011;(1):CD007821.
70. Anik I, Secer HI, Anik Y et al. Meta-analyses of intracerebral hematoma treatment. Turk 2011;21(1):6-14.
71. Ahmad NZ, Ahmed A. Meta-analysis of the effectiveness of surgical scalpel or diathermy in making abdominal skin incisions. Ann Surg 2011;253(1):8-13.
72. Malapert G, Hanna HA, Pages PB et al. Surgical sealant for the prevention of prolonged air leak after lung resection: meta-analysis. Ann Thorac Surg 2010;90(6):1779-1785.
73. Liu Z, Zhang P, Ma Y et al. Laparoscopy or not: A meta-analysis of the surgical effects of laparoscopic versus open appendicectomy. Surgical Laparoscopy, Endoscopy and Percutaneous Techniques 2010;20(6):362-370.
74. Du W, Ma B, Guo Y et al. Microdebrider vs. electrocautery for tonsillectomy: A meta-analysis. International Journal of Pediatric Otorhinolaryngology 2010;74(12):1379-1383.
75. Wijeyekoon SP, Gurusamy K, El-Gendy K et al. Prevention of parastomal herniation with biologic/composite prosthetic mesh: a systematic review and meta-analysis of randomized controlled trials. J Am Coll Surg 2010;211(5):637-645.
76. Sammour T, Kahokehr A, Chan S et al. The humoral response after laparoscopic versus open colorectal surgery: a meta-analysis. J Surg Res 2010;164(1):28-37.
77. Kodera Y, Fujiwara M, Ohashi N et al. Laparoscopic surgery for gastric cancer: a collective review with meta-analysis of randomized trials. J Am Coll Surg 2010;211(5):677-686.
78. Hegarty J, Beirne PV, Walsh E et al. Radical prostatectomy versus watchful waiting for prostate cancer. Cochrane Database Syst Rev 2010;(11):CD006590.
79. Gomes CA, Jr., Lustosa SAS, Matos D et al. Percutaneous endoscopic gastrostomy versus nasogastric tube feeding for adults with swallowing disturbances. Cochrane Database Syst Rev 2010;(11):CD008096.
80. Giannopoulos GA, Tzanakis NE, Rallis GE et al. Staple line reinforcement in laparoscopic bariatric surgery: Does it actually make a difference? A systematic review and meta-analysis. Surgical Endoscopy and Other Interventional Techniques 2010;24(11):2782-2788.
81. Vasiliadis HS, Wasiak J. Autologous chondrocyte implantation for full thickness articular cartilage defects of the knee. Cochrane Database Syst Rev 2010;(10):CD003323.
82. Singh JA, Sperling J, Buchbinder R et al. Surgery for shoulder osteoarthritis. Cochrane Database Syst Rev 2010;(10):CD008089.
83. Seabra VF, Alobaidi S, Balk EM et al. Off-pump coronary artery bypass surgery and acute kidney injury: a meta-analysis of randomized controlled trials. Clin J Am Soc Nephrol 2010;5(10):1734-1744.
84. Sauerland S, Jaschinski T, Neugebauer EA. Laparoscopic versus open surgery for suspected appendicitis. Cochrane Database Syst Rev 2010;(10):CD001546.
85. Moloo H, Haggar F, Coyle D et al. Hand assisted laparoscopic surgery versus conventional laparoscopy for colorectal surgery. Cochrane Database Syst Rev 2010;(10):CD006585.
86. Mehin R, Burnett RS, Brasher PMA. Does the new generation of high-flex knee prostheses improve the post-operative range of movement?: a meta-analysis. J Bone Joint Surg Br 2010;92(10):1429-1434.
87. Mazaki T, Masuda H, Takayama T. Prophylactic pancreatic stent placement and post-ERCP pancreatitis: a systematic review and meta-analysis. Endoscopy 2010;42(10):842-853.
88. Gurusamy KS, Kumar S, Davidson BR. Prophylactic gastrojejunostomy for unresectable periampullary carcinoma. Cochrane Database Syst Rev 2010;(10):CD008533.
89. Gurusamy KS, Bong JJ, Fusai G et al. Methods of cystic duct occlusion during laparoscopic cholecystectomy. Cochrane Database Syst Rev 2010;(10):CD006807.
90. Edelman JJ, Yan TD, Padang R et al. Off-pump coronary artery bypass surgery versus percutaneous coronary intervention: a meta-analysis of randomized and nonrandomized studies. Ann Thorac Surg 2010;90(4):1384-1390.
91. Chen JS, You JF. Current status of surgical treatment for hemorrhoids - systematic review and meta-analysis. Chang Gung Medical Journal 2010;33(5):488-500.
92. Browning GG, Rovers MM, Williamson I et al. Grommets (ventilation tubes) for hearing loss associated with otitis media with effusion in children. Cochrane Database Syst Rev 2010;(10):CD001801.
93. Brown SR, Baraza W. Chromoscopy versus conventional endoscopy for the detection of polyps in the colon and rectum. Cochrane Database Syst Rev 2010;(10):CD006439.
94. Aslani N, Brown CJ. Does mesh offer an advantage over tissue in the open repair of umbilical hernias? A systematic review and meta-analysis. Hernia 2010;14(5):455-462.
95. Yadav K, Jalili M, Zehtabchi S. Management of traumatic occult pneumothorax. Resuscitation 2010;81(9):1063-1068.
96. Takagi H, Matsui M, Umemoto T. Lower graft patency after off-pump than on-pump coronary artery bypass grafting: an updated meta-analysis of randomized trials. J Thorac Cardiovasc Surg 2010;140(3):e45-47.
97. Parker MJ, Handoll HH. Gamma and other cephalocondylic intramedullary nails versus extramedullary implants for extracapsular hip fractures in adults. Cochrane Database Syst Rev 2010;(9):CD000093.
98. Henschke N, Kuijpers T, Rubinstein SM et al. Injection therapy and denervation procedures for chronic low-back pain: a systematic review. Eur Spine J 2010;19(9):1425-1449.
99. Carotid Stenting Trialists C, Bonati LH, Dobson J et al. Short-term outcome after stenting versus endarterectomy for symptomatic carotid stenosis: a preplanned meta-analysis of individual patient data. Lancet 2010;376(9746):1062-1073.
100. Burton MJ, Pollard AJ, Ramsden JD. Tonsillectomy for periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis syndrome (PFAPA). Cochrane Database Syst Rev 2010;(9):CD008669.
101. Broeders JAJL, Mauritz FA, Ahmed Ali U et al. Systematic review and meta-analysis of laparoscopic Nissen (posterior total) versus Toupet (posterior partial) fundoplication for gastro-oesophageal reflux disease. Br J Surg 2010;97(9):1318-1330.
102. Ahyai SA, Gilling P, Kaplan SA et al. Meta-analysis of functional outcomes and complications following transurethral procedures for lower urinary tract symptoms resulting from benign prostatic enlargement. Eur Urol 2010;58(3):384-397.
103. Yajun W, Yue Z, Xiuxin H et al. A meta-analysis of artificial total disc replacement versus fusion for lumbar degenerative disc disease. Eur Spine J 2010;19(8):1250-1261.
104. Verhagen PCMS, Schroder FH, Collette L et al. Does local treatment of the prostate in advanced and/or lymph node metastatic disease improve efficacy of androgen-deprivation therapy? A systematic review. Eur Urol 2010;58(2):261-269.
105. Tamaoki MJS, Belloti JC, Lenza M et al. Surgical versus conservative interventions for treating acromioclavicular dislocation of the shoulder in adults. Cochrane Database Syst Rev 2010;(8):CD007429.
106. Novara G, Artibani W, Barber MD et al. Updated systematic review and meta-analysis of the comparative data on colposuspensions, pubovaginal slings, and midurethral tapes in the surgical treatment of female stress urinary incontinence. Eur Urol 2010;58(2):218-238.
107. Mi J, Kang Y, Chen X et al. Whether robot-assisted laparoscopic fundoplication is better for gastroesophageal reflux disease in adults: a systematic review and meta-analysis. Surg Endosc 2010;24(8):1803-1814.
108. Maeso S, Reza M, Mayol JA et al. Efficacy of the Da Vinci surgical system in abdominal surgery compared with that of laparoscopy: a systematic review and meta-analysis. Ann Surg 2010;252(2):254-262.
109. Dedemadi G, Sgourakis G, Radtke A et al. Laparoscopic versus open mesh repair for recurrent inguinal hernia: a meta-analysis of outcomes. Am J Surg 2010;200(2):291-297.
110. Clark W, Hernandez J, McKeon B et al. Surgical shunting versus transjugular intrahepatic portasystemic shunting for bleeding varices resulting from portal hypertension and cirrhosis: a meta-analysis. Am Surg 2010;76(8):857-864.
111. Brar R, Nordon IM, Hinchliffe RJ et al. Surgical management of varicose veins: meta-analysis. Vascular 2010;18(4):205-220.
112. Takagi H, Goto S-N, Matsui M et al. A contemporary meta-analysis of Dacron versus polytetrafluoroethylene grafts for femoropopliteal bypass grafting. J Vasc Surg 2010;52(1):232-236.
113. Malik AI, Nelson RL, Tou S. Incision and drainage of perianal abscess with or without treatment of anal fistula. Cochrane Database Syst Rev 2010;(7):CD006827.
114. Latthe PM, Singh P, Foon R et al. Two routes of transobturator tape procedures in stress urinary incontinence: a meta-analysis with direct and indirect comparison of randomized trials. BJU Int 2010;106(1):68-76.
115. Huisstede BM, Randsdorp MS, Coert JH et al. Carpal tunnel syndrome. Part II: effectiveness of surgical treatments--a systematic review. Arch Phys Med Rehabil 2010;91(7):1005-1024.
116. Ecker T, Carvalho AL, Choe JH et al. Hemostasis in thyroid surgery: Harmonic scalpel versus other techniques-a meta-analysis. Otolaryngology - Head and Neck Surgery 2010;143(1):17-25.
117. Chow A, Marshall H, Zacharakis E et al. Use of tissue glue for surgical incision closure: a systematic review and meta-analysis of randomized controlled trials. J Am Coll Surg 2010;211(1):114-125.
118. Parker MJ, Gurusamy KS, Azegami S. Arthroplasties (with and without bone cement) for proximal femoral fractures in adults. Cochrane Database Syst Rev 2010;(6):CD001706.
119. Hopley C, Stengel D, Ekkernkamp A et al. Primary total hip arthroplasty versus hemiarthroplasty for displaced intracapsular hip fractures in older patients: systematic review. BMJ 2010;340:c2332.
120. Zangrillo A, Garozzo FA, Biondi-Zoccai G et al. Miniaturized cardiopulmonary bypass improves short-term outcome in cardiac surgery: a meta-analysis of randomized controlled studies. J Thorac Cardiovasc Surg 2010;139(5):1162-1169.
121. Wang W, Shi J, Xie W-F. Transarterial chemoembolization in combination with percutaneous ablation therapy in unresectable hepatocellular carcinoma: a meta-analysis. Liver Int 2010;30(5):741-749.
122. Twine CP, McLain AD. Graft type for femoro-popliteal bypass surgery. Cochrane Database Syst Rev 2010;(5):CD001487.
123. Shen C, Jiang S-D, Jiang L-S et al. Bioabsorbable versus metallic interference screw fixation in anterior cruciate ligament reconstruction: a meta-analysis of randomized controlled trials. Arthroscopy 2010;26(5):705-713.
124. Reichenbach S, Rutjes AW, Nuesch E et al. Joint lavage for osteoarthritis of the knee. Cochrane Database Syst Rev 2010;(5):CD007320.
125. Montedori A, Cirocchi R, Farinella E et al. Covering ileo- or colostomy in anterior resection for rectal carcinoma. Cochrane Database Syst Rev 2010;(5):CD006878.
126. Jacob TJ, Perakath B, Keighley MRB. Surgical intervention for anorectal fistula. Cochrane Database Syst Rev 2010;(5):CD006319.
127. Hahne AJ, Ford JJ, McMeeken JM. Conservative management of lumbar disc herniation with associated radiculopathy: a systematic review. Spine 2010;35(11):E488-504.
128. Diener MK, Voss S, Jensen K et al. Elective midline laparotomy closure: the INLINE systematic review and meta-analysis. Ann Surg 2010;251(5):843-856.
129. Coulthard P, Esposito M, Worthington HV et al. Tissue adhesives for closure of surgical incisions. Cochrane Database Syst Rev 2010;(5):CD004287.
130. Cennamo V, Fuccio L, Zagari RM et al. Can early precut implementation reduce endoscopic retrograde cholangiopancreatography-related complication risk? Meta-analysis of randomized controlled trials. Endoscopy 2010;42(5):381-388.
131. Burke N, Whelan JP, Goeree L et al. Systematic review and meta-analysis of transurethral resection of the prostate versus minimally invasive procedures for the treatment of benign prostatic obstruction. Urology 2010;75(5):1015-1022.
132. Biancari F, Tiozzo V. Staples versus sutures for closing leg wounds after vein graft harvesting for coronary artery bypass surgery. Cochrane Database Syst Rev 2010;(5):CD008057.
133. Biancari F, Mahar MAA. Meta-analysis of randomized trials on the efficacy of posterior pericardiotomy in preventing atrial fibrillation after coronary artery bypass surgery. J Thorac Cardiovasc Surg 2010;139(5):1158-1161.
134. Takagi H, Matsui M, Umemoto T. Off-pump coronary artery bypass may increase late mortality: a meta-analysis of randomized trials. Ann Thorac Surg 2010;89(6):1881-1888.
135. Siddiqui MRS, Sajid MS, Woods WGA et al. A meta-analysis comparing side to end with colonic J-pouch formation after anterior resection for rectal cancer. Tech Coloproctol 2010;14(2):113-123.
136. Riediger C, Muller MW, Michalski CW et al. T-Tube or no T-tube in the reconstruction of the biliary tract during orthotopic liver transplantation: systematic review and meta-analysis. Liver Transpl 2010;16(6):705-717.
137. Pan I, Dendukuri N. Efficacy and cost-effectiveness of a gentamicin-loaded collagen sponge as an adjuvant antibiotic prophylaxis for colorectal surgery. Montreal: Technology Assessment Unit of the McGill University Health Centre (MUHC). Report No 41. 2010.
138. Ohtani H, Tamamori Y, Noguchi K et al. A meta-analysis of randomized controlled trials that compared laparoscopy-assisted and open distal gastrectomy for early gastric cancer. J Gastrointest Surg 2010;14(6):958-964.
139. Nienhuijs SW, de Hingh IHJT. Pain after conventional versus Ligasure haemorrhoidectomy. A meta-analysis. Int J Surg 2010;8(4):269-273.
140. Markar SR, Karthikesalingam AP, Hagen ME et al. Robotic vs. laparoscopic Nissen fundoplication for gastro-oesophageal reflux disease: systematic review and meta-analysis. Int J Med Robot 2010;6(2):125-131.
141. Lee MS, Yang T, Dhoot J et al. Meta-analysis of clinical studies comparing coronary artery bypass grafting with percutaneous coronary intervention and drug-eluting stents in patients with unprotected left main coronary artery narrowings. Am J Cardiol 2010;105(8):1070-1075.
142. Kell MR, Burke JP, Barry M et al. Outcome of axillary staging in early breast cancer: a meta-analysis. Breast Cancer Res Treat 2010;120(2):441-447.
143. Heineman DJ, Poolman RW, Nork SE et al. Plate fixation or intramedullary fixation of humeral shaft fractures. Acta Orthop 2010;81(2):216-223.
144. From AM, Al Badarin FJ, Cha SS et al. Percutaneous coronary intervention with drug-eluting stents versus coronary artery bypass surgery for multivessel coronary artery disease: a meta-analysis of data from the ARTS II, CARDia, ERACI III, and SYNTAX studies and systematic review of observational data. EuroIntervention 2010;6(2):269-276.
145. Fan Y, Zhang A-M, Xiao Y-B et al. Warm versus cold cardioplegia for heart surgery: a meta-analysis. Eur J Cardiothorac Surg 2010;37(4):912-919.
146. Cheng T, Liu T, Zhang G et al. Does minimally invasive surgery improve short-term recovery in total knee arthroplasty? Clin Orthop 2010;468(6):1635-1648.
147. Biancari F, D'Andrea V, Di Marco C et al. Meta-analysis of randomized trials on the efficacy of vascular closure devices after diagnostic angiography and angioplasty. Am Heart J 2010;159(4):518-531.
148. Bartels RHMA, Donk R, Verbeek ALM. No justification for cervical disk prostheses in clinical practice: a meta-analysis of randomized controlled trials. Neurosurgery 2010;66(6):1153-1160.
149. Bakoyiannis C, Economopoulos KP, Georgopoulos S et al. Carotid endarterectomy versus carotid angioplasty with or without stenting for treatment of carotid artery stenosis: an updated meta-analysis of randomized controlled trials. Int Angiol 2010;29(3):205-215.
150. Zhang Z-j, Zhang P, Tian J-h et al. Ultrasonic coagulator for thyroidectomy: a systematic review of randomized controlled trials. Surg Innov 2010;17(1):41-47.