Stephen Senn 1 of 3

Discussion of papers in the session, ‘Statistical methods for combining information from studies’.

Stephen Senn, Department of Statistics, University of Glasgow, 15 University Gardens, Glasgow, UK. [email protected]

I welcome these three papers, which show, most pleasingly, the variety of statistical considerations, both theoretical and practical, which may be brought to bear on the important business of synthesising evidence. I am only able to comment on a very few of the many excellent points made in these three papers, and hope that the authors will forgive me for concentrating on that which is debatable or controversial rather than on that which will be agreed by all to be sensible.

Professor Hartung and colleagues make a very interesting proposal as to how one may proceed when there are no reliable within-trial variances. Their proposal that one may use weights

$$\lambda_i = \frac{n_i}{\sum_{j=1}^{K} n_j}, \qquad i = 1, \ldots, K,$$

where

$$n_i = \frac{n_{i1}\, n_{i2}}{n_{i1} + n_{i2}},$$

is equivalent to the ordinary least squares (OLS) solution when it is assumed that the variances of individual responses are homogeneous between treatments within trials (which is certainly true under the null hypothesis) but also between trials (which is less likely to be true). In fact, a fixed-effects meta-analysis is analogous to an OLS analysis of the original data, fitting trial, treatment and trial-by-treatment terms (Senn, 2000b). (It has been claimed that the analogy is to a main-effects model only (Mathew, 1999; Olkin & Sampson, 1998), but this is incorrect, as the difference in treatment effect from trial to trial makes no contribution to the overall residual error term (Senn, 2000b).)
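The arithmetic of these weights is easily sketched. The short Python fragment below uses invented trial figures and my own function names, and assumes that the effective sample size of a two-arm trial with arm sizes $n_{i1}$ and $n_{i2}$ is $n_{i1}n_{i2}/(n_{i1}+n_{i2})$; it simply rescales those effective sizes to sum to one.

```python
# Sketch of the sample-size weighting scheme: each two-arm trial contributes
# an "effective" sample size n_i = n_i1 * n_i2 / (n_i1 + n_i2), and the
# weights lambda_i are the n_i rescaled to sum to one.  All figures invented.

def effective_size(n1, n2):
    """Effective sample size of a two-arm trial with arm sizes n1 and n2."""
    return n1 * n2 / (n1 + n2)

def pooled_estimate(estimates, arm_sizes):
    """Sample-size-weighted mean of the per-trial treatment estimates."""
    sizes = [effective_size(n1, n2) for n1, n2 in arm_sizes]
    total = sum(sizes)
    weights = [n / total for n in sizes]            # the lambda_i
    return sum(w * y for w, y in zip(weights, estimates)), weights

# Invented example: three trials of unequal size.
estimates = [0.5, 0.3, 0.8]
arm_sizes = [(40, 40), (25, 35), (100, 110)]
pooled, weights = pooled_estimate(estimates, arm_sizes)
```

Note that no within-trial variance estimates enter the calculation at any point, which is precisely the attraction when those estimates are unreliable or unavailable.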

This also raises the issue, however, of whether the weighting scheme proposed by Hartung et al (or a suitable analogue) might not be superior even where the individual trial variances are known. This might be particularly appropriate where the meta-analysis consists of many small trials. For example, cross-over trials can estimate treatment effects efficiently with very few degrees of freedom (Curtin, Altman, & Elbourne, 2002), so that their variances will not be estimated well. There can then be considerable bias in weighting the trials by the observed information, and using numbers of patients instead (when combining trials of similar design) may be superior (Senn, 2000a). This point was made almost 70 years ago, in the context of agricultural research, by Yates and Cochran (Yates & Cochran, 1938), who pointed out that weighting by observed information could be inferior to equal weighting.
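The Yates and Cochran point can be illustrated with a small simulation. The set-up below is invented and not taken from any of the papers: several small trials share one common true variance, each trial's variance is estimated on only three degrees of freedom, and the mean weighted by observed inverse variances is compared with the simple unweighted mean.

```python
# Invented simulation: eight small trials with a common true variance.
# Weighting by observed inverse variances (each estimated on 3 df) is
# compared with equal weighting; under homogeneity the equal-weight mean
# tends to have the smaller mean squared error.
import random
import statistics

random.seed(1)

def one_replication(n_trials=8, obs_per_trial=4, mu=0.0, sigma=1.0):
    ys, ws = [], []
    for _ in range(n_trials):
        data = [random.gauss(mu, sigma) for _ in range(obs_per_trial)]
        ys.append(statistics.mean(data))
        s2 = statistics.variance(data)      # estimated on obs_per_trial - 1 df
        ws.append(obs_per_trial / s2)       # observed-information weight
    info_mean = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    equal_mean = statistics.mean(ys)
    return info_mean, equal_mean            # true value is mu = 0

reps = [one_replication() for _ in range(2000)]
mse_info = statistics.mean(r[0] ** 2 for r in reps)
mse_equal = statistics.mean(r[1] ** 2 for r in reps)
```

With variances this poorly estimated, the inverse-variance weights concentrate haphazardly on whichever trial happened to produce a small sample variance, inflating the error of the combined estimate.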

As one might expect, matters to do with variances are also important in Professor Keith Abrams's paper, and he provides a Bayesian perspective on this. Abrams describes a fixed-effects analysis as a special case of a random-effects analysis in which the between-trial variance $\tau^2 = 0$. This is, of course, true from one perspective, and, indeed, Hartung et al also make this point. Nevertheless, a Fisherian point of view might be that a fixed-effects analysis is a valid test of the hypothesis that $\theta_i = 0$ for all $i$, and hence of the hypothesis $\mu = 0$ (since the latter hypothesis is implausible unless the former is true (Fisher, 1935; Senn, 2004)), whatever the value of $\tau^2$ under the alternative hypothesis, and that it also provides a reasonable approximation to estimating the average causal effect of treatment for the patients recruited in the trials (Peto, Collins, & Gray, 1993). This is particularly so if some weighting scheme such as that proposed by Hartung et al is used. As John Nelder has pointed out on many occasions (Lane & Nelder, 1982; Lee & Nelder, 2004), estimation is not the same as prediction. It is highly doubtful whether the $\mu$ estimated from a random-effects meta-analysis is a relevant prediction of anything useful at all: it predicts the long-term response in a randomly chosen trial from amongst the trials for which the set studied could be regarded as a random realisation, not, for example, of a randomly chosen patient from that set, still less of a randomly chosen future patient (assuming such were possible) (Senn et al., 2004). The fact that the distinction between fixed- and random-effects approaches is not just a matter of trial-by-treatment interaction may be illustrated by considering the proposal by Hartung et al already discussed. The estimate is a sort of fixed-effects estimate, as it reflects the presumed within-trial precision.
However, the variance estimator is a sort of random-effects estimator, as it is driven by the term

$$\left( y_i - \sum_{k=1}^{K} b_k y_k \right)^2,$$

which reflects the difference between the individual trial estimates and their weighted mean.
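This hybrid character is easily sketched in code: a point estimate formed with whatever weights are supplied, paired with a standard error built from the squared deviations of the trial estimates about their weighted mean. The $1/(K-1)$ normalisation below follows Hartung's variance estimator, but the function name and the numbers are invented.

```python
# Sketch of the hybrid estimator: fixed-effect-style point estimate, with a
# variance driven by (y_i - sum_k lambda_k y_k)^2, i.e. the spread of the
# individual trial estimates about their weighted mean.  Figures invented.

def weighted_mean_and_se(estimates, weights):
    """Weighted mean of trial estimates and its Hartung-style standard error."""
    K = len(estimates)
    total = sum(weights)
    lam = [w / total for w in weights]              # normalised weights
    mean = sum(l * y for l, y in zip(lam, estimates))
    var = sum(l * (y - mean) ** 2 for l, y in zip(lam, estimates)) / (K - 1)
    return mean, var ** 0.5

# Invented example: weights might be effective sample sizes.
mu_hat, se_hat = weighted_mean_and_se([0.5, 0.3, 0.8], [20.0, 14.6, 52.4])
```

With equal weights the formula collapses to the familiar standard error of a simple mean, which is one way to see that the between-trial spread, not the within-trial variances, is doing the work.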

I also wonder, to return to the theme of within-trial variances, whether Abrams’s scheme can be described as fully Bayesian. Some modelling of the variances is surely also required so that we have

$$y_i \sim N\!\left[\, \theta_i,\; \sigma_i^2 \left( \frac{1}{n_{i1}} + \frac{1}{n_{i2}} \right) \right],$$

$$\frac{(n_{i1} + n_{i2} - 2)\, s_i^2}{\sigma_i^2} \sim \chi^2_{\,n_{i1} + n_{i2} - 2},$$

$$\frac{1}{\sigma_i^2} \sim \operatorname{gamma}(\alpha, \beta), \quad \text{or perhaps} \quad \ln(\sigma_i^2) \sim N(\phi, \eta^2).$$

This will have the effect of shrinking the within-trial estimates $\sigma_i^2$ toward some common value and hence increasing the relevance of the patient numbers to the final weight (as in the proposal of Hartung et al). I am not trying to imply that Professor Abrams is not aware of this; indeed, the modelling of variances is covered in his excellent text with David Spiegelhalter and Jonathan Myles (Spiegelhalter, Abrams, & Myles, 2003). See also Grieve (Grieve, 1991) for a discussion of Bayesian approaches to the modelling of variances.
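A crude stand-in for this shrinkage, emphatically not Abrams's machinery but a moment-based sketch under a log-normal model for the $\sigma_i^2$, pulls each log sample variance toward the common mean, the more strongly the fewer its degrees of freedom; it relies on the approximation that $\ln s_i^2$ has sampling variance roughly $2/\mathrm{df}_i$, and the value of $\tau^2$ used is an arbitrary illustrative assumption.

```python
# Illustrative empirical-Bayes-style shrinkage of log sample variances toward
# their common mean (an assumption-laden sketch, not a full Bayesian fit).
import math

def shrink_log_variances(s2, df, tau2=0.5):
    """Pull log sample variances toward their common mean.

    tau2 plays the role of the between-trial variance of ln(sigma_i^2);
    2/df approximates the sampling variance of ln(s_i^2) on df degrees
    of freedom.  Both choices are illustrative assumptions.
    """
    logs = [math.log(v) for v in s2]
    grand = sum(logs) / len(logs)
    shrunk = []
    for l, d in zip(logs, df):
        b = tau2 / (tau2 + 2.0 / d)      # shrinkage factor in (0, 1)
        shrunk.append(math.exp(grand + b * (l - grand)))
    return shrunk

# Invented example: the poorly estimated variances (3 df) move furthest.
shrunk = shrink_log_variances([0.5, 1.0, 4.0], [3, 10, 3])
```

As the variances are pulled together, the inverse-variance weights approach sample-size weights, which is the connection to the Hartung et al proposal noted above.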

Abrams also suggests what may be done about studying the variation of treatment effects as a function of baseline risk. Here I reserve judgement as to whether the paper by Sharp and Thompson (Sharp & Thompson, 2000) cited does in fact deal with the regression-to-the-mean problem (Senn, 1994) he mentions. An earlier approach by these authors (Thompson, Smith, & Sharp, 1997) is definitely not appropriate for dealing with this problem (Arends et al., 2000; van Houwelingen & Senn, 1999).
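The regression-to-the-mean trap here can be made concrete with an invented simulation: the true treatment effect is held constant across trials, yet regressing the observed effect on the observed control-group risk manufactures a negative association, because the control arm's sampling error enters both quantities with opposite signs.

```python
# Invented simulation of the regression-to-the-mean artefact: constant true
# treatment effect, yet observed effect and observed control-group risk are
# negatively correlated because they share the control arm's sampling error.
import random

random.seed(7)

def simulate(n_trials=200, effect=-0.1, noise=0.15):
    rows = []
    for _ in range(n_trials):
        true_risk = random.uniform(0.2, 0.6)
        obs_control = true_risk + random.gauss(0, noise)
        obs_treated = true_risk + effect + random.gauss(0, noise)
        rows.append((obs_control, obs_treated - obs_control))
    return rows

def pearson(pairs):
    n = len(pairs)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

r = pearson(simulate())   # clearly negative despite a constant true effect
```

Any analysis that takes such a correlation at face value will conclude, wrongly, that the treatment works better in high-risk patients.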

Interesting as all such theoretical niceties may be, they are not as relevant to the welfare of individual patients as the actual business of identifying and summarising evidence and then making sure that the results are applied. Professor Sue Green in her paper describes the work of the Cochrane Collaboration in this connection. The dedication and energy of the members of the organisation, and its consequent success, are one of the wonders of the information age. Green outlines well the various activities involved. I am pleased to see that there is a 'Statistical Methods Group'. No doubt the other papers in this session will give them much to think about. If I have one gripe about the Cochrane Collaboration, it is that in their haste to synthesise they have sometimes been rather eager to ignore difficulties of inference and force every study into a parallel-group straitjacket (Senn, 2003). However, it is also an organisation with a determination to evolve and progress, and methodological criticisms should not be allowed to detract from its achievements.

Professor Victor is to be congratulated on putting together this interesting session. I look forward to future evolution and progress in this important area of statistical research and application.


References

Arends, L., Hoes, A. W., Lubsen, J., Grobbee, D. E., & Stijnen, T. (2000), Baseline risk as predictor of treatment benefit: three clinical meta-re-analyses, Statistics in Medicine, 19, 3497.

Curtin, F., Altman, D. G., & Elbourne, D. (2002), Meta-analysis combining parallel and cross-over clinical trials. I: Continuous outcomes, Statistics in Medicine, 21, 2131.

Fisher, R. A. (1935), Contribution to a discussion of J. Neyman's paper on statistical problems in agricultural experimentation, Journal of the Royal Statistical Society, Supplement, 2.

Grieve, A. P. (1991), Confidence intervals and sample sizes, Biometrics, 47, 1597.

Lane, P. W., & Nelder, J. A. (1982), Analysis of covariance and standardization as instances of prediction, Biometrics, 38, 613.

Lee, Y., & Nelder, J. A. (2004), Conditional and marginal models: another view, Statistical Science, 19, 219.

Mathew, T. (1999), On the equivalence of meta-analysis using literature and using individual patient data, Biometrics, 55, 1221.

Olkin, I., & Sampson, A. (1998), Comparison of meta-analysis versus analysis of variance of individual patient data, Biometrics, 54, 317.

Peto, R., Collins, R., & Gray, R. (1993), in Doing More Good Than Harm: The Evaluation of Health Care Interventions, vol. 703, pp. 314.

Senn, S., Wang, N. Y., Jiang, J. M., Lee, Y., & Nelder, J. A. (2004), Conditional and marginal models: another view - comments and rejoinders, Statistical Science, 19, 228.

Senn, S. J. (1994), Importance of trends in the interpretation of an overall odds ratio in the meta-analysis of clinical trials [letter; comment], Statistics in Medicine, 13, 293.

Senn, S. J. (2000a), Letter to the Editor: in defence of the …, Controlled Clinical Trials, 21, 589.

Senn, S. J. (2000b), The many modes of meta, Drug Information Journal, 34, 535.

Senn, S. J. (2003), Dicing with Death. Cambridge: Cambridge University Press.

Senn, S. J. (2004), Added values: controversies concerning randomization and additivity in clinical trials, Statistics in Medicine, 23, 3729.

Sharp, S. J., & Thompson, S. G. (2000), Analysing the relationship between treatment effect and underlying risk in meta-analysis: comparison and development of approaches, Statistics in Medicine, 19, 3251.

Spiegelhalter, D., Abrams, K. R., & Myles, J. P. (2003), Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Chichester: Wiley.

Thompson, S. G., Smith, T. C., & Sharp, S. J. (1997), Investigating underlying risk as a source of heterogeneity in meta-analysis, Statistics in Medicine, 16, 2741.

van Houwelingen, H., & Senn, S. (1999), Investigating underlying risk as a source of heterogeneity in meta-analysis [letter; comment], Statistics in Medicine, 18, 110.

Yates, F., & Cochran, W. G. (1938), The analysis of groups of experiments, Journal of Agricultural Science, 28, 556.