Research Methods & Reporting
Prognosis and prognostic research: Developing a prognostic model

Patrick Royston,1 Karel G M Moons,2 Douglas G Altman,3 Yvonne Vergouwe2

In the second article in their series, Patrick Royston and colleagues describe different approaches to building clinical prognostic models

1 MRC Clinical Trials Unit, London NW1 2DA
2 Julius Centre for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, Netherlands
3 Centre for Statistics in Medicine, University of Oxford, Oxford OX2 6UD
Correspondence to: P Royston [email protected]
Accepted: 6 October 2008
Cite this as: BMJ 2009;338:b604
doi: 10.1136/bmj.b604
BMJ | 6 JUNE 2009 | VOLUME 338

This article is the second in a series of four aiming to provide an accessible overview of the principles and methods of prognostic research

The first article in this series reviewed why prognosis is important and how it is practised in different medical settings.1 We also highlighted the difference between multivariable models used in aetiological research and those used in prognostic research and outlined the design characteristics for studies developing a prognostic model. In this article we focus on developing a multivariable prognostic model. We illustrate the statistical issues using a logistic regression model to predict the risk of a specific event. The principles largely apply to all multivariable regression methods, including models for continuous outcomes and for time to event outcomes.

The goal is to construct an accurate and discriminating prediction model from multiple variables. Models may be a complicated function of the predictors, as in weather forecasting, but in clinical applications considerations of practicality and face validity usually suggest a simple, interpretable model (as in box 1).

Surprisingly, there is no widely agreed approach to building a multivariable prognostic model from a set of candidate predictors. Katz gave a readable introduction to multivariable models,3 and technical details are also widely available.4-6 We concentrate here on a few fairly standard modelling approaches and also consider how to handle continuous predictors, such as age.

Box 1 | Example of a prognostic model
Risk score from a logistic regression model to predict the risk of postoperative nausea or vomiting (PONV) within the first 24 hours after surgery2:
Risk score = −2.28 + (1.27×female sex) + (0.65×history of PONV or motion sickness) + (0.72×non-smoking) + (0.78×postoperative opioid use)
where all variables are coded 0 for no or 1 for yes.
The value −2.28 is called the intercept and the other numbers are the estimated regression coefficients for the predictors, which indicate their mutually adjusted relative contribution to the outcome risk. The regression coefficients are log(odds ratios) for a change of 1 unit in the corresponding predictor.
The predicted risk (or probability) of PONV = 1/(1 + e^(−risk score)).

Box 2 | Modelling continuous predictors
Simple predictor transformations intended to detect and model non-linearity can be systematically identified using, for example, fractional polynomials, a generalisation of conventional polynomials (linear, quadratic, etc).6 27 Power transformations of a predictor beyond squares and cubes, including reciprocals, logarithms, and square roots are allowed. These transformations contain a single term, but to enhance flexibility can be extended to two term models (eg, terms in log x and x^2). Fractional polynomial functions can successfully model non-linear relationships found in prognostic studies. The multivariable fractional polynomial procedure is an extension to multivariable models including at least one continuous predictor,4 27 and combines backward elimination of weaker predictors with transformation of continuous predictors.
Restricted cubic splines are an alternative approach to modelling continuous predictors.5 Their main advantage is their flexibility for representing a wide range of perhaps complex curve shapes. Drawbacks are the frequent occurrence of wiggles in fitted curves that may be unreal and open to misinterpretation28 29 and the absence of a simple description of the fitted curve.

Preliminaries
We assume here that the available data are sufficiently accurate for prognosis and adequately represent the population of interest. Before starting to develop a multivariable prediction model, numerous decisions must be made that affect the model and therefore the conclusions of the research. These include:
• Selecting clinically relevant candidate predictors for possible inclusion in the model
• Evaluating the quality of the data and judging what to do about missing values
• Data handling decisions
• Choosing a strategy for selecting the important variables in the final model
• Deciding how to model continuous variables
• Selecting measure(s) of model performance5 or predictive accuracy.
Other considerations include assessing the robustness of the model to influential observations and outliers, studying possible interaction between predictors, deciding whether and how to adjust the final model for overfitting (so called shrinkage),5 and exploring the stability (reproducibility) of a model.7

Selecting candidate predictors
Studies often measure more predictors than can sensibly be used in a model, and pruning is required. Predictors already reported as prognostic would normally be candidates. Predictors that are highly correlated with others contribute little independent information and may be excluded beforehand.5 However, predictors that are not significant in univariable analysis should not be excluded as candidates.8-10

Evaluating data quality
There are no secure rules for evaluating the quality of data. Judgment is required. In principle, data used for developing a prognostic model should be fit for purpose. Measurements of candidate predictors and outcomes should be comparable across clinicians or study centres. Predictors known to have considerable measurement error may be unsuitable because this dilutes their prognostic information.

Modern statistical techniques (such as multiple imputation) can handle data sets with missing values.11 12 However, all approaches make critical but essentially untestable assumptions about how the data went missing. The likely influence on the results increases with the amount of data that are missing. Missing data are seldom completely random. They are usually related, directly or indirectly, to other subject or disease characteristics, including the outcome under study. Thus exclusion of all individuals with a missing value leads not only to loss of statistical power but often to incorrect estimates of the predictive power of the model and specific predictors.11 A complete case analysis may be sensible when few observations (say <5%) are missing.5 If a candidate predictor has a lot of missing data it may be excluded because the problem is likely to recur.

Data handling decisions
Often, new variables need to be created (for example, diastolic and systolic blood pressure may be combined to give mean arterial pressure). For ordered categorical variables, such as stage of disease, collapsing of categories or a judicious choice of coding may be required. We advise against turning continuous predictors into dichotomies.13 Keeping variables continuous is pref- […]

[…] hypothesis tests is applied to determine whether a given variable should be removed from the model. Backward elimination is preferable to forward selection (whereby the model is built up from the best candidate predictor).16 The choice of significance level has a major effect on the number of variables selected. A 1% level almost always results in a model with fewer variables than a 5% level. Significance levels of 10% or 15% can result in inclusion of some unimportant variables, as can the full model approach. (A variant is the Akaike information criterion,17 a measure of model fit that includes a penalty against large models and hence attempts to reduce overfitting. For a single predictor, the criterion equates to selection at 15.7% significance.17)

Selection of predictors by significance testing, particularly at conventional significance levels, is known to produce selection bias and optimism as a result of overfitting, meaning that the model is (too) closely adapted to the data.5 9 17 Selection bias means that a regression coefficient is overestimated, because the corresponding predictor is more likely to be significant if its estimated effect is larger (perhaps by chance) rather than smaller. Overfitting leads to worse prediction in independent data; it is more likely to occur in small data sets or with weakly predictive variables. Note, however, that selected predictor variables with very small P values (say, <0.001) are much less prone to selection bias and overfitting than weak predictors with P values near the nominal significance level. Commonly, prognostic data sets include a few strong predictors and several weaker ones.

Modelling continuous predictors
Handling continuous predictors in multivariable models is important. It is unwise to assume linearity as it can lead to misinterpretation of the influence of a predictor and to inaccurate predictions in new patients.14 See box 2 for further comments on how to handle continuous predictors in prognostic modelling.

Assessing performance
The performance of a logistic regression model may be assessed in terms of calibration and discrimination. Calibration can be investigated by plotting