Ambler, G. K., Gohel, M. S., Mitchell, D. C., Loftus, I. M., Boyle, J. R., & Audit and Quality Improvement Committee of the Vascular Society of Great Britain and Ireland (2015). The Abdominal Aortic Aneurysm Statistically Corrected Operative Risk Evaluation (AAA SCORE) for predicting mortality after open and endovascular interventions. Journal of Vascular Surgery, 61(1), 35-43. https://doi.org/10.1016/j.jvs.2014.06.002

Publisher's PDF, also known as Version of record License (if available): Other Link to published version (if available): 10.1016/j.jvs.2014.06.002

Link to publication record in Explore Bristol Research PDF-document

This is the final published version of the article (version of record). It first appeared online via Elsevier at https://www.sciencedirect.com/science/article/pii/S0741521414010775 . Please refer to any applicable terms of use of the publisher.

University of Bristol - Explore Bristol Research General rights

This document is made available in accordance with publisher policies. Please cite only the published version using the reference above. Full terms of use are available: http://www.bristol.ac.uk/red/research-policy/pure/user-guides/ebr-terms/ From the Society for Vascular Surgery

The Abdominal Aortic Aneurysm Statistically Corrected Operative Risk Evaluation (AAA SCORE) for predicting mortality after open and endovascular interventions

Graeme K. Ambler, MB BChir, BSc, PhD, MRCS,a Manjit S. Gohel, MD, FRCS, FEBVS,a David C. Mitchell, MA, MB BS, MS, FRCS,b Ian M. Loftus, BSc, MB ChB, MD, FRCS,c and Jonathan R. Boyle, MB ChB, MD, MA, FRCS,a in association with the Audit and Quality Improvement Committee of the Vascular Society of Great Britain and Ireland, Cambridge, Bristol, and London, United Kingdom

Background: Accurate adjustment of surgical outcome for risk is vital in an era of surgeon-level reporting. Current risk prediction models for abdominal aortic aneurysm (AAA) repair are suboptimal. We aimed to develop a reliable risk model for in-hospital mortality after intervention for AAA, using rigorous contemporary statistical techniques to handle missing data. Methods: Using data collected during a 15-month period in the United Kingdom National Vascular Database, we applied multiple methodology together with stepwise to generate preoperative and perioperative models of in-hospital mortality after AAA repair, using two thirds of the available data. Model performance was then assessed on the remaining third of the data by receiver operating characteristic curve analysis and compared with existing risk prediction models. Model calibration was assessed by Hosmer-Lemeshow analysis. Results: A total of 8088 AAA repair operations were recorded in the National Vascular Database during the study period, of which 5870 (72.6%) were elective procedures. Both preoperative and perioperative models showed excellent discrimination, with areas under the receiver operating characteristic curve of .89 and .92, respectively. This was significantly better than any of the existing models (area under the receiver operating characteristic curve for best comparator model, .84 and .88; P < .001 and P [ .001, respectively). Discrimination remained excellent when only elective procedures were considered. There was no evidence of miscalibration by Hosmer-Lemeshow analysis. Conclusions: We have developed accurate models to assess risk of in-hospital mortality after AAA repair. These models were carefully developed with rigorous statistical methodology and significantly outperform existing methods for both elective cases and overall AAA mortality. These models will be invaluable for both preoperative patient counseling and accurate risk adjustment of published outcome data. (J Vasc Surg 2015;61:35-43.)

Patients and politicians are driving the demand for a of incomplete data that do not adequately adjust for greater transparency in the reporting of surgical outcomes. individual patient risk. Publication of robust, risk-adjusted Surgeons are justifiably concerned about the publication data is vital to reassure patients and surgeons alike. Abdominal aortic aneurysm (AAA) disease is common, affecting around 5% of men aged 65 to 74 years in western From the Cambridge Vascular Unit, Cambridge University Hospitals NHS 1 Foundation Trust, Cambridgea; the Department of Surgery, North Bristol Europe, and the incidence increases with age. The annual NHS Trust, Southmead Hospital, Westbury-on-Trym, Bristolb;andtheSt risk of rupture for untreated AAA increases with size, reaching George’s Vascular Institute, St George’s, University of London, London.c around 30% for AAA >7cmindiameter.2 Aneurysm rupture G.K.A. was funded by a UK National Institute of Health Research academic carries a high in-hospital of at least 30% in recent clinical fellowship. studies for those patients reaching the hospital.3-5 Many of the Author conflict of interest: none. Presented in the plenary International Forum at the 2014 Vascular Annual risk factors for AAA, such as smoking, hypertension, obesity, Meeting of the Society for Vascular Surgery, Boston, Mass, June 5-7, 2014. and advanced age, are also associated with coronary artery dis- Additional material for this article may be found online at www.jvascsurg.org. ease,6 making this a group with high perioperative risk. Reprint requests: Graeme K. Ambler, MB BChir, BSc, PhD, MRCS, Accurate risk stratification is of paramount importance, Cambridge Vascular Unit, Cambridge University Hospitals NHS Foun- fi dation Trust, Hills Road, Cambridge, CB2 0QQ, UK (e-mail: graeme. primarily because patients need accurate quanti cation of [email protected]). the risks and benefits of elective intervention to make a The editors and reviewers of this article have no relevant financial relationships truly informed decision. Moreover, with the publication to disclose per the JVS policy that requires reviewers to decline review of any of surgeon-specific raw outcome data, there may be a fl manuscript for which they may have a con ict of interest. greater tendency to avoid intervention on high-risk patients 0741-5214 fi 7 Copyright Ó 2015 by the Society for Vascular Surgery. to protect outcome gures. Publication of reliable risk- http://dx.doi.org/10.1016/j.jvs.2014.06.002 adjusted outcomes offers a preferable approach. The value

35 JOURNAL OF VASCULAR SURGERY 36 Ambler et al January 2015 of an accurate risk model has been illustrated by the record procedures performed in United Kingdom National European System for Cardiac Operative Risk Evaluation Health Service hospitals and are used for reimbursement (EuroSCORE),8 pioneered and adopted by European purposes,16 although individual data fields for each case cardiac surgeons. The model has contributed to an are far less complete. The database used contained all improvement in cardiac surgical outcomes, in part by entries for AAA repair with a discharge or death date enabling resources to be correctly directed toward higher between February 1, 2010, and April 30, 2011. Approval risk patients. Risk-adjusted outcome modeling also allows to use the database for the purpose of the present study was robust comparative audit, which in turn can help identify authorized by the audit and quality improvement com- and disseminate areas of good practice and allow targeted mittee of the Vascular Society of Great Britain and Ireland, help when poorer performance is highlighted. which forms the oversight committee for the database. All A number of models are currently available for assessing patients consented to for the purposes of perioperative risk in AAA repair, including the Vascular service evaluation and quality improvement. The commit- Biochemistry and Haematology Outcome Model tee deemed that additional patient consent was not (VBHOM),9 Physiological and Operative Severity Score required, as the project fell within the scope of existing for the enUmeration of Mortality (POSSUM),10 a model consent. All data were anonymized before export. derived from data collected in the Medicare system (Medi- A total of 201 unique data variables are collected in the care),11 and the Vascular Governance North West model National Vascular Database for patients undergoing AAA (VGNW).12 None of these models provides the high intervention, including preoperative, intraoperative, and levels of both sensitivity and specificity associated with the postoperative fields. Of these, 63 variables related either EuroSCORE,13 and they have not been adopted widely to outcomes other than in-hospital mortality (such as in vascular surgical practice. One possible explanation of postoperative infection) or to procedure-specific fields why currently available AAA models are suboptimal is relevant only to open or endovascular repair but not that missing data plagues the databases used for generating both, leaving 138 variables for analysis (Fig 1). these models, making imputation of missing values neces- Model generation and validation subsets. The data sary. The reason for this is that to find optimal models, it were divided into separate model generation and validation is necessary to compare the way different models fit the sets to ensure that model validation could be performed on data. This is only possible if the is complete, which data that were independent from data used for model in practice is almost never the case. To date, the approaches generation. As model generation is a more complex process, used to deal with this problem have been simplistic, a pragmatic decision was made to divide the data into two including deleting cases with missing values or imputing thirds and one third for model generation and validation the / value for all missing items. Recent guide- subsets, respectively. As AAA interventions have evolved lines support the application of model-based approaches for significantly over time, every third case was moved to the missing data,14 with multiple imputation being the most model validation subset; the remaining cases were kept in extensively developed and deployed model-based tech- the model generation subset to avoid temporal . nique.15 The reasons for this are straightforward. First of Missing data. As appropriate handling of missing data all, simplistic approaches to handling the problem tend was considered crucial to the effectiveness of the study, to result in underestimation of the spread of the data simplistic approaches, such as simply removing all variables and so inflate estimates of significance. Second, simplistic or cases with missing values or imputing single mean or approaches all require that data are missing completely at median values for the missing values, were avoided as these random. Multiple imputation, by contrast, requires only have been shown to produce biased estimates or overly that the factors that affect the chances that data are missing optimistic confidence intervals.17 Instead, the technique are measured and included in the imputation model. In the of multiple imputation15 was used to generate 20 complete case of large databases with many measured variables, this versions of the data, using the Multiple Imputation with weaker assumption is much more likely to hold. Chained Equations (MICE) software version 2.1318 within The aim of this study was to develop a reliable risk the R package version 2.15.219 (R Core Team, model for in-hospital mortality after intervention for Vienna, Austria). This software uses the chained equations AAA, using rigorous contemporary statistical techniques approach based on Monte Carlo to generate to handle missing data. multiple complete data sets by modeling each variable as a function of multiple other variables that are well correlated METHODS when fields are complete. Study population. The United Kingdom National Multiple imputation relies on the assumption that data Vascular Database is a prospectively collected database are “missing at random.” This assumption that the containing clinical, demographic, and outcome data of value of a given variable is missing at random, once we patients undergoing key index vascular surgical procedures, take into account the values of the other variables measured, including open and endovascular AAA repair. Although and is much weaker than the assumption that data are data entry is entirely voluntary, it has been demonstrated “missing completely at random.” Considerable effort was that data entry rates exceed 90% of cases in most regions, made to improve the chances that the missing at random in comparison to Hospital Episode Statistics data, which assumption would hold by including a reasonable number JOURNAL OF VASCULAR SURGERY Volume 61, Number 1 Ambler et al 37

was used for binary or categorical variables. The built-in function “quickpred” was used to select variables with good predictive power for missing values. To reduce the computational complexity involved in imputation of values, variables with more than two thirds of values missing were excluded from the analysis before imputation. Statistical modeling. All modeling was performed in the R statistics package version 2.15.2, using built-in functions together with the MICE package version 2.13. Stepwise minimization of the Schwarz-Bayes criterion22 was used to generate optimal models of in-hospital mortality for each of the 20 imputed data sets using the stepAIC function.23 Continuous variables were allowed both linear and quadratic terms to better model quantities, such as blood pressure and pulse rate, where values that are too high and too low are detrimental. Variables were then selected for the final model if they were present in at least half of the 20 models.17 Standard combining rules were then used to calculate overall parameter estimates.17 It was thought to be desirable to generate two models, one with only preoperative variables (preoperative model) and one that included both preoperative and intraoperative variables (perioperative model), to aid in auditing both overall and postoperative care. This procedure was there- fore performed twice with slightly different variable sets. Model assessment. Performance of the models was based on their performance on the (completely distinct) vali- dation subset, and comparison was made to established models (POSSUM, VGNW, Medicare, VBHOM). We used receiver operating characteristic (ROC) curve anal- ysis,24 which assesses the tradeoff between sensitivity and specificity of a given test by varying the threshold and plot- Fig 1. Reasons for exclusion of variables. ting sensitivity against (1 specificity). The area under the curve (AUC) is often used to give a single number summary of variables in the imputation model (the median number of of these curves, with 1 representing perfect discrimination, predictor variables was 20) and by manually including vari- values above .8 usually interpreted as excellent discrimina- ables thought to be predictive of missingness. For example, tion, above .7 as good discrimination, and below .6 as poor it is known from previous work20 that patients who die dur- discrimination. A value of .5 represents performance no ing admission are more likely to have missing values, for a va- better than pure chance. Standard errors for AUC were riety of reasons. Thus, in-hospital mortality was included in calculated by DeLong’s method25 combined with Rubin’s all multiple imputation models. rules26 for calculating the of an estimate based on Our approach here is in sharp contrast to other recent multiple imputation data, enabling us to compare risk pre- attempts to generate outcome models for AAA repair, diction models and to calculate P values. In an attempt to which have discarded vast amounts of the available data detect bias introduced by the multiple imputation process, and used single-value imputation methods to impute values AUC was calculated both with the multiple imputed data where there were up to 15% missing values.21 Both single- sets and also by complete case analysis,27 where cases for value imputation methods and case deletion methods rely which scores cannot be calculated because of missing items on the much more stringent assumption that data are are deleted. The pROC package version 1.5.428 within the R missing completely at random. Single-value imputation statistics package was used to draw ROC curves and to methods also suffer from underestimation of standard er- calculate AUC. In addition to standard ROC curve analysis, rors and thus tend to artificially inflate estimates of we also calculated the net reclassification improvement29 for significance.17 patients classified as at low (<2%), moderate (2%-10%), and Essentially, multiple imputation involves the calcula- high (>10%) risk of in-hospital mortality by our new models tion of educated estimates for each missing value, based compared with the existing models and the integrated on the variables that are present. The predictive mean discrimination improvement.29 method15 was used for variables with many Calibration of the models was assessed by Hosmer- possible values, and logistic or multiple logistic regression Lemeshow statistics.30 This is a method that tests for JOURNAL OF VASCULAR SURGERY 38 Ambler et al January 2015

Table I. Basic demographic data comparing the Table III. Coefficients of the perioperative logistic modeling and validation data sets show no significant regression model, with percentages of missing values and differences between the groups P values from a

Modeling set Validation set P value Standard Coefficient error P value % Missing Elective 72.7% 72.5% .92a Unplanned 4.0% 4.0% 1.00a (Intercept) 7.4339 0.8755 <.0001 Emergency 23.3% 23.5% .89a AAA reoperation 1.6319 0.2392 <.0001 15 Mean age, years 74.6 74.9 .28b Admission , 0.6284 0.3318 .06 0.04 Male:female ratio 6.1:1 5.6:1 .18a unplanned Diabetes 12.4% 11.8% .49a Admission mode, 1.3532 0.1730 <.0001 0.04 Cardiac disease 42.% 42.4% .94a emergency Age, years 0.0467 0.0080 <.0001 0 For this simple analysis, missing values have been deleted. Creatinine, 0.0036 7.57 10 4 <.0001 19 aFisher exact test. mmol/L bWelch two-sample t-test. Intraoperative 0.5818 0.0691 <.0001 25 blood loss Table II. Coefficients of the preoperative logistic Lowest BP 0.0197 0.0033 <.0001 29 regression model, with percentages of missing values and (intraoperative) < P values from a Wald test ASA grade 0.3636 0.0789 .0001 16 Cardiac history 0.4601 0.1400 .001 13 Albumin, g/L 0.0286 0.0100 .005 45 Standard Highest pulse 4.55 10 5 1.42 10 5 .002 31 fi Coef cient error P value % Missing (intraoperative)

(Intercept) 7.1026 0.7650 <.001 AAA, Abdominal aortic aneurysm; ASA, American Society of Anesthesi- AAA reoperation 1.7691 0.2323 <.001 15 ologists; BP, blood pressure. Admission mode, 0.6877 0.3179 .031 0.04 Intraoperative blood loss was recorded in the database on a scale of 1 to 4, unplanned where 1 means <1 liter of blood loss, 2 means 1 to 2 liters of blood loss, Admission mode, 1.4731 0.1677 <.001 0.04 3 means 2 to 5 liters of blood loss, and 4 means >5 liters blood loss. emergency Lowest BP (intraoperative) was the lowest intraoperative systolic blood Age, years 0.0493 0.0076 <.001 0 pressure (in mm Hg). ASA grade was measured on the usual scale of 1 to Creatinine, 0.0035 7.49 10 4 <.001 19 5. Highest pulse (intraoperative) was the highest pulse rate measured mmol/L intraoperatively squared, as the quadratic term fitbetterthanasimple Lowest BP, 0.0307 0.0073 <.001 22 linear term. preoperativea 4 5 Lowest BP, 1.01 10 3.36 10 .003 22 used are given in Supplementary Table, online only. preoperativeb Cardiac history 0.4649 0.1364 .001 13 Reasons for exclusion of variables are presented in Fig 1. EVAR 0.9526 0.576 <.001 10 Modeling. Stepwise minimization of the Schwartz- ASA grade 0.5102 0.0763 <.001 16 Bayes criterion across the 20 multiple imputation White cell 0.0279 0.0119 .021 20 modeling data sets identified nine variables in the preop- 9 count, 10 /L erative model and 10 variables in the perioperative model. AAA, Abdominal aortic aneurysm; ASA, American Society of Anesthesi- Significant variables in both models were mode of ologists; BP, blood pressure; EVAR, endovascular aneurysm repair. admission, age, preoperative serum creatinine concentra- Lowest BP (preoperative) was the lowest systolic blood pressure measured tion, history of ischemic heart disease or cardiac failure, preoperatively (in mm Hg): alinear term; bquadratic term. ASA grade was scored 1 to 5. American Society of Anesthesiologists grade, and whether the procedure was a reoperation after previous AAA repair. poor calibration, so that small P values indicate poor For the preoperative model, additional significant variables calibration and imply that the model does not fit the data. were the lowest preoperative blood pressure, endovascular repair, and preoperative white cell count. For the periop- RESULTS erative model, additional significant variables were Patients and parameters. A total of 8088 operations the lowest intraoperative blood pressure, highest intra- were recorded in the National Vascular Database between operative heart rate, intraoperative blood loss, and serum February 1, 2010, and April 30, 2011. Two patients were albumin level. These are presented in Tables II and III excluded because of data inconsistencies, leaving a total of together with estimated values of the coefficients and 8086 cases. Of these, 5389 cases were used for model devel- Wald significance values, corrected for the degree of opment (modeling set), and the remaining 2694 cases were missing data and the Monte Carlo error from the multiple used for model validation (validation set), according to the imputation process. All variables improve model fit signif- protocol set out in the Methods. Basic demographic details icantly at the 5% level when evaluated against the modeling of the modeling and validation sets are shown in Table I. set. Because these are logistic regression models, the odds After application of the rules set out before, 49 of the ratio of in-hospital mortality is increased by a factor of 201 measured variables were suitable for inclusion in the exp(coefficient*(difference in variable)) for each variable in modeling phase of the analysis. Details of the 49 variables the model. For the preoperative model, a 10-year increase in JOURNAL OF VASCULAR SURGERY Volume 61, Number 1 Ambler et al 39

Table IV. Area under the curve (AUC) for the preoperative and perioperative Abdominal Aortic Aneurysm Statistically Corrected Operative Risk Evaluation (AAA SCORE) models compared with other risk scoring models, showing excellent performance both for the complete validation set (first column) and when attention is focused on elective AAA repair only (second column)

All cases Elective only

Preoperative AAA SCORE .894 (.011) .819 (.033) Perioperative AAA SCORE .917 (.010) .853 (.031) VBHOM9 .833 (.013) .735 (.037) VGNW12 .693 (.020) .702 (.042) Medicare11 $696 ($019) .722 (.041) POSSUM10 .880 (.013) .731 (.040) pPOSSUM10 .820 (.017) .669 (.043)

VBHOM, Vascular Biochemistry and Haematology Outcome Model; VGNW, Vascular Governance North West model; POSSUM, Physiological and Operative Severity Score for the enUmeration of Mortality; pPOSSUM, physiology-only POSSUM. Fig 3. Receiver operating characteristic (ROC) curves for the Abdominal Aortic Aneurysm Statistically Corrected Operative Risk Evaluation (AAA SCORE) perioperative model compared with other perioperative risk scoring models: Vascular Biochemistry and Haematology Outcome Model (VBHOM),9 Medicare,11 Physio- logical and Operative Severity Score for the enUmeration of Mortality (POSSUM),10 Vascular Governance North West model (VGNW).12 Curves shown are the mean curve from the 20 multiply imputed validation data sets. Values for the area under the curves (AUCs) are shown in Table IV.

Risk prediction models based on the validation set showed an AUC of .89 (95% confidence interval, 0.87-0.92) for the preoperative model and .92 (95% confi- dence interval, 0.90-0.94) for the perioperative model, which was superior to other established models (P < .001 and P ¼ .001, respectively) (Table IV; Figs 2 and 3). For elective cases only, the models showed an AUC of .82 (95% confidence interval, 0.75-0.88) for the preopera- tive model and .85 (95% confidence interval, 0.79-0.91) for the perioperative model. The models also outperformed current models (P ¼ .02 and P ¼ .001, respectively) Fig 2. Receiver operating characteristic (ROC) curves for the (Table IV; Figs 4 and 5). Abdominal Aortic Aneurysm Statistically Corrected Operative Risk There was a significant improvement in classification Evaluation (AAA SCORE) preoperative model compared with into low-risk (<2%), medium-risk (2%-10%), and high-risk other preoperative risk scoring models: Vascular Biochemistry (>10%) categories associated with use of both the and Haematology Outcome Model (VBHOM),9 Medicare,11 physiology-only Physiological and Operative Severity Score for preoperative and perioperative Abdominal Aortic Aneurysm the enUmeration of Mortality (pPOSSUM),10 Vascular Gover- Statistically Corrected Operative Risk Evaluation (AAA nance North West model (VGNW).12 Curves shown are the mean SCORE) models compared with the existing models. The curve from the 20 multiply imputed validation data sets. Values for net reclassification index was 0.232 (95% confidence inter- the area under the curves (AUCs) are shown in Table IV. val, 0.134-0.330; P < .001) in comparing the preoperative AAA SCORE to the best of the existing preoperative models (the physiology-only POSSUM model). In comparing the patient age therefore increases the odds of in-hospital perioperative AAA SCORE model to the best of the existing mortality by a factor of exp(0.0493*10) ¼ 1.6. In con- perioperative models (the POSSUM model), there was trast to this modest increase, endovascular repair reduces the again improvement in classification (net reclassification odds of in-hospital mortality by a factor of 2.6, and emer- index ¼ 0.079; 95% confidence interval, 0.004-0.153; P ¼ gency repair increases the odds of in-hospital mortality by a .02). There was also a positive integrated discrimination factor of almost 4 compared with elective repair. improvement in comparing the AAA SCORE model with JOURNAL OF VASCULAR SURGERY 40 Ambler et al January 2015

Fig 4. Receiver operating characteristic (ROC) curves for the Fig 5. Receiver operating characteristic (ROC) curves for the Abdominal Aortic Aneurysm Statistically Corrected Operative Risk Abdominal Aortic Aneurysm Statistically Corrected Operative Risk Evaluation (AAA SCORE) preoperative model compared with other Evaluation (AAA SCORE) perioperative model compared with preoperative risk scoring models for elective cases only: Vascular other perioperative risk scoring models for elective cases only: Biochemistry and Haematology Outcome Model (VBHOM),9 Vascular Biochemistry and Haematology Outcome Model Medicare,11 physiology-only Physiological and Operative Severity (VBHOM),9 Medicare,11 Physiological and Operative Severity Score for the enUmeration of Mortality (pPOSSUM),10 Vascular Score for the enUmeration of Mortality (POSSUM),10 Vascular Governance North West model (VGNW).12 Curves shown are the Governance North West model (VGNW).12 Curves shown are the mean curve from the 20 multiply imputed validation data sets. Values mean curve from the 20 multiply imputed validation data sets. for the area under the curves (AUCs) are given in Table IV. Values for the area under the curves (AUCs) are given in Table IV. all other models, and this improvement was significant for all for counseling patients about individualized risk from sur- comparisons other than between the perioperative AAA gical repair. This model could also be used to calculate SCORE and the POSSUM model. risk-adjusted outcomes in a manner similar to the Hospital To assess model calibration, the data were divided into Standardised Mortality Ratio or Summary Hospital-level deciles according to the risk predicted by the AAA SCORE Mortality Indicator data to provide a meaningful assess- models. Observed and expected mortality rates were then ment of performance within national surgeon-specific compared in the 20 multiply imputed data sets. There outcome publications. were no significant differences between these values The models developed in this study outperformed all (Hosmer-Lemeshow P values of .33 and .35 for the preop- other available risk prediction tools for AAA intervention erative and perioperative models, respectively, details pre- and provided excellent discrimination. Another recent sented in Table V). In contrast, all of the other models aneurysm risk model was also developed with the same Na- were poorly calibrated (Hosmer-Lemeshow P < .001 for tional Vascular Database.21 However, this model consid- all of the other models assessed). ered only elective cases and used simplistic approaches to handle missing data, as mentioned before. As this paper DISCUSSION used all of the elective data from both our model genera- There is a pressing need for robust outcome models tion and model validation subsets for model generation, that adjust for patient risk with the publication of hospital- we have not included data for these comparisons in this pa- and surgeon-level results.31 Without adequate adjustment, per as the results would be subject to bias. With this caveat there is a real danger that vascular surgeons will become in mind, in comparing performance of this model against risk averse and potentially avoid AAA repair in higher risk the AAA SCORE on the model validation subset, the patients. This study has demonstrated that it is possible AAA SCORE again appears to significantly outperform to develop risk prediction models for AAA repair with this model (data not shown). high predictive power by multiple imputation methodol- There has been some criticism recently of using the area ogy to handle missing data in a large prospective database. under the ROC curve as the main method for model discrim- Although the models were developed with data collected ination, with the charge that for a score to be clinically useful, for both emergency and elective AAA repair, they perform the most important criterion is that it should be better than very well on the elective subset, meaning that the same existing scores at classifying patients into a small number models can be used in both elective and emergency settings of arbitrary categories (often only three: low, moderate, JOURNAL OF VASCULAR SURGERY Volume 61, Number 1 Ambler et al 41

Table V. Calibration assessment of the Abdominal Aortic Aneurysm Statistically Corrected Operative Risk Evaluation (AAA SCORE) models with 10 equally spaced deciles showing excellent agreement between observed and predicted mortality

Preoperative AAA SCORE Perioperative AAA SCORE

Risk decile Observed Expected Total cases Observed Expected Total cases

0%-10% 49 49 2152 43 48 2196 10%-20% 27 25 179 23 23 156 20%-30% 35 29 119 23 21 86 30%-40% 22 27 77 25 26 74 40%-50% 28 27 60 26 22 50 50%-60% 22 25 46 23 24 44 60%-70% 16 15 23 18 18 28 70%-80% 14 15 20 19 20 27 80%-90% 11 13 16 18 19 23 90%-100% 3 3 3 7 9 9

Values are the mean values across the 20 multiply imputed validation sets. and high risk).29 However, the dependence that this places the baseline risk of procedures undertaken, these statistics on arbitrarily chosen cutoffs makes interpretation of these are at best inaccurate and potentially misleading. The risk reclassification methods difficult, as a model that optimally adjustment model presented in this study provides a means reclassifies patients according to one grouping will not neces- to calculate this baseline risk accurately. sarily perform well when the boundaries of these categories A strength of this study was the generality of the meth- are changed. Nevertheless, we have shown that the AAA odology. The rigorous handling of missing data means that SCOREs perform well when judged in this way. other surgical audit committees could adopt this approach During the development of the AAA SCORE, we to develop rigorous, accurate models for risk adjustment of considered whether it might be best to stratify models ac- their outcomes. The missing data approach allows far more cording to type of repair (endovascular or open); however, of the data set to be used than would be possible with we discovered that such stratified models did not offer simpler techniques, such as discarding variables or using significantly improved predictive power compared with simple mean or mode imputation, and allows calculation including type of repair as a predictor. The additional parsi- of baseline risk for all surgeons. Moreover, there is greater mony offered by restricting the number of models caused reassurance that results are valid, as the modeling tech- us to choose the approach presented in the current work. niques used are known to provide unbiased estimates The potential impact of the models generated in our with more accurate estimates of uncertainty.17 study is large. Using the example of cardiac surgery, once Although the multiple imputation methodology the EuroSCORE became widely available, providing accu- provides a solution to the problem of missing data in large rate risk prediction for cardiac surgery, it was rapidly adop- databases, it is not a panacea and relies on the assumption ted in clinical practice to aid decision making, patient that data are missing at random, in the technical sense counseling, and clinical audit. It has been suggested that explained in the Methods. We made a significant effort to development of EuroSCORE contributed significantly to include parameters in the imputation models that were the large improvements in cardiac surgical outcomes that likely to be associated with the likelihood of variables being have been seen during the past decade.32 Now that there missing, such as in-hospital mortality. However, it is not exists a model with similar predictive power for AAA repair, possible to guarantee that unknown and unmeasured factors there is no reason that it may not be similarly adopted. do not determine the chances of data being missing, leading With modern technologies, it would be straightforward to subtle violations of the missing at random assumption. to implement a calculator within a website or a smart Clearly, avoiding missing data by comprehensive data input phone application, allowing calculations to be done in sec- is preferable. One of the main reasons for the presence of onds. In addition to the clinical applications of these missing data in large databases is thought to be the burden models, there are significant implications for outcome created by an excessive number of data fields. This study audit. Within the United Kingdom and in many other has demonstrated that it is possible to generate accurate countries, there is an inexorable drive toward the publica- risk prediction models from a small data set of 10 preop- tion of unit- and surgeon-specific outcomes. In the United erative and three intraoperative variables, including the Kingdom, unit-level elective aneurysm mortality results type of repair. The Vascular Society of Great Britain and were published in 201216 and surgeon-level results were Ireland is in the process of commissioning a new National published in 2013.31 The format of the outcome publica- Vascular Registry.33 It is hoped that the adoption of a far tions has been criticized because of concerns about rudi- smaller data set will encourage more complete reporting mentary risk adjustment7 and poor capacity to detect in the future. In fact, with this in mind, the new National poor surgical performance. Without an appreciation of Vascular Registry does not permit cases to be fully JOURNAL OF VASCULAR SURGERY 42 Ambler et al January 2015

uploaded until a minimum subset of fields have been 3. Verhoeven EL, Kapma MR, Groen H, Tielliu IF, Zeebregts CJ, completed. Bekkema F, et al. Mortality of ruptured abdominal aortic aneurysm A potential weakness of this study is that both the treated with open or endovascular repair. J Vasc Surg 2008;48: 1396-400. modeling and validation sets were derived from the same 4. Reimerink JJ, Hoornweg LL, Vahl AC, Wisselink W, van den original database. However, as both data sets were large Broek TA, Legemate DA, et al. Endovascular repair versus open repair and contained data from the whole of the United of ruptured abdominal aortic aneurysms: a multicenter randomized Kingdom, results are likely to be applicable to future cases controlled trial. Ann Surg 2013;258:248-56. 5. Ambler GK, Twine CP, Shak J, Rollins KE, Varty K, Coughlin PA, within the United Kingdom and also to other countries et al. Survival following ruptured abdominal aortic aneurysm before with similar health care systems and populations. and during the IMPROVE trial: a single centre series. Eur J Vasc Before these risk prediction models can be wholeheart- Endovasc Surg 2014;47:388-93. edly endorsed, more studies are required to provide further 6. Nordon IM, Hinchliffe RJ, Loftus IM, Thompson MM. Pathophysi- external and independent validation. Efforts to perform ology and of abdominal aortic aneurysms. Nat Rev Cardiol 2011;8:92-102. this validation are currently in progress. In addition, in- 7. Walker K, Neuburger J, Groene O, Cromwell DA, van der Meulen J. hospital mortality is the sole outcome measure used in Public reporting of surgeon outcomes: low numbers of procedures this model, primarily because this is recorded accurately lead to false complacency. Lancet 2013;382:1674-7. and easily verifiable. However, other significant outcomes 8. Nashef S, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation include myocardial infarction, renal failure, and treatment (EuroSCORE). Eur J Cardiothorac Surg 1999;16:9-13. success. Although more challenging, an optimal modeling 9. Tang T, Walsh SR, Prytherch DR, Lees T, Varty K, Boyle JR. tool would also include these outcome events. VBHOM, a data economic model for predicting the outcome after open abdominal aortic aneurysm surgery. Br J Surg 2007;94: 717-21. CONCLUSIONS 10. Prytherch DR, Sutton G, Boyle JR. Portsmouth POSSUM models for We have developed accurate models to predict risk of abdominal aortic aneurysm surgery. Br J Surg 2001;88:958-63. ’ in-hospital mortality after AAA repair preoperatively and 11. Giles K, Schermerhorn M, O Malley A. Risk prediction for periopera- tive mortality of endovascular versus open repair of abdominal aortic postoperatively. These models were carefully developed aneurysms using the Medicare population. J Vasc Surg 2009;50: with rigorous statistical methodology, allowing clinicians 256-62. to have confidence in the risks they calculate. They signif- 12. Grant SW, Grayson AD, Purkayastha D, Wilson SD, McCollum C. icantly outperform existing models for both elective and Logistic risk model for mortality following elective abdominal aortic aneurysm repair. Br J Surg 2011;98:652-8. emergency, open and endovascular AAA repair. These 13. Grant SW, Grayson AD, Mitchell DC, McCollum CN. Evaluation of five models are powerful tools for both routine clinical practice risk prediction models for elective abdominal aortic aneurysm repair and clinical audit and are worthy of further validation and using the UK National Vascular Database. Br J Surg 2012;99:673-9. adoption. 14. Little RJ, D’Agostino RB, Cohen M, Dickersin K, Emerson SS, Farrar JT, et al. The prevention and treatment of missing data in clinical The authors would like to thank the members of the trials. N Engl J Med 2012;367:1355-60. Vascular Society of Great Britain and Ireland Audit and 15. Little RJ, Rubin DB. Statistical analysis with missing data. 2nd ed. New Quality Improvement committee, in particular Mr V. York: John Wiley & Sons; 2002. 16. Mitchell DC, Hindley H, Naylor R, Wyatt M, Loftus IM. Outcomes Smyth and Professor M. Thompson, for their helpful com- after elective repair of infra-renal abdominal aortic aneurysm. The ments during the preparation of this work. Vascular Society of Great Britain and Ireland; March 2012. 17. van Buuren S. Flexible imputation of missing data. Boca Raton, FL: CRC Press; 2012. AUTHOR CONTRIBUTIONS 18. van Buuren S, Groothuis-Oudshoorn K. Mice: multivariate imputation Conception and design: GA, MG, DM, JB by chained equations in R. J Stat Softw 2011;45:1-67. 19. R Core Team. R: a language and environment for statistical computing. Analysis and interpretation: GA, MG, IL, JB Vienna, Austria 2012. Data collection: GA 20. Baxter PD, Fleming TJ, West RM, Gilthorpe MS, Lees TA, Mitchell Writing the article: GA, MG, JB DC, et al. Assessment of National Vascular Database (NVD) quality. Critical revision of the article: GA, MG, DM, IL, JB Presented at AAA Research Meeting, Gloucester; 2011. Available at: http://aaa.screening.nhs.uk/researchmeetings. Accessed May 21, Final approval of the article: GA, MG, DM, IL, JB 2014. Statistical analysis: GA 21. Grant SW, Hickey GL, Grayson AD, Mitchell DC, McCollum CN. Obtained funding: GA National risk prediction model for elective abdominal aortic aneurysm Overall responsibility: JB repair. Br J Surg 2013;100:645-53. 22. Schwarz G. Estimating the dimension of a model. Ann Stat 1978;6: 461-4. REFERENCES 23. Venables WN, Ripley BD. Modern applied statistics with S. 4th ed. New York: Springer; 2002. 1. Anjum A, Powell JT. Is the incidence of abdominal aortic aneurysm 24. Fawcett T. An introduction to ROC analysis. Pattern Recogn Lett declining in the 21st century? Mortality and hospital admissions for 2006;27:861-74. England & Wales and Scotland. Eur J Vasc Endovasc Surg 2012;43: 25. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas 161-6. under two or more correlated receiver operating characteristic curves: a 2. Lederle FA, Johnson GR, Wilson SE, Ballard DJ, Jordan WD Jr, nonparametric approach. Biometrics 1988;44:837-45. Blebea J, et al. Rupture rate of large abdominal aortic aneurysms in 26. Rubin DB. Multiple imputation for nonresponse in surveys. New York: patients refusing or unfit for elective repair. JAMA 2002;287:2968-72. John Wiley & Sons; 1987. JOURNAL OF VASCULAR SURGERY Volume 61, Number 1 Ambler et al 43

27. Fichman M, Cummings JN. Multiple imputation for missing data: 31. Watson S, Johal A, Groene O, Cromwell D, Mitchell DC, Loftus IM. making the most of what you know. Organ Res Meth 2003;6: 2013 Report on surgical outcomes consultant-level statistics. London: 282-308. Royal College of Surgeons of England; 2013. 28. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. 32. Nashef SA. The new EuroSCORE project. Kardiol Pol 2010;68:128-9. pROC: an open-source package for R and Sþ to analyze and compare 33. National Vascular Registry. London: Healthcare Quality Improvement ROC curves. BMC 2011;12:77. Partnership; 2009-2014. Available at: http://www.hqip.org.uk/ 29. Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Eval- national-vascular-registry. Accessed March 13, 2014. uating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008;27: Submitted Mar 14, 2014; accepted Jun 2, 2014. 157-72. 30. Hosmer DW, Lemeshow S. Applied logistic regression. New York: Additional material for this article may be found online Wiley; 2000. at www.jvascsurg.org. JOURNAL OF VASCULAR SURGERY 43.e1 Ambler et al January 2015

Supplementary Table (online only). Variables considered for inclusion in the Abdominal Aortic Aneurysm Statistically Corrected Operative Risk Evaluation (AAA SCORE), with the amount of missing data for each variable

Variable % Missing Options

Admission mode 7.9 Elective, unplanned, emergency Age at admission 0.0 Albumin, g/L 45.5 Anesthetic 14.9 General, epidural/spinal, local/regional Aneurysm: symptomatic? 19.7 No, yes Antiplatelet agent 19.9 No, yes ASA grade 16.9 Normal and healthy, mild systemic disease, severe systemic disease, incapacitating disease, moribund Beta blocker 22.7 No, yes Cardiac history: any IHD or CCF 12.9 No, yes Creatinine, mmol/L 19.6 Current smoker (up to within 2 months) 14.7 No, yes Diabetes 12.8 No, yes Diastolic BP, mm Hg 35.0 ECG 22.5 Normal, AF 60-90, AF >90, >5 ectopic beats/min, Q or ST/T changes, other abnormal rhythm/ changes EVAR 10.6 No, yes Exposure to intravenous contrast material in 3 days 32.2 No, yes preoperatively Patient from local small AAA surveillance 21.5 No, yes Gender 0.2 Male, female Glasgow Coma Scale score 13.4 Grade of anesthetist 19.7 Consultant, staff grade, trainee Grade of most senior surgeon 7.8 Consultant, staff grade, trainee Hemoglobin, g/dL 19.2 In-hospital mortality 0.1 No, yes International normalized ratio 44.0 Intraoperative blood loss 25.6 #11, 1-21, 2-51, >51 Intraoperative highest pulse 31.4 Intraoperative lowest systolic BP 29.7 Loss of consciousness prior to theatre 9.8 No, yes Maximum diameter, cm 20.3 Patient taking a statin 24.6 No, yes Operation duration 0.1 Patient in AAA screening program 20.8 No, yes Positive preoperative MRSA screen in last 3 months 25.4 No, yes, not done Potassium, mmol/L 20.4 Preoperative highest pulse 23.7 Preoperative lowest systolic BP 22.5 Previous aortic surgery/stent 15.5 No, yes Procedure code 0.0 Reoperation after previous AAA repair 14.8 No, yes Renal dialysis 14.4 No, yes Renal transplantation 14.5 No, yes Sodium, mmol/L 20.2 Statin 19.0 No, yes Suitable for EVAR 21.6 No, yes Transfer from another hospital 13.7 No, yes Urea, mmol/L 22.9 Volume of nonsaline crystalloid, mL 62.1 White cell count, 109/L 20.9 Wound class 6.2 Clean, clean/contaminated, contaminated, dirty

AAA, Abdominal aortic aneurysm; AF, atrial fibrillation; ASA, American Society of Anesthesiologists; BP, blood pressure; CHF, congestive heart failure; ECG, electrocardiography; EVAR, endovascular aneurysm repair; IHD, ischemic heart disease; MRSA, methicillin-resistant Staphylococcus aureus.