Prognostication in comatose survivors of cardiac arrest. An Advisory Statement from the European Resuscitation Council and the European Society of Intensive Care Medicine

Intensive Care Medicine, 2014.

Claudio Sandroni, Alain Cariou, Fabio Cavallaro, Tobias Cronberg, Hans Friberg, Cornelia Hoedemaekers, Janneke Horn, Jerry P. Nolan, Andrea O. Rossetti and Jasmeet Soar

Corresponding author:

Claudio Sandroni

Department of Anaesthesiology and Intensive Care

Catholic University School of Medicine,

Largo Gemelli, 8 - 00168 Rome, Italy [email protected]

ESM Appendix 2 - Method for grading the quality of evidence

Quality of evidence

According to GRADE, the quality of evidence is graded as high, moderate, low or very low according to the study design and to the presence of the following factors: 1) limitations; 2) indirectness; 3) inconsistency; 4) imprecision and 5) publication bias. Publication bias was not considered, given the difficulty of measuring it in prognostic studies [1].

Study design

In the GRADE system the ideal study design for informing recommendations is a randomised trial; observational studies are deemed low-level evidence [2]. However, valid studies of test accuracy may start as high quality [3]. These studies involve a comparison between the test under consideration and an appropriate reference. For diagnostic accuracy studies this reference is the test considered to be the gold standard, while for prognostic accuracy studies the comparison is made between the predicted outcome and the real outcome of the patient at a given time point, assessed by blinded evaluators.

Limitations

Limitations (risk of bias), indirectness, inconsistency and imprecision decrease the quality of evidence by one level if serious, or by two levels if very serious.

Given the importance of the risk of self-fulfilling prophecy, limitations were graded as serious when the treating team was not blinded to the results of the predictor of poor outcome that was being studied, and very serious when the investigated predictor was used to decide on withdrawal of life-sustaining treatment (WLST). Other factors that were considered when evaluating the presence of limitations were: blinding of outcome evaluators; exclusion of non-neurological causes of death (or description of the best Cerebral Performance Category [CPC] in patients who died at the end of the study period); exclusion of previous neurological disease; exclusion of sedation (for indices based on clinical examination or EEG); exclusion of patients receiving neuromuscular blocking drugs (for indices based on clinical examination); length of follow-up.
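The downgrading rule and the two explicit criteria for limitations can be sketched in Python as follows. This is a minimal illustration, not the panel's actual procedure. The function names are hypothetical, and two points are assumptions made for the example: that accuracy studies start at high quality, and that downgrades across factors add up and stop at very low. The other limitation factors listed above are not modelled.

```python
# Quality levels ordered from worst to best.
LEVELS = ["very low", "low", "moderate", "high"]

# Number of levels lost for each rating of a factor.
PENALTY = {"not serious": 0, "serious": 1, "very serious": 2}


def grade_limitations(team_blinded: bool, used_for_wlst: bool) -> str:
    """Rate limitations using only the two explicit rules in the text."""
    if used_for_wlst:
        # The predictor was used to decide on WLST.
        return "very serious"
    if not team_blinded:
        # The treating team knew the result of the predictor.
        return "serious"
    return "not serious"


def quality_of_evidence(ratings: dict, start: str = "high") -> str:
    """Downgrade from `start` by the summed penalties.

    Assumption: penalties are cumulative and the result is floored
    at "very low".
    """
    drop = sum(PENALTY[rating] for rating in ratings.values())
    return LEVELS[max(0, LEVELS.index(start) - drop)]


ratings = {
    "limitations": grade_limitations(team_blinded=False, used_for_wlst=False),
    "indirectness": "not serious",
    "inconsistency": "not serious",
    "imprecision": "not serious",
}
print(quality_of_evidence(ratings))
```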

Indirectness

Indirectness was deemed present when the described outcome did not completely correspond to that described in the inclusion criteria; in practice, when the poor outcome was defined as CPC 4-5 (i.e., vegetative state or death) instead of 3-5 (severe neurological disability, vegetative state, or death).

Inconsistency

Inconsistency across studies was evaluated after pooling. Inconsistency was graded as serious when heterogeneity was significant (p<0.1 or I²>50%) for either sensitivity or specificity, and it was graded as very serious when heterogeneity was significant for both of them.
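The inconsistency rule can be written as a small Python function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, I² is passed as a percentage, and the heterogeneity p-values and I² values are taken as already computed by whatever pooling software is used.

```python
def heterogeneity_significant(p_value: float, i_squared: float) -> bool:
    """Heterogeneity counts as significant when p < 0.1 or I² > 50%."""
    return p_value < 0.1 or i_squared > 50.0


def grade_inconsistency(sens_p: float, sens_i2: float,
                        spec_p: float, spec_i2: float) -> str:
    """Serious if either measure is heterogeneous, very serious if both."""
    count = (heterogeneity_significant(sens_p, sens_i2)
             + heterogeneity_significant(spec_p, spec_i2))
    return ("not serious", "serious", "very serious")[count]


# Heterogeneous sensitivity only.
print(grade_inconsistency(sens_p=0.03, sens_i2=62.0, spec_p=0.40, spec_i2=0.0))
```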

Imprecision

When evaluating the accuracy of predictors of poor outcome in critically ill patients, avoiding false positives (i.e., falsely pessimistic predictions) is particularly important. Ideally, the false positive rate (FPR) should be zero. However, even a zero FPR has little value when the precision of its estimate is low, i.e., when the point estimate has a wide confidence interval (CI). Imprecision was therefore graded as serious when the upper limit of the 95% CI of the FPR estimate was greater than 5%, and it was graded as very serious when this value was more than 10%. CIs were calculated using the F distribution method, according to Blyth [4].
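The imprecision thresholds can be illustrated with a short Python sketch. It assumes that the F distribution method refers to the exact (Clopper-Pearson) binomial interval, whose limits can be expressed through F quantiles. To stay within the standard library, the sketch solves the equivalent binomial tail equations by bisection instead of calling an F quantile function. All function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target: float) -> float:
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, level: float = 0.95):
    """Exact two-sided CI for a proportion x/n (assumed method)."""
    alpha = 1 - level
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


def grade_imprecision(fp: int, tn: int) -> str:
    """Grade by the upper 95% limit of FPR = FP / (FP + TN)."""
    _, upper = clopper_pearson(fp, fp + tn)
    if upper > 0.10:
        return "very serious"
    if upper > 0.05:
        return "serious"
    return "not serious"


for tn in (28, 71, 72):
    print(tn, grade_imprecision(fp=0, tn=tn))
```

Under these assumptions, a test with no observed false positives needs at least 72 patients with a good outcome before the upper 95% limit of the FPR falls below 5%.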

Recommendations

GRADE has adopted a four-category classification for recommendations. A recommendation can be for or against a given management approach and its strength can be strong or weak.

Criteria for determining the strength of a recommendation

The four domains that contribute to the strength of a GRADE recommendation are [5]:

1) The balance between desirable and undesirable outcomes;

2) The confidence and variability in values and preferences that patients (or population) apply to those outcomes;

3) The confidence in the magnitude of the estimates of effect (i.e., the quality of evidence);

4) The resource use, i.e. the cost of the strategy under evaluation.

Domain 1 - The balance between desirable and undesirable effects for a test predicting poor outcome depends not only on the quality of these effects, but also on the test performance (i.e. on the balance between true and false predictions given by that test). Table 1 below summarizes the patient-important outcomes associated with the four possible results of a test for predicting poor neurological outcome.

Sensitivity (TP/[TP+FN]) and specificity (TN/[TN+FP]) of a test summarize the balance between the desirable and the undesirable test results and help to assess the balance between alternative treatment strategies based on them. When our panel was highly confident of the balance between desirable and undesirable consequences, we made a strong recommendation for (desirable outweighs undesirable) or against (undesirable outweighs desirable) a prognostication strategy. If we were less confident of the balance between desirable and undesirable consequences, we offered a weak recommendation.

Table 1. Expected outcomes of results of a test predicting poor neurological outcome in a patient who is comatose after having been resuscitated from cardiac arrest

Population: Adult patients who are comatose after resuscitation from cardiac arrest
Intervention: Prognostic test under evaluation
Comparison: Standard care (no prognostication)
Outcomes: listed below for the patient, for the family and for the community

TP
For the patient: The patient is correctly predicted to have a poor outcome and in most cases will undergo limitations or withdrawal of life-sustaining treatment (WLST). Inappropriate treatments will be avoided.
For the family: Family stress due to uncertainty about patient outcome will be avoided.
For the community: Cost reduction since unnecessary diagnostic procedures or treatments will be avoided.

TN
For the patient: The patient is correctly predicted to have a good neurological outcome (1). Appropriate treatments will be continued even in the presence of persistent unresponsiveness (2).
For the family: Knowing there is a reasonable chance of recovery will comfort the patient’s family.
For the community: Appropriate use of resources in patients with a reasonable chance of recovery.

FP
For the patient: The patient is predicted to have a poor outcome but will ultimately recover with only mild or no neurological sequelae. Risk of inappropriate treatment limitation or WLST and consequent poor outcome because of a falsely pessimistic prediction (self-fulfilling prophecy).
For the family: Unnecessary suffering for patient’s family caused by a falsely pessimistic prediction.
For the community: Burden of death secondary to incorrect prognostication and inappropriate WLST.

FN
For the patient: The patient will continue to receive treatment despite having an eventually poor outcome, due to a prediction of good neurological recovery. Risk of unnecessary prolonged treatment in a patient with irreversible brain injury.
For the family: Stress and suffering for patient’s family because of unrealised hope of good recovery.
For the community: Resources spent on treating a patient with no chance of recovery.

Complications of the test and resource utilization (cost): They could be relevant for some tests, such as MRI.

Notes
(1) Interference from residual sedation or paralysis may cause persistent unconsciousness in patients with no or minimal brain damage.
(2) Correct prediction of good neurological outcome will not rule out subsequent death due to cardiovascular complications or multiorgan failure, which tend to occur mostly in the early post resuscitation phase.
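The four test results in Table 1 map directly onto the accuracy measures used in this appendix. The Python sketch below computes sensitivity, specificity and the false positive rate from the four counts. The class name and the example counts are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionCounts:
    """Counts of the four possible results of a test predicting poor outcome."""
    tp: int  # poor outcome predicted, poor outcome observed
    fp: int  # poor outcome predicted, good outcome observed
    tn: int  # good outcome predicted, good outcome observed
    fn: int  # good outcome predicted, poor outcome observed

    @property
    def sensitivity(self) -> float:
        # TP / (TP + FN); undefined if no patient had a poor outcome
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # TN / (TN + FP); undefined if no patient had a good outcome
        return self.tn / (self.tn + self.fp)

    @property
    def false_positive_rate(self) -> float:
        # FP / (FP + TN), which equals 1 - specificity
        return self.fp / (self.fp + self.tn)


counts = PredictionCounts(tp=30, fp=2, tn=48, fn=20)
print(counts.sensitivity, counts.specificity, counts.false_positive_rate)
```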

Domain 2 - The confidence in values and preferences that patients or a community apply to the consequences and outcomes described above, and the knowledge of their variability. Given the paucity of relevant studies on patients’ values and preferences, the panel made assumptions on this and refined them through discussion and consensus.

Domain 3 - The confidence in the magnitude of the estimates of effect corresponds to the quality of evidence of the test, which can be found in the relevant Evidence Profile Tables. The higher the quality of evidence, the more likely a strong recommendation is warranted [5].

Domain 4 - The higher the costs of a prognostic test, the less likely a strong recommendation is warranted. For example, additional costs may be low in the case of a clinical examination but may be high for MRI. Domain 4 includes other variables, such as complications (e.g., a test that can be performed only outside the ICU may imply an additional risk for the patient) and feasibility (e.g., based on the availability of a test).

Process for grading evidence and recommendations

The process included the following steps:

1. Guideline panel members rated the relative importance of all outcomes and consequences using Table 1 as a basis. This was done a priori, i.e., before examining the Evidence Profile tables of the various predictors. Ratings were assigned on a Likert scale from a minimum value (informative but not important for decision making) to a maximum (critical for decision making).

2. The panel reviewed the Evidence Profile tables of all predictors to be included in the recommendations, in order to evaluate the balance between favourable and unfavourable outcomes. These tables included the timing of the test, its performance (sensitivity, specificity, false positive rates and their respective confidence intervals), whether the test has been used for decisions of WLST, and the quality of evidence.

3. Finally, the panel members voted on the overall strength of recommendation for included predictors. This was accomplished using web-based survey software (SurveyMonkey®, www.surveymonkey.com) followed by reports and web-based discussion, when needed.

References

1. Rifai N, Altman DG, Bossuyt PM (2008) Reporting bias in diagnostic and prognostic studies: time for action. Clinical Chemistry 54: 1101-1103
2. Balshem H, Helfand M, Schunemann HJ, Oxman AD, Kunz R, Brozek J, Vist GE, Falck-Ytter Y, Meerpohl J, Norris S, Guyatt GH (2011) GRADE guidelines: 3. Rating the quality of evidence. Journal of Clinical Epidemiology 64: 401-406
3. Schunemann HJ, Oxman AD, Brozek J, Glasziou P, Jaeschke R, Vist GE, Williams JW Jr, Kunz R, Craig J, Montori VM, Bossuyt P, Guyatt GH (2008) Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ 336: 1106-1110
4. Blyth CR (1986) Approximate Binomial Confidence Limits. Journal of the American Statistical Association 81: 843-855
5. Andrews JC, Schunemann HJ, Oxman AD, Pottie K, Meerpohl JJ, Coello PA, Rind D, Montori VM, Brito JP, Norris S, Elbarbary M, Post P, Nasser M, Shukla V, Jaeschke R, Brozek J, Djulbegovic B, Guyatt G (2013) GRADE guidelines: 15. Going from evidence to recommendation-determinants of a recommendation's direction and strength. Journal of Clinical Epidemiology 66: 726-735