A Time-Updated, Parsimonious Model to Predict AKI in Hospitalized Children
CLINICAL RESEARCH www.jasn.org

Ibrahim Sandokji,1,2 Yu Yamamoto,2 Aditya Biswas,2 Tanima Arora,2 Ugochukwu Ugwuowo,2 Michael Simonov,2 Ishan Saran,2 Melissa Martin,2 Jeffrey M. Testani,3 Sherry Mansour,2 Dennis G. Moledina,2 Jason H. Greenberg,1,2 and F. Perry Wilson2

1Department of Pediatrics, Section of Nephrology, Yale University School of Medicine, New Haven, Connecticut
2Clinical and Translational Research Accelerator, Department of Medicine, Yale University School of Medicine, New Haven, Connecticut
3Section of Cardiovascular Medicine, Department of Internal Medicine, Yale University School of Medicine, New Haven, Connecticut

ABSTRACT

Background
Timely prediction of AKI in children can allow for targeted interventions, but the wealth of data in the electronic health record poses unique modeling challenges.

Methods
We retrospectively reviewed the electronic medical records of all children younger than 18 years old who had at least two creatinine values measured during a hospital admission from January 2014 through January 2018. We divided the study population into derivation, and internal and external validation cohorts, and used five feature selection techniques to select 10 of 720 potentially predictive variables from the electronic health records. Model performance was assessed by the area under the receiver operating characteristic curve in the validation cohorts. The primary outcome was development of AKI (per the Kidney Disease Improving Global Outcomes creatinine definition) within a moving 48-hour window. Secondary outcomes included severe AKI (stage 2 or 3), inpatient mortality, and length of stay.

Results
Among 8473 encounters studied, AKI occurred in 516 (10.2%), 207 (9%), and 27 (2.5%) encounters in the derivation, and internal and external validation cohorts, respectively.
The highest-performing model used a machine learning-based genetic algorithm, with an overall area under the receiver operating characteristic curve in the internal validation cohort of 0.76 (95% confidence interval [CI], 0.72 to 0.79) for AKI, 0.79 (95% CI, 0.74 to 0.83) for severe AKI, and 0.81 (95% CI, 0.77 to 0.86) for neonatal AKI. To translate this prediction model into a clinical risk-stratification tool, we identified high- and low-risk threshold points.

Conclusions
Using various machine learning algorithms, we identified and validated a time-updated prediction model of ten readily available electronic health record variables to accurately predict imminent AKI in hospitalized children.

JASN 31: 1348–1357, 2020. doi: https://doi.org/10.1681/ASN.2019070745

Received July 26, 2019. Accepted March 13, 2020.
Published online ahead of print. Publication date available at www.jasn.org.
Correspondence: Francis Perry Wilson, Program of Applied Translational Research, Temple Medical Center, 60 Temple Street Suite 6C, New Haven, CT 06510. Email: francis.p.wilson@yale.edu
Copyright © 2020 by the American Society of Nephrology. ISSN: 1046-6673/3106-1348

Significance Statement
Because AKI in hospitalized children is associated with poor outcomes, a tool allowing early identification of children at risk of developing AKI may facilitate timely interventions. The authors describe various machine learning techniques used to build a parsimonious model predictive of pediatric AKI. From an initial pool of 720 potential variables, they evaluated multiple feature selection techniques to create a ten-feature logistic regression model that could predict, in time-updated fashion, the risk of AKI in the next 48 hours. A machine learning-based genetic algorithm (reflecting the process of natural selection) was the best variable selection method, using ten factors extracted from electronic health records for AKI prediction. Risk-stratifying hospitalized children might allow clinicians to implement targeted and timely interventions prior to AKI development.

AKI develops in approximately 30% of hospitalized children in intensive care units (ICUs) and 5% in non-ICU settings.1–5 Hospital-acquired AKI is associated with a longer hospital stay, higher mortality, and an increased economic burden.1–4,6 AKI may also increase the risk of long-term complications such as CKD, proteinuria, and hypertension.3 AKI management strategies are largely supportive and include avoiding nephrotoxin exposure, optimizing volume status, and addressing electrolyte imbalances.4 Prior studies have suggested that early identification of patients at risk of developing AKI, followed by nephrotoxin avoidance, can decrease the AKI rate by 64%.7

The AKI prediction workgroup of the Acute Dialysis Quality Initiative recommended creating electronic health record (EHR)-integrated, real-time AKI prediction models that combine risk factors from prototype prediction models with novel risk factors using machine learning methods.8 Current AKI prediction models have advanced our understanding of AKI prediction in children, but they have limitations: they mainly rely on baseline admission data, were developed with a limited set of AKI predictors, and do not include neonates.9,10 Although neonates have a lower GFR than older children, they share many AKI risk factors.4,11 Additionally, none of the current prediction models was created with an unbiased, clinically agnostic approach that leverages the diversity of variables available in the EHR.9,10,12

The EHR contains a wealth of variables that may predict AKI, but brute-force methods to evaluate parsimonious models are computationally infeasible. For example, 9.69 × 10^21 models containing 10 variables could be created from a set of 720 variables (as we had in this study). Even assuming one could evaluate 1,000,000 such models per second using logistic regression (infeasible even with modern supercomputers), it would take more than 300 million years to find the best possible combination of 10 variables in a space of 720 potential variables. Therefore, researchers are often forced to create parsimonious models based on prior research or clinical intuition, which is biased against novel predictors. Several algorithms, including forward selection, Least Absolute Shrinkage and Selection Operator (LASSO) regression, and others, have been proposed to find parsimonious models more efficiently while still allowing novel features to be included.13 This study presents several such methods for reducing the difficulty of the problem. We aimed to create a novel AKI prediction model that is time-updated, is parsimonious, and could be used in all hospitalized neonates and children to predict AKI within the next 48 hours.

METHODS

Study Design and Population
We retrospectively reviewed the medical records of all children younger than 18 years old who had at least two creatinine values measured at any time during a hospital admission from January 2014 to January 2018 at two hospitals in the Yale-New Haven Health System, a large, academic, tertiary care center. The two hospitals are Yale-New Haven Children's Hospital and Bridgeport Hospital; the latter is affiliated with the same health system but is staffed by other physicians and cares for a significantly less acutely ill pediatric population. Patients with an initial serum creatinine level >4 mg/dl or an eGFR <15 ml/min per 1.73 m2 were excluded. We calculated eGFR using the modified Schwartz equation or, if height was not available, the Full Age Spectrum equation.14 The Yale Human Investigation Committee reviewed and approved the study protocol under a waiver of informed consent.

Dependent (Outcome) Variables
The primary outcome was the development of AKI within the next 48 hours, updated throughout the hospital stay. We used the Kidney Disease Improving Global Outcomes creatinine definition with a rolling window that compared the current creatinine to the lowest value in the past 48 hours or 7 days. Stage 1 AKI was defined as an increase in creatinine of ≥0.3 mg/dl within 48 hours or an increase to ≥1.5 times the lowest measured creatinine within the prior 7 days.4 Secondary outcomes included severe AKI (stage 2 or 3) on the same time scale, inpatient mortality, and length of stay. Stage 2 and 3 AKI are defined as an increase in creatinine to 2–2.9 times and ≥3 times baseline, respectively.4 Urine output was not used to define AKI because of missingness and the expected difficulty of applying it in real time. To qualify for our AKI definition, an absolute serum creatinine value >0.5 mg/dl was also required.11,15 This prevents clinically dubious changes in creatinine (such as from 0.2 mg/dl to 0.3 mg/dl) from being considered diagnostic of AKI. To limit the number of imputed values and to make predictions similar to those that would be made in prospective implementation, we excluded timepoints prior to the first creatinine measurement.

Figure 1. Candidate variable groups. A total of 720 candidate variables available in the EHR were included to create a parsimonious predictive model. [Figure panels: demographics, vital signs, locations, laboratory variables, medications, and procedures, with subgroups for continuous variables, categorical variables, individual medications, and medication groups.]

Independent (Predictor) Variables Selection
We included a total of 720 candidate variables extracted from the EHR, including patient demographics, vital signs, patient locations, diagnoses, laboratory results, and medication exposures (Figure 1, Supplemental Table 1). We excluded from analysis highly collinear variables with a correlation coefficient r of 0.95 or greater. For continuous variables with data missing for <25% of patient encounters, measured values were carried forward until remeasurement occurred. For timepoints with no previous measurement, we imputed derivation-set medians. We transformed continuous variables
is chosen and applied on regression coefficients.
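The introduction's count of ten-variable models and its 300-million-year estimate can be checked with a few lines of Python. This is a minimal sketch, assuming a 365.25-day year and the stated throughput of 1,000,000 logistic regression fits per second; the function names are illustrative only.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year (assumption)


def n_models(n_vars: int, k: int) -> int:
    """Number of distinct k-variable models that can be built from n_vars candidates."""
    return math.comb(n_vars, k)


def years_to_enumerate(n_vars: int, k: int, models_per_second: float) -> float:
    """Years needed to fit every k-variable model at a fixed throughput."""
    return n_models(n_vars, k) / models_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{n_models(720, 10):.2e} ten-variable models")
    print(f"{years_to_enumerate(720, 10, 1_000_000):.2e} years at 1e6 models/s")
```

Running the script reports roughly 9.69 × 10^21 models and just over 3 × 10^8 years, consistent with the figures quoted in the text.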
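The rolling-window outcome definition described under Dependent (Outcome) Variables might be sketched as follows. This is an illustration under stated assumptions, not the study's code: the `Creatinine` record, both function names, the use of the 7-day minimum as the baseline for stages 2 and 3, and the floating-point tolerance `EPS` are all assumptions, and the sketch omits urine output (as the study did) and non-creatinine KDIGO stage 3 criteria such as dialysis.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

EPS = 1e-9  # tolerance so that, e.g., 1.2 / 0.8 still counts as a 1.5-fold rise


@dataclass(frozen=True)
class Creatinine:  # hypothetical record type
    time: datetime
    value: float  # serum creatinine in mg/dl, assumed > 0


def kdigo_stage(history: list[Creatinine], current: Creatinine) -> int:
    """Creatinine-only KDIGO stage (0-3) of `current`, judged against the lowest
    earlier value in rolling 48-hour and 7-day windows."""
    if current.value <= 0.5:  # absolute floor: creatinine must exceed 0.5 mg/dl
        return 0
    prior_48h = [c.value for c in history
                 if current.time - timedelta(hours=48) <= c.time < current.time]
    prior_7d = [c.value for c in history
                if current.time - timedelta(days=7) <= c.time < current.time]
    if not prior_7d:
        return 0  # no earlier reference value inside the 7-day window
    ratio = current.value / min(prior_7d)  # assumed baseline: 7-day minimum
    if ratio >= 3.0 - EPS:
        return 3
    if ratio >= 2.0 - EPS:
        return 2
    if ratio >= 1.5 - EPS:
        return 1
    if prior_48h and current.value - min(prior_48h) >= 0.3 - EPS:
        return 1  # absolute rise of >= 0.3 mg/dl within 48 hours
    return 0


def aki_within_next_48h(series: list[Creatinine], t: datetime) -> bool:
    """Time-updated label at prediction time t (series sorted by time): does any
    creatinine drawn in (t, t + 48 h] reach at least stage 1?"""
    for i, c in enumerate(series):
        if t < c.time <= t + timedelta(hours=48) and kdigo_stage(series[:i], c) >= 1:
            return True
    return False
```

Evaluating `aki_within_next_48h` at every timepoint after the first creatinine yields a moving-window label of the kind the primary outcome describes; the 0.5 mg/dl floor is what stops a 0.2 to 0.3 mg/dl change from registering as AKI.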
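The genetic algorithm credited with the best variable selection is not specified in detail in this section, so the following is only a generic sketch of the technique: a population of fixed-size variable subsets evolves through truncation selection, crossover, and swap mutation, and the fittest half survives each generation. In the study's setting the fitness would presumably be something like a validation AUC of a ten-variable logistic regression; here `fitness` is any caller-supplied function, and `genetic_select` and every parameter value (population size, generations, mutation rate) are hypothetical.

```python
from __future__ import annotations

import random
from typing import Callable

Subset = frozenset[int]  # indices of selected candidate variables


def genetic_select(n_vars: int, k: int, fitness: Callable[[Subset], float],
                   pop_size: int = 40, generations: int = 60,
                   mutation_rate: float = 0.2, seed: int = 0) -> Subset:
    """Search for a k-variable subset maximizing `fitness` (requires n_vars > k
    and pop_size >= 4)."""
    rng = random.Random(seed)

    def random_subset() -> Subset:
        return frozenset(rng.sample(range(n_vars), k))

    def crossover(a: Subset, b: Subset) -> Subset:
        # The child draws its k variables from the union of both parents.
        return frozenset(rng.sample(sorted(a | b), k))

    def mutate(s: Subset) -> Subset:
        # With probability mutation_rate, swap one selected variable for an unselected one.
        if rng.random() >= mutation_rate:
            return s
        out = rng.choice(sorted(s))
        new = rng.choice([v for v in range(n_vars) if v not in s])
        return (s - {out}) | {new}

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(*rng.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children  # elitism: the best subsets always survive
    return max(population, key=fitness)
```

For example, with a toy fitness that counts overlap with a hidden target set, `genetic_select(30, 3, lambda s: len(s & frozenset({3, 7, 11})), seed=1)` returns a three-variable subset. Because the top half of each generation is carried over unchanged, the best fitness never decreases between generations, while each generation costs only `pop_size` fitness evaluations rather than an exhaustive search.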