Uncertainty-Aware Variational-Recurrent Imputation Network for Clinical Time Series Ahmad Wisnu Mulyadi, Eunji Jun, and Heung-Il Suk, Member, IEEE

UNDER REVIEW 1 Uncertainty-Aware Variational-Recurrent Imputation Network for Clinical Time Series Ahmad Wisnu Mulyadi, Eunji Jun, and Heung-Il Suk, Member, IEEE Abstract—Electronic health records (EHR) consist of longitudi- data yields poor performance at high missing rates and also nal clinical observations portrayed with sparsity, irregularity, and requires modeling separately for the distinct dataset. In fact, high-dimensionality, which become major obstacles in drawing those missing values reflect the decisions made by health-care reliable downstream clinical outcomes. Although there exist great numbers of imputation methods to tackle these issues, most of providers [5]. Therefore, the missing values and their patterns them ignore correlated features, temporal dynamics and entirely contain information regarding a patient’s health status [5] and set aside the uncertainty. Since the missing value estimates involve correlate with the outcomes or target labels [6]. Thus, we the risk of being inaccurate, it is appropriate for the method to resort to the imputation approach by exploiting those missing handle the less certain information differently than the reliable patterns to improve the prediction of the clinical outcomes as data. In that regard, we can use the uncertainties in estimating the missing values as the fidelity score to be further utilized to the downstream task. alleviate the risk of biased missing value estimates. In this work, There exist numerous strategies for imputing missing values we propose a novel variational-recurrent imputation network, in the literature. Generally, imputation methods can be clas- which unifies an imputation and a prediction network by taking sified into a deterministic or stochastic approach, depending into account the correlated features, temporal dynamics, as well on the use of randomness [7]. Deterministic methods, such as as the uncertainty. Specifically, we leverage the deep generative model in the imputation, which is based on the distribution mean [8] and median filling [9], produce only one possible among variables, and a recurrent imputation network to exploit value when estimating the missing values. However, it is the temporal relations, in conjunction with utilization of the desirable for the imputation model to generate values or sam- uncertainty. We validated the effectiveness of our proposed model ples by considering the distribution of the available observed on two publicly available real-world EHR datasets: PhysioNet data. Thus, it leads us to the employment of stochastic-based Challenge 2012 and MIMIC-III, and compared the results with other competing state-of-the-art methods in the literature. imputation methods. The recent rise of deep learning models has offered potential Index Terms—Bioinformatics, Deep generative model, Deep solutions in accommodating such circumstances. Variational learning, Electronic health records (EHR), In-hospital mortality prediction, Missing value imputation, Time-series modeling, autoencoders (VAEs) [10] and generative adversarial networks Uncertainty (GANs) [11] exploit the latent distribution of high-dimensional incomplete data and generate comparable data points as the approximation estimates for the missing or corrupted values I. INTRODUCTION [12, 13, 14]. However, such deep generative models are LECTRONIC health records (EHRs) store longitudinal insufficient for estimating the missing values of multivariate E data consisting of patients’ clinical observations in the time series owing to their nature of ignoring temporal relations intensive care unit (ICU). Despite the surge of interest in clin- between a span of time points. On the other hand, by virtue ical research on EHR, it still holds diverse challenging issues of recurrent neural networks (RNNs), which have proved to to be tackled, these include high-dimensionality, temporality, perform remarkably well in modeling sequential data, we can sparsity, irregularity, and bias [1, 2]. Specifically, sequences estimate the complete data by taking into account the temporal arXiv:2003.00662v2 [cs.LG] 14 Nov 2020 of multidimensional medical measurements are recorded ir- characteristics. In this approach, GRU-D [6] introduced a mod- regularly in terms of its variables and times. The reasons ified gated-recurrent unit (GRU) cell to model missing patterns behind these typical measurements are diverse, such as lack in the form of masking vectors and temporal delays. Likewise, of collection, documentation, or even recording faults [1, 3]. BRITS [15] exploited the temporal relations of bidirectional Since it carries essential information regarding a patient’s dynamics by considering feature correlations in estimating the health status, improper handling of missing values might cause missing values. Moreover, inspired by the residual networks an unintentional bias [3, 4], yielding an unreliable downstream and graph-based methods, [16] proposed the LIME-RNN to analysis and verdict. tackle the imputation task by integrating the previous hidden Complete-case analysis is an approach that draws clinical states of RNN in the forms of linear memory vector. outcomes by disregarding the missing values and relying Even though such models employed the stochastic approach only on the observed values. However, excluding the missing for inferring and generating samples by utilizing both features A. W. Mulyadi and E. Jun are with the Department of Brain and Cog- and temporal relations, they scarcely exploited the uncertainty nitive Engineering, Korea University, Seoul 02841, Korea (e-mail: wisnu- in estimating the missing values in multivariate time series [email protected], [email protected]). H.-I. Suk is with the Department data (i.e., since the imputation estimates are not thoroughly of Artificial Intelligence and the Department of Brain and Cognitive Engi- neering, Korea University, Seoul 02841 Korea (e-mail: [email protected]). accurate, we may introduce a fidelity score denoted by the (Corresponding author: Heung-Il Suk) uncertainty, which enhances the downstream task performance UNDER REVIEW 2 by emphasizing the reliable information more than the less II. RELATED WORK certain information) [14, 17, 18]. We can use the imputation model, which captures the aleatoric uncertainty in estimating Imputation strategies have been extensively devised to re- the missing values by placing a distribution over the output solve the issue of sparse high-dimensional time series data of the model [19]. In particular, we would like to estimate by means of statistics, machine learning, or deep learning the heteroscedastic aleatoric uncertainties, which are useful in methods. For instance, previous works exploited statistical cases where observation noises vary with the input [19]. attributes of observed data, such as mean [8] and median In this work, we define our primary task as the prediction filling [9], which clearly ignored the temporal relations and of in-hospital mortality on clinical time series data. However, the correlations among variables. From the machine learning since such data are portrayed with sparse and irregularly- approaches, k-nearest neighbors (KNN) [20], and principal sampled characteristics, we devise an imputation model as component analysis (PCA) [21] were proposed by considering the secondary problem to enhance the clinical outcome pre- the relationships of the features either in the original or in dictions. We propose a novel variational-recurrent imputation the latent space. In addition, imputation methods based on network (V-RIN) which unifies the imputation and prediction the expectation-maximization (EM) algorithm [22, 23] were networks for multivariate time series EHR data, governing also proposed in tackling the missing values estimations. Fur- both correlations among variables and temporal relations. thermore, multiple imputation by chained equations (MICE) Specifically, given the sparse data, an inference network of [24, 25] introduced variability by means of repeating the VAEs is employed to capture data distribution in the latent imputation process multiple times. However, these methods space. From this, we employ a generative network to obtain the ignore the temporal relations as crucial attributes in time series reconstructed data as the imputation estimates for the missing modeling and disregard the exploitation of the uncertainties in values and the uncertainty indicating the imputation fidelity estimating the missing values. score. Then, we integrate the temporal and feature correlations The deep learning-based imputation models, are closely into a combined vector and feed it into a novel uncertainty- related to our proposed models. A previous study [12] lever- aware GRU in the recurrent imputation network. Finally, we aged VAEs to generate stochastic imputation estimates by obtain the mortality prediction as a clinical verdict from the exploiting the distribution and correlations of features in the complete imputed data. In general, our main contributions in latent space. However, it ignored the temporal relations as well this study are as follows: as the uncertainties. Meanwhile, GP-VAE [26] was proposed to obtain the latent representation by means of VAEs and model • We estimate the missing values by utilizing a deep temporal dynamics in the latent space using the Gaussian generative model combined with a recurrent imputation process. However, since the model is merely focused on the network to capture both feature correlations and the imputation task, they required a separate

Uncertainty-Aware Variational-Recurrent Imputation Network for Clinical Time Series Ahmad Wisnu Mulyadi, Eunji Jun, and Heung-Il Suk, Member, IEEE

Imputets: Time Series Missing Value Imputation in R by Steffen Moritz and Thomas Bartz-Beielstein

Potential Sources of Missing Data in a Meta-Analysis

Multiple Imputation: a Review of Practical and Theoretical Findings Jared S

Within-Industry Productivity Dispersion and Imputation for Missing Data

Dealing with Missing Standard Deviation and Mean Values in Meta-Analysis of Continuous Outcomes: a Systematic Review Christopher J

Missing Value Imputation Using Feature-Specific Generative

Should a Normal Imputation Model Be Modified to Impute Skewed Variables?

Supervised Nonnegative Matrix Factorization to Predict ICU

Optimal Recovery of Missing Values for Non-Negative Matrix Factorization Rebecca Chen and Lav R

Convex Nonnegative Matrix Factorization with Missing Data Ronan Hamon, Valentin Emiya, Cédric Févotte

From Predictive Methods to Missing Data Imputation: an Optimization Approach

Missing Data Imputation Using Optimal Transport