Clustering Longitudinal Clinical Marker Trajectories from Electronic Health Data: Applications to Phenotyping and Endotype Discovery
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Peter Schulam
Department of Computer Science, Johns Hopkins University
3400 N. Charles St., Baltimore, MD 21218

Fredrick Wigley
Division of Rheumatology, Johns Hopkins School of Medicine
733 N. Broadway, Baltimore, MD 21205

Suchi Saria
Department of Computer Science, Department of Health Policy & Mgmt., Johns Hopkins University
3400 N. Charles St., Baltimore, MD 21218

Abstract

Diseases such as autism, cardiovascular disease, and the autoimmune disorders are difficult to treat because of the remarkable degree of variation among affected individuals. Subtyping research seeks to refine the definition of such complex, multi-organ diseases by identifying homogeneous patient subgroups. In this paper, we propose the Probabilistic Subtyping Model (PSM) to identify subgroups based on clustering individual clinical severity markers. This task is challenging due to the presence of nuisance variability (variations in measurements that are not due to disease subtype), which, if not accounted for, generates biased estimates for the group-level trajectories. Measurement sparsity and irregular sampling patterns pose additional challenges in clustering such data. PSM uses a hierarchical model to account for these different sources of variability. Our experiments demonstrate that by accounting for nuisance variability, PSM is able to more accurately model the marker data. We also discuss novel subtypes discovered using PSM and the resulting clinical hypotheses that are now the subject of follow-up clinical experiments.

Introduction and Background

Disease subtyping is the process of developing criteria for stratifying a population of individuals with a shared disease into subgroups that exhibit similar traits, a task that is analogous to clustering in machine learning. Under the assumption that individuals with similar traits share an underlying disease mechanism, disease subtyping can help to propose candidate subgroups of individuals that should be investigated for biological differences. Uncovering such differences can shed light on the mechanisms specific to each group. Observable traits useful for identifying subpopulations of similar patients are called phenotypes. When such traits have been linked to a distinct underlying pathobiological mechanism, these are then referred to as endotypes (Anderson 2008).

Traditionally, disease subtyping research has been conducted as a by-product of clinical experience. A clinician may notice the presence of subgroups, and may perform a more thorough retrospective or prospective study to confirm their existence (e.g. Barr et al. 1999). Recently, however, literature in the medical community has noted the need for more objective methods for discovering subtypes (De Keulenaer and Brutsaert 2009). Growing repositories of health data stored in electronic health record (EHR) databases and patient registries (Blumenthal 2009; Shea and Hripcsak 2010) present an exciting opportunity to identify disease subtypes in an objective, data-driven manner using tools from machine learning that can help to tackle the problem of combing through these massive databases. In this work, we propose such a tool, the Probabilistic Subtyping Model (PSM), that is designed to discover subtypes of complex, systemic diseases using longitudinal clinical markers collected in EHR databases and patient registries.

Discovering and refining disease subtypes can benefit both the practice and the science of medicine. Clinically, disease subtypes can help to reduce uncertainty in expected outcome of an individual's case, thereby improving treatment. Subtypes can inform therapies and aid in making prognoses and forecasts about expected costs of care (Chang, Clark, and Weiner 2011). Scientifically, disease subtypes can help to improve the effectiveness of clinical trials (Gundlapalli et al. 2008), drive the design of new genome-wide association studies (Kho et al. 2011; Kohane 2011), and allow medical scientists to view related diseases through a more fine-grained lens that can lead to insights that connect their causes and developmental pathways (Hoshida et al. 2007). Disease subtyping is considered to be especially useful for complex, systemic diseases where mechanism is often poorly understood. Examples of disease subtyping research include work in autism (State and Sestan 2012), cardiovascular disease (De Keulenaer and Brutsaert 2009), and Parkinson's disease (Lewis et al. 2005).

Complex, systemic diseases are characterized using the level of disease activity present in an array of organ systems. Clinicians typically measure the influence of a disease on an organ using clinical tests that quantify the extent to which that organ's function has been affected by the disease. The results of these tests, which we refer to as illness severity markers (s-markers for short), are being routinely collected over the course of care for large numbers of patients within EHR databases and patient registries. For a single individual, the time series formed by the sequence of these s-markers can be interpreted as a disease activity trajectory. Operating under the hypothesis that individuals with similar disease activity trajectories are more likely to share mechanism, our goal in this work is to cluster individuals according to their longitudinal clinical marker data and learn the associated prototypical disease activity trajectories (i.e. a continuous-time curve characterizing the expected s-marker values over time) for each subtype.

The s-marker trajectories recorded in EHR databases are influenced by factors such as age and co-existing conditions that are unrelated to the underlying disease mechanism (Lötvall et al. 2011). We call the effects of these additional factors nuisance variability. In order to correctly cluster individuals and uncover disease subtypes that are likely candidates for endotyping (Lötvall et al. 2011), it is important to model and explain away nuisance variability.

In this work, we account for nuisance variability in the following ways. First, we use a population-level regression onto observed covariates (such as demographic characteristics or co-existing conditions) to account for variability in s-marker values across individuals. For example, lung function as measured by the forced expiratory volume (FEV) test is well-known to be worse in smokers than in non-smokers (Camilli et al. 1987). Second, we use individual-specific parameters to account for variability across individuals that is not predicted using the observed covariates. This form of variability may last throughout the course of an individual's disease (e.g. the individual may have an unusually weak respiratory system) or may be episodic (e.g. periods during which an individual is recovering from a cold). Finally, the subtypes' prototypical disease activity trajectories must be inferred from measurement sequences that vary widely between individuals in the time-stamp of the first measurement, the duration between consecutive measurements, and the time-stamp

an individual's time series into windows in order to discover transient traits that are expressed over shorter durations: minutes, hours, or days. For example, Saria et al. propose a probabilistic framework for discovering traits from physiologic time series that have similar shape characteristics (Saria, Duchi, and Koller 2011) or dynamics characteristics (Saria, Koller, and Penn 2010). Lasko et al. use deep learning to induce an over-complete dictionary in order to define traits observed in shorter segments of clinical markers (Lasko, Denny, and Levy 2013).

Beyond s-markers, others have used ICD-9 codes (codes indicating the presence or absence of a condition) to study comorbidity patterns over time among patients with a shared disease (e.g. Doshi-Velez, Ge, and Kohane 2014). ICD-9 codes are further removed from the biological processes measured by quantitative tests. Moreover, the notion of disease severity is more difficult to infer from codes.

Latent class mixed models (LCMMs) are a family of methods designed to discover subgroup structure in longitudinal datasets using fixed and random effects (e.g., Muthén and Shedden 1999, McCulloch et al. 2002, and Nagin and Odgers 2010). Random effects are typically used in linear models where an individual's coefficients may be probabilistically perturbed from the group's, which alters the model's fit to the individual over the entire observation period. Modeling s-marker data for chronic diseases where data are collected over tens of years requires accounting for additional influences such as those due to transient disease activity. The task of modeling variability between related time series has been explored in other contexts (e.g. Listgarten et al. 2006 and Fox et al. 2011). These typically assume regularly sampled time series and model properties that are different from those in our application.

Work in the machine learning literature has also looked at relaxing the assumption of regularly sampled data. Marlin et al. cluster irregular clinical time series from in-hospital patients to improve mortality prediction using Gaussian