
Deconvolutional Regression: A Technique for Modeling Temporally Diffuse Effects

Cory Shain

Advisor: William Schuler

Department of Linguistics
The Ohio State University

Abstract

Psycholinguists frequently use linear models to study time series data generated by human subjects. However, time series may violate the assumptions of these models through temporal diffusion, where stimulus presentation has a lingering influence on the response as the rest of the experiment unfolds. This paper proposes a new statistical model that borrows from digital signal processing by recasting the predictors and response as convolutionally-related signals, using recent advances in machine learning to fit latent impulse response functions (IRFs) of arbitrary shape. A synthetic experiment shows successful recovery of true latent IRFs, and psycholinguistic experiments reveal plausible, replicable, and fine-grained estimates of latent temporal dynamics, with comparable or improved prediction quality to widely-used alternatives.

1 Introduction

Much of the data available to psycholinguistics is generated by processes that unfold in time. Examples include behavioral measures such as eye-movement records and self-paced reading latencies as well as neural measures like electroencephalography (EEG), magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), and electrocorticography (ECoG). If left uncontrolled, temporal confounds in psycholinguistic data can be problematic for interpretation and testing of the statistical models used to analyze them (Baayen et al., 2017, 2018).

This paper addresses one possible temporal confound which I will refer to as temporal diffusion. Temporal diffusion exists when the response of a dependent variable to some inputs evolves slowly as a function of time, with the result that a particular input observed at a particular time continues to exert an influence on the response as the rest of the process unfolds. Temporal diffusion has been carefully studied in some psychological subfields. For example, a sizeable literature on fMRI has investigated the structure of the hemodynamic response function (HRF), which is known to govern the relatively slow response of blood oxygenation to neural activity (Boynton et al., 1996; Friston et al., 1998; H. Glover, 1999; Ward, 2006; Lindquist and Wager, 2007; Lindquist et al., 2009). The HRF is an instantiation of the more general notion of impulse response function (IRF) from the field of signal processing (Madisetti, 1997), where the response $g \ast h$ of a dynamical system as a function of time is described as a convolution over time of an impulse $g$ with an IRF $h$:

$$(g \ast h)(t) = \int_0^t g(\tau)\, h(t - \tau)\, d\tau$$

The process of deconvolution seeks to infer the structure of $h$ (the IRF) given that the impulses $g$ (stimuli) and responses $g \ast h$ (the psycholinguistic response) are known.
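To make the convolution concrete, the following minimal NumPy sketch (with made-up event times and values, and an illustrative Gamma-shaped IRF) computes the diffuse response at a query time as the sum of each preceding impulse weighted by the IRF evaluated at its elapsed time:

```python
import numpy as np
from scipy.stats import gamma

# Hypothetical impulse stream: event onsets (s) and magnitudes (e.g., per-word surprisal)
event_times = np.array([0.0, 0.3, 0.9, 1.5])
event_vals  = np.array([1.0, 2.0, 0.5, 1.5])

def irf(t, alpha=2.0, beta=5.0):
    # An illustrative Gamma-shaped IRF h(t); the density is 0 for t < 0,
    # so events cannot influence the response before they occur
    return gamma.pdf(t, a=alpha, scale=1.0 / beta)

def response(t):
    # Discrete analogue of (g * h)(t): each impulse contributes its magnitude
    # times the IRF evaluated at the time elapsed since that impulse
    return float(np.sum(event_vals * irf(t - event_times)))

print(response(1.0))  # diffuse influence at t = 1.0 s of all preceding events
```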

Although particular attention has been paid to the importance of impulse responses in fMRI, there are other kinds of psycholinguistic measures in which temporal diffusion might reasonably play a role. This paper focuses on one such example: measures of reading time, specifically fixation durations in eye-tracking and response times in self-paced reading. It has been known for decades that the response to properties of words in human subjects' reading behavior may not be fully instantaneous, but may "spill over" into the reading of subsequent words (Erlich and Rayner, 1983). A standard approach for handling the possibility of temporally diffuse relationships between word properties and the reading response is to use spillover or lag regressors, where a word's properties are used to predict subsequent observations of the response.¹ But this strategy has several undesirable properties. First, the choice of spillover position(s) for a given predictor is difficult to motivate empirically. Second, since word fixations are variably long, the use of relative event indices obscures potentially important details about the actual amount of time that passed between events. Third, including multiple spillover positions per predictor quickly leads to parametric explosion in realistically complex models over realistically sized data sets, especially if random effects structures are included. And fourth, if the predictors are autocorrelated, the spillover variants of each predictor will exhibit colinearities.

¹See Section 2 for discussion.

[Figure 1: Effects in a linear time series model (predictor and response by trial index).]
[Figure 2: Linear time series model with spillover (predictor and response by trial index).]

Deconvolutional modeling provides a way forward by supporting discovery of temporal diffusion from the reading response data. However, existing deconvolutional frameworks such as finite impulse response (FIR) models (Dayal and MacGregor, 1996) and vector autoregressive (VAR) models (Sims, 1980) are not applicable to variably-spaced reading data because they discretize the time series, leading either to (1) severe sparsity if variable event durations are retained (few events are exactly 207ms apart) or (2) distortion if they are removed (all events are treated as if equally spaced).

As a solution to the problem of temporal diffusion, this paper proposes deconvolutional time series regression (DTSR), a continuous-time deconvolutional method that directly models diffusion by learning parametric IRFs of the predictors that mediate their relationship to the response variable over time. The implementation of DTSR proposed here takes advantage of the recent advent of machine learning libraries such as Tensorflow (Abadi et al., 2015), which supports optimization of arbitrary computation graphs and auto-differentiation, and Edward (Tran et al., 2016), which enables black box variational inference (BBVI) on Tensorflow graphs. While these libraries are typically used to build and train deep networks, DTSR uses them to overcome an important difficulty that otherwise holds for parametric deconvolution: the likelihood surface depends on the choice of IRF kernel, requiring re-derivation of estimators for each unique model structure. Auto-differentiation and Bayesian inference eliminate the need for hand-derivation of estimators and sampling distributions for each model.

The IRFs learned by DTSR are interpretable as estimates of the temporal shape of predictors' influence on the response variable. By convolving the predictors with their IRFs, DTSR is able to consider arbitrarily long histories of independent variable observations in generating a given prediction, and (in contrast to spillover) the complexity of the model is constant in the length of the history window. DTSR is thus a parsimonious technique for directly measuring temporal diffusion.

DTSR models are continuous-time and can therefore be optimized on naturalistic time series with variably-spaced events, including reading time data.

Figures 1-3 illustrate the present proposal and how it differs from linear time series models. As shown in Figure 1, a standard linear model assumes conditional independence of the response from all preceding observations of the predictor. This independence assumption can be weakened by including additional spillover predictors (Figure 2), at the cost of requiring additional parameters. In both cases, only the relative order of events is considered, not their actual distance in time. By contrast, DTSR recasts the predictor and response vectors as streams of impulses and responses (respectively) localized in time. It then fits latent IRFs that govern the influence of each predictor on the response as a function of time (Figure 3).

[Figure 3: Effects of predictors in DTSR. Panels: predictor, latent IRFs, convolved predictor, and response over time.]

This paper presents evidence that DTSR can (1) recover known underlying IRFs from synthetic data, (2) discover previously unknown temporal structure in human data (psycholinguistic reading time experiments), (3) provide support for the absence of temporal diffusion in settings where it might exist in principle, and (4) provide comparable (or in some cases improved) prediction quality relative to standard linear mixed-effects (LME) and generalized additive (GAM) models.

2 Related work

2.1 Non-deconvolutional time series modeling

The two most widely used tools for analyzing psycholinguistic time series are linear mixed effects regression (LME) (Bates et al., 2015) and generalized additive models (GAM) (Hastie and Tibshirani, 1986; Wood, 2006). LME learns a linear combination of the predictors that generates a given response variable. GAM generalizes linear models by allowing the response variable to be computed as the sum of smooth functions of one or more predictors.

In both approaches, responses are modeled as conditionally independent of preceding observations of predictors unless spillover terms are added, with the attendant drawbacks discussed in Section 1. To make this point more forcefully, take for example Shain et al. (2016), who found significant effects of constituent wrap-up (p = 2.33e-14) and dependency locality (p = 4.87e-10) in the Natural Stories self-paced reading corpus (Futrell et al., 2018). They argued that their result constituted the first strong evidence of memory effects in broad-coverage sentence processing. However, it turns out that when one baseline predictor, probabilistic context-free grammar (PCFG) surprisal, is spilled over one position, the reported effects disappear: p = 0.816 for constituent wrap-up and p = 0.370 for dependency locality. Thus, a reasonable but ultimately inaccurate assumption about baseline effect timecourses (in this case, that the PCFG effect did not spill over) can have a dramatic impact on the conclusions supported by the statistical model. DTSR offers a way forward by building the possibility of temporal diffusion directly into the estimates, thereby avoiding the need to choose spillover positions as hyperparameters.

2.2 Deconvolutional time series modeling

Deconvolutional modeling has long been used in a variety of scientific fields, including economics (Ramey, 2016), epidemiology (Goldstein et al., 2011), and neuroimaging (Friston et al., 1998). One widely-used approach to IRF discovery is finite impulse response (FIR) modeling (H. Glover, 1999; Ward, 2006). FIR models quantize the time series and use linear regression to fit estimates for each time point within some window, similarly to the spillover approach discussed above. These estimates can be unconstrained or smoothed with some form of regularization (Nikolaou and Vuthandam, 1998; Goutte et al., 2000; Pedregosa et al., 2014). Another major approach to deconvolution is vector autoregression (VAR), which discovers pairwise temporal relationships between all variables in the data (predictors and response) over some finite number of lags. VAR fits can be used to extract IRFs between pairs of variables. For both FIR and VAR models, additional post-hoc interpolation is necessary in order to obtain closed-form continuous-time IRFs. Non-parametric deconvolutional approaches like these are prone to parametric explosion and overfitting (Nikolaou and Vuthandam, 1998). Furthermore, as discussed above, their requirement of time discretization gives rise to sparsity or distortion when applied to time series with variable event durations. Finally, many psycholinguistic datasets contain data from many subjects and/or conditions, motivating the use of mixed-effects models. However, although mixed-effects non-parametric deconvolutional models have been proposed (Gorrostieta et al., 2012), modeling random variation in the deconvolutional estimates severely increases model complexity by adding random covariates for each predictor/timepoint pair in the model.

Continuous-time mixed-effects IRF estimation for arbitrary impulse response kernels would overcome these difficulties and greatly extend the range of time series data to which deconvolutional modeling can be applied. To my knowledge, DTSR is the first mathematical formulation and software implementation of such an approach. By including a parametric impulse response as part of model design, DTSR avoids time discretization and the attendant problems with model complexity discussed above. DTSR thus expands the range of possible applications of deconvolutional modeling to include settings with variable event duration and heterogeneous sources of data.

3 Model definition

This section presents the mathematical definition of DTSR. For readability, only a fixed effects model is presented below, since mixed modeling substantially complicates the equations. The full model definition is provided in Appendix A. Note that the full definition is used to construct all reading time models reported in subsequent sections, since they contain random effects.

Let $\mathbf{X} \in \mathbb{R}^{M \times K}$ be a design matrix of $M$ observations for $K$ predictor variables and $\mathbf{y} \in \mathbb{R}^N$ be a vector of $N$ responses, both of which contain contiguous temporally-sorted time series. DTSR models the relationship between $\mathbf{X}$ and $\mathbf{y}$ using parameters consisting of:

• a scalar intercept $\mu \in \mathbb{R}$
• a vector $\mathbf{u} \in \mathbb{R}^K$ of $K$ coefficients²
• a matrix $\mathbf{A} \in \mathbb{R}^{R \times K}$ of $R$ IRF kernel parameters for $K$ fixed impulse vectors
• a scalar variance $\sigma^2 \in \mathbb{R}$ of the response

A fixed-effects DTSR model therefore contains $2 + K + K \cdot R$ parameters: one intercept, $K$ coefficients (one for each impulse), $K \cdot R$ IRF parameters ($R$ parameters for each impulse), and one variance of the response. Mixed-effects DTSR models can also include random variation in the intercept, coefficients, and/or IRF parameters. This yields at most $1 + Z(1 + K + KR)$ estimates for a mixed effects model with $Z$ total random grouping factor levels, although sub-maximal numbers of estimates can arise from restricting randomness to substructures of the model (e.g. to the intercept only). For fuller discussion of mixed-effects DTSR models, see the Appendix.

To define the convolution step, let $g_k$ for $k \in \{1, 2, \ldots, K\}$ be a set of parametric IRF kernels, one for each predictor; let $\mathbf{a} \in \mathbb{R}^M$ and $\mathbf{b} \in \mathbb{R}^N$ be vectors of timestamps associated with each observation in $\mathbf{X}$ and $\mathbf{y}$, respectively; and let $\mathbf{c} \in \mathbb{N}^M$ and $\mathbf{d} \in \mathbb{N}^N$ be vectors of series IDs associated with each observation in $\mathbf{X}$ and $\mathbf{y}$, respectively. A filter $\mathbf{F} \in \mathbb{R}^{N \times M}$ admits only those observations in $\mathbf{X}$ that precede $\mathbf{y}_{[n]}$ in the same time series:

$$\mathbf{F}_{[n,m]} \overset{\text{def}}{=} \begin{cases} 1 & c_{[m]} = d_{[n]} \wedge a_{[m]} \leq b_{[n]} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

The inputs $\mathbf{X}$ can be convolved with each IRF $g_k$ by premultiplication with sparse matrices $\mathbf{G}_k \in \mathbb{R}^{N \times M}$ for $k \in \{1, 2, \ldots, K\}$ as defined below:

$$\mathbf{G}_k \overset{\text{def}}{=} g_k\!\left(\mathbf{b}\mathbf{1}^\top - \mathbf{1}\mathbf{a}^\top;\ \mathbf{A}_{[*,k]}\right) \odot \mathbf{F} \qquad (2)$$

The convolution that yields the design matrix of convolved predictors $\mathbf{X}' \in \mathbb{R}^{N \times K}$ is then defined using products of the convolution matrices and the design matrix $\mathbf{X}$:³

$$\mathbf{X}'_{[*,k]} \overset{\text{def}}{=} \mathbf{G}_k\, \mathbf{X}_{[*,k]} \qquad (3)$$

The full model mean is the sum of (1) the intercept $\mu$ and (2) the product of the convolved predictor matrix $\mathbf{X}'$ and the coefficient vector $\mathbf{u}$:

$$\mathbf{y} \sim \mathcal{N}\!\left(\mu + \mathbf{X}'\mathbf{u},\ \sigma^2\right) \qquad (4)$$

²Throughout this paper I use the term coefficients to refer to what are often called slopes in linear models. This is to avoid falsely implying that the coefficients represent straight-line functions of the predictors, when in fact they are applied non-linearly to the predictors via the impulse response. Alternatively, the coefficients can be construed as slopes on the convolved predictors X′, as shown in eq. (4).

³This implementation of convolution is only exact when the predictors fully describe a discrete impulse signal. Exact convolution of samples from continuous signals is generally not possible because the signal is generally not analytically integrable. For continuous signals, DTSR can approximate the convolution as long as the predictor is interpolated between sample points at a fixed frequency prior to fitting.

4 A note on multicolinearity

Note that the formulation in eq. (4) is simply a linear model on the convolved design matrix X′. Therefore, the primary difference between linear and DTSR models is that DTSR additionally infers the parameters that generate X′ jointly with the model intercept and coefficients.

Since DTSR depends internally on linear combination to generate its outputs, it is vulnerable to confounds from multicolinearity (correlated predictors) in much the same way that linear models are. In linear models, multicolinearity increases uncertainty about how to allocate covariation between predictors and response, since the predictors themselves covary. In the extreme case of perfect multicolinearity (i.e. one or more predictors are an exact linear combination of one or more other predictors), the model has no solution (Neter et al., 1989).

Multicolinearity in DTSR works in much the same way, with the added complexity that DTSR models also have a temporal dimension, which may allow the fitting procedure to discover real characteristics of the global impulse response structure while struggling, proportionally to the degree of multicolinearity, to decompose that structure into predictor-wise IRFs. To understand this, note that the expected response t seconds after stimulus presentation is a weighted sum of the IRFs at t, with weights provided by the predictor values of the stimulus. When multicolinearity is low, the expected overall response can vary widely from one stimulus to another, since the IRFs are reweighted at each stimulus by roughly orthogonal predictor values. This variation in expected overall response provides clues to the system as to the magnitude, direction, and temporal shape of the individual response to each predictor. As multicolinearity increases, the expected overall response increasingly converges to a single shape which is shared across all stimuli (albeit scaled by the stimulus magnitude). In this setting, the model should still be able to correctly recover the global response characteristics, but may decompose it into predictor-wise responses that increasingly deviate from the true data-generating model. In the extreme case that each predictor is identical, the expected response is identical for each stimulus, and the model will construct IRFs whose summation approximates the true global response profile but whose attribution of IRF components to predictors is random.

The above predictions are borne out empirically by results presented in Section 6. While those results indicate that DTSR models are surprisingly robust to multicolinearity, models fitted to highly colinear data should be interpreted with caution, and perfectly colinear data should be avoided altogether. As in linear models, multicolinearity can be avoided by orthogonalizing predictors in advance (e.g. via principal components analysis). Empirical assessment of orthogonalization procedures in the DTSR setting is left to future work.

5 Implementation

The present implementation defines the equations from Section 3⁵ as a Bayesian computation graph in Tensorflow and Edward and trains it with black box variational inference (BBVI) using the Nadam optimizer (Dozat, 2016)⁴ with a constant learning rate of 0.01 and minibatches of size 1024. For computational efficiency, histories are truncated at 128 timesteps. Prediction from the network uses an exponential moving average of parameter iterates with a decay rate of 0.998. Convergence was visually diagnosed.

The present experiments use a ShiftedGamma IRF kernel:

$$f(x;\ \alpha, \beta, \delta) = \frac{\beta^{\alpha}\,(x - \delta)^{\alpha - 1}\,e^{-\beta(x - \delta)}}{\Gamma(\alpha)} \qquad (5)$$

This is simply the probability density function of the Gamma distribution augmented with a shift parameter δ allowing the lower bound of the support of the distribution to deviate from 0. The following additional constraints are imposed: (1) δ is strictly negative, thereby allowing the model to find a non-zero instantaneous response, and (2) the shape parameter α is strictly greater than 1, deconfounding the shape and shift parameters. All bounded variables are constrained using the softplus bijection:

$$\text{softplus}(x) = \log(e^x + 1)$$

The ShiftedGamma kernel is used here because it can fit a wide range of response shapes and has precedent in the fMRI literature, where HRF kernels are often assumed to be Gamma-shaped (Lindquist et al., 2009).⁶

⁴The Adam optimizer (Kingma and Ba, 2014) with Nesterov momentum (Nesterov, 1983).

⁵As noted above, for expository purposes the definition in Section 3 only supports fixed-effects models. The full definition for mixed-effects DTSR models is provided in Appendix A. Mixed models are used throughout the experiments reported below.

⁶Other IRF kernels, including spline functions and compositions of convolutions, are supported by the current implementation of DTSR but are not explored in these experiments. More details are provided in the software documentation.
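A minimal sketch of this kernel and its constraints, assuming SciPy's Gamma density and softplus-transformed unconstrained parameters (the package's internal parameterization may differ):

```python
import numpy as np
from scipy.stats import gamma

def softplus(x):
    # softplus(x) = log(exp(x) + 1), computed stably
    return np.logaddexp(0.0, x)

def constrain(alpha_raw, beta_raw, delta_raw):
    # alpha strictly greater than 1, beta strictly positive (assumed),
    # delta strictly negative so a non-zero instantaneous response is possible
    return 1.0 + softplus(alpha_raw), softplus(beta_raw), -softplus(delta_raw)

def shifted_gamma(x, alpha, beta, delta):
    # Eq. (5): Gamma density with rate beta, shifted so its support starts at delta
    return gamma.pdf(x - delta, a=alpha, scale=1.0 / beta)

alpha, beta, delta = constrain(0.5, 1.5, -0.7)
t = np.linspace(0.0, 2.0, 5)
print(shifted_gamma(t, alpha, beta, delta))   # IRF values at several latencies
```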
All parameters are given normal priors with unit variance. Prior means for the fixed IRF kernel parameters are domain-specific and discussed in the experiments sections below. To center the prior at an intercept-only model,⁷ prior means for the intercept µ and variance σ² are set (respectively) to the empirical mean and variance of the response, and prior means for both fixed coefficients and random effects⁸ are set to 0. Although the Bayesian implementation of DTSR is used for this study because it provides quantification of uncertainty, placing priors on the IRF kernel parameters is not crucial to the success of the system. In all experiments reported below, the MLE implementation arrives at similar solutions and achieves slightly better error.

In the interests of enabling the use of DTSR by the scientific community, the implementation of DTSR used here is offered as a documented open-source Python package with support for (1) Bayesian, variational Bayesian, and MLE inferences and (2) a variety of model structures and impulse response kernels. The Tensorflow backend also enables GPU acceleration where available. Source code and links to documentation are available at https://github.com/coryshain/dtsr.

⁷A model in which the response is insensitive to the model structure.

⁸See Appendix A for the definition of the mixed-effects DTSR model, which includes random effects.

6 Experiment 1: Synthetic data

An initial experiment fits DTSR estimates to synthetic datasets. This experiment has two purposes: (1) to determine whether the model can recover known ground truth IRFs and (2) to assess the impact of multicolinearity in the predictors. Synthetic data were created by convolving sets of random covariates with known impulse responses in order to generate simulated response variables. Each simulation contained 20 covariates, and predictor and response streams each contained 10,000 observations spaced 100ms apart. A single set of impulse responses was randomly drawn at the outset of the experiment and shared across all synthetic datasets. To produce the impulse responses, ground truth coefficients were drawn from a uniform distribution U(−50, 50), and ground truth IRF parameters were drawn from the following distributions: α ∼ U(1, 6), β ∼ U(0, 5), δ ∼ U(−1, 0).⁹ The prior means for the corresponding IRF kernel parameters were placed at the centers of these ranges.

⁹These ranges generally yield IRFs with peak dynamics within the system's visibility window of 12.8s. Visibility is curtailed by history truncation (128 timesteps, see above), leading to a maximum visibility of 12.8s since each trial is 0.1s long (128 × 0.1 = 12.8).

To manipulate multicolinearity in the predictors, predictor streams were drawn from multivariate normal distributions in which the variance-covariance matrix had a diagonal of 1 and all off-diagonal elements were set to the desired level of correlation. For example, predictors with correlation level ρ = 0.5 were drawn using the following variance-covariance matrix:

$$\begin{bmatrix}
1 & 0.5 & 0.5 & \cdots & 0.5 & 0.5 & 0.5 \\
0.5 & 1 & 0.5 & \cdots & 0.5 & 0.5 & 0.5 \\
0.5 & 0.5 & 1 & \cdots & 0.5 & 0.5 & 0.5 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0.5 & 0.5 & 0.5 & \cdots & 1 & 0.5 & 0.5 \\
0.5 & 0.5 & 0.5 & \cdots & 0.5 & 1 & 0.5 \\
0.5 & 0.5 & 0.5 & \cdots & 0.5 & 0.5 & 1
\end{bmatrix}$$
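A minimal sketch of this sampling scheme (the parameter values mirror the description above; the function and variable names are illustrative):

```python
import numpy as np

def draw_predictors(n_obs=10000, n_pred=20, rho=0.5, seed=0):
    # Variance-covariance matrix with unit variances and off-diagonal rho
    Sigma = np.full((n_pred, n_pred), rho)
    np.fill_diagonal(Sigma, 1.0)
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(np.zeros(n_pred), Sigma, size=n_obs)
    t = np.arange(n_obs) * 0.1            # observations spaced 100 ms apart
    return X, t

X, t = draw_predictors(rho=0.5)
print(np.corrcoef(X, rowvar=False)[:3, :3])   # empirical correlations near 0.5
```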
Five sets of predictors were generated in this way, one for each of ρ = 0 (uncorrelated predictors), ρ = 0.25, ρ = 0.5, ρ = 0.75, and ρ = 1. Note that ρ = 1 is a degenerate case in which all 20 covariates are identical. As such, it is undecidable and therefore not a valid use case for DTSR. It is included in this experiment to assess the prediction outlined in Section 4 that in such a setting DTSR will faithfully recover the global response profile while decomposing and distributing it to predictors at random.

For each set of predictors, the stream of responses was generated by convolving the covariates with their corresponding IRFs and multiplying them by their coefficients. For each response vector, Gaussian noise with standard deviation 20 was drawn and added to the responses following generation.

As shown in the left column of Figure 4, the DTSR estimates for the synthetic data with ρ ≤ 0.75 are very similar to the ground truth, confirming that when the data-generating model matches the assumptions of DTSR, DTSR can recover its latent structure with high fidelity even in the presence of strong multicolinearity. Nonetheless, there are small degradations in model fidelity with increasing colinearity, supporting the hypothesis that the difficulty of model identification increases

(c) ρ = 0 (d) ρ = 0, summed Response

(e) ρ = 0.25 (f) ρ = 0.25, summed

(g) ρ = 0.5 (h) ρ = 0.5, summed

(i) ρ = 0.75 (j) ρ = 0.75, summed

(k) ρ = 1 (l) ρ = 1, summed

Time (s)

Figure 4: Synthetic data. Predictor-wise (left) and summed IRFs in the true (top) and estimated models at varying levels of impulse multicolinearity ρ. Estimates are shown with 95% credible intervals. with colinearity and motivating the avoidance of lus set contains 10 stories with a total of 485 sen- strongly colinear data when possible. tences and 10,245 word tokens, for a total 848,768 The right column of Figure4 shows a sum of fixation events (where one event is a single subject all the IRFs in the system. This corresponds to viewing a single word token). the expected response to a stimulus containing Dundee is an eye-tracking corpus containing a unit impulse for each predictor. Since this is newspaper editorials read by 10 subjects, with in- a deterministic function of the component IRFs, cremental eye fixation data recorded during read- summed estimated responses should closely ap- ing. The stimulus set contains 20 editorials with a proximate summed true responses, which Figure4 total of 2,368 sentences and 51,502 word tokens, shows to be the case in all models. The reason for a total of 260,065 fixation events (where one the summed response is of interest here is that, event is a single subject fixating a single word to- as argued in Section4, with increasing multico- ken for the first time from the left). linearity the synthetic response at each stimulus UCL is a reading corpus containing individual increasingly resembles (a multiple of) a single re- sentences that were extracted from novels written sponse profile. In these experiments where pre- by amateur authors. The sentences were shuffled dictors have identical means and variances and and presented in isolation to 42 subjects. The eye- are positively correlated, this response profile ap- tracking portion of the UCL corpus used in these proaches the summed response in actual fact, and experiments contains 205 sentences with a total of in such a setting DTSR is expected to find com- 1,931 word tokens, for a total of 53,070 fixation ponent IRFs that sum to the global response, even events. if their allocation to particular predictors is largely In all experiments, the response variable is arbitrary. The ρ = 1 example (bottom row) clearly log fixation duration (go-past duration for eye- shows this to be the case. Although the model has tracking). Models use the following set of pre- discovered component IRFs and assigned them to dictor variables in common use in psycholinguis- predictors in ways that do not resemble the true tics: Sentence position (index of word in sen- model at all, the sum of these IRFs nonetheless tence), Trial (index of trial in series),10 Saccade captures the true summed response with very high Length (in words, eye-tracking only), Word Length fidelity. The summed response profile is theonly (in characters), Unigram Log Probability, and 5- characteristic of the true model that DTSR could gram Surprisal. Unigram Log Probability and have learned in this setting, and DTSR does suc- 5-gram Surprisal are computed by the KenLM cessfully recover it. toolkit (Heafield et al., 2013) trained on Gigaword 4 (Parker et al., 2009). Examples of studies using 7 Experiment 2: Human reading times some or all of these predictors include Demberg 7.1 Background and experimental design and Keller(2008); Frank and Bod(2011); Smith and Levy(2013) and Baayen et al.(2018). 
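For readers who want to reproduce predictors of this kind, the following is a rough sketch using the KenLM Python bindings; the model path, tokenization, and log base are illustrative assumptions rather than the exact setup used here:

```python
import math
import kenlm  # Python bindings for the KenLM toolkit

# Hypothetical path to a 5-gram model trained on Gigaword (binary or ARPA format)
lm = kenlm.Model('gigaword4.5gram.binary')

def fivegram_surprisal(sentence):
    """Per-word 5-gram surprisal (negative log2 probability) for a tokenized sentence."""
    words = sentence.split()
    # full_scores yields one (log10 prob, ngram length, OOV flag) per word, plus </s>
    scores = list(lm.full_scores(sentence, bos=True, eos=True))
    return [(w, -logp / math.log10(2.0)) for w, (logp, _, _) in zip(words, scores)]

print(fivegram_surprisal('the cat sat on the mat'))
```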
The main interest of DTSR is the potential to bet- In addition, DTSR enables fitting of the impulse ter understand real-world dynamical systems like response to a Rate predictor, which is simply a the human sentence processing response. There- vector of ones, one for each observation (Bren- fore, Experiment 2 applies DTSR to three exist- nan et al., 2012). Rate can be viewed as a decon- ing datasets of naturalistic reading: Natural Sto- volutional intercept, providing information about ries (Futrell et al., 2018), Dundee (Kennedy et al., stimulus timing but no information about stimulus 2003), and UCL (Frank et al., 2013). properties. The fitted response to Rate is an esti- Natural Stories is a self-paced reading (SPR) mate of the baseline response of the system, with corpus consisting of context-rich narratives that expected deviation from that baseline governed by resemble fluent storytelling while nonetheless the estimated responses to the other predictors, containing many grammatical constructions that which vary from stimulus to stimulus. Rate esti- rarely occur naturally in texts. The public release mates also provide insight into the effect of tempo- of the corpus contains data collected from 181 ral density of stimulus presentation, since the Rate subjects who paged through the stories on a com- responses accumulate for stimuli that impinge on puter screen, pressing a button to reveal the next word. The amount of time spent on each word 10Except UCL, which contains isolated sentences, in was recorded as the response variable. The stimu- which case Trial is identical to Sentence Position. the system before it has effectively finished re- uments and sentences respectively. They are not sponding to Rate from previous stimuli. Since perceptual or linguistic properties of the experi- without deconvolution Rate is identical to the in- ment, and it is unclear how any diffuse impulse tercept, it is excluded from non-deconvolutional response attributed to them would be interpreted. baseline models. Following prior work (Demberg and Keller, 2008; Following standard practice in psycholinguis- Baayen et al., 2018), their presence in the model tics, by-subject random intercepts along with by- is motivated by the possibility of trends in the re- subject random coefficients for each of these pre- sponse. For this reason, ShiftedGamma IRFs are dictors are included in all models (baseline and fitted to all predictors except Trial and Sentence DTSR).11 All predictors are rescaled by their stan- Position, which are assigned a Dirac delta IRF (i.e. dard deviations prior to fitting.12 A single DTSR a linear coefficient). Consequently, in plots, the model was fitted to each corpus. Trial and Sentence Position estimates are shown Existing work provides some expectations as stick functions at time 0s. about the relationships of these variables to read- Specifying the Bayesian model requires stating ing time. Processing difficulty is expected to in- priors over the IRF parameters. Unfortunately, the crease with Saccade Length, Word Length, and 5- existing literature on human reading does not pro- gram Surprisal, and positive linear relationships vide detailed evidence as to the temporal char- have been shown experimentally (Demberg and acteristics of the response to predictors, due at Keller, 2008). Unigram Log Probability is ex- least in part to the difficulty of estimating these pected to be negatively correlated with reading characteristics without the aid of DTSR. 
Nonethe- times, since more frequent words are expected to less, some relevant signposts exist. For example, be easier to process. Sentence Position, Trial, and studies using many spillover positions have found Rate index different kinds of change in the re- the influence of linguistic predictors like surprisal sponse over time and their relationship has not to decrease monotonically with spillover position been carefully studied, in part for lack of deconvo- (Smith and Levy, 2013). Although spillover does lutional regression tools. Although reading times not have a clear continuous-time interpretation, tend to decrease over the course of the experi- these results support a strictly-decreasing shape ment (Baayen et al., 2018), suggesting an expected for the prior on the IRF kernel. In addition, the negative effect of Trial, this may be partially ex- timecourse of the sentence processing response plained by temporal diffusion. has been very carefully studied in the domain of The predictors Saccade Length, Word Length, electroencephalography, with consistent finding of Unigram Log Probability, and 5-gram Surprisal a number of distinct event-related potentials gen- are all motor, perceptual, or linguistic variables erally occurring within one second after stimulus to which the sentence processing system has been onset (Sur and Sinha, 2009). Together, these con- shown to respond upon word fixation (Demberg siderations support prior bias towards a stricly- and Keller, 2008) and to which the response might decreasing exponential-like IRF with response pri- not be perfectly instantaneous. To the extent that marily localized to the first second after stimulus temporally diffuse responses to any of these pre- onset. In this study, the prior means used to meet dictors exist, it is desirable that the model be this desideratum are α = 2, β = 5, and δ = −0.5. able to capture them. By contrast, Trial and Sen- Note that these are simply choices for the prior; tence Position merely index progress through doc- the model can deviate unboundedly far from them in its parameters as motivated by the data, poten- 11By-subject IRF parameters were not used for this study because they substantially complicate the model and ini- tially finding late-peaking responses, more rapidly tial experiments using them showed little benefit on training decaying responses, and more slowly decaying re- data. By-word random intercepts, though common in psy- sponses. In practice, the choice of prior does cholinguistic studies (Demberg and Keller, 2008), were also avoided because (1) estimates for Word Length and Unigram not appear to be very constraining, since posterior Log Probability are of interest for this study but are context- means of fitted models often deviate quite far from free and can therefore be wholly or partially consumed by the the prior means. random intercepts and (2) early experiments suggested that by-word intercepts led to overfitting (based on development In all reading experiments, data were parti- set performance) in both DTSR and baseline models. 12Except Rate, which has no variance and therefore cannot tioned into training (50%), development (25%) be scaled by its standard deviation of 0. and test (25%) sets. Outlier filtering was also td r vial nteascae oerepository. code associated the in available are study information boundaries. provides line Dundee and document, only screen, e.g. about since data, and long differences source (3) and in eye-tracking, items to relevant unfixated only e.g. 
are (2) saccades since citations), es- modality, (see precedent corpora in in these differences use differences that are (1) studies criteria by of tablished exclusion combination in a (Breen, by corpora driven prosody across likeimplicit Differences 2014). effects of boundary influence suffix the with paper this desig- throughout for are nated spillover positions with (baselines spillover predictor each preceding three non- without of that GAM. LME to and with compared fitted models is baseline data deconvolutional qual- unseen prediction on estimates, re- ity DTSR the the check of human to liability However, into provide processing. they sentence insights the them- and IRFs the selves are 2 Experiment in interest of sults predictor model. entire the The to visible remained series. history response the and to starts only sentence (2) and ends. words saccades 4 following items than (1) longer as well were items as unfixated excluded UCL, doc- For screens, lines. sentences, and of uments, ends words and 4 starts (2) than and longer wellas saccades as following items excluded (1) were items Schuler unfixated and (2015), Schijndel van questions. following comprehension Dundee, For subsequent missed more subjects if or or sentence, 4 a end 3000ms, or than start they longer if or fix- 100ms have than they shorter if ations excluded were Shain items (2016), following al. Stories, et Natural For performed. seen. be to tight too are Intervals (right). UCL and (center) 5 Figure Fixation duration (log ms) 15 14 13 rmamdln esetv,tepiayre- primary the perspective, modeling a From hsnme fsilvrpstosi mn h largest the among is positions spillover of number this This in reported model each construct to used Formulae the to minimize designed are filters outlier these of Most 13 : attoigadfleigwr applied were filtering and Partitioning ua data Human 14 ohbslnsaefte ihand with fitted are baselines Both siae Rswt 5 rdbeitrasfrNtrlSois(et,Dundee (left), Stories Natural for intervals credible 95% with IRFs Estimated . -S ). 15 ie(ms) Time h is eodatrsiuu ne.Teearealso There negative large-magnitude onset. stimulus after second first the Surprisal 5-gram responses estimated for the general, de- In that time. influence with cays instantaneous peak magni- a in with tude, monotonically estimates decreases all response In the facilitation. neg- represent and IRFs inhibition ative represent stimu- reading IRFs after is positive ms response time, the 250 Because ms presentation. log lus slow- 0.03 a about and of instantaneously down ms log 0.05 about of devi- standard a of Dundee ation observing the that example, estimates For model im- predictor. unit in the a of observing change pulse from expected time over the (curves) response represent the IRFs plots with The these along (CI). 1, in inTable intervals credible are shown — 95% over IRF 10s each first of integral the the as corpus by here sizes computed Effect — and . 5 Dundee, Figure in Stories, shown are for Natural UCL IRFs fitted The Results 7.2 tms pl h aesto Rst l aaons while datapoints, all to each spillover for IRFs position. of models that separate in set fit history essentially same baselines of the the use apply its in must 4), constrained it than more vs. histories timepoints much 128 is longer experiments, DTSR much these While (in consider competitors complexity. to its model DTSR to cost permits arbi- no consider this can at it histories that long An is trarily approach 2. 
Table DTSR the of of cells advantage certain by ex- shown in these as reported tractability, for non-convergence of run limits the the models at baseline already vari- the are each periments of for many position Indeed, spillover each able. for included random each are by-subject with slopes when substantially especially added, increases position GAM spillover and LME in com- model plexity because literature psycholinguistic the in attested odLength Word -rmsurprisal 5-gram , r oiieadcnetae in concentrated and positive are nga o Probability Log Unigram Rate nedr slowdown a engenders siae cosall across estimates and , Natural Stories Dundee UCL Predictor Mean 2.5% 97.5% Mean 2.5% 97.5% Mean 2.5% 97.5% Trial -0.0053 -0.0057 -0.0049 -0.0085 -0.0010 -0.0071 ——— Sent pos 0.0154 0.0148 0.0160 0.0004 -0.0013 0.0022 0.0340 0.0301 0.0379 Rate -0.1853 -0.1858 -0.1848 -0.0649 -0.0659 -0.0640 -0.0806 -0.0832 -0.0781 Sac len ——— 0.0249 0.0216 0.0207 0.0217 0.0209 0.0225 Word len 0.0020 0.0019 0.0021 0.0107 0.0105 0.0109 -8e-07 -1.7e-5 1.4e-5 Unigram 2.6e-6 -5e-6 2.2e-5 -2.0e-6 -3.9e-5 2.8e-5 1e-06 -4e-6 1.2e-5 5-gram 0.0057 0.0056 0.0059 0.0139 0.0134 0.0145 0.0159 0.0148 0.0171

Table 1: Effect sizes by corpus with 95% credible intervals based on 1024 posterior samples corpora, with an especially pronounced Rate re- 7.3 Discussion sponse in Natural Stories. More detailed interpre- Some key generalizations emerge from the DTSR tation of these curves is provided below in Sec- estimates shown in Figure5. The first is the pro- tion 7.3. nounced facilitative role of Rate in all three mod- Table2 shows prediction error from DTSR vs. els, but especially in Natural Stories. This means baselines fitted to the same feature set. As shown, that fast reading in the recent past engenders DTSR provides comparable or improved predic- fast reading in the present, because (1) observ- tion performance to the baselines, even against the ing a stimulus exerts a large-magnitude, diffuse, -S models which are more heavily parameterized. and negative (facilitative) influence on the sub- DTSR outperforms LME models on unseen data sequent response, and (2) the Rate contributions across all corpora and generally improves upon or of the stimuli are additive. This result demon- closely matches the performance of GAM (with no strates an important pre-linguistic influence of in- spillover). Compared to GAM-S (with three addi- ertia — a tendency toward slow overall change tional spillover positions), there is a clear advan- in base response rate. This effect is especially tage of DTSR for Natural Stories but not for the large-magnitude and diffuse in Natural Stories, eye-tracking datasets. This is likely due to more which is self-paced reading and therefore differs pronounced temporal confounds in Natural Sto- in modality from the other datasets (which are ries (especially of Rate, which the baseline models eye-tracking). This suggests that SPR participants cannot estimate) compared to the other corpora. strongly habituate to repeated button pressing and Note that GAM-S is more heavily parameterized stresses the importance of deconvolutional regres- than DTSR in that it fits multidimensional spline sion for bringing this low-level confound under functions of each spillover position of each pre- control in analyzing SPR data, since it appears to dictor. This makes it difficult to generalize infor- have a large influence on the response. If left un- mation about effect timecourses from GAM fits, controlled, variation due to Rate (i.e. due to the motivating the use of DTSR for studies in which timing structure of the stream of stimuli) might timecourses are a quantity of interest. be mis-attributed to other predictors, possibly con- Note also that even in the absence of very dif- founding model interpretation. fuse effects that afford prediction improvements, Second, effects are generally consistent with ex- the ability to measure diffusion directly is a major pectations: positive effects for Saccade Length, advantage of the DTSR model, since it can be used Word Length, and 5-gram Surprisal, and a neg- to detect the absence of diffusion in settings where ative effect of Trial. The null influence of Uni- it might in principle exist. gram Log Probability is likely due to the presence As shown in Table3, pooling across corpora, in the model of both 5-gram Surprisal (which in- permutation testing reveals a significant improve- terpolates unigram probabilities) and Word Length ment in MSE on test data of DTSR over each base- (which is inversely correlated with Unigram Log line system (p = 0.0001 for all comparisons).16 Probability). 
The biggest departure from prior ex- 16To ensure comparability across corpora with different er- pectations is the null estimate for Word Length ror variances, per-datum errors were first scaled by their stan- in UCL. It appears that the contribution of Word dard deviations within each corpus. Standard deviations were computed over the joint set of error values in each pair of Length in this corpus can be effectively explained DTSR and baseline models. The reason DTSR outperforms GAM-S in the pooled permutation test despite underperform- improvement on Natural Stories, which is itself much larger ing it on Dundee and UCL is that it gives a large relative than the other two datasets. Natural Stories Dundee UCL System Train Dev Test Train Dev Test Train Dev Test LME 0.0803 0.0818 0.0815 0.2135 0.2133 0.2128 0.2613 0.2776 0.2561 LME-S 0.0789† 0.0807† 0.0804† 0.2099† 0.2103† 0.2095† 0.2509† 0.2754† 0.2557† GAM 0.0798 0.0814 0.081 0.212 0.2116 0.2111 0.2576 0.2741 0.2538 GAM-S 0.0784 0.0802 0.0799 0.2083 0.2085 0.2078 0.2440 0.2661 0.2457 DTSR 0.0648 0.0655 0.0650 0.2100 0.2094 0.2088 0.2590 0.2752 0.2543

Table 2: Mean squared prediction error by system (daggers indicate convergence warnings)
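The pooled comparisons reported in the surrounding text are based on permutation tests over per-datum prediction errors. A minimal sketch of such a paired permutation test (the paper's exact procedure, e.g. its resampling count, is only partially specified here):

```python
import numpy as np

def paired_permutation_test(err_baseline, err_dtsr, n_iter=10000, seed=0):
    """Paired permutation test on per-datum (scaled) squared errors of two systems.

    Under the null hypothesis the two systems' errors for a datum are exchangeable,
    so we randomly swap them and ask how often the permuted mean improvement is at
    least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    diff = err_baseline - err_dtsr            # positive where DTSR is better
    observed = diff.mean()
    hits = 0
    for _ in range(n_iter):
        signs = rng.choice([-1.0, 1.0], size=diff.shape)
        if (signs * diff).mean() >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)          # smoothed one-sided p-value

# err_baseline, err_dtsr: per-datum squared errors, scaled within each corpus
# before pooling, as described in the text
```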

Baseline DTSR improvement (z-units) p-value For example, as shown in Table1, the CI for 5- LME 0.059 0.0001∗∗∗ LME-S 0.054 0.0001∗∗∗ gram Surprisal in Natural Stories does not include GAM 0.057 0.0001∗∗∗ zero (rejecting the null hypothesis of no effect), ∗∗∗ GAM-S 0.051 0.0001 while the CI for Unigram logprob does (failing Table 3: Overall pairwise significance of prediction to reject). To control for effects of multicolinear- improvement from DTSR vs. baselines ity, one could perform ablative tests of fitted null and alternative models using non-parametric tests of predictive performance on in-sample or out-of- by other variables. sample data. Third, the response estimates for Dundee and However, DTSR estimates are obtained through UCL (both of which are eye-tracking) are similar, non-convex stochastic optimization, which com- which suggests that DTSR is discovering replica- plicates hypothesis testing because of possible es- ble population-level features of the temporal pro- timation noise due to (1) convergence to a lo- file for eye-tracking data. cal but not global optimum, (2) imperfect con- Fourth, there is a general asymmetry in degree vergence to the local optimum, and/or (3) Monte of diffusion between low-level perceptual-motor Carlo estimation of the test statistic via posterior Saccade Length Word Length variables like and , sampling. It cannot therefore be guaranteed that whose responses tend to decay quickly, and the hypothesis testing results are due to differences 5-gram Surprisal high-level variable, whose re- in model structure rather than differences in rel- sponse tends to decay more slowly, as shown by ative amounts of estimation noise introduced by the existence in all corpora of a point in time af- the fitting procedure. Thus, hypothesis tests based 5-gram Surprisal ter which the response exceeds on direct comparison of DTSR models rely on Saccade Length Word Length the and responses a (possibly incorrect) assumption that the mod- in magnitude. This is consistent with a view els are effectively optimal. The empirical results of sentence processing in which perceptual-motor presented in Section6 support this assumption by variables have a faster response because they in- showing DTSR fits that are consistently close to volve rapid bottom-up computation (e.g. visual the true data generating model, even in the pres- processing or motor planning/execution), while ence of strongly correlated predictors, suggesting surprisal has a slower response because it involves that non-global optima may not be a severe con- more expensive top-down computations about fu- found in practice. The procedure of statistically ture words given context (Friederici, 2002; Con- comparing models fitted via non-convex optimiza- nor et al., 2004; Bonte et al., 2005). While this tion is identical to the one typically followed for outcome is suggested e.g. by the aforementioned model comparison in machine learning (Demšar, finding that spillover 1 winds up being a stronger 2006), where differences between two models in position for a surprisal predictor in the Shain et al. performance on held-out data is subjected to sta- (2016) models, indicating a diffuse response that tistical testing, even when the fits are obtained in spreads to or even peaks at subsequent words, a non-convex deep learning setting. The outcomes DTSR permits direct investigation of these dy- of such tests are implicitly conditional on the fit- namics. ted parameters of each model. 
The probability of 8 A note on hypothesis testing that the fitted parameters are indeed optimal fora given experiment can be increased by Monte Carlo As a Bayesian model, DTSR supports hypothe- techniques in which multiple randomly initialized sis testing by querying the variational posterior. models are fitted and then aggregated, an approach to which DTSR is also amenable. related potentials (Walter et al., 1964) in oscilla- However, even in situations where such un- tory measures like electroencephalography, sug- certainty in hypothesis testing is not acceptable, gesting a rich array of potential applications of DTSR is appropriate for certain important use DTSR in computational psycholinguistics. cases. First, DTSR can be used for exploratory data analysis in order to empirically motivate the References spillover structure of the linear model. Spillover variables can be excluded or included based on the Martín Abadi, Ashish Agarwal, Paul Barham, Eugene degree of temporal diffusion revealed by DTSR, Brevdo, Zhifeng Chen, Craig Citro, Greg Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay permitting construction of linear models that are Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey both parsimonious and effective for controlling Irving, Michael Isard, Yangqing Jia, Rafal Jozefow- temporal diffusion. For example, if the average icz, Lukasz Kaiser, Manjunath Kudlur, Josh Lev- trial duration is 300ms and the estimated response enberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon to a predictor is near zero at that point, this could Shlens, Benoit Steiner, Ilya Sutskever, Kunal Tal- be used to motivate the exclusion of spillover vari- war, Paul Tucker, Vincent Vanhoucke, Vijay Vasude- ables for that predictor. To overcome the limita- van, Fernanda Viégas, Oriol Vinyals, Pete Warden, tion that linear models cannot fit estimates of Rate, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale the convolved Rate predictor from the DTSR fit Machine Learning on Heterogeneous Distributed can be extracted and supplied to the linear model Systems. as a predictor. Second, DTSR can be used to fit Harald Baayen, Shravan Vasishth, Reinhold Kliegl, and a data transform which is then applied to the data Douglas Bates. 2017. The cave of shadows: Ad- prior to statistical analysis. This approach is iden- dressing the human factor with generalized additive tical in spirit to e.g. the use of the canonical HRF mixed models. Journal of Memory and Language, to convolve predictors in fMRI models prior to lin- 94(Supplement C):206–234. ear regression (Boynton et al., 1996). The canoni- R Harald Baayen, Jacolien van Rij, Cecile de Cat, and cal HRF may be sub-optimal for the data at hand, Simon Wood. 2018. Autocorrelated errors in exper- yet likelihood maximization and model compari- imental data in the language sciences: Some solu- tions offered by Generalized Additive Mixed Mod- son are conditional on it. The same would hold els. In Dirk Speelman, Kris Heylen, and Dirk Geer- of linear fits to predictors convolved using DTSR. aerts, editors, Mixed Effects Regression Models in However, unlike the canonical HRF, since DTSR Linguistics. Springer, Berlin. is domain-general, it can be integrated into any Douglas Bates, Martin Mächler, Ben Bolker, and Steve analysis toolchain for time series. Walker. 2015. Fitting linear mixed-effects mod- els using lme4. Journal of Statistical Software, 9 Conclusion 67(1):1–48. 
Milene Bonte, Tiina Parviainen, Kaisa Hytönen, and This paper presented a variational Bayesian de- Riitta Salmelin. 2005. Time course of top-down and convolutional time series regression method as a bottom-up influences on syllable processing in the solution to the problem of temporal diffusion in auditory cortex. Cerebral Cortex, 16(1):115–123. psycholinguistic time series data and applied it to Geoffrey M Boynton, Stephen A Engel, Gary H Glover, both synthetic and human responses in order to and David J Heeger. 1996. Linear systems analysis better understand and control for latent temporal of functional magnetic resonance imaging in human V1. Journal of Neuroscience, 16(13):4207–4221. dynamics. Results showed that DTSR can yield a plausible, replicable, parsimonious, insightful, Mara Breen. 2014. Empirical investigations of the role and predictive model of a complex dynamical sys- of implicit prosody in sentence processing. Lan- guage and Linguistics Compass, 8(2):37–50. tem like the human sentence processing response and therefore support the use of DTSR for psy- Jonathan Brennan, Yuval Nir, Uri Hasson, Rafael cholinguistic time series modeling. While the Malach, David J Heeger, and Liina Pylkkänen. 2012. Syntactic structure building in the anterior temporal present study explored the use of DTSR to under- lobe during natural story listening. Brain and Lan- stand human reading times, DTSR can in princi- guage, 120(2):163–173. ple also be used to deconvolve other kinds of re- Charles E Connor, Howard E Egeth, and Steven Yantis. sponse variables, such as the HRF in fMRI mod- 2004. Visual attention: bottom-up versus top-down. eling (Boynton et al., 1996) or intracranial event- Current biology, 14(19):R850–R852. Bhupinder S Dayal and John F MacGregor. 1996. Iden- Trevor Hastie and Robert Tibshirani. 1986. General- tification of finite impulse response models: meth- ized additive models. Statist. Sci., 1(3):297–310. ods and robustness issues. Industrial & engineering chemistry research, 35(11):4078–4090. Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H Clark, and Philipp Koehn. 2013. Scalable modi- Vera Demberg and Frank Keller. 2008. Data from eye- fied Kneser-Ney language model estimation. In Pro- tracking corpora as evidence for theories of syntactic ceedings of the 51st Annual Meeting of the Associa- processing complexity. Cognition, 109(2):193–210. tion for Computational Linguistics, pages 690–696, Sofia, Bulgaria. Janez Demšar. 2006. Statistical comparisons of clas- sifiers over multiple data sets. Journal of Machine Alan Kennedy, James Pynte, and Robin Hill. 2003. Learning Research, 7(Jan):1–30. The Dundee corpus. In Proceedings of the 12th Eu- Timothy Dozat. 2016. Incorporating Nesterov momen- ropean conference on eye movement. tum into Adam. In ICLR Workshop. Diederik P Kingma and Jimmy Ba. 2014. Adam: Kate Erlich and Keith Rayner. 1983. Pronoun assign- A method for stochastic optimization. CoRR, ment and semantic integration during reading: Eye abs/1412.6. movements and immediacy of processing. Journal of Verbal Learning & Verbal Behavior, 22:75–87. Martin Lindquist and Tor Wager. 2007. Validity and power in hemodynamic response modeling: A com- Stefan L Frank and Rens Bod. 2011. Insensitivity of parison study and a new approach. Human brain the Human Sentence-Processing System to Hierar- mapping, 28:764–784. chical Structure. Psychological Science, 22(6):829– 834. Martin A Lindquist, Ji Meng Loh, Lauren Y Atlas, and Tor D Wager. 2009. 
Modeling the hemodynamic re- Stefan L Frank, Irene Fernandez Monsalve, Robin L sponse function in fMRI: Efficiency, bias and mis- Thompson, and Gabriella Vigliocco. 2013. Read- modeling. NeuroImage, 45(1, Supplement 1):S187 ing time data for evaluating broad-coverage models – S198. of English sentence processing. Behavior Research Methods, 45(4):1182–1190. Vijay Madisetti. 1997. The digital signal processing Angela D Friederici. 2002. Towards a neural basis of handbook. CRC press. auditory sentence processing. Trends in Cognitive Sciences, 6(2):78–84. Yurii E Nesterov. 1983. A method for solving the convex programming problem with convergence rate Karl J Friston, Oliver Josephs, Geraint Rees, and O(1/kˆ2). In Dokl. Akad. Nauk SSSR, volume 269, Robert Turner. 1998. Nonlinear event-related re- pages 543–547. sponses in fMRI. Magn. Reson. Med, pages 41–52. John Neter, William Wasserman, and Michael H Kut- Richard Futrell, Edward Gibson, Harry J . Tily, Idan ner. 1989. Applied linear regression models. Blank, Anastasia Vishnevetsky, Steven Piantadosi, and Evelina Fedorenko. 2018. The Natural Stories Michael Nikolaou and Premkiran Vuthandam. 1998. Corpus. In Proceedings of the Eleventh Interna- FIR model identification: Parsimony through ker- tional Conference on Language Resources and Eval- nel compression with . AIChE Journal, uation (LREC 2018), Paris, France. European Lan- 44(1):141–150. guage Resources Association (ELRA). Robert Parker, David Graff, Junbo Kong, Ke Chen, Edward Goldstein, Benjamin J Cowling, Allison E and Kazuaki Maeda. 2009. English Gigaword Aiello, Saki Takahashi, Gary King, Ying Lu, and LDC2009T13. Marc Lipsitch. 2011. Estimating Incidence Curves of Several Infections Using Symptom Surveillance Fabian Pedregosa, Michael Eickenberg, Philippe Ciu- Data. PLOS ONE, 6(8):1–8. ciu, Alexandre Gramfort, and Bertrand Thirion. 2014. Data-driven HRF estimation for encoding and Cristina Gorrostieta, Hernando Ombao, Patrick Bé- decoding models. NeuroImage, 104. dard, and Jerome N Sanes. 2012. Investigating brain connectivity using mixed effects vector autoregres- Valerie A Ramey. 2016. Macroeconomic shocks and sive models. NeuroImage, 59(4):3347–3355. their propagation. In John B Taylor and Harald C Goutte, F A Nielsen, and K H Hansen. 2000. Mod- Uhlig, editors, Handbook of Macroeconomics, vol- eling the hemodynamic response in fMRI using ume 2 of Handbook of Macroeconomics, pages 71– smooth FIR filters. IEEE Transactions on Medical 162. Elsevier. Imaging, 19(12):1188–1201. Marten van Schijndel and William Schuler. 2015. Hier- Gary H. Glover. 1999. Deconvolution of impulse re- archic syntax improves reading time prediction. In sponse in event-related BOLD fMRI. NeuroImage, Proceedings of NAACL-HLT 2015. Association for 9:416–429. Computational Linguistics. R×W Cory Shain, Marten van Schijndel, Richard Futrell, Ed- • a matrix ∈ R of R random IRF kernel ward Gibson, and William Schuler. 2016. Mem- parameters for W random impulse vectors ory access during incremental sentence processing causes reading time latency. In Proceedings of the Random parameters o, v, and are constrained to Computational Linguistics for Linguistic Complex- ity Workshop, pages 49–58. Association for Compu- be zero-centered. tational Linguistics. To support mixed modeling, the fixed and ran- dom effects must first be combined using addi- Christopher A Sims. 1980. Macroeconomics and real- tional utility matrices. Let O ∈ {0, 1}N×O be ity. Econometrica: Journal of the Econometric So- ciety, pages 1–48. 
a mask matrix for random intercepts. A vector N q ∈ R of intercepts is: Nathaniel J Smith and Roger Levy. 2013. The effect of word predictability on reading time is logarithmic. def Cognition, 128:302–319. q = µ + O o (6)

Shravani Sur and V K Sinha. 2009. Event-related po- Let ∈ {0, 1}L×U be an indicator matrix for fixed tential: An overview. Industrial psychiatry journal, coefficients, V ∈ {0, 1}L×V be an indicator ma- 18(1):70. trix for random coefficients, and V′ ∈ {0, 1}N×V Dustin Tran, Alp Kucukelbir, Adji B Dieng, Maja be a mask matrix for random coefficients. A ma- N×L Rudolph, Dawen Liang, and David M Blei. trix Q ∈ R of coefficients is: 2016. Edward: A library for probabilistic mod- eling, inference, and criticism. arXiv preprint def ⊤ ′ ⊤ arXiv:1610.09787. Q = 1 ( u) + V diag(v) V (7)

W. Walter, R. Cooper, V. J. Aldridge, W. C. McCallum, Let W ∈ {0, 1}L×W be an indicator matrix and A. L. Winter. 1964. Contingent Negative Varia- ′ ′ for random IRF parameters and W1,..., Wn ∈ tion: An Electric Sign of Sensori-Motor Association R×W and Expectancy in the Human Brain. Nature. {0, 1} be mask matrices for random IRF pa- R×L rameters. Then matrices Pn ∈ R for n ∈ B Douglas Ward. 2006. Deconvolution analysis of {1, 2,...,N} are: fMRI time series data. def ′ ⊤ Simon N Wood. 2006. Generalized additive models: Pn = A + (Wn⊙) W (8) An introduction with R. Chapman and Hall/CRC, Boca Raton. In each equation above, the random effects param- eters are masked using the random effects filter A Definition of mixed effects DTSR associated with each data point. Q and Pn are For expository purposes, in Section3 the DTSR then transformed into the impulse vector space us- model was defined only for fixed effects. How- ing the indicator matrices V and W, respectively. ever, DTSR is compatible with mixed modeling This procedure sums the random effects associ- and the implementation used here supports ran- ated with each data point and adds them to the dom effects in the model intercepts, coefficients, population-level parameters. and IRF parameters. The full mixed-effects DTSR To define the convolution step, let gl for l ∈ equations are presented below. {1, 2,...,L} be parametric IRF kernels, one for The definitions of X, y, µ, σ, a, b, c, d, F, M, each impulse. Convolution is performed by pre- N, K, and R presented in Section3 are retained multiplying the inputs X with L sparse matrices N×M for the mixed model definition. The remaining l ∈ R for l ∈ {1, 2, ..., L}: variables and equations must be redefined to some def ( ⊤ ) extent. Mixed-effects DTSR models additionally (l)[n,∗] = gl b[n] − a ;(Pn)[∗,l] ⊙ F[n,∗] (9) contain the following parameters: L ∈ {0, 1}K×L O Finally, let be an indicator ma- • a vector o ∈ R of O random intercepts trix mapping the K predictors of X to the corre- U 17 • a vector u ∈ R of U fixed coefficients sponding L impulse vectors of the model. The V • a vector v ∈ R of V random coefficients 17Predictors and impulse vectors are distinguished because R×L in principle multiple IRFs can be applied to the same predic- • a matrix A ∈ R of R fixed IRF kernel tor. In the usual case where this distinction is not needed, L parameters for L fixed impulse vectors is identity and K = L. convolution that yields the design matrix of con- ′ N×L volved predictors X ∈ R is then defined us- ing a product of the convolution matrices, the de- sign matrix, and the impulse indicator L:

$$\mathbf{X}'_{[*,l]} \overset{\text{def}}{=} \mathbf{G}_l\, \mathbf{X}\, \mathbf{L}_{[*,l]} \qquad (10)$$

The full model mean is the sum of (1) the intercepts and (2) the sum-product of the convolved predictors with the coefficient parameters:

$$\mathbf{y} \sim \mathcal{N}\!\left(\mathbf{q} + (\mathbf{X}' \odot \mathbf{Q})\,\mathbf{1},\ \sigma^2\right) \qquad (11)$$
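A tiny NumPy sketch of how the mean in eq. (11) combines per-observation intercepts and coefficients with the convolved predictors (array shapes follow the definitions above; the values are placeholders):

```python
import numpy as np

def mixed_model_mean(q, X_conv, Q):
    # Eq. (11): per-observation intercepts plus the row-wise sum-product of the
    # convolved predictors with their per-observation coefficients
    return q + (X_conv * Q).sum(axis=1)

N, L = 4, 2
q = np.zeros(N)                    # intercepts from eq. (6)
X_conv = np.ones((N, L))           # convolved predictors from eq. (10)
Q = np.full((N, L), 0.5)           # coefficients from eq. (7)
print(mixed_model_mean(q, X_conv, Q))
```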