LETTER Communicated by Dong Song

Decoding Hidden Cognitive States From Behavior and Physiology Using a Bayesian Approach

Ali Yousefi [email protected] Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, U.S.A.

Ishita Basu [email protected] Department of , Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, U.S.A.

Angelique C. Paulk [email protected] Department of , Massachusetts General Hospital, Boston, MA 02114, U.S.A.

Noam Peled [email protected] Department of Radiology, MGH/HST Martinos Center for Biomedical Imaging and Harvard Medical School, Boston, MA 02114, U.S.A.

Emad N. Eskandar [email protected] Department of Neurological Surgery, Albert Einstein College of Medicine, Bronx, NY 10461, U.S.A.

Darin D. Dougherty [email protected] Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Charlestown, MA 02129, U.S.A.

Sydney S. Cash [email protected] Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, U.S.A.

Alik S. Widge [email protected] Department of Psychiatry, University of Minnesota, Minneapolis, MN 55454, U.S.A.

Neural Computation 31, 1751–1788 (2019) © 2019 Massachusetts Institute of Technology doi:10.1162/neco_a_01196

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco_a_01196 by guest on 29 September 2021 1752 A. Yousefi et al.

Uri T. Eden [email protected] Department of Mathematics and Statistics, Boston University, Boston, MA 02215, U.S.A.

Cognitive processes, such as learning and cognitive flexibility, are difficult both to measure and to sample continuously using objective tools because cognitive processes arise from distributed, high-dimensional neural activity. For both research and clinical applications, that dimensionality must be reduced. To reduce dimensionality and measure underlying cognitive processes, we propose a modeling framework in which a cognitive process is defined as a low-dimensional dynamical latent variable, called a cognitive state, that links high-dimensional neural recordings and multidimensional behavioral readouts. This framework allows us to decompose the hard problem of modeling the relationship between neural and behavioral data into separable encoding-decoding approaches. We first use a state-space modeling framework, the behavioral decoder, to articulate the relationship between an objective behavioral readout (e.g., response times) and the cognitive state. The second step, the neural encoder, involves using a generalized linear model (GLM) to identify the relationship between the cognitive state and neural signals, such as local field potential (LFP). We then use the neural encoder model and a Bayesian filter to estimate the cognitive state from neural data (LFP power), generating the neural decoder. We provide goodness-of-fit analysis and model selection criteria in support of the encoding-decoding result. We apply this framework to estimate an underlying cognitive state from neural data in human participants (N = 8) performing a cognitive conflict task. We successfully estimated the cognitive state within the 95% confidence intervals of the estimate obtained from the behavioral readout for an average of 90% of task trials across participants. In contrast to previous encoder-decoder models, our proposed modeling framework incorporates LFP spectral power to encode and decode a cognitive state.
The framework allowed us to capture the temporal evolution of the underlying cognitive processes, which could be key to the development of closed-loop experiments and treatments.

1 Introduction

Recent technological and experimental advances have improved our capability to stimulate and record simultaneous neural activity from multiple areas in humans (Stevenson & Kording, 2011; Lebedev & Nicolelis, 2017; Sani et al., 2018; Ezzyat et al., 2017). To meet the potential of online recording and stimulation across the brain, new technological advances


must be matched by complementary advances in analytical methods that can characterize the complex dynamics of high-dimensional neural data and uncover relationships among neural data, behavior, and other physiological signals (Krakauer, Ghazanfar, Gomez-Marin, MacIver, & Poeppel, 2017; Jorgenson et al., 2015). Classically, neural representations have been investigated through stimulus-response experiments, which typically examine the effect of simple, fixed stimuli on a particular feature of the neural response (Sidman, 1960; Zarahn, Aguirre, & D'Esposito, 1997). These experiments have been amenable to standard statistical modeling methods, like thresholding, averaging, and linear regression. For instance, classic experiments in motor cortical coding measured the number of spikes fired by M1 neurons during arm-reaching movements in a discrete set of directions (Schwartz, Kettner, & Georgopoulos, 1988). Simple statistical models relating the spiking of each neuron to the movement direction were then used for both encoding and decoding analyses (Hochberg et al., 2012; Serruya, Hatsopoulos, Paninski, Fellows, & Donoghue, 2002; Flint, Lindberg, Jordan, Miller, & Slutzky, 2012; Nuyujukian et al., 2018). While such brain region-specific statistical models may capture coding in certain neural systems, many other aspects of brain function, such as cognition, planning, and attention, are shaped by the interaction and cooperation of networks across brain areas. These functions are encoded in complex and distributed neural activity that spans multiple time and frequency scales (Haynes & Rees, 2006; Haxby, Hoffman, & Gobbini, 2000; Voytek & Knight, 2015; Avena-Koenigsberger, Misic, & Sporns, 2018). Furthermore, the neural representation of some cognitive processes might be understood only as a complex function of other behavioral or physiological signals.
For instance, learning, as a cognitive process, typically is not directly measurable on its own and must be assessed through features of behavioral responses to a task (Brouwer et al., 2015; Martinez-Rubio, Paulk, McDonald, Widge, & Eskandar, 2018). Therefore, a model of the neural representation of learning might relate the joint neural activity across multiple brain regions and multiple recording modalities to specific attributes of a complicated task. Characterizing such relationships requires statistical and analytical tools to integrate neural recordings from many brain areas at varying temporal and spatial scales and to associate those recordings with relevant, low-dimensional features of the cognitive process being modeled. Finally, cognitive processes and their associated neural activity patterns evolve over time; thus, the dynamics of neural activity and behavior must be properly incorporated into the models (Davidson, Putnam, & Larson, 2000; Bressler & Menon, 2010; Braun et al., 2015). Classically, encoding models focus on characterizing how external stimuli are encoded in the brain through patterns of neural activity. Subsequent neural decoding methods then aim to reconstruct the behavioral intentions


from just the observed neural activity patterns. Neural encoding and decoding are challenging modeling and analysis problems, given the complexity of brain dynamics and the diverse functions of brain networks. The encoding and decoding problem becomes even more challenging in the cognitive domain, as cognitive processes are typically not directly measurable and often are only loosely defined (Widge et al., 2017). We present an encoder-decoder modeling paradigm to overcome some of the limitations associated with applying existing neural decoder models to estimate hidden cognitive states from behavior and multidimensional neural data. We define a cognitive state process as a dynamic measure of cognitive function during a behavioral task. We then provide tools to estimate the cognitive state at each instant and associate its value with high-dimensional neural activity. The modeling framework accounts for the dynamics of the cognitive process and the associated temporal changes in the neural and behavioral signals. The proposed framework is scalable to high-dimensional neural and behavioral data and is applicable to many types of cognitive processes. Here, we applied this framework to the problem of estimating cognitive flexibility in participants performing a multisource interference task (MSIT; Bush, Shin, Holmes, Rosen, & Vogt, 2003). The results support the ability of the framework to infer a consistent association between neural and behavioral data and to accurately estimate the underlying cognitive state at each moment. This modeling framework not only provides a tool to study biomarkers of changes in brain function(s) but also constructs an analytical platform for brain stimulation systems, such as closed-loop deep brain stimulation (DBS), where a reliable and real-time estimate of a cognitive state may be needed for cognitive control (Widge et al., 2017).
Cognition is expressed in many aspects of behavior, and these behaviors are complex and interrelated (Barrett & Satpute, 2013; LeDoux, 2000; Davidson et al., 2000; Bressler & Menon, 2010). Although the behavioral impact of cognition is multifold, features of cognition, or a cognitive state, can often be modeled on a low-dimensional manifold relative to the complex and distributed neural activity that generates the response to an ongoing cognitive demand (Shine et al., 2018; Vyas et al., 2018). Under this modeling assumption, the cognitive state becomes a low-dimensional latent variable that links brain and behavior, both of which adapt dynamically in response to the task demands (Glaser & Kording, 2016; Skrondal & Rabe-Hesketh, 2007; Suzuki & Brown, 2005). Latent variable analysis is widely applied in the field of statistics, and its theory and applications can be extended to the analysis of neural and behavioral data (Loehlin & Beaujean, 2016). We define a cognitive state variable through its influence on observable behavior. Typically, this involves having a subject perform a controlled cognitive task that is designed to assess aspects of cognitive processing over multiple trials (Crandall, Klein, Klein, & Hoffman, 2006; Cohen, 2014).


Analysis of data from this task involves characterizing changes in behavior using a low-dimensional cognitive state process, addressable using a state-space modeling framework. State-space modeling has previously been shown to successfully model dynamical behavioral signals to estimate underlying cognitive or emotional processes (Schöner, 2008; Wang et al., 2005; Smith et al., 2004; Yousefi et al., 2015). State-space models can optimally process multimodal, complex, dynamical behavioral signals, such as reaction time and decision choice, to estimate cognitive processes such as learning or attention (Prerau et al., 2008). Under the state-space modeling framework, the unobserved cognitive state is modeled as a low-dimensional dynamical latent variable, and the behavioral signals are defined as functions of both the latent variable and trial-to-trial factors relevant to the task (Bentler & Weeks, 1980; Wang et al., 2005; Bishop, 2006). The state-space estimation problem involves simultaneously fitting the models and computing the dynamics of the cognitive state from the observed behavior. Its general solution is a combination of a Kalman-type filter/smoother with an expectation-maximization (EM) algorithm (Smith et al., 2004; Dempster, Laird, & Rubin, 1977), which has been developed for a large class of behavioral signals with different distribution models (Yousefi et al., 2015; Prerau et al., 2008). The state-space modeling framework therefore can be used as a low-dimensional behavioral decoder, combining fast and slow temporal dynamics that capture the essential features of the behavior (Smith et al., 2004; Yousefi et al., 2015). Importantly, this formulation makes a core assumption about the cognitive state: that it exists at all times when neural data are measured, but that its ground truth (behavioral output) can be observed only intermittently.
That is, in a standard trial-structured behavioral task, we can update the value of the cognitive state only once we observe the participant's response to each trial. The neural data, however, are usually sampled hundreds to thousands of times per second. We can assume that the observed decision was influenced by the state even before we knew the state value, in that the preceding neural activity effectively determined and encoded the behavior. Thus, once the value of the state on a given trial is known, it can be regressed against neural data that occurred before that decision to understand where and when these cognitive variables are encoded. We used this approach in a recent paper where a learning process was formulated as a latent state variable (Martinez-Rubio et al., 2018). Alternatively, if we wish to predict the behavior before it occurs, we can directly infer the cognitive state from neural activity, as in the decoding analyses presented here and in recent related work by other groups (Ezzyat et al., 2017; Sani et al., 2018). Cognitive function emerges from ongoing and distributed brain activity, and thus, a full description of the cognitive state(s) should capture the link between cognitive state and neural data (Haynes & Rees, 2006; Haxby et al., 2000; Mitchell et al., 2004; Deco, Jirsa, & McIntosh, 2011; Poldrack, 2011).


Because brain activity is dynamic and stochastic, we use dynamical and statistical modeling to characterize the neural activity relative to cognition (Li, O'Doherty, Lebedev, & Nicolelis, 2011; Fischer & Peña, 2011; Kloosterman, Layton, Chen, & Wilson, 2013; Koyama, Eden, Brown, & Kass, 2010; Poldrack, 2011; Turner et al., 2013; Mukamel et al., 2005; Lawhern, Wu, Hatsopoulos, & Paninski, 2010; Shlens, 2014). A variety of statistical modeling approaches, such as stochastic dynamical modeling, Bayesian theory, and joint and conditional distributions of neural features, are widely used in the analysis of neural data (Knill & Pouget, 2004; Wu et al., 2003; Wu, Kulkarni, Hatsopoulos, & Paninski, 2009; Paninski, Pillow, & Lewi, 2007; Truccolo, Eden, Fellows, Donoghue, & Brown, 2005). The core assumption behind these approaches is that changes in the cognitive state(s) are encoded in a subset of features of neural activity, and thus these states can be decoded from multidimensional neural recordings (Pillow, 2007; Calabrese et al., 2011). A common class of statistical models used to relate neural activity and behavioral signals is the generalized linear model (GLM; McCullagh & Nelder, 1989). GLMs provide an extension of linear regression that can be flexibly applied to a variety of different neural data modalities, including spiking activity, functional magnetic resonance imaging (fMRI), and local field potential (LFP; Pillow, 2007; Calabrese et al., 2011; Mukamel et al., 2005; Lawhern et al., 2010; Shlens, 2014). Here, we use GLMs to characterize the relationship between the cognitive state (estimated from behavior) and neural data recorded as LFP.
Together, this modeling framework comprises three models: one that characterizes the dynamics of the central cognitive state process, one that captures the influence of the cognitive state on features of the behavioral task, and one that describes the neural representation of the cognitive state as it evolves. Once these models are defined and fit, we use multiple goodness-of-fit metrics to assess their quality and use statistical inference methods to determine the statistical significance of any neural and behavioral associations with the cognitive state process (Brown, Barbieri, Ventura, Kass, & Frank, 2002; Wilcox, 2005; Venables & Ripley, 2013; Gordon, Salmond, & Smith, 1993). In addition to defining the model framework and deriving the associated estimation and inference algorithms, we illustrate its application in decoding a hidden cognitive state by using recordings of neural activity in human participants performing a cognitive flexibility task. Application of the encoding-decoding framework to these data allows us to solve the problem of estimating a low-dimensional hidden cognitive state from both behavior and LFP power (as decomposed into multiple spectral components) from multiple brain regions. This framework is scalable and can be applied in many other domains of cognitive state estimation to link high-dimensional behavioral and neural data. Most important, the modeling framework is defined independent of the modality of neural activity and the corresponding neural features; thus, other neural features, including dynamic functional connectivity and


spatiotemporal neural dynamics (Sporns, Chialvo, Kaiser, & Hilgetag, 2004; Makeig, Debener, Onton, & Delorme, 2004; Buzsáki et al., 2012), can also be studied in building the relationship between brain functions and behavior. In section 2, we first provide the theoretical foundations of the encoding-decoding modeling framework. We then describe how each of the encoder and decoder models is built and propose corresponding model identification procedures. We describe the behavioral decoder model and discuss how the underlying cognitive state(s) can be estimated using behavioral signals. We then describe the neural encoder model and propose how the neural features representing the cognitive state are selected. We continue by presenting the neural decoder model structure and how the decoder's performance in estimating the cognitive state can be evaluated. We also discuss a pruning procedure to select a parsimonious subset of neural features to build a more robust neural decoder model. In section 3, we demonstrate an application of the proposed encoder-decoder modeling framework in decoding the cognitive state of participants performing a cognitive task using simultaneous recordings of their behavior and intracranial LFP. We describe the multisource interference task (MSIT) as performed by participants undergoing simultaneous intracranial neural recordings for clinical purposes. We discuss the specific behavioral decoder model developed for MSIT. We then discuss the encoder model structure and model selection procedure for the cognitive state estimated using the behavioral decoder. We discuss how the identified neural encoders representing cognitive states are validated and study the performance of the neural decoder in estimating the cognitive state. We conclude this section by describing the consistency of the neural encoder features identified by the encoder model across patients.
In section 4, we further elaborate on different aspects of the proposed modeling framework. We discuss its similarities to and distinctions from other modeling frameworks used to characterize the relationship between brain and behavior. We also discuss the limitations of the framework and future research directions that can build on the modeling framework proposed and studied in this research.

2 Methods

We developed an encoding-decoding (E-D) modeling framework linking behavioral and neural signals under the hypothesis that there exists an underlying cognitive state conditioned on which the neural activity and behavioral signals can be independently characterized. In other words, the behavior can be predicted from the cognitive state without knowing the neural activity, and, similarly, the neural activity can be predicted using estimates of the cognitive state without observing behavior (see Figure 1). In the E-D modeling framework, the cognitive state is modeled as a dynamic process, and the conditional distribution of both behavioral signals and neural


Figure 1: Schematic representation of the neural encoding-decoding model. $x_k$ corresponds to the cognitive state, $y_k$ is the observed behavioral signal, $z_k$ represents features of the neural activity, and $k$ represents the time or trial index. The dashed arrows refer to the encoding models, and the solid arrows to the decoding models. $x_k \mid y_{1,\ldots,K}$ is the distribution of $x_k$ given the behavioral readout from $1,\ldots,K$, and $x_k \mid z_{1,\ldots,K}$ is the distribution of $x_k$ given the neural activity over the time $1,\ldots,K$. $z_k \mid x_k$ is the conditional distribution of neural activity at time index $k$ given the cognitive state, and $y_k \mid x_k$ is the conditional distribution of the behavioral readout at time index $k$ given the cognitive state.

activity is modeled as functions of both the cognitive state and any additional explanatory variables. Explanatory variables include factors of the behavioral experiments designed to assess cognitive state, like how hard or easy a task is, and current and previous features of the neural activity. Under this modeling framework, we assume that both neural and behavioral data (the "observed signals") are recorded during a cognitive task. The cognitive task runs over many trials, and the task trials are designed to both evoke and capture changes in one or multiple cognitive processes. We further assume that the cognitive process of interest might evolve over time and that its progression correlates with changes in behavior and brain activity. The model development starts by characterizing the underlying cognitive process as a function of the behavioral readout and then using the estimated cognitive process to find the neural correlates that encode changes in the cognitive state (see Figure 1). The modeling goal is to estimate low-dimensional cognitive processes by capturing the essential features of neural or behavioral data. In our proposed modeling framework, we assume both neural and behavioral data are high-dimensional signals; thus, we can combine information from different modalities of behavior to better capture the cognitive state and study neural activity from many brain areas to build the neural encoder-decoder model. In section 3, we discuss a multisource interference task (MSIT), which is designed to assess the task participants' cognitive flexibility (Bush et al., 2003; Bush & Shin, 2006; Sheth et al., 2012; Van Veen, Cohen, Botvinick, Stenger, & Carter, 2001). Many cognitive tasks are designed with similar objectives, and our proposed modeling framework would be applicable to estimating the underlying cognitive state in these types of experiments as well.


2.1 Model of Behavior. We utilized a state-space modeling approach to describe the dynamical features of the behavioral signal. Here, we focus on modeling behavioral signals recorded during trial-structured cognitive tasks, where the task participant's response is captured on each trial. The model consists of a state equation and an observation equation. The state equation defines the temporal evolution of the cognitive state from one trial to the next, and the observation equation defines the influence of the cognitive state on the observed behavioral signals. Let $x_k$ be the cognitive state at trial $k$; the state variable's evolution through time is defined by

$$x_k \mid x_{k-1} \sim f(x_k \mid x_{k-1}, \theta_x), \qquad (2.1)$$

where $x_k$ is the state variable at trial $k$ and $\theta_x$ is the set of free parameters of the state equation. We assume that the state equation has the Markov property; thus, the distribution of the cognitive state at trial $k$ depends only on the cognitive state at the previous trial, $k - 1$. While this assumption simplifies the notation in the derivation, it is possible to extend these methods to non-Markovian processes (Hara, 1980). The conditional probability of the behavioral signal $y_k$ at trial $k$ is defined as

$$y_k \mid x_k \sim f(y_k \mid x_k, I_k, \theta_y), \qquad (2.2)$$

where $y_k$ is the observed behavioral signal and $I_k$ is the set of experimental stimuli (e.g., difficulty level) at trial $k$. In the observation equation, $\theta_y$ is the set of the model's free parameters. To predict the cognitive state, we use an adaptive filter solution that simultaneously infers the model parameters and estimates the evolution of the cognitive state through the course of an experiment (Smith et al., 2004; Prerau et al., 2008; Yousefi et al., 2015). The adaptive filter consists of a filter-smoother estimate of the state and an expectation-maximization (EM) algorithm, which are used sequentially to update the model's free parameters and the cognitive state estimate, maximizing the likelihood of the observed behavioral signals. The adaptive filter output includes the maximum likelihood estimate of the model parameter sets, $\hat\theta_y$ and $\hat\theta_x$, and the posterior distribution of $x_k$ at each trial,

$$x_k \mid y_{1 \cdots K} \sim f(x_k \mid y_{1 \cdots K}), \qquad (2.3)$$

where $K$ is the length of the experiment and $y_{1,\ldots,K}$ are the observed behavioral signals over the course of the entire experiment. Equation 2.3 gives the fixed-interval smoother estimate of $x_k$. For real-time applications, we use the filter estimate of the cognitive state, which is defined by


$$x_k \mid y_{1 \cdots k} \sim f(x_k \mid y_{1,\ldots,k}). \qquad (2.4)$$

The adaptive filter solution and parameter estimation algorithm have been developed for a range of distributions (Smith et al., 2004; Prerau et al., 2008; Yousefi et al., 2015, 2018).
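As a concrete illustration of equations 2.1 to 2.4, the following sketch implements the filter estimate for the simplest linear-Gaussian special case: a random-walk cognitive state observed through a single Gaussian behavioral readout (e.g., log reaction time). The function name and noise variances are illustrative assumptions, not the paper's implementation, which estimates the free parameters by EM and supports non-Gaussian behavioral distributions.

```python
import numpy as np

def behavioral_filter(y, sigma2_x=0.01, sigma2_y=0.1, x0=0.0, v0=1.0):
    """Filter estimate of a scalar cognitive state x_k from behavioral
    readouts y_k, assuming the linear-Gaussian special case
      x_k = x_{k-1} + w_k,  w_k ~ N(0, sigma2_x)   (cf. eq. 2.1)
      y_k = x_k + e_k,      e_k ~ N(0, sigma2_y)   (cf. eq. 2.2).
    Returns the posterior means E[x_k | y_1..k] and variances (eq. 2.4)."""
    K = len(y)
    x = np.empty(K)
    v = np.empty(K)
    xp, vp = x0, v0
    for k in range(K):
        vpred = vp + sigma2_x             # one-step prediction variance
        g = vpred / (vpred + sigma2_y)    # Kalman gain
        xp = xp + g * (y[k] - xp)         # posterior mean given y_1..k
        vp = (1.0 - g) * vpred            # posterior variance
        x[k], v[k] = xp, vp
    return x, v
```

A backward smoothing pass over these filtered estimates would give the fixed-interval smoother of equation 2.3; in the paper's pipeline, the state and observation noise variances would themselves be re-estimated by EM rather than fixed as above.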

2.2 Neural Encoding Model. We built a neural encoder model to characterize the relationship between the cognitive state and multiple features of the recorded neural activity. These features can include spatiotemporal statistics such as the signal power in different frequency bands or the coherence between neural signals recorded across different brain regions. The conditional probability of the neural features given the cognitive state is defined by

$$z_{k,n} \mid x_k \sim f(z_{k,n} \mid x_k, J_{n,k}, \theta_{z,n}), \qquad (2.5)$$

where $z_{k,n}$ refers to the $n$th feature of neural activity at trial index $k$, $J_{n,k}$ represents a set of additional explanatory variables, $\theta_{z,n}$ are the model parameters, and $n = 1, \ldots, N$. We assumed there is some unknown number $N$ of informative neural features, and we built a separate encoder model for each of these features (i.e., we assume each of these features is independent given the cognitive state estimate and a history of previous neural features), defined by the explanatory variable $J_{n,k}$. Even in multichannel recordings with volume conduction, this independence can often be satisfied, for example, by converting LFP signals to bipolar or Laplacian montages. For the encoder models, we particularly focused on changes in the brain dynamics evolving over multiple trials; thus, each feature could be properly encoded using the cognitive state and previous neural feature measures. In practice, $J_{n,k}$ will include a subset of previous neural features, $J_{n,k} \subset \{z_{k-j,n} : n = 1,\ldots,N,\ j = 1,\ldots,k\}$, and other factors like neural stimulation. In building these neural encoder models, we used generalized linear modeling (GLM), an extension of the classical linear regression model for observed variables that may not be correctly described by a normal distribution (McCullagh & Nelder, 1989). For instance, for a coherence feature, a beta distribution is often preferable to the normal distribution (Miranda de Sá, 2004), and power features are typically better described by log-normal or gamma distributions than by a normal distribution (Barbieri, Quirk, Frank, Wilson, & Brown, 2001; Yousefi et al., 2015). GLM generalizes the linear regression model for these distributions and thus was used in building the encoder model. Generally, there are multiple features per trial index; thus, the conditional probability defined in equation 2.5 extends over the whole feature set: $Z_k = \{z_{k,n} : n = 1,\ldots,N\}$.
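To make equation 2.5 concrete, here is a minimal sketch of an encoder for a single LFP power feature. Assuming a log-normal observation model (one of the power-feature distributions noted above), the GLM fit reduces to ordinary least squares on log power; the function and variable names are our own illustrative choices, not the paper's code.

```python
import numpy as np

def fit_power_encoder(z, x):
    """Fit a log-normal encoder for one LFP power feature (cf. eq. 2.5):
      log z_k = b0 + b1 * x_k + b2 * log z_{k-1} + noise,
    where x_k is the cognitive state and log z_{k-1} plays the role of
    the history term J_{n,k}.  With a log-normal model the GLM is OLS on
    log power.  Returns the coefficients and the Gaussian deviance
    (residual sum of squares on the log scale)."""
    logz = np.log(z)
    y = logz[1:]                                   # current log power
    X = np.column_stack([np.ones(len(y)), x[1:], logz[:-1]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    deviance = float(resid @ resid)
    return beta, deviance
```

For a gamma or beta observation model, the least-squares step would be replaced by the corresponding iteratively reweighted GLM fit, but the design matrix (state plus feature history) would be unchanged.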


2.2.1 Model Selection. Neural data were recorded from over 200 electrode contacts and transformed into multiple frequency band and coherence features (Widge et al., 2017). Not every feature encodes the cognitive state, and if a feature does encode the state, the encoding relationship might be unique to that feature. We used GLMs to build the encoder models and then assessed the statistical significance of each of the models. We applied an analysis of deviance to identify a parsimonious encoder model relating a subset of neural features to the cognitive state process (McCullagh & Nelder, 1989; Wilcox, 2005; Venables & Ripley, 2013; Brown et al., 2002). The analysis of deviance uses a maximum likelihood ratio test to compare nested models, one with $q$ parameters (the reduced model) and another with $p$ parameters (the complete model). Using the GLM regression, we simultaneously estimated the encoder model parameters and its dispersion term. Venables and Ripley (2013) suggest that when the dispersion term is estimated, the distribution of the log maximum likelihood ratio statistic can be approximated by an $F_{p-q,\,K-p}$ distribution, where $K$ is the observation length. The F-statistic is defined by

$$\frac{D_q - D_p}{\hat\phi\,(p - q)} \sim F_{p-q,\,K-p}, \qquad (2.6)$$

where $D_q$ and $D_p$ are the deviance values of the restricted and full models and $\hat\phi$ is the dispersion estimate using the full model. We ran the analysis of deviance to identify the set of neural features showing a significant relationship to the estimated cognitive state process. For practical reasons, like reducing the computational cost of neural feature extraction and building a more robust decoder estimator, we sought a decoder model with a small number of neural features. To address this, we used a secondary model selection step, which groups the neural features identified by the first model selection step by their performance in decoding the cognitive state. We will discuss this step of the model selection after describing the decoding procedure.
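The nested-model comparison of equation 2.6 can be sketched as follows. `deviance_f_test` is a hypothetical helper, and the two deviances would come from fitting the reduced and full encoder GLMs to the same K trials.

```python
from scipy.stats import f as f_dist

def deviance_f_test(D_reduced, D_full, q, p, K, phi_hat):
    """Analysis-of-deviance test (cf. eq. 2.6): compare a reduced encoder
    with q parameters against the full model with p parameters, fit to
    K observations, given the dispersion estimate phi_hat from the full
    model.  Returns the F-statistic and its p-value under F_{p-q, K-p}."""
    F = (D_reduced - D_full) / (phi_hat * (p - q))
    p_value = f_dist.sf(F, p - q, K - p)   # right-tail probability
    return F, p_value
```

A small p-value indicates that the dropped neural feature(s) carry significant information about the cognitive state and should be retained in the encoder.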

2.3 Stochastic Cognitive State and Encoder Model. Using GLMs, we fit maximum likelihood estimates of the encoder model parameters given a set of predictor variables and observed outcomes (McCullagh & Nelder, 1989). The predictor variables for the GLM are deterministic, and the GLM maximum likelihood estimate provides a point estimate. However, the cognitive state, the predictor variable in the encoder model, is a stochastic process. Thus, to estimate the encoder model parameters, we maximized the expected likelihood, defined in equation 2.5, over the distribution of the cognitive state. The optimal parameters of the encoder model were defined by

$$\hat\theta_{z,n} = \arg\max_{\theta \in \Theta}\, \log E_X\!\left[\prod_k f(z_{k,n} \mid x_k, J_{n,k}, \theta)\right], \qquad (2.7)$$

where the expectation is over the cognitive state and $\Theta$ is the parameter space. Finding the closed form of the expectation is not trivial; thus, we use a sampling technique to approximate the expectation. Using the behavioral model, we build the posterior distribution of the cognitive state given the whole behavioral observation (see equation 2.3). We then use the backward sampling technique (Lindsten & Schön, 2013) to draw samples of the cognitive state. The cognitive state is a dynamical process; thus, in the sampling step, we generate multiple trajectories of the cognitive state over the course of the experiment. To reformulate the optimization problem as a GLM problem, we also exchanged the order of log and expectation. By Jensen's inequality (Hoeffding, 1963), we maximize a lower bound of the likelihood function defined in equation 2.7. The new likelihood function is defined by

$$\hat\theta_{z,n} = \arg\max_{\theta \in \Theta}\, E_X\!\left[\sum_k \log f(z_{k,n} \mid x_k, J_{n,k}, \theta)\right] \simeq \arg\max_{\theta \in \Theta}\, \frac{1}{M}\sum_m \sum_k \log f(z_{k,n} \mid \hat x_{k,m}, J_{n,k}, \theta) = \arg\max_{\theta \in \Theta}\, \sum_m \sum_k \log f(z_{k,n} \mid \hat x_{k,m}, J_{n,k}, \theta), \qquad (2.8)$$

where $\hat{x}_{k,m}$ is the sample of the state variable at trial $k$ from the $m$th trajectory. The approximate likelihood function corresponds to the full likelihood of $M \times K$ samples; thus, we can use the GLM framework to estimate the encoder model parameters from these $M \times K$ samples. In equation 2.8, the GLM parameters are fit using $M \times K$ samples; however, the effective size of these trajectories is generally lower than $M \times K$ because the trajectories are not entirely independent. Thus, the deviance measure returned by GLM fitting using equation 2.8 will differ from that of the encoder model defined in equation 2.7. To address this discrepancy, we propose a new statistic that can be used for the encoder model selection in place of the F-statistic already defined in equation 2.6,

\hat{F} = \frac{D_q - D_p}{\hat{\phi}\,(p - q)\,M}, \qquad (2.9)

where $D_q$ and $D_p$ are the deviance estimates computed using the data samples from all $M$ trajectories (see equation 2.8). This $\hat{F}$-statistic can be thought of as a sample mean of F-statistics, each computed from one sample of the cognitive state


process. A rough but conservative approximation for the sampling distribution of $\hat{F}$ under the null hypothesis of no difference between the full and reduced models is an F-distribution with $p - q$ and $K - p$ degrees of freedom. A more accurate and statistically powerful estimate of its sampling distribution could be obtained through a bootstrapping procedure. This approximation also allows us to estimate conservative confidence intervals for each component of the parameter $\hat{\theta}_{z,n}$ in equation 2.7.
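To make the sampling-based estimation concrete, the following sketch fits one encoder model by stacking all $M$ sampled state trajectories into a single regression and then computes the modified $\hat{F}$-statistic of equation 2.9 against its conservative $F(p-q, K-p)$ null. This is an illustrative Python implementation (the paper's analyses used Matlab); it assumes a Gaussian encoder with identity link, so the maximum likelihood fit reduces to least squares, and all variable names and parameter values are hypothetical.

```python
import numpy as np
from scipy import stats

def fit_encoder_mc(log_z, x_samples):
    """Monte Carlo approximation of the expected log likelihood of
    equation 2.8: stack all M sampled state trajectories into one
    regression with M*K rows (Gaussian GLM, identity link -> OLS)."""
    M, K = x_samples.shape
    y = np.tile(log_z, M)                               # replicate data once per trajectory
    X = np.column_stack([np.ones(M * K), x_samples.reshape(-1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    deviance = float(resid @ resid)                     # Gaussian deviance = RSS
    return beta, deviance

def modified_f(D_q, D_p, phi_hat, p, q, M, K):
    """Modified F-statistic of equation 2.9; the extra M in the
    denominator rescales for the inflated M*K sample size."""
    F_hat = (D_q - D_p) / (phi_hat * (p - q) * M)
    # Conservative null approximation: F distribution with (p-q, K-p) df.
    return F_hat, stats.f.sf(F_hat, p - q, K - p)

# Toy example with a known encoding relationship, log z = 1 + 2 x.
rng = np.random.default_rng(0)
K, M = 200, 50
x_true = np.cumsum(rng.normal(0, 0.05, K))              # slow latent state
log_z = 1.0 + 2.0 * x_true + rng.normal(0, 0.1, K)      # one neural feature
x_samples = x_true + rng.normal(0, 0.02, (M, K))        # stand-in for backward samples
beta, D_p = fit_encoder_mc(log_z, x_samples)            # full model
_, D_q = fit_encoder_mc(log_z, np.zeros_like(x_samples))  # restricted: intercept only
F_hat, p_val = modified_f(D_q, D_p, phi_hat=D_p / (M * (K - 2)), p=2, q=1, M=M, K=K)
```

For a truly encoded feature, $\hat{F}$ is large and the approximate p-value is far below the 0.01 threshold used later for feature screening.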

2.3.1 Neural Decoder. In the neural decoding step, we estimate the conditional distribution of the cognitive state given the observed neural features. We first estimated the instantaneous likelihood of the cognitive state given the current neural features, defined by

L(x_k; z_k) \propto \prod_{ns \subset \{1,\dots,N\}} f(z_{k,ns} \mid x_k, J_{ns,k}, \theta_{z,ns}), \qquad (2.10)

where $N$ refers to the whole set of observed neural features and $ns$ is the subset of neural features identified through the encoder model selection process. $f(z_{k,ns} \mid x_k, J_{ns,k}, \theta_{z,ns})$ is the conditional distribution of a neural feature given the cognitive state and explanatory variables, and $\theta_{z,ns}$ represents the parameters of the encoder model for the $ns$ features, estimated using equation 2.8. We estimated the posterior distribution of the cognitive state using the recursive Bayesian filter rule, which is defined by

f(x_k \mid z_1, \dots, z_k) \propto f(x_k \mid z_1, \dots, z_{k-1}) \times L(x_k; z_k), \qquad (2.11)

where $f(x_k \mid z_1, \dots, z_{k-1})$ is the one-step prediction distribution, defined by the Chapman-Kolmogorov equation (Gardiner, 1985),

f(x_k \mid z_1, \dots, z_{k-1}) = \int f(x_k \mid x_{k-1}, \theta_x) \times f(x_{k-1} \mid z_1, \dots, z_{k-1}) \, dx_{k-1}, \qquad (2.12)

where $f(x_k \mid x_{k-1}, \theta_x)$ is the state-transition distribution defining the state evolution through time. The state equation is part of the behavioral model, and its parameters, $\theta_x$, are estimated using the adaptive filter solution developed for the behavioral signal (equation 2.1). Equations 2.10 to 2.12 define the neural decoding procedure. Using the proposed model, we can recursively update our estimate of the cognitive state using the most recently observed neural features.
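The recursion in equations 2.10 to 2.12 can be sketched with a simple grid-based filter. The sketch below assumes an AR(1) state transition and a single log-normal neural feature (matching the forms used later in equations 3.1 to 3.4); the grid bounds, parameter values, and function names are illustrative, not part of the paper's implementation.

```python
import numpy as np

def decode_state(log_z, a, sig_v, b1, b2, sig_e, grid=None):
    """Grid-based recursive Bayesian filter (equations 2.10-2.12) for a
    single hypothetical feature: AR(1) state x_k = a x_{k-1} + v_k and
    observation log z_k ~ N(b1 + b2 x_k, sig_e^2)."""
    if grid is None:
        grid = np.linspace(-3, 3, 601)
    dx = grid[1] - grid[0]
    # Transition kernel f(x_k | x_{k-1}); rows index x_{k-1}, columns x_k.
    T = np.exp(-0.5 * ((grid[None, :] - a * grid[:, None]) / sig_v) ** 2)
    T /= T.sum(axis=1, keepdims=True)
    post = np.full(grid.size, 1.0 / (grid.size * dx))    # flat initial prior
    means = np.empty(len(log_z))
    for k, lz in enumerate(log_z):
        pred = post @ T                                   # eq 2.12 (Chapman-Kolmogorov)
        lik = np.exp(-0.5 * ((lz - (b1 + b2 * grid)) / sig_e) ** 2)
        post = pred * lik                                 # eqs 2.10-2.11
        post /= post.sum() * dx
        means[k] = np.sum(grid * post) * dx               # posterior mean per trial
    return means

# Simulate a state and one encoded feature, then decode with true parameters.
rng = np.random.default_rng(1)
K = 300
x = np.zeros(K)
for k in range(1, K):
    x[k] = 0.99 * x[k - 1] + rng.normal(0, 0.05)
log_z = 0.5 + 1.5 * x + rng.normal(0, 0.2, K)
x_hat = decode_state(log_z, a=0.99, sig_v=0.05, b1=0.5, b2=1.5, sig_e=0.2)
```

The decoded posterior mean tracks the simulated state closely; with multiple features, the likelihood line becomes the product over the selected encoder models, as in equation 2.10.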

2.3.2 Pruning the Neural Features. In the model selection step, we use an F-test with a fixed p-value threshold to identify an initial set of candidate features. One option would be to simply select all features with significant


associations with the cognitive state after correcting for multiple comparisons. Instead, we minimize a cross-validated objective function, such as the root-mean-squared error (RMSE) or model deviance, to prune the encoder model's neural features. To do so, we use a backward elimination model selection process. We start with all the candidate neural features and examine all decoder models with Ns − 1 features to identify the one with the lowest cross-validated RMSE. We then repeat this procedure with the selected Ns − 1 features and drop features sequentially. Using this selection process, the optimal number of features is the number with the lowest cross-validated RMSE. This process produces a constrained encoder model with a subset of the neural features used to build the full encoder model. Given the one-at-a-time nature of dropping neural features, it is possible for the backward procedure to miss the optimal subset of neural features. Stepwise model selection techniques, including forward and backward methods, tend to amplify the statistical significance of the variables that stay in the model and thus have a tendency toward a higher type 1 error. For a more robust pruning step, more advanced subset selection techniques could be adopted (Chipman et al., 2001; Miller, 2002). For instance, regularization methods, including the Lasso, group Lasso (Friedman, Hastie, & Tibshirani, 2010b), and group bridge (Huang, Ma, Xie, & Zhang, 2009), widely used for model selection in high-dimensional data, can be applied in the pruning step. The neural decoder provides the posterior distribution of the cognitive state for each trial; thus, other measures of decoder performance built on the estimated distribution, such as coverage area or estimation skewness, can also be used in the feature pruning step.
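The backward elimination loop described above can be sketched as follows. This is a generic illustration (Python rather than the paper's Matlab), with a single hold-out split standing in for full cross-validation and all data and names hypothetical.

```python
import numpy as np

def backward_eliminate(Z, score):
    """Backward elimination over columns of the feature matrix Z.
    `score(cols)` returns a cross-validated RMSE for a decoder built
    from the listed columns; each pass drops the one feature whose
    removal yields the lowest RMSE, tracking the best subset seen."""
    cols = list(range(Z.shape[1]))
    best_cols, best_rmse = cols[:], score(cols)
    while len(cols) > 1:
        rmse, drop = min((score([c for c in cols if c != d]), d) for d in cols)
        cols.remove(drop)
        if rmse < best_rmse:
            best_rmse, best_cols = rmse, cols[:]
    return best_cols, best_rmse

# Toy example: 2 informative features plus 3 pure-noise features.
rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(0, 0.1, 150))                  # slow latent state
Z = np.column_stack([x + rng.normal(0, 0.1, 150),
                     x + rng.normal(0, 0.1, 150),
                     rng.normal(0, 1, (150, 3))])

def cv_rmse(cols):
    # Train on the first half, score on the second (hold-out stand-in).
    A = np.column_stack([np.ones(75), Z[:75, cols]])
    coef, *_ = np.linalg.lstsq(A, x[:75], rcond=None)
    pred = np.column_stack([np.ones(75), Z[75:, cols]]) @ coef
    return float(np.sqrt(np.mean((pred - x[75:]) ** 2)))

keep, rmse = backward_eliminate(Z, cv_rmse)
```

As in the paper's pruning step, the noise features tend to be eliminated while the informative ones survive, and the retained subset achieves the lowest hold-out RMSE.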

2.3.3 Decoder Performance Analysis. The decoder performance can be analyzed using different metrics. The first metric we used is the 95% highest density region (HDR) measure (Hyndman, 1996), in which we check how many points of a specific trajectory of the cognitive state lie in the 95% HDR of the neural decoder's posterior estimate. The 95% HDR provides a good sense of how similar the cognitive state's true distribution, derived using behavioral signals, is to its estimate. The other measure we use is the RMSE between the mean of the cognitive state estimated from the behavioral data and the mean of the decoder posterior over the test neural data set. The 95% HDR and RMSE provide a collection of metrics to assess an individual decoder model. We can also use the Kolmogorov-Smirnov (K-S) statistic, which tests whether the test-set neural features are samples from their corresponding trained encoder models; here, the null hypothesis is that the test-set neural features are samples of the trained encoder model. Using this test, we can check whether the encoder model assumptions remain valid beyond the training set. We can also use the K-S test to check whether the cognitive state estimated using the behavioral model is a sample of the neural decoder's posterior distribution.
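These metrics are straightforward to compute once the decoder posterior is summarized per trial. The sketch below is a minimal Python illustration on synthetic data; it uses a Gaussian central 95% interval as a stand-in for the 95% HDR (the two coincide for a unimodal symmetric posterior), and all inputs are hypothetical.

```python
import numpy as np
from scipy import stats

def decoder_metrics(x_beh, post_mean, post_sd):
    """Summaries comparing a neural decoder against the behavior-derived
    state: trial-wise RMSE between the two means; the fraction of trials
    on which the behavioral estimate falls inside the decoder's central
    95% interval (Gaussian stand-in for the 95% HDR); and a K-S check
    that the standardized residuals look standard normal."""
    rmse = float(np.sqrt(np.mean((x_beh - post_mean) ** 2)))
    lo = post_mean - 1.96 * post_sd
    hi = post_mean + 1.96 * post_sd
    coverage = float(np.mean((x_beh >= lo) & (x_beh <= hi)))
    ks = stats.kstest((x_beh - post_mean) / post_sd, "norm")
    return rmse, coverage, ks.pvalue

# Synthetic example: decoder means scatter around the behavioral state.
rng = np.random.default_rng(3)
x_beh = np.cumsum(rng.normal(0, 0.05, 400))
post_mean = x_beh + rng.normal(0, 0.1, 400)
rmse, cov, ks_p = decoder_metrics(x_beh, post_mean, np.full(400, 0.1))
```

A well-calibrated decoder shows coverage near 0.95 and a K-S p-value well above the rejection threshold; miscalibrated posterior widths show up as coverage far from 0.95 even when the RMSE is small.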


2.4 Further Note on the Model Structure. While we can estimate the cognitive state given the neural features with the decoder model, we can also predict behavioral response properties during the task given the neural features (see Figure 1). We can combine equations 2.4 and 2.10 to estimate behavior. The behavior estimate at trial $k$ is defined by

f(y_k \mid z_1, \dots, z_k) \propto \int f(y_k \mid x_k, I_k, \theta_y) \times f(x_k \mid z_1, \dots, z_k) \, dx_k, \qquad (2.13)

where $f(y_k \mid z_1, \dots, z_k)$ gives the likelihood of $y_k$ given the neural activity up to trial $k$. In practice, we can calculate the integral in equation 2.13 numerically using a sampling method. Equation 2.13 defines the conditional distribution of behavior by summing the full likelihood over possible values of the cognitive state. A good metric to assess the quality of this prediction is the RMSE between the observed behavior and its expected estimate.
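A Monte Carlo evaluation of equation 2.13 can be sketched as follows: draw state samples from the decoder posterior, push them through the RT observation model of equation 3.1, and average. The Gaussian posterior approximation and all numeric values below are illustrative assumptions, not fitted quantities.

```python
import numpy as np

def predict_rt(post_mean, post_sd, I_k, x_i, sig_e, n_mc=2000, seed=0):
    """Monte Carlo estimate of E[y_k | z_1..k] (equation 2.13): sample
    x_k from a Gaussian approximation to the neural-decoder posterior,
    apply the log-normal RT model log y = x + I*x_i + eps (equation 3.1),
    and average. x_i and sig_e are hypothetical fixed values here."""
    rng = np.random.default_rng(seed)
    x = rng.normal(post_mean, post_sd, n_mc)            # samples of x_k
    log_rt = x + I_k * x_i + rng.normal(0, sig_e, n_mc)
    return float(np.mean(np.exp(log_rt)))               # expected RT (seconds)

rt_congruent = predict_rt(post_mean=-0.3, post_sd=0.05, I_k=0, x_i=0.2, sig_e=0.1)
rt_incongruent = predict_rt(post_mean=-0.3, post_sd=0.05, I_k=1, x_i=0.2, sig_e=0.1)
```

With a positive conflict state, the predicted RT on incongruent trials exceeds the congruent prediction, mirroring the roughly 200 ms interference cost reported for the MSIT.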

In section 2, after deriving the mathematical solution of the encoder-decoder model, we discussed a series of goodness-of-fit analysis methods to identify a parsimonious decoder model and assess the prediction result. In section 3, we demonstrate the methodology through application to a specific cognitive task performed by human participants with intractable epilepsy undergoing clinically indicated intracranial recordings. We demonstrate the decoding of a cognitive state using brain activity.

3 Application: Decoding Cognitive State from Behavior and Local Field Potential Data during Performance of a Cognitive Task

We applied this modeling framework to neural and behavioral data recorded from eight human participants while they performed the multi-source interference task (MSIT; Bush et al., 2003; Bush & Shin, 2006). The objective was to build an E-D model to estimate the dynamics of a set of underlying cognitive processes related to baseline task difficulty and to the effect of interference stimuli, using recorded behavioral and neural data. We first built a behavioral model to estimate these state processes and then identified a neural E-D model expressing their neural correlates. We estimated the mapping from each neural feature to the estimated cognitive states. We then performed feature pruning to determine a subset of neural features that were ultimately used to build a neural decoder model to predict the cognitive state (see Figure 2). For each step, we report goodness-of-fit analyses and performance results and validate the modeling result using assessment criteria.

3.1 Experimental Data. Human participants consisted of eight patients with long-standing pharmacoresistant complex partial seizures who voluntarily participated after fully informed consent according to NIH and


Figure 2: Schematic of the steps followed in building the E-D framework. In this framework, we used neural features and estimates of the state decoded from the task behavior to build an encoder model. After an optional neural feature pruning step, we finally built a decoder model.

Army HRPO guidelines, as monitored by the Partners Institutional Review Board. Intracranial EEG (iEEG) recordings were made over the course of clinical monitoring for spontaneous seizures. The decision to implant electrodes and the number, types, and locations of the implantations were all determined on clinical grounds by a team of caregivers independent of this study. Participants were informed that their involvement in the study would not alter their clinical treatment in any way and that they could withdraw at any time without jeopardizing their clinical care. Depth electrodes (Ad-Tech Medical, Racine, WI, or PMT, Chanhassen, MN) had diameters of 0.8 to 1.0 mm and consisted of 8 to 16 platinum/iridium contacts, each 2.4 mm long. Electrodes were localized using a volumetric image coregistration procedure. Using Freesurfer scripts (Reuter, Rosas, & Fischl, 2010; Reuter, Schmansky, Rosas, & Fischl, 2012; http://surfer.nmr.mgh.harvard.edu), the preoperative T1-weighted MRI (showing the brain anatomy) was aligned with a postoperative CT (showing electrode locations). Electrode coordinates were manually determined from the coregistered CT (Dykstra et al., 2012). Mapping to brain regions was performed using an electrode labeling algorithm (Peled et al., 2017). Intracranial recordings were made using a recording system with a


sampling rate of 2 kHz (Neural Signal Processor, Blackrock Microsystems). At the time of acquisition, depth recordings were referenced to an EEG electrode placed on the skin (cervical vertebra 2 or Cz). LFP analysis was performed using custom analysis code in Matlab (MathWorks) and Fieldtrip, an open source toolbox implemented in Matlab (http://www.ru.nl/neuroimaging/fieldtrip; Oostenveld, Fries, Maris, & Schoffelen, 2011). All LFP data were subsequently decimated to 1000 Hz and de-meaned relative to the entire recording, and line noise and its harmonics up to 200 Hz were removed by subtracting the band-pass notch-filtered signals from the raw signal. All LFP channels were bipolar re-referenced to account for volume conduction (Bastos & Schoffelen, 2016). Participants performed the MSIT with simultaneous recordings of behavior and LFPs from both cortical and subcortical brain structures. The MSIT was designed to reliably and robustly activate the dorsal anterior cingulate cortex (dACC), which plays a critical role in cognitive processing in healthy individuals. It combines sources of cognitive interference (Stroop, 1935; Eriksen & Eriksen, 1974; Simon & Berbaum, 1990) with factors known to increase dACC activity so as to maximally activate the dACC within individuals (Bush et al., 2003; Bush & Shin, 2006). The MSIT has been used to identify the cognitive/attention network in normal individuals and in those with neuropsychiatric disorders (Bush & Shin, 2006). MSIT trials consist of the presentation of images containing three numbers from 0 to 3, of which two have the same value and one differs (see Figure 3A). The images are presented for 1.75 seconds, with a jittered interval of 2 to 4 seconds between images, on a computer screen using either Presentation software (Neurobehavioral Systems) or the Psychophysics toolbox (Matlab; Brainard, 1997; Kleiner et al., 2007; Pelli, 1997).
The participant has to identify, via button press, the identity of the number that is unique, not its position. The trials contained two levels of cognitive interference: high-conflict (incongruent or interference) trials had the position of the unique number differ from its keyboard position, inducing cognitive incongruence, while low-conflict (congruent or noninterference) trials had the unique number in the position matching its value. Trials were presented in a pseudorandom order such that the identity of the pictures was random but congruence changes were balanced. The behavioral data for each trial consist of the reaction time (RT; see Figure 3B) and response accuracy. On interference (incongruent) trials, participants responded around 200 ms more slowly on average than on noninterference (congruent) trials (see Figure 3C), in agreement with previous reports (Bush & Shin, 2006). Furthermore, when the effect of noninterference trials is controlled for, RT showed a slow change over time, reflecting slow improvement in the participants' moment-to-moment cognitive control (Yousefi et al., 2015).


Figure 3: MSIT explanation and behavioral signals. (A) Schematic of congruent and incongruent trials in the MSIT. (B) Example RT over MSIT trials. Reaction times during incongruent trials (red circles) are on average slower than during congruent trials (blue circles). (C) Bar plot showing average RTs over congruent and incongruent MSIT trials across eight participants. Participants were slower by approximately 200 ms on incongruent trials than on congruent trials. (D) Brain regions implicated in cognitive conflict resolution (green: dlPFC; blue: vlPFC; red: dACC).

The neural features used in the E-D model were extracted from LFPs recorded across electrode pairs localized to cortical structures comprising the dorsal anterior cingulate cortex (dACC), dorsolateral prefrontal cortex (dlPFC), rostral anterior cingulate cortex (rACC), lateral orbitofrontal cortex (OFC), dorsomedial prefrontal cortex (dmPFC), and ventrolateral prefrontal cortex (vlPFC). The choice of these regions was based on the brain regions activated in fMRI studies of cognitive conflict processing (see Figure 3D) (Bush et al., 2003; Bush & Shin, 2006; Sheth et al., 2012).

3.2 Behavioral Model of the Cognitive State. Since participants had very high accuracy (over 98%) while performing the MSIT, we considered only the RT as the behavioral readout for estimating the cognitive state. We further assumed that the participant's behavior adapted through the course of


an experiment (see Figures 3B and 4A) (Egner, Etkin, Gale, & Hirsch, 2007; Egner, 2014) due to changes in their cognitive state. We modeled RT as

\log y_{RT,k} = x_{base,k} + I_{i,k}\, x_{i,k} + \varepsilon_k, \qquad \varepsilon_k \sim N(0, \sigma_\varepsilon^2), \qquad (3.1)

where $y_{RT,k}$ is the RT at trial $k$. $x_{base,k}$ represents the baseline cognitive flexibility state, which captures the participant's overall performance through the experiment, irrespective of whether a trial contained a congruent or incongruent stimulus pattern. The conflict state, $x_{i,k}$, represents cognitive interference and leads to changes in RT that are specifically due to incongruent trials. $I_{i,k}$ is an indicator function with value one on incongruent trials and zero on other trials. $\varepsilon_k$ is the observation noise, assumed to be normal and independent and identically distributed (i.i.d.) across trials with zero mean and fixed variance $\sigma_\varepsilon^2$. Both gamma and log-normal distributions fit MSIT RT data well (Yousefi et al., 2015); here, we chose a log-normal distribution for the RT. In some cases where accuracy is also included as a behavioral readout, such as learning, we can include data on whether the task response was correct in estimating the cognitive state variables. In such cases, the response is modeled as a Bernoulli random variable with probability given by

\text{logit}(p_{bin,k}) = c_0 + c_1 x_{base,k} + c_2 I_{i,k}\, x_{i,k}, \qquad (3.2)

where $p_{bin,k}$ is the probability of a correct response on the $k$th trial. The free parameter $c_0$ defines the chance-level accuracy, and the parameters $c_1$ and $c_2$ define to what extent the choice decision is influenced by the state variables. In online appendix A, we show another example data set where both RT and accuracy carry information about the cognitive state(s). We modeled the dynamics of the baseline state ($x_{base,k}$) and conflict state ($x_{i,k}$) using first-order autoregressive (AR(1)) models. The evolution of these two state variables over trials is defined by

x_{base,k} = a_1 x_{base,k-1} + v_{1,k}, \qquad v_{1,k} \sim N(0, \sigma_{1,v}^2), \qquad (3.3a)
x_{i,k} = a_2 x_{i,k-1} + v_{2,k}, \qquad v_{2,k} \sim N(0, \sigma_{2,v}^2), \qquad (3.3b)

where $a_1$ and $a_2$ define the dependence of the state variables over time, and $v_{1,k}$ and $v_{2,k}$ are normal i.i.d. white noise terms with zero mean and variances $\sigma_{1,v}^2$ and $\sigma_{2,v}^2$, respectively. Equations 3.1 to 3.3 define the behavioral model; the model parameters are $(a_1, a_2, \sigma_{1,v}^2, \sigma_{2,v}^2, c_1, c_2, \sigma_\varepsilon^2)$. We use a combination of a filter smoother and an EM algorithm to identify the model parameters (implemented using the COMPASS toolbox; Yousefi et al., 2018). Using the model parameters, we estimated the posterior distribution of the state variables at each task trial (see Figure 4).
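The generative side of equations 3.1 to 3.3 can be sketched in a few lines. This is a simulation-only illustration (the paper fits the model with the Matlab COMPASS toolbox); the parameter values, seed, and incongruent-trial probability are hypothetical, not fitted to the MSIT data.

```python
import numpy as np

def simulate_msit_behavior(K=242, a1=0.998, a2=0.97, s1=0.01, s2=0.05,
                           sig_e=0.1, p_incong=0.5, seed=4):
    """Generative sketch of the behavioral model: AR(1) baseline and
    conflict states (equations 3.3a-3.3b) driving a log-normal RT
    observation (equation 3.1). All parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    x_base, x_i = np.zeros(K), np.zeros(K)
    x_i[0] = 0.2                                        # initial conflict effect
    I = rng.random(K) < p_incong                        # incongruent-trial flags
    for k in range(1, K):
        x_base[k] = a1 * x_base[k - 1] + rng.normal(0, s1)   # eq 3.3a
        x_i[k] = a2 * x_i[k - 1] + rng.normal(0, s2)         # eq 3.3b
    log_rt = x_base + I * x_i + rng.normal(0, sig_e, K)      # eq 3.1
    return np.exp(log_rt), I, x_base, x_i

rt, I, x_base, x_i = simulate_msit_behavior()
```

Running the fitted filter smoother on data of this form recovers the two latent trajectories, which is what the posterior estimates in Figure 4 show for a real participant.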


Figure 4: Example of cognitive state estimation from the MSIT. (A) Log RT over MSIT trials (blue) and the smoothed log RT using a 10-point moving average filter (black). (B) Conflict RT, defined as the difference between log RT on incongruent trials and the smoothed log RT using a 10-point moving average filter. (C) Estimated mean baseline state, x_base, and its 95% confidence bound using RT. (D) Estimated mean conflict state, x_i, and its 95% confidence bound.

Notably, in this example case, the participant became faster (decreased RTs) as the task proceeded over 242 trials (see Figure 4A). A simpler way of estimating $x_{base,k}$ would be to use a moving average filter to get a smoother estimate of the RT (see Figure 4A). Since incongruent trials increased reaction time and added noise to the RT (see Figure 4B), we estimated $x_{i,k}$ by subtracting the smoothed log RT (see Figure 4A) from the log RT of incongruent trials. When we modeled the RT using equation 3.1, we found that the baseline state ($x_{base}$; see Figure 4C) had a trend similar to the smoothed RT (see Figure 4A), while the conflict state ($x_i$; see Figure 4D) showed a rapid rise between trials 30 and 50, a trend that is more difficult to detect in the noisy smoothed conflict RT (see Figure 4B). Thus, using the behavioral model, we could combine different behavioral trial types, for instance, baseline and interference components, to estimate the underlying cognitive state. We next focused on finding the neural correlates that could also describe these changes in the cognitive state.
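The model-free comparison above (Figures 4A and 4B) can be sketched as follows: a 10-point moving average of log RT approximates the baseline trend, and the conflict effect is read out as log RT on incongruent trials minus that smoothed trace. The synthetic RT series and its parameters below are illustrative assumptions.

```python
import numpy as np

def smooth_baseline_and_conflict(rt, incongruent, window=10):
    """Model-free stand-ins for the two states: a moving average of
    log RT (baseline trend) and the residual of incongruent-trial
    log RT relative to that trace (conflict RT), as in Figs. 4A-4B.
    Note np.convolve 'same' zero-pads, so the first and last few
    points of the smoothed trace are attenuated edge artifacts."""
    log_rt = np.log(rt)
    kernel = np.ones(window) / window
    smoothed = np.convolve(log_rt, kernel, mode="same")
    conflict = log_rt[incongruent] - smoothed[incongruent]
    return smoothed, conflict

# Synthetic series: gradual speeding plus a fixed incongruent-trial cost.
rng = np.random.default_rng(8)
K = 242
I = rng.random(K) < 0.5
log_rt_true = np.linspace(0.0, -0.4, K) + 0.25 * I
rt = np.exp(log_rt_true + rng.normal(0, 0.1, K))
smoothed, conflict = smooth_baseline_and_conflict(rt, I)
```

As the text notes, this simple estimate tracks the slow baseline trend but leaves the conflict readout noisy, which is precisely where the state-space estimate of $x_i$ adds value.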

3.3 Model of Neural Encoding and Decoding. We constructed a collection of neural features consisting of spectral power in the theta (4–8 Hz),


Figure 5: A schematic of the procedure followed to extract a matrix of neural features from LFPs recorded across electrode channel pairs spanning cortical regions. theta_ij, alpha_ij, and high gamma_ij refer to the spectral power on trial i and channel pair j in the 4–8 Hz, 8–15 Hz, and 65–200 Hz frequency bands, respectively.

alpha (8–15 Hz), and high gamma (65–200 Hz) bands extracted from cortical LFPs, bipolar re-referenced to account for volume conduction (Bastos & Schoffelen, 2016). We considered these frequency bands because theta and alpha power from scalp EEG have been shown to modulate with RT during the MSIT (Gonzalez-Villar & Carrillo-de-la-Peña, 2017), and we found a high correlation between LFP high gamma power and RT in some of our preliminary data analysis. For each MSIT trial, the spectral power in the theta, alpha, and high gamma frequency bands was calculated using a Morlet wavelet transform over a 2 s window starting at the MSIT image onset. This procedure produced, for each trial and each participant data set, a total number of neural features equal to three times the number of channel pairs included. The number of neural features and the number of MSIT trials ranged from 216 to 321 and from 242 to 447, respectively, across the eight data sets. Thus, we arrive at a large matrix with trials as rows and neural features as columns (see Figure 5). In building the neural encoder model, we looked for an initial subset of candidate neural features that are informative about the cognitive state. We


started by constructing a linear model for the neural encoder of each feature individually with the following form,

\log z_{k,n} = b_1 + b_2 x_{k,m} + \epsilon_{k,m}, \qquad m = 1, \dots, M, \qquad (3.4)

where $x_{k,m}$ is the cognitive state at trial $k$ from the $m$th sampled trajectory and the $\epsilon_{k,m}$ are independent normal white noise terms. We consider $M$ trajectories (here, M = 1000), and all trajectories are mapped to the same neural features. We were interested in building a neural decoder model for the baseline cognitive state, $x_{base}$. Equation 3.4 models the conditional distribution of the neural features with a log-normal distribution, as previously discussed (see section 2.2.1). Using the modified F-test (see equation 2.9), we selected candidate features in our example data set that showed a significant association with the estimated state (neural features circled in orange in Figure 6A). Since we used a linear model, we also examined the R² values for each neural feature and found that they were higher for those features selected by the modified F-test (see Figure 6A). The neural feature exhibiting the highest R² (green asterisks in Figures 6A and 6B) demonstrated that there can be clear predictive power, or at least a significant correlation, between even a single neural feature and the cognitive state estimate (see Figure 6C). On the other hand, a feature with low R² (see the red asterisk, Figure 6A) did not show any correlation with the cognitive state estimate (see Figure 6D). To check for a multiple comparison problem, we tested whether the encoder selected neural features that had a significant functional relationship with the baseline state by shuffling the trial order and fitting encoder models to the baseline state of these shuffled trials. We performed 100 shuffles and determined the number of neural features that were selected as having significant relationships (F-test, p < 0.01) with the shuffled trials. We found that the actual number of features selected was significantly higher before shuffling trials (see Figure 7).
This showed that the features selected using the modified F-test have a significant encoding relationship with the baseline state and were not selected by chance. To address the multiple comparison issue, we could also use the Bonferroni correction procedure by setting the significance cutoff at α/N, where N is the number of neural features (Nakagawa, 2004). However, depending on the correlation structure of the tests, the Bonferroni correction can be extremely conservative, leading to a high rate of false negatives. Note that the neural features are extracted from different brain areas and different frequency bands; thus, they tend to be less correlated signals. This suggests that the significance level of 0.01 chosen here is a reasonable choice to avoid the multiple comparison problem.
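The trial-shuffling control has this shape: count features passing the per-feature test on the true trial order, then repeat on permuted trial orders to obtain a null distribution of selection counts. The sketch below is a simplified illustration in which a Pearson-correlation t-test stands in for the paper's modified F-test, and the feature matrix is synthetic.

```python
import numpy as np
from scipy import stats

def n_selected(Z, state, alpha=0.01):
    """Count features significantly associated with the state; a
    per-feature correlation t-test stands in for the modified F-test."""
    K, N = Z.shape
    r = np.array([np.corrcoef(Z[:, j], state)[0, 1] for j in range(N)])
    t = r * np.sqrt((K - 2) / (1 - r ** 2))
    p = 2 * stats.t.sf(np.abs(t), df=K - 2)
    return int(np.sum(p < alpha))

# Toy data: 5 state-locked features plus 20 pure-noise features.
rng = np.random.default_rng(7)
K = 200
x = np.cumsum(rng.normal(0, 0.1, K))                    # slow baseline state
Z = np.column_stack([x[:, None] + rng.normal(0, 0.5, (K, 5)),
                     rng.normal(0, 1, (K, 20))])
true_count = n_selected(Z, x)
null_counts = [n_selected(Z, rng.permutation(x)) for _ in range(100)]
```

As in Figure 7, the count on the true trial order sits far above the shuffled-trial null distribution when the features genuinely encode the state.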


Figure 6: Neural feature selection using the modified F-test. (A) R² values of all the neural features considered for an example data set, with the features identified using the modified F-test (p < 0.01) indicated by red circles. (B) The neural feature (Z) with the highest R² (corresponding to the feature marked with green asterisks in panel A) and the baseline state (X) estimated from RT (Y). (C, D) Scatter plots of Z and X for the neural features with the highest (0.13) and lowest (6.7431 × 10⁻⁷) R² values, as marked in panel A.

To identify candidate features for final neural decoding, we performed cross-validation with a training set comprising either the first half or the last half of the trials in the participant data set. The training set was chosen as the half that showed the greater range of the baseline state decoded from RT. Thus, the neural features identified via the modified F-test were the ones that showed significant predictive power for the baseline cognitive state during the training-set trials of the experiment. The same trials were used for neural feature pruning. For this second step, we used an RMSE measure between the mean cognitive state estimated from RT and that decoded from the neural features of the training set. The first step of the feature selection procedure, applied to a single-participant example data set (see Figures 4 and 6) using the modified F-test (see equation 2.9), identified 36 out of 249 neural features. The pruning step was then applied to these


Figure 7: Testing for a significant encoding relationship. Distribution of the number of neural features selected by the encoding step (F-test, p < 0.01) using shuffled trials in eight data sets. The number of neural features selected using the true trial sequence is marked in red. N is the number of features available for selection in each participant.

36 features, which produced a subset of 14 neural features with the lowest RMSE (see Figure 8A). We then used the resulting encoder models to decode the cognitive baseline state at each trial using only neural activity. In doing so, we computed the instantaneous likelihood of the cognitive state as a function of the neural features for each trial (see equation 2.10). The instantaneous likelihood is the product of the individual likelihoods of the selected encoder models over the corresponding neural features (see Figures 8B and 8C). We then used equations 2.11 and 2.12 to find the filter estimate of the cognitive state, $x_k \mid z_{1,\dots,k}$. We estimated the mean baseline state using the encoder models corresponding to both the extended feature set (selected via the F-test) and the smaller constrained set selected after pruning (see Figure 8D). The mean baseline state decoded from the constrained model had a lower RMSE and a higher correlation coefficient with that decoded from RT, compared to the state decoded from the full model (RMSE = 0.081 vs. 0.149; Corr = 0.803 vs. 0.698). Furthermore, the former had a higher HDR coverage than the latter (0.974 vs. 0.707). Thus, the constrained model produced a more robust and accurate estimate of the baseline state than the full model that included more neural features.


Figure 8: Pruning step and RMSE analysis. (A) RMSE between the baseline cognitive state (x_base) estimated from RT and that decoded from a progressively decreasing set of neural features on the training-set trials. The x-axis indicates the number of neural features being considered. (B) Likelihood of the decoded state using the full set of 36 features. (C) Likelihood of the decoded state using the constrained neural features (the white line is the mean of the cognitive state using RT). (D) Mean cognitive state decoded from RT (black solid) with 2 SD error bounds (black dashed line), and that decoded from the full set of 36 neural features selected by the modified F-test (green) and from the 14 features that produced the minimum RMSE on the training set (red).

To compare the decoder performance using the full encoder model and the constrained encoder model, we used three metrics: (1) the RMSE between the mean decoded states using RT and using the neural features selected by the modified F-test and pruning procedure, divided by the range of the mean decoded state from RT; (2) the ratio of the number of trials for which the neural decoded state was within the confidence bounds of the state estimated from the behavioral data; and (3) the correlation coefficient between the mean decoded states using RT and using neural features. For reliable decoder performance, we aimed for a low RMSE, a high ratio of decoded states within the confidence interval, and a high correlation between the decoded states. The constrained model using neural features selected by the stepwise


Figure 9: Decoding results across multiple data sets. (A) Top: RMSE between the decoded mean baseline state using RT and using neural features, divided by the range of the behavior-decoded mean baseline state over all trials in eight participants. Middle: Ratio of the number of trials for which the neural decoded state lies within the confidence bounds of the behavior-decoded state. Bottom: Correlation coefficient between the means of the decoded states using neural features and RT. The neural features were selected using a modified F-test to produce the full encoder model, and an RMSE-based stepwise identification procedure was then used to produce a constrained model. The error bars represent one standard error. The p-values correspond to a t-test. (B) Mean baseline states decoded from neural features in the remaining seven data sets, plotted along with the corresponding mean state decoded from RT. We take the behavior-decoded state as the ground truth.

identification procedure performed better, achieving a lower RMSE (t-test, p < 0.03), a higher correlation with the behavior decoded mean state (t-test, p < 0.05), and a higher within-bounds ratio (t-test, p < 0.02) than the full model using the larger pool of neural features selected by the modified F-test (see Figure 9A). Visual inspection of the neural decoded states also showed that their trajectories closely followed the behavior decoded state (see Figure 9B). In online appendix B, we show the decoding result in another data set. While the baseline state in Figure 8 has a monotonic trend, the one shown


Table 1: Number of Participants for Whom Different Brain Regions Were Selected Using the Stepwise Encoding Procedure in Eight Participant Data Sets.

Region            Low (4–8 Hz, 8–15 Hz)    High (65–200 Hz)
dlPFC                      6                       6
dmPFC                      0                       1
Posterior dlPFC            1                       2
Lateral OFC                1                       2
vlPFC                      5                       5
rACC                       2                       1
dACC                       3                       4

in the appendix has a nonmonotonic trend, and thus it further validates the neural decoding result.
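The three agreement metrics used in this comparison are straightforward to compute. A minimal sketch follows (in Python; the function and variable names are ours, and the per-trial state estimates and confidence bounds would come from the behavioral and neural decoders described above):

```python
import numpy as np

def decoder_agreement_metrics(x_rt, x_neural, ci_lower, ci_upper):
    """Compare a neurally decoded state trajectory against the
    behavior (RT) decoded state, per the three metrics in the text.

    x_rt, x_neural : per-trial mean decoded states (1-D arrays)
    ci_lower, ci_upper : confidence bounds of the RT-decoded state
    """
    x_rt, x_neural = np.asarray(x_rt), np.asarray(x_neural)
    # (1) RMSE normalized by the range of the RT-decoded mean state
    rmse = np.sqrt(np.mean((x_rt - x_neural) ** 2))
    nrmse = rmse / (x_rt.max() - x_rt.min())
    # (2) fraction of trials inside the behavioral confidence bounds
    in_ci = np.mean((x_neural >= ci_lower) & (x_neural <= ci_upper))
    # (3) Pearson correlation between the two mean trajectories
    corr = np.corrcoef(x_rt, x_neural)[0, 1]
    return nrmse, in_ci, corr
```

A low normalized RMSE, a within-bounds ratio near 1, and a high correlation together indicate that the neural decoder reproduces the behaviorally decoded state.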

3.4 Neural Features Used in Decoding of the Flexibility State. We based our choice of candidate neural features on regions known to be involved in interference tasks like the MSIT (Bush et al., 2003; Bush & Shin, 2006). We considered three frequency bands: theta, alpha, and high gamma. Using the model pruning procedure, we found that dlPFC neural features were correlated with the baseline (reaction time) state in six of eight participant data sets, while vlPFC neural features were used in five of eight data sets (see Table 1). High gamma power in dACC was selected in four of eight data sets.

4 Discussion

We have developed a new encoding-decoding framework to estimate the dynamics of cognitive processes using combinations of behavioral and neural signals. This framework differs from classical state-space paradigms, which are applied to directly relate behavior and neural activity in low-dimensional, observed data. It is also fundamentally different from the state-space models used in neural motor decoders. In that domain, state-space models are generally used to improve the accuracy of the neural decoder and to potentially build a more robust decoder by "denoising" the neural spike trains; there, the state-space model builds a low-dimensional representation of neural spike trains, motivated by the idea that recording the activity of a larger number of cells does not necessarily improve decoding accuracy (Kao et al., 2015; Aghagolzadeh & Truccolo, 2016). Our framework is scalable to neural encoding-decoding problems in which both behavioral and neural data become high-dimensional and multifaceted, whereas developing neural encoder-decoder models with the classical methods becomes a challenging modeling problem. For instance, in many cognitive tasks, both reaction time and accuracy carry information about brain function; as a


result, neither a neural decoder model for reaction time nor one for accuracy can properly capture the behavior. Appendix A shows an example where both RT and accuracy carry mutual information about the behavior and the underlying cognitive process. Building mathematical models between neural features and multidimensional behavioral signals, each with different characteristics, becomes a hard modeling problem. Using the proposed modeling framework, we demonstrated how a cognitive state can be estimated using the cohort of neural features identified through the neural encoding step. Under this framework, behavior can be estimated from distributed neural activity without any explicit low-dimensionality assumption on brain dynamics; thus, we can estimate those components of behavior that are shaped by distributed neural activity. The modeling framework has been successfully applied to a data set of eight participants performing the MSIT task. There are multiple challenges in building neural decoder models for cognitive state processes. Cognitive states are generally abstract features that influence measurable signals rather than being directly measurable themselves. This contrasts with decoder models of signals such as those in motor cortex, where the associated movements can be measured (Serruya et al., 2002; Flint et al., 2012). The outcomes of a cognitive state may be observed across a range of behavioral signals, and our proposed behavioral encoder-decoder model demonstrates how different signals can be optimally combined in decoding the underlying cognitive state. Indeed, the emergence of different cognitive processes requires the involvement of different brain networks and time courses; thus, the neural decoder model must be defined as a function of collective neural activity across multiple brain regions rather than a specific neural feature of an individual brain structure.
For instance, cognitive flexibility is linked to the frontal cortical and anterior cingulate network (Sheth et al., 2012; Kim et al., 2011). Building a model to articulate behavior and neural activity is a hard problem given the multimodality and dynamicity of the behavior and the large dimension of the neural features (Paninski et al., 2007; Koyama et al., 2010). The idea of using a low-dimensional dynamical variable that captures the essential features of the behavior has certain advantages that can be applied to other domains of cognitive research, including learning, affective processes, and decision making. Previous work has sought to define static latent variables to link neural and behavioral data (Turner et al., 2013), whereas in our framework, we use a dynamical latent variable to link changes between behavior and neural activity over time, which could be useful in other behavioral domains. Sarma and Sacré (2018) applied a state-space modeling framework to link the behavioral strategies observed during a gambling task to internal cognitive state processes and subsequently link the estimated values of this state to spiking activity in orbital frontal cortex and insula. The goal of linking neural activity across multiple brain areas to complex behaviors is similar to ours, but we extend


this approach by developing techniques to identify a subset of neural features from a candidate set that optimizes cross-validated decoding accuracy. Ezzyat et al. (2017) used patterns of spectral power across electrodes to train a penalized logistic regression classifier to discriminate encoding activity during subsequently recalled words from subsequently forgotten words. There, they demonstrated the possibility of using high-dimensional neural data to encode brain states; however, their internal state is observable and categorical rather than continuous valued. We built the neural encoder model using spectral power features of LFP recorded from the ; however, the proposed modeling framework is applicable to other modalities of neural data, including fMRI and EEG. It is also applicable to other neural features, such as coherence measures representing coupling between different brain areas and measures representing spatiotemporal neural dynamics. Another modality of neural activity that can be used in building the neural decoder model is cell ensemble spiking activity; there, the neural decoder model can be built using a point-process modeling framework with GLMs on the encoder side (Truccolo et al., 2005; Kass & Ventura, 2001; Deng et al., 2015). The dynamical behavioral model proposed here can also be extended to include other physiological signals, including heart rate, skin conductance, pupil dilation, movement activity, or voice (Munia, Islam, Islam, Mostafa, & Ahmad, 2012; Sun et al., 2010). Though the main scope of this work is linking multidimensional and multimodal behavioral and neural data using low-dimensional dynamical latent variable(s), the idea of using state-space modeling frameworks to factorize different cognitive processes and build a low-dimensional dynamical representation of cognitive process(es) can also be applied to a one-dimensional behavioral readout, like RT.
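As a toy illustration of applying such a state-space factorization to RT alone (this is our own sketch, not the EM-fitted model of this work), the scalar Kalman filter below separates a slowly drifting baseline state from simulated log reaction times while regressing out a conflict effect whose coefficient and noise variances are assumed known; in the full framework, these parameters would be estimated from data:

```python
import numpy as np

def filter_baseline_state(log_rt, conflict, c=0.3, q=1e-4, r=2.5e-3):
    """Toy state-space model (all parameters assumed known here):
        x_t      = x_{t-1} + w_t,               w_t ~ N(0, q)  # baseline state
        log_rt_t = x_t + c * conflict_t + v_t,  v_t ~ N(0, r)
    Returns the filtered baseline estimate per trial."""
    x = log_rt[0] - c * conflict[0]   # crude initialization
    p = 1.0                           # large initial uncertainty
    estimates = []
    for y, s in zip(log_rt, conflict):
        p = p + q                     # predict step (random walk)
        k = p / (p + r)               # Kalman gain
        x = x + k * (y - c * s - x)   # update, conflict effect regressed out
        p = (1.0 - k) * p
        estimates.append(x)
    return np.array(estimates)
```

Smoothing log RT directly would leave the alternating conflict effect in the estimate; the filter above removes it by construction, which is the sense in which the state-space model "regresses out" processes of no interest.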
This is because most cognitive tasks incorporate different stimuli to trigger a specific cognitive process, and thus the RT is not merely the outcome of a single cognitive process but a complex combination of various processes, such as attention or adaptation. An accurate estimate of a desired cognitive process requires removing the other, confounding cognitive components present in the readout signal. In other words, even a low-dimensional behavioral readout like RT can be a complex function of multiple cognitive processes, and we can use the state-space modeling framework to regress out unnecessary and confounding cognitive processes to build a more accurate estimate of the desired processes. This suggests that smoothing the behavioral readout is not necessarily the optimal choice for characterizing the relationship between brain and behavior, since the smoothed metric would include processes of no interest. Indeed, we argue that the framework proposed here can be used to build a more accurate representation of behavior and its connection to neural activity by separating out these hidden states as features in the modeling framework. In section 4, when we discussed the MSIT behavioral signal, we built the behavioral decoder using two cognitive process components, baseline and


conflict, and focused on decoding the baseline state using neural data. In online appendix C, we demonstrate how these two states, the conflict and baseline states, can be combined to predict RT. This combined state estimate model could predict RT better than a decoder model built solely on (smoothed) RT, showing how the state-space modeling framework may more optimally predict behavior from the underlying component cognitive processes. In developing the E-D framework, we applied different goodness-of-fit analyses to build and validate the optimal encoder model, assess the decoding performance, and select neural correlates with significant statistical power. In future work, we could also use other model selection processes, including stepwise regression or regularized GLM estimation such as GLMNet (Claeskens & Hjort, 2008; Friedman, Hastie, & Tibshirani, 2010a). Similarly, the framework has the flexibility to examine different behavioral models and to check the encoding result of other state variables of the model, providing more insight into the model and leading to a more robust neural decoder model. The analysis of the decoding results on the MSIT data set of eight participants suggested the involvement of dACC, vlPFC, and dlPFC. These results support previous studies suggesting that performance on interference/conflict tasks correlates with function within the frontal cortical region (Perret, 1974; Sheth et al., 2012; Cavanagh & Frank, 2014) and the anterior cingulate cortex (Carter et al., 1997; González-Villar & Carrillo-de-la-Peña, 2017). The encoder selection procedure not only helped create a decoder model but could also be used to parsimoniously select features for real-time applications.
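For reference, the coordinate-descent update at the core of the GLMNet algorithm of Friedman, Hastie, and Tibshirani (2010a) is only a few lines; the sketch below covers the Gaussian family only, and the data and names are of our choosing:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso (Gaussian family), the core update of
    GLMNet. Columns of X are assumed standardized to unit variance;
    lam is the l1 penalty."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]        # partial residual
            z = X[:, j] @ r / n
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0)  # soft-threshold
            # (unit-variance columns make the denominator 1)
    return beta
```

Features whose soft-thresholded coefficient lands exactly at zero drop out of the model, which is what makes this family of estimators attractive for feature selection alongside the stepwise procedure used here.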
Though the results are consistent with the prior literature (Bush & Shin, 2006), other questions might be answered, such as whether the neural decoder model is consistent across different sessions separated by days and how changes in the neural model over time can be addressed. Another question is whether we can apply the framework to the more general problem of mental and cognitive state estimation where the behavioral outcome is not conditioned on a specific cognitive task such as the MSIT. A further important question is whether the E-D framework can be used as a diagnostic tool to assess healthy and abnormal brain dynamics and how the results can be validated using established psychometric assessment tools (Nasreddine et al., 2005). Finally, a trending effort in is to intervene in brain dynamics to alleviate neurological problems (Ezzyat et al., 2017; Deadwyler et al., 2017); therefore, it will be important to understand how neuromodulatory effects can be addressed in this encoding-decoding modeling framework. The modeling framework proposed here may allow investigators to characterize a causal relationship between neural activity and behavior, particularly when the behavior or the neural activity is dynamic and high-dimensional. We envision that the modeling framework will play a future role in the development of novel brain-computer interfaces for modulating


behavior. The proposed methodology allows investigators to identify and manipulate a low-dimensional correlate of cognitive function, which can lead to altering neural activity related to behavior. This is an important step toward treating mental illnesses such as posttraumatic stress disorder and obsessive-compulsive disorder.

Acknowledgments

This research was funded in part by the Defense Advanced Research Projects Agency (DARPA) under Cooperative Agreement Number W911NF-14-2-0045, issued by the Army Research Office contracting office in support of DARPA's SUBNETS program. The views, opinions, and findings expressed are our own and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. government. We thank Gio Piantoni, J. B. Eichenlaub, Pariya Salami, Nir Nossenson, Erica Johnson, Mia Borzello, Kara Farnes, Deborah Vallejo-Lopez, Gavin Belok, Rina Zelman, Sam Zorowitz, and Britni Crocker for their help in recording the human data and task preparation. A copy of the source code used in this research, along with sample data, can be found at the following GitHub link: https://github.com/TRANSFORM-DBS/Encoder-Decoder-Paper.

References

Aghagolzadeh, M., & Truccolo, W. (2016). Inference and decoding of motor cortex low-dimensional dynamics via latent state-space models. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 24(2), 272–282.
Avena-Koenigsberger, A., Misic, B., & Sporns, O. (2018). Communication dynamics in complex brain networks. Nature Reviews Neuroscience, 19(1), 17.
Barbieri, R., Quirk, M. C., Frank, L. M., Wilson, M. A., & Brown, E. N. (2001). Construction and analysis of non-Poisson stimulus-response models of neural spiking activity. Journal of Neuroscience Methods, 105(1), 25–37.
Barrett, L. F., & Satpute, A. B. (2013). Large-scale brain networks in affective and : Towards an integrative functional architecture of the brain. Current Opinion in Neurobiology, 23(3), 361–372.
Bastos, A. M., & Schoffelen, J. M. (2016). A tutorial review of functional connectivity analysis methods and their interpretational pitfalls. Frontiers in Systems Neuroscience, 9, 175.
Bentler, P. M., & Weeks, D. G. (1980). Linear structural equations with latent variables. Psychometrika, 45(3), 289–308.
Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer-Verlag.
Brainard, D. H., & Vision, S. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.
Braun, U., Schäfer, A., Walter, H., Erk, S., Romanczuk-Seiferth, N., Haddad, L., ... Meyer-Lindenberg, A. (2015). Dynamic reconfiguration of frontal brain networks during executive cognition in humans. Proceedings of the National Academy of Sciences, 112(37), 11678–11683.
Bressler, S. L., & Menon, V. (2010). Large-scale brain networks in cognition: Emerging methods and principles. Trends in Cognitive Sciences, 14(6), 277–290.
Brouwer, A. M., Zander, T. O., van Erp, J. B., Korteling, J. E., & Bronkhorst, A. W. (2015). Using neurophysiological signals that reflect cognitive or affective state: Six recommendations to avoid common pitfalls. Frontiers in Neuroscience, 9, 136.
Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E., & Frank, L. M. (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14(2), 325–346.
Bush, G., & Shin, L. M. (2006). The multi-source interference task: An fMRI task that reliably activates the cingulo-frontal-parietal cognitive/attention network. Nature Protocols, 1(1), 308.
Bush, G., Shin, L. M., Holmes, J., Rosen, B. R., & Vogt, B. A. (2003). The multi-source interference task: Validation study with fMRI in individual subjects. Molecular Psychiatry, 8(1), 60.
Buzsáki, G., Anastassiou, C. A., & Koch, C. (2012). The origin of extracellular fields and currents—EEG, ECoG, LFP, and spikes. Nature Reviews Neuroscience, 13(6), 407.
Calabrese, A., Schumacher, J. W., Schneider, D. M., Paninski, L., & Woolley, S. M. (2011). A generalized linear model for estimating spectrotemporal receptive fields from responses to natural sounds. PloS One, 6(1), e16104.
Carter, C. S., Mintun, M., Nichols, T., & Cohen, J. D. (1997). Anterior cingulate gyrus dysfunction and selective attention deficits in schizophrenia: [15O]H2O PET study during single-trial Stroop task performance. American Journal of Psychiatry, 154(12), 1670–1675.
Cavanagh, J. F., & Frank, M. J. (2014). Frontal theta as a mechanism for cognitive control. Trends in Cognitive Sciences, 18(8), 414–421.
Claeskens, G., & Hjort, N. L. (2008). Model selection and model averaging. Cambridge: Cambridge University Press.
Chipman, H., George, E. I., McCulloch, R. E., Clyde, M., Foster, D. P., & Stine, R. A. (2001). The practical implementation of Bayesian model selection. Lecture Notes—Monograph Series, 65–116. Beachwood, OH: Institute of Mathematical Statistics.
Cohen, M. X. (2014). Analyzing neural time series data: Theory and practice. Cambridge, MA: MIT Press.
Crandall, B., Klein, G., Klein, G. A., & Hoffman, R. R. (2006). Working minds: A practitioner's guide to cognitive task analysis. Cambridge, MA: MIT Press.
Davidson, R. J., Putnam, K. M., & Larson, C. L. (2000). Dysfunction in the neural circuitry of emotion regulation—a possible prelude to violence. Science, 289(5479), 591–594.
Deadwyler, S. A., Hampson, R. E., Song, D., Opris, I., Gerhardt, G. A., Marmarelis, V. Z., & Berger, T. W. (2017). A cognitive prosthesis for memory facilitation by closed-loop functional ensemble stimulation of hippocampal neurons in primate brain. Experimental Neurology, 287, 452–460.
Deco, G., Jirsa, V. K., & McIntosh, A. R. (2011). Emerging concepts for the dynamical organization of resting-state activity in the brain. Nature Reviews Neuroscience, 12(1), 43.


Deng, X., Faghih, R. T., Barbieri, R., Paulk, A. C., Asaad, W. F., Brown, E. N., ... Eden, U. T. (2015). Estimating a dynamic state to relate neural spiking activity to behavioral signals during cognitive tasks. In Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 7808–7813). Piscataway, NJ: IEEE.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1–38.
Dykstra, A. R., Chan, A. M., Quinn, B. T., Zepeda, R., Keller, C. J., Cormier, J., ... Cash, S. S. (2012). Individualized localization and cortical surface-based registration of intracranial electrodes. NeuroImage, 59(4), 3563–3570.
Egner, T. (2014). Creatures of habit (and control): A multi-level learning perspective on the modulation of congruency effects. Frontiers in Psychology, 5, 1247.
Egner, T., Etkin, A., Gale, S., & Hirsch, J. (2007). Dissociable neural systems resolve conflict from emotional versus nonemotional distracters. Cerebral Cortex, 18(6), 1475–1484.
Eriksen, B. A., & Eriksen, C. W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception and Psychophysics, 16(1), 143–149.
Ezzyat, Y., Kragel, J. E., Burke, J. F., Levy, D. F., Lyalenko, A., Wanda, P., ... Sperling, M. R. (2017). Direct brain stimulation modulates encoding states and memory performance in humans. Current Biology, 27(9), 1251–1258.
Fischer, B. J., & Peña, J. L. (2011). Owl's behavior and neural representation predicted by Bayesian inference. Nature Neuroscience, 14(8), 1061.
Flint, R. D., Lindberg, E. W., Jordan, L. R., Miller, L. E., & Slutzky, M. W. (2012). Accurate decoding of reaching movements from field potentials in the absence of spikes. Journal of , 9(4), 046006.
Friedman, J., Hastie, T., & Tibshirani, R. (2010a). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1.
Friedman, J., Hastie, T., & Tibshirani, R. (2010b). A note on the group Lasso and a sparse group Lasso. arXiv:1001.0736.
Gardiner, C. W. (1985). Handbook of stochastic methods for physics, chemistry and the natural sciences. Berlin: Springer.
Glaser, J. I., & Kording, K. P. (2016). The development and analysis of integrated neuroscience data. Frontiers in Computational Neuroscience, 10, 11.
González-Villar, A. J., & Carrillo-de-la-Peña, M. T. (2017). Brain electrical activity signatures during performance of the multisource interference task. Psychophysiology, 54(6), 874–881.
Gordon, N. J., Salmond, D. J., & Smith, A. F. (1993, April). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F (Radar and ), 140(2), 107–113.
Hara, H. (1980). A relation between non-Markov and Markov processes. Zeitschrift für Physik B Condensed Matter, 36(4), 369–372.
Haynes, J. D., & Rees, G. (2006). : Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523.
Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223–233.


Hochberg, L. R., Bacher, D., Jarosiewicz, B., Masse, N. Y., Simeral, J. D., Vogel, J., ... Donoghue, J. P. (2012). Reach and grasp by people with tetraplegia using a neurally controlled robotic arm. Nature, 485(7398), 372.
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301), 13–30.
Huang, J., Ma, S., Xie, H., & Zhang, C. H. (2009). A group bridge approach for variable selection. Biometrika, 96(2), 339–355.
Hyndman, R. J. (1996). Computing and graphing highest density regions. American Statistician, 50(2), 120–126.
Jorgenson, L. A., Newsome, W. T., Anderson, D. J., Bargmann, C. I., Brown, E. N., Deisseroth, K., ... Marder, E. (2015). The BRAIN initiative: Developing technology to catalyse neuroscience discovery. Phil. Trans. R. Soc. B, 370(1668), 20140164.
Kao, J. C., Nuyujukian, P., Ryu, S. I., Churchland, M. M., Cunningham, J. P., & Shenoy, K. V. (2015). Single-trial dynamics of motor cortex and their applications to brain-machine interfaces. Nature Communications, 6, 7759.
Kass, R. E., & Ventura, V. (2001). A spike-train probability model. Neural Computation, 13(8), 1713–1720.
Kim, C., Johnson, N. F., Cilles, S. E., & Gold, B. T. (2011). Common and distinct mechanisms of cognitive flexibility in prefrontal cortex. Journal of Neuroscience, 31(13), 4771–4779.
Kleiner, M., Brainard, D., Pelli, D., Ingling, A., Murray, R., & Broussard, C. (2007). What's new in Psychtoolbox-3? Perception, 36(14), 1.
Kloosterman, F., Layton, S. P., Chen, Z., & Wilson, M. A. (2013). Bayesian decoding using unsorted spikes in the rat hippocampus. Journal of , 111(1), 217–227.
Knill, D. C., & Pouget, A. (2004). The Bayesian brain: The role of uncertainty in and computation. Trends in , 27(12), 712–719.
Koyama, S., Eden, U. T., Brown, E. N., & Kass, R. E. (2010). Bayesian decoding of neural spike trains. Annals of the Institute of Statistical Mathematics, 62(1), 37.
Krakauer, J. W., Ghazanfar, A. A., Gomez-Marin, A., MacIver, M. A., & Poeppel, D. (2017). Neuroscience needs behavior: Correcting a reductionist bias. Neuron, 93(3), 480–490.
Lawhern, V., Wu, W., Hatsopoulos, N., & Paninski, L. (2010). Population decoding of motor cortical activity using a generalized linear model with hidden states. Journal of Neuroscience Methods, 189(2), 267–280.
Lebedev, M. A., & Nicolelis, M. A. (2017). Brain-machine interfaces: From basic science to neuroprostheses and . Physiological Reviews, 97(2), 767–837.
LeDoux, J. E. (2000). Emotion circuits in the brain. Annual Review of Neuroscience, 23(1), 155–184.
Li, Z., O'Doherty, J. E., Lebedev, M. A., & Nicolelis, M. A. (2011). Adaptive decoding for brain-machine interfaces through Bayesian parameter updates. Neural Computation, 23(12), 3162–3204.
Lindsten, F., & Schön, T. B. (2013). Backward simulation methods for Monte Carlo statistical inference. Foundations and Trends in Machine Learning, 6(1), 1–143.
Loehlin, J. C., & Beaujean, A. A. (2016). Latent variable models (pp. 15–50). London: Routledge.


Makeig, S., Debener, S., Onton, J., & Delorme, A. (2004). Mining event-related brain dynamics. Trends in Cognitive Sciences, 8(5), 204–210.
Martinez-Rubio, C., Paulk, A. C., McDonald, E. J., Widge, A. S., & Eskandar, E. N. (2018). Multi-modal encoding of novelty, reward, and learning in the primate nucleus basalis of Meynert. Journal of Neuroscience, 38, 1942–1958.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. Boca Raton, FL: CRC Press.
Miller, A. (2002). Subset selection in regression. London: Chapman and Hall/CRC.
Miranda de Sá, A. M. F. L. (2004). A note on the sampling distribution of coherence estimate for the detection of periodic signals. IEEE Signal Processing Letters, 11(3), 323–325.
Mitchell, T. M., Hutchinson, R., Niculescu, R. S., Pereira, F., Wang, X., Just, M., & Newman, S. (2004). Learning to decode cognitive states from brain images. Machine Learning, 57(1–2), 145–175.
Mukamel, R., Gelbard, H., Arieli, A., Hasson, U., Fried, I., & Malach, R. (2005). Coupling between neuronal firing, field potentials, and FMRI in human auditory cortex. Science, 309(5736), 951–954.
Munia, T. T. K., Islam, A., Islam, M. M., Mostafa, S. S., & Ahmad, M. (2012, May). Mental states estimation with the variation of physiological signals. In Proceedings of the 2012 International Conference on Informatics, Electronics and Vision (pp. 800–805). Piscataway, NJ: IEEE.
Nakagawa, S. (2004). A farewell to Bonferroni: The problems of low statistical power and publication bias. Behavioral Ecology, 15(6), 1044–1045.
Nasreddine, Z. S., Phillips, N. A., Bédirian, V., Charbonneau, S., Whitehead, V., Collin, I., ... Chertkow, H. (2005). The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. Journal of the American Geriatrics Society, 53(4), 695–699.
Nuyujukian, P., Sanabria, J. A., Saab, J., Pandarinath, C., Jarosiewicz, B., Blabe, C. H., ... Hochberg, L. R. (2018). Cortical control of a tablet computer by people with paralysis. PloS One, 13(11), e0204566.
Oostenveld, R., Fries, P., Maris, E., & Schoffelen, J. M. (2011). FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience, 2011, 1.
Paninski, L., Pillow, J., & Lewi, J. (2007). Statistical models for neural encoding, decoding, and optimal stimulus design. Progress in Brain Research, 165, 493–507.
Peled, N., Gholipour, T., Paulk, A. C., Felsenstein, O., Eichenlaub, J. B., Dougherty, D. D., ... Stufflebeam, S. M. (2017). Invasive electrodes identification and labeling. GitHub repository. https://github.com/pelednoam/ieil. doi:10.5281/zenodo.1078789
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10(4), 437–442.
Perret, E. (1974). The left frontal lobe of man and the suppression of habitual responses in verbal categorical behaviour. Neuropsychologia, 12(3), 323–330.
Pillow, J. (2007). Likelihood-based approaches to modeling the neural code. In K. Doya, S. Ishii, A. Pouget, & R. P. Rao (Eds.), Bayesian brain: Probabilistic approaches to neural coding (pp. 53–70). Cambridge, MA: MIT Press.


Poldrack, R. A. (2011). Inferring mental states from neuroimaging data: From reverse inference to large-scale decoding. Neuron, 72(5), 692–697.
Prerau, M. J., Smith, A. C., Eden, U. T., Yanike, M., Suzuki, W. A., & Brown, E. N. (2008). A mixed filter algorithm for cognitive state estimation from simultaneously recorded continuous and binary measures of performance. Biological Cybernetics, 99(1), 1–14.
Reuter, M., Rosas, H. D., & Fischl, B. (2010). Highly accurate inverse consistent registration: A robust approach. NeuroImage, 53(4), 1181–1196.
Reuter, M., Schmansky, N. J., Rosas, H. D., & Fischl, B. (2012). Within-subject template estimation for unbiased longitudinal image analysis. NeuroImage, 61(4), 1402–1418.
Sani, O. G., Yang, Y., Lee, M. B., Dawes, H. E., Chang, E. F., & Shanechi, M. M. (2018). Mood variations decoded from multi-site intracranial human brain activity. Nature Biotechnology, 36(10), 954.
Sarma, S. V., & Sacré, P. (2018). Characterizing complex human behaviors and neural responses using dynamic models. In Z. Chen & S. V. Sarma (Eds.), Dynamic neuroscience (pp. 177–195). Cham: Springer.
Schöner, G. (2008). Dynamical systems approaches to cognition. In R. Sun (Ed.), Cambridge handbook of computational cognitive modeling (pp. 101–126). Cambridge: Cambridge University Press.
Schwartz, A. B., Kettner, R. E., & Georgopoulos, A. P. (1988). Primate motor cortex and free arm movements to visual targets in three-dimensional space. I. Relations between single cell discharge and direction of movement. Journal of Neuroscience, 8(8), 2913–2927.
Serruya, M. D., Hatsopoulos, N. G., Paninski, L., Fellows, M. R., & Donoghue, J. P. (2002). Brain-machine interface: Instant neural control of a movement signal. Nature, 416(6877), 141.
Sheth, S. A., Mian, M. K., Patel, S. R., Asaad, W. F., Williams, Z. M., Dougherty, D. D., ... Eskandar, E. N. (2012). Human dorsal anterior cingulate cortex neurons mediate ongoing behavioural adaptation. Nature, 488(7410), 218.
Shine, J. M., Breakspear, M., Bell, P., Martens, K. E., Shine, R., Koyejo, O., ... Poldrack, R. (2018). The low dimensional dynamic and integrative core of cognition in the human brain. bioRxiv:266635.
Shlens, J. (2014). Notes on generalized linear models of neurons. arXiv:1404.1999.
Sidman, M. (1960). Tactics of scientific research: Evaluating experimental data in psychology. New York: Basic Books.
Simon, J. R., & Berbaum, K. (1990). Effect of conflicting cues on information processing: The "Stroop effect" vs. the "Simon effect." Acta Psychologica, 73(2), 159–170.
Skrondal, A., & Rabe-Hesketh, S. (2007). Latent variable modelling: A survey. Scandinavian Journal of Statistics, 34(7), 712–745.
Smith, A. C., Frank, L. M., Wirth, S., Yanike, M., Hu, D., Kubota, Y., ... Brown, E. N. (2004). Dynamic analysis of learning in behavioral experiments. Journal of Neuroscience, 24(2), 447–461.
Sporns, O., Chialvo, D. R., Kaiser, M., & Hilgetag, C. C. (2004). Organization, development and function of complex brain networks. Trends in Cognitive Sciences, 8(9), 418–425.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco_a_01196 by guest on 29 September 2021 Decoding Cognitive States From Behavior and Physiology 1787

Stevenson, I. H., & Kording, K. P. (2011). How advances in neural recording affect data analysis. Nature Neuroscience, 14(2), 139. Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Exper- imental Psychology, 18(6), 643. Sun, F. T., Kuo, C., Cheng, H. T., Buthpitiya, S., Collins, P., & Griss, M. (2010, October). Activity-aware mental stress detection using physiological sensors. In Proceedings of the International Conference on Mobile Computing, Applications, and Services (pp. 282–301). Berlin: Springer. Suzuki, W. A., & Brown, E. N. (2005). Behavioral and neurophysiological analyses of dynamic learning processes. Behavioral and Reviews, 4(2), 67–95. Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P., & Brown, E. N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93(2), 1074–1089. Turner, B. M., Forstmann, B. U., Wagenmakers, E. J., Brown, S. D., Sederberg, P. B., & Steyvers, M. (2013). A Bayesian framework for simultaneously modeling neural and behavioral data. NeuroImage, 72, 193–206. Van Veen, V., Cohen, J. D., Botvinick, M. M., Stenger, V. A., & Carter, C. S. (2001). An- terior cingulate cortex, conflict monitoring, and levels of processing. NeuroImage, 14(6), 1302–1308. Venables, W. N., & Ripley, B. D. (2013). Modern applied statistics with S-PLUS.New York: Springer Science & Business Media. Voytek, B., & Knight, R. T. (2015). Dynamic network communication as a unifying neural basis for cognition, development, aging, and disease. Biological Psychiatry, 77(12), 1089–1097. Vyas, S., Even-Chen, N., Stavisky, S. D., Ryu, S. I., Nuyujukian, P., & Shenoy, K. V. (2018). Neural population dynamics underlying motor learning transfer. Neuron, 97(5), 1177–1186. Wang, J., Rao, H., Wetmore, G. S., Furlan, P. M., Korczykowski, M., Dinges, D. F., & Detre, J. A. (2005). 
Perfusion functional MRI reveals cerebral blood flow pattern under psychological stress. Proceedings of the National Academy of Sciences, 102(49), 17804–17809. Widge,A.S.,Ellard,K.K.,Paulk,A.C.,Basu,I.,Yousefi,A.,Zorowitz,S.,...Kramer, M. A. (2017). Treating refractory mental illness with closed-loop brain stimula- tion: Progress towards a patient-specific transdiagnostic approach. Experimental Neurology, 287, 491–472. Wilcox, R. (2005). Kolmogorov–Smirnov Test. In P. Armitage & T. Colton (Eds.), En- cyclopedia of biostatistics, vol. 4. Hoboken, NJ: Wiley. Wu,W.,Black,M.J.,Gao,Y.,Serruya,M.,Shaikhouni,A.,Donoghue,J.P.,&Bienen- stock, E. (2003). Neural decoding of cursor motion using a Kalman filter. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems, 15 (pp. 133–140). Cambridge, MA: MIT Press. Wu, W., Kulkarni, J. E., Hatsopoulos, N. G., & Paninski, L. (2009). Neural de- coding of hand motion using a linear state-space model with hidden states. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 17(4), 370– 378.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco_a_01196 by guest on 29 September 2021 1788 A. Yousefi et al.

Yousefi, A., Paulk, A. C., Basu, I., Dougherty, D. D., Eskandar, E. N., Eden, U.T.,& Widge, A. S. (2018). COMPASS: An open-source, general-purpose software toolkit for computational psychiatry. bioRxiv:377556. Yousefi, A., Paulk, A. C., Deckersbach, T., Dougherty, D. D., Eskandar, E. N.,Widge, A. S., & Eden, U. T. (2015). Cognitive state prediction using an EM algorithm applied to gamma distributed data. In Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 7819–7824). Piscataway, NJ: IEEE. Zarahn, E., Aguirre, G., & D’Esposito, M. (1997). A trial-based experimental design for fMRI. NeuroImage, 6(2), 122–138.

Received August 28, 2018; accepted February 11, 2019.
