Journal of the American Statistical Association
ISSN: 0162-1459 (Print) 1537-274X (Online) Journal homepage: https://www.tandfonline.com/loi/uasa20
Modeling Motor Learning Using Heteroscedastic Functional Principal Components Analysis
Daniel Backenroth, Jeff Goldsmith, Michelle D. Harran, Juan C. Cortes, John W. Krakauer & Tomoko Kitago
To cite this article: Daniel Backenroth, Jeff Goldsmith, Michelle D. Harran, Juan C. Cortes, John W. Krakauer & Tomoko Kitago (2018) Modeling Motor Learning Using Heteroscedastic Functional Principal Components Analysis, Journal of the American Statistical Association, 113:523, 1003-1015, DOI: 10.1080/01621459.2017.1379403 To link to this article: https://doi.org/10.1080/01621459.2017.1379403
View supplementary material
Accepted author version posted online: 29 Sep 2017. Published online: 29 Sep 2017.
Submit your article to this journal
Article views: 513
View Crossmark data
Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=uasa20 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION , VOL. , NO. , –: Applications and Case Studies https://doi.org/./..
Modeling Motor Learning Using Heteroscedastic Functional Principal Components Analysis
Daniel Backenrotha, Jeff Goldsmitha, Michelle D. Harranb,c,JuanC.Cortesb,d, John W. Krakauerb,c,d, and Tomoko Kitagob,e aDepartment of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY; bDepartment of Neurology, Columbia University Medical Center, New York, NY; cDepartment of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD; dDepartment of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD; eBurke Neurological Institute, White Plains, NY
ABSTRACT ARTICLE HISTORY We propose a novel method for estimating population-level and subject-specific effects of covariates on the Received February variability of functional data. We extend the functional principal components analysis framework by mod- Revised August eling the variance of principal component scores as a function of covariates and subject-specific random KEYWORDS effects. In a setting where principal components are largely invariant across subjects and covariate values, Functional data; Kinematic modeling the variance of these scores provides a flexible and interpretable way to explore factors that affect data; Motor control; the variability of functional data. Our work is motivated by a novel dataset from an experiment assessing Probabilistic PCA; Variance upper extremity motor control, and quantifies the reduction in movement variability associated with skill modeling; Variational Bayes learning. The proposed methods can be applied broadly to understand movement variability, in settings that include motor learning, impairment due to injury or disease, and recovery. Supplementary materials for this article are available online.
1. Scientific Motivation variance and recovery (Krakauer 2006). By taking full advantage of high-frequency laboratory recordings, we shift focus from 1.1. Motor Learning particular time points to complete curves. Our approach allows us to model the variability of these curves as they depend on Recent work in motor learning has suggested that change in covariates,likethehandusedortherepetitionnumber,aswell movement variability is an important component of improve- as the estimation of random effects reflecting differences in base- ment in motor skill. It has been suggested that when a motor line variability and learning rates among subjects. task is learned, variance is reduced along dimensions relevant Section 1.2 describes our motivating data in more detail, and to the successful accomplishment of the task, although it may Section 2 introduces our modeling framework. A review of rele- increase in other dimensions (Scholz and Schöner 1999;Yarrow, vantstatisticalworkappearsinSection 3.Detailsofourestima- Brown, and Krakauer 2009). Experimental work, moreover, tion approach are in Section 4. Simulations and the application has shown that learning-induced improvement of movement to our motivating data appear in Sections 5 and 6,respectively, execution, measured through the trade-off between speed and andweclosewithadiscussioninSection 7. accuracy, is accompanied by significant reductions in movement variability. In fact, these reductions in movement variability may be a more important feature of learning than changes in the 1.2. Dataset average movement (Shmuelof, Krakauer, and Mazzoni 2012). These results have typically been based on assessments of vari- Our motivating data were gathered as part of a study of motor ability at a few time points, for example, at the end of the move- learning among healthy subjects. Kinematic data were acquired ment, although high-frequency laboratory recordings of com- in a standard task used to measure control of reaching move- plete movements are often available. ments. In this task, subjects rest their forearm on an air-sled In this article we develop a modeling framework that can be system to reduce effects of friction and gravity. The subjects are used to quantify movement variability based on dense record- presented with a screen showing eight targets arranged in a cir- ings of fingertip position throughout movement execution. This cle around a starting point, and they reach with their arm to a framework can be used to explore many aspects of motor skill target and back when it is illuminated on the screen. Subjects’ and learning: differences in baseline skill among healthy sub- movements are displayed on the screen, and they are rewarded jects, effects of repetition and training to modulate variability with 10 points if they turn their hand around within the target, over time, or the effect of baseline stroke severity on movement and 3 or 1 otherwise, depending on how far their hand is from
CONTACT Daniel Backenroth [email protected] Department of Biostatistics, Mailman School of Public Health, Columbia University, West th Street, th Floor, New York, NY . Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/r/JASA. Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JASA. © American Statistical Association 1004 D. BACKENROTH ET AL. thetargetatthepointofreturn.Subjectsarenotrewardedfor each target by each subject. The reduction in movement vari- movements outside prespecified velocity thresholds. ability after practice is clear. Our dataset consists of 9481 movements by 26 right-handed Prior to our analyses, we rotate curves so that all movements subjects. After becoming familiarized with the experimental extend to the target at 0◦. This rotation preserves shape and apparatus, each subject made 24 or 25 reaching movements to scale, but improves interpretation. After rotation, movement each of the 8 targets, in a semirandom order, with both the left along the X coordinate represents movement parallel to the line and right hand. Movements that did not reach at least 30% of the between origin and target, and movement along the Y coordi- distance to the target and movements with a direction more than nate represents movement perpendicular to this line. We build 90 deg away from the target direction at the point of peak veloc- models for X and Y coordinate curves separately in our primary ity were excluded from the dataset, because of the likelihood analysis. An alternative bivariate analysis appears in Appendix that they were made to the wrong target or not attempted due C. to distraction. Movements made at speeds outside the range of interest, with peak velocity less than 0.04 or greater than 2.0 m/s, 2. Model for Trajectory Variability were also excluded. These exclusion rules and other similar rules We adopt a functional data approach to model position curves have been used previously in similar kinematic experiments, and P (t).HereweomittheX andY superscripts for notational sim- are designed to increase the specificity of these experiments for ij plicity. Our starting point is the functional principal component probingmotorcontrolmechanisms(Huangetal.2012; Tanaka, analysis (FPCA) model of Yao, Müller, and Wang (2005)with Sejnowski, and Krakauer 2009;Kitagoetal.2015). A small num- subject-specific means. In this model, it is assumed that each ber of additional movements were removed from the dataset due curve P (t) can be modeled as to instrumentation and recording errors. The data we consider ij have not been previously reported. ( ) = μ ( ) + δ ( ) For each movement, the X and Y position of the hand move- Pij t ij t ij t ∞ ment is recorded as a function of time from movement onset to = μ ( ) + ξ φ ( ) + ( ). the initiation of return to the starting point, resulting in bivari- ij t ijk k t ij t (1) X ( ), Y ( ) k=1 ate functional observations denoted [Pij t Pij t ]forsubject i and movement j. In practice, observations are recorded not Here μij(t) is the mean function for curve Pij(t),thedevia- as functions but as discrete vectors. There is some variability in tion δij(t) is modeled as a linear combination of eigenfunctions φ ( ) ξ movement duration, which we remove for computational con- k t ,the ijk are uncorrelated random variables with mean 0 venience by linearly registering each movement onto a common and variances λk,where λk < ∞ and λ1 ≥ λ2 ≥··· , and = k grid of length D 50. The structure of the registered kinematic ij(t) iswhitenoise.Hereallthedeviationsδij(t) are assumed data is illustrated in Figure 1.Thetopandbottomrowsshow, tohavethesamedistribution,thatofasingleunderlyingrandom respectively, the first and last right-hand movement made to process δ(t).
Figure . Observed kinematic data. The top row shows the first right-hand movement to each target for each subject; the bottom row shows the last movement. The left X ( ) Y ( ) panel of each row shows observed reaching data in the X andY plane. Targets are indicated with circles. The middle and right panels of each row show the Pij t and Pij t curves, respectively. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 1005
Model (1)isbasedonatruncationoftheKarhunen-Loève In particular, the φk(t) areassumedtobeeigenfunctionsofa representation of the random process δ(t).TheKarhunen- conditional covariance operator. Our proposal can be related Loève representation, in turn, arises from the spectral to standard FPCA by considering covariate values random and decomposition of the covariance of the random process marginalizing across the distribution of random effects and δ(t) from Mercer’s Theorem, from which one can obtain covariates using the law of total covariance: eigenfunctions φk(t) and eigenvalues λk. λ δ ( ), δ ( ) = δ ( ), δ ( )| ∗, ∗, The assumption of constant score variances k in model (1) cov[ ij s ij t ] E cov[ ij s ij t x z g] ∗ ∗ ∗ ∗ is inconsistent with our motivating data because it implies that +cov E[δ (s)|x , z , g]E[δ (t)|x , z , g] ( ) ij ij thevariabilityofthepositioncurvesPij t is not covariate- or K subject-dependent. However, movement variability can depend = E λ | ∗ , ∗ , φ (s)φ (t). k xijk zijk gik k k on the subject’s baseline motor control and may change in k=1 response to training. Indeed, these changes in movement vari- ance are precisely our interest. We assume that the basis functions φk(t) do not depend on In contrast to the preceding, we therefore assume that each covariateorsubjecteffects,andarethereforeunchangedbythis δ ( ) random process ij t has a potentially unique distribution, with marginalization. Scores ξijk are marginally uncorrelated over k; a covariance operator that can be decomposed as this follows from the assumption that scores are uncorrelated in our conditional specification, and holds even if random effects ∞ gik are correlated over k. Finally, the order of marginal variances cov[δij(s), δij(t)] = λijkφk(s)φk(t), E[λk|x∗ ,z∗ ,g ] may not correspond to the order of conditional k=1 ijk ijk ik variances λ | ∗ , ∗ , for some or even all values of the covari- k xijk zijk gik so that the eigenvalues λijk,butnottheeigenfunctions,vary ates and random effects coefficients. δ ( ) among the curves. We assume that deviations ij t are uncor- In our approach, we assume that the scores ξijk have mean related across both i and j. zero. For this assumption to be valid, the mean μij(t) in model The model we pose for the Pij(t) is therefore (2) should be carefully modeled. To this end we use the well- studied multilevel function-on-scalar regression model (Guo K 2002;Dietal.2009; Morris and Carroll 2006;Scheipl,Staicu, ( ) = μ ( ) + ξ φ ( ) + ( ), Pij t ij t ijk k t ij t (2) and Greven 2015), k=1 L M where we have truncated the expansion in model (1)toK eigen- μij(t) = β0(t) + xijlβl (t) + zijmbim(t). (4) functions, and into which we incorporate covariate and subject- l=1 m=1 dependent heteroscedasticity with the score variance model β ( ) ∈ ,..., ∗ ∗ Here 0 t is the functional intercept; xijl for l 1 L are λ = λ ∗ ∗ = (ξ | , , ) ijk k|x ,z ,g Var ijk xijk zijk g scalar covariates associated with functional fixed effects with ijk ijk ik ik ∗ ∗ ( ) β ( ) L M respect to the curve Pij t ; l t is the functional fixed effect = γ + γ ∗ + ∗ associated with the lth such covariate; z for m ∈ 1,...,M exp 0k lkxijlk gimkzijmk (3) ijm l=1 m=1 are scalar covariates associated with functional random effects with respect to the curve Pij(t);andbim(t) for m ∈ 1,...,M are ξ where, as before, ijk is the kth score for the jth curve of the functional random effects associated with the ith subject. γ ith subject. In model (3), 0k is an intercept for the variance of Keeping the basis functions constant across all subjects and γ ∗ the scores, lk are fixed effects coefficients for covariates xijlk, movements, as in conventional FPCA, maintains the inter- ∗ l = 1,...,L ,andgimk are random effects coefficients for pretability of the basis functions as the major patterns of ∗ = ,..., ∗ covariates zijmk, m 1 M .Thevectorgik consists of the variation across curves. Moreover, the covariate and subject- concatenation of the coefficients gimk,andlikewiseforthevec- dependent score variances reflect the proportion of variation ∗ ∗ tors xijk and zijk. Throughout, the subscript k indicates that attributable to those patterns. To examine the appropriateness models are used to describe the variance of scores associated of this assumption for our data, we estimated basis functions φ ( ) ∗ with each basis function k t separately. The covariates xijlk for various subsets of movements using a traditional FPCA and z∗ in model (3) need not be the same across principal approach, after rotating observed data so that all movements ijmk ◦ components. This model allows exploration of the dependence extend to the target at 0 .AsillustratedinFigure 2,thebasis of movement variability on covariates, like progress through functions for movements made by both hands and at different a training regimen, as well as of idiosyncratic subject-specific stages of training are similar. effects on variance through the incorporation of random inter- cepts and slopes. 3. Prior Work Together, models (2)and(3) induce a subject- and covariate- δ ( ) dependent covariance structure for ij t : FPCA has a long history in functional data analysis. It is com- monly performed using a spectral decomposition of the sam- δ ( ), δ ( )| ∗ , ∗ ,φ , cov[ ij s ij t xijk zijk k gik] ple covariance matrix of the observed functional data (Ramsay K and Silverman 2005;Yao,Müller,andWang2005). Most relevant = λ | ∗ , ∗ , φ (s)φ (t). to our current work are probabilistic and Bayesian approaches k xijk zijk gik k k k=1 based on nonfunctional PCA methods (Tipping and Bishop 1006 D. BACKENROTH ET AL.
Figure . FPC basis functions estimated for various data subsets after rotating curves onto the positive X axis. The left panel shows the first and second FPC basis functions estimated for the X coordinate of movements to each target, for the left and right hand separately, and separately for movement numbers -, -, -, and -. The right panel shows the same for the Y coordinate.
1999;Bishop1999;PengandPaul2009). Rather than proceed- inference may suffer as a result (Ormerod and Wand 2012;Jor- ing in stages, first by estimating basis functions and then, given dan 2004;Jordanetal.1999; Titterington 2004). These tools have these, estimating scores, such approaches estimate all parame- previously been used in functional data analysis (van der Linde ters in model (1) jointly. James, Hastie, and Sugar (2000)focused 2008; Goldsmith, Wand, and Crainiceanu 2011;McLeanetal. on sparsely observed functional data and estimated parameters 2013); in particular, Goldsmith and Kitago (2016)usedvaria- using an EM algorithm; van der Linde (2008)tookavariational tional Bayes methods in the estimation of model (4). Bayes approach to estimation of a similar model. Goldsmith, Zipunnikov, and Schrack (2015) considered both exponential- 4. Methods family functional data and multilevel curves, and estimated parameters using Hamiltonian Monte Carlo. The main contribution of this article is the introduction of sub- Some previous work has allowed for heteroskedasticity in ject and covariate effects on score variances in model (3). Several FPCA. Chiou, Müller, and Wang (2003) developed a model estimation strategies can be used within this framework. Here which uses covariate-dependent scores to capture the covari- we describe three possible approaches. Later, these will be com- ate dependence of the mean of curves. In a manner that is paredinsimulations. constrained by the conditional mean structure of the curves, some covariate dependence of the variance of curves is also 4.1. Sequential Estimation induced; the development of models for score variance was, however, not pursued. Here, by contrast, our interest is to use Models (2)and(3)canbefitsequentiallyinthefollowing FPCA to model the effects of covariates on curve variance, inde- way. First, the mean μij(t) in model (2)isestimatedthrough pendently of the mean structure. We are not using FPCA to function-on-scalar regression under a working independence model the mean; rather, the mean is modeled by the function- assumption of the errors; we use the function pffr in the on-scalar regression model (4). Jiang and Wang (2010)intro- refund package (Goldsmith et al. 2016) in R. Next, the resid- duce heteroscedasticity by allowing both the basis functions and uals from the function-on-scalar regression are modeled using the scores in an FPCA decomposition to depend on covariates. standard FPCA approaches to obtain estimates of principal Briefly, rather than considering a bivariate covariance as the components and marginal score variances; given these quan- object to be decomposed, the authors pose a covariance surface tities, scores themselves can be estimated (Yao, Müller, and that depends smoothly on a covariate. Aside from the challenge Wang 2005). For this step we use the function fpca.sc,also of incorporating more than a few covariates or subject-specific in the refund package, which is among the available imple- effects, it is difficult to use this model to explore the effects of mentations. Next, we reestimate the mean μij(t) in model (2) covariates on heteroscedasticity: covariates affect both the basis with function-on-scalar regression using pffr,althoughnow, functions and the scores, making the interpretation of scores and instead of assuming independence, we decompose the residu- score variances at different covariate levels unclear. Although it als using the principal components and score variances esti- does not allow for covariate-dependent heteroscedasticity, the matedinthepreviousstep.Wethenreestimateprincipalcompo- model of Huang, Li, and Guan (2014)allowscurvestobelongto nents and scores using fpca.sc. The final step is to model the one of a few different clusters, each with its own FPCs and score score variances given these score estimates. Assuming that the variances. scores are normally distributed conditional on random effects In contrast to the existing literature, our model provides a and covariates, model (3) induces a generalized gamma linear ξ 2 general framework for understanding covariate and subject- mixed model for ijk, the square of the scores, with log link, coef- dependent heteroskedasticity in FPCA. This allows the estima- ficients γlk and gimk, and shape parameter equal to 1/2. We fit tion of rich models with multiple covariates and random effects, this model with the lme4 package, separately with respect to while maintaining the familiar interpretation of basis functions, the scores for each principal component, in order to obtain esti- scores, and score variances. mates of our parameters of interest in the score variance model Variational Bayes methods, which we use here to approxi- (Bates et al. 2015). mate Bayesian estimates of the parameters in models (2)and(3), The first two steps of this approach are consistent with the are computationally efficient and typically yield accurate point common strategy for FPCA, and we account for non-constant estimates for model parameters, although they provide only score variance through an additional modeling step. We antic- an approximation to the complete posterior distribution and ipatethatthissequentialapproachwillworkreasonablywell JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 1007 in many cases, but note that it arises as a sequence of models to the normal distribution, IG to the inverse-gamma distribu- that treat estimated quantities as fixed. First, one estimates the tion, and IW to the inverse-Wishart distribution. Models (3)and mean;thenonetreatsthemeanasfixedtoestimatetheprincipal (4)canbewrittenintheformofmodel(5)abovebyintroduc- ∗ components and the scores; finally, one treats the scores as fixed ing into those models covariates xij0k (in model (3), multiplying to estimate the score variance model. Overall performance may γ0k)andxij0 (in model (4), multiplying β0(t)), identically equal deteriorate by failing to incorporate uncertainty in estimates to1.Someofthemodelsusedhere,likeinourrealdataanalysis, β in each step, particularly in cases of sparsely observed curves do not have a global functional intercept 0 or global score vari- or high measurement error variances (Goldsmith, Greven, and ance intercepts γ0k; in these models there are no such covariates Crainiceanu 2013). Additionally, because scores are typically identically equal to 1. estimatedinamixedmodelframework,theuseofmarginal As discussed further in Section 4.2.3, for purposes of iden- scorevariancesintheFPCAstepcannegativelyimpactscore tifiability and to obtain FPCs that represent non-overlapping estimation and the subsequent modeling of conditional score directions of variation, when fitting this model we introduce the variances. additional constraint that the FPCs should be orthonormal and that each FPC should explain the largest possible amount of vari- ance in the data, conditionally on the previously estimated FPCs, 4.2. Bayesian Approach if any. ... Bayesian Model In keeping with standard practice, we set the prior vari- 2 ances σγ for the fixed-effect coefficients in the score variance Jointly estimating all parameters in models (2)and(3)ina lk Bayesian framework is an appealing alternative to the sequential model to a large constant, so that their prior is close to uni- ν estimation approach. We expect this to be less familiar to read- form.Weset , the degrees of freedom parameter for the inverse- Wishart prior for the covariance matrices ,tothedimen- ers than the sequential approach, and therefore provide a more gk detailed description. sion of gik. We use an empirical Bayes approach, discussed fur- Our Bayesian specification of these models is formulated in ther in Section 4.2.4,tospecify k,thescalematrixparameters matrix form to reflect the discrete nature of the observed data. of these inverse-Wishart priors. When the random effects gik In the following is a known D × Kθ matrix of Kθ spline basis are one-dimensional, this prior reduces to an inverse-Gamma functionsevaluatedonthesharedgridoflengthD on which the prior. Sensitivity to prior specifications of this model should be curves are observed. We assume a normal distribution of the explored, and we do so with respect to our real data analysis in scores ξ conditional on random effects and covariates: Appendix D. ijk 2 L 2 K Variance components {σβ } = and {σφ } = act as tuning l l 0 k k 1 L M K parameters controlling the smoothness of coefficient functions = β + + ξ φ + β (t) and FPC functions φ (t), and our prior specification for pij xijl l zijm bim ijk k ij l k l=0 m=1 k=1 them is related to standard techniques in semiparametric regres- σ 2 β ∼ ,σ2 −1 ; σ 2 ∼ α,β sion. b ,meanwhile,isatuningparameterthatcontrolsthe l MVN 0 β Q β IG [ ] l l amount of penalization of the random effects, and is shared ∼ ,σ2(( − π) + π )−1 ; σ 2 ∼ α,β across the b (t), so that all random effects for all subjects bi MVN 0 b 1 Q I b IG [ ] im share a common distribution. Whereas fixed effects and func- 2 −1 2 φ ∼ MVN 0,σφ Q ; σφ ∼ IG [α,β] k k k tional principal components are penalized only through their L∗ M∗ squared second derivative, the magnitude of the random effects ξ ∼ , γ ∗ + ∗ is also penalized through the full-rank penalty matrix I to ensure ijk N 0 exp lkxijlk gimkzijmk = = identifiability (Scheipl, Staicu, and Greven 2015;Djeundjeand l 0 m 1 2 Currie 2010). The parameter π,0<π<1, determines the bal- γlk ∼ N 0,σγ lk ance of smoothness and shrinkage penalties in the estimation of g ∼ MVN 0, g ; g ∼ IW [ ,ν] ik k k k the random effects bim(t).Wediscusshowtosetthevalueofthis 2 2 ij ∼ MVN 0,σ I ; σ ∼ IG [α,β] . (5) parameter in Section 4.2.4.Wesetα and β, the parameters of the inverse-gamma prior distributions for the variance components, In model (5), i = 1,...,I refers to subjects, j = 1,...,Ji refers to 1. to movements within subjects, and k = 1,...,K refers to prin- Our framework can accommodate more complicated ran- cipal components. We define the total number of functional dom effect structures. In our application in Section 6, for exam- = I observations n i=1 Ji.Thecolumnvectorspij and ij are ple, each subject has 8 random effect vectors gilk, one for each the D × 1 observed functional outcome and independent error target, indexed by l = 1,...,8; the index l is used here since in term, respectively, on the finite grid shared across subjects Section 6 l is used to index targets. We model the correlations β = for the jth curve of the ith subject. The vectors l,forl between these random effect vectors through a nested random 0,...,L, are functional effect spline coefficient vectors, bim,for effect structure: i = 1,...,I and m = 1,...,M, are random effect spline coeffi- φ , = ..., ∼ , ; ∼ , . cient vectors, and k for k 1 K,areprincipalcomponent gilk MVN gik gik gik MVN 0 gk (6) spline coefficient vectors, all of length Kθ . Q is a penalty matrix T T of the form M M ,whereM is a matrix that penalizes the Here the random effect vectors gilk for subject i and FPC k, second derivative of the estimated functions. I is the identity l = 1,...8,arecenteredaroundasubject-specificrandomeffect matrix. MVN refers to the multivariate normal distribution, N vector gik. We estimate two separate random effect covariance 1008 D. BACKENROTH ET AL.