<<

Journal of the American Statistical Association

ISSN: 0162-1459 (Print) 1537-274X (Online) Journal homepage: https://www.tandfonline.com/loi/uasa20

Modeling Motor Learning Using Heteroscedastic Functional Principal Components Analysis

Daniel Backenroth, Jeff Goldsmith, Michelle D. Harran, Juan C. Cortes, John W. Krakauer & Tomoko Kitago

To cite this article: Daniel Backenroth, Jeff Goldsmith, Michelle D. Harran, Juan C. Cortes, John W. Krakauer & Tomoko Kitago (2018) Modeling Motor Learning Using Heteroscedastic Functional Principal Components Analysis, Journal of the American Statistical Association, 113:523, 1003-1015, DOI: 10.1080/01621459.2017.1379403 To link to this article: https://doi.org/10.1080/01621459.2017.1379403

View supplementary material

Accepted author version posted online: 29 Sep 2017. Published online: 29 Sep 2017.

Submit your article to this journal

Article views: 513

View Crossmark data

Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=uasa20 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION , VOL. , NO. , –: Applications and Case Studies https://doi.org/./..

Modeling Motor Learning Using Heteroscedastic Functional Principal Components Analysis

Daniel Backenrotha, Jeff Goldsmitha, Michelle D. Harranb,c,JuanC.Cortesb,d, John W. Krakauerb,c,d, and Tomoko Kitagob,e aDepartment of , Mailman School of Public Health, Columbia University, New York, NY; bDepartment of Neurology, Columbia University Medical Center, New York, NY; cDepartment of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD; dDepartment of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD; eBurke Neurological Institute, White Plains, NY

ABSTRACT ARTICLE HISTORY We propose a novel method for estimating population-level and subject-specific effects of covariates on the Received February  variability of functional data. We extend the functional principal components analysis framework by mod- Revised August  eling the of principal component scores as a function of covariates and subject-specific random KEYWORDS effects. In a setting where principal components are largely invariant across subjects and covariate values, Functional data; Kinematic modeling the variance of these scores provides a flexible and interpretable way to explore factors that affect data; Motor control; the variability of functional data. Our work is motivated by a novel dataset from an assessing Probabilistic PCA; Variance upper extremity motor control, and quantifies the reduction in movement variability associated with skill modeling; Variational Bayes learning. The proposed methods can be applied broadly to understand movement variability, in settings that include motor learning, impairment due to injury or disease, and recovery. Supplementary materials for this article are available online.

1. Scientific Motivation variance and recovery (Krakauer 2006). By taking full advantage of high-frequency laboratory recordings, we shift focus from 1.1. Motor Learning particular time points to complete curves. Our approach allows us to model the variability of these curves as they depend on Recent work in motor learning has suggested that change in covariates,likethehandusedortherepetitionnumber,aswell movement variability is an important component of improve- as the estimation of random effects reflecting differences in base- ment in motor skill. It has been suggested that when a motor line variability and learning rates among subjects. task is learned, variance is reduced along dimensions relevant Section 1.2 describes our motivating data in more detail, and to the successful accomplishment of the task, although it may Section 2 introduces our modeling framework. A review of rele- increase in other dimensions (Scholz and Schöner 1999;Yarrow, vantstatisticalworkappearsinSection 3.Detailsofourestima- Brown, and Krakauer 2009). Experimental work, moreover, tion approach are in Section 4. Simulations and the application has shown that learning-induced improvement of movement to our motivating data appear in Sections 5 and 6,respectively, execution, measured through the trade-off between speed and andweclosewithadiscussioninSection 7. accuracy, is accompanied by significant reductions in movement variability. In fact, these reductions in movement variability may be a more important feature of learning than changes in the 1.2. Dataset average movement (Shmuelof, Krakauer, and Mazzoni 2012). These results have typically been based on assessments of vari- Our motivating data were gathered as part of a study of motor ability at a few time points, for example, at the end of the move- learning among healthy subjects. Kinematic data were acquired ment, although high-frequency laboratory recordings of com- in a standard task used to measure control of reaching move- plete movements are often available. ments. In this task, subjects rest their forearm on an air-sled In this article we develop a modeling framework that can be system to reduce effects of friction and gravity. The subjects are used to quantify movement variability based on dense record- presented with a screen showing eight targets arranged in a cir- ings of fingertip position throughout movement execution. This cle around a starting point, and they reach with their arm to a framework can be used to explore many aspects of motor skill target and back when it is illuminated on the screen. Subjects’ and learning: differences in baseline skill among healthy sub- movements are displayed on the screen, and they are rewarded jects, effects of repetition and training to modulate variability with 10 points if they turn their hand around within the target, over time, or the effect of baseline stroke severity on movement and 3 or 1 otherwise, depending on how far their hand is from

CONTACT Daniel Backenroth [email protected] Department of Biostatistics, Mailman School of Public Health, Columbia University,  West th Street, th Floor, New York, NY . Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/r/JASA. Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JASA. ©  American Statistical Association 1004 D. BACKENROTH ET AL. thetargetatthepointofreturn.Subjectsarenotrewardedfor each target by each subject. The reduction in movement vari- movements outside prespecified velocity thresholds. ability after practice is clear. Our dataset consists of 9481 movements by 26 right-handed Prior to our analyses, we rotate curves so that all movements subjects. After becoming familiarized with the experimental extend to the target at 0◦. This rotation preserves shape and apparatus, each subject made 24 or 25 reaching movements to scale, but improves interpretation. After rotation, movement each of the 8 targets, in a semirandom order, with both the left along the X coordinate represents movement parallel to the line and right hand. Movements that did not reach at least 30% of the between origin and target, and movement along the Y coordi- distance to the target and movements with a direction more than nate represents movement perpendicular to this line. We build 90 deg away from the target direction at the point of peak veloc- models for X and Y coordinate curves separately in our primary ity were excluded from the dataset, because of the likelihood analysis. An alternative bivariate analysis appears in Appendix that they were made to the wrong target or not attempted due C. to distraction. Movements made at speeds outside the of interest, with peak velocity less than 0.04 or greater than 2.0 m/s, 2. Model for Trajectory Variability were also excluded. These exclusion rules and other similar rules We adopt a functional data approach to model position curves have been used previously in similar kinematic , and P (t).HereweomittheX andY superscripts for notational sim- are designed to increase the specificity of these experiments for ij plicity. Our starting point is the functional principal component probingmotorcontrolmechanisms(Huangetal.2012; Tanaka, analysis (FPCA) model of Yao, Müller, and Wang (2005)with Sejnowski, and Krakauer 2009;Kitagoetal.2015). A small num- subject-specific . In this model, it is assumed that each ber of additional movements were removed from the dataset due curve P (t) can be modeled as to instrumentation and recording errors. The data we consider ij have not been previously reported. ( ) = μ ( ) + δ ( ) For each movement, the X and Y position of the hand move- Pij t ij t ij t ∞ ment is recorded as a function of time from movement onset to  = μ ( ) + ξ φ ( ) +  ( ). the initiation of return to the starting point, resulting in bivari- ij t ijk k t ij t (1) X ( ), Y ( ) k=1 ate functional observations denoted [Pij t Pij t ]forsubject i and movement j. In practice, observations are recorded not Here μij(t) is the function for curve Pij(t),thedevia- as functions but as discrete vectors. There is some variability in tion δij(t) is modeled as a linear combination of eigenfunctions φ ( ) ξ movement duration, which we remove for computational con- k t ,the ijk are uncorrelated random variables with mean 0 venience by linearly registering each movement onto a common and λk,where λk < ∞ and λ1 ≥ λ2 ≥··· , and = k grid of length D 50. The structure of the registered kinematic ij(t) iswhitenoise.Hereallthedeviationsδij(t) are assumed data is illustrated in Figure 1.Thetopandbottomrowsshow, tohavethesamedistribution,thatofasingleunderlyingrandom respectively, the first and last right-hand movement made to process δ(t).

Figure . Observed kinematic data. The top row shows the first right-hand movement to each target for each subject; the bottom row shows the last movement. The left X ( ) Y ( ) panel of each row shows observed reaching data in the X andY plane. Targets are indicated with circles. The middle and right panels of each row show the Pij t and Pij t curves, respectively. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 1005

Model (1)isbasedonatruncationoftheKarhunen-Loève In particular, the φk(t) areassumedtobeeigenfunctionsofa representation of the random process δ(t).TheKarhunen- conditional covariance operator. Our proposal can be related Loève representation, in turn, arises from the spectral to standard FPCA by considering covariate values random and decomposition of the covariance of the random process marginalizing across the distribution of random effects and δ(t) from Mercer’s Theorem, from which one can obtain covariates using the law of total covariance: eigenfunctions φk(t) and eigenvalues λk.   λ δ ( ), δ ( ) = δ ( ), δ ( )| ∗, ∗, The assumption of constant score variances k in model (1) cov[ ij s ij t ] E cov[ ij s ij t x z g]  ∗ ∗ ∗ ∗ is inconsistent with our motivating data because it implies that +cov E[δ (s)|x , z , g]E[δ (t)|x , z , g] ( ) ij ij thevariabilityofthepositioncurvesPij t is not covariate- or K  subject-dependent. However, movement variability can depend = E λ | ∗ , ∗ , φ (s)φ (t). k xijk zijk gik k k on the subject’s baseline motor control and may change in k=1 response to training. Indeed, these changes in movement vari- ance are precisely our interest. We assume that the basis functions φk(t) do not depend on In contrast to the preceding, we therefore assume that each covariateorsubjecteffects,andarethereforeunchangedbythis δ ( ) random process ij t has a potentially unique distribution, with marginalization. Scores ξijk are marginally uncorrelated over k; a covariance operator that can be decomposed as this follows from the assumption that scores are uncorrelated in our conditional specification, and holds even if random effects ∞ gik are correlated over k. Finally, the order of marginal variances cov[δij(s), δij(t)] = λijkφk(s)φk(t), E[λk|x∗ ,z∗ ,g ] may not correspond to the order of conditional k=1 ijk ijk ik variances λ | ∗ , ∗ , for some or even all values of the covari- k xijk zijk gik so that the eigenvalues λijk,butnottheeigenfunctions,vary ates and random effects coefficients. δ ( ) among the curves. We assume that deviations ij t are uncor- In our approach, we assume that the scores ξijk have mean related across both i and j. zero. For this assumption to be valid, the mean μij(t) in model The model we pose for the Pij(t) is therefore (2) should be carefully modeled. To this end we use the well- studied multilevel function-on-scalar regression model (Guo K 2002;Dietal.2009; Morris and Carroll 2006;Scheipl,Staicu, ( ) = μ ( ) + ξ φ ( ) +  ( ), Pij t ij t ijk k t ij t (2) and Greven 2015), k=1 L M where we have truncated the expansion in model (1)toK eigen- μij(t) = β0(t) + xijlβl (t) + zijmbim(t). (4) functions, and into which we incorporate covariate and subject- l=1 m=1 dependent with the score variance model β ( ) ∈ ,..., ∗ ∗ Here 0 t is the functional intercept; xijl for l 1 L are λ = λ ∗ ∗ = (ξ | , , ) ijk k|x ,z ,g Var ijk xijk zijk g scalar covariates associated with functional fixed effects with ijk ijk ik ik  ∗ ∗ ( ) β ( ) L M respect to the curve Pij t ; l t is the functional fixed effect = γ + γ ∗ + ∗ associated with the lth such covariate; z for m ∈ 1,...,M exp 0k lkxijlk gimkzijmk (3) ijm l=1 m=1 are scalar covariates associated with functional random effects with respect to the curve Pij(t);andbim(t) for m ∈ 1,...,M are ξ where, as before, ijk is the kth score for the jth curve of the functional random effects associated with the ith subject. γ ith subject. In model (3), 0k is an intercept for the variance of Keeping the basis functions constant across all subjects and γ ∗ the scores, lk are fixed effects coefficients for covariates xijlk, movements, as in conventional FPCA, maintains the inter- ∗ l = 1,...,L ,andgimk are random effects coefficients for pretability of the basis functions as the major patterns of ∗ = ,..., ∗ covariates zijmk, m 1 M .Thevectorgik consists of the variation across curves. Moreover, the covariate and subject- concatenation of the coefficients gimk,andlikewiseforthevec- dependent score variances reflect the proportion of variation ∗ ∗ tors xijk and zijk. Throughout, the subscript k indicates that attributable to those patterns. To examine the appropriateness models are used to describe the variance of scores associated of this assumption for our data, we estimated basis functions φ ( ) ∗ with each basis function k t separately. The covariates xijlk for various subsets of movements using a traditional FPCA and z∗ in model (3) need not be the same across principal approach, after rotating observed data so that all movements ijmk ◦ components. This model allows exploration of the dependence extend to the target at 0 .AsillustratedinFigure 2,thebasis of movement variability on covariates, like progress through functions for movements made by both hands and at different a training regimen, as well as of idiosyncratic subject-specific stages of training are similar. effects on variance through the incorporation of random inter- cepts and slopes. 3. Prior Work Together, models (2)and(3) induce a subject- and covariate- δ ( ) dependent covariance structure for ij t : FPCA has a long history in functional data analysis. It is com- monly performed using a spectral decomposition of the sam- δ ( ), δ ( )| ∗ , ∗ ,φ , cov[ ij s ij t xijk zijk k gik] ple covariance matrix of the observed functional data (Ramsay K and Silverman 2005;Yao,Müller,andWang2005). Most relevant = λ | ∗ , ∗ , φ (s)φ (t). to our current work are probabilistic and Bayesian approaches k xijk zijk gik k k k=1 based on nonfunctional PCA methods (Tipping and Bishop 1006 D. BACKENROTH ET AL.

Figure . FPC basis functions estimated for various data subsets after rotating curves onto the positive X axis. The left panel shows the first and second FPC basis functions estimated for the X coordinate of movements to each target, for the left and right hand separately, and separately for movement numbers -, -, -, and -. The right panel shows the same for the Y coordinate.

1999;Bishop1999;PengandPaul2009). Rather than proceed- inference may suffer as a result (Ormerod and Wand 2012;Jor- ing in stages, first by estimating basis functions and then, given dan 2004;Jordanetal.1999; Titterington 2004). These tools have these, estimating scores, such approaches estimate all parame- previously been used in functional data analysis (van der Linde ters in model (1) jointly. James, Hastie, and Sugar (2000)focused 2008; Goldsmith, Wand, and Crainiceanu 2011;McLeanetal. on sparsely observed functional data and estimated parameters 2013); in particular, Goldsmith and Kitago (2016)usedvaria- using an EM algorithm; van der Linde (2008)tookavariational tional Bayes methods in the estimation of model (4). Bayes approach to estimation of a similar model. Goldsmith, Zipunnikov, and Schrack (2015) considered both exponential- 4. Methods family functional data and multilevel curves, and estimated parameters using Hamiltonian Monte Carlo. The main contribution of this article is the introduction of sub- Some previous work has allowed for heteroskedasticity in ject and covariate effects on score variances in model (3). Several FPCA. Chiou, Müller, and Wang (2003) developed a model estimation strategies can be used within this framework. Here which uses covariate-dependent scores to capture the covari- we describe three possible approaches. Later, these will be com- ate dependence of the mean of curves. In a manner that is paredinsimulations. constrained by the conditional mean structure of the curves, some covariate dependence of the variance of curves is also 4.1. Sequential Estimation induced; the development of models for score variance was, however, not pursued. Here, by contrast, our interest is to use Models (2)and(3)canbefitsequentiallyinthefollowing FPCA to model the effects of covariates on curve variance, inde- way. First, the mean μij(t) in model (2)isestimatedthrough pendently of the mean structure. We are not using FPCA to function-on-scalar regression under a working independence model the mean; rather, the mean is modeled by the function- assumption of the errors; we use the function pffr in the on-scalar regression model (4). Jiang and Wang (2010)intro- refund package (Goldsmith et al. 2016) in R. Next, the resid- duce heteroscedasticity by allowing both the basis functions and uals from the function-on-scalar regression are modeled using the scores in an FPCA decomposition to depend on covariates. standard FPCA approaches to obtain estimates of principal Briefly, rather than considering a bivariate covariance as the components and marginal score variances; given these quan- object to be decomposed, the authors pose a covariance surface tities, scores themselves can be estimated (Yao, Müller, and that depends smoothly on a covariate. Aside from the challenge Wang 2005). For this step we use the function fpca.sc,also of incorporating more than a few covariates or subject-specific in the refund package, which is among the available imple- effects, it is difficult to use this model to explore the effects of mentations. Next, we reestimate the mean μij(t) in model (2) covariates on heteroscedasticity: covariates affect both the basis with function-on-scalar regression using pffr,althoughnow, functions and the scores, making the interpretation of scores and instead of assuming independence, we decompose the residu- score variances at different covariate levels unclear. Although it als using the principal components and score variances esti- does not allow for covariate-dependent heteroscedasticity, the matedinthepreviousstep.Wethenreestimateprincipalcompo- model of Huang, Li, and Guan (2014)allowscurvestobelongto nents and scores using fpca.sc. The final step is to model the one of a few different clusters, each with its own FPCs and score score variances given these score estimates. Assuming that the variances. scores are normally distributed conditional on random effects In contrast to the existing literature, our model provides a and covariates, model (3) induces a generalized gamma linear ξ 2 general framework for understanding covariate and subject- for ijk, the square of the scores, with log link, coef- dependent heteroskedasticity in FPCA. This allows the estima- ficients γlk and gimk, and equal to 1/2. We fit tion of rich models with multiple covariates and random effects, this model with the lme4 package, separately with respect to while maintaining the familiar interpretation of basis functions, the scores for each principal component, in order to obtain esti- scores, and score variances. mates of our parameters of interest in the score variance model Variational Bayes methods, which we use here to approxi- (Bates et al. 2015). mate Bayesian estimates of the parameters in models (2)and(3), The first two steps of this approach are consistent with the are computationally efficient and typically yield accurate point common strategy for FPCA, and we account for non-constant estimates for model parameters, although they provide only score variance through an additional modeling step. We antic- an approximation to the complete posterior distribution and ipatethatthissequentialapproachwillworkreasonablywell JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 1007 in many cases, but note that it arises as a sequence of models to the normal distribution, IG to the inverse-gamma distribu- that treat estimated quantities as fixed. First, one estimates the tion, and IW to the inverse-Wishart distribution. Models (3)and mean;thenonetreatsthemeanasfixedtoestimatetheprincipal (4)canbewrittenintheformofmodel(5)abovebyintroduc- ∗ components and the scores; finally, one treats the scores as fixed ing into those models covariates xij0k (in model (3), multiplying to estimate the score variance model. Overall performance may γ0k)andxij0 (in model (4), multiplying β0(t)), identically equal deteriorate by failing to incorporate uncertainty in estimates to1.Someofthemodelsusedhere,likeinourrealdataanalysis, β in each step, particularly in cases of sparsely observed curves do not have a global functional intercept 0 or global score vari- or high measurement error variances (Goldsmith, Greven, and ance intercepts γ0k; in these models there are no such covariates Crainiceanu 2013). Additionally, because scores are typically identically equal to 1. estimatedinamixedmodelframework,theuseofmarginal As discussed further in Section 4.2.3, for purposes of iden- scorevariancesintheFPCAstepcannegativelyimpactscore tifiability and to obtain FPCs that represent non-overlapping estimation and the subsequent modeling of conditional score directions of variation, when fitting this model we introduce the variances. additional constraint that the FPCs should be orthonormal and that each FPC should explain the largest possible amount of vari- ance in the data, conditionally on the previously estimated FPCs, 4.2. Bayesian Approach if any. ... Bayesian Model In keeping with standard practice, we set the prior vari- 2 ances σγ for the fixed-effect coefficients in the score variance Jointly estimating all parameters in models (2)and(3)ina lk Bayesian framework is an appealing alternative to the sequential model to a large constant, so that their prior is close to uni- ν estimation approach. We expect this to be less familiar to read- form.Weset , the degrees of freedom parameter for the inverse- Wishart prior for the covariance matrices  ,tothedimen- ers than the sequential approach, and therefore provide a more gk detailed description. sion of gik. We use an empirical Bayes approach, discussed fur- Our Bayesian specification of these models is formulated in ther in Section 4.2.4,tospecify k,thescalematrixparameters matrix form to reflect the discrete nature of the observed data. of these inverse-Wishart priors. When the random effects gik In the following is a known D × Kθ matrix of Kθ spline basis are one-dimensional, this prior reduces to an inverse-Gamma functionsevaluatedonthesharedgridoflengthD on which the prior. Sensitivity to prior specifications of this model should be curves are observed. We assume a normal distribution of the explored, and we do so with respect to our real data analysis in scores ξ conditional on random effects and covariates: Appendix D. ijk 2 L 2 K Variance components {σβ } = and {σφ } = act as tuning l l 0 k k 1 L M K parameters controlling the smoothness of coefficient functions = β + + ξ φ + β (t) and FPC functions φ (t), and our prior specification for pij xijl l zijm bim ijk k ij l k l=0 m=1 k=1 them is related to standard techniques in semiparametric regres-  σ 2 β ∼ ,σ2 −1 ; σ 2 ∼ α,β sion. b ,meanwhile,isatuningparameterthatcontrolsthe l MVN 0 β Q β IG [ ] l l amount of penalization of the random effects, and is shared ∼ ,σ2(( − π) + π )−1 ; σ 2 ∼ α,β across the b (t), so that all random effects for all subjects bi MVN 0 b 1 Q I b IG [ ] im  share a common distribution. Whereas fixed effects and func- 2 −1 2 φ ∼ MVN 0,σφ Q ; σφ ∼ IG [α,β] k k k tional principal components are penalized only through their   L∗ M∗ squared second derivative, the magnitude of the random effects ξ ∼ , γ ∗ + ∗ is also penalized through the full-rank penalty matrix I to ensure ijk N 0 exp lkxijlk gimkzijmk = = identifiability (Scheipl, Staicu, and Greven 2015;Djeundjeand l 0 m 1 2 Currie 2010). The parameter π,0<π<1, determines the bal- γlk ∼ N 0,σγ lk ance of smoothness and shrinkage penalties in the estimation of g ∼ MVN 0, g ; g ∼ IW [ ,ν] ik k k k the random effects bim(t).Wediscusshowtosetthevalueofthis 2 2 ij ∼ MVN 0,σ I ; σ ∼ IG [α,β] . (5) parameter in Section 4.2.4.Wesetα and β, the parameters of the inverse-gamma prior distributions for the variance components, In model (5), i = 1,...,I refers to subjects, j = 1,...,Ji refers to 1. to movements within subjects, and k = 1,...,K refers to prin- Our framework can accommodate more complicated ran- cipal components. We define the total number of functional dom effect structures. In our application in Section 6, for exam- = I observations n i=1 Ji.Thecolumnvectorspij and ij are ple, each subject has 8 random effect vectors gilk, one for each the D × 1 observed functional outcome and independent error target, indexed by l = 1,...,8; the index l is used here since in term, respectively, on the finite grid shared across subjects Section 6 l is used to index targets. We model the correlations β = for the jth curve of the ith subject. The vectors l,forl between these random effect vectors through a nested random 0,...,L, are functional effect spline coefficient vectors, bim,for effect structure: i = 1,...,I and m = 1,...,M, are random effect spline coeffi- φ , = ..., ∼ , ; ∼ , . cient vectors, and k for k 1 K,areprincipalcomponent gilk MVN gik gik gik MVN 0 gk (6) spline coefficient vectors, all of length Kθ . Q is a penalty matrix T T of the form M M ,whereM is a matrix that penalizes the Here the random effect vectors gilk for subject i and FPC k, second derivative of the estimated functions. I is the identity l = 1,...8,arecenteredaroundasubject-specificrandomeffect matrix. MVN refers to the multivariate normal distribution, N vector gik. We estimate two separate random effect covariance 1008 D. BACKENROTH ET AL.

π matrices, gik and gk ,foreachFPCk,oneatthesubject- to minimize the Bayesian information criterion (Schwarz target level and one at the subject level. These matrices are given 1978), following the approach of Djeundje and Currie (2010). inverse-Wishart priors, and are discussed further in Section To set the hyperparameter k in model (5)(orthehyperpa- 4.2.4. rameters in the inverse-Wishart priors for the variance param- eters in model (6)), we use an empirical Bayes method. First, we estimate scores ξ using our variational Bayes method, with ... Estimation Strategies ijk a constant score variance for each FPC. We then estimate the -based approaches to of model (5) random effects g (or g ) using a generalized gamma linear arechallengingduetotheconstraintsweimposeontheφ (t) for ik ilk k mixed model, as described in Section 4.1.Finally,wecompute purposes of interpretability of the score variance models, which the empirical covariance matrix corresponding to g (or g are our primary interest. We present two methods for Bayesian k ik and g ), and set the hyperparameter so that the of the estimation and inference for model (5): first, an iterative varia- k prior distribution matches this empirical covariance matrix. tional Bayes method, and second, a Hamiltonian Monte Carlo (HMC) sampler, implemented with the STAN Bayesian pro- gramming language (Stan Development Team 2013). Our itera- 5. Simulations tive variational Bayes method, which estimates each parameter We demonstrate the performance of our method using simu- in turn conditional on currently estimated values of the other lated data. Here we present a simulation that includes functional parameters, is described in detail in Appendix E. This appendix random effects as well as scalar score variance random effects. also includes a brief overview of variational Bayes methods. Our Appendix F includes additional simulations in a cross-sectional HMC sampler, also described in Appendix E, conditions on esti- context which demonstrate the effect of varying the number of mates of the FPCs and fixed and random functional effects from estimated FPCs, the number of spline basis functions, and the the variational Bayes method, and estimates the other quantities measurement error. in model (5). In our simulation design, the jth curve for the ith subject is generated from the model ... Orthonormalization 4 A well-known challenge for Bayesian and probabilistic P (t) = 0 + b (t) + ξ φ (t) +  (t). (7) φ ( ) ij i ijk k ij approaches to FPCA is that the basis functions k t are k=1 not constrained to be orthogonal. In addition, when the scores = ξ do not have unit variance, the basis functions will also be We observe the curves at D 50 equally spaced points on the ijk , π φ φ indeterminate up to magnitude, since any increase in their domain [0 2 ]. FPCs 1 and 2 correspond to the functions ( ) ( ) φ φ norm can be accommodated by decreased variance of the sin x and cos x and FPCs 3 and 4 correspond to the func- ( ) ( ) scores. Where interest lies in the variance of scores with respect tions sin 2x and cos 2x .Wedividethecurvesequallyintotwo = , ∗ groups m 1 2. We define xij1 to be equal to 1 if the ith sub- to particular basis functions, it is important for the basis func- ∗ tions to be well-identified and orthogonal, so that they represent ject is assigned to group 1, and 0 otherwise, and we define xij2 distinct and non-overlapping modes of variation. We therefore to be equal to 1 if the ith subject is assigned to group 2, and 0 ξ constrain estimated FPCs to be orthonormal and require each otherwise. We generate scores ijk from zero-mean normal dis- tributions with variances equal to FPC to explain the largest possible amount of variance in the   data, conditionally on the previously estimated FPCs, if any. 2 × (ξ | ∗ , ) = γ ∗ + Let be the n K matrix of principal component scores and var ijk xij gik exp lkxijl gik (8) φ the K by Kθ matrix of principal component spline coefficient l=1 vectors. In each step of our iterative variational Bayes algorithm, We set γ1k for k = 1,...,4tothenaturallogarithmsof36, 12, 6 we apply the singular value decomposition to the matrix prod- γ = ,..., φT T and 4, respectively, and 2k for k 1 4tothenaturallog- uct ; the orthonormalized principal component basis arithms of 18, 24, 12 and 6, respectively. The order of γ and vectors which satisfy these constraints are then the right sin- 1k γ2k for FPCs (represented by k)1and2blackispurposely gular vectors of this decomposition. A similar approach was reversed between groups 1 and 2 so that the dominant mode used to induce orthogonality of the principal components in the ofvariationisnotthesameinthetwogroups.Wegenerate Monte Carlo Expectation Maximization algorithm of (Huang, the random effects gik in the score variance model from a nor- Li, and Guan 2014) and as a post-processing step in (Goldsmith, mal distribution with mean zero and variance σ 2 ,settingσ 2 to gk gk Zipunnikov, and Schrack 2015). Although explicit orthonormal- 3.0, 1.0, 0.3, and 0.1 across FPCs. We simulate functional ran- ity constraints may be possible in this setting (Šmídl and Quinn dom effects bi(t) for each subject by generating 10 elements 2007), our simple approach, while not exact, provides for accu- of a random effect spline coefficient vector from the distribu- rate estimation. Our HMC sampler conditions on the varia- ,σ2(( − π) + π )−1 tion MVN[0 b 1 Q I ], and then multiplying this tional Bayes estimates of the FPCs, and therefore also satisfies vector by a B-spline basis function evaluation matrix. We set the desired constraints. π = σ 2 = / b 1 2000, resulting in smooth random effects approx- imatelyone-thirdthemagnitudeoftheFPCdeviations.The  ( ) ... Hyperparameter Selection ij t are independent errors generated at all t from a normal The parameter π in model (5) controls the balance of smooth- distribution with mean zero and variance σ 2 = 0.25. ness and shrinkage penalization in the estimation of the We fix the sample size I at24,andsetthenumberofcurves random effects bim. In our variational Bayes approach we choose per subject Ji to 4, 12, 24, and 48. Two hundred replicate datasets JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 1009

= = Figure . Selected results for the VB method for one simulation replicate with I Ji 24. This simulation replicate was selected because the estimation quality of the group-level score variances, shown in the bottom row, is close to with respect to all simulations. Panels in the top row show simulated curves for two subjects in light black, the simulated functional random effect for that subject as a dashed line, and the estimated functional random effect for that subject as a dark solid line. The subjects were selected to show one subject with a poorly estimated functional random effect (left) and one with a well estimated functional random effect (right). Panels in the bottom row show, for each FPC, estimates and simulated values of the group-level and subject-specific score variances. Large colored dots are the group-level score variances, and small colored dots are the estimated score variances for each subject, that is, they combine the fixed effect and the random effect. ( ) were generated for each of the four scenarios. The simulation Here pij is the vectorized observation of Pij t from model (7). scenario with I = Ji = 24 is closest to the sample size in our real We use 10 spline basis functions for estimation, so that is data application, where for each of 8 targets we have I = 26 and a50× 10 B-spline basis function evaluation matrix. For the Ji ≈ 24. Bayesian approaches, we use the priors specified in model (5), We fit the following model to each simulated dataset using including N[0, 100] priors for variance parameters σ 2 .Weuse γlk each of the three approaches described in Section 4: the empirical Bayes approach discussed in Section 4.2.4 to set the scale parameters for the inverse-gamma priors for the vari- 4 2 ances σ of the random effects gik. = β + + ξ φ + gk pij 0 bi ijk k ij Figures 3, 4,and5 illustrate the quality of variational Bayes =  k 1  (VB) estimation of functional random effects, FPCs, and fixed 2 ∗ and random effect score variance parameters. The top row ξ ∼ N 0, exp γ x + g . ijk lk ijl ik of Figure 3 shows the collection of simulated curves for two l=1

Figure . Estimation of FPCs using the VB method. Panels in the top row show a true FPC in dark black, and the VB estimates of that FPC for all simulation replicates with = φ ( ) φ ( ) Ji 24 in light black. Panels in the bottom row show, for each FPC and Ji, boxplots of integrated square errors (ISEs) for VB estimates k t of each FPC k t ,defined = 2π φ ( ) − φ ( ) 2 = as ISE 0 [ k t k t ] dt. The estimates in the top row therefore correspond to the ISEs for Ji 24 shown in the bottom row. Figure A. in Appendix F shows examples of estimates of FPCs with a range of different ISEs. 1010 D. BACKENROTH ET AL.

Figure . Estimation of score variance fixed and random effects using VB. Panels in the top row show, for each FPC, group, and Ji, boxplots of signed relative errors (SREs) γ−γ γ γ = lk lk forVBestimates lk of the fixed effect score variance parameters lk,definedasSRE γ .Panelsinthebottomrowshow,foreachFPCandJi, the correlation between lk random effect score variance parameters gik and their VB estimates. Intercepts and slopes for linear regressions of estimated on simulated random effect score variances are centered around  and , respectively (not shown). subjectsandincludesthetrueandestimatedsubject-specific suggesting convergence of the chains. In general, performance mean. The bottom row of this figure shows the true and esti- for the VB and HMC methods is comparable, and both meth- mated score variances across FPCs for a single simulated dataset, ods are in some respects superior to the performance of the SE and suggests that fixed and random effects in the score variance method. Figure 6 compares the three methods’ estimation of the model can be well-estimated. score variance parameters. Especially for FPC 4, the SE method The top row of Figure 4 shows estimated FPCs across all occasionally estimates random effect variances at 0; these are simulated datasets with Ji = 24; the FPCs are well-estimated represented in the lower-right panel of Figure 6 as points where andhavenoobvioussystematicbiases.Thebottomrowshows the correlation between simulated and estimated score variance integrated squared errors (ISEs) for the FPCs across each pos- random effects is 0. Table 1 shows, based on the simulation sible Ji.Asexpected,theISEsaresmallerfortheFPCswith scenario with Ji = 24, the frequentist coverage of 95% credible larger score variances, and decrease as Ji increases. For 12 and intervals for the VB and HMC methods, and of 95% confidence especially for 4 curves per subject, estimates of the FPCs cor- intervals for the SE method, in each case, for the fixed effect respond to linear combinations of the simulated FPCs, leading score variance parameters γlk.ForFPCs3and4especially,the to high ISEs and to inaccurate estimates of parameters in our SE procedure confidence intervals are too narrow. The median scorevariancemodel(examplesofpoorlyestimatedFPCscan ISE for the functional random effects is about 30% higher with be seen in Appendix F). the VB method than with the SE method. This results from the Panels in the top row of Figure 5 show that estimates of fixed relative tendency of the VB method to shrink FPC score esti- effect score variance parameters are shrunk towards zero, espe- mates to zero; when the mean of the scores is in fact non-zero, cially for lower numbers of curves per subject and FPCs 3 and 4. this shifts estimated functional random effects away from zero. We attribute this to overfitting of the random effects in the mean Other comparisons of these methods are broadly similar. model, which incorporates some of the variability attributable The HMC method is more computationally expensive than to the FPCs into the estimated random effects and reduces esti- the other two methods. Running 4 chains for 800 iterations mated score variances. Score variance random effects, shown in in parallel took approximately 90 minutes for Ji = 24. On one the bottom row of Figure 5, are more accurately estimated with more curves per subject. Table . Coverage of % credible/confidence intervals for the score variance parameters γ using the VB, SE, and HMC procedures, for J = 24. Figure 6 and Table 1 show results from a comparison of the lk i VB estimation procedure to the sequential estimation (SE) and FPC Group VB SE HMC Hamiltonian Monte Carlo (HMC) methods described in Section   . . . 4. We ran 4 HMC chains for 800 iterations each, and discarded   . . . the first 400 iterations from each chain. We assessed convergence   . . . of the chains by examining the convergence criterion of Gelman   . . .   . . . and Rubin (1992). Values of this criterion near 1 indicate con-   . . . vergence. For each of our simulation runs the criterion for every   . . . sampled variable was less than 1.1, and usually much closer to 1,   . . . JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 1011

Figure . Comparison of estimation of score variance fixed and random effects using three methods. Panels in the top row show, for each FPC, group, and estimation γ = method, boxplots of signed relative errors (SREs) for estimates of the fixed effect score variance parameters lk for Ji 24. Panels in the bottom row show, for each FPC = and estimation method, the correlation between random effect score variance parameters gik and their estimates for Ji 24. Intercepts and slopes for linear regressions of estimated on simulated random effect score variances are centered around  and , respectively (not shown). processor, by comparison, the SE method took about 20 min- 6.1. Model utes, almost entirely to run function-on-scalar regression using We examine the effect of practice on the variability of move- pffr. The VB method took approximately six minutes, includ- ments while accounting for target and individual-specific ing the grid search to set the value of the parameter π,which idiosyncrasies. To do this, we use a model for score variance that controls the balance between zeroth and second-derivative includes a fixed intercept and slope parameter for each target penalties in the estimation of functional random effects. and one random intercept and slope parameter for each subject- 6. Analysis of Kinematic Data target combination. Correlation between score variance ran- dom effects for different targets for the same subject is induced We now apply the methods described above to our motivat- via a nested random effects structure. The mean structure for ing dataset. To reiterate, our goal is to quantify the process of β observed curves consists of functional intercepts l for each tar- motor learning in healthy subjects, with a focus on the reduc- get l ∈{1,...,8} and random effects bil for each subject-target tion of movement variability through repetition. Our dataset combination, to account for heterogeneity in the average move- consists of 26 healthy, right-handed subjects making repeated ment across subjects and targets. Our heteroscedastic FPCA movements to each of 8 targets. We focus on estimation, inter- model is therefore: pretation and inference for the parameters in a covariate and 8 subject-dependent heteroscedastic FPCA model, with primary   p = I(tar = l) β + b interest in the effect of repetition number in the model for score ij ij l il = variance. We hypothesize that variance will be lower for later l 1 K repetitions due to skill learning. + ξ φ + (9) Prior to fitting the model, we rotate all movements to be in ijk k ij = the direction of the target at 0◦ so that the X axis is the major k 1 axis of movement. For this reason, variation along the X axis   is interpretable as variation in movement extent and variation 8 2 ξ ∼ N 0,σξ = exp I(tar = l) along the Y axis is interpretable as variation in movement direc- ijk ijk ij X ( ) l=1 tion. We present results for analyses of the Pij t and    Y ( ) Pij t position curves in the right hand and describe a bivariate γ + + ( − )(γ + ) lk,int gilk,int repij 1 lk,slope gilk,slope approachtomodelingthesamedata. We present models with 2 FPCs, since 2 FPCs are sufficient ∼ , ; ∼ , gilk MVN gik g gik MVN 0 g (10) to explain roughly 95% of the movement variability (and usu- ik k ally more) of movements remaining after accounting for fixed The covariate tarij indicates the target to which movement j and random effects in the mean structure. Most of the variabil- by subject i is directed. The covariate repij indicates the repe- ity of movements around the mean is explained by the first FPC, tition number of movement j,startingat1,amongallmove- so we emphasize score variance of the first FPC as a convenient ments by subject i tothetargettowhichmovement j is directed, summary for the movement variability, and briefly present some and I(·) is the indicator function. To accommodate differences results for the second FPC. in baseline variance across targets, this model includes separate 1012 D. BACKENROTH ET AL.

population-level intercepts γlk,int for each target l.Theslopes which may reflect a familiarization with the experimental appa- γlk,slope onrepetitionnumberindicatethechangeinvariance ratus. due to practice for target l; negative values indicate a reduc- We now consider inference for the decreasing trend in vari- tion in movement variance. To accommodate subject and target- ance for the first principal component scores. We are interested specific effects, each subject-target combination has a random in the parameters γl1,slope, which estimate the population-level intercept gilk,int and a random slope gilk,slope,andeachsubject target-specific changes in score variance for the first principal has an overall random intercept gik,int and overall random slope component with each additional movement. Figure 8 shows VB gik,slope,inthescorevariancemodelforeachfunctionalprin- estimates and 95% credible intervals for the γl1,slope parameters cipal component. This model parameterization allows differ- for movements by the right hand to each target. All the point ent baseline variances and changes in variance for each target estimates γl1,slope are lower than 0, indicating decreasing first and subject, but shares FPC basis functions across targets. The principal component score variance with additional repetition. model also assumes independence of functional random effects For some targets and coordinates there is substantial evidence bil , l = 1,...,8 by the same subject to different targets, as well that γl1,slope < 0; these results are consistent with our under- as independence of functional random effects bil and score vari- standing of motor learning, although they do not adjust for mul- ance random effects gilk for the same subject. The validity of tiple comparisons. these assumptions for our data are discussed in Appendix D. Appendix C includes results of a bivariate approach to Throughout, fixed effects γlk,int and γlk,slope are given modeling movement kinematics, which accounts for the 2- N[0, 100] priors. Random effects gilk,int and gilk,slope are mod- dimensional nature of the movement. In this approach, the X eled using a bivariate normal distribution to allow for correla- and Y coordinates of curves are concatenated, and each princi- tion between the random intercept and slope parameters in each pal component reflects variation in both X and Y coordinates. FPC score variance model, and with nesting to allow for corre- For curves rotated to extend in the same direction, the results of lations between the random effects for the same subject and dif- this approach suggests that variation in movement extent (rep- ferent targets. We use the empirical Bayes method described in resented by the X coordinate) and movement direction (repre- Section 4.2.4 to set the scale matrix parameters of the inverse- sented by the Y coordinate) are largely uncorrelated: the esti- Wishart priors for gilk and gik. Appendix D includes an analysis mate of the first bivariate FPC represents variation primarily in which examines the sensitivity of our results to various choices the X coordinate, and is similar to the estimate of the first FPC in of prior hyperparameters. the X coordinate model, and vice versa for the second bivariate We fit (9)and(10)usingourVBmethod,withK = 2princi- FPC. Analyses of score variance, then, closely follow the preced- pal components and a cubic B-spline evaluation matrix with ing univariate analyses. Kθ = 10 basis functions. Appendix B includes an analysis of data for one target using theVB,HMC,andSEmethods.Thethreemethodsyieldsimilar results. 6.2. Results Figure 7 shows estimated score variances as a function of repeti- 7. Discussion tion number for X and Y coordinate right-hand movements to all targets. There is a decreasing trend in score variance for the This article develops a framework for the analysis of covari- first principal component scores for all targets and for both the ate and subject-dependent patterns of movement variability in X and Y coordinates, which agrees with our hypotheses regard- kinematic data. Our methods allow for flexible modeling of the ing learning. Figure 7 also shows that nearly all of the variance covariate-dependence of variance of functional data with eas- of movement is attributable to the first FPC. Baseline variance is ily interpretable results. Our approach allows for the estimation generally higher in the X direction than theY direction, indicat- of subject-specific effects on variance, as well as the considera- ing that movement extent is generally more variable than move- tion of multiple covariates. We are motivated by studies of motor ment direction. control, and anticipate these methods can be broadly applied in To examine the adequacy of modeling score variance as a this context; for example, ongoing work uses the methods devel- function of repetition number with a linear model, we compared oped here to study the initial post-stroke degree of impairment the results of model (10)withamodelforthescorevariancessat- and the recovery process over time. urated in repetition number, i.e., where each repetition number By applying these methods to our motivating dataset, we have demonstrated that movement variability is reduced with m has its own set of parameters γlkm in the model for the score variances: repetition. Results in Appendix A additionally show that the   baseline level of skill of subjects is correlated across targets and 8 24 hands, and that baseline movement variability is considerably ξ ∼ N 0,σ2 = exp I(tar = l, rep = m)γ . ijk ξijk ij ij lkm greater in the non-dominant than the dominant hand. Further l=1 m=1 applications of these methods in scientifically important con- (11) texts could focus, for example, on whether movement variabil- ity is reduced with training faster in the dominant hand, or The results for these two models are included in Figure 7.The on whether training with one hand transfers skill to the other general agreement between the linear and saturated models sug- hand. Further research could also investigate target-specific dif- gests that the slope-intercept model is reasonable. For some tar- ferences in improvement of variance with training. Movements gets score variance is especially high for the first movement, to some of the targets require coordination between the shoulder JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 1013

Figure . VB estimates of score variances for right hand movements to each target (in columns), separately for each direction (X orY, in rows). Panels show the VB estimates of the score variance as a function of repetition number using the slope-intercept model () in red and orange (first and second FPC, respectively), and using the saturated one-parameter-per-repetition number model (), in black and gray (first and second FPC, respectively). and elbow, whereas others are primarily single-joint movements; estimation of parameters that summarize the global effect of the effectiveness of training may depend on the complexity of repetition on movement variability and shrinkage of the target- the movement. specific score variance parameters. However, with only 8 ran- We have provided three different estimation approaches for dom target effects, the model would be sensitive to the speci- fitting heteroscedastic functional principal components mod- fication of priors. Moreover, as discussed above, movements to els. Given its computational and comparable accuracy different targets impose different demands on coordination and totheHMCandSEmethods,werecommenduseoftheVB skill, which may reduce the interpretability of the parameters approach for exploratory analyses and model building. How- μk,int and μk,slope. ever, because of its approximate nature, we advise that any con- Ouranalysishereisofcurveslinearlyregisteredontoacom- clusionsderivedfromtheVBapproachbeconfirmedwithone mon , although our method could be applied to of the other two methods, perhaps with a subset of the data if curves with different time domains, or to sparsely observed required for computational feasibility. functional data. Our research group is currently working on An alternative approach to the analysis of this dataset could developing an improved approach to registration in kinematic treat the target effects γlk,int and γlk,slope in model (10)forthe experiments which will take account of the repeated obser- score variances as random effects centered around parame- vationsatthesubjectlevelbyseekingtoestimatesubject- ters μk,int and μk,slope, representing the average across-target and curve-specific warping functions. This approach, com- baseline score variance and change in score variance with bined with the methods we present in the current article, repetition. Some advantages of this approach would be the will eventually allow a more complete model for movement

γ γ Figure . VB estimates of l1,slope. This figure shows VB estimates and % credible intervals for target-specific score variance slope parameters l1,slope for movements by the right hand to each target, for the X and Y coordinates. 1014 D. BACKENROTH ET AL. variabilitythattakesintoaccountbothvariabilityinmovement Di, C.-Z., Crainiceanu, C. M., Caffo, B. S., and Punjabi, N. M.2009 ( ), “Mul- duration and variability in movement trajectories. tilevel Functional Principal Component Analysis,” Annals of Applied There are several directions for further development. A full Statistics, 4, 458–488. [1005] Djeundje, V. A. B., and Currie, I. D. (2010), “Appropriate Covariance- Bayesian treatment could estimate all quantities in model (5) Specification via Penalties for Penalized Splines in Mixed Models jointly, or could condition on only the FPCs and jointly esti- for Longitudinal Data,” Electronic Journal of Statistics, 4, 1202–1224. mate all other quantities; given the very flexible nature of [1007,1008] this model, additional constraints might be required in such Gelman, A., and Rubin, D. B. (1992), “Inference from Iterative Simulation a Bayesian treatment to improve identifiability. More complex Using Multiple Sequences,” Statistical Science, 7, 457–472. [1010] Goldsmith, J., Greven, S., and Crainiceanu, C. M. (2013), “Corrected Con- models could allow for correlations between functional random fidence Bands for Functional Data using Principal Components,” Bio- effects and score variance random effects. Considering our data metrics, 69, 41–51. [1007] from the perspective of shape analysis may provide better under- Goldsmith,J.,andKitago,T.(2016), “Assessing Systematic Effects of Stroke standing of interpretable movement features like location, scale on Motor Control using Hierarchical Function-on-Scalar Regression,” and orientation (Kurtek et al. 2012; Gu, Pati, and Dunson 2012). JournaloftheRoyalStatisticalSociety, Series C, 65, 215–236. [1006] Goldsmith, J., Scheipl, F., Huang, L., Wrobel, J., Gellar, J., Harezlak, J., Finally, an alternative approach to that presented here would McLean, M. W., Swihart, B., Xiao, L., Crainiceanu, C., and Reiss, P. T. be to model covariate-dependent score distributions through (2016), refund: Regression with Functional Data. R package version 0.1- quantileregression.Thismayproducevaluableinsightsintothe 17, available at http://CRAN.R-project.org/package=refund [1006] complete distribution of movements, especially when this is not Goldsmith, J., Wand, M. P., and Crainiceanu, C. M. (2011), “Functional symmetric, but some work is needed to understand the connec- Regression via Variational Bayes,” Electronic Journal of Statistics,5, 572–602. [1006] tion of this technique to traditional FPCA. Goldsmith, J., Zipunnikov, V., and Schrack, J. (2015), “Generalized Multi- level Function-on-Scalar Regression and Principal Component Analy- 8. Supplementary Materials sis,” Biometrics, 71, 344–353. [1006,1008] Gu, K., Pati, D., and Dunson, D. B. (2012), “Bayesian Hierarchical Model- The supplementary materials contains the following appen- ing of Simply Connected 2D Shapes,” arXiv preprint arXiv:1201.1658. dices: A, containing additional results from our real data [1014] Guo, W. (2002), “Functional Mixed Effects Models,” Biometrics, 58, analysis; B, containing results comparing the VB, HMC and 121–128. [1005] SE methods as applied to the kinematic data; C, containing a Huang, H., Li, Y., and Guan, Y. (2014), “Joint Modeling and Cluster- description of and results from a bivariate approach to the anal- ing Paired Generalized Longitudinal Trajectories with Application to ysis of our kinematic data; D, containing a sensitivity analysis for Cocaine Abuse Treatment Data.” Journal of the American Statistical our choice of hyperparameters and examining alternative mean Association, 83, 210–223. [1006,1008] Huang,V.,Ryan,S.,Kane,L.,Huang,S.,Berard,J.,Kitago,T.,Mazzoni, specifications; E, containing a derivation of our variational Bayes P. , a n d K r a k au e r , J. ( 2012), “3D Robotic Training in Chronic Stroke algorithm and more detail on our HMC implementation; and F, Improves Motor Control but not Motor Function,” Society for Neuro- containing additional simulation results. science, October 2012, New Orleans. [1004] The supplementary materials also include code implement- James,G.M.,Hastie,T.J.,andSugar,C.A.(2000), “Principal Component ing our methods and all the simulation scenarios presented here. Models for Sparse Functional Data,” Biometrika, 87, 587–602. [1006] Jiang, C.-R., and Wang, J.-L. (2010), “Covariate Adjusted Functional Princi- Thiscodecanfitmodelsintheformofmodel(5), provided that, pal Components Analysis for Longitudinal Data,” The Annals of Statis- as in our real data application and simulations, only one func- tics, 38, 1194–1226. [1006] tional random effect is associated with each functional observa- Jordan, M. I. (2004), “Graphical Models,” Statistical Science, 19, 140–155. tion. [1006] Jordan,M.I.,Ghahramani,Z.,Jaakkola,T.S.,andSaul,L.K.(1999), “An Introduction to Variational Methods for Graphical Models,” Machine Funding Learning, 37, 183–233. [1006] Kitago,T.,Goldsmith,J.,Harran,M.,Kane,L.,Berard,J.,Huang,S., The work of D. Backenroth and J. Goldsmith was supported in part by Ryan,S.L.,Mazzoni,P.,Krakauer,J.W.,andHuang,V.S.(2015), Award R21EB018917 from the National Institute of Biomedical Imaging “Robotic Therapy for Chronic Stroke: General Recovery of Impair- and Bioengineering. The work of J. Goldsmith was also supported by ment or Improved Task-Specific Skill?” Journal of Neurophysiology, 114, Award R01NS097423-01 from the National Institute of Neurological Dis- 1885–1894. [1004] orders and Stroke and Award R01HL123407 from the National Heart, Krakauer,J.W.(2006),“MotorLearning:ItsRelevancetoStrokeRecov- Lung, and Blood Institute. National Heart, Lung, and Blood Institute ery and Neurorehabilitation,” Current Opinion in Neurology, 19, 84–90. [R01HL123407]; National Institute of Biomedical Imaging and Bioengi- [1003] neering [R21EB018917]; National Institute of Neurological Disorders and Kurtek,S.,Srivastava,A.,Klassen,E.,andDing,Z.(2012), “Statistical Mod- Stroke [R01NS097423-01]. eling of Curves Using Shapes and Related Features,” Journal of the American Statistical Association, 107, 1152–1165. [1014] McLean,M.W.,Scheipl,F.,Hooker,G.,Greven,S.,andRuppert,D. References (2013), “Bayesian Functional Generalized Additive Models for Sparsely Observed Covariates,” under review. [1006] Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015), “Fitting Linear Morris, J. S. and Carroll, R. J. (2006), “-Based Functional Mixed Mixed-Effects Models Using lme4,” Journal of Statistical Software, 67, Models,” Journal of the Royal Statistical Society, Series B, 68, 179–199. 1–48. [1006] [1005] Bishop, C. M. (1999), “Bayesian PCA,” Advances in Neural Information Pro- Ormerod,J.,andWand,M.P.(2012), “Gaussian Variational Approximation cessing Systems, 382–388. [1006] Inference for Generalized Linear Mixed Models,” The American Statis- Chiou, J.-M., Müller, H.-G., and Wang, J.-L. (2003), “Functional Quasi- tician, 21, 2–17. [1006] Likelihood Regression Models with Smooth Random Effects,” Journal Peng, J., and Paul, D. (2009), “A Geometric Approach to Maximum Likeli- of the Royal Statistical Society, Series B, 65, 405–423. [1006] hood Estimation of the Functional Principal Components from Sparse JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 1015

Longitudinal Data,” Journal of Computational and Graphical Statistics, and Motor Cortical Areas,” Journal of Neurophysiology, 102, 2921–2932. 18, 995–1015. [1006] [1004] Ramsay,J.O.,andSilverman,B.W.(2005), Functional Data Analysis,New Tipping, M. E., and Bishop, C. (1999), “Probabilistic Principal Component York: Springer. [1005] Analysis,” JournaloftheRoyalStatisticalSociety, Series B, 61, 611–622. Scheipl, F., Staicu, A.-M., and Greven, S. (2015), “Functional Additive [1006] Mixed Models,” Journal of Computational and Graphical Statistics, 24, Titterington,D.M.(2004), “Bayesian Methods for Neural Networks and 477–501. [1005,1007] Related Models,” Statistical Science, 19, 128–139. [1006] Scholz, J.-P., and Schöner, G. (1999), “The Uncontrolled Manifold Con- van der Linde, A. (2008), “Variational Bayesian Functional PCA,” Compu- cept:IdentifyingControlVariablesforaFunctionalTask,”Experimental tational Statistics and Data Analysis, 53, 517–533. [1006] Brain Research, 126, 289–306. [1003] Šmídl, V., and Quinn, A. (2007), “On Bayesian Principal Component Schwarz, G. (1978), “Estimating the Dimension of a Model,” The Annals of Analysis,” Computational Statistics & Data Analysis, 51, 4101–4123. Statistics, 6, 461–464. [1008] [1008] Shmuelof,L.,Krakauer,J.W.,andMazzoni,P.(2012), “How is a Motor Skill Yao, F., Müller, H., and Wang, J. (2005), “Functional Data Analysis for Learned? Change and Invariance at the Levels of Task Success and Tra- Sparse Longitudinal Data,” Journal of the American Statistical Associ- jectory Control,” Journal of Neurophysiology, 108, 578–594. [1003] ation, 100, 577–590. [1004,1005,1006] Stan Development Team (2013), Stan Modeling Language User’s Guide and Yarrow, L., Brown, P., and Krakauer, J.-W. (2009), “Inside the Brain Reference Manual, Version 1.3,availableathttp://mc-stan.org/ [1008] of an Elite Athlete: The Neural Processes that Support High Tanaka, H., Sejnowski, T. J., and Krakauer, J. W. (2009), “Adaptation to Achievement in Sports,” Nature Reviews Neuroscience, 10, 585–596. Visuomotor Rotation Through between Posterior Parietal [1003]