Sparse Functional Boxplots for Multivariate Curves

Zhuo Qu and Marc G. Genton¹

March 16, 2021

Abstract

This paper introduces the sparse functional boxplot and the intensity sparse functional boxplot as practical exploratory tools that make visualization possible for both complete and sparse functional data. These visualization tools can be used either in the univariate or multivariate functional setting. The sparse functional boxplot, which is based on the functional boxplot, depicts sparseness characteristics in the envelope of the 50% central region, the median curve, and the outliers. The proportion of missingness at each time index within the central region is colored in gray. The intensity sparse functional boxplot displays the relative intensity of sparse points in the central region, revealing where data are more or less sparse. The two-stage functional boxplot, a derivation from the functional boxplot to better detect outliers, is also extended to its sparse form. Several depth proposals for sparse multivariate functional data are evaluated and outlier detection is tested in simulations under various data settings and sparseness scenarios. The practical applications of the sparse functional boxplot and intensity sparse functional boxplot are illustrated with two public health datasets.

arXiv:2103.07868v1 [stat.ME] 14 Mar 2021

Some key words: Depth, Missing values, Multivariate functional data, Outlier detection, Sparse functional data, Visualization

¹Statistics Program, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia. E-mail: [email protected], [email protected]. This research was supported by the King Abdullah University of Science and Technology (KAUST).

1 Introduction

Functional data analysis (Ramsay & Dalzell 1991) regards each observation unit as a function of an index displayed as a curve or image. In this exciting field, concepts from classical data analysis are generalized to functional or even multivariate functional perspectives. Many traditional descriptive statistics and analytical methods can be generalized to the functional domain, e.g., mean and outliers (Claeskens et al. 2014), principal component analysis (Yao et al. 2005), analysis of variance (Zhang 2013), median polish (Sun & Genton 2012b) and linear regression (Yao et al. 2005, Cai & Hall 2006). Simultaneously, some new issues appear that require innovative solutions, such as registration and cross-registration methods (Srivastava et al. 2011, Carroll et al. 2020). These methods have wide applications in public health, biology, meteorology, engineering, and economics. Real-life examples include human growth curves (Liu & Yang 2009), CD4 cell counts (Yao et al. 2005, Goldsmith et al. 2013), spatio-temporal data such as temperature and precipitation (Sun & Genton 2012b, Qu et al. 2021), brain imaging (Happ & Greven 2018), and the nondurable goods index (Ramsay & Silverman 2007).

Descriptive statistics, such as median and outliers, need to be determined before the functional data can be displayed for exploratory data analysis. Ordering the samples of curves and images directly is difficult. Therefore, many functional depths have been proposed that establish the center of the functional data sample and then order them from the center outwards. Available functional depths in a marginal perspective are the integrated depth (Ibrahim & Molenberghs 2009), random projection depth (Cuevas et al. 2007), random Tukey depth (Cuesta-Albertos & Nieto-Reyes 2008), band and modified band depth (López-Pintado & Romo 2009), half-region depth and modified half-region depth (Adler 2010), functional spatial depth (Sguera et al. 2014), and extremal depth (Narisetty & Nair 2016). Moreover, multivariate functional depths have been proposed based on two generalizations: from functional to multivariate functional scenarios and from multivariate to multivariate functional scenarios. Examples of the first idea include the weighted average of the marginal functional depths (Ieva & Paganoni 2013) and the simplicial and modified simplicial band depths (López-Pintado et al. 2014). An example of the second idea is the multivariate functional halfspace depth (Claeskens et al. 2014). With the intensive development of different data depth notions, various tools have been developed for visualizing functional data and detecting outliers.

Hyndman & Shang (2010) made use of the first two robust functional principal component scores and presented functional versions of the bagplot and the highest density region plot. With the modified band depth giving the ranks among data, Sun & Genton (2011) proposed the functional boxplot, a potent analog to the classical boxplot (Tukey 1975), for visualizing functional data. For functional data with a dependence structure, Sun & Genton (2012a) provided an adaptive way of determining the outlier selection factor in the adjusted functional boxplot. Arribas-Gil & Romo (2014) explored the relationship between modified band depth and modified epigraph index (López-Pintado & Wei 2011) and proposed the outliergram to visualize and detect shape outliers among functional data. Genton et al. (2014) introduced a surface boxplot based on the modified volume depth and developed an interactive surface boxplot tool for the visualization of samples of images. Mirzargar et al. (2014) extended the notion of depths from functional data to curves, presenting the curve boxplot for visualizing ensembles of 2D and 3D curves. Dai & Genton (2018a) developed a two-stage functional boxplot for multivariate curves, combining directional outlyingness (Dai & Genton 2019) and a classical functional boxplot in the outlier detection procedure. The two-stage functional boxplot can be applied to functional data both marginally and jointly. Additionally, Dai & Genton (2018b) introduced the magnitude-shape (MS) plot for visualizing both the magnitude and shape outlyingness of multivariate functional data. Yao et al. (2020) proposed two exploratory tools: the trajectory functional boxplot and the modified simplicial band depth (MSBD) versus wiggliness of directional outlyingness (WO) plot for visualizing trajectory functional data, such as the paths of hurricanes and migrating birds in space. More visualization examples can be found in the recent review of Genton & Sun (2020).

[Figure 1. Panel title: Observed CD4 Counts. x-axis: Months since seroconversion; y-axis: Total CD4 Cell Counts.]

Figure 1: Observed CD4 cell counts from 366 patients (black solid points) measured from 18 months before to 42 months after seroconversion (the time period during which a specific antibody develops and becomes detectable in the blood).

The above visualization techniques can only be applied to samples of curves measured on fine and common grids. In practice, the grids are not always fine or common; rather, curves are sometimes observed on sparse or irregularly spaced time points. Figure 1 shows a longitudinal study of CD4 cell counts per milliliter of blood (Goldsmith et al. 2013), used to track the progress of HIV. Overall, there are 366 patients, with between 1 and 11 observations per subject and a mean of 5, yielding 1888 data points. Figure 2 presents a dataset of malnutrition metrics, consisting of the prevalence of stunted growth and low birth weight, from 1986 to 2019 for 77 countries. Sparseness appears randomly for the first variable, stunted growth, and values for low birth weight are only available for years 2000 to 2015. Visualizing functional data with missing values is difficult, yet it is necessary for exploring sparse multivariate functional data.

Various ideas have been proposed to deal with missingness in the notion of depth in the functional case. A first approach is to fit the sparse functional data before applying current univariate depths and a second approach is to revise the notions of depth for the sparse functional data case. One example of the first approach is in López-Pintado & Wei (2011), in which the ideas of functional principal components (Yao et al. 2005) are applied for sparse data fitting and the modified band depth is used as an example to order the fitted data. Other available methods for fitting sparse functional data include functional principal component analysis (James et al. 2000, Liu et al. 2017), B-spline models (Thompson & Rosen 2008), and covariance estimation (Xiao et al. 2018). One example of the second approach is in Sguera & López-Pintado (2020), where the idea of depth is extended by incorporating both the curve estimation and its confidence interval into the depth analysis, which can be applied to any univariate functional depth.

[Figure 2. Two panels: Prevalence of Stunted Growth (Country Level) and Prevalence of Low Birth Weight (Country Level). x-axis: Year; y-axis: Prevalence (%).]

Figure 2: The observed prevalence of stunted growth and prevalence of low birth weight for 77 countries from 1986 to 2019. Observations are joined with solid black lines if observed continuously; otherwise, joined with gray dashed lines.

Similar methods have been generalized to fit sparse data in the multivariate functional setting. Still, the depth of sparse multivariate functional data has not been considered in detail. Zhou et al. (2008) modeled paired longitudinal data with principal components and fitted them by penalized splines under the mixed-effects model framework. However, their method is limited to only two variables when the data are either observed or missing simultaneously. Happ & Greven (2018) proposed a multivariate functional principal component analysis defined on different time grids, which is also suitable for data with missing values. Li et al. (2018) derived a fast algorithm for fitting sparse multivariate functional data via estimating the multivariate covariance function with tensor product B-splines. The limited discussion of possible depths for sparse multivariate functional data gives us much space to explore before providing the ranks for data visualization.

Therefore, in this paper we aim to address the revised depth in the sparse multivariate functional case and to develop visualization tools for both marginal and joint sparse functional data. Possible revised notions of depth are considered, and the best depth is taken to be the one with the highest Spearman rank correlation coefficient with the conventional depth of the original data. We also introduce the sparse functional boxplot, which shows the percentage of sparse points within the central region, plus the sparseness characteristics of the median and the outlying curves. Its complementary tool, the intensity sparse functional boxplot, highlights the intensity of the sparse points in the central region.

The remainder of the paper is organized as follows. The procedure of fitting sparse multivariate functional data and various depths for sparse multivariate functional data are considered in Section 2. Sparse and intensity sparse functional boxplots, together with their two-stage forms, are presented in Section 3. The choice of depth and outlier detection performance for different visualization tools are demonstrated via simulations in Section 4. Applications to the aforementioned CD4 cell counts and malnutrition data are presented in Section 5. The paper ends with a discussion in Section 6.

2 Ordering Sparse Multivariate Functional Data

Ordering sparse multivariate functional data is necessary before we visualize them. We consider a two-step procedure: first we apply multivariate functional principal component analysis (MFPCA, Happ 2018) to fit the data; second, we consider various notions of depth to order the fitted data. If there is only one variable, then the data reduce to univariate functional data, and MFPCA becomes univariate functional principal component analysis (UFPCA, Yao et al. 2005). The depth notions we consider are either from the direct generalization of the univariate revised functional depth (Sguera & López-Pintado 2020) to multivariate functional data, or our new idea described in Section 2.2.2.

2.1 Fitting Sparse Multivariate Functional Data

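We consider multivariate functional data $\mathbf{X}(\mathbf{t}) = (X^{(1)}(t^{(1)}), \ldots, X^{(p)}(t^{(p)}))^\top \in \mathbb{R}^p$ ($p \geq 1$). Following the definitions from Happ & Greven (2018), $\mathbf{t} := (t^{(1)}, \ldots, t^{(p)}) \in \mathcal{T} := \mathcal{T}_1 \times \cdots \times \mathcal{T}_p$, where $\mathcal{T}_j$ ($j = 1, \ldots, p$) is a compact set in $\mathbb{R}^{d_j}$, and $X^{(j)}(t^{(j)}): \mathcal{T}_j \to \mathbb{R}$ is assumed to be in $L^2(\mathcal{T}_j)$. Hence, $\mathbf{t}$ is a $p$-tuple of $d_1, \ldots, d_p$-dimensional vectors and not a scalar. In practice, instead of $\mathbf{X}(\mathbf{t})$, we observe $\mathbf{Y}(\mathbf{t}) = (Y^{(1)}(t^{(1)}), \ldots, Y^{(p)}(t^{(p)}))^\top \in \mathbb{R}^p$, due to measurement errors $\boldsymbol{\epsilon}_i = (\epsilon_i^{(1)}, \ldots, \epsilon_i^{(p)})^\top \overset{\text{i.i.d.}}{\sim} N_p(\mathbf{0}, \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_p^2\})$. Taking the $i$th observation as an example, $\mathbf{Y}_i(\mathbf{t}_i)$ is denoted as

$$\mathbf{Y}_i(\mathbf{t}_i) = \mathbf{X}_i(\mathbf{t}_i) + \boldsymbol{\epsilon}_i = \boldsymbol{\mu}(\mathbf{t}_i) + \sum_{m=1}^{\infty} \rho_{i,m} \boldsymbol{\psi}_{i,m}(\mathbf{t}_i) + \boldsymbol{\epsilon}_i, \quad i = 1, \ldots, n, \tag{1}$$

where the mean is $\boldsymbol{\mu}(\mathbf{t}_i) = (\mu^{(1)}(t_i^{(1)}), \ldots, \mu^{(p)}(t_i^{(p)}))^\top$, the multivariate eigenfunction is $\boldsymbol{\psi}_{i,m}(\mathbf{t}_i) = (\psi_{i,m}^{(1)}(t_i^{(1)}), \ldots, \psi_{i,m}^{(p)}(t_i^{(p)}))^\top$, and $\nu_m$ are the corresponding multivariate eigenvalues following $\nu_1 \geq \nu_2 \geq \cdots$. Here, $\rho_{i,m}$ denote the functional principal component eigenscores. We assume that $\rho_{i,m}$ are uncorrelated random variables with zero mean and variances $\mathrm{var}(\rho_{i,m}) = \nu_m$, with $\rho_{i,m} := \sum_{j=1}^{p} \langle X^{(j)}, \psi^{(j)} \rangle_2$, where $\langle \cdot, \cdot \rangle_2$ is the scalar product in $L^2(\mathcal{T}_j)$ for $j = 1, \ldots, p$. For $\mathbf{s}, \mathbf{t} \in \mathcal{T}$, we define the covariance matrix $\mathbf{C}(\mathbf{s}, \mathbf{t}) = E\big((\mathbf{X}(\mathbf{s}) - \boldsymbol{\mu}(\mathbf{s})) \otimes (\mathbf{X}(\mathbf{t}) - \boldsymbol{\mu}(\mathbf{t}))\big)$ with elements

$$C_{ij}(s^{(i)}, t^{(j)}) = \mathrm{cov}\big(X^{(i)}(s^{(i)}), X^{(j)}(t^{(j)})\big) = E\big((X^{(i)}(s^{(i)}) - \mu^{(i)}(s^{(i)}))(X^{(j)}(t^{(j)}) - \mu^{(j)}(t^{(j)}))\big),$$

with $s^{(i)} \in \mathcal{T}_i$, $t^{(j)} \in \mathcal{T}_j$.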

Proposition 5 in Happ & Greven (2018) indicates that multivariate functional data have a finite Karhunen–Loève representation if and only if all univariate elements have a finite Karhunen–Loève representation:

$$Y_i^{(j)}(t_i^{(j)}) = X_i^{(j)}(t_i^{(j)}) + \epsilon_i^{(j)} = \mu_i^{(j)}(t_i^{(j)}) + \sum_{l=1}^{\infty} \xi_{i,l}^{(j)} \phi_{i,l}^{(j)}(t_i^{(j)}) + \epsilon_i^{(j)}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p,$$

where $\phi_{i,l}^{(j)}(t_i^{(j)})$ and $\xi_{i,l}^{(j)}$ are, respectively, the $l$th functional principal component eigenfunction and eigenscore corresponding to the $i$th observation and the $j$th variable. Additionally, their proposition establishes a direct relationship between the multivariate and univariate Karhunen–Loève representations:

$$\hat{\psi}_{i,m}^{(j)}(t_i^{(j)}) = \sum_{l=1}^{M_j} [\hat{\mathbf{c}}_m^{(j)}]_l \, \hat{\phi}_{i,l}^{(j)}(t_i^{(j)}), \qquad \hat{\rho}_{i,m} = \sum_{j=1}^{p} \sum_{l=1}^{M_j} [\hat{\mathbf{c}}_m^{(j)}]_l \, \hat{\xi}_{i,l}^{(j)}.$$

Here, $[\hat{\mathbf{c}}_m^{(j)}]_l$ is the $l$th term of $\hat{\mathbf{c}}_m^{(j)} \in \mathbb{R}^{M_j}$. We note that $\hat{\mathbf{c}}_m = (\hat{\mathbf{c}}_m^{(1),\top}, \ldots, \hat{\mathbf{c}}_m^{(p),\top})^\top$ is the orthogonal eigenvector from the matrix $\hat{\mathbf{Z}} = (n-1)^{-1} \boldsymbol{\Xi}^\top \boldsymbol{\Xi} \in \mathbb{R}^{M_+ \times M_+}$, where $\boldsymbol{\Xi} \in \mathbb{R}^{n \times M_+}$ and its $i$th row is $(\hat{\xi}_{i,1}^{(1)}, \ldots, \hat{\xi}_{i,M_1}^{(1)}, \ldots, \hat{\xi}_{i,1}^{(p)}, \ldots, \hat{\xi}_{i,M_p}^{(p)})$. Here, $M$ and $M_j$ are the optimal number of multivariate and univariate eigenfunction components, respectively, and $M_+ = \sum_{j=1}^{p} M_j$.
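The link between univariate and multivariate scores can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the univariate scores are simulated rather than estimated, the names `blocks`, `Xi`, `C_hat`, and `rho_hat` are hypothetical, and the scores are centred before forming $\hat{\mathbf{Z}}$, which is an assumption made here so that $\hat{\mathbf{Z}}$ is a sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200        # number of curves (illustrative value)
M_j = [3, 2]   # univariate components kept per variable (illustrative values)

# Hypothetical univariate score blocks, one (n x M_j) matrix per variable.
blocks = [rng.normal(size=(n, m)) * np.sqrt(np.arange(m, 0, -1)) for m in M_j]
Xi = np.hstack(blocks)          # Xi in R^{n x M_+}, row i stacks all scores of curve i
Xi = Xi - Xi.mean(axis=0)       # assumption: centre the scores

Z = Xi.T @ Xi / (n - 1)         # Z = (n - 1)^{-1} Xi^T Xi, in R^{M_+ x M_+}

# eigh returns ascending eigenvalues for a symmetric matrix; sort descending.
eigvals, eigvecs = np.linalg.eigh(Z)
order = np.argsort(eigvals)[::-1]
eigvals_sorted = eigvals[order]
C_hat = eigvecs[:, order]       # column m plays the role of c_hat_m

# rho_hat[i, m] = sum_j sum_l [c_hat_m^{(j)}]_l * xi_hat_{i,l}^{(j)}
rho_hat = Xi @ C_hat
```

Under these assumptions the columns of `rho_hat` are empirically uncorrelated and their sample variances equal the eigenvalues of `Z`, which mirrors the requirement that the multivariate scores be uncorrelated with variances $\nu_m$.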

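For the $i$th sample in the $j$th variable, we collect all observation times in $\mathbf{T}_i^{(j)} = (t_{i,1}^{(j)}, \ldots, t_{i,N_{ji}}^{(j)})^\top \in \mathbb{R}^{N_{ji}}$, with $t_{i,k}^{(j)} \in \mathcal{T}_j$ ($k = 1, \ldots, N_{ji}$). If we write $\mathbf{Y}_i^{(j)} = (Y_i^{(j)}(t_{i,1}^{(j)}), \ldots, Y_i^{(j)}(t_{i,N_{ji}}^{(j)}))^\top \in \mathbb{R}^{N_{ji}}$, then $\mathbf{X}_i^{(j)}$, $\boldsymbol{\mu}_i^{(j)}$, and $\boldsymbol{\psi}_{i,m}^{(j)}$ can be written in a similar way, as well as their estimated versions $\widehat{\mathbf{X}}_i^{(j)}$, $\hat{\boldsymbol{\mu}}_i^{(j)}$, and $\hat{\boldsymbol{\psi}}_{i,m}^{(j)}$.

We let $\mathbf{Y}_i = (\mathbf{Y}_i^{(1)\top}, \ldots, \mathbf{Y}_i^{(p)\top})^\top \in \mathbb{R}^{N_i}$ ($N_i = \sum_{j=1}^{p} N_{ji}$), and analogous definitions can be made for $\widehat{\mathbf{Y}}_i$, $\mathbf{X}_i$, $\boldsymbol{\mu}_i$, and $\boldsymbol{\psi}_{i,m}$ and the estimated $\widehat{\mathbf{X}}_i$, $\hat{\boldsymbol{\mu}}_i$, and $\hat{\boldsymbol{\psi}}_{i,m}$. We define $\boldsymbol{\Psi}_{iM} = (\boldsymbol{\psi}_{i,1}, \ldots, \boldsymbol{\psi}_{i,M}) \in \mathbb{R}^{N_i \times M}$, $\mathbf{V} = \mathrm{diag}\{\nu_1, \ldots, \nu_M\}$, $\boldsymbol{\rho}_i = (\rho_{i,1}, \ldots, \rho_{i,M})^\top$, and let $\theta := \{M, \boldsymbol{\Psi}_{iM}, \boldsymbol{\mu}_i, \mathbf{V}, \sigma_1^2, \ldots, \sigma_p^2\}$ be the collection of unobserved MFPC decomposition objects.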

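We evaluate the estimation based on MFPCA and name it the MFPCA fit:

$$\widehat{\mathbf{X}}_{\hat{\theta},i} := E\big[\mathbf{Y}_i \mid \hat{\theta}\big] = \hat{\boldsymbol{\mu}}_i + \sum_{m=1}^{M} \hat{\rho}_{i,m} \hat{\boldsymbol{\psi}}_{i,m}, \quad i = 1, \ldots, n, \tag{2}$$

where $\hat{\boldsymbol{\rho}}_i = (\hat{\rho}_{i,1}, \ldots, \hat{\rho}_{i,M})^\top$.

The Appendix presents theorems with proofs showing that $\{\widehat{X}_{\hat{\theta},i}^{(j)}(t_i^{(j)}) - X_i^{(j)}(t_i^{(j)})\}$ is asymptotically normal, which makes it possible to build confidence intervals for the multivariate functional data. Therefore, we may construct the $(1-\alpha)$ asymptotic point-wise confidence interval for $X_i^{(j)}(t_i^{(j)})$ between the lower bound $X_{lb,\hat{\theta},i}^{(j)}(t_i^{(j)})$ and upper bound $X_{ub,\hat{\theta},i}^{(j)}(t_i^{(j)})$:

$$\widehat{X}_{\hat{\theta},i}^{(j)}(t_i^{(j)}) \pm \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big) \sqrt{\mathrm{var}\big[\widehat{X}_{\hat{\theta},i}^{(j)}(t_i^{(j)}) - X_i^{(j)}(t_i^{(j)}) \mid \hat{\theta}\big]},$$

where $\Phi$ is the standard Gaussian cumulative distribution function (cdf).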

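Due to the uncertainty of the obtained eigenvalues and eigenfunctions, the estimation and its confidence intervals can be corrected through bootstrap in MFPCA. We generalize the corrected fitting and confidence intervals from the univariate (Goldsmith et al. 2013) to the multivariate case, using $B$ bootstrap replicates. Specifically, the fit is obtained through the iterated expectation, and the confidence interval is obtained through the iterated variance. We name the bootstrap corrected estimation the BMFPCA fit:

$$\widehat{\mathbf{X}}_i = E_{\hat{\theta}}\big\{E_{\mathbf{Y}_i \mid \hat{\theta}}\big[\mathbf{Y}_i \mid \hat{\theta}\big]\big\}, \quad i = 1, \ldots, n. \tag{3}$$

We obtain the corrected confidence interval for $X_i^{(j)}(t_i^{(j)})$ between $X_{lb,i}^{(j)}(t_i^{(j)})$ and $X_{ub,i}^{(j)}(t_i^{(j)})$:

$$\widehat{X}_i^{(j)}(t_i^{(j)}) \pm \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big) \sqrt{\mathrm{var}\big[\widehat{X}_i^{(j)}(t_i^{(j)}) - X_i^{(j)}(t_i^{(j)})\big]}. \tag{4}$$

The corrected variance is:

$$\mathrm{var}\big[\widehat{X}_i^{(j)}(t_i^{(j)}) - X_i^{(j)}(t_i^{(j)})\big] = E_{\hat{\theta}}\Big\{\mathrm{var}_{Y_i^{(j)} \mid \hat{\theta}}\big(\widehat{X}_{\hat{\theta},i}^{(j)}(t_i^{(j)}) - X_i^{(j)}(t_i^{(j)}) \mid \hat{\theta}\big)\Big\} + \mathrm{var}_{\hat{\theta}}\Big\{E_{Y_i^{(j)} \mid \hat{\theta}}\big(\widehat{X}_{\hat{\theta},i}^{(j)}(t_i^{(j)}) - X_i^{(j)}(t_i^{(j)}) \mid \hat{\theta}\big)\Big\},$$

where $\mathrm{var}_{Y_i^{(j)} \mid \hat{\theta}}\big(\widehat{X}_{\hat{\theta},i}^{(j)}(t_i^{(j)}) - X_i^{(j)}(t_i^{(j)}) \mid \hat{\theta}\big)$ is given from the above conditional model, $E_{Y_i^{(j)} \mid \hat{\theta}}\big(\widehat{X}_{\hat{\theta},i}^{(j)}(t_i^{(j)}) - X_i^{(j)}(t_i^{(j)}) \mid \hat{\theta}\big)$ can be approximated by $E_{Y_i^{(j)} \mid \hat{\theta}}\big(\widehat{X}_{\hat{\theta}_b,i}^{(j)}(t_i^{(j)}) - \widehat{X}_{\hat{\theta},i}^{(j)}(t_i^{(j)})\big)$, $\hat{\theta}_b$ is obtained from the $b$th bootstrap sample ($b = 1, \ldots, B$), and $\hat{\theta}$ is obtained from the original sample. Usually, $\alpha = 0.05$, and we obtain the 95% confidence interval.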
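The iterated expectation and iterated variance can be combined at a single time point as in the sketch below. It assumes that each bootstrap replicate $b$ has already produced a fitted value and a conditional variance at that time point; the function name `corrected_interval` and its arguments are hypothetical, and using the population variance across replicates for the between-replicate term is a choice made here, not one stated in the text.

```python
from statistics import NormalDist, fmean, pvariance

def corrected_interval(boot_fits, boot_cond_vars, alpha=0.05):
    """Combine B bootstrap results for one curve, variable, and time point.

    boot_fits[b]      : fitted value obtained with theta_hat_b
    boot_cond_vars[b] : conditional variance of (fit - truth) given theta_hat_b
    Returns (fit, lower, upper).
    """
    fit = fmean(boot_fits)              # iterated expectation, as in Eq. (3)
    within = fmean(boot_cond_vars)      # E_theta{ var(. | theta) }
    between = pvariance(boot_fits)      # var_theta{ E(. | theta) }, assumed estimator
    total_var = within + between        # law of total variance
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * total_var ** 0.5   # half-width of the interval in Eq. (4)
    return fit, fit - half_width, fit + half_width

fit, lower, upper = corrected_interval([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

When all bootstrap fits coincide, the between-replicate term vanishes and the interval reduces to the uncorrected one built from the conditional variance alone.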

2.2 Notions of Depth for Sparse Multivariate Functional Data

Similar to revising the depth for sparse functional data in the univariate case, in the first approach we apply current multivariate functional depths to the fitted data (the idea of López-Pintado & Wei 2011). In the second approach, we revise the notions of depth to handle sparse multivariate functional data by incorporating the fitted data and its confidence interval into the revised depth (see the proposal by Sguera & López-Pintado (2020) in the univariate case). We evaluate these two approaches in the following subsections and select the best depth for sparse multivariate functional data in a simulation study (Section 4.2).

2.2.1 Conventional Depth for Multivariate Functional Data

We take multivariate functional halfspace depth (MFHD, Hubert et al. 2015) as an example to represent depth revision in sparse settings in later sections; in principle, any reasonable multivariate functional depth can be used.

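We consider a $p$-variate stochastic process $\{\mathcal{X}(\mathbf{t}), \mathbf{t} \in \mathcal{T}\}$ on $\mathbb{R}^p$, where $\mathcal{T} = \mathcal{T}_1 \times \cdots \times \mathcal{T}_p$. Since multivariate functional depth (MFD) requires common time grids for all variables, we simplify $\mathcal{T}$ such that all variables lie in $\mathcal{T}_c$ and consider an arbitrary $\{\mathbf{X}(\mathbf{t}), \mathbf{t} \in \mathcal{T}_c \times \cdots \times \mathcal{T}_c\}$ on $\mathbb{R}^p$. This is equivalent to $\{\mathbf{X}(t), t \in \mathcal{T}_c\}$ on $\mathbb{R}^p$. We let $D(\cdot; F_{\mathbf{X}}): \mathbb{R}^p \to [0, 1]$ be a statistical depth function for the probability distribution of $\mathbf{X}$ with cdf $F_{\mathbf{X}}$. Accordingly, the depth region $D_\beta(F_{\mathbf{X}})$ at level $\beta \geq 0$ is defined as $D_\beta(F_{\mathbf{X}}) = \{\mathbf{X} \in \mathbb{R}^p : D(\mathbf{X}; F_{\mathbf{X}}) \geq \beta\}$ such that $\int_{\mathcal{T}_c} \mathrm{vol}\{D_\beta(F_{\mathbf{X}(t)})\} \, dt \in (0, +\infty)$.

We let $C^p(\mathcal{T}_c)$ be continuous paths for the $p$-variate stochastic process $\mathcal{X}(t)$. If $\mathbf{X}(t)$ is continuous in $C^p(\mathcal{T}_c)$ and $w$ is a weight function that integrates to one on the domain $\mathcal{T}_c$, then the MFD of $\mathbf{X}$ is defined as

$$\mathrm{MFD}(\mathbf{X}; F_{\mathbf{X}}) = \int_{\mathcal{T}_c} D(\mathbf{X}(t); F_{\mathbf{X}(t)}) \, w(t) \, dt,$$

where the weight $w(t)$ can be defined by either:

$$w(t) = \begin{cases} w & \text{constant, or} \\ \mathrm{vol}\{D_\beta(F_{\mathbf{X}(t)})\} \Big/ \int_{\mathcal{T}_c} \mathrm{vol}\{D_\beta(F_{\mathbf{X}(u)})\} \, du & \text{non-constant.} \end{cases}$$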

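The halfspace depth (HD, Tukey 1975) is defined as

$$\mathrm{HD}(\mathbf{X}(t); F_{\mathbf{X}(t)}) = \inf_{\mathbf{u} \in \mathbb{R}^p, \|\mathbf{u}\| = 1} P\{\mathbf{u}^\top \mathcal{X}(t) \geq \mathbf{u}^\top \mathbf{X}(t)\},$$

where $\mathcal{X}(t)$ is the random vector with cdf $F_{\mathbf{X}(t)}$ and $\mathbf{X}(t)$ is the point being evaluated. Hence, MFHD in the continuous time domain is expressed as:

$$\mathrm{MFHD}(\mathbf{X}; F_{\mathbf{X}}) = \int_{\mathcal{T}_c} \mathrm{HD}(\mathbf{X}(t); F_{\mathbf{X}(t)}) \, w(t) \, dt. \tag{5}$$

Similarly, under a finite sample of multivariate curve observations $\mathbf{X}^f(t_c) = \{\mathbf{X}_1(t_c), \ldots, \mathbf{X}_n(t_c); \; t_c \in \mathcal{T}_c, \; c = 1, \ldots, N_c\}$ with a cdf $F_{\mathbf{X}(t_c),n}$ at each time point $t_c$, the sample multivariate functional depth at $\mathbf{X} \in C^p(\mathcal{T}_c)$ is defined, with $t_0 = t_1$, $t_{N_c+1} = t_{N_c}$ and $W_c = \int_{(t_{c-1}+t_c)/2}^{(t_c+t_{c+1})/2} w(t) \, dt$, by

$$\mathrm{MFD}(\mathbf{X}; \mathbf{X}^f) = \sum_{c=1}^{N_c} D(\mathbf{X}(t_c); F_{\mathbf{X}(t_c),n}) \, W_c,$$

with

$$W_c = \begin{cases} w \cdot (t_{c+1} - t_{c-1})/2 & \text{constant, or} \\ \mathrm{vol}\{D_\beta(F_{\mathbf{X}(t_c),n})\}(t_{c+1} - t_{c-1}) \Big/ \Big\{\sum_{c'=1}^{N_c} \mathrm{vol}\{D_\beta(F_{\mathbf{X}(t_{c'}),n})\}(t_{c'+1} - t_{c'-1})\Big\} & \text{non-constant.} \end{cases}$$

Let $\#$ represent the number of counts. The halfspace depth in the finite-sample version is:

$$\mathrm{HD}(\mathbf{X}(t_c); F_{\mathbf{X}(t_c),n}) = \frac{1}{n} \min_{\mathbf{u} \in \mathbb{R}^p, \|\mathbf{u}\| = 1} \#\{\mathbf{X}_i(t_c), \; i = 1, \ldots, n : \mathbf{u}^\top \mathbf{X}_i(t_c) \geq \mathbf{u}^\top \mathbf{X}(t_c)\}.$$

Therefore, MFHD in the finite time domain is:

$$\mathrm{MFHD}(\mathbf{X}; \mathbf{X}^f) = \sum_{c=1}^{N_c} \mathrm{HD}(\mathbf{X}(t_c); F_{\mathbf{X}(t_c),n}) \, W_c. \tag{6}$$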
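The finite-sample definitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the minimum over unit vectors is approximated with random directions (exact only for $p = 1$), the constant weight is taken as $w = 1/(t_{N_c} - t_1)$ so that the $W_c$ sum to one, and the function names `halfspace_depth` and `mfhd` are hypothetical.

```python
import math
import random

def halfspace_depth(x, sample, n_dir=500, rng=None):
    """Approximate finite-sample halfspace depth of point x within sample.

    x is a length-p sequence and sample a list of n such sequences. The
    minimum over unit vectors u is approximated with n_dir random directions
    (for p = 1 the two directions +1 and -1 give the exact value).
    """
    rng = rng or random.Random(0)
    p, n = len(x), len(sample)
    if p == 1:
        directions = [(1.0,), (-1.0,)]
    else:
        directions = []
        for _ in range(n_dir):
            g = [rng.gauss(0.0, 1.0) for _ in range(p)]
            norm = math.sqrt(sum(v * v for v in g))
            directions.append(tuple(v / norm for v in g))
    best = n
    for u in directions:
        ux = sum(a * b for a, b in zip(u, x))
        count = sum(1 for xi in sample
                    if sum(a * b for a, b in zip(u, xi)) >= ux)
        best = min(best, count)
    return best / n

def mfhd(curve, sample_curves, grid):
    """Finite-sample MFHD with constant weight, in the spirit of Eq. (6).

    curve[c] and sample_curves[i][c] are p-vectors observed at grid[c].
    """
    n_grid = len(grid)
    total_length = grid[-1] - grid[0]
    depth = 0.0
    for c in range(n_grid):
        left = grid[c - 1] if c > 0 else grid[0]            # t_0 = t_1
        right = grid[c + 1] if c < n_grid - 1 else grid[-1]  # t_{N+1} = t_N
        weight = (right - left) / 2 / total_length           # W_c, sums to one
        cross_section = [s[c] for s in sample_curves]
        depth += halfspace_depth(curve[c], cross_section) * weight
    return depth
```

For five constant univariate curves with values 1 to 5, the middle curve receives depth 0.6 and the lowest curve 0.2, which matches the center-outward ordering the depth is meant to provide. An exact multivariate halfspace depth algorithm would replace the random-direction approximation in practice.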

2.2.2 Revised Depth for Sparse Multivariate Functional Data

We let $\widehat{\mathbf{X}}^f = \{\widehat{\mathbf{X}}_1, \ldots, \widehat{\mathbf{X}}_n\}$ be a set including $n$ samples of multivariate functional data fitted on common time grids. We also obtain a set of upper confidence bounds $\widehat{\mathbf{X}}^f_{ub} = \{\widehat{\mathbf{X}}_{ub,1}, \ldots, \widehat{\mathbf{X}}_{ub,n}\}$ and lower confidence bounds $\widehat{\mathbf{X}}^f_{lb} = \{\widehat{\mathbf{X}}_{lb,1}, \ldots, \widehat{\mathbf{X}}_{lb,n}\}$. We list the revised depth notion in the finite-sample version, and the continuous version can be analogously derived. Here we take MFHD as an instance to illustrate the revised notions of depth.

Besides applying the current multivariate functional depths to the fitted data, we consider three additional possibilities for revising the notion of depth for sparse multivariate functional data. The first two add the confidence bounds to the fitted dataset, giving the updated dataset $\widehat{\mathbf{X}}^f_{upd,1} := \widehat{\mathbf{X}}^f \cup \widehat{\mathbf{X}}^f_{ub} \cup \widehat{\mathbf{X}}^f_{lb}$, where the number of observations is three times the original one; see Sguera & López-Pintado (2020) for the univariate functional case. The third only considers the confidence intervals and combines the upper and lower confidence bounds into one observation, giving $\widehat{\mathbf{X}}^f_{upd,2} := \{(\widehat{\mathbf{X}}_{ub,1}^\top, \widehat{\mathbf{X}}_{lb,1}^\top)^\top, \ldots, (\widehat{\mathbf{X}}_{ub,n}^\top, \widehat{\mathbf{X}}_{lb,n}^\top)^\top\}$, where the number of variables is twice that of $\widehat{\mathbf{X}}^f$.

If we put $(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$ as a weight on the depth of the fitted data, upper confidence bound, and lower confidence bound, then we obtain the average-weight revised depth:

$$\mathrm{RMFHD}_{aw}(\widehat{\mathbf{X}}; \widehat{\mathbf{X}}^f_{upd,1}) = \frac{\mathrm{MFHD}(\widehat{\mathbf{X}}; \widehat{\mathbf{X}}^f_{upd,1}) + \mathrm{MFHD}(\widehat{\mathbf{X}}_{ub}; \widehat{\mathbf{X}}^f_{upd,1}) + \mathrm{MFHD}(\widehat{\mathbf{X}}_{lb}; \widehat{\mathbf{X}}^f_{upd,1})}{3}. \tag{7}$$

If we put $(\frac{1}{2}, \frac{1}{4}, \frac{1}{4})$ as a weight on the depth of the fitted data, upper confidence bound, and lower confidence bound, then we obtain the nonaverage-weight revised depth:

$$\mathrm{RMFHD}_{naw}(\widehat{\mathbf{X}}; \widehat{\mathbf{X}}^f_{upd,1}) = \frac{2\,\mathrm{MFHD}(\widehat{\mathbf{X}}; \widehat{\mathbf{X}}^f_{upd,1}) + \mathrm{MFHD}(\widehat{\mathbf{X}}_{ub}; \widehat{\mathbf{X}}^f_{upd,1}) + \mathrm{MFHD}(\widehat{\mathbf{X}}_{lb}; \widehat{\mathbf{X}}^f_{upd,1})}{4}. \tag{8}$$

We also consider building a $2p$-variate dataset $\widehat{\mathbf{X}}_{upd,2}(t_c)$ with size $n$, and name the resulting depth the double-multivariate revised depth:

$$\mathrm{RMFHD}_{dm}(\widehat{\mathbf{X}}; \widehat{\mathbf{X}}^f_{upd,2}) = \mathrm{MFHD}\big((\widehat{\mathbf{X}}_{ub}^\top, \widehat{\mathbf{X}}_{lb}^\top)^\top; \widehat{\mathbf{X}}^f_{upd,2}\big). \tag{9}$$

We apply the conventional depth MFHD and its revised forms RMFHD to various data settings in simulations in Section 4.1, and we identify the best depth according to the Spearman rank correlation coefficient in Section 4.2.
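The weighting in Eqs. (7) and (8) can be sketched for any depth function. In the example below, `toy_depth` is a deliberately simple univariate stand-in for MFHD (an average of pointwise univariate halfspace depths on an equally spaced grid), and `revised_depths` and `stack_bounds` are hypothetical helper names; none of this is the authors' code.

```python
def toy_depth(curve, sample):
    """Stand-in for MFHD: mean pointwise univariate halfspace depth."""
    n = len(sample)
    total = 0.0
    for c, value in enumerate(curve):
        above = sum(1 for s in sample if s[c] >= value)
        below = sum(1 for s in sample if s[c] <= value)
        total += min(above, below) / n
    return total / len(curve)

def revised_depths(fit, ub, lb, fits, ubs, lbs, depth=toy_depth):
    """Average-weight and nonaverage-weight revised depths of one curve."""
    upd1 = fits + ubs + lbs                   # updated dataset with 3n curves
    d_fit, d_ub, d_lb = (depth(x, upd1) for x in (fit, ub, lb))
    aw = (d_fit + d_ub + d_lb) / 3            # weights (1/3, 1/3, 1/3), Eq. (7)
    naw = (2 * d_fit + d_ub + d_lb) / 4       # weights (1/2, 1/4, 1/4), Eq. (8)
    return aw, naw

def stack_bounds(ubs, lbs):
    """Build the second updated dataset: each curve pairs (upper, lower) values."""
    return [list(zip(ub, lb)) for ub, lb in zip(ubs, lbs)]

fits = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
ubs = [[v + 0.5 for v in f] for f in fits]
lbs = [[v - 0.5 for v in f] for f in fits]
aw, naw = revised_depths(fits[1], ubs[1], lbs[1], fits, ubs, lbs)
```

The double-multivariate depth of Eq. (9) would apply a multivariate depth to the output of `stack_bounds`, which doubles the number of variables while keeping $n$ observations; the toy depth above is univariate, so that step is left out of the sketch.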

3 Construction of Sparse Functional Boxplots

To categorize possible sparseness scenarios, we consider three sparseness types in marginal functional data: point, peak, and partial sparseness. They represent curve sparseness appearing randomly, in a peak cluster, and in a partial cluster from a given start time until some random time. Hence, the sparseness can be tuned by the sparseness parameter $\mathbf{p}_s = (p_{sparse}, p_{curve}, t_{start})^\top$, where we use $p_{sparse}$ to denote the probability of a curve with missing values in a group of curves, $p_{curve}$ to denote the probability of sparseness in each of those curves with missing values, and $t_{start}$ to define the initial sparse point in the peak and partial sparseness cases:

$$t_{start} = \begin{cases} 0, & \text{point sparseness,} \\ U[0, 1 - p_{curve}], & \text{peak and partial sparseness,} \end{cases}$$

where $U$ means a random sampling from the uniform distribution.

The difference between peak and partial sparseness lies in $t_{start}$. In the peak sparseness, each curve generates $t_{start}$ independently, while in the partial sparseness, all curves with missing values share a common $t_{start}$. Usually, missing values appear in the middle intervals of time in the partial sparseness. When missing values appear at the start or end time, it is difficult for RMFPCA to reconstruct curves with suitable fittings unless the curves follow simple monotone trends, such as the data application in Section 5.2.
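The three sparseness types can be mimicked with a small mask generator. The sketch below rests on several assumptions that the text does not fix: the grid is rescaled to $[0, 1]$, point sparseness removes a fraction $p_{curve}$ of the grid points chosen at random, and peak and partial sparseness both remove one contiguous block of relative length $p_{curve}$ starting at $t_{start}$. The function name `missing_mask` is hypothetical.

```python
import random

def missing_mask(n_curves, grid, p_sparse, p_curve, kind, rng=None):
    """Return mask[i][k] == True where curve i is missing at grid[k].

    kind is "point", "peak", or "partial"; grid is assumed to lie in [0, 1].
    """
    rng = rng or random.Random(0)
    n_grid = len(grid)
    # Partial sparseness: one start time shared by all curves with missing values.
    shared_start = rng.uniform(0.0, 1.0 - p_curve)
    mask = []
    for _ in range(n_curves):
        row = [False] * n_grid
        if rng.random() < p_sparse:          # this curve has missing values
            if kind == "point":
                # Assumption: drop a fixed fraction of randomly chosen points.
                for k in rng.sample(range(n_grid), round(p_curve * n_grid)):
                    row[k] = True
            else:
                # Peak sparseness: each curve draws its own start time.
                start = (shared_start if kind == "partial"
                         else rng.uniform(0.0, 1.0 - p_curve))
                row = [start <= t < start + p_curve for t in grid]
        mask.append(row)
    return mask

grid = [k / 49 for k in range(50)]
mask_point = missing_mask(20, grid, p_sparse=1.0, p_curve=0.2, kind="point")
mask_partial = missing_mask(20, grid, p_sparse=1.0, p_curve=0.2, kind="partial")
```

With `kind="partial"` and $p_{sparse} = 100\%$, every curve is missing on the same interval, whereas `kind="peak"` shifts the interval from curve to curve.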

The depth revision in Section 2 makes the visualization of sparse multivariate functional data possible, for instance, with the functional boxplot (Sun & Genton 2011) and the two-stage functional boxplot (Dai & Genton 2018a). Nevertheless, directly applying the current visualization tools treats the fitted data as if they were complete and hides the characteristics of sparseness. Hence, we display the sparseness features based on the functional boxplot and its two-stage form, and extend the functional boxplot and two-stage functional boxplot to the sparse case. We coin them the sparse functional boxplot and the sparse two-stage functional boxplot, respectively.

To illustrate the visualization tools we construct in this section, we generate univariate functional data without outliers through a Gaussian process with zero mean and exponential covariance $C(t, s) = \exp\{-|t - s|\}$. After sparsifying the samples by tuning $\mathbf{p}_s$, we fit them with BMFPCA, and then use the visualization tools on this example as shown in Figure 3. When the data are complete, the sparse functional boxplot (1st column, Figure 3) reduces to the conventional form (functional boxplot), and so does the sparse two-stage functional boxplot. Compared to the existing functional boxplot, the sparse functional boxplot is different in the visualization of the median, the 50% central region, and the detected outliers. The median is drawn in black for the observed values and gray for the missing ones (Figure 3, first row). After defining the 50% central region, we calculate the sparseness proportion for each time point. We display this percentage with a smoothed boundary line showing the observed proportion below and the sparseness proportion above. We fill the observed proportion area in magenta and the sparseness proportion area in gray, with a thin dotted cyan reference line representing the case of 50% sparseness proportion at every time point. For the detected outliers, the fitted missing values are represented by dashed gray, whereas the observed values are marked with dashed green or red, depending on whether they are detected as outliers with directional outlyingness or with the functional boxplot.
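The sparseness proportion shown in the central region can be computed as sketched below, assuming the central region is formed by the $\lceil n/2 \rceil$ deepest curves and that a boolean mask marks the missing values; the function name `central_sparseness` and both inputs are hypothetical.

```python
import math

def central_sparseness(depths, mask):
    """Proportion of missing values per time point among the deepest half.

    depths[i] is the depth of curve i; mask[i][k] is True where curve i is
    missing at time index k.
    """
    n = len(depths)
    n_central = math.ceil(n / 2)
    # Indices of the curves forming the 50% central region (assumed rule).
    central = sorted(range(n), key=lambda i: depths[i], reverse=True)[:n_central]
    n_grid = len(mask[0])
    return [sum(mask[i][k] for i in central) / n_central for k in range(n_grid)]

depths = [0.9, 0.1, 0.8, 0.2]
mask = [[True, False], [True, True], [False, False], [True, True]]
proportions = central_sparseness(depths, mask)
```

Here the two deepest curves are the first and third, so half of the central curves are missing at the first time point and none at the second.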

To show the pattern of the intensity of sparse points, we construct additional visualization tools: the intensity sparse functional boxplot and the intensity sparse two-stage functional boxplot. We build a sparseness point process within the 50% central region and plot its intensity, with white representing the most sparse intensity and magenta the least sparse intensity. We leave an option of whether or not contours are shown in the central regions. We also leave an option of normalization: the intensity divided by the maximum within the variable, or the intensity divided by the maximum among all variables. Usually, we recommend normalizing the intensity by dividing by the maximum within the variable since the sparseness proportion of each variable can be identified through the sparse functional boxplot.

[Figure 3. Top row: three panels titled Sparse Functional Boxplot. Bottom row: three panels titled Intensity Sparse Functional Boxplot, with a color scale from 0% to 100%. x-axis: Time; y-axis: Values. Columns correspond to pcurve = 0%, 20%, and 60%.]

Figure 3: Plots on top are the sparse functional boxplots and plots on bottom are the intensity sparse functional boxplots for univariate functional data generated from a Gaussian process with mean zero and exponential covariance function. Plots from left to right display various sparseness cases with pcurve of 0%, 20%, and 60%, and psparse = 100% under the point sparseness setting.

In Figure 3, we normalize the intensity by dividing by the maximum within the variable because various sparse settings are derived from the same univariate data. In this way, we can compare the relative intensity horizontally among different sparseness cases. It is clear that when the curve sparseness pcurve rises from 0% to 20% and then to 60%, the sparseness intensity within the central area generally increases, which can be seen from the corresponding color change from the all-over magenta, to orange and magenta, and finally to mainly yellow and some white. Simultaneously, the sparse functional boxplot changes from no gray area to almost 20% gray area, and then nearly 60% gray area in proportion to the central region, reflecting the change in pcurve when psparse = 100%.

Overall, when combined, the sparse functional boxplot and the intensity sparse functional boxplot provide information about the proportion and intensity of sparse points within the central areas. The choice between the sparse functional boxplot and the sparse two-stage functional boxplot depends on the existence of outliers in the data. When no outliers exist, there is no difference between the sparse functional boxplot and the sparse two-stage functional boxplot. Specifically, the simulation in Section 4.3 explores performances of the sparse functional boxplot and its two-stage form when detecting multivariate functional outliers.

4 Simulation Study

This section starts with an introduction of the data settings. Next, we address two problems in the simulations: first, we explore the choice of best depth; second, we measure the outlier detection performances of the sparse functional boxplot and the sparse two-stage functional boxplot with the best depth.

4.1 Data Settings

For simplification, we assume $p = 3$. The complete data are defined on common time grids. Then we assign the point, peak, and partial sparseness to variables 1, 2, and 3 in the simulation. We set $n = 100$, and the time points $t$ are 50 equidistant points from $[0, 1]$ for all settings. Here $\boldsymbol{\mu}(t) = (5\sin(2\pi t), 5\cos(2\pi t), 5(t - 1)^2)^\top$, $\boldsymbol{\epsilon}_i(t) \overset{\text{iid}}{\sim} N_3(\mathbf{0}, \mathrm{diag}\{\sigma_1^2, \sigma_2^2, \sigma_3^2\})$, where $\sigma_1^2, \sigma_2^2, \sigma_3^2$ are independent and follow $U[0.5, 0.7]$. We use orthonormal Fourier basis functions to construct $\psi_m^{(j)}(t)$. We let $M = 9$, $\rho_{i,m} \overset{\text{iid}}{\sim} N(0, \nu_m)$, and $\nu_m = \frac{M + 1 - m}{M}$. Eight models are provided by specifying the eigenfunctions and outliers as below. Model 1 is a reference model without contamination, while the remaining models include contamination by adjusting $\boldsymbol{\mu}(t)$, $\mathbf{u}_i(t)$, or $\boldsymbol{\epsilon}_i(t)$, with a contamination level of 10%.

Model 1 (no outlier):

$$\mathbf{Y}_i(t) = \boldsymbol{\mu}(t) + \mathbf{u}_i(t) + \boldsymbol{\epsilon}_i(t) = \boldsymbol{\mu}(t) + \sum_{m=1}^{M} \rho_{i,m} \boldsymbol{\psi}_m(t) + \boldsymbol{\epsilon}_i(t), \quad i = 1, \ldots, n.$$
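Model 1 can be simulated along the following lines. This is a sketch under an explicit assumption about the eigenfunctions, which the text does not spell out: every variable uses the same first $M$ orthonormal Fourier functions on $[0, 1]$, scaled by $1/\sqrt{p}$ so that the multivariate eigenfunction has unit norm. The function names `fourier_basis` and `simulate_model1` are hypothetical.

```python
import math
import random

def fourier_basis(m, t):
    """m-th orthonormal Fourier function on [0, 1], for m = 1, 2, ..."""
    if m == 1:
        return 1.0
    k = m // 2
    if m % 2 == 0:
        return math.sqrt(2.0) * math.sin(2 * math.pi * k * t)
    return math.sqrt(2.0) * math.cos(2 * math.pi * k * t)

def simulate_model1(n=100, n_grid=50, M=9, seed=0):
    """Generate n trivariate curves from Model 1 on an equidistant grid."""
    rng = random.Random(seed)
    p = 3
    grid = [c / (n_grid - 1) for c in range(n_grid)]
    mean = [lambda t: 5 * math.sin(2 * math.pi * t),
            lambda t: 5 * math.cos(2 * math.pi * t),
            lambda t: 5 * (t - 1) ** 2]
    nu = [(M + 1 - m) / M for m in range(1, M + 1)]      # eigenvalues nu_m
    sigma2 = [rng.uniform(0.5, 0.7) for _ in range(p)]    # error variances
    data = []
    for _ in range(n):
        rho = [rng.gauss(0.0, math.sqrt(v)) for v in nu]  # scores rho_{i,m}
        # Assumed eigenfunctions: shared Fourier basis scaled by 1/sqrt(p).
        u = [sum(rho[m] * fourier_basis(m + 1, t) for m in range(M)) / math.sqrt(p)
             for t in grid]
        curve = [[mean[j](t) + u_t + rng.gauss(0.0, math.sqrt(sigma2[j]))
                  for t, u_t in zip(grid, u)]
                 for j in range(p)]
        data.append(curve)
    return grid, data

grid, data = simulate_model1()
```

`data[i][j][c]` holds the value of variable $j$ of curve $i$ at `grid[c]`. Other ways of splitting the Fourier basis across the three variables would be equally compatible with the description above.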

Model 2 (persistent magnitude outlier):

$$u_{i,ou}^{(j)}(t) = u_i^{(j)}(t) + 8W_j \quad \text{for } t \in [0, 1],$$

where $W_j$ ($j = 1, 2, 3$) takes the value 1 with probability 0.5 and $-1$ with probability 0.5.

Model 3 (isolated magnitude outlier):

$$u_{i,ou}^{(j)}(t) = u_i^{(j)}(t) + 8W_j \quad \text{for } t \in [T_s, T_s + 0.1],$$

where $T_s$ is from $U[0, 0.9]$, and $W_j$ follows the same distribution as in Model 2.
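The magnitude contamination of Models 2 and 3 amounts to shifting one component curve by $\pm 8$, either everywhere or on a short random interval. A minimal sketch, with the hypothetical helper `contaminate` applied to one variable of one curve:

```python
import random

def contaminate(u, grid, model, rng):
    """Shift the component curve u by 8 * W as in Model 2 or Model 3.

    Model 2 shifts the whole curve; Model 3 shifts only [T_s, T_s + 0.1].
    """
    W = rng.choice([-1, 1])                  # +1 or -1 with probability 0.5 each
    if model == 2:
        return [v + 8 * W for v in u]
    T_s = rng.uniform(0.0, 0.9)
    return [v + 8 * W if T_s <= t <= T_s + 0.1 else v
            for v, t in zip(u, grid)]

rng = random.Random(1)
grid = [k / 49 for k in range(50)]
zeros = [0.0] * 50
persistent = contaminate(zeros, grid, 2, rng)
isolated = contaminate(zeros, grid, 3, rng)
```

Calling the helper once per variable $j$ draws a separate $W_j$ for each component, as in the model definitions.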

Model 4 (shape outlier I):

$$\boldsymbol{\mu}_{ou}(t) = (\mu^{(1)}(t - 0.3), \mu^{(2)}(t - 0.2), \mu^{(3)}(t - 0.5))^\top.$$

Model 5 (shape outlier II):

$$u_{i,ou}^{(1)}(t) = u_i^{(1)}(t) + 2\sin(4\pi t), \quad u_{i,ou}^{(2)}(t) = u_i^{(2)}(t) + 2\cos(4\pi t), \quad u_{i,ou}^{(3)}(t) = u_i^{(3)}(t) + 2\cos(8\pi t),$$

$$u_{i,no}^{(j)}(t) = u_i^{(j)}(t) + U_j, \quad j = 1, 2, 3,$$

where $U_j$ is generated from $U[-2.1, 2.1]$.

Model 6 (mixed outlier):

$$\boldsymbol{\mu}_{i,ou}(t) = \big((2 + R_i^{(1)})\mu^{(1)}(t), \; (2 + R_i^{(2)})\mu^{(2)}(t), \; (2 + R_i^{(3)})\mu^{(3)}(t) - 6\big)^\top,$$

where $R_i^{(j)}$ ($j = 1, 2, 3$) follows $\mathrm{Exp}(2)$, and Exp is a random sampling from the exponential distribution.

Model 7 (joint outlier):

$$u_{i,ou}^{(1)}(t) = u_i^{(1)}(t) + Z_1 t \sin(\pi t), \quad u_{i,ou}^{(2)}(t) = u_i^{(2)}(t) + Z_2 t \cos(\pi t), \quad u_{i,ou}^{(3)}(t) = u_i^{(3)}(t) + Z_3 t \sin(2\pi t) - 6,$$

$$u_{i,no}^{(1)}(t) = u_i^{(1)}(t) + Z_4 t \sin(\pi t), \quad u_{i,no}^{(2)}(t) = u_i^{(2)}(t) + (8 - Z_4) t \cos(\pi t), \quad u_{i,no}^{(3)}(t) = u_i^{(3)}(t) + (Z_4 - 2) t \sin(2\pi t) - 6,$$

where $Z_1$ to $Z_4$ are generated from $U[2, 8]$.

Model 8 (covariance outlier): $\epsilon_i(t)$ are generated from a stationary isotropic cross-covariance model belonging to the Matérn family (Gneiting et al. 2010) with different smoothness parameters $\nu_{ij} > 0$:

$$C_{ii}(s, t) = \sigma_i^2 M(|s - t|; \nu_{ii}), \quad i = 1, 2, 3,$$

$$C_{ij}(s, t) = \rho_{ij}\sigma_i\sigma_j M(|s - t|; \nu_{ij}), \quad 1 \le i \ne j \le 3,$$

and $M(r; \nu_{ij}) = \frac{2^{1-\nu_{ij}}}{\Gamma(\nu_{ij})}(\sqrt{2\nu_{ij}}\,r)^{\nu_{ij}} K_{\nu_{ij}}(\sqrt{2\nu_{ij}}\,r)$, where $K_\nu$ is the modified Bessel function of the second kind of order $\nu$, $\nu_{ii}$ is generated from the uniform distribution, with ranges [2, 3] for the nonoutliers and [0.3, 0.5] for the outliers, and $\nu_{ij} = \frac{\nu_{ii} + \nu_{jj}}{2}$.

The results from the simulations containing Models 2 (persistent magnitude outlier case) and 4 (shape outlier I case) with a 10% proportion of outliers are presented in this paper, while the results from the remaining models are shown in the Supplementary Material. Simulations and fittings of the data in Models 1, 2, and 4 with pcurve = 20% and psparse = 100% are depicted in Figure 4. The fitted data capture the typical tendency of the curve for both outliers and non-outliers in the above models.
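To see how the smoothness parameter in Model 8 shapes the correlation, it may help to work out one special case of the Matérn correlation $M(r; \nu)$ defined there. The sketch below takes $\nu = 1/2$ (a value chosen purely for illustration; it lies just at the upper end of the outlier range) and uses the standard identities $K_{1/2}(z) = \sqrt{\pi/(2z)}\,e^{-z}$ and $\Gamma(1/2) = \sqrt{\pi}$.

```latex
\begin{aligned}
M(r; \tfrac{1}{2})
  &= \frac{2^{1/2}}{\Gamma(1/2)}\,\bigl(\sqrt{2 \cdot \tfrac12}\, r\bigr)^{1/2}\, K_{1/2}\bigl(\sqrt{2 \cdot \tfrac12}\, r\bigr) \\
  &= \frac{\sqrt{2}}{\sqrt{\pi}}\; r^{1/2}\; \sqrt{\frac{\pi}{2r}}\; e^{-r} \\
  &= e^{-r}.
\end{aligned}
```

So for small $\nu$ the correlation behaves like the exponential model, whose sample paths are rough, whereas the nonoutlier range $\nu_{ii} \in [2, 3]$ gives much smoother noise. The outliers in Model 8 therefore differ from typical curves in roughness rather than in level or shape.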

4.2 Simulation I: Choice of Depth

We apply the conventional depth to the MFPCA and BMFPCA fits (Eq. (2)-(3)), and name them MFHDmfpca and MFHDbmfpca, respectively. The BMFPCA fit is more robust and closer to the original data than the MFPCA fit. Hence, we apply the remaining revised depths RMFHDaw,

RMFHDnaw, and RMFHDdm (Eq. (7)-(9)) to the BMFPCA ﬁt.

The Spearman rank correlation coefficient assesses how well the ranks based on the fitted data, given by the revised depth, agree with the ranks based on the original information, given by the conventional depth. The closer the coefficient is to 1, the stronger the association; the closer it is to 0, the weaker the association.
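As a reminder of how this coefficient is computed, here is a small self-contained sketch: the two depth vectors are converted to ranks (ties receive average ranks) and the Pearson correlation of the ranks is returned. The function names are illustrative; in practice a library routine would normally be used.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Here `x` would hold the depth values computed on the fitted sparse data and `y` the depth values computed on the complete data; only the ordering of the curves matters, not the depth values themselves.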

Figure 4: Simulations and fittings for Models 1 (no outlier), 2 (persistent magnitude outlier), and 4 (shape outlier I), with pcurve = 20% and psparse = 100%. [Panels: "Simulation for Model k : Variable j" and "Fitting for Model k : Variable j" for k = 1, 2, 4 and j = 1, 2, 3; horizontal axis: Time; vertical axis: Values.] Typical curves are colored in black in the simulation plot (or blue in the fitting plot), and outliers are colored in red, with the observed points shown as black dots. Artificial sparseness is represented by gray dashed lines. We assign the point, peak, and partial sparseness to variables 1, 2, and 3 in the simulation.

The Spearman coefficients of the above five methods are presented in Figure 5 under various pcurve with psparse = 100% for Models 1-8 when the point sparseness exists in all variables. The results of the other sparseness types can be seen in the Supplementary Material. In all scenarios, the BMFPCA fit leads to the strongest rank association with the original data. Although

RMFHDdm achieves a higher Spearman coeﬃcient than RMFHDaw and RMFHDnaw when pcurve is small, it is not as robust as MFHDbmfpca when the curve sparseness reaches 50% and above.

All of them show a slightly weaker association when the curves are sparser on average. The Spearman coefficient may come closer to 0.95 when α in (4) is increased from 0.05 to above 0.2 for all RMFHDs; RMFHDaw in the univariate setting follows this pattern (Sguera & López-Pintado 2020). However, seeking the best α requires more computation, and when α is above 0.05, the result is usually not statistically convincing.

Overall, MFHDbmfpca is the best depth for sparse multivariate functional data due to its strong rank association with the original data and its simple procedure of application. It omits the confidence interval construction and thus requires no additional depth modifications and is more efficient in computation.

Figure 5: Comparison of five depth ranking methods (MFHDmfpca, MFHDbmfpca, RMFHDaw, RMFHDnaw, and RMFHDdm). The panels from top to bottom represent boxplots of Spearman coefficients under various data settings (Models 1-8). In each setting, pcurve are given from 10% to 60%, with psparse being 100% in the point sparseness type.

4.3 Simulation II: Choice of Visualization Tools

We apply the sparse functional boxplot and the sparse two-stage functional boxplot in diﬀerent scenarios with MFHDbmfpca. Usually, the two-stage functional boxplot (Dai & Genton 2018a) detects both magnitude and shape outliers better, with the help of directional outlyingness (Dai

& Genton 2019). We use pc, the correct detection rate (the number of correctly detected outliers divided by the number of outliers) and pf , the false detection rate (the number of falsely detected outliers divided by the number of non-outliers) to measure the performance of the above two tools in multivariate functional outlier detection.
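The two rates can be computed directly from index sets. The sketch below assumes curves are identified by integer indices and that the true outlier labels are known, as they are in a simulation; the function name `detection_rates` is illustrative.

```python
def detection_rates(true_outliers, detected, n_curves):
    """Return (p_c, p_f) for one simulated data set.

    p_c: correctly detected outliers / number of true outliers.
    p_f: falsely detected curves / number of non-outliers.
    """
    true_set, det_set = set(true_outliers), set(detected)
    n_out = len(true_set)
    n_non = n_curves - n_out
    p_c = len(true_set & det_set) / n_out if n_out else float("nan")
    p_f = len(det_set - true_set) / n_non if n_non else float("nan")
    return p_c, p_f
```

For instance, with 100 curves of which 10 are outliers, flagging 8 of the true outliers plus 2 typical curves gives p_c = 8/10 and p_f = 2/90. Note that p_f is normalized by the 90 non-outliers, not by the number of flagged curves.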

In Section 4.1, we introduced various multivariate functional outliers in Models 2-8. Here in Table 1, we display the performance of pc and pf for the above models when the point sparseness is assigned to all variables. The performances for the other sparseness types are provided in the Supplementary Material. Overall, the sparse two-stage functional boxplot performs better in detecting true outliers than the sparse functional boxplot, at the expense of a slightly higher pf .

BMFPCA fits the curves after removing the mean trend; hence, it manages to capture the typical behavior of the curves independent of their amplitude. When outliers show an abnormality, especially a shift in amplitude or time, the sparse two-stage functional boxplot obtains a high pc. However, there are some cases when the sparse two-stage functional boxplot does not detect outliers correctly. One such case is when the outlying curves only show the abnormality in a small time interval, such as Model 3 (isolated magnitude outlier). As the fitted curve does not recapture the shape of the outlier correctly, one alternative is to impute the missing data with the fitted data and keep the observed data. Another case appears when the abnormality lies in smoothness differences between the curves, such as Models 5 (shape outlier II) and 8 (covariance outlier). It is difficult to detect the outlying samples because the fitted curves show few smoothness differences. On the whole, we recommend the sparse two-stage functional boxplot for detecting potential outliers.

Table 1: The mean and standard deviation (in parentheses) of the percentages pc and pf for the sparse functional boxplot and the sparse two-stage functional boxplot with 1000 replications and 100 curves when the curve sparseness pcurve is 20%, 40%, and 60%, respectively, and psparse = 100% for the point sparseness type, in the presence of Models 2-8. The proportion of outliers in each model is 10%.

| Model | Method | pc (pcurve = 20%) | pf (pcurve = 20%) | pc (pcurve = 40%) | pf (pcurve = 40%) | pc (pcurve = 60%) | pf (pcurve = 60%) |
|---|---|---|---|---|---|---|---|
| Model 2 (persistent magnitude outlier) | Sparse | 64.9 (25.0) | 0.0 (0.0) | 61.4 (23.5) | 0.0 (0.0) | 66.0 (25.5) | 0.0 (0.0) |
| | Sparse Two-Stage | 100.0 (0.0) | 0.0 (0.0) | 100.0 (0.0) | 0.0 (0.0) | 100.0 (0.0) | 0.0 (0.0) |
| Model 3 (isolated magnitude outlier) | Sparse | 2.8 (7.0) | 0.0 (0.1) | 4.5 (6.2) | 0.0 (0.1) | 10.9 (10.7) | 0.3 (0.8) |
| | Sparse Two-Stage | 11.0 (12.7) | 0.9 (1.2) | 20.4 (13.6) | 1.8 (2.0) | 31.8 (16.0) | 5.2 (4.6) |
| Model 4 (shape outlier I) | Sparse | 15.7 (19.4) | 0.0 (0.0) | 15.4 (21.4) | 0.0 (0.0) | 13.5 (19.4) | 0.0 (0.0) |
| | Sparse Two-Stage | 97.8 (4.9) | 0.1 (0.5) | 97.1 (6.3) | 0.1 (0.4) | 97.5 (4.8) | 0.1 (0.4) |
| Model 5 (shape outlier II) | Sparse | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) |
| | Sparse Two-Stage | 3.0 (7.0) | 1.0 (1.0) | 2.5 (6.0) | 1.0 (1.0) | 3.2 (6.5) | 1.0 (1.1) |
| Model 6 (mixed outlier) | Sparse | 7.5 (9.4) | 0.0 (0.0) | 6.7 (9.2) | 0.0 (0.0) | 7.8 (7.7) | 0.0 (0.0) |
| | Sparse Two-Stage | 64.2 (16.5) | 0.5 (0.8) | 62.2 (16.4) | 1.8 (0.7) | 66.1 (17.9) | 2.1 (0.8) |
| Model 7 (joint outlier) | Sparse | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) |
| | Sparse Two-Stage | 55.0 (17.0) | 9.2 (3.2) | 53.7 (15.9) | 9.3 (2.6) | 51.4 (14.4) | 8.1 (2.5) |
| Model 8 (covariance outlier) | Sparse | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) |
| | Sparse Two-Stage | 1.6 (4.0) | 0.9 (1.2) | 1.1 (3.7) | 0.9 (1.1) | 1.2 (3.7) | 1.0 (1.2) |

5 Applications

Applications include the univariate CD4 cell counts and bivariate malnutrition data introduced in Section 1. The analysis follows the procedure of fitting the data and then choosing between the sparse functional boxplot and its two-stage form. We do not show contours in the intensity sparse two-stage functional boxplot for the CD4 cell counts due to the narrow range in the central region, but we offer them for the malnutrition data. Here, we implement the normalization by dividing the intensity by the maximum within each variable. The comparisons of sparseness proportion between variables are displayed in the sparse functional boxplots.
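The normalization step mentioned above amounts to a per-variable rescaling. A minimal sketch follows, assuming the intensity of each variable has already been evaluated on some grid and stored as a list of non-negative numbers; the function name `normalize_intensity` and the data layout are hypothetical.

```python
def normalize_intensity(intensity_by_variable):
    """Divide each variable's intensity values by that variable's maximum."""
    normalized = []
    for values in intensity_by_variable:
        peak = max(values)
        # A variable with no sparse points (peak 0) is left at zero intensity.
        normalized.append([v / peak if peak > 0 else 0.0 for v in values])
    return normalized
```

After this step the largest intensity in every variable equals 1, so the color scales of different variables are comparable in relative terms only.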

5.1 Univariate Case: CD4 Cell Counts Data

The human immunodeficiency virus (HIV) harms the body by attacking an immune cell called the CD4 cell, thus making the body more vulnerable to other illness-causing germs. Because of this, the CD4 cell count per cubic millimeter of blood (Taylor et al. 1989) can be used to track the progression of HIV. A person with untreated HIV will experience a series of stages: acute HIV infection (2-6 weeks), stage one (1-5 years), stage two (6-9 years), stage three (9-11 years), which is also called the acquired immunodeficiency syndrome (AIDS), and stage four (11-12 years). In the acute HIV infection, the CD4 cell counts go down from 1200 to 900; in stage one, the CD4 cell counts drop slowly from 900 to 500; during stage two, they may go down from 500 to 350 (HIV.gov 2020). These CD4 cell count data (available in the refund package in R) were analysed by Goldsmith et al. (2013) to construct corrected confidence intervals for sparse functional data. They include observed CD4 cell counts for 366 infected individuals from −18 to 42 months (Figure 1), where 0 represents seroconversion, the moment when the antibody becomes detectable in the blood. The observation period in this data set includes acute HIV infection and stage one, or stage two for some patients who are in worse situations.

The CD4 cell count is a point sparseness case with psparse = 100.0% and pcurve = 91.5%, which is displayed by the magenta area in the sparse two-stage functional boxplot in Figure 6. The sparse functional boxplot detects seven outliers, a contamination level of 1.9%. The sparse two-stage functional boxplot detects 65 outliers from directional outlyingness and two extra outliers from the functional boxplot, a contamination level of 18.3%. Figure 6 contains the CD4 cell counts after they were fitted with the iterated expectation from UFPCA, and ordered according to the modified band depth (MBD, López-Pintado & Romo 2009). Compared to the sparse observations shown in Figure 1, the fit is robust and not overfitted, with an estimated range between 0 and 1600, which lies in the range of low and normal CD4 cell count per cubic millimeter of blood. Additionally, the intensity sparse two-stage functional boxplot indicates that the missing values peak at two periods, one at the start and another during 15-30 months after seroconversion, with estimated values around 530 and 400, respectively.
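The two contamination levels quoted for the CD4 data follow from the number of detected outliers divided by the 366 individuals. The short check below uses only the counts stated in the text.

```latex
\frac{7}{366} \approx 0.019 = 1.9\%,
\qquad
\frac{65 + 2}{366} = \frac{67}{366} \approx 0.183 = 18.3\%.
```

The gap between the two figures comes almost entirely from the 65 curves flagged by directional outlyingness in the first stage.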

In this analysis, we wish to review the progression of an HIV infection during different observation periods according to the visualization tools we introduced. In the window period, before the HIV antibody can be detected in the blood, CD4 cell counts have an inverse U shape, which can be seen from outliers and the upper fence of the central region, implying the process of rapid replication and severe drop in CD4 cells. This reflects the HIV first invading the body, stimulating CD4 cells to replicate, and then infecting CD4 cells by injecting them with its genetic material, leading to a fast drop in the number of CD4 cells.

Figure 6: The sparse two-stage functional boxplot and the intensity sparse two-stage functional boxplot for observed CD4 cell counts. Sixty-five outliers (green dashes) are detected from directional outlyingness, and two outliers (red dashes) are detected from the functional boxplot. [Panels: "Sparse Two-Stage Functional Boxplot: Observed CD4 Counts" and "Intensity Sparse Two-Stage Functional Boxplot: Observed CD4 Counts"; horizontal axis: Months since seroconversion; vertical axis: Total CD4 Counts; color scale: 0-100%.]

Although the CD4 cell count varies across patients, the 50% central region shows a relatively narrow range. It starts between 450 and 800 at 18 months before seroconversion and ends between

350 and 550 at 42 months after seroconversion. The median shows a change in CD4 cell counts from 600 to 350 during the observation period, which implies that the corresponding patient has reached stage two by the end of the observation period. Half of the patients are below this level

42 months after seroconversion, implying that they have not received the treatment, or that the medicine may not be eﬀective for them. We see six outliers with CD4 cell counts below 350 at 40 months after seroconversion, implying that they are close to, or already at, stage three (AIDS).

However, most of the outliers show a rising trend of CD4 cell counts above 500 in the period after seroconversion, suggesting that they are on the right track against HIV. Surprisingly, there are several patients with CD4 cell counts around or above 1000 at 40 months after seroconversion, whose treatment or lifestyle may deserve further study.

5.2 Bivariate Case: Malnutrition Data

The malnutrition data, from the United Nations Children's Fund (UNICEF 2020) data warehouse, include two variables, stunted growth and the prevalence of low birth weight, collected in 77 countries from 1986 to 2019. Stunted growth is defined as the proportion of children aged 0 to 59 months with a low height-for-age measurement (more than two standard deviations below the reference median).

The stunted growth data represent a point sparseness case with 4-22 recordings per nation; pcurve ranges from 29.4% in Bangladesh to 88.2% in Argentina. The low birth weight data are a partial sparseness case, with recordings from 2000-2015 only; pcurve = 52.9% for all nations. Missing values are ﬁtted reasonably well with BMFPCA and the estimated range lies in [0, 100%], matching the deﬁnition of the prevalence.
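The curve sparseness values quoted above can be reproduced from the observation pattern alone. The sketch below assumes, consistently with the figures in the text, that pcurve is the fraction of yearly grid points (1986-2019, 34 points) at which a curve has no recording; the function name `curve_sparseness` is illustrative.

```python
def curve_sparseness(observed):
    """Fraction of grid points at which a curve has no observation."""
    return sum(1 for o in observed if not o) / len(observed)


years = range(1986, 2020)  # 34 yearly grid points

# Low birth weight: recordings in 2000-2015 only, so 18 of 34 years are missing.
low_birth_weight = [2000 <= y <= 2015 for y in years]
print(round(100 * curve_sparseness(low_birth_weight), 1))  # 52.9

# A curve with only 4 recordings out of 34 years (the sparsest stunted growth case).
four_recordings = [True] * 4 + [False] * 30
print(round(100 * curve_sparseness(four_recordings), 1))  # 88.2
```

Which 4 years are observed does not affect the value; only the count of missing grid points matters.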

Since only one outlier (Bangladesh) is detected as a magnitude outlier in the prevalence of the low birth weight via the sparse functional boxplot, we implemented the sparse two-stage functional boxplot and detected 11 outliers. The central region in the top row of Figure 7 shows that most nations went through a significant decline in stunted growth prevalence over time, while the central region in low birth weight prevalence implies that most countries had a slow drop or rise before 2000 and a steady trend since 2000. Additionally, the low birth weight prevalence is right-skewed, which can be seen from the margin difference between the central region and the fences. The skewed setting requires more attention to nations with high prevalence of low birth weight.

The intensity sparse two-stage functional boxplot (bottom row, Figure 7) displays the relative intensity of the missing values inside the central region for each variable. Horizontally, the missing values are most intense during 1985-1995 for the prevalence of stunted growth, and 2015-2019 for the prevalence of low birth weight. Vertically, the fitted missing values are concentrated mainly at 30-40% from 1985 to 1995, and shrink to 20-30% from 2018 to 2019 in stunted growth prevalence. In low birth weight prevalence, the fitted values are restricted to 10-13% before 1995 and expand to 8-17% after 2015.

Figure 7: Visualization of stunted growth and low birth weight data for 77 countries with the sparse two-stage functional boxplot and the intensity sparse two-stage functional boxplot. One-to-one maps (number, outlier nation) are the following: (2, Argentina), (5, Burundi), (8, Bangladesh), (10, Bhutan), (12, Chile), (17, Republic of the Congo), (36, Kuwait), (47, Malawi), (51, Nepal), (57, Romania), (71, United States), and (74, Vietnam). [Panels, top row: "Sparse Two-Stage Functional Boxplot: Stunted Growth (Country Level)" and "Sparse Two-Stage Functional Boxplot: Low Birth Weight (Country Level)"; bottom row: the corresponding intensity sparse two-stage functional boxplots; horizontal axis: Year; vertical axis: Prevalence (%); color scale: 0-100%.]

We use the human development index (United Nations Development Programme 2019), which categorizes countries into low, medium, high and very high human development groups, to analyze the malnutrition trend in these countries. In the last 34 years, the median, Republic of the Congo, which belongs to the medium human development group, reflects a representative trend of a smooth decline from 40% to 28% in the prevalence of stunted growth, and a constant trend around 12% in the prevalence of low birth weight. Outliers exist in all human development groups, often showing an abnormality in the shape of stunted growth prevalence. From the very high human development group, Argentina, Chile, and the United States demonstrate an exception in shape, lying at the bottom of the prevalence of stunted growth and showing a slow rise at the bottom of the prevalence of low birth weight. Outliers from high human development nations include Romania and Kuwait, whose records indicate higher prevalence in both variables, with different shapes in stunted growth prevalence. From the medium human development group, Bhutan and Vietnam show only an abnormal shape in the prevalence of stunted growth, which is not far from their respective median. Finally, Burundi, Bangladesh, Malawi, and Nepal, which all belong to the low human development group, lie at the top of the samples in Figure 7. They all show a faster decline rate in the prevalence of stunted growth compared to the rest of their group, except for Burundi, and a significant decline, especially for Bangladesh and Nepal, in the prevalence of low birth weight. Nevertheless, they are on top of the curves in 2019 and need more actions to reduce malnutrition.

6 Discussion

In this paper, we proposed two ﬂexible tools, the sparse functional boxplot and the intensity sparse functional boxplot, for visualizing sparse multivariate functional data. We also introduced a sparse form of the two-stage functional boxplot (Dai & Genton 2018a), which is itself a derivation from the functional boxplot (Sun & Genton 2011) with better outlier detection, called the sparse two-stage functional boxplot. All of these can be applied in both univariate and multivariate functional settings. When the data are observed on common time grids without missing values, the tools reduce to the functional boxplot and two-stage functional boxplot. We believe the introduction of visualization tools for sparse data is of great importance due to the wide applications of longitudinal studies and the common challenge of missing values in real data sets.

To apply the visualization tools described above to sparse functional data, we required an appropriate data fitting and depth for data ordering. On the basis of MFPCA (Happ & Greven 2018), we improved the fitting of data through the iterated expectation from bootstrap (Goldsmith et al. 2013). We took the multivariate functional halfspace depth (MFHD, Claeskens et al. 2014) as a building block for proposing various revised depths for sparse multivariate functional data. We obtained the best depth via the Spearman rank coefficient simulated in various data settings and sparseness scenarios. In the univariate functional setting, MFPCA became UFPCA (Yao et al. 2005), and we used the modified band depth (MBD, López-Pintado & Romo 2009) to order the data before applying the functional tools.

Besides data visualization, the sparse functional boxplot and its two-stage form can also detect outliers. Simulations demonstrated that the sparse two-stage functional boxplot performed better than the sparse functional boxplot in outlier detection. Usually, we obtained a higher pc, at the expense of a slightly higher pf . In the shape outlier cases, though we significantly improved pc with the sparse two-stage functional boxplot thanks to the directional outlyingness (Dai & Genton 2019), there remained room for a higher pc, which may require further innovation in outlier detection. Two public health applications, CD4 cell counts in individuals and malnutrition data at national levels, displayed how information may be extracted from the sparse two-stage functional boxplot and the intensity sparse two-stage functional boxplot. Extensions of the methods proposed in this paper to sparse images and surfaces could be explored with the surface boxplot (Genton et al. 2014).

Appendix

The Appendix presents the theorems used in Section 2.1. All the fittings are based on the MFPC decomposition objects $\hat{\theta}$, so we omit $\theta$ here for simplification. Given Propositions 1-7 from Happ & Greven (2018), we assume $\rho_{i,m}$ and $\epsilon_i$ are jointly Gaussian. Consider $\forall j \in \{1, \dots, p\}$, $t := (t^{(1)}, \dots, t^{(p)}) \in \mathcal{T}$, $X_i(t) := (X_i^{(1)}(t^{(1)}), \dots, X_i^{(p)}(t^{(p)}))^\top \in \mathbb{R}^p$, and $\widehat{X}_i(t) := (\widehat{X}_i^{(1)}(t^{(1)}), \dots, \widehat{X}_i^{(p)}(t^{(p)}))^\top \in \mathbb{R}^p$. Define $\rho_{iM} = (\rho_{i,1}, \dots, \rho_{i,M})^\top$, $\hat{\rho}_{iM} = (\hat{\rho}_{i,1}, \dots, \hat{\rho}_{i,M})^\top$, $X_i(t) = \mu(t) + \sum_{m=1}^{\infty} \rho_{i,m}\psi_{i,m}(t)$, and $|||X||| := \sum_{j=1}^{p} \|X^{(j)}\|$.

Theorem 1 $|||\widehat{X}_i - X_i|||$ converges to 0 in probability for $M_1, \dots, M_p \to \infty$.

Proof: Proposition 6 in Happ & Greven (2018) proves that $|||X_i^{\lceil M \rceil} - X_i|||$ converges to 0 in probability. Proposition 7 in Happ & Greven (2018) proves that $|||X_i^{\lceil M \rceil} - \widehat{X}_i||| = O_p(M_{\max}\Delta_M r_n)$, where $M_{\max} = \max_{j=1,\dots,p} M_j$ and $r_n = n^{-1}h^{-2}$ in the case of measurement error or irregularly sampled data for a certain bandwidth $h$. Since $\mu$ is estimated marginally for each of the $p$ variables, the convergence rate of $\hat{\mu}^{(j)}(t^{(j)}) - \mu^{(j)}(t^{(j)})$ (Yao et al. 2005) still holds,

$$\sup_{t^{(j)} \in \mathcal{T}_j} |\hat{\mu}^{(j)}(t^{(j)}) - \mu^{(j)}(t^{(j)})| = O_P\!\left(\frac{1}{\sqrt{n}\,h_{\mu^{(j)}}}\right),$$

and we name the bandwidth $h_{\min,\mu} := \min_j \{h_{\mu^{(j)}}\}$. Therefore $\sup_{t \in \mathcal{T}} \|\hat{\mu}(t) - \mu(t)\| = O_P\!\left(\frac{1}{\sqrt{n}\,h_{\min,\mu}}\right)$. Therefore, $|||\widehat{X}_i - X_i|||$ converges to 0 in probability for $M_1, \dots, M_p \to \infty$.

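Theorem 2 $\hat{\rho}_{iM} - \rho_{iM}$ can be approximated by $N_M(0, \Omega_M)$, where $\Omega_M = V - V\Psi_{iM}^\top \Sigma_{Y_i}^{-1}\Psi_{iM}V$.

Proof: Proposition 6 in Happ & Greven (2018) proves that $\rho_{i,m}^{\lceil M \rceil}$ converges to $\rho_{i,m}$ in probability for all $m \in \mathbb{N}$. Proposition 7 in Happ & Greven (2018) proves that $|\rho_{i,m}^{\lceil M \rceil} - \hat{\rho}_{i,m}| = O_p(M_{\max}^{3/2}\max(n^{-1/2}, \Delta_M r_n))$, where $M_{\max} = \max_{j=1,\dots,p} M_j$ and $r_n = n^{-1}h^{-2}$ in the case of measurement error or irregularly sampled data for a certain bandwidth $h$. Therefore $\hat{\rho}_{i,m}$ converges to $\rho_{i,m}$ in distribution. We can regard $\hat{\rho}_{i,m}$ as a conditional expectation of $\rho_{i,m}$ given $Y_i$. Assuming that $\rho_{i,m}$ and $\epsilon_i$ are jointly Gaussian, $\rho_{i,m}$ and $Y_i$ are also jointly Gaussian, and so are $\rho_{iM}$ and $Y_i$. We have

$$\mathrm{var}(\hat{\rho}_{iM}) = \mathrm{cov}(\rho_{iM}, Y_i)\Sigma_{Y_i}^{-1}\mathrm{cov}(\rho_{iM}, Y_i)^\top = V\Psi_{iM}^\top \Sigma_{Y_i}^{-1}\Psi_{iM}V, \qquad \Sigma_{Y_i} = \Psi_{iM}V\Psi_{iM}^\top + \mathrm{diag}\{\sigma_1^2 I_{N_{1i}}, \dots, \sigma_p^2 I_{N_{pi}}\}.$$

Moreover, $\mathrm{cov}(\hat{\rho}_{iM}, \rho_{iM}) = \mathrm{var}(\hat{\rho}_{iM})$ because $\hat{\rho}_{iM}$ is the projection of $\rho_{iM}$ on the space spanned by the linear functions of $Y_i$. Then

$$\mathrm{var}(\hat{\rho}_{iM} - \rho_{iM}) = \mathrm{var}(\hat{\rho}_{iM}) + \mathrm{var}(\rho_{iM}) - 2\,\mathrm{cov}(\hat{\rho}_{iM}, \rho_{iM}) = \mathrm{var}(\rho_{iM}) - \mathrm{var}(\hat{\rho}_{iM}) = \Omega_M.$$

Under Gaussian assumptions, $(\hat{\rho}_{iM} - \rho_{iM}) \sim N_M(0, \Omega_M)$ when $n \to \infty$.

Theorem 3 $\widehat{X}_i(t) - X_i(t)$ can be approximated by $N_p(0, \psi_{iM}(t)\Omega_M\psi_{iM}^\top(t))$, where the matrix $\psi_{iM}(t) = (\psi_{i,1}(t), \dots, \psi_{i,M}(t)) \in \mathbb{R}^{p \times M}$.

Proof: $\widehat{X}_i(t) - X_i(t) = \hat{\mu}_i - \mu_i + \hat{\psi}_{iM}(t)(\hat{\rho}_{iM} - \rho_{iM}) + (\hat{\psi}_{iM}(t) - \psi_{iM}(t))\rho_{iM}$, and $\mathrm{var}(\widehat{X}_i(t) - X_i(t)) \approx \psi_{iM}(t)\,\mathrm{var}(\hat{\rho}_{iM} - \rho_{iM})\,\psi_{iM}^\top(t)$, due to $\hat{\psi}_{iM}(t) - \psi_{iM}(t)$ converging to 0 in norm from Propositions 6-7 in Happ & Greven (2018). From the distribution of $\hat{\rho}_{iM} - \rho_{iM}$, we can infer that $\widehat{X}_i(t) - X_i(t)$ follows $N_p(0, \psi_{iM}(t)\Omega_M\psi_{iM}^\top(t))$ asymptotically when $n \to \infty$.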

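As an illustration of the covariance $\Omega_M$ in Theorem 2, consider the simplest scalar case: one component ($M = 1$), one variable, and a single observation $Y = \psi\rho + \epsilon$. The symbols $v = \mathrm{var}(\rho)$, $\psi$ (the eigenfunction value at the observed time), and $\sigma^2 = \mathrm{var}(\epsilon)$ are introduced here only for this example.

```latex
\begin{aligned}
\Sigma_{Y} &= \psi^2 v + \sigma^2, \\
\Omega_1 &= v - v\,\psi\,(\psi^2 v + \sigma^2)^{-1}\,\psi\,v
          = v - \frac{v^2\psi^2}{\psi^2 v + \sigma^2}
          = \frac{v\,\sigma^2}{\psi^2 v + \sigma^2}.
\end{aligned}
```

The two limits behave as expected: $\Omega_1 \to 0$ as $\sigma^2 \to 0$ (a noise-free observation determines the score), and $\Omega_1 \to v$ as $\sigma^2 \to \infty$ or $\psi \to 0$ (the observation carries no information, so the uncertainty equals the prior variance of the score).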
References

Adler, R. J. (2010). The Geometry of Random Fields. SIAM. Arribas-Gil, A. & Romo, J. (2014). Shape outlier detection and visualization for functional data: the outliergram. Biostatistics, 15(4), 603–619. Cai, T. T. & Hall, P. (2006). Prediction in functional linear regression. The Annals of Statistics, 34(5), 2159–2179. Carroll, C., Müller, H.-G., & Kneip, A. (2020). Cross-component registration for multivariate functional data, with application to growth curves. Biometrics, to appear. Claeskens, G., Hubert, M., Slaets, L., & Vakili, K. (2014). Multivariate functional halfspace depth. Journal of the American Statistical Association, 109(505), 411–423. Cuesta-Albertos, J. A. & Nieto-Reyes, A. (2008). The random Tukey depth. Computational Statistics & Data Analysis, 52(11), 4979–4988. Cuevas, A., Febrero, M., & Fraiman, R. (2007). Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics, 22(3), 481–496. Dai, W. & Genton, M. G. (2018a). Functional boxplots for multivariate curves. Stat, 7(1), e190. Dai, W. & Genton, M. G. (2018b). Multivariate functional data visualization and outlier detection. Journal of Computational and Graphical Statistics, 27(4), 923–934. Dai, W. & Genton, M. G. (2019). Directional outlyingness for multivariate functional data. Computational Statistics & Data Analysis, 131, 50–65. Genton, M. G., Johnson, C., Potter, K., Stenchikov, G., & Sun, Y. (2014). Surface boxplots. Stat, 3(1), 1–11. Genton, M. G. & Sun, Y. (2020). Functional data visualization. Handbook of Computational Statistics and Data Science, Wiley StatsRef: Statistics Reference Online, (pp. 1–11). Gneiting, T., Kleiber, W., & Schlather, M. (2010). Matérn cross-covariance functions for multivariate random fields. Journal of the American Statistical Association, 105(491), 1167–1177. Goldsmith, J., Greven, S., & Crainiceanu, C. (2013). Corrected confidence bands for functional data using principal components. 
Biometrics, 69(1), 41–51.

Happ, C. (2018). MFPCA: multivariate functional principal component analysis for data observed on different dimensional domains. R package version, 1.

Happ, C. & Greven, S. (2018). Multivariate functional principal component analysis for data observed on different (dimensional) domains. Journal of the American Statistical Association, 113(522), 649–659.

HIV.gov (2020). Symptoms of HIV. https://www.hiv.gov/hiv-basics/overview/about-hiv-and-aids/symptoms-of-hiv. Accessed: 2020-12-07.

Hubert, M., Rousseeuw, P. J., & Segaert, P. (2015). Multivariate functional outlier detection. Statistical Methods & Applications, 24(2), 177–202.

Hyndman, R. J. & Shang, H. L. (2010). Rainbow plots, bagplots, and boxplots for functional data. Journal of Computational and Graphical Statistics, 19(1), 29–45.

Ibrahim, J. G. & Molenberghs, G. (2009). Missing data methods in longitudinal studies: a review. TEST, 18(1), 1–43.

Ieva, F. & Paganoni, A. M. (2013). Depth measures for multivariate functional data. Communications in Statistics-Theory and Methods, 42(7), 1265–1276.

James, G. M., Hastie, T. J., & Sugar, C. A. (2000). Principal component models for sparse functional data. Biometrika, 87(3), 587–602.

Li, C., Xiao, L., & Luo, S. (2018). Fast covariance estimation for multivariate sparse functional data. arXiv preprint arXiv:1812.00538.

Liu, C., Ray, S., & Hooker, G. (2017). Functional principal component analysis of spatially correlated data. Statistics and Computing, 27(6), 1639–1654.

Liu, X. & Yang, M. C. (2009). Simultaneous curve registration and clustering for functional data. Computational Statistics & Data Analysis, 53(4), 1361–1376.

López-Pintado, S. & Romo, J. (2009). On the concept of depth for functional data. Journal of the American Statistical Association, 104(486), 718–734.

López-Pintado, S., Sun, Y., Lin, J. K., & Genton, M. G. (2014). Simplicial band depth for multivariate functional data. Advances in Data Analysis and Classification, 8(3), 321–338.

López-Pintado, S. & Wei, Y. (2011). Depth for sparse functional data. In Recent Advances in Functional Data Analysis and Related Topics (pp. 209–212). Springer.

Mirzargar, M., Whitaker, R. T., & Kirby, R. M. (2014). Curve boxplot: Generalization of boxplot for ensembles of curves. IEEE Transactions on Visualization and Computer Graphics, 20(12), 2654–2663.

Narisetty, N. N. & Nair, V. N. (2016). Extremal depth for functional data and applications. Journal of the American Statistical Association, 111(516), 1705–1714.

Qu, Z., Dai, W., & Genton, M. G. (2021). Robust functional multivariate analysis of variance with environmental applications. Environmetrics, 32, e2641.

Ramsay, J. O. & Dalzell, C. (1991). Some tools for functional data analysis. Journal of the Royal Statistical Society: Series B (Methodological), 53(3), 539–561.

Ramsay, J. O. & Silverman, B. W. (2007). Applied Functional Data Analysis: Methods and Case Studies. Springer.

Sguera, C., Galeano, P., & Lillo, R. (2014). Spatial depth-based classification for functional data. TEST, 23(4), 725–750.

Sguera, C. & López-Pintado, S. (2020). A notion of depth for sparse functional data. TEST, (pp. 1–20).

Srivastava, A., Wu, W., Kurtek, S., Klassen, E., & Marron, J. S. (2011). Registration of functional data using Fisher-Rao metric. arXiv preprint arXiv:1103.3817.

Sun, Y. & Genton, M. G. (2011). Functional boxplots. Journal of Computational and Graphical Statistics, 20(2), 316–334.

Sun, Y. & Genton, M. G. (2012a). Adjusted functional boxplots for spatio-temporal data visualization and outlier detection. Environmetrics, 23(1), 54–64.

Sun, Y. & Genton, M. G. (2012b). Functional median polish. Journal of Agricultural, Biological, and Environmental Statistics, 17(3), 354–376.

Taylor, J., Fahey, J. L., Detels, R., & Giorgi, J. V. (1989). CD4 percentage, CD4 number, and CD4:CD8 ratio in HIV infection: which to choose and how to use. Journal of Acquired Immune Deficiency Syndromes, 2(2), 114–124.

Thompson, W. K. & Rosen, O. (2008). A Bayesian model for sparse functional data. Biometrics, 64(1), 54–63.

Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1975, volume 2 (pp. 523–531).

UNICEF (2020). Malnutrition. https://data.unicef.org/resources/data_explorer/unicef_f/?ag=UNICEF&df=GLOBAL_DATAFLOW&ver=1.0&dq=.NT_ANT_WHZ_NE3+NT_ANT_HAZ_NE2+NT_BW_LBW+NT_ANT_WHZ_NE2..&startPeriod=2016&endPeriod=. Accessed: 2020-06-16.

United Nations Development Programme (2019). Human Development Report 2019. http://hdr.undp.org/sites/default/files/hdr2019.pdf. Accessed: 2020-11-25.

Xiao, L., Li, C., Checkley, W., & Crainiceanu, C. (2018). Fast covariance estimation for sparse functional data. Statistics and Computing, 28(3), 511–522.

Yao, F., Müller, H.-G., & Wang, J.-L. (2005). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100(470), 577–590.

Yao, Z., Dai, W., & Genton, M. G. (2020). Trajectory functional boxplots. Stat, 9(1), e289.

Zhang, J.-T. (2013). Analysis of Variance for Functional Data. Chapman and Hall/CRC.

Zhou, L., Huang, J. Z., & Carroll, R. J. (2008). Joint modelling of paired sparse functional data using principal components. Biometrika, 95(3), 601–619.