Statistical Methods and Applications manuscript No. (will be inserted by the editor)

Rejoinder to ‘Multivariate Functional Outlier Detection’

Mia Hubert · Peter Rousseeuw · Pieter Segaert

M. Hubert, P. Rousseeuw, P. Segaert
Department of Mathematics, KU Leuven, Celestijnenlaan 200B, BE-3001 Heverlee, Belgium
Tel.: +32-16-322023
E-mail: [email protected]


Received: date / Accepted: date

First of all we would like to thank the editor, Professor Andrea Cerioli, for inviting us to submit our work and for requesting comments from some esteemed colleagues. We were surprised by the number of invited comments and grateful to their contributing authors, all of whom raised important points and/or offered valuable suggestions. We are happy for the opportunity to rejoin the discussion. Rather than addressing the comments in turn we will organize our rejoinder by topic, starting with comments directly related to concepts we proposed in the paper and continuing with some extensions.

1 Bagdistance and bagplot

We did not know that the name ‘bagdistance’ was already used in a totally different setting, as pointed out by Karl Mosler. We sort of assumed that the name ‘bagdistance’ sounded strange enough to be unique, but we will be careful to consult Dr. Google next time!

More importantly, we were not aware of the paper by Riani and Zani (1998) that was referenced in the discussion by Aldo Corbellini, Marco Riani and Anthony Atkinson. That paper appeared in the proceedings of a conference neither of us attended, and unfortunately it is much harder to look up an idea by search engine than it is to look up a phrase. In the meantime the authors were so kind as to provide us with a copy of their paper. We agree that the generalized distance of Riani and Zani (1998) is very similar to the bagdistance for p ≤ 2, the only difference being the choice of contour (in their case based on elliptical peeling and cubic spline smoothing) and center (for which they use the intersection of two least squares lines). As their construction was restricted to p = 2 they extended it to higher dimensions by applying their method to pairs of variables.


The generalized distance of Riani and Zani (1998) has many aspects in common with the bagdistance, such as its ability to reflect asymmetry in the data. We still prefer the bag and center based on halfspace depth, because halfspace depth is more robust than convex peeling, as shown by Donoho and Gasko (1992), and the definition of the bagdistance applies to any dimension. As an aside, the bag as defined in Rousseeuw et al. (1999) was interpolated between two depth contours, so no data point had to lie on the contour of the bag. However, some simplified algorithms for the bag do have this property.

Riani and Zani (1998) explicitly draw the fence (the boundary outside of which points are flagged as outlying) in their plot, at the distance $\sqrt{\chi^2_{2,0.99}}$, which almost exactly coincides with the factor 3 of the bagplot's fence. The bagplot shows the loop, which is the convex hull of the data points inside the fence and generalizes the whiskers of the univariate boxplot. By default the bagplot does not show the fence itself, but we included it in Figure 1(a) for comparison. It is a plot of plasma triglyceride concentration versus cholesterol for n = 320 patients (Hand et al. 1994) with pronounced skewness.

[Figure 1: two bagplots of the bloodfat data, (a) based on halfspace depth and (b) based on skew-adjusted projection depth; Cholesterol on the horizontal axis, Triglycerides on the vertical axis.]

Fig. 1 Bloodfat data: (a) bagplot based on the bagdistance; (b) bagplot based on the adjusted outlyingness. In both cases the dashed line is the fence.

Note that we can also create ‘generalized bagplots’ based on other distance measures than the bagdistance. For instance, we can use the skew-adjusted outlyingness AO, which is even more robust than halfspace depth. Since we showed in Theorem 4 that its contours are convex, the resulting generalized bagdistance is a norm. Generalizing the fence of the bagplot, we could flag a p-variate point x as outlying if

$$AO(x)\,/\,\operatorname{median}_i\big(AO(x_i)\big) \;>\; \sqrt{\chi^2_{p,0.99}}\,.$$

Figure 1(b) shows the bagplot based on AO with this cutoff. It looks similar to the one based on halfspace depth, but is not quite the same. Several points at the bottom of the plot used to be barely inside the fence, and are now barely outside. Fortunately, in a graphical display it is easy to see that they are borderline cases.


Note that the above formula for the cutoff is similar to that of the usual Mahalanobis distance, whose distribution can be approximated by $\sqrt{\chi^2_p}$ under normality. Therefore, the above cutoff shares the ‘curse of dimensionality’ issues of the Mahalanobis distance for high p, such as the fact that its distribution gets concentrated at its median. In other words, clean normal observations lie essentially on a sphere with very few points inside of it. This is why we only use this cutoff for small p.
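To make the flagging rule concrete, here is a minimal Python sketch (ours, not the authors' implementation); the vector of AO values is assumed to have been computed elsewhere, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def flag_outliers_ao(ao, p, quantile=0.99):
    """Flag points whose adjusted outlyingness AO exceeds the median-scaled
    cutoff sqrt(chi2_{p,quantile}).  `ao` is assumed to hold the AO values
    of the n data points (computed by whatever AO routine is available)."""
    ao = np.asarray(ao, dtype=float)
    cutoff = np.sqrt(chi2.ppf(quantile, df=p))
    return ao / np.median(ao) > cutoff

# For p = 2 the cutoff is sqrt(chi2.ppf(0.99, 2)) ~ 3.03,
# close to the factor 3 of the bagplot's fence.
```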

2 Weight function

The examples in the paper used a constant weight function in MFD, but as already explained by Claeskens et al. (2014) a non-constant weight function can be very useful to emphasize or downweight certain time periods. We will always assume w.l.o.g. that the discrete weights sum to one (and analogously that the weight function integrates to one), which has the advantage that e.g. the functional adjusted outlyingness fAO is on the same scale as the AO at one time point.

Usually the data do not come with their own intrinsic weight function, so this is a choice of the statistician depending on the purpose of the study. As pointed out by Francesca Ieva and Anna Paganoni, this choice might heavily affect the conclusions, so a well-chosen weight function is indispensable. When the purpose is to detect outliers, Davy Paindaveine and Germain Van Bever propose to use a weight function which is proportional to the variability of the curves. Instead of a robust measure of variability (such as the volume of the depth regions proposed in Claeskens et al. 2014) they prefer a nonrobust measure for this purpose. Alternatively, the weight function could be the inverse of a robust dispersion measure. Sara López-Pintado suggested to extend the functional bagdistance and adjusted outlyingness as well by incorporating a weight function.
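As a small illustration of this weighting convention, the following Python sketch (not taken from the paper) computes a weighted fAO from a matrix of pointwise AO values; the matrix and the particular weight function emphasizing the second half of the time interval are hypothetical.

```python
import numpy as np

def weighted_fAO(ao_matrix, weights):
    """ao_matrix: (n_curves, n_timepoints) array of pointwise AO values.
    weights: length n_timepoints; normalized to sum to one so that the
    weighted fAO stays on the same scale as the AO at one time point."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return ao_matrix @ w          # weighted average over time, per curve

# Hypothetical choice: put twice as much weight on the second half
# of the observation period.
t = np.linspace(0, 1, 100)
w = np.where(t < 0.5, 1.0, 2.0)
# fao = weighted_fAO(ao_matrix, w)   # ao_matrix computed elsewhere
```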

3 Functional outlier map

Several discussants provided good ideas concerning the centrality-stability plot. Naveen Narisetty and Xuming He proposed to standardize the quantity on the vertical axis in order to stabilize its variability. Yuan Yan and Marc Genton, as well as Alicia Nieto-Reyes and Juan Cuesta-Albertos, suggested to focus on outlyingness instead of centrality, and the latter discussants also proposed to put the standard deviation of the AO on the vertical axis instead of the formula with the harmonic mean. We are grateful for these insightful suggestions. Combining these ideas leads to a new proposal for a functional outlier map, which is to plot for each curve $Y_i$ its $\mathrm{fAO}(Y_i;P_n)$ (the functional AO) on the horizontal axis and

$$\operatorname{stdev}_j\big(AO(Y_i(t_j);P_n(t_j))\big) \,\big/\, \big(1 + \mathrm{fAO}(Y_i;P_n)\big)$$

on the vertical axis. This outlier map can then be read in a way similar to the outlier map of Rousseeuw and van Zomeren (1990) and the outlier map of robust principal components (Hubert et al. 2005).

The division by fAO on the vertical axis can be justified as follows. Suppose the center curve is at zero, and $Y_k(t) = 2\,Y_i(t)$ for all $t$. Then $\operatorname{stdev}_j\big(AO(Y_k(t_j);P_n(t_j))\big) = 2\operatorname{stdev}_j\big(AO(Y_i(t_j);P_n(t_j))\big)$ as well as $\mathrm{fAO}(Y_k;P_n) = 2\,\mathrm{fAO}(Y_i;P_n)$, while their relative variability is the same.

According to the previous section, both $\mathrm{fAO}(Y_i;P_n) = \operatorname{ave}_j\big(AO(Y_i(t_j);P_n(t_j))\big)$ and $\operatorname{stdev}_j\big(AO(Y_i(t_j);P_n(t_j))\big)$ in the above formula can be weighted using a time weight. We have added one more feature to the map. The points are plotted as ‘bubbles’, with the size of the bubble representing the fraction of the time the function is flagged as a multivariate outlier. This reflects the amount of local outlyingness.

Figure 2 shows the resulting outlier maps for the four examples studied in the paper. The outlier map of the wine data immediately shows the high degree of outlyingness of curve 37. Its small size on the other hand indicates that its outlying behavior is limited to a small region, which is in line with classifying this curve as an isolated outlier. The tablets data are nicely separated into the regular curves (orange), the shift (red) and the shape (blue) outlying curves.
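The construction of such a map can be summarized in a short Python sketch; we assume the pointwise AO values and the per-time-point multivariate outlier flags are available (computed elsewhere), and the quartile binning of the bubble sizes is one plausible choice rather than the exact implementation used for Figure 2.

```python
import numpy as np
import matplotlib.pyplot as plt

def functional_outlier_map(ao_matrix, flags, weights=None):
    """ao_matrix: (n_curves, n_timepoints) pointwise AO values.
    flags: boolean array of the same shape, True where the curve is
    flagged as a multivariate outlier at that time point.
    Returns the horizontal coordinate fAO, the vertical coordinate
    stdev_j(AO)/(1 + fAO), and the fraction of flagged time points."""
    n, T = ao_matrix.shape
    w = np.full(T, 1.0 / T) if weights is None else np.asarray(weights) / np.sum(weights)
    fao = ao_matrix @ w                               # (weighted) average over time
    std_ao = np.sqrt((ao_matrix - fao[:, None]) ** 2 @ w)  # (weighted) st.dev. over time
    vert = std_ao / (1.0 + fao)
    frac = flags.mean(axis=1)
    return fao, vert, frac

def plot_fom(fao, vert, frac):
    # Bubble size grows with the fraction of time flagged; binning by
    # quartiles (<= 25%, <= 50%, ...) is a hypothetical choice.
    sizes = 20 + 60 * np.digitize(frac, [0.25, 0.50, 0.75])
    plt.scatter(fao, vert, s=sizes)
    plt.xlabel("fAO")
    plt.ylabel("sd(AO) / (1 + fAO)")
    plt.show()
```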


Fig. 2 Functional outlier map of the Octane, Wine, Writing and Tablets data.

The univariate signed AO suggested by Naveen Narisetty and Xuming He was previously defined by Hubert and Van der Veeken (2010, page 1137). It indeed works well for univariate functions (p = 1), but we found its multivariate generalization to be rather unstable to small changes in the functional data. Therefore, in the above functional outlier map we chose to stick with the unsigned AO in order to keep the same definition for all p.

It would be great if we could improve the functional outlier map by drawing vertical and horizontal cutoff lines based on some simple quantiles. However, such cutoffs have to depend on the autocorrelation of the observed functions, as explained in the contribution of Yuan Yan and Marc Genton, who reference the interesting papers of Sun and Genton (2011, 2012) where this technology was developed.

4 Computation

The examples in our paper had low dimension p, but of course we want to be able to deal with high dimensions as well. About this the discussants were not all in agreement. Karl Mosler felt that the projection depth (and thus also the SDO, SPD, and AO) requires projecting the data on a huge number of directions, in fact exponential in p. Alicia Nieto-Reyes and Juan Cuesta-Albertos voiced the opposite opinion, that a small number of random directions is sufficient to compute the random Tukey depth, and illustrated this on a generated data set in 200 dimensions, with n = 39 curves containing 7 outliers. They drew 20 directions from the uniform distribution on the sphere and found all the outliers.

It seems to us that when the random Tukey depth works, so should the adjusted outlyingness AO. Alicia and Juan were kind enough to provide their data, so we also generated 20 such random directions and computed the AO in each projection. This again found all the outliers, as can be seen in Figure 3(a). Figure 3(a) looks a bit different from the heatmap shown by the discussants, which is only because we added a feature to this graphical display. By default, the shading in the heatmap is directly proportional to the AO, which ranges from zero to its maximal value. This is not robust however, since large values of AO could be masked by one extremely large value. Therefore, going forward we assign the darkest shading to all AO that exceed a certain value, in this case 40. (Mechanically, this is done by truncating the AO values in the call to the heatmap function.) This way we can see all the outliers in one heatmap.

Figure 3(b) shows the functional outlier map of these data. Curves 2 and 4 have the largest fAO because they are outlying in every dimension, and have the largest bubbles. Curve 5 has the largest variability as it has a huge AO at a few time points and a small AO at all other times, so it stands out as an isolated outlier. Curve 7 lies close to the regular data, in accordance with how it was generated.

So, how can we reconcile the viewpoints of Karl Mosler with those of Alicia Nieto-Reyes and Juan Cuesta-Albertos? We think the difference is mainly caused by the type of invariance these authors aim for. Karl sits squarely in the camp of affine invariance, so he will typically compute directions orthogonal to random p-subsets, and as in all algorithms of this type it takes an exponential number of draws to reach the desired probability of a clean p-subset. Alicia and Juan aim for orthogonal invariance instead, even though they do not say so explicitly. Their method is not affine invariant since an affine transform can map the uniform distribution on the sphere to a very different distribution. However, an orthogonal transformation does leave the uniform distribution unchanged.


Fig. 3 200-dimensional data set of Nieto-Reyes and Cuesta-Albertos. Top row: (a) AO heatmap based on 20 random directions; (b) functional outlier map. Bottom row, after affine transformation of the data: (c) heatmap based on 20 random directions; (d) heatmap based on 500 random directions.
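To make the projection step discussed above concrete, here is a rough Python sketch of drawing directions uniformly on the unit sphere and computing a projection-based outlyingness; for brevity the skewness adjustment of the actual AO is replaced by a symmetric median/MAD scaling (Stahel-Donoho type), so this illustrates the idea rather than the exact procedure behind Figure 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_directions(n_dir, p):
    """Directions drawn from the uniform distribution on the unit sphere."""
    u = rng.standard_normal((n_dir, p))
    return u / np.linalg.norm(u, axis=1, keepdims=True)

def projected_outlyingness(X, directions):
    """For each point, the maximum over directions of
    |projection - median| / MAD.  (The actual AO also adjusts for
    skewness via the medcouple; that refinement is omitted here.)"""
    proj = X @ directions.T                       # (n, n_dir)
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    mad[mad == 0] = 1e-12                         # guard against zero scale
    return np.max(np.abs(proj - med) / mad, axis=1)

# Usage on an (n x p) data matrix X:
# O = projected_outlyingness(X, random_directions(20, X.shape[1]))
# For a heatmap display, values can be truncated so that one huge value
# cannot mask the others, e.g. shading = np.minimum(O, 40).
```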

To verify this explanation we wrote some code to generate a random affine transformation. After applying this to the 200-dimensional data set and projecting on the same 20 random directions as before, yielding the heatmap in Figure 3(c), we only detect outliers 2 and 4, which are outlying in every dimension, whereas the other outliers remain hidden. This is still true if we draw 500 random directions, as seen in Figure 3(d).

5 Local outlyingness

Several people have commented on local outlyingness, where a function is outlying only on a small time interval. Karl Mosler proposes an approach which divides the time interval into subintervals. Luis Angel García-Escudero, Alfonso Gordaliza, and Agustín Mayo-Iscar note that isolated outliers correspond to cellwise outliers in the terminology of Alqallaf et al. (2009). In the case of functional data, they offer an identification technique based on local trimming. Both discussions as well as that of Francesca Ieva and Anna Paganoni also bring up the connection between outlier detection and unsupervised classification (clustering). Davy Paindaveine and Germain Van Bever are concerned with local depth and bagdistances in the multivariate space of the observations, and give a nice graphical illustration.

6 Images

Sara López-Pintado as well as Yuan Yan and Marc Genton mention the possibility to extend our work to situations where the index is no longer univariate (like time or wavelength) but bivariate, as in the case of surfaces or images. A very nice depth-based exploratory tool to analyze image data has been proposed in Genton et al. (2014) by generalizing the band depth to volume depth.

Multivariate functional depth and our new outlier detection tools also easily generalize to surfaces and images. We illustrate this on the Dorrit data, previously analyzed in Engelen et al. (2007), Engelen and Hubert (2011) and Hubert et al. (2012).
This data set contains excitation-emission (EEM) landscapes of 27 mixtures of four known fluorophores, with excitation wavelengths ranging from 230 nm to 315 nm every 5 nm, and emission at wavelengths from 250 nm to 482 nm at 2 nm intervals. Hence each sample $Y_i$ contains $18 \times 116$ measurements $Y_i(j,k)$ for $j = 1,\ldots,J = 18$ and $k = 1,\ldots,K = 116$.

The functional depth of landscape $Y_i$ (with $i = 1,\ldots,27$) then becomes

$$\mathrm{MFD}(Y_i; P_n) \;=\; \sum_{j=1}^{18} \sum_{k=1}^{116} D\big(Y_i(j,k); P_n(j,k)\big)\, W_{jk}$$

with $\sum_j \sum_k W_{jk} = 1$. Similarly we define the functional adjusted outlyingness fAO as the weighted average of the AO at every location, and construct the functional outlier map accordingly.

Applied to the Dorrit data, using the SPD as depth function and a constant weight function, we find that landscape 9 has the largest functional depth among the 27 landscapes. Its EEM landscape is depicted in Figure 4(a). The peaks in these landscapes reflect the concentration of the fluorophores which are present in the mixture. The functional median, i.e. the pointwise deepest landscape, is visualized in Figure 4(b); it looks similar but less smooth, as it does not correspond to an observed surface.
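A small sketch of how this weighted sum could be evaluated in practice; the pointwise depth routine is left abstract (we used the SPD, but any pointwise depth with the indicated signature would do here), and the function name is ours.

```python
import numpy as np

def mfd_images(samples, pointwise_depth, W=None):
    """samples: array of shape (n, J, K) holding n observed landscapes
    Y_i(j, k).  pointwise_depth: a function taking the n values observed
    at one grid point and returning their n depth values (assumed to be
    supplied, e.g. a skew-adjusted projection depth routine).
    W: (J, K) weight matrix, normalized so its entries sum to one."""
    n, J, K = samples.shape
    W = np.full((J, K), 1.0 / (J * K)) if W is None else W / W.sum()
    mfd = np.zeros(n)
    for j in range(J):
        for k in range(K):
            mfd += W[j, k] * pointwise_depth(samples[:, j, k])
    return mfd

# The functional AO is obtained analogously, replacing the pointwise depth
# by a pointwise outlyingness and keeping the weighted average.
```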

Fig. 4 Dorrit data: (a) EEM landscape of the deepest sample 9; (b) EEM landscape of the functional median computed from the data.

The functional outlier map is presented in Figure 5. We note that landscape 5 has the highest fAO with very variable AO values.


Fig. 5 Functional outlier map of the Dorrit data.

Also landscape 3 shows a high degree of outlyingness. From their raw values in the upper row of Figure 6 we see that they both achieve much larger values in many different regions. To visualize the location and the variability of these AO values we can represent them via a two-dimensional image as in the lower row of Figure 6. Here, darker colors correspond to higher AO’s.



Fig. 6 Dorrit data: landscapes 3 and 5 (top row) with their two-dimensional AO values (bottom row).


Also here, the AO values exceeding a certain threshold (here 15) get the darkest color, so we can still make out the intermediate AO values. We see that landscape 3 is mostly outlying at the longer excitation wavelengths, whereas the outlyingness of landscape 5 is most prominent at the longer emission and the longer/shorter excitation wavelengths. Note that in this exploratory analysis we only studied the differences between the landscapes based on their raw observed values. A more refined study could start by modeling the data through a parametric model (such as a PARAFAC model) and then apply our diagnostic tools to the residuals. Also two-dimensional warping and/or gradient functions could be added to improve performance.
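For displays like the bottom row of Figure 6, capping the color scale achieves the truncation described above; the following matplotlib sketch is only an assumption about tooling, not how the original figures were produced.

```python
import matplotlib.pyplot as plt

def plot_ao_image(ao_image, excitation, emission, cap=15):
    """ao_image: (J, K) matrix of AO values of one landscape on the
    excitation x emission grid (wavelength vectors supplied by the user).
    All values above `cap` receive the darkest color, so intermediate
    AO values remain distinguishable."""
    plt.imshow(ao_image, cmap="Greys", vmin=0, vmax=cap, aspect="auto",
               extent=[emission.max(), emission.min(),
                       excitation.min(), excitation.max()])
    plt.xlabel("Emission (nm)")
    plt.ylabel("Excitation (nm)")
    plt.colorbar(label="AO (capped at %d)" % cap)
    plt.show()
```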

References

1. Alqallaf F, Van Aelst S, Yohai VJ, Zamar RH (2009) Propagation of outliers in multivariate data. The Annals of Statistics 37:311–331
2. Claeskens G, Hubert M, Slaets L, Vakili K (2014) Multivariate functional halfspace depth. Journal of the American Statistical Association 109:411–423
3. Donoho D, Gasko M (1992) Breakdown properties of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics 20:1803–1827
4. Engelen S, Frosch Møller S, Hubert M (2007) Automatically identifying scatter in fluorescence data using robust techniques. Chemometrics and Intelligent Laboratory Systems 86:35–51
5. Engelen S, Hubert M (2011) Detecting outlying samples in a parallel factor analysis model. Analytica Chimica Acta 705:155–165
6. Genton MG, Johnson C, Potter K, Stenchikov G, Sun Y (2014) Surface boxplots. Stat 3:1–11
7. Hand DJ, Daly F, Lunn AD, McConway KJ, Ostrowski E (1994) A Handbook of Small Data Sets. London: Chapman and Hall
8. Hubert M, Rousseeuw PJ, Vanden Branden K (2005) ROBPCA: a new approach to robust principal components analysis. Technometrics 47:64–79
9. Hubert M, Van der Veeken S (2010) Fast and robust classifiers adjusted for skewness. Proceedings of COMPSTAT 2010, eds. Y. Lechevallier and G. Saporta, Physica-Verlag, pp. 1135–1142
10. Hubert M, Van Kerckhoven J, Verdonck T (2012) Robust PARAFAC for incomplete data. Journal of Chemometrics 26:290–298
11. Riani M, Zani S (1998) Generalized distance measures for asymmetric multivariate distributions. Advances in Data Science and Classification, eds. A. Rizzi, M. Vichi, and H.-H. Bock, Springer, pp. 503–508
12. Rousseeuw PJ, Ruts I, Tukey JW (1999) The bagplot: a bivariate boxplot. The American Statistician 53:382–387
13. Rousseeuw PJ, van Zomeren BC (1990) Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association 85:633–651
14. Sun Y, Genton MG (2011) Functional boxplots. Journal of Computational and Graphical Statistics 20:316–334

15. Sun Y, Genton MG (2012) Adjusted functional boxplots for spatio-temporal data visualization and outlier detection. Environmetrics 23:54–64