ADE-4 Software : a Tool for Multivariate Analysis and Graphical Display in : Féral J.P
Total Page:16
File Type:pdf, Size:1020Kb
Oceanis » vol. 24 n? 4.1998. p. 393-416 ADE-4 Software: a tool for multivariate analysis and graphical display* Monique SIMIER(1), Jean THIOULOUSE(2) & Jean-Michel OUVIER(3) (1) Laboratoire halieutique et ecosysternes aquatiques IRD BP5045 34032 Montpellier cedex 1 France [email protected] (2) Laboratoire de biornetrie, genetique et biologie des populations UMR CNRS 5558 Universite Lyon 1 69622 Villeurbanne cedex France (3) Laboratoire d'ecologie des eaux douces et des grands fleuves URA CNRS 1974 Universite Lyon 1 69622 Villeurbanne cedex France Keywords: Statistical software, multivariate analysis, principal compo nent analysis, correspondence analysis, instrumental variables, canon ical correspondence analysis, partial least squares regression, coinertia analysis, multi-table analysis, graphics, multivariate graphics, Macintosh, HyperCard, Windows 95, Windows NT Abstract We present ADE-4. a multivariate analysis and graphical display soft ware. Multivariate analysis methods available in ADE-4 include usual "This paper is from: Thioulouse ]., Chessel D., Doledec S. & Olivier ].M., 1997. ADE-4: a rnultivariate analysis and graphical display software. Statistics and Computing, n? 7, p. 75-83, which should be referenced by any publication using this software. Oceanis ISSN 0182-0745 © Institut oceanographique 2001 1. In ADE play lt is widf ta d man mult data is fn Che: tion is frl char (2) f 2.C 2.1. The (CA: whi, tran the we1j Ir to s Océ. ·. ADE-4 Software: a tooJ for muJtivariate anaJysis and graphicaJ dispJay 395 and correspon. réaliser automatiquement des collections de graphiques correspondant a total variance à des groupes de lignes ou à des colonnes du tableau de données, of ogous to Moran frant ainsi un outil très efficace pour les graphiques multi-tableaux ou ods, discriminant les cartes géographiques. Un module graphique dynamique autorise des linear regression opérations interactives comme la recherche, l'agrandissement de parties nultiple and PLS du graphe. la sélection de points. et la représentation de valeurs sur des ession (principal cartes factorielles. t:interface utilisateur est simple et homogène pour :ipal component tous les modules; cela contribue à faire d'ADE-4 un outil accessible aux ndence analysis non-spécialistes en statistiques, analyse de données ou informatique. Q method, and ,raphical display 1 graphies carre lle, thus provid ,d geographical 1. Introduction :eractive opera disp/ay of data ADE-4 (THIOULOUSE et al., 1997) is a multivariate analysis and graphical dis 1 homogeneous play software for Apple Macintosh and Windows 95 or NT microcomputers. ·4 very easy for It is made of several stand-alone applications, called modules, that feature a !nce. wide range of multivariate analysis methods, from simple one-table analysis to three-way table analysis and two-table coupling methods. Ir also provides many possibilities for helpful graphical display in the process of analyzing multivariate data sets. Ir has been developed in the context of environmental data analysis, but can be used in other scientific disciplines where data analysis Ise en compo les instrumen. is frequently used. ADE-4 is developed by J. Thioulouse (computer science), D. aux moindres Chessel (statistics), S. Dolédec (ecology) and J.-M. Olivier (contact & distribu IX, graphiques, tion). The whole software, including documentation and sample data sets are IWS 95, Win- is freely available on the Internet network. Here, we wish to present the main characteristics of ADE-4, from three point of view: (1) data analysis methods, (2) graphical display capabilities, and (3) user interface. la représenta l'analyse mul lSsiques à un ~ et "analyse 2. Data analysis methods ~s spatialisées ; et globales, 2.1. Mathematical basis of the computation de partition nalyses inter The data analysis methods available in ADE·4 are based on the duality diagram e comme les (CAILLIEZ & PAGÈS, 1976), introduced in ecology by ESCOUFIER (1987) in ,'tiples et PLS which a raw data table is considered as a statistical triplet composed of a régression en lme l'analyse transformed table of the same dimensions and two weight matrices, one for l'analyse ca the lines, the other for the columns. Depending on the transformation and the llyses de co weights, alllinear data analysis methods can be defined. nalyse mu/ti In many modules, Monte-Carlo tests (GOOD, 1994, chapter 13) are available ~rmettent de to study the significance of observed structures. Océanis • vol. 24 nO 4. 1998 di (l' 15 o ct C. of y< C pl fa al Sl tu 2. T nI hi cc (F 2. E le Ir Vol L (1 A P (1 c ·. ADE-4 Software: a tool for multivariate analysis and graphical display 397 The PCA module offers several options, corresponding to different duality diagrams: correlation matrix PCA, covariance matrix PCA, non centered PCA lble data sets (NoY-MEIR, 1973), decentered PCA, partially standardized PCA (BOUROCHE, dules are the 1975), within-groups standardized PCA (DOLÉDEC Ile CHESSEL, 1987). See ariables, the OKAMOTO (1972) for a discussion of different types of PCA. GREENACRE, The COA module offers six options for correspondence analysis (CA): ,r qualitative classical CA, reciprocal scaling (THIOULOUSE Ile CHESSEL, 1992), row weighted 5). CA, internaI CA (CAzEs et al., 1988), decentered CA (DOLÉDEC et al., 1995). The MCA module offers rwo options for the analysis of tables made of qualitative variables: Multiple Correspondence Analysis (TENENHAUS Ile YOUNG, 1985) and Fuzzy Correspondence Analysis (CHEVENET et al., 1994; CASTELLA Ile SPEIGHT, 1996). A fourth module, HTA (homogeneous table analysis) is intended for homo geneous tables, i.e., tables in which ail the values come from the same variable (for example, a toxicity table containing the toxicity of sorne chemical com pounds toward several animal species, see DEVILLERS Ile CHESSEL, 1995). Lastly, the DDUtil (duality diagram utilities) module provides several inter pretation helps that can be used with any of the methods available in the first four modules, namely: biplot representation (GABRIEL, 1971, 1981), inertia analysis for rows and columns (particularly for COA, see GREENACRE, 1984), supplementary rows and/or columns (LEBART et a/., 1984), and data reconsti tution (LEBART et al., 1984). 2.3. Partitioning and classification methods The Clusters module provides a partitioning algorithm using moving centers near to the k-means c1ustering method, and five algorithms for computing hierarchies from a distance matrix including average linkage, simple linkage, complete linkage, Ward's method and second order moment divisive algorithm (Roux, 1985, 1991). 2.4. One table with spatial structures Environmental data very often include spatial information (e.g., the spatial location of sampling sites), and this information is difficult to introduce in classical multivariate analysis methods. The NGStat module provides a way to achieve this by using a neighbouring relationship berween sites. See B: One LEBART (1969) for a presentation of this approach and THIOULOUSE et al. , several lmetrical (1995) for a general framework based on variance decomposition formulas. Jcture to Also available in Distances module are the Mantel test (MANTEL, 1967), the ollection principal coordinate analysis (MANLY, 1994), and the minimum spanning tree (KEVIN Ile WHITNEY, 1972). 04.1998 Océanis • vol. 24 nO 4 • 1998 398 M. SIMIER et al. 2.5. One table with groups of rows PCA, ( like Ci When a priori groups of individuals exist in the data table (figure lB), the al., 19 Discrimin module can be used to perform a discriminant analysis (DA, also 1987a. called canonical variate analysis), and between-groups or within-groups anal IS espe yses (DOLÉDEC & CHESSEL, 1989). These three methods can be performed after a PCA, a COA, or an MCA, leading to a great variety of analyses. For example, 2.8. Cl in the case of DA, we can obtain after a PCA, the classical DA (MAHALANOBIS, 1936; TOMAS SONE et al., 1988), after a COA, the correspondence DA, and There after an MCA the DA on qualitative variables (SAPORTA, 1975; PERSAT et al., perfor 1985). Monte-Carlo tests are available to test the significance of the between DOLÉ] groups structure. and th analys 2.6. Linear regression 2.9. k· Four modules provide several linear regression methods (figure lC). These modules are UniVarReg (for univariate regression), OrthoVar (for orthogonal Collee analy; regression), LinearReg (for linear regression) and PLSgen2 (for second gen eration Partial Least Squares regression). Here also, Monte-Carlo tests are three 1994) available to test the results of these methods. The UniVarReg module deals with two regression models: polynomial et al., regression and Lowess method (locally weighted regression and smoothing 1978) Multi scatterplots; CLEVELAND, 1979; CLEVELAND & DEVLIN, 1988). The OrthoReg module performs multiple linear regression in the particular analy table~ case of orthogonalexplanatory variables. This is useful for example in PCR (principal component regression; NiEs, 1984), or in the case of the projection ways stand on the subspace spaned by a series of eigenvectors (THIOULOUSE et al., 1995). exten The LinearReg module performs the usual multiple linear regression (MLR), and the first generation partial [east squares (PLS) regression (LINDGREN, inerti 1994). See also GELADI & KOWALSKI (1986); HOSKULDSSON (1988) for more details on PLS regression. 3.GI The PLSgen2 module performs second generation PLS regression (TENEN HAUS, 1997). ADE grapI 2.7. Two-tables coupling methods