R code

MULTIVARIATE ANALYSIS MESIO (15-16)

PROF. SERGI CIVIT
PROF. MIQUEL SALICRÚ

R packages, Datasets

The following packages, available from CRAN, will be used in these practicals: ‘ade4’, ‘ape’, ‘cclust’, ‘cluster’, ‘MVN’, ‘mvpart’, ‘rgl’, ‘spam’, ‘spdep’, ‘vegan’.

The following datasets, available from ATENEA, will be used in these practicals:
- FangaTaufa.xls, Fangataufa.txt (PCA, PCoA)
- Ecosistemes.xls (CLUSTER, PCA)

2D representation

Fangataufa was the site of France's first two-stage thermonuclear test, code named Canopus, detonated on August 24, 1968. The nuclear explosion had a yield of 2.6 megatons. The atoll was also the location of the 1970 914-kiloton Licorne ('Unicorn') test and 2 other atmospheric nuclear tests, as well as several underground nuclear tests. Today, Fangataufa serves as a wildlife sanctuary for various species of birds.

http://ripley.si.edu/ent/nmnhtypedb/images/pdfs/4247.pdf

[Figure: NASA astronaut image of Fangataufa Atoll (Tuamotu Archipelago, French Polynesia) in the Pacific Ocean]

fangataufa.txt is an ASCII file containing an inventory of the gastropod species. Sampling locations are in rows; species names are in columns.

# For this example, the data are read from the file fangataufa.txt.
data <- read.table("fangataufa.txt", sep="\t", header=TRUE)

Principal Component Analysis (PCA) (1/3)

# PCA is calculated by the prcomp() function.
# The names in the first column are removed with data[,-1].
# The argument "scale=FALSE" means that the data are not standardized.
data.out = prcomp(data[,-1], scale=FALSE)

# The values returned by the function prcomp()
names(data.out)
data.out$sdev      # the standard deviations of the principal components (the square roots of the eigenvalues)
data.out$rotation  # the matrix of variable loadings (columns are eigenvectors)
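The relationship between the values returned by prcomp() and an eigen-decomposition can be checked directly. The following is a minimal sketch, assuming the built-in USArrests data set as a stand-in for the Fangataufa table (which is not reproduced here); the object names X, pca and eig are illustrative only.

```r
# Stand-in data: the built-in USArrests data set (an assumption, not the Fangataufa file).
X <- USArrests

# PCA on non-standardized data, as in the slide.
pca <- prcomp(X, scale = FALSE)

# Eigen-decomposition of the sample covariance matrix.
eig <- eigen(cov(X))

# The squared standard deviations should equal the eigenvalues.
eigenvalues_match <- all.equal(pca$sdev^2, eig$values)

# Each column of loadings should equal an eigenvector, up to its sign.
loadings_match <- all.equal(abs(unname(pca$rotation)), abs(eig$vectors))

print(eigenvalues_match)
print(loadings_match)
```

The comparison uses absolute values because the sign of an eigenvector is arbitrary, so two correct implementations may return mirrored axes.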

Principal Component Analysis (PCA) (2/3)

# PCA outputs:
summary(data.out)

# Shows a scree plot.
plot(data.out, main="Scree plot", xlab="Principal Components")

# Shows a biplot graph: the coordinates of the individuals on the principal components, together with the variables.
biplot(data.out, main="Non standardized data", cex=0.7)

# Repeat the analysis using the argument "scale=TRUE", which means that the data are standardized, and show a biplot graph.
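The effect of standardization can be sketched as follows, again assuming the built-in USArrests data set as a stand-in for the course data; the names pca_raw, pca_std, share_raw and share_std are illustrative. With scale=TRUE every variable has unit variance, so the analysis is carried out on the correlation matrix rather than the covariance matrix.

```r
# Stand-in data: the built-in USArrests data set (an assumption, not the Fangataufa file).
X <- USArrests

pca_raw <- prcomp(X, scale = FALSE)  # covariance-based PCA
pca_std <- prcomp(X, scale = TRUE)   # correlation-based PCA

# With scale = TRUE the eigenvalues are those of the correlation matrix,
# so they sum to the number of variables.
std_matches_cor <- all.equal(pca_std$sdev^2, eigen(cor(X))$values)
std_total <- sum(pca_std$sdev^2)

# Share of the total variance carried by the first component in each analysis.
share_raw <- pca_raw$sdev[1]^2 / sum(pca_raw$sdev^2)
share_std <- pca_std$sdev[1]^2 / sum(pca_std$sdev^2)

print(c(raw = share_raw, standardized = share_std))
```

In this stand-in data set one variable has a much larger variance than the others and dominates the non-standardized analysis. Calling biplot(pca_std, cex=0.7) would then draw the standardized biplot requested in the exercise.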

Principal Component Analysis (PCA) (3/3)

Principal coordinate analysis (PCoA) 1/5

# Use the function cmdscale() of the stats package to carry out this analysis. “cmds” is the acronym of “classical multidimensional scaling”.

# Functions to compute PCoA are also available in packages ape (function pcoa), ade4 (function dudi.pco), and mvpart (function cmds.diss).

# PCoA starts with a distance matrix. The Euclidean distance will be used to illustrate how to use PCoA. The Euclidean distance is the default option when using function dist of the stats package.
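When the distance matrix is Euclidean, PCoA recovers the same ordination as a non-standardized PCA. A minimal sketch of this equivalence, assuming the built-in USArrests data set as a stand-in for the course data (the names pca and pcoa_out are illustrative):

```r
# Stand-in data: the built-in USArrests data set (an assumption, not the Fangataufa file).
X <- USArrests

# PCA on the raw (centered, non-standardized) data.
pca <- prcomp(X, scale = FALSE)

# PCoA on the matrix of Euclidean distances, keeping k = 2 axes.
pcoa_out <- cmdscale(dist(X), k = 2, eig = TRUE)

# The object coordinates agree with the PCA scores, up to the sign of each axis.
same_axes <- all.equal(abs(unname(pcoa_out$points)),
                       abs(unname(pca$x[, 1:2])))

# The PCoA eigenvalues are the PCA variances multiplied by (n - 1).
same_eig <- all.equal(pcoa_out$eig[1:2] / (nrow(X) - 1), pca$sdev[1:2]^2)

print(same_axes)
print(same_eig)
```

The practical value of PCoA is therefore with other distances, such as Bray-Curtis, for which no equivalent PCA exists.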

# Example: analysis of the file fangataufa.txt.
data <- read.table("fangataufa.txt", sep="\t", header=TRUE)

# Compute the matrix of Euclidean distances
data.D1 = dist(data[,-1])
# or
data.D1 = dist(data[,-1], method="eucl")

# Compute the matrix of Bray-Curtis distances
library(vegan)
data.D2 <- vegdist(data[,-1], method="bray")

Principal coordinate analysis (PCoA) 2/5
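For reference, the Bray-Curtis distance between two sites is the sum of the absolute differences in abundance divided by the sum of all abundances at both sites. The sketch below computes it in base R on a small, hypothetical abundance table (the matrix counts and the helper bray_curtis() are invented for illustration and are not part of vegan).

```r
# Hypothetical abundance table: 3 sites (rows) by 4 species (columns).
counts <- matrix(c(5, 0, 3, 2,
                   1, 4, 0, 5,
                   0, 0, 6, 4),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("site1", "site2", "site3"),
                                 c("sp1", "sp2", "sp3", "sp4")))

# Illustrative helper: Bray-Curtis distance between all pairs of rows.
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)  # return an object of class "dist", usable by cmdscale()
}

D_bc <- bray_curtis(counts)
print(D_bc)
```

For example, site1 and site2 differ by 4 + 4 + 3 + 3 = 14 individuals out of 20 in total, giving a distance of 0.7. The values range from 0 (identical composition) to 1 (no shared species).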

# Principal coordinate analysis. Save k=3 axes. Plot a graph of axes 1 and 2.
outBC = cmdscale(data.D2, 3, eig=TRUE)
plot(outBC$points[,1], outBC$points[,2], main="Bray-Curtis distance", asp=1, xlab="Axis 1", ylab="Axis 2")
names = rownames(data)
text(outBC$points[,1], outBC$points[,2], labels=names, pos=2, cex=0.5, offset=0.15)

# Note: "asp=1" constrains the two axes to the same scale. This ensures that the distances
# among objects on the plot are projections of their real distances in multivariate space.
?cmdscale        # consult the documentation file of function cmdscale
summary(outBC)   # to obtain a list of the elements in object outBC
outBC$points     # contains the coordinates of the objects along the k requested dimensions
outBC$eig        # contains the eigenvalues of the principal axes
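The eigenvalues returned by cmdscale() can be turned into the proportion of variation represented by each axis. A minimal sketch, assuming the built-in eurodist distance object as a stand-in for the Bray-Curtis matrix; dividing by the sum of the positive eigenvalues is one common convention, chosen here because non-Euclidean distances can produce negative eigenvalues.

```r
# Stand-in distance matrix: the built-in eurodist object (an assumption, not the course data).
out <- cmdscale(eurodist, k = 3, eig = TRUE)

# Elements of the returned list.
print(names(out))

# Coordinates of the objects on the 3 requested axes.
coords <- out$points
print(dim(coords))

# Proportion of variation on each retained axis,
# relative to the sum of the positive eigenvalues.
positive <- out$eig[out$eig > 0]
prop <- out$eig[1:3] / sum(positive)
print(round(prop, 3))
```

The same lines apply to an object such as outBC by replacing eurodist with the corresponding distance matrix.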

Principal coordinate analysis (PCoA) 3/5

# Repeat the analysis after applying the Hellinger transformation (function decostand of the vegan package) to the data. Applying the Hellinger transformation and then computing Euclidean distances yields the Hellinger distance.

# Example: analysis of the file fangataufa.txt
data <- read.table("fangataufa.txt", sep="\t", header=TRUE)
library(vegan)
data.hel = decostand(data[,-1], "hel")  # Hellinger transformation
data.DHell = dist(data.hel)             # Compute the Hellinger distance
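The Hellinger transformation divides each abundance by its row (site) total and takes the square root. The sketch below reproduces it in base R on a small, hypothetical abundance table (the matrix counts is invented for illustration) and then computes the Hellinger distance with dist().

```r
# Hypothetical abundance table: 3 sites (rows) by 4 species (columns).
counts <- matrix(c(5, 0, 3, 2,
                   1, 4, 0, 5,
                   0, 0, 6, 4),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("site1", "site2", "site3"),
                                 c("sp1", "sp2", "sp3", "sp4")))

# Hellinger transformation: square root of the relative abundances per site.
rel_abund <- sweep(counts, 1, rowSums(counts), "/")
hel <- sqrt(rel_abund)

# After the transformation, the squared values in each row sum to 1.
print(rowSums(hel^2))

# Euclidean distance on the transformed data = Hellinger distance.
D_hel <- dist(hel)
print(D_hel)
```

Because every transformed row has unit length, the Hellinger distance is bounded above by the square root of 2, which is reached when two sites share no species.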

# Principal coordinate analysis.
outDHell = cmdscale(data.DHell, 3, eig=TRUE)
plot(outDHell$points[,1], outDHell$points[,2], main="Hellinger distance", asp=1, xlab="Axis 1", ylab="Axis 2")
names = rownames(data)
text(outDHell$points[,1], outDHell$points[,2], labels=names, pos=2, cex=0.5, offset=0.15)

Principal coordinate analysis (PCoA) 4/5

# Function text adds the object names to the graph; x and y are the coordinates of the objects on the two plotted axes.
names = rownames(data)
text(x, y, labels=names, pos=2, cex=0.5, offset=0.15)

# PCoA can also be computed using the pcoa() function of the ape package. The biplot.pcoa() function of that package produces nicer ordination plots than those produced above.

Principal coordinate analysis (PCoA) 5/5