Correspondence Analysis for Principal Components Transformation Of

CoruespondenceAnalysis for PrincipalGomponents Transformation of Multispectraland Hyperspectral Digital lmages James R. Caft and Korblaah Matanawi Abstract componentsand factor analysisby using a chi-squaremetric for representinginter- and Correspondenceanalysis is introduced for principal compo- intra-variablerelationships. For nents ftansformation of multispectral and hyperspectral digital certain types of imagesand/or individual scenes,correspon- images. This method relies on squared deviations between dence analysismay provide better discrimination among in- pixel volues and their expected values (joint probabilities tercorrelatedspectral bands when compared to principal computed as the product of the sum of all pixels in one componentsor factor analysis.Correspondence analysis may spectal band and the sum of pixel values acrossall bands be a better method when applied to ratio data, such as pixel at a given pixel position). Correspondenceanalysis is applied values, or measurementsin general (Greenacre,1984). to o multispectml SPoTHigh Resolution Visible fnnv) image Correspondenceanalysis is more a geometricalmethod of Eleuthera, Bahamas. Correspondenceanalysis, principa) than a statisticalone. Factor analysis and principal components analysis,for instance,are statistical components analysis, and factor andysis (standardized pfin- methods. Both are cipal components) yiel d similar transformations. Conespon- functions of covarianceamong the multiple variables,where denceanalysis, however, compresses mote imoge variance covatianceis the metric for closeness(similarity). Correspon- dence analysisuses a different measure into fewer principal components. For the particular SPOTHRv of closenessbetween scenechosen, correspondence analysis captures g6 percent variables.A chi-souaremetric is used to measuredistance between pixel veciors of order M, where M is the number of of the original image vafiance in its first principal component. Used in a lossy image compression algorithm to recon- spectralbands. In this sense,conespondence analysis is a ge- ometric method becauseits main concern is the orientation struct the original set of three SPOTHRV images, this first of vectorsin M-dimensionalspace, and how close principal component from cofiespondence analysis restores [pixel] spectrul content better than doesprincipal componentsanal- these vectors are to one another. A d-etaileddiscussion of the ysis. historical and theoretical background of correspondence analysiscan be found in Greenacre(19s4). A subsequentapplication is used to emphasizethe simi- lntroduction larities and differencesamong principal componentsanalysis, Principal componentstransformation, also refened to as a factor analysis,and correspondenceanalysis. The primary in- Hotelling transform,of multispectral imagesis well known tent of this paper is to introduce correspondenceanalysis for (e.g.,Schowengerdt, 1997; Gonzalez and Woods,1992; Jen- principal componentstransformation of multispectraland hy- sen, 1996).The typical application relies on within-band var- perspectral digital images.Depending on the image type and iance/between-bandcovariance matrices, eigenvectors of scene,it may yield resultsthat are superior to those from the which are used to rotate (transform)original image bands other two methods.An extensionof corresoondenceanalvsis such that each rotated band is statistically orthogonalto (in- to digital imagecompression is also suggesied. dependentofl all other bands. Another application, called standardizedprincipal components(Singh and Harrison, 1985),uses within- and between-bandcorrelation coeffrcients Algotithms to assemblea matrix that is eigen decomposed.Although called standardizedprincipal componentsin literature re- PrincipalComponents Analysis lated to remote sensing,this method is often known as factor Given M intercorrelatedimage bands, principal components analysis(see reviews in Davis (1s73),Davis (1986),and Carr analysisproceeds by assemblinga matrix [S], M by M in size (1995)),referred to as such throughout this manuscript to and symmetrical,such that S(i,i) is the variance for band i, : avoid confusion with principal componentsanalysis. Factor and S(i,il S(/,t is the covariancebetween bands i and l. analysishas been used in digital image processingto mini- Variance can be comouted as mize the numerical influence of bands having a larger variance.Realistically, principal componentsand factor analysis VAR(BV) : -;(i"u,) are similar because(co)variance is directly proportional to '= [i'"' ] correlationcoeffi cient. (1) A multivariate method of relatively recent advent, corre- ri ['i'"'- (i'",)'] spondenceanalysis (Benzecri,1973), differs from principal PhotogrammetricEngineering & RemoteSensing, J. Carr is with the Departmentof GeologicalSciences/l.72, Vol. 0s, No. B, August 19s9,pp. 909-914. University of Nevada,Reno, NV 89557 ([email protected]). oo99-1, 1, 1, 2 / 99/6508-909$3. 00/0 K. Matanawi is with World Vision, Sudan Proqram,P.O. Box @ 1999 American Society for Photogrammetry 56527,Nairobi, Kenya. and RemoteSensine PHOIOGRAMiIETRICEiIGINEERING & RETIIOTE SENSING August I gqq 9O9 pixels, in which N is the total number of BV, collectively (4) representinga particular image band. This equation com- pares to that for covariance between two image bands, BV, and BVr:i.e., This is an inefficient approach with respect to computer memory; instead, the matrix S can alternatively be formed 1 f..$ an approach that more directly implies the cov(nv,.BVr) : BV,.rBVr., as follows, N(N _ 1l LNZ chi-squareanalogy (Davis, 1986; Can, 1995): - ,i ,u,,i,u,,] e) ,*:i(w)(w) (s) where N is again the total number of pixels in each image For example,if i : = k :1, then band. i Supposethe BVsin a Landsat TM sceneare to be rotated v1lq -(Y,,-w'7,)' (transformed)using principal componentsanalysis of all wrT, seven(M bands. In this case,a matrix S of size 7 by 7 is computed.Diagonal entries of this matrix consist of the and comparing to the equation for the chi-square statistic: seven variances for each band. Additionallv. there are 21,off- (O' - E,)' diagonal entries, symmetrical above and below the diagonal, x":" representingthe covariancesfor all possible two-band combi- E, nations from the group of sevenbands. in which O is an observedvalue and E is the expectedvalue, the analogy is realized if one considersthat Y is observed, FactorAnalysis (Standardized Principal Components Analysis) and product WT value Y. When Again, assumea multispectral image consisting of M spectral the is the expected of using the second for comput- bands.A matrix S of size Mbv M is comouted such that approach in correspondenceanalysis ing the matrix S, off entries become negative. S(i,, : S(i,t : r, the correlatibn coefficientbetween bands, j diagonal can The approach for forming S is in this paper and For consistencywith previous equations,let the pixels second used and i. in (1998).As in band i be representedby BVi,and those for band j be rep- Car a ffnal note, correspondenceanalysis al- ways resentedby BVr.Then reducesthe dimensional spaceby at least one. If M im- agebands are considered,only M - 1 eigenvalues,at the : COV (BVJ BV,) most, will be significant.This has important implications for t" (s*rs,J- (3) digital image compression,a concept that is explored later in this paper. In this equation, S represents the standard deviation of a dis- crete sample.Moreover, if i : i, then r : 1; thus, all diago- Application nal entries of S are equal to 1. A portion (400 rows by 400 pixels per row) of a SPOTHigh Given a seven-bandLandsat TM scene,S is a 7 by 7 ma- Resolution Visible (nnv) image of Eleuthera, Bahamas (Figure trix, The seven diagonal entries are each equal to 1. The 21 1) acquired on 21 April 1986 is chosen for an example com- unique off-diagonal entries, symmetrical above and below parison among principal componentsanalysis, factor analy- the diagonal,are equal to correlation coefficientsfor all pos- sis, and correspondenceanalysis. The three multispectral sible two-band combinations (from the original group of channels of the SPOTHRV sensor are used: channel 1 (0.5 to sevenbands). These correlation coefficientsrange numeri- 0.59 pm), channel2 (0.61to 0.68 pm), and channel3 (0.79to -1 cally between and 1, inclusive. 0.89 pm). Matrices S, obtained from each multivariate method, are listed (Table 1), along with their associatedeigenvalues ConespondenceAnalysis and eigenvectors(Table 2). In this development,a hypothetical notion of an image is Principal componentsanalysis and factor analysisare used to demonstratemathematical concepts.Suppose a 512 identical with respect to the amount of original image vari- by SIZ , seven-bandLandsat TM image is to be transformed ance captured (represented)in each principal component: 70 using correspondenceanalysis. Each band consistsof 512 by percent in the first, 28 percent in the second,and 2 percent 512 pixels, or a total of 262,L44pixels. Let a matrix Y be in the third principal component. For this particular SPoT formed that is N by M in size;M in this case,is 262,1.44, HRVscene, there seemsto be no need to standardizeimage and M is 7. Notice that Y is simplv the total collection of all variance using correlation coefficientsin a principal compo- pixels involved. Further, Iet theioial sum of all entries in Y nents analysisbecause the eigen decompositionresults ard be called u. The following steps are completed to yield a identical from principal componentsanalysis and factor square,symmetrical matrix S that is eigen decomposedto

Load more