CoruespondenceAnalysis for PrincipalGomponents Transformation of Multispectraland Hyperspectral Digital lmages

James R. Caft and Korblaah Matanawi

Abstract componentsand factor analysisby using a chi-squaremetric for representinginter- and Correspondenceanalysis is introduced for principal compo- intra-variablerelationships. For nents ftansformation of multispectral and hyperspectral digital certain types of imagesand/or individual scenes,correspon- images. This method relies on squared deviations between dence analysismay provide better discrimination among in- pixel volues and their expected values (joint probabilities tercorrelatedspectral bands when compared to principal computed as the product of the sum of all pixels in one componentsor .Correspondence analysis may spectal band and the sum of pixel values acrossall bands be a better method when applied to ratio data, such as pixel at a given pixel position). Correspondenceanalysis is applied values, or measurementsin general (Greenacre,1984). to o multispectml SPoTHigh Resolution Visible fnnv) image Correspondenceanalysis is more a geometricalmethod of Eleuthera, Bahamas. Correspondenceanalysis, principa) than a statisticalone. Factor analysis and principal compo- nents analysis,for instance,are statistical components analysis, and factor andysis (standardized pfin- methods. Both are cipal components) yiel d similar transformations. Conespon- functions of covarianceamong the multiple variables,where denceanalysis, however, compresses mote imoge covatianceis the metric for closeness(similarity). Correspon- dence analysisuses a different measure into fewer principal components. For the particular SPOTHRv of closenessbetween scenechosen, correspondence analysis captures g6 percent variables.A chi-souaremetric is used to measuredistance between pixel veciors of order M, where M is the number of of the original image vafiance in its first principal compo- nent. Used in a lossy image compression algorithm to recon- spectralbands. In this sense,conespondence analysis is a ge- ometric method becauseits main concern is the orientation struct the original set of three SPOTHRV images, this first of vectorsin M-dimensionalspace, and how close principal component from cofiespondence analysis restores [pixel] spectrul content better than doesprincipal componentsanal- these vectors are to one another. A d-etaileddiscussion of the ysis. historical and theoretical background of correspondence analysiscan be found in Greenacre(19s4). A subsequentapplication is used to emphasizethe simi- lntroduction larities and differencesamong principal componentsanalysis, Principal componentstransformation, also refened to as a factor analysis,and correspondenceanalysis. The primary in- Hotelling transform,of multispectral imagesis well known tent of this paper is to introduce correspondenceanalysis for (e.g.,Schowengerdt, 1997; Gonzalez and Woods,1992; Jen- principal componentstransformation of multispectraland hy- sen, 1996).The typical application relies on within-band var- perspectral digital images.Depending on the image type and iance/between-bandcovariance matrices, eigenvectors of scene,it may yield resultsthat are superior to those from the which are used to rotate (transform)original image bands other two methods.An extensionof corresoondenceanalvsis such that each rotated band is statistically orthogonalto (in- to digital imagecompression is also suggesied. dependentofl all other bands. Another application, called standardizedprincipal components(Singh and Harrison, 1985),uses within- and between-bandcorrelation coeffrcients Algotithms to assemblea that is eigen decomposed.Although called standardizedprincipal componentsin literature re- PrincipalComponents Analysis lated to remote sensing,this method is often known as factor Given M intercorrelatedimage bands, principal components analysis(see reviews in Davis (1s73),Davis (1986),and Carr analysisproceeds by assemblinga matrix [S], M by M in size (1995)),referred to as such throughout this manuscript to and symmetrical,such that S(i,i) is the variance for band i, : avoid confusion with principal componentsanalysis. Factor and S(i,il S(/,t is the covariancebetween bands i and l. analysishas been used in digital image processingto mini- Variance can be comouted as mize the numerical influence of bands having a larger vari- ance.Realistically, principal componentsand factor analysis VAR(BV) : -;(i"u,) are similar because(co)variance is directly proportional to '= [i'"' ] correlationcoeffi cient. (1) A multivariate method of relatively recent advent, corre- ri ['i'"'- (i'",)'] spondenceanalysis (Benzecri,1973), differs from principal PhotogrammetricEngineering & RemoteSensing, J. Carr is with the Departmentof GeologicalSciences/l.72, Vol. 0s, No. B, August 19s9,pp. 909-914. University of Nevada,Reno, NV 89557 ([email protected]). oo99-1, 1, 1, 2 / 99/6508-909$3. 00/0 K. Matanawi is with World Vision, Sudan Proqram,P.O. Box @ 1999 American Society for Photogrammetry 56527,Nairobi, Kenya. and RemoteSensine

PHOIOGRAMiIETRICEiIGINEERING & RETIIOTE SENSING August I gqq 9O9 pixels, in which N is the total number of BV, collectively (4) representinga particular image band. This equation com- pares to that for covariance between two image bands, BV, and BVr:i.e., This is an inefficient approach with respect to computer memory; instead, the matrix S can alternatively be formed 1 f..$ an approach that more directly implies the cov(nv,.BVr) : BV,.rBVr., as follows, N(N _ 1l LNZ chi-squareanalogy (Davis, 1986; Can, 1995): - ,i ,u,,i,u,,] e) ,*:i(w)(w) (s) where N is again the total number of pixels in each image For example,if i : = k :1, then band. i Supposethe BVsin a Landsat TM sceneare to be rotated v1lq -(Y,,-w'7,)' (transformed)using principal componentsanalysis of all wrT, seven(M bands. In this case,a matrix S of size 7 by 7 is computed.Diagonal entries of this matrix consist of the and comparing to the equation for the chi-square statistic: seven for each band. Additionallv. there are 21,off- (O' - E,)' diagonal entries, symmetrical above and below the diagonal, x":" representingthe covariancesfor all possible two-band combi- E, nations from the group of sevenbands. in which O is an observedvalue and E is the expectedvalue, the analogy is realized if one considersthat Y is observed, FactorAnalysis (Standardized Principal Components Analysis) and product WT value Y. When Again, assumea multispectral image consisting of M spectral the is the expected of using the second for comput- bands.A matrix S of size Mbv M is comouted such that approach in correspondenceanalysis ing the matrix S, off entries become negative. S(i,, : S(i,t : r, the correlatibn coefficientbetween bands, j diagonal can The approach for forming S is in this paper and For consistencywith previous equations,let the pixels second used and i. in (1998).As in band i be representedby BVi,and those for band j be rep- Car a ffnal note, correspondenceanalysis al- ways resentedby BVr.Then reducesthe dimensional spaceby at least one. If M im- agebands are considered,only M - 1 eigenvalues,at the : COV (BVJ BV,) most, will be significant.This has important implications for t" (s*rs,J- (3) digital image compression,a concept that is explored later in this paper. In this equation, S represents the standard deviation of a dis- crete sample.Moreover, if i : i, then r : 1; thus, all diago- Application nal entries of S are equal to 1. A portion (400 rows by 400 pixels per row) of a SPOTHigh Given a seven-bandLandsat TM scene,S is a 7 by 7 ma- Resolution Visible (nnv) image of Eleuthera, Bahamas (Figure trix, The seven diagonal entries are each equal to 1. The 21 1) acquired on 21 April 1986 is chosen for an example com- unique off-diagonal entries, symmetrical above and below parison among principal componentsanalysis, factor analy- the diagonal,are equal to correlation coefficientsfor all pos- sis, and correspondenceanalysis. The three multispectral sible two-band combinations (from the original group of channels of the SPOTHRV sensor are used: channel 1 (0.5 to sevenbands). These correlation coefficientsrange numeri- 0.59 pm), channel2 (0.61to 0.68 pm), and channel3 (0.79to -1 cally between and 1, inclusive. 0.89 pm). Matrices S, obtained from each multivariate method, are listed (Table 1), along with their associatedeigenvalues ConespondenceAnalysis and eigenvectors(Table 2). In this development,a hypothetical notion of an image is Principal componentsanalysis and factor analysisare used to demonstratemathematical concepts.Suppose a 512 identical with respect to the amount of original image vari- by SIZ , seven-bandLandsat TM image is to be transformed ance captured (represented)in each principal component: 70 using correspondenceanalysis. Each band consistsof 512 by percent in the first, 28 percent in the second,and 2 percent 512 pixels, or a total of 262,L44pixels. Let a matrix Y be in the third principal component. For this particular SPoT formed that is N by M in size;M in this case,is 262,1.44, HRVscene, there seemsto be no need to standardizeimage and M is 7. Notice that Y is simplv the total collection of all variance using correlation coefficientsin a principal compo- pixels involved. Further, Iet theioial sum of all entries in Y nents analysisbecause the eigen decompositionresults ard be called u. The following steps are completed to yield a identical from principal componentsanalysis and factor square,symmetrical matrix S that is eigen decomposedto analysis. yield the image transformation (for more detail and computer programs,see Carr (1990;1995; 1998)): Tralr 1. MATRTCESS, ron PnrnclpnLColponenrs, FAcroR,AND Step 1: normalize Y by u: Y' = Y/u; CoRnesponoeruceAnnlysrs Step2: form an N by 1 vector W, each entry of which Principal ComponentsAnalysis representsthe sum of each row of Y'; each entry 2261.6 1546.1 _L//J.J in W is the total sum of [normalized] pixel val- 1546.1 1641.4 -432.O ues acrossall spectralbands consideredfor a - l/ /5-5 -432.O 3834.3 given pixel position; Factor Analysis Step3: form an Mby 1,vector T, each entry of which 1.0000 0.8025 - 0.6029 representsthe sum of each column of Y'; each 0.8025 1.0000 -o.7722 -0.6029 -o.1722 entry in T is the sum of all [normalized] pixel 1.0000 values in a given spectralband; CorrespondenceAnalysis -o.1211. Step 4: although a matrix S of size Mby M can be 0.0875 0.0370 0.0370 -0.0619 formed from two intermediate matrices.S^ and o.0267 -o.7277 -0.0619 780 S., such that 0.1 gfO Ausust I 999 PHOTOGRAMMETRICENGINEERII{G & REIIIOTESENSING Figure1' Mosaicof sPorHRV spectral images of Eleuthera,Bahamas acquired on 21 April1986. Theseimages represent a sub-sampled,400 rowby 400 pixelsection of the originalscene. Left: Channel 1 (0.5 to 0.59 pm); center:Channel 2 (o.6I to 0.68 pm); right:Channel 3 (0.79 to 0.89 pm).

In-comparison,correspondence analysis capturesg6 per- OE, and each column in T is one of the transformedimages, cent of the original image variance in its first principal com- N total pixels in size. ponent. The secondprincipal component capturesmost of Principal componentsanalysis, factor analysis,and cor- the remaining 4 percent of the total image variance,whereas respondenceanalysis yield visually similar results for the the third componentrepresents a negligible amount of vari- first principal component. The secondprincipal component ance.The 1argeamount of variance captured by correspon- image from correspondenceanalysis is visuafly similar to the dence analysisin its first principal component is usedlater third principal component image from both piincipal compo- in a discussionon digital image compression. nents an_alysisand factor analysis, probably becauJeapproxi- Eigenvectors(Table 2) from all three methods are used mately the _sameamount of original image-varianceis being to transform the original suite of three bands to three mutu- represented.The third principal component image from coi- ally orthogonal(statistically independent) bands, mosaicsof respondenceanalysis is shown, but representsnegligible var- which are shown (Figures2 and 3; a specific figure devel- iance and is not comparableto any principal component oped using factor analysisis not presentedbecause its trans- image from the other two methods formation is visually identical to that from principal c_omponentsanalysis). No scaling was used to develop the Extensionto Digitallmage Compression displays (Figures2 and 3), except to transform pixel values Given that OE : T effectsthe principal componentstransfor- to a range,[0, 255], to facilitate display. mation, then a reconstructionis afforded by reversingthis Mathematically,the transformationwas implemented as process:TE-' : O, in which E I is the inverse of the follows. Each of the SPOTHRV spectral band imases was eigenvectormatrix. If all M columns (principal component loaded as a single column in a matrix O, an N b! M image images)in T are used in this process,O is reconstructedex- where N is the total number of pixels in each band image, actly with no loss. In this case,T and O have the sameinfor- and M is the number of spectralbands. Eigenvectors(Table mation content (the same size). 2) from each method were loaded as columns into a matrix - A- lossy reconstruction (one with error) is possibleusing E. The transformationwas then obtained as a product: T : only the most significant principal component, or compo-

TneLE2. EreenDecovposrrron Resulrs Eigenvectors Vector 1 Vector 2 Vector 3 Principal Components: -o.779 0.664 2.679 - 0.435 7.172 -2.499 1.000 1.000 1.000 't57.8 Eigenvalues: 5463.7 2204.3 Sum = 7825.8 Variance: (7Oo/o) (28%) (2"/.) FactorAnalysis: - 1.458 0.063 7.970 -7.230 0.738 -1..522 1.000 1.000 1.000 Eigenvalues: 2.O9 0.84 o.o7 Sum = 3.0 Variance: (7Oo/o) (28o/o) (2%) Correspondence Analysis: -0.687 3.948 o.975 -o.342 -5.008 0.968 1..000 1.000 1.000 Eigenvalues: 0.28 0.0099 5.04E-15 Sum : 0.29 Variance: (s6%) (4%) (o%)

PHOTOGRAMMETRICENGINEERII{G & REMOTE SEI{SII{G August 1999 911 Figure2. Mosaicof transformedimages obtained using correspondenceanalysis.Left: first principalcomponent image; cen- ter: secondprincipal component image; right: third principalcomponent image.Each image is 400 linesby 400 pixelsper line.

nents, in T. For example, correspondenceanalysis of the one principal component for reconstruction.In this particular SPor HRVimage used in this paper resulted in the first prin- case,the first principal component in correspondenceanaly- cipal component capturing 96 percent of the original data sis representsmore between-bandcovariance than does the variance,An experiment is presentedto reconstructthe origi- comparablecomponent in principal componentsanalysis. nal three images,O, using only the first column in T. In this Whereaslossy digital image compressionmay not be satisfy- case.the secondand third columns of T are discarded.Such ing for many applications, it is used here more to distinguish a reconstructionis presented(Figure a). This is an example the information content in principal componentsdeveloped of a lossy compression,because discarding the second and using correspondenceanalysis from the information content third columns in T results in a loss of 4 percent of the origi- in principal componentsdeveloped using principal compo- nal data variance.The reconstruction (Figure 4) is visually nents analysis or factor analysis. similar, but not identical to the original three spectralbands (Figure 1). For comparison,a reconstructionis attempted us- Summary ing only the first principal component image from principal Corresoondenceanalvsis is a relativelv new method of multi- componentsanalysis (Figure 5). In this case,30 percent of variate analysis.Its metric is a "chi-square" deviation be- the original image variance is discarded,and the comparison tween a true (normalized)pixel value and its expectedvalue. to the original spectralimages (Figure 1) is less satisfying. In application to a three-band,spor HRVimage, the first two In thesereconstructions (Figure 4 contrastedby Figure principal component imagesfrom correspondenceanalysis 5), correspondenceanalysis restores the vegetationbrightness capture almost all original image variance.In contrast,all in the infrared spectralband (spor HRVband 3) better than principal component imagesare necessaryfrom principal does principal componentsanalysis when using only the first componentsanalysis or factor analysis (standardizedprinci-

Figure3. Mosaicof transformedimages obtained using principal components analysis. Left: first principalcomponent image; center:second principal component image; right: third principalcomponent image. Each image is 4OOlines by 400 pixels oerline.

August I 999 PHOTOGRAMMETRICEI{GI]{EERING & REMOTESEI{SING Figure4. Mosaicof reconstructedsPoT uRv images using the first principalcomponent image from correspondenceanalysis. Channel1 is on the left;Channel 2 is in the center;and Channel3 is on the right.Note the restorationof vegetationbright- nessin Channel3.

pal componentsanalysis) to representthe sameamount of construction if all principal component imagesare used for original imagevariance. the reconstruction.This, however, representsno compres- Visually, correspondenceanalysis yields principal com- sion. Correspondenceanalysis will always yield some com- ponent imagessimilar to those obtained lrom principal com- pression,even for losslessreconstruction, because it always ponents analysisor factor analysis.This statementis con- yields M - 1 significant principal components(recall the firmed by comparing Figures2 and 3 again and noting that negligible third principal component from correspondence the secondprincipal component image from correspondence analysis in the one application reviewed in this paper; Table analysisis almost identical to the third principal component 2). In the one example of reconstructionthat is presented, image from principal componentsor factor analysis.If ap- correspondenceanalysis yielded a fair reconstructionusing plied for decorrelationstretches, there is no validity to a rec- only its first principal component. ommendationthat one of thesemethods be used rather than another.But. correspondenceanalvsis capturesmore of the Acknowledgments original image variance in fewer piincipal components,an This work was supported in part by the NAsA-sponsored advantagewhen the original image data consistsof many University of Nevada SystemSpace Grant Consortium. Soft- spectralbands. ware used in this study was developedby the senior author Another advantageto correspondenceanalysis may be using Visual Basic 5.0 (Carr, 1998).This software is in the for digital image compression,especially for spectralcom- public domain and can be obtained at the following World pression.Such compressionis of two types: lossless(error- Wide Web site: http ://www. iamg.org/CGEditor/index.htm. free reconstruction)and lossy (reconstructionwith error), The SPoTHRV image of Eleuthera,Bahamas was taken from a Principal componentsand factor analysisprovide losslessre- CD-RoM:SPOT Satellite Image Library for Schools,NPA Group

Figure5. Mosaicof reconstructedspoT HRV images using the first principalcomponent image from principalcomponents analysis.Channels are arrangedas describedin the captionto Figure4. Vegetationbrightness is not reconstructedwell in Channel3.

PHOTOGRAMMEfRIC ENGINEERING& REMOTESENSIT{G Aueust 1999 913 Ltd., Edenbridge, Kent, TNB 6HS,United Kingdom. We are Davis, f.C., 1-973. and Data Analysis in Geolog.v, Fit'st Edi- grateful to two anonymous individuais whose reviews sub- tion, Wilev, New York, 550 p. slantiallyimproved this manuscript. 1986. Stofisfics and Data Analysis in Geologv, Second Edi- tion, Wilev, New York, 646 p. Greenacre, M.J., 1984. Theory and Applications of Cotespondence References Analysis, Academic Press, London, 364 p. Gonzalez, R.C., and R.E. Woods, 1992. Digital Image Processing, Benzecri, J.-P., 1,573.L'Analyse des Donnees, Vol. 1: La Taxinomie. Addison-Wesley, Reading, Massachusetts,7Lo p. Vol. 2: L'Analyse des Correspondances, Dunod, Paris, 628 p. Jensen, J.R., 1996. lntroductory Digital Image Processing, Prentice- Hall, Englervood Cliffs, New 316 p. Carr, J.R., 1990. CORSPOND: A Portable FORTRAN-77 Program lbr Iersey, Correspondence Analysis, Computers and Geosciences, 16(3): Schowengerdt, R.A., 1997. Remote Sensing: Models and Methods for 285-307. Image Processing, Second Edition, Academic Press, Ner,r'York, 522 p. 1995. Numedcal Analysis for the Geological Sciences,Pren- tice-Hall, Englewood Cliffs, New Jersey, 592 p. Singh, A., and A. Harrison, 1985. Standardized Principal Compo- 1sgB. A Visual Baslc Program for Principal Components nents, Inl. J. Remote Sensing, 6[6):885-8S6. Transformation of Digital Images, Computers and Geosciences, [Received 18 JuIy 1997; revised and accepted 10 September 1998t re- 24t31:2O9-21,8. vised 0z October 1998)

ASOTS

nugusr PHOTOGRAMMETRICENGIIiEERING & REMOTE SENSING