An Interactive Tool for Semi-Automatic Feature Extraction of Hyperspectral Data

Open Geosci. 2016; 8:493–502

Research Article Open Access

Zoltán Kovács* and Szilárd Szabó

DOI 10.1515/geo-2016-0040
Received Aug 05, 2015; accepted Jan 29, 2016

Abstract: The spectral reflectance of the surface provides valuable information about the environment, which can be used to identify objects (e.g. land cover classification) or to estimate quantities of substances (e.g. biomass). We aimed to develop an MS Excel add-in, Hyperspectral Data Analyst (HypDA), for a multipurpose quantitative analysis of spectral data, written in the VBA programming language. HypDA was designed to calculate spectral indices from spectral data with user-defined formulas (in all possible combinations involving a maximum of 4 bands) and to find the best correlations between these indices and the quantitative attribute data of the same objects. Different types of regression models reveal the relationships, and the best results are saved in a worksheet. Qualitative variables can also be involved in the analysis through separability and hypothesis testing, i.e. to find the wavelengths responsible for separating the data into predefined groups. HypDA can be used both with hyperspectral imagery and with spectrometer measurements. This bivariate approach requires significantly fewer observations than popular multivariate methods; it can therefore be applied to a wide range of research areas.

Keywords: hyperspectral data; spectral profile; reflectance; regression; hypothesis testing; separability indices

Highlights:
1. Reflectance curves are closely connected to the characteristics of materials.
2. An MS Excel add-in was developed to analyse spectral data.
3. Calculations are conducted with user-defined formulas, considering all wavelengths.
4. Regression models explore the correlations between spectral and attribute data.
5. Separability testing reveals the greatest differences in the spectra of data groups.

*Corresponding Author: Zoltán Kovács: Department of Physical Geography and Geoinformation Systems, University of Debrecen, Debrecen, Hungary; Email: [email protected]; Tel.: +36 52 512900/22201 (switchboard); Fax: +36 52 512945
Szilárd Szabó: Department of Physical Geography and Geoinformation Systems, University of Debrecen, Debrecen, Hungary

© 2016 Z. Kovács and S. Szabó, published by De Gruyter Open. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.

1 Introduction

The appearance of hyperspectral imaging in remote sensing technology was a milestone both in data acquisition and in the possibilities for identifying different objects or materials. Hyperspectral bands are narrow (with an approximate maximum width of 10 nm) and continuous, and they cover the visible and near-infrared range. Besides aerial sensors, there are field and laboratory devices capable of recording the spectral characteristics of given surfaces or substances [1]. Consequently, wider research communities started to use spectrum analysis in different fields, from remote sensing to analytical chemistry or physics [2–7].

The technology does not require any preparation of the analysed substance (which can be expensive and requires a laboratory background with specific devices) and is non-destructive. However, the acquired result is not the same as the results achieved with traditional analytical methods: it is a reflectance curve having certain sections (e.g. minimums, maximums, peaks or slopes) indicating the presence or even the quantity of the analysed substance. Accordingly, we need to find those bands, or combinations of bands, which can substitute for laboratory analyses.

Using spectral bands, we can determine spectral indices which have a high correlation with a measurable quantity of surface objects. The best-known index is the NDVI (Normalized Difference Vegetation Index [8]), which is widely used among researchers. In addition, many indices have been reported which aim, for example, to reduce the effect of soil reflectance on vegetation indices (SAVI, TSAVI, PVI) [9, 10], to identify water (NDWI) [11], and to determine the water content of vegetation [12] or its chlorophyll content [13]. Finding the best indices becomes complicated as the number of bands involved increases: hyperspectral data allow us to find specific details, but the number of possible band combinations increases exponentially. Parallel with the larger dataset, the difficulty of data processing also increases.

Given the large number of bands (generally > 100), several researchers have used multivariate techniques (usually Principal Component Analysis, PCA) to reduce the redundancy with uncorrelated principal components (PCs) while maintaining the maximum explained variance [14]. Although PCA is a very effective method in data processing, it has limitations in terms of the minimum number of cases: it requires a dataset 5–10 times larger than the number of variables (i.e. bands) [15], otherwise the calculations are not accurate. If the aim is prediction, PCR (Principal Component Regression) can be performed [16]; however, if the spectral analysis yields a large number of variables (i.e. bands), the statistical analysis can only be efficient if many chemical analyses are performed, too. As chemical (and all other kinds of laboratory) analysis is expensive, the number of cases is limited, i.e. in some cases PCA cannot be performed or provides false results. Thus, the number of bands has to be reduced by the researcher, either by visual selection based on the spectral curve or by merging neighbouring bands (and calculating mean values). Moreover, Jimenez and Landgrebe (1999) [17] criticized PCA because it does not take the statistical properties of the variables into consideration. As another multivariate approach, Kooistra et al. (2001) [18] applied partial least squares regression to extract latent factors from the bands, which allowed them to predict the dependent variable efficiently.

There are several software packages for statistical analysis, but none of them is able to prepare the spectral dataset for further investigations. For data preparation and data processing we developed a Visual Basic MS Excel add-in called Hyperspectral Data Analyst (HypDA). Excel is one of the most widely used software programs, and its capabilities can be extended with small add-ins that have a user-friendly interface. HypDA is able to import a wide range of input data from different devices, to handle measurement replications, and to conduct batch processing of spectral data.

The aim of this development was to prepare a tool able to process spectral curves according to the following driving forces: (1) to find the strongest relationship between the shape of the spectra and the observed attributes (when the attribute is scale data) or (2) to find the largest difference between groups based on the shape of the spectra (when the attribute is nominal data); furthermore, (3) its requirement for the number of cases should be minimal; and (4) the results should be usable to interpolate the laboratory measurements with the help of the spectral curves.

2 Workflow

The concept of the data processing is summarized in Figure 1. All hyperspectral image processing software (e.g. ENVI, Erdas, Idrisi) is able to export the values of marked pixels of aerial or satellite images into ASCII files (e.g. ENVI ROI ASCII text files); furthermore, most field/laboratory spectrometers and spectroradiometers can export their data to MS Excel file format. Thanks to the flexible MS Excel environment, this wide range of input data from different devices can easily be converted into tables (i.e. worksheets). HypDA needs a special organization of the data: the first worksheet should contain the spectral data of the samples as independent variables, while the second contains qualitative data (as grouping variables: sampling sites, land use categories, soil types etc.) and/or quantitative data (e.g. pH, CaCO3 content, ion concentrations etc.) as dependent variables.

In the data pre-processing phase we may want to reduce the spectral noise, and there are several techniques to do so, such as Savitzky-Golay smoothing [19], normalization, standard normal variate (SNV) transformation [20], multiplicative scatter correction (MSC) [21], and first and second derivatives (using Savitzky-Golay convolution coefficients) [22]. Spectral data can be trimmed to minimum and/or maximum wavelengths, and the bands can be grouped into spectral ranges of user-defined width, in which the intensity values are merged and replaced with a single calculated (mean or median) value. HypDA can also handle replicated measurements, calculating mean or median values based on the similarity of variable names.

Quick tests can be conducted with the workbook, using the worksheet of spectral intensity values and the worksheet of nominal and/or scale properties: visual interpretation of diagrams (scatterplot or boxplot), or calculation of e.g. the Jeffries-Matusita (J-M) distance [23] or the Bhattacharyya distance between selected groups. A deeper analysis can be performed with cross-validated regression models to reveal the strongest relationship between the shape of the spectra (independent variable) and scale-type attributes (dependent variable). The equations of these linear or non-linear fitted curves can be used to estimate the dependent variables. In addition, we can obtain spectral indices that discriminate predefined groups better than the original bands: the largest difference between user-defined groups is determined with separability (J-M) or hypothesis testing. In this latter case, all groups have unique equations that are used for the classification of hyperspectral images in hyperspectral image processing software.

Figure 1: Theoretical workflow.

3 Description of HypDA

3.1 HypDA structure

When the HypDA add-in is enabled, a new tab is added to the Excel ribbon. A wizard was designed to break down the creation of a new HypDA-compatible workbook into multiple steps, from the selection of the input data to the customization of the output file. This new Excel workbook is required for further investigations. Workbooks have six worksheets (WS) with specific functions. The first WS ('spectral') contains the independent values (spectral data), and the second WS ('properties') contains the grouping and/or dependent variables (physical and chemical attributes). In order to execute the calculation only on a part of the spectra (e.g. to exclude certain spectral ranges), or only on a part of the samples (e.g. to exclude some sample group), bands and samples can be enabled or disabled before the investigations. New properties can be added by reclassifying an existing attribute or by calculating them from other quantitative attributes (e.g. NDVI). The third WS ('analysis') is used to examine a selected model in greater detail, while the fourth WS ('table') is used to calculate all existing combinations of the chosen bands in a matrix-like table. The fifth WS ('best') contains the main parameters of the best models from the fourth WS. The sixth WS ('settings') is hidden from the user and stores all the saved parameters and user settings.
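Several of the pre-processing options listed in Section 2 (SNV transformation, trimming to a wavelength range, merging bands into ranges of user-defined width, and averaging replicates) are simple enough to state precisely in code. The sketch below is a minimal pure-Python illustration, not HypDA's VBA implementation. The function names, the convention of labelling each merged range by its lower edge, and the replicate naming rule (names that differ only after the last underscore, e.g. 'soil07_1' and 'soil07_2') are hypothetical assumptions chosen for the example.

```python
import statistics
from collections import defaultdict


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and divide by its standard deviation."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in spectrum]


def trim_and_bin(wavelengths, spectrum, wl_min, wl_max, width, how="mean"):
    """Keep bands within [wl_min, wl_max] and merge them into ranges of `width` nm."""
    agg = statistics.fmean if how == "mean" else statistics.median
    bins = defaultdict(list)
    for wl, value in zip(wavelengths, spectrum):
        if wl_min <= wl <= wl_max:
            # index of the range, counted from the lower cut-off wavelength
            bins[int((wl - wl_min) // width)].append(value)
    # each merged range is labelled by its lower edge (an assumed convention)
    return {wl_min + k * width: agg(vals) for k, vals in sorted(bins.items())}


def merge_replicates(samples, how="mean", sep="_"):
    """Merge replicate spectra whose names differ only in a trailing suffix (hypothetical naming rule)."""
    agg = statistics.fmean if how == "mean" else statistics.median
    groups = defaultdict(list)
    for name, spectrum in samples.items():
        base = name.rsplit(sep, 1)[0]  # e.g. 'soil07_2' -> 'soil07'
        groups[base].append(spectrum)
    # aggregate band by band across the replicates of each sample
    return {base: [agg(col) for col in zip(*spectra)] for base, spectra in groups.items()}


print(snv([1.0, 2.0, 3.0]))  # -> [-1.0, 0.0, 1.0]
print(trim_and_bin([400, 401, 402, 403, 404, 405], [1, 2, 3, 4, 5, 6], 401, 404, 2))  # -> {401: 2.5, 403: 4.5}
print(merge_replicates({"s1_1": [1, 2], "s1_2": [3, 4], "s2_1": [5, 6]}))  # -> {'s1': [2.0, 3.0], 's2': [5.0, 6.0]}
```

SNV works on each spectrum separately (across its bands), whereas replicate merging works band by band across spectra; keeping the two as separate functions makes it easy to apply them in either order.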
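Savitzky-Golay smoothing and derivatives, mentioned among the pre-processing options in Section 2, amount to applying a fixed set of convolution coefficients along the spectrum. As a minimal sketch, assuming a 5-point window and equally spaced bands, the code below uses the standard 5-point coefficients for smoothing (quadratic/cubic fit, normalised by 35) and for the first derivative (quadratic fit, normalised by 10 times the band spacing). Window length, polynomial order and the handling of the edge bands in HypDA are not specified here; dropping the two bands at each end is an assumption of this sketch.

```python
# Standard 5-point Savitzky-Golay coefficients
SMOOTH_5 = (-3, 12, 17, 12, -3)  # smoothing, quadratic/cubic fit; normalised by 35
DERIV1_5 = (-2, -1, 0, 1, 2)     # first derivative, quadratic fit; normalised by 10 * band spacing


def apply_kernel(values, coeffs, norm):
    """Weighted sum over a sliding 5-band window; the two edge bands on each side are dropped."""
    half = len(coeffs) // 2
    return [
        sum(c * values[i + j - half] for j, c in enumerate(coeffs)) / norm
        for i in range(half, len(values) - half)
    ]


def sg_smooth(values):
    return apply_kernel(values, SMOOTH_5, 35)


def sg_first_derivative(values, spacing=1.0):
    """First derivative per nm; `spacing` is the (constant) band interval in nm."""
    return apply_kernel(values, DERIV1_5, 10 * spacing)


print(sg_smooth([0, 1, 4, 9, 16, 25, 36]))        # -> [4.0, 9.0, 16.0]
print(sg_first_derivative([0, 2, 4, 6, 8], 0.5))  # -> [4.0]
```

The usage lines are a quick sanity check: a quadratic curve passes through the smoothing filter unchanged, and a straight line rising by 2 per band at 0.5 nm spacing has a slope of 4 per nm.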

Figure 2: Any item of tables or any saved best value can be recalled in the ‘analysis’ WS.

3.2 Working with parametric formulas

The HypDA tool is able to define parametric formulas referencing spectral intensity values (e.g. BandA/BandB) and to use the results as independent variables. Users can follow two different procedures: (1) choose certain bands (e.g. BandA = 560 nm and BandB = 970 nm) that result in a single model (on the 'analysis' WS); or (2) involve all bands and generate a matrix-like table with all possible pairs of bands (on the 'table' WS), where the bands change column by column (BandA) and row by row (BandB). In this second case an n-element formula results in an n-dimensional table, so if a three-element formula is defined, e.g. BandC/(BandA-BandB), the result is a three-dimensional table; consequently, BandC takes new intensity values after the calculation of each 2D table. Finally, a four-element formula results in a four-dimensional table where BandD takes new intensity values after the calculation of each 3D table. Necessarily, this solution implies that all former 2D tables have to be overwritten. To avoid losing the results, a user-defined number (up to 100) of the best values is saved from each table to the fifth ('best') WS, together with all the parameters needed to reconstruct the models later.

This second approach means that if the 'spectral' WS has 100 bands, a table will contain 10,000 individual models, and if a four-element formula is used, 10,000 tables with 10,000 individual models each, i.e. 100 million regression models, are calculated, and the best 10,000 models are saved for further investigations.

When we use matrix-like tables to search for high correlations or large differences, each item of the table constitutes a unique investigated model described by a single value that represents how good the model is (e.g. R2). One can choose a cell from the table or from the 'best' WS, which holds a model in the background, and recall this model in the 'analysis' WS for a detailed investigation (Figure 2).

3.3 Data processing: examples of hyperspectral image analysis and results

Generally, two main issues arise when processing remotely sensed data: (1) which part of the spectra shows the strongest relationship with a certain phenomenon or property, and (2) which part of the spectra shows the greatest difference between groups. Furthermore, (3) we can visualize and query the matrix of the correlations, R2 or RMSE values by band combinations. The following case studies demonstrate the capabilities of HypDA with data collected by a spectrometer and with a hyperspectral image.

3.3.1 Searching for the bands which best correlate with the attribute data

Our main goal was to investigate the relationship between the values of chosen attributes and the values calculated from the intensity values of given bands. Accordingly, regression models (defined by the user, either one at a time or all types simultaneously: linear, exponential, logarithmic, power and polynomial of degree 2–4) are calculated between a chosen attribute (dependent variable) and the indices (independent variables). All details of the model are listed by curve type, including the model error, the predicted residual sum of squares (PRESS), the coefficient of determination, the residual and regression sums of squares, the adjusted R square, the standard error of the y estimate, the significance, the quartiles of the residuals, the root-mean-square deviation and the normalized root-mean-square deviation, calculated for all available fitting curve types. Furthermore, a scatterplot with the data pairs and a fitted curve allows visual analysis. A boxplot of the residuals, a homoscedasticity chart with standardized predicted and residual values, and influential values (using Cook's distance) are calculated and plotted for the selected fitting curve(s). If a grouping (nominal) attribute is set, the data points of the charts can be coloured by group, and the coefficient of determination can be calculated for each group individually according to the chosen curve fitting type.

Several methods are embedded for the validation of the regression models: leave-one-out, k-fold and repeated random sub-sampling cross-validation are available. Datasets can be divided automatically into training and test datasets by a user-defined ratio.

To show the relevance of the tool and to reveal how soil characteristics can be predicted using spectral data, we analysed 44 soil samples collected in the Bükk Mountains (N Hungary), whose physical and chemical characteristics (pH, CaCO3 content, organic matter, granulometric composition) were determined in the Laboratory of Geosciences, University of Debrecen. Spectral features were measured with an Avantes 2048 spectrometer in the 200–1100 nm range, with a bandwidth of 0.55 nm. CaCO3 was the dependent variable and the 1630 bands were the independent variables. All measurements were arranged into the required format with an import module of HypDA, then regression analyses were conducted involving all soil characteristics as dependent variables and all bands as independent variables (it is possible to apply equations for calculating spectral indices and to run the analyses in batch mode). Considering all bands, we identified a band (289 nm) that had a high correlation with CaCO3 content (r = 0.92, p < 0.05). Furthermore, R2 was 0.85 (p < 0.001), and RMSE = 4.51 (the error relative to the average CaCO3 content was 16.23%) (Figure 3). The results were validated by splitting the dataset and recalculating the equation. Bootstrapping with random selection of cases (100 times) yielded a 95% confidence interval for R2 between 0.840 and 0.857.

Figure 3: Main panels of the 'Analysis' WS for testing the correlations between spectral indices and attributes. The worksheet contains the data table (A) with the relevant information about samples, a scatter plot (B), a boxplot (C) and a plot for the homoscedasticity (D).

3.3.2 Searching for bands where user-defined groups have significant differences

The tool is able to identify the largest difference between sample groups; accordingly, separability and/or hypothesis testing can be performed. In particular, both parametric and nonparametric hypothesis tests (the Mann-Whitney U-test, the Kruskal-Wallis H-test and the Welch t-test) can be carried out. As a first step, a normality test (Shapiro-Wilk test) should be conducted to decide whether parametric or non-parametric methods can be applied. Descriptive statistics (e.g. mean, percentiles, skewness and kurtosis) are also calculated for each group. The Welch t-test and the Mann-Whitney U-test are performed for each available group pair, and the Kruskal-Wallis H-test for all selected groups. Moreover, the Bonferroni correction (presented in tabular form) helps to decide whether there is a significant difference between the group pairs based on the significance of the Mann-Whitney tests [24]. Finally, effect sizes [25] are also calculated to quantify differences in a standardized and comparable way, and a boxplot diagram is constructed to display the quartiles of the groups. Separability indices (such as the Bhattacharyya distance, the Jeffries-Matusita distance, divergence and transformed divergence) can be determined from the covariance matrices and the mean vectors of the analysed populations.

In order to find band combinations where a certain group can be completely distinguished from the others, three more analyses can be carried out, searching for combinations that (1) have no overlapping inter-percentile ranges, (2) show a high Jeffries-Matusita distance and/or (3) indicate a significant Mann-Whitney test.

We present a case study to demonstrate the ability of our tool to discriminate land use/land cover (LULC) categories in a hyperspectral image captured by an AISA Eagle II sensor over the campus of the University of Debrecen (Debrecen is the second largest city in Hungary) in the range of 400–1000 nm (128 bands). Seven LULC categories were analysed (asphalt, building, forest, grassland, synthetic grass, tennis court and shadow); we collected 100 points for the training dataset and 1000 points for the ground truth dataset per LULC class in ENVI (Exelis Visual Solutions, 2014), following Burai et al. (2015) [26]. Spectral band data were exported as a ROI file and processed in MS Excel with the HypDA add-in. The next step was to apply the general equation (BandA-BandB)/(BandA+BandB) to calculate spectral indices and to choose the one which provided the best J-M values (i.e. the one for which the number of LULC pairs with a J-M distance above a user-defined critical value, e.g. 1.85, was maximal). The results showed that almost all LULC classes can be discriminated with the help of the new indices, except the artificial group of classes (asphalt and buildings) (Figure 4). The results can be refined by calculating spectral indices per category, so that one category's difference from the others is enhanced. In this case, the result was seven spectral indices for the seven classes (Table 1), which can be used in the classification process in ENVI (or any other image processing software).

Finally, applying these indices, we conducted a Support Vector Machine (SVM) classification in ENVI and determined the accuracy of the outcomes. Compared with the classification based on the original bands, the HypDA-based indices performed better (Figure 5). Based on the overall accuracy or the Kappa coefficient, the difference is slight, but the Producer's Accuracy (PA) and User's Accuracy (UA) calculated by category (Table 2) highlight the advantage of the HypDA indices: the classification was more successful for all classes than with the original bands; only buildings and asphalt surfaces had smaller accuracy values (but even in these cases either PA or UA was higher).

Figure 4: Main panels of the 'Analysis' WS for testing differences between groups. The worksheet contains the data table (A) with the relevant information about pixels, a boxplot (B) for visualising the distribution of populations and cross-tables (C) with the selected hypothesis test or separability index.

Figure 5: Supervised classification of a hyperspectral image of Debrecen (University Campus), Hungary (A: the true colour orthophoto of the area; B: classification with the HypDA indices; C: classification with all original bands).

Table 1: Spectral indices generated with the HypDA add-in for the discrimination of seven land use/land cover categories (each index was determined per category by enhancing its differences from all other categories).

Class            Index
Asphalt          744 nm / 615 nm
Building         682 nm / 672 nm
Forest           753 nm / 720 nm
TennisCourt      686 nm / 601 nm
SyntheticGrass   840 nm / 787 nm
Grass            517 nm / 485 nm
Shadow           522 nm / 445 nm
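The matrix-like search of Section 3.2 evaluates a formula for every combination of bands: with n bands, a two-band formula gives n × n cells (100 × 100 = 10,000), and a four-band formula gives n^4 cells (100^4 = 10^8). The sketch below illustrates the two-band case in pure Python under stated assumptions: it scores each cell by R2 of a simple linear fit, skips pairs that use the same band twice and pairs where the formula divides by zero, and keeps the top_n best cells, mirroring the 'best' WS. It then counts how often each band occurs among the saved best models, in the spirit of Section 3.3.3. All function names and the toy data are hypothetical.

```python
import heapq
from collections import Counter


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R2 of a simple linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def band_pair_table(spectra, bands, attribute, formula, top_n=100):
    """R2 of formula(BandA, BandB) for all ordered pairs of distinct bands, plus the top_n pairs."""
    table = {}
    for ia, band_a in enumerate(bands):
        for ib, band_b in enumerate(bands):
            if ia == ib:
                continue
            try:
                index = [formula(s[ia], s[ib]) for s in spectra]
            except ZeroDivisionError:
                continue  # index undefined for at least one sample
            table[(band_a, band_b)] = r_squared(index, attribute)
    best = heapq.nlargest(top_n, table.items(), key=lambda item: item[1])
    return table, best


def band_frequency(best):
    """How often each band occurs among the saved best models."""
    return Counter(band for pair, _ in best for band in pair)


def ratio(a, b):  # BandA/BandB
    return a / b


def norm_diff(a, b):  # (BandA-BandB)/(BandA+BandB)
    return (a - b) / (a + b)


spectra = [[2, 1, 5], [4, 1, 3], [6, 2, 7], [9, 3, 1]]  # toy data: 4 samples x 3 bands
bands = [500, 600, 700]
attribute = [2, 4, 3, 3]  # toy measured attribute
table, best = band_pair_table(spectra, bands, attribute, ratio, top_n=3)
print(best[0])  # -> ((500, 600), 1.0)
print(band_frequency(best))
```

Passing the formula as a function keeps the search generic: the same loop handles a simple ratio, the normalized difference used in the land cover case study, or any other two-band expression.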
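Section 3.3.1 lists PRESS, leave-one-out cross-validation and bootstrapping of R2 among the validation outputs. For the linear curve type these can be computed compactly, as in the minimal pure-Python sketch below. This is not the add-in's code. It relies on the standard identity that the leave-one-out residual of a least-squares line equals e_i / (1 - h_i), where h_i = 1/n + (x_i - mean)^2 / Sxx is the leverage, so PRESS needs no refitting. The bootstrap uses a simple percentile interval, and its 100 repetitions echo the case study. The other curve types (exponential, power and so on) and HypDA's exact bootstrap procedure are not reproduced; the function names and toy data are hypothetical.

```python
import math
import random


def fit_linear(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, R2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
    ss_tot = sum((v - my) ** 2 for v in y)
    return a, b, 1 - ss_res / ss_tot


def press(x, y):
    """Leave-one-out PRESS via leverages h_i = 1/n + (x_i - mean)^2 / Sxx (no refitting needed)."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((u - mx) ** 2 for u in x)
    a, b, _ = fit_linear(x, y)
    total = 0.0
    for u, v in zip(x, y):
        h = 1 / n + (u - mx) ** 2 / sxx
        total += ((v - (a + b * u)) / (1 - h)) ** 2
    return total


def bootstrap_r2_interval(x, y, reps=100, alpha=0.05, seed=1):
    """Simple percentile bootstrap interval for R2, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(x)
    r2s = []
    while len(r2s) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # degenerate resample: the line cannot be fitted
        r2s.append(fit_linear(xs, ys)[2])
    r2s.sort()
    lower = r2s[math.floor(alpha / 2 * reps)]
    upper = r2s[math.ceil((1 - alpha / 2) * reps) - 1]
    return lower, upper


x = [0.12, 0.15, 0.18, 0.22, 0.25, 0.31, 0.35, 0.40]  # toy index values
y = [5.1, 6.0, 7.2, 8.1, 9.4, 11.0, 12.3, 14.2]       # toy attribute values
print(fit_linear(x, y))
print(press(x, y))
print(bootstrap_r2_interval(x, y))
```

The leverage shortcut gives the same PRESS as refitting the line n times, which is why leave-one-out validation stays cheap even inside a large band-pair search.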
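In the land cover case study, each candidate index is a single number per pixel, so the separability measures of Section 3.3.2 reduce to their one-dimensional form: the covariance matrices become variances and the mean vectors become means. The sketch below shows that special case, assuming normally distributed index values and non-zero group variances. It computes the Bhattacharyya distance B and the Jeffries-Matusita distance 2(1 - exp(-B)) on the 0–2 scale, which is consistent with a critical value such as 1.85. It also counts the class pairs that exceed the threshold, which is the criterion used to select the best normalized difference index. The function names are hypothetical, and the multivariate form used in HypDA is not reproduced.

```python
import math
import statistics
from itertools import combinations


def normalized_difference(band_a, band_b):
    """(BandA - BandB) / (BandA + BandB)."""
    return (band_a - band_b) / (band_a + band_b)


def bhattacharyya_1d(g1, g2):
    """Bhattacharyya distance between two groups of index values, assuming normal distributions."""
    m1, m2 = statistics.fmean(g1), statistics.fmean(g2)
    v1, v2 = statistics.variance(g1), statistics.variance(g2)  # must be > 0
    return ((m1 - m2) ** 2 / (4 * (v1 + v2))
            + 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2))))


def jeffries_matusita(g1, g2):
    """J-M distance on the 0-2 scale: 2 * (1 - exp(-B))."""
    return 2 * (1 - math.exp(-bhattacharyya_1d(g1, g2)))


def separable_pairs(groups, threshold=1.85):
    """Number of class pairs whose J-M distance exceeds the critical value."""
    return sum(1 for a, b in combinations(groups, 2)
               if jeffries_matusita(groups[a], groups[b]) > threshold)


print(jeffries_matusita([1, 2, 3], [1, 2, 3]))  # -> 0.0
print(separable_pairs({"a": [1, 2, 3], "b": [101, 102, 103], "c": [201, 202, 203]}))  # -> 3
```

In a full search, `separable_pairs` would be evaluated for every band pair's index values grouped by class, and the pair with the largest count would be kept.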
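The pairwise testing described in Section 3.3.2 (a Mann-Whitney U-test for every group pair, a Bonferroni correction and an effect size) can be illustrated with the minimal pure-Python sketch below. The details are assumptions, since the text does not state HypDA's exact choices: tied values receive average ranks, the two-sided p-value uses the plain normal approximation without tie or continuity correction, and the effect size is r = |z| / sqrt(N). A production tool would normally also offer exact p-values for small samples. All function names are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Ranks 1..N; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def mann_whitney(g1, g2):
    """U of the first group, two-sided normal-approximation p-value, effect size r = |z|/sqrt(N)."""
    n1, n2 = len(g1), len(g2)
    ranks = average_ranks(list(g1) + list(g2))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return u1, p, abs(z) / math.sqrt(n1 + n2)


def pairwise_bonferroni(groups):
    """Mann-Whitney test for every group pair, with Bonferroni-adjusted p-values."""
    pairs = list(combinations(groups, 2))
    result = {}
    for a, b in pairs:
        u, p, r = mann_whitney(groups[a], groups[b])
        result[(a, b)] = (u, min(1.0, p * len(pairs)), r)
    return result


print(average_ranks([10, 20, 20, 30]))  # -> [1.0, 2.5, 2.5, 4.0]
print(pairwise_bonferroni({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}))
```

With three groups there are three pairs, so each raw p-value is multiplied by three before it is compared with the significance level.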
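The Producer's and User's Accuracies reported in Table 2, together with the overall accuracy and Kappa coefficient mentioned in the text, follow from a confusion matrix. The sketch below is a minimal pure-Python illustration of the standard definitions, not ENVI's implementation. It assumes that rows hold the classified labels and columns the reference (ground truth) labels, so PA divides each diagonal count by its column total and UA divides it by its row total. The function name and the toy matrix are hypothetical.

```python
def accuracy_report(matrix, classes):
    """matrix[i][j] = number of pixels classified as class i whose reference class is j."""
    k = len(classes)
    total = sum(sum(row) for row in matrix)
    row_tot = [sum(matrix[i]) for i in range(k)]                       # classified totals
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # reference totals
    report = {}
    for i, name in enumerate(classes):
        pa = 100 * matrix[i][i] / col_tot[i]  # producer's accuracy: correct / reference total
        ua = 100 * matrix[i][i] / row_tot[i]  # user's accuracy: correct / classified total
        report[name] = (pa, ua)
    oa = sum(matrix[i][i] for i in range(k)) / total
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / total ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return report, 100 * oa, kappa


report, oa, kappa = accuracy_report([[40, 10], [0, 50]], ["A", "B"])
print(report["A"])  # -> (100.0, 80.0)
```

The toy matrix shows why the two measures can diverge, as they do for asphalt and buildings in Table 2: every reference pixel of class A is found (PA = 100%), but a fifth of the pixels labelled A actually belong to B (UA = 80%).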

Figure 6: Coloured table emphasising the weaker and stronger relationships between selected attributes and calculated spectral indices.

3.3.3 Searching for the most frequent band combinations

When we consider all possible band pairs to find the best correlations or the models with the lowest RMSE, the matrix can be plotted as a correlogram (i.e. all bands in rows vs all bands in columns). The correlogram indicates the required parameters (e.g. R2, separability indices) with colours, based on user-defined formats. This operation highlights those spectral ranges which are generally stronger or weaker for the observed relations (Figure 6). When there is a large number of saved models on the 'best' WS, a table can be created showing which band pairs appear most frequently among the best values for specific formulas or relations. Figure 6 shows how such a table can be coloured based on user-defined conditional formats. This option can be very helpful in creating our own hyperspectral indices based on our dataset on a given subject.

Table 2: Accuracy assessment of the SVM classification performed with all original spectral bands and with the HypDA indices (the larger value of each PA and UA pair is marked with an asterisk).

                 Original bands (SVM)   HypDA indices (SVM)
Class            PA (%)    UA (%)       PA (%)    UA (%)
Asphalt          96.24*    74.71        93.42     80.91*
Building         81.03     99.44*       88.15*    96.29
Forest           93.16     99.13        97.06*    99.47*
TennisCourt      99.21     98.00        99.50*    99.67*
SyntheticGrass   99.21     99.50        99.82*    99.63*
Grass            99.29*    96.09        99.18     98.29*
Shadow           99.60*    88.64        99.41     94.27*

4 Results and Discussion

Hyperspectral datasets represent an important base for data mining. The analysis requires spectral bands as independent variables and measured or observed values as dependent variables. The number of hyperspectral bands is usually large, so an automated procedure is desirable. Multivariate techniques require large numbers of cases; however, expensive laboratory analyses of the examined materials can limit the number of measurements. Our approach considers all bands, and we can use any kind of equation involving a maximum of 4 bands at the same time. Being a bivariate technique, it requires fewer cases.

A similar application was developed by Buddenbaum and Püschel (2012) [27] in the EnMAP-Box software [28], and the approach was successfully applied in a study by Buchhorn et al. [29] using field spectroscopy. Besides this similarity (HypDA also determines indices based on regression between a measured variable and the spectral bands), HypDA provides several methods to investigate the prerequisites of regression analysis. Users can verify homoscedasticity and can exclude influential data points to ensure the normal distribution of the residuals (differences between measured and predicted values), either automatically (based on Cook's distance [15]) or interactively (based on the scatterplot of the variables with highlighted outliers). Any kind of tabular data can be processed, and users can define any kind of equation (even with four bands); furthermore, calculations can be run in batch mode with several tasks.

Moreover, HypDA can determine those bands or band combinations which can discriminate user-defined groups. Due to the matrix approach, all possible band combinations are taken into consideration, and the best solutions can be chosen by the users with the help of several embedded options (separability or hypothesis testing). We successfully applied the indices generated by HypDA in a land cover classification of Landsat data and gained 98% overall accuracy in discriminating five land cover classes [30].

A future task is to connect the add-in with GIS software where the indices can be used directly with remotely sensed images. Another promising direction of development is integration with the R software (R Core Team).

5 Conclusions

Spectral indices are efficient tools for estimating characteristics of materials or of the Earth's surface and can be used in image classification. We developed a software add-in able to determine spectral indices for both purposes. Based on its use, we found that spectral indices produced directly for a given task can perform well. The CaCO3 content of soils (Cambisols) was predicted with 16.3% relative error using the data of a spectrometer. Classification of a hyperspectral aerial image using spectral indices determined to enhance the differences between seven LULC classes resulted in a better overall accuracy than the classification performed with the original bands.

6 HypDA availability and hardware requirements

HypDA is a free add-in and can be downloaded from its webpage after registration: https://sites.google.com/site/hyperspectraldataanalyst/download.

There is no special hardware requirement to use the add-in. Of course, the processing time depends heavily on the computer's characteristics and the size of the datasets. To indicate the calculation time on a standard PC (4 GB RAM, 3.3 GHz, OS: Windows 8.1 64 bit), a table with 10,000 Kruskal-Wallis tests on 1000 samples (divided into 10 groups) takes only 20 seconds.

Acknowledgement: Zoltán Kovács was supported by the European Union and the State of Hungary, co-financed by the European Social Fund in the framework of TÁMOP 4.2.4. A/2-11-1-2012-0001 'National Excellence Program'. The publication was supported by the University of Debrecen (RH/751/2015) and the SROP-4.2.2.B-15/1/KONV-2015-0001 project. The project has been supported by the European Union, co-financed by the European Social Fund.

References

[1] Chang, C., 2003. Hyperspectral Imaging. Springer Science and Business Media. ISBN 978-1-4419-9170-6
[2] Ben-Dor, E., Chabrillat, S., Dematte, J.A.M., Taylor, G.R., Hill, J., Whiting, M.L., Sommer, S., 2009. Using imaging spectroscopy to study soil properties. Remote Sensing Environ. 113, S38–S55. DOI: 10.1016/j.rse.2008.09.019
[3] Deng, C., Wu, C., 2012. BCI: A biophysical composition index for remote sensing of urban environments. Remote Sensing Environ. 127, 247–259. DOI: 10.1016/j.rse.2012.09.009
[4] Fodor, N., Pásztor, L., Németh, T., 2014. Coupling the 4M crop model with national geo-databases for assessing the effects of climate change on agro-ecological characteristics of Hungary. Int. J. Digital Earth 7(5), 391–410. DOI: 10.1080/17538947.2012.689998
[5] Smith, A.M.S., Kolden, C.A., Tinkham, W.T., Talhelm, A.F., Marshall, J.D., et al., 2014. Remote sensing the vulnerability of vegetation in natural terrestrial ecosystems. Remote Sensing Environ. 154, 322–337. DOI: 10.1016/j.rse.2014.03.038
[6] Szalai, Z., Kiss, K., Jakab, G., Sipos, P., Belucz, B., Németh, T., 2013. The use of UV-VIS-NIR reflectance spectroscopy to identify iron minerals. Astron. Nachr. 334, 940–943. DOI: 10.1002/asna.201211965
[7] Thenkabail, P.S., Lyon, J.G., Huete, A., 2011. Hyperspectral remote sensing of vegetation. CRC Press, Taylor and Francis Group.
[8] Rouse, J.W., Haas, R.H., Schell, J.A., Deering, D.W., 1973. Monitoring vegetation systems in the great plains with ERTS. Third ERTS Symposium, NASA SP-351, 309–317.
[9] Huete, A.R., 1988. A soil adjusted vegetation index (SAVI). Remote Sensing Environ. 25, 295–309. DOI: 10.1016/0034-4257(88)90106-X
[10] Richardson, A.J., Everitt, J.H., 1992. Using spectral vegetation indices to estimate rangeland productivity. Geocarto Int. 7(1), 63–69. DOI: 10.1080/10106049209354353
[11] Gao, B.C., 1996. NDWI - A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sensing Environ. 58, 257–266. DOI: 10.1016/S0034-4257(96)00067-3
[12] Ceccato, P., Gobron, N., Flasse, S., Pinty, B., Tarantola, S., 2002. Designing a spectral index to estimate vegetation water content from remote sensing data: Part 1: Theoretical approach. Remote Sensing Environ. 82, 188–197. DOI: 10.1016/S0034-4257(02)00037-8

[13] Gitelson, A.A., Merzlyak, M.N., 1997. Remote estimation of chlorophyll content in higher plant leaves. Int. J. Remote Sensing 18, 2691–2697. DOI: 10.1080/014311697217558
[14] Leone, A.P., Sommer, S., 2000. Multivariate analysis of laboratory spectra for the assessment of soil development and soil degradation in the southern Apennines (Italy). Remote Sensing Environ. 72, 346–359. DOI: 10.1016/S0034-4257(99)00110-8
[15] Kabacoff, R.I., 2011. R in Action. Manning, Shelter Island.
[16] Jolliffe, I.T., 1982. A note on the use of principal components in regression. J. Royal Stat. Soc. Series C 31, 300–303. DOI: 10.2307/2348005
[17] Jimenez, L.O., Landgrebe, D.A., 1999. Hyperspectral data analysis and supervised feature reduction via projection pursuit. IEEE Trans. Geosci. Remote Sensing 37, 2653–2667. DOI: 10.1109/36.803413
[18] Kooistra, L., Wehrens, R., Leuven, R.S.E.W., Buydens, L.M.C., 2001. Possibilities of visible-near infrared spectroscopy for the assessment of soil contamination in river floodplains. Analytica Chimica Acta 446, 97–105. DOI: 10.1016/S0003-2670(01)01265-X
[19] Savitzky, A., Golay, M.J.E., 1964. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36, 1627–1639. DOI: 10.1021/ac60214a047
[20] Barnes, R.J., Dhanoa, M.S., Lister, S.J., 1989. Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Appl. Spectrosc. 43, 772–777. DOI: 10.1366/0003702894202201
[21] Geladi, P., Dåbakk, E., 1995. An overview of chemometrics applications in near infrared spectrometry. J. Near Infrared Spectrosc. 3, 119–132. DOI: 10.1255/jnirs.63
[22] Gans, P., Gill, J.B., 1983. Examination of the convolution method for numerical smoothing and differentiation of spectroscopic data in theory and in practice. Applied Spectrosc. 37, 515–520. DOI: 10.1366/0003702834634712
[23] Richards, J.A., Jia, X., 2005. Remote Sensing Digital Image Analysis. Springer, Berlin-Heidelberg-New York. ISBN-10 3-540-25128-6
[24] Dunnett, C.W., 1964. New tables for multiple comparisons with a control. Biometrics 20, 482–491.
[25] Cohen, J., 1992. Statistical power analysis. Curr. Dir. Psychol. Sci. 1, 98–101. DOI: 10.1111/1467-8721.ep10768783
[26] Burai, P., Deák, B., Valkó, O., Pinty, B., Tomor, T., 2015. Classification of herbaceous vegetation using airborne hyperspectral imagery. Remote Sens. 7(2), 2046–2066. DOI: 10.3390/rs70202046
[27] Buddenbaum, H., Püschel, P., 2012. SpInMine (Spectral Index Data Mining Tool): Manual for Application: SpInMine (1.0). University of Trier, Trier, Germany.
[28] van der Linden, S., Rabe, A., Held, M., Jakimow, B., Leitão, P.J., Okujeni, A., Schwieder, M., Suess, S., Hostert, P., 2015. The EnMAP-Box - A toolbox and application programming interface for EnMAP data processing. Remote Sens. 7, 11249–11266.
[29] Buchhorn, M., Walker, D., Heim, B., Raynolds, M., Epstein, H., Schwieder, M., 2013. Ground based hyperspectral characterization of Alaska tundra vegetation along environmental gradients. Remote Sens. 5, 3971–4005. DOI: 10.3390/rs5083971
[30] Olasz, A., Kristóf, D., Belényesi, M., Bakos, K., Kovács, Z., Balázs, B., Szabó, Sz., 2015. IQPC 2015 Track: Water detection and classification on multi-resource remote sensing and terrain data. ISPRS Archives, XL-3/W3.