
Applied and Environmental Soil Science

Quantitative Soil Spectroscopy

Guest Editors: Sabine Chabrillat, Eyal Ben-Dor, Raphael A. Viscarra Rossel, and José A. M. Demattê

Copyright © 2013 Hindawi Publishing Corporation. All rights reserved.

This is a special issue published in “Applied and Environmental Soil Science.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Editorial Board

Lynette K. Abbott, Australia; William Horwath, USA; Nikolla Qafoku, USA; Joselito M. Arocena, Canada; Davey Jones, UK; Peter Shouse, USA; Nanthi Bolan, Australia; Matthias Kaestner, Germany; B. Singh, Australia; Robert L. Bradley, Canada; Heike Knicker, Spain; Keith Smettem, Australia; Artemi Cerda, Spain; Takashi Kosaki, Japan; Marco Trevisan, Italy; Claudio Cocozza, Italy; Yongchao Liang, China; Antonio Violante, Italy; Hong J. Di, New Zealand; Teodoro M. Miano, Italy; Paul Voroney, Canada; Oliver Dilly, Germany; Amaresh K. Nayak, India; Jianming Xu, China; Michael A. Fullen, UK; Yong Sik Ok, Republic of Korea; Ryusuke Hatano, Japan; Alessandro Piccolo, Italy

Contents

Quantitative Soil Spectroscopy, Sabine Chabrillat, Eyal Ben-Dor, Raphael A. Viscarra Rossel, and José A. M. Demattê Volume 2013, Article ID 616578, 3 pages

Effects of Subsetting by Carbon Content, Soil Order, and Spectral Classification on Prediction of Soil Total Carbon with Diffuse Reflectance Spectroscopy, Meryl L. McDowell, Gregory L. Bruland, Jonathan L. Deenik, and Sabine Grunwald Volume 2012, Article ID 294121, 14 pages

Investigations into Soil Composition and Texture Using Infrared Spectroscopy (2-14 µm), Robert D. Hewson, Thomas J. Cudahy, Malcolm Jones, and Matilda Thomas Volume 2012, Article ID 535646, 12 pages

The Effects of Spectral Pretreatments on Chemometric Analyses of Soil Profiles Using Laboratory Imaging Spectroscopy, Henning Buddenbaum and Markus Steffens Volume 2012, Article ID 274903, 12 pages

Spatially Explicit Estimation of Clay and Organic Carbon Content in Agricultural Soils Using Multi-Annual Imaging Spectroscopy Data, Heike Gerighausen, Gunter Menz, and Hermann Kaufmann Volume 2012, Article ID 868090, 23 pages

A Comparison of Feature-Based MLR and PLS Regression Techniques for the Prediction of Three Soil Constituents in a Degraded South African Ecosystem, Anita Bayer, Martin Bachmann, Andreas Müller, and Hermann Kaufmann Volume 2012, Article ID 971252, 20 pages

Using Reflectance Spectroscopy and Artificial Neural Network to Assess Water Infiltration Rate into the Soil Profile, Naftali Goldshleger, Alexandra Chudnovsky, and Eyal Ben-Dor Volume 2012, Article ID 439567, 9 pages

Spectral Estimation of Soil Properties in Siberian Soils and Relations with Plant Species Composition, Harm Bartholomeus, Gabriela Schaepman-Strub, Daan Blok, Roman Sofronov, and Sergey Udaltsov Volume 2012, Article ID 241535, 13 pages

Quantitative Analysis of Total Petroleum Hydrocarbons in Soils: Comparison between Reflectance Spectroscopy and Solvent Extraction by 3 Certified Laboratories, Guy Schwartz, Eyal Ben-Dor, and Gil Eshel Volume 2012, Article ID 751956, 11 pages

Hindawi Publishing Corporation
Applied and Environmental Soil Science
Volume 2013, Article ID 616578, 3 pages
http://dx.doi.org/10.1155/2013/616578

Editorial

Quantitative Soil Spectroscopy

Sabine Chabrillat,1 Eyal Ben-Dor,2 Raphael A. Viscarra Rossel,3 and José A. M. Demattê4

1 Section of Remote Sensing, Helmholtz Centre Potsdam, GFZ German Research Centre for Geosciences, Telegrafenberg, 14473 Potsdam, Germany
2 The Remote Sensing Laboratory, Department of Geography and Human Environment, Tel-Aviv University, P.O. Box 39040, Ramat Aviv, 69978 Tel-Aviv, Israel
3 Soil and Landscape Program, CSIRO Land and Water, Bruce E. Butler Laboratory, Clunies-Ross Street Black Mountain, P.O. Box 1666, Canberra, ACT 2601, Australia
4 Soil Science Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, SP 13418-900, Brazil

Correspondence should be addressed to Sabine Chabrillat; [email protected]

Received 13 January 2013; Accepted 13 January 2013

Copyright © 2013 Sabine Chabrillat et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Interest in the use of visible-near infrared reflectance spectroscopy for the determination of mineralogical composition in soils and planetary surfaces has been demonstrated since the 1970s with the development of databases of mineral spectra recorded in the laboratory by Hunt and Salisbury. A little later, in the early 1980s, the first spectral database (or library) of American soils was generated by Stoner and Baumgardner. In the mid-1980s, several workers demonstrated that different soil attributes could be estimated from the spectral reflectance measurement, and the quantitative era of soil spectroscopy began. The attractiveness of soil spectroscopy is that measurements are rapid and estimates of soil properties are inexpensive compared to conventional soil analyses. Nowadays, research on quantitative soil spectroscopy for the prediction of soil properties, prompted by developments in multivariate statistics and chemometrics, is continuing to grow. Over the past decades, the availability of new high signal-to-noise ratio hyperspectral sensors that can be mounted on airborne platforms or that can be used in the laboratory and the field opened significant new possibilities toward the quantitative analyses of the physical and biochemical composition of the Earth’s soil.

The spectral reflectance of soil is a cumulative property that derives from the inherent spectral behaviour of heterogeneous combinations of minerals, organic matter, and water molecules in the soil. Studies on soil spectroscopy relate primarily to the following attributes. (i) Soil water, a key variable in the hydrologic cycle, controls processes such as infiltration and discharge with consequences for plant growth, [...], and land degradation. (ii) [...] (content and composition), through its key role in the carbon cycle, is an important variable in global climate models. Soil organic matter, of which carbon is a major part, holds a large proportion of nutrients, cations, and trace elements that are needed for plant growth. (iii) Soil mineralogy and texture are important soil properties as they affect physical, chemical, and biological soil processes. Other soil parameters might also be estimated through either direct or indirect relationships of soil reflectance with the chemical, physical, and biological characteristics of the soil matrix.

In this regard, optical and infrared sensing covering the visible, infrared, and thermal parts of the electromagnetic spectrum, respectively, have shown good potential for retrieving information on soil attributes. Across this spectral range, three regions sensitive to soil properties can be defined as follows:

(i) The visible-near infrared (VNIR) spectral region from 0.4 to 1 μm contains information on [...], iron content and composition, [...], and organic matter.

(ii) The near infrared (NIR), which is also referred to as the short-wave infrared (SWIR) in remote sensing, from 1 to 2.5 μm contains information on phyllosilicates, most sorosilicates, hydroxides, some sulphates, amphiboles, carbonates, soil water, and organic matter. A number of bench top and portable spectrometers combine the VNIR and the SWIR to provide spectra in the vis–NIR range from 0.4 to 2.5 μm.

(iii) The mid-infrared (mid-IR), which is also referred to as the thermal infrared (TIR) in remote sensing, ranges from 2.5 to 25 μm including the medium (MWIR) and long-wave (LWIR) infrared spectral regions (3–5 and 8–14 μm) that are atmospheric windows. This spectral region contains information on quartz, feldspars, silicate minerals, mafic, clay, carbonate mineral group, and organic compounds.

While the chemical attributes influencing soil reflectance are based on absorption of radiation by chemical compounds in selected frequencies, a soil spectrum is also affected by physical characteristics of the soil such as particle size, surface roughness and soil water, which will be dominated by scattering radiation across all spectral regions.

Different methods have been proposed for the quantification of soil properties using spectra. For instance, methods based on the analytical and physical characteristics of the signal and empirical methods based on chemometrics have been shown to have good effective predictability. Therefore, regressions can be developed using field and laboratory data for calibration, allowing soil reflectance to be related with soil properties. Direct relationships between specific absorption features with soil mineralogy and water content are the more commonly used methods, particularly in remote sensing. However, methods that use empirical multivariate relationships of reflectance with soil properties are nowadays often developed in remote sensing to help enlarge the prediction envelope of quantitative soil spectroscopy toward more soil attributes.

These advances of soil spectroscopy have a significant impact in many soil science fields; for instance, quantitative spectroscopy is being used in the evaluation and monitoring of [...] and soil function (e.g., water storage and carbon storage), soil fertility and soil threats (e.g., acidification and erosion), and soil [...]. For example, soil degradation (salinity, erosion, and deposition), soil mapping and classification, soil genesis and formation, [...], and soil hazards (swelling soils) are important issues that are nowadays examined using hyperspectral remote sensing, enlarging the soil spectroscopy into a spatial domain. Indeed, quantitative soil spectroscopy is being used to collect the large amounts of data needed for [...] and is used as a support in many activities toward future global soil monitoring. In this sense, the upcoming future availability of high signal-to-noise ratio satellite imaging spectrometers such as EnMAP (German satellite initiative, in phase D), PRISMA (Italian satellite initiative, prephase A), HYPXIM (French satellite initiative, phase A will start in the end of 2012), SHALOM (Italy-Israel initiative), and HyspIRI (USA satellite initiative, prephase A) will be a major step toward the operational quantitative monitoring of soil surfaces over large areas. Then, hyperspectral remote sensing that senses the upper surface is used together with on-ground (or proximal) soil spectroscopy for global soil characterization and monitoring.

In this context, the main focus of this special issue is to present current research on soil applications of reflectance spectroscopy at both point and imagery domains, illustrating state-of-the-art methods in quantitative soil spectroscopy. Our principal objective is that this issue demonstrates the potential of soil spectroscopy for the quantitative description of spatial distribution of soils and their properties. The special issue received 13 papers from all over the world covering diverse topics. Of those, after a peer review process, 8 were considered for publication.

M. L. McDowell et al. discuss the effects of subsetting by carbon content, soil order, and spectral classification for the prediction of soil total carbon using the VNIR, SWIR, and TIR (0.4–14 μm) spectral regions associated with partial least squares regression (PLSR) models based on diffuse reflectance information received by a point spectrometer. The soil samples originated from the Hawaiian agricultural lands with very low to high carbon content. The results show that the different subsetting methods explored lead to various results. Subsetting of only low carbon content samples showed improvement in the prediction of carbon content.

R. D. Hewson et al. evaluate the ability of SWIR and TIR (2–14 μm) spectral regions to characterize soil composition and texture using particle separated soil samples from Tick Hill, QLD, Australia, and natural soils from the United States Department of Agriculture (ASTER spectral library). Derivation of clay mineral content, quartz content, and organic carbon content is performed based on indices from the TIR region using the 9.5 μm, 8.62 μm, and 3.45 μm spectral features, respectively. A good correlation with mineral content was observed for the quartz index, which demonstrated the added value of the TIR region (emissivity) to the VNIR-SWIR region (reflectance).

H. Buddenbaum et al. explore the influence of several spectral pretreatments on chemometric analyses of soil profiles on a submillimetre scale. Two 30 cm deep cores from southern German soils were measured in the laboratory with a hyperspectral scanner (HySpex-1600) covering the VNIR spectral range (0.4–1 μm). The results showed that preprocessing methods have a minor influence on the PLS regression results for the estimation of elemental concentrations in the soil cores.

H. Gerighausen et al. spatially estimate clay and organic carbon content in agricultural soils using multiannual imaging spectroscopy data (HyMap: 0.4–2.4 μm). The study area is the test site of Durable Environmental Multidisciplinary Monitoring Information Network (DEMMIN) in Germany, with low average organic carbon content. PLS regressions are used to test the predictive ability of the models for parameter prediction for the different years of hyperspectral survey.

A. Bayer et al. compare two approaches for the mapping and quantification of soil organic carbon, iron oxides, and clay content from hyperspectral imagery (HyMap: 0.4–2.4 μm) over large areas in the Thicket Biome, South Africa. A physical approach based on a set of diagnostic spectral features linked with chemical reference data using multiple linear regression techniques is developed and compared with results from multivariate methods such as PLS regression models.

N. Goldshleger et al. study the effect of raindrop energy on the water infiltration and runoff into the soil profile and on the soil surface SWIR reflectance (1.2–2.4 μm) by studying seven soils from Israel and the USA subjected to artificial (controlled) rainstorm events. The spectral properties of crust formed on the soil surface were analyzed by using a nonlinear artificial neural network (ANN) method and compared with PLS regression. The ANN technique provides better results for the correlation of spectral reflectance with infiltration rate.

H. Bartholomeus et al. investigate the estimation of soil properties in Siberian Tundra soils using reflectance information and study the relationships with the plant species composition. PLS and stepwise multiple linear (SML) regression models derived from the reflectance measurements are used to approximate total C, N, pH, K, and P in the soil. Subsequently, SML regressions yielded high accuracy models for prediction of C and N, and PLS models yielded a good prediction model for K and a moderate model for pH.

G. Schwartz et al. propose the use of reflectance spectroscopy as an alternative tool for detecting contamination of total petroleum hydrocarbons in soils. They compare the spectral method with three commercial certified laboratory analyses using traditional methods. The results show that reflectance spectroscopy provides a similar accuracy to the commercial laboratory methods and could be used for initial field-screening investigations, offering cost-effective and rapid assessment.

We hope that you find the special issue interesting and useful and that it will act as a precursor for more studies to come in soil spectroscopy.

Acknowledgments

We thank the EU FP7 EUFAR Project for supporting the Expert Working Group (EWG) in soil applications of hyperspectral imagery and the organization of the first and second workshops of the EWG held at Potsdam, Germany, in April 2010 and August 2011, bringing together for the first time the community on soil spectroscopy. This project is further supported by the ISPRS Technical Group VII/3 “Information extraction from hyperspectral data.” Our thanks also to all the reviewers for their timely reports and constructive comments and the staff of the Editorial Office of Applied and Environmental Soil Science for their support. S. Chabrillat wishes to thank the EnMAP Satellite Project funded by BMBF/BMWi and the GFZ for additional support. R. A. Viscarra Rossel’s contribution was funded by the CSIRO’s Sustainable Agriculture Flagship (SAF).

Sabine Chabrillat
Eyal Ben-Dor
Raphael A. Viscarra Rossel
José A. M. Demattê

Hindawi Publishing Corporation
Applied and Environmental Soil Science
Volume 2012, Article ID 294121, 14 pages
doi:10.1155/2012/294121

Research Article

Effects of Subsetting by Carbon Content, Soil Order, and Spectral Classification on Prediction of Soil Total Carbon with Diffuse Reflectance Spectroscopy

Meryl L. McDowell,1 Gregory L. Bruland,1, 2 Jonathan L. Deenik,3 and Sabine Grunwald4

1 Natural Resources and Environmental Management Department, University of Hawai‘i Mānoa, 1910 East-West Road, Sherman 101, Honolulu, HI 96822, USA
2 Biology & Natural Resources Department, Principia College, 1 Maybeck Place, Elsah, IL 62028, USA
3 Tropical Plant and Soil Sciences Department, University of Hawai‘i Mānoa, 3190 Maile Way, Honolulu, HI 96822, USA
4 Soil and Water Science Department, University of Florida, 2169 McCarty Hall, P.O. Box 110290, Gainesville, FL 32611-0290, USA

Correspondence should be addressed to Meryl L. McDowell, [email protected]

Received 15 March 2012; Revised 20 August 2012; Accepted 14 October 2012

Academic Editor: Sabine Chabrillat

Copyright © 2012 Meryl L. McDowell et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Subsetting of samples is a promising avenue of research for the continued improvement of prediction models for soil properties with diffuse reflectance spectroscopy. This study examined the effects of subsetting by soil total carbon (Ct) content, soil order, and spectral classification with k-means cluster analysis on visible/near-infrared and mid-infrared partial least squares models for Ct prediction. Our sample set was composed of various Hawaiian soils from primarily agricultural lands with Ct contents from <1% to 56%. Slight improvements in the coefficient of determination (R²) and other standard model quality parameters were observed in the models for the subset of the high activity clay soil orders compared to the models of the full sample set. The other subset models explored did not exhibit improvement across all parameters. Models created from subsets consisting of only low Ct samples (e.g., Ct < 10%) showed improvement in the root mean squared error (RMSE) and percent error of prediction for low Ct soil samples. These results provide a basis for future study of practical subsetting strategies for soil Ct prediction.

1. Introduction

Diffuse reflectance spectroscopy (DRS) and chemometric analysis have become popular subjects of research for their potential to predict soil carbon and other soil properties. This methodology could be beneficial for monitoring soil quality and temporal variation, as well as helping to facilitate digital soil mapping efforts. Both visible/near-infrared (VNIR) and mid-infrared (MIR) spectra show promise for the prediction of soil total carbon (Ct) and organic carbon, as well as organic matter, total N, total P, sand, silt, and clay fractions, cation exchange capacity, and pH (e.g., [1–8]). Particular attention has been given to soil carbon, which is an important indicator of soil fertility and biological activity and is crucial to carbon sequestration endeavors [9–12].

Partial least squares regression (PLSR) appears to be the most widely used chemometric method for developing prediction models from soil diffuse reflectance spectra. A sample set is commonly divided into two groups with the larger used for calibration and the smaller for validation to approximate true independent model validation, but no clear or consistent guidelines have been adopted for this process. Model results are known to vary with different groupings of samples for the calibration and validation sets. To address this issue, some studies have created multiple models, each with different random divisions of the sample set into calibration and validation sets, to reflect the range of possible results [13, 14].

Highly accurate prediction models are required for DRS to be an effective method for soil carbon determination in practical applications. Many statistically robust models have been developed (e.g., [5–8, 15]), but a single procedure is not necessarily the best for producing high quality models from different soils in different locations. Even models that have excellent correlation between soil spectra and properties could be improved. For instance, the robust PLSR models of McDowell et al. [8] have relatively large errors in Ct prediction at very low Ct values, which decreases the utility of the models in situations where low Ct soils or small changes in Ct are examined.

Additional methods are being explored to produce the most robust and accurate DRS prediction models possible for different local and global soil datasets. One promising idea is to split the sample set into groups based on similar characteristics and to develop individual prediction models for each of these subsets. In studies of soils from Poland, Brazil, and Florida (USA), previous researchers have investigated subsetting by characteristics such as carbon content, soil order, [...], and spectral similarity with varied success for their particular sample sets [16–18].

The current work aimed to improve the prediction of Ct with VNIR and MIR DRS by creating attribute-specific chemometric models. Specifically, we investigated if predictions from a chemometric model built only from a subset of samples that are similar with respect to a particular characteristic (i.e., Ct) will provide better predictions than a comprehensive model built from a set of all possible samples. The study investigated the following three subsetting strategies: (1) soil Ct value; (2) soil order; (3) spectral classification with k-means cluster analysis. Each of the various subset models was compared against the original full sample set model to assess the magnitude of changes in the predictions. This study was built upon the research reported in McDowell et al. [8]. In that work the authors demonstrated the ability of DRS to predict Ct in Hawaiian soils. The success of different wavelength ranges (i.e., VNIR versus MIR) and chemometric methods was investigated, as well. Because these ideas have been previously explored in McDowell et al. [8], they will not be discussed further here.

2. Materials and Methods

2.1. Sample Collection and Preparation. The sample set for this study is composed of 307 soil samples collected across the five main Hawaiian Islands of Kauai, Oahu, Molokai, Maui, and Hawaii, illustrated in Figure 1. Two hundred and sixteen of these samples were collected from 1981 to 2007 and stored in the archive at the Natural Resources Conservation Service (NRCS) National Soil Survey Center in Lincoln, Nebraska, and the remaining 91 samples were newly collected in 2010. Within this full set of samples, 10 soil orders and more than 100 [...] are represented. Samples were predominantly from a variety of agricultural soils, hosting over 25 different crop types. The majority of samples are of surface soils (∼77%), and the remainder are of corresponding subsurface soil horizons from 17 of the collection sites. The soil samples were dried and sieved to retain the less than 2 mm fraction for VNIR DRS analysis. A portion of each sample was also ball-milled to less than 250 µm for MIR DRS analysis.

Figure 1: Distribution of soils sample collection sites throughout the Hawaiian Islands with symbol color indicating soil order.

2.2. Traditional Total Carbon Analysis. Dry combustion was used to measure the Ct of ball-milled soil samples. Several of the samples obtained from the NRCS archive were previously measured for Ct by dry combustion before storage. All remaining samples were analyzed at the Agricultural Diagnostic Services Center (ADSC) at the University of Hawaii Mānoa with a LECO CN2000 combustion gas analyzer [19]. A small portion of the previously measured NRCS archive samples were reanalyzed at ADSC to provide a cross-check of the values obtained from different laboratories. The Ct values of the full sample set range from <1% to 56% with a distribution weighted toward the lower Ct end.

2.3. Visible/Near-Infrared Diffuse Reflectance Spectroscopy. Visible/near-infrared diffuse reflectance spectra were collected from the 2 mm sieved soil samples with an Agrispec spectrometer and muglight light source (Analytical Spectral Devices, Inc., Boulder, CO, USA). The Agrispec has three detectors with a combined spectral range of 350 to 2500 nm, sampling interval of 1 nm, and spectral resolution from 3 nm (at 700 nm) to 10 nm (at 1400 nm). Each soil sample was measured three times, with the sample cup rotated 20° between each measurement. The three spectra were averaged to produce the final spectrum for each sample. A Spectralon (Labsphere, North Sutton, NH, USA) white reference was measured as a reference spectrum to begin each session and again every 30 minutes or less thereafter. A slight offset in reflectance between the range covered by the first and second detectors was observed in many spectra, and, therefore, we removed the narrow region of 990–1010 nm from the final spectra for analysis. The VNIR spectra of these soils commonly exhibit features associated with OH⁻, H₂O, iron oxides, phyllosilicates, and organic molecules. For regression analysis the spectra were transformed using the pretreatment identified as most effective for this data set in McDowell et al. [8]. For the VNIR spectra, this optimal preprocessing transformation was mean normalization.

2.4. Mid-Infrared Diffuse Reflectance Spectroscopy. Mid-infrared diffuse reflectance spectra were collected from the ball-milled samples in neat form with a Scimitar 2000 FTIR spectrometer (Varian, Inc., now Agilent Technologies, Santa Clara, CA, USA) and diffuse reflectance infrared Fourier transform (DRIFT) accessory. The spectral range is 400 to 6000 cm⁻¹, with a sampling interval of 2 cm⁻¹ and spectral resolution of 4 cm⁻¹ (note: the range of our MIR spectra overlaps slightly with the range of our VNIR spectra). Spectra were corrected for background atmospheric and instrument effects by the subtraction of the spectrum of KBr powder measured between every seven samples, but features in two narrow regions persisted. Therefore, we excluded the regions of 1350–1419 cm⁻¹ and 2281–2449 cm⁻¹ from the analysis. Features in the MIR spectra of these soils are attributable to OH⁻, organic molecules, and a variety of silicate minerals. Based on the findings of McDowell et al. [8], before regression analysis the Savitzky-Golay 1st derivative transformation was applied to the MIR spectra as this was determined to be the most effective pretreatment for this data set.

2.5. Regression Analysis. Partial least squares regression (PLSR) was employed to develop the chemometric models for Ct prediction. Models were generated using the Unscrambler X Software package (CAMO Software Inc., Woodbridge, NJ, USA). The spectral range included in the analysis was decreased slightly by removing any high noise portions at the limit of the range; therefore, the VNIR spectra were restricted to the range of 425–2450 nm, and the MIR spectra were restricted to 489–5300 cm⁻¹. All spectra were mean centered for PLSR analysis. The optimal number of factors for regression was chosen individually for each model based on maximizing the explained variance but minimizing the possibility of overfitting. We considered several parameters when assessing the quality of models, including the coefficient of determination (R²), root mean squared error (RMSE), residual prediction deviation (RPD) [20], and the ratio of performance to interquartile distance (RPIQ) [21]. We defined the RPD as the ratio of the standard deviation of the validation set to the standard error of prediction (RPD = SD/SEP) and the RPIQ as the ratio of the interquartile distance of the validation set to the standard error of prediction (RPIQ = IQ/SEP), where the interquartile distance is the difference between the third and first quartiles (IQ = Q3 − Q1). With respect to these general model quality parameters, the best model would have the highest R², RPD, and RPIQ, and the lowest RMSE. We also examined the success of the predictions for individual samples using the percent error, calculated as the absolute difference between the measured (i.e., by combustion) and predicted (i.e., by DRS) Ct values, divided by the measured value, and multiplied by 100.

2.6. Sample Subsetting. The motivation behind our selected subsetting strategies was to improve Ct prediction while still retaining the simplicity that makes DRS attractive. We focused on subsetting criteria that did not require additional highly detailed soil characterization, instead relying on general soil data and information within Soil Taxonomy.

2.6.1. Ct Content Subsets. A simple grouping of soils into low and high Ct was used for subsetting by Ct value. Preliminary work tested a variety of low Ct/high Ct divisions (e.g., 2, 4, 6, 8, and 10% Ct) iteratively. The initial results showed that a cutoff of 10% Ct was most promising and therefore was used for the final analysis. Additionally, a division at 10% allows for fairly easy assignment of unknown soils into low or high Ct groupings from Ct estimates based on general or readily available soil information.

To approximate independent validation, samples were randomly split into a group of 70% for model calibration and 30% for model validation. This random selection was repeated to produce 10 iterations of calibration/validation pairs from the full sample set. After this split, the samples from each iteration were divided into low Ct (<10%) and high Ct (>10%) subsets. Separate VNIR and MIR regression models were then developed from the low Ct and high Ct portions of each of the 10 iterations. For comparison, VNIR and MIR regression models from the full sample set using these same 10 calibration and validation divisions, but no separation by Ct value, also were created.

2.6.2. Soil Order Subsets. Four broad soil groups were created based on general similarity of soil order and number of samples available of that type. The allophane-dominated volcanic Andisol soils comprised one group (n = 96), the Aridisol, Entisol, Inceptisol, Mollisol, and Vertisol soils were combined to make a second group (high activity clay soils; n = 101), Oxisol and Ultisol soils made a third group (low activity clay soils; n = 75), and Histosol and Spodosol soils comprised the fourth group (organic-dominated soils; n = 26). These soil groupings are based upon information contained in Soil Taxonomy allowing for the development of soil groups according to clay mineralogy and soil organic matter. Table 1 provides information on additional soil properties for each soil subset where available. The average spectra for each of these soil groups are shown in Figure 2. Nine soil samples from the NRCS archive had no recorded taxonomic classification and therefore were not included in these subsets.

The full sample set was randomly divided 10 times into a group of 70% of samples to be used for the calibration of the regression models and 30% of the samples to be used for validation. After this division, the samples of each of the ten iterations were grouped according to soil order as described above. Separate VNIR and MIR regression models were then developed for each soil group subset within each of the ten calibration/validation iterations. Because the number of low activity clay and organic-dominated soil samples was small (e.g., ≤80), full cross validation (i.e., leave-one-out cross validation) was used with the regression models for these [...]

Table 1: Soil properties of selected samples for each soil grouping used in subsetting by soil order. Values listed in the table are the minimum and maximum for that specific subset with the mean in parentheses. Data is provided for samples from the Natural Resources Conservation Service (NRCS) archive where it is available. The compositional information (i.e., pH, texture, Al, Ca, and Fe) for the samples newly collected in 2010 has yet to be determined.

Columns: Total carbon wt%; Organic carbon wt%; Clay wt%; Silt wt%; Sand wt%; pH; Total Al wt%; Total Ca wt%; Total Fe wt%.
Rows: Andisol soils; High activity clay soils; Low activity clay soils; Organic-dominated soils.
Ranges, as extracted: 0.24–510.15–10 0.39–55.595–55.29 0.3–59.8 0.2–3.58 4.7–81.3 2.62–54.98 2.4–94.9 7.6–88.7 4.4–67.6 3.7–8 10.4–69.5 11.5–45.7 0.75–69.8 1.58–13.89 1.3–84.1 4.5–7.3 0.025–4.80 3.3–5.8 7.66–9.61 7.33–22.63 0.049–0.16 13.43–27.03 0.21–53.63 0.3–14.65 0.2–66.7 10.8–93.2 0.4–88.6 3.3–8.3
Means, as extracted: (1.65) (1.11) (47.52) (34.86) (17.61) (5.92) (8.28) (0.096) (23.23) (36.19) (20.26) (31.68) (30.45) (37.86) (4.29) (13.39) (12.53) (17.26) (40.62) (42.83) (5.66) (8.54) (0.64) (15.49) (14.51) (3.94) (25.72) (44.08) (30.21) (5.89)
Other entries: 10.13 (a); 0.52 (a); 10.95 (a); Not available.
(a) Only one data point available.

VNIR spectral classification has the advantage of requiring no 40 additional information about the soil. 35 The spectral classification subsets were created by k- X 30 means cluster analysis with Unscrambler .Spectrawere assigned to three cluster subsets based on the minimum 25 Euclidean distance to cluster centers. Separate analyses were 20 conducted for the VNIR and MIR spectra, resulting in different combinations of samples in their cluster subsets. 15 The spectral range used for these cluster analyses was Reflectance (%) Reflectance 10 limited to the regions most relevant to carbon prediction as previously determined by the PLSR variable significance 5 analysis by McDowell et al. [8]. Specifically, the ranges used 0 were 600–750, 898–990, 1910–1938, 2070–2150, and 2288– 500 1000 1500 2000 2316 nm for the VNIR spectra and 1500–1870, 3650–3690, Wavelength (nm) 4235–4260, 4305–4330, 4410–4455, and 5280–5245 cm−1 for (a) the MIR spectra. Each cluster subset was randomly divided into a group of 70% for model calibration and a group of MIR the remaining 30% for model validation, unless the number of samples in the cluster was small (e.g., ≤80), in which 50 case samples were not divided and full cross validation was performed. The random division into calibration and val- 40 idation groups was repeated nine more times to give 10 calibration/validation pairs for each of the VNIR and MIR 30 cluster subsets. Separate Ct prediction models were created for each of the different cluster subsets. For comparison, we 20 also developed 10 VNIR and 10 MIR models from the full

Reflectance (%) Reflectance sample set. The calibration and validation groups for these 10 models were created by combining the respective calibration or validation groups from the three different cluster subset 0 models. VNIR and MIR full cross validation models using 5000 4500 4000 3500 3000 2500 2000 1500 1000 500 the full sample set were also produced to compare with full Wavenumber (cm−1) cross validation models from small cluster subsets. Andisol soils Low activity clay soils High activity clay soils Organic-dominated soils 3. Results and Discussion (b) 3.1. Modeling of Ct Content Subsets. The VNIR models subset by Ct content produced the results summarized in Table 2 Figure 2: Average (a) visible/near-infrared (VNIR) and (b) mid- and plotted in Figure 3(a). The range of results from the 10 ff infrared (MIR) di use reflectance spectra of soil groups used in sub- random divisions of the samples into 70% calibration and setting by soil order. Dashed lines represent one standard deviation 30% validation groups is given along with their mean value. from the average. The R2, RPD, and RPIQ values for the low Ct subset were not as good as those produced using the full sample set, though the RMSE values were lower for the low Ct subset. The results for the high Ct models approached, but were not quite as two groups rather than committing 30% of those samples good, as the results from the full sample set. to validation as with the other subsets. Additional models were created from the 10 calibration/validation divisions Results from the MIR Ct subset models are shown in of the full sample set with no separation of soil order for Figure 3(b) and Table 3. The models produced by the low the comparison of results without subsetting. 
A full cross Ct subset were generally of lesser quality than those of the validation model of the full sample set was developed to be full sample set, with the exception of better RMSE values, a compared with the low activity clay and organic-dominated trend similar to the VNIR models. The high Ct models were soil subsets’ full cross validation models. comparable overall to the high quality models produced by using the full sample set. 2.6.3. Spectral Classification Subsets. Our rationale behind From these results, it appears that a separate high Ct pre- grouping soil samples by spectral character is based on diction model is not an improvement over a model utilizing the assumption that this approach removes major spectral the full Ct range of available samples for either the VNIR or variation from consideration so that small-scale variation is MIR spectra from this data set. This statement may be true used to produce a more refined Ct prediction model. Also, for a separate low Ct prediction model as well, but the benefit the division of soil samples into subsets created solely from of a lower RMSE should also be considered. 6 Applied and Environmental Soil Science
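As a rough illustration of the model quality parameters compared above, the sketch below computes R2, RMSE, RPD, and RPIQ for one validation set, plus the percent error for a single sample. It is a minimal sketch and not the authors' code: the function names are hypothetical, SEP is approximated by the validation RMSE (no bias correction), R2 is taken as 1 − SSres/SStot, and the quartiles use Python's inclusive interpolation method, none of which is specified in the text.

```python
import math
import statistics


def model_quality(measured, predicted):
    """Return R2, RMSE, RPD, and RPIQ for one validation set (hypothetical helper)."""
    residuals = [p - m for m, p in zip(measured, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_measured = statistics.fmean(measured)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)

    rmse = math.sqrt(ss_res / len(residuals))
    sep = rmse  # assumption: SEP approximated by the validation RMSE
    sd = statistics.stdev(measured)  # sample standard deviation of the validation set
    # assumption: quartiles by inclusive linear interpolation
    q1, _, q3 = statistics.quantiles(measured, n=4, method="inclusive")

    return {
        "r2": 1.0 - ss_res / ss_tot,  # assumption: R2 = 1 - SSres/SStot
        "rmse": rmse,
        "rpd": sd / sep,          # RPD = SD/SEP
        "rpiq": (q3 - q1) / sep,  # RPIQ = IQ/SEP with IQ = Q3 - Q1
    }


def percent_error(measured, predicted):
    """Absolute difference divided by the measured value, multiplied by 100."""
    return abs(measured - predicted) / measured * 100.0
```

For example, `percent_error(1.0, 5.0)` gives 400, which corresponds to a sample measured at 1% Ct and predicted at 5% Ct.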

Table 2: Detailed partial least squares regression model results for soil total carbon (Ct) prediction from the subsets of visible/near-infrared diffuse reflectance spectra based on Ct content. The range of values reflects the results of 10 random iterations of the models, and the number in parentheses is the mean. Detailed results are also given for full sample set models with no subsetting for comparison.

Subset | Calibration n (a) | Calibration R2 (b) | Calibration RMSE (%) (c) | Validation n | Validation R2 | Validation RMSE (%) | RPD (d) | RPIQ (e)
Ct < 10% | 133–147 | 0.43–0.80 (0.64) | 1.08–1.78 (1.46) | 56–70 | 0.47–0.76 (0.61) | 1.27–1.97 (1.59) | 1.37–2.03 (1.63) | 1.77–2.88 (2.12)
Ct > 10% | 68–82 | 0.77–0.93 (0.86) | 3.86–7.00 (5.33) | 22–36 | 0.77–0.91 (0.84) | 3.96–7.65 (5.87) | 2.05–3.21 (2.55) | 2.38–5.16 (4.02)
Full sample set | 215 | 0.81–0.96 (0.91) | 2.88–5.87 (4.06) | 92 | 0.81–0.95 (0.91) | 2.82–7.18 (4.24) | 2.27–4.47 (3.46) | 2.08–4.35 (3.19)
(a) Number of samples. (b) Coefficient of determination. (c) Root mean squared error. (d) Residual prediction deviation. (e) Ratio of performance to interquartile distance.
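The ranges and means in Table 2 summarize 10 random divisions into calibration and validation groups. A minimal sketch of that resampling scheme is given below, assuming a caller-supplied scoring function. The names `repeated_split_summary` and `mean_model_rmse` are hypothetical, and the mean-only predictor is merely a stand-in for the PLSR models, which are not reproduced here.

```python
import random
import statistics


def repeated_split_summary(samples, fit_and_score, n_iter=10, cal_fraction=0.7, seed=0):
    """Repeat a random calibration/validation division and summarize one metric.

    fit_and_score(calibration, validation) must return a single number.
    Returns (minimum, maximum, mean) over the n_iter divisions.
    """
    rng = random.Random(seed)  # fixed seed so the divisions are reproducible
    n_cal = round(cal_fraction * len(samples))
    scores = []
    for _ in range(n_iter):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        calibration, validation = shuffled[:n_cal], shuffled[n_cal:]
        scores.append(fit_and_score(calibration, validation))
    return min(scores), max(scores), statistics.fmean(scores)


def mean_model_rmse(calibration, validation):
    """Stand-in for a PLSR model: predict the calibration mean for every sample."""
    prediction = statistics.fmean(calibration)
    squared = [(value - prediction) ** 2 for value in validation]
    return (sum(squared) / len(squared)) ** 0.5
```

With 307 samples and a calibration fraction of 0.7, each division yields 215 calibration and 92 validation samples, the sizes listed for the full sample set in Table 2.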

Table 3: Detailed partial least squares regression model results for soil total carbon (Ct) prediction from the subsets of mid-infrared diffuse reflectance spectra based on Ct content. The range of values reflects the results of 10 random iterations of the models, and the number in parentheses is the mean. Detailed results are also given for full sample set models with no subsetting for comparison.

Subset | Calibration n (a) | Calibration R2 (b) | Calibration RMSE (%) (c) | Validation n | Validation R2 | Validation RMSE (%) | RPD (d) | RPIQ (e)
Ct < 10% | 133–147 | 0.86–0.99 (0.94) | 0.21–0.87 (0.58) | 56–70 | 0.71–0.86 (0.82) | 0.94–1.26 (1.10) | 1.84–2.64 (2.34) | 2.24–3.66 (3.05)
Ct > 10% | 68–82 | 0.91–0.99 (0.95) | 1.11–4.47 (3.10) | 22–36 | 0.90–0.95 (0.92) | 3.48–4.93 (4.17) | 3.18–4.29 (3.55) | 3.10–8.42 (5.75)
Full sample set | 215 | 0.94–0.99 (0.96) | 1.61–3.40 (2.61) | 92 | 0.91–0.96 (0.94) | 2.87–4.48 (3.38) | 3.33–4.87 (4.07) | 2.36–5.69 (3.74)
(a) Number of samples. (b) Coefficient of determination. (c) Root mean squared error. (d) Residual prediction deviation. (e) Ratio of performance to interquartile distance.
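One way to combine the Ct content models of Tables 2 and 3 is the two-stage use proposed in the conclusions of this study: predict with the full-range model first, then re-predict with the low Ct model when the first prediction is low. The sketch below assumes each model is a callable that maps a spectrum to a Ct value in wt%. The function name, the callables, and the use of the 10% subset boundary as the switching threshold are illustrative assumptions.

```python
LOW_CT_THRESHOLD = 10.0  # wt% Ct; assumed switching point, taken from the subset boundary


def two_stage_predict(spectrum, full_model, low_ct_model, threshold=LOW_CT_THRESHOLD):
    """Predict Ct with the full-range model, then refine low predictions.

    full_model and low_ct_model are hypothetical callables: spectrum -> Ct (wt%).
    """
    first_pass = full_model(spectrum)
    if first_pass < threshold:
        # The sample is predicted to be low in Ct, so use the low Ct subset model.
        return low_ct_model(spectrum)
    return first_pass
```

The design keeps the full-range model as the default, so samples predicted at or above the threshold are never passed to a model calibrated only on low Ct soils.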

Results varied for previous studies examining the behavior of separate models based on carbon content. Madari et al. [16] found that limiting the Ct in their NIR and MIR calibration models to 0.4–99.10 g kg−1 and 0.4–39.90 g kg−1 decreased not only the R2, but also the root mean squared deviation (RMSD) compared to the original NIR and MIR models (0.4–555 g kg−1 Ct); this behavior is similar to that observed in the low Ct models presented here. The study by Vasques et al. [18] developed separate VNIR organic carbon prediction models for their mineral and organic soil samples, which roughly correspond to division by carbon content in this case (mineral soils, 0.01–14.70% carbon; organic soils, 13.52–57.54% carbon). Compared to the original combined model, the R2 improved for both of the subset models, but the RMSE decreased for the lower carbon mineral group and increased for the higher carbon organic group. The increase in R2 values for the subset models differs from what is seen in our work and that of Madari et al. [16] and is an example of soils with different characteristics responding differently to the same treatment.

3.2. Modeling of Soil Order Subsets. The results of the VNIR models from the soil order subsets are given in Table 4 and Figure 4(a). The models from the Andisol subset did not perform as well as the models using the full sample set. The R2, RMSE, and RPD values for the high activity clay subset were similar to those of the full sample set models, but the RPIQ values were generally slightly lower. The low activity clay and organic-dominated subsets were not validated with an independent validation set due to small sample numbers, and therefore their results may be overly optimistic. Compared to a full cross validation of a model created from the full sample set, the low activity clay subset model did not perform as well, except when considering the RMSE parameter, whereas the organic-dominated subset model is broadly similar.

Table 5 and Figure 4(b) show the results of the MIR soil order subset models. The models produced by the Andisol subset had no improvement on the models produced by the full sample set. Results for the high activity clay subset models were as good as or better than the full sample set model results, with the exception of lower RPIQ values. The

[Figure 3 panels: (a) VNIR and (b) MIR, each plotting R2, RMSE, RPD, and RPIQ for the Ct < 10, Ct > 10, and full sample set models; legend: calibration, validation, validation mean.]

Figure 3: Visual assessment of partial least squares regression model results for soil total carbon (Ct) prediction from subsets of (a) visible/near-infrared (VNIR) and (b) mid-infrared (MIR) diffuse reflectance spectra based on Ct content. The parameters given are the coefficient of determination (R2), root mean squared error (RMSE, %), residual prediction deviation (RPD), and the ratio of performance to interquartile distance (RPIQ). The range of values reflects the results of 10 random iterations of the models. Results are also shown for full sample set models with no subsetting for comparison.

overall performance of the low activity clay and organic-dominated subset models using full cross validation was not quite as good as the full cross validation model from the full sample set.

These results suggest that a separate prediction model for the high activity clay soil orders may have a slight advantage compared to a model with all available soil orders for both the VNIR and MIR spectra of this data set. Separate prediction models for the other soil order subsets do not appear to be as promising.

A study by Madari et al. [16] also investigated the benefits of subsetting their samples according to soil order. The authors produced separate models for the Histosols and Spodosols, the Ferralsols (classification according to the World Reference Base [22], approximately equivalent to most of the Oxisol soil order), and the (classification according to the World Reference Base [22], consisting of many Ultisol suborders and some ). The results of these models varied. The Ferralsol and the NIR and MIR models had lower R2 than the original model and also lower RMSD; these two subsets included relatively low Ct (2–85.10 g kg−1 and 1.70–91.60 g kg−1, resp.) compared to the full sample set (0.40–555 g kg−1), so this lower R2 and lower RMSD are a similar behavior to the low Ct subset models in the current study. The Histosol and Spodosol subset NIR and MIR models in Madari et al. [16] resulted in slightly higher R2 values and much higher RMSD values. Our Histosol and Spodosol (i.e., organic-dominated soils) subset models did

Table 4: Detailed partial least squares regression model results for soil total carbon (Ct) prediction from the subsets of visible/near-infrared diffuse reflectance spectra based on soil order. The range of values reflects the results of 10 random iterations of the models, and the number in parentheses is the mean. Detailed results are also given for full sample set models with no subsetting for comparison. For models with full cross validation (i.e., leave-one-out cross validation), the same samples used to calibrate the model were used to validate the model.

Subset | Calibration n (a) | Calibration R2 (b) | Calibration RMSE (%) (c) | Validation n | Validation R2 | Validation RMSE (%) | RPD (d) | RPIQ (e)
Andisol soils | 64–71 | 0.62–0.86 (0.72) | 2.71–7.75 (4.64) | 25–32 | 0.37–0.93 (0.69) | 3.38–7.48 (4.85) | 1.01–3.80 (2.02) | 1.29–3.38 (2.28)
High activity clay soils | 67–72 | 0.86–0.98 (0.93) | 2.38–5.17 (3.73) | 29–34 | 0.74–0.98 (0.90) | 2.19–6.31 (4.02) | 1.89–7.74 (4.12) | 0.71–3.03 (1.68)
Low activity clay soils | 75 | 0.82 | 0.72 | Full cross validation | 0.74 | 0.90 | 1.93 | 1.82
Organic-dominated soils | 26 | 0.96 | 3.35 | Full cross validation | 0.92 | 5.16 | 3.30 | 6.26
Full sample set | 215 | 0.82–0.96 (0.92) | 2.89–5.96 (3.89) | 92 | 0.79–0.95 (0.91) | 2.96–6.03 (4.02) | 2.25–4.43 (3.58) | 2.07–4.53 (3.42)
Full sample set | 307 | 0.95 | 3.09 | Full cross validation | 0.94 | 3.39 | 4.09 | 3.80
(a) Number of samples. (b) Coefficient of determination. (c) Root mean squared error. (d) Residual prediction deviation. (e) Ratio of performance to interquartile distance.

Table 5: Detailed partial least squares regression model results for soil total carbon (Ct) prediction from the subsets of mid-infrared diffuse reflectance spectra based on soil order. The range of values reflects the results of 10 random iterations of the models, and the number in parentheses is the mean. Detailed results are also given for full sample set models with no subsetting for comparison. For models with full cross validation (i.e., leave-one-out cross validation), the same samples used to calibrate the model were used to validate the model.

Subset | Calibration n (a) | Calibration R2 (b) | Calibration RMSE (%) (c) | Validation n | Validation R2 | Validation RMSE (%) | RPD (d) | RPIQ (e)
Andisol soils | 64–71 | 0.84–0.96 (0.91) | 1.92–3.02 (2.49) | 25–32 | 0.41–0.92 (0.79) | 2.99–6.94 (4.03) | 1.12–3.60 (2.33) | 1.87–4.09 (2.66)
High activity clay soils | 67–72 | 0.96–0.99 (0.98) | 0.96–2.71 (1.74) | 29–34 | 0.95–0.99 (0.96) | 1.70–3.60 (2.65) | 4.34–9.81 (5.57) | 0.92–4.38 (2.44)
Low activity clay soils | 75 | 0.98 | 0.24 | Full cross validation | 0.79 | 0.80 | 2.10 | 2.01
Organic-dominated soils | 26 | 0.97 | 2.9 | Full cross validation | 0.86 | 6.7 | 2.52 | 4.78
Full sample set | 215 | 0.94–0.98 (0.96) | 1.94–3.50 (2.78) | 92 | 0.91–0.96 (0.94) | 2.74–3.91 (3.39) | 3.38–5.07 (4.07) | 3.22–5.27 (3.89)
Full sample set | 307 | 0.95 | 3.12 | Full cross validation | 0.94 | 3.52 | 3.94 | 3.68
(a) Number of samples. (b) Coefficient of determination. (c) Root mean squared error. (d) Residual prediction deviation. (e) Ratio of performance to interquartile distance.

not have significantly increased R2 values, but the validation RMSE values were greater than the full sample set models' values.

Vasques et al. [18] developed separate organic carbon prediction VNIR models for each of the seven soil orders in their sample set consisting of soils from Florida, southeastern USA. Compared to the original model containing all of these mineral soil samples, six of the seven soil order subset models resulted in improved R2 values (Alfisols, Entisols, Inceptisols, Mollisols, Spodosols, and ). The RMSE values were also similar or better for these subsets. The Histosol subset model was the only one that did not improve in R2 or RMSE. These results are somewhat different from those in this study, where only the high activity clay soils (i.e., Aridisols, Entisols, Inceptisols, Mollisols, and Vertisols) are suggested to provide an overall improvement on models including all available samples.

3.3. Modeling of Spectral Classification Subsets. The k-means cluster analysis of the VNIR spectra resulted in an unequal distribution of samples between the three clusters. The

[Figure 4 panels: (a) VNIR and (b) MIR, each plotting R2, RMSE, RPD, and RPIQ for the Andisol soils, high activity clay soils, low activity clay soils, organic-dominated soils, and full sample set models; legend: calibration, validation, validation mean, full cross validation.]

Figure 4: Visual assessment of partial least squares regression model results for soil total carbon (Ct) prediction from subsets of (a) visible/near-infrared (VNIR) and (b) mid-infrared (MIR) diffuse reflectance spectra based on soil order. The parameters given are the coefficient of determination (R2), root mean squared error (RMSE, %), residual prediction deviation (RPD), and the ratio of performance to interquartile distance (RPIQ). The range of values reflects the results of 10 random iterations of the models. Results are also shown for full sample set models with no subsetting for comparison.

Cluster 0 subset consisted of only 78 samples (∼3–56% Ct) and therefore all 78 samples were used in its model calibration and full cross validation. The Cluster 1 and Cluster 2 subsets contained 124 samples (∼0–23% Ct) and 105 samples (∼0–14% Ct), respectively, allowing for the independent validation of the models as initially planned. The results of the 10 VNIR Ct prediction models from each of the clusters are given in Table 6 and Figure 5(a). A comparison of the Cluster 0 subset model with a full cross validation model of the full sample set showed that the subset model was not quite as robust, though it did produce a higher RPIQ value. The Cluster 1 and Cluster 2 subset models' results generally

[Figure 5 panels: (a) VNIR and (b) MIR, each plotting R2, RMSE, RPD, and RPIQ for the Cluster 0, Cluster 1, Cluster 2, and full sample set models; legend: calibration, validation, validation mean, full cross validation.]

Figure 5: Visual assessment of partial least squares regression model results for soil total carbon (Ct) prediction from the subsets of (a) visible/near-infrared (VNIR) and (b) mid-infrared (MIR) diffuse reflectance spectra based on spectral classification with k-means cluster analysis. The parameters given are the coefficient of determination (R2), root mean squared error (RMSE, %), residual prediction deviation (RPD), and the ratio of performance to interquartile distance (RPIQ). The range of values reflects the results of 10 random iterations of the models. Results are also shown for full sample set models with no subsetting for comparison.

had lower (i.e., better) RMSE values, but were otherwise not quite as robust as the full sample set models' results.

In the cluster analysis of the MIR spectra, the distribution of samples was heavily weighted toward the Cluster 0 (137 samples, ∼0–52% Ct) and Cluster 2 (132 samples, ∼0–11% Ct) subsets. The Cluster 1 subset contained only 38 samples (∼15–56% Ct) and was validated with full cross validation instead of independent validation. Table 7 and Figure 5(b) present the results of the prediction models from the cluster subsets, as well as those from the full sample set models for comparison. The results for the Cluster 0 subset models are broadly similar to those of the full sample set models but overall they are not an improvement. Results from the full cross validation of Cluster 1 subset were slightly higher for calibration but much lower for validation than the full cross validation of the full sample set. In general, the Cluster 1 model is not as robust as the full sample set model. The overall performance of Cluster 2 subset models is not

Table 6: Detailed partial least squares regression model results for soil total carbon (Ct) prediction from the subsets of visible/near-infrared diffuse reflectance spectra based on spectral classification with k-means cluster analysis. The range of values reflects the results of 10 random iterations of the models, and the number in parentheses is the mean. Detailed results are also given for full sample set models with no subsetting for comparison. For models with full cross validation (i.e., leave-one-out cross validation), the same samples used to calibrate the model were used to validate the model.

Subset | Calibration n (a) | Calibration R2 (b) | Calibration RMSE (%) (c) | Validation n | Validation R2 | Validation RMSE (%) | RPD (d) | RPIQ (e)
Cluster 0 | 78 | 0.93 | 4.52 | Full cross validation | 0.88 | 5.87 | 2.86 | 5.40
Cluster 1 | 87 | 0.68–0.88 (0.77) | 1.92–3.26 (2.86) | 37 | 0.60–0.91 (0.75) | 1.74–3.47 (2.89) | 1.54–3.33 (2.16) | 1.94–5.50 (3.14)
Cluster 2 | 73 | 0.54–0.96 (0.81) | 0.65–2.22 (1.29) | 32 | 0.62–0.91 (0.80) | 0.98–1.72 (1.33) | 1.67–3.34 (2.39) | 0.79–2.56 (1.71)
Full sample set | 215 | 0.83–0.96 (0.90) | 2.82–5.84 (4.30) | 92 | 0.74–0.95 (0.88) | 3.10–5.83 (4.30) | 1.89–4.54 (3.28) | 1.80–3.92 (3.06)
Full sample set | 307 | 0.95 | 3.09 | Full cross validation | 0.94 | 3.39 | 4.09 | 3.80
(a) Number of samples. (b) Coefficient of determination. (c) Root mean squared error. (d) Residual prediction deviation. (e) Ratio of performance to interquartile distance.

Table 7: Detailed partial least squares regression model results for soil total carbon (Ct) prediction from the subsets of mid-infrared diffuse reflectance spectra based on spectral classification with k-means cluster analysis. The range of values reflects the results of 10 random iterations of the models, and the number in parentheses is the mean. Detailed results are also given for full sample set models with no subsetting for comparison. For models with full cross validation (i.e., leave-one-out cross validation), the same samples used to calibrate the model were used to validate the model.

Subset | Calibration n (a) | Calibration R2 (b) | Calibration RMSE (%) (c) | Validation n | Validation R2 | Validation RMSE (%) | RPD (d) | RPIQ (e)
Cluster 0 | 96 | 0.78–0.96 (0.90) | 1.49–4.07 (2.45) | 41 | 0.55–0.91 (0.81) | 2.08–4.67 (3.43) | 1.13–3.20 (2.34) | 1.77–5.65 (3.31)
Cluster 1 | 38 | 0.98 | 1.89 | Full cross validation | 0.86 | 5.19 | 2.62 | 3.93
Cluster 2 | 92 | 0.88–0.99 (0.95) | 0.15–0.58 (0.33) | 40 | 0.77–0.90 (0.85) | 0.39–0.82 (0.56) | 1.50–2.84 (2.36) | 1.30–3.33 (2.33)
Full sample set | 215 | 0.93–0.98 (0.95) | 1.68–3.61 (2.98) | 92 | 0.92–0.95 (0.94) | 2.94–3.78 (3.38) | 3.48–4.68 (4.03) | 2.61–4.61 (3.82)
Full sample set | 307 | 0.95 | 3.12 | Full cross validation | 0.94 | 3.52 | 3.94 | 3.68
(a) Number of samples. (b) Coefficient of determination. (c) Root mean squared error. (d) Residual prediction deviation. (e) Ratio of performance to interquartile distance.

quite as good as the full sample set models, but the limited Ct range of Cluster 2 subset is apparent from its much lower range of RMSE values.

For this sample set, the spectral classification by k-means clustering and separate prediction model for each cluster was not an obvious improvement over the original full VNIR or MIR models. The most noticeable difference is the lower RMSE for the subset models from clusters limited to low Ct values.

We have found one other study that investigated the effect of subsetting a sample set by spectral classification for the prediction of soil carbon. Cierniewski et al. [17] tested the effect of four different unsupervised classification algorithms (k-means, expectation-maximization, Ward's Euclidean distance, and Lance and Williams' Euclidean distance) on simple linear regression results from VNIR data. These clustering algorithms produced five or six clusters, and the number of samples per cluster ranged from four to 56. This is in contrast to the method of k-means cluster analysis used in our study, where we specified that three clusters be produced to decrease the probability of a very low number of samples in a cluster that would not be adequate for robust modeling. Cierniewski et al. [17] found that the majority of their cluster subsets had improved R2 values compared to the original full sample set. An increase in R2 was not observed for the spectral classification subsets in the current work. Instead, the most significant improvement was a lower RMSE for many of the cluster subset models. Because other parameters such as RMSE were not provided in Cierniewski et al. [17], it is difficult to determine if this behavior is an effect of their subsetting study.

3.4. Percent Error of Prediction. The subset models with improved RMSE values but an otherwise less-robust performance may still hold an advantage over the original full sample set model. If a more accurate prediction of the low Ct samples makes a significant contribution to the lowered RMSE, the model could be very helpful in addressing the issue of large errors at low Ct values. To evaluate the error at these low Ct values, the percent error of prediction was calculated for the samples with Ct values less than 10% and the average value was reported for each model (Figure 6). We use percent error rather than RMSE for comparing the subset models with the full sample set model to normalize the error of the predicted value with respect to its measured value.

The mean value of the average percent error for each of the ten iterations of the full sample set model is ∼160–200%, but the average percent error for a single model could be up to almost 400% (Figure 6). For example, with a measured value of 1% Ct, an error of 400% would be translated to a predicted value of 5% Ct. The MIR full sample set models have lower average percent error, with a mean average percent error of ∼135–150% and a maximum average percent error of ∼200%. Many of the low RMSE subset models have noticeably lower average percent errors. The low Ct VNIR and MIR models and the Cluster 2 MIR models appear to have the most significant improvement, with average percent errors of ∼80% or less. For a measured value of 1% Ct, a percent error of 80% would reduce the predicted value to 1.8% Ct. Clusters 1 and 2 VNIR models also show moderate improvement, with all average percent error results below ∼175%. The average percent error of the low activity clay soils full cross validation model is slightly lower than the full sample set model for both the VNIR and MIR data. The organic-dominated soils subset includes only two samples with Ct < 10%, so a comparison of average percent error is not as reliable in this case.

The subsets with the largest decreases in average percent error of prediction at low Ct content (i.e., Ct < 10%) are the ones that included only low Ct samples in their models. The low Ct VNIR and MIR models contained samples with Ct values between ∼0 and 9.9% Ct, and the Cluster 2 MIR models had samples with Ct values between ∼0 and 11% Ct. These results suggest that a separate model for low Ct samples is beneficial for the accuracy of prediction for the samples in this range. This advantage is indicated by the RMSE of low Ct models, but may not be obvious from the R2 parameter. The issue of relatively large errors of prediction for samples with very low Ct content has been understudied. To our knowledge there are no studies that have provided quantitative information addressing the degree of scatter observed for low Ct soils on most predicted versus measured plots.

3.5. Variation in Model Parameters. The ranges of PLSR model parameters produced by the 10 iterations of random calibration/validation set divisions in this study appear to be larger than the ranges of values encountered in previous studies where multiple PLSR model iterations were used. Brown et al. [13] reported results for five models produced from different random divisions of the sample set into 70% calibration and 30% validation groups. Values for organic carbon prediction from VNIR data ranged from 0.75 to 0.86 for R2, 1.08 to 1.26 for RMSD, and 1.95 to 2.62 for RPD. Mouazen et al. [14] included three model iterations with random divisions into 90% calibration and 10% validation groups in their study. The exhaustive results are not reported, but visual estimation from plots of the mean and standard deviation for the R2 and RMSE from the organic carbon prediction models suggests that the variation is similar to that in Brown et al. [13] or less. The greater range in model parameters observed in our study may be related to the testing of a greater number of iterations (i.e., 10 rather than five or three), or it could be related to a less obvious attribute, such as a greater variation in spectral character within the sample set.

4. Summary and Conclusions

Our research has provided an introduction to the understudied idea of sample subsetting based on criteria that are simple and easily applied. This particular investigation of subsetting for Ct prediction had varied results with our Hawaiian soils sample set. Of all the different subset models created based on Ct content, soil order, and spectral classification, the subset of high activity clay soil orders was the only one to show improvement across most parameters (i.e., R2, RMSE, and RPD, but not RPIQ) compared to the full sample set. Notably, one significant advantage was discovered: the subsets including only low Ct samples (e.g., Ct < 10% subset, MIR Cluster 2 subset) produced models with much lower RMSE values compared to the full sample set models, even though the other model parameters were not as robust. The lower RMSE for these models corresponds to a significant decrease in the percent error of predictions for low Ct samples, which could be very helpful for the analysis of soils with low Ct content or monitoring of small changes in Ct. Incorporation of a low Ct subset model in the future prediction of unknown soils' Ct values could be done by first employing a model created with the full possible range of Ct values and then utilizing the separate low Ct subset model if the soil is predicted to have low Ct.

As seen from this study and previous studies, the effect of subsetting can have different results depending on the character of the sample set and the number of samples it includes. A small sample size may have limited the improvement possible by subsetting in the current work. In an effort to keep the size of subsets large enough for regression analysis, the subsetting may have been too coarse (e.g., too few subsets for Ct prediction by soil order and spectral classification). The types of subsetting strategies explored here may be most helpful for large datasets and should be tested with further research. Regardless of the strategy used to develop

[Figure 6, panels (a) VNIR and (b) MIR: bar charts of average percent error for Ct < 10% samples (y-axis) for the full sample set and each subset model (x-axis: full sample set, Ct < 10, Clusters 0 to 2, low activity clay soils, high activity clay soils, Andisol soils, organic-dominated soils). Legend: 70% calibration/30% validation results; mean; full cross validation result.]

Figure 6: Average percent error of the Ct < 10% portion of the (a) visible/near-infrared (VNIR) and (b) mid-infrared (MIR) subset and full sample set models in this study. The range of values reflects the results of 10 random iterations of the models. The VNIR and MIR high Ct models and the MIR Cluster 1 models were not included because all samples had Ct > 10%.

a model, our results suggest that multiple iterations of models with different calibration/validation groupings may help to produce a more complete picture of the overall model quality.

Acknowledgments

This research was supported by USDA CSREES TSTAR Project 2009-34135-20183 and UHM College of Tropical Agriculture and Human Resources (CTAHRs) Hatch Project HA-154. The authors thank J. Hempel, L. West, T. Reinsch, L. Arnold, and R. Nesser of the NRCS National Soil Survey Center in Lincoln, NE, USA, for help with access, sampling, and scanning of the archived samples; L. Muller and A. Quidez for help with scanning of samples at UHM; Drs. G. Uehara, R. Yost, and D. Beilman of UHM for the support of this project. They also appreciate the Hawaii landowners, managers, and extension agents that gave them access to their fields for collecting soil samples. These include from Kauai: R. Yamakawa and J. Gordines (CTAHR), S. Lupkes (BASF), and Grove Farms; from Oahu: R. Corrales, A. Umaki, and J. Grzebik (CTAHR), Hoa Aina, MAO Organic Farm, Nii Nursery, J. Antonio and M. Conway (Dole), C. and P. Reppun, L. Santo, T. Jones, and N. Dudley (HARC), and A. Sou (Aloun Farms); from Molokai: A. Arakaki (CTAHR), K. Duvchelle (NRCS), and R. Foster (Monsanto); from Maui: J. Powley and D. Oka (CTAHR), M. Nakahata and M. Ross (HC&S), T. Callender (Ulupono), and B. Abru.

References

[1] J. B. Reeves III, G. W. McCarty, and V. B. Reeves, “Mid-infrared diffuse reflectance spectroscopy for the quantitative analysis of agricultural soils,” Journal of Agricultural and Food Chemistry, vol. 49, no. 2, pp. 766–772, 2001.
[2] G. W. McCarty, J. B. Reeves, V. B. Reeves, R. F. Follett, and J. M. Kimble, “Mid-infrared and near-infrared diffuse reflectance spectroscopy for soil carbon measurement,” Soil Science Society of America Journal, vol. 66, no. 2, pp. 640–646, 2002.
[3] K. D. Shepherd and M. G. Walsh, “Development of reflectance spectral libraries for characterization of soil properties,” Soil Science Society of America Journal, vol. 66, no. 3, pp. 988–998, 2002.
[4] R. A. V. Rossel, D. J. J. Walvoort, A. B. McBratney, L. J. Janik, and J. O. Skjemstad, “Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties,” Geoderma, vol. 131, no. 1-2, pp. 59–75, 2006.
[5] G. M. Vasques, S. Grunwald, and J. O. Sickman, “Comparison of multivariate methods for inferential modeling of soil carbon using visible/near-infrared spectra,” Geoderma, vol. 146, no. 1-2, pp. 14–25, 2008.
[6] G. M. Vasques, S. Grunwald, and J. O. Sickman, “Modeling of soil organic carbon fractions using visible–near-infrared spectroscopy,” Soil Science Society of America Journal, vol. 73, no. 1, pp. 176–184, 2009.
[7] R. A. V. Rossel and T. Behrens, “Using data mining to model and interpret soil diffuse reflectance spectra,” Geoderma, vol. 158, no. 1-2, pp. 46–54, 2010.
[8] M. L. McDowell, G. L. Bruland, J. L. Deenik, S. Grunwald, and N. M. Knox, “Soil total carbon analysis in Hawaiian soils with visible, near-infrared and mid-infrared diffuse reflectance spectroscopy,” Geoderma, vol. 189-190, pp. 312–320, 2012.
[9] K. Paustian, O. Andrén, H. H. Janzen et al., “Agricultural soils as a sink to mitigate CO2 emissions,” Soil Use and Management, vol. 13, no. 4, pp. 230–244, 1997.
[10] H. Tiessen, E. Cuevas, and P. Chacon, “The role of soil organic matter in sustaining soil fertility,” Nature, vol. 371, no. 6500, pp. 783–785, 1994.
[11] E. T. Craswell and R. D. B. Lefroy, “The role and function of organic matter in tropical soils,” Nutrient Cycling in Agroecosystems, vol. 61, no. 1-2, pp. 7–18, 2001.
[12] R. Lal, “Soil carbon sequestration impacts on global climate change and food security,” Science, vol. 304, no. 5677, pp. 1623–1627, 2004.
[13] D. J. Brown, R. S. Bricklemyer, and P. R. Miller, “Validation requirements for diffuse reflectance soil characterization models with a case study of VNIR soil C prediction in Montana,” Geoderma, vol. 129, no. 3-4, pp. 251–267, 2005.
[14] A. M. Mouazen, B. Kuang, J. De Baerdemaeker, and H. Ramon, “Comparison among principal component, partial least squares and back propagation neural network analyses for accuracy of measurement of selected soil properties with visible and near infrared spectroscopy,” Geoderma, vol. 158, no. 1-2, pp. 23–31, 2010.
[15] D. V. Sarkhot, S. Grunwald, Y. Ge, and C. L. S. Morgan, “Comparison and detection of total and available soil carbon fractions using visible/near infrared diffuse reflectance spectroscopy,” Geoderma, vol. 164, no. 1-2, pp. 22–32, 2011.
[16] B. E. Madari, J. B. Reeves, M. R. Coelho et al., “Mid- and near-infrared spectroscopic determination of carbon in a diverse set of soils from the Brazilian national soil collection,” Spectroscopy Letters, vol. 38, no. 6, pp. 721–740, 2005.
[17] J. Cierniewski, C. Kaźmierowski, K. Kuśnierek et al., “Unsupervised clustering of soil spectral curves to obtain their stronger correlation with soil properties,” in Proceedings of the 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS ’10), Reykjavik, Iceland, June 2010.
[18] G. M. Vasques, S. Grunwald, and W. G. Harris, “Spectroscopic models of soil organic carbon in Florida, USA,” Journal of Environmental Quality, vol. 39, no. 3, pp. 923–934, 2010.
[19] AOAC International, Official Methods of Analysis of AOAC International, AOAC International, Arlington, Va, USA, 16th edition, 1997.
[20] P. C. Williams, “Variables affecting near-infrared reflectance spectroscopic analysis,” in Near-Infrared Technology in the Agricultural and Food Industries, P. Williams and K. Norris, Eds., pp. 143–167, American Association of Cereal Chemists, St. Paul, Minn, USA, 1987.
[21] V. Bellon-Maurel, E. Fernandez-Ahumada, B. Palagos, J. M. Roger, and A. McBratney, “Critical review of chemometric indicators commonly used for assessing the quality of the prediction of soil attributes by NIR spectroscopy,” Trends in Analytical Chemistry, vol. 29, no. 9, pp. 1073–1081, 2010.
[22] W. R. B. IUSS Working Group, “World reference base for soil resources,” World Soil Resources report no. 103, FAO, Rome, Italy, 2006.

Hindawi Publishing Corporation
Applied and Environmental Soil Science
Volume 2012, Article ID 535646, 12 pages
doi:10.1155/2012/535646

Research Article

Investigations into Soil Composition and Texture Using Infrared Spectroscopy (2–14 μm)

Robert D. Hewson,1 Thomas J. Cudahy,2 Malcolm Jones,3 and Matilda Thomas4

1 School of Mathematical and Geospatial Sciences, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia
2 CSIRO Earth Science and Resource Engineering, P.O. Box 1130, Bentley, WA 6102, Australia
3 Geological Survey of Queensland, Level 10, 119 Charlotte Street, Brisbane, QLD 4000, Australia
4 Geoscience Australia, GPO Box 378, Canberra, ACT 2601, Australia

Correspondence should be addressed to Robert D. Hewson, [email protected]

Received 17 February 2012; Accepted 27 August 2012

Academic Editor: Sabine Chabrillat

Copyright © 2012 Robert D. Hewson et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The ability of thermal and shortwave infrared spectroscopy to characterise composition and texture was evaluated using both particle size separated soil samples and natural soils. Particle size analysis and separation into clay, silt, and sand-sized soil fractions were undertaken to examine possible relationships between quartz and clay mineral spectral signatures and soil texture. Spectral indices, based on thermal infrared specular and volume scattering features, were found to discriminate clay mineral-rich soil from mostly coarser quartz-rich sandy soil and, to a lesser extent, from the silty quartz-rich soil. Further investigations were undertaken using spectra and information on 51 USDA and other soils within the ASTER spectral library to test the application of shortwave, mid-, and thermal infrared spectral indices for the derivation of clay mineral, quartz, and organic carbon content. A nonlinear correlation between quartz content and a TIR spectral index based on the 8.62 μm feature was observed. Preliminary efforts at deriving a spectral index for the soil organic carbon content, based on the 3.4–3.5 μm fundamental H–C stretching vibration bands, were also undertaken with limited results.

1. Introduction

Mapping and analysing soils for their composition and textural characteristics typically involves extensive field work and laboratory techniques that are traditionally time consuming. However, the measurement and determination of soil texture and composition is important for the mapping of areas vulnerable to soil erosion, driven by water and wind. Coarser-textured soils are more resistant to detachment and transport via raindrops, and are thus less affected by water-assisted erosion [1]. Soils with a silt content above 40% are considered highly erodible, while clay particles can potentially combine with organic matter to form aggregates or clods which assist in their resistance to erosion [1]. Also, studies of the critical shear wind velocities required for transportation of different-sized soil particles indicate that those with diameters between 0.10 and 0.15 mm are the most vulnerable to wind erosion [1].

Another motivation to determine a soil’s texture and composition, including mineralogy, is to measure a soil’s ability to retain water or enable drainage. Clay minerals such as montmorillonite can exhibit swelling behaviour, absorbing and storing water within their layered lattice structure [2]. Such finer textured clay rich soils can offer more water for plant growth than sandy soils. Sandy soils are more vulnerable to drought than clayey soils, storing less water and being likely to lose water more rapidly to growing plants [3]. However, under flood conditions, clay rich soils exhibit poor drainage of excess water and may become waterlogged.

The incorporation of spectrally derived textural and compositional information into soil classification schemes will be undertaken particularly for special-purpose classification with specific aims, such as mapping erodibility [3]. White [2] describes the usefulness of texture classification of soils for drought vulnerability and poor aeration.

In addition, estimation of surface soil texture and the identified presence or absence of certain minerals is still potentially an important input for general-purpose soil classification based on quantifiable properties that define its morphology, as distinct from its genesis. Soil classification in the USA and Australia mostly uses such properties as a basis for classification, although this also requires observing properties varying with depth and horizons [2]. Incorporating spectroscopy information into a comprehensive 3D morphology description therefore requires the use of proximal spectral measurements of excavated soil samples.

The overall interest in determining a soil’s texture and composition can also be summarized by the need to monitor and map areas vulnerable to desertification, typically demonstrated by increased soil erosion. A detailed study by [3] examined the key indicators of desertification, including soil properties, with the aim of mapping areas vulnerable to future desertification. The study described the soil parameter, erodibility, as primarily a property of the soil texture, with highest values for fine sand and silty soils with low clay content, but which can be decreased significantly with the presence of organic carbon matter [3]. With the world population expected to reach 9 billion by 2050, food security is an issue that can least afford the effects of erosion and desertification, reducing the existing arable agricultural land to feed a growing world [4].

Both proximal and remote sensing spectroscopy offer the potential of increasing the speed, and reducing the cost, of interpreting soil samples for texture and composition. Recently, hyperspectral airborne imaging spectroscopy has been successfully applied to study some soil properties utilising electromagnetic radiation (EMR) within the visible-near infrared (VNIR) wavelengths (0.4–1.0 μm) and shortwave infrared (SWIR) wavelength regions (1.0–2.5 μm) [5]. Also, developments in airborne hyperspectral mid and thermal infrared remote sensing [6, 7] indicate the potential for mapping and characterising in situ soils. In the laboratory, thermal infrared (TIR) spectroscopy within the 7–14 μm wavelength region has also shown its potential to provide soil mineralogy and textural information [8, 9]. In addition, several key organic carbon rich components (including lignin and cellulose) display diagnostic features related to the fundamental hydrocarbon H–C stretching vibration bands between 3.4 and 3.5 μm of the mid infrared (MIR) wavelength (3–5 μm) region [10, 11]. Spectral libraries, consisting of bidirectional TIR reflectance measurements, reveal diagnostic absorption features of many silicate minerals [9], although directional effects preclude their use for quantifiable comparison with TIR remote sensing signatures. Proximal laboratory spectral measurements within the VNIR wavelengths (0.4–1.0 μm) and SWIR wavelengths (1.0–2.5 μm) can identify ferric iron oxides, clay (AlOH), sulphates, and carbonate minerals [12] common within soils. However, discriminating the quartz or silicate content of soils requires spectroscopy within the MIR region [11] and, more commonly, the TIR region [9]. Table 1 summarises these wavelength regions containing diagnostic spectral signatures. Quantifying soil composition and characteristics has also been achievable using diffuse reflectance mid infrared (DRIFTS) spectroscopy techniques with the assistance of partial least squares calibration using a set of control soil samples [13]. However, the physics of DRIFTS measurements and sampling operation precludes the conversion of the recorded reflectance signatures to absolute emissivities. The DRIFTS technique requires samples to be ground to a fine-powdered fraction (e.g., <80 μm) held in a small cap-like container from which reflectance and transmitted signatures are detected [14].

Proximal and remote sensing bidirectional VNIR-SWIR reflectance measurements can be approximately representative if soils are Lambertian (e.g., isotropic for EMR). This assumption will not be suitable for anisotropic surfaces such as clay-dominated scalds. Ideally the comparison of MIR and TIR remote sensing with soil spectroscopy requires directional hemispherical reflectance (DHR) or emission mode proximal measurements of soils. Salisbury [15] demonstrated that for typical terrestrial materials with Lambertian surfaces, it is possible to assume Kirchhoff’s Law (ε = 1 − ρ, where ε = emissivity and ρ = reflectance), enabling DHR and emissivity spectral signatures to be interchangeable. Later investigations within the 3–14 μm wavelength range have indicated that the change in emissivity with observation angle is small for all soils except for sand, where a change of up to 4% occurs within the 8–10 μm wavelength region [16].

Studies to interpret multispectral ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) TIR and airborne TIMS (Thermal Infrared Multispectral Scanner) imagery for soil mineralogy and texture, however, have demonstrated that their limited spatial and spectral resolution restricts this application [17]. Other studies have in particular described the difficulties of comparing heterogeneities within ASTER’s 90 m pixel footprint with temperature and emissivity variability on the ground [18]. ASTER imagery has been available since 2000, with only five bands between 8.3 and 11.3 μm acquired for each 90 metre pixel. NASA’s future HyspIRI spaceborne sensor will acquire at a slightly higher resolution, with six TIR bands between 8.3 and 12.0 μm acquired for each 60 metre pixel [19].

Proximal DHR or high spectral resolution emission/radiance measurements are required to undertake soil spectroscopy to simulate passive hyperspectral MIR/TIR remote sensing techniques. Although extracting emissivity information from MIR/TIR remote sensing radiance imagery, particularly daytime (e.g., sunlit) MIR, is nontrivial, several algorithms have been developed and applied [20, 21]. In this study, the derivation of selected compositional and textural soil information is targeted using key spectral absorption features within the 2 to 14 μm wavelength region as a pilot investigation for future hyperspectral infrared remote sensing applications.

The routine acquisition of hyperspectral TIR imagery is still at an early stage by airborne sensors such as the Spatially Enhanced Broadband Array Spectrograph System (SEBASS) with 128 bands between the 7.6 and 13.5 μm wavelengths [22]. Effectively at present, high-resolution spectral and spatial thermal infrared imagery is limited to

Table 1: Accessible windows within infrared spectral regions and their diagnostic compositional elements commonly found in soils.

SWIR (1.0–2.5 μm): Clay minerals (e.g., kaolinite, illite, montmorillonite), sulphates (e.g., gypsum), and carbonate (e.g., calcite) minerals.
MIR (3–5 μm): Organic carbon (e.g., lignin, cellulose), quartz, kaolinite.
TIR (7–14 μm): Quartz (including silt versus coarse sand), kaolinite, illite/montmorillonite, cellulose.
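The Kirchhoff's Law relation quoted in the Introduction (ε = 1 − ρ) amounts to a one-line conversion between directional hemispherical reflectance and emissivity. The following is only a minimal sketch of that relation, assuming reflectance is expressed as a fraction between 0 and 1 and that the surface is opaque and Lambertian; the function name and the sample values are hypothetical and are not taken from the study.

```python
def dhr_to_emissivity(reflectance):
    """Apply Kirchhoff's Law (emissivity = 1 - reflectance) band by band.

    Assumes each reflectance value is a fraction in [0, 1]; this is an
    illustrative helper, not the software used in the study.
    """
    emissivity = []
    for rho in reflectance:
        if not 0.0 <= rho <= 1.0:
            raise ValueError("reflectance must be a fraction between 0 and 1")
        emissivity.append(1.0 - rho)
    return emissivity


# Hypothetical DHR values for three TIR bands, for illustration only.
example_dhr = [0.02, 0.10, 0.25]
example_emissivity = dhr_to_emissivity(example_dhr)
```

The range check is a design choice of this sketch: reflectance stored as a percentage would otherwise yield negative emissivities without any warning.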

targeted airborne acquisitions or spot in situ or sampled-soil measurements. Until more routine and operational airborne (or higher resolution satellite) sensors are available, it is hoped that the use of field spectrometry or laboratory measurements of soil samples will be of use for faster textural and compositional analysis than traditional laboratory techniques.

Three-dimensional soil spectroscopy is feasible using trays that are already used to store multiple small samples from regular depth intervals. Spectral sensing of rock chip trays is already used to acquire proximal VNIR-SWIR measurements of samples collected as part of routine mining and exploration drilling programs (http://www.csiro.au/en/Organisation-Structure/Divisions/Earth-Science–Resource-Engineering/HyChips.aspx). Laboratory VNIR-SWIR and potentially TIR measurements of such trays could be undertaken assuming the soil samples are sufficiently and consistently dried, and completely fill each tray compartment. A small field of view would be required to ensure there was no spectral interference with the tray container material or that no potential blackbody cavity effect was detected if utilizing a TIR emission mode spectrometer.

Initially this study was part of a much larger CSIRO-ESRE (Commonwealth Scientific Industrial Research Organisation, Division of Earth Science and Resource Engineering) project to map surface minerals/chemistries using airborne hyperspectral and satellite ASTER imagery, applying visible-near infrared and shortwave infrared sensing (proximal and airborne) techniques [23, 24]. The Tick Hill soils described and used in this paper were part of this CSIRO-ESRE study and were collected within north Queensland (21°35′S, 139°55′E) (Figures 1 and 2). The good soil exposure and variability within this regional project area made this a useful study area for soil mapping via remote sensing and spectroscopy. Preliminary results of the TIR spectroscopy investigations undertaken for these Tick Hill soils were presented at the 19th World Congress of Soil Science [25]. This publication describes these results in greater detail, and in combination with more results from DHR measurements of USDA (United States Department of Agriculture) soil samples, available via the ASTER Spectral Library (ASL) [26] (http://speclib.jpl.nasa.gov/search-1/soil). Previous investigations of these USDA ASL spectra had found them useful for direct comparisons with field measurements when convolved to ASTER’s 5 band TIR emissivity spectral resolution [27]. The ASL spectra within this study also included ten additional soil samples from a semiarid environment, Fowlers Gap, in western New South Wales, Australia, collected and analysed as part of a Ph.D. [28]. In addition, this investigation also incorporates further interpretation of short wavelength infrared (SWIR) spectroscopy for clay mineral content of all the examined Tick Hill samples.

2. Data and Laboratory Methods

2.1. Traditional Analytical Methods. In the past, texture has been assessed qualitatively in field soil surveys by moistening a sample with water and kneading between fingers and thumb until the aggregates are broken down and the soil grains thoroughly wetted [2]. More accurate quantitative, but time consuming, laboratory analysis of texture is also available by particle-size analysis using sedimentation techniques based on the rate of settling within a soil-water suspension [29]. Likewise, traditional compositional analytical techniques involving X-ray diffraction are time consuming and expensive. It should be noted that in this spectroscopy study, “clay” content refers to clay mineral content (e.g., kaolinite, montmorillonite, and illite), that is, less than 2 μm, if disaggregated. Care is therefore required when comparing spectrally derived clay mineral content with particle size clay content, as the determined 2 μm and finer fraction may include fine iron oxides or organic material if not properly pretreated [29].

2.2. Tick Hill Samples. Eight Tick Hill soil samples were chosen and analysed for International System particle size fractions [2], clay (<2 μm), silt (2–20 μm), and sand (20 μm–2 mm), by CSIRO Land and Water (http://www.clw.csiro.au/services/analytical/), using the traditional pipette method [29] (Table 2). According to the Australian Soil Resource Information System (ASRIS) mapping [2, 31], all except one sample used in this study area were Ferrosols (Table 2), being high in free iron oxide content and having low textural contrast between A and B horizons [30]. MI132 was mapped as a Tenosol, with weak pedologic structure apart from the A horizon [30]. However, at the detailed scale sampled in this Tick Hill area, a much greater diversity of soil properties was observed (Figure 1). These samples were first prepared by chemically removing salts, organic matter, and ferric iron [29]. These fractions were separated and dried for later spectral measurements. The resulting sand fraction was also further separated into 20–60 μm and 60 μm–2 mm fractions, and dried for later spectral measurements.

The original raw soil and also four particle size fractions (<2 μm, 2–20 μm, 20–60 μm, 60 μm–2 mm) of each of the eight Tick Hill samples were measured for their TIR spectral emissivity signatures using the Designs and Prototypes

Table 2: Tick Hill soil sample particle analysis results using Method code 517.08 of [29] where Fe/Al oxides, organic matter, and soluble salts are removed.

Columns: Sample; Clay <2 μm (%); Silt 2–20 μm (%); <2 μm (%); <20 μm (%); <2000 μm (%); Calculated moisture, Fe/Al oxides, and organic matter removed by pretreatment (%); ASRIS∗ classification.

MI115   33.1   13.2   33.1   46.3   100   13   Ferrosols
MI120   18.3    8.6   18.3   26.9   100   11   Ferrosols
MI121   16.6   10.1   16.6   26.6   100    6   Ferrosols
MI122   46.1   18.8   46.1   64.9   100    8   Ferrosols
MI124   57.6   17.3   57.6   74.8   100    7   Ferrosols
MI128   22.8    6.7   22.8   29.5   100    6   Ferrosols
MI129   13.1   13.8   13.1   26.9   100    6   Ferrosols
MI132   27.8   11.5   27.8   39.3   100   11   Tenosols

∗ASRIS: Australian Soil Resource Information System [2, 30].
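The text uses two particle-size systems: the International System applied to the Tick Hill samples (silt 2–20 μm, sand 20 μm–2 mm) and the USDA system used for the ASTER Spectral Library soils in Section 2.3 (silt 2–50 μm, sand 50 μm–2 mm). A short sketch can make the mismatch between them concrete. The function and constant names below are hypothetical, and the treatment of a grain lying exactly on a class limit is an assumption of this sketch, not something the source specifies.

```python
# Upper size limits in micrometres for clay, silt, and sand.
# Values follow the class definitions quoted in the text; names are illustrative.
INTERNATIONAL_LIMITS_UM = (2.0, 20.0, 2000.0)
USDA_LIMITS_UM = (2.0, 50.0, 2000.0)


def size_class(diameter_um, limits):
    """Return the texture class of a particle of the given diameter.

    Assumption: a particle exactly on a limit falls in the coarser class,
    except at the 2 mm upper bound of sand, which is included in sand.
    """
    if diameter_um <= 0:
        raise ValueError("diameter must be positive")
    clay_max, silt_max, sand_max = limits
    if diameter_um < clay_max:
        return "clay"
    if diameter_um < silt_max:
        return "silt"
    if diameter_um <= sand_max:
        return "sand"
    return "coarser than sand"


# A 30 um grain is sand under the International System but silt under USDA.
international_class = size_class(30.0, INTERNATIONAL_LIMITS_UM)
usda_class = size_class(30.0, USDA_LIMITS_UM)
```

Grains between 20 and 50 μm change class between the two systems, which is consistent with the paper's statement that the silt and sand breakdowns of the Tick Hill and USDA samples could not be combined.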

Figure 1: Example of the Tick Hill landscape for sample sites: (a) MI121 and (b) MI124.

microFTIR 102 [32] (http://www.dpinstruments.com/). Soil samples were oven heated overnight at 60°C to obtain a consistent dryness. MicroFTIR emission measurements were acquired from heated soil samples within ceramic crucibles, also at 60°C, from an approximate 20 mm field of view and using 16 scan integrations. Measurements were calibrated to radiance units (W/m²/sr/μm) using hot and cold black body measurements set to 65°C and 30°C, respectively. Background radiance (e.g., “downwelling”) was removed by measuring the emission of a brass plate at room temperature (determined via a Pt thermocouple). Temperature-emissivity separation of the acquired radiance measurements was calculated using in-house software developed by CSIRO (Green, pers. comm.) to provide absolute emissivity spectral signatures. Each soil sample was also analysed for mineralogy using X-ray diffraction (XRD).

Emission measurements by the microFTIR of each particle size fraction were repeated as a check for slight signature variations when the sample surface was disturbed with a spatula. Only minor changes in absolute emissivity values were observed, with no effective change in signature shape. The resulting TIR emissivity signatures were imported into CSIRO software for processing the proximal spectral data, “The Spectral Geologist” (TSG, http://www.thespectralgeologist.com/). Interpretation of the emissivity signatures for mineralogy was assisted by comparison with the ASL [26]. Additional spectral measurements of the Tick Hill samples were also undertaken using the PIMA II (Portable Infrared Mineral Analyser; http://www.hyvista.com/) and the Fieldspec FR (http://www.asdi.com/) spectrometers to acquire SWIR reflectance signatures of the Tick Hill raw soil and particle-size fractions.

2.3. ASTER Spectral Library. Fifty-one of the soil samples accessible at the ASTER Spectral Library [26] (http://speclib.jpl.nasa.gov/search-1/soil) were used in this study and represented all major soil types. Detailed descriptions and classifications are given for most of the soils, including percentages of sand, silt, and clay provided by the USDA National Soil Survey Laboratory, where clay size is less than 2 μm; silt size ranges from 2 μm to 50 μm; sand size ranges from 50 μm to 2 mm. The USDA system definitions of silt and sand by particle size differ from the International System, where silt and sand are defined by particle sizes 2 μm to 20 μm and 20 μm to 2 mm, respectively [2]. This prevented the silt and sand breakdown for the eight Tick Hill samples from being combined with these USDA samples. When available, the USDA determined clay mineralogy semiquantitatively from the XRD analysis, while mineralogy of the silt and sand was determined

[Figure 2 map: published geology (Cainozoic, Tertiary, Mesozoic, Cambrian, Proterozoic) with field sample locations for 2006 and 2007; eastings 385000–391000, northings 7603000–7621000.]

Figure 2: Location of the Tick Hill study area [24], North Queensland, Australia.

[Figure 3 plot: emissivity (offset for clarity) versus wavelength (7.5–13.5 μm) for samples MI 122 and MI 128 and for library kaolinite, quartz, and montmorillonite.]

Figure 3: Example MicroFTIR emissivity spectra of Tick Hill soil samples and ASL [26] highlighting the presence of mixtures of clay and quartz minerals within the samples.

by petrographic microscope. The supplied USDA quartz content percentage estimate was recalculated into a total soil quartz content estimate for this study using the clay particle size percentage estimate to derive an adjustment (e.g., Quartz%[silt, sand] × [100 − Clay%]/100). The USDA’s percent organic carbon content was also used in this study and was obtained by wet combustion analysis [33]. Estimates of quartz contents for the Fowlers Gap samples were derived from normative analysis of the soil’s XRF elemental results [34]. Particle size analysis of the Fowlers Gap soils was determined using sieving and laboratory techniques assuming USDA system texture classes [34].

The USDA samples included in the ASL were measured for DHR spectra using a Nicolet 5DXB FTIR spectrophotometer at constant 4 cm⁻¹ spectral resolution from 2 μm to 14 μm (e.g., SWIR-TIR wavelengths) [35]. A directional (10 degree) hemispherical reflectance attachment was used for measuring a 2.5 cm sample diameter. In this study the ASL TIR signatures were converted from DHR to emissivity, as generated from the microFTIR, assuming Kirchhoff’s Law [15]. These spectra were also combined with additional 0.4 μm to 2.0 μm spectral measurements for each sample, although the VNIR wavelength region was not studied here.

3. Spectral Analysis and Results

3.1. Tick Hill. XRD analysis of the Tick Hill soils identified minerals such as quartz, smectite, kaolinite, and minor amounts of illite. MicroFTIR emissivity signatures of raw soil samples confirmed the predominance of quartz and clay minerals. In particular, samples MI122 and MI128 indicated the presence of quartz and kaolinite/smectite minerals (Figure 3). Although the quartz “reststrahlen” feature between 8 and 9.5 μm is reduced in the raw soil samples compared to the JHU library spectra, the 8.62 μm feature remains distinctive. Likewise, the 9.0 μm kaolinite feature is less distinctive in the raw soil mixture, although the 9.5 μm feature remains.

The corresponding emissivity signatures for the various particle size fractions of samples MI122 and MI128 highlight examples of the trend of an increasing quartz reststrahlen 8.62 μm spectral feature and decreasing clay 9.5 μm feature with increasing particle size (Figure 4). Also within the 10.5–12 μm wavelength region, “volume” scattering quartz features (QVS) are associated with the 2–20 μm and 20–60 μm soil particle fractions [36] (Figure 4). By comparison, the reststrahlen quartz feature is associated with specular scattering from coarser quartz grains [36]. Several samples, including the displayed MI128, also indicate small amounts of kaolinite within the 2–20 μm and 20–60 μm particle size fractions, as shown by their 9.0 and 9.8 μm spectral features (Figure 4).

The TSG software was customised to target those emissivity features associated with kaolinite/smectite, quartz, and its volume scattering fine-grained variation. In particular, spectral indices were devised to estimate the coarser quartz content, the clay mineral content, and the effects of fine quartz volume scattering: Quartz(ε), Clay(ε), and QVS(ε), respectively. Generally the individual detector bands or

closest wavelength thereof were used here. These spectral indices were devised to target the spectral absorption feature in a similar method as by [37].

\[
\mathrm{Quartz}(\varepsilon) = \frac{2 \times \varepsilon_{(8632)}}{\varepsilon_{8383} + \varepsilon_{8897}}, \tag{1}
\]

where ε(8632) is the mean ε value within ±30 nm (e.g., between 8602 and 8664 nm), and ε8383 and ε8897 are the ε values at wavelengths 8383 and 8897 nm, respectively. Figure 5 highlights these wavelengths in relation to the quartz reststrahlen spectral feature. The estimation of quartz content using (1), as a spectral index based on the diagnostic reststrahlen absorption feature, follows examples of the previous application of spectral indices with proximal and airborne hyperspectral data [23, 24].

Figure 4: Example MicroFTIR emissivity spectra of particle size separated Tick Hill soil samples showing diagnostic mineral and textural related spectral features. (Plot: emissivity, offset for clarity, versus wavelength, 7.5–13.5 μm, for samples MI 122 and MI 128 in the >60 μm, 20–60 μm, 2–20 μm, and <2 μm fractions; the quartz TIR volume scattering region and the 8.625 μm feature are annotated.)

Similarly, indices were devised for Clay (kaolinite) and estimates of the quartz volume scattering effect.

\[
\mathrm{Clay}(\varepsilon) = \frac{\varepsilon_{9178} + \varepsilon_{9852}}{2 \times \varepsilon_{9500}}, \tag{2}
\]

\[
\mathrm{QVS}(\varepsilon) = \frac{\varepsilon_{10318} + \varepsilon_{12279}}{\varepsilon_{11320} + \varepsilon_{11664}}. \tag{3}
\]

The results of these spectral indices for each Tick Hill soil fraction, processed using TSG, are shown in Figures 6, 7, and 8. Figure 6 indicates an approximate increasing trend in the quartz spectral parameter with increasing grain size. A higher quartz volume scattering behaviour for the mid-sized fractions (e.g., 2–60 μm) is shown by the auxiliary colour coding (green to red, Figure 6) for the sample points. Figure 7 shows a clear inverse trend between the clay and quartz spectral indices. However a high clay value can still appear within silty fractions (e.g., cyan coding, Log ∼1.0 or ∼10 μm) and some coarser fractions (e.g., red) exhibit a high clay spectral index. The Clay(ε) versus Quartz(ε) relationship shown in Figure 7 has a high coefficient of determination (R²) of 0.85. It should be noted, however, that this dataset included repeated microFTIR measurements for each sample fraction. Developing predictive clay% algorithms is complicated by such residual clay mineral content within the separated soil fractions. However Figure 8 suggests predicting clay mineral content within fractions finer than 10 μm with low quartz content (e.g., blue) is possible.

A spectral index was determined for the SWIR spectral measurements collected using the PIMA II. A TSG-based algorithm calculating the depth of a 4th-order polynomial at the 2.2 μm clay absorption feature was applied to continuum-removed spectra ("clay/kaolinite (SWIR)") for all Tick Hill soil particle fractions (Figures 9(a) and 9(b)). The value of clay/kaolinite (SWIR) was set to 0 when no 2.2 μm SWIR absorption feature was identified. Comparing the SWIR and TIR mineral clay indices for all the Tick Hill fractions showed a broad range of values (Figure 9(a)). However a better correlation between particle size and TIR-derived mineral Clay(ε) index was apparent when only the three <2 μm, 2–20 μm and 20–60 μm fractions were examined (Figure 9(b)). It appears likely that there is some contamination of clay mineral content within the coarser fractions, possibly as a coating to the grains. This was suggested by examples of clay spectral features (e.g., MI128 fractions, Figure 4).

Applying these spectral indices to the eight natural Tick Hill soil samples did not reveal clear relationships with texture although the number of samples was small. The mineral Clay(ε) index versus lab derived clay content for these natural samples (Figure 10) clearly indicated a much larger population and range of soil samples is required to test the application of these spectral techniques, as further investigated in the following section.

3.2. ASTER Spectral Library Soils. SWIR and TIR spectral signatures of 51 soils from the ASL [26] were included in this study to evaluate the devised spectral indices for composition and texture of natural soil samples. Examples of these ASL USDA and Fowlers Gap spectral signatures are shown in Figures 11(a) and 11(b), respectively. These spectra highlight the diverse range of soil environments represented, including organic-rich Spodosol (874264), aeolian based sandy Alfisol loam (87P2376), micaceous Inceptisol loam (88P2535), and quartz rich alluvial Entisol sand (FGG027) (Figures 11(a) and 11(b)).

The same spectral indices as described in (1)–(3) were applied to the ASL soil spectra using Excel rather than TSG in this application. Differences between the ASL Nicolet
5DXB FTIR and the CSIRO-ESRE MicroFTIR measurements required a slight adjustment of the spectral indices although the differences in the TIR wavelengths were minor.

Figure 5: Examples of USDA TIR soil signatures of variable quartz content (XRD analysis of silt-sand fraction) in relation to the diagnostic 8.63 μm reststrahlen feature. (Plot: emissivity versus wavelength, 8–9.5 μm, for 75% quartz (85P3707), 24% quartz (86P4561), and 10% quartz (86P1994); the 8.38, 8.63, and 8.89 μm positions are marked.)

Figure 6: Quartz content spectral index (Quartz(ε)) versus log of the particle size fraction and colour coded by the fine quartz volume scattering (QVS(ε)) (e.g., red = high scattering associated with fine quartz), showing an approximate decrease in quartz content with decreased grain size.

Figure 7: Relationship between quartz and clay content spectral parameters colour coded by the log of the particle size (e.g., red > ∼2.8 or 60 μm–2 mm; blue < ∼1 or <2 μm), showing the inverse relationship between TIR interpreted quartz and clay mineral contents.

A distinct SWIR absorption feature at 2.2 μm (e.g., 2200 nm) can be observed for several AlOH clay and phyllosilicate minerals that could comprise the clay fine particle fraction of soils (e.g., montmorillonite, kaolinite, and muscovite, Figure 12). A relative band depth index, AlOH RBD(ρ), was devised for this 2200 nm feature to discriminate such clay minerals (Figure 12) by (4)

\[
\mathrm{AlOH\ RBD}(\rho) = \frac{\rho_{2100} + \rho_{2300}}{\rho_{2195} + \rho_{2215}}, \tag{4}
\]

where ρ2100 is the reflectance value at 2100 nm, and so forth. In particular, (4) uses an average estimate of the absorption spectral feature at 2195 and 2215 nm as the denominator and the shoulders of the absorption at 2100 and 2300 nm for the numerator as a variation of the relative band depth technique. Both the Clay(ε) and AlOH RBD(ρ) indices for the combined Tick Hill and ASL data sets show no coherent correlation between clay mineralogy and clay particle size despite the use of an enlarged sample population of soils with a wider range in clay content (Figure 13).
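The band-ratio indices in (1)–(4) can be illustrated with a short sketch. This is not the TSG or Excel implementation used in the study; the function names are hypothetical, the nearest-band lookup follows the "closest wavelength" wording of the text, and the conversion from DHR to emissivity assumes the usual form of Kirchhoff's law for an opaque sample (ε = 1 − ρ). The ±30 nm window follows the text of (1), although the example range quoted there (8602 to 8664 nm) is slightly wider.

```python
# Illustrative sketch only: helper names are hypothetical, not part of TSG.

def value_at(wavelengths_nm, values, target_nm):
    """Value of the band whose centre is closest to target_nm."""
    i = min(range(len(wavelengths_nm)),
            key=lambda k: abs(wavelengths_nm[k] - target_nm))
    return values[i]


def mean_within(wavelengths_nm, values, centre_nm, half_width_nm):
    """Mean over bands within centre_nm +/- half_width_nm.

    Falls back to the nearest band when no band centre lies in the window.
    """
    picked = [v for w, v in zip(wavelengths_nm, values)
              if abs(w - centre_nm) <= half_width_nm]
    if not picked:
        return value_at(wavelengths_nm, values, centre_nm)
    return sum(picked) / len(picked)


def dhr_to_emissivity(reflectance):
    """Assumed Kirchhoff conversion for an opaque sample: emissivity = 1 - DHR."""
    return [1.0 - r for r in reflectance]


def quartz_index(wl, e):
    """Equation (1): reststrahlen-based quartz index."""
    return 2.0 * mean_within(wl, e, 8632, 30) / (
        value_at(wl, e, 8383) + value_at(wl, e, 8897))


def clay_index(wl, e):
    """Equation (2): clay (kaolinite) index around 9500 nm."""
    return (value_at(wl, e, 9178) + value_at(wl, e, 9852)) / (
        2.0 * value_at(wl, e, 9500))


def qvs_index(wl, e):
    """Equation (3): quartz volume scattering index."""
    return (value_at(wl, e, 10318) + value_at(wl, e, 12279)) / (
        value_at(wl, e, 11320) + value_at(wl, e, 11664))


def aloh_rbd(wl, rho):
    """Equation (4): relative band depth of the 2200 nm AlOH feature."""
    return (value_at(wl, rho, 2100) + value_at(wl, rho, 2300)) / (
        value_at(wl, rho, 2195) + value_at(wl, rho, 2215))


# Made-up DHR values at the index wavelengths, for demonstration only.
tir_wl = [8383, 8632, 8897, 9178, 9500, 9852, 10318, 11320, 11664, 12279]
tir_dhr = [0.10, 0.06, 0.09, 0.04, 0.07, 0.04, 0.03, 0.05, 0.05, 0.03]
emissivity = dhr_to_emissivity(tir_dhr)
print(quartz_index(tir_wl, emissivity),
      clay_index(tir_wl, emissivity),
      qvs_index(tir_wl, emissivity))
```

Each index equals 1 for a featureless spectrum and moves away from 1 as the targeted feature deepens, which is why the plotted index values cluster close to 1.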

Figure 8: Relationship between the clay content spectral parameter and the log of the particle size colour coded by the quartz content spectral parameter (e.g., low quartz content: blue, high quartz: red).

The poor result for the spectrally derived clay content% (Figure 13) could be due to a number of factors including (1) interference from the 8 to 9.2 μm quartz reststrahlen feature on the 9.5 μm clay absorption-based spectral index, (2) limitations of the relative band depth spectral algorithm applied for clay/AlOH mineral content, (3) nonlinear effects from multiple scattering between clay mineral particles coating coarser grain particles, or (4) the presence of fine nonclay mineral particles less than 2 μm distorting the measured clay fraction (e.g., iron oxides, organics). A variation of the Clay(ε) spectral index, calculated using (2), was
subsequently generated using the 9.0 μm kaolinite spectral feature, that remains even with the dominant presence of quartz (Figure 3) and described herein (5):

\[
\mathrm{Clay2}(\varepsilon) = \frac{2 \times \varepsilon_{9000}}{\varepsilon_{8862} + \varepsilon_{9257}}. \tag{5}
\]

However the resulting Clay2(ε) index also produced a poor result showing no effective correlation with the USDA clay content%.

Figure 9: (a) Comparison of clay mineral content determined by SWIR defined clay/kaolinite spectral index and TIR defined Clay(ε) for the four Tick Hill particle size fractions. (b) Close up of (a) and line of best fit for Clay(ε) results for finest three particle size fractions (y = −0.0339x + 1.0244, R² = 0.6918). (Plots: clay/kaolinite (SWIR) and Clay(ε) versus log (mid particle size fraction).)

Figure 10: TIR-derived Clay(ε) mineral index versus the particle size determined clay% of the eight natural Tick Hill samples.

Figure 11: (a) Examples from the USDA soil spectra included within the ASL (87P2376, 87P4264, 88P2535). (b) Example from the semiarid Fowlers Gap (Australia) included in the ASL (FGG027) [26]. (Plots: reflectance versus wavelength, 2–14 μm.)

The results of the Quartz(ε) compared to the petrographic determination of quartz content associated with the USDA soil spectra indicated a relationship involving a power function (n = 2) with a moderate coefficient of determination, R² = 0.69 (Figure 14). Including the Fowlers Gap soils into this sample population did not alter this R² result significantly. However the Fowlers Gap quartz content was derived from a different method involving a normative
calculation based on XRF elemental analysis, and therefore not necessarily consistent with the USDA results.

Figure 12: USGS VNIR-SWIR mineral library reflectance signatures for AlOH or clay phyllosilicate minerals highlighting the main 2.2 μm absorption spectral feature [12]. (Plot: reflectance, offset for clarity, versus wavelength, 2.1–2.3 μm, for Montmorillonite SCa-2, Montmorillonite SWy-1, Kaolinite KGa-1, Kaolinite KL502, Muscovite GDS116, and Muscovite GDS113.)

Figure 13: Comparison between the SWIR derived, AlOH RBD(ρ), and TIR derived, Clay(ε), mineral spectral indices, and the clay particle size content supplied by USDA. (Plot: spectral (clay-AlOH) index versus clay <2 μm %, for Tick Hill, Fowlers Gap and USDA soils.)

The quartz volume scattering index (QVS) showed some minor increasing trend with silt content% for the USDA and Fowlers Gap samples; however, the R² was a low 0.27. This index appears to be of more potential interest for the study of separated soil fractions than for natural soil samples.

Attempts were also undertaken to derive nonmineral organic carbon soil composition from the ASL soil spectra, using spectral indices calculated from MIR absorption features and USDA determined organic carbon results. An examination of several USDA soil DHR spectra revealed organic carbon features at 3.41 μm ("A") and 3.5 μm ("B") (Figure 15). Note that the displayed USDA soil, 874264, is an organic-rich Spodosol loam containing the highest organic carbon of 28% within the ASL sample collection. These MIR DHR spectra (also shown in Figure 11(a)) highlight the processing between 3.3 μm and 3.6 μm where the hull continuum removal acts as a normalisation process [38] and simplifies the calculation of spectral indices as absorption depths (Figure 15).

Figure 14: Comparison between the Quartz(ε) and USDA determined quartz% content with calculated linear (red; y = 0.0004x + 0.994, R² = 0.5158) and power relation (green; y = 7E-06x² − 0.0001x + 1, R² = 0.6876) models.

Figure 15: Examples of hull removed 3.3 μm to 3.6 μm ASL reflectance spectra (see Figure 11(a)) highlighting organic carbon spectral absorption features at 3.41 μm ("A") and 3.5 μm ("B") applied to several USDA soil samples (87P2376, 87P4264, 88P2535).

Figure 16: MIR organic carbon depth indices A and B versus the USDA organic carbon content; (a) for all data (depth A: y = 0.0119x + 0.1255, R² = 0.53; depth B: y = 0.0071x + 0.1021, R² = 0.1937); (b) omitting outlying organic carbon rich sample at 28% (depth A: y = 0.0198x + 0.1122, R² = 0.3929; depth B: y = 0.0094x + 0.0983, R² = 0.0717).

Several depth indices were devised, centred at 3.41 and 3.5 μm features as included in (6)–(8):

\[
\text{MIR organic hull depth A} = 1 - \rho_{(3415)}, \tag{6}
\]

where ρ(3415) is the mean ρ value within ±10 nm (e.g., between 3405 and 3425 nm),

\[
\text{MIR organic hull depth B} = 1 - \rho_{(3490)}, \tag{7}
\]

where ρ(3490) is the mean ρ value within ±15 nm (e.g., between 3475 and 3505 nm),

\[
\text{MIR organic hull depth average} = \text{MIR organic hull depth A} + \text{MIR organic hull depth B}. \tag{8}
\]

In addition, a spectral index was derived from the area bounded by the hull (=1 in Figure 15) and the spectral signature between 3.3 and 3.6 μm. This required a reversal of the hull signatures and calculation of area under the resulting curve. The highest "correlation" of the four indices to the USDA organic carbon estimates was achieved using the MIR organic carbon depth index A, R² of 0.53 (Figure 16(a)). However this included an extreme outlying sample containing 28% organic carbon and the correlation reduced to an R² of 0.39 when this was omitted (Figure 16(b)). Likewise, the MIR organic hull average depth and areal based indices produced R² values of 0.37 and 0.34, respectively, that reduced to R² values of 0.19 and 0.18, respectively, when this single outlying high organic carbon sample was excluded. Although these preliminary results showed no coherent correlation, this outcome would be confirmed if more soils were included containing a greater range of organic carbon between 10 and 30%. However, as the majority of terrestrial soils contain less than 10% organic carbon, its determination using such spectral indices appears unlikely. Figure 17 also shows this limitation with the MIR organic hull depth average index result for values between 0 and 10% organic carbon.
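The hull-based organic carbon indices in (6)–(8) and the area index can be sketched as follows. This is an illustration under stated assumptions, not the processing chain used in the study: the continuum is taken as the upper convex hull of the 3.3–3.6 μm reflectance segment, the combined index is computed as the sum of depths A and B, the area is integrated with the trapezoidal rule, and all function names and the synthetic spectrum are hypothetical.

```python
import math


def upper_hull(points):
    """Upper convex hull of (x, y) points sorted by increasing x."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # cross >= 0: the middle point lies on or below the chord, so drop it.
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def continuum_removed(wl, rho):
    """Divide reflectance by its hull continuum, so the hull itself equals 1."""
    hull = upper_hull(list(zip(wl, rho)))
    out, k = [], 0
    for w, r in zip(wl, rho):
        while hull[k + 1][0] < w:
            k += 1
        (x1, y1), (x2, y2) = hull[k], hull[k + 1]
        continuum = y1 + (y2 - y1) * (w - x1) / (x2 - x1)
        out.append(r / continuum)
    return out


def mean_within(wl, values, centre, half_width):
    """Mean of the values whose wavelength lies within centre +/- half_width."""
    picked = [v for w, v in zip(wl, values) if abs(w - centre) <= half_width]
    return sum(picked) / len(picked)


def organic_hull_indices(wl_nm, rho):
    """Depths A and B, their sum (assumed form of (8)), and the area index."""
    cr = continuum_removed(wl_nm, rho)
    depth_a = 1.0 - mean_within(wl_nm, cr, 3415, 10)   # equation (6)
    depth_b = 1.0 - mean_within(wl_nm, cr, 3490, 15)   # equation (7)
    combined = depth_a + depth_b                        # equation (8)
    # Area between the hull (= 1) and the spectrum, trapezoidal rule (nm units).
    inverted = [1.0 - v for v in cr]
    area = sum(0.5 * (inverted[i] + inverted[i + 1]) * (wl_nm[i + 1] - wl_nm[i])
               for i in range(len(wl_nm) - 1))
    return depth_a, depth_b, combined, area


# Synthetic 3300-3600 nm spectrum with two Gaussian absorptions (made-up values).
wl = list(range(3300, 3601, 5))
rho = [0.5
       - 0.10 * math.exp(-((w - 3415) ** 2) / (2 * 10.0 ** 2))
       - 0.05 * math.exp(-((w - 3490) ** 2) / (2 * 12.0 ** 2))
       for w in wl]
print(organic_hull_indices(wl, rho))
```

Because the hull is normalised to 1, a spectrum with no absorption between 3.3 and 3.6 μm gives zero for all four indices.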

4. Conclusions

The study of the particle size fractions derived from Tick Hill soil samples shows the intimate connection between mineralogy and texture when examining TIR spectra. In particular, coarse and fine-grained quartz components have distinct TIR spectral features. The spectral results also indicate that residual clay minerals may still be present in the sand fraction, even with thorough particle size separation processes, and potentially bias the determination of "clay" content if based on sedimentation analysis alone (e.g., <2 μm). The proximal spectral measurements of Tick Hill samples suggested its potential to analyse for clay mineral content. In particular, the Tick Hill results suggested that a TIR spectral index based on the 9.5 μm absorption feature is useful for the determination of clay content within soil particle size fractions.

Further investigations using a larger population of natural soil samples, available from the ASL, showed no strong correlations between laboratory-determined clay particle size and the clay derived from either SWIR or TIR based spectral indices. Possible causes for this may be (1) interference from the 8 to 9.2 μm quartz reststrahlen feature on the 9.5 μm and 9.0 μm clay absorption based spectral indices, (2) limitations of the spectral relative band depth algorithm, (3) nonlinear effects from clay mineral particle coatings over coarser grain particles, or (4) the presence of nonclay mineral particles
finer than 2 μm but included within the "clay" particle fraction. A moderate nonlinear correlation was indicated between the TIR quartz derived index, based on the 8.62 μm reststrahlen feature, and the petrographic-derived quartz content. Preliminary efforts were also undertaken to derive an organic carbon spectral index using absorption features between 3.4 and 3.5 μm associated with the fundamental H–C stretching vibration bands. No strong correlations with the USDA organic carbon estimates were apparent using hull continuum removed spectra to provide indices based on depth and area of the absorption features.

Figure 17: MIR organic hull average depth index versus the USDA supplied organic carbon content for organic carbon less than 10% content (USDA soils; y = 0.0283x + 0.2274, R² = 0.1934).

Acknowledgments

Adrian Beech, former manager of Analytical Services, CSIRO Land and Water, is sincerely thanked for undertaking the extensive particle size analysis and sample separation of the Tick Hill soils. Assistance from Peter Mason of CSIRO has also been helpful in the application of TSG software. Encouragement and support from Vittala Shettigara of DSTO, Simon Hook of JPL, Simon Jones of RMIT, Alan Marks, and Ian Goicoechea is also gratefully appreciated. Matilda Thomas publishes with the permission of the Chief Executive Officer of Geoscience Australia.

References

[1] R. P. C. Morgan, Soil Erosion and Conservation, Blackwell, Malden, Mass, USA, 3rd edition, 2005.
[2] R. E. White, Principles and Practice of Soil Science: The Soil as a Natural Resource, Blackwell, Malden, Mass, USA, 4th edition, 2006.
[3] C. Kosmas, M. Kirkby, and N. Geeson, “Manual on: Key indicators of desertification and mapping environmentally sensitive areas to desertification,” EUR 18882, European Commission, Energy, Environment and Sustainable Development, 1999.
[4] Food and Agriculture Organisation of the United Nations, The State of Food Insecurity in the World, Rome, Italy, 2011.
[5] E. Ben-Dor, S. Chabrillat, J. A. M. Demattê et al., “Using imaging spectroscopy to study soil properties,” Remote Sensing of Environment, vol. 113, no. 1, pp. S38–S55, 2009.
[6] J. A. Hackwell, D. W. Warren, R. P. Bongiovi, S. J. Hansel, T. L. Hayhurst, and D. J. Mabry, “LWIR/MWIR imaging hyperspectral sensor for airborne and ground-based remote sensing,” in Imaging Spectrometry II, vol. 2819 of Proceedings of the SPIE, pp. 102–107, 1996.
[7] S. Achal, J. E. McFee, T. Ivanco, and C. Anger, “A thermal infrared hyperspectral imager (TASI) for buried landmine detection,” in Detection and Remediation Technologies for Mines and Minelike Targets 7, vol. 6553 of Proceedings of SPIE, The International Society for Optical Engineering, April 2007.
[8] J. W. Salisbury and D. M. D’Aria, “Infrared (8–14 μm) remote sensing of soil particle size,” Remote Sensing of Environment, vol. 42, no. 2, pp. 157–165, 1992.
[9] J. W. Salisbury and D. M. D’Aria, “Emissivity of terrestrial materials in the 8–14 μm atmospheric window,” Remote Sensing of Environment, vol. 42, no. 2, pp. 83–106, 1992.
[10] C. D. Elvidge, “Thermal infrared reflectance of dry plant materials: 2.5–20.0 μm,” Remote Sensing of Environment, vol. 26, no. 3, pp. 265–285, 1988.
[11] J. W. Salisbury and D. M. D’Aria, “Emissivity of terrestrial materials in the 3–5 μm atmospheric window,” Remote Sensing of Environment, vol. 47, no. 3, pp. 345–361, 1994.
[12] R. N. Clark, T. V. V. King, M. Klejwa, G. A. Swayze, and N. Vergo, “High spectral resolution reflectance spectroscopy of minerals,” Journal of Geophysical Research, vol. 95, no. 8, pp. 12653–12680, 1990.
[13] R. H. Merry and L. J. Janik, “Mid infrared spectroscopy for rapid and cheap analysis of soils,” in Proceedings of the 10th Australia Agronomy Conference, January 2001.
[14] R. A. Viscarra Rossel, D. J. J. Walvoort, A. B. McBratney, L. J. Janik, and J. O. Skjemstad, “Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties,” Geoderma, vol. 131, no. 1-2, pp. 59–75, 2006.
[15] J. W. Salisbury, A. Wald, and D. M. D’Aria, “Thermal-infrared remote sensing and Kirchhoff’s law: 1. Laboratory measurements,” Journal of Geophysical Research, vol. 99, no. 6, pp. 11897–11911, 1994.
[16] W. C. Snyder, Z. Wan, Y. Zhang, and Y. Z. Feng, “Thermal infrared (3–14 μm) bidirectional reflectance measurements of sands and soils,” Remote Sensing of Environment, vol. 60, no. 1, pp. 101–109, 1997.
[17] R. D. Hewson, G. R. Taylor, and L. B. Whitbourn, “Application of TIR imagery and spectroscopy for the extraction of soil textural information at Fowlers Gap, Western New South Wales, Australia,” in Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, pp. 723–726, July 2008.
[18] J. A. Sobrino, J. C. Jiménez-Muñoz, L. Balick, A. R. Gillespie, D. A. Sabol, and W. T. Gustafson, “Accuracy of ASTER Level-2 thermal-infrared standard products of an agricultural area in Spain,” Remote Sensing of Environment, vol. 106, no. 2, pp. 146–153, 2007.
[19] G. Hulley and S. Hook, HyspIRI Level-2 Thermal Infrared (TIR) Land Surface Temperature and Emissivity Algorithm Theoretical Basis Document, JPL Publication 11-5, Jet Propulsion Laboratory, NASA, Pasadena, Calif, USA, 2011.
[20] Z. L. Li, F. Becker, M. P. Stoll, and Z. Wan, “Evaluation of six methods for extracting relative emissivity spectra from thermal infrared images,” Remote Sensing of Environment, vol. 69, no. 3, pp. 197–214, 1999.
[21] A. Mushkin, L. K. Balick, and A. R. Gillespie, “Extending surface temperature and emissivity retrieval to the mid-infrared (3–5 μm) using the Multispectral Thermal Imager (MTI),” Remote Sensing of Environment, vol. 98, no. 2-3, pp. 141–151, 2005.
[22] L. Kirkland, K. Herr, E. Keim et al., “First use of an airborne thermal infrared hyperspectral scanner for compositional mapping,” Remote Sensing of Environment, vol. 80, no. 3, pp. 447–459, 2002.
[23] T. J. Cudahy, M. Caccetta, A. Cornelius et al., “Regolith geology and alteration mineral maps from new generation airborne and satellite remote sensing technologies; and Explanatory Notes for the Kalgoorlie-Kanowna, 1:100,000 scale map sheet, remote sensing mineral maps,” MERIWA Report 252, Perth, Australia, 2005.
[24] T. Cudahy, M. Jones, M. Thomas et al., Mapping Soil Surface Mineralogy at Tick Hill, North-Western Queensland, Australia, Using Airborne Hyperspectral Imagery, Springer, 1st edition, 2010.
[25] R. Hewson, T. Cudahy, A. Beech, M. Jones, and M. Thomas, “Mineral and textural investigations of soils using thermal infrared spectroscopy,” in Proceedings of the 19th World Congress of Soil Science, Brisbane, Australia, August 2010.
[26] A. M. Baldridge, S. J. Hook, C. I. Grove, and G. Rivera, “The ASTER spectral library version 2.0,” Remote Sensing of Environment, vol. 113, no. 4, pp. 711–715, 2009.
[27] J. A. Sobrino, C. Mattar, P. Pardo et al., “Soil emissivity and reflectance spectra measurements,” Applied Optics, vol. 48, no. 19, pp. 3664–3670, 2009.
[28] R. D. Hewson and G. R. Taylor, “An investigation of the geological and geomorphological features of Fowlers Gap using thermal infrared, radar and airborne geophysical remote sensing techniques,” Rangeland Journal, vol. 22, no. 1, pp. 105–123, 2000.
[29] N. J. McKenzie, K. J. Coughlan, and H. P. Cresswell, Soil Physical Measurement and Interpretation for Land Evaluation, CSIRO Publishing, Melbourne, Australia, 2002.
[30] R. F. Isbell, The Australian Soil Classification, CSIRO, Melbourne, Australia, 1996.
[31] P. Carlile, E. Bui, C. Moran, D. Simon, and B. Henderson, “Method used to generate soil attribute surfaces for the Australian Soil Resource Information System using soil maps and look-up table,” CSIRO Land and Water Technical Report 24/01, CSIRO, 2011.
[32] S. J. Hook and A. B. Kahle, “The micro Fourier Transform Interferometer (μFTIR): a new field spectrometer for acquisition of infrared data of natural surfaces,” Remote Sensing of Environment, vol. 56, no. 3, pp. 172–181, 1996.
[33] A. Walkley and I. A. Black, “An examination of the Degtjareff method for determining organic carbon in soils: effect of variations in digestion conditions and of inorganic soil constituents,” Soil Science, vol. 63, pp. 251–263, 1934.
[34] R. D. Hewson, L. B. Whitbourn, and G. R. Taylor, “Application of TIMS imagery and airborne CO2 laser spectroscopy for geological and geomorphological investigations in an arid environment, Western NSW, Australia,” in Proceedings of the 12th International Conference of Applied Geologic Remote Sensing, vol. 2, pp. 373–384, 1997.
[35] J. W. Salisbury, D. M. D’Aria, and L. E. Brown, Infrared (2.08–14 μm) Spectra of Soils: A Preliminary Report, Department of Earth and Planetary Sciences, Johns Hopkins University, 1990.
[36] J. W. Salisbury and J. W. Eastes, “The effect of particle size and porosity on spectral contrast in the mid-infrared,” Icarus, vol. 64, no. 3, pp. 586–588, 1985.
[37] J. K. Crowley, D. W. Brickey, and L. C. Rowan, “Airborne imaging spectrometer data of the Ruby Mountains, Montana: mineral discrimination using relative absorption band-depth images,” Remote Sensing of Environment, vol. 29, no. 2, pp. 121–134, 1989.
[38] F. Van Der Meer, “Spectral curve shape matching with a continuum removed CCSM algorithm,” International Journal of Remote Sensing, vol. 21, no. 16, pp. 3179–3185, 2000.
Hindawi Publishing Corporation Applied and Environmental Soil Science Volume 2012, Article ID 274903, 12 pages doi:10.1155/2012/274903

Research Article The Effects of Spectral Pretreatments on Chemometric Analyses of Soil Profiles Using Laboratory Imaging Spectroscopy

Henning Buddenbaum1 and Markus Steffens2

1 Environmental Remote Sensing and Geoinformatics, Trier University, 54286 Trier, Germany
2 Lehrstuhl für Bodenkunde, Technische Universität München, 85350 Freising-Weihenstephan, Germany

Correspondence should be addressed to Henning Buddenbaum, [email protected]

Received 17 February 2012; Revised 11 May 2012; Accepted 18 September 2012

Academic Editor: Raphael Viscarra Rossel

Copyright © 2012 H. Buddenbaum and M. Steffens. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Laboratory imaging spectroscopy can be used to explore physical and chemical variations in soil profiles on a submillimetre scale. We used a hyperspectral scanner in the 400 to 1000 nm spectral range mounted in a laboratory frame to record images of two soil cores. Samples from these cores were chemically analyzed, and spectra of the sampled regions were used to train chemometric PLS regression models. With these models detailed maps of the elemental concentrations in the soil cores could be produced. Eight different spectral pretreatments were applied to the sample spectra and to the resulting images in order to explore the influence of these pre-treatments on the estimation of elemental concentrations. We found that spectral preprocessing has a minor influence on chemometry results when powerful regression algorithms like PLSR are used.

1. Introduction

Soils show a high degree of horizontal and vertical variation in physical and chemical properties. Visible and near-infrared spectroscopy is an established tool to qualitatively and quantitatively characterize these properties in soil samples [1–3]. Imaging spectroscopy is an approach that simultaneously creates VIS-NIR spectra for a complete image thus enabling analyses of the spatial distribution of these properties. In most cases imaging spectroscopy is applied from above, that is, an air- or space-borne sensor looking at the soil surface. The third spatial dimension, depth, is heterogeneous on much smaller scales but is invisible to remote sensors. Spectroscopic analyses of soil profiles can be done, for example, by measuring disturbed samples taken from different depths in the laboratory or by measuring the reflectance at different depths in boreholes [4]. However with these methods only few measurements can be made so that they cannot be used for a high-resolution characterization of complete soil profiles and their spatial variability. Our alternative is to take complete soil cores and measure their reflective properties with a laboratory imaging spectrometer [5, 6]. This way the vertical distribution of soil properties can be studied from sub-millimetre to decimetre scale. Comparable examinations on geologic cores have been introduced by Kruse [7].

The soil core spectroscopic images can be used for various purposes, for example, for classifying soil types and their horizons [8] or for a characterization of the spatial heterogeneity of the soil profiles. This paper deals with the derivation of high-resolution maps of elemental concentrations in the soil profiles that can serve as a basis for soil classification, or for studying soil forming processes. Several regression methods (e.g., stepwise multiple linear regression [9, 10], support vector regression [10], penalized-spline signal regression [10], artificial neural networks [11, 12], multivariate adaptive regression splines [13], random forests, boosted regression trees [14], principal component regression [14, 15], narrow-band vegetation indices [16], and partial least squares regression (PLSR) [3]) have been used to quantitatively derive information from reflectance or absorption spectra. Viscarra Rossel and Behrens [17] give a comprehensive comparison of many of these techniques. Among these regression methods, PLSR [18, 19] has become one of the most popular for chemometry in recent years and will be used in this study.

In addition, spectral pretreatment can have a large impact on the result of chemometric mapping. Several spectral preprocessing methods have been introduced. For example, Ben-Dor et al. [20] used first and second derivative absorption spectra to enhance the spectral information in order to illustrate the spectral changes during a decomposition process. Udelhoven et al. [21] applied min-max normalization, convex-hull computation, first derivatives, and vector normalization, after centering each spectrum around its average. Vasques et al. [9] applied thirty pre-processing transformations, including derivatives, normalization and nonlinear transformations on spectra from 554 soil samples from Florida to derive their organic carbon content. Stevens et al. [10] used first and second derivatives, first and second gap derivatives, Savitzky-Golay smoothing and derivatives, Whittaker smoothing, standard normal variate transformation and detrending (SNV-DT), and a combination of these methods. Stenberg and Viscarra Rossel [22] show the effects of log 1/R transformation, first derivative, and SNV-DT on soil diffuse reflectance spectra. Hively et al. [23] evaluated untransformed spectra and first and second derivatives with gaps of 1 to 64 bands for the estimation of several variables from airborne hyperspectral data. Rinnan et al. [24] review the most common pre-processing techniques for near-infrared spectra in chemometry. The methods are divided into two categories: scatter-correction methods and spectral derivatives. Only few studies systematically compare the effect of these methods on chemometric spectroscopy (e.g., [9, 10, 23, 24]), and none covers pre-treatment for laboratory imaging spectroscopy.

In this study, we compare the effect of eight different spectral pre-processing methods on PLSR chemometry of five chemical elements (iron = Fe, aluminium = Al, manganese = Mn, carbon = C, and nitrogen = N) in two soil profiles (Luvisol and Podzol). The pre-processing methods are as follows:

(i) reflectance spectra (R, no pre-processing),
(ii) standard normal variate and detrending (SNV-DT),
(iii) first derivative of reflectance (1st D),
(iv) second derivative of reflectance (2nd D),
(v) continuum removed reflectance (CR),
(vi) normalized continuum removed reflectance (NCR),
(vii) multiplicative scatter correction (MSC),
(viii) extended multiplicative scatter correction (EMSC).

These methods were selected because they are commonly used in spectroscopy and because each transforms the spectra in different ways with a different reasoning. For each set of transformed spectra and each chemical element PLS regression models were established; the best model was chosen and used on the image data in order to create maps of the distribution of the elements in the soil profiles. Model accuracies and maps were compared to study the effect of the spectral pre-treatments.

2. Material and Methods

2.1. Study Sites and Soil Sampling. We sampled two soil types to compare the effects of different spectral pre-treatments on the predictive power of PLSR for elemental mapping. The first profile was sampled in a Norway spruce (Picea abies) monoculture near Freising (SE-Germany), approximately 35 km northeast of Munich. This soil was classified as a stagnic cutanic Luvisol (ruptic, epidystric, and siltic; WRB 2006). The soil is formed on tertiary clastic sediments which are sporadically covered by quaternary aeolian sediments (). The second soil was sampled near Wellheim, approximately 30 km west of Ingolstadt (SE-Germany). It was classified as a folic albic Podzol (WRB 2006) that formed from cretaceous sands under a Norway spruce monoculture.

We used custom-made stainless steel boxes (100 mm × 100 mm × 300 mm) to sample 30-cm deep soil profiles. After removing the litter layer, the steel boxes were gently hammered vertically into the soil and dug out so that an undisturbed profile was sampled. The soil cores were oven-dried at 30 °C to a constant weight before imaging.

2.2. Imaging Setup. The images were acquired using a hyperspectral scanner with 160 bands in the visible and near infrared 400–1000 nm spectral range (NEO HySpex VNIR-1600) mounted in a laboratory frame with a translation stage under the scanner. The translation stage moves the object in along-track direction, while the push-broom scanner records image lines across track. The speed of the translation stage is adapted so that square pixels result. The field of view in the setup is about 10 cm wide and consists of 1600 pixels resulting in about 62 μm spatial sampling distance. The full 30-cm soil cores are imaged in 4800 lines; about 200 additional image lines contain the Spectralon white reference panel and the metal frame. Light sources illuminate the currently scanned line from two directions in order to minimize shadows on the soil surface. A Spectralon white reference panel of known reflectance was scanned with the sample so radiance values could be transformed to reflectance. No geometric correction was applied to the images [5].

After recording the first image, homogeneous regions of interest (ROIs) of about 1 to 2 cm² area and about 1 cm depth were visually identified in the soil cores and sampled for chemical laboratory analysis. Samples were selected so that most of the variation within the soil cores was covered in the different samples while having a small within-sample variation. The ROIs were equally distributed over the whole profile face area and covered all horizons. After taking the samples, a layer of about 15 mm was removed from the profile face, the surface was carefully flattened, a new image of a slice parallel to the first was taken, and new ROI samples were collected. In case of the Luvisol, 7 images were recorded, and 66 samples were analyzed. Four images of the Podzol were recorded, and 35 samples were taken.

In order to explore the information content of the images, we calculated principal component analyses. Figure 4 shows an example false colour composite of principal components of the Podzol profile, revealing information hidden in the real-colour image.

from it to correct for wavelength-dependent scattering effects.

2.3. Chemical Analyses. Prior to chemical analyses, the 2.4.3. First Derivative of Reflectance (1st D). Like SNV-DT, ◦ < ROI samples were dried at 50 Candsievedto 2 mm. 1st D is a method that removes the baseline from spectra Total C and total N concentrations were determined in while stressing absorption features. The first derivative was duplicate by dry combustion on a EuroEA elemental analyzer calculated via a Savitzky-Golay smoothing filter [11] using (Hekatech GmbH, Wegberg, Germany). All samples were the hyperspectral image processing software EnMAP-Box free of carbonates so that the total C concentration equals (Version 1.1, Humboldt-Universitat¨ zu Berlin, Germany, the organic carbon (OC) concentration. Quantity and http://www.hu-geomatics.de/). The original 160 band data quality of iron and manganese species were analysed on set was used, because smoothing is part of the processing. < bulk soils 2 mm from all samples excluding the organic In the Savitzky-Golay derivative procedure, a first-order surface layers and organic matter rich topsoils. Total Fe polynomial was fitted to spectral windows of 7 bands and Mn oxides were extracted using the dithionite-citrate- width. The derivative of this polynomial was assigned as bicarbonate-method (DCB; [25]) and measured as Fe, Mn, the new value of the central band. The first and last three and Al concentrations in the extracts by inductively coupled bands were discarded, so that 154 bands resulted like in plasma optical emission spectroscopy (ICP-OES; Vista Pro the other methods. Vasques et al. [9] consistently found CCD Simultaneous, Varian, Darmstadt). Savitzky-Golay derivatives among the best pre-processing transformations. Ertlen et al. [30] state that more useful 2.4. Spectral Pretreatments. We used 8 spectral pre-treat- information can be extracted from near-infrared spectra if ments prior to PLSR analyses. The mean reflectance and the derivatives of the spectra are taken. 
transformed spectra of 35 ROI regions corresponding to the sampling spots in the Podzol profile are shown in Figure 1. 2.4.4. Second Derivative of Reflectance (2nd D). The 2nd Some of the techniques presented here are discussed in more D has been applied many times in remote sensing and details in [24]. spectroscopy, for example, for the elimination of background signals and for differentiating overlapping signatures [31, ρ 2.4.1. Reflectance Spectra (R). Absolute reflectance was 32]. The second derivative was calculated from the 1st D derived from radiance measurements of the sample and the spectra. Kessler [33] states that consecutive first derivatives white reference separately for each image line by calculating result in less noisy spectra than higher-order derivatives the ratio of soil and reference radiances and multiplying this directly applied to the original data, so the second derivatives with the reference’s known reflectance [5, 26]. were calculated as derivative of the first derivative with In order to reduce image noise and calculation time, identical settings. the image resolution was reduced by a factor of 4 (half the number of lines and rows, resp.). Then the spectra were smoothened using a Savitzky-Golay filter [27] with a 2nd- 2.4.5. Continuum Removed Reflectance (CR). CR [34]is order polynomial across a moving window of 7 spectral calculated by fitting a convex hull (the continuum) to the bands. The first and last three bands were discarded, so that spectrum and then dividing the spectrum by the hull at 154 of the original 160 bands remained. This image was used each wavelength. This preprocessing gives a CR value of 1 as input for the different pre-processing methods except for to all parts of the spectrum that lie on the convex hull (i.e., the derivatives. 
PLSR results of the spectra without further wavelength regions that are not in an absorption band) and processing are the reference for the other pre-processing values between 0 and 1 to regions inside absorption bands. methods. So CR accentuates the absorption bands in the spectra while minimizing brightness differences. Continuum removal was done in Envi (Version 4.7, ITTV is, now Exelis Visual 2.4.2. Standard Normal Variate and Detrending (SNV-DT). Information Solutions). All CR calculations were applied on SNV-DT was developed by Barnes et al. [28]toremove the complete wavelength range, not just single absorption multiplicative interferences of scatter and particle size and to bands. account for the variation in baseline shift and curvilinearity in diffuse reflectance spectra. Standard normal variate, also known as z-transformation 2.4.6. Normalized Continuum Removed Reflectance (NCR). or as centering and scaling [29], normalizes each spectrum ρ NCR spectra (also known as Band Depth Normalization, to zero mean and unit variance by subtracting the mean of [34, 35]) were created by scaling each CR spectrum to the this spectrum ρ and dividing the difference by its standard full0to1rangebycalculating deviation σρ: − ρ − ρ = CR CRmin = . NCR , (2) SNV (1) CRmax − CRmin σρ

This is followed by a detrending step: a 2nd-order polyno- where CRmin and CRmax are the minimum and maximum mial is fit to the SNV transformed spectrum and subtracted values of a CR spectrum, respectively. The effect of band 4 Applied and Environmental Soil Science

[Figure 1 panels, each plotted against wavelength (nm) from 400 to 1000 nm: (a) reflectance, (b) SNV-DT reflectance, (c) derivative of reflectance, (d) 2nd derivative of reflectance, (e) CR reflectance, (f) NCR reflectance, (g) MSC reflectance, (h) EMSC reflectance.]

Figure 1: Mean reflectance (a) and preprocessed spectra ((b)–(h)) of the 35 sampled regions of interest of the Podzol soil core.
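As a rough illustration of how a transformed spectrum such as panel (b) of Figure 1 can be obtained, the following sketch applies SNV followed by 2nd-order polynomial detrending to spectra stored row-wise. It is not the authors' implementation: the function name `snv_detrend`, the use of the sample standard deviation (`ddof=1`), and the rescaling of wavelengths before the polynomial fit are assumptions made for this example.

```python
import numpy as np

def snv_detrend(spectra, wavelengths, degree=2):
    """SNV followed by polynomial detrending (hypothetical helper).

    spectra: array of shape (n_spectra, n_bands); wavelengths: (n_bands,).
    """
    spectra = np.asarray(spectra, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)

    # SNV: centre each spectrum on its own mean and scale by its own
    # standard deviation (sample standard deviation is an assumption).
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    snv = (spectra - mean) / std

    # Detrending: fit a polynomial in wavelength to each SNV spectrum and
    # subtract it. Wavelengths are rescaled to [-1, 1] only to keep the
    # least-squares problem well conditioned; the fitted trend is unchanged.
    x = 2.0 * (wavelengths - wavelengths.min()) / np.ptp(wavelengths) - 1.0
    design = np.vander(x, degree + 1)                      # (n_bands, degree + 1)
    coef, *_ = np.linalg.lstsq(design, snv.T, rcond=None)  # (degree + 1, n_spectra)
    trend = (design @ coef).T                              # (n_spectra, n_bands)
    return snv - trend

# Usage with synthetic data: 154 bands between 400 and 1000 nm.
wl = np.linspace(400.0, 1000.0, 154)
demo = 0.2 + 0.1 * np.sin(wl / 60.0)
corrected = snv_detrend(demo[None, :], wl)
```

Because the detrending polynomial contains a constant term, each output spectrum has zero mean, and a spectrum that is itself a pure 2nd-order polynomial in wavelength is reduced to zero.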

depth normalization is that the shape of absorption bands instead of their depth becomes the main feature of the spectra.

2.4.7. Multiplicative Scatter Correction (MSC). Multiplicative scatter correction [36], also known as multiplicative signal correction, is another pre-processing technique for baseline correction in spectra. It assumes that the wavelength-dependent scatter effects on the spectrum can be separated from the chemical information. This is done by correcting the different spectra to an "ideal" spectrum so that baseline and amplification effects are at the same average level in every

[Figure 2 panels: correlation coefficient R (0 to −0.8) plotted against wavelength (nm) from 400 to 1000 nm; (a) curves for Al, Fe, and Mn, (b) curves for C and N.]

Figure 2: Correlation spectra for Al, Fe, and Mn content (a) and for C and N content (b) for the Podzol.
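A correlation spectrum of the kind shown in Figures 2 and 3 is the Pearson correlation coefficient between the reflectance in each band and the measured concentration, computed band by band. The sketch below shows one way to calculate it; the function name `correlation_spectrum` and the array layout (ROI spectra in rows) are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def correlation_spectrum(spectra, concentration):
    """Pearson correlation between each band and one elemental concentration.

    spectra: (n_samples, n_bands) mean ROI spectra;
    concentration: (n_samples,) laboratory values for one element.
    Returns an array of length n_bands with values in [-1, 1].
    """
    X = np.asarray(spectra, dtype=float)
    y = np.asarray(concentration, dtype=float)
    Xc = X - X.mean(axis=0)          # centre every band
    yc = y - y.mean()                # centre the concentrations
    # Covariance divided by the product of the norms gives Pearson's r.
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

# Usage with synthetic data: one band falls with concentration, one rises.
conc = np.array([1.0, 2.0, 4.0, 7.0])
bands = np.column_stack([3.0 - 2.0 * conc, 0.5 * conc])
r = correlation_spectrum(bands, conc)
```

A band with constant reflectance has zero variance and would produce a division by zero, so such bands would need to be masked beforehand.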

spectrum (Martens, 1991, p. 156). As this ideal spectrum is unknown, the mean spectrum $\bar{x}$ is used. This spectrum represents the mean scattering and offset. Each spectrum $x_i$ is then fit to the mean spectrum using a least squares method:

$$x_i = a_i + b_i \bar{x} + e_i. \qquad (3)$$

Ideally, $e_i$ contains the chemical information, because scattering and offset are represented by the coefficients $a_i$ and $b_i$. The MSC spectrum is calculated by determining the coefficients for each spectrum and then transforming the spectrum as follows:

$$\mathrm{MSC}_i = \frac{x_i - a_i}{b_i}. \qquad (4)$$

2.4.8. Extended Multiplicative Scatter Correction (EMSC). MSC does not take wavelength dependences of scattering into account. EMSC extends MSC by introducing wavelength ($\lambda$) terms in order to correct for the wavelength-dependent scattering effects [33, 37]:

$$x_i = a_i + b_i \bar{x} + d_i \lambda + e_i \lambda^{2}, \qquad \mathrm{EMSC}_i = \frac{x_i - a_i - d_i \lambda - e_i \lambda^{2}}{b_i}. \qquad (5)$$

2.5. Regression Analyses with PLSR. All regression analyses were calculated using the mean reflectance or transformed spectra of the ROIs and the corresponding elemental concentrations. The resulting regression coefficients were then applied to the reflectance or transformed images in order to create maps of the elemental concentrations. The calculations were carried out in MATLAB (Version 8.0, The Mathworks).

Single reflectance bands can be correlated to elemental concentrations as measured with standard laboratory techniques. The spectral dependency of this correlation can be illustrated by a plot of the coefficient of correlation for every single band with the elemental concentration [38]. Figures 2 and 3 show correlation spectra between the reflectance values at each wavelength and the five elemental concentrations for the Podzol and the Luvisol. In the case of the Podzol, single bands have correlations of up to −0.77, −0.78, −0.61, −0.74, and −0.60 with Al, Fe, Mn, C, and N concentrations, respectively. In the case of the Luvisol, the highest single band correlations are −0.50, −0.52, −0.67, −0.61, and −0.45. Fe oxides absorb mostly in the red spectral region, but due to the wide absorption features the correlation is high in the whole visible region, at least for the Podzol. The Luvisol has a correlation minimum around 600 nm. The Al correlation curves follow the Fe curves closely due to the high correlation between their concentrations. The Podzol Mn correlation spectrum shows no distinct features, while in the Luvisol the strongest correlation is between 450 and 500 nm. C and N have the highest correlations in the visible domain. Only in the Luvisol is the C correlation further differentiated, with a correlation maximum around 600 nm. Combinations of spectral bands are known to explain higher proportions of the variance, so a tool that makes use of

[Figure 3 panels: correlation coefficient R (0 to −0.7) plotted against wavelength (nm) from 400 to 1000 nm; (a) curves for Al, Fe, and Mn, (b) curves for C and N.]

Figure 3: Correlation spectra for Al, Fe, and Mn content (a) and for C and N content (b) for the Luvisol.

all bands was chosen for the regression of chemical soil constituents [14].

We calculated the regression between the reflectance and the elemental concentrations with a PLSR. PLSR projects the original data into a low-dimensional space formed by a set of orthogonal latent variables by a simultaneous decomposition of X (spectral matrix) and Y (elemental concentration matrix) that maximizes the covariance between X and Y [3]. The method is well suited for the calibration of a small number of samples with experimental noise in both chemical and spectral data [14].

In order to find the optimum number of latent variables, we calculated PLSR models with 1 to 15 latent variables on the ROI spectra for each analyzed element, separately for both images. We applied leave-one-out cross-validation (LOOCV) on each model to avoid overfitting [35]. Because this was mostly a feasibility study, no further calibration/validation scheme was applied. In cases where the samples are autocorrelated, LOOCV can also increase overfitting [39], but we decided to keep the validation strategy simple because plausible maps resulted from this strategy. The accuracy of each model is given as coefficient of determination (R²), adjusted R², and relative root mean square error (%RMSE). For each element and each spectral pre-processing method, usually the number of latent variables with the lowest resulting RMSEcv was chosen. Selection of the optimal number of latent variables in the PLS estimation is a crucial step. In cases where the different measures of accuracy suggested different numbers of variables, the most parsimonious model was chosen.

The adjusted R² is a coefficient of determination that rewards parsimonious models by incorporating the number of regressors M and the number of observations n. It was calculated from R² using the following:

$$\mathrm{adj}R^{2} = 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - M - 1}. \qquad (6)$$

RMSE was calculated from the difference of predicted values $y_p$ and observed values $y_o$ as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{p,i} - y_{o,i}\right)^{2}}. \qquad (7)$$

%RMSE was derived by dividing RMSE by the mean of the observed variable.

3. Results

3.1. Chemical Analyses. Basic statistics of the chemical analyses of the two soil cores are collected in Tables 1 to 3. Some elements have a very high skewness. We repeated the calculations on log-transformed data for these elements, but the results did not improve, so we only show the results from untransformed data. Tables 2 and 3 show the correlations between the element concentrations for the Podzol and the Luvisol, respectively.

Elemental concentrations and correlations differ between the two soil types. The Podzol has higher concentrations of Al and Fe; the Luvisol has higher concentrations of C and N. The contents of Al and Fe and those of C and N are highly correlated (Tables 2 and 3), while Mn is correlated more loosely to the other elements in both soil types. All correlations between the inorganic proxies (Al, Fe, and Mn) and the organic proxies (C and N) are positive in

Table 1: Basic statistics for elemental concentrations of Al, Fe, Mn, C and N in ROI samples of Podzol and Luvisol.

Units: mg g−1 (Mean, Min, Max, Stddev); N is the number of samples.

          Podzol                                    Luvisol
          Al      Fe      Mn      C       N         Al      Fe      Mn      C       N
Mean      4.31    13.47   0.0139  3.05    0.076     1.736   10.844  0.873   10.33   0.599
Min       0.39    2.71    0.0043  0.46    0.010     0.997   5.796   0.241   1.797   0.180
Max       10.72   35.79   0.0252  12.53   0.260     2.451   16.397  1.885   184.7   8.505
Skew      0.457   0.628   0.238   1.67    1.41      −0.028  −0.142  0.332   2.16    6.24
Stddev    3.42    10.29   0.0059  2.46    0.066     0.382   2.664   0.470   24.23   1.088
N         35      35      33      35      16        32      32      32      66      66

Table 2: Correlations of elemental concentrations in the Podzol ROI soil samples with significance levels (∗∗∗P<0.001, ∗∗P<0.01, ∗P<0.05).

       Al        Fe        Mn       C         N
Al     1
Fe     0.89∗∗∗   1
Mn     0.50∗∗    0.62∗∗    1
C      0.62∗∗∗   0.64∗∗∗   0.38∗    1
N      0.57∗     0.50∗     0.28     0.93∗∗∗   1

the Podzol but negative in the Luvisol. Therefore, the models for Al and Fe and the models for C and N are expected to have similar coefficients and model accuracies, respectively. These correlations also explain why Al and N can be detected by VNIR spectroscopy, although they are not optically active in the observed spectral region between 400 and 1000 nm.

3.2. Mapping of Chemical Soil Constituents. The numbers of latent variables and the values of R², adjusted R², and %RMSE achieved with these latent variables for the PLS estimations of elemental concentrations of the Podzol soil core are stated in Table 4. The corresponding results for the Luvisol soil core are shown in Table 5.

The number of latent variables for the PLS regressions is between 1 and 7 for all elements, with 4 being the most common number. Not all of the selected models explained more variance than the simple regression with a single reflectance band as explanatory variable.

Figure 4 shows a real-colour image of the Podzol profile at the left. The second panel shows principal components of the image, revealing the large amount of information in the hyperspectral image that cannot be seen by the human eye. The first principal component is not shown, because it mainly contains the brightness of the image, a piece of information that is already present in the left panel. The right panels are examples of chemometric maps of the element distribution in the profile, obtained with different spectral pre-treatments by applying the PLSR relations established on the ROIs to the images.

4. Discussion

The resulting submillimetre resolution maps of chemical soil constituents of 10 cm × 30 cm sections of soil profiles (Figure 4) provide a very detailed view of the vertical soil structure. While spectroscopic methods are usually applied to small, homogenized samples or as imaging spectroscopy from above, the methods presented here facilitate new insights into the spatial distribution of elemental concentrations in soils. The maps can be used for soil classification, differentiation of characteristic horizons [8], or for the quantitative evaluation of soil forming processes.

We assume the different correlations between the organic and inorganic parameters in the two soil types (Tables 2 and 3) to be the product of different soil forming processes. In the Luvisol, C and N accumulate in the topsoil, including the purely organic surface layer and the mixed organic/inorganic Ah horizon, while Al, Fe, and Mn show high concentrations in the inorganic subsoil. This spatial separation of the different materials is expressed by the negative correlation. In the Podzol, C and N accumulate together with Al, Fe, and Mn in the spodic horizon in the subsoil. This can be seen in a positive correlation between the organic and inorganic elements in the Podzol.

Figure 5 shows adjusted R² values for all elements and both soil types as an aggregation of Tables 4 and 5. While the amounts of Al and Fe can be estimated best in the Podzol, C and N are estimated with the highest accuracy in the Luvisol profile. C and N contents in the Luvisol are much higher than in the Podzol. This may be the reason for the higher estimation accuracies in the Luvisol. Estimations of Mn have a rather low accuracy for both soil types and all spectral pre-treatments. We assume this to be the result of the low concentrations in both soils and the associated low accuracy, and of the circumstance that Mn and organic substances both have low reflectance across the full analysed spectral range.

The influence of the different pre-treatments on the PLSR estimation of elemental concentrations in laboratory spectroscopic images of soil profiles is rather small, especially for

[Figure 4 panels: four images of the Podzol profile with centimetre scales (0 to 10 cm across, 0 to 30 cm down the profile); colour bars for Fe (mg g−1) and C (mg g−1).]

Figure 4: Real-colour image of the Podzol profile, false-colour image of 2nd, 3rd, and 4th principal component, map of Fe content from MSC transformed image, and map of C content from EMSC transformed image, with centimetre scales.
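The Fe and C maps in Figure 4 were produced from MSC- and EMSC-transformed images. The sketch below shows, under stated assumptions, how equations (3) to (5) can be applied to spectra and how a linear model can then be evaluated for every pixel. The names `msc`, `emsc`, and `map_element` and the arguments `coef` and `intercept` (regression coefficients from a previously fitted model) are hypothetical, and the wavelength axis is rescaled inside `emsc` purely for numerical conditioning; none of this is taken from the authors' MATLAB code.

```python
import numpy as np

def _scatter_correct(spectra, design):
    """Fit every spectrum to the design columns [1, ref, extra terms...] by
    least squares, subtract all fitted terms except the reference term, and
    divide by the fitted gain on the reference."""
    coef, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)  # (n_terms, n_spectra)
    nuisance = np.delete(design, 1, axis=1) @ np.delete(coef, 1, axis=0)
    return (spectra - nuisance.T) / coef[1][:, None]

def msc(spectra, reference=None):
    """MSC, equations (3) and (4); the reference defaults to the mean spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    design = np.column_stack([np.ones_like(ref), ref])
    return _scatter_correct(spectra, design)

def emsc(spectra, wavelengths, reference=None):
    """EMSC, equation (5), with linear and quadratic wavelength terms."""
    spectra = np.asarray(spectra, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    lam = 2.0 * (wl - wl.min()) / np.ptp(wl) - 1.0   # rescaled to [-1, 1]
    design = np.column_stack([np.ones_like(ref), ref, lam, lam ** 2])
    return _scatter_correct(spectra, design)

def map_element(cube, coef, intercept, reference):
    """Apply a linear model to every MSC-corrected pixel of a
    (rows, cols, bands) image cube and return a (rows, cols) map."""
    rows, cols, bands = cube.shape
    pixels = msc(cube.reshape(-1, bands), reference=reference)
    return (pixels @ coef + intercept).reshape(rows, cols)
```

Passing the mean ROI spectrum as `reference` when correcting the image keeps the pixel spectra on the same scale as the spectra the regression was trained on, which is an assumption of this sketch rather than a documented step of the study.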

[Figure 5 panels: bar charts of adjusted R² (0 to 0.9) for Al, Fe, Mn, C, N, and their mean; (a) Podzol, (b) Luvisol; one bar per pre-treatment: R, SNV-DT, Diff, 2nd Diff, CR, NCR, MSC, EMSC.]

Figure 5: Adjusted R² values for the PLSR models of the unchanged reflectance spectra and 7 pre-treatments for Al, Fe, Mn, C, and N.
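The accuracy measures plotted in Figure 5 and listed in Tables 4 and 5 follow equations (6) and (7). A minimal sketch of these formulas is given below; the function names are chosen for this example, the number of latent variables is used as the number of regressors M, and the factor of 100 in `percent_rmse` is an assumption about how the relative error is expressed as a percentage.

```python
import numpy as np

def adjusted_r2(r2, n_obs, n_regressors):
    """Equation (6): adjusted R² from R², n observations, and M regressors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

def rmse(y_pred, y_obs):
    """Equation (7): root mean square error of predictions."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_obs) ** 2)))

def percent_rmse(y_pred, y_obs):
    """RMSE divided by the mean of the observed values, as a percentage."""
    return 100.0 * rmse(y_pred, y_obs) / float(np.mean(y_obs))

# Al in the Podzol from untransformed reflectance: R² = 0.81 with 7 latent
# variables (Table 4) and 35 samples (Table 1).
adj = adjusted_r2(0.81, n_obs=35, n_regressors=7)
```

With these inputs the function returns about 0.761, which agrees with the adjusted R² of 0.76 reported for Al in the first block of Table 4.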

Table 3: Correlations of elemental concentrations in the Luvisol ROI soil samples.

       Al        Fe        Mn        C         N
Al     1
Fe     0.96∗∗∗   1
Mn     0.65∗∗∗   0.57∗∗∗   1
C      −0.39∗    −0.40∗    −0.51∗∗   1
N      −0.05     −0.04     −0.35∗    0.97∗∗∗   1
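The model selection summarised in Tables 4 and 5 rests on fitting PLSR models with increasing numbers of latent variables and comparing their leave-one-out cross-validated errors. The sketch below illustrates that procedure with a plain NIPALS PLS1 implementation in NumPy. It is an illustration under assumptions, not the MATLAB routine used in the study: the function names are hypothetical, only a single response variable is handled, and the selection rule simply takes the lowest RMSEcv without the additional parsimony check described in the text.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 (one response) with the NIPALS algorithm.
    Returns (coefficients, x_mean, y_mean) for prediction."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                    # weight: direction of maximal covariance
        w /= np.linalg.norm(w)
        t = Xk @ w                       # scores of the latent variable
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        qk = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - qk * t                 # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

def loocv_rmse(X, y, n_components):
    """Leave-one-out cross-validated RMSE for a given number of latent variables."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        errors.append(pls1_predict(model, X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

def select_n_components(X, y, max_components):
    """Number of latent variables (1-based) with the lowest RMSEcv."""
    scores = [loocv_rmse(X, y, k) for k in range(1, max_components + 1)]
    return int(np.argmin(scores)) + 1
```

In the study the search ran from 1 to 15 latent variables per element and image; with `max_components=15` the loop above would mirror that range, provided the number of training samples and bands allows it.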

Table 4: Number of latent PLSR variables (LVs), coefficient of determination (R²), adjusted R², and relative root mean square error (%RMSE) for the selected PLSR models of chemical constituents of the Podzol.

Pre-treatment  Measure   Al     Fe     Mn     C      N
R              LVs       7      7      7      5      4
               R²        0.81   0.82   0.55   0.49   0.36
               adj. R²   0.76   0.77   0.44   0.40   0.27
               %RMSE     34.3   32.1   28.8   58.9   75.3
SNV-DT         LVs       5      5      5      4      4
               R²        0.79   0.83   0.55   0.51   0.36
               adj. R²   0.76   0.81   0.47   0.45   0.28
               %RMSE     36.0   30.9   28.6   56.6   71.2
Diff           LVs       5      5      4      1      1
               R²        0.78   0.79   0.51   0.41   0.31
               adj. R²   0.74   0.75   0.45   0.40   0.28
               %RMSE     36.9   34.8   29.6   61.3   71.0
2nd Diff       LVs       4      3      4      3      1
               R²        0.76   0.78   0.56   0.53   0.23
               adj. R²   0.73   0.76   0.51   0.49   0.20
               %RMSE     38.8   35.3   28.0   55.3   75.2
CR             LVs       3      6      5      1      1
               R²        0.84   0.78   0.49   0.56   0.28
               adj. R²   0.82   0.73   0.40   0.54   0.26
               %RMSE     31.4   36.1   31.0   53.2   72.3
NCR            LVs       6      6      7      7      3
               R²        0.78   0.77   0.61   0.71   0.28
               adj. R²   0.73   0.72   0.51   0.64   0.21
               %RMSE     37.6   36.9   26.9   42.9   74.7
MSC            LVs       6      5      6      6      5
               R²        0.81   0.82   0.53   0.52   0.43
               adj. R²   0.77   0.79   0.44   0.42   0.33
               %RMSE     34.7   31.7   29.3   57.1   68.4
EMSC           LVs       5      4      8      5      4
               R²        0.80   0.77   0.57   0.49   0.41
               adj. R²   0.76   0.73   0.44   0.41   0.33
               %RMSE     35.4   36.8   28.1   58.1   66.2

the Podzol. Although several authors (e.g., [20, 21, 40]) note the benefit of transforming spectra before further analyses, in our case these transformations are not very helpful. Kooistra et al. [41] also found preprocessing unnecessary for the estimation of some chemicals using VNIR spectroscopy and PLSR. A reason might be that PLSR is a powerful regression technique that uses the full spectral range and thus finds the necessary information in all kinds of spectra. Spectral pre-treatment might emphasize the information contained in the spectra, but PLSR does not seem to depend on that. Furthermore, transformations that enhance spectral features (e.g., derivatives) generally have little impact on the visible and beginning of the NIR region (i.e., the spectral region measured by the sensor), because absorption features in this region are very broad [41].

Of the pre-treatments tested, first and especially second spectral derivatives seem to be the most dangerous, although they are the most widely used methods. The derivatives

Table 5: Results for the Luvisol, see Table 4 for details.

Pre-treatment  Measure   Al     Fe     Mn     C      N
R              LVs       4      4      4      5      5
               R²        0.45   0.53   0.45   0.79   0.72
               adj. R²   0.42   0.50   0.41   0.77   0.69
               %RMSE     16.2   16.8   39.8   41.9   32.2
SNV-DT         LVs       5      5      4      5      5
               R²        0.57   0.49   0.54   0.84   0.71
               adj. R²   0.54   0.45   0.51   0.83   0.68
               %RMSE     14.4   17.9   36.2   36.1   32.7
Diff           LVs       7      7      6      5      5
               R²        0.65   0.67   0.56   0.81   0.71
               adj. R²   0.61   0.63   0.51   0.79   0.69
               %RMSE     13.1   14.2   36.4   40.0   32.6
2nd Diff       LVs       5      6      5      3      2
               R²        0.69   0.69   0.59   0.80   0.67
               adj. R²   0.67   0.66   0.55   0.79   0.66
               %RMSE     12.2   13.6   34.8   40.4   34.9
CR             LVs       3      3      4      2      2
               R²        0.49   0.54   0.47   0.83   0.74
               adj. R²   0.47   0.52   0.44   0.82   0.73
               %RMSE     15.6   16.6   38.9   38.1   31.0
NCR            LVs       2      2      3      2      2
               R²        0.25   0.26   0.47   0.77   0.67
               adj. R²   0.23   0.24   0.45   0.76   0.66
               %RMSE     18.9   21.1   38.9   43.7   35.0
MSC            LVs       3      3      3      3      4
               R²        0.45   0.43   0.45   0.83   0.72
               adj. R²   0.43   0.40   0.42   0.82   0.70
               %RMSE     16.2   18.5   39.7   37.6   32.2
EMSC           LVs       3      3      3      3      3
               R²        0.36   0.34   0.50   0.80   0.68
               adj. R²   0.33   0.31   0.47   0.79   0.66
               %RMSE     17.6   20.0   37.8   41.0   34.3

emphasize noise in the data more distinctly than the other methods, so they should only be used when a very low noise level is ensured, either by low noise data or by filtering the data before or during the calculation of the derivatives. But still, they lead to the best estimations of elemental concentrations in the Luvisol. NCR is the least recommendable method for our application of quantitatively deriving elemental concentrations. Usually the concentration is linked to band depths, so normalizing band depths eliminates parts of the desired information. This is reflected in generally low accuracy values from NCR spectra. The CR estimations have the same accuracy as the estimations from untreated R spectra. CR is commonly used for baseline corrections, that is, especially to correct for differences in illumination and in viewing geometry. In our case, smooth surfaces and artificial light from two directions combined with a column-wise radiometric correction of the images resulted in very uniform illumination. This might explain the small benefit of CR. The same is probably true for SNV-DT, MSC, and EMSC. These pre-treatment methods are designed for baseline corrections, so their benefit is small if the baseline does not vary much.

Limitations of this study are that only two different soil profiles were analyzed, that only a limited number of samples were available, and that only the wavelength region of 400 to 1000 nm was considered. The relatively high variability of chemical soil constituents in the limited space of the soil cores considered made it possible to train regression models with acceptable accuracies that could be used for creating maps of the vertical distribution of chemical soil constituents at a very high spatial resolution. Future work should include several soil profiles from the same area to be able to make robust claims on the vertical distribution of soil properties in that area.

5. Conclusions

Laboratory imaging spectroscopy was used for mapping the small-scale distribution of elemental concentrations in soil profiles. PLSR is a powerful regression tool that makes use of all input bands and served well in finding the optimal combination of spectral bands representing specific elemental concentrations. Eight different spectral pre-treatments were tested but were not deemed necessary for PLSR analyses; they increased the prediction accuracy of the PLSR only in some of the cases. The estimation accuracy of the different elemental concentrations varies according to their optical activity and their concentration. Furthermore, there are no global predictors for elemental concentrations across different soil types, and the analyses have to be adjusted to the given conditions. In future studies we plan to extend the spectral range of soil profile imaging spectroscopy to the short-wave infrared region of 1000 to 2500 nm. Since many absorption bands lie in this spectral region, even better chemometric mapping is expected from this.

Acknowledgments

Hans and Florian Steffens are gratefully acknowledged for the technical assistance and Joachim Hill from the Department of Environmental Remote Sensing and Geoinformatics at the University of Trier for providing the imaging spectrometer. The authors are grateful to three anonymous reviewers and the editor who gave valuable comments and suggestions. This research was supported within the framework of the EnMAP project (Contract no. 50EE0946-50) by the German Aerospace Center (DLR) and the Federal Ministry of Economics and Technology.

References

[1] E. R. Stoner and M. F. Baumgardner, "Characteristic variations in reflectance of surface soils," Soil Science Society of America Journal, vol. 45, no. 6, pp. 1161–1165, 1981.
[2] E. Ben-Dor, S. Chabrillat, J. A. M. Demattê et al., "Using Imaging Spectroscopy to study soil properties," Remote Sensing of Environment, vol. 113, pp. S38–S55, 2009.
[3] M. Vohland and C. Emmerling, "Determination of total soil organic C and hot water-extractable C from VIS-NIR soil reflectance with partial least squares regression and spectral feature selection techniques," European Journal of Soil Science, vol. 62, no. 4, pp. 598–606, 2011.
[4] E. Ben-Dor, D. Heller, and A. Chudnovsky, "A novel method of classifying soil profiles in the field using optical means," Soil Science Society of America Journal, vol. 72, no. 4, pp. 1113–1123, 2008.
[5] H. Buddenbaum and M. Steffens, "Laboratory imaging spectroscopy of soil profiles," Journal of Spectral Imaging, vol. 2, pp. 1–5, 2011.
[6] H. Buddenbaum and M. Steffens, "Mapping the distribution of chemical properties in soil profiles using laboratory imaging spectroscopy, SVM and PLS regression," EARSeL eProceedings, vol. 11, no. 1, pp. 25–32, 2012.
[7] F. A. Kruse, "Identification and mapping of minerals in drill core using hyperspectral image analysis of infrared reflectance spectra," International Journal of Remote Sensing, vol. 17, no. 9, pp. 1623–1632, 1996.
[8] R. A. Viscarra Rossel and R. Webster, "Discrimination of Australian soil horizons and classes from their visible-near infrared spectra," European Journal of Soil Science, vol. 62, no. 4, pp. 637–647, 2011.
[9] G. M. Vasques, S. Grunwald, and J. O. Sickman, "Comparison of multivariate methods for inferential modeling of soil carbon using visible/near-infrared spectra," Geoderma, vol. 146, no. 1-2, pp. 14–25, 2008.
[10] A. Stevens, T. Udelhoven, A. Denis et al., "Measuring soil organic carbon in croplands at regional scale using airborne imaging spectroscopy," Geoderma, vol. 158, no. 1-2, pp. 32–45, 2010.
[11] A. M. Mouazen, B. Kuang, J. De Baerdemaeker, and H. Ramon, "Comparison among principal component, partial least squares and back propagation neural network analyses for accuracy of measurement of selected soil properties with visible and near infrared spectroscopy," Geoderma, vol. 158, no. 1-2, pp. 23–31, 2010.
[12] J. Farifteh, F. Van der Meer, C. Atzberger, and E. J. M. Carranza, "Quantitative analysis of salt-affected soil reflectance spectra: a comparison of two adaptive methods (PLSR and ANN)," Remote Sensing of Environment, vol. 110, no. 1, pp. 59–78, 2007.
[13] K. D. Shepherd and M. G. Walsh, "Development of reflectance spectral libraries for characterization of soil properties," Soil Science Society of America Journal, vol. 66, no. 3, pp. 988–998, 2002.
[14] C. Atzberger, M. Guérif, F. Baret, and W. Werner, "Comparative analysis of three chemometric techniques for the spectroradiometric assessment of canopy chlorophyll content in winter wheat," Computers and Electronics in Agriculture, vol. 73, no. 2, pp. 165–173, 2010.
[15] C. W. Chang, D. A. Laird, M. J. Mausbach, and C. R. Hurburgh, "Near-infrared reflectance spectroscopy–principal components regression analyses of soil properties," Soil Science Society of America Journal, vol. 65, no. 2, pp. 480–490, 2001.
[16] M. Schlerf, C. Atzberger, and J. Hill, "Remote sensing of forest biophysical variables using HyMap imaging spectrometer data," Remote Sensing of Environment, vol. 95, no. 2, pp. 177–194, 2005.
[17] R. A. Viscarra Rossel and T. Behrens, "Using data mining to model and interpret soil diffuse reflectance spectra," Geoderma, vol. 158, no. 1-2, pp. 46–54, 2010.
[18] S. Wold, M. Sjöström, and L. Eriksson, "PLS-regression: a basic tool of chemometrics," Chemometrics and Intelligent Laboratory Systems, vol. 58, no. 2, pp. 109–130, 2001.
[19] S. Wold, J. Trygg, A. Berglund, and H. Antti, "Some recent developments in PLS modeling," Chemometrics and Intelligent Laboratory Systems, vol. 58, no. 2, pp. 131–150, 2001.
[20] E. Ben-Dor, Y. Inbar, and Y. Chen, "The reflectance spectra of organic matter in the visible near-infrared and short wave infrared region (400–2500 nm) during a controlled decomposition process," Remote Sensing of Environment, vol. 61, no. 1, pp. 1–15, 1997.
[21] T. Udelhoven, C. Emmerling, and T. Jarmer, "Quantitative analysis of soil chemical properties with diffuse reflectance spectrometry and partial least-square regression: a feasibility study," Plant and Soil, vol. 251, no. 2, pp. 319–329, 2003.
[22] B. Stenberg and R. A. Viscarra Rossel, "Diffuse reflectance spectroscopy for high-resolution soil sensing," in Proximal Soil Sensing, R. A. Viscarra Rossel, B. A. McBratney, and B. Minasny, Eds., pp. 29–47, Springer Science+Business, Dordrecht, The Netherlands, 2010.
[23] W. D. Hively, G. W. McCarty, J. B. Reeves et al., "Use of airborne hyperspectral imagery to map soil properties in tilled agricultural fields," Applied and Environmental Soil Science, vol. 2011, Article ID 358193, 13 pages, 2011.
[24] Å. Rinnan, F. V. D. Berg, and S. B. Engelsen, "Review of the most common pre-processing techniques for near-infrared spectra," Trends in Analytical Chemistry, vol. 28, no. 10, pp. 1201–1222, 2009.
[25] O. P. Mehra and M. L. Jackson, "Iron oxide removal from soils and clays by a dithionite-citrate system buffered with sodium bicarbonate," in Proceedings of the 7th National Conference on Clays and Clay Minerals, pp. 317–327, 1960.
[26] D. R. Peddle, H. P. White, R. J. Soffer, J. R. Miller, and E. F. LeDrew, "Reflectance processing of remote sensing spectroradiometer data," Computers and Geosciences, vol. 27, no. 2, pp. 203–213, 2001.
[27] A. Savitzky and M. J. E. Golay, "Smoothing and differentiation of data by simplified least squares procedures," Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[28] R. J. Barnes, M. S. Dhanoa, and S. J. Lister, "Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra," Applied Spectroscopy, vol. 43, no. 5, pp. 772–777, 1989.
[29] O. Otto, Statistics and Computer Application in Analytical Chemistry, Wiley-VCH, Weinheim, Germany, 1998.
[30] D. Ertlen, D. Schwartz, M. Trautmann, R. Webster, and D. Brunet, "Discriminating between organic matter in soil from grass and forest by near-infrared spectroscopy," European Journal of Soil Science, vol. 61, no. 2, pp. 207–216, 2010.
[31] B. L. Becker, D. P. Lusch, and J. Qi, "Identifying optimal spectral bands from in situ measurements of Great Lakes coastal wetlands using second-derivative analysis," Remote Sensing of Environment, vol. 97, no. 2, pp. 238–248, 2005.
[32] Y. Li, T. H. Demetriades-Shah, E. T. Kanemasu, J. K. Shultis, and M. B. Kirkham, "Use of second derivatives of canopy reflectance for monitoring prairie vegetation over different soil backgrounds," Remote Sensing of Environment, vol. 44, no. 1, pp. 81–87, 1993.
[33] W. Kessler, Multivariate Datenanalyse für die Pharma-, Bio- und Prozessanalytik, Wiley-VCH, Weinheim, Germany, 2007.
[34] R. F. Kokaly and R. N. Clark, "Spectroscopic determination of leaf biochemistry using band-depth analysis of absorption features and stepwise multiple linear regression," Remote Sensing of Environment, vol. 67, no. 3, pp. 267–287, 1999.
[35] M. Schlerf, C. Atzberger, J. Hill, H. Buddenbaum, W. Werner, and G. Schüler, "Retrieval of chlorophyll and nitrogen in Norway spruce (Picea abies L. Karst.) using imaging spectroscopy," International Journal of Applied Earth Observation and Geoinformation, vol. 12, no. 1, pp. 17–26, 2010.
[36] B. Datt, "Visible/near infrared reflectance and chlorophyll content in Eucalyptus leaves," International Journal of Remote Sensing, vol. 20, no. 14, pp. 2741–2759, 1999.
[37] H. Martens and E. Stark, "Extended multiplicative signal correction and spectral interference subtraction: new preprocessing methods for near infrared spectroscopy," Journal of Pharmaceutical and Biomedical Analysis, vol. 9, no. 8, pp. 625–635, 1991.
[38] M. Vohland, C. Bossung, and H. C. Fründ, "A spectroscopic approach to assess trace–heavy metal contents in contaminated floodplain soils via spectrally active soil components," Journal of Plant Nutrition and Soil Science, vol. 172, no. 2, pp. 201–209, 2009.
[39] D. J. Brus, B. Kempen, and G. B. M. Heuvelink, "Sampling for validation of digital soil maps," European Journal of Soil Science, vol. 62, no. 3, pp. 394–407, 2011.
[40] R. A. V. Rossel and C. Chen, "Digitally mapping the information content of visible-near infrared spectra of surficial Australian soils," Remote Sensing of Environment, vol. 115, no. 6, pp. 1443–1455, 2011.
[41] L. Kooistra, R. Wehrens, R. S. E. W. Leuven, and L. M. C. Buydens, "Possibilities of visible-near-infrared spectroscopy for the assessment of soil contamination in river floodplains," Analytica Chimica Acta, vol. 446, no. 1-2, pp. 97–105, 2001.
Hindawi Publishing Corporation Applied and Environmental Soil Science Volume 2012, Article ID 868090, 23 pages doi:10.1155/2012/868090

Research Article Spatially Explicit Estimation of Clay and Organic Carbon Content in Agricultural Soils Using Multi-Annual Imaging Spectroscopy Data

Heike Gerighausen,1 Gunter Menz,2 and Hermann Kaufmann3

1 German Remote Sensing Data Center, German Aerospace Center, Kalkhorstweg 53, 17235 Neustrelitz, Germany 2 Remote Sensing Research Group (RSRG), Department of Geography, University of Bonn, Meckenheimer Allee 166, 53115 Bonn, Germany 3 Section 1.4 Remote Sensing, GFZ German Research Centre for Geosciences, Telegrafenberg, 14473 Potsdam, Germany

Correspondence should be addressed to Heike Gerighausen, [email protected]

Received 14 February 2012; Revised 5 May 2012; Accepted 26 July 2012

Academic Editor: Jose Alexandre Melo Dematte

Copyright © 2012 Heike Gerighausen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Information on soil clay and organic carbon content on a regional to local scale is vital for a multitude of reasons such as soil conservation, precision agriculture, and possibly also in the context of global environmental change. The objective of this study was to evaluate the potential of multi-annual hyperspectral images acquired with the HyMap sensor (450–2480 nm) during three flight campaigns in 2004, 2005, and 2008 for the prediction of clay and organic carbon content on croplands by means of partial least squares regression (PLSR). In addition, laboratory reflectance measurements were acquired under standardized conditions. Laboratory spectroscopy yielded prediction errors between 19.48 and 35.55 g kg−1 for clay and between 1.92 and 2.46 g kg−1 for organic carbon. Estimation errors with HyMap image spectra ranged from 15.99 to 23.39 g kg−1 for clay and from 1.61 to 2.13 g kg−1 for organic carbon. A comparison of parameter predictions from different years confirmed the predictive ability of the models. BRDF effects increased model errors in the overlap of neighboring flight strips by up to a factor of 3, but an appropriate preprocessing method can mitigate these negative influences. Using multi-annual image data, soil parameter maps could be successively complemented. Example maps are shown that provide field-specific information on prediction accuracy and image data source.

1. Introduction

Soil is the uppermost, weathered layer of the earth’s crust, which forms the interface of the lithosphere, biosphere, hydrosphere, and atmosphere. As such, it is one of the major resources available to man, and its conservation must be given high priority [1]. Clay and organic carbon are two key soil attributes as they contribute many benefits to its physical, chemical, and biological properties such as soil water holding capacity and soil fertility. Furthermore, the question whether CO2 can be sequestered in agricultural soils, to what degree and under which circumstances [2], has dramatically increased the interest in estimating soil organic carbon stocks (e.g., [3–5]). Yet, monitoring soil nutrient supply, degradation status, or organic carbon stocks requires information on soil properties over large areas with a high spatial resolution. The rate of repetition will depend on the issue considered. In any case, conventional soil sampling strategies cannot provide data with a high spatial and temporal resolution as they are costly, labor intensive, and time consuming.

As an alternative to standard analytical methods, visible and near infrared reflectance spectroscopy (VNIRS) has been exploited successfully for the quantitative analysis of various soil properties in the laboratory for many years (e.g., [6–11]). Because of the spectral characteristics of clay minerals [12, 13] and organic carbon [14], there is a good chance of predicting them from reflectance spectra with reasonable to good accuracy. Yet, interactions with other soil properties such as iron or sand can diminish the prediction accuracy [15].

Imaging spectroscopy measures surface reflectance with a given spatial resolution on a regional scale instead of a single point, thus offering an opportunity for spatially explicit estimations of soil characteristics. Early studies on soil clay and organic carbon content were published by Krüger et al. [16], Chabrillat et al. [17], and Ben-Dor et al. [18]. In the meantime, other studies have proven the potential of imaging spectroscopy to retrieve quantitative information on soil clay or organic carbon content (e.g., [19–21]). But the number of studies is still limited and many problems remain unsolved, including radiometric calibration of the sensor and atmospheric effects [22], variable water content [23], scattering effects of wide field of view sensors [20], and vegetation [24, 25]. The latter is one of the main factors hampering the estimation of soil properties on agricultural fields, as arable land is covered by crops most of the year due to cultivation practices.

Siegal and Goetz [24] reported that as little as 10% of green vegetation can mask mineral absorption characteristics beyond recognition. According to their results, dry vegetation does not change spectral characteristics of minerals but only affects albedo. Murphy and Wadge [26] concluded the reverse for the shortwave infrared region in a study of different soil types in arid terrain. These findings were confirmed by Murphy [27], who investigated the impact of vegetation on soil mineral absorption features near 2200 nm. Bartholomeus et al. [28] and Ouerghemmi et al. [29] stated that prediction accuracy for clay and organic carbon content decreased progressively once vegetation cover exceeded 5 to 10%.

Different strategies have been developed to account for the negative impact of vegetation cover on soil parameter estimations. Bartholomeus et al. [30] showed that a combination of several spectral indices could reduce the estimation errors for soil iron content in partially vegetated areas in their specific case. Rodger and Cudahy [31] successfully introduced a vegetation corrected continuum depth (VCCD) method to remove, or negate, the obscuring effect of the vegetation. But research by Lagacherie et al. [22] and Gomez et al. [32] disclosed that continuum removal analysis is clearly inferior to the PLSR approach for VNIRS quantitative analysis. Apart from that, VCCD only applies to the absorption band at 2200 nm typical for Al-OH bearing minerals such as clay and cannot be adopted for the prediction of organic carbon content. A spectral unmixing based approach was utilized by Bartholomeus et al. [28] to eliminate the influence of vegetation from the spectral reflectance of mixed pixels over partially covered maize fields in order to estimate the soil organic carbon concentrations. The success of their approach essentially depends on the quality of the unmixing results, which may deteriorate if the spectral variability of the studied area increases. Ouerghemmi et al. [29] explored a “blind source separation” (BSS) algorithm to isolate the soil signal from mixed surfaces of vegetation and soil. They obtained accurate predictions with this procedure in 40% of their test cases. Problems were encountered if vegetation cover exceeded 50%. Furthermore, they concluded that a better understanding of the behaviour of the BSS algorithm is required to improve the selection process for the extracted soil signals. A completely different approach was taken by Kooistra et al. [33], who used grass reflectance as a proxy for the estimation of soil metal concentrations.

However, the most commonly applied strategy, and probably the most reliable in terms of prediction accuracy, is the restriction of parameter estimates to bare soils (e.g., [22, 32, 34]). The major disadvantage of this approach for croplands is the loss of a noticeable percentage of the cultivated area for parameter estimation. This may be compensated if several images of one study area acquired in different years are analyzed. Parameter estimation would hence no longer be restricted to a few fields free of vegetation but would permit a gradual completion of soil maps of cropland on a regional scale. When applying such an approach, particular attention has to be paid to the temporal variability of the soil parameters of interest. While the clay content or soil particle size distribution does not change much within a single decade [35], except for transport and deposition of large amounts of soil particles during major erosion events, soil organic carbon is subject to both seasonal and interannual variation mainly controlled by temperature, among other factors (e.g., [36–38]). Leinweber et al. [36] reported pronounced seasonal variations in topsoil organic carbon concentrations on arable land between 2.40 and 4.30 g kg−1, measuring weekly from April to September. Most distinctive was a decline between June and August. Rogasik et al. [39] revealed variations between 0.40 g kg−1 and 1.15 g kg−1 organic carbon, though their measurements were carried out monthly from November to April. Yearly measurements of topsoil organic carbon contents over more than 20 years suggest that the year-to-year variability is of the same order as seasonal variations and ranges between 1.94 g kg−1 and 4.40 g kg−1 [40, 41]. The absolute amount of change varies with treatment, tillage intensity, and type of crop litter.

The aim of this study is to investigate the potential of imaging spectroscopy for spatially explicit estimations of two important soil constituents, clay and organic carbon, on agricultural fields by evaluating multi-annual hyperspectral image data using multivariate calibration. This has, to our knowledge, not been done so far. Three image data sets of the Australian sensor HyMap, obtained during HyEurope campaigns in 2004, 2005, and 2008, were explored by means of partial least squares regression (PLSR) and reference data collected during a field campaign in 2006. The paper addresses in particular (i) the gain in percentage area with soil parameter estimates, (ii) the verification of model accuracies by comparing predictions from different years and in the overlap of adjacent flight strips, in addition to the common practice of validation with a mostly limited number of reference samples, and (iii) the potential impact of BRDF effects on model predictions and possibilities to mitigate these effects. Prior to image data analysis, laboratory spectra measured under standardized conditions were investigated to assure reliable relationships between chemical and spectral properties. This allows a direct comparison not only between the laboratory data and the image data but also between the laboratory models and the image models. Furthermore, knowledge could be gained on the likely predictive ability and model behavior outside the calibration data range.

[Figure 1 map. Labels: Demmin, Berlin, River Trebele, River Peene, River Tollense, Kummerower Lake; north arrow; scale bar in kilometers; corner coordinates 54°25′4.29″N, 12°52′17.98″E and 53°45′40.42″N, 13°27′49.45″E.]

Figure 1: Location of the DEMMIN test site and its agricultural fields in the northeast of Germany near the city of Demmin. The black polygon outlines the maximum extent of the HyMap data acquired in the years 2004, 2005, and 2008. Sampling points are pictured as black circles. The red flags indicate the location of the two fields shown in Figure 13 (eastern flag) and Figure 14 (western flag).

2. Materials and Methods

2.1. Study Site. The study area is the DEMMIN test site (Durable Environmental Multidisciplinary Monitoring Information Network) located in the lowlands of northeastern Germany (upper left corner: 54°25′4.29″N, 12°52′17.98″E; lower right corner: 53°45′40.42″N, 13°27′49.45″E). The test site is an intensively used agricultural ecosystem mainly planted with winter grains (about 60%) and root crops (about 13%). About 25% is cultivated as grassland and pasture. The topography is rather flat in the north but hilly to undulating in the south, with an altitudinal range of 120 m. Considerable differences in the parent material and relief have caused a high spatial variability of soil types in some parts. The flat and slightly undulating plains are characterized by sand-rich regions and extensive areas with glacial till. The sand-rich regions are dominated by Luvisols, among other soil groups. Luvisols, Albeluvisols, and a further soil group evolved on glacial tills poor in sand but rich in loam and clay. Soils on glacial till in hilly terrain are often truncated, and colluvial sediments have accumulated at the bottom of slopes. The floodplains, which are mainly used as grassland, are characterized by peaty soils [42]. The long-term mean temperature of the area is about 8°C (281 K), with an annual precipitation of 500 to 600 mm increasing from south-east to north-west [43]. The research herein focuses on the eastern part of the test site due to the coverage of the hyperspectral image data (Figure 1).

2.2. Soil Sampling and Analysis. A total of 135 soil samples were collected from the agricultural fields in DEMMIN during a field campaign in September 2006 (Figure 1). To obtain a representative set of soil samples of the area, sampling sites were preselected with the help of existing soil maps and a digital terrain model. These were then reconciled with the hyperspectral image data available at that time (2004, 2005) to assure the occurrence of vegetation-free fields with reference samples in these images. Finally, they were adapted to field conditions if appropriate. In total, 14 fields were sampled, which stretched across approximately 22 km in north-south direction and 8 km in east-west direction. Eight fields were intensively sampled (6 to 43 points per field). From six further fields, samples from 2 to 5 locations were taken. The mean distance between adjacent sampling points within the fields was about 138 m, reaching 743 m at maximum. The described spatial distribution of the samples was mainly determined by the pedological setting of the test site which is characterized by extensive areas of

sandy and loamy soils, and distinct but smaller regions of loamy and heavy clay soils. In conjunction with the flight windows, soils rich in clay and organic carbon could only be encompassed in the reference data by sampling single fields.

Soil samples were taken from the top 5 cm of soil within a radius of 2 m. The geographical coordinates of the sampling points were measured using differential GPS with an accuracy of better than 1 m. At 18 sampling points, five samples were taken within the radius of 2 m and analyzed in the laboratory to assess the error of the applied sampling strategy and to get an idea of the possible impacts of subsampling [44]. We estimated this standard error of sampling (SES) using the following equation from Lozán and Kausch [45]:

\[
\mathrm{SES} = \sqrt{\frac{\sum_{i=1}^{m}\left[\sum_{j=1}^{k} x_{ij}^{2} - \left(\sum_{j=1}^{k} x_{ij}\right)^{2}/k\right]}{n - m}}, \tag{1}
\]

where $x_{ij}$ is the measured value of sample $j$ at sampling point $i$, $k$ is the number of samples taken at each point, $m$ is the number of sampling points, and $n$ is the total number of samples analyzed.

Particle size distribution [g kg−1], carbon content [g kg−1], total iron [g kg−1], and soil acidity (pH) were analyzed by the Central Laboratory of the Centre for Agricultural Landscape Research (ZALF) using air-dried samples. Particle size distribution was determined using a combined sieve (fraction 0.063–2 mm) and pipette (fraction <0.063 mm) analysis. The total amount of carbon (TC) was determined by dry combustion using an element analyzer (CNS2000 from LECO). Inorganic carbon content (IC) was analyzed by measuring the change in volume during decomposition with phosphoric acid using a Carmhomat 12D. Organic carbon content (OC) was then calculated as the difference between TC and IC. Iron contents were determined by dissolution with aqua regia. Soil acidity was measured by radiometer analytics with a TitraMaster85. The standard error of laboratory (SEL) for soil clay content was 5.0 g kg−1. Analytical techniques for the determination of TC and IC had a SEL of 0.17 g kg−1 and 2.88 g kg−1, respectively.

2.3. Spectral Measurements

2.3.1. Laboratory Spectroscopy. Spectral measurements were acquired from all soil samples using an ASD FieldSpec Pro spectroradiometer covering the full spectral range from 350 to 2500 nm. The ASD has a fixed spectral sampling interval of 1.4 nm in the visible and near infrared portion of the spectrum (350–1000 nm) and 2 nm in the shortwave infrared (1000–2500 nm), with a full width at half maximum (FWHM) of 3 nm and 10 nm, respectively. All measurements were taken with bare fibre and without foreoptics (25° FOV) relative to a standardized Spectralon panel. A 1000 W tungsten lamp was used as light source to illuminate the target at an incident angle of 30°. Before the measurements, all soil samples were ground with a mortar and passed through a 2 mm sieve to reduce anisotropic light scattering. To standardize the moisture level, they were oven-dried for 48 h at 105°C (378 K).

The spectrum of each soil sample was averaged from ten measurements with a system integration time of 100 scans per single collected spectrum. Noise removal and smoothing were accomplished using the Savitzky-Golay filter [46] implemented in IDL [47].

2.3.2. Imaging Spectroscopy. Hyperspectral image data from the test site DEMMIN were acquired by the Australian sensor HyMap mounted on a Dornier Do 228 aircraft and a Cessna 208 B during three HyEurope flight campaigns. The HyMap airborne imaging spectrometer records reflected radiances in 128 bands in the wavelength range from 450 to 2480 nm with a spectral bandwidth (FWHM) of 15–16 nm between 450 and 1900 nm and 18–20 nm between 1950 and 2480 nm. The sensor’s field of view (FOV) is 60° with 512 pixels in across-track direction. The instantaneous field of view (IFOV) is 2.5 mrad along track and 2.0 mrad across track. Overflights took place on August 9, 2004, May 27, 2005, and July 29, 2008 at an average flight altitude of about 2000 m, resulting in a spatial resolution of 4 × 4 m2 (Table 1). In each of the campaigns, seven to ten strips of about 16 km length were acquired under clear-sky conditions, covering an area of approximately 200 km2 and 260 km2, respectively.

Table 1: Characteristics of available HyMap data sets and its acquisition parameters.

DoA              ToA (UTC)*    Number of strips   Solar azimuth (°)   Solar zenith (°)   Heading** (°) (±2°)   MFA MSL (m)   PS (m)
August 09, 2004  13:30–14:20   7                  117.2–130           46.3–52.2          0/180                 ∼1990         4
May 27, 2005     9:50–10:50    7                  152.7–178.3         31.7–34.0          1/180                 ∼1940         4
July 29, 2008    9:50–11:20    10                 147.1–183.1         35.3–38.7          0/180                 ∼1934         4

* Local time is given by UTC+2. ** Neighboring strips have the opposite flight direction. DoA: day of acquisition, ToA: time of acquisition, MFA MSL: mean flight altitude above mean sea level, and PS: pixel size.

Radiometric calibration of the HyMap data was performed inflight using reference ground measurements [48]. Atmospheric correction and conversion of at-sensor radiances to reflectance values were accomplished using the ATCOR-4 software [48, 49]. In a final step, the data were geometrically corrected by means of a generic ortho image processor [50].

For each of the sampling points in the field, HyMap spectra were extracted using a window of 3 × 3 pixels centered on the pixel with the coordinates closest to the plot location. The average of these nine spectra was then used for further statistical analysis.

2.4. Spectral Preprocessing. Laboratory spectra were resampled to fit the spectral resolution of the HyMap sensor. Due to the annual recalibration of the HyMap sensor by HyVista, minor differences in the centre wavelengths and bandwidth (FWHM) occur between years. To obtain spectral consistency, laboratory spectra and image spectra from the 2004 and 2008 flight campaigns were adjusted to the configuration of the HyMap sensor in 2005. Several spectral transformations were applied. Absorbance was computed by taking the logarithm to the base 10 of the inverse reflectance. First and second derivatives of absorbance were computed with the Savitzky-Golay method using a fifth-order polynomial and 5 data points on either side of each point in the filter. Bands prone to noise were eliminated before statistical analysis. The wavelengths removed were located on either side of the water absorption bands near 1400 nm and 1900 nm, below 500 nm, and above 2400 nm.

2.5. Statistical Analysis. A multitude of statistical approaches comprising principal component analysis, multiple linear regression, artificial neural networks, or regression trees has been utilized for the quantification of soil parameters from spectroscopic data with varying success (e.g., [10, 19, 51–53]). Partial least squares regression (PLSR) has been exploited by many authors and has proved efficient in quantitative spectroscopy (cf. [11], Table 1 therein). PLSR sets up a linear regression model by projecting the predictor variables onto a few so-called latent variables or factors, taking into account both the response and the predictor variation. This way, two problems of conventional multiple linear regression modeling are handled effectively: multicollinearity and variable selection. However, choosing the optimum number of factors is a crucial point in PLS modeling. The number of factors was determined by applying a leave-one-out cross-validation approach to the calibration data set and choosing the model with the lowest Akaike information criterion (AIC). Rather than just using the RMSE as a selection criterion, AIC also integrates the number of samples and the number of PLS factors, avoiding a model overfit more effectively. Statistical analysis of the spectra was carried out using the ParLeS software [54], which implements the orthogonal PLS regression for one y-variable after Martens and Næs [55]. The data sets were randomly split into a calibration set and a validation set. About two thirds of the samples were used for calibration purposes. The rest of the samples were withheld from model calibration and used as an independent validation data set. If the total number of available samples was less than 30, no separate validation was conducted, and model accuracy was assessed solely via cross-validation. Calibration and validation data schemes are described in detail in Section 3.5.1.

The performance of the models was evaluated by calculating the coefficient of determination (R2), the standard error of calibration (SEC), and the standard error of validation (SEP). The latter is given by the following equation:

\[
\mathrm{SEP} = \sqrt{\frac{1}{n-1} \cdot \sum_{i=1}^{n} \left(\hat{y}_{i} - y_{i}\right)^{2}}, \tag{2}
\]

where $\hat{y}_{i}$ and $y_{i}$ are the predicted and observed values of sample $i$ and $n$ is the total number of samples in the validation data set. SEC is calculated the same way but additionally takes into account the number of PLS factors used in the model. As an independent measure of model accuracy, the ratio of performance to deviation (RPD) was computed by dividing the standard deviation of the reference samples by the SEP [56]. If no independent validation is performed, information on the predictive ability of the model can be retrieved by calculating the ratio of the standard deviation (SD) of the calibration data set to the SEC (SD/SEC). For each model, variable importance in the projection (VIP) and PLS regression coefficients (b-coefficients) were studied to identify important wavelengths used during model calibration and to evaluate their selection with respect to known absorption features of the parameter of interest.

To compare parameter predictions between years, the root mean squared error (RMSE) was calculated. It is given by the equation

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n} \cdot \sum_{k=1}^{n} \left(\hat{y}_{k,A(ij)} - \hat{y}_{k,B(ij)}\right)^{2}}, \tag{3}
\]

where $\hat{y}_{k,A(ij)}$ and $\hat{y}_{k,B(ij)}$ are the predicted values of the pixel $ij$ in data set $A$ and in data set $B$, and $n$ is the total number of pixels.

3. Results and Discussion

3.1. Soil Samples Descriptive Statistics. Soil texture analysis revealed a prevalence of silty sand, loamy sand, and sandy loam for the soils from the test site DEMMIN, with a number of samples representing clayey loam and pure sands. The minimum amount of clay is 36.53 g kg−1 and the maximum is 329.47 g kg−1. Sand and silt contents ranged from 380.17 to 899.62 g kg−1 and from 61.93 to 395.67 g kg−1, respectively. The organic carbon range of all samples was 4.31 to 24.79 g kg−1, but 75% of the samples show OC concentrations of less than 11.62 g kg−1. Inorganic carbon content is generally very low, but a few samples feature IC contents of up to 10.79 g kg−1 (Figure 2). This may be due to local geology, that is, the occurrence of carbonate-rich till, or to liming on agricultural fields. The correlation matrix reveals partly high interrelations between the soil chemical and physical parameters. A high positive correlation exists between clay and iron content (R = 0.93). A reasonable relationship was detected for clay and OC. High negative dependencies exist between the sand fraction of the soil and the amount of silt or clay. Negative correlations were also encountered between sand and OC and between sand and iron (Table 2).
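The standard error of sampling in equation (1) is a pooled within-point standard deviation. The following minimal Python sketch shows how it could be computed from replicate measurements. The function name and the example values are hypothetical and are not taken from the study.

```python
import math


def standard_error_of_sampling(groups):
    """Pooled within-point standard deviation, following equation (1).

    groups[i] holds the k replicate values x_ij measured at sampling point i.
    """
    m = len(groups)                      # number of sampling points
    n = sum(len(g) for g in groups)      # total number of samples analyzed
    if n <= m:
        raise ValueError("need more samples than sampling points")
    # Within-point sum of squares: sum_j x_ij^2 - (sum_j x_ij)^2 / k
    ss_within = sum(
        sum(x * x for x in g) - sum(g) ** 2 / len(g)
        for g in groups
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(ss_within, 0.0) / (n - m))


# Hypothetical clay contents (g kg-1): two points with five replicates each.
example = [
    [100.0, 102.0, 98.0, 101.0, 99.0],
    [200.0, 204.0, 196.0, 202.0, 198.0],
]
print(standard_error_of_sampling(example))  # → 2.5
```

In this example the within-point sums of squares are 10 and 40, so SES = sqrt(50 / (10 − 2)) = 2.5 g kg−1.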
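The validation measures of Section 2.5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ParLeS implementation. The function names and numbers are hypothetical, and the standard deviation in the RPD is assumed to be the sample standard deviation (divisor n − 1), which the text does not specify.

```python
import math


def sep(predicted, observed):
    """Standard error of validation as written in equation (2), divisor n - 1."""
    n = len(observed)
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / (n - 1))


def sample_sd(values):
    """Sample standard deviation (divisor n - 1); an assumption for the RPD."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def rpd(predicted, observed):
    """Ratio of performance to deviation: SD of the reference values over SEP."""
    return sample_sd(observed) / sep(predicted, observed)


def rmse_between(pred_a, pred_b):
    """Equation (3): RMSE between predictions for the same pixels in two data sets."""
    n = len(pred_a)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / n)


# Hypothetical organic carbon values (g kg-1).
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0]
print(sep(predicted, observed), rpd(predicted, observed))
print(rmse_between([1.0, 2.0, 3.0], [1.0, 2.0, 6.0]))
```

Equation (2) divides by n − 1 while equation (3) divides by n, so the two measures differ slightly even for identical residuals.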
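The selection of the number of PLS factors by the lowest AIC (Section 2.5) can be illustrated as follows. The exact AIC expression is not given in the text. The sketch assumes the form AIC = n ln(RMSE) + 2k, with n samples and k factors, and the cross-validated RMSE values are invented for illustration.

```python
import math


def aic(rmse, n_samples, n_factors):
    """Assumed form of the criterion: n * ln(RMSE) + 2 * k."""
    return n_samples * math.log(rmse) + 2 * n_factors


def select_factors(cv_rmse, n_samples):
    """Return the factor count with the lowest AIC, plus all AIC scores.

    cv_rmse[k - 1] is the cross-validated RMSE of the model with k factors.
    """
    scores = [aic(r, n_samples, k) for k, r in enumerate(cv_rmse, start=1)]
    best_index = min(range(len(scores)), key=scores.__getitem__)
    return best_index + 1, scores


# Hypothetical leave-one-out RMSE values for models with 1 to 6 factors.
cv_rmse = [30.0, 24.0, 21.0, 20.0, 19.9, 19.89]
best_k, scores = select_factors(cv_rmse, n_samples=90)
lowest_rmse_k = cv_rmse.index(min(cv_rmse)) + 1
print(best_k, lowest_rmse_k)
```

With these values the RMSE alone keeps decreasing up to six factors, whereas the penalty term makes the four-factor model the preferred one, which is the overfitting safeguard described above.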

[Figure 2 plot: (a) particle size fractions (g kg−1) of sand, silt, and clay, axis range 0–1000; (b) contents (g kg−1) of TC, OC, IC, and Fe2O3, axis range 0–35.]

Figure 2: Physical and chemical characteristics of the soil samples (n = 135) from the test site DEMMIN: particle size fractions sand (63–2000 µm), silt (2–63 µm), and clay (0–2 µm) (a) and total carbon (TC), organic carbon (OC), inorganic carbon (IC), and iron content (Fe2O3)(b).

Table 2: Correlation matrix of the physical and chemical soil properties.

         Sand     Silt     Clay     IC       OC       Fe2O3    pH
Sand     1
Silt     −0.81    1
Clay     −0.85    0.38     1
IC       −0.03    −0.15    0.18     1
OC       −0.71    0.42     0.73     −0.11    1
Fe2O3    −0.77    0.32     0.93     0.26     0.57     1
pH       −0.21    −0.09    0.42     0.40     0.28     0.42     1
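A correlation matrix such as Table 2 can be computed from the laboratory results as sketched below. The text does not state which coefficient was used. The sketch assumes Pearson's product-moment correlation, and the sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns maps a property name to its list of measured values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical contents (g kg-1) for four samples.
soil = {
    "sand": [800.0, 700.0, 600.0, 500.0],
    "clay": [50.0, 100.0, 150.0, 200.0],
}
matrix = correlation_matrix(soil)
print(matrix["sand"]["clay"])
```

Because the matrix is symmetric with ones on the diagonal, only the lower triangle needs to be reported, as in Table 2.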

Physical and chemical characteristics of the soil samples from the test site DEMMIN exhibit typical properties of agricultural soils on arable land in northeastern Germany [57]. With respect to OC, they furthermore resemble humus conditions on a majority of cropland in Germany [58] and also in parts of the central European agricultural region (e.g., [34]).

Findings from the five-fold sampling at 18 points showed that the applied sampling strategy and errors due to subsampling cause a slightly higher uncertainty of the analytical results for clay (7.97 g kg−1) and TC (0.69 g kg−1) than is to be expected from the standard error of laboratory. Measurement errors for IC amount to 0.43 g kg−1. This result is in conflict with the SEL for IC, which is rather high (cf. Section 2.2). It allows the presumption that the effective standard analytical error for OC for the samples of DEMMIN is not as high as the sum of the standard errors of TC and IC (0.17 + 2.88 = 3.05 g kg−1) but is closer to the sum of the corresponding sampling-based errors (0.69 + 0.43 = 1.12 g kg−1).

3.2. Spectral Data Quality. The quality of the spectral data acquired by laboratory measurements and by the HyMap imaging spectrometer was evaluated visually by plotting the reflectance and first derivative of the reflectance of the soil samples and their corresponding sampling points. Figure 3 displays the spectral signal of both measurement techniques for a sandy loam with 10.4 g OC kg−1. Maximum reflectance at the sampling point was 24% for the HyMap imaging spectrometer and 34% for the FieldSpec Pro. The higher reflectance of the laboratory measurements is probably due to oven-drying of the soil samples, as soil moisture reduces the reflectance in the entire wavelength range (e.g., [59]). Soil roughness may also reduce reflectance in the field and is mitigated by sieving the samples before laboratory measurements. Despite the difference in absolute reflectance, the curves are similar in shape. Both show a pronounced absorption feature near 2200 nm. A less distinct and broader absorption feature can be observed near 500 nm and 650 nm, which is clearly displayed by the first derivative. The first derivative of the reflectance discloses some differences between the spectra near 880 nm, around 1400 nm and 1900 nm, and at 2350 nm. Deviations around 1400 nm and 1900 nm are probably caused by atmospheric water absorption. It is presumed that the disparity near

[Figure 3 panels: (a) reflectance and (b) first derivative of reflectance versus wavelength (nm), 500 to 2500 nm; curves: ASD (lab) and HyMap 2005.]

Figure 3: Reflectance (a) and first derivative of the reflectance (b) of one sampling point as measured under laboratory conditions (black solid line) and by the HyMap imaging spectrometer on May 27, 2005 (red solid line with diamonds indicating centre wavelengths). The soil was classified by soil analysis as a sandy loam with 10.4 g OC kg−1.

880 nm is also related to atmospheric disturbances since a minor water absorption band occurs near 900 nm [14]. Minor differences near 2350 nm are likely a result of different surface conditions in the laboratory and the field where such weak absorption features are less pronounced due to interference with other factors.

3.3. Image Data Quality. The quality of the atmospheric corrected HyMap images from 2004, 2005, and 2008 was assessed in two ways. First, the general consistency between the HyMap acquisitions of the different years was evaluated. This was done by comparing the mean image spectra of a relatively homogenous concrete area among each other and with field spectral measurements from August 2004. The overall shape of the reflectance curves of the three different years is in good agreement. A slight decline in reflectivity in the shortwave infrared region is observed for the spectra from 2005 and 2008 compared to 2004, whereas reflectivity in the visible and near-infrared region is highest for the spectra from 2008. But apart from a few minor deviations, all spectra range within the standard deviation of the ASD field spectral measurements (Figure 4). Second, the relative accuracy of atmospheric correction between image strips of each year was assessed. Spectra of identical image pixels (within the given geometric accuracy) in the overlap of neighboring images were extracted. The ratio of the corresponding spectra of a soil surface and a former runway as a function of wavelength is shown in Figure 5. There is a good match for the runway in all years and for the soil surface in 2005 and 2008. The mean factor of 0.95 is within the expected accuracy of the atmospheric correction including minor BRDF effects during optimal illumination conditions. Increasing noise below 0.5 µm and above 2.4 µm is attributed to the lower signal-to-noise ratio of the HyMap sensor in these spectral regions. Artifacts near 1.4 µm and 1.7 to 1.9 µm are owed to atmospheric water absorption bands. The low ratio of the soil spectra in 2004 indicates significant differences in the overall reflectance of this surface type and occurs systematically in the overlap of all flight strips. Image acquisition parameters in Table 1 suggest that a high solar zenith and a small solar azimuth across the flight heading promote the appearance of BRDF effects on rough soil surfaces in the 2004 data set. This is confirmed by an across-track brightness gradient that occurs after atmospheric correction in all scenes of this year [48]. A comparison to a topographic map with a scale of 1 : 25.000 revealed an absolute geometric accuracy of less than 25 m. The relative geometric accuracy of the image data between the years amounts to two pixels in x or y direction, that is, 8 m.

[Figure 4 plot: reflectance versus wavelength (nm), 500 to 2500 nm; curves: ASD Fieldspec 2004, ASD Fieldspec +/− Stdev, HyMap 2004, HyMap 2005, HyMap 2008.]

Figure 4: Comparison of mean reflectance spectra of a concrete area on a farm premise near the city of Kruckow acquired with an ASD Fieldspec Pro in 2004 and the HyMap sensor in 2004, 2005, and 2008.

[Figure 5 panels: ratio versus wavelength (nm), 500 to 2500 nm, for (a) soil and (b) runway; curves: HyMap 2004, HyMap 2005, HyMap 2008.]

Figure 5: Ratio of spectra of identical image pixels (within the given geometric accuracy) in the overlap of neighboring scenes for a former runway and a soil surface.

3.4. Preparing Image Data for Spatially Explicit Parameter Estimation. In the central European agricultural region, bare soil is only encountered in early spring while fields are prepared for sowing of sugar beet, maize, potatoes, or summer grain, and in late summer after harvesting and seedbed preparation for winter grains (e.g., winter barley, winter wheat). Mitigating the impact of vegetation or excluding vegetated fields from the analysis is therefore a prerequisite for a successful application of spectroscopic approaches. In a preliminary study, Daughtry et al. [60] showed that photosynthetic active vegetation, nonphotosynthetic vegetation, and bare soils could be distinguished by a combination of vegetation indices, the cellulose absorption index (CAI) [61, 62], and the normalized difference vegetation index (NDVI). They obtained better results on AVIRIS data with the combination of indices than with spectral unmixing. In the present study, this approach was adopted to identify fields with bare soils and mask pixels with vegetation signal before image data analysis. Indices were computed for all HyMap images using wavelengths near 680 and 800 nm for the NDVI and wavelengths near 2000, 2100, and 2200 nm for CAI. Thresholds for green and dry vegetation were determined interactively and knowledge based and were set to 0.2 for NDVI and −1 to −2.5 for CAI. The quality of the soil mask was assessed using 200 ground truth regions of interest randomly distributed in equal parts over the agricultural area classified as vegetation and bare soil. The classification of the pixels was evaluated knowledge based with the help of the corresponding image spectra. There was almost no misinterpretation for pixels classified as vegetation. Disagreements in the soil class were mostly related to emerging green vegetation as some of the ground truth regions showed a small but discrete absorption feature near 680 nm but NDVI values below the assumed threshold. Accuracy statistics revealed a producer’s accuracy of 99% and 84% and a user’s accuracy of 86.1% and 98.8% for vegetation and soil, respectively. The kappa coefficient was 0.83 indicating a good overall agreement for the two observed classes.

A maximum of 8.21% of the total area covered by the HyMap scenes from May 2005 was detected free of vegetation applying the combined threshold of CAI and NDVI. In the data sets from August 2004 and July 2008, only 2.2% and 1.29% of the total image area were identified as bare soils and made available for topsoil parameter estimation (Table 3). Since the total image area covered by the HyMap data in 2008 exceeds the total area covered in 2004 by roughly one-third, the absolute number of image pixels available for image data analysis is within the same dimension.

In Figure 6, CAI is plotted against NDVI for all 135 sampling points derived from HyMap images of the three years. In 2005, sixty-seven sampling points reside on fields without vegetation. The remaining sixty-eight points are situated on fields with green vegetation indicated by NDVI values above 0.6. In 2004 and 2008, respectively, the spectral signal of only 20 and 11 sampling points suggested the appearance of bare soil. Apart from a few data points with higher NDVI values, the majority of sampling points exhibit

Table 3: Area free of vegetation in the HyMap images from 2004, 2005, and 2008 given in absolute values and percentage of the total image coverage.

| Time of acquisition | Total number of image pixels | Image pixels with NDVI < 0.2 and −2.5 < CAI < −1.0 | Proportion of image pixels with NDVI < 0.2 and −2.5 < CAI < −1.0 (%) |
|---|---|---|---|
| 09.08.2004 | 12.076.995 | 266.019 | 2.20 |
| 27.05.2005 | 12.495.804 | 1.026.187 | 8.21 |
| 29.07.2008 | 16.125.125 | 208.374 | 1.29 |
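The masking rule summarized in Table 3 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' processing chain. The NDVI is the usual normalized difference of the reflectance near 800 and 680 nm. The CAI is assumed to follow the common form 0.5 (R2000 + R2200) − R2100 with reflectance in percent, which the text does not spell out. Function names and the example reflectance values are hypothetical.

```python
def ndvi(r680, r800):
    """Normalized difference vegetation index from reflectance near 680 and 800 nm."""
    return (r800 - r680) / (r800 + r680)


def cai(r2000, r2100, r2200):
    """Cellulose absorption index (assumed form), reflectance given in percent."""
    return 0.5 * (r2000 + r2200) - r2100


def is_bare_soil(r680, r800, r2000, r2100, r2200):
    """Apply the thresholds of Table 3: NDVI < 0.2 and -2.5 < CAI < -1.0."""
    return ndvi(r680, r800) < 0.2 and -2.5 < cai(r2000, r2100, r2200) < -1.0


# Hypothetical reflectance values in percent, for illustration only.
soil_pixel = dict(r680=18.0, r800=22.0, r2000=30.0, r2100=32.0, r2200=31.0)
crop_pixel = dict(r680=4.0, r800=45.0, r2000=12.0, r2100=13.0, r2200=12.0)
stubble_pixel = dict(r680=20.0, r800=26.0, r2000=30.0, r2100=27.0, r2200=30.0)

for name, px in [("soil", soil_pixel), ("crop", crop_pixel), ("stubble", stubble_pixel)]:
    print(name, is_bare_soil(**px))
```

With these made-up values only the first pixel passes both thresholds: the crop pixel fails on NDVI and the stubble pixel fails on its positive CAI.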

[Figure 6 plot: CAI versus NDVI scatter plot with four inset reflectance spectra (reflectance versus wavelength in nm), each annotated with its CAI and NDVI value; symbols: HyMap 2004, HyMap 2005, HyMap 2008.]

Figure 6: CAI versus NDVI derived from HyMap spectra of the years 2004, 2005, and 2008 of the 135 sampling points in the field.
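The accuracy statistics reported for the soil mask in Section 3.4 can be reproduced from a two-class confusion matrix. The sketch below assumes 100 ground truth regions per class (200 regions in equal parts) and reconstructs the counts from the reported producer's accuracies. These counts are an inference and are not given in the text.

```python
# Rows: ground truth class, columns: classified class (order: vegetation, soil).
# Counts are reconstructed under the assumption of 100 regions per class.
confusion = [
    [99, 1],   # vegetation regions: 99 classified as vegetation, 1 as soil
    [16, 84],  # soil regions: 16 classified as vegetation, 84 as soil
]


def accuracy_statistics(matrix):
    """Return producer's accuracies, user's accuracies and Cohen's kappa."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    producers = [matrix[i][i] / row_tot[i] for i in range(k)]
    users = [matrix[i][i] / col_tot[i] for i in range(k)]
    observed = sum(matrix[i][i] for i in range(k)) / n
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return producers, users, kappa


producers, users, kappa = accuracy_statistics(confusion)
print([round(100 * p, 1) for p in producers])  # producer's accuracy in %
print([round(100 * u, 1) for u in users])      # user's accuracy in %
print(round(kappa, 2))
```

Under this assumption the reconstructed matrix returns the producer's accuracies of 99% and 84%, the user's accuracies of 86.1% and 98.8%, and the kappa coefficient of 0.83 stated in the text.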

NDVI less than 0.2 but CAI values above zero which is clear evidence for nonphotosynthetic, dry vegetation. Due to the time of image data acquisition, these fields are either still grown with senescent crops or covered with stubbles which remain on the fields after harvesting.

3.5. Model Calibration and Validation

3.5.1. Calibration and Validation Schemes. Calibration and validation schemes were set up with respect to the availability of vegetation-free reference points in the HyMap data from 2004, 2005, and 2008. In total, three calibration data sets were generated. The first one comprises the spectra of the 20 sampling points identified in the image data from 2004 (Lab 2004, HyMap 2004). The second calibration set contains 38 out of 67 vegetation-free spectra in the image data from 2005 (Lab 2005, HyMap 2005). The third one is made up of 11 spectra corresponding with the vegetation-free sampling points in the image data acquired during the flight campaign in 2008 (Lab 2008, HyMap 2008). With the laboratory spectroscopy, each of the calibration models was validated with the rest of the laboratory spectra withheld during calibration, that is, 115, 97, and 124 spectra. For the calibration data set with 38 soil spectra, an additional validation was performed utilizing only those 29 samples which are located on fields with bare soil in the image data from 2005 (Lab 2005b). Validation of the HyMap models was confined to cross-validation in case of the HyMap 2004 and HyMap 2008 models due to the lack of vegetation-free reference points. The HyMap 2005 model was additionally validated with the 29 samples not used during calibration. Descriptive statistics of all calibration and validation data sets are summarized in Table 4. As sampling has been carried out field-by-field with a mean within-field sampling distance of adjacent sampling points of 138 m, validation cannot be considered to be fully independent from the calibration data because of spatial autocorrelation. Regardless of how the validation subsample is selected, this can lead to an overestimation of the predictive accuracy of the models in particular for unsampled fields [63].

Except for the cal/val configuration with 38 and 29 samples, data range and standard deviation of the calibration data sets were considerably smaller than in the validation data sets. In most spectroscopic studies, validation is performed either on data resembling the data range of the calibration data set or solely by cross-validation as model validity is restricted to the data range covered during calibration. Here, the described procedure was chosen to evaluate model performance outside the calibration data range since soil characteristics of the test site DEMMIN are not adequately represented by the small number of vegetation-free sampling points in the image data sets.

Table 4: Descriptive statistics of the calibration and validation data sets for the prediction of clay and organic carbon content with laboratory and HyMap image spectra.

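Calibration data sets (clay and organic carbon in g kg−1)

| Laboratory | HyMap | Number of samples | Clay mean | Clay SD | Clay min | Clay max | OC mean | OC SD | OC min | OC max |
|---|---|---|---|---|---|---|---|---|---|---|
| Lab 2004 | HyMap 2004 | 20 | 108.07 | 48.38 | 48.03 | 186.64 | 9.60 | 3.17 | 5.25 | 19.59 |
| Lab 2005 | HyMap 2005 | 38 | 99.12 | 50.05 | 38.45 | 216.18 | 9.16 | 2.95 | 5.01 | 17.48 |
| Lab 2008 | HyMap 2008 | 11 | 97.31 | 45.68 | 52.22 | 197.20 | 6.66 | 1.47 | 4.31 | 10.34 |

Validation data sets (clay and organic carbon in g kg−1)

| Laboratory | HyMap | Number of samples | Clay mean | Clay SD | Clay min | Clay max | OC mean | OC SD | OC min | OC max |
|---|---|---|---|---|---|---|---|---|---|---|
| Lab 2004 | - | 115 | 130.22 | 67.45 | 36.53 | 329.47 | 10.42 | 4.02 | 4.31 | 24.79 |
| Lab 2005a | - | 97 | 137.83 | 67.53 | 36.53 | 329.47 | 10.74 | 4.16 | 4.31 | 24.79 |
| Lab 2005b | HyMap 2005 | 29 | 92.06 | 52.36 | 36.53 | 237.52 | 8.89 | 3.73 | 5.55 | 24.79 |
| Lab 2008 | - | 124 | 129.57 | 66.26 | 36.53 | 329.47 | 10.62 | 3.90 | 5.01 | 24.79 |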

Table 5: Calibration and validation performance statistics of laboratory spectra.

Parameter: Clay

| Model name | Number of samples (cal/val) | Preprocessing | Number of LV | SDcal† | SDval† | R2 | SEC | SD/SEC | SEP | RPD |
|---|---|---|---|---|---|---|---|---|---|---|
| Lab 2004 Clay | 20/115 | Absorption | 6 | 48.21 | 75.21 | 0.76 | 29.30 | 1.65 | 26.59 | 2.54 |
| Lab 2005 Clay | 38/97 | First derivative of absorption | 5 | 49.32 | 70.06 | 0.90 | 17.42 | 2.87 | 20.77 | 3.25 |
| Lab 2005 Clay | 38/29 | First derivative of absorption | 5 | 49.32 | 48.08 | 0.90 | 17.42 | 2.87 | 19.48 | 2.69 |
| Lab 2008 Clay | 11/124 | Absorption | 5 | 39.92 | 69.18 | 0.70 | 35.68 | 1.28 | 35.55 | 1.86 |

Parameter: Organic carbon

| Model name | Number of samples (cal/val) | Preprocessing | Number of LV | SDcal† | SDval† | R2 | SEC | SD/SEC | SEP | RPD |
|---|---|---|---|---|---|---|---|---|---|---|
| Lab 2004 OC | 20/115 | Absorption | 2 | 2.53 | 3.77 | 0.51 | 2.36 | 1.34 | 2.36 | 1.71 |
| Lab 2005 OC | 38/97 | Absorption | 6 | 2.87 | 3.90 | 0.88 | 1.11 | 2.67 | 1.92 | 2.16 |
| Lab 2005 OC | 38/29 | Absorption | 6 | 2.87 | 2.63 | 0.88 | 1.11 | 2.67 | 1.94 | 1.93 |
| Lab 2008 OC | 11/124 | First derivative of absorption | 4 | 1.49 | 4.12 | 0.78 | 0.97 | 1.52 | 2.46 | 1.58 |

† SD: standard deviation of estimated parameter values in (g kg−1), R2: coefficient of determination, SEC: standard error of calibration in (g kg−1), SEP: standard error of validation in (g kg−1), RPD: ratio of performance to deviation.
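The SD/SEC and RPD columns of Table 5 can be cross-checked against Table 4. The sketch below assumes that both ratios use the standard deviation of the measured reference values listed in Table 4 (calibration set for SD/SEC, validation set for RPD) rather than the SD of the estimated values given in Table 5. Under this assumption the tabulated clay values are reproduced to two decimals. The dictionary layout and function name are illustrative only.

```python
# Clay models: (SD of calibration set, SEC, SD of validation set, SEP), all in g/kg.
# SDs are the measured values from Table 4; SEC and SEP are taken from Table 5.
clay_models = {
    "Lab 2004 Clay (20/115)": (48.38, 29.30, 67.45, 26.59),
    "Lab 2005 Clay (38/97)": (50.05, 17.42, 67.53, 20.77),
    "Lab 2005 Clay (38/29)": (50.05, 17.42, 52.36, 19.48),
    "Lab 2008 Clay (11/124)": (45.68, 35.68, 66.26, 35.55),
}


def performance_ratios(sd_cal, sec, sd_val, sep):
    """Return (SD/SEC, RPD) for one calibration/validation configuration."""
    return sd_cal / sec, sd_val / sep


ratios = {name: performance_ratios(*values) for name, values in clay_models.items()}
for name, (sd_sec, rpd) in ratios.items():
    print(f"{name}: SD/SEC = {sd_sec:.2f}, RPD = {rpd:.2f}")
```

The same check works for the organic carbon rows when the OC standard deviations of Table 4 are used.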

3.5.2. Laboratory Spectra. Table 5 summarizes calibration and validation statistics of the PLSR models for the prediction of clay and organic carbon content. Tests with varying preprocessing methods showed that either absorbance or first derivative of the absorbance gave the best prediction results. Calibration data sets with a greater number of samples generally yield higher R2 and lower standard errors of calibration (SEC) than models with fewer samples. For clay, R2 ranged from 0.70 to 0.90. SEC was lowest in the Lab 2005 Clay model (17.42 g kg−1) and highest in the Lab 2008 Clay model (35.68 g kg−1). Standard error of validation (SEP) varied from 19.48 to 35.55 g kg−1. Plotting measured against predicted clay contents reveals a close relationship and a homogeneous scatter along the 1 : 1 line for the Lab 2005 Clay model validated with 29 and 97 soil samples (Figure 7). For the Lab 2004 Clay model, measured and predicted values are generally in good agreement but soils with more than 200 g kg−1 clay, of which none was enclosed in the calibration data set, are slightly overestimated. Samples show a systematic, linear offset from the 1 : 1 line. However, deviations are within the error variance of the whole validation data set. The calibration model with the smallest number of samples (Lab 2008 Clay) caused large deviations for some validation samples with less than 200 g kg−1 clay but predictions of clay rich soils are less erroneous (Figure 7).

Calibrations based on at least 20 samples had RPD values clearly above 2, indicating excellent prediction models. They can be assigned to group A in the classification of Chang et al. [56]. Reducing the number of samples in the calibration data set to just 11 spectra results in less predictive power with an RPD of 1.86. Such models belong to group B and may be improved by different calibration strategies [56]. Similar results on the prediction of clay from laboratory spectra are reported, for example, by Waiser et al. [64], Chang et al. [56], and Cozzolino and Morón [65]. The latter performed a cross-validation with a modified PLSR approach on spectra from Uruguayan soils and achieved a value of 2.7 (ratio of standard deviation by standard error of cross validation). Waiser et al. [64] tested four different soil preparation techniques including air-dried ground samples from Texas and achieved RPD values between 1.95 and 3.51 with PLSR. Principal component regression and cross validation were applied by Chang et al. [56]. They obtained

[Figure 7 panels: predicted versus measured clay (g kg−1), 0 to 400, with 1 : 1 line. (a) Lab 2004 Clay: SEC = 29.3, SEP = 26.59, RPD = 2.54, ncal. = 20, nval. = 115. (b) Lab 2008 Clay: SEC = 35.68, SEP = 35.55, RPD = 1.86, ncal. = 11, nval. = 124. (c) Lab 2005 Clay: SEC = 17.42, SEP = 20.77, RPD = 3.25, ncal. = 38, nval. = 97. (d) Lab 2005 Clay: SEC = 17.42, SEP = 19.48, RPD = 2.69, ncal. = 38, nval. = 29.]

Figure 7: Plots of measured versus predicted values for clay (g kg−1) in the calibration (black symbol, ×) and validation (red symbol, +) data sets based on laboratory spectra. The respective calibration and validation data schemes (ncal./nval.) are marked on the plots. For further explanation see text.

an RPD of 1.7. In contrast to our work, the range of clay contents in the calibration and validation data sets in those studies was similar. The small range of clay concentrations in the soil samples of the calibration data sets, varying from about 40 to 220 g kg−1 clay, hardly affected the validity of our models even for samples with higher clay contents. But a very small number of samples hampers the generation of a solid calibration model for the prediction of clay content using VNIR spectroscopy.

R2 values of the PLSR models on organic carbon ranged from 0.51 to 0.88 with SEC between 1.11 and 2.36 g kg−1. SEP varied between 1.92 and 2.46 g kg−1. The best results were again achieved providing 38 samples in the calibration data set (Lab 2005 OC) having the highest R2 and lowest

[Figure 8 panels: predicted versus measured OC (g kg−1), 0 to 25, with 1 : 1 line. (a) Lab 2004 OC: SEC = 2.36, SEP = 2.36, RPD = 1.71, ncal. = 20, nval. = 115. (b) Lab 2008 OC: SEC = 0.97, SEP = 2.46, RPD = 1.58, ncal. = 11, nval. = 124. (c) Lab 2005 OC: SEC = 1.11, SEP = 1.92, RPD = 2.16, ncal. = 38, nval. = 97. (d) Lab 2005 OC: SEC = 1.11, SEP = 1.94, RPD = 1.93, ncal. = 38, nval. = 29.]

Figure 8: Plots of measured versus predicted values for organic carbon (g kg−1) in the calibration (black symbol, ×) and validation (red symbol, +) data sets based on laboratory spectra. The respective calibration and validation data schemes (ncal./nval.) are marked on the plots. For further explanation see text.

standard errors. Calibration models for organic carbon exhibit a comparable predictive ability as those for clay. The maximum ratio of standard deviation to SEC is 2.67 for the Lab 2005 OC model and minimum 1.34 for the Lab 2004 OC model. In contrast to the prediction of clay, the distribution of data points in the scatter plots for the validation data sets indicated less precise models (Figure 8). A good agreement is observed between measured and predicted values in the validation data sets for the Lab 2005 OC model for samples with less than 15.0 g kg−1 OC. Samples with higher concentrations show a trend of underestimation. Validating the models with 20 and 11 calibration samples reveals a great variety in the measured versus predicted plots which seems to increase with higher organic carbon contents. This decrease in prediction accuracy is reflected by lower RPD values. With the exception of the Lab 2005 OC model which has an RPD close to 2 or above 2, all models must be categorized in group B following the classification of Chang et al. [56]. The precision obtained for OC in our study is within the precision obtained by previous authors (e.g., [34, 51, 66]). Stevens et al. [34], for example, generated a PLSR model with 117 soil samples from agricultural fields in Belgian Lorraine with an RPD of 2.11. Volkan Bilgili et al. [66] predicted several soil properties of a 32 ha field in the north of Turkey, among them soil organic matter (SOM). In their study, a higher data range in the validation data set (3.9–68.7 g kg−1 SOM) than in the calibration data set (5.0–33.8 g kg−1 SOM) did not result in a decrease of the models' predictive ability, neither for PLSR (RPD: 1.93) nor with multivariate adaptive regression splines (MARS) (RPD: 1.90). This may be due to the great number of samples in their cal/val sets (153/359).

Resampling laboratory spectra to HyMap spectral resolution had only little effect on the prediction accuracy of clay and none on the prediction accuracy of organic carbon (results not shown). Similarly, Gomez et al. [21] reported little influence on prediction accuracy while resampling spectra to Hyperion resolution.

Variable importance in the projection (VIP) and PLS-regression coefficients (b-coefficients) provide information about the weight of the x variables (wavelengths) in the PLSR model. Large b-coefficients indicate wavelengths which are relevant in the modeling of Y (soil parameter). High VIP values point out wavelengths which are essential for the modeling of both, Y and X [67]. Thus, x variables with a high VIP and a high b-coefficient indicate important wavelengths and allow insights on the physical basis of the predictions. VIP values and b-coefficients of all laboratory models are plotted in Figure 9. Curve shapes of the Lab 2005 Clay model and the Lab 2008 OC model have a different appearance compared to the rest due to the applied spectral preprocessing (first derivative of absorbance). For clay, VIP and b-coefficients indicate an emphasis of wavelengths between 2000 and 2500 nm and in particular around 2200 nm across all models. In fact, clay minerals are spectrally active in this part of the spectrum [12]. Absorption features near 2200 to 2300 nm produced by combination vibrations involving an OH stretch and metal-OH bend are diagnostic for all clay minerals. The exact position of the absorption features varies depending on the metal which is bound to the hydroxyl group [13, 14]. Furthermore, VIP and b-coefficients suggest that wavelengths in the visible part of the spectrum near 500 nm and 650 nm contain important information on clay minerals or rather clay related features as the spectral activity of clays is confined to the infrared portion of the spectrum. Iron, either as main constituent or associated with the structure of clay minerals, shows spectral features in the VNIR due to electronic transitions of iron cations [14]. Hunt and Salisbury [12], for example, attributed weak absorption bands near 500 nm to ferrous iron in montmorillonites. In illites, absorption near 650 nm was observed by Clark et al. [13] who ascribed this feature to ferric iron. The very close correlation of iron and clay (Table 2) in the soil samples of the test site DEMMIN encourages the assumption that iron has an influence on the PLS regression for clay.

VIP and b-coefficient curves of the organic carbon models exhibit less consistency than for clay. Whereas high loadings can be observed for the Lab 2005 OC and Lab 2008 OC model between 2000 and 2500 nm similar to clay, this region seems of little importance in the Lab 2004 OC model although a small accentuation occurs near 2200 nm. In the latter, generally high VIP values and b-coefficients prevail in the 500 to 900 nm region. In the Lab 2005 OC and Lab 2008 OC models wavelengths near 650 nm and 500 to 600 nm, respectively, are emphasized. As organic carbon in soils consists of very complex chemical compounds having numerous, frequently overlapping absorption features [68, 69], a spectral feature assignment to specific wavelengths is difficult [7, 70]. In addition, good correlations between clay and organic carbon or iron and organic carbon hamper the elucidation of the causal relationships between model parameters and their physical background.

3.5.3. HyMap Image Spectra. In a first attempt, it was tried to set up a stable calibration model using the image data from 2005 and validating this model with the remaining sampling points from 2004, 2005, and 2008. Model calibration and validation with image spectra from 2005 yielded good results with RPD of 2.70 and 1.80 for clay and organic carbon, respectively (Table 7). Validation with image spectra from different flight campaigns, that is, 2004 and 2008, reached RPD below 1.0 indicating poor results with no or little predictive ability especially for clay (Table 6). This is the case although soil clay and OC contents of the respective sampling points are within the data range of the calibration model and spectral data quality assessment showed a good overall agreement between years (cf. Section 3.2). Experiments with laboratory spectra revealed that the composition of the validation data set did only slightly impair the model’s predictive ability. The problem with empirical calibration is that coefficients are precisely adjusted to the spectral properties of the calibration input data. Even small changes in spectral data properties hamper a reasonable application. Laboratory spectra are measured under standard conditions where variations in spectral properties are solely a consequence of variations in physical and chemical characteristics. Under field conditions, soil status (e.g., roughness, crusting, moisture) varies with time. Further variations may be introduced through radiometric and atmospheric correction (see Figure 4). Thus, it was not possible to set up one stable calibration model for the estimation of soil parameters from image data but data set specific calibrations seem to be crucial to retrieve reasonable prediction accuracies.

Results of data set specific PLS regressions with the HyMap spectra from 2004, 2005, and 2008 are listed in Table 7. Calibration and validation were performed as described in Section 3.5.1 (Table 4). As with laboratory spectra, prediction accuracy was higher for clay than for organic carbon but a high number of samples in the calibration data set did not necessarily yield the best results. For clay, R2 ranged from 0.80 to 0.92 with SEC as low as 15.99 to 23.39 g kg−1. SD divided by SEC is highest for the HyMap spectra from 2008 reaching a value of 2.86 (HyMap 2008 clay). The calibration

[Figure 9 panels (a) to (f): VIP and b-coefficient versus wavelength (nm), 500 to 2500 nm, for the laboratory calibration models with ncal = 20, 38, and 11.]

Figure 9: Variable importance in the projection (VIP) and b-coefficients of PLSR calibration models based on laboratory spectra using 20, 38, and 11 soil samples for clay (left side) and organic carbon (right side).
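The text does not give the VIP formula. A commonly used definition for a single-response PLS model weights the squared, normalized loading weights of each latent variable by the share of the response variance that this latent variable explains. A minimal sketch under that assumption follows. The inputs `weights`, `scores`, and `y_loadings` are hypothetical toy values and are not taken from the models of Figure 9.

```python
import math


def vip_scores(weights, scores, y_loadings):
    """VIP per predictor for a single-response PLS model (assumed standard definition).

    weights:    A lists of length p (loading weights per latent variable)
    scores:     A lists of length n (X scores per latent variable)
    y_loadings: A floats (regression of y on each score vector)
    """
    p = len(weights[0])
    # Sum of squares of y explained by each latent variable.
    explained = [q * q * sum(t * t for t in t_a)
                 for q, t_a in zip(y_loadings, scores)]
    total = sum(explained)
    vip = []
    for j in range(p):
        acc = 0.0
        for w_a, ss_a in zip(weights, explained):
            norm_sq = sum(w * w for w in w_a)
            acc += ss_a * w_a[j] ** 2 / norm_sq
        vip.append(math.sqrt(p * acc / total))
    return vip


# Hypothetical two-component model with four predictors and four samples.
weights = [[0.1, 0.2, 0.9, 0.3], [0.5, 0.5, 0.1, 0.7]]
scores = [[1.0, -2.0, 0.5, 0.5], [0.3, 0.1, -0.2, -0.2]]
y_loadings = [2.0, 0.5]

vip = vip_scores(weights, scores, y_loadings)
print([round(v, 2) for v in vip])
```

With this definition the squared VIP values average to one over all predictors, which is why values above one are often read as above-average importance.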

Table 6: Validation performance statistics of HyMap 2005 model transfers to HyMap spectra from different acquisition years.

| Transfer of HyMap 2005 model to | Clay SEP (g kg−1) | Clay RPD | Organic carbon SEP (g kg−1) | Organic carbon RPD |
|---|---|---|---|---|
| HyMap spectra 2004 | 175.04 | 0.28 | 4.82 | 0.66 |
| HyMap spectra 2008 | 239.66 | 0.19 | 1.58 | 0.93 |

model for the image spectra from 2005 is characterized by a SD/SEC ratio of 2.46 (HyMap 2005 clay). Independent validation with the remaining 29 data points resulted in a RPD of 2.70 confirming a very high prediction accuracy of this model similar to the image spectra from 2008. The lowest prediction ability is achieved with the calibration model based on image spectra from 2004 with a SD/SEC ratio of 2.07 (HyMap 2004 clay). Measured versus predicted values suggest that samples with low clay contents (<80 g kg−1) may not be adequately predicted applying this model as variability of estimations is very high for samples with more or less identical clay contents (Figure 10). According to

Table 7: Calibration and validation performance statistics of HyMap image spectra.

Parameter: Clay

| Model name | Number of samples (cal/val) | Preprocessing | Number of LV | SDcal† | SDval† | R2 | SEC | SD/SEC | SEP | RPD |
|---|---|---|---|---|---|---|---|---|---|---|
| HyMap 2004 Clay | 20 | Absorption | 3 | 44.72 | - | 0.80 | 23.39 | 2.07 | - | - |
| HyMap 2005 Clay | 38/29 | First derivative of absorption | 3 | 47.18 | 50.49 | 0.85 | 20.34 | 2.46 | 19.41 | 2.70 |
| HyMap 2008 Clay | 11 | Absorption | 3 | 42.36 | - | 0.92 | 15.99 | 2.86 | - | - |

Parameter: Organic carbon

| Model name | Number of samples (cal/val) | Preprocessing | Number of LV | SDcal† | SDval† | R2 | SEC | SD/SEC | SEP | RPD |
|---|---|---|---|---|---|---|---|---|---|---|
| HyMap 2004 OC | 20 | First derivative of absorption | 3 | 2.70 | - | 0.62 | 2.13 | 1.48 | - | - |
| HyMap 2005 OC | 38/29 | First derivative of absorption | 2 | 2.58 | 2.20 | 0.71 | 1.64 | 1.80 | 2.07 | 1.80 |
| HyMap 2008 OC | 11 | First derivative of absorption | 5 | 1.20 | - | 0.45 | 1.61 | 0.91 | - | - |

† SD: standard deviation of estimated parameter values in (g kg−1), R2: coefficient of determination, SEC: standard error of calibration in (g kg−1), SEP: standard error of validation in (g kg−1), RPD: ratio of performance to deviation.
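The grouping after Chang et al. [56] that is applied to the SD/SEC and RPD values of Table 7 can be written as a small helper. The thresholds of 2.0 and 1.4 are the ones commonly quoted from Chang et al. and are stated here as an assumption, since the text only names the groups. The function name is hypothetical.

```python
def chang_group(ratio):
    """Return group A, B or C for an RPD or SD/SEC value (assumed thresholds 2.0 and 1.4)."""
    if ratio > 2.0:
        return "A"
    if ratio >= 1.4:
        return "B"
    return "C"


# SD/SEC values of the HyMap calibration models as listed in Table 7.
sd_sec = {
    "HyMap 2004 Clay": 2.07,
    "HyMap 2005 Clay": 2.46,
    "HyMap 2008 Clay": 2.86,
    "HyMap 2004 OC": 1.48,
    "HyMap 2005 OC": 1.80,
    "HyMap 2008 OC": 0.91,
}

groups = {model: chang_group(value) for model, value in sd_sec.items()}
for model, group in groups.items():
    print(model, group)
```

With these thresholds the three clay models fall into group A, the 2004 and 2005 OC models into group B, and the 2008 OC model into group C.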

Chang et al. [56], all three models can be classified as group A with an excellent ability to predict soil clay content from hyperspectral image data. A direct comparison of our results to the prediction accuracy obtained by other authors is difficult since none of them reports RPD or SD/SEC values. Standard errors may only be compared between studies if soils have an analogous range of clay content. This is the case with the work of Selige et al. [19] who reported a cross-validated R2 of 0.65 with a RMSE of 38 g kg−1 for MLR and a R2 of 0.71 with a RMSE of 42 g kg−1 for PLSR for estimations of clay from HyMap data in an agricultural area in the German federal state Saxony-Anhalt. Although their samples reached a maximum of 260 g kg−1 clay, these error measures clearly indicate that prediction accuracy for clay from HyMap image data in DEMMIN is superior.

R2 obtained from model calibration for the prediction of OC content from HyMap spectra are 0.71 at most and 0.45 at least with SEC between 1.61 and 2.13 g kg−1. Prediction ability of the PLSR model based on the 2005 image data is acceptable with a SD/SEC ratio and an RPD of 1.80. Being restricted to 29 validation samples, no clear conclusion can be drawn on the model’s behaviour above 20 g kg−1 OC. Similar to the corresponding laboratory cal/val configuration (Lab 2005 Clay, nval = 29), a strong underestimation of one data point with about 25 g kg−1 OC is observed (Figures 8 and 11). This gives reason to interpret that the HyMap 2005 OC model is subject to a nonlinear trend just as examined in the validation of the laboratory model with 97 samples. Calibration models based on image data from 2004 and 2008 exhibit SD/SEC values of 1.49 and 0.91, respectively. Thus, models for the prediction of OC based on image spectra belong to group B (HyMap 2004 OC, HyMap 2005 OC) and group C (HyMap 2008 OC) after Chang et al. [56]. Prediction accuracy obtained by PLS regression on HyMap images from 2004 and 2005 matches results from Stevens et al. [34] and Stevens et al. [23] applying image data of the sensors AHS-160 (RPD: 1.40) and CASI (RPD: 1.86). Stevens et al. [20] showed that regrouping a global cal/val data set by soil type into several smaller cal/val data sets could improve prediction accuracy about two times with an RPD as high as 3.15. However, this approach is only feasible if the number of reference points is large and if auxiliary information is provided for delineation of soil types in the image data in order to correctly apply the different calibration models.

Since laboratory measurements are acquired under standardized conditions, it was expected that calibrations with HyMap image spectra would result in a deterioration of prediction accuracy. Lagacherie et al. [22], for example, attributed a decrease in model performance from laboratory to imaging spectroscopy to radiometric and wavelength calibration uncertainties of the HyMap sensor as well as atmospheric effects. Gomez et al. [21] assumed that a higher signal-to-noise ratio and the large spatial support area (30 × 30 m2) of Hyperion images caused a decline in prediction ability. Other factors degrading the model performance may be variable soil moisture content [23] or soil surface conditions [71] as vegetation affected spectra were removed before the image data analysis. In fact, higher standard errors were observed in this study for the HyMap 2005 OC and the HyMap 2008 OC models. Apart from the factors mentioned above, the interannual variability of soil organic carbon [40, 41] may have influenced the spectral calibrations for these two years. Since image data was analyzed with reference data from 2006, possible variations of soil organic carbon caused by the specific temperature and soil moisture regime of each year could not be taken into account. Prediction ability of the HyMap 2005 Clay model remained more or less constant (Tables 5 and 7). Judging from the SD/SEC values, model performance increased when applying the image data from 2004 (HyMap 2004 OC, HyMap 2004 Clay). Accuracy also improved when estimating clay using the data from 2008 although it decreased when estimating organic carbon with the same data set. A reasonable explanation for this behavior was not found. In any case, it must be stated that only a reverification of the predictive accuracy with totally independent reference samples can reveal further insight and the true predictive ability of the presented PLSR models. This is of particular importance for fields not sampled during field work. To obtain a first idea of possible implications caused by spatial autocorrelation, a calibration and validation scheme for the HyMap spectra from 2005 was tested in which all samples from three fields were retained from calibration at a time and used for model validation only. This was done

[Figure 10 panels: (a) HyMap 2004 Clay (SEC = 23.39, SD/SEC = 2.07, n = 20); (b) HyMap 2005 Clay; (c) HyMap 2008 Clay (SEC = 15.99, SD/SEC = 2.86, n = 11). Axes: clay (measured) versus clay (predicted), both in g kg−1, 0 to 300, with 1 : 1 line.]

Figure 10: Plots of measured versus predicted values for clay (g kg−1) in the calibration (black symbol, ×) and validation (red symbol, +) data sets based on HyMap image spectra. The respective calibration and validation data schemes (ncal./nval.) are marked on the plots. For further explanation see text.

for five different cal/val constellations. Standard errors of validation (SEP) retrieved this way for clay were less than the SEP of the HyMap 2005 Clay model in three cases. In two of these cal/val schemes, SEP increased by about 11% and 15%, that is, a maximum SEP of 22.3 g kg−1 was obtained. For OC, SEP increased by about 9% and 29% for two cal/val schemes. The highest SEP was 2.68 g kg−1. In the remaining test cases considered, SEP was less than the SEP of the HyMap 2005 OC model. These results corroborate the urgent need for independent validation. Although an increase in standard errors by about 30% may still seem acceptable, parameter estimations for soils with low clay or OC contents would become very inaccurate.

VIP and b-coefficient curves of the PLSR models for clay based on HyMap spectra presented in Table 7 indicate the importance of wavelengths between 2000 and 2500 nm just as observed for the laboratory models. Again special emphasis is given to the 2200 nm band. This is less pronounced in the HyMap 2008 Clay model where wavelengths in the visible and near-infrared part of the spectrum seem to play a major role. Unlike the laboratory spectra, wavelengths between 900 and 1300 nm contain high values in this model with a slight accentuation near 1100 nm. This inflection can also be seen in the Lab 2008 Clay model and may be associated with absorption caused by ferrous iron in illite minerals [13]. Increased VIP and b-coefficients in all three models can also be found in the 500 to 650 nm region and are probably linked to iron as discussed with the laboratory spectra. For organic carbon, several wavelengths in the visible part of the spectrum are stressed. Peaks occur at 550, 680, and 770 nm. Likewise, bands near 2000, 2200, and 2350 nm carry high loadings in all three models similar to the laboratory models. As has already been stated in Section 3.5.2, there are noticeable analogies to the clay model parameters.

3.6. Spatially Explicit Parameter Estimation. In Figure 12, results of the multi-annual soil parameter estimation using hyperspectral image data and PLS regression are displayed for the parameter clay as an example. Flight strips of each flight campaign were processed individually. Mosaics were created subsequently. Since statistical analysis was done for soils from arable land only, quantitative mapping was restricted to agricultural crop land by means of a vector layer with field boundaries. White regions represent nonarable land (e.g., settlements, forest, and grassland). Grey regions indicate arable land completely covered with vegetation which was masked using CAI and NDVI (cf. Section 3.4). Supplementary information is given on the field-specific model predictive ability classified after Chang et al. [56] including details on the respective calibration data range (Figure 12(b)). Careful documentation of error measures and calibration data range is considered essential as it differs with multi-annual image data. Figure 12(c) provides direct access to the field-specific image data source. As expected, clay content is less than 125 g kg−1 within most of the fields. Despite this predominance of sandy loam and loamy sand, the map reveals a typical characteristic of the DEMMIN area which is the occurrence of partly large local heterogeneities of the soil type. A single field of 40 ha near the village Heydenhof is an example (Figure 13) with clay contents close to 300 g kg−1. Similarly, organic carbon contents are above average in the eastern part of the field reaching up to 17.5 g kg−1. The spatial pattern of clay and organic carbon content in this field shows little congruence. This is in agreement with a moderate correlation coefficient (Table 2) between the two soil properties and considerable deviations that occur for a few sandy soils with very high organic carbon contents but very little clay. The spatial pattern of clay and OC in this field is most likely not related to topography as the area is very flat.

Performing multi-annual estimations of soil parameters offers the opportunity to compare parameter estimations from different years which provides an additional measure of the model's predictive ability. Because of the time of image data acquisition, only two fields could be identified where parameters were estimated in either 2004 and 2005 or 2005 and 2008. No overlap was detected for all three years. Figure 14 displays the difference image of estimated clay and organic carbon contents based on HyMap images from 2004 and 2005. During the overflight in 2004, this field was partly covered with crop residues which are more or less arranged in line according to the tillage direction. Furthermore, the state of tillage differed within the field. In the eastern part, vegetation residues almost completely cover the soil surface. These pixels are masked (white). On average, predicted clay contents in 2005 are 3.42 g kg−1 below the estimates in 2004. 50% of all data points show differences between −14.19 and 9.11 g kg−1 clay. Mean differences of predicted OC contents are close to 0, whereby half of the data points differ between −0.11 and 0.57 g kg−1 OC. The spatial pattern of both difference images is similar. Higher negative differences appear with both parameters in the upper left of the field (dark blue). A small area in the centre of the field exhibits higher positive deviations (dark orange-red). Differences reach extreme values for a few isolated pixels. They are attributed to artifacts or problems caused by mixed pixels. RMSE values calculated for both parameters are equal to or less than standard errors of model calibration (Table 7). This result is encouraging as it confirms statements made on model accuracies and emphasizes the potential of imaging spectroscopy to map soil parameters. However, clay and OC contents on this particular field are quite low. As laboratory models showed an increase in variance with increasing OC concentrations, the actual RMSE calculated for OC may be higher. This was not the case for clay but only further investigations may provide additional evidence.

Reflectance of most natural surfaces varies depending on the viewing and solar illumination conditions. This fact is described by the bidirectional reflectance distribution function (BRDF) [72]. Evaluating prediction accuracy in the overlap area of the flight strips per year may give an indication for inaccuracies related to these reflectance anisotropies. Stevens et al. [20] presumed that difficulties to produce robust calibrations were partly caused by strong backward- or forward-scattering at the image edges due to the large field of view of the AHS sensor (90°). Computing RMSE revealed substantially higher errors in the overlap area for the 2004 data set than was expected from the validation or cross-validation of the models (Table 8). For clay, RMSE was about 3 times larger than SEC using reflectance transformed to absorbance. When applying reflectance transformed to the first derivative of absorbance, RMSE was about 1.4 times larger than SEC. Effects were less pronounced for organic carbon but RMSE still exceeded SEC by a factor of 1.6 using absorbance. In contrast, RMSE values were either only slightly increased or even less than standard errors of validation for HyMap spectra from 2005 and 2008, independent of the preprocessing method applied. These results provide strong evidence that BRDF effects have a negative influence on the performance of the models based on image data from 2004. As no negative impact

[Figure 11 panels: (a) HyMap 2004 OC (SEC = 2.13, SD/SEC = 1.48, n = 20); (b) HyMap 2005 OC (SEC = 1.64, SEP = 2.07, RPD = 1.8, ncal. = 38, nval. = 29); (c) HyMap 2008 OC (SEC = 1.61, SD/SEC = 0.91, n = 11). Axes: OC (measured) versus OC (predicted), both in g kg−1, 0 to 25, with 1 : 1 line.]

Figure 11: Plots of measured versus predicted values for organic carbon (g kg−1) in the calibration (black symbol, ×) and validation (red symbol, +) data sets based on HyMap image spectra. The respective calibration and validation data schemes (ncal./nval.) are marked on the plots. For further explanation see text.
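The error measures annotated on scatter plots of this kind can be reproduced from paired measured and predicted values. The following Python sketch is illustrative only. It assumes that the standard error is taken as the root mean squared error of the residuals and that the ratio SD/SEC (or RPD) is the sample standard deviation of the measured values divided by that error; the exact SEC and SEP formulas used in this study are not restated in this section, and the function names and sample values are hypothetical.

```python
import math
import statistics


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted values."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def performance_ratio(measured, predicted):
    """Ratio of the SD of the measured values to the prediction error.

    Assumption: this stands in for SD/SEC (calibration samples) or
    RPD (validation samples), depending on which samples are passed in.
    """
    return statistics.stdev(measured) / rmse(measured, predicted)


# Hypothetical OC values in g kg-1, not taken from the study.
measured_oc = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted_oc = [11.0, 11.0, 15.0, 15.0, 19.0]

error = rmse(measured_oc, predicted_oc)
ratio = performance_ratio(measured_oc, predicted_oc)
print(f"error = {error:.2f} g kg-1, SD/error = {ratio:.2f}")
```

Passing calibration samples yields the quantity reported as SD/SEC; passing withheld validation samples yields the one reported as RPD.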

was observed in the two other data sets, this effect seems solely due to the unfavorable viewing and illumination conditions during the overflight in 2004, that is, a large zenith and a low azimuth angle (Table 1), but not to the sensor's field of view in principle. Negative effects in the off-nadir area can be diminished by applying the first derivative even though a somewhat higher standard error must be accepted.

4. Conclusions

Multi-annual spatially explicit soil parameter estimation was performed using three HyMap images of the test site DEMMIN and PLSR. It was shown that this approach provides an opportunity to subsequently map soil parameters on agricultural fields despite long periods of vegetation coverage. Image data applied within this study was acquired

[Figure 12 map panels: (a) clay content (g kg−1), legend classes from < 25 to 275–300, annotated with SEC (2004) = 23.39, SD/SEC (2004) = 2.07, SEP (2005) = 19.41, RPD (2005) = 2.7, SEC (2008) = 15.99, SD/SEC (2008) = 2.86; (b) model accuracy index after Chang et al. (2001), categories A, B, and C, with calibration data ranges 2004: 48–186.6 g clay kg−1, 2005: 38.4–216.2 g clay kg−1, 2008: 52.2–197.2 g clay kg−1; (c) image data source: 09.08.2004, 27.05.2005, 29.07.2008. Also marked: fields with dense vegetation, settlements, and the border of the HyMap images. Projection information: UTM zone 32, Datum WGS-84.]

Figure 12: Map of soil clay content on agricultural fields (a) derived by multi-annual hyperspectral image data analysis. Map of RPD values (b) indicate field-specific model predictive ability after Chang et al. [56]. A field assignment to the year of acquisition is illustrated in Map (c).
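The accuracy classes mapped in panel (b) can be assigned mechanically from the ratios reported in Figures 10 to 12. The sketch below is a minimal illustration and rests on one assumption: the thresholds (above 2.0 for category A, 1.4 to 2.0 for category B, below 1.4 for category C) are the ones commonly cited for the scheme of Chang et al. [56]; they are not restated in this section. The function name is hypothetical; the ratios are the values printed in the figures.

```python
def chang_category(ratio):
    """Assign an accuracy category from an RPD or SD/SEC ratio.

    Assumed thresholds (commonly cited for Chang et al.):
    A: ratio > 2.0, B: 1.4 <= ratio <= 2.0, C: ratio < 1.4.
    """
    if ratio > 2.0:
        return "A"
    if ratio >= 1.4:
        return "B"
    return "C"


# SD/SEC or RPD values as printed in Figures 10, 11, and 12.
model_ratios = {
    "HyMap 2004 Clay": 2.07,
    "HyMap 2005 Clay": 2.7,
    "HyMap 2008 Clay": 2.86,
    "HyMap 2004 OC": 1.48,
    "HyMap 2005 OC": 1.8,
    "HyMap 2008 OC": 0.91,
}

categories = {name: chang_category(r) for name, r in model_ratios.items()}
for name, category in categories.items():
    print(f"{name}: category {category}")
```

Under these assumed thresholds the result agrees with the grouping given in the text: all clay models fall into category A, the 2004 and 2005 OC models into category B, and the 2008 OC model into category C.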

[Figure 13 map panels: clay (g kg−1), legend classes from < 25 to 275–300; OC (g kg−1), legend classes from < 2.5 to 17.5–20. Projection information: UTM zone 32, Datum WGS-84.]

Figure 13: Map of soil clay and organic carbon content derived from HyMap image data (May 27, 2005) and PLSR for a field near the village of Heydenhof. The location of this field on the test site DEMMIN is marked by a red flag in Figure 1.

in May, July, and August. Whereas the spring image offered a good chance to estimate soil parameters on fields grown with root crops or summer grains, images acquired in summer were less suited. More than 97.8% of the images were covered with dry or green vegetation at this time of the year. Having three hyperspectral images at our disposal, estimations on clay and organic carbon content could be performed for about 10% of the total area covered. The presented approach will be facilitated by future hyperspectral satellite missions such as EnMAP [73] since there is no restriction to a very few if not only one overflight per year as is the case with airborne sensors. Thus, requirements by researchers interested in vegetation as well as soil parameters will be met. Due to the high seasonal and interannual variations of soil organic carbon [36, 41], model calibrations for this parameter may be negatively affected if the time of acquisition of the image data and the reference data deviates. Problems could be solved by sampling the soil more or less at the same time as the flight campaign. For soil parameters with little temporal variability such as clay, this will not be required.

Calibration models for clay content based on image spectra showed excellent prediction ability with a maximum standard error of 23.39 g kg−1 clay. Clay estimations were generally more precise than predictions of organic carbon

[Figure 14 map panels: difference 2005 − 2004 in g clay kg−1 (colour scale −70 to 70) and in g OC kg−1 (colour scale −3.3 to 3.3); white = no data. Projection information: UTM zone 32, Datum WGS-84.]

Parameter        Min       Max     Mean    Q1       Median (Q2)   Q3     RMSE 2004-2005
Clay             −254.69   71.45   −3.42   −14.19   −2.84         9.11   19.48
Organic carbon   −12.62    3.3     0.19    −0.11    0.077         0.57   0.72

Figure 14: Difference image of estimated clay and organic carbon content based on HyMap images from 2004 and 2005 for an agricultural field near the city of Marienfelde. Q1: lower quartile, Q3: upper quartile. Mean deviation (RMSE) calculated from estimations in 2005 and 2004. The location of this field on the test site DEMMIN is marked by a red flag in Figure 1.
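The summary statistics tabulated with this figure can be sketched as follows. This is an illustration under stated assumptions, not the processing chain used in the study: masked pixels are represented by None, the "mean deviation (RMSE)" is taken to be the root mean square of the pixel-wise differences, and quartiles are computed with the inclusive method of Python's statistics module. The function name and the pixel values are hypothetical.

```python
import math
import statistics


def difference_stats(pred_later, pred_earlier):
    """Summarise pixel-wise differences (later minus earlier year).

    Pixels that are masked (None) in either year are skipped.
    """
    diffs = [
        later - earlier
        for later, earlier in zip(pred_later, pred_earlier)
        if later is not None and earlier is not None
    ]
    if len(diffs) < 2:
        raise ValueError("need at least two unmasked pixel pairs")
    q1, median, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    return {
        "min": min(diffs),
        "max": max(diffs),
        "mean": statistics.fmean(diffs),
        "q1": q1,
        "median": median,
        "q3": q3,
        # Assumption: RMSE of the two years' estimates against each other.
        "rmse": math.sqrt(statistics.fmean([d * d for d in diffs])),
    }


# Hypothetical clay estimates in g kg-1; None marks a masked pixel.
clay_2004 = [100.0, 120.0, None, 90.0, 110.0]
clay_2005 = [95.0, 125.0, 80.0, 88.0, 104.0]

stats = difference_stats(clay_2005, clay_2004)
print({key: round(value, 2) for key, value in stats.items()})
```

A different quartile convention would shift Q1 and Q3 slightly for small samples; with tens of thousands of pixels the choice matters little.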

Table 8: Mean deviation (RMSE) of predicted clay and organic carbon contents in the overlap area of the HyMap flight strips and standard errors of PLSR models generated using the spectra transformed to absorbance (Abs) and spectra transformed to first derivative of absorbance (Abs Deriv1). Bold numbers indicate model parameters presented in Table 7.

                                     Clay                                      Organic carbon
Day of           Number of pixels    RMSEoverlap         SEP∗ or SEC∗∗         RMSEoverlap         SEP∗ or SEC∗∗
acquisition      estimated           Abs     Abs Deriv1  Abs       Abs Deriv1  Abs     Abs Deriv1  Abs      Abs Deriv1
August 9, 2004   36015               69.63   38.99       23.39∗∗   28.35∗∗     3.86    2.20        2.35∗∗   2.13∗∗
May 27, 2005     260206              22.46   17.03       20.96∗    19.41∗      2.34    0.60        3.13∗    2.07∗
July 29, 2008    60308               12.95   15.39       15.99∗∗   22.89∗∗     2.21    0.79        1.88∗∗   1.61∗∗

SEC: standard error of calibration in (g kg−1) and SEP: standard error of validation in (g kg−1).

content whose standard error was 2.13 g kg−1 OC at most. For both soil parameters, validation with soil samples withheld from calibration confirmed the prediction errors of the models, although for two image data sets validation was confined to leave-one-out cross-validation owing to the small number of reference samples. A comparison of predictions from different years suggests that there is a very good agreement between estimations of different years and thus confirms validation results. Investigations under laboratory conditions provided an additional indication that standard errors for clay may also apply to soils with concentrations outside the calibration data range. However, despite these promising results, further validation with truly independent samples will be crucial to reveal the model's predictive ability for the entire data set including fields without reference information. If this is known, soil parameter maps derived from imaging spectroscopy may serve as a basis for precision agriculture, environmental modeling, or carbon sequestration studies, for example.

Furthermore, it was shown that off-nadir parameter estimations were not affected by BRDF effects if image data was acquired under optimal viewing and illumination conditions and in flat terrain. But if acquisition took place while the zenith angle was high and the azimuth angle was low, inaccuracies increased by a factor of up to about 3. This negative effect could be mitigated using an appropriate spectral preprocessing such as the first derivative even though this resulted in a certain loss of the overall prediction accuracy.

Results presented here demonstrate that high resolution soil parameter maps of agricultural fields can be derived on a regional scale with multi-annual imaging spectroscopy data. Prediction accuracy both for laboratory and imaging spectroscopy does not come up to conventional analytical techniques in the laboratory. Model errors are at least about two times higher than SES and SEL, taking into account that model errors include measurement errors of reference values. The great benefit of imaging spectroscopy is the provision of information on soil parameters as a two-dimensional array rather than single data points, whereby inaccuracies associated with spatial interpolation are avoided. Because of frequent plowing on croplands, surface reflectance measured by hyperspectral instruments does not only reflect the characteristics of the upper five centimeters but very likely represents the soil characteristics of the upper 20 to 30 cm. As this layer is most prone to degradation, imaging spectroscopy can provide valuable information for soil conservation measures on agricultural land.

Acknowledgments

The authors wish to thank Andreas Müller, Martin Bachmann, and Stefanie Holzwarth from the Imaging Spectroscopy team of the German Aerospace Centre (DLR) for their assistance in geometric and atmospheric correction of the image data and several valuable hints for the spectral data processing. The GFZ German Research Centre for Geosciences is thanked for providing its laboratory spectroscopy facilities. Special thanks go to Sabine Chabrillat and Sören Haubrock for their encouraging and helpful comments. The authors are also grateful to the staff of ZALF central laboratory at the Leibniz-Centre for Agricultural Landscape Research Müncheberg for their collaboration concerning soil data analysis.

References

[1] FAO, World Soil Charter, Food and Agriculture Organization of the United Nations (FAO), 1982.
[2] R. Lal, “Soil carbon sequestration to mitigate climate change,” Geoderma, vol. 123, no. 1-2, pp. 1–22, 2004.
[3] S. Sleutel, S. De Neve, B. Singier, and G. Hofman, “Organic C levels in intensively managed arable soils—long-term regional trends and characterization of fractions,” Soil Use and Management, vol. 22, no. 2, pp. 188–196, 2006.
[4] E. Goidts and B. van Wesemael, “Regional assessment of soil organic carbon changes under agriculture in Southern Belgium (1955–2005),” Geoderma, vol. 141, no. 3-4, pp. 341–354, 2007.
[5] A. Reijneveld, J. van Wensem, and O. Oenema, “Soil organic carbon contents of agricultural land in the Netherlands between 1984 and 2004,” Geoderma, vol. 152, no. 3-4, pp. 231–238, 2009.
[6] R. C. Dalal and R. J. Henry, “Simultaneous determination of moisture, organic carbon, and total nitrogen by near infrared reflectance spectrophotometry,” Soil Science Society of America Journal, vol. 50, no. 1, pp. 120–123, 1986.
[7] E. Ben-Dor and A. Banin, “Near-infrared analysis as a rapid method to simultaneously evaluate several soil properties,” Soil Science Society of America Journal, vol. 59, no. 2, pp. 364–372, 1995.
[8] J. B. Reeves, G. W. McCarty, and J. J. Meisinger, “Near infrared reflectance spectroscopy for the analysis of agricultural soils,” Journal of Near Infrared Spectroscopy, vol. 7, no. 3, pp. 179–193, 1999.
[9] K. D. Shepherd and M. G. Walsh, “Development of reflectance spectral libraries for characterization of soil properties,” Soil Science Society of America Journal, vol. 66, no. 3, pp. 988–998, 2002.
[10] D. J. Brown, K. D. Shepherd, M. G. Walsh, M. Dewayne Mays, and T. G. Reinsch, “Global soil characterization with VNIR diffuse reflectance spectroscopy,” Geoderma, vol. 132, no. 3-4, pp. 273–290, 2006.
[11] R. A. Viscarra Rossel, D. J. J. Walvoort, A. B. McBratney, L. J. Janik, and J. O. Skjemstad, “Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties,” Geoderma, vol. 131, no. 1-2, pp. 59–75, 2006.
[12] G. R. Hunt and J. W. Salisbury, “Visible and near-infrared spectra of minerals and rocks. I. Silicate minerals,” Modern Geology, vol. 1, pp. 283–300, 1970.
[13] R. N. Clark, T. V. V. King, M. Klejwa, G. A. Swayze, and N. Vergo, “High spectral resolution reflectance spectroscopy of minerals,” Journal of Geophysical Research, vol. 95, no. 8, pp. 12–680, 1990.
[14] E. Ben-Dor, J. R. Irons, and G. Epema, “Soil reflectance,” in Remote Sensing for the Earth Sciences: Manual of Remote Sensing, A. N. Rencz, Ed., vol. 3, chapter 3, pp. 111–188, John Wiley & Sons, New York, NY, USA, 3rd edition, 1999.
[15] A. Palacios-Orueta and S. L. Ustin, “Remote sensing of soil properties in the Santa Monica Mountains I. Spectral analysis,” Remote Sensing of Environment, vol. 65, no. 2, pp. 170–183, 1998.
[16] G. Krüger, J. Erzinger, and H. Kaufmann, “Laboratory and airborne reflectance spectroscopic analyses of lignite overburden dumps,” Journal of Geochemical Exploration, vol. 64, no. 1–3, pp. 47–65, 1999.
[17] S. Chabrillat, A. F. H. Goetz, L. Krosley, and H. W. Olsen, “Use of hyperspectral images in the identification and mapping of expansive clay soils and the role of spatial resolution,” Remote Sensing of Environment, vol. 82, no. 2-3, pp. 431–445, 2002.
[18] E. Ben-Dor, K. Patkin, A. Banin, and A. Karnieli, “Mapping of several soil properties using DAIS-7915 hyperspectral scanner data—a case study over soils in Israel,” International Journal of Remote Sensing, vol. 23, no. 6, pp. 1043–1062, 2002.
[19] T. Selige, J. Böhner, and U. Schmidhalter, “High resolution topsoil mapping using hyperspectral image and field data in multivariate regression modeling procedures,” Geoderma, vol. 136, no. 1-2, pp. 235–244, 2006.
[20] A. Stevens, T. Udelhoven, A. Denis et al., “Measuring soil organic carbon in croplands at regional scale using airborne imaging spectroscopy,” Geoderma, vol. 158, no. 1-2, pp. 32–45, 2010.
[21] C. Gomez, R. A. Viscarra Rossel, and A. B. McBratney, “Soil organic carbon prediction by hyperspectral remote sensing and field vis-NIR spectroscopy: an Australian case study,” Geoderma, vol. 146, no. 3-4, pp. 403–411, 2008.
[22] P. Lagacherie, F. Baret, J. B. Feret, J. Madeira Netto, and J. M. Robbez-Masson, “Estimation of soil clay and calcium carbonate using laboratory, field and airborne hyperspectral measurements,” Remote Sensing of Environment, vol. 112, no. 3, pp. 825–835, 2008.
[23] A. Stevens, B. Van Wesemael, G. Vandenschrick, S. Touré, and B. Tychon, “Detection of carbon stock change in agricultural soils using spectroscopic techniques,” Soil Science Society of America Journal, vol. 70, no. 3, pp. 844–850, 2006.
[24] B. S. Siegal and A. F. H. Goetz, “Effects of vegetation on rock and soil type discrimination,” Photogrammetric Engineering and Remote Sensing, vol. 43, no. 2, pp. 191–196, 1977.
[25] L. Kooistra, J. Wanders, G. F. Epema, R. S. E. W. Leuven, R. Wehrens, and L. M. C. Buydens, “The potential of field spectroscopy for the assessment of sediment properties in river floodplains,” Analytica Chimica Acta, vol. 484, no. 2, pp. 189–200, 2003.
[26] R. J. Murphy and G. Wadge, “The effects of vegetation on the ability to map soils using imaging spectrometer data,” International Journal of Remote Sensing, vol. 15, no. 1, pp. 63–86, 1994.
[27] R. J. Murphy, “The effects of surficial vegetation cover on mineral absorption feature parameters,” International Journal of Remote Sensing, vol. 16, no. 12, pp. 2153–2164, 1995.
[28] H. Bartholomeus, L. Kooistra, A. Stevens et al., “Soil Organic Carbon mapping of partially vegetated agricultural fields with imaging spectroscopy,” International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 1, pp. 81–88, 2011.
[29] W. Ouerghemmi, C. Gomez, S. Naceur, and P. Lagacherie, “Applying blind source separation on hyperspectral data for clay content estimation over partially vegetated surfaces,” Geoderma, vol. 163, no. 3-4, pp. 227–237, 2011.
[30] H. Bartholomeus, G. Epema, and M. Schaepman, “Determining iron content in Mediterranean soils in partly vegetated areas, using spectral reflectance and imaging spectroscopy,” International Journal of Applied Earth Observation and Geoinformation, vol. 9, no. 2, pp. 194–203, 2007.
[31] A. Rodger and T. Cudahy, “Vegetation corrected continuum depths at 2.20 µm: an approach for hyperspectral sensors,” Remote Sensing of Environment, vol. 113, no. 10, pp. 2243–2257, 2009.
[32] C. Gomez, P. Lagacherie, and G. Coulouma, “Continuum removal versus PLSR method for clay and calcium carbonate content estimation from laboratory and airborne hyperspectral measurements,” Geoderma, vol. 148, no. 2, pp. 141–148, 2008.
[33] L. Kooistra, R. S. E. W. Leuven, R. Wehrens, P. H. Nienhuis, and L. M. C. Buydens, “A comparison of methods to relate grass reflectance to soil metal contamination,” International Journal of Remote Sensing, vol. 24, no. 24, pp. 4995–5010, 2003.
[34] A. Stevens, B. van Wesemael, H. Bartholomeus, D. Rosillon, B. Tychon, and E. Ben-Dor, “Laboratory, field and airborne spectroscopy for monitoring organic carbon content in agricultural soils,” Geoderma, vol. 144, no. 1-2, pp. 395–404, 2008.
[35] G. B. M. Heuvelink and R. Webster, “Modelling soil variation: past, present, and future,” Geoderma, vol. 100, no. 3-4, pp. 269–301, 2001.
[36] P. Leinweber, H. R. Schulten, and M. Körschens, “Seasonal variations of soil organic matter in a long-term agricultural experiment,” Plant and Soil, vol. 160, no. 2, pp. 225–235, 1994.
[37] S. E. Trumbore, O. A. Chadwick, and R. Amundson, “Rapid exchange between soil carbon and atmospheric carbon dioxide driven by temperature change,” Science, vol. 272, no. 5260, pp. 393–396, 1996.
[38] Y. Wang, R. Amundson, and X. F. Niu, “Seasonal and altitudinal variation in decomposition of soil organic matter inferred from radiocarbon measurements of soil CO2 flux,” Global Biogeochemical Cycles, vol. 14, no. 1, pp. 199–211, 2000.
[39] J. Rogasik, E. Schnug, and H. Rogasik, “Landbau und treibhauseffekt—quellen und senken für CO2 bei unterschiedlicher Landbewirtschaftung,” Archives of Agronomy and Soil Science, vol. 45, no. 2, pp. 105–121, 2000.
[40] F. Ellmer and M. Baumecker, “Static nutrient depletion experiment Thyrow. Results after 65 experimental years,” Archives of Agronomy and Soil Science, vol. 51, no. 2, pp. 151–161, 2005.
[41] L. S. Jensen, T. Mueller, N. E. Nielsen et al., “Simulating trends in soil organic carbon in long-term experiments using the soil-plant-atmosphere model DAISY,” Geoderma, vol. 81, no. 1-2, pp. 5–28, 1997.
[42] T. Dann and U. Ratzke, “Böden,” in Geologie von Mecklenburg-Vorpommern. Kap. 6. 12, E. Schweizerbart'Sche Verlagsbuchhandlung, G. Katzung, Ed., pp. 489–508, Nägele u. Obermiller, Stuttgart, Germany, 2004.
[43] DWD, “DWD—Deutscher Wetterdienst, Deutscher Klimaatlas,” 2012, http://www.dwd.de/klimaatlas.
[44] D. F. Malley, P. D. Martin, and E. Ben-Dor, “Application in analysis of soils,” in Near-Infrared Spectroscopy in Agriculture, C. A. Roberts, J. Workman Jr., and J. B. Reeves III, Eds., pp. 729–784, American Society of Agronomy, Madison, Wis, USA, 2004.
[45] J. L. Lozán and H. Kausch, Angewandte Statistik für Naturwissenschaftler. 3. Überarbeitete und ergänzte Auflg., Wissenschaftliche Auswertungen, Hamburg, Germany, 2004.
[46] A. Savitzky and M. J. E. Golay, “Smoothing and differentiation of data by simplified least squares procedures,” Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[47] Exelis VIS, Exelis Visual Information Solutions, 2011.
[48] R. Richter, “Atmospheric/topographic correction for airborne imagery,” 2008, ATCOR-4 User Guide, Version 4.3.
[49] R. Richter and D. Schläpfer, “Geo-atmospheric processing of airborne imaging spectrometry data—part 2: atmospheric/topographic correction,” International Journal of Remote Sensing, vol. 23, no. 13, pp. 2631–2649, 2002.
[50] R. Müller, M. Lehner, P. Reinartz, and M. Schroeder, “Evaluation of spaceborne and airborne line scanner images using a generic ortho image processor,” in High Resolution Earth Imaging for Geospatial Information, C. Heipke, K. Jacobsen, and M. Gerke, Eds., vol. 36 of International Archives of Photogrammetry and Remote Sensing, pp. 17–20, High Resolution Earth Imaging for Geospatial Information, Hannover, Germany, 2005.
[51] K. Islam, B. Singh, and A. McBratney, “Simultaneous estimation of several soil properties by ultra-violet, visible, and near-infrared reflectance spectroscopy,” Australian Journal of Soil Research, vol. 41, no. 6, pp. 1101–1114, 2003.
[52] P. H. Fidêncio, R. J. Poppi, and J. C. De Andrade, “Determination of organic matter in soils using radial basis function networks and near infrared spectroscopy,” Analytica Chimica Acta, vol. 453, no. 1, pp. 125–134, 2002.
[53] M. Cohen, R. S. Mylavarapu, I. Bogrekci, W. S. Lee, and M. W. Clark, “Reflectance spectroscopy for routine agronomic soil analyses,” Soil Science, vol. 172, no. 6, pp. 469–485, 2007.
[54] R. A. Viscarra Rossel, “ParLeS: software for chemometric analysis of spectroscopic data,” Chemometrics and Intelligent Laboratory Systems, vol. 90, no. 1, pp. 72–83, 2008.
[55] H. Martens and T. Næs, Multivariate Calibration, John Wiley & Sons, Guildford, UK, 1989.
[56] C. W. Chang, D. A. Laird, M. J. Mausbach, and C. R. Hurburgh, “Near-infrared reflectance spectroscopy—principal components regression analyses of soil properties,” Soil Science Society of America Journal, vol. 65, no. 2, pp. 480–490, 2001.
[57] LUNG, Landesamt für Umwelt, and Naturschutz und Geologie Mecklenburg Vorpommern, Beiträge Zum Bodenschutz in Mecklenburg-Vorpommern, Böden in Mecklenburg-Vorpommern, Güstrow, Germany, 2003.
[58] O. Düwel and J. Utermann, “Humusversorgung der (Ober-)Böden in Deutschland—status quo,” in Humusversorgung von Böden in Deutschland, R. F. Hüttl, A. Prechtel, and O. Bens, Eds., Publikationen des Umweltbundesamtes, Abschnitt II, Kap. 8.1, 2008.
[59] D. B. Lobell and G. P. Asner, “Moisture effects on soil reflectance,” Soil Science Society of America Journal, vol. 66, no. 3, pp. 722–727, 2002.
[60] C. S. T. Daughtry, E. R. Hunt, C. L. Walthall, T. J. Gish, S. Liang, and E. J. Kramer, “Assessing the spatial distribution of plant litter,” in Proceedings of the 10th AVIRIS Earth Science and Applications Workshop, pp. 105–114, NASA, Jet Propulsion, Pasadena, Calif, USA, March 2001.
[61] P. L. Nagler, C. S. T. Daughtry, and S. N. Goward, “Plant litter and soil reflectance,” Remote Sensing of Environment, vol. 71, no. 2, pp. 207–215, 2000.
[62] P. L. Nagler, Y. Inoue, E. P. Glenn, A. L. Russ, and C. S. T. Daughtry, “Cellulose absorption index (CAI) to quantify mixed soil-plant litter scenes,” Remote Sensing of Environment, vol. 87, no. 2-3, pp. 310–325, 2003.
[63] D. J. Brus, B. Kempen, and G. B. M. Heuvelink, “Sampling for validation of digital soil maps,” European Journal of Soil Science, vol. 62, no. 3, pp. 394–407, 2011.
[64] T. H. Waiser, C. L. S. Morgan, D. J. Brown, and C. T. Hallmark, “In situ characterization of soil clay content with visible near-infrared diffuse reflectance spectroscopy,” Soil Science Society of America Journal, vol. 71, no. 2, pp. 389–396, 2007.
[65] D. Cozzolino and A. Morón, “The potential of near-infrared reflectance spectroscopy to analyse soil chemical and physical characteristics,” Journal of Agricultural Science, vol. 140, no. 1, pp. 65–71, 2003.
[66] A. Volkan Bilgili, H. M. van Es, F. Akbas, A. Durak, and W. D. Hively, “Visible-near infrared reflectance spectroscopy for assessment of soil properties in a semi-arid area of Turkey,” Journal of Arid Environments, vol. 74, no. 2, pp. 229–238, 2010.
[67] S. Wold, M. Sjöström, and L. Eriksson, “PLS-regression: a basic tool of chemometrics,” Chemometrics and Intelligent Laboratory Systems, vol. 58, no. 2, pp. 109–130, 2001.
[68] E. Ben-Dor, Y. Inbar, and Y. Chen, “The reflectance spectra of organic matter in the visible near-infrared and short wave infrared region (400–2500 nm) during a controlled decomposition process,” Remote Sensing of Environment, vol. 61, no. 1, pp. 1–15, 1997.
[69] J. Workman Jr. and L. Weyer, Practical Guide to Interpretive Near-Infrared Spectroscopy, CRC Press, Taylor & Francis Group, Boca Raton, Fla, USA, 2008.
[70] C. W. Chang and D. A. Laird, “Near-infrared reflectance spectroscopic analysis of soil C and N,” Soil Science, vol. 167, no. 2, pp. 110–116, 2002.
[71] T. Udelhoven, C. Emmerling, and T. Jarmer, “Quantitative analysis of soil chemical properties with diffuse reflectance spectrometry and partial least-square regression: a feasibility study,” Plant and Soil, vol. 251, no. 2, pp. 319–329, 2003.
[72] M. von Schönermark, B. Geiger, and H. Röser, Eds., Reflection Properties of Vegetation and Soil with a BRDF data base, Wissenschaft und Technik, Berlin, Germany, 2004.
[73] H. Kaufmann, L. Guanter, K. Segl et al., “EnMAP—an advanced optical payload for earth observation,” in Proceedings of ASD and IEEE GRS Art, Science and Applications of Reflectance Spectroscopy Symposium, Boulder, Colo, USA, 2010.

Hindawi Publishing Corporation
Applied and Environmental Soil Science
Volume 2012, Article ID 971252, 20 pages
doi:10.1155/2012/971252

Research Article

A Comparison of Feature-Based MLR and PLS Regression Techniques for the Prediction of Three Soil Constituents in a Degraded South African Ecosystem

Anita Bayer,1 Martin Bachmann,1 Andreas Müller,1 and Hermann Kaufmann2

1 Department of Land Surface, German Remote Sensing Data Center (DFD), German Aerospace Center (DLR), Oberpfaffenhofen, 82234 Weßling, Germany
2 Remote Sensing Section, Department of Geodesy and Remote Sensing, German Research Centre for Geosciences (GFZ), Telegrafenberg, 14473 Potsdam, Germany

Correspondence should be addressed to Anita Bayer, [email protected]

Received 16 February 2012; Revised 20 April 2012; Accepted 21 May 2012

Academic Editor: Eyal Ben-Dor

Copyright © 2012 Anita Bayer et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The accurate assessment of selected soil constituents can provide valuable indicators to identify and monitor land changes coupled with degradation which are frequent phenomena in semiarid regions. Two approaches for the quantification of soil organic carbon, iron oxides, and clay content based on field and laboratory spectroscopy of natural surfaces are tested. (1) A physical approach which is based on spectral absorption feature analysis is applied. For every soil constituent, a set of diagnostic spectral features is selected and linked with chemical reference data by multiple linear regression (MLR) techniques. (2) Partial least squares regression (PLS) as an exclusively statistical multivariate method is applied for comparison. Regression models are developed based on extensive ground reference data of 163 sampled sites collected in the Thicket Biome, South Africa, where land changes are observed due to intensive overgrazing. The approaches are assessed upon their prediction performance and significance in regard to a future quantification of soil constituents over large areas using imaging spectroscopy.

1. Introduction

The soil, as the upper layer of the Earth's surface, is the most important layer for the energy and nutrient flows necessary for the development of vegetation and is thus of key importance for landscape analysis. The characterization of an ecosystem's soil condition and its spatial and temporal changes are vital indicators for, and particularly in agricultural ecosystems directly linked to, crop production. In semiarid regions, land cover changes coupled with degradation and soil erosion are frequent phenomena which may be the result of long-term management practices or may be linked to climate change. In particular, the depletion of carbon inventories in soils is accentuated by soil degradation and erosion [1] and directly causes not only environmental but also economic problems. In the semiarid subtropical Thicket Biome, part of the Eastern Cape Province of South Africa, land changes are observed due to decades of overgrazing by goats. This has caused the unique ecosystem to change from dense shrubland with a high rate of carbon sequestration in vegetation and peripheral soils to an open savannah-like system. Chemical and physical soil attributes can serve as tracers to assess and monitor such phenomena. However, mapping their spatial distribution and temporal development with conventional soil analyses is limited, since this would require intensive sampling and analysis efforts.

In contrast, field and imaging spectroscopy provides a time- and cost-effective tool for mapping selected chemical and physical soil attributes over large areas. The soil constituents can be determined based on diagnostic absorption characteristics inherent in a soil's reflectance spectrum. In general, the direct application of diagnostic spectral features for quantification purposes is limited, as many features are influenced by (1) the overlapping of spectral features and (2) additional effects of other, mostly physical, soil properties (e.g., surface roughness) which influence the spectral reflectance in a nonsignificant way.

This paper presents two approaches for the quantification of soil organic carbon, iron oxides, and clay content based on field and laboratory spectroscopy. The first is a physical approach based on spectral absorption feature analysis. For every soil constituent, a set of distinct spectral features is selected according to their presence in the spectra and their documentation in the respective literature. They are linked with chemical ground reference information by multiple linear regression techniques. This approach is set up to take advantage of several spectral features and characteristics, and of various properties describing them, to gather as much as possible of the information content provided by physical features. It is tested whether the advantage of combining several spectral features, although each spectral feature might be influenced, results in prediction models that reach the performance of statistical techniques. As a second method, the commonly used partial least squares regression (PLS), an exclusively statistical multivariate method, is applied for comparison. The suitability of PLS techniques for the mapping of several soil constituents is well known (e.g., [2]). The two approaches are investigated for their prediction performance and significance for three spectral datasets measured in the field and laboratory and for two spectral resolutions (full ASD resolution and the spectral resolution of the HyMap sensor). The objective of this research is to allow for a robust quantification of carbon inventories in semiarid soils and additionally to provide indicators for ecosystem function in relation to land degradation.

In the next step, these approaches will be applied for a future large-scale quantification of soil constituents for a South African study site in the Thicket Biome. Hyperspectral imagery (approximately 320 km²) of the HyMap airborne sensor [3] was acquired over this area in 2009, coupled with an extensive collection of spectral and chemical ground reference data (163 reference plots). In the Thicket Biome, accurate spatial information can be used to quantify the consequences of unsustainable land management techniques and to detect eroded or degraded areas in their primary stage.

2. State of the Art in Soil Spectroscopy

Many studies exist where constituents of the upper soil layer are described based on spectral data from a laboratory or field environment. Examples include soil organic carbon (e.g., [4–7]), soil iron oxides (e.g., [8–10]), and soil texture together with clay content (e.g., [7, 11, 12]).

Increasingly in recent years, airborne spectrometer data have been used for predictions of soil constituents. A direct transfer of approaches developed on laboratory or field data to an airborne scanner is limited by technical aspects depending on the sensor system, such as a lower spatial and spectral resolution, increased signal-to-noise ratio, atmospheric influences, BRDF effects, and spectral mixing within pixels. A recent summary of key studies using imaging spectroscopy to study soil properties can be found in Ben-Dor et al. [13].

Viscarra Rossel et al. [2] outline the potential of different spectral domains in combination with different methods for the derivation of quantitative soil information by using soil spectroscopy. Partial least squares regression (PLS) as a multivariate calibration procedure is a well-established method for these purposes and was often used in previous studies (e.g., [14–16]). PLS applies statistically derived spectral characteristics to describe the relationship between chemical and spectral information. However, the significance and transferability of PLS models built upon local observations to a regional scale is often limited because of their high degree of adaptation (see, e.g., [15]). To allow for a mapping of soil constituents for ecosystems of a regional extent, more robust algorithms are needed. In previous studies, approaches directly applying spectral absorption features or band indices are considered to be more suitable since they are physically based (e.g., [17]). These approaches appear more robust and allow for a transfer to regions of similar environmental conditions. In the following, a special focus is set on studies applying spectral feature parameters for soil quantification, in particular those based on airborne sensors.

The importance of soil organic carbon for soil stability and fertility, together with its contribution to the global carbon cycle, resulted in global initiatives to quantify ecosystem carbon stocks and their changes. Hence, there have been very intense efforts to use spectroscopy for these purposes, resulting in various studies dealing with the quantification of soil organic carbon (e.g., [14, 15, 18, 19]). Besides multivariate techniques, the estimation of soil organic carbon was done in particular based on an overall decrease in reflectance in the VIS/NIR region (see [4]) and also using spectral indices (e.g., [4, 6]).

In a similar way, several studies have applied absorption feature parameters for a quantification of iron oxide content. Richter et al. [20] used the band depth of the 900 nm feature and evaluated the spectral influence of variable soil textures on prediction accuracy for a laboratory setup. The enrichment of iron oxides in Israeli sand dunes (rubification) was mapped by Ben-Dor et al. [21] using the CASI airborne sensor, by application of the redness index as an iron-related band index.

For a large-scale mapping of clay minerals, Chabrillat et al. [22] applied the varying shape of the 2200 nm feature of smectite, illite, and kaolinite to detect and map their spatial distribution from airborne AVIRIS and HyMap data of Colorado, USA, while Gomez et al. [23] and also Lagacherie et al. [24] successfully applied the continuum-removed band depth of the 2200 nm feature to quantify clay content.

3. Study Area and Data Collection

Subtropical Thicket is a unique vegetation type located in the semiarid valleys that cover around 17% of the Eastern Cape Province of South Africa [25]. Among these, more than 70% of all vegetation units are considered as “moderately” to “severely” degraded [26]. Thicket is an unusual vegetation type for a semiarid environment since an almost complete cover of dense vegetation is present in pristine conditions. Thicket vegetation comprises a high component of the succulent shrub Portulacaria afra [27], which accounts for large proportions of carbon in its biomass and the peripheral

soils. Intensive goat farming since the early 1900s resulted in the loss of P. afra and the transformation of wide parts of Thicket vegetation to an open savannah-like system, which is accompanied by a severe loss of biodiversity and ecosystem carbon stocks [25]. Fence lines between farms of different utilization show pristine and transformed vegetation in direct neighborhood (Figure 1).

Figure 1: High-contrast land cover characteristics are found at fence lines between pasture and game farms in the Eastern Cape Province, South Africa. The pasture farm on the left side is highly degraded whereas the land on the right side is only slightly influenced and is nearly pristine Thicket shrub vegetation. The consequences of unsustainable land use can be detected and monitored using IS.

The study area is located near the city of Port Elizabeth (33.0°S/25.3°E; see Figure 2). It was selected within the Thicket Biome with an extent of 75 × 4 km, so that it includes the highest variance of various Thicket vegetation classes. It splits into two parts separated by a national park. Terrain elevation changes from 300 to 930 m in the northern part to 130 to 400 m in the southern section. The study area has a warm semiarid climate with monthly mean temperatures ranging from 13°C in winter months to 25°C in summer months (data by [28]). Annual precipitation averages 200 to 400 mm. The underlying mudstones and sandstones have resulted in the development of loamy and sandy soils at the surface, mostly cambisols and luvisols (own investigations and [29]).

3.1. Field Sampling. During two sampling campaigns, 163 natural, nonagricultural sites were sampled for ground truth, 96 in June 2009 and 67 in September/October 2009. The plots were randomly distributed within topographic units and the two sections of the study area, as shown in Figure 2. Most of the samples were taken in areas where bare soil is the dominant land cover fraction. The number and variability of sites sampled should result in a complete coverage of the variance of soils present. Each surveyed spot was chosen to represent the typical conditions of the surrounding area. For each plot, the general location-specific information (GPS coordinates, terrain information, land use, etc.), field spectra of the exposed soils, and soil samples were collected. Additional information about vegetation coverage density and type as well as surface conditions was assessed for the center spot (1 × 1 m) and its surroundings (10 × 10 m).

Soil samples of the representative spot included the top layer to a maximum depth of around 1 cm and were limited to the crust if a physical crust was present at the surface. For 38 plots, field spectral measurements were not possible due to overcast conditions; thus soil field spectra are only available for 125 of the sampled sites.

Airborne HyMap imaging spectrometer data were acquired over the study area in October 2009 with a ground resolution of 3.3 m. These data are used for appropriate spectral resampling of the point spectral datasets but are not in the scope of this study and thus not described here in detail.

3.2. Laboratory Chemical Analysis. Soil samples were chemically analyzed for organic carbon and dithionite-extractable iron using the Walkley-Black analysis [30] and the citrate-bicarbonate-dithionite extraction method (CBD, [31]), respectively. Particle size was determined in five fractions by the pipette method [32]. The analysis showed the presence of sandy and loamy soils with generally low organic carbon contents. None of the histograms shows a normal distribution (Figure 3). For all parameters, lower concentrations were overrepresented. For organic carbon, only two samples exceed 3.5% and show high contents of 5.7 and 5.9%. Both of these samples were taken at extraordinary sites: one sample was taken in an area affected by droppings of grazing animals, while at the other sample site a partial grass layer covers the bare soil surface, which might cause an increased carbon input to the soil. Iron oxides, which primarily are the result of the weathering of the underlying sand- and mudstones, range up to 10.6%. Field surveys show that the distribution of clay and sand in overlying soils is directly related to the underlying parent material. Clay and sand concentrations increase where mudstones and sandstones, respectively, occur close to the surface. All samples were used for further analysis, even though the chemical contents of some samples differ from the distribution of the rest of the population (see Figure 3).

3.3. Spectral Reference Data. The measurement protocol resulted in three spectral datasets from field and laboratory environments: (1) in situ field spectra of the undisturbed soil surface, (2) bare soil field spectra, where any cover of small stones that was present was removed, and (3) laboratory spectra.

For spectroradiometric measurements, a portable FieldSpec Pro 2 spectrometer (Analytical Spectral Devices, Inc.) was used. The measurements were taken with the bare fibre (FOV of 25°) and were directly converted to reflectance using a Spectralon panel as a reference. Measurement height was kept constant at 1.15 m, resulting in a footprint diameter of approximately 50 cm. For spots where the undisturbed bare soil fraction was very small, an alternative height of 0.5 m was applied. If small stones (mm to cm range) overlaid the surface, measurements were carried out first of the unchanged surface (in situ) and then only of the bare soil surface with the stone cover carefully removed. Stone coverage for around half of the sites was low (<3%). Despite this, particularly in regions where soils are shallow and developed

Figure 2: Detailed view of the HyMap imagery of the two transects west of Somerset East and in the Kirkwood vicinity in the Eastern Cape Province, South Africa, showing the distribution of the sampled sites. Each transect section comprises three HyMap flight lines. [Map panels: location of the study area within South Africa; distribution of Thicket vegetation in the Eastern Cape Province; northern section (57.2 × 4 km); southern section (22.5 × 4 km). Legend: distribution of Thicket vegetation; sample points of June 2009 and of September/October 2009.]

upon mudstones, it was significantly higher and ranged up to around 50%.

Figure 3: Histogram distribution with statistical parameters for 164 topsoil samples collected in the Thicket Biome, South Africa, analyzed for soil organic carbon, iron oxides, and clay content. [Three histogram panels of frequency versus content (%): (a) Corg, (b) iron oxides, (c) clay.]
Statistical parameters shown in the panels:
Corg: min 0.21%, max 5.85%, mean 1.21%, stdev 0.89%
Iron oxides: min 0.9%, max 10.62%, mean 3.06%, stdev 1.44%
Clay: min 0%, max 23.8%, mean 6.46%, stdev 5.08%

After the samples were dried, ground to break up aggregates, and sieved, the soil fraction smaller than 2 mm was spectrally measured in the laboratory. Constant illumination conditions were provided by two quartz halogen lamps (300 W each) at a zenith angle of 30° and a distance between spectrometer probe and sample surface of 15 cm. After five measurements were taken with the bare fibre, each sample was turned by 90° and measured again to reduce illumination effects of the soil surface. Averaged field and laboratory spectra were smoothed using a Savitzky-Golay filter [33]. Additionally, for the spectra measured in the field, the ranges of atmospheric water bands were interpolated from 1345 to 1445 nm and 1800 to 1960 nm.

Since the quantification methods are developed to be applied to hyperspectral imagery, each dataset was used both in full ASD resolution, with 2087 spectral bands for the laboratory dataset and 1829 bands for the field datasets (382–2468 nm, 1 nm resampling), and additionally resampled to the spectral characteristics of the HyMap imagery with 116 selected bands (456–2455 nm, spectral resolution 13–17 nm).

The different measurement setups are reflected in the statistics of the spectral libraries (Figure 4). The mean reflectance of the spectra measured in the laboratory is higher than that of the two field datasets because the pretreatment applied to the samples beforehand (homogenizing and sieving) prevents shadow effects that may be caused by soil aggregates or extraneous material (Figure 4(a)). The controlled laboratory conditions result in very low standard deviations. In contrast, the spectral variance of the field measurements is stretched by influences that are not characteristic for specific soil constituents, such as surface condition and illumination effects (Figures 4(b) and 4(c)). In the in situ field measurements, average soil reflectance is reduced by small stones overlying the soil surface. Detailed statistics show that the presence of stones also introduced further variability to the measurements; thus the standard deviation is higher compared to the corresponding bare soil field measurements. Bare soil field spectra in particular present large extremes in maximum and minimum spectra (Figure 4(c)). Where physical soil crusts are present, the smooth soil surface in combination with a reduced size of the particles at the soil surface increases soil reflectance. Thus the reflectance of individual samples measured in a field environment exceeded the reflectance of the corresponding laboratory spectra.

4. Methodology

Two methods are applied to establish a quantitative relationship between the chemical contents of the surveyed soil constituents and the spectral signal. Approach A is a physically based model, where spectral feature analysis is coupled with multiple linear regression techniques, while approach B applies partial least squares regression as an exclusively multivariate statistical method. 75% of the data (94 samples for the field datasets, 123 samples for the laboratory dataset) were selected for training by random stratified sampling based on the chemical reference of each soil constituent and used for model calibration, while the remaining 25% (31 and 40 samples, respectively) were used as a test set to validate the calibrated models. The training and test sets are thus representative of the distribution of the chemical reference values as they occur in the South African study area. Even though the test samples are not used for model calibration, they cannot be considered completely independent from the training set due to the spatial proximity of the measurements.

4.1. Approach A: Multiple Linear Regression of Spectral Feature Parameters. This approach applies a set of spectral features, found in previous literature to be characteristic for soil organic carbon, iron oxides, and clay content, for subsequent multiple linear regression analysis. Since specific spectral properties which are related to physical absorption processes are used, this approach is seen as a physical model. It was developed to take advantage of the combination of several spectral features and characteristics. By parameterization of these spectral properties, one takes advantage of existing physically based information.

4.1.1. Selection of Spectral Features. The selection of diagnostic spectral characteristics is based on their presence in the

Figure 4: Comparison of mean spectra of the three spectral libraries (a) of in situ field (solid), field bare soil (dashed), and laboratory measurements (dotted). Statistical overview of 125 in situ field soil measurements (b), 125 field bare soil (c), and 163 laboratory spectra (d) with mean (solid), positive and negative standard deviation (dashed), minimum and maximum (dotted). All spectra are given in full ASD spectral resolution. [Four panels (a) to (d): reflectance (%) versus wavelength (nm).]

spectra and on well-known literature, for example, [7, 8, 34–40]. From this existing information only the most important and strongest features are selected to increase the robustness of the algorithm. Features unique to one soil constituent are preferred, but not all spectral characteristics can fulfil this requirement. Table 1 lists the spectral features that are selected to be used in this study and references to a summary of previous studies where these features were described for enriched soils or nonsynthetic mineral powders.

The selected spectral characteristics are divided into three feature types: absorption features (AF), features of the spectral curve (CF), and features of its continuum as the convex hull of a spectrum (HF). With these three types it is assumed that all specific spectral characteristics are covered that are described in previous studies as significant for soil organic carbon, iron oxides, and clay content.

Table 1: Spectral features that are used for the delineation of soil organic carbon, iron oxides, and clay content within approach A. References to studies previously describing these spectral characteristics are given.

| Constituent | Feature and type | Wavelength | Assignment | Reference |
|---|---|---|---|---|
| Corg | (i) 1730 nm (AF); CR: 1600–1815 nm; λmax: not defined | 1650/1669 nm | 2υ of aromatic CH stretch | [5, 7] |
| | | ≈1700 nm | 2υ of CH stretch | [34] |
| | | 1706/1754 nm | 2υ of CH, alkyl doublet | [7] |
| | | 1706 nm | 4υ of aliphatic CH stretch | [7] |
| | | 1726 nm | 2υ of aliphatic CH stretch | [5] |
| | | 1761/1769 nm | 2υ of aliphatic CH stretch | [5] |
| | | 1730–1852 nm | 4υ of methyl CH | [7] |
| | (ii) 2330 nm (AF); CR: 2240–2410 nm; λmax: 2330 nm | 2275/2279 nm | 3υ of CH2, CH3 | [5, 7] |
| | | ≈2300 nm | υ + υ of CH stretch | [34] |
| | | 2307–2469 nm | 3υ of methyl CH stretch | [7] |
| | | 2309 nm | 3υ of aliphatic CH stretch | [5] |
| | | 2331 nm | 3υ of CH2, COO | [5] |
| | | 2337/2386 nm | 3υ of COO, CH3 | [5] |
| | | 2347 nm | 3υ of aliphatic CH stretch | [5] |
| | | 2381 nm | 3υ of CO stretch of carbohydrates | [7] |
| | (iii) 450–740 nm (HF) | | Decrease in reflectance in the visible range | (e.g., [4–6, 35]) |
| | (iv) 1460–1750 nm (HF) | | Decrease in reflectance in the near to shortwave infrared range | (e.g., [4, 6, 35]) |
| Iron oxides | (i) 550 nm (AF); CR: 450–680 nm; λmax: 550 nm | ≈490 nm | ET band of Fe3+ | [7, 36] |
| | | 503 nm | Goethite | [37] |
| | | ≈510 nm | ET band of Fe2+ | [36, 38] |
| | | 529 nm | ET band of hematite | [7] |
| | | 535 nm | Hematite | [37] |
| | | ≈550 nm | ET band of Fe2+, hematite | (e.g., [36, 38, 39]) |
| | (ii) 700 nm (AF); CR: 580–800 nm; λmax: 700 nm | 650 nm | ET band of hematite and goethite | [7, 39] |
| | | 665 nm | ET band of goethite | [37] |
| | | 700 nm | 4υ of O–H | [7] |
| | | ≈700 nm | ET band of Fe3+ | (e.g., [35, 36, 38]) |
| | (iii) 900 nm (AF); CR: 750–1300 nm; λmax: 870 nm | ≈850–870 nm | ET band of hematite | [39, 40] |
| | | 860 nm | ET band of Fe2+ and Fe3+ | [34] |
| | | 868 nm | Hematite | [37] |
| | | ≈870 nm | ET band of Fe3+ | [35, 36, 38] |
| | | 884 nm | ET band of hematite | [7] |
| | | ≈900 nm | Transition bands of Fe2+ and Fe3+ | (e.g., [34, 38]) |
| | | 900–930 nm | ET band of goethite | [39] |
| | | 920 nm | ET band of goethite | [7] |
| | | ≈930 nm | 3υ of O–H stretch, goethite | [7, 37, 40] |
| | | 940 nm | υ + υ of O–H stretch | [7] |
| | | 1000–1100 nm | ET band of Fe2+ | (e.g., [35, 36]) |
| | | 1025 nm | ET band of Fe3+ | [8] |
| | | 1075 nm | ET band of Fe3+ | [8] |
| | (iv) 550–590 nm (CF) | | Decrease in reflectance in the blue wavelength range towards the ultraviolet light | (e.g., [34, 38, 39]) |
| | (v) 450–750 nm (HF) | | Decrease in reflectance in the visible range | (e.g., [34, 38, 39]) |
| Clay | (i) 2200 nm (AF); CR: 2100–2290 nm; λmax: 2206 nm | 2160/216… + 2208/2209 nm | δ of AlOH bend of kaolinite doublet and υ + υ of OH stretch | [7, 37, 50] |
| | | ≈2200 nm | δ of AlOH bend and υ + υ of OH stretch of montmorillonite and illite | (e.g., [35, 40, 50, 51]) |
| | | 2200/2204 nm | Montmorillonite | [37] |
| | | 2206 nm | υ of OH stretch of montmorillonite and illite | [7] |
| | | 2208 nm | δ of AlOH bend of kaolinite doublet | [7] |
| | | 2216 nm | Illite | [37] |
| | | 2230 nm | δ of AlOH bend of smectites | [7] |
| | (ii) 2340 nm (AF); CR: 2270–2410 nm; λmax: 2340 nm | 2308/2312 nm | Kaolinite | [37] |
| | | 2336 nm | Illite | [37] |
| | | 2340 nm | υ of OH stretch of illite | (e.g., [7, 22, 50]) |
| | | 2372/2376 nm | Kaolinite | [37] |
| | (iii) 450–700 nm (HF) | | Increase in reflectance in the visible range | (e.g., [34, 42]) |
| | (iv) 1460–1750 nm (HF) | | Increase in reflectance in the near to shortwave infrared range | (e.g., [34, 42]) |

CR: range where continuum removal is performed; λmax: wavelength of maximal absorption predominantly found in literature; υ: overtone absorption bands; υ + υ: combination bands of fundamental and overtone absorptions; ET: electronic transition bands; δ: fundamental absorption bands.

Spectral detection and assessment of soil organic carbon is challenging since the numerous spectral features inherent to organic materials are often weak, matching the complex chemistry of organic matter. Among the absorptions that previous studies found to be characteristic for molecules and combinations of organic matter, the wavelength ranges around 1730 and 2330 nm are selected because they are identified as the most distinct in previous studies. The absorptions around 1730 nm in particular are unique for organic carbon, since in this wavelength region no other absorptions of common minerals present in soils and rocks occur. In both wavelength regions, several second and third overtone absorptions of functional groups assigned to cellulose, lignin, starch, pectin, glucan, protein, wax, and humic acid as components of soil organic matter overlap and in combination lead to a noticeable absorption (e.g., [5, 34]). The general decrease in reflection of organic-carbon-rich soils, for instance, in the visible due to the darkness of humic acid, is included as a feature of the spectral continuum. Several studies employ different wavelength regions in order to describe this property (e.g., [5, 6]) or tested the performance of various spectral indices to detect soil organic carbon (e.g., [4]). In this study, the two ranges in the visible region between 450 and 740 nm and in the near to shortwave infrared range between 1460 and 1750 nm are used to describe the characteristic decrease in reflectance resulting from the presence of soil organic carbon (HF).

Two selected features of organic matter are ambiguous but can still be applied since the local conditions of the study area rule out any influences. First, this applies to the absorption near 2330 nm, which is also inherent to carbonate minerals; however, there is no carbonate bedrock present in the study area, and carbonate precipitation sediments occur only very locally and mostly in deeper soil layers. Second, in addition to the influence of soil organic carbon, the overall reflectance is also affected by soil moisture content and grain size distribution. However, an influence of soil moisture can be neglected, because all soils were dry on the day of and before the time of the survey. This is also shown by local weather data [41]. The effect of grain size on overall reflectance and also a feature close to the carbon feature at 2330 nm are also included as spectral characteristics for the determination of clay content, but it is not possible to resolve the proportion of each of the two constituents.

For the determination of soil iron oxides, we apply the strong absorptions occurring in the visible range around 550 and 700 nm and near 900 nm. All of them result from electronic transition bands of the bi- and trivalent iron ions as compounds of the most common iron oxides goethite and hematite (e.g., [34, 35, 39]). Electronic transition bands usually appear as broad absorptions. The decrease in reflectance in the blue towards the ultraviolet region as a result of a broad absorption band in the UV range is described by various authors (e.g., [38, 39]) and included as two features: first, the change in reflectance in the blue wavelength range as the slope of the spectral curve between 550 and 590 nm (CF), and second, the shape of the continuum over the entire VIS range (450 to 740 nm, HF), to pick up the shape of the curve with a reduced influence of the 550 nm absorption.

The spectroscopic determination of the clay content is done using the prominent AlOH absorption feature around 2200 nm that is inherent to all clay minerals and a minor hydroxyl feature around 2340 nm. Since both features result from fundamental and overtone absorptions of functional groups inherent to clay minerals, they occur as pronounced and narrow absorptions. As illite and montmorillonite are reported as the dominant clay minerals in the study area, 2206 nm was chosen as the center wavelength of the 2200 nm absorption. Further, the spectral effect of the grain size distribution was used as an indirect indicator for the clay content in a soil. The effect of an increasing content of finer grain sizes leading to an increase in reflectance (e.g., [34, 42]) is analyzed as shape and mean reflectance of the spectral curve in the VIS and also in the NIR/SWIR range (HF).

4.1.2. Feature Parameterization. In feature parameterization, the spectral datasets are analyzed for the selected spectral characteristics of the three types introduced and are transferred to numerical parameters that describe the shape of

[Figure 5, panels: (a) Absorption features (AFs): normalized reflectance versus wavelength (nm), annotated with Sleft, Sright, w, Aleft, Aright, dmax, and λdmax; (b) Curve features (CFs): reflectance (%) versus wavelength (nm), annotated with s; (c) Hull features (HFs): reflectance (%) versus wavelength (nm), annotated with s and r.]

Figure 5: Parameterization of variables for the three types of described spectral features (AF, CF, and HF) used for the determination of soil organic carbon, iron oxides, and clay. Solid line: reflectance curve, dotted: continuum of the reflectance curve.

the spectral features (see Figure 5). These spectral feature variables are used for subsequent regression analysis. Table 1 lists parameters of the applied spectral features that are used for parameterization.

Spectral absorption features (AF) are parameterized from continuum removed reflectance spectra to isolate them from the trend of overall reflection and allow for intercomparison (see Figure 5(a)). Continuum removal (see [43]) is calculated individually for a defined interval around the feature being mapped (CR in Table 1). An AF is described by the following six variables, similar to spectral absorption feature analysis described by various authors (e.g., [37, 44, 45]): depth (dmax) and wavelength (λdmax) of maximal absorption, absorption depth at the supposed characteristic wavelength according to the respective literature (dλlit), feature width (w) as distance between the two determined feature shoulders (Sleft/right), the area between normalized continuum and spectral curve (A), and asymmetry (AS = Aleft/Aright). Note that for the AF near 1730 nm, only five variables are calculated because no center wavelength (λlit) is defined, since this feature consists of several weaker absorptions overlapping in this wavelength region (see Tables 1 and 2).

To describe the significant shape of the spectral curve in a defined wavelength range, two characteristics are introduced. (1) Curve features (CF) describe changes in reflectance occurring in a specific wavelength range (see Figure 5(b)), for example, the one triggered by the strong absorption of iron oxides in the ultraviolet. They are characterized only by the mean slope of the spectral curve (s), calculated from a line fitted in the given wavelength range (see Table 1). (2) Features of the convex hull (HF) describe a soil constituent's effect on a broader spectral region that does not produce a distinct absorption (see Figure 5(c)). One region in the VIS (450 to 740 nm) and a second in the SWIR range (1460 to 1750 nm) were selected.
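The absorption feature variables described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `parameterize_af` is hypothetical, the continuum is assumed to be a straight line between the two end points of the CR interval (a simplification of the continuum removal in [43]), the feature shoulders are assumed to coincide with the interval end points, and the area is approximated with the trapezoidal rule.

```python
def parameterize_af(wavelengths, reflectance, lit_wavelength=None):
    """Return AF variables for one spectrum segment covering one CR interval.

    Assumptions (not from the paper): linear continuum between the interval
    end points, shoulders at the interval end points, trapezoidal area.
    """
    w0, w1 = wavelengths[0], wavelengths[-1]
    r0, r1 = reflectance[0], reflectance[-1]
    # Straight-line continuum and continuum-removed (normalized) reflectance.
    continuum = [r0 + (r1 - r0) * (w - w0) / (w1 - w0) for w in wavelengths]
    normalized = [r / c for r, c in zip(reflectance, continuum)]
    depth = [1.0 - v for v in normalized]

    # Index of maximal absorption depth.
    i_max = max(range(len(depth)), key=depth.__getitem__)

    def area(lo, hi):
        # Area between the normalized continuum (1.0) and the spectral curve.
        return sum(
            0.5 * (depth[i] + depth[i + 1]) * (wavelengths[i + 1] - wavelengths[i])
            for i in range(lo, hi)
        )

    a_left = area(0, i_max)
    a_right = area(i_max, len(depth) - 1)
    result = {
        "d_max": depth[i_max],
        "lambda_dmax": wavelengths[i_max],
        "width": w1 - w0,
        "area": a_left + a_right,
        "asymmetry": a_left / a_right if a_right > 0 else float("inf"),
    }
    if lit_wavelength is not None:
        # Depth at the band closest to the literature wavelength.
        i_lit = min(
            range(len(wavelengths)),
            key=lambda i: abs(wavelengths[i] - lit_wavelength),
        )
        result["d_lit"] = depth[i_lit]
    return result
```

Omitting `lit_wavelength` yields five variables, which mirrors the treatment of the AF near 1730 nm.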

Table 2: Number of spectral features used and total number of calculated variables for determination of key soil constituents.

        Absorption features (6 var.)   Curve features (1 var.)   Hull features (2 var.)   Total number of variables
Corg    2¹                             0                         2                        15
Iron    3                              1                         1                        21
Clay    2                              0                         2                        16

¹For the AF near 1730 nm, only five variables are calculated (see text).

These wavelength regions of the two HF are used for the determination of multiple soil constituents. HF features are parameterized by mean slope (s) and mean reflectance (r) of the convex hull of the spectrum in a defined wavelength range so that they describe only the spectral shape without the influence of distinct local absorptions. Both variables of HF features are calculated by using a line fit.

The parameterization of n spectra and m spectral variables results in an m × n sized matrix. Table 2 shows the number of spectral features analyzed for the determination of soil organic carbon, iron oxides, and clay, and the resulting number of variables that describe the spectral shape and subsequently are used in regression analysis to model soil parameter contents.

4.1.3. Model Calibration by MLR Analysis. Multiple linear regression analysis is used to establish a functional relationship between spectral variables and chemical reference data. Figure 6 summarizes crucial steps of the work flow applied in the feature-based regression approach. MLR is conducted separately for each soil constituent using the ground reference data collected in the field campaigns. The group of spectral variables resulting from the parameterization of the selected spectral characteristics is initially checked against three criteria to ensure their suitability for MLR analysis: (1) the normalized standard deviation of each variable's values (standard deviation in relation to mean) must be higher than 0.001 to exclude very small variables which would only cause an undesirable statistical adaptation during regression analysis. Such low standard deviations, for instance, would be reached for the depth of an absorption feature that is only weakly established or only occurs in some of the spectra. (2) Spectral variables are checked for redundant values, which may be a result of a reduced spectral resolution (e.g., dmax and dλlit), and (3) the variables are analyzed for their variance to identify variables with a highly invariant part that are likely to be controlled by a superordinate factor (e.g., sensor band positions). These types of spectral variables are not suitable for regression analysis and are excluded from the analysis.

Calculated feature variables are subsequently standardized to allow for a comparison and ranking of regression coefficients. Together with chemical reference values related to the soil constituent under consideration, multiple regression analysis is performed, resulting in an initial relationship without further intervention. Leave-one-out cross-validation is conducted to analyze the collection of spectral variables for significance and to check the sample population for outliers. Spectral variables are considered insignificant for regression analysis if the absolute value of a regression coefficient's mean is smaller than twice its standard deviation (see [46]) determined in leave-one-out cross-validation. Spectral variables found to be insignificant were excluded from further analysis since they provide no additional value for model development and might cause statistical adaptation. A sample i is identified as an outlier if the absolute deviation of R_i^2 from the mean of all R_i^2 is higher than twice the standard deviation of all R_i^2 determined in leave-one-out cross-validation. Samples identified as outliers may be removed from the population in this step by manual interaction if a proper reason is evident (e.g., problems with sampling or analysis). The final multiple linear regression model is established based on the significant feature variables and including all samples. Soil parameter prediction models are calibrated for the training spectral datasets of both spectral resolutions. The models are evaluated in test set validation to assess the predictability of each model.

4.2. Approach B: Partial Least Squares Regression. For comparison, calibration models were built upon the same spectral datasets using partial least squares regression, as it is a well-established chemometric technique and often applied in spectroscopy. PLS is based on a projection of the predictor (x) and response (y) variables into a set of latent variables (or PLS factors) and corresponding scores, minimizing the dimensionality of the data while maximizing the covariance between x and y variables. A detailed description of the PLS technique is given by [47]. For PLS modeling the ParLes software was applied [48].

The spectral datasets were preprocessed with one or a combination of two of in total 11 manipulation methods (transformation of reflectance (R) to log(1/R), 5 light scattering and baseline corrections such as multiplicative scatter correction (MSC), standard normal variate transform (SNV), wavelet detrending, and so forth, calculation of 1st and 2nd derivative, mean centering, variance scaling of the data, and a combination of the latter two). The performance of each preprocessing setting was evaluated in leave-one-out cross-validation (CV). For each setting, the optimal number of latent variables to be used for modeling was examined by the variation of the root mean square error (RMS) and the Akaike information criterion (AIC) as a function of the number of latent variables. The optimal number of PLS factors was determined at local minima of RMS and AIC within a steady trend of these two factors. The one preprocessing setting in combination with its corresponding optimal number of PLS factors performing best in cross-validation was selected and subsequently used for model calibration based on the observations of the training set. Calibrated models were further applied to the test set for validation purposes (see Figure 6).

If several preprocessing settings provided similar CV accuracies, then model calibration and validation were performed for each setting. The PLS model with the highest accuracies in both calibration and validation, as well as a minimal difference between them, was selected. This way the most significant and robust PLS prediction models were retrieved and overfitting of the models was prevented.
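The significance criterion of approach A (a variable is insignificant if the absolute mean of its regression coefficient is smaller than twice its standard deviation across leave-one-out fits) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names `solve`, `fit_mlr`, and `loo_significance` are hypothetical, the regression is solved through the normal equations, the sample standard deviation is used, and the standardization step described in the text is omitted for brevity.

```python
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, ..., bm] via normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def loo_significance(X, y):
    """Flag each variable: True if |mean coefficient| >= 2 * std over LOO fits."""
    fits = [fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:]) for i in range(len(y))]
    flags = []
    for j in range(1, len(fits[0])):  # skip the intercept
        coefs = [f[j] for f in fits]
        flags.append(abs(statistics.mean(coefs)) >= 2 * statistics.stdev(coefs))
    return flags
```

Variables flagged `False` would be dropped before the final model is fitted on all samples.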

[Figure 6 flowchart. Approach A (feature-based MLR), boxes in order: pre-processed spectra; feature parameterization; quality control of variables; standardization of variables; chemical reference data; multiple linear regression analysis; leave-one-out cross-validation; exclusion of insignificant variables; final MLR regression analysis; calibrated regression relationship; test set validation. Approach B (partial least squares regression), boxes in order: pre-processed spectra; pre-processing of spectra; leave-one-out cross-validation; selection of optimal number of PLS factors and best pre-processing method; chemical reference data; final PLS regression analysis; calibrated regression relationship; test set validation. Legend: input dataset, manual step, automated processing, output step.]

Figure 6: Work flow for feature-based MLR and PLS regression analysis.

5. Results and Discussion

For each soil constituent, a model was set up using the two modeling approaches, three spectral datasets, and two spectral resolutions, resulting in 12 models per constituent. All samples, though split into training and test sets, were applied for modeling. Model performance was assessed for each method and dataset and compared based on the model's correlation coefficient (R2) for predicted versus measured compositions, root mean square error (RMS), and ratio of performance to deviation (RPD). The RPD is defined as the standard deviation of the reference samples divided by the RMS. This was done for both calibration and validation using the corresponding datasets of training data (94 samples for field and 123 for laboratory measurements) and test data (31/40 samples) (see Tables 3 and 5).

Goodness of predictions is evaluated following the qualitative classifications of Chang et al. [9]. They suggest R2 greater than 0.80 and RPD values greater than 2.0 as indicators for excellent prediction models. Models with R2 between 0.50 and 0.80 and RPD values between 1.4 and 2.0 were considered models of medium quality, which are useful for quantitative predictions in most applications. Models with R2 lower than 0.50 and RPD lower than 1.4 are to be ranked as not usable.

5.1. Approach A: Multiple Linear Regression of Spectral Feature Parameters. Details of the calibration models established using approach A are given in Table 3. The number of spectral variables found insignificant and excluded from the final model development depends on their variation during cross-validation. Thus, the number of significant variables varies for all models established for a certain parameter. Table 4 gives the five spectral variables which show the highest coefficients in the regression equation of every feature-based regression model and thus are the most important ones for model development. Their influence is determined based on the regression coefficient and given in % of the summed absolute values of all regression coefficients. Negatively signed influences indicate negative regression coefficients. The absolute values of the regression coefficients, and thus the influences determined therefrom, mainly depend on the number and character of participating spectral variables. Thus, their absolute values cannot be expected to be identical for different models. Nonetheless, their ranges are significant and used to identify the most important spectral variables.

Soil Organic Carbon. All prediction models for soil organic carbon provide excellent calibration accuracies with R2Cal between 0.75 and 0.81, RMSCal around 0.45%, and all RPDCal higher than 2.0. The good correlation is also represented in the scatter plots of measured versus calculated soil organic carbon concentrations, shown as an example for the model built upon bare soil field spectra in HyMap's spectral resolution (Figure 8). The accuracy of the test set validation is good especially for the laboratory datasets and in the same range as the calibration (RPDVal close to 2.0). These models particularly showed good RMSVal around 0.35%. Validation accuracy is slightly lower but reasonable for the field datasets (R2Val between 0.49 and 0.62, RMSVal between 0.43 and 0.54%, RPDVal between 1.26 and 1.57).

The models apply 12 to 15 of 16 calculated spectral variables. In the models developed based on field spectra, the influences concentrate on few variables, and both variables describing the two hull features are of great importance (see Table 4). In all these models, r_HF_SWIR1, s_HF_VIS, and r_HF_VIS are the dominant factors in the regression

Table 3: Calibration and validation accuracies for approach A—Multiple linear regression of spectral parameters. The models further to be applied for a large-scale prediction of soil constituents based on HyMap imagery are highlighted.

Calibration (94¹/123² samples): R2Cal, RMSCal, RPDCal. Validation (31¹/40² samples): R2Val, RMSVal, RPDVal.

Resolution  Constituent  Spectral dataset     No. of spectral variables  R2Cal  RMSCal  RPDCal  R2Val  RMSVal  RPDVal
ASD         Corg         (1) In situ field    12 of 16                   0.77   0.47    2.09    0.49   0.54    1.26
ASD         Corg         (2) Bare soil field  14 of 16                   0.79   0.44    2.22    0.56   0.48    1.42
ASD         Corg         (3) Laboratory       13 of 16                   0.75   0.47    2.01    0.74   0.36    1.93
ASD         Iron oxides  (1) In situ field    19 of 21                   0.63   0.80    1.64    0.24   0.90    1.07
ASD         Iron oxides  (2) Bare soil field  15 of 21                   0.64   0.78    1.68    0.21   0.93    1.04
ASD         Iron oxides  (3) Laboratory       19 of 21                   0.75   0.76    2.01    0.23   1.63    0.69
ASD         Clay         (1) In situ field    14 of 16                   0.31   4.12    1.20    0.01   5.13    0.88
ASD         Clay         (2) Bare soil field  11 of 16                   0.21   4.39    1.13    0.03   4.80    0.94
ASD         Clay         (3) Laboratory       16 of 16                   0.23   4.60    1.14    0.05   4.52    1.00
HyMap       Corg         (1) In situ field    13 of 16                   0.79   0.44    2.19    0.51   0.54    1.27
HyMap       Corg         (2) Bare soil field  12 of 16                   0.81   0.43    2.28    0.62   0.43    1.57
HyMap       Corg         (3) Laboratory       15 of 16                   0.77   0.46    2.08    0.77   0.35    2.00
HyMap       Iron oxides  (1) In situ field    18 of 21                   0.62   0.80    1.64    0.17   1.04    0.93
HyMap       Iron oxides  (2) Bare soil field  19 of 21                   0.66   0.76    1.73    0.26   0.93    1.03
HyMap       Iron oxides  (3) Laboratory       20 of 21                   0.73   0.79    1.94    0.25   1.58    0.71
HyMap       Clay         (1) In situ field    14 of 16                   0.28   4.20    1.18    0.00   5.05    0.89
HyMap       Clay         (2) Bare soil field  14 of 16                   0.17   4.49    1.11    0.01   4.91    0.92
HyMap       Clay         (3) Laboratory       12 of 16                   0.25   4.53    1.16    0.05   4.62    0.98

¹Number of samples in training and test set for field datasets. ²Number of samples in training and test set for laboratory datasets.
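The accuracy measures reported in Table 3 and the quality classes after Chang et al. [9] can be sketched as below. The function names are hypothetical, and several details are assumptions rather than statements from the paper: R2 is taken as the squared Pearson correlation between measured and predicted values, RMS uses the divisor n, RPD uses the sample standard deviation of the measured values, and values exactly on a class boundary are assigned to the medium class. Each measure is classified separately because the text does not say how mixed cases are to be combined.

```python
import math
import statistics


def evaluate(measured, predicted):
    """Return (R2, RMS, RPD) for measured versus predicted values."""
    n = len(measured)
    rms = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    # RPD: standard deviation of the reference values divided by the RMS.
    rpd = statistics.stdev(measured) / rms
    # R2 as squared Pearson correlation (assumption).
    mm, mp = statistics.mean(measured), statistics.mean(predicted)
    cov = sum((m - mm) * (p - mp) for m, p in zip(measured, predicted))
    ss_m = sum((m - mm) ** 2 for m in measured)
    ss_p = sum((p - mp) ** 2 for p in predicted)
    r2 = cov ** 2 / (ss_m * ss_p)
    return r2, rms, rpd


def quality_class(value, excellent, medium):
    """Classify one accuracy measure against two thresholds."""
    if value > excellent:
        return "excellent"
    if value >= medium:
        return "medium"
    return "not usable"


def classify_rpd(rpd):
    return quality_class(rpd, 2.0, 1.4)


def classify_r2(r2):
    return quality_class(r2, 0.80, 0.50)
```

For example, the HyMap bare soil field model for Corg in Table 3 (RPDCal 2.28, RPDVal 1.57) would be classified as excellent in calibration and medium in validation on the basis of RPD.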

relationships (see Figure 5 for abbreviations of spectral variables). Together they account for an influence of more than 60%. The last two show a negative influence, as expected, since increasing contents of soil organic carbon reduce reflectance in the visible and thus affect slope and mean reflectance. Although the same effect was expected for r_HF_SWIR1, its coefficient is positive. In the models developed based on the laboratory dataset, the influences are distributed over more spectral variables, and in particular the variables describing the 1730 nm absorption are predominant. A_AF1730, r_HF_SWIR1, and dmax_AF1730 show up as important factors.

Calibration and validation accuracies show that all prediction models for soil organic carbon are of high quality according to the applied classification [9]. All models reach the criteria of excellent prediction models based on RPDCal, while their validation performance is well within medium-class models. The selection of spectral variables that account for the most influence on regression relationships is very constant and seems to be reasonable especially for field data, where the variables of the spectral continuum show up as the most influential ones. However, for the laboratory dataset, the influences are significantly different and more connected to the specific absorption around 1730 nm.

Iron Oxides. Calibrated iron oxides' prediction models show R2Cal between 0.62 and 0.75, RMSCal around 0.80%, and RPDCal between 1.64 and 2.01, again with significantly better performance of the laboratory spectra. Even if R2Cal values for all iron oxides prediction models are reasonable, lower RPDCal values already indicate reduced predictability. Validation results confirm this with very low R2Val (0.17 to 0.26), increased RMSVal, and low RPDVal. Especially the models built upon the laboratory soil spectra provide significantly lower validation accuracies. Scatter plots of validation samples revealed the poor prediction of one sample which is characterized by an unusually high reflectance in the VIS and is probably not representative of the spectral variation. In total, feature-based iron oxides' prediction models thus do not reach the quality of the soil organic carbon prediction models, with significantly poorer validation accuracies.

Most of the iron oxides' prediction models are built with 18 to 20 of the 21 spectral variables describing spectral feature properties. One model applies only 15 variables. In the models developed based on field spectra, A_AF900, A_AF550, dmax_AF550, dmax_AF700, and dλlit_AF900 are of prominent importance (see Table 4). Maximal absorption depths are usually positively signed, as expected. It is noticeable that the area of the 900 nm absorption feature (A_AF900) is consistently signed negatively. For the models developed based on laboratory spectra, dmax_AF900, dλlit_AF900, A_AF550, and A_AF900 appear as the most important spectral variables. Also within the models built on the laboratory spectra, negative coefficients of A_AF900 can be observed for the characteristic absorption at 900 nm. This may suggest

Table 4: Influence (I) of the five most important spectral variables on the regression equation of each of the feature-based regression models of approach A. Columns: spectral resolution (ASD, HyMap), constituent (Corg, iron oxides, clay), spectral dataset ((1) in situ field, (2) bare soil field, (3) laboratory), and variable with I [%] for ranks 1 to 5. Symbols: absorption features (AFs): A_AF: area, dmax_AF: maximum depth, λdmax_AF: wavelength of dmax, dλlit_AF: depth at wavelength position given in literature, AS_AF: asymmetry factor. Hull features (HFs): r_HF: mean reflectance in interval, s_HF: slope in interval.
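The influence measure used in Table 4 (each regression coefficient expressed in % of the summed absolute values of all coefficients, keeping its sign) can be sketched as follows. The function names and the coefficient values in the usage note are hypothetical and serve only as an illustration.

```python
def influences(coefficients):
    """Signed influence of each variable in % of the summed absolute coefficients."""
    total = sum(abs(c) for c in coefficients.values())
    return {name: 100.0 * c / total for name, c in coefficients.items()}


def top_variables(coefficients, k=5):
    """Return the k variables with the largest absolute influence."""
    infl = influences(coefficients)
    return sorted(infl.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
```

With illustrative coefficients `{"r_HF_SWIR1": 3.0, "s_HF_VIS": -1.5, "r_HF_VIS": -0.5}`, the influences are 60%, -30%, and -10%; the negative signs indicate negative regression coefficients.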

[Figure 7, panels: (a) Regression coefficients (Corg); (b) Regression coefficients (iron oxides). Axes in both panels: regression coefficients versus wavelength (nm).]

Figure 7: Regression coefficients of PLS models for the prediction of soil organic carbon (a) and iron oxides (b) built upon bare soil field spectra in HyMap's spectral resolution of 116 bands. Shape and range of spectral regression coefficients are related to the selected data pretreatment method (A: log(1/R) + first derivative, B: Mean center).

that with increasing iron oxides' content more pronounced and steep features occur instead of broad ones.

Altogether, the iron oxides' prediction model calibrations are of medium quality. Validation reveals drawbacks in prediction accuracy. The selection of important spectral variables includes attributes of the prominent iron absorptions, especially the one around 900 nm.

Clay. The calibration of feature-based clay prediction models leads to low correlations with R2Cal between 0.17 and 0.31 and RMSCal constantly above 4.0% for both spectral resolutions. Results of validation using the independent test set are poor and show no significant correlation (RMSVal up to 5.13%, RPDVal not above 1.00).

Soil clay content prediction models are built based on 11 to all 16 derived spectral variables. A_AF2200, dmax_AF2200, dλlit_AF2200, and r_HF_SWIR1 frequently appear among the most important spectral variables used in the multiple linear regression equations of all six models (see Table 4). However, considering all models, the influences observed for spectral variables are highly variable. There seems to be no connection between the selection of specific variables and the field or laboratory origin of the spectral base data. Some variables (e.g., A_AF2200, r_HF_SWIR1) appear consistently positively signed, whereas, for instance, dλlit_AF2200 and dmax_AF2200 occur both positively and negatively signed.

The reduced calibration accuracies, the poor validation accuracies, and the lack of a significant pattern within the spectral variables represented in the regression equation and their specific influence indicate that the feature-based regression approach fails to establish significant prediction models for soil clay content. Developed regression relationships indicate a high degree of statistical adaptation and finally a low significance of each model.

5.2. Approach B: Partial Least Squares Regression. Table 5 gives an overview of the prediction models for the selected soil constituents using partial least squares regression techniques. The application of the different pretreatment methods allowed the derivation of consistent PLS prediction models, which apply between 4 and 9 PLS factors for regression analysis. In over 40% of the models, mean centering of the data proved to be the best pretreatment technique, while transformation of reflectance (R) to log(1/R) and the first derivative were each the appropriate method for 11% of the models. In the remaining cases, the combination of two pretreatment methods performed best. Regression coefficients of established PLS models provide information about the importance of certain wavelengths for the regression model. Although they are highly variable depending on the settings of each model, some of their information can be related to physical soil properties.

Soil organic carbon prediction models for the three spectral datasets were very consistent for a number of pretreatment techniques, showing only slight differences. Calibration accuracies of the best PLS models for every dataset in terms of R2Cal ranged between 0.77 and 0.82, and RPDCal ranged between 2.11 and 2.38 for both applied spectral resolutions, both with low RMSCal slightly above 0.4%. Test set validation results of organic carbon models are also of good quality (R2Val between 0.53 and 0.79, RMSVal between 0.32 and 0.52%, RPDVal between 1.30 and 2.11), with highest accuracies for laboratory spectra and considerably lower for in situ field spectra. Nonetheless, organic carbon prediction models established by partial least squares techniques indicate good performance in both calibration and validation and do not indicate model overfitting. The regression coefficients of most soil organic carbon PLS models indicate a high importance of wavelengths in the VIS range, which may be related to soil colour (see Figure 7(a) for an example). Distinct features additionally occurring in the SWIR can be linked to absorptions of functional groups of soil organic carbon (see Table 1).

For the calibration of iron oxides prediction models, consistently 7 or more PLS factors were needed to set up

Table 5: Calibration and validation accuracies for approach B—partial least squares regression. The models further to be applied for a large-scale prediction of soil constituents based on HyMap imagery are highlighted.

Calibration (94¹/123² samples): R2Cal, RMSCal, RPDCal. Validation (31¹/40² samples): R2Val, RMSVal, RPDVal.

Resolution  Constituent  Spectral dataset     Pretreatment                   No. of factors  R2Cal  RMSCal  RPDCal  R2Val  RMSVal  RPDVal
ASD         Corg         (1) In situ field    1st derivative                 6               0.79   0.45    2.16    0.53   0.52    1.30
ASD         Corg         (2) Bare soil field  MSC + 1st derivative           6               0.82   0.41    2.38    0.65   0.42    1.61
ASD         Corg         (3) Laboratory       Mean center + 1st derivative   6               0.82   0.40    2.36    0.69   0.45    1.53
ASD         Iron oxides  (1) In situ field    Mean center                    7               0.66   0.76    1.74    0.39   0.84    1.14
ASD         Iron oxides  (2) Bare soil field  Mean center                    8               0.69   0.73    1.80    0.43   0.80    1.20
ASD         Iron oxides  (3) Laboratory       Log(1/R) + mean center         9               0.82   0.64    2.39    0.43   0.98    1.14
ASD         Clay         (1) In situ field    Mean center                    7               0.23   4.34    1.15    0.02   4.63    0.97
ASD         Clay         (2) Bare soil field  Log(1/R)                       5               0.11   4.67    1.06    0.06   4.42    1.02
ASD         Clay         (3) Laboratory       Mean center                    8               0.33   4.31    1.22    0.08   4.43    1.02
HyMap       Corg         (1) In situ field    Log(1/R) + mean center         5               0.77   0.46    2.11    0.59   0.47    1.43
HyMap       Corg         (2) Bare soil field  Log(1/R) + 1st derivative      5               0.82   0.41    2.38    0.62   0.45    1.51
HyMap       Corg         (3) Laboratory       Log(1/R)                       9               0.79   0.43    2.20    0.79   0.32    2.11
HyMap       Iron oxides  (1) In situ field    Mean center                    7               0.67   0.75    1.75    0.43   0.76    1.26
HyMap       Iron oxides  (2) Bare soil field  Mean center                    7               0.66   0.76    1.73    0.45   0.77    1.25
HyMap       Iron oxides  (3) Laboratory       Log(1/R) + mean center         9               0.81   0.67    2.28    0.49   0.88    1.27
HyMap       Clay         (1) In situ field    Mean center                    4               0.14   4.58    1.09    0.01   5.06    0.89
HyMap       Clay         (2) Bare soil field  Mean center                    5               0.15   4.56    1.09    0.03   4.47    1.01
HyMap       Clay         (3) Laboratory       1st derivative                 8               0.37   4.16    1.27    0.10   4.43    1.02

¹Number of samples in training and test sets for field datasets. ²Number of samples in training and test sets for laboratory datasets.
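Three of the pretreatments named in Table 5 can be illustrated with a minimal sketch. It does not reproduce the ParLes implementation [48]: the function names are hypothetical, the logarithm is assumed to be base 10, the first derivative is assumed to be a simple finite difference between neighbouring bands, and mean centering is assumed to subtract the per-band mean over all spectra.

```python
import math
import statistics


def log_inverse(spectrum):
    """Transform reflectance R (as a fraction) to log10(1/R)."""
    return [math.log10(1.0 / r) for r in spectrum]


def first_derivative(spectrum, wavelengths):
    """Finite-difference first derivative between neighbouring bands."""
    return [
        (spectrum[i + 1] - spectrum[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(spectrum) - 1)
    ]


def mean_center(spectra):
    """Subtract the per-band mean over all spectra (rows are spectra)."""
    means = [statistics.mean(band) for band in zip(*spectra)]
    return [[v - m for v, m in zip(row, means)] for row in spectra]
```

A combined setting such as log(1/R) + first derivative corresponds to applying `log_inverse` and then `first_derivative` to each spectrum.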

significant models. The PLS models show correlations (R2Cal) between 0.66 and 0.82 for all six combinations of spectral datasets and spectral resolutions. Resulting RMSCal errors between 0.64 and 0.76% and RPDCal between 1.73 and 2.39 indicate slightly lower prediction ability than for organic carbon. Test set validation of the established iron oxides' prediction models shows reduced accuracies, which is most obvious in a decrease in R2 and RPD, while RMS for most models only slightly increases from calibration to validation (RMSVal between 0.76 and 0.98%). No iron oxides model reaches an RPDVal of 1.4, which would indicate a medium predictive model [9]. PLS regression coefficients of iron oxides' models (example in Figure 7(b)) show an increased importance of wavelengths in the VIS range, which can be related to electronic transition bands of the iron ions, primarily the one around 900 nm (Table 1). In the SWIR, the 2208 nm clay absorption shows up, representing an indirect correlation with features related to clay. In fact, a correlation of iron oxides and clay content could be identified based on chemical reference (R2 of 0.20).

The calibration accuracy of clay predictive models, with R2Cal below 0.4 and RPDCal below 1.3, is below the accuracies of medium prediction models. RMS errors are constantly above 4.0%. Among these models, the best correlations were reached for the laboratory datasets. Corresponding validation results are equally poor. Regression coefficients are highly variable. These results show that for estimations of the clay content in soils no significant prediction model providing sufficient accuracy could be established when using partial least squares regression techniques and based on the ground reference data of the South African study site.

5.3. Comparison of Modeling Approaches for Soil Organic Carbon and Iron Oxides. Of all functional relationships established with the two approaches, spectra measured in a controlled laboratory led to the best models (lowest RMSVal and highest RPDVal), since they are not biased by influences present in a field environment (compare larger variance of field datasets in Figure 4). This was expected and also observed in previous studies (e.g., [24]). Generally, in situ field spectral surveys performed lowest. This is very likely to be a result of small stones naturally overlying the bare soil surface and disturbing the correlation between spectral signature and chemical concentrations. The resampling of the full ASD spectral resolution of the field and laboratory dataset to the reduced HyMap resolution has a low influence on model performances. Similar losses in model accuracy when reducing spectral resolution were reported by various authors (e.g., [8, 24]).

For organic carbon prediction models, the difference in performance between feature-based MLR (approach A) and PLS techniques (approach B) is low. However, the predictability of the feature-based approach is slightly lower than that of PLS techniques. Average RPDCal of the feature-based MLR models for soil organic carbon is 2.15 and 2.27 of

[Figure 8, panels (each row: calibration scatter plot, validation scatter plot, and histogram of calibration residues (%); scatter plots show calculated versus measured content (%) with 1:1 line):
Approach A, Corg: calibration R2Cal 0.81, RMSCal 0.43%, RPDCal 2.28; validation R2Val 0.62, RMSVal 0.43%, RPDVal 1.57.
Approach B, Corg: calibration R2Cal 0.82, RMSCal 0.41%, RPDCal 2.38; validation R2Val 0.62, RMSVal 0.45%, RPDVal 1.51.
Approach A, iron oxides: calibration R2Cal 0.66, RMSCal 0.76%, RPDCal 1.73; validation R2Val 0.26, RMSVal 0.93%, RPDVal 1.03.
Approach B, iron oxides: calibration R2Cal 0.66, RMSCal 0.76%, RPDCal 1.73; validation R2Val 0.45, RMSVal 0.77%, RPDVal 1.25.]

Figure 8: Correlation plots and histograms of calibration residues for the prediction of soil organic carbon and iron oxides using the two presented modeling approaches and developed based on bare soil field spectra in HyMap's spectral resolution of 116 bands. Approach A: multiple linear regression of spectral parameters, approach B: partial least squares regression.

the PLS models, while validation accuracies are comparable for both approaches (average RMSVal of 0.44%, RPDVal of 1.58 for both approaches). For approach A the most important spectral variables in the regression models were selected very consistently and are significant.

Prediction models for iron oxides provide average calibration correlation coefficients around 0.67 for approach A and 0.72 for approach B with RPDCal of 1.77 and 1.95, respectively. Both approaches show RMSCal around 0.75%. Lower RPDCal and higher root mean square errors of the iron calibrations of both approaches generally indicate a reduced predictive power, which is confirmed in validation results using the independent test set (R²Val for all models is below 0.50, RPDVal of 0.91 as average of approach A models and 1.21 of approach B models). Thus, also within the iron oxides' prediction models, the ones established using PLS regression techniques perform slightly better. With these results the iron oxides' prediction models are of medium quality for both approaches, even though they cannot reach prediction accuracies of organic carbon models.

As the detailed results show, the model accuracies for soil organic carbon and iron oxides predictions, determined by a combination of spectral feature analysis and linear multiple regression techniques, are in the same range that statistical PLS approaches provide. However, the performance of approach A is slightly lower for both soil organic carbon and iron oxides compared to the PLS approach. The difference is small for soil organic carbon models but more apparent for the prediction of iron oxides.

Figure 8 shows scatter plots of measured versus calculated distributions for the prediction of soil constituents using the two modeling approaches and based on bare field spectra in the resolution of the HyMap imagery. These are the models to be applied further to the HyMap imagery for a large-scale prediction of soil parameters. Their statistics are highlighted in Tables 3 and 5. The histograms of the residues are normally distributed. This clearly indicates that the established models are able to model the chemical variance of the selected parameters, even though the chemical contents of the reference samples are not normally distributed (compare Figure 3).

In the chemical reference data, 2 samples for soil organic carbon and 1 sample for iron oxides attract attention because they exceed the overall distribution of the other samples (compare Figure 3). These samples appear in the calibration scatter plots of Figure 8 for both approaches close to the 1 : 1 line, indicating that they might have a great influence on model calibration. But when testing regression models built on the presented spectral libraries excluding these samples, models and results similar to those presented here were achieved.

5.4. Prediction of Clay Content in Soils. All regression models established and evaluated in this study do not adequately predict soil clay content. Average calibration correlations are low for both approaches (R²Cal of 0.23 as average of all models), with high RMSCal (average of 4.41%) and low RPDCal (average of 1.15). Validation fails for both approaches.

An explanation for the observed poor correlation of spectral and chemical information regarding clay content might result from the variable geology of alternating layers of sandstones and mudstones. Although the soils in the study area developed under the same climatic conditions, the formation of soils highly depends on the source material, and thus soils which developed based on sandstones very likely differ in their chemical and spectral properties from ones which developed based on mudstones. In a field environment the presence of soil physical crusts may be one additional factor. Soil physical crusts in the study area can locally be very well developed and reach over 1 cm in thickness. They form during rain drop impacts on the uncovered soil surface and cause a disintegration of soil aggregates and particle movement which results in a thin clay-rich surface layer of low permeability [49]. Soil crusts are problematic for spectral analyses, since the chemistry of the surface layer which is measured by a field spectrometer does not correspond to the chemistry of the bulk sample of the upper 1 cm analyzed in the laboratory.

6. Summary and Conclusions

This study presented two approaches for the quantification of relevant soil parameters from spectral data. Regression models were established for soil organic carbon, iron oxides, and clay content based on 163 samples measured in a laboratory environment, of which 125 samples were additionally measured in two field setups. The spectral datasets were investigated in original ASD spectral resolution and reduced resolution matching calibrated hyperspectral imagery of the study area obtained from the airborne HyMap sensor. The first approach is a physical model based on spectral feature analysis. A combination of selected spectral features is described by numerous variables that are used for multiple linear regression analysis. As second approach, conventional partial least squares regression analysis was chosen for comparison. The best PLS model for each spectral library was selected as combination of an appropriate preprocessing method and the optimal number of latent variables determined in leave-one-out cross-validation.

Results show that the two presented approaches provide similar capabilities to set up significant prediction models, particularly for soil organic carbon and iron oxides. Good and medium-class prediction models could be established (RPDCal of 2.19 for Corg and 1.83 for iron oxides as average of all models of both approaches). Among these, organic carbon models for both approaches showed the best calibration accuracies, with RPDCal of 2.15 for feature-based regression and 2.27 for PLS regression with corresponding RMSCal of around 0.44% (as average of all Corg models of each approach). Iron oxides' models presented good and medium calibration accuracies but in particular showed a reduced validation performance. Both approaches failed to establish significant clay content prediction models (RPDCal of 1.15 and RMSCal of 4.41 as average of all models), which could be a result of the variable geology of the study area.

For the predominant part of the models built for soil organic carbon and iron oxides, the prediction performance of the feature-based regression approach was in the same range as the statistically based PLS regressions provided. Also for the models of both approaches, the same trends were observed, such as reduced validation accuracies compared to calibration for iron oxides models and generally low predictabilities of clay models. However, the feature-based approach in general performed slightly lower than the PLS approach. The difference was small for soil organic carbon predictions (comparable validation performance of MLR and PLS models), though slightly more apparent for iron oxides models. In the feature-based models, the spectral variables dominating the regression relationships were very consistent and support the significance of the established models, though different variables were predominant depending on laboratory or field data as base. The importance of specific wavelengths in PLS models was highly variant as result of the statistical modeling process. A correlation to physical features was only present in some models.

Compared to other studies working in agricultural environments [15, 19], the accuracy of prediction models for both approaches developed in this study is slightly lower. This is likely a result of the large size and the highly variant characteristics of the South African study area caused by its nonagricultural environment and differences in geology and soil types. The difficulty to achieve prediction models for large areas with changing conditions (referred to as global calibrations) was addressed in previous studies (e.g., [19]). Based on AHS-160 data Stevens et al. [19] predicted soil organic carbon using, among others, PLS techniques for an agronomic Luxembourgian region. An RPDVal of 1.47 was obtained with a global calibration, which is in the same range as the prediction models for soil organic carbon established with the two approaches presented in this study. Stevens et al. could improve this RPDVal to 2.76 using local calibrations based on agrogeological regions and soil types. In our case the lack of detailed spatially continuous information matching the variation of geology and soil types in the South African study area prevented the investigation of the performance of local calibrations.

A further factor lowering the prediction RPD is the small variability in measured contents of organic carbon in our study area. The ground reference sites were mainly selected to have a significant bare soil component so that these sites can also be used as validation targets for airborne hyperspectral imagery. Thus, no soil samples were taken in densely vegetated regions characterized by increased input of Corg and thus higher concentrations that would have increased the variability in the ground reference, which is directly related to the modeling RPD. This corresponds with findings reported in, for example, [16, 19].

Results of this research show that it is possible to establish regression relationships on a physical basis that reach the predictability of PLS-derived regression models. This can be achieved by the application of a set of spectral characteristics for each of the investigated soil constituents and the inclusion of various properties of these spectral features. Statistical adaptation within regression analysis is reduced to a minimum. Because several spectral properties of each soil constituent are further combined in the proposed approach to establish a significant model, this approach is suggested to be more robust compared to similar physical approaches that are based on the analysis of only one diagnostic spectral feature or band ratio. Additionally, the established models are rather simple and computationally unproblematic due to a limited number of variables in the regression relationship.

Physically based approaches are often considered to be more robust compared to exclusively statistical methods such as PLS, and the transferability of such approaches is expected to be higher (e.g., [17]). For the newly developed approach this will be tested based on the hyperspectral imagery of the South African study area. For this, the developed feature-based regression relationships will be applied for a large-scale quantification of key soil parameters from hyperspectral imagery of the South African study area. The image data acquired in 2009 are to be processed with a combination of methods that allow the extraction of the soil spectral signal from mixing signatures caused by small-scale changes in land cover. First results for the large-scale prediction of topsoil organic carbon and iron oxides using calibrations developed on bare soil field spectra and using the feature-based regression approach are very promising. However, since the prediction models for soil clay content do not reach significant accuracies, they are not suited to be applied for a derivation of large-scale soil information. The examination of the HyMap imagery and the transfer of the methodology presented in this study to the image data are beyond the scope of this study.

Resulting soil parameter maps are valuable information to quantify the soil degradation status within the South African study area. In the Thicket Biome this information can be used to detect eroded and degraded areas in their primary stage in order to direct the restoration of selected regions. The method was developed for semiarid areas in general and not adapted to specific conditions in the study area. A transfer to other regions of similar environmental conditions will be further investigated.

Acknowledgments

This research study was funded as Ph.D. project being part of the Helmholtz EOS Network, a collaboration of German Helmholtz research centers. The PRESENCE Network (a Living Lands initiative) supported the intensive field sampling. The colleagues from DLR and GFZ are acknowledged for their valuable contribution to many parts of this work.

References

[1] R. Lal, “Soil carbon sequestration to mitigate climate change,” Geoderma, vol. 123, no. 1-2, pp. 1–22, 2004.
[2] R. A. Viscarra Rossel, D. J. J. Walvoort, A. B. McBratney, L. J. Janik, and J. O. Skjemstad, “Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties,” Geoderma, vol. 131, no. 1-2, pp. 59–75, 2006.
[3] T. Cocks, R. Jenssen, A. Stewart, I. Wilson, and T. Shields, “The HyMap airborne hyperspectral sensor: the system, calibration
and performance,” in Proceedings of the 1st EARSEL Workshop on Imaging Spectroscopy, Zurich, Switzerland, 1998.
[4] H. M. Bartholomeus, M. E. Schaepman, L. Kooistra, A. Stevens, W. B. Hoogmoed, and O. S. P. Spaargaren, “Spectral reflectance based indices for soil organic carbon quantification,” Geoderma, vol. 145, no. 1-2, pp. 28–36, 2008.
[5] E. Ben-Dor, Y. Inbar, and Y. Chen, “The reflectance spectra of organic matter in the visible near-infrared and short wave infrared region (400-2500 nm) during a controlled decomposition process,” Remote Sensing of Environment, vol. 61, no. 1, pp. 1–15, 1997.
[6] J. Hill and B. Schütt, “Mapping complex patterns of erosion and stability in dry mediterranean ecosystems,” Remote Sensing of Environment, vol. 74, no. 3, pp. 557–569, 2000.
[7] R. A. Viscarra Rossel and T. Behrens, “Using data mining to model and interpret soil diffuse reflectance spectra,” Geoderma, vol. 158, no. 1-2, pp. 46–54, 2010.
[8] E. Ben-Dor and A. Banin, “Visible and near-infrared (0.4–1.1 μm) analysis of arid and semiarid soils,” Remote Sensing of Environment, vol. 48, no. 3, pp. 261–274, 1994.
[9] C.-W. Chang, D. A. Laird, M. J. Mausbach, and C. R. Hurburgh, “Near-infrared reflectance spectroscopy-principal components regression analyses of soil properties,” Soil Science Society of America Journal, vol. 65, no. 2, pp. 480–490, 2001.
[10] A. Palacios-Orueta and S. L. Ustin, “Remote sensing of soil properties in the Santa Monica Mountains I. Spectral analysis,” Remote Sensing of Environment, vol. 65, no. 2, pp. 170–183, 1998.
[11] E. Ben-Dor and A. Banin, “Near-infrared analysis as a rapid method to simultaneously evaluate several soil properties,” Soil Science Society of America Journal, vol. 59, no. 2, pp. 364–372, 1995.
[12] T. H. Waiser, C. L. S. Morgan, D. J. Brown, and C. T. Hallmark, “In situ characterization of soil clay content with visible near-infrared diffuse reflectance spectroscopy,” Soil Science Society of America Journal, vol. 71, no. 2, pp. 389–396, 2007.
[13] E. Ben-Dor, S. Chabrillat, J. A. M. Demattê et al., “Using Imaging Spectroscopy to study soil properties,” Remote Sensing of Environment, vol. 113, no. 1, pp. S38–S55, 2009.
[14] C. Gomez, R. A. Viscarra Rossel, and A. B. McBratney, “Soil organic carbon prediction by hyperspectral remote sensing and field vis-NIR spectroscopy: an Australian case study,” Geoderma, vol. 146, no. 3-4, pp. 403–411, 2008.
[15] A. Stevens, B. van Wesemael, H. Bartholomeus, D. Rosillon, B. Tychon, and E. Ben-Dor, “Laboratory, field and airborne spectroscopy for monitoring organic carbon content in agricultural soils,” Geoderma, vol. 144, no. 1-2, pp. 395–404, 2008.
[16] T. Udelhoven, C. Emmerling, and T. Jarmer, “Quantitative analysis of soil chemical properties with diffuse reflectance spectrometry and partial least-square regression: a feasibility study,” Plant and Soil, vol. 251, no. 2, pp. 319–329, 2003.
[17] V. L. Mulder, S. de Bruin, M. E. Schaepman, and T. R. Mayr, “The use of remote sensing in soil and terrain mapping-a review,” Geoderma, vol. 162, no. 1-2, pp. 1–19, 2011.
[18] T. Selige, J. Böhner, and U. Schmidhalter, “High resolution topsoil mapping using hyperspectral image and field data in multivariate regression modeling procedures,” Geoderma, vol. 136, no. 1-2, pp. 235–244, 2006.
[19] A. Stevens, T. Udelhoven, A. Denis et al., “Measuring soil organic carbon in croplands at regional scale using airborne imaging spectroscopy,” Geoderma, vol. 158, no. 1-2, pp. 32–45, 2010.
[20] N. Richter, T. Jarmer, S. Chabrillat, C. Oyonarte, P. Hostert, and H. Kaufmann, “Free iron oxide determination in mediterranean soils using diffuse reflectance spectroscopy,” Soil Science Society of America Journal, vol. 73, no. 1, pp. 72–81, 2009.
[21] E. Ben-Dor, N. Levin, A. Singer, A. Karnieli, O. Braun, and G. J. Kidron, “Quantitative mapping of the soil rubification process on sand dunes using an airborne hyperspectral sensor,” Geoderma, vol. 131, no. 1-2, pp. 1–21, 2006.
[22] S. Chabrillat, A. F. H. Goetz, L. Krosley, and H. W. Olsen, “Use of hyperspectral images in the identification and mapping of expansive clay soils and the role of spatial resolution,” Remote Sensing of Environment, vol. 82, no. 2-3, pp. 431–445, 2002.
[23] C. Gomez, P. Lagacherie, and G. Coulouma, “Continuum removal versus PLSR method for clay and calcium carbonate content estimation from laboratory and airborne hyperspectral measurements,” Geoderma, vol. 148, no. 2, pp. 141–148, 2008.
[24] P. Lagacherie, F. Baret, J. B. Feret, J. Madeira Netto, and J. M. Robbez-Masson, “Estimation of soil clay and calcium carbonate using laboratory, field and airborne hyperspectral measurements,” Remote Sensing of Environment, vol. 112, no. 3, pp. 825–835, 2008.
[25] R. G. Lechmere-Oertel, G. I. H. Kerley, and R. M. Cowling, “Patterns and implications of transformation in semi-arid succulent thicket, South Africa,” Journal of Arid Environments, vol. 62, no. 3, pp. 459–474, 2005.
[26] J. W. Lloyd, E. C. van den Berg, and A. R. Palmer, “Patterns of transformation and degradation in the Thicket Biome, South Africa,” Report 39, Terrestrial Ecology Research Unit, University of Port Elizabeth, Port Elizabeth, South Africa, 2002.
[27] A. J. Mills and M. V. Fey, “Transformation of thicket to savanna reduces soil quality in the Eastern Cape, South Africa,” Plant and Soil, vol. 265, no. 1-2, pp. 153–163, 2004.
[28] ARC-ISCW, Weather Data of Three Selected Climate Stations in the Eastern Cape Province, ARC-ISCW, Pretoria, South Africa, 2011.
[29] A. J. Mills and R. M. Cowling, “Rate of carbon sequestration at two thicket restoration sites in the Eastern Cape, South Africa,” Restoration Ecology, vol. 14, no. 1, pp. 38–49, 2006.
[30] A. Walkley and I. A. Black, “An examination of the Degtjareff method for determining soil organic matter and a proposed modification of the chromic acid titration method,” Soil Science, vol. 37, pp. 29–37, 1934.
[31] O. Mehra and M. Jackson, “Iron oxide removal from soils and clays by a dithionite-citrate system buffered with sodium bicarbonate,” Clays and Clay Minerals, vol. 7, pp. 317–327, 1958.
[32] G. W. Gee and J. W. Bauder, “Particle size analysis,” in Methods of Soil Analysis. Part 1, A. Klute, Ed., pp. 383–411, American Society of Agronomy, Soil Science Society, Madison, Wis, USA, 2nd edition, 1986.
[33] A. Savitzky and M. J. E. Golay, “Smoothing and differentiation of data by simplified least squares procedures,” Analytical Chemistry, vol. 36, no. 8, pp. 1627–1639, 1964.
[34] R. Clark, “Spectroscopy of rocks and minerals, and principles of spectroscopy,” in Remote Sensing for the Earth Sciences, vol. 3, pp. 3–58, John Wiley & Sons, New York, NY, USA, 1999.
[35] M. F. Baumgardner, L. F. Silva, L. L. Biehl, and E. R. Stoner, “Reflectance properties of soils,” Advances in Agronomy, vol. 38, pp. 1–44, 1985.
[36] G. Hunt and J. Salisbury, “Visible and near-infrared spectra of minerals and rocks: 1. Silicate minerals,” Modern Geology, vol. 1, pp. 283–300, 1970.

[37] C. Grove, S. Hook, and E. Paylor II, Laboratory Reflectance Spectra of 160 Minerals, 0.4 to 2.5 Micrometers, Jet Propulsion Laboratory, National Aeronautics and Space Administration, JPL publication 92-2, Pasadena, Calif, USA, 1992.
[38] G. Hunt, J. Salisbury, and C. Lenhoff, “Visible and near-infrared spectra of minerals and rocks: III. Oxides and hydroxides,” Modern Geology, vol. 2, pp. 195–205, 1971.
[39] R. V. Morris, H. V. Lauer, C. A. Lawson, E. K. Gibson Jr, G. A. Nace, and C. Stewart, “Spectral and other physicochemical properties of submicron powders of hematite (α-Fe2O3), maghemite (γ-Fe2O3), magnetite (Fe3O4), goethite (α-FeOOH) and lepidocrocite (γ-FeOOH),” Journal of Geophysical Research, vol. 90, no. 4, pp. 3126–3144, 1985.
[40] G. R. Hunt, “Spectroscopic properties of rocks and minerals,” in Handbook of Physical Properties of Rocks, R. S. Carmichael, Ed., vol. 1, pp. 295–385, CRC Press, Boca Raton, Fla, USA, 1982.
[41] CSIR, Weather Data of a Climate Station in the Thicket Biome, South Africa, CSIR, Pretoria, South Africa, 2009.
[42] E. R. Stoner and M. F. Baumgardner, “Characteristic variations in reflectance of surface soils,” Soil Science Society of America Journal, vol. 45, no. 6, pp. 1161–1165, 1981.
[43] R. N. Clark and T. L. Roush, “Reflectance spectroscopy: quantitative analysis techniques for remote sensing applications,” Journal of Geophysical Research, vol. 89, no. 7, pp. 6329–6340, 1984.
[44] R. N. Clark, G. A. Swayze, K. E. Livo et al., “Imaging spectroscopy: earth and planetary remote sensing with the USGS Tetracorder and expert systems,” Journal of Geophysical Research E, vol. 108, no. 12, pp. 5–1, 2003.
[45] F. van der Meer, “Analysis of spectral absorption features in hyperspectral imagery,” International Journal of Applied Earth Observation and Geoinformation, vol. 5, no. 1, pp. 55–68, 2004.
[46] W. Kessler, Multivariate Datenanalyse, Wiley-VCH Verlag, Weinheim, Germany, 2007.
[47] S. Wold, M. Sjöström, and L. Eriksson, “PLS-regression: a basic tool of chemometrics,” Chemometrics and Intelligent Laboratory Systems, vol. 58, no. 2, pp. 109–130, 2001.
[48] R. A. V. Viscarra Rossel, “ParLeS: software for chemometric analysis of spectroscopic data,” Chemometrics and Intelligent Laboratory Systems, vol. 90, no. 1, pp. 72–83, 2008.
[49] D. S. McIntyre, “Permeability measurements of soil crusts formed by raindrop impact,” Soil Science, vol. 85, pp. 185–189, 1958.
[50] R. N. Clark, T. V. V. King, M. Klejwa, G. A. Swayze, and N. Vergo, “High spectral resolution reflectance spectroscopy of minerals,” Journal of Geophysical Research, vol. 95, no. 8, pp. 12–680, 1990.
[51] G. R. Hunt, “Spectral signatures of particulate minerals in the visible and near infrared,” Geophysics, vol. 42, no. 3, pp. 501–513, 1977.

Hindawi Publishing Corporation
Applied and Environmental Soil Science
Volume 2012, Article ID 439567, 9 pages
doi:10.1155/2012/439567

Research Article

Using Reflectance Spectroscopy and Artificial Neural Network to Assess Water Infiltration Rate into the Soil Profile

Naftali Goldshleger,1, 2 Alexandra Chudnovsky,3 and Eyal Ben-Dor4

1 Soil Erosion Research Station, Soil Conservation and Drainage Division, Ministry of Agriculture, c/o Rupin Institute, Emek-Hefer 40250, Israel
2 Ariel University Center of Samaria, Israel
3 Department of Environmental Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
4 Department of Geography and the Human Environment, Tel-Aviv University, Remote Sensing and GIS Laboratory, P.O. Box 39040, Ramat Aviv, Tel Aviv 69978, Israel

Correspondence should be addressed to Naftali Goldshleger, [email protected]

Received 8 November 2011; Revised 6 April 2012; Accepted 18 June 2012

Academic Editor: Raphael Viscarra Rossel

Copyright © 2012 Naftali Goldshleger et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We explored the effect of raindrop energy on both water infiltration into soil and the soil’s NIR-SWIR spectral reflectance (1200– 2400 nm). Seven soils with different physical and morphological properties from Israel and the US were subjected to an artificial rainstorm. The spectral properties of the crust formed on the soil surface were analyzed using an artificial neural network (ANN). Results were compared to a study with the same population in which partial least-squares (PLS) regression was applied. It was concluded that both models (PLS regression and ANN) are generic as they are based on properties that correlate with the physical crust, such as clay content, water content and organic matter. Nonetheless, better results for the connection between infiltration rate and spectral properties were achieved with the non-linear ANN technique in terms of statistical values (RMSE of 17.3% for PLS regression and 10% for ANN). Furthermore, although both models were run at the selected wavelengths and their accuracy was assessed with an independent external group of samples, no pre-processing procedure was applied to the reflectance data when using ANN. As the relationship between infiltration rate and soil reflectance is not linear, ANN methods have the advantage for examining this relationship when many soils are being analyzed.
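
The contrast drawn above between a linear and a non-linear calibration, judged by RMSE on an independent group of samples, can be illustrated with a minimal Python sketch. It is not the PLS or ANN model of this study. The data are hypothetical and noise-free, the function `response` and all numerical values are assumptions made for illustration, and a least-squares fit on a transformed predictor stands in for a non-linear model.

```python
import math

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def response(x):
    """Hypothetical response that decays non-linearly with the predictor."""
    return 2.0 + 40.0 * math.exp(-x)

# Calibration set and a separate (independent) validation set, both synthetic.
x_cal = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
x_val = [0.25, 1.25, 2.25, 3.25]
y_cal = [response(x) for x in x_cal]
y_val = [response(x) for x in x_val]

# Linear model on the raw predictor.
a_lin, b_lin = fit_line(x_cal, y_cal)
pred_lin = [a_lin + b_lin * x for x in x_val]

# Non-linear stand-in: the same fit on the transformed predictor exp(-x).
a_nl, b_nl = fit_line([math.exp(-x) for x in x_cal], y_cal)
pred_nl = [a_nl + b_nl * math.exp(-x) for x in x_val]

rmse_linear = rmse(y_val, pred_lin)
rmse_nonlinear = rmse(y_val, pred_nl)
print(f"validation RMSE, linear fit:     {rmse_linear:.3f}")
print(f"validation RMSE, non-linear fit: {rmse_nonlinear:.3f}")
```

In this constructed case the straight line cannot follow the curvature, so its validation RMSE stays well above that of the model with a non-linear term. With real spectra the size of the gap would depend on the soils, the noise, and the models used.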

1. Introduction

1.1. Physical Crust and Infiltration Rate. The main cause of runoff from rain and overhead irrigation is the generation of a structural soil crust [1, 2]. Crust formation results from a combination of the kinetic energy impact of raindrops and the level of stability of the soil aggregates [1, 2]. The structural crust is generated within minutes and significantly reduces soil infiltration rate (IR). Assessment of this IR is vital for sustainable land management, especially in semiarid and arid regions where harsh climatic conditions cause soil degradation and damage to agricultural areas. Thus, monitoring of soil crust conditions is essential for the proper management of soils, from both an agricultural and land-degradation perspective.

1.2. Reflectance Spectroscopy. Previous research has shown that reflectance spectroscopy can provide a valuable means of assessing the condition of the soil crust and estimating related problems. The spectral reflectance provides information about the chemical and physical conditions of bulk matter (in the laboratory) and of the surface (in the field), which can then be used to assess the condition of the soil crust and estimate the related problem (e.g., infiltration rate, runoff potential) [3, 4]. Two important mechanisms which usually occur during crust formation can be independently tracked by spectroscopy: changes in particle size and changes in mineral distribution on the crust surface, both of which are associated with albedo [5, 6].

Recent studies conducted by Goldshleger et al. [5, 7–9] and Ben-Dor et al. [10] have shown that choosing specific wavelengths is essential for IR assessment. The authors generated a spectral library for each soil type that was subjected to various rain energies. They showed that different regression correlations exist between the spectral reading and crust status for each studied soil. Moreover, the reflectance properties of soils could be used as indicators of the crusting process generated by raindrop impact [5, 8, 10, 11].

1.3. Calibration Development: Linear and Non-Linear Modeling Techniques. Most of the aforecited studies used near-infrared spectroscopy (NIRS) to develop a spectral-based model for the assessment of soil crust status. NIRS is known as an analytical technology that enables many disciplines to conduct real-time monitoring procedures [12]. There has been a rise in recent years in the use of qualitative and quantitative applications of NIRS to determine quality indicators for soils [13] and sediments [14]. In most cases, NIRS requires a reference technique to build up calibration routines and to guarantee the proper maintenance of an established calibration with reference to outlier detection and troubleshooting. The main issues, therefore, include spectral measurements in the VIS-NIR-SWIR regions (400–2400 nm), spectroscopic calibration, spectral preprocessing, and validation of the calibration models. Finally, it should be emphasized that NIRS is not a routine tool but at the same time has tremendous potential, because it can provide unique information that is not accessible with any other technique [15].

Constituents and properties of soils and/or sediments analyzed by reflectance spectroscopy combined with reference calibration modeling have included moisture, iron oxides, organic and clay matter contents, organic C, N, P, S, K and Ca, and aggregate size [6, 18–23]. Furthermore, the NIRS strategy has been used to determine the relationship between spectral response and crust status [24, 25]. Quantitative modeling of VIS-NIR-SWIR spectra is frequently performed using a variety of multivariate techniques, including stepwise regression, partial least-squares (PLS) regression, modified PLS, classification and regression trees, and principal component regression. All of these techniques have been applied to model the relationship between reference chemical and physical soil properties and spectral data [26, 27]. The linear PLS regression approach has been used to model several response variables simultaneously while effectively dealing with strongly collinear and noisy independent variables [28]. Recently, Viscarra Rossel et al. [22, 23] showed that PLS regression techniques can be used successfully for the prediction of soil properties from reflectance, even if the soils are of different types and nature. They collected spectra and chemical properties of more than 2,000 soils worldwide and demonstrated the generation of a generic model for all of the selected soils, regardless of type. Brown et al. [29] and Brown [30] built a global soil-spectral library to estimate the amount of clay-fraction material.

A neural network is a “machine” that is designed to model the way in which the brain performs a particular task or function [31]. Artificial neural networks (ANNs) have the ability to model linear or non-linear relationships between a set of input and output variables [32]. Viscarra Rossel and Behrens [23] compared several data-mining techniques with PLS, and Brown et al. [29] often use data mining as a standard method in their studies. Although these applications are in the soil sciences, we thought it would be of interest to examine their performance on soil material and especially on a property that exhibits high variation: the spectral signature of the soil's physical crust. Farifteh et al. [33] showed that both ANN and PLS regression methods have great potential for estimating IR. Moreover, both methods showed similar predictive accuracy.

Objective. In this study, the spectral reflectance of crusting soils (characterized by large variation) was analyzed in order to assess applicable models for the prediction of IR values via non-linear ANN and linear PLS regression. Four soils from Israel and three soils from the US were selected as the test population for examination. ANN and PLS regression results for the selected soil populations were also compared to evaluate the performance of the different procedures.

2. Materials and Methods

2.1. Soil Samples. Four soils from Israel and three from the US, ranging from sandy to clayey, were used for this study. Table 1 provides their physical and chemical compositions.

2.2. Experimental Rainstorm Simulations (IR Measurements). The soil samples were sieved through a 2 mm mesh and identically packed into 30 × 50 × 4 cm perforated soil boxes over a layer of coarse sand. The boxes were placed on a carousel on a 5% slope, five boxes containing the same soil per run, and were subjected to a simulated rainstorm [34] using distilled water. The water that percolated through the soil and sand layers out of the perforated soil box during a continuous simulated rainstorm approximately represented instantaneous IR. The IR of the soil was continuously measured during the rainstorm. At first, the simulated rainstorm provided a fog-type rain (no energy), with intensity similar to the initial IR of each soil as measured previously. This storm lasted until the measured rate of percolation reached that of the previously measured IR for the simulated rainstorm intensity. Then the rainfall was stopped and the soil boxes were left until drainage from all boxes ceased. One soil box was randomly selected. The remaining soil boxes were subjected to a continuous rainstorm at an intensity approximately similar to the initial IR of the soil, with an energy of ∼22.3 J. The instantaneous simulated rainstorm was calculated by Morin et al. [34] and the accumulated (immediate) energy of each rainstorm depended on the depth intensity (mm) of the designed storm. For each of the remaining soil boxes, increasing levels of cumulative rainstorm energies (achieved by increasing time of exposure to the rainstorm) were applied (Tables 2(a) and 2(b)). Two consecutive runs were conducted for each soil, resulting in 10 soil boxes with increasing levels of crusting (due to increasing levels of cumulative rainstorm energies). After each rainstorm for a given soil, the boxes were oven dried for 48 h at 105°C (simulating dry field

Table 1: Selected characteristics of the soil samples.

Soil      Soil series      Soil classification        Mechanical composition (%)    Chemical properties
symbol    (Israel*/USA)    (USDA**)                   Sand     Silt     Clay        OM (g kg−1)   CaCO3 (g kg−1)   ESP (%)
Is1(G)    Grumosol         Typic chromoxererts        21.7     25.4     52.9        12.4          114.4            1.0
Is2(S)    Loamy loess      Calcic haploxeralf         37.7     40.6     21.7        9.1           108.2            2.2
Is3(A)                     Lithic ruptic xerochrept   2.6      32.7     64.7        22.6          146              0.5
Is4(E)    Hamra            Typic rhodoxeralf          79       10       11          8.2           22.2             0.8
Am1       Reiff cm***      Mollic xerofluvents        29       54       17          19.4          66.4             0.72
Am2       Reiff om****     Mollic xerofluvents        27       55       18          23            64.2             1.17
Am3       Capay            Typic xerofluvents         7        62       31          14.4          77.1             0.73

* [16]. ** [17]. *** cm: soil under conventional management. **** om: soil under organic management.

Table 2: Cumulative rainfall energies (J mm−1 m−2) and subsequent infiltration rates (IR) for each soil: (a) Israeli soils. (b) US soils.

(a) Cumulative rainfall energies (J mm−1 m−2) and the subsequent IR (mm h−1) for Israeli soils

Is1-Grumosol       Is2-Loess          Is3-Terra Rossa    Is4-Hamra
IR       Energy    IR       Energy    IR       Energy    IR       Energy
205      0         207      0         145      0         46       0
173      134       159      70        69       252       32       280
154      200       94.3     109       70       380       28.5     420
69       265       36.8     145       49.5     506       24       850
41       400       32.2     216       39       1012      21       1120
30       530       22       290       30       1460      17       1550
17       665       10.6     506       19.6     2016      11.5     1700
15       800       8.3      613       13.8     2832      11.5     2300
10       1590      7.4      1012      9.4      4060      7        3200
5.8      2522      3.5      1842

(b) Cumulative rainfall energies (J mm−1 m−2) and the subsequent IR (mm h−1) for American soils

Am1-Reiff cm       Am2-Reiff om       Am3-Capay
IR       Energy    IR       Energy    IR       Energy
84       0         207      0         120      0
53       70        92       145       78       141
44       136       62       216       32       210
30       200       35       290       15.6     280
26       240       28       335       11.5     350
17       270       23       357       7.8      420
11       340       18       435       4.1      560
5.5      540
470 1560 723 1810

conditions) and subjected to spectral reading of the dry surface reflectance.

2.3. Spectral Measurements. The spectral reflectance measurements were carried out with a Quantum 1200+ laboratory spectrometer. The instrument was optimized to the NIR-SWIR region (1200–2400 nm) with a bandwidth of 10 nm (1,200 spectral bands in total). BaSO4 powder was used as a white reference to enable conversion of the measured data into reflectance values. An average spectrum for every cumulative level of rain energy was calculated, using five replicates.
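The conversion and averaging described in Section 2.3 can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the raw-signal inputs, and the assumption that the white reference reflects 100% in every band are hypothetical and are not taken from the instrument software.

```python
from statistics import fmean


def to_reflectance(sample_signal, white_signal, white_reflectance=1.0):
    """Convert raw band readings to reflectance relative to a white reference.

    white_reflectance=1.0 is an assumption (an ideal white reference).
    """
    if len(sample_signal) != len(white_signal):
        raise ValueError("sample and reference must cover the same bands")
    if any(w <= 0 for w in white_signal):
        raise ValueError("white-reference readings must be positive")
    return [white_reflectance * s / w for s, w in zip(sample_signal, white_signal)]


def mean_spectrum(replicates):
    """Band-wise average of replicate spectra of equal length."""
    n_bands = len(replicates[0])
    if any(len(rep) != n_bands for rep in replicates):
        raise ValueError("all replicates must have the same number of bands")
    return [fmean(band) for band in zip(*replicates)]


# Hypothetical raw readings for three bands; not data from the study.
white = [1000.0, 2000.0, 4000.0]
raw_replicates = [
    [250.0, 600.0, 1600.0],
    [260.0, 620.0, 1640.0],
    [240.0, 580.0, 1560.0],
    [255.0, 610.0, 1620.0],
    [245.0, 590.0, 1580.0],
]
spectra = [to_reflectance(raw, white) for raw in raw_replicates]
average = mean_spectrum(spectra)  # one averaged spectrum per rain-energy level
print(average)
```

Averaging after the conversion, rather than before, keeps each replicate expressed in reflectance units, so an individual replicate can still be inspected for errors before it enters the mean.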

2.4. Data Analyses. The IR and reflectance data were analyzed to generate a spectral-based model for the prediction of IR solely from spectral measurements. Importantly, for the ANN analysis, it is impossible to include the array of all measured wavelengths, and it must be reduced based on the number of observations (e.g., degrees of freedom). Therefore, the most significant wavelengths were first identified (as described in the section below) and modeling was run based on this selection. For comparison, we also applied linear PLS regression analyses to assess the IR values of soils.

Reflectance spectra of seven soils (64 samples) and their corresponding IR were carefully checked for errors in the measured IR values (e.g., suspiciously large or small measured values of IR for the specific sample, Y-variable) and soil reflectance spectra (NIR-SWIR range, X-variables). For both variables, no outliers were detected. Furthermore, reflectance spectra were considered using the second derivative on absorbance as a preprocessing technique based on Goldshleger et al. [9].

2.5. Wavelength Selection for a Model. We investigated the influence of selecting individual wavelengths to generate an optimal calibration model to assess IR in soils. To that end, we ran PLS models on the entire wavelength region with the aim of identifying the significant wavelengths. Significant variables were estimated using Martens' uncertainty test [35], which assesses the stability of the PLS regression. Many plots and results are associated with the test, which allows estimating the model's stability, identifying perturbing samples or variables, and selecting significant X variables. The test is performed with cross-validation and was performed first on the whole spectral range. Therefore, each PLS model (raw and preprocessed) was first run on the whole range of spectra and then restricted to the significant wavelengths identified by Martens' test. We also included additional wavelengths (e.g., supervised/manual selection) based on knowledge of soil absorbances (e.g., the absorbance of clay at 2200 nm was included). Then the model was run solely on the selected wavelengths and reassessed until acceptable results (in terms of model stability and prediction accuracy) were achieved. All data management, calculations, PLS analyses, and different spectral pretreatments were performed using version 9.2 of the chemometric software Unscrambler (Camo Software, Oslo, Norway).

2.6. Modeling Approach: PLS. PLS was used to model the correlation between soil reflectance spectra and IR. The PLS regression is based on latent variable decomposition of two blocks of variables, the X and Y matrices, which contain spectral data and any reference chemical variable, respectively. The objective of the regression is to locate a small number of PLS components that efficiently predict Y when X is used [35]. PLS analysis establishes a relationship between the predictor block, X-matrix, and the response, Y, via an inner relation of their scores. The X-scores describe the object variation in the predictor block (the spectral matrix in this case), and the Y-scores describe the corresponding variation in the response block. What PLS does is to maximize the covariance between these inner variables (also called latent structures). A weight vector is calculated for each PLS component that assesses the contribution of each X-variable to the explanation of Y in that particular component.

The PLS regression model was constructed by cross-validation. Due to the limited number of samples, statistical parameters for the calibration model (49 samples) were calculated by leave-one-out cross-validation (only one sample at a time is kept out of the calibration and used for prediction). The model was subjected to external validation, performed on a set of 11 samples, selected from the calibration matrix and isolated as a test set. The test set was the same for both PLS and ANN models.

It is important to mention that raw IR values were used for ANN modeling, whereas PLS modeling was based on log-transformed IR values. In addition, raw spectra were used for ANN modeling, whereas for PLS the spectra were preprocessed using the second derivative of absorbance. Both PLS and ANN were run on the same reduced spectral range (48 wavelengths).

2.7. Modeling Approach: ANN. The ANN has three layers of neurons: input, hidden, and output [31]. All of the neurons in each layer are connected to neurons in the adjacent layer. Neurons in the hidden layer perform two tasks: they sum the weighted inputs and then pass the resulting summation to neurons in the adjacent layer through a sigmoidal processing function known as the activation function. This function determines the neuron's output and generally maps the interval (−∞, ∞) onto (0, 1). The weighted inputs to the hidden and output neurons can be adjusted to shift the whole summation in a direction that will aid in minimizing errors. The activation function essentially forces each neuron's summation between set limits, 0 and 1, before it is passed to the neurons in the next layer. The most common learning algorithm is based on supervised error backpropagation, in which a data set of system inputs and outputs (the training set) is presented to a neural network having initial assumed connection weights. An error is calculated by comparing the actual outputs to those calculated, and, accordingly, the connection weights are modified to decrease the sum of squared errors. This training procedure is carried out repeatedly until the error converges to an acceptably small value. The network is tested by processing another set of inputs (the test set) and comparing the network output with the test set. If the resulting error is sufficiently small, the network is considered trained and it may be used for predicting outputs.

The number of neurons used for training the networks was varied systematically between four and 12 to allow subsequent selection of the most appropriate network size based on the performance on the test data set [36]. For ANN modeling, the computer software MatLab, in-house programs, and the Neural Network Toolbox were used (The Mathworks Inc., Natick, MA, USA).

For each scale of the study, each data set was divided into three subsets, one for training (half of the input data), one for validation (a quarter of the input data), and one for testing (a quarter of the input data). The Levenberg–Marquardt algorithm [37], which provides a fast optimization, was used for network training. The performance of a trained network was assessed by comparing the mean squared error (MSE) and root mean squared error (RMSE) calculated from the training, validation, and testing data subsets. Only the training data subset is used for updating the network weights and biases. During training, the error with respect to the validation data subset is monitored. When the validation error increases for a specified number of iterations, the training is stopped. The error with respect to the testing data subset is not monitored during training but is quantified to assess the final performance of a trained ANN model.

For the ANN analysis, the data of all spectral measurements were manually divided into three subsets: a calibration set (training set) containing 37 samples (to establish the model), a validation set comprising 12 samples (to validate the training set), and an external test set comprising 11 samples (to examine the predictive ability of the entire model). Note that the same test set was used for PLS analyses. In our selection of calibration/validation and test sets, soil samples were comparably represented in terms of means of soil type and of their measured IR.

The predictive capability of all models was compared in terms of the relative standard error for both the calibration and validation sets (denoted as RMSECV (%) and RMSEP (%)):

$$\%\mathrm{RMSE} = \left[\frac{\sum (X_m - X_p)^2}{\sum X_m^2}\right]^{1/2} \times 100, \tag{1}$$

where $X_m$ and $X_p$ are the measured and predicted values, respectively.

In addition, we used the ratio of prediction to deviation (RPD), which is defined as the ratio of the standard deviation of the reference values (e.g., IR) to the root mean square error of cross-validation (RMSECV) or prediction (RMSEP) [38]. An RPD value below 1.5 was taken to indicate that the model is unusable, whereas a value above 3.0 was considered to be excellent.

3. Results and Discussion

3.1. Spectral Changes. Soil reflectance is a product of particle size distribution (expressed as baseline height or physical albedo effect) and chemical composition (expressed as absorption peaks). In all soils tested to date, the soil crust chroma (expressed by albedo changes) becomes brighter as the cumulative rain energy increases. Moreover, a noticeable sequence of albedos is observed in all soils [9]. For example, in Israeli soils, the baseline reflectance rises from about 0.25, 0.15, 0.4, and 0.55 at the lowest cumulative energy level to about 0.4, 0.3, 0.5, and 0.7 at the highest for Grumosol, Terra Rossa, Loess, and Hamra soils, respectively [8].

Figure 1 provides the reflectance spectra in the NIR range (1200–2400 nm) of each energy treatment used on all four soils: Hamra, Terra Rossa, Loess, and Grumosol. Hamra is a sandy soil with a relatively high quartz content (90%) and a small percentage of clay (5%); Terra Rossa is a type of red clay soil produced by the weathering of limestone. Loess is an aeolian sediment formed by the accumulation of wind-blown silt, typically in the 20–50 µm size range, 20% or less clay, and the remainder equal parts sand and silt. Grumosol has a high content of expansive clay (over 50%). It can be concluded that one major crust-formation pattern is exhibited in soils, whereas an exceptional pattern can be obtained in nonclayey soils. The increasing accumulated rain energy applied to these soils caused an increase in albedo patterns related to increasing clay content. This is a result of fine particle size segregation in the crusting process, where clay minerals usually occupy the fine particle fraction. Furthermore, there is evidence of changes in the crust with a rise in rain energy (Figure 1), including demolition of the clay layer or washing away of the sand particles. The above results demonstrate the usefulness of reflectance as a tool to monitor the crust formation process in a particular soil.

3.2. Calibration Modeling Using PLS Analyses. A much clearer picture of spectroscopic-IR changes is obtained from the plot of factor loadings (LV, or regression coefficients) versus wavelengths for the best pre-processing technique (second derivative of absorbance run on selected wavelengths), shown in Figure 2. The wavelengths selected for the IR prediction were based on Martens' test selection and were centered at 1230, 1385, 1390–1407, 1436, 1447, 1850, 1866, 1912, 1940, 2016, 2180, 2200, 2250, 2292, 2315, and 2351 nm. These wavelengths can be spectrally assigned to OH in water (1400 and 1900 nm) [39], organic matter (2016 and 2290 nm) [6], and Al-OH in clay minerals (2200 and 2250 nm) [39, 40]. These are general features for all soil types and, therefore, can be identified as a generic spectral footprint for the prediction of IR.

Using PLS analyses, the best model was generated when the first derivative was applied to the reflectance values but run only using significant wavelengths (Martens' test). Figure 3 upper panel shows the relationship between measured and predicted values for cross-validation. The statistics obtained were RMSECV of 10.2% and 13.6% for all soils, a slightly larger R2 of 0.70, and RPD values between 2.0 and 2.1. Slightly but not significantly better models could be achieved for the data set without Is4 in the cross-validation data set (54 samples in the cross-validation set), with R2 increasing to 0.73. The lower panel of Figure 3 shows a plot of this best fit model when applied on an external mixed data set measured by LT spectrometer (11 samples). The linear plot exhibits a slope of 0.75, r2 = 0.46, and a relatively high RMSEP of 15%.

3.3. Calibration Modeling Using the ANN Approach. Figures 4 and 5 present the measured versus predicted values for each soil type for the calibration and test sets, respectively, using the ANN approach. The test set was representative of the constructed calibration model and included different soil types covering various IR values (i.e., wide dynamic range). Figure 5 shows the R2 and RMSE values for the test set (R2 = 0.91, RMSE = 10.6%), whereas for the calibration set, R2 = 0.96 (RMSE = 6.5%). These results indicate that the ANN model learned the system well and had good generalization and assessment abilities. In contrast, when the PLS model was run on the same test set (see Figure 3 lower panel for comparison), a lower R2 (0.46) and a higher RMSEP (15%) were

[Figure 1 plots: four panels, (a) Is1, (b) Is2, (c) Is3, and (d) Is4, each showing reflectance versus wavelength (1250–2250 nm) with one curve per cumulative rain energy level (J mm−1 m−2). Legend levels: Is1: 0, 134, 200, 265, 400, 530, 665, 800, 1590, 2522; Is2: 0, 71, 109, 145, 216, 290, 506, 613, 1012, 1842; Is3: 0, 252, 380, 506, 1012, 1460, 2016, 2382, 4060; Is4: 0, 280, 420, 850, 1120, 1550, 1700, 2300.]

Figure 1: Spectral reflectance of each cumulative (rain) energy level for Israeli soil types.

obtained. In this regard, the ANN approach gives better results for the assessment of IR than the PLS regression analysis. Nevertheless, both methods indicate the possibility of generating a generic model designed to assess IR, even if an outlier group in terms of crust formation (in our case, Is4) is included in the cross-validation or calibration/validation test set. Interestingly, when the same soil samples were combined in the calibration, validation, and test sets to assess IR using both PLS regression and ANN analyses, the latter provided significantly better accuracy.

In our analysis, the major advantage of ANN is the utilization of an independent test set with raw IR values, eliminating the need to apply any data pre-processing to the reflectance spectra. As the relationship between IR and soil

[Figure 2 plot: PLS regression coefficients versus wavelength (1200–2400 nm). Labeled wavelengths: 1229, 1384, 1407, 1436, 1447, 1866, 1912, 1940, 2016, 2180, 2200, 2250, 2292, 2315, and 2351 nm.]

Figure 2: PLS regression coefficients accepted for a generic model (cross-validation data set, 64 samples).
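The preprocessing behind the coefficients in Figure 2, the second derivative of absorbance, can be sketched as follows. Absorbance is taken here as log10(1/R), and the derivative is approximated by a central finite difference on an evenly spaced wavelength grid. Both choices are assumptions made for illustration, since the text does not specify the derivative algorithm used in Unscrambler; the function names are likewise hypothetical.

```python
import math


def absorbance(reflectance):
    """Convert reflectance R (R > 0) to apparent absorbance, A = log10(1/R)."""
    if any(r <= 0 for r in reflectance):
        raise ValueError("reflectance values must be positive")
    return [-math.log10(r) for r in reflectance]


def second_derivative(values, step_nm):
    """Central-difference second derivative on an evenly spaced grid.

    The two end points are dropped, so the result has len(values) - 2 entries.
    """
    if len(values) < 3:
        raise ValueError("at least three points are needed")
    return [
        (values[i - 1] - 2.0 * values[i] + values[i + 1]) / step_nm ** 2
        for i in range(1, len(values) - 1)
    ]


# Hypothetical reflectance values at 10 nm spacing; not data from the study.
reflectance = [0.40, 0.41, 0.39, 0.42, 0.43]
d2_absorbance = second_derivative(absorbance(reflectance), step_nm=10.0)
print(d2_absorbance)
```

A second derivative removes constant and linear baseline terms, which is one reason derivative spectra are commonly paired with a linear method such as PLS when the albedo baseline shifts between samples.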

[Figure 3 plots: two panels of predicted log(IR) versus measured log(IR); (a) R2 = 0.7, (b) R2 = 0.46.]

Figure 3: (a) Measured versus predicted values of logarithmically transformed infiltration rates (IR). Results of cross-validation modeling run on the calibration set (53 samples, second derivative of absorbance run on 48 wavelengths). (b) 10 measured versus predicted IRs of the external test set (N = 11 samples).

reflectance is not linear, ANN methods have the advantage when many soils are being analyzed.

Indeed, when assessing IR by either PLS regression or ANN analyses for heterogeneous sets of soils, the spectral range needs to be reduced by identifying the wavelengths at which the spectral data give the best analytical behavior. In our study, based on the results of Martens' test selection, we used the wavelengths highlighted in Figure 2, which are similar to those obtained by Goldshleger et al. [8] and by Ben-Dor et al. [10] for the Israeli soil types. This result confirms that different kinds of soils can be modeled using specific ranges of wavelengths. An additional practical conclusion that can be drawn from this result is the possibility of constructing a simple sensor for future IR assessment in the field based on specific wavelength regions, with the aim of predicting IR online. The exploration of a larger sample set with other soil types and a wider dynamic (or other) range of IR combinations is warranted.

4. Concluding Remarks

Soils from Israel and the US were examined for possible prediction of IR using spectral measurements. We concluded that the ANN approach gives better results for the assessment of IR in a heterogeneous sample set than the PLS regression analysis. We used specific ranges of wavelengths for both models. The calibration models developed and used in this study can be transferred to use with other soils.

When the PLS model was run using the raw spectra (X-variables), and also when PLS was run using the original, not log-transformed, IR values (Y-variable), much lower accuracies of IR prediction were obtained (RMSE ∼30–50%); therefore, these results were not reported. In contrast, raw spectra and original IR values were used in the ANN approach, showing an additional advantage of using the ANN approach to assess IR based on raw spectral data using our data set. Note that, although ANN modeling allows high accuracy estimations of soil IR, both approaches

[Figure 4 plot: predicted IR versus measured IR (mm h−1) for the calibration set, points labeled by sample; R2 = 0.9615.]

Figure 4: Measured versus predicted values of infiltration rate (IR) resulting from ANN analyses. Model run on original reflectance values, 48 wavelengths.
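The agreement between measured and predicted values, as plotted in Figure 4, is summarized in the text by the relative RMSE of equation (1) and by the RPD. A minimal Python sketch of both statistics follows. The function names are hypothetical, the use of the sample standard deviation (n − 1 denominator) for the RPD is an assumption, and the label for the intermediate RPD range is chosen here because the text names only the two extremes.

```python
import math
from statistics import stdev


def percent_rmse(measured, predicted):
    """Relative RMSE (%) as in equation (1): sqrt(sum((Xm - Xp)^2) / sum(Xm^2)) * 100."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    squared_error = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    squared_reference = sum(m ** 2 for m in measured)
    return 100.0 * math.sqrt(squared_error / squared_reference)


def rpd(measured, predicted):
    """Standard deviation of the reference values divided by the RMSE."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    rmse = math.sqrt(
        sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    )
    return stdev(measured) / rmse  # assumption: sample standard deviation (n - 1)


def rpd_label(value):
    """Thresholds quoted in the text: below 1.5 unusable, above 3.0 excellent."""
    if value < 1.5:
        return "unusable"
    if value > 3.0:
        return "excellent"
    return "intermediate"  # hypothetical label; the text does not name this range


# Hypothetical infiltration rates (mm/h); not data from the study.
measured_ir = [10.0, 20.0, 30.0]
predicted_ir = [12.0, 18.0, 30.0]
print(round(percent_rmse(measured_ir, predicted_ir), 2))
print(round(rpd(measured_ir, predicted_ir), 2), rpd_label(rpd(measured_ir, predicted_ir)))
```

Because equation (1) normalizes by the sum of squared measured values rather than by their variance, the relative RMSE and the RPD can rank two models differently when the reference values span a narrow range far from zero, which is why the two statistics are reported side by side.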

[Figure 5 plot: predicted IR versus measured IR (mm h−1) for the test set, points labeled by sample; R2 = 0.91.]

Figure 5: ANN results for the test set (N = 11 samples; same samples were used for PLS analyses).

are dependent on wavelength selection, and such a selection must, therefore, be applied before either model is considered.

Finally, the significant relationships between selected wavelength reads and IR as carried out in the present study indicate that assessment of IR by reflectance spectroscopy is not only feasible but also reliable.

References

[1] M. Agassi, I. Shainberg, and J. Morin, "Effect of raindrop impact energy and water salinity on infiltration rates of sodic soils," Soil Science Society of America Journal, vol. 45, no. 1, pp. 848–851, 1981.
[2] M. Agassi, J. Morin, and I. Shainberg, "Effect of raindrop impact energy and water salinity on infiltration rates of sodic soils," Soil Science Society of America Journal, vol. 49, no. 1, pp. 186–190, 1985.
[3] S. M. De Jong, "The analysis of spectroscopical data to map soil types and soil crusts of Mediterranean eroded soils," Soil Technology, vol. 5, no. 3, pp. 199–211, 1992.
[4] J. A. M. Demattê, R. C. Campos, M. C. Alves, P. R. Fiorio, and M. R. Nanni, "Visible-NIR reflectance: a new approach on soil evaluation," Geoderma, vol. 121, no. 1-2, pp. 95–112, 2004.
[5] N. Goldshleger, E. Ben-Dor, Y. Benyamini, D. G. Blumberg, and M. Agassi, "Soil crusting and infiltration process as monitored by soil reflectance spectroscopy in the SWIR region," International Journal of Remote Sensing, vol. 23, no. 19, pp. 3909–3920, 2002.
[6] E. Ben Dor, N. Goldshleger, M. Eshel, V. Mirablis, and U. Basson, "Combined active and passive remote sensing methods for assessing soil salinity," in Remote Sensing of Soil Salinization: Impact and Land Management, G. Metternicht and A. Zinck, Eds., pp. 235–255, CRC Press, Boca Raton, Fla, USA, 2009.
[7] N. Goldshleger, E. Ben-Dor, Y. Benyamini, D. Blumberg, and M. Agassi, "The spectral reflectance of soil's structural crust in the SWIR region (1.2–2.5 µm)," Terra Nova, vol. 13, no. 1, pp. 12–17, 2001.
[8] N. Goldshleger, E. Ben-Dor, Y. Benyamini, and M. Agassi, "Soil reflectance as a tool for assessing physical crust arrangement of four typical soils in Israel," Soil Science, vol. 169, no. 10, pp. 677–687, 2004.
[9] N. Goldshleger, E. Ben-Dor, A. Chudnovsky, and M. Agassi, "Soil reflectance as a generic tool for assessing infiltration rate induced by structural crust for heterogeneous soils," European Journal of Soil Science, vol. 60, no. 6, pp. 1038–1051, 2009.
[10] E. Ben-Dor, N. Goldshleger, Y. Benyamini, D. G. Blumberg, and M. Agassi, "The spectral reflectance properties of soil structural crusts in the 1.2- to 2.5-µm spectral region," Soil Science Society of America Journal, vol. 67, no. 1, pp. 289–299, 2003.
[11] E. Ben-Dor, N. Goldshleger, O. Braun et al., "Monitoring infiltration rates in semiarid soils using airborne hyperspectral technology," International Journal of Remote Sensing, vol. 25, no. 13, pp. 2607–2624, 2004.
[12] P. Williams and K. Norris, Near-Infrared Technology in The Agricultural and Food Industries, American Association of Cereal Chemists, 1987.
[13] E. Ben-Dor, K. Patkin, A. Banin, and A. Karnieli, "Mapping of several soil properties using DAIS-7915 hyperspectral scanner data: a case study over soils in Israel," International Journal of Remote Sensing, vol. 23, no. 6, pp. 1043–1062, 2002.
[14] R. Clark, "Spectroscopy of rocks and minerals," in Manual of Remote Sensing, A. Rencz, Ed., pp. 3–58, John Wiley & Sons, 1999.
[15] H. W. Siesler, Y. Ozaki, S. Kawata, and H. M. Heise, Eds., Near-Infrared Spectroscopy. Principles, Instruments, Applications, Wiley-VCH, Weinheim, Germany, 2002.
[16] Y. Dan and Z. Raz, Soil Association Map of Israel, Volcani Institute for Agricultural Research, Israel, 1970.
[17] Keys to Soil Taxonomy, Soil Survey Staff, "Soil Taxonomy: A system of soil classification for making and interpreting soil surveys," US Department of Agriculture, 1999.
[18] E. Ben-Dor and A. Banin, "Visible and near infrared (0.4–1.1 µm) analysis of arid and semi arid soils," Remote Sensing of Environment, vol. 48, no. 3, pp. 261–274, 1995.
[19] L. Kooistra, R. Wehrens, W. Leuven, and L. Buydens, "Possibilities of visible-near-infrared spectroscopy for the assessment of soil contamination in river floodplains," Analytica Chimica Acta, vol. 446, no. 1-2, pp. 97–105, 2001.
[20] L. Kooistra, G. Wanders, R. Epemac, W. Leuven, L. Wehrens, and L. Buydens, "The potential of field spectroscopy for the assessment of sediment properties in river floodplains," Analytica Chimica Acta, vol. 484, no. 2, pp. 189–200, 2003.
[21] G. I. Metternicht and J. A. Zinck, "Remote sensing of soil salinity: potentials and constraints," Remote Sensing of Environment, vol. 85, no. 1, pp. 1–20, 2003.
[22] R. A. Viscarra Rossel, R. N. McGlynn, and A. B. McBratney, "Determining the composition of mineral-organic mixes using UV-vis-NIR diffuse reflectance spectroscopy," Geoderma, vol. 137, no. 1-2, pp. 70–82, 2006.
[23] R. A. Viscarra Rossel and T. Behrens, "Using data mining to model and interpret soil diffuse reflectance spectra," Geoderma, vol. 158, no. 1-2, pp. 46–54, 2010.
[24] T. Owen, "Advances in UV-VIS spectroscopy," Derivative Spectroscopy, vol. 1, pp. 58–64, 1987.
[25] F. Tsai and W. D. Philpot, "A derivative-aided hyperspectral image analysis system for land-cover classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 2, pp. 416–425, 2002.
[26] T. Udelhoven, C. Emmerling, and T. Jarmer, "Quantitative analysis of soil chemical properties with diffuse reflectance spectrometry and partial least-square regression: a feasibility study," Plant and Soil, vol. 251, no. 2, pp. 319–329, 2003.
[27] J. Moros, M. J. Martínez-Sánchez, C. Pérez-Sirvent, S. Garrigues, and M. de la Guardia, "Testing of the region of Murcia soils by near infrared diffuse reflectance spectroscopy and chemometrics," Talanta, vol. 78, no. 2, pp. 388–398, 2009.
[28] S. Wold, M. Sjöström, and L. Eriksson, "PLS-regression: a basic tool of chemometrics," Chemometrics and Intelligent Laboratory Systems, vol. 58, no. 2, pp. 109–130, 2001.
[29] D. Brown, K. Shepherd, M. Walsh, M. Mays, and T. Reinsch, "Global soil characterization with VNIR diffuse reflectance spectroscopy," Geoderma, vol. 132, no. 3-4, pp. 273–290, 2006.
[30] D. J. Brown, "Using a global VNIR soil-spectral library for local soil characterization and landscape modeling in a 2nd-order Uganda watershed," Geoderma, vol. 140, no. 4, pp. 444–453, 2007.
[31] R. Hecht-Nielsen, Neurocomputing, Addison-Wesley, 1990.
[32] C.-C. Yang, S. O. Prasher, P. Enright et al., "Application of decision tree technology for image classification using remote sensing data," Agricultural Systems, vol. 76, no. 3, pp. 1101–1117, 2003.
[33] J. Farifteh, F. Van der Meer, C. Atzberger, and E. Carranza, "Quantitative analysis of salt-affected soil reflectance spectra: a comparison of two adaptive methods (PLSR & ANN)," Remote Sensing of Environment, vol. 110, no. 1, pp. 59–78, 2007.
[34] J. Morin, D. Goldberg, and I. Singer, "Rainfall simulator with rotating disk," Transactions of the American Society of Agricultural Engineers, vol. 10, pp. 74–77, 1967.
[35] K. Esbensen, Multivariate Data Analyses. An Introduction to Multivariate Data Analyses and Experimental Design, Aalborg University, Esbjerg, Denmark, 5th edition, 2002.
[36] A. J. Adeloye and A. De Munari, "Artificial neural network based generalized storage-yield-reliability models using the Levenberg-Marquardt algorithm," Journal of Hydrology, vol. 326, no. 1–4, pp. 215–230, 2006.
[37] H. Demuth and M. Beale, Neural Network Toolbox for Use with Matlab, The MathWorks, Natick, Mass, USA, 2004.
[38] A. M. Mouazen, W. Saeys, J. Xing, J. De Baerdemaeker, and H. Ramon, "Near infrared spectroscopy for agricultural materials: an instrument comparison," Journal of Near Infrared Spectroscopy, vol. 13, no. 2, pp. 87–97, 2005.
[39] G. R. Hunt, "Spectral signatures of particulate minerals, in the visible and near-infrared," Geophysics, vol. 42, no. 3, pp. 501–513, 1977.
[40] G. R. Hunt, J. W. Salisbury, and A. Lenhoff, "Visible and near-infrared spectra of minerals and rocks: III. Oxides and hydroxides," Modern Geology, vol. 2, pp. 195–205, 1971.

Hindawi Publishing Corporation
Applied and Environmental Soil Science
Volume 2012, Article ID 241535, 13 pages
doi:10.1155/2012/241535

Research Article

Spectral Estimation of Soil Properties in Siberian Tundra Soils and Relations with Plant Species Composition

Harm Bartholomeus,1 Gabriela Schaepman-Strub,2 Daan Blok,3 Roman Sofronov,4 and Sergey Udaltsov5

1 Centre for Geo-Information, Wageningen University, 6708 PB Wageningen, The Netherlands
2 Institute of Evolutionary Biology and Environmental Studies, University of Zürich, 8006 Zurich, Switzerland
3 Nature Conservation and Plant Ecology, Wageningen University, 6708 PB Wageningen, The Netherlands
4 Institute of Biological Problems of the Cryolithozone, 677980 Yakutsk, Russia
5 Institute of Physicochemical and Biological Problems of Soil Science, 142290 Pushchino, Russia

Correspondence should be addressed to Harm Bartholomeus, [email protected]

Received 13 February 2012; Accepted 18 June 2012

Academic Editor: Raphael Viscarra Rossel

Copyright © 2012 Harm Bartholomeus et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Predicted global warming will be most pronounced in the Arctic and will severely affect environments. Due to their large spatial extent and large stocks of soil organic carbon, changes to organic matter decomposition rates and associated carbon fluxes in Arctic permafrost soils will significantly impact the global carbon cycle. We explore the potential of soil spectroscopy to estimate soil carbon properties and investigate the relation between soil properties and vegetation composition. Soil samples are collected in Siberia, and vegetation descriptions are made at each sample point. First, laboratory-determined soil properties are related to the spectral reflectance of wet and dried samples using partial least squares regression (PLSR) and stepwise multiple linear regression (SMLR). SMLR, using selected wavelengths related to C and N, yields high calibration accuracies for C and N. PLSR yields a good prediction model for K and a moderate model for pH. Using these models, soil properties are determined for a larger number of samples, and soil properties are related to plant species composition. This analysis shows that variation of soil properties is large within vegetation classes, but vegetation composition can be used for qualitative estimation of soil properties.

1. Introduction

The Arctic is experiencing the highest rates of warming compared with other world regions [1] that will likely have great impacts on high-latitude ecosystems [2, 3]. The large and potentially volatile carbon pools stored in Arctic soils have the potential for large emissions of greenhouse gases in the form of both CO2 and CH4 under warmer and potentially drier conditions, resulting in a positive feedback to global warming [4]. Further, climatic changes may impact vegetation development and affect water and energy exchange in tundra ecosystems, with consequences for permafrost thaw depth [5, 6] and concomitant soil carbon release to the atmosphere [7–9]. The response of soil organic matter decomposition to increasing temperature is a critical aspect of ecosystem responses to global change [10]. It has been suggested that a warmer and drier climate in Arctic regions might increase the decomposition rate and, hence, release more CO2 to the atmosphere than at present [11, 12].

Besides expected changes within the soil itself, changes in vegetation development are observed and expected for future warming. Plant species composition may greatly affect rates of soil processes, including decomposition [13]. In general, species within a growth form (graminoids, evergreen shrubs, deciduous shrubs, and mosses) are more similar in their effects on decomposition than species belonging to different growth forms, with graminoid litter having the fastest rate and litter of deciduous shrubs and mosses having the slowest rates [14, 15]. Gough et al. [16] found that soil pH was significantly correlated with plant species richness and density at larger spatial scales.

Abiotic soil factors have a strong influence on vegetation development, since plant growth in tundra regions is typically limited by temperature and nutrient availability [17, 18]. Without knowledge of the present chemical composition of the soil it is not possible to estimate how and with which magnitude vegetation changes will take place, thus limiting our understanding of climate-vegetation-permafrost feedbacks. Arctic vegetation is expected to be more shrub dominated with rising temperatures [18], which may positively feed back to summer atmospheric heating by decreasing the surface albedo [19, 20]. On the other hand, an increase in shrub cover may concomitantly also lead to summer soil cooling and decreasing permafrost thaw by shading the soil surface [6], thus potentially slowing down soil carbon turnover. More knowledge on the relationships between soil properties and vegetation composition is however required to accurately predict the consequences of climate-induced vegetation shifts for soil carbon pools in the Arctic.

Due to the large carbon stocks in the permafrost soil and the potential high release of large quantities of carbon dioxide and methane, the role of tundra permafrost soils in global climate processes is significant. Therefore, we need to know how large the carbon content of the soil is, and how this varies in space. Furthermore, we need to determine other soil properties like pH and nutrients, in order to estimate how these may influence carbon turnover rates and vegetation development. However, costs for soil analysis are high and fieldwork faces many logistic difficulties due to the inaccessibility of the tundra areas.

Reflectance spectroscopy has proven to be a powerful tool for fast assessment of multiple soil properties [21, 22] in both laboratory and field setups [23]. However, the applicability of reflectance spectroscopy relies on the construction of a calibration database, which is in general site specific. Although numerous papers have been published on the estimation of soil organic carbon and other soil properties in various environments [24], to our best knowledge none of them focused on highly organic tundra soil, and therefore no models are available to determine soil properties from their reflectance. Since bare soil surfaces occur rarely in tundra environments, the use of remotely sensed vegetation proxies will be essential for possible spatially continuous estimates of soil properties.

The objectives of this study are (1) to evaluate if reflectance spectroscopy operated in the field or in slightly controlled conditions can be successfully applied to assess soil properties that influence carbon turnover and vegetation development in a Siberian arctic tundra environment, (2) to investigate the variation and distribution of major soil properties in this area, and (3) to investigate the relation between vegetation composition and soil properties of the organic layers and evaluate if plant species composition can be used as a proxy to estimate soil properties. We use reflectance measurements to calibrate partial least square regression (PLSR) models and stepwise multiple linear regression (SMLR) models for total C, total N, pH, total K, total P, and soil moisture. Selected models with a good performance are then applied to estimate properties for a larger number of soil samples. Finally, the relations between soil properties and their relation with plant species composition are discussed.

2. Materials and Methods

2.1. Site Description and Measurements. Measurements are made in a low-Arctic tundra site within the Kytalyk nature reserve in North East Siberia, Russia (70°49′N, 147°28′E). The research site covers an area of ca 9 km² and is located on the north bank of the Berelekh (Yelon) river, a tributary of the Indigirka river, approximately 30 km northwest of the town Chokurdakh. The study area consists of a floodplain area along the river and an extensive plain with thaw lakes and drained thaw lakes. The only large elevation difference (ca 20–30 m) is caused by the presence of a Pleistocene river terrace. The mean annual air temperature is −10.5°C, with a mean January temperature of −34.2°C and a mean July temperature of +10.4°C. Annual mean precipitation amounts to 212 mm, of which about half falls as snow [25]. The soil is frozen for most of the year, but the permafrost thaws to a maximum depth of 50 cm during summer. Although there are only minor differences in topography in the area, there is a large variation in microtopography and hydrology, which results in a large variation in vegetation types. The vegetation at the research site consists of a mixture of graminoids, forbs, mosses, and shrubs and is classified as G4 (tussock-sedge, dwarf shrub, and moss tundra) and S2 (low-shrub tundra) on the Circumpolar Arctic Vegetation Map (CAVM) [26]. Fieldwork is done in the summer of 2008, including spectral measurements of soil and vegetation, combined with the collection of soil samples and vegetation descriptions.

2.2. Methodology

2.2.1. Soil Description and Sampling. Soil sampling is done in two ways to get a good impression of the differences and spatial variation in soil properties and types. First, soil profile descriptions are made along short transects (ca 2–9 m length), to study the variation in soil profile under the dominating plant functional types. This is done for three locations: transect 1 is located on the Pleistocene river terrace, transect 2 on the slope down from this terrace towards the drained thaw lake, and transect 3 is located in the drained thaw lake (Figure 1). The soil is described down to the frozen layer (horizons and thickness, texture, decomposition stage, Munsell color), and for one location within each transect a permafrost drill is used to sample frozen layers. Second, soil samples are collected for 37 plots throughout the area. Sampling is done randomly within the different main vegetation types, ensuring a comparable number of samples for all main vegetation types. For these plots, the thickness of the decomposed (no plant fibers visible anymore) and slightly decomposed (plant remains still observable) organic layer is measured and samples are collected for spectral analysis. All samples (N = 128) are air-dried to determine the moisture content and prepare them for laboratory spectral measurements. Vegetation descriptions are made in the plots where soil samples are collected, noting species identities and estimating plant fractional cover.

2.2.2. Spectral Measurements and Laboratory Element Analysis. The spectral reflectance of the soil samples is measured

[Figure 1 map: overview of the study area between roughly 147°28′E and 147°30′E near 70°49′30″N, with a scale bar of 0–1000 m and an inset map of Russia; labeled features: drained thaw lake, Pleistocene river terrace, slope, Berelekh River; transects 1, 2, and 3.]

Legend: profile locations, sample locations.

Figure 1: Panchromatic GeoEye-1 image with an overview of the study area including the locations of the profile descriptions and sample locations. The arrow on the overview map indicates the location of the study area in Russia.

with an ASD Fieldspec Classic FR, ranging from 350–2500 nm, combined with an ASD contact probe. A white Spectralon calibration panel is used as reference. For most samples this is done under field conditions (N = 118, further referred to as field spectra), although this was not possible for 10 samples originating from the frozen mineral soil. For all samples (N = 128), the moisture content is determined by weighing the fresh and air-dried samples. Additionally, the reflectance of the air-dried samples is measured (further referred to as lab spectra). The frozen samples are included in this dataset, after thawing and drying them. Part of the soil samples (N = 38) are sent to the laboratory (Institute of Physicochemical and Biological Problems in Soil Science, Pushchino) for chemical analysis. To ensure that the full range of soil properties is represented, these samples are selected in such a way that samples from all horizons, landscape elements, and from under all main vegetation types are included. Organic soil samples (N = 33) are analyzed for pH, Total P, K, N, and C, while the mineral soil samples (N = 5) are additionally analyzed for Mg, CaO, and Fe2O3.

Different regression methods are used to relate the chemical analysis to the spectral measurements. As a reference technique, we use Partial Least Square Regression (PLSR) for all soil properties (Total C, Total N, pH, Total P, Total K, and moisture). This method has been used frequently to develop soil property models to determine, for example, organic carbon in laboratory, field, and airborne settings [27–29]. PLSR is done in ParLeS [30], where all reflectance spectra are converted to apparent absorbance, mean centre transformation is done, and spectra are denoised using a Savitzky-Golay filter. Models are evaluated by leave-one-out cross validation, using the root mean square error (RMSE) and Akaike Information Criterion (AIC) to select the proper number of latent variables.

Furthermore, we investigate the possibility to use known absorption features in the reflective domain to estimate soil carbon and nitrogen. Because the samples are highly organic, we assume that absorption features related to carbon and nitrogen in plant material (e.g., in components like lignin and cellulose) can still be observed in the soil reflectance spectra. Therefore, we use the carbon- and nitrogen-related wavelengths described by Curran [31] in combination with stepwise multiple linear regression (SMLR) for the estimation of Total C and Total N in the soil samples. Regression models are fitted for lab spectra and field spectra and evaluated by means of leave-one-out cross validation, using the R software package [32]. Model performance is evaluated using the R², RMSE, and ratio of performance to deviation (RPD), according to the criteria defined by Chang and Laird [33]. If appropriate prediction models can be established, soil properties are estimated for all soil samples. This results in a full analysis of all described soil profiles and an analysis of the slightly decomposed and strongly decomposed organic layers for the 37 locations for which full vegetation description is done.

Due to the continuous vegetation cover, nondestructive measurements of soil reflectance are not possible. Since plant species composition is related to abiotic factors and is a potentially important source of variation in soil processes, including decomposition rates [13], we investigated the relation between plant species composition and soil properties. The vegetation descriptions are classified into four major plant functional types (dry tussock evergreen shrub, deciduous shrub, moist Sphagnum sedges, and wet sedge pools), using the two-way indicator species analysis (TWINSPAN) for Windows v2.3 [34] as described in Blok [35]. Boxplots of physical and chemical soil properties per vegetation class are made to investigate the relation between soil properties and vegetation type.

[Figure 2 panels: Profile 1: Pleistocene ridge; Profile 2: slope; Profile 3: drained thaw lake. Axes: depth from horizontal plane (cm) versus distance along transect (cm); lines: topography, permafrost; horizon labels include O, Oi, Ao, Aof, H, B, and Bf.]

Figure 2: Soil profile descriptions along three transects (green = organic material, orange = slightly decomposed organic soil, brown = decomposed organic soil, and grey = mineral soil). Names of occurring plant species are given in the Appendix, including the estimated fractional cover.

3. Results and Discussion

3.1. Soil Profiles and Chemical Properties. Soil profile descriptions made for three short transects are shown in Figure 2. Within each transect, the microtopography and thickness are measured at fixed distances of 10 cm, and at representative locations in terms of vegetation composition the full profile is described. The depth from the horizontal plane, shown on the y-axis, is the relative height compared to the highest point within the corresponding profile. Because of the presence of permafrost within the first meter, all soils are classified as Gelisols according to the USDA soil taxonomy. On the Pleistocene ridge (transect 1), the soils consist of an organic layer on top of clayey/silty parent deposits. The organic layer can mostly be subdivided into an O horizon, followed by an Ao horizon with decomposed organic material. At some locations, an Oi horizon is visible, with slightly decomposed organic material. The presence of an O horizon depends on the vegetation type (e.g., Eriophorum vaginatum hummocks) and hydrological conditions (e.g., wet conditions with Sphagnum mosses). Organic layer thickness mostly varies between 5 and 15 cm, but occasionally thicker layers occur (up to 25 cm). The mineral B horizon consists of clay/loamy clay, with an olive to dark olive grey color, and continues beyond our maximum sampling depth (92 cm). Spots of iron oxidation can be seen in the thawed soil, which indicates that aerobic processes do occur above the permafrost. The total C content of the mineral soil lies between 1.97% and 4.86%. The higher C content is found at a depth of >60 cm and is caused by some small organic remains. The soil profiles in transect 2, on the slope of the Pleistocene ridge to the drained thaw lake basin, show no large differences with transect 1 on top of the ridge, although thick O horizons are absent. The mineral B horizon has the same texture as on the ridge, and oxidation marks are observed. The profiles in the drained thaw lake basin differ from the other locations by the absence of an Oi horizon. Either a small organic layer is present at the drier locations, or a thick wet organic layer (H horizon) is present at the lower parts. Usually, a small organic layer with decomposed material is found between the organic layer and the mineral B horizon; only the profile at the location with Sphagnum lacks this Ao horizon. At the time of sampling, the top of the permafrost follows the top of the mineral soil, which suggests that permafrost thaw is related to the soil composition or vegetation composition.

The soil sampling of the B horizon down to a maximum depth of 92 cm reveals an average carbon content of 2.84% in the frozen mineral soil along all transects, with a maximum value of 4.86% on the Pleistocene ridge. Compared to

Table 1: Summary of chemical analysis and correlations (R) between soil properties.

           pH      Total P      Total K      Total N   Total C
                   (mg/100 g)   (mg/100 g)   (%)       (%)
Min        3.88    62           195          0.29      1.97
Max        6.78    201          1700         2.16      44.66
Mean       4.92    129          860          1.07      20.75
Stdev      0.66    38.23        426.16       0.50      11.47
Correlations (R)
pH         1
Total P    −0.29   1
Total K    0.63    −0.62        1
Total N    −0.45   0.72         −0.89        1
Total C    −0.62   0.60         −0.97        0.88      1

an average C content of 2.56% in soils [36], our site shows slightly higher C contents for the sampled depth. This means that an increase in active layer thickness will expose a slightly higher amount of C to decomposition than estimated by Zimov et al. [36].

In general, very large differences in soil composition are observed at short distances, making continuous spatial mapping of soil properties a difficult task. The strong spatial variation in soil composition corresponds with the spatial variation in microtopography, surface hydrology, and plant species composition. For example, thickness of the organic layer can vary between 5 and 25 cm within a distance of less than a meter.

Table 1 shows the statistical summary of laboratory analysis and correlations between soil properties. The values show that the ranges in all soil properties are large and that variation is high. Soils are in general acidic, although in some cases neutral pH levels were measured. As expected, the Total C content is high on average, with lower levels for the mineral soil. Total K and total C show a very high correlation (R = −0.97), and both properties are clearly correlated with total N (R = 0.88 with total C and R = −0.89 for total K). Total P and pH are not strongly correlated with any of the other soil properties. Frequency histograms are shown in Figure 3, which show that the selection of samples for chemical analysis was done well, since the full range of all soil properties is covered.

[Figure 3 panels: frequency histograms of pH, total P (mg/100 g), total K (mg/100 g), total N (%), total C (%), and moisture (%).]

Figure 3: Frequency histograms of the 38 analyzed soil samples. The x-axis shows the ranges of the soil property and the y-axis the frequency.

3.2. Soil Spectral Analysis. In general, the mineral soil has the lowest reflectance when measured in the laboratory (Figure 4). Major absorption features around 1400 and 1900 nm are caused by remaining water in the samples. The slightly decomposed horizons show a higher reflectance in the near infrared and more pronounced water absorption features. First derivatives emphasize the presence of small absorption features at 1535 nm, between 1700 and 1800 nm, and between 2200 and 2320 nm, which correspond with absorption features for plants, caused by the presence of lignin, starch, cellulose, nitrogen, and proteins [31]. The absorption features are most pronounced in the slightly decomposed samples, but present in the decomposed samples as well. In the mineral soil spectra, an absorption feature around 2200 nm is present, caused by the fact that clay is the parent material [37], but the organic layers also show a minor absorption feature at this wavelength. Field observations revealed the presence of iron oxides in

[Figure 4 panels: (a) reflectance (−) versus wavelength (nm); (b) 1st derivative (−) versus wavelength (nm); wavelength range 500–2500 nm; series: mineral, decomposed, slightly decomposed.]

Figure 4: Spectral signatures of three horizons sampled at the same geographic location. The left graph shows the reflectance spectra, the right figure shows the first derivative of the reflectance spectra.

the mineral soil, which was supported by the chemical analysis. However, the spectral signature of the mineral soil shows no clear absorption feature for iron oxides.

Using the lab spectra, good calibrations are found for Total K and Total C, using PLSR (Table 2). The good fit for Total K is mainly caused by the strong correlation with Total C and Total N (Table 1), instead of specific absorption features by K. The PLSR model for Total N yields somewhat lower results with an R² of 0.75 and RPD of 1.97, which just classifies it as a moderate model for prediction, but 8 factors are used to fit this model. This is relatively high, given the size of the calibration data set. For pH a moderate model (class B according to the classification of Chang and Laird [33]) can be fitted for prediction as well, but with an RPD of 1.42 this model is on the lower level of this class, indicating that the predicted pH values should rather be interpreted qualitatively than quantitatively. Total P cannot be predicted well from the spectral data. The RPD of 1.28 and R² of 0.38 indicate that this PLSR model cannot reliably be applied to other soil spectra.

Table 2: Performance of model fits using lab spectra of dried samples, evaluated with leave-one-out cross-validation.

                 pH     Total P      Total K      Total N   Total N   Total C   Total C   Moisture
                        (mg/100 g)   (mg/100 g)   (%)       (%)       (%)       (%)       (%)
Method           PLSR   PLSR         PLSR         PLSR      SMLR      PLSR      SMLR      PLSR
No. of factors   5      2            2            8         2         2         9         2
R²               0.50   0.38         0.79         0.73      0.80      0.79      0.95      0.42
RMSE CV          0.47   29.77        193.92       0.26      0.23      5.17      2.59      10.09
RPD CV           1.42   1.28         2.20         1.93      2.18      2.22      4.43      1.33

RMSE CV: root mean square error of cross-validation, RPD CV: ratio of performance to deviation of cross-validation, PLSR: partial least squares regression, SMLR: stepwise multiple linear regression.

SMLR using the absorption features described by Curran [31] yields very good results for the prediction of total N and total C (Table 2). Especially for total C, the estimations improve strongly, to an RMSE of 2.59%, half of the RMSE achieved by the PLSR model, which is also expressed in a high R² (0.95) and RPD (4.43). It has to be noted that the number of wavelengths kept for the final multiple linear regression model is rather high for total C, which may limit the use of this model for other areas. The RMSE is larger than results obtained in other studies [24], but is very acceptable given the range in the dataset and high levels of total C in this study. For total N, the model performance also improves, although less strongly than for total C, and in addition the number of factors used in the regression is largely reduced. Because total K has a very strong correlation with Total C, we checked if an indirect estimation of Total K, using the predicted Total C values and the relation between the two properties, yields a better prediction. This is not the case, but results are comparable with the values obtained with PLSR on the spectra directly. Scatterplots of the observed versus

Table 3: Performance of model fits using spectra of samples under field conditions, evaluated with leave-one-out cross validation.

                 pH     Total P      Total K      Total N   Total N   Total C   Total C   Moisture
                        (mg/100 g)   (mg/100 g)   (%)       (%)       (%)       (%)       (%)
Method           PLSR   PLSR         PLSR         PLSR      SMLR      PLSR      SMLR      PLSR
No. of factors   4      2            2            2         4         4         8         1
R²               0.45   0.16         0.44         0.34      0.43      0.45      0.74      0.11
RMSE CV          0.48   34.93        314.73       0.40      0.39      8.38      6.03      12.7
RPD CV           1.35   1.09         1.35         1.24      1.28      1.37      1.90      1.05

RMSE CV: root mean square error of cross validation, RPD CV: ratio of performance to deviation of cross validation, PLSR: partial least squares regression, SMLR: stepwise multiple linear regression.

Table 4: TWINSPAN vegetation classes and dominant plant species per class.

TWINSPAN class          Dominant plant species
Dry tussock evergreen   Ledum decumbens, Eriophorum vaginatum, Salix glauca, and Vaccinium uliginosum
Moist deciduous shrub   Betula nana, Salix pulchra, and Arctagrostis latifolia
Moist Sphagnum sedge    Sphagnum spp., Carex aquatilis, and Salix fuscescens
Wet sedge pools         Eriophorum angustifolium

predicted values for the best performing methods are shown in Figure 5.

Using field spectra (i.e., wet soil samples) the model performance decreases drastically, mostly to levels that are not acceptable for quantitative prediction of soil properties (RPD < 1.4). Only for total C could a reasonable model be fitted, using SMLR, but the RMSE is more than two times larger than the RMSE found for dried samples. This accuracy is comparable with the results of Knadel et al. [38] using field spectra only, for their study site in Denmark, which shows comparable ranges in carbon. The difference in accuracy between lab spectra and field spectra is very likely related to soil moisture, which generally decreases the prediction capabilities of visible and near infrared spectroscopy. Interestingly, the moisture content cannot be estimated from the field spectra using PLSR (RPD = 1.05, Table 3), but using the dried spectra some correspondence can be found with the reflectance measurements (RPD = 1.33, Table 2). The low accuracy for soil moisture is most probably caused by the very high levels of soil moisture (20–95%). Since the absorption features that are related to water may already saturate at lower moisture levels, observing differences between these high levels is not possible from the reflectance spectra.

3.3. Soil Properties per Horizon. The identified models for lab spectra are used to predict soil properties for all samples collected at the 37 locations for which detailed vegetation descriptions are done and all samples collected from the soil profiles. Box plots of pH, total K, total C, and total N are made for the different horizons (Figure 6). Total K and pH show a gradual increase when going deeper into the soil. The content of total C and total N decreases with depth. There are clear differences in the median and quartile values for the different horizons, although there is overlap in the minimum and maximum ranges. Further, there is a large variation in all soil properties within the slightly decomposed and decomposed layer, as can be seen from the width of the boxes, showing the 25% and 75% quantile ranges (Figure 6). Only for the mineral soil is the variation more constrained for all soil properties. The levels of total C in the organic layers are comparable to those presented by Michaelson et al. [39] for the Coastal Plain and Northern Foothills in Alaska, but the mineral soil samples in their study show a larger variation in observed values, due to the large geographic extent of their study.

With the data we gathered, it is not possible to assess the total carbon stock of our research site. To make such predictions the maximum soil sampling depth should be increased and bulk density has to be determined for each sample. Under current conditions, the total organic layer is thawed early in the summer season, but on a per gram carbon basis deep permafrost mineral soils show carbon release rates similar to organic soils for some soil types [40]. Further, changes in hydrology will have a large influence on carbon decomposition as hydrological conditions determine if the carbon fluxes to the atmosphere are released under aerobic (mainly CO2) or anaerobic (high CH4 rates) conditions. According to Lee et al. [40], aerobic conditions have a greater effect on climate when compared with a similar amount of permafrost thawing in an anaerobic environment.

3.4. Relation between Soil Properties and Vegetation Type. The TWINSPAN classification results in four vegetation classes, for which the dominant plant species are given in Table 4. Figure 7 shows boxplots of the predicted soil properties for the vegetation classes. Separate boxplots are made for the slightly decomposed layer (Oi horizon) and the decomposed organic layer (Ao horizon).

The average pH of the soil is comparable for all vegetation types, but the variation within the plant communities shows large differences. The dry tussock evergreen shrub and the wet sedge pools, dominated by Eriophorum angustifolium, show a large variation in pH of the slightly decomposed organic layer. For the moist Sphagnum sedge vegetation, the pH in the slightly decomposed horizon hardly varies.

[Figure 5 panels: observed versus predicted values with a 1 : 1 line for total N (%) with SMLR and PLSR, total C (%) with SMLR and PLSR, pH (−) with PLSR, total P (mg/100 g) with PLSR, total K (mg/100 g) with PLSR, and moisture (%) with PLSR; each panel reports the RPD, RMSE, and R² values listed in Table 2.]

Figure 5: Scatterplots of the observed versus predicted values for multiple soil properties, based on PLSR or SMLR using lab spectra.
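The statistics reported in Tables 2 and 3 and in the Figure 5 panels (RMSE, RPD, and R² from leave-one-out cross validation) can be illustrated with a minimal sketch. It is not the authors' workflow, which used ParLeS and R: an ordinary least-squares model stands in for the final regression step, the data are synthetic, and the function names `loo_cv_predictions` and `cv_metrics` are hypothetical. RPD is taken here as the standard deviation of the observed values divided by the RMSE, and R² as the squared correlation between observed and predicted values; both are assumptions about the exact definitions used in the study.

```python
import numpy as np


def loo_cv_predictions(X, y):
    """Predict each sample from a least-squares model fitted without it."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # intercept plus predictors
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # leave sample i out of the calibration
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        preds[i] = design[i] @ coef
    return preds


def cv_metrics(y_obs, y_pred):
    """RMSE, RPD (SD of observed / RMSE), and squared correlation."""
    rmse = float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))
    rpd = float(np.std(y_obs, ddof=1) / rmse)
    r2 = float(np.corrcoef(y_obs, y_pred)[0, 1] ** 2)
    return {"rmse_cv": rmse, "rpd_cv": rpd, "r2": r2}


# Synthetic data for illustration only: 38 samples, 3 predictor bands.
rng = np.random.default_rng(0)
X = rng.normal(size=(38, 3))
y = 20.0 + X @ np.array([8.0, -4.0, 2.0]) + rng.normal(scale=2.0, size=38)
metrics = cv_metrics(y, loo_cv_predictions(X, y))

# Consistency check against the published tables: the standard deviation
# of total C (Table 1) divided by the SMLR RMSE CV (Table 2).
rpd_total_c = round(11.47 / 2.59, 2)  # 4.43, the RPD CV listed in Table 2
```

Under this definition, RPD depends on the spread of the calibration set as much as on the error, which is one reason the same RMSE can look acceptable for a wide-ranging property such as total C and poor for a narrow-ranging one.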

[Figure 6 panels: boxplots of pH (−), total K (mg/100 g), total C (%), and total N (%) for the slightly decomposed, decomposed, and mineral horizons.]

Figure 6: Boxplots for pH, total K, total C, and total N, based on PLSR (pH and K) and SMLR (C and N) estimated soil properties, for all the samples.
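The quantities a boxplot such as Figure 6 displays (median, 25% and 75% quantiles, and range per horizon) can be computed directly, as in the following minimal sketch. The values are hypothetical and chosen for illustration only; they are not the study's data, and `quartile_summary` is a hypothetical helper name.

```python
import numpy as np


def quartile_summary(values_by_group):
    """Return min, Q1, median, Q3, and max for each group of values."""
    summary = {}
    for group, values in values_by_group.items():
        arr = np.asarray(values, dtype=float)
        q1, median, q3 = np.percentile(arr, [25, 50, 75])
        summary[group] = {
            "min": float(arr.min()),
            "q1": float(q1),
            "median": float(median),
            "q3": float(q3),
            "max": float(arr.max()),
        }
    return summary


# Hypothetical predicted total C values (%) per horizon, illustration only.
example = {
    "slightly decomposed": [38.0, 42.0, 30.0, 44.0, 35.0],
    "decomposed": [25.0, 18.0, 30.0, 22.0, 27.0],
    "mineral": [2.0, 3.0, 4.0],
}
stats = quartile_summary(example)
```

The box width discussed in the text corresponds to `q3 - q1` for each horizon; overlapping minimum and maximum ranges with clearly separated medians would show up as distinct `median` values alongside overlapping `min` and `max` values.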

The soil in the decomposed organic layer is on average less acid, with a slightly higher pH for the moist Sphagnum-sedge vegetation, compared to other vegetation types. The observed pH values correspond well with values in literature [16]. Soils under all vegetation types are more acidic than the optimal pH for methanogenesis of around 6 [41], but methanogenesis has been shown to occur at low pH (pH = 3.1) [42]. Plant growth in the tundra system can be limited by a number of factors, such as soil temperature and nutrient availability. The measures of nutrients do not indicate direct deficiencies, but if pH is lower than 6, P starts forming insoluble compounds with iron (Fe) and aluminium (Al). Concentrations of N are less sensitive to pH, but efficient use by plants depends on availability of several nutrients. Therefore, the amounts of nutrients available for plant growth are probably limited by soil pH, in combination with low decomposition activity due to low temperatures and low quality of organic material.

The total C content of the slightly decomposed layer is in general higher than for the decomposed layer. Furthermore, total C in the upper layer shows more variation for the different plant communities. Most vegetation types showed large differences in total C content between the slightly decomposed and decomposed layer. The sedge dominated tussock/evergreen shrub class is characterized by large amounts of standing litter and dense roots, which cause the high amount of total C in the slightly decomposed layer. Also, the thickness of the organic layer shows large variation within the different vegetation classes. On average, the total organic layer is the thinnest under the deciduous shrubs. Combined with the fact that the total C content is relatively low, this vegetation type may contribute least to soil carbon stocks in the arctic tundra. However, large aboveground shrub biomass can also constitute a significant carbon pool, thus contributing to the total carbon stock in shrub tundra areas. Several studies suggest that an increase in temperature will lead to an increase in shrub growth in arctic tundra [17, 18, 43–45]. This implies that the future total C accumulation in tundra soils will decrease, since the thickness of the organic layers will on average decrease and the total C content is not higher than for other vegetation types. However, there will be a trade-off with the fact that increased abundance of deciduous shrubs with future climate warming will promote carbon storage, because of their relatively large allocation to woody stems that decompose slowly [14].

[Figure 7 panels: pH (−), C (%), N (%), C/N ratio, and moisture (%) for the slightly decomposed and decomposed layers, plus active layer thickness (cm) and organic layer thickness (cm), each shown for the four vegetation classes: dry tussock evergreen shrub, deciduous shrub, moist Sphagnum-sedges, and wet sedge pools.]

Figure 7: Boxplots of the predicted soil properties for the four vegetation classes.

The total N content in the slightly decomposed layer does not show large variation between the different vegetation classes. Remarkable is the low variation within the wet sedge vegetation class. The total N content of the decomposed layer is generally about 0.5% lower than for the slightly decomposed layer, although the difference is small for the deciduous shrub vegetation class. The C/N ratio is overall rather high, indicating that the organic material in the soil does not contain large amounts of humus. This is mainly the case for the dry tussock evergreen shrub class, which consists of dense graminoid species (Eriophorum vaginatum) with low evergreen shrubs with dense roots and relatively large amounts of litter.

Plots dominated by deciduous shrubs show a lower active layer thickness (ALT), which corresponds with the results from a shrub removal experiment by Blok et al. [46], showing that shrubs can reduce energy transfer to the soil by shading the soil surface and thus can reduce ALT. Sphagnum and sedge-dominated wet areas show a higher ALT, probably due to the high soil moisture levels, increasing soil thermal conductivity. Strong relations between vegetation composition and ALT have, for example, been shown in a large-scale study conducted in Alaska, where strong differences in ALT were observed between vegetation types along a gradient from shrub-dominated to barren tundra [47].

As expected, the soil moisture content is highest for the wet vegetation classes (Sphagnum sedge and wet sedge pools), but the difference in soil moisture content under the other vegetation types is not that large, probably due to the fact that we sampled in early summer. As a result, soil moisture content is high (>40%) for most samples, under all vegetation types, which prohibits a good estimation of soil properties by in situ reflectance measurements.

3.5. Implications for Spatially Continuous Mapping of Soil Properties. The relationships between plant species composition and soil properties allow qualitative estimations for C and N in the different organic layers, due to the limited variation within the vegetation classes, but the relationships are not distinctive enough to be used as a proxy for quantitative estimates. Knowing the vegetation type, it can be determined if a high or low C and N can be expected. Concerning the pH, qualitative estimation will be possible for some vegetation classes for some horizons. Especially for moist Sphagnum sedges, the range in pH is small for both the slightly decomposed and the decomposed organic layer.

The fact that the presence of certain plant species is related to the soil properties opens possibilities for application of vegetation spectroscopy and remote sensing. Field reflectance measurements can be used to estimate presence and fractional cover of different species, for example, by using spectral unmixing techniques [48]. Given the large spatial variation in plant species composition, the use of air- or spaceborne remote sensing data requires both a high spatial (<1 m) and spectral resolution. Nowadays, these can only be acquired from airborne platforms. Next to this, cokriging techniques, using the vegetation as proxy variable in combination with a well-designed spatial sampling strategy, may offer possibilities for spatial mapping of soil properties in the arctic tundra. The presented spectral methods do allow fast and cheap intensive measurement of the soil properties in our study area. Possibilities to map vegetation classes comparable to the TWINSPAN classification have to be investigated, since TWINSPAN determines class assignments based on occurrence and quantity of individual species, which is practically impossible to determine with remote sensing data.

4. Conclusions

The presented results show that reflectance spectroscopy can be used for fast quantification of multiple soil properties in the Siberian tundra, although drying of the soil samples is required before measuring reflectance. As such, it can be a useful tool to achieve a higher sampling density for soil properties in tundra ecosystems, where logistics limit the collection and chemical analysis of a large number of samples. In situ reflectance spectroscopy can be used to determine total C. Soil properties show large variation over short distances, requiring intensive sampling to achieve good regional estimates of, for example, carbon stocks. To allow good estimates of carbon stocks in the area, it is important to increase maximum sampling depth and determine bulk density for each sample. Because of the relation between vegetation species and soil properties, plant species composition can be used to give a qualitative indication about the soil properties below the surface.

Appendix

For more details see Table 5.

Table 5: Names of occurring plant species for the described transects and profiles, including the fractional cover.

[Table 5 columns: Transect 1 (Pleistocene Ridge), profiles A to E; Transect 2 (Slope), profiles A to D; Transect 3 (Drained Thaw lake), profiles A to C. Species listed: Betula nana, Salix pulchra, Salix glauca, Ledum decumbens, Vaccinium vitis-idaea, Cassiope tetragona, Eriophorum vaginatum, Eriophorum angustifolium, Carex aquatilis ssp. stans, Carex chordorrhiza, Arctagrostis latifolia ssp. arundinacea, Calamagrostis holmii, Poa alpigena, Aulacomnium turgidum, Tomenthypnum nitens, Hylocomium splendens, Rhytidium rugosum, Dicranum polysetum, Dicranum sp., Polytrichum sp., Sphagnum squarrosum, Sphagnum sp., Cetraria islandica, Cetraria laevigata, Cetraria nigricans, Dactylina sp., Peltigera aphthosa.]

Acknowledgment

The authors would like to thank Alexander Kononov of the Institute of Biological Problems of the Cryolithozone, Yakutsk, for assistance in the logistics.

References

[1] IPCC, “Climate change 2007: the physical science basis,” in Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, S. Solomon, D. Qin, M. Manning et al., Eds., p. 996, Cambridge University Press, Cambridge, UK, 2007.
[2] E. Post, M. C. Forchhammer, M. S. Bret-Harte et al., “Ecological dynamics across the arctic associated with recent climate change,” Science, vol. 325, no. 5946, pp. 1355–1358, 2009.
[3] ACIA, in Arctic Climate Impact Assessment, Impacts of a Warming Arctic, V. M. Kattsov and E. Källén, Eds., pp. 99–150, Cambridge University Press, Cambridge, UK, 2004.
[4] C. D. Koven, B. Ringeval, P. Friedlingstein et al., “Permafrost carbon-climate feedbacks accelerate global warming,” Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 36, pp. 14769–14774, 2011.
[5] V. E. Romanovsky, D. S. Drozdov, N. G. Oberman et al., “Thermal state of permafrost in Russia,” Permafrost and Periglacial Processes, vol. 21, no. 2, pp. 136–155, 2010.
[6] D. Blok, M. M. P. D. Heijmans, G. Schaepman-Strub, A. V. Kononov, T. C. Maximov, and F. Berendse, “Shrub expansion may reduce summer permafrost thaw in Siberian tundra,” Global Change Biology, vol. 16, no. 4, pp. 1296–1305, 2010.
[7] E. A. G. Schuur, J. G. Vogel, K. G. Crummer, H. Lee, J. O. Sickman, and T. E. Osterkamp, “The effect of permafrost thaw on old carbon release and net carbon exchange from tundra,” Nature, vol. 459, no. 7246, pp. 556–559, 2009.
[8] N. S. Zimov, S. A. Zimov, A. E. Zimova, G. M. Zimova, V. I. Chuprynin, and F. S. Chapin, “Carbon storage in permafrost and soils of the mammoth tundra-steppe biome: role in the global carbon budget,” Geophysical Research Letters, vol. 36, no. 2, Article ID L02502, 6 pages, 2009.
[9] E. Dorrepaal, S. Toet, R. S. P. Van Logtestijn et al., “Carbon respiration from subsurface peat accelerated by climate warming in the subarctic,” Nature, vol. 460, no. 7255, pp. 616–619, 2009.
[10] R. T. Conant, M. G. Ryan, G. I. Ågren et al., “Temperature and soil organic matter decomposition rates – synthesis of current knowledge and a way forward,” Global Change Biology, vol. 17, no. 11, pp. 3392–3404, 2011.
[11] W. D. Billings, J. O. Luken, D. A. Mortensen, and K. M. Peterson, “Arctic tundra: a source or sink for atmospheric carbon dioxide in a changing environment?” Oecologia, vol. 53, no. 1, pp. 7–11, 1982.
[12] W. D. Billings, J. O. Luken, D. A. Mortensen, and K. M. Peterson, “Increasing atmospheric carbon dioxide: possible effects on arctic tundra,” Oecologia, vol. 58, no. 3, pp. 286–289, 1983.
[13] S. E. Hobbie and L. Gough, “Litter decomposition in moist acidic and non-acidic tundra with different glacial histories,” Oecologia, vol. 140, no. 1, pp. 113–124, 2004.
[14] S. E. Hobbie, “Temperature and plant species control over litter decomposition in Alaskan tundra,” Ecological Monographs, vol. 66, no. 4, pp. 503–522, 1996.
[15] J. H. C. Cornelissen, P. M. Van Bodegom, R. Aerts et al., “Global negative vegetation feedback to climate warming responses of leaf litter decomposition rates in cold biomes,” Ecology Letters, vol. 10, no. 7, pp. 619–627, 2007.
[16] L. Gough, G. R. Shaver, J. Carroll, D. L. Royer, and J. A. Laundre, “Vascular plant species richness in Alaskan arctic tundra: the importance of soil pH,” Journal of Ecology, vol. 88, no. 1, pp. 54–66, 2000.
[17] F. S. Chapin, G. R. Shaver, A. E. Giblin, K. J. Nadelhoffer, and J. A. Laundre, “Responses of Arctic tundra to experimental and observed changes in climate,” Ecology, vol. 76, no. 3, pp. 694–711, 1995.

[18] M. D. Walker, C. H. Wahren, R. D. Hollister et al., “Plant community responses to experimental warming across the tundra biome,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 5, pp. 1342–1346, 2006.
[19] D. Blok, G. Schaepman-Strub, H. Bartholomeus, M. M. P. D. Heijmans, T. C. Maximov, and F. Berendse, “The response of Arctic vegetation to the summer climate: relation between shrub cover, NDVI, surface albedo and temperature,” Environmental Research Letters, vol. 6, no. 3, Article ID 035502, 2011.
[20] F. S. Chapin, M. Sturm, M. C. Serreze et al., “Role of land-surface changes in arctic summer warming,” Science, vol. 310, no. 5748, pp. 657–660, 2005.
[21] C. W. Chang, D. A. Laird, M. J. Mausbach, and C. R. Hurburgh, “Near-infrared reflectance spectroscopy – principal components regression analyses of soil properties,” Soil Science Society of America Journal, vol. 65, no. 2, pp. 480–490, 2001.
[22] R. A. Viscarra Rossel, D. J. J. Walvoort, A. B. McBratney, L. J. Janik, and J. O. Skjemstad, “Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties,” Geoderma, vol. 131, no. 1-2, pp. 59–75, 2006.
[23] A. Stevens, B. van Wesemael, H. Bartholomeus, D. Rosillon, B. Tychon, and E. Ben-Dor, “Laboratory, field and airborne spectroscopy for monitoring organic carbon content in agricultural soils,” Geoderma, vol. 144, no. 1-2, pp. 395–404, 2008.
[24] B. Stenberg, R. A. Viscarra Rossel, A. M. Mouazen, and J. Wetterlind, “Visible and Near Infrared Spectroscopy in Soil Science,” Advances in Agronomy, vol. 107, pp. 163–215, 2010.
[25] M. K. Van Der Molen, J. Van Huissteden, F. J. W. Parmentier et al., “The growing season greenhouse gas balance of a continental tundra site in the Indigirka lowlands, NE Siberia,” Biogeosciences, vol. 4, no. 6, pp. 985–1003, 2007.
[26] D. A. Walker, M. K. Reynolds, F. J. A. Daniëls et al., “The Circumpolar Arctic vegetation map,” Journal of Vegetation Science, vol. 16, no. 3, pp. 267–282, 2005.
[27] P. H. Fidêncio, R. J. Poppi, J. C. De Andrade, and H. Cantarella, “Determination of organic matter in soil using near-infrared spectroscopy and partial least squares regression,” Communications in Soil Science and Plant Analysis, vol. 33, no. 9-10, pp. 1607–1615, 2002.
[28] T. Udelhoven, C. Emmerling, and T. Jarmer, “Quantitative analysis of soil chemical properties with diffuse reflectance spectrometry and partial least-square regression: a feasibility study,” Plant and Soil, vol. 251, no. 2, pp. 319–329, 2003.
[29] H. Bartholomeus, L. Kooistra, A. Stevens et al., “Soil Organic Carbon mapping of partially vegetated agricultural fields with imaging spectroscopy,” International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 1, pp. 81–88, 2011.
[30] R. A. Viscarra Rossel, “ParLeS: software for chemometric analysis of spectroscopic data,” Chemometrics and Intelligent Laboratory Systems, vol. 90, no. 1, pp. 72–83, 2008.
[31] P. J. Curran, “Remote sensing of foliar chemistry,” Remote Sensing of Environment, vol. 30, no. 3, pp. 271–278, 1989.
[32] R. Ihaka and R. Gentleman, “R: a language for data analysis and graphics,” Journal of Computational and Graphical Statistics, vol. 5, no. 3, pp. 299–314, 1996.
[33] C. W. Chang and D. A. Laird, “Near-infrared reflectance spectroscopic analysis of soil C and N,” Soil Science, vol. 167, no. 2, pp. 110–116, 2002.
[34] M. O. Hill and P. Šmilauer, TWINSPAN for Windows Version 2.3, Centre for Ecology & Hydrology and University of South Bohemia, Huntingdon, UK, 2005.
[35] D. Blok, Shrubs in the Cold: Interactions between Vegetation, Permafrost and Climate in Siberian Tundra, Wageningen University, Wageningen, The Netherlands, 2011.
[36] S. A. Zimov, S. P. Davydov, G. M. Zimova et al., “Permafrost carbon: stock and decomposability of a globally significant carbon pool,” Geophysical Research Letters, vol. 33, no. 20, Article ID L20502, 5 pages, 2006.
[37] E. Ben-Dor, “Quantitative remote sensing of soil properties,” Advances in Agronomy, vol. 75, pp. 173–243, 2002.
[38] M. Knadel, A. Thomsen, and M. H. Greve, “Multisensor on-the-go mapping of soil organic carbon content,” Soil Science Society of America Journal, vol. 75, no. 5, pp. 1799–1806, 2011.
[39] G. J. Michaelson, C. L. Ping, and J. M. Kimble, “Carbon storage and distribution in tundra soils of Arctic Alaska, U.S.A,” Arctic and Alpine Research, vol. 28, no. 4, pp. 414–424, 1996.
[40] H. Lee, E. A. G. Schuur, K. S. Inglett, M. Lavoie, and J. P. Chanton, “The rate of permafrost carbon release under aerobic and anaerobic conditions and its potential effects on climate,” Global Change Biology, vol. 18, no. 2, pp. 515–527, 2012.
[41] P. Dunfield, R. Knowles, R. Dumont, and T. R. Moore, “Methane production and consumption in temperate and subarctic peat soils: response to temperature and pH,” Soil Biology and Biochemistry, vol. 25, no. 3, pp. 321–326, 1993.
[42] R. T. Williams and R. L. Crawford, “Methanogenic Bacteria, including an acid-tolerant strain, from peatlands,” Applied and Environmental Microbiology, vol. 50, no. 6, pp. 1542–1544, 1985.
[43] M. S. Bret-Harte, G. R. Shaver, J. P. Zoerner et al., “Developmental plasticity allows Betula nana to dominate tundra subjected to an altered environment,” Ecology, vol. 82, no. 1, pp. 18–32, 2001.
[44] D. Blok, U. Sass-Klaassen, G. Schaepman-Strub, M. M. P. D. Heijmans, P. Sauren, and F. Berendse, “What are the main climate drivers for shrub growth in Northeastern Siberian tundra?” Biogeosciences, vol. 8, no. 5, pp. 1169–1179, 2011.
[45] I. H. Myers-Smith, B. C. Forbes, M. Wilmking et al., “Shrub expansion in tundra ecosystems: dynamics, impacts and research priorities,” Environmental Research Letters, vol. 6, no. 4, Article ID 045509, 2011.
[46] D. Blok, G. Schaepman-Strub, M. Heijmans et al., Climate Change Effects on Vegetation in Northeastern Siberian Tundra – How Does Shrub Growth Relate to Local Climate and What Are Potential Effects of Shrub Expansion on Permafrost Thawing? EGU General Assembly, Vienna, Austria, 2010.
[47] F. E. Nelson, N. I. Shiklomanov, G. R. Mueller, K. M. Hinkel, D. A. Walker, and J. G. Bockheim, “Estimating active-layer thickness over a large region: Kuparuk river basin, Alaska, U.S.A,” Arctic and Alpine Research, vol. 29, no. 4, pp. 367–378, 1997.
[48] G. Schaepman-Strub, J. Limpens, M. Menken, H. M. Bartholomeus, and M. E. Schaepman, “Towards spatial assessment of carbon sequestration in peatlands: spectroscopy based estimation of fractional cover of three plant functional types,” Biogeosciences, vol. 6, no. 2, pp. 275–284, 2009.

Hindawi Publishing Corporation
Applied and Environmental Soil Science
Volume 2012, Article ID 751956, 11 pages
doi:10.1155/2012/751956

Research Article Quantitative Analysis of Total Petroleum Hydrocarbons in Soils: Comparison between Reflectance Spectroscopy and Solvent Extraction by 3 Certified Laboratories

Guy Schwartz,1, 2, 3 Eyal Ben-Dor,2 and Gil Eshel4

1 Porter School of Environmental Studies, Tel-Aviv University, Tel-Aviv 69978, Israel
2 Remote Sensing Laboratory, Tel-Aviv University, Tel-Aviv 69978, Israel
3 Geography and Human Environment Department, Tel-Aviv University, P.O. Box 39040, Tel-Aviv 69978, Israel
4 The Soil Erosion Research Station, Ruppin Institute, Emeck Hefer 40250, Israel

Correspondence should be addressed to Guy Schwartz, [email protected]

Received 9 January 2012; Revised 29 March 2012; Accepted 3 April 2012

Academic Editor: Jose Alexandre Melo Dematte

Copyright © 2012 Guy Schwartz et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The commonly used analytic method for assessing total petroleum hydrocarbons (TPH) in soil, EPA method 418.1, is usually based on extraction with 1,1,2-trichlorotrifluoroethane (Freon 113) and FTIR spectroscopy of the extracted solvent. This method is widely used for initial site investigation, due to the relatively low price per sample. It is known that the extraction efficiency varies depending on the extracting solvent and other sample properties. This study’s main goal was to evaluate reflectance spectroscopy as a tool for TPH assessment, as compared with three commercial certified laboratories using traditional methods. Large variations were found in the results of the three commercial laboratories, both internally (average deviation up to 20%) and between laboratories (average deviation up to 103%). The reflectance spectroscopy method was found to be as good as the commercial laboratories in terms of accuracy and could be a viable field-screening tool that is rapid, environmentally friendly, and cost effective.

1. Introduction

Among the chemicals that are relevant as environmental contaminants, petroleum hydrocarbons (PHC) are of particular significance. The widespread use of PHC for transportation, heating and industry has led to the release of these petroleum products into the environment through accidental spills, long-term leakage, or operational failures. Consequently, many soil and water areas are contaminated with PHC. PHC are well known to be neurotoxic to humans and animals. Several studies have been conducted in order to verify the effects of PHC on humans and animals [1–3]. For both the diagnosis of suspected areas and the possibility of controlling the rehabilitation process, there is a great need to measure correctly the amounts of PHC in soils.

Total petroleum hydrocarbons (TPH) is a commonly used gross parameter for quantifying environmental contamination originated by various PHC products such as fuels, oils, lubricants, waxes, and others [4]. Traditional wet chemistry methods for determining TPH level in soil samples are based on extracting the contaminant from the soil sample. The TPH level in the extracted solution is then determined by a gravimetric, FTIR, or GC measurement calibrated by an EPA calibration standard.

The TPH gross parameter is in use worldwide and facilitates an important stage of contaminated sites investigation; therefore, it is important to examine the effects of hydrocarbon type and soil properties on the extraction efficiency, as well as cross-lab repeatability.

The common method for assessing TPH in soil samples is based on a modified version of EPA method 418.1. This method is based on extraction with 1,1,2-trichlorotrifluoroethane (Freon 113, GC 99.9%), although other extracting solvents are available (e.g., carbon tetrachloride and n-hexane). This method was originally introduced in 1978 [5] by the USEPA in order to assess TPH in waste water but was

later adjusted in 1983 [6] for the assessment of TPH in soil samples. Newer methods are available for determining TPH in soil samples; these methods are based on extraction with other solvents and are usually followed by gas chromatograph analysis for TPH determination. As these methods are more expensive, the EPA method 418.1 is widely used as a screening tool [4, 7].

There are a number of possible interactions between inorganic and organic soil components and organic pollutants, soil organic matter, and clays, having significant impact on solid-liquid extraction. Furthermore, the solvent extraction of compounds from soil or sludge samples is dependent on the moisture content in the soil [8]. There are some inherent problems with IR readings of the extracted solvent; all petroleum hydrocarbons do not respond equally to infrared analysis, and comparison of the unknown to a standard mixture may give results with high systematic errors [9]. The major problem with the adjusted EPA 418.1 method is that the extraction yields can be strongly matrix dependent, and the extraction method development and optimization may be quite complicated. These extraction-related problems mainly originate from the diversity of chemical and physical properties of petroleum hydrocarbons, which affect not only the solubility of hydrocarbons in the solvents, but also the strength of analyte-soil matrix interactions, and therefore render the control of the extraction process of petroleum hydrocarbons from soil problematic.

In conclusion, it is clear that the adjusted EPA method 418.1 may overestimate TPH as a result of the following: (1) differences in infrared molar absorptivity for calibration standards and petroleum products; (2) detection of naturally occurring hydrocarbons; (3) infrared dispersion by mineral particles. Negative bias may also be introduced via (1) poor extraction efficiency of Freon-113 for high-molecular-weight hydrocarbons; (2) differences in molar absorptivity; (3) removal of five to six-ring alkylated aromatics during the silica gel cleanup procedure [10].

Quality assurance in the area of TPH determination is underdeveloped and, except in a few cases [11–13], there have not been any attempts to estimate the uncertainty related to the analytical procedure of TPH determination.

Taking into consideration all the possible biases that can occur during the adjusted EPA 418.1 method, as well as the fact that each laboratory uses somewhat different protocols and equipment for the extraction process and TPH determination, a methodical cross-laboratory evaluation is needed.

In addition to the traditional analytical chemistry methods used for measuring TPH in the soil samples, a novel method based on reflectance spectroscopy was applied. Reflectance spectroscopy is commonly applied for quantitative analysis in many disciplines. This method consists of measuring the reflected electromagnetic energy from the soil samples in the VIS-NIR-SWIR region (350–2500 nm), and modeling this spectral data against samples with known concentration levels. Extracting the information about the soil attributes that is hidden within the spectral information is done by using multivariate statistical techniques, also called chemometrics. Essentially, this involves regression techniques coupled with spectral preprocessing. A more detailed description of the spectral preprocessing and the chemometrics process as well as an overview of reflectance spectroscopy as a tool for monitoring contaminated soils can be found in a recent publication by the authors [7].

The spectral properties of hydrocarbons were identified in the late 1980s, although it was argued that these properties are visible at concentrations of 4% wt and above [14]. Several studies were conducted during the past 20 years in the field of PHC and reflectance spectroscopy (e.g., [15–24]) that showed the potential of reflectance spectroscopy as a tool for predicting TPH content. To take a step forward in acceptance of this tool by the environmental protection authorities, a validation study that includes a comparison of the results of commercial laboratories analysis and reflectance spectroscopy performance is needed. Therefore, the goals of this study are (1) a comparison of the inner and interlaboratory TPH measuring capabilities, (2) general accuracy of the measured TPH levels as compared to the known TPH levels of the contaminated soil samples, and (3) testing reflectance spectroscopy as a viable replacement for the traditional methods based on solvent extraction.

Table 1: Major soil properties.

Israeli local name | USDA classification | HM (%) | Sand (volume %) | Silt (volume %) | Clay (volume %) | SOC (g kg−1) | SIC (g kg−1) | Total N (g kg−1) | pH¹ | EC¹ (mS m−1) | SSA (m2 g−1)
Loess | Typic xerofluvent | 4.14 | 38.6 | 49.4 | 12 | 5.4 | 22.5 | 0.9 | 8.22 | 5.44 | 167
Hamra | Typic xerocherept | 1.44 | 97.37 | 1.73 | 0.9 | 1.5 | 2.1 | 0.5 | 8.57 | 0.08 | 83
Gromosol | Typic chromoxerert | 5.23 | 46.46 | 38.98 | 14.56 | 7.6 | 12.5 | 1.3 | 8.68 | 0.55 | 238
¹ 1 to 2 ratio.

2. Materials and Methods

Three certified laboratories in Israel were selected for this study. Analogue soils typical to Israel were artificially contaminated with PHC and sent at the same time and in the same conditions to all laboratories. In addition, the samples underwent a new NIRS procedure that we developed in TAU in which reflectance spectroscopy is used to determine TPH level [7].

2.1. Soils and Hydrocarbons. Three soils were selected for this study (defined according to Israeli naming system [25] as well as the USDA key to soil taxonomy [26]): Loess (Typic Xerofluvent), Hamra (Typic Xerocherept), and Gromosol (Typic Chromoxerert). These soils represent a wide range of soil properties as described in Table 1 and differ significantly from each other. The soils were collected from areas that were assumed to have no PHC contamination and were air-dried and sieved through a 2 mm sieve twice.
The soil properties were determined by the traditional methods in soil science as follows: hygroscopic moisture content was determined by weight loss after 24 h at 105 °C. pH level and electrical conductivity were determined with a laboratory bench top 86505 pH/Conductivity meter by M.R.C Ltd. in a 1 : 2 soil to DI water suspension after reaching equilibrium (30 minutes). Specific surface area (SSA) was determined by the adsorption of a monolayer of ethylene glycol monoethyl ether (EGME) [27]. Particle size distributions were determined by a Malvern Mastersizer 2000 following the methodology of Eshel et al. [28]. SOC, SIC, and Total N were determined by a flash CHN elemental analyzer (Thermo Scientific Flash 2000). The analogue contaminated soil samples were prepared by mixing a known weight of one of several PHC types (octane fuel, diesel, and kerosene) with known quantities of soil. To make well-mixed low concentration samples, we initially mixed a batch of 98.5 g of soil with 1.5 g of the selected PHC; after mixing the initial batch, the batch was then mixed again with clean soil at three concentration levels. In order to minimize the loss of PHC components, we minimized exposure to open air as much as possible. Each sample was divided equally into 4 amber glass vials, capped with a PTFE lined cap, and kept at 4 °C. Three of the vials were sent to the analytical laboratories for analysis; 1 vial was kept for reflectance spectroscopy analysis. Table 2 describes the contamination properties of the samples and presents the calculated concentrations.

2.2. Extraction and TPH Measurement Method. The general methodology for the adjusted EPA 418.1 method is based on taking a representative soil sample (3–10 g), adding sodium sulfate (1–5 g) to absorb any water and adding an extracting solvent (usually Freon 113, 20–30 mL) to the mixture. This mixture is then kept in a sealed glass vial capped with a PTFE cap and placed in a sonic bath for assisting and hastening the extraction process (about 10–45 minutes). Silica gel is then added to the mixture to absorb any polar hydrocarbons (nonfuel-related soil organic matter and fatty acids), and the mixture is mixed well. The filtered extract is then measured in an FTIR spectrometer at 3.42 µm (some laboratories use other absorption peaks in the close region). A calibration curve is created by using the 418.1 EPA standard (consisting of 31.5% isooctane, 35% hexadecane and 33.5% chlorobenzene) diluted in the same extracting solution at a minimum of 3 concentrations. The absorption depth of the measured sample is then converted to TPH values by the calibration curve. As this method is an adjusted EPA method, it can vary slightly between analytical laboratories, depending on internal laboratory standards, procedures, and equipment. The three laboratories used for analyzing the samples prepared for this study are commercial laboratories, certified by the national laboratories certification authority; thus the exact procedure is confidential and not known to the authors, although the principle remains the same. All 30 contaminated samples prepared for this study as described above were sent to the three certified laboratories for determination of TPH levels; the results are summarized in Table 2.

[Figure 1 plot: absorbance (0 to 0.7) versus concentration (0 to 300 ppm), with linear fits:
Diesel: y = 0.0029x + 0.0128, R2 = 0.9994;
Kerosene: y = 0.0028x + 0.0113, R2 = 0.9997;
Octane 95: y = 0.0014x + 0.0067, R2 = 0.9998;
418.1 EPA reference: y = 0.0023x + 0.0048, R2 = 1.]

Figure 1: IR absorbance versus concentration (ppm).

2.3. IR Absorbance of Diesel, Kerosene, Octane 95, and 418.1 EPA Reference. The efficiency with which a PHC absorbs IR radiation depends on its molecular structure. It was important to map these absorption differences for the contaminants used in this study, relative to the 418.1 EPA reference that is usually used for TPH determination. Diesel, kerosene, octane 95, and the 418.1 EPA reference were mixed with Freon 113 at four different concentration levels each: ∼50, ∼100, ∼150, and ∼200 ppm. Each sample was then measured for its absorbance by a Buck Scientific 404 analyzer; the results are shown in Figure 1. Since the relation between the absorption and the concentration for each PHC is essentially linear (R2 ≥ 0.9994; see Figure 1), the absorption was calculated for each PHC for the following concentrations: 50, 100, 150, 200, 250, 300, 350, 400, 450, 500 ppm. Each PHC was then plotted versus the 418.1 EPA reference as shown in Figure 2.

2.4. Conversion of Specific PHC to TPH. Laboratories report results as TPH, a gross parameter based on the EPA standard, which represents a mixture of several PHC, whereas our soil samples were contaminated by a specific PHC. We therefore need to apply a conversion factor from the specific PHC to the relative gross parameter TPH, as seen in Figure 2. The resulting “Projected TPH” value should represent the contamination level of the contaminated samples if the laboratory process were flawless, thus eliminating one major bias factor, namely, the difference in IR absorbance efficiency between the 418.1 EPA standard and the specific PHC we used to contaminate the soil, as described in the previous section. The conversion equations to project the specific PHC to TPH values in this study (Figure 2) are:

(1) TPH (ppm) = Diesel (ppm) × 1.2609 + 0.0067,
(2) TPH (ppm) = Kerosene (ppm) × 1.2174 + 0.0055,
(3) TPH (ppm) = Octane 95 (ppm) × 0.6087 + 0.0039.

The calculated projected TPH values are shown in Table 2 and are used for the rest of this study instead of the original specific PHC levels.
A calibration specific PHC to the relative gross parameter TPH as seen is curve is created by using the 418.1 EPA standard (consists of Figure 2. This resulted “Projected TPH” value should rep- 31.5% isooctane, 35% hexadecane and 33.5% chloroben- resent the contamination level of the contaminated samples zene) diluted in the same extracting solution at at least 3 con- if the laboratory process was flawless, thus eliminating one centrations. The absorption depth of the measured sample is major bias factor, which is the difference between IR absorb- ffi then converted to TPH values by the calibration curve. As ance e ciency of the 418.1 EPA standard, relative to the this method is an adjusted EPA method, it can vary slightly specific PHC we used to contaminate the soil as described between analytical laboratories, depending on internal lab- in the previous section. The conversion equations to project oratory standards, procedures, and equipment. The three the specific PHC to TPH values in this study (Figure 2)are: laboratories used for analyzing the samples prepared for this (1) TPH (ppm) = Diesel (ppm) ∗ 1.2609 + 0.0067, study are commercial laboratories, certified by the national laboratories certification authority, thus the exact procedure (2) TPH (ppm) = Kerosene (ppm) ∗ 1.2174 + 0.0055, is confidential and not known to the authors, although the (3) TPH (ppm) = Octane 95% (ppm) ∗ 0.6087 + 0.0039, principal remains the same. All 30 contaminated samples prepared for this study as described above were sent to the The calculated projected TPH values are shown in three certified laboratories for chemical analysis determina- Table 2, and are used for the rest of this study instead of the tion of TPH levels, the results are summarized in Table 2. original specific PHC levels. 
4 Applied and Environmental Soil Science Lab A (TPH) Lab B (TPH) Lab C (TPH) Min Max Avg Min Max Avg Min Max Avg Spectroscopy (TPH) Projected TPH 2: Soil samples calculated concentration, projected TPH, and laboratory TPH results. Table 500400 630700 487 1378600 426 909600 252 757 937500 275 128 730 264 737 34 145 615 304 137 714 625 47 356 345 620 739 354 41 483 223 381 350 510 70 369 20 237 210 498 70 640 255 230 677 70 20 439 236 659 46 470 20 463 69 455 51 512 190 54 62 490 254 57 222 10 10 10 450550 567600 670 908 365 953 354 511 277 434 394 39 320 599 299 610 43 405 605 415 41 458 410 52 506 305 66 483 383 59 350 47 56 51 Calculated concentration (ppm) Contaminant oe0None 0Diesel 10 9Kerosene 9 9 101010101512 95% octane NoneDiesel Kerosene 095% octane 0 655 6 6 6 10 10 10 78 110 91 oe0None 0Diesel 8 4116 7 101010101010 Kerosene 95% octane Soil name Loess Gromosol Hamra 2 345 678 9 4500 10500 6000 12000 5674 13239 5500 7304 14609 4617 8693 3348 4871 8567 4575 8122 1274 5288 5455 8175 8740 4932 8149 6039 9608 519 6179 14534 15369 5747 6292 9174 14952 7441 14078 6236 586 9897 14125 7528 3730 10217 14102 553 7485 4480 10021 9410 793 3420 4111 9880 3814 838 9704 3628 816 244 333 300 1 11 12 131415 161718 192021 22 242325 26 250027 900028 29 400030 11000 3152 11348 4500 10000 4870 13391 2545 6062 3500 2739 11000 6087 3182 6495 1139 5984 5000 10000 704 4413 1100 13870 2308 2606 9435 7303 5200 9000 1724 6087 6644 12174 3250 9628 1419 3593 2376 210 1188 12447 2928 3601 12958 9532 3165 12703 1728 5478 4687 3320 13184 3597 1193 228 2613 11753 7970 13411 4698 2816 1191 13298 8560 1916 219 4693 3055 14593 2917 2674 7264 3494 1885 5839 8313 2145 13173 629 2936 3107 7859 2765 14705 2312 4169 2891 635 6209 410 14800 4441 7533 958 2219 14753 578 3832 4624 632 6024 10513 5588 11245 4533 491 629 11219 62 1127 11436 5613 2493 10811 11341 451 601 1043 88 5601 2706 7922 680 1800 1231 2621 8510 73 1852 1376 691 8261 1826 1306 686 685 228 824 265 743 249 10 9500 5783 1800 1227 1816 
1522 1142 2003 1573 260 312 279 Sample

Figure 2: Contaminant IR absorbance versus 418.1 EPA reference IR absorbance. Linear fits of contaminant absorbance (y) against 418.1 EPA reference absorbance (x): Diesel, y = 1.2609x + 0.0067; Kerosene, y = 1.2174x + 0.0055; Octane 95, y = 0.6087x + 0.0039.

2.5. Intralaboratory Consistency Factors. The contaminated soil samples from each laboratory separately were divided into three groups (low, medium, and high) by the known concentration level, regardless of soil type or contaminant. The intralaboratory consistency was evaluated by four factors.

(1) Average delta: the difference between the maximum TPH value and the minimum TPH value of each sample in that group, followed by averaging the results of all samples in that group.
(2) Average deviation: the difference between the maximum TPH value and the minimum TPH value of each sample in that group, divided by the average TPH value for that sample, thus normalizing the results. Finally, the normalized results were averaged over all samples in each group.
(3) Maximum delta: same as average delta, but instead of averaging the results for each group, only the maximum value was selected, portraying the "worst case scenario."
(4) Maximum deviation: same as average deviation, but instead of averaging the results for each group, only the maximum value was selected, portraying the "worst case scenario."

Results are shown in Table 3.

2.6. Interlaboratory Consistency Factors. The interlaboratory consistency factors were calculated in the same way as the intralaboratory factors, but instead of taking the samples from each laboratory separately, all samples from all laboratories were joined together, as if they came from the same laboratory. The same four factors (average delta, average deviation, maximum delta, and maximum deviation) were calculated as described in the intralaboratory consistency factors section. Results are summarized in Table 4.

2.7. Spectroscopy TPH Measurements. The contaminated soil samples were measured according to TAU's protocol [29] by an ASD FieldSpec Pro instrument with an ASD contact probe 3 times, each measurement consisting of 30 readings that were averaged; the 3 resulting spectra for each sample were then averaged. The average spectrum for each sample was used to predict the TPH level by a PLS model based on several soil types and PHC types, developed over the last few years by the authors. The modeling procedure included five types of soils and three types of PHCs at 50 concentration levels, yielding 750 laboratory-prepared samples. An "all possibilities" approach was used for generating robust NIRS models. This approach includes the evaluation of many preprocessing techniques (SNV, MSC, smoothing, absorbance, first and second derivatives, and continuum removal), as well as PLS and ANN modeling methods (e.g., [7, 22, 30–33]).

2.8. General Accuracy. In order to evaluate the reliability of the reflectance spectroscopy method as compared to the common EPA 418.1 method as an environmental monitoring tool, the general accuracy of both methods had to be examined. General accuracy is an important parameter, as it reflects not only the intra- and interlaboratory performance but also the ability of the laboratory to determine the actual contaminant concentration in the sample. General accuracy of TPH measurements done by both reflectance spectroscopy and analytical laboratories was measured by the same factors used for the inter- and intralaboratory groups (average delta, average deviation, maximum delta, and maximum deviation), as shown in Table 5. The average delta was calculated for each group by first calculating the delta for each sample in that group (average TPH value − projected TPH value), followed by averaging the results of all the samples in that group. The average deviation was calculated for each group by first calculating the delta for each sample in that group (average TPH value − projected TPH value), then dividing the result by the projected TPH value for that sample, thus normalizing the results. Finally, the normalized results of all samples were averaged for each group. The maximum delta and maximum deviation were calculated in the same manner, but instead of averaging the results for each group, only the maximum value was selected, portraying the "worst case scenario."

3. Results and Discussion

Intralaboratory consistency seems very acceptable, with results of under 20% average deviation for all 3 labs and Lab B having the best consistency, with under 10% deviation (Table 3). Although the average deviation is low for all laboratories, in some cases high deviation can occur, even up to 68%, as can be seen in Table 3 (medium concentration samples, Lab A). The interlaboratory consistency, on the other hand, is far from satisfactory. Average interlaboratory deviation is between 83% and 103% and can even reach values of ∼200% in some cases; for example, a Hamra sample contaminated with diesel (Sample 4, Table 2) yielded an average value of 8149 TPH from Lab A and 14952 TPH from Lab B. Both intra- and interlaboratory average deviations are presented in Figure 3; the performances of Lab A and Lab C are similar, with better performance by Lab B. General accuracy was also

Table 3: Intralab repetition statistics for low, medium, and high TPH levels.

TPH level (calculated)   Lab   AVG delta   AVG deviation   Max delta   Max deviation
Low (400–600)            A     24          12%             80          32%
Low (400–600)            B     15          7%              37          24%
Low (400–600)            C     38          17%             78          40%
Medium (2500–6000)       A     473         20%             1169        68%
Medium (2500–6000)       B     54          2%              183         6%
Medium (2500–6000)       C     229         16%             750         35%
High (9000–12000)        A     712         13%             2840        39%
High (9000–12000)        B     361         10%             861         55%
High (9000–12000)        C     390         9%              706         18%

Table 4: Interlab repetition statistics for low, medium, and high TPH levels.

TPH level (calculated)   AVG delta   AVG deviation   Max delta   Max deviation
Low (400–600)            190         83%             373         199%
Medium (2500–6000)       2203        103%            4382        209%
High (9000–12000)        4564        90%             7247        178%
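The four consistency factors of Sections 2.5 and 2.6 can be sketched in a few lines of Python. This is an illustrative reading of the verbal definitions, not the authors' implementation: the function names are hypothetical, and the readings in the example are invented purely to show the arithmetic. For the interlaboratory case, the readings of all laboratories for the same sample are pooled before the same factors are computed.

```python
from statistics import mean


def consistency_factors(samples):
    """Return the four factors for one concentration group.

    samples: one list of reported TPH values per sample.
    """
    deltas = [max(values) - min(values) for values in samples]
    deviations = [
        (max(values) - min(values)) / mean(values) for values in samples
    ]
    return {
        "avg_delta": mean(deltas),
        "avg_deviation": mean(deviations),
        "max_delta": max(deltas),
        "max_deviation": max(deviations),
    }


def pool_laboratories(per_lab):
    """Join the readings of all laboratories sample by sample (Section 2.6).

    per_lab: dict mapping a laboratory name to a list of per-sample readings;
    every laboratory must list the samples in the same order.
    """
    return [sum(readings, []) for readings in zip(*per_lab.values())]


if __name__ == "__main__":
    # Invented readings for two samples, for illustration only.
    lab_a = [[100, 120, 110], [200, 260, 230]]
    lab_b = [[180, 200, 190], [300, 320, 310]]
    print("intra, lab A:", consistency_factors(lab_a))
    pooled = pool_laboratories({"A": lab_a, "B": lab_b})
    print("inter:", consistency_factors(pooled))
```

Deviations are returned as fractions; multiplying by 100 gives the percentages reported in Tables 3 and 4.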

Figure 3: Intra-/interlaboratory deviation. Average deviation (%) by TPH level (low, 400–600; medium, 2500–6000; high, 9000–12000) for intra Lab A, intra Lab B, intra Lab C, and interlaboratory deviation.

Figure 4: Average and maximum deviation from projected TPH. Deviation (%) by TPH level (low, 400–600; medium, 2500–6000; high, 9000–12000), shown as average and maximum deviation for spectroscopy, Lab A, Lab B, and Lab C.

Figure 5: Hamra with all PHC types. Laboratory TPH (y) versus adjusted TPH (x), with 1 : 1 line. Spectroscopy: y = 0.5953x + 166.78, R² = 0.928; Lab A: y = 0.6526x − 315.28, R² = 0.9075; Lab B: y = 1.0765x − 1030, R² = 0.9175; Lab C: y = 0.7205x − 913.27, R² = 0.8812.

Figure 6: Loess with all PHC types. Laboratory TPH (y) versus adjusted TPH (x), with 1 : 1 line. Spectroscopy: y = 0.4317x + 520.88, R² = 0.8313; Lab A: y = 0.677x − 724.63, R² = 0.9092; Lab B: y = 1.0538x − 733.32, R² = 0.9333; Lab C: y = 0.6204x − 477.03, R² = 0.8266.

not satisfactory, as seen in Table 5: average deviation ranged from 26% up to 68%. Many of the accuracy errors occur in measuring 95% octane fuel; this could be a result of losing most of the contaminant during the extraction process due to the high volatility of this PHC. The performance of all the laboratories, including the reflectance spectroscopy method, is almost identical, as shown in Figure 4, with Lab B being the most accurate laboratory. Although accuracy was not satisfactory, a good correlation appears when plotting the reflectance spectroscopy and laboratory TPH results against the projected TPH results, as demonstrated in Figures 5, 6, and 7. This shows that both the spectroscopy and the laboratory TPH results are consistent and are good predictors of contamination levels. Because it is clear that 95% octane fuel is a problematic contaminant due to its high volatility, when we examine the results while ignoring the 95% octane contaminated samples, almost perfect correlation coefficients appear (Figures 8, 9, and 10). These correlations between the reflectance spectroscopy and the laboratory TPH results show that Lab B consistently overestimates the projected TPH values, while reflectance spectroscopy, Lab A, and Lab C consistently underestimate the projected

Table 5: Accuracy of TPH determination by reflectance spectroscopy and three commercial laboratories.

TPH level (calculated)   Source         AVG delta   AVG deviation   Max delta   Max deviation
Low (400–600)            Spectroscopy   325         68%             747         143%
Low (400–600)            Lab A          349         68%             500         93%
Low (400–600)            Lab B          192         42%             356         84%
Low (400–600)            Lab C          283         57%             508         97%
Medium (2500–6000)       Spectroscopy   2055        47%             4360        74%
Medium (2500–6000)       Lab A          1956        51%             2795        92%
Medium (2500–6000)       Lab B          1010        30%             2532        78%
Medium (2500–6000)       Lab C          2590        60%             4781        97%
High (9000–12000)        Spectroscopy   6187        61%             11494       83%
High (9000–12000)        Lab A          4392        49%             6150        81%
High (9000–12000)        Lab B          1827        26%             4210        73%
High (9000–12000)        Lab C          4413        50%             5859        95%
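The accuracy factors reported in Table 5 follow the definitions of Section 2.8 and can be sketched as below. The function name and the example values are hypothetical. One assumption is made explicit in the code: the delta is taken as a magnitude, so that over- and underestimation both count as error; the text defines the delta as "average TPH value − projected TPH value" without stating how the sign is handled.

```python
from statistics import mean


def accuracy_factors(pairs):
    """Return the four accuracy factors for one concentration group.

    pairs: (average measured TPH, projected TPH) for each sample in the group.
    """
    # Assumption: magnitudes are used, so over- and underestimation both count.
    deltas = [abs(measured - projected) for measured, projected in pairs]
    # Each delta is normalized by the projected TPH of the same sample.
    deviations = [
        delta / projected for delta, (_, projected) in zip(deltas, pairs)
    ]
    return {
        "avg_delta": mean(deltas),
        "avg_deviation": mean(deviations),
        "max_delta": max(deltas),
        "max_deviation": max(deviations),
    }


if __name__ == "__main__":
    # Invented (measured, projected) pairs, for illustration only.
    group = [(900.0, 1000.0), (2400.0, 2000.0)]
    print(accuracy_factors(group))
```

The only difference from the consistency factors is the reference point: here each sample is compared with its projected TPH rather than with the spread of its own repeated readings.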

Figure 7: Gromosol with all PHC types. Laboratory TPH (y) versus adjusted TPH (x), with 1 : 1 line. Spectroscopy: y = 0.1575x + 826.02, R² = 0.793; Lab A: y = 0.7949x − 1048.5, R² = 0.8294; Lab B: y = 1.0433x − 1011.9, R² = 0.9316; Lab C: y = 0.7582x − 1212, R² = 0.876.

Figure 8: Hamra with diesel and kerosene. Laboratory TPH (y) versus adjusted TPH (x), with 1 : 1 line. Spectroscopy: y = 0.5696x + 774.64, R² = 0.9852; Lab A: y = 0.6169x + 457.14, R² = 0.9665; Lab B: y = 1.0417x − 5.3423, R² = 0.9859; Lab C: y = 0.7017x − 203.17, R² = 0.9687.

Figure 9: Loess with diesel and kerosene. Laboratory TPH (y) versus adjusted TPH (x), with 1 : 1 line. Spectroscopy: y = 0.4265x + 1020, R² = 0.9911; Lab A: y = 0.6823x − 315.06, R² = 0.9836; Lab B: y = 1.0449x − 23.517, R² = 0.9912; Lab C: y = 0.6153x − 148.07, R² = 0.9384.

Figure 10: Gromosol with diesel and kerosene. Laboratory TPH (y) versus adjusted TPH (x), with 1 : 1 line. Spectroscopy: y = 0.1653x + 667.7, R² = 0.8491; Lab A: y = 0.7937x − 632.09, R² = 0.8516; Lab B: y = 1.024x − 267.43, R² = 0.9901; Lab C: y = 0.7653x − 898.97, R² = 0.9179.
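Because each regression line in the figure legends relates a reported TPH value to the adjusted (projected) TPH, a reported value can in principle be mapped back by inverting the line for the relevant source. The sketch below does this for the fits read from the legend of Figure 8 (Hamra, diesel and kerosene). The function name is hypothetical, the assignment of each line to a source is a reading of the figure legend and should be treated as an assumption, and the fits apply only to the soil and contaminants of that figure.

```python
# (slope, intercept) of "reported = slope * projected + intercept",
# as read from the legend of Figure 8 (Hamra, diesel and kerosene).
FIGURE8_FITS = {
    "Spectroscopy": (0.5696, 774.64),
    "Lab A": (0.6169, 457.14),
    "Lab B": (1.0417, -5.3423),
    "Lab C": (0.7017, -203.17),
}


def corrected_tph(source: str, reported_tph: float) -> float:
    """Invert the source-specific regression line to estimate projected TPH."""
    slope, intercept = FIGURE8_FITS[source]
    return (reported_tph - intercept) / slope


if __name__ == "__main__":
    # The same reported value maps to different estimates for each source.
    for source in FIGURE8_FITS:
        print(source, round(corrected_tph(source, 6000.0)))
```

A source with a slope below 1 has its reported values scaled up by the correction, and a source with a slope above 1 has them scaled down, which mirrors the under- and overestimation pattern described in the text.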

TPH values at almost the same level. As this phenomenon is so consistent, it can be corrected by the correlation factors specific to each laboratory. The results of this study confirm the hypothesis of large variations between laboratories and methods, even though they are properly certified by the authorities. It is interesting to note that, with a precise approach, it is possible to account for these variations and to correct and calibrate the results to represent the contamination levels accurately, thus enabling reliable, comparable results. Reflectance spectroscopy was found to be as good as the traditional method employed by the commercial certified laboratories. Reflectance spectroscopy is a nondestructive method that can be used for rapid, simple, and cost-effective TPH determination both in the laboratory and in the field. Moreover, the recent advances in the imaging spectroscopy field could enable the addition of a new spatial dimension for site investigation, opening new frontiers in monitoring PHC contamination in soil.

4. Conclusion

While accuracy level is affected by various elements such as laboratory protocols, equipment, and personnel, results remain very consistent and can be corrected when certain factors specific to each laboratory are employed. When a new batch of samples needs to be evaluated, a sample of clean soil similar to that of the batch, contaminated with the 418.1 EPA standard at two levels, can be added to the batch, thus helping to model the bias for this batch and to calibrate the results. Due to the problematic nature of measuring the 95% octane TPH levels, a PID (photoionization detector) instrument should be used to accompany each sample to help measure the volatile PHC. Reflectance spectroscopy performed very well in this study (almost the same as Lab A and Lab C) and should be considered as a tool for field screening due to its very low cost per sample, easy operation, ability to work in field conditions, and the possibility of fast measurements and instant results. Reflectance spectroscopy is a nondestructive, environmentally friendly method that, when coupled with a PID device (for volatile PHC detection), could be used as an excellent screening tool in the field. When using reflectance spectroscopy coupled with PID, contaminated samples should not elude detection. In general, the 418.1 EPA method alone should not be used to grant a "clean bill of health" to any contaminated site, but only as a screening and decision-making tool before more expensive methods are employed. It is strongly recommended that any certified laboratory and method be improved by using the standard protocol suggested in this study for calibrating the laboratory results to the real contamination level of the soil. Applying these protocols will assure accurate, consistent, and comparable results both within and between laboratories.

References

[1] M. S. Hutcheson, D. Pedersen, N. D. Anastas, J. Fitzgerald, and D. Silverman, "Beyond TPH: health-based evaluation of petroleum hydrocarbon exposures," Regulatory Toxicology and Pharmacology, vol. 24, no. 1, pp. 85–101, 1996.
[2] P. Boffetta, N. Jourenkova, and P. Gustavsson, "Cancer risk from occupational and environmental exposure to polycyclic aromatic hydrocarbons," Cancer Causes and Control, vol. 8, no. 3, pp. 444–472, 1997.
[3] G. D. Ritchie, K. R. Still, W. K. Alexander et al., "A review of the neurotoxicity risk of selected hydrocarbon fuels," Journal of Toxicology and Environmental Health B, vol. 4, no. 3, pp. 223–312, 2001.
[4] Environmental Sciences Division, Use of Gross Parameters for Assessment of Hydrocarbon Contamination of Soils in Alberta, Oxford, UK, 1993.
[5] United States Environmental Protection Agency (USEPA), Test Method for Evaluating Total Recoverable Petroleum Hydrocarbon, Method 418.1 (Spectrophotometric, Infrared), Government Printing Office, Washington, DC, USA, 1978.
[6] United States Environmental Protection Agency (USEPA), Methods for Chemical Analysis of Water and Wastes, Government Printing Office, Washington, DC, USA, 1983.
[7] G. Schwartz, G. Eshel, and E. Ben-Dor, "Reflectance spectroscopy as a tool for monitoring contaminated soils," in Soil Contamination, Intech, 2011.
[8] R. S. G. Gómez, T. Pandiyan, V. E. A. Iris, V. Luna-Pabello, and C. D. de Bazúa, "Spectroscopic determination of poly-aromatic compounds in petroleum contaminated soils," Water, Air, and Soil Pollution, vol. 158, no. 1, pp. 137–151, 2004.
[9] J. Krupcík, P. Oswald, D. Oktavec, and D. W. Armstrong, "Calibration of GC-FID and IR spectrometric methods for determination of high boiling petroleum hydrocarbons in environmental samples," Water, Air, and Soil Pollution, vol. 153, no. 1–4, pp. 329–341, 2004.
[10] G. Xie, M. J. Barcelona, and J. Fang, "Quantification and interpretation of total petroleum hydrocarbons in sediment samples by a GC/MS method and comparison with EPA 418.1 and a rapid field method," Analytical Chemistry, vol. 71, no. 9, pp. 1899–1904, 1999.
[11] P. Lambert, M. Fingas, and M. Goldthorp, "An evaluation of field total petroleum hydrocarbon (TPH) systems," Journal of Hazardous Materials, vol. 83, no. 1-2, pp. 65–81, 2001.
[12] E. Saari, P. Perämäki, and J. Jalonen, "A comparative study of solvent extraction of total petroleum hydrocarbons in soil," Microchimica Acta, vol. 158, no. 3-4, pp. 261–268, 2007.
[13] M. Villalobos, A. P. Avila-Forcada, and M. E. Gutierrez-Ruiz, "An improved gravimetric method to determine total petroleum hydrocarbons in contaminated soils," Water, Air, and Soil Pollution, vol. 194, no. 1–4, pp. 151–161, 2008.
[14] E. A. Cloutis, "Spectral reflectance properties of hydrocarbons: remote-sensing implications," Science, vol. 245, no. 4914, pp. 165–168, 1989.
[15] I. Schneider, G. Nau, T. V. V. King, and I. Aggarwal, "Fiber-optic near-infrared reflectance sensor for detection of organics in soils," IEEE Photonics Technology Letters, vol. 7, no. 1, pp. 87–89, 1995.
[16] B. R. Stallard, M. J. Garcia, and S. Kaushik, "Near-IR reflectance spectroscopy for the determination of motor oil contamination in sandy loam," Applied Spectroscopy, vol. 50, no. 3, pp. 334–338, 1996.
[17] Z. Zwanziger and F. Heidrun, "Near infrared spectroscopy of fuel contaminated sand and soil. I. Preliminary results and calibration study," Journal of Near Infrared Spectroscopy, vol. 6, no. 1–4, pp. 189–197, 1998.
[18] D. F. Malley, K. N. Hunter, and G. R. B. Webster, "Analysis of diesel fuel contamination in soils by near-infrared reflectance spectrometry and solid phase microextraction-gas chromatography," Soil and Sediment Contamination, vol. 8, no. 4, pp. 481–489, 1999.
[19] B. Hörig, F. Kühn, F. Oschütz, and F. Lehmann, "HyMap hyperspectral remote sensing to detect hydrocarbons," International Journal of Remote Sensing, vol. 22, no. 8, pp. 1413–1422, 2001.
[20] F. Kühn, K. Oppermann, and B. Hörig, "Hydrocarbon index – an algorithm for hyperspectral detection of hydrocarbons," International Journal of Remote Sensing, vol. 25, no. 12, pp. 2467–2473, 2004.
[21] K. H. Winkelmann, On the applicability of imaging spectrometry for the detection and investigation of contaminated sites with particular consideration given to the detection of fuel hydrocarbon contaminants in soil, Ph.D. thesis, Brandenburgische Technische Universität Cottbus, 2005.
[22] G. Schwartz, G. Eshel, M. Ben-Haim, and E. Ben-Dor, "Rapid methods for classification and quantitative assessment of petroleum hydrocarbons pollution in soil samples using reflectance spectroscopy," EGU 2009-11441-2, Vienna, Austria, 2009.
[23] S. Chakraborty, D. C. Weindorf, C. L. S. Morgan et al., "Rapid identification of oil-contaminated soils using visible near-infrared diffuse reflectance spectroscopy," Journal of Environmental Quality, vol. 39, no. 4, pp. 1378–1387, 2010.
[24] T. Lammoglia and C. R. de S. Filho, "Spectroscopic characterization of oils yielded from Brazilian offshore basins: potential applications of remote sensing," Remote Sensing of Environment, vol. 115, no. 10, pp. 2525–2535, 2011.
[25] J. Dan and H. Koyumdjisky, "The soils of Israel and their distribution," European Journal of Soil Science, vol. 14, no. 1, pp. 12–20, 1963.
[26] S. S. Staff, Keys to Soil Taxonomy, Government Printing Office, 2010.

[27] D. L. Carter, M. M. Mortland, and W. D. Kemper, "Specific surface," in Methods of Soil Analysis Part I. Soil Science, A. Klute, Ed., pp. 413–422, Society of America, Madison, Wis, USA, 1986.
[28] G. Eshel, G. J. Levy, U. Mingelgrin, and M. J. Singer, "Critical evaluation of the use of laser diffraction for particle-size distribution analysis," Soil Science Society of America Journal, vol. 68, no. 3, pp. 736–743, 2004.
[29] A. Pimstein, E. Ben-Dor, and G. Notesco, "Performance of three identical spectrometers in retrieving soil reflectance under laboratory conditions," Soil Science Society of America Journal, vol. 75, no. 2, pp. 746–759, 2011.
[30] G. Schwartz, G. Eshel, M. Ben-Haim, and E. Ben-Dor, Reflectance Spectroscopy as a Rapid Tool for Qualitative Mapping and Classification of Hydrocarbons Soil Contamination, Tel Aviv, Israel, 2009.
[31] G. Schwartz, G. Eshel, M. Ben-Haim, and E. Ben-Dor, Quantitative Assessment of Petroleum Hydrocarbons in Situ by Diffused Reflectance Spectroscopy and a Penetrating Optical Sensor, GFZ, Potsdam, Germany, 2010.
[32] G. Schwartz, G. Eshel, and E. Ben-Dor, An Operational Spectral Based Model to Predict Soil Petroleum Hydrocarbon Content in Field Samples, Edinburgh, Scotland, 2011.
[33] G. Schwartz, Reflectance spectroscopy as a rapid tool for qualitative mapping and classification of hydrocarbons soil contamination, Ph.D. thesis, Tel Aviv University, 2012.