
Utah State University DigitalCommons@USU

All Graduate Theses and Dissertations Graduate Studies

5-2015

Digital Soil Mapping Using Landscape Stratification for Arid Rangelands in the Eastern Great Basin, Central Utah

Brook B. Fonnesbeck Utah State University

Follow this and additional works at: https://digitalcommons.usu.edu/etd

Part of the Commons

Recommended Citation

Fonnesbeck, Brook B., "Digital Soil Mapping Using Landscape Stratification for Arid Rangelands in the Eastern Great Basin, Central Utah" (2015). All Graduate Theses and Dissertations. 4525. https://digitalcommons.usu.edu/etd/4525

This thesis is brought to you for free and open access by the Graduate Studies at DigitalCommons@USU. It has been accepted for inclusion in All Graduate Theses and Dissertations by an authorized administrator of DigitalCommons@USU. For more information, please contact [email protected].

DIGITAL SOIL MAPPING USING LANDSCAPE STRATIFICATION FOR ARID RANGELANDS IN THE EASTERN GREAT BASIN, CENTRAL UTAH

by

Brook B. Fonnesbeck

A thesis submitted in partial fulfillment of the requirements for the degree

of

MASTER OF SCIENCE

in

Soil Science

Approved:

Dr. Janis L. Boettinger, Major Professor

Dr. Joel L. Pederson, Committee Member

Dr. R. Douglas Ramsey, Committee Member

Dr. Mark R. McLellan, Dean of the School of Graduate Studies

UTAH STATE UNIVERSITY

Logan, Utah

2015

Copyright © Brook B. Fonnesbeck

2015


ABSTRACT

Digital Soil Mapping Using Landscape Stratification for Arid

Rangelands in the Eastern Great Basin, Central Utah

by

Brook B. Fonnesbeck, Master of Science

Utah State University, 2015

Major Professor: Dr. Janis L. Boettinger

Department: Plants, Soils, and Climate

Digital soil mapping typically involves inputs of digital elevation models, remotely sensed imagery, and other spatially explicit digital data as environmental covariates to predict soil classes and attributes over a landscape using statistical models.

Digital imagery from Landsat 5, a digital elevation model, and a digital geologic map were used as environmental covariates in a 67,000-ha study area of the Great Basin west of Fillmore, UT. A “pre-map” was created for selecting sampling locations. Several indices were derived from the Landsat imagery, including a normalized difference vegetation index and normalized difference ratios of bands 5/2, 5/7, 4/7, and 5/4. Slope, topographic curvature, inverse wetness index, and area solar radiation were calculated from the digital elevation model. The Optimum Index Factor was calculated to identify the covariates that captured the greatest variation across the study area: band 7, the normalized difference ratio of bands 5/2, the normalized difference vegetation index, slope, profile curvature, and area solar radiation. A 20-class ISODATA unsupervised classification of these six data layers was reduced to 12 classes. By comparing the 12-class map to a geologic map, 166 sites were chosen, weighted by areal extent; 158 sites were visited. Twelve points were added using case-based reasoning, for a total of 170 points for model training. A validation set of 50 sites was selected using conditioned Latin Hypercube Sampling. Density plots showed that the distributions of the sample sets were comparable to those of the raw covariate data. Geology was used to stratify the study area into areas above and below the Lake Bonneville highstand shoreline. Raster data were subset to these areas, and predictions were made on each area. Spatial modeling was performed with three different models: random forests, support vector machines, and bagged classification trees. A set of covariates selected by random forests variable importance and the set of Optimum Index Factor covariates were used in the models. The Optimum Index Factor covariates produced the best classification using random forests. Classification accuracy was 45.7%.

The predictive rasters may not be useful for unit delineation, but using a hybrid method to guide further sampling using the pre-map and standard sampling techniques can produce a reasonable soil map.

(113 pages)


PUBLIC ABSTRACT

Digital Soil Mapping Using Landscape Stratification for Arid

Rangelands in the Eastern Great Basin, Central Utah

Brook B. Fonnesbeck

In some parts of the western US there is limited publicly available soil information that can be used to make decisions on both public and private land. A goal of the USDI Bureau of Land Management (BLM) in Utah was to map an area in central Utah where such soil maps and value-added information were not available for management and restoration decisions following a wildfire. In 2007, the Milford Flat Fire had burned more than 363,000 acres, removing vegetation that was holding erosion-sensitive soils in place. Following inconsistent results from stabilization and restoration efforts, this study was funded to create soil maps for a part of the burned area west of Fillmore, UT.

Soil maps were created over an area of more than 146,000 acres using predictive statistical models that incorporated geographic information systems and statistical software. Over two field seasons, soil data were collected by excavating and describing the soil at more than 150 sites over the project area. The coordinates of physical locations were recorded, and soils were sampled, described, characterized, and classified to a soil series that could be used to make interpretations for management and restoration decisions. Two sets of sampling sites were collected: one to create models and maps of soils in the project area, and another to validate the accuracy of those maps. The project area was split into two areas: one above the Lake Bonneville highstand shoreline, and one below. Points were separated into those above and those below the shoreline. Modeling results were less accurate than desired below the shoreline, but could be useful to guide further mapping and refining of subsequent soil maps. The dominant soil order predicted was ; some had high calcium carbonate content, and some had high content with high sodium. The soil distribution above the shoreline was estimated since there were not enough points to model any soils with accuracy.


ACKNOWLEDGMENTS

I would like to thank the United States Department of the Interior Bureau of Land

Management for funding this research project, as well as Jeremy Jarnecke and Lisa

Bryant for taking an interest in my work and project. I would also like to thank the staff

of the BLM Field Office in Fillmore, UT, including Bill Thompson, Mike Gates, and

Dave Whitaker, for assistance in the field and maps of the project area. I would especially like to thank Dr. Janis L. Boettinger for providing every opportunity to conduct this

research project, providing expert knowledge of digital soil mapping, and encouragement

to push through to the end. I would like to thank the Utah Agricultural Experiment

Station at Utah State University for providing additional funding for this project. I would

also like to thank my committee members, Dr. R. Douglas Ramsey and Dr. Joel L.

Pederson. This project consisted of a good portion of their fields of expertise, and I could

not have done this project without their practical advice and encouragement.

I would also like to thank my wife, Lacy Fonnesbeck, for putting up with me the

last three years while I worked through this project, supporting me in my graduate career,

and loving me through it all. I could not ask for a better companion by my side. I would like to thank Suzann Kienast-Brown for her assistance and advice in the wide world of

digital soil mapping, and helping to break up the monotony in the lab. I would like to thank John R. Lawley for his advice, friendship, encouragement, and unselfish assistance in the field and in the lab, without which this project could not have been completed. I would like to thank Dr. Colby Brungard for his advice, assistance, knowledge, friendship,

and encouragement through the long hours of field work and research in the lab. I would

also like to thank the student staff in the Soil Genesis Lab who put in a lot of work to

analyze samples and data: Dan Horne, Ingrid Merrill, Jon Jones, Vance Almquist, Leanna

Hayes, Angie Swainston, and Jeremiah Armentrout. Lastly, I would like to thank my family for their love and support through all the years I’ve put into my education.

Brook Fonnesbeck


CONTENTS

ABSTRACT

PUBLIC ABSTRACT

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

INTRODUCTION

MATERIALS AND METHODS

    Study Area Description
    Data Layers
        Remotely Sensed Imagery
        Digital Elevation Model
        Geology
    Geospatial Analysis
        Imagery Classification
        Sample Site Selection
    Field Methods
    Laboratory Methods
    Soil and Landscape Concept Development
    Predictive Modeling and Stratification
        Random Forests
        Support Vector Machines
        Bagged Classification Trees

RESULTS AND DISCUSSION

    Comparison of Sampling Methods
    Soil and Landscape Concepts
    Predictive Models
        Random Forests
        Support Vector Machines
        Bagged Classification Trees
    Modeling Summary

CONCLUSIONS

REFERENCES

APPENDICES

    Appendix A – Figures of the covariates used in OIF, and classification
    Appendix B – Summary Data for the Typical Pedons


LIST OF TABLES

1. This table shows the digital data layers used in classification, the source of the data, and the SCORPAN covariate represented by the layer.

2. Soil series used for pedon classifications in the study area.

3. A table of taxonomic classes associated with ID numbers in the following confusion matrices for classifications.

4. Confusion matrix from the validation accuracy assessment of random forests predictions below Bonneville shoreline from RF covariates.

5. Confusion matrix from the validation accuracy assessment of random forests predictions from OIF-selected covariates.

6. Confusion matrix from the validation accuracy assessment of Support Vector Machine predictions from OIF-selected covariates.

7. Confusion matrix from the validation accuracy assessment of Support Vector Machine predictions from RF covariates.

8. Confusion matrix from the validation accuracy assessment of Bagged Classification Trees predictions from OIF-selected covariates.

9. Confusion matrix from the validation accuracy assessment of Bagged Classification Trees predictions from RF covariates.

10. The table of typical pedons showing all data collected. Some samples were not analyzed due to lack of adequate soil.


LIST OF FIGURES

1. Map of the defined study area outlined in green, with associated geographic features.

2. A geology map of the study area (modified from Hintze et al., 2003). The study area is outlined in blue.

3. A chart of the data obtained from the jNSM for the Deseret climate station. The chart shows the soil temperature throughout the year, as well as the moisture state of the control section. These data are used to calculate the SMR and STR at the site.

4. A graph of the water balance throughout the year as calculated by the jNSM for the Deseret climate station.

5. A chart of the data obtained from the jNSM for the Black Rock Junction climate station. The chart shows the soil temperature throughout the year, as well as the moisture state of the soil moisture control section. These data were used to calculate the SMR and STR at the site.

6. A graph of the water balance throughout the year as calculated by the jNSM for the Black Rock Junction climate station.

7. This map shows the Landsat 5 Thematic Mapper imagery used for analysis. Bands 7, 5, and 1 are displayed.

8. This map shows the digital elevation model used in the analysis, overlain onto a hillshade model to show relief.

9. This map shows the 20-class unsupervised classification by ISODATA clustering.

10. This map shows the refined 12-class ISODATA clustered image after classes were collapsed based on statistical separability.

11. The COST raster used in the conditioned Latin Hypercube Sampling method. This layer was created by taking Euclidean distance from roads, multiplied by the slope map.

12. Density plots of data at each sampling point of the training sample and validation sample set from each covariate.

13. Density plots of data at each sampling point of the training sample and validation sample set from each covariate, compared to the original covariates to check sample distributions.

14. This map shows the Provo Highstand Shoreline emphasized for greater visualization in Clear Spot Flat.

15. The variable importance plot for classification above the Bonneville Highstand Shoreline. Ten covariates were retained.

16. This map shows the predictive raster created from random forests output for subgroup classification on areas above the Bonneville shoreline.

17. The variable importance plot from random forests classification of areas below the Bonneville shoreline. Six covariates were retained for further classification.

18. This map shows the predictive raster created from random forests output of subgroup classification using covariates selected by random forests variable importance for areas below the Bonneville shoreline.

19. This map shows the predictive raster created from random forests output of subgroup classification using covariates selected by Optimum Index Factor (OIF) for areas below the Bonneville shoreline.

20. This map shows the predictive raster created by Support Vector Machines modeling of subgroup classification using random forests-selected covariates for areas below the Bonneville shoreline.

21. This map shows the predictive raster created by Support Vector Machines modeling of subgroup classification using OIF-selected covariates for areas below the Bonneville shoreline.

22. This map shows the predictive raster created by Bagged Classification Trees modeling of subgroup classification using random forests-selected covariates for areas below the Bonneville shoreline.

23. This map shows the predictive raster created by Bagged Classification Trees modeling of subgroup classification using OIF-selected covariates for areas below the Bonneville shoreline.

24. A map of the Tasseled Cap Transformation showing brightness, greenness, and wetness bands. Brightness and greenness were used to calculate the Greenness Above Bare Soil Index shown below.

25. The GRABS index used in the OIF analysis.

26. The Normalized Difference Vegetation Index used in the OIF covariate classification.

27. The 5/7 Normalized Difference Ratio (NDR). This layer was created using bands 5 and 7 from Landsat 5 TM imagery.

28. The 5/4 NDR. This layer was created using bands 5 and 4 from Landsat 5 TM imagery.

29. The 4/7 NDR. This layer was created using bands 4 and 7 from Landsat 5 TM imagery.

30. The 5/2 NDR. This layer was created using bands 5 and 2 from Landsat 5 TM imagery.

31. A slope map derived from the digital elevation model (DEM) using ArcSIE derivatives.

32. A curvature map derived from the DEM using ArcSIE terrain derivatives. Green values are convex shapes, and red values are concave.

33. A profile curvature map derived from the DEM using ArcSIE terrain derivatives. Green values are convex shapes, and red values are concave. Profile curvature is measured vertically along the hillslope.

34. A planform curvature map derived from the DEM using ArcSIE terrain derivatives. Green values are convex shapes, and red values are concave. Planform curvature is measured horizontally along the hillslope.

35. The inverse wetness index, also called the Slope over Contributing Area Ratio, calculated using TauDEM. Higher values indicate areas that are water-shedding.

36. The Area Solar Radiation (ASR) map generated from the DEM in ArcGIS using the Spring Equinox layer. Higher values indicate greater insolation.

37. The three Landsat-derived OIF covariates shown in the final layer stack that were used for unsupervised classification. Covariates were Band 7, 5/2 NDR, and NDVI.

38. The three DEM-derived OIF covariates shown in the final layer stack that were used for unsupervised classification. Bands shown are Slope, Profile Curvature, and ASR.

INTRODUCTION

In the arid Western US, there are many areas with no to very limited publicly

available soils data. High quality soils data can be used by multiple agencies, landowners,

and managers to aid in conservation and restoration efforts. Typically, soil surveys have

been done by soil scientists from the Natural Resources Conservation Service (NRCS)

using traditional methods. These included making field observations, using either a transect line method over a particular landscape, or a targeted sampling scheme where predetermined sampling sites have been selected. Most of the traditional methods are fairly labor intensive and costly. With constrained resources, the amount of land that can be mapped may be limited. Typical products also do not assess the accuracy of the soil map units.

A useful approach to mapping soils in large areas of unimproved arid lands that can be costly to map with traditional methods is digital soil mapping (DSM). DSM combines field observations and laboratory data, with remote and proximal soil observations in spatially explicit digital data and quantitative methods to infer or predict spatial patterns of soils and soil characteristics over a landscape (Grunwald, 2010).

McBratney et al. (2003), Grunwald (2009), Lagacherie and McBratney (2007), and

Lagacherie (2008) provided overviews of common approaches and techniques in DSM.

Given quality digital data layers, numerous soil observations, geographic information systems (GIS), and statistical analysis, DSM can quantitatively predict soil attribute or type extents over large areas and provide an estimate of uncertainty.

Based on the five soil forming factors developed by Jenny (1941), McBratney et al. (2003) proposed seven environmental covariates for digital soil mapping. This model is known as the SCORPAN model. Each letter of the model represents a different covariate.

Sa or Sc = f(SCORPAN + ε), where:

Sa, Sc: predicted soil attributes or classes;

S: soil properties, attributes, or classes in the form of point observations, raster data, or polygons;

C: climate in the form of polygon or raster data;

O: organisms, usually modeled with remotely sensed (RS) imagery and derivatives;

R: relief or topography, modeled using digital elevation models and derivatives;

P: parent material, modeled using RS imagery derivatives or geology;

A: age of soils or ;

N: spatial location, done by using GIS and site specific observations;

ε representing error in the model.

In digital soil mapping, digital data layers typically represent one or more of these covariates (McBratney et al., 2003). Environmental covariates are typically digital elevation models (DEMs) and associated topographic derivatives (Bilgili, 2013; Shi et al.,

2009), hyperspectral and RS imagery and derived indices (Boettinger, 2010; Ge et al.,

2011; Mulder et al., 2011; Rivero et al., 2007), and a combination of DEMs and spectral

imagery (Boettinger, 2010; Brungard and Boettinger, 2010; Kienast-Brown and

Boettinger, 2010; Ziadat et al., 2003). These approaches have been used to model soil

organic carbon (Bartholomeus et al., 2008; Minasny et al., 2013), soil phosphorus (Grunwald et al., 2004; Litaor et al., 2003; Rivero et al., 2007), soil iron (Bartholomeus et al., 2007; Litaor et al., 2003), and gypsum (Bilgili, 2013; Nield et al., 2007), as well as soil classes and associated attributes (Evans, 2013; Jafari et al., 2012).

Typical sample site selection in DSM is random, usually done using some form of stratified sampling through data contained in the digital covariates. One sampling method used with increasing frequency is Conditioned Latin Hypercube Sampling (cLHS)

(Brungard et al., 2015; Brungard and Boettinger, 2010; Minasny and McBratney, 2006).
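The idea behind cLHS can be illustrated with a small sketch. It is a simplification under stated assumptions and not the published algorithm: it uses only continuous covariates, scores a candidate sample by how evenly it fills equal-count strata of each covariate, and improves the sample by greedy random swaps rather than the simulated annealing (and the correlation and categorical terms) of Minasny and McBratney (2006). All function names and the demo data are hypothetical.

```python
import random


def strata_indices(values, n_strata):
    """Map each value to an equal-count stratum (0..n_strata-1) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    strata = [0] * len(values)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(values)
    return strata


def lhs_cost(sample, strata_by_cov, n_strata):
    """Sum over covariates of |points per stratum - 1|; 0 is a perfect LHS."""
    cost = 0
    for strata in strata_by_cov:
        counts = [0] * n_strata
        for i in sample:
            counts[strata[i]] += 1
        cost += sum(abs(c - 1) for c in counts)
    return cost


def clhs_sketch(covariates, n_samples, n_iter=5000, seed=0):
    """Pick n_samples point indices that spread across every covariate.

    covariates: list of equal-length lists, one list per covariate layer.
    Requires more candidate points than n_samples.
    Returns (sorted sample indices, final cost).
    """
    rng = random.Random(seed)
    n_points = len(covariates[0])
    strata_by_cov = [strata_indices(c, n_samples) for c in covariates]
    sample = rng.sample(range(n_points), n_samples)
    sampled = set(sample)
    pool = [i for i in range(n_points) if i not in sampled]
    best = lhs_cost(sample, strata_by_cov, n_samples)
    for _ in range(n_iter):
        if best == 0:
            break
        s, p = rng.randrange(n_samples), rng.randrange(len(pool))
        sample[s], pool[p] = pool[p], sample[s]  # propose a swap
        cost = lhs_cost(sample, strata_by_cov, n_samples)
        if cost <= best:
            best = cost  # keep swaps that do not worsen the spread
        else:
            sample[s], pool[p] = pool[p], sample[s]  # undo the swap
    return sorted(sample), best


if __name__ == "__main__":
    # Hypothetical data: 200 candidate pixels, 3 covariate values each.
    demo_rng = random.Random(1)
    demo = [[demo_rng.random() for _ in range(200)] for _ in range(3)]
    print(clhs_sketch(demo, n_samples=10))
```

A cost of zero means every stratum of every covariate holds exactly one sample point. With several covariates a zero cost may not be reachable, so the search simply returns the best sample found within the iteration budget.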

The ability to accurately predict different soil attributes is very desirable in remote areas where access to sampling sites is limited by distance and/or terrain. In the western

US, these areas include arid landscapes, with very little initial or very coarse resolution soil survey. Arid lands lacking initial soil survey have been mapped with success, predicting soil attributes and types over large extents (Bilgili, 2013; Brungard and

Boettinger, 2010; Jafari et al., 2012; Nield et al., 2007; Stum et al., 2010). Using conventional soil survey approaches, these areas would require considerable resources to map. Using DSM techniques, sampling sites can be selected based on quantitative data, often with statistical methods, to characterize potential soil map units; sampling plans can be targeted to characterize particular areas while minimizing travel time; and the map can be produced from start to finish in a digital environment.

The unmapped arid lands in the western US have become a focus for land management agencies because soils data are needed to facilitate restoration and conservation of areas susceptible to anthropogenic or natural disturbances. One such area in Millard County, central Utah, west of Fillmore, south of Delta, and north of Milford, was disturbed in 2007 by the largest wildfire in Utah history, the Milford Flat Fire. The fire burned over 337,000 acres (>136,000 ha) across a diversity of landforms and vegetation cover types, exposing the soil in an area that is prone to high winds brought on by approaching storm systems along the Wasatch Front (Jewell and Nicoll, 2011). These same winds were responsible for the creation of the Little Sahara Sand Dunes outside of Lynndyl, UT, from the sediments left behind by the Sevier River delta of pluvial Lake Bonneville (Sack, 1987). Following the fire, the Bureau of Land Management (BLM) implemented Emergency Stabilization and Restoration (ESR) efforts. The majority of ESR efforts were successful the following year, especially in areas with higher precipitation. However, some areas experienced severe erosion and sediment transport by wind because of the lack of vegetation cover and surface disturbance resulting from unsuccessful restoration efforts. A high quality soil map of the area could have aided the BLM in the ESR efforts to designate areas best left undisturbed post-fire.

In traditional soil survey performed by the NRCS soil survey staff, soil pedons are described by transect or targeted sampling, and inferences on soils over the landscape are made by correlations with vegetation, geomorphic surfaces, slope, geology types, etc.

Soil map-unit lines are drawn over orthoimagery to delineate soil classes. A more contemporary method is to use a “pre-map”, usually made of some digital RS imagery and DEM derivatives as covariates to represent landscape features such as vegetation or parent materials. The pre-map is used to stratify the area and display variation over that landscape, sometimes using an unsupervised classification to enhance visibility of variation over the landscape (e.g., Kienast-Brown and Boettinger, 2010). This allows preliminary delineations of potential soil map units, and surveyors can then sample along transects across the area to determine changes in soil properties. While this method tends to rely more on expert knowledge of the surveyor, it can be considered a hybrid method, incorporating digital covariates and some statistical measure (e.g., ISODATA clustering) into the conventional mapping process to increase the likelihood that the variability of the soils of an area is captured. The original unsupervised ISODATA classification pre-map classes can be linked to the soil observations in the training dataset, and then verified using the validation dataset.

Digital soil mapping can be a useful approach to soil survey that can provide detailed soil maps to guide soil interpretations for management and use. In this study, digital soil mapping was used to produce a map of soils in an area of Millard County, Utah, that was partly impacted by the Milford Flat Fire. Digital covariates consisted of RS data and DEMs, and associated derivatives of each. Two sample sets were generated by different methods: 1) a training sample set was generated from a pre-map, created by unsupervised classification of digital covariates, with sampling points distributed within classes by areal extent of geologic map units, and 2) a validation sample set was generated by conditioned Latin Hypercube Sampling (cLHS), which sampled through the digital covariates used in the pre-map. Both sample sets were collected and soils were characterized, classified to a taxonomic class, and correlated to a soil series. Using the data collected in sample set 1, soil distribution was predicted using quantitative modeling approaches and digital covariates as predictor variables. The models used were random forests (RF), support vector machines (SVM), and bagged classification trees (BCT). Variable selection was also tested using predictor variables selected by RF importance and by Optimum Index Factor (OIF) to maximize predictive ability. Accuracy of models was measured using overall cross-validation, out-of-bag (OOB) error for random forests, and class accuracies for each model. The limited number of field observations over such a large area produced less than ideal modeling results. Therefore, I suggest a hybrid method that incorporates digital soil mapping and traditional methods to delineate soil map units with guidance from the unsupervised classification.
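The Optimum Index Factor mentioned above ranks combinations of layers by the ratio of their summed standard deviations to their summed absolute pairwise correlations, so that combinations with high variance and low redundancy score highest. The following is a minimal sketch of that calculation, assuming three-layer combinations and layers already rescaled to a common range (the text does not state how layers were scaled). The layer names echo covariates named in this thesis, but the pixel values and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def stdev(values):
    """Sample standard deviation of a sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def oif(layers, names):
    """OIF of one combination: sum of std devs / sum of |pairwise r|."""
    total_sd = sum(stdev(layers[n]) for n in names)
    total_r = sum(abs(pearson(layers[a], layers[b]))
                  for a, b in combinations(names, 2))
    return total_sd / total_r


def rank_by_oif(layers, k=3):
    """Return (OIF, names) for every k-layer combination, best first."""
    scored = [(oif(layers, combo), combo)
              for combo in combinations(sorted(layers), k)]
    return sorted(scored, reverse=True)


# Hypothetical pixel values for four Landsat-derived layers (0-255 scale).
layers = {
    "band7":   [52, 61, 47, 70, 58, 66],
    "ndvi":    [30, 45, 22, 64, 38, 28],
    "ndr_5_2": [79, 71, 102, 56, 89, 77],
    "ndr_5_7": [40, 44, 35, 52, 41, 49],
}

for score, combo in rank_by_oif(layers, k=3):
    print(f"{score:8.2f}  {combo}")
```

Because the numerator sums standard deviations, a layer with a much larger numeric range would dominate the ranking; that is the reason for the common-scale assumption in this sketch.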


MATERIALS AND METHODS

Study Area Description

The study area in central Utah is part of the Great Basin in the Basin and Range physiographic province, and comprises Ridge, Clear Spot Flat, Beaver and Lava Ridges, North and South Twin Peaks, and the southeastern spur of the Cricket Mountains (Figure 1). The outlined study area was designated by personnel at the Bureau of Land Management Fillmore Field Office, and is 66,679 ha (164,767 acres). It lies 60 km north of Milford, UT, 25 km west of Fillmore, UT, and 18 km west of Kanosh, UT. Clear Lake Waterfowl Management Area is north, and the Cricket Mountains are west of the study area. The Sevier River runs north of the study area, creating some marshlands outside of Delta, UT, located 45 km to the north, before draining into Sevier Lake. The Black Rock Desert is northeast, where volcanism occurred in the late Pleistocene to early Holocene (Hintze and Davis, 2007). Landforms of the area include Lake Bonneville plains, outflow deltas, outflow river terraces, alluvial fans, and late Pleistocene-age Lake Bonneville shorelines that occur along the valley margins. Elevations range from 1420 to 1833 m.

Geology of the study area is represented in Figure 2 (modified from Hintze et al., 2003). Geologic unit descriptions are given by Hintze and Davis (2007) and are simplified in the following paragraph. The oldest geologic units are in the Cricket Mountains to the west, which are mainly limestones, dolomites, and sandstones from middle and late Cambrian time, with some Cretaceous and Tertiary conglomerates and sandstones. The rolling hills in the southwest of the study area are underlain by Pliocene obsidian, felsite, pumice, and other felsic volcanic rocks. Tertiary basalt flows occur in the southeastern corner of the study area. These rocks have contributed little to soil parent materials, as this unit was covered by Lake Bonneville, which deposited lacustrine sediments over the basalt. Some rhyolitic hills (termed Rhyolite Hills hereafter) erupted in the southern end of the study area, including North and South Twin Peaks. Some early Quaternary volcanism occurred in the southwest end and northeast corners of the study area. These units also were covered by Lake Bonneville, contributing little to soil parent materials. Basalt and andesitic basalt flows from the late Pleistocene and early Holocene epochs occurred on the eastern side of the study area, with cinder cones and vents in the Tabernacle Hill and The Cinders lava flows.

Figure 1. Map of the defined study area outlined in green, with associated geographic features.

Figure 2. A geology map of the study area (modified from Hintze et al., 2003). The study area is outlined in blue.

A distinct division in geologic substrate was created by late Pleistocene pluvial

Lake Bonneville and its associated shorelines, which deposited significant amounts of lacustrine sediment in valley bottoms. Lake Bonneville’s highstand shoreline, formed around 19,000 yr before present (dates by radiocarbon measurement), was at 1587 m elevation. Its lower Provo shoreline, formed around 16,000 yr before present, lies at 1456 m (Hintze,

2005). Alluvial sediments of highland drainages above the Bonneville Highstand were deposited mainly in the mid- to late-Pleistocene epoch as alluvial fans and alluvial slopes across piedmonts. and eolian sand were deposited over lacustrine sediment in the early Holocene after Lake Bonneville receded, and this continues at the present time.

The climate of the study area is semi-arid to arid with warm summers and cold winters. Temperature data were generalized over the entire area from surrounding weather stations at Fort Deseret near Delta, UT, and Black Rock Junction, UT (Figure 1), using the 30-yr normals to calculate the soil temperature regime (STR). Mean annual temperature is 10.2°C at Fort Deseret and 10.0°C at Black Rock Junction (Utah Climate Center, 2013a), which gives an estimated mean annual soil temperature (MAST) of 11.1°C (Buol et al., 2011). Mean summer temperature is 20.4°C at Black Rock Junction, giving a mean summer soil temperature of 18.4°C, and 21.3°C at Fort Deseret, giving a mean summer soil temperature of 19.3°C. Mean winter temperature is -1.1°C at Black Rock Junction, giving a mean winter soil temperature of 0.9°C, and -1.8°C at Fort Deseret, giving a mean winter soil temperature of 0.2°C. Mean annual precipitation is 240.8 mm at Black Rock Junction and 222.5 mm at Fort Deseret (Utah Climate Center, 2013b). The NRCS Java Newhall Simulation Model (jNSM) was used to calculate the soil moisture regime (SMR) and STR from the 30-yr normals for annual precipitation and temperature (Figures 3–6). The jNSM calculates water deficits and soil temperatures to determine the SMR and STR. The typical SMR in the flats of the study area is aridic, with a mesic STR. The SMR of uplands may be xeric, given the dominant vegetation types. After consulting the PRISM precipitation data available for the area, the areas above the influence of Lake Bonneville were considered to have an aridic SMR bordering on a xeric SMR (xeric aridic SMR).
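The soil temperature estimates above follow simple offsets from air temperature, which can be checked with a short calculation. In this sketch the offsets (+1°C for the annual mean, -2°C for summer, +2°C for winter) are inferred from the values reported in the paragraph and are an assumption, not a quotation from Buol et al. (2011). The 8 to 15°C range used for the mesic check is the mean annual soil temperature range of the mesic class in Soil Taxonomy; the function and variable names are hypothetical.

```python
# Offsets in deg C, inferred from the reported values (an assumption).
ANNUAL_OFFSET = 1.0   # mean annual soil temp = mean annual air temp + 1
SUMMER_OFFSET = -2.0  # mean summer soil temp = mean summer air temp - 2
WINTER_OFFSET = 2.0   # mean winter soil temp = mean winter air temp + 2

# Air temperatures (deg C) as reported for the two climate stations.
STATIONS = {
    "Fort Deseret": {"annual": 10.2, "summer": 21.3, "winter": -1.8},
    "Black Rock Junction": {"annual": 10.0, "summer": 20.4, "winter": -1.1},
}


def soil_temperatures(air):
    """Estimate soil temperatures for one station from its air temperatures."""
    return {
        "annual": round(air["annual"] + ANNUAL_OFFSET, 1),
        "summer": round(air["summer"] + SUMMER_OFFSET, 1),
        "winter": round(air["winter"] + WINTER_OFFSET, 1),
    }


def area_mast(stations):
    """Mean annual soil temperature from the average of station air means."""
    mean_air = sum(s["annual"] for s in stations.values()) / len(stations)
    return round(mean_air + ANNUAL_OFFSET, 1)


def in_mesic_range(mast):
    """True if MAST falls in the mesic range (8 deg C up to, not including, 15)."""
    return 8.0 <= mast < 15.0


for name, air in STATIONS.items():
    print(name, soil_temperatures(air))
print("Area MAST:", area_mast(STATIONS), "mesic range:", in_mesic_range(area_mast(STATIONS)))
```

The seasonal offsets also imply a summer-to-winter soil temperature difference of roughly 17 to 19°C at both stations, far larger than the small seasonal difference that would place a soil in an "iso" temperature class.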

Typical vegetation varies over the landscape (based on personal observations). In unburned lacustrine sediment, vegetation is dominated by shadscale (Atriplex confertifolia), greasewood (Sarcobatus vermiculatus), and gray molly or kochia (Kochia americana), with some biological soil crusts. In many disturbed lacustrine substrates, dominant vegetation is Russian thistle (Salsola kali L.) and halogeton (Halogeton glomeratus C.A. Mey.). Dunes are dominantly covered by shadscale, rabbitbrush (Chrysothamnus viscidiflorus), and cheatgrass (Bromus tectorum), with some annual grasses. Alluvial uplands are dominantly covered by Wyoming big sagebrush (Artemisia tridentata ssp. wyomingensis), and by black sagebrush (Artemisia nova) in rockier areas, with significant cheatgrass invasion along high flats and alluvial fans, along with other small grasses and forbs. Utah juniper (Juniperus osteosperma) dominates areas with significantly coarse-textured soils.

Figure 3. A chart of the data obtained from the jNSM for the Deseret climate station. The chart shows the soil temperature throughout the year, as well as the moisture state of the soil moisture control section. These data are used to calculate the SMR and STR at the site.

Figure 4. A graph of the water balance throughout the year as calculated by the jNSM for the Deseret climate station.

Figure 5. A chart of the data obtained from the jNSM for the Black Rock Junction climate station. The chart shows the soil temperature throughout the year, as well as the moisture state of the soil moisture control section. These data were used to calculate the SMR and STR at the site.

Figure 6. A graph of the water balance throughout the year as calculated by the jNSM for the Black Rock Junction climate station.

Data Layers

Remotely Sensed Imagery. Digital data layers were gathered to represent or derive covariates to be used in the creation of a sampling plan for predictive modeling. Landsat 5 TM imagery, dated September 8, 2011, was acquired from the United States Geological Survey (USGS) Earth Resources Observation and Science Center (EROS) Global Visualization Viewer (GloVis) (Figure 7). The imagery consisted of seven bands at 30-m resolution, from platform flight path 38, row 33. The imagery was downloaded as separate TIFF files and compiled into a single 6-layer multiband image using ERDAS Imagine (Leica Geosystems, 2011). Imagery was reprojected to Universal Transverse Mercator (UTM) North American Datum 1983 (NAD83) Zone 12N, Geodetic Reference System (GRS) 1980, then standardized using a model obtained from the Utah State University Remote Sensing/GIS (USU RS/GIS) Lab specific for Landsat 5 TM imagery and used in ERDAS Imagine. The standardization model converted digital number (scaled from 0 to 255) to percent reflectance on each band. In order to convert to percent reflectance, the model first subtracts the dark object number for each band from all values in the image to remove values considered less than zero. Dark object numbers were chosen from the histogram of each image band at the first inflection point of the histogram. Subtracting dark object numbers essentially assumed that any data below the inflection point were noise in the image. The standardization model then used a COST (COStheta) correction (Chavez, 1996) to correct for atmospheric reflectance. The COST correction applied the cosine of the solar elevation angle of the Landsat image, measured in radians, to the values remaining after dark object subtraction in order to correct for solar angle. A tau factor was used in the standardization model to correct for sensor angle to the land surface. This step is sometimes not necessary in the arid West, but was used since the area is sometimes hazy in the summer. The values of percent reflectance were then rescaled to the original 0 to 255 values to keep the 8-bit data format.

Figure 7. This map shows the Landsat 5 Thematic Mapper imagery used for analysis. Bands 7, 5, and 1, are displayed.
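The standardization steps described above (dark object subtraction, COST correction, and rescaling to 8-bit values) can be sketched as follows. This is a simplified illustration of the published COST formula (Chavez, 1996), not the USU RS/GIS Lab model, whose internals are not reproduced here. The gain, bias, and solar irradiance values passed in are hypothetical, and the sketch works with the solar zenith angle (90° minus the solar elevation), which is an assumption about how the angle enters the correction.

```python
import math


def cost_reflectance(dn, dark_dn, gain, bias, esun, sun_elev_deg, d_au=1.0):
    """Dark-object subtraction followed by a COST-style correction.

    dn, dark_dn  : digital numbers (0-255) of a pixel and of the band's dark object
    gain, bias   : calibration converting DN to radiance (hypothetical values)
    esun         : exoatmospheric solar irradiance for the band (hypothetical value)
    sun_elev_deg : solar elevation angle in degrees from the scene metadata
    d_au         : earth-sun distance in astronomical units
    """
    # Radiance of the pixel and of the dark object (the haze estimate).
    radiance = gain * dn + bias
    haze = gain * dark_dn + bias
    # Values at or below the dark object are treated as noise and set to zero.
    corrected = max(radiance - haze, 0.0)
    # Cosine of the solar zenith angle (zenith = 90 degrees - elevation).
    cos_z = math.cos(math.radians(90.0 - sun_elev_deg))
    # COST approximates the atmospheric transmittance (tau) by cos_z.
    tau = cos_z
    return math.pi * corrected * d_au ** 2 / (esun * cos_z * tau)


def to_8bit(reflectance):
    """Rescale a 0-1 reflectance to the 0-255 range of the layer stack."""
    return max(0, min(255, round(reflectance * 255)))


# Hypothetical pixel: DN of 100 in a band whose dark object DN is 10.
example = cost_reflectance(100, 10, gain=0.8, bias=-1.5, esun=1000.0, sun_elev_deg=50.0)
```

Any pixel at or below the dark object number returns zero reflectance, which matches the assumption that values below the histogram inflection point are noise.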

Soil maps produced from this project will be created at a 3rd order scale (1:20,000 – 1:63,360) with a minimum mapped unit of >2 ha. In order to match the desired mapping scale, the pixel resolution of all data layers and imagery was matched and aligned to the DEM (9.3274837 m) to have a finer resolution than 30 m. All imagery was resampled using bilinear interpolation, because it was the best match to the statistics of the original image. A subset was taken using a mask of a square area around the project boundary buffered by a minimum 10 km distance, then the subset was resampled to match cell centers of every layer using the nearest neighbor method with the raster (Hijmans and van Etten, 2012) and rgdal (Keitt et al., 2012) packages in R software (R Development Core Team, 2011).

An Optimum Index Factor (OIF) (Chavez et al., 1982) was calculated on the six Landsat bands to determine the three-band combination that contained the greatest amount of information in the Landsat imagery while not containing the same data. The top two 3-band combinations (4, 5, 7 and 3, 5, 7) indicated the four bands that were selected to be analyzed further. Various indices were calculated from the Landsat imagery using either Imagine or R software (Table 1). A Kauth-Thomas Transformation (Crist and Kauth, 1986), also called a Tasseled Cap Transformation, was calculated in R using the six Landsat bands. The results of the Tasseled Cap Transformation included six different components that represented different aspects of the imagery data. The first three components are considered brightness, greenness, and wetness. Brightness and wetness were used to calculate a Greenness Above Bare Soil (GRABS) Index (Hay et al., 1979). The GRABS Index is used to represent vegetation (O) by removing the soil background (Jensen, 2005, p. 316). A normalized-difference vegetation index (NDVI) was created to also represent vegetation (O) (Jensen, 2005, p. 311). A 5/7 normalized difference ratio (NDR), a normalized version of the MidIR Index, which has shown a strong correlation with soil moisture (Musick and Pelletier, 1988; Jensen, 2005, p. 317), was used to model parent materials (P) or soil properties (S) such as surface salts (e.g., gypsum) (Nield et al., 2007). A 5/4 NDR, also called a Normalized Difference Built-up Index (Jensen, 2005, p. 322), was used to model P and S (Nield et al., 2007). A 4/7 NDR was used to model P and O. Bands 4 and 7 from Landsat 5 TM were shown to be useful as a 2-D proxy for the Tasseled Cap coefficient representing greenness because band 7 contains useful soil information, and band 4 contains useful vegetation information (Jackson, 1983). Therefore, the 4/7 NDR was used here to model mineralogical influences with vegetation (P and O). A 5/2 NDR was used for P and S to distinguish igneous from sedimentary rocks and associated transported materials (Stum et al., 2010). After the GRABS Index, NDVI, 5/7 NDR, 5/4 NDR, 4/7 NDR, and 5/2 NDR were compiled and combined into a stack, an OIF was calculated, returning the GRABS Index, NDVI, and 5/2 NDR. The 5/7 NDR, 5/4 NDR, and the 4/7 NDR were discarded. The GRABS Index was also discarded because it was considered to be too closely correlated to the NDVI (0.48). The NDVI and the 5/2 NDR were combined with the Landsat bands 3, 4, 5, and 7 into a 6-layer multiband image. Another OIF was performed on this layer stack, with the highest ranked 3-band combination (band 7, NDVI, 5/2 NDR) selected for further analysis. All covariates from RS imagery can be referenced in Appendix A.

Table 1. This table shows the digital data layers used in classification, the source of the data, and the SCORPAN covariate represented by the layer.

Data Layer | Data Source | Covariate
NDVI | Landsat 5 | Vegetation (O)
5/7 NDR | Landsat 5 | Parent Material (P)
5/4 NDR | Landsat 5 | Parent Material (P)
4/7 NDR | Landsat 5 | Parent Material/Vegetation (P/O)
5/2 NDR | Landsat 5 | Parent Material (P)
Slope | DEM (10 m) | Relief and Topography (R)
Curvature | DEM | Relief and Topography (R)
Planform Curvature | DEM | Relief and Topography (R)
Profile Curvature | DEM | Relief and Topography (R)
Area Solar Radiation | DEM | Relief and Topography (R)
Inverse Wetness Index | DEM | Relief and Topography (R)
Geology | UGS Shapefile | Parent Material (P)
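The NDVI and the normalized difference ratios listed in Table 1 share one form, (a - b) / (a + b), applied to a pair of bands. A minimal sketch in Python follows. The reflectance values are hypothetical, the function name is illustrative, and returning 0.0 where both bands are zero is an assumption made only to avoid division by zero.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b) per pixel; 0.0 where both bands are zero."""
    result = []
    for a, b in zip(band_a, band_b):
        total = a + b
        result.append((a - b) / total if total != 0 else 0.0)
    return result


# Hypothetical reflectance values for three pixels (not thesis data).
band2 = [0.08, 0.06, 0.15]
band3 = [0.10, 0.08, 0.20]
band4 = [0.30, 0.40, 0.22]
band5 = [0.25, 0.20, 0.35]
band7 = [0.15, 0.10, 0.30]

ndvi = normalized_difference(band4, band3)     # vegetation (O)
ndr_5_7 = normalized_difference(band5, band7)  # parent material (P)
ndr_5_4 = normalized_difference(band5, band4)  # parent material (P)
ndr_4_7 = normalized_difference(band4, band7)  # parent material/vegetation (P/O)
ndr_5_2 = normalized_difference(band5, band2)  # parent material (P)
```

Because the difference is divided by the sum, every index is bounded between -1 and 1 regardless of overall scene brightness.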

Digital Elevation Model. A 10-m (9.327 m) resolution DEM was acquired to generate surface derivatives to create a sampling plan for predictive modeling. The DEM was downloaded from the Utah Automated Geographic Reference Center (AGRC) from the National Elevation Dataset (NED) for Millard County, UT (Figure 8). The Millard County DEM was obtained in .dem format and converted to a raster. The DEM was used to calculate several different layers representing relief or topography (R) (Table 1). Slope, curvature, planform curvature, and profile curvature were calculated using the ArcMap Soil Inference Engine Add-in (ArcSIE). The ArcSIE metrics were calculated using a square 30-m neighborhood using the Shi method for calculation (Shi et al., 2007). The Shi method using a square 30-m neighborhood (a 7 x 7 pixel filter) generalized landscape features without over-smoothing or leaving too much detail. Slope was calculated in percentage (45° = 100%). Curvature measures the change in slope across distance; profile curvature is measured along a transect, and planform curvature is measured laterally in map view. Positive curvature values represent convex water-shedding areas, and negative values represent concave water-gathering areas.
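The percent slope convention noted above (45° = 100%) is rise over run multiplied by 100. The sketch below shows that conversion and a plain central-difference slope on a grid with the DEM cell size. It is a generic finite-difference illustration with hypothetical function names, not the ArcSIE neighborhood method of Shi et al. (2007).

```python
import math


def slope_percent_from_degrees(degrees):
    """Convert a slope angle in degrees to percent rise (45 degrees -> 100%)."""
    return 100.0 * math.tan(math.radians(degrees))


def slope_percent(z_west, z_east, z_south, z_north, cell_size):
    """Percent slope at a cell from central differences of its four neighbors.

    A simple 3 x 3 finite difference; the larger 30-m neighborhood used in
    the text would smooth the surface more than this sketch does.
    """
    dz_dx = (z_east - z_west) / (2.0 * cell_size)
    dz_dy = (z_north - z_south) / (2.0 * cell_size)
    return 100.0 * math.hypot(dz_dx, dz_dy)


CELL = 9.327  # DEM cell size in meters

# Hypothetical elevations (m): the surface rises 2 * CELL from west to east.
steep = slope_percent(1500.0, 1500.0 + 2 * CELL, 1500.0, 1500.0, CELL)
flat = slope_percent(1500.0, 1500.0, 1500.0, 1500.0, CELL)
```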

An Area Solar Radiation (ASR) index was created in ArcMap for seasonal dates. The ASR index approximated the amount of solar radiation (kW m-2) an area may receive either on a specific date or over a period of time. The spring equinox was used to approximate incoming radiation over the whole year.

Figure 8. This map shows the digital elevation model used in the analysis, overlain onto a hillshade model to show relief.

An inverse wetness index (IWI) was calculated using the TauDEM toolbox for ArcMap 10 (Tarboton, 2012). The IWI layer, also called a Slope/Contributing Area Ratio (SAR), was calculated using the slope at any point n divided by the specific catchment area upslope from the point n. This index is useful in modeling (Tarboton, 2009). It is the inverse of a topographic wetness index (TWI) (catchment area/slope), which has been used to model directions of water flow to show areas of a landscape that have higher soil moisture, or areas that are water-gathering, contributing to better plant growth and higher soil OM (Moore et al., 1993; Pei et al., 2010). The IWI proposed by Tarboton (2009) was developed in part to eliminate problems that occurred when calculating the TWI where slopes equaled zero. When slope equaled zero, NoData values were created in the resulting raster layers. This problem was overcome using the inverse of the TWI.
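The advantage of the inverse form on flat cells can be seen in a short sketch using the ratios exactly as defined in the text (slope/catchment area and catchment area/slope). The function names are hypothetical and the values are illustrative, not outputs of TauDEM.

```python
def inverse_wetness_index(slope, catchment_area):
    """Slope divided by specific catchment area; defined when slope is zero."""
    return slope / catchment_area


def wetness_ratio(slope, catchment_area):
    """Catchment area divided by slope; undefined on flat cells."""
    if slope == 0:
        return None  # corresponds to a NoData cell in the raster
    return catchment_area / slope


# Hypothetical cells: (slope as rise/run, specific catchment area in m).
flat_cell = (0.0, 50.0)
sloping_cell = (0.1, 50.0)

iwi_flat = inverse_wetness_index(*flat_cell)     # 0.0, a valid value
twi_flat = wetness_ratio(*flat_cell)             # None, a NoData cell
iwi_slope = inverse_wetness_index(*sloping_cell)
```

Specific catchment area is always greater than zero, since every cell contributes at least its own area, so the inverse ratio never divides by zero.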

These six DEM-derived layers were combined into a multiband image, and another OIF (Chavez et al., 1982) was calculated. The three layers selected by the OIF were slope, profile curvature, and the ASR index. These three DEM-derived layers were combined with the three Landsat-derived layers into the final 6-layer multiband image. Layers were rescaled to a range of 0 to 255 in Imagine, and the resulting image was used in an unsupervised classification.
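The OIF used to rank three-layer combinations is the sum of the layers' standard deviations divided by the sum of the absolute values of their pairwise correlation coefficients (Chavez et al., 1982). A minimal sketch in Python follows. The layer names and pixel values are hypothetical, and the helper functions are illustrative rather than the Imagine or R routines used in this study.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def oif(layers, names):
    """Sum of standard deviations over sum of absolute pairwise correlations."""
    sd_sum = sum(pstdev(layers[n]) for n in names)
    r_sum = sum(abs(pearson(layers[a], layers[b])) for a, b in combinations(names, 2))
    return sd_sum / r_sum


def rank_triplets(layers):
    """Return (OIF, combination) for every three-layer set, highest OIF first."""
    scored = [(oif(layers, combo), combo) for combo in combinations(sorted(layers), 3)]
    return sorted(scored, reverse=True)


# Hypothetical pixel values for four layers; "b" duplicates the information in "a".
layers = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],
    "c": [5, 1, 4, 2, 6, 3],
    "d": [10, 50, 20, 60, 30, 40],
}
ranked = rank_triplets(layers)
best_score, best_combo = ranked[0]
```

Combinations that pair the two perfectly correlated layers rank lowest, which is the behavior the OIF is meant to reward: high variance with little redundancy.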

Geology. Geology was modeled using the digitized geologic map of the 1:100,000 scale, 30’ x 60’ Richfield Quadrangle produced by the Utah Geological Survey (UGS) (Hintze et al., 2003) downloaded from the UGS website. The shapefile layers were obtained in NAD27 UTM Zone 12N, and were reprojected to NAD83 UTM Zone 12N. The geology layers represented P (Table 1) and were used to stratify the area for selecting sample sites.

Geospatial Analysis

Imagery Classification. In order to create the pre-map, a 20-class unsupervised classification was performed on the final clipped 6-band layer stack (Landsat band 7, 5/2 NDR, NDVI, slope, profile curvature, and ASR) using the Iterative Self-Organizing Data Analysis Technique (ISODATA) in Imagine (Figure 9). Classes were evaluated in the signature editor of Imagine based on their transformed divergence separability. Classes with poor separability were combined, sometimes merging three or four different classes into one. This reduced the classification to 12 classes (Figure 10).

Sample Site Selection. In order to create a sampling plan that would accurately represent the study area and serve as the training dataset for predictive modeling, spatially explicit sampling points were created by distributing at least 150 points through the study area, placing a minimum of one point in each geologic unit. In the GIS, the geology polygons were overlain on the classified pre-map image to show which classes were located in each polygon. Points were distributed through geologic units, and the number of points given to each unit was determined by the percentage of the study area it covered, essentially prorating the points through geology. Some geologic units in the study area were so small that when the number of points per unit was calculated, the result was <1 point; these units were given one point, since it was required to have all geologic units represented in the sampling scheme. Doing so added extra points to the sampling point set, resulting in 166 proposed sampling points in the training sample set. Points in each geologic unit were then distributed through pre-map classes contained in each unit. Only 158 points were sampled and described in the field. Twelve sampling points were added using case-based reasoning (CBR) (Shi et al., 2004; Stum et al., 2010) and field reconnaissance. These CBR points consisted of features easily identified in the orthoimagery: six sand dunes, four rock outcrops, and two sites identified near Coyote Springs around a point classified as Typic Calcigypsids and correlated to the Deseret series (Table 2). A total of 170 sampling locations and associated data were used to create the predictive model described hereafter.

Figure 9. This map shows the 20-class unsupervised classification by ISODATA clustering.

Figure 10. This map shows the refined 12-class ISODATA clustered image after classes were collapsed based on statistical separability.
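The prorating of points through geologic units described above can be sketched as a proportional allocation with a floor of one point per unit. The unit names and areas below are hypothetical, and rounding each share to the nearest whole point is an assumption, since the rounding rule used in the study is not stated.

```python
def allocate_points(unit_areas, total_points=150):
    """Allocate sampling points to geologic units in proportion to area,
    giving every unit at least one point."""
    total_area = sum(unit_areas.values())
    allocation = {}
    for unit, area in unit_areas.items():
        share = total_points * area / total_area
        # Units whose share rounds below one point still receive one point.
        allocation[unit] = max(1, round(share))
    return allocation


# Hypothetical geologic units and areas in hectares (not the thesis data).
areas = {
    "lacustrine": 40000,
    "alluvium": 20000,
    "basalt": 6800,
    "dune": 150,
    "tufa": 50,
}
plan = allocate_points(areas)
total_allocated = sum(plan.values())
```

The floor of one point is what pushes the total above the nominal target, in the same way that the proposed set in this study grew from 150 to 166 points.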

To validate the spatial predictive model created using the 170 training sites and to assess the statistical distribution of the first sampling set, a second 50-point sampling set was selected using conditioned Latin hypercube sampling (cLHS) in R with the clhs package (Roudier, 2011). The 6-band layer stack (Landsat band 7, 5/2 NDR, NDVI, slope, profile curvature, and ASR) was sampled using cLHS to obtain spatial locations of points. Geology was used as a factor in the model, which made the algorithm look at the variation in each geologic unit separately. A “cost” layer (Roudier et al., 2012) was used in the model (Figure 11), which was made of a distance raster calculated from roads using a Euclidean distance of 1000 m, multiplied by a slope raster, and rescaled (Mulder et al., 2013). Using this “cost” layer in the algorithm implemented operational constraints and forced the algorithm to pick points closer to roads, or places where sampling might be easier because of low slope (Roudier et al., 2012), while still retaining statistical normality of the samples selected. The 6-band layer stack was clipped to match the cost raster before cLHS sampling.

Table 2. Soil series used for pedon classifications in the study area.

Series Name | Taxonomic Class | Soil Landscape Concept
Amtoft | LOAMY-SKELETAL, CARBONATIC, MESIC LITHIC XERIC HAPLOCALCIDS | Lithic carbonatic soils formed in residuum and colluvium with black sagebrush and grasses on uplands such as hillslopes and mountainsides.
Arapien | FINE-LOAMY, CARBONATIC, MESIC XERIC HAPLOCALCIDS | Carbonatic soils formed in alluvium with big sagebrush, shadscale, and rabbitbrush on alluvial fans.
Berent | MIXED, MESIC XERIC TORRIPSAMMENTS | Sandy soils derived from mixed sources covered by juniper and grasses on stabilized sand dunes.
Boxelder | FINE-LOAMY, CARBONATIC, MESIC XERIC HAPLOCALCIDS | Fine carbonatic soils derived from lacustrine sediments, covered with sagebrush and grasses on lake plains.
Deseret | FINE-SILTY, MIXED, SUPERACTIVE, MESIC XERIC HAPLOGYPSIDS | Fine-silty soils derived from deltaic and lacustrine sediments with subsurface gypsum covered by greasewood and saltgrass on lake terraces and floodplains. This soil was observed by Coyote Springs with moist soil.
Dixie | FINE-LOAMY, MIXED, SUPERACTIVE, MESIC XERIC CALCIARGIDS | Soils formed in alluvium and colluvium from mixed sources with Wyoming big sage on alluvial fans and foothills.
Escalante | COARSE-LOAMY, MIXED, SUPERACTIVE, MESIC XERIC HAPLOCALCIDS | Coarse-textured soils formed in alluvium covered by Wyoming big sage on alluvial fan skirts.
Goshute | FINE-SILTY OVER SANDY OR SANDY-SKELETAL, MIXED, ACTIVE, MESIC TYPIC NATRARGIDS | Fine-silty soils formed in lacustrine sediments with a coarse bottom, a natric horizon, and gypsum covered by shadscale and greasewood on lake plains.
Harding | FINE, MIXED, ACTIVE, MESIC XERIC NATRARGIDS | Fine soils derived from alluvial and lacustrine sediments with a natric horizon, organic stains, and gypsum in the subsurface in shadscale and greasewood on lake plains.
Hiko Peak | LOAMY-SKELETAL, MIXED, ACTIVE, MESIC XERIC HAPLOCALCIDS | Skeletal soils derived from alluvium covered by Wyoming big sagebrush and grasses, on alluvial fans and fan remnants.
Hiko Springs | COARSE-LOAMY, MIXED, SUPERACTIVE, MESIC TYPIC HAPLOCALCIDS | Coarse-textured soils formed in alluvium covered by shadscale and grasses on dissected lake terraces and plains.
Kessler | FINE-SILTY, CARBONATIC, MESIC XERIC HAPLOCALCIDS | Fine-silty carbonatic soils derived from lacustrine sediments covered with big sagebrush and Indian ricegrass on hills, alluvial fans and lake terraces. Mainly observed on basalt flows.
Larwood | FINE-SILTY, MIXED, SUPERACTIVE, MESIC XERIC CALCIARGIDS | Fine-silty soils derived from lacustrine sediments, covered by big sagebrush and grasses on lake plains.
Lynndyl | SANDY, MIXED, MESIC TYPIC HAPLOCALCIDS | Sandy soils derived from eolian and alluvial deposits of lacustrine materials with grasses and winterfat on lake terraces and plains.
Oakden | LOAMY, CARBONATIC, MESIC CALCIC LITHIC PETROCALCIDS | Lithic petrocalcids formed in calcareous materials in juniper, black sagebrush, and big sagebrush on hillslopes. Most sites had laminar caps.
Rock Outcrop | |
Sand Dunes | |
Sanpete | LOAMY-SKELETAL, CARBONATIC, MESIC XERIC HAPLOCALCIDS | Carbonatic skeletal soils derived from alluvium of mixed sources covered by grasses and black sagebrush on alluvial fans and associated features.
Saxby | LOAMY-SKELETAL, MIXED, SUPERACTIVE, MESIC LITHIC XERIC HAPLOCALCIDS | Skeletal lithic soils derived from alluvium and eolian-deposited lacustrine sediments covered by big sagebrush over basalt flows.
Shotwell | LOAMY, MIXED, SUPERACTIVE, CALCAREOUS, MESIC LITHIC XERIC TORRIORTHENTS | Lithic soils formed in alluvium and residuum with grasses on mountainslopes. Soils observed had no ash.
Soma | LOAMY-SKELETAL, MIXED, SUPERACTIVE, MESIC LITHIC XERIC HAPLOCALCIDS | Lithic loamy-skeletal soils derived from colluvium and residuum from sedimentary rocks in black sagebrush on steep hillsides and mountainslopes.
Taylorsflat | FINE-LOAMY, MIXED, SUPERACTIVE, MESIC XERIC HAPLOCALCIDS | Fine soils derived from lacustrine sediments with gypsum in the bottom, covered by sagebrush on lake plains.
Tosser | SANDY-SKELETAL, MIXED, MESIC XERIC HAPLOCALCIDS | Sandy-skeletal soils derived from mixed alluvium in black sagebrush sites on fan terraces.
Uvada | FINE, SMECTITIC, MESIC TYPIC NATRARGIDS | Fine clay-rich soils derived from lacustrine sediments with a natric horizon in shadscale and greasewood on lake plains.
Yenrab | MIXED, MESIC TYPIC TORRIPSAMMENTS | Sandy soils derived from mixed sources covered by rabbitbrush and greasewood on stabilized sand dunes over lake terraces, plains, fan terraces and beach bars.
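The construction of the cost layer used to constrain cLHS (road distance multiplied by slope, then rescaled) can be sketched on a small grid. Treating the 1000 m value as a cap on road distance and rescaling the product to a range of 0 to 1 are both assumptions made for this illustration. The grids are hypothetical, and the function is not part of the clhs package.

```python
def cost_layer(distance, slope, max_distance=1000.0):
    """Multiply capped road distance by slope, then rescale to 0-1.

    distance : 2-D list of Euclidean distances to the nearest road (m)
    slope    : 2-D list of slope values on the same grid
    """
    raw = [
        [min(d, max_distance) * s for d, s in zip(d_row, s_row)]
        for d_row, s_row in zip(distance, slope)
    ]
    lowest = min(min(row) for row in raw)
    highest = max(max(row) for row in raw)
    span = highest - lowest
    # A constant grid has no range to rescale, so it maps to zero cost.
    return [[(v - lowest) / span if span else 0.0 for v in row] for row in raw]


# Hypothetical 2 x 2 grids (not thesis data).
road_distance = [[0.0, 500.0], [1000.0, 2000.0]]
slope_percent = [[5.0, 10.0], [2.0, 20.0]]
cost = cost_layer(road_distance, slope_percent)
```

Cells on a road receive zero cost regardless of slope, while distant, steep cells receive the highest cost, so a sampler that penalizes cost is drawn toward accessible locations.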

Density distributions of the 158 training dataset points sampled in the field were compared to those of the 50 validation cLHS points to assess similarity in distributions across each biophysical data layer. Both sample sets were also compared to the original digital data layers used for the pre-map, in order to compare sample selection from a pre-map (created by unsupervised classification and distributed by areal extent of geology units) with a stratified random sampling algorithm, and to assess similarity to the original data.

Field Methods

Fieldwork was performed during the summer of 2012 from June to September collecting 155 points for the training dataset. Fifty validation sampling points were investigated in May 2013, in addition to 3 more points for the training dataset. Figures 8 and 9 show all data points collected. Approximately 1000 soil samples were collected and analyzed (see Table in Appendix B for data of typical pedons). A GPS unit (2 to 4 m lateral accuracy) was used to mark the location of each site visited, and coordinates were recorded in UTM format.

Site data recorded for each location included landform and site position, aspect from north using a compass, slope percent by clinometer, vegetation in order of dominant coverage, geology type verification, parent material, and rock fragment cover and size class on the soil surface. It was noted if the site had been burned with obvious ESR treatment (usually evidenced by drill furrows), burned and not obviously ESR treated, or not burned.

Figure 11. The COST raster used in the conditioned Latin Hypercube Sampling method. This layer was created by taking Euclidean distance from roads, multiplied by the slope map.

The soil was excavated at each site using either a Giddings probe to 150 cm, a shovel or sharpshooter to 110 cm, or a soil auger to ≥150 cm. At each sampling site, the pedon was described and sampled by genetic horizon, storing each sample in a plastic bag. Horizon depths and boundary topography, ped and void surface features, concentrations, redoximorphic features (RMFs), percent volume of rock fragments (%RF), and moist and dry consistence were all evaluated and recorded for each genetic horizon. Soil structure was not recorded for pedons and horizons that were excavated using a soil auger or Giddings probe. Soil compressive strength was measured for the soil surface in units of kg cm-2 using a pocket penetrometer, and soil shear strength in units of tons ft-2 (TSF) (kg cm-2) using a pocket shear vane tester.

Laboratory Methods

Samples collected were analyzed for several physical and chemical properties on each genetic horizon of all pedons (158 training, 50 validation cLHS). The methods described in the Soil Survey Laboratory Methods Manual (Soil Survey Staff, 2004) and the Field Book for Describing and Sampling Soils (Schoeneberger et al., 2012) were used to characterize the soil and determine qualities for classification in US Soil Taxonomy. Soils were allowed to air-dry in the lab, then the whole soil was weighed and sieved to <2 mm, retaining any materials >2 mm. Both the <2-mm and >2-mm fractions were weighed. Laboratory analysis was performed on the <2-mm fraction unless otherwise specified. Samples requiring grinding to finer than 2 mm were ground as necessary from a subset of the original sample. Soil pH was measured by the colorimetric method. Color of dry and moist soil was determined using Munsell charts at the lab on intact soil peds. Texture (sand, silt, and clay percent) was determined by both hydrometer and texture by feel. Effervescence class was determined in the lab by assessing reaction with 1M HCl (Schoeneberger et al., 2012) on intact soil peds.

All pedons were classified to order, suborder, great group, subgroup and family according to US Soil Taxonomy. Twenty-nine pedons considered to best represent the soils over a certain geomorphic surface, landform, or geology type, were selected as typical pedons.

The 29 typical pedons were analyzed for selected chemical properties in the laboratory. Electrical conductivity and pH were measured on a saturated paste (Soil Survey Staff, 2004, Proc. 4F2). Gypsum was measured on a selection of typical pedons that had evidence of pedogenic gypsum by dissolution in water, precipitation in acetone, and measurement of electrical conductivity (Soil Survey Staff, 2004, Proc. 4E2). Inorganic carbon was analyzed on samples ground to <0.25 mm using a pressure-calcimeter (Fonnesbeck et al., 2013).

Soil and Landscape Concept Development

All described pedons were linked to an established soil series used in Utah, and if possible, series commonly used around or established in Millard County using the Official Soil Series Description Query Facility (https://soilseries.sc.egov.usda.gov/osdquery.aspx). Soil series were determined by best match to: taxonomic classification; dominant landforms such as hillslopes; parent materials such as limestone colluvium versus alluvium from mixed sources; dominant vegetative cover such as Wyoming big sagebrush versus black sagebrush; and other distinctive pedon features such as subsurface gypsum versus organic-stained clay coats, or silt versus sandy loam textures. Twenty-five different soil series were selected to characterize the pedons in the study area. These are defined in Table 2.

Ecological site descriptions were also associated with each pedon, based mainly on vegetation and typical soil characteristics at each site. Examination of the ecological sites, vegetation, landform, and soil properties resulted in breaking the study area into two different SMRs, xeric aridic and typic aridic. These SMRs are reflected in the family classification and correlated soil series.

Predictive Modeling and Stratification

Predictive modeling to create a map of soils and associated properties of the study area was performed in R using three different statistical models: random forests, radial support vector machines, and bagged classification trees. Models were trained using 170 observation points: 158 points with collected pedon data, and 12 points selected using CBR with no pedon data collected. The study area was divided at the Lake Bonneville Highstand shoreline elevation. Soils above are typically older and soil formation has not been influenced by the lake, whereas below the shoreline the soils are younger and the parent materials are largely lacustrine. Therefore, the study area was stratified into two major geomorphic areas: areas above the Lake Bonneville Highstand shoreline, and areas below. Raster layers were subset in R at the shoreline elevation and points were divided into those above the shoreline (27 points), and those below (143).

Initially, modeling efforts were focused on predicting soil series. However, several soil series classes had very few observations. Therefore, modeling focused on predicting subgroup classes, which decreased the number of classes and generally increased the number of observations per class. Soil subgroup classes convey the most important soil properties for interpreting use and management (e.g., presence of high exchangeable sodium, subsurface accumulations of clay, carbonates, and/or gypsum, etc.).

Modeling was performed with two sets of covariates. The first set was selected by testing a model with all covariates as predictors using random forests. A subset of these covariates (named RF covariates hereafter) was identified as most important to the classification using the Gini Index (see below). The second set of covariates used for predictive modeling was the six covariates selected via OIF for pre-map creation, derived from the remotely sensed data layers (3) and the DEM-derived data layers (3) (OIF covariates).

Random Forests. The random forests method (Breiman, 2001) is part of the randomForest package in R (Liaw and Wiener, 2002). Random forests (RF) is a statistical classifier patterned after classification and regression trees. Instead of using one large tree that is then pruned, many weak independent trees (or iterations) are generated. Each classification tree is fit to a bootstrap sample of the observations, and a random sample of the predictor variables is tried at each node to split the tree. The default number of variables tried at each split is equal to the square root of the number of variables, unless otherwise specified. Many trees are grown without pruning, and the modal class receives the vote for the tree. After training a tree using the bootstrap sample, the “out-of-bag” (OOB) observations are used to validate the tree and assess error. Usually about 37% of the observations are not selected for a given bootstrap sample. One beneficial attribute of RF is that the data do not need to be normally distributed, complete, or linear, which can be beneficial for DSM where data are sometimes highly non-linear or non-normal. Random forests also has some resistance to over-fitting that would produce biased accuracy (Svetnik et al., 2003).
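The out-of-bag fraction of about 37% follows from sampling n observations with replacement: the chance that a given observation is missed by all n draws is (1 - 1/n)^n, which approaches 1/e. A short sketch in Python shows this for the 170 training points, along with the square-root default for the number of variables tried per split. The function names are illustrative and are not part of the randomForest package.

```python
import math


def oob_fraction(n):
    """Probability that an observation is left out of a bootstrap sample of size n."""
    return (1.0 - 1.0 / n) ** n


def default_mtry(n_predictors):
    """Default number of variables tried per split for classification:
    the square root of the number of predictors, rounded down."""
    return max(1, math.floor(math.sqrt(n_predictors)))


# With the 170 training observations, roughly 37% are out-of-bag per tree.
oob_170 = oob_fraction(170)

# With six covariates, two variables are tried at each split.
mtry_6 = default_mtry(6)
```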

Very little tuning was performed for RF classifications. The default number of variables tried at each split was used. Classifications were performed using 10,000 trees. This number of trees was selected by looking at the accuracy of predictions versus number of trees. Prediction accuracies were unstable even above 5,000 trees. Variables were eliminated using the Gini Index for variable importance, which measures the total decrease in node impurities from splitting on the variables, averaged over all trees (Liaw and Wiener, 2002). Variables were eliminated step-wise until the classification OOB error would improve no further. For modeling above the shoreline, ten variables were retained. For modeling below the shoreline, six variables were retained, and these variables were used in further classifications (RF covariates).

Support Vector Machines. Radial support vector machines (SVM) was performed in R using the kernlab package (Karatzoglou et al., 2004), and tuned using the caret package (Kuhn et al., 2014). Radial SVM uses machine learning and kernel functions to recognize patterns. The SVM algorithm attempts to fit a line, or hyperplane for multivariate models, through observations in feature space to separate patterns. The SVM function can work easily with non-linear patterns by transformations of original data into new feature space, which is the basis of the kernel function. Support vectors refer to the observations in feature space that may be difficult to separate into classes, which lie closest to the decision surface (e.g., outliers in the data). These observations have direct bearing on the optimum location of the separation line. The line position is located by determining the optimal distance from outlying observations, which is an optimization problem. The outlying observations are chosen by linear regression and naïve Bayesian statistics. Optimization is solved using Lagrangian formulations to maximize the margin between classes. For classes that are not easily separable, a cost can be implemented that will essentially fit a spline through the data. The higher the cost specified, the more flexible the line through classes. The method used was the radial-basis function network, with kernels calculated as $\exp\left(-\lVert x - x_i \rVert^2 / (2\sigma^2)\right)$, where σ was the margin constraint. Ten-fold cross-validation was used to provide a better estimate of accuracy by reducing over-fitting and over-estimating parameters for the model. Tuning parameters estimated by caret were C or cost, and σ. Tuning was performed over 10 iterations.
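The radial-basis kernel can be evaluated directly to see how it measures similarity between two observations in covariate space. The sketch below uses the Gaussian form exp(-||x - xi||^2 / (2σ^2)). The function name and covariate values are hypothetical. Note that kernlab parameterizes its radial kernel as exp(-σ||x - xi||^2), so the σ in this sketch is not interchangeable with the sigma tuned by caret.

```python
import math


def rbf_kernel(x, xi, sigma):
    """Radial basis function kernel: exp(-||x - xi||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Hypothetical observations described by two rescaled covariates.
site_a = [0.0, 0.0]
site_b = [3.0, 4.0]  # Euclidean distance of 5 from site_a

same = rbf_kernel(site_a, site_a, sigma=5.0)  # identical points give 1.0
near = rbf_kernel(site_a, site_b, sigma=5.0)  # exp(-0.5)
far = rbf_kernel(site_a, site_b, sigma=1.0)   # smaller sigma, faster decay
```

The kernel equals 1 for identical observations and decays toward 0 with distance, with σ controlling how quickly similarity falls off.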

Bagged Classification Trees. Bagged classification trees (BCT) was performed in and tuned using the caret package. The BCT method is related to RF, since it also uses bootstrap aggregation (bagging) (Brieman, 1996) to estimate classification accuracy. The

BCT model is based on classification and regression trees. A classification tree works by recursive binary partitioning of the data space into homogeneous classes with respect to variables measured. Partitions are called nodes, and for classification, nodes are separated by the largest class predicted at each node. Probability of membership is also calculated for each class at each node. Optimization is carried out to determine a node, a predictor variable, and the cut-off value of each class. Partitioning continues until either the 37 objective function can no longer be reduced, called a fully-grown tree, or a stopping rule is applied. When the tree is fully-grown, data is over-fit since it creates every node possible, so the trees must be “pruned” to eliminate over-fitting. A bagging method is used in order to reduce the variance within a sample to eliminate the need for pruning.

The BCT algorithm grows an ensemble of trees, and the predictions from each tree are used to generate a prediction for a new sample. Predictions are averaged to give the bagged model’s prediction (Kuhn and Johnson, 2013).
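The bagging idea can be sketched in a few lines of Python. This is an illustration only, not the caret implementation used in this study: each "tree" is reduced to a single-split stump, class predictions are combined by majority vote (an assumption about how the averaging is done for classes), and the covariate values, class labels, and function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the fewest errors."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            left_class = Counter(left).most_common(1)[0][0]
            right_class = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != left_class for lab in left)
                      + sum(lab != right_class for lab in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_class, right_class)
    if best is None:
        # No usable split in this bootstrap sample: always predict its majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_class, right_class = stump
    return left_class if row[feature] <= threshold else right_class


def fit_bagged(rows, labels, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return ensemble


def predict_bagged(ensemble, row):
    """Aggregate the ensemble by majority vote."""
    votes = Counter(predict_stump(stump, row) for stump in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical training pedons: [slope (%), elevation (m)].
rows = [[25, 1700], [30, 1750], [22, 1680], [28, 1720],
        [2, 1400], [1, 1390], [3, 1410], [2, 1420]]
labels = ["Xeric Haplocalcids"] * 4 + ["Typic Haplocalcids"] * 4

ensemble = fit_bagged(rows, labels)
print(predict_bagged(ensemble, [40, 1800]))
print(predict_bagged(ensemble, [0.5, 1380]))
```

Because every stump is fitted to a different bootstrap sample, individual splits vary, and aggregating the votes reduces that variance. A real bagged tree model would grow full trees rather than single splits.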

Predictive rasters were produced for each model for above and below the Bonneville shoreline. Rasters were processed with Imagine’s clump function to identify clumps of similar classes. The eliminate function in Imagine was used to eliminate pixel clumps of <2 ha and backfill them with surrounding classes. A minimum size of 2 ha was used because a 3rd order soil survey (1:20,000 to 1:63,360 scale) does not typically identify polygons with areas <2 ha (Soil Survey Staff, 1993).
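The clump-and-eliminate step can be illustrated with a small Python sketch. It is a simplified stand-in and not the algorithm used by Imagine: clumps are 4-connected groups of equal class values, small clumps are backfilled in a single pass with the most common bordering class, and the 30 m pixel size used to convert 2 ha to a pixel count is an assumption (the Landsat 5 TM resolution), as are the function names and the toy raster.

```python
import math
from collections import Counter, deque

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def find_clumps(raster):
    """Return a list of (class, cells) for each 4-connected clump."""
    n_rows, n_cols = len(raster), len(raster[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    clumps = []
    for r in range(n_rows):
        for c in range(n_cols):
            if seen[r][c]:
                continue
            cls = raster[r][c]
            seen[r][c] = True
            queue, cells = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in NEIGHBORS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and not seen[ny][nx] and raster[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clumps.append((cls, cells))
    return clumps


def eliminate_small_clumps(raster, min_pixels):
    """Backfill clumps smaller than min_pixels with the most common bordering class."""
    n_rows, n_cols = len(raster), len(raster[0])
    cleaned = [row[:] for row in raster]
    for _, cells in find_clumps(raster):
        if len(cells) >= min_pixels:
            continue
        members = set(cells)
        border = Counter()
        for y, x in cells:
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < n_rows and 0 <= nx < n_cols and (ny, nx) not in members:
                    border[raster[ny][nx]] += 1
        if border:  # single pass: border classes are read from the original raster
            fill = border.most_common(1)[0][0]
            for y, x in cells:
                cleaned[y][x] = fill
    return cleaned


# Pixel count equivalent to the 2 ha minimum, assuming 30 m pixels.
PIXEL_SIZE_M = 30.0
min_pixels_2ha = math.ceil(2 * 10_000 / PIXEL_SIZE_M ** 2)

# Toy raster of class IDs; a 2-pixel minimum is used so the effect is visible.
toy = [
    [14, 14, 14, 14],
    [14, 10, 14, 14],
    [14, 14, 14, 9],
    [14, 14, 9, 9],
]
cleaned = eliminate_small_clumps(toy, min_pixels=2)
print(min_pixels_2ha)
for row in cleaned:
    print(row)
```

In the toy raster the isolated class 10 pixel is absorbed by the surrounding class 14, while the three-pixel clump of class 9 is retained. This mirrors how sparsely predicted classes can disappear from a predictive raster after elimination.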

Metrics used to evaluate model accuracies were: 1) overall modeling cross-validated accuracies (if estimated by the model); 2) overall classification accuracies estimated by the validation dataset; 3) producer’s accuracy, which measures how well the observations of each reference class were predicted by the model; and 4) user’s accuracy, which measures how many of the observations predicted as a class actually belong to that class (Congalton, 1991). User’s accuracy can also be described as the probability that a pixel mapped as a certain class actually is that class when the user of the map visits it (Congalton, 1991).
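The three validation metrics can be computed directly from a confusion matrix. The short Python sketch below uses the counts reported in Table 4 (rows are predicted classes, columns are observed classes); the function names are hypothetical, and rows or columns with no observations are returned as undefined (`None`), whereas the tables in this thesis print them as 0.0%.

```python
CLASS_IDS = [2, 3, 4, 7, 9, 10, 11, 12, 14, 15, 16]

# Counts from Table 4: rows = predicted class, columns = observed class.
TABLE_4 = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # predicted 2
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # predicted 3
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],   # predicted 4
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # predicted 7
    [0, 0, 1, 0, 2, 0, 0, 0, 0, 1, 0],   # predicted 9
    [0, 0, 0, 2, 6, 1, 1, 0, 0, 1, 0],   # predicted 10
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # predicted 11
    [1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],   # predicted 12
    [2, 1, 1, 0, 0, 0, 0, 4, 15, 1, 2],  # predicted 14
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # predicted 15
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],   # predicted 16
]


def overall_accuracy(matrix):
    """Correctly classified observations divided by all observations."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def users_accuracy(matrix):
    """Diagonal divided by the row (predicted) total for each class."""
    result = []
    for i, row in enumerate(matrix):
        predicted_total = sum(row)
        result.append(row[i] / predicted_total if predicted_total else None)
    return result


def producers_accuracy(matrix):
    """Diagonal divided by the column (observed) total for each class."""
    result = []
    for j in range(len(matrix)):
        observed_total = sum(row[j] for row in matrix)
        result.append(matrix[j][j] / observed_total if observed_total else None)
    return result


overall = overall_accuracy(TABLE_4)
users = users_accuracy(TABLE_4)
producers = producers_accuracy(TABLE_4)

print(f"overall accuracy: {100 * overall:.1f}%")
for class_id, ua, pa in zip(CLASS_IDS, users, producers):
    ua_text = "n/a" if ua is None else f"{100 * ua:.1f}%"
    pa_text = "n/a" if pa is None else f"{100 * pa:.1f}%"
    print(f"class {class_id}: user's {ua_text}, producer's {pa_text}")
```

For Xeric Haplocalcids (class 14), for example, 15 of the 26 validation points predicted as that class were correct (user's accuracy), and 15 of the 17 points observed as that class were predicted correctly (producer's accuracy).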


RESULTS AND DISCUSSION

Comparison of Sampling Methods

The density distributions of the 158 training points generated from the pre-map are similar to the density distributions of the 50 validation points generated by cLHS for each biophysical data set (Figure 12). The density distributions of both training and validation datasets are similar to the density distribution of the original digital data layers, as shown in Figure 13. Given the results of the comparisons of density plots (Figures 12 and 13), it is clear that a targeted sampling plan such as a pre-map guided by an unsupervised classification while stratifying by geology can produce a comparable distribution of points from RS imagery and DEM data as a stratified random sampling procedure such as cLHS.

Patterns that emerged in the pre-map created to select samples (Figure 10) were also apparent in the field. Different landscape features classified in the pre-map were picked out easily in the field. The most obvious were basaltic lava flows, areas that had soil colors of high value and low chroma, areas with strong topographic relief, and sand dunes. Vegetated areas adjacent to bare areas with sand cover and north versus south facing slopes were also easily distinguishable. The influence of Lake Bonneville on soil characteristics was clearly visible in the pre-map. Classes mapped on uplands were distinct from classes mapped on lake plains and lowlands.


Figure 12. Density plots of data at each sampling point of the training sample and validation sample set from each covariate.


Figure 13. Density plots of data at each sampling point of the training sample and validation sample set from each covariate, compared to the original covariates to check sample distributions.


Soil and Landscape Concepts

The typical soil order in the study area was Aridisols. A few Entisols, soils with minimal development, typically occurred on hillslopes and in areas of significant eolian deposition. Geologic age (e.g., the temporal discontinuity created by the Lake Bonneville shoreline) had a very apparent influence on the distribution of soils over the study area (Figure 10). Soils occurring at elevations higher than the Bonneville shoreline tended to be coarser-textured and have higher coarse fragment content (loamy-skeletal or sandy-skeletal) than the soils below the shoreline. Soils on older alluvial fans (Qaf2) were typically Haplocalcids (major) and Calciargids (minor) with greater secondary carbonate accumulation (Bkk horizons) than soils on younger alluvial units (Qal1, Qaf1; soils with Bk horizons).

Soil development in the hills was dominated by hillslope processes, with slope and aspect affecting soil depth and carbonate accumulation. Soils on the south-facing slopes of bedrock-controlled uplands were lithic (shallow to hard bedrock; Lithic Xeric Torriorthents) to deep (Xeric Haplocalcids), with soils becoming deeper and more developed as slope gradient decreased. The occurrence of soils with secondary carbonate accumulation (Bk and Bkk horizons and minimal carbonate cementation) increased with decreasing slope. In contrast, north-facing slopes had shallow, lithic soils with greater secondary carbonate accumulation (Lithic Xeric Haplocalcids). North slopes also had some scattered shallow Calcic Petrocalcids (Bkkm horizon) that sometimes developed a laminar cap. However, as slopes became steeper, soil development and carbonate accumulation decreased (Lithic Xeric Torriorthents). Summits of hills were typically rock outcrop.

The soils in the Rhyolite Hills in the southern part of the study area (Tcr; Figure 10) had significant amounts of secondary carbonate accumulation (Bkk horizons), some cemented carbonates, and accumulation of secondary silica (Bq horizons) as films or nodules. Soils were mainly Calcic Petrocalcids, Xeric Haplocalcids, and Lithic Xeric Torriorthents.

Soils below the Lake Bonneville shoreline on shoreline-modified alluvial fans were finer-textured Haplocalcids. Soils on flats and lake plains below the Bonneville shoreline were dominantly fine-textured with fine-loamy, fine, and fine-silty family particle size classes, and derived from lacustrine sediments (Qlf, QTlf, Qlm, Qll). Haplocalcids were very common in soils formed in lacustrine sediments. Areas that were covered by eolian sediments were observed to be a complex of Aridisols and Entisols. Soils occurring in eolian sediments blown over fine-textured lacustrine sediments or alluvium mixed with other parent materials (Qed/Qlf) had sandy surfaces and fine subsurfaces. Some soils in eolian-derived sediments lacked calcic horizons and were classified as Torriorthents or Torripsamments depending on texture and sand size. Soils in the areas north of Sand Ridge varied the most, with patches of open sand dunes, stabilized dunes, and mound-intermound complexes of fine and very fine-textured sediments. Natrargids occurred in areas north of Sand Ridge that were not covered by eolian sands. Sand Ridge was the only place Haplargids were observed. Some dune fields had shifted 50-75 m since the creation of the geology map, but were still in fairly close proximity to the original locations. Several areas had new dune features as a result of the fire removing vegetative cover.

Soils on the basalt flows were dominantly covered by finer-textured Haplocalcids, but a few Calciargids were found. Most were fine-loamy or fine-silty, but soil depths varied widely, from very shallow and lithic to deep. Deep soils were in flats, and lithic soils in areas of high relief. There were no obvious patterns to vegetation, as most had burned off. Typically, soils with large rocks visible on the surface were lithic, and soils without visible rocks were either shallow or deep. There were some areas of juniper around the Beaver Ridge flow, as well as other scattered areas of juniper. Areas covered by junipers were usually coarse-textured and had calcic horizons. There were some soils with dark-colored epipedons, but these were assumed to lack the organic carbon content required for a mollic epipedon and, therefore, would not classify as Mollisols.

Vegetation coverage served as the main break in SMR. Areas dominated by shadscale were assumed to have typic aridic SMR, whereas areas dominated by Wyoming big sagebrush had xeric aridic SMR.

Soils that were in the lowest-elevation areas of lake plains had some form of visible precipitates such as gypsum crystals or masses in the subsurface. Pedons that had prismatic structure, pH greater than 8.8, EC greater than 0.3 dS m⁻¹ (if measured), organic-stained clay films in the lower horizons or gypsum crystals, possible greasewood cover, and occurred below the Provo shoreline of Lake Bonneville (Figure 14) were assumed to be Natrargids (exchangeable sodium percentage or sodium adsorption ratio were not measured). Soils at the bottom of the old river terraces of the Beaver River were also classified as Natrargids, with all the above characteristics.

Figure 14. This map shows the Provo Highstand Shoreline emphasized for greater visualization in Clear Spot Flat.

Overall, twenty-five different series were selected to characterize the soils in the study area (Table 2). Some difficulties arose when classifying pedons to series. One difficulty was that the original vegetation was removed. Burned sagebrush areas were now covered in grasses, such as non-native crested wheatgrass. Cheatgrass invasion was prevalent, and in places possibly more severe than before the fire. Mainly over basalt flows, post-fire chaining disturbed the soil by pulling up rocks, usually mixing a strong calcic horizon with a dark epipedon. Another complication was the erosion of the soil surface in some areas where as much as 50 cm of soil may have been lost (personal communication with Bob Newhall, USU). Truncated soils make classification difficult. Because the lacustrine flats are dominated by one parent material that is usually the same color as underlying horizons, identifying when a soil is truncated can also be a challenge.

Predictive Models

Modeling was performed using 170 training observations, 19 covariates, and 19 different soil subgroup classes. Eleven soil subgroups were present above the Bonneville shoreline, and 17 subgroups were present below. From the validation sample set, only four observations were above Bonneville shoreline, and 46 below.

Figure 15. The variable importance plot for classification above the Bonneville Highstand Shoreline. Ten covariates were retained.


Random Forests. Soil subgroups above the Bonneville shoreline were modeled using RF. The variable importance plot in Figure 15 shows the 10 covariates retained for the final classification. Potential OOB error was estimated at 85.2%, and when validated using the four validation points, accuracy was estimated at 50%. The predictive raster is given in Figure 16. User’s accuracy and producer’s accuracy were not estimable.

Major components of modeling areas above the Bonneville shoreline were driven by hillslope processes, represented by ASR and Slope covariates. Calcic Petrocalcids are over-predicted, but they follow the soil landscape concepts outlined above. Lithic soils tended to be predicted on north-facing slopes, which compared well with the soil landscape concepts. Xeric Haplocalcids were the most extensively predicted class, dominating alluvial fans and areas of low relief. Lithic Xeric Haplocalcids looked to be under-predicted, as most areas in the Rhyolite Hills were predicted to be Lithic Xeric Torriorthents. Haplocalcids were expected to be fairly extensive on moderately steep south-facing slopes. Rock outcrops were under-predicted in the Cricket Mountains and the Rhyolite Hills, but were predicted well around North Twin Peak and Lava Ridge. Petronodic Xeric Haplocalcids and Xeric Haplocalcids were predicted fairly well along Lava Ridge. After clumping, all predictions of Durinodic Haplocalcids and Lithic Xeric Calciargids were removed since they were not highly predicted in the raw raster and had no extents greater than 2 ha.

The predictions of soils above Bonneville shoreline were not taken beyond initial testing with RF, because using only 27 observations in the training data was considered too small to be statistically valid for training. There were also insufficient observations in the validation set to estimate accuracies for the models predicting soil distribution above the Bonneville shoreline. Therefore, a hybrid approach is recommended to map those soil extents.

Figure 16. This map shows the predictive raster created from random forests output for subgroup classification on areas above the Bonneville shoreline.

For predictions below the Bonneville shoreline using RF, classification OOB error was 55.2% using all digital covariates. After variable elimination, six variables were retained, as shown in Figure 17. The classification with the RF variables was refitted with an OOB error of 49.6%. The most important variables according to the Gini index were elevation, slope, 5/7 NDR, ASR, 5/4 NDR, and 4/7 NDR. The predictive raster map is in Figure 18. Class ID numbers are in Table 3 for taxonomic classes. Accuracy calculated for the RF model with the RF variables from the validation sample set was 39.1% (Table 4). Only three classes were predicted with any accuracy: Typic Haplocalcids, Typic Natrargids, and Xeric Haplocalcids. User’s accuracies indicate that when using the map, 50% of sites visited within areas classified as Typic Haplocalcids will be observed as such, only 9% of sites visited within areas classified as Typic Natrargids will be observed as such, and 57.7% of sites visited in areas classified as Xeric Haplocalcids will be observed as such.

The second set of predictions below the Bonneville shoreline used the six OIF covariates. Results gave 57.1% OOB classification error. Validation of predictions gave 45.7% accuracy (Table 5). Six classes were predicted with measurable accuracy: Lithic Xeric Haplocalcids, Lithic Xeric Torriorthents, Petronodic Haplocalcids, Typic Haplocalcids, Typic Natrargids, and Xeric Haplocalcids. Typic Haplocalcids and Typic Natrargids were modeled with similar accuracy, but Xeric Haplocalcids had higher user’s accuracy. The predictive raster is given in Figure 19.

Figure 17. The variable importance plot from random forests classification of areas below the Bonneville shoreline. Six covariates were retained for further classification.

Figure 18. This map shows the predictive raster created from random forests output of subgroup classification using covariates selected by random forests variable importance for areas below the Bonneville shoreline.

Table 3. A table of taxonomic classes associated with ID numbers in the following confusion matrices for classifications.

Taxonomic Class                 Class ID
Calcic Petrocalcids             1
Lithic Xeric Haplocalcids       2
Lithic Xeric Torriorthents      3
Petronodic Xeric Haplocalcids   4
Rock Outcrop                    5
Sand Dunes                      6
Typic Calciargids               7
Typic Haplargids                8
Typic Haplocalcids              9
Typic Natrargids                10
Typic Torripsamments            11
Xeric Calciargids               12
Xeric Calcigypsids              13
Xeric Haplocalcids              14
Xeric Natrargids                15
Xeric Torriorthents             16
Xeric Torripsamments            17

Table 4. Confusion matrix from the validation accuracy assessment of random forests predictions below Bonneville shoreline from RF covariates.

                    Observed
Predicted                 2      3      4      7      9     10     11     12     14     15     16  User's Accuracy
2                         0      0      0      0      0      0      0      0      0      0      0             0.0%
3                         0      0      0      0      0      0      0      0      0      0      0             0.0%
4                         0      0      0      0      0      0      0      0      1      0      0             0.0%
7                         0      0      0      0      0      0      0      0      0      0      0             0.0%
9                         0      0      1      0      2      0      0      0      0      1      0            50.0%
10                        0      0      0      2      6      1      1      0      0      1      0             9.1%
11                        0      0      0      0      0      0      0      0      0      0      0             0.0%
12                        1      0      0      0      0      0      0      0      1      1      1             0.0%
14                        2      1      1      0      0      0      0      4     15      1      2            57.7%
15                        0      0      0      0      0      0      0      0      0      0      0             0.0%
16                        0      0      0      0      0      0      0      0      0      0      0             0.0%
Producer's Accuracy    0.0%   0.0%   0.0%   0.0%  25.0% 100.0%   0.0%   0.0%  88.2%   0.0%   0.0%            39.1%

Table 5. Confusion matrix from the validation accuracy assessment of random forests predictions from OIF-selected covariates.

                    Observed
Predicted                 2      3      4      7      9     10     11     12     14     15     16  User's Accuracy
2                         1      0      0      0      0      0      0      1      0      0      0            50.0%
3                         0      1      0      0      0      0      0      0      0      0      0           100.0%
4                         0      0      1      0      0      0      0      0      1      0      0            50.0%
7                         0      0      0      0      0      0      0      0      0      0      0             0.0%
9                         0      0      1      0      2      0      1      0      0      0      0            50.0%
10                        0      0      0      1      6      1      0      0      0      2      1             9.1%
11                        0      0      0      0      0      0      0      0      0      0      0             0.0%
12                        1      0      0      0      0      0      0      0      1      0      0             0.0%
14                        1      0      0      1      0      0      0      3     15      2      2            62.5%
15                        0      0      0      0      0      0      0      0      0      0      0             0.0%
16                        0      0      0      0      0      0      0      0      0      0      0             0.0%
Producer's Accuracy   33.0% 100.0%  50.0%   0.0%  25.0% 100.0%   0.0%   0.0%  88.2%   0.0%   0.0%            45.7%

The OIF-selected layers produced a model with greater predictive accuracy than the RF-selected covariates. However, modeling with RF covariates had lower OOB error. These results are interesting because of the differences in selection methods. The OIF-selected covariates consist of three covariates from RS imagery and three DEM covariates that contain the most information (high variance) with the least correlation among all the different covariate data types. The RF method looks at when each variable starts to corrupt the classification, without checking to see if there is any correlation between covariates. It might be argued that correlation of variables may not matter so much, but collinear variables are never desirable in statistical analyses. When looking at an area with such high variability, there is some need to have covariates that differ as much as possible from each other in order to capture the variability (Kienast-Brown and Boettinger, 2010).
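The trade-off between information content and correlation that OIF captures can be sketched as follows. This Python example assumes the commonly used form of the index, the sum of the layer standard deviations divided by the sum of the absolute pairwise correlations, evaluated for three-layer combinations. The layer names and pixel values are hypothetical, and the sketch does not reproduce how covariates were rescaled or grouped in this study.

```python
from itertools import combinations
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def std_dev(values):
    """Sample standard deviation."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson(a, b):
    """Pearson correlation coefficient between two layers."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (std_dev(a) * std_dev(b))


def oif(layers):
    """Sum of standard deviations divided by sum of absolute pairwise correlations."""
    total_sd = sum(std_dev(layer) for layer in layers)
    total_r = sum(abs(pearson(a, b)) for a, b in combinations(layers, 2))
    if total_r == 0:
        raise ValueError("layers are completely uncorrelated; OIF is undefined")
    return total_sd / total_r


def rank_combinations(named_layers, size=3):
    """Score every combination of `size` layers, highest OIF first."""
    scores = []
    for names in combinations(sorted(named_layers), size):
        scores.append((names, oif([named_layers[n] for n in names])))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical standardized pixel values for four candidate covariates.
layers = {
    "band7":   [0.1, 0.9, -1.2, 0.4, 1.5, -0.7],
    "ndvi":    [0.3, -0.5, 1.1, -1.4, 0.2, 0.8],
    "ndr_5_2": [0.2, 1.0, -1.0, 0.5, 1.3, -0.9],
    "slope":   [-1.0, 0.4, 0.6, 1.2, -0.8, 0.1],
}

ranked = rank_combinations(layers)
for names, score in ranked:
    print(names, round(score, 2))
```

Combinations that include two strongly correlated layers (here the hypothetical `band7` and `ndr_5_2`) are penalized by the denominator, so the highest-ranked set favors layers that vary widely but carry little redundant information.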

When considering the predictive rasters, extents predicted by random forests created from RF-selected covariates tended to under-predict Sand Dunes, Calciargids, and Natrargids, and over-predict Lithic Xeric Haplocalcids and Xeric Calcigypsids (Figure 18). For example, some areas in lacustrine flats were predicted as Lithic Xeric Haplocalcids, and some areas in a wash near North Twin Peaks were predicted as Xeric Calcigypsids. Both areas were clearly not what was predicted. The split between Xeric Haplocalcids and Typic Haplocalcids was modeled very well, along with Typic Natrargids. Soils in Clear Spot Flat and north tended to be Typic, while other areas and uplands were Xeric. The extent of Typic Natrargids matched the extent of the Provo shoreline of Lake Bonneville. Random forests also picked up on a lot of noise in the data. Clumping and elimination of pixels removed Rock Outcrops, Typic Haplargids, Xeric Torriorthents, and Xeric Torripsamments from the predictive raster. These classes were not heavily classified in the predictive raster, meaning pixel groups did not exceed 2 ha.

Figure 19. This map shows the predictive raster created from random forests output of subgroup classification using covariates selected by Optimum Index Factor (OIF) for areas below the Bonneville shoreline.

The OIF covariate classification fit soil landscape concepts better than the RF-selected covariate classification. Clumping and elimination of pixels removed Rock Outcrops, Typic Haplargids, Xeric Torriorthents, and Xeric Torripsamments from the predictive raster again, because these classes were not heavily classified in the predictive raster and pixel groups did not exceed 2 ha.

Support Vector Machines. Predictions of soil subgroup distribution below the Bonneville shoreline using SVM were done using the six variables selected in RF (see Figure 17). After tuning SVM in the caret package, tuned parameters were estimated as σ = 0.276 and C (cost parameter) = 2. Training cross-validated accuracy was estimated at 37.1%. A confusion matrix was not generated from the training data. Accuracy of the validation sample set was estimated at 45.7% (Table 7). The only classes predicted with some measure of accuracy were Typic Haplocalcids and Xeric Haplocalcids. These were the most dominant classes in the validation set. Typic Haplocalcids were predicted with moderate success, and Xeric Haplocalcids were predicted very successfully. User’s accuracy was 83.3% and 43.2%, respectively, meaning that Typic Haplocalcids are likely to be found in 83.3% of pixels classified as Typic Haplocalcids, and Xeric Haplocalcids are likely to be found in 43.2% of pixels classified as such. Predicted classes are shown in Figure 20.

Classifications made with the OIF covariates were tuned in the caret package as well. Tuning parameters were estimated as σ = 0.349 and C = 0.5, with classification cross-validated accuracy estimated as 33.7%, slightly lower than the RF-selected covariate classification. Validation sample accuracy was estimated as 45.6% (Table 6). Only two classes were predicted with any accuracy using SVM: Typic Haplocalcids and Xeric Haplocalcids. Producer’s accuracy was 50% for Typic Haplocalcids and 100% for Xeric Haplocalcids. User’s accuracy was 100% for Typic Haplocalcids and 40.5% for Xeric Haplocalcids, meaning 100% of pixels visited on the map classed as Typic Haplocalcids would be such, and only 40.5% of pixels classed as Xeric Haplocalcids would be such when visited (Figure 21).

Table 6. Confusion matrix from the validation accuracy assessment of Support Vector Machine predictions from OIF-selected covariates.

                    Observed
Predicted                 2      3      4      7      9     10     11     12     14     15     16  User's Accuracy
2                         0      0      0      0      0      0      0      0      0      0      0             0.0%
3                         0      0      0      0      0      0      0      0      0      0      0             0.0%
4                         0      0      0      0      0      0      0      0      0      0      0             0.0%
7                         0      0      0      0      0      0      0      0      0      0      0             0.0%
9                         0      0      0      0      4      0      0      0      0      0      0           100.0%
10                        0      0      0      0      0      0      0      0      0      0      0             0.0%
11                        0      0      0      0      0      0      0      0      0      0      0             0.0%
12                        0      0      0      0      0      0      0      0      0      0      0             0.0%
14                        3      1      2      2      4      1      1      4     17      4      3            40.5%
15                        0      0      0      0      0      0      0      0      0      0      0             0.0%
16                        0      0      0      0      0      0      0      0      0      0      0             0.0%
Producer's Accuracy    0.0%   0.0%   0.0%   0.0%  50.0%   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%            45.6%

Table 7. Confusion matrix from the validation accuracy assessment of Support Vector Machine predictions from RF covariates.

                    Observed
Predicted                 2      3      4      7      9     10     11     12     14     15     16  User's Accuracy
2                         0      0      0      0      0      0      0      0      0      0      0             0.0%
3                         0      0      0      0      0      0      0      0      0      0      0             0.0%
4                         0      0      0      0      0      0      0      0      0      0      0             0.0%
7                         0      0      0      0      0      0      0      0      0      0      0             0.0%
9                         0      0      1      0      5      0      0      0      0      0      0            83.3%
10                        0      0      0      0      0      0      1      0      1      1      0             0.0%
11                        0      0      0      0      0      0      0      0      0      0      0             0.0%
12                        0      0      0      0      0      0      0      0      0      0      0             0.0%
14                        3      1      1      2      3      1      0      4     16      3      3            43.2%
15                        0      0      0      0      0      0      0      0      0      0      0             0.0%
16                        0      0      0      0      0      0      0      0      0      0      0             0.0%
Producer's Accuracy    0.0%   0.0%   0.0%   0.0%  62.5%   0.0%   0.0%   0.0%  94.1%   0.0%   0.0%            45.7%

The SVM model using RF covariates had validation accuracy equal to RF modeling using OIF covariates. Modeling with SVM had several issues. First, while class accuracies were high for classes predicted, only two classes were modeled with any measure of accuracy, which is very poor. Second, the predictive raster had several classes missing, meaning that SVM left them out of the model entirely. None of the exempted classes were present in the validation set, and they were some of the least observed classes, so it had little bearing on the accuracy of the model, unless one is particularly interested in finding that class in the study area to increase observations in the model.

In the SVM by RF covariates predictive raster (Figure 20), Xeric Haplocalcids were highly over-predicted, and Typic soils were very under-predicted. Natrargids of any kind were confined to the northwest corner of Sand Ridge. Support vector machines had very few issues with DEM artifacts. Visually, the SVM predictions do not match soil landscape concepts very well. This may result from the way SVM does clustering, and given the large variability in the study area, SVM may not be able to differentiate between groups as easily as other models given that it relies on outliers to determine groups. Predictions of any Entisols or lithic soils were minimal. Rock outcrops were very few, and Typic and Xeric Natrargids had the smallest coverage of all three predictive maps. It can be concluded that either SVM cannot work well with the variability in the study area, or it needs balanced numbers of observations to make proper classifications. Low numbers of observations meant that classes were poorly predicted. Classes left out after clumping and elimination were Calcic Petrocalcids, Typic Haplargids, Xeric Natrargids, Xeric Torriorthents, and Xeric Torripsamments. Classes with the greatest extent were Typic Haplocalcids and Xeric Haplocalcids. All other classes had limited extent.

Figure 20. This map shows the predictive raster created by Support Vector Machines modeling of subgroup classification using random forests-selected covariates for areas below the Bonneville shoreline.

Figure 21. This map shows the predictive raster created by Support Vector Machines modeling of subgroup classification using OIF-selected covariates for areas below the Bonneville shoreline.

Only three classes were produced in the raster using the OIF covariate classification (Figure 21), and after clumping and elimination, Typic Natrargids were almost entirely removed from the predictive raster. As shown in the confusion matrix (Table 6), points were predicted dominantly as Xeric Haplocalcids. Both classes moderately fit the soil landscape concepts where expected, but other classes expected were non-existent.

Bagged Classification Trees. Bagged classification trees used to predict soil distribution below the Bonneville shoreline had no parameters to tune, but the caret package allowed easy OOB validation over a specified number of iterations. Classification accuracy was estimated at 35.4%. Validation accuracy was estimated as 31.1% (Table 8). Four classes were predicted with some measure of accuracy: Typic Haplocalcids, Typic Natrargids, Xeric Haplocalcids, and Xeric Natrargids. Interpreting the producer’s and user’s accuracies shows that Typic Haplocalcids were modeled with 25% accuracy, and can be found on the map 66.7% of the time in classes modeled as such. Typic Natrargids were modeled with 100% accuracy, but can only be found 12.5% of the time in a class modeled as such. Xeric Haplocalcids were modeled with 62.5% accuracy, and can be found 45.4% of the time in classes modeled as such. Xeric Natrargids were predicted correctly 25% of the time, and can be found all the time in classes modeled as such. Predicted classes are shown in Figure 22.

Results from the BCT model from RF variables fit soil landscape concepts the best. Natrargids are fairly extensive in the soils north of the study area. Influence of DEM artifacts can be seen in the BCT raster. All classes were present in the predictive raster. Classification using BCT performed with OIF-selected covariates was comparable to RF-selected covariate classifications. Classification accuracy by OOB validation was 25.8%. Validation accuracy was estimated as 33.3% (Table 9). The four classes with measurable accuracy were Lithic Xeric Haplocalcids, Typic Haplocalcids, Typic Natrargids, and Xeric Haplocalcids. Producer’s accuracy for each class was 33%, 12.5%, 0%, and 75%, respectively. User’s accuracy was 50%, 50%, 10%, and 48%, respectively. Typic Natrargids were not predicted with any accuracy, but had a 10% chance of being present in pixels classified as such.

Table 8. Confusion matrix from the validation accuracy assessment of Bagged Classification Trees predictions from RF covariates.

                    Observed
Predicted                 2      3      4      7      9     10     11     12     14     15     16  User's Accuracy
2                         0      0      0      0      0      0      0      1      1      0      1             0.0%
3                         0      0      0      0      0      0      0      0      0      0      0             0.0%
4                         0      0      0      0      0      0      0      0      2      0      0             0.0%
7                         0      0      0      0      0      0      0      0      0      1      0             0.0%
9                         0      0      1      0      2      0      0      0      0      0      0            66.7%
10                        0      0      0      1      4      1      1      0      1      0      0            12.5%
11                        0      0      0      1      1      0      0      0      0      0      0             0.0%
12                        1      0      0      0      0      0      0      0      2      0      0             0.0%
14                        2      1      1      0      1      0      0      3     10      2      2            45.4%
15                        0      0      0      0      0      0      0      0      0      1      0           100.0%
16                        0      0      0      0      0      0      0      0      0      0      0             0.0%
Producer's Accuracy    0.0%   0.0%   0.0%   0.0%  25.0% 100.0%   0.0%   0.0%  62.5%  25.0%   0.0%            31.1%

Table 9. Confusion matrix from the validation accuracy assessment of Bagged Classification Trees predictions from OIF-selected covariates.

                    Observed
Predicted                 2      3      4      7      9     10     11     12     14     15     16  User's Accuracy
2                         1      0      0      0      0      0      0      1      0      0      0            50.0%
3                         0      0      0      0      0      0      0      0      0      0      0             0.0%
4                         0      0      0      0      0      0      0      0      1      0      0             0.0%
7                         0      0      0      0      0      0      0      0      0      1      1             0.0%
9                         0      0      0      0      1      0      1      0      0      0      0            50.0%
10                        0      0      0      1      5      1      0      0      2      1      0            10.0%
11                        0      0      0      0      1      0      0      0      0      0      0             0.0%
12                        1      0      0      0      0      0      0      0      1      0      0             0.0%
14                        1      1      2      1      1      0      0      3     12      2      2            48.0%
15                        0      0      0      0      0      0      0      0      0      0      0             0.0%
16                        0      0      0      0      0      0      0      0      0      0      0             0.0%
Producer's Accuracy   33.0%   0.0%   0.0%   0.0%  12.5%   0.0%   0.0%   0.0%  75.0%   0.0%   0.0%            33.3%

Figure 22. This map shows the predictive raster created by Bagged Classification Trees modeling of subgroup classification using random forests-selected covariates for areas below the Bonneville shoreline.

The OIF covariate predictive raster (Figure 23) was not comparable to the raster created from RF-selected covariates. All classes were predicted, but with such small areas of occurrence that after clumping and eliminating pixels, Typic Haplargids and Xeric Torriorthents were completely removed from the raster.

Modeling Summary

When evaluating models performed with the RF-selected covariates, the SVM model had the best accuracy because it predicted Typic Haplocalcids and Xeric Haplocalcids better than any other model when considering user’s and producer’s accuracy. Yet, the SVM model did not predict either class of Natrargids accurately. Only BCT predicted both classes of Natrargids. Xeric Natrargids had excellent user’s accuracy from BCT. Typic Natrargids had poor user’s accuracy from both RF and BCT models. Since by-class accuracies from user’s accuracy are the better indicator of model predictive power, SVM is clearly the better model. However, its ability to predict classes other than the two most populous classes was severely limited. Modeling above the Bonneville shoreline was dominated by DEM covariates. Hillslope processes are the greater factor in soil development in uplands. Modeling below the Bonneville shoreline was dominated by both spectral covariates and DEM covariates, with elevation being the most important covariate and the 5/4 NDR the second most important. The three spectral bands chosen are very useful to represent different features, and combined to give reasonable predictions. The RF-chosen covariates were three DEM-derived layers and three spectral layers, similar to the six covariates chosen by OIF. The covariates selected by RF also included three of the six OIF covariates: Slope, ASR, and 5/2 NDR (Figure 17). Some artifacts in the DEM were very pronounced in the predictive rasters, as can be seen by waves, straight lines, and a large circular spot in the middle of the images (Figures 18, 19, and 23). Random forests had the most noise from DEM artifacts.

Figure 23. This map shows the predictive raster created by Bagged Classification Trees modeling of subgroup classification using OIF-selected covariates for areas below the Bonneville shoreline.

Models performed with OIF covariates point toward RF being the best method to model soil extents. When comparing predictive rasters, RF and BCT are almost identical. This may have something to do with the similarities in models themselves. The poorest model was SVM, even with good classification accuracy. Only two classes with any predictive accuracy is a poor result. Six classes were modeled with measurable accuracy by RF, and even with the small extents on the map of each class, it would still be useful to guide further sampling to refine predictions or map units.

Using RS data and DEMs for DSM can involve many data layers from derivatives of both types of data, as well as any proximal soil data one may wish to use. As has been shown by the RF predictions using all covariates, in order to reduce dimensionality some variable selection must be done either by OIF or some statistical classifier such as RF. Brungard et al. (2015) showed that using some form of variable selection with machine learning classifiers, whether by RF or some other method, can significantly improve classification accuracy. A 5% improvement was seen in this study for the RF classification below the Bonneville shoreline. Variable selection by both RF below the Bonneville shoreline and OIF methods had three common data layers: ASR, 5/2 NDR, and Slope. Variables selected by OIF did not improve the classification accuracy, but produced the best results in RF. Overall, modeling using RF and OIF-selected covariates was the best method to obtain reasonable accuracy of class predictions.

Results show that, given the low accuracy of the models, more sampling points are necessary to capture the variability in the area, particularly in uplands. Stum et al. (2010) used over 600 sampling points to train an RF model that had 56% OOB error, which was comparable to the OOB error for the areas below the Bonneville shoreline. It was also observed, as several studies have shown, that classes with few observations are predicted poorly (Brungard and Boettinger, 2010; Brungard et al., 2015; Stum et al., 2010). Brungard and Boettinger (2010) determined that the optimal number of sampling sites selected by cLHS was approximately 200 to 300 for their study area, suggesting that the 50-sample validation set created here may not accurately represent the variability in the data, although, based on the density plots, it can be comparable.
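For readers unfamiliar with the out-of-bag (OOB) error quoted above, the following sketch shows how such an estimate is formed in bagging: each training point is scored only by the models whose bootstrap sample did not contain it. To stay dependency-free, a bagged 1-nearest-neighbour classifier stands in for random forest, so this is not the model used in this study. The data, function names, and parameters are hypothetical.

```python
import random
from collections import Counter


def nearest_label(train, point):
    """Predict with 1-nearest-neighbour; train is a list of (features, label)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda row: sq_dist(row[0], point))[1]


def oob_error(data, n_bags=50, seed=0):
    """Estimate out-of-bag error for a bagged 1-NN classifier.

    data is a list of (features, label). Each bag is a bootstrap sample of
    the same size as data; points left out of a bag are predicted by that
    bag's model, and the majority OOB vote is compared with the true label.
    """
    rng = random.Random(seed)
    n = len(data)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_bags):
        in_bag = [rng.randrange(n) for _ in range(n)]
        train = [data[i] for i in in_bag]
        for i in set(range(n)) - set(in_bag):
            votes[i][nearest_label(train, data[i][0])] += 1
    # Only points that were out-of-bag at least once can be scored.
    scored = [(v.most_common(1)[0][0], data[i][1]) for i, v in enumerate(votes) if v]
    if not scored:
        raise ValueError("no point was ever out-of-bag; increase n_bags")
    return sum(pred != truth for pred, truth in scored) / len(scored)


# Hypothetical, well-separated two-class data set (two features per point).
data = [((i * 0.1, 0.0), "low") for i in range(10)] + \
       [((10 + i * 0.1, 0.0), "high") for i in range(10)]

if __name__ == "__main__":
    print("OOB error:", oob_error(data))
```

Because OOB points are never seen by the model that predicts them, the estimate behaves like a built-in validation set, which is why it is commonly reported alongside a separate validation sample.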

Brungard and Boettinger (2010) also suggested using the cLHS method to determine the optimal sample size for the area being mapped. Given more time and funding, more points could have been collected; they could still be collected in the future to refine the models and possibly make better predictions. Despite their poor accuracy, the predictive rasters are not useless: they can be used to guide further sampling to increase observations of small classes, particularly in uplands to model hillslopes.


CONCLUSIONS

This study has shown that, in order to model soil extents over a landscape, a pre-map composed of selected covariates and stratified by some physiographic feature can produce a sampling scheme comparable to a random sampling scheme. Soil landscape concepts fit well with the classes created in the pre-map using ISODATA clustering, which aided in the selection of sampling sites. Soils over the landscape of the study area matched soil development concepts in relation to topography and relief. In general, using RF-selected covariates, SVM was the best classifier, but BCT produced the best map to fit soil landscape concepts, even though the accuracy was poor. Random forest also produced a fairly reasonable map. Using OIF-selected covariates produced the best model with RF, with good accuracy. Given the low number of sampling and validation points, none of these models is suitable for directly creating soil polygons. They can, however, be used to guide further sampling for refinement of mapping schemes. It may also be fairly complicated to draw soil map units of the area given the diverse land surface. Future work for the study area could include a multi-temporal analysis of surface changes to possibly predict soil extents before the fire. Accuracy may be improved, but this will require a reasonable number of samples.


REFERENCES

Bartholomeus, H.M., G.F. Epema, and M.E. Schaepman. 2007. Determining iron content

in Mediterranean soils in partly vegetated areas, using spectral reflectance and

imaging spectroscopy. Int. J. Applied Earth Observation and Geoinformation

9:194–203.

Bartholomeus, H.M., M.E. Schaepman, L. Kooistra, A. Stevens, W.B. Hoogmoed, and

O.S.P. Spaargaren. 2008. Spectral reflectance based indices for soil organic carbon

quantification. Geoderma 145:28-36.

Bilgili, A.V. 2013. Spatial assessment of soil salinity in the Harran Plain using multiple

kriging techniques. J. Environ. Monit. Assess. 185:777-795.

Boettinger, J.L. 2010. Environmental covariates for digital soil mapping in the Western

USA. p. 17-27. In Boettinger, J.L., A.C. Moore, S. Kienast-Brown, D.W. Howell,

and A.E. Hartemink. (ed.) Digital soil mapping: progress in soil science 2, Part 2:

Research. Springer Dordrecht, Netherlands.

Breiman, L. 1996. Bagging predictors. J. Machine Learning 24: 123-140.

Breiman, L. 2001. Random forests. J. Machine Learning 45: 5-32.

Brungard, C.W., and J.L. Boettinger. 2010. Conditioned Latin Hypercube Sampling: Optimal

sample size for digital soil mapping of arid rangelands in Utah, USA. p. 67-75. In

Boettinger, J.L., A.C. Moore, S. Kienast-Brown, D.W. Howell, and A.E.

Hartemink. (ed.) Digital soil mapping: progress in soil science 2, Part 2: Research.

Springer Dordrecht, Netherlands.

Brungard, C.W., J.L. Boettinger, M.C. Duniway, S.A. Wills, and T.C. Edwards. 2015.

Machine learning for predicting soil classes in three semi-arid landscapes.

Geoderma 239-240:68-83.

Buol, S.W., R.J. Southard, R.C. Graham, and P.A. McDaniel. 2011. Soil genesis and

classification, 6th ed. Wiley-Blackwell, Hoboken, NJ.

Chavez, P.S. Jr. 1996. Image-based atmospheric corrections—revisited and revised.

Photogrammetric Engineering and Remote Sensing 62:1025-1036.

Chavez, P.S., G.L. Berlin, and L.B. Sowers. 1982. Statistical method for selecting

Landsat MSS ratios. J. Appl. Photographic Engineering 8:23-30.

Congalton, R.G. 1991. A review of assessing the accuracy of classifications of remotely

sensed data. Remote Sens. Environ. 37: 35-46.

Crist, E.P., and R.J. Kauth. 1986. The Tasseled Cap de-mystified. Photogrammetric

Engineering and Remote Sensing 52:81-86.

Evans, D.M. 2013. Digital soil mapping of the red clay of the driftless area near Verona,

Wisconsin, USA. M.S. Thesis, University of Wisconsin-Madison, Madison, WI.

Fonnesbeck, B.B., J.L. Boettinger, and J.R. Lawley. 2013. Improving a simple pressure-

calcimeter method for inorganic carbon analysis. Soil Sci. Soc. Am. J. 77:1553-1562.

Ge, Y., C.L.S. Morgan, S. Grunwald, D.J. Brown, and D.V. Sarkhot. 2011. Comparison

of soil reflectance spectra and calibration models obtained using multiple

spectrometers. Geoderma 161:202-211.

Grunwald, S. 2009. Multi-criteria characterization of recent digital soil mapping and

modeling approaches. Geoderma 152:195-207.

Grunwald, S. 2010. Current state of digital soil mapping and what is next. p. 3-15. In

Boettinger, J.L., A.C. Moore, S. Kienast-Brown, D.W. Howell, and A.E.

Hartemink. (ed.) Digital soil mapping: progress in soil science 2, Part 1:

Introduction. Springer, Dordrecht, Netherlands.

Grunwald, S., K.R. Reddy, S. Newman, and W.F. DeBusk. 2004. Spatial variability,

distribution and uncertainty assessment of soil phosphorus in a south Florida

wetland. Environmetrics 15:811–825.

Hay, C.M., C.A. Kuretz, J.B. Odenweller, E.J. Scheffner, and B. Wood. 1979.

Developments of AI procedures for dealing with the effects of episodal events on

crop temporal spectral response. AGRISTARS Report SR-B9-00434.

Hijmans R.J., and J. van Etten. 2012. raster: Geographic analysis and modeling with

raster data. R package version 1.9-92. http://CRAN.R-project.org/package=raster.

Accessed 13 March 2014.

Hintze, L.F. 2005. Utah’s spectacular geology: how it came to be. Brigham Young

University Geology Studies Special Publication 8. BYU Department of Geology,

Provo, UT.

Hintze, L.F., and F.D. Davis. 2007. Geology of Millard County, UT. Utah Geological

Survey, Bulletin 133.

Hintze, L.F., F.D. Davis, P.D. Rowley, C.G. Cunningham, T.A. Steven, and G.C. Willis.

2003. Geologic Map of the Richfield 30' x 60' Quadrangle, Southeast Millard

County and Parts of Beaver, Piute, and Sevier Counties, Utah Geological Survey

Map 195. http://geology.utah.gov/maps/gis/index.htm, Accessed April 5, 2011.

Jackson, R.D. 1983. Spectral indices in n-space. Remote Sens. Environ. 13: 409-421.

Jafari, A., P.A. Finke, J. Van De Wauw, S. Ayoubi, and H. Khadem. 2012. Spatial

prediction of USDA- great groups in the arid Zarand region, Iran: comparing

logistic regression approaches to predict diagnostic horizons and soil types.

European J. Soil Sci. 63:284-298.

Jenny, H. 1941. Factors of soil formation, McGraw-Hill, New York, NY.

Jensen, J.R. 2005. Introductory digital image processing: a remote sensing perspective,

3rd ed. Pearson Prentice Hall, NJ USA.

Jewell, P.W., and K. Nicoll. 2011. Wind regimes and aeolian transport in the Great Basin,

USA. Geomorphology 129:1-13.

Keitt, T.H., R. Bivand, E. Pebesma, and B. Rowlingson. 2012. rgdal: Bindings for the

Geospatial Data Abstraction Library. R package version 0.7-11.

http://CRAN.R-project.org/package=rgdal. Accessed 13 March 2014.

Kienast-Brown, S., and J.L. Boettinger. 2010. Applying the Optimum Index Factor to

multiple data types in soil survey. In Boettinger, J.L., A.C. Moore, S. Kienast-

Brown, D.W. Howell, and A.E. Hartemink. (ed.) Digital soil mapping: progress in

soil science 2, Part 2: Research. Springer, Dordrecht, Netherlands.

Kuhn, M. Contributions from J. Wing, S. Weston, A. Williams, C. Keefer, A. Engelhardt,

T. Cooper, Z. Mayer, and the R Core Team. 2014. caret: Classification and

regression training. R package version 6.0-24.

http://CRAN.R-project.org/package=caret. Accessed 13 March 2014.

Kuhn, M. and K. Johnson. 2013. Applied predictive modeling. Springer, New York.

Lagacherie, P., and A.B. McBratney. 2007. Spatial soil information systems and spatial

soil inference systems: Perspectives for digital soil mapping. p. 3–22. In P.

Lagacherie, A. McBratney, and M. Voltz. (ed.) Digital soil mapping: An

introductory perspective. Elsevier, New York.

Lagacherie, P. 2008. Digital soil mapping: A state of the art. In A.E. Hartemink, A.B.

McBratney, and M.L. Mendonca-Santos. (eds.) Digital soil mapping with limited

data. Springer, Dordrecht, Netherlands.

Leica Geosystems. 2011. ERDAS Imagine version 11. Leica Geosystems, Atlanta, GA.

Liaw, A. and M. Wiener. 2002. Classification and regression by randomForest. R News

2:18-22.

Litaor, M.I., O. Reichman, M. Belzer, K. Auerswald, A. Nishri, and M. Shenker. 2003.

Spatial analysis of phosphorus sorption capacity in a semiarid altered wetland. J.

Environ. Qual. 32:335–343.

McBratney, A. B., M. L. Mendonca-Santos, and B. Minasny. 2003. On digital soil

mapping. Geoderma 117:3-52.

Minasny, B., and A.B. McBratney. 2006. A conditioned Latin hypercube method for

sampling in the presence of ancillary information. Computers and Geosciences

32:1378-1388.

Minasny, B., A.B. McBratney, B.P. Malone, and I. Wheeler. 2013. Digital mapping of

soil carbon. Advances in Agronomy Vol. 118. Elsevier, New York.

Moore, I.D., P.E. Gessler, G.A. Nielsen, and G.A. Peterson. 1993. Soil attribute

prediction using terrain analysis. Soil Sci. Soc. Am. J. 57:443-452 (erratum 57:i).

Mulder, V.L., S. de Bruin, and M.E. Schaepman. 2013. Representing major soil

variability at regional scale by constrained Latin Hypercube Sampling of remote

sensing data. Int. J. Appl. Earth Observation and Geoinformation 21:301-310.

Mulder, V.L., S. de Bruin, M.E. Schaepman, and T.R. Mayr. 2011. The use of remote

sensing in soil and terrain mapping – A review. Geoderma 162:1-19.

Musick, H.B., and R.E. Pelletier. 1988. Response to soil moisture of spectral indexes

derived from bidirectional reflectance in thematic mapper wavebands. Remote

Sensing of Envir. 25:167-184.

Nield, S. J., J. L. Boettinger, and R. D. Ramsey. 2007. Digitally mapping gypsic and

natric soil areas using Landsat ETM data. Soil Sci. Soc. Am. J. 71:245-252.

Nield, S.J. 2004. Using geographic information systems and remote sensing to map

rangeland salinity source areas, Upper San Rafael River, Utah. M.S. Thesis, Utah

State University, Logan, UT.

Pei, T., C.Z. Qin, A.X. Zhu, L. Yang, M. Luo, B. Li, and C. Zhou. 2010. Mapping soil

organic matter using the topographic wetness index: A comparative study based

on different flow direction algorithms and kriging methods. Ecological Indicators

10:610-619.

Rivero, R.G., S. Grunwald, and G.L. Bruland. 2007. Incorporation of spectral data into

multivariate geostatistical models to map soil phosphorous variability in a Florida

wetland. Geoderma 140:428-443.

Roudier, P., A.E. Hewitt, and D.E. Beaudette. 2012. A conditioned Latin hypercube

sampling algorithm incorporating operational constraints. p. 227-238. In B.

Minasny, B.P. Malone, and A.B. McBratney. (ed.) Digital soil assessments and

beyond, Taylor & Francis Group, London.

Roudier, P. 2011. clhs: an R package for conditioned Latin hypercube sampling.

http://code.scenzgrid.org/index.php/p/clhs/ Accessed May 2012.

Sack, D. 1987. Geomorphology of the Lynndyl dunes, west-central Utah, in Cenozoic

geology of western Utah – sites for precious metal and hydrocarbon

accumulations. Utah Geological Association Publication 16, 291-299.

Schoeneberger, P.J., D.A. Wysocki, E.C. Benham, and Soil Survey Staff. 2012. Field

book for describing and sampling soils, Version 3.0. Natural Resources

Conservation Service, National Soil Survey Center, Lincoln, NE.

Shi, X., R. Long, R. Dekett, and J. Phillipe. 2009. Integrating different types of

knowledge for digital soil mapping. Soil Sci. Soc. Am. J. 73:1682-1692.

Shi, X., A-X. Zhu, J. Burt, W. Choi, R-X. Wang, T. Pei, and B-L. Li. 2007. An

experiment with circular neighborhood in the calculation of slope gradient from

DEM. Photogrammetric Engineering & Remote Sensing 73:143-154.

Shi, X., A-X. Zhu, J. Burt, F. Qi, and D. Simonson. 2004. A case-based reasoning

approach to fuzzy soil mapping. Soil Sci. Soc. Am. J. 68:885-894.

Soil Survey Division Staff. 1993. Soil survey manual. Soil Conservation Service. U.S.

Department of Agriculture Handbook 18. Lincoln, Nebraska.

Soil Survey Staff. 2004. Soil Survey Laboratory Methods Manual. Soil Survey

Investigations Report No. 42. Version 4.0. R. Burt (ed.). U.S. Department of

Agriculture, Natural Resources Conservation Service. Lincoln, Nebraska.

Stum, A.K., J.L. Boettinger, M.A. White, and R.D. Ramsey. 2010. Random forests

applied as a soil spatial predictive model in arid Utah. In Boettinger, J.L., A.C.

Moore, S. Kienast-Brown, D.W. Howell, and A.E. Hartemink. (ed.) Digital Soil

Mapping: Progress in Soil Science 2, Part 2: Research. Springer, Dordrecht,

Netherlands.

Svetnik, V., A. Liaw, C. Tong, J.C. Culberson, R.P. Sheridan, and B.P. Feuston. 2003.

Random forest: a classification and regression tool for compound classification

and QSAR modeling. J. of Chem. Information and Computer Sci. 43: 1947–1958.

Tarboton, D. 2009. Terrain analysis using digital elevation models (TauDEM). Version

3.1. http://hydrology.usu.edu/taudem/taudem3.1/ Accessed 14 November 2013.

Tarboton, D. 2012. Terrain analysis using digital elevation models (TauDEM). Version

5.1.1. http://hydrology.usu.edu/taudem/taudem5/index.html. Accessed 8 May

2013.

Utah Climate Center. 2013a. Black Rock Station dynamic report for period of record.

http://climate.usurf.usu.edu/reports/monthly_data_summary.php?stn=USC00420730&unit=SI&network=direct:ghcn&sidebar=0. Verified 16 August 2013. Utah

State University, Logan, UT.

Utah Climate Center. 2013b. Deseret Station dynamic report for period of record.

http://climate.usurf.usu.edu/reports/monthly_data_summary.php?stn=USC00422101&unit=SI&network=direct:ghcn&sidebar=0. Verified 16 August 2013. Utah

State University, Logan, UT.

Ziadat, F.M., J.C. Taylor, and T.R. Brewer. 2003. Merging Landsat TM imagery with

topographic data to aid soil mapping in the Badia region of Jordan. Jour. Arid

Environments 54: 527-541.


APPENDICES


Appendix A – Figures of the covariates used in OIF and classification.

Figure 24. A map of the Tasseled Cap Transformation showing brightness, greenness, and wetness bands. Brightness and greenness were used to calculate the Greenness Above Bare Soil Index shown below.

Figure 25. The GRABS index used in the OIF analysis.

Figure 26. The Normalized Difference Vegetation Index used in the OIF covariate classification.

Figure 27. The 5/7 Normalized Difference Ratio (NDR). This layer was created using bands 5 and 7 from Landsat TM imagery.

Figure 28. The 5/4 NDR. This layer was created using bands 5 and 4 from Landsat TM imagery.

Figure 29. The 4/7 NDR. This layer was created using bands 4 and 7 from Landsat 5 TM imagery.

Figure 30. The 5/2 NDR. This layer was created using bands 5 and 2 from Landsat TM imagery.

Figure 31. A slope map derived from the digital elevation model (DEM) using ArcSIE terrain derivatives.

Figure 32. A curvature map derived from the DEM using ArcSIE terrain derivatives. Green values are convex shapes, and red values are concave.

Figure 33. A profile curvature map derived from the DEM using ArcSIE terrain derivatives. Green values are convex shapes, and red values are concave. Profile curvature is measured vertically along the hillslope.

Figure 34. A planform curvature map derived from the DEM using ArcSIE terrain derivatives. Green values are convex shapes, and red values are concave. Planform curvature is measured horizontally along the hillslope.

Figure 35. The inverse wetness index, also called the Slope over Contributing Area Ratio, calculated using TauDEM. Higher values indicate areas that are water-shedding.

Figure 36. The Area Solar Radiation (ASR) map generated from the DEM in ArcGIS using the Spring Equinox layer. Higher values indicate greater insolation.

Figure 37. The three Landsat-derived OIF covariates shown in the final layer stack that were used for unsupervised classification. Covariates were Band 7, 5/2 NDR, and NDVI.

Figure 38. The three DEM-derived OIF covariates shown in the final layer stack that were used for unsupervised classification. Bands shown are Slope, Profile Curvature, and ASR.

Appendix B – Summary Data for the Typical Pedons.

Table 10. The table of typical pedons showing all data collected. Some samples were not analyzed due to lack of adequate soil.

Soil ID  Depth    Sample   pH (sat.  EC      CaCO3   Efferv  Colorimetric  Gypsum
         (cm)     ID       paste)    (dS/m)  Eq (%)          pH            (%)
MF 013   0-10     MFL078   7.25      3.16    42.1    VE      8.1           -
         10-26    MFL079   7.40      0.68    41.2    VE      8.2           -
         26-47    MFL080   -         -       42.0    VE      8.9           -
         47-74    MFL081   7.70      4.04    40.5    VE      8.5           -
         74-92    MFL082   -         -       47.3    VE      8.6           -
MF 019   0-10     MFL097   7.22      1.80    30.1    VE      8.1           -
         10-35    MFL098   7.83      0.55    44.7    VE      8.2           -
         35-51    MFL099   8.05      3.31    64.9    VE      8.7           -
         51-102   MFL100   7.92      6.45    60.5    VE      8.8           -
MF 020   0-8      MFL101   -         -       17.2    VE      8.1           -
         8-27     MFL102   -         -       13.4    VE      8.2           -
         27-44    MFL103   -         -       37.8    VE      8.6           -
         44-60    MFL104   -         -       53.0    VE      8.8           -
MF 025   0-10     MFL115   -         -       15.8    VE      8.1           -
         10-26    MFL116   -         -       20.6    VE      8.2           -
         26-43    MFL117   -         -       41.5    VE      8.3           -
         43-53    MFL118   -         -       65.4    VE      8.5           -
         53-85    MFL119   -         -       64.4    VE      8.6           -
         85-96    MFL120   -         -       52.4    VE      8.5           -
MF 028   0-14     MFL130   -         -       2.6     SL      7.9           -
         14-24    MFL131   -         -       6.3     ST      8.1           -
         24-49    MFL132   -         -       30.1    VE      8.4           -
         49-70    MFL133   -         -       42.0    VE      8.5           -
MF 031   0-7      MFL145   7.32      2.00    6.5     SL      7.6           -
         7-21     MFL146   7.90      0.54    6.6     VE      8.3           -
         21-43    MFL147   -         -       15.9    VE      8.6           -
MF 032   0-8      MFL148   7.13      2.71    10.1    VE      8.2           -
         8-22     MFL149   7.59      0.53    16.8    VE      8.3           -
         22-51    MFL150   -         -       19.4    VE      8.4           -
         51-68    MFL151   7.91      3.12    29.6    VE      8.5           -
         68-79    MFL152   -         -       30.3    VE      8.2           -
         79-108   MFL153   -         -       22.1    VE      8.6           -
         108-159  MFL154   -         -       13.0    VE      8.3           -
         159-190  MFL155   -         -       13.5    VE      8.1           -
         190-196  MFL156   -         -       13.2    VE      8.1           -
MF 034   0-7      MFL162   7.91      0.75    4.3     SL      8.1           -
         7-27     MFL163   7.91      0.31    13.7    ST      8.0           -
         27-53    MFL164   7.80      0.30    12.8    VE      8.5           -
         53-64    MFL165   7.80      0.35    19.3    VE      8.2           -
         64-105   MFL166   8.14      0.35    12.0    VE      8.7           -
MF 035   0-19     MFL167   -         -       4.3     SL      8.2           -
         19-35    MFL168   -         -       5.2     ST      8.2           -
         35-58    MFL169   8.13      0.33    ND      VE      8.2           -
         58-110   MFL170   -         0.37    9.2     VE      8.3           -
MF 048   0-15     MFL213   -         1.02    0.2     VSL     7.5           -
         15-35    MFL214   -         -       0.8     SL      8.1           -
         35-57    MFL215   -         -       5.9     ST      8.5           -
         57-71    MFL216   -         -       ND      VE      8.6           -
         71-102   MFL217   -         -       ND      VE      8.6           -
MF 051   0-9      MFL221   7.83      0.55    4.5     ST      8.1           -
         9-33     MFL222   7.85      0.30    6.8     ST      8.2           -
         33-45    MFL223   8.02      0.30    9.5     VE      8.3           -
         45-70    MFL224   7.96      0.18    8.1     VE      8.3           -
         70-101   MFL225   -         -       12.4    VE      8.4           -
MF 054   0-6      MFL233   -         -       2.6     ST      8.5           -
         6-29     MFL234   -         -       36.9    VE      8.2           -
         29-54    MFL235   7.93      1.03    43.1    ST      8.7           -
         54-75    MFL236   -         -       32.9    ST      8.8           -
         75-130   MFL237   -         -       35.0    ST      8.6           -
MF 057   0-21     MFL247   7.50      1.08    1.5     SL      8.1           -
         21-41    MFL248   7.65      0.43    5.5     VE      8.2           -
         41-60    MFL249   -         -       6.8     ST      8.2           -
MF 066   0-10     MFL285   7.87      0.91    16.2    ST      8.1           -
         10-28    MFL286   -         -       38.8    ST      8.2           -
         28-58    MFL287   -         -       57.3    VE      8.3           -
MF 072   0-14     MFL309   -         0.61    13.0    ST      8.2           -
         14-25    MFL310   -         1.98    11.7    VE      8.3           -
         25-44    MFL311   -         -       14.4    VE      8.6           -
         44-60    MFL312   -         -       16.1    VE      8.7           -
         60-75    MFL313   -         -       13.3    VE      8.8           -
MF 084   0-11     MFL380   -         1.80    20.7    ST      8.7           -
         11-31    MFL381   -         5.86    25.7    VE      8.6           -
         31-53    MFL382   -         -       32.8    VE      8.6           -
         53-72    MFL383   -         -       36.7    ST      8.4           -
         72-126   MFL384   7.39      34.25   43.8    ST      8.4           -
MF 086   0-19     MFL392   8.05      0.99    16.6    VE      8.5           -
         19-33    MFL393   8.07      0.98    16.7    VE      8.8           -
         33-56    MFL394   8.05      6.80    27.4    VE      8.6           -
         56-80    MFL395   -         -       27.9    VE      8.3           -
         80-155   MFL396   7.39      24.71   29.8    ST      8.1           -
MF 088   0-9      MFL402   7.79      36.44   6.5     VE      7.6           20.95
         9-26     MFL403   7.71      15.55   17.2    VE      8.4           5.06
         26-42    MFL404   -         -       23.8    VE      8.3           0.75
         42-70    MFL405   -         -       20.1    VE      8.2           3.86
         70-103   MFL406   -         -       17.0    VE      8.6           3.02
         103-136  MFL407   7.55      16.21   13.4    ST      8.1           11.71
         136-152  MFL408   -         -       17.4    ST      8.1           3.42
MF 097   0-7      MFL447   -         0.53    2.6     VE      7.8           -
         7-34     MFL448   -         0.37    8.5     VE      8.2           -
         34-68    MFL449   -         -       38.8    VE      8.6           -
         68-82    MFL450   -         -       23.4    ST      8.7           -
         82-96    MFL451   -         13.86   16.7    VE      8.7           -
         96-102   MFL452   -         10.24   2.7     ST      8.5           -
MF 106   0-5      MFL486   -         -       27.0    ST      8.1           -
         5-20     MFL487   -         -       26.5    VE      8.1           -
         20-35    MFL488   8.33      1.73    31.6    VE      8.6           -
         35-61    MFL489   -         11.90   31.5    ST      8.7           -
         61-72    MFL490   -         -       30.6    ST      8.5           -
         72-91    MFL491   -         18.36   32.7    VE      8.6           -
         91-106   MFL492   -         16.50   31.9    ST      8.6           -
MF 109   0-8      MFL502   -         -       17.9    VE      8.2           -
         8-28     MFL503   -         -       24.1    VE      8.4           -
         28-51    MFL504   -         -       28.7    VE      8.6           -
         51-66    MFL505   -         -       35.9    ST      8.6           -
MF 110   0-8      MFL506   -         -       12.4    SL      8.0           -
         8-29     MFL507   -         0.37    27.0    ST      8.2           -
         29-40    MFL508   -         -       25.0    VE      8.3           -
         40-82    MFL509   -         -       54.5    VE      8.2           -
         82-100   MFL510   -         -       45.1    ST      8.2           -
MF 125   0-10     MFL572   7.66      0.65    8.2     ST      8.3           -
         10-42    MFL573   8.13      0.37    9.6     ST      8.2           -
         42-55    MFL574   -         -       27.8    VE      8.5           -
         55-77    MFL575   -         -       7.4     VE      8.6           -
         77-101   MFL576   -         -       9.2     VE      8.1           -
MF 144   0-30     MFL643   -         0.46    16.7    VE      8.1           -
         30-71    MFL644   -         -       11.6    VE      7.8           -
         71-90    MFL645   -         -       7.1     VE      8.1           -
         90-105   MFL646   -         -       17.0    VE      8.3           -
         105-127  MFL647   -         -       11.9    VE      8.2           -
MF 511   0-7      MFL1045  7.51      1.19    6.9     ST      7.6           -
         7-26     MFL1046  8.04      0.31    7.4     VE      7.8           -
         26-53    MFL1047  8.09      0.34    10.9    VE      7.9           -
         53-73    MFL1048  8.03      4.67    26.0    VE      8.7           -
         73-81    MFL1049  8.03      9.87    17.7    VE      8.6           -
         81-92    MFL1050  7.72      13.61   15.1    VE      8.5           -
         92-111   MFL1051  7.47      14.53   15.9    VE      8.6           -
MF 529   0-9      MFL1127  7.74      1.04    24.2    VE      8.4           -
         9-28     MFL1128  7.89      1.71    35.0    VE      8.5           -
         28-40    MFL1129  7.69      10.63   35.0    VE      8.6           -
         40-53    MFL1130  7.58      12.83   34.2    VE      8.5           -
         53-101   MFL1131  7.59      15.86   31.3    VE      8.4           -
MF 534   0-9      MFL1157  7.65      0.77    10.8    VE      8.4           -
MF 544   0-9      MFL1203  7.79      0.51    1.4     SL      8.1           -
         9-21     MFL1204  7.61      0.74    2.5     ST      8.1           -
         21-38    MFL1205  -         -       17.7    VE      8.4           -
MF 547   0-11     MFL1212  7.47      1.38    9.8     VE      8.7           -
         11-41    MFL1213  7.81      0.46    6.6     VE      8.6           -