Centre for Geo-Information
Thesis Report GIRS-2005-24

Digital Soil Mapping in the Nioro du Rip Area, Senegal

Bas Kempen

June 2005

Digital Soil Mapping in the Nioro du Rip Area, Senegal

Bas Kempen

A thesis submitted in partial fulfillment of the degree of Master of Science at Wageningen University and Research Centre, The Netherlands.

June 2005

Wageningen, The Netherlands

Registration nr: 800303427040
Thesis code: GRS-80340
Thesis report: GIRS-2005-24

Supervision:
Dr Ir Sytze de Bruin, Centre for Geo-Information
Dr Ir Gerard Heuvelink, Laboratory of Soil Science and Geology / Alterra
Dr Ir Jetse Stoorvogel, Laboratory of Soil Science and Geology

Examination:
Prof. Dr Ir Arnold Bregt, Centre for Geo-Information
Dr Ir Sytze de Bruin, Centre for Geo-Information
Dr Ir Gerard Heuvelink, Laboratory of Soil Science and Geology / Alterra

Centre for Geo-Information
Environmental Sciences Department
Wageningen University

Acknowledgements

Inspired by a presentation by Gerard Heuvelink, I dived a year ago into the interesting world of pedometrics and digital soil mapping. A dive that gave me the opportunity to explore Senegal, that led me to a feast of eating and drinking (in scientific circles referred to as the “Global Workshop on Digital Soil Mapping”) in Montpellier, France, and that finally resulted in the report you are now looking at. A dive I do not regret having made.

During the past year many people got involved in this project. I would like to use this opportunity to thank them. I will start with my supervisors Sytze de Bruin, Gerard Heuvelink and Jetse Stoorvogel. Thank you very much for your help, advice, comments and dedication throughout the year and for the critical review of the draft report. Your efforts were greatly appreciated.

In Senegal I could not do without the support of Bocar Diagana and Adrien Mankor of ISRA Dakar. Thank you very much for all the arrangements you made to facilitate the fieldwork and for making this toubab feel a bit at home. You were the solid base of the success of the fieldwork. Another person to whom I owe my deepest thanks is Abibou Niang of ISRA St Louis, together with his family. Abibou, thank you for your incredible hospitality during my stay in St Louis, for the help you continuously offered and for all the work you did for me, including the analysis of 187 soil samples. For the fieldwork I owe my thanks to my excellent field guide, moto-driver and assistant Mbaye Diaw, to the people of the villages in the Nioro du Rip area who gave us directions time and time again and shared their food and water with us, and to the ISRA enumerators in Nioro du Rip for their company. Furthermore I would like to thank the driver Cheikh Mbaye, the director of ISRA Bambey Ousmane Ndoye, and all other staff of the ISRA departments in Dakar, St Louis and Bambey who assisted me in one way or another. Merci beaucoup!

Of the staff of the Centre for Geo-Information of Wageningen University I would like to thank Harm Bartholomeus, Gerrit Epema, Michael Schaepman and John Stuiver for their technical and scientific help and advice. The advice of Dick Brus concerning soil sampling strategy and the validation procedure was greatly appreciated.

This thesis marks the end of a seven-year period as a student at Wageningen University. Last but not least I would like to thank my family and friends for their interest and support. Even Maarten, who kept me on the couch too long in the mornings by seducing me with fresh coffee (… ‘you can drink one more cup’ …). And Hans and Bram, who too often kept me around the kitchen table to battle for Middle Earth while I had to be behind my computer. I am especially grateful to my girlfriend Els for her support and her patience during the many months I was exploring the far corners of this planet in the past years. And above all I owe my deepest thanks to my parents. Not only for the approximately 3000 kilometers they drove in the past seven years to pick me up from and bring me to the train station, but for their unconditional support in every sense during this period of my life.

Bas Kempen

Wageningen, June 2005


Table of Contents

Acknowledgements

Abstract

1. Introduction
1.1 Digital Soil Mapping
1.1.1 Background
1.1.2 Environmental variables and soil spatial prediction
1.1.3 Quantitative vs. qualitative models of soil variation
1.2 Soil data needs in sub-Saharan Africa
1.2.1 Trade-off analysis model
1.2.2 Importance of SOC and soil texture for agricultural production
1.2.3 Digital soil mapping as tool to meet soil data requirements
1.2.4 Research objective

2. Materials and Methods
2.1 The study area
2.1.1 Location and climate
2.1.2 Geology and geomorphology
2.1.3 Soils and land use
2.2 Conceptual model
2.2.1 CLORPT-related processes that affect soil spatial distribution in the Nioro du Rip area
2.2.2 Mapping soil organic carbon and the fine fraction contents: model framework
2.3 Data description, preprocessing and generation
2.3.1 Soil data
2.3.2 Remote sensing imagery
2.3.3 Digital elevation model
2.4 From concept to application: from qualitative to quantitative
2.4.1 Modelling soil organic carbon: model 1
2.4.2 Modelling soil organic carbon: model 2
2.4.3 Modelling soil texture: model 1
2.4.4 Modelling soil texture: model 2
2.5 Sampling and validation
2.5.1 Sampling strategy
2.5.2 Selection of sample points for validation
2.5.3 Sampling the soil
2.5.4 Validation procedure

3. Results
3.1 Model input data
3.1.1 Landsat ETM+ classification
3.1.2 Landscape unit mapping
3.2 Model results
3.2.1 Soil organic carbon mapping
3.2.2 Soil texture mapping
3.3 Validation data analysis
3.3.1 Study area level
3.3.2 Landscape unit level
3.4 Model validation
3.4.1 Bias and goodness of fit
3.4.2 Statistical inference

4. Discussion
4.1 Prediction model design
4.2 Model input: data quality and uncertainty
4.2.1 Digital elevation model
4.2.2 Landscape classification
4.2.3 Landsat 7 ETM+ image classification
4.2.4 Environmental predictor variables
4.3 Spatial prediction of the soil organic carbon and fine fraction contents
4.3.1 Spatial dependency
4.3.2 Other issues concerning spatial prediction
4.4 Final thoughts
4.4.1 Digital soil mapping within the TOA context
4.4.2 Digital soil mapping in Africa

5. Conclusions
5.1 Model design and spatial prediction
5.2 Spatial distribution of SOC and FF
5.3 Concluding remarks and recommendations

References

Appendices
Appendix A The soil data set collected during the fieldwork in 2005
Appendix B Summary statistics of the soil organic carbon and fine fraction contents on cluster level
Appendix C Scatterplots showing the observed vs. the predicted values for the three landscape positions
Appendix D Summary statistics of the PEs and SPEs of the two models on cluster level


Abstract

Digital soil mapping techniques quantify the functional relationships between soil properties and more readily observed environmental variables, including land use. These relationships are used in quantitative models that predict soil properties at unvisited locations. This study applied digital soil mapping to meet quantitative soil data requirements in the Nioro du Rip area, Senegal. It aimed to integrate qualitative soil-landscape and environmental process knowledge in a catenary context with quantitative spatial prediction methods and to validate the results by statistical inference with an independent data set. The soil properties of interest were soil organic carbon and the fine fraction of the soil texture. A qualitative conceptual model of soil-landscape processes and soil spatial distribution was defined and translated into prediction models that use simple prediction rules. Four models were developed on the basis of the available input data: two models for each soil property, which differed in their level of detail. Environmental input data were derived from a digital elevation model and a Landsat 7 ETM+ image. The models were calibrated with a small soil data set. Validation showed that the model results were poor. All results were biased and the mean squared prediction errors were large. The main reason for the poor model performance was the lack of sufficient soil and environmental data to define the conceptual model and to calibrate the corresponding quantitative model. This resulted in models that were largely based on expert judgment and that lacked solid empirical support. The insights gained and validation data collected in this study create a basis for more accurate quantitative soil property mapping in the Nioro du Rip area. The elaborate soil data set can, together with additional (high resolution) environmental data, lead to a better understanding of the soil spatial distribution, with which spatial prediction can be improved.

Keywords: quantitative models, catena, Landsat, DEM, soil organic carbon, soil texture, West Africa.


1. Introduction

1.1 Digital Soil Mapping

1.1.1 Background

Increasing population pressure, (human-induced) land degradation, pollution and climatic change are putting more and more pressure on our natural resources. This threatens food security and sustainable agricultural and environmental development in both the developing and the developed world. There is a growing demand for accurate, multi-resolution soil data to take decisive action in order to cope with the rising problems and to ensure sustainable land use in the future. However, soil data for present-day applications, e.g. quantitative land evaluation tools, are often required at resolutions unusual for classical soil surveys (Walter et al., 2004). Besides, many existing soil maps have a pedogenetic, qualitative character (FAO classification or Soil Taxonomy), which is not suitable for many of the current, more quantitative uses of soil data. An important reason for the lack of quantitative soil spatial data or soil data infrastructures in large parts of the world is that conventional soil survey methods are slow and expensive (McBratney et al., 2003) and often impractical in vast or inaccessible areas. However, during the last decades a new trend has evolved in soil science that deals with this problem.

The explosive increase in computation power, together with fast developments in data acquisition technology, geostatistics, analytical tools and GIS since the 1980s, has resulted in a gradual shift from conventional, qualitative survey techniques to reproducible, fast and cost-effective quantitative predictive methods, referred to as ‘Digital Soil Mapping’ (McBratney et al., 2003). This group of quantitative mapping methods is part of a new field of soil science known as ‘pedometrics’, which was officially recognized at the beginning of the 1990s (McBratney et al., 2000). Pedometrics is defined as “the application of mathematical and statistical methods for the study of the distribution and genesis of soils” [1].

1.1.2 Environmental variables and soil spatial prediction

Digital soil mapping techniques are usually based on the premise that soil properties and their spatial variation result from soil forming factors that vary in time and space. Jenny (1941) was one of the first to define the relationship between soil and soil forming factors in an equation, which he intended as a quasi-mechanistic model for soil development (Eq. 1):

S = f(CL, O, R, P, T)    (1)

where S is a soil property that is a function (f) of variables that relate to CL(imate), O(rganisms), R(elief), P(arent material) and T(ime). Climate and climatic change influence weathering, leaching, mineralization and erosion of soil material. Organisms homogenize the soil, improve the soil structure and play an important role in organic matter and nutrient cycles. Relief causes soil erosion and redistribution of soil material. Parent material influences physical and chemical soil properties. And time, the last factor, is the driving force behind all other soil forming factors. The famous equation of Jenny can be regarded as the foundation of digital soil mapping (McBratney et al., 2003). Variables related to the CLORPT factors are in practice referred to as environmental variables (including land use) and can be derived from digital elevation models, remote sensing images or from existing soil or land use maps.

[1] www.pedometrics.org

Digital soil mapping techniques exploit the relationships, or correlations, between soil properties and quantitative environmental variables to predict soil properties at unvisited locations. These relationships are subsequently used to calibrate quantitative prediction models. This commonly applied digital soil mapping technique is referred to as environmental correlation (Odeh et al., 1994; McKenzie and Ryan, 1999). Hence, digital soil mapping relies on an empirical, geostatistically based modelling approach rather than on a mechanistic approach to modelling soil formation (McKenzie and Gallant, 2004).
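As a minimal illustration of environmental correlation, the sketch below calibrates a straight-line relation between a soil property and a single environmental covariate at sampled sites and then applies it at unvisited locations. The choice of elevation as covariate, the linear form, the function names and all numbers are assumptions made for illustration only; they are not the prediction rules or data of this study.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical calibration data: elevation (m) from a DEM and
# soil organic carbon (%) observed at the same five sites.
elevation_m = [8.0, 15.0, 22.0, 30.0, 38.0]
soc_percent = [0.62, 0.55, 0.47, 0.41, 0.33]

intercept, slope = fit_line(elevation_m, soc_percent)


def predict_soc(elevation):
    """Apply the calibrated relation at a location without soil data."""
    return intercept + slope * elevation


# Predict at unvisited locations where only the covariate is known.
unvisited_elevation_m = [12.0, 27.0]
predicted = [predict_soc(e) for e in unvisited_elevation_m]
```

The same pattern extends to several covariates or to class-based rules per landscape unit; in every case soil observations remain necessary to calibrate the relation.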

The advantage of using environmental variables, other than existing soil data, for predictive studies of soil is their extensive availability, even in areas where soil data are scarce. Digital elevation models (DEMs) and satellite imagery are available for most parts of the world. DEMs in particular are regarded as useful tools for predictive studies because many soil properties have a strong relationship with terrain attributes derived from a DEM (Odeh et al., 1995). DEMs and remote sensing images can be acquired far more easily and cheaply than soil data resulting from a classical survey. Moreover, environmental data availability is increasing on a daily basis (McBratney et al., 2000) and digital mapping is less time-consuming. There are numerous examples of studies in which environmental variables are used as predictors, including Lagacherie and Voltz (2000), Gessler et al. (1995), Odeh et al. (1994) and De Bruin and Stein (1998), who used digital elevation data and terrain attributes as predictors. Odeh and McBratney (2000) used satellite radiometric data, and Hengl et al. (2004) and Dobos et al. (2000, 2001) used various combinations of predictive environmental variables. Another important data source that is sometimes forgotten is the existing soil map (Heuvelink and Bierkens, 1992).

Thus, digital soil mapping techniques make extensive use of DEMs and remote sensing images. Still, digital soil mapping cannot do without direct observations of the soil. They remain necessary to establish the character and strength of the relationships with the environmental predictor variables and for the purpose of validation (Heuvelink et al., 2004). Soil surveys or sampling have to be carried out to collect soil data in areas where no data are available. However, these surveys can be more efficient in time and cost than traditional pedological surveys. Sampling strategies are adapted for digital soil mapping and can be optimized to minimize prediction errors and maximize sampling efficiency; examples are Heuvelink et al. (2004) and Hengl et al. (2003).

Another advantage of quantitative digital soil mapping techniques compared to qualitative, pedological approaches is the possibility to assess and quantify the uncertainty of the generated soil maps with an independent validation data set. Soil maps that resulted from conventional soil surveys do not quantitatively express uncertainty and variation within soil classes (Heuvelink and Webster, 2001).

1.1.3 Quantitative vs. qualitative models of soil variation

Quantitative digital mapping techniques that use spatial prediction have gained dominance over qualitative pedological surveys during the last decades, as pedometrics evolved (McBratney et al., 2000). However, soil observations are still needed for digital soil mapping, and qualitative pedological models (or knowledge) of soil variation and soil-landscape processes should not be discarded either. On the contrary, it is more and more advocated to integrate qualitative pedological and landscape process knowledge into quantitative predictive models (McKenzie and Gallant, 2004; Walter et al., 2004), i.e. to use process knowledge to define a conceptual model framework on which the quantitative prediction model of soil distribution is based. This should lead to more process-oriented models of pedogenesis for digital soil mapping instead of geostatistically oriented models that do not explicitly identify the underlying causes of soil spatial variability (Heuvelink and Webster, 2001; Walter et al., 2004). Spatial prediction rules based on environmental correlation should reflect the understanding of soil distribution in an area (McKenzie and Gallant, 2004).

Many studies show that digital soil mapping has become a widely used and quite successful approach to generate quantitative soil data since the early nineties. This offers possibilities to apply the developed techniques in other regions of the world where there is an urgent demand for accurate, quantitative soil data, for example in sub-Saharan Africa.


1.2 Soil data needs in sub-Saharan Africa

1.2.1 Trade-off analysis model

In sub-Saharan Africa food security and sustainable livelihoods of local communities are threatened by desertification, soil erosion, soil fertility decline, mismanagement and salinization. There is a dire need for detailed soil data for quantitative, multi-disciplinary land evaluation tools that may help to tackle the growing problems in this part of the world. One of these tools is the trade-off analysis (TOA) model [2]. TOA is a GIS-based biophysical and economic model that simulates the complex interactions between economic and environmental factors of agro-ecosystems in the form of trade-offs (Stoorvogel et al., 2004). Users, ranging from community leaders to national decision-makers, can apply the tool to assess the sustainability of agricultural production systems under alternative technology and policy scenarios in order to support policy decision-making (Stoorvogel et al., 2004).

The TOA model has been applied in several study areas worldwide in both developed and developing countries. One of the current projects where TOA is applied is located in the Nioro du Rip area in southwest Senegal, where the work focuses on carbon sequestration. The biophysical part of the TOA model requires quantitative data on soil properties that are lacking in this area. Soil organic carbon (SOC) and soil texture are regarded as key soil properties for agricultural production. Understanding the spatial distribution of these two soil properties is therefore of great importance for the success of the TOA project.

1.2.2 Importance of SOC and soil texture for agricultural production

The chemical and biological properties of the soil rely heavily on soil organic carbon content (Manlay et al., 2002b). SOC binds nutrients, increases water holding capacity, improves structural stability (by reducing crusting and bulk density and by forming stable aggregates that are resistant to erosion) and helps to maintain a stable soil pH (Van Breemen and Buurman, 1998). Furthermore, SOC stimulates soil biological activity. Plants and soil fauna homogenize the topsoil, which enhances the porosity and makes plant rooting easier. Soil fauna plays a vital role in the decomposition of SOC, which releases valuable plant nutrients. SOC therefore plays a vital role in the productivity and sustainability of heavily weathered tropical soils.

The fine particles of the soil texture influence water retention and the cation exchange capacity of the soil. Sandy soils, like the soils in the study area, have a small water and nutrient holding capacity compared to loamy and clayey soils. Small amounts of silt and clay in a sandy soil will improve the water and nutrient holding capacity of the soil. The fine particles of the soil texture can form complexes with SOC. Manlay et al. (2002a) found that carbon was highly positively correlated with clay and silt. The interaction between SOC and soil texture also has a positive effect on water and nutrient storage.

Soil organic carbon contents are in general small in the Nioro study area. Although SOC contents are small, relevant spatial differences within the landscape might be present. Insight into the spatial distribution of SOC and the fine particles (silt and clay) of the soil texture in the Nioro study area is limited. There is a need for techniques that map these soil properties for a better understanding of their spatial distributions.

1.2.3 Digital soil mapping as tool to meet soil data requirements

Soil data availability is very limited for the Nioro du Rip area. Existing data sources are scattered. Two studies were carried out recently by Meerkerk (2003) and Niang (2004) to describe and sample the soils in the Nioro du Rip area. A conventional soil survey encompassing the whole area would be too time-consuming and costly and is not a feasible option. The TOA study area is large and not easily accessible. Besides, the TOA project also has study areas in other West African countries that face similar problems. When auxiliary data in the form of a DEM and remote sensing images are available, application of digital soil mapping might be a solution to meet soil data requirements in the context of the TOA project in Senegal.

[2] www.tradeoffs.montana.edu or www.tradeoffs.nl

The landscape in the study area is dominated by a characteristic toposequence of three dominant landscape units. Plateaus, partly capped with resistant ironstone known as laterite, form the high parts of the landscape. Valleys formed by river activity are the lowlands, which are locally referred to as bas-fonds. In between these landscape units there is a gently sloping transition zone, known as the glacis. This typical landscape occurs throughout West Africa. In such a landscape with repeating high to low transitions a catena, a sequence of soils along a topographic transect, might be expected. Soil properties commonly vary with the position along a slope, which can be explained by a relationship between topography and soil forming factors. The catena concept is frequently used as a model for soil-landscape formation (Brown et al., 2004). If such a catena indeed exists in the Nioro du Rip area, it should be relatively easy to map the spatial distribution of soil organic carbon and soil texture across the landscape.

Digital soil mapping using environmental correlation is used in this study to map the SOC content and the fine fraction (FF) content of the soil along the catena. The fine fraction encompasses the silt and clay contents, characterized by a particle size of 0-50 µm. When soil-landscape and environmental processes (related to the CLORPT factors) that affect the soil spatial distribution within the landscape are understood, they can be captured in a qualitative conceptual model of catenary soil distribution. Subsequently, the conceptual model can be translated into quantitative models that map SOC and FF. Only the existing soil data were used to calibrate these models; no additional data were collected for this purpose. This was a precondition of this study. The models use prediction rules that resulted from correlating environmental variables (related to the soil-landscape processes), derived from a DEM and a remote sensing image, with the soil observations. The last but important step in the digital soil mapping process is the validation of the predictions with an independent soil data set to assess the prediction accuracy of the models. We want to know how well the developed models predict the spatial distribution of the soil organic carbon and fine fraction contents.

An additional advantage of using digital soil mapping techniques to characterize the catenary spatial distribution of soil properties is their reproducibility. Once functional relationships between soil properties and soil-landscape processes are successfully translated into quantitative prediction models, predictions can be scaled up to similar landscapes in other parts of West Africa, which would be very useful in the TOA context.

1.2.4 Research objective

The objective of this study is to develop and validate prediction models that map the soil organic carbon content and the fine fraction of the topsoil in the Nioro du Rip area, using digital soil mapping techniques that incorporate qualitative soil-landscape and environmental process knowledge within a catenary context in quantitative prediction methods. Two research questions are derived from the objective:

• How can pedological and landscape process knowledge within a catenary context be used to select relevant environmental predictors and define the model structure?
• How accurate are the resulting maps, i.e. what are the bias and accuracy of the predictions?
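The second research question can be made concrete with two summary statistics of the prediction errors at independent validation sites: the mean error as a measure of bias and the mean squared prediction error as a measure of accuracy. The sketch below shows how these could be computed. The sign convention (predicted minus observed), the function names and the four data values are assumptions for illustration; they are not results of this study.

```python
def prediction_errors(predicted, observed):
    """Prediction error per validation site, taken here as predicted minus observed."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return [p - o for p, o in zip(predicted, observed)]


def mean_error(errors):
    """Bias: a value far from zero indicates systematic over- or underprediction."""
    return sum(errors) / len(errors)


def mean_squared_prediction_error(errors):
    """Accuracy: average squared error, penalizing large deviations."""
    return sum(e * e for e in errors) / len(errors)


# Hypothetical SOC contents (%) at four independent validation sites.
observed = [0.40, 0.55, 0.30, 0.62]
predicted = [0.50, 0.60, 0.45, 0.70]

errors = prediction_errors(predicted, observed)
bias = mean_error(errors)
mspe = mean_squared_prediction_error(errors)
```

With this sign convention a positive mean error means that the model overpredicts on average.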


2. Materials and Methods

This chapter describes the methodological steps needed to develop quantitative prediction models for soil organic carbon (SOC) and fine fraction (FF) contents of the topsoil. The first section of this chapter describes the geographical and biophysical aspects of the study area. In the second section a qualitative conceptual model of soil distribution and spatial prediction is formed on the basis of soil-landscape processes and the influence of vegetation, parent material and human activity on the soil properties of interest. Section three gives an overview of the available data and of the data that are required for modelling but not yet available. Section three also describes how these data are generated from the DEM and remote sensing image. In the fourth section two quantitative prediction models for both soil organic carbon and fine fraction are calibrated according to the qualitative conceptual model framework. The fifth and last section of this chapter is dedicated to the validation of the models. First the soil sampling strategy to collect validation observations is described, followed by the selection method of the sample points, the sampling procedure and the statistical inference used for validation.

The methodology is captured in a flow diagram depicted in Fig. 1. The diagram shows that the methodological framework consists of five steps. The steps are numbered according to the chapter section where they are discussed.

2.1 The study area

2.1.1 Location and climate

The study area lies in the south-west of Senegal close to the Gambian border between 13°45’-13°59’ north and 15°41’-15°59’ west, just north of the Nioro du Rip community (Fig. 2). It is part of the peanut basin that stretches through central Senegal between the Gambian and Mauritanian borders. The study area measures roughly 816 km². The year is divided into a dry season from November until May and a wet season from June until October. Rainfall is erratic, varying from 450 to 1300 mm per year with a long-term average of 750 mm. Over the last twenty years the annual rainfall in Senegal has shown a decreasing trend, which can be interpreted as an expansion of the Sahel and Sahara to the west and south. The average annual temperature is 27.5°C with a maximum temperature of 38°C and a minimum temperature of 15°C. The maximum elevation is 43 meters above sea level.
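As a rough plausibility check on the stated extent, the area of the rectangle spanned by the corner coordinates can be approximated on a sphere. The sketch below assumes a spherical Earth with a mean radius of 6371 km and treats the small box as planar at its mid-latitude; both the function names and these simplifications are assumptions for illustration. The result lands within a few percent of the roughly 816 km² quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def dm_to_degrees(degrees, minutes):
    """Convert degrees and arc minutes to decimal degrees."""
    return degrees + minutes / 60.0


def bounding_box_area_km2(lat_south, lat_north, lon_west, lon_east):
    """Approximate area of a small latitude/longitude box, evaluated at its mid-latitude."""
    mid_lat = math.radians((lat_south + lat_north) / 2.0)
    height_km = math.radians(lat_north - lat_south) * EARTH_RADIUS_KM
    width_km = math.radians(lon_east - lon_west) * EARTH_RADIUS_KM * math.cos(mid_lat)
    return height_km * width_km


# Corner coordinates from the text; western longitudes are negative.
area_km2 = bounding_box_area_km2(
    lat_south=dm_to_degrees(13, 45),
    lat_north=dm_to_degrees(13, 59),
    lon_west=-dm_to_degrees(15, 59),
    lon_east=-dm_to_degrees(15, 41),
)
```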

2.1.2 Geology and geomorphology

The large African Precambrian shield forms the foundation of the geological structure in the study area. During the late Tertiary, vast amounts of sediments accumulated and hardened into sandstones that contain small amounts of clay. These sandstones now form the surface rocks of the area. The sandstones are heterogeneous with regard to grain size, color and thickness. Lenses of sand, kaolinitic clay banks and layers of iron-rich fine pebbles occur. During humid periods in the Quaternary, highly weathered saprolite with quartz-rich clay, called plinthite, was formed. This plinthite hardened irreversibly to ironstone during a subsequent drier era (Driessen and Dudal, 1991). While the basin eroded, the plateaus, protected by ironstone caps, became the higher parts in the landscape (Fig. 3). The long, faint footslopes (< 1%) of the plateaus are referred to as ‘glacis de raccordement’ (Fig. 4). The glacis, which can be several kilometers in length, continue towards the low flat areas, known as the ‘bas-fonds’ (Fig. 5). They consist of eroded sandstone and plinthite. The transition area between the glacis and the bas-fond is referred to as the colluvo-alluvial terraces and is marked by a ‘bend’ in the slope. Along the rivers, narrow, discontinuous river terraces and recent alluvial deposits can be found, although identification in the field is often difficult. The slope from the colluvo-alluvial terraces to the river beds is relatively steep, often more than 10% (Meerkerk, 2003). Fig. 6 gives a schematic representation of this typical West African landscape.


[Figure 1: flow diagram. Stages, top to bottom: conceptual model design (1.2); preprocessing source data (1.3); generating model input (1.3); building prediction models (1.4); model validation (1.5). Source data: raw DEM, raw Landsat 7 ETM+ image, soil data set. Model output: soil organic carbon maps 1 and 2, fine fraction maps 1 and 2.]

Figure 1. Flow diagram that represents the methodological framework of this study. Vertically striped boxes represent the available source data, diagonally striped boxes represent the input of the prediction models and the checkered boxes represent the model output.


Figure 2. Location of the study area in Senegal, given by the black rectangle in the right part of the figure.

Figure 3. The laterite capped plateau.

Figure 4. The very gently sloping glacis during the dry season.


Figure 5. The bas-fond during the dry season. Here with shrub vegetation and a very dense soil. Other parts of the bas-fond are bare and used for agricultural practice.

Figure 6. The characteristic toposequence of West African landscapes (modified after Manlay et al., 2002a).

2.1.3 Soils and land use

Soils in the Nioro area are formed in the weathered materials of the ironstone caps and the underlying sandstone. They have an in-situ (plateau), colluvial (glacis) or alluvial (terraces and bas-fond) origin. The red soils on the plateaus are in general stony and shallow. Laterite pebbles are often found on the surface. On the glacis, colluvo-alluvial terraces and bas-fonds the soils are deep. The color of the topsoil ranges from red brown (7.5YR) on the upper and mid glacis to beige brown on the lower glacis, the terraces and the bas-fonds. Ironstone can be present at some depth or at the surface of the glacis soils. Clay content increases with depth and most soils have a clay illuviation horizon (argic B). The soils on the plateaus and glacis are highly weathered and leached and have a high iron content, which explains the reddish soil color. The soils of the bas-fonds are chemically richer due to seasonal flooding. Some of the bas-fond soils show vertic properties during the dry season.

According to the FAO classification system the majority of the soils are classified as ferric or plinthic lixisols (plinthoxeralfs) and leptosols (ustorthents) on the plateau, haplic lixisols (haploxeralfs) on the glacis and haplic gleysols (endoaquept) and cambisols (ustepts) in the bas-fonds (Meerkerk, 2003).


Extensive agriculture dominates the land use in the study area. The dominant crop is groundnut, which is rotated with millet, sorghum and maize. Other crops grown in the area are cotton, tomato, niebe, kandia, manioc, cabbage, djiakhatu and sesame. The crops are planted from July to August and harvested in October and November (maize and cereals) or November and December (groundnut). After the harvest the fields are cleared. Groundnut leftovers have a high value as cattle fodder. The sale of the leftovers is often a more important source of income for the farmers than the sale of the groundnuts. Other crop residues are used for construction or are burnt. The removal or burning of crop residues removes valuable carbon from the soil (Niang, 2004).

The glacis and the colluvo-alluvial terraces are the most intensively used part of the landscape for agricultural purposes. Farm management practices such as the removal or burning of crop residues affect the soils on the glacis on a larger scale than the soils in the other landforms. Agricultural possibilities on the plateaus are limited because of the presence of laterite. Crops are only grown in the internal zones of the plateaus where the soil is deep enough. The rocky parts of the plateau are not cultivated and are covered with forest or savanna. These parts of the plateaus are used for grazing. The bas-fonds are partly under cultivation but their agricultural significance is smaller than that of the glacis. Workability is limited because of the presence of heavy clay soils, and inundation during the wet season is common.

Patches with natural vegetation in the study area can roughly be divided into shrub dominated and grass dominated patches. Shrub dominated areas consist of shrub vegetation, ranging from 1 to 2.5 meters high, with an undergrowth of grasses. Shrub density varies between patches. Shrub dominated areas occur predominantly in the bas-fonds. The grass dominated patches consist of long grasses with some scattered shrubs; they are found in the bas-fonds but also in the higher parts of the landscape. The large baobab (Adansonia digitata) is the dominant tree species in the study area. Baobabs are found in groves around villages or scattered as solitary trees or in small clusters throughout the landscape.

2.2 Conceptual Model

This study aims to incorporate soil-landscape and environmental process knowledge (that is related to the CLORPT factors) within the catena concept in quantitative spatial prediction of soil properties. This means that first a conceptual model has to be defined that describes the soil-landscape and environmental processes and their consequences for the soil spatial distribution and spatial prediction. In subsequent steps this process knowledge has to be translated into a framework for quantitative prediction.

2.2.1 CLORPT-related processes that affect soil spatial distribution in the Nioro du Rip area

The nature and intensity of the CLORPT soil forming factors vary along the catena positions in the Nioro du Rip area, resulting in a spatial variation of soils and soil properties within the toposequence.

The CLimate factor does not have a pronounced effect on the soils in the study area. The low and subtle relief does not cause climatic contrasts within the area. Sunshine and rainfall can be regarded as homogeneously distributed throughout the area.

The influence of Organisms on the soil spatial distribution is related to the effect of vegetation and human activity in the area. Clearing of forest or woodland for agricultural land use leads to a fast decline of SOC, especially in low-input land use systems, like those in West Africa, where crop residues are removed or burned and where the only organic input consists of sporadic animal droppings. However, after the land has been left fallow and natural vegetation gets a chance to reestablish, the SOC content increases (Manlay et al., 2002b) and keeps increasing until dead biomass input and mineralization are in equilibrium, i.e. the natural situation is restored. The area immediately surrounding the villages, the so-called compound ring, is often used as a place for night corralling of cattle and as a dump site for (organic) household waste (Manlay et al., 2002a). We can therefore expect larger SOC contents in areas under natural vegetation and in the compound ring.


The most widespread impact of organisms on texture is caused by termites. Termites transport silt and clay from the subsoil to the topsoil to build their mounds. The base of the mound consists of a very dense, almost impenetrable layer of this fine soil material, which is often a few meters in diameter. There are innumerable termite mounds scattered through the landscape. In time these mounds erode and the fine material is redistributed (Dijkerman and Miedema, 1988). However, termite mounds are too small to be discerned with the available input data. Therefore their impact on the fine fraction was not taken into account in the models.

Relief causes soil erosion and sedimentation processes in the landscape that have a pronounced effect on the soil spatial distribution and are therefore often the point of focus in soil-landscape modelling studies, e.g. Schoorl (2002). Milne (1935 and 1936), who coined the term catena, suggested that soil-landscape processes including erosion-deposition are one of the specific mechanisms of catenary soil formation (as cited in Brown et al., 2004). Stancioff et al. (1984) concluded that soils throughout Senegal have a high to very high susceptibility to water and wind erosion. Pieri (1969) estimated the annual loss of topsoil caused by water erosion in the Nioro area at 30 t/ha.

Superficial run-off of water during the onset of the dry season causes erosion and subsequent sedimentation of soil material on the gently sloping glacis. Erosion predominantly affects the (slightly steeper) upper half of the glacis. The erosion and transport capacity of water increases when flowing down the glacis because of larger flow accumulation. The flowing water loses its energy in the lower parts of the landscape (lower glacis or bas-fonds) where the land surface flattens (Schoorl, 2002). The transported soil material with valuable nutrients and organic matter is deposited here. Erosion and sedimentation processes affect the spatial distribution of both SOC and FF. The fine fraction of the soil is more prone to erosion than the coarser soil material because it is more easily transported by flowing water. Soil organic matter, often bound to these fine soil particles (van Breemen and Buurman, 1998), is transported down the slope together with the soil particles. As a result of these processes we expect to find relatively small SOC and FF contents on the upper parts of the glacis and larger contents in the soils on the lower parts of the glacis and in the bas-fonds.

Dijkerman and Miedema (1988) described a similar trend in a catena in a similar landscape, although under a different climatic regime. They found colluvium from the upper parts of the landscape on the terraces in the lower parts of the landscape. They found the first signs of sedimentation on soil units roughly halfway down the slope and concluded that this was the result of colluviation combined with termite activity. Allison (1991) stated that surface erosion is at its maximum on the mid to low slope, leading to sedimentation on the lower parts of the slope. Dijkerman and Miedema (1988) found that the terraces in the lower parts of the landscape consisted predominantly of fine sand, silt and clay. The thickness of this layer increased down the slope. The soils in the upper parts of the landscape usually lacked this fine layer of sediment and were distinctly coarser. Dijkerman and Miedema (1988) suggested that the fine material, coming from eroded termite mounds, is transported down the slope to the colluvial footslopes and terraces. In the catena studied by Dijkerman and Miedema (1988) the soils on upper slopes had a much lower silt and clay content (40%) compared to the upland (plateau) soils. In the lower parts of the landscape the silt and clay content rose from 51% on the upper terraces to 90% on the lower terraces.

The plateaus form the high and relatively stable parts of the landscape. They are partly capped with laterite that often outcrops at the plateau edges. Soils on the plateaus are less prone to erosion processes. Soils are older and likely in a more advanced weathering stage compared to the glacis soils. We can therefore expect that soils on the plateau have a somewhat heavier texture than the soils on the glacis.

During the wet season water collects in the bas-fonds. They will receive sediment containing silt, clay and SOC, so contents will be larger here than on the glacis. Furthermore, the soil distribution in the bas-fond is expected to be heterogeneous. Dijkerman and Miedema (1988) found very diverse soil units in the lower part of the landscape. River activity in the past under different climatic regimes caused alternating erosion and sedimentation phases, creating a large short-scale spatial variation of soil properties.

The influence of Parent material on the spatial distribution of SOC and FF is less pronounced than the effect of relief and organisms. Basically, all soils were formed in the same sandstone material. However, this sandstone was not homogenous. Layers of clay and iron are present within the sandstone. Many of the present day soils are formed in erosion products of the sandstone. Landscape processes and transport of eroded soil material caused differentiation of the erosion products. Soils of alluvial origin are for example likely to be finer textured than soils of colluvial origin or than soils that were formed by weathering of the laterite.

The most obvious difference in the present day situation is the sharp transition of soil color between the upper and lower parts of the landscape. Soil color changes abruptly approximately halfway down the glacis. Soils in the upper part are red brown (7.5YR), which is a result of the iron rich lateritic parent material. Soils in the lower part of the landscape are beige brown (10YR) and in general somewhat finer textured. The bas-fond soils are more heterogeneous than the soils on the glacis and plateaus. Some of the bas-fond soils consist of loose, fine material. Other soils show large cracks caused by shrinking and are almost as hard as concrete when dry.

Time is the driving force behind all soil forming processes. The factor time is not a model variable. The prediction models focus on a two dimensional distribution of SOC and FF.

2.2.2 Mapping soil organic carbon and the fine fraction contents: model framework

Before we proceed to the translation of the conceptual model into quantitative models (section 2.4), we will first give an outline of the framework of the prediction models, followed by a list of the input data required by the conceptual model.

The CLORPT-related soil-landscape and environmental processes that were described in section 2.2.1 were considered the driving factors behind the spatial distribution of the SOC and FF contents in the study area. We have seen that these processes vary between the landscape units. A good starting point for the modelling of the soil spatial distribution using the catena concept is to discretize the landscape into three landscape units: plateau, glacis and bas-fond. The colluvo-alluvial terraces, the fourth landscape unit, are considered as a part of the glacis. The rationale behind this choice is that they could not be identified from the input data (see section 2.3). Furthermore, the terraces are discontinuous and hard to distinguish in the field from the glacis. The landscape unit can be considered as the first environmental variable for soil spatial prediction.

The quantitative models describe the influence of CLORPT-related processes on the soil spatial distribution with prediction rules. Each prediction rule uses an environmental variable as predictor. Each of the three landscape units will receive one or more prediction rules that model the distribution of the SOC and FF contents, depending on the number of processes that affect the spatial distribution within that unit. This forms the basic framework of the prediction models.

On the basis of the available data two different quantitative prediction models could be developed for both SOC and FF. The first model (model 1) can be considered a simple model that uses one prediction rule with an environmental variable that is related to the relief factor. The second model (model 2) can be considered a refined alternative to the first model, with which we hope to achieve a more accurate prediction.

Model 2 for SOC uses, besides the relief factor, environmental variables for the influence of the organisms factor (natural vegetation and the compound ring). Model 2 for the spatial prediction of FF uses, besides the relief factor, an environmental variable that is related to the expected texture difference between the red brown soils in the upper part of the landscape and the beige brown soils in the lower part of the landscape. This means that four models were developed in this study, which are from now on referred to as SOC-model 1, SOC-model 2, FF-model 1 and FF-model 2.


What does this mean for the required input data? The environmental variables used for the models 1 can be derived from a DEM. The DEM can be classified into the three landscape units (see section 2.3). The landscape units themselves are an environmental variable and they are used to derive the environmental variable that is related to the relief factor (see also Fig. 6).

Besides the environmental variables mentioned above, the models 2 use variables that are derived from a Landsat 7 ETM+ image of the study area (see also Fig. 6). For SOC-model 2 this means that maps with the natural vegetation and with the compound ring around the villages are needed. FF-model 2 used spectral information that was assumed to be indirectly related to the fine fraction content to refine the prediction of model 1 for the plateau and bas-fond areas according to the following concept based on expert judgment:
• There exists a sharp transition in soil color between the upper and lower part of the landscape.
• The soil color difference is caused by a difference in iron content. The red brown soils of the upper part of the landscape contain more iron than the beige brown soils of the lower parts.
• The beige soils are assumed to be finer textured than the red brown soils. Thus the red, iron rich soils are expected to have a somewhat coarser texture (smaller fine fraction content) than the beige brown soils.
• The ratio between reflection in red (Landsat band 3) and reflection in blue (Landsat band 1) gives a measure for the iron oxide content of the topsoil. Iron oxide has a high reflectance in red and an absorption in blue (Kariuki et al., 2004), thus soils with large iron oxide contents will have high 3/1 band ratio values and soils with a small iron oxide content will have small 3/1 band ratio values.
• This means that the soils with a small 3/1 band ratio value have a finer texture, i.e. a larger fine fraction content.

The values predicted by model 1 are used as input for model 2. For pixels with a high 3/1 ratio value the model 1 prediction is lowered and for pixels with a small value the predicted FF content is raised.

It is obvious that the 3/1 band ratio value can only be calculated from pixels that represent bare soil. These pixels can be classified from the Landsat 7 ETM+ image. For the bare soil pixels the 3/1 band ratio value can be calculated and subsequently interpolated to cover the complete study area. The interpolated map is then used as input for FF-model 2.
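
The band ratio step can be illustrated with a minimal sketch. It assumes that the reflectance values are held in plain nested lists and that a bare soil mask is already available; the function name `band_ratio_3_1` and all sample values are hypothetical and are not taken from the study.

```python
def band_ratio_3_1(red, blue, bare_soil):
    """Return the band 3 / band 1 ratio for bare soil pixels, None elsewhere."""
    ratio = []
    for red_row, blue_row, mask_row in zip(red, blue, bare_soil):
        out_row = []
        for r, b, is_bare in zip(red_row, blue_row, mask_row):
            # Skip pixels that are not bare soil and avoid division by zero.
            out_row.append(r / b if is_bare and b > 0 else None)
        ratio.append(out_row)
    return ratio


# Hypothetical reflectance values for a 2 x 2 window.
red = [[0.30, 0.20], [0.24, 0.10]]
blue = [[0.10, 0.10], [0.12, 0.08]]
bare = [[True, True], [True, False]]
ratio = band_ratio_3_1(red, blue, bare)
```

The `None` cells mark the pixels for which no ratio is available; these are the gaps that the subsequent interpolation has to fill.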

To summarize, the data required for soil spatial prediction modelling are (see also Fig. 6):
• soil data for calibration,
• a DEM derived landscape unit map, from which the environmental variable for the relief factor can also be derived,
• a Landsat ETM+ image from which the environmental variables for the organisms factor (natural vegetation and compound ring) and the parent material factor (bare soil map for the calculation of the TM3/TM1 band ratio) can be derived.

A soil data set, a DEM and two Landsat ETM+ images were available; all other data had to be generated.

2.3 Data description, preprocessing and generation

2.3.1 Soil data

The soil data used to calibrate the prediction models were acquired by Niang (2004) in April 2004. This data set contains 40 point observations taken along seven transects in the Paoskoto, and Djiguimar communities in the Nioro du Rip department. Soil samples were taken at each point and nine site properties were described: community, transect number, parcel number (location of sample point within the transect), latitude and longitude (in degrees-decimal minutes, WGS84 datum), landscape unit (four types), crop rotation (nine types), land use (seven types) and management practice (four types). One soil sample was taken for each ten centimeters of soil. Sampling depth ranged from 20 to 40 centimeters, which means that two to four samples were taken at each point. Five soil physical and chemical properties were measured at the laboratory of ISRA (the Senegalese agricultural research institute) in Saint Louis, Senegal. These properties were: soil organic carbon (%), silt content (%), clay content (%), pH and bulk density (g/cm³).

Soil data used in this study consisted of the SOC and texture data for the first 20 centimeters of the topsoil. Two preprocessing steps were necessary before the data could be used: (1) the silt and clay fractions were added together to form the fine soil fraction, characterized by a particle size between 0 and 50 µm, and (2) the SOC and FF contents for the 0-10 centimeter topsoil layer and the 10-20 centimeter topsoil layer were averaged to give a measure of the contents for the first 20 centimeters of the topsoil.
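
The two preprocessing steps amount to a sum and an unweighted mean. A minimal sketch is given below; the function names and the sample values are hypothetical, and the unweighted mean assumes that both layers are equally thick (10 centimeters each), as described above.

```python
def fine_fraction(silt_pct, clay_pct):
    """Fine fraction (0-50 µm) as the sum of the silt and clay contents (%)."""
    return silt_pct + clay_pct


def topsoil_mean(layer_0_10, layer_10_20):
    """Unweighted mean of the two 10 cm layers, representing the 0-20 cm topsoil."""
    return (layer_0_10 + layer_10_20) / 2.0


# Hypothetical laboratory values for a single sample point.
ff_0_10 = fine_fraction(silt_pct=5.0, clay_pct=6.0)
ff_10_20 = fine_fraction(silt_pct=6.0, clay_pct=9.0)
ff_topsoil = topsoil_mean(ff_0_10, ff_10_20)
soc_topsoil = topsoil_mean(0.50, 0.25)
```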

Five summary statistics were calculated for the complete data set and for each landscape unit (Tables 1 and 2) using SPSS version 11.5 (SPSS for Windows, Rel. 11.5, 2001).

Table 1. Summary statistics for the soil organic carbon content (%).

Landscape unit   n    Mean ± S.E.    Std. Deviation   Variance   Minimum   Maximum
Total            40   0.43 ± 0.03    0.16             0.03       0.24      1.15
Plateau          3    0.50 ± 0.08    0.13             0.02       0.34      0.58
Glacis           19   0.37 ± 0.02    0.08             0.01       0.27      0.63
Bas-fond         5    0.41 ± 0.06    0.14             0.02       0.24      0.61
Terrace          13   0.52 ± 0.06    0.13             0.02       0.24      1.15

Table 2. Summary statistics for the fine fraction content (%).

Landscape unit   n    Mean ± S.E.     Std. Deviation   Variance   Minimum   Maximum
Total            40   12.90 ± 0.51    3.23             10.41      6.19      19.13
Plateau          3    13.48 ± 3.28    5.69             32.35      7.75      19.13
Glacis           19   12.92 ± 0.71    3.09             9.55       6.88      19.00
Bas-fond         5    10.56 ± 1.52    3.39             11.51      6.19      14.75
Terrace          13   13.62 ± 0.75    2.72             7.37       7.81      17.94
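
The relation between the reported variances and standard errors can be checked with a short sketch. It assumes the usual definition of the standard error of the mean, S.E. = sqrt(variance / n); the text does not state which formula SPSS was asked to apply, so this is an assumption. The values of n and the variances are copied from Table 2, which is used here because it carries more significant digits than Table 1.

```python
import math


def standard_error(variance, n):
    """Standard error of the mean from the sample variance and sample size."""
    return math.sqrt(variance / n)


# (n, variance) per landscape unit, as reported in Table 2.
table_2 = {
    "Total": (40, 10.41),
    "Plateau": (3, 32.35),
    "Glacis": (19, 9.55),
    "Bas-fond": (5, 11.51),
    "Terrace": (13, 7.37),
}
se = {unit: round(standard_error(var, n), 2) for unit, (n, var) in table_2.items()}
```

The small plateau sample (n = 3) explains why its standard error is several times larger than that of the other units.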

It is evident that the SOC content in the topsoil is small for the whole study area, on average approximately 0.43%. Furthermore, the landscape unit means differ from the area mean. Absolute differences in SOC content between landscape units range from 0.4% to 0.15%. Relative differences between landscape units range from 5% to almost 30%. Although the SOC values and the absolute differences are small (relative differences can be large), the TOA model is sensitive to these differences. Agricultural production increases significantly with an increase in SOC of only 0.1%. Physical soil properties such as water holding capacity and water availability for crops already improve with a small rise in SOC content. Differences in the fine fraction content between the landscape units are less pronounced. The topsoil contains on average 12.9% silt and clay, which means that the soils can be classified as sand to loamy sand. This is not surprising given the sandstone parent material. The standard deviations and ranges are large for both soil properties, even within landscape units. This will complicate spatial prediction of the SOC and FF contents with the chosen conceptual model.

Analysis of variance (ANOVA) was used to test for significant effects of land use, management practice and landscape unit on the SOC and FF contents. Statistically significant relationships, i.e. relationships in which some of the variance present is explained by an environmental variable, can be incorporated in the prediction model. The ANOVAs showed no significant effect of land use or management practice on the soil properties. Landscape unit had an effect on SOC content at the 90% confidence level (F(3,36) = 2.65, p<0.1). However, this result must be interpreted with care. Statistical significance does not automatically imply a practically relevant difference, although this might be the case for the SOC content. Furthermore, only a limited number of soil observations was available, so it can be questioned whether the relationships found are really statistically significant. This does not mean, however, that these findings cannot be taken into account when building the prediction models.
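
For readers unfamiliar with the test, the F statistic of a standard one-way ANOVA can be sketched as follows. The function name and the toy observations are hypothetical; the study itself used SPSS, and the original soil observations are not reproduced here. Only the degrees of freedom at the end follow from the text (four landscape units, 40 observations).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of observation lists."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical observations for three groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)

# Degrees of freedom for four landscape units and 40 observations.
df_study = (4 - 1, 40 - 4)
```

The pair `df_study` reproduces the degrees of freedom reported with the F value above.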

2.3.2 Remote sensing imagery

Preprocessing. Two Landsat 7 ETM+ images acquired on 5 November 2002 and 13 March 2003 were available for this study, representing the wet and the dry season respectively, with a spatial resolution of 30 meters. Both images consisted of six reflective bands and one thermal band. The 2003 image also contained the panchromatic band. In this study only the six non-thermal bands were used. Preprocessing involved georeferencing, image matching and the transformation of DN-values to planetary reflectance. All preprocessing steps were carried out with the software package ERDAS IMAGINE 8.7 (Erdas Imagine, Version 8.7, 2003).

The November 2002 image was georeferenced to the UTM zone 28 projection (WGS84 datum). The four corner coordinates contained in the metadata file were used as reference points. A first degree polynomial transformation was applied and the nearest neighbor algorithm was used for resampling. The March 2003 image was already georeferenced. After this first preprocessing phase a subset, encompassing the study area, was taken from the images. Although both images were georeferenced to the same projection, a discrepancy of approximately 200 meters existed between the two images. Using GPS field measurements of points that were easily recognizable on the images, it was observed that the 2002 image was more accurately projected. The 2003 image was matched with the 2002 image to ensure an exact overlay of the grid cells. The last preprocessing step involved the conversion from DN-values to planetary reflectance. The sensor gain and offset settings and conversion equations, as supplied by the European Space Agency (ESA, see footnote 3), were used. Atmospheric correction was not applied because the necessary atmospheric data were lacking. This was, however, not a problem because this study did not involve a multi-temporal analysis.
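
The conversion equations themselves are not reproduced in this report. The sketch below uses the commonly published two-step form (DN to at-sensor radiance, radiance to top-of-atmosphere reflectance) and may differ in detail from the ESA documentation. All parameter values are placeholders chosen for illustration; they are not the gain, offset or solar irradiance values used in the study.

```python
import math


def dn_to_radiance(dn, gain, offset):
    """At-sensor spectral radiance from a digital number (linear rescaling)."""
    return gain * dn + offset


def radiance_to_reflectance(radiance, esun, sun_elevation_deg, earth_sun_dist_au):
    """Planetary (top-of-atmosphere) reflectance, unitless."""
    solar_zenith = math.radians(90.0 - sun_elevation_deg)
    return (math.pi * radiance * earth_sun_dist_au ** 2) / (esun * math.cos(solar_zenith))


# Placeholder values for a single pixel in a single band.
radiance = dn_to_radiance(dn=100, gain=0.6, offset=-5.0)
reflectance = radiance_to_reflectance(
    radiance, esun=1500.0, sun_elevation_deg=50.0, earth_sun_dist_au=1.0
)
```

Because no atmospheric correction is applied, the result is a planetary reflectance and not a surface reflectance.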

Landsat ETM+ classification. Data necessary to refine the prediction models 1 were extracted from the March 2003 Landsat ETM+ image because differences between natural vegetation and bare soil are more pronounced during the dry season. SOC-model 2 required natural vegetation areas and the compound ring around villages as environmental variables, while bare soil areas were necessary to calculate the variable that was used to refine FF-model 1.

Classification was implemented in ARC/INFO 9 (ArcInfo, version 9, 2004). For natural vegetation and bare soil classification, decision-rules with threshold values for the tasseled cap indices, principal components and image composites were used (see the paragraph "Discriminating between natural vegetation and bare soil" below). During an iterative process the decision-rule and threshold settings were changed. The results of each setting were visually interpreted until a setting was found that gave the best selection of the bare soil and natural vegetation areas. For the pixels classified as bare soil the 3/1 band ratio value was calculated (see section 2.2.2). The result was kriged in VESPER version 1.6. The resulting map was used as input for FF-model 2.

Land use observations were used to assess the natural vegetation classification accuracy. Bare soil areas were also recorded during the field survey. However, these observations could not be used to assess the classification accuracy of the bare soil areas because of seasonal variation in bare soil areas due to crop rotation.

Discriminating between natural vegetation and bare soil. Several spectral enhancement techniques were used to support the identification of natural vegetation and bare soil pixels. These methods include principal component analysis (PCA), tasseled cap transformation (TCT), false color composites and band ratio images. All spectral enhancement processing was done with the software package ERDAS IMAGINE version 8.7.

3 http://earth.esa.int/pub/ESA_DOC/landsat_FAQ/

• Principal component analysis. A PCA aims at reducing data redundancy in an image that is caused by strong correlation between the bands (Lillesand and Kiefer, 2000). It comprises a linear transformation of the original spectral bands into principal components (PCs). Each PC explains a part of the spectral variance present in an image. The first two PCs usually explain more than 90% of the total variance present in an image. In this study all bands except band 6 were used as input for the PCA.

• Tasseled cap transformation. Like PCA, TCT involves a linear transformation of the original spectral bands. It is used to enhance discrimination between soils and vegetation, the two properties of interest. This is done through the calculation of three indices: soil brightness, greenness, and soil and canopy moisture (Kariuki et al., 2004). These indices were visualized as a color composite in red, green and blue respectively.

• Band ratios and false color composites. Band ratio images were generated of the normalized difference vegetation index (NDVI) and for the ETM+ bands 5/7, 5/4, 4/3 and 3/1. High NDVI and 4/3 values (bright pixels) represent a high vegetation density. These pixels were mainly found in the bas-fonds. The band ratios 5/7, 3/1 and 5/4 give information about clay mineral (hydroxyl) content, iron oxide content and the difference between iron oxide dominance and hydroxyl respectively (Kariuki et al., 2004). High hydroxyl and iron oxide contents give bright pixels because of high reflectance in bands 5 and 3 respectively, and strong absorption in bands 7 and 1 respectively (Kariuki et al., 2004). A false color composite of band ratios 4/3, 5/4 and 5/7 (RGB) was generated. In this composite vegetated areas appear red and magenta, areas with high iron oxide content green (related to bare soil areas) and areas with strong presence of hydroxyls blue (Kariuki et al., 2004). Besides this ratio composite, 7,4,2 (RGB) and 4,3,2 (RGB) composites were obtained.
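
The indices named in the last item can be written down in a few lines. This is a minimal sketch for a single pixel; the reflectance values are hypothetical, and the NDVI is taken in its usual form for ETM+ data, (band 4 - band 3) / (band 4 + band 3).

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from near infrared and red reflectance."""
    return (nir - red) / (nir + red)


def band_ratio(numerator, denominator):
    """Simple ratio of two band reflectances."""
    return numerator / denominator


# Hypothetical reflectance per ETM+ band number for one pixel.
pixel = {1: 0.08, 3: 0.16, 4: 0.32, 5: 0.30, 7: 0.20}
indices = {
    "NDVI": ndvi(pixel[4], pixel[3]),
    "4/3": band_ratio(pixel[4], pixel[3]),   # vegetation density
    "3/1": band_ratio(pixel[3], pixel[1]),   # iron oxide
    "5/7": band_ratio(pixel[5], pixel[7]),   # clay minerals (hydroxyl)
    "5/4": band_ratio(pixel[5], pixel[4]),
}
```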

Spectral signatures and spatial patterns of the principal components, tasseled cap indices and composite images were interpreted and compared with each other. By relating spatial patterns with spectral signatures, combined with image comparison it was possible to identify bare soil and natural vegetation areas in the study area.

Detecting villages and roads. Various false color composite images showed the villages and roads very clearly, which made manual delineation easy. The delineated villages and roads were converted to a grid. Subsequently ARC/INFO was used to create a 150 meter wide buffer zone around the villages. The buffer zone represented the compound ring. The resulting compound ring map was used as environmental variable in SOC-model 2. Another function of this map was to mask the villages and roads from the map with the natural vegetation and bare soil classification result and from the SOC and FF maps.
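
On a 30 meter grid a 150 meter buffer corresponds to five cell widths. The sketch below illustrates the idea with a brute-force distance search on a small grid. It is an assumption-laden illustration and not the ARC/INFO procedure, which may measure distances differently (for example from cell edges instead of cell centres); the function name and the sample grid are hypothetical.

```python
import math

CELL_SIZE_M = 30.0
BUFFER_M = 150.0


def compound_ring(village, cell_size=CELL_SIZE_M, buffer_m=BUFFER_M):
    """Mark non-village cells whose centre lies within buffer_m of a village cell."""
    rows, cols = len(village), len(village[0])
    village_cells = [(r, c) for r in range(rows) for c in range(cols) if village[r][c]]
    ring = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if village[r][c]:
                continue  # the village itself is not part of the ring
            for vr, vc in village_cells:
                if math.hypot(r - vr, c - vc) * cell_size <= buffer_m:
                    ring[r][c] = True
                    break
    return ring


# Hypothetical strip of eight cells with one village cell at the left edge.
village = [[True] + [False] * 7]
ring = compound_ring(village)
```

In this strip the five cells next to the village fall inside the ring and the two outermost cells fall outside it.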

2.3.3 Digital elevation model

Preprocessing. The digital elevation model (DEM) used in this study was acquired by radar interferometry and had a grid cell size of 50 meters and a radiometric (vertical) resolution of one meter. The DEM was georeferenced to the UTM zone 28 projection with WGS84 datum and resampled to a grid cell size of 30 meters, equal to that of the Landsat ETM+ images, using nearest neighbor interpolation. Resampling was necessary for the cell-by-cell based model operations. Subsequently the 30 meter DEM was matched with the Landsat images so that there was an exact grid cell overlap. These first preprocessing steps were done with ERDAS IMAGINE Version 8.7. The next preprocessing phase was carried out with ARC/INFO 9.


The DEM showed an irregular pattern with a very large short-scale variability, which does not correspond with the field situation. The landscape surface is relatively smooth with little short-distance variation in height. It was therefore assumed that the DEM irregularities were the result of rounding errors related to the acquisition method. To smooth these irregularities a focal median filter with a 13x13 neighborhood was used. Fig. 7 shows a detail of the DEM before and after smoothing.
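
A focal median filter replaces each cell by the median of the cells in a moving window. The sketch below shows the principle on a small grid with a 3x3 window; the study used a 13x13 window in ARC/INFO. The function name, the sample grid and the handling of the grid edges (the window is simply clipped) are assumptions made for illustration.

```python
import statistics


def focal_median(grid, size):
    """Replace each cell by the median of its size x size neighbourhood (clipped at edges)."""
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out_row.append(statistics.median(window))
        out.append(out_row)
    return out


# Hypothetical 3 x 3 elevation patch (meters) with a single spike in the centre.
dem = [[30, 30, 30], [30, 35, 30], [31, 31, 31]]
smoothed = focal_median(dem, size=3)
```

The spike of 35 meters in the centre cell is replaced by the neighbourhood median, while the general elevation level is preserved.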

Figure 7. Two details of the digital elevation model of the study area showing the original image (left) and the same area after smoothing (right).

During the next preprocessing step two areas were clipped from the smoothed DEM: the study area and the so-called hydrology area. This latter area is slightly larger than the study area and was used to derive the terrain attributes. The reason for this was to avoid edge effects when deriving hydrological parameters. The next step was the creation of a depressionless DEM. Sinks were identified and filled, using standard ARC/INFO algorithms, to ensure undisturbed water flow necessary for flow direction and flow accumulation calculation. The terrain attributes slope, flow accumulation and local minimum and maximum elevations within a given radius were derived from the depressionless DEM. These terrain attribute maps and depressionless DEM were used for landscape unit mapping.

The left part of Fig. 8 shows the slope map as derived from the DEM. Many of the slopes depicted in this map are not real slopes but can be regarded as elevation contour lines: at the point where the elevation changes by one meter, a short, steep slope is given. These artefacts (an example is indicated by the dashed arrow in Fig. 8) are caused by the coarse (integer) radiometric resolution of the DEM.

Figure 8. Detail of the slope map derived from the smoothed DEM (left). Note that the area presented in this figure is the same area as in Fig. 7. The shade of gray indicated with the arrow represents flat areas. The dashed arrow indicates a slope, which follows exactly the contour of the DEM (compare with Fig. 7). The right part of this figure shows the slope map after application of the 9x9 focal median filter.


The very gradual elevation change in large parts of the landscape cannot be represented in the DEM because of its one meter vertical resolution. Instead of a gradual change, the DEM gives an abrupt change of one meter, causing a short, steep slope at this point (Fig. 9). To remove these artefacts the slope map was smoothed with a focal median filter with a 9x9 neighborhood. The result of this smoothing is depicted in the right part of Fig. 8. The artefacts are gone. The resulting map complies much better with the field situation, where gently sloping areas (glacis) alternate with relatively flat areas (the plateaus and bas-fonds).
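
The size of this artefact can be illustrated with a small calculation. It assumes a uniform elevation change of one meter over a horizontal distance of 1000 meters and a simple two-cell finite difference on the 30 meter grid; GIS slope algorithms use a 3x3 window and will give somewhat different values. The function name is hypothetical.

```python
import math

CELL_SIZE_M = 30.0


def slope_degrees(z_a, z_b, distance_m=CELL_SIZE_M):
    """Slope angle between two elevations separated by distance_m."""
    return math.degrees(math.atan(abs(z_a - z_b) / distance_m))


# Field situation: a gradual 1 m drop spread over 1000 m.
true_slope = slope_degrees(1.0, 0.0, distance_m=1000.0)

# Integer DEM: the whole meter appears between two adjacent 30 m cells.
dem_step_slope = slope_degrees(31, 30)
```

Under these assumptions the step in the DEM yields a slope of almost two degrees where the field slope is less than a tenth of a degree, which is why the unsmoothed slope map resembles a set of contour lines.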

Figure 9. Elevation change: field situation versus DEM representation and its effect on the slope map. (Diagram labels: field situation, DEM, slope; 1 meter; 1000 meters.)

Landscape unit mapping. The catena concept was chosen as starting point for soil spatial prediction. Section 2.2.2 explained that a map with the landscape units is needed for this purpose. The landscape units serve as basic environmental variables in the prediction models. Such a map was not available and had to be generated. Discontinuous river terraces that can be found on the lower parts of the glacis were regarded as part of the glacis and not as a separate landscape unit because it was not possible to identify them with the DEM. The classification model was developed and implemented in ARC/INFO.

An important condition for the classification method was that it had to be reproducible (an important digital soil mapping characteristic). This meant a trade-off between accuracy and reproducibility. Manual delineation of the landscape units would perhaps result in a relatively accurate map. However, manual delineation is generally time-consuming. A more automated mapping approach is preferable, also with a view to the future: there are more TOA study areas in West Africa, and from a practical point of view it might not be feasible to do manual delineations on a larger scale. A method without any manual delineations or adjustments was therefore chosen.

The method applied in this study works with decision rules and threshold values, like the Landsat image classification. The DEM and terrain attribute variables serve as input. For the plateau and bas-fond classification, threshold values were assigned to the variables; e.g. for a pixel to be classified as plateau the elevation must be higher than 33 meters and the slope must be smaller than 0.5 degrees. During an iterative process the threshold settings were changed. The results of each setting were visually interpreted until a setting was found that gave the best representation of the plateau and bas-fond areas.

After this first mapping phase of the plateaus and bas-fonds, the borders of the landscape units were smoothed with a focal majority filter with a neighborhood radius of two grid cells to get rid of the classification noise that is inherent to the method used. This noise was created by pixels that fulfill the requirements of the decision rules and threshold values but do not represent real plateaus and bas-fonds. These cells often appeared as small clusters scattered along plateau and bas-fond areas.

In the last step of the mapping process the two landscape unit grids were merged into one grid. The resulting landscape map revealed another problem: in some areas the plateaus were adjacent to bas-fonds. In reality this is never the case; there must always be a glacis in between these two landscape units. To overcome this problem another processing step was built into the classification process prior to the grid merge, in which all plateau and bas-fond cells that were within a given range from each other were excluded. After the exclusion step the plateau and bas-fond grids were merged into one grid. The landscape unit ‘glacis’ was then assigned to all non-classified grid cells. The mapping accuracy was assessed by comparing the mapping result with the landscape unit observations made during the field campaign of this study.
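The classification steps described above (thresholding, exclusion of plateau and bas-fond cells that lie close to each other, and assignment of the remaining cells to the glacis) can be sketched as follows. This is an illustrative sketch and not the ARC/INFO model of this study. Only the plateau thresholds (elevation above 33 meters, slope below 0.5 degrees) are taken from the text; the bas-fond elevation threshold `basfond_max_elev`, the exclusion range of one cell, the square search window and the function name `classify` are assumptions. The focal majority filter is omitted for brevity.

```python
PLATEAU, GLACIS, BASFOND = "P", "G", "B"

def classify(elev, slope, basfond_max_elev=12.0, exclusion_range=1):
    """Classify a DEM grid into plateau, glacis and bas-fond cells.

    basfond_max_elev and exclusion_range are hypothetical settings.
    """
    rows, cols = len(elev), len(elev[0])
    # Decision rules: plateau thresholds from the text, bas-fond threshold assumed.
    plateau = [[elev[r][c] > 33 and slope[r][c] < 0.5 for c in range(cols)]
               for r in range(rows)]
    basfond = [[elev[r][c] < basfond_max_elev and slope[r][c] < 0.5
                for c in range(cols)] for r in range(rows)]

    def near(mask, r, c):
        """True if any cell of mask lies within exclusion_range cells of (r, c)."""
        return any(mask[i][j]
                   for i in range(max(0, r - exclusion_range),
                                  min(rows, r + exclusion_range + 1))
                   for j in range(max(0, c - exclusion_range),
                                  min(cols, c + exclusion_range + 1)))

    units = []
    for r in range(rows):
        row = []
        for c in range(cols):
            if plateau[r][c] and not near(basfond, r, c):
                row.append(PLATEAU)
            elif basfond[r][c] and not near(plateau, r, c):
                row.append(BASFOND)
            else:
                row.append(GLACIS)  # all non-classified cells become glacis
        units.append(row)
    return units

# Toy transect from plateau to bas-fond (hypothetical values).
transect = classify([[35, 34, 20, 10, 9]], [[0.1, 0.1, 2.0, 0.2, 0.1]])
# A plateau cell directly next to a bas-fond cell: both are excluded.
adjacent = classify([[35, 10]], [[0.1, 0.1]])
```

In the second example both cells fall back to the glacis class, which mirrors the rule that a plateau can never border a bas-fond directly.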


2.4 From concept to application: from qualitative to quantitative

The soil erosion and sedimentation processes of the CLORPT relief factor were used as a basis for prediction models 1 (section 2.2). It was assumed that these processes only affect the glacis and not the plateaus and bas-fonds. To translate the conceptual model into a structure that can be implemented, a causal relationship between soil erosion and sedimentation and the spatial distribution of the SOC and FF contents on the glacis was assumed: erosion of the topsoil lowers the SOC and FF contents, and in places where the soil material is deposited the contents will be larger.

The next development step of models 1 was, in theory, to find environmental variables (e.g. DEM-derived terrain attributes) that could explain the spatial distribution of SOC and FF of the available soil observations in the context of erosion and sedimentation. However, in practice it was not possible to explain the soil spatial distribution within each landscape unit with the available data. Only sixteen of the 40 observations of the soil data set were located in the study area. Seven of these were located on the glacis. This is too few to gain any insight into the spatial distribution of the SOC and FF contents, let alone to find any correlations with environmental variables.

To resolve this problem we assumed that the spatial distribution of the SOC and FF contents along the glacis can be modelled as a quadratic function of the position on the glacis. This function follows the erosion-sedimentation pattern on the glacis as described above. For the plateaus and bas-fonds no predictors were found that could partially explain the spatial variation of the two soil properties within these units. The most appropriate predictors were therefore the means of the soil data set. For models 2 it was possible to add some spatial variation in SOC and FF contents to the model results of the plateau and bas-fond units by using environmental variables that were derived from the Landsat image (see sections 2.2 and 2.3).

All models were based on a grid structure of square cells and use map algebra and cell-by-cell processing in the ARC/INFO environment. Soil properties were predicted on grid cell level. In the following sections the structure of the four prediction models is explained in more detail.

2.4.1 Modelling soil organic carbon: model 1

Plateaus. For this landscape unit the plateau SOC mean of the soil data set is used as predictor, which is 0.50% (see Table 1).

Glacis. The spatial distribution of the SOC on the glacis is modelled as a function of the position. For this purpose a position indicator $x$ was calculated using the landscape unit map, which expressed the standardized location (i.e. the position within the catena) of a pixel on the glacis relative to the nearest plateau pixel. The position indicator was calculated by:

$$x = \frac{d_P}{d_P + d_B}, \tag{2}$$

where $d_P$ and $d_B$ are the Euclidean distances from a glacis pixel to the nearest plateau pixel and the nearest bas-fond pixel, respectively. Pixels close to a plateau will have a value close to 0, pixels close to a bas-fond will receive a value close to 1. It was assumed that the SOC content distribution on the glacis can be described with a quadratic function:

$$\text{SOC content} = a x^2 + b x + c, \tag{3}$$

where $x$ is the standardized glacis position and $a$, $b$ and $c$ are the coefficients of the function. This function was based on the (schematic) erosion and redistribution patterns of soil material (including organic carbon) on a slope.
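The position indicator of Eq. 2 can be computed with a brute-force search for the nearest plateau and bas-fond cells. The sketch below is only meant to make the definition concrete; the function name `glacis_position`, the letter codes for the landscape units and the toy transect are assumptions, distances are expressed in grid cells, and the grid must contain at least one plateau and one bas-fond cell.

```python
from math import hypot

def glacis_position(units):
    """Return a grid with x = dP / (dP + dB) for glacis cells, None elsewhere.

    units is a grid of 'P' (plateau), 'G' (glacis) and 'B' (bas-fond) codes.
    """
    cells = [(r, c, u) for r, row in enumerate(units) for c, u in enumerate(row)]
    plateau = [(r, c) for r, c, u in cells if u == "P"]
    basfond = [(r, c) for r, c, u in cells if u == "B"]
    x = [[None] * len(row) for row in units]
    for r, c, u in cells:
        if u == "G":
            # Euclidean distance (in cells) to the nearest plateau / bas-fond cell.
            d_p = min(hypot(r - pr, c - pc) for pr, pc in plateau)
            d_b = min(hypot(r - br, c - bc) for br, bc in basfond)
            x[r][c] = d_p / (d_p + d_b)
    return x

# Toy catena: one plateau cell, three glacis cells, one bas-fond cell.
x = glacis_position([["P", "G", "G", "G", "B"]])
```

For this toy transect the three glacis cells receive the values 0.25, 0.5 and 0.75, increasing from the plateau towards the bas-fond as described in the text.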


For the seven glacis points located in the study area, the SOC content was plotted versus the glacis position. A second degree polynomial curve was fitted through these points (Fig. 10).

Figure 10. Empirical relationship between soil organic carbon content (%) and relative glacis position. The dashed line represents the quadratic function fitted through the soil observation points. The solid line represents the function that is used by the prediction model. (Horizontal axis: standardized position on the glacis, 0 to 1; vertical axis: SOC content (%), 0.00 to 0.45.)

The coefficients of the function were manually fit to keep the predicted SOC content within the physical range of the values of the soil data set. The SOC content was predicted with Eq. 4, which describes the empirical relationship between soil property and the environmental variable ‘standardized glacis position’:

$$\text{SOC content} = 0.67 x^2 - 0.70 x + 0.46. \tag{4}$$

Bas-fonds. The soil data set showed a large spread in SOC contents (0.24% - 0.61%) in the bas-fonds. With the current data it is not possible to get more insight into the spatial distribution of SOC in the bas-fonds. The bas-fond mean is used as predictor, which is 0.41% (see Table 1). The structure of the prediction model SOC 1 is presented in Fig. 11.

Figure 11. Schematic structure of the SOC-model 1. (Flowchart: plateau, glacis and bas-fond areas are selected from the landscape unit map; SOC is predicted as 0.50% for plateau areas, as $(0.67x^2 - 0.70x + 0.46)$% for glacis areas, using the standardized glacis position $x$, and as 0.41% for bas-fond areas.)
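The structure of SOC-model 1 amounts to a lookup per landscape unit, with Eq. 4 applied on the glacis. A minimal sketch, in which the function name `soc_model1` and the unit labels are chosen here for illustration:

```python
def soc_model1(unit, x=None):
    """Predict SOC content (%) for one grid cell.

    unit: 'plateau', 'glacis' or 'bas-fond' (labels chosen for this sketch).
    x: standardized glacis position (0 near a plateau, 1 near a bas-fond).
    """
    if unit == "plateau":
        return 0.50          # plateau mean of the soil data set
    if unit == "bas-fond":
        return 0.41          # bas-fond mean of the soil data set
    if unit == "glacis":
        return 0.67 * x ** 2 - 0.70 * x + 0.46   # Eq. 4
    raise ValueError(f"unknown landscape unit: {unit}")

upper_glacis = soc_model1("glacis", x=0.0)
lower_glacis = soc_model1("glacis", x=1.0)
```

At the plateau edge (x = 0) Eq. 4 returns the constant term 0.46%, and at the bas-fond edge (x = 1) it returns 0.67 - 0.70 + 0.46 = 0.43%; in between, the predicted content dips, following the assumed erosion and deposition pattern.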

2.4.2 Modelling soil organic carbon: model 2

The CLORPT organisms factor was used to improve the predictions of SOC-model 1 (see section 2.2). Natural vegetation and the compound ring were used as environmental variables in the second model. SOC-model 2 used, at grid cell level, the SOC content predicted by model 1 and adjusted this value if natural vegetation was present in the grid cell or if the grid cell was located in the compound ring. The structure of prediction model SOC 2 is presented in Fig. 12.

Figure 12. Schematic structure of the SOC-model 2. (Flowchart: plateau, glacis and bas-fond areas are selected from the landscape unit map; within each unit, natural vegetation areas and compound ring areas are selected. Plateau: SOC = 0.57% under natural vegetation, 0.85% in the compound ring, 0.50% otherwise. Glacis: SOC = $(0.67x^2 - 0.70x + 0.46)$% + 0.20% under natural vegetation, $(0.67x^2 - 0.70x + 0.46)$% + 0.50% in the compound ring, $(0.67x^2 - 0.70x + 0.46)$% otherwise. Bas-fond: SOC = 0.61% under natural vegetation, 0.91% in the compound ring, 0.41% otherwise.)

Fig. 12 shows that:
• If natural vegetation is present, 0.20% is added to the SOC content as predicted by model 1. This value is based on the difference between the average SOC content for soil data set locations under natural vegetation (0.60%) and other locations (0.40%).
• If the grid cell is located in the compound ring, 0.50% is added to the value predicted by model 1.

For the plateaus a slightly different approach was applied. Two of the three plateau locations of the soil data set were under natural vegetation. The average SOC content for these two locations was 0.57%. This value was used as predictor for plateau positions that are under natural vegetation.
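For the glacis, the two adjustment rules can be sketched as below. The sketch is restricted to glacis cells, where the rules apply directly to Eq. 4. The function name `soc_glacis_model2` is hypothetical, and the precedence of natural vegetation over the compound ring for a cell that falls in both is an assumption read from the flowchart, not a rule stated in the text.

```python
def soc_glacis_model2(x, natural_vegetation=False, compound_ring=False):
    """Predict SOC content (%) for a glacis cell with SOC-model 2.

    x: standardized glacis position (0 near a plateau, 1 near a bas-fond).
    Natural vegetation is checked before the compound ring (assumed order).
    """
    soc = 0.67 * x ** 2 - 0.70 * x + 0.46   # model 1 prediction (Eq. 4)
    if natural_vegetation:
        return soc + 0.20                   # natural vegetation adjustment
    if compound_ring:
        return soc + 0.50                   # compound ring adjustment
    return soc

base = soc_glacis_model2(0.0)
vegetated = soc_glacis_model2(0.0, natural_vegetation=True)
compound = soc_glacis_model2(0.0, compound_ring=True)
```

At the plateau edge of the glacis (x = 0) this gives 0.46% for ordinary cells, 0.66% under natural vegetation and 0.96% in the compound ring.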

2.4.3 Modelling soil texture: model 1

Plateaus. The plateaus are less influenced by erosion and sedimentation than the glacis and form the relatively stable parts of the landscape. Relatively large fine fraction contents were expected in this landscape unit. Dijkerman and Miedema (1998) found that upland (plateau) soils had a large fine fraction content (74.2%) compared to the upper parts of the glacis. The soil data set showed the same trend. The mean of the fine fraction content on the plateaus was larger than the mean of the complete data set and larger than the glacis mean. In this basic model the plateau FF mean of the soil data set (see Table 1) was used as predictor of the fine fraction of this landscape unit.

Glacis. The standardized glacis position was used as environmental variable for the glacis. For the seven glacis points located in the study area, the FF content was plotted versus the glacis position. A quadratic function was fitted through these points (Fig. 13). Again the coefficients of the function were manually fit to keep the predicted FF content within the physical range of the soil data set. The FF content was then predicted with Eq. 5, which is also plotted in Fig. 13:


$$\text{FF content} = 20.3 x^2 - 19.9 x + 15. \tag{5}$$

Figure 13. Empirical relationship between fine fraction content (%) and relative glacis position. The dashed line represents the quadratic function fitted through the soil observation points. The solid line represents the function that is used by the prediction model. (Horizontal axis: standardized position on the glacis, 0 to 1; vertical axis: fine fraction content (%), 0 to 20.)

Bas-fonds. The fine fraction content in the bas-fonds ranged from 6% to almost 15% according to the soil data set. As with the SOC content, it was hard to map these differences without additional soil data, e.g. in the form of soil maps. The mean of the soil data set (10.6%) could be used as predictor. However, considering the conceptual model it was assumed that the soils had a larger fine fraction content (see section 2.2) than the soil data set mean suggested. There are only five bas-fond observations in the soil data set. Because of the heterogeneous character of the bas-fonds and their large short-scale variation, it is possible that these samples were taken in relatively coarse areas that are not representative of the bas-fonds. Using the soil data set mean would very likely lead to an underestimation of the fine fraction content. It was therefore decided not to use the mean as predictor. Instead, the value 16% was used. This value lies within the range of the soil data set values and is greater than the values predicted for plateau and glacis positions. This is plausible considering the trends observed in the field. The structure of prediction model FF 1 is presented in Fig. 14.

Figure 14. Schematic structure of the FF-model 1. (Flowchart: plateau, glacis and bas-fond areas are selected from the landscape unit map; FF is predicted as 13.50% for plateau areas, as $(20.3x^2 - 19.9x + 15)$% for glacis areas, using the standardized glacis position $x$, and as 16.00% for bas-fond areas.)

2.4.4 Modelling soil texture: model 2

Model 1 used the mean as predictor for the FF content on the plateaus and bas-fonds. This resulted in a map that does not show any spatial variation within these units. The Landsat band ratio 3/1 was used to add some detail to the model 1 prediction by adjusting the fine fraction content based on the ratio value, as explained in section 2.2.2.

The ratio values approximately follow a normal distribution and have a minimum of 1.16 and a maximum of 1.55. The mean of the distribution was 1.347 and 95% of the values (µ ± 2σ) lay between 1.273 (µ - 2σ) and 1.421 (µ + 2σ). The FF contents of the sixteen soil data set points located within the study area were used to calibrate the assumed relationship between ratio value and FF content. However, this did not yield any results. The ratio values of the sixteen points covered only a small part of the total range: all values were between 1.36 and 1.39. Within this small range it was not possible to detect a trend that could be extrapolated to the complete range. It was therefore decided to use 1.273 and 1.421 for calibration. A linear relationship was assumed between FF adjustment factor and ratio value. An adjustment factor of +3% and -3% was assigned to 1.273 and 1.421 respectively. With these adjustment factors at least 95% of the predicted FF contents were within the range of the soil data set. This resulted in the adjustment function:

$$\text{Adjustment factor} = 40 \cdot \text{ratio value} - 53.8. \tag{6}$$

The adjustment factor was added to the model 1 prediction. For the glacis the model 1 outcome was used as prediction. The structure of this prediction model is presented in Fig. 15.
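The way FF-model 2 combines the model 1 predictions with the adjustment function can be sketched as follows. The function names `ff_adjustment` and `ff_model2` and the unit labels are assumptions of this sketch; the coefficients are those of Eq. 5 and Eq. 6 as printed. Evaluated as printed, Eq. 6 gives about -2.9 at a ratio value of 1.273 and about +3.0 at 1.421.

```python
def ff_adjustment(t):
    """Adjustment factor (%) for a TM3/TM1 ratio value t (Eq. 6 as printed)."""
    return 40.0 * t - 53.8

def ff_model2(unit, t=None, x=None):
    """Predict fine fraction content (%) for one grid cell.

    unit: 'plateau', 'glacis' or 'bas-fond' (labels chosen for this sketch).
    t: TM3/TM1 ratio value, used for plateau and bas-fond cells.
    x: standardized glacis position, used for glacis cells.
    """
    if unit == "glacis":
        return 20.3 * x ** 2 - 19.9 * x + 15        # Eq. 5, unchanged from model 1
    base = {"plateau": 13.50, "bas-fond": 16.00}[unit]
    return base + ff_adjustment(t)

low = ff_adjustment(1.273)
high = ff_adjustment(1.421)
plateau_ff = ff_model2("plateau", t=1.345)
```

A ratio value of 1.345 gives an adjustment of zero, so a plateau cell with this ratio keeps its model 1 value of 13.50%.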

Figure 15. Schematic structure of the FF-model 2. (Flowchart: plateau, glacis and bas-fond areas are selected from the landscape unit map; FF is predicted as $(20.3x^2 - 19.9x + 15)$% for glacis areas, using the standardized glacis position $x$, and as $13.50 + (40t - 53.8)$% for plateau areas and $16.00 + (40t - 53.8)$% for bas-fond areas, where $t$ is the TM3/TM1 ratio value.)

2.5 Sampling and Validation

The validation process is an important aspect of this study. The performance of the prediction models was validated with soil data collected during a fieldwork period in January and February 2005 in the Nioro du Rip area. Before the actual soil sampling in Senegal a sampling strategy had to be chosen and sampling points had to be selected. After the fieldwork period the samples were analyzed for soil organic carbon content and sand, silt and clay fractions in the laboratory of ISRA, Saint Louis in Senegal. The results were used for validation.

2.5.1 Sampling Strategy

The objective during the fieldwork period was to collect as many samples as possible within the time frame of three weeks. Cluster random sampling was chosen as a sampling strategy (De Gruijter et al., in press). This type of design was chosen for its operational efficiency. The study area is large and difficult to access. The infrastructure mainly consists of narrow, sandy roads. Cluster random sampling has the advantage that it greatly reduces travel time between sampling points, allowing more samples to be taken in the same amount of time when compared to simple random sampling.

De Gruijter et al. (in press) define clusters with three aspects: shape (point orientation), size (number of points) and direction (geographical orientation). We chose transect sampling with equidistant points. Each transect consisted of five points evenly spaced at 180 meters. Transect directions were north-south and east-west.

2.5.2 Selection of sample points for validation

Before doing the cluster selection, the study area was discretized into a finite number of possible clusters so that each grid cell could only belong to a single cluster. This was done by dividing the study area into blocks of 30 x 30 grid cells. The study area contained 1008 such blocks: 36 in east-west direction and 28 in north-south direction (upper part of Fig. 16). The blocks were numbered in west-east direction. The blocks on the first row were numbered 1-36, on the second row 37-72, etc. (see Fig. 16).

Each block consisted of exactly 180 unique clusters: 90 north-south oriented and 90 east-west oriented in a chessboard pattern as shown in the lower part of Fig. 16. The configuration of the cluster numbers is the same for each block. This meant that there were 181,440 (1008*180) unique clusters located within the study area.

In the first selection stage blocks were selected with replacement and with equal probability. Subsequently, for each block a cluster number was selected with equal probability. This two-step approach was applied for reasons of convenience. It would of course also have been possible to randomly generate 150 numbers between 0 and 181,440. However, it would then be very difficult to locate the cluster in the study area, i.e. to find the coordinates of the starting points. With the two-step approach this was relatively easy. The following example will illustrate this:

• The first cluster selected was cluster 64 in block 386.
• There are 25 blocks between the western boundary of the study area and block 386 and 10 blocks between the northern boundary of the study area and this block (upper part of Fig. 16).
• This means that, seen from the upper left grid cell of block 386, the western boundary lies 750 (25*30) grid cells, equaling 22,500 meters, westwards and the northern boundary lies 300 (10*30) grid cells, equaling 9,000 meters, northwards.
• The coordinates of the center of the upper left grid cell of the study area are (392575, 1546674). These UTM coordinates have meters as unit. The distances between the block and the western and northern boundaries can then be added to or subtracted from the coordinates of the upper left cell of the study area. The coordinates of the center of the upper left grid cell of block 386 are then (415075, 1537674), which shows an advantage of a coordinate system with meters as units.
• The next step is to determine the location of cluster 64 (see lower part of Fig. 16) within block 386.
• The clusters were sampled from south to north or from west to east. Thus the starting point of a cluster is the most southern or most western point within the cluster.
• Cluster 64 is east-west oriented, and is thus sampled eastwards. The starting point of cluster 64 is its most western point, which is located on row 11, column 4 within the block (see lower part of Fig. 16). The distance from the center of the upper left grid cell of the block to the center of the starting point of cluster 64 is 90 meters eastwards and 300 meters southwards. This means that the starting point of cluster 64 in block 386 is located at coordinates (415165, 1537374). The second point in this cluster lies 180 meters further east at (415345, 1537374), the third point at (415525, 1537374), etc.
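The two-stage selection and the coordinate arithmetic of the example above can be sketched in a few lines. The constants follow the text (30-meter grid cells, blocks of 30 x 30 cells, 36 blocks per row, 1008 blocks, 180 clusters per block, upper left cell center at (392575, 1546674)). The function names, the use of Python's `random` module and the seed are assumptions of this sketch. The starting cell of a cluster within its block has to be read from the lower part of Fig. 16, so it is passed in as an argument here.

```python
import random

CELL = 30                    # grid cell size (m)
BLOCK_CELLS = 30             # a block is 30 x 30 grid cells
BLOCKS_PER_ROW = 36          # blocks in east-west direction
N_BLOCKS = 1008
CLUSTERS_PER_BLOCK = 180
ORIGIN = (392575, 1546674)   # center of the upper left grid cell of the study area

def select_clusters(n, seed=None):
    """Stage 1: draw a block with replacement; stage 2: draw a cluster number."""
    rng = random.Random(seed)
    return [(rng.randint(1, N_BLOCKS), rng.randint(1, CLUSTERS_PER_BLOCK))
            for _ in range(n)]

def block_origin(block):
    """Center coordinates of the upper left grid cell of a block (1-based number)."""
    row, col = divmod(block - 1, BLOCKS_PER_ROW)
    size = BLOCK_CELLS * CELL
    return ORIGIN[0] + col * size, ORIGIN[1] - row * size

def cluster_points(block, start_row, start_col, direction, n_points=5, spacing=180):
    """Coordinates of the points of one transect.

    start_row, start_col: 1-based position of the starting cell within the block.
    direction: 'EW' (sampled eastwards) or 'NS' (sampled northwards).
    """
    bx, by = block_origin(block)
    x0 = bx + (start_col - 1) * CELL
    y0 = by - (start_row - 1) * CELL
    if direction == "EW":
        return [(x0 + k * spacing, y0) for k in range(n_points)]
    return [(x0, y0 + k * spacing) for k in range(n_points)]

selection = select_clusters(150, seed=2005)       # hypothetical seed
points = cluster_points(386, 11, 4, "EW")          # cluster 64 in block 386
```

For block 386 the sketch reproduces the coordinates of the worked example: the block origin is (415075, 1537674) and the first three points of cluster 64 are (415165, 1537374), (415345, 1537374) and (415525, 1537374).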

Figure 16. The upper part of the figure below represents the study area discretized into 1008 blocks of 30 x 30 pixels. The lower part of the figure shows the configuration of the sample clusters within one block. The numbers correspond to the cluster number. Each rectangle of the chessboard pattern corresponds to one grid cell. Note that there are 180 unique clusters within one block.

In total 150 clusters were selected, totalling 750 sampling points. The coordinates of the starting points were calculated and uploaded to a GPS receiver. It was unlikely that all 150 clusters would be sampled. This number of clusters was chosen to make sure that enough clusters were selected to last the fieldwork period.


2.5.3 Sampling the soil

The samples of the first 20 centimeters of the topsoil were collected during a three-week fieldwork period in January and February 2005. Before sampling it was checked whether the selected observation point belonged to the sample population. Built-up areas, roads, rock outcroppings and soils shallower than 20 cm were excluded (we model SOC and FF for the first 20 centimeters of the topsoil, so soil material has to be taken down to this depth). In addition, sample points had to be located at least 25 meters from village borders and 10 meters from sealed roads to exclude topsoils severely disturbed by human activity. If the sample point did not meet these criteria it was rejected according to the approach of De Gruijter et al. (in press). They advise deleting the entire cluster from the selection if the starting point is rejected. If another point of the cluster is rejected, only this point is removed while the rest of the cluster points are maintained.

By the end of the fieldwork period, the clusters that had been sampled had to be the first clusters in order of selection. For example, if in total 30 clusters were sampled, these clusters had to be the first 30 clusters that were selected and not clusters 20-30, 35-50 and 67-72. This did not mean that cluster 1 was sampled first and then cluster 2, 3 and so on; that would have been too time-consuming. The 150 clusters were divided into batches. The first batch contained the first 30 clusters, which was assumed to be the minimum number of clusters that could be sampled within the given timeframe. The second batch contained clusters 31-50; all other batches contained 10 clusters each. First the clusters of the first batch were sampled in the most efficient way by choosing an order that minimized travel time. After the first batch the second one was sampled. Based on the time left it was decided how many clusters of the batch could be sampled. If this was for example five, only the first five selected clusters of this batch (31-35) were sampled and not five clusters selected on the basis of travel efficiency. Working with batches ensured that the clusters were sampled in order of selection but in a way that limited the travel time.

The models predict SOC and FF contents for grid cells. One sample per grid cell would not give a representative value for this cell. Therefore, for each sample grid cell within a cluster a composite sample, consisting of five sub-samples, was taken to average out variability within a grid cell. A k-means clustering algorithm was used to optimize the sample point configuration within a grid cell (Fig. 17).

Figure 17. Optimized sample point configuration within a grid cell. (Grid cell of 30 m x 30 m; the four surrounding points lie at 12 m from the central point.)

Of course, the grid cells cannot be seen in the field. It was therefore impossible to sample exactly at the four points surrounding the central point. The 12 meters were measured with 12 steps in the general direction of the diagonal of the grid cell, which was at 45° from the cluster direction.

The soil material of the first 20 cm of the topsoils of the five points was collected in a bowl and thoroughly mixed. From the mixture a sample was taken. In each cluster one point was selected for duplicate sampling.

For each sampling point land use, landscape unit and dry and wet soil color were recorded. The samples were randomly recoded and analyzed to minimize lab analysis bias.


2.5.4 Validation procedure

The method of statistical inference is determined by the type of sample design. In the cluster random sampling design the clusters can be regarded as the primary sample units (De Gruijter et al., in press). The summary statistics of the collected data set were estimated with Eq. 7-10.

The spatial mean of the study area was estimated by the cluster mean (De Gruijter et al., in press):

$$\hat{z}_{Cl} = \frac{1}{n} \sum_{i=1}^{n} \hat{z}_i, \tag{7}$$

where $n$ is the number of clusters and $\hat{z}_i$ is the sample mean of cluster $i$. The sampling variance of the estimated mean was estimated by (De Gruijter et al., in press):

$$\hat{V}(\hat{z}_{Cl}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} (\hat{z}_i - \hat{z}_{Cl})^2. \tag{8}$$

The estimated standard error is the square root of the sampling variance (De Gruijter et al., in press):

$$\hat{S}(\hat{z}_{Cl}) = \sqrt{\hat{V}(\hat{z}_{Cl})}. \tag{9}$$

The spatial variance between the points in the study area was estimated from the sample data by (De Gruijter et al., in press):

$$\hat{S}^2(\hat{z}_{Cl}) = \frac{1}{n-1} \sum_{i=1}^{n} (\hat{z}_i - \hat{z}_{Cl})^2. \tag{10}$$
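The estimators of Eq. 7-10 translate directly into code. The following sketch uses made-up SOC values for three clusters; the function name `cluster_estimates` and the data are illustrative assumptions, not results of this study.

```python
from math import sqrt

def cluster_estimates(clusters):
    """Estimate mean, sampling variance, standard error and spatial variance.

    clusters: list of lists of observed values, one inner list per cluster.
    """
    n = len(clusters)
    cluster_means = [sum(values) / len(values) for values in clusters]
    mean = sum(cluster_means) / n                                         # Eq. 7
    spatial_var = sum((z - mean) ** 2 for z in cluster_means) / (n - 1)   # Eq. 10
    sampling_var = spatial_var / n                                        # Eq. 8
    std_error = sqrt(sampling_var)                                        # Eq. 9
    return mean, sampling_var, std_error, spatial_var

# Hypothetical SOC contents (%) for three clusters of two points each.
mean, sampling_var, std_error, spatial_var = cluster_estimates(
    [[0.2, 0.4], [0.4, 0.6], [0.6, 0.8]])
```

Note that Eq. 8 is Eq. 10 divided by the number of clusters, which is why the sketch computes the sampling variance from the spatial variance.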

For validation we were interested in the prediction errors and in the variances of the prediction errors. We used the estimators of equations (7), (8) and (9), in which the z values were substituted with the mean prediction errors or with the mean squared prediction errors. A similar approach was used by Brus and Kiestra (2002). The prediction error is the difference between predicted and observed value at a validation point:

$$\varepsilon_l = \hat{z}_l - z_l^*, \tag{11}$$

where $\hat{z}_l$ and $z_l^*$ are the predicted value and the observed value at validation point $l$, respectively. The mean prediction error of a cluster was then estimated by:

$$\mathrm{mpe} = \frac{1}{m} \sum_{j=1}^{m} \varepsilon_j, \tag{12}$$

where $m$ is the number of validation points per cluster. The mean squared prediction error of a cluster is estimated by:

$$\mathrm{mspe} = \frac{1}{m} \sum_{j=1}^{m} \varepsilon_j^2. \tag{13}$$

The model mean prediction error (MPE), which is a measure of bias of the model results, was estimated by the mean of the cluster mean prediction errors:


$$\mathrm{MPE}_{Cl} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{mpe}_i, \tag{14}$$

where $n$ is the number of clusters and $\mathrm{mpe}_i$ is the mean prediction error of cluster $i$. The model mean square prediction error (MSPE) and the model root mean square prediction error (RMSPE) were used to assess the accuracy of prediction (Hengl et al., 2004). The MSPE was estimated by the mean of the cluster mean squared prediction errors:

$$\mathrm{MSPE}_{Cl} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{mspe}_i. \tag{15}$$

The $\mathrm{RMSPE}_{Cl}$ is the square root of the $\mathrm{MSPE}_{Cl}$:

$$\mathrm{RMSPE}_{Cl} = \sqrt{\mathrm{MSPE}_{Cl}}. \tag{16}$$

The variances of the $\mathrm{MPE}_{Cl}$ and $\mathrm{MSPE}_{Cl}$ were estimated by (Brus and Kiestra, 2002):

$$\hat{V}(\mathrm{MPE}_{Cl}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} (\mathrm{mpe}_i - \mathrm{MPE}_{Cl})^2, \tag{17}$$

and

$$\hat{V}(\mathrm{MSPE}_{Cl}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} (\mathrm{mspe}_i - \mathrm{MSPE}_{Cl})^2. \tag{18}$$

The standard errors of the MPE and the MSPE were estimated by taking the square root of the variances.
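The validation statistics of Eq. 11-18 can be sketched as follows. The function name `validate`, the dictionary keys and the example data (two clusters of two validation points each) are assumptions made for illustration; they are not results of this study.

```python
from math import sqrt

def validate(clusters):
    """Compute MPE, MSPE, RMSPE and standard errors from clustered validation data.

    clusters: list of clusters, each a list of (predicted, observed) pairs.
    """
    n = len(clusters)
    mpe, mspe = [], []
    for pairs in clusters:
        errors = [pred - obs for pred, obs in pairs]             # Eq. 11
        mpe.append(sum(errors) / len(errors))                    # Eq. 12
        mspe.append(sum(e * e for e in errors) / len(errors))    # Eq. 13
    MPE = sum(mpe) / n                                           # Eq. 14
    MSPE = sum(mspe) / n                                         # Eq. 15

    def variance_of_mean(values, mean):
        """Sampling variance of a mean of cluster values (Eq. 17 and 18)."""
        return sum((v - mean) ** 2 for v in values) / (n * (n - 1))

    return {
        "MPE": MPE,
        "MSPE": MSPE,
        "RMSPE": sqrt(MSPE),                                     # Eq. 16
        "SE_MPE": sqrt(variance_of_mean(mpe, MPE)),
        "SE_MSPE": sqrt(variance_of_mean(mspe, MSPE)),
    }

# Hypothetical (predicted, observed) SOC contents (%) for two clusters.
stats = validate([[(0.5, 0.4), (0.5, 0.6)],
                  [(0.4, 0.3), (0.4, 0.3)]])
```

In this toy example the first cluster is unbiased (errors of +0.1 and -0.1) and the second is not (two errors of +0.1), so the MPE is 0.05 while both clusters have the same mean squared error of 0.01.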

The next step in the validation process was the comparison of models 1 with models 2 for each soil property. Model 2 was regarded as a refined version of model 1. It used more input data, which might result in a more accurate prediction. We therefore expected the MPE and MSPE of model 2 to be smaller than the MPE and MSPE of model 1. The differences in MPE and MSPE between the two models were estimated. A t-test for paired observations was used to test whether these differences deviated significantly from 0. When the t-statistic is larger than 2 at a 95% confidence level, it can be concluded that the differences deviate significantly from 0. At a 90% confidence level the t-statistic must be larger than approximately 1.6 (Brus and Kiestra, 2002). The statistical calculations described above have been applied to the models of both soil properties.
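A paired comparison of this kind can be sketched as below. It is assumed here, for illustration, that the pairs are formed per cluster (e.g. the mspe of model 1 and of model 2 for the same cluster) and that the variance of the mean difference is estimated in the same way as in Eq. 17 and 18. The function name `paired_t` and the example values are hypothetical.

```python
from math import sqrt

def paired_t(values_model1, values_model2):
    """t-statistic for the mean of the paired differences (model 1 minus model 2)."""
    diffs = [a - b for a, b in zip(values_model1, values_model2)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sampling variance of the mean difference, analogous to Eq. 17 and 18.
    var_mean = sum((d - mean_diff) ** 2 for d in diffs) / (n * (n - 1))
    return mean_diff / sqrt(var_mean)

# Hypothetical cluster mean squared prediction errors for both models.
t = paired_t([0.04, 0.05, 0.03, 0.06], [0.03, 0.03, 0.02, 0.03])
significant = abs(t) > 2   # rule of thumb from the text for the 95% level
```

With these made-up values the statistic exceeds 2, so the difference between the two models would be regarded as significant under the rule of thumb quoted above.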

Statistical inference of the duplicates was based on a simple random sampling design in which the measurement differences between two samples taken from the same material are statistically analyzed instead of the measured values themselves. For each duplicate the measurement difference was estimated by:

$$\hat{d}_i = \frac{1}{n}(s_{1i} - s_{2i}), \tag{19}$$

where $s_{1i}$ and $s_{2i}$ are the two measured values for sample $i$ and $n$ is the number of samples, which in the case of a duplicate is 2. The squared difference was estimated by:

$$\hat{d}_i^2 = \frac{1}{n}(s_{1i} - s_{2i})^2. \tag{20}$$


The mean measurement difference (MD) and mean square measurement difference (MSD) were estimated by:

$$\mathrm{MD} = \frac{1}{m} \sum_{i=1}^{m} \hat{d}_i, \tag{21}$$

and

$$\mathrm{MSD} = \frac{1}{m} \sum_{i=1}^{m} \hat{d}_i^2, \tag{22}$$

where $m$ is the number of duplicate samples. The root mean square measurement difference (RMSD) was calculated by taking the square root of the MSD. The variances of the MD and the MSD were estimated by:

$$\hat{V}(\mathrm{MD}) = \frac{1}{m(m-1)} \sum_{i=1}^{m} (\hat{d}_i - \mathrm{MD})^2, \tag{23}$$

and

$$\hat{V}(\mathrm{MSD}) = \frac{1}{m(m-1)} \sum_{i=1}^{m} (\hat{d}_i^2 - \mathrm{MSD})^2. \tag{24}$$

The standard errors of the MD and the MSD were estimated by taking the square root of the variances. The MD quantifies the bias of the sample analysis and should be close to zero. The MSD and RMSD give a measure of accuracy of the sample analyses. Both MD and MSD should be close to zero. Statistical inference of the duplicates was done for both soil properties.
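The duplicate statistics of Eq. 19-24 can be sketched in the same style. The function name `duplicate_stats` and the three duplicate pairs are made up for illustration; the factor 1/n with n = 2 follows Eq. 19 and 20 as given above.

```python
from math import sqrt

def duplicate_stats(pairs, n=2):
    """Compute MD, MSD, RMSD and standard errors from duplicate measurements.

    pairs: list of (s1, s2) measured values of the same sample material.
    n: number of samples per duplicate (2).
    """
    m = len(pairs)
    d = [(s1 - s2) / n for s1, s2 in pairs]            # Eq. 19
    d2 = [(s1 - s2) ** 2 / n for s1, s2 in pairs]      # Eq. 20
    MD = sum(d) / m                                    # Eq. 21
    MSD = sum(d2) / m                                  # Eq. 22
    var_md = sum((v - MD) ** 2 for v in d) / (m * (m - 1))      # Eq. 23
    var_msd = sum((v - MSD) ** 2 for v in d2) / (m * (m - 1))   # Eq. 24
    return {"MD": MD, "MSD": MSD, "RMSD": sqrt(MSD),
            "SE_MD": sqrt(var_md), "SE_MSD": sqrt(var_msd)}

# Hypothetical duplicate SOC measurements (%).
dup = duplicate_stats([(0.40, 0.44), (0.30, 0.28), (0.52, 0.50)])
```

In this example the positive and negative differences cancel, so the MD is zero (no bias), while the MSD of 0.0004 and RMSD of 0.02 still show the spread between duplicate measurements.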


3. Results

In this chapter the results of this study are described. The first section lists the results of the classification of the Landsat 7 ETM+ image and the DEM. The aim of this classification was to select natural vegetation and bare soil areas from the image with decision-rules and threshold values for principal components, tasseled cap indices and various false color composites. The selected areas were directly, in case of natural vegetation, or indirectly, in case of bare soil, used as predictor variables in the models for SOC and FF, respectively. The DEM was classified into the three major landscape units. The SOC and FF distributions were predicted as function of landscape unit. In order to predict the spatial variation within each landscape unit, a landscape unit map is required. Therefore the DEM was classified using decision-rules and threshold values for terrain attribute combinations. The second section of this chapter describes the results of the four prediction models. These results were validated with soil observations collected during a fieldwork period. In the third section a statistical description of the collected validation observations is given, followed by validation of the model results in the last section of this chapter.

3.1 Model input data

3.1.1 Landsat ETM+ classification

Natural vegetation and bare soil areas were classified from the March 2003 Landsat 7 ETM+ image. Various spectral enhancement techniques were applied to facilitate the classification, resulting in band composite, tasseled cap and principal component images. The spectral signatures and spatial spectral patterns of the images were interpreted and subsequently used for classification.

Image interpretation. The eigenmatrix of the principal component analysis is shown in Table 3. The first principal component (PC1) had positive loadings on all bands with particularly high loadings on band 5 and band 7. It was therefore interpreted as a brightness component with bare soil showing as bright pixels. Band 4 was the main contributor to PC2, which can be interpreted as the vegetation component. The first two PCs contain 95.6% of the information content of the original image, including vegetation and bare soil information. The other four PCs were therefore not used. The PCA image was visualized as a color composite with PC2 in red and PC1 in green and blue, thus appearing in cyan (Fig. 18a).

Table 3. Eigenmatrix of the PCA showing the contribution of the Landsat ETM+ bands to the principal components. Main contributions in bold.

         pc1    pc2    pc3    pc4    pc5    pc6
Band 1   0.10   0.03  -0.24  -0.53  -0.49   0.64
Band 2   0.15   0.10  -0.28   0.55   0.20  -0.74
Band 3   0.21   0.23  -0.17   0.39  -0.83   0.20
Band 4   0.24   0.86  -0.20  -0.37   0.18   0.02
Band 5   0.57   0.08   0.79   0.18   0.09  -0.02
Band 7   0.73  -0.44  -0.41  -0.31   0.01   0.01

The generated PCA, TCT, NDVI and band composite images showed a similar spatial spectral pattern. Areas that appeared dark in the NDVI image, appeared red in the TCT image (high soil brightness, Fig. 18b), cyan in the PCA image (high PC1 value, Fig. 18a) and had a relatively high reflectance in band 7, indicated by the pink colors in Fig 18c. These areas were therefore regarded as bare soil.


Vegetation appeared as bright pixels in the NDVI image. The same pixels appeared red in the PCA image (high value for PC2, Fig. 18a) and cyan and blue in the TCT image (high value for the greenness and wetness indices, Fig. 18b). In the false color 4,3,2 (R,G,B) composite these pixels were red and orange (Fig. 18d). The red pixels occurred predominantly in the bas-fonds, where shrubby vegetation dominates. The orange pixels, caused by a slightly higher reflection in band 3 and thus a higher DN value for the green band that visualizes band 3, represented yellow grasses that mainly occurred in the higher parts of the landscape. All other pixels were assumed to be arable land with some form of soil cover by crop residues in different densities. These pixels appear pale red or grayish in the PCA image, pale mint green in the 7,4,2 (R,G,B) composite and grayish purple in the 4,3,2 (R,G,B) composite.

Figure 18. Four Landsat ETM+ images showing a similar spatial pattern: (a) PCA image with PC2 shown in red, PC1 in cyan, (b) tasseled cap image showing the brightness, greenness and wetness indices (R,G,B), (c) composite image of bands 7,4 and 2 (R,G,B) and (d) composite image of bands 4,3,2 (R,G,B).

Classification. Based on the interpretation of the images, it was decided to use the PCA and TCT images for the bare soil and vegetation classification because these two images showed a sharply edged pattern that was closely related to bare soil and vegetation areas. The classification parameters (tasseled cap indices and principal components) with the assigned threshold values are shown in Table 4. The classification results are shown in Fig. 19. Note that the areas with natural vegetation are predominantly located in the bas-fonds.

Table 4. Classification parameters and the assigned threshold values.

                    Natural Vegetation    Bare Soil
Brightness index    < 56                  > 57.5
Greenness index     -                     < 11
Wetness index       > -10                 -
PC1                 < 40                  ≥ 42
PC2                 < 30                  ≤ 26.2
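The threshold rules of Table 4 can be illustrated with a minimal Python sketch. The threshold values are copied from the table; combining them with a logical AND, the function names and the "other" fallback class are assumptions made for this illustration and are not stated in the text.

```python
# Thresholds copied from Table 4. Combining them with a logical AND is an
# assumption of this sketch; function and argument names are hypothetical.

def is_natural_vegetation(brightness, wetness, pc1, pc2):
    """Natural vegetation rule (greenness has no threshold in Table 4)."""
    return brightness < 56 and wetness > -10 and pc1 < 40 and pc2 < 30


def is_bare_soil(brightness, greenness, pc1, pc2):
    """Bare soil rule (wetness has no threshold in Table 4)."""
    return brightness > 57.5 and greenness < 11 and pc1 >= 42 and pc2 <= 26.2


def classify_pixel(brightness, greenness, wetness, pc1, pc2):
    """Return the class label of one pixel; 'other' if neither rule applies."""
    if is_natural_vegetation(brightness, wetness, pc1, pc2):
        return "natural vegetation"
    if is_bare_soil(brightness, greenness, pc1, pc2):
        return "bare soil"
    return "other"
```

Because the brightness thresholds do not overlap (< 56 versus > 57.5), a pixel can never satisfy both rules at once under this reading of the table.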

Figure 19. Natural vegetation areas (left) and bare soil areas (right), depicted in black.

The accuracy of the natural vegetation classification was assessed by comparing the classification result with the land use recorded during the field survey of this study. The results of this assessment are presented in a confusion matrix (Table 5). Table 6 shows the producer’s and user’s accuracies of the natural vegetation classification.

Table 5. Confusion matrix and classification accuracies of the natural vegetation classification. The bold numbers indicate the correctly classified pixels.

                    Classification Result
Validation Set      Natural Veg.    Other Land Use    Total
Natural Veg.        21              11                32
Other Land Use      22              101               123
Total               43              112               155

Table 6. Producer’s and User’s accuracies of the natural vegetation classification.

Validation Set      Producer’s Accuracy (%)    User’s Accuracy (%)
Natural Veg.        65.6                       48.9
Other Land Use      82.1                       90.2

The producer’s accuracy is computed by dividing the number of correctly classified pixels of a class by the total number of reference pixels of that class (here the validation set), i.e. it represents the percentage of a class that is classified correctly. With almost two-thirds of the natural vegetation pixels in the validation set classified correctly, the producer’s accuracy of the classification is moderate. The user’s accuracy (reliability of the classification) is the accuracy of the classification from the user’s point of view. It indicates the percentage of pixels assigned to a class that really belong to that class. It is calculated by dividing the number of correctly classified pixels of a class by the total number of pixels classified in that class. At almost 50%, the reliability of the natural vegetation classification result is moderate.
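The two accuracy measures defined above can be computed directly from the confusion matrix. The following Python sketch uses the counts of Table 5, with rows taken as the validation classes and columns as the classification result; the function and variable names are hypothetical.

```python
# Confusion matrix from Table 5: rows = validation (reference) class,
# columns = class assigned by the classification.
CLASSES = ["Natural Veg.", "Other Land Use"]
MATRIX = [
    [21, 11],
    [22, 101],
]


def producers_accuracy(matrix, i):
    """Correctly classified pixels of class i / all reference pixels of class i."""
    return 100.0 * matrix[i][i] / sum(matrix[i])


def users_accuracy(matrix, i):
    """Correctly classified pixels of class i / all pixels classified as class i."""
    return 100.0 * matrix[i][i] / sum(row[i] for row in matrix)


for index, name in enumerate(CLASSES):
    print(name,
          round(producers_accuracy(MATRIX, index), 1),
          round(users_accuracy(MATRIX, index), 1))
```

The producer's accuracy uses the row total (21/32 for natural vegetation), the user's accuracy the column total (21/43).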

3.1.2 Landscape unit mapping

The terrain attributes used for classification were elevation, slope, flow accumulation and a parameter that expresses the deviation from the local maximum or minimum height. The height deviation is a measure of how much the surface of a plateau or bas-fond may deviate from a local maximum or minimum elevation, respectively. The local search neighborhood was defined by a circle with a radius of 2400 meters (80 grid cells). This variable was necessary to exclude pixels that would be classified as plateau or bas-fond based on the threshold values for slope and flow accumulation but that are located too low in the landscape for a plateau or too high for a bas-fond. Elevation was unsuitable for this purpose because the area is slightly tilted. Plateaus in the northwest of the study area are found between 28 and 30 meters, whereas plateaus in the northeast and in the south reach more than 40 meters. The elevation of the bas-fond bottoms varies from 27 meters in the upstream parts to 8 to 10 meters in the downstream parts. Elevation was only used to exclude all grid cells that are lower than the lowest plateau for plateau classification, or grid cells that are higher than the highest bas-fond for bas-fond classification.
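The height deviation parameter can be illustrated with a small Python sketch. It assumes that the deviation is computed as the cell elevation minus the maximum (or minimum) elevation inside a circular window, which is one plausible reading of the description above and is consistent with the signs of the thresholds in Table 7. The function names and the toy DEM are hypothetical; the study itself used a radius of 80 grid cells.

```python
def circular_offsets(radius):
    """Row/column offsets of all cells within `radius` grid cells of the center."""
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]


def height_deviation(dem, radius, extreme=max):
    """Elevation minus the local extreme (max or min) in a circular window.

    With extreme=max the result is <= 0 (distance below the local maximum);
    with extreme=min it is >= 0 (height above the local minimum).
    Cells outside the grid are ignored.
    """
    n_rows, n_cols = len(dem), len(dem[0])
    offsets = circular_offsets(radius)
    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            window = [dem[r + dr][c + dc]
                      for dr, dc in offsets
                      if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]
            row.append(dem[r][c] - extreme(window))
        result.append(row)
    return result


# Toy 3 x 3 DEM (meters), sloping from the upper left to the lower right.
toy_dem = [[30, 29, 25],
           [28, 27, 20],
           [26, 22, 18]]
below_local_max = height_deviation(toy_dem, radius=1, extreme=max)
above_local_min = height_deviation(toy_dem, radius=1, extreme=min)
```

Under this sign convention a plateau candidate is a cell whose deviation from the local maximum is close to zero, and a bas-fond candidate is a cell whose height above the local minimum is small.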

The threshold values used for the landscape classification are shown in Table 7. A grid cell had to fulfill the threshold requirements of all terrain attributes to be classified as plateau. Classification of the bas-fonds differed slightly from the classification of the plateau. Classification of the bas-fonds took three steps.

First, all grid cells were selected that fulfilled the requirements for slope, elevation and flow accumulation. This resulted in a map that showed small flow routes, or a stream network, at most two grid cells wide in the center of the bas-fonds. Because bas-fonds are much wider than a few grid cells, a buffer of 180 meters, based on visual interpretation, was laid around these routes. Then all grid cells that met the local minimum threshold value were selected. These cells were merged with the buffered stream network cells, which resulted in the bas-fond map.

Table 7. Classification parameters and the assigned threshold values.

                     Plateau    Bas-fond
Slope                < 0.5      < 1
Elevation            > 27       ≤ 33
Flow accumulation    < 10       > 1600
Local maximum        ≥ -4       -
Local minimum        -          ≤ 3

After the classification phase, clusters of grid cells that were classified as plateau or bas-fond and had an area of less than 0.1 km² were eliminated. These clusters were regarded as classification noise (section 2.3.3). The chosen threshold value eliminates almost all noise without eliminating real plateaus and bas-fonds. Plateau and bas-fond pixels that were adjacent to or within 480 meters (16 grid cells) of each other were excluded. This meant that in the final landscape unit map the plateaus and bas-fonds were separated by a glacis at least 480 meters in length. This threshold was chosen based on DEM interpretation and field knowledge; a glacis is nearly always longer than 480 meters.

The landscape classification result is shown in Fig. 20. The accuracy of the landscape classification was assessed by comparing the classification result with the land use recording of the 2005 field survey. The results of this assessment are presented with a confusion matrix (Table 8). The producer’s and user’s accuracies of the classification are shown in Table 9.

The producer’s accuracy varies between the landscape units from low for the plateaus and moderate for the glacis to high for the bas-fonds. The user’s accuracy, or reliability of the classification, is very low for the plateaus, high for the glacis and reasonable for the bas-fonds.

Figure 20. Landscape unit map.

Table 8. Confusion matrix and classification accuracies of the landscape classification. The bold numbers indicate the correctly classified pixels.

                Classification Result
Training Set    Plateau    Glacis    Bas-fond    Total
Plateau         6          8         0           14
Glacis          17         53        17          87
Bas-fond        0          6         48          54
Total           23         67        65          155

Table 9. Producer’s and User’s accuracies of the landscape classification.

Training Set    Producer’s Accuracy (%)    User’s Accuracy (%)
Plateau         42.9                       26.1
Glacis          60.9                       79.1
Bas-fond        88.9                       73.9

3.2 Model Results

3.2.1 Soil organic carbon mapping

The maps with the predicted SOC contents for the study area are shown in Fig. 21. The result of model 1 shows little spatial variation. The SOC content on the glacis shows a decreasing trend on the upper part, followed by a minimum about halfway down the glacis and subsequently an increasing trend on the lower part. The largest SOC contents are predicted for the plateaus. The glacis values range from approximately 0.28% in the center to 0.46% just below the plateaus and 0.43% in the areas around the bas-fonds. The map resulting from model 2 shows more spatial variation in the areas where natural vegetation is present and around the villages. The largest SOC contents, 0.95%, are predicted around the villages where natural vegetation does not occur. The lowest values, 0.28%, are again found in the center of the glacis. Tables 10 and 11 show some summary statistics of the predicted SOC content for the total study area and per landscape unit.

Table 10. Summary statistics for the predicted soil organic carbon content (%) by model 1.

            n         Mean    Std. Deviation    Minimum    Maximum
Total       890126    0.38    0.06              0.28       0.50
Plateau     96271     0.50    0                 0.50       0.50
Glacis      462962    0.34    0.05              0.28       0.45
Bas-fond    330893    0.41    0                 0.41       0.41

Table 11. Summary statistics for the predicted soil organic carbon content (%) by model 2.

            n         Mean    Std. Deviation    Minimum    Maximum
Total       890126    0.45    0.13              0.28       0.95
Plateau     96271     0.52    0.05              0.50       0.85
Glacis      462962    0.41    0.13              0.28       0.95
Bas-fond    330893    0.49    0.12              0.41       0.91

3.2.2 Soil texture mapping

The predicted fine fraction content with model 1 (Fig. 22) shows the same pattern as the model 1 SOC content because basically the same prediction rules were used. The soil data plateau and bas-fond means were used as predictors for these two landscape units. The FF distribution on the glacis shows a parabolic trend, which corresponds to the parabolic prediction rule used. The largest value, 16%, is found in the bas-fonds. The values on the glacis range from approximately 15% just below the plateaus and around the bas-fonds to 10.1% in the center. Table 12 shows the summary statistics of the predicted FF content for the total study area and per landscape unit.

The map resulting from model 2 shows spatial variation within the plateaus and bas-fonds that is related to differences in spectral reflectance (ETM3/ETM1 ratio). This resulted in a larger average FF content for the bas-fond for model 2 than for model 1, which is represented by the darker shades of green that dominate the bas-fonds. This is in accordance with the assumptions made. The gray-brown bas-fond soil contains less iron than the soils in the higher parts of the landscape and will generally have a lower band ratio value, which is translated to a heavier texture. For the plateaus it is the other way around. The average plateau FF content predicted by model 2 is smaller than the content predicted by model 1. The smaller FF contents on the plateaus are shown in the darker shades of purple in Fig. 22. An exception, however, is the southwest part of the study area, where green colors are found on the plateaus, indicating larger FF contents. This may be caused by a different origin of the parent material (alluvial vs. colluvial).

Figure 21. Predicted soil organic carbon content (%) for the upper 20 cm of the soil using model 1 (top) and model 2 (bottom).

Figure 22. Predicted fine fraction content (%) for the upper 20 cm of the soil using model 1 (top) and model 2 (bottom).

Table 13 shows summary statistics of the predicted FF content for the total study area and per landscape unit. There are differences between the predicted FF contents of model 1 and model 2. However, these differences are relatively small. The mean predicted FF content of the study area is almost the same for the two models. The validation of the models will show if these differences are significant and if model 2 indeed performs better than model 1.

Table 12. Summary statistics for the predicted fine fraction content (%) by model 1.

            n         Mean     Std. Deviation    Minimum    Maximum
Total       890126    13.66    2.16              10.12      16.00
Plateau     96271     13.50    0                 13.50      13.50
Glacis      462962    12.01    1.52              10.12      15.30
Bas-fond    330893    16.00    0                 16.00      16.00

Table 13. Summary statistics for the predicted fine fraction content (%) by model 2.

            n         Mean     Std. Deviation    Minimum    Maximum
Total       890126    13.78    2.65              7.35       23.11
Plateau     96271     12.25    1.21              7.35       0.768
Glacis      462962    12.01    1.52              10.12      15.30
Bas-fond    330893    16.71    1.25              10.92      23.11

3.3 Validation data analysis

3.3.1 Study area level

During the fieldwork period 32 clusters were sampled (Fig. 23). Cluster 25 was not sampled because the starting point lay too close to a village. Five sample points spread over five different clusters were skipped because they were located on roads, laterite outcrops or termite mounds. This resulted in 155 soil samples. In section 2.5 it was explained that the method of statistical inference is determined by the sample design. This implies that inference had to be carried out at the cluster level and not at the sample point level, e.g. the spatial mean of the study area is estimated by the mean of the cluster means and not by the mean of the 155 sample points. The data set statistics, i.e. statistics estimated with Eq. 7-10, are summarized in Table 14. A complete overview of the collected data is given in appendix A.

Table 14. Validation data summary statistics (%).

                 n     Mean ± S.E.     Std. Deviation    Variance    Minimum    Maximum
SOC              32    0.54 ± 0.02     0.11              0.01        0.42       0.87
Fine Fraction    32    11.36 ± 0.38    2.15              4.63        8.30       16.50
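Cluster-level inference can be sketched in Python as follows. The sketch assumes that the spatial mean is estimated by the unweighted mean of the cluster means and that the standard error is the standard deviation of the cluster means divided by the square root of the number of clusters. This agrees with the values in Table 14 but is not a reproduction of Eq. 7-10; the function name and the example data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cluster_inference(clusters):
    """Estimate the spatial mean, spread and standard error from cluster means.

    `clusters` is a list of lists; each inner list holds the observations of
    one cluster. Clusters are given equal weight (an assumption of this sketch).
    """
    cluster_means = [mean(values) for values in clusters]
    n_clusters = len(cluster_means)
    estimated_mean = mean(cluster_means)
    spatial_sd = stdev(cluster_means)          # spread between cluster means
    standard_error = spatial_sd / sqrt(n_clusters)
    return estimated_mean, spatial_sd, standard_error


# Hypothetical SOC observations (%) for three clusters of unequal size.
example_clusters = [[0.45, 0.50, 0.55], [0.60, 0.80], [0.40]]
est_mean, est_sd, est_se = cluster_inference(example_clusters)
```

As a plausibility check against Table 14, a standard deviation of 0.11% over 32 clusters gives 0.11 / sqrt(32) ≈ 0.02%, and 2.15 / sqrt(32) ≈ 0.38%, matching the reported standard errors.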

Note that these are cluster statistics, which smooths the measures of spread. For SOC, the cluster standard deviations range from 0.052% to 0.476% and the within-cluster range varies from 0.13% to 1.13%. The smallest SOC content measured at an individual point is 0.31%; the largest value is 1.53%. The FF cluster means vary from 8.30% to 16.50%, the standard deviations range from 0.57% to 9.28% and the within-cluster range varies from 1.5% to 22%. The smallest FF content measured at an individual point is 5%; the largest value is 32%. This indicates considerable spatial variation between and within the clusters. Appendix B shows the summary statistics for the individual clusters. Fig. 24 shows that a positive correlation (r² = 0.61) exists between the two soil properties at the sample points, which corresponds to the findings of Manlay et al. (2002a).

Figure 23. Validation observation points (circles) with the DEM as background.

Figure 24. Scatter plot of the fine fraction content (%) against the soil organic carbon content (%) of the individual sample points, showing a positive correlation between the two soil properties (R² = 0.6075).

There is a difference between the SOC and FF means of the validation observations and the means of the original soil observations used for model development (Tables 1 and 2). The mean of the validation set is 0.1% larger for SOC, which is a relative difference of 25%, and approximately 1.5% smaller for the FF content. An explanation for this discrepancy is that the samples of the original soil data set were not randomly taken. Sampling was done purposively in transects close together in three relatively small areas to cover all landscape units rather than to be representative for the area. This can bias the sample mean.

Figs. 25 and 26 show the measured SOC and FF contents at the validation observation points. The SOC distribution within and between clusters appears to be somewhat erratic. In some clusters the SOC content of subsequent observation points varies from large to small to large. In other clusters the SOC content is more homogeneous. The spatial distribution of the FF content shows somewhat more homogeneity than the SOC distribution. The correlation between the two soil properties also becomes apparent in Figs. 25 and 26. Sudden large changes in SOC within clusters often coincide with changes in the FF content.

3.3.2 Landscape unit level

The method of statistical inference for the validation observations was determined by the sampling design and therefore had to be carried out at cluster level. However, it is permitted to estimate spatial means for the landscape units, although the estimation accuracy of these means (the sample variance V̂ and thus the standard error) cannot be estimated (Dick Brus, personal communication). Summary statistics for the three landscape units are given in Tables 15 and 16.

Note that the variance given in these tables is the spatial variance, which estimates the spread of the data (Eq. 10). This should not be confused with the sample variance, which estimates the accuracy with which the mean of the sample is estimated (Eq. 8). The square root of the sample variance is known as the standard error (Eq. 9).

Table 15. Landscape unit statistics for the measured soil organic carbon content (%) of the observation points.

            n     Mean    Std. Deviation    Variance    Minimum    Maximum
Plateau     14    0.55    0.11              0.01        0.39       0.77
Glacis      90    0.50    0.15              0.02        0.31       1.38
Bas-fond    51    0.61    0.26              0.07        0.31       1.53

Table 16. Landscape unit statistics for the measured fine fraction content (%) of the observation points.

            n     Mean     Std. Deviation    Variance    Minimum    Maximum
Plateau     14    11.93    3.00              9.00        8.00       16.50
Glacis      90    10.68    3.51              12.29       5.00       32.00
Bas-fond    51    12.39    4.13              17.02       7.00       26.00

ANOVA showed a significant effect of landscape unit on the SOC as well as the FF content: (F(2,152) = 4.84, p<0.01) for the SOC content and (F(2,152) = 3.69, p<0.05) for the FF content.
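For reference, the F statistic of a one-way ANOVA can be computed as in the following Python sketch. This is a generic textbook formulation, not necessarily the software procedure used for the analysis above; the function name and the test data are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2
                     for group in groups)
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum((value - mean(group)) ** 2
                    for group in groups for value in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```

With three landscape units and 155 observation points the degrees of freedom are 3 - 1 = 2 and 155 - 3 = 152, which corresponds to the reported F(2,152).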

All landscape unit means of the SOC content are larger than the landscape unit means of the original soil observation data. The difference between the bas-fond means is the largest, at 0.2%. The bas-fond has the largest standard deviation and can be considered the most heterogeneous of the three landscape units. The means of the original soil observations were used to calibrate the prediction rules. Based on the comparison of the two data sets we can expect an underestimation of the SOC content.

The landscape unit means for the FF content are smaller than the landscape unit means of the original soil observation data, except for the bas-fond. For the glacis the difference is more than 2%. The bas-fond mean is almost 2% larger. A general overestimation of the FF content can therefore be expected. This will occur even for the bas-fond, because heavier soils were expected than the soil data set suggested and the model was therefore calibrated with 16% instead of the mean of the soil observation data. Model results will thus very likely be biased.

Figure 25. The measured SOC content at the validation observation points.

Figure 26. The measured FF content at the validation observation points.

3.4 Model Validation

The model results were validated with the 155 validation observations collected during the fieldwork period. Comparing the general trends in the model results described in section 3.2 with the validation observations already gives some preliminary insight into model performance.

The SOC models predict on average the largest contents for the plateaus. The validation observations, however, show that the largest SOC contents are found in the bas-fonds, and these are larger than the average values predicted by both SOC models. The plateau estimates of both SOC models are also smaller than the observed value. On the glacis the smallest value is predicted halfway down the slope, around position 0.5. Fig. 27 shows the distribution of the SOC and FF contents along the glacis. Between positions 0.5 and 0.6 a dip in SOC values can be observed: the SOC content does not exceed 0.45%. This is, however, much larger than the 0.28% the models predict. For both the upper and lower part of the glacis the SOC content varies between 0.40% and 0.80%. According to the fitted quadratic trend line the values on the upper part are slightly larger than on the lower part. A similar but stronger trend is found for the FF content. The FF content does not exceed 11% between glacis positions 0.4 and 0.6. The FF content for the upper part of the glacis ranges from approximately 7.5% to 16%. The lower part of the glacis shows a range from 7.5% to 14%.

The model assumed a quadratic trend between glacis position and SOC and FF contents. Fig. 27 does show a quadratic trend, although weaker than the trend assumed by the model. Still, the patterns are roughly the same: contents on the upper part of the glacis are larger than on the lower part and the smallest values are found about halfway down the glacis. However, the variance explained by the position on the glacis is small compared to the variance indicated by the scatter. In the next section the descriptive validation is complemented with a quantitative validation.

Figure 27. Scatter plots showing the relationship between the standardized glacis position and the observed SOC content (%, left) and FF content (%, right). A second-degree polynomial is fitted through the points.

3.4.1 Bias and goodness of fit

Figs. 28 and 29 show scatter plots of the observed vs. the predicted percentages of SOC and FF, respectively. All model fits are extremely low, almost zero. Judging by the model fits, the SOC models perform slightly better than the FF models. SOC-model 2 has the best fit (r² = 0.13). The scatter plots show considerable bias in the predictions. The SOC models generally underestimate the SOC content; the FF models generally overestimate the FF content. The point columns in the plots are the result of the uniform predicted value within the landscape units. The point column in Fig. 28a at 0.41% is related to the bas-fond pixels. Note the large spread in observed values, from approximately 0.30% to 1.50%. The right point column in Fig. 28a is associated with the plateau pixels. SOC-model 2 shows slightly better results but the fit is still very poor. There are still three point columns present. The outer columns relate to the bas-fond pixels. The left column (at 0.41%) shows the pixels where no natural vegetation is present; the right column (at 0.61%) shows the pixels where natural vegetation is present. The smaller center column is related to the plateau pixels.

The results of the FF models are also biased (Fig. 29). The predicted values are in general too large. Again point columns are present at 13.5% and 16%, related to the plateau and bas-fond pixels respectively. The scatter cloud of FF-model 2 is more evenly spread and the point columns are gone. However, judging by the model fit it cannot be said that model 2 is an improvement. Table 17 shows the model fits for each landscape unit. The scatter plots of the landscape units can be found in appendix C.

The fits of both models are almost zero for all landscape units. The fits of SOC-model 2 for the glacis and bas-fonds are the only two that stand out from the rest, although the scatter plots in appendix C raise the question whether this is really an improvement.

Table 17. Model fits (r²) for each landscape unit and the overall fit.

Landscape Unit    SOC-model 1    SOC-model 2    FF-model 1    FF-model 2
Plateau           0.01           0.004          0.05          0.09
Glacis            0.002          0.12           0.001         0.004
Bas-fond          0.02           0.21           0.0001        0.03
Overall fit       0.001          0.13           0.005         0.02

Figure 28. Observed vs. predicted percentage soil organic carbon using model 1 (a, R² = 0.001) and model 2 (b, R² = 0.13). The dashed line represents the 1:1 line.

Figure 29. Observed vs. predicted percentage fine fraction using model 1 (a, R² = 0.005) and model 2 (b, R² = 0.017). The dashed line represents the 1:1 line.

3.4.2 Statistical inference

The inference statistics of the four models, calculated using Eq. 11-18, are presented in Table 18. The negative mean prediction errors (MPE) of the SOC models indicate bias, in this case underestimation of the SOC content. The FF models systematically overestimate the FF content. This confirms the trends shown in Figs. 28 and 29. The mean squared prediction errors (MSPE) of both versions of model 2 are lower than the MSPEs of the corresponding versions of model 1. However, this does not necessarily mean that model 2 is an improvement. A t-test for paired observations will show whether or not the differences between the MPEs and MSPEs of the models differ significantly from zero, i.e. whether model 2 was significantly less biased and more accurate than model 1 at the validation points. The cluster MPE, MSPE and their variances are summarized for both soil properties in appendix D.

Table 18. Inference statistics for the four prediction models. Estimated mean prediction error (MPE), mean square prediction error (MSPE), root mean square prediction error (RMSPE) and the standard errors (S.E.) of the MPE and MSPE.

               MPE ± S.E. (%)    MSPE ± S.E. (%)    RMSPE (%)
SOC-model 1    -0.153 ± 0.022    0.064 ± 0.016      0.253
SOC-model 2    -0.085 ± 0.019    0.042 ± 0.009      0.201
FF-model 1     2.34 ± 0.50       23.14 ± 3.15       4.81
FF-model 2     2.16 ± 0.51       22.47 ± 3.29       4.74
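The error statistics of Table 18 can be illustrated with the following Python sketch. It assumes that the prediction error is defined as predicted minus observed (so that a negative MPE means underestimation, as in the text), that errors are first averaged within clusters, and that the clusters are then averaged with equal weight. Eq. 11-18 are not reproduced here, so the exact estimators may differ; the function name and the example data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def prediction_error_stats(clusters):
    """Cluster-based MPE, MSPE, their standard errors and the RMSPE.

    `clusters` is a list of clusters; each cluster is a list of
    (predicted, observed) pairs. Error = predicted - observed (assumption).
    """
    cluster_me = [mean([p - o for p, o in cluster]) for cluster in clusters]
    cluster_mse = [mean([(p - o) ** 2 for p, o in cluster])
                   for cluster in clusters]
    n_clusters = len(clusters)
    mpe = mean(cluster_me)
    mspe = mean(cluster_mse)
    return {
        "MPE": mpe,
        "SE_MPE": stdev(cluster_me) / sqrt(n_clusters),
        "MSPE": mspe,
        "SE_MSPE": stdev(cluster_mse) / sqrt(n_clusters),
        "RMSPE": sqrt(mspe),
    }


# Hypothetical (predicted, observed) SOC pairs (%) in two clusters.
example = [[(0.4, 0.5), (0.4, 0.7)], [(0.5, 0.4), (0.5, 0.6)]]
stats = prediction_error_stats(example)
```

The RMSPE is simply the square root of the MSPE, which can be checked against Table 18: sqrt(0.064) ≈ 0.253 and sqrt(23.14) ≈ 4.81.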

Figs. 30-33 show the spatial distribution of the MPEs and MSPEs of the SOC and FF models. Red colors dominate the error maps of the SOC models, indicating underestimation. Blue colors dominate the error maps of the FF models, indicating overestimation. There is no clear pattern in the spatial distribution of errors. The larger prediction errors are related to observations with values far outside the range of the prediction models. There are small differences in the distribution of errors between the landscape units, as shown by Tables 19 and 20.

Note that the statistics presented in these tables are calculated for the landscape units according to the field observations and not according to the classification. This explains the difference in MPEs and MSPEs of the FF for the glacis, although the same prediction rules were used for model 1 and model 2.

SOC-models. The bas-fonds show the largest bias, followed by the plateaus and the glacis. The RMSPEs of the glacis and plateau are almost equal. The bas-fond has the largest RMSPE. Thus, the landscape unit with the best model fit (see Table 17) has the largest bias and the smallest accuracy. This shows the danger of interpreting fitted scatter plots: the landscape unit with the best model fit can have the largest RMSPE. It can further be noticed that the errors of all landscape units are smaller for model 2 than for model 1. The bas-fond still has the largest prediction errors. The bias decreases by 50% for the plateau and glacis and by almost 30% for the bas-fond. The RMSPE decreases by 27% for the plateau, 24% for the glacis and 18% for the bas-fond. However, the RMSPEs remain large given that the average SOC contents for the plateau, glacis and bas-fond are 0.55%, 0.50% and 0.61% respectively.

Table 19. Inference statistics of the SOC models for the three landscape units.

                        Plateau    Glacis    Bas-fond
Model 1    MPE (%)      -0.16      -0.12     -0.21
           MSPE (%)     0.05       0.04      0.11
           RMSPE (%)    0.22       0.21      0.33
Model 2    MPE (%)      -0.08      -0.06     -0.15
           MSPE (%)     0.03       0.03      0.07
           RMSPE (%)    0.16       0.16      0.27

FF-models. The statistics for the FF models show a large difference in bias between the almost unbiased plateau predictions and the predictions for the other landscape units. The differences in RMSPE are smaller. The bas-fond has the largest RMSPE, followed by the glacis and the plateau. Model 2 does not really improve the predictions. The absolute MPE is larger for the plateau and bas-fond and slightly smaller for the glacis. The RMSPE decreases by 19% for the plateau, remains almost unchanged for the glacis and even increases for the bas-fond. The RMSPEs remain relatively large compared to the average FF contents of 11.93%, 10.68% and 12.39% for the plateau, glacis and bas-fond respectively.

Figure 30. Spatial distribution of the prediction errors of SOC-model 1. The color of the circles indicates bias. Blue represents overestimation of the observed value, red means underestimation. The size of the circles gives a measure of the over- or underestimation.

Figure 31. Spatial distribution of the prediction errors of SOC-model 2. The color of the circles indicates bias. Blue represents overestimation of the observed value. Red means underestimation. The size of the circles gives a measure of the over- or underestimation.

Figure 32. Spatial distribution of the prediction errors of FF-model 1. The color of the circles indicates bias. Blue represents overestimation of the observed value. Red means underestimation. The size of the circles gives a measure of the over- or underestimation.

Figure 33. Spatial distribution of the prediction errors of FF-model 2. The color of the circles indicates bias. Blue represents overestimation of the observed value. Red means underestimation. The size of the circles gives a measure of the over- or underestimation.

Table 20. Inference statistics of the FF models for the three landscape units.

                        Plateau    Glacis    Bas-fond
Model 1    MPE (%)      0.14       2.34      3.06
           MSPE (%)     11.97      22.25     28.34
           RMSPE (%)    3.46       4.72      5.32
Model 2    MPE (%)      -0.59      1.96      3.41
           MSPE (%)     7.92       21.03     29.26
           RMSPE (%)    2.81       4.59      5.41

Table 21 shows the results of a t-test for paired comparison. There is a significant difference in MPE and MSPE between SOC-model 1 and SOC-model 2. This means that the results of model 2 are significantly less biased and more accurate than those of model 1 at the validation points. The inclusion of the effect of natural vegetation and the compound ring around villages led to an improved model. Although prediction accuracy at the validation points has improved, the RMSPE is still relatively large: 0.20%, where the estimated mean SOC content in the study area is 0.54%. The difference in MPE and MSPE between FF-model 1 and FF-model 2 is not significant. Model 2 cannot be considered an improved version of model 1. The adjustment of the fine fraction content on the basis of the spectral ratio ETM3/ETM1 had no positive effect on the prediction accuracy. As with the SOC models, the RMSPE is relatively large: 4.74%, where the estimated mean FF content in the study area is 11.36%.

Table 21. t-test statistics of four paired comparisons: the mean and standard deviation of the paired differences, the standard error (S.E.) of the mean difference, the lower (LL) and upper (UL) limits of the 95% confidence interval of the difference, the degrees of freedom, the t-value and the significance.

                  SOC                            FF
                  MPE 1–MPE 2    MSPE1–MSPE2     MPE 1–MPE 2    MSPE1–MSPE2
Mean              -0.067         0.022           0.177          0.671
Std. deviation    0.073          0.044           0.720          5.437
S.E. of mean      0.013          0.008           0.127          0.959
LL of 95%CI       -0.094         0.006           -0.082         -1.286
UL of 95%CI       -0.041         0.038           0.437          2.628
df                31             31              31             31
t-statistic       -5.23 *        2.88 **         1.395          0.699
Significance      0.000          0.007           0.173          0.489

* Significant at .001 level. ** Significant at .01 level.
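The t statistic of a paired comparison follows from the mean and standard deviation of the paired differences. The Python sketch below shows the standard formula; applying it to cluster-level differences (one difference per cluster, 32 clusters, 31 degrees of freedom) is inferred from Table 21. The function names are hypothetical and no p-value is computed.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(differences):
    """Return (mean difference, S.E., t, df) for a list of paired differences."""
    n = len(differences)
    d_mean = mean(differences)
    std_error = stdev(differences) / sqrt(n)
    return d_mean, std_error, d_mean / std_error, n - 1


def t_from_summary(d_mean, d_sd, n):
    """t statistic from the summary values of the paired differences."""
    return d_mean / (d_sd / sqrt(n))


# Summary values for the SOC MPE differences as printed in Table 21.
t_soc_mpe = t_from_summary(-0.067, 0.073, 32)
```

With the rounded values printed in Table 21 this gives t ≈ -5.19, close to the reported -5.23; the small difference is presumably due to rounding of the mean and standard deviation.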

3.4.3 Duplicate analysis

A part of the unexplained variance in SOC and FF contents in the study area can be related to the laboratory measurement error. The exact measurement error is unknown. However, with 32 duplicate samples we can estimate the mean difference (MD) between duplicates, the mean square difference (MSD), their variances and standard errors and the root mean square difference (RMSD), using Eq. 19-24. The results are presented in Table 22. This gives a measure of the precision of the sample analyses. The variance due to measurement error cannot be explained by the models. Therefore, large estimated differences (inaccurate analysis) will negatively affect the prediction accuracy at the validation points.

The small MDs show that the results of the sample analyses are unbiased. The analysis precision is relatively high, as indicated by the small RMSDs, which results in reliable observed values. This means that only a small part of the prediction errors is due to measurement errors. The RMSPEs for the SOC models are 0.253% and 0.201% and the measurement error is 0.026%, which is at most approximately 13% of the RMSPE. For the FF models this is also approximately 13%. The other part of the RMSPE is the unexplained spatial variability. Thus, the poor model performance cannot be attributed to measurement errors to any substantial degree.

Table 22. Inference statistics of the measurement differences and squared differences.

                  MD ± S.E. (%)      MSD ± S.E. (%)     RMSD (%)
SOC duplicates    0.0002 ± 0.0033    0.0007 ± 0.0001    0.026
FF duplicates     -0.055 ± 0.078     0.387 ± 0.107      0.622
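The duplicate statistics can be illustrated with a short sketch. Eq. 19-24 are given earlier in the thesis; the code below assumes the usual definitions (MD as the mean of the paired differences, MSD as the mean of their squares, RMSD as the square root of the MSD) and may differ in detail from those equations. The duplicate values are hypothetical; only the RMSD and RMSPE figures in the last loop are taken from the text.

```python
import math

def duplicate_statistics(first, second):
    """Return (MD, S.E. of MD, MSD, RMSD) for paired duplicate measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    md = sum(diffs) / n                    # mean difference: indicates bias
    msd = sum(d * d for d in diffs) / n    # mean square difference
    rmsd = math.sqrt(msd)                  # root mean square difference
    var_d = sum((d - md) ** 2 for d in diffs) / (n - 1)
    se_md = math.sqrt(var_d / n)           # standard error of the mean difference
    return md, se_md, msd, rmsd

# Hypothetical SOC duplicates (%), for illustration only.
first_run = [0.52, 0.47, 0.61, 0.39]
second_run = [0.50, 0.49, 0.60, 0.41]
md, se_md, msd, rmsd = duplicate_statistics(first_run, second_run)
print(f"MD = {md:.4f} +/- {se_md:.4f}, MSD = {msd:.6f}, RMSD = {rmsd:.4f}")

# Share of the reported SOC RMSD (0.026%) in the two reported SOC RMSPEs.
for rmspe in (0.253, 0.201):
    print(f"RMSD / RMSPE = {0.026 / rmspe:.1%}")
```

With the reported values the ratio is roughly 10% for the larger RMSPE and 13% for the smaller one.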


4. Discussion

4.1 Prediction model design

McKenzie and Gallant (2004), Heuvelink and Webster (2001) and Walter et al. (2004) proposed that pedological knowledge in the form of qualitative mental models of soil distribution should be better incorporated into quantitative prediction models that use environmental correlation. The objective of this study was to apply the McKenzie and Gallant (2004) approach for a digital soil mapping case study in Senegal and validate model results with an independent data set. However, the validation showed that the quantitative prediction models developed in this study failed to predict the spatial distribution of soil organic carbon and the fine fraction within a catenary context with reasonable accuracy. All results were biased and the RMSPEs were relatively large.

There were several constraints that limited the predictive capability of the models. The limited available data, both soil and auxiliary, was an important factor contributing to these constraints. First, it was hard to support the qualitative conceptual knowledge of soil-landscape processes with available soil and auxiliary data sources. Therefore the conceptual model structure of soil-landscape processes was designed completely on the basis of expert knowledge and assumptions derived from field observations and other studies in similar landscapes or in a similar context (Dijkerman and Miedema, 1988; Allison, 1991 and Schoorl, 2002) without additional knowledge that could be derived from the soil and auxiliary data sources. Second, the translation from a qualitative conceptual model to a quantitative prediction model was weak. There were no empirical (predictive) relationships found between soil-landscape processes and the available soil data. Even linear regression of soil properties with environmental variables derived from the DEM or the soil data set (disregarding soil-landscape processes) did not yield any useful predictive relationships. Therefore the translation of the conceptual model to a quantitative prediction model was mainly based on expert judgement, without sound empirical support from environmental correlation. The available soil observations were only used to calibrate the model such that predictions were within a plausible physical range.

In spite of the poor model performance, we still believe that the soil-landscape and environmental processes captured in the conceptual model are important factors that can partially explain the spatial variation of the SOC and FF contents. Pieri (1969) noted the loss of topsoil due to splash erosion, and farmers in the area report superficial run-off of water over the glacis and subsequent waterlogging in the lower parts of the landscape during the wet season. The validation observations showed low SOC and FF contents halfway down the glacis and larger contents on the upper and lower parts, a pattern that could be related to superficial erosion and sedimentation. Soil organic carbon and fine fraction contents were significantly larger under natural vegetation, and the SOC model improved significantly when the influence of natural vegetation was taken into account in model 2. However, the interaction between land unit (or catena position), soil, landscape and environmental processes is very likely much more complex than, and somewhat different from, what the prediction models represent. These models are constrained by unsupported assumptions that result from the limited soil and auxiliary input data.

The catena is a widely adopted and applied concept for soil-landscape research (Dijkerman and Miedema, 1988; Applegarth and Dahms, 2001; Brown et al., 2004). This study shows that it is not self-evident that a catena indeed exists within a landscape and that it can be mapped easily. However, landscape units are significantly different from each other. The conceptual model framework using the catena concept forms a good basis for further research after more measurements of soil properties (e.g. the use of the collected validation data set for a new calibration procedure).


4.2 Model input: data quality and uncertainty

The limited availability of the soil data for the calibration of the models can be considered as the most important reason of the poor model performance. If the prediction models are to be improved in the future, this should be the first thing to pay attention to. Besides this, there are a few other factors that contributed to the poor model results.

But apart from the soil data, various other sources of input data were used for model design: a DEM, a Landsat 7 ETM+ image and environmental predictors derived from these two data sources. Each model or preprocess step adds uncertainty to the data that will affect the accuracy of the model results. It was beyond the scope of this study to perform an elaborate accuracy assessment, a model sensitivity analysis or an error propagation study. Therefore the major sources of uncertainty or error that affect the model outcomes are described in a qualitative way.

4.2.1 Digital elevation model

The DEM had a grid cell size of 50 meters that was resampled to 30 meters to facilitate cell-by-cell processing. The 50 meter cell size was sufficient considering the nature of the terrain: elevation differences within 50 meters are very small. However, the radiometric (vertical) resolution of one meter did have a negative effect on the model design and outcome. A vertical resolution of one meter is too coarse for a landscape where elevation changes are subtle and occur over large distances. It gives a very discrete representation of the topography, with large flat areas and sudden drops of one meter instead of the smooth transitions from high to low that a finer radiometric resolution would give. This has severe negative consequences for the accuracy of the slope (illustrated in Fig. 9) and flow accumulation maps that were derived from the DEM and used for landscape classification.

A DEM with a finer radiometric resolution (0.5 m or even 0.25 m) is desirable in terrain where elevation differences occur over large distances. DEM-derived terrain attributes would then be more accurate, more meaningful and more useful in digital soil mapping studies using environmental correlation in low-relief landscapes.
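The effect of vertical resolution on derived slope can be illustrated with a small sketch. It assumes a hypothetical, perfectly uniform slope of 0.4% sampled along a profile of 50 m cells; the gradient, the starting elevation and the function names are illustrative assumptions and are not taken from the study area.

```python
def quantize(value, step):
    """Round an elevation to the nearest multiple of the vertical resolution."""
    return round(value / step) * step

def cell_slopes_percent(elevations, cell_size):
    """Slope between adjacent cells along a profile, in percent."""
    return [abs(b - a) / cell_size * 100.0
            for a, b in zip(elevations, elevations[1:])]

CELL_SIZE = 50.0       # m, grid cell size of the DEM
TRUE_GRADIENT = 0.004  # hypothetical uniform slope of 0.4%

# Hypothetical smooth profile of 21 cells descending from 40 m elevation.
true_profile = [40.0 - TRUE_GRADIENT * CELL_SIZE * i for i in range(21)]

coarse_slopes = cell_slopes_percent([quantize(z, 1.0) for z in true_profile], CELL_SIZE)
fine_slopes = cell_slopes_percent([quantize(z, 0.25) for z in true_profile], CELL_SIZE)

print("1 m resolution:   ", sorted(set(round(s, 2) for s in coarse_slopes)))
print("0.25 m resolution:", sorted(set(round(s, 2) for s in fine_slopes)))
```

In this sketch the 1 m DEM only yields slopes of 0% (flat steps) or 2% (one-meter drops), whereas the 0.25 m DEM yields values much closer to the true 0.4%.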

4.2.2 Landscape classification

The use of decision rules with threshold values for landscape classification limits the accuracy that can be attained: the accuracy of the resulting land unit map might be lower than that of a map resulting from manual delineation of the DEM based on field knowledge. The classification approach used in this study resulted in plateaus that are in general too wide, which becomes clear when the plateaus are overlain on the DEM (Fig. 34). This explains the low reliability of the plateau classification (Tables 8 and 9). Almost 74% of the training pixels that were classified as plateau are in reality glacis. The bas-fonds are also too wide in some areas, although to a lesser extent than the plateaus. In total 40% of the glacis pixels were misclassified as plateau or bas-fond. Misclassification during landscape classification is one of the most important error sources that affect the model outcome. Prediction rules were directly linked to land unit, which means that misclassification automatically results in a very different prediction rule and thus a very different result, very likely with a large prediction error.

Another striking feature in Fig. 34 is the irregular outline of the plateau and bas-fond. This is caused by the use of threshold values for flow accumulation. The dents in the left figure of Fig. 34 are places where flows converge, resulting in a flow accumulation value that exceeds the threshold value. The bulges are caused by relatively flat areas where flow accumulation increases only slowly and where it takes longer to exceed the threshold value. These imperfections in outline (plateau borders are not that erratic) and land unit width are inherent to the classification method used and will have their impact on the accuracy of the model output.


Figure 34. Overlay of the DEM with a plateau unit (left) and a bas-fond unit (right). Note that the angular appearance of the plateau and bas-fond outlines is caused by conversion of grid to vector for the purpose of visualization.

4.2.3 Landsat 7 ETM+ image classification

Natural vegetation and bare soil areas were classified from the Landsat image and used as input in the prediction models. A few aspects complicated the classification process. The Landsat ETM+ image used in this study was acquired in March 2003, during the dry season. The lack of water had a strong effect on the vegetation. Baobab trees lost their leaves, grasses turned yellow and the shrub leaves dried out and lost part of their chlorophyll. These processes had a negative effect on the spectral reflectance, i.e. the distinctive features of the vegetation spectral curve were weakened. This results in a high correlation (0.77, Table 23) between bands 3 and 4, which are normally useful for vegetation mapping, and made it harder to classify natural vegetation from the remote sensing image. The areas with dried-out, yellow grasses were most heavily affected. The spectral curve of the yellow grasses resembled the curve of the dried-out crop residues that are left on the land after harvest. These residues covered the soil with densities ranging from 10 to 100%, creating mixed pixels. Pixels with high residue coverage were especially difficult to separate from the grass pixels, and pixels with low coverage were difficult to separate from bare soil pixels, which created mixed classification classes. This is reflected by the moderate user’s accuracy of the natural vegetation classification (Table 23): only 48.9% of the pixels classified as natural vegetation were natural vegetation; the other pixels were likely farmland covered with crop residues. On the other hand, approximately one third of the natural vegetation training pixels were not classified as natural vegetation. As with the landscape classification, misclassification had a direct effect on the model outcome and thus on the prediction error.

Table 23. Landsat ETM+ correlation coefficients.

          Band 1   Band 2   Band 3   Band 4   Band 5   Band 7
Band 1    1
Band 2    0.87     1
Band 3    0.68     0.86     1
Band 4    0.41     0.61     0.77     1
Band 5    0.65     0.75     0.79     0.50     1
Band 7    0.62     0.66     0.61     0.18     0.87     1

The Landsat ETM+ image acquired in November 2002 during the wet season did not have these constraints. However, besides natural vegetation, crops were still present. There was no ground truth crop data available to use as training areas, which makes it hard to distinguish crops and natural vegetation by supervised classification. Unsupervised classification of the November image did not result in a clear distinction between crops and vegetation. It was therefore decided not to use this image for classification.


4.2.4 Environmental predictor variables

The standardized glacis position was used as predictor of both SOC and FF for the glacis. In order to calculate this standardized position indicator it was necessary to determine the distance to the closest plateau and bas-fond for each pixel in the study area (Eq. 2). For the pixels around the left arrow depicted in Fig. 35, the closest plateau is the one to the south. The closest bas-fond lies directly below the arrow. This means that the standardized glacis position for the pixels around the arrow is around 0.80 (1 is in the bas-fond). However, these pixels are located just below a ridge (that was not classified as plateau!) as can be seen in the DEM. A similar situation occurs for the pixels around the right (dashed) arrow. For these pixels the Euclidean distance to the plateau at the other side of the bas-fond is used to calculate the position indicator, instead of the distance to the small (unclassified) plateau to their left.

This example points to an important shortcoming of the approach used to express the position of a pixel on the glacis relative to the plateau. The shortest Euclidean distance from a glacis pixel to the plateau and bas-fond was used to calculate the standardized glacis position, which is not always the Euclidean distance to the plateau upslope or the bas-fond downslope, as it should be. This is partly related to the fact that not all ridges (catchment boundaries) in the landscape are classified as plateau. This illustrates that landscape misclassification has, besides direct effects on the prediction results (as described in section 4.2.2), also indirect effects via predictor variables derived from the land unit map.

It can be debated whether all ridges are plateaus and should be classified as such. Many ridges are too narrow to be regarded as a plateau or lack the laterite capping. From a geomorphological point of view they can probably be considered glacis. However, in a context of soil-landscape processes, in this case erosion-sedimentation distributions, these ridges are important. The standardized glacis position of the pixels indicated by the arrows in Fig. 35 should be expressed relative to the ridges upslope and not relative to the nearest plateau. This means that besides the plateaus, also the ridges should be delineated (by line or polygon) to improve the prediction. Furthermore it should be assured that the calculation of the standardized position indicator uses the Euclidean distances to the nearest plateau or ridge upslope and to the nearest bas-fond downslope from the pixel of interest.
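The sensitivity of the position indicator to the choice of reference feature can be sketched as follows. Eq. 2 is given earlier in the thesis; the code assumes the indicator is the distance to the plateau (or ridge) divided by the sum of the distances to the plateau and to the bas-fond, so that it runs from 0 at the plateau to 1 in the bas-fond. All coordinates are hypothetical and were chosen only to reproduce a value of about 0.80.

```python
import math

def standardized_glacis_position(pixel, upper_feature, bas_fond):
    """Assumed form of the indicator: 0 at the plateau or ridge, 1 in the bas-fond."""
    d_up = math.dist(pixel, upper_feature)   # Euclidean distance to plateau or ridge
    d_down = math.dist(pixel, bas_fond)      # Euclidean distance to bas-fond
    return d_up / (d_up + d_down)

# Hypothetical coordinates in meters.
pixel = (0.0, 0.0)
nearest_classified_plateau = (800.0, 0.0)   # nearest plateau, but not upslope of the pixel
unclassified_ridge_upslope = (0.0, 100.0)   # ridge just above the pixel
bas_fond_downslope = (0.0, -200.0)

as_computed = standardized_glacis_position(pixel, nearest_classified_plateau, bas_fond_downslope)
as_intended = standardized_glacis_position(pixel, unclassified_ridge_upslope, bas_fond_downslope)
print(f"relative to nearest plateau: {as_computed:.2f}")
print(f"relative to upslope ridge:   {as_intended:.2f}")
```

Under these assumptions the same pixel moves from the lower part of the glacis (0.80) to its upper third (0.33) once the upslope ridge is used as reference.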

Figure 35. Detail of the standardized glacis position (left) where plateaus are depicted in black and bas-fonds in white and a detail of the DEM of the same area (right).

4.3 Spatial prediction of the soil organic carbon and fine fraction contents

4.3.1 Spatial dependency

The lack of quantitative empirical predictive relationships and uncertainties and inaccuracies of model input data influence the prediction accuracy of the models, leading to large prediction errors and low fits. Another factor that can influence the model results is the spatial dependence structure of the soil properties.

A target variable has to show spatial dependency in an area for soil-landscape modelling to be feasible. The range of spatial dependency affects both the potential of modelling as a tool to gain more insight into the spatial distribution of the target variables and the model results. Spatial modelling is easier when a clear, long-range spatial dependency structure exists. In contrast, when there is substantial short-range variation at a finer resolution than the environmental predictor, spatial prediction may not be possible (McKenzie and Gallant, 2004). In that case, model fits and prediction accuracies will always be low.

Thus, if the two soil properties of interest show large short-range variation in the study area, then disappointing model results are not surprising. Furthermore, large short-range variation will hamper the possibilities to improve the prediction capability. Semivariograms are widely used in geostatistics to describe and model spatial dependence of environmental variables (Isaaks and Srivastava, 1989). They are estimated from sample data (Lark, 2002), which are in this case the validation observations. Subsequently, a model is fitted to the estimated semivariogram, often with non-linear regression (van Groenigen, 2000). Figs 36 and 37 show the estimated and modelled semivariograms of the validation observations and the model residuals for SOC and FF respectively. All semivariograms were estimated and fitted in GSTAT (Pebesma and Wesseling, 1998) using an exponential model. Table 24 shows the semivariogram parameters.
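How a semivariogram is estimated from sample data can be sketched with the classical (Matheron) estimator, which averages half the squared differences of all observation pairs within a lag class. This is a minimal illustration and not the GSTAT implementation used in this study; the function name, the lag settings and the transect values are hypothetical.

```python
import math

def empirical_semivariogram(points, lag_width, max_distance):
    """Classical estimator. points: list of (x, y, value).
    Returns {lag_index: (mean_distance, semivariance, pair_count)}."""
    bins = {}
    for i, (xi, yi, zi) in enumerate(points):
        for xj, yj, zj in points[i + 1:]:
            h = math.hypot(xi - xj, yi - yj)   # separation distance of the pair
            if h > max_distance:
                continue
            k = int(h // lag_width)            # index of the lag class
            dist_sum, sq_sum, count = bins.get(k, (0.0, 0.0, 0))
            bins[k] = (dist_sum + h, sq_sum + (zi - zj) ** 2, count + 1)
    # Semivariance is half the mean squared difference per lag class.
    return {k: (d / n, s / (2.0 * n), n) for k, (d, s, n) in sorted(bins.items())}

# Hypothetical transect: observations every 100 m with SOC contents in %.
transect = [(0.0, 0.0, 0.4), (100.0, 0.0, 0.5), (200.0, 0.0, 0.4), (300.0, 0.0, 0.6)]
variogram = empirical_semivariogram(transect, lag_width=100.0, max_distance=250.0)
for lag, (distance, gamma, pairs) in variogram.items():
    print(f"lag {lag}: distance {distance:.0f} m, semivariance {gamma:.4f}, pairs {pairs}")
```

A model such as the exponential one is then fitted to these lag-wise estimates, usually weighting each lag class by its number of pairs.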

Figure 36. Semivariograms showing the spatial dependence structure of the SOC content and of the residuals of the prediction models. (Plot of semivariance against distance, 0 to 3000; series: SOC content, residuals model #1 and residuals model #2, each with its model variogram.)

Figure 37. Semivariograms showing the spatial dependence structure of the FF content and of the residuals of the prediction models. (Plot of semivariance against distance, 0 to 3000; series: FF content, residuals model #1 and residuals model #2, each with its model variogram.)

The semivariograms show that the spatial dependency is stronger for the FF content than for the SOC content, which is expressed by a larger range and a less steep slope. This is in concordance with findings in section 3.3.1. Within cluster variation appeared more homogeneous for FF than for SOC. The FF content is controlled by environmental variables or soil forming factors that are less variable in space than the SOC content.


Both ranges of spatial dependence exceed the transect size (720 m). Thus observations within transects are spatially correlated, which makes spatial prediction theoretically possible. However, the spatial dependence structure is not strong. The ranges are relatively short for both soil properties: the semivariogram reaches 50% of the sill at a lag distance of 215 m for SOC and 260 m for FF. There is considerable short-range variation. Besides, both ranges are smaller than 1200 m. This implies that soil properties at observation points that are at least 1200 m apart can be completely uncorrelated. Thus, spatial prediction of SOC and FF is possible according to the semivariograms, i.e. there exists a spatial dependence structure that can be modelled. However, the relatively short range will make spatial prediction difficult.

Table 24. Parameters of the modelled semivariograms.

             SOC        Residual       Residual       FF         Residual      Residual
             content    SOC-model 1    SOC-model 2    content    FF-model 1    FF-model 2
Sill         0.045      0.048          0.041          18.85      22.76         21.61
Range (m)    930        969            794            1129       1398          1326
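The relation between the ranges in Table 24 and the reported half-sill distances can be checked with a short sketch. It assumes that the tabulated range is the practical (effective) range of an exponential model without nugget, i.e. the distance at which the model reaches about 95% of the sill. This parameterisation is an assumption; it is chosen because it reproduces the distances of about 215 m and 260 m given in the text.

```python
import math

def exponential_semivariance(h, sill, practical_range):
    """Exponential model without nugget, parameterised by the practical range."""
    return sill * (1.0 - math.exp(-3.0 * h / practical_range))

def half_sill_distance(practical_range):
    """Lag distance at which the exponential model reaches 50% of the sill."""
    return practical_range * math.log(2.0) / 3.0

# Sill and range of the SOC and FF contents from Table 24.
for name, sill, practical_range in (("SOC", 0.045, 930.0), ("FF", 18.85, 1129.0)):
    h_half = half_sill_distance(practical_range)
    gamma = exponential_semivariance(h_half, sill, practical_range)
    print(f"{name}: 50% of sill ({gamma:.4f}) at about {h_half:.0f} m")
```

Under this assumption half of the sill is reached at less than a quarter of the range, which is why a range of roughly 1 km still implies considerable short-range variation.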

Besides insight in the spatial dependence structure of the soil properties, the semivariograms show another feature that is of great importance for the interpretation of the model results. The sills of the residuals of all models, except SOC-model 2, are all larger than the sills of the soil properties, i.e. the variance of the residuals is larger than the variance of the observations. This is surprising considering the fact that there exists a positive correlation (although very small) between observation and prediction (Figs. 27 and 28, section 3.4.1) but not impossible (Gerard Heuvelink, personal communication).

This has an important implication for the model results: besides the fact that they do not explain a part of the variance present in the observations, they add variance to the spatial distribution. Only the variance of the residuals of SOC-model 2 is smaller than the variance in the observations. Note that the range of the residuals of SOC-model 2 is smaller than the range of the observations which can be considered positive. In an ideal situation the residuals should not show any spatial dependency, i.e. all spatial variation is accounted for by the model, which means that the range should be close to zero. The variation that is left is then pure-nugget variation.
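That the residual variance can exceed the variance of the observations despite a positive correlation follows from Var(o - p) = Var(o) + Var(p) - 2 Cov(o, p): the residual variance is larger whenever the variance of the predictions exceeds twice the covariance. The sketch below illustrates this with hypothetical observed and predicted values that are not taken from the study.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Population covariance; covariance(a, a) is the population variance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

# Hypothetical observed and predicted values, for illustration only.
observed = [0.4, 0.5, 0.6, 0.7]
predicted = [0.2, 0.9, 0.3, 1.0]
residuals = [o - p for o, p in zip(observed, predicted)]

var_obs = covariance(observed, observed)
var_pred = covariance(predicted, predicted)
cov_op = covariance(observed, predicted)
var_res = covariance(residuals, residuals)
correlation = cov_op / math.sqrt(var_obs * var_pred)

print(f"correlation = {correlation:.2f}")
print(f"variance of observations = {var_obs:.4f}, variance of residuals = {var_res:.4f}")
```

In this example the predictions are positively correlated with the observations but vary far more than the observations do, so subtracting them adds variance instead of removing it.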

4.3.2 Other issues concerning spatial prediction

Sensitivity of environmental variables. Another discussion point is the low SOC content in the study area. Differences between observations are in general less than 0.5% and often only a few tenths of a percent. It is debatable whether these very small differences in SOC within and between land units can be explained by digital soil mapping techniques. Differences of a few percent can often be (partially) explained by environmental correlation with reasonable accuracy, e.g. Florinsky et al. (2002), Hengl et al. (2004). However, it might not be possible to relate differences of a few tenths of a percent to environmental variables by environmental correlation. Digital soil mapping techniques have, of course, sensitivity limits and might therefore not be able to predict these very subtle differences from environmental variables:

• McKenzie and Gallant (2004) point out that some landscapes have very few predictors because of too subtle terrain features. The relief in the Nioro area is possibly too subtle to derive suitable predictors from a DEM with a relatively coarse radiometric resolution like the one used in this study. A DEM with a finer resolution would probably be more useful to describe subtle terrain features.

• Soil organic matter has spectral activity throughout the visible (VIS), near infrared (NIR) and short-wave infrared (SWIR) regions. Accurate assessment of organic matter from remote sensing imagery requires high spectral resolution data across these three spectral regions (Ben-Dor, 2002), for example supplied by hyperspectral sensors like the spaceborne MODIS and the airborne AVIRIS. However, Baumgardner et al. (1970) state that the soil organic matter content of the topsoil has to be at least 2%, which corresponds to a SOC content of 1.16%, to have an effect on reflectance. SOC contents in the study area are in general much lower, which means that the use of remote sensing data for prediction of SOC in the study area is very limited.

It is, however, possible to predict clay content from remote sensing images (Odeh et al., 2000; Ben-Dor, 2002) or from terrain variables. The correlation between the fine fraction and SOC can then be used to estimate the SOC content.

Scale of prediction. SOC and FF contents were predicted and validated at grid cell level, which means a spatial resolution of 30 m. It can be questioned whether predictions at this level are feasible and realistic. The DEM, the remote sensing image and the GPS all have positional inaccuracies. According to field observations, discrepancies between remote sensing coordinates and GPS coordinates are less than 60 meters, i.e. two grid cells. This may not seem much, but it does affect the validation. Adjacent grid cells can have very different predictions, which can lead to different validation results. However, this is not a problem if the soil spatial variation within these 60 meters is small.

Another factor that should be regarded is the spatial resolution of the environmental variables. Soil properties cannot be predicted at a finer resolution than the model input, i.e. the environmental variables. Although grid cells with 30 m spatial resolution were used for the landscape unit map and for the natural vegetation map, these maps do not supply information with a spatial accuracy of 30 m (corresponding to a cartographic scale 1:30,000 (McBratney et al., 2003)). The uncertainty in these maps is larger. This makes the use of these maps for predictions with 30 m spatial resolution questionable. Predictions for a grid cell size of 50 or even 100 m would give a more realistic approach. Furthermore it should be considered if the TOA model requires input with 30 m spatial resolution.

High resolution aerial photographs would help to acquire more accurate maps of natural vegetation and other land use types and landscape units. This would give more accurate model input that can result in model output at a higher spatial resolution. But then there is also a tradeoff between the costs of high resolution data vs. the costs of a soil survey to take additional observations.

4.4 Final thoughts

4.4.1 Digital soil mapping within the TOA context

This study illustrates how difficult and challenging digital soil mapping can be in areas with very low data availability. However, it must not be forgotten that this can be considered as pioneer research to improve quantitative soil data availability for the West African Trade-Off Analysis project. Although the model results are disappointing, they are not completely surprising considering the fact that the prediction models were built from scratch with very little soil and auxiliary data that could be used to construct a qualitative conceptual model and to translate this model into a quantitative prediction model.

What does this imply for the future of quantitative soil data availability for the TOA project in West Africa? There are now four prediction models based on a qualitative conceptual framework. These models do not perform well. However, the conceptual model seems to work, although it can be improved, and the validation observations and the model residuals show spatial dependence, which means there is room for improvement. Future study of the soil data set acquired during this study might lead to a better insight into the spatial distribution of the two soil properties and their relations to the landscape and its processes. The fact that these observations were taken in transects makes them very suitable for soil-landscape modelling in a catenary context. The newly acquired data set can greatly contribute to existing knowledge about the spatial distribution of SOC and FF and their relation with soil-landscape processes and environmental variables.


Of course, the 155 soil observations could have been used in this study for model calibration. This might have led to better results. However, we chose to use them for validation because, from a scientific point of view, it is of great importance to validate model results. This is also one of the strengths of digital soil mapping: it allows the accuracy of the predicted soil (property) maps to be quantified.

DEMs do not always have the quality we wish. Use of a DEM with a higher radiometric resolution, with which subtle terrain features can be better described, and of high-resolution aerial photographs can decrease the uncertainty in input data. In addition, a high-resolution DEM would also make application of more sophisticated soil erosion and sedimentation models possible, e.g. Schoorl (2002). The use of high-resolution input allows a more accurate assessment of the correlation between soil-landscape processes, environmental variables and the soil properties.

In short, knowledge gained from the newly acquired soil observations, combined with high-resolution input data might lead to a better foundation for the qualitative conceptual model of spatial distribution of soil properties and a better supported translation into quantitative prediction models.

4.4.2 Digital soil mapping in Africa

McBratney et al. (2003) made an inventory of quantitative studies in which soil classes or properties were spatially predicted. They listed 66 studies of which only three were located in developing countries: two in Asia and one in Africa. This is a striking difference between applications in the western world and the developing world.

Digital soil mapping techniques use various data sources that are in general widely available and easily accessible in the western world. These data sources range from national soil databases to digital elevation models, photographs, (hyperspectral) imagery and other data from airborne, spaceborne and proximal sensors, including more sophisticated methods like airborne gamma radiometrics for capturing pedological and parent material differences (McKenzie and Gallant, 2004). Digital soil mapping can indeed be a powerful tool if soil and auxiliary environmental data are available.

For the Nioro du Rip area, however, available soil and environmental data are limited. The existing data are almost never available in digital format and are scattered. More sophisticated techniques for (high resolution) data acquisition like airborne remote sensing for acquisition of aerial photographs, hyperspectral images or for gamma radiometrics are not easily available. Meerkerk (2003) lists the available soil data for Senegal. Maps on local scale are in general old, inaccurate and based on different classification systems and legends. Maps on national scale (1:500,000 by Stancioff et al., 1984 and 1:1,000,000 by Audry et al., 1965), property of the International Soil Reference and Information Center (ISRIC) were not available at the time this study was carried out. When digitized, these maps might prove a valuable information source for digital soil mapping in the Nioro du Rip area in the future.

All these factors make digital soil mapping in Africa much more challenging than mapping in the developed world. But there are also more opportunities for digital soil mapping. Rapid growth of the population will put more and more pressure on the environment and agriculture. The need for quantitative (soil) data will grow in order to work towards a sustainable future. There is less soil data available compared to developed countries so there is much to gain on this terrain. Still, the new advances in soil science, including digital soil mapping seem limited to the western world according to the inventory by McBratney et al. (2003). So far, countries in Africa do not seem to profit from the achievements in science and technology.


5. Conclusions

In this study we applied digital soil mapping to map the spatial distribution of soil organic carbon and fine fraction contents within a catenary context for a study area in the southwest of Senegal. The aim was to develop a quantitative prediction model in which qualitative knowledge about soil-landscape and environmental processes that affect the spatial distribution of the two soil properties within the catena was incorporated and subsequently to assess the model performance.

5.1 Model design and spatial prediction

The presented results show that the developed quantitative soil prediction models performed very poorly. All results were biased and uncertainties were large. The fit between observed and predicted values was very low. The variance of the residuals was, for all models except SOC-model 2, larger than the variance of the validation observations. Furthermore, the residuals of all four models still showed spatial dependence.

The bias was caused by the difference in SOC and FF contents between the calibration and validation dataset. The SOC contents of the validation set were larger than the contents of the calibration set which resulted in systematic underestimation. The FF contents of the validation set were lower than the contents of the calibration set leading to systematic overestimation.

SOC-model 2 yielded relatively the best predictions based on bias and uncertainty. The variance of the residuals was smaller than the variance of the validation observations and the range of spatial dependence of the residuals was smaller than the range of the observations. This model performed significantly better than SOC-model 1. The incorporation of the influence of natural vegetation on the SOC content improved the model performance. Statistical analysis of the validation observations showed that the SOC content of the topsoil under natural vegetation was significantly larger than the SOC under non-natural vegetation.

FF-model 2 had slightly lower bias and MSPE than FF-model 1. However, the difference was not significant. The assumption that the red soils in the larger part of the landscape have a lower FF content than the gray brown soils in the lower parts of the landscape, which formed the basis of the conceptual model of FF-model 2, can be rejected. The validation observations showed no significant difference in FF between the red and gray brown soils. Both models added variance to the spatial distribution of the FF content instead of explaining variance.

It was a precondition of this study to develop the prediction models without additional soil data acquisition. It can be concluded that the available soil data were clearly not sufficient for application of digital soil mapping in the Nioro du Rip area. The lack of sufficient data was the main reason for the poor model performance. With the available data it was difficult to:

• identify key soil-landscape processes that determine the spatial distribution of SOC and FF within and between catena positions,

• support the assumptions on which the qualitative conceptual model of spatial distribution of SOC and FF is based,

• translate the qualitative conceptual model to a quantitative prediction model using environmental correlation.

Other reasons were:
• the use of a DEM with coarse radiometric resolution in a landscape with subtle relief differences,
• uncertainties in the input data generated from the DEM and the Landsat 7 ETM+ image,
• low SOC contents (< 1%) in general and small differences between catena positions, which digital soil mapping techniques might not be sensitive enough to predict from environmental variables.

Although the catena concept is widely applied in soil science research, this study shows that it is not self-evident that a catena indeed exists within a landscape, nor that it can be mapped easily.

5.2 Spatial distribution of SOC and FF

The 155 validation observations gave much better insight into the spatial distribution of SOC and FF within the catena concept. The average SOC content in the study area is estimated at 0.54% with a standard error of 0.02%. The largest SOC contents are found in the bas-fonds, followed by the plateaus. The glacis have the lowest average SOC content. The difference between the landscape units is significant. The spread is large: the values found range from 0.31% to 1.58%. There is substantial short-range, within-cluster spatial variation. The range of spatial dependence is only 960 meters, and the semivariogram already reaches 50% of the sill value at 215 meters.
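The relation between the range and the distance at which half of the sill is reached depends on the shape of the fitted semivariogram model, which is not restated in this chapter. As a hedged illustration only, the sketch below assumes an exponential model without nugget whose practical range (the distance at which about 95% of the sill is reached) equals the reported range; the function names are hypothetical.

```python
import math


def exponential_semivariance(h, sill, practical_range):
    """Exponential semivariogram without nugget (an assumed model form).

    The distance parameter is practical_range / 3, so that about 95% of
    the sill is reached at the practical range.
    """
    if h <= 0:
        return 0.0
    a = practical_range / 3.0
    return sill * (1.0 - math.exp(-h / a))


def distance_at_sill_fraction(fraction, practical_range):
    """Lag distance at which the assumed model reaches a given fraction of the sill."""
    a = practical_range / 3.0
    return -a * math.log(1.0 - fraction)


# Reported range of spatial dependence for SOC: 960 m.
half_sill_distance = distance_at_sill_fraction(0.5, 960.0)
print(round(half_sill_distance, 1))
```

Under these assumptions the half-sill distance comes out at roughly 222 m, of the same order as the 215 m reported above. The small difference suggests that the fitted model was not exactly of this form, for example because it included a nugget.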

The FF content in the study area averages 11.36% with a standard error of 0.38% and ranges from 5% to 32%. Again the largest contents are found in the bas-fonds, followed by the plateaus and the glacis. This is similar to the trend shown by the SOC content and can be explained by the correlation between the two soil properties (r² = 0.61): SOC binds to the fine fraction of the soil. The difference in FF content between the three landscape units is statistically significant. The spatial distribution of the FF content within clusters appears somewhat more homogeneous than that of the SOC content, although there is still considerable short-range variation. The range of spatial dependence is 1130 meters. The semivariogram reaches 50% of the sill value at 260 meters.
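The coefficient of determination quoted above can be computed as the squared Pearson correlation between paired SOC and FF observations. The sketch below does this for the five samples of transect 14 in Appendix A. It is an illustration of the calculation only: five points from one transect do not reproduce the value of 0.61 found for the full validation set, and the helper function name is hypothetical.

```python
import math

# Fine fraction (%) and SOC (%) of transect 14, taken from Appendix A.
ff = [10.0, 11.0, 15.5, 21.5, 17.5]
soc = [0.40, 0.45, 1.14, 1.53, 0.84]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_squared = pearson_r(ff, soc) ** 2
print(round(r_squared, 2))
```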

5.3 Concluding remarks and recommendations

Although the model results are disappointing, the concept underlying the qualitative conceptual model can serve as a framework for future improvement of both the conceptual and the prediction models. The newly acquired soil data set of 155 observations, combined with high-resolution and accurate environmental data, can form a solid basis for a better understanding of the spatial distribution of the soil properties. This should result in improved prediction models. We therefore recommend that digital soil mapping research in the TOA context be continued along the following lines:

• It is essential to acquire a more elaborate and more accurate set of environmental data to improve predictions. Aerial photographs, a DEM with a high radiometric resolution, hyperspectral remote sensing images and general purpose soil maps (which will again be available at ISRIC in digital format in the near future) will prove very useful. Acquisition can be a problem, but without additional data it will be difficult to improve the conceptual and prediction models.
• Aerial photographs will clearly show the plateaus and natural vegetation areas. They can be used to improve landscape unit, natural vegetation and land use mapping.
• Perform a thorough analysis of the new soil data set with respect to the spatial distribution of the two soil properties and their relationship to environmental variables, and use these relationships to replace expert judgement.
• Extrapolate the prediction models, if predictions are successful, to other TOA study areas in similar landscapes.

This study showed how challenging digital soil mapping can be in a country where soil and environmental data are limited and where the available data are scattered. Until now, Africa does not seem to have profited from the recent advances in soil mapping techniques and data acquisition technology.


References

Allison, R.J., 1991. Slopes and slope processes. Progress in Physical Geography, 15(4): 423-437.
Applegarth, M.T. and Dahms, D.E., 2001. Soil catenas of calcareous tills, Whiskey Basin, Wyoming, USA. CATENA, 42(1): 17-38.
ArcInfo Version 9, 2004. Redlands, CA: ESRI Inc.
Audry, P., Bonfils, P., Charreau, C., Dubois, J., Fauck, R., Faure, J., Gavaud, M., Maignien, R., Maymard, J., Peirera-Barreto, S., Turenne, J.F. and Vizier, J.F., 1965. Carte pédologique du Sénégal à l'échelle de 1:1 000 000.
Baumgardner, M.F., Kristoff, S.J., Johannsen, C.J. and Zachary, A.L., 1970. The effect of organic matter on multispectral properties of soils. Proceedings of the Indian Academy of Science, 79: 413-429.
Ben-Dor, E., 2002. Quantitative remote sensing of soil properties. Advances in Agronomy, 75: 173-243.
Brus, D.J. and Kiestra, E., 2002. Kan de efficientie van bodemkarteringen op schaal 1:10 000 worden vergroot met het Actuele Hoogtebestand Nederland? Alterra-rapport, 498, Alterra, Wageningen, 54 pp.
De Bruin, S. and Stein, A., 1998. Soil-landscape modelling using fuzzy c-means clustering of attribute data derived from a Digital Elevation Model (DEM). Geoderma, 83(1-2): 17-33.
De Gruijter, J.J., Brus, D.J., Bierkens, M.F.P. and Knotters, M., in press. Sampling for natural resource monitoring. Springer, 328 pp.
Dijkerman, J.C. and Miedema, R., 1988. An Ustult-Aquult-Tropept catena in Sierra Leone, West Africa, I. Characteristics, genesis and classification. Geoderma, 42(1): 1-27.
Dobos, E., Micheli, E., Baumgardner, M.F., Biehl, L. and Helt, T., 2000. Use of combined digital elevation model and satellite radiometric data for regional soil mapping. Geoderma, 97(3-4): 367-391.
Dobos, E., Montanarella, L., Negre, T. and Micheli, E., 2001. A regional scale soil mapping approach using integrated AVHRR and DEM data. International Journal of Applied Earth Observation and Geoinformation, 3(1): 30-42.
Driessen, P.M. and Dudal, R.E., 1991. The major soils of the world. Agricultural University Wageningen, Department of Soil Science and Geology, in association with: Catholic University Leuven, Institute for Land & Water Management, Wageningen, 310 pp.
Erdas Imagine Version 8.7, 2003. Leica Geosystems GIS & Mapping LLC.
Florinsky, I.V., Eilers, R.G., Manning, G.R. and Fuller, L.G., 2002. Prediction of soil properties by digital terrain modelling. Environmental Modelling & Software, 17(3): 295-311.
Gessler, P.E., Moore, I.D., McKenzie, N.J. and Ryan, P.J., 1995. Soil-landscape modelling and spatial prediction of soil attributes. International Journal of Geographic Information Systems, 9(4): 421-432.
Hengl, T., Heuvelink, G.B.M. and Stein, A., 2004. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma, 120(1-2): 75-93.
Hengl, T., Rossiter, D.G. and Stein, A., 2003. Soil sampling strategies for spatial prediction by correlation with auxiliary maps. Australian Journal of Soil Research, 41: 1403-1422.
Heuvelink, G.B.M. and Bierkens, M.F.P., 1992. Combining soil maps with interpolations from point observations to predict quantitative soil properties. Geoderma, 55(1-2): 1-15.
Heuvelink, G.B.M., Brus, D.J. and De Gruijter, J.J., 2004. Optimization of sample configurations for digital soil mapping with universal kriging. Global Workshop on Digital Soil Mapping, Montpellier, pp. 13.
Heuvelink, G.B.M. and Webster, R., 2001. Modelling soil variation: past, present, and future. Geoderma, 100(3-4): 269-301.
Isaaks, E.H. and Srivastava, R.M., 1989. Applied Geostatistics. Oxford University Press, Inc., New York, USA, 561 pp.
Jenny, H., 1941. Factors of soil formation, A system of quantitative pedology. McGraw-Hill, New York, 281 pp.
Kariuki, P.C., Woldai, T. and van der Meer, F., 2004. The role of remote sensing in mapping swelling soils. Asian Journal of Geoinformatics, 5(1).
Lagacherie, P. and Voltz, M., 2000. Predicting soil properties over a region using sample information from a mapped reference area and digital elevation data: a conditional probability approach. Geoderma, 97(3-4): 187-208.
Lark, R.M., 2002. Optimized spatial sampling of soil for estimation of the variogram by maximum likelihood. Geoderma, 105(1-2): 49-80.
Lillesand, T.M. and Kiefer, R.W., 2000. Remote sensing and image interpretation. John Wiley & Sons, Inc., New York, USA, 724 pp.
Manlay, R.J., Chotte, J.-L., Masse, D., Laurent, J.-Y. and Feller, C., 2002a. Carbon, nitrogen and phosphorus allocation in agro-ecosystems of a West African savanna: III. Plant and soil components under continuous cultivation. Agriculture, Ecosystems & Environment, 88(3): 249-269.
Manlay, R.J., Kaire, M., Masse, D., Chotte, J.-L., Ciornei, G. and Floret, C., 2002b. Carbon, nitrogen and phosphorus allocation in agro-ecosystems of a West African savanna: I. The plant component under semi-permanent cultivation. Agriculture, Ecosystems & Environment, 88(3): 215-232.
McBratney, A.B., Mendonca Santos, M.L. and Minasny, B., 2003. On digital soil mapping. Geoderma, 117(1-2): 3-52.
McBratney, A.B., Odeh, I.O.A., Bishop, T.F.A., Dunbar, M.S. and Shatar, T.M., 2000. An overview of pedometric techniques for use in soil survey. Geoderma, 97(3-4): 293-327.
McKenzie, N.J. and Gallant, J., 2004. Digital soil mapping with improved environmental predictors and models of pedogenesis. Global Workshop on Digital Soil Mapping, Montpellier, France, pp. 25.
McKenzie, N.J. and Ryan, P.J., 1999. Spatial prediction of soil properties using environmental correlation. Geoderma, 89(1-2): 67-94.
Meerkerk, A., 2003. Soil and land use study of the Peanut Basin at Nioro, Senegal. Internship Report, Laboratory of Soil Science and Geology, Wageningen University, Wageningen, The Netherlands, 26 pp.
Milne, G., 1935. Composite units for the mapping of complex soil associations. In: Transactions of the Third International Congress of Soil Science, Oxford, England, T. Murby & Co., London, pp. 345-347.
Milne, G., 1936. Normal erosion as a factor in soil profile development. Nature, 138: 548-549.
Niang, A., 2004. Organic matter stocks under different types of land use in the Peanut Basin of the Nioro area in Senegal. MSc Thesis, Laboratory of Soil Science and Geology, Wageningen University, Wageningen, The Netherlands, 84 pp.
Odeh, I.O.A. and McBratney, A.B., 2000. Using AVHRR images for spatial prediction of clay content in the lower Namoi Valley of eastern Australia. Geoderma, 97(3-4): 237-254.
Odeh, I.O.A., McBratney, A.B. and Chittleborough, D.J., 1994. Spatial prediction of soil properties from landform attributes derived from a digital elevation model. Geoderma, 63(3-4): 197-214.
Odeh, I.O.A., McBratney, A.B. and Chittleborough, D.J., 1995. Further results on prediction of soil properties from terrain attributes: heterotopic cokriging and regression-kriging. Geoderma, 67(3-4): 215-226.
Pebesma, E.J. and Wesseling, C.G., 1998. Gstat: a program for geostatistical modelling, prediction and simulation. Computers & Geosciences, 24(1): 17-31.
Pieri, C., 1969. Étude pédologique de la région de Nioro du Rip. ISRA-CNRA, Bambey, Senegal, 134 pp.
Schoorl, J.M., 2002. Addressing the Multi-scale Lapsus of Landscape. PhD Dissertation, Laboratory of Soil Science and Geology, Wageningen University, Wageningen, The Netherlands, 172 pp.
Stancioff, A., Staljanssens, M. and Tappan, G., 1984. Mapping and remote sensing of the resources of the Republic of Senegal. A study of the geology, hydrology, soils, vegetation and land use potential. The Remote Sensing Institute, South Dakota State University, Brookings, USA, 655 pp.
SPSS for Windows, Rel. 11.5, 2001. Chicago: SPSS Inc.
Van Breemen, N. and Buurman, P., 1998. Soil Formation. Kluwer Academic Publishers, Dordrecht, The Netherlands, 377 pp.
Van Groenigen, J.W., 2000. The influence of variogram parameters on optimal sampling schemes for mapping by kriging. Geoderma, 97(3-4): 223-236.
Walter, C., Lagacherie, P. and Follain, S., 2004. Integrating pedological knowledge into soil digital mapping. Global Workshop on Digital Soil Mapping, Montpellier, France, pp. 20.


Appendices

Appendix A The soil data set collected during the fieldwork in 2005

Appendix B Summary statistics of the soil organic carbon and fine fraction contents for the 32 sampled clusters

Appendix C Scatterplots showing the observed vs. the predicted value for the three landscape positions

Appendix D Summary statistics of the PEs and SPEs of the two models for the 32 sampled clusters


Appendix A

The soil data set collected during the fieldwork in 2005

Transect | Sample | GPS-N | GPS-E | Landscape position | Land use | Org. C (%) | Fine fraction (%) | Soil color, dry | Soil color, wet
1 | 1 | 1537375 | 415164 | Glacis | Groundnut | 0.44 | 10.5 | 7.5 YR 5/4 | 7.5 YR 3/4
1 | 2 | 1537371 | 415348 | Glacis | Millet | 0.42 | 14.5 | 7.5 YR 5/4 | 7.5 YR 3/4
1 | 3 | 1537376 | 415521 | Glacis | Natural Vegetation | 0.58 | 10.5 | 7.5 YR 6/4 | 7.5 YR 3/4
1 | 4 | 1537374 | 415705 | Glacis | Millet | 0.63 | 13.0 | 7.5 YR 6/4 | 7.5 YR 3/4
1 | 5 | 1537374 | 415888 | Glacis | Maize | 0.61 | 14.5 | 7.5 YR 6/4 | 7.5 YR 3/4
2 | 1 | 1544424 | 418738 | Glacis | Sorghum | 0.36 | 9.0 | 10 YR 5/3 | 10 YR 3/4
2 | 2 | 1544424 | 418915 | Glacis | Sorghum | 0.36 | 9.0 | 10 YR 5/4 | 10 YR 3/4
2 | 3 | 1544423 | 419096 | Bas-fond | Natural Vegetation | 0.57 | 12.5 | 10 YR 5/3 | 10 YR 3/3
2 | 4 | 1544424 | 419274 | Bas-fond | Millet | 0.60 | 12.0 | 10 YR 5/3 | 10 YR 3/3
2 | 5 | 1544423 | 419456 | Glacis | Groundnut | 0.32 | 10.5 | 10 YR 6/4 | 10 YR 4/4
3 | 1 | 1531403 | 415587 | Glacis | Natural Vegetation | 0.41 | 9.5 | 7.5 YR 6/2 | 7.5 YR 4/2
3 | 2 | 1531582 | 415585 | Glacis | Natural Vegetation | 0.47 | 13.5 | 10 YR 6/3 | 10 YR 4/2
3 | 3 | 1531764 | 415589 | Bas-fond | Natural Vegetation | 0.73 | 13.5 | 10 YR 6/2 | 10 YR 4/2
3 | 4 | 1531945 | 415583 | Bas-fond | Groundnut | 0.39 | 8.5 | 10 YR 5/3 | 10 YR 4/2
3 | 5 | 1532122 | 415585 | Bas-fond | Groundnut | 0.43 | 9.5 | 10 YR 5/4 | 10 YR 3/3
4 | 1 | 1523454 | 397677 | Glacis | Millet | 0.49 | 10.5 | 7.5 YR 5/2 | 7.5 YR 4/2
4 | 2 | 1523633 | 397676 | Glacis | Millet | 0.58 | 10.0 | 7.5 YR 5/2 | 7.5 YR 4/2
4 | 3 | 1523812 | 397677 | Glacis | Groundnut | 0.81 | 15.0 | 7.5 YR 5/2 | 7.5 YR 4/2
4 | 4 | 1523994 | 397678 | Glacis | Groundnut | 0.62 | 10.5 | 10 YR 5/4 | 10 YR 3/3
4 | 5 | 1524176 | 397677 | Glacis | Groundnut | 0.43 | 6.5 | 10 YR 5/3 | 10 YR 4/3
5 | 1 | 1524294 | 417052 | Glacis | Natural Vegetation | 0.77 | 12.5 | 10 YR 6/3 | 10 YR 3/3
5 | 2 | 1524474 | 417053 | Glacis | Maize | 0.40 | 13.0 | 10 YR 6/3 | 10 YR 4/3
5 | 3 | 1524654 | 417053 | Glacis | Millet | 0.44 | 11.0 | 10 YR 5/4 | 10 YR 3/3
5 | 4 | 1524831 | 417054 | Glacis | Millet (+nat veg) | 0.56 | 12.0 | 10 YR 5/4 | 10 YR 3/3
5 | 5 | 1525015 | 417057 | Glacis | Natural Vegetation | 0.59 | 10.0 | 10 YR 5/2 | 10 YR 3/3
6 | 1 | 1523213 | 418825 | Glacis | Millet | 0.53 | 10.0 | 7.5 YR 5/4 | 7.5 YR 3/4
6 | 2 | 1523217 | 419005 | Glacis | Millet | 0.55 | 12.0 | 7.5 YR 6/2 | 7.5 YR 4/2
6 | 3 | 1523212 | 419184 | Glacis | Millet | 0.51 | 10.0 | 10 YR 6/3 | 10 YR 4/3
6 | 4 | 1523214 | 419365 | Glacis | Millet | 0.53 | 17.0 | 10 YR 5/4 | 10 YR 4/3
6 | 5 | 1523213 | 419546 | Glacis | Millet / Sorghum | 1.38 | 32.0 | 10 YR 5/2 | 10 YR 3/3
7 | 1 | 1545955 | 406072 | Bas-fond | Cattle Corral | 0.54 | 15.0 | 10 YR 5/3 | 10 YR 3/3
7 | 2 | 1546134 | 406073 | Bas-fond | Millet / Cattle Corral | 0.62 | 7.0 | 10 YR 5/3 | 10 YR 3/3
7 | 3 | 1546314 | 406076 | Bas-fond | Millet / Cattle Corral | 0.39 | 8.0 | 10 YR 5/3 | 10 YR 3/3
7 | 4 | 1546496 | 406075 | Glacis | Millet | 0.47 | 9.0 | 10 YR 5/3 | 10 YR 3/3
7 | 5 | 1546674 | 406076 | Glacis | Millet | 0.34 | 5.0 | 10 YR 5/4 | 10 YR 3/4
8 | 1 | 1536353 | 405200 | Plateau | Millet | 0.42 | 8.5 | 7.5 YR 5/4 | 7.5 YR 4/4
8 | 2 | 1536353 | 405385 | Glacis | Fallow | 0.44 | 9.0 | 7.5 YR 5/4 | 7.5 YR 4/4
8 | 3 | 1536353 | 405566 | Glacis | Groundnut | 0.41 | 9.5 | 7.5 YR 5/4 | 7.5 YR 3/4
8 | 4 | 1536355 | 405746 | Plateau | Millet | 0.39 | 10.0 | 7.5 YR 5/4 | 7.5 YR 3/4
8 | 5 | 1536353 | 405925 | Plateau | Millet | 0.64 | 9.0 | 7.5 YR 5/4 | 7.5 YR 3/4
9 | 1 | 1524205 | 397826 | Glacis | Groundnut | 0.53 | 9.0 | 10 YR 5/3 | 10 YR 4/3
9 | 2 | 1524382 | 397827 | Glacis | Groundnut | 0.46 | 8.5 | 7.5 YR 6/2 | 7.5 YR 4/2
9 | 3 | 1524563 | 397824 | Glacis | Groundnut | 0.45 | 9.0 | 7.5 YR 6/3 | 7.5 YR 3/4
9 | 4 | 1524746 | 397826 | Glacis | Groundnut | 0.51 | 8.5 | 10 YR 5/3 | 10 YR 4/3
9 | 5 | 1524924 | 397826 | Glacis | Groundnut | 0.38 | 10.5 | 10 YR 5/3 | 10 YR 4/3
10 | 1 | 1543224 | 422065 | Glacis | Maize | 0.54 | 9.0 | 7.5 YR 4/6 | 7.5 YR 3/4
10 | 2 | 1543403 | 422066 | Glacis | Fallow | 0.45 | 13.0 | 7.5 YR 5/4 | 7.5 YR 3/4
10 | 3 | 1543583 | 422063 | Glacis | Millet | 0.48 | 8.5 | 7.5 YR 5/4 | 7.5 YR 3/4
10 | 4 | 1543764 | 422067 | Glacis | Millet | 0.54 | 9.5 | 7.5 YR 5/4 | 7.5 YR 3/4
10 | 5 | 1543944 | 422065 | Glacis | Maize | 0.58 | 14.5 | 7.5 YR 5/4 | 7.5 YR 3/4
11 | 1 | 1540647 | 420471 | Plateau | Millet | 0.48 | 13.5 | 7.5 YR 6/4 | 7.5 YR 3/4
11 | 2 | 1540644 | 420655 | Glacis | Millet | 0.41 | 9.0 | 7.5 YR 5/4 | 7.5 YR 3/4
11 | 3 | 1540643 | 420834 | Glacis | Millet | 0.63 | 10.0 | 7.5 YR 5/4 | 7.5 YR 3/4
11 | 4 | 1540644 | 421014 | Glacis | Millet | 0.52 | 8.5 | 7.5 YR 5/2 | 7.5 YR 4/2
11 | 5 | 1540646 | 421193 | Glacis | Millet | 0.49 | 9.5 | 10 YR 5/3 | 10 YR 3/3
12 | 1 | 1544994 | 402296 | Plateau | Millet | 0.56 | 8.0 | 7.5 YR 5/4 | 7.5 YR 3/4
12 | 2 | 1545173 | 402296 | Plateau | Millet | 0.46 | 8.5 | 7.5 YR 5/4 | 7.5 YR 3/4
12 | 3 | 1545355 | 402298 | Plateau | Sorghum | 0.59 | 10.0 | 7.5 YR 5/3 | 7.5 YR 4/3
12 | 4 | 1545533 | 402295 | Plateau | Natural Vegetation | 0.39 | 10.5 | 7.5 YR 6/4 | 7.5 YR 4/4
12 | 5 | 1545714 | 402297 | Glacis | Natural Vegetation | 0.44 | 10.5 | 7.5 YR 6/2 | 7.5 YR 4/2
13 | 1 | 1544966 | 422485 | Bas-fond | Natural Vegetation | 0.81 | 16.5 | 10 YR 5/2 | 10 YR 3/2
13 | 2 | 1545144 | 422485 | Bas-fond | Natural Vegetation | 0.42 | 9.5 | 10 YR 5/3 | 10 YR 4/3
13 | 3 | 1545323 | 422486 | Bas-fond | Natural Vegetation | 0.34 | 7.5 | 10 YR 6/3 | 10 YR 4/4
13 | 4 | 1545506 | 422486 | Glacis | Groundnut | 0.33 | 7.5 | 10 YR 6/3 | 10 YR 3/4
13 | 5 | 1545684 | 422484 | Glacis | Natural Vegetation | 0.62 | 10.5 | 10 YR 5/3 | 10 YR 3/3
14 | 1 | 1542325 | 411567 | Glacis | Groundnut | 0.40 | 10.0 | 10 YR 6/3 | 10 YR 4/3
14 | 2 | 1542506 | 411567 | Glacis | Millet | 0.45 | 11.0 | 10 YR 5/3 | 10 YR 3/3
14 | 3 | 1542682 | 411567 | Bas-fond | Natural Vegetation | 1.14 | 15.5 | 10 YR 5/2 | 10 YR 3/2
14 | 4 | 1542865 | 411567 | Bas-fond | Natural Vegetation | 1.53 | 21.5 | 10 YR 4/1 | 10 YR 3/1
14 | 5 | 1543042 | 411563 | Bas-fond | Natural Vegetation | 0.84 | 17.5 | 10 YR 5/2 | 10 YR 3/2
15 | 1 | 1540731 | 397105 | Bas-fond | Millet | 0.40 | 10.0 | 10 YR 6/3 | 10 YR 3/3
15 | 2 | 1540734 | 397289 | Bas-fond | Millet | 0.48 | 16.5 | 10 YR 6/2 | 10 YR 4/2
15 | 3 | 1540735 | 397486 | Bas-fond | Fallow | 0.42 | 12.5 | 10 YR 6/2 | 10 YR 4/2
15 | 4 | 1540734 | 397643 | Bas-fond | Maize | 0.58 | 9.0 | 10 YR 6/2 | 10 YR 4/2
16 | 1 | 1546523 | 407874 | Bas-fond | Millet | 0.35 | 8.5 | 10 YR 6/3 | 10 YR 4/3
16 | 2 | 1546523 | 408055 | Bas-fond | Natural Vegetation | 0.76 | 14.5 | 10 YR 5/3 | 10 YR 3/3
16 | 3 | 1546524 | 408238 | Bas-fond | Natural Vegetation | 0.59 | 10.5 | 10 YR 5/3 | 10 YR 3/3
16 | 4 | 1546524 | 408414 | Glacis | Millet | 0.36 | 8.0 | 10 YR 5/3 | 10 YR 4/3
16 | 5 | 1546524 | 408595 | Glacis | Millet | 0.31 | 6.0 | 10 YR 5/3 | 10 YR 4/3
17 | 1 | 1540011 | 424164 | Glacis | Groundnut | 0.45 | 8.5 | 10 YR 5/3 | 10 YR 3/3
17 | 2 | 1540012 | 424324 | Bas-fond | Tomato | 0.91 | 18.5 | 10 YR 5/2 | 10 YR 3/2
17 | 3 | 1540014 | 424526 | Bas-fond | Groundnut | 0.46 | 12.0 | 10 YR 6/3 | 10 YR 4/2
17 | 4 | 1540013 | 424702 | Bas-fond | Millet | 0.31 | 8.5 | 10 YR 5/3 | 10 YR 4/3
17 | 5 | 1540014 | 424884 | Bas-fond | Groundnut | 0.39 | 9.0 | 10 YR 5/3 | 10 YR 3/3
18 | 1 | 1538153 | 411625 | Bas-fond | Groundnut | 0.37 | 7.5 | 10 YR 6/2 | 10 YR 4/2
18 | 2 | 1538022 | 411745 | Bas-fond | Millet | 0.40 | 11.0 | 10 YR 6/2 | 10 YR 4/2
18 | 3 | 1537828 | 411777 | Bas-fond | Millet | 0.50 | 8.5 | 10 YR 6/2 | 10 YR 4/2
18 | 4 | 1537639 | 411776 | Bas-fond | Maize | 0.59 | 15.0 | 10 YR 6/2 | 10 YR 4/2
18 | 5 | 1537430 | 411782 | Glacis | Groundnut | 0.49 | 7.0 | 7.5 YR 6/2 | 10 YR 4/2
19 | 1 | 1532815 | 408858 | Glacis | Groundnut | 0.42 | 11.0 | 10 YR 6/3 | 10 YR 4/3
19 | 2 | 1532813 | 409046 | Glacis | Groundnut | 0.31 | 9.5 | 10 YR 6/3 | 10 YR 4/3
19 | 3 | 1532815 | 409228 | Bas-fond | Natural Vegetation | 0.64 | 15.5 | 10 YR 6/3 | 10 YR 4/2
19 | 4 | 1532813 | 409405 | Bas-fond | Groundnut | 0.56 | 10.5 | 10 YR 6/3 | 10 YR 3/3
19 | 5 | 1532814 | 409594 | Bas-fond | Groundnut | 0.68 | 11.0 | 10 YR 6/3 | 10 YR 3/3
20 | 1 | 1536984 | 392695 | Bas-fond | Millet | 0.38 | 9.0 | 10 YR 5/3 | 10 YR 3/3
20 | 2 | 1536987 | 392875 | Bas-fond | Millet | 0.89 | 12.0 | 10 YR 5/2 | 10 YR 3/2
20 | 3 | 1536984 | 393055 | Bas-fond | Millet | 0.65 | 13.0 | 10 YR 5/2 | 10 YR 3/3
20 | 4 | 1536984 | 393234 | Bas-fond | Sorghum | 0.96 | 15.5 | 10 YR 5/2 | 10 YR 3/2
20 | 5 | 1536984 | 393415 | Bas-fond | Natural Vegetation | 1.27 | 21.0 | 10 YR 5/2 | 10 YR 3/2
21 | 1 | 1536800 | 405924 | Plateau | Groundnut | 0.56 | 15.5 | 7.5 YR 5/4 | 7.5 YR 3/4
21 | 2 | 1536986 | 405926 | Glacis | Millet | 0.51 | 11.5 | 7.5 YR 5/4 | 7.5 YR 3/4
21 | 3 | 1537164 | 405926 | Glacis | Groundnut | 0.42 | 10.0 | 7.5 YR 5/4 | 7.5 YR 3/4
21 | 4 | 1537345 | 405926 | Glacis | Millet | 0.45 | 7.5 | 7.5 YR 5/4 | 7.5 YR 3/4
21 | 5 | 1537521 | 405927 | Glacis | Millet | 0.39 | 8.5 | 7.5 YR 5/4 | 7.5 YR 3/4
22 | 1 | 1537733 | 423867 | Glacis | Millet | 0.36 | 9.0 | 7.5 YR 5/4 | 7.5 YR 4/6
22 | 2 | 1537913 | 423865 | Glacis | Millet | 0.40 | 8.5 | 7.5 YR 5/4 | 7.5 YR 4/6
22 | 3 | 1538093 | 423866 | Glacis | Groundnut | 0.53 | 11.5 | 7.5 YR 5/4 | 7.5 YR 3/4
22 | 4 | 1538273 | 423864 | Glacis | Millet | 0.46 | 7.5 | 7.5 YR 5/4 | 7.5 YR 3/4
22 | 5 | 1538455 | 423864 | Glacis | Millet | 0.46 | 10.0 | 7.5 YR 5/3 | 7.5 YR 4/3
23 | 1 | 1524085 | 407936 | Glacis | Maize | 0.46 | 15.0 | 7.5 YR 5/6 | 7.5 YR 4/4
23 | 2 | 1524087 | 408116 | Glacis | Millet | 0.56 | 14.5 | 7.5 YR 5/4 | 7.5 YR 4/4
23 | 3 | 1524086 | 408294 | Glacis | Groundnut / Millet | 0.74 | 15.5 | 7.5 YR 5/4 | 7.5 YR 4/2
23 | 4 | 1524086 | 408473 | Glacis | Maize | 0.59 | 13.5 | 7.5 YR 5/4 | 7.5 YR 3/4
23 | 5 | 1524083 | 408655 | Glacis | Natural Vegetation | 0.39 | 8.0 | 7.5 YR 5/4 | 7.5 YR 4/4
24 | 1 | 1534103 | 393805 | Glacis | Millet | 0.50 | 10.5 | 10 YR 5/3 | 10 YR 3/3
24 | 2 | 1534282 | 393805 | Glacis | Groundnut | 0.33 | 8.5 | 10 YR 5/2 | 10 YR 4/2
24 | 3 | 1534466 | 393806 | Glacis | Millet | 0.43 | 7.0 | 10 YR 6/3 | 10 YR 4/3
24 | 4 | 1534644 | 393807 | Glacis | Millet | 0.40 | 9.0 | 10 YR 5/3 | 10 YR 3/3
24 | 5 | 1534828 | 393806 | Glacis | Millet | 0.45 | 6.5 | 10 YR 5/3 | 10 YR 3/3
26 | 1 | 1538664 | 409644 | Bas-fond | Millet | 0.68 | 13.5 | 10 YR 5/2 | 10 YR 3/2
26 | 2 | 1538844 | 409644 | Bas-fond | Millet | 0.43 | 8.5 | 10 YR 5/3 | 10 YR 3/3
26 | 3 | 1539027 | 409652 | Bas-fond | Groundnut | 0.48 | 8.0 | 10 YR 5/3 | 10 YR 3/3
26 | 4 | 1539204 | 409646 | Bas-fond | Groundnut | 0.49 | 12.5 | 10 YR 5/3 | 10 YR 3/3
26 | 5 | 1539384 | 409646 | Bas-fond | Millet | 0.38 | 8.5 | 10 YR 5/3 | 10 YR 4/3
27 | 1 | 1521655 | 403254 | Glacis | Millet | 0.36 | 7.5 | 10 YR 6/3 | 10 YR 4/3
27 | 2 | 1521837 | 403252 | Glacis | Fallow | 0.47 | 12.0 | 10 YR 6/3 | 10 YR 4/3
27 | 3 | 1522012 | 403252 | Glacis | Groundnut | 0.73 | 16.5 | 10 YR 5/2 | 10 YR 4/2
27 | 4 | 1522198 | 403257 | Plateau | Groundnut | 0.66 | 13.5 | 10 YR 6/3 | 10 YR 4/2
28 | 1 | 1539597 | 410667 | Bas-fond | Groundnut | 0.50 | 10.0 | 10 YR 5/3 | 10 YR 4/2
28 | 2 | 1539593 | 410847 | Bas-fond | Natural Vegetation | 0.67 | 10.5 | 10 YR 5/3 | 10 YR 3/3
28 | 3 | 1539595 | 411027 | Bas-fond | Groundnut | 0.73 | 18.0 | 10 YR 5/3 | 10 YR 3/3
28 | 4 | 1539599 | 411206 | Bas-fond | Natural Vegetation | 0.91 | 18.0 | 10 YR 5/3 | 10 YR 3/3
28 | 5 | 1539594 | 411385 | Bas-fond | Millet | 1.12 | 26.0 | 10 YR 5/2 | 10 YR 4/3
29 | 1 | 1521504 | 416362 | Glacis | Natural Vegetation | 0.61 | 7.5 | 7.5 YR 5/4 | 7.5 YR 3/4
29 | 2 | 1521682 | 416365 | Glacis | Groundnut | 0.55 | 12.5 | 7.5 YR 5/4 | 7.5 YR 4/2
29 | 3 | 1521866 | 416365 | Glacis | Natural Vegetation | 0.51 | 9.5 | 7.5 YR 5/4 | 7.5 YR 4/2
29 | 4 | 1522044 | 416365 | Glacis | Groundnut | 0.52 | 14.5 | 7.5 YR 5/4 | 7.5 YR 4/2
29 | 5 | 1522224 | 416365 | Glacis | Sorghum | 0.93 | 16.5 | 7.5 YR 5/4 | 7.5 YR 3/3
30 | 1 | 1540377 | 398002 | Glacis | Groundnut | 0.48 | 10.5 | 10 YR 5/3 | 10 YR 3/3
30 | 3 | 1540377 | 398368 | Glacis | Millet | 0.49 | 10.5 | 7.5 YR 5/4 | 7.5 YR 3/4
30 | 4 | 1540376 | 398544 | Glacis | Millet | 0.40 | 7.5 | 7.5 YR 5/4 | 7.5 YR 4/4
30 | 5 | 1540376 | 398726 | Glacis | Groundnut | 0.36 | 5.5 | 7.5 YR 5/6 | 7.5 YR 4/4
31 | 1 | 1522554 | 412556 | Plateau | Natural Vegetation | 0.66 | 15.0 | 7.5 YR 4/6 | 7.5 YR 3/4
31 | 2 | 1522735 | 412554 | Plateau | Natural Vegetation | 0.48 | 16.5 | 7.5 YR 5/6 | 7.5 YR 4/4
31 | 4 | 1523094 | 412555 | Glacis | Natural Vegetation | 0.71 | 11.5 | 7.5 YR 5/4 | 7.5 YR 4/6
31 | 5 | 1523275 | 412556 | Glacis | Natural Vegetation | 0.56 | 11.0 | 7.5 YR 5/4 | 7.5 YR 3/4
32 | 1 | 1544994 | 409606 | Bas-fond | Cattle Corral | 0.41 | 11.0 | 10 YR 6/3 | 10 YR 4/3
32 | 3 | 1545356 | 409614 | Bas-fond | Sorghum | 0.45 | 12.0 | 10 YR 6/2 | 10 YR 4/2
32 | 4 | 1545533 | 409614 | Bas-fond | Millet (+nat veg) | 0.36 | 9.0 | 10 YR 5/3 | 10 YR 4/3
32 | 5 | 1545715 | 409616 | Bas-fond | Millet (+nat veg) | 0.49 | 13.5 | 10 YR 6/3 | 10 YR 4/3
33 | 1 | 1528764 | 416180 | Glacis | Natural Vegetation | 0.70 | 17.5 | 10 YR 5/3 | 10 YR 4/3
33 | 2 | 1528944 | 416186 | Glacis | Natural Vegetation | 0.52 | 13.5 | 10 YR 5/3 | 10 YR 4/3
33 | 3 | 1529123 | 416186 | Glacis | Millet | 0.51 | 10.0 | 10 YR 5/3 | 10 YR 4/3
33 | 4 | 1529302 | 416186 | Plateau | Natural Vegetation | 0.77 | 15.0 | 7.5 YR 6/4 | 7.5 YR 3/4
33 | 5 | 1529482 | 416187 | Plateau | Natural Vegetation | 0.60 | 13.5 | 7.5 YR 5/4 | 7.5 YR 4/4


Appendix B

Summary statistics of the soil organic carbon and fine fraction contents on cluster level

Cluster | Variable | N | Range | Minimum | Maximum | Mean | Std. error of mean | Std. deviation | Variance
1 | SOC | 5 | .210 | .420 | .630 | .53600 | .04411 | .098641 | .010
2 | SOC | 5 | .280 | .320 | .600 | .44200 | .05903 | .131985 | .017
3 | SOC | 5 | .340 | .390 | .730 | .48600 | .06242 | .139571 | .019
4 | SOC | 5 | .380 | .430 | .810 | .58600 | .06516 | .145705 | .021
5 | SOC | 5 | .370 | .400 | .770 | .55200 | .06507 | .145499 | .021
6 | SOC | 5 | .870 | .510 | 1.380 | .70000 | .17012 | .380395 | .145
7 | SOC | 5 | .280 | .340 | .620 | .47200 | .05034 | .112561 | .013
8 | SOC | 5 | .250 | .390 | .640 | .46000 | .04572 | .102225 | .010
9 | SOC | 5 | .150 | .380 | .530 | .46600 | .02619 | .058566 | .003
10 | SOC | 5 | .130 | .450 | .580 | .51800 | .02332 | .052154 | .003
11 | SOC | 5 | .220 | .410 | .630 | .50600 | .03586 | .080187 | .006
12 | SOC | 5 | .200 | .390 | .590 | .48800 | .03760 | .084083 | .007
13 | SOC | 5 | .480 | .330 | .810 | .50400 | .09255 | .206954 | .043
14 | SOC | 5 | 1.130 | .400 | 1.530 | .87200 | .21292 | .476099 | .227
15 | SOC | 4 | .180 | .400 | .580 | .47000 | .04041 | .080829 | .007
16 | SOC | 5 | .450 | .310 | .760 | .47400 | .08675 | .193985 | .038
17 | SOC | 5 | .600 | .310 | .910 | .50400 | .10496 | .234691 | .055
18 | SOC | 5 | .220 | .370 | .590 | .47000 | .03912 | .087464 | .008
19 | SOC | 5 | .370 | .310 | .680 | .52200 | .06917 | .154661 | .024
20 | SOC | 5 | .890 | .380 | 1.270 | .83000 | .14983 | .335037 | .112
21 | SOC | 5 | .170 | .390 | .560 | .46600 | .03076 | .068775 | .005
22 | SOC | 5 | .170 | .360 | .530 | .44200 | .02905 | .064962 | .004
23 | SOC | 5 | .350 | .390 | .740 | .54800 | .05978 | .133679 | .018
24 | SOC | 5 | .170 | .330 | .500 | .42200 | .02818 | .063008 | .004
26 | SOC | 5 | .300 | .380 | .680 | .49200 | .05093 | .113886 | .013
27 | SOC | 4 | .370 | .360 | .730 | .55500 | .08510 | .170196 | .029
28 | SOC | 5 | .620 | .500 | 1.120 | .78600 | .10614 | .237339 | .056
29 | SOC | 5 | .420 | .510 | .930 | .62400 | .07846 | .175442 | .031
30 | SOC | 4 | .130 | .360 | .490 | .43250 | .03146 | .062915 | .004
31 | SOC | 4 | .230 | .480 | .710 | .60250 | .05138 | .102754 | .011
32 | SOC | 4 | .130 | .360 | .490 | .42750 | .02780 | .055603 | .003
33 | SOC | 5 | .260 | .510 | .770 | .62000 | .05070 | .113358 | .013


Cluster | Variable | N | Range | Minimum | Maximum | Mean | Std. error of mean | Std. deviation | Variance
1 | FF | 5 | 4.0 | 10.5 | 14.5 | 12.600 | .900 | 2.0125 | 4.050
2 | FF | 5 | 3.5 | 9.0 | 12.5 | 10.600 | .731 | 1.6355 | 2.675
3 | FF | 5 | 5.0 | 8.5 | 13.5 | 10.900 | 1.077 | 2.4083 | 5.800
4 | FF | 5 | 8.5 | 6.5 | 15.0 | 10.500 | 1.351 | 3.0208 | 9.125
5 | FF | 5 | 3.0 | 10.0 | 13.0 | 11.700 | .539 | 1.2042 | 1.450
6 | FF | 5 | 22.0 | 10.0 | 32.0 | 16.200 | 4.152 | 9.2844 | 86.200
7 | FF | 5 | 10.0 | 5.0 | 15.0 | 8.800 | 1.685 | 3.7683 | 14.200
8 | FF | 5 | 1.5 | 8.5 | 10.0 | 9.200 | .255 | .5701 | .325
9 | FF | 5 | 2.0 | 8.5 | 10.5 | 9.100 | .367 | .8216 | .675
10 | FF | 5 | 6.0 | 8.5 | 14.5 | 10.900 | 1.198 | 2.6786 | 7.175
11 | FF | 5 | 5.0 | 8.5 | 13.5 | 10.100 | .886 | 1.9812 | 3.925
12 | FF | 5 | 2.5 | 8.0 | 10.5 | 9.500 | .524 | 1.1726 | 1.375
13 | FF | 5 | 9.0 | 7.5 | 16.5 | 10.300 | 1.655 | 3.7014 | 13.700
14 | FF | 5 | 11.5 | 10.0 | 21.5 | 15.100 | 2.118 | 4.7355 | 22.425
15 | FF | 4 | 7.5 | 9.0 | 16.5 | 12.000 | 1.671 | 3.3417 | 11.167
16 | FF | 5 | 8.5 | 6.0 | 14.5 | 9.500 | 1.440 | 3.2210 | 10.375
17 | FF | 5 | 10.0 | 8.5 | 18.5 | 11.300 | 1.914 | 4.2808 | 18.325
18 | FF | 5 | 8.0 | 7.0 | 15.0 | 9.800 | 1.471 | 3.2901 | 10.825
19 | FF | 5 | 6.0 | 9.5 | 15.5 | 11.500 | 1.037 | 2.3184 | 5.375
20 | FF | 5 | 12.0 | 9.0 | 21.0 | 14.100 | 2.015 | 4.5056 | 20.300
21 | FF | 5 | 8.0 | 7.5 | 15.5 | 10.600 | 1.400 | 3.1305 | 9.800
22 | FF | 5 | 4.0 | 7.5 | 11.5 | 9.300 | .682 | 1.5248 | 2.325
23 | FF | 5 | 7.5 | 8.0 | 15.5 | 13.300 | 1.366 | 3.0537 | 9.325
24 | FF | 5 | 4.0 | 6.5 | 10.5 | 8.300 | .718 | 1.6047 | 2.575
26 | FF | 5 | 5.5 | 8.0 | 13.5 | 10.200 | 1.158 | 2.5884 | 6.700
27 | FF | 4 | 9.0 | 7.5 | 16.5 | 12.375 | 1.875 | 3.7500 | 14.063
28 | FF | 5 | 16.0 | 10.0 | 26.0 | 16.500 | 2.941 | 6.5765 | 43.250
29 | FF | 5 | 9.0 | 7.5 | 16.5 | 12.100 | 1.631 | 3.6469 | 13.300
30 | FF | 4 | 5.0 | 5.5 | 10.5 | 8.500 | 1.225 | 2.4495 | 6.000
31 | FF | 4 | 5.5 | 11.0 | 16.5 | 13.500 | 1.339 | 2.6771 | 7.167
32 | FF | 4 | 4.5 | 9.0 | 13.5 | 11.375 | .944 | 1.8875 | 3.563
33 | FF | 5 | 7.5 | 10.0 | 17.5 | 13.900 | 1.219 | 2.7249 | 7.425
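The cluster statistics in this appendix can be checked against the raw data in Appendix A. The sketch below recomputes the SOC statistics of cluster 1 from its five samples, assuming the usual sample definitions (variance with n - 1 in the denominator, standard error of the mean as standard deviation divided by the square root of n). The function name is hypothetical.

```python
import math
import statistics

# SOC (%) of the five samples of transect (cluster) 1, taken from Appendix A.
soc_cluster_1 = [0.44, 0.42, 0.58, 0.63, 0.61]


def describe(values):
    """Descriptive statistics using sample (n - 1) variance."""
    n = len(values)
    sd = statistics.stdev(values)
    return {
        "n": n,
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        "std_error": sd / math.sqrt(n),  # standard error of the mean
        "std_deviation": sd,
        "variance": statistics.variance(values),
    }


stats = describe(soc_cluster_1)
for name, value in stats.items():
    print(name, round(value, 6))
```

The recomputed mean, standard error and standard deviation agree, after rounding, with the values .53600, .04411 and .098641 listed for cluster 1 in the SOC table.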


Appendix C

Scatterplots showing the observed vs. the predicted values for the three landscape positions

Soil Organic Carbon Models

[Scatterplots of observed (%) against predicted (%) SOC content, shown as paired panels for SOC Model 1 and SOC Model 2 for each landscape position: Plateau, Glacis and Bas-fond.]

Fine Fraction Models

[Scatterplots of observed (%) against predicted (%) FF content, shown as paired panels for FF Model 1 and FF Model 2 for each landscape position: Plateau, Glacis and Bas-fond.]


Appendix D

Summary statistics of the PEs and SPEs of the two models on cluster level

Cluster Variable    N  Range    Minimum  Maximum  Mean      Std. Error Std. Deviation Variance
1       SOC PE #1   5  .251     -.170    .081     -.06220   .05543     .123944        .015
1       SOC SPE #1  5  .029     .000     .029     .01120    .00527     .011777        .000
1       SOC PE #2   5  .251     -.170    .081     -.06220   .05543     .123944        .015
1       SOC SPE #2  5  .025     .004     .029     .01620    .00479     .010710        .000
2       SOC PE #1   5  .278     -.185    .093     -.03160   .05899     .131897        .017
2       SOC SPE #1  5  .032     .002     .034     .01500    .00653     .014612        .000
2       SOC PE #2   5  .278     -.185    .093     -.03160   .05899     .131897        .017
2       SOC SPE #2  5  .032     .002     .034     .01500    .00653     .014612        .000
3       SOC PE #1   5  .395     -.377    .018     -.13520   .06981     .156089        .024
3       SOC SPE #1  5  .142     .000     .142     .03760    .02667     .059635        .004
3       SOC PE #2   5  .195     -.177    .018     -.09520   .04046     .090464        .008
3       SOC SPE #2  5  .031     .000     .031     .01540    .00673     .015043        .000
4       SOC PE #1   5  .363     -.518    -.155    -.29520   .06477     .144820        .021
4       SOC SPE #1  5  .244     .024     .268     .10380    .04411     .098642        .010
4       SOC PE #2   5  .363     -.518    -.155    -.29520   .06477     .144820        .021
4       SOC SPE #2  5  .244     .024     .268     .10380    .04411     .098642        .010
5       SOC PE #1   5  .400     -.409    -.009    -.15420   .07178     .160495        .026
5       SOC SPE #1  5  .167     .000     .167     .04440    .03126     .069895        .005
5       SOC PE #2   5  .200     -.209    -.009    -.11420   .04069     .090987        .008
5       SOC SPE #2  5  .044     .000     .044     .01980    .00862     .019267        .000
6       SOC PE #1   5  .827     -1.016   -.189    -.37640   .16035     .358556        .129
6       SOC SPE #1  5  .996     .036     1.032    .24460    .19692     .440328        .194
6       SOC PE #2   5  .824     -.516    .308     -.13640   .13572     .303484        .092
6       SOC SPE #2  5  .265     .001     .266     .09240    .04609     .103060        .011
7       SOC PE #1   5  .278     -.213    .065     -.06120   .05008     .111979        .013
7       SOC SPE #1  5  .044     .001     .045     .01380    .00823     .018404        .000
7       SOC PE #2   5  .278     -.213    .065     -.06120   .05008     .111979        .013
7       SOC SPE #2  5  .044     .001     .045     .01380    .00823     .018404        .000
8       SOC PE #1   5  .243     -.352    -.109    -.17400   .04507     .100770        .010
8       SOC SPE #1  5  .112     .012     .124     .03840    .02148     .048024        .002
8       SOC PE #2   5  .230     -.152    .078     -.01400   .04828     .107956        .012
8       SOC SPE #2  5  .021     .002     .023     .00940    .00379     .008473        .000
9       SOC PE #1   5  .274     -.247    .027     -.12360   .04432     .099095        .010
9       SOC SPE #1  5  .060     .001     .061     .02320    .01020     .022808        .001
9       SOC PE #2   5  .274     -.247    .027     -.12360   .04432     .099095        .010
9       SOC SPE #2  5  .060     .001     .061     .02320    .01020     .022808        .001
10      SOC PE #1   5  .319     -.272    .047     -.05580   .05662     .126608        .016
10      SOC SPE #1  5  .073     .001     .074     .01600    .01450     .032427        .001
10      SOC PE #2   5  .319     -.272    .047     -.05580   .05662     .126608        .016
10      SOC SPE #2  5  .073     .001     .074     .01600    .01450     .032427        .001
11      SOC PE #1   5  .337     -.245    .092     -.10500   .06849     .153155        .023
11      SOC SPE #1  5  .060     .000     .060     .03000    .01254     .028045        .001
11      SOC PE #2   5  .407     -.245    .162     -.09100   .07916     .177007        .031
11      SOC SPE #2  5  .060     .000     .060     .03340    .01154     .025803        .001
12      SOC PE #1   5  .199     -.086    .113     -.01140   .03758     .084032        .007
12      SOC SPE #1  5  .012     .001     .013     .00560    .00209     .004669        .000
12      SOC PE #2   5  .269     -.086    .183     .04260    .05222     .116770        .014
12      SOC SPE #2  5  .032     .001     .033     .01240    .00593     .013259        .000
13      SOC PE #1   5  .479     -.399    .080     -.09480   .09182     .205314        .042
13      SOC SPE #1  5  .159     .000     .159     .04260    .03010     .067300        .005
13      SOC PE #2   5  .288     -.208    .080     -.05480   .06282     .140477        .020
13      SOC SPE #2  5  .043     .000     .043     .01860    .00922     .020623        .000
14      SOC PE #1   5  1.078    -1.116   -.038    -.47040   .20717     .463244        .215
14      SOC SPE #1  5  1.245    .001     1.246    .39300    .23414     .523560        .274
14      SOC PE #2   5  .878     -.916    -.038    -.35040   .16746     .374453        .140
14      SOC SPE #2  5  .839     .001     .840     .23520    .15972     .357148        .128
15      SOC PE #1   4  .198     -.299    -.101    -.18400   .04260     .085202        .007
15      SOC SPE #1  4  .079     .010     .089     .03925    .01749     .034970        .001
15      SOC PE #2   4  .393     -.192    .201     -.05900   .08864     .177274        .031
15      SOC SPE #2  4  .030     .010     .040     .02700    .00704     .014071        .000
16      SOC PE #1   5  .412     -.353    .059     -.11160   .07183     .160626        .026
16      SOC SPE #1  5  .123     .001     .124     .03300    .02357     .052697        .003
16      SOC PE #2   5  .412     -.353    .059     -.11160   .07183     .160626        .026
16      SOC SPE #2  5  .123     .001     .124     .03300    .02357     .052697        .003
Valid N (listwise) equals N for each cluster.


Cluster Variable    N  Range    Minimum  Maximum  Mean      Std. Error Std. Deviation Variance
17      SOC PE #1   5  .595     -.498    .097     -.09580   .10400     .232561        .054
17      SOC SPE #1  5  .248     .000     .248     .05240    .04892     .109395        .012
17      SOC PE #2   5  .715     -.498    .217     -.05580   .12113     .270859        .073
17      SOC SPE #2  5  .246     .002     .248     .06180    .04728     .105725        .011
18      SOC PE #1   5  .212     -.176    .036     -.05980   .03763     .084138        .007
18      SOC SPE #1  5  .031     .000     .031     .00920    .00565     .012637        .000
18      SOC PE #2   5  .383     -.176    .207     .02020    .06744     .150795        .023
18      SOC SPE #2  5  .042     .001     .043     .01860    .00794     .017757        .000
19      SOC PE #1   5  .370     -.274    .096     -.11340   .06826     .152643        .023
19      SOC SPE #1  5  .075     .000     .075     .03140    .01389     .031069        .001
19      SOC PE #2   5  .370     -.274    .096     -.07340   .06331     .141562        .020
19      SOC SPE #2  5  .075     .000     .075     .02140    .01397     .031230        .001
20      SOC PE #1   5  .886     -.860    .026     -.42160   .14965     .334633        .112
20      SOC SPE #1  5  .739     .001     .740     .26760    .13071     .292273        .085
20      SOC PE #2   5  .686     -.660    .026     -.34160   .11567     .258649        .067
20      SOC SPE #2  5  .435     .001     .436     .17040    .07695     .172071        .030
21      SOC PE #1   5  .149     -.260    -.111    -.17520   .02745     .061382        .004
21      SOC SPE #1  5  .055     .012     .067     .03360    .01005     .022479        .001
21      SOC PE #2   5  .290     -.260    .030     -.09520   .05061     .113171        .013
21      SOC SPE #2  5  .067     .000     .067     .01920    .01234     .027599        .001
22      SOC PE #1   5  .323     -.183    .140     -.01460   .05989     .133923        .018
22      SOC SPE #1  5  .032     .001     .033     .01440    .00555     .012402        .000
22      SOC PE #2   5  .323     -.183    .140     -.01460   .05989     .133923        .018
22      SOC SPE #2  5  .032     .001     .033     .01440    .00555     .012402        .000
23      SOC PE #1   5  .353     -.309    .044     -.10240   .06478     .144858        .021
23      SOC SPE #1  5  .095     .000     .095     .02720    .01813     .040530        .002
23      SOC PE #2   5  .353     -.309    .044     -.06240   .06407     .143261        .021
23      SOC SPE #2  5  .095     .000     .095     .02020    .01871     .041847        .002
24      SOC PE #1   5  .181     -.185    -.004    -.07220   .03236     .072351        .005
24      SOC SPE #1  5  .034     .000     .034     .00940    .00640     .014311        .000
24      SOC PE #2   5  .381     -.185    .196     .08780    .07002     .156572        .025
24      SOC SPE #2  5  .029     .010     .039     .02720    .00493     .011032        .000
26      SOC PE #1   5  .299     -.272    .027     -.08440   .05111     .114275        .013
26      SOC SPE #1  5  .074     .000     .074     .01760    .01417     .031675        .001
26      SOC PE #2   5  .299     -.272    .027     -.08440   .05111     .114275        .013
26      SOC SPE #2  5  .074     .000     .074     .01760    .01417     .031675        .001
27      SOC PE #1   4  .366     -.230    .136     -.05750   .08458     .169166        .029
27      SOC SPE #1  4  .052     .001     .053     .02500    .01080     .021602        .000
27      SOC PE #2   4  .366     -.230    .136     -.05750   .08458     .169166        .029
27      SOC SPE #2  4  .052     .001     .053     .02500    .01080     .021602        .000
28      SOC PE #1   5  .625     -.710    -.085    -.37380   .10729     .239901        .058
28      SOC SPE #1  5  .497     .007     .504     .18580    .08927     .199616        .040
28      SOC PE #2   5  .655     -.710    -.055    -.29380   .11714     .261940        .069
28      SOC SPE #2  5  .501     .003     .504     .14120    .09297     .207887        .043
29      SOC PE #1   5  .423     -.519    -.096    -.22320   .07798     .174375        .030
29      SOC SPE #1  5  .260     .009     .269     .07400    .04952     .110720        .012
29      SOC PE #2   5  .423     -.319    .104     -.02320   .07798     .174375        .030
29      SOC SPE #2  5  .100     .002     .102     .02500    .01932     .043191        .002
30      SOC PE #1   4  .325     -.184    .141     -.03350   .07314     .146279        .021
30      SOC SPE #1  4  .033     .001     .034     .01725    .00685     .013696        .000
30      SOC PE #2   4  .288     .028     .316     .14150    .06258     .125157        .016
30      SOC SPE #2  4  .099     .001     .100     .03200    .02301     .046022        .002
31      SOC PE #1   4  .211     -.394    -.183    -.29300   .05013     .100256        .010
31      SOC SPE #1  4  .121     .034     .155     .09350    .02906     .058129        .003
31      SOC PE #2   4  .377     -.360    .017     -.14300   .08512     .170249        .029
31      SOC SPE #2  4  .130     .000     .130     .04200    .03057     .061139        .004
32      SOC PE #1   4  .177     -.126    .051     -.03000   .03725     .074498        .006
32      SOC SPE #1  4  .016     .000     .016     .00525    .00364     .007274        .000
32      SOC PE #2   4  .177     -.126    .051     -.03000   .03725     .074498        .006
32      SOC SPE #2  4  .016     .000     .016     .00525    .00364     .007274        .000
33      SOC PE #1   5  .230     -.440    -.210    -.29780   .04892     .109381        .012
33      SOC SPE #1  5  .150     .044     .194     .09840    .03153     .070493        .005
33      SOC PE #2   5  .230     -.240    -.010    -.09780   .04892     .109381        .012
33      SOC SPE #2  5  .058     .000     .058     .01920    .01202     .026883        .001
Valid N (listwise) equals N for each cluster.


Cluster Variable    N  Range    Minimum  Maximum  Mean      Std. Error Std. Deviation Variance
1       FF PE #1    5  3.540    -.540    3.000    1.19000   .75701     1.692720       2.865
1       FF SPE #1   5  9.000    .000     9.000    3.70800   2.16102    4.832191       23.350
1       FF PE #2    5  4.626    -2.788   1.838    -.38293   .73757     1.649263       2.720
1       FF SPE #2   5  7.773    .000     7.773    2.32269   1.49967    3.353359       11.245
2       FF PE #1    5  3.500    3.500    7.000    5.40000   .73144     1.635543       2.675
2       FF SPE #1   5  36.750   12.250   49.000   31.30000  7.82520    17.497678      306.169
2       FF PE #2    5  4.109    3.284    7.394    5.07968   .67886     1.517982       2.304
2       FF SPE #2   5  43.878   10.787   54.665   27.64656  7.41087    16.571203      274.605
3       FF PE #1    5  9.830    -2.330   7.500    2.36000   1.96584    4.395754       19.323
3       FF SPE #1   5  55.750   .500     56.250   21.02800  11.76557   26.308623      692.144
3       FF PE #2    5  10.797   -2.332   8.466    2.68096   2.16120    4.832585       23.354
3       FF SPE #2   5  71.163   .504     71.666   25.87063  14.87588   33.263478      1106.459
4       FF PE #1    5  8.380    -4.670   3.710    -.06800   1.34534    3.008275       9.050
4       FF SPE #1   5  21.730   .120     21.850   7.25800   4.49807    10.057995      101.163
4       FF PE #2    5  8.389    -4.675   3.714    -.06913   1.34670    3.011320       9.068
4       FF SPE #2   5  21.727   .125     21.852   7.25922   4.49755    10.056816      101.140
5       FF PE #1    5  5.210    .790     6.000    3.40400   1.02959    2.302223       5.300
5       FF SPE #1   5  35.370   .630     36.000   15.82800  6.80826    15.223740      231.762
5       FF PE #2    5  5.618    .794     6.411    3.45827   1.06924    2.390885       5.716
5       FF SPE #2   5  40.474   .630     41.103   16.53268  7.47658    16.718143      279.496
6       FF PE #1    5  20.790   -19.780  1.010    -4.98400  3.86405    8.640274       74.654
6       FF SPE #1   5  391.110  .290     391.400  84.60000  76.88353   171.9168       29555.389
6       FF PE #2    5  20.790   -19.784  1.006    -4.98757  3.86405    8.640270       74.654
6       FF SPE #2   5  391.114  .287     391.400  84.59928  76.88385   171.9175       29555.629
7       FF PE #1    5  10.000   1.000    11.000   7.20000   1.68523    3.768289       14.200
7       FF SPE #1   5  120.000  1.000    121.000  63.20000  19.65808   43.956797      1932.200
7       FF PE #2    5  9.138    1.490    10.628   7.16128   1.55558    3.478379       12.099
7       FF SPE #2   5  110.733  2.221    112.954  60.96323  18.44418   41.242445      1700.939
8       FF PE #1    5  2.350    .130     2.480    1.23600   .39061     .873430        .763
8       FF SPE #1   5  6.120    .020     6.140    2.13600   1.07694    2.408118       5.799
8       FF PE #2    5  2.344    .134     2.478    1.23691   .38944     .870820        .758
8       FF SPE #2   5  6.124    .018     6.142    2.13661   1.07748    2.409317       5.805
9       FF PE #1    5  4.070    1.490    5.560    3.72600   .79676     1.781609       3.174
9       FF SPE #1   5  28.740   2.210    30.950   16.42400  5.97078    13.351070      178.251
9       FF PE #2    5  4.809    1.485    6.294    3.88436   .89588     2.003259       4.013
9       FF SPE #2   5  37.414   2.206    39.619   18.29866  7.21963    16.143578      260.615
10      FF PE #1    5  8.740    -3.740   5.000    2.05200   1.64976    3.688973       13.609
10      FF SPE #1   5  24.750   .250     25.000   15.10000  4.16578    9.314975       86.769
10      FF PE #2    5  7.537    -3.742   3.796    .74386    1.48211    3.314102       10.983
10      FF SPE #2   5  11.195   3.211    14.407   9.33994   2.20383    4.927920       24.284
11      FF PE #1    5  4.500    .000     4.500    2.39000   .75115     1.679613       2.821
11      FF SPE #1   5  20.250   .000     20.250   7.96800   3.48616    7.795298       60.767
11      FF PE #2    5  4.664    -1.202   3.462    1.94189   .84168     1.882053       3.542
11      FF SPE #2   5  10.540   1.446    11.985   6.60462   1.97641    4.419387       19.531
12      FF PE #1    5  3.380    2.120    5.500    3.82400   .62768     1.403524       1.970
12      FF SPE #1   5  25.760   4.490    30.250   16.19800  4.89596    10.947706      119.852
12      FF PE #2    5  2.374    .842     3.216    1.96684   .41797     .934616        .874
12      FF SPE #2   5  9.636    .709     10.345   4.56727   1.70407    3.810405       14.519
13      FF PE #1    5  9.000    -.500    8.500    5.70000   1.65529    3.701351       13.700
13      FF SPE #1   5  72.000   .250     72.250   43.45000  13.60294   30.417100      925.200
13      FF PE #2    5  7.930    -.586    7.345    4.93664   1.41830    3.171407       10.058
13      FF SPE #2   5  53.603   .343     53.946   32.41667  9.02573    20.182153      407.319
14      FF PE #1    5  10.500   -5.500   5.000    .36600    1.84620    4.128230       17.042
14      FF SPE #1   5  30.000   .250     30.250   13.77000  6.00139    13.419510      180.083
14      FF PE #2    5  9.859    -4.832   5.028    .66808    1.71331    3.831071       14.677
14      FF SPE #2   5  24.812   .465     25.277   12.18801  5.31564    11.886138      141.280
15      FF PE #1    4  6.830    -5.700   1.130    -1.41500  1.62195    3.243892       10.523
15      FF SPE #1   4  31.330   1.130    32.460   9.88750   7.56868    15.137356      229.140
15      FF PE #2    4  6.829    -5.697   1.132    -1.41604  1.62110    3.242200       10.512
15      FF SPE #2   4  31.323   1.135    32.458   9.88905   7.56749    15.134970      229.067
16      FF PE #1    5  6.000    1.500    7.500    4.43400   1.00063    2.237472       5.006
16      FF SPE #1   5  54.000   2.250    56.250   23.67200  9.30156    20.798928      432.595
16      FF PE #2    5  6.024    1.412    7.436    4.34810   .98997     2.213635       4.900
16      FF SPE #2   5  53.300   1.994    55.294   22.82615  9.07190    20.285390      411.497
Valid N (listwise) equals N for each cluster.


Cluster Variable    N  Range    Minimum  Maximum  Mean      Std. Error Std. Deviation Variance
17      FF PE #1    5  10.000   -2.500   7.500    4.70000   1.91442    4.280771       18.325
17      FF SPE #1   5  50.000   6.250    56.250   36.75000  10.65686   23.829472      567.844
17      FF PE #2    5  10.028   -1.812   8.216    4.55080   1.77858    3.977024       15.817
17      FF SPE #2   5  64.219   3.283    67.503   33.36316  11.72479   26.217417      687.353
18      FF PE #1    5  8.000    1.000    9.000    6.20000   1.47139    3.290137       10.825
18      FF SPE #1   5  80.000   1.000    81.000   47.10000  14.96508   33.462946      1119.769
18      FF PE #2    5  8.496    1.304    9.800    6.68736   1.45484    3.253120       10.583
18      FF SPE #2   5  94.347   1.700    96.048   53.18701  15.69668   35.098847      1231.929
19      FF PE #1    5  6.000    .500     6.500    4.50000   1.03682    2.318405       5.375
19      FF SPE #1   5  42.000   .250     42.250   24.55000  6.84352    15.302573      234.169
19      FF PE #2    5  6.222    .154     6.375    4.33616   1.08125    2.417751       5.846
19      FF SPE #2   5  40.620   .024     40.643   23.47870  6.62688    14.818143      219.577
20      FF PE #1    5  12.000   -5.000   7.000    1.90000   2.01494    4.505552       20.300
20      FF SPE #1   5  48.750   .250     49.000   19.85000  8.34551    18.661123      348.238
20      FF PE #2    5  10.337   -3.510   6.826    3.07616   1.81252    4.052921       16.426
20      FF SPE #2   5  42.156   4.444    46.600   22.60369  7.17433    16.042294      257.355
21      FF PE #1    5  7.560    -4.910   2.650    -.23800   1.31646    2.943700       8.665
21      FF SPE #1   5  23.970   .140     24.110   6.99200   4.44300    9.934856       98.701
21      FF PE #2    5  7.563    -4.910   2.653    -.23924   1.31653    2.943846       8.666
21      FF SPE #2   5  23.974   .137     24.110   6.99022   4.44345    9.935852       98.721
22      FF PE #1    5  4.870    .130     5.000    3.29200   .95941     2.145313       4.602
22      FF SPE #1   5  24.980   .020     25.000   14.51000  5.19770    11.622401      135.080
22      FF PE #2    5  4.944    -.119    4.825    1.98045   .91104     2.037150       4.150
22      FF SPE #2   5  23.270   .014     23.284   7.24218   4.27355    9.555948       91.316
23      FF PE #1    5  6.560    -1.500   5.060    .18400    1.23899    2.770466       7.675
23      FF SPE #1   5  25.580   .060     25.640   6.18400   4.87928    10.910400      119.037
23      FF PE #2    5  8.976    -3.913   5.063    -.66500   1.56167    3.492003       12.194
23      FF SPE #2   5  25.577   .058     25.635   10.19750  4.68905    10.485036      109.936
24      FF PE #1    5  8.300    1.200    9.500    4.78200   1.36177    3.045016       9.272
24      FF SPE #1   5  88.810   1.440    90.250   30.29000  15.59676   34.875410      1216.294
24      FF PE #2    5  10.231   1.201    11.432   5.16948   1.70711    3.817218       14.571
24      FF SPE #2   5  129.257  1.443    130.700  38.38042  23.47843   52.499360      2756.183
26      FF PE #1    5  5.500    2.500    8.000    5.80000   1.15758    2.588436       6.700
26      FF SPE #1   5  57.750   6.250    64.000   39.00000  12.26428   27.423758      752.063
26      FF PE #2    5  5.302    3.834    9.135    6.97160   1.04648    2.339991       5.476
26      FF SPE #2   5  68.755   14.696   83.452   52.98365  13.63003   30.477663      928.888
27      FF PE #1    4  9.000    -3.000   6.000    1.12500   1.87500    3.750000       14.063
27      FF SPE #1   4  36.000   .000     36.000   11.81250  8.28614    16.572285      274.641
27      FF PE #2    4  10.447   -4.337   6.110    .85730    2.14614    4.292285       18.424
27      FF SPE #2   4  37.271   .056     37.327   14.55275  8.68008    17.360156      301.375
28      FF PE #1    5  16.000   -10.000  6.000    -.50000   2.94109    6.576473       43.250
28      FF SPE #1   5  96.000   4.000    100.000  34.85000  17.56481   39.276106      1542.613
28      FF PE #2    5  15.117   -8.355   6.762    .62072    2.83321    6.335255       40.135
28      FF SPE #2   5  69.751   .059     69.809   32.49366  13.47132   30.122783      907.382
29      FF PE #1    5  7.000    -.500    6.500    3.08400   1.35796    3.036491       9.220
29      FF SPE #1   5  42.000   .250     42.250   16.89600  9.15412    20.469235      418.990
29      FF PE #2    5  6.835    -.640    6.195    2.98685   1.34234    3.001570       9.009
29      FF SPE #2   5  37.971   .410     38.381   16.12882  8.63264    19.303177      372.613
30      FF PE #1    4  8.020    -.020    8.000    4.13000   1.86901    3.738030       13.973
30      FF SPE #1   4  64.000   .000     64.000   27.54500  15.36640   30.732807      944.505
30      FF PE #2    4  6.489    -.018    6.471    3.21047   1.40148    2.802966       7.857
30      FF SPE #2   4  41.876   .000     41.877   16.19958  9.43960    18.879209      356.425
31      FF PE #1    4  6.390    -5.360   1.030    -2.06750  1.55965    3.119309       9.730
31      FF SPE #1   4  28.760   .010     28.770   11.58000  6.85591    13.711812      188.014
31      FF PE #2    4  6.393    -5.364   1.029    -2.06811  1.56037    3.120734       9.739
31      FF SPE #2   4  28.757   .015     28.771   11.58133  6.85554    13.711079      187.994
32      FF PE #1    4  7.370    -.370    7.000    3.90750   1.55624    3.112484       9.688
32      FF SPE #1   4  48.860   .140     49.000   22.53500  10.20912   20.418249      416.905
32      FF PE #2    4  6.872    -.370    6.501    3.20472   1.40497    2.809934       7.896
32      FF SPE #2   4  42.128   .137     42.266   16.19204  9.07291    18.145814      329.271
33      FF PE #1    5  7.300    -6.810   .490     -2.72400  1.25845    2.813979       7.918
33      FF SPE #1   5  46.160   .240     46.400   13.76000  8.52463    19.061663      363.347
33      FF PE #2    5  7.305    -6.812   .493     -2.72387  1.25927    2.815809       7.929
33      FF SPE #2   5  46.160   .243     46.403   13.76249  8.52470    19.061803      363.352
Valid N (listwise) equals N for each cluster.
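
For readers who want to reproduce statistics of this kind outside SPSS, the following minimal Python sketch computes the same descriptive measures for one cluster. It rests on assumptions that are not stated in this appendix: that each SPE is the square of the corresponding PE (which agrees with the tabulated minima and maxima), that the standard deviation and variance use the n - 1 denominator, and that the standard error is the standard deviation divided by the square root of N. The function name `describe` and the PE values are hypothetical and are not taken from the thesis data.

```python
import math
import statistics


def describe(values):
    """Return the descriptive statistics listed per cluster in the tables."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "n": n,
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "std_error": sd / math.sqrt(n),  # standard error of the mean
        "std_dev": sd,
        "variance": statistics.variance(values),  # sample variance (n - 1)
    }


# Hypothetical prediction errors for one cluster (illustrative values only).
pe = [-0.17, -0.12, -0.05, 0.02, 0.08]
# Assumption: SPE is the squared prediction error.
spe = [e ** 2 for e in pe]

pe_stats = describe(pe)
spe_stats = describe(spe)

for name, stats in (("PE", pe_stats), ("SPE", spe_stats)):
    print(name, {key: round(value, 5) for key, value in stats.items()})
```

Applying `describe` to the PEs and SPEs of each model and each cluster would give one table row per call, in the same column order as above.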

