GIMA MSc. Thesis Report Thierry van der Hoeven (6529380)

Assessment of Land cover and forest loss in prime Chimpanzee habitat in the Boé of Guinea-Bissau, West-Africa

Final thesis report

Thierry van der Hoeven August 14th, 2020

Master Thesis submitted to the Faculty of Geosciences of Utrecht University in fulfilment of the requirements of the thesis module of the Master’s program Geographical Information Management and Applications. In co-operation with the Chimbo Foundation. Specialization: Remote sensing


Abstract

The Boé is a unique nature reserve in the south-eastern border region of Guinea-Bissau that is home to a relatively large chimpanzee population. However, several phenomena currently threaten this natural environment. One of these is the emergence of cashew plantations, which is one of the focuses of this research. Distinguishing cashew plantations from gallery and dry forests with medium-resolution remote sensing data is difficult. Furthermore, it is hard to collect representative samples in the study area, because few resources are available and the landscape is very diverse and fragmented.

The aim of this study has been to identify land use trends in the Boé area, with an emphasis on deforestation over the last 20 years and specifically over the last 4 years. To achieve this, samples were collected during a four-month fieldwork campaign, in which different land cover types were sampled throughout the Boé area with an emphasis on cashew plantations and forests. After all the samples were collected, several machine learning algorithms (MLAs) were tested to determine which one provided the most representative land cover classification of the Boé area. The classifications turned out to be reasonably accurate, with an average overall accuracy of about 70%, and user's accuracies of around 80% and 90% were achieved, especially for the cashew plantations and gallery forests. A change detection analysis revealed that 70 ha of gallery forest was lost in the Boé area between 2016 and 2020 and that the total area of cashew plantations increased over the same period. In addition, an exploratory assessment of two different forest monitoring methods was conducted to see whether deforestation could also be mapped over a longer period (between 2001 and 2020). The first assessment, of the well-established Hansen dataset, produced some striking results: the dataset showed that little deforestation occurred until 2013, but that from that year on the extent of forest loss in the Boé area increased rapidly. However, it also showed that forest loss in the Boé area was less extensive than in some other areas of Guinea-Bissau and in the neighboring country of Guinea. The second assessment, using the BFAST monitoring method, detected many forest disturbance events in the Boé area. Because all of the analyses measured forest loss, it can be concluded that the chimpanzee habitat in the Boé area is decreasing, although not as much as in other surrounding areas.
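As background for the accuracy figures reported in this abstract, the sketch below shows how overall accuracy and user's accuracy are commonly derived from an error (confusion) matrix. It is a minimal illustration, assuming rows hold the classified (map) labels and columns the reference labels; the function name `accuracy_metrics`, the three classes and all counts are hypothetical toy values, not results from this study.

```python
def accuracy_metrics(matrix):
    """Return overall, user's and producer's accuracy for a square error matrix.

    Assumed convention (hypothetical example): matrix[i][j] counts samples
    that the map labels as class i and whose reference class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = [matrix[i][i] for i in range(n)]
    # Overall accuracy: correctly classified samples divided by all samples.
    overall = sum(correct) / total
    # User's accuracy: correct samples divided by the row total,
    # i.e. everything the map assigns to class i.
    users = [correct[i] / sum(matrix[i]) for i in range(n)]
    # Producer's accuracy: correct samples divided by the column total,
    # i.e. all reference samples that truly belong to class j.
    producers = [correct[j] / sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return overall, users, producers


# Hypothetical toy matrix (not data from this study); class order:
# cashew plantation, gallery forest, other.
toy = [
    [18, 2, 0],   # mapped as cashew plantation
    [1, 9, 0],    # mapped as gallery forest
    [4, 3, 13],   # mapped as other
]
overall, users, producers = accuracy_metrics(toy)
print(f"overall accuracy: {overall:.2f}")
print("user's accuracy:", [round(u, 2) for u in users])
print("producer's accuracy:", [round(p, 2) for p in producers])
```

With these toy counts the overall accuracy is 0.80, while the user's accuracy of the third class is only 0.65, which illustrates why per-class accuracies are reported alongside the overall figure.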


Preface and acknowledgements

Before you lies the GIMA MSc. thesis report "Assessment of Land cover and forest loss in prime Chimpanzee habitat in the Boé of Guinea-Bissau, West-Africa". The main aim of this thesis report is to identify spatiotemporal changes in land cover in the Boé in Guinea-Bissau, with a special focus on forest loss. Furthermore, it assesses whether different forest monitoring systems can be used to monitor forest loss over a longer period of time in a fragmented landscape like the Boé.

This thesis report was written as part of the graduation requirements of the master programme Geographical Information Management & Applications, and the research was conducted from September 2019 until August 2020.

When looking for a master thesis subject, I wanted to combine practical fieldwork with remote sensing. The WUR and the Chimbo Foundation gave me the opportunity to do both, and also allowed me to go abroad and to contribute to the conservation of the unique environment of the Boé. Moreover, this research allowed me to develop my skills in remote sensing, which was relatively new to me, and to work with new programming languages.

I would like to thank my supervisors René Henkens and Ron van Lammeren and the founders of Chimbo, Annemarie Goedmakers and Piet Wit, for their guidance and support during this process and for all their feedback which led to the completion of this thesis. I would also like to thank all the other members of the Chimbo team and in particular Anouk Puijk and my guides Sakamoussa and Djei, who guided me throughout the Boé and without whose support I would not have been able to conduct the fieldwork. In addition, I would like to thank the local population of Belí and my special gratitude goes to Suleymane Diallo and his family, for the Fulani lessons, all the nice things that they have done for me and for making me feel at home during my period in Guinea-Bissau.

Thierry van der Hoeven Amsterdam, October 2020


Table of Contents
1. Introduction
1.1 Boé as valuable nature region
1.1.1 Chimpanzees
1.1.2 Other endangered mammals and animals
1.1.3 Sacred forests
1.1.4 Importance of land cover data for the assessment of land management practices
1.2 Background & Context
1.2.1 Guinea-Bissau
1.2.2 Population
1.2.3 Climate and soil characteristics
1.2.4 IBAP and the system of national protected areas
1.2.5 Chimbo Foundation
1.3 Problem Definition
1.3.1 Threats to the habitat of Chimpanzees in the Boé (environmental challenges)
1.3.2 Scientific challenges
1.3.3 Importance of this thesis research for land management practices in the Boé
1.4 Research Objectives
1.4.1 Sub-objective 1: create extensive classified base maps of the land cover in 2016 and 2020
1.4.2 Sub-objective 2: explore the feasibility of a multi-temporal monitoring approach for the Boé
1.5 General research approach
1.6 Research Scope and Possible Limitations
1.7 Reading guide
2. Review
2.1 Using remote sensing in land use/land cover studies
2.1.1 First satellite imaging
2.1.2 Algorithm-based approaches
2.1.3 Manual interpretation of change
2.1.4 Interpretation elements
2.1.5 Remote sensing studies in the context of Guinea-Bissau
2.2 Sampling and sampling designs
2.2.1 Samples and sampling frames
2.2.2 Subjective sampling
2.2.3 Probability sampling
2.2.4 Applied sampling designs in other LULC studies
2.2.5 Sample size
2.2.6 Practical side of field survey sampling
2.2.7 Spectral assessment of collected samples
2.2.8 Importance of sample selection for classifications
2.3 Classification
2.3.1 Pixel-based classification
2.3.2 Unsupervised classification
2.3.3 Object based classification
2.3.4 Supervised classification
2.3.5 Maximum likelihood method
2.3.6 Machine learning algorithms
2.3.7 Random forest
2.4 Accuracy assessment
2.4.1 Error matrix
2.4.2 Producer’s accuracy
2.4.3 Visual assessment
2.5 Change detection
2.5.1 Change detection procedures and techniques
2.5.2 Post-classification technique
2.5.3 Change detection matrix
2.6 Forest monitoring systems
2.6.1 Hansen dataset
2.6.2 BFAST monitoring
3. Study Area & Data
3.1 Study Area
3.2 Land Cover Types
3.3 Data


3.3.1 Sampling data
3.3.2 Satellite data
3.3.3 Ancillary Data
3.3.4 Validation Data
3.3.5 Forest Monitoring Data
4. Methodology
4.1 Preliminary data processing
4.1.1 Pre-processing of Sentinel-2 data
4.1.1.2 Mosaicking of Sentinel-2 scenes
4.1.2 Pre-processing of Landsat 8 data
4.1.3 Calculating (vegetation) indices
4.1.4 Pre-processing of Aster GDEM data
4.1.5 Compositing bands
4.1.6 Clipping
4.2 RQ1: Sampling
4.2.1 Identification of land cover types and unsupervised classification
4.2.2 Random points generation
4.2.3 Field survey sampling
4.2.4 Creating the first ‘basic’ sample collection
4.3 RQ2: Establishment of representative sample collections
4.3.1 Initial spectral assessment and test classification
4.3.2 Merging of similar land cover types and sensitivity analysis of Maximum likelihood classifier
4.3.3 Second spectral assessment
4.4 RQ3: Classification
4.4.1 Selection of classification method and classification type
4.4.2 Image Segmentation
4.4.3 Training samples and reference dataset
4.4.4 Training of MLAs
4.4.5 Final classification
4.5 RQ4: Validation
4.5.1 Confusion Matrix
4.5.2 Visual validation
4.5.3 Post-classification processing
4.5.4 Reclassification
4.5.5 Post-classification processing
4.6 RQ5: Change detection
4.6.1 Change detection matrix
4.6.2 Change map
4.7 RQ6: Assessment of forest monitoring systems
4.7.1 Hansen dataset
4.7.2 BFAST forest monitoring
5. Results
5.1 Field work Sampling (RQ1)
5.1.1 Sampling frame and design
5.1.2 Fieldwork campaign
5.1.3 Collected samples
5.1.4 Addition of new samples
5.2 Establishment of sample collections (RQ2)
5.2.1 Initial spectral assessment and test classification
5.2.2 Merging of land cover classes
5.2.3 Sensitivity analysis of Maximum Likelihood classification
5.2.4 Final sample collection
5.2.5 Second spectral assessment
5.3 Classification results (RQ3)
5.3.1 Sensitivity analysis of MLA classifiers
5.3.2 Summary of results of sensitivity analysis
5.3.3 Sentinel-2, Maximum Likelihood classification of 2020
5.3.4 Sentinel-2, Random Forest classification of 2020
5.3.5 Sentinel-2, Support Vector Machines classification of 2020
5.3.6 Landsat 8, Random Forest classification
5.3.7 Landsat 8, Support Vector Machine classification
5.3.8 Visual comparison of results of different MLA classifiers
5.4 Accuracy assessment (RQ4)


5.4.1 Quantitative assessment
5.4.2 Visual validation of the Random Forest classifications
5.4.3 Post-classification processing
5.4.4 Final RF classification of 2020 Landsat 8 image
5.4.5 Distribution of land cover types in final classified Landsat 8 output
5.5 Change detection results (RQ5)
5.5.1 Final classified Landsat 8 output of 2016
5.5.2 Visual comparison between classifications of 2016 and 2020
5.5.3 Comparison of land cover distribution in 2016 and 2020
5.5.4 Change dynamics
5.5.5 Change map
5.6 Forest monitoring results (RQ6)
5.6.1 Forest loss in the Boé area and its surroundings
5.6.2 Forest loss in the Boé area
5.6.3 Comparison of forest loss in Boé area and neighbouring Wendou M’Bour sub-prefecture
5.6.4 Comparison of Hansen dataset and change map
5.6.5 Yearly forest loss in the National Parks
5.6.6 BFAST monitoring method
6. Discussion
7. Conclusion
7.1 Added value and potential applications
References
Appendices
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
Appendix G
Appendix H
Appendix I
Appendix J
Appendix K
Appendix L
Appendix M


List of Figures
Figure 1.1: Guinea-Bissau and the Boé highlighted in purple (Chimbo, 2020)
Figure 1.2: The different bioclimatic zones of . (USGS, no date)
Figure 1.3: IBAP’s national protected areas within the Boé (Breider, 2016)
Figure 1.4: Schematic overview of the general research approach
Figure 1.5: The issue with the swath of the Sentinel-2 images
Figure 2.1: Examples of sampling designs for land cover assessments (McRoberts, Tomppo and Czaplewski, 2012)
Figure 2.2: A simple decision tree of a machine learning algorithm (Koehrsen, 2018)
Figure 2.3: A simple error matrix for accuracy assessments (Stehman, 2009)
Figure 2.4: A simple change detection matrix (Juliev et al., 2019)
Figure 3.1: The study area in this research
Figure 3.2: Depictions of the different land cover types that were sampled
Figure 3.3: Flowchart of the preliminary data collection
Figure 3.4: The different acquisition modes of the Sentinel-1 C-band transmitter (Luqman, 2017)
Figure 3.5: Overview of the Sentinel-2 bands with (a) 10m and (b) 20m spatial resolution (ESA; Sentinel Online, 2020)
Figure 3.6: Overview of the Landsat 7 ETM+ and Landsat 8 OLI and TIRS bands
Figure 3.7: Shapefile of Boé area that is used in this research
Figure 4.1: Flowchart of the pre-processing process for the Sentinel-2 data
Figure 4.2: Flowchart of the pre-processing process for the Landsat 8 data
Figure 4.3: Flowchart of the pre-processing process for the Aster GDEM data
Figure 4.4: Flowchart of the stratified random field survey sampling process
Figure 4.5: Flowchart of the process of establishing a representative training sample collection
Figure 4.6: Flowchart of the final classification process
Figure 4.7: Flowchart of data validation process
Figure 4.8: Flowchart of the post-classification processing process
Figure 4.9: Flowchart of change detection process
Figure 4.10: Difference in output between the ‘Intersect’ feature on the left and ‘Union’ tool on the right
Figure 4.11: Flowchart of the Hansen forest monitoring process
Figure 4.12: Flowchart of the BFAST forest monitoring process
Figure 5.1: Resulting unsupervised classification
Figure 5.2: Spatial distributions of the final training and validation samples
Figure 5.3: Spectral profile of the final training samples
Figure 5.4: Spectral signatures of the final sample collection across the six different (vegetation) indices
Figure 5.5: Spectral signatures of the final sample collection across the Radar bands
Figure 5.6: (ML) Classification result of 2020
Figure 5.7: (RF) Classification result of 2020
Figure 5.8: (SVM) Classification result of 2020
Figure 5.9: Landsat 8 (RF) classification result of 2020
Figure 5.10: Landsat 8 (SVM) classification result of 2020
Figure 5.11: Visual comparison of the RF and SVM classifications
Figure 5.12: Visual validation of the final classified Landsat 8 (RF) output in the CheChe village area
Figure 5.13: Visual validation of the final classified Sentinel-2 (RF) output in the CheChe village area
Figure 5.14: Comparison of un-processed and post-processed classification
Figure 5.15: Final Landsat 8 (RF) classification of 2020
Figure 5.16: Distribution of the land cover types in 2020 in the total Boé area (a) and its sub-regions (b)
Figure 5.17: Classification result of 2016
Figure 5.18: Distribution of the land cover types in 2016 in the total Boé area (a) and its sub-regions (b)
Figure 5.19: Gains and losses of all the land cover types between 2016 and 2020
Figure 5.20: Sankey diagram visualizing the dynamic behavior of the land cover classes
Figure 5.21: Change map visualizing the changes in land cover between 2016 and 2020
Figure 5.22: Forest loss and gain in the Boé area and its surroundings
Figure 5.23: Forest loss and gain in the Boé area according to the Hansen dataset
Figure 5.24: Yearly forest loss between 2001 and 2020 in the Boé area
Figure 5.25: Yearly forest loss between 2001 and 2020 in the Wendou M’Bour sub-prefecture
Figure 5.26: Visual comparison between the Hansen dataset and change map
Figure 5.27: Yearly forest loss in the parts of the National Parks that are located in the Boé area
Figure 5.28: Occurrences of forest disturbances between 2009 and 2020
Figure 5.29: Magnitude of the forest disturbances between 2009 and 2020
Figure B.1: Spectral profiles of the basic sample collection
Figure B.2: Spectral signatures of the basic sample collection across the six different (vegetation) indices
Figure C.1: Initial test classification
Figure D.1: Resulting output of the 14 test classifications


Figure E.1: Segmented rasters used for MLA classifications
Figure F.1: Resulting output of the MLA test classifications
Figure G.1: Post-classification processing results of the Sentinel-2 classifications
Figure I.1: Placed control points for the georeferencing of the Sentinel-2 images
Figure I.2: Alignment issue of the Sentinel-2 2016 and 2020 images
Figure K.1: Sentinel-2 land cover distribution in the three biggest regions in 2016
Figure K.2: Sentinel-2 land cover distribution in the three smallest regions in 2016
Figure K.3: Sentinel-2 land cover distribution in the three biggest sub-regions in 2020
Figure K.4: Sentinel-2 land cover distribution in the three smallest sub-regions in 2020
Figure L.1: Yearly forest loss chart of the West Fefine region
Figure L.2: Yearly forest loss chart of the East Fefine region
Figure L.3: Yearly forest loss chart of the Cuntabani corridor
Figure L.4: Yearly forest loss chart of the Cheche corridor
Figure M.1: BFAST monitoring results statistics in R
Figure M.2: BFAST monitoring results statistics in ArcGIS Pro


List of Tables
Table 3.1: Descriptions of the different land cover types that were sampled
Table 3.2: Sentinel-1 data used in this research
Table 3.3: Sentinel-2 data used in this research
Table 3.4: Landsat 8 data used in this research
Table 3.5: Forest monitoring data used in this research
Table 5.1: Sample collection after fieldwork campaign (a) ‘Basic sample collection’ with additional samples (b)
Table 5.2: Differences between sample collections (a) and (b)
Table 5.3: ‘Researcher samples’ (a) and ‘Expert samples’ (b)
Table 5.4: Differences between ‘researcher’ and ‘expert’ sample collections
Table 5.5: Summary of the first three test classifications
Table 5.6: Summary of test classifications 4 and 5
Table 5.7: Summary of test classifications 6 to 9
Table 5.8: Summary of test classifications 10 to 12
Table 5.9: Summary of the last two test classifications
Table 5.10: Final training sample collection
Table 5.11: Differences between sample collections
Table 5.12: Jeffries-Matusita distance for Sentinel-2 bands
Table 5.13: Jeffries-Matusita distance for indices and Sentinel-1 bands
Table 5.14: Summary of the MLA tests
Table 5.15: Accuracy of the different MLA classifier tests
Table 5.16: Change detection matrix
Table A.1: Fieldwork schedule
Table H.1: Accuracy of the Sentinel-2 Maximum Likelihood classification of 2020
Table H.2: Accuracy of the Sentinel-2 Random Forest classification of 2020
Table H.3: Accuracy of the Sentinel-2 Random Forest classification of 2016
Table H.4: Accuracy of the Sentinel-2 Support Vector Machines classification of 2020
Table H.5: Accuracy of the Landsat 8 Random Forest classification of 2020
Table H.6: Accuracy of the Landsat 8 Support Vector Machines classification of 2020
Table H.7: Accuracy of the Landsat 8 Random Forest classification of 2016
Table H.8: Accuracy of the Landsat 8 Support Vector Machines classification of 2016
Table J.1: Change detection matrix of Sentinel-2 change detection analysis


Acronyms and abbreviations
ARVI: Atmospherically Resistant Vegetation Index
AW: Tropical Savannah climate (based on the Köppen climate classification)
CIG: Chlorophyll Index Green
DEM: Digital Elevation Model
GEE: Google Earth Engine
IBA: Important Bird Area
JM: Jeffries-Matusita
KBA: Key Biodiversity Area
KC: Kappa Coefficient
LULC: Land Use Land Cover
ML: Maximum Likelihood
MLA: Machine Learning Algorithm
NBR: Normalized Burn Ratio
NDMI: Normalized Difference Moisture Index
NDVI: Normalized Difference Vegetation Index
OA: Overall Accuracy
R: Language and environment for statistical computing and graphics
RF: Random Forest
SAR: Synthetic Aperture Radar
SAVI: Soil Adjusted Vegetation Index
SVM: Support Vector Machines
WUR: Wageningen University & Research


1. Introduction

1.1 Boé as valuable nature region

The Boé Sector, located in the south-eastern border area of Guinea-Bissau, is very isolated and difficult to access. Large parts of the Boé's nature remain untouched, and the region has conserved important biodiversity (Breider et al., 2016). In geographical terms, the Boé can be considered the most north-westerly part of the Fouta Djallon mountain range, which is largely situated in the Republic of Guinea (Chimbo, 2018b). Local beliefs of the people living in the Boé and the rest of the Fouta Djallon prohibit the hunting and eating of chimpanzees (Kormoz & Boesch, 2003). This is one of the reasons why the Boé has a relatively large chimpanzee population. Chimpanzees are considered the flagship species of the Boé, but a number of other large mammal species also live in the area (Breider et al., 2016). Silva et al. (2007) conducted a feasibility study for the establishment of a protected area in Guinea-Bissau and for the options available for eco-tourism. They concluded that the nature value of the Boé is very high and estimated its chimpanzee population at 700 in 2007. More recent estimates, based on monitoring activities such as nest counts and recces (reconnaissance walks), are higher: a chimpanzee population of 1,000-1,500 in the area (IUCN SSC Primate Specialist Group, 2020).

1.1.1 Chimpanzees

Chimpanzees (Pan troglodytes) are indigenous only to tropical Africa (Kormoz & Boesch, 2003). There are four distinct subspecies. The subspecies Pan troglodytes schweinfurthii (eastern chimpanzee) is native to East Africa and also lives in Central Africa. It is the most studied subspecies and was the subject of Jane Goodall’s famous research on chimpanzees in western Tanzania (Goodall et al., 1978).

The subspecies Pan troglodytes verus (western chimpanzee) lives in West Africa. According to the IUCN Red List (2019), this subspecies can be found in Guinea, Guinea-Bissau, Ghana, Liberia, Mali, Senegal, Sierra Leone and the Ivory Coast. It is currently assessed as critically endangered (CR), making it the most threatened subspecies, and it may already be extinct in Benin, Burkina Faso and Togo.

Furthermore, the subspecies Pan troglodytes ellioti can be found in Nigeria and Cameroon (Kormoz & Boesch, 2003). The fourth subspecies, Pan troglodytes troglodytes (central chimpanzee), predominantly lives in Cameroon, Gabon and the Republic of the Congo, but also occurs in Equatorial Guinea, the Central African Republic, south-east Nigeria, Angola and the coastal region of the Democratic Republic of the Congo (WWF, 2020).

The chimpanzees in the Boé exhibit a very special behavior that until now has only been observed in a limited number of western chimpanzee populations (Kühl et al., 2016). This behavior, called ‘accumulative stone throwing’, involves chimpanzees throwing rocks at certain hollow and/or buttressed trees, resulting in accumulations of rocks around those trees. It is commonly accompanied by the pant hoot, a distinctive loud call uttered by chimpanzees (Mitani et al., 1992), and often, but not always, by the so-called ‘climax phase’ of the pant hoot, which consists of scream elements and drumming with the hands or feet on the tree (Kühl et al., 2016). Scientists are not yet sure why western chimpanzees exhibit this behavior, and it has therefore been the subject of a number of studies over the last few years. The Dutch-based Chimbo Foundation, together with the local Guinea-Bissauan NGO Daridibó, actively stimulates research on chimpanzees and has therefore installed a large number of trail cameras in different forests in the Boé (Breider et al., 2016). The presence of these cameras, which have already recorded a lot of stone-throwing behavior, makes the Boé an attractive study region for researchers who want to study this behavior.


1.1.2 Other endangered mammals and animals

As mentioned before, many other notable wildlife species besides chimpanzees live in the Boé. A recent study on wild cats in the Boé indicated that trail cameras had recorded a number of wild cats, including leopards, servals, African golden cats, caracals and even lions (Breider et al., 2016). Cabuy (2014) reported sightings of reptiles and amphibians, some of which are also included in the IUCN Red List. Another study on the ornithological importance of the Boé concluded that the area has a very rich bird life, with 275 known species (Coppens, 2016). Some of these species are included in the IBA¹ A1 category of globally threatened species and in the A3 category of biome-restricted assemblages of the Sudan‐Guinea Savanna and Guinea‐Congo Forests biomes (Coppens, 2016).

1.1.3 Sacred forests

Throughout Guinea-Bissau, including the Boé, some places, notably forests, are traditionally considered sacred (Biai et al., 2019). According to Biai et al. (2019), people are not allowed to inhabit these areas, nor to carry out large-scale economic activities in them, even though these sacred areas contain a variety of useful resources. According to Chimbo (2017), the sacred forests in the Boé are relatively small patches of forest protected by traditional beliefs and rules. The local people of the Boé believe that spirits inhabit these sacred forests, and therefore these forests have to be respected and protected. The sacred forests thereby also have a positive effect on the conservation of the environment in the Boé. Kühnert et al. (2019) refer to these sacred forests as ‘sacred groves’ in their study and stress the importance of the persistence of sacred groves in eastern Guinea-Bissau because of their vital role in long-term conservation efforts.

1.1.4 Importance of land cover data for the assessment of land management practices

Conservation biology has always been concerned with monitoring efforts and coordinating research in order to reverse biodiversity crises all over the world. According to Petorelli, Safi & Turner (2014): “biodiversity is defined as the natural variety and variability within and among living organisms and the ecological complexes in which they naturally occur, as well as the ways in which organisms interact with each other and with the physical environment”. Changes in environmental conditions, such as climate change and land use conversion (for example, the construction of roads and buildings), are key drivers of changes in biodiversity worldwide. It is therefore very important for ecologists and conservation biologists to adequately monitor these kinds of developments.

Over the last few decades, space-based observations of Earth have greatly improved our understanding of weather and climate as well as other geophysical phenomena. In doing so, they have enabled scientists to make more accurate long-term predictions and forecasts of changes in the physical and chemical processes of our planet.

Satellite remote sensing can provide complete spatial coverage of large and remote areas and has therefore provided new, relatively inexpensive and easily verifiable means of retrieving environmental information for such areas. Satellite remote sensing can be used to monitor the intensity and extent of threats to biodiversity, which are commonly referred to as pressures on the natural world. It can also be used to monitor the state of biodiversity in certain areas, or to assess the effect of policies or practices aimed at preventing or reducing biodiversity loss, which are commonly referred to as responses. One of the most common responses to changes in biodiversity is the establishment and/or improved management of so-called ‘protected nature areas’. Petorelli, Safi & Turner (2014) argue that satellite remote sensing can contribute substantially to such responses across multiple spatial and temporal scales by providing the required environmental information.

Land cover studies are among the major sources of this environmental information. Enderle and Weih (2005) highlight that land cover data, especially when used in conjunction with other ‘ancillary’ data such as terrain maps, Digital Elevation Models (DEMs) and slope maps, can be useful in identifying areas that are more or less suited to certain land management practices. Land cover data can therefore help decision makers assess which land management practices to apply in certain areas to achieve specific goals. The development of land cover maps is often critical for monitoring changes in land cover in specific study or management areas, and knowing which land cover changes have occurred, and to what extent, is critical for making the right decisions in land management.

¹ The term IBA stands for ‘Important Bird Area’ and is used in the IBA programme of BirdLife International, a worldwide initiative for the identification of key sites for the conservation of birds.


1.2 Background & Context

1.2.1 Guinea-Bissau

Guinea-Bissau is a small country with an area of approximately 36,125 km² (Temundo & Abrantes, 2014). It is situated in West Africa along the coast of the Atlantic Ocean and is bordered by Senegal in the north and by the Republic of Guinea in the east and south. The research area for this thesis is the Boé (administrative) Sector in the Gabu province in the south-east of Guinea-Bissau, which is highlighted in purple in figure 1.1.

Figure 1.1: Guinea-Bissau and the Boé highlighted in purple (Chimbo, 2020)

1.2.2 Population

According to the United Nations World Population Prospects 2019, the total population of Guinea-Bissau was expected to amount to approximately 1 921 000 people at the end of 2019. The country is home to a large diversity of ethno-linguistic groups (Temudo & Abrantes, 2014). The largest of these groups are the Fula people, who make up approximately 28.5% of the country’s population, closely followed by the Balanta people with approximately 22.5% (International Security Sector Advisory Team, 2019). Other notable ethnic groups are the Mandinga, Papel and Manjaca, amounting to 14.7%, 9.1% and 8.3% of the total population respectively.

1.2.3 Climate and soil characteristics

The country has a tropical savannah (Aw) climate according to the climate classification of Köppen (1918) and is part of the Guinean-Sudanian bioclimatic transition zone of West Africa, as can be seen in figure 1.2 (USGS, no date). The climate is characterized by two well-defined seasons (Temudo & Abrantes, 2014): the rainy season and the dry season. The rainy season generally starts at the end of May and usually lasts until the end of October. Although the whole country is affected by the rainy season, the amount of rainfall varies greatly between the geographical regions of the country. The southern region receives on average more than 1800 mm of rainfall during this season, while the northern and eastern regions usually receive considerably less, at slightly more than 1200 mm on average. The dry season generally starts in November and lasts until the end of April (Chimbo, 2018b).
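To make the ‘Aw’ label concrete, the sketch below applies the commonly used Köppen-Geiger thresholds for the tropical (A) group: every month at least 18 °C, and the subtype decided by the precipitation of the driest month relative to annual precipitation. It is a simplified illustration, assuming monthly climate normals as input; the arid (B) test that takes precedence in the full scheme is left out, and the monthly values are hypothetical, shaped only to resemble the seasons described above.

```python
def koppen_tropical_subtype(monthly_temp_c, monthly_precip_mm):
    """Return 'Af', 'Am' or 'Aw' for a tropical (A) climate, otherwise None.

    Simplified sketch: the arid (B) test, which takes precedence in the full
    Koppen-Geiger scheme, is deliberately left out.
    """
    if len(monthly_temp_c) != 12 or len(monthly_precip_mm) != 12:
        raise ValueError("expected 12 monthly values")
    if min(monthly_temp_c) < 18.0:
        return None  # coldest month below 18 degC: not a tropical (A) climate
    annual = sum(monthly_precip_mm)
    driest = min(monthly_precip_mm)
    if driest >= 60.0:
        return "Af"  # tropical rainforest: no real dry month
    if driest >= 100.0 - annual / 25.0:
        return "Am"  # tropical monsoon: short dry season offset by heavy rains
    return "Aw"      # tropical savannah: pronounced dry season


# Hypothetical monthly normals (Jan-Dec), not measured Boé data
temps = [25, 27, 29, 30, 29, 27, 26, 26, 26, 27, 27, 25]
precip = [0, 0, 2, 10, 60, 180, 280, 320, 250, 120, 10, 0]
print(koppen_tropical_subtype(temps, precip))  # -> Aw
```

With about 1232 mm per year and a driest month of 0 mm, the dry-month precipitation falls below the 100 − 1232/25 ≈ 50.7 mm limit, which is what separates a savannah climate from a monsoon climate.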


The main agricultural soils in Guinea-Bissau are ferralsols, but the most productive soils are gleysols and fluvisols according to Temudo & Abrantes (2014). On the latter soils, rice is cultivated in flooded lowlands. In general, the vegetation types of Guinea-Bissau belong to the Guinean-Sudanian transition zone of West Africa. However, each geographical region in Guinea-Bissau also has its own characteristic vegetation types due to variations in soils, rainfall and human interventions. The Boé, with its hills and unproductive lateritic soils, is very different from the rest of the country. For instance, rice is grown here mostly under rain-fed conditions on reddish-brown lateritic soils (Oosterlynck, 2014).

Figure 1.2: The different bioclimatic zones of West Africa. (USGS, no date)

1.2.4 IBAP and the system of national protected areas

IBAP (Instituto da Biodiversidade e das Áreas Protegidas) is the national institute that protects and manages protected areas and biodiversity in Guinea-Bissau (IBAP, 2019). It was founded in 2004 and, although it falls under the responsibility of the ministry for the environment, it has financial and administrative autonomy. The institute therefore has the capacity to develop policies and regulations for the conservation of protected areas and biodiversity. IBAP’s efforts have led to the establishment of the National Protected Areas System (SNAP), which now consists of 8 protected areas throughout the country.


As can be seen in figure 1.3, the northernmost and westernmost parts of the Boé are part of the Dulombi-Boé National Park complex and fall under the jurisdiction of IBAP (IBAP, 2019). However, large parts in the south and east of the Boé are not included, and their nature is therefore not fully protected.

Figure 1.3: IBAP’s national protected areas within the Boé (Breider, 2016)

1.2.5 Chimbo Foundation

Chimbo Foundation is a Dutch foundation that was founded in memory of David Goedmakers with the aim of preserving and, if needed, restoring the Chimpanzee population of the Boé and its natural habitat (Chimbo, 2019a). Its main strategic axis is community-based conservation. Since 2007, the foundation has been cooperating with the local NGO ‘Daridibó’ to raise awareness of environmental issues among the local population, to promote responsible tourism and to gather information on the flora and fauna of the Boé. One of the goals of the Chimbo Foundation is to make the whole of the Boé part of a larger network of well-managed protected areas, such as national parks and ecological corridors, in Guinea-Bissau and its neighboring countries Senegal and Guinea (Conakry) (Chimbo, 2019b). The foundation also tries to include the sacred forests and sacred sites in the registry of Indigenous Peoples’ and Community Conserved Territories and Areas (ICCA), which is maintained by the World Conservation Monitoring Centre in Cambridge (UK). Furthermore, it is preparing an application to obtain ‘Key Biodiversity Area’ (KBA) status for the Boé. Chimbo Foundation’s current activities are:

• COMBAC Boé (Community-based conservation of the cultural and natural values of the Boé sector) project
• CVV (Village Vigilance Committee) and fire brigade program
• Early fire program
• Raising environmental awareness among the local population through the local radio station in Béli
• Chimpanzee and other animal observations with camera traps
• Mapping sacred forests
• Student thesis & internship program
• Promoting & facilitating tourism
• Active role in updating the IUCN action plan for the Western chimpanzee
• Active role in the certification of responsible aluminium through the ASI (Aluminium Stewardship Initiative)


1.3 Problem Definition

1.3.1 Threats to the habitat of Chimpanzees in the Boé (environmental challenges)

Although the Boé has been left largely undisturbed in recent times, some current phenomena pose a major threat to the area. According to Chimbo (2018e), the biggest threat to the area is population growth, caused by a high birth rate and high immigration from Guinea (Conakry), which may increase deforestation, including conversion into cashew plantations, and may also increase hunting. Moreover, the subsurface of the area contains large bauxite deposits, which may attract mining in the future.

1.3.1.1 Population growth

The population of the Boé is growing rapidly, and experts (and part of the population) think that the current population of 12,000 people already exceeds the ecological capacity of the region (Chimbo, 2018d). New settlements are being constructed in the unoccupied central areas of the Boé. Because of the population growth, traditional customs and taboos are under stress. In addition, more land is being used for livestock farming and more forests are being converted into agricultural land or cashew plantations.

1.3.1.2 Hunting

Due to local beliefs that traditionally prohibited the eating of chimpanzees (Kormos & Boesch, 2003), hunting of chimpanzees for the bushmeat market did not exist in the Boé. Hunting of chimpanzees for the pet market, although not very common until 15 years ago, has since been stopped. However, the younger generation in the region nowadays feels less restricted by these taboos and rules (Chimbo, 2018d).

1.3.1.3 Mining

The Boé is an area where bauxite deposits can be found (Chimbo, 2018d). Bauxite is the ore that forms the basis for the production of aluminium. According to recent estimates, at least 100 million tons of bauxite are present in the Boé. The Republic of Guinea (Conakry) has much larger bauxite reserves, which have been exploited for years and will continue to be. The Boé still lacks the infrastructure required for bauxite mining, which was one of the reasons why western-based mining companies never considered exploitation in the Boé a viable option. Recently, however, research has been conducted into the options for bauxite mining in the Boé. Mining bauxite would have substantial negative impacts on the region’s biodiversity, as the so-called bauxite pockets are located in the middle of chimpanzee habitat.

1.3.1.4 Cashew plantations & deforestation

In the eastern region of Guinea-Bissau, the traditional agricultural practice was, and still is, ‘slash-and-burn agriculture’ (Temudo & Santos, 2017). During colonial times, peanut production was promoted, but it contributed to deforestation and soil degradation in the region. A reforestation plan was introduced to counter these developments. Cashew trees were included in this plan because of their rusticity and high tolerance to drought and poor soils. At first, farmers could not see any advantages in producing this cash crop, but during the mid-1980s attitudes changed and farmers progressively started to plant cashew trees. Today, many farmers are switching from shifting cultivation to more permanent agriculture in the form of plantations, such as cashew plantations (Temudo & Abrantes, 2014). In 2009, Guinea-Bissau was the second-largest exporter of raw cashews in the world and even had the largest per capita production. In the Boé, cashew plantations sometimes replace former rice fields, but in most cases they replace the original forest. These plantations have far lower biodiversity value than the forest patches they replace. Because of this, the expansion of cashew will negatively affect the biodiversity and the flora and fauna of the region. Where plantations replace the rain-fed rice fields of the Boé, they make the population dependent on world market prices for both cashew and rice instead of being able to produce its own rice. Important forest products (timber and non-timber products) are also becoming increasingly difficult to obtain.


1.3.2 Scientific challenges

Up-to-date land cover maps of the Boé area are missing, because the Boé was last properly mapped during the Portuguese colonial period. In a previous attempt to study land cover change in the Boé (Studer, 2019), it turned out to be very difficult to distinguish cashew plantations from gallery and dry forests using moderate resolution satellite data. It was therefore hard to create accurate land cover maps of the Boé to present land cover change. Studer (2019) is not the only one who found it difficult to distinguish cashew plantations from other forest types. Singh et al. (2018) faced the same challenge in their research on Cambodian community forests and tried to combine satellite remote sensing data with LiDAR data within a machine learning framework to tackle the problem.

In addition, two of the major challenges of the study area in this research are its limited accessibility and the limited resources available. For example, no expensive measuring equipment is available and transportation has to be carried out by bike. According to Stehman (2009), limited resources require a sampling design to focus on priority objectives. When selecting an adequate design for a certain research area, one must recognize the strengths and weaknesses of different designs and understand the trade-offs among objectives and desirable design criteria. According to McRoberts, Tomppo and Czaplewski (2012), the primary costs of sampling can be attributed to travelling to and from the sample location and to the actual measurement at that location. Hence, these costs depend strongly on the structure of the landscape, the required measurements and the local topographic, economic and transportation conditions.

1.3.3 Importance of this thesis research for land management practices in the Boé

This thesis aims to create the aforementioned land cover maps, which may provide critical environmental information on land cover change and on the intensity and extent of the threats these changes pose to the biodiversity of the region. These maps may be used as base maps for the Boé to monitor land cover changes affecting Chimpanzee protection and to estimate carbon stocks.

Currently, Chimbo Foundation is exploring the possibilities of implementing a carbon market platform, which can be used to incentivize forest conservation and sustainable land use in the context of the Boé in Guinea-Bissau (van Gilst et al., 2019). Chimbo hopes that this platform may aid in: “generating income through carbon credits that would otherwise not be there and help in strengthening a participatory approach that enables sustainable land use through community stewardship”.

In order to apply for such a carbon market platform, Chimbo needs extensive knowledge of the total carbon stocks and emissions in the region, and in particular of the potential to avoid future carbon emissions, based on the speed and extent of forest cover loss in the area and in the adjacent area in Guinea. Forest monitoring systems can provide knowledge on the latter, and this thesis can therefore be considered a preliminary attempt at developing a forest monitoring system for the region. If the suggested monitoring approach(es) prove successful, they may be used for the estimation of carbon stocks and for the often required 5-year verifications of carbon credits (van Gilst et al., 2019).


1.4 Research Objectives

The overall research objective of this thesis is:

“Identifying the main spatiotemporal changes in land cover in the Boé in Guinea-Bissau with a particular focus on forest loss”

The sections below elaborate on the two underlying sub-objectives and their corresponding research questions:

1.4.1 Sub-objective 1: create extensive classified base maps of the land cover in 2016 and 2020

The first sub-objective is to create extensive classified base maps of the land cover in the Boé in 2016 and 2020, which can be compared to identify spatiotemporal changes in land cover over the past four years. The main priority of this sub-objective is to successfully identify forest loss during this period. As mentioned in section 1.3, much of the recent forest loss in the Boé can be attributed to the increase in cashew plantations. However, as also mentioned in section 1.3, it has proven difficult to differentiate cashew plantations from other forest types in satellite images with a moderate spatial resolution. Therefore, extra attention will be given to the separability of cashew plantations and other forest types in the classifications. Several research questions have been formulated to help accomplish this first sub-objective. These research questions are outlined in the sections below:

1.4.1.1 Sampling

Before a land cover (change) map can be created, sample data have to be collected during a fieldwork campaign. One part of the sample data will be used as training data to train a machine learning algorithm (MLA) to perform a supervised classification and eventually generate classified land cover maps. The other part of the sample data will be used to validate the classified land cover maps. The sample data serve as so-called ‘ground truth’ data, which provide truthful representations of the different land cover types in the Boé. The more representative the collected samples, the better the algorithms can be trained and the better the classified maps can be validated. Moreover, for the algorithms to perform well during the classification, a balanced sample collection is required and the different land cover types need to be distinguishable from each other. Hence, the samples have to be representative and have to be collected efficiently. However, as mentioned in section 1.3, the Boé area is not very accessible and few resources are available during the fieldwork campaign. It will therefore be both challenging and important to find the most adequate fieldwork strategy for this study area. Another challenge will be to establish a representative sample collection that the MLAs can use to successfully differentiate the land cover types in the classified land cover maps. The following research questions have therefore been formulated to investigate how these challenges can be tackled:

RQ1: “What field work sampling strategy has to be applied for the collection of representative ‘ground truth’ samples given a limited amount of resources in a fragmented study area?”

RQ2: “What methods have to be used to establish a representative and balanced sample collection?”
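The division of the sample data into a training share and a validation share, and the check on class balance, can be sketched as follows. This is a minimal illustration, assuming that every sample carries a land cover label and an id; the class names, the 70/30 ratio and the seed are hypothetical choices, not the values used in this thesis. Splitting within each class keeps the class proportions roughly equal in both shares.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, train_fraction=0.7, seed=42):
    """Split (label, sample_id) pairs into training and validation sets per class.

    Splitting within each class keeps the class proportions of the full
    sample collection roughly the same in both sets.
    """
    by_class = defaultdict(list)
    for label, sample_id in samples:
        by_class[label].append(sample_id)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, validation = [], []
    for label in sorted(by_class):
        ids = by_class[label][:]  # copy before shuffling
        rng.shuffle(ids)
        n_train = max(1, round(len(ids) * train_fraction))
        train += [(label, i) for i in ids[:n_train]]
        validation += [(label, i) for i in ids[n_train:]]
    return train, validation


# Hypothetical sample collection; the class counts reveal any imbalance up front
samples = ([("cashew", i) for i in range(10)]
           + [("gallery_forest", i) for i in range(10, 20)]
           + [("savannah", i) for i in range(20, 25)])
print(Counter(label for label, _ in samples))
train, validation = stratified_split(samples)
```

A clearly under-represented class in the printed counts would indicate that more samples of that type are needed in the field before the classifier is trained.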

1.4.1.2 Classification

The second step of this sub-objective is the classification of satellite images for the years 2016 and 2020. This classification process must provide accurate land cover maps that make it possible to differentiate between land cover types. It will therefore be interesting to assess whether some classification methods provide more accurate results than others. The following research question has been formulated to address this issue:

RQ3: “What image classification methods can best be employed for the creation of representative base maps with enough distinctiveness to differentiate between different land cover types?”
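The basic idea of a supervised classifier, assigning each pixel the label of the training samples it most resembles in spectral space, can be illustrated with a k-nearest-neighbour rule. This is only a transparent stand-in chosen for readability; it is not presented as one of the MLAs compared in this thesis, and the band choice (red, NIR, SWIR) and reflectance values are hypothetical. In practice a library implementation would normally be used.

```python
import math
from collections import Counter


def knn_classify(pixel, training, k=3):
    """Label a pixel by majority vote among its k nearest training samples.

    pixel:    tuple of band values, e.g. (red, nir, swir)
    training: list of (band_values, label) pairs
    """
    nearest = sorted(training, key=lambda s: math.dist(pixel, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (red, nir, swir) reflectances, for illustration only
training = [
    ((0.03, 0.36, 0.10), "gallery_forest"), ((0.03, 0.34, 0.11), "gallery_forest"),
    ((0.04, 0.35, 0.10), "gallery_forest"), ((0.05, 0.30, 0.15), "cashew"),
    ((0.06, 0.29, 0.16), "cashew"), ((0.05, 0.31, 0.14), "cashew"),
    ((0.10, 0.25, 0.30), "savannah"), ((0.11, 0.24, 0.31), "savannah"),
    ((0.09, 0.26, 0.29), "savannah"),
]
print(knn_classify((0.035, 0.35, 0.10), training))  # -> gallery_forest
```

When two classes lie close together in this feature space, as the made-up cashew and gallery forest samples do here, small shifts in a pixel's values can change its label, which is why the separability of these classes needs explicit attention.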


1.4.1.3 Validation of the classification

After the classifications have been performed, it is important to validate the results to assess whether the classified output is accurate and whether a clear distinction can be made between the land cover types, in particular between cashew plantations and other forest types. Additionally, it will be interesting to see whether combining conventional quantitative validation methods with qualitative validation in the form of visual inspection is useful for accuracy assessment. The fourth research question addresses this issue:

RQ4: “Does the classified output reflect an acceptable level of accuracy in order to be used as base maps, when assessed by means of a combination of quantitative and qualitative validation methods?”
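The quantitative part of such a validation is usually summarized in a confusion matrix, from which overall accuracy, user’s accuracy (1 minus the commission error of a mapped class) and producer’s accuracy (1 minus the omission error of a reference class) follow. The sketch below assumes the convention that rows hold the mapped classes and columns the reference classes; the matrix values are hypothetical and are not results of this thesis.

```python
def accuracy_metrics(matrix, classes):
    """Overall, user's and producer's accuracy from a confusion matrix.

    matrix[i][j] = number of validation samples mapped as classes[i]
                   whose reference (ground truth) class is classes[j].
    """
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    users, producers = {}, {}
    for i, name in enumerate(classes):
        mapped = sum(matrix[i])                          # row total: mapped as this class
        reference = sum(matrix[r][i] for r in range(n))  # column total: truly this class
        users[name] = matrix[i][i] / mapped if mapped else float("nan")
        producers[name] = matrix[i][i] / reference if reference else float("nan")
    return correct / total, users, producers


# Hypothetical confusion matrix (rows = map, columns = reference)
classes = ["cashew", "gallery_forest", "savannah"]
matrix = [[18, 2, 0],
          [3, 15, 2],
          [1, 3, 16]]
overall, users, producers = accuracy_metrics(matrix, classes)
```

In this made-up example the overall accuracy is 49/60 (about 82%), while cashew has a user’s accuracy of 0.9 but a lower producer’s accuracy of 18/22, showing why per-class figures are reported next to the overall value.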

1.4.1.4 Change detection

The last step of this sub-objective is to compare the classifications of 2016 and 2020 and to assess whether significant changes in land cover can be detected. Since forest loss is the main priority, extra attention will be given to conversions related to forest loss, such as the conversion from forest to cashew plantations. The research question formulated for this step is:

RQ5: “What land cover changes are found by comparing the classifications of 2016 and 2020?”
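A common way to compare two classified maps is a post-classification comparison: every pixel contributes one ‘from 2016, to 2020’ pair, and counting these pairs gives a transition matrix that can be converted into areas. The sketch below assumes two co-registered maps flattened into lists of labels and a 10 m Sentinel-2 pixel (100 m², i.e. 0.01 ha); the class names and pixel values are hypothetical.

```python
from collections import Counter

# Assumption: 10 m x 10 m Sentinel-2 pixels, i.e. 100 m2 = 0.01 ha per pixel
PIXEL_AREA_HA = 0.01


def transition_areas(map_2016, map_2020, pixel_area_ha=PIXEL_AREA_HA):
    """Area (ha) of every from->to class transition between two co-registered maps.

    Both maps are flattened lists of class labels covering the same pixels.
    """
    if len(map_2016) != len(map_2020):
        raise ValueError("maps must cover the same pixels")
    counts = Counter(zip(map_2016, map_2020))
    return {pair: n * pixel_area_ha for pair, n in counts.items()}


def class_loss_ha(areas, land_cover):
    """Total area that changed from `land_cover` to any other class."""
    return sum(a for (src, dst), a in areas.items()
               if src == land_cover and dst != land_cover)


# Hypothetical five-pixel maps
areas = transition_areas(
    ["gallery_forest", "gallery_forest", "gallery_forest", "cashew", "savannah"],
    ["gallery_forest", "cashew", "cashew", "cashew", "savannah"],
)
```

Because every pixel is compared individually, classification errors in either map also show up as apparent change, so the accuracy of both maps bounds how much of the detected change can be trusted.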

1.4.2 Sub-objective 2: explore the feasibility of a multi-temporal monitoring approach for the Boé

The second sub-objective is to explore whether a multi-temporal monitoring approach could be implemented for forest monitoring in the Boé over a longer period of time. The section below elaborates on the underlying research question of this objective:

1.4.2.1 Explorative assessment of forest monitoring systems

In order to provide a good overview of forest cover change in the Boé over the last 20 years, it can be helpful to analyse trends in forest cover on a yearly basis. The Hansen Global Forest Change v1.6 (2000-2018) dataset (Hansen et al., 2013) is designed for this purpose and will therefore be explored in this thesis. The Hansen dataset can also be used to identify the specific locations with the highest amounts of forest loss and/or forest gain in a study area. In addition, it may be interesting to identify the areas with the largest vegetation disturbances in the Boé. The BFAST multi-temporal monitoring algorithm (Reiche et al., 2015) is designed for this purpose and will therefore be implemented in this research. What makes this algorithm interesting is that it enables the user to analyze these disturbances on a monthly basis and that it also assesses their magnitude and duration. Hence, it will be interesting to assess how well these monitoring systems can be used to monitor long-term trends in land use, and in deforestation in particular. The following research question has been formulated for this assessment:

RQ6: “Are the Hansen dataset and BFAST algorithm suitable for long-term forest monitoring in a fragmented forest landscape like the Boé?”
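Yearly forest loss trends can be derived from the Hansen data by combining two of its layers: ‘treecover2000’ (canopy cover in 2000, in percent) decides which pixels counted as forest, and ‘lossyear’ (0 for no loss, 1 to 18 for loss in 2001 to 2018 in v1.6) gives the year of loss. The sketch below works on flattened pixel values and assumes a 30% canopy threshold and a pixel area of about 0.09 ha; both are illustrative assumptions rather than the settings of this thesis, and the true area of a Hansen cell varies slightly with latitude.

```python
from collections import Counter

# Approximate area of one ~30 m Hansen pixel in hectares (assumption: the true
# area of a 1 arc-second cell shrinks slightly with latitude).
PIXEL_AREA_HA = 0.09


def annual_loss_ha(lossyear, treecover2000, canopy_threshold=30,
                   pixel_area_ha=PIXEL_AREA_HA):
    """Forest loss area per calendar year from flattened Hansen pixel values.

    lossyear:      0 = no loss, 1..18 = loss detected in 2001..2018 (v1.6 coding)
    treecover2000: canopy cover in the year 2000 in percent (0-100)
    canopy_threshold: minimum canopy cover (%) for a pixel to count as forest
                      in 2000 (assumed value, not taken from the thesis)
    """
    counts = Counter(
        2000 + ly
        for ly, tc in zip(lossyear, treecover2000)
        if ly > 0 and tc >= canopy_threshold
    )
    return {year: n * pixel_area_ha for year, n in sorted(counts.items())}


# Hypothetical pixel values, for illustration only
loss = annual_loss_ha([0, 13, 13, 14, 1, 14], [80, 90, 20, 60, 50, 35])
```

The choice of canopy threshold matters in a fragmented landscape: a higher threshold excludes open woodland pixels, so loss in those areas would not be counted at all.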


1.5 General research approach

As mentioned above, the main objective of this research is to identify spatiotemporal trends in land cover in the Boé over the last 20 years, and in particular over the last four years. To realize this objective, several steps need to be carried out to answer the research questions introduced in section 1.4. These steps are highlighted in figure 1.4.

Figure 1.4: Schematic overview of the general research approach

The first step consists of a literature review, which provides insights into theoretical findings and methods applied in related studies. These insights form the basis for the development of the general research approach and the selection of the research methods. After the research approach has been developed and the methods have been selected in the second step, useful data need to be collected and the ‘raw’ data pre-processed for the analysis. This third step is largely carried out during a fieldwork campaign by means of field survey sampling. Hence, this step is connected to the first and second research questions.

Once all the data have been collected and pre-processed, they are used in the fourth step, which consists of the land cover classification of the Boé. This step is connected to the third research question. In the fifth step, the classification is validated, and in step 6 the change in land cover between different periods of time is detected. These steps are connected to the fourth and fifth research questions respectively. The seventh step entails the explorative assessment of different forest monitoring systems and is therefore connected to the last research question. In the last step, conclusions are drawn from the results of the analysis and the applicability of the study is discussed.


1.6 Research Scope and Possible Limitations

Although this research is conducted in an ‘exceptionally important priority area’ for the conservation of Chimpanzees, the habitats of Chimpanzees in the area are not its main focus, and therefore no habitat maps will be created. This research is solely concerned with the creation of land cover and land cover change maps of the Boé. However, these land cover maps will be created for Chimbo, which might use them for future research on the habitats of Chimpanzees.

One of the limitations of the fieldwork for this research is that it was conducted over a period of four months during the dry season, so no samples were taken during the wet season. Some conditions and characteristics of the land cover types differ considerably between the seasons, but these differences could not be taken into account. Because the fieldwork was performed during the dry season, the satellite images used for the classification of the land cover types were also selected from the dry season months. Hence, no classification of the land cover in the wet season will be performed in this research.

Another limitation of the fieldwork is that only the villages included in Chimbo’s CVV program could be visited, because the other villages did not have an active CVV with members who could act as local guides. Therefore, the part of the Boé west of the Corubal river could not be visited, as the villages there are not active in the CVV program.

One of the limitations of the Sentinel-2 satellite data used in this thesis is that the Boé does not fall within a single swath of this satellite, as can be seen in figure 1.5. Hence, the satellite never records the entire region on a single day. In fact, a ‘complete Sentinel-2 image’ of the region can only be obtained by mosaicking 8 different Sentinel-2 scenes into one image. The 4 scenes that represent the western part of this image are, on average, recorded three days before the scenes representing the eastern part. During mosaicking, the mean value of overlapping pixels is calculated and shown in the final image. This may lead to inconsistencies in the final thematic land cover maps. This thesis tries to overcome this problem by mosaicking the images of 2016 and 2020 in exactly the same manner. As a result, both images should contain the same inconsistencies and therefore remain comparable.

Figure 1.5: The issue with the swath of the Sentinel-2 images
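The mean compositing described above can be sketched as follows, assuming co-registered scenes of equal size in which None marks pixels outside a scene’s footprint. This is a simplified illustration of the averaging rule only, not the software used to build the mosaics, and the pixel values are hypothetical. It shows why pixels in the overlap blend observations from different acquisition dates.

```python
def mean_mosaic(scenes):
    """Per-pixel mean of overlapping, co-registered scenes.

    scenes: list of equally sized rasters (lists of rows); None marks pixels
            outside a scene's footprint. Pixels covered by no scene stay None.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    mosaic = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [s[r][c] for s in scenes if s[r][c] is not None]
            out_row.append(sum(values) / len(values) if values else None)
        mosaic.append(out_row)
    return mosaic


# Hypothetical one-row rasters: the second pixel lies in the overlap, so its
# value blends a western scene with an eastern scene recorded days later
west = [[0.10, 0.20, None, None]]
east = [[None, 0.30, 0.40, None]]
print(mean_mosaic([west, east]))
```

Because the same rule is applied to the same scene layout in both years, the blended overlap zone occurs in the same place in the 2016 and 2020 mosaics.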


1.7 Reading guide

This research first provides a review of relevant literature in chapter 2. After this review, it discusses the methods implemented in this research, which are based on the literature review. It then presents the results of the analysis in chapter 5 and discusses these results further in chapter 6. Finally, the conclusions are presented in chapter 7. All chapters are structured by research question. The map of the study area provided in figure 3.1 can be used as a reference when specific locations are mentioned in the results and discussion chapters. The abbreviations used in this report can be found in the list of abbreviations.


2. Review

2.1 Using remote sensing in land use/land cover studies

2.1.1 First satellite imaging

When the first Landsat imaging satellite was launched in 1972, a huge amount of data covering large areas became available, providing a new source of information for environmental studies (Sohl, Gallant and Loveland, 2004). The remotely sensed imagery that became available was extremely useful for scientists performing land cover and land cover change studies. Since then, many analysis techniques have been developed for creating land cover and land cover change information. Some of these techniques rely on manual interpretation, while others rely entirely on automated approaches. The latter provide more efficient, repeatable and affordable means of monitoring the landscape.

2.1.2 Algorithm-based approaches

Algorithm-based approaches make use of the spectral information recorded by remote sensing instruments and use products derived from the spectral data as measurement variables for change detection (Sohl, Gallant and Loveland, 2004). These approaches focus especially on spectral variability. One of the most commonly used algorithm-based approaches, ‘change vector analysis’ (CVA), has the advantage of providing high-level information on both the magnitude and the nature of a surface change. Hence, it is widely used to monitor vegetation and vegetation condition. Other algorithm-based methods are used to analyze trends in vegetation indices and so-called band ratios. Some approaches can be used to analyze sub-pixel land cover modifications, and lastly, some are used in fuzzy classification systems. A fuzzy classification handles sub-pixel heterogeneity and therefore allows for multiple land cover classes per image pixel (Mohammed, 2019).
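The two outputs of change vector analysis mentioned above, magnitude and nature of change, can be illustrated for a single pixel in a two-band feature space. This minimal sketch assumes red and NIR reflectance as the two bands and uses hypothetical values; the change threshold is likewise an assumed, study-specific value. The length of the vector between the two dates measures how much the pixel changed, and its angle indicates the type of change.

```python
import math


def change_vector(before, after):
    """Change vector analysis for one pixel in a two-band feature space.

    before, after: (band_x, band_y) values of the same pixel at two dates.
    Returns (magnitude, direction in degrees from 0 to 360).
    """
    dx = after[0] - before[0]
    dy = after[1] - before[1]
    magnitude = math.hypot(dx, dy)                      # how much the pixel changed
    direction = math.degrees(math.atan2(dy, dx)) % 360  # what kind of change
    return magnitude, direction


# Hypothetical (red, nir) reflectances of a forest pixel before and after clearing
magnitude, direction = change_vector((0.03, 0.35), (0.08, 0.20))
changed = magnitude > 0.1  # the threshold is an assumed, study-specific value
```

Here red increases while NIR drops, so the direction falls between 270° and 360°, the quadrant one would associate with a loss of green vegetation in this band pair.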

2.1.3 Manual interpretation of change

Besides algorithm-based approaches, manual interpretation of satellite imagery and aerial photography is often used to validate the results of semi-automated change detection procedures. According to Sohl, Gallant and Loveland (2004), a human interpreter has many advantages over an algorithm-based approach, because a human interpreter can incorporate information from all kinds of elements in a deductive image interpretation process that is unmatched by any computer algorithm. Human interpreters can also more easily interpret ancillary and subjective information, such as conversations with people who know an area well or observations made during fieldwork.

2.1.4 Interpretation elements

All the techniques mentioned above try to detect and analyze changes in the basic elements of interpretation (Sohl, Gallant and Loveland, 2004). These elements are the characteristics of land cover and land use that are represented in remotely sensed imagery, and they allow for the detection and analysis of land-surface change. The following characteristics can be considered the key variables used in the interpretation of change:

• Color/Tone: the relative responses among all spectral bands.
• Brightness: the intensity of the spectral response.
• Size: the area of a discrete surface feature.
• Shape: the geometric form of a surface feature.
• Shadow/Height: shadow effects related to feature height and viewing angle.
• Texture: the roughness or smoothness of an image feature, created by tonal repetitions within that feature.
• Pattern: the arrangement and repetition of surface features.
• Site: the geographic location or setting of a surface feature.
• Association: the spatial relationship of different surface features.
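Of these elements, texture is one that can also be quantified automatically. One simple measure, sketched below under the assumption of a single band stored as a 2-D list, is the standard deviation of pixel values in a small moving window: smooth surfaces give values near zero, while tonal repetition gives higher values. The window size and the two small test images are hypothetical, and this is only one of several possible texture measures.

```python
import statistics


def local_std_texture(image, radius=1):
    """Texture as the standard deviation of pixel values in a moving window.

    image: 2-D list of values (e.g. one spectral band). The (2*radius+1)^2
    window is clipped at the image edges. Smooth areas give values near 0;
    areas with strong tonal repetition give higher values.
    """
    rows, cols = len(image), len(image[0])
    texture = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                image[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out_row.append(statistics.pstdev(window))
        texture.append(out_row)
    return texture


# Hypothetical bands: a uniform surface and a regularly repeating pattern
smooth = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
patterned = [[0, 9, 0], [9, 0, 9], [0, 9, 0]]
```

Such a texture layer could complement spectral bands where two land cover types have a similar color/tone but a different spatial arrangement.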


2.1.5 Remote sensing studies in the context of Guinea-Bissau

Over the years, several studies on land use and land cover change in Guinea-Bissau have been conducted. These studies predominantly focused on mangrove cover change, such as Lourenço et al. (2009) and Andrieu (2018). Other studies focus on the effects of land cover change on biodiversity and ecosystem services in coastal national parks, for example Vasconcelos et al. (2002) and García del Toro and Más-López (2019). In addition, some studies assess land cover change and landscape pattern dynamics in other parts of Guinea-Bissau, such as Cabral & Costa (2017), while others focus only on changes in forest cover, such as Cassama (2019) and Melo et al. (2018). The latter two studies also include the Boé, but because they use MODIS satellite data with a coarse spatial resolution, the resulting land cover maps are not very accurate and do not distinguish a wide variety of land cover types. This highlights the need for further studies in areas of Guinea-Bissau and adjacent Guinea (Conakry), such as the Boé, to provide useful and timely information for a better understanding of land cover change.

2.2 Sampling and sampling designs

In most land cover studies, a map of the land cover of a study area is made by means of a land cover classification. This classification has to be representative and therefore has to be compared to the ‘true land cover condition’ of the area of interest. It is, however, impossible to determine this so-called ‘ground truth’ condition for an entire area, and therefore a higher-quality determination of the land cover has to be obtained. According to Stehman (2009), this higher-quality determination of land cover is often referred to as ‘reference data’, and these data are used for both the classification and the validation of land cover maps. Nevertheless, even such reference data are very difficult and expensive to obtain for an entire area of interest, and therefore (statistical) sampling becomes an important component of land cover assessments. The sections below discuss the process of sampling and the importance of an adequate sampling design, which has to be based partly on preliminary insights into the study area and the variables to be studied, partly on the daily practice of sampling in the field, and partly on a statistical approach that takes such insights and practice into account (Bron van Ron).

2.2.1 Samples and sampling frames
In land cover assessments, a sample is a subset or portion of the area that will be mapped. McRoberts et al. (2012) stress the importance of collecting samples that are representative of the entire land cover condition, because a representative sample makes the land cover estimate more accurate and less likely to deviate from the true land cover condition.

To achieve this, an adequate sampling frame has to be selected. In this process, three terms can be distinguished. First, the 'sampling frame' refers to the set of all possible sample units. Second, the 'sample design' is the protocol for selecting the sample units that will represent the true land cover condition. Finally, the 'plot configuration' refers to the shape, size and components of the field plots that represent the sampling unit (McRoberts et al., 2012).

Selecting the best sampling design is not easy, as several difficulties are associated with it. For example, sample units are distributed in space, but observations of these units are often spatially correlated (McRoberts et al., 2012): because of climatic, ecological, soil and other factors, observations from sample plots that are close to each other are in general more similar than observations from plots that are far apart. Another difficulty is that different sampling designs have different, and sometimes very high, costs (McRoberts et al., 2012).

2.2.2 Subjective sampling
According to McRoberts et al. (2012), there are two general sampling approaches: 'subjective/purposive sampling' and 'probability sampling'. Subjective sampling relies on professional judgement to select sample units that are believed to represent the entire population well. This judgement is often provided by an expert with extensive knowledge of the subject matter. One advantage of subjective sampling is that the selected sample units are often convenient to measure, which reduces the costs of the sampling design. A disadvantage, on the other hand, is that data gathered with subjective sampling often accurately describe the conditions on the sampled plots but do not accurately characterize the entire population. Another disadvantage is that the probability that the expert selects a given sample plot is unknown, so estimates based on such a sample cannot be scientifically defended.

2.2.3 Probability sampling
In probability sampling, subjective (expert) judgements are replaced with objective, quantifiable rules based on known probabilities of selection for each member of the population. Hence, this method relies on a mathematical foundation and on precise rules for estimating population attributes from a sample.

After the population attributes have been defined and representative sample units have been selected, it is important to decide how the sample units will be spatially distributed in the field in order to minimize standard errors and the effects of spatial correlation. Several designs can be distinguished for this procedure, some of which are depicted in examples a, b, c and d of figure 2.1.

Simple random sampling
The most basic equal probability sampling design is the 'simple random sample', shown in figure 2.1a. The sample plots are randomly distributed over the study area and, although the distribution contains spatial clusters and voids, it is still a valid probability sample (McRoberts, Tomppo and Czaplewski, 2012). The coordinates of the sample plots can be generated with a random number generator or GIS tools, as long as the allowable coordinates are restricted to the sampled population of the study area. The advantage of simple random sampling is that it is the least risky equal probability design, because all possible sampling unit locations have an equal probability of being selected. The disadvantage is that simple random sampling gives no consideration to the safety and difficulty of measuring plots and of traveling between plot locations. This method is therefore usually the least efficient with regard to costs and precision of estimates, partly because randomly placed plots can fall close to each other, so that their observations tend to be more strongly spatially correlated.

Figure 2.1: Examples of sampling designs for land cover assessments (McRoberts, Tomppo and Czaplewski, 2012)
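The simple random design can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for this thesis: it assumes projected coordinates in metres, and the circular `inside_study_area` test is a hypothetical stand-in for the real Boé boundary. Points are drawn uniformly in the bounding box and rejected when they fall outside the study area, which keeps every allowable location equally likely to be selected.

```python
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


def simple_random_sample(
    n: int,
    bbox: Tuple[float, float, float, float],
    inside: Callable[[Point], bool],
    seed: Optional[int] = None,
) -> List[Point]:
    """Draw n plot locations uniformly at random within the study area.

    bbox is (xmin, ymin, xmax, ymax). Candidates that fall inside the box but
    outside the study area are rejected, so every allowable location has the
    same probability of being selected.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    plots: List[Point] = []
    while len(plots) < n:
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if inside(candidate):
            plots.append(candidate)
    return plots


# Hypothetical study area: a circle with a 5 km radius around the origin.
def inside_study_area(p: Point) -> bool:
    return p[0] ** 2 + p[1] ** 2 <= 5000.0 ** 2


plots = simple_random_sample(50, (-5000.0, -5000.0, 5000.0, 5000.0), inside_study_area, seed=1)
```

Because nothing controls where the plots land, plotting the result for a few different seeds shows the clusters and voids visible in figure 2.1a.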


Systematic sampling
The systematic sampling design uses a fixed array or grid to assign sample plots in a regular pattern, as can be seen in example b of figure 2.1. Systematic sampling can be based on hexagonal arrays or on rectangular grids such as rasters. First, the starting point and orientation of the array/grid are randomly selected; once this is done, the plots are assigned in a systematic pattern, for example at the intersections or in the centres of the array/grid cells. One of the main advantages of systematic sampling is that it spreads the sample plots evenly and maximizes the distance between neighbouring plots, which minimizes the spatial correlation among observations and increases statistical efficiency. The biggest risk of systematic sampling is that the array/grid may coincide with natural or man-made features such as mountain ridges or roads, which may lead to higher spatial correlation among observations. Systematic sampling can be combined with simple random sampling by assigning the sample plots to randomly selected locations within the array or grid cells. This design is called systematic unaligned sampling and is depicted in example c of figure 2.1.
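A minimal sketch of systematic and systematic unaligned sampling, assuming a rectangular extent in projected metres. For brevity the grid is kept north-aligned; the random orientation mentioned above would require an extra rotation step that is omitted here, and the extent and spacing in the example are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def systematic_sample(
    bbox: Tuple[float, float, float, float],
    spacing: float,
    unaligned: bool = False,
    seed: Optional[int] = None,
) -> List[Point]:
    """Place one plot in every spacing x spacing cell of a north-aligned grid.

    Aligned (default): one random offset is drawn once and shared by all
    cells, so the plots form a regular lattice with a random start.
    Unaligned: each cell gets its own random offset (systematic unaligned).
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    n_cols = int((xmax - xmin) // spacing)
    n_rows = int((ymax - ymin) // spacing)
    start = (rng.uniform(0, spacing), rng.uniform(0, spacing))
    plots: List[Point] = []
    for i in range(n_cols):
        for j in range(n_rows):
            dx, dy = (rng.uniform(0, spacing), rng.uniform(0, spacing)) if unaligned else start
            plots.append((xmin + i * spacing + dx, ymin + j * spacing + dy))
    return plots


aligned = systematic_sample((0.0, 0.0, 10000.0, 10000.0), 1000.0, seed=7)
unaligned = systematic_sample((0.0, 0.0, 10000.0, 10000.0), 1000.0, unaligned=True, seed=7)
```

The only difference between the two designs is whether the random offset is drawn once or once per cell, which is exactly the distinction between examples b and c of figure 2.1.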

Cluster sampling
The designs mentioned above are often not the most practical with regard to cost efficiency and travel costs. One way to increase cost efficiency and decrease travel times is to implement a cluster sampling design, in which sample plots are organized in clusters, as depicted in example d of figure 2.1. McRoberts, Tomppo and Czaplewski (2012) distinguish between 'systematic cluster sampling' and 'stratified systematic cluster sampling'. In systematic cluster sampling, clusters of sample plots are distributed throughout the entire population by means of tessellations such as grids or hexagons. Several issues have to be considered when implementing a cluster sampling design: the shape of the cluster, the spacing between clusters, the number of sample plots per cluster and the sample plot configuration. To address these issues adequately, preliminary information is required about the spatial distribution of the variables of interest and the correlation between these variables.
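A minimal sketch of systematic cluster sampling under hypothetical settings: cluster centres are placed on a square grid with a random start, and each centre carries the same four-plot configuration. The spacing and plot offsets are illustrative assumptions, and plots that fall outside the real study area would still have to be discarded.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical plot configuration: four plots on the corners of a 200 m square,
# given as offsets (in metres) from the cluster centre.
PLOT_OFFSETS: List[Point] = [(-100.0, -100.0), (-100.0, 100.0), (100.0, -100.0), (100.0, 100.0)]


def systematic_cluster_sample(
    bbox: Tuple[float, float, float, float],
    cluster_spacing: float,
    offsets: List[Point] = PLOT_OFFSETS,
    seed: Optional[int] = None,
) -> List[List[Point]]:
    """Place cluster centres on a square grid with a random start and attach
    the same plot configuration to every centre."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    x0 = xmin + rng.uniform(0, cluster_spacing)
    y0 = ymin + rng.uniform(0, cluster_spacing)
    clusters: List[List[Point]] = []
    cx = x0
    while cx < xmax:
        cy = y0
        while cy < ymax:
            # All plots of one cluster lie within a short walk of each other.
            clusters.append([(cx + dx, cy + dy) for dx, dy in offsets])
            cy += cluster_spacing
        cx += cluster_spacing
    return clusters


clusters = systematic_cluster_sample((0.0, 0.0, 10000.0, 10000.0), 2500.0, seed=3)
```

Travel is then needed only between 16 cluster centres instead of between 64 independent plots, which is the cost argument made above.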

Stratified sampling
The last sampling design, stratified sampling, differs slightly from the designs above because it first divides the population into non-overlapping subpopulations (delineated as polygons, for example) that together make up the entire population. These subpopulations are called 'strata', and once the strata have been delineated, independent samples can be drawn from each stratum. As stratified sampling is the sampling design chosen for this research, it is discussed in more detail than the other designs.

Stratified sampling has several advantages: it can increase the precision of population estimates, and it may help to avoid bias when appropriate estimators are selected. According to McRoberts, Tomppo and Czaplewski (2012), the increase in precision may occur when a heterogeneous population is divided into more homogeneous subpopulations whose means, variances or both differ substantially. Another advantage of stratified sampling is that different sampling protocols and estimation procedures can be used for different strata. McRoberts, Tomppo and Czaplewski (2012) give the example of a forest cover assessment in which remote sensing data were used to determine that some plots were located on non-forest land and therefore did not have to be visited by the field team, which substantially reduced travel costs. They also state that the greatest benefits of stratified sampling are realized when the population is stratified and the stratum sample sizes are determined before the actual sampling is conducted. Like the other designs, stratified sampling can be combined with random sampling by selecting the samples within each stratum at random. This procedure is described as stratified random sampling.
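Stratified random sampling can be sketched as an independent simple random sample per stratum, with the stratum sample sizes fixed beforehand. The stratum names, the candidate unit IDs and the sample sizes below are hypothetical; in practice the candidate units would come from the delineated strata polygons.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence


def stratified_random_sample(
    strata: Dict[str, Sequence[Hashable]],
    n_per_stratum: Dict[str, int],
    seed: Optional[int] = None,
) -> Dict[str, List[Hashable]]:
    """Draw an independent simple random sample, without replacement, from each stratum.

    strata maps each stratum to its candidate sample units (for example
    segment or pixel IDs); n_per_stratum fixes the stratum sample sizes
    before any sampling takes place.
    """
    rng = random.Random(seed)
    sample: Dict[str, List[Hashable]] = {}
    for name, units in strata.items():
        k = n_per_stratum[name]
        if k > len(units):
            raise ValueError(f"stratum {name!r} has {len(units)} units but {k} were requested")
        sample[name] = rng.sample(list(units), k)
    return sample


# Hypothetical strata of candidate unit IDs and stratum sample sizes.
strata = {"cashew": range(0, 400), "gallery_forest": range(400, 700), "savanna": range(700, 2000)}
sizes = {"cashew": 70, "gallery_forest": 50, "savanna": 50}
sample = stratified_random_sample(strata, sizes, seed=42)
```

Because the sizes are set per stratum, a class of particular interest can be given a larger share simply by raising its entry in `sizes`.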

2.2.4 Applied sampling designs in other LULC studies
As reference data are a vital component of land cover classification and validation, all LULC studies require some form of sampling design for the collection of samples. However, as each study area is unique, each LULC study also requires its own sampling design or strategy. In addition to the sampling designs, the methods for the actual collection of samples also differ considerably between studies. Many studies use field surveys to collect samples. An example of a successful field survey can be found in the study of Fischer et al. (2015), who developed a systematic cluster sampling design and laid out large transects within the clusters to assess forest resources at the national level in Burkina Faso. In addition to cluster sampling designs, stratified sampling designs are frequently implemented for field surveys as well. These often require so-called a priori knowledge to identify the strata. Yang et al. (2016), for example, first obtained a priori knowledge of the land use history, biodiversity condition, soil fertility and related socio-economic issues of their study area and used this to identify land cover strata. Once the strata were identified, they randomly allocated sample plots within the strata to measure the above-ground biomass of forest, plantation and agricultural patches for the calculation of time-averaged carbon stocks. A similar sampling design will be employed in this research.

In other studies, the researchers do not conduct field surveys but interpret remote sensing data to collect samples instead. The locations of sample plots are often allocated with sampling designs similar to those used in field surveys, but the sample plots are observed not at their 'real life' locations on the ground but on remotely sensed images or on products derived from them. A good example of a study in which remote sensing data were used to collect samples is the thesis of Mohammed (2019), in which the author implemented a random cluster design and collected samples with the help of high-resolution Google Earth and DigitalGlobe images. Useya et al. (2019) used existing land cover type and cropland products, such as the MODIS Land Cover Type product (MCD12Q1.006) and the Global Food Security-support Analysis Data at 30 m for the African Continent, Cropland Extent (GFSAD30AFCE), to collect reference data.

2.2.5 Sample size
According to McRoberts, Tomppo and Czaplewski (2012), determining the sample size is one of the most important steps in constructing a sample design, because the uncertainty of a classification increases when the sample is too small and the cost increases when the sample is too large. Jensen (1994) likewise notes that it is often difficult to determine the number of mapping units (pixels) to be sampled on the ground for training and validation purposes. He states that the number of training samples collected for each land cover class should equal 10 times the number of bands used for the classification. This rule is still applied in many LULC change studies today, but more recent studies sometimes question it. For example, Van Niel, McVicar and Datt (2005) argue that in some cases similar levels of classification accuracy can be achieved with 2 or 4 times the number of bands instead of 10 or even 30 times.

To calculate the required number of validation samples, many analysts also use equations based on the binomial distribution. These equations are typically based on the expected proportion of correctly classified samples and an allowable error. McRoberts, Tomppo and Czaplewski (2012), for example, mention the use of equations for confidence intervals and confidence coefficients to determine the size of a probability sample in a forest assessment.
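One widely used equation of this kind is the normal approximation to the binomial, n = z² · p · (1 − p) / E², where p is the expected proportion of correctly classified samples, E the allowable error and z the two-sided critical value for the chosen confidence level. The sketch below illustrates that formula; it is not necessarily the equation used by McRoberts, Tomppo and Czaplewski (2012), and the expected accuracy of 85% and allowable error of 5% in the example are hypothetical inputs.

```python
import math
from statistics import NormalDist


def binomial_sample_size(expected_accuracy: float, allowable_error: float, confidence: float = 0.95) -> int:
    """Number of validation samples needed to estimate a proportion correct.

    Uses n = z^2 * p * (1 - p) / E^2 and rounds up to the next whole sample.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p = expected_accuracy
    return math.ceil(z ** 2 * p * (1 - p) / allowable_error ** 2)


n_total = binomial_sample_size(0.85, 0.05)  # 196
```

With these inputs about 196 validation samples are needed in total. This number concerns the overall accuracy only and says nothing about how the samples should be spread over the classes of an error matrix.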

Although these equations are statistically sound for determining the sample size required to compute the overall accuracy of a classification, they are not designed to select a sample size for filling an error matrix (Congalton, 1991). According to Glantz (1993), traditional thinking about sampling does not apply to LULC studies because of the large number of pixels in remotely sensed data. A single Landsat scene covering the Boé area, for example, consists of more than 9 million pixels. Hence, it is important to find a balance between what is statistically sound and what is practically attainable.

In his 1991 article, Congalton suggested that, as a good rule of thumb for an adequate accuracy assessment, a minimum of 50 samples should be collected for each land cover class represented in the error matrix. He adds that when the study area is especially large (more than 1 million acres, for example) or when the classification has a large number of land cover classes (more than 12, for example), the minimum sample size should be increased to 75 or 100 samples per class. Furthermore, the sample size can be adjusted according to the relative importance of a land cover class for the objectives of a project. According to Jensen (1994), one of the main goals is to balance the statistical requirement of an adequate sample for computing an appropriate error matrix against the cost, time and practical limitations associated with the project.
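Both rules of thumb discussed in this section reduce to simple arithmetic, sketched below. The band and class counts in the example calls are hypothetical, and returning 75 rather than 100 for the enlarged case is an arbitrary choice within Congalton's suggested range.

```python
def training_samples_per_class(n_bands: int, factor: int = 10) -> int:
    """Jensen's (1994) rule of thumb: factor x the number of bands.

    Van Niel, McVicar and Datt (2005) report that factors of 2 or 4 can give
    similar accuracy in some cases.
    """
    return factor * n_bands


def validation_samples_per_class(n_classes: int, area_acres: float) -> int:
    """Congalton's (1991) rule of thumb: at least 50 samples per class, raised to
    75 (the lower end of the 75-100 range) for very large areas (> 1 million
    acres) or many classes (> 12)."""
    if area_acres > 1_000_000 or n_classes > 12:
        return 75
    return 50


print(training_samples_per_class(10), training_samples_per_class(10, factor=2))  # 100 20
print(validation_samples_per_class(8, 100_000))  # 50
```

The comparison makes the gap between the two rules visible: with many bands, Jensen's training rule can ask for far more samples per class than Congalton's validation minimum.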

Although Congalton's rule of thumb was introduced in 1991 and only applies to the number of validation samples, it is still applied in many current LULC change studies, and quite often it is also used to determine the total sample size, meaning training plus validation samples. For example, in their study of shifting cultivation practices in the West Garo Hills in India, Kurien et al. (2019) collected a total of 677 samples (training + validation) across 13 land cover classes, which is slightly more than 50 per class.

This rule of thumb is also applied in this study: at least 50 samples of each class will be collected for training and validation purposes, with a slight emphasis on the cashew plantation class, of which more samples will be collected.

2.2.6 Practical side of field survey sampling
A common issue in field survey sampling is the misregistration of reference data due to locational and GPS errors. Gu, Congalton and Pan (2015) stress that synchronization errors between satellites and code-phase GPS receivers can result in locational errors of 5 to 20 meters. According to Studer (2019), the Garmin device that the Chimbo team uses for field surveys has an accuracy of approximately 6 m.

2.2.7 Spectral assessment of collected samples
The spectral signatures of the collected samples are often assessed to see whether the sampled land cover types can be distinguished from each other. There are several methods for this type of assessment, but according to Reiche et al. (2015), calculating the Jeffries-Matusita distance (JM-distance) is one of the most commonly used methods in remote sensing studies. The (normalised) JM-distance is a measure of pair-wise class separability that allows for an objective comparison (Reiche et al., 2015). The normalised JM-distance ranges from 0 (inseparable) to 2 (separable). According to Song et al. (2018), a value of 1.8 is often considered the threshold for an acceptable level of separability.
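Under the common assumption that each class follows a multivariate normal distribution, the JM-distance can be computed from the Bhattacharyya distance B as JM = 2(1 − e^(−B)), which gives the 0 to 2 range described above. The sketch below assumes NumPy and takes the sampled pixel values of two classes as (samples × bands) arrays; it is an illustration, not necessarily the implementation used by Reiche et al. (2015) or Song et al. (2018).

```python
import numpy as np


def jm_distance(samples_a, samples_b) -> float:
    """Jeffries-Matusita distance between two classes assumed to be Gaussian.

    Inputs are (n_samples, n_bands) arrays of band values; the result lies
    between 0 (inseparable) and 2 (separable).
    """
    a = np.asarray(samples_a, dtype=float)
    b = np.asarray(samples_b, dtype=float)
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    cov = (cov_a + cov_b) / 2
    diff = mu_a - mu_b
    # Bhattacharyya distance: mean-difference term plus covariance term.
    mean_term = diff @ np.linalg.solve(cov, diff) / 8
    _, logdet = np.linalg.slogdet(cov)
    _, logdet_a = np.linalg.slogdet(cov_a)
    _, logdet_b = np.linalg.slogdet(cov_b)
    cov_term = 0.5 * (logdet - 0.5 * (logdet_a + logdet_b))
    return float(2 * (1 - np.exp(-(mean_term + cov_term))))
```

In use, the result for a pair such as cashew plantation versus gallery forest samples would be compared with the 1.8 threshold mentioned by Song et al. (2018); pairs below it are candidates for merging classes or adding input data.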

Singh et al. (2019) also calculated the JM-distance to measure how well cashew plantations could be differentiated from successional forests in a national park in Cambodia. They concluded that ALOS L-band radar and LiDAR data were better suited than optical Landsat data for differentiating between cashew plantations and forests. Therefore, Sentinel-1 radar data will also be used for the classification in this research, and a spectral assessment similar to that of Singh et al. (2019) will be employed as well.

2.2.8 Importance of sample selection for classifications
According to Millard and Richardson (2015), several aspects of the sampling design used to collect training and validation data play an important role in the classifications produced by machine learning algorithms (MLAs) such as the Random Forest algorithm. They state that, regardless of the choice of MLA classifier, several factors related to sample selection can affect classification accuracy, including the number of classes, the training sample size and the ability of the training data to adequately characterize the land cover classes being mapped. The general rule in image classification and accuracy assessment is that the training and validation data should be representative of the entire landscape being mapped, should be statistically independent (not clustered, for example) and should be abundant for all land cover classes. However, Millard and Richardson (2015) stress that when the sample data are not randomly distributed over the training and validation collections, they tend to violate the assumption of independence, which may lead to optimistic bias, so that the reported classification accuracy is inflated. Hence, to avoid this type of bias, the validation data must be drawn from a sample that is independent of the training data.
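One common way to keep validation samples independent of training samples, sketched here under the assumption that every sample carries the ID of the field polygon or site it was taken from, is to assign whole polygons rather than individual samples to either set. Samples from the same polygon are then never split across training and validation. The sample tuples and the 30% validation share are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (sample_id, land_cover_class, polygon_id); polygon IDs are assumed unique across classes.
Sample = Tuple[str, str, str]


def split_by_polygon(
    samples: List[Sample], validation_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples so that all samples from one polygon end up in the same set.

    Polygons are shuffled per class, so every class keeps roughly the requested
    share of validation polygons (at least one per class).
    """
    rng = random.Random(seed)
    polygons_per_class: Dict[str, Set[str]] = defaultdict(set)
    for _, label, polygon in samples:
        polygons_per_class[label].add(polygon)
    validation_polygons: Set[str] = set()
    for label in sorted(polygons_per_class):
        polygons = sorted(polygons_per_class[label])
        rng.shuffle(polygons)
        n_val = max(1, round(len(polygons) * validation_fraction))
        validation_polygons.update(polygons[:n_val])
    train = [s for s in samples if s[2] not in validation_polygons]
    validation = [s for s in samples if s[2] in validation_polygons]
    return train, validation


# Hypothetical example: 2 classes x 10 polygons x 5 samples per polygon.
samples = [
    (f"{c}-{p}-{i}", c, f"{c}-poly{p}")
    for c in ("cashew", "gallery_forest")
    for p in range(10)
    for i in range(5)
]
train, validation = split_by_polygon(samples)
```

A purely random split of the same samples would put neighbouring pixels of one plantation in both sets, which is the kind of dependence that inflates accuracy.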

Furthermore, Millard and Richardson (2015) state that MLA classifiers may produce biased results when the class proportions in the training/validation data are imbalanced compared with the actual land cover proportions. In these instances, the classifier may favour the classes that make up the largest proportions of the sample collection, often referred to as 'majority classes'. When these classes are over-represented in the training sample collection, they may dominate the classification result, and the classes that are under-represented in the training sample collection ('minority classes') may also be under-represented in the resulting classification.


When the study area contains many rare land cover classes that each represent only a small proportion of the landscape, imbalanced sample collections are sometimes unavoidable. One of the most common approaches to mitigate this problem is to use over-sampling and under-sampling methods to produce more balanced sample collections (Millard & Richardson, 2015). According to Ramentol, Caballero, Bello and Herrera (2011), under-sampling methods create a subset of the original dataset by eliminating some of the examples of the majority class, while over-sampling methods create a superset of the original dataset by replicating some of the examples of the minority class or by creating new ones from the original minority class instances.
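Random under- and over-sampling can be sketched as follows. This is a minimal illustration with hypothetical class labels and target size; it only replicates existing minority examples, whereas methods that create new synthetic minority examples (the second option described by Ramentol et al.) would replace the duplication step.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (feature_vector, land_cover_class)
Sample = Tuple[Sequence[float], str]


def rebalance(samples: List[Sample], target_per_class: int, seed: int = 0) -> List[Sample]:
    """Randomly under-sample classes above target_per_class and over-sample
    (by replicating existing examples) classes below it."""
    rng = random.Random(seed)
    by_class: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    balanced: List[Sample] = []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) >= target_per_class:
            # Under-sampling: keep a random subset, drop the other examples.
            balanced += rng.sample(members, target_per_class)
        else:
            # Over-sampling: keep every example and add random duplicates.
            balanced += members + rng.choices(members, k=target_per_class - len(members))
    return balanced


# Hypothetical imbalanced collection: 200 savanna vs 30 gallery forest samples.
samples = [((0.2, 0.3), "savanna")] * 200 + [((0.05, 0.4), "gallery_forest")] * 30
balanced = rebalance(samples, target_per_class=100)
```

Rebalancing should only be applied to the training set; the validation set has to keep the original proportions to give an honest accuracy estimate.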

Furthermore, Millard and Richardson (2015) stress that the problems associated with imbalanced sample collections may be aggravated when high-dimensional input datasets, meaning datasets with many different dimensions and variables, are combined with small sample sizes. In those instances it is very difficult for the MLA to 'learn', because it has to make decisions about a large number of features on the basis of only a limited number of sample points. As datasets are becoming increasingly complex, MLA classifiers require larger training sample collections to achieve acceptable levels of accuracy. Even though most MLA classifiers, such as the Random Forest algorithm, are able to deal with high-dimensional datasets, the resulting classifications can be significantly improved if only the most important input variables are used and correlated variables are removed.
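Removing correlated input variables can be done with a simple greedy filter, sketched below under stated assumptions: the variables are assumed to be ordered by importance beforehand (for example by a Random Forest variable importance ranking), the 0.9 threshold is hypothetical, and the variable names and values are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long, non-constant series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[str]:
    """Keep features in the given order (e.g. ranked by importance) and drop any
    feature whose absolute correlation with an already kept feature exceeds threshold."""
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-sample values of three input variables.
features = {
    "red": [1.0, 2.0, 3.0, 4.0, 5.0],
    "red_rescaled": [2.0, 4.0, 6.0, 8.0, 10.1],
    "vh_backscatter": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_correlated(features))  # ['red', 'vh_backscatter']
```

Ordering by importance first matters: of two strongly correlated variables, the filter keeps whichever comes first.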

In their study, Millard and Richardson conducted a sensitivity analysis of the Random Forest algorithm to assess how three aspects of the sampling strategy and the resulting training data (sample size, spatial autocorrelation and the proportions of classes within the training sample) influence the Random Forest classification. A similar sensitivity analysis for different MLA classifiers will be employed in this study.

2.3 Classification
Mohammed (2019) distinguishes between three approaches to classification: manual classification, pixel-based classification and object-based classification. Each approach is associated with different classification methods and algorithms, which are discussed by Mohammed (2019). This thesis only focuses on object-based methods and algorithms, which are therefore discussed further in the sections below.

2.3.1 Pixel-based classification
In pixel-based classification, the spectral information of individual pixels in remote sensing images is used to create classified thematic maps: each pixel is assigned to a land cover class based on its spectral information. The spatial context, for example the surrounding pixels, is not considered. One of the advantages of pixel-based classification methods is that they utilize the rich spectral information of remote sensing images; they are also generally easy to implement and more efficient than manual methods. According to Mohammed (2019), some of the most common methods are spectral mixture analysis, fuzzy classification, unsupervised classification and supervised classification. The last two are the more traditional methods, and both are implemented in this research. However, the supervised classification is implemented not in combination with a pixel-based classification but with an object-based classification, which is discussed in section 2.3.3.

2.3.2 Unsupervised classification
Enderle and Weih (2005) point out that unsupervised classification involves separating pixels into natural classes based on similar spectral characteristics, which is achieved by means of classification algorithms. Once the pixels have been assigned to these classes, the analyst assigns the classes to land cover types. According to Hank, this approach has several advantages. First, extensive knowledge of the study area is not needed for the initial separation of image pixels. Second, unique classes that may be overlooked in a supervised specification can be identified by unsupervised classification. Third, there is a lower chance of human error, because the analyst has to make fewer decisions during the classification process. However, the method also has disadvantages and limitations. Because the analyst has little control over the classes produced by the classification process, the groupings of spectral classes do not always correspond to the desired land cover types. In addition, the natural classes identified by the unsupervised classification process are spectrally homogeneous and may not correspond to the land cover classes of interest.
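K-means clustering is one common algorithm for this kind of separation (ISODATA is another); no claim is made here that it matches the algorithm used later in this thesis. The minimal sketch below clusters hypothetical two-band pixel values; as described above, mapping the resulting cluster numbers to land cover types remains a manual step for the analyst.

```python
import random
from typing import List, Sequence, Tuple

Pixel = Tuple[float, ...]


def kmeans(pixels: Sequence[Pixel], k: int, iterations: int = 20, seed: int = 0) -> List[int]:
    """Group pixels into k spectral clusters and return a cluster index per pixel."""
    rng = random.Random(seed)
    centres = list(rng.sample(list(pixels), k))
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assign each pixel to the spectrally nearest centre (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(px, centres[c])))
            for px in pixels
        ]
        # Move each centre to the mean spectrum of the pixels assigned to it.
        for c in range(k):
            members = [px for px, lab in zip(pixels, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(band) / len(members) for band in zip(*members))
    return labels


# Hypothetical two-band reflectance values of six pixels.
pixels = [(0.10, 0.20), (0.12, 0.22), (0.11, 0.19), (0.80, 0.90), (0.82, 0.88), (0.79, 0.91)]
clusters = kmeans(pixels, k=2)
```

The cluster numbers themselves carry no meaning; which of them is, say, savanna has to be decided by the analyst afterwards.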

2.3.3 Object-based classification
Mohammed (2019) stresses that recent developments in remote sensing data sources, especially high-resolution data sources, have advanced the development of object-based methods. The main difference between object-based and pixel-based classification is that object-based methods use contextual information, for example information from neighbouring pixels, in addition to the spectral information of individual pixels. Object-based classification involves two steps: segmentation and classification. During segmentation, the area of interest is 'segmented' into small homogeneous clusters, which form the objects for the classification. Image segmentation starts with single pixels as objects and then merges similar pixels into larger objects. The merging is based on specified contextual and spectral criteria; contextual criteria can be the shape and compactness of the clusters, for example. These criteria are specified by the producer of the image segmentation, who can determine to what extent the contextual and spectral criteria contribute to defining the homogeneous clusters. During classification, the identified objects are assigned to land cover classes based on their statistical properties.
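The bottom-up merging idea can be illustrated with a much simplified region-growing sketch. It assumes a single-band image given as nested lists and uses only a spectral criterion (the difference from the running region mean must stay below a hypothetical threshold); the shape and compactness criteria and multi-scale merging of real segmentation software are deliberately left out.

```python
from collections import deque
from typing import List


def region_grow(image: List[List[float]], threshold: float) -> List[List[int]]:
    """Segment a single-band image into 4-connected regions of similar pixels.

    Each unlabelled pixel seeds a new region, which absorbs neighbouring
    pixels whose value differs from the current region mean by less than
    threshold. Returns a segment label per pixel.
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue
            labels[r0][c0] = next_label
            total, count = image[r0][c0], 1
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == -1:
                        if abs(image[nr][nc] - total / count) < threshold:
                            labels[nr][nc] = next_label
                            total += image[nr][nc]
                            count += 1
                            queue.append((nr, nc))
            next_label += 1
    return labels


# Hypothetical 3 x 3 image: dark left part, bright right column.
image = [[0.10, 0.10, 0.80], [0.10, 0.12, 0.80], [0.11, 0.10, 0.82]]
segments = region_grow(image, threshold=0.1)
```

The threshold plays the role of the producer-defined criteria mentioned above: a larger value merges more pixels (risking under-segmentation), a smaller one splits more (risking over-segmentation).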

One advantage of object-based over pixel-based classification is its ability to better handle within-field variability. Especially in fragmented, complex landscapes, object-based classification provides better results than pixel-based classification. In addition, dividing the area of interest into objects avoids the 'salt and pepper' effect of pixel-based classifications.

A drawback is that this type of classification needs high-resolution images, which increases research costs. Furthermore, segmentation depends heavily on the criteria specified by the producer of the segmentation: wrong criteria can strongly affect the accuracy of the segmentation and thus the accuracy of the final classified product. Over- and under-segmentation are frequently occurring issues during the segmentation process.

2.3.4 Supervised classification
According to Campbell (2002), supervised classification involves classifying pixels or segments of unknown identity with a classification algorithm that uses the spectral characteristics of pixels of known land cover classes, namely training areas identified by the analyst. This approach also has advantages as well as disadvantages and limitations. One of the main advantages is that the analyst has full control over the land cover types to be assigned in the classification, which makes it easier to compare the classification with other classifications that use the same classes. Another advantage is that the analyst does not have to match spectral classes to land cover types, because this is addressed when the training areas are selected. In addition, the final classification can be compared with the training data to detect errors or problems in the classification process. Many of the limitations are related to the selection of training areas. Campbell (2002) argues that by selecting training areas the analyst imposes a classification structure on the data, including land cover types that may not be present in the data. In identifying the training samples, the spectral properties of the pixels are usually not the primary characteristics used, which may lead to overlap in the classification process. To identify and select the right training areas, the analyst needs extensive knowledge of the study area, and this process requires time and resources that an unsupervised classification does not. In addition, Enderle and Weih (2005) point out that the analyst might overlook some unique classes that are present in the image when selecting the land cover types and training areas.

2.3.5 Maximum likelihood method
One of the most common methods for supervised classification is the Maximum Likelihood (ML) method. According to Mohammed (2019), this method assigns to each pixel the probability of belonging to each land cover class, based on a statistical model of the classes derived from variance and covariance calculations, and allocates the pixel to the most probable class. He also argues, however, that the main problem of this method is that it assumes that the probability density functions of all classes are normally distributed. This is a problem because real-world distributions are far more complex.
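A minimal Gaussian maximum likelihood classifier can be sketched with NumPy: the class means and covariance matrices are estimated from training pixels, and each pixel goes to the class with the highest likelihood. Equal class priors and the two-band training values are assumptions for illustration, and the sketch inherits exactly the normality assumption criticised above.

```python
import numpy as np


def fit_ml(train_x, train_y):
    """Estimate the mean vector and covariance matrix of every class from training pixels."""
    x = np.asarray(train_x, dtype=float)
    y = np.asarray(train_y)
    return {
        c: (x[y == c].mean(axis=0), np.atleast_2d(np.cov(x[y == c], rowvar=False)))
        for c in np.unique(y)
    }


def classify_ml(model, pixels):
    """Assign each pixel to the class with the highest Gaussian log-likelihood.

    Equal prior probabilities are assumed; the constant term -d/2 * ln(2*pi)
    is the same for every class and is therefore dropped.
    """
    x = np.atleast_2d(np.asarray(pixels, dtype=float))
    classes = list(model)
    scores = []
    for c in classes:
        mean, cov = model[c]
        diff = x - mean
        mahalanobis = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        scores.append(-0.5 * (logdet + mahalanobis))
    return np.array(classes)[np.argmax(np.vstack(scores), axis=0)]


rng = np.random.default_rng(0)
forest = rng.normal([0.05, 0.40], 0.02, size=(100, 2))  # hypothetical band values
savanna = rng.normal([0.15, 0.25], 0.02, size=(100, 2))
model = fit_ml(np.vstack([forest, savanna]), ["forest"] * 100 + ["savanna"] * 100)
predicted = classify_ml(model, [[0.05, 0.40], [0.15, 0.25]])
```

The whole model is just one mean vector and one covariance matrix per class, which is what makes the method parametric and sensitive to classes whose pixel values are not normally distributed.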

2.3.6 Machine learning algorithms
Mohammed (2019) therefore suggests that advances in computer technology have contributed a lot to classification methods through the introduction of machine learning algorithms. The most common supervised machine learning algorithms for image classification are 'support vector machines' (SVM), 'artificial neural networks' (ANN), 'decision trees' and 'random forest'. Unlike the maximum likelihood method, which is a parametric classifier, these algorithms are non-parametric: they are data driven and therefore overcome the problem of distribution assumptions (Mohammed, 2019). A parametric algorithm summarizes the data with a set of parameters of fixed size, which requires assumptions about the form of the data distribution; non-parametric algorithms make no such assumptions, so the data do not have to be normally (Gaussian) distributed (Brownlee, 2016). The last of these algorithms, random forest, is used for the final classification in this thesis and is therefore discussed in more detail in the section below.

2.3.7 Random forest
The random forest machine learning algorithm uses so-called 'decision trees' to predict land cover classes (Koehrsen, 2018). A simple decision tree is depicted in figure 2.2. A decision tree can be considered a series of yes/no questions about the data that eventually leads to the prediction of a land cover class. Instead of using a single decision tree, the algorithm 'grows' many of these trees into a forest and then combines the class predictions of all these trees, for example by averaging them or by majority vote. The algorithm is random in two ways. First, it samples the training data points at random for each tree using the so-called 'bootstrapping' method, i.e. sampling with replacement, which means that some samples are used multiple times in a single decision tree. Second, it considers only a random subset of the features, such as those shown inside the boxes/nodes in figure 2.2, when splitting each node. For example, when there are 16 features to consider for splitting nodes, the random forest algorithm randomly selects 4 of those features and uses only those 4 to split the node.

Figure 2.2: A simple decision tree of a machine learning algorithm (Koehrsen, 2018)


2.4 Accuracy assessment
After the image classification has been completed, it is important to assess the accuracy of the classification itself. Accuracy assessments compare the map depiction of the land cover with the true land cover condition of the study area, based on reference data (Stehman, 2009). The reference data used for accuracy assessment differ from those used for classification purposes: they are used solely to assess the accuracy/quality of the classification and not to train the classifiers. However, like the reference data used for classification, they often consist of data collected during fieldwork. The field data used for the accuracy assessment will be referred to as validation samples.

Enderle and Weih (2005) distinguish between non-site-specific and site-specific accuracy assessments. A non-site-specific assessment examines the general agreement between the classification and the reference data without considering specific locations. For example, it assesses whether the percentage of a certain land cover type in a classification reflects the overall percentage of that type in the reference data. Relying solely on non-site-specific accuracy can therefore hide disagreements in the spatial placement of the land cover types between the classification and the reference data. Site-specific accuracy, on the other hand, looks at the agreement between the land cover types at specific locations or sites on the classified map and in the reference data (validation samples). The latter is therefore examined in this research.
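The difference between the two kinds of assessment can be illustrated with a toy example using hypothetical labels for four sites: the classified map below has exactly the same class proportions as the reference data, so a non-site-specific comparison finds perfect agreement, while a site-by-site comparison shows that only half of the sites are labeled correctly.

```python
from collections import Counter


def class_proportions(labels):
    """Non-site-specific view: the share of each land cover class, ignoring location."""
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}


def site_specific_agreement(reference, classified):
    """Site-specific view: the fraction of sites whose classified label matches the reference."""
    matches = sum(r == c for r, c in zip(reference, classified))
    return matches / len(reference)


# Hypothetical labels at four validation sites.
reference = ["forest", "forest", "cashew", "savannah"]
classified = ["cashew", "forest", "forest", "savannah"]

print(class_proportions(reference) == class_proportions(classified))  # True
print(site_specific_agreement(reference, classified))  # 0.5
```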

2.4.1 Error matrix
In site-specific accuracy assessments, the examination is performed by means of an error (or confusion) matrix, as shown in figure 2.3. Each cell in the matrix represents how much area of a reference land cover class has been assigned to a given class in the classified image; in the matrix used here, the rows correspond to the classes in the classified image and the columns to the classes in the reference data. Hence, the cells on the diagonal represent correct classifications and the off-diagonal cells represent misclassifications (Stehman, 2009). In addition, the sum of each row gives the row total and the sum of each column gives the column total.
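As a sketch, an error matrix can be built by counting validation sites per pair of classified and reference class. The example below assumes that every validation site carries equal weight (counts rather than areas), and the class names and labels are hypothetical.

```python
def error_matrix(reference, classified, classes):
    """Count sites per (classified class, reference class) pair.

    Rows are the classes in the classified image, columns the classes in the reference data.
    """
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, cls in zip(reference, classified):
        matrix[index[cls]][index[ref]] += 1
    return matrix


def row_totals(matrix):
    """Number of sites assigned to each class in the classified image."""
    return [sum(row) for row in matrix]


def column_totals(matrix):
    """Number of reference sites of each class."""
    return [sum(col) for col in zip(*matrix)]


classes = ["forest", "cashew", "savannah"]
reference = ["forest", "forest", "cashew", "savannah", "cashew"]
classified = ["forest", "cashew", "cashew", "savannah", "cashew"]

m = error_matrix(reference, classified, classes)
print(m)                 # [[1, 0, 0], [1, 2, 0], [0, 0, 1]]
print(row_totals(m))     # [1, 3, 1]
print(column_totals(m))  # [2, 2, 1]
```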

Figure 2.3: A simple error matrix for accuracy assessments (Stehman, 2009)

This matrix is used to compare how a specific area has been classified with how that area is represented in the reference data, which makes it possible to identify classification errors for each land cover type. The two main types of classification error are the 'error of commission' and the 'error of omission'. An error of commission, often referred to as a false positive, means that a site has been included in a land cover class to which it does not belong. An error of omission, often referred to as a false negative, means that a site has been excluded from the class to which it actually belongs. The two are linked: every misclassified site is an error of commission for the class it was wrongly assigned to and, at the same time, an error of omission for the class to which it actually belongs. According to Enderle and Weih (2005), it is better to assess these classification errors class by class rather than to assume that errors in one specific class reflect the errors in all other classes.
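Per class, both error rates can be read from the matrix: the commission error is the off-diagonal share of a row and the omission error the off-diagonal share of a column. The sketch below uses a hypothetical three-class matrix (rows classified, columns reference) that is not taken from this thesis.

```python
def commission_errors(matrix):
    """Per class: share of sites assigned to the class (row) that belong to another class."""
    return [1 - row[i] / sum(row) for i, row in enumerate(matrix)]


def omission_errors(matrix):
    """Per class: share of reference sites of the class (column) assigned to another class."""
    cols = list(zip(*matrix))
    return [1 - cols[i][i] / sum(cols[i]) for i in range(len(cols))]


# Hypothetical matrix: rows = classified, columns = reference
# (gallery forest, cashew plantation, savannah).
matrix = [
    [50, 5, 3],
    [4, 40, 6],
    [1, 5, 36],
]
print([round(e, 3) for e in commission_errors(matrix)])  # [0.138, 0.2, 0.143]
print([round(e, 3) for e in omission_errors(matrix)])    # [0.091, 0.2, 0.2]
```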

In addition to identifying classification errors, the error matrix can also be used to assess the accuracy of the classification. The three main measures of classification accuracy are: ‘producer’s accuracy’, ‘user’s accuracy’ and ‘overall classification accuracy’.


2.4.2 Producer’s accuracy
The producer’s accuracy indicates, for each class, how well the reference sites of that class have been classified, and therefore tells the producer of the classification how well the classification was performed. It is calculated by dividing the number of correctly classified pixels of a class by the total number of reference pixels of that class, which is given by the column total in the matrix.

The user’s accuracy indicates how often the areas that are assigned to a certain class in the classification actually belong to that land cover type on the ground. It therefore provides the users of the classification with so-called 'ground truth' information, because it indicates how true the classification is to the actual situation on the ground. It is calculated by dividing the number of correctly classified pixels of a class by the total number of pixels assigned to that class in the classified image, which is given by the row total in the matrix.

Finally, the overall classification accuracy indicates how much of the entire classified area has been classified correctly. It is calculated by dividing the sum of the diagonal cells of the matrix by the grand total (Enderle and Weih, 2005). Besides these measures, other statistics such as the kappa coefficient can be calculated to assess the accuracy of the classification more fully.
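The three measures can be computed from the same kind of hypothetical matrix (rows classified, columns reference, values invented for illustration). Producer's accuracy divides each diagonal cell by its column total, user's accuracy by its row total, and overall accuracy divides the diagonal sum by the grand total.

```python
def producers_accuracy(matrix):
    """Correct sites of each class divided by its reference (column) total."""
    cols = list(zip(*matrix))
    return [cols[i][i] / sum(cols[i]) for i in range(len(cols))]


def users_accuracy(matrix):
    """Correct sites of each class divided by its classified (row) total."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Sum of the diagonal divided by the grand total."""
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return diagonal / sum(map(sum, matrix))


# Hypothetical matrix: rows = classified, columns = reference.
matrix = [[50, 5, 3], [4, 40, 6], [1, 5, 36]]
print([round(a, 3) for a in producers_accuracy(matrix)])  # [0.909, 0.8, 0.8]
print([round(a, 3) for a in users_accuracy(matrix)])      # [0.862, 0.8, 0.857]
print(overall_accuracy(matrix))                           # 0.84
```

Note that the commission error of a class equals one minus its user's accuracy and the omission error equals one minus its producer's accuracy.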

Useya et al. (2016) stress that no general consensus has yet been reached on which measures are most appropriate for particular objectives of accuracy assessment. Nevertheless, the kappa coefficient appears to be favored in most studies because it uses the whole error matrix (through the row and column totals) rather than only the diagonal elements, as the overall accuracy does. Useya et al. (2016) nonetheless report both measures to determine the accuracy of their classified output.
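The kappa coefficient (Cohen's kappa) compares the observed agreement with the agreement expected by chance, which is derived from the row and column totals. The sketch below uses the standard formula on the same hypothetical matrix; it is illustrative and does not reproduce any result of this thesis.

```python
def cohen_kappa(matrix):
    """Kappa = (p_o - p_e) / (1 - p_e) for an error matrix of counts.

    p_o is the observed agreement (overall accuracy); p_e is the agreement expected by
    chance, computed from the products of the row and column totals.
    """
    n = sum(map(sum, matrix))
    p_o = sum(matrix[i][i] for i in range(len(matrix))) / n
    rows = [sum(row) for row in matrix]
    cols = [sum(col) for col in zip(*matrix)]
    p_e = sum(r * c for r, c in zip(rows, cols)) / n**2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical matrix: rows = classified, columns = reference.
matrix = [[50, 5, 3], [4, 40, 6], [1, 5, 36]]
print(round(cohen_kappa(matrix), 3))  # 0.759
```

Here the overall accuracy is 0.84, but kappa is lower (about 0.76) because part of that agreement would be expected by chance given the class totals.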

2.4.3 Visual assessment
Foody (2002) stresses that the error or confusion matrix nowadays lies at the core of accuracy assessment in most land cover change studies and is therefore frequently used, but often without its suitability being questioned. According to Foody (2002), a confusion matrix can provide useful site-specific assessments of the agreement between a classified image and ground conditions, but it is also associated with many problems. For example, data derived from classified images are rarely truly site-specific because of misregistration between the ground and remotely sensed datasets and because of mixed pixels. Moreover, the land cover classes defined in classifications are usually generalizations, and the ground data are often not an accurate representation of the ground conditions. Because of these concerns, it is difficult to obtain a reliable confusion matrix, and Foody (2002) therefore suggests providing additional information on the sampling design selected for collecting the reference data, the confidence in the ground data labels, the classification protocols selected and the origins of the datasets used for the classification.

In addition to quantitative components of accuracy assessment such as confusion matrices, other approaches may be applied to provide additional information. One approach is to visually assess the classified image. Although this approach was mainly used in the early stages of mapping studies, when a map was considered accurate if it looked right, it is still applied in more recent studies. For instance, Useya et al. (2019) randomly selected locations in their classified images and consulted local experts to help compare these classified images with high-resolution ESRI images of the same locations. Christof et al. (2013) also used high-resolution QuickBird, RapidEye and SPOT-5 satellite images to visually assess the distribution of riparian forest zones in their study area.

2.5 Change detection
Changes on the surface of the earth can occur because of, for example, urbanization, natural disasters, changes in the course of rivers or deforestation. Al-doski, Mansor and Shafri (2013) divide these changes into two categories: changes in 'land use' and changes in 'land cover'. Land use refers to the purpose for which a specific piece of land is used, for example mining, agriculture or urbanization. Land cover refers to the features that cover the earth’s surface, such as trees, buildings and roads. According to Al-doski, Mansor and Shafri (2013), a timely and accurate change detection analysis can help to better understand the interaction and relationship between natural phenomena and humans, which in turn leads to better management and use of (natural) resources.

2.5.1 Change detection procedures and techniques
A change detection analysis uses multi-temporal datasets to analyze changes in land cover classes quantitatively. Such an analysis can be carried out with traditional methods or with remote sensing techniques. Traditional methods are often very time-consuming, expensive and less accurate, problems that are less often associated with remote sensing techniques.

A multitude of techniques can be used to detect and analyse changes on the earth’s surface. Before using any of them, it is important to understand the procedure of a change detection analysis. Jensen (2005) identifies six important steps that have to be followed to detect changes on the surface of the earth:

• Identification of the nature of the change detection problem
• Selection of remotely sensed data
• Image pre-processing
• Image processing or classification
• Selection of change detection algorithm
• Evaluation of change detection results

One of the main goals of a change detection analysis is to distinguish the areas on digital images that depict changes in the features of interest between multiple dates. It is always important to check the reliability of the change detection process, because this process can be strongly influenced by various environmental factors that may change between dates. Al-doski, Mansor and Shafri (2013) summarize some of the most commonly used change detection techniques:

• Post-classification
• Image differencing
• Direct multidate classification
• Principal component analysis
• Change vector analysis
• Image ratioing

2.5.2 Post-classification technique
The first of these techniques, which is sometimes also referred to as delta classification, has proven to be the most popular approach in change detection analysis (Al-doski, Mansor and Shafri, 2013). This approach compares independently produced classified images. Comparable thematic maps can be created if the different classified images are (ortho)rectified independently and cover exactly the same area. In these thematic maps, the land cover labels are represented by georeferenced pixels, which can be compared to identify the areas where change has occurred. When corresponding pixels are compared, both the per-pixel changes within a certain land cover class and the changes in groups of pixels of the same land cover class can be identified (Kaliraj et al., 2017). This comparison allows the differences in area and perimeter of the classes to be estimated. Change maps and change detection matrices can easily be obtained from this analysis if the classified images of the different dates have been coded consistently (Al-doski, Mansor and Shafri, 2013).
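At its core, post-classification comparison is a pixel-by-pixel comparison of two classified maps. The sketch below assumes that both maps are already co-registered, cover the same grid and are stored as nested lists of class labels; the 2 × 2 maps are hypothetical.

```python
def change_map(map_t1, map_t2):
    """Compare two co-registered classified maps pixel by pixel.

    Returns a grid holding a (from_class, to_class) pair where the class changed
    and None where it stayed the same.
    """
    return [
        [(a, b) if a != b else None for a, b in zip(row_t1, row_t2)]
        for row_t1, row_t2 in zip(map_t1, map_t2)
    ]


# Hypothetical 2 x 2 classified maps of the same area at two dates.
map_2016 = [["gallery forest", "gallery forest"], ["savannah", "cashew"]]
map_2020 = [["gallery forest", "cashew"], ["savannah", "cashew"]]

for row in change_map(map_2016, map_2020):
    print(row)
# [None, ('gallery forest', 'cashew')]
# [None, None]
```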

The post-classification technique has many advantages. For example, it is less affected by environmental, atmospheric and sensor differences, because the data from the two dates are classified separately, which minimizes the problem of normalizing for atmospheric and sensor differences between the dates (Al-doski, Mansor and Shafri, 2013). Nevertheless, the results of this technique are only as accurate as the individual classified images themselves (Civco et al., 2002). Moreover, when multi-sensor or multi-date images are used, the technique can produce wrong results because of differences in the radiometric characteristics of the satellite images from which the thematic maps were obtained (Al-doski, Mansor and Shafri, 2013).

Figure 2.4: A simple change detection matrix (Juliev et al., 2019)

2.5.3 Change detection matrix
An example of a change detection matrix is provided in figure 2.4. In this matrix, the columns correspond to the land cover classes in the initial state (the oldest classified image) and the rows to the land cover classes in the final state, so the column totals give the area of each class in the initial state and the row totals its area in the final state. The change in area of a class can be calculated by subtracting its area in the initial state from its area in the final state (Kaliraj et al., 2017).
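A change detection matrix and the resulting area change per class can be sketched as follows. Rows are final-state classes and columns initial-state classes, as described above; the pixel labels are hypothetical, and the pixel area of 0.01 ha (a 10 m × 10 m pixel) is an assumption for illustration.

```python
def change_matrix(initial, final, classes):
    """Pixel counts with rows = final-state classes and columns = initial-state classes."""
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for before, after in zip(initial, final):
        matrix[index[after]][index[before]] += 1
    return matrix


def area_change_ha(matrix, pixel_area_ha):
    """Final-state area (row total) minus initial-state area (column total) per class."""
    finals = [sum(row) for row in matrix]
    initials = [sum(col) for col in zip(*matrix)]
    return [(f - i) * pixel_area_ha for f, i in zip(finals, initials)]


classes = ["gallery forest", "cashew", "savannah"]
pixels_2016 = ["gallery forest", "gallery forest", "cashew", "savannah"]
pixels_2020 = ["gallery forest", "cashew", "cashew", "cashew"]

m = change_matrix(pixels_2016, pixels_2020, classes)
print(m)  # [[1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(area_change_ha(m, pixel_area_ha=0.01))  # [-0.01, 0.02, -0.01]
```

The off-diagonal cells show where each change came from; in this toy example, the cashew class gains one pixel from gallery forest and one from savannah.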

2.6 Forest monitoring systems

2.6.1 Hansen dataset
Hansen et al. (2014) developed an analysis method that improved on the existing knowledge of global forest extent and change. The method's main strengths lie in its ability to quantify gross forest loss and gain at a global level, to provide annual forest loss information and to quantify trends in forest loss, and in the fact that it is spatially explicit. The analysis is based on Landsat data and is derived through an internally consistent approach that is exempt from the vagaries of different data inputs, methods and definitions. In this analysis, forest loss is defined as a stand-replacement disturbance or the complete removal of the tree canopy cover at the Landsat pixel scale. Forest gain is defined as the establishment of tree canopy from a non-forest state, or the inverse of loss in forest patches that have previously experienced loss. According to Elias et al. (2014), the Hansen dataset is a scientific breakthrough because its global nature facilitates comparability across jurisdictions and its data are transparent, accessible and free of charge. In addition, its methodology, uncertainty and results are fully shared with users. Furthermore, it meets the key principles of the IPCC, namely consistency, completeness, transparency, comparability and accuracy. Hence, it can be used by many kinds of government agencies all over the world.
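The annual loss information can be tallied per year as sketched below. This assumes that the loss-year layer of the published Global Forest Change product stores 0 for pixels without loss and n for loss first detected in the year 2000 + n. The pixel area of 0.09 ha (about 30 m × 30 m) is a simplification, since the dataset's pixels are defined in geographic degrees and their area varies with latitude. The pixel values are made up.

```python
from collections import Counter

PIXEL_AREA_HA = 0.09  # assumption: ~30 m x 30 m Landsat pixel; the true area varies with latitude


def annual_loss_ha(loss_year_pixels, pixel_area_ha=PIXEL_AREA_HA):
    """Tally forest loss per calendar year from loss-year codes.

    Assumes code 0 means no loss and code n (n >= 1) means loss first detected in 2000 + n.
    """
    counts = Counter(code for code in loss_year_pixels if code > 0)
    return {2000 + code: n * pixel_area_ha for code, n in sorted(counts.items())}


# Hypothetical loss-year codes for eight pixels.
pixels = [0, 0, 13, 13, 14, 0, 19, 13]
print(annual_loss_ha(pixels))  # loss in 2013, 2014 and 2019 (about 0.27, 0.09 and 0.09 ha)
```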

In their consultancy report on carbon credits in the Boé, van Gilst et al. (2019) used the Hansen dataset, but their research focused on carbon credits rather than on forest loss. In contrast, Vieilledent et al. (2018) did analyze the Hansen dataset in relation to forest loss: they combined the more recent global annual tree cover loss data of the Hansen dataset with historical national forest cover maps to examine six decades of deforestation and forest fragmentation in Madagascar. In addition, Galiatsatos et al. (2020) and Mitchard, Viergever, Morel and Tipper (2015) assessed the accuracy of this dataset for forest loss detection and forest monitoring in their respective studies.


2.6.2 BFAST monitoring
The ‘Breaks For Additive Season and Trend’ (BFAST) change detection method, first introduced by Verbesselt et al. (2010), detects and characterizes seasonal and trend changes within time series, including satellite image time series (Verbesselt et al., 2012). What distinguishes it from other time series change detection methods is that it integrates the iterative decomposition of a time series into seasonal, trend and remainder components with methods for detecting and characterizing changes (breakpoints) within the time series (Verbesselt et al., 2010).

Because it accounts for the seasonal, trend and remainder components of a time series, the method does not need to summarize (satellite) data annually, which would always lead to a loss of information. This also enables BFAST to distinguish between different types of change. For example, changes in the seasonal component indicate phenological changes, such as those caused by a change in land cover type, whereas changes in the trend component often indicate so-called ‘disturbances’, which may be natural, like fires or insect attacks, or anthropogenic, like farming, deforestation or urbanization. The remainder or ‘noise’ component captures the remaining variation in the data beyond that explained by the trend and seasonal components.
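The additive decomposition behind BFAST can be summarized as follows, in what we understand to be the notation of Verbesselt et al. (2010): the observed value Y_t at time t is split into a trend component T_t, a seasonal component S_t and a remainder e_t, and the trend is modeled as piecewise linear between m breakpoints τ*_1, ..., τ*_m, with a separate intercept α_j and slope β_j per segment (τ*_0 = 0 and τ*_{m+1} = n by convention). This is a summary for orientation, not a full statement of the method; a break in T_t corresponds to the 'disturbances' described above, a break in S_t to phenological change.

```latex
\begin{aligned}
Y_t &= T_t + S_t + e_t, && t = 1, \dots, n \\
T_t &= \alpha_j + \beta_j t, && \tau^*_{j-1} < t \le \tau^*_j, \; j = 1, \dots, m + 1
\end{aligned}
```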

Other advantages of BFAST are that it is not specific to a particular type of data and that it can be applied to time series without the need to define a change trajectory, select a reference period, normalize for land cover types or set a specific threshold. Furthermore, it allows the analysis of long time series of different vegetation indices, such as NDVI and NDMI.

In a later study, Verbesselt et al. (2012) added steps to the change detection method that made it suitable for near real-time disturbance detection (monitoring) at a global scale. This improved method, called BFAST monitoring, detects disturbances in newly acquired (satellite) time series data by automatically identifying a stable history period, which is used to model the season-trend variation in vegetation against which disturbances can be detected (Verbesselt et al., 2012). The time series is split into two periods: a history period and a monitoring period. The history period contains data that have already been acquired; it is analyzed for stability and used to model normal vegetation behavior with the season-trend model. The monitoring period contains newly captured data that need to be analyzed for disturbances. A disturbance is detected when the season-trend model does not remain stable for the new incoming observations. With BFAST near real-time monitoring, disturbances can thus be detected with only a short delay.
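The history/monitoring logic can be illustrated with a heavily simplified Python sketch: fit a season-trend model (intercept, linear trend and one harmonic) to the history period by least squares, then flag the first monitoring observation at which a moving sum of scaled residuals leaves a fixed band. This is loosely modeled on the MOSUM idea used by bfastmonitor in the R bfast package, but it is not that implementation. bfastmonitor can select the stable history period automatically and derives its boundary from critical values, whereas here the history length, the single harmonic, the one-year window and the threshold of 2.5 are arbitrary choices. The series is synthetic.

```python
import math
import random


def season_trend_row(t, period):
    """Regressors of a simple season-trend model: intercept, linear trend, one harmonic."""
    w = 2 * math.pi * t / period
    return [1.0, float(t), math.sin(w), math.cos(w)]


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_history(y_hist, period):
    """Ordinary least squares fit of the season-trend model on the history period."""
    X = [season_trend_row(t, period) for t in range(len(y_hist))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, y_hist)) for i in range(k)]
    return solve(xtx, xty)


def first_disturbance(y, n_history, period, window, threshold):
    """Index of the first monitoring observation at which the moving sum of scaled
    residuals leaves the +/- threshold band, or None if the model stays stable."""
    beta = fit_history(y[:n_history], period)

    def predict(t):
        return sum(b * x for b, x in zip(beta, season_trend_row(t, period)))

    resid = [y[t] - predict(t) for t in range(len(y))]
    dof = n_history - len(beta)
    sigma = math.sqrt(sum(r * r for r in resid[:n_history]) / dof)
    for t in range(n_history, len(y)):
        mosum = sum(resid[t - window + 1 : t + 1]) / (sigma * math.sqrt(n_history))
        if abs(mosum) > threshold:
            return t
    return None


# Hypothetical NDVI-like series: 23 observations per year, 6 years of history and 1 year
# of monitoring, with an abrupt drop of 0.3 (e.g. clearing) from observation 145 onwards.
rng = random.Random(7)
period, n_history, break_at = 23, 138, 145
series = []
for t in range(161):
    value = 0.6 + 0.0005 * t + 0.15 * math.sin(2 * math.pi * t / period) + rng.gauss(0, 0.02)
    series.append(value - 0.3 if t >= break_at else value)

print(first_disturbance(series, n_history, period, window=23, threshold=2.5))
```

With these settings, the reported index should fall a few observations after the simulated break, which mirrors the "short delay" mentioned above; without the drop, the model should remain stable and the function returns None.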

Further advantages of the improved BFAST monitoring method are that it is implemented in the open-source software environment R and is freely available in the bfast package for R (Verbesselt et al., 2012). It is fast and therefore requires little processing time, and it can analyze time series with data gaps, caused for example by sensor defects or masked clouds, without requiring gap-filling techniques. Furthermore, it analyzes the full temporal detail of a time series. This is important because longer satellite image time series with high temporal and spatial detail are becoming available, which calls for robust methods to analyze them. With the emergence of such time series, disturbance detection will become possible at the more detailed spatial scales at which many human-induced disturbances, such as farming, tend to operate.

One of the limitations of the BFAST monitoring method, according to DeVries et al. (2015), is that as the length of the monitoring period increases, the magnitude measure may be affected by a growing number of observations of disturbances before and after the measured change event. DeVries et al. (2015) therefore limited the monitoring period to one year and repeated the process iteratively for each year. This is not done in this research because of the time it would take to repeat this step for each year. Instead, the monitoring period is set from 2009 to 2020 to measure forest disturbances over a longer period of time.


3. Study Area & Data

3.1 Study Area

Figure 3.1: The study area in this research

The Boé sector is situated in the south-eastern part of the Gabu province. Its capital is Madina de Boé, but the government seat is located in Béli (Chimbo, 2018b). The Boé covers approximately 3,300 km² and has a population of approximately 12,000 people living in 85 villages. According to Temudo & Santos (2017), the Boé is the least populated sector in Guinea-Bissau, with a population density of only 3.6 inhabitants per square kilometer. As can be seen in figure 3.1, the Boé comprises large parts of the Boé National Park (BNP) in the north-east, the Dulombi National Park in the west and the CheChe and Cuntabane Wildlife Corridors that connect the parks with each other.

The Boé sector has a tropical savannah (Aw) climate (Chimbo, 2018b). Throughout the year, average daytime temperatures vary from 30 to 33 ºC. The warmest months in the eastern hinterland of Guinea-Bissau are April and May, in which the average monthly maximum temperature is 41.7 ºC (Temudo & Santos, 2017). Average nighttime temperatures in the Boé vary from 18 to 23 ºC (Chimbo, 2018b). The eastern hinterland of Guinea-Bissau receives slightly less rainfall during the rainy season than other regions of the country: on average 1169 mm, with a minimum of 314 mm and a maximum of 1885 mm (Temudo & Santos, 2017).

In geographical terms, the Boé is the most north-westerly part of the Fouta Djallon mountain range, the majority of which lies in the Republic of Guinea (Chimbo, 2018b). In the regional action plan for the conservation of western chimpanzees (Pan troglodytes verus) 2020–2030, published in 2020, the Fouta Djallon is considered one of the ‘exceptionally important priority areas’ for chimpanzee conservation (IUCN SSC Primate Specialist Group, 2020). According to Breider (2016), the Boé is covered by a thick laterite cap, which is dissected by narrow valleys up to a few hundred meters wide and more than 10 km long. Most of this laterite cap is covered by grasslands, but in some places, where the soil is sufficiently deep, forests may develop. However, as Breider (2016) states, most of these places are used by the local population for slash-and-burn agriculture to grow rain-fed rice and other crops such as peanut, cassava and corn. Over the last decades, the Boé has experienced an increase in the immigration of large cattle owners from the neighboring country Guinea Conakry who want to herd their livestock on the grasslands of the Boé.


3.2 Land Cover Types
This section addresses the different land cover types that were sampled during the fieldwork campaign. Figure 3.2 provides photos of all the land cover types. Most of the photos were taken during the fieldwork campaign of this research, but some of the photos of the rice field, fallow land and peanut field types were taken during an earlier fieldwork campaign conducted by Studer (2019). These photos are included because they clearly show the differences in the appearance of these types between the dry and the rainy season.

1. Gallery forest

2. Primary dry forest

3. Secondary dry forest


4. Cashew plantation (<5 years)

5. Cashew plantation (>5 years)

6. Fallow Land (<3 years)


7. Fallow Land (>3 years)

8. Woodland

9. Savannah


10. Rice Field

11. Peanut field

12. Other agricultural crops


13. Wetland

14. Water bodies

15. Built-up (villages)

Figure 3.2: Depictions of the different land cover types that were sampled


Table 3.1 provides a detailed description of the sampled land cover types depicted in figure 3.2. It also lists the criteria used in the selection process for some of the land cover types.

Table 3.1: Descriptions of the different land cover types that were sampled

1. Gallery forest: Closed forests with (almost) closed canopies growing alongside rivers, streams and wetlands. Some forests growing alongside dried-up streams were also included, because these streams usually fill up with water in the rainy season.
2. Primary dry forest: Closed forests with relatively closed canopies that appeared not yet to have been disturbed by fires or human actions.
3. Secondary dry forest: Open forests with open canopies and an abundance of smaller trees and grasses that have at some point been disturbed by fires or human actions. These forests were often adjacent to or completely surrounded by savannah plains.
4. Cashew plantation (<5 years): Young, developing cashew plantations with small trees, most of which have not yet reached full production. Production generally starts in year 3. The maximum age of the plantations mapped for this type is 5 years.
5. Cashew plantation (>5 years): Mature plantations with relatively large trees that have reached full production.
6. Fallow land (1-3 years): Young fallow lands, i.e. former agricultural plots that have been abandoned recently. Only plots that had been abandoned for at most three years were considered for this type. The plots contain many shrubs, bushes, weeds and tall grass, and remnants of crops could sometimes still be found on them.
7. Fallow land (>3 years): Older fallow lands that had been abandoned for more than three years. Besides shrubs and weeds, small trees were already growing on most of these plots, and no remnants of crops could be found.
8. Woodland: Open savannah plains vegetated with grasses and some small trees and shrubs, but significantly less densely vegetated than secondary forests.
9. Savannah: Open plains with savannah grass. Savannah plains that had recently been burned were also mapped.
10. Rice field: Open fields with upland (rain-fed) rice, a relatively tall-growing crop. As this research was conducted during the dry season in the Boé, most of the fields had already been harvested and the remaining vegetation was very dry and yellow, but during the rainy season these crops are very green, as can be seen in figure 3.2.
11. Peanut field: Similar to the rice fields, but the crop grows less tall, with the peanuts developing underground.
12. Other agricultural crops: Mainly cassava plantations or mixed fields with corn and rice, or rice with other crops such as sorghum. This mixed system is also known as ‘intercropping’.
13. Wetland: Plots that are flooded for prolonged periods, such as the shallow floodplain (bas-fonds) near Aicum, as well as dried-up plots that are only flooded during the rainy season. The local guides knew whether a plot could be considered a wetland.
14. Water bodies: Large rivers such as the Corubal and Fefine and smaller streams such as the Quissem river. Lakes, such as the lake in Vendu Cham, were also mapped; this lake is easily recognizable in all the classifications as a large blue circle of the water bodies type. All the different water body types are highlighted in figure 3.2.
15. Built-up area (villages): Villages and small settlements with small huts and houses with thatched and corrugated iron roofs.


3.3 Data

3.3.1 Sampling data

3.3.1.1 Training Sampling Data
Sampling data were purposively collected at selected locations throughout the Boé area during a four-month fieldwork campaign in the 2019/2020 dry season. The data were collected with the help of local guides who had extensive knowledge of the area and who could determine whether a location was accessible. On location, homogeneous patches (polygons) of a single land cover type were recorded with a Garmin GPS device. Each resulting polygon measured at least approximately 30 m × 30 m, which is equivalent to 3 × 3 Sentinel-2 pixels (at 10 m resolution) and 1 Landsat 8 pixel. The recorded tracks were later imported into the training sample manager of the ArcGIS Pro project for further processing. More information about the fieldwork campaign is given in the methodology chapter.

3.3.1.2 Sacred Forest Data
The sacred forest data that have been collected by the Chimbo Foundation over the past years are used as training and validation data in this research. For the sample collections used for the 2020 classifications, only forests that were mapped in 2019 were used as training and validation data, because these were the most recently mapped forests and were therefore the most likely to be still untouched and the least likely to have been damaged by external factors such as fires. These were 32 forests in total, which were allocated to the gallery forest, primary dry forest and secondary dry forest types based on how they had been classified by the members of the Chimbo team. Fortunately, the Chimbo team used similar classes for mapping sacred forests in the sacred forest database, namely gallery forest, primary forest and secondary forest, so the forests could easily be allocated to the types used in this research. Of the 32 forests mapped in 2019, 13 are allocated to the gallery forest type, of which 11 are used as training samples and 2 as validation samples. Furthermore, 9 mapped forests are allocated to the primary dry forest type, of which 7 are used as training samples and 2 as validation samples. The remaining 10 forests mapped in 2019 are allocated to the secondary dry forest type, of which 8 are used as training samples and 2 as validation samples.

Additionally, the sacred forests that were mapped in 2016 were added to the adjusted sample collection of 2016, which is further discussed in the methodology chapter. The forests mapped in 2016 were 72 in total. Of these, 66 are allocated to the gallery forest type, of which 50 are used as training samples and 16 as validation samples, and 6 are allocated to the secondary dry forest type, of which 4 are used as training samples and 2 as validation samples. The rest of the mapped sacred forests are converted into a feature layer, which will be overlaid on the final land cover map.

3.3.2 Satellite data
Most of the satellite data, satellite-derived data and the ASTER GDEM data are collected with Google Earth Engine. This involves some simple JavaScript scripting, and most of these data already come in a pre-processed format and are thus ready to download. The Landsat 7 ETM+ batch of images from 1999 to 2020 is collected through the USGS EarthExplorer. These images are readily available for download, so no scripting is involved in this process. A flowchart of the preliminary data collection is provided in figure 3.3.


Figure 3.3: Flowchart of the preliminary data collection

3.3.2.1 Sentinel-1 Data
The Sentinel-1 mission was the first of the five Sentinel missions that ESA is developing for the Copernicus initiative. It consists of two polar-orbiting synthetic aperture radar (SAR) satellites, which can operate during day and night. Their C-band transmitter enables them to acquire imagery regardless of the weather (ESA; Sentinel Online, 2019a). The Sentinel-1 C-band transmitter obtains radar data in four different acquisition modes, as shown in figure 3.4. The ‘Interferometric Wide swath’ (IW) mode has a 250 km swath width, a spatial resolution of 5 m × 20 m and burst synchronization for interferometry (Kramer, 2020). The IW mode is considered the standard mode for observations over land and is also the acquisition mode used in this thesis. The ‘Wave’ (WV) mode has a low data rate and a 5 m × 20 m spatial resolution. It samples images of 20 km × 20 km at 100 km intervals along the orbit, as shown in figure 3.4. This mode, at VV polarization, is the default mode for observations of the open ocean. The ‘Stripmap’ (SM) mode has an 80 km swath and a spatial resolution of 5 m × 5 m. Finally, the ‘Extra Wide swath’ (EW) mode is acquired through overlapping swaths that together make up a 400 km swath; its spatial resolution is 25 m × 100 m.

Figure 3.4: The different acquisition modes of the Sentinel-1 C-band transmitter (Luqman, 2017)

For this research, the default VV and VH bands were acquired in ascending orbit in IW acquisition mode. The ‘VV’ band is a single co-polarization band that transmits and receives its signal in the vertical direction (ESA; Sentinel Online, 2019a). The ‘VH’ band, in contrast, is a cross-polarization band: it transmits its signal in the vertical direction and receives it in the horizontal direction. The data were acquired with Google Earth Engine. The Sentinel-1 data on Google Earth Engine are provided as Ground Range Detected (GRD) scenes that have already been pre-processed with the Sentinel-1 Toolbox. This means that the data are already calibrated and ortho-corrected and that thermal noise has been removed, which makes them ready for analysis.

Table 3.2: Sentinel-1 data used in this research
Image | Image collection ID | Spectral bands | Spatial resolution | Date | Purpose
Sentinel-1 image of 2016 | Mosaic of S1A_IW_GRDH_1SDV_20160308T190827_20160308T190852_010282_00F347_7F8B and S1A_IW_GRDH_1SDV_20160308T190852_20160308T190917_010282_00F347_53E0 | VV, VH | 10 meters | 08-03-2016 | RQ5: Change detection analysis
Sentinel-1 image of 2019 | S1A_IW_GRDH_1SDV_20190209T190901_20190209T190926_025857_02E0B1_9C69 | VV, VH | 10 meters | 09-02-2019 | RQ1: Unsupervised classification
Sentinel-1 image of 2020 | S1A_IW_GRDH_1SDV_20200204T190907_20200204T190932_031107_03932D_0763 | VV, VH | 10 meters | 04-02-2020 | RQ3: Supervised classification

3.3.2.2 Sentinel-2 Data The Sentinel-2 mission consists of two polar-orbiting satellites that are placed in the same sun-synchronous orbit, phased at 180° to each other (ESA; Sentinel Online, 2019b). It has a wide swath width (290 km) and a short revisit time: 10 days at the equator with one satellite and 5 days with two satellites, which results in 2-3 days at mid-latitudes (under cloud-free conditions). Its spatial resolution varies between 10 and 60 m, and it covers the Earth between latitudes 56° south and 84° north with 13 bands in the visible, near-infrared and short-wave infrared parts of the spectrum (ESA; Sentinel Online, 2019b).

For this thesis, Sentinel-2 scenes were collected with Google Earth Engine. For 2016, the Level-2A (atmospherically corrected) product is not available, and therefore the 2016 scene was collected in Level-1C format. This meant that the scene still had to be atmospherically corrected manually. The scenes of 2019 and 2020 were collected in Level-2A format with Google Earth Engine and therefore did not require any pre-processing before analysis. For all the scenes, the Sentinel-2 bands with a 10m and 20m resolution were collected. The 10m bands are bands 2, 3, 4 and 8, as can be seen in figure 3.5(a). The 20m bands are bands 5, 6, 7, 8A, 11 and 12, as can be seen in figure 3.5(b).

Figure 3.5: Overview of the Sentinel-2 bands with (a) 10m and (b) 20m spatial resolution (ESA; Sentinel Online, 2020)

A more detailed overview of all the satellite data used in this thesis is provided below in tables 3.3 and 3.4.

Table 3.3: Sentinel-2 data used in this research
Image | Part of image | Image collection IDs (mosaic of) | Date | Purpose
Sentinel-2 image of 2016 | Western part | S2A_MSIL1C_20160311T112102_N0201_R037_T28PET_20160311T113234, S2A_MSIL1C_20160311T112102_N0201_R037_T28PEU_20160311T113234, S2A_MSIL1C_20160311T112102_N0201_R037_T28PFT_20160311T113234, S2A_MSIL1C_20160311T112102_N0201_R037_T28PFU_20160311T113234 | 11-03-2016 | RQ4: Change detection analysis
Sentinel-2 image of 2016 | Eastern part | S2A_MSIL1C_20160308T112002_N0201_R137_T28PET_20160308T112645, S2A_MSIL1C_20160308T112002_N0201_R137_T28PEU_20160308T112645, S2A_MSIL1C_20160308T112002_N0201_R137_T28PFT_20160308T112645, S2A_MSIL1C_20160308T112002_N0201_R137_T28PFU_20160308T112645 | 08-03-2016 | RQ4: Change detection analysis
Sentinel-2 image of 2019 | Western part | S2A_MSIL2A_20190204T112251_N0211_R037_T28PET_20190204T140902, S2A_MSIL2A_20190204T112251_N0211_R037_T28PEU_20190204T140902, S2A_MSIL2A_20190204T112251_N0211_R037_T28PFT_20190204T140902, S2A_MSIL2A_20190204T112251_N0211_R037_T28PFU_20190204T140902 | 04-02-2019 | RQ1: Unsupervised classification
Sentinel-2 image of 2019 | Eastern part | S2B_MSIL2A_20190206T111239_N0211_R137_T28PET_20190206T135701, S2B_MSIL2A_20190206T111239_N0211_R137_T28PEU_20190206T135701, S2B_MSIL2A_20190206T111239_N0211_R137_T28PFT_20190206T135701, S2B_MSIL2A_20190206T111239_N0211_R137_T28PFU_20190206T135701 | 06-02-2019 | RQ1: Unsupervised classification
Sentinel-2 image of 2020 | Western part | S2B_MSIL2A_20200204T112149_N0214_R037_T28PET_20200208T053331, S2B_MSIL2A_20200204T112149_N0214_R037_T28PEU_20200208T053331, S2B_MSIL2A_20200204T112149_N0214_R037_T28PFT_20200208T053331, S2B_MSIL2A_20200204T112149_N0214_R037_T28PFU_20200208T053331 | 04-02-2020 | RQ3: Supervised classification
Sentinel-2 image of 2020 | Eastern part | S2A_MSIL2A_20200206T111231_N0214_R137_T28PET_20200206T124429, S2A_MSIL2A_20200206T111231_N0214_R137_T28PEU_20200206T124429, S2A_MSIL2A_20200206T111231_N0214_R137_T28PFT_20200206T124429, S2A_MSIL2A_20200206T111231_N0214_R137_T28PFU_20200206T124429 | 06-02-2020 | RQ3: Supervised classification
Spectral bands and spatial resolution (all images): Blue (B2), Green (B3), Red (B4) and NIR (B8) at 10 meters; Red Edge 1 (B5), Red Edge 2 (B6), Red Edge 3 (B7), Red Edge 4 (B8A), SWIR 1 (B11) and SWIR 2 (B12) at 20 meters.
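The image collection IDs in Table 3.3 follow the compact Sentinel-2 product naming convention, which encodes the mission, processing level, sensing time, relative orbit and tile. As a minimal sketch, assuming that naming convention, the snippet below parses the 2019 IDs and groups the tiles by relative orbit; the two groups (R037 and R137) correspond to the western and eastern parts of the mosaic in the table. The function names are illustrative and not part of any library.

```python
from collections import defaultdict
from datetime import datetime

# Product IDs copied from Table 3.3 (2019 acquisitions only, for brevity).
PRODUCT_IDS = [
    "S2A_MSIL2A_20190204T112251_N0211_R037_T28PET_20190204T140902",
    "S2A_MSIL2A_20190204T112251_N0211_R037_T28PEU_20190204T140902",
    "S2A_MSIL2A_20190204T112251_N0211_R037_T28PFT_20190204T140902",
    "S2A_MSIL2A_20190204T112251_N0211_R037_T28PFU_20190204T140902",
    "S2B_MSIL2A_20190206T111239_N0211_R137_T28PET_20190206T135701",
    "S2B_MSIL2A_20190206T111239_N0211_R137_T28PEU_20190206T135701",
    "S2B_MSIL2A_20190206T111239_N0211_R137_T28PFT_20190206T135701",
    "S2B_MSIL2A_20190206T111239_N0211_R137_T28PFU_20190206T135701",
]


def parse_s2_product_id(product_id):
    """Split a compact Sentinel-2 product name into its main fields."""
    mission, level, sensing, _baseline, rel_orbit, tile, _discriminator = product_id.split("_")
    return {
        "mission": mission,                    # 'S2A' or 'S2B'
        "level": level[-3:],                   # 'MSIL2A' -> 'L2A'
        "date": datetime.strptime(sensing, "%Y%m%dT%H%M%S").date(),
        "relative_orbit": int(rel_orbit[1:]),  # 'R037' -> 37
        "tile": tile[1:],                      # 'T28PET' -> '28PET'
    }


def group_by_orbit(product_ids):
    """Group tile names by (relative orbit, sensing date), i.e. by acquisition strip."""
    groups = defaultdict(list)
    for pid in product_ids:
        info = parse_s2_product_id(pid)
        groups[(info["relative_orbit"], info["date"])].append(info["tile"])
    return dict(groups)


for (orbit, date), tiles in sorted(group_by_orbit(PRODUCT_IDS).items()):
    print(f"R{orbit:03d} {date.isoformat()}: {', '.join(tiles)}")
```

Grouping by relative orbit rather than by date alone keeps the sketch valid when two strips happen to be acquired on the same day.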

3.3.2.3 Landsat 8 Data

The NASA Landsat program, whose first satellite (Landsat 1) was launched in 1972, offers the longest continuous global record of the Earth’s surface (NASA; Landsat Science, 2019). Landsat 1 was the first Earth-observing satellite launched with the intent to study and monitor Earth’s landmasses. Currently, the program has two operational satellites: Landsat 7 and Landsat 8.

Landsat 7 was launched in 1999 and observes the Earth with the Enhanced Thematic Mapper Plus (ETM+) sensor. It orbits in a polar sun-synchronous orbit and scans the entire surface of the Earth with a repeat coverage of 16 days. The ETM+ sensor offers a panchromatic band with 15m spatial resolution, 6 bands with 30m spatial resolution in the visible, near-infrared (NIR) and mid-infrared (MIR) parts of the spectrum, and a thermal infrared channel with 60m spatial resolution. One of the limitations of this satellite is that its Scan Line Corrector (SLC) failed in 2003. When this happened, the satellite had to suspend its acquisition of imagery for six weeks, after which it was able to resume its mission. The failure, however, still affects Landsat 7 imagery: the corrector can no longer compensate for the ‘zigzag’ motion of the imaging field of view of the spacecraft, so the scan lines are no longer aligned parallel to each other. Hence, the ETM+ sensor currently acquires only about 75% of the data for any given scene. The Landsat 8 satellite carries two science instruments: the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS). The OLI also offers a panchromatic band with 15m spatial resolution, as well as 8 bands with 30m spatial resolution in the visible, near-infrared (NIR) and short-wave infrared (SWIR) parts of the spectrum, which is two more than on Landsat 7. The TIRS offers two narrower bands with 100m spatial resolution in the thermal infrared region, which was covered by only one wide band on Landsat 7. Landsat 8 has a 185km wide swath and, like Landsat 7, scans the entire surface of the Earth with a repeat coverage of 16 days.

Figure 3.6: Overview of the Landsat 7 ETM+ and Landsat 8 OLI and TIRS bands

For this research, the Landsat 8 OLI and TIRS scenes needed for the classification were collected with Google Earth Engine. These scenes were already available in orthorectified, atmospherically corrected surface reflectance (Tier 1) format and therefore did not require any pre-processing before analysis. For all the scenes, the blue, green, red, NIR, SWIR1 and SWIR2 bands were collected, as can be seen in table 3.4.

Table 3.4: Landsat 8 data used in this research
Image | Image collection ID | Spectral bands | Spatial resolution | Date | Purpose
Landsat 8 image of 2016 | LANDSAT/LC08/C01/T1_SR/LC08_203052_20160317 | Blue (B2), Green (B3), Red (B4), NIR (B5), SWIR1 (B6), SWIR2 (B7) | 30 meters | 17-03-2016 | RQ5: Change detection analysis
Landsat 8 image of 2020 | LANDSAT/LC08/C01/T1_SR/LC08_203052_20200328 | Blue (B2), Green (B3), Red (B4), NIR (B5), SWIR1 (B6), SWIR2 (B7) | 30 meters | 28-03-2020 | RQ3: Supervised classification

3.3.3 Ancillary Data:

3.3.3.1 Aster GDEM The Aster Global Digital Elevation Model (GDEM) Version 3 provides a global digital elevation model (DEM) of land areas on Earth at a spatial resolution of 1 arc second (approximately 30.87 meter horizontal posting at the equator) (NASA; EarthData, 2019). The dataset is provided in 22,912 separate tiles or granules that each contain at least 0.01% land area. Together, these tiles cover the Earth from 83° North to 83° South. The data were downloaded from the NASA Earthdata website. Because the Boé spans four different tiles, all four granules were downloaded and mosaicked in ArcGIS Pro.
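Because ASTER GDEM granules are 1° x 1° tiles, the tiles needed for a study area can be derived from its bounding box. The sketch below assumes the V003 naming pattern (ASTGTMV003 followed by the latitude and longitude of the tile's south-west corner) and uses a hypothetical bounding box, not the actual extent of the Boé, that happens to intersect four tiles.

```python
import math


def aster_tile_names(min_lat, max_lat, min_lon, max_lon, version="V003"):
    """Names of the 1 x 1 degree tiles intersecting a bounding box.

    Tiles are assumed to be named after their south-west corner, e.g.
    N11W015 covers 11 to 12 degrees N and 15 to 14 degrees W.
    """
    names = []
    for lat in range(math.floor(min_lat), math.ceil(max_lat)):
        for lon in range(math.floor(min_lon), math.ceil(max_lon)):
            ns = "N" if lat >= 0 else "S"
            ew = "E" if lon >= 0 else "W"
            names.append(f"ASTGTM{version}_{ns}{abs(lat):02d}{ew}{abs(lon):03d}")
    return names


# Hypothetical bounding box (degrees; west longitudes are negative).
print(aster_tile_names(11.4, 12.2, -14.6, -13.6))
```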

3.3.3.2 Boé shapefiles In his thesis on land cover change in the Boé area, Studer (2019) created shapefiles representing the geometry of the Boé area and of the separate regions that make up the Boé area, such as the Boé National Park. Furthermore, he created shapefiles representing the locations of most of the villages, roads and rivers in the Boé area. These shapefiles are used in this research for multiple analyses. For example, the geometries of the different regions of the Boé area are used in the forest monitoring analysis to assess forest loss per region. The shapefile of the Boé region is presented in figure 3.7.

Figure 3.7: Shapefile of Boé area that is used in this research

3.3.4 Validation Data:

3.3.4.1 Validation samples Like the training samples, the validation samples are collected during the fieldwork campaign. As mentioned before, 80% of the recorded samples are assigned to the training sample collection and 20% to the validation sample collection. The validation samples are used as reference data for the accuracy assessment of the final classified output.

3.3.4.2 Panorama photos for validation Panorama photos were taken on location during the fieldwork campaign. The photos were taken from the center of the sample plot, facing north, east, south and west. These photos may provide additional contextual information for the accuracy assessment of the final classified output.

3.3.4.3 ESRI High-Resolution Imagery High-resolution imagery of the year 2018, which is available in the ArcGIS Pro software, is used for the visual validation of multiple outputs.

3.3.5 Forest Monitoring Data:

3.3.5.1 Hansen global forest cover dataset The Hansen Global Forest Change v1.7 (2000-2019) dataset was downloaded from the Google Earth Engine Catalog. Of this dataset, the bands treecover2000, loss, gain and lossyear are used for the first forest monitoring analysis, as can be seen in table 3.5.
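In the lossyear band, a value of 0 means no loss and a value n (1 to 19 in v1.7) means that loss was detected in the year 2000 + n. A minimal sketch of how forest loss per year could be summed from the lossyear and treecover2000 bands is shown below. The 3 x 3 grids are hypothetical, the 30% canopy cover threshold is an assumption rather than something stated in this section, and the pixel area of 0.09 ha is only an approximation, since the true area of a 1 arc-second pixel varies with latitude.

```python
from collections import Counter

FIRST_YEAR = 2000  # lossyear n means loss detected in year 2000 + n; 0 means no loss


def loss_per_year(lossyear_grid, treecover_grid, pixel_area_ha, min_treecover=30):
    """Sum forest loss area (ha) per calendar year.

    A pixel only counts as forest if its treecover2000 value (percent canopy
    cover) reaches min_treecover; the 30% default is an assumption.
    """
    pixels_per_year = Counter()
    for loss_row, cover_row in zip(lossyear_grid, treecover_grid):
        for code, cover in zip(loss_row, cover_row):
            if code > 0 and cover >= min_treecover:
                pixels_per_year[FIRST_YEAR + code] += 1
    return {year: n * pixel_area_ha for year, n in sorted(pixels_per_year.items())}


# Hypothetical 3 x 3 excerpt of the lossyear and treecover2000 bands.
LOSSYEAR = [[0, 13, 13], [14, 0, 19], [1, 0, 0]]
TREECOVER = [[80, 75, 40], [60, 90, 45], [35, 10, 70]]

# Roughly 0.09 ha per 1 arc-second pixel near the equator (about 30 m x 30 m).
print(loss_per_year(LOSSYEAR, TREECOVER, pixel_area_ha=0.09))
```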

3.3.5.2 Landsat 7 ETM+ images for BFAST monitoring method The Landsat 7 ETM+ scenes needed for the BFAST monitoring method were downloaded from the USGS Earth Explorer platform, as can be seen in figure 3.3. For this analysis, only Landsat 7 ETM+ Surface Reflectance scenes at WRS-2 path 203 and row 052 with less than 70% cloud cover and an additional NDMI layer were required, and these criteria were therefore applied to the data download. This resulted in a batch of 236 Landsat 7 ETM+ Surface Reflectance scenes from the 19th of September 1999 up until the 21st of April 2020. These scenes were then ready to be pre-processed for further analysis, which is discussed in the methodology chapter.
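The BFAST monitoring method works on an irregular time series of NDMI values per pixel. The sketch below shows only this input preparation, not BFAST itself: it assumes the ESPA scene ID layout seen in Table 3.5 (sensor, path, row, acquisition date, collection and tier, followed by a processing timestamp), computes NDMI from the ETM+ NIR (B4) and SWIR1 (B5) bands, and skips cloud-masked observations. The reflectance values and the middle scene ID are hypothetical.

```python
from datetime import datetime


def parse_espa_scene_id(scene_id):
    """Return (sensor, path, row, acquisition date) from an ESPA-style Landsat ID."""
    base = scene_id.split("-")[0]  # drop the '-SC<processing timestamp>' suffix
    sensor = base[:4]              # 'LE07' = Landsat 7 ETM+
    path, row = int(base[4:7]), int(base[7:10])
    acquired = datetime.strptime(base[10:18], "%Y%m%d").date()
    return sensor, path, row, acquired


def ndmi(nir, swir1):
    """NDMI = (NIR - SWIR1) / (NIR + SWIR1); for ETM+ that is (B4 - B5) / (B4 + B5)."""
    total = nir + swir1
    return None if total == 0 else (nir - swir1) / total


def pixel_time_series(observations):
    """Turn (scene_id, nir, swir1) tuples into a date-sorted NDMI series for one pixel.

    Cloud-masked observations (None) are skipped, so the result is an irregular
    time series, the kind of input a BFAST-type break detection works on.
    """
    series = []
    for scene_id, nir, swir1 in observations:
        if nir is None or swir1 is None:
            continue
        value = ndmi(nir, swir1)
        if value is not None:
            series.append((parse_espa_scene_id(scene_id)[3], value))
    return sorted(series)


# Hypothetical reflectance values for one pixel; the middle scene is cloud-masked.
observations = [
    ("LE072030522020042101RT-SC20200427103235", 2500, 2500),
    ("LE072030522010061501T1-SC20200502150000", None, None),  # hypothetical scene ID
    ("LE072030521999091901T1-SC20200502144946", 3000, 1800),
]
print(pixel_time_series(observations))
```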

Table 3.5: Forest monitoring data used in this research
Data | Image collection ID | Spectral bands | Spatial resolution | Date | Purpose
Landsat 7 ETM+ images of 1999-2020 | Image stack of 236 scenes, from LE072030521999091901T1-SC20200502144946 to LE072030522020042101RT-SC20200427103235 | NDMI (vegetation index) | 30 meters | 19-09-1999 to 21-04-2020 | RQ6: BfastSpatial forest monitoring
Hansen global forest change dataset of 2000-2019 | Hansen Global Forest Change v1.7 (2000-2019) | treecover2000, loss, gain, lossyear | 1 arc second (approx. 30.87 meters) | 01-01-2000 to 01-01-2020 | RQ6: Forest cover change assessment

4. Methodology This chapter outlines all the methods that were applied in the different analyses of this research. It provides a detailed overview of all the individual steps that were carried out to answer the different research questions. As in the review chapter, these steps are structured by research question, and a separate flowchart is included for each step.

4.1 Preliminary data processing Although most datasets already come in a pre-processed format in Google Earth Engine, some pre-processing was still required before these datasets could be used in the analysis part of this research. The section below discusses these pre-processing procedures in more detail.

4.1.1 Pre-processing of Sentinel-2 data The Sentinel-2 data in this research required some pre-processing because not all the data could be downloaded at the same processing level and because they needed to be combined with spectral indices and Sentinel-1 SAR data. Furthermore, these combined datasets needed to be clipped to the geometry of the Boé for the final land cover classification. Figure 4.1 displays a flowchart of the pre-processing of the Sentinel-2 data.

Figure 4.1: Flowchart of the pre-processing process for the Sentinel-2 data

4.1.1.1 Atmospheric correction of the Sentinel-2 data The Sentinel-2 data of 2016 are only available in Level-1C format and therefore had to be atmospherically corrected to Level-2A. This atmospheric correction was performed with the Sen2Cor processor, which is freely available in the SNAP software provided by ESA. In this processor, the L2A-GIPP file, which is also provided with the SNAP software, was used as a template to convert the Level-1C product to Level-2A.

4.1.1.2 Mosaicking of Sentinel-2 scenes As mentioned in section 1.6, the different Sentinel-2 scenes needed to be combined into one raster file covering the whole Boé, which is why they were mosaicked. The scenes of 2019 and 2020 were mosaicked in GEE with the ‘.mean()’ JavaScript function, but this was not possible for the atmospherically corrected scenes of 2016, which had been processed outside GEE. Hence, these scenes were mosaicked with the Mosaic tool in ArcGIS Pro. In this tool, the Mosaic Operator was set to ‘Mean’ to match the JavaScript function of GEE, which means that the average value of overlapping cells was assigned to the output cells in the overlapping areas.
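The effect of the ‘Mean’ mosaic operator can be illustrated with a minimal sketch that assumes the input grids are already co-registered on the same pixel grid (the actual tools also handle georeferencing and resampling). Cells covered by one scene keep their value, and cells covered by several scenes receive the mean of the valid values. The reflectance values are hypothetical.

```python
NODATA = None  # marks cells outside a scene's footprint


def mosaic_mean(*grids):
    """Combine co-registered grids; where several grids have data, average them."""
    rows, cols = len(grids[0]), len(grids[0][0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [g[r][c] for g in grids if g[r][c] is not NODATA]
            out_row.append(sum(values) / len(values) if values else NODATA)
        out.append(out_row)
    return out


# Hypothetical western and eastern scenes overlapping in the middle column.
west = [[0.10, 0.20, None], [0.30, 0.40, None]]
east = [[None, 0.30, 0.50], [None, 0.20, 0.60]]
print(mosaic_mean(west, east))
```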

4.1.2 Pre-processing of Landsat 8 data The pre-processing of the Landsat 8 data involved mostly the same procedures as represented in figure 4.1, except for the above-mentioned atmospheric correction and mosaicking procedures. This is because the whole Boé was captured in a single Landsat 8 scene and because the Landsat 8 data of all the assessed years were provided in the same pre-processed, atmospherically corrected format in GEE. The procedures that were still required for the pre-processing of the Landsat 8 data are displayed in figure 4.2.

Figure 4.2: Flowchart of the pre-processing process for the Landsat 8 data

4.1.3 Calculating (vegetation) indices For the Sentinel-2 and Landsat 8 raster files, indices were calculated that provide additional information on specific characteristics of the different land cover types. For example, the NDVI provides additional information on the relative biomass and chlorophyll absorption of vegetation types, and the NBR highlights burned areas.

The same indices were calculated for the Sentinel-2 and Landsat 8 raster files, namely ‘NDVI’, ‘NDWI’, ‘NBR’, ‘CIg’ (green chlorophyll index, also abbreviated as GCI), ‘SAVI’ and ‘ARVI’. The ‘NDVI’, ‘SAVI’, ‘ARVI’ and ‘CIg’ indices all provide additional information on vegetation characteristics, the ‘NDWI’ highlights water bodies and, as mentioned before, the ‘NBR’ highlights burned soils.

These indices were calculated by the following equations:
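Assuming the commonly used formulations of these indices, with ρ denoting surface reflectance in a given band on a 0-1 scale, the equations can be written as shown below. NDWI is taken here in its green/NIR form, which matches its use for highlighting water bodies, and SAVI uses the common soil adjustment factor L = 0.5; both are assumptions about the exact variants used.

```latex
\begin{aligned}
\mathrm{NDVI} &= \frac{\rho_{NIR} - \rho_{Red}}{\rho_{NIR} + \rho_{Red}} \\
\mathrm{NDWI} &= \frac{\rho_{Green} - \rho_{NIR}}{\rho_{Green} + \rho_{NIR}} \\
\mathrm{NBR} &= \frac{\rho_{NIR} - \rho_{SWIR2}}{\rho_{NIR} + \rho_{SWIR2}} \\
\mathrm{CI_{g}} &= \frac{\rho_{NIR}}{\rho_{Green}} - 1 \\
\mathrm{SAVI} &= \frac{(1 + L)\,(\rho_{NIR} - \rho_{Red})}{\rho_{NIR} + \rho_{Red} + L}, \qquad L = 0.5 \\
\mathrm{ARVI} &= \frac{\rho_{NIR} - (2\rho_{Red} - \rho_{Blue})}{\rho_{NIR} + (2\rho_{Red} - \rho_{Blue})}
\end{aligned}
```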

The indices were calculated with the ‘Band Arithmetic’ tool in ArcGIS Pro, following the same procedure for the Sentinel-2 and Landsat 8 raster files. The only difference was the band numbers used, because, for example, the NIR band has a different band number in the Landsat and Sentinel sensors (B5 for Landsat 8 OLI and B8 for Sentinel-2). Each index calculation resulted in a new raster layer.
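Because only the band numbers differ between the sensors, the calculation can be sketched with a per-sensor band mapping taken from Tables 3.3 and 3.4. The snippet below is a minimal illustration, not the ArcGIS ‘Band Arithmetic’ implementation; it assumes the commonly used index formulations (NDWI in its green/NIR form, SAVI with L = 0.5, ARVI with γ = 1) and reflectances scaled to the 0-1 range. The names BAND_MAP and compute_indices are hypothetical.

```python
# Band numbers per sensor for the same spectral regions (from Tables 3.3 and 3.4).
BAND_MAP = {
    "sentinel2": {"blue": "B2", "green": "B3", "red": "B4", "nir": "B8", "swir2": "B12"},
    "landsat8": {"blue": "B2", "green": "B3", "red": "B4", "nir": "B5", "swir2": "B7"},
}


def compute_indices(pixel, sensor, soil_factor=0.5):
    """Compute the six indices for one pixel.

    pixel maps band names (e.g. 'B8') to reflectance on a 0-1 scale;
    SAVI's soil_factor of 0.5 assumes that scale.
    """
    bands = {name: pixel[band] for name, band in BAND_MAP[sensor].items()}
    blue, green, red = bands["blue"], bands["green"], bands["red"]
    nir, swir2 = bands["nir"], bands["swir2"]
    rb = 2 * red - blue  # ARVI's atmospherically resistant red term (gamma = 1)
    return {
        "NDVI": (nir - red) / (nir + red),
        "NDWI": (green - nir) / (green + nir),
        "NBR": (nir - swir2) / (nir + swir2),
        "CIg": nir / green - 1,
        "SAVI": (1 + soil_factor) * (nir - red) / (nir + red + soil_factor),
        "ARVI": (nir - rb) / (nir + rb),
    }


# The same (hypothetical) reflectances stored under each sensor's band numbers.
s2_pixel = {"B2": 0.04, "B3": 0.08, "B4": 0.06, "B8": 0.30, "B12": 0.12}
l8_pixel = {"B2": 0.04, "B3": 0.08, "B4": 0.06, "B5": 0.30, "B7": 0.12}
print(compute_indices(s2_pixel, "sentinel2"))
print(compute_indices(l8_pixel, "landsat8"))
```

Keeping the formulas sensor-independent and isolating the band numbers in one mapping mirrors the point above: the procedure is identical, only the band lookup changes.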

4.1.4 Pre-processing of Aster GDEM data The pre-processing of the Aster GDEM data required different procedures from the pre-processing of the satellite data. These procedures are displayed in the flowchart in figure 4.3.

Figure 4.3: Flowchart of the pre-processing process for the Aster GDEM data

4.1.4.1 Mosaicking of Aster GDEM tiles As mentioned in section 3.3, the Aster GDEM data did not come in one tile and therefore had to be mosaicked to create a DEM covering the entire Boé.

4.1.4.2 Generation of the slope layer In addition to the DEM, a slope layer was used as additional ancillary data for the final classification. The slope layer was derived from the Aster GDEM data with the ‘Slope’ tool in ArcGIS Pro, with the ‘method’ parameter set to Planar.
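ESRI documents the Planar method of the ‘Slope’ tool as a 3 x 3 moving window that estimates the rate of change in the x and y directions around each cell (the average maximum technique). A minimal sketch for a single interior cell is given below; it assumes elevation and cell sizes are in the same linear unit, so the geographic (arc-second) ASTER GDEM would first have to be projected or given an appropriate z-factor. The neighbour labels a to i follow the usual row-by-row window layout, and the DEM values are hypothetical.

```python
import math


def planar_slope_degrees(dem, row, col, cell_x, cell_y):
    """Slope in degrees for an interior cell, from its 3 x 3 neighbourhood.

    Window layout:  a b c
                    d e f
                    g h i
    Elevations and cell sizes must share the same linear unit (e.g. metres).
    """
    a, b, c = dem[row - 1][col - 1:col + 2]
    d, _e, f = dem[row][col - 1:col + 2]
    g, h, i = dem[row + 1][col - 1:col + 2]
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_x)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_y)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical DEM rising 30 m per 30 m cell towards the east (a 45 degree slope).
dem = [[100, 130, 160], [100, 130, 160], [100, 130, 160]]
print(planar_slope_degrees(dem, 1, 1, cell_x=30, cell_y=30))
```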

4.1.5 Compositing bands The pre-processed Sentinel-2 raster file was combined with the calculated index layers, the Sentinel-1 raster layers and the DEM and slope layers to form the final Sentinel raster file. This was achieved with the ‘Composite Bands’ tool in ArcGIS Pro, which creates a single raster dataset from multiple bands. The same procedure was followed for the Landsat 8 data.

4.1.6 Clipping Once all the different bands of the Sentinel and Landsat rasters were composited into single raster files, these raster files were clipped to the geometry of the Boé, because only the land cover change in this study area was assessed in this research. The geometry of the Boé is represented by the 'BorderBoé' shapefile, as mentioned in the data chapter. This is a vector file that comprises the whole Boé and its border. The clipping was conducted with the ‘Clip Raster’ tool in ArcGIS Pro, which cuts out a piece of a raster file by using another feature class or raster as a cookie cutter (ESRI, 2020a). This is useful when the geometry of the study area needs to be ‘clipped’ from a larger dataset, such as a Landsat scene.
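Clipping to a polygon geometry can be sketched as masking every cell whose centre falls outside the polygon. The snippet below uses an even-odd ray casting test and a hypothetical 3 x 3 raster and rectangle; it illustrates the principle only, and the cell-centre rule is an assumption rather than a description of exactly how the ‘Clip Raster’ tool decides on boundary cells.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_raster(grid, origin_x, origin_y, cell, polygon, nodata=None):
    """Set cells whose centre lies outside the polygon to nodata.

    (origin_x, origin_y) is the upper-left corner; rows run downwards.
    """
    clipped = []
    for r, row in enumerate(grid):
        cy = origin_y - (r + 0.5) * cell
        out_row = []
        for c, value in enumerate(row):
            cx = origin_x + (c + 0.5) * cell
            out_row.append(value if point_in_polygon(cx, cy, polygon) else nodata)
        clipped.append(out_row)
    return clipped


# Hypothetical 3 x 3 raster with 10 m cells and a study area covering its western 20 m.
raster = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
study_area = [(0, 0), (20, 0), (20, 30), (0, 30)]
print(clip_raster(raster, 0, 30, 10, study_area))
```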

4.2 RQ1: Sampling

Figure 4.4: Flowchart of the stratified random field survey sampling process

4.2.1 Identification of land cover types and unsupervised classification To answer the first research question, three procedures were followed: identifying strata for stratified sampling, developing an adequate sampling strategy, and applying this strategy during the fieldwork campaign. For the identification of the strata, an unsupervised classification was performed, following a suggestion for future research made by Studer (2019). The research of Studer (2019) was also consulted for the initial identification of land cover classes. In his research, Studer (2019) initially identifies 11 different land cover classes and later merges these into 6 distinct land cover classes. Based on Studer’s first 11 classes, 12 classes were identified in this research, and this number was used as the maximum number of classes for the unsupervised classification. The unsupervised classification used the ISO cluster method and was conducted with the ‘Iso Cluster Unsupervised Classification’ tool in ArcGIS Pro. This unsupervised classification was solely used for the identification of strata under the first research question and was not used in the further classification process related to the second research question. Because the resulting clusters had no land cover labels yet, they were manually assigned to the identified strata. Subsequently, the resulting strata were assessed with experts from the Chimbo team who have extensive knowledge of the landscape of the Boé. After this assessment, the final land cover types for the field survey sampling were identified.
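The ISO cluster method is based on an iterative, migrating-means clustering procedure whose core is the repeated assignment of pixels to the nearest cluster centre followed by an update of the centres. The sketch below implements only this k-means core in plain Python; the splitting and merging of clusters and the minimum class size handled by the ArcGIS tool are left out, and the two-band pixel values are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two pixel vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(pixels, k, iterations=20, seed=0):
    """Group pixel vectors into k clusters (the k-means core of ISODATA-style methods)."""
    rng = random.Random(seed)
    centres = rng.sample(pixels, k)  # start from k distinct random pixels
    labels = []
    for _ in range(iterations):
        # Assignment step: every pixel joins its nearest centre.
        labels = [min(range(k), key=lambda j: squared_distance(p, centres[j])) for p in pixels]
        # Update step: every centre moves to the mean of its member pixels.
        for j in range(k):
            members = [p for p, label in zip(pixels, labels) if label == j]
            if members:
                centres[j] = tuple(sum(values) / len(members) for values in zip(*members))
    return labels, centres


# Hypothetical two-band pixels (e.g. red and NIR reflectance) from two cover types.
pixels = [(0.10, 0.10), (0.12, 0.09), (0.11, 0.13), (0.80, 0.90), (0.82, 0.88), (0.79, 0.91)]
labels, centres = kmeans(pixels, k=2)
print(labels)
```

As in the workflow above, the resulting cluster numbers carry no land cover meaning yet; they still have to be assigned to strata by interpretation.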

4.2.2 Random points generation Once the final land cover types were identified, random points were generated within these strata to identify the sample locations. This was achieved with the ‘Create Random Points’ tool in ArcGIS Pro. The points were then imported into the Garmin GPS device.
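The ‘Create Random Points’ step can be sketched as drawing random cell positions separately within each stratum of the classified image. The sketch below assumes a small hypothetical classified grid and a fixed number of points per stratum; constraints such as a minimum distance between points, which the ArcGIS tool can apply, are not modelled.

```python
import random


def stratified_random_points(class_grid, points_per_stratum, seed=42):
    """Draw up to n random (row, col) positions from each stratum of a classified grid."""
    rng = random.Random(seed)
    cells = {}
    for r, row in enumerate(class_grid):
        for c, label in enumerate(row):
            cells.setdefault(label, []).append((r, c))
    # Strata with fewer cells than requested contribute all of their cells.
    return {label: rng.sample(positions, min(points_per_stratum, len(positions)))
            for label, positions in sorted(cells.items())}


# Hypothetical classified grid with three strata.
grid = [["forest", "cashew", "forest"],
        ["savanna", "forest", "cashew"]]
print(stratified_random_points(grid, points_per_stratum=2))
```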

4.2.3 Field survey sampling The fieldwork campaign consisted of fieldtrips of varying duration. The home base of the fieldwork campaign was Chimbo’s office in the ‘Casa Daridibo’ complex and its adjacent campsite in the village of Beli. From this home base, fieldtrips to different villages and locations in the Boé were planned. These villages and locations were visited by bike, and samples were taken on location in their surroundings. Two local guides of the Chimbo Foundation alternately accompanied me during the fieldtrips to provide guidance in the Boé. When villages were visited, an introduction to the so-called ‘Jarga’ (village chief) was required, and permission had to be asked for the mapping of the village and its surroundings. After permission was granted, a local member of the CVV was appointed as local guide and accompanied us into the surroundings of the village. These local guides were of great assistance because they possess extensive knowledge of the nature surrounding the villages and could therefore point out the locations of the different land cover types that had to be sampled. A stratified sampling approach was implemented, as the samples are based on previously identified land cover classes (strata). As mentioned before, the samples were registered as sample tracks with a Garmin GPS device, and the tracks were imported into the GIS project for further processing during rest days between fieldtrips. As mentioned in the data chapter, the sample plots differed in size but were always at least as big as one Landsat 8 pixel of 30 x 30m. During the fieldwork trips, an average of 9 to 10 samples were taken each day, as can also be seen in the fieldwork schedule provided in appendix A. The implemented sampling design and fieldwork campaign are discussed further in the results chapter.

Since one of the aims of this study is to distinguish cashew plantations from gallery forests, the cashew plantations were prioritized during fieldwork and in the collection of samples. This meant that an emphasis was put on this class, which is why at least one sample of a cashew plantation, mature and/or developing, had to be collected during every fieldtrip.

4.2.4 Creating the first ‘basic’ sample collection For the creation of the basic training and validation sample collections, the recorded sample tracks of the Garmin GPS device were first loaded onto the computer and imported into the GIS project. Subsequently, the ‘Training sample manager’ tool in ArcGIS Pro was used to convert these tracks into polygon shapefiles. 80% of these samples were randomly assigned to the training sample collection and 20% to the validation sample collection. This commonly used ratio is based on the literature of … In addition to the recorded Garmin sample tracks, the mapped sacred forests of 2019 were added to the training sample collection. Furthermore, the easily recognizable pixels representing villages and water bodies were compared with the high-resolution ESRI imagery to check whether they also represented these types in the higher-resolution imagery. After this comparison, they were also drawn as shapefiles and added to the different sample collections in the training sample manager.
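The 80/20 split can be illustrated with a minimal sketch. Splitting within each land cover class, as done here, is an assumption (the text only states that samples were assigned randomly); it has the advantage that rare classes appear in both the training and the validation collection. The sample IDs are hypothetical.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=1):
    """Randomly split (sample_id, class) pairs into training and validation sets per class."""
    rng = random.Random(seed)
    by_class = {}
    for sample_id, label in samples:
        by_class.setdefault(label, []).append(sample_id)
    train, validation = [], []
    for label, ids in sorted(by_class.items()):
        rng.shuffle(ids)
        n_train = round(len(ids) * train_fraction)
        train += [(i, label) for i in ids[:n_train]]
        validation += [(i, label) for i in ids[n_train:]]
    return train, validation


# Hypothetical sample IDs for two classes.
samples = [(f"c{i}", "cashew") for i in range(10)] + [(f"g{i}", "gallery forest") for i in range(5)]
train, validation = split_samples(samples)
print(len(train), len(validation))
```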

The sample collection for the classification of the 2016 images was based on the 2020 sample collection but corrected for 2016 by interpreting the 2016 image(s). Only the 2020 samples that also appeared to represent the same land cover types in 2016 were kept, and the other samples were discarded. In addition, the sacred forest samples collected in 2016 were added to this sample collection.

4.3 RQ2: Establishment of representative sample collections The aim of this research question is to establish more balanced and representative sample collections that could be used for the classification of the images. Therefore, the spectral signatures of the collected samples had to be assessed to check whether the different land cover types could be differentiated from each other. Furthermore, some land cover types with similar spectral signatures were merged into one class to make the sample collection more balanced. Finally, a sensitivity analysis with the Maximum Likelihood classifier was conducted to see whether the merges had any effect on the classification results.

Figure 4.5: Flowchart of the process of establishing a representative training sample collection

4.3.1 Initial spectral assessment and test classification To identify land cover types with similar signatures, an initial spectral assessment was carried out in the R software environment, producing graphs of the spectral profile of all the different classes for the different satellite bands and vegetation indices. These graphs are presented in the results chapter.

4.3.2 Merging of similar land cover types and sensitivity analysis of the Maximum Likelihood classifier To balance the sample collections, certain land cover types with similar spectral signatures were merged and some additional land cover types were added. Based on these merges, two different sample collections were established: the ‘expert’ and ‘researcher’ sample collections, which are discussed further in the results chapter. Along with the basic sample collection, these sample collections were validated by a sensitivity analysis with the Maximum Likelihood classifier. This sensitivity analysis was performed to check whether these different collections generated significantly different classifications. Some other elements were added to this analysis as well, to assess whether they also influenced the resulting classifications.
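A maximum likelihood classifier models each class with a normal distribution estimated from its training samples and assigns a pixel to the class with the highest likelihood. For brevity, the sketch below assumes independent bands (a diagonal covariance matrix) and equal prior probabilities, whereas the standard maximum likelihood classifier uses the full covariance matrix of each class; the two-band reflectance values are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit(training):
    """Estimate per-class band means and variances from {class: [pixel vectors]}."""
    model = {}
    for label, pixels in training.items():
        bands = list(zip(*pixels))
        model[label] = [(mean(b), pvariance(b)) for b in bands]
    return model


def log_likelihood(pixel, params):
    """Gaussian log-likelihood assuming independent bands (diagonal covariance)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x, (mu, var) in zip(pixel, params))


def classify(pixel, model):
    """Assign the class with the highest likelihood (equal priors assumed)."""
    return max(model, key=lambda label: log_likelihood(pixel, model[label]))


# Hypothetical (red, NIR) training pixels for two classes.
training = {
    "cashew": [(0.05, 0.30), (0.06, 0.33), (0.04, 0.31)],
    "gallery forest": [(0.03, 0.40), (0.02, 0.42), (0.03, 0.44)],
}
model = fit(training)
print(classify((0.05, 0.31), model), classify((0.03, 0.43), model))
```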

4.3.3 Second spectral assessment The second spectral assessment was similar to the first one, but in this one the Jeffries-Matusita (JM) distance was also calculated for all land cover type combinations across all the different satellite bands, radar bands and vegetation indices, in order to see how well the different land cover types could be separated from each other.
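For a single band and two classes whose values are assumed to be normally distributed, the JM distance follows from the Bhattacharyya distance B as JM = 2(1 − e^(−B)), ranging from 0 (inseparable) to 2 (fully separable). The sketch below implements this univariate form; note that some authors report the square root of this quantity instead (range 0 to √2), and the sample values are hypothetical.

```python
import math
from statistics import mean, pvariance


def jeffries_matusita(samples_a, samples_b):
    """JM distance between two classes for one band, assuming normal distributions.

    Returns a value between 0 (identical) and 2 (fully separable).
    """
    mu_a, mu_b = mean(samples_a), mean(samples_b)
    var_a, var_b = pvariance(samples_a), pvariance(samples_b)
    # Bhattacharyya distance for two univariate normal distributions.
    bhattacharyya = (0.25 * (mu_a - mu_b) ** 2 / (var_a + var_b)
                     + 0.5 * math.log((var_a + var_b) / (2 * math.sqrt(var_a * var_b))))
    return 2 * (1 - math.exp(-bhattacharyya))


# Hypothetical NDVI values of two classes.
cashew = [0.61, 0.64, 0.66, 0.62]
gallery_forest = [0.72, 0.75, 0.78, 0.74]
print(round(jeffries_matusita(cashew, gallery_forest), 3))
```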

4.4 RQ3: Classification To answer this question, a sensitivity analysis of different machine learning algorithm (MLA) classifiers was conducted. For the classification of the Sentinel-2 and Landsat 8 images, the Image Classification Wizard was used. This is a built-in workflow in ArcGIS Pro that guides the user through the steps of image classification (ESRI, 2020g). These steps were: preprocessing, image segmentation, training sample selection, MLA training, classification, accuracy assessment and reclassification. All the steps were iterative and could be altered at any time during the process. It was also possible to skip steps that had already been performed at an earlier stage, as was the case with the preprocessing and training sample selection. A schematic overview of this classification process is displayed in figure 4.6.

Figure 4.6: Flowchart of the final classification process

4.4.1 Selection of classification method and classification type In the early stage of the classification process, it was important to determine which classification method and type were going to be implemented. The Image Classification Wizard offers two options for the classification method: unsupervised and supervised classification. As mentioned in the review chapter, the outcome of a supervised classification depends on the spectral characteristics of the provided training samples and a land cover schema, whereas an unsupervised classification statistically assigns groups of pixels to classes without any source of reference, based solely on their spectral characteristics (ESRI, 2020g).

Additionally, the Image Classification Wizard provides two options for the classification type: pixel-based and object-based classification. As mentioned in the review chapter, a pixel-based classification is performed on a per-pixel basis and does not consider the characteristics of neighboring pixels. This more traditional classification often results in a salt-and-pepper effect. An object-based classification is performed on localized regions of pixels that are created in the process of image segmentation. The classified objects more closely resemble real-world features, and therefore this type of classification generally produces cleaner classification results (ESRI, 2020f). This research depends on the collection of samples and aims to produce a land cover map that closely resembles the real-world situation in the Boé. A supervised object-based classification method was therefore selected.

4.4.2 Image Segmentation Because a supervised object-based classification method was selected, a segmented image needed to be created that groups neighboring pixels with similar values together into segments/objects. The built-in segmentation tool in the classification wizard was used for this task; besides pixel values, it also takes shape characteristics into account when grouping pixels together. The resulting segmented image is based on an RGB image of the satellite imagery that is used, so this output could be manipulated by selecting different bands. In this research, the selection of the bands for the image segmentation was based on the results of the earlier mentioned spectral assessment and JM-distance calculation. Hence, the bands that differentiated best between cashew plantations and forests were selected for the image segmentation. In addition, three parameters could be set that control how the satellite imagery is segmented into groups. The first one, the spectral detail parameter, sets the level of importance given to the spectral differences of features in the imagery (ESRI, 2020f). Its values range from 1 to 20, and a higher value results in a more detailed classification. Since this research aims to differentiate between different tree species (cashew plantations and forests), this parameter was set to the highest value of 20.

The second parameter, the spatial detail parameter, sets the level of importance given to the proximity between features in the imagery. Again, the values range from 1 to 20: higher values are more appropriate for imagery with many small, clustered features, while lower values create spatially smoother results. Since the study area consists of a highly fragmented landscape, this parameter was also set to the highest value of 20.

The last parameter sets the minimum size of each segment in pixels. Segments smaller than this minimum are not kept as separate segments in the output but are merged with their best-fitting neighboring segment instead. For the classification of the Sentinel-2 imagery, this parameter was first set to 9 because, in theory, nine 10 m × 10 m pixels cover the same area as one 30 m × 30 m Landsat pixel. However, this reasoning carries little weight, because a nine-pixel segment could just as well consist of pixels in a row rather than a 3 × 3 block. Therefore, tests with different settings were conducted to assess which minimum segment size produced the best final classification. For the classification of the Landsat 8 imagery, this parameter was set to 3; this number was chosen arbitrarily, so additional tests with different settings were conducted for the Landsat 8 images as well.
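
How a minimum segment size removes small segments can be sketched as follows. This Python sketch is not the ArcGIS Pro implementation: it assumes a single band, four-way adjacency between segments, and interprets "best fitting" as the neighboring segment with the closest mean value. The function name and example values are hypothetical.

```python
from collections import defaultdict


def merge_small_segments(labels, values, min_size):
    """Merge every segment with fewer than min_size pixels into the adjacent
    segment whose mean value is closest (assumed 'best fitting' criterion)."""
    rows, cols = len(labels), len(labels[0])
    labels = [row[:] for row in labels]  # work on a copy of the segment grid

    while True:
        sizes, sums = defaultdict(int), defaultdict(float)
        neighbours = defaultdict(set)
        for r in range(rows):
            for c in range(cols):
                seg = labels[r][c]
                sizes[seg] += 1
                sums[seg] += values[r][c]
                # record 4-connected adjacency (checking right and down is enough)
                for rr, cc in ((r, c + 1), (r + 1, c)):
                    if rr < rows and cc < cols and labels[rr][cc] != seg:
                        neighbours[seg].add(labels[rr][cc])
                        neighbours[labels[rr][cc]].add(seg)
        means = {seg: sums[seg] / sizes[seg] for seg in sizes}
        small = [seg for seg in sizes if sizes[seg] < min_size and neighbours[seg]]
        if not small:
            return labels
        # merge the smallest segment first, then recompute the statistics
        seg = min(small, key=lambda s: sizes[s])
        target = min(neighbours[seg], key=lambda n: abs(means[n] - means[seg]))
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] == seg:
                    labels[r][c] = target


# A one-pixel segment (2) between a dark segment (1) and a bright segment (3)
labels = [[1, 1, 2, 3, 3]]
values = [[0.10, 0.10, 0.42, 0.50, 0.50]]
print(merge_small_segments(labels, values, min_size=2))  # -> [[1, 1, 3, 3, 3]]
```

The small segment joins segment 3 because its mean value (0.42) is closer to 0.50 than to 0.10; with a single neighbor, a small segment is simply absorbed by it.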

4.4.3 Training samples and reference dataset

The step in the classification wizard for selecting training samples was skipped in this research because the training samples were managed with the aforementioned training sample manager tool, and the samples created with this tool were used for the classification. Furthermore, a reference dataset was provided for the accuracy assessment of the classified output. The image classification wizard accepts many types of reference datasets, as long as they consist of features with a known location and class value and match the classification scheme of the training samples. In this research, the validation samples, which were also created with the training sample manager, were used for the accuracy assessment.

4.4.4 Training of MLA’s

Once the training samples, reference data and segmented image were provided, the image classification wizard was used to train the machine learning algorithms and the Maximum Likelihood classifier mentioned in the review chapter. The wizard offered two MLA’s: the Random Trees classifier (based on the Random Forest algorithm) and the Support Vector Machine (SVM) classifier. Both MLA’s and the Maximum Likelihood classifier were used in this research and compared with each other to assess which one produced the most reliable results; this comparison is discussed in the results chapter.

4.4.4.1 Maximum likelihood classifier

The Maximum Likelihood classifier in the image classification wizard differs slightly from the Maximum Likelihood Classification tool discussed in the review chapter: the classifier must be trained first, and it requires a segmented image as input instead of a spectral signature file. The number of attributes of this segmented image to be considered during training can also be specified. In this research, all attributes of the segmented image were considered for the training of the Maximum Likelihood classifier.

4.4.4.2 Random Forest classifier

For the Random Forest classifier, which is called Random Trees in ArcGIS Pro, the maximum number of trees, the maximum tree depth and the maximum number of samples per land cover class could be specified. Higher values for the maximum number of trees and the maximum tree depth generally produce more accurate classifications but also increase processing time. The third parameter (maximum number of samples) determines how many samples of each land cover class are used for training; a value of 0 means that all provided training samples are used. A fourth parameter determines which attributes of the segmented image, e.g. converged color, pixel count or compactness, are considered for training. In this research, the maximum number of trees was set to 400, close to the 500 trees that Studer (2019) used in his research. The tree depth was set to 30, the default setting in ArcGIS Pro. Furthermore, all provided training samples and all attributes of the segmented image were used for the training of the Random Trees classifier. Some tests with different settings were also conducted to assess whether these settings had a large influence on the classified output; these tests are discussed in the results chapter as well.
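
The effect of the maximum number of samples per class (0 meaning all samples) can be sketched in a few lines of Python. The sketch assumes that a random subset is drawn when a class has more samples than the maximum; the function name and sample data are hypothetical, and the code only illustrates the sample selection, not the Random Trees algorithm itself. The same setting is available for the SVM classifier.

```python
import random


def cap_samples_per_class(samples, max_per_class, seed=42):
    """Keep at most max_per_class randomly chosen samples per land cover class.

    samples is a list of (class_name, sample_id) tuples; max_per_class == 0
    means that all samples are kept."""
    if max_per_class == 0:
        return list(samples)
    rng = random.Random(seed)  # fixed seed for a reproducible selection
    by_class = {}
    for sample in samples:
        by_class.setdefault(sample[0], []).append(sample)
    selected = []
    for items in by_class.values():
        if len(items) > max_per_class:
            items = rng.sample(items, max_per_class)
        selected.extend(items)
    return selected


samples = [("cashew", i) for i in range(5)] + [("gallery forest", i) for i in range(2)]
print(len(cap_samples_per_class(samples, 0)))  # -> 7 (all samples)
print(len(cap_samples_per_class(samples, 3)))  # -> 5 (3 cashew + 2 gallery forest)
```

A cap reduces the dominance of well-sampled classes, whereas a value of 0, as used in this research, keeps every sample and therefore any imbalance between classes.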

4.4.4.3 Support Vector Machine classifier

For the SVM classifier, only the number of training samples per land cover class and the attributes of the segmented image to be considered could be specified. In this research, the number of training samples per land cover class was set to 0, so that all provided training samples were used, and all attributes of the segmented image were considered for the training of the SVM classifier.

4.4.5 Final classification

After all the classifiers were trained, they were used for the final classification of the satellite imagery. The classifiers were first trained on the Sentinel-2 and Landsat 8 images of 2020 and then applied to the same images as well as to the images of 2016 for the purpose of change detection. In addition, the classifiers were trained on the images of 2016 with an adjusted set of training samples, as mentioned in section 4.2.4, and then applied to the images of 2016 to assess whether this procedure produced better results. Furthermore, several classification tests were conducted with different parameter settings, different segmented images and adjusted training samples to assess whether these changes had a large influence on the classified output.

4.5 RQ4: Validation

To answer this research question, the accuracy of the supervised classification was assessed in a quantitative and a qualitative form, as shown in the flowchart in figure 4.7. For the quantitative form, confusion matrices were created; for the qualitative form, the results of the different classifiers were visually compared to show the differences in classification and to determine which classifier produced the best results. In addition, post-classification processing was applied to check whether these methods could make the base maps more representative. The image classification wizard was also used for the accuracy assessment of those classified outputs.

Figure 4.7: Flowchart of data validation process

4.5.1 Confusion Matrix

The quantitative assessment entailed the creation of confusion matrices, as discussed in the review chapter. During this process, the classified outputs were first compared with the provided validation samples. For this comparison, a sampling strategy for the distribution of sample points was specified with the parameters of the built-in accuracy assessment tool in the classification wizard. The sampling strategy parameter was set to Stratified Random, which means that the points were randomly distributed within each land cover class, with each class receiving a number of points proportional to its relative area (ESRI, 2020g). The second parameter, the number of points, was set to 500, so 500 points were randomly distributed over the classes.
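
The two parts of the quantitative assessment, allocating the 500 assessment points in proportion to class area and deriving accuracies from the resulting confusion matrix, can be sketched in Python as follows. The largest-remainder rounding is an assumption (the exact rounding used by ArcGIS Pro is not described here), and all class areas and matrix counts are hypothetical. Rows of the matrix hold the classified values and columns the reference values, so user's accuracy is computed per row and producer's accuracy per column.

```python
def allocate_points(class_areas, total_points=500):
    """Distribute total_points over the classes in proportion to their mapped
    area, using largest-remainder rounding so the counts add up exactly."""
    total_area = sum(class_areas.values())
    raw = {cls: total_points * area / total_area for cls, area in class_areas.items()}
    counts = {cls: int(value) for cls, value in raw.items()}
    leftover = total_points - sum(counts.values())
    # hand out the remaining points to the classes with the largest remainders
    for cls in sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)[:leftover]:
        counts[cls] += 1
    return counts


def accuracy_metrics(matrix, classes):
    """matrix[i][j] counts points mapped as classes[i] whose reference class
    is classes[j]. Returns overall, user's and producer's accuracy."""
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(n)) / total
    users, producers = {}, {}
    for i, cls in enumerate(classes):
        mapped = sum(matrix[i])                          # row total
        reference = sum(matrix[r][i] for r in range(n))  # column total
        users[cls] = matrix[i][i] / mapped if mapped else float("nan")
        producers[cls] = matrix[i][i] / reference if reference else float("nan")
    return overall, users, producers


# Hypothetical class areas (km²) and a hypothetical two-class confusion matrix
print(allocate_points({"forest": 60.0, "cashew": 25.0, "savannah": 15.0}))
# -> {'forest': 300, 'cashew': 125, 'savannah': 75}
overall, users, producers = accuracy_metrics([[40, 10], [5, 45]], ["cashew", "forest"])
print(overall, users["cashew"], producers["cashew"])
```

In this example, 40 of the 50 points mapped as cashew are cashew in the reference data (user's accuracy 0.8), and 40 of the 45 reference cashew points were mapped correctly (producer's accuracy of about 0.89).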

4.5.2 Visual validation

The qualitative assessment consisted of a visual comparison of all the classified outputs with the high-resolution ESRI imagery of the same area. When the results were not satisfactory, the over- or misclassified clusters of pixels in these outputs were reclassified and, where possible, post-classification processing was applied. These procedures are discussed in further detail in the sections below.

4.5.3 Post-classification processing

The flowchart in figure 4.8 describes the post-classification processing workflow. As mentioned above, this workflow was applied to remove the most obvious misclassifications, with the aim of improving the final classifications.

Figure 4.8: Flowchart of the post-classification processing process

4.5.4 Reclassification

Recurring issues in land cover classification are the mis- and over-classification of certain land cover classes and the occurrence of small isolated regions of misclassified pixels in the classified output (ESRI, 2020f). One way to address these issues is to re-create the training samples and reclassify the dataset, but it is often easier to edit the resulting classified output, especially when only small errors are encountered. In this research, the ‘Reclassifier’ tool in the image classification wizard was used to apply these edits. With this tool, small regions of misclassified pixels could be selected and reclassified as any other class in the specified classification scheme. An additional advantage of this tool was its ability to toggle the transparency of the classified image on and off, which made it easier to compare the reclassification with the underlying imagery. The edits were verified against the ESRI high-resolution imagery, and only the most obvious misclassifications were reclassified.

4.5.5 Post-classification processing

As mentioned in the section above, the classified output often contains misclassified isolated regions and pixels, which give it a ‘salt and pepper’ or ‘speckled’ appearance. Manually reclassifying all these small regions is often too time-consuming, so post-classification processing can be applied as an additional step. Post-classification processing refers to removing noise and improving the quality of the output (ESRI, 2020f). In this research, filtering and smoothing procedures were not applied because they would smooth the boundaries between the classes too much and thus oversimplify the classified output. Instead, generalization was applied. This process removes small isolated regions from the classified output by reclassifying them to the values of their nearest neighbors (ESRI, 2020d). Three ArcGIS Pro tools were used for this task: ‘Region Group’, ‘Set Null’ and ‘Nibble’. These tools are discussed in more detail in the sections below.

4.5.5.1 Region Group

This tool records, for each cell in the output, the identity of the connected region to which that cell belongs. Simply put, it identifies and distinguishes all the separate regions of cells with the same value (land cover class) in the classified output. Two parameters control how connectivity between cells is established (ESRI, 2020d): the ‘Number of neighbors to use’ and the ‘Zone grouping method’. In this research, these parameters were set to ‘four’ and ‘within’, which meant that only cells with the same value that are orthogonally (four-way) connected were grouped into the same region. After the regions were established, a link field was added that retains the original zone value of each input cell, which showed to which land cover class each region belonged.
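
The four-way ‘within’ grouping can be illustrated with a flood-fill sketch in Python. It is a simplified stand-in for the Region Group tool (it does not handle NoData cells or the alternative grouping method), and the function name is hypothetical. The returned link dictionary plays the role of the link field.

```python
from collections import deque


def region_group(grid):
    """Give every 4-connected group of equal-valued cells its own region id.

    Returns (regions, link): regions[r][c] is the region id of a cell and
    link[region_id] is the original class value (the 'link field')."""
    rows, cols = len(grid), len(grid[0])
    regions = [[0] * cols for _ in range(rows)]  # 0 = not yet assigned
    link = {}
    next_id = 1
    for r in range(rows):
        for c in range(cols):
            if regions[r][c]:
                continue
            value = grid[r][c]
            regions[r][c], link[next_id] = next_id, value
            queue = deque([(r, c)])
            while queue:  # flood fill over orthogonal neighbors with the same value
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and not regions[nr][nc] and grid[nr][nc] == value:
                        regions[nr][nc] = next_id
                        queue.append((nr, nc))
            next_id += 1
    return regions, link


classified = [[1, 1, 2],
              [2, 1, 2],
              [2, 2, 1]]
regions, link = region_group(classified)
print(regions)  # -> [[1, 1, 2], [3, 1, 2], [3, 3, 4]]
print(link)     # -> {1: 1, 2: 2, 3: 2, 4: 1}
```

In the example, the bottom-right cell only touches the other class 1 cells diagonally, so with four-way connectivity it becomes a separate one-cell region.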

4.5.5.2 Set Null

After the regions were identified, the Set Null tool set some of these regions to NoData based on specified criteria. It performs a conditional evaluation and returns NoData where the criterion is true and another value where it is false. In this research, one criterion was the value of a land cover class that had many isolated misclassified pixels in the classified image, and another criterion was a maximum number of pixels per region. With these two criteria, the small isolated regions belonging to the specified class were masked by setting them to NoData, and all other regions were set to the value 1.
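
The conditional masking performed with Set Null can be sketched as below, assuming a grid of region ids with a link field to the original class values. Whether regions of exactly the maximum size are masked (at most versus fewer than the maximum) is an assumption, and the function name and example values are hypothetical.

```python
from collections import Counter


def set_null_small_regions(regions, link, target_class, max_pixels):
    """Return None (NoData) for cells in regions of target_class that contain
    at most max_pixels cells, and 1 for every other cell."""
    sizes = Counter(region_id for row in regions for region_id in row)
    return [[None if link[rid] == target_class and sizes[rid] <= max_pixels else 1
             for rid in row]
            for row in regions]


# Region ids and their link field (original class value), e.g. from a
# four-way region grouping of a small classified raster
regions = [[1, 1, 2],
           [3, 1, 2],
           [3, 3, 4]]
link = {1: 1, 2: 2, 3: 2, 4: 1}
print(set_null_small_regions(regions, link, target_class=1, max_pixels=1))
# -> [[1, 1, 1], [1, 1, 1], [1, 1, None]]
```

Only the one-cell region of class 1 is masked; the three-cell region of the same class and all regions of other classes are kept.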

4.5.5.3 Nibble

In the last step, the Nibble tool replaced the cells that were masked by the Set Null tool with the values of their nearest neighbours. Only the masked NoData cells were nibbled and all the other cells remained the same. Therefore, only the undesired small isolated regions were replaced with the values of their nearest neighbours. The reclassification and generalization procedures resulted in the final classified raster outputs for 2016 and 2020.
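
A minimal Python sketch of the nibbling step is shown below. It assumes that masked cells are marked with None and fills them by a breadth-first search that starts from all unmasked cells, so "nearest" is measured in four-way grid steps. This is a simplification and not necessarily how the ArcGIS Pro Nibble tool determines the nearest neighbour; the function name is hypothetical.

```python
from collections import deque


def nibble(classes, mask):
    """Fill cells where mask is None with the class of the nearest unmasked
    cell; unmasked cells keep their original class.

    Nearness is measured in 4-connected grid steps (breadth-first search),
    a simplification of a straight-line nearest-neighbour search."""
    rows, cols = len(classes), len(classes[0])
    out = [[classes[r][c] if mask[r][c] is not None else None for c in range(cols)]
           for r in range(rows)]
    # multi-source breadth-first search that starts from all unmasked cells
    queue = deque((r, c) for r in range(rows) for c in range(cols) if out[r][c] is not None)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] is None:
                out[nr][nc] = out[r][c]
                queue.append((nr, nc))
    return out


classified = [[1, 1, 2],
              [2, 1, 2],
              [2, 2, 1]]
mask = [[1, 1, 1],
        [1, 1, 1],
        [1, 1, None]]  # None marks the isolated cell that should be replaced
print(nibble(classified, mask))  # -> [[1, 1, 2], [2, 1, 2], [2, 2, 2]]
```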

4.6 RQ5: Change detection

To answer this research question, the final classified raster outputs were compared with each other to detect changes in land cover and land use over the four-year period between 2016 and 2020. This comparison resulted in a change detection matrix and a change map. Moreover, a Sankey diagram was created to make the conversions identified in the change detection matrix easier to interpret. Figure 4.9 provides a flowchart of the change detection process.

Figure 4.9: Flowchart of change detection process

4.6.1 Change detection matrix

The change detection matrix quantifies the change in total area of the different land cover types between 2016 and 2020. To create this matrix, the final classified raster outputs were first converted to polygon shapefiles with the ‘Raster to polygon’ tool in ArcGIS Pro, because calculations are easier to perform on shapefiles. The resulting shapefiles were then intersected with the ‘Intersect’ tool, which computes a geometric intersection of the input features, as can be seen in figure 4.10. Features or portions of features that overlap in all layers and/or feature classes are written to the output feature class (ESRI, 2020f). This intersection identified changes in land cover at the same locations in the 2016 and 2020 shapefiles. Because land cover shapefiles usually consist of thousands of separate features, these features need to be aggregated based on an attribute such as land cover class. The ‘Dissolve’ tool in ArcGIS Pro was used for this task: it aggregated the overlapping features in the output feature class into ‘change classes’ such as ‘gallery forest to cashew plantation’. After all change classes were aggregated, the total area of each class in km² was calculated with the ‘Calculate geometry’ tool. The resulting attribute table was exported to Microsoft Excel and converted into a change detection matrix with the pivot table option.
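
The Intersect, Dissolve and pivot-table steps produce a from-to table of areas. The same kind of table can be sketched directly on two co-registered classified rasters by counting class pairs cell by cell, as in the Python sketch below. This raster cross-tabulation is a swapped-in alternative to the vector workflow described above, not the method used in this research; the class names, map values and 10 m pixel size are assumptions for illustration.

```python
from collections import defaultdict


def change_matrix(map_2016, map_2020, pixel_area_km2):
    """Cross-tabulate two co-registered classified rasters cell by cell and
    return {(class_2016, class_2020): area in km²}."""
    areas = defaultdict(float)
    for row_2016, row_2020 in zip(map_2016, map_2020):
        for before, after in zip(row_2016, row_2020):
            areas[(before, after)] += pixel_area_km2
    return dict(areas)


# Hypothetical 2 x 2 excerpt; a 10 m x 10 m pixel covers 100 m² = 0.0001 km²
map_2016 = [["gallery forest", "gallery forest"], ["savannah", "savannah"]]
map_2020 = [["gallery forest", "cashew plantation"], ["savannah", "savannah"]]
changes = change_matrix(map_2016, map_2020, pixel_area_km2=0.0001)

classes = sorted({cls for pair in changes for cls in pair})
for before in classes:  # rows: 2016 classes, columns: 2020 classes
    print(before, [round(changes.get((before, after), 0.0), 6) for after in classes])
```

Each key corresponds to one change class, such as ‘gallery forest to cashew plantation’, and the areas should closely match the polygon-based result when the polygons were derived from the same rasters without simplification.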

Figure 4.10: Difference in output between the ‘Intersect’ tool on the left and the ‘Union’ tool on the right.

4.6.2 Change map

The second step of the change detection analysis was the creation of a change map that visually highlights some of the changes in land cover between 2016 and 2020 by displaying the aforementioned change classes. The procedure was very similar to the creation of the change detection matrix, but it used the Union tool instead of the Intersect tool. The main difference between these tools is that the Union tool writes all input features to the output feature class, not only the overlapping ones. The difference in output between the Union and Intersect tools can be seen in figure 4.10, in which the different colors represent different layers and the green color represents the output. In this research, only the feasible change classes are shown in the map to keep it easy to interpret.

4.7 RQ6: Assessment of forest monitoring systems

4.7.1 Hansen dataset

The Hansen dataset is a straightforward dataset representing forest loss and gain on a global scale, which can easily be imported into Google Earth Engine (GEE). Some JavaScript code was still required to visualize the forest loss and forest gain maps for the extent of the Boé and to chart the yearly forest loss statistics per area, as JavaScript is the programming language used in the Google Earth Engine code editor. The first step was to compute the ‘forest cover in 2000’ layer, which functioned as a reference layer for the rest of the analysis. After this step, the forest loss and forest gain layers up to 2019 were computed. The third step was to create the forest loss and gain map by combining the forest loss and forest gain layers with the reference layer in GEE.

This forest loss and gain map was still a global map and therefore needed to be clipped to the extent of the Boé, the individual regions in the Boé and the national parks located in the Boé. The Boé shapefiles were imported into Google Earth Engine for this task. After clipping, the forest loss and gain map for the Boé was created, and yearly forest loss statistics were calculated and charted for all the different areas. The map was then exported to ArcGIS Pro, where it was compared with the change map mentioned in section 4.6.2 for validation purposes. The flowchart in figure 4.11 displays a simplification of this process.
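
The yearly loss statistics can be illustrated with a small Python sketch that operates on in-memory grids instead of Earth Engine images. It assumes the Hansen Global Forest Change band conventions for ‘treecover2000’ (percentage canopy cover) and ‘lossyear’ (0 for no loss, otherwise the year of loss minus 2000). The canopy cover threshold of 30% for the year-2000 forest layer and the constant pixel area of 0.09 ha (a 30 m × 30 m pixel) are assumptions here; in Earth Engine the area of each pixel is usually taken from ee.Image.pixelArea() because it varies with latitude. The function name and values are hypothetical.

```python
from collections import Counter


def yearly_forest_loss(treecover2000, lossyear, canopy_threshold=30, pixel_area_ha=0.09):
    """Sum forest loss per year (ha) from Hansen-style grids.

    treecover2000: canopy cover (%) in 2000; lossyear: 0 = no loss,
    n = loss in the year 2000 + n. Only pixels that count as forest in 2000
    (canopy cover >= canopy_threshold) contribute."""
    loss = Counter()
    for cover_row, year_row in zip(treecover2000, lossyear):
        for cover, year in zip(cover_row, year_row):
            if cover >= canopy_threshold and year > 0:
                loss[2000 + year] += pixel_area_ha
    return dict(sorted(loss.items()))


# Hypothetical 2 x 3 excerpt (illustrative values, not Boé data)
treecover2000 = [[80, 10, 55], [50, 90, 35]]
lossyear = [[14, 14, 0], [0, 19, 14]]
print(yearly_forest_loss(treecover2000, lossyear))
```

The pixel with 10% canopy cover is ignored even though it has a loss year, because it does not belong to the year-2000 forest reference layer.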

Figure 4.11: Flowchart of the Hansen forest monitoring process

4.7.2 BFAST forest monitoring

The pre-processing was performed in the R software environment with the ‘processLandsatBatch’ and ‘timeStack’ functions of the bfastSpatial package. It involved extracting the scenes and NDMI layers from the data batch, cropping the data to the spatial extent of the Boé, applying the Pixel_qa cloud mask supplied with the data batch and creating a spatio-temporal raster object. After all scenes were pre-processed with ‘processLandsatBatch’, ‘timeStack’ was used to combine the processed rasters into a multi-temporal NDMI raster brick, which was used in the subsequent analyses.
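
The per-pixel logic behind the NDMI and cloud-masking steps can be sketched in Python as follows; in this research, this work was done by processLandsatBatch, and the per-scene results were stacked with timeStack. NDMI is computed as (NIR - SWIR1) / (NIR + SWIR1), which corresponds to bands 5 and 6 for Landsat 8 and bands 4 and 5 for Landsat 5 and 7. The bit positions used for the pixel_qa mask (bit 3 for cloud shadow, bit 5 for cloud) are assumed from the Landsat Collection 1 surface reflectance product and should be checked against the product guide; the function names and values are hypothetical.

```python
CLOUD_SHADOW_BIT = 3  # assumed pixel_qa bit positions (Landsat Collection 1)
CLOUD_BIT = 5


def ndmi(nir, swir1):
    """Normalized Difference Moisture Index: (NIR - SWIR1) / (NIR + SWIR1)."""
    total = nir + swir1
    return (nir - swir1) / total if total else None


def is_cloudy(qa):
    """True if the cloud or cloud-shadow bit is set in a pixel_qa value."""
    return bool((qa >> CLOUD_BIT) & 1 or (qa >> CLOUD_SHADOW_BIT) & 1)


def masked_ndmi(nir, swir1, pixel_qa):
    """Per-pixel NDMI for one scene, with None where pixel_qa flags clouds."""
    return [[None if is_cloudy(q) else ndmi(n, s)
             for n, s, q in zip(nir_row, swir_row, qa_row)]
            for nir_row, swir_row, qa_row in zip(nir, swir1, pixel_qa)]


# Hypothetical surface reflectance values (scaled integers) for three pixels
nir = [[3000, 3000, 3000]]
swir1 = [[1000, 1000, 1000]]
pixel_qa = [[1 << 1, 1 << CLOUD_BIT, 1 << CLOUD_SHADOW_BIT]]  # clear, cloud, shadow
print(masked_ndmi(nir, swir1, pixel_qa))  # -> [[0.5, None, None]]
```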

The subsequent analyses involved running the BFAST monitoring algorithm to detect breakpoints in the Boé area. The ‘bfmSpatial’ function was used to fit a history model to each pixel in the NDMI raster brick and to detect breakpoints per pixel. The history model was set to a harmonic-trend model of order 1. More information about these parameters can be found in the R documentation of the bfastmonitor function in the bfast package. The monitoring period started on the first day of 2009; no end date was specified, so the period ended at the date of the last scene in 2020.
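
For reference, a harmonic-trend model of order 1, as it is commonly written for bfastmonitor, has the form below, where y_t is the NDMI observation at time t, α_1 and α_2 are the intercept and slope of the trend, γ_1 and δ_1 are the amplitude and phase of the harmonic (seasonal) term, f is the frequency of the time series (observations per year) and ε_t is the error term. This notation is given as an illustration and is not taken from the analysis itself; in R it corresponds to a formula such as response ~ trend + harmon with order = 1.

```latex
$$
y_t = \alpha_1 + \alpha_2 t + \gamma_1 \sin\left(\frac{2\pi t}{f} + \delta_1\right) + \varepsilon_t
$$
```

The model is fitted on the stable history period before 2009, and observations in the monitoring period that deviate significantly from its predictions are flagged as breakpoints.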

Running the algorithm resulted in a new raster brick with three layers, which were post-processed to create adequate disturbance maps. The first layer represents the breakpoint timing in decimal-year format, the second layer represents the NDMI change magnitude of the detected breakpoints, and the third layer is an error flag with a value of 1 where an error was encountered in a pixel and NA (no data) where the algorithm was successful. This last layer was disregarded during post-processing as it did not provide any useful information. A magnitude threshold of -0.05 was applied to the first two layers. This threshold was based on the threshold used in the bfastSpatial tutorial and is close to the threshold of -0.065 used by DeVries et al. (2015), who found that this threshold was associated with a high probability of actual deforestation. The threshold replaced all values above -0.05 with NA in the two layers, so only breakpoints with a high probability of actual deforestation remained in the final disturbance maps. In addition, an area sieve with a threshold of 5000 m² was applied to the layers. This threshold corresponds to the minimum forest area of 0.5 ha in the forest definition used by the FAO (2005). The sieve replaced all breakpoint areas smaller than 0.5 ha with NA, which means that only areas large enough to qualify as forest remained in the final disturbance maps.
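
The two post-processing filters can be sketched in Python as below. The sketch assumes 30 m Landsat pixels and eight-way connectivity for the area sieve; with 900 m² per pixel, a patch needs at least six pixels to reach 5000 m² (five pixels cover only 4500 m²). It illustrates the logic only and is not the bfastSpatial implementation; the function name and values are hypothetical.

```python
from collections import deque


def filter_breakpoints(timing, magnitude, mag_threshold=-0.05,
                       min_area_m2=5000.0, pixel_size_m=30.0):
    """Keep breakpoints whose NDMI change magnitude is <= mag_threshold and
    that form 8-connected patches of at least min_area_m2; others become None."""
    rows, cols = len(timing), len(timing[0])
    keep = [[timing[r][c] is not None and magnitude[r][c] is not None
             and magnitude[r][c] <= mag_threshold
             for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    out_timing = [[None] * cols for _ in range(rows)]
    out_magnitude = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not keep[r][c] or seen[r][c]:
                continue
            patch, queue = [], deque([(r, c)])  # collect one 8-connected patch
            seen[r][c] = True
            while queue:
                pr, pc = queue.popleft()
                patch.append((pr, pc))
                for nr in (pr - 1, pr, pr + 1):
                    for nc in (pc - 1, pc, pc + 1):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and keep[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            # area sieve: drop patches smaller than the minimum area
            if len(patch) * pixel_size_m ** 2 >= min_area_m2:
                for pr, pc in patch:
                    out_timing[pr][pc] = timing[pr][pc]
                    out_magnitude[pr][pc] = magnitude[pr][pc]
    return out_timing, out_magnitude


# Hypothetical 3 x 5 excerpt: a six-pixel disturbance, a single-pixel
# disturbance and weak (above-threshold) changes elsewhere
magnitude = [[-0.10, -0.10, -0.10, -0.01, -0.01],
             [-0.10, -0.10, -0.10, -0.01, -0.01],
             [-0.01, -0.01, -0.01, -0.01, -0.20]]
timing = [[2016.3] * 5 for _ in range(3)]
out_timing, out_magnitude = filter_breakpoints(timing, magnitude)
print(sum(v is not None for row in out_magnitude for v in row))  # -> 6
```

The strong but isolated change in the bottom-right pixel passes the magnitude threshold but is removed by the area sieve, because 900 m² is less than 0.5 ha.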

The resulting final disturbance maps represent the areas with a high probability of being an actual forest disturbance in a forest area. The disturbance year map depicts the years in which the remaining disturbances occurred and the magnitude map depicts the NDMI change magnitude of the remaining disturbances. A simplification of the monitoring process is displayed in figure 4.12.

Figure 4.12: Flowchart of the BFAST forest monitoring process

5. Results

This chapter presents the major findings that emerged from the different analyses. Like the other chapters, it is structured by research question. It starts with the results of the sampling process, then presents the results of the classification of the 2020 image and the final classified output for 2020. Subsequently, it addresses the accuracy of the classified output, after which the next section presents the final classified output for 2016 and highlights the observed changes in land cover. The chapter concludes with the most striking observations from the analysis of the forest monitoring systems.

5.1 Field work Sampling (RQ1)

This section addresses the results of the fieldwork campaign. Furthermore, it highlights the establishment of the final sample collection and provides the results of the spectral assessment of this collection. The section, however, starts by introducing the sampling design and the results of the unsupervised classification. The test classifications that were conducted to establish the sample collection are also provided.

5.1.1 Sampling frame and design

As mentioned in the methodology chapter, the desired sampling design for this research was based on stratified sampling, because the fragmented landscape of the Boé consists of many different land cover types (strata), which are all important for the classification. Hence, an unsupervised classification was conducted on the Sentinel-2 image of 2019 prior to the fieldwork campaign in order to identify the different strata to be sampled. This unsupervised classification resulted in the classified image presented in figure 5.1. After the classification, land cover types were assigned to the identified strata based on my own assumptions, and the image was subsequently assessed by experts from the Chimbo Foundation who possess extensive knowledge of the landscape of the Boé.

Figure 5.1: Resulting unsupervised classification

This assessment resulted in the addition of two more classes to the list of land cover types: fallow land (<3 years) and cashew plantation (<5 years). The resulting 14 classes were implemented as the primary sample units in the sampling design. The implemented sampling design was a random stratified sampling design because it was believed to be the most efficient sampling strategy for this area. The premise of this design was that all 14 identified strata were randomly sampled throughout the entire Boé area, except for the areas with villages that do not participate in Chimbo’s CVV program.

5.1.2 Fieldwork campaign

During the first few fieldwork trips, the above-mentioned sampling design was applied. This meant that the randomly generated plots within the Boé were visited by bike and were located and mapped with the Garmin GPS device. However, this sampling strategy proved to be very time-consuming because many of the plots were difficult to find and/or access. Furthermore, the local guides of the Chimbo team disliked this method because it undermined the local custom, mentioned in section 4.2.3, whereby one has to introduce oneself to the jarga (village chief) of a local village and ask for permission to map sites in the area. Hence, this strategy was quickly abandoned, and we decided that the different land cover types would first be introduced to the jarga and the local member of the CVV of a village during the introduction ceremony, after which the local CVV member could accompany us to the accessible land cover types in the area. As a result, the initial random stratified probability sampling design was replaced by a more subjective stratified sampling design.

Some more issues emerged during the fieldwork campaign. The first issue was that the initial goal of 50 samples for each land cover type, set before the fieldwork campaign, was practically unattainable, because it would have meant taking approximately 9 samples on each day of the campaign with no rest days. When this goal was set, the primary sampling costs, such as traveling from one sample plot to the other and from village to village, measuring the samples, and local customs like the introduction ceremony with the jarga, were not foreseen and therefore not thoroughly considered. Finally, the idea of taking panorama pictures for validation purposes was abandoned, as the process was time-consuming and the photos were not very useful for validating the classified output. This is because the panorama photos often overestimated the amount of aboveground vegetation mass, and when they were taken in densely vegetated land cover types like gallery forests, they only recorded the nearby trees and vegetation but could not provide much contextual information beyond the vegetated area.

One finding was that the land cover classes ‘built-up area (villages)’ and ‘water bodies’ were clearly distinguishable in the Sentinel-2 image and therefore did not have to be registered on the ground with the GPS device during the field trips. Instead, they were registered on the computer after the fieldwork campaign, when the samples were drawn into the ArcGIS Pro project with the training sample manager.

A second finding was that the dry forest class could be further divided into two new classes: primary and secondary dry forest. This division led to the 15 land cover classes that are described in the section below.

5.1.3 Collected samples

The fieldwork campaign resulted in the first collection of samples shown in table 5.1(a); the spatial distribution of the samples is depicted in figure 5.2. As can be seen in this figure, the samples are evenly distributed throughout all the areas of the Boé that could be visited. However, the table shows that the sample sizes are not balanced, meaning that the samples are unequally distributed over the different land cover classes.

5.1.4 Addition of new samples

After the campaign, samples of the land cover classes ‘built-up area (villages)’ and ‘water bodies’ were identified in the Sentinel-2 image and the ESRI high-resolution image and added to the sample collection. Additionally, the gallery and primary dry forest classes were supplemented with sacred forest samples collected in 2019 by members of the Chimbo team. The sacred forest database of the Chimbo team uses the same forest categories, namely gallery, primary and secondary forest, so these sacred forest samples could be assigned to the three forest categories in this research based on their assignment in the database. All these additions resulted in the collection shown in table 5.1(b), which will from now on be referred to as the ‘basic sample collection’. Note that the total sampled area of each land cover type is relatively small, ranging from 3,08 ha to 47,93 ha.

Figure 5.2: Spatial distributions of the final training and validation samples

Table 5.1: Sample collection after fieldwork campaign (a) and the ‘Basic sample collection’ with additional samples (b)

(a) Sample collection after the fieldwork campaign

| Land cover type | Training samples (80%) | Validation samples (20%) |
|---|---|---|
| Cashew plantation (3-5 years) | 12 | 4 |
| Cashew plantation (> 5 years) | 37 | 10 |
| Fallow land (1-3 years) | 8 | 2 |
| Fallow land (> 3 years) | 19 | 5 |
| Gallery Forest | 20 | 6 |
| Primary Dry Forest | 16 | 5 |
| Secondary Dry Forest | 45 | 12 |
| Other agriculture | 12 | 3 |
| Peanut field | 28 | 7 |
| Rice field | 37 | 10 |
| Savannah | 38 | 10 |
| Wetland | 30 | 8 |
| Woodland | 39 | 10 |

(b) ‘Basic sample collection’

| Land cover type | Training samples (80%) | Validation samples (20%) | Total area of training samples (ha) |
|---|---|---|---|
| Built-up Area (Villages) | 40 | 10 | 45,64 |
| Cashew plantation (3-5 years) | 12 | 4 | 5,05 |
| Cashew plantation (> 5 years) | 37 | 10 | 18,86 |
| Fallow land (1-3 years) | 8 | 2 | 5,78 |
| Fallow land (> 3 years) | 19 | 5 | 5,07 |
| Gallery Forest | 33 | 8 | 32,32 |
| Primary Dry Forest | 24 | 6 | 21,37 |
| Secondary Dry Forest | 54 | 13 | 37,19 |
| Other agriculture | 12 | 3 | 3,08 |
| Peanut field | 28 | 7 | 10,84 |
| Rice field | 37 | 10 | 25,19 |
| Savannah | 38 | 10 | 47,93 |
| Water Bodies | 30 | 8 | 27,47 |
| Wetland | 30 | 8 | 23,32 |
| Woodland | 39 | 10 | 38,07 |

Table 5.2: Differences between sample collections (a) and (b)

| Name of sample collection | Difference with previous collection(s) | Nr. of classes |
|---|---|---|
| Sample collection after fieldwork campaign | N/A | 13 |
| ‘Basic sample collection’ | Added samples of the Built-up Area (Villages) and Water Bodies classes and added sacred forest samples | 15 |

5.2 Establishment of sample collections (RQ2)

5.2.1 Initial spectral assessment and test classification

Although the basic sample collection is still very unbalanced, it was nevertheless useful to assess the spectral signatures of the different land cover classes across the Sentinel-2 bands in order to see whether the classes were spectrally distinguishable from each other or very similar. This spectral assessment resulted in the spectral profile graph presented in appendix B. Furthermore, the spectral signatures of all land cover classes were assessed across six vegetation indices; these signatures are also presented in appendix B.

This initial spectral assessment shows that the cashew plantation (big/old) (>5 years) class has the highest reflectance of all land cover classes in Sentinel-2 bands 6, 7, 8 and 8A, which means that in these bands the cashew plantation samples are the most distinguishable from the gallery and primary dry forest classes. In the other bands, however, the reflectance values of the cashew plantation (big/old) (>5 years) class are very similar to those of these forest classes. Moreover, the reflectance curve of the cashew plantation (small/young) (<5 years) class is very similar to the curves of the gallery and primary dry forest classes in almost all bands except bands 11 and 12. Additionally, the spectral curves of the three agricultural classes are very close to each other. The same applies to the curves of the woodland and savannah classes, and the curves of the two fallow land classes are very similar to each other as well.

Across the six vegetation indices, these last-mentioned classes show similar spectral behaviour. The agricultural classes in particular are very similar to each other, and the woodland and savannah classes do not differ much either. Unfortunately, the values of the cashew plantation (big/old) (>5 years) class across the vegetation indices are very similar to those of the gallery forest class.

After the spectral profiles and signatures of all land cover classes were assessed, an initial test classification was performed with the Maximum Likelihood classifier. The resulting classified image is included in appendix C. This image shows that the classification is not yet accurate, which was expected given the many land cover classes with similar spectral profiles and the unbalanced sample collection. The most striking observations in this image are the over-classification of the cashew plantation (>5 years) class (bright green) and of the woodland class (orange).
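To make the behaviour of the Maximum Likelihood classifier more concrete, the sketch below shows its textbook Gaussian formulation: each class is described by the mean vector and covariance matrix of its training pixels, and every pixel is assigned to the class with the highest log-likelihood. It is a minimal sketch assuming multivariate normal classes and equal prior probabilities; it is not the implementation of the classification software used in this study, and the two classes and their training pixels are synthetic, hypothetical values.

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """Estimate the mean vector and covariance matrix of every class from training pixels."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        params[label] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def classify_max_likelihood(X, params):
    """Assign each pixel (row of X) to the class with the highest Gaussian log-likelihood.

    Equal class priors are assumed, and the constant term shared by all classes is dropped.
    """
    labels = list(params)
    scores = np.empty((X.shape[0], len(labels)))
    for j, label in enumerate(labels):
        mean, cov = params[label]
        inv_cov = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        diff = X - mean
        # Squared Mahalanobis distance of every pixel to the class mean
        mahalanobis_sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        scores[:, j] = -0.5 * (logdet + mahalanobis_sq)
    return np.array(labels)[np.argmax(scores, axis=1)]

# Synthetic three-band training pixels for two hypothetical classes
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.10, 0.02, (50, 3)), rng.normal(0.30, 0.02, (50, 3))])
y_train = np.array(["gallery"] * 50 + ["cashew"] * 50)

params = fit_gaussian_classes(X_train, y_train)
pred = classify_max_likelihood(np.array([[0.10, 0.10, 0.10], [0.30, 0.30, 0.30]]), params)
```

Because every class needs a well-estimated covariance matrix, classes with few training samples or with strongly overlapping means and covariances (such as the cashew plantation and forest classes) are difficult for this classifier to separate.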

5.2.2 Merging of land cover classes

Because of the unbalanced sample collection, the similar reflectance curves and the inaccurate test classification, experts of the Chimbo Foundation and I decided to merge some of the classes with similar spectral profiles. This resulted in two new sample collections, which are presented in table 5.3(a) and (b). The first collection is based on my own suggestion. In this collection, the agricultural classes are merged into a single new class, and the same is done for the cashew plantation classes, the fallow land classes, and the gallery and primary dry forest classes. From now on, this collection of samples will be referred to as ‘Researcher samples’.

The second collection is based on the experts' suggestion. In this collection, the agricultural classes are merged with the two fallow land classes and the young cashew plantation class into a new agriculture class. Furthermore, the gallery and primary dry forest classes are merged into a new forest class that keeps the name ‘gallery forest’. From now on, this collection of samples will be referred to as ‘Expert samples’. The sample sizes of these two collections are more balanced, although the agriculture class in the second collection is considerably larger than the other classes.
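Both merging schemes can be expressed as look-up tables that relabel every sample of the basic collection. The sketch below assumes the 15 classes of the basic sample collection as described in this chapter; the names of the three agricultural and the two fallow land classes are hypothetical placeholders, because their exact names are not given in this section.

```python
# Classes of the basic sample collection; "Agriculture A/B/C" and "Fallow land A/B"
# are hypothetical placeholder names
BASIC_CLASSES = [
    "Agriculture A", "Agriculture B", "Agriculture C",
    "Fallow land A", "Fallow land B",
    "Cashew plantation (>5 years)", "Cashew plantation (<5 years)",
    "Gallery forest", "Primary dry forest", "Secondary dry forest",
    "Savannah", "Woodland", "Wetland", "Water bodies", "Built-up area (villages)",
]

def merge_map(groups):
    """Build a class -> merged class lookup; classes outside every group keep their name."""
    mapping = {name: name for name in BASIC_CLASSES}
    for new_name, members in groups.items():
        for member in members:
            mapping[member] = new_name
    return mapping

RESEARCHER = merge_map({
    "Agriculture": ["Agriculture A", "Agriculture B", "Agriculture C"],
    "Fallow land": ["Fallow land A", "Fallow land B"],
    "Cashew plantation": ["Cashew plantation (>5 years)", "Cashew plantation (<5 years)"],
    "Gallery forest": ["Gallery forest", "Primary dry forest"],
})

EXPERT = merge_map({
    "Agriculture": ["Agriculture A", "Agriculture B", "Agriculture C",
                    "Fallow land A", "Fallow land B", "Cashew plantation (<5 years)"],
    "Gallery forest": ["Gallery forest", "Primary dry forest"],
})

def relabel(sample_labels, mapping):
    """Translate the class label of every sample with one of the merge tables."""
    return [mapping[label] for label in sample_labels]
```

Under these assumptions the researcher scheme yields 10 classes and the expert scheme 9, matching table 5.4.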


Table 5.3: ‘Researcher samples’ (a) and ‘Expert samples’ (b)

(a) ‘Researcher samples’
Land cover type | Amount of training samples (80%)
Cashew plantation | 49
Fallow land | 27
Gallery Forest | 57
Secondary Dry Forest | 54
Agriculture | 88
Savannah | 38
Wetland | 30
Woodland | 39
Water Bodies | 30
Built-up Area (Villages) | 40

(b) ‘Expert samples’
Land cover type | Amount of training samples (80%)
Cashew plantation (>5 years) | 37
Gallery Forest | 57
Secondary Dry Forest | 54
Agriculture | 116
Savannah | 38
Wetland | 30
Woodland | 39
Water Bodies | 30
Built-up Area (Villages) | 40

Table 5.4: Differences between ‘researcher’ and ‘expert’ sample collections

Name of sample collection | Difference with previous collection(s) | Nr. of classes
‘Researcher samples’ | Merge of the agricultural classes, the cashew plantation classes, the fallow land classes, and the gallery and primary dry forest classes | 10
‘Expert samples’ | Merge of the agricultural classes with the fallow land classes and the young cashew plantation class; the gallery and primary dry forest classes are merged as well | 9

5.2.3 Sensitivity analysis of Maximum Likelihood classification

These new sample collections, as well as the first (basic) sample collection, were used in several tests with the Maximum Likelihood classifier. The purpose of these tests was to assess which combinations of bands and samples provide the most representative classification results. All test classifications are summarized in tables 5.5 to 5.9, and the resulting classified images are included in appendix D.

Table 5.5: Summary of the first three test classifications

Test | Bands | Sample collection | Difference with previous test(s)
1 | All 10 Sentinel-2 MSI bands; all vegetation indices; no Sentinel-1 SAR bands; no ancillary band | ‘Basic sample collection’ | N/A
2 | All 10 Sentinel-2 MSI bands; all vegetation indices; all Sentinel-1 SAR bands; ancillary band | ‘Basic sample collection’ | All of the available bands included in the test
3 | All 10 Sentinel-2 MSI bands; all vegetation indices; all Sentinel-1 SAR bands; no ancillary band | ‘Basic sample collection’ | No ancillary band included in the test

One result of the first three tests is that including the ancillary slope band seems to slightly improve the classification of gallery forests. However, the most striking observations are the over-representation of the cashew plantation (>5 years) class and of the built-up area (villages) class. Almost all forests are classified as cashew plantations, which does not match what was observed during the fieldwork campaign, and in the eastern part of the Boé, large patches are classified as built-up areas that cannot be found in the high-resolution ESRI image. On the Sentinel-2 image, these areas appear to be barren land with very high reflectance values, shown in a bright white colour. This reflectance is similar to that of the corrugated iron roofs of some houses in the villages. Therefore, it was decided to add a barren land class to the sample collection to see whether such a class corrects for the over-representation of the built-up areas. The samples of this class were collected in the same manner as the samples of the water bodies and built-up classes, with the help of the high-resolution ESRI image and the Sentinel-2 image.

Table 5.6: Summary of test classifications 4 and 5

Test | Bands | Sample collection | Difference with previous test(s)
4 | All 10 Sentinel-2 MSI bands; all vegetation indices; all Sentinel-1 SAR bands; no ancillary band | ‘Expert samples’; barren land type added to the expert types, making 10 types in total | Different samples of land cover types included in the test
5 | All 10 Sentinel-2 MSI bands; all vegetation indices; all Sentinel-1 SAR bands; no ancillary band | ‘Researcher samples’; barren land type added and ‘slimmed down’ samples of built-up (villages) type added to correct for the over-classification of the built-up (villages) type | Different samples of land cover types included in the test

In the fourth test, the samples of the new barren land class were added to the ‘expert samples’, which only slightly reduced the over-representation of the built-up areas. Therefore, it was decided to test whether shrunken-down samples of the built-up area class could help solve this issue. In these new samples, only pixels that appear to represent houses with corrugated iron roofs were included. In the fifth test, these new built-up area samples, together with the barren land samples, were added to the ‘researcher samples’. Surprisingly, the over-classification is still observed in the resulting classification, but it was nevertheless decided to keep the newly added samples.

Table 5.7: Summary of test classifications 6 to 9

Test | Bands | Sample collection | Difference with previous test(s)
6 | The 4 most optimal bands for differentiation of cashew plantations and gallery forests, based on the results of the JM-distance calculation | ‘Expert samples’ with added barren land type and ‘slimmed down’ samples of built-up (villages) type | New selection of bands included in the test and ‘slimmed down’ samples of built-up (villages) type added to the types of test 4
7 | The 4 most optimal bands for differentiation of cashew plantations and gallery forests, based on the results of the JM-distance calculation | ‘Researcher samples’ with added barren land type and ‘slimmed down’ samples of built-up (villages) type | Different samples of land cover types included in the test
8 | 8 optimal bands for differentiation of cashew plantations and gallery forests, including the 4 most optimal bands and the 4 ensuing optimal bands; ancillary band | ‘Researcher samples’ with added barren land type and ‘slimmed down’ samples of built-up (villages) type | 4 more ‘optimal’ bands and ancillary band added
9 | 8 optimal bands for differentiation of cashew plantations and gallery forests, including the 4 most optimal bands and the 4 ensuing optimal bands; ancillary band | ‘Expert samples’ with added barren land type and ‘slimmed down’ samples of built-up (villages) type | Different samples of land cover types included in the test

In tests 6 to 9, new band combinations were tested and the sample collections were alternated. For test 6, the shrunken-down built-up samples were also added to the ‘expert samples’. In these tests, the bands that are most suitable for differentiating cashew plantations from gallery forests, based on the JM-distances discussed in section 5.2.5, are used. In tests 6 and 7, only the four most optimal bands are used. In tests 8 and 9, the next four most optimal bands, together with the slope band, are added to the first four. The most striking results of these tests are the over-representation of the wetland, fallow land and secondary dry forest classes and the large number of isolated individual pixels scattered across the classified image, which is often described as the salt-and-pepper effect. This effect occurs much more in these tests than in the previous ones. The resulting images of tests 8 and 9 appear to be more accurate than those of tests 6 and 7. Therefore, it was decided not to restrict the final classification to the bands used in tests 6 and 7. Unfortunately, the over-representation of the built-up area class in the eastern part of the Boé is still observed in tests 8 and 9 as well.

Table 5.8: Summary of test classifications 10 to 12

Test | Bands | Sample collection | Difference with previous test(s)
10 | All 10 Sentinel-2 MSI bands; all vegetation indices; all Sentinel-1 SAR bands; ancillary band | ‘Expert samples’ with added barren land type, ‘slimmed down’ samples of built-up (villages) type and buffers of 6 m around the samples to correct for the inaccuracy of the Garmin GPS device | Different selection of bands and samples with added buffers around them included in the test
11 | All 10 Sentinel-2 MSI bands; all vegetation indices; all Sentinel-1 SAR bands; ancillary band | ‘Researcher samples’ with added barren land type, ‘slimmed down’ samples of built-up (villages) type and buffers of 6 m around the samples to correct for the inaccuracy of the Garmin GPS device | Different samples with added buffers around them included in the test
12 | All 10 Sentinel-2 MSI bands; all vegetation indices; all Sentinel-1 SAR bands; no ancillary band | ‘Expert samples’ with added barren land type, ‘slimmed down’ samples of built-up (villages) type and buffers of 6 m around the samples to correct for the inaccuracy of the Garmin GPS device | Different samples and ancillary band excluded in the test

As mentioned in the review chapter, the Garmin device that the Chimbo team uses for field surveys has an inaccuracy of approximately 6 m. Therefore, tests 10, 11 and 12 were performed, in which 6 m buffers were created inside the samples of both sample collections. These buffers, however, strongly affect the classification results. In tests 10 and 12, almost the entire Boé is classified as cashew plantation (>5 years), and in test 11, almost all non-forested areas are classified as barren land.
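A rough calculation illustrates how much of a sample an inward 6 m buffer removes relative to the 10 m × 10 m pixels of the finest Sentinel-2 bands (bands 5, 6, 7, 8A, 11 and 12 have 20 m pixels, which makes the effect larger). The sketch assumes square sample polygons with hypothetical side lengths of 20, 40 and 60 m; in a GIS, the same operation corresponds to buffering the polygons with a negative distance of 6 m. It only computes the remaining area and is one plausible reading of the results, not an analysis of the actual samples.

```python
def inward_buffer_area(width_m, height_m, buffer_m):
    """Area (m²) left after shrinking an axis-aligned rectangle by buffer_m on every side."""
    return max(width_m - 2 * buffer_m, 0) * max(height_m - 2 * buffer_m, 0)

PIXEL_AREA_M2 = 10 * 10  # one 10 m Sentinel-2 pixel
GPS_ERROR_M = 6          # inward buffer used in tests 10 to 12

for side in (20, 40, 60):  # hypothetical square sample sizes in metres
    before = side * side
    after = inward_buffer_area(side, side, GPS_ERROR_M)
    print(f"{side} m plot: {before / PIXEL_AREA_M2:.0f} -> {after / PIXEL_AREA_M2:.2f} pixels "
          f"({after / before:.0%} of the area kept)")
```

Small plots lose most of their area (a 20 m × 20 m plot keeps only 16%), which leaves very few pixels per sample to train on.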

Table 5.9: Summary of the last two test classifications

Test | Bands | Sample collection | Difference with previous test(s)
13 | All 10 Sentinel-2 MSI bands; all vegetation indices; all Sentinel-1 SAR bands; no ancillary band | ‘Expert samples’ with 5 more added samples of the barren land type to correct for the over-representation of the built-up (villages) type | Different samples included in the test
14 | All 10 Sentinel-2 MSI bands; all vegetation indices; all Sentinel-1 SAR bands; no ancillary band | ‘Expert samples’ with the aforementioned added samples of the barren land type but without samples of the built-up area class | Different samples included in the test


Test 13 is similar to the fifth test, but here 5 new samples are added to the barren land class to see whether the over-representation of built-up areas can be corrected further. Unfortunately, this does not completely mitigate the issue. Hence, in test 14, the built-up area (villages) class was excluded. This exclusion does solve the over-representation, but, as expected, it also removes the built-up area class from the classification entirely, so this adjustment is not desirable.

To conclude this sensitivity analysis: the ‘expert samples’ collection provides the best results, and the addition of the barren land and shrunken-down built-up area samples partly corrects for the over-representation of the built-up area class. Furthermore, the tests show that the combination of all bands and indices provides better results than some of the alternative combinations. The inclusion of buffers inside samples, however, proves not to be useful in the classification process. Therefore, the final classification uses the ‘expert samples’ with the added barren land and shrunken-down built-up samples, without buffers, and a combination of all bands and indices. The section below further addresses this final sample collection.

5.2.4 Final sample collection

The final sample collection, which resulted from the aforementioned tests and is used for the classification of the 2020 land cover map, is presented in table 5.10. The 80-20% division between training and validation samples has been maintained.

Table 5.10: Final training sample collection

Training samples
Classname | Amount of training samples | Area (hectares)
Woodland | 39 | 45,90
Cashew Plantation (>5 years) | 37 | 22,74
Wetland | 30 | 28,26
Water Bodies | 30 | 33,28
Gallery Forest | 57 | 64,40
Secondary Dry Forest | 54 | 44,84
Savannah | 38 | 57,79
Agriculture | 116 | 66,36
Barren Land | 45 | 36,17
Built-up Area (Villages) | 40 | 29,32

Validation samples
Classname | Amount of validation samples | Area (hectares)
Savannah | 10 | 15,23
Secondary Dry Forest | 13 | 7,20
Gallery Forest | 13 | 14,34
Agriculture | 31 | 10,46
Wetland | 8 | 3,09
Woodland | 10 | 8,57
Water Bodies | 8 | 8,76
Barren Land | 11 | 5,06
Built-up Area (Villages) | 10 | 1,99
Cashew Plantation (>5 years) | 10 | 3,72

Table 5.11: Differences between sample collections

Name of sample collection | Difference with previous collection(s) | Nr. of classes
‘Final training sample collection’ | ‘Expert samples’ with additional barren land type and ‘slimmed down’ samples of built-up (villages) type | 10
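The 80-20% division can be checked directly against the sample counts in table 5.10. The short calculation below uses only those counts and shows that the training share lies between roughly 79% and 81% for every class, so the split is close to, but not exactly, 80-20% per class.

```python
# (training, validation) sample counts per class, copied from table 5.10
COUNTS = {
    "Woodland": (39, 10),
    "Cashew Plantation (>5 years)": (37, 10),
    "Gallery Forest": (57, 13),
    "Secondary Dry Forest": (54, 13),
    "Agriculture": (116, 31),
    "Savannah": (38, 10),
    "Wetland": (30, 8),
    "Water Bodies": (30, 8),
    "Barren Land": (45, 11),
    "Built-up Area (Villages)": (40, 10),
}

def training_share(train, validation):
    """Fraction of all samples of a class that is used for training."""
    return train / (train + validation)

shares = {name: training_share(*counts) for name, counts in COUNTS.items()}
for name, share in shares.items():
    print(f"{name:<30} {share:.1%}")
```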

5.2.5 Second spectral assessment

The spectral characteristics of the classes within the final sample collection are assessed to see whether the adjustments influence the separability of the classes. The same procedure is followed as in the first spectral assessment, except that the JM-distance is now also calculated for several pairs of land cover types. The graph in figure 5.3 shows the reflectance curves of the classes across the 10 Sentinel-2 bands used in this research, with the bands represented by their central wavelengths. This graph is best interpreted by looking at how far the reflectance curves of the different land cover classes are separated: the further apart the curves are in a band, the better that band can be used to differentiate the corresponding land cover classes during the classification process.

The most notable difference between this spectral profile and the one included in appendix B is, of course, the addition of the merged classes and the new barren land class. The barren land class has the highest reflectance in the first four bands and in the last two bands. In the last two bands in particular, it differs considerably from the built-up area class, which may be important for the final classification. Furthermore, all classes appear slightly more separable than in the first spectral profile. Nonetheless, the woodland, savannah and wetland classes still show very similar spectral behaviour across the bands, and so do the agriculture and built-up area classes. The reflectance curve of the gallery forest class differs relatively clearly from that of the secondary dry forest class in bands 3, 11 and 12, and the cashew plantation (>5 years) class once again has the highest reflectance in bands 6, 7, 8 and 8A, where it differs relatively clearly from the forest classes. The water bodies class has a unique reflectance curve and is therefore very different from all other classes.
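The reflectance curves of figure 5.3 are, in essence, the mean reflectance of each class's training pixels per band, and their interpretation rests on the gap between two curves in each band. A minimal sketch of both steps is shown below; the three-band reflectance values and the class names are hypothetical, and this is not necessarily how the graph in figure 5.3 was produced.

```python
import numpy as np

def spectral_profile(reflectance, labels):
    """Mean reflectance per class; reflectance has shape (n_pixels, n_bands)."""
    labels = np.asarray(labels)
    return {c: reflectance[labels == c].mean(axis=0) for c in np.unique(labels)}

def band_gap(profile, class_a, class_b):
    """Absolute difference between two mean reflectance curves in every band."""
    return np.abs(profile[class_a] - profile[class_b])

# Hypothetical surface reflectance of four training pixels in three bands
reflectance = np.array([
    [0.03, 0.06, 0.30],
    [0.05, 0.06, 0.34],
    [0.04, 0.07, 0.24],
    [0.04, 0.05, 0.26],
])
labels = ["cashew", "cashew", "gallery", "gallery"]

profile = spectral_profile(reflectance, labels)
gap = band_gap(profile, "cashew", "gallery")
most_separating_band = int(np.argmax(gap))
```

Because this gap ignores the spread of pixel values within each class, it is only a first indication; the JM-distance in tables 5.12 and 5.13 also takes the class variances into account.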

Figure 5.3: Spectral profile of the final training samples

Additionally, the spectral signatures of the classes across the indices and the Sentinel-1 radar bands were assessed. These signatures are shown in the graphs in figures 5.4 and 5.5. As in the first spectral assessment, the signatures of the cashew plantation and gallery forest classes do not differ much from each other across the indices. They do, however, differ from the other classes; in general, these two classes have the highest values across the indices.

The signatures of the secondary dry forest and agriculture classes differ from those of the other classes across most of the indices, although they are relatively similar to those of the built-up area and wetland classes in the NDMI, NBR and ARVI indices. Generally, they have lower values than the gallery forest and cashew plantation classes and higher values than the woodland, savannah and barren land classes.

The woodland, savannah and barren land classes do not differ much from each other, and in general these classes have some of the lowest values. The built-up area class has higher values than these three classes across all indices.

Finally, the built-up area and wetland classes have very similar signatures and, as in the first spectral assessment, the water bodies class has the most unique and distinct spectral signature.
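For reference, the six indices can be computed from Sentinel-2 surface reflectance as sketched below. The formulas are the commonly published ones, and the band assignment is an assumption: NIR = B8, red = B4, green = B3, blue = B2, SWIR1 = B11 and SWIR2 = B12, with a soil factor L = 0.5 for SAVI and γ = 1 for ARVI. The exact definitions used in this study are those given in the methodology chapter and may differ (for example, B8A instead of B8 as the NIR band).

```python
def ndvi(nir, red):
    return (nir - red) / (nir + red)

def ndmi(nir, swir1):
    return (nir - swir1) / (nir + swir1)

def nbr(nir, swir2):
    return (nir - swir2) / (nir + swir2)

def cig(nir, green):
    # Green chlorophyll index
    return nir / green - 1.0

def savi(nir, red, soil_factor=0.5):
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)

def arvi(nir, red, blue, gamma=1.0):
    # Red reflectance corrected for atmospheric effects with the blue band
    rb = red - gamma * (blue - red)
    return (nir - rb) / (nir + rb)

def sentinel2_indices(b2, b3, b4, b8, b11, b12):
    """All six indices for scalars or NumPy arrays of reflectance (assumed band mapping)."""
    return {
        "NDVI": ndvi(b8, b4),
        "NDMI": ndmi(b8, b11),
        "NBR": nbr(b8, b12),
        "CIg": cig(b8, b3),
        "SAVI": savi(b8, b4),
        "ARVI": arvi(b8, b4, b2),
    }

# Hypothetical reflectance of one vegetated pixel
values = sentinel2_indices(b2=0.05, b3=0.08, b4=0.10, b8=0.30, b11=0.15, b12=0.10)
```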


Figure 5.4: Spectral signatures of the final sample collection across the six different (vegetation) indices


Figure 5.5: Spectral signatures of the final sample collection across the Radar bands


The signatures in the Sentinel-1 radar band graphs show a pattern relatively similar to that of the index graphs. The forest classes have some of the highest values, and the woodland, savannah and barren land classes some of the lowest. The most striking results in these graphs are the high values of the built-up area class and the secondary dry forest class; the latter's signature is very similar to those of the cashew plantation and gallery forest classes across these bands. The signature of the wetland class is very similar to those of the woodland, savannah and barren land classes.

Table 5.12: Jeffries-Matusita distance for Sentinel-2 bands

Band | Cashew - Gallery Forest | Cashew - Dry Forest | Gallery Forest - Dry Forest | Dry Forest - Woodland | Dry Forest - Agriculture
Band 2 | 0.057944687 | 0.780677265 | 0.685615921 | 0.867767764 | 0.446454403
Band 3 | 0.336428858 | 0.035242672 | 0.520903209 | 0.735083488 | 0.648915734
Band 4 | 0.026366027 | 0.850892335 | 0.82345116 | 0.779238734 | 0.382554846
Band 5 | 0.277131841 | 0.121717106 | 0.694066374 | 0.245690088 | 0.483177491
Band 6 | 0.6375758 | 1.428609106 | 0.425039399 | 0.314630588 | 0.431925264
Band 7 | 0.471053875 | 1.438751145 | 0.591413106 | 0.553948719 | 0.38145242
Band 8 | 0.692512921 | 1.461191498 | 0.384764354 | 0.847809105 | 0.242821788
Band 8A | 0.546811427 | 1.388549421 | 0.497928089 | 0.720019775 | 0.37461137
Band 11 | 0.007248802 | 0.769266834 | 0.859022223 | 0.084230379 | 0.327747675
Band 12 | 0.006194868 | 1.058789615 | 0.942037484 | 0.55494746 | 0.231565862

Table 5.13: Jeffries-Matusita distance for indices and Sentinel-1 bands

Index / band | Cashew - Gallery Forest | Cashew - Dry Forest | Gallery Forest - Dry Forest | Dry Forest - Woodland | Dry Forest - Agriculture
NDVI | 0.142224576 | 1.357979475 | 0.908977754 | 1.316170869 | 0.055688949
NDMI | 0.213779713 | 1.5854379 | 0.922796058 | 1.126897107 | 0.004543701
NBR | 0.159225804 | 1.53122649 | 0.974981605 | 1.056094531 | 0.001661126
CIg | 0.142326507 | 1.277781954 | 0.791877424 | 1.384168114 | 0.068394234
SAVI | 0.149243288 | 1.372745293 | 0.918694587 | 1.31576631 | 0.055231729
ARVI | 0.108702601 | 1.319675683 | 0.909275502 | 1.324286725 | 0.079419548
VV | 0.251344287 | 0.020394842 | 0.185566188 | 1.189887263 | 0.737934818
VH | 0.016335107 | 0.154908071 | 0.241249304 | 0.823556273 | 1.377021789

Finally, the Jeffries-Matusita (JM) distance is calculated for several combinations of classes with similar spectral signatures. The JM-distance values indicate how well each band can be used to differentiate two classes from each other and are provided in tables 5.12 and 5.13. As mentioned in section 4.3.3, a JM-distance of 0 indicates that two classes have identical spectral characteristics and a value of 2 indicates complete separability. The highest values for each land cover combination are highlighted in green and the lowest values in red.
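The JM-distance can be computed from the training pixels of two classes under the usual assumption that each class follows a (multivariate) normal distribution. In that formulation, the Bhattacharyya distance B combines the difference between the class means with the difference between the class covariance matrices, and JM = 2(1 - exp(-B)), which gives the 0 to 2 range described above. The sketch below is a minimal implementation of this formulation; it is not necessarily the tool used to produce tables 5.12 and 5.13, and the single-band samples are synthetic.

```python
import numpy as np

def bhattacharyya(mean_a, cov_a, mean_b, cov_b):
    """Bhattacharyya distance between two multivariate normal distributions."""
    cov_a, cov_b = np.atleast_2d(cov_a), np.atleast_2d(cov_b)
    cov = (cov_a + cov_b) / 2.0
    diff = np.atleast_1d(mean_a - mean_b)
    # Term driven by the difference between the class means
    mean_term = diff @ np.linalg.solve(cov, diff) / 8.0
    # Term driven by the difference between the class covariances
    _, logdet = np.linalg.slogdet(cov)
    _, logdet_a = np.linalg.slogdet(cov_a)
    _, logdet_b = np.linalg.slogdet(cov_b)
    cov_term = 0.5 * (logdet - 0.5 * (logdet_a + logdet_b))
    return mean_term + cov_term

def jeffries_matusita(pixels_a, pixels_b):
    """JM-distance between two classes: 0 = identical, 2 = completely separable.

    Inputs hold one row per training pixel and one column per band;
    a 1-D array is treated as a single band.
    """
    a = np.asarray(pixels_a, dtype=float).reshape(len(pixels_a), -1)
    b = np.asarray(pixels_b, dtype=float).reshape(len(pixels_b), -1)
    B = bhattacharyya(a.mean(axis=0), np.cov(a, rowvar=False),
                      b.mean(axis=0), np.cov(b, rowvar=False))
    return 2.0 * (1.0 - np.exp(-B))

rng = np.random.default_rng(1)
band_a = rng.normal(0.20, 0.02, 200)  # hypothetical single-band samples of class A
band_b = rng.normal(0.40, 0.02, 200)  # hypothetical single-band samples of class B
jm_same = jeffries_matusita(band_a, band_a)
jm_far = jeffries_matusita(band_a, band_b)
```

Passing all bands of a class at once (a two-dimensional array) gives the multivariate JM-distance of a band combination instead of a single band.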

Table 5.12 shows that no single Sentinel-2 band has the highest values for all combinations. Rather, certain bands perform better for some combinations and worse for others. Band 8 performs best for the combinations that are most important in this research, namely cashew-gallery forest and cashew-dry forest, with values of 0.69 and 1.46 respectively. However, it has the lowest value for the gallery forest-dry forest combination; band 12 is most suitable for differentiating these two classes, with a value of 0.94. Furthermore, band 2 is most suitable for differentiating dry forest from woodland, and band 3 for differentiating dry forest from agriculture.
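The band rankings discussed above can be read off table 5.12 programmatically. The sketch below copies the JM-distances from that table and sorts the bands for each class combination. Which ranking was used to select the four and eight ‘optimal’ bands of tests 6 to 9 is not spelled out in this section, so the orderings it prints are only an illustration.

```python
PAIRS = ["Cashew-Gallery", "Cashew-Dry", "Gallery-Dry", "Dry-Woodland", "Dry-Agriculture"]

# JM-distance per Sentinel-2 band, in the column order of PAIRS (values from table 5.12)
JM_S2 = {
    "B2": (0.057944687, 0.780677265, 0.685615921, 0.867767764, 0.446454403),
    "B3": (0.336428858, 0.035242672, 0.520903209, 0.735083488, 0.648915734),
    "B4": (0.026366027, 0.850892335, 0.82345116, 0.779238734, 0.382554846),
    "B5": (0.277131841, 0.121717106, 0.694066374, 0.245690088, 0.483177491),
    "B6": (0.6375758, 1.428609106, 0.425039399, 0.314630588, 0.431925264),
    "B7": (0.471053875, 1.438751145, 0.591413106, 0.553948719, 0.38145242),
    "B8": (0.692512921, 1.461191498, 0.384764354, 0.847809105, 0.242821788),
    "B8A": (0.546811427, 1.388549421, 0.497928089, 0.720019775, 0.37461137),
    "B11": (0.007248802, 0.769266834, 0.859022223, 0.084230379, 0.327747675),
    "B12": (0.006194868, 1.058789615, 0.942037484, 0.55494746, 0.231565862),
}

def ranked_bands(table, pair):
    """Bands sorted from most to least separating for one class combination."""
    column = PAIRS.index(pair)
    return sorted(table, key=lambda band: table[band][column], reverse=True)

for pair in PAIRS:
    order = ranked_bands(JM_S2, pair)
    print(f"{pair:<16} best: {order[0]:<4} worst: {order[-1]:<4} top 4: {order[:4]}")
```

The same function applies to table 5.13 once its rows are entered in the same way.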

Table 5.13 shows that there is also no single index or radar band that is best for differentiating all classes; again, different bands perform better for some combinations and worse for others. Most striking is that the radar bands (VV and VH) generally have very low values, but for the dry forest-agriculture combination the VH band has the highest value, at a relatively high 1.38. In general, the indices have higher values than the Sentinel-2 bands for the cashew-dry forest, gallery forest-dry forest and dry forest-woodland combinations, but lower values for the dry forest-agriculture and cashew-gallery forest combinations.


5.3 Classification results (RQ3)

This section elaborates on the results of the classification of the 2020 and 2016 images. It starts by introducing the results of the sensitivity analysis of the MLA classifiers that were considered for the final classification. Subsequently, it addresses a visual comparison of the classification results of different classifiers.

5.3.1 Sensitivity analysis of MLA classifiers

For the classification of the 2020 and 2016 images, multiple tests with different MLA classifiers were conducted to assess which algorithm produces the best classified output. These tests are summarized in table 5.14. As this table shows, different algorithms, different satellite images (Sentinel-2 and Landsat 8) and different segmented images are tested in this sensitivity analysis. The classification results of the first three tests and of the tests with the Landsat 8 image of 2020 are presented in the sections below, because these tests provided the most accurate results for each MLA classifier and each satellite image, based on their confusion matrices, which are discussed in section 5.4.1. The classification result of the last test is presented in section 5.5.1, because this is the 2016 classification that is used for the change detection. The results of the other tests are included in appendix D, except for those of the tests that produced highly inaccurate results.

Table 5.14: Summary of the MLA tests

Satellite image | MLA classifier | Segmented image | Training samples
Sentinel-2 image of 2020 | Maximum Likelihood | Segmented image A | ‘Expert samples’ of 2020
Sentinel-2 image of 2020 | Random Forest | Segmented image A | ‘Expert samples’ of 2020
Sentinel-2 image of 2020 | Support Vector Machines | Segmented image A | ‘Expert samples’ of 2020
Sentinel-2 image of 2020 | Random Forest | Segmented image B | ‘Expert samples’ of 2020
Sentinel-2 image of 2020 | Support Vector Machines | Segmented image B | ‘Expert samples’ of 2020
Sentinel-2 image of 2019 | Support Vector Machines classifier of 2020 | Segmented image C | Classifier is based on ‘Expert samples’ of 2020
Sentinel-2 image of 2016 | Random Forest classifier of 2020 | Segmented image D | Classifier is based on ‘Expert samples’ of 2020
Sentinel-2 image of 2016 | Random Forest | Segmented image D | Sample collection of 2016 without added sacred forest samples
Sentinel-2 image of 2016 | Support Vector Machines classifier of 2020 | Segmented image D | Classifier is based on ‘Expert samples’ of 2020
Sentinel-2 image of 2016 | Support Vector Machines | Segmented image D | Sample collection of 2016 without added sacred forest samples
Sentinel-2 image of 2016 | Random Forest | Segmented image D | Sample collection of 2016 with added sacred forest samples
Sentinel-2 image of 2016 | Support Vector Machines | Segmented image D | Sample collection of 2016 with added sacred forest samples
Landsat 8 image of 2020 | Random Forest | Segmented image E | ‘Expert samples’ of 2020
Landsat 8 image of 2020 | Support Vector Machines | Segmented image E | ‘Expert samples’ of 2020
Landsat 8 image of 2016 | Random Forest classifier of 2020 | Segmented image F | Classifier is based on ‘Expert samples’ of 2020
Landsat 8 image of 2016 | Random Forest | Segmented image F | Sample collection of 2016 with added sacred forest samples
Landsat 8 image of 2016 | Support Vector Machines classifier of 2020 | Segmented image F | Classifier is based on ‘Expert samples’ of 2020
Landsat 8 image of 2016 | Support Vector Machines | Segmented image F | Sample collection of 2016 with added sacred forest samples
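The confusion matrices used to select the most accurate tests summarize, for the validation samples, how often each reference class is assigned to each mapped class. The overall, user's and producer's accuracies follow from such a matrix as sketched below. The sketch assumes that rows hold the reference (validation) classes and columns the classified classes; the 2 × 2 matrix is a hypothetical example, not a result of this study.

```python
import numpy as np

def accuracy_report(confusion):
    """Overall, producer's and user's accuracy from a confusion matrix.

    confusion[i, j] = number of validation samples of reference class i classified as class j.
    """
    cm = np.asarray(confusion, dtype=float)
    overall = np.trace(cm) / cm.sum()
    producers = np.diag(cm) / cm.sum(axis=1)  # correct / reference total (row sums)
    users = np.diag(cm) / cm.sum(axis=0)      # correct / classified total (column sums)
    return overall, producers, users

# Hypothetical matrix for two classes, e.g. cashew plantation and gallery forest
overall, producers, users = accuracy_report([[8, 2], [1, 9]])
```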

5.3.2 Summary of results of sensitivity analysis

The sensitivity analysis of the MLA classifiers shows that, on visual inspection, the Maximum Likelihood classifier produces the least desirable and least representative result. Hence, this classifier was not used in combination with segmented image B, nor for the classification of the 2016 images and the Landsat 8 images. For all other tests, the Random Forest and Support Vector Machines classifiers are used. Because these two classifiers often produce very similar results, they were tested in combination with all the other variables to see whether one performs slightly better than the other. Some of the classifications of these two classifiers are also visually compared to assess their similarities and differences in more detail; this visual comparison is discussed in section 5.3.8. As can be seen in the table, the tests with fewer wetland samples and no built-up area samples do not produce representative results. Furthermore, the Sentinel-2 image of November 2019 is only classified with the SVM classifier of 2020, merely to explore whether an image from a different season produces a substantially different classification. Finally, one of the most striking findings of this sensitivity analysis is that the 2016 images are classified better when the classifier is trained on the adjusted sample collection of 2016 than when the classifier of 2020 is applied. The classification of the 2016 images also improves when the sacred forest samples of 2016 are added to the sample collection. Only the classifications that are considered for the change detection analysis are presented below; the other classifications are included in appendix D.

5.3.3 Sentinel-2, Maximum Likelihood classification of 2020

The first test is performed on the Sentinel-2 image with the Maximum Likelihood classifier, which can be selected in the image classification wizard as mentioned in the methodology chapter. The classified output is displayed in figure 5.6. For this test, segmented image A is used. This segmented image is derived from the most optimal bands for the differentiation of cashew plantations and gallery forest, based on the calculated JM-distances, and consists of segments with a minimum size of 3 pixels. It is provided in appendix E. At first glance, the classification appears to be quite good. However, when zoomed in, some misclassifications and over-representations are observed. Especially in the eastern border region, the misclassification of the built-up class still exists, and in the same region the agriculture class also seems to be over-represented. Furthermore, large parts of the Aicum region (north of the Pataque village) are classified as cashew plantations. Because of these misclassifications, this classifier has not been used in the other tests.


Figure 5.6: (ML) Classification result of 2020


5.3.4 Sentinel-2, Random Forest classification of 2020

The result of the second test is displayed in figure 5.7. This test is performed on the Sentinel-2 image with the Random Forest classifier and a different segmented image. This segmented image is included in appendix E and is derived from a different selection of bands, namely the red, green and blue bands of the Sentinel-2 sensor. The resulting image differs slightly from the previous result. The main difference is that the Aicum region appears to be less covered by cashew plantations. However, the over-representation of the built-up and agriculture classes in the eastern region is still observed, and some new misclassifications appear as well. These new misclassifications are found south of the lake of Vendu Cham, where areas that were classified as wetland in the previous result are now classified as water bodies. These water bodies were not encountered during the fieldwork campaign and are not visible in the high-resolution ESRI image. The visual validation with the ESRI image is further addressed in section 5.4.2.

Figure 5.7: (RF) Classification result of 2020


5.3.5 Sentinel-2, Support Vector Machines classification of 2020

The third test with the Sentinel-2 image is performed with the SVM classifier and the same segmented image as in the previous test. The result is displayed in figure 5.8. This result is very similar to that of the previous test, although some differences are observed. The main difference is that the over-representation of the agriculture class in the eastern region is reduced. Furthermore, the Aicum region appears to be even less covered by cashew plantations. This classification, however, still over-estimates the built-up class in the east and the water bodies south of Vendu Cham.

Figure 5.8: (SVM) Classification result of 2020

5.3.6 Landsat 8, Random Forest classification

The result of the Landsat 8 classification with the Random Forest classifier is displayed in figure 5.9. At first sight, the classification appears to represent the area well. Closer inspection, however, again reveals the misclassification of the built-up class and the over-representation of agriculture, although to a lesser extent than in the Sentinel-2 classification. Moreover, the misclassification of the built-up class also occurs in other locations, for example east of the Dulombi National Park across the Corubal river. The most surprising result is the absence of wetland areas south of the Vendu Cham lake: only some small wetland areas are represented in this region, far fewer than in the Sentinel-2 classification. Nevertheless, the result appears to provide a representative classification, and therefore this image is also used for further analysis.

Figure 5.9: Landsat 8 (RF) classification result of 2020

5.3.7 Landsat 8, Support Vector Machine classification

Finally, the result of the Landsat 8 classification with the SVM classifier is presented in figure 5.10. As can be seen in this figure, the classification result is very similar to the result of the RF classifier, although some small differences are observed. This classification contains fewer wetland and woodland areas, but more misclassifications of the built-up class.

Figure 5.10: Landsat 8 (SVM) classification result of 2020

5.3.8 Visual comparison of results of different MLA classifiers

Because the results of the Random Forest and SVM tests are so similar, they need to be examined at a finer scale and compared visually in more detail. This section is concerned with that comparison, which is based on the Sentinel-2 classifications of the Pataque village area and is visualized in figure 5.11. In this figure, the Sentinel-2 Random Forest classification is displayed in the bottom left corner and the Sentinel-2 SVM classification in the bottom right corner. The image in the top left corner shows where the two classifiers disagree on cashew plantation and gallery forest: areas classified as gallery forest by RF but as cashew plantation by SVM are highlighted in red, and areas classified as cashew plantation by RF but as gallery forest by SVM are highlighted in yellow. As shown in this figure, the two MLA classifiers produce markedly different results for these two classes.
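The difference image in the top left corner of figure 5.11 amounts to a per-pixel comparison of two classified rasters. The minimal sketch below assumes that both classifications are NumPy arrays of class codes on the same grid; the class codes (CASHEW = 4, GALLERY = 5) are hypothetical and the codes used in this research may differ.

```python
import numpy as np

# Hypothetical class codes; the actual classifications may use other codes.
CASHEW = 4
GALLERY = 5

# Codes of the output difference map.
NO_DIFF = 0
RED = 1     # gallery forest in RF, cashew plantation in SVM
YELLOW = 2  # cashew plantation in RF, gallery forest in SVM


def cashew_gallery_difference(rf: np.ndarray, svm: np.ndarray) -> np.ndarray:
    """Flag pixels where RF and SVM disagree between cashew and gallery forest."""
    if rf.shape != svm.shape:
        raise ValueError("classifications must share the same grid")
    diff = np.full(rf.shape, NO_DIFF, dtype=np.uint8)
    diff[(rf == GALLERY) & (svm == CASHEW)] = RED
    diff[(rf == CASHEW) & (svm == GALLERY)] = YELLOW
    return diff


rf = np.array([[5, 5, 4], [4, 1, 5]])
svm = np.array([[4, 5, 5], [4, 1, 4]])
diff = cashew_gallery_difference(rf, svm)
print(diff)
```

Pixels on which both classifiers agree, or that involve other classes, stay at NO_DIFF, so only the cashew/gallery confusion is highlighted.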

Figure 5.11: Visual comparison of the RF and SVM classifications

5.4 Accuracy assessment (RQ4)

The following section addresses the accuracy of the classified output for 2016 and 2020. It covers both the quantitative and the qualitative aspects of accuracy assessment in this research. The first subsection deals with the most important statistical results of the confusion matrices. The second subsection provides a visual comparison of the classifications produced by the different algorithms and of the classifications of different satellite images. In addition, this section addresses the results of the post-classification process, after which it presents the final classified output and the distribution of the different land cover types.

5.4.1 Quantitative assessment

This section provides the results of the confusion matrices for the MLA classifier tests discussed earlier that produced the most accurate classifications. The complete confusion matrices are included in appendix H. Table 5.15 summarizes the most important statistics for this research: the overall accuracy (OA), the Kappa coefficient (KC), and the user’s accuracies of the cashew plantation and gallery forest classes. As can be seen in this table, the Maximum Likelihood classifier performed considerably worse on the Sentinel-2 image of 2020 than the other classifiers (OA 0.59, KC 0.54). Hence, this classifier is disregarded in the other classification tests. The Support Vector Machine (SVM) classifier has the highest OA and KC in most of the tests, although these values are only marginally higher than those of the Random Forest classifier. The Random Forest classifier, on the other hand, has the highest user’s accuracies for the cashew and gallery classes in most tests. Due to these results, the Random Forest classifier is considered the best performing classifier. Hence, the output of this classifier is used in further analyses and is visually compared for different satellite images in the section below.

Table 5.15: Accuracy of the different MLA classifier tests

| Classifier | Statistic | 2016 Sentinel-2 | 2016 Landsat 8 | 2020 Sentinel-2 | 2020 Landsat 8 |
|---|---|---|---|---|---|
| Maximum Likelihood | OA | N/A | N/A | 0.59 | N/A |
| | KC | N/A | N/A | 0.54 | N/A |
| | User’s accuracy cashew | N/A | N/A | 0.43 | N/A |
| | User’s accuracy gallery | N/A | N/A | 0.78 | N/A |
| Random Forest | OA | 0.72 | 0.69 | 0.70 | 0.70 |
| | KC | 0.67 | 0.62 | 0.66 | 0.65 |
| | User’s accuracy cashew | 1.00 | 0.80 | 0.94 | 0.91 |
| | User’s accuracy gallery | 0.80 | 0.80 | 0.77 | 0.90 |
| Support Vector Machines | OA | 0.69 | - | 0.75 | 0.73 |
| | KC | 0.64 | - | 0.71 | 0.69 |
| | User’s accuracy cashew | 1.00 | - | 0.86 | 0.75 |
| | User’s accuracy gallery | 0.73 | - | 0.82 | 0.81 |
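The statistics in table 5.15 follow from standard confusion matrix formulas. The sketch below assumes a confusion matrix whose rows are the classified (map) labels and whose columns are the reference labels, so that the user’s accuracy of a class is its diagonal cell divided by its row total. The toy matrix is invented for illustration and is not taken from appendix H.

```python
import numpy as np


def accuracy_statistics(cm):
    """Overall accuracy, Cohen's kappa and per-class user's accuracy.

    Assumes rows = classified (map) labels and columns = reference labels.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    observed = np.trace(cm) / n  # overall accuracy (OA)
    # Agreement expected by chance, from the row and column marginals.
    expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    kappa = (observed - expected) / (1 - expected)
    # User's accuracy: correctly mapped samples / all samples mapped as that class.
    users = np.diag(cm) / cm.sum(axis=1)
    return observed, kappa, users


toy = np.array([[8, 2, 0],
                [1, 6, 3],
                [0, 1, 9]])
oa, kc, ua = accuracy_statistics(toy)
print(f"OA={oa:.2f}, KC={kc:.2f}, UA={np.round(ua, 2)}")
```

Because kappa corrects the overall accuracy for chance agreement, it is always lower than OA when classes are unevenly predicted, which is consistent with the KC values in table 5.15 being below the OA values.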

5.4.2 Visual validation of the Random Forest classifications

The quantitative assessment above is complemented by a visual assessment supported by the high-resolution ESRI imagery. This visual assessment is conducted for the Random Forest classifications of the Landsat 8 and Sentinel-2 images, and figures 5.12 and 5.13 highlight its results. The CheChe village area is selected because it is a recognizable village and because the surrounding area consists of many different land cover types, including the water body of the Corubal river. The biggest difference between the Landsat 8 and Sentinel-2 images is their spatial resolution, which is also apparent in the classifications: the Landsat classification is less detailed than the Sentinel classification. Nevertheless, both classifications seem to capture the river, the village and the gallery forest alongside the river very well. The cashew plantations south of the village are also represented in both classifications. Unfortunately, the classified Landsat image shows a slight over-representation of the agriculture class in areas that appear to be secondary forests around the cashew plantations, and an over-representation of the secondary dry forest class in areas that appear to be savannahs. The Sentinel classification over-estimates the woodland and gallery forest classes in the same areas. Furthermore, some of the wetland areas appear to be over-estimated in both images. Although both classifications seem to produce satisfying results, these over-estimations are undesirable for further analyses, and therefore the images are first subjected to post-classification processing, which is discussed in the section below.

Figure 5.12: Visual validation of the final classified Landsat 8 (RF) output in the CheChe village area

Figure 5.13: Visual validation of the final classified Sentinel-2 (RF) output in the CheChe village area

5.4.3 Post-classification processing

For this post-classification processing, the procedures discussed in the methodology chapter are followed. Only misclassifications of classes that are easy to recognize in the high-resolution ESRI imagery are processed. In practice, this means that only the misclassifications of built-up areas and water bodies are processed, because only these can be validated with the high-resolution imagery. This process is performed for the RF classifications of the Sentinel-2 and Landsat 8 images to obtain a cleaner, filtered output that can be compared more reliably with the output of the 2016 images in the change detection analysis.

After the classifications were completely filtered, they were pre-processed for the change detection analysis. However, during this pre-processing phase, an alignment issue arose between the Sentinel-2 outputs of 2020 and 2016. This issue is further discussed in the discussion chapter. Because of this issue, the Sentinel-2 output of 2020 was harder to compare with the output of 2016, which led to implausible results in the change detection analysis. Hence, the remainder of this results chapter is only concerned with the Landsat 8 classification. The Sentinel-2 results are still included in appendix G but are not discussed further.

Figure 5.14 provides a visual comparison of the Landsat 8 RF classification before and after processing. The un-processed classification in the bottom left corner is zoomed in on the eastern border region, where the recurring misclassifications of the built-up class are observed. The high-resolution image in the top left corner is used to validate the processing. Closer inspection of this image reveals that only two real villages are located in this area, namely Quissem and Burquelem. The corresponding built-up areas of these villages in the classification are therefore not adjusted. All the other areas classified as built-up cannot be observed in the high-resolution imagery and are thus filtered out by means of the post-processing steps described in the methodology chapter. The result is depicted in the bottom right corner.
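The removal of false built-up pixels can be illustrated with a minimal sketch. It assumes a hypothetical boolean mask of the built-up areas confirmed in the ESRI imagery (such as Quissem and Burquelem) and reassigns every other built-up pixel to the most frequent non-built-up class in its 3 × 3 neighbourhood. This majority rule and the class code BUILT_UP = 3 are assumptions for illustration; the steps actually applied are those described in the methodology chapter.

```python
import numpy as np

BUILT_UP = 3  # hypothetical class code for the built-up class


def remove_unconfirmed_built_up(classes: np.ndarray, confirmed: np.ndarray) -> np.ndarray:
    """Reassign built-up pixels outside `confirmed` to their 3x3 majority class.

    Built-up neighbours are ignored when taking the majority; ties go to the
    lowest class code. Pixels with only built-up neighbours are left unchanged.
    """
    out = classes.copy()
    for r, c in np.argwhere((classes == BUILT_UP) & ~confirmed):
        window = classes[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
        candidates = window[window != BUILT_UP]
        if candidates.size:
            values, counts = np.unique(candidates, return_counts=True)
            out[r, c] = values[np.argmax(counts)]
    return out


classes = np.array([
    [1, 1, 3, 1],
    [1, 3, 1, 5],
    [3, 3, 5, 5],
])
confirmed = np.zeros_like(classes, dtype=bool)
confirmed[2, 0:2] = True  # pixels of a village confirmed in the high-resolution imagery
out = remove_unconfirmed_built_up(classes, confirmed)
print(out)
```

The neighbourhood is always read from the original classification, so the order in which pixels are processed does not affect the result.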

Figure 5.14: Comparison of un-processed and post-processed classification

5.4.4 Final RF classification of 2020 Landsat 8 image

The final, processed classified output of the 2020 Landsat 8 image is provided in figure 5.15; as mentioned before, the final processed output of the 2020 Sentinel-2 image is provided in appendix G. Figure 5.15 shows that all the misclassifications of the built-up class have been filtered out, while the real built-up areas, as observed in the ESRI imagery, remain unaltered. The distribution of the different land cover types in this classified output is discussed in the section below.

Figure 5.15: Final Landsat 8 (RF) classification of 2020

5.4.5 Distribution of land cover types in final classified Landsat 8 output

Figures 5.16(a) and (b) present the distribution of the land cover types in the final classified Landsat 8 output for the entire Boé area and its sub-regions. As shown in these figures, the forest types ‘gallery’ and ‘secondary dry’ have the highest shares; combined, they form 46% of the total land cover in the Boé area. Most of these forests are located in the Dulombi National Park area, where gallery and secondary dry forests together occupy 72% of the land cover. The Cuntabani corridor has high percentages of forest as well. However, this region is small and therefore does not contribute significantly to the total amount of forest in the Boé area. The agriculture and woodland classes are also well represented in the classified output, with 17% and 19% respectively, whereas the wetland, built-up and water body classes have low shares. The most surprising result to emerge from these figures is the low share of mature cashew plantations: they occupy only 1% of the total land cover, and 2% in some sub-regions.
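The percentages in figures 5.16(a) and (b) are area shares. For the Boé area as a whole, they can be approximately reproduced from the ‘Total 2020’ row of table 5.16. The sketch below is simple arithmetic on those reported totals, not the procedure used to produce the figure.

```python
# 2020 class totals (km²) from the 'Total 2020' row of table 5.16.
totals_2020 = {
    "Agriculture": 509.49, "Barren Land": 20.03, "Built-up Area": 1.99,
    "Cashew Plantation": 44.72, "Gallery Forest": 632.16, "Savannah": 319.47,
    "Secondary Dry Forest": 759.32, "Water Bodies": 27.18, "Wetland": 106.98,
    "Woodland": 564.24,
}
area = sum(totals_2020.values())  # total study area in km²
shares = {name: 100 * km2 / area for name, km2 in totals_2020.items()}

forest = shares["Gallery Forest"] + shares["Secondary Dry Forest"]
print(f"forest {forest:.1f}%, agriculture {shares['Agriculture']:.1f}%, "
      f"woodland {shares['Woodland']:.1f}%, cashew {shares['Cashew Plantation']:.1f}%")
```

The unrounded combined forest share is about 46.6%; rounding the gallery forest (21%) and secondary dry forest (25%) shares separately gives the 46% reported above, while agriculture, woodland and cashew plantation round to 17%, 19% and 1%.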

Figure 5.16: Distribution of the land cover types in 2020 in the total Boé area (a) and its sub-regions (b)

5.5 Change detection results (RQ5)

This section begins by providing the final classified Landsat 8 output for 2016, its corresponding confusion matrix and a graph with the resulting distribution of the land cover types. It then presents a visual comparison between the classified outputs of 2020 and 2016, followed by the change detection matrix and a Sankey diagram, which visualize the land cover change dynamics over this period. It concludes with a change map that highlights the locations of the observed changes.

5.5.1 Final classified Landsat 8 output of 2016

The final output of 2016 is displayed in figure 5.17. Like the classified output of 2020, this classification is created with a Random Forest classifier. The main difference, however, lies in the training samples: for this classification, the adjusted sample collection of 2016 with the additional sacred forest samples is used. This sample collection achieved a better result than applying the Random Forest classifier of 2020, which is trained on the sample collection of 2020, to the image of 2016. The corresponding confusion matrix is provided in appendix H. As shown in table 5.15, the overall accuracy of this classification is 0.69, which is slightly lower than the overall accuracy for 2020, and the Kappa coefficient is 0.62. The user’s accuracies of the cashew plantation and gallery forest classes are high again, at 0.8 and 0.79 respectively.

Figure 5.17: Classification result of 2016

5.5.2 Visual comparison between classifications of 2016 and 2020

When compared to the final output of 2020, some different land cover patterns can be observed. The most obvious difference is the much larger extent of wetlands in 2016. Especially south of the lake of Vendu Cham and in the Dulombi National Park, many wetlands are observed that cannot be found in the classification of 2020. The over-representation of agricultural areas in 2020 is not observed in the 2016 output: the areas classified as agriculture and savannah in 2020 appear to be mostly covered by the gallery forest, secondary dry forest and woodland types in 2016. The last striking difference is the smaller presence of cashew plantations in 2016. In the classification of 2020 this type appears to be scattered throughout the Boé area, whereas in 2016 it appears to be concentrated in specific areas, for example around the village of Pataque.

5.5.3 Comparison of land cover distribution in 2016 and 2020

In addition to the visual comparison, it is also interesting to quantitatively compare the distribution of the land cover classes in the classified output of 2016 with that of 2020. The distribution of the land cover classes in the 2016 output is provided in figures 5.18(a) and (b). As in section 5.4.5, the distribution is provided for the entire Boé area and for its sub-regions.

Figure 5.18: Distribution of the land cover types in 2016 in the total Boé area (a) and its sub-regions (b)

When compared to the distribution of 2020, some striking differences in land cover distribution are observed. These differences agree with the visual differences discussed earlier, as the biggest differences are found for the wetland and agriculture classes.

Most of the sub-regions, which are also highlighted in the reference map in figure 3.1, seem to experience changes similar to those of the Boé area as a whole, but some regional differences can be observed as well. In the entire Boé area, the share of the wetland class decreases by 5 percentage points. The biggest decrease in wetland occurred in the Dulombi National Park, where its share fell from 17% to 4%. The share of the agriculture class increased by 7 percentage points between 2016 and 2020. Most of this increase occurred in the eastern part of the Boé region, where it increased by 17 percentage points. For the gallery forest class, a decrease of 3 percentage points is observed. Most of this decrease occurred in the area west of the Fefine river and, unexpectedly, in the Boé National Park, where the share of gallery forest decreased by 4 and 3 percentage points respectively.

The most unexpected result to emerge from this comparison concerns the secondary dry forest class. Its share increased by 2 percentage points in the entire Boé area, but by 16 percentage points in the Dulombi National Park. Meanwhile, in the areas east and west of the Fefine river, its share decreased by 4 and 3 percentage points respectively. The cashew plantation class does not change much, but it still shows small increases of 1 percentage point in the areas next to the Fefine river and in the Cuntabani corridor.
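The changes discussed here are differences between area shares, i.e. percentage points rather than relative changes. For the Boé area as a whole, they can be checked against the class totals of table 5.16, as in the sketch below, which uses only values from that table.

```python
# Class totals (km²) for 2016 and 2020, from the totals of table 5.16.
totals = {
    # class: (2016, 2020)
    "Wetland": (269.03, 106.98),
    "Agriculture": (287.64, 509.49),
    "Gallery Forest": (702.23, 632.16),
    "Secondary Dry Forest": (695.13, 759.32),
}
TOTAL_AREA = 2985.58  # km², the same study area in both years

changes = {}
for name, (a2016, a2020) in totals.items():
    share_2016 = 100 * a2016 / TOTAL_AREA
    share_2020 = 100 * a2020 / TOTAL_AREA
    changes[name] = share_2020 - share_2016  # change in percentage points
    print(f"{name}: {share_2016:.1f}% -> {share_2020:.1f}% "
          f"({changes[name]:+.1f} percentage points)")
```

The text compares rounded shares, so the unrounded differences can deviate slightly: for gallery forest, the shares are 23.5% and 21.2%, which round to 24% and 21%, consistent with the reported decrease of 3.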

Although this comparison highlights the percentage changes in land cover between 2016 and 2020, it does not explain the dynamics behind these changes. Hence, the change detection matrix is addressed in the next section. This matrix shows the conversions between the land cover classes and therefore better describes their dynamic behavior over time.

5.5.4 Change dynamics

Table 5.16 displays the change detection matrix. Its rows represent the land cover classes of 2016 and its columns the land cover classes of 2020. The conversions in which no change occurred are highlighted in gray, and the conversions in which an area of at least 5 km² converted from one class to another are highlighted in green.

Table 5.16: Change detection matrix (areas in km²; rows: land cover class in 2016, columns: land cover class in 2020)

| 2016 \ 2020 | Agriculture | Barren Land | Built-up Area | Cashew Plantation | Gallery Forest | Savannah | Secondary Dry Forest | Water Bodies | Wetland | Woodland | Total 2016 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 104.68 | 0.15 | 0.02 | 3.60 | 59.08 | 0.52 | 105.72 | 0.23 | 1.98 | 11.66 | 287.64 |
| Barren Land | 2.99 | 8.56 | 0.03 | 0.08 | 0.42 | 9.58 | 0.44 | 0.00 | 1.50 | 2.56 | 26.17 |
| Built-up Area | 0.76 | 0.09 | 1.68 | 0.01 | 0.04 | 0.04 | 0.18 | - | 0.03 | 0.04 | 2.87 |
| Cashew Plantation | 7.09 | 0.02 | 0.00 | 6.21 | 6.58 | 0.02 | 1.30 | 0.01 | 0.03 | 0.19 | 21.46 |
| Gallery Forest | 94.76 | 0.14 | 0.09 | 30.66 | 458.40 | 0.71 | 104.89 | 2.64 | 2.32 | 7.63 | 702.23 |
| Savannah | 18.66 | 7.34 | 0.03 | 0.02 | 0.63 | 240.57 | 10.59 | 0.03 | 25.00 | 108.23 | 411.10 |
| Secondary Dry Forest | 157.52 | 0.32 | 0.10 | 3.68 | 74.89 | 5.48 | 348.59 | 0.05 | 10.02 | 94.48 | 695.13 |
| Water Bodies | 0.02 | 0.00 | 0.00 | - | 1.95 | 0.01 | 0.19 | 23.86 | 0.38 | 0.04 | 26.44 |
| Wetland | 33.90 | 1.16 | 0.02 | 0.26 | 24.83 | 8.62 | 99.10 | 0.27 | 43.35 | 57.52 | 269.03 |
| Woodland | 89.11 | 2.26 | 0.02 | 0.19 | 5.34 | 53.92 | 88.33 | 0.08 | 22.37 | 281.89 | 543.50 |
| Total 2020 | 509.49 | 20.03 | 1.99 | 44.72 | 632.16 | 319.47 | 759.32 | 27.18 | 106.98 | 564.24 | 2985.58 |
| Change | 221.84 | -6.14 | -0.88 | 23.26 | -70.07 | -91.64 | 64.19 | 0.74 | -162.05 | 20.74 | |

This matrix reveals that just over half of the study area (approximately 1,518 of 2,986 km²) remained in the same class between 2016 and 2020. Nevertheless, some large conversions occurred as well. According to this matrix, the total area of the gallery forest class decreased by approximately 70 km², a decrease of about 10%. As can be seen in the matrix, the forest classes predominantly converted to each other and to the agriculture class. Approximately 105 km² of gallery forest changed to secondary dry forest, and 75 km² of secondary dry forest changed to gallery forest. Additionally, 95 km² of gallery forest and 158 km² of secondary dry forest converted to agriculture. About 31 km² of the gallery forest loss is accounted for by the cashew plantation class.

The total area of the secondary dry forest class increased by approximately 64 km², an increase of about 9%. The largest gains of this class come from the agriculture and gallery forest classes (approximately 106 and 105 km² respectively). About 4 km² of this class converted to cashew plantation and 94 km² to woodland. The cashew plantation class increased by approximately 23 km², an increase of about 108%. Most of this increase is accounted for by the aforementioned loss of gallery forest. The other gains and losses of the land cover types are visualized in the chart in figure 5.19.
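The net and relative changes quoted in the two paragraphs above follow directly from the totals and the diagonal of table 5.16. A minimal arithmetic check, using only values from that table:

```python
# (Total 2016, Total 2020) in km², from table 5.16.
totals = {
    "Gallery Forest": (702.23, 632.16),
    "Secondary Dry Forest": (695.13, 759.32),
    "Cashew Plantation": (21.46, 44.72),
}

results = {}
for name, (t2016, t2020) in totals.items():
    net = t2020 - t2016
    relative = 100 * net / t2016  # change relative to the 2016 area
    results[name] = (net, relative)
    print(f"{name}: {net:+.2f} km² ({relative:+.0f}%)")

# Area that stayed in the same class: the diagonal of table 5.16.
diagonal = [104.68, 8.56, 1.68, 6.21, 458.40, 240.57, 348.59, 23.86, 43.35, 281.89]
unchanged = 100 * sum(diagonal) / 2985.58
print(f"unchanged: {unchanged:.0f}% of the study area")
```

Note that a relative change depends on the starting area: the cashew plantation class gains far less area than the secondary dry forest class, but more than doubles because it started small.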

Figure 5.19: Gains and losses of all the land cover types between 2016 and 2020

Furthermore, some other major conversions are observed for the wetland, savannah and woodland classes. Of the wetland class, only about 43 km² remained unchanged, while 99 km² converted to secondary dry forest. Of the woodland class, about 89 km² converted to agriculture and 88 km² to secondary dry forest. The largest conversion of the savannah class was to woodland, namely 108 km². In contrast, no large conversions occurred for the barren land, built-up and water body classes.
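A change detection matrix like table 5.16 is a cross-tabulation of two co-registered classified rasters. The sketch below assumes that both maps are NumPy arrays of class indices 0 to k-1 on exactly the same grid, which is why an alignment problem such as the one described in section 5.4.3 invalidates the comparison. The default pixel area assumes 30 m Landsat 8 pixels (0.0009 km²); this sketch illustrates the technique rather than the exact GIS workflow used in this research.

```python
import numpy as np


def change_matrix(before: np.ndarray, after: np.ndarray, n_classes: int,
                  pixel_area_km2: float = 0.03 * 0.03) -> np.ndarray:
    """Cross-tabulate two classified rasters; rows = 'before' class, columns = 'after' class."""
    if before.shape != after.shape:
        raise ValueError("rasters must be aligned on the same grid")
    # Encode each (before, after) pair as one index and count the occurrences.
    pair_index = before.ravel() * n_classes + after.ravel()
    counts = np.bincount(pair_index, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes) * pixel_area_km2


before = np.array([[0, 0, 1], [1, 2, 2]])
after = np.array([[0, 1, 1], [2, 2, 1]])
m = change_matrix(before, after, n_classes=3, pixel_area_km2=1.0)
print(m)
print("net change per class:", m.sum(axis=0) - m.sum(axis=1))
```

The row sums give the class totals of the first year, the column sums those of the second year, and their difference the net change per class, matching the ‘Total 2016’, ‘Total 2020’ and ‘Change’ entries of table 5.16.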

The Sankey diagram in figure 5.20 visualizes the dynamic behavior described above for easier interpretation. It can be read as follows: the larger a node, the more area is covered by that class, and the wider a flow link, the more area converted from one class to the other. Each flow link takes the color of the node from which it originates.
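A Sankey diagram such as figure 5.20 only needs a list of (source, target, value) links, which can be derived from a change matrix. The sketch below assumes the layout of table 5.16 (rows are 2016 classes, columns are 2020 classes) and drops flows below a hypothetical minimum size to keep the diagram readable; the plotting tool is left open.

```python
def sankey_links(classes, matrix, min_km2=5.0):
    """Turn a change matrix into (source, target, area) links for a Sankey diagram.

    Sources are labelled with their 2016 class and targets with their 2020 class,
    so unchanged area appears as a flow from 'X 2016' to 'X 2020'.
    """
    links = []
    for i, source in enumerate(classes):
        for j, target in enumerate(classes):
            area = matrix[i][j]
            if area >= min_km2:
                links.append((f"{source} 2016", f"{target} 2020", area))
    return links


# Excerpt of table 5.16 (km²), restricted to three classes for illustration.
classes = ["Agriculture", "Gallery Forest", "Secondary Dry Forest"]
matrix = [
    [104.68, 59.08, 105.72],
    [94.76, 458.40, 104.89],
    [157.52, 74.89, 348.59],
]
links = sankey_links(classes, matrix)
for link in links:
    print(link)
```

Labelling the two years separately keeps the diagram acyclic, which Sankey layouts generally require.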

Figure 5.20: Sankey diagram visualizing the dynamic behavior of the land cover classes

5.5.5 Change map

Additionally, a change map is provided in figure 5.21. This change map highlights the major conversions in land cover classes between 2016 and 2020. Only the conversions related to forests and cashew plantations that are considered plausible are highlighted; highly improbable conversions, such as water bodies to gallery forest, are excluded. This map reveals that, especially in the central region of the Boé area, large forest patches have been converted to agricultural fields. This development is also observed in the bottom right corner of the map, near the village of Veindu Leidi, and in the Cuntabani region, both of which are highlighted in the reference map in figure 3.1.
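The change map can be sketched as a lookup of plausible from-to pairs applied to two aligned classified rasters. The class codes and the list of plausible conversions below are hypothetical stand-ins for the selection described above; every other pixel, including implausible conversions such as water bodies to gallery forest, is left as no-data.

```python
import numpy as np

# Hypothetical class codes.
AGRICULTURE, CASHEW, GALLERY, SECONDARY_DRY, WATER = 1, 4, 5, 7, 8

# Hypothetical selection of plausible forest/cashew conversions -> output code.
FEASIBLE = {
    (GALLERY, AGRICULTURE): 1,
    (SECONDARY_DRY, AGRICULTURE): 2,
    (GALLERY, CASHEW): 3,
    (SECONDARY_DRY, CASHEW): 4,
}
NO_DATA = 0


def change_map(before: np.ndarray, after: np.ndarray) -> np.ndarray:
    """Code each pixel by its plausible conversion; all other pixels become NO_DATA."""
    out = np.full(before.shape, NO_DATA, dtype=np.uint8)
    for (src, dst), code in FEASIBLE.items():
        out[(before == src) & (after == dst)] = code
    return out


before = np.array([[GALLERY, GALLERY, WATER], [SECONDARY_DRY, GALLERY, CASHEW]])
after = np.array([[AGRICULTURE, GALLERY, GALLERY], [CASHEW, CASHEW, CASHEW]])
result = change_map(before, after)
print(result)
```

Listing the allowed conversions explicitly makes the exclusion rule transparent and easy to adjust.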

Figure 5.21: Change map visualizing the changes in land cover between 2016 and 2020

5.6 Forest monitoring results (RQ6)

The remaining part of this chapter presents the results of the analyses involving the Hansen dataset and the BFAST monitoring method. This section begins by comparing forest loss in the Boé area to the loss of forests in its surrounding areas. Subsequently, the analysis focuses on the Boé area to address its forest loss in more detail. Additionally, a map with the observed forest loss between 2016 and 2020 is compared to the change map presented in section 5.5.5, and charts representing yearly forest loss in the Boé area and some of its sub-regions are presented. These charts are based on the Hansen dataset. The section concludes with the results of the BFAST monitoring method.

5.6.1 Forest loss in the Boé area and its surroundings

Figure 5.22 presents the Hansen dataset, showing forest loss and gain between 2001 and 2019 in the Boé, in other regions of Guinea-Bissau and in a large part of the neighboring country Guinea. The first observation to emerge from this map is the low extent of forest loss in the Boé area compared with the surrounding areas. Especially in Guinea and in the more western areas of Guinea-Bissau, the extent of forest loss appears considerably higher. It is also interesting to see that the forest cover in 2000 in the Boé area was not as dense as in other areas in Guinea-Bissau. The most distressing result is the almost complete absence of locations where forest gain has occurred between 2001 and 2019: only on closer inspection can some areas with forest gain be observed.

Figure 5.22: Forest loss and gain in the Boé area and its surroundings

5.6.2 Forest loss in the Boé area

Figure 5.23 displays the same dataset, zoomed in on the Boé area, and reveals the exact locations where forest loss has occurred. The results in this map agree with some of the results of the change map, because forest loss occurred in the same regions: the region near the village of Colebe, the central region of the Boé area and the Cuntabani corridor. A comparison between these two maps is further discussed in section 5.6.4. These regions can also be seen in the reference map in figure 3.1, but for convenience they are also highlighted with black rectangles in figure 5.23. The Boé National Park has not experienced much forest loss, except for the region alongside the northern part of the Fefiné river, highlighted by the black square. On the other hand, according to this dataset, much forest loss has occurred in the Dulombi National Park, which is highlighted by two black rectangles in the western part of the Boé. In the entire Boé area, the areas representing forest gain are too small to be observed at the scale of this map. The section below deals with the quantitative results of this analysis for the Boé area and compares the extent of forest loss with the forest loss in a neighboring region in Guinea.

Figure 5.23: Forest loss and gain in the Boé area according to the Hansen dataset

5.6.3 Comparison of forest loss in Boé area and neighbouring Wendou M’Bour sub-prefecture

Figures 5.24 and 5.25 provide charts representing forest loss between 2001 and 2019 in the total Boé area and in the Wendou M’Bour area in Guinea. This region is also highlighted in figure 5.22, which shows that it is a little more than half the size of the Boé. The first striking observation to emerge from these charts is the large difference in forest loss between the years before and after 2013: forest loss only really seems to start in 2013. When comparing the two charts, it becomes obvious that the extent of forest loss is much lower in the Boé area than in the Wendou M’Bour region. In the period between 2016 and 2020, which is the most important period for this research, the Boé area lost approximately 5,120 ha of forest, whereas the Wendou M’Bour region lost approximately 16,450 ha, which is 3.2 times as much. Between 2013 and 2020, the Boé area lost approximately 8,840 ha of forest and the Wendou M’Bour region approximately 30,650 ha, which is about 3.5 times as much. The next section compares the results of the Hansen dataset with the change map.
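Yearly loss charts such as figures 5.24 and 5.25 can be derived from the Hansen ‘lossyear’ layer, in which 0 means no loss and a value n means loss detected in the year 2000 + n. The sketch below assumes that encoding, a hypothetical tree-cover-in-2000 threshold of 30% to define forest, and a fixed pixel area of 0.09 ha; in practice the pixel area of the roughly 30 m geographic grid varies with latitude. The ratios at the end reuse the totals reported above.

```python
import numpy as np


def yearly_loss_ha(lossyear, treecover2000, canopy_threshold=30, pixel_area_ha=0.09):
    """Forest loss per year (ha) from Hansen-style 'lossyear' and 'treecover2000' arrays.

    lossyear: 0 = no loss, n = loss detected in year 2000 + n.
    canopy_threshold (%) and pixel_area_ha are assumptions for this sketch.
    """
    forest_2000 = treecover2000 >= canopy_threshold
    years = lossyear[forest_2000 & (lossyear > 0)]
    counts = np.bincount(years, minlength=20)  # index n -> year 2000 + n
    return {2000 + n: counts[n] * pixel_area_ha for n in range(1, counts.size)}


lossyear = np.array([[0, 13, 16], [17, 13, 0]])
treecover2000 = np.array([[80, 60, 10], [45, 90, 70]])
loss = yearly_loss_ha(lossyear, treecover2000)
loss_2016_onwards = sum(area for year, area in loss.items() if year >= 2016)

# Ratios of the totals reported for the Wendou M'Bour region and the Boé area.
ratio_2016_2020 = 16450 / 5120
ratio_2013_2020 = 30650 / 8840
print(round(ratio_2016_2020, 1), round(ratio_2013_2020, 1))
```

In the toy example, the loss pixel dated 2016 is ignored because its tree cover in 2000 falls below the threshold, which shows how the chosen forest definition directly affects the yearly totals.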

Figure 5.24: Yearly forest loss between 2001 and 2020 in the Boé area

Figure 5.25: Yearly forest loss between 2001 and 2020 in the Wendou M’Bour sub-prefecture

5.6.4 Comparison of Hansen dataset and change map

Figure 5.26 provides a visual comparison of the Hansen dataset and the change map derived from the change detection analysis. The image on the left highlights forest loss between 2016 and 2020 as identified by the Hansen dataset. The image on the right represents the change map, in which only the conversion from gallery forest to agriculture is visualized; the conversion from gallery forest to cashew plantation is not included. This conversion is selected because it was the largest conversion of gallery forest to a non-forest class (approximately 95 km²), as can be seen in the change detection matrix in table 5.16. The results indicate good spatial agreement, because most of the ‘loss locations’ in the Hansen dataset correspond with gallery forest to agriculture locations in the change map. This is especially true for the areas alongside the Corubal river, where the two images appear almost identical. Further inland to the north, however, the change map identifies more areas of forest loss than the Hansen dataset.
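The visual agreement described above could also be quantified, for example as the fraction of Hansen loss pixels that coincide with gallery forest to agriculture pixels in the change map, and vice versa. This is a hypothetical extension rather than an analysis performed in this research, and it assumes that both layers have been resampled to the same grid as boolean masks.

```python
import numpy as np


def agreement(hansen_loss: np.ndarray, change_loss: np.ndarray) -> dict:
    """Overlap statistics between two boolean loss masks on the same grid."""
    both = np.logical_and(hansen_loss, change_loss).sum()
    return {
        # share of Hansen loss pixels that are also flagged in the change map
        "hansen_in_change_map": both / hansen_loss.sum(),
        # share of change-map loss pixels that are also flagged by Hansen
        "change_map_in_hansen": both / change_loss.sum(),
    }


hansen = np.array([[1, 1, 0, 0], [0, 1, 0, 0]], dtype=bool)
change = np.array([[1, 1, 0, 1], [0, 1, 1, 0]], dtype=bool)
stats = agreement(hansen, change)
print(stats)
```

Reporting both directions matters here: a high first value combined with a lower second value would correspond to the pattern described above, where the change map flags more loss than the Hansen dataset.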

Figure 5.26: Visual comparison between the Hansen dataset and change map

5.6.5 Yearly forest loss in the National Parks

Figure 5.27 provides charts representing forest loss in the parts of the Boé National Park and the Dulombi National Park that are located in the Boé area. The charts of the other sub-regions of the Boé are included in appendix L. These charts again show the large difference in forest loss between the years before and after 2013. Another interesting observation is that the extent of forest loss appears to have decreased after 2015 but increased again in 2019. This trend is not observed in the other regions, although in most regions the extent of forest loss increased in 2019 compared to 2018. Furthermore, the charts in figure 5.27 demonstrate that the Dulombi National Park region has experienced more forest loss than the Boé National Park region. Between 2016 and 2020, the Boé National Park lost approximately 4,200,000 m² (420 ha) of forest, while the Dulombi National Park region lost approximately 6,960,000 m² (696 ha) during the same period. This higher extent of forest loss is also observed in the change map.

Figure 5.27: Yearly forest loss in the parts of the National Parks that are located in the Boé area

5.6.6 BFAST monitoring method

The analysis involving the BFAST monitoring method is concerned with disturbances in the NDMI values of forest areas. As mentioned in the methodology chapter, the disturbance year map depicts the years in which the major detected forest disturbances occurred. This map is presented in figure 5.28. For this map, the Hansen forest cover in 2000 layer is again used as the reference forest layer. The years are represented by colours, with the earlier years in yellow, the more recent years in purple and the most recent years in magenta and blue. Although the colour palette of the disturbance years is very bright, the colours do not really stand out in the map of the Boé area. Hence, three areas in which most of the disturbances occurred are shown in more detail in the zoomed-in maps A, B and C.

When compared to the results of the Hansen dataset in figure 5.23, fewer and different areas with forest disturbances appear. Nevertheless, some areas have still experienced many disturbances. In map A, an area in the Boé National Park, most of the detected disturbances occurred in the more recent years between 2014 and 2017 and some in the earlier years between 2009 and 2013. In map B, an area near the village of Madina de Boé, most of the disturbances seem to have occurred in the earlier years between 2009 and 2013. In map C, an area near the village of Boloba, the disturbances seem to have occurred throughout the monitoring period, but none in the most recent years between 2018 and 2020.
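The disturbance detection can be illustrated with a simplified, single-pixel sketch in the spirit of BFAST monitor: a season-trend model (intercept, linear trend and one harmonic) is fitted to the NDMI values of a stable history period, and a moving sum (MOSUM) of scaled residuals is tracked through the monitoring period. The fixed threshold, the window fraction h and the synthetic NDMI series are assumptions; the bfast R package derives its boundary from critical values, so this is not its implementation. The magnitude is taken as the median residual in the monitoring period.

```python
import numpy as np


def design_matrix(t: np.ndarray, order: int = 1) -> np.ndarray:
    """Intercept, linear trend and `order` harmonic pairs; t in decimal years."""
    cols = [np.ones_like(t), t]
    for k in range(1, order + 1):
        cols += [np.cos(2 * np.pi * k * t), np.sin(2 * np.pi * k * t)]
    return np.column_stack(cols)


def monitor(t, y, start, h=0.25, threshold=1.5, order=1):
    """Simplified BFAST-monitor-style disturbance check for one pixel.

    Returns (break_time or None, magnitude). `threshold` is a fixed,
    hypothetical boundary rather than a critical value from the bfast package.
    """
    hist = t < start
    X = design_matrix(t, order)
    coef, *_ = np.linalg.lstsq(X[hist], y[hist], rcond=None)
    resid = y - X @ coef
    n_hist = int(hist.sum())
    sigma = np.sqrt(np.sum(resid[hist] ** 2) / (n_hist - X.shape[1]))
    width = max(int(h * n_hist), 1)  # MOSUM window length in observations

    break_time = None
    for i in np.flatnonzero(~hist):
        window = resid[max(0, i - width + 1): i + 1]
        mosum = window.sum() / (sigma * np.sqrt(n_hist))
        if abs(mosum) > threshold:
            break_time = t[i]
            break
    magnitude = np.median(resid[~hist])  # median residual in the monitoring period
    return break_time, magnitude


rng = np.random.default_rng(0)
t = 2009 + np.arange(12 * 23) / 23  # roughly 16-day observations, 2009-2020
ndmi = 0.6 + 0.05 * np.sin(2 * np.pi * t) + rng.normal(0, 0.01, t.size)
ndmi[t >= 2016] -= 0.15  # synthetic disturbance at the start of 2016
bt, mag = monitor(t, ndmi, start=2016)
print(bt, round(mag, 2))
```

Applied to every forest pixel, the break times would give a disturbance year map and the magnitudes a magnitude map comparable to figures 5.28 and 5.29.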

Figure 5.28: Occurrences of forest disturbances between 2009 and 2020

The magnitude map visualizes the NDMI change magnitude of the detected disturbances and is presented in figure 5.29. This figure also provides maps A, B and C, representing the same areas as in figure 5.28. These maps show that most of the detected areas experienced disturbance magnitudes between -0.05 and -0.15. Only a few areas experienced magnitudes below -0.15, and no areas with a magnitude below -0.20 were detected.

According to the statistics calculated in the R software environment, the smallest detected disturbed area was 0.54 ha and the largest 195.21 ha. These statistics are provided in appendix M. Inspecting the map in the ArcGIS Pro software shows that the total area disturbed between 2009 and 2020 is 3795.84 ha. This statistic is also included in appendix M.
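
The patch statistics in appendix M were computed in R. As an illustration of the underlying step, the Python sketch below groups disturbed pixels into connected patches and converts pixel counts to hectares. The 30 m cell size is an assumption (it matches Landsat); under it each pixel covers 0.09 ha, so the smallest patch of 0.54 ha would correspond to six pixels. The function name and the default 8-connectivity are illustrative choices.

```python
from collections import deque


def patch_areas_ha(mask, pixel_size_m=30.0, eight_connected=True):
    """Areas (ha) of connected patches of truthy cells in a 2-D grid, in scan order."""
    rows, cols = len(mask), len(mask[0])
    pixel_ha = pixel_size_m * pixel_size_m / 10_000.0  # 30 m x 30 m = 0.09 ha
    if eight_connected:
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # breadth-first flood fill of one disturbance patch
            count, queue = 0, deque([(r, c)])
            seen[r][c] = True
            while queue:
                cr, cc = queue.popleft()
                count += 1
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            areas.append(count * pixel_ha)
    return areas
```

The smallest, largest and total disturbed areas then follow as `min(areas)`, `max(areas)` and `sum(areas)`. Note that the connectivity rule changes the patch sizes but not the total.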

Figure 5.29: Magnitude of the forest disturbances between 2009 and 2020

6. Discussion

This chapter discusses the validity of the conducted research per research question, further discusses and interprets the obtained results, links these results to the literature discussed in the review chapter, and addresses the limitations and implications of this research as well as recommendations for future research.

RQ1: “What field work sampling strategy has to be applied for the collection of representative ‘ground truth’ samples given a limited amount of resources in a fragmented study area?”

The combination of the unsupervised classification and the expert assessment has proven to be valid, as it resulted in 14 different land cover types that were all encountered during the fieldwork campaign, although some additional land cover types were added during and after this campaign. The unsupervised classification identified clusters of similar pixels representing different land cover types throughout the whole Boé area, which makes the result representative of the whole population. Furthermore, this classification method can also be used in other studies and study areas, as it only requires a satellite image as input. The expert assessment is harder to apply in other study areas, because experts with sufficient knowledge of the landscape may not be available everywhere and may not always be willing to help the researcher identify different land cover types. The results of this type of assessment are also less reliable, because they are based on human judgement, which can be very subjective and may change over time.

For the fieldwork sampling, the initially proposed stratified random sampling design has not turned out to be valid, as this research has shown that this design is not suitable for fieldwork in the Boé due to time constraints and because it does not take local customs into account. For these reasons, this sampling design has been replaced by a purposive stratified sampling design, in which the samples were selected based on the judgement of a member of the CVV of the visited village. This replacement had not been anticipated, but it has nonetheless proven to be easier to implement in the Boé area, because it ensures easier access to the sample plots and also takes local customs into account. It has therefore also increased the efficiency of the actual sample collection during the fieldwork campaign.
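
For comparison with the purposive design that replaced it, the stratified random design can be sketched in a few lines: every pixel is assigned to a stratum (for example a cluster of the unsupervised classification), and a fixed number of locations is drawn at random, without replacement, within each stratum. The function name, the grid input and the seed are hypothetical; a real implementation would work on the classified raster and convert the selected cells to map coordinates.

```python
import random
from collections import defaultdict


def stratified_random_sample(labels, n_per_stratum, seed=42):
    """Draw up to n_per_stratum random cell positions from each stratum.

    labels: 2-D grid of stratum IDs (e.g. cluster numbers of an unsupervised
    classification). Returns {stratum: [(row, col), ...]}."""
    cells = defaultdict(list)
    for r, row in enumerate(labels):
        for c, stratum in enumerate(row):
            cells[stratum].append((r, c))
    rng = random.Random(seed)  # fixed seed so that the draw can be reproduced
    return {
        s: rng.sample(pos, min(n_per_stratum, len(pos)))  # sampling without replacement
        for s, pos in sorted(cells.items())
    }
```

Fixing the seed makes the draw repeatable, which is exactly the property the purposive design lacks; it does not address the accessibility problems that motivated the switch.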

The fieldwork campaign and the above-mentioned sampling design have led to the ‘basic’ sample collection discussed in the results chapter. A positive aspect of this sample collection is that the collected samples are evenly distributed over all the villages in the Boé that participate in Chimbo’s CVV program. In addition, manually adding the easily recognizable land cover types water bodies and built-up areas after the fieldwork campaign, based on the interpretation of the Sentinel-2 satellite image of 2020, has proven to be efficient because it saves time. It has also proven to be valid, because these land cover types obtained very high producer’s and user’s accuracies when validated by means of the confusion matrices later on in the research.

However, the implemented sampling design also has some limitations, because it has negatively affected the internal and external validity of the sampling method. With this type of design, the samples are no longer randomly distributed over the identified strata but are instead selected based on the judgement of the CVV member. Hence, it is uncertain whether the selected samples accurately represent the landscape of the whole area, and therefore they cannot be generalized to the whole population. The results are also less reliable, because if this sampling method were repeated, different sample locations would probably be selected, and there is also a high probability that another CVV member would accompany the person performing the fieldwork. Furthermore, this sampling design is difficult to implement in other areas where no system like the CVV system is in place. Finally, the selection of the sample plots by the CVV member was often based on convenience; as a result, some of the sample plots were located close to each other, and on days when the CVV member was busy with other matters, fewer samples than desired were collected.

The major limitations of the sample collection during the fieldwork campaign are that the initial goal of 50 samples per land cover type has not been met, that the resulting sample collection was not very balanced and that the total sampled area was relatively small. Another limitation is that savannah patches covered by a small amount of water have been identified and sampled as wetlands. In addition, many of the sampled woodland areas were very similar to the sampled savannah areas.

These limitations can be explained as follows. The goal of 50 samples turned out to be practically unattainable due to unanticipated time and cost constraints, as mentioned in the methodology and results chapters. The resulting sample collection was imbalanced because some land cover classes were not encountered as often as others during the field trips. In addition, the relatively small total sampled area can be attributed to the fact that actually sampling the plots was often very time-consuming. Especially for the more densely vegetated land cover types, like forests, it is not always easy to walk along the plots.

An implication of the small and imbalanced sample collection is that it may have affected the results of the classifications, in particular the SVM and RF classifications, since these classifiers are prone to overfitting and may therefore have been biased towards the classes with the most samples. This implication is also discussed by Millard and Richardson (2015). Hence, the results of the classifications ought to be interpreted with caution. The wetland sampling issue may be one of the main causes of the low user’s and producer’s accuracies of the wetland class in all the different classifications. It may also have led to the significant over-representation of the wetland class in the final classified Landsat 8 output of 2016. In addition, the similarity between the woodland and savannah samples may have led to the observed over-representation of the woodland class and to the large increase in woodland between 2016 and 2020.

One way to mitigate the issue of small and imbalanced sample collections in future research is to collect more samples in the same manner as the water body and built-up classes were sampled, i.e. by interpreting high-resolution images and manually drawing the samples in the training sample manager. This sampling method is also applied in the research of Mohammed (2019). Another way is to use external data, like cropland maps, as reference data for the collection of samples, as is done in the research of Useya et al. (2016). Finally, motorcycles instead of bicycles could be used to travel to the sample plots, which would increase efficiency.

RQ2: “What methods have to be used to establish a representative sample collection?”

The initial spectral assessment of the basic sample collection has proven to be efficient, because it resulted in clear spectral signature graphs that show both the similarity and the separability of the sampled land cover types. The results of this assessment have therefore provided a good indication of which land cover types needed to be merged. The spectral assessment method applied in this research is also a proven method that has been used in many studies, for example Singh et al. (2019). The merging of similar land cover types was performed to mitigate the aforementioned issue of the imbalanced sample collections. Although this method might be considered a simple form of oversampling, it has still proven to be valid, because the merged sample collections generated significantly more representative classifications than the basic sample collection, as validated by the sensitivity analysis of the Maximum Likelihood classifier. Furthermore, the merges also produced more separable land cover classes, as validated by the second spectral assessment. Hence, it can also be stated that the sensitivity analysis and the spectral assessment have proven to be adequate validation methods. The results of the sensitivity analysis have shown that adding ancillary data (a slope layer) to the Sentinel-2 bands improved the classification result. This result is in agreement with Enderle and Weih (2005), as mentioned in section 1.1.4. An unexpected result of the sensitivity analysis is that including buffers inside the samples has not provided more representative classification results. This is unexpected because Studer (2019) also included these buffers in his research, where they provided clearer signatures of his sample classes.
Another unexpected result of the sensitivity analysis is the over-representation of the built-up class in the eastern border region of the Boé. However, the sensitivity analysis has also shown that adding the barren land class partly solved this issue.

Unfortunately, the spectral assessments have shown that in this research, too, it has been difficult to differentiate cashew plantations from gallery forests: the calculated JM distance shows that none of the bands used in this research reached the threshold of 1.8 for this class combination. This limitation may have resulted in an incorrect representation of these two classes in the final classified output. This result is not in agreement with the results of Singh et al. (2019), because in that research SAR bands improved the separability of cashew plantations and forests.
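
The separability criterion can be made concrete. Assuming that each class's values in a band are normally distributed, the Bhattacharyya distance B between two classes gives the Jeffries-Matusita distance JM = 2(1 - e^(-B)), which ranges from 0 to 2 (some sources use the square-root form, ranging from 0 to √2; a threshold of 1.8 implies the 0 to 2 form). The Python sketch below computes a per-band JM distance from sample values. The function names and band labels are hypothetical, and the calculation used in this research may differ in detail.

```python
import math
from statistics import mean, variance


def jm_distance(a, b):
    """Jeffries-Matusita distance (0 to 2) between two classes in one band,
    assuming normally distributed values; each class needs non-zero variance."""
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)
    v = (v1 + v2) / 2  # pooled variance
    # univariate Bhattacharyya distance: mean-separation term + variance-difference term
    bhatt = (m1 - m2) ** 2 / (8 * v) + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return 2 * (1 - math.exp(-bhatt))


def separable_bands(class_a, class_b, threshold=1.8):
    """Bands (dict keys) in which the two classes reach the JM threshold."""
    return [
        band for band in class_a
        if band in class_b and jm_distance(class_a[band], class_b[band]) >= threshold
    ]
```

An empty result from `separable_bands` for the cashew plantation and gallery forest samples would correspond to the situation described above.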

One of the limitations of the merging method is that, although the merges have been based on the results of the spectral assessment, they might not be totally reliable, because they depend on the judgement of the person who merges the land cover types, which is again subjective and may be biased. This may also explain why the ‘expert’ and ‘researcher’ sample collections consisted of different merges of land cover types. In addition, the agriculture class might have been oversimplified by merging all the agricultural classes together and by also including the fallow land and developing cashew plantation classes in the merge. Because of these merges, the agriculture class also consisted of significantly more samples than the other classes, which is why the sample collection used for the classification is still somewhat imbalanced. A weakness of the sensitivity analysis is that its internal validity might have been affected by extra variables, such as the different selections of satellite bands and the inclusion of buffers, because these variables also heavily influenced the results of the classifications. It is therefore not entirely clear how much influence the merges have had on the results. Since this research prioritized the detection of deforestation and emphasized the differentiation of the cashew plantation and gallery forest classes, the aforementioned oversimplification of the agriculture class has been accepted. However, in future research it could be interesting to assess agricultural change dynamics in more detail and over a longer period of time. This can be achieved by using satellite data representing different seasons for time series analysis and by incorporating climate data, such as annual precipitation and annual temperature, in the assessment, as was done in the research of Useya et al. (2016).

RQ3: “In what way can image classification of satellite images best be employed to create extensive base maps with enough distinctiveness to differentiate between different land cover types?”

The implemented sensitivity analysis has been an iterative process in which three different MLA classifiers were tested in combination with different segmented images, sample collections and satellite images. This analysis has proven valid, because it clearly showed that certain classifiers produce more representative base maps than others, which was also confirmed by a visual comparison. Sensitivity analysis of a classifier is a proven method for measuring how strongly the result depends on certain parameters and has been used in other literature as well, for example by Millard and Richardson (2015), whose results are also comparable to those of the sensitivity analysis applied in this research. An advantage of this type of analysis is that it can easily be applied to test the sensitivity of other MLA classifiers. Furthermore, the results of this sensitivity analysis are reliable, because repeating it will result in the same classifications. In contrast, the results of the visual comparison might be less reliable, because they are based on human judgement; they may therefore be biased, and different people might arrive at different interpretations. In addition, only a small area was validated in the visual comparison due to time constraints, so the outcome may be less generalizable to the whole image. The sensitivity analysis of the different classifiers in this research has shown that the Maximum Likelihood classifier produced the least desirable and representative result, judged at face value. The tests with fewer wetland samples and without built-up area samples did not produce representative results either. Finally, the Sentinel-2 image of November 2019 was only classified out of curiosity, to see whether an image representing a different season really produces a significantly different classification.
Although the result was quite promising and quite different from the other classifications, it was not used in further analyses. One of the most striking findings of this sensitivity analysis is that the 2016 images were classified better when the classifier used the adjusted sample collection of 2016 than when a classifier trained for 2020 was used instead. This result is unexpected because Studer (2019) used a classifier trained for 2018 to classify an image of 2014. At first glance, all the classifications presented in the results chapter appeared to perform quite well, but on closer inspection they were all prone to many misclassifications and to over-representation of certain land cover classes. In particular, misclassified built-up areas in the eastern border region and misclassified water bodies south of the lake of Vendu Cham were observed in all the classifications. These water bodies were not encountered during the fieldwork campaign and were not observed in the high-resolution ESRI imagery, which is why they have been interpreted as misclassifications, even though these areas are known to remain covered by water for quite some time after the rainy season.

The visual comparison of the RF and SVM classifications has shown that these MLA classifiers produce significantly different results for the cashew plantation and gallery forest classes, and that the RF classifier produced a more realistic representation of the visually inspected area, because the SVM classification seemed to overestimate the area of cashew plantations.

One of the limitations of the visual comparison is that it is again based on human judgement, so the results may have been biased towards one of the classifiers. The main limitation of the classifications of the 2016 images has been the lack of training sample data from 2016. In this research, an adjusted sample collection based on the 2020 collection was created, but this is not ideal.

Hence, for the classification of older images, it is recommended to train the MLA classifiers with training samples collected in the year in which the image was captured. One way to achieve this might be to store all samples registered by students and Chimbo team members in a database, sorted by year, so that they can also be used in future research.

RQ4: “Does the classified output reflect an acceptable level of accuracy in order to be used as base maps, when assessed by means of a combination of quantitative and qualitative validation methods?”

The resulting classifications have been validated by comparing them with the validation samples, which resulted in the confusion matrices. The confusion matrix is a proven method and one of the most commonly used methods for determining the accuracy of classifications (Stehman, 2009). In this research, this type of accuracy assessment has also proven valid, because it clearly showed the accuracy of the different test results and also provided information on the classification accuracy of each land cover type. The method can also be used to test land cover classifications of other study areas and to test the results of different classifiers. However, its reliability is not completely guaranteed, because repeating it will produce slightly, though not significantly, different values. This is due to the random nature of the stratified random sample selection applied for this accuracy assessment. In addition, the results of a confusion matrix are not completely generalizable, because they are based only on the validation samples. Besides the confusion matrices, the visual validation has also proven adequate, because it was able to identify misclassifications and over- and underestimations of land cover types at a detailed level. However, this visual validation was again based on human judgement, and the results may thus be biased. Finally, the applied post-classification processing method has proven valid, because it successfully removed all the misclassified pixels of the built-up area and water body classes throughout the whole image of the Boé area, which was confirmed by an additional visual comparison. The results in the confusion matrices have shown that the Maximum Likelihood classifier did not perform well on the Sentinel-2 and Landsat 8 images, which is why this classifier was not used for the final classification.
The Support Vector Machines classifier obtained the highest overall accuracy and kappa coefficient in most tests, although these values are not significantly higher than those of the Random Forest classifier. The Random Forest classifier, on the other hand, obtained the highest user’s accuracies for the cashew plantation and gallery forest classes, which have been the most important classes in this research. This last result is consistent with the visual comparison of the RF and SVM classifiers, which showed that the Random Forest classifier distinguished cashew plantations from gallery forests more accurately and produced a more realistic representation of these two classes. The confusion matrices have also clearly shown that all the classifiers had difficulties classifying the wetland class, as this class had the lowest producer’s and user’s accuracies in all the confusion matrices. This may again be explained by the incorrect sampling of the wetland land cover type during the fieldwork campaign.
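
The accuracy measures discussed here follow directly from the confusion matrix. As a sketch, assuming that rows hold the mapped (classified) class and columns the reference class of each validation sample, the user's accuracy is computed per row, the producer's accuracy per column, and Cohen's kappa compares the observed agreement with the agreement expected by chance from the row and column totals. The function name is hypothetical, and other tools may transpose this layout.

```python
def accuracy_report(matrix):
    """Accuracy measures from a square confusion matrix.

    matrix[i][j] = number of validation samples mapped as class i whose
    reference (ground truth) class is j."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]  # totals per mapped class
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # per reference class
    diag = [matrix[i][i] for i in range(k)]
    overall = sum(diag) / n
    users = [d / r if r else float("nan") for d, r in zip(diag, row_tot)]
    producers = [d / c if c else float("nan") for d, c in zip(diag, col_tot)]
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)  # chance agreement
    kappa = (overall - expected) / (1 - expected)
    return {"overall": overall, "users": users, "producers": producers, "kappa": kappa}
```

A class such as wetland with both low user's and low producer's accuracy is one that is confused in both directions: samples of other classes are mapped as wetland, and wetland samples are mapped as other classes.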

Although the Random Forest classifier produces the most representative classifications, the visual validation has highlighted that this classifier still overestimated and misclassified some of the land cover types in the classified image; in particular, the agricultural, secondary dry forest and woodland classes were over-represented in areas that appear to be savannahs in the high-resolution ESRI imagery. Furthermore, the visual validation has highlighted that most of the classified wetland areas are actually savannah patches with poor drainage, where water can quickly accumulate. This may explain why the wetlands have been overestimated and misclassified in the final RF classifications. Nevertheless, the RF classifications of both satellite images still produced satisfactory results, and both were therefore used in the change detection analysis.

The post-classification processing successfully filtered out all the misclassifications of the built-up class, while the actual built-up areas identified in the ESRI imagery remained unaltered. Hence, it is safe to assume that the post-classification processing improved the final output. The most striking finding from the land cover distribution of the final classified output of 2020 is that the two forest types combined cover 46% of the Boé area. This is not very plausible, because the percentage of forest is not even that high in the old land cover maps made by the Portuguese during the colonial era. According to this land cover distribution, 72% of the Dulombi National Park area consists of gallery forest. This percentage is again improbably high and may be explained by the fact that no samples were collected in this area, which made it difficult for the classifier to classify it. Another remarkable finding is that cashew plantations occupy only 1% of the total land cover, and 2% in some regions. This result is not very likely either, because it does not agree with the observations made during the fieldwork campaign. Due to these unlikely results, this land cover distribution and the classification ought to be interpreted with caution.

As mentioned earlier, one of the limitations of the quantitative accuracy assessment is that it is difficult to determine the accuracy of the entire classified image, as the assessment is based only on the validation samples. The visual validation was applied to provide additional information on accuracy for the areas of the map not covered by the validation samples, but it was only applied to the CheChe village area, because visually validating the entire Boé area would have been practically unattainable. Moreover, the visual validation was based on the ESRI imagery, and even though this imagery has a high resolution, it is still difficult to differentiate certain classes from each other, such as the different forest classes or the savannah and woodland classes. Furthermore, the imagery is from 2018, so the land cover it shows is not fully comparable with the land cover in 2016 and 2020. In future research it might therefore be interesting to use other sources of high-resolution imagery for the visual validation, such as UAV (drone) images. Drones could, for example, be brought along during the field trips to capture the sampled areas from the air.

RQ5: “What land cover changes are found by comparing the classifications of 2016 and 2020?”

Although the implemented visual comparison of classification results and the comparison of land cover distributions have provided useful information about some of the changes in land cover, they have not provided information on the dynamics behind these changes and have therefore not proven fully adequate. The implemented post-classification change detection technique has proven more valid, because it clearly showed the differences in land cover between the two satellite images of 2016 and 2020, and the resulting change detection matrix clearly indicated the dynamics behind these changes. This is a proven technique that has been used in many studies, for example Useya et al. (2016). It will also provide the same results when the analysis is repeated, and it has shown the changes throughout the whole Boé area, so the results are also generalizable. Additionally, the Sankey diagram has proven to be a very adequate way to visualize the dynamics of the land cover classes, and the change map has proven to be an adequate way to visualize the exact locations of the measured land cover changes between 2016 and 2020. The results in the change detection matrix show that the total area of gallery forest decreased by approximately 70 km², which equals 7000 ha and implies that about 10% of the total area of gallery forest in 2016 was lost in four years. This is a very high percentage. However, the change detection matrix also indicates that approximately 105 km² of the gallery forest class transitioned to the secondary dry forest class and approximately 75 km² of the secondary dry forest class to the gallery forest class, so this high percentage of gallery forest loss may also be partially attributed to confusion between these two classes in the 2016 and 2020 classifications. Nevertheless, about 31 km² of the gallery forest loss went to the cashew plantation class according to this analysis, which is in line with the findings of Temudo & Abrantes (2014), who stated that cashew plantations in most cases replace the original forest trees.
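
Post-classification change detection amounts to cross-tabulating two co-registered classified rasters. The sketch below counts each (2016 class, 2020 class) pair and converts the counts to km². The default pixel area of 0.0009 km² assumes 30 m Landsat 8 cells, and the function names are hypothetical. The diagonal holds stable pixels, and the net change of a class is its 2020 total minus its 2016 total, so an exchange between two classes (such as the 105 km² and 75 km² transitions between gallery forest and secondary dry forest) only contributes its difference to the net figure.

```python
from collections import Counter


def change_matrix(before, after, pixel_area_km2=0.0009):
    """Post-classification change matrix: area (km^2) per (class_before, class_after).

    before/after: equally sized 2-D grids of class labels, assumed co-registered.
    The default pixel area assumes 30 m cells (Landsat 8)."""
    counts = Counter(
        (b, a)
        for row_b, row_a in zip(before, after)
        for b, a in zip(row_b, row_a)
    )
    return {pair: n * pixel_area_km2 for pair, n in counts.items()}


def net_change(matrix):
    """Net area change per class: total after minus total before."""
    net = Counter()
    for (b, a), area in matrix.items():
        net[b] -= area  # area leaving class b (or staying, if b == a)
        net[a] += area  # area arriving in class a
    return dict(net)
```

Because every pixel pair is counted, any residual misregistration between the two rasters shows up directly as spurious off-diagonal transitions.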

Surprisingly, the secondary dry forest class increased by about 64 km², and only 4 km² of this class transitioned to cashew plantations. This is very unlikely, because not only gallery forests but also dry forests are being replaced by cashew plantations. Some other implausible conversions have also been observed in this change detection matrix: about 106 km² of the agricultural class transitioned to secondary dry forest and about 95 km² to gallery forest. The latter conversion in particular is very unlikely, because agricultural fields cannot turn into gallery forest in only four years. Finally, the change detection matrix revealed that the cashew plantation class increased by about 23 km², most of which is accounted for by the aforementioned loss of gallery forest.

The change map has highlighted that, especially in the central region of the Boé area, large forest patches have been converted to agricultural fields. Furthermore, in the region near the village of Colebe and in the Cuntabani region, large areas have also been observed in which gallery forest has transitioned into other land cover types.

One of the main limitations of the change detection analysis is that the Sentinel-2 images of 2016 and 2020 have not been used because of an alignment issue. This is unfortunate, because the classifications of the Sentinel-2 images obtained higher overall accuracies and kappa coefficients than the classifications of the Landsat 8 images, and because these classifications are more detailed due to the higher spatial resolution of the Sentinel-2 imagery. The alignment issue is illustrated in appendix I. Although the misalignment might seem small in this image, some features are actually offset by 20 to 30 metres, which significantly affects the change detection analysis. Although it is not completely clear what caused this distortion, it might be caused by the difference in processing level between the Sentinel-2 images of 2016 (L1C) and 2020 (L2A), or by external factors such as differences in atmospheric refraction. To solve this issue, an attempt was made to georeference the images using the ground control points illustrated in appendix I. However, this approach did not significantly improve the alignment of the images. Hence, the results of the change detection of the Sentinel-2 classifications have not been presented in the results chapter but are only included in appendices J and K.
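
A first-order (affine) transformation is a common choice when georeferencing with ground control points. The sketch below fits such a transformation by least squares to GCP pairs and reports the root-mean-square error (RMSE) at the GCPs. The function names are hypothetical, and this is not the georeferencing tool used in this research. A uniform shift of 20 to 30 m is exactly representable by an affine transformation, so if an affine fit leaves large residuals, the distortion may be local rather than a simple offset. For real UTM coordinates, subtracting a local origin first keeps the normal equations well conditioned.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _lstsq3(rows, target):
    """Least-squares coefficients for target ~ c0*r[0] + c1*r[1] + c2*r[2] (Cramer's rule on X'X)."""
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(3)]
    d = _det3(xtx)
    coeffs = []
    for k in range(3):
        mk = [row[:] for row in xtx]
        for i in range(3):
            mk[i][k] = xty[i]  # replace column k by the right-hand side
        coeffs.append(_det3(mk) / d)
    return coeffs


def fit_affine(src, dst):
    """Fit x' = a0 + a1*x + a2*y and y' = b0 + b1*x + b2*y to GCP pairs.

    src, dst: lists of (x, y) coordinates (at least three, not collinear).
    Returns (a, b, rmse), with rmse the residual error at the GCPs."""
    rows = [(1.0, x, y) for x, y in src]
    a = _lstsq3(rows, [u for u, _ in dst])
    b = _lstsq3(rows, [v for _, v in dst])
    sq = 0.0
    for (x, y), (u, v) in zip(src, dst):
        pu = a[0] + a[1] * x + a[2] * y
        pv = b[0] + b[1] * x + b[2] * y
        sq += (pu - u) ** 2 + (pv - v) ** 2
    return a, b, math.sqrt(sq / len(src))
```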

Another limitation of the implemented change detection analysis is that it depends on the quality of the 2016 and 2020 classifications and is therefore also heavily affected by the misclassifications and over-representations in these classifications. Furthermore, comparing only two images is limited in that each image represents a snapshot of the land cover at a single moment in time and provides no information on seasonal changes in land cover.

Due to this last limitation, the changes might appear to be abrupt, with one land cover type completely replacing another, while they might actually be more subtle changes in which the spectral characteristics of a land cover type changed slightly without the type being replaced by another. Hence, in future research it may be interesting to use more images, including images representing different seasons, for the classification and change detection analysis, to capture the dynamics behind more subtle land cover changes more adequately and to better identify trends in land cover.

RQ6: “Are the Hansen dataset and BFAST algorithm suitable for long-term forest monitoring in a fragmented forest landscape like the Boé?”

The analyses involving the Hansen dataset and the BFAST forest monitor have both proven suitable for forest monitoring in the Boé area, as they were able to highlight the areas in the Boé where forest loss and disturbances occurred and also indicated the timing of these losses and disturbances. Both forest monitoring methods are proven methods that have been used in many other studies, for example Verbesselt et al. (2012) and Vieilledent et al. (2018). The results of the Hansen dataset have also been validated in this research by the visual comparison with the change map. Unfortunately, the results of the BFAST monitor have not been validated due to time constraints. The Hansen dataset is a global dataset, so the results are highly generalizable and the dataset can also be applied in other studies and study areas. The BFAST monitoring approach can also be applied to other areas, as it only requires satellite images as input. The results are also quite reliable, because repeating the analyses will provide the same results.

A positive finding from the analysis involving the Hansen dataset is the low extent of forest loss in the Boé area compared with surrounding areas. Forest loss is much higher especially in Guinea and in the areas further west in Guinea-Bissau. Although this can partly be explained by the forest cover in the Boé area in 2000 being less dense than in these other areas, some striking differences remain. The analysis has shown that over the past four years the neighboring Wendou M’Bour sub-prefecture in Guinea, which is only about half the size of the Boé area, lost about 3.2 times as much forest as the Boé area. This difference might be explained by the presence of bauxite mines in the Wendou M’Bour sub-prefecture. Another positive result is that forest loss in the Boé National Park is much lower than in the Dulombi National Park, which might indicate that the Boé National Park is better protected against logging. However, the most distressing result of this analysis is the almost complete absence of locations with forest gain between 2001 and 2019 in the Boé area and its surroundings, which may help explain why the Western Chimpanzee is critically endangered, as also noted by the IUCN SSC Primate Specialist Group (2020).
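
Because Wendou M’Bour is only about half the size of the Boé area, the 3.2-fold difference in absolute loss understates the difference in loss per unit area. Writing $L$ for forest loss and $A$ for area (subscript $W$ for Wendou M’Bour, $B$ for the Boé) and taking the approximate ratios above at face value, the loss per hectare in Wendou M’Bour would be roughly six times that in the Boé:

$$
\frac{L_W / A_W}{L_B / A_B} \;=\; \frac{L_W}{L_B}\cdot\frac{A_B}{A_W} \;\approx\; 3.2 \times 2 \;=\; 6.4
$$

Since both input ratios are rounded, this figure is only indicative.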

The visual comparison of the Hansen dataset and the change map has shown good agreement between the results of these analyses, as most of the forest loss locations in the Hansen dataset coincide with the forest loss locations in the change map.
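
The visual comparison could be complemented by a simple pixel-level agreement measure. The sketch below assumes that the Hansen loss pixels and the forest-loss transitions of the change map have already been resampled to one common grid and converted to boolean masks; in practice this resampling, and small misregistrations between the two products, would lower the apparent agreement. The three ratios are generic overlap measures chosen for illustration, not measures used in this study.

```python
import numpy as np


def loss_agreement(hansen_loss, changemap_loss):
    """Overlap between two boolean forest-loss masks on the same grid.

    Raises ZeroDivisionError if either mask contains no loss pixels.
    """
    a = np.asarray(hansen_loss, dtype=bool)
    b = np.asarray(changemap_loss, dtype=bool)
    both = np.count_nonzero(a & b)
    return {
        # share of Hansen loss pixels that the change map also marks as loss
        "hansen_in_change_map": both / np.count_nonzero(a),
        # share of change-map loss pixels that Hansen also marks as loss
        "change_map_in_hansen": both / np.count_nonzero(b),
        # intersection over union (Jaccard index)
        "jaccard": both / np.count_nonzero(a | b),
    }
```

The two directional shares are kept separate because a high value in one direction and a low value in the other points to one product systematically mapping more loss than the other.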

The yearly forest loss charts of the Hansen dataset show a marked increase in forest loss from 2013 onwards in the Boé area and its subregions, which may partly be explained by the very high price of cashew (nuts) in 2012. After 2013, most areas experienced a peak in 2015, after which forest loss decreased slightly; in almost all areas, however, it increased again in 2019 compared to 2018, which again followed a year in which the cashew price was very high. Future research could therefore compare yearly forest loss with yearly cashew prices.
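
Such a comparison could start with a correlation between yearly forest loss and the cashew price of the preceding year, since the text links loss peaks to high prices in the year before. The sketch below is only an illustration: the inputs are hypothetical dictionaries of year to value, no cashew price series was collected in this study, and with fewer than twenty annual values and upward trends in both series, a high correlation would be suggestive at best. Correlating year-on-year differences instead of raw values would be a sensible robustness check.

```python
import numpy as np


def lagged_correlation(loss_by_year, price_by_year, lag=1):
    """Pearson r between forest loss in year y and the price in year y - lag.

    Both arguments map year -> value; only years available in both series
    (after applying the lag) are used.
    """
    pairs = [(loss_by_year[y], price_by_year[y - lag])
             for y in sorted(loss_by_year) if (y - lag) in price_by_year]
    if len(pairs) < 3:
        raise ValueError("need at least three overlapping years")
    loss, price = np.array(pairs, dtype=float).T
    return float(np.corrcoef(loss, price)[0, 1])
```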

The BFAST forest monitoring method has highlighted fewer areas with forest disturbances; compared with the Hansen dataset, both fewer and different disturbance areas have been observed. This may be attributed to the fact that only disturbance areas of at least 0.5 ha with a disturbance magnitude lower than -0.05 are depicted, and that the depicted disturbances cover 2009 to 2020 rather than 2001 to 2019. The forest disturbance events highlighted in sub-map A of the disturbance year and magnitude maps may be explained by the relocation of the village of Tabadara in this region northwards, closer to the Corubal river, for which forests may have been logged. Furthermore, the disturbance map shows that disturbances occurred predominantly between 2009 and 2017 and not in the most recent years, 2018 to 2020. This may be attributed to the fact that the monitoring period started in 2009: as DeVries et al. (2015) have also stated, the results of a longer monitoring period may be affected by the larger number of observations before and after the change event.
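
BFAST Monitor (Verbesselt et al., 2012) fits a season-trend model to a stable history period, tracks a moving sum (MOSUM) of the residuals in the monitoring period and reports the first time this process crosses a boundary; the magnitude is the median residual over the monitoring period. The NumPy sketch below illustrates that idea for a single pixel and is not the bfast R package: it uses one harmonic, a constant threshold instead of the package's time-dependent boundary, no automatic selection of a stable history, and its parameter values (h = 0.25, threshold = 2.0) are assumptions. Like bfastmonitor, it keeps only the first break, so a pixel flagged early in a long monitoring period cannot also register a later disturbance, which may be one more reason why recent disturbances appear under-represented.

```python
import numpy as np


def design_matrix(t, order=1):
    """Intercept, linear trend and `order` annual harmonic pairs (t in decimal years)."""
    cols = [np.ones_like(t), t]
    for k in range(1, order + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)


def monitor_first_break(t, y, start, h=0.25, threshold=2.0, order=1):
    """Simplified BFAST Monitor-style check for one pixel (t sorted, decimal years).

    Returns (break_time or None, magnitude). The constant threshold is an
    assumption, unlike the time-dependent boundary in the bfast package.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    hist = t < start                      # history period, assumed to be stable
    X_hist = design_matrix(t[hist], order)
    beta, *_ = np.linalg.lstsq(X_hist, y[hist], rcond=None)

    resid = y - design_matrix(t, order) @ beta
    n = int(hist.sum())
    sigma = np.std(resid[hist], ddof=X_hist.shape[1])
    window = max(1, int(h * n))           # MOSUM bandwidth as a fraction of the history

    # Moving sums via cumulative sums; windows early in the monitoring
    # period reach back into the history period.
    csum = np.concatenate(([0.0], np.cumsum(resid)))
    break_time = None
    for i in np.flatnonzero(~hist):
        lo = max(0, i + 1 - window)
        mosum = (csum[i + 1] - csum[lo]) / (sigma * np.sqrt(n))
        if abs(mosum) > threshold:
            break_time = float(t[i])      # only the first crossing is kept
            break

    # Magnitude: median residual over the whole monitoring period.
    magnitude = float(np.median(resid[~hist]))
    return break_time, magnitude


# Synthetic monthly series 2000-2011 with a simulated drop of 0.3 from t = 2009.75.
t = 2000 + np.arange(144) / 12
wiggle = 0.01 * np.sin(np.arange(144) * 2.39)  # deterministic pseudo-noise
y = 0.8 + 0.1 * np.sin(2 * np.pi * t) + wiggle
y[t >= 2009.75] -= 0.3
print(monitor_first_break(t, y, start=2009.0))
```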

Lastly, the magnitude map shows that most disturbances have magnitudes between -0.05 and -0.15. Some disturbances have magnitudes below -0.15 but none below -0.20, which is plausible because, according to DeVries et al. (2015), a magnitude of -0.18 already indicates a complete forest clearance event. However, this finding was still unexpected, because complete forest clearance events would be expected to have occurred in some areas between 2009 and 2020.
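
The map filtering described above (patches of at least 0.5 ha with a magnitude below -0.05) can be sketched as a post-processing step on the per-pixel BFAST output. The sketch assumes a boolean break raster and a magnitude raster on the same grid, a constant pixel area (about 0.09 ha for a 30 m Landsat pixel) and 8-connected patches found with scipy.ndimage.label; the connectivity rule is an assumption, not necessarily the one used in this study.

```python
import numpy as np
from scipy import ndimage


def filter_disturbances(magnitude, has_break, pixel_area_ha=0.09,
                        max_magnitude=-0.05, min_area_ha=0.5):
    """Keep break pixels with magnitude below max_magnitude that form
    8-connected patches of at least min_area_ha."""
    candidate = np.asarray(has_break, dtype=bool) & (np.asarray(magnitude) < max_magnitude)
    labels, _ = ndimage.label(candidate, structure=np.ones((3, 3), dtype=int))
    patch_area = np.bincount(labels.ravel()) * pixel_area_ha  # area per patch label
    keep = patch_area >= min_area_ha
    keep[0] = False                                           # label 0 is background
    return keep[labels]
```

With 0.09 ha pixels, a patch needs at least six pixels to pass the 0.5 ha threshold.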

One limitation of both forest monitoring systems is that the results say nothing about the underlying reasons for the measured forest loss. Future research should therefore assess the results together with experts who know more about these underlying reasons, or train these experts to conduct the analyses themselves. Another limitation is that the two analyses give different estimates of the amount of forest lost or disturbed. According to the Hansen dataset, 8840 ha of forest was lost in the Boé area between 2013 and 2020, whereas according to the BFAST monitoring method only 3795.84 ha of forest was disturbed between 2009 and 2020. As mentioned before, the BFAST results have not been validated, so it is advisable to apply a validation method to them. Furthermore, it is advisable to set the monitoring period to one year instead of 11 years, because the disturbances within that year will then be measured better. This process can then be iterated for each year to obtain more accurate results per year.
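
The recommended one-year monitoring periods could be generated as below. This is a minimal sketch assuming observation dates in decimal years; each (history, monitoring) pair would then be passed to a BFAST Monitor implementation, for example bfastmonitor in the R bfast package with start set to the monitoring year and the series truncated at the end of that year. Because earlier disturbances then fall inside the history period, a stable-history selection such as bfastmonitor's reverse-ordered CUSUM option (history = "ROC") becomes important.

```python
import numpy as np


def yearly_monitoring_windows(t, first_year, last_year):
    """Yield (year, history_mask, monitor_mask) for one-year monitoring periods.

    history_mask selects all observations before `year`; monitor_mask selects
    observations in [year, year + 1). t holds observation dates in decimal years.
    """
    t = np.asarray(t, dtype=float)
    for year in range(first_year, last_year + 1):
        history = t < year
        monitor = (t >= year) & (t < year + 1)
        yield year, history, monitor


# Monthly dates 2009-2011, monitored year by year for 2010 and 2011.
t = 2009 + np.arange(36) / 12
for year, hist, mon in yearly_monitoring_windows(t, 2010, 2011):
    print(year, int(hist.sum()), int(mon.sum()))
```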

Nevertheless, the forest loss and disturbances measured by both forest monitoring systems imply that the habitat of the Chimpanzees is decreasing. Furthermore, the Boé area has been compared with the Wendou M’Bour area, where some bauxite mines are located; these mines may explain the higher extent of forest loss there. Hence, a possible arrival of mines in the Boé area could further reduce the habitat of the Chimpanzees.

7. Conclusion

In this research, the aim has been to “identify the main spatiotemporal changes in land cover in the Boé in Guinea-Bissau with a particular focus on forest loss”. Besides this overall objective, the research has pursued two sub-objectives. The first sub-objective was to create extensive classified base maps of the land cover in 2016 and 2020 that allow for a change detection analysis. For this sub-objective, a fieldwork campaign was carried out to collect representative samples, which were later used to classify the land cover in the Boé area in 2016 and 2020. Finally, these classifications were compared with each other to identify spatiotemporal changes in land cover over the past four years.

Although the initially proposed stratified random sampling design turned out to be unsuitable for fieldwork in the Boé area, the fieldwork campaign still yielded a representative collection of training and validation samples, gathered in most of the villages participating in Chimbo’s CVV program throughout the area. An initial spectral assessment showed that, even with these representative samples, it remained difficult to differentiate between gallery forest and cashew plantations. Additionally, the initial sample collection turned out to be imbalanced, which led to the decision to merge some similar land cover classes. These merges produced more separable sample classes and improved the classification results, as confirmed by a sensitivity analysis of the Maximum Likelihood classifier. This sensitivity analysis also showed that a combination of satellite bands, radar bands, vegetation indices and an additional slope layer works best as input data for the classification.
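
Combining these inputs amounts to stacking all layers into one feature vector per pixel. The sketch below assumes Landsat 8 OLI band numbering (B4 = red, B5 = near-infrared), a hypothetical radar layer named VV, and NDVI as the only vegetation index; which indices and radar polarisations were actually used is not reproduced here. All layers are assumed to be co-registered arrays of the same shape.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index; 0 where nir + red == 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)


def stack_features(optical, radar, slope):
    """Build an (n_pixels, n_features) matrix from dicts of 2-D layers.

    optical must contain "B4" (red) and "B5" (NIR); radar keys are free-form.
    Returns the matrix and the feature names in column order.
    """
    layers = dict(optical)
    layers.update(radar)
    layers["NDVI"] = ndvi(optical["B5"], optical["B4"])
    layers["slope"] = slope
    names = list(layers)
    X = np.stack([np.asarray(layers[n], dtype=float).ravel() for n in names], axis=1)
    return X, names
```

Keeping the feature names alongside the matrix makes it straightforward to drop layer groups one at a time, which is how a sensitivity analysis of input combinations is usually organised.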

An additional sensitivity analysis of three machine learning classifiers demonstrated that the Random Forest classifier provides the best classification for this study area. Landsat 8 images were used for the final classification, as the Sentinel-2 images of 2016 and 2020 could not be compared due to an alignment issue. The Random Forest classifications of the Landsat 8 images achieved overall accuracies of 70% for 2020 and 69% for 2016, with higher user’s accuracies for the gallery forest and cashew plantation classes. A visual validation of the final classifications revealed that they are representative of the land cover in the Boé, although they are not free of misclassifications and over-representations of certain land cover types. According to the 2020 classification, secondary dry forest and gallery forest together make up 46% of the total land cover in the Boé area and cashew plantations 1%. At the same time, the change detection analysis has shown that approximately 70 km² of gallery forest was lost between 2016 and 2020, a decrease of about 10%, and that the total area of cashew plantations increased by approximately 23 km² over the same period, an increase of about 108%.
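
The overall and user's accuracies quoted above follow from the confusion matrix of the validation samples. The sketch below uses the convention of Congalton (1991), with rows holding the classified (map) labels and columns the reference labels; other tools may transpose this, so the axis convention should be checked before reusing the function. The toy matrix is invented for illustration and is not data from this study.

```python
import numpy as np


def accuracy_report(confusion):
    """Overall, user's and producer's accuracy from a confusion matrix.

    Rows = classified (map) labels, columns = reference labels.
    """
    cm = np.asarray(confusion, dtype=float)
    overall = float(np.trace(cm) / cm.sum())
    users = np.diag(cm) / cm.sum(axis=1)      # correct / all samples mapped as the class
    producers = np.diag(cm) / cm.sum(axis=0)  # correct / all reference samples of the class
    return overall, users, producers


# Invented two-class example (e.g. forest vs. cashew), not results of this study.
overall, users, producers = accuracy_report([[8, 2], [1, 9]])
```

A class can combine a high user's accuracy with a low producer's accuracy when it is mapped conservatively, so both are worth reporting next to the overall accuracy.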

The second sub-objective was to explore the feasibility of different multi-temporal forest monitoring approaches for the Boé area. For this sub-objective, two established forest monitoring approaches were implemented: the Hansen dataset and the BFAST forest monitoring system. The Hansen dataset has shown that the extent of forest loss in the Boé area is considerably lower than in some of its surrounding areas in Guinea-Bissau and Guinea. The comparison of forest loss in the Boé area with the neighboring sub-prefecture of Wendou M’Bour in Guinea showed that Wendou M’Bour experienced 3.2 times as much forest loss as the Boé area between 2016 and 2020. The Hansen dataset also indicated that Dulombi National Park experienced more forest loss (696 ha) than Boé National Park (420 ha).

The BFAST monitoring method has shown that most of the measured forest disturbance events occurred between 2009 and 2017 and not in the most recent years, 2018 to 2020. This method has also revealed that 3795.84 ha of the forest area in the Boé was disturbed between 2009 and 2020.

Hence, it can be concluded that the analyses for both sub-objectives have measured (gallery) forest loss. As these forests are the prime habitat of the Chimpanzees in the Boé, it can be concluded that this habitat is decreasing. Finally, an increasing trend in the area of cashew plantations has been measured, which is related to this habitat loss.

7.1 Added value and potential applications

This study has built on a study by Studer (2019), who was unable to provide reliable change detection results, mainly because of the difficulty of differentiating cashew plantations from forests. The present research has therefore provided added value by testing different MLAs and by using different satellite datasets. It has also proven somewhat more adequate in differentiating these land cover types and has therefore been able to present realistic results that are in line with the expected forest loss and increase in cashew plantations. This research can easily be repeated, because all the data used are free of charge and most of the analyses were performed with fairly straightforward tools in ArcGIS Pro and Google Earth Engine, although the fieldwork campaign was quite intensive. This type of research can therefore also be applied to other land cover assessments and to other areas with highly fragmented landscapes like the Boé, and can thus support land management practices in other African countries as well.

The Hansen dataset has been used before in the context of the Boé area in the consultancy report on carbon credits by van Gilst et al. (2019). However, the present research has provided added value by presenting forest loss per subregion of the Boé area and per year, and by comparing the extent of forest loss with a neighboring region in Guinea. Furthermore, the results from the Hansen dataset have been compared with the change detection results to validate their accuracy. Again, the data used for this forest monitoring system are free of charge and the method is easy and time-efficient to implement, with the caveat that some knowledge of JavaScript and R scripting is required. This method can therefore effectively be used by organizations and governments with limited funds to assess forest loss per year and over longer periods of time.
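
Forest loss per subregion and per year can be computed as a zonal sum over the Hansen lossyear band. The text mentions JavaScript (Earth Engine) and R scripting; the NumPy sketch below is not that code but mirrors the logic on arrays assumed to be exported to a common grid: a rasterised zone map with hypothetical integer ids, the lossyear band (0 = no loss, k = loss in year 2000 + k), a boolean forest-in-2000 mask (for example treecover2000 above a canopy threshold, whose value is an assumption here) and a constant pixel area. In Earth Engine itself, per-pixel areas would normally come from ee.Image.pixelArea().

```python
import numpy as np


def loss_by_zone_and_year(zones, lossyear, forest2000, pixel_area_ha, n_zones, n_years):
    """Forest loss area (ha) per zone and per year as an (n_zones, n_years) table.

    zones:      integer zone id per pixel, 0 .. n_zones - 1 (hypothetical coding)
    lossyear:   Hansen-style code, 0 = no loss, k = loss in year 2000 + k
    forest2000: boolean mask of pixels counted as forest in 2000
    Column j of the result corresponds to the year 2001 + j.
    """
    zones = np.asarray(zones, dtype=np.int64).ravel()
    lossyear = np.asarray(lossyear, dtype=np.int64).ravel()
    if lossyear.max() > n_years:
        raise ValueError("lossyear contains codes beyond n_years")
    valid = np.asarray(forest2000, dtype=bool).ravel() & (lossyear > 0)
    # One flat index per (zone, year) cell, so a single bincount fills the table.
    idx = zones[valid] * n_years + (lossyear[valid] - 1)
    counts = np.bincount(idx, minlength=n_zones * n_years)
    return counts.reshape(n_zones, n_years) * pixel_area_ha
```

Row sums of the table give total loss per subregion, and column sums give the yearly loss curve for the whole area.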

References

Al-doski, J., Mansor, S. B. and Shafri, H. Z. M. (2013) ‘Change Detection Process and Techniques’, Civil and Environmental Research, 3(10), p. 37-45.

Andrieu, J. (2018) ‘Land cover changes on the West-African coastline from the Saloum Delta (Senegal) to Rio Geba (Guinea-Bissau) between 1979 and 2015’, European Journal of Remote Sensing, 51(1), p. 314-325, doi:10.1080/22797254.2018.1432295

Avtar, R., Suzuki, R. and Sawada, H. (2014) ‘Natural Forest Biomass Estimation Based on Plantation Information Using PALSAR Data’, PLoS ONE. Edited by S. Jose. Public Library of Science, 9(1), p. e86121. doi: 10.1371/journal.pone.0086121.

Aweto, A. O. and Ishola, M. A. (1994) ‘The Impact of Cashew (Anacardium Occidentale) on Forest Soil’, Experimental Agriculture. Cambridge University Press, 30(3), pp. 337–341. doi: 10.1017/S0014479700024443.

Biai, J. C. M. et al. (2019), 'Strategy and national action plan for the biodiversity', Technical report, The state’s general office of the environment; The Republic of Guinea-Bissau

Breider, M. J. et al. (2016) 'Recent records of wild cats in the Boé sector of Guinea Bissau', Published by IUCN/SSC Cat Specialist Group, CATnews, 63, p. 15-17

Brownlee, J. (2016) 'Parametric and Nonparametric Machine Learning Algorithms', viewed 2 May 2020, https://machinelearningmastery.com/parametric-and-nonparametric-machine-learning-algorithms/

Cabral, A. & Costa, F. (2017), 'Land cover changes and landscape pattern dynamics in Senegal and Guinea Bissau borderland', Applied Geography, 82. doi: 10.1016/j.apgeog.2017.03.010.

Cabuy, T. (2014) 'A survey of reptiles and amphibians in the Boé region, Guinea-Bissau', Chimbo Foundation, Kesteren, The Netherlands & KU Leuven, Leuven, Belgium.

Campbell, J. B. (2002) Introduction to Remote Sensing, 3rd ed. Guilford Press, New York. 621p

Cassama, V. (2019), Proposed forest reference emission level for the national system of protected areas of Guinea-Bissau, Technical report, Ministry of Environment and Sustainable Development; The Republic of Guinea-Bissau

Chimbo (2017) 'In Boé (Guinea Bissau) people believe in the spirits of the forest: this protects the environment', viewed 1 May 2020, http://chimbo.org/?p=1034&lang=en

Chimbo (2018a) 'Guinee-Bissau', viewed 12 October 2019, http://chimbo.org/?page_id=44&lang=en

Chimbo (2018b) 'Boé', viewed 15 October 2019, http://chimbo.org/?page_id=44&lang=en

Chimbo (2018c) 'Other nature reserves', viewed 15 October 2019, http://chimbo.org/?page_id=46&lang=en

Chimbo (2018d) 'Chimpanzees in the Boé', viewed 16 October 2019, http://chimbo.org/?page_id=39&lang=en

Chimbo (2018e) 'Threats in the Boé', viewed 14 October 2019, http://chimbo.org/?page_id=62&lang=en

Chimbo (2019a) 'Chimbo', viewed 1 May 2020, http://chimbo.org/?page_id=30

Chimbo (2019b) 'Policy 2019', Policy Report, viewed 1 May 2020, http://chimbo.org/wp-content/uploads/2019/07/Policy-2019.pdf

Christof, N. C et al. (2013) Pan-European distribution modelling of stream riparian zones based on multi-source Earth Observation data, Ecological Indicators 24, p. 211-223, https://doi.org/10.1016/j.ecolind.2012.06.002

Civco, D. L., et al. (2002) ‘A Comparison of Land Use and Land Cover Change Detection Methods’, Proceedings of the 2002 ASPRS Annual Convention, Washington DC, 22-26 April 2002.

Congalton, R. G. (1991) ‘A Review of Assessing the Accuracy of Classifications of Remotely Sensed Data’, Remote Sensing of Environment, 37, p. 35-46. https://doi.org/10.1016/0034-4257(91)90048-B

Coppens, B. (2016) 'Report on the ornithological importance of the Boé region, Guinea-Bissau', Technical Report, Chimbo Foundation

Crowson, M., Hagensieker, R. and Waske, B. (2019) ‘Mapping land cover change in northern Brazil with limited training data’, International Journal of Applied Earth Observation and Geoinformation. Elsevier, 78, pp. 202–214. doi: 10.1016/J.JAG.2018.10.004.

DeVries, B., Verbesselt, J., Kooistra, L. and Herold, M. (2015) ‘Robust monitoring of small-scale forest disturbances in a tropical montane forest using Landsat time series’ Remote Sensing of Environment, 161, p. 107-121. https://doi.org/10.1016/j.rse.2015.02.012.

Elias, P., Ellis, P. and Griscom, B. (2014) Applicability of the Hansen Global Forest Data to REDD+ Policy Decisions, The Nature Conservancy

Enderle, D. I. M. and Weih, R. C. (2005), ‘Integrating supervised and unsupervised classification methods to develop a more accurate land cover classification’, Journal of the Arkansas Academy of Science, 59(10), p. 65–73

EORC; JAXA (2019) 'ALOS-2 Project/PALSAR-2', viewed 23 October 2019, https://www.eorc.jaxa.jp/ALOS-2/en/about/palsar2.htm

ESA; Sentinel Online (2019a) 'Sentinel-1', viewed 22 October 2019, https://sentinel.esa.int/web/sentinel/missions/sentinel-1

ESA; Sentinel Online (2019b) 'Sentinel-2', viewed 22 October 2019, https://sentinel.esa.int/web/sentinel/missions/sentinel-2

ESA; Sentinel Online (2020) 'Spatial Resolution', viewed 10 May 2020, https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/resolutions/spatial

ESRI (2016), 'Union', viewed 4 August 2020, https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/union.htm

ESRI (2020a), 'Clip Raster (Data Management)', viewed 3 August 2020, https://pro.arcgis.com/en/pro-app/tool-reference/data-management/clip.htm

ESRI (2020b), 'Composite Bands (Data Management)', viewed 4 August 2020, https://pro.arcgis.com/en/pro-app/tool-reference/data-management/composite-bands.htm

ESRI (2020c), 'Dissolve (Data Management)', viewed 5 August 2020, https://pro.arcgis.com/en/pro-app/tool-reference/data-management/dissolve.htm

ESRI (2020d), 'Generalizing classified output by removing small isolated regions', viewed 6 August 2020, https://desktop.arcgis.com/en/arcmap/latest/extensions/spatial-analyst/image-classification/generalizing-classified-output-by-removing-small-isolated-regions.htm

ESRI (2020e), 'Intersect (Analysis)', viewed 5 August 2020, https://pro.arcgis.com/en/pro-app/tool-reference/analysis/intersect.htm

ESRI (2020f), 'Processing classified output', viewed 5 August 2020, https://desktop.arcgis.com/en/arcmap/latest/extensions/spatial-analyst/image-classification/processing-classified-output.htm

ESRI (2020g), 'Segmentation', viewed 5 August 2020, https://pro.arcgis.com/en/pro-app/help/analysis/image-analyst/segmentation.htm

ESRI (2020h), 'The Image Classification Wizard', viewed 4 August 2020, https://pro.arcgis.com/en/pro-app/help/analysis/image-analyst/the-image-classification-wizard.htm

Extension; Mapasyst (2019) 'What’s the difference between a supervised and unsupervised image classification?', viewed 19 October 2019, https://mapasyst.extension.org/whats-the-difference-between-a-supervised-and-unsupervised-image-classification/

Fischer, C., Kleinn, C., Fehrmann, L., Fuchs, H. and Panferov, O. (2011), A national level forest resource assessment for Burkina Faso. A field based forest inventory in a semiarid environment combining small sample size with large observation plots, Forest Ecology and Management 262 (8), p.1532-1540, https://doi.org/10.1016/j.foreco.2011.07.001

Foody, G. M. (2002) Status of land cover classification accuracy assessment, Remote Sensing of Environment 80, p.185-201

García del Toro, E. M. and Más-López, M. I. (2019), 'Changes in Land Cover in Cacheu River Mangroves Natural Park, Guinea-Bissau: The Need for a More Sustainable Management', Sustainability 11, 6247

Galiatsatos, N. et al. (2020) ‘An Assessment of Global Forest Change Datasets for National Forest Monitoring and Reporting’, Remote Sensing, 12, p. 1790; doi:10.3390/rs12111790

Glantz, S. A. (1993) Bio-Statistics, New York: McGraw-Hill, p. 440

Goodall, J., et al. (1978) 'Culture in chimpanzees', Nature 399, p.628-685

Google Earth Engine (2019) 'Meet Earth Engine', viewed 23 October 2019, https://earthengine.google.com/

Gu, J., Congalton, R. G., & Pan, Y. (2015) 'The Impact of Positional Errors on Soft Classification Accuracy Assessment: A Simulation Analysis', Remote Sensing 7(1), p. 579-599; https://doi.org/10.3390/rs70100579

Hansen, M. C., et al. (2013) 'High-Resolution Global Maps of 21st-Century Forest Cover Change', Science, 342, p. 850–853, viewed 1 May 2020, https://developers.google.com/earth-engine/datasets/catalog/UMD_hansen_global_forest_change_2018_v1_6

Hansen, M. C. et al. (2014) High-Resolution Global Maps of 21st-Century Forest Cover Change, Science 342(6160), p.850-853, DOI: 10.1126/science.1244693

Hurni, K. et al. (2017) ‘Mapping the Expansion of Boom Crops in Mainland Southeast Asia Using Dense Time Stacks of Landsat Data’, Remote Sensing. Multidisciplinary Digital Publishing Institute, 9(4), p. 320. doi: 10.3390/rs9040320.

Hurskainen, P. et al. (2019) ‘Auxiliary datasets improve accuracy of object-based land use/land cover classification in heterogeneous savanna landscapes’, Remote Sensing of Environment. Elsevier, 233, p. 111354. doi: 10.1016/J.RSE.2019.111354.

IBAP (2019) '(IBAP), Guiné-Bissau', viewed 2 May 2020, https://ibapgbissau.org

International Security Sector Advisory team (2019) 'Guinea Bissau SSR Background Note', viewed 20 October 2019, https://issat.dcaf.ch/Learn/Resource-Library/Country-Profiles/Guinea-Bissau-SSR-Background-Note

IUCN; Red List (2019) 'Western Chimpanzee', viewed 15 October 2019, https://www.iucnredlist.org/species/15935/102327574

IUCN SSC Primate Specialist Group (2020). Regional action plan for the conservation of western chimpanzees (Pan troglodytes verus) 2020–2030. Gland, Switzerland: IUCN.

Jensen, J. R. (1994) 'Introductory Digital Image Processing; A Remote Sensing Perspective', New Jersey: Prentice Hall.

Jensen, J. R. (2005) 'Introductory Digital Image Processing; A Remote Sensing Perspective', 3rd edition, New Jersey: Prentice Hall.

Juliev, M., et al. (2019) Analysis of Land Use Land Cover Change Detection of Bostanlik District, Uzbekistan, Polish Journal of Environmental Studies, 28(5) p. 3235-3242. doi: 10.15244/pjoes/94216

Kaliraj, S., et al. (2017) ‘Coastal landuse and land cover change and transformations of Kanyakumari coast, India using remote sensing and GIS’, The Egyptian Journal of Remote Sensing and Space Science, 20(2), p. 169-185. https://doi.org/10.1016/j.ejrs.2017.04.003

Koehrsen, M. (2018) 'An Implementation and Explanation of the Random Forest in Python', viewed 2 May 2020, https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76

Köppen, W. (1918) ‘Klassifikation der Klimate nach Temperatur, Niederschlag und Jahresablauf’ Petermanns Geographische Mitteilungen, 64, 193-203.

Kormos, R. and Boesch, C. (eds) (2003) 'Regional Action Plan for the Conservation of Chimpanzees in West Africa', IUCN/SSC Primate Specialist Group and Conservation International, Washington DC.

Kramer, H. J. (2020) 'Copernicus: Sentinel-1', viewed 10 May 2020, https://directory.eoportal.org/web/eoportal/satellite-missions/c-missions/copernicus-sentinel-1

Kühnert, K., Grass, I. and Waltert, M. (2019) 'Sacred groves hold distinct bird assemblages within an Afrotropical savanna', Global Ecology and Conservation, 18

Kurien, A. J., Lele, S. and Nagendra, H. (2019) ‘Farms or Forests? Understanding and Mapping Shifting Cultivation Using the Case Study of West Garo Hills, India’ Land 8, p. 133-159. doi:10.3390/land8090133

Lourenço, P. et al. (2009) 'Re-growth of mangrove forests of Guinea-Bissau', ISRSE 2009: 33rd International Symposium on Remote Sensing of Environment: Sustaining the Millennium Development Goals, Palazzo dei Congressi, Stresa, Lago Maggiore, 4-8 May 2009.

Luqman, M. (2017) 'Free Satellite Imagery For You: Sentinel 1 & 2', viewed 10 May 2020, https://gogeomatics.ca/free-satellite-imagery-for-you-sentinel-1-2/.

McRoberts, R. E., Tomppo, E. O. and Czaplewski, R. L. (2012) 'Sampling designs for national forest assessments', National Forest Assessment.

Melo, J. B., et al. (2018) 'Striking divergences in Earth Observation products may limit their use for REDD+', Environmental Research Letters, 13(10), 104020, viewed 2 May 2020, https://doi.org/10.1088/1748-9326/aae3f8

Millard, K., and Richardson, M. (2015) 'On the Importance of Training Data Sample Selection in Random Forest Image Classification: A Case Study in Peatland Ecosystem Mapping', Remote Sensing, 7, p. 8489-8515.

Mishra, S., Shrivastava, P. and Dhurvey, P. (2005) Change Detection Techniques in Remote Sensing: A Review, International Journal of Wireless and Mobile Communication for Industrial Systems 4(1), pp. 1-8, http://dx.doi.org/10.21742/ijwmcis.2017.4.1.01

Mitani, J. C. (1992) 'Dialects in wild chimpanzees?', American Journal of Primatology, 27(4), p. 233-243

Mohammed, I. M. A. (2019) 'Mapping crop field probabilities using hyper temporal and multi spatial remote sensing in a fragmented landscape of Ethiopia', MSc thesis, University of Twente.

Mulatu, K. et al. (2017) ‘Biodiversity Monitoring in Changing Tropical Forests: A Review of Approaches and New Opportunities’, Remote Sensing. Multidisciplinary Digital Publishing Institute, 9(10), p. 1059. doi: 10.3390/rs9101059.

Mustak, S. et al. (2019) ‘EVALUATION OF THE PERFORMANCE OF SAR AND SAR-OPTICAL FUSED DATASET FOR CROP DISCRIMINATION’, ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLII-3/W6, pp. 563–571. doi: 10.5194/isprs-archives-XLII-3-W6-563-2019.

NASA; EarthData (2019) 'ASTER Digital Elevation Model V003', viewed 12 October 2019, https://search.earthdata.nasa.gov/search/granules/collection-details?p=C1299783579-LPDAAC_ECS&m=-56.38898573474023!57.37500000000001!0!1!0!0%2C2&fi=ASTER&tl=1556095662!4!!

NASA; Landsat Science (2019) 'Landsat 8', viewed 20 October 2019, https://landsat.gsfc.nasa.gov/landsat-data-continuity-mission/

de Oliveira Silveira, E. M. et al. (2017) ‘Assessment of geostatistical features for object-based image classification of contrasted landscape vegetation cover’, Journal of Applied Remote Sensing. International Society for Optics and Photonics, 11(3), p. 036004. doi: 10.1117/1.JRS.11.036004.

Olofsson, P. et al. (2014) ‘Good practices for estimating area and assessing accuracy of land change’, Remote Sensing of Environment, 148, pp. 42–57. doi: 10.1016/j.rse.2014.02.015.

Oosterlynck, B. (2014) 'The impact of agriculture on the biodiversity in the Boé region (Guinea Bissau)', Internship report for the Chimbo Foundation, viewed 27 August 2020, http://chimbo.org/wp-content/uploads/2015/06/The-impact-of-agriculture-on-the-biodiversity-in-the-Boe%CC%81-region-B.-Oosterlynck-2014.pdf

Pettorelli, N., Safi, K. and Turner, W. (2014) ‘Satellite remote sensing, biodiversity research and conservation of the future’, Philosophical Transactions of the Royal Society B, p. 369

Ramentol, E., Caballero, Y., Bello, R. and Herrera, F. (2011) ‘SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory’, Knowledge and Information Systems, 33(2). doi: 10.1007/s10115-011-0465-6.

Reiche, J., Hamunyela, E., Verbesselt, J., Hoekman, D. and Herold, M. (2018) 'Improving near real-time deforestation monitoring in tropical dry forests by combining dense Sentinel-1 time series with Landsat and ALOS-2 PALSAR-2', Remote Sensing of Environment, 204, pp. 147–161.

Reiche, J., Verbesselt, J., Hoekman, D. and Herold, M. (2015) 'Fusing Landsat and SAR time series to detect deforestation in the tropics', Remote Sensing of Environment, 156, pp. 276–293.

Silva, C. S., Serra, A. and Lopes, E. (2007) 'Étude de faisabilité du projet Développement touristique de la Boé au profit de la conservation des Chimpanzés et des populations locales', Technical Report, Chimbo Foundation.

Singh, M. et al. (2018) ‘Evaluating the ability of community-protected forests in Cambodia to prevent deforestation and degradation using temporal remote sensing data’, Ecology and Evolution. John Wiley & Sons, Ltd, 8(20), pp. 10175–10191. doi: 10.1002/ece3.4492.

Singh, M., Evans, D., Chevance, J. B., Tan, B. S., Wiggins, N., Kong, L. and Sakhoeun, S. (2019). 'Evaluating remote sensing datasets and machine learning algorithms for mapping plantations and successional forests in Phnom Kulen National Park of Cambodia', PeerJ, 7, pp. 7841. http://doi.org/10.7717/peerj.7841.

Sohl, T. L., Gallant, A. L. and Loveland, T. R. (2004) ‘The Characteristics and Interpretability of Land Surface Change and Implications for Project Design’, 70(4), pp. 439–448.

Song, R., Lin, H., Wang, G., Yan, E. and Ye, Z. (2018) 'Improving Selection of Spectral Variables for Vegetation Classification of East Dongting Lake, China, Using a Gaofen-1 Image', Remote Sensing, 10(50). doi:10.3390/rs10010050

Stehman, S. V. (2009) ‘Sampling designs for accuracy assessment of land cover’, International Journal of Remote Sensing, 30(20), pp. 5243–5272. doi: 10.1080/01431160903131000.

Studer, D. (2019) 'Land cover assessment in the habitat of the chimpanzees in the Boé sector, Guinea-Bissau', MSc thesis, Wageningen University & Research.

Temudo, M. P. and Abrantes, M. (2014) ‘The Cashew Frontier in Guinea-Bissau, West Africa: Changing Landscapes and Livelihoods’, Human Ecology. Springer US, 42(2), pp. 217–230. doi: 10.1007/s10745-014-9641-0.

Temudo, M. P. and Santos, P. (2017) ‘Shifting environments in Eastern Guinea-Bissau, West Africa: The length of fallows in question’, NJAS - Wageningen Journal of Life Sciences. Elsevier, 80, pp. 57–64. doi: 10.1016/J.NJAS.2016.12.001.

Thanh Noi, P. and Kappas, M. (2017) ‘Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery’, Sensors. Multidisciplinary Digital Publishing Institute, 18(1), p. 18. doi: 10.3390/s18010018.

Useya, J., Chen, S. and Murefu, M. (2019) ‘Cropland Mapping and Change Detection: Toward Zimbabwean Cropland Inventory’, IEEE Access, 7, pp. 53603–53620. doi: 10.1109/ACCESS.2019.2912807.

USGS (no date) 'West Africa: Land Use and Land Cover Dynamics; Bioclimatic Regions Map', viewed 12 October 2019, https://eros.usgs.gov/westafrica/node/147

van Gilst, L. et al. (2019), 'Carbon Credits in the Boé: A Feasibility Study', Consultancy Report, commissioned by Chimbo Foundation

van Niel, T. G., McVicar, T. R. and Datt, B. (2005) ‘On the relationship between training sample size and data dimensionality: Monte Carlo analysis of broadband multi-temporal classification’, Remote Sensing of Environment, 98, p. 468-480. doi:10.1016/j.rse.2005.08.011

Vasconcelos, M. J. P. et al. (2002) ‘Land cover change in two protected areas of Guinea-Bissau (1956–1998)’, Applied Geography, 22(2), pp. 139–156, viewed 2 May 2020, https://doi.org/10.1016/S0143-6228(02)00005-X.

Verbesselt, J., Hyndman, R., Newnham, G. and Culvenor, D. (2010) ‘Detecting trend and seasonal changes in satellite image time series’, Remote Sensing of Environment, 114(1), pp. 106–115. doi: 10.1016/j.rse.2009.08.014.

Verbesselt, J., Zeileis, A. and Herold, M. (2012) ‘Near real-time disturbance detection using satellite image time series’, Remote Sensing of Environment, 123, pp. 98–108. doi: 10.1016/j.rse.2012.02.022.

Vieilledent, G. et al. (2018) ‘Combining global tree cover loss data with historical national forest cover maps to look at six decades of deforestation and forest fragmentation in Madagascar’, Biological Conservation, 222, pp. 189–197. doi: 10.1016/j.biocon.2018.04.008.

Vieira, W. F., Kerry, C. and Hockings, K. J. (2019) ‘A comparison of methods to determine chimpanzee home-range size in a forest–farm mosaic at Madina in Cantanhez National Park, Guinea-Bissau’, Primates. Springer Japan, 60(4), pp. 355–365. doi: 10.1007/s10329-019-00724-1.

Wang, W. et al. (2019) ‘Uncertainty Problems in Image Change Detection’, Sustainability, 12(1), p. 274. doi: 10.3390/su12010274.

Wenceslau, J. F. C. (2014) ‘Bauxite mining and chimpanzees population distribution, a case study in the Boé sector, Guinea-Bissau’, Technical Report, Chimbo Foundation.

Wessels, K. et al. (2016) ‘Rapid Land Cover Map Updates Using Change Detection and Robust Random Forest Classifiers’, Remote Sensing. Multidisciplinary Digital Publishing Institute, 8(11), p. 888. doi: 10.3390/rs8110888.

Wulder, M. A. et al. (2006) ‘An accuracy assessment framework for large-area land cover classification products derived from medium-resolution satellite data’, International Journal of Remote Sensing, 27(4), pp. 663–683. doi: 10.1080/01431160500185284.

WWF (2020) ‘Chimpanzees’, viewed 29 August 2020, https://wwf.panda.org/knowledge_hub/endangered_species/great_apes/chimpanzees/

Yang, X. et al. (2016) ‘Land-use change impact on time-averaged carbon balances: Rubber expansion and reforestation in a biosphere reserve, South-West China’, Forest Ecology and Management, 372, pp. 149–163. doi: 10.1016/j.foreco.2016.04.009.

Appendices

Appendix A

Table A.1: Fieldwork schedule

| Day | Date | Guide | Location/Direction | Hours worked | Land cover types mapped |
|---|---|---|---|---|---|
| Monday | 18-11-2019 | Djay | Uncire | 2 | 0 |
| Tuesday | 19-11-2019 | Sakamusa | Pataque | 8 | 15 |
| Wednesday | 20-11-2019 | Djay | Uncire | 8 | 15 |
| Thursday | 21-11-2019 | Sakamusa | Lugajole | 8 | 18 |
| Wednesday | 27-11-2019 | Sakamusa | Cobolo | 8 | 16 |
| Thursday | 28-11-2019 | Djay | Quinuique | 5 | 9 |
| Friday | 29-11-2019 | Djay | Aicum | 8 | 13 |
| Sunday | 01-12-2019 | Sakamusa | Fefine (Malangare) | 8 | 14 |
| Tuesday | 03-12-2019 | Djay | Dingurai | 8 | 13 |
| Wednesday | 04-12-2019 | Djay | Capebonde | 8 | 13 |
| Thursday | 05-12-2019 | Djay | Munhini | 8 | 12 |
| Tuesday | 10-12-2019 | Sakamusa | Colebe | 8 | 10 |
| Wednesday | 11-12-2019 | Sakamusa | Veindu Leidi | 8 | 11 |
| Thursday | 12-12-2019 | Sakamusa | Balandugo | 8 | 15 |
| Friday | 13-12-2019 | Sakamusa | On the way back | 2 | 4 |
| Monday | 30-12-2019 | Sakamusa | Fefine (Direction of Quissem) | 8 | 15 |
| Tuesday | 31-12-2019 | Djay | Tuba Boé | 8 | 14 |
| Thursday | 02-01-2020 | Djay | Hore Limbi | 8 | 8 |
| Friday | 03-01-2020 | Djay | Limbi Lucom | 8 | 11 |
| Saturday | 04-01-2020 | Djay | Farambandi | 8 | 10 |
| Wednesday | 08-01-2020 | Sakamusa | Bugafale | 8 | 14 |
| Thursday | 09-01-2020 | Sakamusa | Che Che | 8 | 10 |
| Friday | 10-01-2020 | Sakamusa | Veindu Cham | 8 | 13 |
| Saturday | 11-01-2020 | Sakamusa | Sutamaca | 8 | 11 |
| Wednesday | 15-01-2020 | Djay | Limbi Mangatambe | 8 | 10 |
| Thursday | 16-01-2020 | Djay | Limbi Mangatambe | 8 | 9 |
| Friday | 17-01-2020 | Djay | Bufena | 8 | 9 |
| Wednesday | 22-01-2020 | Sakamusa | Dakale | 8 | 6 |
| Thursday | 23-01-2020 | Sakamusa | Dakale & Boloba | 8 | 9 |
| Friday | 24-01-2020 | Sakamusa | Boloba | 8 | 6 |
| Tuesday | 28-01-2020 | Djay & Piet | Aicum | 5 | 12 |
| Saturday | 01-02-2020 | Sakamusa | Limbi Afia | 4 | 4 |
| Sunday | 02-02-2020 | Sakamusa | Limbi Afia | 4 | 6 |
| Monday | 03-02-2020 | Sakamusa | Dandum | 8 | 9 |
| Tuesday | 04-02-2020 | Sakamusa | Madina de Boé & Diquel | 6 | 14 |
| Wednesday | 05-02-2020 | Sakamusa | Diquel | 5 | 9 |
| Thursday | 06-02-2020 | Sakamusa | Chancum Sate | 6 | 8 |
| Monday | 10-02-2020 | Djay & Gonzal | Quissem | 4 | 8 |
| Tuesday | 11-02-2020 | Djay & Gonzal | Dandula | 5 | 10 |
| Wednesday | 12-02-2020 | Djay & Gonzal | Dalaba | 5 | 8 |
| Thursday | 13-02-2020 | Djay & Gonzal | Tabadara | 4 | 9 |
| Saturday | 15-02-2020 | Djay & Gonzal | Burquelem | 6 | 10 |
| Wednesday | 19-02-2020 | Sakamusa | Quebube (river) | 8 | 12 |
| Friday | 21-02-2020 | Sakamusa & Luc | Uncire | 6 | 9 |

Appendix B Spectral Profiles of Training Samples

[Figure: mean DN (surface reflectance, 0 to 3,500) per Sentinel-2 band (2, 3, 4, 5, 6, 7, 8, 8A, 11, 12) for the classes Rice Field, Woodland, Gallery Forest, Secondary Dry Forest, Primary Dry Forest, Cashew Plantation (big/old), Cashew Plantation (small/young), Savannah, Fallow land (big/old), Fallow land (small/young), Peanut Field, Other Agriculture, Wetland, Water Bodies and Built-up Area (Villages).]

Figure B.1: Spectral profiles of the basic sample collection.

Figure B.2: Spectral signatures of the basic sample collection across the six different (vegetation) indices

Appendix C

Figure C.1: Initial test classification

Appendix D

Panels: Test 1 to Test 14 (one panel per test classification)

Figure D.1: Resulting output of the 14 test classifications

Appendix E

Panels:
Segmented raster A (based on the optimal Sentinel-2 bands for differentiation in 2020)
Segmented raster B (based on Sentinel-2 bands 2-3-4 in 2020)
Segmented raster C (based on the optimal Sentinel-2 bands for differentiation in 2019)
Segmented raster D (based on Sentinel-2 bands 2-3-4 in 2016)
Segmented raster E (based on the optimal Landsat 8 bands for differentiation in 2020)
Segmented raster F (based on the optimal Landsat 8 bands for differentiation in 2016)

Figure E.1: Segmented rasters used for MLA classifications

Appendix F

Panels:
Sentinel-2 SVM 2020 with fewer Wetland samples
Sentinel-2 SVM 2020 without Built-up Area (Villages) samples
Sentinel-2 RF 2020 without Built-up Area (Villages) samples
Sentinel-2 SVM 2019
Sentinel-2 2016, classified by the RF classifier of 2020
Sentinel-2 2016, classified by the SVM classifier of 2020
Sentinel-2 RF 2016
Sentinel-2 SVM 2016
Sentinel-2 RF 2016 with added Sacred Forest samples
Sentinel-2 SVM 2016 with added Sacred Forest samples

Figure F.1: Resulting output of the MLA test classifications

Appendix G

Panels: Final Sentinel-2 RF classification of 2016; Final Sentinel-2 RF classification of 2020

Figure G.1: Post-classification processing results of the Sentinel-2 classifications

Appendix H

Table H.1: Accuracy of the Sentinel-2 Maximum Likelihood classification of 2020

| Land cover class | Wetland | Woodland | Cashew Plantation (>5 years) | Gallery Forest | Secondary Dry Forest | Savannah | Built-up Area (Villages) | Water Bodies | Agriculture | Barren Land | Total | User's Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wetland | 1 | 0 | 0 | 1 | 4 | 1 | 0 | 0 | 11 | 0 | 18 | 0.055556 |
| Woodland | 7 | 25 | 0 | 0 | 8 | 26 | 0 | 0 | 3 | 8 | 77 | 0.324675 |
| Cashew Plantation (>5 years) | 0 | 0 | 19 | 21 | 1 | 0 | 0 | 0 | 3 | 0 | 44 | 0.431818 |
| Gallery Forest | 0 | 2 | 2 | 45 | 4 | 0 | 1 | 4 | 0 | 0 | 58 | 0.775862 |
| Secondary Dry Forest | 0 | 3 | 1 | 20 | 17 | 0 | 0 | 0 | 4 | 0 | 45 | 0.377778 |
| Savannah | 10 | 23 | 0 | 0 | 0 | 63 | 0 | 0 | 4 | 0 | 100 | 0.63 |
| Built-up Area (Villages) | 0 | 0 | 0 | 0 | 3 | 4 | 12 | 0 | 0 | 0 | 19 | 0.631579 |
| Water Bodies | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 52 | 0 | 0 | 52 | 1 |
| Agriculture | 0 | 2 | 2 | 5 | 9 | 1 | 0 | 0 | 38 | 0 | 57 | 0.666667 |
| Barren Land | 2 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 25 | 30 | 0.833333 |
| Total | 20 | 55 | 24 | 92 | 46 | 98 | 13 | 56 | 63 | 33 | 500 | |
| Producer's Accuracy | 0.05 | 0.454545 | 0.791667 | 0.48913 | 0.369565 | 0.642857 | 0.923077 | 0.928571 | 0.603175 | 0.757576 | | |

Overall accuracy: 0.594. Kappa: 0.537381.

Table H.2: Accuracy of the Sentinel-2 Random Forest classification of 2020

| Land cover class | Wetland | Woodland | Cashew Plantation (>5 years) | Gallery Forest | Secondary Dry Forest | Savannah | Built-up Area (Villages) | Water Bodies | Agriculture | Barren Land | Total | User's Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wetland | 2 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 5 | 0.4 |
| Woodland | 5 | 25 | 0 | 0 | 1 | 23 | 0 | 0 | 8 | 0 | 62 | 0.403226 |
| Cashew Plantation (>5 years) | 0 | 0 | 16 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 17 | 0.941176 |
| Gallery Forest | 0 | 2 | 5 | 75 | 6 | 0 | 1 | 4 | 5 | 0 | 98 | 0.765306 |
| Secondary Dry Forest | 0 | 2 | 1 | 15 | 29 | 1 | 0 | 0 | 5 | 0 | 53 | 0.54717 |
| Savannah | 11 | 24 | 0 | 0 | 0 | 64 | 0 | 0 | 2 | 0 | 101 | 0.633663 |
| Built-up Area (Villages) | 0 | 0 | 0 | 0 | 4 | 4 | 12 | 0 | 0 | 0 | 20 | 0.6 |
| Water Bodies | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 52 | 0 | 0 | 52 | 1 |
| Agriculture | 0 | 2 | 2 | 1 | 6 | 0 | 0 | 0 | 43 | 0 | 54 | 0.796296 |
| Barren Land | 2 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 33 | 38 | 0.868421 |
| Total | 20 | 55 | 24 | 92 | 46 | 98 | 13 | 56 | 63 | 33 | 500 | |
| Producer's Accuracy | 0.1 | 0.454545 | 0.666667 | 0.815217 | 0.630435 | 0.653061 | 0.923077 | 0.928571 | 0.68254 | 1 | | |

Overall accuracy: 0.702. Kappa: 0.656527.

Table H.3: Accuracy of the Sentinel-2 Random Forest classification of 2016

| Land cover class | Wetland | Woodland | Cashew Plantation (>5 years) | Gallery Forest | Secondary Dry Forest | Savannah | Built-up Area (Villages) | Water Bodies | Agriculture | Barren Land | Total | User's Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wetland | 3 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 1 | 0 | 7 | 0.428571 |
| Woodland | 3 | 31 | 0 | 0 | 4 | 26 | 0 | 0 | 1 | 1 | 66 | 0.469697 |
| Cashew Plantation (>5 years) | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 1 |
| Gallery Forest | 0 | 0 | 10 | 78 | 8 | 0 | 0 | 0 | 2 | 0 | 98 | 0.795918 |
| Secondary Dry Forest | 0 | 6 | 0 | 19 | 27 | 0 | 0 | 0 | 4 | 0 | 56 | 0.482143 |
| Savannah | 15 | 19 | 0 | 0 | 1 | 76 | 0 | 0 | 0 | 2 | 113 | 0.672566 |
| Built-up Area (Villages) | 0 | 0 | 0 | 1 | 4 | 1 | 15 | 0 | 1 | 0 | 22 | 0.681818 |
| Water Bodies | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 62 | 0 | 0 | 62 | 1 |
| Agriculture | 0 | 3 | 0 | 3 | 2 | 0 | 0 | 0 | 24 | 0 | 32 | 0.75 |
| Barren Land | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 37 | 39 | 0.948718 |
| Total | 22 | 59 | 15 | 101 | 46 | 107 | 15 | 62 | 33 | 40 | 500 | |
| Producer's Accuracy | 0.136364 | 0.525424 | 0.333333 | 0.772277 | 0.586957 | 0.71028 | 1 | 1 | 0.727273 | 0.925 | | |

Overall accuracy: 0.716. Kappa: 0.669032.

Table H.4: Accuracy of the Sentinel-2 Support Vector Machines classification of 2020

| Land cover class | Wetland | Woodland | Cashew Plantation (>5 years) | Gallery Forest | Secondary Dry Forest | Savannah | Built-up Area (Villages) | Water Bodies | Agriculture | Barren Land | Total | User's Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wetland | 7 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 4 | 13 | 0.538462 |
| Woodland | 3 | 41 | 0 | 0 | 3 | 27 | 0 | 0 | 8 | 0 | 82 | 0.5 |
| Cashew Plantation (>5 years) | 0 | 0 | 18 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 21 | 0.857143 |
| Gallery Forest | 0 | 3 | 6 | 85 | 7 | 0 | 0 | 0 | 3 | 0 | 104 | 0.817308 |
| Secondary Dry Forest | 1 | 1 | 0 | 22 | 40 | 0 | 0 | 0 | 4 | 0 | 68 | 0.588235 |
| Savannah | 13 | 19 | 0 | 0 | 1 | 83 | 0 | 0 | 3 | 0 | 119 | 0.697479 |
| Built-up Area (Villages) | 0 | 1 | 1 | 0 | 0 | 1 | 14 | 0 | 0 | 0 | 17 | 0.823529 |
| Water Bodies | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 68 | 0 | 0 | 68 | 1 |
| Agriculture | 0 | 1 | 4 | 2 | 3 | 2 | 1 | 0 | 58 | 0 | 71 | 0.816901 |
| Barren Land | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 35 | 37 | 0.945946 |
| Total | 24 | 66 | 29 | 111 | 55 | 117 | 15 | 68 | 76 | 39 | 600 | |
| Producer's Accuracy | 0.291667 | 0.621212 | 0.62069 | 0.765766 | 0.727273 | 0.709402 | 0.933333 | 1 | 0.763158 | 0.897436 | | |

Overall accuracy: 0.748333. Kappa: 0.710304.

Table H.5: Accuracy of the Landsat 8 Random Forest classification of 2020

| Land cover class | Wetland | Woodland | Cashew Plantation (>5 years) | Gallery Forest | Secondary Dry Forest | Savannah | Built-up Area (Villages) | Water Bodies | Agriculture | Barren Land | Total | User's Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wetland | 2 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 4 | 0 | 9 | 0.222222 |
| Woodland | 3 | 30 | 0 | 0 | 3 | 23 | 0 | 0 | 5 | 0 | 64 | 0.46875 |
| Cashew Plantation (>5 years) | 0 | 0 | 10 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 0.909091 |
| Gallery Forest | 0 | 0 | 6 | 72 | 2 | 0 | 0 | 0 | 0 | 0 | 80 | 0.9 |
| Secondary Dry Forest | 0 | 6 | 0 | 14 | 22 | 1 | 0 | 0 | 5 | 0 | 48 | 0.458333 |
| Savannah | 14 | 18 | 0 | 1 | 1 | 67 | 0 | 0 | 2 | 3 | 106 | 0.632075 |
| Built-up Area (Villages) | 0 | 0 | 0 | 0 | 2 | 1 | 12 | 0 | 0 | 0 | 15 | 0.8 |
| Water Bodies | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 56 | 0 | 0 | 56 | 1 |
| Agriculture | 1 | 1 | 8 | 2 | 16 | 0 | 1 | 0 | 51 | 0 | 80 | 0.6375 |
| Barren Land | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 29 | 32 | 0.90625 |
| Total | 20 | 55 | 24 | 91 | 46 | 97 | 13 | 56 | 67 | 32 | 501 | |
| Producer's Accuracy | 0.1 | 0.545455 | 0.416667 | 0.791209 | 0.478261 | 0.690722 | 0.923077 | 1 | 0.761194 | 0.90625 | | |

Overall accuracy: 0.700599. Kappa: 0.654565.

Table H.6: Accuracy of the Landsat 8 Support Vector Machines classification of 2020

| Land cover class | Wetland | Woodland | Cashew Plantation (>5 years) | Gallery Forest | Secondary Dry Forest | Savannah | Built-up Area (Villages) | Water Bodies | Agriculture | Barren Land | Total | User's Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wetland | 6 | 4 | 0 | 0 | 3 | 0 | 0 | 0 | 2 | 0 | 15 | 0.4 |
| Woodland | 3 | 32 | 0 | 1 | 3 | 10 | 0 | 0 | 4 | 0 | 53 | 0.603774 |
| Cashew Plantation (>5 years) | 0 | 1 | 12 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 0.75 |
| Gallery Forest | 0 | 4 | 4 | 66 | 5 | 0 | 0 | 0 | 2 | 0 | 81 | 0.814815 |
| Secondary Dry Forest | 1 | 0 | 2 | 16 | 21 | 0 | 0 | 0 | 9 | 0 | 49 | 0.428571 |
| Savannah | 10 | 14 | 0 | 2 | 0 | 84 | 0 | 0 | 3 | 2 | 115 | 0.730435 |
| Built-up Area (Villages) | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 13 | 1 |
| Water Bodies | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 56 | 0 | 0 | 56 | 1 |
| Agriculture | 0 | 0 | 6 | 3 | 14 | 3 | 0 | 0 | 47 | 1 | 74 | 0.635135 |
| Barren Land | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 29 | 29 | 1 |
| Total | 20 | 55 | 24 | 91 | 46 | 97 | 13 | 56 | 67 | 32 | 501 | |
| Producer's Accuracy | 0.3 | 0.581818 | 0.5 | 0.725275 | 0.456522 | 0.865979 | 1 | 1 | 0.701493 | 0.90625 | | |

Overall accuracy: 0.730539. Kappa: 0.688936.

Table H.7: Accuracy of the Landsat 8 Random Forest classification of 2016

| Land cover class | Wetland | Woodland | Cashew Plantation (>5 years) | Gallery Forest | Secondary Dry Forest | Savannah | Built-up Area (Villages) | Water Bodies | Agriculture | Barren Land | Total | User's Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wetland | 4 | 0 | 0 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | 9 | 0.444444 |
| Woodland | 1 | 15 | 0 | 0 | 3 | 14 | 0 | 0 | 1 | 0 | 34 | 0.441176 |
| Cashew Plantation (>5 years) | 0 | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0.8 |
| Gallery Forest | 0 | 3 | 4 | 153 | 23 | 0 | 0 | 3 | 6 | 0 | 192 | 0.796875 |
| Secondary Dry Forest | 0 | 2 | 0 | 15 | 42 | 2 | 0 | 0 | 7 | 0 | 68 | 0.617647 |
| Savannah | 9 | 19 | 0 | 0 | 4 | 50 | 0 | 0 | 0 | 3 | 85 | 0.588235 |
| Built-up Area (Villages) | 0 | 0 | 0 | 2 | 4 | 3 | 10 | 0 | 1 | 1 | 21 | 0.47619 |
| Water Bodies | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 39 | 0 | 0 | 39 | 1 |
| Agriculture | 0 | 0 | 3 | 4 | 5 | 0 | 0 | 0 | 8 | 0 | 20 | 0.4 |
| Barren Land | 1 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 23 | 28 | 0.821429 |
| Total | 15 | 40 | 11 | 176 | 84 | 73 | 10 | 42 | 23 | 27 | 501 | |
| Producer's Accuracy | 0.266667 | 0.375 | 0.363636 | 0.869318 | 0.5 | 0.684932 | 1 | 0.928571 | 0.347826 | 0.851852 | | |

Overall accuracy: 0.694611. Kappa: 0.61803.

Table H.8: Accuracy of the Landsat 8 Support Vector Machines classification of 2016

| Land cover class | Wetland | Woodland | Cashew Plantation (>5 years) | Gallery Forest | Secondary Dry Forest | Savannah | Built-up Area (Villages) | Water Bodies | Agriculture | Barren Land | Total | User's Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wetland | 6 | 0 | 0 | 2 | 0 | 2 | 0 | 1 | 0 | 0 | 11 | 0.545455 |
| Woodland | 8 | 36 | 0 | 0 | 2 | 22 | 0 | 0 | 1 | 4 | 73 | 0.493151 |
| Cashew Plantation (>5 years) | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 1 |
| Gallery Forest | 0 | 2 | 4 | 80 | 7 | 5 | 1 | 3 | 7 | 0 | 109 | 0.733945 |
| Secondary Dry Forest | 0 | 5 | 2 | 19 | 37 | 4 | 1 | 0 | 18 | 0 | 86 | 0.430233 |
| Savannah | 8 | 15 | 0 | 0 | 0 | 73 | 0 | 0 | 0 | 9 | 105 | 0.695238 |
| Built-up Area (Villages) | 0 | 0 | 1 | 0 | 0 | 0 | 13 | 0 | 0 | 1 | 15 | 0.866667 |
| Water Bodies | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 58 | 0 | 0 | 58 | 1 |
| Agriculture | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 7 | 0 | 9 | 0.777778 |
| Barren Land | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 26 | 26 | 1 |
| Total | 22 | 59 | 15 | 101 | 46 | 107 | 15 | 62 | 33 | 40 | 500 | |
| Producer's Accuracy | 0.272727 | 0.610169 | 0.533333 | 0.792079 | 0.804348 | 0.682243 | 0.866667 | 0.935484 | 0.212121 | 0.65 | | |

Overall accuracy: 0.688. Kappa: 0.635468.
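The accuracy figures in Tables H.1 to H.8 follow the usual error-matrix definitions, with rows holding the classified map and columns the reference samples: user's accuracy is the diagonal count divided by the row total, producer's accuracy the diagonal count divided by the column total, overall accuracy (OA) the diagonal sum divided by the number of samples N, and kappa = (OA − p_e) / (1 − p_e), where p_e = Σ row_i × col_i / N² is the agreement expected by chance. The plain-Python sketch below recomputes these values from the counts of Table H.2. It is an illustration only, not the software used in this study, and the function name `accuracy_report` is hypothetical.

```python
def accuracy_report(matrix):
    """Return user's, producer's and overall accuracy plus Cohen's kappa.

    matrix[i][j] is the number of validation samples mapped as class i (row)
    whose reference class is j (column), the layout used in Tables H.1-H.8.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)                      # total samples N
    row_tot = [sum(row) for row in matrix]                   # per mapped class
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # per reference class
    diag = [matrix[i][i] for i in range(k)]                  # correctly classified

    users = [d / r if r else float("nan") for d, r in zip(diag, row_tot)]
    producers = [d / c if c else float("nan") for d, c in zip(diag, col_tot)]
    overall = sum(diag) / n
    # Chance agreement expected from the row and column marginals
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    kappa = (overall - p_e) / (1 - p_e)
    return users, producers, overall, kappa


# Counts of Table H.2 (Sentinel-2 Random Forest, 2020), classes in table order:
# Wetland, Woodland, Cashew, Gallery, Secondary Dry, Savannah, Built-up,
# Water, Agriculture, Barren
H2 = [
    [2, 0, 0, 0, 0, 3, 0, 0, 0, 0],
    [5, 25, 0, 0, 1, 23, 0, 0, 8, 0],
    [0, 0, 16, 1, 0, 0, 0, 0, 0, 0],
    [0, 2, 5, 75, 6, 0, 1, 4, 5, 0],
    [0, 2, 1, 15, 29, 1, 0, 0, 5, 0],
    [11, 24, 0, 0, 0, 64, 0, 0, 2, 0],
    [0, 0, 0, 0, 4, 4, 12, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 52, 0, 0],
    [0, 2, 2, 1, 6, 0, 0, 0, 43, 0],
    [2, 0, 0, 0, 0, 3, 0, 0, 0, 33],
]

users, producers, overall, kappa = accuracy_report(H2)
print(f"OA = {overall:.3f}, kappa = {kappa:.6f}")  # OA = 0.702, kappa = 0.656527
```

Applied to Table H.2, the sketch reproduces the reported overall accuracy (0.702) and kappa (0.656527); the same function can be pointed at any of the other matrices in this appendix.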

Appendix I

Figure I.1: Placed control points for the georeferencing of the Sentinel-2 images

Figure I.2: Alignment issue of the Sentinel-2 2016 and 2020 images

Appendix J

Table J.1: Change detection matrix of Sentinel-2 change detection analysis (all areas in km²)

| Land cover class | Agriculture | Barren Land | Built-up Area | Cashew Plantation | Gallery Forest | Savannah | Secondary Dry Forest | Water Bodies | Wetland | Woodland | Total 2016 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Agriculture | 37.57 | 0.03 | 0.01 | 3.22 | 60.14 | 0.39 | 72.56 | 0.01 | 14.33 | 10.50 | 198.78 |
| Barren Land | 1.92 | 4.16 | 0.04 | 0.04 | 0.14 | 4.73 | 0.37 | 0.01 | 0.70 | 2.58 | 14.70 |
| Built-up Area | 0.53 | 0.05 | 1.27 | 0.01 | 0.22 | 0.02 | 0.25 | | 0.01 | 0.06 | 2.43 |
| Cashew Plantation | 3.72 | 0.02 | 0.00 | 7.19 | 7.26 | 0.01 | 1.31 | | 0.04 | 0.15 | 19.71 |
| Gallery Forest | 73.36 | 0.21 | 0.10 | 31.00 | 465.88 | 1.37 | 94.96 | 1.28 | 14.49 | 10.43 | 693.07 |
| Savannah | 47.90 | 6.54 | 0.04 | 0.28 | 5.40 | 269.37 | 76.69 | 0.07 | 40.54 | 242.00 | 688.81 |
| Secondary Dry Forest | 140.99 | 0.38 | 0.10 | 7.42 | 117.94 | 6.94 | 280.82 | 0.02 | 21.23 | 68.34 | 644.18 |
| Water Bodies | 0.07 | 0.00 | 0.01 | 0.01 | 2.05 | 0.00 | 0.03 | 22.54 | 0.62 | 0.01 | 25.36 |
| Wetland | 24.20 | 0.32 | 0.02 | 0.13 | 42.95 | 8.02 | 102.27 | 0.17 | 103.93 | 93.98 | 375.98 |
| Woodland | 52.49 | 2.57 | 0.08 | 0.39 | 4.34 | 42.23 | 53.83 | 0.06 | 29.22 | 136.80 | 322.00 |
| Total 2020 | 382.76 | 14.29 | 1.66 | 49.70 | 706.32 | 333.10 | 683.09 | 24.15 | 225.12 | 564.85 | 2985.03 |
| Change | | | | | | | | | | | |
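Table J.1 is a from-to matrix. Assuming, as its ‘Total 2016’ column and ‘Total 2020’ row indicate, that rows are the 2016 classes and columns the 2020 classes, the 2016 area of a class is its row sum, its 2020 area its column sum, and the diagonal cell is the area that kept its class. The hypothetical helper `change_summary` below sketches this bookkeeping in plain Python; the three-class matrix in the usage lines is made-up illustration, not data from this study.

```python
def change_summary(matrix, classes):
    """Per-class area bookkeeping for a from-to change matrix.

    Assumption: matrix[i][j] is the area (e.g. km²) that was class i in the
    first map (rows, 2016) and class j in the second map (columns, 2020).
    """
    k = len(classes)
    summary = {}
    for i, name in enumerate(classes):
        before = sum(matrix[i])                        # row total: area in first map
        after = sum(matrix[r][i] for r in range(k))    # column total: area in second map
        stable = matrix[i][i]                          # area that kept this class
        summary[name] = {
            "before": before,
            "after": after,
            "gross_loss": before - stable,   # area converted to other classes
            "gross_gain": after - stable,    # area converted from other classes
            "net_change": after - before,
        }
    return summary


# Hypothetical 3-class example (illustrative numbers, not from Table J.1)
classes = ["Forest", "Cashew", "Other"]
m = [
    [80.0, 10.0, 10.0],   # forest in t1 -> forest / cashew / other in t2
    [0.0, 5.0, 0.0],      # cashew in t1
    [5.0, 5.0, 85.0],     # other in t1
]
s = change_summary(m, classes)
for name in classes:
    print(name, s[name]["net_change"])  # Forest -15.0, Cashew 15.0, Other 0.0
```

Net change alone can hide large swaps between classes (in the example, "Other" loses and gains 10 units and ends with zero net change), which is why gross loss and gross gain are kept as separate quantities.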

Appendix K

Figure K.1: Sentinel-2 land cover distribution in the three biggest regions in 2016

Figure K.2: Sentinel-2 land cover distribution in the three smallest regions in 2016

Figure K.3: Sentinel-2 land cover distribution in the three biggest sub-regions in 2020

Figure K.4: Sentinel-2 land cover distribution in the three smallest sub-regions in 2020

Appendix L

Figure L.1: Yearly forest loss chart of the West Fefine region

Figure L.2: Yearly forest loss chart of the East Fefine region

Figure L.3: Yearly forest loss chart of the Cuntabani corridor

Figure L.4: Yearly forest loss chart of the Cheche corridor

Appendix M

Figure M.1: BFAST monitoring results statistics in R

Figure M.2: BFAST monitoring results statistics in ArcGIS Pro
