A dataset of glacier and glacial lake distribution in key areas of the China-Pakistan Economic Corridor during 2013 – 2017

Ren Yanrun1,2, Zhang Yaonan1*, Kang Jianfang1

ABSTRACT The China-Pakistan Economic Corridor is characterized by complex landforms and geology, unique hydrological and climatic conditions, and abundant mountain snow and glaciers, which together provide ample material conditions for the development of glacier disasters. However, geographical factors make field investigation and on-site data collection difficult. Remote sensing therefore provides an important means of obtaining data on the change and development of glaciers and glacial lakes in this region. Based on the concept of a glacial lake and the construction scope of the China-Pakistan Economic Corridor, and with reference to related research on glacial lake inventories and glacier disasters, we define glaciers and glacial lakes and set classification criteria applicable to the study region. The dataset of glacier and glacial lake distribution in key areas of the China-Pakistan Economic Corridor was built with an object-oriented classification approach from Landsat 8 OLI images for 2013–2017. The data span approximately 34°–42°N and 73°–82°E, covering the Hunza Basin, Nubra Basin, Gaizi River Basin and Shaksgam River Basin as typical watersheds in northern Pakistan. Compared with traditional methods, the object-oriented classification method improves interpretation accuracy while keeping interpretation timely. Long-term, regular monitoring of glaciers and glacial lakes in these areas can provide data support for the further construction of the China-Pakistan Economic Corridor and supports scientific decision-making on regional water resource change and the risk assessment of glacial lake outbursts.

ARTICLE DOI: 10.11922/csdata.2019.0022.zh
DATA DOI: 10.11922/sciencedb.786
SUBJECT CATEGORY: Earth sciences
RECEIVED: June 12, 2019
RELEASED: June 20, 2019
ACCEPTED: August 26, 2019
PUBLISHED: August 30, 2019

KEYWORDS China-Pakistan Economic Corridor; glacier; glacial lake; object-oriented

1. Scientific Big Data Center, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, P.R. China
2. University of Chinese Academy of Sciences, Beijing 100004, P.R. China
* Email: [email protected]

14 www.csdata.org A dataset of glacier and glacial lake distribution in key areas of the China-Pakistan Economic Corridor during 2013 – 2017

Dataset Profile
Title: A dataset of glacier and glacial lake distribution in key areas of the China-Pakistan Economic Corridor during 2013 – 2017
Data corresponding author: Zhang Yaonan ([email protected])
Data authors: Ren Yanrun, Zhang Yaonan, Kang Jianfang
Time range: From January 1, 2013 to December 31, 2017
Geographical scope: 34°–42°N, 73°–82°E
Spatial resolution: 30 m
Data volume: 337 MB
Data format: SHP
Data service system:
Sources of funding: Special environment and function of observation and research stations shared service platform of the National Science and Technology Infrastructure (Y719H71006); Informatization Program of Chinese Academy of Sciences: Construction and application of technology cloud on studies of environmental evolution in the cold region (XXH13506).
Dataset composition: The dataset consists of 2 subsets in total. It comprises datasets of glacier and glacial lake distribution in glacial regions of the China–Pakistan Economic Corridor during 2013–2017.

1. Introduction

In the context of global warming, glaciers in China have generally retreated at an accelerating rate in recent decades. A comparison of the two Chinese glacier inventories shows that the glacier area in China has decreased by about 17% over the past 30 years[1]. Regional hydrological processes and changes in glacial lakes are strongly affected by the glacial environment. The China-Pakistan Economic Corridor has steep terrain, a complex geological structure, variable hydrological characteristics and abundant solid precipitation; modern glaciers are well developed and change actively, and glacier-dammed lakes are widely distributed. Floods caused by glacial lake outbursts are one of the typical disaster types in the study region. A glacial lake is a lake formed by glacial action or a lake recharged mainly by glacial meltwater[2]. The development and change of glacial lakes are closely related to changes in climate, glacier mass balance, glacier temperature and hydraulic characteristics. Modern glaciers are the main source of mountain snow and ice meltwater and of glacial lake outburst floodwater in this region. Increased inflow from upstream caused by climate change, ablation and deformation of ice dams, and ice avalanches, icefalls, landslides and mudslides entering a lake can all cause the dam to fail and produce a glacial lake outburst flood. Such floods and their secondary disasters are among the most common disasters in alpine glacier regions. Glacier-dammed lakes and moraine-dammed lakes are the two types of glacial lake that burst most often. Most of the China-Pakistan Expressway runs along valleys, and the mountains and valleys on both sides of the road carry glaciers and perennial snow cover.
Glacier activity is one of the main triggers of geological disasters in the China-Pakistan Economic Corridor. In view of the glacial lake disasters caused by glacier activity and their possible damage to the China-Pakistan Economic

China Scientific Data Vol. 4, No. 3, 2019 15 China-Pakistan Economic Corridor

Corridor, this paper uses Landsat 8 OLI remote sensing images from the US Geological Survey data center and, supported by a GIS platform, applies the object-oriented classification method to classify and extract the distribution of glaciers and glacial lakes in several typical basins within the China-Pakistan Economic Corridor: the Hunza Basin, the Nubra Basin, the Gaizi River Basin and the Shaksgam River Basin. Glaciers and glacial lakes are closely associated and sometimes overlap in space, and glacial lakes and mountain shadows have similar spectral features, so it is difficult to extract the boundaries between glaciers and glacial lakes accurately. Because of the complex physical and chemical characteristics of alpine glacial lakes and the influence of their surroundings, they are mostly extracted by traditional field monitoring combined with manual interpretation, which requires a large amount of analysis and processing. In the object-oriented classification method used here, a multi-scale segmentation algorithm segments the high-resolution images, the snow cover index and the normalized water index are used to classify and extract typical features, and interference factors are eliminated, so that information on glaciers and glacial lakes is extracted from the remote sensing images automatically. Long-term, regular monitoring of the spatial distribution and variation of glaciers and glacial lakes in the study region can provide data support for the further construction of the China-Pakistan Economic Corridor and is significant for scientific decisions on regional water resource change and for glacial lake outburst risk assessment.
The Cold and Arid Regions Engineering and Research Institute of the Chinese Academy of Sciences has organized and taken part in many scientific surveys of glaciers and glacial lakes and in the compilation of a large body of glacier inventory data, such as the second Chinese glacier inventory, under way since 2006, and the 2015 remote sensing survey and inventory of glacial lakes in the Chinese Himalaya, and has shared these data through the cold-region scientific data center. Within the China-Pakistan Economic Corridor, many investigations and studies of glacial disasters in the Karakorum Mountains have been carried out in connection with construction projects such as transportation and water conservancy. Most of China's glacial lake research and inventory work is concentrated in the Himalayas in Tibet, the region with the most glacial lakes in China. In 1987, Chinese scientists worked with Nepal to investigate glacial lake outburst floods in that region and to catalogue its glacial lakes; on this basis the scale and extent of glacial lake outbursts were predicted, with notable results. The glacial lakes in that region are mainly moraine-dammed lakes. Catastrophic floods caused by the outburst of glacier-dammed lakes occur mainly on the northern slope of the Karakorum Mountains. These floods strongly affect water resource change and research on the Yarkant River, but a complete glacial lake dataset for this region has not yet been published. This glacier and glacial lake dataset partly fills that gap in the glacial lake inventory.

2. Data collection and processing

2.1 Data Sources and Pre-processing
Landsat 8, launched in February 2013, is the most recent satellite in the Landsat series at the time of writing. It carries two sensors, TIRS (Thermal Infrared Sensor) and OLI (Operational Land Imager). Compared with earlier Landsat sensors, some bands were added and others adjusted. Landsat 8 images offer a large scene size (185 km × 185 km), a short revisit cycle (16 days) and rich band information (9 bands), and can be analyzed with different band combinations.


The Landsat 8 OLI image has its highest spatial resolution, 15 m, in the panchromatic band and a spatial resolution of 30 m in the other bands. Combining different bands of the multi-spectral image highlights the land cover types of interest. Given the reflectance characteristics of ice, snow and water in different bands, an RGB composite of bands 2, 5 and 6 can be used for visual interpretation. The glacier/glacial lake distribution dataset is based mainly on Landsat 8 OLI remote sensing images. A total of 80 images from 2013 to 2017 were selected, covering most of the glacier distribution region around the China-Pakistan Economic Corridor. Table 1 lists the Landsat images used in the dataset, and Figure 1 shows their distribution over the study region.

Figure 1 Distribution diagram of remote sensing images used in dataset 2015 in the study region
(Legend: key zones of the China-Pakistan Economic Corridor; watershed boundaries in the study region; Landsat images shown as RGB composites with Red: Band_1, Green: Band_2, Blue: Band_3.)

During interpretation of the remote sensing images, the extraction of alpine glaciers and glacial lakes is affected by cloud and mountain shadows because of two phenomena: the same object showing different spectra, and different objects showing the same spectrum. To reduce the influence of cloud, fog, snow and mountain shadow on interpretation accuracy, images with little cloud cover acquired from July to October are prioritized; images from this period account for 70% of the total. Because glacial lakes are strongly affected by temperature and precipitation during the year, the candidate period is extended to June to November, when temperatures in the study region are high and precipitation is abundant. During this period glacial lakes are relatively large and their water levels are high, which makes them easier to extract and identify. All Landsat images provided by the USGS (United States Geological Survey) portal have already undergone systematic radiometric correction, geometric correction with ground control points and DEM-based terrain correction, so no further correction is applied to the images during data pre-processing.
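The selection rule described above (prefer low-cloud scenes from July to October, otherwise fall back to June to November) can be sketched as follows. This is only an illustration under stated assumptions: the `Scene` record, its field names and the `max_cloud` cut-off of 10% are hypothetical and do not come from the paper, which does not state a numeric cloud threshold.

```python
from dataclasses import dataclass
from datetime import date

# Month windows taken from the text: July-October is preferred,
# June-November is the extended fallback window.
PREFERRED_MONTHS = range(7, 11)
EXTENDED_MONTHS = range(6, 12)


@dataclass(frozen=True)
class Scene:
    """Hypothetical scene record; the field names are illustrative only."""
    path: int
    row: int
    acquired: date
    cloud_cover: float  # percent, 0-100


def pick_scene(candidates, max_cloud=10.0):
    """Return the preferred scene for one path/row and year, or None.

    max_cloud is an assumed cut-off, not a value given in the paper.
    """
    usable = [s for s in candidates if s.cloud_cover <= max_cloud]
    for months in (PREFERRED_MONTHS, EXTENDED_MONTHS):
        in_window = [s for s in usable if s.acquired.month in months]
        if in_window:
            # Within a window, take the least cloudy scene.
            return min(in_window, key=lambda s: s.cloud_cover)
    return None
```

With this ordering, a clear June scene is chosen only when no sufficiently clear scene exists between July and October, which mirrors the priority stated in the text.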

Table 1 List of Landsat images used in the dataset (P: path; R: row; acquisition dates given as MM/DD)

Watershed        Latitude/longitude range          Image number     2013   2014   2015   2016   2017
Hunza Basin      35°–37°10′N, 73°–76°E             P: 149; R: 34    10/09  09/26  10/15  11/02  08/01
Hunza Basin                                        P: 149; R: 35    10/09  07/24  10/31  10/01  11/05
Hunza Basin                                        P: 149; R: 36    11/26  06/06  10/31  10/17  10/04
Hunza Basin                                        P: 150; R: 34    07/28  10/03  08/19  07/02  09/09
Hunza Basin                                        P: 150; R: 35    07/28  09/17  08/19  05/01  09/09
Hunza Basin                                        P: 150; R: 36    06/01  06/13  05/15  07/02  10/27
Nubra Basin      34°32′–35°40′N, 76°45′–77°48′E    P: 147; R: 35    10/27  11/15  11/18  10/19  11/07
Nubra Basin                                        P: 147; R: 36    11/28  11/15  11/18  10/19  11/07
Nubra Basin                                        P: 148; R: 35    11/03  09/19  07/04  09/24  10/29
Nubra Basin                                        P: 148; R: 36    11/03  09/19  07/04  12/13  10/29
Gaizi Basin      38°10′–39°10′N, 73°42′–77°08′E    P: 149; R: 33    10/09  09/26  10/15  11/02  10/02
Gaizi Basin                                        P: 150; R: 33    06/01  10/03  08/19  07/02  09/09
Shaksgam Basin   35°31′–36°49′N, 75°35′–77°30′E    P: 148; R: 35    11/03  09/19  07/04  09/24  10/29
Others                                             P: 148; R: 33    07/03  06/15  07/04  11/11  08/01
Others                                             P: 148; R: 34    07/03  09/19  07/04  06/02  09/27
Others                                             P: 151; R: 35    10/07  07/22  10/13  10/15  08/15
Others                                             P: 151; R: 36    06/01  11/11  10/29  10/15  08/15

2.2 Profile of the Study Region
The study area of this glacier/glacial lake distribution dataset is shown in Figure 2. Based on watershed scale, hydrological variation and the spatial distribution of glaciers, the Hunza River Basin, Nubra Basin, Gaizi River Basin and Shaksgam River Basin in the Karakorum Mountains were selected as typical study objects.
The Hunza River Basin is located at the junction of the Karakorum and Hindu Kush Mountains. The terrain is steep and glaciers are densely distributed, 80% of them mountain glaciers[3]. The watershed and its surroundings contain the Paso-Mustag Mountain active glacier cluster, including the Glacier, the Guerjin Glacier, the Paso Glacier, the Bartola Glacier and many other northwest-southeast-oriented glaciers, where the hazard from glacial lake outbursts and glacier mudslides caused by glacier change is high. According to statistics, 110 glacial lakes with dynamically changing areas are distributed in the Hunza River Valley[4]. Observations of glaciers in the Hunza River Basin also show that adjacent glaciers do not retreat or advance synchronously in the same period[5], so regular, dynamic monitoring helps to understand the pattern of glacier advance and retreat and to prevent and control glacial disasters. The Paso-Mustag Glacier near the Hunza River Valley is the most active of the glaciers along the China-Pakistan Expressway. China began surveying and studying glacial debris flow disasters in the Hunza River Valley, on the southern slope of the Karakorum Mountains, in 1974. Among recent glacial disasters, the most severe occurred in 2010, when a landslide caused by an earthquake in the mountains of northern Pakistan blocked the Hunza River and formed the world’s largest dammed lake.
The Nubra Basin belongs to the Indus Basin and is located in the central part of the Karakorum Mountains, at a high average altitude. It covers about 5×10³ square kilometers. It is affected by the westerly circulation, precipitation is abundant, and glacier cover is close to 50%. The river is fed mainly by glacier meltwater. The Siachen Glacier, known as the “ Geographical Center”, lies in the basin, and there are meteorological stations such as Shiquan River and Tashkurgan. Every year a large amount of Karakoram glacier meltwater flows into tributaries of the Indus, of which the Nubra River is one. Meltwater from the Siachen Glacier is the main source of the Nubra River. Research on disasters in the Nubra Basin is relatively scarce, and long-term monitoring of its glaciers has practical significance for regional water resource use and disaster research.

Figure 2 Sketch of the geographical position in the scope of the research region
(Labelled places: Kashgar City, Buren Township, Tashkurghan County, Mazar Village, Khunjerab Port. Legend: node cities/towns on the “China-Pakistan Economic Corridor”, Karakorum Road, rivers; elevation in m.)


The Gaizi River Basin is located in the southwestern part of Xinjiang, on the eastern edge of the Pamirs. It is affected by the westerly circulation, and its glaciers are well developed in both scale and geomorphology. The Gaizi River is fed by snow and ice meltwater from the Muztag and Kongur mountains and is one of the sources of the River system. The basin has the Karakuri, Krok and Vitak hydrological stations and meteorological stations such as Kashgar and Tashkurgan. In 1984, China conducted a scientific investigation of glacial debris flows in the Gaizi River Valley on the eastern edge of the Pamirs. The basin contains many glaciers, such as the Krajak and Qimugan; the Krajak Glacier is the largest modern glacier on the northern slope of Kongur Mountain. In 2015, glacier surging was observed.
The Shaksgam River Basin is located in the source area of the Yarkant River, on the north side of the main Karakoram watershed, in the region controlled by the westerly circulation[6]. Glaciers are very densely distributed. The Shaksgam River is fed by glacial meltwater and is one of the important tributaries of the Yarkant River. The basin contains many large glaciers, such as the Insugati Glacier, the longest glacier in China, and the Tramukanli Glacier. Glacier surges have been recorded and observed many times[7-8], and this is the main study region for ice-dam outburst floods in China. The region is surrounded by three meteorological stations: Tashkurgan, Turkut and Ucha. In 1985, China began a scientific investigation of glacial floods in the Shaksgam Valley on the northern slope of the Karakorum Mountains, providing data support for the development and use of the water resources of the Yarkant River.
2.3 Data Processing Procedure
2.3.1 Data Processing Method
To extract the target classes quickly and accurately, the object-oriented classification method was chosen to process the images. By the basic unit of classification, remote sensing image classification methods are divided into pixel-based and object-oriented methods. Pixel-based classification relies mainly on spectral features, so it can hardly avoid the misclassification caused by the same object showing different spectra and different objects showing the same spectrum, and its accuracy is limited. Object-oriented classification takes the object as the basic unit, makes full use of the spectral information of the image and also classifies by features such as shape, texture, spatial structure and context[9-11], so it performs better. Remote sensing image processing is based on spectral information. Because illumination intensity and satellite system deviation differ between images acquired at different times, the spectrum of the same ground object may differ from image to image. To avoid this problem, all processing and classification are completed for each image separately before the images are mosaicked. The data processing flow is shown in Figure 3. The specific steps are as follows:


Figure 3 Data processing flow

(1) Selection of the study region: Determine the extent of the study region from the coverage of the China-Pakistan Economic Corridor and the spatial distribution of glaciers, and obtain the path and row numbers of the corresponding Landsat images.
(2) Selection and download of data: The research data are downloaded from the portal of the US Geological Survey, and images with little cloud in summer and autumn are selected to reduce interference.
(3) Preprocessing of remote sensing images: Since the selected Landsat 8 OLI products are Level 1 (L1), atmospheric correction and precise geometric correction are omitted, and the panchromatic band and the multi-spectral image are fused directly.
(4) Image fusion: The multi-spectral and panchromatic bands are combined to improve spatial resolution while preserving the multi-spectral characteristics, which improves the accuracy of ground object extraction.
(5) Determination of classification criteria: Select the feature types to be extracted, calculate and analyze the characteristic values of each class statistically, determine the specific classification criteria for the objects and fine-tune the threshold ranges.
(6) Object-oriented classification: Use the classification criteria to determine the optimal combination of segmentation parameters, and then perform object-oriented classification to interpret the remote sensing images automatically. Because the physicochemical properties of glaciers and glacial lakes vary, no single extraction standard can be fixed, so global threshold extraction is combined with local optimization during classification and extraction.
(7) Export and verification of results: When the classification result is exported, DEM data are used to remove areas whose slope is inconsistent with a water body (slope greater than 5°), which avoids confusing mountain shadow with water. Following the definition of a glacial lake, a buffer analysis is applied to the glacier classification results and combined with the river system map to remove water bodies and natural river systems that do not match the spatial distribution of glacial lakes. The classification results are then exported and evaluated against validation samples.
(8) Correction of results: In areas with poor classification results, the results are corrected by visual interpretation of multi-source data.
2.3.2 Removal of Mountain Shadows and Impact of Cloud/Snow
Mountain shadows and snow cover have their smallest extent in summer images, so selecting summer images eliminates the influence of mountain shadow and thin snow cover to the greatest extent and provides the best source for extracting the target objects in this paper. Because other factors such as cloud amount must also be considered (see Figure 4 for cloud amount information), the selection window is extended to June to November. Extracting the glacier boundary with reference to the solar elevation angle reduces the influence of mountain shadow as far as possible. For non-summer images with extensive snow, visual interpretation is used during the subsequent classification and extraction to delineate the boundary between glacier and snow and to minimize the influence of snow. The haze tool provided by ENVI removes the influence of small amounts of cloud on the images effectively during classification.
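The slope rule from step (7) and the use of the solar angles for shadow removal can be illustrated with a small sketch. It is not the authors' GIS workflow: it assumes elevation gradients have already been derived from the DEM, uses a simple Lambertian shading model, and only detects slopes that face away from the Sun (shadows cast by neighbouring ridges are not modelled). The function names are hypothetical.

```python
import math


def slope_deg(dz_dx, dz_dy):
    """Terrain slope in degrees from elevation gradients (m per m)."""
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def shading(dz_dx, dz_dy, sun_elev_deg, sun_azim_deg):
    """Lambertian terrain shading in [0, 1]; 0 means the cell faces away from the Sun.

    dz_dx is the elevation gradient toward east and dz_dy toward north.
    The azimuth is measured clockwise from north.
    """
    elev = math.radians(sun_elev_deg)
    azim = math.radians(sun_azim_deg)
    # Unit vector pointing from the ground toward the Sun (east, north, up).
    sun = (math.sin(azim) * math.cos(elev),
           math.cos(azim) * math.cos(elev),
           math.sin(elev))
    # Upward unit normal of the local terrain plane.
    norm = math.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
    normal = (-dz_dx / norm, -dz_dy / norm, 1.0 / norm)
    return max(0.0, sum(n * s for n, s in zip(normal, sun)))


def is_plausible_water(dz_dx, dz_dy, sun_elev_deg, sun_azim_deg, max_slope_deg=5.0):
    """Keep a candidate water cell only if it is flat enough and not self-shaded."""
    flat = slope_deg(dz_dx, dz_dy) <= max_slope_deg  # 5 degree rule from step (7)
    lit = shading(dz_dx, dz_dy, sun_elev_deg, sun_azim_deg) > 0.0
    return flat and lit
```

For flat terrain the shading value reduces to the sine of the solar elevation, so lower Sun angles outside summer give darker scenes and more slopes with a shading value of 0, consistent with the preference for summer imagery.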

Figure 4 Cloud amount information diagram of remote sensing images used in the paper

Using the solar incidence angle at the time each Landsat 8 OLI image was acquired, the effect of mountain shadow on target feature extraction can be reduced. The area and distribution of mountain shadows differ with solar elevation angle. The solar elevation and azimuth angles of each image are read, and a terrain shading map is generated together with the terrain slope map to obtain the possible extent of mountain shadow (mountain shadow generally lies on the shaded side of major ridgelines, where the terrain shading value is generally 0). Because water bodies and mountain shadows have similar spectral features in Landsat images, mountain shadow is removed with the slope map generated from the DEM when the result is exported: an object whose slope is greater than 5° is judged to be mountain shadow and deleted. Most glaciers in the study region have thick snow cover, with a single spectral signature and a narrow spectral range, which helps the extraction of glacier boundaries. To remove the influence of seasonal snow on glacier extraction, some areas in images acquired outside summer are delineated manually with reference to the original Google Earth imagery. Glaciers and perennial snow are difficult to distinguish in a single-date medium-resolution image. Where sufficient meteorological data are available, they can be used to identify images in which the glacier boundary is not affected by snow accumulation, so that snow and glacier extents are distinguished as far as possible.
2.3.3 Extraction Method of Glaciers and Water Bodies
Common methods for extracting snow- and ice-covered surfaces are the ratio threshold method and the Normalized Difference Snow/Ice Index (NDSI). The ratio threshold method distinguishes ground objects using the strong absorption of snow and ice in the short-wave infrared band and their strong reflection in the visible band, while the snow index method normalizes the reflectance of snow and ice in a visible band and the short-wave infrared band to extract the target. In this paper, NDSI is used to distinguish snow and ice surfaces from other surfaces, calculated from the visible green band and the short-wave infrared band of the Landsat 8 OLI images.
The calculation formula is (1):

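$$\mathrm{NDSI} = \frac{\mathrm{Band}_{\mathrm{Green}} - \mathrm{Band}_{\mathrm{SWIR1}}}{\mathrm{Band}_{\mathrm{Green}} + \mathrm{Band}_{\mathrm{SWIR1}}} \tag{1}$$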

where NDSI is the snow index, $\mathrm{Band}_{\mathrm{Green}}$ is the visible green band of the Landsat 8 OLI image, and $\mathrm{Band}_{\mathrm{SWIR1}}$ is the short-wave infrared band of the Landsat 8 OLI image.
Common methods for extracting water bodies include the single-band threshold method and multi-band methods (the water index method and the inter-spectral relationship method). The single-band threshold method is relatively simple and has limited precision. This paper uses the water index method, which classifies on the difference in spectral values between water and other ground features across bands. Water bodies within the snow and ice surface are extracted with the Normalized Difference Water Index (NDWI); because the reflectance of water in the near-infrared band is almost zero, the visible green band and the near-infrared band of the Landsat 8 OLI images are used. The calculation formula is (2):

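$$\mathrm{NDWI} = \frac{\mathrm{Band}_{\mathrm{Green}} - \mathrm{Band}_{\mathrm{NIR}}}{\mathrm{Band}_{\mathrm{Green}} + \mathrm{Band}_{\mathrm{NIR}}} \tag{2}$$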

where NDWI is the water index, $\mathrm{Band}_{\mathrm{Green}}$ is the visible green band of the Landsat 8 OLI images, and $\mathrm{Band}_{\mathrm{NIR}}$ is the near-infrared band of the Landsat 8 OLI images.
The optimal threshold range is determined by comparing the statistical characteristics of the greyscale maps calculated with the snow index and the normalized water index. An area with NDSI greater than 0.05 is classed as snow and ice surface; within that surface, an area with NDWI greater than 0.15 is classed as water. Because the images selected in this paper differ in terrain, geomorphology and acquisition time, using only NDSI and NDWI with uniform thresholds cannot process all the data effectively. For images with poor extraction results, the thresholds and method are therefore adjusted to obtain more accurate boundaries of the glaciers and glacial lakes in the study region.
2.3.4 Segmentation and Classification
The main steps of object-oriented classification are image segmentation and classification, with segmentation first. After comparing experimental results, multi-scale segmentation was selected as the segmentation method. A commonly used method in object-oriented classification, multi-scale segmentation uses the spectral, shape and texture information of the image to merge regions from the bottom up, so that homogeneity within each object and heterogeneity between objects are both maximized. Pixels or small objects are merged only while the heterogeneity of the merged object remains below a set threshold[12-14]. The polygon formed by segmentation contains not only the original spectral information of the merged pixels but also shape, texture and spatial location information. The technical pathway of the object-oriented classification process is shown in Figure 5. The specific steps are as follows:

Figure 5 Technical pathway for object oriented classification process


(1) Multi-scale segmentation and parameter selection: The merged images are segmented. Appropriate segmentation rules are established by combining experience with the results of segmentation experiments, and the segmentation scale and parameters are adjusted accordingly. The choice of segmentation scale and parameters directly affects the classification accuracy. Based on the resolution of the Landsat images and the accuracy requirements of the classification, the segmentation scale is set to 200. Drawing on previous studies and repeated trials, the segmentation parameters are set to a compactness of 0.5 and a shape parameter of 0.2, which give better classification results.

(2) Object feature extraction: The NDSI and NDWI feature sets are obtained by calculating the normalized difference snow index and the normalized difference water index from bands 3, 5 and 6 of Landsat 8.

(3) Automatic computer classification: Preliminary classification and extraction are carried out in order according to the classification rules: snow and ice surfaces satisfy NDSI > 0.05; glaciers satisfy NDSI > 0.05 and NDWI ≤ 0.15; glacial lakes satisfy NDSI > 0.05 and NDWI > 0.15. Global threshold segmentation with these initial thresholds extracts all possible water bodies in the study region, including partially melted glaciers and mountain shadows. Melting glaciers are excluded from the results using the criteria NIR < 0.15 in the near-infrared band and SWIR < 0.05 in the short-wave infrared band. Local optimization is then carried out to eliminate misclassified water bodies, with each candidate glacial lake used as a local segmentation unit. The NDWI histogram of a glacial lake unit is bimodal, and this is used as the decision criterion: if the histogram has a single peak, the unit is considered a non-glacial lake and is excluded; if the histogram is bimodal, the unit is considered a glacial lake and its boundary is determined. The mean slope and mean shading value of each glacial lake unit are then calculated. Units whose mean slope falls outside the slope range of water bodies are removed, as are units whose shading value falls within the shading range of mountain shadows.

(4) Merging and removal of small map spots: Adjacent objects of the same category are combined into one. Smaller map spots are merged with adjacent spots of the same category or, where no such spot exists, with features of other categories. Following the criteria used in previous glacier inventories based on medium-resolution remote sensing data such as Landsat, and considering the actual area of image pixels, 0.01 km² is selected as the minimum area threshold for glaciers and 0.001 km² as the minimum area threshold for retaining glacial lakes.

(5) Post-classification evaluation: The classification results are verified and evaluated.
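The threshold rules in steps (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. It assumes each segmented object is summarized by its mean reflectance in Landsat 8 OLI band 3 (green), band 5 (NIR) and band 6 (SWIR), and that NDSI and NDWI take their usual normalized-difference forms. All function and constant names are hypothetical. Treating water candidates that fail the NIR/SWIR test as glacier, and keeping objects whose area equals the threshold, are also assumptions.

```python
# Thresholds quoted in the text; every name here is hypothetical.
NDSI_MIN = 0.05          # snow/ice surface: NDSI > 0.05
NDWI_LAKE = 0.15         # glacial lake: NDWI > 0.15
NIR_WATER_MAX = 0.15     # water keeps NIR < 0.15
SWIR_WATER_MAX = 0.05    # water keeps SWIR < 0.05
MIN_AREA_KM2 = {"glacier": 0.01, "glacial_lake": 0.001}


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def classify_object(green, nir, swir):
    """Label one image object from its mean band reflectances.

    NDSI = (green - swir) / (green + swir)
    NDWI = (green - nir) / (green + nir)
    """
    ndsi = normalized_difference(green, swir)
    ndwi = normalized_difference(green, nir)
    if ndsi <= NDSI_MIN:
        return "other"
    if ndwi <= NDWI_LAKE:
        return "glacier"
    # Water candidate: real water is dark in both NIR and SWIR.
    # Assumption: candidates failing this test are melting glacier ice.
    if nir < NIR_WATER_MAX and swir < SWIR_WATER_MAX:
        return "glacial_lake"
    return "glacier"


def keep_object(label, area_km2):
    """Apply the minimum-area thresholds of step (4)."""
    return area_km2 >= MIN_AREA_KM2.get(label, 0.0)
```

The bimodal-histogram, slope and shading checks described in step (3) are not covered by this sketch; they would run on the objects labelled as glacial lakes before the area filter.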

3. Sample description

The study region includes typical surface types of high, cold alpine regions, such as glaciers, snow, and water bodies. Glaciers and glacial lakes are selected as the target object types for extraction. The types and extents of features (objects) in this dataset are referenced to the GCS_WGS_1984 coordinate system. Following existing glacier and glacial lake catalogue specifications, glacier boundaries are extracted and glacier attributes are calculated using the internationally accepted glacier cataloguing method. The files include FID, shape type, GLIMS code (GLIMS_ID), glacier name (Glc_Name), area, perimeter, category (Class_name), centroid coordinates (x-coordinate, y-coordinate) and other information. The distribution of glaciers and glacial lakes in the typical region of the China-Pakistan Economic Corridor in 2015 is shown in Figure 6.

China Scientific Data Vol. 4, No. 3, 2019

[Figure 6: map with longitude ticks from 70°E to 82°E and latitude ticks at 35°N and 40°N; scale bar 0 to 480 km. Legend: key zones in China-Pakistan Economic Corridor; boundary of watersheds in study region; glacial lake; glacier.]

Figure 6 Distribution map of glaciers/glacial lakes in key zones of China-Pakistan Economic Corridor in 2015

During data preparation, the area and perimeter attributes are calculated with the tools provided by ArcGIS; the glaciers and glacial lakes are coded according to the GLIMS coding method, and some glacier names are supplemented. Specifically, the longitude and latitude of the glacier or glacial lake centroid are encoded as follows:

GnnnnnnEmmmmm[N|S]
GLnnnnnnEmmmmm[N|S]

where G stands for glaciers and GL for glacial lakes; n and m are obtained from the centroid longitude and latitude, respectively, expressed in decimal degrees, multiplied by 1000 and rounded to an integer; and N and S denote the northern and southern hemispheres, respectively.

The object-oriented classification method shows good applicability in this data preparation process. Because it considers texture and other information, it classifies regularly shaped features well and performs well in extracting water features. It effectively suppresses cloud-snow interference, extracts glaciers with high precision, and improves the integrity and homogeneity of the classification results[15-16]. The method used in this paper has good overall classification performance, but in some cases the snow cover index method misassigns mountain shadows, rivers or other water bodies to the snow and ice surface class. In addition, the normalized water body index method cannot completely distinguish water bodies from snow and ice surfaces, so some water bodies are misjudged as glaciers.
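The GLIMS-style coding described in this section can be illustrated with a short Python sketch. It assumes the six n digits hold the centroid longitude and the five m digits hold the latitude, both in thousandths of a degree, and that longitude is counted eastward from 0° to 360°. The function name `glims_id` and the choice of nearest-integer rounding are assumptions for illustration, not details confirmed by the dataset.

```python
def glims_id(lon_deg, lat_deg, is_lake=False):
    """Build a GLIMS-style code from a centroid in decimal degrees.

    Assumptions (hypothetical helper, not the authors' tool):
    - longitude is expressed eastward in the range [0, 360)
    - values are scaled by 1000 and rounded to the nearest integer
      (Python rounds exact .5 ties to the even neighbour)
    """
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError("latitude must lie between -90 and 90 degrees")
    prefix = "GL" if is_lake else "G"
    hemisphere = "N" if lat_deg >= 0 else "S"
    lon_code = round((lon_deg % 360.0) * 1000) % 360000
    lat_code = round(abs(lat_deg) * 1000)
    return f"{prefix}{lon_code:06d}E{lat_code:05d}{hemisphere}"


# A glacier centroid at 75.5678°E, 36.1234°N
print(glims_id(75.5678, 36.1234))
# A glacial lake centroid at 76.25°E, 35.5°N
print(glims_id(76.25, 35.5, is_lake=True))
```

Zero-padding to six and five digits keeps every code the same length, so codes sort consistently and can be split back into longitude and latitude by position.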

4. Quality control and assessment

Unsupervised classification identifies and classifies features automatically, without a priori samples or human intervention, and is a mainstream method for automatic information extraction from remote sensing images. Supervised classification selects representative, well-defined surface points as training samples, establishes decision functions from the characteristic parameters of the samples, classifies the pixels, and then improves the classification accuracy by testing against control samples. It is widely recognized as a high-precision method for feature classification. To evaluate the adaptability and accuracy of the object-oriented method, reference data were formed by combining the results of maximum likelihood supervised classification with visual interpretation, which is currently acknowledged as the most accurate classification approach. The confusion matrix was calculated against these reference data (Table 2), and the producer accuracy, user accuracy, overall accuracy and Kappa coefficient were calculated for accuracy evaluation (Table 3). Table 3 shows that the object-oriented classification method achieves high classification accuracy for glaciers and water bodies and meets users' needs for automatic interpretation of Landsat remote sensing images. The overall classification accuracy is 0.9072165 and the Kappa coefficient is 0.8088042, indicating excellent adaptability in classifying ground objects (features) from Landsat 8 remote sensing images.

The study region overlaps with the second glacier catalogue data of China. The glacier distribution data for the same region were therefore selected to evaluate the accuracy of the extracted glaciers (Table 4). Table 4 shows that the classification method of this paper classifies glaciers with relatively high accuracy and is consistent with the glacier catalogue. Because glacial lakes vary strongly with the seasons, and different types of glacial lakes differ in their inter-annual and seasonal variation, summer and autumn remote sensing images were selected for mosaicking. Although this choice suits glacier extraction, it strongly affects the extraction of glacial lake data. In future work, we will combine remote sensing images with shorter revisit cycles and use multi-source data to extract glacial lake information.

Table 2 Confusion matrix for object-oriented classification

| User \ Reference | Glacial Lake | Glacier | Other Objects | Total |
|---|---|---|---|---|
| Glacial Lake | 11 | 0 | 1 | 12 |
| Glacier | 1 | 62 | 6 | 69 |
| Other Objects | 0 | 1 | 15 | 16 |
| Total | 12 | 63 | 22 | 97 |

Table 3 Accuracy analysis for single types and overall accuracy

| Type | Producer Accuracy | User Accuracy | Hellden Accuracy | Short Accuracy | Kappa Coefficient |
|---|---|---|---|---|---|
| Glacial Lake | 0.9166667 | 0.9166667 | 0.9166667 | 0.8461538 | 0.905 |
| Glacier | 0.9841270 | 0.8985507 | 0.9393939 | 0.8857143 | 0.945 |
| Other Objects | 0.6818182 | 0.9375 | 0.7894737 | 0.6521739 | 0.619 |

Overall classification accuracy: 0.9072165. Kappa coefficient of all types: 0.8088042.
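As a cross-check, the accuracy figures in Table 3 can be recomputed from the counts in Table 2. The sketch below is illustrative and is not the authors' code. It assumes rows are classified (user) classes and columns are reference classes, and it uses the standard definitions of producer accuracy, user accuracy, Hellden mean accuracy, Short mapping accuracy and Cohen's Kappa. The per-class Kappa is computed as a conditional Kappa on producer accuracy; that choice is an inference that happens to reproduce the tabulated values, not something the paper states. All names are hypothetical.

```python
# Counts from Table 2 (rows: classified class, columns: reference class).
LABELS = ["Glacial Lake", "Glacier", "Other Objects"]
MATRIX = [
    [11, 0, 1],
    [1, 62, 6],
    [0, 1, 15],
]


def accuracy_report(matrix):
    """Return (overall accuracy, Kappa, list of per-class metric dicts)."""
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]        # classified totals
    col_totals = [sum(col) for col in zip(*matrix)]  # reference totals
    diagonal = [matrix[i][i] for i in range(len(matrix))]

    overall = sum(diagonal) / n
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)

    per_class = []
    for d, r, c in zip(diagonal, row_totals, col_totals):
        producer = d / c
        per_class.append({
            "producer": producer,
            "user": d / r,
            "hellden": 2 * d / (r + c),
            "short": d / (r + c - d),
            # Assumed form of the per-class Kappa (conditional Kappa).
            "kappa": (producer - r / n) / (1 - r / n),
        })
    return overall, kappa, per_class


overall, kappa, per_class = accuracy_report(MATRIX)
print(f"overall={overall:.7f} kappa={kappa:.7f}")
for label, metrics in zip(LABELS, per_class):
    print(label, {name: round(value, 7) for name, value in metrics.items()})
```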


Table 4 Accuracy of the glacier type evaluated against the second glacier catalogue data of China

| Type | Producer Accuracy | User Accuracy | Hellden Accuracy | Short Accuracy | Overall Classification Accuracy | Kappa Coefficient |
|---|---|---|---|---|---|---|
| Glacier | 1 | 0.6551724 | 0.7916667 | 0.6551724 | 0.7297297 | 0.451 |
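The Hellden and Short values in Table 4 can be checked directly from the producer and user accuracies. The short derivation below is a consistency check under the standard definitions of the two indices. The symbols P (producer accuracy), U (user accuracy), H (Hellden accuracy) and S (Short accuracy) are introduced here for illustration and do not appear in the original paper.

```latex
% Standard definitions expressed through producer (P) and user (U) accuracy
H = \frac{2PU}{P + U}, \qquad S = \frac{PU}{P + U - PU}

% Table 4 values: P = 1, U = 0.6551724
H = \frac{2 \times 1 \times 0.6551724}{1 + 0.6551724} \approx 0.7916667

S = \frac{1 \times 0.6551724}{1 + 0.6551724 - 0.6551724} = 0.6551724
```

With a producer accuracy of 1, the Short accuracy reduces to the user accuracy, which is why the two columns carry the same value.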

5. Value and significance

Glacier disasters in the study region are widely distributed, diverse in type, and highly hazardous. Regular monitoring of the spatial location and distribution of local glaciers and glacial lakes is therefore of great significance for research on and prevention of glacial disasters. Against the background of global warming, glaciers in some parts of the Karakoram Mountains are stagnating or even advancing, and glacier movement is active. Research on selected typical areas is of great significance for understanding climate change in the Karakoram Mountains. Many types of geological disasters occur near the China-Pakistan Economic Corridor; they are widely distributed, and most are triggered by glacial activity. Relevant observations and research can provide theoretical and data support for disaster prevention and mitigation along the China-Pakistan Expressway. In the Karakoram and the eastern Pamir, under the joint influence of the westerlies and the monsoon, the glaciers are large, and glacial disasters such as floods and glacial debris flows occur frequently. However, because of the complex terrain and harsh environment, field observation data are scarce, and remote sensing is needed for monitoring. The ground resolution of Landsat images no longer fully meets the evolving needs of fine-scale surface research, but Landsat remains irreplaceable as reference data in the preparation of surface type data. This paper automatically interprets glaciers, water and other land types through image selection, multi-scale segmentation, and determination of feature extraction functions. Compared with traditional methods, the object-oriented classification method improves interpretation accuracy while maintaining the timeliness of interpretation.

The dataset was prepared by extracting ground features (objects) with the object-oriented classification method for the glacier distribution of the China-Pakistan Economic Corridor. With the optimized segmentation parameters and classification criteria, the overall classification accuracy is 0.9072165 and the Kappa coefficient is 0.8088042. In the object-oriented classification method, the selection of segmentation parameters strongly influences the classification results, and the optimal parameter combinations differ among types of remote sensing images and target features. Because uniform indicators for evaluating segmentation quality are lacking, judging the segmentation visually introduces a degree of subjectivity into the classification. Future work should focus on establishing a segmentation evaluation index and broadening the applicability of the object-oriented classification method. Owing to differences in topography and seasonal variation, no single method processes all data with good results. Differences in glacier morphology and characteristics mean that a single threshold cannot give good extraction results over a wide area. In the processing procedure, in addition to using the NDSI and NDWI indices as the basis for extraction, some images require an information extraction method that combines threshold segmentation with a classification algorithm. The best way to verify classification accuracy is to use field survey points as sample points. However, the ground object categories of the verification samples used here come from supervised classification results based on visual interpretation, which inevitably introduces bias.


6. Usage notes and recommendations

This dataset provides the distribution of glaciers and glacial lakes in typical zones of the China-Pakistan Economic Corridor for 2013–2017, derived from Landsat 8 OLI images. The features were classified with the object-oriented method, and the results provide fundamental data for glacier change monitoring and local hydrological change research, as well as important reference data for local glacier disaster research. The dataset is saved in vector SHP format. Commonly used GIS and remote sensing software such as ArcGIS, QGIS, ENVI and ERDAS all support reading and processing the data.

Acknowledgements

We thank the USGS Geological Data Center for providing the Landsat data, and the Cold and Arid Science Data Center for providing the second glacier catalogue data of the Chinese Pamirs and the Pakistan glacier catalogue data.

References

1. Qiu J. Measuring the meltdown. Nature Publishing Group, 2010.
2. Qin DH. Scientific Thesaurus for Frozen Circle. Beijing: Meteorological Press, 2014.
3. Campbell JG, Pradesh H. Inventory of glaciers, glacial lakes and the identification of potential glacial lake outburst floods (GLOFs) affected by global warming in the mountains of , Pakistan and China/Tibet Autonomous Region. Available: [Accessed June 12, 2018].
4. Zhu YY, Yang ZQ, Ye CY. Glacier disasters with China-Pakistan Karakoram road. Road Transportation Technologies, 31(2014): 51-59.
5. Cockerill G, Conway M, Younghusband F, et al. Explorations in the Karakoram: discussion. The Geographical Journal, 68(1926): 468-473.
6. Feng T. Comparison study for glacier changes in south and north slopes of Chogori Karakoram. Beijing: University of Chinese Academy of Sciences, 2015.
7. Shangguan DH, Liu SY, Ding YJ, et al. Pulsating glaciers found in recent years at Shaksgam valley Karakoram. Glaciers and Frozen Earth, 27(2005): 641-644.
8. Xu AW, Yang TB, Wang CQ, et al. Remote Sensing Monitor of Glacier Changes at Shaksgam Basin Karakoram from 1978 to 2015. Progress of Geographical Science, 35(2016): 878-888.
9. Su W, Li J, Chen Y, et al. Textural and local spatial statistics for the object-oriented classification of urban areas using high resolution imagery. International Journal of Remote Sensing, 29(2008): 3105-3117.
10. Yu Q, Gong P, Clinton N, et al. Object-based detailed vegetation classification with airborne high spatial resolution remote sensing imagery. Photogrammetric Engineering & Remote Sensing, 72(2006): 799-811.
11. Lobo A, Chic O, Casterad A. Classification of Mediterranean crops with multisensor data: per-pixel versus per-object statistics and image segmentation. International Journal of Remote Sensing, 17(1996): 2385-2400.
12. Zhu JJ, Du XP, Fan XT, et al. Comparison of three image segmentation algorithm and improvement of image segmentation method. Computer Application and Software, 31(2014): 194-196.
13. Lyv ZY, Zhang XL, Gao LP, et al. Research and Application Analysis for FNEA Segmentation Algorithm Based on High Resolution Remote Sensing Data. Mapping & Space Geographical Information, 35(2012): 13-16.
14. Tan QL, Liu ZJ, Shen W. An object oriented remote sensing image multi-scale segmentation method. Journal of Beijing Jiaotong University, 31(2007): 111-114.
15. Laliberte AS, Rango A, Havstad KM, et al. Object-oriented image analysis for mapping shrub encroachment from 1937 to 2003 in southern New Mexico. Remote Sensing of Environment, 93(2004): 198-210.
16. Schiewe J, Tufte L, Ehlers M. Potential and problems of multi-scale segmentation methods in remote sensing. GeoBIT/GIS, 6(2001): 34-39.

Authors and contributions

Ren Yanrun, PhD candidate; research direction: hydrology in cold and dry regions. Main responsibilities: algorithm design and implementation, data processing, data accuracy verification.
Zhang Yaonan, PhD, researcher; research direction: big data in the geosciences. Main responsibilities: design of the data processing workflow.
Kang Jianfang, master's degree, engineer; research direction: big data applications in cold and dry regions. Main responsibilities: data management and analysis.

Data Citation

1. Ren YR, Zhang YN, Kang JF. A dataset of glacier and glacial lake distribution in key areas of the China-Pakistan Economic Corridor during 2013 – 2017. Science Data Bank, 2019. (2019-06-12). DOI: 10.11922/sciencedb.786.

How to cite this article: Ren YR, Zhang YN, Kang JF. A dataset of glacier and glacial lake distribution in key areas of the China-Pakistan Economic Corridor during 2013 – 2017. China Science Data 4(2019). DOI: 10.11922/csdata.2019.0022.zh.
