
A dataset of lakes with an area above 10 km² in northwest China (2000 – 2014)

China Scientific Data, Vol. 3, No. 4, 2018

Zhang Dahong¹, Li Xiaofeng², Yao Xiaojun¹*

1. College of Geography and Environmental Science, Northwest Normal University, Lanzhou 730070, P.R. China; 2. Geological Survey Institute of Sichuan Province, Chengdu 610081, P.R. China
* Email: [email protected]

ARTICLE DOI: 10.11922/csdata.2018.0041.zh
DATA DOI: 10.11922/sciencedb.621
SUBJECT CATEGORY: Earth sciences
RECEIVED: June 22, 2018; RELEASED: June 28, 2018; PUBLISHED: November 14, 2018

Abstract: Northwest China lies deep inland and has a dry climate. To a certain extent, lake area reflects the temporal and spatial distribution and variation of regional water resources. Based on a comprehensive analysis of meteorological data and the actual coverage of Landsat satellite images, we first identified the time periods for interpreting the lakes in northwest China. Referring to "A data set of lakes with an area above 1 km² in China (2005-2006) in the scale of 1:250000", this study selected 113 non-dry salt lakes with an area above 10 km² as vectorization objects, all of which are in natural conditions. The lake boundary vector data from 2000 to 2014 were extracted by manual visual interpretation. Following the relevant interpretation principles stipulated by the Ministry of Science and Technology, the interpretation accuracy was kept within one pixel. The dataset includes three parts: (1) boundary vector data of northwest China; (2) boundary vector data of the lakes from 2000 to 2014; and (3) centroid vector data and area statistics of the lakes. The dataset reflects the changes in lake boundaries in northwest China from 2000 to 2014 and can serve as basic data for research on the temporal and spatial changes of lakes in this region, climate change, and human intervention in regional water resources.

Keywords: lake; northwest China; Landsat; visual interpretation

Dataset Profile
English title: A dataset of lakes with an area above 10 km² in northwest China (2000 – 2014)
Corresponding author: Yao Xiaojun ([email protected])
Data authors: Zhang Dahong, Li Xiaofeng, Yao Xiaojun
Time range: 2000 – 2014
Geographical scope: Northwest China (31°36′N – 49°36′N, 73°29′E – 111°27′E); specific areas include the Tienshan Mountains, Altun Mountains, Qilian Mountains, Kunlun Mountains, Altay Mountains, Junggar Basin, Tarim Basin and Turpan Basin. Administrative units include Shaanxi Province, Gansu Province, Qinghai Province, Xinjiang Uygur Autonomous Region, and Alxa League, … City and … in Inner Mongolia Autonomous Region.
Spatial resolution: 30 m
Data volume: 37.9 MB
Data format: ESRI Shapefile (compressed as *.zip)
Data service system: http://www.sciencedb.cn/dataSet/handle/621
Sources of funding: Open Foundation of the State Key Laboratory of Cryosphere Sciences, Chinese Academy of Sciences (SKLCS-OP-2016-10); Fundamental Program from the Ministry of Science and Technology of China (MOST) (2013FY111400)
Dataset composition: The dataset consists of 3 subsets: (1) Boundary_NWC.zip contains boundary data of the regions of northwest China (1.29 MB); (2) Lake_NWC_2000-2014.zip contains boundary data of the lakes with an area above 10 km² in northwest China (36.56 MB); and (3) Location_NWC.zip contains location data of the lakes, including statistics of lake area changes in 2000–2014 (0.01 MB).

1. Introduction

Lake information, which can reveal global climate change and the corresponding regional responses, plays a special role in maintaining regional food, ecological and environmental security [1-2]. As sensitive indicators of climatic and environmental change, lakes directly embody global change [2-4]. Because lake area changes reflect the spatial and temporal distribution of regional water resources, vector data of lake boundaries are fundamental for research on the spatial and temporal changes of lakes, climate change and regional water resource utilization. Although advances in remote sensing have enabled lake information to be extracted semi-automatically or automatically, acquiring long-term, large-scale lake data is still limited by many factors, such as algal cover, water quality (e.g., sediment content and salinity), lake depth, image shadows and cloud cover. These factors cause the same lake to show different spectral signatures. Manual visual interpretation, a traditional method for recognizing lakes in remote sensing imagery, has drawbacks such as a heavy workload and a high demand for knowledge and experience, but it makes full use of prior knowledge to generate high-accuracy data.

Along with the large-scale development of land resources in northwest China over the past 50 years, a large number of water conservancy facilities were built and the surface runoff feeding the lakes was excessively intercepted, aggravating water shortages downstream. This has caused many problems, such as rapid shrinkage, salinization and even drying up of lakes. The ecological environments of the lakes and their adjacent areas were seriously endangered, resulting in a loss of lake biodiversity and intensified desertification of lakeside areas. In 2014, the Chinese Academy of Sciences launched the STS project "Comprehensive Assessment of Ecological Changes in Northwest China". One of its tasks was to use remote sensing data to make a detailed assessment of the ecological changes in northwest China over the past 15 years. An important component of this assessment was understanding the current status and variation characteristics of the lakes in northwest China, which would provide a basis for formulating scientific protection measures. In recent years, several relevant lake datasets have been published [5], such as "A dataset of microwave brightness temperature and freeze-thaw for medium-to-large lakes over the High Asia region 2002 – 2016" [6] and "MODIS (MOD09Q1)-derived lake water surface dataset of the Tibetan Plateau (2000 – 2012)" [7]. However, high-accuracy, long time-series datasets covering the whole of northwest China are still rare.
This dataset was produced from a combination of Landsat remote sensing images, "A data set of lakes with an area above 1 km² in China (2005-2006) in the scale of 1:250000" [5], "A data set of vector boundaries of main lakes in Hoh Xil region (2000-2011)" [8] and China Lakes Record [9]. Following the relevant regulations on manual visual interpretation of lake area stipulated by the Ministry of Science and Technology [1], we obtained the vector data of lake boundaries in northwest China from 2000 to 2014 by manual visual interpretation, with an accuracy within one pixel. The time series can be extended with remote sensing products from other periods to analyze the spatial and temporal changes of lake area and water volume in the region. Combined with meteorological data, the dataset can also support research on the response mechanism of lakes to climate change and on the spatial and temporal distribution of water resources in the region.

2. Data collection and processing

Because lake area is affected by seasonal precipitation, its intra-annual fluctuation is usually large. Remote sensing images should therefore be selected from the period when lake area is relatively stable. Monthly temperature and precipitation data were obtained from 33 national meteorological stations in northwest China, downloaded and collated by the National Meteorological Data Center (http://data.cma.cn/); for each lake, the data of the nearest station were used. The base remote sensing images for visual interpretation of the lakes were obtained from USGS (https://glovis.usgs.gov). A total of 590 Landsat TM/ETM+/OLI images were downloaded, with a data volume of about 450 GB. The objects of visual interpretation were determined according to "A data set of lakes with an area above 1 km² in China (2005-2006) in the scale of 1:250000" from the National Earth System Science Data Sharing Infrastructure (http://lake.geodata.cn). Lake names were taken from China Lakes Record [9]. When interpreting the boundaries of lakes in the Hoh Xil region, we also referred to "A data set of vector boundaries of main lakes in Hoh Xil region (2000-2011)" [8].
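The nearest-station matching described above can be sketched with a great-circle (haversine) distance. This is a minimal illustration, not the authors' script: the station identifiers and coordinates are hypothetical, and measuring "nearest" on a spherical Earth with a mean radius of 6371.0088 km is our assumption.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lake: Tuple[float, float],
                    stations: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Return (station_id, distance_km) of the station closest to a lake centroid.

    `lake` and each station value are (longitude, latitude) pairs in degrees.
    """
    lon, lat = lake
    distances = {sid: haversine_km(lon, lat, s_lon, s_lat)
                 for sid, (s_lon, s_lat) in stations.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


if __name__ == "__main__":
    # Hypothetical stations, for illustration only.
    stations = {"ST_A": (86.0, 42.0), "ST_B": (100.0, 37.0)}
    print(nearest_station((87.0, 42.0), stations))
```

A planar distance in projected coordinates would give practically the same assignments at these scales; the haversine form simply avoids needing a projection for the station list.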

2.1 Objects of interpretation and selection of remote sensing images

According to Lake-basin Science Data [5], there are 437 natural lakes with an area above 1.0 km² in northwest China, with a total area of 2.03×10⁴ km², accounting for 16.2% of the total number of lakes in China and 24.9% of China's total lake area. Among them, 143 lakes have an area above 10.0 km², with a total area of 1.93×10⁴ km², accounting for 95.1% of the total lake area in northwest China; these constitute the main body of lakes in the region. Two types of lakes that do not reflect natural conditions were excluded when building the dataset: dry salt lakes strongly affected by seasonal precipitation, and lakes with dams or a large number of diversion facilities. The study selected 113 non-dry salt lakes with an area above 10 km² as vectorization objects (Figure 1): 77 in Qinghai Province, 33 in Xinjiang Uygur Autonomous Region, 1 in Gansu Province, 1 in Shaanxi Province and 1 in Inner Mongolia Autonomous Region. The total area of these lakes accounts for 81.76% of that of the lakes with an area above 1.0 km², so they basically reflect the overall characteristics of lake area change in northwest China.
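The selection rule above (area above 10 km², excluding dry salt lakes and lakes regulated by dams or diversions) amounts to a simple filter over a lake inventory. The sketch below assumes a hypothetical record type with boolean flags for the two exclusion criteria; the source does not say how those flags were determined, so here they are simply inputs, and the sample records are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LakeRecord:
    name: str
    area_km2: float
    dry_salt_lake: bool  # hypothetical flag: dry salt lake strongly affected by seasonal precipitation
    regulated: bool      # hypothetical flag: dammed or with many diversion facilities


def select_objects(inventory: List[LakeRecord], min_area_km2: float = 10.0) -> List[LakeRecord]:
    """Keep natural lakes strictly larger than min_area_km2."""
    return [lake for lake in inventory
            if lake.area_km2 > min_area_km2
            and not lake.dry_salt_lake
            and not lake.regulated]


def area_share(subset: List[LakeRecord], inventory: List[LakeRecord]) -> float:
    """Fraction of the inventory's total area covered by `subset`."""
    return sum(l.area_km2 for l in subset) / sum(l.area_km2 for l in inventory)


if __name__ == "__main__":
    inventory = [  # hypothetical records
        LakeRecord("Lake A", 50.0, False, False),
        LakeRecord("Lake B", 12.0, True, False),
        LakeRecord("Lake C", 30.0, False, True),
        LakeRecord("Lake D", 5.0, False, False),
    ]
    chosen = select_objects(inventory)
    print([l.name for l in chosen], round(area_share(chosen, inventory), 3))
```

Applied to the inventory totals quoted above, the same share computation gives 1.93×10⁴ / 2.03×10⁴ ≈ 95.1%, matching the stated proportion.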

Fig.1 Map of lake distribution in northwest China

To minimize the impact of intra-annual lake area changes and ensure that lake boundaries are comparable across years, we analyzed the monthly precipitation data from 33 national meteorological stations in northwest China. Based on the intra-annual precipitation pattern of each region, we identified the months with low and stable precipitation for visual interpretation of the lakes. Landsat images for these months were then downloaded, and images with little cloud and snow cover were selected as base maps. October and November contributed the largest numbers of images (256 and 179, respectively), followed by September, August and December (136 in total). In addition, a small number of images (19) from January, February and July were used for certain years or lakes.
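The source does not give the exact criterion used to identify months with low and stable precipitation. One plausible reading, sketched below, keeps months whose multi-year mean precipitation is below a threshold and whose coefficient of variation across years is also small. Both thresholds and the sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def stable_dry_months(monthly_precip: Dict[int, List[float]],
                      max_mean_mm: float, max_cv: float) -> List[int]:
    """Return months (1-12) with low mean and low inter-annual variability.

    monthly_precip maps month -> monthly precipitation totals (mm), one per year.
    """
    selected = []
    for month in sorted(monthly_precip):
        values = monthly_precip[month]
        m = mean(values)
        # Coefficient of variation; a month with no precipitation in any year counts as stable.
        cv = pstdev(values) / m if m > 0 else 0.0
        if m <= max_mean_mm and cv <= max_cv:
            selected.append(month)
    return selected


if __name__ == "__main__":
    station = {7: [60.0, 90.0, 40.0], 10: [5.0, 6.0, 4.0],
               11: [2.0, 8.0, 0.0], 12: [0.0, 0.0, 0.0]}  # hypothetical totals
    print(stable_dry_months(station, max_mean_mm=10.0, max_cv=0.5))
```

With these hypothetical values the call keeps October and December. Tallying the images actually used by month is a useful consistency check: 256 + 179 + 136 + 19 = 590, the total number of images downloaded.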

2.2 Data processing

Dataset production involved five main steps: reading and band composition of remote sensing images, quality inspection of remote sensing images, visual interpretation of lake boundaries, entry of attribute values, and quality inspection of lake boundaries. The steps are described as follows.

(1) Reading and band composition of remote sensing images. After batch downloading and decompression, the bands needed for visual interpretation (standard false color composite: NIR-R-G) were extracted with ArcPy scripts and composited into a multi-band TIF dataset in the order G, R, NIR.

(2) Quality inspection of remote sensing images. The centroid coordinates of each lake to be interpreted were entered to locate it, and the cloud and snow cover on the lake surface was checked. If the cover exceeded the requirements, other images from the same year were checked until one met the conditions.

(3) Visual interpretation of lake boundaries. With qualified images as base maps and the display scale kept at 1:500 – 1:1000, lake boundaries were digitized along the centers of the pixels at the land-water boundary, and vectorization accuracy was kept within 1 pixel. Islands were excluded from the lake boundary polygons where appropriate. To ensure accuracy, the vector boundaries of each lake were checked at least once by a second member of the working group.

(4) Entry of attribute values. The following data were entered manually into the attribute table: the code, name, type, region, province, and first-order and second-order basins of each lake; the path/row number, acquisition date and sensor type of the Landsat image; and the trend of lake change. Other attributes, including the area, perimeter, and centroid longitude and latitude of each lake and its change compared with the previous year, were calculated automatically with the Calculate Geometry tool in ArcGIS.

(5) Quality inspection of lake boundaries. The graphic and attribute data of each lake boundary were checked directly to ensure that no data items were missing and that all were correct.
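For step (1), the band numbers of green, red and NIR differ between sensors: bands 2, 3 and 4 on Landsat TM and ETM+, and bands 3, 4 and 5 on Landsat 8 OLI. The sketch below only builds the per-scene band list in the stated G, R, NIR order and the RGB display mapping for the NIR-R-G false color composite. The `<scene>_B<n>.TIF` file-name pattern is an assumption modelled on USGS Landsat product naming, and the actual compositing (for example with ArcPy's Composite Bands tool) is left out so the example runs without ArcGIS.

```python
from typing import Dict, List, Tuple

# Landsat band numbers for (green, red, near-infrared).
BANDS_G_R_NIR: Dict[str, Tuple[int, int, int]] = {
    "TM": (2, 3, 4),    # Landsat 4/5 Thematic Mapper
    "ETM+": (2, 3, 4),  # Landsat 7 Enhanced Thematic Mapper Plus
    "OLI": (3, 4, 5),   # Landsat 8 Operational Land Imager
}

STACK_ORDER = ("G", "R", "NIR")  # order of layers in the composited TIF


def band_files(scene_id: str, sensor: str) -> List[str]:
    """File names of the G, R and NIR bands of one scene, in stacking order.

    The '<scene>_B<n>.TIF' pattern is an assumed naming convention.
    """
    try:
        bands = BANDS_G_R_NIR[sensor]
    except KeyError:
        raise ValueError(f"unsupported sensor: {sensor!r}") from None
    return [f"{scene_id}_B{b}.TIF" for b in bands]


def false_color_rgb_layers() -> Tuple[int, int, int]:
    """1-based layer indices to show as (red, green, blue) for an NIR-R-G composite."""
    nir, r, g = (STACK_ORDER.index(name) + 1 for name in ("NIR", "R", "G"))
    return (nir, r, g)
```

With the stack stored as G, R, NIR, the false color display therefore uses layers 3, 2, 1 for red, green and blue, whichever sensor the scene came from.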
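Step (2) is a fall-back search: if the preferred image has too much cloud or snow over the lake, other images from the same year are tried until one passes. A minimal sketch, assuming each candidate carries a hypothetical pre-computed fraction of the lake surface obscured by cloud or snow and a hypothetical 10% threshold; the source states neither how cover was measured nor the limit used.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    scene_id: str
    year: int
    obscured_fraction: float  # hypothetical: share of lake surface under cloud/snow, 0-1


def pick_base_image(candidates: Iterable[Candidate], year: int,
                    max_obscured: float = 0.10) -> Optional[Candidate]:
    """Return the first same-year candidate that meets the cover limit, or None.

    Candidates are assumed to be pre-sorted by preference (e.g. preferred month first).
    """
    for cand in candidates:
        if cand.year == year and cand.obscured_fraction <= max_obscured:
            return cand
    return None
```

Returning None instead of the least-bad image makes a lake-year with no acceptable scene explicit, so it can be handled separately rather than silently interpreted on a poor base map.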
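Step (4) relies on ArcGIS Calculate Geometry for area, perimeter and centroid. For readers working outside ArcGIS, the same planar quantities can be sketched with the shoelace formula, assuming vertices are already in projected (Albers equal-area) coordinates in metres. Islands are treated as holes: subtracted from the area, added to the perimeter and removed from the area-weighted centroid. Whether ArcGIS counts island shorelines in the perimeter in exactly this way is an assumption worth checking against the data.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def ring_metrics(ring: Sequence[Point]) -> Tuple[float, float, Point]:
    """Return (area_m2, perimeter_m, centroid) of one ring using the shoelace formula."""
    pts: List[Point] = list(ring)
    if len(pts) > 1 and pts[0] == pts[-1]:  # drop repeated closing vertex
        pts = pts[:-1]
    if len(pts) < 3:
        raise ValueError("a ring needs at least three distinct vertices")
    twice_area = cx = cy = perimeter = 0.0
    for i, (x0, y0) in enumerate(pts):
        x1, y1 = pts[(i + 1) % len(pts)]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    signed_area = twice_area / 2.0
    if signed_area == 0.0:
        raise ValueError("degenerate ring with zero area")
    # Using the signed area makes the centroid independent of vertex orientation.
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return abs(signed_area), perimeter, centroid


def lake_metrics(outer: Sequence[Point],
                 islands: Iterable[Sequence[Point]] = ()) -> Tuple[float, float, Point]:
    """Area (km2), perimeter (km) and centroid (m) of a lake polygon with island holes."""
    area, perimeter, (x, y) = ring_metrics(outer)
    moment_x, moment_y = area * x, area * y
    for island in islands:
        a_i, p_i, (x_i, y_i) = ring_metrics(island)
        area -= a_i        # islands are not water
        perimeter += p_i   # island shorelines counted in the perimeter (assumption)
        moment_x -= a_i * x_i
        moment_y -= a_i * y_i
    return area / 1e6, perimeter / 1e3, (moment_x / area, moment_y / area)
```

The centroid comes out in projected metres; converting it to the longitude and latitude stored in the attribute table would additionally need the inverse Albers projection, which is not shown here.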

3. Sample description

3.1 Graphic samples of the dataset

Files in this dataset are named Lake_NWC_YYYY, where Lake stands for lake, NWC for northwest China and YYYY for the year, so the year of each layer can be read from its file name. The distribution of lakes in the region is shown in Figure 2. Three interpreted lakes were selected to show their changes over the whole time series: sub-figures A, B and C show Lake Ayakum, Lake Bosten and Ebinur Lake, respectively. As Figure 2 shows, the boundaries of all three lakes shifted horizontally to varying degrees between 2000 and 2014. The area of Lake Ayakum increased continuously, by 335.01 km², an expansion of 1.52 times. In the eastern part of the lake, where the lake basin is relatively flat, the shoreline shifted horizontally by more than 30 km. The area of Lake Bosten decreased by 216.31 km², about 19.21% relative to 2000. The change occurred in the open marginal areas of the lake basin, while the main body of the lake changed little. The area of Ebinur Lake decreased by 238.93 km² (39.52%), and the shape of its main body changed conspicuously. Although the lake shrank overall, its area fluctuated locally, with large changes both between and outside the two peaks in 2003 and 2012. These three lakes are representative of the whole region in terms of both area and shape change, and they illustrate how the dataset captures lake boundaries and their changes, which is where its value lies.
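Because the year is encoded in each layer name, a time series can be assembled programmatically. The helpers below, a sketch with hypothetical function names, parse the year from a Lake_NWC_YYYY file name (with or without the .shp extension) and compute the absolute and relative area change between two years, the two figures quoted for each lake above.

```python
import re
from pathlib import Path
from typing import Tuple

_NAME = re.compile(r"^Lake_NWC_(\d{4})$")


def year_from_name(filename: str) -> int:
    """Extract YYYY from 'Lake_NWC_YYYY' or 'Lake_NWC_YYYY.shp'."""
    match = _NAME.match(Path(filename).stem)
    if match is None:
        raise ValueError(f"not a Lake_NWC_YYYY name: {filename!r}")
    return int(match.group(1))


def area_change(area_start_km2: float, area_end_km2: float) -> Tuple[float, float]:
    """Return (absolute change in km2, change as a percentage of the start area)."""
    delta = area_end_km2 - area_start_km2
    return delta, 100.0 * delta / area_start_km2
```

The archive name Lake_NWC_2000-2014.zip deliberately does not match the pattern, so only per-year layers are picked up when scanning a directory.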

Fig.2 Sketch map of lake area changes over 15 years

3.2 Attribute table of the dataset

The database of visual interpretation results includes 15 fields describing the geometric and coordinate parameters of the lakes, as well as certain affiliation parameters (such as region, province and basin) and physical and chemical parameters; NULL values are allowed for certain fields. The lake code (Code) is the unique primary key of the data table and is adapted from the GLIMS coding scheme. It has the format LnnnnnnEmmmmmN, where L stands for lake, n for the centroid longitude in 2000 and m for the centroid latitude in 2000, and E and N denote east longitude and north latitude, respectively. The first three digits n and the first two digits m give the integer parts of the longitude and latitude, and the last three digits n and m give the first three decimal places of the longitude and latitude, respectively. The centroid longitude and latitude of East Juyan Lake were calculated from the 2002 data instead. Data fields and their descriptions are shown in Table 1.
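The coding rule (three integer and three decimal digits of longitude, two integer and three decimal digits of latitude, 14 characters in total) can be expressed as a short function. This sketch assumes the decimals are truncated rather than rounded, since the text says only "the first three decimals", and it handles only east longitudes and north latitudes, which covers northwest China; the example coordinates are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def lake_code(lon_deg: float, lat_deg: float) -> str:
    """Build a 14-character code LnnnnnnEmmmmmN from a centroid in decimal degrees."""
    if not (0 <= lon_deg < 180 and 0 <= lat_deg < 90):
        raise ValueError("only east longitudes and north latitudes are supported")
    # Decimal(str(...)) avoids binary float artefacts such as 86.123 -> 86.12299999...
    milli = Decimal("0.001")
    lon_milli = int(Decimal(str(lon_deg)).quantize(milli, rounding=ROUND_DOWN) * 1000)
    lat_milli = int(Decimal(str(lat_deg)).quantize(milli, rounding=ROUND_DOWN) * 1000)
    # 3 integer + 3 decimal digits of longitude; 2 integer + 3 decimal digits of latitude.
    return f"L{lon_milli:06d}E{lat_milli:05d}N"
```

For example, a hypothetical centroid at 86.123456°E, 41.987654°N gives L086123E41987N. Zero-padding keeps longitudes below 100°E (such as 73.5°E, giving L073500…) at the fixed 14-character length of the Code field.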

Table 1 Data field description

No. | Field | Type | Length | Allow NULL values? | Description
1 | Code | string | 14 | No | Code of lake
2 | Name | string | 30 | Yes | Name of lake
3 | Property | string | 10 | No | Type of lake
4 | Region | string | 20 | No | Region where the lake is located
5 | Province | string | 20 | No | Province where the lake is located
6 | Basin_F | string | 20 | No | First-order basin
7 | Basin_S | string | 30 | No | Second-order basin
8 | Area | double | 16 | No | Lake area
9 | Perimeter | double | 16 | No | Lake perimeter
10 | Long_Cen | float | 8 | No | Centroid longitude of lake
11 | Lat_Cen | float | 8 | No | Centroid latitude of lake
12 | RC_Landsat | string | 6 | No | Path/row number of Landsat image
13 | Date_DS | date | - | No | Acquisition date of Landsat image
14 | Sensor | string | 20 | No | Sensor type of Landsat image
15 | Area_Ch | string | 10 | No | Trend of lake change
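Table 1 can double as a lightweight schema when re-checking the attribute table after editing, in the spirit of the quality inspection in step (5). The sketch below transcribes field types, string lengths and NULL rules from the table. It treats the lengths of numeric fields as storage widths and does not check them, and the sample record values used with it should be read as hypothetical.

```python
from datetime import date
from typing import Any, Dict, List, Optional, Tuple

# field -> (type, max string length or None, NULL allowed), transcribed from Table 1
SCHEMA: Dict[str, Tuple[str, Optional[int], bool]] = {
    "Code": ("string", 14, False), "Name": ("string", 30, True),
    "Property": ("string", 10, False), "Region": ("string", 20, False),
    "Province": ("string", 20, False), "Basin_F": ("string", 20, False),
    "Basin_S": ("string", 30, False), "Area": ("double", None, False),
    "Perimeter": ("double", None, False), "Long_Cen": ("float", None, False),
    "Lat_Cen": ("float", None, False), "RC_Landsat": ("string", 6, False),
    "Date_DS": ("date", None, False), "Sensor": ("string", 20, False),
    "Area_Ch": ("string", 10, False),
}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record fits Table 1."""
    problems = []
    for field, (ftype, max_len, nullable) in SCHEMA.items():
        value = record.get(field)
        if value is None:
            if not nullable:
                problems.append(f"{field}: NULL not allowed")
            continue
        if ftype == "string":
            if not isinstance(value, str):
                problems.append(f"{field}: expected string")
            elif len(value) > max_len:
                problems.append(f"{field}: longer than {max_len} characters")
        elif ftype in ("double", "float"):
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{field}: expected number")
        elif ftype == "date" and not isinstance(value, date):
            problems.append(f"{field}: expected date")
    return problems
```

Only Name may be NULL, so a missing area, code or acquisition date is reported immediately, which matches the aim of step (5) that no data item be missing.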

4. Quality control and assessment

Visual interpretation followed Ma et al. [1]. Interpretation accuracy depends on image resolution, image acquisition date (time phase), registration accuracy, lake area and the interpreter's professional experience. The absolute accuracy of lake area was evaluated with Liao et al.'s method for calculating the vectorized area error of lake boundaries [10], and the relative accuracy of lake area was defined as the ratio of absolute accuracy to lake area. Manual visual interpretation is inherently uncertain, particularly with respect to mixed pixels and display scale; these uncertainties can be regarded as unavoidable random errors proportional to the length of the lake boundary, and they can be effectively controlled through subsequent statistical calculation. For a scientific evaluation of lake changes, users extracting lake area changes from this dataset are advised to apply Li et al.'s method [11].
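The text does not reproduce Liao et al.'s error formula. A perimeter-based estimate used in several glacial-lake inventories, consistent with the statement that the error is proportional to boundary length, is sketched below. With perimeter P, pixel size G (30 m for Landsat) and a factor k = 0.6872 that those studies use for a 1σ error, ΔA = (P/G)·(G²/2)·k, and relative accuracy = ΔA / A as defined above. Whether this matches Liao et al.'s exact formulation is an assumption, so treat the numbers as illustrative.

```python
def area_uncertainty_km2(perimeter_km: float, pixel_m: float = 30.0,
                         k: float = 0.6872) -> float:
    """Perimeter-based area error (P / G) * (G**2 / 2) * k, returned in km2.

    P / G approximates the number of boundary pixels; each is assumed to
    contribute half a pixel of area error, scaled by k (an assumed 1-sigma factor).
    """
    perimeter_m = perimeter_km * 1000.0
    error_m2 = (perimeter_m / pixel_m) * (pixel_m ** 2 / 2.0) * k
    return error_m2 / 1e6


def relative_accuracy(area_error_km2: float, area_km2: float) -> float:
    """Relative accuracy as defined in the text: absolute error divided by lake area."""
    return area_error_km2 / area_km2
```

For a lake with a 100 km perimeter this gives about 1.03 km². Because the error grows linearly with perimeter while area grows roughly with its square, relative accuracy improves for larger, more compact lakes.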

5. Value and significance

Compared with other lake datasets [6-7], this dataset features a larger spatial extent, a longer time series and higher temporal resolution. It also uses Landsat images of high spatial resolution as base maps for visual interpretation of the lakes, which remains relatively rare. In addition, manual visual interpretation substantially improves the reliability of the data. The dataset reflects the current status and overall change of the lakes in northwest China from 2000 to 2014, as well as the patterns of their variation and spatial differentiation, and can be used as basic data in lake-related research.

6. Usage notes

After decompression, the dataset can be opened, viewed, edited and analyzed with GIS software that supports the ESRI Shapefile format, such as ArcGIS. The data use the WGS-84 geographic coordinate system and an Albers (equal-area conic) projection, so parameters such as lake area change and its rate can be calculated directly. The dataset can be used as is or extended. Extended to a longer time series and combined with meteorological data, it can be used to analyze the temporal and spatial changes of lake area and lake volume in the region. It can also be used to explore the response mechanism of lakes to climate change and the spatial and temporal distribution of water resources in the region, providing a theoretical basis for government departments managing the allocation of local water resources.
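Because areas in the Albers projection are in metric units, change rates can be computed directly from the yearly area values. One common choice, sketched here as an assumption rather than the authors' or Li et al.'s method, is the least-squares slope of area against year (km² per year), optionally expressed relative to the first year's area.

```python
from typing import Sequence


def linear_trend(years: Sequence[int], areas_km2: Sequence[float]) -> float:
    """Least-squares slope of area versus year, in km2 per year."""
    n = len(years)
    if n < 2 or n != len(areas_km2):
        raise ValueError("need at least two paired (year, area) values")
    mean_year = sum(years) / n
    mean_area = sum(areas_km2) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    if sxx == 0:
        raise ValueError("years must not all be equal")
    sxy = sum((y - mean_year) * (a - mean_area) for y, a in zip(years, areas_km2))
    return sxy / sxx


def relative_trend_percent(years: Sequence[int], areas_km2: Sequence[float]) -> float:
    """Slope as a percentage of the first year's area, per year."""
    return 100.0 * linear_trend(years, areas_km2) / areas_km2[0]
```

Using the first year as the reference is a choice; the mean area over the period is an equally defensible denominator, and the regression slope is less sensitive to a single anomalous year than a simple end-point difference.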

References

1. Ma R H, Yang G H, Duan H T et al. China's lakes at present: Number, area and spatial distribution. Sci China Earth Sci 41 (2011): 394 – 401.
2. Ding Y J, Liu S Y, Ye B S et al. Climatic implications on variations of lakes in the cold and arid regions of China during the recent 50 years. Journal of Glaciology and Geocryology 28 (2006): 623 – 632.
3. Zhu G, Gao H J & Zeng G. Lake change research and reasons analysis in Xinjiang arid regions during the past 35 years. Arid Land Geography 38 (2015): 103 – 110.
4. Zhang X, Wu Y H & Zhang X. Water level variation of inland lakes on the south-central Tibetan Plateau in 1972-2012. Acta Geographica Sinica 69 (2014): 993 – 1001.
5. National Earth System Science Data Sharing Infrastructure, National Science and Technology Infrastructure. Lake-basin Science Data, available at: <http://lake.geodata.cn>.
6. Qiu Y, Guo H, Ruan Y et al. A dataset of microwave brightness temperature and freeze-thaw for medium-to-large lakes over the High Asia region 2002 – 2016. China Scientific Data 2 (2017). DOI: 10.11922/csdata.170.2017.0117
7. Lu S, Jin J, Jia L et al. MODIS (MOD09Q1)-derived lake water surface dataset for the Tibetan Plateau (2000 – 2012). China Scientific Data 2 (2017). DOI: 10.11922/csdata.170.2016.0113
8. Yao X J, Liu S Y, Li L et al. Spatial-temporal variations of lake area in Hoh Xil region in the past 40 years. Acta Geographica Sinica 68 (2013): 886 – 896.
9. Wang S M, Dou H S, Chen K Z et al. China Lakes Record. Beijing: Science Press, 1998.
10. Liao S F, Wang X, Xie Z C et al. Changes of glacial lakes in different watersheds of Chinese Himalaya during the last four decades. Journal of Natural Resources 30 (2015): 293 – 303.
11. Li X F, Yao X J, Sun M P et al. Spatial-temporal variations in lakes in the northwest China from 2000 to 2014. Acta Ecologica Sinica 38 (2018): 96 – 104.

Data citation

1. Zhang D, Li X & Yao X. A dataset of lakes with an area above 10 km² in northwest China (2000 – 2014). Science Data Bank. DOI: 10.11922/sciencedb.621 (2018).

Authors and contributions

Zhang Dahong, MSc; research area: GIS design and development. Contribution: basic data collection and processing, manuscript writing.

Li Xiaofeng, MSc; research area: GIS design and development. Contribution: basic data processing and visual interpretation.

Yao Xiaojun, PhD, Professor; research area: GIS and cryospheric change. Contribution: design of the overall scheme.

How to cite this article: Zhang D, Li X & Yao X. A dataset of lakes with an area above 10 km² in northwest China (2000 – 2014). China Scientific Data 3 (2018). DOI: 10.11922/csdata.2018.0041.zh
