International Journal of Geo-Information

Article Where Urban Youth Work and Live: A Data-Driven Approach to Identify Urban Functional Areas at a Fine Scale

Yiming Yan 1,2, Yuanyuan Wang 1,3,*, Zhenhong Du 1,2, Feng Zhang 1,2 , Renyi Liu 1,3 and Xinyue Ye 4 1 School of Earth Sciences, University, 38 Zheda Road, 310027, ; [email protected] (Y.Y.); [email protected] (Z.D.); [email protected] (F.Z.); [email protected] (R.L.) 2 Zhejiang Provincial Key Laboratory of Geographic Information Science, 148 Tianmushan Road, Hangzhou 310028, China 3 Ocean Academy, , 1 Zheda Road, Zhoushan 316021, China 4 Department of Informatics, New Jersey Institute of Technology, Newark, NJ 07102, USA; [email protected] * Correspondence: [email protected]; Tel.: +86-571-8827-3287

 Received: 13 December 2019; Accepted: 13 January 2020; Published: 14 January 2020 

Abstract: As a major labor force of cities, young people provide a huge driving force for urban innovation and development, and contribute to urban industrial upgrading and restructuring. In addition, with the acceleration of , the young floating population has increased rapidly, causing over-urbanization and creating certain social problems. It is important to analyze the demand of urban youth and promote their social integration. With the development of the mobile Internet and the improvement of the city express system, ordering food delivery has become a popular and convenient way to dine, especially in China. Food delivery data have a significant user attribute where the ages of most delivery customers are under 35 years old. In this paper, we introduce food delivery data as a new data source in urban functional zone detection and propose a time-series-based clustering approach to discover the urban hotspot areas of young people. The work and living areas were effectively identified according to the human behavioral characteristics of ordering food delivery. Furthermore, we analyzed the relationship between young people and the industry structure of Hangzhou and discovered that the geographical distribution of the identified work areas was similar to that of the Internet and e-commerce companies. The characteristics of the identified living areas were also analyzed in combination with the distribution of subway lines and residential communities, and it was found that the living areas were mainly distributed along subway lines and that urban villages appeared in the living hotspot regions, indicating that transportation and living cost were two important factors in the choice of residential location for young people. The findings of this paper can help urban industrial and residential planning and young population management.

Keywords: food delivery data; urban youth; urban hotspots; industrial structure; urban village

1. Introduction The youth population is a group with strong creativity and high consumption power, who provide a strong driving force for urban innovation and development. With the acceleration of urbanization in China, a large number of young people have migrated from villages to cities. The fast-growing young floating population causes over-urbanization, which leads to urban congestion and insufficient infrastructure, and creates certain social problems, especially with respect to housing, employment,

ISPRS Int. J. Geo-Inf. 2020, 9, 42; doi:10.3390/ijgi9010042 www.mdpi.com/journal/ijgi ISPRS Int. J. Geo-Inf. 2020, 9, 42 2 of 20 and transportation [1–3]. On the other hand, young people are accustomed to learning new things and can better adapt to the development of emerging industries [4] such as new-generation information and renewable energy technologies, and they contribute to urban industrial upgrading and restructuring. Therefore, it is important to analyze the demand of urban youth and facilitate their social integration to promote sustainable urban development. Housing and employment are hot research areas for urban issues and are also the focus of urban youth. Many studies use questionnaires or government survey statistics to analyze the urban residential environment and employment situation such as the built environment of residential areas and workplaces, the choice of the commuting mode, and the regional jobs–housing balance [5–7]. However, these survey methods have certain defects in subjective bias, sample size, and timeliness. In recent years, with the widespread use of positioning devices such as civilian GPS (Global Positioning System) technology on mobile terminals and the development and popularization of location-based services and mobile social networks, a large number of trajectory or geotag data are accumulating in daily life and are serving different types of applications [8–10]. These data sources mainly include mobile phones [11–16], taxis [17–21], public or shared bicycles [22–25], transit smart cards [26–29], and so on. They convey human mobility and activity information and provide content and methodological innovation to the research of the spatial-temporal behavior of people in urban areas [8,30]. The identification of urban functional areas such as residential and industrial areas is an important way to understand urban land use structure and population distribution [31,32]. Mao et al. extracted the spatial distribution of urban jobs and housing areas by using a re-clustering algorithm based on the “job–residential factor” index from massive taxi origin-destination (OD) points [17]. Pei et al. identified land use types such as residential, business, commercial, open space and others from mobile phone activity data by using a semi-supervised clustering method [13]. Long et al. analyzed the urban job–housing structure and commuting patterns with the location–time–duration (PTD) data mined from the raw bus smart card data [29]. Zhang et al. used urban bicycle rental records and Point-of-Interest (POI) information to delineate urban functional zones by applying topic modeling techniques where business areas, industrial areas, and residential areas were well detected [25]. However, the above-mentioned data have certain geographical limitations. Taxi GPS data mainly exist in road networks, and the location of the mobile phone data comes from the locations of the base transceiver stations. The smart card data only provide the location of transportation stops and stations, and the origin and destination location of public bicycles are determined by the service stations. These limitations will not be able to effectively identify the complex geographical environment such as communities and colleges where they are outside the road networks or away from the transportation stations. In addition, most studies on the relationship between urban areas with age, gender, and socio-economic factors are based on static survey data. Therefore, there is a lack of a big data-driven approach to detect the geographic distribution of specific age groups. Due to the wide usage of mobile phones and the convenience of delivery systems, food delivery has become a popular way of eating because it has convenient order services, takes less time, and offers a variety of food options. Food delivery data are trajectory data produced by delivery riders from the store to the customer’s place after receiving the order, usually using electric scooters in China. According to the China delivery big data reports issued by the food delivery services (such as Ele.me, meituan) in 2017 [33–35], the number of online catering users exceeded 300 million, and the delivery dataset has significant user characteristics as follows: (1) the largest user population are white-collar workers, which account for over 80% of the total delivery orders, and the second largest user population are college students; and (2) the majority of users are under 35 years old. In other words, young people are the major consumer groups of online ordering. In this paper, food delivery data are introduced as a new data source into urban functional zone detection. Compared with other trajectory or geotag data, the characteristics of delivery data provide a new way to explore the spatially significant areas of the urban youth group. In addition, the electric delivery scooters can enter complex geographical areas such as residential communities, ISPRS Int. J. Geo-Inf. 2020, 9, 42 3 of 20

ISPRS Int. J. Geo-Inf. 2020, 9, 42 3 of 20 college campuses, and large industrial parks, and can provide a more accurate dataset for functional areaextraction. extraction. This paper This paper proposes proposes a time-series-based a time-series-based clustering clustering approach approach to discover to discover the urban the hotspot urban hotspotareas of areasyoung of people. young A people. case study A case of Hangzhou, study of Hangzhou, China, which China, is famous which isfor famous its Internet for its economy, Internet economy,was implemented was implemented to discuss tothe discuss relationship the relationship between urban between youth urban and youth the urban and theindustrial urbanindustrial structure structureand the factors and the that factors affect that the aresidentialffect the residential location choice location of choiceyoung ofpeople. young people. The remainder of this paper is organized as follows: Section 22 describesdescribes thethe studystudy areaarea andand datadata source, andand performsperforms some some statistics statistics of theof the dataset. datase Sectiont. Section3 discusses 3 discusses the method the method for extracting for extracting urban functionalurban functional areas from areas food from delivery food delivery data, and data, the and results the results are verified are verified with o withffice urbanoffice urban planning planning maps. Sectionmaps. Section4 analyzes 4 analyzes the spatial the distribution spatial distribution of the urban of the youth urban population, youth population, combined combined with the industrial with the structureindustrial andstructure the distribution and the distri ofbution residential of residential areas in Hangzhou.areas in Hangzhou. This paper This concludes paper concludes with a briefwith summarya brief summary and discussion and discussion in Section in Section5. 5.

2. Study Area and Data Hangzhou, capital of ’s Zhejiang Province, is famous for its Internet and digital economy, andand it it is is also also home home to to Chinese Chinese Internet Intern andet and e-commerce e-commerce giant giant Alibaba Alibaba and aand number a number of other of world-leadingother world-leading high-tech high-tech companies. companies. According Accordin to theg statisticsto the statistics in 2018, in the 2018, main the business main business income ofincome Hangzhou’s of Hangzhou’s digital economydigital economy exceeded exceeded 1 trillion 1 yuan,trillion and yuan, its and added its valueadded reached value reached 335.6 billion 335.6 yuan,billion accounting yuan, accounting for 24.8% for of 24.8% the city’s of the GDP city’s [36 GDP]. The [36]. study The area study of area this paperof this is paper the main is the city main area city of Hangzhou,area of Hangzhou, which coverswhich 74covers street 74 districts street districts with an with area an of overarea 1600of over km 16002 (Figure km2 1(Figure). 1).

Figure 1. Study area.

The dataset used in this paper was provided by the Chinese logistics company “Dianwoda”, which recordedrecorded foodfood deliverydelivery electric electric scooter scooter trips trips that that occurred occurred in in Hangzhou Hangzhou from from 28 28 July July 2017 2017 to 26to September26 September 2017, 2017, and and the user-relatedthe user-related information inform wasation removed was removed for privacy for privacy protection. protection. The original The deliveryoriginal datadelivery collected data real-timecollected GPSreal-time information GPS information of the electric of scootersthe electric during scooters the entire during food the delivery entire process.food delivery The fields process. of the The dataset fields mainly of the include dataset the mainly order ID,include rider the ID, order status, ID, timestamp, rider ID, latitude, status, andtimestamp, longitude. latitude, The order and longitude. ID is a unique The order identifier ID is for a unique a delivery identifier order, for while a delivery the rider order, ID is while a unique the identifierrider ID is for a unique a delivery identifier rider. Real-time for a delivery GPS locationrider. Real-time information GPS is location converted information to latitude is and converted longitude, to accompaniedlatitude and longitude, by a time accompanied stamp. The fieldby a time “status” stam representsp. The field di “status”fferent stages represents of the different food delivery stages processof the food including delivery dispatch, process arriving, including arrive, dispatch, leave, arriving, delivering, arrive, and leave, finish. delivering, In this study, and the finish. origin In andthis destinationstudy, the origin (OD) and pairs destination for each delivery (OD) pairs order for are each worthy delivery of our order major are worthy concern, of where our major the origin concern, for thewhere delivery the origin order for refers the delivery to the location order refers of the to restaurant, the location and of the the destination restaurant, refers and the to thedestination location ofrefers the to delivery the location customer. of the Thedelivery record, customer. whose statusThe record, changes whose from status arrive changes to leave, from was arrive extracted to leave, as thewas origin extracted of one as delivery the origin order, of andone thedelivery record, order, whose and status the changes record, fromwhose delivering status changes to finish, from was delivering to finish, was extracted as the destination. In this way, we could obtain the delivery OD dataset by extracting and reorganizing the delivery electric scooter trip data. ISPRS Int. J. Geo-Inf. 2020, 9, 42 4 of 20

ISPRS Int. J. Geo-Inf. 2020, 9, 42 4 of 20 extracted as the destination. In this way, we could obtain the delivery OD dataset by extracting and reorganizingIn the delivery the delivery OD dataset, electric each scooter record trip data.includes the order ID, the departure and arrival time, the originIn the and delivery destination OD dataset, location, each the recorddelivery includes trip distance, the order and ID,the thedelivery departure time. andThe delivery arrival time, OD datasetthe origin contained and destination 7,480,402 location, orders. Table the delivery 1 shows trip a distance,small subset and of the these delivery data time.for reference, The delivery and FigureOD dataset 2 shows contained the location 7,480,402 distribution orders. Tableof the1 fshowsood delivery a small destination subset of these points data in the for reference,study area. and Figure2 shows the location distribution of the food delivery destination points in the study area. Table 1. A small subset of the delivery origin-destination (OD) dataset. Each row corresponds to an occupiedTable 1. A delivery small subset order. of the delivery origin-destination (OD) dataset. Each row corresponds to an occupied delivery order. Departure Departure Departure Arrival Arrival Arrival Duration Distance Order ID Order DatetimeDeparture lonDeparture latDeparture DatetimeArrival lonArrival latArrival (sec)Duration (meter)Distance ID Datetime lon lat Datetime lon lat (sec) (meter) 2017-07-29 2017-07-29 166256640 120.279905 30.305337 120.28229 30.313441 1206 4187.01 20:59:482017-07-29 21:19:552017-07-29 166256640 120.279905 30.305337 120.28229 30.313441 1206 4187.01 2017-07-3020:59:48 2017-07-3021:19:55 166862850 2017-07-30 120.152728 30.182544 2017-07-30 120.144546 30.161267 1317 4587.01 166862850 13:27:28 120.152728 30.182544 13:49:26 120.144546 30.161267 1317 4587.01 13:27:28 13:49:26 2017-07-29 2017-07-29 165838860 2017-07-29 120.32855 30.305576 2017-07-29 120.328559 30.305209 389 1403.50 165838860 13:57:53 120.32855 30.305576 14:04:21 120.328559 30.305209 389 1403.50 13:57:53 14:04:21 2017-07-282017-07-28 2017-07-282017-07-28 164347920164347920 120.382136120.382136 30.305377 30.305377 120.387719120.387719 30.299668 30.299668 529 1977.62 10:18:4410:18:44 10:27:3310:27:33

Figure 2. Location distribution of the food delivery destination points in the study area. Figure 2. Location distribution of the food delivery destination points in the study area. As with many large datasets, the delivery OD dataset contains errors. Some errors are easy to identifyAs with such many as point large coordinates datasets, outthe ofdelivery the research OD data areaset or contains order trips errors. with Some a negative errors distanceare easy orto identifytime. Table such2 shows as point the coordinates thresholds usedout of in the each research filter as area well or as order the quantity trips with of ordersa negative that distance violate the or time.threshold. Table A2 shows trip distance the thresholds of less than used 0 in as each a negative filter as value well as was the filtered quantity because of orders it is that geometrically violate the threshold.impossible, A and trip the distance orders whoseof less tripthan distance 0 as a negative was more value than was 20 km filtered were alsobecause filtered it is because geometrically of their impossible,low incidence. and The the ordersorders withwhose a deliverytrip distance time was between more 1 than min and20 km 90 were min werealso filtered reserved. because Figure of3 theirshows low the incidence. delivery durationThe orders and with trip adistance deliverydistribution time between of 1 the min delivery and 90 ODmin data were after reserved. filtering. Figure 3 shows the delivery duration and trip distance distribution of the delivery OD data after filtering.

ISPRS Int. J. Geo-Inf. 2020, 9, 42 5 of 20

ISPRS Int. J. Geo-Inf. 2020, 9, 42 5 of 20 Table 2. The error thresholds that are applied to the trip distance and duration of the dataset. The last column shows the percentage of orders that violate each threshold. ISPRSTable Int. J. Geo-Inf. 2. The 2020error, 9 thresholds, 42 that are applied to the trip distance and duration of the dataset. The last5 of 20 column shows the percentage of orders that violate each threshold. Feature Lower Threshold Upper Threshold Portion of Errors (%) Table 2. The error thresholds that are applied to the trip distance and duration of the dataset. The last Trip Distance columnFeature shows the percentage0 Lower of orders Threshold that violate 20,000.0 Upper each threshold. Threshold 0.20 Portion of Errors (%) (meters) Trip Distance 0 20,000.0 0.20 FeatureDuration(meters) (min) Lower 1 Threshold 90 Upper Threshold 0.10 Portion of Errors (%) Trip DistanceDuration (meters) (min) 0 1 90 20,000.0 0.10 0.20 Duration (min) 1 90 0.10

Figure 3. Delivery duration and trip distance distribution after filtering. Figure 3. Delivery duration and trip distance distribution after filtering. Figure 3. Delivery duration and trip distance distribution after filtering. We also calculated some statistics to understand the characteristics of the dataset. As shown in We also calculated some statistics to understand the characteristics of the dataset. As shown in Figure 4, we counted the average number of delivery orders of each day in one week and the average FigureWe4, wealso counted calculated the averagesome statistics number to of understand delivery orders the characteristics of each day in of one the week dataset. and As the shown average in number of delivery orders on an hourly basis in one day for both weekdays and weekends. It was numberFigure 4, of we delivery counted orders the average on an hourlynumber basis of deliver in oney orders day for of both each weekdays day in one and week weekends. and the average It was observed that people ordered more deliveries on weekends than on weekdays. There were two peaks observednumber of that delivery people orders ordered on more an hourly deliveries basis on in weekends one day for than both on weekdays.weekdays Thereand weekends. were two It peaks was on each day, one peak from 11:00 to 13:00 and another peak from 17:00 to 20:00, which was similar onobserved each day, that one people peak ordered from 11:00 more to 13:00deliveries and anotheron weekends peak fromthan on 17:00 weekdays. to 20:00, There which were was similartwo peaks on on both weekdays and weekends. bothon each weekdays day, one and peak weekends. from 11:00 to 13:00 and another peak from 17:00 to 20:00, which was similar on both weekdays and weekends.

Figure 4. (a) Average number of delivery orders each day in a week; (b) Average number of delivery Figure 4. (a) Average number of delivery orders each day in a week; (b) Average number of delivery orders on an hourly basis on weekdays and weekends. orders on an hourly basis on weekdays and weekends. Figure 4. (a) Average number of delivery orders each day in a week; (b) Average number of delivery 3. Methodsorders on an hourly basis on weekdays and weekends. 3. Methods In this section, we aimed to explore the spatially significant areas of urban youth, and determine the3. Methods functionalIn this section, types we of theaimed extracted to explore areas. the The spatially process significant was divided areas into of urban the following youth, and steps: determine (1) map the functional foodIn this delivery section, types destination weof the aimed extracted to points explore areas. into the aThe spatially spatiotemporal process significant was divided cube areas and into of divide theurban following the youth, research andsteps: determine area (1) map into the food delivery destination points into a spatiotemporal cube and divide the research area into equalthe functional grids; (2) types calculate of the the extracted delivery areas. hot valueThe pr byocess using was the divided modified into Getis-Ordthe following statistic steps: method, (1) map andequalthe food construct grids; delivery (2) thecalculate timedestination seriesthe delivery bypoints combining hot into value a spatio the by timeusingtemporal pattern the modified cube with and the Getis-Ord divide delivery the statistichot research values method, area of each andinto grid;constructequal and grids; (3)the (2) clustertime calculate series the time bythe combining delivery series of hot the the value delivery time bypatte using datarn with withthe modified the the delivery unsupervised Getis-Ord hot values k-means statistic of each methodmethod, grid; andand determine(3)construct cluster thethe the timetime functional seriesseries byof types thecombining delivery of clusters the data time based with patte onthern the unsupervised with behavioral the delivery characteristicsk-means hot valuesmethod ofof and orderingeach determine grid; food and delivery.the(3) clusterfunctional Finally, the timetypes the series resultsof clusters of werethe delivery based verified on data bythe o with ffibehavioralce the urban unsupervised planningcharacteri maps. sticsk-means of ordering method foodand determine delivery. Finally,the functional the results types were of verificlustersed bybased office on urban the behavioral planning maps.characteri stics of ordering food delivery. Finally, the results were verified by office urban planning maps. ISPRS Int. J. Geo-Inf. 2020, 9, 42 6 of 20 ISPRS Int. J. Geo-Inf. 2020, 9, 42 6 of 20

3.1. Spatiotemporal Cube Construction 3.1. Spatiotemporal Cube Construction In this paper, the delivery destination points are the main processing target. As shown in Table In this paper, the delivery destination points are the main processing target. As shown in Table1, 1, the destination points have longitude–latitude information and the arrival time, which can be the destination points have longitude–latitude information and the arrival time, which can be mapped mapped into a three-dimensional Euclidian space. We constructed a spatiotemporal cube model into a three-dimensional Euclidian space. We constructed a spatiotemporal cube model where the where the cubes had equal space size and time intervals. Each cube was represented as a three- cubes had equal space size and time intervals. Each cube was represented as a three-dimensional array dimensional array =(,,), where is the number of cubes in the x dimension, is the c = (xc, yc, tc), where xc is the number of cubes in the x dimension, yc is the number of cubes in the y number of cubes in the y dimension, and is the number of cubes in the time dimension. We set the dimension, and tc is the number of cubes in the time dimension. We set the time size to 1 h and the time size to 1 h and the spatial size to 0.001 degrees (approximately 95 to 115 m). The research area spatial size to 0.001 degrees (approximately 95 to 115 m). The research area was also divided into equal was also divided into equal grids, as shown in Figure 5. After data mapping and aggregation, each grids, as shown in Figure5. After data mapping and aggregation, each spatiotemporal cube stored the spatiotemporal cube stored the count of the destination points in that cube as its attribute. count of the destination points in that cube as its attribute.

Figure 5. The research area covered by equal grids. Figure 5. The research area covered by equal grids. 3.2. Time Series Construction 3.2. Time Series Construction 3.2.1. Modified Getis-Ord Statistic Method 3.2.1. Modified Getis-Ord Statistic Method The well-known Getis-Ord statistic method was used in this paper to calculate the hot value of eachThe cube. well-known To be a statistically Getis-Ord significant statistic method hotspot, was a feature used in will this have paper a high to calculate value and the be hot surrounded value of eachby other cube. features To be a statistically that also have significant high values. hotspot, The a feature method will provides have a high a z-score value that and allowsbe surrounded users to bydetermine other features where thethat features also have with high either values. high or The low method values areprovides clustered a z-score spatially. that Formally, allows basedusers onto determinethe constructed where spatiotemporal the features with cube, either the Getis-Ordhigh or low value values Gi∗ areof each clustered cube isspatially. defined Formally, as [37,38]: based on the constructed spatiotemporal cube, the Getis-Ord value G∗ of each cube is defined as [37,38]: P P n w a a n w j=nn1 i,j j − j=1 i,j G∗ = wa− a w, (1) i * rjj==11ij,, j  ij = Pn 2 Pn 2 n j=1 wi j j=1 wi,j , G i , − 2 S nn2 nwn−1() w (1) jj==11ij,, ij S − s− Pn n P1n 2 j=1 aj j=1 aj a = and S = a2, (2) nnn n − aa2 jj==11jj aSa = and = − 2 , (2) nn where n refers to the total number of cubes, and is the attribute value of cube . , is the spatial weight between cubes i and j, and is defined as follows: ISPRS Int. J. Geo-Inf. 2020, 9, 42 7 of 20

where n refers to the total number of cubes, and aj is the attribute value of cube cj. wi,j is the spatial weight between cubes i and j, and is defined as follows: ( 1, if c and c are neighbors w = i j , (3) i,j 0, otherwise

Here, we define two cubes as neighbors only if the maximum distance between i and j in any of the three coordinates (xc, yc, tc) 1. A cube is also its own neighbor. Then, to simplify the computation ≤ of the Getis-Ord statistic value, we use N(ci) to denote the set of neighboring cubes of ci and N(ci) to denote the number of such neighboring cubes. Therefore, we obtain the following simplified formula for the Getis-Ord statistic: P j N(c ) aj a N(ci) ∈ j − · Gi∗ = r , (4) n N(c ) N(c ) 2 i − i S | n| 1| | − 3.2.2. The Time Series Construction It is well known that people usually have different behavior patterns on weekdays and on weekends; thus, we built a 48-h timeline that consisted of 24-h periods on weekdays and weekends. After the Getis-Ord statistic calculation was performed for all the cubes, a group function was used to gather the cubes with the same location and time period, and an aggregate function averaged the Getis-Ord values of the cubes in the same group. Finally, the synthesized time series were obtained as a linear combination of the 48-h timeline and the statistical value of delivery order quantity. Then, each n k o grid would have a time series TSk gt , t = 1, 2, ... , 48; k = 1, 2, ... , n , where n is the total number of k the grids, and gt is the hot value of grid k at time period t (1–24 for weekdays and 25–48 for weekends). As the Getis-Ord statistic method is a common method for hotspot analysis, we also obtained 48 delivery hotspot distribution maps that corresponded to 48 time periods. The delivery hotspot distribution during typical time periods (i.e., lunch time, dinner time, and midnight) is visualized in Figure6.

3.3. Time Series Classification and Identification The statistical hot values in the time series represent the delivery quality in one region during one period. That is, when the hot value is higher, more delivery orders are placed in this area, and low hot values generally represent less delivery orders. To better identify functional areas, the grids whose hot values were negative in all time periods were removed. To emphasize the characteristics of the delivery data and weaken the impact of regional factors, the hot values of the time series were normalized for each grid. The clustering method k-means was applied to classify the time series. Unsupervised clustering methods require the number of clusters to be known beforehand. To determine an appropriate value for K, the ratio between the intra-cluster and inter-cluster distances was introduced as a validity index [39], which is defined as: K 1 X X intra cluster = (x z )2, (5) − N − i i=1 x C ∈ i 2 inter cluster = min(zi zj) , (6) − i,j − intra cluster validity = − , (7) inter cluster − where N is the number of instances; K is the number of clusters; and zi is the centroid of cluster Ci. ISPRSISPRS Int.Int. J. Geo-Inf. 2020,, 9,, 42 8 of 20

ISPRS Int. J. Geo-Inf. 2020, 9, 42 8 of 20

Figure 6. Delivery hotspot distribution during typical time periods. Figure Figure6. Delivery 6. Delivery hotspot hotspot distribution distribution during during typical typical time timeperiods. periods. An ideal partition will minimize the intra-cluster distance and maximize the inter-cluster distance; An ideal partition will minimize the intra-cluster distance and maximize the inter-cluster An ideal partition will minimize the intra-cluster distance and maximize the inter-cluster therefore, thedistance; best partitiontherefore, the will best be partition the partition will be the that partition minimizes that minimizes the validity the validity value. value. Figure Figure7 presents 7 the validitydistance; values therefore,presents by the using the validity best the values partition k-means by using clusterwill the be k-means methodthe partition cluster to the method that synthesized minimizes to the synthesized time the seriesvalidity time forseries value.K for 2, Figure... , 10 7, ∈ { } andpresents the minimum theK ∈ validity{}2,...,10 validity , valuesand the value minimumby using was validity obtainedthe k-means value when was cluster obtainedK = 6. method when K = to6. the synthesized time series for K ∈{}2,...,10 , and the minimum validity value was obtained when K = 6.

Figure 7. Validity values in different numbers of clusters by using k-means.

ISPRS Int. J. Geo-Inf. 2020, 9, 42 9 of 20

ISPRS Int. J. Geo-Inf. 2020, 9, 42 9 of 20 ISPRS Int. J. Geo-Inf. Figure2020, 9, 427. Validity values in different numbers of clusters by using k-means. 9 of 20 Figure 7. Validity values in different numbers of clusters by using k-means. Figure 8 represents the cluster results of the time series by using the k-means method with K = 6, where the plotted curves (each consisting of 48 points) represent the cluster centers (red for FigureFigure8 8represents represents the the cluster cluster results results of of the the time time series series by by using using the the k-means k-means method method with with K =K 6,= weekdays and blue for weekends). In addition, the maximum and minimum values of each time where6, where the plottedthe plotted curves curves (each (each consisting consisting of 48 points)of 48 points) represent represent the cluster the centers cluster (red centers for weekdays (red for period in each class were plotted in a thin color shadow. Figure 9 shows the location distribution of andweekdays blue for and weekends). blue for weekends). In addition, In theaddition, maximum the maximum and minimum and minimum values of eachvalues time of periodeach time in the grids according to the cluster results. eachperiod class in each were class plotted were in plotted a thin color in a thin shadow. color Figure shadow.9 shows Figure the 9 shows location the distribution location distribution of the grids of accordingthe grids according to the cluster to the results. cluster results.

Figure 8. Cluster results of the time series by using the k-means method with K = 6. Figure 8. Cluster results of the time series by using the k-means method with K = 6. Figure 8. Cluster results of the time series by using the k-means method with K = 6.

Figure 9. Grid distribution of the cluster results in the research area. Figure 9. Grid distribution of the cluster results in the research area.

Within differentFigure land use9. Grid areas, distribution people mayof the demonstrate cluster results di inff erentthe research routine area. activities. This allows Within different land use areas, people may demonstrate different routine activities. This allows us to determine the social functions of areas based on the customary characteristics of ordering delivery. us to determine the social functions of areas based on the customary characteristics of ordering The peaksWithin of different the cluster land curves use wereareas, concentrated people may demo in fournstrate dining different periods (i.e.,routine at lunch activities. time This and dinnerallows timeus to on determine weekdays the and social weekends). functions By comparingof areas ba thesed curves on the of thecustomary cluster centroids,characteristics we found of ordering that six ISPRS Int. J. Geo-Inf. 2020, 9, 42 10 of 20

delivery. The peaks of the cluster curves were concentrated in four dining periods (i.e., at lunch time ISPRS Int. J. Geo-Inf. 2020, 9, 42 10 of 20 and dinner time on weekdays and weekends). By comparing the curves of the cluster centroids, we found that six clusters could be divided into two categories based on their similar curve clusterscharacteristics: could be one divided category into contains two categories clusters based a–c, and on their another similar category curve contains characteristics: clusters one d–f. category containsClusters clusters a–c: a–c, This and category another is category characterized contains by clustersthe fact d–f.that a significant peak appears during lunchClusters time on a–c: weekdays. This category Delivery is characterized during weekdays by the fact is more that a than significant delivery peak during appears weekends, during lunch and timedelivery on weekdays. during lunch Delivery time duringis greater weekdays than delive is morery thanduring delivery dinner during time both weekends, on weekdays and delivery and duringweekends. lunch This time category is greater shows than a delivery work-related during activi dinnerty characteristic, time both onweekdays and the hypothesis and weekends. is that This the categoryareas included shows in a work-relatedthis category activity are used characteristic, as industrial and parks, the hypothesis office buildings, is that theand/or areas commercial- included in thisrelated category areas. are used as industrial parks, office buildings, and/or commercial-related areas. Clusters d–f: This category representative showsshows that during weekdays, delivery at dinner time is greatergreater thanthan deliverydelivery atat lunchlunch time,time, especiallyespecially inin clustercluster d.d. Delivery during weekends is greatergreater than deliverydelivery duringduring weekdays.weekdays. These These characteristic characteristicss imply imply living areas, wh whereere people come back after work on weekdays and stay during weekends, which may include residential communities, apartments, andand/or/or collegecollege dormitories.dormitories. Clusters in the the same same category category also also exhibited exhibited diffe diffrenterent characteristic characteristic strength, strength, with with a > ab> > bc and> c andd > de > ef. >Thef. The factors factors that that contribute contribute to todifferen differentt cluster cluster characteristic characteristic strength strength may may include the following: (1) (1) The The number number of of people people in in a aregion region varies varies during during differ differentent time time periods, periods, for for example, example, in inindustrial industrial parks, parks, the the workers workers are are mainly mainly concentr concentratedated during during the thedaytime daytime on weekdays; on weekdays; and and(2) the (2) thedining dining time time is short is short or limited. or limited. The peak The peaktime of time ordering of ordering delivery delivery is related is related to people’s to people’s eating habits. eating habits.However, However, white-collar white-collar workers workers may be may limited be limited by work by arrangements, work arrangements, resulting resulting in a dining in a dining peak. peak.The strong The strong cluster cluster characteristic characteristic also indicates also indicates that the that social the socialfunction function of this of area this is area significant. is significant.

3.4. Cluster Results Validation To validatevalidate ourour hypotheses, we compared the clustercluster results against oofficeffice urban planning maps released by the Hangzhou Planning Bureau, in which the land use types mainly include: residential, residential, commercial, industrial,industrial, educational,educational, andand greengreen (Figure(Figure 1010).).

Figure 10. OfficeOffice urban planning map for Hangzhou (partial).

As shownshown in in Figure Figure9, the9, clusterthe cluster results results are displayed are displayed in the form in the of grids form on of maps. grids To on understand maps. To howunderstand well the how clusters well werethe clusters identified were as theidentified work andas the living work areas, and weliving evaluated areas, thewe percentageevaluated the of overlappercentage that of exists overlap between that theexists grids between of the clusters the grid ands of the the o fficlusterscial urban and planning the official maps. urban This planning way, we wouldmaps. This have way, an understanding we would have of thean understanding accuracy of our of approach the accuracy as well of as our of approach the difference as well between as of the results.difference It should between be notedthe results. that this It should verification be noted method that is this an approximate verification measuremethod dueis an to approximate the different granularitiesmeasure due ofto mapsthe different or human granular factors.ities of maps or human factors. Table3 shows the percentage of overlap between the o ffice planning map (columns) and six cluster results (rows). Here, the office planning map was divided into the six land use types of residential, ISPRS Int. J. Geo-Inf. 2020, 9, 42 11 of 20 commercial, industrial, educational, mixed commercial and residential, and other. Commercial and industrial types refer to work areas on which office buildings and industrial parks are located; residential type areas are obviously living areas; educational areas are mixed areas in which dormitories are living type areas, and research institutes are work type areas. We can overlay the cluster grids with the office planning map to see which land use type the grids belong. Each element in the table represents the percentage of grids that belong to each land use type in one cluster. According to our assumptions, clusters a–c represent work areas, while clusters d–f represent living areas. The commercial and industrial types are the two land use types that occupy the greatest proportion in clusters a–c, which account for 94.6%, 81.2%, and 66.4% in total, respectively. In cluster a, the industrial type accounts for 64.9% and was more than twice the proportion of the commercial type, which means that large-scale industry parks have a stronger industrial aggregation effect. The residential type obviously occupied the greatest proportion in clusters d–f, which account for 87.2%, 64.7% and 59.0%, respectively. The educational type is the second largest proportion in clusters e and f, perhaps because a college dormitory area, as a type of living area, is a hotspot for delivery due to its large student user group.

Table 3. Percentage of overlap between the official planning maps and the cluster results for Hangzhou.

Official Planning Map Cluster Mixed Results Residential Commercial Industrial Educational Commercial Other and Residential a 0 29.7% 64.9% 0 2.7% 2.7% b 8.3% 44.0% 37.3% 1.3% 2.9% 6.2% c 12.4% 42.6% 23.9% 4.6% 2.7% 13.8% d 87.2% 2.6% 0 7.7% 2.6% 0 e 64.7% 3.3% 1.2% 25.3% 2.1% 3.3% f 59.0% 10.1% 3.9% 18.1% 2.2% 6.6%

Overall, according to the distribution of the land use types in each cluster, the work and living areas extracted by the method of this paper were credible. In addition, it could be discovered that the recognition accuracy of the regional function type was related to the strength of the cluster characteristics. That is, when the cluster characteristic strength is stronger, the accuracy of the regional function type identification is higher.

4. Analysis

4.1. Work Areas The work areas represent the location distribution of companies, enterprises, and industrial parks. As Hangzhou is famous for its Internet and data economy, we obtained information on Internet and e-commerce companies from a recruitment website called lagouwang (https://www.lagou.com) to compare the extracted work areas of this paper. The dataset included the company ID, company name, industry field, company scale, and company address, and all company address text were converted to latitude and longitude coordinates. After filtering out the data that were outside the research area and that contained errors, we finally had 1005 Internet and e-commerce company items. Figure 11 shows the location distribution of the companies. We used heat maps to visualize the hotspot regions of the work areas, helping to discover the distribution pattern and potential characteristics. The heat maps were made with the kernel density estimation (KDE) method, which calculates the density of the features in a neighborhood around these features. Before making heat maps, three parameters should be set, namely the output cell size, the search radius, and the population field. The cell size was set to 0.0001 degrees, and the search radius was set to 0.005 degrees. The population field value determines the number of times to count the ISPRS Int. J. Geo-Inf. 2020, 9, 42 12 of 20 estimation (KDE) method, which calculates the density of the features in a neighborhood around these features. Before making heat maps, three parameters should be set, namely the output cell size, theISPRS search Int. J. Geo-Inf.radius,2020 and, 9, 42the population field. The cell size was set to 0.0001 degrees, and the search12 of 20 radius was set to 0.005 degrees. The population field value determines the number of times to count the feature, which could be used to weigh some features more heavily than other features or to allow feature, which could be used to weigh some features more heavily than other features or to allow one one point to represent several observations. In this section, the population field was set based on the point to represent several observations. In this section, the population field was set based on the scale scale of the companies and the strength of the cluster results’ characteristics, as shown in Table 4. of the companies and the strength of the cluster results’ characteristics, as shown in Table4.

Figure 11. Location distribution of Internet and e-commerce companies from lagouwang. Figure 11. Location distribution of Internet and e-commerce companies from lagouwang.

TableTable 4. TheThe population population field field set for the ke kernelrnel density estimation method. PopulationPopulation Field Field 1 1 2 2 3 3 4 4 5 5 7 7 CompanyCompany scale scale <15 <15 15–50 15–50 50–150 50–150 150–500 150–500 500–2000 500–2000 >2000 >2000 ClusterCluster result result Cluster Cluster c c Cluster Cluster b b Cluster Cluster a a

Figure 12 shows the heat maps made from the Internet and e-commerce companies and the extracted work areas. areas. We We discovered discovered that that the the two two he heatat maps maps had had similar similar hotspo hotspott region region distributions. distributions. This shows that a large large number number of of young young people people were were working working in in the the Internet-related Internet-related industries areas. There is a mutual driving relationship between industry agglomeration agglomeration and and talent talent gathering gathering [40]. [40]. Regional industrial structure structure plays plays an an important important role role in in attracting talent talent or or human capital [41], [41], and thethe improvement improvement of of Internet-relat Internet-relateded industrial agglomeration level can promote the development of young labor gathering. The The enhancement enhancement of of the the yo youngung labor labor gathering gathering level level can can also also promote the strengthening of Internet-related industrial agglom agglomerationeration in the region. From From the phenomenon that thethe spatial spatial distribution distribution of of the the delivery delivery customers’ customers’ wo workrk places places is is similar similar to to the the spatial distribution distribution of of thethe Internet Internet and and e-commerce e-commerce companies, companies, we we can can obtain obtain an an inference inference that that Hangzhou’s Hangzhou’s Internet-related Internet-related industryindustry hashas a well-planneda well-planned structure structure and goodand development,good development, which can which provide can su ffiprovidecient employment sufficient employmentand development and development opportunities opportunities for young people. for young people. ISPRS Int. J. Geo-Inf. 2020, 9, 42 13 of 20

Figure 12. (a) Heat map made from the extracted work areas of this paper; (b) Heat map made from InternetFigure 12. and (a) e-commerce Heat map made companies. from the extracted work areas of this paper; (b) Heat map made from Internet and e-commerce companies. The layout of urban industrial planning will affect the distribution of work hotspots. As shown in FigureThe 13 ,layout areas A,of urban B, C, and industrial D are the planning four high-tech will affect areas the planned distribution by the of government.work hotspots. As shown in FigureArea 13, Ais areas located A, B, in C, Hangzhou and D are Future the four Sci-tech high-tech City, areas also called planned Haichuang by the government. Park, which has made greatArea efforts A tois located cultivate in industries Hangzhou such Future as electronic Sci-tech City, information, also called new Haichuang energy and Park, materials, which andhas made so on. Itgreat has efforts a series to of cultivate new high-tech industries enterprises such as suchelectr asonic the information, Alibaba Taobao new Townenergy and and Zhejiang materials, Overseas and so High-levelon. It has a Talent series Innovation of new high Park.-tech enterprises such as the Alibaba Taobao Town and Zhejiang OverseasArea High-level B is located Talent in the Xiasha Innovation , Park. which stands for the Hangzhou Economic and Technological DevelopmentArea B is Area. located Xiasha in the is also Xiasha a district District, of colleges which andstands universities, for the Hangzhou 14 in total, whichEconomic provide and aTechnological large number Development of high-quality Area. young Xiasha talent. is also Several a district high-tech of colleges industrial and universities, parks are located 14 in total, here includingwhich provide the Hangzhou a large number Singapore of high-quality Science and young Technology talent. Park,Several Hangzhou high-tech Smart industrial Valley parks Mobile are Internetlocated here Pioneer including Park, and the Hangzhou Hangzhou High-Tech Singapore Enterprise Science and Incubation Technology Park. Park, Hangzhou Smart ValleyArea Mobile C is locatedInternet in Pioneer the Hangzhou Park, and Binjiang Hangzhou High-Tech High-Tech Industrial Enterprise Development Incubation Zone, Park. which has followedArea the C is leadership located in ofthe the Hangzhou Scientific Binjiang Outlook Hi ongh-Tech Development Industrial and Development persisted in the Zone, “development which has offollowed high technology the leadership and realization of the Scientific of industrialization”. Outlook on Development It has many and famous persisted Internet in the companies “development such asof high NetEase, technology Hikvision, and Alibaba,realization and of industrialization”. Huawei, and has large It has industrial many famous parks Internet such as companies the Hangzhou such High-Techas NetEase, Software Hikvision, Park, Alibaba, Xike Technology and Huawei, Park, and and has Shangfeng large industrial E-Commerce parks such Industrial as the Park. Hangzhou High-TechArea D Software is located Park, in Hangzhou Xike Technology Northern Park New, and City, Shangfeng whose goal E-Commerce is to become Industrial the second Park. largest InternetArea innovation D is located center in Hangzhou in Hangzhou. Northern The largeNew industrialCity, whose parks goal located is to become here include the second Hangzhou largest NorthInternet Software innovation Park center and Paradise in Hangzhou. e Valley The E-commerce large industrial Creative parks Industry located Park. here include Hangzhou NorthThe Software establishment Park and of Paradise high-tech e Valley zones inE-commerce the city, where Creative locations Industry are Park. generally selected in the peripheralThe establishment areas of the of city, high-tech will help zones the cityin the to city, gradually where change locations from are agenerally single-center selected form in tothe a multi-centerperipheral areas form of and the drive city, thewill development help the city ofto relatedgradually industries change byfrom sharing a single-center resources andform overcoming to a multi- externalcenter form negative and drive effects the to development thus effectively of promote related industries the formation by sharing of industrial resources clusters. and overcoming externalArea negative E in Figure effects 13 tois thus the urbaneffectively downtown promote area, the andformation most ofof theindustrial traditional clusters. and prosperous businessArea districts E in Figure are located 13 is the here urban such downtown as Wulin, Huanglong,area, and most Qianjiang of the traditional New Town, and and prosperous QianJiang Centurybusiness City. districts There are are located many ohereffice buildingssuch as Wulin, in these Huanglong, business districts, Qianjiang which New have Town, good and transportation QianJiang andCentury communication City. There conditions are many and office high-quality buildings supporting in these facilities,business thus districts, attracting which a large have number good oftransportation companies to and settle communication here. conditions and high-quality supporting facilities, thus attracting a large number of companies to settle here. ISPRS Int. J. Geo-Inf. 2020, 9, 42 14 of 20

ISPRS Int. J. J. Geo-Inf. Geo-Inf. 2020,, 9,, 42 42 14 of 20

D

A D

A B E B E

C

C

Figure 13. The distribution of high-tech zones and the downtown area in the heat map. Figure 13. The distribution of high-tech zones and the downtown area in the heat map. 4.2. Living AreasFigure 13. The distribution of high-tech zones and the downtown area in the heat map. 4.2. Living Areas Like the work areas analysis, we made a heat map to localize the hotspot regions in living areas 4.2. Living Areas by usingLike the the KDE work method. areas analysis, The parameters we made of a heatthe method map to were localize set theas follows: hotspot (1) regions the cell in size living was areas set byto 0.0001 usingLike thedegrees;the KDEwork method.(2) areas the analysis,search The parameters radius we made was of aset heat the to method 0.map005 todegrees; werelocalize set and the as follows:(3) hotspot for the (1)regions population the cell in sizeliving field, was areas the set bytoclusters 0.0001using d, the degrees; e, KDEand fmethod. (2) were the set search The to 7,parameters radius 3, and was 1, respof set the toectively. method 0.005 degrees;Figure were set14 and asshows (3)follows: for th thee heat(1) population the map cell of size the field, was living theset toclustersareas. 0.0001 d, degrees; e, and f were(2) the set search to 7, 3, radius and 1, was respectively. set to 0.005 Figure degrees; 14 shows and (3) the for heat the map population of the living field, areas. the clusters d, e, and f were set to 7, 3, and 1, respectively. Figure 14 shows the heat map of the living areas.

Figure 14. Heat map of the living areas made by th thee kernel density estimation (KDE) method.

Transportation is one of the important factors that affect the choice of residential location, which Figure 14. Heat map of the living areas made by the kernel density estimation (KDE) method. is related to commuting time. The subway, as a form of public transportation, has become a major way ISPRS Int. J. Geo-Inf. 2020, 9, 42 15 of 20

Transportation is one of the important factors that affect the choice of residential location, which ISPRSis related Int. J. Geo-Inf.to commuting2020, 9, 42 time. The subway, as a form of public transportation, has become a 15major of 20 way for commuting, especially in large cities. Hangzhou has opened subway lines Nos. 1, 2, and 4. We used the subway stations as centers to create buffer zones with a buffer range of 1.5 km and for commuting, especially in large cities. Hangzhou has opened subway lines Nos. 1, 2, and 4. We overlay the buffer zones with the heat map of the living areas; the result is shown in Figure 15. From used the subway stations as centers to create buffer zones with a buffer range of 1.5 km and overlay the the results of the buffer analysis, it is clear that the living areas are mainly distributed along the buffer zones with the heat map of the living areas; the result is shown in Figure 15. From the results subway lines. Statistically, the grids that represent living areas within the buffer zones account for of the buffer analysis, it is clear that the living areas are mainly distributed along the subway lines. 81.24% of the total. Statistically, the grids that represent living areas within the buffer zones account for 81.24% of the total.

Figure 15. The buffer zones of the Hangzhou subway overlaying the heat map. Figure 15. The buffer zones of the Hangzhou subway overlaying the heat map. It is easy to find that there are four hotspot regions of living areas in the heat map, as shown It is easy to find that there are four hotspot regions of living areas in the heat map, as shown in in Figure 16e. In contrast with online maps such as Google Maps, a large number of residential Figure 16e. In contrast with online maps such as Google Maps, a large number of residential communities and apartments appear in the hotspot regions, as shown in Figure 16a–d. In addition, communities and apartments appear in the hotspot regions, as shown in Figures 16a–d. In addition, we found that urban villages, which are also called chengzhongcun in Chinese, also appeared in all we found that urban villages, which are also called chengzhongcun in Chinese, also appeared in all hotspot regions. Urban villages are formed when expanding modern urban districts encroach on rural hotspot regions. Urban villages are formed when expanding modern urban districts encroach on settlements and became transitional neighborhoods under rapid urbanization [1,42]. rural settlements and became transitional neighborhoods under rapid urbanization [1,42]. Hangzhou has a huge job market for Internet-related industries, which has attracted a large Hangzhou has a huge job market for Internet-related industries, which has attracted a large number of young talent in recent years and produces a growing demand for housing, especially for number of young talent in recent years and produces a growing demand for housing, especially for low-rent housing. We added the center points of the grids from clusters d and e to the hotspot region low-rent housing. We added the center points of the grids from clusters d and e to the hotspot region A, which were identified as living areas and had more significant functional characteristics. As shown A, which were identified as living areas and had more significant functional characteristics. As shown in Figure 17, although residential communities and apartments were densely distributed in area A, the in Figure 17, although residential communities and apartments were densely distributed in area A, locations of the cluster grids were mostly concentrated in the urban village areas. It shows that the the locations of the cluster grids were mostly concentrated in the urban village areas. It shows that urban villages have become one of the residential choices for many young people, which provides the urban villages have become one of the residential choices for many young people, which provides low-rent housing and low-cost living spaces. However, the population of urban villages has strong low-rent housing and low-cost living spaces. However, the population of urban villages has strong mobility and instability [43], and there are many private rental housing accommodations in urban mobility and instability [43], and there are many private rental housing accommodations in urban villages that are not registered in the local house lease management office, which will make it difficult villages that are not registered in the local house lease management office, which will make it difficult to monitor the floating population in urban villages. Thus, the method of this paper provides a new to monitor the floating population in urban villages. Thus, the method of this paper provides a new way to discover urban villages in cities, which will help to find the hotspot areas of floating populations way to discover urban villages in cities, which will help to find the hotspot areas of floating and promote urban youth population management. populations and promote urban youth population management. ISPRS Int. J. Geo-Inf. 2020, 9, 42 16 of 20 ISPRS Int. J. Geo-Inf. 2020, 9, 42 16 of 20

Figure 16. Residential communities, apartments, and urban villages in the hotspot regions of living areas. Figure 16. Residential communities, apartments, and urban villages in the hotspot regions of living areas. ISPRS Int. J. Geo-Inf. 2020, 9, 42 17 of 20 ISPRS Int. J. Geo-Inf. 2020, 9, 42 17 of 20

Figure 17. The distribution of clusters d andand e in living hotspot region A.A.

5. Conclusions InIn thisthis paper,paper, foodfood deliverydelivery data,data, asas aa newnew datadata source,source, were introduced into urban computing, and we proposed a a time-series-based time-series-based clustering clustering meth methodod to to discover discover the the geogra geographicalphical distribution distribution of ofurban urban youth. youth. The The synthetic synthetic time time series series of offood food delivery delivery were were constructed constructed by by combining thethe weekdays–weekendsweekdays–weekends 48-h 48-h timeline timeline with with delivery delivery hot valueshot values that were that calculatedwere calculated by using by the using modified the Getis-Ordmodified Getis-Ord statistic method statistic to utilizemethod the to characteristicsutilize the characteristics of the delivery of data.the delivery Then, an data. unsupervised Then, an k-meansunsupervised clustering k-means method clustering was adopted method towas classify adopted the to time classify series the into time six series classes, into and six based classes, on and the behavioralbased on the characteristics behavioral characteristics of ordering food of ordering delivery food that dideliveryffer between that differ job and between housing job areas,and housing the six classesareas, the were six identified classes were as two identified functional as two types functional (i.e., work types areas (i.e., and work living areas areas). and Theliving identification areas). The resultidentification was verified result by was comparing verified by it with comparing the offi ceit with urban the planning office urban map, andplanning the accuracy map, and of thethe identifiedaccuracy of work the identified areas was work 66.4–94.6%, areas was while 66.4–94.6%, the accuracy while of the the accura identifiedcy of living the identified areas was living 59–87.2%. areas was 59–87.2%.Focusing on the urban youth population, especially white-collar workers, the work and living areasFocusing were further on the analyzed urban youth by comparing population, them especially with the white-collar industrial structure workers, and the residential work and layoutliving inareas Hangzhou. were further Heat analyzed maps were by madecomparing with thethem kernel with densitythe industrial estimation structure method and to residential localize hotspot layout regionsin Hangzhou. in the workHeat andmaps living were areas, made with with the the main kern findingsel density as estimation follows: method to localize hotspot regions in the work and living areas, with the main findings as follows: 1. The spatial distribution of the delivery customers’ work places was similar to the spatial 1. Thedistribution spatial distribution of the Internet of andthe e-commercedelivery customers' companies work of Hangzhou. places was This similar shows to thatthe aspatial large distributionnumber of youngof the peopleInternet are and working e-commerce in the co Internet-relatedmpanies of Hangzhou. industries This areas shows planned that a by large the numbergovernment, of young and therepeople exists are working a symbiotic in the relationship Internet-related between industries Hangzhou’s areas Internet-relatedplanned by the government,industrial agglomeration and there exists and younga symbiotic labor gathering.relationship The between establishment Hangzhou’s of high-tech Internet-related zones has industrialeffectively agglomeration attracted young and people, young and labor theyoung gather laboring. The gathering establishment also promotes of high-tech the development zones has effectivelyof the Internet-related attracted young industry. people, There and is a mutualthe young driving labor relationship gathering between also promotes Hangzhou’s the developmentInternet-related of industrythe Internet-related agglomeration industry. and young There labor is a gathering.mutual driving relationship between 2. Hangzhou’sTransportation Internet-related and living costs industry are the ag twoglomeration important and factors young that labor affect gathering. the choice of residential 2. Transportationlocation for young and people.living costs The are hotspot the two living important areas of factors urban that youth affect are the mostly choice located of residential within location1.5 km from for young the subway people. stations, The hotspot and the living subway areas is of becoming urban youth one ofare the mostly most located important within modes 1.5 km from the subway stations, and the subway is becoming one of the most important modes for commuting. Urban villages have become one of the residential choices for many young people, which provide low-rent housing and low-cost living spaces. ISPRS Int. J. Geo-Inf. 2020, 9, 42 18 of 20

for commuting. Urban villages have become one of the residential choices for many young people, which provide low-rent housing and low-cost living spaces.

In addition, the findings of this paper can not only improve the view of the current state of the city’s industrial development, but can also promote floating population management by discovering and monitoring the key areas such as urban villages. In future studies, machine learning methods can be introduced to improve the classification method for food delivery data, and thus, the identified functional types can be divided into more land use categories. By comparing multi-year food delivery data, we can discover changes in the distribution of work and living areas to track the urbanization process.

Author Contributions: Yiming Yan was involved in the design of the study, interpretation of data, drafted the major revisions, and performed the experiments; Yuanyuan Wang contributed to the study design and algorithm improvement; Zhenhong Du drafted part of the manuscript; Feng Zhang conceived the experiments and improved the manuscript; Renyi Liu was involved in the data acquisition and analyses of the data and experiments; and Xinyue Ye was involved in the revision of the manuscript. All authors have read and agreed to the published version of the manuscript. Funding: This research was funded by the National Natural Science Foundation of China (Grant Nos. 41701436, 41671391), National Key R&D Program of China (Grant No. 2018YFB0505000), and the Open Fund of Laboratory of Target Microwave Properties (Grant No. 2018KF05). Acknowledgments: The authors would like to thank the support from our university and laboratory, and thank all anonymous reviewers for their constructive comments that better shaped the paper. Conflicts of Interest: The authors declare no conflicts of interest.

References

1. Tan, S.K.; , Y.A.; Song, Y.; Luo, X.; Zhou, M.; Zhang, L.; Kuang, B. Influence factors on settlement intention for floating population in urban area: A China study. Qual. Quant. 2017, 51, 147–176. [CrossRef] 2. Tao, L.; Hui, E.C.; Wong, F.K.; Chen, T. Housing choices of migrant workers in China: Beyond the Hukou perspective. Habitat Int. 2015, 49, 474–483. [CrossRef] 3. Li, B. Floating population or urban citizens? Status, social provision and circumstances of rural–urban migrants in China. Soc. Policy Adm. 2006, 40, 174–195. [CrossRef] 4. Frosch, K.H. Workforce age and innovation: A literature survey. Int. J. Manag. Rev. 2011, 13, 414–430. [CrossRef] 5. Chatman, D.G. How density and mixed uses at the workplace affect personal commercial travel and commute mode choice. Transp. Res. Rec. 2003, 1831, 193–201. [CrossRef] 6. Peng, Z.-R. The jobs-housing balance and urban commuting. Urban Stud. 1997, 34, 1215–1235. [CrossRef] 7. Wu, W. Migrant intra-urban residential mobility in urban China. Hous. Stud. 2006, 21, 745–765. [CrossRef] 8. Liu, Y.; Xiao, Y.; Gao, S.; Kang, C.-G.; Wang, Y.-L. A Review of Human Mobility Research Based on Location Aware Devices. Geogr. Geo Inf. Sci. 2011, 4, 3. 9. Zheng, Y. Trajectory data mining: An overview. ACM Trans. Intell. Syst. Technol. 2015, 6, 29. [CrossRef] 10. Feng, Z.; Zhu, Y. A survey on trajectory data mining: Techniques and applications. IEEE Access 2016, 4, 2056–2067. [CrossRef] 11. Gao, S. Spatio-temporal analytics for exploring human mobility patterns and urban dynamics in the mobile age. Spat. Cogn. Comput. 2015, 15, 86–114. [CrossRef] 12. Jiang, S.; Ferreira, J.; Gonzalez, M.C. Activity-based human mobility patterns inferred from mobile phone data: A case study of Singapore. IEEE Trans. Big Data 2017, 3, 208–219. [CrossRef] 13. Pei, T.; Sobolevsky, S.; Ratti, C.; Shaw, S.-L.; Li, T.; Zhou, C. A new insight into land use classification based on aggregated mobile phone data. Int. J. Geogr. Inf. Sci. 2014, 28, 1988–2007. [CrossRef] 14. Toole, J.L.; Ulm, M.; González, M.C.; Bauer, D. Inferring land use from mobile phone activity. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, , China, 12–16 August 2012; pp. 1–8. 15. Yang, X.; Fang, Z.; Yin, L.; Li, J.; Zhou, Y.; Lu, S. Understanding the Spatial Structure of Urban Commuting Using Mobile Phone Location Data: A Case Study of , China. Sustainability 2018, 10, 1435. [CrossRef] ISPRS Int. J. Geo-Inf. 2020, 9, 42 19 of 20

16. Zhao, Z.; Shaw, S.-L.; Xu, Y.; Lu, F.; Chen, J.; Yin, L. Understanding the bias of call detail records in human mobility research. Int. J. Geogr. Inf. Sci. 2016, 30, 1738–1762. [CrossRef] 17. Mao, F.; Ji, M.; Liu, T. Mining spatiotemporal patterns of urban dwellers from taxi trajectory data. Front. Earth Sci. 2016, 10, 205–221. [CrossRef] 18. Tang, J.; Liu, F.; Wang, Y.; Wang, H. Uncovering urban human mobility from large scale taxi GPS data. Phys. A Stat. Mech. Its Appl. 2015, 438, 140–153. [CrossRef] 19. Zheng, Q.; Zhao, X.; Jin, M. Research on Urban Public Green Space Planning Based on Taxi Data: A Case Study on Three Districts of Shenzhen, China. Sustainability 2019, 11, 1132. [CrossRef] 20. Pan, G.; Qi, G.; Wu, Z.; Zhang, D.; Li, S. Land-use classification using taxi GPS traces. IEEE Trans. Intell. Transp. Syst. 2013, 14, 113–123. [CrossRef] 21. Yuan, N.J.; Zheng, Y.; Xie, X.; Wang, Y.Z.; Zheng, K.; Xiong, H. Discovering Urban Functional Zones Using Latent Activity Trajectories. IEEE Trans. Knowl. Data Eng. 2015, 27, 712–725. [CrossRef] 22. Biehl, A.; Ermagun, A.; Stathopoulos, A. Community mobility MAUP-ing: A socio-spatial investigation of bikeshare demand in Chicago. J. Transp. Geogr. 2018, 66, 80–90. [CrossRef] 23. Yang, T.; Li, Y.; Zhou, S. System Dynamics Modeling of Dockless Bike-Sharing Program Operations: A Case Study of Mobike in Beijing, China. Sustainability 2019, 11, 1601. [CrossRef] 24. Zhang, L.; Zhang, J.; Duan, Z.-Y.; Bryde, D. Sustainable bike-sharing systems: Characteristics and commonalities across cases in urban China. J. Clean. Prod. 2015, 97, 124–133. [CrossRef] 25. Zhang, X.; Li, W.; Zhang, F.; Liu, R.; Du, Z. Identifying Urban Functional Zones Using Public Bicycle Rental Records and Point-of-Interest Data. ISPRS Int. J. Geo Inf. 2018, 7, 459. [CrossRef] 26. Zhu, Y.; Chen, F.; Li, M.; Wang, Z. Inferring the economic attributes of urban rail transit passengers based on individual mobility using multisource data. Sustainability 2018, 10, 4178. [CrossRef] 27. Zhou, J.; Murphy, E.; Long, Y. Commuting efficiency in the Beijing metropolitan area: An exploration combining smartcard and travel survey data. J. Transp. Geogr. 2014, 41, 175–183. [CrossRef] 28. Mohamed, K.; Côme, E.; Oukhellou, L.; Verleysen, M. Clustering smart card data for urban mobility analysis. IEEE Trans. Intell. Transp. Syst. 2016, 18, 712–728. 29. Long, Y.; Thill, J.-C. Combining smart card data and household travel survey to analyze jobs–housing relationships in Beijing. Comput. Environ. Urban Syst. 2015, 53, 19–35. [CrossRef] 30. Mazimpaka, J.D.; Timpf, S. Trajectory data mining: A review of methods and applications. J. Spat. Inf. Sci. 2016, 2016, 61–99. [CrossRef] 31. Hu, Y.; Han, Y. Identification of Urban Functional Areas Based on POI Data: A Case Study of the Economic and Technological Development Zone. Sustainability 2019, 11, 1385. [CrossRef] 32. Yang, X.; Zhao, Z.; Lu, S. Exploring spatial-temporal patterns of urban human mobility hotspots. Sustainability 2016, 8, 674. [CrossRef] 33. 2017 China Internet Local Life Services Blue Book. Available online: http://mp.163.com/v2/article/detail/ D8KG29RO0518SLLV.html (accessed on 15 December 2018). 34. 2017–2018 China Online Catering Food Market Research Report. Available online: https://www.iimedia.cn/ c400/60449.html (accessed on 15 December 2018). 35. 2017 China Delivery Development Research Report. Available online: https://www.sohu.com/a/216309928_ 800248 (accessed on 15 December 2018). 36. Statistical Communique of Hangzhou on the 2018 National Economic and Social Development. Available online: http://www.hangzhou.gov.cn/col/col805865/index.html (accessed on 15 April 2019). 37. Ord, J.K.; Getis, A. Local Spatial Autocorrelation Statistics: Distributional Issues and an Application. Geogr. Anal. 1995, 27, 286–306. [CrossRef] 38. Makrai, G. Efficient method for large-scale spatio-temporal hotspot analysis (GIS Cup). In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Francisco Bay Area, CA, USA, 31 October–3 November 2016; pp. 1–18. 39. Ray, S.; Turi, R.H. Determination of number of clusters in k-means clustering and application in colour image segmentation. In Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques, Calcutta, India, 27–29 December 1999; pp. 137–143. 40. Zhang, Y.; Zhu, Z. Research on mutual driving relationship between industrial agglomeration and talent gathering in . In Proceedings of the Second International Conference On Economic and Business Management (FEBM 2017), , China, 21–23 October 2017. ISPRS Int. J. Geo-Inf. 2020, 9, 42 20 of 20

41. Florida, R. The economic geography of talent. Ann. Assoc. Am. Geogr. 2002, 92, 743–755. [CrossRef] 42. Song, Y.; Zenou, Y. Urban villages and housing values in China. Reg. Sci. Urban Econ. 2012, 42, 495–505. [CrossRef] 43. Wehrhahn, R.; Bercht, A.L.; Krause, C.L.; Azzam, R.; Kluge, F.; Strohschön, R.; Wiethoff, K.; Baier, K. Urban restructuring and social and water-related vulnerability in megacities–the example of the urban village of Xincun, Guangzhou (China). Die Erde 2008, 139, 227–249.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).