
Quantifying the Urban Visual Perception of Chinese Traditional-Style Building with Street View Images


Liying Zhang 1,2,3, Tao Pei 3,4,5,*, Xi Wang 3,4, Mingbo Wu 3,4, Ci Song 3,4, Sihui Guo 3,4 and Yijin Chen 2

1 College of Information Science and Engineering, China University of Petroleum, 102249, China; [email protected]
2 School of Geosciences & Surveying Engineering, China University of Mining & Technology, Beijing 100083, China; [email protected]
3 State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China; [email protected] (X.W.); [email protected] (M.W.); [email protected] (C.S.); [email protected] (S.G.)
4 University of Chinese Academy of Sciences, Beijing 100049, China
5 Center for Collaborative Innovation in Geographical Information Resource Development and Application, 210023, China
* Correspondence: [email protected]; Tel.: +86-010-6488-8960

Received: 20 July 2020; Accepted: 21 August 2020; Published: 28 August 2020

Abstract: As a symbol of Chinese culture, Chinese traditional-style architecture defines the unique characteristics of Chinese cities. The visual qualities and spatial distribution of architecture represent the image of a city, which affects the psychological states of residents and can induce positive or negative social outcomes. Hence, it is important to study the visual perception of Chinese traditional-style buildings in China. Previous works have been restricted by the lack of data sources and techniques and were neither quantitative nor comprehensive. In this paper, we propose a deep learning model for automatically predicting the presence of Chinese traditional-style buildings and develop two view indicators to quantify pedestrians' visual perceptions of buildings. Using this model, Chinese traditional-style buildings were automatically segmented in streetscape images within the Fifth Ring Road of Beijing, and the perception of Chinese traditional-style buildings was then quantified with the two view indicators. This model can also help to automatically predict the perception of Chinese traditional-style buildings for new urban regions in China, and, more importantly, the two view indicators provide a new quantitative method for measuring urban visual perception at the street level, which is of great significance for quantitative research on tourism routes and urban planning.

Keywords: urban perception; Chinese traditional-style building; street view images; view indicator; deep learning

1. Introduction

Urban perception has been receiving more and more attention because it plays an important role in urban studies [1–4]. Urban streets, as representations of urban landscapes, determine the qualities of visual perception of urban style, and buildings are the most pronounced elements in urban streets, shaping and articulating the urban style through their dominant shapes and vivid colors, historical sites, and unique symbols [5,6]. Thus, urban perception studies have been carried out via visual perception of buildings [7,8]. Early urban perception studies were carried out at a small scale through field studies of people's perceptions and documentation of urban landscapes, which involved considerable collection effort [4]. Later, in large-scale urban perception research, urban elements could be perceived with the help of conventional computer vision techniques; however, these methods faced scalability issues because the feature engineering relied heavily on manual extraction [9].

To address these issues, we propose a method based on deep learning techniques to study the urban visual perception of Chinese traditional-style buildings with street view images. We propose a deep learning network model based on convolutional neural networks and transfer learning that can automatically extract Chinese traditional-style architecture from thousands of street view images with pixel-level semantic segmentation. Then, based on the pixel-level image segmentation, we developed the traditional building view index to quantify pedestrians' visual perceptions of Chinese traditional-style buildings over large areas. In this research, we used Beijing as a case study. Beijing is an ancient cultural capital with rich historical sites. In addition, Beijing is one of the most rapidly urbanizing cities in China, and modern buildings and historical sites have been mixed in the process of urban transformation, which allows testing of the effectiveness of the method and contributes to the planning and protection of the architecture of Beijing [10].

The main contributions of our study are as follows: (1) construction of categories of Chinese traditional-style buildings to build training datasets, and testing and verification of the method with Tencent Street View (TSV) images; (2) proposal of the traditional building mask region-based convolutional neural network (TBMask R-CNN) model to achieve automatic classification and segmentation of Chinese traditional-style buildings; (3) development of the traditional building view index (TBVI) and the street of traditional building view index (STBVI) to quantify pedestrians' visual perceptions of Chinese traditional-style buildings; and (4) calculation of the TBVI and STBVI values within the Fifth Ring Road of Beijing and analysis of the spatial distribution of the TBVI and STBVI of traditional-style buildings.

2. Related Works

In this section, we review key works on the data sources and techniques used in urban perception research: urban perception with street view images and urban perception with deep learning.

2.1. Urban Perception with Street View Images

Street views are interactive electronic maps that provide 360° panoramas along urban streets; therefore, the street view images from these maps can be regarded as representations of urban landscapes. Currently, Google, Tencent and Baidu are the three major providers of street view mapping services in the world [5,11,12]. Street view images have become an important new data source in urban studies due to their high accessibility, high resolution and broad coverage [2,13,14]. In recent years, researchers have exploited these images to study urban perception based on a single urban element or on a synthesis of the perception results of several urban elements [15]. Generally, the perceived urban elements are divided into five categories: buildings, green plants, roads, traffic signs, and pedestrians. For instance, the urban physical environment can be assessed by perceiving urban elements such as buildings, green plants and roads in Google Street View images, and the results are generally consistent with field assessments [16–20]. By perceiving trees and grass in street view images, a series of quantitative methods for urban greenery were developed [21–23]. Through analysis of the perception of urban elements such as road suitability and green vegetation, research on urban safety at the street level made great progress over large areas [3,24,25]. The above studies show that street view imagery is a very reliable data source for urban studies; however, for large-scale quantitative urban perception studies, processing large numbers of images is very challenging.

2.2. Urban Perception with Deep Learning

The traditional methodology for studying urban perception was carried out on small scales through field studies; photographs or videotapes were manually reviewed and urban residents were interviewed due to inadequate technologies, which made it difficult to analyze urban perception in a quantitative manner and thus became a bottleneck for urban planning based on quantitative studies [5,26]. Later, in large-scale urban perception studies, approaches based on conventional computer vision techniques were developed, such as the histogram of oriented gradients (HOG) descriptor [27], the scale-invariant feature transform (SIFT) method [28], and edge detection methods with different filters for image processing; however, feature engineering relies on manual extraction and then manual coding based on the domain and data types, which is both difficult and expensive [9]. Deep learning has the ability to automatically learn hierarchical features and has proven its potential for processing large image datasets with near-human accuracy in image classification and pattern recognition [29,30], which provides the possibility of large-scale automatic urban perception studies. Convolutional neural networks (CNNs) are one of the deep learning techniques that have made breakthroughs in image processing and are being explored by urban researchers to study urban perception. Ref. [3] introduced a CNN to predict urban safety using Google Street View images, which significantly improved prediction accuracy compared to previous methods. Ref. [2] used a CNN-based model to study the relationship between the visual attributes of cities and urban perceptions of safety and liveliness at the global scale. Ref. [31] trained the Places convolutional neural network (CNN) to extract features from 0.2 million images to perceive beauty in outdoor spaces. Ref. [32] used SegNet, a semantic segmentation model based on a CNN, to extract urban features, such as trees, sky, roads, and buildings, from Google Street View images and then compared the differences in urban perception in four cities using street view elements. To our knowledge, the existing deep learning network models can extract buildings; however, they cannot realize the fine-grained segmentation of building subclasses, such as those of Chinese traditional-style buildings.

3. Study Area and Dataset

3.1. Study Area

Beijing is a world-famous historical and cultural city that has undergone dramatic transformation from being the capital of China's six dynasties to the administrative center and, even at present, to China's political, cultural and economic center. Beijing has many historical sites and much traditional Chinese architecture, which make it a powerful case study area for our research. We focused on the area within the Fifth Ring Road, which includes the traditional Chinese landscape area with the Old City as the core, and the area from the Old City to the Fifth Ring Road (Figure 1). The study region has a total area of 668.4 km² and is home to approximately 10.54 million people [33].

Figure 1. Study area.

3.2. Data Acquisition

We used street view images obtained from Tencent Maps, which provides an application programming interface (API) for querying and downloading street view images with parameters such as size, location/pano, heading, pitch, and key [12]. The TSV panoramas capture photos from horizontal and vertical cameras and stitch them together to produce panoramas of the original street view. The TSV image's visual field is approximately equal to the normal human visual field when the pitch angle of the camera is 0° [12]; considering that four images were sufficient to form panoramic images that simulated the 360° visual landscape of a pedestrian at a fixed point, we downloaded a total of four images (each with a pitch angle of 0° and headings of 0°, 90°, 180°, and 270°) at each location to represent its panoramic view.

The procedure for obtaining street view images included three steps: (1) obtaining the location coordinates at an interval of 100 m along all the streets in the vector road network within the Fifth Ring Road; (2) searching for the nearest TSV ID within 50 m based on the Tencent web service API and using the TSV API to download the four TSV images at each location; and (3) cleaning the data and eliminating the location coordinates for which all four TSV images could not be obtained. For detailed instructions on the TSV API, see reference [34]. An example of the acquisition of TSV images through the TSV API is shown in Figures 2 and 3, with a uniform resource locator (URL) example of the Tencent static image API: https://apis.map.qq.com/ws/streetview/v1/image?size=960x640&pano=10011512120530135909300&pitch=0&heading=0&key=5MJBZ-NEIE6-QMLSW-MX4KF-5OEDE-C7B4Z. Response:

Figure 2. Tencent Street View static image application programming interface.

Figure 3. Tencent Street View image download parameter configuration.


In this study, the resolution of the TSV images was 960 × 640 pixels, and the format was jpg. After data cleaning, a total of 272,244 TSV images were obtained for 68,061 street points with streetscape images in four directions. The TSV images in the study area were downloaded during November 2018.
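To make the acquisition step concrete, the following is a minimal Python sketch of the per-location download, assuming the pano IDs have already been matched to the sampled road points; the endpoint and parameters follow the URL pattern shown in Figure 2, while the output directory and the placeholder key are hypothetical.

```python
import os
import requests

API_URL = "https://apis.map.qq.com/ws/streetview/v1/image"
HEADINGS = [0, 90, 180, 270]      # four horizontal directions per location
KEY = "YOUR-TENCENT-KEY"          # placeholder; a real developer key is required

def download_location(pano_id, out_dir):
    """Download the four 960x640 TSV images (pitch 0) for one pano ID."""
    saved = []
    for heading in HEADINGS:
        params = {"size": "960x640", "pano": pano_id,
                  "pitch": 0, "heading": heading, "key": KEY}
        resp = requests.get(API_URL, params=params, timeout=30)
        if resp.status_code != 200:
            return []              # discard locations missing any of the four views
        path = os.path.join(out_dir, f"{pano_id}_{heading}.jpg")
        with open(path, "wb") as f:
            f.write(resp.content)
        saved.append(path)
    return saved
```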

3.3. Categories of Chinese Traditional-Style Buildings

Generally, traditional Chinese buildings can be divided into official architecture and folk architecture according to their forms [35,36]. The official building style, which is also known as palace-style architecture, is characterized by high-grade buildings in ancient Chinese architecture that are rigorous, dignified, and elegant, with mainly red, yellow, green, and blue colors, which reflect the royal authority and feudal hierarchy. Palaces, prince palaces, temples and religious buildings are representative of the official style. In contrast with the official building style, there is also folk architecture, which is also known as the local building style. This style is characteristically pleasant, not large in scale, simple in shape, and mainly composed of blue-gray and gray-white colors [37]. Referring to the classification system of traditional Chinese architecture, the buildings are divided into three categories: official-style architecture, folk-style architecture, and nontraditional-style architecture. Among them, official-style and folk-style buildings belong to the Chinese traditional style. Table 1 describes the official-style, folk-style, and nontraditional-style building categories.

Table 1. Description of building category.

Category | Features
Official style | The building has a grand scale and a luxurious, elegant style, with glazed tiles, red columns, and colored paintings. The color of the building is mainly red, yellow, green, and blue.
Folk style | The shape of the building is simple, and the materials are blue brick, gray tile, and stone. The color of the building is mainly blue-gray and gray-white.
Non-traditional style | Simple, plain, geometric forms, rectangular shapes, linear elements, and a rejection of ornament, particularly the use of glass, steel and reinforced concrete.

To realize the intelligent recognition of Chinese traditional-style buildings over large regions based on deep learning, we first constructed datasets for training and evaluating the models: 4310 images were selected from the TSV images, and the training, validation and test sets were constructed with a ratio of 8:1:1. There were 3448 images in the training set, 431 in the validation set and 431 in the test set. Among them, the training set included 1148 images with official-style buildings, 1150 images with folk-style buildings, and 1150 images with nontraditional-style buildings. Table 2 describes the datasets. Data annotation was performed with the open source image annotation tool VGG Image Annotator (VIA) [38], which was developed by the Visual Geometry Group.

Table 2. Description of datasets.

Categories of Building Style | Training | Validation | Test
Official-style | 1148 | 143 | 143
Folk-style | 1150 | 144 | 144
Non-traditional-style | 1150 | 144 | 144
Total | 3448 | 431 | 431
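The 8:1:1 split described above can be reproduced per building category, which yields counts close to those in Table 2. The sketch below is only an illustration: the `images_by_category` dictionary is a hypothetical structure mapping each category to its annotated image paths, and scikit-learn's train_test_split is used for convenience.

```python
from sklearn.model_selection import train_test_split

def split_8_1_1(images_by_category, seed=42):
    """Split each category's annotated images into train/validation/test sets (8:1:1)."""
    train, val, test = [], [], []
    for category, paths in images_by_category.items():
        tr, rest = train_test_split(paths, test_size=0.2, random_state=seed)
        va, te = train_test_split(rest, test_size=0.5, random_state=seed)
        train += tr
        val += va
        test += te
    return train, val, test
```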
4. Methodology

Here, we introduce the three-step workflow for automatically quantifying the visual perception of Chinese traditional-style buildings on a large scale. Steps 1 and 2 are presented together in Section 4.1, Deep Learning Network for Traditional-Style Building Segmentation, which describes our deep learning model that can classify and segment building categories in fine-grained images; Step 3 is introduced in Section 4.2, Traditional Building View Index, which describes the method for quantifying pedestrians' perceptions of traditional-style buildings based on the predicted building categories generated in Step 2. In Section 4.3, we introduce Moran's I index to evaluate the degree of spatial autocorrelation of the Chinese traditional-style buildings in the study area. In addition, several important abbreviations are defined in Table 3 for easier reading of this paper.

Table 3. List of Abbreviations.

Abbreviation | Full Name
TSV | Tencent Street View
TBMask R-CNN | Traditional Building Mask Region-based Convolutional Neural Networks
TBVI | Traditional Building View Index
STBVI | Street of Traditional Building View Index
RPN | Region proposal network
ResNet | Residual network
ROI | Region of interest
AP | Average precision
MAP | Mean average precision
O-TBVI | The TBVI of official-style buildings
F-TBVI | The TBVI of folk-style buildings
O-STBVI | The STBVI of official-style buildings
F-STBVI | The STBVI of folk-style buildings

4.1. Deep Learning Network for Traditional-Style Building Segmentation

Image segmentation is a necessary step for extracting pixel-level urban features from street-level images. Recent progress in image segmentation based on deep learning enables researchers to extract objects more accurately. However, the existing deep learning network models cannot obtain a fine-grained segmentation of building subclasses, such as Chinese traditional-style buildings. To realize the automatic segmentation of traditional-style buildings, we proposed a deep learning network model, Traditional Building Mask Region-based Convolutional Neural Networks (TBMask R-CNN), which is based on pixel-level semantic segmentation, Mask R-CNN [39] and transfer learning. The framework of TBMask R-CNN is shown in Figure 4.

The network architecture of TBMask R-CNN is composed of three modules: the backbone architecture, the region proposal network (RPN) and the head architecture. The backbone architecture uses a deep convolutional neural network (CNN) to extract features from each street view image. The residual network (ResNet) is a CNN architecture that revolutionized the CNN architectural race and devised an efficient methodology for training deep convolutional neural networks. ResNet101 is a 101-layer residual network and, compared with ResNet50, achieved higher accuracy in this experiment. Therefore, we used the convolution layer weights of the ResNet101 model pre-trained on the Microsoft Common Objects in Context (MS COCO) dataset as the backbone architecture of TBMask R-CNN.

The convolution kernels of the first and second layers of ResNet101 mainly extract the low-level features of the building, such as edges, colors, pixels, contours, and gradients. The third layer extracts larger and more complex local features of the building, which are derived from the low-level features of the second layer. By analogy, each successive layer corresponds to more and more complex concepts; that is, higher-level layers recognize higher-level semantic features of the building categories. Low-level features are universal features, and they are also suitable for traditional-style buildings. Therefore, we used a fine-tuning approach based on transfer learning to train our model. The specific steps are as follows: first, the weights of the first and second layers of the pre-trained model were frozen; then, the traditional-style building training set and test set were used to train the top-level weights of the backbone network to obtain the features of the official-style buildings, folk-style buildings and nontraditional-style buildings.
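As an illustration of this fine-tuning strategy, the sketch below uses torchvision's COCO-pretrained Mask R-CNN with a ResNet-50-FPN backbone (rather than the ResNet101 backbone used in the paper) purely for brevity: the earliest backbone stages stay frozen, the box and mask heads are replaced for the three building categories plus background, and only the unfrozen parameters are optimized. This is a minimal sketch of the transfer learning idea, not the authors' implementation.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 4  # background + official-style, folk-style, non-traditional-style

# COCO-pretrained Mask R-CNN; trainable_backbone_layers=3 keeps the earliest
# backbone stages frozen, analogous to freezing the low-level feature layers.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    pretrained=True, trainable_backbone_layers=3
)

# Replace the box classification/regression head for the new categories.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Replace the mask prediction head for the new categories.
in_channels_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels_mask, 256, NUM_CLASSES)

# Only the parameters left trainable (heads and upper backbone stages) are optimized.
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
```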


Figure 4. The traditional building mask region-based convolutional neural network (TBMask R-CNN) framework for Chinese traditional-style building segmentation.

The region proposal network was used to sample multiscale regions of interest (ROI) for the head network; it was implemented as a full convolution network that takes feature maps as input and outputs a set of rectangular target proposals with object scores. The region proposal network uses a 3 × 3 convolution kernel to slide over the feature map output by the last shared convolution layer of ResNet101 and maps the features of each sliding window to a low-dimensional feature, which is input to two 1 × 1 fully connected network layers: a box regression layer and a box classification layer. The ROI align layer finely aligns the multiscale features extracted by the ROI with the input content and generates a fixed-size feature map for each ROI.

The head architecture computes the bounding box, category, and mask prediction for each ROI and consists of three branches: box prediction, classification prediction, and mask prediction. For box prediction and classification prediction, classification and further box positioning are performed through the fully connected layers; for mask prediction, the head after ROI align expands the output dimension of ROI align as the input of the full convolutional network, which performs pixel-wise prediction within each ROI to generate a more accurate segmentation mask.

4.2. Traditional Building View Index

Image classification and segmentation of the buildings is a necessary step in calculating the view index of traditional-style buildings from the pedestrian perspective. Inspired by the green view index, which uses color pictures to evaluate the visibility of the surrounding urban forests as representative of the pedestrians' view of the greenery [40], we developed the traditional building view index (TBVI) to quantify pedestrians' visual perception of Chinese traditional-style buildings by interpreting the street view images. The TBVI is defined as the proportion of Chinese traditional building area in the total building area in the normal field of view of the pedestrians.

Based on the building detection and recognition for the panoramic imagery of each location, the TBVI is the ratio of the number of pixels of the Chinese traditional-style building to the total number of building pixels. The TBVI formula is as follows:

TBVI = \frac{\sum_{i=1}^{m} Area_{t\_i}}{\sum_{i=1}^{m} Area_{b\_i}} \times 100\%    (1)

where Area_{t_i} represents the total number of pixels of the Chinese traditional-style building extracted by the TBMask R-CNN model in direction i; Area_{b_i} is the total number of building pixels in the whole image in direction i; and m is the number of images facing different directions in the horizontal orientation of the camera lens (in this study, m = 4).

To measure the visibility of Chinese traditional-style buildings at the street level, the street of traditional building view index (STBVI) was defined as Formula (2):

STBVI = \frac{\sum_{j=1}^{n}\sum_{i=1}^{m} Area_{t\_ij}}{\sum_{j=1}^{n}\sum_{i=1}^{m} Area_{b\_ij}} \times 100\%    (2)

where Area_{t_ij} indicates the total number of traditional-style building pixels extracted by the TBMask R-CNN model from the image in direction i at location j on the street; Area_{b_ij} is the total number of building pixels extracted by the TBMask R-CNN model from the image in direction i at location j on the same street; m is the number of images with different horizontal orientations; and n is the number of locations on the street.
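In code, Formulas (1) and (2) reduce to ratios of pixel counts taken from the predicted masks. The sketch below is a direct translation under that assumption; the input lists of per-direction pixel counts are hypothetical names for the outputs of the segmentation step, and the returned ratios can be multiplied by 100 to express percentages.

```python
def tbvi(traditional_pixels, building_pixels):
    """Formula (1): TBVI at one location from its m directional images.

    traditional_pixels[i]: traditional-style building pixels in direction i
    building_pixels[i]:    all building pixels in direction i
    """
    total_building = sum(building_pixels)
    if total_building == 0:
        return 0.0                  # no visible building in any direction
    return sum(traditional_pixels) / total_building


def stbvi(traditional_by_location, building_by_location):
    """Formula (2): STBVI of one street from the counts at its n locations.

    Each argument is a list of n lists, one per location, holding the m
    per-direction pixel counts.
    """
    numerator = sum(sum(loc) for loc in traditional_by_location)
    denominator = sum(sum(loc) for loc in building_by_location)
    return numerator / denominator if denominator > 0 else 0.0
```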

4.3. Moran's I

Global spatial autocorrelation analysis mainly uses the global Moran's I index to reflect whether the pattern expressed is clustered, dispersed, or random in the whole study area. Moran's I is expressed as Formula (3) [41,42]:

I = \frac{n \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\right)\sum_{i=1}^{n}(x_i - \bar{x})^2}    (3)

where n is the total number of regions in the study space; x_i and x_j represent the attribute values in the i-th and j-th regions, respectively; \bar{x} is the average value of the attribute over the studied regions; and w_{ij} represents the spatial weight between the attribute values in the i-th and j-th regions, with i not equal to j. Moran's I lies in [−1, 1]; in general, a value near 1.0 indicates clustering or stronger positive spatial autocorrelation, a value approaching −1.0 indicates dispersion or stronger negative spatial autocorrelation, and a value close to zero indicates a random distribution. In this paper, the TBVI at each street point location is used as the attribute value of the point to calculate the global Moran's I, which measures the degree of spatial autocorrelation of the Chinese traditional-style buildings in the study area.
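Formula (3) can be evaluated directly with NumPy once a spatial weight matrix has been built (how w_ij is defined, for example by inverse distance or k-nearest neighbours, is not specified here and is treated as an assumption):

```python
import numpy as np

def global_morans_i(x, w):
    """Global Moran's I (Formula (3)).

    x: 1-D array of attribute values (here, the TBVI at each street point)
    w: n x n spatial weight matrix with a zero diagonal
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    z = x - x.mean()                              # deviations from the mean
    numerator = n * (w * np.outer(z, z)).sum()    # n * sum_ij w_ij z_i z_j
    denominator = w.sum() * (z ** 2).sum()        # (sum_ij w_ij) * sum_i z_i^2
    return numerator / denominator
```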

5. Results

5.1. Accuracy of Building Segmentation

We used the training set and the validation set to build the TBMask R-CNN model: the training set was used to train the network model and adjust the weights, and the validation set was used to minimize overfitting. The test set was used only for testing the trained models in order to confirm the actual predictive power of the network. Two training strategies were designed based on transfer learning. Strategy 1: freeze the convolutional base, which directly uses the model pre-trained on the COCO dataset to extract features of the buildings; the training set and validation set are used only to retrain the pre-trained model's head layers to obtain the classifier of the building categories. Strategy 2: fine-tuning, which trains some layers and leaves others frozen; here, we froze the weights of the low-level convolutional layers of the pre-trained ResNet101 model and fine-tuned the parameters from the third layer onward to obtain the parameters of the TBMask R-CNN model.

The test set was used to evaluate the performance of the models trained with the two strategies. We used the average precision (AP) of each category and the mean average precision (MAP) over all categories as evaluation indicators. Table 4 presents the evaluation indicators of the two trained models on the test dataset, which show that the second model, trained with Strategy 2, is better than the first model, trained with Strategy 1. With the second model, the MAP was 0.80, and the AP values of the official-style and folk-style buildings were 0.78 and 0.81, respectively, which indicated that the second model can reliably predict the architectural style categories. Finally, we applied the second model to predict the 272,244 TSV images and complete the pixel-level image segmentation, and the results provide the data source for calculating the TBVI of traditional-style buildings from the pedestrian perspective.

Table 4. Average precision (AP) and mean average precision (MAP) of the TBMask R-CNN model on the test dataset.

Strategy | Building Style | AP | MAP
Strategy 1 | Official style | 0.53 |
Strategy 1 | Folk-style | 0.55 | 0.54
Strategy 1 | Non-traditional-style | 0.54 |
Strategy 2 | Official style | 0.78 |
Strategy 2 | Folk-style | 0.81 | 0.80
Strategy 2 | Non-traditional-style | 0.82 |

Figure 5 shows the original TSV images, the manual segmentation, and the TBMask R-CNN segmentation results for one random location on Dafengxiang Hutong within the Second Ring Road. The first and second columns in Figure 5 present the TSV images in the four horizontal directions and the manual annotation of the buildings, and the last column shows the results identified by the pixel-level segmentation model (TBMask R-CNN), which had an accuracy of 0.983. The TBVI value of the location was 1.0 according to Formula (1).

Figure 5. Comparison of the segmentation results (116.381335, 39.937614).

5.2. TBVI Results

According to Formula (1), the TBVI of the official-style buildings (O-TBVI) and the TBVI of the folk-style buildings (F-TBVI) at each location were calculated, and the TBVI of the Chinese traditional-style buildings (TBVI) was equal to the sum of the O-TBVI and F-TBVI. Figure 6 shows the TBVI results of the Chinese traditional-style buildings for all locations in the study area. The TBVI ranged from 0 to 1, with a mean of 0.137. The red dot indicates the largest TBVI, while the green dot indicates the smallest TBVI. Generally, the values of the TBVI were most pronounced inside the Second Ring Road, first decreasing from the Second Ring Road to the Fourth Ring Road and then increasing toward the Fifth Ring Road beyond the Fourth Ring Road. The spatial distribution of the Chinese traditional-style buildings within the Second Ring Road was aggregated overall, and the northern area had a higher concentration than the southern area. However, the spatial distribution was dispersed in the region outside the Second Ring Road.

Figure 6. The overall spatial distribution of the traditional building view index (TBVI) of the Chinese traditional-style buildings.

The results of the O-TBVI and F-TBVI are shown in Figures 7 and 8, and the Moran's I values of the O-TBVI, F-TBVI and TBVI in the study area are shown in Table 5. In the study area, the Moran's I values of the Chinese traditional-style buildings' TBVI, the official-style buildings' O-TBVI and the folk-style buildings' F-TBVI were 0.27, 0.09 and 0.28, respectively. The results indicated that the spatial distributions of the Chinese traditional-style buildings and its two subcategories presented aggregation and had a positive spatial correlation, in which the positive spatial correlation of the folk style was the largest. The Moran's I values of the TBVI within the Second Ring Road, between the Second Ring Road (excluding the Second Ring Road) and the Third Ring Road, between the Third Ring Road (excluding the Third Ring Road) and the Fourth Ring Road, and between the Fourth Ring Road (excluding the Fourth Ring Road) and the Fifth Ring Road were 0.47, 0.13, 0.033, and 0.145, respectively; they decreased from the Second Ring Road to the Fourth Ring Road, but the trend toward the Fifth Ring Road was increasing. The positive spatial correlation of the TBVI was the largest within the Second Ring Road, while it had the smallest positive correlation from the Third Ring Road to the Fourth Ring Road. The spatial correlation patterns of the O-TBVI and F-TBVI were the same as those of the TBVI. In particular, the Moran's I of the O-TBVI was lower than that of the F-TBVI (Table 5), indicating that the Chinese traditional-style buildings that pedestrians perceived from the street level were mainly folk-style buildings.

Figure 7. Spatial distribution of the TBVI of the official-style buildings.

Figure 8. Spatial distribution of the TBVI of the folk-style buildings.

Table 5. Moran’s I of the Chinese traditional-style buildings.

Area | TBVI | O-TBVI | F-TBVI
Within Second Ring Road | 0.47 | 0.17 | 0.45
Between Second and Third Ring Roads | 0.13 | 0.14 | 0.11
Between Third and Fourth Ring Roads | 0.033 | 0.026 | 0.029
Between Fourth and Fifth Ring Roads | 0.145 | 0.094 | 0.129
Total | 0.27 | 0.09 | 0.28


5.3. STBVI Results

According to Formula (2), the STBVI of the official-style buildings (O-STBVI) and the STBVI of the folk-style buildings (F-STBVI) on each street were calculated, and the STBVI of the Chinese traditional-style buildings (STBVI) is equal to the sum of the O-STBVI and F-STBVI. Figure 9 shows the STBVI of the traditional Chinese buildings for all streets in the study area. Based on the natural breaks method, the STBVI, O-STBVI and F-STBVI were divided into five intervals: very low (0.00, 0.09), low (0.10, 0.33), medium (0.34, 0.57), high (0.58, 0.83) and very high (0.84, 1.00). From the city center to the Fifth Ring Road in the study area, the streets with very high STBVI values were mainly distributed to the north of the second ring area, such as outside the eastern gate of the Palace Museum, Jingshan Qian Street, Wenjin Street, Houhai Beiyan Street and Gulou West Street (see mark T1); to the south of the Second Ring Road, such as Yongding Men West Street, Qinian Street, Dongxiao Street and Xiting Hutong Street (see mark T2); and to the west of the second ring area, such as the roads from the first north alley to the eighth north alley of the western fourth area, Baochan Hutong, Qianmao Hutong and the southern side of Chang'an Street (see marks T3 and T4). In addition, very high STBVI values were also concentrated in the southeast, northwest, and southwest corners of the fifth ring area (see marks T5, T6 and T7).

The spatial distributions of the O-STBVI and F-STBVI are shown in Figures 10 and 11. Comparing Figures 10 and 11, it can be seen that the buildings of the official and folk styles were mainly concentrated in the second ring area, and the spatial distribution of the O-STBVI was obviously sparser than that of the F-STBVI. Official-style buildings were concentrated in the area near the Temple of Heaven and the Palace Museum, and the folk-style buildings were concentrated to the northeast, northwest and south of Chang'an Street in the second ring area. Table 6 shows the mean and percentage of the STBVI. From the table, it can be seen that the mean O-STBVI values from the Second Ring Road to the Fifth Ring Road were 0.018, 0.005, 0.006 and 0.009; the mean F-STBVI values were 0.347, 0.050, 0.047 and 0.067; and the mean STBVI values were 0.365, 0.055, 0.052 and 0.076, respectively. The overall spatial distribution of the STBVI was similar to that of the TBVI. The streets with high STBVI were densely distributed inside the Second Ring Road and scattered on the fringe around the Fifth Ring Road. Very few streets between the Second Ring Road and the Fourth Ring Road had visible traditional-style buildings.
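For reference, the five STBVI intervals described above can be reproduced with a Jenks natural breaks classifier; the sketch below uses the mapclassify package for illustration, and the `street_stbvi` input is a hypothetical array of per-street STBVI values rather than part of the original workflow.

```python
import numpy as np
import mapclassify

# street_stbvi: one STBVI value per street (hypothetical example values)
street_stbvi = np.array([0.02, 0.15, 0.41, 0.76, 0.91, 0.05, 0.62])

classifier = mapclassify.NaturalBreaks(street_stbvi, k=5)
labels = classifier.yb     # class index per street: 0 = very low ... 4 = very high
breaks = classifier.bins   # upper bound of each of the five intervals
```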

Figure 9. The overall spatial distribution of the street of traditional building view index (STBVI) of the Chinese traditional-style buildings.

Figure 10. Spatial distribution of the STBVI of the official-style buildings.

Figure 11. Spatial distribution of the STBVI of the folk-style buildings.


Table 6. STBVI statistics of the Chinese traditional-style buildings and its two subcategories.

Area | Chinese Traditional-Style: STBVI Mean | Chinese Traditional-Style: Percentage (%) | Subclass 1: Official-Style: STBVI Mean | Subclass 1: Official-Style: Percentage (%) | Subclass 2: Folk-Style: STBVI Mean | Subclass 2: Folk-Style: Percentage (%)
Within Second Ring Road | 0.365 | 31.76 | 0.018 | 7.69 | 0.347 | 30.19
Between Second and Third Ring Roads | 0.055 | 14.40 | 0.005 | 3.18 | 0.050 | 12.70
Between Third and Fourth Ring Roads | 0.052 | 13.81 | 0.006 | 3.77 | 0.047 | 11.79
Between Fourth and Fifth Ring Roads | 0.076 | 17.11 | 0.009 | 5.16 | 0.067 | 14.78

6. Discussion and Conclusions

6.1. Discussion on TBVI

The spatial pattern of the TBVI can be interpreted from two perspectives. First, the history of urban development determined the location and quality of Chinese traditional-style buildings, which was the basis of future preservation. The area within the Second Ring Road was the Old City, which had existed since the Ming Dynasty. Historically, the Old City was once divided into the South City and the North City. The North City contains the Imperial City, while the South City was the Han city, where the ordinary artisans and the captured Han people lived [43]. Therefore, the quality of the buildings in the North City is higher than that of the buildings in the South City, and they were more likely to be preserved as heritage. In addition, the Summer Palace and Yuanmingyuan, which are at the northwest corner of the Fifth Ring Road, are royal courts that were built during the Qing Dynasty; to the east of the Fifth Ring Road is the village of Gaobeidian, which was a center for the distribution of the grains transported via the Grand Canal from South China to the capital during the Qing Dynasty.

Second, the development and renewal of modern Beijing on the foundation of the historical city have greatly affected the degree to which traditional buildings are preserved. During this process, different development strategies were applied to historical sites, and the choice between preservation and reconstruction was heavily influenced by attitudes toward traditional buildings. During the early years of urban development after the founding of the People's Republic of China, the value of historical districts was not widely recognized, which resulted in massive demolition in the areas of active urban construction outward from the Second Ring Road [43]. As the city expanded further from the Old City, awareness of this historical value increased, resulting in greater preservation in the later-developed areas near the Fifth Ring Road.

6.2. Discussion on STBVI

In the Old City of Beijing, the STBVI values of the east-west streets were generally higher than those of the north-south streets. This is probably a result of the configuration of hutongs and courtyards, the traditional form of residence in Beijing. Because of the long-standing emphasis on orientation in China, each traditional courtyard housing one large family usually extended in the north-south direction, while several courtyards lined up side by side along an alley, called a hutong, that ran from east to west and connected their northern and southern entrances; several rows of courtyards and hutongs were aligned one after another from south to north, forming a basic residential unit [44,45]. Because the east-west streets mainly accommodated the courtyard entrances, whereas only the sidewalls were visible from the north-south streets, traditional buildings could be perceived more easily on the east-west streets.

The streets with high STBVI values formed interconnected clusters in some districts inside the Second Ring Road and near the eastern and northwestern parts of the Fifth Ring Road, while the streets with visible traditional-style buildings in between were scattered and isolated. On the one hand, this distribution is a result of the history of Beijing, similar to the reason behind the distribution of the TBVI. On the other hand, the difference is partly the outcome of different conservation strategies. Inside the Second Ring Road, several historic conservation districts were designated by the planning institute and the city government, where urban development is required to pay special attention to the overall protection of the traditional style and features [46]. Meanwhile, the streets with high STBVI values scattered between the Second and Fourth Ring Roads were mostly located along individual historic buildings, for which protection covers only the building and its immediate surroundings instead of the whole district. Thus, it is implied that the overall protection strategy in the historic conservation districts inside the Second Ring Road was effective, and that the interconnected streets could form multiple convenient touring paths and increase the integration of visual perception, contributing to harmony and impressiveness in the local image of the city. These results provide a reference for realizing the goal of "forming a landscape control area between the Second and Third Ring Road" proposed by the Beijing urban master plan in 2017 [47–49]. According to the requirements for coordination with the ancient capital landscape, it is necessary to strengthen the modern representation of traditional-style architectural culture.

6.3. Conclusions

This paper used TSV images with deep learning and transfer learning to construct a deep learning model, TBMask R-CNN, which automatically extracts Chinese traditional-style architecture from street view images, and proposed the TBVI and STBVI to quantify pedestrians' visual perceptions of Chinese traditional-style buildings. With the support of TBMask R-CNN, the Chinese traditional-style buildings within the Fifth Ring Road of Beijing were automatically segmented in street view images, and the TBVI was then used to quantify the spatial distribution of the perception of Chinese traditional-style buildings from the pedestrian's perspective.

In general, the spatial distribution of the Chinese traditional-style buildings was characterized by aggregation within the Second Ring Road, with a higher concentration of buildings in the north than in the south, and by dispersion between the Second Ring Road (exclusive) and the Fifth Ring Road. The average TBVI values indicated that traditional-style buildings within Beijing's Fifth Ring Road need to be protected relative to other styles of buildings, and that buildings planned within the Third Ring Road should consider whether their style will contribute to the Beijing urban planning goals proposed in 2017. The Moran's I of the O-TBVI was lower than that of the F-TBVI, which indicated that the Chinese traditional-style buildings that pedestrians perceived from the street were mainly folk-style buildings.

Our work could be useful for urban street management and planning. Specifically, (1) we propose a method that can automatically perceive the urban landscape on a large scale; (2) the proposed indicators can quantitatively evaluate the overall spatial distribution of the urban traditional style and provide an overview of the spatial distribution of traditional urban features, which helps urban management and planning; for example, when siting large facilities such as hotels, shopping malls and public stadiums, traditional landscapes could be considered as their viewing points; (3) street view images are updated quickly, and our proposed method can be directly applied to new street view images to perceive the city scene and can help evaluate urban renewal projects with historical street view images. A project's influence on the perception of traditional-style buildings can be quantified by comparing the TBVI and STBVI values calculated from street view images captured before and after the project. All these analyses could contribute to the realization of the Beijing urban planning goal of "forming a landscape control area between the Second and Third Ring Road", which was proposed by the Beijing urban master plan in 2017.

In this experiment, the Chinese traditional-style building recognition model, TBMask R-CNN, achieved a MAP of 0.8 on the verification set, but the occlusion of buildings by trees and utility poles in the street view images affected the accuracy of the recognition and segmentation of the Chinese traditional-style buildings. Therefore, strategies for dealing with occlusion and improving the accuracy of recognition and segmentation require further study.
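As a minimal sketch of the before-and-after comparison suggested above for evaluating urban renewal projects, the following Python snippet pairs hypothetical per-street TBVI values computed from street view images captured at two dates; all identifiers and values are illustrative rather than results from this study.

```python
import pandas as pd

# Hypothetical per-street TBVI values computed from street view images
# captured before and after an urban renewal project.
before = pd.DataFrame({"street_id": [1, 2, 3], "tbvi": [0.42, 0.05, 0.18]})
after  = pd.DataFrame({"street_id": [1, 2, 3], "tbvi": [0.35, 0.06, 0.30]})

merged = before.merge(after, on="street_id", suffixes=("_before", "_after"))
merged["tbvi_change"] = merged["tbvi_after"] - merged["tbvi_before"]

# A negative mean change would suggest that the project reduced the visual
# presence of traditional-style buildings along the affected streets.
print(merged)
print("mean change:", round(merged["tbvi_change"].mean(), 3))
```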

Author Contributions: Conceptualization, T.P., L.Z. and C.S.; methodology, L.Z.; formal analysis, M.W.; data curation, X.W.; writing—original draft preparation, L.Z.; writing—review and editing, T.P., L.Z. and M.W.; visualization, S.G.; supervision, Y.C.; funding acquisition, T.P. and L.Z. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the National Natural Science Foundation of China, grant numbers 41525004, 41421001 and 41877523, and supported by a grant from the State Key Laboratory of Resources and Environmental Information System and by the Science Foundation of China University of Petroleum, Beijing, grant No. ZX20200100.

Acknowledgments: Many thanks to the Tencent company for authorizing us to use Tencent street view pictures in this study.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Dulin-Keita, A.; Thind, H.; Affuso, O.; Baskin, M.L. The associations of perceived neighborhood disorder and physical activity with obesity among African American adolescents. BMC Public Health 2013, 13, 440. [CrossRef][PubMed]
2. Dubey, A.; Naik, N.; Parikh, D.; Raskar, R.; Hidalgo, C.A. Deep learning the city: Quantifying urban perception at a global scale. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 196–212.
3. Porzi, L.; Rota Bulò, S.; Lepri, B.; Ricci, E. Predicting and understanding urban perception with convolutional neural networks. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; pp. 139–148.
4. Naik, N.; Philipoom, J.; Raskar, R.; Hidalgo, C.A. Streetscore—Predicting the perceived safety of one million streetscapes. In Proceedings of the Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 793–799.
5. Liu, L.; Silva, E.A.; Wu, C.; Wang, H. A machine learning-based method for the large-scale evaluation of the qualities of the urban environment. Comput. Environ. Urban Syst. 2017, 65, 113–125. [CrossRef]

6. He, S.; Yoshimura, Y.; Helfer, J.; Hack, G.; Ratti, C.; Nagakura, T. Quantifying memories: Mapping urban perception. arXiv 2018, arXiv:1806.04054.
7. Doersch, C.; Singh, S.; Gupta, A.; Sivic, J.; Efros, A.A. What makes Paris look like Paris. CACM 2015, 58, 103–110. [CrossRef]
8. Lee, S.; Maisonneuve, N.; Crandall, D.; Efros, A.A.; Sivic, J. Linking past to present: Discovering style in two centuries of architecture. In Proceedings of the IEEE International Conference on Computational Photography, Houston, TX, USA, 24–26 April 2015; pp. 1–10.
9. Seide, F.; Li, G.; Chen, X.; Yu, D. Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription. In Proceedings of the Automatic Speech Recognition and Understanding, Waikoloa, HI, USA, 11–15 December 2011; pp. 24–29.
10. Xu, Y.; Yang, Q.; Cui, C.; Shi, C.; Song, G.; Han, X.; Yin, Y. Visual urban perception with deep semantic-aware network. In Proceedings of the Conference on Multimedia Modeling, Thessaloniki, Greece, 8–19 January 2019; pp. 28–40.
11. Anguelov, D.; Dulong, C.; Filip, D.; Frueh, C.; Lafon, S.; Lyon, R.; Ogale, A.; Vincent, L.; Weaver, J. Google street view: Capturing the world at street level. Computer 2010, 43, 32–38. [CrossRef]
12. Cheng, L.; Chu, S.; Zong, W.; Li, S.; Wu, J.; Li, M. Use of tencent street view imagery for visual perception of streets. Int. J. Geo-Inf. 2017, 6, 265. [CrossRef]
13. Hara, K.; Le, V.; Froehlich, J. Combining crowdsourcing and google street view to identify street-level accessibility problems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Paris, France, 27 April–2 May 2013; pp. 631–640.
14. Hwang, J.; Sampson, R.J. Divergent pathways of gentrification: Racial inequality and the social order of renewal in Chicago neighborhoods. Am. Sociol. Rev. 2014, 79, 726–751. [CrossRef]
15. Zhang, L.; Pei, T.; Chen, Y.; Song, C.; Liu, X. A review of urban environmental assessment based on street view images. J. Geo-Inf. Sci. 2019, 21, 46–58. [CrossRef]
16. Kelly, C.M.; Wilson, J.S.; Baker, E.A.; Miller, D.K.; Schootman, M. Using google street view to audit the built environment: Inter-rater reliability results. Ann. Behav. Med. 2013, 45, 108–112. [CrossRef]
17. Rundle, A.G.; Bader, M.D.M.; Richards, C.A.; Neckerman, K.M.; Teitler, J.O. Using google street view to audit neighborhood environments. Am. J. Prev. Med. 2011, 40, 94–100. [CrossRef]
18. Clarke, P.; Ailshire, J.; Melendez, R.; Bader, M.; Morenoff, J. Using google earth to conduct a neighborhood audit: Reliability of a virtual audit instrument. Health Place 2010, 16, 1224–1229. [CrossRef][PubMed]
19. Badland, H.M.; Opit, S.; Witten, K.; Kearns, R.A.; Mavoa, S. Can virtual streetscape audits reliably replace physical streetscape audits? J. Urban Health 2010, 87, 1007–1016. [CrossRef][PubMed]
20. Naik, N.; Kominers, S.D.; Raskar, R.; Glaeser, E.L.; Hidalgo, C.A. Computer vision uncovers predictors of physical urban change. Proc. Natl. Acad. Sci. USA 2017, 114, 7571–7576. [CrossRef][PubMed]
21. Dong, R.; Zhang, Y.; Zhao, J. How green are the streets within the sixth ring road of Beijing? An analysis based on tencent street view pictures and the green view index. Int. J. Environ. Res. Public Health 2018, 15, 1367. [CrossRef][PubMed]
22. Li, X.; Zhang, C.; Li, W.; Kuzovkina, Y.A.; Weiner, D. Who lives in greener neighborhoods? The distribution of street greenery and its association with residents' socioeconomic conditions in Hartford, Connecticut, USA. Urban For. Urban Green. 2015, 14, 751–759. [CrossRef]
23. Li, X.; Zhang, C.; Li, W.; Ricard, R.; Meng, Q.; Zhang, W. Assessing street-level urban greenery using Google Street View and a modified green view index. Urban For. Urban Green. 2015, 14, 675–685. [CrossRef]
24. Li, X.; Zhang, C.; Li, W. Does the visibility of greenery increase perceived safety in urban areas? Evidence from the place pulse 1.0 dataset. ISPRS Int. Geo-Inf. 2015, 4, 1166–1183. [CrossRef]
25. Li, H.; Páez, A.; Liu, D. Built environment and violent crime: An environmental audit approach using Google Street View. Comput. Environ. Urban Syst. 2017, 66, 83–95.
26. Liang, J.; Gong, J.; Sun, J.; Zhou, J.; Li, W.; Li, Y.; Liu, J.; Shen, S. Automatic sky view factor estimation from street view photographs—A big data approach. Remote Sens. 2017, 9, 411. [CrossRef]
27. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; pp. 886–893.
28. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [CrossRef]

29. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
30. Donahue, J.; Jia, Y.; Vinyals, O.; Hoffman, J.; Zhang, N.; Tzeng, E.; Darrell, T. Decaf: A deep convolutional activation feature for generic visual recognition. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 647–655.
31. Seresinhe, C.I.; Preis, T.; Moat, H.S. Using deep learning to quantify the beauty of outdoor places. R. Soc. Open Sci. 2017, 4, 170170. [CrossRef]
32. Shen, Q.; Zeng, W.; Ye, Y.; Arisona, S.M.; Schubiger, S.; Burkhard, R.; Qu, H. StreetVizor: Visual exploration of human-scale urban forms based on street views. IEEE Trans. Vis. Comput. Graph. 2018, 24, 1004–1013. [CrossRef][PubMed]
33. Trends and Characteristics of Population Change in Beijing in 2014. Available online: http://tjj.beijing.gov.cn/tjsj/zxdcsj/rkcydc/dcsj_4597/201601/t20160128_171191.html (accessed on 6 July 2019).
34. Tencent Street View (TSV) API. Available online: https://lbs.qq.com/panostatic_v1/guide-getImage.html (accessed on 10 August 2018).
35. Li, S. The Arts of China; People's Publishing House: , China, 2006.
36. Liu, S. Construction Civilization—Chinese Traditional Culture and Traditional Architecture; Tsinghua University Press: Beijing, China, 2014.
37. Cao, X. The Research on Urban Color of Historic Sites in the Old City of Beijing. Master's Thesis, Beijing University of Civil Engineering and Architecture, Beijing, China, 2012.
38. Dutta, A.; Gupta, A.; Zisserman, A. Image Annotator. Available online: http://www.robots.ox.ac.uk/~vgg/software/via (accessed on 20 August 2018).
39. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
40. Yang, J.; Zhao, L.; Mcbride, J.; Gong, P. Can you see green? Assessing the visibility of urban forests in cities. Landsc. Urban Plan. 2009, 91, 97–104. [CrossRef]
41. Dubin, R.A. Spatial autocorrelation: A primer. J. Hous. Econ. 1998, 7, 304–327. [CrossRef]
42. Kelejian, H.H.; Prucha, I.R. On the asymptotic distribution of the Moran I test statistic with applications. J. Econom. 2001, 104, 219–257. [CrossRef]
43. Liu, L. Study on the Protection of District Protected Historical Sites in Beijing Old City. Master's Thesis, Beijing University of Civil Engineering and Architecture, Beijing, China, 2018.
44. Xu, P.-F. The planning and preservation of the streets in the old city of Beijing. J. Beijing Union Univ. 2008, 6, 23–27.
45. Yao, T.; Yang, X. A comparative study on the street space form in the old city of Beijing: A case study of Shijia Hutong, the White Stupa Temple area, and Dashilanr. J. Landsc. Res. 2018, 10, 22–26.
46. Whitehand, J.; Gu, K. Urban conservation in China: Historical development, current practice and morphological approach. Town Plan. Rev. 2007, 78, 643–670. [CrossRef]
47. Beijing Urban Master Plan (2016–2035). Available online: http://ghzrzyw.beijing.gov.cn/zhengwuxinxi/zxzt/bjcsztgh20162035/202001/t20200102_1554606.html (accessed on 20 May 2018).
48. Lambiotte, R.; Blondel, V.D.; Kerchove, C.D.; Huens, E.; Prieur, C.; Smoreda, Z.; Dooren, P.V. Geographical dispersal of mobile communication networks. Phys. A Stat. Mech. Its Appl. 2008, 387, 5317–5325. [CrossRef]
49. Wang, M.H.; Schrock, S.D.; Broek, N.V.; Mulinazzi, T. Estimating dynamic origin-destination data and travel demand using cell phone network data. Int. J. ITS Res. 2013, 11, 76–86. [CrossRef]

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).