The Best Place in Tokyo for Anything: Classifying Stations by Walkability to Specific Amenities

4N4-IS-1c-01

The 35th Annual Conference of the Japanese Society for Artificial Intelligence, 2021

Aaron Bramson∗1 Megumi Hori∗2 ∗1 GA Technologies, Inc. ∗2 GA Technologies, Inc. RIKEN Center for Biosystems Dynamics Research Ghent University

Walkability analyses have recently gained attention for economic, health, and environmental reasons. We find clusters of areas with similar walkability profiles among the train stations of central Tokyo. First, we use a breadth-first search algorithm on the road network to determine the walkable areas within 5, 10, and 15 minutes of each station. We then collect the establishments within 50m of any traversed edge. We perform three analyses: (1) classifying regions by the numbers of stores of each type, (2) recursive feature selection and reclassification, and (3) scoring areas by their specialization in one of the categories. We find that classification without feature selection produces more useful results, and in both cases the <15 minute isochrones yield the best results. We also achieve realistic specialization results. These methods can be broadened to identify regions that are over- and under-serviced by amenities with impacts for both policy and business planning.

Contact: Aaron Bramson, GA Technologies, Inc., Roppongi 3-2-1, Minato-ku, Tokyo 106-6290, Japan, Tel: 03-6230-9280, [email protected]

1. Introduction

With increasing concerns over the environmental, economic, and health effects of an overly car-dependent society, research into walkability has gained significant momentum. The end goal is to find ways to improve the walkability of areas, but there are still many conceptual and methodological issues needing to be resolved. Walkability is not a single measure, but rather a subclass of accessibility measures. Accessibility measures describe the degree to which specific amenities (healthy food, day care, sports facilities, etc.) can be reached within an area or from some point, and ‘walkability’ measures the ability to reach those places by foot. In this way, we can have an entertainment-walkability measure for the degree to which entertainment venues are reachable by foot, but it is not clear what components such a measure needs. The number reachable, the minimum time to reach one, the variety of such establishments, and other factors are all important, and combining them into a useful measure of walkability will require a complex formulation and many empirical tests.

The current work focuses on clustering areas of central Tokyo by their access to various categories of stores and identifying areas that specialize in some specific category. We define regions as the walkable areas from each of the train stations within the 23 wards of Tokyo. After determining the walkable area from each station, we collect the stores within that area to generate a store profile for each station. We use these profiles to cluster regions with similar numbers across store types. Then, because there is a great deal of correlation among the store categories, we perform feature selection on the profiles to determine which store categories have the greatest impact in differentiating the regions’ walkability. We follow this up with an analysis of the level of specialization of areas within each category; i.e., to what degree does an area specialize in hobby shops, restaurants, etc. As much of the analysis here is exploratory, we finish with a description of the many avenues by which we can improve this research going forward.

2. Data

Our analysis requires the integration of three disparate datasets: the walking network, train stations, and stores.

2.1 Store Data

Our store data comes from NTT Townpage [NTT Townpage Inc. 19], a private data service that provides lists of stores and other entities by category based on phone numbers. For the current demonstration, we limit our analysis to within Tokyo’s 23 Wards (central Tokyo) and to establishments within selected categories. There are a total of 93,145 establishments within this area, partitioned into the following categories: amusement, sport, travel, religion, cafe, supermarket, laundry, concern, cram school, variety store, spa, hotel, hobby, convenience store, nursery school, public bath, clinic, shopping, bar, general hospital, eatery, sport shop, and drugstore. For simplicity we refer to all these establishments as “stores.”

2.2 Walking Network Data

Our base network data is the “road” network from Open Street Map (OSM) [OpenStreetMap contributors 21], which includes pedestrian-only paths. We remove roads where walking is marked as prohibited (such as expressways). We also collect the nodes for subway and train station entrances and wherever disconnected we link them to the nearest road network node. Using the coordinates of nodes, we generate the length of each edge using Haversine distance, and the traversal time using a walking speed of 4.8 kph (80 meters per minute). Ideally we would use the footpath network data that includes sidewalks, pedestrian bridges, multi-use paths, greenways, etc. to more accurately determine accessibility via walking [Ellis 16]; however, such data is not reliably available for Tokyo at present.

2.3 Train Station Data

Because our analysis focuses on train stations, we incorporate the locations of railway, subway, and streetcar stations using [Lüthy 17] for the station and line names, and merge this with the station nodes from OSM [OpenStreetMap contributors 21] to get accurate locations for the 600 stations within central Tokyo. Stations for different lines that share the same name are captured as separate stations. Because entrance nodes in the OSM data typically do not specify for which station they are an exit, we simply connect each entrance node to its nearest station node to integrate the stations with the walking network. The resulting integrated network has 731,570 nodes and 893,187 edges. Upon analyzing our preliminary results we discovered that 27 stations were not properly integrated into the surrounding network (stations were connected to exits, which were connected to road components, but not to the main component). We filtered these stations from our analysis here, leaving us with 573 stations.
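The edge weighting and nearest-node linking described in Sections 2.2 and 2.3 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the Earth-radius constant, and the sample coordinates are all assumptions made for the example, and only the 80 meters per minute walking speed comes from the text.

```python
import math

EARTH_RADIUS_M = 6_371_000   # mean Earth radius in meters (assumed constant)
WALK_SPEED_M_PER_MIN = 80    # 4.8 kph, the walking speed stated in Section 2.2


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def walk_minutes(length_m):
    """Traversal time of an edge at the fixed walking speed."""
    return length_m / WALK_SPEED_M_PER_MIN


def nearest_node(point, nodes):
    """Return the id of the node closest to point; nodes maps id -> (lat, lon)."""
    return min(nodes, key=lambda node_id: haversine_m(*point, *nodes[node_id]))


# Hypothetical coordinates, for illustration only.
road_nodes = {"r1": (35.6812, 139.7671), "r2": (35.6820, 139.7690)}
entrance = (35.6813, 139.7673)

link_to = nearest_node(entrance, road_nodes)                 # node to attach the entrance to
link_length = haversine_m(*entrance, *road_nodes[link_to])   # edge length in meters
link_minutes = walk_minutes(link_length)                     # edge traversal time in minutes
print(link_to, round(link_length, 1), round(link_minutes, 2))
```

A brute-force nearest-node search like this is only practical for small examples; on a network with hundreds of thousands of nodes a spatial index would be the more likely choice.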

3. Methods

We describe our method for integrating the stations into the walking network above; now we describe our methods for generating walkable areas, clustering areas by similar store profiles, feature selection, and identifying area specialties.

3.1 Determining the Walkable Area

Starting from each station we perform a single-source Dijkstra algorithm using the NetworkX library [Hagberg 08] to calculate the walking times to each node in the network within 15 minutes. We select three isochrones (<5, <10, <15 minutes) to test the sensitivity of the results to the stores’ distances from the station. For each isochrone, we take the set of edges traversed within the appropriate time frame and add a 50m buffer to its geometry (after converting its geometry into a distance-preserving coordinate reference system centered on Tokyo station).

Figure 1: Top: The (often overlapping) isochrones for 573 stations in the central Tokyo area. From the dark center point marking the stations, the colors lighten in 5-minute increments. Bottom: Detail of the walkable areas for two stations (Akasaka and Gaienmae) demonstrating how the three walkable area isochrones (<5, <10, <15 minutes) are buffered by 50 meters to generate the walkable zones. Grey lines are road edges while yellow lines mark links between exits and stations. Gold spots mark store locations. [keplergl 20, Mapbox 21, OpenStreetMap contributors 21]

Figure 1 (top) demonstrates the isochrones for all included stations within central Tokyo. Note that because the network and store data are all isolated within central Tokyo, there is an edge effect for stations near the boundary (see especially the eastern area bordering Urayasu). Figure 1 (bottom) offers a detailed view for two sample stations. In this figure you can also see the high level of detail of our network and the extent of a 50m buffer applied to all the traversed edges.

The lower figure reveals a problem regarding the locations of stores with respect to the road edges. We apply a uniform 50m buffer to all traversed edges; however, for a large shopping mall or department store, its coordinates may be placed outside that buffer range (note gold dots within white areas surrounded by walkable areas in Figure 1 (bottom)). The current methodology (and essentially any methodology that relies on network buffers to determine store inclusion) has the risk of missing stores in such cases unless one takes care to fill in these gaps (see future work below).

3.2 Clustering By Establishment Profiles

Using the counts of stores for each category as the feature vector, we cluster the stations into groups with similar store profiles. First we need to prepare the store count data for clustering. We first apply the typical method to standardize the data: for each category we subtract the mean value from each station’s value, then divide by the standard deviation for that category across all stations. The scale of the resulting distributions can vary greatly depending on the spread of the data, so for parsimony across categories while
clustering we further normalize the data onto the [0, 1] scale by subtracting the min value and dividing by the spread (max-min). Finally, we perform hierarchical clustering using the scikit-learn Python package [Pedregosa 11] with seven clusters.

3.3 Recursive Feature Selection

Because it is difficult to understand the meaning behind clusters based on so many features, we perform a feature selection procedure to reduce the number of variables and focus on the categories that provide the greatest informational gain. The store counts for many of the categories are (unsurprisingly) highly correlated. Figure 2 shows the correlation matrix among the 23 store categories sorted by their correlation with the eateries category. We expect there is a great deal of redundant information among those categories that could prevent the clustering algorithm from finding natural groupings of stations. For this reason, we also perform a recursive feature combination based on the correlation values.

Figure 2: Blocks of highly related categories can be easily discerned from this correlation matrix for all 23 included store categories.

For each of the isochrones, we determine the pair of categories with the highest correlation value. We then combine the raw store counts for those two categories and then restandardize and renormalize the result, effectively merging the categories into one. Then, the correlation matrix of the remaining categories is recalculated. This process is repeated until no pair of categories exceeds a 0.50 correlation level. We then take the remaining combined categories and perform the same 7-group hierarchical clustering for each isochrone.

3.4 Identifying Area Specialties

In order to identify the specialties of areas, and which areas are specialized in each category, we first need to define and formalize what we mean by specialization. The number of stores is surely important, but having a large number of stores in a particular category is insufficient for making it a specialty. Areas like Shinjuku and Shibuya have many stores of every kind, so even though the number of drugstores is high, they do not count as areas that specialize in drugstores based on the large number alone. A crucial aspect of specialization is that the percent of stores of that type is higher in that area than in other places. However, a place that only has a few stores, even if they are all in the same category, hardly counts as specializing in that category. So a place that specializes in cafes doesn’t just have lots of cafes; it has lots of cafes and the ratio of those cafes to all other stores is higher than in other places.

Let $c_{i,b}$ represent the number of stores of category $c \in C$ for location $i$ within isochrone $b$, and let $N_{i,b}$ represent the total number of stores across all categories for that location within that isochrone. First we calculate the percent of stores for each category in the obvious way: $p_{i,b} = c_{i,b}/N_{i,b}$. We then standardize these percentages to get $\tilde{p}_{i,b}$ by subtracting the mean for that category across all the stations and dividing by the standard deviation. These $\tilde{p}_{i,b}$ are z-values telling us how far above (or below) this location is for that category compared to the average percentage. So, for every location, the percent of stores that are convenience stores is high, but the standardized percent tells us which areas have an especially high concentration of them. To combine our measurement of the concentration of stores by category ($\tilde{p}_{i,b}$) with the number of stores of that category ($c_{i,b}$) into a specialization index, we use the following equation:

$$SI(c_{i,b}) = \tilde{p}_{i,b} \ln c_{i,b} \qquad (1)$$

We use the natural logarithm of the store counts to capture diminishing returns on larger numbers of stores. Also, while having more stores is always better for greater specialization, the log allows us to balance the number against the concentration of stores of that type to create a viable measure of specialization. Note that for areas with a lower-than-average percentage of stores in a category, the $\tilde{p}_{i,b}$ value is negative, so such an area will never have high specialization regardless of how many stores of that type it has.

4. Results

Our analysis reveals clusters of stations that match an intuitive grouping of stations by similarity, but the results are sensitive to the time horizons and the selected categories in surprising ways.

4.1 Clustering on All Categories

First we report the results of clustering on all store categories. For this analysis we chose to form 7 clusters in all cases. The maps in Figure 3 display the found clusters for each isochrone. With the <15 minute isochrones (bottom) we can see strong clustering around known city centers: central Shinjuku in red, surrounding Shinjuku as well as Ikebukuro, Shibuya and Ueno in dark blue, the Ginza/Shimbashi area in dark green, the business area around Tokyo station in light green, Odaiba in orange, and the area around Tokyo Dome in pink. For those familiar
with the Tokyo area, these groupings are highly intuitive. Grouping nearby stations together is a minimal expectation because of how the walkable zones are created. But grouping the areas around Shinjuku, Shibuya, Ikebukuro, and Ueno together really demonstrates that the store profile is capturing a genuine similarity among these famous shopping and entertainment areas.

Decreasing the walk time maintains some of those clusters, but in general the identified groups become smaller and less identifiable. Comparing the <5 isochrone (Figure 3 top) we see that one station in the Ikebukuro area is in the same category as Ginza, two others are grouped with major shopping areas scattered around central Tokyo (light green), and two others are in the catch-all group (light blue). When it comes to very large stations, if the travel time is only 5 minutes, then which side of the main station you are on can make a big difference to the reachable stores. Such a high degree of sensitivity makes the <5 minute results less reliable for capturing perceived levels of similar walkability.

Figure 3: Results of clustering on all 23 categories for the <5 (top), <10 (middle), and <15 (bottom) minute isochrones. [keplergl 20, Mapbox 21, OpenStreetMap contributors 21]

4.2 Clustering on Selected Features

Correlation-based recursive feature selection identifies the store categories with the least redundant information with respect to finding groups with similar store profiles. Because the correlations differ by isochrone, we perform feature selection separately for each one, and Table 1 shows the resulting features.

| feature | <5 min isochrone | <10 min isochrone | <15 min isochrone |
|---|---|---|---|
| 1 | amusement | amusement | amusement |
| 2 | religion | religion | religion |
| 3 | cramSchool | cramSchool | cramSchool |
| 4 | generalHospital | generalHospital | generalHospital |
| 5 | spa | spa | publicBath, supermarket, laundry, nurserySchool |
| 6 | publicBath | publicBath | sportShop, drugstore, hotel, shopping, sport, hospital, travel, eatery, cafe, bar, hobby, convenienceStore, varietyStore, concern, spa |
| 7 | nurserySchool | supermarket, laundry, nurserySchool | |
| 8 | laundry | concern, sportShop, hotel, varietyStore, drugstore, shopping, convenienceStore, sport, cafe, hobby, hospital, eatery, bar, travel | |
| 9 | supermarket | | |
| 10 | concern | | |
| 11 | sportShop, drugstore, hotel, varietyStore, shopping, hospital, sport, cafe, hobby, eatery, bar, convenienceStore, travel | | |

Table 1: The combined categories produced by recursive feature selection.

With increasing distance, more categories become similar, and the number of features decreases
from 23 to 6 for the <15 minute isochrone compared to 11 for the <5 minute isochrone. Notice that the same 4 categories remain independent across all isochrones, and the pattern of combination matches expectations of store types that occur more frequently together in shopping districts, business centers, or suburban areas.

The clusters generated by these reduced sets of features (shown in Figure 4) are markedly different from the clusters produced using all the variables, although there are some important consistencies as well. For example, the importance of the amusement category is revealed by the persistent groups containing Tokyo Dome (blue in Figure 4) and Odaiba (dark green in Figure 4), which also appeared in the previous results. Just as above, the <5 minute isochrone groups fail to identify areas with a recognizable pattern, with most groups containing a few distant stations. The results after feature selection are even less useful than those using all categories. The groups found using <10 minute isochrones with feature selection are more reasonable than the <5 minute ones, but here again feature selection seems to have moved the focus to niche areas that are particularly strong in just one category.

Figure 4: Results of clustering on the combined categories from recursive feature selection for the <15 minute isochrones. [keplergl 20, Mapbox 21, OpenStreetMap contributors 21]

The most salient difference in this analysis compared to using all categories is the identification of the orange group in the <15 isochrone shown in Figure 4. Unlike all other groups found, this one is scattered all over central Tokyo rather than being a collection of city centers. In examining the values of the categories making up this group, we find that the combined feature including public baths, supermarkets, laundries and nursery schools is highly correlated with this group, but having a low number of cram schools is the best identifier. Comparing the red and dark orange stations of Figure 5 with the orange stations from Figure 4 bottom clearly indicates the role of this feature in creating these groups. Essentially, by combining most of the categories of importance in major shopping areas, we trade having a distinction between mega shopping and major shopping for a segmentation of stations by features that distinguish among less shopping-focused areas. This phenomenon is exactly our motivation for performing the feature selection, and in the future we aim to amplify this effect to uncover a larger number of groups with each sharing a recognizable store profile.

Figure 5: Stations colored according to the number of cram schools within 15 minutes. The red and dark orange locations exhibit a high degree of match with the orange group from Figure 4. [keplergl 20, Mapbox 21, OpenStreetMap contributors 21]

4.3 Specialty Analysis

In order to analyze the degree of specialization of an area, we developed an index that combines the number of establishments of each category with the percent of local stores that are within that category. Table 2 shows, for each category, the station with the largest number of stores and that number, and similarly for the largest percent and largest specialization index for the <10 minute walkable area. From this table we can see all the possible patterns among these variables: the same station is highest in all three (e.g. bars in Shimbashi), a higher concentration compensates for a lower number of stores (e.g. stores of concern in Uguisudani vs Ikebukuro), a large number suffices to outweigh a lower focus (e.g. hotels in Minami-Senju vs Kasai-Rinkai Park), and the many cases where the balance of the two favors a different station than either of the ones with the greatest number or concentration (e.g. hobby shops and supermarkets).

Note the large differences in the specialization indices across categories. The highest category (hotels) has a specialization index of 40.40 while the smallest (general hospitals) has a specialization index of just 1.28. Part of this reflects the nature of hospitals vs hotels; the number of hospitals is just small compared to the numbers for other kinds of stores. General hospitals are always large buildings, so having one of them already excludes having a large number

| category | largest number | station with largest number | highest normalized percent | station with largest percent | highest specialization index | station with highest specialization index |
|---|---|---|---|---|---|---|
| hotel | 99 | Minami-Senju | 14.07 | Kasai-Rinkai Park | 40.40 | Minami-Senju |
| bar | 1272 | Shimbashi | 4.04 | Shimbashi | 28.84 | Shimbashi |
| concern | 25 | Ikebukuro | 10.32 | Uguisudani | 28.61 | Uguisudani |
| sportShop | 48 | Jimbocho | 5.93 | Meiji-Jingumae 'Harajuku' | 19.54 | Meiji-Jingumae 'Harajuku' |
| amusement | 4 | Tokyo Teleport | 13.91 | Tokyo Teleport | 19.28 | Tokyo Teleport |
| religion | 76 | Inaricho | 12.83 | Tonerikoen | 16.12 | Honkomagome |
| hobby | 298 | Shibuya | 4.08 | Aomi | 15.87 | Omotesando |
| cramSchool | 35 | Takadanobaba | 4.91 | Seijogakuenmae | 15.61 | Seijogakuenmae |
| eatery | 1400 | Ginza | 3.69 | Sakuradamon | 15.59 | Nijubashimae 'Marunouchi' |
| cafe | 179 | Shinjuku | 5.27 | Shin-Toyosu | 12.77 | Harajuku |
| shopping | 23 | Ginza | 8.53 | Shinagawa | 11.13 | Futago-Tamagawa |
| drugstore | 67 | Shinjuku | 6.24 | Ariake-Tennis-No-Mori | 10.06 | Umeyashiki |
| hospital | 149 | Ginza | 4.19 | Shinonome | 9.81 | Kitami |
| travel | 77 | Shimbashi | 19.24 | Showajima | 9.09 | Daimon |
| publicBath | 7 | Ochiai | 8.28 | Tenkubashi | 9.03 | Nakai |
| varietyStore | 34 | Shinjuku | 7.72 | Futago-Shinchi | 8.78 | Futago-Tamagawa |
| laundry | 23 | Nishi-Ogikubo | 3.80 | Kotake-Mukaihara | 8.73 | Magome |
| nurserySchool | 21 | Kameido | 9.13 | Ariake-Tennis-No-Mori | 8.43 | Tatsumi |
| convenienceStore | 80 | Shinjuku | 15.52 | Ryutsu Center | 8.00 | Shin-Toyosu |
| supermarket | 15 | Mukohara | 4.69 | Tatsumi | 7.16 | Hikawadai |
| sport | 51 | Ginza | 17.76 | Showajima | 6.89 | Takaido |
| spa | 8 | Seibu-Shinjuku | 18.07 | Tokyo International Cruise Terminal | 1.97 | Higashi-Shinjuku |
| generalHospital | 4 | Ochanomizu | 22.42 | Shin-Toyosu | 1.28 | Takashimadaira |

Table 2: For each category, the stations with the most stores, highest percent of local area stores, and the highest specialization index using the <10 minute walkable area.
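As a sanity check on Equation (1), the index can be recomputed for the two rows of Table 2 in which a single station holds all three maxima, so that the count, the normalized percent, and the index all refer to the same place. The sketch below assumes that the "highest normalized percent" column is the standardized percentage used in Equation (1); the function name and the handling of non-positive counts are illustrative choices, not taken from the paper.

```python
import math


def specialization_index(z_percent, count):
    """Equation (1): standardized percent times the natural log of the store count."""
    if count <= 0:
        # The paper does not say how zero counts are treated; rejecting them is an assumption.
        raise ValueError("count must be positive for the logarithm")
    return z_percent * math.log(count)


# (category, station, store count, normalized percent, index reported in Table 2)
rows = [
    ("bar", "Shimbashi", 1272, 4.04, 28.84),
    ("amusement", "Tokyo Teleport", 4, 13.91, 19.28),
]

for category, station, count, z_percent, reported in rows:
    computed = specialization_index(z_percent, count)
    print(f"{category:10s} {station:15s} computed={computed:.2f} reported={reported:.2f}")
```

The recomputed values agree with the table to within the rounding of the two-decimal normalized percent. A station with a single store in a category scores zero under this form because ln 1 = 0.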

of other stores (note that this category has the largest normalized percent of its area). We computed the normalized counts of stores for clustering, and using those here instead of the logarithm of the number would change the meaning from having a large number to having more than other places. That is, we could build a relative specialization index in this way, but the absolute specialization index we present here that requires a large number of stores seems more intuitive. This tells us that there really are no places that specialize in large hospitals, spas, sports venues, or supermarkets. It also tells us that there are places that specialize in hotels, bars, and red-light district activities.

Instead of just looking at the locations with the maximum specialization values, we can also gain insight by looking at the geospatial distribution of stations with higher and lower scores. Figure 6 shows the specialization indices for all stations in central Tokyo for two related categories: eateries (top) and bars (bottom). The plot of eatery specialization tells us that business (and to a lesser degree shopping) areas have more and a higher percentage of eateries than residential areas, with the area around Tokyo station having the greatest concentration. The distribution of bars places Shimbashi at the top, then the nearby Ginza area, then Shinjuku before falling into lower-than-average locations. Neither of these results is surprising; on the contrary, these results confirm that our specialization index is accurately tracking which stores are overly abundant in which areas.

That said, the specialization results do depend on the time horizon used. Table 3 shows the differences in specialization indices and corresponding stations for the <5, <10, and <15 minute walkable areas. The stations for the <10 and <15 minute isochrones are often the same or nearby stations, but the stations for the <5 minute isochrones are often far away. As an example, for the cafe category, Asakusa has the highest score in the <5 isochrone, but for <10 and <15 it is Harajuku 16 kilometers away. There is a similar switch to distant stations for the variety store and shopping categories. However, note that this table only lists the top station, so in some of these cases it is merely a slight reordering among the top few stations rather than a sudden and large shift.

It is perhaps more insightful to look at the changes in the specialization index values themselves. The values in the table are sorted by the <15 minute isochrone, and deviations from that order in the <5 and <10 minute isochrones indicate differences in the diversity of the collections of stores. Figure 6 also tells us that there are significant differences in the distributions of the specialization scores by category: some categories have just a few areas with high values, while other categories exhibit high values across a much broader area. In light of these observations we can consider a more sophisticated measure of specialization that includes similar values/ranks across isochrones as well as more focus in the area(s) exhibiting specialization. Because there is no ground truth for walkability scores or for specialization, the best we can say at this stage is that the specialization index captures and balances intuitively important aspects of specialization and that the results of the <10 and <15 isochrones largely match our experiences of the areas around Tokyo.

5. Conclusions and Future Work

Both of our clustering methods captured city centers in an intuitive way, but it would be interesting to see clusters among the suburban areas as well to identify gradations in the level of urbanization. Simply adding more clusters splinters nearby areas into distinct groups, while grouping

| category | highest specialization index (<5 min) | station (<5 min) | highest specialization index (<10 min) | station (<10 min) | highest specialization index (<15 min) | station (<15 min) |
|---|---|---|---|---|---|---|
| hotel | 9.67 | Uguisudani | 40.40 | Minami-Senju | 42.96 | Minami-Senju |
| bar | 26.79 | Shimbashi | 28.84 | Shimbashi | 28.72 | Kasumigaseki |
| religion | 24.30 | Honkomagome | 16.12 | Honkomagome | 20.50 | Sengakuji |
| sportShop | 21.70 | Ogawamachi | 19.54 | Meiji-Jingumae 'Harajuku' | 19.89 | Jimbocho |
| hobby | 10.46 | Aomi | 15.87 | Omotesando | 19.51 | Omotesando |
| concern | 14.44 | Uguisudani | 28.61 | Uguisudani | 18.28 | Uguisudani |
| travel | 16.44 | Onarimon | 9.09 | Daimon | 16.80 | Monorail Hamamatsucho |
| hospital | 8.74 | Horikiri | 9.81 | Kitami | 16.76 | Kitami |
| cramSchool | 13.66 | Seijogakuenmae | 15.61 | Seijogakuenmae | 16.12 | Hikarigaoka |
| cafe | 9.27 | Asakusa | 12.77 | Harajuku | 15.79 | Harajuku |
| nurserySchool | 12.55 | Arakawa-Nichome | 8.43 | Tatsumi | 15.45 | Tatsumi |
| eatery | 15.75 | Nijubashimae 'Marunouchi' | 15.59 | Nijubashimae 'Marunouchi' | 14.86 | Tōkyō |
| shopping | 11.88 | Shinagawa | 11.13 | Futago-Tamagawa | 12.94 | Futago-Shinchi |
| varietyStore | 9.96 | Oshiage 'SKYTREE' | 8.78 | Futago-Tamagawa | 12.39 | Futago-Shinchi |
| convenienceStore | 4.72 | Seibijo | 8.00 | Shin-Toyosu | 10.38 | Telecom Center |
| drugstore | 16.93 | Wakamatsu-Kawada | 10.06 | Umeyashiki | 9.98 | Hatanodai |
| amusement | 12.67 | Aomi | 19.28 | Tokyo Teleport | 9.73 | Aomi |
| publicBath | 5.66 | Yotsugi | 9.03 | Nakai | 9.66 | Anamori-Inari |
| sport | 7.94 | Kokuritsu-Kyogijo | 6.89 | Takaido | 9.48 | Takaido |
| laundry | 8.91 | Oyamadai | 8.73 | Magome | 9.40 | Magome |
| supermarket | 1.97 | Takashimadaira | 7.16 | Hikawadai | 7.59 | Saginomiya |
| spa | 2.41 | Ryogoku | 1.97 | Higashi-Shinjuku | 3.87 | Sugamo |
| generalHospital | 4.26 | Kudanshita | 1.28 | Takashimadaira | 1.81 | Higashi-Jūjō |

Table 3: For each category, the station with the highest specialization index and its index for the <5, <10, and <15 minute walkable areas.
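One simple way to quantify how much the top station depends on the time horizon is to count the categories whose top station is identical in two isochrone columns of Table 3. The sketch below transcribes the station columns of the table (with ASCII spellings) and does only exact-name matching; the dictionary and function names are illustrative, and the tally does not capture cases where the top station merely moves to a nearby station.

```python
# Top station per category for the <5, <10, and <15 minute walkable areas (from Table 3).
TOP_STATIONS = {
    "hotel": ("Uguisudani", "Minami-Senju", "Minami-Senju"),
    "bar": ("Shimbashi", "Shimbashi", "Kasumigaseki"),
    "religion": ("Honkomagome", "Honkomagome", "Sengakuji"),
    "sportShop": ("Ogawamachi", "Meiji-Jingumae 'Harajuku'", "Jimbocho"),
    "hobby": ("Aomi", "Omotesando", "Omotesando"),
    "concern": ("Uguisudani", "Uguisudani", "Uguisudani"),
    "travel": ("Onarimon", "Daimon", "Monorail Hamamatsucho"),
    "hospital": ("Horikiri", "Kitami", "Kitami"),
    "cramSchool": ("Seijogakuenmae", "Seijogakuenmae", "Hikarigaoka"),
    "cafe": ("Asakusa", "Harajuku", "Harajuku"),
    "nurserySchool": ("Arakawa-Nichome", "Tatsumi", "Tatsumi"),
    "eatery": ("Nijubashimae 'Marunouchi'", "Nijubashimae 'Marunouchi'", "Tokyo"),
    "shopping": ("Shinagawa", "Futago-Tamagawa", "Futago-Shinchi"),
    "varietyStore": ("Oshiage 'SKYTREE'", "Futago-Tamagawa", "Futago-Shinchi"),
    "convenienceStore": ("Seibijo", "Shin-Toyosu", "Telecom Center"),
    "drugstore": ("Wakamatsu-Kawada", "Umeyashiki", "Hatanodai"),
    "amusement": ("Aomi", "Tokyo Teleport", "Aomi"),
    "publicBath": ("Yotsugi", "Nakai", "Anamori-Inari"),
    "sport": ("Kokuritsu-Kyogijo", "Takaido", "Takaido"),
    "laundry": ("Oyamadai", "Magome", "Magome"),
    "supermarket": ("Takashimadaira", "Hikawadai", "Saginomiya"),
    "spa": ("Ryogoku", "Higashi-Shinjuku", "Sugamo"),
    "generalHospital": ("Kudanshita", "Takashimadaira", "Higashi-Jujo"),
}


def same_top_station(table, col_a, col_b):
    """Categories whose top station is identical in the two given isochrone columns."""
    return sorted(cat for cat, stations in table.items() if stations[col_a] == stations[col_b])


same_5_10 = same_top_station(TOP_STATIONS, 0, 1)    # <5 vs <10 minutes
same_10_15 = same_top_station(TOP_STATIONS, 1, 2)   # <10 vs <15 minutes
print(f"<5 vs <10:  {len(same_5_10)} of {len(TOP_STATIONS)} categories", same_5_10)
print(f"<10 vs <15: {len(same_10_15)} of {len(TOP_STATIONS)} categories", same_10_15)
```

With the table as transcribed here, 5 of the 23 categories keep the same top station between <5 and <10 minutes and 8 of 23 between <10 and <15 minutes. Because only the single top station is compared, this is a coarse measure of stability.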

seemingly unrelated areas together. Although those seemingly unrelated areas may have similar store profiles, the application of clustering in that way loses its value. The reality is that by clustering on the mix of stores we are going to see the greatest concentration of groups in areas with the most stores (where differences in the numbers matter the most). If we want to see different groupings (such as urban vs suburban, entertainment vs business, dining out vs daily life) or groupings based on differences in other features (e.g. transportation network features [Bramson 19]) then we need to adjust the data used for clustering and perhaps use a more sophisticated clustering method. The specialization analysis taught us that we can indeed classify areas by their level of specialization in each category. Although we focused on the positive values in order to validate our methods, examining the lowest negative locations may mark opportunities to improve areas with especially poor access to that amenity.

We include establishments in 23 different categories, but due to limitations in our data, not all types of stores and amenities are included. Also, for some purposes, our categories may be too broad to identify the aspects of an area that people are genuinely concerned with. For example, our cafe category groups together chain coffee shops, ‘kissaten’, tea shops, and coffee roasters; however, those differences have potentially important influences on the perceived environment. Similarly, for some purposes it would be helpful to separate store brands and/or premium establishments from ordinary ones (e.g. Natural Lawson from ordinary Lawson convenience stores). Along these lines, one may be interested in not merely the number of stores, but also in having a variety of each subcategory or brand within each category. To address these issues, we will expand the coverage area and detail of our store dataset for future analyses.

In the current methodology we include stores that are within 50m of any road edge using three time horizons, and so some stores that should be reachable will not be included in the walkable area. Furthermore, using buffered networks we cannot weight the stores’ contribution by walking time except within those three zones. Following [Araki 20] we could instead integrate the stores into the road network by creating nodes for them and connecting those nodes to the road network. This method would allow us to calculate the walking time to each store and use it to adjust the impact the store has on walkability.

Although focusing the analysis on scoring stations is natural in train-centric Tokyo [Calimente 12], it does have its limitations. As apparent from the maps, there are gaps between stations that are not being sampled or scored. One option is to augment the stations with a collection of additional points that are placed to fill in these areas. Another option is to identify natural (and perhaps overlapping) regions and include the stores in such regions for scoring those regions. The problem becomes one of determining what constitutes natural regions; and the store profile is a strong candidate for associating areas together into the same region. We are now exploring the use of walkability scores and other data (population, number of companies, average building heights, etc.) to profile all segments of a grid, then associate neighboring grid spaces if their profiles are similar enough.

As stated in the introduction, walkability scoring has gained popularity among researchers, city planners, and developers; however, research on walkability is still in its

infancy. Aside from the variety of amenities that could be included or excluded from such a measure, there is also the question of how to capture and integrate those amenities for the analysis (buffered roads, distance weighting, integrated network, etc.). Clearly additional work is necessary to untangle these considerations and establish shared standards for measuring an area’s walkability. What we have done here towards that goal is to reveal how the distance traveled affects the impact of different kinds of establishments on the measured walkability. We also showed how the selection of different sets of amenities can alter scores and change which areas are measured as being more similar to others. Although there is no single, correct walkability score against which we can check our results, by revealing these sensitivities we uncover important factors to consider for any attempts at generating walkability scores and using them to compare areas by their walkability.

Figure 6: Results of the speciality analysis for three categories using the <10 min walkable zone: eaters (top) and bars (bottom). Dark blue indicates areas with a high degree of specialty in that category. [keplergl 20, Mapbox 21, OpenStreetMap contributors 21]

References

[Araki 20] Araki, S. and Bramson, A.: Connecting the Dots: Integrating Point Location Data into Spatial Network Analyses, in International Conference on Complex Networks and Their Applications, pp. 193–205, Springer (2020)

[Bramson 19] Bramson, A., Hori, M., Zha, B., and Inamoto, H.: Scoring and classifying regions via multimodal transportation networks, Applied Network Science, Vol. 4, No. 1, p. 97 (2019)

[Calimente 12] Calimente, J.: Rail integrated communities in Tokyo, Journal of Transport and Land Use, Vol. 5, No. 1, pp. 19–32 (2012)

[Ellis 16] Ellis, G., Hunter, R., Tully, M. A., Donnelly, M., Kelleher, L., and Kee, F.: Connectivity and physical activity: using footpath networks to measure the walkability of built environments, Environment and Planning B: Planning and Design, Vol. 43, No. 1, pp. 130–151 (2016)

[Hagberg 08] Hagberg, A., Swart, P., and Schult, D.: Exploring network structure, dynamics, and function using NetworkX, Technical report, Los Alamos National Lab. (LANL), Los Alamos, NM (United States) (2008)

[keplergl 20] keplergl: kepler.gl (2020)

[Lüthy 17] Lüthy, M.: japan-train-data (2017)

[Mapbox 21] Mapbox (2021)

[NTT Townpage Inc. 19] NTT Townpage Inc.: Townpage Database, Proprietary Dataset (2019)

[OpenStreetMap contributors 21] OpenStreetMap contributors: Planet dump retrieved from https://planet.osm.org (2021)

[Pedregosa 11] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E.: Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, Vol. 12, pp. 2825–2830 (2011)
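As a supplementary illustration of the network-integration idea raised in the discussion (following [Araki 20]), the sketch below adds each store to a toy road network as its own node, computes walking times from a station, and weights each store by that time. It is a minimal sketch and not the method used in this paper or in [Araki 20]. The following are all assumptions made for illustration: attaching each store to its nearest road node (rather than to a point along an edge), the walking speed of 80 m per minute, the linear decay weight with a 15-minute horizon, the toy coordinates, and the function names `attach_stores`, `walking_minutes`, and `weighted_counts`.

```python
import heapq
import math

WALK_SPEED_M_PER_MIN = 80.0  # assumed walking speed; not taken from the paper


def attach_stores(road_nodes, road_edges, stores):
    """Build an adjacency dict where each store is a node linked to its nearest road node."""
    graph = {node: {} for node in road_nodes}
    for u, v in road_edges:
        length = math.dist(road_nodes[u], road_nodes[v])
        graph[u][v] = length
        graph[v][u] = length
    for store_id, (xy, _category) in stores.items():
        # Simplifying assumption: connect to the nearest road node, not the nearest edge point.
        nearest = min(road_nodes, key=lambda n: math.dist(road_nodes[n], xy))
        link = math.dist(road_nodes[nearest], xy)
        graph[store_id] = {nearest: link}
        graph[nearest][store_id] = link
    return graph


def walking_minutes(graph, source):
    """Dijkstra shortest paths from source, converted from metres to minutes."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, length in graph[node].items():
            candidate = d + length
            if candidate < dist.get(neighbour, math.inf):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return {node: metres / WALK_SPEED_M_PER_MIN for node, metres in dist.items()}


def weighted_counts(stores, minutes, horizon=15.0):
    """Sum a linear time-decay weight per category; stores beyond the horizon add zero."""
    totals = {}
    for store_id, (_xy, category) in stores.items():
        t = minutes.get(store_id, math.inf)
        weight = max(0.0, 1.0 - t / horizon)
        totals[category] = totals.get(category, 0.0) + weight
    return totals


# Toy data in metres (hypothetical, for illustration only).
road_nodes = {"station": (0.0, 0.0), "a": (400.0, 0.0), "b": (400.0, 300.0)}
road_edges = [("station", "a"), ("a", "b")]
stores = {
    "cafe_1": ((400.0, 40.0), "cafe"),
    "cafe_2": ((0.0, 80.0), "cafe"),
    "bar_1": ((480.0, 300.0), "bar"),
}

graph = attach_stores(road_nodes, road_edges, stores)
minutes = walking_minutes(graph, "station")
totals = weighted_counts(stores, minutes)
```

Unlike the buffered approach with three fixed zones, each store here contributes according to its own walking time, so a cafe one minute away counts for more than one five minutes away. Any other decay function could be substituted for the linear weight.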