City of the People, for the People: Sensing Urban Dynamics Via Social Media Interactions (Extended Version of Socinfo’18 Paper)

City of the People, for the People: Sensing Urban Dynamics via Social Media Interactions (Extended version of SocInfo’18 paper) Sofiane Abbar1, Tahar Zanouda1, Noora Al-Emadi1, and Rachida Zegour2 1 Qatar Computing Research Institute, HBKU, Doha, Qatar 2 Abderrahmane Mira - Béjaïa University, Algeria {sabbar, tzanouda, nalemadi}@qf.org.qa, [email protected] Abstract. Understanding the spatio-temporal dynamics of cities is in the heart of many applications including urban planning, zoning, and real-estate construction. So far, much of our understanding about urban dynamics came from traditional surveys conducted by persons or by leveraging mobile data in the form of Call Detailed Records. However, the high financial and human cost associated with these methods make the data availability very limited. In this paper, we investigate the use of large scale and publicly available user contributed content, in the form of social media posts to understand the urban dynamics of cities. We build activity time series for different cities, and different neighborhoods within the same city to identify the different dynamic patterns taking place. Next, we conduct a cluster analysis on the time series to understand the spatial distribution of patterns in the city. Our instantiation for the two cities of London and Doha shows the effectiveness of our method. Keywords: Urban Computing, Timeseries Analysis, Urban Dynamics, Clustering, So- cial Data, Spatial Data. 1 Introduction After the revolutionary growth of the modern industry in the late 18th century, cities have become a niche for new opportunities that attracted huge numbers of local and international migrants. In 2015, we were 53.89% to live in urban areas and this number is projected to reach 70% by the end of 2050 [31]. This fast urbanization has exposed different services and infrastructures in cities to an increasing stress, leading to a plethora arXiv:1807.05734v1 [cs.SI] 16 Jul 2018 of urban challenges related to human mobility (e.g., traffic congestion), public health (e.g, air pollution), neighborhood deprivation (e.g., lack of parks), and economical difficulties (e.g., unemployment). The situation is such that the United Nation has declared managing urban areas as one of the most important development challenges of the 21st century. Thus, understanding different patterns of urban dynamics taking place in our cities is of a paramount importance for many applications. Until now, much of our understanding about urban dynamics came from traditional surveys conducted by human agents either physically or by phones [19,28]. While this way of collecting data provides detailed information about urban behaviors, it remains hard to update and present many weaknesses regarding generalization and scalability. More recently, researchers have looked into the use of mobile data such as Call Detailed Records (CDRs) to model urban dynamics [24,10,11]. These attempts have been quite successful in building accurate models for different dynamics such as mobility, traffic congestion, and land use at city-wide scales. However, these two methods suffer major drawbacks related to the high financial (CDRs) and labor (human operated surveys) costs required to acquire the data needed, which makes its availability very limited. For instance, CDRs are not only expensive, but are also highly proprietary (telecommunication companies) posing serious data sharing challenges. Alternatively, social media is notorious for providing its users with a means of documenting the minutiae of their daily lives, including places they go and activities they engage with [9,27]. In this paper, we investigate the use of large scale and widely available public user generated data, in the form of social media posts, to understand urban dynamics taking place in cities and neighborhoods. We are particularly interested in data that is geo-located, i.e. associated with an accurate pair of latitude and longitude coordinates, that is generated by users on different social media platforms such as Twitter, Foursquare, and Instagram. We argue that analyzing geo-located posts can provide an interesting angle to look at the city as a holistic dynamic system. Our analysis is driven by the following two research questions. RQ 1. Can we use social media – and Twitter in particular – to get insights into urban dynamics in different cities? Inspired by th work of Reades et al. [24] on exploring the use of mobile data, we hypothesize that spaces can be characterized by the volume of social media use over time. By analyzing the use of social signatures, we can show how social media interactions are related to urban activities. After all, posting a check-in in a restaurant, at a stadium, or from the college amphitheater to inform friends and the world about our real-time location and activity has become so common in the era of social media . This unprecedented willingness to share with others provides a novel view about urban spaces as seen through social media activity. It also gives good insights about the spatio-temporal dynamics of urban life, leading to a better understanding of the city’s complex hourly, daily, and weekly rhythms. In this study, we collected geo-located tweets over a period of two months from two different cities: London (UK) as a mature and developed city, and Doha (Qatar) as a fast-growing multi-cultural city. The data is transformed into time-series that capture the volume of social media activity over time, which are then normalized to form typical weekly signatures. These signatures are analyzed to discover common weekly patterns in which more activity is observed during working days compared to weekends, and daily cycles that reveal activities such as waking-up in the morning, going to work, then heading back home. A decomposition of the typical signature time-series allows to disentangle general weekly trends from residuals that capture special events such as holidays. RQ 2. Is social media data sensitive enough to pick out more fine grained varia- tions at the level of city neighborhoods? To answer this question, we perform a cluster analysis on typical weekly signatures in the format of time-series created for different neighborhoods. To this end, a city is partitioned into different zones corresponding to a particular administrative division such counties (in the US), boroughs (in UK), and districts (in Doha). The objective is to discover the different patterns and urban rhythms that characterize different areas of the city presenting different cultural, social and economic properties. The cluster analysis allows also to spot neighborhoods that have similar patterns. As a clustering algorithm, we use k − means method that aims at maximizing the similarity among items belonging to the same cluster, and maximizing the dissimilarity among items belonging to different clusters. As the clustering is operated upon time-series, we use the Dynamic Time Warping (DTW) distance [26] that has been shown to outperform all other distances when comparing time-series [32]. Interestingly enough, when the cluster analysis is performed on the zones of both cities together, it identified neighborhoods that share similar spatio-temporal patterns across the two cities. Positioning. Using social media data to characterize urban dynamics in neighborhoods is not new [20,9,27,3,14]. Therefore, we analyze in the following major works done in this area and try to contrast them to our work. In a study by [20], authors looked at modeling human activity and geographical areas via spectral clustering of Foursquare data. The main idea is to split a region into equally sized rectangles, and then characterize each rectangle with a vector containing the total number Foursquare places belonging to different categories. While this approach is interesting, it completely dismisses the temporal aspect of human activity and focuses solely on the geographical distribution and popularity of places (e.g. shops, restaurants, schools, etc.) Our framework’s main contribution is to allow a spatio-temporal analysis of human dynamics in cities. The Livehoods Project is another influential work in this area [9]. It aims at using Foursquare data for explaining dynamics in cities, and use for that spectral clustering on types of Foursquare venues present in different areas of a city. However, similar the work by [20], this project ignores the temporal aspect in the data, which we believe is very important in modeling human dynamics. Indeed, two areas with similar facilities may show different temporal behaviors (e.g. people may stay late at night in one, but not in another.) More recently, [14] demonstrated that using tweet counts could help identifying the land use profile of a neighborhood in a city. Authors used temporal features such as average number of tweets observed within different hours of the day to train a classifiers to label neighborhoods. The main weakness of this approach resides in the many heuristics introduced in defining land use profiles and selecting the ones to test against. Indeed, while land use has already a well-known classification into commercial, business, residential, industrial, etc. Authors provide their own definition of land use profiles on the only basis of resident and business count. A typical profile in their case would be a one that represents a neighborhood (a square cell) with low number of residents and high number of businesses. Our work is different from this one in two aspects. First we introduce a clustering approach using dynamic time warping on weekly time-series. Second, we study well-defined neighborhoods (represented as polygons) that correspond to actual administrative zones. In order to demonstrate the effectiveness of using social media based time-series to capture the different urban rhythms within cities, we built and released a tool that we call CityPulse3. The main objective of CityPulse is to allow a visual inspection of the different spatio-temporal patterns captured in different cities.

Load more