International Journal of Geo-Information

Article Towards Detecting Social Events by Mining Geographical Patterns with VGI Data

Zhewei Liu , Xiaolin Zhou, Wenzhong Shi * and Anshu Zhang

Department of Land Surveying and Geo-informatics, The Polytechnic University, Hung Hom 999077, Hong Kong, ; [email protected] (Z.L.); [email protected] (X.Z.); [email protected] (A.Z.) * Correspondence: [email protected]; Tel.: +852-9348-6920

 Received: 7 November 2018; Accepted: 13 December 2018; Published: 17 December 2018 

Abstract: Detecting events using social media data is important for timely emergency response and urban monitoring. Current studies primarily use semantic-based methods, in which “bursts” of certain semantic signals are detected to identify emerging events. Nevertheless, our consideration is that a social event will not only affect semantic signals but also cause irregular human mobility patterns. By introducing depictive features, such irregular patterns can be used for event detection. Consequently, in this paper, we develop a novel, comprehensive workflow for event detection by mining the geographical patterns of VGI. This workflow first uses data geographical topic modeling to detect the hashtag communities with VGI semantic data. Both global and local indicators are then constructed by introducing spatial autocorrelation measurements. We then adopt an outlier test and generate indicator maps to spatiotemporally identify the potential social events. This workflow was implemented using a real-world dataset (104,000 geo-tagged photos) and the evaluation was conducted both qualitatively and quantitatively. A set of experiments showed that the discovered semantic communities were internally consistent and externally differentiable, and the plausibility of the detected events was demonstrated by referring to the available ground truth. This study examined the feasibility of detecting events by investigating the geographical patterns of social media data and can be applied to urban knowledge retrieval.

Keywords: event detection; volunteered geographic information; geographical pattern mining; feature transformation

1. Introduction Volunteered geographic information (VGI) [1] is a recent development that has considerably influenced the way humans interact and information is retrieved. Semantic as well as location information can be attached to VGI, providing new opportunities for research on human mobility and urban systems. Recent observations have also found that some emerging news and events spread rapidly throughout social media, making social media a new tool for event detection and disaster response [2]. Currently, most research on event detection with social media uses semantic-based methods and text mining [3,4]. Platforms, such as Twitter, provide accessible streaming data, which contains real-time text information, making it possible to detect emerging social events by examining the change in semantic-based features [5–7]. However, a new event detection methodology is proposed by investigating the geographical patterns of VGI data. Our intuition is that a social event will affect how the objects spatially distribute across certain regions and how they mutually interact thus causing irregular geographical patterns, especially irregular human mobility and interaction patterns (e.g., sports games causing intense human aggregation or terrorism attacks causing sudden evacuation from certain regions). By introducing depictive measuring features with VGI data and identifying the

ISPRS Int. J. Geo-Inf. 2018, 7, 481; doi:10.3390/ijgi7120481 www.mdpi.com/journal/ijgi ISPRS Int. J. Geo-Inf. 2018, 7, 481 2 of 19 feature irregularity, such geographical patterns, in turn, can be used to distinguish potential social events. To the best of our knowledge, previous works mainly detected social events by word frequency and semantic analysis, and there is still a need for a comprehensive methodology to effectively detect events by investigating irregular geographic patterns. Consequently, a major motivation for this research is to provide new insights into the following issues: what kinds of features can be used for detecting events, or from what aspects can a social event be differentiated with VGI data. Specifically, the novelty and point of our research was to detect social events by mining the geographical patterns of VGI and using geography-based features. To do so, we proposed a comprehensive workflow that consists of three parts: (1) semantic community discovery, (2) geography-based event representation, and (3) irregular event feature detection. First, a data-driven geographic topic modeling method was used to detect hashtag communities and identify social media topics. Second, by introducing quantitative spatial autocorrelation indicators, a novel global index was constructed for representing the potential events in the created feature space. A time series of the univariate event feature was then generated with variable temporal granularities. Third, an outlier test was adopted to detect event feature irregularity, and the global index was localized for finding event location. A detailed event description can then be obtained using the detected semantic and spatiotemporal identification. We conducted the experiment with a real-world dataset (104,000 geo-tagged photos) and verified the effectiveness of the proposed workflow both qualitatively and quantitatively. The main contributions of this paper are as follows:

• We presented a novel workflow to take advantage of new geographical features for event representation and used the change (“outliers”) in the geographical patterns revealed by VGI to differentiate potential events. • We developed an indicator system that includes both global and local indicators for pinpointing social events, as well as a comprehensive data model for event detection, with the consideration of semantic and geographical identification.

The remainder of the paper is organized as follows: Section2 summarizes the related work, Section3 presents the proposed workflow, Section4 reports the experimental results, Section5 discusses the performance of our topic modeling and event detection methods, and finally, Section6 concludes the paper.

2. Related Work Our workflow detects social events by investigating the semantic topics and geographic patterns of VGI data. In this part, previous related works on geographical topic discovery and event detection with social media are presented. Geographical topic discovery [8] finds topics in spatial big data (e.g., GPS-associated documents, geo-tagged social media data) in a geographical context. Among the various geographical topic modeling methods, statistical models such as PLSA [9] and Latent Dirichlet Allocation [10] are commonly used for geographic topic modeling [11–14]. However, these methods have certain limitations in relation to social media data because short noisy texts are predominant and a priori knowledge, such as a predefined count of topics, is unavailable [15]. In other research studies, other forms of information have been used, such as images [16] and heterogeneous unstructured articles [17], the correlation of user movement and interest [18], and geographical diversity and interest distribution of users [19], to facilitate the process of topic discovery. In our work, to detect thematic meanings, we used a data-driven geographic topic modeling method based on the assumption that the semantic information attached to a social media post, such as hashtags, is quite self-explanatory and thus, the potential topics can be detected. The method is data-driven and does not require training data or a priori knowledge such as topic counts, thus reducing potential perceptual biases. ISPRS Int. J. Geo-Inf. 2018, 7, 481 3 of 19

Another research area related to our work is event (anomaly) detection with social media data. A challenging problem is to extract useful, structured representations of events from the disorganized corpus of noisy posts [20]. Sudden increases in the frequency (“bursts”) of sets of keywords have been popularly used for detecting new events in a data stream [21]. Some other approaches include wavelet-based clustering of frequency signals, topic clustering with meta-data analysis and domain-specific approaches [22,23]. In Schinas et al. [24], the authors grouped the event detection methods into three categories: (1) feature-pivot, i.e., detection of abnormal patterns in the appearance of features [5,23,25,26]; (2) document-pivot, i.e., clustering documents with similarity measures [27–30]; and (3) topic modeling, i.e., using statistical models to identify events [31–34]. Becker et al. [35] highlighted four feature categories (temporal, social, topical, and Twitter-centric features) that can be used for modeling event features. These methods are mainly semantic-based, that is, they use text mining techniques to investigate the change in social media semantic information for event detection. Other studies have used spatial-related methods. These have addressed issues such as measuring regional irregularities with post count and user count [36], using content-based methods to detect event location [37,38], estimating the influenced area of an event with kernel density estimation [39] investigating the geographic extent to find localized events [40], assessing the impact area of a natural disaster [41], and detecting events (outliers) by finding the weighted centroids outside the spatial standard deviation ellipse [42]. Compared with semantic-based methods, these spatial-related methods basically focused on identifying the spatial information of the detected event or using the change in the frequency of the posts/users in certain spatial regions to detect events. In our work, we postulate that social events can not only lead to “bursts” in certain semantic signals and the number of posts/users but they also cause structural change in geographical patterns, such as how objects spatially distribute and interact with each other. So, rather than investigating the numeric change in posts/users, we tried to construct features to indicate the structural change in objects’ geographical distribution patterns. Specifically, we picked up the clustering patterns (e.g., spatial autocorrelation) and designed the event features accordingly. To the best of our knowledge, detecting social events by mining irregularities in geographical patterns has not been adequately investigated by previous studies, and this work can shed some light on relevant fields.

3. Proposed Methods In this section, the data model and analytic workflow of our methods are presented, followed by a detailed description of each data handling procedure. First, we introduce a geographic topic modeling method to investigate the semantic meaning of social media posts. Next, we explain a novel event feature representation method based on the geographical patterns of the social media post distribution. Finally, we describe the method for irregularity detection and location indication based on the data model.

3.1. Data Model and Analytic Workflow In our model, to detect social events, the following issues were considered:

• What are the social events about? In other words, semantically, how can we identify the potential topics of the social events using social media data? • How should the social event be depicted from a geographical perspective? Where is the event location? Is there any irregular geographical pattern (e.g., crowd aggregation, evacuation) caused by the event, and how can such patterns be represented with the geo-tagged social media posts?

Based on the above considerations, our event detection model was defined as follows: E = W (S, G) (1) where E is the event detection result, S is the semantic identification to indicate the event topics, G is the geographical patterns of human mobility revealed by VGI, and W is the analytic workflow. ISPRS Int. J. Geo-Inf. 2018, 7, 481 4 of 19

TheISPRS underlying Int. J. Geo-Inf. perception 2018, 7, x FOR is PEER that REVIEW a social event may affect regular human mobility and interaction 4 of 19 patterns. By introducing features derived from quantitative measurements of such patterns, a new feature feature space can be created, and the social event can be represented and detected with the newly space can be created, and the social event can be represented and detected with the newly created features. created features. As represented in Equation (1), the semantic and geographical patterns are both considered in the As represented in Equation (1), the semantic and geographical patterns are both considered in model. Thus, to address the semantic and geographical issues, we constructed a three-fold analytic the model. Thus, to address the semantic and geographical issues, we constructed a three-fold workflow (Figure1), which consists of three interrelated modules: (1) semantic community discovery; analytic workflow (Figure 1), which consists of three interrelated modules: (1) semantic community (2) geography-based event representation; and (3) detection of event feature irregularity. discovery; (2) geography-based event representation; and (3) detection of event feature irregularity.

FigureFigure 1.1. ProposedProposed analytic workflow. workflow.

SemanticSemantic CommunityCommunity Discovery:Discovery: In this step, the the semantic semantic content content attached attached to to the the VGI VGI is is investigatedinvestigated first first to to determine determine the the user user activityactivity associatedassociated with the the geo-tagged geo-tagged posts posts (e.g., (e.g., geo-tagged geo-tagged photos).photos). This This step step starts starts by by retrievingretrieving geo-taggedgeo-tagged media data data from from a a relevant relevant API API (check-in (check-in dataset). dataset). Furthermore,Furthermore, the the hashtags hashtags in in the the titlestitles attachedattached toto the geo-tagged photos photos are are extracted, extracted, and and based based on on thisthis an an undirected undirected “hashtag “hashtag network: network: model model is built,is built, where where each each hashtag hashtag is denoted is denoted as a as network a network node, andnode, the and co-occurrence the co-occurrence frequency frequency between between hashtags ha isshtags assigned is assigned to the weight to the of weight the edge of connectingthe edge theconnecting corresponding the corresponding nodes in the nodes ’hashtag in the network’ ’hashtag (Tag network’ Network (Tag Construction Network Construction in Figure1). in A Figure greedy optimization1). A greedy methodoptimization is then method implemented is then implemen to discoverted to the discover communities the communities from the from hashtag the hashtag network, andnetwork, a common and topica common is assigned topic tois theassigned hashtags to the within hashtags the same within community the same (Semantic community Exploration (Semantic in FigureExploration1). The topicsin Figure of geo-tagged1). The topi photoscs of geo-tagged are further photos identified are further by introducing identified the by topics introducing of attached the topics of attached hashtags as an indicator, as shown in Figure 2 (Interest Identifying in Figure 1). hashtags as an indicator, as shown in Figure2 (Interest Identifying in Figure1). Geography-Based Event Representation: In this step, a new event feature representation method is developed. By introducing quantitative spatial autocorrelation indicators (Geographical Pattern Mining), we constructed a global index for representing the potential event in the created feature space (Feature Transformation and Event Feature Representation). A time series of the univariate event feature with different temporal granularity can then be generated by adjusting the temporal range parameter. Detection of Event Feature Irregularity: After mapping the geographical patterns into a new feature space, we adopt an outlier test to detect the feature irregularities. The global index is then localized for identifying the event location. By combining the semantic and location identification, a detailed description of the detected event can be obtained.

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 4 of 19

feature space can be created, and the social event can be represented and detected with the newly created features. As represented in Equation (1), the semantic and geographical patterns are both considered in the model. Thus, to address the semantic and geographical issues, we constructed a three-fold analytic workflow (Figure 1), which consists of three interrelated modules: (1) semantic community discovery; (2) geography-based event representation; and (3) detection of event feature irregularity.

Figure 1. Proposed analytic workflow.

Semantic Community Discovery: In this step, the semantic content attached to the VGI is investigated first to determine the user activity associated with the geo-tagged posts (e.g., geo-tagged photos). This step starts by retrieving geo-tagged media data from a relevant API (check-in dataset). Furthermore, the hashtags in the titles attached to the geo-tagged photos are extracted, and based on this an undirected “hashtag network: model is built, where each hashtag is denoted as a network node, and the co-occurrence frequency between hashtags is assigned to the weight of the edge connecting the corresponding nodes in the ’hashtag network’ (Tag Network Construction in Figure ISPRS1). A greedy Int. J. Geo-Inf. optimization2018, 7, 481 method is then implemented to discover the communities from the hashtag5 of 19 network, and a common topic is assigned to the hashtags within the same community (Semantic Exploration in Figure 1). The topics of geo-tagged photos are further identified by introducing the The details of each data model and module are presented below. topics of attached hashtags as an indicator, as shown in Figure 2 (Interest Identifying in Figure 1).

Figure 2. Photo sample from Instagram. The photo has a title with multiple hashtags (e.g., #occupycentral, #umbrellarevulotion, #best_streetview) to annotate, indicating potential topics for the photo.

3.2. Semantic Community Discovery The semantic content attached to the geotagged social media posts is first investigated. Statistical models such as Latent Dirichlet Allocation (LDA) [10] show certain limitations in handling social media data given that short noisy texts are predominant and a priori knowledge is unavailable [15]. Consequently, we developed an unsupervised data-driven method, which uses the co-occurrence pattern of the self-explanatory information for detecting semantic communities to investigate the semantic divisions of social media topics. The assumption for this method is that the semantic information, such as hashtags, is highly relevant to the topics of a social media post, and hashtags with similar semantic meaning tend to have a high probability of co-occurrence (as shown in Figure2). Based on the above assumptions, a hashtag network model is built to investigate the geographic pattern of the social media topic in the rich-hashtag environment. The details of the method are defined below: Let P be a collection P = [p1, ... , pn] of social media posts pi= (li, ti, tgsi, tpsi), where li is the location of post pi, ti is the time-stamp, tgsi is the union of the hashtags attached to pi, and tpsi are the Sn potential topics remaining to be identified. Let HT = i=1 tgsi = [tg1, ... , tgm] be the hashtag union of tgsi, i.e., if and only if item  tgsi (i = 1, . . . ,n), item  HT.  The co-occurrence function CRC pi, tgj, tgk is defined as follows: ( 1, i f tg e p .tgs AND tg e p .tgs CRCp , tg , tg  = j i i k i i (2) i j k 0, otherwise where pi is a post item from collection P, tgj and tgk are hashtag items from HT. A hashtag network is further defined as HTN = (V, E), where V is a set of vertices, each of which denotes a hashtag tgi in HT, and E is a set of undirected edges connecting the vertices. The weight of the edge is calculated as: n  Ajk = ∑ CRC pi, tgj, tgk (3) i=1 where Ajk is the weight assigned to the edge connecting vertices j(tgj) and k(tgk). The weight, calculated by summing the co-occurrence frequency of the corresponding pair of hashtags, represents the connectivity between nodes. ISPRS Int. J. Geo-Inf. 2018, 7, 481 6 of 19

Based on the assumption that hashtags with similar semantic meaning have a high probability of co-occurrence and vice versa, we grouped the hashtag network into several communities based on their connectivity so that the vertices within the same community have a dense connection and share the same topic. In other words, a community in the network is interpreted as a group of hashtags sharing oneISPRS common Int. J. Geo-Inf. topic, 2018 and, 7, x basedFOR PEER on theREVIEW connectivity between nodes, different topics can be detected. 6 of 19 The concept of modularity [43] is imported as a measurement of the strength of network division and connectivity.The concept Modularityof modularityQ is[43] often is imported used in optimizationas a measurement methods of the for strength detecting of communitiesnetwork division in a networkand connectivity. and defined Modularity as follows [is44 often]: used in optimization methods for detecting communities in a network and defined as follows [44]:   1 kikj  Q = 1 ∑ Aij − δ ci, cj (4) =2m − 2m (,) 2 i,j 2 (4) , where Aij is the weight of the edge connecting vertices i and j, ki = ∑ Aij is the sum of weight of edges where is the weight of the edge connecting vertices and , j = ∑ is the sum of weight of edges linking to the vertex= 1 , = ∑ , is the community that vertex belongs,δ the δ-function linking to the vertex i, m 2 ∑ Aij, ci is, the community that vertex i belongs, the -function δ(u, v) is 1 i,j (, ) is 1 if =, and 0 otherwise. if u = v, and 0 otherwise. To detect communities, we employed the Louvain algorithm [44]. As a greedy method inspired To detect communities, we employed the Louvain algorithm [44]. As a greedy method inspired by by the optimization of modularity, the Louvain algorithm is data-driven and does not require an a the optimization of modularity, the Louvain algorithm is data-driven and does not require an a priori priori selection of the community count, which can detect network communities with fewer potential selection of the community count, which can detect network communities with fewer potential perceptual biases. The algorithm is applied to the hashtag network and several hashtag perceptual biases. The algorithm is applied to the hashtag network HTN and several hashtag communities can be detected. Based on the semantic meaning of hashtags, a topic can be assigned to communities can be detected. Based on the semantic meaning of hashtags, a topic can be assigned to each community (as shown in Figure 3). The topics of social media post . can be identified by each community (as shown in Figure3). The topics of social media post pi. tpsi can be identified by referring to the topics of .. referring to the topics of pi.tgsi.

Figure 3. Semantic community discovery based on hashtag co-occurrence pattern and modularity, Figure 3. Semantic community discovery based on hashtag co-occurrence pattern and modularity, with node and edge size proportional to occurrence frequency, and the node color indicating with node and edge size proportional to occurrence frequency, and the node color indicating community category. community category. 3.3. Geography-Based Event Representation 3.3. Geography-Based Event Representation After identifying the semantic topics of the geo-tagged posts, we constructed geography-based featuresAfter for identifying event representation. the semantic topics We consider of the ge thato-tagged a social posts, event we may constructed lead to irregulargeography-based human mobilityfeatures (e.g.,for event crowd representation. aggregation), andWe suchconsider irregularity that a cansocial be event revealed may by lead the geographicalto irregular patternhuman ofmobility VGI. Consequently, (e.g., crowd aggregation), a geography-based and such event irregularity representation can be revealed method by was the proposed. geographical The pattern spatial autocorrelationof VGI. Consequently, is introduced a geography-based as a geographical event descriptionrepresentation of human method mobility was proposed. and Moran’s The spatial I [45] asautocorrelation the quantitative is introduced measurement. as a Givengeographical a set of description features and of anhuman associated mobility attribute, and Moran’s Moran’s I [45] I and as z-scoresthe quantitative can be calculated measurement. to evaluate Given whether a set of andfeatures to what and degreean associated the pattern attribute, expressed Moran’s is clustered. I and z- scores can be calculated to evaluate whether and to what degree the pattern expressed is clustered. To analyze the spatial autocorrelation, a spatial fishnet was created first and the geo-tagged posts were then mapped into the corresponding fishnet cells, according to the post locations. Let be the

collection = [,…,] of continuous time segments , and be the collection = [,…,] of the fishnet cell = (, ()), where is the spatial region of , and () is the number of the posts , where . ϵ . AND . ϵ AND . is of specific topics. The Global Moran’s I based on the post number can be further calculated as below [46]: n ∑∑ = (5) ∑

ISPRS Int. J. Geo-Inf. 2018, 7, 481 7 of 19

To analyze the spatial autocorrelation, a spatial fishnet was created first and the geo-tagged posts were then mapped into the corresponding fishnet cells, according to the post locations. Let T be the collection T = [t1,..., tm] of continuous time segments ti, and C be the collection C = [c1,..., cn] of the fishnet cell ci = (ri, numi(tk)), where ri is the spatial region of ci, and numi(tk) is the number of the posts pj, where pj.lj e ci.ri AND pj.tj e tk AND pj.tpsj is of specific topics. The Global Moran’s I based on the post number can be further calculated as below [46]:

n n n ∑i= ∑j= wijzizj = 1 1 Itk n 2 (5) S0 ∑i=1 zi

n ∑j=1 cj. numj(tk) where zi is the deviation of ci. numi(tk) from its average value (ci. numi(tk) − n ), wij is the spatial weight between ci.ri and cj.rj, and S0 is the summation of all the spatial weights. The value of Moran’s I ranges from [–1,1]. The more similar values cluster, the closer I is to +1; the more dissimilar values cluster (similar values disperse), the closer I is to −1; Moran’s I being +1 indicates perfect clustering of similar values while Moran’s I being −1 indicates perfect dispersion. To construct the event feature, we develop a global event index (GEI) as below:  GEItk = −c·ln 1 − Itk (6) where c is a scale coefficient for normalization. By using Equation (6), the GEI value will acceleratedly increase as Itk approaches +1 (−1), indicating the transition from random to geographical clustering (dispersion). By sequentially sampling GEI in the T = [t1,..., tm], a discrete time series will be generated:

TS = [GEIt1 , GEIt2 ,..., GEItm ] (7) where each item is a transformed feature for event representation in the created one-dimensional feature space. Different temporal granularity for analysis can be obtained, by adjusting the span of time segment ti.

3.4. Detection of Event Feature Irregularity Here, we aimed to detect irregularity of event features and pinpoint the spatiotemporal identification of the social event. Thus, for irregularity detection, we adopted the generalized extreme studentized deviate (ESD) test [47], which is a statistical test to detect one or more outliers in a univariate data set. The time series TS (see Equation (7)) will be the input data set for the test and the outliers of the TS will be found accordingly. The ESD test is explained in detail in Algorithm 1. |xk−x| RFunc(X) finds the index k that maximizes the value of Rk = s , where xk, x, s denote the kth item, mean value and sample standard deviation of dataset X, respectively. The Lambda(i) are calculated as follows: (n − i)tp,n−i−1 Lambda(i) = (8) q 2 (n − i − 1 + tp,n−i−1)(n − i + 1) where n is the item count of dataset X, tp,v is the 100p percentage point from the t distribution with v = − α degrees of freedom and p 1 2(n−i+1) , with α being the significance level for the ESD test. Apart from temporally detecting the happening of an event with the above ESD test, another important issue is where the event happened. To further identify the event location, the GEI is localized to construct a local event index (LEI), as below:  LEItk (ci) = −ln 1 − c Itk (ci) (9)

where LEItk (ci) is the calculated LEI for the fishnet cell ci in the time segment tk, Itk (ci) is ci’s Local Moran’s I [46] calculated from ci. numi(tk) is the attribute value, and c is a scale coefficient ISPRS Int. J. Geo-Inf. 2018, 7, 481 8 of 19 for normalization. If the fishnet cell has a high LEI this indicates particular geographical clustering or dispersion patterns of human mobility, which may be caused by special social events in that region. By adopting the workflow described above, a potential social event can be detected by the identification of the semantic topics and the depiction of underlying spatiotemporal patterns. ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 8 of 19 Algorithm 1. Generalized Extreme Studentized Deviate Test Algorithm 1. Generalized Extreme Studentized Deviate Test Input: (1) Observation dataset X (2) Upper Bound Count r Input: (1) Observation dataset 𝑋 (2) Upper Bound Count 𝑟 Output: A list of detected outliers L Output: A list of detected outliers 𝐿 Begin for 𝑖 = 1 to 𝑟 do

(𝑅, 𝑘) ← 𝑅𝐹𝑢𝑛𝑐(𝑋) 𝜆 ← 𝐿𝑎𝑚𝑏𝑑𝑎(𝑖) if 𝑅 > 𝜆 do 𝐿. add(𝑋. item(𝑘)) // The 𝑘 item in 𝑋 is added to 𝐿 end X ← X.item_remove(𝑘) // Remove the 𝑘 item from 𝑋 end Return 𝐿 // Return the list of detected outliers End

4. Experiments In this this section, section, we we implemented implemented our our proposed proposed wo workflowrkflow and and report report our ourexperiments experiments with witha real- a real-worldworld dataset. dataset.

4.1. Experimental Settings The check-in dataset used for the experiment was retrieved from Instagram API, which covered 127,630127,630 geo-taggedgeo-tagged photos photos in in Hong Hong Kong Kong from from 6 December 6 December 2014 2014 to 4 Januaryto 4 January 2015, generated2015, generated by 26,420 by users.26,420 Ausers. social A mediasocial media post included post included the post the ID, post user ID, ID, user the ID, attached the attached hashtags, hashtags, the time-stamp the time-stamp of the onlineof the online post and post the and location the location (longitude (longitude and latitude) and latitude) indicating indicating post place post (Table place1). (Table All the 1). user All IDs the wereuser IDs anonymized were anonymized for privacy for protection. privacy protection.

TableTable 1. 1. DescriptionDescription of socialsocial media media post post field. field.

FieldsFields Description Description pidpid A string A string uniquely uniquely indicating indicating a apost post uiduid An encrypted An encrypted string string uniquely uniquely indicating indicating a a user hashtagshashtags The attached The attached hashtags hashtags indicating indicating the the potential potential post post topics stime The time that the user publishes the online post stime The time that the user publishes the online post location A pair of longitude and latitude indicating the post location location A pair of longitude and latitude indicating the post location

Some of the user user accounts accounts were were for for commercial commercial advertising advertising and and might might post post redundant redundant photos photos in inspecific specific venues. venues. Consequently, Consequently, an anomaly an anomaly detectio detectionn method method was wasused used to remove to remove these these outliers. outliers. The Thenumber number of photos of photos posted posted by each by each user user was was first first investigated. investigated. The Thecalculations calculations showed showed that thatthe theaverage average count count of posted of posted photos photos per user per 𝜇 userwas 4.8µ wasand the 4.8 standard and the standarddeviation deviation𝜎 was 8.8. σThewas three- 8.8. Thesigma three-sigma rule [48] was rule introduced, [48] was introduced,which states which that, for states both that, normally for both distributed normally and distributed non-normally and non-normallydistributed variables, distributed most variables, cases should most fall cases within should three-sigma fall within inte three-sigmarvals. Therefore, intervals. we used Therefore, three- wesigma used intervals three-sigma to set intervals the threshold to set thefor thresholdfiltering. The for filtering.users whose The usersphoto whose numbers photo were numbers outside were the outsidethree-sigma the three-sigma intervals (i.e., intervals 𝜇+3𝜎 (i.e.,, 32)µ +were3σ, 32)recognized were recognized as outliers as outliersand their and posted their posted photos photos were wereremoved removed from fromthe photo the photo dataset. dataset. After After data data clea cleaning,ning, 104,366 104,366 photos photos generated generated by by 25,996 25,996 users

remained. For the parameter setting, the span of time segment 𝑡 was set as 24 h and the size of fishnet cell 𝑐 was set as 100 × 100 m.

4.2. Experimental Results The experimental results were reported and the performance of our workflow was evaluated accordingly. Evaluating event detection techniques is a very challenging and complex problem, and there is still no commonly accepted standard [3]. Theoretically, the results of event detection can be evaluated by quantitative and qualitative analysis. In practice, quantitative evaluation requires a

ISPRS Int. J. Geo-Inf. 2018, 7, 481 9 of 19

remained. For the parameter setting, the span of time segment ti was set as 24 h and the size of fishnet cell ci was set as 100 × 100 m.

4.2. Experimental Results The experimental results were reported and the performance of our workflow was evaluated accordingly. Evaluating event detection techniques is a very challenging and complex problem, andISPRS there Int. J. isGeo-Inf. still no2018 commonly, 7, x FOR PEER accepted REVIEW standard [3]. Theoretically, the results of event detection 9 of can 19 be evaluated by quantitative and qualitative analysis. In practice, quantitative evaluation requires athorough thorough ground ground truth truth data data set, set, which which is always is always unavailable unavailable and and undermines undermines the effectiveness the effectiveness and andeven even feasibility feasibility of this of thisevaluation evaluation approach. approach. Ther Therefore,efore, a qualitative a qualitative evalua evaluationtion was wasadopted adopted in our in ourexperiment. experiment.

4.2.1. Event Detection by Feature Irregularity After completing the semantic community discov discovery,ery, several hashtag groups were detected, each of which shared a common topic. To demonstratedemonstrate the usefulness of our workflow, workflow, we chose two hashtag groups forfor casecase studies.studies. We foundfound oneone hashtaghashtag communitycommunity whosewhose semanticsemantic topic was mainlymainly about the umbrellaumbrella movement (see(see FigureFigure4 ).4). The The umbrella umbrella movement movement was was a political a political movement movement that that emerged emerged during during the Hongthe Hong Kong Kong democracy democracy protests protests of 2014 of [49 2014]. This [49]. movement This movement lasted for lasted 79 days, for and 79 manydays, downtownand many streetsdowntown were streets occupied were during occupied the event, during which the event, ended which in a police ended clearance in a police operation. clearance By operation. studying theBy spatiotemporalstudying the spatiotemporal patterns of the patterns corresponding of the correspon geo-taggedding check-ins, geo-tagged we arecheck-ins, able to we look are into able the to event look frominto the a new event perspective. from a new perspective.

FigureFigure 4. WordWord cloud for representative hashtags of the ‘Umbrella Movement’ community.community.

The GEI waswas calculated calculated with with the the geo-tagged geo-tagged check- check-insins that that were identified identified as the ’Umbrella’Umbrella Movement’ topic. Table2 2 lists lists the the distribution distribution of of time time series seriesTS of ofGEI .. EachEach GEI inin timetime seriesseries TS represented the correspondingcorresponding event featurefeature forfor thatthat timetime segmentsegment (i.e.,(i.e., thatthat day).day). ByBy implementingimplementing the ESDESD testtest uponupon thetheTS ,, thethe irregularirregular event event feature feature was was detected. detected. The The visualized visualized results results are are shown shown in Figurein Figure5. It5. can It can be be seen seen that that at at the the beginning, beginning, the theGEI value value increased increasedin inan an oscillatoryoscillatory mannermanner and achieved its maximum of 37.43 on 11 December 2014, which was detected as the irregularity of the event features by thethe ESDESD method,method, andand thenthen decreaseddecreased toto itsits normalnormal levellevel afterafter that.that.

TableTable 2. Time 2. Time series series of of GEI withwith geo-tagged geo-tagged check-ins check-ins identified identified as the as the’Umbrella ’Umbrella Movement’ Movement’.

DateDate Dec Dec 6 6 Dec Dec 7 7 Dec Dec 8 8 Dec Dec 9 9 Dec 1010 Dec Dec 11 Dec Dec 12 12 … ... Jan 4Jan 4 GEI 3.133.13 3.723.72 1.481.48 13.1813.18 8.138.13 37.43 37.43 6.48 6.48 … . . .0.61 0.61

ISPRS Int. J. Geo-Inf. 2018, 7, 481 10 of 19 ISPRSISPRS Int.Int. J.J. Geo-Inf.Geo-Inf. 20182018,, 77,, xx FORFOR PEERPEER REVIEWREVIEW 10 10 ofof 1919

GEI value of ‘Umbrella Movement’ check-ins 40.00 37.43, event feature irregularity 35.00 30.00 25.00 20.00 15.00 10.00 5.00 0.00 2015/1/1 2015/1/2 2015/1/3 2015/1/4 2015/1/1 2015/1/2 2015/1/3 2015/1/4 2014/12/6 2014/12/7 2014/12/8 2014/12/9 2014/12/6 2014/12/7 2014/12/8 2014/12/9 2014/12/10 2014/12/11 2014/12/12 2014/12/13 2014/12/14 2014/12/15 2014/12/16 2014/12/17 2014/12/18 2014/12/19 2014/12/20 2014/12/21 2014/12/22 2014/12/23 2014/12/24 2014/12/25 2014/12/26 2014/12/27 2014/12/28 2014/12/29 2014/12/30 2014/12/31 2014/12/10 2014/12/11 2014/12/12 2014/12/13 2014/12/14 2014/12/15 2014/12/16 2014/12/17 2014/12/18 2014/12/19 2014/12/20 2014/12/21 2014/12/22 2014/12/23 2014/12/24 2014/12/25 2014/12/26 2014/12/27 2014/12/28 2014/12/29 2014/12/30 2014/12/31

FigureFigure 5. 5.Detected DetectedDetectedfeature featurefeature irregularity irregularityirregularity asas thethe ’Um ’Umbrella’Umbrella Movement’ Movement’ event event on on 11 11 December December 2014. 2014.

Moreover,Moreover, to to identify identify the the potential potential event event location, location, the theLEI 𝐿𝐸𝐼(see (see(see Equation EquationEquation (9)) (9))(9)) was waswas also alsoalso calculated calculatedcalculated for 11forfor December 1111 DecemberDecember 2014. 2014.2014. The spatialTheThe spatialspatial distribution distributiondistribution of LEI ofof 𝐿𝐸𝐼is shown isis shownshown in Figure inin FigureFigure6. We 6.6. found WeWe foundfound that thatthat the regions thethe regionsregions with awith high aLEI highvalue 𝐿𝐸𝐼 valuevalue were werewere mainly mainlymainly around aroundaround Admiralty, Admiralty,Admiralty, where whwh theere keythe key governmental governmental departments departments (i.e., (i.e., the Hongthethe HongHong Kong KongKong Central CentralCentral Government GovernmentGovernment Offices, Offices,Offices, Legislative LegiLegislativeslative Council CouncilCouncil Complex ComplexComplex and the andand High thethe CourtHighHigh CourtCourt of Special ofof AdministrativeSpecial Administrative Region) Region) are located, are located, and the and place the with place the with highest the highestLEI is at 𝐿𝐸𝐼 [lat: isis at 22.28,at [lat:[lat: lon:22.28,22.28, 114.17], lon:lon: 𝐿𝐸𝐼 where114.17], the whereLEI are the 125.46, 𝐿𝐸𝐼 areare which 125.46,125.46, according whichwhichto accordingaccording Equation toto (9), EquationEquation indicates (9),(9), significant indicatesindicates human significantsignificant aggregation humanhuman in aggregation in that region and the regions nearby. To determine the underlying reason causing this thataggregation region and in the that regions region nearby. and the To regions determine nearby. the To underlying determine reason the underlying causing this reason human causing movement this human movement pattern, we searched the background news with reference to the semantic pattern,human wemovement searched pattern, the background we searched news the with backgr referenceound tonews the semanticwith reference identification to the ’Umbrellasemantic identification ’Umbrella Movement’ and detected the spatiotemporal pattern, ‘Admiralty, 11 Movement’identification and ’Umbrella detected theMovement’ spatiotemporal and detect pattern,ed the ‘Admiralty, spatiotemporal 11 December pattern, 2014’ ‘Admiralty, (see Equation 11 December 2014’ (see Equation (1)). By researching the local news, we identified the relevant event, (1)). By researching the local news, we identified the relevant event, the Admiralty Site Clearance [50]. thethe AdmiraltyAdmiralty SiteSite ClearanceClearance [50].[50]. OnOn ThursdayThursday 1111 DecemberDecember 2014,2014, thethe HongHong KongKong policepolice executedexecuted On Thursday 11 December 2014, the Hong Kong police executed the first site-clearance operation to thethe firstfirst site-clearancesite-clearance operationoperation toto endend thethe mainmain sit-insit-in ofof thethe ‘Umbrella‘Umbrella Movement’Movement’ inin AdmiraltyAdmiralty withwith end the main sit-in of the ‘Umbrella Movement’ in Admiralty with the arrest of 247 people. The police thethe arrestarrest ofof 247247 people.people. TheThe policepolice releasedreleased anan announcementannouncement aboutabout thethe site-clearancesite-clearance beforebefore thethe released an announcement about the site-clearance before the operation. Therefore, many citizens went operation. Therefore, many citizens went to the Admiralty site to witness this social event on that to the Admiralty site to witness this social event on that day, causing significant human aggregation in day, causing significant human aggregation in Admiralty, which consequently was indicated by the Admiralty, which consequently was indicated by the LEI spike in the corresponding regions (Figure6). 𝐿𝐸𝐼 spikespike inin thethe correspondingcorresponding regionsregions (Figure(Figure 6).6).

FigureFigure 6. 6.The TheThe Admiralty AdmiraltyAdmiralty region, region,region, which whichwhich showsshowsshows highhighhigh 𝐿𝐸𝐼LEI valuesvalues forfor for ’Umbrella’Umbrella ’Umbrella Movement’Movement’ Movement’ check-ins,check-ins, check-ins, indicatingindicatingindicating significant significantsignificant human humanhuman aggregation aggregationaggregation ininin thatthatthat regionregionregion andand the the regions regions nearby.nearby.

ISPRS Int. J. Geo-Inf. 2018, 7, 481 11 of 19 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 11 of 19 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 11 of 19 ToTo furtherfurther demonstratedemonstrate the effectiveness effectiveness of of our our wo workflow,rkflow, another another case case study study is isdescribed described below. below. WeWe found foundTo further anotheranother demonstrate hashtaghashtag communitycommunity the effectiveness whose ofsemantic our wo rkflow,topic topic is is anothermainly mainly about case about study fitness. fitness. is describedThe The word word below.cloud cloud forforWe thethe found hashtagshashtags another ofof thatthathashtag groupgroup community is shown whosein Figure semantic 77.. topic is mainly about fitness. The word cloud for the hashtags of that group is shown in Figure 7.

Figure 7. TheThe word word cloud cloud for for representative representative hashtags hashtags of of a a’fitness’ ’fitness’ community. community. Figure 7. The word cloud for representative hashtags of a ’fitness’ community. WeWe calculatedcalculated thethe GEI𝐺𝐸𝐼 withwith the the geo-tagged geo-tagged check-ins check-ins identi identifiedfied as as ‘fitness’ ‘fitness’ (see (see Table Table 3)3) and and detecteddetectedWe thethecalculated eventevent feature the 𝐺𝐸𝐼 irregularity with the geo-tagged with with the the ES ESD check-insD test test (see (see identi Figure Figurefied 88). ).as The‘fitness’ event event (see irregularity irregularity Table 3) was wasand detecteddetecteddetected onon the 77 DecembereventDecember feature 2014, irregularity when the withGE GEII value the ES achieved achievedD test (see its its maximumFigure maximum 8). Theof of 32.47. 32.47. event irregularity was detected on 7 December 2014, when the GEI value achieved its maximum of 32.47. TableTable 3.3. TimeTime seriesseries ofof GEI𝐺𝐸𝐼 with geo-tagged check-ins check-ins identified identified as as ’fitness’ ’fitness’. Table 3. Time series of 𝐺𝐸𝐼 with geo-tagged check-ins identified as ’fitness’ DateDate Dec Dec 6 6 Dec Dec 7 7 Dec Dec 8 8 Dec Dec 9 9 Dec Dec 1010 Dec Dec 11 Dec Dec 12 12 … ... Jan 4 Jan 4 Date Dec 6 Dec 7 Dec 8 Dec 9 Dec 10 Dec 11 Dec 12 … Jan 4 GEI 𝑮𝑬𝑰 1.201.20 32.4732.47 1.771.77 0.860.86 0.340.34 0.35 0.35 0.76 0.76 … . .0.01 . 0.01 𝑮𝑬𝑰 1.20 32.47 1.77 0.86 0.34 0.35 0.76 … 0.01

GEI value of ‘Fitness’ check-ins GEI value of ‘Fitness’ check-ins 35.00 35.00 30.00 32.47 event feature irregularity 30.00 32.47 event feature irregularity 25.00 25.00 20.00 20.00 15.00 15.00 10.00 10.00 5.00 5.00 0.00 0.00

Figure 8. The detected feature irregularity of the ‘fitness’ event on 11 December 2014. FigureFigure 8.8. TheThe detecteddetected featurefeature irregularityirregularity of the ‘fitness’‘fitness’ event on 11 December December 2014. 2014. The location information was needed to pinpoint the social event; therefore, the 𝐿𝐸𝐼 wascalculated.TheThe location location Figure information information 9 shows was the wasneeded spatial needed todistribution pinpoint to pi the npointof social𝐿𝐸𝐼 calculatedthe event; social therefore, from event; the the ‘fitness’therefore,LEI wascalculated. geo-tagged the 𝐿𝐸𝐼 Figurecheck-inswascalculated.9 shows on 7 December theFigure spatial 9 shows 2014. distribution Thethe spatial results of distribution LEIshowedcalculated that of some 𝐿𝐸𝐼 from calculated areas the ‘fitness’in Chek from Lap geo-tagged the Kok‘fitness’ had check-ins geo-taggedunusually on 7highcheck-ins December 𝐿𝐸𝐼 compared on 2014. 7 December The with results other2014. showedregions.The results thatThe showed somearea withareas that the some in highest Chek areas Lap 𝐿𝐸𝐼in Chek Kok is athad Lap[lat: unusually Kok22.32 had lon: unusually high113.94],LEI comparedwherehigh 𝐿𝐸𝐼 the compared 𝐿𝐸𝐼 with was other 189.23. with regions. otherThis area Theregions. is area actually The with area the the AsiaWorld-Expo,with highest the LEIhighestis at 𝐿𝐸𝐼which [lat: is at 22.32has [lat: the lon: 22.32 biggest 113.94], lon: purpose- 113.94], where thebuiltwhereLEI indoor-seated thewas 𝐿𝐸𝐼 189.23. was entertainment189.23. This area This is area actually arena is actually in the Hong AsiaWorld-Expo, the Ko AsiaWorld-Expo,ng. We tried whichto find which hasthe underlyingthe has biggest the biggest event purpose-built purpose- related indoor-seatedtobuilt this indoor-seated irregularity entertainment entertainment of 𝐺𝐸𝐼 arena and arena𝐿𝐸𝐼 in Hong, inby Hong Kong.combing Ko Weng. We triedthe triedsemantic to find to find the identificationthe underlying underlying event’fitness’ event related related and to spatiotemporalto this irregularity pattern of ‘AsiaWorld-Expo, 𝐺𝐸𝐼 and 𝐿𝐸𝐼 , 7by December combing 2014 the′ (seesemantic Equation identification (1)). After searching’fitness’ andthe spatiotemporal pattern ‘AsiaWorld-Expo, 7 December 2014′ (see Equation (1)). After searching the

ISPRS Int. J. Geo-Inf. 2018, 7, 481 12 of 19 this irregularity of GEI and LEI, by combing the semantic identification ’fitness’ and spatiotemporal ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 12 of 19 pattern ‘AsiaWorld-Expo, 7 December 20140 (see Equation (1)). After searching the relevant background information,relevant background we identified information, the Color we Runidentified Hong th Konge Color event Run [51 Hong]. This Kong event event was [51]. the firstThis Colorevent Runwas eventthe first in Color Hong Run Kong event and wasin Hong held Kong on 7 December and was he 2014ld on at 7 the December AsiaWorld-Expo. 2014 at the This AsiaWorld-Expo. was one of the mostThis was popular one of single the most day eventspopular for single the Color day events Run in for the the Asian Color region, Run andin the there Asian were region, approximately and there 16,000were approximately fitness fans and 16,000 runners fitness participating fans and in thisrunners event, participating bringing about in immense this event, human bringing aggregation about toimmense the AsiaWorld-Expo, human aggregation which was to reflected the AsiaWorld-Expo, by the detected irregularitywhich was in reflected the calculated by theGEI detectedand LEI (Figuresirregularity8 and in9 the). calculated 𝐺𝐸𝐼 and 𝐿𝐸𝐼 (Figures 8 and 9).

𝐿𝐸𝐼 Figure 9. The ChekChek LapLap KokKok region,region, whichwhich showsshows highhigh LEI values for ‘fitness’‘fitness’ check-ins,check-ins, indicatingindicating significantsignificant human aggregation.

4.2.2. Urban Structure Understanding Our workflowworkflow cancan alsoalso be be used used for for studying studying the th urbane urban structure. structure. Studying Studying the the urban urban structure structure on aon large a large scale scale has has traditionally traditionally been been a challenge, a challenge, requiring requiring considerable considerable labor labor and and often often resulting resulting in in a partiala partial depiction depiction of reality.of reality. We We speculated speculated that thethat geo-tagged the geo-tagged check-ins check-ins generated generated by residents by residents reveal thereveal patterns the patterns of human of human mobility mobility and aggregation and aggregat in anion urban in an context, urban whichcontext, can which be used can for be studyingused for thestudying structure the andstructure composition and composition of a city. An of application a city. An scenario application is the scenario study of humanis the study mobility of human during amobility festival. during How peoplea festival. move How and people interact move within and theinteract urban within region the during urban aregion festival during can reveal a festival the distinctlycan reveal functional the distinctly areas offunctional the city andareas residents’ of the perceptioncity and residents’ about them. perception Consequently, about inthem. our experiment,Consequently, the inGEI ourand experiment,LEI for a festival the 𝐺𝐸𝐼 event and were 𝐿𝐸𝐼 calculated. for a festival One semanticallyevent were detectedcalculated. hashtag One communitysemantically is detected about ‘Christmas’ hashtag community (Figure 10 ).is about ‘Christmas’ (Figure 10). To investigate the geographical patterns of human mobility and aggregation during Christmas, we calculated the LEI with the ’Christmas’ geo-tagged check-ins on 25 December 2014. In Figure 11, we can see, across the whole urban area of Hong Kong, the regions with high LEI were mainly located within three regions, Tsim Sha Tsui, Lan Kwai Fong and Causeway Bay. This finding is consistent with the urban structure of Hong Kong. All three regions are major recreational areas that residents visit for shopping and entertainment during the holidays. Specifically, the regions with the highest LEI were around Lan Kwai Fong, which is the most well-known area in Hong Kong for drinking, clubbing and dining. This can be explained by the fact that, in Lan Kwai Fong, various street performance and celebration activities are held every Christmas, which would attract many tourists and local residents to visit, thus, causing extreme human aggregation in that region. Figure 10. The word cloud for representative hashtags of the ’Christmas’ community.

To investigate the geographical patterns of human mobility and aggregation during Christmas, we calculated the 𝐿𝐸𝐼 with the ’Christmas’ geo-tagged check-ins on 25 December 2014. In Figure 11,

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 12 of 19

relevant background information, we identified the Color Run Hong Kong event [51]. This event was the first Color Run event in Hong Kong and was held on 7 December 2014 at the AsiaWorld-Expo. This was one of the most popular single day events for the Color Run in the Asian region, and there were approximately 16,000 fitness fans and runners participating in this event, bringing about immense human aggregation to the AsiaWorld-Expo, which was reflected by the detected irregularity in the calculated 𝐺𝐸𝐼 and 𝐿𝐸𝐼 (Figures 8 and 9).

Figure 9. The Chek Lap Kok region, which shows high 𝐿𝐸𝐼 values for ‘fitness’ check-ins, indicating significant human aggregation.

4.2.2. Urban Structure Understanding Our workflow can also be used for studying the urban structure. Studying the urban structure on a large scale has traditionally been a challenge, requiring considerable labor and often resulting in a partial depiction of reality. We speculated that the geo-tagged check-ins generated by residents reveal the patterns of human mobility and aggregation in an urban context, which can be used for studying the structure and composition of a city. An application scenario is the study of human mobility during a festival. How people move and interact within the urban region during a festival can reveal the distinctly functional areas of the city and residents’ perception about them. ISPRSConsequently, Int. J. Geo-Inf. in2018 our, 7, 481experiment, the 𝐺𝐸𝐼 and 𝐿𝐸𝐼 for a festival event were calculated. One13 of 19 semantically detected hashtag community is about ‘Christmas’ (Figure 10).

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 13 of 19

we can see, across the whole urban area of Hong Kong, the regions with high 𝐿𝐸𝐼 were mainly located within three regions, Tsim Sha Tsui, Lan Kwai Fong and Causeway Bay. This finding is consistent with the urban structure of Hong Kong. All three regions are major recreational areas that residents visit for shopping and entertainment during the holidays. Specifically, the regions with the highest 𝐿𝐸𝐼 were around Lan Kwai Fong, which is the most well-known area in Hong Kong for drinking, clubbing and dining. This can be explained by the fact that, in Lan Kwai Fong, various street performance and celebration activities are held every Christmas, which would attract many tourists and local residentsFigureFigure 10. 10. to TheThe visit, word word thus, cloud cloud causing for for representative representative extreme human ha hashtagsshtags aggregation of ofthe the ’Christmas’ ’Christmas’ in that community. region. community.

To investigate the geographical patterns of human mobility and aggregation during Christmas, we calculated the 𝐿𝐸𝐼 with the ’Christmas’ geo-tagged check-ins on 25 December 2014. In Figure 11,

FigureFigure 11. 11.The TheLEI 𝐿𝐸𝐼values values across across the the Hong Hong Kong Kong urban urban area, area calculated, calculated from from the the ‘Christmas’ ‘Christmas’ check-ins. check- Theins. high-value The high-value regions regions are mainly are mainly in Tsim in Tsim Sha Tsui, Sha LanTsui, Kwai Lan Kwai Fong Fong and Causeway and Causeway Bay, whichBay, which are all majorare all recreational major recreational regions regions in Hong in Kong. Hong Kong.

5.5. Discussion Discussion AsAs illustrated illustrated in in Equation Equation (1),(1), toto detectdetect potentialpotential social events, events, our our analytical analytical workflow workflow mainly mainly considersconsiders two two issues, issues, semantic semantic identificationidentification andand geographical patterns, revealed revealed by by VGI VGI data. data. In In this this section,section, we we discuss discuss the the results results ofof ourour methodology with the the aim aim of of answering answering the the following following questions: questions: • • SemanticSemantic RepresentativenessRepresentativeness:: How How well well could could the the semantic semantic similarities similarities between between hashtags hashtags be berepresented represented by bythe the connectivity connectivity between between vertices vertices in a hashtag in a hashtag network? network? In other Inwords, other from words, a fromsemantic a semantic perspective, perspective, were werethe thedetected detected semantic semantic communities communities internally internally consistent consistent and and externallyexternally different? different? • Geographical Event Depiction: How did we depict the social event from a geographical • Geographical Event Depiction: How did we depict the social event from a geographical perspective? How were such geographical patterns, in turn, used for constructing event features perspective? How were such geographical patterns, in turn, used for constructing event features and finally, for detecting events? and finally, for detecting events? The effectiveness and limitations of our methodology are discussed using both quantitative and qualitiveThe effectiveness analysis. and limitations of our methodology are discussed using both quantitative and qualitive analysis. 5.1. Semantic Representativeness 5.1. Semantic Representativeness We implemented Google 𝑊𝑜𝑟𝑑2𝑣𝑒𝑐 to investigate the semantic similarities between hashtag communities.We implemented 𝑊𝑜𝑟𝑑2𝑣𝑒𝑐 Google is a groupWord 2ofvec neuralto investigate network models the semantic that take similarities a large corpus between of hashtagtext as communities.training data andWord produces2vec is aa groupmultidimensional of neural networkvector space, models where that each take unique a large word corpus in the of corpus text as trainingis represented data and by produces a corresponding a multidimensional vector. Words vector that space,share wherecommon each contexts unique are word located in the in corpusclose isproximity represented to one by aanother corresponding in the vector vector. space Words and have that high share cosine common similarity contexts [52]. are The located corpus inof closethe proximitytexts attached to one to anothergeo-tagged in the photos vector was space input and as training have high data cosine to create similarity a vector [52 space.]. The Let corpus 𝐶 and of 𝐶 the be two detected hashtag communities, the community semantic similarity (abbreviated as CSS)

between 𝐶 and 𝐶 is calculated as below:

ISPRS Int. J. Geo-Inf. 2018, 7, 481 14 of 19

texts attached to geo-tagged photos was input as training data to create a vector space. Let Cm and Cn be two detected hashtag communities, the community semantic similarity (abbreviated as CSS) betweenISPRS Int. CJ. mGeo-Inf.and 2018Cn is, 7, calculated x FOR PEER asREVIEW below: 14 of 19

 m n ∑i ∑j sim tagi , tagj ∑∑ 𝑠𝑖𝑚 (𝑡𝑎𝑔 ,𝑡𝑎𝑔 ) CSS(Cm, Cn) = (10) 𝐶𝑆𝑆(𝐶,𝐶)= Tg_Cnt(C )∗Tg_Cnt(C ) (10) Tg_Cnt(𝐶)∗Tg_Cnt(𝐶m )n m th ( m n) m n wherewheretag 𝑡𝑎𝑔i is is the thei 𝑖 tag tag in inC 𝐶m, sim, sim(𝑡𝑎𝑔tagi , tag,𝑡𝑎𝑔j is) is the the cosine cosine similarity similarity between betweentag 𝑡𝑎𝑔i and andtag 𝑡𝑎𝑔j in in the Wordthe 𝑊𝑜𝑟𝑑2𝑣𝑒𝑐2vec vector vector space, space, and and Tg_Cnt( Tg_Cnt(Cm)𝐶 is the) is the total total count count of theof the tags tags of ofC m𝐶. The. The higher higher the the value value of CSSof 𝐶𝑆𝑆, the, the more more similar similar the the hashtag hashtag communities communities are are from from a a semantic semantic perspective. perspective. TheThe divisionsdivisions ofof hashtaghashtag communities were were visualized visualized using using a acomparative comparative word word cloud cloud of ofthe the correspondingcorresponding representativerepresentative hashtagshashtags (shown in Figure Figure 12).12).

FigureFigure 12. A comparative word word cloud cloud of of 13 13 hashtag hashtag communities. communities.

TableTable4 showed4 showed the semanticthe semantic similarities similarities between between hashtag communities.hashtag communities. For hashtag For communities hashtag Ccommunitiesn we introduced 𝐶 weintra introduced community intra community semantic similarity semantic (intrasimilarity-CSS )(𝑖𝑛𝑡𝑟𝑎 calculated- 𝐶𝑆𝑆 ) bycalculatedCSS(Cn ,byC n) (see𝐶𝑆𝑆 Equation(𝐶 , 𝐶 ) (see (10)) Equation to measure (10)) theto measure internal the semantic internal consistency semantic consistency within the within hashtag the community hashtag Ccommunityn and inter community𝐶 and inter semantic community similarity semantic (inter similarity-CSS) calculated (𝑖𝑛𝑡𝑒𝑟-𝐶𝑆𝑆 by) calculatedCSS(Cn, C mby)( 𝐶𝑆𝑆n 6=(𝐶m), to𝐶 measure) (𝑛≠ 𝑚 𝐶 𝐶 𝑖𝑛𝑡𝑟𝑎 the) semanticto measure differences the semantic between differences communities between Ccommunitiesn and Cm. Semantically, and . Semantically, high intra-CSS highindicates - that𝐶𝑆𝑆 theindicates detected that community the detected is community internally consistent; is internally significant consistent; differences significant between differencesintra between-CSS and inter𝑖𝑛𝑡𝑟𝑎-CSSs-𝐶𝑆𝑆 andindicate 𝑖𝑛𝑡𝑒𝑟 that-𝐶𝑆𝑆𝑠 the indicate detected that communities the detected are communi externallyties are differentiable. externally differentiable. We listed the maximum, minimum and average values of inter-CSS together with intra-CSS for Table 4. A comparison of the semantic similarities between communities. comparison. The results show that the detected communities had very high intra-CSS (approximately 0.90). The highest intra-CSS were achieved when it came to communityInter-CSS ‘Tattoo’, which also achieved Hashtag a low value in average inter-Intra-CSS (0.30).Maximum By studying Inter- the textMinimum content and Inter- users inAverage detail, Inter- we found Communities that this hashtag communityCSS was mainly popularCSS among a populationCSS with a strong specificCSS interest. WhenDrinking&Bar they posted photos of0.89 relevant topics0.56 on social media, the 0.32 attached hashtags 0.38 were always very activity-orientedTravel&Wander and even0.93 specialized, which0.61 means the hashtags 0.26 in these communities 0.38 were mainlyLifeStyle&Happiness related to very specific 0.92 activities, i.e., tattoos,0.69 and the chances 0.37 that such hashtags 0.48 co-occurred with the hashtagsFood from other0.91 communities was0.63 relatively low. For0.16 example, hashtag 0.41 ’solotatoo’ fromDisneyLand&Toys the ’Tattoo’ community, 0.93 might occur frequently0.54 with ‘ink’, another 0.22 popular hashtag 0.43 from the ‘Tattoo’ community,Tattoo but seldom0.95 co-occurred0.38 with the hashtags from 0.23 other semantic0.30 communities such asPets&Animals ’Food’ or ‘Pets&Animals’. 0.94 Consequently,0.57 such peculiar co-occurrence 0.20 patterns 0.43 contributed to strongerShop&Luxury internal, rather than0.93 external community0.58 connectivity, 0.11 leading to high intra 0.38- CSS and lowConcert&LiveMusicinter-CSS. A similar explanation0.93 can also0.59 be applied for another 0.26 relevant phenomenon, 0.40 i.e., Makeup 0.92 0.57 0.26 0.36 Fitness 0.86 0.56 0.32 0.37 Christmas 0.90 0.37 0.26 0.28 UmbrellaMovement 0.84 0.39 0.12 0.22

ISPRS Int. J. Geo-Inf. 2018, 7, 481 15 of 19 the ’LifeStyle&Happiness’ community had a relatively high inter-CSS value, indicating a strong external connectivity. A potential reason was that the hashtags in ‘LifeStyle&Happiness’ were mainly emotion-related (e.g., ’amazing’, ’happy’, ’enjoy’). These emotion-related hashtags tended to co-occur with other activity-oriented hashtags to express a user’s emotional reactions and evaluations of the activities, leading to communities with more external connectivity and a higher inter-CSS value. Among all the communities, the intra-CSSs were mostly achieved with a very high value (over 0.85) and the differences between intra-CSSs and average inter-CSSs were mostly over 0.5 (the exception was ’LifeStyle&Happiness’, whose difference was 0.44). The results verified that the division of semantic communities was internally consistent and externally different.

Table 4. A comparison of the semantic similarities between communities.

Inter-CSS Hashtag Communities Intra-CSS Maximum Inter-CSS Minimum Inter-CSS Average Inter-CSS Drinking&Bar 0.89 0.56 0.32 0.38 Travel&Wander 0.93 0.61 0.26 0.38 LifeStyle&Happiness 0.92 0.69 0.37 0.48 Food 0.91 0.63 0.16 0.41 DisneyLand&Toys 0.93 0.54 0.22 0.43 Tattoo 0.95 0.38 0.23 0.30 Pets&Animals 0.94 0.57 0.20 0.43 Shop&Luxury 0.93 0.58 0.11 0.38 Concert&LiveMusic 0.93 0.59 0.26 0.40 Makeup 0.92 0.57 0.26 0.36 Fitness 0.86 0.56 0.32 0.37 Christmas 0.90 0.37 0.26 0.28 UmbrellaMovement 0.84 0.39 0.12 0.22

5.2. Geographical Event Depiction To summarize, to detect social events from geo-tagged social media data, three aspects were considered by our data model. First, our consideration was that the semantic meaning should be investigated to identify the topics for social events. Therefore, we adopted a data-driven topic modeling method to detect hashtag communities and identify potential topics for the posts, in the rich-hashtag environment. Second, besides semantic identification, spatiotemporal identification was also considered by our model to pinpoint the events. This was derived from the perception that across an area, activities in the same category may take place at different places and times, which means that only by combining semantic and spatiotemporal identification can we specify a particular event. Consequently, in our model, a comprehensive workflow was implemented that included both global and local indexes to indicate the temporal and spatial information of the events, respectively. Third, to differentiate social events, we developed a novel geography-based event representation method. Previous works have mainly detected events by investigating the semantic feature of the streaming social media data, while we considered that social events may also lead to irregular geographical patterns in human mobility, and such geographical irregularity, in turn, can be used to detect social events. Specifically, we constructed a one-dimensional event feature based on the quantitative measurement of crowd aggregation (spatial autocorrelation) and detected the event by finding the feature irregularity. By experimenting with sample data, several events (e.g., a site-clearance operation and Color Run) were detected. We found that, as a matter of popularity, those events attracted a large number of citizens to the site, causing unusual human aggregation in corresponding areas, and subsequently, event feature irregularity. Such irregularities were captured and finally, the events were detected using our method.

5.3. Limitations and Future Work Detecting social events with VGI data in this research has certain limitations. One limitation is that the categories of geographical patterns that we considered to construct event features are still ISPRS Int. J. Geo-Inf. 2018, 7, 481 16 of 19 relatively simple and lacking in variety. In this study, we picked the clustering patterns (e.g., spatial autocorrelation) and used a logarithmic formula for event transformation. This is an initial attempt, however, it proves the feasibility of detecting events using geography-based features. We hope to extend the current work into a comprehensive analytic framework where different kinds of geographical pattern indicators and feature transformation operations are tested and incorporated so that social events can be more effectively differentiated. Another limitation is that the evaluation methods are still limited in their effectiveness. In our experiments, we did find that there were small parts of the irregularities that could not be meaningfully interpreted due to a lack of relevant background information. However, these only accounted for a very minor portion of the total detected events. Most of the detected irregularities, when combined with the semantic and spatiotemporal identification (Equation (1)), could be clearly interpreted and assigned to the corresponding social events. Due to the unavailability of a thorough ground truth data set, the current evaluation of the detection results was conducted using qualitative methods, and quantitative evaluations are absent. An authoritative ground truth dataset can be very helpful in achieving plausible quantitative evaluations. Finally, using social media data to detect events inevitably suffers from data bias issues because social media users are a skewed sample of the entire population, mainly consisting of specific groups of people and the younger generation. How to quantify and alleviate the influence of data bias and improve the detection results could be a potential future research direction.

6. Conclusions Event detection with social media is not a novel topic and several relevant studies have been done. Some previous methods have been very effective. However, the novelty of our work lies in providing new insights into the following issues: what kinds of features can be used for detecting events, or, from what aspects can a social event be differentiated using VGI data. Previous methods are mostly semantic-based, which is to say they detect the “bursts” of certain semantic signals to identify events. By contrast, our consideration is that a social event will also affect how the objects spatially distribute and mutually interact, thus causing irregular geographical patterns. Furthermore, by introducing depictive features with VGI, such geographical irregularities can be quantified and used to distinguish social events. Based on the above consideration, we designed a comprehensive workflow where the event feature was represented by investigating the geographical pattern (e.g., spatial autocorrelation) of VGI and the social events were thereby detected. The proposed workflow consists of three modules: (1) semantic community discovery, (2) geography-based event representation, and (3) detection of event feature irregularity. First, we implemented a data-driven geographic topic modeling method to detect hashtag communities and identify social media topics. Second, by introducing quantitative spatial autocorrelation indicators, a global index was constructed for representing the potential events in the created feature space. A time series of the univariate event feature was then generated with variable temporal granularities. Third, an outlier test was adopted to detect event feature irregularity, and the global index was localized for finding event location. The main innovations in our work lies in constructing event features based on the geographical patterns revealed by VGI and differentiating events by detecting feature irregularities, which are mainly related to the second and third modules (i.e., geography-based event representation, and detection of event feature irregularity) of our proposed workflow, where both global and local indicators (features) are constructed by investigating the geographical patterns of VGI data and feature irregularities are detected. By experimenting with sample datasets, this study demonstrated the feasibility of detecting social events with geography-based features, and it could potentially complement current semantic-based event detection methods, which is the scientific weight of our study. Our current work is a preliminary workflow; nevertheless, it provides a framework for future work. Detecting social events by mining geographical irregularities is a potentially promising direction, as relatively few ISPRS Int. J. Geo-Inf. 2018, 7, 481 17 of 19 relevant works have done this previously, compared with the massive number of semantic-based event detection methods.

Author Contributions: Z.L. conceived the methodology and conducted the experiments. Z.L. and X.Z. designed the evaluation. W.S. identified the problem and solution. Z.L. in collaboration with A.Z. wrote the manuscript. Funding: This study is funded by The Hong Kong Polytechnic University [1-ZVF2, 1-ZEAB] Acknowledgments: We would like to thank the anonymous reviewers for their insightful comments and substantial help on improving this article. We also thank Jittin Chaitamart for providing the valuable sample data. Conflicts of Interest: The authors declare no conflicts of interest.

References

1. Goodchild, M.F. Citizens as sensors: The world of volunteered geography. GeoJournal 2007, 69, 211–221. [CrossRef] 2. Earle, P.S.; Bowden, D.C.; Guy, M. Twitter earthquake detection: Earthquake monitoring in a social world. Ann. Geophys. 2012, 54, 708–715. 3. Weiler, A.; Grossniklaus, M.; Scholl, M.H. Survey and experimental analysis of event detection techniques for twitter. Comput. J. 2016, 60, 329–346. [CrossRef] 4. Cordeiro, M.; Gama, J. Online social networks event detection: A survey. In Solving Large Scale Learning Tasks. Challenges and Algorithms; Michaelis, S., Piatkowski, N., Eds.; Springer: Cham, Switzerland, 2016; pp. 1–41. 5. Guille, A.; Favre, C. Mention-anomaly-based event detection and tracking in twitter. In Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), , China, 17–20 August 2014; IEEE: Washington, DC, USA, 2014; pp. 375–382. 6. Thapen, N.; Simmie, D.; Hankin, C. The early bird catches the term: Combining twitter and news data for event detection and situational awareness. J. Biomed. Semant. 2016, 7, 61. [CrossRef][PubMed] 7. Salas Jones, A.; Georgakis, P.; Petalas, Y.; Suresh, R. Real-time traffic event detection using twitter data. Infrastruct. Asset Manag. 2018, 5, 77–84. [CrossRef] 8. Yin, Z.; Cao, L.; Han, J.; Zhai, C.; Huang, T. Geographical topic discovery and comparison. In Proceedings of the 20th International Conference on World Wide Web, Hyderabad, India, 28 March–1 April 2011; ACM: New York, NY, USA, 2011; pp. 247–256. 9. Hofmann, T. Probabilistic Latent Semantic Analysis. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, 30 July–1 August 1999; Morgan Kaufmann: San Francisco, CA, USA, 1999; pp. 289–296. 10. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. 11. Zhang, L.; Sun, X.; Zhuge, H. Location-driven geographical topic discovery. In Proceedings of the 2013 Ninth International Conference on Semantics, Knowledge and Grids, Beijing, China, 3–4 October 2013; IEEE: Washington, DC, USA, 2013; pp. 210–213. 12. Lansley, G.; Longley, P.A. The geography of twitter topics in . Comput. Environ. Urban Syst. 2016, 58, 85–96. [CrossRef] 13. Steiger, E.; Resch, B.; Zipf, A. Exploration of spatiotemporal and semantic clusters of twitter data using unsupervised neural networks. Int. J. Geogr. Inf. Sci. 2016, 30, 1694–1716. [CrossRef] 14. Wang, C.; Wang, J.; Xie, X.; Ma, W.-Y. Mining geographic knowledge using location aware topic model. In Proceedings of the 4th ACM Workshop on Geographical Information Retrieval, Lisbon, Portugal, 6–10 November 2007; ACM: New York, NY, USA, 2007; pp. 65–70. 15. Prateek, M.; Vasudeva, V. Improved topic models for social media via community detection using user interaction and content similarity. In Proceedings of the 2016 International FRUCT Conference on Intelligence, Social Media and Web (ISMW FRUCT), St. Petersburg, Russia, 28 August–4 September 2016; IEEE: Washington, DC, USA, 2016; pp. 1–7. 16. Rykov, Y.; Nagornyy, O.; Koltsova, O. Semantic and geospatial mapping of instagram images in saint-petersburg. In Proceedings of the ANIL FRUCT 2016 Conference, St. Petersburg, Russia, 10–12 November 2016; IEEE: Washington, DC, USA, 2016; pp. 110–113. ISPRS Int. J. Geo-Inf. 2018, 7, 481 18 of 19

17. Adams, B.; Janowicz, K. On the geo-indicativeness of non-georeferenced text. In Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media, , Ireland, 4–7 June 2012; The AAAI Press: Palo Alto, CA, USA, 2012; pp. 375–378. 18. Hu, B.; Ester, M. Spatial topic modeling in online social media for location recommendation. In Proceedings of the 7th ACM Conference on Recommender Systems, Hong Kong, China, 12–16 October 2013; ACM: New York, NY, USA, 2013; pp. 25–32. 19. Hong, L.; Ahmed, A.; Gurumurthy, S.; Smola, A.J.; Tsioutsiouliklis, K. Discovering geographical topics in the twitter stream. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012; ACM: New York, NY, USA, 2012; pp. 769–778. 20. Ritter, A.; Etzioni, O.; Clark, S. Open domain event extraction from twitter. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; ACM: New York, NY, USA, 2012; pp. 1104–1112. 21. Imran, M.; Castillo, C.; Diaz, F.; Vieweg, S. Processing social media messages in mass emergency: Survey summary. In Proceedings of the Web Conference 2018, Lyon, France, 23–27 April 2018; ACM: New York, NY, USA, 2018; pp. 507–511. 22. Corley, C.D.; Dowling, C.; Rose, S.J.; McKenzie, T. Social sensor analytics: Measuring phenomenology at scale. In Proceedings of the 2013 IEEE International Conference on Intelligence and Security Informatics (ISI), Seattle, WA, USA, 4–7 June 2013; IEEE: Washington, DC, USA, 2016; pp. 61–66. 23. Weng, J.; Lee, B.-S. Event detection in twitter. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, , Spain, 17–21 July 2011; The AAAI Press: Palo Alto, CA, USA, 2011; pp. 401–408. 24. Schinas, M.; Papadopoulos, S.; Kompatsiaris, Y.; Mitkas, P. Event detection and retrieval on social media. arXiv, 2018; arXiv:1807.03675. 25. Alvanaki, F.; Michel, S.; Ramamritham, K.; Weikum, G. See what’s enblogue: Real-time emergent topic identification in social media. In Proceedings of the 15th International Conference on Extending Database Technology, Berlin, Germany, 26–30 March 2012; ACM: New York, NY, USA, 2012; pp. 336–347. 26. Zhang, X.; Chen, X.; Chen, Y.; Wang, S.; Li, Z.; Xia, J. Event detection and popularity prediction in microblogging. Neurocomputing 2015, 149, 1469–1480. [CrossRef] 27. Lee, C.-H. Mining spatio-temporal information on microblogging streams using a density-based online clustering method. Expert Syst. Appl. 2012, 39, 9623–9641. [CrossRef] 28. Petkos, G.; Papadopoulos, S.; Kompatsiaris, Y. Social event detection using multimodal clustering and integrating supervisory signals. In Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, Hong Kong, China, 5–8 June 2012; ACM: New York, NY, USA, 2012; p. 23. 29. Bao, B.-K.; Min, W.; Lu, K.; Xu, C. Social event detection with robust high-order co-clustering. In Proceedings of the 3rd ACM Conference on International Conference on Multimedia Retrieval, Dallas, TX, USA, 16–19 April 2013; ACM: New York, NY, USA, 2013; pp. 135–142. 30. Petkos, G.; Schinas, M.; Papadopoulos, S.; Kompatsiaris, Y. Graph-based multimodal clustering for social multimedia. Multimedia Tools Appl. 2017, 76, 7897–7919. [CrossRef] 31. Hu, Y.; John, A.; Wang, F.; Kambhampati, S. Et-lda: Joint topic modeling for aligning events and their twitter feedback. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, , ON, Canada, 22–26 July 2012; AAAI Press: Palo Alto, CA, USA, 2012; pp. 59–65. 32. Diao, Q.; Jiang, J. In A unified model for topics, events and users on twitter. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; ACL: Stroudsburg, PA, USA, 2013; pp. 1869–1879. 33. Wei, W.; Joseph, K.; Lo, W.; Carley, K.M. A bayesian graphical model to discover latent events from twitter. In Proceedings of the Ninth International AAAI Conference on Web and Social Media, Oxford, UK, 26–29 May 2015; AAAI Press: Palo Alto, CA, USA, 2015; pp. 503–512. 34. Zhou, D.; Chen, L.; He, Y. An unsupervised framework of exploring events on twitter: Filtering, extraction and categorization. In Proceedings of the Twenty-Ninth AAAI Conference on Artifical Intelligence, Austin, TX, USA, 25–30 January 2015; AAAI Press: Palo Alto, CA, USA, 2015; pp. 2468–2474. 35. Becker, H.; Naaman, M.; Gravano, L. Beyond trending topics: Real-world event identification on twitter. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain, 17–21 July 2011; AAAI Press: Palo Alto, CA, USA, 2011; pp. 438–441. ISPRS Int. J. Geo-Inf. 2018, 7, 481 19 of 19

36. Lee, R.; Sumiya, K. Measuring geographical regularities of crowd behaviors for twitter-based geo-social event detection. In Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks, San Jose, CA, USA, 3–5 November 2010; ACM: New York, NY, USA, 2010; pp. 1–10. 37. Paule, J.D.G.; Sun, Y.; Moshfeghi, Y. On fine-grained geolocalisation of tweets and real-time traffic incident detection. Inf. Process. Manag. 2018.[CrossRef] 38. Sakaki, T.; Okazaki, M.; Matsuo, Y. Earthquake shakes twitter users: Real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; ACM: New York, NY, USA, 2010; pp. 851–860. 39. Gao, Y.; Wang, S.; Padmanabhan, A.; Yin, J.; Cao, G. Mapping spatiotemporal patterns of events using social media: A case study of influenza trends. Int. J. Geogr. Inf. Sci. 2018, 32, 425–449. [CrossRef] 40. Abdelhaq, H.; Sengstock, C.; Gertz, M. Eventweet: Online localized event detection from twitter. In Proceedings of the VLDB Endowment, Trento, Italy, 26–30 August 2013; ACM: New York, NY, USA, 2013; pp. 1326–1329. 41. Panteras, G.; Wise, S.; Lu, X.; Croitoru, A.; Crooks, A.; Stefanidis, A. Triangulating social multimedia content for event localization using flickr and twitter. Trans. GIS 2015, 19, 694–715. [CrossRef] 42. Wachowicz, M.; Liu, T. Finding spatial outliers in collective mobility patterns coupled with social ties. Int. J. Geogr. Inf. Sci. 2016, 30, 1806–1831. [CrossRef] 43. Newman, M.E. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582. [CrossRef][PubMed] 44. Blondel, V.D.; Guillaume, J.-L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008. [CrossRef] 45. Moran, P.A. Notes on continuous stochastic phenomena. Biometrika 1950, 37, 17–23. [CrossRef][PubMed] 46. Anselin, L. Local indicators of spatial association—Lisa. Geogr. Anal. 1995, 27, 93–115. [CrossRef] 47. Rosner, B. Percentage points for a generalized esd many-outlier procedure. Technometrics 1983, 25, 165–172. [CrossRef] 48. Pukelsheim, F. The three sigma rule. Am. Stat. 1994, 48, 88–91. 49. Umbrella Movement. Available online: https://en.wikipedia.org/wiki/Umbrella_Movement (accessed on 11 September 2018). 50. Hong Kong Protests: Arrests as Admiralty Site Is Cleared. Available online: https://www.bbc.com/news/ world-asia-china-30426346 (accessed on 8 August 2018). 51. The Color Run Makes Long Awaited Debut at Asiaworld-Expo 16,000 Joins in the Color Fun. Available online: https://www.asiaworld-expo.com/news/detail/the-color-run-makes-long-awaited-debut-at- asiaworld-expo-16000-joins-in-the-color-fun/ (accessed on 15 August 2018). 52. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv, 2013; arXiv:1301.3781.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).