sustainability

Article Mining Public Opinion on Transportation Systems Based on Social Media Data

Dawei Li 1,2,*, Yujia Zhang 1,2,* and Cheng Li 3

1 Key Laboratory of Urban ITS, School of Transportation, Southeast University, 210096, China 2 Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic, Southeast University, Nanjing 210096, China 3 China Academy of Transportation Sciences, No.240, Huixinli, Chaoyang District, Beijing 100029, China * Correspondence: [email protected] (D.L.); [email protected] (Y.Z.)

 Received: 10 June 2019; Accepted: 17 July 2019; Published: 25 July 2019 

Abstract: Public participation plays an important role of traffic planning and management, but it is a great challenge to collect and analyze public opinions for traffic problems on a large scale under traditional methods. Traffic management departments should appropriately adopt public opinions in order to formulate scientific and reasonable regulations and policies. At present, while increasing degree of public participation, data collection and processing should be accelerated to make up for the shortcomings of traditional planning. This paper focuses on text analysis using large data with temporal and spatial attributes of social network platform. Web crawler technology is used to obtain traffic-related text in mainstream social platforms. After basic treatment, the emotional tendency of the text is analyzed. Then, based on the probabilistic topic modeling (latent Dirichlet allocation model), the main opinions of the public are extracted, and the spatial and temporal characteristics of the data are summarized. Taking as an example, the existing problems are summarized from the public opinions and improvement measures are put forward, which proves the feasibility of providing technical support for public participation in public transport with social media big data.

Keywords: traffic planning; public participation; big data; content analysis; spatiotemporal properties

1. Introduction The transportation industry is the leading and basic industry of national economic and social development, and an important guarantee for social and economic development and improvement of people’s living standards. Transportation plays an important role in the whole social development. Raising the level of transportation management can not only promote employment, expand domestic demand, and promote social and economic development and urbanization, but also effectively improve the utilization of social resources, actively improve the environment, and facilitate people’s travel. In some areas, transportation administration departments have begun to try to introduce public participation. The public is the direct beneficiary and experiencer of transportation management and services. The introduction of public participation system in transportation management can better satisfy people’s interests and needs by drawing on public opinions and making decisions, supervising, and evaluating transportation planning. At present, the public participation system in China’s transportation management is still in early stages, and the transparency and openness of information such as transportation decision-making and supervision are insufficient. The mechanism of public participation is not perfect, the consciousness of public participation is weak, and the channels of participation are scarce. Therefore, actively exploring the countermeasures to improve the public participation system in transportation management is an urgent problem to be solved in the current transportation management.

Sustainability 2019, 11, 4016; doi:10.3390/su11154016 www.mdpi.com/journal/sustainability Sustainability 2019, 11, x FOR PEER REVIEW 2 of 14

consciousness of public participation is weak, and the channels of participation are scarce. Therefore, Sustainabilityactively exploring2019, 11, 4016 the countermeasures to improve the public participation system in transportation2 of 15 management is an urgent problem to be solved in the current transportation management. InIn recent recent years,years, thethe emergenceemergence of of big big d dataata analysis analysis technology technology provides provides us uswith with new new ideas ideas and and methods to acquire and process traffic data [1]. The purpose of this paper is to apply the content methods to acquire and process traffic data [1]. The purpose of this paper is to apply the content analysis method of big data technology to traffic planning field, and put forward a set of processes analysis method of big data technology to traffic planning field, and put forward a set of processes and and methods combined with it, so as to provide feasible large data analysis methods for traffic methods combined with it, so as to provide feasible large data analysis methods for traffic planners. planners. This study focuses on text data that people often ignore, trying to obtain public views and This study focuses on text data that people often ignore, trying to obtain public views and opinions on opinions on the traffic system from comment texts, and then apply them to public participation in the traffic system from comment texts, and then apply them to public participation in transportation. transportation. In doing so, it can improve the extent of public participation, and speed up the Incollection doing so, and it can processing improve theof information. extent of public The participation, effective information and speed in uppublic the collectionopinion can and be processing made offull information. use of to analyze The eff ectivethe spatial information and temporal in public characteristics opinion can beand made make full up use forof the to blind analyze spots the that spatial andtraditional temporal planning characteristics cannot achieve. and make up for the blind spots that traditional planning cannot achieve. TheThe analysisanalysis processprocess is shown in in Figure Figure 1.1. Firstly, Firstly, public public opinions opinions are are collected collected from from common common socialsocial network network platforms platforms such as as microblog microblog by by using using open open platform platform SDK SDK and and API, API, and and do do basic basic processingprocessing such such asas de-duplicationde-duplication and word word segmentation. segmentation. As As one one of of the the most most commonly commonly used used social social mediamedia platformsplatforms inin China,China, data in microblog have have large large mining mining and and research research value. value. However, However, as as shownshown in in the the next next section, section, the the previous previous studies studies of microblog of microblog data data in tra inffi trafficc field field rarely rarely focus focus on specific on textspecific semantic text semantic analysis analysis and sentiment and sentiment analysis, analysis, and therefore and therefore public public comments comments and feedbackand feedback cannot becannot collected. be collected. Thus, in Thus this, study,in this study we apply, we apply a content a content classification classification method method to extract to extract valuable valuable traffi c commentstraffic comments from the from whole the data whole and data conduct and conduct emotional emotional orientation orientation analysis. analysis. The latent The Dirichlet latent allocationDirichlet a (LDA)llocation topic (LDA) model topic is usedmodel to is mine used the to topicmine ofthe text topic and of analyze text and the analyze subjective the subjective information behindinformation the text. behind Finally, the text. the Finally, temporal the temporal and spatial and characteristicsspatial characteristics of the of text the are text described. are described. After summarizingAfter summarizing the above the analysis above analysisresults, corresponding results, corresponding improvement improvement measures measures are put forward. are put forward.

Data Collection Data Preprocessing Data Classification Data Modeling Feedback Data Basic Processing Basic Data Screen Sentiment Duplication Optimization Text Analysis Data Related to Compress Spatiotemporal Analysis Website Traffic Analysis Results Comments LDA Topic Deletion Modeling Interaction

Figure 1. PublicPublic o opinionpinion a analysisnalysis p procedurerocedure for for traffic traffi pclanning. planning.

TheThe structure structure of of this this paper is as follows: T Thehe first first chapter chapter mainly mainly introduces introduces the the research research backgroundbackground and and significance significance of the of subject, the subject, and thenand lists then the lists research the research content and content chapter and organization. chapter Theorganization. second chapter The second is about chapter the research is about status the research at home status and abroad,at home andand summarizesabroad, and summarizes the limitations andthe shortcomings limitations and of public shortcomings participation. of public Chapter participation. 3 is an introduction Chapter 3 tois the an methodology introduction of tocontent the analysis.methodology Chapter of content 4 takes analysis. Nanjing MetroChapter System 4 takes as Nanjing an example Metro to System collect publicas an example opinions to on collect Nanjing Metropublic System opinions for content on Nanjing analysis Metro and System topic modeling. for content Chapter analysis 5 provides and topic suggestions modeling. and Chapter conclusions 5 forprovides the improvement suggestions measures and conclusions of Nanjing for metro the improvement system according measures to the of analysis Nanjing results. metro system according to the analysis results. 2. Literature Review 2. Literature Review With the development of economy and society, the number of urban populations keeps increasing, whichWith leads the to the development continuous of increase economy of people’s and society, demand the for number transportation. of urban In populations recent years, keeps traffi c congestion,increasing, which environmental leads to the pollution, continuous and increase other problemsof people’ causeds demand by for urban transportation. traffic have In gradually recent emerged. The formulation and implementation of urban traffic policy will directly affect the allocation of urban traffic resources and the long-term development of urban traffic. As a group affected by traffic Sustainability 2019, 11, 4016 3 of 15 policy, the public should participate in the formulation of traffic policy in an appropriate way to make public policy scientific and reasonable. Booth et al. argued that public participation is top-down in the formulation of transport policy, and to some extent it is insufficient [2]. Bickerstaff et al. studied public participation in urban traffic planning and believed that public participation should be introduced as early as possible in decision-making planning, finding out the causes of different situations, and giving timely feedback [3]. Kennedy et al. argued that in transportation management, public needs should be taken as one of the elements of decision-making. The introduction of public participation legally, reasonably, and actively is not only conducive to decision-making, but also conducive to the realization of public interests [4]. Banister believes that in order to ensure the acceptability of traffic decision-making, it is far from enough to define the form of public participation as propaganda and consultation. Public participation should be actively introduced to educate the public through community or stakeholder social groups so as to make the public realize the importance of participation and truly participate in the decision-making process [5]. Santos et al. put forward the coordination and cooperation between relevant institutions and decision makers to integrate environment and transportation resources so as to make the development goals of transportation consistent with social development goals [6]. Gil researched that more and more stakeholders are involved in transportation management issues, and participants’ control ability is gradually enhanced, and their influence is gradually expanded [7]. Schlag et al. analyzed the stakeholders of urban passenger transport policy considering the environmental impact factors, and put forward the road toll acceptance model, involving passengers, drivers, and management departments, in order to improve the acceptability of urban passenger transport policy [8]. Rowe established a model of public participation in the research of relevant scholars, through which to evaluate the effect of public participation [9]. Konisky and others have constructed a framework similar to the Thomas model, which involves participants, expected outcomes, decision-making bodies, and a certain type of public and expected outcomes in the process of participation [10]. Figueredo establishes a model for evaluating the effectiveness of public participation in urban transport planning by investigating public participation activities in the U.S. Department of Transportation [11]. Kramer (2008) proposed performance indicators for evaluating the effectiveness of public participation activities at the Center for Urban Transportation Research in Florida [12]. Susilo et al. (2009) found that fairness and acceptability are two factors that the public believes have a greater impact on the implementation of urban passenger transport policy through interviews and diary surveys in Indonesia [13]. In recent years, the big data provide more opportunities to better understand travelers’ behavior and the transportation systems [14–18]. Social media is an important data source in social transportation research. Research and application of traffic based on social media big data is in the ascendant. Zeng et al. pointed out that social media information can provide traffic warning signals and road condition information prediction [19]. Wanicbayapong et al. developed a traffic information collection and classification system based on Twitter data [20]. Endarnoto et al. developed a Twitter traffic information acquisition system and designed an Android mobile software to display traffic information [21]. D’ Andrea et al. developed a real-time traffic information monitoring system based on Twitter information flow, which can detect traffic information before news websites publish the same information [22]. Hasan et al. use social media data to analyze traveler activity patterns [23]. Gu et al. developed a traffic incident detection system based on social media and applied it in two cities [24]. Kuflik et al. proposed a framework for extracting traffic-related information from social media information [25]. Rashidi et al. discussed the behavior and challenges of social media data in mining human travel behavior [26]. These application methods of social media data mostly focus on traffic status and traffic incidents detection, ignoring the text semantic analysis of the collected data. Therefore, it cannot reflect the severity of traffic incidents and public feedback. Microblog, as a new social media, has been widely accepted by the public, and the amount of data per day has increased explosively. This provides a new research field for natural language processing Sustainability 2019, 11, 4016 4 of 15 and a large number of new forms of commentary text. However, as a short text with short length, strong emotion, and a single topic, microblogs need new technical means to understand the contents and tendencies. Sentiment analysis refers to processing and analyzing texts with emotions, which is a frontier research field in natural language processing. Combined with microblogs, a new social media on the Internet, it has important practical value. Content analysis of microblogs can track users’ attention and comment tendency to current hot topics. Through content analysis of comments on hot topics in microblog, it can provide managers with an effective tool to understand people’s feelings and guide public opinion.

3. Methodology Content analysis is to mine the deep meaning of text. Text is the carrier of the author’s intention. Important information such as opinions and positions expressed by people in the text can be inferred through content analysis. In order to obtain the public’s evaluation of the traffic system and emotional tendency, this paper applies content analysis to the public’s traffic-related comments.

3.1. Data Collection Data collection is a technique for directionally grabbing structured data from search engines or data sources. Sina microblog has a large number of users, known as the “sensor” of social phenomena, which has an open data platform and more effective information. With the rapidly development of metro networks, the urban transportations heavily depend on the metro systems in some large cities of China, such as Nanjing [27,28]. Particularly, the rapid popularizations of free-floating sharing bicycle systems in recent years, which provide the conveniences of accessing the metro stations, further increase the travel mode shares of metro systems [29–31]. Therefore, we take “Nanjing Metro” as the search keyword to filter the relevant microblogs. Web crawler is applied for collecting the content of searched webpages, which includes microblog author, release time, and microblog text. The time range is from January 2014 to April 2018.

3.2. Data Preprocessing The textual comments obtained are generally large in number and contain a large amount of content that is not related to the subject or with low research value. Therefore, some basic simple preprocessing is required to remove the meaningless information. The main steps of preprocessing include text removal, mechanical compression, and short sentence deletion. There are a large number of advertising or promotional text on some social platforms, which are published or repeated for many times. In addition, users often choose to repost relevant text to express their own opinions, which causes repetition in content. Useless text is required to be deleted before next step.

3.3. Chinese Word Segmentation Chinese word segmentation refers to dividing a Chinese text into individual words due to no spaces as separators in Chinese writing. If the frequency between adjacent words is high, the greater the probability that the system will recognize it as a word. The principle of word segmentation algorithm based on statistical learning is to use adjacent probability of occurrence to reflect the credibility and accuracy of discriminating it as a word. There are many functional words that have no practical meaning in text writing, which are called “stop words” in content analysis. According to Internet resources, we obtain a complete stop words list, which needs to put the keywords used for searching microblogs. Sustainability 2019, 11, 4016 5 of 15

3.4. Text Categorization Text categorization is defined as the categorization of several texts into two or more categories according to requirements or predetermined rules. Text categorization is an important application of supervised learning in machine learning. This paper adopts support vector machine (SVM) classification algorithm to implement text categorization. The basic principle is to find an optimal decision plane so that it can segment two classified data points with the best effect, which is the most recognized text classification method at present. The feature selection method used in this paper is TF-IDF algorithm. TF refers to word frequency and IDF refers to inverse document frequency, which is given as follows:

ni,j TFi,j = (1) Σknk,j where the numerator is the number of occurrences of the word in the file and the denominator is the sum of the occurrences of all words in the file. Inverse document frequency is a measure of the universality and importance of a word, which is defined as follows:

D IDF = lg | | (2) i 1 + d D : t d | ∈ ∈ | where D is the total number of files in the corpus, and then the denominator is the number of | | documents containing the word. In general, we add 1 to denominator to avoid it equaling zero. Then, we calculate the product of TF and IDF. TF-IDF tends to filter out common words and retain important words.

3.5. LDA Topic Model Latent Dirichlet allocation (LDA) topic model was proposed by David Blei, Andrew Ng, and Michael I. Jordan [32]. It is also known as the three-layer Bayesian probability model: A three-layer structure of vocabulary, topic, and text. LDA topic model has excellent dimension reduction ability, which can reduce the original high-dimensional word space to a small topic space composed of a group of topics. For short text like microblogs, words in the text are very limited. The probability of the same word in two different short texts is low. It is difficult to accurately calculate the similarity between texts by using the traditional vector representation method characterized by words or phrases. For microblog text with poor standardization of language and a large number of new vocabularies, a topic model such as LDA is more suitable for accurate calculation in the uncertain environment. We suppose the vocabulary size is M. An M-dimensional vector w = (1, 0, 0, , 0, 0) represents a ··· word. The text is represented as a set of N words d = (w , w , , wN). Then, the comments set D consists 1 2 ··· of L comments: D = (d , d , , dL). There are K topics in L comments, expressed as z (1, 2, , K) 1 2 ··· i ··· The LDA theme model is shown in Figure2, where α and β are priori parameters of the Dirichlet distribution. θ is the multi-distribution parameter of the subject in the document, obeying the Dirichlet prior distribution of the hyper parameter α. ϕ is the multiple distribution parameter of the word in the subject, obeying the Dirichlet prior distribution of the hyper parameter β. LDA topic model assumes that each text is randomly combined in a specific proportion by its potential individual topics. The proportion of the composition obeys the polynomial distribution:

Z θ = Multionomial(θ) (3) | Each theme is randomly combined with the vocabulary in the word bag according to a specific ratio. The proportion of the composition is also subject to polynomial distribution:

W Z, φ = Multionomial(φ) (4) | Sustainability 2019, 11, 4016 6 of 15 Sustainability 2019, 11, x FOR PEER REVIEW 6 of 14

Figure 2. SchematicSchematic diagram diagram of of latent latent Dirichlet Dirichlet allocation allocation ( (LDA)LDA) topic model model..

LDAThus, topic the probability model assumes of generating that each the text word is randomlywi with the combined comment ind jacan specific be expressed proportion as: by its potential individual topics. The proportion of the composition obeys the polynomial distribution: XK     P wi dj = P(wi z = s) P z = s dj (3(5)) 푍|휃 = 푀푢푙푡푖표푛표푚푖푎푙| × (휃). s=1 Each theme is randomly combined with the vocabulary in the word bag according to a specific   th ratio.where TheP(w proportioni z = s) indicates of the composition the probability is also that subject the word to polynomialwi belongs distribution: to the s topic and P z = s dj | indicates the probability of the sth topic in the comment d . 푊|푍, 휙 = 푀푢푙푡푖표푛표푚푖푎푙j (휙). (4) Approximate estimation of the parameters θ and φ in the model is required while establishing LDAThus, topic the model. probability The parameter of generating of the polynomialthe word 푤 distribution푖 with the comment of the word d푗 wcani in be the expressed subject z sas:is φs,i. The multi-distribution parameter of the topic퐾 zs in the comment dj is θj,s. The estimated formulas are as follows:   푃(푤푖|푑푗) = ∑ 푃(푤푖|푧 =V푠) × 푃(푧 = 푠|푑푗) (5) X  φ = (n + β )/ n + β  (6) s,i s=1 s,i i  s,i i i=1 푡ℎ where 푃(푤푖|푧 = 푠) indicates the probability that the word 푤푖 belongs to the 푠 topic and 푡ℎ  K  푃(푧 = 푠|푑푗) indicates the probability of the 푠 topic X in the comment d푗. θ = n + αs / n + αs (7) Approximate estimation of the jparameters,s j,s θ and φ inj, sthe model is required while establishing s=1 LDA topic model. The parameter of the polynomial distribution of the word 푤푖 in the subject 푧푠 is where ns,i represents the number of occurrences of the word wi in the topic zs and nj,s represents the 휙푠,푖 . The multi-distribution parameter of the topic 푧푠 in the comment d푗 is 휃푗,푠 . The estimated number of topics z included in the comment d . formulas are as follows:s j 4. Data Analysis and Results 푉 (6) In this section, we set Nanjing휙푠,푖 metro= (푛 system푠,푖 + 훽 as푖)/ the(∑ research푛푠,푖 + object훽푖) and apply the method above to collect and analyze social media information in order to푖= obtain1 public opinions for the metro system and summarize the spatiotemporal properties. 퐾 휃푗,푠 = (푛푗,푠 + 훼푠)/ (∑ 푛푗,푠 + 훼푠) (7) 4.1. Classification Results 푠=1 After data collection and basic preprocessing, LibSVM is used to classify all text data. Among where 푛푠,푖 represents the number of occurrences of the word w푖 in the topic 푧푠 and 푛푗,푠 represents the total of 50,970 pieces of data extracted, traffic evaluation accounts for 13,021 (25.5%), information the number of topics 푧푠 included in the comment d푗. reporting accounts for 9501 (18.6%), traffic demand accounts for 9557 (18.8%), and irrelevant data 4.accounts Data Analysis for 18,891 and (37.1%). Results The proportion of each of the categories is shown in the Figure3. In this section, we set Nanjing metro system as the research object and apply the method above to collect and analyze social media information in order to obtain public opinions for the metro system and summarize the spatiotemporal properties.

4.1. Classification Results After data collection and basic preprocessing, LibSVM is used to classify all text data. Among the total of 50,970 pieces of data extracted, traffic evaluation accounts for 13,021 (25.5%), information Sustainability 2019, 11, x FOR PEER REVIEW 7 of 14 Sustainability 2019, 11, 4016 7 of 15 reporting accounts for 9501 (18.6%), traffic demand accounts for 9557 (18.8%), and irrelevant data accountsSustainability 2019, 11, x FOR PEER REVIEW 7 of 14

reporting accounts for 9501 (18.6%), traffic demand accounts for 9557 (18.8%), and irrelevant data accounts for 18,891 (37.1%). The proportion of each of the categories is shown in the Figure 3.

Figure 3. TextText classification classification results results..

Figure 3. Text classification results. 4.2. Sentiment Sentiment Analysis For4.2. Sentimentthe the traffic traffi Analysisc evaluation evaluation text text classified classified above, above, we we use use ROST ROST Content Content Mining Mining System System Version Version 6.0 6.0 (ROSTCM 6) for sentiment analysis. Through sentiment analysis, positive emotions accounts for (ROSTCMFor 6) thefor trafficsentiment evaluation analysis. text classifiedThrough above, sentiment we use analysis, ROST Content positive Mining emotions System accounts Version for6.0 4660 (56.89%),4660(ROSTCM (56.89%), neutral neutral6) for emotions sentiment emotions accounts analysis. accounts Through for for 1468 1468 sentiment (17.92%) (17.92%), analysis,, and and negative positive negative emotions emotions emotions accounts accounts accounts for 4660 for for 2063 (25.19%).(56.89%), The detailed neutral emotions statistical accounts results forare 1468 shown (17.92%), in Figure and4 4. negative. emotions accounts for 2063

FigureFigure 4. 4.Results Results of sentiment analys analysis.is. Figure 4. Results of sentiment analysis. 4.3. LDA Topic Model Analysis 4.3. LDA Topic Model Analysis 4.3. LDA Topic Model Analysis LDALDA topic topic analysis analysis of positive of positive and negative and negative comments comments on Nanjing on Nanjing metro is metro conducted, is conducted respectively., WhenLDArespectively. building topic the When analysis model, building we of take positive the themodel, following and we take negative valuesthe following commentsfor the values parameters on for Nanjingthe ofparameters the model:metro of the isDirichlet model: conducted prior, parametersrespectively.Dirichlet take Whenprior empirical parameters building values, the take model, empirical respectively we values,take theα respectively= following50⁄K, β = αvalues0.1. = 50⁄K, The for β text =the 0.1. isparameters The clustered text is clusteredof into the 3 model: topics, whileDirichletinto 10 words3prior topics, parameters with while the 10 highest words take empirical probabilitywith the highest values, of occurrence pro respectivelybability and of occurrence theirα = 50⁄K, probabilities andβ = their0.1. Theareprobabilities outputtext is asclustered are results. Tableinto 3output1 topics,shows as while theresults. potential 10 Table words 1 topics shows with of thethe public potentialhighest opinions protopicsbability onof Nanjingpublic of occurrenceopinions metro. on Table andNanjing2 their shows metro. probabilities the Table potential 2 are shows the potential topics of negative opinions. outputtopics of as negative results. opinions.Table 1 shows the potential topics of public opinions on Nanjing metro. Table 2 shows the potential topics of negative opinions. TableTable 1. 1.Positive Positive potential potential topic of of Nanjing Nanjing metro metro.. Topic1 Probability Topic2 Probability Topic3 Probability Topic1 ProbabilityTable 1. Positive Topic2 potential topic Probability of Nanjing Topic3metro. Probability Carriage 0.0166 Open 0.0316 Check 0.0235 Topic1Eat Carriage Probability0.0098 0.0166 Topic2Line1 Open Probability 0.03160.0107 CheckWorkTopic3 0.02350.0133Probability SeatEat 0.0091 0.0098 minute Line1 0.01070.0099 Passengers Work 0.01330.0123 Carriage Seat0.0166 0.0091 Open minute 0.00990.0316 PassengersCheck 0.0123 0.0235 Cold 0.0074 Line 0.0093 Culture 0.0102 Eat Cold0.0098 0.0074 Line1 Line 0.00930.0107 CultureWork 0.0102 0.0133 Conditioner 0.0068 Intercity 0.0079 Quality 0.0098 SeatConditioner 0.0091 0.0068 minute Intercity 0.00790.0099 QualityPassengers 0.0098 0.0123 Passengers 0.0067 Time 0.0077 Line 0.0095 Cold Passengers0.0074 0.0067 Line Time 0.00770.0093 LineCulture 0.0095 0.0102 LineLine 0.0066 0.0066 Operation Operation 0.00760.0076 GetGet off off 0.00540.0054 ConditionerLine1 0.0068 0.0063 Intercity Line3 0.00740.0079 Line3Quality 0.0049 0.0098 PassengersBridge 0.0067 0.0061 Time Plan 0.00710.0077 ConvenientLine 0.0049 0.0095 Line Empty0.0066 0.0055 Operation Train 0.0070.0076 ThankGet off 0.0046 0.0054 Sustainability 2019, 11, x FOR PEER REVIEW 8 of 14

Sustainability 2019 11 Line1 , , 4016 0.0063 Line3 0.0074 Line3 0.0049 8 of 15 Bridge 0.0061 Plan 0.0071 Convenient 0.0049 Empty 0.0055 Train 0.007 Thank 0.0046 Table 2. Negative potential topic of Nanjing metro.

Topic1 ProbabilityTable 2. Negative Topic2 potential Probabilitytopic of Nanjing metro Topic3. Probability

Topic1Passengers Probability 0.0200 Topic2 Check Probability 0.0204 CarriageTopic3 0.0134Probability Late 0.0165 Conditioner 0.0176 Stuff 0.0118 PassengersWorry 0.0200 0.0161 Check Line1 0.01040.0204 QualityCarriage 0.00980.0134 LateFault 0.0165 0.0150 Conditioner Hot 0.01020.0176 EatStuff 0.00970.0118 WorryRain 0.0161 0.0148 Line1 Seat 0.0090.0104 WorkQuality 0.00860.0098 FaultStop 0.0150 0.0136 AgainHot 0.00840.0102 StationEat 0.00750.0097 RainLine1 0.0148 0.0128 Seat Die 0.00780.009 ThingsWork 0.00520.0086 StopOverhaul 0.0136 0.0118 CarriageAgain 0.00760.0084 Line1Station 0.00520.0075 Broken 0.0106 Slow 0.0075 Drink 0.0048 Line1 0.0128 Die 0.0078 Things 0.0052 Explain 0.0105 Line2 0.0073 Train 0.0048 Overhaul 0.0118 Carriage 0.0076 Line1 0.0052 Broken 0.0106 Slow 0.0075 Drink 0.0048 FromExplain the potential0.0105 topics of positiveLine2 comments, the0.0073 high-frequency Train words in Topic0.0048 1 mainly reflectFrom the situationthe potential that topics there areof positive more seats comments, in the subway the high-frequency when the subway words is notin Topic crowded, 1 mainly and thereflect cabin the issituation carriage. that Topic there 2 are mainly more showsseats in the the public’s subway eager when expectationthe subway andis not concern crowded, for and the constructionthe cabin is andcarriage. opening Topic of new 2 mainly metrolines. shows Topic the 3public shows’s that eager Nanjing expectation metro is and convenient. concern Cultural for the atmosphereconstruction is and good opening and passengers’ of new metro quality lines. is high. Topic From 3 shows the potential that Nanjing topicsof metro negative is convenient. comments, theCultural high-frequency atmosphere words is good in Topicand passengers’ 1 show the quality inconvenience is high. From and troubles the potential for passengers topics of negative facing a subwaycomments, operation the high-frequency failure and concernswords in aboutTopic 1 trip show delays. the inconvenience For example, and the troubles faults of for line passengers 1 is more serious.facing a Topicsubway 2 shows operation theideas failure of and air conditioning concerns about in metro trip delays. carriages. For Topicexample, 3 shows the faults the exposure of tois uncivilizedmore serious. behavior Topic in2 shows the carriages, the ideas such of asair eating conditioning or drinking in metro something. carriages. The modelingTopic 3 shows results the of positiveexposure and to negativeuncivilized comments behavior are in shown the carriages, in Figures such5 and as6, respectively. eating or drinking The two something. graphs show The themodeling probability results distribution of positive ofand words negative under comments three topics. are shown Ten words in Figures with the5 and highest 6, respectively. probability The of occurrencetwo graphs are show shown the in probability Tables1 and distribution2 above. of words under three topics. Ten words with the highest

(a) Keywords in topic 1

(b) Keywords in topic 2

(c) Keywords in topic 3

Figure 5. Probability distribution of words in positive opinions.opinions. Sustainability 2019, 11, 4016 9 of 15 SustainabilitySustainability 20192019,, 1111,, xx FORFOR PEERPEER REVIEWREVIEW 99 ofof 1414

(a)(a) KeywordsKeywords inin topictopic 11

(b)(b) KeywordsKeywords inin topictopic 22

(c)(c) KeywordsKeywords inin topictopic 33 Figure 6. Probability distribution of words in negative opinions. Figure 6. Probability distribution of words in negative opinions opinions..

4.4.4.4. Spatiotemporal Spatiotemporal Properties Properties

4.4.1. Temporal Distribution of Text Data 4.4.1.4.4.1. TemporalTemporal DDistributionistribution ofof TTextext DDataata According to the keywords obtained from above analysis, we select two hot topics of high public AccordingAccording toto thethe keykeywordswords obtainedobtained fromfrom aboveabove analysis,analysis, wewe selectselect twotwo hothot topicstopics ofof highhigh publicpublic discussion as research objects: Metro security check and metro operation faults. We add time sequence discussiondiscussion as as research research objects: objects: MMetroetro security security check check and and metro metro operation operation faults. faults. We We add add time time labels for them and the topic attention degree changing with time is shown in Figures7 and8. sequencesequence labelslabels forfor themthem andand thethe topictopic attentionattention degreedegree changingchanging withwith timetime isis shownshown inin FigureFiguress 77 andand 88.. Topic:Topic: SecuritySecurity CheckCheck 500500

400400

300300

200200

100100

00

Number Number Microblog of Number Number Microblog of Year/Month Year/Month FigureFigure 7.7. “Security“Security Check” Check” topic topic discussion discussion changes changes over over time time.time.. Sustainability 2019, 11, 4016 10 of 15 Sustainability 2019, 11, x FOR PEER REVIEW 10 of 14

Topic: Operation Fault 300 250 200 150 100 50

0 Number Number Microblog of Year/Month FigureFigure 8. 8.“Operation “Operation Fault” Fault” topic topic discussion discussion changes changes over over time. time.

AccordingAccording to to Figure Figure7 ,7 the, the discussion discussion on on the the topic topic metro metro security security check check peaked peaked in in August August 2014, 2014, AugustAugust 2016, 2016, and and August August 2017. 2017. Nanjing Nanjing metro metro and and Nanjing Nanjing Public Public Security Security Bureau Bureau executed executed security security checkschecks for for all all lines lines during during the the Youth Youth Olympic Olympic Games Games in in 2014, 2014, 1 1 September September 2016, 2016, and and 21 21 AugustAugust 2017,2017, whichwhich led led to to a a public public discussion discussion of of security security check check and and the the impact. impact. FigureFigure8 8shows shows that that when when faults faults occur,occur, public public often often chooses chooses to to post post an an instant instant blog blog on on social social platforms platforms such such as as Sina Sina Microblog. Microblog.

4.4.2.4.4.2. SpatialSpatial Properties Properties WeWe count count the the numbers, numbers, dates, dates, and and locations locations of of the the metro metro line line faults faults during during the the study study period, period, as as shownshown in in Table Table3. 3.

Table 3. Faults information of Nanjing metro. Table 3. Faults information of Nanjing metro.

DateDate LocationLocation AmountAmount Date Date Location Location Amount Amount 2014.5.132014.5.13Andemen Andemen 7979 2014.5.23 2014.5.23 Hedingqiao Hedingqiao 42 42 2014.8.232014.8.23 Hongshan Hongshan Zoo Zoo 101101 2014.12.1 2014.12.1 Jinma Jinma Road Road 54 54 2015.4.32015.4.3 FuqiaoFuqiao 6868 2015.5.29 2015.5.29 Tianlongsi Tianlongsi 45 45 2015.6.192015.6.19 MaqunMaqun 7171 2016.6.21 2016.6.21 Tianlongsi Tianlongsi 90 90 2016.7.26 Hedingqiao 78 2016.9.3 Kazimen 43 2016.7.26 Hedingqiao 78 2016.9.3 Kazimen 43 2016.10.21 Huashenmiao 211 2016.10.26 Xinmofanmalu 299 2016.10.21 Huashenmiao 211 2016.10.26 Xinmofanmalu 299 2016.11.30 Zhongsheng 30 2016.12.13 Huashenmiao 203 2016.11.30 Zhongsheng 30 2016.12.13 Huashenmiao 203 2017.5.25 Lingshan 31 2017.12.13 Maigaoqiao 102 2018.1.262017.5.25 Minggugong Lingshan 33 31 2017.12.13 Maigaoqiao 102 2018.1.26 Minggugong 33 After screening the comment text related to “security check” and “operation fault”, LDA topic modelAfter analysis screening is carried the comment out through text the related above to process, “security and check” the text and is “operation clustered into fault”, two LDA topics. topic The modelkeywords analysis and istheir carried probabilities out through under the each above topic process, are shown and the in text Tables is clustered 4 and 5. into two topics. The keywords and their probabilities under each topic are shown in Tables4 and5. Table 4. Potential topics of text related to “security check”. Table 4. Potential topics of text related to “security check”. Topic1 Probability Topic2 Probability Topic1Bag Probability0.0445 Line Topic2 0.0212 Probability BagStaff 0.04450.0164 Arrival Line 0.0115 0.0212 Staff Brand 0.01640.0117 Queue Arrival 0.0091 0.0115 BrandWork 0.01170.0111 Time Queue 0.0075 0.0091 WorkInstrument 0.01110.0100 Work Time 0.0074 0.0075 InstrumentCheck 0.01000.0089 Peak Work 0.0071 0.0074 CheckLine 0.00890.0087 Coordination Peak 0.0069 0.0071 LineThink 0.00870.0068 CoordinationStation 0.0067 0.0069 Think 0.0068 Station 0.0067 Free 0.0068 Passenger 0.0066 Free 0.0068 Passenger 0.0066 Machine 0.0065 Understand 0.0065 Machine 0.0065 Understand 0.0065

Table 5. Potential topics of text related to “operation fault”.

Topic1 Probability Topic2 Probability Sustainability 2019, 11, 4016 11 of 15

Table 5. Potential topics of text related to “operation fault”.

Topic1 Probability Topic2 Probability Line1 0.0198 Rain 0.0213 Sustainability 2019Conditioner, 11, x FOR PEER REVIEW 0.0111 Line1 0.0142 11 of 14 Equipment 0.0102 Train 0.012 Train 0.008 Stop 0.0118 Line1 0.0198 Rain 0.0213 Stop 0.0077 Passenger 0.0109 Conditioner 0.0111 Line1 0.0142 Hot 0.007 Temporary 0.0107 Line2Equipment 0.00690.0102 Train Late 0.012 0.0101 RechargeTrain 0.00670.008 Stop Times 0.0118 0.0094 ElevatorStop 0.00670.0077 Passenger Line3 0.0109 0.0091 EveryHot 0.00660.007 Temporary Recovery 0.0107 0.0089 Line2 0.0069 Late 0.0101 Recharge 0.0067 Times 0.0094 From Table4, the high-frequencyElevator words0.0067 in Topic 1Line3 mainly reflect the0.0091 public’s strong discussion on the news report: In NanjingEvery metro, a0.0066 famous-brand Recovery bag can be exempted0.0089 from security checker. After theFrom news Table was 4, issued, the high-frequency Nanjing Metro words Department in Topic responded1 mainly reflect to it, the saying public that’s securitystrong discussion checks will strictlyon the implement news report: the In policy Nanjing of “everymetro, Baga famous-brand must be checked”. bag can Regardlessbe exempted of from brand, security all bags checker. should beAfter subjected the news to security was issued, checks Nanjing consciously, Metro Department and it will notresponded be treated to it, di sayingfferently that because security of checks its high price.will Thestrictly high-frequency implement the words policy in Topicof “every 2 mainly Bag reflectmust be the checked”. specific impactRegardless on public of brand, travel all after bags the implementationshould be subjected of security to security checks checks on the consciously, whole lines and of subway,it will not such be treated as too differently long of a wait because for security of its inspectionhigh price. in queuesThe high-frequency travel time duringwords in rush Topic hours 2 mainly being reflect too long, the etc.specific However, impact moston public of the travel public stillafter express the implementation their understanding of security and activechecks cooperation on the whole on lines the of security subway, check such policy. as too long of a wait forAs security can be inspection seen from in Table queues5, Topic travel 1 time mainly during reflects rush the hours equipment being too failure long, etc. problems However, encountered most of the public still express their understanding and active cooperation on the security check policy. by the public, such as complaints about air-conditioning failure in carriages or not being opened, As can be seen from Table 5, Topic 1 mainly reflects the equipment failure problems encountered damage of elevators and recharging machines for Metro cards. The high-frequency words in Topic 2 by the public, such as complaints about air-conditioning failure in carriages or not being opened, mainly reflect that passengers are blocked from traveling and late for work due to the malfunction of damage of elevators and recharging machines for Metro cards. The high-frequency words in Topic 2 Metro operation, and also reflect that Nanjing Metro is prone to malfunction under the influence of mainly reflect that passengers are blocked from traveling and late for work due to the malfunction of badMetro weather operation, such as and rain, also which reflect needs that Nanjing the attention Metro of is relevant prone to departments. malfunction under the influence of badWe weather filter relevant such as microblogsrain, which forneeds metro the operationattention of faults relevant above. departments. We use the names of metro stations to determineWe filter the relevantlocation ofmicroblogs fault and for measure metro theoperation severity faults of the above. fault withWe use the numberthe names of microblogsof metro generatedstations byto determine public, which the location is shown of infault Figure and9 .measure the severity of the fault with the number of microblogs

FigureFigure 9. 9.Distribution Distribution of location for for metro metro fault fault..

The location of the bubble in the figure indicates the location of the accidents mentioned in microblogs. The bubble size indicates the number of related microblogs. From the figure, the number of faults in metro line 1 is more intensive. We enlarge the image of the relevant area to Figure 10 and add the time labels. It is known that there are many accidents in metro line 1 from Andemen Station Sustainability 2019, 11, 4016 12 of 15

The location of the bubble in the figure indicates the location of the accidents mentioned in microblogs. The bubble size indicates the number of related microblogs. From the figure, the number of faults in metro line 1 is more intensive. We enlarge the image of the relevant area to Figure 10 and add the time labels. It is known that there are many accidents in metro line 1 from Andemen Sustainability 2019, 11, x FOR PEER REVIEW 12 of 14 Station to Hedingqiao Station, which has affected the trip of public and caused unsafe incidents such as confusionto Hedingqiao and panic. Station, which has affected the trip of public and caused unsafe incidents such as

Figure 10.FigureDetailed 10. Detailed distribution distribution of faults of in faults metro in line metro 1. line 1.

5. Conclusions5. Conclusion and Discussion and Discussion Based onBased the results on the ofresults model of analysismodel analysis and spatiotemporal and spatiotemporal properties properties above, theabove, public the opinionspublic opinions on the improvementon the improvement of Nanjing of Nanjing metro system metro aresystem summarized are summarized as follows: as follows: (1) Metro(1) operation Metro operation management management In the caseIn the of case severe of congestionsevere congestion in the morning in the morning and evening and evening rush hours, rush tryhours, to apply try to express apply express trains thattrains only that stop only at importantstop at important stations stations to alleviate to alleviate the passenger the passenger flow pressure flow pressure at some at stations. some stations. The frequencyThe frequency of maintenance of maintenance and overhaul and overhaul should should be increased be increase for Metrod for lineMetro 1 and line line 1 and 2, whichline 2, which have existedhave existed for a long for time. a long From time. the From analysis, the analysis, it can be it seencan be that seen the that frequency the frequency of accidents of accidents in metro in metro line 1 isline higher, 1 is especiallyhigher, especially in heavy in rain heavy and rain other and bad other weather. bad weather. (2) Station(2) Stationsafety management safety management StrengthenStrengthen the security the sec checkurity of check stations, of stations, especially especially in important in important stations stations and during and large-scaleduring large-scale events,events, which cannotwhich cannot be a mere be formality.a mere formality. In order In to order maintain to maintain the cleanliness the cleanliness of the carriages, of the carriages, we we should strictlyshould supervisestrictly supervise the uncivilized the uncivilized behavior behavior of passengers of passengers in carriages in carriag and encouragees and encourage the public the public to reportto and report supervise and supervise the uncivilized the uncivilized phenomena. phenomena. (3) Auxiliary(3) Auxiliary facilities facilities management management For theFor controversial the controversial situation situation of air-conditioning of air-conditioning temperature temperature in the in carriage,the carriage, improve improve the the air- air-conditioningconditioning system system and and optimize optimize its temperature its temperature regulation regulation system system to make to make it more it more in line in line with with the the perceptionperception of theof the majority majority of of people. people. Strengthen Strengthen the the supervision supervision ofof damageddamaged escalators, lamp lamp boards, boards,toilets toilets,, and and other facilities in thethe stations,stations, andand promptlypromptly checkcheck out out problems problems if if they they are are found found to be to be brokenbroken or or reported reported by by the the public. public. (4) Emergency(4) Emergency handling handling In case ofIn unexpectedcase of unexpected situation, situation, it is necessary it is necessary to make to a make good plana good in plan advance, in advance, and inform and theinform the public immediatelypublic immediately of the causes of the andcauses the and progress the progress of the treatment of the treatment in order in to order help passengersto help passengers obtain obtain timely informationtimely information and then and change then change their travel their plans. travel plans. This paperThis introduces paper introduces the process the ofprocess mining of public mining opinions public opinions on social on networks social networks and discusses and indiscusses detail thein theoreticaldetail the theoretical and practical and processes practical from processes data collection, from data analysis, collection, to modeling.analysis, to On modeling. the basis On the of contentbas analysis,is of content this analysis, paper attempts this paper to obtain attempts the temporalto obtain andthe spatialtemporal attributes and spatial of text. attributes Taking of text. the MetroTaking fault the as theMetro research fault as topic, the thisresearch paper topic, discusses this paper the location discusses of thethe faultlocation and of the the severity fault and the severity of the accident, and obtains the spatial distribution law of the fault on Metro lines. The results of content analysis show that passengers’ comments on metro system mainly focus on Metro congestion, air-conditioning temperature, environment in the carriage, equipment failure, and so on. From the analysis of the temporal and spatial characteristics of the text, it can be seen that the subway operation failure will cause a wide range of public discussion. According the accident location, it can be found the accidents mostly occurred in Metro Line 1 and on rainy days, which are built in the early years and have a larger passenger flow. Sustainability 2019, 11, 4016 13 of 15 of the accident, and obtains the spatial distribution law of the fault on Metro lines. The results of content analysis show that passengers’ comments on metro system mainly focus on Metro congestion, air-conditioning temperature, environment in the carriage, equipment failure, and so on. From the analysis of the temporal and spatial characteristics of the text, it can be seen that the subway operation failure will cause a wide range of public discussion. According the accident location, it can be found the accidents mostly occurred in Metro Line 1 and Line 2 on rainy days, which are built in the early years and have a larger passenger flow. This set of content analysis method can be applied to the field of traffic planning. The traditional public opinion collection is replaced by a large data collection method and manual processing is replaced by computer, which not only greatly improves the speed and efficiency of data collection and analysis, but also improves the accuracy. In the early stage of planning, it can be used to collect public needs and provide important data for planning. After the implementation of policies and projects, it can be used as a public supervision mechanism to collect public opinions in time to respond and deal with them as soon as possible. The timely handling of public opinions can also encourage the public to put forward their own valuable opinions for the development of urban traffic on the social network according to what they have seen and heard, and make up for the loopholes and shortcomings of planning. In this paper, the extraction and processing of traffic-related microblog text is still imperfect, which may lead to subsequent results affected. Direct use of traditional LDA model for topic modeling of microblog, to a certain extent, is still affected by the size, content, scattered format, data noise, and other aspects. The efficiency of LDA topic model is also influenced by the length of documents. The lack of sufficient words in a short text will affect the effectiveness of topic modelling. Mining the temporal and spatial characteristics of text data is relatively simple, such as not using the geographic location information published by microblog users. In the future, we can fully mine all kinds of social media data and establish a traffic public opinion monitoring system, which can provide a better supplement for traffic planning and management.

Author Contributions: Conceptualization, methodology, formal analysis, and writing—original draft: D.L.; writing—discussion of the original draft: D.L., Y.Z., and C.L.; Writing—revision and editing: D.L. and Y.Z.; supervision: D.L. Funding: This research was funded by the National Key Research and Development Program of China (NO. 2018YFB1600900), Natural Science Foundation of China (no. 51608115, NSFC-RCUK_EPSRC no. 51561135003), the open project of the Key Laboratory of Advanced Urban Public Transportation Science, Ministry of Transport, PRC. This research was also jointly funded by research grants from the Research Grants Council of the Hong Kong Special Administrative Region (Project No. PolyU 15212217) and the Hong Kong Scholars Program (Project No. G-YZ1R). Conflicts of Interest: The authors declare no conflict of interest.

References

1. Li, D.; Miwa, T.; Morikawa, T.; Liu, P. Incorporating observed and unobserved heterogeneity in route choice analysis with sampled choice sets. Transp. Res. Part C 2016, 67, 31–46. [CrossRef] 2. Booth, C.; Richardson, T. Placing the public in integrated transport planning. Transp. Policy 2001, 8, 141–149. [CrossRef] 3. Bickerstaff, K.; Talley, R.; Walker, G. Transport planning and participation: The rhetoric and realities of public involvement. J. Transp. Geogr. 2002, 10, 61–73. [CrossRef] 4. Kennedy, C.; Miller, E.; Shalaby, A.; Maclean, H.; Coleman, J. The four pillars of sustainable urban Transportation. Transp. Rev. 2005, 25, 393–414. [CrossRef] 5. Banister, D. The sustainable mobility paradigm. Transp. Policy 2008, 15, 73–80. [CrossRef] 6. Santos, G.; Behrendt, H.; Maconi, L.; Shirvani, T.; Teytelboym, A. Part one: Externalities and economic policies in road transport. Res. Transp. Econ. 2010, 28, 2–45. [CrossRef] Sustainability 2019, 11, 4016 14 of 15

7. Gil, A.; Calado, H.; Bentz, J. Public participation in municipal transport planning processes: The case of the sustainable mobility plan of Ponta Delgada, Azores, Portugal. J. Transp. Geogr. 2011, 19, 1309–1319. [CrossRef] 8. Schlag, B.; Teubel, U. Public acceptability of transport pricing. LATSS Res. 1997, 21, 134–142. 9. Rowe, G.; Frewer, L. Public participation methods: A framework for evaluation. Sci. Technol. Hum. Values 2000, 25, 3–29. [CrossRef] 10. Konisky, D.; Beierle, T. Innovations in public participation and environmental decision-making: Examples from the Great Lakes region. Soc. Nat. Resour. 2001, 14, 815–826. 11. Figueredo, J. Public Participation in Transportation: An Empirical Test for Authentic Participation. Ph.D. Thesis, University of Central Florida, Orlando, FL, USA, 2005. 12. Kramer, J.; Williams, K.; Hopes, C.; Bond, A. Performance Measures to Evaluate the Effectiveness of Public Involvement Activities in Florida; Center for Urban Transportation Research, University of South Florida, College of Engineering: Tampa, FL, USA, 2008. 13. Susilo, Y.; Joewono, T.; Santosa, W. An exploration of public transport users’ attitudes and preferences towards various policies in Indonesia: Some preliminary results. J. East. Asia Soc. Transp. Stud. 2009, 8, 1–15. 14. Li, Z.; Liu, P.; Xu, C.; Duan, H.; Wang, W. Reinforcement Learning-Based Variable Speed Limits Control to Reduce Crash Risks near Traffic Oscillations on Freeways. IEEE Trans. Intell. Transp. Syst. 2017, 18, 1–14. [CrossRef] 15. Pan, Y.; Chen, S.; Qiao, F.; Ukkusuri, S.V.; Tang, K. Estimation of Real-Driving Emissions for Buses Fueled with Liquefied Natural Gas Based on Gradient Boosted Regression Trees. Sci. Total Environ. 2019, 660, 741–750. [CrossRef][PubMed] 16. Gu, X.; Mohamed, A.; Xiang, Q.; Cai, Q.; Yuan, J. Utilizing UAV video data for in-depth analysis of drivers’ crash risk at interchange merging areas. Accid. Anal. Prev. 2019, 123, 159–169. [CrossRef][PubMed] 17. Chao, W.; Ye, Z.; Chen, E.; Xu, M.; Wang, W. Diffusion approximation for exploring the correlation between failure rate and bus-stop operation. Transp. A Transp. Sci. 2019, 15, 1306–1320. 18. Chen, D. Research on Traffic Flow Prediction in the Big Data Environment Based on the Improved RBF Neural Network. IEEE Trans. Ind. Inform. 2017, 13, 2000–2008. [CrossRef] 19. Zeng, K.; Liu, W.; Wang, X.; Chen, S. Traffic congestion and social media in China. IEEE Intell. Syst. 2013, 28, 72–77. [CrossRef] 20. Wanichayapong, N.; Pruthipunyaskul, W.; Pattara-Atikom, W.; Chaovalit, P. Social-based traffic information extraction and classification. In Proceedings of the 11th International Conference on ITS Telecommunications, St. Petersburg, Russia, 23–25 August 2011; pp. 107–112. 21. Endarnoto, S.; Pradipta, P.; Nugroho, A.; Purnama, J. Traffic condition information extraction & visualization from social media Twitter for Android mobile application. In Proceedings of the 2011 International Conference on Electrical Engineering and Informatics, Bandung, Indonesia, 17–19 July 2011; pp. 1–4. 22. D’Andrea, E.; Ducange, P.; Lazzerini, B.; Marcelloni, F. Real-time detection of traffic from Twitter stream analysis. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2269–2283. [CrossRef] 23. Hasan, S.; Ukkusuri, S. Urban activity pattern classification using topic models from online geo-location data. Transp. Res. Part C 2014, 44, 363–381. [CrossRef] 24. Gu, Y.; Qian, Z.; Chen, F. From Twitter to detector: Real-time traffic incident detection using social media data. Transp. Res. Part C 2016, 67, 321–342. [CrossRef] 25. Kuflik, T.; Minkov, E.; Nocera, S.; Grant-Muller, S.; Gal-Tzur, A.; Shoor, I. Automating a framework to extract and analyse transport related social media content: The potential and the challenges. Transp. Res. Part C 2017, 77, 275–291. [CrossRef] 26. Rashidi, T.; Abbasi, A.; Maghrebi, M.; Hasan, S.; Waller, T. Exploring the capacity of social media data for modelling travel behaviour: Opportunities and challenges. Transp. Res. Part C 2017, 75, 197–211. [CrossRef] 27. Liu, Y.; Liu, Z.; Jia, R. DeepPF: A deep learning-based architecture for metro passenger flow prediction. Transp. Res. Part C 2019, 101, 18–34. [CrossRef] 28. Chen, E.; Ye, Z.; Wang, C.; Xu, M. Subway passenger flow prediction under special events using smart card data. IEEE Trans. Intell. Transp. Syst. 2019, 1–12. [CrossRef] 29. Ji, Y.; Ma, X.; Yang, M.; Jin, Y.; Gao, L. Exploring Spatially Varying Influences on Metro-Bikeshare Transfer: A Geographically Weighted Poisson Regression Approach. Sustainability 2018, 10, 1526. [CrossRef] Sustainability 2019, 11, 4016 15 of 15

30. Du, M.; Cheng, L. Better Understanding the Characteristics and Influential Factors of Different Travel Patterns in Free-Floating Bike Sharing: Evidence from Nanjing, China. Sustainability 2018, 10, 1244. [CrossRef] 31. Li, D.; Miwa, T.; Xu, C.; Li, Z. Non-linear fixed and multi-level random effects of origin–destination specific attributes on route choice behavior. IET Intell. Transp. Syst. 2019, 13, 654–660. [CrossRef] 32. Blei, D.; Ng, A.; Jordan, M. Latent Dirichlet allocation. J. Mach. Learn. Res. 2013, 3, 993–1022.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).