Discovery of Important Crossroads in Road Network Using Massive Taxi
Total Page:16
File Type:pdf, Size:1020Kb
Discovery of Important Crossroads in Road Network using Massive Taxi Trajectories Ming Xu1, Jianping Wu2, Yiman Du2,, Haohan Wang3, Geqi Qi2, Kezhen Hu2, Yunpeng Xiao4 1School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China 2Department of Civil Engineering, Tsinghua University, Beijing, China 3School of Computer Science, Carnegie Mellon University, Pittsburgh PA USA 4School of Software, Chongqing University of Posts and Telecommunications, Chongqing, China {xum, xiaoyp}@bupt.edu.cn, {2855175, 153789104}@qq.com ABSTRACT main traffic volume is carried on a small number of nodes. A major problem in road network analysis is discovery of Lämmer et al. [2] find that road network shows a phenomenon of important crossroads, which can provide useful information for cascading failures. In other words, when a few crucial nodes fail, transport planning. However, none of existing approaches some other nodes would also fail because of the connection addresses the problem of identifying network-wide important between the nodes, and this situation can even lead to the collapse crossroads in real road network. In this paper, we propose a novel of the entire road network. Therefore, how to identify such a few data-driven based approach named CRRank to rank important significant nodes accurately, and then focus on them for control crossroads. Our key innovation is that we model the trip network strategies, is an effective approach for reducing congestion and reflecting real travel demands with a tripartite graph, instead of improving efficiency of road network. solely analysis on the topology of road network. To compute the In this paper, we aim to discover the important nodes in the road importance scores of crossroads accurately, we propose a HITS- network using a novel data-driven framework named CRRank. In like ranking algorithm, in which a procedure of score propagation out method, the trip network which consists of dynamic transport on our tripartite graph is performed. We conduct experiments on information like traffic flow, Origin-Destination (OD) and paths CRRank using a real-world dataset of taxi trajectories. of trips is considered instead of static road network. Firstly, we Experiments verify the utility of CRRank. extract trip information from massive daily taxi trajectories and road network, and then we model the trip network using a tripartite graph. Inspired by HITS algorithm [9], which ranks web documents based on link information among nodes in a hyperlink Categories and Subject Descriptors graph, we propose a mutual reinforcing algorithm, which D.3.3 [Analysis of Algorithms and Problem Complexity]: performs an iterative procedure of score propagation over our Nonnumerical Algorithms and Problems – Pattern matching. tripartite graph to rank the network-wide important crossroads. The remainder of the paper is organized as follows. In Section 2, General Terms related studies of discovery of important nodes are presented. In Algorithms Section 3, we briefly discuss original intention behind our model. Then, proposed framework and related definitions are introduced Keywords in this section. In Section 4, basic definition of tripartite graph, Tripartite graph, Ranking algorithm, Road network, Taxi and how it can be used in the trip network modeling are described. trajectories, Also, an algorithm of score propagation on tripartite graph is presented here. Experiments on validity of CRRank algorithm are presented in Section 5. Section 6 concludes the paper with a 1. INTRODUCTION summary. So far, urban transportation system is one of the largest and most complex systems closely related with people. In such a system, road network plays a key role, for it carries all the daily traffic 2. RELATED WORK flow. The understanding and interpretation of reliability of road Many previous methods of searching for important nodes are network is a crucial component to improve transportation widely studied in complex network and graph mining. In these efficiency, which attracts a large number of researchers. In general, methods, some metrics are proposed to accomplish this task. Zhao road network is abstracted as graph structure. In this structure, et al. [3] find that degree distribution of a node is closely related nodes represent crossroads, and edges represent road segment to its importance. In detail, hubs or high connectivity nodes at the linking two neighborhood crossroads. tail of the power-law degree distribution are known as important ones and play key roles in maintaining the functions of networks. De Montis et al. [1] find that an obvious scale free characteristic Betweenness centrality is also used to find crucial central nodes is showed in the road network, and the relationship between nodes [4], and Wang et al. [5] introduce betweenness centrality to and traffics follows a power law distribution, which means, the analyze air transport network. A method to measure the node importance by combining degree distribution and clustering Copyright is held by the author/owner(s). coefficient information is presented in [6]. Unfortunately, these UrbComp’14, August 24, 2014, New York, NY, USA correspondence author of this paper metrics cannot be directly applied to road network, since each while traffic flows of is generated by a wider range of OD pairs. node in road networks exhibits a very limited range of degrees, Obviously, crossroad , which acts as a traffic hub, is more and almost all the nodes degrees are similar. Moreover, these important than crossroad . Consequently, we can conclude that metrics only reflect characteristics of topology of the road network, some dynamic elements, such as traffic flows and OD the importance of a crossroad is related to OD distribution of trips information, are lost by utilizing them straightforwardly. as well as traffic flow. In fact, transportation network includes not only road network, but also the trip network which reflects travel Another approach is called system analysis approach. The basic demands. The latter can provide more valuable information for idea of such approach is evaluating cascading failure after locating important crossroads. removing each node. Combining such idea and characteristics of transport system, Wu et al. [7] present a cascading failure a b c d transportation network evaluation model, which can reveal deep A B C knowledge of network. However, such theory-driven model has to be analyzed on a simulated network instead of real road network, e f g h while the underlying logic of the real road network can hardly be D E F grasped for simulation. In addition, some data-driven approaches depending on the calculation of the eigenvectors are successfully i j k l used to find important web or information on internet. Google G H I PageRank [8] and HITS are famous examples in this category. This work is inspired by this kind of methods. A major difference m n o p of CRRank from approaches previously mentioned is that our Figure 1. Insert caption to place caption below. data-driven algorithm is based on the tripartite graph which incorporates traffic, travel demands, empirical knowledge of drivers and topology of road network, and CRRank can identify . network-wide important crossroads effectively. 3.2 Framework of our method In addition, the flourishing of sensing technologies and large- Before giving our approach, we first introduce some terms used in scale computing infrastructures have produced a variety of big this paper. and heterogeneous data in urban spaces, e.g., human mobility, Definition 1 (Road Network): A road network is a directed vehicle trajectories, air quality, traffic patterns, and geographical weighted graph GVE(,) , where V is set of nodes representing data [10]. These data brings out a variety of novel data mining crossroads or end points of road segments, and E is set of methods for urban planning and road network analysis. A LDA- weighted edges representing road segments linking two crossroads, based inference model combining road network data, points of a node is uniquely identified with an id v. id and an edge is interests, and a large number of taxi trips to infer the functional v e regions in a city is proposed in [11]. Zheng et al. [12] discover the associated with an id e. id . e. level indicates the level of e . In bottleneck of road network using the corresponding taxi detail, if e is an expressway, freeway or arterial road, e. level is trajectories. Chawla et al.[13] present a PCA-based algorithm to equal to 1, if is a sub-arterial road, is equal to 2, if is identify link anomalies, and then optimization techniques is a bypass, is equal to 3. L1 utilized to infer the traffic flows that lead to a traffic anomaly. Liu Definition 2 (Region): A city map can be partitioned into some et al. [14] use frequent subgraph mining to discover anomalous non-overlapping regions by high level road segments in road links for each time interval, and then form outlier trees to reveal network. The set of these regions is represented by {}g , the potential flaws in the design of existing traffic networks. ii1 Being similar with these studies, this paper also addresses a where is total number of Regions. A region g is associated problem of transport planning using a large number of taxi with an uniquely identifier g. id ,some low level road segments trajectories. can be included within a region,this method of map partition can 3. OVERVIEW preserve semantic information of regions , e.g. school, park, business areas or resident area etc. 3.1 Motivation To illustrate our idea, consider a simplified road network as an Definition 3 (Transition): A transition t is represented by a example in Figure 1. Each circle represents a crossroad, and each triple o,, d Pt , where o represents origin region, d directed edge linking two neighbor circles represents a road represents destination region, , Pt is the set of segment.