2017 2nd International Conference on Artificial Intelligence and Engineering Applications (AIEA 2017)
ISBN: 978-1-60595-485-1

Passenger Route Identification of Rail Transit Based on AFC Data

RUILI ZHAO, ZHIZHONG ZHANG and JIE ZHU

ABSTRACT

Passenger route identification is difficult in a complex urban rail transit network because the IC card records only the passenger's entry and exit stations and the corresponding times, not the stations passed in between. To infer passengers' travel routes, this paper designs a route identification scheme based on AFC data, built on an analysis of the main factors influencing passengers' route choice and of their route choice behavior. First, the collected AFC data are preprocessed. Second, the most probable routes are screened out according to the main factors influencing route choice. Then travel time training between entry and exit stations is carried out, and each passenger's travel route is selected according to the entry and exit times. Finally, taking the Chongqing rail transit network as an example, the scheme is verified against actual survey data. The accuracy of route identification was 76.2%, which demonstrates the feasibility of the scheme.

KEYWORDS Prediction model, Track IC card, Data mining, Path identification, K shortest path.

INTRODUCTION

In recent years, urban rail transit has developed quickly. Passengers have gained considerable flexibility in how they travel, but their route choice has also become more complex. With the emergence of AFC (Automated Fare Collection) systems, the handling of passenger cash has been greatly reduced and the efficiency of subway operations has increased. The widespread adoption of the seamless transfer mode makes transfers convenient for passengers, but it also means that the AFC system obtains only the OD (Origin-Destination) information of each trip, which says nothing about the specific route the passenger took through the rail network. Accurate identification of passengers' travel routes matters because ticket revenue clearing, subway emergency management, passenger route guidance and passenger flow forecasting all depend on an accurate grasp of the routes passengers actually travel.

Ruili Zhao, Zhizhong Zhang, Jie Zhu, Chongqing University of Posts and Telecommunications, Chongqing 400065, China


Existing research on the route choice of rail transit passengers includes the following. In [1]-[4], Naohiko et al. studied the factors influencing passenger travel choice and the patterns of route selection in rail transit systems, based on RP (revealed preference) passenger flow surveys. In [5]-[7], Bingfeng Si et al. proposed a general route choice behavior model based on generalized cost and analyzed optimization algorithms for passenger flow assignment in rail transit; that work analyzes the main factors influencing urban passenger flow distribution, puts forward a theoretical model for the assignment problem, and designs a corresponding algorithm to verify it. In [8]-[10], Xiangyun Wu et al. proposed an urban rail transit passenger flow assignment model based on the status of rail transit operations and solved it with an appropriate algorithm. These studies analyze passenger route choice and the distribution of passenger flow in the network separately. Building on them, this paper analyzes the main factors influencing passenger route selection in urban rail transit, designs a passenger route identification scheme for the urban rail transit network, and verifies the scheme through a statistical survey and computer simulation.

PASSENGER'S TRAVEL ROUTE IDENTIFICATION SCHEME

Different people have different requirements for transfer time and travel time, so they do not choose the same travel routes. In a complex urban rail transit network, each OD pair in theory offers several alternative paths, but passengers consider only some of them because of factors such as travel time, number of transfers, crowding and price. How to weigh these factors and design a suitable scheme for identifying passenger travel routes has become a hot topic in rail transit research.

Factors that affect passenger route selection.

(1) Travel time. The total time a passenger spends from the departure point to the destination is called the travel time. In an urban rail transit system, travel time is positively correlated with the distance traveled, and it is the most important factor in route selection.
(2) Travel expenses. In an urban rail transit system, the travel expense is the total cost of the trip from the departure point to the destination. It is also a major consideration when passengers select a path.
(3) Transfer situation. The number of transfers and the walking distance during transfers are reflected in the transfer time. When a passenger faces several possible paths with similar time costs, convenience becomes the main factor affecting the choice.
(4) Comfort. Comfort depends mainly on the degree of crowding. When crowding exceeds a certain level, it directly affects passengers' route choice.

[Flow chart. Recoverable node labels: Preprocessing data; Grouped by user; Sorted by time; Check whether the entry/exit flag of the first record is 21; Delete this record marked as 22; Check whether the entry/exit flag of the next record is 22; Check whether the entry/exit flag of the next record is 21; Delete the former record marked as 21; Delete the former record marked as 21 and the later two records marked as 22; Match successfully, output data.]

Figure 1. The flow chart of OD match.

[Flow chart. Recoverable node labels: OD data (delete data whose time is below 120 seconds or longer than 9000 seconds), delete staff data; Take the maximum value of each OD; Take the minimum value of each OD; Take the range of each OD pair; The number of time samples of each OD is > 50?; (N) Take the median of it, Output; (Y) Get group numbers by square and take the upper bound; Range divide group number, get the class interval; Divide into groups by range; Choose the group which has the most samples; Take the median of the group; Output.]

Figure 2. OD time training flow chart.

Specific scheme design.

The specific process of passenger route identification is as follows:
Step 1: Collect the IC card data of rail transit passengers. The original data are sorted and cleaned into preprocessed data, and OD matching is then applied to the preprocessed data, mainly by time series: each user's records are sorted by time and combined with the entry and exit flags to determine each of the user's OD pairs. The implementation of OD matching is shown in Figure 1.

Step 2: Using a K (K = 3) shortest paths search algorithm, choose three feasible routes from the entry station to the exit station: the shortest path, the second shortest path and the third shortest path. The process is as follows:
(1) The shortest path. From the urban rail transit network map, construct a directed graph in which the weight of each edge is the travel distance between adjacent stations. For the OD pair in question, find the shortest path with the classical Dijkstra shortest path algorithm.
(2) The second shortest path. Starting from the shortest path obtained in (1), delete one edge of the shortest path from the original directed graph at a time and apply algorithm (1) to the resulting graph to obtain a temporary second shortest path. Repeat until every edge of the shortest path has been deleted in turn, then compare all the temporary paths; the shortest of them is the second shortest path.
(3) The third shortest path. If the shortest path and the second shortest path obtained in (1) and (2) share edges, delete one shared edge from the original graph at a time and search with algorithm (1), until the remaining edges of the two paths have no edge in common. Then form pairs from the two groups of remaining edges, delete one pair at a time, and search with algorithm (1). Finally, compare all the temporary paths; the shortest of them is the third shortest path.
Step 3: Carry out OD time training on the matched OD data to obtain the training time from origin to destination. The procedure is shown in Figure 2.
Step 4: Make the final route selection. Using the training time from Step 3 and the passenger's entry and exit times, find the most likely travel path among the three routes obtained in Step 2. The procedure is shown in Figure 3.

CASE TESTING

To test the rationality of the scheme, we take the Chongqing rail transit network as the research object. By December 2016, Chongqing Rail Transit had four operating lines, Lines 1, 2, 3 and 6 (including the EXPO line and the airport line), covering the entire city of Chongqing, with 126 stations, eight transfer stations and 213 km of operating mileage. The highest daily passenger volume was 2.6182 million trips, and the average daily passenger volume is more than 2 million trips. Table 1 lists the transfer stations, and Figure 4 shows the Chongqing rail transit line map.

Data Preparation (OD Match)

First, number the stations of Chongqing Rail Transit. Figure 5 shows part of the station data, where NO is the station number, CODE is the line number, NAME is the station name, TS is the transfer flag (0 for a normal station, 1 for a transfer station), and TS_TIME is the average transfer time (unit: seconds).

[Flow chart. Recoverable node labels: KSP path; Training time of each OD; Whether it is a single path; Whether the absolute value of the difference between the actual time and the minimum time is >= 1200 seconds; Calculate the transfer times of each route; Is the number of paths > 2?; Delete the route whose transfer times are higher than those of the other two routes; Delete the route which differs greatly from the real time, leaving two routes; Calculate the consuming time of each route; Whether it is two paths; Calculate the number of passing sites of each route; Delete the route which passes the same site twice; The absolute value of the matching time difference is < 1200 seconds; Choose the path which best fits the real trajectory time; Delete the route whose number of passing sites is bigger than that of the other route; Whether the transfer time is equal; Choose the path with lower transfer times; Output route.]

Figure 3. The flow chart of path selection.

Figure 4. Chongqing rail transit network map.

Second, describe and preprocess the collected data to meet the needs of the scheme. The Chongqing Rail Transit AFC system currently accepts only the bus IC card and single-journey tickets, so this smart transit card has a market share of 100% and the data are comprehensive. This paper uses the data from October 19 to 24, 2015, a period with no holidays or special circumstances, so that the collected data reflect the daily travel of Chongqing passengers more accurately.

The collected data inevitably contain some abnormal records, and how they are treated directly affects the accuracy of the results, so appropriate handling of abnormal data is important. The steps are as follows:
(1) Remove records that have only an entry or only an exit.
(2) Where the entry and exit flags contradict the time sequence, correct the records by matching against the last entry time; delete the records if they cannot be corrected. For example, Table 2 shows one user's records sorted by time. The program corrects the two middle records, and the corrected data are shown in Table 3.

Table 1. Transfer station information table.
Site            Line A    Line B       Transfer form
Xiaoshizi       -         Line 6       Channel transfer
Jiaochangkou    Line 1    -            Channel transfer
Lianglukou      Line 1    Line 3       Cross transfer
Daping          Line 1    Line 2       Channel transfer
Niujiaotuo      Line 2    Line 3       Channel transfer
Yudong          Line 2    Line 3       Station transfer
Hongqihegou     Line 3    Line 6       Cross transfer
Lijia           Line 6    EXPO Line    Same-platform transfer

Figure 5. Part data of Chongqing Light Rail transit.

Table 2. User's check time sequence table.
Card number    Entry and exit identification    Time
00000008888    21                               2015/10/20 08:39:24
00000008888    22                               2015/10/20 09:22:24
00000008888    22                               2015/10/20 10:23:24
00000008888    21                               2015/10/20 10:44:24
00000008888    21                               2015/10/20 17:33:24
00000008888    22                               2015/10/20 18:26:24


Table 3. Correction table of user's check time series.
Card number    Entry and exit identification    Time
00000008888    21                               2015/10/20 08:39:24
00000008888    22                               2015/10/20 09:22:24
00000008888    21                               2015/10/20 10:23:24
00000008888    22                               2015/10/20 10:44:24
00000008888    21                               2015/10/20 17:33:24
00000008888    22                               2015/10/20 18:26:24

(3) Where there is more than one entry record or exit record, match the records by the closest time.
(4) Where the records show a single user entering and exiting a station twice, correct the match according to the last entry time before the exit record; delete the data if they cannot be matched.
(5) Combine the records from before 0:00 each day with the records from the following day before 4:00 and match them; records that still cannot be matched are removed.
(6) Remove matched OD pairs in which the entry and exit stations are the same.
OD matching works mainly on the time series: each user's records are sorted by time and combined with the entry and exit flags to determine each of the user's OD pairs. Successfully matched data are output as an intermediate result, shown in Figure 6. Each output line has the following structure: card number; card type; entry station identification; entry time; entry station number; exit station identification; exit time; exit station number; amount; entry gate number; exit gate number; transfer sign; entry and exit time difference (unit: seconds).
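The core of OD matching can be sketched as follows. This is a minimal illustration and not the authors' program: it only pairs each entry record (flag 21) with the next exit record (flag 22) of the same card and applies rule (6); the correction rules (2) to (5) are not implemented. The function name `match_od`, the record layout and the station numbers in the usage example are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

ENTRY, EXIT = 21, 22  # entry/exit flags as they appear in Tables 2 and 3


def match_od(records):
    """Pair each entry record with the next exit record of the same card.

    records: iterable of (card, flag, time_string, station) tuples
             (hypothetical layout, chosen for this sketch).
    Returns a list of (card, origin, destination, travel_seconds).
    """
    by_card = defaultdict(list)
    for card, flag, ts, station in records:
        t = datetime.strptime(ts, "%Y/%m/%d %H:%M:%S")
        by_card[card].append((t, flag, station))

    trips = []
    for card, recs in by_card.items():
        recs.sort(key=lambda r: r[0])  # sort each user's records by time
        pending = None  # most recent entry that has no exit yet
        for t, flag, station in recs:
            if flag == ENTRY:
                # A newer entry replaces an older unmatched entry.
                pending = (t, station)
            elif flag == EXIT and pending is not None:
                t_in, origin = pending
                pending = None
                if origin != station:  # rule (6): drop same-station pairs
                    seconds = int((t - t_in).total_seconds())
                    trips.append((card, origin, station, seconds))
            # An exit with no pending entry is simply dropped here.
    return trips
```

Applied to the uncorrected sequence of Table 2 (with made-up station numbers), this simplified pairing keeps the 08:39/09:22 trip and the 17:33/18:26 trip and drops the two middle records, whereas the scheme in the paper corrects them into a third trip as in Table 3.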

Figure 6. OD match results graph.

Figure 7. K shortest path result graph.


K shortest path

For each matched OD pair, use the KSP algorithm to find the three shortest paths for the travel record. The search uses the actual track topology and attribute information. The input is the entry and exit station numbers, and the output is the three best paths, which have been calculated beforehand. The KSP output is shown in Figure 7, and each output line has the following fields: card number; card type; entry station identification; entry time; entry station number; exit station identification; exit time; exit station number; amount; entry gate number; exit gate number; transfer sign; entry and exit time difference (unit: seconds); K paths (paths are separated by "#").
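The first two searches of Step 2 can be sketched as below: a Dijkstra search on a directed graph, and the edge-deletion search that derives the second shortest path from it. This is a hedged illustration under stated assumptions rather than the authors' implementation: the adjacency-dictionary graph format, the `banned` parameter and the function names are hypothetical, and the third shortest path search is omitted.

```python
import heapq


def dijkstra(graph, src, dst, banned=frozenset()):
    """Shortest path on a directed weighted graph, skipping edges in `banned`.

    graph: {node: {neighbor: distance}} (assumed format for this sketch).
    Returns (distance, path), or (inf, []) if dst is unreachable.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            if (u, v) in banned:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return float("inf"), []
    # Walk the predecessor links back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def second_shortest(graph, src, dst):
    """Delete each edge of the shortest path in turn and keep the best detour."""
    _, best = dijkstra(graph, src, dst)
    candidates = []
    for edge in zip(best, best[1:]):
        d, p = dijkstra(graph, src, dst, banned={edge})
        if p:
            candidates.append((d, p))
    return min(candidates, default=(float("inf"), []))
```

Passing the deleted edge as a `banned` set avoids copying the graph for every deletion; the result is the same as rebuilding the graph without that edge.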

Time training

To ensure the effectiveness of time training, the input should contain at least one month of matched OD result data. The output is the OD pair and its training time: entry station number; exit station number; time (seconds). The results of the time training are shown in Figure 8.
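One possible reading of the training procedure in Figure 2, for the travel times of a single OD pair, is sketched below. It assumes that "get group numbers by square and take the upper bound" means the square root of the sample count rounded up; that interpretation, the function name `train_od_time` and its parameters are assumptions, not details confirmed by the paper. Removal of staff data is assumed to have happened before this step.

```python
import math
from statistics import median


def train_od_time(times, min_s=120, max_s=9000, small_n=50):
    """Estimate a representative travel time (seconds) for one OD pair."""
    # Discard trips shorter than 120 s or longer than 9000 s.
    times = [t for t in times if min_s <= t <= max_s]
    if not times:
        return None
    # With 50 samples or fewer, use the plain median.
    if len(times) <= small_n:
        return median(times)
    lo, hi = min(times), max(times)
    if hi == lo:
        return median(times)
    # Assumed reading of Figure 2: number of groups = ceil(sqrt(n)).
    k = math.ceil(math.sqrt(len(times)))
    width = (hi - lo) / k  # class interval = range / number of groups
    groups = [[] for _ in range(k)]
    for t in times:
        idx = min(int((t - lo) / width), k - 1)  # the maximum falls in the last group
        groups[idx].append(t)
    # Take the median of the group that holds the most samples.
    return median(max(groups, key=len))
```

Taking the median of the most populated group rather than of all samples makes the estimate less sensitive to passengers who linger in the station or take an unusual route.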

Figure 8. Time training results.

Figure 9. Path recognition result graph.

Path Selection

The purpose of path selection is to use the OD training results to select, from the derived KSP paths, the one path most consistent with the passenger's actual trip. The output of path selection is shown in Figure 9, and each output line has the following fields: card number; card type; entry station identification; entry time; entry station number; exit station identification; exit time; exit station number; amount; entry gate number; exit gate number; transfer sign; entry and exit time difference (unit: seconds); final route; rail path length; number of transfers; path time. The station numbers in the output are the actual station numbers of the network; a transfer station has two numbers, and the later one is kept.
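A simplified sketch of the selection idea follows. It is not the exact decision flow of Figure 3: it drops routes that pass the same station twice, prefers candidates whose estimated time is within 1200 seconds of the actual travel time, and then picks the closest time, breaking ties by fewer transfers and fewer stations. The `Candidate` type, the `select_path` function and the way `path_time` is obtained are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    stations: list    # station numbers along the route
    transfers: int    # number of transfers on the route
    path_time: float  # estimated time of the route in seconds (assumed given)


def select_path(candidates, actual_time, tolerance=1200):
    """Pick the candidate route that best matches the actual travel time."""
    if not candidates:
        return None
    # Drop routes that pass the same station twice, if any alternative remains.
    simple = [c for c in candidates if len(set(c.stations)) == len(c.stations)]
    pool = simple or list(candidates)
    # Prefer routes whose time is within the tolerance of the actual time.
    close = [c for c in pool if abs(actual_time - c.path_time) < tolerance]
    pool = close or pool
    return min(
        pool,
        key=lambda c: (abs(actual_time - c.path_time), c.transfers, len(c.stations)),
    )
```

For example, with two candidates estimated at 1500 s and 1400 s and an actual travel time of 1790 s, the 1500 s route is returned because its time difference is smaller.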

CALCULATION RESULTS AND COMPARISON

To test the reliability and practicability of the scheme, we collected sample data from an actual field test, which recorded the card numbers and travel routes of 500 passengers on October 20, 2015. Part of the recorded travel routes are shown in Figure 10. The identification results produced by the scheme are shown in Figure 11, and Figure 12 presents the comparison as a chart (stations marked in red are those where the result differs from the sample data).

Figure 10. Partial sample data route.

Figure 11. Partial route identification results.


Figure 12. Partial sample data identifies routes.

Comparing the results of the scheme with the sample data, 381 of the 500 sample routes are identified consistently, so the accuracy of the scheme is 76.2%.

SUMMARY

Building on studies of passenger route identification in rail transit, this paper proposes a passenger travel route identification scheme based on data mining. First, we introduce the present situation of urban rail transit and analyze the main factors influencing passengers' choice of travel route. Then, in light of the actual operation of urban rail transit, we design the passenger route identification scheme. Finally, we use big data technology to analyze the rail transit AFC data and test the scheme with actual operation data from Chongqing rail transit. On the sample data, the accuracy of the scheme is 76.2%, which indicates strong recognition ability. However, about 3 million records are fed into the model per day, and compared with this daily volume the amount of sample data collected for verification is relatively small. Therefore, the accuracy of the proposed scheme needs further verification.

REFERENCES

1. Naohiko H., Hyodo T., Uchiyama H. A study on characteristics of non-IIA route choice models on high density railway network [J]. Journal of Infrastructure Planning and Management, 2004, 765: 131-142.
2. Naohiko H., Yoshihisa Y., Uchiyama H. A study on evaluation of level of railway services in Tokyo metropolitan area based on railway network assignment analysis [J]. Journal of the Eastern Asia Society for Transportation Studies, 2005, 6: 342-355.
3. Li Sijie, Zhu Wei, Huang Zhaodong, et al. The spatial and temporal trajectory of urban rail transit passengers based on WIFI data [J]. Journal of East China Jiaotong University, 2017, 34 (2): 85-92.
4. Liu Shasha, Yao Enjian, Zhang Yongsheng, et al. The planning algorithm of personalized travel path for rail transit passengers [J]. Transportation System Engineering and Information, 2014, (5): 100-104, 132.
5. Si Bingfeng, Mao Baohua, Liu Zhili. Traffic distribution model and algorithm of urban rail transit network under seamless transfer conditions [J]. Journal of Railway, 2007, 29 (6): 12-18.
6. Xia Hexiang, Liu Erhui. Study on the selection of passenger flow bottleneck in rail transit station based on reverse search [J]. Transportation Research, 2015, (2): 36-41.
7. Du Cuifeng, Wang Jun. Location planning of urban rail transit sites based on improved PageRank algorithm [J]. Mobile Communication, 2016, 40 (14): 60-65.
8. Wu Xiangyun, Liu Canqi. Model and algorithm of equilibrium allocation of passenger traffic flow in rail [J]. Journal of Tongji University (Natural Science Edition), 2004, 32 (9): 1158-1162.
9. Wang Qingyu, Chen Lei. Analysis of passenger flow characteristics in Line 5 of Shenzhen rail transit [J]. Transportation Technology and Economy, 2015, 17 (6): 71-75.

10. Lijie Yu, Kuanmin Chen, Yang Liu, et al. The Analysis of Characteristics of the Passenger Flow of the Original and Terminal Stations of Urban Rail Transit: Take the NO.1 and NO.2 Subway Lines in Xi'an for Examples [C]. 14th COTA (Chinese Overseas Transportation Association) International Conference of Transportation Professionals (CICTP 2014), vol. 2, 4-7 July 2014, Changsha, China. 2014: 1433-1442.
