Ticket Data Analysis
Proceedings of the course TNK103 Analysis of Communication and Transport System, Linköping University, 2017.

Lars Drageryd, Ivan Postigo, and Tobias Åresten

Abstract
As public transport agencies move towards more intelligent ways of handling tickets and payments, new ways of analyzing travel patterns open up in parallel. This project highlights one method for analyzing passenger travel data from a smartcard-based ticket system. Data from Blekingetrafiken for October 2016 was used with two versions of an algorithm to estimate OD-matrices of travel flows. The advanced algorithm estimated a destination for 64% of the data set, which is similar to the results obtained by Trépanier and Chapleau (2006). The project also showed examples of several outputs that the algorithm enables, such as peak-hour analysis, clustered OD-matrices and transit analysis.

Keywords: Public transport; OD-estimation; Transit analysis; Smartcard data

1. Introduction
Smartcard Automated Fare Collection Systems (SCAFS) are frequently used in public transport systems all around the globe. The system can be considered convenient and efficient from the perspective of both the user and the operator. One operator in Sweden that uses the system is Blekingetrafiken, which is responsible for public transport in the southern Swedish county of Blekinge. Travelers register a smartcard, loaded with either money or a period ticket, in a transaction machine when boarding the public transport vehicle. From the operator's point of view, the primary purpose of the system is to collect the fare for the trip in a flexible, efficient and fast way (Kurauchi & Schmöcker, 2017). However, SCAFS also bring a secondary benefit by providing the agency with large amounts of individual travel data. These data can be used to analyze, for instance, vehicle utilization and the timing and frequency of transits.
With knowledge of how public transport users behave, the service can be planned more efficiently. A problem when analyzing travel behavior based on SCAFS is that users, mainly for convenience, tap their card only when boarding and not when alighting. This is also the case in Blekingetrafiken's system. The resulting data therefore describe only the start of each trip, not the whole trip: the origins of trips are easy to locate, but their destinations are more challenging to estimate.

A trip estimation model developed in 2006 offers one solution to this problem. The model makes several behavioral assumptions: travelers are assumed to always return to their origin by the end of the day, and travelers transferring between two lines are assumed to want to walk as little as possible. For transferring travelers, the algorithm therefore looks for the alighting station on the previous trip that is nearest to the next boarding station (Trépanier & Chapleau, 2006).

This project uses data from Blekingetrafiken on trips registered by the smartcard system in the county during October 2016. The main scope of the project is to construct an OD-matrix from tap-in data using the algorithm developed by Trépanier & Chapleau (2006). With deeper knowledge of travel patterns, traffic planners and decision makers can be helped to plan the public transport service more efficiently.

1.1. Aim
The primary aim of the paper is to estimate the destinations of trips using the tap-in data provided by Blekingetrafiken. With the help of two algorithms based on the one described by Trépanier & Chapleau (2006), the project generates OD-matrices for trips in Blekingetrafiken's network during October 2016. One of the algorithms is simpler and does not take timetables into consideration, while the other does.
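The trip-chaining idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's implementation (chapter 6 describes that): the names `TapIn` and `estimate_destinations`, the `route_stops` lookup of which stops each route serves, and the `max_walk_km` cap of 2 km are hypothetical choices for illustration. For each trip a card makes on one day, the destination is taken to be the stop on the boarded route that lies closest to the next boarding stop, and the last trip of the day is chained back to the first origin. Straight-line (haversine) distance stands in for walking distance.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class TapIn:
    time: str   # timestamp such as "2016-10-03 07:30" (this format sorts correctly as text)
    route: int  # route number (RUT)
    stop: int   # boarding stop number (STP_LST_NUM)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def estimate_destinations(taps, route_stops, coords, max_walk_km=2.0):
    """Chain one card's tap-ins for one day into (origin, destination) pairs.

    route_stops: dict route -> stops served by that route (hypothetical input)
    coords:      dict stop -> (lat, lon)
    max_walk_km: hypothetical cap on the walk to the next boarding stop
    A destination is None when no stop on the route is close enough.
    """
    if len(taps) < 2:
        return []  # a single tap-in has no next boarding to chain to, so the card is skipped
    taps = sorted(taps, key=lambda t: t.time)
    trips = []
    for i, tap in enumerate(taps):
        # Next boarding stop; after the last trip the traveler is assumed to return to the first origin.
        target = taps[i + 1].stop if i + 1 < len(taps) else taps[0].stop
        candidates = [s for s in route_stops.get(tap.route, []) if s != tap.stop]
        # Alight at the stop on the current route that is nearest to where the next trip starts.
        best = min(candidates, key=lambda s: haversine_km(coords[s], coords[target]), default=None)
        if best is None or haversine_km(coords[best], coords[target]) > max_walk_km:
            trips.append((tap.stop, None))
        else:
            trips.append((tap.stop, best))
    return trips
```

For a card with a morning tap-in on one route and an afternoon tap-in on another, the morning trip is assigned the stop on the first route nearest the afternoon boarding stop, and the afternoon trip the stop on the second route nearest the morning origin. Trips whose nearest candidate lies beyond the cap stay unestimated, which is one way an algorithm of this kind ends up with a destination for only part of the data.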
The concept of the OD-matrix is further explained in subchapter 1.2. The OD-matrices serve as the basis for a deeper analysis of travel patterns. To visualize the OD-matrix efficiently, origins and destinations, which represent stations, will be clustered. This shrinks the OD-matrix substantially and makes travel patterns easier to identify. Besides generating the OD-matrices, the project aims to answer the following questions:
• What difference does it make to include the timetable in the algorithm?
• What are the biggest challenges in turning raw data into results?

1.2. Limitations
To protect the passengers' privacy, the number identifying each smartcard is randomized every day. Consequently, no travel patterns of an individual user can be followed over any period longer than a single day. Some travelers may behave repetitively on a weekly basis, but this behavior cannot be identified because of the daily randomization. The algorithm presented by Trépanier & Chapleau also includes methods for analyzing such multi-day data, but since the card ids are randomized each day, no such analysis can be conducted in this project. The algorithm uses the next tap-in made by the same card to estimate the alighting station of the previous trip. Cards registered in the system only once during a day are therefore excluded from the analysis.

A fundamental concept in traffic planning in general is the OD-matrix. OD stands for origins and destinations, and depending on what is of interest, the matrix can represent the movement of vehicles, travel times or numbers of passengers. In this project, every station can be both an origin and a destination, so the set of origins and the set of destinations are the same.
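Once each trip has an (origin, destination) pair, a daily OD-matrix is simply a count per pair. The Python sketch below illustrates this, together with the clustering step mentioned in the aim. The function names are illustrative, and the station-to-cluster mapping `cluster_of` is a hypothetical input, since the clustering method itself is not specified here.

```python
from collections import Counter


def build_od(trips):
    """Count trips per (origin, destination) pair; trips without an estimated destination are dropped."""
    return Counter((o, d) for o, d in trips if d is not None)


def cluster_od(od, cluster_of):
    """Collapse a station-level OD-matrix into a cluster-level one.

    cluster_of: dict station -> cluster label (hypothetical mapping).
    """
    clustered = Counter()
    for (o, d), n in od.items():
        clustered[(cluster_of[o], cluster_of[d])] += n
    return clustered


def as_matrix(od, labels):
    """Render OD counts as a square list of lists: rows are origins, columns are destinations."""
    return [[od.get((o, d), 0) for d in labels] for o in labels]
```

Summing cell counts within clusters leaves the total number of trips unchanged, so the clustered matrix is a coarser view of the same data rather than a different estimate.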
In practice, origins and destinations are represented by stations, and each cell of the matrix holds the number of registered trips (passengers) between the two stations during one day. As the number of stations is large (more than 1,000), stations will later be clustered together to reduce the size of the OD-matrix. The project uses tap-in data from Blekinge for October 2016. The initial analysis uses the entire data set, whereas the final analysis uses a smaller subset containing only data from 3 October. The reason for this delimitation is that the purpose of the project is not to analyze the traffic situation in Blekinge, but rather to show how tap-in data can be used to support traffic planners in their decisions. The method presented could later be applied to the whole month, but the size of the data imposes computational limitations, since the algorithm takes a long time to run.

1.3. Outline
The remainder of the report is structured as follows. Chapter 2 presents the methodology of the project. Chapter 3 presents relevant literature. Chapter 4 describes the data. Chapter 5 presents an initial analysis of the tap-in data. Chapter 6 explains the algorithm used. Chapters 7 and 9 present the results and the conclusions of the project.

2. Methodology
A major obstacle to OD-estimation in public transport systems is the lack of destination data, as most systems do not require users to tap their card when leaving the vehicle, mainly for convenience. The alighting station therefore has to be estimated in order to generate an OD-matrix. In this project, destinations are estimated using tap-in data, station data and the 2016 timetable from Blekingetrafiken. The tap-in data include a timestamp of when a passenger boarded which bus, and at which station.
The station data include the stations and their coordinates. The timetable contains all arrival and departure times for all vehicles at all stations over all time periods, and is hence substantial in size. The stations belonging to Blekingetrafiken were extracted from the timetable data set using an SQL query, so that only the relevant timetables were imported into the algorithm. An example of each of the three data types is shown in the tables below (see chapter 4 for a more detailed description of the data).

Table 1 - A selection of the most important columns of the tap-in data. The columns represent the timestamp, the route, the stop number, the sequence number of the stop on the route, and the daily randomized card id.

TRS_DT            RUT  STP_LST_NUM  STP_LST_SEQ_NUM  CRD_NUM
2016-10-05 21:33  150  150013       33               23b22c6ccd530509c86dd7133477d4ce

Table 2 - The station data, with the station number expressed using both the local and the national numbering, the station name and its coordinates.

STP_LST_NUM  Agency  STPNAMPRN                     GPS_LATITUDE  GPS_LONGITUDE  Stop_id
100101       1001    Kungsplan Karlskrona centrum  56,165381     15,586921

Table 3 - Timetable data showing the arrival and departure times at a specific stop, the sequence number of the stop and the unique trip id.

Trip_id  Arrival_time  Departure_time  Stop_id    Stop_sequence  Agency_stop_id
45501    15:32:00      15:35:00        740000096  2              8109

Using these data, destinations are estimated with an algorithm based on the one presented by Trépanier & Chapleau (see chapter 6 for a description of the algorithm). Because the algorithm relies on broad assumptions, its estimates may be far from perfect when studied in detail; however, the main purpose of the algorithm is to give an indication of travel behavior on a general level. Initial analysis and filtering of the data were done in Microsoft Excel and through SQL queries in Microsoft Access.
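The timetable-based version of the algorithm can narrow the set of candidate alighting stops before the nearest-stop rule is applied. Chapter 6 describes the algorithm actually used; the Python sketch below shows only one plausible way to use the Table 3 fields. It assumes that each tap-in has already been matched to a timetable `Trip_id`, which is a hypothetical step, since the tap-in data carry a route and a timestamp rather than a trip id. A stop later on the boarded trip is kept only if the vehicle reaches it no later than the card's next tap-in. `StopTime`, `to_seconds` and `timetable_candidates` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class StopTime:
    trip_id: int   # Trip_id
    arrival: str   # Arrival_time as "HH:MM:SS"
    stop_id: int   # Stop_id
    seq: int       # Stop_sequence


def to_seconds(hms):
    """Convert "HH:MM:SS" to seconds after midnight (hours above 23 are allowed)."""
    h, m, s = map(int, hms.split(":"))
    return 3600 * h + 60 * m + s


def timetable_candidates(stop_times, trip_id, boarding_seq, next_tap):
    """Stops on the boarded trip after the boarding stop that are reached before the next tap-in.

    next_tap: clock time "HH:MM:SS" of the card's next tap-in, or None for the last trip of the day.
    """
    limit = to_seconds(next_tap) if next_tap else float("inf")
    return [
        st.stop_id
        for st in sorted(stop_times, key=lambda st: st.seq)
        if st.trip_id == trip_id and st.seq > boarding_seq and to_seconds(st.arrival) <= limit
    ]
```

The returned list would then replace the plain list of stops on the route as the candidate set, so that the nearest-stop rule can only pick stops the vehicle actually passed in time.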
In the Excel analysis (see chapter 5), the pivot-table function was used to analyze tap-ins during different time periods, without considering the algorithm or any potential alighting station. In the initial analysis, the GIS software ArcMap was used to display the frequency of tap-ins at the different stations. As mentioned, the algorithm was largely based on the one developed by Trépanier and Chapleau, but was not identical to it.
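The hourly pivot-table counts and the per-station tap-in frequencies can also be reproduced outside Excel and ArcMap. The Python sketch below is an illustrative equivalent, not the tooling the project used. It assumes timestamps in the `TRS_DT` format of Table 1 (`YYYY-MM-DD HH:MM`), and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def taps_per_hour(timestamps):
    """Tap-ins per hour of day (0-23), like a pivot table with the hour as row label."""
    hours = Counter(datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts in timestamps)
    return [hours.get(h, 0) for h in range(24)]


def peak_hour(counts):
    """Hour of day with the most tap-ins (the earliest hour wins ties)."""
    return max(range(len(counts)), key=counts.__getitem__)


def taps_per_station(stops):
    """Tap-in frequency per boarding stop, the quantity displayed on the map."""
    return Counter(stops)
```

Because these counts use only boarding records, they can be computed before any destination is estimated, which matches how the initial analysis was done separately from the algorithm.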