Mining Airfare Data to Minimize Ticket Purchase Price

To Buy or Not to Buy: Mining Airfare Data to Minimize Ticket Purchase Price Oren Etzioni Craig A. Knoblock Dept. Computer Science Information Sciences Institute University of Washington University of Southern California Seattle, Washington 98195 Marina del Rey, CA 90292 [email protected] [email protected] Rattapoom Tuchinda Alexander Yates Dept. of Computer Science Dept. Computer Science University of Southern California University of Washington Los Angeles, CA 90089 Seattle, Washington 98195 [email protected] [email protected] ABSTRACT Keywords As product prices become increasingly available on the price mining, Internet, web mining, airline price prediction World Wide Web, consumers attempt to understand how corporations vary these prices over time. However, corporations change prices based on proprietary algorithms and hid- 1. INTRODUCTION AND MOTIVATION den variables (e.g., the number of unsold seats on a flight). Corporations often use complex policies to vary product Is it possible to develop data mining techniques that will prices over time. The airline industry is one of the most enable consumers to predict price changes under these con- sophisticated in its use of dynamic pricing strategies in an ditions? attempt to maximize its revenue. Airlines have many fare This paper reports on a pilot study in the domain of air- classes for seats on the same flight, use different sales chan- line ticket prices where we recorded over 12,000 price obser- nels (e.g., travel agents, priceline.com, consolidators), and vations over a 41 day period. When trained on this data, frequently vary the price per seat over time based on a slew Hamlet | our multi-strategy data mining algorithm | gen- of factors including seasonality, availability of seats, compet- erated a predictive model that saved 341 simulated passen- itive moves by other airlines, and more. The airlines are said gers $198,074 by advising them when to buy and when to to use proprietary software to compute ticket prices on any postpone ticket purchases. Remarkably, a clairvoyant algo- given day, but the algorithms used are jealously guarded rithm with complete knowledge of future prices could save trade secrets [19]. Hotels, rental car agencies, and other at most $320,572 in our simulation, thus Hamlet's savings vendors with a \standing" inventory are increasingly using were 61.8% of optimal. The algorithm's savings of $198,074 similar techniques. represents an average savings of 23.8% for the 341 passen- As product prices become increasingly available on the gers for whom savings are possible. Overall, Hamlet saved World Wide Web, consumers have the opportunity to be- 4.4% of the ticket price averaged over the entire set of 4,488 come more sophisticated shoppers. They are able to com- simulated passengers. Our pilot study suggests that mining parison shop efficiently and to track prices over time; they of price data available over the web has the potential to save can attempt to identify pricing patterns and rush or delay consumers substantial sums of money per annum. purchases based on anticipated price changes (e.g., \I'll wait to buy because they always have a big sale in the spring..."). In this paper we describe the use of data mining methods to Categories and Subject Descriptors help consumers with this task. We report on a pilot study I.2.6 [Artificial Intelligence]: Learning in the domain of airfares where an automatically learned model, based on price information available on the Web, was able to save consumers a substantial sum of money in simulation. The paper addresses the following central questions: Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are • What is the behavior of airline ticket prices not made or distributed for profit or commercial advantage and that copies over time? Do airfares change frequently? Do they bear this notice and the full citation on the first page. To copy otherwise, to move in small increments or in large jumps? Do they republish, to post on servers or to redistribute to lists, requires prior specific tend to go up or down over time? Our pilot study en- permission and/or a fee. SIGKDD ’03, August 24-27, 2003, Washington, DC, USA. ables us to begin to characterize the complex behavior Copyright 2003 ACM 1-58113-737-0/03/0008 ...$5.00. of airfares. 119 • What data mining methods are able to detect for each departure date was collected 8 times a day.2 Over- patterns in price data? In this paper we consider all, we collected over 12,000 fare observations over a 41 day reinforcement learning, rule learning, time series meth- period for six different airlines including American, United, ods, and combinations of the above. etc. We used three-hour intervals to limit the number of http requests to the web site. For each flight, we recorded • Can Web price tracking coupled with data min- the lowest fare available for an economy ticket. We also ing save consumers money in practice? Vendors recorded when economy tickets were no longer available; we vary prices based on numerous variables whose values refer to such flights as sold out. are not available on the Web. For example, an airline may discount seats on a flight if the number of 2.1 Pricing Behavior in Our Data unsold seats, on a particular date, is high relative to We found that the price of tickets on a particular flight the airline's model. However, consumers do not have can change as often as seven times in a single day. We cate- access to the airline's model or to the number of avail- gorize price change into two types: dependent price changes able seats on the flights. Thus, a priori, price changes and independent price changes. Dependent changes occur could appear to be unpredictable to a consumer track- when prices of similar flights (i.e. having the same origin ing prices over the Web. In fact, we have found price and destination) from the same airline change at the same changes to be surprisingly predictable in some cases. time. This type of change can happen as often as once or twice a day when airlines adjust their prices to maximize The remainder of this paper is organized as follows. Sec- their overall revenue or \yield". Independent changes occur tion 2 describes our data collection mechanism and analyzes when the price of a particular flight changes independently the basic characteristics of airline pricing in our data. Sec- of similar flights from the same airline. We speculate that tion 3 considers related work in the areas of computational this type of change results from the change in the seat avail- finance and time series analysis. Section 4 introduces our ability of the particular flight. Table 1 shows the average data mining methods and describes how each method was number of changes per flight aggregated over all airlines for tailored to our domain. We investigated rule learning [8], Q- each route. Overall, 762 price changes occurred across all learning [25], moving average models [13], and the combina- the flights in our data. 63% of the changes can be classified tion of these methods via stacked generalization [28]. Next, as dependent changes based on the behavior of other flights Section 5 describes our simulation and the performance of by the same airline. each of the methods on our test data. The section also reports on a sensitivity analysis to assess the robustness of Route Avg. number of price changes our results to changes in the simulation. We conclude with LAX-BOS 6.8 SEA-IAD a discussion of future work and a summary of the paper's 5.4 contributions. Table 1: Average number of price changes per route. 2. DATA COLLECTION We found that the round-trip ticket price for flights can We collected airfare data directly from a major travel web vary significantly over time. Table 2 shows the minimum site. In order to extract the large amount of data required price, maximum price, and the maximum difference in prices for our machine learning algorithms, we built a flight data that can occur for flights on each route. collection agent that runs at a scheduled interval, extracts the pricing data, and stores the result in a database. Route Min Price Max Price Max Price Change We built our flight data collection agent using Agent- LAX-BOS 275 2524 2249 Builder1 for wrapping web sites and Theseus for executing SEA-IAD 281 1668 1387 the agent [3]. AgentBuilder exploits machine learning tech- nology [15] that enables the system to automatically learn Table 2: Minimum price, maximum price, and max- extraction rules that reliably convert information presented imum change in ticket price per route. All prices in on web pages into XML. Once the system has learned the this paper refer to the lowest economy airfare avail- extraction rules, AgentBuilder compiles this into a Theseus able for purchase. plan. Theseus is a streaming dataflow execution system that supports highly optimized execution of a plan in a network For many flights there are easily discernible price tiers environment. The system maximizes the parallelism across where ticket prices fall into a relatively small price range. different operators and streams data between operations to The number of tiers typically varies from two to four, de- support the efficient execution of plans with complex navi- pending on the airline and the particular flight. Even flights gation paths and extraction from multiple pages. from the same airline with the same schedule (but with dif- For the purpose of our pilot study, we restricted our- ferent departure dates) can have different numbers of tiers.

Mining Airfare Data to Minimize Ticket Purchase Price

ARCHITECTS of INTELLIGENCE for Xiaoxiao, Elaine, Colin, and Tristan ARCHITECTS of INTELLIGENCE

Keeping AI Legal

AI Ignition Ignite Your AI Curiosity with Oren Etzioni AI for the Common Good

Oren Etzioni, Phd: CEO of Allen Institute for AI

The Elephant in the Room: Getting Value from Big Data Serge Abiteboul, Luna Dong, Oren Etzioni, Divesh Srivastava, Gerhard Weikum, Julia Stoyanovich, Fabian Suchanek

Toward Automatic Bootstrapping of Online Communities Using

Autonomous Cars

Jeffrey P. Bigham Updated 2/3/2020

Convolutional Neural Networks (CNN) for Data Classification

Douglas C Downey

Exploiting Parallel News Streams for Unsupervised Event Extraction Congle Zhang, Stephen Soderland & Daniel S

Artificial Intelligence: Background, Selected Issues, and Policy Considerations