Demand for Domestic Air Transportation

1

Ayda Coskunpinar

Hanover College

Professor Eric Dodge

Introduction to Econometrics

December 10, 2007 2

Abstract

The modern period of aviation began with a hot air balloon flight on November 21, 1783.

The air travel industry has been improving since then, and it has become one of the most popular modes of transportation. This study is an attempt to analyze the factors that affect the demand for domestic air transportation in the United States between the years 1980 through 2005. After reviewing different literature on this topic, the variables used in this model include income, population, air travel safety, and motor vehicle safety variables. Empirical results show that the average price of domestic tickets and the per capita income are highly significant explanatory variables in this study. 3

Contents

Introduction 4

Literature Review 5

Data and Model Specifications 8

Variables 8

Model 9

Main Hypothesis to Test 10

Expected Signs 10

Econometric Issues 16

Data and Empirical Results 16

Data collection 16

Variable Definitions 17

Data Problems 18

Summary Statistics 18

Empirical Results 19

Conclusion 27

References 28

Tables 30 4

Introduction

Air transportation has become a popular medium for various reasons since Leonardo da

Vinci began with the studies of flying. Even in the early 1930s, it was thought to be the superior transportation medium. Watkins (1931) presents “speed” as the prime factor that distinguishes air travel from all other means of transportation, as they existed 76 years ago. He also states that the ability to make direct transit between two locations without stopping and the free right of way, as the other major factors that promote the demand for this type of transportation. This time- series study examines the socioeconomic factors that influence the demand for domestic air transportation in the United States between the years 1980 through 2005. The analysis of the data also provides information on whether the September 11, 2001 attacks had an impact on the demand.

The data are gathered from various private and government resources such as National

Transportation Board, U.S Census Board and National Highway Traffic Safety Administration.

This study shows that the average price of domestics fares, per capita income, number of motor vehicle fatalities per population, and whether the year is 2001 or after, are the significant variables in the model, whereas the price of oil is insignificant. 5

Literature Review

The investigation of this topic developed as the object of theoretical and empirical study to understand its demand. Young (1972) used both time-series and cross sectional analysis in his study where his dependent variable as the measure of the demand for air transportation was per capita passenger trips. In his time-series analysis, he used ticket price, journey time by the airplane, and income as the independent variables. He experimented with various models for this time-series analysis: the current income model, the permanent income model, the partial adjustment model, and the general time- series model. He applied linear regression analysis for the general time-series model, and log-linear form for the other three. He concluded that all the variable coefficients in his various model specifications had the expected signs, positive. The R- squares laid between 0.94 and 0.99 for his various models. He concluded that the general time- series model was superior to the other three.

Another study for the demand for air transportation was done by Verleger (1972), in which the goal was to “ develop a model which should indicate whether differences in demands exist across city pairs, whether there is any regularity to these differences, and thus whether cross-sectional or aggregate models are theoretically feasible” ( Verleger, 1972, p. 443). Gravity model was used because it makes it possible to analyze travel demand over time between two cities. Verleger (1972) tested several specifications of the demand model. He compared various specifications of the gravity model to a simple log linear demand model, and found that the modified form of the gravity model performed better on average compared to the log linear model. The study concluded that the demand for air travel was very income elastic: “The results for the estimate of demand for all city pairs indicate that only 7 of the 115 β coefficients are less 6 than zero and 79 of the remaining 110 coefficients are statistically greater than zero at a 95- percent confidence level” ( Verleger, 1972, p. 451). The study also showed that more than half of the price coefficients were statistically insignificant. The author concluded that shorter travel routes have smaller price elasticities. The absolute value of price elasticity increased with distance and approached unity in the limit.

Lansing, Liu and Suits (1961) analyzed the interurban air travel in the United States. The study had two statistical sections. The first section analyzed the traffic between New York and a sample of 151 cities. The dependent variable was the number of trips and the independent variables were distance, population and per cent of families with income over $ 10,000. They used regressions that were linear in the variables’ logarithms. As stated in the article, “The final regressions involved a sample of cities including 151 of the 170 standard metropolitan areas exclusive of New York and Chicago” ( Lansing, Liu & Suits, 1961, p.91). The results showed that the impact of the independent variables were highly affected by the nature of the city. This section of the study concluded that “ The equality of the income and population coefficients in the Chicago regression is consistent with the theory that air travel should depend on the number of high income families, whereas travel to and from New York appears to be affected by the whole population rather than by the upper income group alone” (Lansing, Liu, Suits, 1961, p.91).

The second section analyzed the number of trips between two cities as a function of the distance between them. They fitted the regression into three samples of cities which they had categorized according to size. They excluded New York and Chicago from the samples and concluded the following: “For the largest cities, a 1 per cent increase in intercity distance is associated with a .6 of 1 per cent decrease in number of trips, whereas in the samples of smaller cities the distance elasticity is twice as great” (Lansing, Liu, Suits, 1961, p.93). Through analyzing both of the 7 statistical sections of the study, the authors concluded that the dependent variable, the volume of air travel between two cities, was positively related with their populations and the per cent of the population with incomes over $10,000, and negatively related with the distance between them.

The study noted that as the population increased, its coefficient rose, and that the coefficient of the income variable was affected by the city size.

Tretheway and Oum (1992) measured the demand for airlines and specified the following as the determinants: price, income, price of other modes of transport, frequency of service, timing of service, day of the week, season of the year, safety record, demographics, distance, in- flight amenities, customer loyalty, and travel time. They associated the major demand effects with price and income. Battersby and Oczkowski (2000) examined price, income, substitute prices, and seasonality as demand determinants. They used revenue passenger kilometer, which was derived by multiplying the distance of a route by the product number of the number of passengers in a route, as the dependent variable, all in linear forms. They concluded that the transportation price elasticities varied across classes and routes. The own price, income, substitute price and seasonality variables were found to be credible variables that described airline demand. 8

Model Specification

The primary goal of this study is to provide an understanding of the domestic air travel demand in the United States- what are the principal factors that influence the number of domestic trips made by passengers, per capita? According to the theory of consumer behavior, individuals will demand the particular commodity that satisfies their needs and desires, subject to certain constraints such as budget. Tretheway and Oum (1992) identify thirteen variables that determine airline demand: “price, income, price of other modes of transport, frequency of service, timing of service, day of the week, season of the year, safety record, demographics, distance, in-flight amenities, customer loyalty and travel time”. The present study analyzes the demand on domestic air travel over years using a time- series model, thus several of the variables used in Tretheway and Oum’s model are relevant. Following are the independent variables that bear the qualities that affect consumer behavior regarding air travel.

Per capita income: The mean income computed for the residents of the United States. The data is derived by dividing aggregate income by total population. These data are adjusted according to inflation- Real Income.

Price of crude oil: The domestic first purchase annual average price of crude oil, per million

British thermal unit (Btu), in the United States. These data are adjusted according to inflation-

Real price of crude oil.

Accidents: Number of accidents per domestic flight hours, during the calendar year.

Motor vehicle fatalities per population: Number of motor vehicle accidents occurring on the road in a given year in the United States, divided by the total population of the United States, which includes Armed forces abroad. 9

Average price of domestic fares : Average ticket price of a one-way domestic trip in the

United States, during the calendar year. These data are adjusted according to inflation. Real average price of domestic fares.

September: If the time period specified is 2001or later.

Below is the dependent variable of this study:

Total passengers enplaned per population: The total number of passengers boarding domestic flights, divided by the population.

The following is the mathematical representation of my hypothesized model:

TENP/POPt = β0 - β1 ACCt - β2 APDFt+ β3(log)INt + β4 MVFATPPt - β5 OILt -β6 SEPTt + ε

Where:

TENP/POPt: Total Passengers Enplaned Per Population

ACCt: Accidents

APDFt: Average Price of Domestic Fares

INt: Per Capita Income

MVFATPP: Motor Vehicle Fatalities Per Population

OILt: Price of crude oil

SEPTi: 0= if before 2001, 1= if 2001 or later.

My model will have a linear functional form and OLS will be used as the estimation technique. Young stated that “estimation of the parameters in the general time- series model can be regarded as a problem of nonlinear estimation” (Young, 1972, p.561). However he decided to treat it as a problem of linear estimation for practical purposes, and applied linear regression 10 analysis. Based on the theory behind consumer behavior and the literature findings of Battersby and Oczkowski, which state that “the major demand effects are associated with price and income variables” (Battersby and Oczkowski, p.4), my main hypothesis is that per capita income would have a significant positive impact on the dependent variable.

Expected Signs:

The hypothesized signs can be expressed as follows:

Per Capita Income:

Ho: βIN ≤0

Ha: βIN >0

There is an expected positive relationship between air travel demand and personal income. Lansing, Liu and Suits (1961) used income as one of their independent variables while determining the demand for trips between two cities. The study concluded that the volume of air travel between two cities was positively related with the per cent of population with incomes over $10,000. In addition, Verleger (1972) found in his study that air travel demand was very income elastic. He concluded that the “city-to-city air travel can be explained very well by incomes in the individual cities” ( Verleger, 1972, p.452). It makes sense that people would increase demand for air travel as their income increases, however I believe that there is a point after which the demand increases at a decreasing rate. Thus, the income variable needs to be logged and the elasticity will be calculated. 11

Crude Oil:

Ho: βOIL ≥0

Ha: βOIL <0

There is an expected negative relationship between air travel demand and price of crude oil. Young states in his study for demand for air transportation service that, “an individual’s demand for a particular commodity depends on its price, the prices of related commodities and his own income” (Young, 1972, 560). As the price of oil, which is an input, goes up, the price of aviation fuel that is used to operate planes increases. This adds to the cost index of operating planes, which increases the overall cost of air transportation. There is an expected negative relationship between air travel demand and price of crude oil.

Accidents:

Ho: βACC ≥0

Ha: βACC <0

There is an expected negative relationship between air travel demand and the accidents variable. Tretheway and Oum (1992) identify safety record as one of the main determinants of airline demand. The San Fransisco Bay Conservation and Development Commission(2007) states perceptions of aviation safety as a global factor that affects air passenger demand. It makes sense that the individuals would choose the mode of transportation that they believe is the safest.

The passengers would not risk their lives by choosing to use a mode of transportation that has a high risk, or historical record of accidents. 12

Motor Vehicle Fatalities Per Population:

Ho: βMVFATPP ≤0

Ha: βMVFATPP >0

There is an expected positive relationship between air travel demand and the number of motor vehicle fatalities per population. Motor vehicles are substitute for planes. If the motor vehicle fatalities increase, symbolic of a decrease in the safety of this type of transportation, the demand for other types of transportation will increase in order to avoid risk of injury or death.

Average Price of Domestic Fares:

Ho: βAPDF ≥0

Ha: βAPDF <0

There is an expected negative relationship between air travel demand and the average price of domestic fares. It is logical that as the price of a commodity goes up, consumers purchase less of that commodity. This is classical consumer behavior. As the average price of domestic fares increases, they will most likely reduce the quantity demanded.

September:

Ho: βSEPT ≥0

Ha: βSEPT <0

There is an expected negative relationship between air travel demand and the September

11 dummy variable. Ito and Lee (2004) concluded in their study which assesses the impact of

September 11 attacks that, “September 11 resulted in both a negative transitory shock of over

30% and an ongoing negative demand shock amounting to roughly 7.4% of pre-September 11 demand” (Ito & Lee, 2004). I thought that it was imperative in my study to test whether

September 11 attacks reduced the demand as it would be predicted. This is a factor that 13 represents airline safety and it is predicted that as safety level decreases, demand should increase as well.

Econometric Issues:

Functional Form:

I use linear relationships for all of my variables except income. Theoretically it makes sense for my model that the income variable would have a logged functional form. An increase in income will increase the number of total passengers enplaned per capita at an increasing rate up to a certain level after which the demand will continue to increase at a decreasing rate. In order to account for this concept, and be able to measure the income elasticity, this variable will be logged.

Omitted Variables:

Any variable that measures the operation consistency is going to be an omitted variable in this study. Theoretically it makes sense that passenger utility would increase if the flights are on time and land at the original scheduled destination. However if the flights are late, cancelled , or forced to land at a destination other than the original scheduled location, the consumer utility will go down, thus this transportation mode will no longer prove to be efficient.

Anderson and Kraus (1981) stated in their study that demand for travel depends on time travel, usual price and activity variables, and that delay is included in the travel time as well.

However, data on this variable that seem to be relevant in this study based on previous literature and theory, were not reported till 1991. If this variable is to be included, the sample size for this study will decrease to 14, which is too small. Therefore, this variable is an omitted variable.

Another omitted variable is the obesity rate. An article published in The Orlando

Sentimental (2004) reports that larger passengers mean heavier planes, which results in an 14 increase in the cost of flying. The logic behind this is that extra weight causes airlines to burn extra gallons of aviation fuel. So, higher cost is expected to decrease demand. It is also expected that heavier individuals would use air travel less due to the heart attack risk and the inability to fit in one seat. So, this factor seems relevant in our study. However Body Mass Index rates

(which is a measurement of categorizing individuals as obese or not), or the national average obesity rates, have not been reported before 1986. The elimination of the data before 1986 is not logical since the sample size will be too small for this time- series study. Therefore, this is another omitted variable.

The same problem exists with the advertising and promotion costs variable. More advertising and promotion is expected to increase the demand for a good. However these data were not reported before 1990, and in order to avoid the small sample problem, it will be considered as an omitted variable.

Multicollinearity:

It is possible that the independent variables used for this study can correlate to one another and cause perfect or imperfect multicollinearity within the model. For example, crude oil and average price of domestic fares might have multicollinearity. An increase in the price of crude oil could mean an increase in the aviation fuel prices, thus increase the average price of domestic fares. It is also possible that there is multicollinearity between the September variable and the number of accidents. It is obvious that the number of aviation accidents will be higher at

2001. Also, there might be multicollinearity between September and the price of oil. Oil prices decline sharply after September 11 attacks due to low oil demand. Prices then increase due to a strike in Venezuela, rising tension in the Middle East, and the cold winter weather. This up and 15 down pattern in the price of oil continued, clearly due to the tension originated by the September

11 attacks.

Serial Correlation:

Serial correlation is a possible problem that the model experiences since the data are time- series. It is highly likely that different observations of the error term are correlated with each other either due to specification error or data problems. 16

Data and Empirical Results

This study analyzes the demand for domestic air travel with a time- series data set. The purpose of this type of a model is to be able to analyze the impact of the change in the quantitative independent variables on demand over time, and also to test whether there is a significant increase or decrease in the demand for air travel due to the September 11, 2001 attacks. The data ranges from 1980 through 2005, and includes information from all 50 states as the study measures the demand for domestic air travel in the United States between the years specified.

The data used in this study are collected from one private and three government resources. The transportation data for the accidents and the total passengers enplaned are derived from the National Transportation Safety Board, an independent Federal agency. The number of motor vehicle fatalities data are provided by the National Highway Traffic Safety

Administration. The income, population and oil price data are gathered from the U.S. Census

Bureau. The average price of domestic fares data are originated from the US DOT Origin and

Destination Passenger Survey, collected by the Database Products, a private institution located in

Texas.

Once again, this study measures the demand for domestic air travel in the United States between 1980 through 2005. The dependent variable measuring the demand is the total passengers enplaned per capita. The number of total passengers enplaned is the count of passengers boarding domestic flights. This number is divided by the population.

Numerous research studies have discovered that the variables mentioned below would influence the number of passengers enplaned per capita. 17

Accidents: The number of occurrences in major carriers and cargo carriers within the

United States, in which any person aboard with the intention of flight suffers death or serious injury (hospitalization for more than 48 hours).

Average Price of Domestic Fares: The one way average price of a domestic fare in the

United States. Does not include the tickets for which frequent fliers pay zero fare (diluted fares).

The fares are the net of all taxes, passenger facility costs and security charges. This variable is in real values.

Per Capita Income: Personal income divided by population. Personal income is the total amount of income received from all sources during the year specified, by the residents of each state. The data does not include federal employees overseas and the U.S residents on temporary foreign assignment that work with private U.S firms. This variable is in real values.

Motor vehicle fatalities per population: The number of occupants killed or injured in a motor vehicle crash. A motor vehicle includes passenger cars, light trucks, large trucks, busses and other. A crash involves at least one motor vehicle traveling on a traffic way that is open to the public.

Price of oil: Price of crude oil in dollars per million British thermal unit (Btu). This variable is in real values.

September: Dummy variable. 1= If the year is 2001 or later, 0= Otherwise. 18

Data Problems:

The original goal was to include data from 1970 through 2005, and thus have 36 observations in order to not have a small sample for this time- series model. However most data in the air travel industry have recently been started to get reported, and E-views skips over the entire observation if there are missing data, so the sample size fell down to 25. Also, the missing data created omitted variables which were not able to be used as independent variables in the project. For example, advertising and promotion costs were not reported before 1990, and the airlines did not start reporting the data on operation consistency (cancellations and delays) until

1987. Also, there are no specific data available on the obesity rates before 1987. Due to these missing data, variables that seemed significant in previous literature and that make theoretical sense, were not able to be included in the model. The issues that small sample size raise such as multicollinearity and insignificance, will later be discussed in the Empirical Results section.

Another data concern exists within the air transportation accidents variable. These data do not include small commercial carriers and non- scheduled taxis, thus they smaller than they would be if they were to be included. This does not create a great concern as the variable still provides information on the number of domestic accidents, and the relationship between this variable and the dependent variable is still expected to be negative, even though the cumulative air travel accidents data are not included.

The summary statistics for this study are found in Table 1. In this data set, the average number of total enplanement per capita is 1.759015. The average number of accidents is

25.11538. The average price of domestic fares is $204.6165. The average income per capita is

$29,516.44. The average number of motor vehicle fatalities per population is 0.000128. The average price of oil is $5.425385. 19

Empirical Results

The dependent variable of this study is a quantitative variable so it is essential to run the regressions using the Ordinary Least Squares method. The original regression provided the following equation (see Table 2):

TENP/POP= -17.59910-0.000112ACC-0.006662APDF+1.995191LOG(IN)+

1651.632MVFATPP+0.008476OIL-0.299097SEPT

According to the model if the number of accidents increase by 1 unit, the number of total passengers enplaned per capita will decrease by .000112 people per capita. If the average price of domestic fares increases by 1 unit, the number of total passengers enplaned per capita will decrease by 0.006662 people per capita. If the income per capita increases by 1 percent, the number of total passengers enplaned per capita will increase by 1.995191 people per capita. If the number of motor vehicle fatalities per population increase by 1 unit, the number of total passengers enplaned per capita will increase by 1651.632 people per capita. If the price of oil is increased by 1 unit, the number of total passengers enplaned per capita will increase by 0.008476 people per capita. If the year is 2001 or later, the number of total passengers enplaned per capita will decrease by .299097 people per capita.

Since this is a time series model, it is necessary to test all variables to check if they are stationary. According to the Dickey- Fuller test, the variables OIL, IN, APDF, and ACC all have unit roots (see Table 3). Since these variables are stationary, it is necessary to test the residuals of the equation for cointegration. According to the Dickey- Fuller test, the residuals have no unit roots, thus the variables are cointegrated (see Table 4). Since the nonstationary variables are 20 cointegrated, there is no need to estimate the equation using first differences or percentage changes. The equation can be estimated in its original units.

The only variable that has a flipped sign while compared to the expected relationship is

OIL in this regression. The variables APDF, LOG(IN) and SEPT are significant at a 99+% confidence level. The variables MCFATPP, OIL and ACC are insignificant, out of which the

ACC is the most insignificant with a p-value of .4685(0.9370/2). Therefore, the variable ACC is to be dropped from the final regression. The Adjusted R2 is 0.981773, which means that the specified regression estimates 98.17% of the total variation in the total number of passengers enplaned per capita, which is a pretty high percentage to observe in real life.

The model does not show any evidence of omitted variable bias (see Table 5). According to the Ramsey Reset test, the fstat of 1.470666 is smaller than the critical value of 3.24.

Therefore, the null hypothesis that Ho:β7= β8= β9=0 cannot be rejected, showing no evidence for omitted variables.

The model has several variables that are correlated with each other (see Table 6). The variables that have the highest multicollinearity are APDF & LOG(IN), and APDF &

MVFATPP. This makes sense because higher income could increase demand for air transportation, which increases the average price of domestic fare. Also, as the number of motor vehicle fatalities per population increases, the demand for air transportation would increase, and this increases the price as well. In order to detect the severity of the multicollinearity in this model, the variance inflation factors (VIF) will be calculated. 21

DEPENDENT VARIABLE R2 VIF VALUE ACC 0.563530 2.29 APDF 0.907735 10.84 LOG(IN) 0.877590 8.17 MVFATPP 0.715420 3.51 OIL 0.652806 2.88 SEPT 0.749581 3.99

As suggested in the table above, the variables that have a VIF value that is bigger than 5 have severe multicollinearity.

The model has no evidence for serial autocorrelation. According to the Durbin- Watson test, the dstat of 2.538051 is bigger than the DU of 1.99, thus the null hypothesis of Ho: p≤0 cannot be rejected, showing no evidence for serial autocorrelation.

The model has no evidence of heteroskedasticity (see Table 7). According to the White test, The dstat of 15.37601 is lower than the chi value of 19.68. Thus we cannot reject the null hypothesis that Ho: Homoskedastic.

Further analysis of this regression provides the following explanation:

The variables TENP/POP, APDF, IN,MVFATPP and OIL has either obvious decreasing or increasing trends over time (see Table 8). This means that the time factor is factored into the effects of each of these variables, causing bias in a way. This might be the reason why the adjusted R2 is so high. This problem needs to be solved immediately by adjusting the variables in order to account for the effects of time.

The following is the final regression with TENP/POP as the dependent variable and an additional qualitative independent variable, TIME (see Table 9) The variable assumes the value of 1 for

1980, and the numbers increase in a consecutive number for each year, up to 26 for 1995.

Below is a chart of the coefficients, tstats and probabilities of the independent variable. It is important to remember to divide the probabilities by 2 since the variables are tested one- sided. 22

VARIABLES β T- STATISTICS PROBABILITY APDF -0.006264 -8.441739 0.0000 LOG(IN) 1.778654 5.927633 0.0000 MVFATPP 2272.637 1.601024 0.1259 OIL 0.007456 1.168251 0.2572 SEPT -0.301082 -8.422418 0.0000 TIME 0.005641 1.101422 0.2845

Hypothesis Testing:

Ho: βAPDF≥0; βLOG(IN)≤0; βMVFATPP≤0; βOIL≥0; βSEPT≥0; βTIME≤0

Ha: βAPDF <0; βLOG(IN)>0; βMVFATPP>0; βOIL<0; βSEPT<0; βTIME>0

The obtained model was significant in explaining the factors involved in TENP/POP

However there was only one unexpected sign in the explanatory variables; OIL.

The average price of domestic fares is expected to have a negative relationship with the total number of passengers enplaned per capita, because the cheaper the price of a product or service is, the higher the quantity demanded for that product is. We can reject the null with 99+% confidence. The estimated coefficient of -0.006264 indicates that a 1unit increase in the average price of domestic fares will decrease the total number of passengers enplaned per population by

0.001646 people per capita.

The per capita income has a positive relationship with the total number of passengers enplaned per capita. Income is one of the determinants of demand for a good or service and the study measures the demand for a normal good. Regarding air travel, as income increases, it is expected that the demand would also increase up to a certain point after which the increase will occur at a decreasing rate. Semi-log functional form is appropriate for functions in which a one percent increase in an independent variable causes the dependent variable to increase at a 23 decreasing rate. We can reject the null with 99+% confidence. The estimated coefficient of

1.778654 indicates that a 1 percent increase in the per capita income will increase the total number of passengers enplaned per capita by 1.0778654 people per population.

The number of motor vehicle fatalities per population has a positive relationship with the total number of passengers enplaned per capita. It makes theoretical sense that if motor vehicles have a high fatality rate, people would increase the demand for air travel, as it is a substitute means of transportation. We can reject the null hypothesis with 93% confidence. The estimated coefficient of 2272.637 indicates that a 1 unit increase in the percentage change of motor vehicle fatalities per population will increase the total number of passengers enplaned per capita by

2272.637 people per population.

The price of oil has a flipped sign. The variable was expected to have a negative a relationship with the total number of passengers enplaned per capita, because if the crude oil costs less, aviation fuel will cost less, and the total cost of flying would be cheaper, thus increasing the total number of passengers enplaned per population. However, further thinking provides a logical explanation for the sign of this variable: As the price of crude oil increases, cost of operating motor vehicles will increase too, therefore the advantage of speed and comfort that the planes provide will lead the consumers to increase the quantity demanded of the demand for air transportation. We cannot reject the null hypothesis at a significant confidence level, stating that its effect on the dependent variable is insignificantly different from zero. The estimated coefficient of 0.007456 indicates that a 1 unit increase in the price of oil will increase the total number of passengers enplaned per capita by 0.007456 people per population.

The September dummy variable has a negative relationship with the total number of passengers enplaned per capita. It makes theoretical sense that people would decrease the 24 demand for air travel after September 11, 2001, due to safety reasons. We can reject the null hypothesis at a 99% confidence level. The estimated coefficient of -0.301082 indicates that if the year is 2001 or later, the total number of passengers enplaned per capita will decrease by

0.301082 people per population.

The time variable has a positive relationship with the total number of passengers enplaned per capita. This variable is added to the model in order to account for the time factor.

The dependent variable can pretty much be explained by this variable alone. As the time passes, the demand for air travel will increase as well. We cannot reject the null hypothesis at a significant confidence level. The estimated coefficient of 0.005641 indicates that if it is one year later than the given, the total number enplaned per capita will increase by 0005641 people per population.

The model does not show any evidence of omitted variable bias (see Table 10).

According to the Ramsey Reset test, the fstat of 3.005713 is smaller than the critical value of 3.24. Therefore, the null hypothesis that Ho:β7= β8= β9=0 cannot be rejected, showing no evidence for omitted variables.

The model has several variables that are correlated (see Table 11) with each other. The variables that have the highest multicollinearity are APDF & LOG(IN), and APDF &

MVFATPP, just like in the first regression. One of the reasons why the MVFATPP variable has such a big variance yet a small t-score might be related to this multicollinearity. It is also observed that the TIME variable in this model is highly correlated to all variables except OIL.

In order to detect the severity of the multicollinearity in this model, the variance inflation factors

(VIF) will be calculated.

DEPENDENT VARIABLE R2 VIF VALUE APDF 0.928968 14.08 25

LOG(IN) 0.943156 17.59 MVFATPP 0.790134 4.76 OIL 0.651545 2.86 SEPT 0.745837 3.93 TIME 0.972288 36.085

As suggested in the table above, the variables that have a VIF value that is bigger than 5 have severe multicollinearity.

The model has no evidence of heteroskedasticity (see Table 12). According to the White test, The dstat of 13.07751 is lower than the chi value of 19.68. Thus we cannot reject the null hypothesis that Ho: Homoskedastic.

The model has no evidence for serial autocorrelation. According to the Durbin- Watson test, the dstat of 2.329819 is bigger than the DU of 1.99, thus the null hypothesis of Ho: p≤0 cannot be rejected, showing no evidence for serial autocorrelation.

Compared to the first regression, this regression has smaller Akaike and Schwarz indexes and a higher adjusted R2, which suggests its superiority over the first one.

The fstat of this model shows that the null hypothesis can be rejected, meaning that there is a good overall significance in this model.

Ho: βAPDF= βLOG(IN)= βMVFATPP= βOIL= βSEPt= βTIME=0

Ha: βAPDF ≠βLOG(IN) ≠βMVFATPP≠ βOIL≠βSEPT≠ βTIME≠0

Since this study is a demand model, there is possibility for simultaneity bias in which the

Classical Assumption III is violated. In simultaneous equations, the error terms are correlated with the endogenous variables whenever the dependant variables appear as explanatory variables. The Two-Stage Least Squares (2SLS) method needs to be used to test for simultaneity because this equation might be simultaneously determined by another model—a supply function.

This might be a reason behind the upward bias in some of the estimated coefficients. In this 26 mode, the average price of domestic fares is the endogenous variable. The income, motor vehicle fatality, oil, September and time variables are the instruments. The results of the 2SLS estimates are shown in table 13. Compared to the OLS, the tstats of the 2SLS are lower, except for oil.

This means that my final model has a simultaneity problem because 2SLS increases SEs and lowers tstats.

Conclusion

The primary purpose of this paper has been to develop a model to examine the factors that affect the demand for domestic air transportation in the United States between the years

1980 through 2005. Several major finding emerged through the analysis of data. Average price of plane tickets, per capita income, number of motor vehicle fatalities per population and the fact that whether the year is 2001 or later, are significant variables in explaining the demand. It is concluded that as the income, number of motor vehicle fatalities, price oil and time increase, the number of total passengers enplaned per capita also increase, thus showing and increase in demand. However, if the average price of domestic fares increases, or if the year is 2001 or later, the number of total passengers enplaned per capita decreases.

The main shortcoming of the study is the severe multicollinearity that exists between variables, and the upward biased adjusted R2 due to the effects of the time factor. Though many factors were listed as determinants of the demand for domestic airline transportation, there are still many factors that are not included in this analysis due to unavailability of data. The 27 advertising and promotion, operation consistency, and the obesity data are among those that I would have definitely included in my study if the data were available.

The study cannot reject the main hypothesis that the per capita income has a significant affect on the demand for domestic air transportation. In the future, I expect that the price of oil variable would gain more importance and may end up being one of the statistically significant variables in this study.

References Anderson, J. E., & Kraus, M. (1981). Quality of Service and the Demand for Air Travel. The

Review of Economics and Statistics , 63 (n4), 533-540.

As Americans' body weights rise, so do airlines' costs, researchers say. (2004). The Orlando

Sentimental .

Battersby, B., & Oczkowski, E. (n.d.). An Econometric Analysis of the Demand for Domestic

Air Travel in Australia. School of Management, Charles Sturt University .

Bureau of Economic Analysis. (2007, September 28). Retrieved October 20, 2007, from Bureau

of Economic Analysis Industry Economic Accounts:

http://www.bea.gov/industry/index.htm

FedStats. (n.d.). Retrieved October 10, 2007, from http://www.fedstats.gov/cgi-bin/A2Z.cgi

International Civil Aviation Organization. (n.d.). Retrieved October 11, 2007, from

http://icao.int/ 28

Ito, H., & Lee, D. (200). Assessing the impact of the September 11 terrorist attacks on U.S.

airline demand. Journal of Economics and Business , 57 (1), 75-95.

Lansing, J. B., Liu, J.-C., & Suits, D. B. (1961). An Analysis of Interurban Air Travel. The

Quarterly Journal of Economics , 75 (n1), 87-95.

National Highway Traffic Safety Administration. (n.d.). Retrieved November 20, 2007, from:

http://www.nhtsa.dot.gov/

National Transportation Safety Board. (n.d.). Retrieved November 1, 2007, from Aviation:

http://www.ntsb.gov/aviation/aviation.htm

Pisarski, A. E. (1981). Transportation. Annals of the American Academy of Political and Social

Science , 453, 70-95.

The World Bank Group. (n.d.). Retrieved November 3, 2007, from The World Bank Group Data

Query: http://devdata.worldbank.org/data-query/

U.S Department of Transportation Research and Innovative Technology Administration. (2007,

October 31). Retrieved October 31, 2007, from Bureau of Transportation Statistics:

http://www.bts.gov/

US Census Bureau. (2007, June 19). Retrieved October 3, 2007, from US Census Bureau The

2007 Statistical Abstract: http://www.census.gov/compendia/statab/

US Department of Transportation . (2007, April 25). Retrieved November 2, 2007, from US

Department of Transportation Office of Public Affairs:

http://www.dot.gov/affairs/bts1907.htm

Verleger, P. K. (1972). Models of the Demand for Air Transportation. The Bell Journal of

Economics and Management Science , 3 (n2), 437-457.

Watkins, M. W. (1931). The Aviation Industry. The Journal of Political Economy , 39 (n1), 42-

68. 29

Welcome to California. (n.d.). Retrieved November 3, 2007, from San Francisco Bay

Conservation and Development Comission: http://www.bcdc.ca.gov/

Young, K. H. (1972). A Synthesis of Time-Series and Cross-Section Analyses: Demand for Air

Transportation Service. Journal of the American Statistical Association , 67 (n339), 560-

566. 30

Table 1: Summary Statistics

TENP/POP ACC APDF IN MVFAT/POP OIL SEPT TIME Mean 1.759015 25.11538 204.6165 29516.4 0.000128 5.42538 0.192308 13.50000 4 5 Median 1.732192 21.00000 204.7850 29114.1 0.000124 4.25500 0.000000 13.50000 2 0 Maximum 2.272287 48.00000 268.4400 34559.3 0.000162 11.7700 1.000000 26.00000 1 0 Minimum 1.089829 12.00000 146.0500 23919.9 0.000111 2.24000 0.000000 1.000000 7 0 Std. Dev. 0.341943 10.02129 33.61672 3335.11 1.29E-05 2.53457 0.401918 7.648529 0 7 Skewness -0.567443 0.755527 0.060478 -0.140975 0.893516 0.97383 1.561440 0.000000 4 Kurtosis 2.420640 2.441036 2.261163 1.96598 3.097714 2.80367 3.438095 1.796444 3 7

Jarque-Bera 1.758926 2.812032 0.607220 1.24441 3.469950 4.15128 10.77300 1.569258 2 6 Probability 0.415006 0.245118 0.738149 0.53675 0.176405 0.12547 0.004578 0.456289 9 6

Sum 45.73439 653.0000 5320.030 767427. 0.003341 141.060 5.000000 351.0000 3 0 Sum Sq. 2.923131 2510.654 28252.09 2.78E+0 4.14E-09 160.602 4.038462 1462.500 Dev. 8 0

Observations 26 26 26 26 26 26 26 26

Table 2: OLS Original Regression

Dependent Variable: TENP/POP Method: Least Squares Date: 12/10/07 Time: 22:30 Sample: 1980 2005 Included observations: 26 Newey-West HAC Standard Errors & Covariance (lag truncation=2) Variable Coefficient Std. Error t-Statistic Prob. C -17.59910 2.399414 -7.334750 0.0000 ACC -0.000112 0.000868 -0.128719 0.8989 APDF -0.006662 0.000852 -7.823174 0.0000 LOG(IN) 1.995191 0.224196 8.899304 0.0000 MVFATPP 1651.632 1011.993 1.632059 0.1191 OIL 0.008476 0.005967 1.420360 0.1717 SEPT -0.299097 0.034372 -8.701739 0.0000 R-squared 0.986147 Mean dependent var 1.759015 Adjusted R-squared 0.981773 S.D. dependent var 0.341943 S.E. of regression 0.046165 Akaike info criterion -3.088366 31

Sum squared resid 0.040494 Schwarz criterion -2.749648 Log likelihood 47.14876 F-statistic 225.4264 Durbin-Watson stat 2.538051 Prob(F-statistic) 0.000000

Table 3: Dickey- Fuller Tests

Null Hypothesis: ACC has a unit root Exogenous: Constant Lag Length: 1 (Automatic based on SIC, MAXLAG=8) t-Statistic Prob.* Augmented Dickey-Fuller test statistic -1.250543 0.6349 Test critical values: 1% level -3.737853 5% level -2.991878 10% level -2.635542 *MacKinnon (1996) one-sided p-values.

Null Hypothesis: APDF has a unit root Exogenous: Constant Lag Length: 0 (Automatic based on SIC, MAXLAG=8) t-Statistic Prob.* Augmented Dickey-Fuller test statistic -0.963009 0.7502 Test critical values: 1% level -3.724070 5% level -2.986225 10% level -2.632604 *MacKinnon (1996) one-sided p-values.

Null Hypothesis: IN has a unit root Exogenous: Constant Lag Length: 0 (Automatic based on SIC, MAXLAG=8) t-Statistic Prob.* Augmented Dickey-Fuller test statistic -0.854174 0.7856 Test critical values: 1% level -3.724070 5% level -2.986225 10% level -2.632604 *MacKinnon (1996) one-sided p-values.

Null Hypothesis: OIL has a unit root Exogenous: Constant Lag Length: 0 (Automatic based on SIC, MAXLAG=8) t-Statistic Prob.* 32

Augmented Dickey-Fuller test statistic -1.455686 0.5389 Test critical values: 1% level -3.724070 5% level -2.986225 10% level -2.632604 *MacKinnon (1996) one-sided p-values.

Table 4: Cointegration

Null Hypothesis: E has a unit root Exogenous: Constant Lag Length: 1 (Automatic based on SIC, MAXLAG=8) t-Statistic Prob.* Augmented Dickey-Fuller test statistic -5.291033 0.0003 Test critical values: 1% level -3.737853 5% level -2.991878 10% level -2.635542 *MacKinnon (1996) one-sided p-values.

Table 5: Ramsey Reset Test

Ramsey RESET Test: F-statistic 1.470666 Probability 0.260088 Log likelihood ratio 6.331887 Probability 0.096534

Test Equation: Dependent Variable: TENP/POP Method: Least Squares Date: 12/11/07 Time: 01:19 Sample: 1980 2005 Included observations: 26 Newey-West HAC Standard Errors & Covariance (lag truncation=2) Variable Coefficient Std. Error t-Statistic Prob. C 323.0736 289.7595 1.114972 0.2813 ACC 0.003671 0.002620 1.401135 0.1803 APDF 0.118872 0.107008 1.110868 0.2830 LOG(IN) -35.81202 32.10585 -1.115436 0.2811 MVFATPP -29656.49 27322.87 -1.085409 0.2938 OIL -0.165051 0.136041 -1.213245 0.2426 SEPT 5.367982 4.799433 1.118462 0.2799 FITTED^2 19.28239 14.82509 1.300659 0.2118 FITTED^3 -8.596447 5.994750 -1.433996 0.1708 FITTED^4 1.403419 0.898702 1.561606 0.1379 R-squared 0.989141 Mean dependent var 1.759015 Adjusted R-squared 0.983033 S.D. dependent var 0.341943 S.E. of regression 0.044540 Akaike info criterion -3.101131 33

Sum squared resid 0.031741 Schwarz criterion -2.617248 Log likelihood 50.31471 F-statistic 161.9429 Durbin-Watson stat 2.441282 Prob(F-statistic) 0.000000

Table 6: Correlations

TENP/POP ACC APDF LOG(IN) MVFATPP OIL SEPT TENP/POP 1.000000 0.713146 -0.927810 0.960303 -0.789915 -0.630289 0.497468 ACC 0.713146 1.000000 -0.670268 0.737821 -0.555197 -0.335651 0.461034 APDF -0.927810 -0.670268 1.000000 -0.904575 0.835039 0.489943 -0.709021 LOG(IN) 0.960303 0.737821 -0.904575 1.000000 -0.770834 -0.564935 0.618289 MVFATPP -0.789915 -0.555197 0.835039 -0.770834 1.000000 0.426628 -0.521880 OIL -0.630289 -0.335651 0.489943 -0.564935 0.426628 1.000000 0.059020 SEPT 0.497468 0.461034 -0.709021 0.618289 -0.521880 0.059020 1.000000

Table 7: White Test

White Heteroskedasticity Test: F-statistic 1.842008 Probability 0.140396 Obs*R-squared 15.37601 Probability 0.165922

Test Equation: Dependent Variable: RESID^2 Method: Least Squares Date: 12/11/07 Time: 01:21 Sample: 1980 2005 Included observations: 26 Newey-West HAC Standard Errors & Covariance (lag truncation=2) Variable Coefficient Std. Error t-Statistic Prob. C 7.658387 6.344112 1.207165 0.2474 ACC 4.33E-05 0.000259 0.166685 0.8700 ACC^2 -2.76E-06 3.54E-06 -0.778446 0.4493 APDF 0.000200 0.000462 0.433594 0.6712 APDF^2 -3.96E-07 1.10E-06 -0.360247 0.7240 LOG(IN) -1.533264 1.240491 -1.236013 0.2368 (LOG(IN))^2 0.075385 0.060357 1.248997 0.2321 MVFATPP 1666.250 1255.115 1.327567 0.2056 MVFATPP^2 -5865684. 4605354. -1.273666 0.2235 OIL -0.000807 0.000701 -1.150907 0.2690 OIL^2 4.55E-05 4.04E-05 1.124517 0.2797 SEPT 0.004075 0.002107 1.933841 0.0736 R-squared 0.591385 Mean dependent var 0.001557 Adjusted R-squared 0.270331 S.D. dependent var 0.001817 S.E. of regression 0.001552 Akaike info criterion -9.794369 34

Sum squared resid 3.37E-05 Schwarz criterion -9.213709 Log likelihood 139.3268 F-statistic 1.842008 Durbin-Watson stat 2.460253 Prob(F-statistic) 0.140396

Table 8: Graphs of the variables

2.4

2.2

2.0

1.8

1.6

1.4

1.2

1.0 1980 1985 1990 1995 2000 2005

TENP/POP 35

280

260

240

220

200

180

160

140 1980 1985 1990 1995 2000 2005

APDF

36000

34000

32000

30000

28000

26000

24000

22000 1980 1985 1990 1995 2000 2005

IN 36

.00017

.00016

.00015

.00014

.00013

.00012

.00011 1980 1985 1990 1995 2000 2005

MVFATPP 37

12

10

8

6

4

2 1980 1985 1990 1995 2000 2005

OIL

Table 9: OLS Final Regression

Dependent Variable: TENP/POP Method: Least Squares Date: 12/10/07 Time: 23:06 Sample: 1980 2005 Included observations: 26 Variable Coefficient Std. Error t-Statistic Prob. C -15.60588 3.257840 -4.790253 0.0001 APDF -0.006264 0.001014 -6.176951 0.0000 LOG(IN) 1.778654 0.331382 5.367389 0.0000 MVFATPP 2272.637 1541.179 1.474610 0.1567 OIL 0.007456 0.006073 1.227744 0.2345 SEPT -0.301082 0.044843 -6.714061 0.0000 TIME 0.005641 0.007136 0.790498 0.4390 R-squared 0.986584 Mean dependent var 1.759015 Adjusted R-squared 0.982347 S.D. dependent var 0.341943 S.E. of regression 0.045432 Akaike info criterion -3.120388 Sum squared resid 0.039218 Schwarz criterion -2.781670 Log likelihood 47.56505 F-statistic 232.8648 Durbin-Watson stat 2.329819 Prob(F-statistic) 0.000000 38

Table 10: Ramsey Reset Test

Ramsey RESET Test: F-statistic 3.005713 Probability 0.061236 Log likelihood ratio 11.62128 Probability 0.008800

Test Equation: Dependent Variable: TENP/POP Method: Least Squares Date: 12/11/07 Time: 01:28 Sample: 1980 2005 Included observations: 26 Variable Coefficient Std. Error t-Statistic Prob. C -125.0560 302.5304 -0.413367 0.6848 APDF -0.050254 0.118569 -0.423840 0.6773 LOG(IN) 13.79635 33.58730 0.410761 0.6867 MVFATPP 22157.58 44424.40 0.498770 0.6247 OIL 0.048556 0.139388 0.348353 0.7321 SEPT -2.439659 5.703508 -0.427747 0.6745 TIME 0.065170 0.111190 0.586116 0.5660 FITTED^2 -3.120038 16.94905 -0.184083 0.8563 FITTED^3 -0.159310 6.636277 -0.024006 0.9811 FITTED^4 0.227299 0.963198 0.235984 0.8164 R-squared 0.991419 Mean dependent var 1.759015 Adjusted R-squared 0.986593 S.D. dependent var 0.341943 S.E. of regression 0.039593 Akaike info criterion -3.336591 Sum squared resid 0.025082 Schwarz criterion -2.852708 Log likelihood 53.37569 F-statistic 205.4093 Durbin-Watson stat 2.559245 Prob(F-statistic) 0.000000

Table 11: Correlations

TENP/POP APDF LOG(IN) MVFATPP OIL SEPT TIME TENP/POP 1.000000 -0.927810 0.960303 -0.789915 -0.630289 0.497468 0.944278 APDF -0.927810 1.000000 -0.904575 0.835039 0.489943 -0.709021 -0.952794 LOG(IN) 0.960303 -0.904575 1.000000 -0.770834 -0.564935 0.618289 0.958507 MVFATPP -0.789915 0.835039 -0.770834 1.000000 0.426628 -0.521880 -0.858395 OIL -0.630289 0.489943 -0.564935 0.426628 1.000000 0.059020 -0.487120 SEPT 0.497468 -0.709021 0.618289 -0.521880 0.059020 1.000000 0.683130 TIME 0.944278 -0.952794 0.958507 -0.858395 -0.487120 0.683130 1.000000 39

Table 12: White Test

White Heteroskedasticity Test: F-statistic 1.287995 Probability 0.322761 Obs*R-squared 13.07751 Probability 0.288290

Test Equation: Dependent Variable: RESID^2 Method: Least Squares Date: 12/11/07 Time: 01:31 Sample: 1980 2005 Included observations: 26 Variable Coefficient Std. Error t-Statistic Prob. C 13.11337 12.74002 1.029306 0.3208 APDF -0.000438 0.000576 -0.759835 0.4600 APDF^2 1.02E-06 1.32E-06 0.773626 0.4520 LOG(IN) -2.574492 2.475044 -1.040181 0.3159 (LOG(IN))^2 0.125716 0.120512 1.043180 0.3145 MVFATPP 1539.597 1064.176 1.446750 0.1700 MVFATPP^2 -5407035. 3904215. -1.384922 0.1877 OIL -0.000435 0.001050 -0.414389 0.6849 OIL^2 4.31E-05 7.11E-05 0.606540 0.5539 SEPT 0.004243 0.002154 1.969377 0.0690 TIME 0.001444 0.001212 1.191959 0.2531 TIME^2 -6.10E-05 5.03E-05 -1.211732 0.2457 R-squared 0.502981 Mean dependent var 0.001508 Adjusted R-squared 0.112466 S.D. dependent var 0.001719 S.E. of regression 0.001620 Akaike info criterion -9.708991 Sum squared resid 3.67E-05 Schwarz criterion -9.128331 Log likelihood 138.2169 F-statistic 1.287995 Durbin-Watson stat 2.757280 Prob(F-statistic) 0.322761 40

Table 13: 2SLS

Dependent Variable: TENP/POP Method: Two-Stage Least Squares Date: 12/11/07 Time: 08:54 Sample: 1980 2005 Included observations: 26 Instrument list: (TENP/POP) ((TENP/POP)-1) LOG(IN) MVFATPP OIL SEPT TIME Variable Coefficient Std. Error t-Statistic Prob. C -16.39965 3.993550 -4.106534 0.0006 APDF -0.009384 0.001519 -6.176951 0.0000 LOG(IN) 1.920388 0.407531 4.712246 0.0002 MVFATPP 2982.195 1896.767 1.572252 0.1324 OIL 0.014686 0.007705 1.905977 0.0719 SEPT -0.362990 0.057572 -6.305012 0.0000 TIME -0.005051 0.009236 -0.546862 0.5908 R-squared 0.979903 Mean dependent var 1.759015 Adjusted R-squared 0.973556 S.D. dependent var 0.341943 S.E. of regression 0.055605 Sum squared resid 0.058747 F-statistic 157.5673 Durbin-Watson stat 2.367941 Prob(F-statistic) 0.000000