
fleet modelling: Data processing and discrete choice model estimation

YU SHEN

MASTER’S THESIS SUPERVISOR: EMMA FREJINGER

KTH ROYAL INSTITUTE OF TECHNOLOGY

STOCKHOLM, SWEDEN JUNE, 2011

TSC-MT 11-017

Dedicated to my parents and my wife

Abstract

This thesis deals with modelling the choice of new cars, based on the registration data of the whole Swedish car fleet for 2005 to 2010. It is divided into two parts. In the first part, to obtain the observations of new car choices for the discrete choice modelling, a subset based on the first registration date of each car is extracted. Then, a descriptive analysis of the new car choice data is presented to examine the variation of the attributes used in the modelling. Specifically, attention is paid to two major issues. One is the change in the market share of each car make over these years, and the other is the growing demand for diesel and hybrid fuel. The second part of the thesis deals with the discrete choice modelling. In order to define the alternatives, another dataset showing the new car supply in Sweden is introduced. In the supply data, the alternatives are given at the car version level, whereas the registration data only contain the names of car models. Additionally, the supply data have some attributes that are unavailable in the registration data, e.g. price. Thus, this thesis presents various methods of matching the supply and the registration data, both to define the alternatives for the modelling and to obtain a higher precision for each attribute than matching by model name only. Finally, we choose to match the data by the same model name with the same maximum power, which is defined as the “model-engine” level. Based on these model-engine level alternatives, 18 MNL models are estimated, one for each year from 2005 to 2010 and each of 3 ownership types, namely privately owned, company owned, and company owned but leased to employees, the last of which is named “leasing users”. With the coefficient of Volvo fixed to zero, the results show a slump in the brand constants of Saab over these years for private owners and leasing users, due to the close-down crisis.
By contrast, the brand value of for private owners and the value of VW for leasing users go up. Meanwhile, this thesis analyses a shift in car buyers’ attitude towards alternative fuel cars, from negative in 2006 to positive in 2007, when a “clean car” compensation policy was implemented (from Jan. 2007 to Jul. 2009). In 2010, the coefficient of the alternative fuel remains positive. These results indicate that this policy was quite successful.

Acknowledgements

First, I want to thank my dear parents, Zhencheng Shen and Honggang Du, and my beloved wife, Jing Wu. Without their full support, I could hardly have finished my master’s studies in Sweden. Second, I am deeply grateful to my supervisor, Dr. Emma Frejinger. Without her suggestion, I could not even have imagined having the opportunity to take part in this project. During these months, the discussions and meetings with Emma about this thesis have helped me a lot, both in professional knowledge and in scientific writing. I also thank Visiting Professor Staffan Algers and Dr. Muriel Beser Hugosson for their kind help with this thesis. Then, I want to thank all the colleagues in the Division of Transport and Location Analysis, especially Shiva Habibi, Qian Wang, Dr. Tom Petersen and Tongzhou Bai, for their enthusiastic help with my work in different ways. Meanwhile, I would like to thank all the teachers and classmates in the Transport Systems programme for their help during these two years, e.g. Professor Lars-Göran Mattsson, Professor Haris Koutsopoulos, Dr. Joel Franklin, and my classmates Yu Liu and Shuang Zhang, to name but a few. Finally, I want to thank those who read this thesis. Your reading and comments make my work valuable. Tack så mycket!

Contents

Abstract
Acknowledgements
Contents
List of figures
List of tables

1 Introduction and literature review
  1.1 Background
  1.2 Literature review
    1.2.1 Discrete choice modelling
    1.2.2 Modelling methodology
  1.3 Thesis structure
  1.4 Scope and limitations

2 Data storage and processing
  2.1 Introduction
  2.2 Data storage and migration
  2.3 Software and processing

I Descriptive analysis

3 Descriptive analysis of vehicle ownership
  3.1 Introduction
  3.2 Car ownership analysis
    3.2.1 Car ownership share by make
    3.2.2 Car ownership by vintage

4 Descriptive analysis of new car registries
  4.1 Introduction
    4.1.1 Extraction of new car data
    4.1.2 Model name generation for 2005 to 2007
  4.2 Market analysis of choices
    4.2.1 Issues of defining price
    4.2.2 Market analysis in Sweden new car market
    4.2.3 Comparative analysis of the new car market in other countries
  4.3 Analysis of fuel type choices

5 Descriptive analysis of car attributes
  5.1 Introduction
  5.2 Share of fuel types in supply
  5.3 Distribution of the attribute values
  5.4 Technology attributes

II Disaggregated analysis

6 Data matching
  6.1 Description
  6.2 Methodology of matching
    6.2.1 Standardisation of model name
  6.3 Results and drawbacks of model level
  6.4 Matching in a more detailed level
    6.4.1 A level between model and version
    6.4.2 Matching with power or weight
    6.4.3 Results of matching by power
    6.4.4 Conclusion about matching with power and weight

7 New car choice modelling
  7.1 Introduction and methodology
    7.1.1 Analysis of new car choice sets
    7.1.2 Estimation tool - BIOGEME
  7.2 Model estimation
  7.3 Estimation results
    7.3.1 Sampling from private owned car data
    7.3.2 Parameter analysis for private owner choices
    7.3.3 Parameter analysis for company owner choices
    7.3.4 Parameter analysis of the choices of company cars for leasing
  7.4 Analysis across various years
    7.4.1 The impact of “clean car” compensation
    7.4.2 The brand value decline of Saab

8 Conclusion and discussion
  8.1 Summary of results
  8.2 Comparison results in literatures
  8.3 Future works

List of appendices

A List of car makes and models

B List of numerical attributes

C Estimated parameters comparison

List of Figures

3.2.1 Total market share of different brands
3.2.2 Ownership by vintage of 1984 to 2004

4.1.1 Different numbers of new car registration
4.1.2 Comparison of monthly sales
4.1.3 Procedures of finding model names before 2008
4.2.1 Shares of car make by origin area in 2007
4.2.2 Shares of car make by origin area in 2010
4.3.1 Share of various fuel types of new registered cars
4.3.2 Share of various fuel types in Bil Sweden

5.2.1 Share of various fuel types in supply
5.3.1 Histogram and density of price
5.3.2 Histogram and density of log price
5.3.3 Histogram and density of power
5.3.4 Histogram and density of displacement
5.3.5 Histogram and density of weight
5.3.6 Histogram and density of acceleration

6.3.1 CV of price, model level 2007
6.3.2 CV of price, model level 2008
6.3.3 CV of price, model level 2009
6.3.4 CV of price, model level 2010
6.4.1 CV of price, model-engine level 2007
6.4.2 CV of price, model-engine level 2008
6.4.3 CV of price, model-engine level 2009
6.4.4 CV of price, model-engine level 2010
6.4.5 CV of price, matching by power and gear 2007
6.4.6 CV of price, matching by power and gear 2008
6.4.7 CV of price, matching by power and gear 2009
6.4.8 CV of price, matching by power and gear 2010
6.4.9 CV of price, model-weight level 2007
6.4.10 CV of price, model-weight level 2008
6.4.11 CV of price, model-weight level 2009
6.4.12 CV of price, model-weight level 2010

7.4.1 Change of MWTP of alternative fuel

List of Tables

1.4.1 Summary of literatures

3.1.1 Fuel types and codes
3.2.1 Rank of car ownership shares by make

4.1.1 Different numbers of new car registration
4.1.2 Numbers and shares of new cars deregistration in 2008
4.2.1 Numbers and shares of new registries by vintage
4.2.2 Market Shares of Top 15 Brands in New Car Market
4.2.3 Top 20 models and sales in Sweden new car market
4.2.4 Top 10 brands of new car registries in Germany
4.2.5 Passenger cars share by origin in China in 1st half of 2010
4.2.6 Sales and market shares in the U.S. in 2009 and 2010
4.2.7 Top 10 models in other European countries in 2010
4.2.8 Top 10 models in U.S. and in 2010

5.1.1 Number of versions in supply without data missed
5.1.2 Shares of vehicle types in the supply
5.3.1 Correlation of attributes
5.4.2 Shares of dummies in supply data 2007 and 2008
5.4.3 Shares of dummies in supply data 2009 and 2010

6.1.1 Shares of new cars matched in different aggregation level
6.1.2 Shares of Fiat can be matched
6.2.1 Comparison of the number of models
6.3.1 Greatest 5 models with most versions
6.4.1 Data missing in displacement
6.4.2 Matching and mismatching by power
6.4.3 Matching and mismatching by weight
6.4.4 Summary of CV of price from 2007 to 2010
6.4.5 Shares of data after matching (excluding imported cars)
6.4.6 Paired t-test of 2009 and 2010

7.1.1 Size of choice sets from 2005 to 2010
7.3.1 Estimation results of private owned cars
7.3.2 Estimation results of company owned cars
7.3.3 Estimation results of company leasing cars

A.1 Car makes and models in Swedish new car market

B.1 Statistical analysis of the variables

C.1 Comparison between sample and total observations for model 2009

Chapter 1

Introduction and literature review

1.1 Background

The automobile industries and markets play crucial roles in modern society. Most of the industrialised countries with the greatest gross domestic product (GDP) have mighty car industries and huge markets, such as the U.S. (GM, Ford), Japan, Germany (Volkswagen, Mercedes) and France (Renault, Peugeot). Even in the newly industrialised countries, like China (FAW) and India (Tata), the automobile industries are booming as well. Meanwhile, in Sweden, from 2009 to 2010, both Swedish car makes, Volvo and Saab, went through a reselling crisis, being sold by their former U.S. parent companies, Ford and GM respectively, as a consequence of the late-2000s financial crisis. From the perspective of climate change, the emissions of vehicles on the road contribute a great amount of greenhouse gases, e.g. carbon dioxide, which lead to global warming. It is essential for the transport sector to know the consumers’ behaviour in car purchase in order to control the carbon budgets. Therefore, it is of interest to study and model the consumers’ choice of cars. Given that an individual has already decided to purchase a car, she usually has two alternatives: to buy a new car or to buy a second-hand car. This project focuses specifically on the choice of new cars, that is, which make/model/version of car one will probably choose in a particular year (or vintage) if she wants to purchase a new car.

1.2 Literature review

To my knowledge, disaggregate car choice models have been studied by many researchers since the late 1970s. As forerunners, Lave and Train (1979) present the earliest disaggregate model to study vehicle choice decisions. Then, in the wake of the 1979 oil crisis, various disaggregate models based on different data, locations and years have been developed in the U.S. since the 1980s. For example, Manski and Sherman (1980) and Mannering and Winston (1985) formulate their car choice models in multinomial logit form, while Berkovec (1985) and Berkovec and Rust (1985) develop models with nested logit structures. Later on, motivated by concerns about the environment, energy consumption and market effects, various models are constructed to find the endogenous reasons, e.g. Mannering et al. (1991), Choo and Mokhtarian (2004), Train and Winston (2007) and Hess et al. (2009), to name but a few.

In the Nordic countries, which lay great emphasis on environmental protection, some recent disaggregate car choice models have been developed as well. For instance, in Denmark, Arnberg et al. (2008) model Danish individuals’ new car choices by estimating a multinomial logit model to investigate the impact of fuel cost. In Sweden, Hugosson and Algers (2011) estimate Swedish car fleet models to analyse the policy effect of the increasing share of “clean” cars. Among the master’s theses focusing on the same issue as this one, Nilsson (2008) and Künnapuu (2009) develop new car choice models to test various policy or market scenarios and the consequent environmental effects. These theses provide comprehensive instructions for modelling Swedish new car choice with data similar to those in this project.

Some of these studies are reviewed explicitly in tabular form by Choo and Mokhtarian (2004). We continue their work by extending such a table, shown in Table 1.4.1. In this table, we summarise several recent studies conducted in the U.S., Denmark and Sweden, which may be of considerable help to the work of this project. These papers, especially the studies conducted in Sweden, show the alternatives and the significant explanatory variables in their studies, which can serve as references for our work.

1.2.1 Discrete choice modelling

To model the car choice behaviour, a good way is to employ discrete choice modelling methods. The following discussion is based on the description of discrete choice theory in the book by Train (2009), Discrete Choice Methods with Simulation.

To apply the discrete choice methods, the following three assumptions about our alternatives must hold:
• Mutual exclusivity: a car buyer who purchases one car does not buy any other car.
• Exhaustiveness: all possible alternatives are covered and the car buyers are to choose one of the alternatives.
• Finiteness: the number of alternatives is finite.

Among these criteria, the one to be noted is the second, namely that our choice sets should be exhaustive. Our choice sets include all available car information in the Swedish domestic passenger car market. Actually, one can buy foreign cars directly from other countries, and one can also buy pure electric cars or formula-shaped cars, which are conceptual. These alternatives are not counted in this thesis, first because information about these cars is lacking and second because the share of these cars is rather minor. One may complicate the model by taking these data into account, but the effect on the whole model would be limited. So, in this project, we focus on the prevailing passenger cars in the domestic automobile market.

1.2.2 Modelling methodology

Before the car buyers make their decisions, we assume that their purchase behaviour is rational, which means that they intend to maximise the utility of their choices. Thus, we construct the utility functions of car choices in the form of equation 1.1.

$$U_{nik} = V_{nik} + \varepsilon_{nik} \tag{1.1}$$

Utility function 1.1 shows the utility of individual $n$ choosing car alternative $i$ in year $k$, which consists of two parts, the deterministic term $V_{nik}$ and the error term $\varepsilon_{nik}$. The deterministic part of utility $V_{nik}$ is linear in parameters and is represented by $\beta_k' x_{nik}$, where $\beta_k$ is a vector of parameters of the car attributes $x_{nik}$ in year $k$. The error term $\varepsilon_{nik}$ is treated as a random error, which captures the unobserved part of the utility. In the logit model employed in this project, the $\varepsilon_{nik}$ are assumed to be independent and identically Gumbel distributed, with density function:

$$f(\varepsilon_{nik}) = e^{-\varepsilon_{nik}} \, e^{-e^{-\varepsilon_{nik}}}$$

After several mathematical derivations, the probability for individual $n$ to choose alternative $i$ in year $k$ is:

$$P_{nik} = \frac{e^{V_{nik}}}{\sum_{j=1}^{J} e^{V_{njk}}} = \frac{e^{\beta_k' x_{nik}}}{\sum_{j=1}^{J} e^{\beta_k' x_{njk}}} \tag{1.2}$$

To estimate the model, we try to find the value of the vector $\beta_k$ that maximises the log-likelihood function 1.3, where $y_{nik}$ equals 1 if individual $n$ chooses alternative $i$ in year $k$ and 0 otherwise.

$$LL(\beta_k) = \sum_{n=1}^{N} \sum_{i} y_{nik} \ln P_{nik} \tag{1.3}$$

Plugging equation 1.2 into 1.3, we eventually obtain the log-likelihood function in the form:

$$LL(\beta_k) = \sum_{n=1}^{N} \sum_{i} y_{nik} \, \beta_k' x_{nik} - \sum_{n=1}^{N} \sum_{i} y_{nik} \ln \left( \sum_{j=1}^{J} e^{\beta_k' x_{njk}} \right) \tag{1.4}$$

To maximise the log-likelihood function, we take the first-order condition by setting the derivative of function 1.4 equal to zero:

$$\frac{dLL(\beta_k)}{d\beta_k} = \sum_{n=1}^{N} \sum_{i} (y_{nik} - P_{nik}) \, x_{nik} = 0$$

Once we have the estimated parameters $\hat{\beta}_k$, we can study the likelihood ratio index

$$\rho = 1 - \frac{LL(\hat{\beta}_k)}{LL(0)}$$

to see how much the estimated model improves the fit. In this goodness-of-fit measure, $LL(0)$ is the log-likelihood with all parameters set to zero. So, $\rho$ provides a relative value between 0 and 1 reflecting the improvement from zero parameters to the estimated parameters. In the extreme case, $\rho = 1$ means that the model predicts the choices in the sample perfectly.
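As an illustration of equations 1.2 and 1.3 and of the likelihood ratio index, the following minimal Python sketch evaluates the choice probabilities, the log-likelihood and $\rho$ on a toy sample. It is not the estimation procedure used in this thesis. The function names, the two attributes (a price in arbitrary units and an alternative fuel dummy), the toy observations and the parameter values are all assumptions made up for the example.

```python
import math


def choice_probabilities(beta, alternatives):
    """Logit probabilities of equation 1.2 for one individual."""
    utilities = [sum(b * x for b, x in zip(beta, attrs)) for attrs in alternatives]
    shift = max(utilities)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(beta, observations):
    """Equation 1.3: sum of ln P over the chosen alternatives."""
    return sum(
        math.log(choice_probabilities(beta, alternatives)[chosen])
        for alternatives, chosen in observations
    )


def rho(beta_hat, observations):
    """Likelihood ratio index 1 - LL(beta_hat) / LL(0)."""
    zeros = [0.0] * len(beta_hat)
    return 1.0 - log_likelihood(beta_hat, observations) / log_likelihood(zeros, observations)


# Hypothetical toy sample: each observation is (attributes per alternative, chosen index).
# Attributes are [price in arbitrary units, alternative fuel dummy]; values are invented.
observations = [
    ([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]], 0),
    ([[1.5, 1.0], [1.0, 0.0], [2.5, 1.0]], 1),
    ([[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]], 2),
]
beta = [-1.0, 0.5]  # assumed parameter values, not estimates from the thesis

print(choice_probabilities(beta, observations[0][0]))
print(log_likelihood(beta, observations), rho(beta, observations))
```

With all parameters at zero every alternative is equally likely, so the index equals 0 there and rises towards 1 as the chosen alternatives receive higher probabilities.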

1.3 Thesis structure

To handle the analysis of Swedish new car purchases, two analysis methods are introduced. One is descriptive analysis, dealing with statistical analysis at an aggregate level. In the aggregate analysis, we do not know how various parameters may affect the behaviour of buying a new car. Thus, the second method, disaggregate analysis, is taken into account. With the help of discrete choice modelling, the resulting models in this part can be of help in forecasting and in assessing the response to policies. Therefore, this project is divided into 2 major parts:
• Part 1, from Chapter 3 to Chapter 5, conducts the descriptive analysis of the demand (car registries) and the supply (car attributes), involving market and statistical analysis. Specifically, Chapter 3 analyses the whole Swedish car fleet data, which is the basis for the following data analysis and modelling. Chapter 4 extracts subsets from the car fleet data as the new car registration data for each year, which are to be used for the modelling. Because the car characteristics in the new car registration data are incomplete, we add a car supply dataset including the car attributes to our analysis, which is conducted in Chapter 5.
• Part 2, from Chapter 6 to Chapter 7, shows the methodology of data matching as well as the structures and results of the modelling. In this part, Chapter 6 analyses different matching scenarios under various criteria and proposes the matching method with the best result. Chapter 7 constructs in total 18 multinomial logit models, one for each year (2005 to 2010) and each type of ownership (privately owned, company owned and company owned for leasing). The results of these models are explained, and two major issues are analysed. One is the influence of the “clean car” compensation, whereas the other is the declining brand value of a Swedish car maker, Saab.

Before the analysis, the motivation for this project is introduced and a summary of previous related research is given. After the 2-part analysis, the conclusion is drawn and some limitations are discussed. Last but not least, in this project, all tables and charts are based on the author’s calculation, unless otherwise stated.

1.4 Scope and limitations

This thesis focuses on the data processing and modelling approaches for Swedish car fleet modelling. Unlike other master’s theses, this thesis estimates models for 6 years at the model-engine level, which gives more accurate attributes (e.g. price) with a smaller coefficient of variation. This thesis also develops a comprehensive method adapted to the Swedish new car choice data, which can be replicated in subsequent research. However, there are still some issues that should be paid attention to.

This thesis handles the choice of new cars in Sweden, where the term “new car” actually refers to “new passenger cars”. This means that luxury cars (e.g. Ferrari and Lamborghini), recreational vehicles¹ (e.g. Bürstner), formula-shaped cars (e.g. Ariel Atom 2) and conceptual cars (e.g. Think City) are not taken into account in our models. Moreover, due to the incompleteness of our car supply data, there is no information about gas fuel (e.g. LPG or biogas) cars. The information about this kind of car cannot be captured, although gas fuel cars are registered in each year. In this thesis, the omission of the gas fuel alternatives limits the accuracy of the resulting models.

This thesis estimates one static MNL model for each type of ownership per year. It might be better to estimate dynamic models that take time parameters into consideration. To develop dynamic models, we would have to track the information of each individual. However, the car registration data we have only contain the identity (registration number) of each car instead of each owner, and it is hard to find a unique attribute for each car owner.

Another issue is that, when we match the data in Chapter 6, the price of a car does not depend on its fuel type. This can introduce errors into the results, since in this case the only parameters distinguishing cars of different fuel types are the fuel dummies. In reality, the prices of cars with different fuel types differ, and this may play an important role in affecting car buyers’ choice behaviour.

¹ This term is also called camper, which refers to vehicles equipped with living space and home amenities.

Table 1.4.1: Summary of literatures

Reference: Mannering et al. (1991)
Data source (year): 488 complete vehicle ownership histories in U.S. (1989)
Model: NL
Alternatives: Upper level: new or used car alternatives. Lower level: make/model/vintage chosen from RP data
Significant attributes: price (-); repair index (+); weight (+); utility vehicle (-); U.S. car: age (+); Japanese car: pacific coast (+), metropolitan area (+); brand loyalty (+): pre 1980s: Chrysler; 1980s: Japanese auto; Big 3

Reference: Choo and Mokhtarian (2004)
Data source (year): A survey of 1904 residents in San Francisco Bay Area (1998)
Model: MNL
Alternatives: 9 categories: small; compact; mid-sized; large; luxury; sports; /van; pickup; SUV
Significant attributes: travel attitudes; personality (e.g. calm, organizer); lifestyle (e.g. workaholic); demographics (e.g. age, HH income)

Reference: Train and Winston (2007)
Data source (year): A random sample of 458 consumers from U.S. (2000)
Model: Mixed logit
Alternatives: 200 makes and models with the vintage of 2000
Significant attributes: retail price (-); retained value (+); horsepower/weight (+); length - wheelbase (+); fuel consumption (-); car type dummies; car maker dummies

Reference: Arnberg et al. (2008)
Data source (year): 131,214 observations from Denmark (1992-2001)
Model: MNL
Notes: log of price
Alternatives: 1,266 new car versions
Significant attributes: log-price (-); fuel consumptions (-); income (+); weight (+); payload (+); acceleration (+); air-bags (+); ABS (+); 4 doors (+); car types

Reference: Hess et al. (2009)
Data source (year): RP data from telephone, SP data from mail-back paper or online survey, from California (2008-2009)
Model: CNL
Alternatives: Upper level: car nest: 15 car types; fuel nest: 7 fuel types; 105 combinations. Lower level: 4 stated choice alternatives
Significant attributes: price (-); fuel efficiency (+); fuel availability (+); acceleration (+); range (+); car age (-); large HH for large vehicle (+); large vehicle with alternative fuel (-)

Reference: Hugosson and Algers (2011)
Data source (year): Complete Swedish vehicle stock from Swedish Car Register, combined with an SP survey of new car purchase from autumn 2005
Model: NL
Alternatives: 300 car models. Upper level: car brands. Lower level: car models
Significant attributes: price/benefit tax; running cost; size class; volume; rust protection warranty; safety; engine power; share of fuel stations; car make

Chapter 2

Data storage and processing

2.1 Introduction

In this thesis, we consider a static discrete choice model and estimate it based on the revealed preference (RP) data within each year. Three main data sources are available. The first one is the registration data of the whole Swedish car fleet from 2004 to 2010, including information about every car registered in Sweden, e.g. car registration number, name of car make, horsepower, etc. The second one is the car registry data of new vehicles from 2007 to 2010, which can be treated as subsets of the ownership data. These data have been used in previous studies on Swedish car fleet modelling. The third source of data covers the supply information in Sweden. The supply data show the actual car alternatives in the Swedish new car market and their features, including more than 100 attributes of each car version that has been available in the Swedish car market since 1999.

2.2 Data storage and migration

The original data are stored in the SPSS¹ (.sav) format. Each of these files is larger than 2 gigabytes, since there are more than 5 million records in each year. With such a huge amount of data, SPSS cannot process the data rapidly. In addition, as commercial software, SPSS is not available on every computer. Besides SPSS, the spreadsheet software Microsoft Office Excel is not even able to store such an amount of data, since a worksheet in Excel 2007 is limited to 1,048,576 rows. Therefore, one feasible solution is to migrate the data from SPSS to an SQL database, which is able to process the data in a very short computational time. In this project, an open source database, MySQL², is chosen. Besides an SQL database, another database, Microsoft Access, is also available. But, since a single Access database file cannot exceed 2 gigabytes, this database cannot be used for the migration here. However, since table joins in Access are faster and more user-friendly, Access is used for the data matching described in Chapter 6.

Due to some practical issues, namely that the MySQL server is set up on a Unix-based system and the “myodbc” connector has some unknown problems on such a system, one cannot export the data directly from SPSS to the MySQL database. As a result, another feasible method is to export the whole data into a .csv file, and then use SQL commands to import the .csv file. Because the decimal symbol in Sweden is also a comma, it may conflict with the comma-delimited format. Fortunately, SPSS encloses the cells that include a comma in quotation marks when a .csv file is exported. Thus, when one implements the data import, this enclosure should be declared in the SQL commands.

¹ IBM SPSS statistics. http://www.spss.com/
² MySQL Community Server. http://www.mysql.com/
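The quoting issue described above can be illustrated with a short Python sketch. It is only an analogue of the import step, not the SQL commands actually used in this project: it reads comma-delimited text in which cells containing a decimal comma are enclosed in quotation marks and converts selected columns to numbers. The function name, the column names and the sample rows are hypothetical. In MySQL, the corresponding setting would be the field-enclosure clause of the `LOAD DATA` statement.

```python
import csv
import io


def read_decimal_comma_csv(text, numeric_columns):
    """Parse comma-delimited text whose decimal-comma cells are quoted."""
    reader = csv.DictReader(io.StringIO(text), delimiter=",", quotechar='"')
    rows = []
    for record in reader:
        for column in numeric_columns:
            # "103,5" -> 103.5: replace the Swedish decimal comma with a point
            record[column] = float(record[column].replace(",", "."))
        rows.append(record)
    return rows


# Hypothetical export: column names and values are invented for illustration.
sample = (
    "regnr,make,power_kw,length_cm\n"
    'ABC123,VO,"103,5",447\n'
    'DEF456,VW,77,"420,4"\n'
)
rows = read_decimal_comma_csv(sample, ["power_kw", "length_cm"])
print(rows)
```

Because the quoted cell is read as a single field, the comma inside it is never mistaken for a delimiter, which is the property the import has to preserve.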

2.3 Software and processing

After the migration described in Section 2.2, the data are stored in our MySQL database. Since MySQL itself does not contain many data analysis tools, we use R³ for the statistical analysis. James (2001) proposed an open source interface package for R, named “RMySQL”, that can connect to a MySQL database. Together with the database interface (DBI) of R, the RMySQL package allows one to access the tables in MySQL. Therefore, the statistical analysis of the demand and supply can be done easily.

Compared with Excel, the combination of R and MySQL has the following advantages. First, the huge amount of data can be processed quickly in MySQL. Second, both MySQL and R are cross-platform, which means that they can be run on Microsoft Windows, Linux or Mac OS systems with good compatibility. In Excel, the Data Analysis Toolbox is currently only available on Windows, and there is no Microsoft Excel edition available for Linux. Finally, MySQL and R are both open source, so that one can make changes according to one’s own preference.

Besides R, Matlab⁴ can be another useful tool for the statistical analysis. To connect Matlab with a MySQL database, several interface codes are available. For instance, Almgren (2005) releases a function file named “Matlab Database Connector”, which is able to execute SELECT queries in Matlab and insert additional data into the database. Though both Matlab and R are compatible with Windows-based and Unix-based systems, Matlab is not free software and an additional statistics toolbox⁵ is also needed for the analysis. Despite the charges, the computing performance of Matlab is more powerful than that of R, as one can program her own code in Matlab with custom settings.

³ The R Project for Statistical Computing. http://www.r-project.org/
⁴ http://www.mathworks.com/products/matlab/
⁵ http://www.mathworks.com/products/statistics/

Part I

Descriptive analysis


Chapter 3

Descriptive analysis of vehicle ownership

3.1 Introduction

In this chapter, the vehicle ownership in Sweden is analysed. The ownership data show the information about every car which is currently registered in Sweden. The information is provided by two institutions: SCB¹, which provides the data from 2004 to 2007, and the Swedish Transport Agency², which provides the data from 2008 to 2010. These stock data cover the registration of the whole vehicle population in Sweden in these years. Due to the different sources, the registration details in these data are not identical. However, some useful information on car attributes can be found in both sources, and is enumerated and explained below.
• Car plate license, which is called “registry number” in the datasets and has the form “ABC123”, indicating the Swedish car plate license of each registered vehicle. Each vehicle has one and only one unique license serial number.
• Car make, each of which is given a two-letter abbreviation in the stock data. For instance, VO is the abbreviation of Volvo; VW is short for Volkswagen.
• Vintage, which shows the production year of a car.
• Direct import, which distinguishes between the domestic cars and the imported cars. Since the imported cars may have been registered abroad before entering Sweden, they may have very old vintages and different attributes.
• Power, which gives the maximum power of each registered car in kilowatts.
• Total weight, which means the payload that a vehicle can carry plus the car’s own weight.
• Fuel type, which is coded from 1 to 17, each code indicating a type of fuel. The fuel types are listed in Table 3.1.1.
• Clutch, which mainly includes manual and automatic transmission.
• Length and width, which are measured in cm in SCB’s data but in mm in the Swedish Transport Agency’s data.
• Other car attributes like colour and environmental class (Swedish standard) are also available in the datasets.

Besides these car attributes, some socio-economic values are also registered, such as the car owner’s living area or the gender and birthday of the current owner.

¹ Statistics Sweden, Statistiska centralbyrån in Swedish. For SCB, the data are from the Swedish Transport Agency as well, but the data we get have been processed by SCB.
² Transportstyrelsen in Swedish.

Table 3.1.1: Fuel types and codes

Code  Name
1     Gasoline
2     Diesel
3     Electricity
4     Kerosene
5     Liquid-gas
6     Producer gas
7     Ethanol
8     Methanol
9     LPG (α)
10    Canola oil
11    Paraffin
12    Natural gas
13    Biogas
14    E85 (β)
15    RME (γ)
16    Methane
17    Hydrogen

α Liquefied petroleum gas.
β E85 is a mixture of 85% ethanol and 15% gasoline.
γ Abbreviation of Rapsmetylester in Swedish, which refers to a type of biodiesel.
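Because the two sources record some attributes differently, the records have to be harmonised before they can be pooled. The following is only a minimal sketch of such a step, not the procedure actually used in this thesis: it decodes the fuel codes of Table 3.1.1 and converts length and width to millimetres. The field names (reg_no, fuel_code, length, width) and the source labels "scb" and "ta" are hypothetical.

```python
# Fuel codes and names as listed in Table 3.1.1.
FUEL_NAMES = {
    1: "Gasoline", 2: "Diesel", 3: "Electricity", 4: "Kerosene",
    5: "Liquid-gas", 6: "Producer gas", 7: "Ethanol", 8: "Methanol",
    9: "LPG", 10: "Canola oil", 11: "Paraffin", 12: "Natural gas",
    13: "Biogas", 14: "E85", 15: "RME", 16: "Methane", 17: "Hydrogen",
}

# Millimetres per recorded unit: SCB data are in cm, Transport Agency data in mm.
# The labels "scb" and "ta" are hypothetical names for the two sources.
MM_PER_UNIT = {"scb": 10, "ta": 1}


def harmonise(record, source):
    """Return a stock record with a decoded fuel name and sizes in mm.

    All field names are assumptions made for this illustration.
    """
    factor = MM_PER_UNIT[source]
    return {
        "reg_no": record["reg_no"],
        "fuel": FUEL_NAMES.get(record["fuel_code"]),  # None for unknown codes
        "length_mm": record["length"] * factor,
        "width_mm": record["width"] * factor,
    }


# Two made-up records describing a car of the same size in each source.
scb_car = harmonise({"reg_no": "ABC123", "fuel_code": 2, "length": 471, "width": 186}, "scb")
ta_car = harmonise({"reg_no": "DEF456", "fuel_code": 14, "length": 4710, "width": 1860}, "ta")
```

After this step, both records carry the same units, so attributes such as length can be compared across the 2004 to 2007 and 2008 to 2010 data.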

3.2 Car ownership analysis

3.2.1 Car ownership share by make

Unlike yoghurt or coke, a car is expensive, and an individual does not repeat the purchase within a short term (e.g. 1 or 2 years). A car is also more durable than daily goods such as yoghurt. Therefore, if one has an (acceptable) experience of a particular car make, she might feel that the uncertainty of giving up a familiar brand for another one is of importance. Under such an assumption, the conservative choice of purchasing a make one has already owned may be the best choice. In fact, Mannering et al. (1991) include brand loyalty in their vehicle choice model of the U.S. automobile market in the 1980s. They point out that brand loyalty is an essential explanatory variable that can affect market shares. Train and Winston (2007) interpret brand loyalty as follows: because a consumer builds confidence in a brand, her own experience with this brand is likely to affect her decision to buy products of the same brand in the future. According to these explanations, if consumer behaviour in Sweden is similar to that in the U.S., it is necessary to study the historical car fleet ownership in Sweden, for example what the market shares are and whether they have shifted over these years.

Table 3.2.1: Rank of car ownership shares by make

Rank  2004        2005        2006        2007        2008        2009        2010
1     Volvo       Volvo       Volvo       Volvo       Volvo       Volvo       Volvo
      0.2335      0.2362      0.2346      0.2335      0.2303      0.2292      0.2270
2     VW          VW          VW          VW          VW          VW          VW
      0.0982      0.0988      0.0985      0.0982      0.0976      0.0981      0.0995
3     Saab        Saab        Saab        Saab        Saab        Saab        Saab
      0.0860      0.0890      0.0877      0.0860      0.0835      0.0812      0.0778
4     Ford        Ford        Ford        Ford        Ford        Ford        Ford
      0.0730      0.0874      0.0757      0.0730      0.0698      0.0686      0.0671
5     Opel        Toyota      Toyota      Toyota      Toyota      Toyota      Toyota
      0.0571      0.0543      0.0553      0.0563      0.0572      0.0589      0.0598
6     Toyota      Opel        Opel        Opel        Audi        Audi        Audi
      0.0533      0.0539      0.0513      0.0487      0.0460      0.0478      0.0479
7     Audi        Audi        Audi        Audi        Opel        Opel        Opel
      0.0470      0.0476      0.0481      0.0483      0.0460      0.0449      0.0430
8     Mercedes    Mercedes    Mercedes    Mercedes    Mercedes    Mercedes    Mercedes
      0.0394      0.0394      0.0394      0.0393      0.0388      0.0387      0.0387
9     Renault     Renault     Renault     Renault     Renault     Renault     BMW
      0.0345      0.0357      0.0366      0.0369      0.0367      0.0367      0.0373
10    BMW         BMW         BMW         BMW         BMW         BMW         Renault
      0.0298      0.0311      0.0325      0.0340      0.0353      0.0362      0.0373
11    Peugeot     Peugeot     Peugeot     Peugeot     Peugeot     Peugeot     Peugeot
      0.0275      0.0296      0.0314      0.0331      0.0352      0.0352      0.0355
12    Nissan      Nissan      Nissan      Nissan      Škoda       Škoda       Škoda
      0.0271      0.0258      0.0243      0.0228      0.0233      0.0242      0.0256
13    Mazda       Mazda       Mazda       Škoda       Nissan      Citroën     Citroën
      0.0257      0.0239      0.0223      0.0214      0.0211      0.0206      0.0203
14    Mitsubishi  Mitsubishi  Mitsubishi  Mazda       Citroën     Nissan      Nissan
      0.0210      0.0209      0.0205      0.0206      0.0199      0.0200      0.0200
15    Škoda       Škoda       Škoda       Mitsubishi  Mazda       Hyundai     Hyundai
      0.0159      0.0176      0.0195      0.0196      0.0188      0.0190      0.0200
16    Citroën     Citroën     Citroën     Citroën     Mitsubishi  Mazda       Mitsubishi
      0.0154      0.0167      0.0175      0.0189      0.0187      0.0183      0.0175
17    Hyundai     Hyundai     Hyundai     Hyundai     Hyundai     Mitsubishi  Mazda
      0.0126      0.0142      0.0156      0.0167      0.0182      0.0180      0.0174
18    Honda       Honda       Honda       Honda       Honda       Honda       Honda
      0.0108      0.0108      0.0108      0.0112      0.0119      0.0119      0.0121
19    Seat        Chevrolet   Chevrolet   Chevrolet   Chevrolet   Chevrolet   Chevrolet
      0.0079      0.0082      0.0087      0.0092      0.0104      0.0105      0.0107
20    Chevrolet   Seat        Seat        Seat        Seat        Seat        Kia
      0.0074      0.0080      0.0081      0.0083      0.0085      0.0085      0.0097

Sum of the shares
      0.942       0.940       0.939       0.936       0.929       0.927       0.924

According to Table 3.2.1, Volvo, Volkswagen, Saab and Ford have the highest ownership shares in Sweden from 2004 to 2010. In particular, Volvo accounts for about 23% of total ownership in Sweden. The other Swedish make, Saab, has the third largest share, but its share decreases continuously from 2005. Among foreign makes, apart from Ford and Toyota, most Swedish car owners own European makes. Meanwhile, the ownership of European cars seems to increase over these years, as represented by Škoda and Citroën, whereas the ownership shares of the Japanese makes Nissan, Mazda and Mitsubishi decline over these 7 years. These trends are shown clearly in Figure 3.2.1. Although the share of Volvo goes down slightly, it remains above 20% of total ownership, more than twice the share of Volkswagen. The shares of Saab and Ford decline while the share of Toyota increases steadily. If this trend continued, Toyota would overtake Ford, and even Saab, in the future. As a typical European make, Škoda’s share rises continuously, whereas Mazda, as an example of a Japanese make (other than Toyota), follows the opposite trend. If the assumption of Mannering et al. and of Train and Winston holds, these trends may indicate that an individual who owns a Volvo (or Volkswagen) will probably choose another Volvo (or Volkswagen) when she purchases a new car, whereas an owner of a Nissan, Mazda or Mitsubishi is likely to choose another make, given the continuous decline in their ownership.
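The ranking discussed above amounts to counting cars per make in one year's stock data and dividing by the total. A minimal sketch of that computation is given below; the input format (one make abbreviation per registered car) is an assumption, and the toy data are not taken from the registry. "SA" is a made-up abbreviation.

```python
from collections import Counter


def make_shares(makes):
    """Rank car makes by their share of the stock.

    `makes` is an iterable with one make abbreviation per registered car.
    Returns a list of (make, share) pairs, largest share first.
    """
    counts = Counter(makes)
    total = sum(counts.values())
    # Sort by descending count; break ties alphabetically for a stable ranking.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(make, count / total) for make, count in ranked]


# Toy stock of ten cars (not real data).
toy_stock = ["VO"] * 5 + ["VW"] * 3 + ["SA"] * 2
ranking = make_shares(toy_stock)
```

Applying the same function to each year's stock data would give one column of a table like Table 3.2.1.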

3.2.2 Car ownership by vintage

Figure 3.2.1: Total market share of different brands
[Line chart. x-axis: year, 2004 to 2010; y-axis: share of total ownership, 0 to 0.25; series: Volvo, VW, Saab, Ford, Toyota, Mazda, Škoda.]

Because the data sources differ, a large amount of vintage data is missing from 2004 to 2007: in the registration, it is not mandatory for car owners to register their car vintage. In fact, in the data from 2008 to 2010 a large amount of vintage data is still missing, but the Swedish Transport Agency provides the following priority order to fill in the omitted vintage:

• Year of the model, årsmodell in Swedish.
• Registry date of import, which means the date of first registration abroad.
• Production month.
• Registry date: if none of the above is available, the last choice is to use the year of registration.
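The precedence order above can be sketched as a simple fallback chain. This is an illustration under stated assumptions rather than the Transport Agency's implementation: the field names are hypothetical, and each value is assumed to be either a year or a date string that starts with a four-digit year.

```python
# Precedence order as listed above (field names are hypothetical).
PRIORITY_2008_2010 = ("model_year", "import_reg_date", "production_month", "reg_date")
# Variant without the date of first registration abroad, for sources lacking it.
PRIORITY_2004_2007 = ("model_year", "production_month", "reg_date")


def resolve_vintage(record, priority):
    """Return the vintage from the first available field in `priority`.

    Assumes every value is a year or a string starting with a four-digit year.
    Returns None if none of the fields is filled in.
    """
    for field in priority:
        value = record.get(field)
        if value:  # skip missing or empty entries
            return int(str(value)[:4])
    return None


# Made-up records for illustration.
no_model_year = {"model_year": None, "production_month": "2005-01", "reg_date": "2005-03-14"}
with_model_year = {"model_year": 2004, "production_month": "2005-01", "reg_date": "2005-03-14"}
imported = {"import_reg_date": "1999-06-01", "reg_date": "2008-09-30"}

v1 = resolve_vintage(no_model_year, PRIORITY_2004_2007)
v2 = resolve_vintage(with_model_year, PRIORITY_2004_2007)
v3 = resolve_vintage(imported, PRIORITY_2008_2010)
```

The second record shows the case where the model year and the production year disagree: the model year takes precedence.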

Therefore, to obtain better consistency for 2004 to 2007, we use the same precedence order provided by the Transport Agency. It has to be noted that, in these 4 years’ data, we do not have the date of registration abroad. So, in our adjustment, we consider the production month as the second priority. However, this may cause some confusion. The year of the model can be an indicator of the model’s shape, since the shape can vary between model years. One example is the Volkswagen Passat, whose 2004 model (Passat B5) is different from the 2005 model (Passat B6). However, a Passat produced at the beginning of 2005 is still the 2004 model (Passat B5), since the new model, B6, was first exhibited in March 2005 at the Geneva Motor Show and only became available in the European new car market in the summer of 2005. In this case, the production year/month and the year of the model are not equivalent. Fortunately, this situation cannot be said to happen frequently, since once a model is launched on the market, it remains stable for several years. So, for the majority of cars the year of the model is the same as the production year. Meanwhile, the production year and the registration year can differ as well, e.g. a car assembled in December can be registered in January of the next year.

Figure 3.2.2: Ownership by vintage of 1984 to 2004
[Chart titled “Car ownership by vintage in 2004 - 2010”. x-axis: vintage, 1984 to 2004; y-axis: number of cars in thousands, 0 to 400; one series per stock year, 2004 to 2010.]

Figure 3.2.2 shows how car ownership changes from 2004 to 2010 for different vintages. The figure summarises the ownership of cars of vintages 1984 to 2004. According to the chart, the differences in ownership between stock years are quite obvious for vintages before 1997. An extreme example is the 1988 vintage: in 2004, about 350,000 cars of this vintage are owned, but by 2010 the number has decreased by 200,000, to less than 150,000. For vintages after 1997, however, the numbers do not change much. This may show that, during 2004 to 2010, vehicles of vintages older than 1997 are more likely to be scrapped. Namely, if a vehicle is older than 15 years, its owner will probably decide to replace it. Or, we might say that the expected life-span of a car is roughly 15 years.
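The comparison behind this reading of the chart is a per-vintage difference between two stock years. The sketch below uses only the approximate 1988-vintage values quoted above, so the numbers are rounded readings of the chart and not registry counts. Note that a net decline in the stock may include deregistrations for reasons other than scrapping.

```python
def net_decline(stock_earlier, stock_later):
    """Per-vintage decline in the number of cars between two stock years.

    Both arguments map vintage -> number of cars in the stock data.
    """
    return {
        vintage: count - stock_later.get(vintage, 0)
        for vintage, count in stock_earlier.items()
    }


# Approximate values for the 1988 vintage as quoted in the text.
stock_2004 = {1988: 350_000}
stock_2010 = {1988: 150_000}

decline = net_decline(stock_2004, stock_2010)
share_lost = decline[1988] / stock_2004[1988]  # fraction of the 2004 stock that is gone
```

With these rounded values, somewhat more than half of the 1988-vintage cars present in 2004 have left the stock by 2010.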

Additionally, this chart shows that some vintages (e.g. 1988 and 1999) are more numerous than others in the stock data. Between consecutive registration years, the number of cars taken out of use also differs between vintages. For instance, between 2005 and 2006, more cars of the 1988 vintage are scrapped than of the 1998 vintage. Moreover, for the same vintage, the number of abandoned cars varies from year to year. For the 1988 vintage, for example, about 100,000 cars are abandoned from 2005 to 2006, but only about 20,000 are scrapped from 2009 to 2010. In 2005, when the 1988 vintage cars become 17 years old, their owners start to abandon them. By 2008, when the number of 1988 cars in the stock data has decreased from 330,000 to about 170,000, the speed of obsolescence goes down. So, besides supporting the judgement of the approximate life-span of a car, this phenomenon can be of help to subsequent research on car scrapping models.

Chapter 4

Descriptive analysis of new car registries

4.1 Introduction

4.1.1 Extraction of new car data

To obtain the new car registration data, we have to generate the RP data of new car registrations. Three methods are considered. The first method (method 1) is to use the new car registration data extracted by the Transport Agency, since these data have been used in previous studies of Swedish new car choice modelling. The second alternative (method 2) is to find the cars that newly enter the stock data each year. That is, if a car is not present in 2007 but appears in 2008, we consider it a new car in 2008. This interpretation is reasonable since a new car can only first appear in the stock data of its purchase year. The final method (method 3) is simply to use the attribute named “date for first registration” in each year’s stock data. For instance, if a car in the stock data of 2008 is first registered in 2008, we count it as a new car in 2008. All three methods are reasonable, but we need to find the most accurate one. To verify which method is more accurate, we take the data provided by Trafikanalys, which can be regarded as the official statistics and are available for 2006 to 2010, as our reference.
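Methods 2 and 3 can be sketched as two small filters over the stock data. This is an illustrative sketch only: the record layout and the field names reg_no and first_reg_date are assumptions, and dates are assumed to be strings starting with a four-digit year.

```python
def new_cars_method2(stock_prev, stock_curr):
    """Method 2: cars in the current stock that were absent the year before."""
    prev_ids = {car["reg_no"] for car in stock_prev}
    return [car for car in stock_curr if car["reg_no"] not in prev_ids]


def new_cars_method3(stock_curr, year):
    """Method 3: cars whose date for first registration falls in `year`."""
    selected = []
    for car in stock_curr:
        first_reg = car.get("first_reg_date")  # may be missing in the stock data
        if first_reg and first_reg[:4] == str(year):
            selected.append(car)
    return selected


# Made-up stock data for two consecutive years.
stock_2007 = [{"reg_no": "AAA111", "first_reg_date": "2007-05-02"}]
stock_2008 = [
    {"reg_no": "AAA111", "first_reg_date": "2007-05-02"},
    {"reg_no": "BBB222", "first_reg_date": "2008-02-11"},
    {"reg_no": "CCC333", "first_reg_date": None},  # date not provided
]

m2 = [car["reg_no"] for car in new_cars_method2(stock_2007, stock_2008)]
m3 = [car["reg_no"] for car in new_cars_method3(stock_2008, 2008)]
```

In this toy example the car without a first-registration date is picked up by method 2 but not by method 3, which illustrates how the two methods can produce different counts from the same stock data.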

Table 4.1.1: Different numbers of new car registration

              2006     2007     2008     2009     2010
Method 1      N/A      329,013  270,815  221,837  306,164
Method 2      307,423  302,734  318,612  218,238  320,289
Method 3      313,522  338,216  274,286  225,084  306,465
Trafikanalys  313,812  338,538  276,344  228,528  308,734

The numbers generated by each method are compared in Table 4.1.1 and Figure 4.1.1. Comparing the three methods, we find that method 3 is the best alternative for generating the new car datasets: for method 1, the data before 2007 are not available, while for method 2, the differences from the reference data are quite large. To test whether method 3 indeed fits the official statistics, Figure 4.1.2 compares the monthly sales computed by method 3 with the sales provided by Trafikanalys from January 2006 to December 2010. The two lines almost coincide, which means that the method based on the “date of first registration” fits the official statistics. Therefore, we use this method to generate the new car data from 2004 to 2010. It is reasonable that all the figures from method 3 are slightly smaller than those of Trafikanalys, since a small amount of registration information is omitted in the stock data each year; for instance, the registration date may not be provided. Taking all of the above into account, the new car data generated by method 3 are chosen as our new car choice data.
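The comparison can be made explicit by computing, for each method, the mean absolute relative deviation from the Trafikanalys figures in Table 4.1.1. The choice of this particular measure is ours for illustration; the thesis compares the numbers directly.

```python
# Annual counts from Table 4.1.1.
TRAFIKANALYS = {2006: 313812, 2007: 338538, 2008: 276344, 2009: 228528, 2010: 308734}
METHODS = {
    "Method 1": {2007: 329013, 2008: 270815, 2009: 221837, 2010: 306164},
    "Method 2": {2006: 307423, 2007: 302734, 2008: 318612, 2009: 218238, 2010: 320289},
    "Method 3": {2006: 313522, 2007: 338216, 2008: 274286, 2009: 225084, 2010: 306465},
}


def mean_abs_rel_dev(counts, reference):
    """Mean of |count - reference| / reference over the years in `counts`."""
    deviations = [abs(counts[y] - reference[y]) / reference[y] for y in counts]
    return sum(deviations) / len(deviations)


scores = {name: mean_abs_rel_dev(counts, TRAFIKANALYS) for name, counts in METHODS.items()}
best = min(scores, key=scores.get)

# Method 3 never exceeds the official count in any year of the table.
method3_below = all(METHODS["Method 3"][y] <= TRAFIKANALYS[y] for y in METHODS["Method 3"])
```

On these figures method 3 deviates by less than 1% on average, method 1 by roughly 2% and method 2 by roughly 7%, which agrees with the choice of method 3.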

Figure 4.1.1: Different numbers of new car registration
[Line chart. x-axis: year, 2006 to 2010; y-axis: number of registrations, 190,000 to 350,000; series: Method 1, Method 2, Method 3, Trafikanalys.]

As discussed above, the new car registration data are subsets of the ownership data. From the car registrations, we can learn the annual sales of each car make and model. Nonetheless, the RP registry data do not only contain information on new cars of the latest vintages, but also include registration records of old car models. Most of the records concern new cars, since second-hand cars are already registered and a second-hand sale simply leads to a change of owner. However, there are some exceptions in which a used car appears in the new car registration data. One such case is a vehicle imported directly from another country: since this used car has never been registered in Sweden, it may appear in the new car registration data.

Figure 4.1.2: Comparison of monthly sales
[Line chart titled “Monthly sales of new cars”. x-axis: month, 2006 to 2010; y-axis: monthly sales, 10,000 to 35,000; series: stock data, Trafikanalys.]

In the registration data, besides the car plate license (registration number) and vintage, the following attributes are recorded:

• Car model and model name, which include the make and model name of the registered car, like “Volvo B + V70”. This information is not available in the data from 2004 to 2007.
• Group-code. In most cases, each car model or version has a unique code of the form “AA 123456”, where the first 2 letters are the abbreviation of the car make and the last 6 digits refer to the car version. Unfortunately, the terminology of the group-code is unknown. In Section 4.1.2, we try to find the terminology for each year.
• Owner, which indicates whether the car is owned by a private person or by a company.
• Leasing, which indicates whether the car is leased.

Apart from these values, some environment-related attributes such as fuel type and environmental class are contained in the new car registration data as well.

4.1.2 Model name generation for 2005 to 2007

The car model name is an essential attribute, since without it we cannot know which model a car is. However, this information is only available from 2008. To obtain the model names before 2008, a procedure is developed, which is shown in the flow chart in Figure 4.1.3. The procedure is designed for the new car registration data from 2005 to 2007. First, we match the new car registration data with the stock data of 2008 by registration number. In this step, the model names of the cars that are still present in 2008 are added to the new car data. However, since a car can be registered in 2007 and deregistered in 2008, in the second step we separate the cars that are deregistered in 2008 from the cars that are still held. For the held cars, we know their registered model names, like “Volvo B + V70”, whereas for the deregistered cars the model name is still unknown and shown as null. Thus, we need to find the model names of these deregistered cars.

Figure 4.1.3: Procedures of finding model names before 2008
[Flow chart. New car registries 2005 - 2007 → match with stock data 2008 by registration number → new table → split into cars holding in 2008 (car name available) and cars deregistered in 2008 (car name unavailable) → group code corresponds to car name (from the holding cars) → match the deregistered cars with group code → deregistered in 2008 (car name available) → final table.]

What can be done is to use the information from the held cars and their corresponding group codes, since we can trust that the terminology of the group code stays the same within each year of car registration data. Based on this assumption, we generate a mapping from each group code to the corresponding car model name. With this map, we can obtain the model names of the deregistered cars, since we know their group-codes. It has to be noted that, since some group-codes do not indicate only one car model, there may be some errors. For example, if the code is like “VO 000000”, no information can be obtained from it except the brand, Volvo in this case. That is, not all group-codes refer to a unique known car model. Nonetheless, this seems to be the only way to obtain the omitted model names.

Based on these discussions, we know that the mapping may introduce some errors. To evaluate how many errors are introduced, it is necessary to provide evidence on the share of cars deregistered in 2008 in each year’s new car data. Table 4.1.2 therefore shows the ratio of deregistrations to the total number of new car registrations. According to this table, the share of deregistrations is quite small in the whole new car registration data. Within this minority of registries, the share of cars deregistered with an ambiguous group code is quite small. So, even if the mapping introduces some errors, it does not affect the total shares significantly. Finally, we combine the generated model names of the deregistered cars with the model names of the held cars to get the final table, which contains the model name information we need.
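The mapping step can be sketched as follows. The record layout and field names are hypothetical, and the rule for ambiguous codes (take the model name seen most often for that code among the held cars) is an assumption made for this sketch; the thesis does not state how such codes are resolved. The model name “Volvo B + V50” is made up by analogy with the example in the text.

```python
from collections import Counter, defaultdict


def build_code_to_model(held_cars):
    """Map each group code to the model name most often seen with it.

    Picking the most frequent name for ambiguous codes is an assumption
    of this sketch, not a rule taken from the thesis.
    """
    votes = defaultdict(Counter)
    for car in held_cars:
        votes[car["group_code"]][car["model_name"]] += 1
    return {code: names.most_common(1)[0][0] for code, names in votes.items()}


def fill_model_names(deregistered_cars, code_to_model):
    """Add a model name to each deregistered car (None if the code is unknown)."""
    return [
        dict(car, model_name=code_to_model.get(car["group_code"]))
        for car in deregistered_cars
    ]


# Made-up records: the code "VO 000000" is ambiguous among the held cars.
held = [
    {"group_code": "VO 123456", "model_name": "Volvo B + V70"},
    {"group_code": "VO 123456", "model_name": "Volvo B + V70"},
    {"group_code": "VO 000000", "model_name": "Volvo B + V70"},
    {"group_code": "VO 000000", "model_name": "Volvo B + V50"},
    {"group_code": "VO 000000", "model_name": "Volvo B + V50"},
]
deregistered = [
    {"reg_no": "AAA111", "group_code": "VO 123456"},
    {"reg_no": "BBB222", "group_code": "XX 999999"},
]

mapping = build_code_to_model(held)
filled = fill_model_names(deregistered, mapping)
```

Codes that never occur among the held cars stay without a model name, and ambiguous codes are exactly where the errors discussed above would enter.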

Table 4.1.2: Numbers and shares of new cars deregistration in 2008

                    2005     2006     2007
Deregistration      7,303    4,710    3,017
New car registries  311,242  313,522  338,216
Ratio               0.022    0.015    0.009

4.2 Market analysis of choices

In the Swedish new car market from 2005 to 2010, in total 47 car makes and more than 300 models have been available, which are listed in Table A.1 in Appendix A. However, the numbers of makes and models are not constant. In 2009, a Romanian car make named “Dacia” enters the market with 2 models, Logan and Sandero. In 2010, Dacia introduces a new model, Duster. Similarly, other car makes also introduce new models into the market and withdraw some of their models from it.

4.2.1 Issues of defining price

Consider the data matching part of this project. In the registries, cars of different vintages are sold as new cars. For instance, in the car registry data of 2008, one cannot know whether a car of the 2006 vintage is sold at the same price as in 2006. There is no clear boundary, and price information for the same model in different years is lacking. Even for a car of the 2006 vintage sold in 2008, we cannot be sure whether its model is really different from the 2008 one, or whether the two are indeed sold at different prices. So, we have to designate, somewhat arbitrarily, a new definition of “new car” that takes into account cars with a vintage from 2 years before to 1 year after each registration year. This issue is further discussed in the data matching part, in Section 6.4.3 in Chapter 6.

Table 4.2.1: Numbers and shares of new registries by vintage

                 2005     2006     2007     2008     2009     2010
All vintages     311,242  313,522  338,216  274,826  225,084  306,465
Previous 2-year  2,969    2,752    3,121    3,713    6,760    3,868
Previous 1-year  64,425   66,008   64,921   61,270   58,722   49,963
Vintage year     227,208  231,488  257,392  190,386  145,747  235,822
Next 1-year      1,572    1,477    1,162    1,648    1,096    939

Table 4.2.1 shows the volume of annual new car registries by vintage in Sweden from 2005 to 2010. According to this table and our definition, sales in the new car market decrease by roughly 100,000 from 2007 to 2009, but in 2010 the annual sales recover towards the level of 2007. The cars covered by our definition of “new cars” make up approximately 95% of the annual new car registrations.
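The share of registrations covered by the vintage window can be recomputed from Table 4.2.1. The sketch below simply sums the four vintage rows for each year and divides by the row for all vintages; the variable and function names are ours.

```python
# Figures from Table 4.2.1. Keys of BY_OFFSET are vintage minus registration year.
ALL_VINTAGES = {2005: 311242, 2006: 313522, 2007: 338216,
                2008: 274826, 2009: 225084, 2010: 306465}
BY_OFFSET = {
    -2: {2005: 2969, 2006: 2752, 2007: 3121, 2008: 3713, 2009: 6760, 2010: 3868},
    -1: {2005: 64425, 2006: 66008, 2007: 64921, 2008: 61270, 2009: 58722, 2010: 49963},
    0: {2005: 227208, 2006: 231488, 2007: 257392, 2008: 190386, 2009: 145747, 2010: 235822},
    1: {2005: 1572, 2006: 1477, 2007: 1162, 2008: 1648, 2009: 1096, 2010: 939},
}


def is_new_car(vintage, registration_year):
    """Window used in the text: vintage from 2 years before to 1 year after."""
    return registration_year - 2 <= vintage <= registration_year + 1


def new_car_share(year):
    """Share of a year's registrations that fall inside the vintage window."""
    kept = sum(BY_OFFSET[offset][year] for offset in BY_OFFSET)
    return kept / ALL_VINTAGES[year]


shares = {year: new_car_share(year) for year in ALL_VINTAGES}
```

On these figures the share lies between about 93.5% (2008) and 96.6% (2007), in line with the approximate 95% stated above.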

4.2.2 Market analysis in Sweden new car market

Table 4.2.2: Market Shares of Top 15 Brands in New Car Market

Rank  2007              2008              2009              2010
1     Volvo     0.2372  Volvo     0.2195  Volvo     0.2447  Volvo     0.2082
2     VW        0.0961  VW        0.1138  VW        0.1251  VW        0.1323
3     Saab      0.0911  Saab      0.0914  Toyota    0.0856  Toyota    0.0653
4     Toyota    0.0682  Toyota    0.0725  Audi      0.0696  Ford      0.0624
5     Ford      0.0589  Audi      0.0652  Ford      0.0649  Audi      0.0568
6     Audi      0.0556  Ford      0.0632  BMW       0.0612  BMW       0.0499
7     Peugeot   0.0496  BMW       0.0542  Škoda     0.0405  Renault   0.0492
8     BMW       0.0471  Škoda     0.0416  Mercedes  0.0347  Škoda     0.0458
9     Škoda     0.0400  Peugeot   0.0359  Saab      0.0334  Kia       0.0370
10    Citroën   0.0357  Opel      0.0305  Renault   0.0333  Mercedes  0.0347
11    Opel      0.0339  Mercedes  0.0253  Kia       0.0294  Peugeot   0.0343
12    Renault   0.0260  Hyundai   0.0248  Peugeot   0.0265  Hyundai   0.0310
13    Mercedes  0.0217  Citroën   0.0244  Opel      0.0219  Saab      0.0300
14    Honda     0.0206  Renault   0.0224  Hyundai   0.0183  Citroën   0.0206
15    Kia       0.0186  Kia       0.0172  Citroën   0.0178  Opel      0.0200
Total           0.900             0.902             0.907             0.877

Table 4.2.2 shows the market shares of the top 15 car makes in the Swedish new car market from 2007 to 2010. These 15 brands account, on average, for about 90% of annual sales in the whole new car market, which reflects the brand preferences of the majority of Swedish car buyers in these years. According to this table, apart from one Asian brand, Honda, being replaced by another Asian brand, Hyundai, from 2008, no new brand enters the top 15 ranking. We can also see that Swedish consumers purchase European brands more than any others; in particular, the domestic brand Volvo takes more than 20% of the whole market every year. The European brands are from Germany, France, Sweden and the Czech Republic, and the Czech one, Škoda, is now wholly owned by the Volkswagen Group from Germany. Only one U.S. car brand, Ford, performs well in the Swedish car market. The Asian car makes are either from Japan (Toyota, Honda) or from South Korea (Kia, Hyundai).

Figure 4.2.1: Shares by origin area in 2007
[Pie chart. Categories: Sweden, Germany, France, Czech, Asia, U.S., not in top 15.]

Figure 4.2.2: Shares by origin area in 2010
[Pie chart. Categories: Sweden, Germany, France, Czech, Asia, U.S., not in top 15.]

Accordingly, Figure 4.2.1 and Figure 4.2.2 in Appendix ?? demonstrate the market shares of car makes from different origin countries or continents among the top 15 ranking in Sweden. Although few brands enter or leave the top 15, the standings and shares of these brands change. For example, the share of Swedish makes drops significantly from 2007 to 2010 due to the decline of Saab, which holds more than 9% of the market up to 2008 but only 3% in 2010, while its ranking drops from 3rd in 2007 and 2008 to 13th in 2010. The story behind this may be that, in 2009, the owner of Saab at that time, General Motors, planned to close Saab after the failed attempt to sell it to another Swedish car manufacturer, Koenigsegg (GM to ‘wind down’ Saab business, 2009). Although a Dutch automaker, Spyker, finally completed the purchase of Saab in 2010 (General Motors sells Saab to Dutch firm Spyker, 2010), the uncertainty about Saab’s future still led to its decline in the Swedish new car market in 2009 and 2010. Meanwhile, the market share of Kia increases from less than 2% to 3.7% in 2010. Such a boom may be because Kia offers a 7-year warranty on its new cars in Europe from 2010, which is the longest warranty in the automobile industry (7-Year warranty now for every Kia in Europe, 2010). Recall the discussion in Section 3.2, where we assume that the expected vehicle life-span is about 15 years. Under such a policy, service is guaranteed for about half of the life-span of a Kia car. This also signals to customers Kia’s confidence in the reliability of its products. So, this offer can be very attractive to some customers. The stories of these two brands show that both the unknown risk of a car make and a better guarantee may influence consumers’ choices. So, we may infer that the maintenance and repair cost after the purchase can also be an essential factor that affects the decisions.

Table 4.2.3: Top 20 models and sales in Sweden new car market

Rank  2007              2008              2009              2010
1     Volvo V70         Volvo V70         Volvo V70         Volvo V70
      26280             18744             17404             22806
2     Saab 9-3          Volvo V50         Volvo V50         VW Golf
      12125             10207             9420              10859
3     Volvo V50         VW Golf           VW Passat         VW Passat
      9715              8834              7326              10784
4     Saab 9-5          Saab 9-3          VW Golf           Volvo V50
      9068              8599              6549              9338
5     VW Passat         [...]             BMW 3-Series      Audi A4¹
      7763              5434              4008              5501
6     VW Golf           Saab 9-5          Ford Focus        Ford Focus
      7739              5172              3293              5494
7     Ford Focus        VW Passat         Toyota Yaris      Saab 9-3
      7454              5034              3059              5328
8     Volvo XC70        Audi A4 Avant     Škoda Octavia     Kia Cee’d
      5294              4154              2952              5182
9     Opel Astra        Škoda Octavia     Kia Cee’d         Renault Clio
      4700              3944              2936              5082
10    Peugeot 307       Toyota Aygo       Audi A4 Avant     Volvo XC60
      4556              3507              2823              4712
11    Audi A4 Avant     BMW 3-Series      Saab 9-3          BMW 3-Series
      4319              3461              2722              4373
12    Škoda Fabia       Škoda Fabia       Toyota Avensis    Renault Mégane
      4238              3324              2579              4071
13    Škoda Octavia     Volvo XC70        Renault Mégane    VW Polo
      4090              2780              2279              3835
14    BMW 3-Series      Toyota Prius      Škoda Fabia       Škoda Octavia
      4004              2670              2143              3702
15    BMW 5-Series      Kia Cee’d         BMW 5-Series      Toyota Yaris
      3607              2650              2101              3319
16    Volvo C30         BMW 5-Series      Volvo XC60        Nissan Qashqai
      3595              2639              2005              3296
17    Toyota Auris      Toyota Auris      Audi A6 Avant     Volvo XC70
      3554              2546              1986              3276
18    Kia Cee’d         [...]             Saab 9-5          BMW 5-Series
      3316              2409              1877              3271
19    Audi A6 Avant     Audi A6 Avant     Ford Mondeo       Škoda Fabia
      3158              2157              1869              3160
20    Toyota Avensis    Volvo C30         Peugeot 308       Toyota Auris
      3139              2131              1805              2893

Table 4.2.3 shows the 20 best-selling car models in the Swedish new car market. The Volvo V70 is the best-selling model in Sweden every year, and its annual sales are roughly twice those of the second best-selling model (Saab 9-3 in 2007, Volvo V50 in 2008 and 2009, VW Golf in 2010). In the top 20, most of the popular models belong to the top 15 makes, whereas the Nissan Qashqai also sells well although Nissan is not very popular in Sweden. In fact, the Qashqai accounts for more than 70% of Nissan’s total annual sales. This may hint that particular car models can still have good annual sales even if the brand is not well recognised in the market.
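The statement that the V70 sells roughly twice as much as the runner-up can be checked directly against the first two rows of Table 4.2.3. The short sketch below only restates those figures; the dictionary layout is ours.

```python
# Annual sales of the best-selling and second best-selling models (Table 4.2.3).
TOP_TWO = {
    2007: {"Volvo V70": 26280, "Saab 9-3": 12125},
    2008: {"Volvo V70": 18744, "Volvo V50": 10207},
    2009: {"Volvo V70": 17404, "Volvo V50": 9420},
    2010: {"Volvo V70": 22806, "VW Golf": 10859},
}


def lead_ratio(sales):
    """Ratio of the highest to the second highest sales figure."""
    first, second = sorted(sales.values(), reverse=True)[:2]
    return first / second


ratios = {year: lead_ratio(sales) for year, sales in TOP_TWO.items()}
```

The ratio stays between about 1.8 and 2.2 in all four years, so "roughly twice" is a fair summary of the table.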

4.2.3 Comparative analysis of the new car market in other countries

Compared with other car markets, we find that, if a country has an automobile industry, the domestic car makes usually sell best in that country. In another European country, Germany, the German car brands Volkswagen, Mercedes-Benz, BMW (including Mini), Opel and Audi are the top 5 in new car registries in 2010 (Automobil/Tabellen und Grafiken, 2010). The annual registries in Germany from 2007 to 2010 by make are listed in Table 4.2.4. Among the top 10 makes in new car registration, all except Ford and Toyota are European brands.

¹ The car registry data in 2010 do not separate Audi A4 Avant from Audi A4. So, the sales contain both types of cars.

Table 4.2.4: Top 10 brands of new car registries in Germany¹

Rank  2007                2008                2009                2010
1     VW        608,820   VW        615,229   VW        805,262   VW        613,808
2     Mercedes  327,742   Mercedes  327,965   Opel      338,603   Mercedes  281,240
3     Opel      285,267   BMW²      284,767   Ford      290,620   BMW²      266,729
4     BMW²      284,889   Opel      258,274   Mercedes  282,527   Opel      233,498
5     Audi      249,305   Audi      251,393   BMW²      258,041   Audi      226,872
6     Ford      213,873   Ford      217,305   Audi      234,861   Ford      198,156
7     Renault³  153,555   Renault³  147,167   Renault³  225,965   Renault³  153,555
8     Toyota⁴   132,535   Škoda     127,277   Škoda     190,717   Škoda     132,150
9     Škoda     118,682   Toyota⁴   96,781    Fiat      163,953   Peugeot   84,242
10    Peugeot   93,394    Peugeot   94,676    Toyota⁴   138,498   Toyota⁴   78,708
All             3,148,163           3,090,040           3,807,175           2,916,260

¹ Source table: PKW-Neuzulassungen nach Automarken. Automobil/Tabellen und Grafiken, 2010.
² Including Mini.
³ Including Dacia.
⁴ Including [...].

Table 4.2.5: Passenger cars share by origin in China in 1st half of 2010

Rank  Origin  Share  Note
1     China   34.2%  19.9% for China state-owned makes, 14.3% for private owned makes
2     Asia    33.2%  24.1% for Japanese makes, 9.1% for Korean makes
3     Europe  20.2%  Volkswagen shares largest among EU makes
4     U.S.    12.4%  GM shares largest among American makes

Table 4.2.6: Sales and market shares in the U.S. in 2009 and 2010

                2009                2010
Make¹           Sales       Share   Sales       Share
General Motors  2,071,749   19.9%   2,211,699   19.1%
Ford            1,677,234   16.1%   1,964,059   16.9%
Toyota          1,770,174   17.0%   1,763,595   15.2%
Honda           1,150,784   11.0%   1,230,480   10.6%
Chrysler        931,402     8.9%    1,085,211   9.4%
Nissan          770,103     7.4%    908,570     7.8%
Hyundai         735,127     7.0%    894,496     7.7%
Volkswagen      297,537     2.9%    359,889     3.1%
BMW             242,053     2.3%    266,069     2.3%
Subaru          216,652     2.1%    263,820     2.3%

Total Sales     10,431,510          11,590,274

¹ Data url: http://www.autosavant.com/wp-content/uploads/2011/01/Sales-Results1.png

Besides Europe, in the U.S. and China domestic car makes also sell better than foreign makes. For China, Feng (2010) publishes the market shares of passenger cars in the Chinese new car market in the first half of 2010. According to his data, shown in Table 4.2.5, though Chinese car brands are not as famous worldwide as Volkswagen or Toyota, the combined share of Chinese state-owned car makes (e.g. FAW¹ and SAIC²) and privately owned car brands (e.g. Chery and [...]) is still the largest, at approximately 34.2%. Other Asian car makes from Japan and South Korea hold about 33.2%, and the European and American makes rank third and fourth respectively. Meanwhile, in the U.S., according to the data released by Haak (2011), though the annual sales of the Japanese “Big Three” in 2010 are catching up with those of the American “Big Three”, GM and Ford still have the best annual sales in the U.S. automobile market, as shown in Table 4.2.6.

Table 4.2.7: Top 10 models in other European countries in 2010

Rank  Germany             France          U.K.
1     VW Golf/Plus/[...]  Renault Clio    Ford Fiesta
2     VW Polo             Renault Mégane  Vauxhall Astra
3     Opel Astra          Citroën C3      Ford Focus
4     Mercedes C-Class    Peugeot 207     Vauxhall Corsa
5     BMW 3-Series        [...]           VW Golf
6     VW Passat           Citroën C4      VW Polo
7     Opel Corsa          Dacia Sandero   Peugeot 207
8     Audi A3/S3          Peugeot 308     BMW 3-Series
9     Audi A4/S4          Peugeot 206     Mini
10    BMW 1-Series        Peugeot 3008    Nissan Qashqai

In terms of the annual sales of car models, shown in Table 4.2.7 and Table 4.2.8³, all of these countries with their own automobile industries buy their domestic makes and models most, except Great Britain (VW Golf in Germany, Renault Clio in France, Ford Fiesta in Great Britain, Ford F-Series in the U.S., Toyota Prius in Japan and BYD F3 in China). However, even in Great Britain, the two models of the domestic make Vauxhall, the Astra and the Corsa, rank second and fourth in annual sales in 2010.

Comparing the size of the best-selling vehicles in Sweden and in other European countries, based on the Euro NCAP vehicle classes (1997-2009), we find that the best-selling model in Sweden, the Volvo V70, belongs to a class of much larger size than the best-selling cars in Germany (VW Golf), France (Renault Clio) and the U.K. (Ford Fiesta), which are grouped as superminis or small cars. A possible reason is that larger (and perhaps more expensive) cars usually have more advanced technologies and higher safety ratings, and may therefore cope better with roads covered in deep snow in Nordic winters.

1 First Automobile Works
2 Shanghai Automotive Industry Corporation
3 Data for Table 4.2.7 and Table 4.2.8 are from: Automobil/Tabellen und Grafiken, 2010 and China car sales ranking by model in 2010, 2011.

Table 4.2.8: Top 10 models in U.S. and Asia in 2010

Rank  U.S.                  Japan                  China
1     Ford F-Series         Toyota Prius           BYD F3
2     Chevrolet Silverado   [...] (Jazz)           CSVW Lavida
3     [...]                 Toyota Vitz (Yaris)    BJH Yuedong¹
4     [...]                 [...]                  FAW-VW Jetta
5     Toyota Corolla        Honda Freed            SGM² Excelle
6     [...]                 Toyota Passo           CSVW Santana
7     [...]                 Honda Stepwgn          FAW Tianjin Xiali
8     Ford Fusion           [...]                  SGM Cruze
9     Honda CR-V            Toyota Voxy            Chery Cowin
10    Ram                   [...]                  FAW-VW New Bora

1 The full name is Beijing-Hyundai Yuedong.
2 SGM is short for Shanghai-GM.

4.3 Analysis of fuel type choices

Hugosson and Algers (2011) examine the increasing share of clean cars in Sweden by looking at the shift between different types of fuel, using data from Bil Sweden⁴. In this section, a similar analysis is carried out on the new car registration data introduced in this project, with all available fuel types listed in Table 3.1.1 in Section 3.1. To be consistent with the analysis presented by Hugosson and Algers (2011), we reduce these fuel types to 5 groups:

• gasoline,
• diesel,
• ethanol and E85,
• gas,
• other hybrid types (e.g. electrical),

where the latter 3 groups are also counted as “alternative” fuel in this thesis to simplify the models we develop for analysis in Chapter 7. Except for gasoline, all of these fuel types can also be counted as clean fuel because of their lower carbon dioxide emissions.

4 http://www.bilsweden.se

[Chart “Fuel type share of new registries”: yearly shares (0 to 1) of Gasoline, Diesel, Ethanol/E85, Hybrid and Gas, 2004 to 2010]
Figure 4.3.1: Share of various fuel types of new registered cars

[Chart “Fuel type share in Bil Sweden”: yearly shares (0 to 100) of Gasoline, Diesel, E85, Hybrid and Gas, 2004 to 2009]
Figure 4.3.2: Share of various fuel types in Bil Sweden

Similar to the charts from Bil Sweden, Figure 4.3.1 shows the fuel type shares computed from the new car registration data from 2004 to 2010. According to this figure, the decline of gasoline and the rise of diesel are very clear. In 2004, almost 90% of the newly registered cars are gasoline cars, while the share of diesel cars is less than 10%. The share of gasoline decreases continuously afterwards, and diesel cars reach about 50% of all new car purchases in 2010. Besides these two main fuel types, the share of ethanol (and E85) also increases from 2004 and peaks in 2008 at more than 20% of the market. It then falls back to roughly 10% in 2010, which is still above the level of 2007. The shares of gas and other types of hybrid fuel go up slightly.

The calculation from our data is similar to that of Bil Sweden, which is shown in Figure 4.3.2. In the data from Bil Sweden, the share in 2010 is omitted. However, both figures indicate an increasing demand for clean fuel cars from 2005 and a continuously declining demand for gasoline cars.

Chapter 5

Descriptive analysis of car attributes

In this chapter, we analyse the supply data, which contain attributes that are missing from the registration data, e.g. price. In addition, these data show the car alternatives at three different levels, namely make, model and version. The analysis therefore helps us to understand the alternatives available in the Swedish new car market.

5.1 Introduction

Among the different attributes of cars, we have two main sorts of variables. One is numerical attributes, such as price, maximum speed, fuel consumption and vehicle size, which are listed in Table B.1 in Appendix B. The other is dummy variables indicating whether the vehicle is equipped with items such as ABS, air-conditioner or air-bags, which are listed in Table 5.4.2 and Table 5.4.3.

However, some values of the numerical attributes are zero. The difficulty is that we need to judge which zeros are genuine and which stand for missing values. For example, the maximum roof load of a racing car may well be zero, but a zero may equally mean that the information is not available. We also have to decide how to deal with the missing data. Of the 3469 supply records with a 2010 vintage, 51 lack the maximum speed, including Volvo V50 and V70. This can be a substantial loss, since the annual sales of these two models exceed 30,000, which is nearly one seventh of the total new car sales in 2010.

There are two ways to address these issues. One is to copy the values from the supply data of another year, such as 2009, that contain the information. In this case, however, we need to assume that the configuration of a given model does not change between two consecutive years. The other way is simply to exclude the records with zero values, since we can at least be sure that some attributes, like price, size or displacement, cannot actually be zero. In this case, we need to find out how many records remain after the exclusion in each year. In this thesis, we choose the second method, that is, we simply filter out the zero data, for the reasons discussed below.

To clean the missing data, we filter out the records in which any of the following attributes is zero: price, maximum speed, CO2 emissions, fuel consumption, tank volume, size (length, width and height), displacement, curb weight, payload, total weight, power in kilowatt (kW) or horsepower (hp), torque, revolutions per minute (RPM) at maximum speed or torque, weight capacity (kg per kW or hp) and acceleration. These attributes cannot be zero for a car. Other attributes may have missing values as well, but for them a genuine zero cannot be ruled out. Table 5.1.1 shows the number of remaining records at the version level. After the exclusion, at least 96% of the data remain in each year.

Table 5.1.1: Number of versions in supply without data missed

Year               2007    2008    2009    2010
Total number       2679    2992    3432    3469
No data missed¹    2610    2943    3377    3331
Ratio              0.974   0.984   0.984   0.960

1 “No data missed” means that none of the numerical car attributes is missing.
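
The zero filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`price`, `max_speed`, `displacement`, `power_kw`) and the two toy records are hypothetical, and only a subset of the attributes listed in the text is checked. The counts used for the ratios are those of Table 5.1.1.

```python
# Attributes that, per the text, can never be zero for a real car.
# The field names are hypothetical; only a few of the listed attributes are shown.
NONZERO_FIELDS = ("price", "max_speed", "displacement", "power_kw")


def drop_zero_records(versions, fields=NONZERO_FIELDS):
    """Keep only the versions in which every listed attribute is non-zero."""
    return [v for v in versions if all(v.get(f, 0) != 0 for f in fields)]


def retention_ratio(total, remaining):
    """Share of versions that survive the zero filter."""
    return remaining / total


# Toy records (made up): the second one has its maximum speed coded as zero.
toy_versions = [
    {"version": "A", "price": 250000, "max_speed": 200, "displacement": 1998, "power_kw": 110},
    {"version": "B", "price": 310000, "max_speed": 0, "displacement": 2400, "power_kw": 136},
]
kept = drop_zero_records(toy_versions)

# (total, remaining) counts copied from Table 5.1.1.
table_5_1_1 = {2007: (2679, 2610), 2008: (2992, 2943), 2009: (3432, 3377), 2010: (3469, 3331)}
ratios = {year: round(retention_ratio(t, r), 3) for year, (t, r) in table_5_1_1.items()}
print(ratios)
```

Recomputing the ratios from the raw counts gives the same values as the last row of Table 5.1.1, with 2010 as the year that loses the most records.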

Besides the numerical and dummy variables, some other vehicle features are also available. Table 5.1.2 lists the 10 vehicle types available in the Swedish new car market from 2007 to 2010. Among these types, MPV stands for multi-purpose vehicle, such as the Škoda Roomster or Honda FR-V, SUV is short for sport utility vehicle, such as the Audi Q5 or Volvo XC90, and pick-up refers to pick-up truck. The only pick-up truck model in the Swedish new car market is the Hummer H2.

Table 5.1.2: Shares of vehicle types in the supply

                        Shares
Types        2007     2008     2009     2010     Examples
Cabriolet    0.0608   0.0709   0.0635   0.0692   Alfa Romeo Spider
Coupé        0.0687   0.0769   0.0839   0.0917   Porsche Cayman
[...]        0.2639   0.2483   0.2535   0.2643   Volvo V70
Hatchback    0.2299   0.2249   0.2214   0.2208   Volkswagen Golf
Minibus      0.0131   0.0094   0.0067   0.0075   Volkswagen Caravelle
Minivan      0.0459   0.0464   0.0417   0.0357   Mercedes R-Class
MPV          0.0116   0.0120   0.0117   0.0176   Honda FR-V
[...]        0.1941   0.1892   0.1876   0.1712   Volvo S80
SUV          0.1116   0.1213   0.1300   0.1219   Volvo XC90
Pick-up      0.0004   0.0007   0        0        Hummer H2

5.2 Share of fuel types in supply

In Section 4.3 of the previous chapter, we presented a chart showing the shift between fuel types in the registration data. In this section, we present how the shares of the fuel types vary in the supply, as shown in Figure 5.2.1. Due to incomplete information in our supply data, gas-fuelled cars are omitted. Judging from the demand side, the share of gas fuel is quite small. It should be noted, though, that while this omission is unlikely to affect the overall shares significantly, the complete absence of data for one fuel type may still cause some problems.

Comparing with Figure 4.3.1, we find that the shares of supply and demand in terms of fuel types are not consistent. For instance, in 2010 car makers still supply a larger number of gasoline cars than diesel cars, whereas half of the newly registered cars use diesel and the share of gasoline registrations is only about 35%. A similar difference is found for alternative-fuel cars. As presented before, the share of newly registered ethanol and E85 cars reaches 20% in 2008, but the supply share of such cars is only approximately 5%. Although the overall trend of the supply is similar to that of the demand during these years (e.g. the increasing number of clean cars and the decline of gasoline cars), there is still a gap between the supply and the demand.

[Chart “Fuel type share of supply”: yearly shares (0 to 0.8) of Gasoline, Diesel, Ethanol/E85, Hybrid and Gas, 2004 to 2010]
Figure 5.2.1: Share of various fuel types in supply

5.3 Distribution of the attribute values

Table B.1 in Appendix B shows the main summary statistics of each car characteristic, but the shape of the distributions cannot be seen from it. This section analyses the distribution of some of the main car attributes, taking the car attributes data of 2007 as an example.

Figure 5.3.1 shows the histogram and the density plot of the price of each car version in the range between $10^5$ and $10^6$ SEK. According to Table B.1 in Appendix B, some car versions cost as much as $5 \times 10^6$ SEK (Mercedes-Benz Maybach)¹. However, most of the car versions’ prices lie between $1 \times 10^5$ SEK and $5 \times 10^5$ SEK. Meanwhile, the cheapest car version in the Swedish new car market in 2007 costs about $8 \times 10^4$ SEK (e.g. the Chevrolet Matiz). Although the order of magnitude of the price ranges from $10^4$ for the cheapest version to $10^7$ for the most expensive one, Figure 5.3.1 only takes the prices from $10^5$ to $10^6$ SEK into account to obtain a clearer picture.

Arnberg et al. (2008) face a similar situation when they handle car prices in Denmark in DKK (Danish krone). To obtain a better log-likelihood, they take the natural logarithm (log) of the car price. We consider it sensible to do the same, and therefore compute the log of the price to normalise the distribution and obtain a clearer differentiation. The histogram and the density plot of the log price are shown in Figure 5.3.2. After the log is taken, most of the prices lie between $e^{12}$ and $e^{13}$, which corresponds approximately to a range from 160,000 to 440,000 SEK. Taking the log of the price implicitly assumes that new car buyers’ price sensitivity is not the same at different price levels. This assumption is reasonable, as the concave shape of the logarithm implies that a buyer at a higher price level is less sensitive to a given additional amount of money than a buyer at a lower price level. For instance, someone who can afford a Chevrolet Matiz should be more sensitive than someone who can afford a Mercedes-Benz Maybach to paying an additional 100,000 SEK for the same vehicle.

1 In fact, the most expensive car in the supply data is the Koenigsegg CC. However, many attribute values are missing for this car, so its records are excluded. Moreover, no new Koenigsegg CC was registered in 2007.

[Histogram and density plot “Histogram of price in 2007”: x-axis price in 100,000 to 1,000,000, y-axis density]
Figure 5.3.1: Distribution of price

[Histogram and density plot “Histogram of log price in 2007”: x-axis log price in 2007 (11 to 15), y-axis density]
Figure 5.3.2: Distribution of log price

Besides price, the distributions of some other main attributes are shown below as well. The term “power” refers to the maximum power of the vehicle in kilowatt. The term “displacement” stands for the engine displacement, i.e. the total volume swept by the pistons in the cylinders, measured in cubic centimetres. The term “weight” refers to the total weight of a car, including its own weight and its payload, measured in kilograms. Finally, “acceleration” is measured in seconds and is the time a car needs to accelerate from 0 to 100 km/h. From these distributions, the non-linear trend of sensitivity for the non-price attributes may not be as significant as for price. This suggests that it is enough to take the log of the price only and to use the original values of the other variables in our modelling part.
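
The effect of the log transform on price can be checked numerically. The sketch below is an illustration only: the helper name `log_price_gain` is hypothetical, and the two prices are the rounded figures quoted in the text for the cheapest version (about 80,000 SEK) and for the Mercedes-Benz Maybach (about 5,000,000 SEK).

```python
import math


def log_price_gain(price, extra):
    """Change in log(price) when `extra` SEK is added to `price`."""
    return math.log(price + extra) - math.log(price)


# Bounds of the interval in which most log prices fall.
low = math.exp(12)   # roughly 163,000 SEK
high = math.exp(13)  # roughly 442,000 SEK

# The same additional 100,000 SEK at two very different price levels.
matiz = log_price_gain(80_000, 100_000)        # cheapest version in 2007
maybach = log_price_gain(5_000_000, 100_000)   # most expensive retained version

print(round(low), round(high), round(matiz, 3), round(maybach, 3))
```

In log terms the additional 100,000 SEK weighs about forty times more for the cheap car than for the expensive one, which is the difference in sensitivity that the log specification is meant to capture.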

[Histogram “Histogram of power (kw) in 2007”: x-axis power (kw) in 2007, y-axis density]
Figure 5.3.3: Distribution of power in kw

[Histogram “Histogram of displacement in 2007”: x-axis displacement in 2007, y-axis density]
Figure 5.3.4: Distribution of displacement

[Histogram “Histogram of car weight in 2007”: x-axis car weight in 2007, y-axis density]
Figure 5.3.5: Distribution of weight

[Histogram “Histogram of acceleration in 2007”: x-axis acceleration in 2007, y-axis density]
Figure 5.3.6: Distribution of acceleration

Table 5.3.1 lists the correlations among these variables. There is a strong positive correlation between displacement and power. Acceleration is negatively correlated with displacement and power, because it is measured as the time in seconds needed to reach 100 km/h, so that a smaller value means a faster car. We also calculate the correlations of the total weight of the vehicle, since this attribute is a potential indicator for matching the supply and the demand in Chapter 6. The total weight is less strongly correlated with the other variables. However, common sense suggests that a higher class of car tends to be larger (and therefore heavier) and to have a better engine, so it is reasonable that its correlations with displacement and power are positive. A larger weight may lead to slower acceleration, but a better engine also implies faster acceleration. These opposing effects result in a weak correlation between weight and acceleration.

Table 5.3.1: Correlation of attributes

               displacement   power in kw   acceleration   total weight
displacement   1.0000         0.9316        -0.6747        0.6229
power in kw                   1.0000        -0.7876        0.5237
acceleration                                1.0000         -0.2583
total weight                                               1.0000
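
As a reminder of how such a correlation is obtained and why its sign is negative for acceleration, the following sketch computes the Pearson correlation by hand. The five (power, 0 to 100 km/h time) pairs are invented for illustration and are not taken from the supply data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy values: more powerful cars need fewer seconds to reach 100 km/h.
power_kw = [55, 77, 103, 147, 225]
seconds_0_100 = [15.2, 12.4, 10.1, 8.0, 6.1]

r = pearson(power_kw, seconds_0_100)
print(round(r, 3))
```

Because a shorter time means a faster car, the coefficient is negative even though power and performance move together.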

5.4 Technology attributes

Besides the numerical attributes, which describe the main characteristics of a version, technology variables such as ABS and ESP are listed explicitly in Table 5.4.2 and Table 5.4.3. These variables are represented as dummies and form two groups. The first group indicates whether a technology is pre-installed as part of the standard specification, shown in the columns headed “Yes” and “No”. The second group indicates whether a technology is not installed by default but is offered as an option that the customer may add. The share of such options is listed in the column named “Append.”. The prices of these optional extras are available too. In these two tables, the three shares “Yes”, “No” and “Append.” do not sum to one, because some data are missing.

According to Table 5.4.2 and Table 5.4.3, some of the technologies, namely ABS, driver’s air-bag, belt tension and servo system, are fitted in almost all vehicles (about 99% or more) as standard. Since these technologies are available in the vast majority of vehicles, their effect on consumers’ decisions may not be significant. In other words, if a customer purchases a car without, say, ABS, whereas more than 99.5% of the other cars in the market have this technology, she is probably responding to other attributes, e.g. brand or shape, rather than reasoning “I want this car because it does not have ABS installed”.

Table 5.4.2: Shares of dummies in supply data 2007 and 2008

                                     2007                          2008
Name                        Yes     No      Append.α      Yes     No      Append.
ABSβ                        0.996   0.002   0.001γ        0.997   0.002   0.001
Manual air-conditioner      0.345   0.542   0.088         0.311   0.583   0.084
Automatic air-conditioner   0.498   0.145   0.258         0.540   0.116   0.252
Driver’s air-bag            0.991   0.003   0             0.992   0.003   0
Passenger’s air-bag         0.959   0.006   0.019         0.964   0.005   0.016
Side air-bag                0.867   0.048   0.059         0.888   0.040   0.043
Side protect                0.989   0.006   -             0.986   0.006   -
Anti-slide                  0.823   0.086   0.053         0.899   0.051   0.025
Auto gear transmission      0.100   0.751   0.138         0.079   0.790   0.104
Belt tension                0.995   0.002   0             0.996   0.001   0
Central lock                0.962   0.012   0.009         0.961   0.010   0.004
Cruise Control              0.403   0.205   0.304         0.451   0.135   0.300
Neck protect                0.919   0.068   0.004         0.795   0.064   0.002
Theft alarm                 0.604   0.027   0.205         0.540   0.024   0.215
Automatic level control     0.105   0.789   0.062         0.092   0.759   0.072
Servo                       0.986   0.004   0.003         0.993   0.002   0
ESPδ                        0.748   0.159   0.021         0.976   0.122   0.013
Pollen filter               0.710   0.150   0.033         0.642   0.148   0.039
Radio                       0.605   0.010   0.312         0.682   0.007   0.238
Roof racks                  0.231   0.588   0.110         0.234   0.554   0.108

α Appendage. The extra price is available for every dummy if appended.
β Anti-lock braking system.
γ The sum of the shares is usually not equal to 1 due to missing data.
δ Electronic stability control.

Table 5.4.3: Shares of dummies in supply data 2009 and 2010

                                     2009                          2010
Name                        Yes     No      Append.       Yes     No      Append.
ABS                         0.998   0.001   0             0.998   0.001   0.001
Manual air-conditioner      0.328   0.570   0.073         0.341   0.558   0.076
Automatic air-conditioner   0.533   0.108   0.278         0.521   0.099   0.288
Driver’s air-bag            0.994   0.003   0             0.993   0.003   0
Passenger’s air-bag         0.963   0.005   0.022         0.972   0.005   0.016
Side air-bag                0.909   0.315   0.034         0.910   0.020   0.042
Side protect                0.982   0.008   -             0.986   0.005   -
Anti-slide                  0.947   0.022   0.013         0.948   0.016   0.013
Auto gear transmission      0.067   0.846   0.067         0.047   0.906   0.033
Belt tension                0.998   0.001   0             0.998   0.001   0
Central lock                0.966   0.012   0.007         0.969   0.010   0.006
Cruise Control              0.462   0.129   0.300         0.437   0.102   0.358
Neck protect                0.929   0.054   0.001         0.921   0.054   0.004
Theft alarm                 0.472   0.020   0.298         0.425   0.023   0.335
Automatic level control     0.080   0.737   0.095         0.072   0.730   0.107
Servo                       0.996   0.002   0             0.996   0.002   0
ESP                         0.824   0.101   0.015         0.837   0.089   0.024
Pollen filter               0.655   0.113   0.037         0.673   0.095   0.024
Radio                       0.747   0.013   0.184         0.775   0.014   0.167
Roof racks                  0.241   0.544   0.096         0.224   0.545   0.107

Part II

Disaggregated analysis


Chapter 6

Data matching

This chapter discusses how the matching is carried out. For the discrete choice modelling, we first need to define our alternatives. We therefore analyse the alternatives that can be obtained at various matching levels. As discussed in the previous chapters, the car registration data do not contain all the information we need, e.g. price and fuel consumption, although these attributes are essential for the modelling. Therefore, we need to add the information that the demand data lack by matching them with the supply data.

6.1 Description

The data can be matched at different levels of aggregation, so before matching we need to decide on the best level. There are three levels of aggregation: make, model and version. The car make refers to the brand, like Volvo, Volkswagen or Toyota, and the car model means the series of the car, like Volvo V70, Volkswagen Golf or Toyota Auris. Finally, the car version is the most detailed level, like “Volvo V70 II D5 DPF Kinetic” or “Volkswagen Golf 1.6 TDI DPF BlueMotion”. The registration data do not contain as much detail as the version level, even though they sometimes provide more than the model name. For example, some of the registrants record Volvo V70 as “Volvo B + V70” or Volkswagen Golf as “VW 1KP Golf Plus”. Therefore, it is unrealistic to match the data at the level of versions.

Moreover, because each car company has its own nomenclature for its products, the model names of different makes carry different amounts of information. For example, at Volvo, all V-series models are station wagons with two-box styling and all S-series models are sedans with three-box styling. However, a Volkswagen Passat may be a sedan, a station wagon or even a coupé. Audi, in turn, divides its A4 and A6 series into two different models each, A4/A6 and A4/A6 Avant, where “Avant” refers to a station wagon. Given the different annual sales, we may suppose that consumers have different preferences for different body styles even when the cars have a similar look and configuration. These situations can therefore also be difficult to handle.

Table 6.1.1: Shares of new cars matched in different aggregation level

Year               2007             2008             2009             2010
New Car Registry   235565 (1.000)   188003 (1.000)   144025 (1.000)   235568 (1.000)
Make Matching      234460 (0.995)   187125 (0.995)   143703 (0.998)   235174 (0.998)
Model Matching     229987 (0.976)   175088 (0.931)   140674 (0.978)   229094 (0.973)

The row named “Make Matching” in Table 6.1.1 shows the numbers and shares of new cars that can be matched at the level of make, whereas the row “Model Matching” shows the numbers and shares at the level of model. As Table 6.1.1 shows, not all of the cars in the demand can be matched with the supply. The main reason is that not all car makes and models are available in the Swedish domestic new car market, yet some buyers still find their own ways to purchase them. For instance, the Romanian car make Dacia was not available in the Swedish car market until 2009, but some Dacia cars were still registered in 2007 and 2008. A similar situation occurs at the model level: the Volvo V60 is not included in the supply data of 2010, but 1530 such vehicles were registered in that year.

An extreme case among the car makes is Fiat, for which only a minority of the registrations can be matched with the supply, as shown in Table 6.1.2. The reason is that many recreational vehicle brands, like Bürstner, Dethleffs or Hymer, build their products on Fiat vehicles. Other car makes like Renault and Mercedes have similar issues, but the share of recreational vehicles in their registration data is quite small. In the matching, we need to exclude these records, since the attributes of this type of vehicle are not available.

Table 6.1.2: Shares of Fiat can be matched

Year             2007    2008    2009    2010
All Sales        1287    1774    1113    2328
Can be Matched   467     476     221     619
Ratio            0.363   0.268   0.199   0.266

Besides Fiat, some novel but uncommon car makes or models are not included in the matching due to the lack of attribute information, like new-concept cars and formula-style cars (e.g. Ariel Atom 2). For instance, there are a few registrations of an electric car model, the Think City, a forward-looking concept with zero emissions. Omitting this small amount of data will not affect the market shares, but it can limit the policy and environmental analysis.

6.2 Methodology of matching

In this project, we first use the licence plate number to join the car attributes from the ownership data to the new car registration data. Since the licence plate number is stored as VARCHAR and no records are duplicated, it is a good choice to add a unique index on this attribute to make the matching more efficient. In this step, all of the records (rows) should be matched, since the new car registration data are a subset of the ownership data. After adding the new attributes to the new car registrations, we use a similar method to match the demand with the supply by different criteria. All matching settings and results are described in the following sections.
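
The join on the licence plate can be sketched with an in-memory SQLite database. This is a minimal illustration, not the database setup used in the project: the table names, column names and the three records are hypothetical, and SQLite merely stands in for whichever database system holds the real data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical, heavily simplified versions of the two data sets.
    CREATE TABLE ownership (plate VARCHAR(10), power_kw INTEGER, weight_kg INTEGER);
    CREATE TABLE new_registrations (plate VARCHAR(10), model TEXT);
    -- Plates are never duplicated, so a unique index is safe and speeds up the join.
    CREATE UNIQUE INDEX idx_ownership_plate ON ownership (plate);
""")
con.executemany(
    "INSERT INTO ownership VALUES (?, ?, ?)",
    [("ABC123", 120, 2180), ("DEF456", 77, 1850), ("GHI789", 103, 1980)],
)
con.executemany(
    "INSERT INTO new_registrations VALUES (?, ?)",
    [("ABC123", "VOLVO V70"), ("DEF456", "VW GOLF")],
)

# Attach the ownership attributes to every new registration.
rows = con.execute("""
    SELECT r.plate, r.model, o.power_kw, o.weight_kg
    FROM new_registrations AS r
    JOIN ownership AS o ON o.plate = r.plate
    ORDER BY r.plate
""").fetchall()

# New registrations are a subset of the ownership data, so none should be unmatched.
unmatched = con.execute("""
    SELECT COUNT(*)
    FROM new_registrations AS r
    LEFT JOIN ownership AS o ON o.plate = r.plate
    WHERE o.plate IS NULL
""").fetchone()[0]
print(rows, unmatched)
```

Counting the unmatched rows with a left join is a cheap way to confirm the subset assumption before the joined attributes are used any further.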

6.2.1 Standardisation of model name

Due to the large amount of data shown in Table 4.2.1 and Table 6.1.1, the data cannot be matched manually. Additionally, as described in Section 6.1, the car models are recorded differently in the registration data and the attributes data, so the models cannot be matched directly between the two data sources. Standardising the model names allows us to match the data automatically. Specifically, we need to map the model name in the registration data to the model name in the supply. For example, the car model “A4 Avant” in the supply may be registered as “Audi A4 AV” or “Audi A4 Avant” in the demand data. Moreover, accented letters in the supply data, as in “Mégane” and “Doblò”, are all converted into the unaccented characters used in the demand data, i.e. “Megane” and “Doblo”.

For the standardisation, we use the model names from 2007 to 2010 given in the supply data as our standard names, which are listed in Table A.1 in Appendix A, since the supply data show the official model names for each car make. In Table A.1, some linguistic changes are made: the Swedish word “serien” in BMW models is translated into the English “series”, and the abbreviation “Gr.” stands for “Grand”.

The general idea of the standardisation is straightforward. We search the registry data for the keywords of the car model names, like “Passat” or “V70”. If a record contains the keyword of a model, we tag the record with the standardised model name. Since car model names are rarely duplicated, most of the models can be assigned a unique keyword, e.g. all Ford Focus registrations contain the keyword “Focus”. However, a limited number of issues have to be noted. For example, for Mercedes the keyword “SL” also occurs in “SLK”, “SLR” and “SLS”, and the keyword “Cherokee” also occurs in “Grand Cherokee”. In such cases, we have to add further restrictions to eliminate the ambiguity.

After the standardisation, Table 6.2.1 compares the number of models in the supply with the number of standardised models found in the registrations. The two numbers are not equal. One reason may be that some models are available in the Swedish new car market but none of these cars are sold, especially in the case of luxury models. For instance, no Lamborghini Murciélago or Superleggera and no Lotus car is registered in 2009. Inaccurate model names in the registration data may also contribute. For example, besides the Kia Cee’d there is another Kia model named Pro Cee’d, but in the registrations both are recorded as “Kia Cee’d”.

Table 6.2.1: Comparison of the number of models

                   2007    2008    2009    2010
Models in supply   298     310     315     307
Models in demand   295     287     270     274
Ratio              0.990   0.926   0.857   0.893
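
The keyword-based standardisation of Section 6.2.1 could look roughly like the sketch below. It is not the procedure actually used in this project: the keyword table, the function names and the two disambiguation rules (whole-word matching, and trying longer keywords first) are assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical keyword table: (make as registered, keyword, standard model name).
KEYWORDS = [
    ("AUDI", "A4 AVANT", "A4 Avant"),
    ("AUDI", "A4 AV", "A4 Avant"),
    ("AUDI", "A4", "A4"),
    ("JEEP", "GRAND CHEROKEE", "Grand Cherokee"),
    ("JEEP", "CHEROKEE", "Cherokee"),
    ("MERCEDES", "SL", "SL"),
    ("MERCEDES", "SLK", "SLK"),
    ("RENAULT", "MEGANE", "Megane"),
    ("VOLVO", "V70", "V70"),
]


def strip_accents(text):
    """Replace accented letters by their unaccented base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def standardise(raw, keywords=KEYWORDS):
    """Return (make, standard model name) for a registry string, or None."""
    text = strip_accents(raw).upper()
    # Longer keywords first, so "GRAND CHEROKEE" is tried before "CHEROKEE".
    for make, keyword, standard in sorted(keywords, key=lambda k: -len(k[1])):
        # Whole-word match, so the keyword "SL" does not fire on "SLK".
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if make in text and re.search(pattern, text):
            return make.title(), standard
    return None


for raw in ["Volvo B + V70", "Audi A4 AV", "Jeep Grand Cherokee 3.0", "Mercedes-Benz SLK 200"]:
    print(raw, "->", standardise(raw))
```

Records for which the function returns `None` are the ones that would need manual inspection or an additional keyword.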

6.3 Results and drawbacks of model level

After the standardisation, the demand data provided by the Swedish Car Registry and the supply data provided by a company can be matched at the model level. However, since most car models have more than one version, matching at the model level may not be very accurate. Roughly 90% of our models have more than one version¹. As an extreme illustration, Table 6.3.1 shows the 5 models with the highest number of versions in each year. The model with the greatest number of versions is the BMW 3-Series of vintage 2010, which has 126 different versions.

1 90.6% in 2007, 89.7% in 2008, 91.4% in 2009, 87.3% in 2010

Table 6.3.1: Top 5 models by number of versions

2007                  2008                   2009                 2010
Ford Focus      88    Saab 9-3         59    Saab 9-3       83    BMW 3-Series   126
BMW 3-Series    75    Renault Mégane   52    BMW 3-Series   71    Saab 9-3       81
BMW 5-Series    67    BMW 3-Series     47    Mercedes E     64    Volvo V70      78
Saab 9-3        47    Honda Accord     45    Volvo V70      59    Audi A3        74
Škoda Octavia   42    Mazda 6*         44    Audi A3        55    BMW 5-Series   65
                                             Volvo S80      55    Volvo S80      65

* Audi A3, Audi A6 Avant, Cadillac BLS and Ford Focus have the same number of versions.

From 2007 to 2010, most of the models have more than one version, and some have more than 50 versions. The attributes can differ considerably between versions, especially the price, which is supposed to be the most important attribute of a car: almost any change to a version shifts its price. To find out how accurate the model level matching is, it is necessary to examine the price differences within each model. We use the coefficient of variation (CV) as an indicator, defined as the standard deviation σ divided by the absolute value of the mean µ:

\[ CV = \frac{\sigma}{|\mu|}. \]
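
A minimal sketch of the CV computation per model follows. Two assumptions are made here that the text does not state: the population standard deviation is used (so that a model with a single version gets a CV of exactly zero), and the CV is computed over the version prices of each model. The prices and model names are invented.

```python
import statistics
from collections import defaultdict


def coefficient_of_variation(prices):
    """CV = sigma / |mu|, using the population standard deviation (an assumption)."""
    mean = statistics.fmean(prices)
    return statistics.pstdev(prices) / abs(mean)


def cv_by_model(versions):
    """Group (model, price) pairs by model and compute the CV of price per model."""
    grouped = defaultdict(list)
    for model, price in versions:
        grouped[model].append(price)
    return {model: coefficient_of_variation(p) for model, p in grouped.items()}


# Invented prices in SEK: one model with three versions, one with a single version.
toy_versions = [
    ("Model A", 200_000), ("Model A", 250_000), ("Model A", 300_000),
    ("Model B", 199_000),
]
cvs = cv_by_model(toy_versions)
print({model: round(cv, 4) for model, cv in cvs.items()})
```

With these toy prices the three-version model has a CV of about 0.16, inside the 0.1 to 0.2 band where most of the observed values lie, while the single-version model has a CV of zero.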

[Density plot “CV of Price, model level in 2007”: x-axis CV from 0.0 to 0.4, y-axis density; N = 228210, bandwidth = 0.003053]
Figure 6.3.1: CV of price, model level 2007

[Density plot “CV of Price, model level in 2008”: x-axis CV from 0.0 to 0.4, y-axis density; N = 173912, bandwidth = 0.00403]
Figure 6.3.2: CV of price, model level 2008

Figure 6.3.1 to Figure 6.3.4 show the density plots of the CV of price from 2007 to 2010. According to these figures, most of the values lie in the range between 0.1 and 0.2. Only a small number of CVs are equal to zero, which means that the corresponding models registered in the demand data have only one version. Moreover, the car models listed in Table 6.3.1 all have high sales in Sweden in each year (see Table 4.2.3), so the price variation within these models may have implications for the model estimation.

[Density plot “CV of Price, model level in 2009”: x-axis CV from 0.0 to 0.4, y-axis density; N = 139184, bandwidth = 0.003053]
Figure 6.3.3: CV of price, model level 2009

[Density plot “CV of Price, model level in 2010”: x-axis CV from 0.0 to 0.4, y-axis density; N = 217621, bandwidth = 0.003737]
Figure 6.3.4: CV of price, model level 2010

6.4 Matching in a more detailed level

6.4.1 A level between model and version

In Section 6.1, we explain that it is not possible to match the registration data at the level of individual versions. In Section 6.3, we conclude that a car choice model built at the model level may not have accurate attributes, e.g. price. To achieve more accuracy, however, we can match the data at a level between model and version. This section discusses the results for each criterion we consider.

Künnapuu (2009) proposed a way to match the demand and supply data by aggregating the alternatives into models and then disaggregating each model by fuel type, for example Volvo S40 Petrol and Volvo S40 Diesel. This is the most detailed level that one can achieve with the car registration data only since, except for the fuel type, they contain no other auxiliary information that could be used.

In our case, however, the stock registration data provide more car attributes, so we can match the data in a different way. At first thought, the displacement (“cylindervolym” in Swedish) seems a useful attribute for distinguishing versions, because displacement is commonly used to tell the versions of a car model apart, as in “Volvo V70 2.5T”. Likewise, for the BMW 3-Series with its 126 different versions listed in Table 6.3.1, BMW cars are distinguished as “BMW 320”, “BMW 325”, etc., where “20” or “25” indicates the approximate displacement of the vehicle (2.0 L or 2.5 L). Displacement could therefore divide car models at a more detailed level. These values are available in the ownership data from 2008 to 2010. However, according to Table 6.4.1, a large share of the displacement data is missing. Therefore, it is not realistic to match the demand and the supply by this attribute.

Table 6.4.1: Data missing in displacement

Year      2008        2009        2010
Missing   3,484,895   2,007,144   3,184,125
Total     5,315,382   5,359,554   5,473,004
Share     65.56%      37.45%      58.18%

6.4.2 Matching with power or weight

Fortunately, our data contain some additional registry information on car weight, power, clutch and size. Although we cannot match the data by displacement, the correlations in Table 5.3.1 show that power is very highly correlated with displacement. It therefore seems reasonable to match the supply and demand based on power. As for weight, we find that within each model most versions have different weights, since differences in the number of doors, the engine technology (whether a turbo is installed, or which type of fuel is used), etc. shift the total weight of a car. Additionally, these two attributes are available in the demand data for every year.

To approach a more detailed level, we can then add more restrictions, such as clutch technology and type of fuel. These attributes are not recorded in the same way in the supply and the demand, so we need to map (“alias”) their categories onto a common classification to raise the rate of successful matching. In particular, the clutch is defined as a binary variable: manual or automatic transmission. More advanced transmission technologies, such as CVT[2] and DCT[3], are also counted as automatic transmission, since for the driver they are operated in the same way as an automatic transmission.

[2] Continuously variable transmission.
[3] Dual-clutch transmission. Technically, DCT, or Direct-Shift Gearbox as German manufacturers have named it, should be counted as a semi-automatic transmission: briefly, one gearbox contains two manual-transmission clutches working together in a fully automatic mode.

Meanwhile, for the type of fuel, although Table 3.1.1 in Section 3.1 lists 17 different types of fuel, most car manufacturers in the market still mainly use gasoline and diesel technology. Thus, these values are redefined as a three-category variable: gasoline, diesel and others. If a car mainly uses gasoline or diesel, we simply designate its fuel type as gasoline or diesel; otherwise, we call its fuel type “others”. In the demand data, the fuel information is separated into two parts: the first gives the main fuel type, and the second shows whether another type of fuel is used at the same time. This means that, for example, a car using gasoline with electricity (e.g. Toyota Prius) and a car using gasoline with ethanol (e.g. some versions of Volvo V70) both count as gasoline vehicles in the demand, but these attributes are not recorded in the same way in the supply. Because of this mismatch, the volume of hybrid fuel vehicles in our RP data may be underestimated after matching, and this volume is quite important for environmental analysis with our model. Therefore, fuel type is a possible additional restriction, but it may not improve the accuracy much.

Adding more matching conditions gives more precise matches but a smaller sample, because of omitted data or discrepancies between the data sources. This is the trade-off we have to consider when deciding which conditions to add: better accuracy against a larger sample.
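The category “aliasing” described above can be sketched as two small mapping functions. This is only an illustration: the raw labels (`"cvt"`, `"dct"`, `"manual"`, and so on) and the function names are assumptions, since the actual codes used in the registry and in the supply data are not given in the text.

```python
# Hypothetical raw labels: the actual codes used in the registry and in the
# supply data are not given in the text.
AUTOMATIC_LABELS = {"automatic", "cvt", "dct"}
MANUAL_LABELS = {"manual"}


def clutch_class(raw_label):
    """Map a raw transmission label to the binary manual/automatic class."""
    label = raw_label.strip().lower()
    if label in AUTOMATIC_LABELS:
        return "automatic"  # CVT and DCT are counted as automatic
    if label in MANUAL_LABELS:
        return "manual"
    return None  # unknown label: leave unmatched rather than guess


def fuel_class(main_fuel):
    """Map the main fuel type to gasoline, diesel or others."""
    fuel = main_fuel.strip().lower()
    return fuel if fuel in {"gasoline", "diesel"} else "others"


# A gasoline-electric hybrid is recorded in the demand data by its main fuel,
# so it ends up in the gasoline class.
print(clutch_class("DCT"), fuel_class("Gasoline"), fuel_class("Ethanol"))
```

Because only the main fuel is classified, a hybrid whose main fuel is gasoline is indistinguishable from a pure gasoline car, which is the source of the underestimation discussed above.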

Model-engine level  Figure 6.4.1 to Figure 6.4.4 show the density of the coefficient of variation (CV) of price when we match the data by power within each model. As the same power within a model refers to the same engine technology, this level is named the model-engine level. The density plots show that, when we match by power, the CVs of price mostly lie between 0 and 0.1. This is much better than the density at the model level, since the main mass shifts from $CV_{price} \in [0.1, 0.2]$ at the model level to $CV_{price} \in [0, 0.1]$ at the model-engine level. Such a density is reasonable, as the number of versions belonging to one model with the same maximum power can often be more than two. These versions have the same engine but may differ in other attributes, such as the clutch or other accessories; still, their prices cannot be very different.

Figure 6.4.1: CV of price, match with power in 2007 (density plot; N = 213222, bandwidth = 0.002976)
Figure 6.4.2: CV of price, match with power in 2008 (density plot; N = 162295, bandwidth = 0.00275)
Figure 6.4.3: CV of price, match with power in 2009 (density plot; N = 121263, bandwidth = 0.00255)
Figure 6.4.4: CV of price, match with power in 2010 (density plot; N = 205904, bandwidth = 0.001958)

Given this heterogeneity of clutches, we can add a further restriction on clutch type as defined above, i.e. manual or automatic transmission. Figure 6.4.5 to Figure 6.4.8 show the density of the CV after adding the clutch condition. Comparing the two groups of figures, adding one more restriction does shift the overall shape to the left, meaning that the CVs are closer to zero. But the figures show that this is not an important improvement. Moreover, there is quite a large amount of mismatching in 2007: the number of successful matches decreases from 213222 in Figure 6.4.1 to 147758 in Figure 6.4.5 when the clutch is added, so 30.7% of the data are mismatched. This hints that there may be some errors, such as a large amount of omitted data, in the clutch information of the 2007 registration.

Figure 6.4.5: CV of price, match with power and clutch in 2007 (density plot; N = 147758, bandwidth = 0.004221)
Figure 6.4.6: CV of price, match with power and clutch in 2008 (density plot; N = 152491, bandwidth = 0.004285)
Figure 6.4.7: CV of price, match with power and clutch in 2009 (density plot; N = 115979, bandwidth = 0.003141)
Figure 6.4.8: CV of price, match with power and clutch in 2010 (density plot; N = 197948, bandwidth = 0.002305)
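The CV of price behind these density plots can be sketched as follows: group the supply versions by the matching key and divide the standard deviation of price by its mean within each group. The records below are invented for illustration, and the use of the population standard deviation (so that a group with a single version gets a CV of zero) is an assumption, not something the text specifies.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical supply records: (model, maximum power in kW, price in kr).
supply = [
    ("Volvo V70", 170, 300_000),
    ("Volvo V70", 170, 320_000),
    ("Volvo V70", 120, 250_000),
    ("Volvo V70", 120, 260_000),
]


def price_cv(records, key):
    """Coefficient of variation of price within each group defined by key."""
    prices = defaultdict(list)
    for record in records:
        prices[key(record)].append(record[2])
    # Population standard deviation is an assumption made for this sketch.
    return {group: pstdev(p) / mean(p) for group, p in prices.items()}


cv_model = price_cv(supply, key=lambda r: r[0])                  # model level
cv_model_engine = price_cv(supply, key=lambda r: (r[0], r[1]))   # model-engine level

print(cv_model)
print(cv_model_engine)
```

In this toy data the model-level CV is about 0.10, while both model-engine groups fall below 0.04, which mirrors the leftward shift of the densities described in the text.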

From the results above, we may say that matching at the model-engine level is a good alternative. First, the volume of mismatched data is not very large. As shown in Table 6.4.2[4], except in 2009 the mismatched data are roughly 5% to 7%. Even in 2009, 87.1% of the data can still be matched. Second, compared with the model level, the new level reduces the CV of price significantly. Finally, adding the clutch restriction does not improve the result very much, and it causes a large amount of omitted data in 2007.

[4] Note that the numbers shown under “Model level” in this table are not the same as the numbers shown in Table 6.1.1, although the differences are not very large. The numbers here are generated after the automatic standardisation of the model names. Some errors in the registered car names, such as spelling errors (e.g. in the spelling of Mercedes or Volkswagen), cannot be recognised automatically. These errors can be recognised manually, and they are taken into account in Table 6.1.1.

Table 6.4.2: Matching and mismatching by power

                     2007      2008      2009      2010
Model level          228,210   173,912   139,184   217,621
Model-engine level   213,222   162,295   121,263   205,904
Mismatching shares   6.57%     6.68%     12.88%    5.38%

Model-weight level  When matching by weight, meaning the total weight of a car, many specific car versions can be matched successfully, since cars with the same weight within a model should have the same engine technology and share more of their other attributes. This level is quite close to the most detailed level, the version level. The problem, however, is that the volume of mismatching at this level is larger than at the model-engine level: weight ranges roughly from 1000 kg to 4000 kg, whereas power ranges roughly from 30 kW to 480 kW, so the possibility of mismatching by weight is larger. For example, a car with a total weight of 1000 kg in the supply may be registered with a total weight of 1010 kg in the registration data.

Since matching by clutch would leave a lot of data mismatched, the matching with weight and clutch is not plotted; only the results of matching by weight are examined. Figure 6.4.9 to Figure 6.4.12 show the density plots of the CV of price at the model-weight level. Compared with the model-engine level, the overall price CV is improved by matching with weight, since more versions can be distinguished at this level. However, Table 6.4.3 shows the mismatching volume at this level: far more data are omitted than at the model-engine level, with shares from about 20% to about 30%. With 20% to 30% of the data missing, the market shares are unlikely to remain consistent. So, although matching by weight improves the accuracy of the attributes, it may lead to a larger error in the estimation of the model because of the inconsistency of the market shares.

Figure 6.4.9: CV of price, match with weight in 2007 (density plot; N = 158959, bandwidth = 0.004577)
Figure 6.4.10: CV of price, match with weight in 2008 (density plot; N = 124437, bandwidth = 0.004066)
Figure 6.4.11: CV of price, match with weight in 2009 (density plot; N = 110975, bandwidth = 0.002959)
Figure 6.4.12: CV of price, match with weight in 2010 (density plot; N = 171741, bandwidth = 0.003095)

Table 6.4.3: Matching and mismatching by weight

                     2007      2008      2009      2010
Model level          228,210   162,295   139,184   217,621
Model-weight level   158,959   124,437   110,975   171,741
Mismatching shares   30.35%    23.33%    20.27%    21.08%
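The mismatching shares in Tables 6.4.2 and 6.4.3 come from an exact match on a key, either (model, power) or (model, weight). A minimal sketch of that comparison is given below; the records and the function name `mismatch_share` are hypothetical and only illustrate why a small deviation in the registered weight breaks an exact match.

```python
def mismatch_share(demand_keys, supply_keys):
    """Share of demand records whose key has no exact counterpart in the supply."""
    unmatched = sum(1 for key in demand_keys if key not in supply_keys)
    return unmatched / len(demand_keys)


# Hypothetical records: (model, maximum power in kW, total weight in kg).
supply = [("Volvo V70", 170, 1600), ("Volvo V70", 120, 1550)]
demand = [
    ("Volvo V70", 170, 1600),
    ("Volvo V70", 170, 1610),  # weight registered 10 kg off
    ("Volvo V70", 120, 1550),
    ("Volvo V70", 120, 1545),  # weight registered 5 kg off
    ("Volvo V70", 125, 1550),  # power not present in the supply
]

by_power = mismatch_share(
    [(m, p) for m, p, _ in demand], {(m, p) for m, p, _ in supply}
)
by_weight = mismatch_share(
    [(m, w) for m, _, w in demand], {(m, w) for m, _, w in supply}
)
print(by_power, by_weight)

# The same arithmetic reproduces a share from Table 6.4.2 (2007, by power).
share_2007_power = round(1 - 213_222 / 228_210, 4)
print(share_2007_power)
```

In the toy data, one of five records fails to match by power and two of five fail by weight. A tolerance on weight would recover some of these records, but that is a design alternative, not what the text applies.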

Table 6.4.4 compares the summary statistics of the price CV at the three levels. According to this table, the more detailed levels, the model-engine level and the model-weight level, indeed improve the matching accuracy: they generate smaller CVs of price than the model level. Comparing the two detailed levels, the model-weight level recognises more versions, as the number of CVs equal to zero is greater at this level. For instance, in 2007 and 2008 the first quartile of the CVs is zero, so at least a quarter of the CVs equal zero. However, considering Table 6.4.2 and Table 6.4.3 together, the cost of this improvement is a large amount of mismatched data. The summary also shows that the results at the model-engine level are more homogeneous across the years. Since the time series is continuous, even though we separate it by year, the CV of price should not change significantly from year to year.

Table 6.4.4: Summary of CV of price from 2007 to 2010

Year   Level                Min.    1st Qu.   Median   Mean    3rd Qu.   Max.
2007   Model level          0.000   0.111     0.135    0.140   0.164     0.724
       Model-engine level   0.000   0.031     0.051    0.061   0.082     0.347
       Model-weight level   0.000   0.000     0.033    0.052   0.075     0.461
2008   Model level          0.001   0.102     0.134    0.139   0.169     0.491
       Model-engine level   0.000   0.024     0.047    0.056   0.069     0.234
       Model-weight level   0.000   0.000     0.035    0.045   0.067     0.459
2009   Model level          0.000   0.104     0.122    0.128   0.152     0.431
       Model-engine level   0.000   0.030     0.051    0.054   0.069     0.256
       Model-weight level   0.000   0.016     0.041    0.042   0.064     0.333
2010   Model level          0.000   0.104     0.122    0.128   0.152     0.431
       Model-engine level   0.000   0.035     0.047    0.055   0.069     0.253
       Model-weight level   0.000   0.009     0.044    0.047   0.060     0.317

Weighing the volume of mismatched data against the CV of price in each year, we may say that, to approach a more detailed level, the model-engine level, in which the model and the power are used for matching in each vintage year, is the best alternative.

6.4.3 Results of matching by power

Based on our strategy of matching by power and the definition of new cars in Section 4.2, we need to test how many data are left after matching. To get a more accurate result, we first exclude the car registries that are imported from other countries. These cars are likely to have been bought at prices different from the domestic ones, and they may not be brand-new cars: our tactic for filtering new cars from the ownership data is to examine the “date of first registration”, but these cars may already have been registered and used in other countries. Table 6.4.5 shows the number of registries that are successfully matched with the supply data and the share of omittance for each vintage.

Table 6.4.5: Shares of data after matching (excluding imported cars)

Vintage   Year model      New car registries   After matching   Share of omittance
2005      2-year before   2574                 1727             0.329
          1-year before   61749                48651            0.212
          vintage year    207735               184811           0.110
          1-year after    1555                 1022             0.342
          Sum             273613               236211           0.137
2006      2-year before   2325                 1589             0.317
          1-year before   63476                50430            0.206
          vintage year    215467               197434           0.084
          1-year after    1430                 808              0.435
          Sum             282698               250261           0.115
2007      2-year before   2430                 1639             0.326
          1-year before   62333                53552            0.141
          vintage year    241336               215364           0.108
          1-year after    1140                 453              0.603
          Sum             307239               271008           0.118
2008      2-year before   2096                 1487             0.291
          1-year before   59638                48879            0.180
          vintage year    189506               162041           0.145
          1-year after    1629                 707              0.566
          Sum             254877               213114           0.164
2009      2-year before   5752                 3893             0.323
          1-year before   57830                45045            0.221
          vintage year    145345               119710           0.176
          1-year after    1063                 640              0.398
          Sum             211999               169288           0.201
2010      2-year before   2408                 1345             0.441
          1-year before   48894                35731            0.269
          vintage year    235096               197016           0.162
          1-year after    921                  0                1
          Sum             289329               234092           0.191

To make sure that the car registrations are not significantly shifted by the matching, it is a good idea to test the market shares before and after matching. As the largest shares of missing data occur in 2009 and 2010, where about 20% of the registrations cannot be matched successfully, we test the market shares of these two years as an example to see whether they vary significantly. We have the market share of each model before and after matching. To find out whether the matching changes the market shares, we introduce a paired t-test to examine the overall effect of matching on the market shares. We denote by $x_i^k$ the market share of model $i$ in year $k$ before matching, and by $y_i^k$ the market share of model $i$ in year $k$ after matching. We calculate the difference between $x_i^k$ and $y_i^k$ as $d_i^k = x_i^k - y_i^k$. The t-value in year $k$ is

$$t^k = \frac{\bar{d}^k - \mu_0}{s_{d^k} / \sqrt{n}}, \tag{6.1}$$

where $\bar{d}^k = \frac{1}{n} \sum_{i=1}^{n} d_i^k$, $n$ is the number of models, and $s_{d^k}$ is the standard deviation of the differences $d_i^k$ in year $k$. Since we want to test whether the average difference is significantly different from zero, the constant $\mu_0 = 0$. Table 6.4.6 shows the results of the paired t-test of the market shares in 2009 and 2010 before and after matching. We also compute the correlation between the paired values, which is 0.921 in 2009 and 0.969 in 2010. We find that the market shares do not vary significantly after matching.

Table 6.4.6: Paired t-test of 2009 and 2010

Difference   T-Test   D. freedom   Significance   Mean    Std. Dev.
d2009        0.000    346          1.000          0.000   0.00289
d2010        0.000    342          1.000          0.000   0.00236
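The paired t-test of formula 6.1 can be sketched with the standard library alone. The market shares below are invented for three hypothetical models, and the function name `paired_t` is an assumption; the sample standard deviation (divisor n - 1) is used for $s_{d^k}$, which is the usual convention for this test but is not stated explicitly in the text.

```python
import math
from statistics import mean, stdev


def paired_t(before, after, mu0=0.0):
    """Paired t-value and degrees of freedom for two equally long share lists."""
    d = [x - y for x, y in zip(before, after)]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation of the differences
    t_value = (d_bar - mu0) / (s_d / math.sqrt(n))
    return t_value, n - 1


# Hypothetical market shares of three models before and after matching.
shares_before = [0.40, 0.35, 0.25]
shares_after = [0.42, 0.33, 0.25]

t_value, dof = paired_t(shares_before, shares_after)
print(t_value, dof)
```

In this toy example the shares on both sides sum to one, so the mean difference, and with it the t-value, is zero up to rounding error.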

6.4.4 Conclusion about matching with power and weight

For the discrete choice modelling, we need the alternatives and their attributes, which the supply data provide at the version level. In the registration data, however, we only know the model names. In this section, to get more precise attributes, we test matching methods at two different levels between model and version: matching by power and matching by weight. After a comprehensive comparison of the variation of the attributes (e.g. price) and the share of mismatched data, matching by power is finally chosen, and the resulting level is named the “model-engine” level. In addition, because a large amount of data is omitted when additional attributes (e.g. fuel types and clutch types) are put into the matching, we decide to match the data with model and power only.

Chapter 7

New car choice modelling

7.1 Introduction and methodology

Part I presents a statistical analysis of the Swedish new car market, covering which car models are chosen and what the car attributes are. We then need to know the essential characteristics of each chosen car as accurately as possible, which is done by matching the car attributes (supply data) to the car registration data (demand data). Chapter 6 thus compares different ways of matching and obtains a better method (matching by power) than the one used in previous master’s theses, which match by model only. As a result, the make-model-engine matching level is selected, based on a joint consideration of the data omitted after matching and the coefficient of variation of price as well as of some other attributes.

7.1.1 Analysis of new car choice sets

After matching, we have data sets of the alternatives and their attributes for each year from 2005 to 2010. Based on our matching level, we first define the alternatives as car models with their maximum power. For instance, the alternative Volvo V70 with 170 kW (231 horsepower) belongs to the Volvo V70 2.5T family. Within each family, not all cars use the same type of fuel. In general, the fuel types are classified into three groups: gasoline, diesel and hybrid fuel. Thus, our final alternatives are at the make-model-engine level with different fuel types, like Volvo V70 with 170 kW using gasoline or Volvo V70 with 170 kW using hybrid fuel. Owing to our matching level, the attributes that are unrelated to fuel, like price and weight, are the same for such alternatives, but other attributes, like the vehicle tax, differ.

Table 7.1.1 shows the numbers of alternatives and observations in the 6 years. Among these alternatives, we exclude the minority cars, the so-called luxury cars, since the demand for these cars is rare. We assume that luxury car buyers’ preferences and their sensitivity to attributes like price and horsepower can be very different from those of the general public, so the presence of these cars may complicate the model. The following car makes are defined as “luxury cars”, based on annual sales but still somewhat arbitrarily: Aston Martin, Bentley, Ferrari, Koenigsegg, Lamborghini, Lotus, Maserati, Mercedes-Maybach and Rolls-Royce. Nevertheless, several makes that some people may also recognise as luxury brands, e.g. Porsche, Lexus and Alfa Romeo, are included in the models because of their considerable annual sales.

Table 7.1.1: Sizes of choice sets from 2005 to 2010

                     2005      2006      2007      2008      2009      2010
Alternatives
  With luxury cars   907       982       1,020     942       955       946
  No luxury cars     901       978       1,010     933       948       934
Observations
  Private owned      145,359   144,392   153,112   104,181   84,344    117,281
  Company owned      35,957    37,704    44,249    37,214    31,878    37,905
  Company leasing    69,113    78,936    82,359    76,748    57,469    83,094

7.1.2 Estimation tool - BIOGEME

To estimate the model, we use BIOGEME, an open-source package distributed by Bierlaire (2003); the name abbreviates BIerlaire Optimization toolbox for GEv Model Estimation. Estimating a model requires two files: a model specification file (.mod) and a data file (.dat). In the model file, one needs to specify the choice, the β parameters, the utility functions and the type of model. There are also some custom settings to meet users’ specific requirements, e.g. the LaTeX code, additional expressions for the utility functions, and model structures. In our modelling, we employ version 1.8 of the package. This version provides a graphical user interface (GUI) for both Windows and Mac OS users, but it is better to run the model in a terminal (Mac OS or Linux) or on the command line (Windows). In fact, running the program by command is no more complex than using the GUI tool, and it reduces the risk of a software crash.

When the program starts to run the model, it first stores the data in the computer’s memory. However, because of the huge size of our data, especially when we estimate on a large sample with on the order of a thousand alternatives and tens of thousands of observations, the memory use easily exceeds the limit of 1 gigabyte. Therefore, some default settings in the parameter file (default.par) have to be changed following the instructions of Bierlaire (2008). To avoid the memory overflow problem, instead of storing the data in memory, we set the data to be stored on the hard disk as a binary file. Another practical issue is that, since the BIOGEME program is coded on a Linux system, when one chooses to store the data in a file, Windows (either XP or 7) does not correctly read the automatically created binary file. So, in this project, all models are run on Mac OS X, which is a Unix-based system.

7.2 Model estimation

Before estimating the model, we first need to specify our utility functions in the form of function 1.1.

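$$\begin{aligned}
V_{ik} = \alpha_{mk} &+ \beta^{cost}_{ik} x^{cost}_{ik} + \beta^{perf}_{ik} x^{perf}_{ik} + \beta^{class}_{ik} x^{class}_{ik} + \beta^{fuel}_{ik} x^{fuel}_{ik} \\
&+ \beta^{safe}_{ik} x^{safe}_{ik} + \beta^{tech}_{ik} x^{tech}_{ik}
\end{aligned} \tag{7.1}$$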

Function 7.1 shows the utility function of each alternative $i$ in each year $k$, where $\alpha_{mk}$ is the make-specific constant. We assume that every car alternative $i$ belonging to make $m$ has the same constant. This constant thus represents, in year $k$, the utility difference between car make $m$ and a reference car make, holding the other terms the same. In our estimation, we set the make constant of Volvo to zero, so we can use these constants to examine the value of each brand relative to Volvo.

Besides the constant, to capture the utility, and with reference to the factors that significantly affect new car choice in Sweden presented by Hugosson and Algers (2011), we aggregate the car attributes into 6 categories, which are listed below.

Vehicle price $x^{cost}_{ik}$  The vehicle cost parameters are the purchase price or the benefit tax, and the vehicle tax. Each parameter is described as follows.

1. Retail price: If a car is owned by a private person, or by a company but not for leasing, the buyer pays the price of the car. In our model, we assume that all buyers pay the suggested retail price for a new car, and we take its natural logarithm.

2. Benefit tax: If a car is owned by a company for leasing, the user pays only the benefit tax of the car instead of its retail price, since the use of such a car creates a benefit for the user. The car benefit is calculated according to formula 7.2, where the constant 13,600 is computed as a share of the base amount (BA), which equals 42,800 kr. In our case, the constant is 31.7% of BA. In our models, the logarithm is taken as well, to be consistent with the treatment of the retail price.

$$\begin{aligned}
\text{Benefit} = 13{,}600 &+ 2.17\% \times \text{purchase price} \\
&+ 9\% \times \text{purchase price up to 321,000} \\
&+ 20\% \times \text{purchase price over 321,000}
\end{aligned} \tag{7.2}$$

3. Vehicle tax: This tax is based on the CO2 emission and the fuel type, and is computed as in formula 7.3.

$$\text{Vehicle Tax} = \begin{cases}
360 + 15(\mathrm{CO_2} - 100) & \text{if gasoline} \\
3.15 \cdot [360 + 15(\mathrm{CO_2} - 100)] & \text{if diesel} \\
360 + 10(\mathrm{CO_2} - 100) & \text{if alternative fuel}
\end{cases} \tag{7.3}$$

In our model, we use the ratio of the vehicle tax to the retail price, i.e. the share of the tax in the car price.
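The three cost attributes above can be sketched as plain functions. Two points are assumptions of this sketch rather than statements from the text: the 9% and 20% terms of formula 7.2 are read as applying to the part of the price up to 321,000 kr and the part above it respectively, and formula 7.3 is applied as written, without any floor for CO2 values below 100. The function names and the example car are hypothetical.

```python
import math

PRICE_THRESHOLD = 321_000  # kr, the threshold in formula 7.2


def car_benefit(price):
    """Car benefit per formula 7.2 (band interpretation is an assumption)."""
    part_up_to = min(price, PRICE_THRESHOLD)
    part_over = max(price - PRICE_THRESHOLD, 0.0)
    return 13_600 + 0.0217 * price + 0.09 * part_up_to + 0.20 * part_over


def vehicle_tax(co2, fuel):
    """Vehicle tax per formula 7.3, applied exactly as written."""
    if fuel == "gasoline":
        return 360 + 15 * (co2 - 100)
    if fuel == "diesel":
        return 3.15 * (360 + 15 * (co2 - 100))
    if fuel == "alternative":
        return 360 + 10 * (co2 - 100)
    raise ValueError(f"unknown fuel class: {fuel}")


# Hypothetical car: 250,000 kr, 150 units of CO2 emission, gasoline.
price, co2 = 250_000, 150
log_price = math.log(price)                      # attribute for owned cars
log_benefit = math.log(car_benefit(price))       # attribute for leased cars
tax_share = vehicle_tax(co2, "gasoline") / price # ratio of tax to retail price
print(log_price, log_benefit, tax_share)
```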

Vehicle performance $x^{perf}_{ik}$  These attributes capture the performance and capability of each alternative: power capacity, acceleration and cruising ability (fuel consumption and tank volume).

1. Power capacity: This is defined as the maximum power of a vehicle divided by its total weight (kW/ton), indicating the power available per ton.

2. Acceleration: The acceleration is measured as the time elapsed, in seconds, while a car accelerates from 0 to 100 km/h.

3. Fuel consumption: This attribute is the fuel consumption of a vehicle in litres per 100 km under mixed traffic conditions (both city and rural areas).

4. Tank volume: This attribute is the size of the fuel tank in litres.

Vehicle size class $x^{class}_{ik}$  Here, we use the length of the vehicle to represent its size, together with dummies for some typical car types listed in Table 5.1.2 in Chapter 5.

Fuel types dummy $x^{fuel}_{ik}$  Following the descriptive analysis of the yearly shift in fuel type shares in Section 4.3 in Chapter 4, we use fuel type dummies to see how different types of fuel affect the utility of an alternative. In the model, the fuel types are aggregated into 3 categories: gasoline, diesel and alternative fuels. Each alternative belongs to one of these categories. So, in the models, diesel and alternative fuels enter as dummy variables, with gasoline as the reference category.

Safety dummy $x^{safe}_{ik}$  There are many safety tests that demonstrate the safety level of a car model, e.g. the Whiplash test and Euro-NCAP. In the models, the Whiplash test grade is introduced. There are 4 safety grades, ordered as good, accept, deficient and poor. We focus on the influence on the choice when the alternative is a “good” safety car.

Technical index dummy $x^{tech}_{ik}$  This attribute consists of dummies that show whether a vehicle has certain technologies. In the models, one typical car technology is introduced: ESP, namely electronic stability control. This technology is essential for vehicles driven over ice and snow in the Nordic winter.
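To make the role of the utility function concrete, the sketch below evaluates a truncated version of function 7.1 for two invented alternatives and converts the utilities into multinomial logit (MNL) choice probabilities. The two coefficients and the make constants are the 2010 private-owner values reported in Table 7.3.1; the prices, the alternative names and the restriction to two attributes are assumptions made purely for illustration.

```python
import math


def utility(make_constant, betas, attributes):
    """Systematic utility: make constant plus the sum of beta * attribute."""
    return make_constant + sum(betas[name] * attributes[name] for name in betas)


def mnl_probabilities(utilities):
    """MNL probabilities, computed stably by subtracting the maximum utility."""
    v_max = max(utilities.values())
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


# Two 2010 coefficients and two make constants taken from Table 7.3.1.
betas = {"log_price": -1.96, "diesel": 0.311}
make_constants = {"Volvo": 0.0, "Saab": -0.639}

# Hypothetical alternatives; the prices in kr are invented.
alternatives = {
    "Volvo diesel": ("Volvo", {"log_price": math.log(300_000), "diesel": 1}),
    "Saab gasoline": ("Saab", {"log_price": math.log(280_000), "diesel": 0}),
}

utilities = {
    name: utility(make_constants[make], betas, x)
    for name, (make, x) in alternatives.items()
}
probabilities = mnl_probabilities(utilities)
print(probabilities)
```

A real choice set contains roughly a thousand alternatives and all attributes of the six categories, so the probabilities here only show the mechanics.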

7.3 Estimation results

In Sections 7.1 and 7.2, we presented 3 different types of owners in each year: private owners, company owners, and company-owned cars leased to employees[1]. In this section, the estimation results for each type of owner are presented. For each type, 6 static models are estimated, one for each year from 2005 to 2010, which show the coefficients within each year.

7.3.1 Sampling from private owned car data

From the data sizes presented in Table 7.1.1, we find that the numbers of observations for privately owned cars are quite large, mostly larger than 100,000. If we took all of these observations into the model, the total run time would be very long. In fact, with proper initial values, the total run time for the smallest sample, 2009, which contains about 84,000 observations, is more than 16 hours, of which the estimation itself takes 13.5 hours with 13 iterations. So, we have to reduce the sample size. Here, a random sample of 20% of the total observations is selected. With this smaller sample, the run time decreases to less than 4 hours, with an estimation duration of 2.5 hours and 11 iterations. We simply draw a random sample from the whole privately owned new car data in each year. Since the models are of MNL form, the results will be unbiased, but they may be less efficient than those from a larger sample. To test the results, we estimate the parameters both from the sample and from the population. Since the value of each parameter β and its standard error SE are already known, we can calculate the t-value by

$$t = \frac{\beta_p - \beta_s}{\sqrt{SE_p^2 + SE_s^2}},$$

where the subscripts p and s represent the population and the sample respectively. All the computed t-values are shown in Table C.1 in Appendix C. The maximum value is 1.66, for the dummy of Honda. No coefficient differs significantly between the sample and the whole set of observations at the 95% confidence level. For the car attribute parameters, the absolute t-values are all less than one. Therefore, the estimation based on the 20% samples is unlikely to introduce bias into the results.

[1] For simplicity, this type of ownership is called company leasing owners for short.
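The comparison between the sample and the full-data estimates can be sketched as a one-line function. The sample values below are the 2009 log-of-price estimate from Table 7.3.1; the full-data coefficient and standard error are hypothetical, since the population estimates appear only in Table C.1.

```python
import math


def coefficient_t_value(beta_p, se_p, beta_s, se_s):
    """t-value for the difference between population and sample coefficients."""
    return (beta_p - beta_s) / math.sqrt(se_p ** 2 + se_s ** 2)


# Sample estimate: 2009 log of price from Table 7.3.1.
beta_s, se_s = -2.31, 0.0617
# Hypothetical full-data estimate, invented for illustration.
beta_p, se_p = -2.30, 0.03

t_value = coefficient_t_value(beta_p, se_p, beta_s, se_s)
print(t_value, abs(t_value) < 1.96)
```

An absolute t-value below 1.96 means that the two estimates do not differ significantly at the 95% confidence level, which is the criterion applied in the text.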

7.3.2 Parameter analysis for private owner choices

Table 7.3.1: Estimation results of private owned cars

Entries are coefficients, with standard errors in parentheses.

                        2005                 2006                 2007                 2008                 2009                 2010
Statistic summary
Observations            29087                28870                25344                20613                16803                23358
LL at 0                 -197893              -198755              -175322              -140960              -115173              -159756
LL at $\hat{\beta}_k$   -178208              -183235              -155917              -128776              -106513              -147697
Adjusted $\rho^2$       0.099                0.078                0.110                0.086                0.075                0.075

Make specific constant
Alfa Romeo              -3.51 (0.206)        -3.05 (0.19)         -2.82 (0.355)        -2.76 (0.278)        -3.17 (0.355)        -4.14 (0.501)
Audi                    -1.25 (0.0368)       -0.726 (0.0353)      -0.501 (0.0587)      -0.597 (0.0406)      -0.717 (0.0482)      -0.975 (0.0379)
BMW                     -0.955 (0.0352)      -0.596 (0.0367)      -1.2 (0.0678)        -0.44 (0.045)        -0.437 (0.0465)      -0.835 (0.0397)
Cadillac                -2.78 (0.253)        -3.39 (0.316)        -3.14 (0.448)        -1.82 (0.187)        -1.19 (0.174)        -0.432 (0.292)
Citroën                 -1.21 (0.0356)       -1.3 (0.0389)        -1.83 (0.125)        -1.1 (0.0451)        -1.22 (0.0577)       -0.921 (0.0473)
Chrysler                -1.83 (0.0708)       -1.21 (0.073)        -0.299 (0.139)       -1.9 (0.135)         -1.12 (0.169)        -0.479 (0.157)
Chevrolet               -2.83 (0.13)         -2.22 (0.0714)       -1.61 (0.2)          -1.82 (0.0755)       -1.42 (0.088)        -1.59 (0.0793)
Dodge                   -                    -2.61 (0.268)        -1.78 (0.383)        -1.6 (0.17)          -1.51 (0.193)        -1.99 (0.221)
Daewoo                  -1.84 (0.0663)       -                    -                    -                    -                    -
Dacia                   -                    -                    -                    -                    -2.33 (0.113)        -0.569 (0.082)
Fiat                    -2.71 (0.0846)       -3.23 (0.0999)       -3.97 (0.383)        -3.12 (0.108)        -3.52 (0.16)         -4.12 (0.144)
Ford                    -1.38 (0.0303)       -1.34 (0.0311)       -1.18 (0.0586)       -0.951 (0.0361)      -1.09 (0.0388)       -0.959 (0.0327)
Hyundai                 -1.37 (0.0404)       -1.17 (0.0416)       -1.56 (0.137)        -1.32 (0.0538)       -0.947 (0.0555)      -0.719 (0.0462)
Hummer                  -                    -                    -5.35 (7.18)         -2.26 (0.697)        -0.532 (0.325)       -2.08 (0.501)
Honda                   -1.26 (0.0474)       -0.658 (0.043)       -1.54 (0.126)        -0.177 (0.0451)      -0.442 (0.0514)      -0.376 (0.0462)
Jaguar                  -3.7 (0.486)         -3.36 (0.564)        -1.36 (0.503)        -0.42 (0.203)        -1.15 (0.212)        -1.26 (0.226)
Jeep                    -0.725 (0.119)       -0.77 (0.143)        -1.08 (0.259)        -0.709 (0.133)       -0.425 (0.172)       -0.624 (0.18)
Kia                     -2.27 (0.0604)       -2.22 (0.0669)       -2.18 (0.218)        -1.83 (0.081)        -0.371 (0.0688)      0.055 (0.0546)
Lexus                   -1.11 (0.131)        -0.0767 (0.0944)     -0.47 (0.195)        -0.515 (0.143)       -0.616 (0.157)       -0.743 (0.153)
Land Rover              -1.93 (0.172)        -0.976 (0.186)       -1.47 (0.268)        -0.816 (0.218)       -0.104 (0.186)       -0.875 (0.205)
Mercedes                -0.986 (0.0383)      -1.11 (0.042)        -1.81 (0.0822)       -0.945 (0.0514)      -0.841 (0.0539)      -2.42 (0.089)
MG                      -3.57 (0.29)         -                    -                    -                    -                    -
Mitsubishi              -1.63 (0.0455)       -1.17 (0.0478)       -1.4 (0.169)         -0.597 (0.0588)      -1.11 (0.0642)       -0.323 (0.0487)
Mini                    -1.32 (0.142)        -1.87 (0.195)        -0.23 (0.215)        -0.978 (0.126)       -0.872 (0.131)       -1.47 (0.13)
Morgan                  -                    -7.32 (9.7)          -6.74 (17.9)         -                    -                    -
Mazda                   -1.64 (0.0478)       -1.12 (0.0464)       -1.44 (0.163)        -0.659 (0.0527)      -0.67 (0.0583)       -0.908 (0.0526)
Nissan                  -1.57 (0.0559)       -1.85 (0.0557)       -1.71 (0.129)        -0.548 (0.0524)      -0.692 (0.0555)      -0.535 (0.0454)
Opel                    -1.88 (0.0382)       -1.38 (0.0381)       -1.17 (0.0972)       -1.12 (0.0468)       -1.23 (0.0533)       -1.38 (0.0458)
Peugeot                 -0.165 (0.0302)      -0.32 (0.0313)       -1.38 (0.103)        -0.559 (0.0388)      -0.793 (0.0467)      -0.762 (0.0405)
Porsche                 0.412 (0.136)        0.0618 (0.133)       -0.84 (0.416)        0.244 (0.169)        -0.474 (0.195)       -0.163 (0.139)
Renault                 -1.51 (0.0418)       -1.09 (0.04)         -1.97 (0.146)        -2.42 (0.066)        -1.66 (0.0617)       -0.854 (0.0455)
Rover                   -3.27 (0.214)        -                    -                    -                    -                    -
Saab                    -0.0746 (0.0343)     0.388 (0.0307)       0.306 (0.0497)       -0.0528 (0.0439)     -0.698 (0.0649)      -0.639 (0.0538)
Seat                    -3.09 (0.0749)       -2.76 (0.0675)       -3.83 (0.319)        -2.37 (0.069)        -1.67 (0.0666)       -1.91 (0.0617)
Škoda                   -1.05 (0.0374)       -0.733 (0.0374)      -0.91 (0.099)        -0.32 (0.0445)       -0.322 (0.0476)      -0.366 (0.0384)
Subaru                  -1.34 (0.0572)       -0.383 (0.0518)      -2.03 (0.166)        -0.728 (0.0641)      -0.0965 (0.0574)     0.484 (0.0491)
Ssangyong               -                    -2.77 (0.357)        -1.35 (0.361)        -2.19 (0.379)        -2.34 (0.579)        -6.26 (5.36)
Suzuki                  -2 (0.0672)          -2.41 (0.0885)       -2.27 (0.451)        -1.76 (0.13)         -1.27 (0.0886)       -1.24 (0.0727)
Toyota                  -0.344 (0.0273)      -0.151 (0.0273)      -0.217 (0.061)       0.191 (0.0331)       -0.0106 (0.0372)     -0.156 (0.0319)
Volvo                   0 (fixed)            0 (fixed)            0 (fixed)            0 (fixed)            0 (fixed)            0 (fixed)
Volkswagen              -0.891 (0.0299)      -1.07 (0.0308)       0.0929 (0.059)       -0.249 (0.035)       -0.367 (0.0409)      -0.0668 (0.0316)
[unlabelled]            -3.18 (0.199)        -3.37 (0.22)         -2.16 (0.341)        -3.34 (0.306)        -2.21 (0.208)        -2.4 (0.263)

Car attributes
Whiplash “Good”         0.185 (0.0196)       0.205 (0.0208)       0.989 (0.0489)       0.331 (0.0241)       0.29 (0.0247)        0.32 (0.0207)
Length (cm)             0.021 (0.000378)     0.0108 (0.000337)    0.00586 (0.000575)   0.000396 (0.000389)  0.00146 (0.000339)   0.00463 (0.000389)
Log of price            -2.88 (0.0642)       -2.81 (0.0575)       -1.41 (0.104)        -2.61 (0.0582)       -2.31 (0.0617)       -1.96 (0.0599)
Power capacity          -0.0163 (0.000825)   -0.0136 (0.000899)   -0.00656 (0.00169)   0.00221 (0.0012)     -0.00224 (0.00133)   -0.00563 (0.00111)
Tank volume             -0.0442 (0.00103)    0.00564 (0.00117)    0.0234 (0.00195)     0.0354 (0.00143)     0.0325 (0.00159)     0.0257 (0.00138)
Acceleration            -0.116 (0.00398)     -0.129 (0.00424)     -0.0713 (0.00702)    -0.0525 (0.00536)    -0.0528 (0.00647)    -0.037 (0.00462)
Alternative fuel        -1.94 (0.0445)       -0.693 (0.0306)      1.27 (0.0429)        1.05 (0.0226)        0.81 (0.0255)        0.0582 (0.0245)
Diesel fuel             -0.51 (0.0673)       0.0971 (0.0494)      0.792 (0.0967)       0.0413 (0.0316)      0.279 (0.0398)       0.311 (0.032)
ESP                     0.193 (0.0154)       0.0512 (0.0158)      -0.125 (0.0389)      0.295 (0.0228)       -0.0562 (0.0283)     0.0124 (0.0239)
Fuel consumption        0.143 (0.00894)      0.0342 (0.00812)     -0.0867 (0.0185)     -0.164 (0.00948)     -0.157 (0.0111)      -0.207 (0.00932)
Two-box car             0.325 (0.0147)       0.458 (0.015)        0.477 (0.0302)       0.498 (0.0181)       0.458 (0.0205)       0.423 (0.0173)
Share of tax            -1.2 (0.0503)        -0.906 (0.0369)      -0.409 (0.0726)      -0.277 (0.0235)      -0.752 (0.0364)      -0.927 (0.0302)

The estimation results are shown in Table 7.3.1, for which 20% of the observations are selected. With the make specific constant of Volvo fixed to zero, each constant gives the value of a brand relative to Volvo. Among these brands, a remarkable trend is the declining value of Saab for private owners. In 2006, the coefficient of Saab is 0.388 (0.0307), which is significantly larger than that of any other car make, including Volvo. In 2010, the coefficient of Saab is -0.639 (0.0538). This value is not only significantly smaller than that of Volvo, but has also fallen to the level of Dacia². By contrast, if we assume that the brand value of Volvo does not change much from 2009 to 2010, the value of Dacia rises conspicuously, as its constant increases from -2.33 to -0.569. Dacia is a well-known low-price car maker that entered Sweden only in 2009, yet its sales to private owners increase from 493 in 2009 to 1056 in 2010³, even though the retail prices of the Logan and the Sandero increase by 5% and 10% respectively. Therefore, brand value is likely to play an important role in the buyer’s decision.

² Testing the difference between the coefficients of Saab and Dacia gives a t-value of 0.714.
³ The sales include only the models Logan and Sandero. A new SUV model, the Duster, enters the market in 2010 with 145 sales to private owners.

If a private owner wants to choose a car, several basic characteristics, namely price and tax, size, safety and performance, are essential, and the signs of these attributes are consistent across these years. Besides the fact that a higher price tends to decrease the utility of a car, a larger share of vehicle tax in the retail price also has a negative effect on the utility. In terms of the size and class of a car, buyers prefer a longer car of the two-box type. Meanwhile, if the grade in the Whiplash safety test is “Good”, the utility of the car increases as well. The negative sign of power capacity supports this argument: the coefficient shows that, among vehicles with the same horsepower, buyers prefer the heavier one, since a greater weight may imply better safety. Only in 2008 does the sign turn positive. However, this coefficient might suggest that private car buyers dislike good performance. This confusion is resolved by the coefficient of acceleration. By definition, a larger value of this attribute means worse acceleration, so the negative coefficient shows that better acceleration increases the utility.

Focusing on the fuel types, the parameters of alternative fuel and diesel are estimated, whereas the coefficient of gasoline is fixed to zero. So, we can look at the signs of these two parameters to study how the fuel types change the utility in different years relative to gasoline cars. For alternative fuel, or so called hybrid fuel, there is a big change from 2006 to 2007. In 2005 and 2006, alternative fuel decreases the utility. From 2007 onwards, on the contrary, alternative fuel increases the utility. However, from 2009, although the parameter is still positive, its level starts to decrease. For diesel fuel, the sign of the coefficient changes between 2005 and 2006: in 2005 diesel has a negative effect, but in 2006 the effect becomes positive.

Hugosson and Algers (2011) review a Swedish national policy on clean cars, which may explain the change of the alternative fuel coefficients in our results. In 2007, a buyer of a new alternative fuel car is rewarded with a subsidy of about 10,000 SEK. In July 2009, this subsidy is terminated. As a result, the parameter of alternative fuel decreases to 0.81 in 2009. In the next year, the value of the alternative fuel dummy drops to 0.0582, which is the lowest since 2007. But this value is still larger than the parameters before the introduction of the clean car subsidy.

The change is also reflected in the sign change of fuel consumption. In 2005 and 2006, a car with larger fuel consumption is preferred by private car buyers. From 2007 onwards, however, these buyers favour fuel efficient cars. Meanwhile, the buyers prefer a larger fuel tank from 2006, and the level of this parameter increases significantly in 2007. Combining the positive attitude to a larger fuel tank and to lower fuel consumption, we may infer that, on average, private car buyers become more sensitive to the cruising range of a car: they want a longer range after refuelling and fewer visits to the fuel station.

7.3.3 Parameter analysis for company owner choices

Table 7.3.2: Estimation results of company owned cars

                    2005                 2006                 2007                 2008                 2009                 2010
Statistic summary
Observations        35954                39700                44239                37208                31875                37899
LL at 0             -244613              -273314              -306032              -254443              -218482              -259209
LL at β̂k            -219497              -249659              -279699              -228701              -197525              -232412
Adjusted-ρ²         0.102                0.086                0.086                0.101                0.096                0.103

Coefficient (Standard Error)
Make specific constant
Alfa Romeo          -2.44 (0.121)        -2.92 (0.146)        -3.4 (0.157)         -2.92 (0.219)        -2.82 (0.271)        -3.34 (0.268)
Audi                0.865 (0.268)        0.539 (0.243)        0.309 (0.217)        0.552 (0.224)        0.426 (0.275)        0.248 (0.229)
BMW                 -1.5 (0.28)          0.606 (0.27)         0.928 (0.242)        0.165 (0.276)        0.287 (0.312)        0.347 (0.253)
Cadillac            -2.31 (0.166)        -2.56 (0.154)        -2.26 (0.12)         -1.2 (0.803)         0.883 (0.106)        0.176 (0.27)
Citroën             -2.2 (0.53)          -2.5 (0.563)         -1.82 (0.483)        -2.23 (0.543)        -1.14 (0.572)        -1.15 (0.47)
Chrysler            -1.55 (0.67)         -1.2 (0.622)         0.529 (0.575)        -1.19 (0.849)        0.239 (0.11)         0.387 (0.11)
Chevrolet           -2.3 (0.125)         -1.94 (0.663)        -1.32 (0.67)         -2.14 (0.809)        -1.59 (0.103)        -1.7 (0.738)
Dodge               - (-)                -2.75 (0.312)        -1.68 (0.139)        -1.91 (0.142)        -1.7 (0.225)         -2.18 (0.237)
Daewoo              -1.93 (0.797)        - (-)                - (-)                - (-)                - (-)                - (-)
Dacia               - (-)                - (-)                - (-)                - (-)                -2.69 (0.113)        -1.2 (0.753)
Fiat                -2.9 (0.754)         -3.33 (0.106)        -3.75 (0.137)        -2.63 (0.769)        -1.58 (0.678)        -4.51 (0.192)
Ford                -1.6 (0.289)         -1.46 (0.254)        -1.28 (0.252)        -1.25 (0.275)        0.885 (0.277)        -1.5 (0.26)
Hyundai             -1.39 (0.404)        0.778 (0.338)        0.734 (0.354)        -1.93 (0.65)         0.509 (0.439)        -1.16 (0.505)
Hummer              - (-)                - (-)                -1.67 (0.37)         -2.35 (0.575)        0.367 (0.381)        -4.8 (1)
Honda               -1.69 (0.539)        -1.45 (0.494)        -1.7 (0.408)         0.936 (0.457)        0.938 (0.488)        -1.26 (0.48)
Jaguar              -1.83 (0.205)        -2.1 (0.259)         -1.24 (0.192)        0.569 (0.123)        0.955 (0.16)         -1.36 (0.152)
Jeep                -1.5 (0.11)          0.987 (0.122)        -1.13 (0.994)        0.561 (0.888)        0.655 (0.996)        0.132 (0.143)
Kia                 -1.58 (0.544)        -2.37 (0.702)        -1.93 (0.733)        -2.21 (0.987)        0.643 (0.753)        0.74 (0.736)
Lexus               0.513 (0.854)        0.196 (0.759)        0.452 (0.798)        0.766 (0.933)        0.204 (0.992)        0.587 (0.935)
Land Rover          -1.65 (0.124)        -1.29 (0.132)        -1.24 (0.949)        0.97 (0.127)         0.753 (0.146)        0.779 (0.142)
Mercedes            -1.37 (0.36)         -1.34 (0.338)        -1.52 (0.292)        0.777 (0.329)        0.358 (0.415)        -4.7 (0.135)
MG                  -4.57 (0.544)        - (-)                - (-)                - (-)                - (-)                - (-)
Mitsubishi          -1.45 (0.432)        -1.4 (0.507)         -1.34 (0.622)        0.917 (0.572)        0.817 (0.489)        0.561 (0.433)
Mini                0.119 (0.931)        0.716 (0.122)        0.194 (0.78)         0.183 (0.839)        0.101 (0.113)        0.847 (0.115)
Morgan              - (-)                -7.24 (0.333)        -6.61 (0.295)        - (-)                - (-)                - (-)
Mazda               -1.55 (0.526)        -1.31 (0.48)         0.917 (0.491)        -1.8 (0.52)          0.461 (0.577)        -1.9 (0.561)
Nissan              -1.6 (0.662)         -1.35 (0.532)        -1.27 (0.416)        -1.12 (0.483)        -1.53 (0.773)        -1.23 (0.566)
Opel                -1.39 (0.341)        -1.18 (0.319)        0.77 (0.342)         -1.27 (0.396)        0.233 (0.383)        0.937 (0.337)
Peugeot             -1.4 (0.409)         -1.57 (0.378)        -1.15 (0.347)        0.796 (0.3)          0.53 (0.386)         0.968 (0.384)
Porsche             0.183 (0.144)        -1.13 (0.167)        0.588 (0.134)        0.164 (0.153)        0.193 (0.167)        0.651 (0.107)
Renault             -1.8 (0.343)         -1.66 (0.439)        -1.66 (0.49)         -1.47 (0.393)        -1.68 (0.557)        0.682 (0.344)
Rover               -2.18 (0.146)        - (-)                - (-)                - (-)                - (-)                - (-)
Saab                0.544 (0.212)        0.379 (0.203)        0.4 (0.201)          0.286 (0.239)        0.314 (0.387)        0.714 (0.336)
Seat                -3.37 (0.994)        -3.17 (0.788)        -2.82 (0.772)        -1.55 (0.415)        -1.36 (0.549)        -1.33 (0.413)
Škoda               0.821 (0.385)        0.803 (0.337)        0.468 (0.335)        0.254 (0.338)        0.147 (0.37)         0.572 (0.357)
Subaru              -1.56 (0.544)        0.427 (0.434)        -1.53 (0.527)        -1.15 (0.641)        0.142 (0.562)        0.111 (0.456)
Ssangyong           - (-)                -1.87 (0.159)        -1.88 (0.187)        -1.85 (0.176)        -2.28 (0.431)        -6.42 (0.507)
Suzuki              -3.3 (0.111)         -2.9 (0.126)         -1.75 (0.129)        -2.96 (0.26)         -1.31 (0.117)        -1.48 (0.108)
Toyota              0.17 (0.236)         0.171 (0.209)        0.596 (0.216)        0.464 (0.242)        0.388 (0.274)        0.272 (0.247)
Volvo               0 (fixed)            0 (fixed)            0 (fixed)            0 (fixed)            0 (fixed)            0 (fixed)
Volkswagen          0.443 (0.227)        0.57 (0.226)         0.339 (0.227)        0.325 (0.233)        0.961 (0.288)        0.484 (0.237)
Smart               -1.26 (0.101)        -3.31 (0.243)        -1.99 (0.11)         -1.27 (0.191)        0.195 (0.128)        0.397 (0.249)

Car attributes
Whiplash “Good”     0.782 (0.0175)       0.665 (0.0177)       0.893 (0.0172)       0.434 (0.0182)       0.833 (0.0191)       0.637 (0.0175)
Length (cm)         0.0155 (0.000323)    0.00917 (0.000266)   0.00559 (0.000209)   0.00425 (0.000258)   0.00496 (0.000237)   0.013 (0.00032)
Log of price        -2.33 (0.0527)       -2.44 (0.0447)       -1.54 (0.0388)       -1.69 (0.0372)       -1.64 (0.0385)       -1.46 (0.0418)
Power capacity      -0.0239 (0.000735)   -0.0176 (0.000735)   -0.00766 (0.000621)  -0.00548 (0.000799)  -0.00495 (0.000921)  -0.00631 (0.000852)
Tank volume         -0.0203 (0.000875)   0.0205 (0.000884)    -0.02 (0.000749)     -0.037 (0.000869)    0.0214 (0.000899)    0.0151 (0.00783)
Acceleration        -0.158 (0.0035)      -0.104 (0.00342)     -0.0537 (0.00254)    -0.075 (0.0034)      -0.0376 (0.00415)    -0.0346 (0.00361)
Alternative fuel    -0.252 (0.0231)      0.503 (0.0171)       0.871 (0.0173)       1.8 (0.016)          1.69 (0.018)         1.43 (0.0169)
Diesel fuel         -0.527 (0.0587)      0.261 (0.0419)       0.904 (0.0362)       0.218 (0.0261)       0.236 (0.0279)       0.75 (0.0269)
ESP                 0.237 (0.0143)       0.232 (0.0138)       -0.164 (0.0146)      -0.0629 (0.0174)     -0.281 (0.0226)      -0.152 (0.0191)
Fuel consumption    0.143 (0.00740)      0.0556 (0.00651)     -0.0281 (0.00675)    -0.22 (0.00671)      0.195 (0.00792)      -0.247 (0.00733)
Two-box car         0.143 (0.0074)       0.372 (0.0124)       0.422 (0.0114)       0.439 (0.0128)       0.434 (0.0145)       0.551 (0.0134)
Share of tax        -0.805 (0.00432)     -0.739 (0.0319)      -0.658 (0.0273)      0.0615 (0.0197)      -0.212 (0.0246)      -0.759 (0.0244)

End of table

Table 7.3.2 lists the coefficients of the company owned car models. The number of observations varies from roughly 32,000 to 44,000 cars depending on the year. Estimating a model, including data reading and coefficient estimation, takes about 6 hours on average, which is not very long. So, for this owner type, the models are estimated on the whole sample of company owned cars.

Looking into the make specific constants in this table, in most of these 6 years the car brands from Volkswagen Group (Volkswagen, Audi, Škoda and Porsche), Toyota Group (Toyota and Lexus), BMW Group (BMW and Mini) and Saab have positive signs, some of which are significantly larger than zero thanks to the small standard errors. Unexpectedly, although Saab goes through a shutdown crisis in 2009 and 2010 with a slump in sales, as described in Section 4.2, the make constant of Saab for company users still holds a relatively high level in 2010. This might indicate that, although the business situation of Saab is not good, holding all other attributes the same, the probability that a company purchases a Saab is still higher than for many other makes.

If a company wants to choose a car, several basic characteristics, namely price, size, safety and performance, are important, and the signs of these attributes are also the same across these years. These trends are quite similar to those of the private car owners. That is, a company car buyer prefers a larger and less expensive car with a good safety standard and better acceleration performance. However, different from the private car owner group, the sign of alternative fuel changes earlier, between 2005 and 2006. This means that in 2006, alternative fuel increases the utility for company cars, whereas for private cars the same type of fuel decreases it. The installation of ESP increases the utility in 2005 and 2006. From 2007 onwards, however, the availability of this technology decreases the utility. This may be because a company buyer considers the technology not worth its extra cost compared with the actual increase in safety.

The drawback of the model is that, since we cannot distinguish the usage of each car, some strange results occur in 2008 and 2009. In 2008, the parameter of the share of vehicle tax in the retail price becomes positive, and in 2009, the sign of the fuel consumption coefficient shifts to positive as well. Of these two results, the more abnormal one is the positive vehicle tax share in 2008, since a buyer is more likely to prefer a smaller amount of vehicle tax. However, we still cannot simply reject the model, as the other coefficients, like price, size or performance, are quite reasonable. So, these results are probably due to the large variance of preferences among company car buyers.
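As a check on the fit statistics in Table 7.3.2, the likelihood ratio index can be recomputed from the two log-likelihood values. The sketch below assumes the usual definitions, ρ² = 1 − LL(β̂)/LL(0) and adjusted ρ² = 1 − (LL(β̂) − K)/LL(0). The number of estimated parameters K is not listed in the table, so `K_ASSUMED = 50` is a hypothetical placeholder, as are the function and variable names.

```python
# Log-likelihood values for company owned cars, copied from Table 7.3.2.
LL_AT_ZERO = {2005: -244613, 2006: -273314, 2007: -306032,
              2008: -254443, 2009: -218482, 2010: -259209}
LL_AT_ESTIMATES = {2005: -219497, 2006: -249659, 2007: -279699,
                   2008: -228701, 2009: -197525, 2010: -232412}

# ASSUMPTION: the number of estimated parameters is not given in the table;
# 50 is a hypothetical value of roughly the right order of magnitude.
K_ASSUMED = 50


def rho_squared(ll_final, ll_zero):
    """Likelihood ratio index: 1 - LL(beta_hat) / LL(0)."""
    return 1.0 - ll_final / ll_zero


def adjusted_rho_squared(ll_final, ll_zero, n_params):
    """Adjusted index: penalises the final log-likelihood by n_params."""
    return 1.0 - (ll_final - n_params) / ll_zero


fit = {
    year: (
        rho_squared(LL_AT_ESTIMATES[year], LL_AT_ZERO[year]),
        adjusted_rho_squared(LL_AT_ESTIMATES[year], LL_AT_ZERO[year], K_ASSUMED),
    )
    for year in LL_AT_ZERO
}

for year, (rho, adj) in sorted(fit.items()):
    print(f"{year}: rho2 = {rho:.4f}, adjusted rho2 (K={K_ASSUMED}) = {adj:.4f}")
```

Because the log-likelihoods are in the hundreds of thousands, the penalty for K has almost no effect on the third decimal, which is why the exact parameter count matters little for this check.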

7.3.4 Parameter analysis of the choices of company cars for leasing

In Section 7.2, we explain that if a vehicle is owned by a company and leased to an employee as a benefit, the user does not pay the retail price of the vehicle but pays the benefit tax, which is calculated from the retail price. So, in the estimation for leasing cars, we replace the retail price with the benefit tax. Table 7.3.3 lists all of the coefficients estimated from 2005 to 2010. Because the number of observations is large, about 60,000 to 80,000 registrations per year, we select a random sample of 50% of the leasing car observations.

Table 7.3.3: Estimation results of company leasing cars

                    2005                 2006                 2007                 2008                 2009                 2010
Statistic summary
Observations        34448                39454                40799                38136                28584                41151
LL at 0             -234367              -271620              -282235              -260789              -195924              -281451
LL at β̂k            -203094              -232788              -246002              -216146              -169652              -241842
Adjusted-ρ²         0.133                0.143                0.128                0.171                0.134                0.141

Coefficient (Standard Error)
Make specific constant
Alfa Romeo          -3.64 (0.184)        -3.3 (0.165)         -2.87 (0.141)        -3.2 (0.209)         -3.38 (0.268)        -4.83 (0.409)
Audi                -1.1 (0.0249)        -0.598 (0.0231)      -0.455 (0.023)       -0.614 (0.0232)      0.0179 (0.0271)      -0.521 (0.0214)
BMW                 -1.2 (0.0281)        -0.687 (0.0268)      -0.987 (0.0255)      -0.524 (0.0277)      -0.345 (0.0315)      -0.921 (0.0253)
Cadillac            -3.39 (0.226)        -3.11 (0.201)        -3.69 (0.23)         -2.94 (0.165)        -4.02 (0.377)        -0.89 (0.408)
Citroën             -2 (0.0439)          -1.84 (0.0462)       -1.44 (0.0426)       -1.95 (0.0454)       -1.42 (0.0572)       -1.43 (0.0481)
Chrysler            -1.39 (0.0523)       -0.661 (0.0511)      -0.376 (0.0552)      -1.13 (0.0788)       -0.345 (0.0939)      0.203 (0.0743)
Chevrolet           -3.15 (0.181)        -2.56 (0.109)        -1.75 (0.0896)       -2.93 (0.115)        -3.25 (0.21)         -2.46 (0.15)
Dodge               - (-)                -2.03 (0.245)        -1.75 (0.148)        -2.23 (0.157)        -2.07 (0.232)        -2.58 (0.279)
Daewoo              -2.49 (0.108)        - (-)                - (-)                - (-)                - (-)                - (-)
Dacia               - (-)                - (-)                - (-)                - (-)                -3.69 (0.199)        -1.96 (0.132)
Fiat                -3.73 (0.16)         -3.47 (0.131)        -3.82 (0.139)        -3.9 (0.122)         -3.26 (0.132)        -3.42 (0.106)
Ford                -1.55 (0.0249)       -1.49 (0.0235)       -1.1 (0.022)         -1.78 (0.0223)       -1.22 (0.0251)       -1.68 (0.0231)
Hyundai             -2.86 (0.0761)       -2.34 (0.0738)       -2.26 (0.077)        -3.36 (0.119)        -2.87 (0.114)        -2.27 (0.0786)
Hummer              - (-)                - (-)                -2.56 (0.707)        -1.54 (0.447)        -0.296 (0.503)       -7.77 (6.38)
Honda               -2.46 (0.0754)       -2.66 (0.0725)       -1.6 (0.0513)        -1.91 (0.0554)       -1.88 (0.0614)       -1.99 (0.0585)
Jaguar              -3.06 (0.29)         -4.32 (0.695)        -2.18 (0.29)         -1.4 (0.156)         -2.17 (0.22)         -2.31 (0.176)
Jeep                -1.58 (0.114)        -0.843 (0.0999)      -0.948 (0.0916)      -1.54 (0.13)         -0.793 (0.159)       -0.426 (0.152)
Kia                 -2.86 (0.0917)       -2.62 (0.0859)       -2.28 (0.0912)       -2.5 (0.108)         -1.56 (0.0915)       -0.54 (0.0596)
Lexus               -1.34 (0.121)        -0.127 (0.0761)      -0.188 (0.0724)      -0.562 (0.0814)      -0.198 (0.0903)      -0.34 (0.077)
Land Rover          -2.7 (0.165)         -2.41 (0.211)        -1.57 (0.114)        -2.02 (0.159)        -1.18 (0.175)        -1.24 (0.141)
Mercedes            -1.86 (0.0399)       -2.14 (0.0421)       -2.01 (0.0355)       -1.95 (0.0389)       -1.35 (0.0429)       -5.18 (0.121)
MG                  -5.95 (0.983)        - (-)                - (-)                - (-)                - (-)                - (-)
Mitsubishi          -2.34 (0.0625)       -2.18 (0.0786)       -1.75 (0.0794)       -1.96 (0.0904)       -3.01 (0.125)        -1.37 (0.0577)
Mini                -1.2 (0.165)         -1.39 (0.179)        -1.1 (0.123)         -0.821 (0.102)       -0.302 (0.106)       -0.694 (0.114)
Morgan              - (-)                -7.44 (9.53)         -2.77 (0.984)        - (-)                - (-)                - (-)
Mazda               -3.22 (0.0994)       -2.34 (0.0861)       -1.98 (0.082)        -2.45 (0.0892)       -2.31 (0.107)        -2.16 (0.0842)
Nissan              -2.18 (0.0628)       -3.7 (0.0946)        -1.93 (0.0556)       -1.93 (0.0547)       -1.15 (0.0583)       -1.02 (0.0506)
Opel                -1.56 (0.0346)       -1.23 (0.0349)       -1.4 (0.0414)        -1.5 (0.0427)        -1.49 (0.048)        -2.03 (0.046)
Peugeot             -0.952 (0.0346)      -1.26 (0.0387)       -1.31 (0.04)         -2.31 (0.0464)       -1.83 (0.0558)       -1.48 (0.0455)
Porsche             -0.276 (0.16)        -1.07 (0.177)        -1.56 (0.252)        -0.299 (0.175)       -0.193 (0.186)       -0.889 (0.155)
Renault             -2.1 (0.0479)        -1.96 (0.0516)       -2.06 (0.0626)       -2.98 (0.0642)       -1.73 (0.0553)       -1.32 (0.0459)
Rover               -4.68 (0.409)        - (-)                - (-)                - (-)                - (-)                - (-)
Saab                0.35 (0.02)          0.273 (0.0181)       0.265 (0.0191)       -0.239 (0.0205)      -0.273 (0.031)       -0.824 (0.0273)
Seat                -3.7 (0.11)          -3.59 (0.109)        -3.55 (0.11)         -3.77 (0.0995)       -2.97 (0.0998)       -2.88 (0.0817)
Škoda               -1.69 (0.0478)       -1.24 (0.0436)       -1.02 (0.0413)       -0.82 (0.0405)       -0.408 (0.0409)      -0.523 (0.0333)
Subaru              -2.2 (0.0659)        -1.34 (0.0632)       -2.33 (0.073)        -1.8 (0.0669)        -0.993 (0.0703)      -0.459 (0.0473)
Ssangyong           - (-)                -2.72 (0.226)        -1.93 (0.171)        -3.24 (0.291)        -3.49 (0.564)        -7.66 (5.71)
Suzuki              -2.97 (0.128)        -3.39 (0.169)        -2.04 (0.166)        -2.43 (0.21)         -1.6 (0.135)         -0.963 (0.1)
Toyota              -0.711 (0.0253)      -0.679 (0.023)       -0.218 (0.0243)      -0.586 (0.0264)      -0.384 (0.0302)      -0.634 (0.0253)
Volvo               0 (fixed)            0 (fixed)            0 (fixed)            0 (fixed)            0 (fixed)            0 (fixed)
Volkswagen          -0.985 (0.0266)      -1.02 (0.0235)       -0.0833 (0.0241)     -0.146 (0.0239)      0.402 (0.0265)       0.374 (0.0207)
Smart               -1.94 (0.142)        -3.59 (0.305)        -3.51 (0.269)        -0.307 (0.16)        -1.42 (0.306)        1.18 (0.214)

Car attributes
Whiplash “Good”     0.616 (0.0204)       0.788 (0.02)         0.911 (0.0201)       0.711 (0.0219)       0.836 (0.0226)       0.868 (0.0186)
Length (cm)         0.0183 (0.000315)    0.00669 (0.000249)   0.00544 (0.000219)   0.00406 (0.000275)   0.00681 (0.000263)   0.0166 (0.000294)
Log of benefit tax  -3.54 (0.0603)       -2.65 (0.0504)       -1.62 (0.0433)       -1.48 (0.0401)       -1.29 (0.0451)       -1.15 (0.0401)
Power capacity      -0.0189 (0.000721)   -0.0252 (0.000883)   -0.00484 (0.000675)  -0.00359 (0.000808)  0.000681 (0.00104)   -0.00199 (0.000825)
Tank volume         -0.0143 (0.000816)   0.0349 (0.000893)    0.0272 (0.000773)    0.0462 (0.000821)    0.0153 (0.000953)    0.00736 (0.000763)
Acceleration        -0.116 (0.00357)     -0.153 (0.00437)     -0.0656 (0.00281)    -0.0539 (0.00349)    -0.0231 (0.00478)    -0.048 (0.00368)
Alternative fuel    -0.182 (0.0212)      1.15 (0.0153)        1.38 (0.0167)        2.21 (0.0165)        1.95 (0.0196)        1.67 (0.0183)
Diesel fuel         0.925 (0.0538)       1 (0.0417)           0.97 (0.0379)        0.545 (0.029)        0.448 (0.0315)       0.994 (0.0276)
ESP                 0.295 (0.0152)       -0.077 (0.0143)      -0.16 (0.0156)       -0.229 (0.0165)      -0.267 (0.0227)      -0.0242 (0.0176)
Fuel consumption    0.287 (0.00771)      0.0811 (0.00711)     -0.0791 (0.00749)    -0.294 (0.0071)      -0.283 (0.00861)     -0.296 (0.00717)
Two-box car         0.546 (0.0124)       0.442 (0.0126)       0.563 (0.0118)       0.805 (0.0125)       0.617 (0.0151)       0.699 (0.0129)
Share of tax        -1.68 (0.0418)       -1 (0.0326)          -0.455 (0.0288)      0.0746 (0.022)       0.0588 (0.0262)      -0.426 (0.0224)

End of table

Looking into the brand coefficients, as in the models for privately owned and company owned cars, we fix the constant of Volvo to zero. Up to 2007, the constants of Saab (0.35 in 2005; 0.273 in 2006; 0.265 in 2007) are greater than zero. From 2008 onwards, the Saab constants are negative, which means that, compared with Volvo, there is a large decline in Saab’s brand value from 2008 in the leasing car group. By contrast, the brand value of Volkswagen increases: its coefficients in 2009 and 2010, 0.402 and 0.374 respectively, are greater than the value of Volvo.

Similar to the other two car user groups, the leasing car users have a positive attitude to a safer, longer, two-box style car with better acceleration performance. Meanwhile, with the same horsepower, they may want a heavier car. As with the retail price, we take the logarithm of the benefit tax. In all 6 years, a larger benefit tax decreases the utility of a car. Only in 2005 does the installation of electronic stability control have a positive effect on the utility; from 2006 onwards its coefficient is negative.

Focusing on the attitude to fuel, diesel fuel has a positive influence in all 6 years. In the first years, however, a smaller tank volume (in 2005) and a larger fuel consumption (in 2005 and 2006) increase the utility, and the coefficient of alternative fuel (in 2005) is negative. After 2006, this group of users becomes more “environmentally friendly”: they prefer a larger fuel tank and a smaller fuel consumption. At the same time, the coefficients of alternative fuel become larger than those of diesel.
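To make the size of the Saab constants more tangible, they can be converted into odds against Volvo. In an MNL model the ratio of the choice probabilities of two alternatives equals the exponential of their utility difference, so for a Saab and a Volvo with identical attributes (a hypothetical pair, assumed here only for illustration) the ratio is exp(constant), since the Volvo constant is fixed to zero. The names below are illustrative.

```python
import math

# Saab make specific constants for company leasing cars, from Table 7.3.3.
SAAB_CONSTANT_LEASING = {2005: 0.35, 2006: 0.273, 2007: 0.265,
                         2008: -0.239, 2009: -0.273, 2010: -0.824}


def odds_against_reference(constant):
    """P(make) / P(reference make) for two alternatives that differ only
    in their make specific constant (reference constant fixed to zero)."""
    return math.exp(constant)


odds = {year: odds_against_reference(c)
        for year, c in SAAB_CONSTANT_LEASING.items()}

for year, value in sorted(odds.items()):
    print(f"{year}: P(Saab) / P(Volvo) = {value:.3f}")
```

A ratio above one means the otherwise identical Saab is chosen more often than the Volvo; the change of sign between 2007 and 2008 corresponds to the ratio crossing one.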

7.4 Analysis across various years

As shown in Section 7.3, we have in total 18 sets of results for 2005 to 2010. It is unwise to simply compare the parameters across years, since both the number of alternatives and the values of the attributes in the choice sets differ from year to year. In these models, we fix the coefficient of Volvo to zero, so only values relative to Volvo are known. If the actual value of Volvo shifts from year to year, we cannot compare the coefficients of the same attribute in different years. So, we need to monetise these parameters based on the price information, and then compare the changes of the monetary values across years.

7.4.1 The impact of “clean car” compensation

One way is to calculate the marginal willingness to pay (MWTP) for each parameter. Since the logarithm of price is taken, the MWTP is not unique and depends on the price. Arnberg et al. (2008) present a simple way to compute the MWTP when the logarithm is taken: divide the coefficient $\beta_x$ of attribute $x$ by the marginal effect of price $\beta_p/\bar{P}$, where $\beta_p$ is the coefficient of log price and $\bar{P}$ is the mean of the price without the logarithm taken:

\[
\mathrm{MWTP} = -\frac{\bar{P}\,\beta_x}{\beta_p}. \tag{7.4}
\]
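Equation 7.4 can be evaluated directly from the estimated coefficients. The sketch below uses the alternative fuel and log price coefficients for privately owned cars from Table 7.3.1 and expresses the MWTP as a share of the mean price, that is, −β_x/β_p. The function and variable names are illustrative, and the mean price of 237,000 SEK is the 2006 value quoted in this section.

```python
# Coefficients for privately owned cars, copied from Table 7.3.1.
YEARS = [2005, 2006, 2007, 2008, 2009, 2010]
BETA_ALT_FUEL = [-1.94, -0.693, 1.27, 1.05, 0.81, 0.0582]
BETA_LOG_PRICE = [-2.88, -2.81, -1.41, -2.61, -2.31, -1.96]


def mwtp_share(beta_x, beta_log_price):
    """MWTP as a fraction of the mean price: equation 7.4 divided by P-bar."""
    return -beta_x / beta_log_price


def mwtp(beta_x, beta_log_price, mean_price):
    """MWTP in currency units according to equation 7.4."""
    return mean_price * mwtp_share(beta_x, beta_log_price)


shares = {
    year: mwtp_share(bx, bp)
    for year, bx, bp in zip(YEARS, BETA_ALT_FUEL, BETA_LOG_PRICE)
}

for year, share in shares.items():
    print(f"{year}: MWTP for alternative fuel = {100 * share:+.1f}% of car price")

# Monetary value for 2006, using the mean purchased car price of 237,000 SEK.
mwtp_2006_sek = mwtp(-0.693, -2.81, 237_000)
print(f"2006: {mwtp_2006_sek:,.0f} SEK")
```

A negative share means the buyer would need a price discount of that size to accept the attribute; a positive share is the price premium the buyer would tolerate.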

In Section 7.3.2, we argue that the clean car compensation shifts private car users’ attitude from gasoline to alternative fuel. In this thesis, we use the same approach to find the car buyers’ MWTP for switching from a gasoline car to an alternative fuel car in each year. Based on equation 7.4, in 2006, a buyer would switch the same car from gasoline to alternative fuel only if the car price were discounted by 24.7%. In the next year, however, the compensation policy is implemented. Holding all other attributes the same, one then chooses an alternative fuel car instead of a gasoline car, and may still do so even if the price of the alternative fuel car is 90% higher. In 2008, this percentage decreases to 40.2%, and in 2009, it decreases further to 35.1%. In 2010, after the “clean car” policy has been terminated, the percentage decreases to 3.0%, but alternative fuel is still valued higher than gasoline. The trend of this percentage of the car price is plotted in Figure 7.4.1.

Figure 7.4.1: Change of MWTP of alternative fuel (vertical axis: % of car price; horizontal axis: year, 2005 to 2010)

In 2006, the average purchased car price in our choice set is 237,000 SEK. Between 2006 and 2007, the only known policy is the compensation of about 10,000 SEK, which is only about 4.2% of the average car price. Meanwhile, compared with the parameter of log car price in the company owned car group, private car users are more sensitive to the car price than company car owners in all of the years except 2007. In 2007, the parameter of log price for private users is -1.41, whereas the parameter for company owned cars is -1.54. Due to this policy, a compensation amounting to a small share of the car price leads to a huge shift in gasoline car buyers’ valuation of alternative fuel, while private car owners become less sensitive to the car price than company car owners.

This phenomenon seems unrealistic, as it does not make sense that 10,000 SEK can cause such a huge shift in valuation. One interpretation is that, when this policy is implemented, some former gasoline car buyers treat this amount of money as a kind of “free lunch”. So, they may prefer an alternative fuel car simply to receive the compensation, even though, according to the calculation, they value alternative fuel about 20% less than gasoline. In this case, private car owners’ purchase behaviour towards alternative fuel cars becomes, to some degree, irrational. The other explanation is that the data themselves may have some problems. However, in the sections on data processing, we test the market shares and the percentage of omissions to prevent the data sets used for estimation from differing much from the original data. Recall Table 4.1.1 in Section 4.1.2, Chapter 4. The number of new entries to the stock data in each year (Method 2) is quite different from the “first registration” number (Method 3) in 2007 and 2008, although both come from the same data set. Hence, if the data themselves have certain problems, these should stem from the original registration, which cannot be controlled.

However, whatever the valuation of alternative fuel is, one can be sure that this policy is very successful. This is not only because the coefficient of the alternative fuel dummy turns positive in 2007, but also because buyers still prefer alternative fuel to gasoline after the end of the policy. The compensation ends in July 2009. According to the coefficient in the 2010 model, private car owners value alternative fuel cars 3% more than gasoline cars, which is much higher than the valuation before the beginning of the compensation, when alternative fuel has a negative effect on the utility. This means that private car buyers in 2010 spontaneously want to buy a more environmentally friendly car even though they do not receive the compensation. The implementation of this policy successfully establishes awareness of environmental protection.

7.4.2 The brand value decline of Saab

In the statistical analysis in Section 4.2 of Chapter 4, we mention that there is a significant slump in Saab’s market share after 2008, due to the turmoil around its resale and its unclear future. Thus, we use the model results to examine whether the brand value of Saab really changes during these years.

Among the three user groups, the signs of the brand specific constant of Saab are not similar. For privately owned cars, the coefficient is greater than zero in 2006 and 2007, but becomes negative in 2009 and 2010. For company owned cars, the coefficient stays positive in all 6 years. For the leasing cars, however, the parameter of Saab shifts from positive to negative between 2007 and 2008. Based on the changes of sign, we cannot find a decline for company car owners, but in the other two groups the decrease in the brand value of Saab is quite obvious.

Table 7.1.1 in Section 7.1.1 shows the size of each user group. Company owned cars are a minority of the total new car registrations, less than about 20%. So, the decline of the Saab coefficient among private car users and company leasing car users plays a rather important role.

The decline of Saab value in private car owner group. In 2006 and 2007, the Saab brand values are 0.388 (0.0307) and 0.306 (0.0497) respectively, significantly higher than Volvo (which is fixed to zero) at a confidence level of 99%. In 2008, with a coefficient of -0.0528 and a standard error of 0.0439, one cannot reject the null hypothesis that the value of Saab is at the same level as Volvo. In the next two years, 2009 and 2010, the Saab constant drops to -0.698 (0.0649) and -0.639 (0.0538), significantly lower than the value of Volvo at the 99% confidence level. According to equation 7.4, in 2006, holding all other attributes the same, a buyer may still choose a Saab if its price is 13.8% higher than that of a Volvo; whereas in 2009 and 2010, she may not buy a Saab even if it is 30% to 32% cheaper than a Volvo. Within these 5 years, the change of brand value is rather huge. Before 2007, its value is the top among all of the car makes in Sweden. In 2009 and 2010, more than 10 car makes have a larger brand constant than Saab. As a comparison, Section 4.2 describes the 7-year warranty of Kia, which starts in January 2010. In this year, the brand value of Kia (0.055 with standard error 0.0546) becomes no less than that of Volvo.
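The significance statements in this paragraph follow from the ratio of each coefficient to its standard error. The sketch below is one way to reproduce them, assuming an asymptotic two-sided z-test against zero (the fixed Volvo constant) with the standard normal critical value 2.576 for the 99% level. The function names are illustrative, and the values are taken from Table 7.3.1.

```python
# Two-sided 99% critical value of the standard normal distribution.
Z_CRITICAL_99 = 2.576

# (coefficient, standard error) of the Saab constant for privately owned
# cars, copied from Table 7.3.1.
SAAB_PRIVATE = {
    2005: (-0.0746, 0.0343),
    2006: (0.388, 0.0307),
    2007: (0.306, 0.0497),
    2008: (-0.0528, 0.0439),
    2009: (-0.698, 0.0649),
    2010: (-0.639, 0.0538),
}
KIA_PRIVATE_2010 = (0.055, 0.0546)


def z_statistic(coefficient, std_error):
    """Test statistic for H0: coefficient equals the reference value zero."""
    return coefficient / std_error


def differs_from_volvo(coefficient, std_error, critical=Z_CRITICAL_99):
    """True if the constant differs significantly from the fixed Volvo constant."""
    return abs(z_statistic(coefficient, std_error)) > critical


for year, (coef, se) in sorted(SAAB_PRIVATE.items()):
    verdict = "significant" if differs_from_volvo(coef, se) else "not significant"
    print(f"Saab {year}: z = {z_statistic(coef, se):+.2f} ({verdict} at 99%)")

print(f"Kia 2010: z = {z_statistic(*KIA_PRIVATE_2010):+.2f}")
```

Comparing two estimated constants with each other, as in the Saab and Dacia comparison, would additionally need the covariance between the two estimates, which is not reported in the tables.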

The decline of Saab value in company leasing user group. In this group, the car benefit tax, which is a function of the price, is used instead of the retail price. Up to 2007, the values of Saab, 0.35 in 2005, 0.273 in 2006 and 0.265 in 2007, are higher than Volvo at the 99% confidence level, but afterwards the value becomes lower than Volvo. In 2010, the Saab constant drops to -0.824, lower than that of 11 other car makes, including Volvo. The value of Saab in 2006 is 10% of the benefit tax higher than Volvo, whereas in 2010, it is 51% of the benefit tax lower. By contrast, from 2009 the brand value of Volkswagen (0.402 in 2009 and 0.374 in 2010) becomes statistically significantly higher than Volvo. As for Kia, although the 7-year warranty starts in 2010, its brand value is still lower than Volvo, but the distance is smaller than in previous years.

From the analysis of the shifts in brand value, we can conclude that the bankruptcy crisis and the resale of Saab result in the slump of Saab’s brand value. The decrease does not stop after the purchase by the Dutch auto manufacturer Spyker. This change of brand value mainly affects the private car owners and the company leasing car users, while there is no evidence of a negative effect on company car owners without leasing. By contrast, other car makers are making progress in Sweden. One example is Kia. Its long-term quality guarantee leads to an increasing brand value. The increase in Kia’s sales in 2009 is mainly due to other parameters, like price. But in 2010, the brand value plays an essential role. Contrasting the different user groups, the 7-year warranty policy shifts the attitude of private car owners more than that of the other two groups: for company car owners, Kia’s brand value becomes no less than zero even before the policy takes effect, and for leasing car users, though the distance to zero is narrower, the policy still does not reverse the negative effect on the utility of a Kia.

Chapter 8

Conclusion and discussion

8.1 Summary of results

This report analyses the annual sales and the market shares at a macro level in the first part; in the second part, 18 models are estimated at a micro level and their results are discussed. The results of the two parts support each other.

In the first part, we carry out a descriptive analysis of data from two different sources: the registration data of the whole Swedish car fleet for these years and the supply data listing the car versions available in the corresponding years. From each year’s registration data, a subset based on the first registration date of each car is extracted as the new car choice data. Focusing on the new car data, we mainly analyse the market shares of each make and model in Sweden and how they change over the years. In addition, a comparative analysis is made of the new car markets in some other countries that have their own car industries. We find that new car buyers tend to prefer makes and models from their own country, so the domestic market can be of considerable importance to car manufacturers. We also find that in Sweden the share of diesel and alternative fuel cars grows strongly, whereas the share of gasoline cars declines.

The second part starts with the matching of the two data sources, in order to designate the alternatives of the car choice models and to add the missing attributes to the registration data. Because the car model names are inconsistent between the registration and the supply data, this thesis standardises the names so that the data can be matched automatically. Then, with the help of discrete choice analysis, one MNL model is estimated for each year and each single type of ownership. By comparing the resulting models, we identify the changes in each make-specific constant, and we calculate the consumers’ marginal willingness to pay to find out how they value each attribute of a car.
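The matching step summarised above can be illustrated with a minimal Python sketch. It standardises make and model names and then joins registrations to supply versions that share the same name and maximum power, that is, at the model-engine level. The field names (`make`, `model`, `power_kw`, `price_sek`), the standardisation rule, the averaging of the price over versions sharing one key and all toy records are assumptions made for illustration; they are not the procedure or the data used in this thesis.

```python
import re
import unicodedata


def standardise(name):
    """Lower-case a name, strip accents and drop every non-alphanumeric character."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_name.lower())


def match_model_engine(registrations, supply):
    """Attach supply attributes to registrations with the same make, model and max power."""
    index = {}
    for version in supply:
        key = (standardise(version["make"]), standardise(version["model"]), version["power_kw"])
        index.setdefault(key, []).append(version)

    matched, unmatched = [], []
    for reg in registrations:
        key = (standardise(reg["make"]), standardise(reg["model"]), reg["power_kw"])
        versions = index.get(key)
        if versions:
            # ASSUMPTION: several versions may share one model-engine key; use their mean price.
            price = sum(v["price_sek"] for v in versions) / len(versions)
            matched.append({**reg, "price_sek": price})
        else:
            unmatched.append(reg)
    return matched, unmatched


# Made-up toy records, for illustration only.
supply = [
    {"make": "Citroën", "model": "C4 Picasso", "power_kw": 88, "price_sek": 200000},
    {"make": "Citroën", "model": "C4 Picasso", "power_kw": 88, "price_sek": 210000},
]
registrations = [
    {"make": "CITROEN", "model": "C4-PICASSO", "power_kw": 88},
    {"make": "Saab", "model": "9-3", "power_kw": 110},
]

matched, unmatched = match_model_engine(registrations, supply)
print(matched)
print(unmatched)
```

Keeping the unmatched registrations in a separate list makes it easy to inspect which names still need a manual standardisation rule.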
We find, for instance, that the make constant of Saab decreases over these years for private owners and leasing users, whereas for company owners the trend is not obvious.


In terms of fuel types, we find that the diesel and alternative fuel dummies are negative in 2005 and that each of them turns positive in the following years.

Specifically, this thesis focuses on two major issues. In the market share analysis, we devote our attention to the declining market share of Saab and the increasing share of Kia after 2008. From the news, we find that Saab suffers a close-down crisis while Kia offers an “industry-leading” 7-year guarantee on each new Kia car sold in Europe. We therefore examine whether these two pieces of news really affect the market shares of Saab and Kia respectively, by looking into the changes of brand value over these years. As a result, this thesis finds a decreasing brand value of Saab in the private car owner group and the company car leasing group, an increasing brand value of Kia over these years, and a conspicuous effect of the 7-year warranty on private car owners in 2010. These findings may provide evidence that the bad news about Saab and the good news about Kia indeed affect their brand values.

Meanwhile, in the descriptive analysis part, the increasing share of diesel and alternative fuel cars can be seen in the bar chart, Figure 4.3.1 in Chapter 4. In the second part, the fuel types are divided into three categories: gasoline, diesel and alternative fuel. The parameters of diesel and alternative fuel are included in the models, which is equivalent to fixing the parameter of gasoline to zero. Therefore, the signs of the coefficients reflect the attitude towards each fuel relative to gasoline. The estimated coefficients confirm the trends shown in the bar chart. We discuss the change of sign with emphasis on the private car owner group, because of the “clean car” compensation policy implemented from 2007 to 2009. To compare the valuation of the fuel types across years, the value of each fuel is monetised based on the car retail price. The results show that this policy is a big success, since it reverses new car buyers’ attitudes and makes them more environmentally friendly. The attitude towards alternative fuel remains positive even after the policy is terminated.
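The statement that including only the diesel and alternative fuel parameters is equivalent to fixing the gasoline parameter to zero can be checked numerically: in a logit model only differences in utility matter, so adding the same constant to all three fuel coefficients leaves every choice probability unchanged. The sketch below illustrates this with the 2009 fuel coefficients from Table C.1; the four-alternative choice set and the non-fuel utility parts are hypothetical values chosen for illustration.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities for a list of systematic utilities."""
    largest = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - largest) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


# Hypothetical choice set: (fuel type, utility from all non-fuel attributes).
alternatives = [("gasoline", -0.4), ("diesel", -0.1), ("alternative", -0.7), ("gasoline", 0.2)]


def utilities(fuel_coef):
    """Add the fuel-type coefficient to the non-fuel part of each alternative."""
    return [base + fuel_coef[fuel] for fuel, base in alternatives]


# 2009 fuel coefficients from Table C.1, with gasoline as the reference level ...
normalised = {"gasoline": 0.0, "diesel": 0.279, "alternative": 0.81}
# ... and the same coefficients shifted by an arbitrary constant.
shifted = {fuel: coef + 1.5 for fuel, coef in normalised.items()}

p_norm = mnl_probabilities(utilities(normalised))
p_shift = mnl_probabilities(utilities(shifted))
max_gap = max(abs(a - b) for a, b in zip(p_norm, p_shift))
print(p_norm)
print(max_gap)
```

The same argument applies to the make constants, where Volvo is the reference level.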

8.2 Comparison with results in the literature

First, recall the studies focusing on car choice behaviour in the U.S. These studies, for example Mannering et al. (1991) and Train and Winston (2007), show that American car buyers have a preference for U.S. car makers, e.g. GM and Ford. In their analysis of the declining market share of U.S. car makers, these studies point out that brand loyalty to U.S. brands is decreasing among these buyers.

In fact, this paper to some extent supports such a view. This report finds that the slump of the brand value of Saab plays an essential role in the declining market share of Saab. If one investigates the story, one may find that the close-down crisis is caused by the bad management of its former parent company, GM. Hence, this research suggests that the decline of U.S. car manufacturers not only costs them market share in the U.S., but also affects new car sales in Sweden.

In Sweden, similar studies have been made as well, e.g. Hugosson and Algers (2011). The parameters used in this paper are consistent with their research. However, some of their parameters, namely fuel cost, rust protection warranty and share of fuel stations, are not used in this paper due to the lack of information: this report models new car choice behaviour from 2005, and the data for these parameters are incomplete, especially in the earlier years. Before the study of Hugosson and Algers (2011), two other studies dealing with new car choices in Sweden were made by Nilsson (2008) and Künnapuu (2009). All of these studies aggregate the alternatives at the model level, whereas this paper uses a more disaggregated level, the model-engine level, which increases the number of alternatives from about 300 to more than 900. The number of observations used in the models is also larger than in previous studies, which means that the estimates can be more efficient. However, due to the large size of the data, only multinomial logit models are estimated here.

8.3 Future works

Although this study improves on previous studies in some respects, it still has some limitations, which are listed below.

Variance of company owners It must be emphasised that, in our data sets, the company owned cars cover all records of non-private owners. Non-private cars may be used in a very wide variety of ways, since the companies can be of various types. If an express delivery company wanted to purchase a car, it would be more likely to get a van and less likely to get a three-box car, as it needs more space for goods. However, a car used for formal business purposes could well be a three-box sedan. In our estimation, we put all of the data into the model, simply because we have no information about the companies and there is little evidence of what such a new car is used for. However, the overall preference of non-private car buyers can still be captured.

Ambiguity between “leasing” and “renting” A “leasing” car means a car that an employee is supplied with by her company for private travel. However, some of the cars labelled as leasing may instead be rental cars, that is, cars owned by a car rental company and supplied not to its employees but to its customers. This situation may also lead to some errors, since the customers do not pay the benefit tax. However, as with the company owned cars, the information needed to distinguish “leasing” from “renting” is lacking. As a result, we still have to use all the data in the estimation.

Lack of buyers’ information When modelling, we only have information about the cars. We therefore have to assume that all buyers within a group are homogeneous. This cannot be true, because buyers differ in their circumstances, such as household size, income or whether they already own a car. The explanatory power of these models is therefore limited by the lack of such data.

Brand loyalty and brand value Strictly speaking, these two concepts are not equivalent. Brand loyalty relates to the make of the car one previously owned: if she purchases a new car of the same make, we may define this as loyalty to that brand. In this paper, as we do not have information about each buyer, we cannot know whether they have owned a car before and purchased the same make again. So, the make-specific constant is treated as brand value in the models; it reflects which make has a higher utility when all other parameters are held the same.

Calculation of benefit tax This only affects the models of company leasing cars. According to equation 7.2, the computation of the car benefit tax is based on the “base amount”. In our estimation, a single base amount, 42,800 SEK, is used to calculate the tax in all 6 years. In reality, this amount may change from year to year. Therefore, the estimation results of the leasing car user group may contain some errors.

IIA test The multinomial logit model assumes that the IIA (independence of irrelevant alternatives) property holds. In reality, it is reasonable to expect that the elasticities are larger between alternatives of the same model than between alternatives of different car makes, and also larger between alternatives of the same car type, like SUV or racing car. Due to the data size and time constraints, this paper only estimates the MNL model. However, the estimation results might be better if a nested logit or cross-nested logit model were introduced.
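The IIA property mentioned above can be made concrete with a small numerical example. In a multinomial logit model, the ratio of the choice probabilities of two alternatives depends only on their own utilities, so it does not change when a third alternative is removed, even if that third alternative is a close substitute for one of them. The Python sketch below shows this; the alternative labels and utility values are hypothetical and chosen only for illustration.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit probabilities for a dict mapping alternative -> utility."""
    exp_u = {alt: math.exp(u) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: e / total for alt, e in exp_u.items()}


# Hypothetical systematic utilities: two engines of the same model and one other make.
utilities = {
    "make1_modelA_engine1": 0.5,
    "make1_modelA_engine2": 0.4,
    "make2_modelB_engine1": 0.3,
}

full = mnl_probabilities(utilities)
# Remove the second engine of model A from the choice set.
reduced = mnl_probabilities(
    {alt: u for alt, u in utilities.items() if alt != "make1_modelA_engine2"}
)

ratio_full = full["make1_modelA_engine1"] / full["make2_modelB_engine1"]
ratio_reduced = reduced["make1_modelA_engine1"] / reduced["make2_modelB_engine1"]
print(ratio_full, ratio_reduced)  # both equal exp(0.5 - 0.3)
```

In reality one would expect buyers of the removed engine version to switch mainly to the other engine of the same model, which is the substitution pattern a nested logit model can represent and the MNL model cannot.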

Time parameters In this paper, we estimate 18 models in total: for each user group, there are 6 models for 6 years. It is therefore difficult to analyse the changes across years. If a set of time parameters indicating the time of purchase were added to the estimation, we could study the changes across years by comparing the coefficients. In this case, the set of alternatives would have to vary as well, since the available alternatives differ from year to year.

Appendices


Appendix A

List of car makes and models

Table A.1: Car makes and models in Swedish new car market from 2007 to 2010

Make            Model
Alfa Romeo      147, 159, 166, Brera, GT, Spider, Crosswagen, Mi.To
Aston Martin    DB9, DBS, Vantage
Audi            A1, A3, A4, A4 Avant, A5, A5 Avant, A6, A6 Avant, A8, Q5, Q7, R8, RS4, RS4 Avant, RS5, RS6, RS6 Avant, S3, S4, S4 Avant, S5, S6, S8, TT, TT TTS
Bentley         Arnage, Azure, Brooklands, Continental, Mulsanne
BMW             1-Series, 3-Series, 5-Series, 6-Series, 7-Series, M3, M5, M6, X1, X3, X5, X6, Z4
Cadillac        BLS, CTS, Escalade, SRX, STS, XLR
Chevrolet       Aveo, Captiva, Corvette, Cruze, Epica, HHR, Kalos, Lacetti, Matiz, Nubira, Spark, Uplander
Chrysler        300C, Crossfire, Gr. Voyager, PT Cruiser, Sebring, Voyager
Citroën         Berlingo, C-Crosser, C1, C2, C3, C4, C4 Picasso, C5, C6, C8, DS3, Jumpy, XSara
Corvette        Corvette
Dacia           Logan, Sandero
Dodge           Caliber, Journey, Nitro, Viper
Ferrari         599, 612, 575M, 458 Italia, California, F430
Fiat            500, 500C, Bravo, Croma, Doblò, Ducato, Panda, Punto, Qubo, Ulysse
Ford            C-Max, Fiesta, Focus, Fusion, Galaxy, Ka, Kuga, Maverick, Mondeo, S-Max, Tourneo, Transit
Honda           Accord, Civic, CR-V, FR-V, Insight, Legend, S2000
Hummer          H2, H3
Hyundai         Accent, Atos, Coupe, Elantra, Getz, Grandeur, H-1, I10, I20, I30, Matrix, Santa Fé, Sonata, Starex, Terracan, Trajet, Tucson
Jaguar          Daimler, S-Type, X-Type, XF, XFR, XJ, XK, XK8, XKR
Jeep            Cherokee, Commander, Compass, Gr. Cherokee, Patriot, Wrangler
Kia             Carens, Carnival, Cee’d, Cerato, Magentis, Picanto, Pro Cee’d, Rio, Sorento, Soul, Sportage, Venga
Koenigsegg      Agera, CC
Lamborghini     Gallardo, Murciélago, Reventón, Superleggera
Land Rover      Defender, Discovery, Freelander, Range Rover
Lexus           GS, IS, LS, RX, SC
Lotus           Elise, Europa, Evora, Exige
Maserati        Coupe, Grancabrio, Gransport, Granturismo, Quattroporte, Spyder
Mazda           2, 3, 5, 6, CX-7, MX-5, RX-8
Mercedes        A, B, C, CL, CLC, CLK, CLS, E, G, GL, GLK, Maybach, ML, R, S, SL, SLK, SLR, SLS
Mini            Cooper
Mitsubishi      ASX, Colt, Grandis, Lancer, Outlander
Morgan          4/4, Aero 8, Plus 4, Roadster
Nissan          350Z, 370Z, Almera, GT-R, Juke, Micra, Murano, Note, NV2000, Pathfinder, Pixo, Primastar, Primera, Qashqai, X-Trail
Opel            Agila, Antara, Astra, Combo, Corsa, GT, Insignia, Meriva, Movano, Tigra, Vectra, Vivaro, Zafira
Peugeot         107, 206, 207, 307, 308, 407, 607, 807, 1007, 3008, 4007, 5008, Boxer, Partner
Porsche         911, Boxster, Cayenne, Cayman, Panamera
Renault         Clio, Espace, Gr. Espace, Gr. Scénic, Kangoo, Koleos, Laguna, Mégane, Modus, Scénic, Trafic, Twingo
Rolls-Royce     Phantom
Saab            9-3, 9-5
Seat            Alhambra, Altea, Cordoba, Exeo, Ibiza, Leon, Toledo
Škoda           Fabia, Octavia, Praktik, Roomster, Superb, Yeti
Smart           Smart
Ssangyong       Actyon, Kyron, Rexton, Rodius
Subaru          B9, Forester, Impreza, Justy, Legacy, Outback, Tribeca
Suzuki          Alto, Gr. Vitara, Jimny, Splash, Swift, SX4
Toyota          Auris, Avensis, Aygo, Corolla, Hiace, iQ, Land Cruiser, Prius, RAV4, Urban Cruiser, Verso, Yaris
Volkswagen      Caddy, Caravelle, EOS, Fox, Golf, Jetta, Multivan, New Beetle, Passat, Phaeton, Scirocco, Sharan, Shuttle, Tiguan, Touareg, Touran, Transporter
Volvo           C30, C70, S40, S60, S80, V50, V70, XC60, XC70, XC90

Appendix B

List of numerical attributes

Table B.1: Statistical analysis of the variables

                                 Statistical results
Variables                 Vintage    Min.   Median     Mean      Max.  Std. dev¹
Price (SEK)               2007      79900   252800   338333   5093000   362007.2
                          2008      79900   263900   340548   5093000   331367.5
                          2009      82500   268900   339455   5093000   312855.5
                          2010      84900   274800   348956   5100000   330747.0
CO₂ emission (g/km)       2007       90.0    191.0    202.9     512.0      60.17
                          2008       90.0    189.0    198.9     512.0      58.02
                          2009       90.0    179.0    189.6     495.0      54.43
                          2010       89.0    169.0    179.3     495.0      53.77
Fuel mixed²               2007        3.0     8.00      8.3      22.0       2.64
                          2008        3.0     8.00      8.1      21.0       2.56
                          2009        3.0     7.00      7.8      21.0       2.39
                          2010        3.0     7.00      7.7      21.0       2.39
Tank volume               2007       11.0     60.0     62.3     125.0      13.39
                          2008       11.0     61.0     62.7     121.0      12.96
                          2009       12.0     60.0     62.3     110.0      12.96
                          2010       12.0     60.0     61.5     110.0      12.65
Length (cm)               2007      250.0    452.0    447.8     617.0      38.80
                          2008      270.0    457.0    449.6     617.0      37.67
                          2009      270.0    456.0    449.9     617.0      36.56
                          2010      48.0³    454.0    449.5     617.0      36.72
Width (cm)                2007      150.0    179.0    179.3     220.0       7.92
                          2008      150.0    180.0    180.0     220.0       7.79
                          2009      141.0    180.0    180.1     220.0       7.64
                          2010      141.0    180.0    180.2     205.0       7.48
Height (cm)               2007      107.0    148.0    152.8     208.0      13.72
                          2008      107.0    148.0    152.6     263.0      13.47
                          2009      107.0    148.0    152.8     263.0      13.45
                          2010      107.0    148.0    152.1     263.0      13.06
Curb weight (kg)          2007        730     1545     1568      2890     322.67
                          2008        750     1564     1582      2903     308.09
                          2009        750     1564     1583      2800     311.33
                          2010        750     1560     1570      2780     301.90
Payload (kg)              2007      145.0    465.0    471.0    1135.0     113.37
                          2008       80.0    465.0    472.7    1138.0     114.48
                          2009       80.0    470.0    476.9    1135.0     113.99
                          2010       80.0    475.0    480.1    1018.0     114.88
Roof load                 2007          0     75.0     69.3    150.0³      32.99
                          2008          0     75.0     69.5    820.0³      36.27
                          2009          0     75.0     70.6     470.0      36.91
                          2010          0     75.0     70.3     493.0      39.01
Trailer weight            2007          0     1460     1417      3500     675.70
                          2008          0     1500     1421      3500     665.32
                          2009          0     1500     1435      3500     672.18
                          2010          0     1480     1390      3500     673.74
Total weight              2007        980     2010     2039      3490     382.46
                          2008        940     2035     2055      3901     368.12
                          2009        940     2035     2060      3405     372.66
                          2010        940     2030     2051      3405     359.39
Displacement              2007        698     1997     2340      7011    1013.39
                          2008        796     1997     2341      7011    1010.51
                          2009        796     1997     2271      7011     943.11
                          2010        796     1995     2213      6761     936.96
Power (kW)                2007       30.0    110.0    129.7     478.0      67.51
                          2008       33.0    110.0    131.8     478.0      66.59
                          2009       33.0    110.0    129.5     478.0      64.29
                          2010       33.0    110.0    130.2     478.0      67.04
Power (hp)                2007       41.0    150.0    176.4     650.0      91.78
                          2008       45.0    150.0    179.2     650.0      90.54
                          2009       45.0    150.0    176.2     650.0      87.41
                          2010       45.0    150.0    177.1     650.0      91.15
RPM at max power          2007       3000     5600     5283      8500    1041.04
                          2008       3000     5500     5206      8500    1055.94
                          2009       2800     5300     5086      8500    1048.76
                          2010       2800     5000     5022      8500    1051.38
Torque (N·m)              2007       72.0    251.0    278.5    1000.0     129.42
                          2008       72.0    280.0    287.3    1050.0     128.92
                          2009       72.0    280.0    290.4    1050.0     128.21
                          2010       72.0    280.0    295.1    1050.0     130.42
RPM at max torque         2007       1000     3000     3032      7500    1110.89
                          2008       1000     2750     2914      7500    1136.81
                          2009       1200     2200     2769      7500    1148.64
                          2010       1903     2000     2613      6750    1150.67
Luggage space             2007          0    430.0    413.9      1651     172.59
                          2008          0    430.0    413.2      1651     162.36
                          2009          0    430.0    417.0      1900     171.62
                          2010          0    430.0    419.1      1900     170.04
Weight capacity (kg/hp)   2007        3.0     10.0     10.2      22.0       3.06
                          2008        3.0     10.0     10.1      22.0       2.93
                          2009        3.0     10.0     10.2      22.0       2.89
                          2010        3.0     10.0     10.2      24.0       2.98
Weight capacity (kg/kW)   2007        4.0     14.0     13.9      30.0       4.17
                          2008        4.0     14.0     13.7      30.0       3.99
                          2009        4.0     14.0     13.8      30.0       3.93
                          2010        4.0     14.0     13.8      32.0       4.05
Max speed                 2007        132      203      207       337      30.86
                          2008        132      205      207       337      29.72
                          2009        132      205      207       337      28.94
                          2010        132      205      208       335      29.93
Acceleration              2007        3.0     10.0     10.0      24.0       2.74
                          2008        3.0     10.0      9.9     99.0³       3.07
                          2009        3.0     10.0      9.9      24.0       2.57
                          2010        3.0     10.0      9.8      22.0       2.64

¹ Standard deviation
² Fuel mixed refers to the fuel consumption under mixed country and town driving.
³ Numbers in italics, which carry this footnote mark, may have been recorded incorrectly in the supply data.

Appendix C

Estimated parameters comparison
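The T-value column of Table C.1 appears to be consistent with the usual statistic for the difference between two coefficient estimates, t = (b1 - b2) / sqrt(se1^2 + se2^2), where b1 and se1 are taken from the first coefficient and standard error columns and b2 and se2 from the second. The Python sketch below recomputes a few rows under this assumption; the function name is hypothetical, and the formula treats the two estimates as independent, which is only an approximation when one data set is a subset of the other.

```python
import math


def difference_t_value(coef_a, se_a, coef_b, se_b):
    """t-statistic for the difference of two coefficient estimates (treated as independent)."""
    return (coef_a - coef_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (first coefficient, first std. error, second coefficient, second std. error) from Table C.1
rows = {
    "Alfa Romeo": (-2.64, 0.122, -3.17, 0.355),
    "Audi": (-0.706, 0.0216, -0.717, 0.0482),
    "Saab": (-0.684, 0.0291, -0.698, 0.0649),
    "Log of price": (-2.34, 0.0277, -2.31, 0.0617),
}

t_values = {name: difference_t_value(*values) for name, values in rows.items()}
for name, t in t_values.items():
    print(f"{name:<14}{t:8.4f}")
```

All T-values in the table are smaller than 1.96 in absolute value, so under this reading no coefficient differs significantly between the two estimations at the 5% level.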

Table C.1: Comparison between sample and total observations for model 2009

Parameter           Coefficient  Standard error  Coefficient  Standard error  T-value
Alfa Romeo                -2.64           0.122        -3.17           0.355   1.4119
Audi                     -0.706          0.0216       -0.717          0.0482   0.2083
BMW                      -0.373          0.0205       -0.437          0.0465   1.2594
Cadillac                  -1.25          0.0803        -1.19           0.174  -0.3131
Citroën                    -1.2          0.0257        -1.22          0.0577   0.3166
Chrysler                  -1.11          0.0749        -1.12           0.169   0.0541
Chevrolet                 -1.51          0.0404        -1.42           0.088  -0.9295
Dodge                     -1.58          0.0888        -1.51           0.193  -0.3295
Dacia                     -2.38          0.0506        -2.33           0.113  -0.4038
Fiat                       -3.5          0.0707        -3.52            0.16   0.1143
Ford                       -1.1          0.0174        -1.09          0.0388  -0.2352
Hyundai                  -0.964           0.025       -0.947          0.0555  -0.2793
Hummer                   -0.846           0.165       -0.532           0.325  -0.8615
Honda                    -0.349          0.0222       -0.442          0.0514   1.6610
Jaguar                    -1.37           0.106        -1.15           0.212  -0.9282
Jeep                     -0.389          0.0747       -0.425           0.172   0.1920
Kia                      -0.257          0.0295       -0.371          0.0688   1.5229
Lexus                    -0.456          0.0656       -0.616           0.157   0.9403
Land Rover               -0.394          0.0962       -0.104           0.186  -1.3849
Mercedes                 -0.794          0.0238       -0.841          0.0539   0.7977
Mitsubishi                -1.16          0.0291        -1.11          0.0642  -0.7093
Mini                     -0.804          0.0559       -0.872           0.131   0.4774
Mazda                    -0.646          0.0259        -0.67          0.0583   0.3762
Nissan                   -0.707          0.0247       -0.692          0.0555  -0.2469
Opel                       -1.3          0.0244        -1.23          0.0533  -1.1941
Peugeot                  -0.748          0.0207       -0.793          0.0467   0.8809
Porsche                  -0.434          0.0848       -0.474           0.195   0.1881
Renault                   -1.62          0.0272        -1.66          0.0617   0.5932
Saab                     -0.684          0.0291       -0.698          0.0649   0.1968
Seat                       -1.7            0.03        -1.67          0.0666  -0.4107
Škoda                    -0.384          0.0216       -0.322          0.0476  -1.1861
Subaru                   -0.107          0.0258      -0.0965          0.0574  -0.1668
Ssangyong                 -1.87           0.205        -2.34           0.579   0.7652
Suzuki                    -1.19          0.0382        -1.27          0.0886   0.8292
Toyota                  -0.0175          0.0167      -0.0106          0.0372  -0.1692
Volkswagen               -0.368          0.0183       -0.367          0.0409  -0.0223
Smart                      -2.5           0.105        -2.21           0.208  -1.2446
Whiplash “Good”           0.281          0.0111         0.29          0.0247  -0.3324
Length (cm)             0.00117        0.000152      0.00146        0.000339  -0.7806
Log of price              -2.34          0.0277        -2.31          0.0617  -0.4436
Power capacity         -0.00239        0.000594     -0.00224         0.00133  -0.1030
Tank volume              0.0322        0.000712       0.0325         0.00159  -0.1722
Acceleration            -0.0561          0.0029      -0.0528         0.00647  -0.4654
Alternative fuel          0.815          0.0114         0.81          0.0255   0.1790
Diesel fuel               0.312          0.0177        0.279          0.0398   0.7576
ESP                     -0.0852          0.0126      -0.0562          0.0283  -0.9361
Fuel consumption         -0.148         0.00494       -0.157          0.0111   0.7408
Two-box car               0.448         0.00914        0.458          0.0205  -0.4455
Share of tax             -0.768          0.0163       -0.752          0.0364  -0.4012

Bibliography

7-Year warranty now for every Kia in Europe (2010). http://www.kia-press.com/press/products/7-year_warranty_01_10.aspx. Online; last checked 2011-03-25.

Almgren, R. (2005). Mysql database connector. URL: http://www.mathworks.com/matlabcentral/fileexchange/8663-mysql-

Arnberg, S., Bjørner, T. B., Fosgerau, M. and Larsen, M. M. (2008). Fuel costs and consumers’ choice of car, AKF Working paper 2008(9).

Automobil/Tabellen und Grafiken (2010). http://de.wikipedia.org/wiki/Automobil/Tabellen_und_Grafiken. Online; last checked 2011-03-28.

Berkovec, J. (1985). Forecasting automobile demand using disaggregate choice models, Transportation Research Part B: Methodological 19(4): 315–329.

Berkovec, J. and Rust, J. (1985). A nested logit model of automobile holdings for one vehicle households, Transportation Research Part B: Methodological 19(4): 275–285.

Bierlaire, M. (2003). BIOGEME: A free package for the estimation of discrete choice models, Proceedings of the 3rd Swiss Transportation Research Conference, Ascona, Switzerland.

Bierlaire, M. (2008). Estimation of discrete choice models with BIOGEME 1.8, biogeme.epfl.ch.

China car sales ranking by model in 2010 (2011). http://www.cs.com.cn/qcpd/02/201101/t20110120_2752665.html. Online; last checked 2011-03-28.

Choo, S. and Mokhtarian, P. L. (2004). What type of vehicle do people drive? The role of attitude and lifestyle in influencing vehicle type choice, Transportation Research Part A 38: 201–222.

Feng, S. (2010). China passenger car shares by brand in 1st half of 2010, http://oriony.blog.163.com/blog/static/126979422201061922955270/. Online; last checked 2011-03-25.

General Motors sells Saab to Dutch firm Spyker (2010). http://news.bbc.co.uk/2/hi/business/8481621.stm. Online; last checked 2011-03-25.

GM to ‘wind down’ Saab business (2009). http://news.bbc.co.uk/2/hi/8421007.stm. Online; last checked 2011-03-25.

Haak, C. (2011). 2010 US auto sales may indicate brighter 2011, Autosavant.com. http://www.autosavant.com/2011/01/05/. Online; last checked 2011-03-28.

Hess, S., Fowler, M., Adler, T. and Bahreinian, A. (2009). A joint model for vehicle type and fuel type, European Transport Conference 2009.


Hicks, J. R. (1939). Value and capital: An inquiry into some fundamental principles of economic theory, Oxford: Clarendon Press.

Hugosson, M. B. and Algers, S. (2011). Accelerated introduction of ‘clean’ cars in Sweden, in T. Zachariadis (ed.), Cars and Carbon: Automobiles and European Climate Policy in a Global Context.

James, D. A. (2001). An R/S interface to the mysql database. URL: http://www.omegahat.org/download/contrib/RS-DBI/RS-MySQL.pdf

Künnapuu, L. (2009). Modeling new car purchase choices - A decision support tool for climate policy making, Master’s thesis, KTH Royal Institute of Technology.

Lave, C. A. and Train, K. E. (1979). A disaggregate model of auto-type choice, Transportation Research Part A: General 13(1): 1–9.

Mannering, F. and Winston, C. (1985). A dynamic empirical analysis of household vehicle ownership and utilization, Rand Journal of Economics 16(2): 215–236.

Mannering, F., Winston, C., Griliches, Z. and Schmalensee, R. (1991). Brand loyalty and the decline of American automobile firms, Brookings Papers on Economic Activity. Microeconomics 1991: 67–114.

Manski, C. F. and Sherman, L. (1980). An empirical analysis of household choice among motor vehicles, Transportation Research Part A 14A: 349–366.

MySQL 5.0 Reference Manual (2011).

Nilsson, M. (2008). Så väljer svensken bil - Hur bilspecifika egenskaper, miljöbilspremier och bränslepriser påverkar val av bil, Master’s thesis, Uppsala University.

Train, K. E. (2009). Discrete Choice Methods with Simulation, Cambridge University Press.

Train, K. E. and Winston, C. (2007). Vehicle choice behavior and the declining market share of U.S. automakers, International Economic Review 48(4): 1469–1496.