Iryna Bedovska
Kristina Klatz
Anna Nikolova
ECO 300: Econometrics
Prof. G. Kalchev
AUBG
04/28/2015

I. Content

Introduction

Data

Methodology

Empirical Results

Conclusion

II. Introduction

Driving a motor vehicle is extremely common and useful nowadays. Whether people drive to work, on a business trip, or on vacation, automobile transportation has become a necessity in almost every part of the world.

Unfortunately, thousands of people die in car accidents every year. This project presents an econometric analysis of the main factors that influence the probability of getting into a car accident. The main question the paper addresses is whether the gender of the driver matters when it comes to accidents caused by the driver. This is a controversial question that has been discussed in various studies. For example, an article on the Washington Post website states that “the men are the worse drivers than the women” (Zauzmer, 2014). It also notes that in the USA men and women hold driver's licenses in about the same numbers, according to federal data.

The article also notes that another major contributor to a higher probability of a car accident is driver distraction in every form: reading while driving, smoking, talking on the phone, or distraction caused by passengers in the car.

According to another source, “Mayo Road Safety” in Ireland, however, gender is not an important factor in road safety. Instead, it points out that speed, weather, and road conditions are major contributors. Michael Pines, in an article on the Drivers.com website, also weighs in on the problem, stating that the top three causes of car accidents in America are drunk driving, speeding, and distracted driving.

What and who causes car accidents is indeed a controversial topic. In this project we analyze the problem using a cross-sectional data set of several variables describing traffic accidents in England in 2011.

III. Data

The data on traffic incidents were published by the United Kingdom Department of Transport under the Open Government Licence (Department of Transport, 2012). The dataset consists of 3995 observations collected during 2011 in Great Britain and records the important characteristics of each accident. We include a variety of factors that can contribute to a traffic accident, among them the weather conditions, purpose of driving, age of the driver and of the vehicle, type of vehicle, time of the week, light conditions, and the Breath Alcohol Test results. The data also record whether the driver was male or female, whether the accident happened in a rural or urban area or close to a junction, and whether the accident was caused by the driver, a passenger, or a pedestrian. Light conditions are scaled from 1 to 7, where 1 is the lightest and 7 the darkest. The Breath Alcohol Test (BAT) result is measured in micrograms per 100 ml of blood. All other factors, except for age, which is self-explanatory, are measured with dummy variables that indicate whether or not the factor was present in the case.

The descriptive statistics of each variable are shown below (Table 1). The means of the first three variables indicate that around 60% of the accidents were caused by the driver, 15% by a passenger, and around 25% by a pedestrian involved in the accident. The variable female indicates that a female driver was involved in around 33% of the accidents, which is reasonable. According to the British Department of Transport, a larger share of driver's license applicants are male than female, so with more men on the road we should expect a correspondingly larger share of men among the drivers involved in car crashes. Whether gender contributes to the probability of a car accident is examined later in the paper. The average driver age was around 33 years, and the oldest driver was 75. The youngest driver involved in a traffic accident was 6 years old, which is abnormal for the sample and very likely contributed to the accident in that particular case.

Table 1: Descriptive Statistics

Variable        Obs    Mean       Std. Dev.   Min   Max
causedbydr~r    3995   .6010013   .4897538    0     1
causedbypa~r    3995   .151189    .3582778    0     1
causedbype~n    3995   .2478098   .4317948    0     1
Ageofdrver      3697   32.77144   13.89976    6     75
Female          3995   .3319149   .4709596    0     1
Weekend         3995   .2145181   .410539     0     1
Rotary          3995   .0235294   .1515966    0     1
Onewaystreet    3995   .0565707   .2310494    0     1
Closetojun~n    3995   .8187735   .385254     0     1
Light_Cond~s    3995   1.926408   1.395174    1     7
Fogormist       3995   .0002503   .0158213    0     1
Rain            3995   .0876095   .2827616    0     1
Snow            3995   .0005006   .0223719    0     1
Wetroad         3995   .1409262   .3479889    0     1
Frostice        3995   .0015019   .0387298    0     1
Badroadmai~e    3995   .0180225   .1330494    0     1
BreathAlco~l    3995   32.81377   41.06407    0     945
Was_Vehicl~e    3995   .0020025   .0447101    0     1
Apurposeas~k    3995   .2225282   .415996     0     1
Bpurposeco~k    3995   .0347935   .1832793    0     1
Cpurposeta~o    3995   .0022528   .0474163    0     1
Dpurposepu~s    3995   .0002503   .0158213    0     1
Fnotknownp~e    3995   .7401752   .4385932    0     1
Age_of_Veh~e    2499   6.489396   4.154852    1     39
Car             3995   .5041302   .5000455    0     1
Motocycle       3995   .3246558   .4683047    0     1
Goodsvehic~s    3995   .089612    .2856609    0     1
busminibus      3995   .0748436   .2631717    0     1
othertypes~s    3995   .0067584   .0819418    0     1

The table also indicates that most accidents happen on working days rather than on Saturday and Sunday (only about 21% occur on weekends), and most of them happen close to a junction (around 80%) or in an urban area. Most crashes happened in daylight: the mean of the light-conditions variable is about 1.9 on the scale from 1 to 7, with a standard deviation of about 1.4.

As for the weather conditions, most of these variables do not seem to contribute much, a point we analyze later in the paper.

The Breath Alcohol Test is an interesting variable here. According to the U.S. National Library of Medicine, an alcohol level of 500 micrograms per 100 ml of blood is highly abnormal (and dangerous); therefore, the maximum result of 945 micrograms is not only abnormal but potentially lethal. However, the average result of about 33 micrograms is reasonable, and the standard deviation of 41 micrograms suggests that the majority of the sample falls between 0 and 50 micrograms per 100 ml.

The purpose of driving was unknown in most cases (about 74%), which makes it difficult to estimate whether the purpose contributes to the accident; this is one of the shortcomings of the data. The data on the age of the vehicle indicate that most vehicles are between 2 and 10 years old; however, vehicle age is missing for more than a third of the sample (it is observed for only 2499 of the 3995 observations). The final rows show that about 50% of the vehicles involved in an accident were cars, about 32% were motorcycles, and the remaining 17% were goods vehicles, buses and minibuses, or other types of vehicles.
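The shares quoted in this paragraph can be checked directly from Table 1, since the mean of a dummy variable is the sample share for which it equals 1. The short sketch below only re-does that arithmetic; the variable names in the code are ours and the numbers are copied from the table.

```python
# Counts and means copied from Table 1.
N_TOTAL = 3995          # all observations
N_VEHICLE_AGE = 2499    # observations with a recorded vehicle age

# Share of observations with no recorded vehicle age.
missing_share = 1 - N_VEHICLE_AGE / N_TOTAL

# Means of the vehicle-type dummies (each mean is a sample share).
vehicle_shares = {
    "Car": 0.5041302,
    "Motocycle": 0.3246558,
    "Goodsvehic~s": 0.089612,
    "busminibus": 0.0748436,
    "othertypes~s": 0.0067584,
}

# Everything that is neither a car nor a motorcycle.
other_share = sum(
    share for name, share in vehicle_shares.items()
    if name not in ("Car", "Motocycle")
)

print(f"vehicle age missing: {missing_share:.1%}")
print(f"goods vehicles, buses and other types: {other_share:.1%}")
print(f"sum of all vehicle-type shares: {sum(vehicle_shares.values()):.4f}")
```

The five vehicle-type shares sum to one, which is consistent with the two zero coefficients reported for vehicle-type dummies in the general regression: with an intercept, at least one category has to be left out as the reference group.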

IV. Empirical Results

We estimated many different specifications appropriate for our dataset, but for the sake of the research we limit the discussion to the most informative ones. We first regressed the indicator for accidents caused by the driver on all available explanatory variables, including the driver's age, gender, level of intoxication (if any), weather and light conditions, location, and the purpose of driving. The regression produced the following results:

Testing the gender as a factor for the higher possibility of the car accidents
Dependent variable: causedbydriver
Column (1): coefficient, with standard error in parentheses

Independent variables   (1)
Ageofdrver              -0.0039177   (0.0006984*)
Female                  -0.3445226   (0.0200852*)
Weekend                 -0.006749    (0.23405*)
Rotary                  0.0575422    (0.0629283*)
Onewaystreet            0.062074     (0.0389585*)
Closetojun~n            -0.0487564   (0.0241277*)
Light_Cond~s            0.0202728    (0,069643*)
Fogormist               0.6049933    (0.4869581*)
Rain                    0.0720242    (0.0504925*)
Snow                    0.3814812    (0.4525281*)
Wetroad                 -0.0333851   (0.0417999*)
Frostice                0.4404122    (0.261792*)
Badroadmai~e            0.0101642    (0.0666238*)
BreathAlco~l            0.0013927    (0.0002318*)
Apurposeas~k            0.407608     (0.1856447)
Bpurposeco~k            0.3969768    (0.192715*)
Age_of_Veh~e            -0.0009889   (0.0022916*)
Car                     0.0289897    (0.033572*)
Motocycle               0.008398     (0.379719*)
Goodsvehic~s            0
busminibus              0.0354432    (0.0440995*)
othertypes~s            0
Observations            2312
R-squared               0.1486

After the regression we conducted the Breusch-Pagan test for heteroskedasticity, to make sure that the regression does not lead to misleading or biased inference. The test indicated that heteroskedasticity was unlikely in this case, so we did not use the robust option for this regression. The regression shows that some variables are much more significant than others. For instance, the age of the driver, light conditions, gender, BAT, location, and the purpose of driving are among the significant ones. Of the weather conditions, none is significant except frost and ice, which we therefore keep in our subsequent regressions. At this stage we are not yet interpreting the impact of the variables on the probability of causing a car crash; we are only making sure that we include all of the significant variables.

The results on the weather conditions seem suspicious. We therefore test fog or mist, snow, and wet road for joint significance using an F-test. The test does not reject the null hypothesis that the coefficients on these variables are jointly zero, so we exclude them from the regression.
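For reference, the joint test compares the fit of the full regression with the fit of the regression that drops the weather dummies. The sketch below uses the R-squared form of the F statistic. Only the unrestricted R-squared (0.1486) and the number of observations (2312) come from the table above; the restricted R-squared is a hypothetical value chosen for illustration, and the count of 20 slope coefficients is our reading of the table, so the printed statistic is not the one Stata reported.

```python
def f_stat_from_r2(r2_unrestricted, r2_restricted, q, n, k):
    """F statistic for q exclusion restrictions (R-squared form).

    n is the number of observations and k the number of slope
    coefficients in the unrestricted model (the intercept is extra).
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator


# From the general regression reported in the text.
R2_UNRESTRICTED = 0.1486
N_OBS = 2312

# Assumptions for illustration only (not reported in the paper):
K_SLOPES = 20                        # slopes with a non-zero estimate
Q = 3                                # Fogormist, Snow, Wetroad
R2_RESTRICTED_HYPOTHETICAL = 0.1480  # made-up restricted R-squared

f_stat = f_stat_from_r2(
    R2_UNRESTRICTED, R2_RESTRICTED_HYPOTHETICAL, Q, N_OBS, K_SLOPES
)
print(f"F({Q}, {N_OBS - K_SLOPES - 1}) = {f_stat:.3f}")
```

A statistic this small would be far below the usual critical values, which is the situation described above: dropping the variables costs almost nothing in fit, so the null hypothesis of joint insignificance is not rejected.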

As part of our empirical results we estimated three specifications built from the statistically significant independent variables of the previous regression. The dependent variable in each specification is causedbydriver, a dummy variable equal to 1 if the accident was caused by the driver of the vehicle. Because it is binary, each regression is a linear probability model, and the coefficients are read as changes in the probability that the accident was caused by the driver.

Testing the gender as a factor for the higher possibility of the car accidents
Dependent variable: causedbydriver

Independent variables            (1)            (2)            (3)
Female                           -0.3566255     -0.3618095     -0.3605597
                                 (0.0154588)*   (0.0157997)*   (0.0157842)*
BreathAlcoholLevelmicrog100ml    -              0.0009082      0.0010463
                                                (0.0001875)*   (0.0001925)*
Ageofdriver                      -              -0.0033131     -0.003418
                                                (0.0005542)*   (0.000555)*
Closetojunction                  -              -              -0.05024
                                                               (0.0193117)*
Light_Conditions                 -              -              0.01577
                                                               (0.0054466)*
Apurposeaspartofwork             -              -              -0.0134018
                                                               (0.0178304)*
Frostice                         -              -              0.2357548
                                                               (0.2017408)*
Intercept                        0.7193706      0.8181745      0.830215
                                 (0.0089061)*   (0.0195961)*   (0.0275091)*
Observations                     3995           3697           3697
R-squared                        0.1176         0.1384         0.1406

In the first specification, we regressed causedbydriver on female, which gives the equation causedbydriver = α + β1*female + u. The estimated equation is causedbydriver = 0.7193706 - 0.3566255*female, where female, according to a t-test, is a dummy variable that is highly significant at the 1% level.
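With a single dummy regressor, the OLS intercept is the sample mean of the dependent variable for men and the slope is the difference between the female and male means. The sketch below checks this on a small made-up sample (the counts are hypothetical and chosen only to be easy to verify by hand) and then reads the two fitted probabilities off the reported coefficients.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of an OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical sample: 200 male drivers (144 at fault) followed by
# 100 female drivers (36 at fault).
female = [0] * 200 + [1] * 100
caused = [1] * 144 + [0] * 56 + [1] * 36 + [0] * 64

alpha, beta1 = ols_simple(female, caused)
mean_male = 144 / 200
mean_female = 36 / 100
print(f"intercept = {alpha:.4f}, male mean = {mean_male:.4f}")
print(f"slope = {beta1:.4f}, difference in means = {mean_female - mean_male:.4f}")

# Fitted probabilities implied by the coefficients reported in the text.
A_HAT, B_HAT = 0.7193706, -0.3566255
p_male = A_HAT
p_female = A_HAT + B_HAT
print(f"fitted P(caused by driver): male {p_male:.3f}, female {p_female:.3f}")
```

Read this way, the first specification says that the driver was at fault in roughly 72% of the accidents involving a male driver and roughly 36% of those involving a female driver.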

To make sure that the regression is reliable, we tested it for heteroskedasticity. The Breusch-Pagan test rejects homoskedasticity (Prob>chi2 = 0.0081), so the regression is heteroskedastic. Heteroskedasticity can make the usual standard errors misleading, which is why we re-estimated the same specification using robust standard errors.
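The idea behind the test can be sketched in a few lines: regress the squared OLS residuals on the regressor and check whether that auxiliary regression explains anything. The code below implements the n*R-squared (Koenker) form of the statistic for one regressor; this is an assumption on our part, since Stata's default version of the test is computed differently, and the sample is the same kind of made-up data as before, so the output does not reproduce the reported Prob>chi2.

```python
import math


def ols_simple(x, y):
    """Return (intercept, slope) of an OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def breusch_pagan_lm(x, y):
    """LM = n * R^2 from regressing squared residuals on x (1 regressor)."""
    n = len(x)
    a, b = ols_simple(x, y)
    u2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]

    # Auxiliary regression of the squared residuals on x.
    c, d = ols_simple(x, u2)
    u2_bar = sum(u2) / n
    sst = sum((v - u2_bar) ** 2 for v in u2)
    if sst == 0.0:
        return 0.0, 1.0  # constant squared residuals: nothing to explain
    ssr = sum((v - c - d * xi) ** 2 for xi, v in zip(x, u2))
    lm = max(0.0, n * (1.0 - ssr / sst))

    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Hypothetical sample: 200 male drivers (144 at fault), 100 female (36).
female = [0] * 200 + [1] * 100
caused = [1] * 144 + [0] * 56 + [1] * 36 + [0] * 64

lm, p_value = breusch_pagan_lm(female, caused)
print(f"LM = {lm:.3f}, p-value = {p_value:.3f}")
```

In a linear probability model the error variance equals p(1 - p), so it differs between groups whenever their fitted probabilities differ; some heteroskedasticity is therefore expected by construction, and robust standard errors are a natural default.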

OLS Estimation of the causedbydriver Equation

Independent variables   With nonrobust standard errors   With robust standard errors
female                  -0.3566255                       -0.3566255
                        (0.0154588)                      (0.018143)

After correcting for heteroskedasticity, the coefficient on female remains highly significant. The negative sign of β1 implies that, when the driver is female, the probability that the accident was caused by the driver is around 36 percentage points lower than when the driver is male. This result is in line with the article by Julie Zauzmer.
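The table shows the same coefficient in both columns because robust standard errors change only the estimated variance of the OLS estimator, not the estimate itself. A minimal sketch for one regressor follows; it uses the n/(n - 2) small-sample scaling (often called HC1), which we assume matches what the robust option computes, and a made-up sample rather than the actual data.

```python
import math


def ols_with_standard_errors(x, y):
    """Slope of y on x with classical and heteroskedasticity-robust SEs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical SE: one common error variance for every observation.
    s2 = sum(u * u for u in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # Robust SE: each observation contributes its own squared residual.
    meat = sum((d * u) ** 2 for d, u in zip(dx, resid))
    se_robust = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return slope, se_classical, se_robust


# Hypothetical sample: 200 male drivers (144 at fault), 100 female (36).
female = [0] * 200 + [1] * 100
caused = [1] * 144 + [0] * 56 + [1] * 36 + [0] * 64

slope, se_classical, se_robust = ols_with_standard_errors(female, caused)
print(f"slope = {slope:.4f}")
print(f"classical SE = {se_classical:.4f}, robust SE = {se_robust:.4f}")
```

As in the table, the slope is computed once and only the standard error differs between the two calculations.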

In the second specification, we added two more independent variables to see how the results change once we reduce the risk of omitted-variable bias in the coefficient on female.

The equation we estimate is:

causedbydriver = α + β1*female + β2*BreathAlcoholLevelmicrog100ml + β3*Ageofdriver + u

The estimated equation is:

causedbydriver = 0.8181745 - 0.3618095*female + 0.0009082*BreathAlcoholLevelmicrog100ml - 0.0033131*Ageofdriver

With heteroskedasticity-robust standard errors, all coefficients are highly significant at the 1% level. The coefficient on female changes very little relative to the first specification (from -0.3566 to -0.3618), which suggests that omitting the alcohol level and the age of the driver caused only a small bias and that these two variables have only a small effect on the probability. One explanation is that a driver who causes an accident was not necessarily drunk or too young; but if the driver was drunk or inexperienced, there is a higher chance that he or she causes an accident.

OLS Estimation of the causedbydriver Equation

Independent variables            With nonrobust standard errors   With robust standard errors
female                           -0.3618095                       -0.3618095
                                 (0.0157997)                      (0.0164824)
BreathAlcoholLevelmicrog100ml    0.0009082                        0.0009082
                                 (0.0001875)                      (0.0002519)
Ageofdriver                      -0.0033131                       -0.0033131
                                 (0.0005542)                      (0.0006065)

In the last specification, we included all the significant variables from the general regression. That leads to the equation:

causedbydriver = α + β1*female + β2*BreathAlcoholLevelmicrog100ml + β3*Ageofdriver + β4*Closetojunction + β5*Light_Conditions + β6*Apurposeaspartofwork + β7*Frostice + u

The estimated equation is:

causedbydriver = 0.830215 - 0.3605597*female + 0.0010463*BreathAlcoholLevelmicrog100ml - 0.003418*Ageofdriver - 0.05024*Closetojunction + 0.01577*Light_Conditions - 0.0134018*Apurposeaspartofwork + 0.2357548*Frostice

The standard errors were again corrected for heteroskedasticity using the robust option, which gives more reliable inference.

OLS Estimation of the causedbydriver Equation

Independent variables            With nonrobust standard errors   With robust standard errors
female                           -0.3605597                       -0.3605597
                                 (0.0157842)                      (0.0165358)
BreathAlcoholLevelmicrog100ml    0.0010463                        0.0010463
                                 (0.0001925)                      (0.0002791)
Ageofdriver                      -0.003418                        -0.003418
                                 (0.000555)                       (0.0006115)
Closetojunction                  -0.05024                         -0.05024
                                 (0.0193117)                      (0.0191075)
Light_Conditions                 0.01577                          0.01577
                                 (0.0054466)                      (0.0056518)
Apurposeaspartofwork             -0.0134018                       -0.0134018
                                 (0.0178304)                      (0.0178519)
Frostice                         0.2357548                        0.2357548
                                 (0.2017408)                      (0.2472195)

The t-statistics show that all the independent variables remain highly significant except Apurposeaspartofwork and Frostice, which have p-values of 0.453 and 0.340 respectively.
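These two values can be reproduced from the robust column of the table: the t-statistic is the coefficient divided by its standard error, and the two-sided p-value follows from it. The sketch below uses the normal approximation to the t distribution, which is an assumption on our part but is harmless with about 3700 observations.

```python
import math


def two_sided_p_value(coef, se):
    """Return (t, p) using the standard normal approximation."""
    t_stat = coef / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value


# Coefficients and robust standard errors from the third specification.
reported = {
    "Apurposeaspartofwork": (-0.0134018, 0.0178519),
    "Frostice": (0.2357548, 0.2472195),
}

results = {
    name: two_sided_p_value(coef, se) for name, (coef, se) in reported.items()
}
for name, (t_stat, p_value) in results.items():
    print(f"{name}: t = {t_stat:.3f}, p = {p_value:.3f}")
```

Both p-values are far above any conventional significance level, so neither variable is individually significant in this specification.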

The result of the last regression is the most reliable, since it includes more variables and so reduces omitted-variable bias relative to the second specification. The results show that even after reducing the possibility that female is correlated with the error term, the variable stays highly significant. It implies that, for a female driver, the probability of a driver-caused accident is about 36 percentage points lower. BreathAlcoholLevelmicrog100ml, Ageofdriver, Closetojunction and Light_Conditions have very small effects on this probability (about 0.1, 0.3, 5, and 1.6 percentage points per unit, respectively) but remain highly statistically significant.
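To make these magnitudes concrete, the estimated equation of the third specification can be evaluated for a given driver. In the sketch below the coefficients are the reported ones, while the baseline profile (a 30-year-old driver, zero alcohol reading, close to a junction, lightest conditions) is hypothetical and chosen only for illustration.

```python
# Coefficients of the third specification, as reported in the text.
COEFFICIENTS = {
    "Intercept": 0.830215,
    "female": -0.3605597,
    "BreathAlcoholLevelmicrog100ml": 0.0010463,
    "Ageofdriver": -0.003418,
    "Closetojunction": -0.05024,
    "Light_Conditions": 0.01577,
    "Apurposeaspartofwork": -0.0134018,
    "Frostice": 0.2357548,
}


def fitted_probability(driver):
    """Linear prediction; variables missing from `driver` count as 0."""
    value = COEFFICIENTS["Intercept"]
    for name, coef in COEFFICIENTS.items():
        if name != "Intercept":
            value += coef * driver.get(name, 0.0)
    return value


# Hypothetical baseline profile (not taken from the data).
baseline = {"Ageofdriver": 30, "Closetojunction": 1, "Light_Conditions": 1}

p_male = fitted_probability(baseline)
p_female = fitted_probability({**baseline, "female": 1})
p_older = fitted_probability({**baseline, "Ageofdriver": 40})

print(f"male, age 30:   {p_male:.3f}")
print(f"female, age 30: {p_female:.3f}")
print(f"male, age 40:   {p_older:.3f}")
```

Because the model is linear, the gap between the male and female predictions equals the coefficient on female for every profile, and ten additional years of age lower the fitted probability by about 3.4 percentage points.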

We explored further the influence of gender on car accidents by checking whether women are more distracted on the road than men. To do so, we ran a new regression with a different dependent variable, causedbypassenger, using the model causedbypassenger = α + β1*female + u.

V. References

• Zauzmer, Julie. "Men Are Worse Drivers, Reading Causes More Crashes than Eating, and 6 Other Facts about D.C. Accidents." Washington Post. July 23, 2014. Accessed April 28, 2015. http://www.washingtonpost.com/blogs/dr-gridlock/wp/2014/07/23/men-are-worse-drivers-reading-causes-more-crashes-than-eating-and-6-other-facts-about-d-c-accidents/.

• "Road Safety." Road Safety. Accessed April 28, 2015. http://www.roadsafetymayo.ie/CausesofAccidents/.

• Pines, Michael. "Top 3 Causes of Car Accidents in America." Driverscom RSS. February 19, 2013. Accessed April 28, 2015. http://www.drivers.com/article/1173/.

• "Publications." GOV.UK. January 1, 2011. Accessed April 30, 2015. https://www.gov.uk/government/publications.

• "Breath Alcohol Test: US National Medical Association." U.S. National Library of Medicine. Accessed April 30, 2015. http://www.nlm.nih.gov/medlineplus/ency/article/003632.htm.