A Comparative Analysis of Empirical Bayes and Bayesian Hierarchical Models in Hotspot Identification

By

Xiaoyu Guo, Graduate Research Assistant, Texas A&M Transportation Institute, 3135 TAMU, College Station, TX 77843-3135. Tel: (979) 845-8408, fax: (979) 845-6481, Email: [email protected]

Lingtao Wu*, Ph.D., Assistant Research Scientist, Texas A&M Transportation Institute, Texas A&M University System, 3135 TAMU, College Station, Texas 77843-3135. Phone: (979) 845-7214, fax: (979) 845-6481, Email: [email protected]

Yajie Zou, Ph.D., Associate Professor, Department of Transportation Engineering, Tongji University, 4800 Cao'an Road, Shanghai 201804. Phone: +86 (21) 6958-8152, Email: [email protected]

Lee Fawcett, Ph.D. Lecturer School of Mathematics, Statistics & Physics Newcastle University Newcastle upon Tyne, NE1 7RU, UK Phone: +44 (0)191-2087228, fax: +44 (0)191-2087228 Email: [email protected]

Word count: 7,488 Words (5,788 Text + 7 Tables * 250 each)

November 15, 2018 *Corresponding Author Guo, Wu, Zou, Fawcett 1

ABSTRACT

Hotspot identification (HSID) is an important step in the highway safety management process. Errors in HSID may result in an inefficient use of limited resources for safety improvements. The empirical Bayes (EB)-based HSID method has been widely applied as an effective approach to identifying hotspots. However, the EB approach has some limitations. It assumes that the parameter estimates of the safety performance function (SPF) are correct without any uncertainty, and it does not consider the temporal instability in crashes that has been reported in recent studies. The Bayesian hierarchical model is an emerging technique that addresses these limitations of the EB method. Thus, the objective of this study is to compare the performance of the standard EB method and the Bayesian hierarchical model in identifying hotspots. Three methods (i.e., the crash rate, EB, and Bayesian hierarchical model-based methods) were applied to identify risky intersections at different significance levels. Four evaluation tests (i.e., the Site Consistency, Method Consistency, Total Rank Differences, and Poisson Mean Differences tests) were conducted to assess the performance of these three methods. The test results suggest that: (1) the Bayesian hierarchical model outperforms the crash rate and EB-based methods in most cases and improves the accuracy of HSID significantly; and (2) hotspots identified with crash rates are generally unreliable. It is important for roadway agencies and practitioners to rank sites in the roadway network accurately in order to manage safety investments effectively. Roadway agencies and practitioners are encouraged to consider Bayesian hierarchical models in identifying hotspots.


Keywords: Roadway Safety, Hotspot Identification, Empirical Bayes, Bayesian Hierarchical Model

INTRODUCTION

The identification of crash hotspots (also known as prone sites, sites with promise, or black spots) is one of the most important tasks in the roadway safety management process. Errors in hotspot identification (HSID) can result in inefficient use of limited resources for safety improvements and cause additional loss of lives.

Various methods have been proposed for HSID (1-3), and researchers have been continuously improving them (4-8); unfortunately, the sites identified by different methods and ranking criteria are not identical (9) (the methods are not discussed in detail here due to space limitations). Observed crash counts and crash rates have often been used by roadway agencies, but analyses have shown that these two methods cannot account for the regression-to-the-mean (RTM) bias and are not reliable (10, 11). Empirical Bayes (EB)-based methods have shown superiority in estimating safety as well as in identifying hotspots (8, 10). The standard EB method combines the observed crash counts at a site with the predicted safety of similar sites. The latter is typically derived from a safety performance function (SPF). The EB method has been included in the first edition of the Highway Safety Manual (12) and is widely used in HSID for its ability to correct the RTM bias and increase estimation precision. Although many studies have shown that the EB method performs better than other common HSID methods, it is not without limitations. One critical issue with the EB method is the use of the SPF to predict crashes. The SPF is usually modeled using crash data from a pool of similar "reference" sites. In the conventional EB method, the SPF assumes that the number of crashes at each site follows a Poisson distribution and that the counts in different years are independent of each other. In other words, there is no yearly variation in safety at a site as long as the traffic volume and other key roadway features remain at the same level. However, this is often not true. With the evolution of vehicles, driving behavior, roadway design standards, etc., crash data are temporally unstable (13, 14). Without accounting for this instability, the HSID results estimated through the EB method may be inaccurate under certain situations. In addition, the EB method assumes that the parameters in a fitted SPF are the true values, which is also a problematic assumption, especially when the sample size of reference sites is small (15-17).

Recently, Fawcett et al. (18) proposed a novel Bayesian hierarchical model for estimating and predicting roadway safety. The main advantage of the Bayesian hierarchical model is that it incorporates crash counts from multiple time points, with the counts in more recent years lending more weight to safety estimates than the counts from time points further in the past. The model is thus able to capture the temporal trend of crashes at a site. The previous study discussed the structure of the Bayesian hierarchical model and its application to real crash data. It also noted that the standard EB method is a special case of the Bayesian hierarchical model (18). However, the HSID results of the two methods have not been compared, and it is unknown whether the Bayesian hierarchical model outperforms the commonly used EB method in HSID. The primary purpose of this paper is therefore to compare the performance of the EB method and the Bayesian hierarchical model in HSID. To achieve this objective, this study identifies hotspots among intersections using the two methods separately. Evaluation tests are then conducted to assess the performance of each method.

METHODOLOGY

The following two sections introduce the EB method and the Bayesian hierarchical model for identifying hotspots, respectively. For comparison purposes, the crash rate-based HSID method is also discussed in the first section.

Crash Rate and EB-based HSID Methods

As the name implies, the crash rate-based method calculates the rate of crashes at each site and ranks the sites by their rates. The rate is usually calculated by dividing the observed number of crashes by exposure (e.g., vehicle miles traveled, or VMT). There are a few types of crash rates: target crash rate (considering target collisions only), equivalent crash rate (converting crashes to the same severity level using different weights), etc. For the purpose of this study, we consider the total number of crashes and take the exposure to be the traffic volume traveling through an intersection (see Data Description for more details). Thus, the crash rate for a site is calculated as:

$$\text{Crash Rate}_{i,j} = \frac{K_{i,j}}{\text{Volume}_{i,j} \times N} \times 10^{5} \qquad (1)$$

where:

$\text{Crash Rate}_{i,j}$ = crash rate at site i in period j (per $10^{5}$ vehicles per year);

$K_{i,j}$ = total number of observed crashes at site i in period j;

N = number of years in period j; and

$\text{Volume}_{i,j}$ = traffic volume at site i in period j.
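The crash rate calculation can be sketched in a few lines of Python. This is only an illustration of Equation (1): the function name `crash_rate`, its argument names, and the input values are hypothetical and are not taken from the study's data or code.

```python
def crash_rate(crashes, volume, years, scale=1e5):
    """Crash rate per `scale` vehicles per year, following Equation (1)."""
    if volume <= 0 or years <= 0:
        raise ValueError("volume and years must be positive")
    return crashes / (volume * years) * scale


# Hypothetical site: 21 crashes observed over a 2-year period
# at an intersection with a traffic volume of 266 vehicles.
example_rate = crash_rate(crashes=21, volume=266, years=2)
print(round(example_rate, 1))  # 3947.4
```

Because the rate is a simple ratio, a low-volume site with a handful of crashes can produce a very large value, which is one reason rankings based on crash rates tend to be unstable.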

Since the crash rate relies simply on the observed number of crashes and exposure at the sites, the randomness of crashes and the RTM bias are not well addressed. It has been pointed out that HSID results using crash rates are not reliable (10, 19, 20). Safety researchers proposed using an EB approach to correct for the RTM bias (21-23). With the EB approach, an estimate of the long-term safety of an entity is obtained from two sources, as described above. Let K be the observed number of crashes, which is Poisson-distributed, and let k be the expected crash count; the EB estimator of k is given as:

$$\hat{k} = w \times E(k) + (1 - w) \times K \qquad (2)$$

where $\hat{k}$ denotes the EB estimate of the expected number of crashes. E(k) can be estimated by the crash prediction model. Many statistical models have been proposed by transportation safety analysts to predict safety (24, 25), for instance, the negative binomial (NB) (26), the Poisson-lognormal (27, 28), the Conway-Maxwell-Poisson (29, 30), the gamma (31), the Sichel (11, 24), the negative binomial-Lindley (32), and other models (33, 34). Among these models, the NB model has been the most frequently used for predicting crashes. The NB model has the following structure: the number of crashes y during a given time period is assumed to be Poisson-distributed, the probability mass function (PMF) for which is given by:

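$$f(y \mid \lambda) = \frac{\lambda^{y} \exp(-\lambda)}{y!}, \quad \lambda > 0,\ y = 0, 1, 2, \ldots \qquad (3)$$

where $\lambda$ = mean response of the observed crash counts during the given period.

The rate parameter $\lambda$ is assumed to be gamma-distributed with $E(\lambda) = \mu$ and $\mathrm{Var}(\lambda) = \mu^{2} \times \alpha$. Equation (4) shows the probability density function (PDF) for $\lambda$ (35):

$$g(\lambda \mid \mu, \alpha) = \frac{1}{(\mu \times \alpha)^{1/\alpha}\, \Gamma(1/\alpha)} \times \lambda^{1/\alpha - 1} \times \exp\left(-\frac{\lambda}{\mu\alpha}\right) \qquad (4)$$

The NB distribution can be viewed as a mixture of Poisson distributions where the Poisson rate $\lambda$ is gamma-distributed. The PMF of the NB is obtained by integrating $\lambda$ out of the product of Equations (3) and (4), as in Equation (5), which yields Equation (6) (readers are referred to Hilbe (36) for the complete derivation of the NB model):

$$f(y \mid \mu, \alpha) = \int_{0}^{\infty} \frac{\lambda^{y} \exp(-\lambda)}{y!} \times \frac{\lambda^{1/\alpha - 1} \exp(-\lambda/\mu\alpha)}{(\mu \times \alpha)^{1/\alpha}\, \Gamma(1/\alpha)} \, d\lambda \qquad (5)$$

$$f(y \mid \mu, \alpha) = \frac{\Gamma(y + 1/\alpha)}{\Gamma(y + 1) \times \Gamma(1/\alpha)} \times \left(\frac{\alpha \times \mu}{1 + \alpha \times \mu}\right)^{y} \times \left(\frac{1}{1 + \alpha \times \mu}\right)^{1/\alpha} \qquad (6)$$

where:

y = response variable;

µ = mean response of the observation; and

$\alpha$ = dispersion parameter.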
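As a numerical sanity check on the NB probability function, the sketch below evaluates it on the log scale and confirms that the probabilities sum to one, with mean µ and variance µ + αµ². The function names `nb_pmf` and `nb_moments` and the values µ = 3.5 and α = 0.8 are illustrative assumptions, not quantities from the study.

```python
import math


def nb_pmf(y, mu, alpha):
    """NB probability of y crashes for mean mu and dispersion alpha."""
    inv_alpha = 1.0 / alpha
    log_p = (
        math.lgamma(y + inv_alpha)
        - math.lgamma(y + 1)
        - math.lgamma(inv_alpha)
        + y * math.log(alpha * mu / (1.0 + alpha * mu))
        + inv_alpha * math.log(1.0 / (1.0 + alpha * mu))
    )
    return math.exp(log_p)


def nb_moments(mu, alpha, y_max=2000):
    """Total probability, mean, and variance from a truncated sum over y."""
    probs = [nb_pmf(y, mu, alpha) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


# Illustrative values only: mu = 3.5 crashes, alpha = 0.8.
total, mean, var = nb_moments(mu=3.5, alpha=0.8)
print(round(total, 6), round(mean, 6), round(var, 6))
```

Working with `math.lgamma` rather than the gamma function itself avoids overflow for large counts.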

With the NB model structure, the weight factor w in the EB method is given as (37):

$$w = \frac{1}{1 + \alpha \times E(k)} \qquad (7)$$

For the detailed procedures of estimating roadway safety and ranking sites using the EB method, readers are referred to (21, 37, 38).
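A minimal sketch of the EB calculation, combining the weight in Equation (7) with the estimator in Equation (2), is given below. The function names and the example inputs (an SPF prediction of 5 crashes, 12 observed crashes, and α = 0.2) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def eb_weight(predicted, alpha):
    """Weight w from Equation (7): 1 / (1 + alpha * E(k))."""
    return 1.0 / (1.0 + alpha * predicted)


def eb_estimate(observed, predicted, alpha):
    """EB estimate from Equation (2): w * E(k) + (1 - w) * K."""
    w = eb_weight(predicted, alpha)
    return w * predicted + (1.0 - w) * observed


# Hypothetical site: the SPF predicts 5 crashes, 12 were observed, alpha = 0.2.
w = eb_weight(predicted=5.0, alpha=0.2)
k_hat = eb_estimate(observed=12, predicted=5.0, alpha=0.2)
print(w, k_hat)  # 0.5 8.5
```

The estimate always lies between the SPF prediction and the observed count. A larger dispersion parameter or a larger predicted count lowers w, which pulls the estimate toward the observed count.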

Bayesian Hierarchical Model

Rather than estimating from a single before period, the Bayesian hierarchical model proposed by Fawcett et al. (18) incorporates counts from multiple past periods. The variation in the past crash counts is adjusted using an SPF [note that the researchers used the term accident prediction model (APM), which is essentially the same as an SPF]. For evaluating the model, a discrete time indicator t is defined over the previous years and the current year, i.e., t < 0 and t = 0, respectively.

The SPF is used to account for the effects of global trend and RTM, and it is assumed that:

$$Y_j(t) \mid \lambda_j(t) \sim \begin{cases} \mathrm{Poisson}\big(\lambda_j(t)\big), & t \ge 0 \\ \mathrm{NB}\left(r = \dfrac{\lambda_j(t)}{\psi(t) - 1},\ p = \dfrac{1}{\psi(t)}\right), & t < 0 \end{cases} \qquad (8)$$

where $\lambda_j(t) = \rho_j\, \mu_j(t) \exp(b_j t)$ with $\rho_j \in \mathbb{R}^{+}$, $b_j \in \mathbb{R}$, and $\mu_j(t)$ estimated from a global SPF for all t; and $\psi(t) = \exp(-\kappa t)$ with $\kappa \in \mathbb{R}^{+}$ for t < 0. We choose a log-linear SPF to estimate $\mu_j(t)$, with an assumed NB over-dispersion parameter $\gamma$, as below:

$$\mu_j(t) = \exp\left(\beta_0 + \beta_1 t + \sum_{k=1}^{n} \theta_k \log(x_{jk}) + \sum_{l=1}^{m} \delta_l u_{jl}\right) \qquad (9)$$

Here, $x_{jk}$ and $u_{jl}$ denote, respectively, the n covariates related to traffic volumes and the m other observations at each site j. For the prior distributions, Fawcett et al. (18) suggested using $\rho_j \sim \Gamma(\gamma, \gamma)$; $b_j = z_j \omega_j$ with $\omega_j \sim N(c, d)$ and $z_j \sim \mathrm{Bernoulli}(p)$; and $\kappa \sim \Gamma(2, 20)$ after extensive prior trials. For $\omega_j$ and $z_j$, we assume that d follows a gamma distribution, c follows a normal distribution, and p is beta-distributed. The regression coefficients in the SPF are given uninformative priors, so the model is relatively data-driven. Then, a Markov chain Monte Carlo (MCMC) procedure is employed to simulate the mean crash rates $\lambda_j(t)$ in the Bayesian framework, by making inferences on $\rho_j$, $b_j$, and $\kappa$ with the prior specifications described above. With this structure, the Bayesian hierarchical model allows more recent observations to inform predictions with more certainty than observations further in the past. The choice of prior distribution for $b_j$ ensures that a local site-specific trend can be accounted for, over and above the global trend identified across the network, if this local trend at the site level is deemed significant.
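The way past observations are down-weighted can be illustrated with a short sketch. It assumes a reading of the model in which the site mean is the SPF prediction scaled by a site-specific factor and a local exponential trend, and in which a past count (t < 0) is NB-distributed with parameters r = mean / (ψ − 1) and p = 1/ψ for an inflation term ψ = exp(−κt). The names `rho`, `b`, `kappa`, and `psi`, the convention that p is the NB success probability, and all numerical values are assumptions made for illustration only.

```python
import math


def site_mean(mu, rho, b, t):
    """Site mean: SPF prediction mu scaled by rho and a local trend exp(b * t)."""
    return rho * mu * math.exp(b * t)


def inflation(kappa, t):
    """psi(t) = exp(-kappa * t); exceeds 1 for past periods (t < 0)."""
    return math.exp(-kappa * t)


def past_nb_params(lam, kappa, t):
    """NB parameters (r, p) assumed for a count observed at a past time t < 0."""
    if t >= 0:
        raise ValueError("the NB form applies to past periods only (t < 0)")
    psi = inflation(kappa, t)
    return lam / (psi - 1.0), 1.0 / psi


def nb_mean_var(r, p):
    """Mean and variance of an NB(r, p) count, with p the success probability."""
    return r * (1.0 - p) / p, r * (1.0 - p) / p ** 2


# Hypothetical values: SPF prediction 10, rho = 1.5, b = -0.02, three periods back.
lam = site_mean(mu=10.0, rho=1.5, b=-0.02, t=-3)
r, p = past_nb_params(lam, kappa=0.1, t=-3)
mean, var = nb_mean_var(r, p)
print(round(lam, 3), round(mean, 3), round(var / mean, 3))
```

Under these assumptions the past count keeps the mean of the site but its variance is inflated by ψ(t), which grows the further back the observation lies. This is the mechanism by which older counts carry less information than recent ones.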

Evaluation Methods

The crash rate method, the EB method, and the Bayesian hierarchical model are all techniques for identifying hotspots. Evaluation tests are needed to measure their performance in HSID. In most previous studies related to HSID, the percentage of false positives (FP) and the percentage of false negatives (FN) are the only measures. An FP occurs when a safe site is identified as unsafe, whereas an FN occurs when an unsafe site fails to be identified. Because FP and FN are binary outcomes, they do not provide insight into the relative performance of HSID methods. For this study, four evaluation tests are implemented, namely: (1) the Site Consistency test; (2) the Method Consistency test; (3) the Total Rank Differences test; and (4) the Poisson Mean Differences test. These tests not only consider FP and FN but also take the ranking of sites into account when evaluating an HSID method. The tests follow the evaluation tests proposed by Cheng and Washington (19).

The Site Consistency test evaluates an HSID method by measuring how consistently a site identified as high-risk remains high-risk in a subsequent observation period. In period i, the n sites are ranked by their estimated crash rates or means, and the high-risk set consists of the sites k ranked from $(n - n\alpha)$ to n, i.e., those with estimates from $\lambda_{(n - n\alpha)}$ to $\lambda_{(n)}$, where $\alpha$ is the significance level that defines high risk. For each method j, Test 1 sums the crash rates of these identified high-risk sites in a later period $i + \tau$, that is, after a further observation time of $\tau$. Method A is better than the other methods when $T1_A > T1_j$ for all other methods j. The site consistency test is given as Equation (10):

$$T1_{j,\,i+\tau} = \sum_{k} \lambda_{k,\,i+\tau}, \quad k \in \{k_{(n - n\alpha),j,i},\ k_{(n - n\alpha + 1),j,i},\ \ldots,\ k_{(n),j,i}\};\ n \in \mathbb{Z}^{+} \qquad (10)$$

In the Site Consistency test, a higher output indicates a better method, since the sites it identifies remain consistently high-risk.
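The Site Consistency test can be sketched as follows. The helper names, the rule of flagging round(n × α) sites, and the four-site example are assumptions for illustration; they are not the study's implementation or data.

```python
def top_sites(estimates, alpha):
    """Return the round(n * alpha) site IDs with the highest estimates."""
    n_top = max(1, round(len(estimates) * alpha))
    ranked = sorted(estimates, key=estimates.get, reverse=True)
    return ranked[:n_top]


def site_consistency(estimates_base, outcomes_later, alpha):
    """T1: sum of later-period outcomes over sites flagged in the base period."""
    return sum(outcomes_later[s] for s in top_sites(estimates_base, alpha))


# Hypothetical safety estimates in the base period and outcomes in a later period.
base_period = {"A": 9.0, "B": 7.5, "C": 3.0, "D": 1.2}
later_period = {"A": 6, "B": 8, "C": 2, "D": 1}

t1 = site_consistency(base_period, later_period, alpha=0.5)
print(t1)  # 14
```

A method that flags sites whose later-period outcomes are high scores well, so methods can be compared directly by their T1 values for the same pair of periods.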

The Method Consistency test looks at the consistency of a method over time periods rather than at individual sites. For each method j, the high-risk sets $\{k_{(n - n\alpha)}, k_{(n - n\alpha + 1)}, \ldots, k_{(n)}\}$ identified in period i and in period $i + \tau$ may or may not intersect. The test counts the number of elements in the intersection of these sets, that is, the number of sites identified as high-risk in both periods, as shown in the equation below:

$$T2_{j} = \left| \{k_{(n - n\alpha)}, \ldots, k_{(n)}\}_{j,i} \cap \{k_{(n - n\alpha)}, \ldots, k_{(n)}\}_{j,i+\tau} \right|;\ n \in \mathbb{Z}^{+} \qquad (11)$$

In the Method Consistency test, a method is better if it scores higher than the other methods, meaning that it identifies a larger number of the same high-risk sites across the time periods.
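A short sketch of the Method Consistency test is shown below. As before, the helper names, the round(n × α) rule for the size of the high-risk set, and the example values are hypothetical.

```python
def top_sites(estimates, alpha):
    """Return the set of round(n * alpha) sites with the highest estimates."""
    n_top = max(1, round(len(estimates) * alpha))
    ranked = sorted(estimates, key=estimates.get, reverse=True)
    return set(ranked[:n_top])


def method_consistency(estimates_base, estimates_later, alpha):
    """T2: number of sites flagged as high-risk in both periods."""
    return len(top_sites(estimates_base, alpha) & top_sites(estimates_later, alpha))


# Hypothetical safety estimates from the same method in two periods.
base_period = {"A": 9.0, "B": 7.5, "C": 3.0, "D": 1.2}
later_period = {"A": 8.0, "B": 2.0, "C": 6.5, "D": 1.0}

t2 = method_consistency(base_period, later_period, alpha=0.5)
print(t2)  # 1
```

Here sites A and B are flagged in the base period and sites A and C in the later period, so only site A is common to both sets.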

Further, the third test, the Total Rank Differences test, compares the rankings of the high-risk sites identified in base period i with their rankings in a later time period (i + τ). That is, for every method j, there are k sites identified as high-risk sites based on period i. For these k sites, the corresponding rank ℜ is then found in period i + τ. The differences in rank are summed to give the output of this test, as shown in Equation (12):

$$T3_{j} = \sum_{k} \left( \Re_{k,i} - \Re_{k,i+\tau} \right); \quad k \in \{k_{(n - n\alpha)}, k_{(n - n\alpha + 1)}, \ldots, k_{(n)}\}_{j,i};\ n \in \mathbb{Z}^{+} \qquad (12)$$

Because it measures variation in ranking, a smaller output indicates better performance; that is, method A is preferred when $T3_A < T3_j$ for all other methods j.

Last, the Poisson Mean Differences test is used as an evaluation test in this study. It is an extension of the widely used false identification (FI) test, which determines the FNs and FPs produced by each HSID method. A site is an FN if it is truly hazardous but mistakenly identified by a method as safe. Conversely, a site that is truly safe but wrongly identified as hazardous is an FP. Hence, it is important to know which sites are truly hazardous and which are truly safe over time and space. However, the observed crash rate is site-specific and time-specific. The true Poisson mean (TPM) is therefore defined as the mean of the observed crash rates over the observation periods. The sites are then ranked by TPM as their true risk level, with a higher TPM assumed to indicate a truly riskier site. With the sites ranked by TPM, a critical TPM value is determined at significance level $\alpha$ in the ranking. This critical TPM, $TPM_{c}$, is the yardstick in the Poisson Mean Differences test. In this test, the FNs and FPs of each method are identified with their corresponding TPM values, $TPM_{FN}$ and $TPM_{FP}$, which together form the set $TPM_{FI} = TPM_{FN} \cup TPM_{FP}$. Then, for each HSID method, the absolute differences between $TPM_{FI}$ and $TPM_{c}$ are summed over sites and time periods. The test is formulated as:

$$T4_{j} = \sum_{i} \sum_{k} \left| TPM_{FI,k,i} - TPM_{c} \right| \qquad (13)$$

The Poisson Mean Differences test improves on the FI test by using the associated TPM difference as the weight. In this test, a relatively small value of T4 indicates fewer or less severe false identifications and therefore better performance in hotspot identification.
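The last two tests can be sketched together. Several choices below are assumptions rather than details confirmed by the text: rank differences are summed as absolute values, the high-risk set contains round(n × α) sites, and the critical TPM is taken as the TPM of the lowest-ranked truly hazardous site. All names and numbers are hypothetical.

```python
def n_flagged(n_sites, alpha):
    """Number of sites treated as high-risk at significance level alpha."""
    return max(1, round(n_sites * alpha))


def ranks(estimates):
    """Rank sites so that rank 1 has the highest estimate."""
    ordered = sorted(estimates, key=estimates.get, reverse=True)
    return {site: pos for pos, site in enumerate(ordered, start=1)}


def total_rank_difference(base, later, alpha):
    """T3: summed absolute rank change of the sites flagged in the base period."""
    base_rank, later_rank = ranks(base), ranks(later)
    cutoff = n_flagged(len(base), alpha)
    flagged = [s for s, r in base_rank.items() if r <= cutoff]
    return sum(abs(base_rank[s] - later_rank[s]) for s in flagged)


def poisson_mean_difference(flagged, true_means, alpha):
    """T4 for one period: sum of |TPM - critical TPM| over FPs and FNs."""
    cutoff = n_flagged(len(true_means), alpha)
    hazardous = {s for s, r in ranks(true_means).items() if r <= cutoff}
    critical = min(true_means[s] for s in hazardous)
    false_ids = hazardous ^ set(flagged)  # false negatives and false positives
    return sum(abs(true_means[s] - critical) for s in false_ids)


# Hypothetical estimates in two periods and hypothetical true Poisson means.
base_period = {"A": 9.0, "B": 7.5, "C": 3.0, "D": 1.2}
later_period = {"A": 8.0, "B": 2.0, "C": 6.5, "D": 1.0}
true_means = {"A": 10.0, "B": 8.0, "C": 4.0, "D": 2.0}

t3 = total_rank_difference(base_period, later_period, alpha=0.5)
t4 = poisson_mean_difference(["A", "C"], true_means, alpha=0.5)
print(t3, t4)  # 1 4.0
```

In the T4 example, site B is a false negative with a TPM equal to the critical value, so it adds nothing, while the false positive C adds its distance of 4 from the critical TPM. A misidentification close to the cutoff is therefore penalized less than one far from it.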

It is worth mentioning that researchers and practitioners have been using various measures in ranking sites or network screening, for example, the observed crash number or rate, the EB estimate, the difference between the EB estimate and the SPF prediction (also known as potential for safety improvement, PSI), and the ratio between the EB estimate and the SPF prediction. Some previous studies (10) have shown that the EB method performs better than the others in identifying hotspots. Thus, we utilized the EB method in this study.

DATA DESCRIPTION

This study utilized the same data used by Fawcett et al. (18), but some filters were applied to exclude outliers and incomplete cases. The dataset includes annual accident counts from 2004 to 2012 at 734 intersections in the city of Halle, Germany. There are two types of observations in the dataset: numerical and binary. The numerical observations x are the average traffic volume and the traffic volumes on the major and minor streets. The infrastructure observations u are represented in binary format, such as signalized or non-signalized intersection, major or non-major intersection, and so on. Several adjustments were made to the original dataset. In safety analysis, if the traffic volume at a site is relatively low, collisions are rare and the true Poisson mean is small, making the parameter estimates unstable (17). In addition, sites with low traffic volume are of less interest to roadway agencies and practitioners. Hence, thresholds of 100 vehicles and 50 vehicles were applied to the major and minor traffic volumes, respectively. That is, only the sites with a major volume of more than 100 vehicles and a minor volume of more than 50 vehicles are included in this study. Meanwhile, considering that area type and traffic control have a significant effect on both the operation and safety of intersections, this study focused on unsignalized intersections in urban areas only. These filters leave 186 sites for this study. TABLE 1 provides summary statistics of the crash counts and intersection features.

TABLE 1 Summary Statistics of Accident Counts and Observations (186 Sites)

Variable                    Min    Max       Mean (SD)
Crash (2004)                0      28        3.81 (4.30)
Crash (2005)                0      27        3.73 (4.34)
Crash (2006)                0      36        3.38 (4.42)
Crash (2007)                0      35        3.76 (4.55)
Crash (2008)                0      29        3.43 (4.09)
Crash (2009)                0      24        3.47 (4.10)
Crash (2010)                0      34        3.46 (4.29)
Crash (2011)                0      28        3.20 (4.27)
Crash (2012)                0      33        2.77 (3.83)
Average Volume (veh/day)    206    48,192    5,657.5 (7168.1)
Major Volume (veh/day)      117    38,341    4,316.9 (5447.2)
Minor Volume (veh/day)      52     23,498    1,340.6 (2781.0)
Speed Limit 30 kmph         Yes: 32.3%; No: 67.7%
Speed Limit 45 kmph         Yes: 23.1%; No: 76.9%
Speed Limit 50 kmph         Yes: 25.8%; No: 74.2%
Speed Limit 60 kmph         Yes: 12.4%; No: 87.6%
Speed Limit 70 kmph         Yes: 4.3%; No: 95.7%
Four Legs                   Yes: 23.1%; No: 76.9%

Note: SD = standard deviation.
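The site selection rules described in the Data Description can be expressed as a simple filter. The record layout and field names (`major_volume`, `minor_volume`, `urban`, `signalized`) are hypothetical, since the structure of the original dataset is not given here.

```python
def select_sites(sites, major_min=100, minor_min=50):
    """Keep urban, unsignalized sites whose volumes exceed both thresholds."""
    return [
        s for s in sites
        if s["major_volume"] > major_min
        and s["minor_volume"] > minor_min
        and s["urban"]
        and not s["signalized"]
    ]


# Hypothetical records, one per intersection.
sites = [
    {"id": 1, "major_volume": 4300, "minor_volume": 1300, "urban": True, "signalized": False},
    {"id": 2, "major_volume": 90, "minor_volume": 60, "urban": True, "signalized": False},
    {"id": 3, "major_volume": 5200, "minor_volume": 40, "urban": True, "signalized": False},
    {"id": 4, "major_volume": 8000, "minor_volume": 2500, "urban": True, "signalized": True},
    {"id": 5, "major_volume": 3000, "minor_volume": 900, "urban": False, "signalized": False},
]

kept_ids = [s["id"] for s in select_sites(sites)]
print(kept_ids)  # [1]
```

Only the first record passes all four conditions; the others fail on major volume, minor volume, signalization, and area type, respectively.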

RESULTS

Crash Rate and EB HSID

Crash rates at the 186 intersections were calculated for every two-year period using the definition documented in the previous section (i.e., Equation 1). A few examples are illustrated in TABLE 2. As can be seen, Site 74 ranked first in the period of 2004 & 2005, with a crash rate of 3,947.4 per 100,000 vehicles of traveling per year. Its crash rate varied from 1,879.7 to 2,255.6 in the following three periods, and it ranked sixth, tenth, and eighth in 2006 & 2007, 2008 & 2009, and 2010 & 2011, respectively. TABLE 2 also lists sites with relatively lower crash rates (see the bottom rows).

TABLE 2 Example Sites of Crash Rate based HSID Results

            Year
Site No.    2004-2005        2006-2007       2008-2009       2010-2011
31          3,358.2 (2)*     4,477.6 (1)     2,611.9 (3)     7,462.7 (1)
74          3,947.4 (1)      2,255.6 (6)     1,879.7 (10)    2,067.7 (8)
…           …                …               …               …
7           481.2 (47)       524.9 (39)      612.4 (39)      831.1 (34)
28          482.6 (46)       289.6 (69)      530.9 (44)      144.8 (92)
34          451.1 (49)       300.8 (64)      601.5 (40)      150.4 (91)
41          437.4 (50)       397.7 (54)      397.7 (54)      357.9 (60)
…           …                …               …               …
2           366.5 (63)       366.5 (56)      733.0 (36)      1,099.5 (27)
3           44.7 (146)       31.6 (147)      42.1 (140)      26.3 (146)
4           396.7 (58)       377.8 (55)      245.6 (74)      188.9 (81)
5           390.0 (61)       229.4 (79)      344.1 (63)      390.0 (57)

Note: * Number of total crashes per 100,000 vehicles of traveling per year; number in parentheses indicates ranking.

As has been discussed, the SPF is an important part of the EB method. This study modeled the data with an NB distribution, and the modeling results are shown in TABLE 3. It is worth mentioning that we tried to include all the variables in the model, since the main purpose of the SPF is prediction rather than inference. A few variables (e.g., speed limit 50, logarithm of major volume) are not significant at the 5% significance level, but they have been kept in the model. It is possible that some variables are correlated, for example volume and minor volume, making the parameter estimates relatively unstable. However, this does not affect the prediction, which is of more interest in the EB method.

TABLE 3 Safety Performance Function (SPF) of the NB Model

Variable                Estimate    Std. Error    z value    Pr(>|z|)    Significance
Intercept               -1.0648     0.3471        -3.0678    0.0022      99%
Speed Limit 30 kmph     0.9964      0.2120        4.7000     < 0.0001    99%
Speed Limit 45 kmph     0.6991      0.2308        3.0286     0.0025      99%
Speed Limit 50 kmph     0.3780      0.0860        4.3979     < 0.0001    99%
Speed Limit 60 kmph     0.5022      0.0605        8.3036     < 0.0001    99%
Speed Limit 70 kmph     0.4715      0.2082        2.2647     0.0235      95%
Major Intersection      0.1773      0.0520        3.4124     0.0006      99%
Four Legs               -0.4696     0.2528        -1.8578    0.0632      -*
Log (Major Volume)      1.3405      0.0704        -          -           95%
Log (Minor Volume)      -1.0648     0.3471        -3.0678    0.0022      99%
Log (Volume)            1.0564      0.2242        4.7117     < 0.0001    99%
Dispersion parameter    1.5314      0.2173        7.0458     < 0.0001    99%
AIC                     7,699.1

Note: * not significant at the 95% level.

Using the SPF, the number of crashes for each site can be predicted, and the EB estimate is then calculated. Taking site 9 as an example, the predicted number of crashes in 2004 is 8.39, and the observed number of crashes in that year is 28. The weight, calculated using Equation (7), is w = 0.15. The EB estimate is 0.15 × 8.39 + (1 − 0.15) × 28 = 24.8. Similarly, the EB estimate at this site for 2005 is 23.2. The expected number of crashes in 2004 and 2005 together at this site is 48.0, shown in TABLE 4, row 3. TABLE 4 (rows 3 to 14) lists the expected number of crashes as well as the ranking for a few sites.

TABLE 4 Example Sites of EB-based and Bayesian Hierarchical HSID Results

            Year
Site No.    2004-2005      2006-2007     2008-2009     2010-2011

EB-based Method
9           48.0 (1)*      34.4 (3)      34.4 (4)      26.8 (4)
17          44.1 (2)       62.8 (1)      44.1 (1)      55.1 (1)
159         2.4 (164)      4.3 (108)     5.0 (97)      6.9 (56)
94          6.8 (73)       5.2 (91)      4.4 (106)     19.3 (9)
142         1.8 (180)      2.5 (154)     2.5 (154)     5.9 (74)
…           …              …             …             …
1           11.8 (28)      8.2 (53)      12.5 (22)     7.5 (48)
2           7.1 (69)       12.9 (21)     12.2 (25)     12.9 (19)
5           12.8 (26)      9.5 (42)      8.2 (48)      15.4 (15)
7           10.3 (37)      14.8 (16)     11.8 (28)     15.5 (14)
8           5.8 (87)       2.8 (148)     2.8 (148)     2.0 (171)

Bayesian Hierarchical Method
9           42.0 (4)*      39.3 (4)      36.9 (4)      34.5 (4)
17          64.9 (1)       61.0 (1)      57.3 (1)      53.8 (1)
159         4.8 (105)      5.1 (95)      5.4 (83)      5.8 (74)
94          5.4 (96)       7.1 (69)      9.3 (42)      12.3 (21)
142         1.9 (161)      2.1 (153)     2.3 (144)     2.6 (130)
…           …              …             …             …
1           11.6 (40)      10.9 (38)     10.2 (38)     9.6 (39)
2           14.5 (21)      13.7 (20)     12.9 (20)     12.1 (22)
5           15.5 (16)      14.5 (15)     13.6 (15)     12.8 (17)
7           15.7 (15)      14.7 (14)     13.8 (14)     13.0 (16)
8           2.4 (148)      2.3 (149)     2.1 (151)     2.0 (150)

Note: * Expected number of crashes (in two years); number in parentheses indicates ranking.


Bayesian Hierarchical Model

The Bayesian hierarchical model is coded in R for this study. The SPF defined in Equation (9) estimates µ, together with the regression coefficients and the dispersion parameter. The results are shown in TABLE 5. The covariates for urban area, intersection, signalization, major intersection, four-leg intersection, and minor street traffic volume showed a significant impact on the number of crashes. This is broadly consistent with the NB model in the EB method (please see TABLE 3). The estimated coefficient for year is negative, indicating that the number of crashes over the network follows a decreasing trend (absent significant changes in other conditions).

TABLE 5 Regression Results of SPF in the Bayesian Hierarchical Model

Variable                Estimate    Std. Error    z value    Pr(>|z|)    Significance
Intercept               -1.1926     0.3487        -3.4200    0.0006      99%
Speed Limit 30 kmph     1.0658      0.2239        4.7595     < 0.0001    99%
Speed Limit 45 kmph     1.5398      0.2171        7.0933     < 0.0001    99%
Speed Limit 50 kmph     0.9092      0.2140        4.2489     < 0.0001    99%
Speed Limit 60 kmph     0.9974      0.2118        4.7102     < 0.0001    99%
Speed Limit 70 kmph     0.7091      0.2305        3.0762     0.0021      99%
Major Intersection      0.3839      0.0857        4.4787     < 0.0001    99%
Four Legs               0.5083      0.0603        8.4314     < 0.0001    99%
Log (Major Volume)      0.4754      0.2076        2.2900     0.0220      95%
Log (Minor Volume)      0.1766      0.0518        3.4065     0.0007      99%
Log (Volume)            -0.4742     0.2521        -1.8809    0.0600      -*
Year                    -0.0316     0.0098        -3.2311    0.0012      99%
Dispersion Parameter    1.3534      0.0714        -          -           99%
AIC                     7,690.9

Note: * not significant at the 95% level.

Moreover, the MCMC procedure requires prior distributions for the model parameters. According to Fawcett et al. (18), the prior for the time-dependent inflation parameter $\kappa$ should be $\kappa \sim \Gamma(2, 20)$. With the dispersion parameter obtained using maximum likelihood from the SPF, the prior of the site-specific crash modification factor $\rho$ is $\rho \sim \Gamma(1.3534, 1.3534)$. The term b, a prior weight on the local trend, is formed from z and $\omega$, as explained in the Methodology section, with $\omega \sim N(c, d)$ and zero inflation $z \sim \mathrm{Bernoulli}(p)$. We assumed $c \sim N(0, 10)$, $d \sim \Gamma(0.1, 0.01)$, and $p \sim \mathrm{B}(0.1, 0.1)$, which are all weakly informative priors. The MCMC procedure is then employed to simulate the mean crash rates $\lambda(t)$ in the Bayesian framework by making inferences on $\rho$, b, and $\kappa$ with the given prior specifications, for a total of 20,000 walks with 10 walks per step. TABLE 6 illustrates the parameters for sample sites 9, 17, 159, and 94, with the observed crash counts y, the SPF-predicted crash rates µ, and the mean crash rate λ in each pair of years. TABLE 6 shows that the MCMC results, which include the local deviation adjuster and discrepancy parameters ($\rho$ and b), estimate the crash counts better than the SPF predictions. For sites 9 and 17, the posterior means of b are very close to zero; that is, the trend of the MCMC results for these sites is almost parallel to the SPF-predicted crash rates. For sites 159 and 94, however, the values of b are relatively large, indicating a higher deviation in yearly crashes at the two intersections.

TABLE 6 Observed Crash Count, SPF Prediction and MCMC Results of Example Sites

Site No.         Year                                                 ρ                 b
           2004-2005   2006-2007   2008-2009   2010-2011

9     y    54          38          38          29           2.524             -0.0008
      µ    16.5        15.5        14.6        13.7         (2.091, 2.957)    (-0.016, 0.014)
      λ    42.0        39.3        36.9        34.5

17    y    49          71          49          62           3.7544            0.0003
      µ    17.3        16.3        15.3        14.3         (3.245, 4.264)    (-0.009, 0.009)
      λ    64.9        61.0        57.3        53.8

159   y    1           4           5           8            1.3794            0.0635
      µ    5.5         5.2         4.9         4.6          (0.174, 2.585)    (-0.155, 0.282)
      λ    4.8         5.1         5.4         5.8

94    y    6           4           3           22           1.8098            0.1707
      µ    10.6        10.0        9.4         8.8          (0.538, 3.082)    (-0.074, 0.415)
      λ    5.4         7.1         9.3         12.3

Note: y = observed crash count; µ = SPF prediction; λ = MCMC results.

Like the crash rate and EB results, TABLE 4 (rows 15 to 26) illustrates example results for the Bayesian hierarchical method. Taking site 9 as an example, the number of crashes predicted by the SPF for 2004 & 2005 is 16.5. From the MCMC, we obtain the associated ρ = 2.524 and b = −0.0008. The Bayesian hierarchical estimate λ is then 16.5 × 2.524 × exp(−0.0008) ≈ 42.0, shown in TABLE 4, row 15 and TABLE 6, row 5. The rank associated with each mean crash rate represents the ranking of that crash rate within the period. For example, site 17 ranks highest in every pair of years, with an expected mean crash number of 64.9 for 2004 & 2005, 61.0 for 2006 & 2007, 57.3 for 2008 & 2009, and 53.8 for 2010 & 2011. Site 17 ranks similarly, first or second, with the EB-based method. However, not all sites rank as consistently as site 17. For example, site 1 ranks 40th in 2004 & 2005 by the Bayesian hierarchical method but 28th by the EB-based method. It ranks 38th in 2006 & 2007 by the Bayesian hierarchical method but 53rd by the EB-based method. These differences motivate the evaluation of the HSID methods in the next section.

Evaluation Results

As stated in the Data Description section, there are a total of 186 sites. With significance levels of 0.025, 0.050, and 0.075, there are 5, 9, and 14 sites, respectively, considered as higher-risk for each HSID method and period. The four tests introduced in the Methodology section are implemented to evaluate the HSID results from the crash rate method, the EB method, and the Bayesian hierarchical method. TABLE 7 presents the results of the four tests.

TABLE 7 Results of Four Evaluation Tests on Crash Rate, EB and Bayesian Hierarchical Method

          2004&2005 vs. 2006&2007    2006&2007 vs. 2008&2009    2008&2009 vs. 2010&2011
Test      CR      EB      BH         CR      EB      BH         CR      EB      BH

α = 0.025
T1        32      144     129        27      137     152        37      138     138
T2        1       1       3          0       3       4          1       3       4
T3        36      27      20         44      23      3          47      5       4
T4        208     49      8          210     1       0          212     3       0

α = 0.050
T1        92      270     296        76      258     268        93      257     272
T2        6       6       8          3       6       9          5       6       9
T3        71      67      25         77      92      3          67      69      3
T4        389     24      22         391     22      8          363     18      24

α = 0.075
T1        158     293     303        149     258     290        163     276     291
T2        7       5       11         8       6       13         8       8       11
T3        95      162     29         115     271     9          112     171     15
T4        300     81      32         321     40      15         315     30      27

Note: Bold and underline indicate the highest performance; CR = Crash Rate; EB = Empirical Bayesian; BH = Bayesian Hierarchical.

The results of the four evaluation tests under α = 0.050 and 0.075 show that the Bayesian hierarchical method performs the best among the three HSID methods in all but one comparison (T4 for 2008&2009 vs. 2010&2011 at α = 0.050, where EB scores 18 against 24 for the Bayesian hierarchical method). EB follows as the second-best method in most tests, with the crash rate method generally performing the worst. The evaluation test scores for α = 0.025 are also shown in TABLE 7. As can be seen, the results are quite similar to those for α = 0.050 and 0.075. The Bayesian hierarchical model performed the best among the three methods in all the tests (except one case, in which the EB method is slightly better). The EB approach is generally better than the crash rate-based method.

CONCLUSIONS AND DISCUSSIONS

Hotspot identification is the first step in improving traffic safety, and it is an important component of the highway safety management process. Errors in HSID lead to inefficient use of limited resources for safety improvements. Initially, roadway agencies used observed crash numbers and crash rates to identify sites with promise, but this approach does not account for regression-to-the-mean (RTM) bias. Safety analysts therefore proposed using statistical models both for estimating safety and for HSID, and various models have since been applied extensively to identify hotspots. The EB technique has been shown to be an effective approach for identifying hotspots, and it has been widely used in recent decades (12). However, the EB approach has some limitations. It assumes that the SPF parameters used to predict the number of crashes are correct without any variation, and that the safety of a site is temporally independent and stable. Recent studies revealed that this is not always true (13). These limitations of the EB approach may result in errors in hotspot identification. Fawcett et al. (18) proposed a novel Bayesian hierarchical model structure for estimating roadway safety. This model is able to capture the temporal trend of crashes at a site, and it improves the accuracy of safety estimates. Thus, this study applied the Bayesian hierarchical model to HSID and compared its results with those of the standard EB method. The purpose was to examine whether the Bayesian hierarchical model can identify hotspots more accurately.
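The EB limitation noted above can be seen in a minimal sketch of the EB point estimate. This assumes the weighting form used in the Highway Safety Manual (12), in which the weight is 1 / (1 + k × µ) with k the overdispersion parameter of the SPF; the study's own formulation may differ. The function name is hypothetical and the dispersion value of 0.5 is purely illustrative, not taken from the study. Note that µ and k enter as fixed numbers, with no uncertainty and no time trend.

```python
def eb_estimate(observed, spf_prediction, dispersion):
    """EB estimate: weighted average of the SPF prediction and the observed count."""
    # Weight on the SPF prediction; the SPF output and dispersion are treated as known.
    weight = 1.0 / (1.0 + dispersion * spf_prediction)
    return weight * spf_prediction + (1.0 - weight) * observed

# Site 9, 2004 & 2005: y = 54 and mu = 16.5 as reported; dispersion is hypothetical.
eb_site9 = eb_estimate(observed=54, spf_prediction=16.5, dispersion=0.5)
print(round(eb_site9, 1))  # 49.9
```

The estimate always lies between the SPF prediction and the observed count, which is how the EB method corrects for RTM bias.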

To achieve the objective, this study analyzed crash data from 2004 to 2011 at 186 urban unsignalized intersections in the city of Halle, Germany. A certain number of top-ranked intersections were identified as hotspots with three methods (crash rate, EB, and the Bayesian hierarchical model) on a two-year basis. The identification results, safety estimates and rankings were examined using four evaluation measures: (1) Site Consistency test; (2) Method Consistency test; (3) Total Rank Differences test; and (4) Poisson Mean Differences test. All the test results indicate that the Bayesian hierarchical model performed the best among the three methods. The crash rate-based HSID method was overall the worst (unsurprisingly), and the EB method was much better than the crash rate method. This is in line with previous studies [e.g., (11), (39), and (2)]. In short, crash rate-based HSID is not recommended, since it produces unreliable HSID results. Although the differences between the EB and Bayesian hierarchical models in terms of HSID results were not as obvious as the differences between the EB and the crash rate-based methods, the Bayesian hierarchical model outperformed the EB approach in almost all the tests and periods. This study found that the Bayesian hierarchical model provides more accurate identification results than the other two methods. Considering the high costs associated with false identification of collision-prone sites, safety analysts and practitioners are encouraged to consider the Bayesian hierarchical model for HSID in order to distribute funds for roadway safety improvements reasonably.

There are a few limitations to this study. First, only a limited number of variables were included in the SPF development due to data availability, so the SPFs may suffer from the omitted variable problem, as discussed in Wu et al. (40) and Wu and Lord (41). Second, the Bayesian hierarchical approach to estimating safety and identifying hotspots requires MCMC, which may not be feasible for most transportation engineers. Software packages with an interface for deploying the Bayesian hierarchical analysis are needed for safety analysts and roadway agencies. Finally, all the analyses in this study were based on historical crash data, which is a passive (reactive) approach. Proactive safety has gained more attention in recent years. Both the Bayesian hierarchical model and the EB method can predict crashes in future time periods. This capability was not tested in this study and needs further examination in the future.

ACKNOWLEDGMENT

The authors would like to thank Newcastle Research Data Service for providing the dataset.

AUTHOR CONTRIBUTION STATEMENT

The authors confirm contribution to the paper as follows: study conception and design: Xiaoyu Guo, Lingtao Wu; data preparation: Xiaoyu Guo; analysis and interpretation of results: Lee Fawcett, Xiaoyu Guo, Lingtao Wu, Yajie Zou; draft manuscript preparation: Lee Fawcett, Xiaoyu Guo, Lingtao Wu, Yajie Zou. All authors reviewed the results and approved the final version of the manuscript.

REFERENCES

[1] Hauer, E. Identification of Sites with Promise. Transportation Research Record, Vol. 1542, 1996, pp. 54-60.
[2] Cheng, W., and S. P. Washington. Experimental Evaluation of Hotspot Identification Methods. Accident Analysis & Prevention, Vol. 37, No. 5, 2005, pp. 870-881.
[3] Miranda-Moreno, L. F., L. P. , F. F. Saccomanno, and A. Labbe. Alternative Risk Models for Ranking Locations for Safety Improvement. Transportation Research Record, Vol. 1908, 2005, pp. 1-8.
[4] Cheng, W., W. H. , X. D. , X. K. Wu, and J. . Ranking Cities for Safety Investigation by Potential for Safety Improvement. Journal of Transportation Safety & Security, Vol. 10, No. 4, 2018, pp. 345-366.
[5] Cheng, W., G. S. Gill, R. Dasu, M. Q. , X. D. Jia, and J. Zhou. Comparison of Multivariate Poisson Lognormal Spatial and Temporal Crash Models to Identify Hot Spots of Intersections Based on Crash Types. Accident Analysis & Prevention, Vol. 99, 2017, pp. 330-341.
[6] Park, B.-J., D. Lord, and C. Lee. Finite Mixture Modeling for Vehicle Crash Data with Application to Hotspot Identification. Accident Analysis & Prevention, Vol. 71, 2014, pp. 319-326.
[7] , C., M. A. Quddus, and S. G. Ison. Predicting Accident Frequency at Their Severity Levels and Its Application in Site Ranking Using a Two-Stage Mixed Multivariate Model. Accident Analysis & Prevention, Vol. 43, No. 6, 2011, pp. 1979-1990.
[8] Zou, Y. J., J. E. Ash, B. J. Park, D. Lord, and L. T. Wu. Empirical Bayes Estimates of Finite Mixture of Negative Binomial Regression Models and Its Application to Highway Safety. Journal of Applied Statistics, Vol. 45, No. 9, 2018, pp. 1652-1669.
[9] Hauer, E., and B. N. Persaud. Problem of Identifying Hazardous Locations Using Accident Data. Transportation Research Record, Vol. 975, 1984, pp. 36-43.
[10] Montella, A. A Comparative Analysis of Hotspot Identification Methods. Accident Analysis & Prevention, Vol. 42, No. 2, 2010, pp. 571-581.
[11] Wu, L., Y. Zou, and D. Lord. Comparison of Sichel and Negative Binomial Models in Hot Spot Identification. Transportation Research Record: Journal of the Transportation Research Board, Vol. 2460, 2014, pp. 107-116.
[12] AASHTO. Highway Safety Manual. American Association of State Highway and Transportation Officials, Washington, D.C., 2010.
[13] Mannering, F. Temporal Instability and the Analysis of Highway Accident Data. Analytic Methods in Accident Research, Vol. 17, 2018, pp. 1-13.

[14] Behnood, A., and F. L. Mannering. The Temporal Stability of Factors Affecting Driver-Injury Severities in Single-Vehicle Crashes: Some Empirical Evidence. Analytic Methods in Accident Research, Vol. 8, 2015, pp. 7-32.
[15] Persaud, B., B. Lan, C. Lyon, and R. Bhim. Comparison of Empirical Bayes and Full Bayes Approaches for before–after Road Safety Evaluations. Accident Analysis & Prevention, Vol. 42, No. 1, 2010, pp. 38-43.
[16] Persaud, B., and C. Lyon. Empirical Bayes before-after Safety Studies: Lessons Learned from Two Decades of Experience and Future Directions. Accident Analysis & Prevention, Vol. 39, No. 3, 2007, pp. 546-555.
[17] Lord, D. Modeling Motor Vehicle Crashes Using Poisson-Gamma Models: Examining the Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter. Accident Analysis & Prevention, Vol. 38, No. 4, 2006, pp. 751-766.
[18] Fawcett, L., N. Thorpe, J. Matthews, and K. Kremer. A Novel Bayesian Hierarchical Model for Road Safety Hotspot Prediction. Accident Analysis & Prevention, Vol. 99, No. Pt A, 2017, pp. 262-271.
[19] Cheng, W., and S. Washington. New Criteria for Evaluating Methods of Identifying Hot Spots. Transportation Research Record, Vol. 2083, 2008, pp. 76-85.
[20] Elvik, R. Comparative Analysis of Techniques for Identifying Locations of Hazardous Roads. Transportation Research Record, No. 2083, 2008, pp. 72-75.
[21] Hauer, E., D. W. Harwood, F. M. Council, and M. S. Griffith. Estimating Safety by the Empirical Bayes Method - a Tutorial. Transportation Research Record: Journal of the Transportation Research Board, Vol. 1784, 2002, pp. 126-131.
[22] Hauer, E. Observational before-after Studies in Road Safety: Estimating the Effect of Highway and Traffic Engineering Measures on Road Safety. Pergamon, Tarrytown, N.Y., U.S.A., 1997.
[23] Hauer, E. Empirical Bayes Approach to the Estimation of “Unsafety”: The Multivariate Regression Method. Accident Analysis & Prevention, Vol. 24, No. 5, 1992, pp. 457-477.
[24] Zou, Y., D. Lord, Y. , and Y. . Comparison of Sichel and Negative Binomial Models in Estimating Empirical Bayes Estimates. Transportation Research Record, Vol. 2392, 2013, pp. 11-21.
[25] Lord, D., and F. Mannering. The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives. Transportation Research Part A, Vol. 44, No. 5, 2010, pp. 291-305.
[26] Zou, Y., L. Wu, and D. Lord. Modeling over-Dispersed Crash Data with a Long Tail: Examining the Accuracy of the Dispersion Parameter in Negative Binomial Models. Analytic Methods in Accident Research, Vol. 5–6, 2015, pp. 1-16.
[27] Miranda-Moreno, L. F., L. P. Fu, F. F. Saccomanno, and A. Labbe. Alternative Risk Models for Ranking Locations for Safety Improvement. Transportation Research Record, No. 1908, 2005, pp. 1-8.
[28] Lord, D., and L. F. Miranda-Moreno. Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter of Poisson-Gamma Models for Modeling Motor Vehicle Crashes: A Bayesian Perspective. Safety Science, Vol. 46, No. 5, 2008, pp. 751-770.
[29] Lord, D., S. R. Geedipally, and S. D. Guikema. Extension of the Application of Conway-Maxwell-Poisson Models: Analyzing Traffic Crash Data Exhibiting Underdispersion. Risk Analysis, Vol. 30, No. 8, 2010, pp. 1268-1276.

[30] Lord, D., S. D. Guikema, and S. R. Geedipally. Application of the Conway-Maxwell-Poisson Generalized Linear Model for Analyzing Motor Vehicle Crashes. Accident Analysis & Prevention, Vol. 40, No. 3, 2008, pp. 1123-1134.
[31] Oh, J., S. P. Washington, and D. Nam. Accident Prediction Model for Railway-Highway Interfaces. Accident Analysis & Prevention, Vol. 38, No. 2, 2006, pp. 346-356.
[32] Geedipally, S. R., D. Lord, and S. S. Dhavala. The Negative Binomial-Lindley Generalized Linear Model: Characteristics and Application Using Crash Data. Accident Analysis & Prevention, Vol. 45, 2012, pp. 258-265.
[33] Das, S., X. D. , F. Wang, and C. Leboeuf. Estimating Likelihood of Future Crashes for Crash-Prone Drivers. Journal of Traffic and Transportation Engineering-English Edition, Vol. 2, No. 3, 2015, pp. 145-157.
[34] Das, S., and X. D. Sun. Factor Association with Multiple Correspondence Analysis in Vehicle-Pedestrian Crashes. Transportation Research Record, No. 2519, 2015, pp. 95-103.
[35] Rigby, B., and M. Stasinopoulos. A Flexible Regression Approach Using Gamlss in R. http://www.gamlss.org/wp-content/uploads/2013/01/Lancaster-booklet.pdf. Accessed July 28, 2013.
[36] Hilbe, J. Negative Binomial Regression. Cambridge University Press, Cambridge, 2007.
[37] Hauer, E. Observational before-after Studies in Road Safety: Estimating the Effect of Highway and Traffic Engineering Measures on Road Safety. Pergamon, Tarrytown, N.Y., U.S.A., 1997.
[38] Persaud, B., C. Lyon, and T. Nguyen. Empirical Bayes Procedure for Ranking Sites for Safety Investigation by Potential for Safety Improvement. Transportation Research Record: Journal of the Transportation Research Board, Vol. 1665, 1999, pp. 7-12.
[39] Montella, A. Safety Evaluation of Curve Delineation Improvements Empirical Bayes Observational before-and-after Study. Transportation Research Record, No. 2103, 2009, pp. 69-79.
[40] Wu, L., D. Lord, and Y. Zou. Validation of Crash Modification Factors Derived from Cross-Sectional Studies with Regression Models. Transportation Research Record: Journal of the Transportation Research Board, Vol. 2514, 2015, pp. 88-96.
[41] Wu, L., and D. Lord. Examining the Influence of Link Function Misspecification in Conventional Regression Models for Developing Crash Modification Factors. Accident Analysis & Prevention, Vol. 102, 2017, pp. 123-135.