The Pennsylvania State University

The Graduate School
Department of Civil and Environmental Engineering

FUNCTIONAL FORM AND HETEROGENEITY EFFECTS ON SAFETY PERFORMANCE FUNCTION ESTIMATION

A Dissertation in Civil Engineering
by
Baradhwaj Hariharan

© 2015 Baradhwaj Hariharan

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

August 2015

The dissertation of Baradhwaj Hariharan was reviewed and approved* by the following:

Venky N. Shankar
Professor of Civil Engineering
Dissertation Adviser
Chair of Committee

Swagata Banerjee Basu
Assistant Professor of Civil Engineering

Jeremy Blum
Associate Professor of Computer Science

Evelyn Thomchick
Associate Professor of Supply Chain Management

Peggy A. Johnson
Professor of Civil and Environmental Engineering
Head of the Department of Civil and Environmental Engineering

*Signatures are on file in the Graduate School

ABSTRACT

The American Association of State Highway and Transportation Officials (AASHTO) defines safety performance functions (SPFs) as statistical models used to estimate the average crash frequency for a specific site type with specific base conditions, based on traffic volume, roadway segment length, and other site characteristics such as lane width, shoulder width, and radius and degree of horizontal curvature. In essence, SPFs are mathematical equations developed through statistical regression modeling of historical data, and are used to predict crash occurrence at sites comparable to those where the historical data were obtained. The aim of this research is to use statistically derived functional form transformations, coupled with the parameterization of the negative binomial coefficient of over-dispersion, to minimize the effect of heterogeneity on the predictions of crash frequencies.
The functional form transformations target the parameter heterogeneity that can arise from an improper functional form, while the over-dispersion parameterization accounts for heterogeneity resulting from unobserved effects particular to the dataset. The study covers approximately 5,443 centerline miles on 142 two-lane state highways in Washington State, with nine years of crash data (2002 to 2010). This dissertation investigates the utility of the multinomial fractional polynomial (MFP) search algorithm for deriving functional forms for SPFs, while accounting for heterogeneous effects. The original contribution of this dissertation lies in the development of novel nonlinear functional form SPFs, and their comparative evaluation against a baseline negative binomial (NB) specification, a baseline heterogeneous negative binomial specification, and a random parameter negative binomial specification. Because of the size of the modeling dataset, a 10-fold cut method was used for the MFP search; this helped tease out minute variations in the dataset. Cumulative residual (CURE) plots were also employed to visually gauge the effect of the MFP functional form transformations on the model cumulative residuals. The resulting functional forms were then incorporated into a heterogeneous negative binomial specification, as a means of obtaining segment-specific parameterizations of the negative binomial over-dispersion term alpha, while gaining some insight into the factors that contribute to the observed over-dispersion. It was found that Average Annual Daily Traffic (AADT), homogeneous segment length, the degree of curvature of horizontal curves, the length of a roadside culvert, the presence of a tree group on the roadside, and the presence of a retaining wall contributed significantly to the over-dispersion of the observed crash data.
The model specifications were also compared to a random parameter negative binomial specification estimated on the same dataset. The heterogeneous negative binomial specification with MFP-derived functional form transformations provided a better log-likelihood at convergence, as well as lower prediction validation estimates for mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), and root mean square error (RMSE). This dissertation set out to fill two gaps in the extant literature: the lack of a statistical procedure to empirically determine variable functional forms without any assumptions on the distribution of the variable's effect on crash counts, and the lack of a modeling approach that could capture segment-specific heterogeneity effectively. Toward this goal, the proposed algorithm proves to be an acceptable method for modeling SPFs, while providing good insight into segment-level parameterization of over-dispersion, thereby enabling effective safety rating and decision making at specific regions of a roadway network.

TABLE OF CONTENTS

List of Figures
List of Tables
Acknowledgements
Chapter 1 Introduction
Chapter 2 Literature review
  2.1 Statistical analysis of crash counts
  2.2 Functional form in count regressions
Chapter 3 Overview of study area and data descriptions
  3.1 Database evolution
    3.1.1 Crash data
    3.1.2 Average annual daily traffic data
    3.1.3 Roadway geometrics data
    3.1.4 Roadside data
  3.2 Combined homogeneous segments dataset
Chapter 4 Modeling methodology
  4.1 Poisson distribution, over-dispersion and the negative binomial regression
  4.2 Model structure for functional form treatments
  4.3 Estimation algorithm and model selection
  4.4 Modeling a large dataset with FP(m,p)
  4.5 Incorporation of FP(m) structure into heterogeneous negative binomial estimation
  4.6 Prediction validation measures
  4.7 CURE plots to test functional forms resulting from the MFP algorithm
  4.8 Modeling methodology summary
Chapter 5 Model estimation framework and results
  5.1 Baseline negative binomial specifications
  5.2 Random parameter negative binomial specification
  5.3 Multinomial fractional polynomial negative binomial specification
  5.4 CURE plots
    5.4.1 Untransformed variables
    5.4.2 MFP algorithm derived variable functional form transformations
  5.5 Heterogeneous negative binomial specification with MFP and non-MFP predictors
  5.6 Variable elasticities and model prediction measures
    5.6.1 Elasticities
    5.6.2 Model prediction evaluations
    5.6.3 AIC and BIC comparisons
    5.6.4 Model prediction measure insights
Chapter 6 Summary
Bibliography
Appendix A Description of homogeneous roadway segment dataset parameters
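
The prediction validation measures named in the abstract (MAE, MAPE, MSE, and RMSE) can be illustrated with a minimal Python sketch. The function name prediction_measures, the sample counts, and the handling of zero observed counts in MAPE (those segments are skipped) are assumptions made for illustration only; the dissertation's own definitions are given in Section 4.6 and may differ.

```python
import math


def prediction_measures(observed, predicted):
    """Return MAE, MAPE, MSE and RMSE for paired observed/predicted crash counts.

    Hypothetical helper for illustration; not taken from the dissertation.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")

    # Prediction error per segment: predicted minus observed.
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # Assumption: MAPE is undefined when the observed count is zero, so those
    # segments are skipped and the average is taken over the remaining ones.
    pct = [abs(e) / o for e, o in zip(errors, observed) if o != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")

    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse}


# Made-up counts for four roadway segments (not data from the study).
observed = [0, 2, 1, 4]
predicted = [0.5, 1.5, 1.0, 3.0]
measures = prediction_measures(observed, predicted)
print(measures)
```

Lower values on all four measures indicate closer agreement between predicted and observed crash counts, which is the sense in which the abstract compares the competing specifications.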
