
2014 TRB Contest

Combining Factor Analysis with Binary Logistic Regression for Analysis of Driver Behavior in Dilemma Zone

Wenfu Wang (Corresponding Author) and Kushal Mehta Department of Civil Engineering, University of Waterloo, Waterloo, Ontario N2L3G1, Canada Corresponding author e-mail: [email protected]

Problem Formulation

This study aims to understand how different drivers behave in the dilemma zone while distracted by phone use. Using binary logistic regression and factor analysis, it predicts the probability that a driver stops (or goes through) at the intersection from a set of predictor variables. The performance of the binary logistic model alone is compared with that of a combined model structure of factor analysis and binary logistic regression.

Data Preparation

The data used in this study were collected from the University of Iowa National Advanced Driving Simulator (NADS), where drivers from three age groups were asked to drive past intersections while engaged in one of three secondary-task conditions (No Phone Call, Outgoing Call, Incoming Call).

The original data were collected at 240 Hz, so the frame numbers were divided by 240 to obtain the time, in seconds, at which each event occurred. Several data filtering rules were applied to the original datasets to ensure that the resulting variables add value to the model. Data records with any of the following attributes were not used in this analysis:

1) negative yellow phase length
2) negative red phase length
3) positive deceleration rate
4) negative acceleration rate
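The frame-to-time conversion and the four exclusion rules above can be sketched as follows. This is a minimal illustration, not the authors' actual processing script: the record layout and the field names (`yellow_frames`, `red_frames`, `decel_rate`, `accel_rate`) are hypothetical, and deceleration is assumed to be stored as a non-positive number.

```python
SAMPLING_RATE_HZ = 240  # simulator recording rate stated in the text


def frames_to_seconds(frame_count):
    """Convert a frame count recorded at 240 Hz into seconds."""
    return frame_count / SAMPLING_RATE_HZ


def is_valid(record):
    """Return True if the record triggers none of the four exclusion rules."""
    return (
        record["yellow_frames"] >= 0   # rule 1: no negative yellow phase length
        and record["red_frames"] >= 0  # rule 2: no negative red phase length
        and record["decel_rate"] <= 0  # rule 3: no positive deceleration rate
        and record["accel_rate"] >= 0  # rule 4: no negative acceleration rate
    )


# Hypothetical records, for illustration only (not data from the study).
records = [
    {"yellow_frames": 960, "red_frames": 480, "decel_rate": -2.1, "accel_rate": 0.8},
    {"yellow_frames": -5, "red_frames": 480, "decel_rate": -1.0, "accel_rate": 0.5},
    {"yellow_frames": 720, "red_frames": 240, "decel_rate": 0.4, "accel_rate": 0.2},
]

kept = [r for r in records if is_valid(r)]
yellow_seconds = [frames_to_seconds(r["yellow_frames"]) for r in kept]
```

Only the first hypothetical record passes all four rules here; the second has a negative yellow phase length and the third a positive deceleration rate.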

In the end, a total of 812 records were selected out of the original 1157 records and used in this study. Table 1 lists all the potential input variables.

Table 1: Aggregated Variables for Driver Behaviors at Dilemma Zone

Variable                                  Coding
Age Group: MAge                           Dummy Variable, 1 = Middle Age, 0 = others
Age Group: OAge                           Dummy Variable, 1 = Older, 0 = others
Gender                                    Dummy Variable, 1 = Male, 0 = others
Cell Phone Interface: HF                  Dummy Variable, 1 = Hand free, 0 = others
Cell Phone Interface: HS                  Dummy Variable, 1 = Headset, 0 = others
Call Interface: OCall                     Dummy Variable, 1 = Outgoing Call, 0 = others
Call Interface: ICall                     Dummy Variable, 1 = Incoming Call, 0 = others
Yellow Length                             Scaled variable, unit = seconds
Acceleration Pedal Change Direction       Dummy Variable, 1 = Depressing, 0 = Released
Acceleration Rate                         Scaled variable, unit =
Deceleration Rate                         Scaled variable, unit =
Distance at Green to Yellow               Scaled variable, unit =
Velocity at Green to Yellow               Scaled variable, unit =
Time Headway (Binned)                     Dummy Variable, 1 = more than 3.06 seconds, 0 = others
Velocity at Yellow to Red                 Scaled variable, unit =

It should be noted that the red phase data were not used in this study, as they did not match the description provided (possibly because the red phase was over- or under-recorded). Time Headway is derived from two provided variables, Distance at Green to Yellow divided by Velocity at Green to Yellow; it was included because previous researchers have identified time headway as a significant predictor of passing events (1). Time Headway was coded as a dummy variable because most headway values clustered around either 3 seconds or 3.75 seconds and did not follow a normal distribution.
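The derivation of the binned Time Headway variable can be sketched as below. The function names are hypothetical, and the sketch assumes that distance and velocity are expressed in compatible units (for example metres and metres per second) so that their ratio is in seconds; the 3.06-second cut point is the one given in Table 1.

```python
HEADWAY_THRESHOLD_S = 3.06  # cut point listed in Table 1


def time_headway(distance_at_yellow, velocity_at_yellow):
    """Distance at Green to Yellow divided by Velocity at Green to Yellow."""
    if velocity_at_yellow <= 0:
        raise ValueError("velocity must be positive to compute a headway")
    return distance_at_yellow / velocity_at_yellow


def bin_headway(headway_s):
    """Dummy coding: 1 if the headway exceeds 3.06 seconds, otherwise 0."""
    return 1 if headway_s > HEADWAY_THRESHOLD_S else 0


# Two hypothetical approaches near the two clusters mentioned in the text.
short_headway = bin_headway(time_headway(60.0, 20.0))  # 3.0 s
long_headway = bin_headway(time_headway(75.0, 20.0))   # 3.75 s
```

With this coding, headways near the 3-second cluster fall into category 0 and those near the 3.75-second cluster into category 1.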

The frame at which the acceleration pedal changed by 10% (Column F) was not used because the values did not make intuitive sense. Distance from the Stop Line (Column I) was not used because there were too few observations of stops beyond the stop line. The Velocity at Stop Line (Column N) and the Frame at Stop Line (Column O) were not used because they are not related to the decision process of drivers as they drive past the intersection.

The dependent variable was derived from First Stop Frame (Column H) and was coded as a dummy variable, with 1 = stop and 0 = go through.

Methods and Assumptions

Stop and go-through events were examined with a combination of factor analysis and binary logistic regression models. Factor analysis was selected because factor scores can reveal the underlying patterns in the original data while reducing the data dimensions and resolving collinearity among the variables (2, 3).

Factor Analysis

The basic factor analysis equation can be represented in matrix form as follows:

$$Z = \lambda F + \varepsilon$$

where Z is an n by 1 vector of variables, λ is an n by m matrix of factor loadings, F is an m by 1 vector of factors, and ε is an n by 1 vector of errors (4).

Factor loadings represent the correlation coefficients between variables and factors. A higher absolute loading indicates that the corresponding variable contributes more to the meaning of the factor, and vice versa. The extent to which a factor represents the variation in the data can be evaluated by its eigenvalue, and an eigenvalue larger than 1.0 indicates a significant factor (5). Varimax rotation is used in this study to produce orthogonal (uncorrelated) factors, and the factor scores are used as inputs to the binary logistic regression in the combined model structure.
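Written out for a single variable, the factor model and the resulting variance decomposition look as follows. This is a standard textbook restatement rather than a result taken from the study. The uniqueness symbol ψ_i and the communality symbol h_i² are notation introduced here, and the decomposition assumes standardized variables, orthogonal unit-variance factors, and errors that are uncorrelated with the factors.

```latex
z_i = \sum_{j=1}^{m} \lambda_{ij} F_j + \varepsilon_i ,
\qquad
\operatorname{Var}(z_i)
  = \underbrace{\sum_{j=1}^{m} \lambda_{ij}^{2}}_{h_i^{2}} + \psi_i
  = 1 .
```

Under these assumptions, a variable whose loadings are all small has a low communality, meaning that the retained factors capture little of its variation.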

Binary Logistic Regression Model

Binary logistic regression is a widely used method for predicting probability of a binary outcome (i.e., stop event or go-through event in this study) based on values of a set of explanatory variables (1) .

In logistic regression, the dependent variable is a logit, which is the natural log of the odds:

$$\operatorname{logit}(P) = \ln\left(\frac{P}{1-P}\right) = a + \sum_{i} b_i X_i$$

where P is the probability that the event coded as 1 occurs, a is a constant, X_i are the predictor variables, and b_i are the predictor coefficients.
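To turn a fitted logit back into a probability, the relation is inverted. The sketch below shows that step; the function names and the 0.5 classification cut-off are assumptions made for illustration, and no coefficients from the study are used.

```python
import math


def stop_probability(intercept, coefficients, predictors):
    """Invert the logit: P = 1 / (1 + exp(-(a + sum of b_i * x_i)))."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    logit = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    # Use the numerically stable form for large negative logits.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    odds = math.exp(logit)
    return odds / (1.0 + odds)


def classify(probability, cutoff=0.5):
    """Label an observation as stop (1) or go-through (0); cutoff is assumed."""
    return 1 if probability >= cutoff else 0


# A logit of ln(3) corresponds to odds of 3:1, i.e. a probability of 0.75.
p = stop_probability(0.0, [1.0], [math.log(3.0)])
label = classify(p)
```

A logit of zero maps to a probability of 0.5, so a positive coefficient raises the predicted probability of a stop as its predictor increases.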

Some Assumptions

It is assumed that all input variables to the factor analysis and logistic models follow a normal distribution. All categorical variables in this study were coded as 0 or 1 dummy variables, and this coding allows them to be treated as ordinary scaled variables. In addition, it is assumed that the factors and the random errors in the factor analysis are uncorrelated.

Performance Measures

The input data were randomly divided into 2 groups: 70% of data into the training (calibration) group and 30% of data into the testing (validation) group. The following four performance indexes were used to evaluate the model performance:
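A random 70/30 split of this kind can be sketched as below. The function name and the fixed seed are hypothetical choices made for reproducibility; the study does not state how the random assignment was generated.

```python
import random


def split_records(records, train_fraction=0.7, seed=2014):
    """Shuffle a copy of the records and split it into training and testing groups."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary illustrative value
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Record indices stand in for the 812 filtered records.
train, test = split_records(range(812))
```

Applied to 812 records, this rounding rule gives 568 training records and 244 testing records.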

Table 2: Model Performance Measures

Measure                      Description
Sensitivity (Sen)            % of stop events predicted correctly
Specificity (Spe)            % of go-through events predicted correctly
False Positive Rate (FPR)    % of incorrect stop event prediction
False Negative Rate (FNR)    % of incorrect go-through event prediction

Higher Sen and Spe values together with lower FPR and FNR values indicate better model performance.
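The four measures can be computed from paired actual and predicted labels as sketched below. The function name is hypothetical. The sketch follows the usage visible in Table 4, where each FPR value is the complement of Sen and each FNR value is the complement of Spe; this differs from the more common convention, in which FPR is the complement of specificity.

```python
def performance_measures(actual, predicted):
    """Return Sen, Spe, FPR and FNR in percent; labels are 1 = stop, 0 = go-through."""
    stop_predictions = [p for a, p in zip(actual, predicted) if a == 1]
    go_predictions = [p for a, p in zip(actual, predicted) if a == 0]
    if not stop_predictions or not go_predictions:
        raise ValueError("both stop and go-through events are required")
    sen = 100.0 * sum(1 for p in stop_predictions if p == 1) / len(stop_predictions)
    spe = 100.0 * sum(1 for p in go_predictions if p == 0) / len(go_predictions)
    # Complements, following the usage in Table 4 of this document.
    return {"Sen": sen, "Spe": spe, "FPR": 100.0 - sen, "FNR": 100.0 - spe}


# Hypothetical labels: four stop events and two go-through events.
measures = performance_measures([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
```

In this toy case three of the four stop events and one of the two go-through events are predicted correctly.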


As described in previous sections, two model structures were developed in this study:

a) Binary logistic regression model
b) Combined model with factor analysis scores as inputs into the binary logistic regression model

Results and Analysis

Model Structure a

The binary logistic regressions were performed in SPSS v17.0 (6). A forward stepwise procedure was used, with significance levels of 0.05 and 0.1 as the thresholds for variables entering and leaving the model, respectively. The established binary logistic model is as follows:

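[Equation: fitted binary logistic model for Structure a; the terms are not recoverable from the source text]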

The -2 Log likelihood of the above model is 298.605, and the Cox & Snell R² is 0.526.

From the above model it can be found that older drivers are less likely to stop than young drivers, and that a longer yellow length together with a short time headway decreases the chance of stopping. These findings contradict previous observations (1). It is possible that correlation among the predictors has reduced the explanatory power of the model. In addition, phone usage (with any interface) was not found to influence the results.

Model Structure b

The factor analysis was conducted in SPSS v17.0. The principal component method was used to extract factors, Varimax rotation was applied to the factor loadings, and the factor scores were calculated using the regression method in SPSS. The results showed that 5 of the 13 factors had an eigenvalue above 1.0, and these factors explained 62.2% of the variation in the original data (similar to R²). The rotated factor loadings are listed as follows:

Table 3: Results of Rotated Factor Loadings

Variables                              Factor 1   Factor 2   Factor 3   Factor 4   Factor 5
MAge                                    -.021       .006       .005      -.849       .044
OAge                                     .021       .019      -.002       .849      -.052
Gender                                   .224      -.011       .023       .055      -.356
HS                                       .004      -.859      -.014       .006      -.038
HF                                       .002       .863      -.005       .031       .021
OCall                                    .035       .016      -.867       .005       .022
ICall                                    .056       .011       .858       .001      -.047
Yellow Length                            .812      -.028       .043       .020       .365
Min Accel After Accel Pedal Change      -.912      -.038       .016      -.014       .239
Max Accel After Accel Pedal Change       .916       .003       .017      -.011       .070
Accel Pedal Change Direction             .006       .135      -.086      -.104       .216
Time Headway (Binned)                    .177      -.057       .047       .067       .897
Vel at Yellow to Red                    -.952       .013       .010      -.038       .141

Factor 1 in the above table represents the characteristics of the driver: higher Factor 1 values are associated with higher acceleration, deceleration, and velocity, so responsive drivers would achieve a high score on Factor 1. Factor 2 represents the phone interface, and Factor 3 is related to the call interface. Factor 4 is associated with age, with older drivers scoring higher than younger drivers. Factor 5 is associated with time headway, with a longer time headway corresponding to a higher score.
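The grouping of variables by factor can be read off the loadings in Table 3 mechanically, as sketched below. The loadings are copied from Table 3; the function name and the 0.5 cut-off on the absolute loading are assumptions made for illustration and are not stated in the study.

```python
FACTORS = ["Factor 1", "Factor 2", "Factor 3", "Factor 4", "Factor 5"]

# Rotated loadings copied from Table 3.
LOADINGS = {
    "MAge": [-0.021, 0.006, 0.005, -0.849, 0.044],
    "OAge": [0.021, 0.019, -0.002, 0.849, -0.052],
    "Gender": [0.224, -0.011, 0.023, 0.055, -0.356],
    "HS": [0.004, -0.859, -0.014, 0.006, -0.038],
    "HF": [0.002, 0.863, -0.005, 0.031, 0.021],
    "OCall": [0.035, 0.016, -0.867, 0.005, 0.022],
    "ICall": [0.056, 0.011, 0.858, 0.001, -0.047],
    "Yellow Length": [0.812, -0.028, 0.043, 0.020, 0.365],
    "Min Accel After Accel Pedal Change": [-0.912, -0.038, 0.016, -0.014, 0.239],
    "Max Accel After Accel Pedal Change": [0.916, 0.003, 0.017, -0.011, 0.070],
    "Accel Pedal Change Direction": [0.006, 0.135, -0.086, -0.104, 0.216],
    "Time Headway (Binned)": [0.177, -0.057, 0.047, 0.067, 0.897],
    "Vel at Yellow to Red": [-0.952, 0.013, 0.010, -0.038, 0.141],
}


def dominant_variables(loadings, threshold=0.5):
    """List, per factor, the variables whose absolute loading reaches the threshold."""
    groups = {name: [] for name in FACTORS}
    for variable, row in loadings.items():
        for name, value in zip(FACTORS, row):
            if abs(value) >= threshold:  # threshold is an assumed cut-off
                groups[name].append(variable)
    return groups


groups = dominant_variables(LOADINGS)
```

With this cut-off, Factor 2 collects the two phone interface dummies, Factor 3 the two call dummies, Factor 4 the two age dummies, and Factor 5 the binned time headway, while Gender and Accel Pedal Change Direction do not load strongly on any factor.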

The factor scores for the above 5 factors, calculated with the regression method in SPSS, were then input into the binary logistic regression model. The established model is as follows:

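[Equation: fitted binary logistic model for Structure b; the terms are not recoverable from the source text]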

The -2 Log likelihood of the above model is 374.721, and the Cox & Snell R² is 0.458.
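The two reported statistics are linked by the Cox & Snell definition, R² = 1 - exp(-(D0 - D1)/n), where D1 is the -2 Log likelihood of the fitted model, D0 is that of the intercept-only model, and n is the sample size. The sketch below uses this relation as a consistency check on the figures reported for Structures a and b. It assumes that both models were fitted on the training group and that this group holds about 568 records (70% of 812); neither D0 nor n is reported in the study, and the function names are hypothetical.

```python
import math


def cox_snell_r2(null_deviance, model_deviance, n):
    """Cox & Snell R-squared from the two -2 Log likelihood values."""
    return 1.0 - math.exp(-(null_deviance - model_deviance) / n)


def implied_null_deviance(model_deviance, r2, n):
    """Solve the Cox & Snell relation for the intercept-only -2 Log likelihood."""
    return model_deviance - n * math.log(1.0 - r2)


N_TRAIN = 568  # assumption: 70% of the 812 records, rounded

null_a = implied_null_deviance(298.605, 0.526, N_TRAIN)  # Structure a
null_b = implied_null_deviance(374.721, 0.458, N_TRAIN)  # Structure b
```

Because both models would share the same intercept-only baseline, the two implied values should nearly coincide if the reported figures and the assumed sample size are consistent; under these assumptions they differ by well under one unit.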

The model explained around 45.8% of the variation in the data. A higher value of Factor 1 and a lower value of Factor 4 increase the likelihood of a stop event. Combining this with the information in Table 3, it is concluded that more responsive drivers and middle-aged or young drivers are more likely to stop than other drivers. In addition, phone usage was not found to influence the model results.

The overall model results for the training group and testing group are summarized in the following table:

Table 4: Classification Results for Model Structure a and b

Model          Training Group                      Testing Group
               Sen     Spe     FPR     FNR         Sen     Spe     FPR     FNR
Structure a    96.8%   73.5%   3.2%    26.5%       93.5%   72.6%   6.5%    27.4%
Structure b    96.3%   64.0%   3.7%    36.0%       96.5%   63.0%   3.5%    37.0%


From the above table it can be observed that, for the training group, Model Structure a performs better than the combined Structure b, as shown by its higher Sen and Spe together with lower FPR and FNR values. For the testing group, however, the combined model (Structure b) is better at predicting stop events. The performance of Structure b is also more consistent between the training group and the testing group than that of Structure a, which indicates the stability of the combined model structure.

Critical Review

The modeling process assumed that all variables follow a normal distribution; however, this assumption does not hold for several variables, which might influence the model results. The initial analysis showed that phone usage does not influence driver behavior. The joint effect of phone interface and call type was also tested, but the results were still not significant. It is therefore suspected that either the simulator experiment was biased by certain unobserved factors, or talking on the phone while driving is not a significant distraction. Intuitively, talking on the phone may not be as distracting as texting (entering text) while driving.

It would have been helpful if driver response time had been provided for this analysis; the 10% accel change variable did not provide a usable response time. The drivers' driving experience, which might influence their response to distraction, is another variable that would be desirable. It is also suspected that some bias may be present due to the nature of the process: the drivers may have been aware that their driving was being observed and so were more careful when driving and talking on the phone at the same time. Real-life data on traffic violations (particularly running a red light while talking on a phone) might provide more meaningful results.

References

1. Gates, T. J., D. A. Noyce, L. Laracuente, and E. V. Nordheim. Analysis of Driver Behavior in Dilemma Zones at Signalized Intersections. Transportation Research Record: Journal of the Transportation Research Board, Vol. 2030, No. 1, 2007, pp. 29-39.

2. Rummel, R. J. Applied Factor Analysis. Northwestern University Press, 1988.

3. Tabachnick, B. G., L. S. Fidell, and S. J. Osterlind. Using Multivariate Statistics. 2001.

4. Sharma, S. Applied Multivariate Techniques. John Wiley & Sons, Inc., 1995.

5. Saccomanno, F. F., and X. Lai. A Model for Evaluating Countermeasures at Highway-Railway Grade Crossings. Transportation Research Record: Journal of the Transportation Research Board, Vol. 1918, No. 1, 2005, pp. 18-25.

6. SPSS Statistics, Version 17.0. SPSS Inc., Chicago, 2008.
