<<

TRANSIT BUS LOAD-BASED MODAL EMISSION RATE MODEL DEVELOPMENT

A Thesis Presented to The Academic Faculty

by

Chunxia Feng

In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Civil and Environmental Engineering

Georgia Institute of Technology May 2007

TRANSIT BUS LOAD-BASED MODAL EMISSION RATE MODEL DEVELOPMENT

Approved by:

Dr. Randall Guensler, Advisor Dr. Michael Hunter School of Civil and Environmental School of Civil and Environmental Engineering Engineering Georgia Institute of Technology Georgia Institute of Technology

Dr. Michael Rodgers Dr. Jennifer H. Ogle School of Civil and Environmental Civil Engineering Department Engineering Georgia Institute of Technology Clemson University

Dr. Michael Meyer School of Civil and Environmental Engineering Georgia Institute of Technology

Date Approved: March 06, 2007 ACKNOWLEDGEMENTS

I would like to express the deepest gratitude to my advisor, Dr. Randall Guensler, for his full support, expert guidance, understanding and encouragement throughout my

study and research. Without his incredible patience and timely wisdom and counsel, my dissertation would have been a frustrating and overwhelming pursuit.

To the other members of my thesis committee, Dr. Michael Rodgers, Dr. Michael

Meyer, Dr. Michael Hunter, and Dr. Jennifer Ogle, I want to say thank you for having

served committee. Their thoughtful questions and comments were valued

greatly. Particular appreciation is reserved for Michael Rodgers for numerous

consultations. His voracious appetite for knowledge was intimidating, challenging,

infectious, and admirable.

I am grateful for financial support from many funding agencies and organizations,

including the USEPA, the Federal Highways Administration, and Georgia Department of

Transportation. Without their support I would not have been able to pursue a doctoral

degree.

Lastly, my heartfelt gratitude is to my husband, Daiheng, and sons, Andrew and

Daniel. Their endless love, understanding and dedication are elements that have

sustained me throughout this journey.

iii

TABLE OF CONTENTS

Page

ACKNOWLEDGEMENTS iii

LIST OF TABLES viii

LIST OF FIGURES xiii

SUMMARY xxi

CHAPTER

1 INTRODUCTION 1

1.1 Emissions from Heavy-Duty Diesel Vehicles 1

1.2 Current Heavy-Duty Vehicle Emissions Modeling Practices 2

1.3 Research Approaches and Objectives 3

1.4 Summary of Research Contributions 4

1.5 Dissertation Organization 5

2 HEAVY-DUTY DIESEL VEHICLE EMISSIONS 6

2.1 How Diesel Engine Works 6

2.2 Diesel Engine Emissions 10

2.3 Heavy-Duty Diesel Vehicle Emission Regulations 13

2.4 Heavy-Duty Diesel Vehicle Emission Modeling 16

3 HEAVY-DUTY DIESEL VEHICLE EMISSIONS MODELING 17

3.1 VMT-Based Vehicle Emission Models 17

3.2 Fuel-Based Vehicle Emission Models 25

3.3 Modal Emission Rate Models 26

4 EMISSION DATASET DESCRIPTION AND POST-PROCESSING PROCEDURE 41

iv 4.1 Transit Bus Dataset 41

4.2 Heavy-duty Vehicle Dataset 56

5 METHODOLOGICAL APPROACH 65

5.1 Modeling Goal and Objectives 65

5.2 Statistical Method 66

5.3 Modeling Approach 77

5.4 Model Validation 79

6 DATASET SELECTION AND ANALYSIS OF EXPLANATORY VARIABLES 81

6.1 Dataset Used for Model Development 81

6.2 Representative Ability of the Transit Bus Dataset 83

6.3 Variability in Emissions Data 86

6.4 Potential Explanatory Variables 99

6.5 Selection of Explanatory Variables 106

7 MODAL ACTIVITY DEFINITIONS DEVELOPMENT 112

7.1 Overview of Current Modal Activity Definitions 112

7.2 Proposed Modal Activity Definitions and Validation 115

7.3 Conclusions 123

8 IDLE MODE DEVELOPMENT 124

8.1 Critical Value for Speed in Idle Mode 124

8.2 Critical Value for Acceleration in Idle Mode 127

8.3 Emission Rate Distribution by Bus in Idle Mode 131

8.4 Discussions 137

8.5 Idle Emission Rates Estimation 140

8.6 Conclusions and Further Considerations 143

9 DECELERATION MODE DEVELOPMENT 145

v 9.1 Critical Value for Deceleration Rates in Deceleration Mode 145

9.2 Analysis of Deceleration Mode Data 149

9.3 The Deceleration Motoring Mode 157

9.4 Deceleration Emission Rates Estimations 160

9.5 Conclusions and Further Considerations 165

10 ACCELERATION MODE DEVELOPMENT 167

10.1 Critical Value for Acceleration in Acceleration Mode 167

10.2 Analysis of Acceleration Mode Data 172

10.3 Model Development and Refinement 179

10.4 Conclusions and Further Considerations 236

11 CRUISE MODE DEVELOPMENT 238

11.1 Analysis of Cruise Mode Data 238

11.2 Model Development and Refinement 245

11.3 Conclusions and Further Considerations 292

12 MODEL VERIFICATION 294

12.1 Engine Power vs. Surrogate Power Variables 294

12.2 Mean Emission Rates vs. Linear Regression Model 297

12.3 Mode-specific Load Based Modal Emission Rate Model vs. Emission Rate Models as a Function of Engine Load 300

12.4 Separation of Acceleration and Cruise Modes 304

12.5 MOBILE 6.2 vs. Load-Based Modal Emission Rate Model 306

12.6 Conclusions 307

13 CONCLUSIONS 309

13.1 Transit Bus Emission Rate Models 311

13.2 Model Limitations 313

13.3 Lessons Learned 315

vi 13.4 Contributions 316

13.5 Recommendation for Further Studies 316

REFERENCES 318

VITA 327

vii LIST OF TABLES

Page

Table 2-1 National Ambient Air Quality Standards 14

Table 2-2 Heavy-Duty Engine Emissions Standards 16

Table 3-1 Heavy-Duty Vehicle NOx Emission Rates in MOBILE6 21

Table 3-3 Heavy-Duty Vehicle HC Emission Rates in MOBILE6 21

Table 4-1 Buses Tested for USEPA 42

Table 4-2 Transit Bus Parameters Given by the USEPA 44

Table 4-3 List of Parameters Used in Explanatory Analysis for Transit Bus 54

Table 4-4 Summary of Transit Bus Database 55

Table 4-5 Onroad Tests Conducted with Pre-Rebuild Engine 57

Table 4-6 Onroad Tests Conducted with Post-Rebuild Engine 58

Table 4-7 List of Parameters Given in Heavy-duty Vehicle Dataset Provided by USEPA 59

Table 4-8 List of Parameters Used in Explanatory Analysis for HDDV 63

Table 4-9 Summary of Heavy-Duty Vehicle Data 64

Table 5-1 ANOVA Table for Single-Factor Study 73

Table 6-1 Basic Summary Statistics for Emissions Rate Data for Transit Bus 89

Table 6-2 Basic Summary Statistics for Truncated Emissions Rate Data 92

Table 6-3 CARB Emission Regime Definition 97

Table 6-4 Percent of High Emission Points by Bus 98

Table 6-5 Correlation Matrix for Transit Bus Dataset 107

Table 7-1 Comparison of Modal Activity Definition 114

Table 7-2 Four Different Mode Definitions and Modal Variables 116

viii Table 7-3 Results for Pairwise Comparison for Modal Average Estimates In Terms of P- value 119

Table 7-4 Sensitivity Test Results for Four Mode Definition 122

Table 8-1 Engine Power Distribution for Three Critical Values for Three Pollutants 127

Table 8-2 Percentage of Engine Power Distribution for Three Critical Values for Three Pollutants 127

Table 8-3 Engine Power Distribution for Four Options for Three Pollutants 130

Table 8-4 Percentage of Engine Power Distribution for Three Critical Values for Three Pollutants 131

Table 8-5 Median, and Mean of Three Pollutants in Idle Mode by Bus 134

Table 8-6 Engine Power Distribution in Idle Mode by Bus 135

Table 8-7 Idle Mode Statistical Analysis Results for NOx, CO, and HC 141

Table 8-8 Idle Emission Rates Estimation and 95% Confidence Intervals Based on Bootstrap 143

Table 9-1 Engine Power Distribution for Three Options for Three Pollutants 146

Table 9-2Percentage of Engine Power Distribution for Three Options for Three Pollutants 146

Table 9-3 Median, and Mean for NOx, CO, and HC in Deceleration Mode by Bus 152

Table 9-4 High HC Emissions Distribution by Bus and Trip for Deceleration Mode 154

Table 9-5 Engine Power Distributions in Deceleration Mode by Bus 155

Table 9-6 Comparison of Emission Distributions between Deceleration Mode and Two Sub-Modes (Deceleration Motoring Mode and Deceleration Non-Motoring Mode) 160

Table 9-7 Emission Rates Estimation and 95% Confidence Intervals Based on Bootstrap for Deceleration Mode 164

Table 10-1 Engine Power Distribution for Three Options for Three Pollutants 170

Table 10-2 Percentage of Engine Power Distribution for Three Options for Three Pollutants 170

Table 10-3 Engine Power Distribution for Acceleration Mode and Cruise Mode 171

Table 10-4 Median, and Mean of Three Pollutants in Acceleration Mode by Bus 174

ix Table 10-5 Engine Power Distribution in Acceleration Mode by Bus 177

Table 10-6 Original Untrimmed Regression Tree Results for Truncated Transformed NOx Emission Rate in Acceleration Mode 186

Table 10-7 Trimmed Regression Tree Results for Truncated Transformed NOx Emission Rate in Acceleration Mode 187

Table 10-8 Original Untrimmed Regression Tree Results for Truncated Transformed CO Emission Rate in Acceleration Mode 190

Table 10-9 Trimmed Regression Tree Results for Truncated Transformed CO Emission Rate in Acceleration Mode 191

Table 10-10 Original Untrimmed Regression Tree Results for Truncated Transformed HC Emission Rate in Acceleration Mode 193

Table 10-11 Trimmed Regression Tree Results for Truncated Transformed HC in Acceleration Mode 195

Table 10-12 Secondary Trimmed Regression Tree Results for Truncated Transformed HC Emission Rate in Acceleration Mode 197

Table 10-13 Final Regression Tree Results for Truncated Transformed HC and Engine Power in Acceleration Mode 198

Table 10-14 Regression Result for NOx Model 1.1 200

Table 10-15 Regression Result for NOx Model 1.2 202

Table 10-16 Regression Result for NOx Model 1.3 203

Table 10-17 Regression Result for NOx Model 1.4 205

Table 10-18 Regression Result for NOx Model 1.5 207

Table 10-19 Comparative Performance Evaluation of NOx Emission Rate Models 210

Table 10-20 Regression Result for CO Model 2.1 213

Table 10-21 Regression Result for CO Model 2.2 215

Table 10-22 Regression Result for CO Model 2.3 216

Table 10-23 Regression Result for CO Model 2.4 218

Table 10-24 Regression Result for CO Model 2.5 220

Table 10-25 Regression Result for CO Model 2.6 222

x Table 10-26 Comparative Performance Evaluation of CO Emission Rate Models 224

Table 10-27 Regression Result for HC Model 3.1 227

Table 10-28 Regression Result for HC Model 3.2 229

Table 10-29 Regression Result for HC Model 3.3 230

Table 10-30 Regression Result for HC Model 3.4 232

Table 10-31 Comparative Performance Evaluation of HC Emission Rate Models 234

Table 11-1 Engine Power Distribution for Cruise Mode 238

Table 11-2 Median and Mean of Three Pollutants in Cruise Mode by Bus 241

Table 11-3 Engine Power Distribution in Cruise Mode by Bus 243

Table 11-4 Original Untrimmed Regression Tree Results for Truncated Transformed NOx Emission Rate in Cruise Mode 251

Table 11-5 Trimmed Regression Tree Results for Truncated Transformed NOx Emission Rate in Cruise Mode 252

Table 11-6 Original Untrimmed Regression Tree Results for Truncated Transformed CO Emission Rate in Cruise Mode 255

Table 11-7 Trimmed Regression Tree Results for Truncated Transformed CO Emission Rate in Cruise Mode 256

Table 11-8 Original Untrimmed Regression Tree Results for Truncated Transformed HC Emission Rate in Cruise Mode 258

Table 11-9 Trimmed Regression Tree Results for Truncated Transformed HC Emission Rate in Cruise Mode 260

Table 11-10 Trimmed Regression Tree Results for Truncated Transformed HC in Cruise Mode 261

Table 11-11 Final Regression Tree Results for Truncated Transformed HC and Engine Power in Cruise Mode 263

Table 11-12 Regression Result for NOx Model 1.1 264

Table 11-13 Regression Result for NOx Model 1.2 266

Table 11-14 Regression Result for NOx Model 1.3 267

Table 11-15 Regression Result for NOx Model 1.4 269

xi Table 11-16 Comparative Performance Evaluation of NOx Emission Rate Models 271

Table 11-17 Regression Result for CO Model 2.1 274

Table 11-18 Regression Result for CO Model 2.2 276

Table 11-19 Regression Result for CO Model 2.3 277

Table 11-20 Regression Result for CO Model 2.4 279

Table 11-21 Comparative Performance Evaluation of CO Emission Rate Models 280

Table 11-22 Regression Result for HC Model 3.1 283

Table 11-23 Regression Result for HC Model 3.2 285

Table 11-24 Regression Result for HC Model 3.3 286

Table 11-25 Regression Result for HC Model 3.4 288

Table 11-26 Comparative Performance Evaluation of HC Emission Rate Models 290

Table 12-1 Regression Result for NOx Model 1 296

Table 12-2 Comparative Performance Evaluation between Mode-Only Models and Linear Regression Models 299

Table 12-3 Trimmed Regression Tree Results for Truncated Transformed NOx 301

Table 12-4 Regression Result for NOx Load-Based Only Emission Rate Model 302

Table 12-5 Comparative Performance Evaluation Between Load-Based Only Emission Rate (ER) Model and Load-Based Modal Emission Rate Model 304

Table 12-6 Comparative Performance Evaluation between Linear Regression with Combined Mode and Linear Regression with Acceleration & Cruise Modes 305

Table 12-7 Comparative Performance Evaluation between MOBILE 6.2 and Load-Based Modal ER Model 307

Table 13-1 Load Based Modal Emission Models 312

xii LIST OF FIGURES

Page

Figure 2-1 Actions of a four-stroke internal combustion engine 8

Figure 2-2 Actions of a four-stroke diesel engine 10

Figure 3-1 FTP Transient Cycle 19

Figure 3-2 Urban Dynamometer Driving Schedule Cycle for Heavy-Duty Vehicle 20

Figure 3-3 CARB's Four Mode Cycles 24

Figure 3-4 A Framework of Heavy-Duty Diesel Vehicle Modal Emission Model 31

Figure 3-5 Primary Elements in the Drivetrain 34

Figure 4-1 Bus Routes Tested for USEPA 43

Figure 4-2 SEMTECH-D in Back of Bus 44

Figure 4-3 Bus 380 GPS vs. ECM Vehicle Speed 47

Figure 4-4 Example Check for Erroneous GPS Data for Bus 360 49

Figure 4-5 Example Check for Synchronization Errors for Bus 360 50

Figure 4-6 Histograms of Engine Power for Zero Speed Data Based on Three Different Time Delays 51

Figure 4-7 General Criteria for Maximum Grades 52

Figure 4-8 Onroad Diesel Emissions Characterization Facility 57

Figure 4-9 Example Check for Erroneous Measured Horsepower for Test 3DRI2-2 61

Figure 4-10 Vehicle Speed Correlation 62

Figure 4-11 Vehicle Speed Error for Different Speed Ranges 62

Figure 6-1 HTBR Regression Tree Result for NOx Emission Rate for All Datasets 82

Figure 6-2 HTBR Regression Tree Result for CO Emission Rate for All Datasets 82

Figure 6-3 HTBR Regression Tree Result for HC Emission Rate for All Datasets 83

Figure 6-4 Transit Bus Speed-Acceleration Matrix 84

xiii Figure 6-5 Test Environmental Conditions 85

Figure 6-6 Median and Mean of NOx Emission Rates by Bus 86

Figure 6-7 Median and Mean of CO Emission Rates by Bus 87

Figure 6-8 Median and Mean of HC Emission Rates by Bus 87

Figure 6-9 Empirical Cumulative Distribution Function Based on Bus Based Median Emission Rates for Transit Buses 88

Figure 6-10 Histogram, Boxplot, and Probability Plot of NOx Emission Rate 89

Figure 6-11 Histogram, Boxplot, and Probability Plot of CO Emission Rate 90

Figure 6-12 Histogram, Boxplot, and Probability Plot of HC Emission Rate 90

Figure 6-13 Histogram, Boxplot, and Probability Plot of Truncated NOx Emission Rate 92

Figure 6-14 Histogram, Boxplot, and Probability Plot of Truncated CO Emission Rate 93

Figure 6-15 Histogram, Boxplot, and Probability Plot of Truncated HC Emission Rate 93

Figure 6-16 Histogram, Boxplot, and Probability Plot of Truncated Transformed NOx Emission Rate 95

Figure 6-17 Histogram, Boxplot, and Probability Plot of Truncated Transformed CO Emission Rate 95

Figure 6-18 Histogram, Boxplot, and Probability Plot of Truncated Transformed HC Emission Rate 96

Figure 6-19 The X Classes and Typical Vehicle Configurations 101

Figure 6-20 Throttle Position vs. Engine Power for Transit Bus Dataset 109

Figure 7-1 Average NOx Modal Emission Rates for Different Activity Definitions 117

Figure 7-2 Average CO Modal Emission Rates for Different Activity Definitions 117

Figure 7-3 Average HC Modal Emission Rates for Different Activity Definitions 118

Figure 7-4 HTBR Regression Tree Result for NOx Emission Rate 120

Figure 7-5 HTBR Regression Tree Result for CO Emission Rate 121

Figure 7-6 HTBR Regression Tree Result for HC Emission Rate 121

xiv Figure 8-1 Engine Power vs. NOx Emission Rate for Three Critical Values 125

Figure 8-2 Engine Power vs. CO Emission Rate for Three Critical Values 125

Figure 8-3 Engine Power vs. HC Emission Rate for Three Critical Values 126

Figure 8-4 Engine Power Distribution for Three Critical Values based on NOx Emissions 126

Figure 8-5 Engine Power vs. NOx Emission Rate for Four Options 128

Figure 8-6 Engine Power vs. CO Emission Rate for Four Options 129

Figure 8-7 Engine Power vs. HC Emission Rate for Four Options 129

Figure 8-8 Engine Power Distribution for Four Options based on NOx Emission Rates 130

Figure 8-9 Histograms of Three Pollutants for Idle Mode 132

Figure 8-10 Median and Mean of NOx Emission Rates in Idle Mode by Bus 132

Figure 8-11 Median and Mean of CO Emission Rates in Idle Mode by Bus 133

Figure 8-12 Median and Mean of HC Emission Rates in Idle Mode by Bus 133

Figure 8-13 Histograms of Engine Power in Idle Mode by Bus 136

Figure 8-14 Tree Analysis Results for High HC Emission Rates by Bus and Trip 138

Figure 8-15 Time Series Plot for Bus 360 Trip 4 Idle Segment 1 (130 Seconds) 138

Figure 8-16 Time Series Plot for Bus 360 Trip 4 Idle Segment 38 (516 Seconds) 139

Figure 8-17 Time Series Plot for Bus 372 Trip 1 Idle Segment 1 (500 Seconds) 139

Figure 8-18 Time Series Plot for Bus 383 Trip 1 Idle Segment 12 (1258 Seconds) 140

Figure 8-19 Graphical Illustration of Bootstrap 142

Figure 8-20 Bootstrap Results for Idle Emission Rate Estimation 142

Figure 9-1 Engine Power Distribution for Three Options 147

Figure 9-2 Engine Power vs. NOx Emission Rate for Three Options 147

Figure 9-3 Engine Power vs. CO Emission Rate for Three Options 148

Figure 9-4 Engine Power vs. HC Emission Rate for Three Options 148

xv Figure 9-5 Histograms of Three Pollutants for Deceleration Mode 150

Figure 9-6 Median and Mean of NOx Emission Rates in Deceleration Mode by Bus 150

Figure 9-7 Median and Mean of CO Emission Rates in Deceleration Mode by Bus 151

Figure 9-8 Median and Mean of HC Emission Rates in Deceleration Mode by Bus 151

Figure 9-9 Histograms of Engine Power in Deceleration Mode by Bus 156

Figure 9-10 Engine Power vs. Vehicle Speed, Engine Power vs. and Engine Speed, and Vehicle Speed vs. Engine Speed 157

Figure 9-11 Histograms for Three Pollutants in Deceleration Motoring Mode (a) and Deceleration Non-Motoring Mode (b) 158

Figure 9-12 Bootstrap Results for NOx Emission Rate Estimation in Deceleration Mode 162

Figure 9-13 Bootstrap Results for CO Emission Rate Estimation in Deceleration Mode 162

Figure 9-14 Bootstrap Results for HC Emission Rate Estimation in Deceleration Mode 163

Figure 9-15 Emission Rate Estimation Based on Bootstrap for Deceleration Mode 163

Figure 10-1 Engine Power Distribution for Three Options 168

Figure 10-2 Engine Power vs. NOx Emission Rate for Three Options 168

Figure 10-3 Engine Power vs. CO Emission Rate for Three Options 169

Figure 10-4 Engine Power vs. HC Emission Rate for Three Options 169

Figure 10-5 Engine Power vs. Emission Rate for Acceleration Mode and Cruise Mode 172

Figure 10-6 Histograms of Three Pollutants for Acceleration Mode 173

Figure 10-7 Median and Mean of NOx Emission Rates in Acceleration Mode by Bus 174

Figure 10-8 Median and Mean of CO Emission Rates in Acceleration Mode by Bus 175

Figure 10-9 Median and Mean of HC Emission Rates in Acceleration Mode by Bus 175

Figure 10-10 Histograms of Engine Power in Acceleration Mode by Bus 178

xvi Figure 10-11 Histogram, Boxplot, and Probability Plot of Truncated NOx Emission Rate in Acceleration Mode 181

Figure 10-12 Histogram, Boxplot, and Probability Plot of Truncated CO Emission Rate in Acceleration Mode 182

Figure 10-13 Histogram, Boxplot, and Probability Plot of Truncated HC Emission Rate in Acceleration Mode 182

Figure 10-14 Histogram, Boxplot, and Probability Plot of Truncated Transformed NOx Emission Rate in Acceleration Mode 183

Figure 10-15 Histogram, Boxplot, and Probability Plot of Truncated Transformed CO Emission Rate in Acceleration Mode 183

Figure 10-16 Histogram, Boxplot, and Probability Plot of Truncated Transformed HC Emission Rate in Acceleration Mode 184

Figure 10-17 Original Untrimmed Regression Tree Model for Truncated Transformed NOx Emission Rate in Acceleration Mode 185

Figure 10-18 Reduction in Deviation with the Addition of Nodes of Regression Tree for Truncated Transformed NOx Emission Rate in Acceleration Mode 185

Figure 10-19 Trimmed Regression Tree Model for Truncated Transformed NOx Emission Rate in Acceleration Mode 187

Figure 10-20 Original Untrimmed Regression Tree Model for Truncated Transformed CO Emission Rate in Acceleration Mode 189

Figure 10-21 Reduction in Deviation with the Addition of Nodes of Regression Tree for Truncated Transformed CO Emission Rate in Acceleration Mode 189

Figure 10-22 Trimmed Regression Tree Model for Truncated Transformed CO Emission Rate in Acceleration Mode 191

Figure 10-23 Original Untrimmed Regression Tree Model for Truncated Transformed HC Emission Rate in Acceleration Mode 192

Figure 10-24 Reduction in Deviation with the Addition of Nodes of Regression Tree for Truncated Transformed HC Emission Rate in Acceleration Mode 193

Figure 10-25 Trimmed Regression Tree Model for Truncated Transformed HC in Acceleration Mode 195

Figure 10-26 Secondary Trimmed Regression Tree Model for Truncated Transformed HC Emission Rate in Acceleration Mode 196

xvii Figure 10-27 Final Regression Tree Model for Truncated Transformed HC and Engine Power in Acceleration Mode 198

Figure 10-28 QQ and Residual vs. Fitted Plot for NOx Model 1.1 201

Figure 10-29 QQ and Residual vs. Fitted Plot for NOx Model 1.2 202

Figure 10-30 QQ and Residual vs. Fitted Plot for NOx Model 1.3 203

Figure 10-31 QQ and Residual vs. Fitted Plot for NOx Model 1.4 205

Figure 10-32 QQ and Residual vs. Fitted Plot for NOx Model 1.5 208

Figure 10-33 QQ and Residual vs. Fitted Plot for CO Model 2.1 214

Figure 10-33 QQ and Residual vs. Fitted Plot for CO Model 2.2 215

Figure 10-34 QQ and Residual vs. Fitted Plot for CO Model 2.3 216

Figure 10-35 QQ and Residual vs. Fitted Plot for CO Model 2.4 218

Figure 10-36 QQ and Residual vs. Fitted Plot for CO Model 2.5 221

Figure 10-37 QQ and Residual vs. Fitted Plot for CO Model 2.6 223

Figure 10-38 QQ and Residual vs. Fitted Plot for HC Model 3.1 228

Figure 10-39 QQ and Residual vs. Fitted Plot for HC Model 3.2 230

Figure 10-40 QQ and Residual vs. Fitted Plot for HC Model 3.3 231

Figure 10-41 QQ and Residual vs. Fitted Plot for HC Model 3.4 233

Figure 11-1 Histograms of Three Pollutants for Cruise Mode 239

Figure 11-2 Median and Mean of NOx Emission Rates in Cruise Mode by Bus 240

Figure 11-3 Median and Mean of CO Emission Rates in Cruise Mode by Bus 240

Figure 11-4 Median and Mean of HC Emission Rates in Cruise Mode by Bus 241

Figure 11-5 Histograms of Engine Power in Cruise Mode by Bus 244

Figure 11-6 Histogram, Boxplot, and Probability Plot of Truncated NOx Emission Rate in Cruise Mode 246

Figure 11-7 Histogram, Boxplot, and Probability Plot of Truncated CO Emission Rate in Cruise Mode 247

xviii Figure 11-8 Histogram, Boxplot, and Probability Plot of Truncated HC Emission Rate in Cruise Mode 247

Figure 11-9 Histogram, Boxplot, and Probability Plot of Truncated Transformed NOx Emission Rate in Cruise Mode 248

Figure 11-10 Histogram, Boxplot, and Probability Plot of Truncated Transformed CO Emission Rate in Cruise Mode 248

Figure 11-11 Histogram, Boxplot, and Probability Plot of Truncated Transformed HC Emission Rate in Cruise Mode 249

Figure 11-12 Original Untrimmed Regression Tree Model for Truncated Transformed NOx Emission Rate in Cruise Mode 250

Figure 11-13 Reduction in Deviation with the Addition of Nodes of Regression Tree for Truncated Transformed NOx Emission Rate in Cruise Mode 250

Figure 11-14 Trimmed Regression Tree Model for Truncated Transformed NOx Emission Rate in Cruise Mode 252

Figure 11-15 Original Untrimmed Regression Tree Model for Truncated Transformed CO Emission Rate in Cruise Mode 254

Figure 11-16 Reduction in Deviation with the Addition of Nodes of Regression Tree for Truncated Transformed CO Emission Rate in Cruise Mode 254

Figure 11-17 Trimmed Regression Tree Model for Truncated Transformed CO Emission Rate in Cruise Mode 256

Figure 11-18 Original Untrimmed Regression Tree Model for Truncated Transformed HC Emission Rate in Cruise Mode 257

Figure 11-19 Trimmed Regression Tree Model for Truncated Transformed HC Emission Rate in Cruise Mode 259

Figure 11-20 Secondary Trimmed Regression Tree Model for Truncated Transformed HC in Cruise Mode 261

Figure 11-21 Final Regression Tree Model for Truncated Transformed HC and Engine Power in Cruise Mode 262

Figure 11-22 QQ and Residual vs. Fitted Plot for NOx Model 1.1 265

Figure 11-23 QQ and Residual vs. Fitted Plot for NOx Model 1.2 266

Figure 11-24 QQ and Residual vs. Fitted Plot for NOx Model 1.3 267

Figure 11-25 QQ and Residual vs. Fitted Plot for NOx Model 1.4 270

xix Figure 11-26 QQ and Residual vs. Fitted Plot for CO Model 2.1 275

Figure 11-27 QQ and Residual vs. Fitted Plot for CO Model 2.2 276

Figure 11-28 QQ and Residual vs. Fitted Plot for CO Model 2.3 277

Figure 11-29 QQ and Residual vs. Fitted Plot for CO Model 2.4 279

Figure 11-30 QQ and Residual vs. Fitted Plot for HC Model 3.1 284

Figure 11-31 QQ and Residual vs. Fitted Plot for HC Model 3.2 285

Figure 11-32 QQ and Residual vs. Fitted Plot for HC Model 3.3 286

Figure 11-33 QQ and Residual vs. Fitted Plot for HC Model 3.4 288

Figure 12-1 QQ and Residual vs. Fitted Plot for NOx Model 1 297

Figure 12-2 Trimmed Regression Tree Model for Truncated Transformed NOx 301

Figure 12-3 QQ and Residual vs. Fitted Plot for Load-Based Only NOx Emission Rate Model 303

xx SUMMARY

Heavy-duty diesel vehicle (HDDV) operations are a major source of pollutant emissions in major metropolitan areas. Accurate estimation of heavy-duty diesel vehicle emissions is essential in air quality planning efforts because highway and non-road heavy-duty diesel emissions account for a significant fraction of the oxides of nitrogen

(NOx) and particulate matter (PM) emissions inventories. Yet, major modeling deficiencies in the current MOBILE6 modeling approach for heavy-duty diesel vehicles have been widely recognized for more than ten years. While the most recent

MOBILE6.2 model integrates marginal improvements to various internal conversion and correction factors, fundamental flaws inherent in the modeling approach still remain.

The major effort of this research is to develop a new heavy-duty vehicle load- based modal emission rate model that overcomes some of the limitations of existing models and emission rates prediction methods. This model is part of the proposed

Heavy-Duty Diesel Vehicle Modal Emission Modeling (HDDV-MEM) which was developed by Georgia Institute of Technology. HDDV-MEM first predicts second-by- second engine power demand as a function of vehicle operating conditions and then applies brake-specific emission rates to these activity predictions.

To provide better estimates of microscopic level, this modeling approach is designed to predict second-by-second emissions from onroad vehicle operations. This research statistically analyzes the database provided by EPA and yields a model for prediction emissions at microscopic level based on engine power demand and driving mode. Research results will enhance the explaining ability of engine power demand on emissions and the importance of simulating engine power in real world applications. The

xxi modeling approach provides a significant improvement in HDDV emissions modeling compared to the current average speed cycle-based emissions models.

xxii CHAPTER 1

INTRODUCTION

1.1 Emissions from Heavy-Duty Diesel Vehicles

Heavy-duty diesel vehicles (HDDVs) operations are a major source of oxides of nitrogen (NOx) and particulate matter (PM) emissions in metropolitan area nationwide.

Although HDDVs constitute a small portion of the onroad fleet, they typically contribute more than 45% of NOx and 75% of PM onroad mobile source emissions (USEPA 2003).

HDDV emissions are a large source of global greenhouse gas and toxic air containment emissions. According to Environmental Defense Report in 2002, NOx causes many environmental problems including: acid rain, haze, global warming and nutrient overloading leading to water quality degradation (CEDF 2002). HDDV emissions are also harmful to human health and the environment (SCAQMD 2000). Groundbreaking long-term studies of children’s health conducted in California have demonstrated that particle pollution may significantly reduce lung function growth in children (Peters 1999;

Avol 2001; Gauderman 2002). Previous studies have stressed the significance of emissions from HDVs, in urban non-attainment areas especially for ozone (for which nitrogen oxides are a precursor) and PM2.5 (Lloyd and Cackette 2001; Gautam and Clark

2003).

Over the last several decades, both government and private industry have made extensive efforts to regulate and control mobile source emissions. In 1961, the first automotive emissions control technology in the nation, Positive Crankcase Ventilation

(PCV), was mandated by the California Motor Vehicle State Bureau of Air Sanitation to control hydrocarbon crankcase emissions, and PCV Requirement went into effect on domestic passenger vehicles for sale in California in 1963 (CARB 2004). At the same

1 time, first Federal Clean Air Act was enacted. Although this act only dealt with reducing air pollution by setting emissions standards for stationary sources such as power plants and steel mills at the beginning, amendments of 1965, 1966 and 1967 focused on establishing standards for automobile emissions (AMS 2005). Emission control was first required on light-duty gasoline vehicles (LDVs) in the 1968 model year by USEPA.

Developed and refined over a period of more than 30 years, these controls have become more effective at reducing LDV emissions (FCAP 2004).

The relative importance of emissions from HDDVs has increased significantly because today’s gasoline powered vehicles are more than 95% cleaner than vehicles in

1968. Considering that HDDVs typically have a life cycle of over one million miles and may be on the road as long as 30 years, and HDDVs will continue to play a major emission inventory role with increases in goods movement with their high durability and reliability, modeling of HDDV emissions is going to become increasingly important in air quality planning.

1.2 Current Heavy-Duty Vehicle Emissions Modeling Practices

In current regional and microscale modeling conducted in every state except

California, HDDV emissions rates are taken from the U.S. Environmental Protection

Agency’s (EPA’s) MOBILE 6.2 model(USEPA 2001a). MOBILE 6.2 emission rates were derived from baseline emission rates (gram/brake- horsepower-hour) developed in the laboratory using engine dynamometer test cycles. While different driving cycles have been developed over the years, dynamometer testing is conceptually designed to obtain a

“representative sample” of vehicle operations. These work-based emission rates are then modified through a series of conversion and correction factors so that approximate emission rates in units of grams/mile that can be applied to onroad vehicle activity

(vehicle miles traveled), as a function of temperature, humidity, altitude, average vehicle speed, etc. (Guensler 1993). The conversion process used to translate laboratory

2 emission rates to onroad emission rates employs fuel density, brake specific fuel

consumption, and fuel economy for each HDDV technology class. However, the

emission rate conversion process does not appropriately account for the impacts of roadway operating conditions on brake specific fuel consumption and fuel economy

(Guensler et al. 1991).

The U.S. Environment Protection Agency (USEPA) is currently developing a new set of modeling tools for the estimation of emissions produced by onroad and off-road

mobile source. The new Multi-scale mOtor Vehicle & equipment Emission System,

known as MOVES, is a modeling system designed to better predict emissions from

onroad operations. The philosophy behind MOVES is to develop a model that is as

directly data-driven as possible, meaning that emission rate are developed from second-

by-second or binned emission rate data.

1.3 Research Approaches and Objectives

The major effort of this research is to develop a new heavy-duty vehicle load-

based modal emission rate model that overcomes some of the limitations of existing

models and emission rates prediction methods. This model is part of the proposed

Heavy-Duty Diesel Vehicle Modal Emission Modeling (HDDV-MEM) which was developed by Georgia Institute of Technology (Guensler et al. 2006). HDDV-MEM differs from other proposed HDDV modal models (Frey et al. 2002; Nam 2003; Barth et

al. 2004) in that the modeling framework first predicts second-by-second engine power

demand as a function of vehicle operating conditions and then applies brake-specific

emission rates to these activity predictions. This means that HDDV emission rates are

predicted as a function of engine horsepower loads for different driving modes. Hence,

the basic algorithm and matrix calculation in the HDDV-MEM should be transferable to

MOVES. The new model implementation is similar in general structure to previous

model emission rate model known as Mobile Emissions Assessment System for Urban

3 and Regional Evaluation (MEASURE) model developed by Georgia Institute of

Technology several years ago (Backman 2000).

The major effort of this research consists of a number of specific objectives outlined below:

• Develop a new load-based modal emission rate model to improve spatial/temporal

emissions modeling

• Develop a HDDV modal emission rate model to more accurately estimate onroad

HDDV emissions

• Develop a modal model that can be verified at multiple levels

• Develop a HDDV modal emission rate model that can be integrated into the

MOVES

1.4 Summary of Research Contributions

There are four major contributions developed by this research. First, a framework for emission rate modeling suitable for predicting emissions at different scales

(microscale, mesoscale, and macroscale) is established. Since this model is developed using on-board emissions data which are collected under real-world conditions, this model will provide capabilities for integrating necessary vehicle activity data and emission rate algorithms to support second-by-second and link-based emissions prediction. Combined with GIS framework, this model will improve spatial/temporal emissions modeling.

Second, the relationship between engine power and emissions is explored and integrated into the modeling framework. Research results indicate that engine power is more powerful than surrogate variables to present load data in the proposed model.

Based on the important role of engine power in explaining the variability of emissions, it is better to include the load data measurement during emission data collection procedure.

4 Meanwhile, development of methods to simulate real world engine power is equally important.

Third, this research verifies that vehicle emission rates are highly correlated with modal vehicle activity. To get better understanding of driving modes, it is important to examine not only emission distributions, but also engine power distributions.

Finally, a living framework is created for further improvement. As more databases become available, this approach could be re-run and get more reliable load- based modal emission model based on the same philosophy.

1.5 Dissertation Organization

Chapter 2 examines the diesel fuel combustion process and its relationship to diesel engine emissions formation. Chapter 3 overviews the existing heavy-duty vehicle emission models and presents the proposed heavy-duty diesel vehicle modal emission model (HDDV-MEM). Chapter 4 provides an overview of the emission rate testing databases provided by USEPA, the quality assure and quality control (QA/QC) procedures to review the validity of the data, and the methods used to post-process these databases to correct data deficiencies. In Chapter 5, the various statistical models considered for data analysis are discussed. Chapter 6 selects the database used to develop the conceptual model and discusses the influence of explanatory variables on emissions.

Chapter 7 covers sensitivity tests of driving mode definitions and outlines the potential impacts on derived models. Chapter 8 to 11, elaborate the different emission models developed for idle, deceleration, acceleration and cruise driving modes. In Chapter 12, research results are verified. Finally, Chapter 13 presents a discussion and conclusion on research results.

5 CHAPTER 2

HEAVY-DUTY DIESEL VEHICLE EMISSIONS

Diesel engines differ from gasoline engines in terms of the combustion processes and engine size, giving rise to their different emission properties and therefore different emissions standards. This chapter examines the diesel fuel combustion process and its relationship to diesel engine emissions formation followed by a summary of the emission regulations for diesel engines.

2.1 How Diesel Engine Works

By far the predominant engine design for transportation vehicles is the reciprocating internal combustion (IC) engine which operates either on a four-stroke or a two-stroke cycle. The two-stroke engine is commonly found in lower-power applications such as snowmobiles, lawnmowers, mopeds, outboard motors and motorcycles, while both gasoline and diesel automotive engines are classified as four-stroke engines. To understand the formation and control of emissions, it is necessary to first develop an understanding of the operation of the internal combustion engine.

2.1.1 The Internal Combustion Engine

Internal combustion engines generate power by converting the chemical energy stored in fuels into mechanical energy. The engine is termed “internal combustion” because combustion occurs in a confined space called a combustion chamber.

Combustion of the fuel charge inside a chamber causes a rapid rise in temperature and pressure of the gases in the chamber, which are permitted to expand. The expanding gases are used to move a piston, turbine blades, rotor, or the engine itself.

6 The four-stroke gasoline engine cycle is also called Otto cycle, in honor of

Nikolaus Otto, who is credited with inventing the process in 1867. The four piston

strokes are illustrated in Figure 2-1. The following processes take place during one cycle

of operation:

1. Intake stroke: the piston starts at the top, the intake valve opens, and the piston

moves down to let the engine takes in a fresh charge composed of a mixture of fuel and

air (for spark-ignition or gasoline engine) or air only (for auto-ignition or diesel engine).

(Part 1 of the figure)

2. Compression stroke: then the piston moves back up to compress this fuel/air

mixture (gasoline engines) or the air only (diesel engines). In gasoline engines

combustion is by ignition from a spark plug, in diesel engines auto-ignition occurs

when fuel is injected into the compressed air which has achieved a high temperature

through compression such that the temperature is high enough to cause self-ignition. (Part

2 of the figure)

3. Expansion stroke: when the piston reaches the top of its stroke, the

combustion process results in a substantial increase in the gas temperature and pressure

and drives the piston down. (Part 3 of the figure)

4. Exhaust stroke: once the piston hits the bottom of its stroke, the exhaust valve

opens and the exhaust leaves the cylinder into the exhaust manifold and then into the tail pipe. Discharge of the burnt gases (exhaust) from the cylinder to make room for the next cycle. (Part 4 of the figure)

7

Figure 2-1 Actions of a four-stroke internal combustion engine (Adapted from (HowStuffWorks 2005))

Figure 2-1 is a diagrammatic representation of the four strokes of an internal

combustion engine. The upper end of the cylinder consists of a clearance space in which

ignition and combustion occur. The expanding medium pushes against the piston head

inside the cylinder, causing the piston to move; this straight line motion of the piston is

converted into the desired rotary motion of the wheels by means of a drivetrain

consisting of a connecting rod and crankshaft. Figure 2-1 illustrates that the only stroke that delivers useful work is the expansion stroke; thus the other three strokes are termed idle strokes. The reader interested in a detailed description of the internal combustion engine is referred to specialized texts, such as Heywood (Heywood 1998) and Newton et al. (Newton et al. 1996).

8 2.1.2 Comparison with the Gasoline Engine

The diesel engine employs the compression ignition cycle. German engineer

Rudolf Diesel developed the idea for the diesel engine and received the patent on

February 23, 1893. His goal was to create an engine with high efficiency. Figure 2-2 is a diagrammatic representation of the four strokes of a diesel engine. The main differences between the gasoline engine and the diesel engine are:

• A gasoline engine compresses at a ratio of 8:1 to 12:1, while a diesel engine compresses at a ratio of 14:1 to as high as 25:1. The higher compression ratio of the diesel engine leads to higher peak combustion temperatures and better fuel efficiency.

• Unlike a gasoline engine, which intakes a mixture of gas and air, compresses it and ignites the mixture with a spark, a diesel engine takes in just air, compresses it and then injects fuel into the compressed air. The heat of the compressed air spontaneously ignites the fuel.

• Gasoline engines generally use either carburetion, in which the air and fuel is mixed long before the air enters the cylinder, or port fuel injection, in which the fuel is injected just prior to the intake stroke (outside the cylinder), while diesel engines use direct fuel injection – the diesel fuel is injected directly into the cylinder.

9

Figure 2-2 Actions of a four-stroke diesel engine (HowStuffWorks 2005)

2.2 Diesel Engine Emissions

Like any other internal combustion engine, diesel engines convert the chemical

energy contained in diesel fuel into mechanical power. Diesel fuel is injected under

pressure into the engine cylinder, where it mixes with air and combustion occurs. Diesel

fuel is heavier and oilier than gasoline. Diesel fuel evaporates much more slowly than

gasoline, with a boiling point that is actually higher than that of water. The lean nature of the diesel-air mixture results in a combustion environment that produces lower emission

rates of carbon monoxide (CO) and hydrocarbons (HC) compared to gasoline-powered

engines. However, diesel engines do produce relatively high levels emissions of oxides

of nitrogen (NOx) and particulate matter (PM), especially fine particulate matter. This section will discuss oxides of nitrogen and particulate emissions in detail.

10 2.2.1 Oxides of Nitrogen and Ozone Formation

Oxides of nitrogen, a mixture of nitric oxide (NO) and nitrogen dioxide (NO2), are produced from the destruction of atmospheric nitrogen (N2) during the combustion

process. Atmospheric air generally consists 80% N2 and 20% O2, and these elements are stable because of the moderate temperatures and pressures. However, during high

temperature and pressure conditions of combustion, excess oxygen in the combustion

chamber reacts with N2 to create NO which is quickly transformed into NO2. The role of nitrogen contained in the air in NO formation was initially postulated by Zeldovich

(Zeldovich et al. 1947). In near-stoichiometric or lean systems the mechanisms associated with NO formation (as many as 30 or so independent chemical reactions that also involve participation of hydrocarbon species) can generally be simplified to the following:

Reaction 1: O2 O + O

Reaction 2: O + N2 NO + N

Reaction 3: N + O2 NO + O In near-stoichiometric and fuel-rich mixtures, where the concentration of OH

radicals can be high, the following reaction also takes place:

Reaction 4: N + OH NO + H

Reaction 4, together with reactions 1, 2 and 3, are known as the extended

Zeldovich mechanism. It is also important to note that emitted Nitric Oxide (NO) will

oxidize to Nitrogen dioxide (NO2) in the atmosphere over a period of a few hours. Oxides of nitrogen (NOx) are reactive gases that cause a host of environmental

concerns impacting adversely on human health and welfare. Nitrogen dioxide (NO2), in particular, is a brownish gas that has been linked with higher susceptibility to respiratory infection, increased airway resistance in asthmatics, and decreased pulmonary function.

Most importantly, NOx emitted from heavy-duty vehicles plays a major role in the

formation of ground level ozone pollution, which causes wide-ranging damage to human

11 health and the environment (USEPA 1995). Ozone is a colorless, highly reactive gas

with a distinctive odor. Naturally, ozone is formed by electrical discharge (lightning) and

in the upper atmosphere at altitudes of between 15 and 35 km. Stratospheric ozone

protects the Earth from harmful ultraviolet radiation from the sun. However, ground

level ozone is formed by chemical reactions involving NOx and volatile organic compounds (VOC) combining in the presence of heat and sunlight. These two categories

of pollutants are also referred to as ozone precursors. The production of photochemical

oxidants usually occurs over several hours which mean that the highest concentrations of

ozone normally occur on summer afternoons, in areas downwind of major sources of

ozone precursors. The simplified reaction processes are illustrated as:

NO2 + VOC + sunlight (UV) ⇒ NO2 + O2 + sunlight (UV) ⇒ NO + O3

At ground level, elevated ozone concentrations can cause health and

environmental problems. Ozone can affect the human cardiac and respiratory systems, irritating the eyes, nose, throat, and lungs. Symptoms of ozone exposure include itchy and watery eyes, sore throats, swelling within the nasal passages and nasal congestion.

Effects from ozone are experienced only for the period of exposure to elevated levels.

EPA promulgated 8-hour ozone standards in 1997 and designated an area as

nonattainment if it has violated, or has contributed to violations of the national 8-hour ozone standard over a three-year period.

2.2.2 Fine Particulate Matter (PM2.5)

Particulate matter (PM) is a complex mixture of solid and liquid particles

(excluding water) that are suspended in air. These particles typically consist of a mixture

of inorganic and organic chemicals, including carbon, sulfates, nitrates, metals, acids, and

semi-volatile compounds. The size of PM in air ranges from approximately 0.005 to 100

micrometers (µm) in aerodynamic diameter -- the size of just a few atoms to about the

12 thickness of a human hair. USEPA defined three general categories for PM as coarse (10

to 2.5 µm), fine (2.5 µm or smaller), and ultrafine (0.1 µm or smaller).

Heavy-duty diesel vehicles are known to emit large quantities of small particles

(Kittelson et al. 1978). A majority of the PM found in diesel exhaust is in the nanometer

size range. Lloyd found that more than 90% of fine particles from heavy-duty vehicles

are small than 1µm in diameter (Lloyd and Cackette 2001).

Fine PM can cause not only human health problems and property damage, but

also aversely impact the environment through visibility reduction and retard plant growth

(Davis et al. 1998). Health studies have shown a significant association between

exposure to fine particles and premature death from heart or lung diseases. Other

important effects include aggravation of respiratory and cardiovascular disease, lung

disease, decreased lung function, asthma attacks. Individuals particularly sensitive to fine

particle exposure include older adults, people with heart and lung disease, and

children(USEPA 2005a). EPA promulgated the PM2.5 standard in 1997 and included a

3 24-hour standard for PM2.5 set at 65 micrograms per cubic meter (µg/m ), and an annual standard of 15 µg/m3.

2.3 Heavy-Duty Diesel Vehicle Emission Regulations

2.3.1 National Ambient Air Quality Standards

The Clean Air Act, which was last amended in 1990, requires the U.S.

Environmental Protection Agency (USEPA) to set National Ambient Air Quality

Standards (NAAQS) to safeguard public health against six common air pollutants. They

are ozone (O3), particulate matter (PM), sulfur dioxide (SO2), carbon monoxide (CO),

nitrogen dioxide (NO2) and lead (Pb). The Clean Air Act established two types of national air quality standards. Primary standards set limits to protect public health,

including the health of "sensitive" populations such as asthmatics, children, and the

13 elderly. Secondary standards set limits to protect public welfare, including protection

against decreased visibility, damage to animals, crops, vegetation, and buildings (CFR

2004b). Table 2-1 illustrates the current NAAQS for ambient concentrations of various

pollutants. Units of measure for the standards are parts per million (ppm) by volume,

milligrams per cubic meter of air (mg/m3), and micrograms per cubic meter of air (µg/m3).

Table 2-1 National Ambient Air Quality Standards (USEPA 2006)

Pollutant Average Times Standard Value Standard Type Carbon Monoxide 8-hour Average 9 ppm (10 mg/m3) Primary (CO) 1-hour Average 35 ppm (40 mg/m3) Primary Nitrogen Dioxide Annual Arithmetic 0.053 ppm (100 Primary & 3 (NO2) Mean µg/m ) Secondary Ozone (O3) 1-hour Average 0.12 ppm (235 Primary & µg/m3) Secondary 8-hour Average 0.08 ppm (157 Primary & µg/m3) Secondary Lead (Pb) Quarterly Average 1.5 µg/m3 Primary & Secondary 3 Particulate (PM10) Annual Arithmetic 50 µg/m Primary & Mean Secondary 24-hour Average 150 µg/m3 Primary & Secondary 3 Particulate (PM2.5) Annual Arithmetic 15 µg/m Primary & Mean Secondary 24-hour Average 65 µg/m3 Primary & Secondary Sulfur Dioxide Annual Arithmetic 0.030 ppm (80 Primary 3 (SO2) Mean µg/m ) 24-hour Average 0.14 ppm (365 Primary µg/m3) 3-hour Average 0.50 ppm (1300 Secondary µg/m3)

2.3.2 Heavy-Duty Engine Certification Standards

Heavy-duty vehicles are defined as vehicles of GVWR (gross vehicle weight

rating) of above 8,500 lbs in the federal jurisdiction and above 14,000 lbs in California

14 (model year 1995 and later). Diesel engines used in heavy-duty vehicles are further

divided into service classes by GVWR, as follows:

• Light heavy-duty diesel engines: 8,500

(14,000

• Medium heavy-duty diesel engines: 19,500≤MHDDE≤33,000

• Heavy heavy-duty diesel engines (including urban bus): HHDDE>33,000

Under the federal light-duty Tier 2 regulation (phased-in beginning 2004),

vehicles of GVWR up to 10,000 lbs used for personal transportation have been re-

classified as “medium-duty passenger vehicles” (MDPV – primarily larger SUVs and

passenger vans) and are subject to the light-duty vehicle legislation. Thus, the same

diesel engine model used for the 8,500-10,000 lbs vehicle category may be classified as

either light- or heavy-duty and certified to different standards, depending on the

manufacturer-defined application (CFR 2004c). Except for the heavy-duty vehicles

classified as LDVs, all heavy-duty vehicle emissions standards are established using the

engine dynamometer certification process.

2.3.3 Heavy-Duty Engine Emission Regulations

EPA regulates heavy-duty vehicle emissions, complied with emissions standards,

over the useful life of the engine. Useful life was adopted as follows (USEPA and

California) (CFR 2004d):

• LHDDE – 8 years/110,000 miles (whichever occurs first)

• MHDDE – 8 years/185,000 miles

• HHDDE – 8 years/290,000 miles

Federal useful life requirements were later increased to 10 years, with no change

to the above mileage numbers, for the urban bus PM standard (1994+) and for the NOx standard (1998+). The emission warranty period is 5 years/100,000 miles (5 years/100,000 miles/3,000 hours in California), but no less than the basic mechanical

15 warranty for the engine family. Table 2-2 shows the heavy-duty engine emissions standards by model year group.

Table 2-2 Heavy-Duty Engine Emissions Standards (USEPA 1997)

Year HC (g/bhp-hr) CO (g/bhp-hr) NOx (g/bhp-hr) PM (g/bhp-hr) Heavy-Duty Diesel Truck Engines 1988 1.3 15.5 10.7 0.60 1990 1.3 15.5 6.0 0.60 1991 1.3 15.5 5.0 0.25 1994 1.3 15.5 5.0 0.10 1998 1.3 15.5 4.0 0.10 Urban Bus Engines 1991 1.3 15.5 5.0 0.25 1993 1.3 15.5 5.0 0.10 1994 1.3 15.5 5.0 0.07 1996 1.3 15.5 5.0 0.05* 1998 1.3 15.5 4.0 0.05* * -in-use PM standard 0.07

2.4 Heavy-Duty Diesel Vehicle Emission Modeling

There are several models currently used to estimate emission from heavy-duty vehicle. The most common emission rate models are VMT-based or cycle-based models, developed from laboratory test facility driving cycle data. Due to lack of available data to representing real world condition, all previous models were developed based upon the engine dynamometer data. The following chapter will address this issue in detail.

16 CHAPTER 3

HEAVY-DUTY DIESEL VEHICLE

EMISSIONS MODELING

There are several models currently used to estimate emission from heavy-duty

vehicle. A comprehensive review of the existing heavy-duty vehicle emission modeling

will help the modelers to understand the different approaches and how they can

contribute to the development of enhanced emission rate modeling techniques.

The most common emission rate models are VMT-based or cycle-based models,

developed from laboratory test facility driving cycle data. Fuel-based model modeled

emission as the product of fuel rate and other parameters. In the 1990’s, even the

proposed enhanced modal models, designed to predict emissions as a function of the

speed and acceleration profiles of vehicles, were still based upon statistical analysis of

cycle-based data (Backman 2000; Fomunung 2000). More recent emission rate modeling

frameworks are proposing to model modal emission rates on a second-by-second basis directly from the vehicle operating mode.

3.1 VMT-Based Vehicle Emission Models

The current emission rate models used by state and federal agencies include the

Mobile Source Emission Model (MOBILE) series of models developed by the U.S.

Environmental Protection Agency (USEPA) and the Emission Factor Emission Inventory

Model (EMFAC) series developed by California Air Resources Board (CARB).

3.1.1 MOBILE

USEPA created MOBILE in the late 1970’s to estimate vehicle emission, which

has since become the nation’s standard in assessing the emission impacts of various

17 transportation inputs. MOBILE uses the method of base emission rates and correction

factors. This model has undergone significant expansion and improvements over the

years. The latest version is MOBILE6 released in Feb 2002.

MOBILE is based on engine dynamometer test data from selected driving cycles.

The Federal Test Procedure (FTP) transient cycle is composed of a unique profile of

stops, starts, constant speed cruises, accelerations and decelerations. Different driving cycles are developed to simulate both urban and freeway driving. A concern with driving cycles is that they may not be sufficiently representative of real-world emissions (Kelly and Groblicki 1993; Denis et al. 1994). For HDV emission rates, MOBILE uses the method of base emission rates and conversion factors which convert the g/bhp-hr emissions estimates observed in the laboratory to g/mile emission rates, to be consistent with available travel information. Conversion factor used to convert the g/bhp-hr emissions estimates to grams per mile traveled contributes a large source of uncertainty in MOBILE model since the BSFC (brake specific fuel consumption) data are aggregated for the fleet and may not represent in-use vehicle characteristics (Guensler et al. 1991).

Conversion factors have improved accuracy in MOBILE6 due to improved data, but fundamental flaws remain (Guensler et al. 2006).

3.1.1.1 Diesel Engine Test Cycles

Currently EPA uses the transient Federal Test Procedure (FTP) engine

dynamometer cycle, which includes both engine cold and warm start operations, for

heavy-duty vehicles [CFR Title 40, Part 86.1333]. Unlike the chassis dynamometer test

for light-duty vehicle, the engine is removed from the vehicles’ chassis, mounted on the

engine dynamometer test stand, and operated on the transient FTP test cycle. The

transient cycle consists of four phases: the first is a NYNF (New York Non Freeway)

phase typical of light urban traffic with frequent stops and starts, the second is LANF

(Los Angeles Non Freeway) phase typical of crowded urban traffic with few stops, the

18 third is a LAFY (Los Angeles Freeway) phase simulating crowded expressway traffic in

Los Angeles, and the fourth phase repeats the first NYNF phase. This cycle comprises a cold start after a parking overnight, followed by idling, acceleration and deceleration phases, and a wide variety of different speeds and loads sequenced to simulate the running of the vehicle that corresponds to the engine being tested. There are few stabilized running conditions, and the average load factor is about 20 to 25% of the maximum horsepower available at a given speed.

Figure 3-1 FTP Transient Cycle (DieselNet 2006)

Emission and operation parameters are measured while the engine operates during the test cycle. The engine torque is determined by applying performance percentages with an engine lug curve (maximum torque curve). Engine torque is then converted to engine brake horsepower using engine revolution per minute (RPM). Brake specific emissions rate are reported in g/bhp-hr and then converted to g/mile using pre-defined conversion factors [CFR Title 40, Part 86.1342-90].

19 Because the engine dynamometer test procedure does not directly account for the

impacts from load and grade changes, a chassis dynamometer test procedure and cycle

known as the HDV urban dynamometer driving schedule (HDV-UDDS), was developed

[CFR Title 40, Part 86, App. I], sometimes referred to as “cycle D”. This cycle is different from the UDDS cycle for light-duty vehicles (FTP-72). This HDV cycle lasts

1060 seconds, and covers 5.55 miles. The average speed for HDV UDDS is 18.86 mph while the maximum speed is 58mph. The following figure shows the speed profile for the chassis UDDS test.

Figure 3-2 Urban Dynamometer Driving Schedule Cycle for Heavy-Duty Vehicle (DieselNet 2006)

3.1.1.2 Baseline Emission Rates

Baseline emission rates (g/bhp-hr) for heavy-duty vehicles are obtained from the engine dynamometer test results collected during USEPA’s cooperative test program with engine manufacturers. The zero mile levels and deterioration rates for NOx, CO, and HC are presented in the following tables for heavy-duty gasoline and diesel engines. All the emission rates are available from “Update of Heavy-Duty Emission Levels (Model Years

1998-2004+) for Use in MOBILE6” (Lindhjem and Jackson 1999).

20 Table 3-1 Heavy-Duty Vehicle NOx Emission Rates in MOBILE6

Zero Mile Level (g/bhp-hr) Deterioration (g/bhp-hr/10,000 miles) Model Year Class Gasoline Diesel Engine Gasoline Diesel Engine Engine Engine Heavy Med. Light Heavy Med. Light 1988-1989 4.96 6.286.43 4.34 0.044 0.01 0.009 0.002 1990 3.61 4.854.85 4.85 0.026 0.004 0.006 0.011 1991-1993 3.24 4.564.53 1.38 0.038 0.004 0.007 0.003 1994-1997 3.24 4.614.61 1.08 0.038 0.003 0.001 0.001 1998-2003 2.59 3.683.69 3.26 0.038 0.003 0.001 0.001 2004+ 2.59 1.841.84 1.63 0.038 0.003 0.001 0.001

Table 3-2 Heavy-Duty Vehicle CO Emission Rates in MOBILE6

Zero Mile Level (g/bhp-hr) Deterioration (g/bhp-hr/10,000 miles) Model Year Class Gasoline Diesel Engine Gasoline Diesel Engine Engine Engine Heavy Med. Light Heavy Med. Light

1988-1989 13.84 1.341.70 1.21 0.246 0.008 0.018 0.022 1990 6.89 1.811.81 1.81 0.213 0.005 0.007 0.012 1991-1993 7.10 1.821.26 0.40 0.255 0.003 0.010 0.004 1994-1997 7.10 1.070.85 1.19 0.255 0.004 0.009 0.003 1998-2003 7.10 1.070.85 1.19 0.255 0.004 0.009 0.003 2004+ 7.10 1.070.85 1.19 0.255 0.004 0.009 0.003

Table 3-3 Heavy-Duty Vehicle HC Emission Rates in MOBILE6

Zero Mile Level (g/bhp-hr) Deterioration (g/bhp-hr/10,000 miles) Model Year Class Gasoline Diesel Engine Gasoline Diesel Engine Engine Engine Heavy Med. Light Heavy Med. Light

1988-1989 0.62 0.470.66 0.64 0.023 0.001 0.002 0.002 1990 0.35 0.520.52 0.52 0.023 0.000 0.001 0.001 1991-1993 0.33 0.300.40 0.47 0.021 0.000 0.001 0.001 1994-1997 0.33 0.220.31 0.26 0.021 0.001 0.001 0.001 1998-2003 0.33 0.220.31 0.26 0.021 0.001 0.001 0.001 2004+ 0.33 0.220.31 0.26 0.021 0.001 0.001 0.001

21 3.1.1.3 Conversion Factors

Because emission standards for both gasoline and diesel heavy-duty vehicles are

expressed in terms of grams per brake-horsepower (g/bhp-hr), the MOBILE6.2 model employs conversion factors of brake horsepower-hour per mile (bhp-hr/mile) to convert the emission certification data from engine testing to grams per mile. Conversion factors are a function of fuel density, brake-specific fuel consumption (BSFC), and fuel economy for each HDV class (USEPA 2002a). The conversion factors were calculated from the following expression:

(Equation 3-1)

To calculate BSFCs, USEPA first obtained data from model year 1987 through

1996 supplied by six engine manufacturers. USEPA then performed regression analysis

for BSFCs by model year for each weight class and a logarithmic curve to extrapolate

values prior to 1988 and after 1995, since sales data was only available for model years

1988 through 1995 (USEPA 2002c).

Fuel economy was calculated using a regression curve derived from the1992

Truck Inventory and Use Survey (TIUS) conducted by the U.S. Census Bureau. Fuel

densities were determined from National Institute for Petroleum and Energy Research

(NIPER) publications for both gasoline and diesel (Browning 1998). Using the equation

defining the conversion factor together with the data described above, weight class

specific conversion factors were calculated for gasoline and diesel vehicles for model

years 1987 through 1996 (USEPA 2002c).

22 3.1.2 EMFAC

EMFAC was developed by CARB separately from MOBILE based upon the presence of vehicle technologies in the onroad fleet that would be subject to more stringent standards and fuels used in California. The latest version, EMFAC 2002, was

released in September 2002. EMFAC can estimate emission for calendar years 1970 to

2040.

EMFAC abandoned the use of conversion factors from EMFAC 2000 and used chassis dynamometer data collected for 70 trucks tested over the Urban Dynamometer

Driving Schedule (UDDS). Although the use of UDDS test data marked a significant

improvement, it is hard to say that UDDS adequately represented the full range of heavy

duty diesel operation. Although the cycle was constructed from actual truck activity data, it lacks extended cruises known to cause many trucks to default to a high NOx emitting, fuel saving mode referred to as “Off-Cycle” NOx. The cycle also lacks hard accelerations known to result in high emissions of particulate matter (CARB 2002).

CARB continues to develop more mode test cycle designed to better depict the

emissions of HDDVs under real world conditions, including emissions from engines

programming to go “off-cycle” at certain speeds. Activity data from instrumented truck

studies conducted by Battelle and Jack Faucett Associates for CARB (2002) have been

used to develop a four mode heavy-heavy-duty diesel cycles. The following figure shows

these four mode cycles developed by CARB. It is reported that the creep mode produced

the greatest gram per mile results followed by the transient and the cruise mode. The

transient and cruise modes produced higher and lower emissions, respectively, than the

UDDS (CARB 2002).

23

Figure 3-3 CARB’s Four Mode Cycles (CARB 2002)

3.1.3 Summary

EPA’s MOBILE series models have significantly improved through the series of

model revisions from 1970’s. However, the MOBILE series of models still have major

modeling defects for the heavy-duty components, which have been widely recognized for

more than 10 years (Guensler et al. 1991). One of the most frequently stated defects is

that fleet average speed, which aggregates other vehicle activity factors that may yield

significant bias in emissions characterization, is used to characterize vehicle emission

rates.

In developing emissions inventories using the MOBILE and EMFAC emission

rate models, vehicle activity is estimated using travel demand models. The estimation of

vehicle miles traveled (VMT) was based on EPA’s fleet characterization study (USEPA

1998). It is common to estimate heavy-duty travel as a fixed percentage of predicted

traffic volumes (TRB 1995). This estimate is not correct since heavy-duty truck travel does not follow the same spatial and temporal patterns as that of light-duty vehicle

(Schlappi et al. 1993).

24 3.2 Fuel-Based Vehicle Emission Models

The fuel-based emission inventory models for heavy-duty diesel trucks combine

vehicle activity data (i.e., volume of diesel fuel consumed) with emission rates

normalized to fuel consumption (i.e., mass of pollutant emitted per unit volume of fuel

burned) to estimate emissions within a region of interest (Dreher and R. Harley 1998).

This approach was proposed to increase accuracy of truck VMT estimation by combining state level truck VMT with statewide fuel sales to estimate total heavy-duty truck activity, using the amount of fuel consumed as a measure of activity.

In California, fuel consumption data are available through tax records at the statewide level and this statewide fuel consumption can be apportioned to provide emission estimates for an individual air basin by month, day of week, and time of day.

At the same time, emission rates are normalized to fuel consumption as follows:

S EI = p (Equation 3-2) p BSFC Where EIp is the emission index for pollutant P, in units of mass of pollutant

emitted per unit mass of fuel burned; Sp is the brake specific pollutant emission rate obtained from the dynamometer test, expressed in g/bhp-hr units; and BSFC is the brake

specific fuel consumption of the engine being tested, also in g/bhp. Exhaust emissions

are estimated by multiplying vehicle activity, as measured by the volume of fuel used, by

emission rates which are normalized to fuel consumption and expressed as grams of

pollutant emitted per gallon of diesel fuel burned instead of grams of pollutant per mile

(Dreher and R. Harley 1998). Average emission rates for subgroups of vehicles are

weighted by the fraction of total fuel used by each vehicle subgroup to obtain an overall

fleet-average emission rate. The fleet-average emission rate is multiplied by regional fuel

sales to compute pollutant emissions (Singer and Harley 1996).

The advantages of the fuel-based approach include the fact that fuel-use data are

available from tax records in California. Furthermore, emission rates normalized to fuel

25 consumption vary considerably less over the full range of driving conditions than travel-

normalized emission factors (Singer and Harley 1996). The disadvantage is obvious too.

The tax records are not available for other states. It is difficult to get input data outside of

California, limiting the scope of the modeling approach. Furthermore, the users have to

run two models to predict fuel used first and then predict emission rates, which is not

statistically efficient.

3.3 Modal Emission Rate Models

Modal emission rate models works on the premise that emission are better

modeled as a function of specific models of vehicle operation, i.e., idle, steady-state

cruise, various levels of acceleration/deceleration, etc., than they are as a function of

average vehicle speed (Bachman 1998; Ramamurthy et al. 1998; USEPA 2001a).

Emissions of heavy-duty vehicle powered by diesel cycle engines are more likely to be a

function of brake work output of engine than normal gasoline vehicles, because

instantaneous emissions levels of diesel engine is highly correlated with the instantaneous work output of the engine (USEPA 2001a). With the consideration of vehicle modal activity, EPA and various research

communities have been developing modal activity based emission models. The report published by Nation Research Council (NRC 2000) comprehensively reviewed the modeling of mobile source emissions and provided recommendations for the improvement of future mobile source emission models. The following sections will introduce the most representative modal emission models one by one.

3.3.1 CMEM

The Comprehensive Modal Emissions Model (CMEM) was developed by the

Center for Environmental Research and Technology at University of California Riverside

(UCR-CERT). It was first funded by National Cooperative Highway Research Program

26 Project (1995-2000) and then is being enhanced and improved with EPA funding (2000- present). From 2001, CE-CERT created a modal-based inventory at the micro-

(intersection), meso- (highway link), and macro- (region) scale levels for light-duty vehicle (LDV) and heavy-duty diesel (HDD) vehicles. The CMEM model derived a fuel rate from road-load and simple powertrain model. Emissions rates are then derived empirically from the fuel rate. Fuel rate, or fuel consumption per unit time, forms the basis for CMEM.

The CMEM HDD emissions model (Barth et al. 2004) accepted the same approach as the light-duty vehicle model. In that model, second-by-second tailpipe emissions are modeled as the product of three components: fuel rate (FR), engine-out emission indices (gemission/gfuel), and an emission after-treatment pass fraction. The model is composed of six modules: 1) engine power demand; 2) engine speed; 3) fuel-rate; 4) engine control unit; 5) engine-out emissions; and 6) after-treatment pass fraction. The vehicle power demand is determined based on operating variables (second-by-second vehicle speed (from which acceleration can be derived; note that acceleration can be input as a separate input variable), grade, and accessory use (such as air conditioning)) and specific vehicle parameters (vehicle mass, engine displacement, cross-sectional area, aerodynamics, vehicle accessory load, transmission efficiency, and drive-train efficiency, and so on). The core of the model is the fuel rate calculation which is a function of power demand and engine speed. Engine speed is determined based on vehicle velocity, gear shift schedule and power demand (Barth et al. 2004). The model uses a total of 35 parameters to estimate vehicle tailpipe emissions.

3.3.2 MEASURE

The Mobile Emissions Assessment System for Urban and Regional Evaluation

(MEASURE) model was developed by Georgia Institute of Technology in the late 1990’s.

The MEASURE model is developed within a geographic information system (GIS) and

27 employs modal emission rates, varying emissions according to vehicle technologies and

modal operation (cruise, acceleration, deceleration, idle). The model emission rate

database consisted of more than 13,000 laboratory tests conducted by the EPA and

CARB using standardized test cycle conditions and alternative cycles (Bachman 1998).

The aggregate modal model within MEASURE employs emission rates based on theoretical engine-emissions relationships. The relationships are dependent on both modal and vehicle technology variables, and they are “aggregate” in the sense that they rely on bag data to derive their modal activities (Washington et al. 1997a). Emission rates were statistically derived from the emission rate data as a function of operating mode power demand surrogates. The model uses statistical techniques to predict emission rates using a process that utilizes the best aspects of hierarchical tree-based regression (HTBR) and ordinary least squares regression (Breiman et al. 1984). HTBR is used to reduce the number of predictor variables to a manageable number, and to identify useful interactions among the variables; then OLS regression techniques are applied until a satisfactory model is obtained (Fomunung et al. 2000). Vehicle activity variables include average speed, acceleration rates, deceleration rates, idle time, and surrogates for power demand. The MEASURE model for light-duty vehicle was completed and verified in 2000.

MEASURE provides the following benefits since it has been developed under the

GIS platform (Bachman et al. 2000): 1) manages topographical parameters that affect emissions; 2) calculates emissions from vehicle modal activities; 3) allows a ‘layered’ approach to individual vehicles activity estimation; 4) aggregates emission estimates into grid cells for use in photochemical air quality models.

3.3.3 MOVES

To keep pace with new analysis needs, modeling approaches, and data, the

USEPA's Office of Transportation and Air Quality (OTAQ) is developing a modeling

28 system termed the MOtor Vehicle Emission Simulator (MOVES). This new system will

estimate emissions for on-road and nonroad sources, cover a broad range of pollutants,

and allow multiple scale analysis, from fine-scale analysis to national inventory

estimation. In the future, MOVES will serve as the replacement for MOBILE6 and

NONROAD (USEPA 2001a). This project was previously known as the New Generation

Mobile Source Emissions Model (NGM).

The current plan for MOVES is to use Vehicle Specific Power (VSP) as a variable

on which their emission rates can be based (Koupal et al. 2002). The VSP approach to

emissions characterization was developed by Jimenez-Palacios (Jimenez-Palacios 1999).

VSP is a function of speed, acceleration, road grade, etc:

(Equation 3-3)

Where: v: vehicle speed (assuming no headwind) (m/s)

a: vehicle acceleration in (m/s2)

ε : mass factor accounting for the rotational masses (~0.1)

g: acceleration due to gravity

grade: road grade

CR: rolling resistance (~0.0135) ρ : air density (1.2)

CD: aerodynamic drag coefficient A: the frontal area

m: vehicle mass in metric tones

The basic concept of MOVES starts with the characterization of vehicle activity

and the development of relationships between characterized vehicle activity and energy consumption, and between energy consumption and vehicle emission (Nam 2003). The

USEPA established a modal binning approach, developed using VSP, to characterize the

29 relationship between vehicle activity and energy consumption. Originally, a total 14 modal bins were developed based on different VSP ranges (USEPA 2001a). This

approach was revised by two different ways. USEPA refined the VSP binning approach

by the association of second-by-second speed, engine rpm, and acceleration rates, and the

original 14 VSP binning approach are revised with the combination of five different speed operating modes and redirected to a total of 37 VSP bins (Koupal et al. 2004).

Researchers at North Carolina State University (NCSU), divided each bin into four strata representing two engine size and two odometer reading categories, and this was referred to as the “56-bin” approach. (USEPA 2002b).

Another important conceptual model for MOVES is developed by NCSU in 2002

(Frey et al. 2002). Dr. Frey summarized the conceptual analytical methodology in report

“Recommended Strategy for On-Board Emission Data Analysis and Collection for the

New Generation Model”. This method is to use power demand estimate (P) as a variable on which their emission rates can be based (Frey et al. 2002).

P = v × a (Equation 3-4) Where: P: power demand (mph2/sec)

v: vehicle speed (mph)

a: vehicle acceleration in (mph/s)

This method uses on-board emissions data which data are collected under real-

world conditions to develop modal emission model which can estimate emissions at

different scales such as microscale, mesoscale, and marcroscale. The philosophy is

similar as MEASURE (Fomunung 2000), which first segregated the data into four modes

based on suitable modal definitions then developed OLS regression model for each mode

using explanatory variables selected by HTBR techniques. These explanatory variables include model year, humidity, temperature, altitude, grade, pressure, and power. Second and third powers of speed and acceleration were also included in the regression analysis.

30 3.3.4 HDDV-MEM (Guensler et al. 2005a)

The researchers in Georgia Institute of Technology have developed a beta version

of the heavy-duty diesel vehicle modal emissions model (HDDV-MEM), which is based

on vehicle technology groups, engine emission characteristics, and vehicles modal activity (Guensler et al. 2005a). The HDDV-MEM first predicts second-by-second

engine power demand as a function of onroad vehicle operating conditions and then

applies brake-specific emission rates to these activity predictions. The HDDV-MEM

consists of three modules: a vehicle activity module (with vehicle activity tracked by

vehicle technology group), an engine power module, and an emission rate module. The

model framework is illustrated in Figure 3-4.

Figure 3-4 A Framework of Heavy-Duty Diesel Vehicle Modal Emission Model (Guensler et al. 2005a)

3.3.4.1 Model Development Approaches

The HDDV-MEM modeling framework is designed for transportation

infrastructure implementation on link-by-link basis. While the modeling routines are

31 actually amenable to implementation on a vehicle-by-vehicle basis, the large number of vehicles operating on infrastructure links precludes practical application of the model in this manner. As such, the model framework capitalizes upon previous experience gained in development of the MEASURE modeling framework, in which vehicle technology groups were employed. A new heavy-duty vehicle visual classification scheme, which is an EPA and FHWA hybrid vehicle classification scheme developed by Yoon et al. (Yoon et al. 2004b), classified vehicle technology groups by engine horsepower ratings, vehicles

GVWR, vehicle configurations, and vehicle travel characteristics (Yoon 2005c).

However, whereas the MEASURE model employed load surrogates for the implementation of a light-duty modal modeling regime, this new modeling framework directly heavy-duty vehicle operating loads and uses these load predictions directly in the emission prediction process. Engine power module is designed for this task.

Emission rates are first established for various heavy-duty technology groups

(engine and vehicle family, displacement, certification group, drivetrain, fuel delivery system, emission control system, etc.) based upon statistical analysis of standard engine dynamometer certification data, or onroad emission rate data when available (Wolf et al.

1998; Fomunung et al. 2000). The following subsets will discuss three main modules in the HDDV-MEM.

3.3.4.2 Vehicle Activity Module

The vehicle activity module provides hourly vehicle volumes for each vehicle technology group on each transportation link in the modeled transportation system. The annual average daily traffic (AADT) estimate for each road link is processed to yield vehicle-hours of operation per hour for each technology group (using truck percentages,

VMT fraction by vehicle technology group, diesel fraction, hourly volume apportionment of daily travel, link length, and average vehicle speed) (Guensler et al. 2005a; Yoon

2005c).

32 VAv,h,s f = (AADTs × (NLs /TNL) × HVFv,h ×VFv × DFv ) × (SLs / ASv ) (Equation 3-5) Where: VA: the estimated vehicle activity (veh-hr/hr):

v: the vehicle technology group

h: the hour of day

s: the transportation link

f: the facility type for the link

AADT: the annual average daily traffic for the link

NL: the number of lanes in the specific link direction

TNL: the total number of lanes on the link

HVF: the hourly vehicle fraction

VF: the VMT fraction for each vehicle technology group

DF: the diesel vehicle fraction for each technology group

SL: the link length (miles)

AS: the link average speed of the technology group (mph)

To estimate onroad running emissions from each link, two sets of calculations are performed. Onroad vehicle activity (vehicle-hr) for each hour is multiplied by engine power demand for observed link operations (positive tractive power demand plus auxiliary power demand), and then by baseline emission rates (g/bhp-hr). These calculations are processed separately for each speed/acceleration matrix cell (Yoon et al.

2005b). Emissions from motoring/idling activity are calculated by the determination of the vehicle-hours of motoring/idling activity on each link for each hour and the multiplication of the baseline idle emission rate (g/hr).

3.3.4.3 Engine Power Module

Internal combustion engines translate linear piston work (force through a distance)

to a crankshaft, rotating the crankshaft and creating engine output torque (work

33 performed in angular rotation). The crankshaft rotation speed (engine speed in revolutions per minute) is a function of engine combustion and physical design parameters (mean effective cylinder pressure, stroke length, connecting rod angle, etc.).

The torque available at the crankshaft (engine output shaft) is less than the torque generated by the pistons, in that there are torque losses inside the engine associated with operating a variety of internal engine components. Torque is transferred from the engine output shaft to the driveshaft via the transmission (sometimes through a torque-converter, i.e. fluid coupling) and through a series of gears that allow the drive shaft to rotate at different speeds relative to engine crankshaft speed. The drive shaft rotation is then transferred to the drive axle via the rear differential. The ring and pinion gears in the rear differential translate the rotation of the drive shaft by 90 degrees; from the drive shaft running along the vehicle to the drive axle that runs across the vehicle. Torque available at the drive axle, is now delivered directly to the drive wheels, which generates the tractive force used to overcome road friction, wind resistance, road grade (gravity), and other resistive forces, allowing the vehicle to accelerate on the roadway. Figure 3-5 illustrates the primary components of concern.

Figure 3-5 Primary Elements in the Drivetrain (Gillespie 1992)

34 The vehicle drivetrain (engine, torque converter, transmission, drive shaft, rear differential, axles, and wheels) is designed as a system to convert engine torque into useful tractive force at the wheel-to-pavement interface. When the tractive force is greater than the sum of forces acting against the vehicle, the vehicle accelerates in the direction of travel. Given that on-road speed/acceleration patterns for HDDVs can be observed (or empirically modeled), the modal modeling approach works backwards from observed speed and acceleration to estimate the tractive force (and power) that was available at the wheels to meet the observed conditions. Then, working backwards from tractive force, the model accounts for additional power losses that occurred between the engine and the wheels to predict the total brake-horsepower output of the engine. Force components that reduce available wheel torque and tractive force include:

• Aerodynamic drag, which depends on the frontal area, the drag coefficient, and the square of the vehicle speed

• Tire rolling resistance, which is determined by the coefficient of rolling resistance, vehicle mass, and road grade (where the coefficient of rolling resistance is a function of: tire construction and size; tire pressure; axle geometry, i.e., caster and

camber; and whether the wheels are driven or towed)

• Grade load, which is determined by the roadway grade and vehicle mass

• Inertial load, which is determined by the vehicle’s mass and acceleration

The tractive force required at the interface between the tires and the road to

overcome these resistive forces and provide vehicle acceleration can be described by

(Gillespie 1992):

FT = FD + FR + FW + FI + ma (Equation 3-6) Where: FT : the tractive force available at the wheels (lbf)

FD : the force necessary to overcome aerodynamic drag (lbf)

FR : the force required to overcome tire rolling res:tance (lbf)

35 FW : the force required to overcome gravitational force (lbf)

FI : the force required to overcome inertial loss (lbf) m : the vehicle mass (lbm)

a : the vehicle acceleration (ft/sec2)

Load prediction models could employ a wide variety of aerodynamic drag (Wolf-

Heinrich 1998) and rolling resistance functional forms, some of which may be more appropriate for certain vehicle designs and at certain vehicle speeds. Note that vehicle

mass is a critical parameter that must be included in the load-based modeling approach.

Therefore, estimates of gross vehicle weight must be included in any transit (vehicle

weight plus passenger loading) or heavy-duty truck (vehicle weight plus cargo payload)

application. The following subsections describe each force in the Equation 3-6, taken

from Yoon et al. (Yoon et al. 2005a).

Aerodynamic Drag Force (FD) As a vehicle moves forward through the atmosphere, drag forces are created at the interface of the front of the vehicle and by the vacuum generated at the tail of the vehicle.

The flow of the air around the vehicle creates a very complex set of forces providing both resistance to forward motion and vehicle lift. The net aerodynamic drag force is a function of air density, aerodynamic drag coefficient, vehicle frontal area, and effective vehicle velocity.

ρ F = ( ) × C × A ×V 2 (Equation 3-7) D 2g d f e

Where: ρ : the air density (lb/ft3)

g : the acceleration of gravity (32.2 ft/sec2)

Cd : the aerodynamic drag coefficient

2 Af : the vehicle frontal area (ft )

Ve : the effective vehicle velocity (ft/sec)

36 Rolling Resistance Force (FR) Rolling resistance force is the sum of the force required to overcome the

combined friction resistance at the tires. Tires deform at their contact point with the

ground as they roll along the roadway surface. Rolling resistance is caused by contact

friction, the tires’ resistance to deformation, aerodynamic drag at the tire, etc. The force

required to overcome rolling resistance can be expressed with rolling resistance

coefficient, vehicle weight, and road grade.

FR = Cr × m× g ×cos(θ ) (Equation 3-8)

Where: Cr: the rolling resistance coefficient

θ : the road grade (degree)

Gravitational Weight Force (FW) The gravitational force components account for the effect of gravity on vehicle

weight when the vehicle is operating on a grade. The grade angle is positive on uphill

grades (generating a positive resistance) and negative on downgrades (creating a negative

resistance).

FW = m× g ×sin(θ ) (Equation 3-9)

Drivetrain Inertial Loss (FI) The engine, transmission, drive shaft, axles and wheels are all in rotation. The

rotational speed of each component depends upon the transmission gear ratio, the final drive ratio, and the location of the component in the drive train (i.e. the total gear ratio

between each component and the wheels). The rotational moment of inertia of the

various drivetrain components constitutes a resistance to change in motion. The torque

delivered by each rotating component to the next component in the power chain (engine

to clutch/torque converter, clutch/torque converter to transmission, transmission to drive

shaft, drive shaft to axle, axle to wheel) is reduced by the amount necessary to increase

angular rotation of the spinning mass during vehicle acceleration. Given the torque loss

37 at each component, the reduction in motive force available at the wheels due to inertial

losses along the drivetrain can be modeled (Wolf-Heinrich 1998). This model term is

most significant under low speed acceleration conditions, such as vehicle operation in

truck and rail yards where vehicles are lugging heavy loads over short distances.

However, as will be discussed later, significant new data will be required to incorporate

the inertial loss effects into modal models.

a× I a×[(I + (G 2 × I ) + (G 2 ×G 2 )×(I + I )] F = EFF = W d D t d E t (Equation 3-10) I r 2 r 2

Where: a : the acceleration in the direction of vehicle motion (ft/sec2)

2 IEFF : the effective moment of inertia (ft- lbf -sec )

2 IW : the rotational moment of inertia of the wheels and axles (ft-lbf -sec )

2 ID : the rotational moment of inertia of the drive shaft (ft-lbf -sec )

2 IT : the rotational moment of inertia of the transmission (ft-lbf -sec )

2 IE : the rotational moment of inertia of the engine (ft-lbf -sec )

Gt : the gear ratio at the engine transmission

Gd : the gear ratio in the differential r : wheel radius (ft)

Power Demand

Using the equations outlined above, the total engine power demand, which is the

combination of tractive power and auxiliary power demands, can be expressed as:

V P = [()× (F + F + F + F + ma)]+ AP (Equation 3-11) 550 D R W I Where: V : the vehicle speed (ft/s)

AP : the auxiliary power demand (bhp)

550 : the conversion factor to bhp

38 3.3.4.4 Emission Rate Module

The emission rate module provides work-related emission rates (g/bhp-hr) and idle emission rates (g/hr) for each technology group. The basic application of the

HDDV-MEM incorporates a simple emission rate modeling approach. The predicted engine power demand (bhp) for each second of vehicle operation is multiplied by emission rates in gram/bhp-sec for a given bhp load. Technology groups (i.e. vehicles that perform similarly on the certification tests) are established based upon the engine and control system characteristics and each technology group is assigned a constant g/bhp-sec emission rate based upon regression tree and other statistical analysis of certification data.

Under the assumption that testing cycles represent the typical modal activities undertaken by onroad activities, such emission rates are applied to onroad activity data. Given the large repository of certification data, detailed statistical analysis of the certification test results can be used to obtain applicable emission rates for these statistically derived vehicle technology groups. The data required for analysis must come from chassis dynamometer (the engine remains in the vehicle and the vehicle is tested on a heavy-duty treadmill) and onroad test programs in which second-by-second grams/second emission rate data have been collected concurrently with axle-hp loads.

At this moment, HDDV-MEM accepts EPA’s baseline running emission rate data as work-related emission rates and EMFAC2002 idling emission rate test data as idle emission rates. Diesel vehicle registration fractions and annual mileage accumulation rates are employed to develop calendar year emission rates for each technology group. In the future, a constant emission rate need not be used as more refined testing data become available. Linear, polynomial, or generalized relationships can be established between gram/second emission rate and tractive horsepower (axle horsepower) and other variables.

Sufficient testing data are required to establish statistically significant samples for each technology group.

39 3.3.4.5 Emission Outputs

HDDV-MEM outputs link-specific emissions in grams per hour (g/hr) for volatile organic compound (VOC), carbon monoxide (CO), oxides of nitrogen (NOx), and particular matter (PM) for each vehicle type. Toxic air contaminant emissions rates

(benzene, 1, 3-butadiene, formaldehyde, acetaldehyde, and acrolein) are also estimated in grams/hour for each vehicle type using the MOBILE6.2-modeled ratios of air toxics to

VOC for each calendar year. HDDV-MEM provides not only hourly emissions, but also aggregated total daily emissions (in accordance with input command options). The structure of output files, which provide link specific hourly emissions, can be directly incorporated with roadway network features in a GIS environment for use in interactive air quality analysis in various spatial scales, i.e., national, regional, and local scales

(Guensler et al. 2005a; Yoon 2005c).

40 CHAPTER 4

EMISSION DATASET DESCRIPTION AND

POST-PROCESSING PROCEDURE

Using second-by-second data collected from onroad vehicles, the research effort reported in this thesis developed models to predict emission rates as a function of onroad operating conditions that affect vehicle emissions. Such models should be robust and ensure that assumptions about the underlying distribution of the data are verified and that assumptions associated with applicable statistical methods are not violated. Due to the general lack of data available for development of heavy-duty vehicle modal emission rate models, this study focuses on development of an analytical methodology that is repeatable with different data set collected across space and time. There are two second- by-second data sets in which emission rate and applicable load and vehicle activity data had been collected in parallel. One database was a transit bus dataset, collected on diesel transit buses operated by Ann Arbor Transit Authority (AATA) in 2001, and another dataset was heavy HDV (HDV8B) dataset prepared by National Risk Management

Research Laboratory (NRMRL) in 2001. Each is summarized one by one in the following sections.

4.1 Transit Bus Dataset

Transit bus emission dataset was prepared by Sensor, Inc. Sensors, Inc. has supplied gas analyzers and portable emissions testing systems worldwide for over three decades. Their products, SEMTECH-G for gasoline powered vehicle, and SEMTECH-D for diesel powered vehicles, are commercially available for on-vehicle emission test applications. In October 2001, Sensors, Inc. conducted a real-world, onroad emissions measurements of 15 heavy-duty transit buses for USEPA. Transit buses were provided

41 by the Ann Arbor Transit Authority (AATA) and all of them were New Flyer models with Detroit Diesel Series 50 engines. Table 4-1 summarizes the buses tested for USEPA.

Table 4-1 Buses Tested for USEPA (Ensfield 2002)

Displace Peak Model ment Torque Bus # Bus ID Year Odometer Engine series (liters) (lb-ft) Test Date 1 BUS360 1995 270476 SERIES 50 8047 GK40 8.5 890 10/25/2001 2 BUS361 1995 280484 SERIES 50 8047 GK38 8.5 890 10/25/2001 3 BUS363 1995 283708 SERIES 50 8047 GK37 8.5 890 10/24/2001 4 BUS364 1995 247379 SERIES 50 8047 GK42 8.5 890 10/24/2001 5 BUS372 1995 216278 SERIES 50 8047 GK41 8.5 890 10/26/2001 6 BUS375 1996 211438 SERIES 50 8047 GK39 8.5 890 10/25/2001 7 BUS377 1996 252253 SERIES 50 8047 GK36 8.5 890 10/24/2001 8 BUS379 1996 260594 SERIES 50 8047 GK35 8.5 890 10/23/2001 9 BUS380 1996 223471 SERIES 50 8047 GK28 8.5 890 10/23/2001 10 BUS381 1996 200459 SERIES 50 8047 GK29 8.5 890 10/22/2001 11 BUS382 1996 216502 SERIES 50 8047 GK30 8.5 890 10/17/2001 12 BUS383 1996 199188 SERIES 50 8047 GK31 8.5 890 10/19/2001 13 BUS384 1996 222245 SERIES 50 8047 GK32 8.5 890 10/17/2001 14 BUS385 1996 209470 SERIES 50 8047 GK33 8.5 890 10/18/2001 15 BUS386 1996 228770 SERIES 50 8047 GK34 8.5 890 10/19/2001

4.1.1 Data Collection Method

A total of 15 files were provided for the purpose of model development. Each file represents data collected from different transit bus. Five of these buses were 1995 model year and the rest were 1996 model year. All of the bus test periods lasted approximately

2 hours. The buses operated during standard Ann Arbor bus routes and stopped at all regular stops although the buses did not board or discharge any passengers. The routes were mostly different for each test, and were selected for a wide variety of driving conditions. All of the bus routes for the test are shown below.

42

Figure 4-1 Bus Routes Tested for USEPA

Sensor’s, Inc. engineers preformed the instrument setup and data collection for all the buses. Test equipment, SEMTECH-D analyzer, is shown in Figure 4-2. Because engine computer vehicle interface (SAE J1708) data were collected at 10 Hz, Sensor’s

Inc. engineers manually started and stopped data collections at approximately 30 minute intervals to keep file size manageable and a total of four trip files were generated per bus.

Zero drift was checked between data collections. Then four files for each bus were combined into one file after post-processing. This can explain why the time for each bus is not continuous sometime. To derive other variables easily, like acceleration, and keep data manageable or other purposes, data for each bus was separated into trips based on continuous time. After this processing, there were 62 “trips” in the transit bus database.

43

Figure 4-2 SEMTECH-D in Back of Bus (Ensfield 2002)

4.1.2 Transit Bus Data Parameters

Each of the 15 data files share the same format. The data fields included in each file are summarized below.

Table 4-2 Transit Bus Parameters Given by the USEPA (Ensfield 2002)

Category Parameters Test Information Date; Time Vehicle Characteristics License number; Engine size; Instrument configuration number Roadway Characteristics GPS Latitude (degree); GPS Longitude (degree); GPS Altitude (feet); Grade (%) Onroad Load Parameters Vehicle speed (mph); Engine speed (rpm); Torque (lb-ft); Engine power (bhp) Engine Operating Parameters Engine load (%); Throttle position (0 – 100%); Fuel volumetric flow rate (gal/s); Fuel specific gravity; Fuel mass flow rate (g/s); Calculated instantaneous fuel economy (mpg); Engine Oil temperature(deg F); Engine oil pressure (kPa); Engine warning lamp (Binary); Engine coolant temperature (deg F); Barometric pressure reported from ECM (kPa); Calculated exhaust flow rate (SCFM)

44 Table 4-2 Continued

Environment Conditions Ambient temperature (deg C); Ambient pressure (mbar); Ambient relative humidity (%); Ambient absolute humidity (grains/lb air) Vehicle Emission HC, CO, NOx, CO2 emission (in PPM, g/sec, g/ke-fuel, g/bhp-hr units)

4.1.3 Sensor’s Data Processing Procedure

It is helpful to understand how Sensor processing the dataset after data collection and this information is very important for data quality assurance and quality check. This section is adapted primarily from the Sensor’s field data collection report (Ensfield 2002).

Data Synchronization: According to Sensor’s report, each of the analytical instruments, vehicle interface, and GPS equipment reported data to the SEMTECH data logger asynchronously and at differing rates, but with a timestamp at millisecond precision. The first step of post-processing procedure is to eliminate the extra data by interpolating and synchronizing all the data to 1 Hz. With all the raw data synchronized to the same data rate, it is then time aligned so that engine data corresponds to emissions data in real time.

Mass Emissions Calculations: Mass emissions (gram/second) are calculated by fuel flow method. With access to real-time, second-by-second fuel flow rate, transient mass emissions is computed by multiplying these by the real-time fuel specific emissions.

Using NO for example,

Fuel specific emissions are the mass fractions of each pollutant to the fuel in the combusted air/fuel mixture. The mass fuel flow rate is converted from fuel volumetric flow rate with the fuel specific gravity.

Brake Specific Emissions Calculations: Engine torque is first computed by applying the engine load parameter, which represents the ratio between current engine

45 torque and maximum engine torque, with the engine lug curve (maximum torque curve).

Engine horsepower is then converted from engine torque using engine speed data. Work

(BHP-hr) is computed for each second of the test, and brake specific emissions are reported as the sum of the grams of pollutant emitted over the desired interval (one second) divided by the total work.

Vehicle Speed Validation: Vehicle speed is a critical parameter that influences the derived parameter, acceleration and emission rates. It is important for researchers to understand the measurement way and data accuracy. Sensor, Inc. measured vehicle speed using two methods: vehicle Electronic Control Module (ECM) and Global

Positioning System (GPS). Figure 4-3 shows the GPS vs. ECM comparison for Bus 380.

The regression analysis shows that the ECM data are around 10% higher than the GPS data, according to Sensor’s report (Ensfield 2002). Sensor’s researchers believe that this comparison suggests that GPS data may be more reliable for onroad testing. Buses of model year 1995 were equipped with an earlier version ECM that did not provide vehicle speed and GPS velocity data were used in place of the ECM data. Buses of model year

1996 were equipped with the current version ECM that can provide vehicle speed and vehicle speed was reported after validation with the GPS data. It is reported that GPS data were within 1% accuracy based upon analysis of 10 miles of data.

46

Figure 4-3 Bus 380 GPS vs. ECM Vehicle Speed (Ensfield 2002)

4.1.4 Data Quality Assurance/Quality Check

After understanding the way which Sensor processed the reported data set, the data set for each bus was screened to check for errors or possible problems. Possible sources of errors associated with data collection should be considered before undertaking data analysis for the development of a model. The types of errors checked are listed below.

Loss of Data: It is observed that emission data are missing for some buses. For example, bus 382 had missing HC data for 343 seconds. Bus 361, 377, 384 have similar problem. There might be several reasons for loss of data. Communication between instruments might be lost or a particular vehicle failed to report a particular variable.

These records are removed from the test database and not employed in development of

HC models because the instantaneous emission values will be recorded as zero,

47 introducing significant bias to the result. Similarly, calculated fuel economy data are missing for some buses.

Erroneous ECM Data: It is reported that there were some cases where certain engine parameters were well outside of physical limits, and these erroneous ECM data were filtered out with the pre-defined filter limits. The following filter limits were imposed on the rate of change of RPM, fuel flow, and vehicle speed data:

• Rate of change limit for RPM = 10,000 (RPM)/sec

• Rate of change limit for Fuel flow = 0.003 (gal/sec)/sec

• Rate of change limit for Vehicle speed = 21 (mph)/sec

According to Sensor’s report, these filters remove the data outside the defined limits.

The SEMTECH post processor automatically interpolates between the remaining data, and produces results at 1Hz as before (Ensfield 2002). Because this procedure was finished by manually plotting the ECM parameters and computed mass results, all the buses data were screened again to check any remaining data spikes for data quality assurance purpose. No such errors were identified for this kind of problem. But the modeler should keep in mind that data could be erroneous because “unreasonable” engine acceleration or deceleration was removed that could have been within reasonable absolute limits.

GPS Dropouts: It is reported that there were a few instances when the GPS lost communication with the satellite for unknown reasons, and these erroneous GPS data were removed manually (Ensfield 2002). To guarantee data quality, the modeler screened all GPS data again to check any remaining erroneous cases. The principles to screen erroneous GPS data are based on the consistence between GPS data and engine parameters. The secondary screening identified that Bus 360 data still contained some erroneous GPS data. The questionable area covers the beginning 434 seconds of the whole trip (see Figure 4-4). Their GPS data are shown as red part in left figure. Right figure illustrates the time series plot for checked area. Although GPS signals are reported

48 as some fixed positions in left figure while vehicle speed data are reported as zero in right figure, engine speed and engine power in right figure shows that Bus 360 did move during that period. This error might due to GPS dropouts.

Figure 4-4 Example Check for Erroneous GPS Data for Bus 360

Due to GPS dropouts, the GPS signals were reported as some fixed positions. At the same time, the vehicle speed might be reported as zero while other ECM data, such as engine speed and engine power, would show that the bus did move during that period.

If the modeler fails to screen and remove such data, these data will be classified as idle mode. Further, it will cause erroneous analysis result for idle mode. The modeler screened all buses manually and found that 6 buses had such problem (Bus 360,361, 363,

364, 375, 377). Usually, this type of error was prevalent during the beginning of bus trip.

All erroneous data were removed manually. The correction of the database to remove these erroneous data is critical to model development (initial models associated with development of idle and load-based emission rates were problematic until this database error was identified and corrected by the author).

Synchronization Errors: Data were checked for synchronization errors. An example plot of such check is presented in Figure 4-5 where part of the trip for Bus 360 is used. The selected area covers about 200 seconds. Their GPS data are shown as

49 green/red part in the left figure. The figure on the right illustrates the time series plot for the area checked. The speed for red points in both figures is 0 mph. Although NOx correlates well to engine load and engine speed, vehicle speed doesn’t correlate well to engine data and NOx emissions data. Bus 360 was equipped with an earlier version ECM that did not provide vehicle speed. GPS velocity data were used in place of the ECM data. According to Sensor’s report, data synchronization was only done between emissions data and engine data, not for vehicle speed for emissions data (Ensfield 2002).

Figure 4-5 Example Check for Synchronization Errors for Bus 360

All bus data were checked for this type of error and such errors were identified in all of the test data for 6 buses (Bus 360, 361, 363, 364, 375, 377). Coincidentally, these 6 buses had GPS dropout problems too. Although from Frey’s work (Frey and Zheng

2001), it is found that small errors in synchronization do not substantially impact estimate of total trip emissions, such deviations will influence the estimate for micro-scale analysis. To choose the right delay time to remove the GPS data and vehicle speed data, the author compared the impacts of using a 2-second, 3-second, and 4-second delay.

Figure 4-6 illustrates histograms of engine power for zero speed data based on three different proposed time delay options. A 3-second delay is chosen because engine power distribution for zero speed data based on 3-second delay is more reasonable. Comparing

50 to the 2-second delay results, zero speed data contain less data points with higher engine power (>150 brake horsepower) for 3-second delay. Meanwhile, zero speed data contain more data points with lower engine power (<20 brake horsepower) for 3-second delay than 4-second delay time.

Figure 4-6 Histograms of Engine Power for Zero Speed Data Based on Three Different Time Delays

Road Grade Validation: According to Sensor’s report, the GPS data were used for grade calculation. Combing the velocity at time t with the difference in altitude between time t and t-1 second, the instantaneous grade is computed as shown below.

51 The calculation formula can generate significant errors given the uncertainty in the GPS position, particularly at low speeds where there is less of a differential in distance over the one-second interval (Ensfield 2002). In the real world, maximum recommended grades for use in design depends upon the type of facility, the terrain in which it is built, and the design speed. Figure 4-7 is directly cited from Traffic

Engineering (Roess et al. 2004) to present a general overview of usual practice. Roess et al. (2004) indicated that these criteria represent a balance between the operating comfort of motorists and passengers and the practical constrains of design and construction in more severe terrains.

Figure 4-7 General Criteria for Maximum Grades (Roess et al. 2004)

52 The modeler screened the grade data in the database and found that 0.42% of the data have higher grade (>10%). Meanwhile, 2% of the road grade data have higher rate of change (> 5%). This means some road grade data are doubtable or erroneous.

Considering Sensor’s recommendations, road grade data would only be used as reference, and would not be used directly in model development.

4.1.5 Database Formation

The data dictionaries of the source files were reviewed for parameter content. Not all variables reported will be included in explanatory analysis. A standard file structure was designed to accommodate the available format. Emissions rate data with unit of gram/second were selected to develop the proposed emission rate model. Because volumetric fuel rate, fuel specific gravity, and fuel mass flow rate are used to calculate mass emissions (g/s), these variables will be excluded in further analysis. Similarly, because percent engine load, engine torque, and engine speed are used to calculate engine power (brake horsepower), only engine power (bhp) is selected to represent power related variables. Exhaust flow rate is excluded because it is back-computed from the mass emissions generated with the fuel flow method. Fuel economy is excluded because it is a

30 second moving average data and computed for a test period by summing the fuel consumed and dividing by the distance traveled. Because GPS data were used for grade calculation and road grade data would only be used as reference, a dummy variable was created to represent different road grade ranges.

At the same time, variables that might be helpful in explaining variability in vehicle emissions were included in the proposed file structure although they were not provided in the original dataset. These variables include model year, odometer reading, and acceleration. Acceleration data were derived from speed data using central difference method. Table 4-3 summaries the parameter list for explanatory analysis.

53 Table 4-3 List of Parameters Used in Explanatory Analysis for Transit Bus

Category Parameters Test Information Date; Time Vehicle Characteristics License number; Model year; Odometer reading; Engine size; Instrument configuration number Roadway Characteristics Dummy variable for road grade range Onroad Load Parameters Engine power (bhp); Vehicle speed (mph); Acceleration (mph/s) Engine Operating Parameters Throttle position (0 – 100%); Engine oil temperature(deg F); Engine oil pressure (kPa); Engine warning lamp (Binary); Engine coolant temperature (deg F); Barometric pressure reported from ECM (kPa) Environmental Conditions Ambient temperature (deg C); Ambient pressure (mbar); Ambient relative humidity (%); Ambient absolute humidity (grains/lb air) Vehicle Emissions HC, CO, NOx emission (in g/sec)

4.1.6 Data Summary

After the post-processing procedure was completed, the summary of the emissions and activity data as well as environmental and roadway characteristics is given in Table 4-4.

54

55 4.2 Heavy-duty Vehicle Dataset

The heavy-duty vehicle emission dataset is prepared by the USEPA National Risk

Management Research Laboratory (USEPA 2001b). EPA’s Onroad Diesel Emissions

Characterization (ODEC) facility has been collecting real-world gaseous emissions data for many years. The onroad facility incorporated a 1990 Kenworth T800 tractor as its test vehicle to collect this database. When this truck was purchased, it had already logged over 900,000 miles and was due for an overhaul of its Detroit Diesel Series 60 engine. The vehicle was tested prior to having this work done and after the overhaul.

NRMRL collected the test data for USEPA from 1999 to 2000 and included all the results and findings in Report “Heavy Duty Diesel Fine Particulate Matter Emissions:

Development and Application of On-Road Measurement Capabilities”.

4.2.1 Data Collection Method

The general capabilities of the ODEC facility are shown in Figure 4-8. The facility is designed to collect data while traveling along the public roadways with fully integrated into the test class 8b truck, a 1990 Kenworth T800 tractor. This truck was tested using two types of tests. During ‘parametric’ testing, the truck systematically follows a test matrix representing the full range of load, grade, speed and acceleration conditions. During ‘highway’ testing, the truck travels along an interstate highway with no specific agenda other than covering the distance safely and efficiently; speed and acceleration vary randomly with grade, speed limit, and traffic effects. Table 4-5 and 4-6 summarize the testes finished by NRMRL for USEPA.

56

Figure 4-8 Onroad Diesel Emissions Characterization Facility (USEPA 2001b)

Table 4-5 Onroad Tests Conducted with Pre-Rebuild Engine

Test Load Grade(s) Comments ID lb GCW % 3F00V 79280 Zero Constant Speed Testing 3F00C 79280 Zero Cost Down & Acceleration 3F00A 79280 Zero Governed Acceleration & Short-shift Acceleration 3H00V 61060 Zero Constant Speed Testing 3H00C 61060 Zero Cost Down & Acceleration 3H00A 61060 Zero Governed Acceleration & Short-shift Acceleration 3E00V 42840 Zero Constant Speed Testing 3E00C 42840 Zero Cost Down & Acceleration 3E00A 42840 Zero Governed Acceleration & Short-shift Acceleration 3F0GA 79280 Zero Governed Acceleration 3F0SA 79280 Zero Short-shift Acceleration 3F0V 79280 Zero Constant Speed Testing 3H0GA 61060 Zero Governed Acceleration 3H0SA 61060 Zero Short-shift Acceleration 3H0V 61060 Zero Constant Speed Testing 3E0GA 42840 Zero Governed Acceleration 3E0SA 42840 Zero Short-shift Acceleration 3E0V 42840 Zero Constant Speed Testing 3F3&6 79280 3.1, 6.0 Uphill Grade Tests 3H3&6 61060 3.1, 6.0 Uphill Grade Tests 3E3&6 42840 3.1, 6.0 Uphill Grade Tests

57 Table 4-5 Continued

3F-SEQ 79280 Zero Dyno Sequence Simulations 3DRI 79280 Various Open Highway Tests - Tunnel 3FIL 61060 Various Open Highway Tests – Filters 3DIOX* 61060 Various Open Highway Tests - Dioxin Note: * These tests are not available at this moment

Table 4-6 Onroad Tests Conducted with Post-Rebuild Engine

Test Load Grade(s) Comments ID lb GCW % 5F0V 74000 Zero Constant Speed Testing 5F0C* 74000 Zero Cost Down & Acceleration 5F0A* 74000 Zero Governed Acceleration & Short-shift Acceleration 5H0V 61440 Zero Constant Speed Testing 5H0C* 61440 Zero Cost Down & Acceleration 5H0A* 61440 Zero Governed Acceleration & Short-shift Acceleration 5E0V 42600 Zero Constant Speed Testing 5E0C* 42600 Zero Cost Down & Acceleration 5E0A* 42600 Zero Governed Acceleration & Short-shift Acceleration 5F3&6 74000 3.1, 6.0 Uphill Grade Tests 5H3&6 61440 3.1, 6.0 Uphill Grade Tests 5E3&6 42600 3.1, 6.0 Uphill Grade Tests 5F-SEQ* 74000 Zero Dyno Sequence Simulations 5Plume 61440 Various Open Highway Tests - Plume 5NOxB* 61440 Various Open Highway Tests - Burst 5DIOX* 61440 Various Open Highway Tests - Dioxin Note: * These tests are not available at this moment

4.2.2 Heavy-duty Vehicle Data Parameters

A total of 42 files were collected for the pre-rebuild engine and a total of 38 file collected for the post-rebuild engine. Each file represents data collected for a different engine and test. Preliminary analysis of individual files indicated that the format of files was same for all available file. The data fields included in each file are summarized below.

58 Table 4-7 List of Parameters Given in Heavy-duty Vehicle Dataset Provided by USEPA

Category Parameters Test Information Date; Time Vehicle Characteristics Vehicle make/model; Model year; Engine type; Engine Rating; Vehicle maintenance history Onroad Load Parameters Truck load weight (lb); Vehicle speed (mph); Measured engine power (bhp) Engine Operating Engine speed (RPM); Shaft volts; Torque volts; Fuel H/C Parameters ratio; Fuel factor; Engine intake air temperature (deg F); Engine exhaust air temperature (deg F); Engine coolant temperature (deg F); Engine oil temperature (deg F) Environment Conditions Barometric pressure (inch Hg); Ambient humidity (%) Vehicle Emissions CO, NOx, and HC emission (in PPM, g/hr, g/kg fuel and g/hp-hr units)

4.2.3 Data Quality Assurance/Quality Control Check

Although it is reported that a total of 80 tests finished for that project, preliminary screening found that there were some test files missed from data DVD provided by

USEPA to the researchers. The missing test files include: 3DIOX, 5E0C, 5H0C, 5F0C,

5F-SEQ, 5NOxB, and 5DIOX. For quality assurance purposes, the available data files were screened to check for errors or possible problems. Possible sources of errors for data collection should be considered before developing model. The types of errors checked are listed below.

Loss of Data: Measured horsepower (engine power) and emission data were missing for some tests. Test 3F-SEQ, 3FIL1, 3FIL2, and 3FIL3 had no measured horsepower data for the entire test. These test files couldn’t be included in emission model development. In addition, test 3E00A, 3E00C, 3E00V, 3F0GA, 3F0SA, 3F0V,

3H0SA, 3FIL4, 3FIL5, 3FIL7, 3FIL8, 3FIL9, 3FIL10, and 5H0V had no HC emission data. This problem will be fixed by removing these tests for HC emission model development. Test 3H0SA also had no CO emission data and this problem will be treated by removing this test for CO emission model development.

59 Duplicated Records: A notable issue was there were duplicate records with different emission values for same time in some test files. After communicating with Mr.

Brown who prepared this dataset for EPA, the reason was identified as the data were recorded as high as 10 Hz to improve the resolution of the data. To keep consistent with other test files, these data were post-processed as one data point for each second.

Erroneous Load Data: The “measured horsepower” field is engine power data calculated from measurement of the drive shaft torque and rotational speed. Results from the literature review show that engine power is a major explanatory variable. This variable was screened to check for errors or possible problems. An example of measured horsepower check is given in Figure 4-9. The observed relationship between measured horsepower and engine speed is a kind of relationship between vehicle speed and engine speed which could be found in book of “Fundamentals of Vehicle Dynamics” (Gillespie

1992). At a given gear ratio, the relationship between engine speed and road speed is a kind of linear relationship. The geometric progression in the left figure reflects the choices made in selection of transmission gear ratios. The right figure shows such impossible linear relationship between measured horsepower and vehicle speed. This indicates that measured horsepower is calculated incorrectly for this test. It is found that such problem exists in the series of test 3DRI and test 5Plume. These test files were removed from emission model development.

60

Figure 4-9 Example Check for Erroneous Measured Horsepower for Test 3DRI2-2

Vehicle Speed Validation: The author reviewed NRMRL’s report related to vehicle speed validation. It is reported that vehicle speed data were measured with

Datron LS1 optical speed sensor. The product literature specifies an accuracy of +/-

0.2% and a reproducibility of +/- 0.1% over the measurement range of 0.5 to 400 kph.

The following figure from NRMRL’s report correlates the speed measurement to a drive shaft speed sensor that was scaled using a NIST-traceable frequency source. It is said that the outliers at the low-speed indicated when the truck was turning (the tractor and the trailer-mounted speed sensor traveled less distance than the tractor does during turns).

Notwithstanding these points, the correlation is a good indication of speed measurement precision.

61

Figure 4-10 Vehicle Speed Correlation (USEPA 2001b)

At the same time, NRMRL provided the following figure to show the precision for four ranges of vehicle speed, along with similar estimates of accuracy. This will help the researchers to deal with speed measurement noise in future.

Figure 4-11 Vehicle Speed Error for Different Speed Ranges (USEPA 2001b)

62 4.2.4 Database Formation

The data dictionaries of the source files were reviewed for parameter content. Not all variables reported will be included in explanatory analysis. A standard file structure was designed to accommodate the available format. Emissions data with unit of gram/second are selected to develop the proposed emission model. All variables used to calculate mass emissions are excluded in further analysis. Similarly, because “measured horsepower” field is calculated from measurements of the drive shaft torque and rotational speed, only “measured horsepower” is used to represent power related variables. At the same time, variables that might be helpful in explaining variability in vehicle emissions were included in the proposed file structure although they were not provided in the original dataset, like acceleration. Acceleration data were derived from speed data using the central difference method.

Table 4-8 List of Parameters Used in Explanatory Analysis for HDDV

Category Parameters Test Information Date; Time Vehicle Characteristics Vehicle make/model; Model year; Engine type; Engine rating; Vehicle maintenance history Onroad Load Parameters Truck load weight (lb); Vehicle speed (mph); Acceleration (mph/s); Measured engine power (bhp) Engine Operating Engine intake air temperature (deg F); Engine exhaust air Parameters temperature (deg F); Engine coolant temperature (deg F); Engine oil temperature (deg F) Environment Conditions Barometric pressure (Hg), Ambient moisture (%) Vehicle Emissions CO, NOx, and HC emission (in g/s units)

4.2.5 Data Summary

After the post-processing procedure was completed, the summary of the emissions and activity data as well as environmental and roadway characteristics is given in Table 4-9.

63 Table 4-9 Summary of Heavy-Duty Vehicle Data

Environment Vehicle Operation Emission Data Characteristics Number Average Average Barometric Ambient of Engine Average Average Average Test ID Speed Pressure Moisture Seconds Power CO (g/s) NOx (g/s) HC (g/s) (mph) (Hg) (%) of Data (bhp) 3F00V 4430 43.55 163.10 0.11633 0.27983 0.001442 28.273 1.6874 3F00C 7991 36.49 323.79 0.08200 0.19566 0.001166 28.272 1.6874 3F00A 1904 43.55 475.12 0.17476 0.34262 0.001471 28.272 1.6874 3H00V 3718 43.66 130.99 0.08386 0.22701 0.001429 28.273 1.6874 3H00C 7593 39.43 112.50 0.07456 0.17866 0.001414 28.272 1.6874 3H00A 1959 48.04 218.50 0.20521 0.32078 0.001751 30.423 1.3573 3E00V 3863 41.41 123.42 0.10896 0.21157 NA 28.273 1.6874 3E00C 7962 39.31 104.95 0.07489 0.14908 NA 28.272 1.6874 3E00A 1810 50.15 197.07 0.22324 0.26108 NA 30.137 1.9020 3F0GA 577 35.93 302.14 0.23114 0.41269 NA 29.995 0.4685 3F0SA 792 36.26 287.45 0.25140 0.37947 NA 29.995 0.4685 3F0V 3635 41.65 152.23 0.14879 0.28413 NA 29.995 0.4685 3H0GA 594 33.81 253.63 0.30036 0.48494 0.002159 29.690 1.6059 3H0SA 707 34.27 223.73 NA 0.32498 NA 29.690 1.6059 3H0V 3331 41.53 143.38 0.08892 0.27712 0.002436 28.020 0.4742 3E0GA 421 32.91 233.93 0.37978 0.30728 0.000589 29.976 0.5812 3E0SA 571 31.99 180.73 0.23652 0.33325 0.003042 29.976 0.5812 3E0V 3395 42.64 103.63 0.08879 0.25745 0.002805 29.976 0.5812 3F3&6 8629 36.59 131.00 0.14409 0.31374 0.001426 28.282 1.2520 3H3&6 10573 43.13 107.06 0.16769 0.27507 0.001753 28.273 1.6874 3E3&6 9825 44.74 121.69 0.16617 0.23913 0.001839 28.250 1.5716 3FIL4 12456 66.54 152.91 0.06994 0.29925 NA 29.238 0.3886 3FIL5 13738 58.76 129.99 0.06354 0.22315 NA 29.238 0.3886 3FIL6 6415 66.94 130.11 0.06273 0.20833 0.001409 29.238 0.3886 3FIL7 10678 62.76 164.82 0.07042 0.28353 NA 29.854 0.1480 3FIL8 12248 64.70 147.26 0.06688 0.26035 NA 29.773 0.1484 3FIL9 11956 65.62 153.44 0.06551 0.20905 NA 29.418 0.1502 3FIL10 12367 63.71 167.73 0.07481 0.35788 NA 30.132 0.1466 5F0V 4895 32.87 96.09 0.10716 0.23558 0.002828 30.101 0.5761 5H0V 4091 42.36 126.14 0.12564 0.30933 NA 30.179 0.6091 5E0V 4407 42.60 105.84 0.10681 0.29045 0.002894 30.278 0.8601 5F3&6a 6971 36.24 147.99 0.13716 0.31607 0.003111 28.004 0.9070 5F3&6b 5058 38.69 133.54 0.14044 0.30661 0.001924 28.009 0.8862 5H3&6a 6919 39.74 133.01 0.12723 0.28763 0.002397 28.024 0.8138 5H3&6b 6951 39.44 148.26 0.15400 0.32910 0.002807 28.014 1.2149 5E3&6 10807 46.01 124.07 0.13981 0.27674 0.002827 28.024 1.0131

64 CHAPTER 5

METHODOLOGICAL APPROACH

The following chapter lays the theoretical foundation of the conceptual framework of model development. This chapter outlines the statistical methods, addresses issues that arise in statistical modeling, and presents the solutions that are employed in addressing these issues. So this chapter will serve as a guide or “road map” for the underlying methodology of the model development process.

5.1 Modeling Goal and Objectives

The goal of this research is to provide emission rate models that fill the gap between the existing models and the ideal models for predicting emissions of NOx, CO, and HC from heavy-duty diesel vehicles. Deficiencies in existing models, like EPA’s

MOBILE series and CARB’s EMFAC series of models, have been highlighted in previous chapters. The U.S. Environment Protection Agency (USEPA) is currently developing a new set of modeling tools for the estimation of emissions produced by onroad and off-road mobile source. The new Multi-scale mOtor Vehicle & equipment

Emission System, known as MOVES, is a modeling system designed to better predict emissions from onroad operations. The philosophy behind MOVES is the development of a model that is as directly data-driven as possible, meaning that emission rate are developed from second-by-second or binned data.

Using second-by-second data collected from onroad vehicles, the research effort reported in this thesis will develop models that predict emissions as a function of onroad variables known to affect vehicle emissions. The model should be robust and ensure that assumptions about the underlying distribution of the data are verified and the properties of parameter estimates are not violated. With limited available data, this study focuses on

65 development of an analytical methodology that is repeatable with a different data set from across space and across time. As more data become available, the proposed model will need to be re-estimated to ensure that the model is transferable across additional

HDV engine types, operating conditions, environmental conditions, and even perhaps geographical region.

5.2 Statistical Method

The purpose of statistical modeling was to determine which explanatory variables significantly influence vehicle emissions so that the data can be stratified by those variables and corresponding regression relationship can be developed. For many statistical problems there are several possible solutions. In comparing the means of two small groups, for instance, we could use a t test, a t test with a transformation, a Mann-

Whitney U test, or one of several others. The choice of method depends on the plausibility of normal assumptions, the importance of obtaining a confidence interval, the ease of calculation, etc.

Parametric or non-parametric approaches to evaluation can be applied.

Parametric methods are used when the distribution is either known with certainty or can be guessed with a certain degree of certainty. These methods are only meaningful for continuous data which are sampled from a population with an underlying normal distribution or whose distribution can be rendered normal by mathematical transformation. Analysts must be careful to ensure that significant errors are not introduced when assumptions are not met. In contrast, nonparametric methods make no assumptions about the distribution of the data or of the functional form of the regression equation. Nonparametric methods are especially useful in situations where the assumptions required by parametric are in question. Brief overviews and underlying theories of statistical methods those might used in this research are addressed in the following sections.

66 5.2.1 Parametric Methods

5.2.1.1 The t-Test

Student’s t-test is one of the most commonly used techniques for testing whether the means of two groups are statistically different from each other. The

Student’s t-distribution was published by William Gosset in 1908. This test tries to determine whether the measured difference between two groups is large enough to reject the null hypothesis or such differences are just due to “chance”. The formula for the t- test (Equation 5-1) is a ratio. The numerator of the ratio is just the difference between the two means or averages. The denominator is a measure of the variability or dispersion of the data.

x − x t = 1 2 (Equation 5-1) s2 s2 1 + 2 n1 n2 v 2 2 where x1 and x2 are the sample means, s1 and s2 are the sample variances, n1 and n2 are the sample sizes and t is a Student t quantile with n1 + n2 - 2 degrees of freedom. Usually a significance level of 0.05 (or equivalently, 5%) is employed in statistical analyses. The significance level of a statistical hypothesis test is a fixed probability of wrongly rejecting the null hypothesis H0, if it is in fact true. Another index is p-value which is the probability of getting a value of the test statistic as extreme as or more extreme than that observed by chance alone, if the null hypothesis H0, is true. The p-value is compared with the actual significance level of the test and, if it is smaller, the result is significant. That is, if the null hypothesis were to be rejected at the 5% significance level, this would be reported as "p < 0.05".

The assumptions for t-test include: 1) the populations are normally distributed; 2) variances in the two populations are equal; 3) the populations are independent. The results of the analysis may be incorrect or misleading when assumptions are violated.

67 For example, if the assumption of independence for the sample values is violated, then the two sample t test is simply not appropriate. If the assumption of normality is violated, or outliers are present, the two sample t test may not be the more powerful available test.

This could mean the difference between detecting a true difference or not. A nonparametric test or employing a transformation may result in a more powerful test.

5.2.1.2 Ordinary Least Squares Regression

Regression analysis is a statistical methodology that utilizes the relation between two or more quantitative variables so that one variable can be predicted from the other, or others (Neter et al. 1996). Regression analysis was first developed by Sir Francis Galton in the latter part of the 19th century. There are many different kinds of regression models, like linear regression model, exponential regression model, logistic regression model, and so on. Among them, linear regression is a commonly used and easily understood statistical method. Linear regression explores relationships that can be described by straight lines or their generalization to many dimensions. Regression allows a single response variable to be described by one or more predictor variables.

Ordinary least squares (OLS) regression is a common statistical technique for quantifying the relationship between a continuous dependent variable and one or more independent variables. The dependent variables may be either continuous or discrete.

Neter et al. (1996) provide the basic OLS regression equation for a single variable regression model as follows: ˆ ˆ ˆ Yi = β 0 + β i X i + ε i (Equation 5-2) Where, ˆ Yi = value of the response variable in the ith trial ˆ ˆ β 0 , β i = estimators of regression parameters

X i = value of the predictor variable in the ith trial 2 2 ε i = random error term with mean E{ε i } = 0 and variance σ {ε i } = σ ;

ε i and ε j are uncorrelated so that their covariance is zero.

68 ˆ ˆ The parameters of the OLS regression equation, β 0 and βi , are found by the least squares method, which requires that the sum of squares of errors be minimized. Gauss-

Markov theorem states that, among all unbiased estimators that are linear combinations of y’s, the OLS estimators of regression coefficients have the smallest variance, i.e., they are best linear unbiased estimators. The Gauss-Markov Theorem doesn’t tell one to use least squares all the time, but it strongly suggests (Neter et al. 1996).

In linear regression, there are key assumptions that must be met, including:

• Yi are independent normal random variables

• The expected value of the error terms ε i is zero

2 • The error terms ε i are assumed to have constant variance σ

• The error terms ε i are assumed normally distributed

• The error terms ε i are assumed to be uncorrelated so that their covariance

is zero

• The error terms ε i are independent of the explanatory variable

If the above assumptions are violated the regression equation may yield biased results (Neter et al. 1996). For example, if explanatory variable is not independent of the error term, larger sample sizes do not lead to lower standard errors for the parameters, and the parameter estimates (slope, etc.) are biased. If the error is not distributed normally, for example, there may be fat tails. Consequence, use of the normal may underestimate true 95% confidence intervals.

5.2.1.3 Robust Regression

OLS models generally rely on the normality assumption and are often fitted by means of the least squares estimators. However, the sensitivity of these estimation techniques is related this underlying assumption which has been identified as a weakness that can lead to erroneous interpretations (Copt and Heritier 2006). Robust regression

69 procedures dampen the influence of outlying cases, as compared to OLS estimation, in an effort to provide a better fit for the majority of cases. They are useful when a known, smooth regression function is to be fitted to data that are “noisy”, with a number of outlying cases, so that the assumption of a normal distribution for the error terms is not appropriate (Neter et al. 1996). The MM-estimators are designed to be both highly robust against outliners and highly efficient.

5.2.2 Nonparametric Methods

Nonparametric methods have several advantages comparing with parametric methods. Nonparametric methods require no or very limited assumptions to be made about the format of the data, and they may therefore be preferable when the assumptions required for parametric methods are not valid (Whitley and Ball 2002). Nonparametric methods can be useful for dealing with unexpected, outlying observations that might be problematic with a parametric approach. Nonparametric methods are intuitive and are simple to carry out by hand, for small samples at least.

However, nonparametric methods may lack power as compared with more traditional approaches (Siegel S 1988). This is a particular concern if the sample size is small or if the assumptions for the corresponding parametric method hold true (e.g. normality of the data). Nonparametric methods are geared toward hypothesis testing rather than estimation of effects. It is often possible to obtain nonparametric estimates and associated confidence intervals, but this is not generally straightforward. In addition, appropriate computer software for nonparametric methods can be limited, although the situation is improving.

5.2.2.1 Chi-Square Test

The Chi-square test is the oldest and best known goodness-of-fit test. The test assumes that the observations are independent and that the sample size is reasonably

70 large. This method can be used to test whether a sample fits a known distribution, or whether two unknown distributions from different samples are the same. The test can detect major departures from a logistic response function, but is not sensitive to small departures from a logistic response function. The test assumptions are that the sample is random and that the measurement scale is at least ordinal (Conover 1980; Neter et al.

1996).

Pearson's chi-square goodness of fit test statistic is (StatsDirect 2005):

(Equation 5-3) where Oj are observed counts, Ej are corresponding expected count and c is the number of classes for which counts/frequencies are being analyzed.

The test statistic is distributed approximately as a chi-square random variable with c-1 degrees of freedom. The test has relatively low power (chance of detecting a real effect) with all but large numbers or big deviations from the null hypothesis (all classes contain observations that could have been in those classes by chance).

The handling of small expected frequencies is controversial. Koehler and Larnz asserted that the chi-square approximation is adequate provided all of the following are true: total of observed counts (N) ≥ 10; number of classes (c) ≥ 3; all expected values≥ 0.25 (Koehler and Larnz 1980).

5.2.2.2 Kolmogorv-Smirnov Two-Sample Test

The Kolmogorov-Smirnov (KS) two-sample test compares the empirical distribution functions of two samples, F1 and F2. The Kolmogorov-Smirnov test is a nonparametric test, which can be used to test whether two or more samples are governed by the same distribution by comparing their empirical distribution functions.

The Kolmogorov-Smirnov two sample test statistic can be defined as follows

(Chakravart and Roy 1967):

71 (Equation 5-4) where E1 and E2 are the empirical distribution functions for the two samples. The Kolmogorov-Smirnov two-sample test provides an improved methodology over the chi-squared test since data do not have to be assigned arbitrarily to bins. Further, it is a non-parametric test so a distribution does not have to be assumed. However, the main disadvantage to the K/S is similar to the chi-square in that the orders of magnitudes of separate tests that would have to be conducted to test all the possible combinations of variables in the datasets which is logistically infeasible (Hallmark 1999).

5.2.2.3 Wilcoxon Mann-Whitney Test

The Wilcoxon Mann-Whitney Test is one of the most powerful of the nonparametric tests for comparing two populations (Easton and McColl 2005). It is used to test the null hypothesis that two populations have identical distribution functions against the alternative hypothesis that the two distribution functions differ only with respect to location (median), if at all.

The Wilcoxon Mann-Whitney test does not require the assumption that the differences between the two samples are normally distributed. In many applications, the

Wilcoxon Mann-Whitney Test is used in place of the two sample t-test when the normality assumption is questionable. This test can also be applied when the observations in a sample of data are ranks, that is, ordinal data rather than direct measurements.

The Mann Whitney U statistic is defined as (StatsDirect 2005):

(Equation 5-5) where samples of size n1 and n2 are pooled and Ri are the ranks. U can be resolved as the number of times observations in one sample precede observations in the other sample in the ranking. Wilcoxon rank sum, Kendall's S and the

72 Mann-Whitney U test are exactly equivalent tests. In the presence of ties the Mann-

Whitney test is also equivalent to a chi-square test for trend.

5.2.2.4 Analysis of Variance (ANOVA)

ANOVA (Analysis of Variance), sometimes called an F test, is closely related to the t test. The major difference is that, where the t test measures the difference between the means of two groups, an ANOVA tests the difference between the means of two or more groups. ANOVA modeling does not require any assumptions about the nature of the statistical relation between the response and explanatory variables, nor do they require that the explanatory variables be quantitative (Neter et al. 1996).

The one way ANOVA, or single factor ANOVA, compares several groups of observations, all of which are independent, but possibly with a different mean for each group. A test of great importance is whether or not all the means are equal. The advantage of using ANOVA rather than multiple t-tests is that it reduces the probability of a type-I error (making multiple comparisons increases the likelihood of finding something by chance). One potential drawback to an ANOVA is that it can only tells that there is a significant difference between groups, not which groups are significantly different from each other. The breakdowns of the total sum of squares and degrees of freedom, together with the resulting mean squares, are presented in an ANOVA table such as Table 5-1.

Table 5-1 ANOVA Table for Single-Factor Study (Neter et al. 1996)

Source of Variation SS df MS E(AMS) Between 2 r −1 2 SSTR = ni (Yi. − Y.. ) SSTR n (μ − μ ) treatments ∑ MSTR = σ 2 + ∑ i i . r −1 r −1

Error (within SSE = (Y − Y )2 n − r SSE σ 2 treatments) ∑∑ ij i. T MSE =

nT − r Total 2 n −1 SSTO= ∑∑(Yij −Y..) T

73 A factorial ANOVA can examine data that are classified on multiple independent variables. A factorial ANOVA can show whether there are significant main effects of the independent variables and whether there are significant interaction effects between independent variables in a set of data. Interaction effects occur when the impact of one independent variable depends on the level of the second independent variable (Neter et al.

1996). Computation can be performed with standard statistical software.

5.2.2.5 Hierarchical Tree-Based Regression

Hierarchical Tree-Based Regression (HTBR) is a forward step-wise variable selection method, similar to forward stepwise regression. This method is also known as

Classification and Regression Tree (CART) analysis (Breiman et al. 1984). This technique generates a "tree" structure by dividing the sample data recursively into a number of groups. The groups are selected to maximize some measure of difference in the response variable in the resulting groups. As Washington et al. summarized in 1997, this method is based upon iteratively asking and answering the following questions: (1) which variable of all of the variables ‘offered’ in the model should be selected to produce the maximum reduction in variability of the response? and (2) which value of the selected variable (discrete or continuous) results in the maximum reduction in variability of the response? The HTBR terminology is similar to that of a tree; there are branches, branch splits or internal nodes, and leaves or terminal nodes (Washington et al. 1997a).

To explain the method in mathematical terms, the definitions presented by

Washington et al. (Washington et al. 1997a). The fist step is to define the deviance at a node. A node represents a data set containing L observations. The deviance, Da , can be estimated as follows:

L 2 Da = ∑(yl,a − xa ) (Equation 5-6) l=1 where,

74 Da = total deviance at node a, or the sum of squared error (SSE) at the node

th yl,a = l observation of dependent variable y at node a

xa = estimated mean of L observations in node a

Next, the algorithm seeks to split the observation at node a on a value of an independent variable, X i , into two branches and corresponding nodes b and c, each

containing M and N of the original L observations (M+N=L) of the variable X i . The deviance reduction function evaluated over all possible Xs then can be defined:

Δ (allX ) = Da − Db − Dc (Equation 5-7) M 2 Db = ∑(ym,b − xb ) (Equation 5-8) m=1 N 2 Dc = ∑(yn,c − xc ) (Equation 5-9) n=1

Where

Δ (allX ) = the total deviance reduction function evaluated over the domain of

all Xs

Db = total deviance at node b

Dc = total deviance at node c

th ym,b = m observation on dependent variable y in node b

th yn,c = n observation on dependent variable y in node c

xb = estimated mean of M observations in node b

xc = estimated mean of N observations in node c

The variable X k and its optimum split X k (i) is sought so that the reduction in deviance is maximized, or more formally when

75 L M N 2 2 2 Δ (allX ) = ∑(yl,a − xa ) − ∑(ym,b −x b ) − ∑(yn,c − xc ) = max l=1 m=1 n=1

(Equation 5-10)

The maximum reduction occurs at a specific value X k (i) , of the independent

variable X k . When the data are split at this point, the remaining samples have a much smaller variance than the original data set. Thus, the reduction in node a deviance is greatest when the deviances at nodes b and c are smallest. Numerical search procedures are employed to maximize equation 5-10 by varying the selection of variables used as a basis for a split and the value to use for each variable at a split.

In growing a regression tree, the binary partitioning algorithm recursively splits the data in each node until the node is homogenous or the node contains too few observations. If left unconstrained, a regression tree model can “grow” until it results in a complex model with a single observation at each terminal node that explains all the deviance. However, for application purposes, it is desirable to create criteria to balance the model's ability to explain the maximum amount of deviation with a simpler model that is easy to interpret and apply. Some software, such as S-PlusTM, allows the user to select such criteria. The software allows the user to interact with the data in the following manner to select variables and help simplify the final model:

• Response variable: the response variable is selected by the user from a list

of fields from the data set;

• Predictor variables: one or more independent variables can be selected by

the user from a list of fields associated with the dataset;

• Minimum number of observations allowed at a single split: sets the

minimum number of observations that must be present before a split is

allowed (default is 5);

76 • Minimum node size: sets the allowed sample size at each node (default is

10);

• Minimum node deviance: the deviance allowed at each node (default is

0.01).

However, unlike ordinary least squares (OLS) regression models, a shortcoming of HTBR is the absence of formal measures of model fit, such as t-statistics, F-ratio, and r-square, to name a few. Thus, the HTBR model is used to guide the development of an

OLS regression model, rather than as a model in its own right. Similar uses of HTBR techniques have been developed and applied in previous research papers (Washington et al. 1997a; Washington et al. 1997b; Fomunung et al. 1999; Frey et al. 2002).

5.3 Modeling Approach

The model development process will start by using HTBR both as a data reduction tool and for identifying potential interactions among the variables. Then

Ordinary Least Squares (OLS) Regression or Robust Regression is used with the identified variables to estimate a preliminary “final” model. After that, we need to check the model for compliance with normality assumptions and goodness of fit.

Several diagnostic tools are available to perform these checks. Once a preliminary “final” model is obtained, regression coefficients are examined using their t- statistics and correlation coefficients to determine which variables should be removed or retained in the model for further analysis. But, this procedure can lead to the removal of potentially important intercorrelated explanatory variables. In fact, variable agreement with underlying scientific principles of combustion, pollutant formation, emissions control (cause-effect relationships) should be the basis for the ultimate decisions regarding variable selection. Thus, a t-statistic may indicate that a parameter is insignificant (at level of significance = 0.05), while theory indicates that such a parameter

77 should be retained in the model for further analysis. This is usually referred to as a type

II error (Fomunung 2000).

2 F-statistics and adjusted coefficient of multiple determination, Ra are used to determine the effect-size of the parameters. Usually, adding more explanatory variables to the regression model can only increase R2 and never reduce it, because SSE can never become larger with more X variables and SSTO is always the same for a given set of responses. The adjusted coefficient of multiple determination can adjust R2 by dividing each sum of squares by its associated degrees of freedom. The F-test is used to test whether the parameter can be dropped even the t-statistic is appropriate.

In multiple regression analysis, the predictor or explanatory variables tend to be correlated among themselves and with other variables related to the response variable but not included in the model. The effects of multicollinearity are many and can be severe.

Neter et al. (Neter et al. 1996) have documented a few of these: when multicollinearity exists the interpretation of partial slope coefficients become meaningless; it can lead to estimated regression coefficients that vary widely from one sample to another; and there may be several regression functions that provide equally good fits to the data, making the effects of individual predictor variables difficult to assess.

There are some informal diagnostic tools suggested to detect this problem. A frequently used technique is to perform a simple correlation coefficient between the predictor variables to detect the presence of inter-correlation among independent variables. Large value of correlation is an indication that multicollinearity may exist.

Large changes in the estimated regression coefficients when a predictor variable is added or deleted are also an indication. Finally, multicollinearity may be a problem if estimated regression coefficients with an algebraic sign that is the opposite of that expected from theoretical considerations or prior experience (i.e. the beta coefficient is compensating for the beta coefficient of a correlated explanatory variable).

78 A formal method of detecting this problem is the variance inflation factor (VIF), which is a measure of how much the variances of the estimated regression coefficients are inflated as compared to when the predictor variables are not linearly related (Neter et al. 1996). This method is widely used because it can provide quantitative measurements of the impact of multicollinearity. The largest VIF value among all Xs is used to assess the severity of multicollinearity. As a rule of thumb, a VIF in excess of 10 is frequently used as an indication that multicollinearity is severe.

Diagnostic plots are examined to verify that normality and homoscedasticity assumptions, and the goodness of fit is investigated too. Because of difficulty in assessing normality, it is usually recommended that non-constancy of error variance could be investigated first (Neter et al. 1996). The plots used to identify any patterns in the residuals are considered as informal diagnostic tools and include plot of the residuals versus the fitted values and plot of square root of absolute residuals versus the fitted values. The normality of the residuals can be studied from histograms, box plots, and normal probability plots of the residuals. In addition, comparisons can be made of observed frequencies with expected frequencies if normality holds can be utilized too.

Usually, heteroscedasticity and/or inappropriate regression functions may induce a departure from normality. When OLS is applied to heteroskedastic models the estimated variance is a biased estimator of the true variance. That is, it either overestimates or underestimates the true variance, and, in general it is not possible to determine the nature of the bias. The variances, and the standard errors, may therefore be either understated or overstated.

5.4 Model Validation

Model validity refers to the stability and reasonableness of the regression coefficients, the plausibility and usability of the regression function, and the ability to

79 generalize inferences drawn from the regression function. Validation is a useful and necessary part of the model-building process (Neter et al. 1996).

Two basic ways of validating a regression model are: internal and external.

Internal validation consists of model checking for plausibility of signs and magnitudes of estimated coefficients, agreement with earlier empirical results and theory, and model diagnostic checks such as distribution of error terms, normality of error terms, etc.

Internal validation will be performed as part of the model estimation procedure.

External validation is the process to check the model and its predictive ability with collection of new data, like, data from another location or time, or using a holdout sample. Considering there are only 15 buses/engines in the data set, it is not practical to split the data set and hold sample for validation purpose. This will definitely influence the regression estimators. But suggestions and procedure about external validation will be provided.

80 CHAPTER 6

DATASET SELECTION AND ANALYSIS OF

EXPLANATORY VARIABLES

6.1 Dataset Used for Model Development

Development of a modal model designed to predict emissions on a second-by- second basis as a function of engine load requires the availability of appropriate emission test data. Modal modeling required the availability of second-by-second vehicle emissions data, collected in parallel with corresponding revealed engine load data. In

2004, only two data sets could be identified for use in this modeling effort. The U.S.

Environmental Protection Agency (EPA) provided two major HDV activity and emission databases to develop emission rate model. One database is a transit bus database, which included emissions data collected on diesel transit buses operated by the Ann Arbor

Transit Authority (AATA) in 2001, and another database is heavy HDV (HDV8B) database prepared by National Risk Management Research Laboratory (NRMRL) in

2001. The transit database consisted of data collected from 15 buses with same type of engines while the HDV8B database consisted of only one truck engine tested extensively onroad under pre-rebuild and post rebuild engine conditions. To decide whether it is suitable to combine these two datasets together or treat them individually, two dummy variables added in the databases to describe vehicle type. For the first dummy variable named “bus”, 1 was assigned for transit bus, and 0 for others. For the second dummy variable, 1 was assigned for HDDV with pre-rebuild engine, and 0 for others. HTBR was applied to all datasets to examine whether transit bus behave differently from HDDV or not. The regression trees and results for NOx, CO, and HC emission rates are given in

Figure 6-1 to Figure 6-3.

81

Figure 6-1 HTBR Regression Tree Result for NOx Emission Rate for All Datasets

Figure 6-2 HTBR Regression Tree Result for CO Emission Rate for All Datasets

82

Figure 6-3 HTBR Regression Tree Result for HC Emission Rate for All Datasets

Dummy variable for bus is selected as the first split for all three trees above, that means transit bus and HDDV should be treated separately. Since there are 15 engines in transit bus dataset while 1 engine (pre-rebuild and post-rebuild for the same engine) in

HDDV dataset, transit bus dataset should be used for the final version of the conceptual model development.

6.2 Representative Ability of the Transit Bus Dataset

The transit dataset was collected by Sensors, Inc. in Oct. 2001. The buses tested came from the Ann Arbor Transit Authority (AATA) and included 15 New Flyer models with Detroit Diesel Series 50 engines. All of the buses were of model year 1995 and

1996. All of the bus tested periods lasted approximately 2 hours. It is reported that the buses operated during standard AATA bus routes and stopped at all regular stops although the buses did not board or discharge any passengers (Ensfield 2002) The routes were mostly different for each test, and were selected for a wide variety of driving conditions (see Figure 4-1).

83 Figure 6-4 shows the speed-acceleration matrix developed with second-by-second data. There are two high speed/acceleration frequency peaks here. One is the bin of speed<=2.5 mph and acceleration [-0.25 mph/s, 0.25 mph/s] and contains 26.11% of the observations, while the other is the combination of several adjacent bins which covers speed [22.5 mph, 47.5 mph] and acceleration [-0.75 mph/s, 0.75 mph/s].

Figure 6-4 Transit Bus Speed-Acceleration Matrix

Georgia Institute of Technology researchers collected more than 6.5 million seconds of transit bus speed and position data using Georgia Tech Trip Data Collectors

( an onboard computer with GPS receiver, data storage, and wireless communication device) installed on two MARTA buses in 2004(Yoon et al. 2005b). With second-by- second data, the research team developed transit bus speed/acceleration matrices for the combinations between roadway facility type (arterial or local road) and time range

(morning, midday, afternoon, night). For each of them, two high acceleration/deceleration frequency peaks were also found. This finding is consistent with AATA data set, indicating at least that the onroad operations of the buses in Ann

Arbor are similar to operations in the Atlanta region.

84 This dataset was collected under a wide variety of environmental conditions too.

The temperature ranged from 10 C to 30 C, the relative humidity ranged from 15% to

65%, while the barometric pressure ranged from 960 mbar to 1000 mbar (Figure 6-5). So we can use this data set to examine the impact of environmental conditions on emissions.

Figure 6-5 Test Environmental Conditions

Transit buses tested were provided by the Ann Arbor Transit Authority (AATA) and all of them are New Flyer models with Detroit Diesel Series 50 engines. This will limit the ability of estimated emission models to incorporate the effect of vehicle technologies since all test buses were equipped with same fuel injection type, catalytic converter type, transmission type, and so on. Another limitation is the consideration about effect of emission control technology deterioration on emission levels because these buses were only 5 or 6 years older during the test.

85 6.3 Variability in Emissions Data

6.3.1 Inter-bus Variability

At first, data are presented to illustrate the variability in observed data. Inter-bus variability are illustrated using median and mean of NOx, CO, and HC emission rates for each bus from Figure 6-6 to 6-8. The difference between median and mean is an indicator of skewness for the distribution of emission rates.

Figure 6-6 Median and Mean of NOx Emission Rates by Bus

86

Figure 6-7 Median and Mean of CO Emission Rates by Bus

Figure 6-8 Median and Mean of HC Emission Rates by Bus

87

Figure 6-9 Empirical Cumulative Distribution Function Based on Bus Based Median Emission Rates for Transit Buses

The purpose of inter-bus variability analysis was to characterize the range of variability in vehicle average emissions among all of the buses, to determine whether the dataset is relatively homogenous. Although there are some clusters among the buses as suggested from Figure 6-6 to 6-8 and some skewness in the distribution as suggested by upper tails in Figure 6-9, it is not obvious that this dataset is lack of homogeneity and should be separated in different groups. So, this dataset is treated as a single group for purpose of analysis and model development.

6.3.2 Descriptive Statistics for Emissions Data

Applicable numerical summary statistics, such as variable means and standard deviations, are presented in Table 6-1. Relatively simple graphics such as histograms and boxplots describing variable distributions are presented from Figure 6-10 to 6-12. It may

88 also be necessary to assess whether the individual variables are normally distributed prior to any further analysis using parametric methods that are based upon this assumption.

Table 6-1 Basic Summary Statistics for Emissions Rate Data for Transit Bus

*** Summary Statistics for data in: transitbus.data ***

CO NOx HC Min: 0.000000e+000 0.000000e+000 0.000000e+000 1st Qu.: 3.030000e-003 2.195000e-002 4.200000e-004 Mean: 3.183675e-002 1.052101e-001 1.438709e-003 Median: 7.540000e-003 5.058000e-002 9.300000e-004 3rd Qu.: 2.197000e-002 1.731100e-001 1.840000e-003 Max: 3.057700e+000 2.427900e+000 6.679000e-002 Total N: 1.075350e+005 1.075350e+005 1.075350e+005 NA's : 0.000000e+000 0.000000e+000 0.000000e+000 Std Dev.: 8.479305e-002 1.162344e-001 1.956353e-003

Figure 6-10 Histogram, Boxplot, and Probability Plot of NOx Emission Rate

89

Figure 6-11 Histogram, Boxplot, and Probability Plot of CO Emission Rate

Figure 6-12 Histogram, Boxplot, and Probability Plot of HC Emission Rate

90 Further analysis indicated that there are some zero values existing in the emission data. There might be several reasons for zero values. Missing data caused by loss of communication between instruments or failure of a particular vehicle were recorded as zero in the dataset. Those zero values were already identified in the data post-processing procedure in Chapter 4. Another situation for zero values might have occurred when the presence of reference air contained significant amounts of a pollutant and the instrument may systematically report negative emission values. It is suggested by Sensors that negative data should be set to zero. That means these zero values were artificially recorded as zero, not observed by test equipment as zero. These zero values would bring truncation issue in the model, since Sensors’ transit bus dataset only contained valid positive emission data in the nature. Usually, truncation is found when a random variable is not observable over its entire range. Truncation couldn’t be treated as a missing data problem as the missing observations are moved at random. In statistics it can mean the process of limiting consideration or analysis to data that meet certain criteria or it can refer to a data distribution where values above or below a certain point have been eliminated (or cannot occur). A program in Matlab was written to check for the presence of zero emissions estimates in the dataset. It was found that there were 1.45% of zero values for NOx emissions, 1.65% of zero values for CO emissions and 3.84% of zero values for HC emissions. Since negative emission values are not observable for transit bus dataset, further analysis will focus on truncated dataset with valid positive emission data only.

The numerical summary statistics such as variable means and standard deviations for truncated emission data are presented in Table 6-2, and relatively simple graphics such as histograms and boxplots describing variable distributions are presented from

Figure 6-13 to 6-15. The mean of truncated NOx emission data increases 1.26%, while the mean of truncated CO emission data increases 1.23% and the mean of truncated HC emission data increase 0.99%, comparing with those of original dataset.

91 Table 6-2 Basic Summary Statistics for Truncated Emissions Rate Data

NOx CO HC Min: 1.000000e-005 1.000000e-005 1.000000e-005 1st Qu.: 2.256000e-002 3.190000e-003 4.700000e-004 Mean: 1.067578e-001 3.236955e-002 1.496171e-003 Median: 5.243500e-002 7.770000e-003 9.900000e-004 3rd Qu.: 1.749625e-001 2.246000e-002 1.880000e-003 Max: 2.427900e+000 3.057700e+000 6.679000e-002 Total N: 1.059760e+005 1.057650e+005 1.034050e+005 NA's : 0.000000e+000 0.000000e+000 0.000000e+000 Std Dev.: 1.163785e-001 8.539871e-002 1.973375e-003

Figure 6-13 Histogram, Boxplot, and Probability Plot of Truncated NOx Emission Rate

92

Figure 6-14 Histogram, Boxplot, and Probability Plot of Truncated CO Emission Rate

Figure 6-15 Histogram, Boxplot, and Probability Plot of Truncated HC Emission Rate

93 These boxplots for truncated emission data show that there are some obvious outliers in the measured emissions of all three pollutants, and the histograms suggest high degree of non-normality, also indicated in the probability plots. Thus there is need to transform the response variable to correct for this. Transformations are used to present data on a different scale. In modeling and statistical applications, transformations are often used to improve the compatibility of the data with assumptions underlying a modeling process, to linearize the relation between two variables whose relationship is non-linear, or to modify the range of values of a variable (Washington et al. 2003).

6.3.3 Transformation for Emissions Data

Although evidence in the literature suggests that a logarithmic transformation is most suitable for modeling motor vehicle emissions (Washington 1994; Ramamurthy et al. 1998; Fomunung 2000; Frey et al. 2002), this transformation needs to be verified through Box-Cox procedure. Box-Cox function in Matlab can automatically identify a transformation from the family of power transformations on emission data, ranging from

-1.0 to 1.0. The lambdas chosen by Box-Cox procedure are 0.22875 for truncated NOx, -

0.0648 for truncated CO, 0.14631 for truncated HC.

Box-Cox procedure is only used to provide a guide for selecting a transformation, so overly precise results are not needed (Neter et al. 1996). It is often reasonable to use a nearby lambda value for with the power transformation is easily to understand. The lambda values used for transformations are 1/4 for truncated NOx, 0 for truncated CO, 0 for truncated HC. Histograms, boxplots and normal-normal plots describing transformed variable distributions are presented from Figure 6-16 to 6-18, where a great improvement is noted.

94

Figure 6-16 Histogram, Boxplot, and Probability Plot of Truncated Transformed NOx Emission Rate

Figure 6-17 Histogram, Boxplot, and Probability Plot of Truncated Transformed CO Emission Rate

95

Figure 6-18 Histogram, Boxplot, and Probability Plot of Truncated Transformed HC Emission Rate

Although transformations can result in improvement of a specific modeling assumption, such as linearity or normality, they can often result in the violation of others.

Thus, transformations must be used in an iterative fashion, with continued checking of other modeling assumptions as transformations are made. Dr. Washington suggested the comparisons should always be made on the original untransformed scale of Y when comparing statistical models and these comparisons extend to goodness of fit statistics and model validation exercises (Washington et al. 2003).

6.3.4 Identification of High Emitter

From a modeling viewpoint, it is important to accurately predict the number of

‘high emitter’ vehicles in the fleet (older technology, poorly maintained, or tampered vehicles that emit significantly elevated emissions relative to the fleet average under all operating conditions) and the fraction of activities that yield high emissions for normal

96 emitting vehicles. Historic practices to identify ‘high emitters’ in a data set have relied on judgment to set cut points that are often indefensible from a statistical, and sometimes even practical, perspective. EPA uses 5 times the prevailing emission standards as the cut point across all pollutants (USEPA 1993), while CARB has defined different emission regimes ranging from normal to super emitters and used different criteria for each regime (CARB 1991; Carlock 1994).

Table 6-3 CARB Emission Regime Definition (Carlock 1994)

Emitter Status NOx CO HC Normal <= 1 standard < 1 standard < 1 standard Moderate 1 to 2 standard 1 to 2 standard 1 to 2 standard High 2 to 3 standard 2 to 6 standard 2 to 4 standard Very High 3 to 4 standard 6 to 10 standard 5 to 9 standard Super > 4 standard > 10 standard > 9 standard

In contrast, the methodology employed as MEASURE database development at

Georgia Tech is statistically based. Wolf et al used regression tree techniques to classify vehicles into classes that behave similarly, exhibit similar technology characteristics, and exhibit similar mean emission rates under standardized testing conditions(Wolf et al.

1998). The cut points within each technology class are then defined on the basis of pre- selected percentiles of a normal distribution of the emission rates for each pollutant. The analysis by Wolf et al specified a cut point of 97.73 percent (that is, mean + 2 standard deviation), which implies that approximately 2.27 percent of the vehicles in each technology class are high emitters.

For this research, although inter-bus variability exists in the data set, these 15 buses should be treated as one technology class because they shared the same fuel injection type, catalytic converter type, transmission type, and their model year and odometer reading were similar. Just as Wolf’s approach, the emissions value located at two standard deviations above the mean of the normalized emissions distribution is used

97 as cutpoint to distinguish between normal and high emission points. Theoretically, this method will consistently identify approximately 2.27 percent of the data as high emission point. That means 97.73 percent of the population should full into the normal status.

Analysis results showed that 0.33 percent of NOx emission, 3.76 percent of CO emission, and 1.37 percent of HC emissions were identified as high emission point. After assigning those high emissions points into different buses, the distribution is shown below.

Table 6-4 Percent of High Emission Points by Bus

NOx CO HC bus 360 0.02% 2.80% 5.06% bus 361 0.32% 1.08% 0.25% bus 363 0.06% 3.10% 0.00% bus 364 0.04% 0.87% 7.38% bus 372 0.00% 0.13% 1.96% bus 375 0.69% 3.16% 0.27% bus 377 0.00% 4.44% 0.00% bus 379 0.67% 2.85% 1.17% bus 380 0.52% 7.67% 0.69% bus 381 0.10% 4.76% 0.14% bus 382 1.14% 8.12% 0.36% bus 383 0.88% 3.44% 1.82% bus 384 0.50% 5.10% 1.33% bus 385 0.55% 2.10% 0.60% bus 386 0.20% 6.63% 0.57% Total 0.36% 3.81% 1.38%

For each individual bus, the highest proportion is 1.14 percent for bus 382 for

NOx emissions, 8.12 percent for bus 380 for CO emissions, and 7.38 percent for bus 364 for HC emissions. No evidence from the above table suggest that there are some “high emitters” (older technology, poorly maintained, or tampered vehicles) in the data set.

This conclusion makes sense since all buses were only 5 or 6 years older during the test.

Another finding indicated that a small of fraction of a bus’s observed activity exhibit disproportionately high emissions. Activities found in literatures include hard accelerations at low speeds, moderate acceleration at high speeds, or equivalent

98 accelerations against gravity (Fomunung 2000). Given that high emissions points make up only 0.33 percent of the data set for NOx, 3.76 percent for CO, 1.37 percent for HC, it is not necessary to develop two different models for normal emissions and high emissions.

Based on this analysis, these 15 buses should be treated as one technology class since no high emitters were identified.

6.4 Potential Explanatory Variables

There are mainly four groups of parameters that affect vehicle emissions as indicated in literature(Guensler 1993; Clark et al. 2002). These groups are: 1) vehicle characteristics, including vehicle type, make, model year, engine type, transmission type, frontal area, drag coefficient, rolling resistance, vehicle maintenance history, etc.; 2) roadway characteristics, including road grade and possibly pavement surface roughness, etc.; 3) onroad load parameters, like onroad driving trace (sec-cy-sec) or speed/acceleration profile, vehicle payload, onroad operating modes, driver behavior, etc.; and 4) environment conditions, including humidity, ambient temperature, and ambient pressure (Feng et al. 2005; Guensler et al. 2005a).

In general, emissions from HDDVs are more likely to be a function of brake- horsepower load on the engine (especially for NOx) than light-duty gasoline vehicles, because instantaneous emissions levels of diesel engine are highly correlated with the instantaneous work output of the engine (Ramamurthy and Clark 1999; Feng et al. 2005).

That is, in particular, the higher the engine load, the higher emissions for NOx The emissions modeling framework (from which most of the items below are derived) is outlined in the RARE report (Guensler et al. 2006). The goal of that modeling regime is to predict onroad load and then apply appropriate emission rates to the load. Most of the items outlining below are related to the amount of engine load that a vehicle will experience. Although each of the variables below are important, they are not always available in onroad testing data (although in the future we need to make sure that these

99 data are all collected). But, engine load in the AATA database could be used in emission rate model development for this research. Also, there are some factors, such as temperature and humidity, may affect emission rates independent of load, or perhaps interacting with load. As such, the model should incorporate such variables.

6.4.1 Vehicle Characteristics

Factors related to vehicle characteristics influencing heavy-duty diesel vehicle emissions which are summarized in the literature include vehicle class (i.e., weight, engine size, horsepower rating), model year, vehicle mileage, emission control system

(i.e., engine exhaust aftertreatment system), transmission type, inspection and maintenance history, etc. (Guensler 1993; Clark et al. 2002).

The effect of vehicle class on emissions is significant. Five main factors that cause a vehicle to demand engine power are vehicle speed, vehicle acceleration, drivetrain inertial acceleration, vehicle weight, and road grade. As the required power and work performed by the vehicle increases, the amount of fuel burned to produce that power also increases, and the applicable emission rates also generally increase. Thus, emissions vary as a function of vehicle class and vehicle configuration. The higher truck classes with larger engines are heavier and, thus, typically produce more emissions.

Vehicle configurations with large frontal areas and high drag coefficients will yield higher emissions when operated at higher speeds and/or accelerated at higher rates.

The concept of vehicle technology groups is to identify and track subsets of vehicles that have similar onroad load responses and similar laboratory emission rate performance. The basic premise is that vehicles in the same heavy-duty vehicle class, employing similar drivetrain system, and of the same size and shape have similar load relationships. There is also an important practical consideration in establishing vehicle technology groups. Researchers need to be able to identify these vehicles in the field during traffic counting exercises.

100 The starting point for technology group criteria is a visual classification scheme.

Yoon, et al. (Yoon et al. 2004a) developed a new HDV visual classification scheme called the X-scheme based on the number of axles and gross vehicle weight ratings

(GVWR) as a hybrid scheme between the FHWA truck and EPA HDV classification schemes. With field-observed HDV volumes, emissions rates estimated using the X- scheme were 34.4% and 32.5% higher for NOx and PM, compared to using the standard

EPA guidance. The X-scheme more realistically reflects vehicle composition in the field than does the standard EPA guidance, which shifted heavy-HDV volumes into light- or medium-HDV volumes 21% more frequently than the X-scheme. Figure 6-19 shows X- scheme classes and their typical figures.

Figure 6-19 The X Classes and Typical Vehicle Configurations

Vehicle age and model year effects are accounted for because some vehicle models have much lower average emissions. Researchers from West Virginia University reported that most regulated emissions from engine produced by Detroit Diesel

101 Corporation have declined over the years and the expected trend of decreasing emission levels with the model year of the engine is clear and consistent for PM, HC, CO and NOx, starting with the 1990 models (Prucz et al. 2001). Information on vehicle age can be obtained from registration database using vehicle identification numbers and truck manufacturer records. The registration database can be sorted by calendar year and show vehicle registered in the given year by model year. However, given the differences noted between field-observation fleet composition and registration data in the light-duty fleet

(Granell et al. 2002), significant additional research efforts designed to model the on-road subfleet composition (classifications and model year distributions) are even more warranted for HDVs. It is also important to keep in mind that heavy-duty engines accumulate miles of travel vary rapidly and that engine rebuilding is a common practice.

Hence, the age of the vehicle does not necessarily equal the age of the engine. Previous field work in Atlanta indicates that onroad surveys provide better information on fleet composition (Ahanotu 1999). To refine the model, appropriate datasets that include detailed information on engine type, transmission type, etc. will be needed to appropriately subdivide the observed on-road groups and continue to develop respective emission rate. The data collection challenge in this area is daunting, but it is worth to perform once to provide a library of information that can be used in a large number of modeling applications.

Vehicle weight is critical to the demand engine power that must be supplied to produce the tractive force needed to overcome inertial and drag forces and then influences vehicle emissions. It is found that NOx emissions increase as the vehicle weight increases and this relationship did not vary much from vehicle to vehicle

(Gajendran and Clark 2003). The effects of vehicle age, engine horsepower ratings, transmission type, and engine exhaust aftertreatment were also investigated in other literature (Clark et al. 2002; Feng et al. 2005).

102 The vast majority of heavy-duty vehicles are normal emitters, but a small percentage of vehicles are high-emitters under every operating condition, typically because they have been tampered with or they are malfunctioning (i.e. defective or mal- maintained engine sensors or actuators). As the vehicle ages, general engine wear and tear will increase emission rates moderately due to normal degradation of emissions controls of properly functioning vehicles. On the other hand, as vehicles age, the probability that some of the vehicles will malfunction and produce significantly higher emissions (i.e. become high-emitters). Probability functions as to the classification of vehicles within specific model years (and later, within specific statistically-derived vehicle technology groups) are currently being developed through the assessment of certification testing and various roadside emissions tests. Obtaining additional detailed sources of data for developing failure models appears to be warranted.

After engine horsepower at the output shaft has been reduced by power losses associated with fluid pressures, operation of air conditioning, and other accessory loads, there is still an additional and significant drop in available power from the engine before reaching the wheels. Power is required to overcome: mechanical friction within the transmission and differential, internal working resistance in hydraulic couplings, and friction of the vehicle weight on axle bearings. The combined effect of these components is parameterized as drivetrain efficiency. However, the more difficult and more significant component of power loss in the drivetrain is associated with the inertial resistance of drivetrain components rotational acceleration (Gillespie 1992).

A heavy-duty truck drivetrain is significantly more massive than its light-duty counterpart. The net effect of drivetrain inertial losses when operating in higher gears on freeway may not be significant enough to be included in the model (relative to the other load-related components in the model for these heavy vehicles). However, recent studies appear to indicate very high truck emission rates (gram/second) in “creep mode” stop and start driving activities noted in ports and rail yards. This may indicate that the high

103 inertial loads for low gear, low speed, and acceleration operations may contribute significantly to emissions from mobile sources in freight transfer yards and therefore should not be ignored (Guensler et al. 2006).

The inertial losses are a function of a wide variety of physical drivetrain characteristics (transmission and differential types, component mass, etc.) and on-road operating conditions. To refine the use of inertial losses in the modal model, new drivetrain testing data will be designed to evaluate the inertial losses for various engine, drive shaft, differential, axle, and wheel combinations and to establish generalized drivetrain technology classes. Then, gear selection probability matrices for each drivetrain technology class, and gear and final drive ratio data can be provided in lookup tables for model implementation, in place of the inertial assumptions currently employed.

However, data are currently significantly lacking for development of such lookup tables.

6.4.2 Roadway Characteristics

The three basic geometric elements of a roadway are the horizontal alignment, the cross-slope or amount of super-elevation and the longitudinal profile or grade. Among them, road grade has been shown to have significant impact on engine load and vehicle emissions (Guensler 1993). Other roadway characteristics, such as lane width, are also noted to have a significant impact on the speed-acceleration profiles of heavy-duty vehicles and can therefore affect engine load (Grant et al. 1996).

6.4.3 Onroad Load Parameters

Onroad load parameters include onroad driving trace (sec-by-sec) or speed/acceleration profile, engine load, onroad operating modes (i.e., idling/motoring, acceleration, deceleration, and cruise), driver behavior, and so on. Vehicle speed and acceleration are integral components to the estimate of vehicle road load, and therefore engine load. Previous studies indicated that increased engine power requirements could

104 result in the increase in NOx emissions (Ramamurthy and Clark 1999; Feng et al. 2005).

Clark et al. reported that the vehicle applications and duty cycles can have an effect on the emission produced (Clark et al. 2002). His study found that over a typical day of use for any vehicle, one that stops and then accelerates more often might produce higher distance-specific emissions, providing all else is held constant.

Passenger and freight payloads together with the vehicle tare weight contribute to the demand for power that must be supplied to produce the tractive force needed to overcome inertial and drag forces. Passenger loading functions for transit operations can be obtained through analysis of fare data or on-board passenger count programs. On the heavy-duty truck side, on-road freight weight distributions by vehicle class can be derived from roadside weigh stations studies. Ahanotu conducted detailed weigh-in- motion studies in Atlanta and found that reasonable load distributions by truck class and time of day could be applied in such a modal modeling approach (Ahanotu 1999).

Although additional field studies are warranted to examine the stability of the Atlanta results over time and the transferability of findings in Atlanta to other metropolitan areas

(especially considering the potential variability in commodity transport, such as agricultural goods that may occur in other areas) the modeling methodology seems appropriate.

6.4.4 Environmental Conditions

Environmental conditions under which the vehicle is operated include humidity, ambient temperature, and ambient pressure. EPA is currently conducting studies to find the effect of ambient conditions on HDDV emissions (NRC 2000). The current

MOBILE6.2 model includes correction factors to account for the impact of environmental on vehicle emission rates. Given the lack of compelling additional data available for analysis, it may be necessary to ignore the effects of these environmental parameters (altitude, temperature, and humidity) or simply incorporate the existing

105 MOBILE6.2 corrections factors. Preliminary analyses of the data and methods used to derive the MOBILE6.2 environmental correction factors indicate that the embedded equations in MOBIL6.2 probably need to be revisited.

6.4.5 Summary

It is impossible for modeler to include all explanatory variables identified in the literature review for model development because the explanatory variables available for model development and model validation are only a subset of potential explanatory variables identified above. Therefore, the conceptual model will only include available variables and derived variables in the provided dataset.

6.5 Selection of Explanatory Variables

Just as mentioned earlier, available explanatory variables for transit bus are only a subset of potential explanatory variables identified. In brief, available explanatory variables can be summarized as:

• Test information: date, time;

• Vehicle characteristics: license number; model year, odometer reading,

engine size, instrument configuration number;

• Roadway characteristics: road grade (%);

• Onroad load parameters: engine power (bhp), vehicle speed (mph),

acceleration (mph/s);

• Engine operating parameters: throttle position (0 – 100%), engine oil

temperature(deg F), engine oil pressure (kPa), engine warning lamp

(Binary), engine coolant temperature (deg F), barometric pressure reported

from ECM (kPa);

106 • Environmental conditions: ambient temperature (deg C), ambient pressure

(mbar), ambient relative humidity (%), ambient absolute humidity

(grains/lb air).

The most important question related to engine power is how to simulate engine power in real world for application purpose. Georgia Institute of Technology researchers developed a transit bus engine power demand simulator (TB-EPDS), which estimates transit bus power demand for given speed, acceleration, and road grade conditions (Yoon et al. 2005a; Yoon et al. 2005b). Speed-acceleration-road grade matrices were developed from speed and location data obtained using a Georgia Tech Trip Data Collector. The researchers conclude that speed-acceleration-road grade matrices at the link level or the route level are both acceptable for regional inventory development. However, for micro- scale air quality impact analysis, link-based matrices should be employed (Yoon et al.

2005a). Although significant uncertainties still exist for inertial loss which is significant at low speeds and motoring mode with negative engine power, this research showed that using engine power as load data is possible for application purpose. So we concluded that engine power could be used as load data in estimated emission models.

The relationships between explanatory variables were investigated using S-Plus.

Three variables were excluded because they have only single value for all records, and they are engine size, instrument configuration number and engine warning lamp. So there are 14 explanatory variables included in correlation analysis. The correlation matrix is shown in table 6-5.

Table 6-5 Correlation Matrix for Transit Bus Dataset

*** Correlations for data in: transitbus.data ***

model.year odometer temperature baro model.year 1.0000000000 -0.65527310560 0.047048515 0.394378106 odometer -0.6552731056 1.00000000000 0.186771499 -0.704310642 temperature 0.0470485145 0.18677149860 1.000000000 -0.326938545 baro 0.3943781056 -0.70431064156 -0.326938545 1.000000000 SCB.RH 0.0684118420 0.34381446472 0.488214011 -0.632480147

107 Table 6-5 Continued

humid 0.0309977344 0.39026147955 0.751260451 -0.649522446 grade -0.0042410213 0.00052737023 -0.005590441 0.002384338 vehicle.speed -0.0149162038 -0.06290809815 -0.225478003 0.054918347 throttle.position -0.0018682400 0.00934657062 -0.091132660 -0.014470281 oil.temperature 0.0517590690 -0.01188182701 0.042676227 -0.026744091 oil.pressure 0.0505213386 -0.09844247172 -0.073256993 0.034212231 coolant.temperature 0.2067272410 -0.11771006697 0.077114798 0.045844706 eng.bar.press 0.1377810757 -0.24887618256 -0.260525088 0.371021489 engine.power -0.0060664550 0.02128322926 -0.059512654 -0.035718725

SCB.RH humid grade vehicle.speed model.year 0.06841184197 0.030997734 -0.0042410213 -0.0149162038 odometer 0.34381446472 0.390261480 0.0005273702 -0.0629080981 temperature 0.48821401099 0.751260451 -0.0055904408 -0.2254780028 baro -0.63248014742 -0.649522446 0.0023843383 0.0549183467 SCB.RH 1.00000000000 0.931879078 -0.0060751123 -0.0345026974 humid 0.93187907778 1.000000000 -0.0064110094 -0.1178709844 grade -0.00607511233 -0.006411009 1.0000000000 0.0008965681 vehicle.speed -0.03450269737 -0.117870984 0.0008965681 1.0000000000 throttle.position 0.01342357443 -0.024720165 0.0201865069 0.3877053983 oil.temperature 0.09601856999 0.087317807 -0.0071166692 0.0186414330 oil.pressure -0.04985283726 -0.077649741 0.0098369544 0.5674938143 coolant.temperature 0.20055598839 0.171558840 -0.0145315240 0.0729981993 eng.bar.press -0.36638292674 -0.373540032 0.0021320630 0.1432703187 engine.power 0.02574364223 -0.003279122 0.0216620907 0.3032096568 acc 0.00004037101 0.003340728 0.0129300756 0.0002241259

throttle.position oil.temperature oil.pressure model.year -0.001868240 0.051759069 0.050521339 odometer 0.009346571 -0.011881827 -0.098442472 temperature -0.091132660 0.042676227 -0.073256993 baro -0.014470281 -0.026744091 0.034212231 SCB.RH 0.013423574 0.096018570 -0.049852837 humid -0.024720165 0.087317807 -0.077649741 grade 0.020186507 -0.007116669 0.009836954 vehicle.speed 0.387705398 0.018641433 0.567493814 throttle.position 1.000000000 0.012077329 0.681336402 oil.temperature 0.012077329 1.000000000 -0.117896787 oil.pressure 0.681336402 -0.117896787 1.000000000 coolant.temperature 0.059605193 0.335667341 -0.298083257 eng.bar.press 0.102861968 0.059886972 0.022549030 engine.power 0.959310116 0.007171781 0.656609695 acc 0.660747116 -0.004185245 0.465493435

coolant.temperature eng.bar.press engine.power model.year 0.206727241 0.137781076 -0.006066455 odometer -0.117710067 -0.248876183 0.021283229 temperature 0.077114798 -0.260525088 -0.059512654 baro 0.045844706 0.371021489 -0.035718725 SCB.RH 0.200555988 -0.366382927 0.025743642 humid 0.171558840 -0.373540032 -0.003279122 grade -0.014531524 0.002132063 0.021662091 vehicle.speed 0.072998199 0.143270319 0.303209657 throttle.position 0.059605193 0.102861968 0.959310116 oil.temperature 0.335667341 0.059886972 0.007171781 oil.pressure -0.298083257 0.022549030 0.656609695 coolant.temperature 1.000000000 0.284506753 0.050584845 eng.bar.press 0.284506753 1.000000000 0.089702976 engine.power 0.050584845 0.089702976 1.000000000

108 All variables pairs with correlation coefficients greater than 0.5 were scrutinized and subjected to further analysis, which invariably helped in pairing down the number of variables. The values in the correlation matrix show that throttle position and engine power, ambient relative humidity and ambient absolute humidity are highly correlated

(higher than 0.90); model year and odometer, odometer and barometric pressure, barometric pressure and ambient relative humidity, barometric pressure and ambient absolute humidity, ambient absolute humidity and temperature, oil pressure and throttle position, oil pressure and vehicle speed, oil pressure and engine power, throttle position and acceleration, engine power and acceleration are moderately correlated (higher than

0.50); other pairs of variables, however, have only light correlations.

The relationship between throttle position and engine power is shown in Figure 6-

20. Since engine power is derived from percent engine load, engine torque, and engine speed, and previous studies indicated that increased engine power requirements could result in the increase in NOx emissions (Ramamurthy and Clark 1999; Feng et al. 2005), the author retained engine power in the database.

Figure 6-20 Throttle Position vs. Engine Power for Transit Bus Dataset

109 Ambient relative humidity and ambient absolute humidity provide same information in two different ways, and either is enough to count the influence of ambient humidity on emissions. The author retained ambient relative humidity in the database.

Three other findings related to the correlation matrix are:

1. All environmental characteristics, like temperature, humidity, and

barometric pressure, are moderately correlated with each other (Figure 6-

21). This means modelers should pay attention to such relationships when

developing environmental factors.

2. Engine power is correlated with not only onroad load parameters such as

vehicle speed, acceleration, and road grade, but also engine operating

parameters such as throttle position and engine oil pressure. Engine power

in this data set is derived from measured engine speed, engine torque and

percent engine load. On the other hand, engine power could be derived

theoretically from vehicle speed, acceleration and road grade using an

engine power demand equation. So, engine power can connect onroad

modal activity with engine operating conditions at this level. This fact

strengthens the importance to introduce engine power into conceptual

emissions model and to improve the ability to simulate engine power for

regional inventory development.

3. Engine operating parameters, like throttle position (0 – 100%), engine oil

pressure (kPa), engine oil temperature (deg F), engine coolant temperature

(deg F), and barometric pressure reported from ECM (kPa), have highly or

moderately related to onroad operating parameters. For example, engine

power and throttle position is highly correlated, while oil pressure and

vehicle speed, oil pressure and engine power, throttle position and

acceleration are moderately correlated. Although engine operating

parameters maybe have power to explain the variability of emission data,

110 it is difficult to get such data in the real world for application purpose.

These four variables are retained for further analysis of their relationships with emissions. Although these four will be excluded for emission model at this moment, analysis of these potential relationships may indicate a need for further research in this area.

111 CHAPTER 7

MODAL ACTIVITY DEFINITIONS DEVELOPMENT

7.1 Overview of Current Modal Activity Definitions

Current research suggests that vehicle emission rates are highly correlated with modal vehicle activity. Modal activity is a vehicle activity characterized by cruise, idle, acceleration or deceleration operation. Consequently, a modal approach to transportation-related air quality modeling is becoming widely accepted as more accurate in making realistic estimates of mobile source contribution to local and regional air quality. Research in Georgia Tech has clearly identified that modal operation is a better indicator of emission rates than average speed (Bachman 1998). The analysis of emissions with respect to driving modes, also referred to as modal emissions, has been done in several recent researches (Barth et al. 1996; Bachman 1998; Fomunung et al.

1999; Frey et al. 2002; Nam 2003; Barth et al. 2004). These studies indicated that driving modes might have ability to explain a significant portion of variability of emission data. Usually, driving can be divided into four modes: acceleration, deceleration, cruise, and idle. But driving mode definitions in literature were somewhat arbitrary. To define the driving modes or choose the more reasonable definitions for proposed modal emissions model, current driving mode definitions used in different modal emission models are needed to be investigated first.

MEASURE’s Definitions Researchers in Georgia Tech developed Mobile Emissions Assessment System for Urban and Regional Evaluation (MEASURE) model in 1998 (Guensler et al. 1998).

This model was developed from more than 13,000 laboratory tests conducted by the EPA and CARB using standardized test cycle conditions and alternative cycles (Bachman

1998). Modal activities variables were introduced into the MEASURE model as follows:

112 acceleration (mph/sec), deceleration (mph/sec), cruise (mph) and percent in idle time. In addition, two surrogate variables were also developed, inertial power surrogate, IPS

(mph2/s), which was defined as acceleration times velocity; and drag power surrogate,

DPS (mph3/s), which was defined as acceleration times velocity squared. Within each mode, several ‘cut points’, or threshold values, were specified and used to create several categories. In total, six threshold values were defined for acceleration, three for deceleration, five for cruise modes, seven for IPS, and seven for DPS. Modal activity surrogate variables were added as percent of cycle time spend in specified operating condition (Fomunung et al. 1999).

NCSU’s Definitions

Dr. Frey in North Carolina State University (NCSU) defined four modes of operation (idle, acceleration, deceleration, and cruise), for EPA’s Multi-Scale Motor

Vehicle and Equipment Emission System (MOVE) in 2001 (Frey and Zheng 2001; Frey et al. 2002). The following description is directly cited from his report.

Idle is defined as based upon zero speed and zero acceleration. The acceleration

mode includes several considerations. First, the vehicle must be moving and

increasing in speed. Therefore, speed must be greater than zero and the

acceleration must be greater than zero. However, vehicle speed can vary slightly

during events that would typically be judged as cruising. Therefore, in most

instances, the acceleration mode is based upon a minimum acceleration of two

mph/sec. However, in some cases, a vehicle may accelerate slowly. Therefore, if

the vehicle has had a sustained acceleration rate averaging at least one mph/sec

for at least three seconds or more, that is also considered acceleration.

Deceleration is defined in a similar manner as acceleration, except that the criteria

for deceleration are based upon negative acceleration rates. All other events not

classified as idle, acceleration, or deceleration, are classified as cruising. Thus,

113 cruising is approximately steady speed driving but some drifting of speed is

allowed.

PERE’s Definitions Dr. Nam developed his definitions when he introduced his Physical Emission

Rate Estimator (PERE) model in 2003 (Nam 2003). Idle is defined as speed less than 2 mph. Acceleration mode is based on acceleration rate greater than 1 mph/sec. However, deceleration is based on deceleration rate less than -0.2 mph/sec. Other events are classified as cruise mode and the acceleration range is between -0.2 mph/sec and 1 mph/sec. Nam also mentioned in his report that the definition of cruise (based only on acceleration) will change depending on the speed in future studies.

Summary Current driving mode definitions related to modal emission models are all significantly different from each other. NCSU used one absolute critical value, 2 mph/sec, for acceleration and deceleration mode. However, PERE chose two different critical values, 1 mph/sec and -0.2 mph/sec, for acceleration and deceleration mode individually.

The critical values, no matter 2 mph/sec, 1 mph/sec, or 0.2 mph/sec, were chosen somewhat arbitrarily. MEASURE used several threshold values to add modal activity surrogate variables. Table 7-1 summarizes these modal activity definitions.

Table 7-1 Comparison of Modal Activity Definition

MEASURE NCSU PERE Idle Speed=0, Acc=0 Speed=0, Acc=0 Speed<2 Acceleration Acc>6,Acc>5,Acc>4, Acc>2 or Acc>1 for Acc>1 Acc>3,Acc>2,Acc>1 three seconds Deceleration Acc<-3,Acc<-2, Acc<- Acc<-2 or Acc<-1 for Acc<-0.2 1 three seconds Cruise Speed>70, Speed>60, Other events -0.250, Speed>40, Speed>30 Note: Unit for speed is mph, unit for acceleration is mph/sec.

114 7.2 Proposed Modal Activity Definitions and Validation

Although the current mode definitions all had the ability to explain some variability in different emission datasets (Barth et al. 1996; Bachman 1998; Fomunung et al. 1999; Frey et al. 2002; Nam 2003; Barth et al. 2004), they differ significantly from each other. This makes the determining whether to accept current definitions or develop new definitions a bit of a challenge.

MEASURE’s definitions were developed based on cycle tested data and modal activity surrogate variables were added as percent of cycle time spend in specified operating condition. Obviously, this definition is not suitable for second-by-second data.

PERE’s definition couldn’t assign all data into appropriate modes. Idle mode was defined as zero speed and zero acceleration in NCSU’s definitions. Although idle mode is defined theoretically as zero speed and zero acceleration, it couldn’t be defined just like this considering unavoidable measurement error and measurement noise. Based on this analysis, it seems more reasonable to develop new definitions for this proposed modal emission model, where such definitions can be defined through empirical analysis of the data. In fact, the definition of modal activity is depended on the available speed/acceleration data and data quality. For example, a lack of zero speed records doesn’t mean that there is no idle activity in dataset.

The initial proposed modal activity definitions were defined as follows:

• Idle is defined as based on speed less than 2.5 mph and absolute acceleration

less than 0.5 mph/sec.

• Acceleration mode is based upon a minimum acceleration of 0.5 mph/sec.

• Deceleration is defined in a similar manner as acceleration, except what the

criteria for deceleration are based upon negative acceleration rates.

• All other events not classified as idle, acceleration, or deceleration, are

classified as cruise.

115 At the same time, several different critical values were chosen to examine the reasonableness proposed criteria. Four different mode definitions using different critical values are shown as following.

Table 7-2 Four Different Mode Definitions and Modal Variables

Idle Acceleration Deceleration Cruise Definition 1 Speed<=2.5 & Acc> 0.5 Acc< -0.5 Other abs(acc)<=0.5 Definition 2 Speed<=2.5 & Acc> 1 Acc< -1 Other abs(acc)<=1 Definition 3 Speed<=2.5 & Acc> 1.5 Acc< -1.5 Other abs(acc)<=1.5 Definition 4 Speed<=2.5 & Acc> 2 Acc< -2 Other abs(acc)<=2 Note: Unit for speed is mph, unit for acceleration is mph/sec.

A program was written in MATLAB to determine the driving mode for second- by-second data and estimates the average value of emissions for each of the driving modes. At the same time, average modal emission rates are estimated for each mode based on different modal activity definitions in table 7-2. Figures 7-1 to 7-3 present a comparison of average modal emission rates for different pollutants (NOx, CO, and HC).

116

Figure 7-1 Average NOx Modal Emission Rates for Different Activity Definitions

Figure 7-2 Average CO Modal Emission Rates for Different Activity Definitions

117

Figure 7-3 Average HC Modal Emission Rates for Different Activity Definitions

These four different modal activity definitions show a kind of consistent pattern.

The average emissions during the acceleration mode are significantly higher than any other driving mode for all of the pollutants. The average emission rate during deceleration mode is the lowest of the four modes for NOx and CO emissions while the average emission rate during idle mode is the lowest of the four modes for HC emissions.

The average cruising emission rate is typically higher than the average idling and decelerating emission rate, except for CO emission in definition 3 and 4.

To assess whether the average modal emission rates are statistically significantly different from each other, two-sample tests were estimated for each pair. Lilliefors tests for goodness of fit to a normal distribution were first used for each mode based on different modal activity definitions. The results show that all of them reject the null hypothesis of normal distribution at 5% level. Kolmogorov-Simirnov two-sample test

118 was chosen to take place of t test because the assumption of normal distribution was questionable. The Kolmogorov-Smirnov two-sample test is a test of the null hypothesis that two independent samples have been drawn from the same population (or from populations with the same distribution). The test uses the maximal difference between cumulative frequency distributions of two samples as the test statistic. Results of the

Kolmogorov-Smirnov two-sample tests are presented in Table 7-3 in terms of p-values where “Acc” represents acceleration mode while “Dec” represents deceleration mode.

The cases where the p-value is less than 0.05 indicate that the distributions are different at the 5% level. All p-values for 72 possible pairwise comparisons are lower than 0.05, indicating that the distributions for these pairs are statistically different from each other.

Table 7-3 Results for Pairwise Comparison for Modal Average Estimates In Terms of P- value Idle-Acc Idle-Dec Idle-Cruise Acc-Dec Acc-Cruise Dec-Cruise

NOx 0 0 0 0 0 0 CO 0 0 0 0 0 0

Definiton1 HC 0 0 0 0 0 0 NOx 0 0 0 0 0 0 CO 0 0 0 0 0 0

Definiton2 HC 0 0 0 0 0 0 NOx 0 0 0 0 0 0 CO 0 0 0 0 0 0

Definiton3 HC 0 0 0 0 0 0 NOx 0 0 0 0 0 0 CO 0 0 0 0 0 0

Definiton4 HC 0 0 0 0 0 0

The modal emission analysis results suggest that all four mode definitions proposed in Table 7-2 appear reasonable. These modal definitions allow some explanation of differences in emissions based upon driving mode, as revealed by the fact

119 that, the modal emission distributions differ from each other. A further step is taken here to see which mode definition would be identified as the most appropriate definition by utilizing HTBR technique. For each definition, three dummy variables are added to represent idle, acceleration, and deceleration mode. The regression trees are developed between emission data and these three dummy variables for each definition are shown in

Figure 7-4 to 7-6. The sensitivity test results based on these regression trees for NOx,

CO, and HC are summarized in Table 7-4.

Figure 7-4 HTBR Regression Tree Result for NOx Emission Rate

120

Figure 7-5 HTBR Regression Tree Result for CO Emission Rate

Figure 7-6 HTBR Regression Tree Result for HC Emission Rate

121 Table 7-4 Sensitivity Test Results for Four Mode Definition

NOx Mode Number Deviance Mean ER Residual Mean Deviance 105976 1435.00 0.10680 Definition 1 0.006967 = 738.3 / 106000 Idle 29541 11.04 0.03235 Acceleration 25931 320.90 0.22480 Deceleration 22242 41.32 0.02671 Cruise 28262 365.10 0.13930 Definition 2 0.007658 = 811.5/106000 Idle 31064 16.05 0.03342 Acceleration 18894 206.50 0.23110 Deceleration 16644 21.14 0.02214 Cruise 39374 567.80 0.14070 Definition 3 0.00856 = 907.1 / 106000 Idle 32010 23.07 0.03470 Acceleration 13417 130.50 0.2297 Deceleration 12768 14.27 0.02065 Cruise 47781 739.30 0.14350 Definition 4 0.009397 = 995.8 / 106000 Idle 32717 30.240 0.03583 Acceleration 8719 77.150 0.22600 Deceleration 9452 9.191 0.02015 Cruise 55088 879.200 0.14490 CO 105765 771.300 0.032370 Definition 1 0.005795 = 612.9 / 105800 Idle 29287 2.166 0.005590 Acceleration 25866 559.400 0.099740 Deceleration 22456 3.903 0.006564 Cruise 28156 47.380 0.018910 Definition 2 0.005486283 = 580.2 / 105800 Idle 30764 4.185 0.005944 Acceleration 18864 484.900 0.122400 Deceleration 16919 2.410 0.005803 Cruise 39218 88.710 0.021250 Definition 3 0.005293 = 559.8 / 105800 Idle 31691 9.131 0.006610 Acceleration 13402 410.100 0.147600 Deceleration 13035 1.861 0.005454 Cruise 47637 138.700 0.024440 Definition 4 0.005239 = 554 / 105800 Idle 32375 15.5200 0.007365 Acceleration 8712 339.1000 0.179700 Deceleration 9681 0.7047 0.005049 Cruise 54997 198.7000 0.028560 HC 103405 0.40270 0.0014960 Definition 1 3.648e-006 = 0.3772 / 103400 Idle 28780 0.09337 0.0009217 Acceleration 25122 0.09143 0.0022310 Deceleration 22287 0.07644 0.0012180 Cruise 27216 0.11600 0.0016530

122 Table 7-4 Continued

Definition 2 3.629e-006 = 0.3752 / 103400 Idle 30250 0.09492 0.0009176 Acceleration 18330 0.06668 0.0023860 Deceleration 16805 0.05355 0.0011790 Cruise 38020 0.16010 0.0016680 Definition 3 3.636e-006 = 0.376 / 103400 Idle 31157 0.09651 0.0009258 Acceleration 12999 0.04355 0.0025110 Deceleration 12970 0.04256 0.0011600 Cruise 46279 0.19330 0.0016890 Definition 4 3.656e-006 = 0.378 / 103400 Idle 31849 0.09835 0.0009364 Acceleration 8443 0.02944 0.0026390 Deceleration 9613 0.03257 0.0011470 Cruise 53500 0.21760 0.0017120

7.3 Conclusions

Comparison of modal average estimates show that the average modal emission rates are statistically different from each other for three different pollutants. HTBR regression tree results demonstrate that all four definitions can work well to divide the database. Comparisons of residual mean deviance indicate that definition 1 has the smallest residual mean deviance for NOx, while definition 4 for CO and definition 2 for

HC, but differences were small. At this moment, it is difficult to choose one definition for three pollutants just based on sensitivity analysis results in this chapter. The analysis results in this section indicate that driving mode definition couldn’t be introduced from one research to another research directly. It is better to test several different critical values and get the most suitable one instead of testing only one definition developed from other research. For this research, more analysis will be done in the chapters that follow to develop the most suitable driving mode definitions.

123 CHAPTER 8

IDLE MODE DEVELOPMENT

In Chapter 7, the concept of driving modes was introduced and several sensitivity tests (comparison of modal average estimates, comparison of HTBR regression tree results, and comparison of residual mean deviance) were performed for four different mode definitions. Based on sensitivity analysis results, it is difficult to choose one definition for three pollutants at this moment. It means that more analysis will be done next to develop the most suitable driving mode definition. This chapter will focus on developing the suitable definition for idle mode.

Theoretically, idle mode usually is defined as zero speed and zero acceleration.

In real world data collection efforts, this definition must be refined due to the presence of speed measurement error. In this research, idle mode will be defined by speed and acceleration too. The critical value couldn’t be introduced directly from previous research. It is better to statistically test several critical values and identify the most suitable idle definition.

8.1 Critical Value for Speed in Idle Mode

Three critical values were tested to get the appropriate critical value for speed in defining idle activity. Figures 8-1 to 8-3 illustrate engine power vs. emission rates for three pollutants for three critical speed values: 1 mph, 2.5 mph, and 5 mph. Figure 8-4 compares engine power distribution for these three critical values. Because engine power distributions for three pollutants exhibit similar patterns, only NOx emissions are shown in Figure 8-4. Table 8-1 and 8-2 provide the engine power distribution for these three critical values in two ways: by number and percentage.

124

Figure 8-1 Engine Power vs. NOx Emission Rate for Three Critical Values

Figure 8-2 Engine Power vs. CO Emission Rate for Three Critical Values

125

Figure 8-3 Engine Power vs. HC Emission Rate for Three Critical Values

Figure 8-4 Engine Power Distribution for Three Critical Values based on NOx Emissions

126 Table 8-1 Engine Power Distribution for Three Critical Values for Three Pollutants

Engine Power (brake horsepower (bhp)) Speed Pollutant [0 20) [20 30) [30 40) [40 50) >=50 Total <=5 mph NOx 31631 2272 1323 152 2348 37726 CO 31258 2269 1316 149 2342 37334 HC 30737 2264 1321 147 2284 36753 <=2.5mph NOx 29222 2098 1196 83 1143 33742 CO 28880 2096 1189 81 1139 33385 HC 28373 2093 1194 80 1106 32846 <=1 mph NOx 27516 2011 1100 51 700 31378 CO 27217 2010 1093 51 699 31070 HC 26713 2007 1099 48 680 30547

Table 8-2 Percentage of Engine Power Distribution for Three Critical Values for Three Pollutants Engine Power (brake horsepower (bhp)) Speed Pollutant [0 20) [20 30) [30 40) [40 50) >=50 Total <=5 mph NOx 83.84% 6.02% 3.51% 0.40% 6.22% 100% CO 83.73% 6.08% 3.52% 0.40% 6.27% 100% HC 83.63% 6.16% 3.59% 0.40% 6.21% 100% <=2.5mph NOx 86.60% 6.22% 3.54% 0.25% 3.39% 100% CO 86.51% 6.28% 3.56% 0.24% 3.41% 100% HC 86.38% 6.37% 3.64% 0.24% 3.37% 100% <=1 mph NOx 87.69% 6.41% 3.51% 0.16% 2.23% 100% CO 87.60% 6.47% 3.52% 0.16% 2.25% 100% HC 87.45% 6.57% 3.60% 0.16% 2.23% 100%

Based on above analysis, critical value of 5 mph include more data points with higher engine power (>50 bhp) than 2.5 mph and 1 mph. But there is no large difference for engine power distribution between 2.5 mph and 1 mph. These two critical values for speed will be tested further with different acceleration values in next section to make final decision.

8.2 Critical Value for Acceleration in Idle Mode

After setting the critical value for speed, the next step is to determine a critical value for acceleration. In total, four options were tested.

127 • Option 1: speed <= 2.5 mph and absolute acceleration <=2 mph/s

• Option 2: speed <= 2.5 mph and absolute acceleration <=1 mph/s

• Option 3: speed <= 1 mph and absolute acceleration <=2 mph/s

• Option 4: speed <= 1 mph and absolute acceleration <=1 mph/s

Using the same method as outlined in the previous section, Figure 8-5 to 8-7 illustrates engine power vs. emission rates for three pollutants for four options above.

Figure 8-8 compares engine power distribution for data falling into these four options.

Because engine power distributions for three pollutants exhibit the similar pattern, only

NOx emissions are shown in Figure 8-8. Table 8-3 and 8-4 provide the engine power distribution for four options in two ways: by number and percentage.

Figure 8-5 Engine Power vs. NOx Emission Rate for Four Options

128

Figure 8-6 Engine Power vs. CO Emission Rate for Four Options

Figure 8-7 Engine Power vs. HC Emission Rate for Four Options

129

Figure 8-8 Engine Power Distribution for Four Options based on NOx Emission Rates

Table 8-3 Engine Power Distribution for Four Options for Three Pollutants

Engine Power (brake horsepower (bhp)) Pollutants [0 20) [20 30) [30 40) [40 50) >=50 Total Option 1 NOx 28694 2075 1177 78 693 32717 CO 28366 2073 1170 76 69032375 HC 27855 2070 1175 75 67431849 Option 2 NOx 27571 2030 1120 53 290 31064 CO 27284 2028 1114 51 28730764 HC 26771 2026 1119 51 28330250 Option 3 NOx 27367 1999 1091 50 527 31034 CO 27071 1998 1084 50 52630729 HC 26569 1995 1090 47 51230213 Option 4 NOx 26719 1969 1057 34 205 29984 CO 26446 1968 1051 34 20429703 HC 25944 1966 1056 32 19829196

130 Table 8-4 Percentage of Engine Power Distribution for Three Critical Values for Three Pollutants Engine Power (brake horsepower (bhp)) Pollutants [0 20) [20 30) [30 40) [40 50) >=50 Total Option 1 NOx 87.70% 6.34% 3.60% 0.24% 2.12% 100.00% CO 87.62% 6.40% 3.61% 0.23% 2.13%100.00% HC 87.46% 6.50% 3.69% 0.24% 2.12%100.00% Option 2 NOx 88.76% 6.53% 3.61% 0.17% 0.93% 100.00% CO 88.69% 6.59% 3.62% 0.17% 0.93%100.00% HC 88.50% 6.70% 3.70% 0.17% 0.94%100.00% Option 3 NOx 88.18% 6.44% 3.52% 0.16% 1.70% 100.00% CO 88.10% 6.50% 3.53% 0.16% 1.71%100.00% HC 87.94% 6.60% 3.61% 0.16% 1.69%100.00% Option 4 NOx 89.11% 6.57% 3.53% 0.11% 0.68% 100.00% CO 89.03% 6.63% 3.54% 0.11% 0.69%100.00% HC 88.86% 6.73% 3.62% 0.11% 0.68%100.00%

Based on above analysis, data falling into option 2 and option 4 contain fewer data points with higher engine powers (>50 bhp) than data falling into option 1 and option 3. But a large difference is not noted in engine power distribution for data falling into option 2 and option 4. Based upon these results, the idle mode is defined as speed

<=2.5 mph and absolute acceleration <= 1 mph/s.

8.3 Emission Rate Distribution by Bus in Idle Mode

After defining “speed <= 2.5 mph and absolute acceleration <=1 mph/s” as idle mode, emission rate histograms for each of the three pollutants for idle operations are presented in Figure 8-9. Figure 8-9 shows significant skewness for all three pollutants for idle mode. Inter-bus response variability for idle mode operations is illustrated in

Figures 8-10 to 8-12 using median and mean of NOx, CO, and HC emission rates. Table

8-5 presents the same information in tabular form. The difference between median and mean is also an indicator of skewness.

131

Figure 8-9 Histograms of Three Pollutants for Idle Mode

Figure 8-10 Median and Mean of NOx Emission Rates in Idle Mode by Bus

132

Figure 8-11 Median and Mean of CO Emission Rates in Idle Mode by Bus

Figure 8-12 Median and Mean of HC Emission Rates in Idle Mode by Bus

133 Table 8-5 Median, and Mean of Three Pollutants in Idle Mode by Bus

NOx CO HC Bus ID Median Mean Median Mean Median Mean Bus 360 0.071020 0.059444 0.004830 0.009145 0.00072 0.002441 Bus 361 0.020455 0.020216 0.005740 0.008895 0.00063 0.000865 Bus 363 0.022555 0.032140 0.000670 0.005408 0.00007 0.000385 Bus 364 0.025050 0.026480 0.003110 0.003601 0.00071 0.000927 Bus 372 0.055210 0.054766 0.013150 0.011739 0.00220 0.002272 Bus 375 0.028880 0.035050 0.005390 0.013385 0.00076 0.001311 Bus 377 0.023370 0.025393 0.000960 0.001572 0.00019 0.000219 Bus 379 0.033210 0.038500 0.006730 0.011425 0.00085 0.001531 Bus 380 0.026200 0.027371 0.000930 0.001218 0.00024 0.000298 Bus 381 0.027115 0.028768 0.001915 0.004044 0.00020 0.000228 Bus 382 0.027605 0.036734 0.002980 0.009836 0.00034 0.000624 Bus 383 0.027790 0.027520 0.002290 0.002736 0.00065 0.000950 Bus 384 0.024210 0.026982 0.001205 0.003428 0.00043 0.000498 Bus 385 0.023750 0.024339 0.002590 0.005782 0.00043 0.000453 Bus 386 0.032140 0.030031 0.004860 0.006155 0.00055 0.000579

Figure 8-10 to 8-12 and Table 8-5 illustrate that bus 372 has the largest median and the second largest mean for CO and HC emissions, and the second largest median and the second largest mean for NOx emissions. The activity of bus 372 in terms of distribution of engine power by bus was compared to that of other buses in an effort to identify why the emission rates were significantly higher than for other buses. Table 8-6 and Figure 8-13 show that bus 372 has higher min (2nd), 1st Quartile (2nd), median (1st), and 3rd Quartile (2nd) of engine power compared to the other 14 buses. Engine power in idle mode may include cooling fan, air compressor, air conditioner, and alternator loads

(Clark et al. 2005). Considering test buses and engines are similar in many ways, this difference might be caused by variability across the engines, or may be associated with unrecorded air conditioner use. In analyzing the database, the modeler couldn’t identify a contribution of air conditioner to engine power in idle mode. So, model development will include these data but readers should be cautioned that the noted variability is an indication that significant numbers of vehicles may need to be tested in the future if such

134 inter-engine differences are significant in the fleet. In addition, the role of air conditioning usage on engine load in transit buses warrants additional future research.

Table 8-6 Engine Power Distribution in Idle Mode by Bus

1st 3rd Bus ID Min Quartile Median Quartile Max Bus 360 3.92 15.36 18.7 19.83 135.43 Bus 361 0 5.35 12.52 13.83 89.47 Bus 363 0 13.1 13.34 15.16 152.94 Bus 364 0 13.18 13.85 14.99 154.51 Bus 372 0 26.44 31.84 33.10 79.08 Bus 375 0 12.52 13.81 18.08 167.72 Bus 377 0 8.5 9.17 9.85 166.86 Bus 379 0 15.86 17.15 19.42 126.64 Bus 380 2.67 7.85 8.49 9.17 100.99 Bus 381 0 8.7 10.49 11.17 148.28 Bus 382 0 7.35 8.52 13.89 99.04 Bus 383 0 7.16 10.03 12.5 91.86 Bus 384 0 6.01 7.34 8.51 117.39 Bus 385 0 4.53 7.19 8.51 139.05 Bus 386 4.68 9.18 13.33 14.46 105.44

135

Figure 8-13 Histograms of Engine Power in Idle Mode by Bus

136 8.4 Discussions

8.4.1 High HC Emissions

Figure 8-7 shows that there are some high HC emissions in idle mode. Based on definitions of “speed <= 2.5 mph and absolute acceleration <=1 mph/s”, there are

388/30250=1.28% of data points in idle mode for HC are high emissions. This was only noted in the HC emissions data, not in NOx and CO. All high HC emissions have been coded as high-idle and see if they are related to any other parameters. Tree analysis could be used for this screening analysis. After screening engine speed, engine power, engine oil temperature, engine oil pressure, engine coolant temperature, ECM pressure, and other parameters, no specific operating parameters related to these high-idle emissions were identified.

On the other hand, regression tree analysis results by bus and trip are presented in

Figure 8-14 where the left figure shows that these high HC emissions happened in bus

360 and 372 while the right figure shows that these high HC emissions happened in bus

360 trip 4 and bus 372 trip 1. Even for HC emissions, Figure 8-14 shows that these high emissions are not a common situation in idle mode. There are 1529 idle segments in total for 15 buses, but most of these high HC emissions came just from three idle segments.

These three idle segments are: bus 360 trip 4 idle segment 1 (130 seconds), bus 360 trip 4 idle segment 38 (516 seconds), bus 372 trip 1 idle segment 1 (500 seconds). More specifically, bus 360 trip 4 idle segment 1 contains 102 high HC emissions, bus 360 trip 4 idle segment 38 contains 264 high HC emissions, while bus 372 trip 1 idle slots contains

13 high HC emissions. Figure 8-15 to 8-17 illustrates time series plots for HC for these three idle segments while vehicle speed, engine speed, engine power, engine oil temperature, engine oil pressure, engine coolant temperature, ECM pressure are presented too. These figures don’t include NOx and CO because NOx and CO don’t show such pattern for these three idle segments as HC. These three idle segments contain

137 379 high HC emissions in total. It means about 98% of high emissions came just from three idle segments only. It is difficult to exclude these three idle segments based on all information on hand. The modeler prefers to keep them because these outliers might reflect variability in real world. However, future data collection efforts should seek to identify the causes of such events.

Figure 8-14 Tree Analysis Results for High HC Emission Rates by Bus and Trip

Figure 8-15 Time Series Plot for Bus 360 Trip 4 Idle Segment 1 (130 Seconds)

138

Figure 8-16 Time Series Plot for Bus 360 Trip 4 Idle Segment 38 (516 Seconds)

Figure 8-17 Time Series Plot for Bus 372 Trip 1 Idle Segment 1 (500 Seconds)

8.4.2 High Engine Operating Parameters

Figure 8-15 shows that engine speed once jumped to about 2000 rpm during bus

360 trip 4 idle segment 1, while corresponding engine power and engine oil pressure jumped too. This jump only lasted 9 seconds. There are several potential reasons which might be responsible for this jump. The first one might be that bus 360 did move slowly from one location to another location while GPS data failed to catch that movement.

Other explanations might be an engine computer problem, or sensor problem. This kind of jump, higher engine speed (about 2000 rpm) accompanied by higher engine power and engine oil pressure in idle mode did occur in the real world. The jump in Figure 8-16

139 doesn’t belong to such situation because engine speed is only about 1000 rpm during that jump. After screening the whole dataset, another example of jump is shown in Figure 8-

18. The jump in bus 383 trip 1 idle segment 12 lasts 28 seconds. Since there are only two observations for such jumps in the whole database, there are not enough data to assess whether they belong to a new mode or not. But this might be evidence to pay attention to the slow movement during idle segment. Even these two idle segments show some unusual activities, the modeler will keep them to avoid any bias for the result.

Figure 8-18 Time Series Plot for Bus 383 Trip 1 Idle Segment 12 (1258 Seconds)

8.5 Idle Emission Rates Estimation

Based on definition of “speed <= 2.5 mph & absolute acceleration <= 1 mph/s”, there are about 30% of data classified as idle mode. Usually, modelers estimate the idle emission rate by averaging all emission rates in idle mode. Although there are some data points with higher engine power (> 50 bhp) in idle mode, about 90% of data in idle mode exhibit engine power between 0 and 20 bhp. After detailed analysis of all idle segments using time series plots, although some data may be incorrectly classified into the idle mode, no anomalies were noted. To avoid introducing any significant bias, a single idle emission rate is developed for each pollutant. When we treat all data as a whole and put them in the pool, the mean and confidence interval can reflect the distribution of emission

140 rates in real world. Table 8-7 provides idle mode statistical analysis results for NOx, CO, and HC.

Table 8-7 Idle Mode Statistical Analysis Results for NOx, CO, and HC

NOx CO HC min 0.00121 0.00002 0.00001 1st Qu 0.02201 0.00120 0.00026 mean 0.03342 0.00594 0.00092 median 0.02670 0.00293 0.00051 3rd Qu 0.03549 0.00554 0.00079 max 0.40259 0.48118 0.05232 skewness 4.45050 13.1840 11.6100 Total N 31064 30764 30250

Due to the non-normality of emission rates, the median value (the value that divides observations into an upper and lower half) and the inter-quartile range (the range of values that include the middle 50% of the observations) are the most appropriate for describing the distribution. The mean and skewness for original data is presented in

Table 8-8 as well. Although transformation for three pollutants were already discussed based on the whole dataset in Chapter 6, lambdas chosen by Box-Cox procedure for the whole dataset and idle mode are different. Lambdas chosen by Box-Cox procedure for the whole dataset are 0.22875 for NOx, -0.0648 for CO, 0.14631 for HC, while lambdas for idle mode are -0.19619 for NOx, -0.0625 for CO, 0.002875 for HC. At the same time, using transformation to estimate the mean and construct confidence intervals will bring some other problems. So the modeler considers bootstrap, another class of general methods, to get the estimation and construct confidence intervals.

The bootstrap is a procedure that involves choosing random samples with replacement from a dataset and analyzing each sample the same way (Li 2004). To obtain the 95% confidence interval, the simple method is by taking 2.5% and 97.5% percentile of the B replications T1, T2, .., TB as the lower and upper bound respectively.

141 The bootstrap function in this study will resample the emission data 1000 times and computes the mean, 2.5% and 97.5% percentile on each sample. Results are presented in

Figure 8-20 and Table 8-8.

Figure 8-19 Graphical Illustration of Bootstrap (Adopted from Li (Li 2004))

Figure 8-20 Bootstrap Results for Idle Emission Rate Estimation

142 Table 8-8 Idle Emission Rates Estimation and 95% Confidence Intervals Based on Bootstrap Average 2.5% Percentile 97.5% Percentile NOx Estimation 0.033415 0.010754 0.083266 Confidence 0.033162 0.010509 0.082279 Interval 0.033669 0.010998 0.084252 CO Estimation 0.0059439 0.00036116 0.028429 Confidence 0.0058184 0.00034446 0.028083 Interval 0.0060693 0.00037775 0.028775 HC Estimation 0.00091777 0.000059167 0.0037260 Confidence 0.00089742 0.000047572 0.0036412 Interval 0.00093811 0.000070763 0.0038108

Based on table 8-9, the modeler recommends idle emission rates for NOx as

0.033415 g/s with 95% confidence interval [0.010754 0.083266], CO as 0.0059439 g/s with 95% confidence interval [0.00036116 0.028429], HC as 0.00091777 g/s with 95% confidence interval [0.000059167 0.0037260].

8.6 Conclusions and Further Considerations

• In this research, idle mode is defined as “speed <= 2.5 mph and absolute acceleration <=1 mph/s”. But the critical value couldn’t be introduced from other research to this research directly. It is better to test several critical values and get the most suitable one instead of testing only one developed from other research.

• Inter-bus variability analysis results indicate that bus 372 has the largest mean for NOx, CO, and HC emissions. Meanwhile, bus 372 has higher min (2nd), 1st Qu

(2nd), median (1st), and 3rd Qu (2nd) of engine power comparing with other 14 buses.

Considering test buses and engines are similar in most ways, this difference might caused by variability of engines, or air conditioner usage. However, the contribution of air conditioner to engine power in idle mode could not be identified in the database. Future research about the role of air conditioner on engine power and emission rates in idle mode might interpret this difference.

143 • Although some trips or some buses have higher mean and standard deviance than others, this kind of variability will decrease when all data in idle mode are treated as a whole. On the other hand, some elevated emissions events may simply reflect real world variability. Without additional evidence, modelers should treat all data as a whole instead of removing outliers and potentially biasing results.

• There are two observations of an emissions jump that appears to be unrelated to engine speed, engine power, and engine oil temperature, in single idle segment. The modeling first assumed that the bus did move slowly from one location to another location while GPS/ECM failed to catch that movement. Other explanations might be engine computer problem, or sensor problem, and so on. These two jumps might be evidence to support further research on slow movement during idle segment.

• In summary, the modeler recommends idle emission rates for NOx as

0.033415 g/s with 95% confidence interval [0.010754 0.083266], CO as 0.0059439 g/s with 95% confidence interval [0.00036116 0.028429], HC as 0.00091777 g/s with 95% confidence interval [0.000059167 0.0037260].

144 CHAPTER 9

DECELERATION MODE DEVELOPMENT

Chapter 7 introduced the concept of driving mode into the study and several sensitivity tests were performed for four different definitions, including comparison of modal average emission rate estimates, HTBR regression tree results, and residual mean deviance. After developing the idle mode definition and emission rate in Chapter 8, the next task is dividing the rest of the vehicle activity data into driving mode (deceleration, acceleration and cruise) for further analysis. The deceleration mode is examined first.

9.1 Critical Value for Deceleration Rates in Deceleration Mode

The first task related to analysis of emission rates in the deceleration mode is identifying a critical values for deceleration.. The literature indicates that critical values of -1 mph/s and -2 mph/s should be examined. Because the critical value of “acceleration

< -1 mph/s” also includes all data that conform with a critical value of “acceleration <-2 mph/s”, comparison of data that fall between these two potential cut points is first performed. In summary, these three deceleration bins for analysis include:

• Option 1: acceleration < -2 mph/s

• Option 2: acceleration >= -2 mph/s & acceleration < -1 mph/s

• Option 3: acceleration >= -1 mph/s & acceleration < 0 mph/s

If the critical value is set as -1 mph/s for deceleration mode, data falling into option 1 and option 2 will be classified as deceleration mode while data falling into option 3 will be classified as cruise mode. If the critical value is set as -2 mph/s for deceleration mode, data falling into option 1 will be classified as deceleration mode while data falling into option 2 and option 3 will be classified as cruise mode.

Figure 9-1 illustrates engine power distribution for these three options. Figure 9-2 to 9-4 compare engine power vs. emission rate for three pollutants for three options.

145 Table 9-1 and 9-2 provide the distribution for these three options in two ways: by number and percentage.

Table 9-1 Engine Power Distribution for Three Options for Three Pollutants

Engine Power (brake horsepower (bhp)) Deceleration Pollutants (0 20) (20 30) (30 40) (40 50) >=50 Total Option 1 NOx 9322 94 16 5 15 9452 CO 9558 89 15 4 15 9681 HC 9483 94 16 5 15 9613 Option 2 NOx 6748 127 101 42 174 7192 CO 6800 126 99 42 171 7238 HC 6754 125 99 42 172 7192 Option 3 NOx 6806 950 1062 562 4353 13733 CO 6782 949 1061 558 4326 13676 HC 6705 921 1044 541 4212 13423

Table 9-2 Percentage of Engine Power Distribution for Three Options for Three

Pollutants

Engine Power (brake horsepower (bhp)) Deceleration Pollutants (0 20) (20 30) (30 40) (40 50) >=50 Total Option 1 NOx 98.6% 1.0% 0.2% 0.1% 0.2% 100.0% CO 98.7% 0.9% 0.2% 0.0% 0.2% 100.0% HC 98.6% 1.0% 0.2% 0.1% 0.2% 100.0% Option 2 NOx 93.8% 1.8% 1.4% 0.6% 2.4% 100.0% CO 93.9% 1.7% 1.4% 0.6% 2.4% 100.0% HC 93.9% 1.7% 1.4% 0.6% 2.4% 100.0% Option 3 NOx 49.6% 6.9% 7.7% 4.1% 31.7% 100.0% CO 49.6% 6.9% 7.8% 4.1% 31.6% 100.0% HC 50.0% 6.9% 7.8% 4.0% 31.4% 100.0%

146

Figure 9-1 Engine Power Distribution for Three Options

Figure 9-2 Engine Power vs. NOx Emission Rate for Three Options

147

Figure 9-3 Engine Power vs. CO Emission Rate for Three Options

Figure 9-4 Engine Power vs. HC Emission Rate for Three Options

148 There is little difference in the engine power distributions noted for data falling into option 1 and option 2 while the power distribution for option 3 is obviously different from option 1 and option 2 in the above figures and tables. Table 9-1 and 9-2 show that the engine power is more concentrated in the lower engine power regime (<20 bhp) for data in deceleration mode. Table 9-1 and 9-2 better reflect the power demand of the vehicle in real world in deceleration mode. Hence, the critical value is set to -1 mph/s for deceleration mode.

9.2 Analysis of Deceleration Mode Data

9.2.1 Emission Rate Distribution by Bus in Deceleration Mode

After defining vehicle activity data with “acceleration <-1 mph/s” as deceleration mode, emission rate histograms for each of the three pollutants for deceleration operations are presented in Figure 9-5. Figure 9-5 shows significant skewness for all three pollutants for deceleration mode. Inter-bus emission rate variability is illustrated by plotting median and mean NOx, CO, and HC emission rates in deceleration mode for each bus in Figures 9-6 to 9-8 and Table 9-3. The difference between median and mean is also an indicator of skewness.

149

Figure 9-5 Histograms of Three Pollutants for Deceleration Mode

Figure 9-6 Median and Mean of NOx Emission Rates in Deceleration Mode by Bus

150

Figure 9-7 Median and Mean of CO Emission Rates in Deceleration Mode by Bus

Figure 9-8 Median and Mean of HC Emission Rates in Deceleration Mode by Bus

151 Table 9-3 Median, and Mean for NOx, CO, and HC in Deceleration Mode by Bus

NOx CO HC Bus ID Median Mean Median Mean Median Mean Bus 360 0.00325 0.01998 0.00502 0.00814 0.00040 0.00097 Bus 361 0.00624 0.02206 0.00384 0.00535 0.00079 0.00095 Bus 363 0.00483 0.01952 0.00446 0.00486 0.00004 0.00008 Bus 364 0.00324 0.01255 0.00474 0.00586 0.00551 0.00613 Bus 372 0.00437 0.01924 0.00578 0.00803 0.00161 0.00229 Bus 375 0.00499 0.01997 0.00410 0.00567 0.00066 0.00085 Bus 377 0.00414 0.01940 0.00317 0.00630 0.00034 0.00040 Bus 379 0.02664 0.03457 0.00397 0.00522 0.00078 0.00103 Bus 380 0.00525 0.01914 0.00359 0.00716 0.00060 0.00072 Bus 381 0.01666 0.02420 0.00369 0.00452 0.00034 0.00038 Bus 382 0.01214 0.03541 0.00450 0.00564 0.00073 0.00083 Bus 383 0.00741 0.02385 0.00322 0.00452 0.00128 0.00172 Bus 384 0.00828 0.02869 0.00259 0.00411 0.00113 0.00127 Bus 385 0.02066 0.02118 0.00377 0.00585 0.00088 0.00086 Bus 386 0.00341 0.01786 0.00406 0.00583 0.00091 0.00120

Figure 9-6 to 9-8 and Table 9-3 illustrate that Bus 379 has the largest median and the second largest mean for NOx emissions, Bus 372 has the largest median and the second largest mean for CO emissions, while Bus 364 has the largest median and mean for HC emissions. At the same time, Bus 382 has the largest mean for NOx emissions, and Bus 360 has the largest mean for CO emissions. Above figures and table demonstrate that although variability exist among buses, it is difficult to conclude that which, if any, bus is a high emitter (i.e., a bus that exhibits extremely high emission rates under all operating conditions, which also may exhibit significantly different emissions responses to operating activity than normal emitters).

The modeler notices that there are also a small number of some very high HC emissions events noted in deceleration mode. Based on definitions of “acceleration < -1 mph/s”, there are 242/16237=1.49 % of data points in deceleration mode for HC are high emissions. Especially, this only happened for HC, not for NOx and CO. All high HC emissions have been coded and see if they are related to any other parameters. Tree analysis could be used for this screening analysis. After screening engine speed, engine

152 power, engine oil temperature, engine oil pressure, engine coolant temperature, ECM pressure, and other parameters, no operating parameters appeared to be correlated to these high emissions events.

On the other hand, high HC emissions distribution by bus and trip are presented in

Table 9-4. Unlike idle mode which high HC emissions mainly happened in three idle segments (Bus 360, trip 4, idle segment 1; Bus 360, trip 4, idle segment 38; and Bus 372, trip 1, idle segment 1), high HC emissions are dispersed among 7 different buses and 18 different trips. Although there is not enough evidence to suggest that which bus is a

“high emitter”, Bus 364 is worthy of additional attention. There are 5284 data points for

Bus 364 and, among them, 887 data points classified as deceleration mode. There are

408 high HC emissions data points for Bus 364 in deceleration mode. The percentage of high HC emission for Bus 364 is 7.72% (408/5284), while the percentage of high HC emissions for Bus 364 in deceleration mode is about 21% (193/887). Given the limited available data, no conclusion could be drawn about high HC emissions in deceleration mode. These potential outliers may simply reflect real-world emissions variability for these engines.

Emission rate behavior as a function of operating mode and power for high- emitting vehicles may differ significantly from normal-emitting vehicles. Since there is no high-emitting vehicle identified in AATA data set, it is impossible to the modeler to examine such difference. To ensure that models are applicable to normal and high- emitters in the fleet, models have to have both available in the analytical data set. So it is important to identify high-emitting vehicles and bringing them in for testing.

153 Table 9-4 High HC Emissions Distribution by Bus and Trip for Deceleration Mode

Bus ID Number of Trip Number of High HC High HC Events Events Bus 360 11 Bus 360, trip 3 3 Bus 360, trip 4 8 Bus 361 1 Bus 361, trip 5 1 Bus 364 193 Bus 364, trip 1 46 Bus 364, trip 2 61 Bus 364, trip 3 86 Bus 372 19 Bus 372, trip 1 6 Bus 372, trip 2 4 Bus 372, trip 3 3 Bus 372, trip 4 6 Bus 383 11 Bus 383, trip 1 3 Bus 383, trip 2 3 Bus 383, trip 3 2 Bus 383, trip 4 3 Bus 384 1 Bus 384, trip 3 1 Bus 386 6 Bus 386, trip 1 1 Bus 386, trip 2 2 Bus 386, trip 4 3

9.2.2 Engine Power Distribution by Bus in Deceleration Mode

Engine power distribution by bus is shown in Figure 9-9 and Table 9-5. When the bus is decelerating, the engine typically absorbs energy, yielding low engine power, or even negative engine power. Table 9-5 reflects this characteristic of deceleration mode.

According to Sensor’s report (2001), negative engine power is recorded as zero power in the data, which explains the large number of zero power values in the deceleration mode.

The emission rates under negative engine power conditions may signficiantly different from those under positive engine powers. Further analysis will examine this question.

On the other hand, Bus 372 has the largest 3rd Quartile engine power in deceleration mode. This is consistent with the finding in idle mode.

154 Table 9-5 Engine Power Distributions in Deceleration Mode by Bus

1st 3rd Bus No Min Quartile Median Quartile Max Bus 360 0 0 0 3.88 275.40 Bus 361 0 0 0 5.16 173.10 Bus 363 0 0 0 6.70 274.90 Bus 364 0 0 0 0 254.30 Bus 372 0 0 0 20.41 112.00 Bus 375 0 0 0 5.84 274.90 Bus 377 0 0 0 3.33 275.10 Bus 379 0 0 0 11.77 164.90 Bus 380 0 0 0 5.19 29.40 Bus 381 0 0 0 7.19 121.15 Bus 382 0 0 0 5.84 20.75 Bus 383 0 0 0 8.51 94.65 Bus 384 0 0 0 5.86 162.37 Bus 385 0 0 0 6.00 102.59 Bus 386 0 0 0 7.18 42.20

155

Figure 9-9 Histograms of Engine Power in Deceleration Mode by Bus

156 Based on definitions of “acceleration < -1 mph/s”, there are about 1% of data points with high engine power (>=50 bhp) in deceleration mode (Table 9-1). Figure 9-10 illustrates plot of engine power vs. vehicle speed, engine power vs. engine speed, and vehicle speed vs. engine speed. Figure 9-10 shows that higher engine power always happened with higher vehicle speed & higher engine speed. These data points with higher engine power likely reflect the variability of the real world and are all retained in the data set and mode definition to avoid potentially biasing results.

Figure 9-10 Engine Power vs. Vehicle Speed, Engine Power vs. and Engine Speed, and Vehicle Speed vs. Engine Speed

9.3 The Deceleration Motoring Mode

Bus engines absorb energy during the deceleration mode, resulting in low or negative engine power. According to Sensor’s report (2001), such negative power was recorded as zero power. The emissions under these negative engine power conditions

157 may be significantly different from those under positive engine power conditions, and therefore may need to be included in the modeling regime as a separate mode of operation. To examine this possibility, deceleration mode data were split into two mode bins for analysis. The first bin includes all data points with zero engine power in deceleration mode, termed ‘deceleration motoring mode.’ The remaining data in the deceleration mode, which exhibit positive engine power, are classified as deceleration non-motoring mode. The analysis will begin as comparing histograms of three pollutants between deceleration motoring mode and deceleration non-motoring mode (Figure 9-11).

Table 9-6 compares the mean, median, and skewness of emission distributions between these two potential modes for the three pollutants. The statistical results for all deceleration data are also presented as a reference. Figure 9-11 and Table 9-6 show that lower emission rates are more prevalent in the deceleration motoring mode than in the deceleration non-motoring mode. Skewness of emission distributions for deceleration motoring mode is also smaller.

(a) (b) Figure 9-11 Histograms for Three Pollutants in Deceleration Motoring Mode (a) and Deceleration Non-Motoring Mode (b)

To test the difference between deceleration motoring mode and deceleration non- motoring mode, Kolmogorov-Simirnov two-sample test was chosen rather than a standard t-test, because the normal distribution assumption was questionable. The

158 Kolmogorov-Smirnov two-sample test is a test of the null hypothesis that two independent samples have been drawn from the same population (or from populations with the same distribution). The test uses the maximal difference between cumulative frequency distributions of two samples as the test statistic. Results of the Kolmogorov-

Smirnov two-sample tests demonstrate that the differences in emission rates under deceleration motoring mode and deceleration non-motoring mode are statistically significant.

159 Table 9-6 Comparison of Emission Distributions between Deceleration Mode and Two

Sub-Modes (Deceleration Motoring Mode and Deceleration Non-Motoring Mode)

NOx CO HC Deceleration Mode Number 16644 16919 16805 Min 0.00001 0.00001 0.00001 1st Quartile 0.00182 0.00249 0.00039 Median 0.00611 0.00398 0.00068 3rd Quartile 0.03155 0.00605 0.00120

Max 1.30640 0.85208 0.04200

Mean 0.02215 0.00580 0.00118

Skewness 6.02890 30.6459 5.76530

Sub-mode 1:Deceleration Motoring Mode

Number 10925 11304 11240 Min 0.00001 0.00001 0.00001 1st Quartile 0.00124 0.00269 0.00041 Median 0.00272 0.00401 0.00067 3rd Quartile 0.00816 0.00567 0.00110

Max 0.14930 0.20366 0.01425 Mean 0.00978 0.00528 0.00111

Skewness 3.08780 12.27120 3.92760 Sub-mode 2: Deceleration Non-Motoring Mode

Number 5719 5615 5565 Min 0.00002 0.00003 0.00001 1st Quartile 0.01973 0.00204 0.00034 Median 0.03431 0.00384 0.00069 3rd Quartile 0.05658 0.00741 0.00150 Max 1.30640 0.85208 0.04200 Mean 0.04576 0.00685 0.00131 Skewness 5.7018 26.8539 6.8026

9.4 Deceleration Emission Rates Estimations

Using the “acceleration < -1 mph/s” cutpoint, about 16% of total data collected are classified in the deceleration mode. While deceleration emission rates could simply be estimated directly by averaging all deceleration mode emission rates, the emission rate

160 distribution is non-normal. Because lambdas identified by the Box-Cox procedure for the whole dataset and deceleration mode subsets are different, and because using a transformation to estimate the mean and construct confidence intervals will create other problems, the bootstrap (another class of general methods) was used for estimation of the mean and for construction of confidence intervals. The bootstrap function in this study resampled the emission rate data 1000 times and computed the mean, 2.5%, and 97.5% percentile of each sample.

The results of the bootstrap analyses indicate that splitting the deceleration mode into deceleration motoring mode and deceleration non-motoring mode using the zero engine power criteria is warranted. The bootstrap distributions of mean emission rates for deceleration mode, deceleration motoring mode, and deceleration non-motoring mode are presented in Figure 9-12 to 9-14 and Table 9-7. To illustrate the difference in emission rates estimation between deceleration motoring mode and deceleration non- motoring mode, Figure 9-15 presents bootstrap means and confidence intervals for the emission rates of all three pollutants. For reference purpose, deceleration mode emission rates estimation are also presented. Table 9-7 and Figure 9-15 show that the average emission rate for the deceleration motoring mode is much lower than that for deceleration non-motoring mode for all pollutants, and especially for NOx.

161

Figure 9-12 Bootstrap Results for NOx Emission Rate Estimation in Deceleration Mode

Figure 9-13 Bootstrap Results for CO Emission Rate Estimation in Deceleration Mode

162

Figure 9-14 Bootstrap Results for HC Emission Rate Estimation in Deceleration Mode

Figure 9-15 Emission Rate Estimation Based on Bootstrap for Deceleration Mode

163 Table 9-7 Emission Rates Estimation and 95% Confidence Intervals Based on Bootstrap for Deceleration Mode

Average 2.5% Percentile 97.5% Percentile Deceleration Mode NOx Estimation 0.02215 0.00024 0.10919 Confidence 0.02161 0.00022 0.10427 Interval 0.02268 0.00027 0.11411 CO Estimation 0.00580 0.00055 0.02191 Confidence 0.00562 0.00051 0.02067 Interval 0.00598 0.00059 0.02314 HC Estimation 0.00118 0.00004 0.00652 Confidence 0.00115 0.00004 0.00626 Interval 0.00121 0.00004 0.00679 Deceleration Motoring Mode NOx Estimation 0.00978 0.00017 0.06540 Confidence 0.00945 0.00015 0.06306 Interval 0.01010 0.00019 0.06774 CO Estimation 0.00529 0.00072 0.01743 Confidence 0.00514 0.00068 0.01635 Interval 0.00543 0.00075 0.01850 HC Estimation 0.00111 0.00004 0.00652 Confidence 0.00109 0.00004 0.00621 Interval 0.00114 0.00004 0.00683 Deceleration Non-Motoring Mode NOx Estimation 0.04578 0.00173 0.17187 Confidence 0.04457 0.00152 0.16343 Interval 0.04698 0.00195 0.18031 CO Estimation 0.00686 0.00037 0.02846 Confidence 0.00643 0.00033 0.02587 Interval 0.00728 0.00040 0.03104 HC Estimation 0.00131 0.00004 0.00650 Confidence 0.00125 0.00003 0.00594 Interval 0.00137 0.00005 0.00706

Based on table 9-7, the deceleration emission rate for NOx is set as 0.02215 g/s with 95% confidence interval [0.00024 to 0.10919], CO as 0.00580 g/s with 95%

164 confidence interval [0.00055 to 0.02191], HC as 0.00118 g/s with 95% confidence interval [0.00004 to 0.00652]. The deceleration motoring emission rate for NOx is set as

0.00978 g/s with 95% confidence interval [0.00017 to 0.06540], CO as 0.00529 g/s with

95% confidence interval [0.00072 to 0.01743], HC as 0.00111 g/s with 95% confidence interval [0.00004 to 0.00652]. The deceleration non-motoring mode emission rate for

NOx is set as 0.04578 g/s with 95% confidence interval [0.00173 to 0.17187], CO as

0.00686 g/s with 95% confidence interval [0.00037 to 0.02846], HC as 0.00131 g/s with

95% confidence interval [0.00004 to 0.00650].

9.5 Conclusions and Further Considerations

• In this research, deceleration mode is defined as “acceleration <-1 mph/s”.

But the emissions under negative engine powers are different from those under positive engine power. Hence, the deceleration mode is split into deceleration motoring mode and deceleration non-motoring mode based on engine power.

• Inter-bus variability analysis indicates that Bus 372 has the largest 3rd

Quartile for engine power among 15 buses in deceleration mode. This is consistent with the finding in idle mode. At the same time, inter-bus variability analysis results show that Bus 379 has the largest median and the second largest mean for NOx emissions, Bus

372 has the largest median and the second largest mean for CO emissions, while Bus 364 has the largest median and mean for HC emissions. But it is difficult to conclude that these buses should be classified as high emitters or that there are any special modes that should be modeled separately as high-emitting modes.

• Some high HC emissions events are noted in deceleration mode. After screening engine speed, engine power, engine oil temperature, engine oil pressure, engine coolant temperature, ECM pressure, and other parameters, these operating parameters could not be linked to these high emissions occurrences. Additional causal variables may be in play that are not included in the data available for analysis.

165 • Based on definitions of “acceleration < -1 mph/s”, about 1% of data points exhibit somewhat unusually high engine power (>=50 bhp) in deceleration mode.

Analysis shows that higher engine power always happened with higher vehicle speed and higher engine speed. These higher-power data points likely reflect the variability in real world power demand (perhaps associated with operations on grade, which could not be identified in the database). All of these data were retained in the model to avoid potentially biasing the results.

• In summary, the deceleration non-motoring mode emission rate for NOx is set as 0.04578 g/s, CO as 0.00686 g/s, and HC as 0.00131 g/s. The deceleration motoring emission rates for NOx is set as 0.00978 g/s, CO as 0.00529 g/s, and HC as 0.00111 g/s.

Emission rate estimation for deceleration motoring mode is significantly lower in deceleration non-motoring mode for all three pollutants, especially for NOx.

166 CHAPTER 10

ACCELERATION MODE DEVELOPMENT

After developing the idle mode definition and emission rate in Chapter 8 and deceleration mode definitions and emission rates in Chapter 9, the next task is to divide the rest of the data into acceleration and cruise mode. This chapter examines the definition of acceleration activity and emission rates for acceleration activity.

10.1 Critical Value for Acceleration in Acceleration Mode

The first task related to analysis of emission rates in the acceleration mode is identifying a critical value for acceleration. Two values were tested: 1 mph/s and 2mph/s.

Since the critical value of “acceleration > 1 mph/s” will include all data under the critical value of “acceleration > 2 mph/s”, comparison of data falling between these two potential cut points is conducted first. Once selected, the chosen critical value will be used divide the data into acceleration mode and cruise mode. So “acceleration > 0 mph/s & acceleration <= 1 mph/s” will be another option. Similar with analysis for deceleration mode, these three options will be:

• Option 1: acceleration > 2 mph/s

• Option 2: acceleration > 1 mph/s & acceleration <= 2 mph/s

• Option 3: acceleration > 0 mph/s & acceleration <= 1 mph/s

Figure 10-1 illustrates engine power distribution for these three options. Figure 10-2 to

10-4 compares engine power vs. emission rate for three pollutants for three options.

Table 10-1 and 10-2 provide the distribution for these three options in two ways: by number and percentage.

167

Figure 10-1 Engine Power Distribution for Three Options

Figure 10-2 Engine Power vs. NOx Emission Rate (g/s) for Three Options

168

Figure 10-3 Engine Power vs. CO Emission Rate (g/s) for Three Options

Figure 10-4 Engine Power vs. HC Emission Rate (g/s) for Three Options

169 Table 10-1 Engine Power Distribution for Three Options for Three Pollutants

Engine Power (brake horsepower (bhp)) Acceleration Pollutants (0 50) (50 100) (100 150) (150 200) >=200 Total Option 1 NOx 322 446 852 1229 5870 8719 CO 319 444 851 1228 5870 8712 HC 318 440 833 1203 5649 8443 Option 2 NOx 613 865 1358 1324 6015 10175 CO 606 858 1355 1321 6012 10152 HC 605 843 1328 1287 5824 9887 Option 3 NOx 3208 4130 4378 2490 3205 17411 CO 3190 4105 4362 2487 3185 17329 HC 3104 3972 4195 2408 3131 16810

Table 10-2 Percentage of Engine Power Distribution for Three Options for Three

Pollutants

Engine Power (brake horsepower (bhp)) Acceleration Pollutants (0 50) (50 100) (100 150) (150 200) >=200 Total Option 1 NOx 3.7% 5.1% 9.8% 14.1% 67.3% 100.0% CO 3.7% 5.1% 9.8% 14.1% 67.4% 100.0% HC 3.8% 5.2% 9.9% 14.2% 66.9% 100.0% Option 2 NOx 6.0% 8.5% 13.3% 13.0% 59.1% 100.0% CO 6.0% 8.5% 13.3% 13.0% 59.2% 100.0% HC 6.1% 8.5% 13.4% 13.0% 58.9% 100.0% Option 3 NOx 18.4% 23.7% 25.1% 14.3% 18.4% 100.0% CO 18.4% 23.7% 25.2% 14.4% 18.4% 100.0% HC 18.5% 23.6% 25.0% 14.3% 18.6% 100.0%

If the critical value is set as 1 mph/s for acceleration mode, data falling into option

1 and option 2 will be classified as acceleration mode while data falling into option 3 will be classified as cruise mode. If the critical value is set as 2 mph/s for acceleration mode, data falling into option 1 will be classified as acceleration mode while data falling into option 2 and option 3 will be classified as cruise mode. There is little difference in the engine power distributions noted for data falling into option 1 and option 2 while the power distribution for option 3 is obviously different from option 1 and option 2 in the above figures and tables. Table 10-1 and 10-2 show that the engine power is more

170 concentrated in higher engine power (>=200 bhp) for data in acceleration mode. Table

10-1 and 10-2 better reflect the power demand of the vehicle in real world in acceleration mode. Hence, the critical value is set as 1 mph/s for acceleration mode.

After defining “acceleration > 1 mph/s” as acceleration mode, cruise mode data will be the all of the remaining data in the database (i.e. data no previously classified into idle, deceleration, and now acceleration). Unlike idle and deceleration mode, there is a general relationship between engine power and emission rate for acceleration mode and cruise mode. Even though the engine power distribution for acceleration mode is different than that of cruise mode (Table 10-3), these two modes share a relationship between engine power and emission rate (Figure 10-5), although there are potentially some significant differences noted in the HC chart.

Table 10-3 Engine Power Distribution for Acceleration Mode and Cruise Mode

Engine Power Distribution Pollutants (0 50) (50 100) (100 150) (150 200) >=200 All Acceleration mode Number NOx 935 1311 2210 2553 11885 18894 CO 925 1302 2206 2549 11882 18864 HC 923 1283 2161 2490 11473 18330 Percentage NOx 4.95% 6.94% 11.70% 13.51% 62.90%100.00% CO 4.90% 6.90% 11.69% 13.51% 62.99%100.00% HC 5.04% 7.00% 11.79% 13.58% 62.59%100.00% Cruise mode Number NOx 15885 8988 7173 3536 3792 39374 CO 15834 8940 7145 3529 3770 39218 HC 15481 8600 6830 3394 3715 38020 Percentage NOx 40.34% 22.83% 18.22% 8.98% 9.63% 100.00% CO 40.37% 22.80% 18.22% 9.00% 9.61% 100.00% HC 40.72% 22.62% 17.96% 8.93% 9.77% 100.00%

171

Figure 10-5 Engine Power vs. Emission Rate for Acceleration Mode and Cruise Mode

The relationships between emission rate and power for acceleration mode data will be explored in this Chapter, while the relationships between emission rate and power for cruise mode data will be explored in the next chapter.

10.2 Analysis of Acceleration Mode Data

10.2.1 Emission Rate Distribution by Bus in Acceleration Mode

After defining vehicle activity data with “acceleration >1 mph/s” as acceleration mode, emission rate histograms for each of the three pollutants for acceleration operations are presented in Figure 10-6. Figure 10-6 shows significant skewness for all three pollutants for acceleration mode. There are also a small number of some very high

HC emissions events noted in acceleration mode. After screening engine speed, engine power, engine oil temperature, engine oil pressure, engine coolant temperature, ECM

172 pressure, and other parameters, no operating parameters appeared to be correlated with the high emissions events.

Figure 10-6 Histograms of Three Pollutants for Acceleration Mode

Inter-bus response variability for acceleration mode operations is illustrated in

Figures 10-7 to 10-9 using median and mean of NOx, CO, and HC emission rates. Table

10-4 presents the same information in tabular form. The difference between median and mean is also an indicator of skewness.

173 Table 10-4 Median, and Mean of Three Pollutants in Acceleration Mode by Bus

NOx CO HC Bus ID Median Mean Median Mean Median Mean Bus 360 0.27729 0.25957 0.06527 0.09217 0.00159 0.00182 Bus 361 0.30170 0.28125 0.05177 0.08001 0.00184 0.00228 Bus 363 0.14459 0.14058 0.03836 0.09012 0.00022 0.00039 Bus 364 0.28948 0.26033 0.03501 0.05650 0.00306 0.00363 Bus 372 0.17834 0.18627 0.02980 0.03475 0.00250 0.00279 Bus 375 0.31092 0.28991 0.05929 0.08619 0.00143 0.00176 Bus 377 0.17827 0.17335 0.04755 0.09612 0.00104 0.00112 Bus 379 0.17788 0.20883 0.08430 0.10346 0.00222 0.00276 Bus 380 0.26410 0.26620 0.08238 0.19149 0.00210 0.00253 Bus 381 0.18011 0.19806 0.07856 0.12646 0.00095 0.00106 Bus 382 0.28966 0.29152 0.09234 0.18179 0.00263 0.00272 Bus 383 0.24419 0.26739 0.05355 0.13112 0.00308 0.00368 Bus 384 0.18775 0.22139 0.07111 0.17389 0.00401 0.00429 Bus 385 0.17783 0.21706 0.05141 0.07893 0.00361 0.00384 Bus 386 0.22674 0.24673 0.10412 0.23806 0.00272 0.00282

Figure 10-7 Median and Mean of NOx Emission Rates in Acceleration Mode by Bus

174

Figure 10-8 Median and Mean of CO Emission Rates in Acceleration Mode by Bus

Figure 10-9 Median and Mean of HC Emission Rates in Acceleration Mode by Bus

175 Figure 10-7 to 10-9 and Table 10-4 illustrate that NOx emissions are more consistent than CO and HC emissions. Across the 15 buses, Bus 386 has the largest median and mean for CO emissions, while Bus 384 has the largest median and mean for

HC emissions. The above figures and table demonstrate that although variability exist across buses, it is difficult to conclude that there are any true “high emitters”. That is, the emissions from these buses are not consistently more than one or two standard deviations from the mean under normal operating conditions. Meanwhile, Bus 363 has the smallest mean and median HC emissions compared to the other 14 buses.

10.2.2 Engine Power Distribution by Bus in Acceleration Mode

Engine power distribution in acceleration mode by bus is shown in Figure 10-10 and Table 10-5. When the bus is accelerating, the engine will be required to produce more power. Figure 10-10 and Table 10-5 reflect this characteristic of acceleration mode.

The distribution of engine power in acceleration mode is significantly different from what is in deceleration mode and idle mode. Bus 372 has the largest minimum engine power in acceleration mode. This is consistent with the finding in idle mode and deceleration mode. The maximum power values for each bus match well with the manufacturer’s engine power rating. Although variability for engine power distribution exist across buses, it is difficult to conclude that such variability is affected by individual buses, bus routes, or other factors. The relationship between power and emissions appears consistent across the buses for acceleration mode.

176 Table 10-5 Engine Power Distribution in Acceleration Mode by Bus

1st 3rd Bus ID Number Min Quartile Median Quartile Max Mean Bus 360 1507 0 162.96 255.57 275.05 275.59 212.04 Bus 361 545 7.16 131.96 199.58 261.51 275.54 184.46 Bus 363 1287 0 111.52 200.39 267.06 275.59 180.03 Bus 364 931 0 142.82 228.25 270.01 275.56 197.27 Bus 372 728 34.42 145.57 213.51 264.70 275.56 199.81 Bus 375 1599 0 140.92 259.45 275.13 275.57 205.56 Bus 377 1751 3.35 166.25 256.89 275.08 275.60 212.09 Bus 379 1427 0 204.15 264.54 275.18 275.58 233.71 Bus 380 1823 0 202.69 262.11 275.15 275.54 228.55 Bus 381 1362 0 139.86 220.00 272.21 275.60 199.20 Bus 382 691 0 173.36 250.90 275.05 275.58 218.82 Bus 383 1043 0 161.16 250.37 275.08 275.59 213.70 Bus 384 1292 0 144.10 213.87 269.50 275.60 198.80 Bus 385 1377 0 143.51 226.37 274.99 275.55 201.67 Bus 386 1532 13.81 164.27 244.80 275.06 275.60 215.95

Engine power distribution also shows that there are about 0.19% (36/18895) of data points with zero load in acceleration mode. For the 36 data points exhibiting zero indicated engine load, about 92% (33/36) occurred on roads reported to have zero or negative grade. Due to the inaccuracy of road grade value, it was not possible to simulate the engine power in this research. However, in the real world, linear acceleration with zero load can happen on downhill stretches. Application of load based emission rates to predicate engine load will be able to take grade into account in the overall modeling framework. Because only 36 data points with zero load were included in the acceleration data, it was unnecessary to develop a sub-model for them. Meanwhile, such zero load in acceleration mode do reflect the variability of acceleration data in the real world.

177

Figure 10-10 Histograms of Engine Power in Acceleration Mode by Bus

178 10.3 Model Development and Refinement

10.3.1 HTBR Tree Model Development

The potential explanatory variables included in the emission rate model development effort include:

• Vehicle characteristics: model year, odometer reading, bus ID (14 dummy

variables)

• Roadway characteristics: dummy variable for road grade;

• Onroad load parameters: engine power (bhp), vehicle speed (mph),

acceleration (mph/s);

• Engine operating parameters: engine oil temperature(deg F), engine oil

pressure (kPa), engine coolant temperature (deg F), barometric pressure

reported from ECM (kPa);

• Environmental conditions: ambient temperature (deg C), ambient pressure

(mbar), ambient relative humidity (%).

The Hierarchical Tree-Based Regression (HTBR) technique is used first to identify potentially significant explanatory variables and this analysis provides the starting point for conceptual model development. The HTBR model is used to guide the development of an OLS regression model, rather than as a model in its own right. HTBR can be used as a data reduction tool and for identifying potential interactions among the variables. Then OLS regression is used with the identified variables to estimate a preliminary “final” model.

These 27 variables were first offered to the tree model. To arrive at the “best” model, various regression tree models were created. The initial model was created by allowing the tree to grow unconstrained for the first cut. Once an initial model was created, the supervised technique in S-PLUS was used to simplify the model by removing

179 the lower branches of the tree that explained the least deviance. For application purposes, the resulting tree was examined to ensure that the model’s predictive ability wasn’t compromised by allowing the overall amount of deviance to increase significantly.

The 27 variables include continuous, categorical, and dummy variables. Dummy variables for buses could be used to indicate the variability of buses. Just as analysis in

Chapter 6, these 15 buses could be treated as a single group for purpose of analysis and model development. HTBR technique can examine the potential additional influence of road grade (i.e. above and beyond the contribution to power demand) using s dummy variable to represent a grade categories (the final model does not include this dummy variable due to the inaccuracy of road grade values). Analysis results in Chapter 6 indicate that all environmental characteristics, like temperature, humidity and barometric pressure, are moderately correlated with each other. On the other hand, engine operating parameters, like engine oil pressure, engine oil temperature, engine coolant temperature, and barometric pressure reported from ECM, are highly or moderately related to onroad operating parameters, like engine power, vehicle speed, and acceleration. The modeler should aware of such correlations among explanatory variables.

Although evidence in the literature suggests that a logarithmic transformation is most suitable for modeling motor vehicle emissions (Washington 1994; Ramamurthy et al. 1998; Fomunung 2000; Frey et al. 2002), this transformation needs to be verified through Box-Cox procedure. Box-Cox function in Matlab can automatically identify a transformation from the family of power transformations on emission data, ranging from

-1.0 to 1.0. The lambdas chosen by Box-Cox procedure for acceleration mode are 0.683 for NOx, 0.094438 for CO, 0.31919 for HC. Box-Cox procedure is only used to provide a guide for selecting a transformation, so overly precise results are not needed (Neter et al.

1996). It is often reasonable to use a nearby lambda value for the power transformation that is easier to understand. Although the lambdas chosen by Box-Cox procedure are different for acceleration and cruise mode, the nearby lambda values are same for these

180 two modes. In summary, the lambda values used for transformations are ½ for NOx, 0 for CO (indicating a log transformation), and ¼ for HC for acceleration mode. Figure 10-

11 to 10-13 presented histogram, boxplot, and probability plot of truncated emission rate in acceleration mode for NOx, CO, and HC, while Figure 10-14 to 10-16 presented same plots for truncated transformed emission rate for NOx, CO and HC, where a great improvement is noted.

Figure 10-11 Histogram, Boxplot, and Probability Plot of Truncated NOx Emission Rate in Acceleration Mode

181

Figure 10-12 Histogram, Boxplot, and Probability Plot of Truncated CO Emission Rate in Acceleration Mode

Figure 10-13 Histogram, Boxplot, and Probability Plot of Truncated HC Emission Rate in Acceleration Mode

182

Figure 10-14 Histogram, Boxplot, and Probability Plot of Truncated Transformed NOx Emission Rate in Acceleration Mode

Figure 10-15 Histogram, Boxplot, and Probability Plot of Truncated Transformed CO Emission Rate in Acceleration Mode

183

Figure 10-16 Histogram, Boxplot, and Probability Plot of Truncated Transformed HC Emission Rate in Acceleration Mode

10.3.1.1 NOx HTBR Tree Model Development

Figure 10-17 illustrates the initial tree model used for truncated transformed NOx emission rate in acceleration mode. Results for initial model are given in Table 10-6.

The tree grew into a complex model, with a considerable number of branches and 36 terminal nodes. Figure 10-18 illustrates the amount of deviation explained corresponding to the number of terminal nodes.

184

Figure 10-17 Original Untrimmed Regression Tree Model for Truncated Transformed NOx Emission Rate in Acceleration Mode

Figure 10-18 Reduction in Deviation with the Addition of Nodes of Regression Tree for Truncated Transformed NOx Emission Rate in Acceleration Mode

185 Table 10-6 Original Untrimmed Regression Tree Results for Truncated Transformed

NOx Emission Rate in Acceleration Mode

Regression tree: tree(formula = NOx.50 ~ model.year + odometer + temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + bus360 + bus361 + bus363 + bus364 + bus372 + bus375 + bus377 + bus379 + bus380 + bus381 + bus382 + bus383 + bus384 + bus385 + dummy.grade, data = busdata10242006.1.3, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01) Variables actually used in tree construction: [1] "engine.power" "vehicle.speed" "temperature" "baro" [5] "bus375" "humidity" "oil.press" "odometer" [9] "eng.bar.press" "bus379" "model.year" "oil.temperture" Number of terminal nodes: 36 Residual mean deviance: 0.005538 = 104.4 / 18860 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -3.769e-001 -4.176e-002 -4.298e-003 3.661e-017 3.957e-002 8.965e-001

For model application purposes, it is desirable to select a final model specification that balances the model’s ability to explain the maximum amount of deviation with a simpler model that is easy to interpret and apply. Figure 10-18 indicated that reduction in deviation with addition of nodes after 4, although potentially statistically significant, is very small. A simplified tree model was derived which ends in 4 terminal nodes as compared to the 36 terminal nodes in the initial model. The residual mean deviation only increased from 104.4 to 151.2 and yielded a much more efficient model. Results are shown in Table 10-7 and Figure 10-19. Based on above analysis, NOx acceleration emission rate model will be developed based upon these results.

186

Figure 10-19 Trimmed Regression Tree Model for Truncated Transformed NOx Emission Rate in Acceleration Mode

Table 10-7 Trimmed Regression Tree Results for Truncated Transformed NOx Emission Rate in Acceleration Mode Regression tree: snip.tree(tree = tree(formula = NOx.50 ~ model.year + odometer + temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + bus360 + bus361 + bus363 + bus364 + bus372 + bus375 + bus377 + bus379 + bus380 + bus381 + bus382 + bus383 + bus384 + bus385 + dummy.grade, data = busdata10242006.1.3, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01), nodes = c(13., 7., 12., 2.)) Variables actually used in tree construction: [1] "engine.power" "vehicle.speed" "temperature" Number of terminal nodes: 4 Residual mean deviance: 0.008002 = 151.2 / 18890 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -4.265e-001 -5.813e-002 -7.517e-004 8.861e-016 5.810e-002 8.710e-001 node), split, n, deviance, yval * denotes terminal node

1) root 18894 247.20 0.4669 2) engine.power<72.3 1397 13.67 0.2581 * 3) engine.power>72.3 17497 167.70 0.4836 6) vehicle.speed<25.95 13777 121.40 0.4662 12) temperature<20.5 4902 42.44 0.5034 * 13) temperature>20.5 8875 68.45 0.4456 * 7) vehicle.speed>25.95 3720 26.60 0.5482 *

This tree model suggested that engine power is the most important explanatory variable for NOx emissions. This finding is consistent to previous research results which

187 verified the important role of engine power on NOx emissions (Ramamurthy et al. 1998;

Clark et al. 2002; Barth et al. 2004). Analysis in previous chapter also indicates that engine power is correlated with not only onroad load parameters such as vehicle speed, acceleration, and grade, but also engine operating parameters such as throttle position and engine oil pressure. On the other hand, engine power in this research is derived from engine speed, engine torque and percent engine load. So engine power can connect onroad modal activity with engine operating conditions to that extent. This fact strengthens the importance of introduce engine power into the conceptual model and the need to improve the ability to simulate engine power for regional inventory development.

HTBR results suggest that temperature may be an important predictive variable for NOx emissions under certain conditions. Temperature effects may need to be integrated into new models in the form of a temperature correction factor. But adequate data are not yet available for this purpose. For the time being, temperature is removed from consideration in further linear regression model development, but the effect is probably significant and should be examined when more comprehensive emission rate data collected under a wider variety of temperature conditions are available for analysis.

10.3.1.2 CO HTBR Tree Model Development

Figure 10-20 illustrates the initial tree model used for truncated transformed CO emission rate in acceleration mode. Results for initial model are given in Table 10-8.

The tree grew into a complex model with a considerable number of branches and 33 terminal nodes. Figure 10-21 illustrates the amount of deviation explained corresponding to the number of terminal nodes.

188

Figure 10-20 Original Untrimmed Regression Tree Model for Truncated Transformed CO Emission Rate in Acceleration Mode

Figure 10-21 Reduction in Deviation with the Addition of Nodes of Regression Tree for Truncated Transformed CO Emission Rate in Acceleration Mode

189 Table 10-8 Original Untrimmed Regression Tree Results for Truncated Transformed CO

Emission Rate in Acceleration Mode

Regression tree: tree(formula = log.CO ~ model.year + odometer + temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + bus360 + bus361 + bus363 + bus364 + bus372 + bus375 + bus377 + bus379 + bus380 + bus381 + bus382 + bus383 + bus384 + bus385 + dummy.grade, data = busdata10242006.1.3, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01) Variables actually used in tree construction: [1] "engine.power" "humidity" "vehicle.speed" "acceleration" [5] "odometer" "model.year" "baro" "eng.bar.press" Number of terminal nodes: 33 Residual mean deviance: 0.1184 = 2229 / 18830 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -2.552e+000 -2.001e-001 -1.285e-002 3.025e-017 1.981e-001 1.653e+000

For model application purposes, it is desirable to select a final model specification that balances the model’s ability to explain the maximum amount of deviation with a simpler model that is easy to interpret and apply. Figure 10-21 indicated that reduction in deviation with addition of nodes after 4, although potentially statistically significant, is very small. A simplified tree model was derived which ends in 4 terminal nodes as compared to the 33 terminal nodes in the initial model. The residual mean deviation only increased from 2229 to 3093 and yielded a much cleaner model than the initial one.

Results are shown in Table 10-9 and Figure 10-22. The CO acceleration emission rate model will be developed based upon these results.

190

Figure 10-22 Trimmed Regression Tree Model for Truncated Transformed CO Emission Rate in Acceleration Mode

Table 10-9 Trimmed Regression Tree Results for Truncated Transformed CO Emission

Rate in Acceleration Mode

Regression tree: snip.tree(tree = tree(formula = log.CO ~ model.year + odometer + temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + bus360 + bus361 + bus363 + bus364 + bus372 + bus375 + bus377 + bus379 + bus380 + bus381 + bus382 + bus383 + bus384 + bus385 + dummy.grade, data = busdata10242006.1.3, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01), nodes = c(12., 7., 2., 13.)) Variables actually used in tree construction: [1] "engine.power" "vehicle.speed" Number of terminal nodes: 4 Residual mean deviance: 0.164 = 3093 / 18860 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -3.019e+000 -2.450e-001 -1.062e-002 -9.774e-017 2.430e-001 1.735e+000 node), split, n, deviance, yval * denotes terminal node

1) root 18864 5309.0 -1.1990 2) engine.power<82.625 1624 560.0 -1.9810 * 3) engine.power>82.625 17240 3662.0 -1.1250 6) vehicle.speed<19.05 9752 1994.0 -0.9339 12) engine.power<152.965 2335 522.6 -1.2510 * 13) engine.power>152.965 7417 1163.0 -0.8342 * 7) vehicle.speed>19.05 7488 847.2 -1.3740 *

191 This tree model suggested that engine power is the most important explanatory variable for CO emissions. This finding is consistent with NOx emissions. This tree will be used as reference for linear regression model development.

10.3.1.3 HC HTBR Tree Model Development

Figure 10-23 illustrates the initial tree model used for truncated transformed HC emission rate in acceleration mode. Results for initial model are given in Table 10-10.

The tree grew into a complex model with a considerable number of branches and 30 terminal nodes.

Figure 10-23 Original Untrimmed Regression Tree Model for Truncated Transformed HC Emission Rate in Acceleration Mode

192

Figure 10-24 Reduction in Deviation with the Addition of Nodes of Regression Tree for Truncated Transformed HC Emission Rate in Acceleration Mode

Table 10-10 Original Untrimmed Regression Tree Results for Truncated Transformed

HC Emission Rate in Acceleration Mode

Regression tree: tree(formula = HC.25 ~ model.year + odometer + temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + bus360 + bus361 + bus363 + bus364 + bus372 + bus375 + bus377 + bus379 + bus380 + bus381 + bus382 + bus383 + bus384 + bus385 + dummy.grade, data = busdata10242006.1.3, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01) Variables actually used in tree construction: [1] "odometer" "bus377" "bus381" "baro" [5] "engine.power" "humidity" "vehicle.speed" "oil.press" [9] "bus375" "oil.temperture" "acceleration" "bus384" [13] "bus364" "model.year" Number of terminal nodes: 31 Residual mean deviance: 0.0005694 = 10.42 / 18300 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -1.004e-001 -1.347e-002 -2.222e-003 1.386e-016 1.091e-002 2.755e-001

Figure 10-23 and Table 10-12 suggest that the tree analysis of HC emission rates identified a number of buses that appear to exhibit significantly different emission rates under all load conditions than the other buses (i.e. some of the bus dummy variables appeared as significant in the initial tree splits). Two bus dummy variables split the data

193 pool at the top levels of the HC tree model. The first cut point of “odometer>282096” in

HC tree model could be directly replaced by “bus363>0.5”, because only bus 363 has an odometer reading larger than 282096. So there were three bus dummy variables that split the first three levels of the HC tree model. Although higher emissions were noted in all three pollutants for some of the 15 buses, the division was even more obvious for HC emissions (see Figure 10-9 and Table 10-4). This is consistent with the findings in idle and deceleration mode. Although it is tempting to develop different emission rates for these buses, to reduce emission rate deviation in the sample pool, it is difficult to justify doing so. Unless these is an obvious reason to classify these three buses as high emitters

(i.e. significantly higher than normal emitting vehicles, perhaps by as much as a few standard deviations from the mean), and unless there are enough data to develop separate emission rate models for high emitters, one cannot justify removing the data from the data set. Until such data exist to justify treating these buses as high emitters, the bus dummy variables for individual buses are removed from the analyses and all 15 buses are treated as part of the whole of the data.

Another tree model was generated excluding the bus dummy variables, model year, and odometer. This new tree model is illustrated in Figure 10-25 and Table 10-11.

The tree model is then trimmed for application purposes, as was done for the NOx and

CO models.

194

Figure 10-25 Trimmed Regression Tree Model for Truncated Transformed HC in Acceleration Mode

Table 10-11 Trimmed Regression Tree Results for Truncated Transformed HC in

Acceleration Mode

Regression tree: snip.tree(tree = tree(formula = HC.25 ~ temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + dummy.grade, data = busdata10242006.1.3, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01), nodes = c(2., 6., 15., 14.)) Variables actually used in tree construction: [1] "baro" "engine.power" Number of terminal nodes: 4 Residual mean deviance: 0.001018 = 18.65 / 18330 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -9.502e-002 -2.174e-002 -2.213e-003 9.390e-016 1.844e-002 3.100e-001 node), split, n, deviance, yval * denotes terminal node

1) root 18330 30.840 0.2099 2) baro<969.5 1189 1.239 0.1286 * 3) baro>969.5 17141 21.210 0.2155 6) engine.power<56.24 850 1.069 0.1682 * 7) engine.power>56.24 16291 18.140 0.2180 14) baro<989.5 13717 13.970 0.2134 * 15) baro>989.5 2574 2.372 0.2423 *

195 The new tree model suggests that barometric pressure is the most important explanatory variable for HC emission rates. However, this finding is challenged by this fact: among those 1189 data points (baro<969.5) in the first left branch, 1187 data points belong to Bus 363. Although this dataset was collected under a wide variety of environmental conditions, the scope of barometric pressure was limited for individual buses tested. As reported earlier, Bus 363 exhibited significantly lower HC emissions that the other buses (see Figure 10-9). But the reason is not clear at this time. To develop a reasonable tree model given the limited data collected, the environmental parameters are excluded from the model until a greater distribution of environmental conditions can be represented in a test data set. With data collected from a more comprehensive testing program, environmental variables can be integrated into the model directly, or perhaps correction factors for the emission rates can be developed. The secondary trimmed tree is presented in Figure 10-26 and Table 10-12.

Figure 10-26 Secondary Trimmed Regression Tree Model for Truncated Transformed HC Emission Rate in Acceleration Mode

196 Table 10-12 Secondary Trimmed Regression Tree Results for Truncated Transformed

HC Emission Rate in Acceleration Mode

Regression tree: snip.tree(tree = tree(formula = HC.25 ~ engine.power + vehicle.speed + acceleration + oil.temperture + oil.press + cool.temperature + eng.bar.press, data = busdata10242006.1.3, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.1), nodes = c(7., 13., 12.)) Variables actually used in tree construction: [1] "engine.power" "oil.press" "eng.bar.press" Number of terminal nodes: 4 Residual mean deviance: 0.00136 = 24.92 / 18330 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -1.178e-001 -2.378e-002 6.119e-004 -4.275e-017 2.231e-002 3.223e-001 node), split, n, deviance, yval * denotes terminal node

1) root 18330 30.840 0.2099 2) engine.power<54.555 988 1.779 0.1559 * 3) engine.power>54.555 17342 26.020 0.2130 6) oil.press<427.75 12457 18.610 0.2076 12) eng.bar.press<100.249 4989 9.241 0.1937 * 13) eng.bar.press>100.249 7468 7.763 0.2169 * 7) oil.press>427.75 4885 6.136 0.2266 *

This tree model suggested that engine power is the most important explanatory variable for HC emissions. This finding is consistent with analysis of NOx and CO emission rates. HTBR results also suggest that oil pressure and engine bar pressure may be important predictive variables for HC emissions under certain conditions. After excluding engine barometric pressure and oil pressure from the tree model, leaving engine power only, the residual mean deviation increased slightly from 24.92 to 27.34.

While engine operating parameters such as oil pressure and engine barometric pressure may impact emissions, such variables are not easy to implement in real-world models.

The final HTBR tree for HC emissions are shown in Figure 10-27 and Table 10-13. HC acceleration emission rate model will be developed based upon these results.

197

Figure 10-27 Final Regression Tree Model for Truncated Transformed HC and Engine Power in Acceleration Mode

Table 10-13 Final Regression Tree Results for Truncated Transformed HC and Engine

Power in Acceleration Mode

Regression tree: snip.tree(tree = tree(formula = HC.25 ~ engine.power, data = busdata10242006.1.3, na.action = na.exclude, mincut = 5, minsize = 10,mindev = 0.01), nodes = c(7., 6., 4., 5.)) Number of terminal nodes: 4 Residual mean deviance: 0.001492 = 27.34 / 18330 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -1.296e-001 -2.277e-002 8.001e-005 4.271e-016 2.298e-002 3.065e-001 node), split, n, deviance, yval * denotes terminal node

1) root 18330 30.8400 0.2099 2) engine.power<54.555 988 1.7790 0.1559 4) engine.power<14.825 438 0.6518 0.1360 * 5) engine.power>14.825 550 0.8171 0.1717 * 3) engine.power>54.555 17342 26.0200 0.2130 6) engine.power<98.385 1177 1.8580 0.2022 * 7) engine.power>98.385 16165 24.0100 0.2137 *

198 10.3.2 OLS Model Development and Refinement

Once a manageable number of modal variables have been identified through regression tree analysis, the modeling process moves into the phase in which ordinary least squares techniques are used to obtain a final model. The research objective here is to identify the extent to which the identified factors influence emission rates in acceleration mode. Modelers rely on previous research, a priori knowledge, educated guesses, and stepwise regression procedures to identify acceptable functional forms, to determine important interactions, and to derive statistically and theoretically defensible models. The final model will be our best understanding about the functional relationship between independent variables and dependent variable.

10.3.2.1 NOx Emission Rate Model Development for Acceleration Mode

Based on previous analysis, truncated transformed NOx will serve as the independent variable. However, modelers should keep in mind that the comparisons should always be made on the original untransformed scale of Y when comparing the performance of statistical models. HTBR tree model results suggest that engine power is the best one to begin with. Linear regression model with engine power will be developed first, followed by a combined power and vehicle speed model.

10.3.2.1.1 Linear Regression Model with Engine Power

Let’s select engine power to begin with, and estimate the model:

Y = β0 + β1(engine.power) + Error (1.1)

The regression run yields the following results.

199 Table 10-14 Regression Result for NOx Model 1.1

Call: lm(formula = NOx.50 ~ engine.power, data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.4093 -0.08133 0.005414 0.07084 0.9344

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.3054 0.0021 147.9391 0.0000 engine.power 0.0008 0.0000 83.3557 0.0000

Residual standard error: 0.09781 on 18892 degrees of freedom Multiple R-Squared: 0.2689 F-statistic: 6948 on 1 and 18892 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) engine.power -0.9387

Analysis of Variance Table

Response: NOx.50

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) engine.power 1 66.4763 66.47630 6948.175 0 Residuals 18892 180.7482 0.00957

The results suggest that engine power explains about 27% of the variance in truncated transformed NOx. F-statistic shows that β1≠0, and the linear relationship is statistically significant. To evaluate the model, residual normality is checked by examining QQ plot and check constancy of variance by examining residuals vs. fitted values.

200

Figure 10-28 QQ and Residual vs. Fitted Plot for NOx Model 1.1

The residual plot in Figure 10-28 shows a slight departure from linear regression assumptions indicating a need to explore a curvilinear regression function. Since the variability at the different X levels appears to be fairly constant, a transformation on X is considered. The reason to consider transformation first is avoiding multicollinearity brought about by adding the second-order of X. Based on the prototype plot in Figure

10-28, the square root transformation and logarithmic transformation are tested. Scatter plots and residual plots based on each transformation should then be prepared and analyzed to determine which transformation is most effective.

Y = β0 + β1engine.power^(1/2) + Error (1.2)

Y = β0 + β1log10(engine.power+1) + Error (1.3)

The result for Model 1.2 will be shown in Table 10-15 and Figure 10-29, while the result for Model 1.3 will be shown in Table 10-16 and Figure 10-30.

201 Table 10-15 Regression Result for NOx Model 1.2

Call: lm(formula = NOx.50 ~ engine.power^(1/2), data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.4106 -0.07981 0.004093 0.06858 0.9248

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1912 0.0030 63.2141 0.0000 I(engine.power^(1/2)) 0.0196 0.0002 93.5953 0.0000

Residual standard error: 0.09455 on 18892 degrees of freedom Multiple R-Squared: 0.3168 F-statistic: 8760 on 1 and 18892 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) I(engine.power^(1/2)) -0.9738

Analysis of Variance Table

Response: NOx.50

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) I(engine.power^(1/2)) 1 78.3199 78.31986 8760.082 0 Residuals 18892 168.9047 0.00894

Figure 10-29 QQ and Residual vs. Fitted Plot for NOx Model 1.2

202 Table 10-16 Regression Result for NOx Model 1.3

*** Linear Model ***

Call: lm(formula = NOx.50 ~ log10(engine.power + 1), data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.4109 -0.07485 0.001841 0.06716 0.9119

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -0.0514 0.0052 -9.7873 0.0000 log10(engine.power + 1) 0.2291 0.0023 99.6000 0.0000

Residual standard error: 0.09263 on 18892 degrees of freedom Multiple R-Squared: 0.3443 F-statistic: 9920 on 1 and 18892 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) log10(engine.power + 1) -0.9917

Analysis of Variance Table

Response: NOx.50

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) log10(engine.power + 1) 1 85.1206 85.12056 9920.161 0 Residuals 18892 162.1040 0.00858

Figure 10-30 QQ and Residual vs. Fitted Plot for NOx Model 1.3

203 The results suggest that by using square root transformed engine power, the model increases the amount of variance explained in truncated transformed NOx from about 27% (Model 1.1) to about 32% (Model 1.2), while to about 34% (Model 1.3) by using log transformed engine power.

Model 1.3 improves the R2 more than does Model 1.2. The residuals scatter plot for Model 1.3 (Figure 10-30) shows a more reasonably linear relation than Model 1.2

(Figure 10-29). Figure 10-30 also shows that Model 1.3 does a better job in improving the pattern of variance. QQ plot shows general normality with the exceptions arising in the tails.

10.3.2.1.2 Linear Regression Model with Engine Power and Vehicle Speed

HTBR tree model results also suggest that vehicle speed may be an important predictive variable for emissions under certain conditions. After developing linear regression model with engine power, adding vehicle speed might improve the model predictive ability. The new model is proposed as:

Y = β0 + β1log10(engine.power+1) + β2vehicle.speed + Error (1.4)

The result for Model 1.4 will be shown in Table 10-17 and Figure 10-31.

204 Table 10-17 Regression Result for NOx Model 1.4

Call: lm(formula = NOx.50 ~ log10(engine.power + 1) + vehicle.speed, data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.4133 -0.07416 0.004219 0.06303 0.9019

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -0.0195 0.0053 -3.6693 0.0002 log10(engine.power + 1) 0.2007 0.0025 79.3288 0.0000 vehicle.speed 0.0019 0.0001 25.1554 0.0000

Residual standard error: 0.09112 on 18891 degrees of freedom Multiple R-Squared: 0.3656 F-statistic: 5442 on 2 and 18891 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) log10(engine.power + 1) log10(engine.power + 1) -0.9681 vehicle.speed 0.2383 -0.4470

Analysis of Variance Table

Response: NOx.50

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) log10(engine.power + 1) 1 85.1206 85.12056 10251.92 0 vehicle.speed 1 5.2540 5.25404 632.80 0 Residuals 18891 156.8499 0.00830

Figure 10-31 QQ and Residual vs. Fitted Plot for NOx Model 1.4

205 The results suggest that by using vehicle speed and transformed engine power, the model increases the amount of variance explained in truncated transformed NOx from about 34% (Model 1.3) to about 37% (Model 1.4). The residuals scatter plot for Model

1.4 (Figure 10-30) shows a more reasonably linear relation. Figure 10-30 also shows that model 1.4 does a better job in improving the pattern of variance. QQ plot shows general normality, with deviation at the tails.

10.3.2.1.3 Linear Regression Model with Dummy Variables Figure 10-19 suggests that the relationship between NOx and engine power may be somewhat different across the engine power ranges identified in the tree analysis. That is, there may be higher or lower NOx emissions in different engine power operating ranges.

One dummy variable is created to represent different engine power ranges identified in

Figure 10-19 for use in linear regression analysis as illustrated below:

Engine power (bhp) dummy1 <72.30 1 >=72.30 0

This dummy variable and the interaction between dummy variable and engine power are then tested to determine whether the use of the variables and interactions can help improve the model:

Y = β0 + β1log10(engine.power+1) + β2vehicle.speed + β3 dummy1 + β4dummy1 log10(engine.power+1) + β5 dummy1vehicle.speed + Error (1.5)

206 Table 10-18 Regression Result for NOx Model 1.5

Call: lm(formula = NOx.50 ~ log10(engine.power + 1) + vehicle.speed + dummy1 * log10( engine.power + 1) + dummy1:vehicle.speed, data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.4124 -0.07157 0.003012 0.06319 0.8924

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1439 0.0115 12.4979 0.0000 log10(engine.power + 1) 0.1281 0.0052 24.8261 0.0000 vehicle.speed 0.0023 0.0001 28.9191 0.0000 dummy1 -0.1492 0.0148 -10.0783 0.0000 dummy1:log10(engine.power + 1) 0.0609 0.0081 7.4995 0.0000 dummy1:vehicle.speed -0.0035 0.0003 -10.4883 0.0000

Residual standard error: 0.09022 on 18888 degrees of freedom Multiple R-Squared: 0.3781 F-statistic: 2297 on 5 and 18888 degrees of freedom, the p-value is 0

Analysis of Variance Table

Response: NOx.50

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value log10(engine.power + 1) 1 85.1206 85.12056 10456.89 vehicle.speed 1 5.2540 5.25404 645.45 dummy1 1 1.9017 1.90166 233.62 dummy1:log10(engine.power + 1) 1 0.3018 0.30180 37.08 dummy1:vehicle.speed 1 0.8955 0.89546 110.01 Residuals 18888 153.7510 0.00814

Pr(F) log10(engine.power + 1) 0.000000e+000 vehicle.speed 0.000000e+000 dummy1 0.000000e+000 dummy1:log10(engine.power + 1) 1.158203e-009 dummy1:vehicle.speed 0.000000e+000 Residuals

207

Figure 10-32 QQ and Residual vs. Fitted Plot for NOx Model 1.5

The results suggest that by using dummy variables and interactions with transformed engine power and vehicle speed, the model slightly increases the amount of variance explained in truncated transformed NOx from about 37% (Model 1.4) to about

38% (Model 1.5).

Model 1.5 slightly improves the R2 compared to Model 1.4. The residuals scatter plot for Model 1.5 (Figure 10-32) shows a slightly more linear relation. Figure 10-32 also shows that Model 1.4 may also do a slightly better job in improving the pattern of variance. The QQ plot shows general normality with the exceptions arising in the tails.

However, it is important to note that the model improvement, in terms of amount of variance explained by the model, is marginal at best.

10.3.2.1.4 Model Discussions

The performance of alternative models can be evaluated by comparing model predictions and actual observations for emission rates. The performance of the model can

208 be evaluated in terms of precision and accuracy (Neter et al. 1996). The R2 value is an indication of precision. Usually, higher R2 values imply a higher degree of precision and less unexplained variability in model predictions than lower R2 values. The slope of the trend line for the observed versus predicted values is an indication of accuracy. A slope of one indicates an accurate prediction, in that the prediction of the model corresponds to an observation. Since the R2 and slope are derived by comparing model predictions and actual observations for emission rates, these numbers will be different from what in linear regression models.

The models' predictive ability is also evaluated using the root mean square error

(RMSE) and the mean prediction error (MPE) (Neter et al. 1996). The RMSE is a measure of prediction error. When comparing two models, the model with a smaller

RMSE is a better predictor of the observed phenomenon. Ideally, mean predication error is close to zero. RMSE and MPE are calculated as follows:

n 1 ) 2 RMSE = ∑(yi − yi ) n i=1 n 1 ) MPE = ∑(yi − yi ) n i=1

Previous sections provide the model development process from one model to another model. To test whether the linear regression with power was a beneficial addition to the regression tree model, the mean ERs at HTBR end nodes (single value) are compared to the predictions from the linear regression function with engine power.

The results of the performance evaluation are shown in Table 10-23. The improvement in R2 associated with moving toward a linear function of engine power is large. Hence, the use of the linear regression function will provide a significant improvement on spatial and temporal model prediction capability. But, this linear regression function might still be improved. Since the R2 and slope in Table 10-19 are derived by comparing model

209 predictions and actual observations for emission rates (untransformed y), these numbers are different from what in linear regression models.

The two transforms of engine power were tested: square root transformation and log transformation. The results of the performance evaluation are shown in Table 10-19.

Results suggest that linear regression function with log transformation performs slightly better.

The addition of vehicle speed was also tested. The results of the performance evaluation are shown in Table 10-23. Analysis results suggest that a linear regression function for engine power and vehicle speed also performs slightly better.

Given that the regression tree modeling exercise indicated that a number of power cutpoints may play a role in the emissions process, an additional modeling run was conducted. The results of the performance evaluation are also shown in Table 10-19.

Analysis results suggest that linear regression function with dummy variables performs slightly better than the model without the power cutpoints.

Table 10-19 Comparative Performance Evaluation of NOx Emission Rate Models

Coefficient of Slope (β1) RMSE MPE determination (R2) Mean ERs 0.00026 1.00000 0.10455 0.00001 Linear Regression (Power) 0.18951 0.83817 0.09463 0.00428 Linear Regression (Power^0.5) 0.21520 0.90107 0.09321 0.00898 Linear Regression (log(Power)) 0.23635 1.01220 0.09178 0.00872 Linear Regression 0.26835 1.00140 0.08982 0.00837 (log(Power)+Speed) Linear Regression 0.28003 1.03640 0.08912 0.00834 (log(Power)+Speed+Dummy)

Although the linear regression function with dummy variables works slightly better than linear regression function with engine power and vehicle speed, it introduces more explanatory variables (dummy variables and the interaction with engine power) and

210 increases the complexity of regression model. There is only one regression function for

Model 1.4 while there are two regression functions for Model 1.5. There is also no obvious reason why the engine may be performing slightly differently within these power regimes, yielding different regression slopes and intercepts. It may be that the fuel injection systems in these engines operate slightly differently under low load (near-idle) and high load conditions. This may be controlled by the engine computer. Or, it may be that there are a sufficient number of low power cruise operations and high power cruise operations that are incorrectly classified, and may be better classified as idle or acceleration events (perhaps due to GPS speed data errors). In any case, because the model with dummy variables does not perform appreciably better than the model without the dummy variables, the dummy variables are not included in the final model selection at this time. These dummy variables are, however, worth exploring when additional data from other engine technology groups become available for analysis. Model 1.4 is selected as the preliminary ‘final’ model.

The next step in model evaluation is to once again examine the residuals for the improved model. A principal objective was to verify that the statistical properties of the regression model conform with a set of properties of lease squares estimators. In summary, these properties require that the error terms are normally distributed, have a mean of zero, and have uniform variance.

Test for Constancy of Error Variance A plot of the residuals versus the fitted values is useful in identifying any patterns in the residuals. Figure 10-31(c) shows this plot for NOx model. Without considering variance due to high emission points and zero load data, there is no obvious pattern in the residuals across the fitted values.

Test of Normality of Error terms

211 The first informal test normally reserved for the test of normality of error terms is a quantile-quantile plot of the residuals. Figure 10-31 plot (c) shows the normal quantile plot of the NOx model. The second informal test is to compare actual frequencies of the residuals against expected frequencies under normality. Under normality, we expect 68 percent of the residuals fall between ± MSE and about 90 percent fall between

± 1.645 MSE . Actually, 72.64% of residuals fall within the first limits, while 93.79% of residuals fall within the second limits. Thus, the actual frequencies here are reasonably consistent with those expected under normality. The heavy tails at both ends are a cause for concern, but are due to the nature of data set. For example, even after the transformation, the response variable is not a true normal distribution.

Based on above analysis, final NOx emission model for cruise mode is:

NOx = (-0.0195 + 0.2007log10(engine.power+1) + 0.0019vehicle.speed)^2

Analysis results support that the final NOx emission model is significantly better at explaining variability without making the model too complex. Since there is only one engine type, complexity may not be valid in terms of transferability. This model is specific to the engine classes employed in the transit bus operations. Different models may need to be developed for other engine classes and duty cycles.

10.3.2.2 CO Emission Rate Model Development for Acceleration Mode

Based on previous analysis, truncated transformed CO will serve as the independent variable. However, modelers should keep in mind that the comparisons should always be made on the original untransformed scale of Y when comparing statistical models. HTBR tree model results suggest that engine power is the best one to begin with.

10.3.2.2.1 Linear Regression Model with Engine Power

Let’s select engine power to begin with, and estimate the model:

Y = β0 + β1engine.power + Error (2.1)

212 The regression run yields the following results.

Table 10-20 Regression Result for CO Model 2.1

Call: lm(formula = log.CO ~ engine.power, data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -3.151 -0.3515 -0.05231 0.3448 1.453

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -1.8549 0.0100 -185.2318 0.0000 engine.power 0.0031 0.0000 69.7761 0.0000

Residual standard error: 0.473 on 18862 degrees of freedom Multiple R-Squared: 0.2052 F-statistic: 4869 on 1 and 18862 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) engine.power -0.939

Analysis of Variance Table

Response: log.CO

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) engine.power 1 1089.300 1089.300 4868.698 0 Residuals 18862 4220.097 0.224

The results suggest that engine power explains about 21% of the variance in truncated transformed CO. F-statistic shows that β1≠0, and the linear relationship is statistically significant. To evaluate the model, the normality is examined in the QQ plot and check constancy of variance by examining residuals vs. fitted values.

213

Figure 10-33 QQ and Residual vs. Fitted Plot for CO Model 2.1

The residual plot in Figure 10-33 shows a slight departure from linear regression assumptions indicating a need to explore a curvilinear regression function. Since the variability at the different X levels appears to be fairly constant, a transformation on X is considered. The reason to consider transformation first is avoiding multicollinearity brought about by adding the second-order of X. Based on the prototype plot in Figure

10-33, the square root transformation and logarithmic transformation are tested. Scatter plots and residual plots based on each transformation should then be prepared and analyzed to determine which transformation is most effective.

Y = β0 + β1engine.power^(1/2) + Error (2.2)

Y = β0 + β1log10(engine.power+1) + Error (2.3)

The result for Model 2.2 will be shown in Table 10-21 and Figure 10-33, while the result for Model 2.3 will be shown in Table 10-22 and Figure 10-34.

214 Table 10-21 Regression Result for CO Model 2.2

Call: lm(formula = log.CO ~ engine.power^(1/2), data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -2.798 -0.3492 -0.0529 0.3381 1.52

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -2.3146 0.0149 -155.8023 0.0000 I(engine.power^(1/2)) 0.0793 0.0010 77.1161 0.0000

Residual standard error: 0.4626 on 18862 degrees of freedom Multiple R-Squared: 0.2397 F-statistic: 5947 on 1 and 18862 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) I(engine.power^(1/2)) -0.974

Analysis of Variance Table

Response: log.CO

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) I(engine.power^(1/2)) 1 1272.706 1272.706 5946.896 0 Residuals 18862 4036.691 0.214

Figure 10-33 QQ and Residual vs. Fitted Plot for CO Model 2.2

215 Table 10-22 Regression Result for CO Model 2.3

Call: lm(formula = log.CO ~ log10(engine.power + 1), data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -2.187 -0.3475 -0.05182 0.3313 2.475

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -3.2695 0.0261 -125.3639 0.0000 log10(engine.power + 1) 0.9152 0.0114 80.0560 0.0000

Residual standard error: 0.4584 on 18862 degrees of freedom Multiple R-Squared: 0.2536 F-statistic: 6409 on 1 and 18862 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) log10(engine.power + 1) -0.9918

Analysis of Variance Table

Response: log.CO

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) log10(engine.power + 1) 1 1346.515 1346.515 6408.966 0 Residuals 18862 3962.882 0.210

Figure 10-34 QQ and Residual vs. Fitted Plot for CO Model 2.3

216 The results suggest that by using transformed engine power, the model increases the amount of variance explained in truncated transformed CO from about 21% to about

25%.

Model 2.3 improves the R2 more than does Model 2.2. The residuals scatter plot for Model 2.3 (Figure 10-33) shows a more reasonably linear relationship than Model 2.2

(Figure 10-33). Figure 10-34 also shows that Model 2.3 does a better job in improving the pattern of variance. QQ plot shows general normality with the exceptions arising in the tails.

10.3.2.2.2 Linear Regression Model with Engine Power and Vehicle Speed HTBR tree model results also suggest that vehicle speed may be an important predictive variable for emissions under certain conditions. After developing linear regression model with engine power, adding vehicle speed might improve the model predictive ability. The new model is proposed as:

Y = β0 + β1log10(engine.power+1) + β2vehicle.speed + Error (2.4)

The result for Model 2.4 will be shown in Table 10-23 and Figure 10-35.

217 Table 10-23 Regression Result for CO Model 2.4

*** Linear Model ***

Call: lm(formula = log.CO ~ log10(engine.power + 1) + vehicle.speed, data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -2.299 -0.236 -0.02889 0.2281 3.209

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -3.7472 0.0225 -166.3169 0.0000 log10(engine.power + 1) 1.3412 0.0107 125.1282 0.0000 vehicle.speed -0.0285 0.0003 -89.0585 0.0000

Residual standard error: 0.3846 on 18861 degrees of freedom Multiple R-Squared: 0.4746 F-statistic: 8517 on 2 and 18861 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) log10(engine.power + 1) log10(engine.power + 1) -0.9683 vehicle.speed 0.2380 -0.4463

Analysis of Variance Table

Response: log.CO

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) log10(engine.power + 1) 1 1346.515 1346.515 9103.577 0 vehicle.speed 1 1173.140 1173.140 7931.415 0 Residuals 18861 2789.742 0.148

Figure 10-35 QQ and Residual vs. Fitted Plot for CO Model 2.4

218 The results suggest that by using vehicle speed and transformed engine power, the model increases the amount of variance explained in truncated transformed CO from about 25% to about 47%.

Model 2.4 tremendously improves the R2 achieved in Model 2.3. The residuals scatter plot for Model 2.4 (Figure 10-32) shows a reasonably linear relationship. Figure

10-35 also shows that Model 2.4 does a slightly better job in improving the pattern of variance. QQ plot shows general normality with the exceptions arising in the tails.

10.3.2.2.3 Linear Regression Model with Dummy Variables

Figure 10-22 suggests that the relationship between CO and engine power may be somewhat different across the engine power ranges identified in the tree analysis. That is, there may be higher or lower CO emissions in different engine power operating ranges.

One dummy variable is created to represent different engine power ranges identified in

Figure 10-22 for use in linear regression analysis as illustrated below:

Engine power (bhp) Dummy1 <82.625 1 >=82.625 0

This dummy variable and the interaction between dummy variable and engine power are then tested to determine whether the use of the variable and interactions can help improve the model.

Y = β0 + β1log10(engine.power+1) + β2vehicle.speed + β3 dummy1 + β4dummy1 log10(engine.power+1) + β5 dummy1vehicle.speed + Error (2.5)

The result for Model 2.5 will be shown in Table 10-24 and Figure 10-36.

219 Table 10-24 Regression Result for CO Model 2.5

*** Linear Model ***

Call: lm(formula = log.CO ~ log10(engine.power + 1) + vehicle.speed + dummy1 * log10( engine.power + 1) + dummy1 * vehicle.speed, data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -2.383 -0.233 -0.02602 0.2235 2.124

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -4.4320 0.0498 -89.0217 0.0000 log10(engine.power + 1) 1.6746 0.0222 75.4956 0.0000 vehicle.speed -0.0333 0.0003 -102.3796 0.0000 dummy1 1.4402 0.0614 23.4537 0.0000 dummy1:log10(engine.power + 1) -1.0349 0.0321 -32.2634 0.0000 dummy1:vehicle.speed 0.0414 0.0013 32.8802 0.0000

Residual standard error: 0.3655 on 18858 degrees of freedom Multiple R-Squared: 0.5255 F-statistic: 4177 on 5 and 18858 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) log10(engine.power + 1) log10(engine.power + 1) -0.9926 vehicle.speed 0.3000 -0.4020 dummy1 -0.8108 0.8047 dummy1:log10(engine.power + 1) 0.6864 -0.6915 dummy1:vehicle.speed -0.0774 0.1038

vehicle.speed dummy1 log10(engine.power + 1) vehicle.speed dummy1 -0.2432 dummy1:log10(engine.power + 1) 0.2780 -0.9559 dummy1:vehicle.speed -0.2581 0.0018

dummy1:log10(engine.power + 1) log10(engine.power + 1) vehicle.speed dummy1 dummy1:log10(engine.power + 1) dummy1:vehicle.speed -0.1467

Analysis of Variance Table

Response: log.CO

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) log10(engine.power + 1) 1 1346.515 1346.515 10079.07 0 vehicle.speed 1 1173.140 1173.140 8781.31 0 dummy1 1 23.180 23.180 173.51 0 dummy1:log10(engine.power + 1) 1 102.793 102.793 769.44 0 dummy1:vehicle.speed 1 144.430 144.430 1081.10 0 Residuals 18858 2519.338 0.134

220

Figure 10-36 QQ and Residual vs. Fitted Plot for CO Model 2.5

Model 2.5 does improve R2 from around 0.47 to around 0.52by adding the dummy variables. The residuals scatter plot for Model 2.5 (Figure 10-36) shows a slightly more linear relation. Figure 10-36 also shows that Model 2.5 perhaps may improve the pattern of variance. The QQ plot again shows general normality with the exceptions arising in the tails. However, it is important to note that the model improvement, in terms of amount of variance explained by the model, is not large.

Then, three more dummy variables will be created to represent different engine power and vehicle speed ranges in Figure 10-22 and are shown as follow:

Thresholds Dummy21 Dummy22 Dummy23 engine.power<82.625 1 0 0 engine.power [82.625, 152.96] & 0 1 0 vehicle.speed<19.05 engine.power >=152.96 & 0 0 1 vehicle.speed<19.05 engine.power>=82.625 & 0 0 0 vehicle.speed>=19.05

221 These three dummy variables and the interaction between dummy variables and engine power and vehicle speed are added to improve the model. This model will be:

Y = β0 + β1log10(engine.power+1) + β2 vehicle.speed + β3dummy21 + β4 dummy21 log10(engine.power+1) + β5 dummy21 vehicle.speed + β6 dummy22 + β7 dummy22 log10(engine.power+1) + β8 dummy22 vehicle.speed + β9dummy23 +

β10dummy23log10(engine.power+1) +β11dummy23 vehicle.speed + Error (2.6)

Table 10-25 Regression Result for CO Model 2.6

*** Linear Model ***

Call: lm(formula = log.CO ~ log10(engine.power + 1) + vehicle.speed + dummy21 * log10(engine.power + 1) + dummy21 * vehicle.speed + dummy22 * log10( engine.power + 1) + dummy22 * vehicle.speed + dummy23 * log10( engine.power + 1) + dummy23 * vehicle.speed, data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -2.562 -0.2086 -0.02372 0.2012 2.124

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -3.5895 0.0945 -37.9720 0.0000 log10(engine.power + 1) 1.1014 0.0389 28.3316 0.0000 vehicle.speed -0.0150 0.0007 -21.0912 0.0000 dummy21 0.5978 0.1007 5.9384 0.0000 dummy22 -1.4856 0.2216 -6.7035 0.0000 dummy23 -2.3863 0.1632 -14.6202 0.0000 dummy21:log10(engine.power + 1) -0.4617 0.0448 -10.3020 0.0000 dummy21:vehicle.speed 0.0231 0.0014 16.8659 0.0000 dummy22:log10(engine.power + 1) 0.8643 0.1048 8.2494 0.0000 dummy22:vehicle.speed -0.0194 0.0016 -12.1421 0.0000 dummy23:log10(engine.power + 1) 1.3505 0.0701 19.2614 0.0000 dummy23:vehicle.speed -0.0387 0.0012 -30.9943 0.0000

Residual standard error: 0.3517 on 18852 degrees of freedom Multiple R-Squared: 0.5609 F-statistic: 2189 on 11 and 18852 degrees of freedom, the p-value is 0

Analysis of Variance Table

Response: log.CO

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value log10(engine.power + 1) 1 1346.515 1346.515 10887.89 vehicle.speed 1 1173.140 1173.140 9485.98 dummy21 1 23.180 23.180 187.44 dummy22 1 67.463 67.463 545.50 dummy23 1 100.345 100.345 811.39 dummy21:log10(engine.power + 1) 1 35.491 35.491 286.98 dummy21:vehicle.speed 1 93.450 93.450 755.63 dummy22:log10(engine.power + 1) 1 3.681 3.681 29.76

222 Table 10-25 Continued

dummy22:vehicle.speed 1 3.564 3.564 28.82 dummy23:log10(engine.power + 1) 1 12.318 12.318 99.61 dummy23:vehicle.speed 1 118.804 118.804 960.65 Residuals 18852 2331.445 0.124

Pr(F) log10(engine.power + 1) 0.000000e+000 vehicle.speed 0.000000e+000 dummy21 0.000000e+000 dummy22 0.000000e+000 dummy23 0.000000e+000 dummy21:log10(engine.power + 1) 0.000000e+000 dummy21:vehicle.speed 0.000000e+000 dummy22:log10(engine.power + 1) 4.942365e-008 dummy22:vehicle.speed 8.032376e-008 dummy23:log10(engine.power + 1) 0.000000e+000 dummy23:vehicle.speed 0.000000e+000 Residuals

Figure 10-37 QQ and Residual vs. Fitted Plot for CO Model 2.6

Model 2.6 does improve the ability to explain variance by another 4% (R2 increases from from 0.47 to 0.52 and then to 0.56 by adding the dummy variables).

Model 2.6 slightly improves R2 compared to Model 2.5. The residuals scatter plot for

223 Model 2.6 (Figure 10-37) shows a more reasonably linear relation. Figure 10-37 also shows that Model 2.6 does a better job in improving the pattern of variance. The QQ plot again shows general normality with the exceptions arising in the tails. However, it is important to note that the model improvement, in terms of amount of variance explained by the model, is small.

10.3.2.2.4 Model Discussions

The previous sections outline the model development process from regression tree model, to a simple OLS model, to more complex OLS models. Since the performance of the models are evaluated by comparing model predictions and actual observations for emission rates, the R2 and slope are different from those in previous linear regression models. The results of each step in the model improvement process are presented in

Table 10-26. The mean emission rates at HTBR end nodes (single value) are compared to the results of various linear regression functions with engine power. Since the R2 values and slopes in Table 10-26 are derived by comparing model predictions and actual observations for emission rates (untransformed y), these numbers will be different from what in linear regression models.

Table 10-26 Comparative Performance Evaluation of CO Emission Rate Models

Coefficient of Slope (β1) RMSE MPE determination (R2) Mean ERs 0.00003 0.99985 0.16032 -0.00002 Linear Regression (Power) 0.04615 1.17980 0.16516 0.05229 Linear Regression (Power^0.5) 0.05019 1.22710 0.16420 0.05006 Linear Regression (log(Power)) 0.05527 1.53410 0.16455 0.05120 Linear Regression 0.39173 2.16050 0.14252 0.04211 (log(Power)+Speed) Linear Regression 0.40622 1.76490 0.13632 0.03689 (log(Power)+Speed+Dummy Set 1) Linear Regression 0.43749 1.24220 0.12565 0.03003 (log(Power)+Speed+Dummy Set 2)

224 The improvement in R2 associated with moving toward a linear function of engine power is significant. Hence, the use of the linear regression function will provide a significant improvement on spatial and temporal model prediction capability. But, this linear regression function might still be improved.

Results suggest that linear regression function with log transformation performs slightly better than the others and that the use of dummy variables can further improve model performance. Although the linear regression function with dummy variables performs slightly better than linear regression function with log transformation, it introduces more explanatory variables (dummy variables and the interaction with engine power) and increases the complexity of regression model. As discussed in Section

10.3.2.1.4, there is no compelling reason to include the dummy variables in the model, given that: 1) the models with dummy variables are more complex without significantly improving model performance, and 2) there is no compelling engineering reason at this time to support the difference in model performance within these specific power regions.

Yet, given the explanatory power of the power cutpoint dummy variables (a 10% increase in explained variance), additional investigation into why these values are turning out to be significant is definitely warranted. It may be wise to include such cutpoints in onroad models for various engine technology groups. Such dummy variables are, however, worth exploring when additional data from other engine technology groups become available for analysis. It can be argued that inclusion of the dummy variables for power is warranted.

However, Model 2.4 is chosen as the preliminary ‘final’ model based solely upon ease of implementation. The next step in model evaluation is to once again examine the residuals for the improved model. A principal objective was to verify that the statistical properties of the regression model conform with a set of properties of lease squares

225 estimators. In summary, these properties require that the error terms are normally distributed, have a mean of zero, and have uniform variance.

Test for Constancy of Error Variance A plot of the residuals versus the fitted values is useful in identifying patterns in the residuals. Figure 10-35 plot (a) shows this plot for CO model. Without considering variance due to high emission points and zero load data, there is no obvious pattern in the residuals across the fitted values.

Test of Normality of Error Terms

The first informal test normally reserved for the test of normality of error terms is a quantile-quantile plot of the residuals. Figure 10-35 plot (c) shows the normal quantile plot of the CO model. The second informal test is to compare actual frequencies of the residuals against expected frequencies under normality. Under normality, we expect 68 percent of the residuals fall between ± MSE and about 90 percent fall between

± 1.645 MSE . Actually, 87.35% of residuals fall within the first limits, while 92.19% of residuals fall within the second limits. Thus, the actual frequencies here are reasonably consistent with those expected under normality. The heavy tails at both ends are a cause for concern, but are due to the nature of data set. For example, even after the transformation, the response variable is not the real normal distribution.

Based on above analysis, final CO emission model for cruise mode is:

CO = 10^(-3.7472+1.3412log10(engine.power+1) - 0.0285vehicle.speed)

Analysis results support that the final CO emission model is significantly better at explaining variability without making the model too complex. Since there is only one engine type, complexity may not be valid in terms of transferability. This model is specific to the engine classes employed in the transit bus operations. Different models may need to be developed for other engine classes and duty cycles.

226 10.3.2.3 HC Emission Rate Model Development for Acceleration Mode

Based on previous analysis, truncated transformed HC will serve as the independent variable. However, modelers should keep in mind that the comparisons should always be made on the original untransformed scale of Y when comparing statistical models. HTBR tree model results suggest that engine power is the best one to begin with.

10.3.2.3.1 Linear Regression with Engine Power

Let’s select engine power to begin with, and estimate the model:

Y = β0 + β1engine.power + Error (3.1)

The regression run yields the following results.

Table 10-27 Regression Result for HC Model 3.1

Call: lm(formula = HC.25 ~ engine.power, data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.1285 -0.02417 -0.00003173 0.02467 0.2904

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1840 0.0009 216.4203 0.0000 engine.power 0.0001 0.0000 32.4947 0.0000

Residual standard error: 0.03989 on 18328 degrees of freedom Multiple R-Squared: 0.05447 F-statistic: 1056 on 1 and 18328 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) engine.power -0.938

Analysis of Variance Table

Response: HC.25

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) engine.power 1 1.67991 1.679912 1055.908 0 Residuals 18328 29.15918 0.001591

227

Figure 10-38 QQ and Residual vs. Fitted Plot for HC Model 3.1

The results suggest that engine power explains about 5% of the variance in truncated transformed HC. F-statistic shows that β1≠0, and the linear relationship is statistically significant. To evaluate the model, the normality is examined in the QQ plot and check constancy of variance by examining residuals vs. fitted values.

The residual plot in Figure 10-38 shows a slight departure from linear regression assumptions indicating a need to explore a curvilinear regression function. Since the variability at the different X levels appears to be fairly constant, a transformation on X is considered. The reason to consider transformation first is avoiding multicollinearity brought about by adding the second-order of X. Based on the prototype plot in Figure

10-38, the square root transformation and logarithmic transformation are tested. Scatter plots and residual plots based on each transformation should then be prepared and analyzed to determine which transformation is most effective.

Y = β0 + β1engine.power^(1/2) + Error (3.2)

228 Y = β0 + β1log10(engine.power+1) + Error (3.3)

The result for Model 3.2 will be shown in Table 10-28 and Figure 10-39, while the result for Model 3.3 will be shown in Table 10-29 and Figure 10-40.

Table 10-28 Regression Result for HC Model 3.2

Call: lm(formula = HC.25 ~ engine.power^(1/2), data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.1173 -0.02389 -0.0002473 0.0244 0.2969

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1625 0.0013 127.4341 0.0000 I(engine.power^(1/2)) 0.0034 0.0001 38.2005 0.0000

Residual standard error: 0.03948 on 18328 degrees of freedom Multiple R-Squared: 0.07375 F-statistic: 1459 on 1 and 18328 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) I(engine.power^(1/2)) -0.9735

Analysis of Variance Table

Response: HC.25

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) I(engine.power^(1/2)) 1 2.27433 2.274333 1459.28 0 Residuals 18328 28.56475 0.001559

229

Figure 10-39 QQ and Residual vs. Fitted Plot for HC Model 3.2

Table 10-29 Regression Result for HC Model 3.3

Call: lm(formula = HC.25 ~ log10(engine.power + 1), data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.1186 -0.02345 -0.00007336 0.02386 0.3004

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1136 0.0022 50.8911 0.0000 log10(engine.power + 1) 0.0426 0.0010 43.4726 0.0000

Residual standard error: 0.03906 on 18328 degrees of freedom Multiple R-Squared: 0.09347 F-statistic: 1890 on 1 and 18328 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) log10(engine.power + 1) -0.9916

Analysis of Variance Table

Response: HC.25

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) log10(engine.power + 1) 1 2.88268 2.882681 1889.863 0 Residuals 18328 27.95641 0.001525

230

Figure 10-40 QQ and Residual vs. Fitted Plot for HC Model 3.3

The results suggest that by using transformed engine power, the model increases the amount of variance explained in truncated transformed HC from about 5% to about

9%.

Model 3.3 improves R2 relative to Model 3.2. The residuals scatter plot for

Model 3.3 (Figure 10-40) also shows a more reasonably linear relation than Model 2.2

(Figure 10-39). Figure 10-40 also shows that Model 3.3 does a better job in improving the pattern of variance. QQ plot shows general normality with the exceptions arising in the tails.

10.4.2.3.2 Linear Regression Model with Dummy Variables

Figure 10-26 suggests that the relationship between HC and engine power may differ across the engine power ranges. One dummy variable is created to represent different engine power ranges identified in Figure 10-26 for use in linear regression analysis as illustrated below:

231

Engine power (bhp) Dummy1 <54.555 1 >=54.555 0

This dummy variable and the interaction between dummy variable and engine power are then tested to determine whether the use of the variable and interaction can help improve the model.

Y = β0 + β1log10(engine.power+1) + β2 dummy1 + β3dummy1 log10(engine.power+1) + Error (3.4)

Table 10-30 Regression Result for HC Model 3.4

Call: lm(formula = HC.25 ~ log10(engine.power + 1) + dummy1 * log10(engine.power + 1), data = busdata10242006.1.3, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.1278 -0.02305 0.0002278 0.0231 0.314

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1734 0.0042 41.4191 0.0000 log10(engine.power + 1) 0.0171 0.0018 9.4715 0.0000 dummy1 -0.0643 0.0062 -10.3151 0.0000 dummy1:log10(engine.power + 1) 0.0195 0.0039 4.9731 0.0000

Residual standard error: 0.03873 on 18326 degrees of freedom Multiple R-Squared: 0.1084 F-statistic: 742.8 on 3 and 18326 degrees of freedom, the p-value is 0

Analysis of Variance Table

Response: HC.25

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value log10(engine.power + 1) 1 2.88268 2.882681 1921.331 dummy1 1 0.42377 0.423774 282.449 dummy1:log10(engine.power + 1) 1 0.03711 0.037107 24.732 Residuals 18326 27.49553 0.001500

Pr(F) log10(engine.power + 1) 0.000000e+000 dummy1 0.000000e+000 dummy1:log10(engine.power + 1) 6.647205e-007 Residuals

232

Figure 10-41 QQ and Residual vs. Fitted Plot for HC Model 3.4

The results suggest that by using transformed engine power and speed, the model only increases the amount of variance explained in truncated transformed HC from about

9% to about 10%.

Model 3.4 slightly improves R2 relative to Model 3.3. The residuals scatter plot for Model 3.4 (Figure 10-41) is not appreciably better nor does model 3.4 do a better job in improving the pattern of variance. The QQ plot still shows general normality with the exceptions arising in the tails.

10.3.2.3.4 Model Discussions The previous sections outline the model development process from regression tree model, to a simple OLS model, to more complex OLS models. To test whether the linear regression with power was a beneficial addition to the regression tree model, the mean

ERs at HTBR end nodes (single value) are compared to the predictions from the linear regression function with engine power. The results of the performance evaluation are shown in Table 10-31. The improvement in R2 associated with moving toward a linear

233 function of engine power is nearly imperceptible. Hence, the use of the linear regression function will provide almost no significant improve spatial and temporal model prediction capability. This linear regression function might still be improved. Since the

R2 and slope in Table 10-31 are derived by comparing model predictions and actual observations for emission rates, these numbers will be different from what in linear regression models.

Table 10-31 Comparative Performance Evaluation of HC Emission Rate Models

Coefficient of Slope RMSE MPE determination (β1) (R2) Mean ERs 0.000090 1.0001 0.0019072 0.00000022 Linear Regression (Power) 0.016629 0.97936 0.0019879 0.00061206 Linear Regression (Power^0.5) 0.021387 0.74875 0.0019311 0.00040055 Linear Regression (log(Power)) 0.028090 0.86477 0.0019249 0.00040884 Linear Regression 0.036692 1.06020 0.0019151 0.00040366 (log(Power)+ Dummy)

Results suggest that linear regression function with log transformation performs slightly better than the others and that the use of dummy variables can further improve model performance, but again there is almost no perceptible change in terms of explained variance. Although the linear regression function with log transformation and dummy variables performs slightly better than linear regression function with log transformation alone, the revised model introduces additional explanatory variables (dummy variables and the interaction with engine power) and increases the complexity of regression model without significantly improving the model. As discussed in Section 10.3.2.1.4, there is no compelling reason to include the dummy variables in the model, given that: 1) the second model is more complex without significantly improving model performance, and

2) there is no compelling engineering reason at this time to support the difference in model performance within these specific power regions. These dummy variables are,

234 however, worth exploring when additional data from other engine technology groups become available for analysis.

Model 3.3 is recommended as the preliminary ‘final’ model (although one might argue that using directly the regression tree results would also probably be acceptable).

The next step in model evaluation is to once again examine the residuals for the improved model. A principal objective was to verify that the statistical properties of the regression model conform with a set of properties of lease squares estimators. In summary, these properties require that the error terms are normally distributed, have a mean of zero, and have the same variance.

Test for Constancy of Error Variance A plot of the residuals versus the fitted values is useful in identifying any patterns in the residuals. Figure 10-40 plot (a) shows this plot for HC model. Without considering variance due to high emission points and zero load data, it can be seen that there is no obvious pattern in the residuals across the fitted values.

Test of Normality of Error terms The first informal test normally reserved for the test of normality of error terms is a quantile-quantile plot of the residuals. Figure 10-40 plot (d) shows the normal quantile plot of the HC model. The second informal test is to compare actual frequencies of the residuals against expected frequencies under normality. Under normality, we expect 68 percent of the residuals fall between ± MSE and about 90 percent fall between

± 1.645 MSE . Actually, 84.83% of residuals fall within the first limits, while 93.60% of residuals fall within the second limits. Thus, the actual frequencies here are reasonably consistent with those expected under normality. The heavy tails at both ends are a cause for concern, but this is due to the nature of data set. For example, even after the transformation, the response variable is not the real normal distribution.

235 Based on above analysis, final NOx emission model for cruise mode is:

HC = (0.1136 + 0.0426log10(engine.power+1))^4

10.4 Conclusions and Further Considerations

In this research, acceleration mode is defined as “acceleration >1 mph/s”. Data not considered to be in idle, deceleration or acceleration mode will be deemed to be in cruise mode. Compared to cruise mode activity, the engine power is more concentrated in higher engine power ranges (>=200 bhp) for acceleration mode activity.

Inter-bus variability analysis indicated that some of the 15 buses are higher emitters that others (especially noted for HC emissions). However, none of the buses appear to qualify as traditional high-emitters, which would exhibit emission rates of two to three standard deviations above the mean. Hence, it is difficult to classify any of these

15 buses as high emitters for modeling purposes. At this moment, these 15 buses are treated as a whole for model development. Modelers should keep in mind that although no true high-emitters are present in the database, such vehicles may behave significantly differently than the vehicles tested. Hence, data from high-emitting vehicles should be collected and examined in future studies.

Some high HC emissions events are noted in acceleration mode. After screening engine speed, engine power, engine oil temperature, engine oil pressure, engine coolant temperature, ECM pressure, and other parameters, no variables were identified that could be linked to these high emissions events. It may be that these events represent natural variability in onroad emissions, or it may be that some other variable (such as grade or an engine variable that is not measured) may be linked to these events.

Engine power is selected as the most important variable for three pollutants based on HTBR tree models. This finding is consistent to previous research results which verified the important role of engine power (Ramamurthy et al. 1998; Clark et al. 2002;

Barth et al. 2004). The noted HC relationship is significant but fairly weak. Analysis in

236 previous chapters also indicates that engine power is correlated with not only onroad load parameters such as vehicle speed, acceleration, and grade, but also potentially with engine operating parameters such as throttle position and engine oil pressure. On the other hand, engine power in this research is derived from engine speed, engine torque and percent engine load.

The regression tree models still suggest that some other variables, like oil pressure and engine bar pressure, may also impact the HC emissions. Further analysis demonstrates the using engine power only could get the similar explanatory ability as using engine power and other variables. To develop models that are efficient and easy to implement, only engine power is used to develop emission models. However, additional investigation into these variables is warranted as additional detailed data from engine testing become available for analysis.

Given the relationships noted between engine indicated HP and emission rates, it is imperative that data be collected to develop solid relationships in engine power demand models (estimating power demand as a function speed/acceleration, grade, vehicle characteristics, surface roughness, inertial losses, etc.) for use in regional inventory development and microscale impact assessment.

In summary, the modeler recommends acceleration emission models as:

NOx = (-0.0195 + 0.2007log10(engine.power+1) + 0.0019vehicle.speed)^2

CO = 10^(-3.7472 + 1.3412log10(engine.power+1) - 0.0285vehicle.speed)

HC = (0.1136 + 0.0426log10(engine.power+1))^4

237 CHAPTER 11

CRUISE MODE DEVELOPMENT

After developing idle mode definition and emission rate in Chapter 8 and deceleration mode definition and emission rate in Chapter 9, acceleration emission model in Chapter 10, the next task will be develop cruise mode.

11.1 Analysis of Cruise Mode Data

After dividing the database into idle mode, deceleration mode, and acceleration mode, cruise mode data will be the all of the remaining data in the database (i.e. data no previously classified into idle, deceleration, and now acceleration). Unlike the idle and deceleration modes, there is a general relationship between engine power and emission rate for acceleration mode and cruise mode. The engine power distribution for data collected in the cruise mode provided in Table 11-1.

Table 11-1 Engine Power Distribution for Cruise Mode

Engine Power Distribution Pollutants (0 50) (50 100) (100 150) (150 200) >=200 All Number NOx 15885 8988 7173 3536 3792 39374 CO 15834 8940 7145 3529 3770 39218 HC 15481 8600 6830 3394 3715 38020 Percentage NOx 40.34% 22.83% 18.22% 8.98% 9.63% 100.00% CO 40.37% 22.80% 18.22% 9.00% 9.61% 100.00% HC 40.72% 22.62% 17.96% 8.93% 9.77% 100.00%

Emission rate histograms for each of the three pollutants for cruise operations are presented in Figure 11-1. Figure 11-1 shows significant skewness for all three pollutants for cruise mode. There are some high HC emissions events noted in cruise mode. After screening engine speed, engine power, engine oil temperature, engine oil pressure, engine

238 coolant temperature, ECM pressure, and other parameters, no operating parameters appeared to be correlated with the high emissions events.

Figure 11-1 Histograms of Three Pollutants for Cruise Mode

11.1.1 Engine Rate Distribution by Bus in Cruise Mode

Inter-bus response variability for cruise mode operations is illustrated in Figures

11-2 to 11-4 using median and mean of NOx, CO, and HC emission rates. Table 11-2 presents the same information in tabular form. The difference between median and mean is also an indicator of skewness.

239

Figure 11-2 Median and Mean of NOx Emission Rates in Cruise Mode by Bus

Figure 11-3 Median and Mean of CO Emission Rates in Cruise Mode by Bus

240

Figure 11-4 Median and Mean of HC Emission Rates in Cruise Mode by Bus

Table 11-2 Median and Mean of Three Pollutants in Cruise Mode by Bus

NOx CO HC Bus ID Median Mean Median Mean Median Mean Bus 360 0.11666 0.14506 0.01618 0.02891 0.00120 0.00146 Bus 361 0.18479 0.18507 0.01091 0.01389 0.00122 0.00135 Bus 363 0.05924 0.07384 0.00534 0.01341 0.00012 0.00021 Bus 364 0.12779 0.14644 0.01259 0.01875 0.00237 0.00343 Bus 372 0.09092 0.09936 0.01262 0.01704 0.00181 0.00236 Bus 375 0.13714 0.16103 0.01254 0.02383 0.00121 0.00146 Bus 377 0.11139 0.11094 0.01454 0.02559 0.00064 0.00075 Bus 379 0.12570 0.15673 0.01394 0.02298 0.00151 0.00195 Bus 380 0.16713 0.18183 0.01994 0.04532 0.00110 0.00148 Bus 381 0.09227 0.11789 0.01074 0.02505 0.00060 0.00080 Bus 382 0.14987 0.16698 0.01342 0.02544 0.00130 0.00155 Bus 383 0.16355 0.18468 0.00921 0.01949 0.00126 0.00198 Bus 384 0.11597 0.13933 0.00934 0.01903 0.00181 0.00221 Bus 385 0.10244 0.13024 0.01266 0.02066 0.00187 0.00205 Bus 386 0.12254 0.13632 0.01147 0.02197 0.00129 0.00167

241 Figure 11-2 to 11-4 and Table 11-2 illustrate that NOx emissions are more consistent than CO and HC emissions. Across the 15 buses, Bus 380 has the largest median and mean for CO emissions, while Bus 364 has the largest median and mean for

HC emissions. The above figures and table demonstrate that although variability exist across buses, it is difficult to conclude that there are any true “high emitters” in the database. This conclusion is consistent with the result for other three modes. As was also noted in the acceleration mode data, Bus 363 has the smallest mean and median HC emissions compared to the other 14 buses.

11.1.2 Engine Power Distribution by Bus in Cruise Mode

Engine power distribution in cruise mode by bus is shown in Figure 11-5 and

Table 11-3. Bus 361 has the largest 1st Quartile engine power in cruise mode while Bus

377 has the largest median and 3rd Quartile engine power in cruise mode. The maximum power values for each bus match well with the manufacturer’s engine power rating.

Although variability for engine power distribution exist across buses, it is difficult to conclude that such variability is affected by individual buses, bus routes, or other factors.

The relationship between power and emissions appears consistent across the buses for acceleration mode.

242 Table 11-3 Engine Power Distribution in Cruise Mode by Bus

1st 3rd Bus ID Number Min Quartile Median Quartile Max Mean Bus 360 1653 0 14.68 71.25 169.03 275.46 97.70 Bus 361 3140 0 70.13 108.12 140.28 296.91 107.16 Bus 363 3286 0 10.46 47.19 112.37 275.55 71.45 Bus 364 2575 0 14.47 64.30 130.62 275.51 85.56 Bus 372 2278 0 30.13 68.23 118.10 275.49 79.77 Bus 375 2890 0 23.19 72.09 142.47 275.54 94.36 Bus 377 1647 0 17.93 118.01 210.27 275.50 121.33 Bus 379 2544 0 43.51 102.68 165.04 275.57 110.84 Bus 380 1242 0 18.85 91.07 187.71 275.56 109.41 Bus 381 2537 0 6.72 49.18 113.81 275.46 70.68 Bus 382 1208 0 32.39 81.02 124.97 275.55 89.42 Bus 383 3062 0 29.42 77.95 141.19 275.53 90.85 Bus 384 3638 0 21.82 61.20 115.75 275.46 72.69 Bus 385 3327 0 11.86 48.80 102.91 275.47 68.20 Bus 386 4539 0 19.24 53.43 94.38 275.30 61.66

243

Figure 11-5 Histograms of Engine Power in Cruise Mode by Bus

244 11.2 Model Development and Refinement

11.2.1 HTBR Tree Model Development

The potential explanatory variables included in the emission rate model development effort include:

• Vehicle characteristics: model year, odometer reading, bus ID (14 dummy

variables)

• Roadway characteristics: dummy variable for road grade;

• Onroad load parameters: engine power (bhp), vehicle speed (mph),

acceleration (mph/s);

• Engine operating parameters: engine oil temperature(deg F), engine oil

pressure (kPa), engine coolant temperature (deg F), barometric pressure

reported from ECM (kPa);

• Environmental conditions: ambient temperature (deg C), ambient pressure

(mbar), ambient relative humidity (%).

The Hierarchical Tree-Based Regression (HTBR) technique is used first to identify potentially significant explanatory variables and this analysis provides the starting point for conceptual model development. The HTBR model is used to guide the development of an OLS regression model, rather than as a model in its own right. HTBR can be used as a data reduction tool and for identifying potential interactions among the variables. Then OLS regression is used with the identified variables to estimate a preliminary “final” model.

Although evidence in the literature suggests that a logarithmic transformation is most suitable for modeling motor vehicle emissions (Washington 1994; Ramamurthy et al. 1998; Fomunung 2000; Frey et al. 2002), this transformation needs to be verified through Box-Cox procedure. Box-Cox function in Matlab can automatically identify a transformation from the family of power transformations on emission data, ranging from

245 -1.0 to 1.0. The lambdas chosen by Box-Cox procedure for cruise mode are 0.40619 for

NOx, 0.012969 for CO, 0.241 for HC. Box-Cox procedure is only used to provide a guide for selecting a transformation, so overly precise results are not needed (Neter et al.

1996). It is often reasonable to use a nearby lambda value for the power transformation that is easier to understand. Although the lambdas chosen by Box-Cox procedure are different for acceleration and cruise mode, the nearby lambda values are same for these two modes. In summary, the lambda values used for transformations are ½ for NOx, 0 for CO (indicating a log transformation), and ¼ for HC for cruise mode. Figure 11-6 to

11-8 presented histogram, boxplot, and probability plot of truncated emission rate in acceleration mode for NOx, CO, and HC, while Figure 11-9 to 11-11 presented same plots for truncated transformed emission rate for NOx, CO and HC, where a great improvement is noted.

Figure 11-6 Histogram, Boxplot, and Probability Plot of Truncated NOx Emission Rate in Cruise Mode

246

Figure 11-7 Histogram, Boxplot, and Probability Plot of Truncated CO Emission Rate in Cruise Mode

Figure 11-8 Histogram, Boxplot, and Probability Plot of Truncated HC Emission Rate in Cruise Mode

247

Figure 11-9 Histogram, Boxplot, and Probability Plot of Truncated Transformed NOx Emission Rate in Cruise Mode

Figure 11-10 Histogram, Boxplot, and Probability Plot of Truncated Transformed CO Emission Rate in Cruise Mode

248

Figure 11-11 Histogram, Boxplot, and Probability Plot of Truncated Transformed HC Emission Rate in Cruise Mode

11.2.1.1 NOx HTBR Tree Model Development

Figure 11-12 illustrates the initial tree model used for truncated transformed NOx emission rate in cruise mode. Results for initial model are given in Table 11-4. The tree grew into a complex model, with a considerable number of branches and 32 terminal nodes. Figure 11-13 illustrates the amount of deviation explained corresponding to the number of terminal nodes.

249

Figure 11-12 Original Untrimmed Regression Tree Model for Truncated Transformed NOx Emission Rate in Cruise Mode

Figure 11-13 Reduction in Deviation with the Addition of Nodes of Regression Tree for Truncated Transformed NOx Emission Rate in Cruise Mode

250 Table 11-4 Original Untrimmed Regression Tree Results for Truncated Transformed

NOx Emission Rate in Cruise Mode

Regression tree: tree(formula = NOx.50 ~ model.year + odometer + temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + bus360 + bus361 + bus363 + bus364 + bus372 + bus375 + bus377 + bus379 + bus380 + bus381 + bus382 + bus383 + bus384 + bus385 + dummy.grade, data = busdata10242006.1.4, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01) Variables actually used in tree construction: [1] "engine.power" "dummy.grade" "baro" "oil.press" [5] "humidity" "vehicle.speed" "temperature" "bus372" [9] "odometer" "model.year" Number of terminal nodes: 32 Residual mean deviance: 0.005398 = 212.4 / 39340 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -4.634e-001 -4.130e-002 -1.265e-003 -1.315e-016 3.646e-002 1.180e+000

For model application purposes, it is desirable to select a final model specification that balances the model’s ability to explain the maximum amount of deviation with a simpler model that is easy to interpret and apply. Figure 11-7 indicated that reduction in deviation with addition of nodes after 4, although potentially statistically significant, is very small. A simplified tree model was derived which ends in 4 terminal nodes as compared to the 37 terminal nodes in the initial model. The residual mean deviation only increased from 210.2 to 298.9 and yielded a much cleaner model than the initial one.

Results are shown in Table 11-5 and Figure 11-14. Based on above analysis, NOx cruise model will be developed based on this result.

251

Figure 11-14 Trimmed Regression Tree Model for Truncated Transformed NOx Emission Rate in Cruise Mode

Table 11-5 Trimmed Regression Tree Results for Truncated Transformed NOx Emission

Rate in Cruise Mode

Regression tree: snip.tree(tree = tree(formula = NOx.50 ~ model.year + odometer + temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + bus360 + bus361 + bus363 + bus364 + bus372 + bus375 + bus377 + bus379 + bus380 + bus381 + bus382 + bus383 + bus384 + bus385 + dummy.grade, data = busdata10242006.1.4, na.action = na.exclude, mincut = 400,minsize = 800, mindev = 0.01), nodes = c(5., 4., 6., 7.)) Variables actually used in tree construction: [1] "engine.power" Number of terminal nodes: 4 Residual mean deviance: 0.007591 = 298.9 / 39370 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -4.643e-001 -5.592e-002 3.280e-004 -4.143e-016 5.370e-002 1.179e+000 node), split, n, deviance, yval * denotes terminal node

1) root 39374 1095.00 0.3360 2) engine.power<52.525 16280 160.50 0.1831 4) engine.power<19.05 9222 47.70 0.1252 * 5) engine.power>19.05 7058 41.36 0.2588 * 3) engine.power>52.525 23094 285.90 0.4438 6) engine.power<109.555 10186 81.41 0.3791 * 7) engine.power>109.555 12908 128.40 0.4948 *

252 This tree model suggested that engine power is the most important explanatory variable for NOx emissions. This finding is consistent to previous research results which verified the important role of engine power on NOx emissions (Ramamurthy et al. 1998;

Clark et al. 2002; Barth et al. 2004). Analysis in previous chapter also indicates that engine power is correlated with not only onroad load parameters such as vehicle speed, acceleration, and grade, but also engine operating parameters such as throttle position and engine oil pressure. On the other hand, engine power in this research is derived from engine speed, engine torque and percent engine load. So engine power can connect onroad modal activity with engine operating conditions to that extent. This fact strengthens the importance of introduce engine power into the conceptual model and the need to improve the ability to simulate engine power for regional inventory development.

11.2.1.2 CO HTBR Tree Model Development

Figure 11-15 illustrates the initial tree model used for truncated transformed CO emission rate in cruise mode. Results for initial model are given in Table 11-6. The tree grew into a complex model with a considerable number of branches and 65 terminal nodes. Figure 11-16 illustrates the amount of deviation explained corresponding to the number of terminal nodes.

253

Figure 11-15 Original Untrimmed Regression Tree Model for Truncated Transformed CO Emission Rate in Cruise Mode

Figure 11-16 Reduction in Deviation with the Addition of Nodes of Regression Tree for Truncated Transformed CO Emission Rate in Cruise Mode

254 Table 11-6 Original Untrimmed Regression Tree Results for Truncated Transformed CO

Emission Rate in Cruise Mode

Regression tree: tree(formula = log.CO ~ model.year + odometer + temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + bus360 + bus361 + bus363 + bus364 + bus372 + bus375 + bus377 + bus379 + bus380 + bus381 + bus382 + bus383 + bus384 + bus385 + dummy.grade, data = busdata10242006.1.4, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01) Variables actually used in tree construction: [1] "engine.power" "oil.press" "baro" [4] "cool.temperature" "vehicle.speed" "acceleration" [7] "humidity" "odometer" "dummy.grade" [10] "temperature" "eng.bar.press" "model.year" [13] "oil.temperture" Number of terminal nodes: 65 Residual mean deviance: 0.1089 = 4265 / 39150 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -2.335e+000 -1.783e-001 -1.233e-002 1.869e-016 1.691e-001 2.013e+000

For model application purposes, it is desirable to select a final model specification that balances the model’s ability to explain the maximum amount of deviation with a simpler model that is easy to interpret and apply. Figure 11-16 indicated that reduction in deviation with addition of nodes after 4, although potentially statistically significant, is very small. A simplified tree model was derived which ends in 4 terminal nodes as compared to the 67 terminal nodes in the initial model. The residual mean deviation only increased from 4265 to 5698 and yielded a much more efficient model. Results are shown in Table 11-7 and Figure 11-17. The CO cruise emission rate model will be based upon these results.

255

Figure 11-17 Trimmed Regression Tree Model for Truncated Transformed CO Emission Rate in Cruise Mode

Table 11-7 Trimmed Regression Tree Results for Truncated Transformed CO Emission

Rate in Cruise Mode

Regression tree: snip.tree(tree = tree(formula = log.CO ~ model.year + odometer + temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + bus360 + bus361 + bus363 + bus364 + bus372 + bus375 + bus377 + bus379 + bus380 + bus381 + bus382 + bus383 + bus384 + bus385 + dummy.grade, data = busdata10242006.1.4, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01), nodes = c(4., 6., 7., 5.)) Variables actually used in tree construction: [1] "engine.power" Number of terminal nodes: 4 Residual mean deviance: 0.1453 = 5698 / 39210 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -2.679e+000 -2.065e-001 -7.150e-003 -4.942e-015 2.041e-001 2.452e+000 node), split, n, deviance, yval * denotes terminal node

1) root 39218 8170.0 -1.944 2) engine.power<114.355 27187 4482.0 -2.076 4) engine.power<15.445 8414 1639.0 -2.321 * 5) engine.power>15.445 18773 2115.0 -1.967 * 3) engine.power>114.355 12031 2147.0 -1.646 6) engine.power<181.235 7220 1146.0 -1.753 * 7) engine.power>181.235 4811 797.8 -1.487 *

256 This tree model suggested that engine power is the most important explanatory variable for CO emissions. This finding is consistent with NOx emissions. This tree will be used as reference for linear regression model development.

11.2.1.3 HC HTBR Tree Model Development

Figure 11-12 illustrates the initial tree model used for truncated transformed HC emission rate in cruise mode. Results for initial model are given in Table 11-8. The tree grew into a complex model with a considerable number of branches and 61 terminal nodes.

Figure 11-18 Original Untrimmed Regression Tree Model for Truncated Transformed HC Emission Rate in Cruise Mode

257 Table 11-8 Original Untrimmed Regression Tree Results for Truncated Transformed HC

Emission Rate in Cruise Mode

Regression tree: tree(formula = HC.25 ~ model.year + odometer + temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + bus360 + bus361 + bus363 + bus364 + bus372 + bus375 + bus377 + bus379 + bus380 + bus381 + bus382 + bus383 + bus384 + bus385 + dummy.grade, data = busdata10242006.1.4, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01) Variables actually used in tree construction: [1] "bus363" "bus364" "engine.power" [4] "oil.temperture" "odometer" "oil.press" [7] "humidity" "cool.temperature" "bus381" [10] "bus377" "baro" "temperature" [13] "bus372" "vehicle.speed" "dummy.grade" [16] "bus385" Number of terminal nodes: 56 Residual mean deviance: 0.0008147 = 30.93 / 37960 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -1.862e-001 -1.595e-002 -3.021e-003 -1.297e-018 1.230e-002 2.886e-001

Figure 11-18 and Table 11-8 suggest that the tree analysis of HC emission rates identified a number of buses that appear to exhibit significantly different emission rates under all load conditions than the other buses (i.e. some of the bus dummy variables appeared as significant in the initial tree splits). Two bus dummy variables split the data pool at the first two levels of the HC tree model. This same result was noted for these buses in the acceleration mode. Although variety existing for three pollutants across 15 buses, the division was even more obvious for HC emissions (see Figure 11-4 and Table

11-2). Although it is tempting to develop different emission rates for these buses, to reduce emission rate deviation in the sample pool, it is difficult to justify doing so.

Unless these is an obvious reason to classify these three buses as high emitters (i.e. significantly higher than normal emitting vehicles, perhaps by as much as a few standard deviations from the mean), and unless there are enough data to develop separate emission rate models for high emitters, one cannot justify removing the data from the data set.

Until such data exist to justify treating these buses as high emitters, the bus dummy

258 variables for individual buses are removed from the analyses and all 15 buses are treated as part of the whole of the data.

Another tree model was generated excluding the bus dummy variables. However, odometer reading also had to be excluded because the previous “Bus 363<0.5” tree cutpoint was replaced by “odometer>282096” (i.e. was identically correlated to the same bus). This new tree model is illustrated in Figure 11-19 and Table 11-9. The tree model is then trimmed for application purposes, as was done for the NOx and CO models.

Figure 11-19 Trimmed Regression Tree Model for Truncated Transformed HC Emission Rate in Cruise Mode

259 Table 11-9 Trimmed Regression Tree Results for Truncated Transformed HC Emission

Rate in Cruise Mode

Regression tree: snip.tree(tree = tree(formula = HC.25 ~ temperature + baro + humidity + vehicle.speed + oil.temperture + oil.press + cool.temperature + eng.bar.press + engine.power + acceleration + dummy.grade, data = busdata10242006.1.4, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01), nodes = c(15., 28., 2., 29., 6.)) Variables actually used in tree construction: [1] "baro" "engine.power" "oil.temperture" Number of terminal nodes: 5 Residual mean deviance: 0.001207 = 45.87 / 38020 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -1.328e-001 -2.037e-002 -3.530e-003 1.177e-015 1.609e-002 3.256e-001 node), split, n, deviance, yval * denotes terminal node

1) root 38020 71.970 0.1876 2) baro<968.5 2957 2.349 0.1082 * 3) baro>968.5 35063 49.420 0.1943 6) engine.power<12.645 6821 13.850 0.1750 * 7) engine.power>12.645 28242 32.420 0.1989 14) oil.temperture<192.1 26727 29.900 0.2005 28) baro<980.5 11265 9.610 0.1918 * 29) baro>980.5 15462 18.820 0.2068 * 15) oil.temperture>192.1 1515 1.244 0.1706 *

The new tree model suggests that barometric pressure is the most important explanatory variable for HC emission rates. However, this finding is challenged by this fact that all those 2957 data points in the first left hand branch of the tree (barometric pressure < 968.5) belong to Bus 363. Although this dataset was collected under a wide variety of environmental conditions, the scope of barometric pressure was limited for individual buses tested. As reported earlier, Bus 363 exhibited significantly lower HC emissions that the other buses (see Figure 11-4). But, the reason is not clear at this time.

To develop a reasonable tree model given the limited data collected, the environmental parameters are excluded from the model until a greater distribution of environmental conditions can be represented in a test data set. With data collected from a more comprehensive testing program, environmental variables can be integrated into the model directly, or perhaps correction factors for the emission rates can be developed. The secondary trimmed tree is presented in Figure 11-20 and Table 11-10.

260

Figure 11-20 Secondary Trimmed Regression Tree Model for Truncated Transformed HC in Cruise Mode

Table 11-10 Trimmed Regression Tree Results for Truncated Transformed HC in Cruise

Mode

Regression tree: snip.tree(tree = tree(formula = HC.25 ~ engine.power + vehicle.speed + acceleration + oil.temperture + oil.press + cool.temperature + eng.bar.press, data = busdata10242006.1.4, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01), nodes = c(6., 5., 7., 4.)) Variables actually used in tree construction: [1] "eng.bar.press" "oil.press" "engine.power" Number of terminal nodes: 4 Residual mean deviance: 0.00148 = 56.27 / 38020 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -1.310e-001 -2.290e-002 -2.164e-003 1.281e-015 1.942e-002 3.220e-001 node), split, n, deviance, yval * denotes terminal node

1) root 38020 71.970 0.1876 2) eng.bar.press<99.9348 10827 24.640 0.1656 4) oil.press<345.25 4965 10.870 0.1400 * 5) oil.press>345.25 5862 7.754 0.1873 * 3) eng.bar.press>99.9348 27193 40.010 0.1963 6) engine.power<13.975 5879 12.660 0.1786 * 7) engine.power>13.975 21314 24.990 0.2012 *

The tree model excluding bus dummy variables, odometer readings, and environmental conditions is shown in Figure 11-20 and Table 11-11. This final tree model suggests that engine power is the most important explanatory variable for HC emissions. This finding is consistent with analysis of NOx and CO emission rates.

261 Although engine operating parameters such as oil pressure might impact emissions, such variables are not easy to implement in real-world models. After excluding engine barometric pressure and oil pressure from the tree model, leaving engine power only, the residual mean deviation increased slightly from 56.27 to 65.56. The final HTBR tree for HC emissions are shown in Figure 11-21 and Table 11-11. HC acceleration emission rate model will be developed based upon these results.

Figure 11-21 Final Regression Tree Model for Truncated Transformed HC and Engine Power in Cruise Mode

262 Table 11-11 Final Regression Tree Results for Truncated Transformed HC and Engine

Power in Cruise Mode

Regression tree: snip.tree(tree = tree(formula = HC.25 ~ engine.power, data = busdata10242006.1.4, na.action = na.exclude, mincut = 400, minsize = 800, mindev = 0.01), nodes = c(11., 10., 3.)) Number of terminal nodes: 4 Residual mean deviance: 0.001725 = 65.56 / 38020 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -1.372e-001 -2.070e-002 -6.875e-004 1.742e-015 2.090e-002 3.309e-001 node), split, n, deviance, yval * denotes terminal node

1) root 38020 71.970 0.1876 2) engine.power<15.335 8298 21.630 0.1666 4) engine.power<0.265 4617 9.741 0.1757 * 5) engine.power>0.265 3681 11.020 0.1551 10) engine.power<7.875 1746 3.849 0.1390 * 11) engine.power>7.875 1935 6.311 0.1697 * 3) engine.power>15.335 29722 45.660 0.1934 *

11.2.2 OLS Model Development and Refinement

Once a manageable number of modal variables have been identified through regression tree analysis, the modeling process moves into the phase in which ordinary least squares techniques are used to obtain a final model. The research objective here is to identify the extent to which the identified factors influence emission rate in cruise mode. Modelers rely on previous research, a priori knowledge, educated guesses, and stepwise regression procedures to identify acceptable functional forms, to determine important interactions, and to derive statistically and theoretically defensible models.

The final model will be our best understanding about the functional relationship between independent variables and dependent variable.

11.2.2.1 NOx Emission Rate Model Development for Cruise Mode

Based on previous analysis, truncated transformed NOx will serve as independent variable. However, modelers should keep in mind that the comparisons should always be made on the original untransformed scale of Y when comparing the performance of

263 statistical models. HTBR tree model results suggest that engine power is the best one to begin with.

11.2.2.1.1 Linear Regression Model with Engine Power Let’s select engine power to begin with, and estimate the model:

Y = β0 + β1(engine.power) + Error (1.1)

The regression run yields the following results.

Table 11-12 Regression Result for NOx Model 1.1

Call: lm(formula = NOx.50 ~ engine.power, data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.5717 -0.06302 0.006377 0.06653 1.259

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1815 0.0007 242.8528 0.0000 engine.power 0.0018 0.0000 274.7573 0.0000

Residual standard error: 0.09765 on 39372 degrees of freedom Multiple R-Squared: 0.6572 F-statistic: 75490 on 1 and 39372 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) engine.power -0.7526

Analysis of Variance Table

Response: NOx.50

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) engine.power 1 719.8396 719.8396 75491.58 0 Residuals 39372 375.4263 0.0095

The results suggest that engine power explains about 66% of the variance in truncated transformed NOx. F-statistic shows that β1≠0, and the linear relationship is statistically significant. To evaluate the model, residual normality is examined in the QQ plot and check constancy of variance by examining residuals vs. fitted values.

264

Figure 11-22 QQ and Residual vs. Fitted Plot for NOx Model 1.1

The residual plot in Figure 11-22 shows a departure from linear regression assumptions indicating a need to explore a curvilinear regression function. Since the variability at the different X levels appears to be fairly constant, a transformation on X is considered. The reason to consider transformation first is avoiding multicollinearity brought about by adding the second-order of X. Based on the prototype plot in Figure

11-22, the square root transformation and logarithmic transformation are tested. Scatter plots and residual plots based on each transformation should then be prepared and analyzed to determine which transformation is most effective.

Y = β0 + β1engine.power^(1/2) + Error (1.2)

Y = β0 + β1log10(engine.power+1) + Error (1.3)

The result for Model 1.2 will be shown in Table 11-13 and Figure 11-17, while the result for Model 1.3 will be shown in Table 11-14 and Figure 11-18.

265 Table 11-13 Regression Result for NOx Model 1.2

Call: lm(formula = NOx.50 ~ engine.power^(1/2), data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.5007 -0.04881 -0.0008896 0.05047 1.22

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.0874 0.0008 104.1024 0.0000 I(engine.power^(1/2)) 0.0311 0.0001 342.3056 0.0000

Residual standard error: 0.08364 on 39372 degrees of freedom Multiple R-Squared: 0.7485 F-statistic: 117200 on 1 and 39372 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) I(engine.power^(1/2)) -0.8649

Analysis of Variance Table

Response: NOx.50

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) I(engine.power^(1/2)) 1 819.8002 819.8002 117173.2 0 Residuals 39372 275.4656 0.0070

Figure 11-23 QQ and Residual vs. Fitted Plot for NOx Model 1.2

266 Table 11-14 Regression Result for NOx Model 1.3

Call: lm(formula = NOx.50 ~ log10(engine.power + 1), data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.4047 -0.06677 -0.002155 0.06107 1.182

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.0306 0.0012 25.5525 0.0000 log10(engine.power + 1) 0.1895 0.0007 279.4403 0.0000

Residual standard error: 0.09656 on 39372 degrees of freedom Multiple R-Squared: 0.6648 F-statistic: 78090 on 1 and 39372 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) log10(engine.power + 1) -0.9135

Analysis of Variance Table

Response: NOx.50

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) log10(engine.power + 1) 1 728.1347 728.1347 78086.87 0 Residuals 39372 367.1311 0.0093

Figure 11-24 QQ and Residual vs. Fitted Plot for NOx Model 1.3

267 The results suggest that by using square root transformed engine power, the model increases the amount of variance explained in truncated transformed NOx from about 66% (Model 1.1) to about 75% (Model 1.2), while remaing about 66% (Model 1.3) by using log transformed engine power.

Model 1.2 improves the R2 more than does Model 1.3. The residuals scatter plot for Model 1.2 (Figure 11-23) shows a more reasonably linear relation than Model 1.3

(Figure 11-24). Figure 11-23 also shows that Model 1.2 does a better job in improving the pattern of variance. QQ plot shows a kind of normality except two tails.

11.2.2.1.2 Linear Regression Model with Dummy Variables Figure 11-14 suggests that the relationship between NOx and engine power may be somewhat different, across the engine power ranges identified in the tree analysis.

That is, there may be higher or lower NOx emissions in different engine power operating ranges. One dummy variable is created to represent different engine power ranges identified in Figure 11-14 for use in linear regression analysis as illustrated below:

Engine power (bhp) Dummy1 <52.525 1 >=52.525 0

This dummy variable and the interaction between dummy variable and engine power are then tested to determine whether the use of the variables and interactions can help improve the model.

Y = β0 + β1 engine.power^(1/2) + β2 dummy1 + β3 dummy1engine.power^(1/2) +

Error (1.4) The result for Model 1.4 will be shown in Table 11-15 and Figure 11-25.

268 Table 11-15 Regression Result for NOx Model 1.4

Call: lm(formula = NOx.50 ~ engine.power^(1/2) + dummy1 * engine.power^(1/2), data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.4812 -0.04778 0.0001059 0.04843 1.195

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1581 0.0024 65.9078 0.0000 I(engine.power^(1/2)) 0.0254 0.0002 122.2468 0.0000 dummy1 -0.0682 0.0026 -25.9438 0.0000 I(engine.power^(1/2)):dummy1 0.0020 0.0003 6.1264 0.0000

Residual standard error: 0.08224 on 39370 degrees of freedom Multiple R-Squared: 0.7569 F-statistic: 40850 on 3 and 39370 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) I(engine.power^(1/2)) dummy1 I(engine.power^(1/2)) -0.9742 dummy1 -0.9123 0.8888 I(engine.power^(1/2)):dummy1 0.6175 -0.6339 -0.8171

Analysis of Variance Table

Response: NOx.50

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) I(engine.power^(1/2)) 1 819.8002 819.8002 121203.8 0.000000e+000 dummy1 1 8.9202 8.9202 1318.8 0.000000e+000 I(engine.power^(1/2)):dummy1 1 0.2539 0.2539 37.5 9.073785e-010 Residuals 39370 266.2915 0.0068

269

Figure 11-25 QQ and Residual vs. Fitted Plot for NOx Model 1.4

The results suggest that by using dummy variables and interactions with transformed engine power, the model increases the amount of variance explained in truncated transformed NOx from about 75% (Model 1.2) to about 77% (Model 1.4).

Model 1.4 slightly improves the R2 more than does Model 1.2. The residuals scatter plot for Model 1.4 (Figure 11-19) shows a slightly more reasonably linear relation.

Figure 11-19 also shows that Model 1.4 may also do a slightly better job in improving the pattern of variance. The QQ plot shows general normality with the exceptions arising in the tails. However, it is important to note that the model improvement, in terms of amount of variance explained by the model, is marginal at best.

11.2.2.1.3 Model Discussion

Previous sections provide the model development process from one OLS model to another OLS model. To test whether the linear regression with power was a beneficial

270 addition to the regression tree model, the mean ERs at HTBR end nodes (single value) are compared to the predictions from the linear regression function with engine power.

The results of the performance evaluation are shown in Table 11-16. The improvement in R2 associated with moving toward a linear function of engine power is tremendous.

Hence, the use of the linear regression function will provide a significant improve spatial and temporal model prediction capability. But, this linear regression function might still be improved. Since the R2 and slope in Table 11-16 are derived by comparing model predictions and actual observations for emission rates (untransformed y), these numbers are different from what in linear regression models.

The two transforms of engine power were tested; square root transformation or log transformation. The results of the performance evaluation are shown in Table 11-16.

Results suggest that linear regression function with square root transformation performs slightly better.

Given that the regression tree modeling exercise indicated that a number of power cutpoints may play a role in the emissions process, an additional modeling run was conducted. The results of the performance evaluation are shown in Table 11-16.

Analysis results suggest that linear regression function with dummy variable performs slightly better than the model without the power cutpoints.

Table 11-16 Comparative Performance Evaluation of NOx Emission Rate Models

Coefficient of determination Slope (R2) (β1) RMSE MPE Mean ERs 0.00003 0.99995 0.12008 -0.000006 Linear regression (power) 0.52896 0.81404 0.08542 0.01031 Linear regression (power^0.5) 0.61439 0.97511 0.07494 0.00707 Linear regression (log(power)) 0.58666 1.2872 0.08043 0.00933 Linear regression (power^0.5) 0.62666 1.0111 0.07372 0.00704 w/dummy variables

271 Although the linear regression function with dummy variables performs slightly better than linear regression function with square root transformation, it introduces more explanatory variables (dummy variable and the interaction with engine power) and increases the complexity of regression model. There is only one regression function for

Model 1.2 while there are two regression functions for Model 1.4. There is also no obvious reason why the engine may be performing slightly differently within these power regimes, yielding different regression slopes and intercepts. It may be that the fuel injection systems in these engines operate slightly differently under low load (near-idle) and high load conditions. This may be controlled by the engine computer. Or, it may be that there are a sufficient number of low power cruise operations and high power cruise operations that are incorrectly classified, and may be better classified as idle or acceleration events (perhaps due to GPS speed data errors). In any case, because the model with dummy variables does not perform appreciably better than the model without the dummy variables, the dummy variables are not included in the final model selection at this time. These dummy variables are, however, worth exploring when additional data from other engine technology groups become available for analysis. Model 1.2 is selected as the preliminary ‘final’ model.

The next step in model evaluation is to once again examine the residuals for the improved model. A principal objective was to verify that the statistical properties of the regression model conform with a set of properties of lease squares estimators. In summary, these properties require that the error terms are normally distributed, have a mean of zero, and have uniform variance.

Test for Constancy of Error Variance A plot of the residuals versus the fitted values is useful in identifying any patterns in the residuals. Figure 11-17 plot (b) shows this plot for NOx model. Without

272 considering variance due to high emission points and zero load data, there is no obvious pattern in the residuals across the fitted values.

Test of Normality of Error terms The first informal test normally reserved for the test of normality of error terms is a quantile-quantile plot of the residuals. Figure 11-17 plot (d) shows the normal quantile plot of the NOx model. The second informal test is to compare actual frequencies of the residuals against expected frequencies under normality. Under normality, we expect 68 percent of the residuals fall between ± MSE and about 90 percent fall between

± 1.645 MSE . Actually, 81.79% of residuals fall within the first limits, while 94.05% of residuals fall within the second limits. Thus, the actual frequencies here are reasonably consistent with those expected under normality. The heavy tails at both ends are a cause for concern, but are due to the nature of data set. For example, even after the transformation, the response variable is not a true normal distribution. Based on above analysis, the final NOx emission rate model selected for cruise mode is:

NOx = (0.087 + 0.0311(engine.power)^(1/2))^2

Analysis results support that the final NOx emission model is significantly better at explaining variability without making the model too complex. Since there is only one engine type, complexity may not be valid in terms of transferability. This model is specific to the engine classes employed in the transit bus operations. Different models may need to be developed for other engine classes and duty cycles.

11.2.2.2 CO Emission Rate Model Development for Cruise Mode

Based on previous analysis, truncated transformed CO will serve as the independent variable. However, modelers should keep in mind that the comparisons should always be made on the original untransformed scale of Y when comparing

273 statistical models. HTBR tree model results suggest that engine power is the best one to begin with.

11.2.2.2.1 Linear Regression Model with Engine Power Let’s select engine power to begin with, and estimate the model:

Y = β0 + β1engine.power + Error (2.1)

The regression run yields the following results.

Table 11-17 Regression Result for CO Model 2.1

Call: lm(formula = log.CO ~ engine.power, data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -2.779 -0.2088 -0.01417 0.2153 2.376

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -2.2230 0.0030 -751.4277 0.0000 engine.power 0.0033 0.0000 125.1304 0.0000

Residual standard error: 0.3859 on 39216 degrees of freedom Multiple R-Squared: 0.2853 F-statistic: 15660 on 1 and 39216 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) engine.power -0.7525

Analysis of Variance Table

Response: log.CO

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) engine.power 1 2331.251 2331.251 15657.62 0 Residuals 39216 5838.839 0.149

The results suggest that engine power explains about 29% of the variance in truncated transformed CO. F-statistic shows that β1≠0, and the linear relationship is statistically significant. To evaluate the model, the normality is examined in the QQ plot and check constancy of variance by examining residuals vs. fitted values.

274

Figure 11-26 QQ and Residual vs. Fitted Plot for CO Model 2.1

Although the residual plot in Figure 11-20 shows a linear relationship between engine power and truncated transformed CO, the square root transformation and logarithmic transformation are tested to see whether transformation could be help to improve the model. Scatter plots and residual plots based on each transformation should then be prepared and analyzed to decide which transformation is most effective.

Y = β0 + β1engine.power^(1/2) + Error (2.2)

Y = β0 + β1log10(engine.power+1) + Error (2.3)

The result for model 2.2 will be shown in Table 11-18 and Figure 11-27, while the result for model 2.3 will be shown in Table 11-19 and Figure 11-28.

275 Table 11-18 Regression Result for CO Model 2.2

Call: lm(formula = log.CO ~ engine.power^(1/2), data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -2.679 -0.2124 -0.01769 0.2178 2.319

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -2.3645 0.0039 -610.0636 0.0000 I(engine.power^(1/2)) 0.0526 0.0004 125.3638 0.0000

Residual standard error: 0.3857 on 39216 degrees of freedom Multiple R-Squared: 0.2861 F-statistic: 15720 on 1 and 39216 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) I(engine.power^(1/2)) -0.8646

Analysis of Variance Table

Response: log.CO

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) I(engine.power^(1/2)) 1 2337.466 2337.466 15716.09 0 Residuals 39216 5832.624 0.149

Figure 11-27 QQ and Residual vs. Fitted Plot for CO Model 2.2

276 Table 11-19 Regression Result for CO Model 2.3

Call: lm(formula = log.CO ~ log10(engine.power + 1), data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -2.636 -0.2225 -0.0167 0.2193 2.308

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -2.4326 0.0050 -489.4690 0.0000 log10(engine.power + 1) 0.3031 0.0028 107.5567 0.0000

Residual standard error: 0.4011 on 39216 degrees of freedom Multiple R-Squared: 0.2278 F-statistic: 11570 on 1 and 39216 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) log10(engine.power + 1) -0.9132

Analysis of Variance Table

Response: log.CO

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) log10(engine.power + 1) 1 1861.106 1861.106 11568.45 0 Residuals 39216 6308.983 0.161

Figure 11-28 QQ and Residual vs. Fitted Plot for CO Model 2.3

277 The results suggest that by using transformed engine power, the model remains the amount of variance explained in truncated transformed CO about 29% (Model 2.2), even decreases to 23% (Model 2.3).

Considering two kinds of transformation, Model 2.2 improves the R2 more than does Model 2.3. The residuals scatter plot for Model 2.2 (Figure 11-27) shows a more reasonably linear relation than Model 2.3 (Figure 11-28). Figure 11-27 also shows that

Model 2.2 does a better job in improving the pattern of variance comparing with Model

2.3. QQ plot shows a kind of normality except two tails. That means Model 2.1 and

Model 2.2 are both acceptable at this moment.

11.2.2.1.2 Linear Regression Model with Dummy Variables

Figure 11-17 suggests that the relationship between CO and engine power may be somewhat different across the engine power ranges identified in the tree analysis. That is, there may be higher or lower CO emissions in different engine power operating ranges.

One dummy variable is created to represent different engine power ranges identified in

Figure 11-17 for use in linear regression analysis as illustrated below:

Engine power (bhp) Dummy1 <114.355 1 >=114.355 0

This dummy variable and the interaction between dummy variable and engine power are then tested to determine whether the use of the variable and interactions can help improve the model.

Y = β0 + β1 engine.power^(1/2) + β2 dummy1 + β3 dummy1 engine.power^(1/2) +

Error (2.4)

278 Table 11-20 Regression Result for CO Model 2.4

*** Linear Model ***

Call: lm(formula = log.CO ~ engine.power^(1/2) + dummy1 * engine.power^(1/2), data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -2.714 -0.2081 -0.01473 0.2136 2.37

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) -2.6690 0.0250 -106.5896 0.0000 I(engine.power^(1/2)) 0.0772 0.0019 41.2399 0.0000 dummy1 0.3472 0.0254 13.6516 0.0000 I(engine.power^(1/2)):dummy1 -0.0338 0.0020 -17.0016 0.0000

Residual standard error: 0.3836 on 39214 degrees of freedom Multiple R-Squared: 0.2936 F-statistic: 5432 on 3 and 39214 degrees of freedom, the p-value is 0

Analysis of Variance Table

Response: log.CO

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) I(engine.power^(1/2)) 1 2337.466 2337.466 15881.03 0 dummy1 1 18.325 18.325 124.50 0 I(engine.power^(1/2)):dummy1 1 42.545 42.545 289.05 0 Residuals 39214 5771.754 0.147

Figure 11-29 QQ and Residual vs. Fitted Plot for CO Model 2.4

279 Model 2.4 only improves R2 marginally and remains the amount of variance explained in truncated transformed CO about 29%, same as Model 2.1 and Model 2.2.

Model 2.4 slightly improves R2 more than does Model 2.2. The residuals scatter plot for

Model 2.4 (Figure 11-29) shows a reasonably linear relation. Figure 11-29 also shows that model 2.4 does a good job in improving the pattern of variance. QQ plot shows general normality with the exceptions arising in the tails. Until now, these three models

(Model 2.1, Model 2.2, and Model 2.4) are all acceptable.

11.2.2.2.3 Model Discussion

The previous sections outline the model development process from regression tree model, to a simple OLS model, to more complex OLS models. Since the performance of the models are evaluated by comparing model predictions and actual observations for emission rates, the R2 and slope are different from those in previous linear regression models. The results of each step in the model improvement process are presented in

Table 11-21. The mean emission rates at HTBR end nodes (single value) are compared to the results of various linear regression functions with engine power. Since the R2 and slope in Table 11-21 are derived by comparing model predictions and actual observations for emission rates (untransformed y), these numbers are different from what in linear regression models.

Table 11-21 Comparative Performance Evaluation of CO Emission Rate Models

Coefficient of determination Slope (R2) (β1) RMSE MPE Mean ERs 0.000005 1.0000 0.047559 0.0000002 Linear regression (power) 0.08799 1.4222 0.04622 0.00749 Linear regression (power^0.5) 0.08992 1.9840 0.04662 0.00804 Linear regression (log(power)) 0.06592 2.5597 0.04736 0.00866 Linear regression (power^0.5) 0.09152 1.6566 0.04634 0.00777 w/dummy variables

280 The improvement in R2 associated with moving toward a linear function of engine power is significant. Hence, the use of the linear regression function will provide a significant improvement on spatial and temporal model prediction capability. But, this linear regression function might still be improved.

Results suggest that linear regression function with square root transformation performs slightly better than the others and that the use of dummy variables can further improve model performance. However, given the marginal improvement in R2, one could argue that use of the engine power may be just as reasonable considering the slope,

RMSE, and MPE. Although the linear regression function with dummy variables performs slightly better than other linear regression models, it introduces more explanatory variables (dummy variables and the interaction with engine power) and increases the complexity of regression model. As discussed in Section 11.2.2.1, there is no compelling reason to include the dummy variables in the model, given that: 1) the second model is more complex without significantly improving model performance, and

2) there is no compelling engineering reason at this time to support the difference in model performance within these specific power regions. These dummy variables are, however, worth exploring when additional data from other engine technology groups become available for analysis. Considering all four parameters together, Model 2.1 is recommended as the preliminary ‘final’ model. The next step in model evaluation is to once again examine the residuals for the improved model. A principal objective was to verify that the statistical properties of the regression model conform with a set of properties of lease squares estimators. In summary, these properties require that the error terms are normally distributed, have a mean of zero, and have uniform variance.

Test for Constancy of Error Variance

281 A plot of the residuals versus the fitted values is useful in identifying patterns in the residuals. Figure 11-20 plot (b) shows this plot for CO model. Without considering variance due to high emission points and zero load data, there is no obvious pattern in the residuals across the fitted values.

Test of Normality of Error Terms The first informal test normally reserved for the test of normality of error terms is a quantile-quantile plot of the residuals. Figure 11-20 plot (c) shows the normal quantile plot of the CO model. The second informal test is to compare actual frequencies of the residuals against expected frequencies under normality. Under normality, we expect 68 percent of the residuals fall between ± MSE and about 90 percent fall between

± 1.645 MSE . Actually, 95.20% of residuals fall within the first limits, while 96.97% of residuals fall within the second limits. Thus, the actual frequencies here are reasonably consistent with those expected under normality. The heavy tails at both ends are a cause for concern, but are due to the nature of data set. For example, even after the transformation, the response variable is not the real normal distribution.

Based on above analysis, the final CO emission rate model for cruise mode is:

CO = 10^(-2.2230+0.0033engine.power)

11.2.2.3 HC Emission Rate Model Development for Cruise Mode

Based on previous analysis, truncated transformed HC will serve as the independent variable. However, modelers should keep in mind that the comparisons should always be made on the original untransformed scale of Y when comparing statistical models. Previous analysis results suggest that engine power is the best one to begin with.

11.2.2.3.1 Linear Regression Model with Engine Power

Let’s select engine power to begin with, and estimate the model:

282 Y = β0 + β1engine.power + Error (3.1)

The regression run yields the following results.

Table 11-22 Regression Result for HC Model 3.1

Call: lm(formula = HC.25 ~ engine.power, data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.123 -0.0212 0.00002295 0.02228 0.3279

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1769 0.0003 537.0480 0.0000 engine.power 0.0001 0.0000 43.0656 0.0000

Residual standard error: 0.04248 on 38018 degrees of freedom Multiple R-Squared: 0.04651 F-statistic: 1855 on 1 and 38018 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) engine.power -0.7501

Analysis of Variance Table

Response: HC.25

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) engine.power 1 3.34748 3.347484 1854.647 0 Residuals 38018 68.61934 0.001805

The results suggest that engine power explains about 5% of the variance in truncated transformed HC. F-statistic shows that β1≠0, and the linear relationship is statistically significant. To evaluate the model, the normality is examined in the QQ plot and check constancy of variance by examining residuals vs. fitted values.

283

Figure 11-30 QQ and Residual vs. Fitted Plot for HC Model 3.1

The residual plot in Figure 11-30 shows a slight departure from linear regression assumptions indicating a need to explore a curvilinear regression function. Since the variability at the different X levels appears to be fairly constant, a transformation on X is considered. The reason to consider transformation first is avoiding multicollinearity brought about by adding the second-order of X. Based on the prototype plot in Figure

11-30, the square root transformation and logarithmic transformation are tested. Scatter plots and residual plots based on each transformation should then be prepared and analyzed to determine which transformation is most effective.

Y = β0 + β1engine.power^(1/2) + Error (3.2)

Y = β0 + β1log10(engine.power+1) + Error (3.3)

The result for Model 3.2 will be shown in Table 11-23 and Figure 11-31, while the result for Model 3.3 will be shown in Table 11-24 and Figure 11-32.

284 Table 11-23 Regression Result for HC Model 3.2

Call: lm(formula = HC.25 ~ engine.power^(1/2), data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.1233 -0.02113 -0.0002419 0.02195 0.3266

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1700 0.0004 396.7451 0.0000 I(engine.power^(1/2)) 0.0022 0.0000 47.6385 0.0000

Residual standard error: 0.04227 on 38018 degrees of freedom Multiple R-Squared: 0.05633 F-statistic: 2269 on 1 and 38018 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) I(engine.power^(1/2)) -0.8625

Analysis of Variance Table

Response: HC.25

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) I(engine.power^(1/2)) 1 4.05395 4.053948 2269.422 0 Residuals 38018 67.91288 0.001786

Figure 11-31 QQ and Residual vs. Fitted Plot for HC Model 3.2

285 Table 11-24 Regression Result for HC Model 3.3

Call: lm(formula = HC.25 ~ log10(engine.power + 1), data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.127 -0.02073 -0.0003198 0.02203 0.3226

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1653 0.0005 313.2136 0.0000 log10(engine.power + 1) 0.0139 0.0003 46.4046 0.0000

Residual standard error: 0.04233 on 38018 degrees of freedom Multiple R-Squared: 0.05361 F-statistic: 2153 on 1 and 38018 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) log10(engine.power + 1) -0.9114

Analysis of Variance Table

Response: HC.25

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) log10(engine.power + 1) 1 3.85779 3.857786 2153.39 0 Residuals 38018 68.10904 0.001791

Figure 11-32 QQ and Residual vs. Fitted Plot for HC Model 3.3

286 The results suggest that by using transformed engine power, the model remains the amount of variance explained in truncated transformed HC about 5% (Model 2.2 and

Model 2.3). The improvement is very small.

Model 3.2 improves R2 relative to Model 3.3. The scatter plot for Model 3.2

(Figure 11-27) also shows a reasonably linear relation than Model 2.3 (Figure 11-28).

Figure 11-27 also shows that Model 3.2 does a good job in improving the pattern of variance. QQ plot shows general normality with the exceptions arising in the tails.

11.2.2.3.3 Linear Regression Model with Dummy Variables

Figure 11-21 suggests that the relationship between HC and engine power may differ across the engine power ranges. One dummy variable is created to represent different engine power ranges identified in Figure 11-21 for use in linear regression analysis as illustrated below:

Engine power (bhp) Dummy1 <15.335 1 >=15.335 0

This dummy variable and the interaction between dummy variable and engine power are then tested to determine whether the use of the variable and interaction can help improve the model.

Y = β0 + β1 log10(engine.power+1) + β2 dummy1 + β3 dummy1 log10(engine.power+1) + Error (3.4)

287 Table 11-25 Regression Result for HC Model 3.4

*** Linear Model ***

Call: lm(formula = HC.25 ~ log10(engine.power + 1) + dummy1 * log10(engine.power + 1), data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.1292 -0.0209 -0.0007262 0.02123 0.3423

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1695 0.0015 109.7632 0.0000 log10(engine.power + 1) 0.0124 0.0008 15.7058 0.0000 dummy1 0.0022 0.0017 1.3388 0.1807 dummy1:log10(engine.power + 1) -0.0249 0.0012 -20.1153 0.0000

Residual standard error: 0.04184 on 38016 degrees of freedom Multiple R-Squared: 0.07514 F-statistic: 1030 on 3 and 38016 degrees of freedom, the p-value is 0

Analysis of Variance Table

Response: HC.25

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) log10(engine.power + 1) 1 3.85779 3.857786 2203.411 0 dummy1 1 0.84128 0.841276 480.503 0 dummy1:log10(engine.power + 1) 1 0.70843 0.708425 404.624 0 Residuals 38016 66.55934 0.001751

Figure 11-33 QQ and Residual vs. Fitted Plot for HC Model 3.4

288 The results suggest that by using dummy variables and interactions with transformed engine power, the model only increases the amount of variance explained in truncated transformed HC from about 5% to about 8%.

Model 3.4 slightly improved R2 relative to Model 3.2. The F-statistic shows that all β values are not equal to zero, and the linear relationship is statistically significant.

The gap in the residuals plot may be shifted by the difference of two regression functions regarding to the intercept and slope.

11.2.2.3.3 Model Discussion

The previous sections outline the model development process from regression tree model, to a simple OLS model, to more complex OLS models. Since the performance of the models are evaluated by comparing model predictions and actual observations for emission rates, the R2 and slope are different from those in previous linear regression models. To test whether the linear regression with power was a beneficial addition to the regression tree model, the mean ERs at HTBR end nodes (single value) are compared to the predictions from the linear regression function with engine power. The results of the performance evaluation are shown in Table 11-26. The improvement in R2 associated with moving toward a linear function of engine power is nearly imperceptible. Hence, the use of the linear regression function will provide almost no significant improve spatial and temporal model prediction capability. This linear regression function might still be improved. Since the R2 and slope in Table 11-26 are derived by comparing model predictions and actual observations for emission rates (untransformed y), these numbers are different from what in linear regression models.

289 Table 11-26 Comparative Performance Evaluation of HC Emission Rate Models

Coefficient of determination Slope (R2) (β1) RMSE MPE Mean ERs 0.00002 1.00020 0.0020519 0.0000003 Linear regression (power) 0.00766 0.88591 0.0020984 0.00047397 Linear regression 0.00912 0.72400 0.0020845 0.00040936 (power^0.5) Linear regression 0.00950 0.82019 0.0020831 0.00040857 (log(power)) Linear regression (log(power)) w/dummy 0.00939 -1.1423 0.0022933 0.00097449 variables

Results suggest that linear regression function with log transformation performs slightly better than the others and that the use of dummy variables can further improve model performance, but again there is almost no perceptible change in terms of explained variance. Although the linear regression function with log transformation and dummy variables performs slightly better than linear regression function with square root transformation alone, the revised model introduces additional explanatory variables

(dummy variables and the interaction with engine power) and increases the complexity of regression model without significantly improving the model. As discussed in Section

11.2.2.1, there is no compelling reason to include the dummy variables in the model, given that: 1) the second model is more complex without significantly improving model performance, and 2) there is no compelling engineering reason at this time to support the difference in model performance within these specific power regions. These dummy variables are, however, worth exploring when additional data from other engine technology groups become available for analysis.

Model 3.2 is recommended as the preliminary “final” model (although one might argue that using directly the regression tree results would also probably be acceptable).

The next step in model evaluation is to once again examine the residuals for the improved

290 model. A principal objective was to verify that the statistical properties of the regression model conform with a set of properties of lease squares estimators. In summary, these properties require that the error terms are normally distributed, have a mean of zero, and have uniform variance.

Test for Constancy of Error Variance A plot of the residuals versus the fitted values is useful in identifying any patterns in the residuals. Figure 11-28 plot (c) shows this plot for HC model. Without considering variance due to high emission points and zero load data, there is no obvious pattern in the residuals across the fitted values.

Test of Normality of Error terms

The first informal test normally reserved for the test of normality of error terms is a quantile-quantile plot of the residuals. Figure 11-28 plot (d) shows the normal quantile plot of the HC model. The second informal test is to compare actual frequencies of the residuals against expected frequencies under normality. Under normality, we expect 68 percent of the residuals fall between ± MSE and about 90 percent fall between

± 1.645 MSE . Actually, 95.20% of residuals fall within the first limits, while 96.99% of residuals fall within the second limits. Thus, the actual frequencies here are reasonably consistent with those expected under normality. The heavy tails at both ends are a cause for concern, but are due to the nature of data set. For example, even after the transformation, the response variable is not the real normal distribution. The final HC emission rate model selected for cruise mode is:

HC = (0.1700 + 0.0022(engine.power)^(1/2))^4

291 11.3 Conclusions and Further Considerations

In this research, engine power is used as the main explanatory variable to develop cruise emission rate models. The explanatory ability of engine power varies by pollutant.

In general, the relationship between NOx and engine power is more highly correlated than the other two pollutants.

Inter-bus variability analysis indicated that some of the 15 buses are higher emitters that others (especially noted for HC emissions). However, none of the buses appear to qualify as traditional high-emitters, which would exhibit emission rates of two to three standard deviations above the mean. Hence, it is difficult to classify any of these

15 buses as high emitters for modeling purposes. At this moment, these 15 buses are treated as a whole for model development. Modelers should keep in mind that although no true high-emitters are present in the database, such vehicles may behave significantly differently than the vehicles tested. Hence, data from high-emitting vehicles should be collected and examined in future studies.

Some high HC emissions events are noted in cruise mode. After screening engine speed, engine power, engine oil temperature, engine oil pressure, engine coolant temperature, ECM pressure, and other parameters, no variables were identified that could be linked to these high emissions events. It may be that these events represent natural variability in onroad emissions, or it may be that some other variable (such as grade or an engine variable that is not measured) may be linked to these vents.

Engine power is selected as the most important variable for three pollutants based on HTBR tree models. This finding is consistent to previous research results which verified the important role of engine power (Ramamurthy et al. 1998; Clark et al. 2002;

Barth et al. 2004). The noted HC relationship is significant but fairly weak. Analysis in previous chapters also indicates that engine power is correlated with not only onroad load parameters such as vehicle speed, acceleration, and grade, but also potentially with engine operating parameters such as throttle position and engine oil pressure. On the

292 other hand, engine power in this research is derived from engine speed, engine torque and percent engine load.

The regression tree models still suggest that some other variables, like oil pressure and engine bar pressure, may also impact the HC emissions. Further analysis demonstrates the using engine power only could get the similar explanatory ability as using engine power and other variables. To develop models that are efficient and easy to implement, only engine power is used to develop emission models. However, additional investigation into these variables is warranted as additional detailed data from engine testing become available for analysis.

Given the relationships noted between engine indicated HP and emission rates, it is imperative that data be collected to develop solid relationships in engine power demand models (estimating power demand as a function speed/acceleration, grade, vehicle characteristics, surface roughness, inertial losses, etc.) for use in regional inventory development and microscale impact assessment.

In summary, the cruise emission rate models selected for implementation as:

NOx = (0.0087+0.0311 (engine.power)^(1/2))^2

CO = 10^(-2.2230+0.0033engine.power)

HC = (0.1700+0.0022 (engine.power)^(1/2)) ^4

293 CHAPTER 12

MODEL VERIFICATION

In the previous chapters, three statistically-derived modal emission rate models were developed for use in predicting emissions of NOx, CO and HC from transit buses.

This chapter discusses the reasons for using engine power instead of surrogate power variables in emission rate modeling, the necessity of developing linear regression model rather than using mean emission rates, the need to introduce driving mode with load modeling, the possibility of combing acceleration and cruise modes, and other issues.

12.1 Engine Power vs. Surrogate Power Variables

The first step to verify the model is comparing the explanatory power of real load data and surrogate power variable. Different approaches have been proposed by several researchers. The MOVES model employs vehicle specific power (VSP), defined as instantaneous power per unit mass of the vehicle (Jimenez-Palacios 1999).

Vehicle Specific Power (VSP) is a measure of the road load on a vehicle; it is defined as the power per unit mass to overcome road grade, rolling & aerodynamic resistance, and inertial acceleration (Jimenez-Palacios 1999; USEPA 2002b; Nam 2003;

Younglove et al. 2005):

3 VSP = v * (a * (1+ γ ) + g * grade + g *CR ) + 0.5ρ *CD * A* v / m Where:

v: vehicle speed (assuming no headwind) in m/s

a: vehicle acceleration in m/s2

γ : mass factor accounting for the rotational masses (~0.1)

g: acceleration due to gravity

grade: road grade

CR: rolling resistance (~0.0135)

294 ρ : air density (1.2)

CD: aerodynamic drag coefficient A: the frontal area

M: vehicle mass in metric tones

Using typical values for coefficients, in SI units the equation become (CDA/m ~

0.0005)(Younglove et al. 2005):

VSP approach to emission characterization was developed by several researchers

(Jimenez-Palacios 1999; USEPA 2002b; Nam 2003; Younglove et al. 2005) and further developed as part of the MOVES model. The coefficients used to estimate VSP were different in previous research because of the choice of typical values of coefficients.

However, the coefficients given in above equation are specific for light-duty vehicle. For example, a mass factor of 0.1 is not suitable to describe the transit bus characteristics on inertial loss. This surrogate power variable (VSP) is not suitable to compare with engine load data for this study. First, the implementation approach that is used in MOVES is based upon VSP bins, and not on instantaneous VSP. Second, the coefficients given in above equation are specific for light-duty vehicle, not for transit bus.

Other research efforts have used surrogate power variables such as the inertial power surrogate, defined as acceleration times velocity, and drag power surrogate, defined as acceleration times velocity squared (Fomunung 2000). Barth and Frey also used acceleration times velocity for power demand estimation (Barth and Norbeck 1997;

Frey et al. 2002). Both surrogate variables for power demand can be used in comparing of NOx in cruise mode. Using surrogate variables instead of real load data, the model will be:

295 Y = β0 + β1 acceleration + β2 vehicle.speed + β3 vehicle.speed*acceleration

+ β4 vehicle.speed^2*acceleration + Error (1)

Table 12-1 Regression Result for NOx Model 1

Call: lm(formula = NOx.50 ~ vehicle.speed * acceleration + vehicle.speed^2: acceleration, data = busdata10242006.1.4, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.4779 -0.08625 0.001824 0.08759 1.338

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.1996 0.0018 113.0559 0.0000 vehicle.speed 0.0043 0.0001 77.4369 0.0000 acceleration 0.0738 0.0052 14.2957 0.0000 vehicle.speed:acceleration 0.0066 0.0004 15.5704 0.0000 acceleration:I(vehicle.speed^2) -0.0001 0.0000 -13.7590 0.0000

Residual standard error: 0.1323 on 39369 degrees of freedom Multiple R-Squared: 0.3708 F-statistic: 5801 on 4 and 39369 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) vehicle.speed acceleration vehicle.speed -0.9243 acceleration 0.0796 -0.0590 vehicle.speed:acceleration -0.0825 0.0569 -0.9114 acceleration:I(vehicle.speed^2) 0.0782 -0.0593 0.7978

vehicle.speed:acceleration vehicle.speed acceleration vehicle.speed:acceleration acceleration:I(vehicle.speed^2) -0.9678

Analysis of Variance Table

Response: NOx.50

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) vehicle.speed 1 122.5215 122.5215 6999.67 0 acceleration 1 278.9165 278.9165 15934.55 0 vehicle.speed:acceleration 1 1.4036 1.4036 80.19 0 acceleration:I(vehicle.speed^2) 1 3.3136 3.3136 189.31 0 Residuals 39369 689.1106 0.0175

296

Figure 12-1 QQ and Residual vs. Fitted Plot for NOx Model 1

The result suggests that the surrogate variable model can explain about 37 % of the variance in truncated transformed NOx, whereas the OLS model developed in

Chapter 10 explained more than 75% of the cruise mode variance. Considering the theoretical equation of engine power presented much earlier in Chapter 3, the surrogate variables can only represent some, and not all of the components of engine power. Given the importance of engine power in explaining the variability of emissions, it is essential that field data collection efforts include the measurement of indicated load data as well as all of the operating conditions necessary to estimate bhp load when second-by-second emission rate data are collected.

12.2 Mean Emission Rates vs. Linear Regression Model

The modeling approach employed in this research involved the separation of data into separate driving modes for analysis and then applying modeling techniques to derive emission rates as a function of engine load. Although constant emission rates in grams/second were adequate for idle, motoring, and non-motoring deceleration modes, modeling efforts in Chapters 10 and 11 demonstrated that linear regression function should improve spatial and temporal model prediction capability significantly for acceleration and cruise modes. However, one verification comparison that should be

297 undertaken is on the overall benefit of introducing engine load into the modeling regime vs. simply using average emission rate values for each operating mode. This comparison will provide insight into the overall effect of introducing engine load (even though it is only introduced into acceleration and cruise modes).

There are a number of model goodness-of-fit criteria that can be used to assess the difference between the emissions predicted by load-based modal emission rate model and the mode-only emission rate models. Normally, one would compare the alternative model performance for an independent set of data collected from similar vehicles, which is currently not available. Alternatively, model developers would set aside a significant subset of the data in the model development data set so that the data are not used in model development and instead used in model comparisons. However, there were not enough data available to do this. Hence, at this time, the only comparisons that can be made are for alternative model performance using the same data that were used to develop the models presented in this research effort.

The performances of the models are first evaluated by comparing model predictions and actual observations for emission rates. The performance of the model can be evaluated in terms of precision and accuracy (Neter et al. 1996). The R2 value is an indication of precision. Usually, higher R2 values imply a higher degree of precision and less unexplained variability in model predictions than lower R2 values. The slope of the trend line for the observed versus predicted values is an indication of accuracy. A slope of one indicates an accurate prediction, in that the prediction of the model corresponds to an observation.

The models' predictive ability is also evaluated using the root mean square error

(RMSE) and the mean prediction error (MPE) (Neter et al. 1996). The RMSE is a measure of prediction error. When comparing two models, the model with a smaller

RMSE is a better predictor of the observed phenomenon. Ideally, mean predication error is close to zero. RMSE and MPE are calculated as follows:

298

n 1 ) 2 RMSE = ∑(yi − yi ) n i=1 n 1 ) MPE = ∑(yi − yi ) n i=1

To test whether the linear regression with power was a beneficial addition to the regression tree model, the mean ERs at HTBR end nodes (single value) are compared to the predictions from the linear regression function with engine power. The results of the performance evaluation are shown in table 12-2.

Table 12-2 Comparative Performance Evaluation between Mode-Only Models and

Linear Regression Models

Coefficient of Slope (β1) RMSE MPE determination (R2) NOx Mean ERs 0.43800 1.0001 0.08725 0.000002 Linear Regression 0.66519 1.1018 0.07122 0.021463 CO Mean ERs 0.24787 0.9999 0.07406 -0.000004 Linear Regression 0.49055 1.7490 0.06691 0.010285 HC Mean ERs 0.06856 0.9998 0.00190 0.0000005 Linear Regression 0.06766 1.2130 0.00192 0.000223

For NOx and CO, the R2 values indicate that load based modal emission model perform slightly better than mean emission rates and the use of linear regression function can further improve model performance. This reinforces the importance of introducing linear regression functions in acceleration and cruise mode. For HC, there is no discernible difference in model performance. Combining this finding with the performance results for HC noted in Chapters 8 through 11, it may be that using constant emission rates for each operating mode could be justified for this data set. When

299 additional data are collected, researchers should compare mean emission rates approaches to power-based approaches to ensure that power demand models for HC are necessary.

12.3 Mode-specific Load Based Modal Emission Rate Model

vs. Emission Rate Models as a Function of Engine Load

Modal modeling approaches are becoming widely accepted as more accurate in making realistic estimates of mobile source contribution to local and regional air quality.

Research in Georgia Tech has clearly identified that modal operation is a better indicator of emission rates than average speed (Bachman 1998). The analysis of emissions with respect to driving modes, also referred to as modal emissions, has been done in several recent research (Barth et al. 1996; Bachman 1998; Fomunung et al. 1999; Frey et al. 2002;

Nam 2003; Barth et al. 2004). These studies indicated that driving modes might have ability to explain certain portion of variability of emission data. In Chapters 10 and 11, emission rates were derived as a function of driving mode (cruise, idle, acceleration, and deceleration operations) and engine power because previous research efforts had separately suggested that vehicle emission rates were highly correlated with modal activity and engine power. In this research, five driving modes are introduced in total: idle mode, deceleration motoring mode, revised deceleration mode, acceleration mode, and cruise mode.

Chapters 10 and 11 did not compare the combined modal and engine power models to models that use power alone to predict emission rates. To test the effect of adding driving modes in the emission rate model, the derivation of a load-only model for

NOx emissions is illustrated in detail. Load-only CO emissions model and HC emissions model are also derived for comparison purposes and presented in final form (but the detailed regression plots and tables are omitted for the purposes of brevity).

300 As was done in previous chapters, first step for load based only model is to select the most important variable for NOx emissions. When using the entire database at once

(data are not broken into mode subsets for this derivation), the appropriate transformation for NOx is ¼ based on Box-Cox results, rather than the ½ value used in developing models for acceleration and cruise mode (see Chapters 10 and 11). The trimmed HTBR tree models for NOx are illustrated in Figure 12-2 and Table 12-3.

Figure 12-2 Trimmed Regression Tree Model for Truncated Transformed NOx

Table 12-3 Trimmed Regression Tree Results for Truncated Transformed NOx

Regression tree: tree(formula = NOx.25 ~ engine.power + vehicle.speed + acceleration + oil.temperture + oil.press + cool.temperature + eng.bar.press + model.year + odometer + bus360 + bus361 + bus363 + bus364 + bus372 + bus375 + bus377 + bus379 + bus380 + bus381 + bus382 + bus383 + bus384 + bus385 + dummy.grade, data = busdata10242006.1, na.action = na.exclude, mincut = 3000, minsize = 6000, mindev = 0.1) Variables actually used in tree construction: [1] "engine.power" Number of terminal nodes: 4 Residual mean deviance: 0.005837 = 618.6 / 106000 Distribution of residuals: Min. 1st Qu. Median Mean 3rd Qu. Max. -5.187e-001 -4.510e-002 -9.204e-003 3.768e-016 5.004e-002 6.557e-001 node), split, n, deviance, yval * denotes terminal node

301 Table 12-3 Continued

1) root 105976 3058.00 0.4991 2) engine.power<41.535 62441 666.60 0.3823 4) engine.power<4.515 17897 195.50 0.2768 * 5) engine.power>4.515 44544 192.20 0.4246 * 3) engine.power>41.535 43535 316.60 0.6667 6) engine.power<96.255 11504 61.56 0.5926 * 7) engine.power>96.255 32031 169.20 0.6933 *

After testing different transformations for Y and adding dummy variables according to HTBR results, Table 12-4 and Figure 12-3 show that load based only model for NOx emissions is a fairly good model, considering the constancy of error variance and normality of error terms. So, the final load based only model for NOx emissions are:

NOx = (0.2303 + 0.1950log10(engine.power+1))^4

Table 12-4 Regression Result for NOx Load-Based Only Emission Rate Model

Call: lm(formula = NOx.25 ~ log10(engine.power + 1), data = busdata10242006.1, na.action = na.exclude) Residuals: Min 1Q Median 3Q Max -0.4683 -0.04297 -0.01329 0.04138 0.663

Coefficients: Value Std. Error t value Pr(>|t|) (Intercept) 0.2303 0.0005 489.9131 0.0000 log10(engine.power + 1) 0.1950 0.0003 657.2170 0.0000

Residual standard error: 0.0754 on 105974 degrees of freedom Multiple R-Squared: 0.803 F-statistic: 431900 on 1 and 105974 degrees of freedom, the p-value is 0

Correlation of Coefficients: (Intercept) log10(engine.power + 1) -0.8702

Analysis of Variance Table

Response: NOx.25

Terms added sequentially (first to last) Df Sum of Sq Mean Sq F Value Pr(F) log10(engine.power + 1) 1 2455.676 2455.676 431934.2 0 Residuals 105974 602.494 0.006

302

Figure 12-3 QQ and Residual vs. Fitted Plot for Load-Based Only NOx Emission Rate Model

Following the same derivation techniques, the final load-only model for CO emissions is:

CO = 10^(-2.6590 + 0.0899(engine.power)^(1/2))

Following the same derivation techniques, the final load-only model for HC emissions is:

HC = 10^(-3.3062 + 0.0382(engine.power)^(1/2))

The relative performance of the load-only models to the combined mode and load models developed in Chapters 8 through 11 are presented in Table 12-5.

303 Table 12-5 Comparative Performance Evaluation Between Load-Based Only Emission

Rate (ER) Model and Load-Based Modal Emission Rate Model

Coefficient of Slope RMSE MPE determination (β1) (R2) NOx Load-Only Emission Rate Model 0.71494 1.1810 0.06494 0.011382 Mode/Load Emission Rate Models 0.66519 1.1018 0.07122 0.021463 CO Load-Only Emission Rate Model 0.24629 2.0712 0.07886 0.015568 Mode/Load Emission Rate Models 0.49055 1.7490 0.06691 0.010285 HC Load-Only Emission Rate Model 0.06722 0.9815 0.00197 0.000499 Mode/Load Emission Rate Models 0.06766 1.2130 0.00192 0.000223

For NOx, both models perform well in explaining the variance of emission rates.

This reinforces the importance of including engine power as a variable in explaining the variance of NOx emission rates. Results suggest that mode/load modal emission modeling approach performs slightly better than load-only emission rate models for CO.

For HC, there is no discernible difference in model performance. Combining this finding with the performance results for HC noted in Chapters 8 through 11, it may be that using constant emission rates for each operating mode could be justified for this data set. When additional data are collected, researchers should compare mode-only approaches to power-based approaches to ensure that power demand models for HC are necessary.

12.4 Separation of Acceleration and Cruise Modes

In this research effort, separate models were developed for acceleration and cruise modes (Chapters 10 and 11). However, some have suggested that it may be possible to combine acceleration and cruise mode activity into a new “combined driving” mode. As noted in Chapter 10, although engine power distribution for acceleration mode is different from what for cruise mode, these two modes share a similar pattern. A quick

304 analysis of the impact of combining acceleration and cruise mode is presented in this section.

After examining HTBR results, selecting the important explanatory variables, testing different transformation for X and Y, and adding dummy variables according to

HTBR results, the final NOx emission model for combined driving mode is:

NOx = ( 0.1134 + 0.0266(engine.power^(1/2))^2

At the same time, the final CO emission model for combined driving mode is:

CO = 10^(-2.2376 + 0.0043(engine.power))

While the final HC emission model for combined driving mode is:

HC = (0.1668 + 0.0028(engine.power^(1/2))^4

To test whether combining acceleration and cruise modes together would benefit the load-based modal emission model, the predictions from linear regression function for combined driving mode are compared to the predictions from sub-models for acceleration and cruise mode in load-based modal emission model. Since the other elements are the same for two models, they will be excluded from test. The results of the performance evaluation are shown in table 12-6.

Table 12-6 Comparative Performance Evaluation between Linear Regression with

Combined Mode and Linear Regression with Acceleration & Cruise Modes

Coefficient of Slope (β1) RMSE MPE determination (R2) NOx Combined Driving Mode 0.53087 0.92071 0.08488 0.00840 Acceleration & Cruise Mode 0.52735 0.95320 0.09312 0.03904 CO Combined Driving Mode 0.17676 1.59420 0.10395 0.02305 Acceleration & Cruise Mode 0.45160 1.77460 0.08966 0.01873 HC Combined Driving Mode 0.03381 0.90673 0.00204 0.00042 Acceleration & Cruise Mode 0.04103 0.90482 0.00203 0.00041

305 Results suggest that separate linear regression functions for acceleration and cruise modes perform significantly better than linear regression functions with combined driving mode for CO. For NOx and HC, both models perform similarly with respect to explaining the variance of emission rates. In general, these results support introducing acceleration and cruise mode into conceptual model. However, as new data become available for testing, researchers should examine whether it is reasonable to simply separate idle and deceleration modes from other driving modes and then apply a simple power-based model to the remaining combined driving activity for NOx.

12.5 MOBILE 6.2 vs. Load-Based Modal Emission Rate Model

The final step undertaken in the model verification process was a comparison of prediction results from MOBILE6.2 and load-based modal emission rate model developed in this research. It should be noted in advance that the comparisons are based upon the Ann Arbor transit vehicle test data. These data were used to develop the modal emission rates for this dissertation, but were not used in developing the MOBILE6.2 model. Normally, one would compare alternative model performance using an independent set of data collected from similar vehicles, which is currently not available.

Hence, the comparisons that will be presented are far from unbiased. When new data from an independent test fleet become available, these comparisons should be performed again.

To facilitate the emission rate prediction comparison, lookup tables for

MOBILE6.2 transit bus emission rates on arterial roads were first created for average speeds from 2.5 mph to 65 mph. The MOBILE6.2 calendar year was set to January 2002 since the data set were collected during October 2001. The temperature was set as 75ºF, since the emission rates for transit bus in MOBILE6.2 do not change with temperature.

Emissions predictions from MOBILE6.2 were then obtained by combining lookup tables

306 and corresponding speed values in the AATA data set. The results of the performance evaluation are shown in table 12-7.

Table 12-7 Comparative Performance Evaluation between MOBILE 6.2 and Load-Based

Modal ER Model

Coefficient of Slope RMSE MPE determination (β1) (R2) NOx MOBILE 6.2 0.17187 0.7056 0.10825 0.011217 Load-Based Modal ER Model 0.66519 1.1018 0.07122 0.021463 CO MOBILE 6.2 0.01946 1.6902 0.08516 0.013399 Load-Based Modal ER Model 0.49055 1.7490 0.06691 0.010285 HC MOBILE 6.2 0.04075 0.5837 0.00194 0.000173 Load-Based Modal ER Model 0.06766 1.2130 0.00192 0.000223

Results suggest that load-based modal emission rate model performs significantly better than MOBILE6.2 for NOx and CO, and slightly better for HC. The performance of the load-based modal emission rate model is not surprising because the same data used to develop the model are used in the comparison. Results suggest that load-based modal emission model perform well on explaining the variance of NOx and CO emission rates on microscopic level. The slight difference on RMSE and MPE indicate that both models

(MOBILE6.2 and load-based modal emission model) perform well at the macroscopic level, and should perform similarly when used in regional inventory development.

12.6 Conclusions

In general, the results provided here are encouraging for load based modal emission model. The comparison between engine power and surrogate power variables confirms the important role of engine power in explaining the variability of emissions.

The comparison between load-only emission rate model and load-based modal emission

307 rate model shows that the impact of driving mode on emissions is signficiant for NOx and CO emissions while no such trend is discernible for HC. The comparison between acceleration and cruise modes and combined driving mode indicates that the relationships between engine power and emissions are slightly different for acceleration and cruise modes. Splitting the database into five modes (idle mode, decelerating motoring mode, deceleration non-motoring mode, acceleration mode, and cruise mode) appears warranted.

The data used to develop load based modal emission model in this research are very limited which only contained 15 transit buses. Inter-bus variability is more obvious for HC emissions since Bus 363 has the lowest HC emissions comparing with other 14 buses. This kind of variability might influence the explanatory of modal emission model for HC emissions. When new data become available and these models should be re- derived to obtain further improved performance in applications to the transit bus fleet.

308 CHAPTER 13

CONCLUSIONS

The goal of this research is to provide emission rate models that fill the gap between the existing models and ideal models for predicting emissions of NOx, CO, and

HC from heavy-duty diesel vehicles. The researchers in Georgia Institute of Technology have developed a beta version of the heavy-duty diesel vehicle modal emissions model

(HDDV-MEM), which is based upon vehicle technology groups, engine emission characteristics, and vehicles modal activity (Guensler et al. 2005a). The HDDV-MEM first predicts second-by-second engine power demand as a function of onroad vehicle operating conditions and then applies brake-specific emission rates to these activity predictions. The HDDV-MEM consists of three modules: a vehicle activity module (with vehicle activity tracked by vehicle technology group), an engine power module, and an emission rate module.

Using second-by-second data collected from onroad vehicles, the research effort reported in this thesis developed models to predict emission rates as a function of onroad operating conditions that affect vehicle emissions. Such models should be robust and ensure that assumptions about the underlying distribution of the data are verified and that assumptions associated with applicable statistical methods are not violated. Due to the general lack of data available for development of heavy-duty vehicle modal emission rate models, this study focuses on development of an analytical methodology that is repeatable with different data set collected across space and time. The only acceptable second-by-second data set in which emission rate and applicable load and vehicle activity data had been collected in parallel was the Ann Arbor Transit Agency (AATA) bus emissions database collected by Sensors, Inc. for use by the US Environmental Protection

Agency.

309 The models developed in this dissertation are applicable to transit buses only, and are not applicable to all transit buses (see limitations discussion in Section 13.2).

However, a significant contribution of the research is in the development of the analytical framework established for analysis of second-by-second emission rate data collected in parallel with engine load, and other onroad operating parameters, and in the development of applicable processes for developing statistical models using such data. To demonstrate the capability of the modeling framework, three modal emission rate models have been developed for prediction of NOx, CO and HC emissions from mid-1990s transit buses.

The AATA transit bus data set was first post-processed through a quality control/quality assurance process. Data problems were identified and corrected during this stage of the research effort. The types of errors checked include: loss of data, erroneous ECM data, GPS dropouts, and synchronization errors. Data records for which all data elements were not collected were removed to avoid any bias to the results. No erroneous ECM data were identified. Six buses experienced GPS dropouts and synchronization errors and these problems were treated as described in chapter 4.

Emission rate variability was also assessed across the sample of buses to identify any potential high-emitters that may behave differently than other buses under normal operating conditions and therefore warrant separate model development. However, no high-emitters were identified. To find the true ‘high-emitter’, modelers need to include a representative sample of buses to try to ensure that mean emissions and response rates to operating variables are represented in the data. Since there are only 15 buses in the data set, modelers couldn’t exclude buses that are higher than the others.

Model development then proceeded through a structured series of steps.

Transformations of emission rates (NOx, CO, and HC) were verified through Box-Cox procedure to improve the specific modeling assumption, such as linearity or normality.

HTBR regression tree results were used to identify the most important explanatory variables on emission rates. OLS regression models were developed for transformed

310 emission rates using chosen explanatory variables. Dummy variables which were created to represent the cut points identified in HTBR trees. Interactions effects for identified explanatory variables were also tested to see whether they can improve the model or not.

The models were comparatively evaluated and the most efficient models for each pollutant were selected. By demonstrating its statistical “robustness” and sufficiency in previous chapters, the main goal of this research, that of “developing a new load-based models with significant improvement”, was achieved.

The chapter will review the key accomplishments of this research. The chapter provides the final models selected for implementation begins with a summary of the final models developed for the transit buses, followed immediately by a discussion on the limitations of these models. The chapter concludes with the lessons learned and recommendations on further research.

13.1 Transit Bus Emission Rate Models

The goal of this research was to develop a methodology for creating load-based emission rate models designed to predict emission rates of NOx, CO, and HC from transit buses as a function of onroad operating conditions. The models should be robust and ensure that statistical assumptions in model development are not violated. With limited available data, this study developed a methodology that is repeatable with a different data set from across space and across time. The final estimated models are presented in Table

13-1.

311 Table 13-1 Load Based Modal Emission Models

Driving Mode NOx Idle Mode 0.033415 g/s Decelerating Motoring Mode 0.0097768 g/s Deceleration Non-Motoring 0.045777 g/s Mode Acceleration Mode NOx = (-0.0195 + 0.2007log10(engine.power + 1) + 0.0019vehicle.speed)^2 Cruise Mode NOx = (0.0087 + 0.0311 (engine.power)^(1/2))^2 CO Idle Mode 0.0059439 g/s Decelerating Motoring Mode 0.0052857 g/s Deceleration Non-Motoring 0.0068557 g/s Mode Acceleration Mode CO = 10^(-3.7472 + 1.3412log10(engine.power + 1) - 0.0285vehicle.speed) Cruise Mode CO = 10^(-2.2230+0.0033engine.power) HC Idle Mode 0.00091777 g/s Decelerating Motoring Mode 0.001113 g/s Revised Deceleration Mode 0.001312 g/s Acceleration Mode HC = (0.1136 + 0.0426log10(engine.power + 1))^4 Cruise Mode HC = (0.1700 + 0.0022 (engine.power)^(1/2)) ^4

The transformations employed for the three pollutants in acceleration and cruise modes are different. The predictive capabilities of each of the models for three pollutants are also different. The R2 value is high for NOx and CO emission rates, but very low for

HC emission rates. HC models are not much better than simply using HTBR mean ERs.

The relatively poor performance of the HC models is not an inherent limitation of the modal modeling approach. Instead, it is a result of the lack of availability of a suitable explanatory variable for model development purposes. Although the model with dummy variables and interactions works better, the final model is not necessarily the best fits, but is one that can be readily implemented.

The three models include all of those significant variables identified as affecting gram/second emissions rates, with the exception of those variables that are highly

312 correlated with individual bus ID. Although a few of the vehicles behaved differently from other vehicles, modelers could not reasonably include bus ID as a variable, nor environment parameters of testing since all low barometric pressure tests were conducted on one or two vehicles. Additional exploration of environmental conditions should be conducted by collecting data over a larger fleet under a wider variety of environmental conditions.

The new modal emission rates models all indicate that engine power has a significant impact on the acceleration and cruise emission rates. This strengthens the importance to use load based emission data to develop new emission model and simulate engine power in real world applications. All three models were shown to be robust by use of several statistical measures. Although some departures from accepted norms were noted, they were judged not so serious as to compromise the usefulness of the models, hence no remedial measures were taken.

13.2 Model Limitations

There are several limitations in the models estimated and presented in this work.

Theoretically, the models cannot be used to forecast emissions beyond the domain of variables used in estimating the models. These models were developed from 15 buses equipped with same fuel injection type, catalytic converter type, transmission type, and so on. This means the models couldn’t consider the effect of vehicle technologies on emissions. Another limitation is the consideration about effect of emission control technology deterioration on emission levels since all buses were only 5 or 6 years old at the time testing was conducted. Although the speed/acceleration profiles between AATA dataset and Atlanta bus are similar, this is no way to estimate the changes on vehicle technologies and deterioration on emissions in the current and future fleet in Atlanta.

Such a limitation would introduce obvious uncertainties in use of the model to make predictions for other fleets.

313 The predictive models are derived from a research effort conducted by other parties. The modeling at this time cannot control for those variables for which data were not collected. This inability to control the variables may yield several uncertainties in the models. First, important or useful variables relevant to the effect of emission rates may not have been observed at all. When this happens, it may be difficult to derive a model with sufficient explanatory power, or variables that are selected may simply be correlated to the true causal variables that are affecting instantaneous emission rates. Second, the interpretation of individual variables effects might be limited. For example, the ability of negative load to explain the variability on emissions is limited due to the negative loads recorded as zero.

An additional limitation imposed by the data is the uncertainty introduced by the actual data collection process. The uncertainty in the GPS position will introduce significant instantaneous error in grade computation (grade should be collected by other means than GPS). Although filter limits were imposed on the rate of change of engine speed (RPM), fuel flow, and vehicle speed data, data could yield unreasonable instantaneous vehicle acceleration or deceleration rates, and still be within reasonable absolute limits. This uncertainty may bias predictions.

The possible presence of outliers has the potential to cause a misleading fit by disproportionately pulling the fitted regression line away from the majority of the data points (Neter et al. 1996). Cook’s distance plots indicated that some points do have influence over the regression fit. However, none of these points are indicative of obvious errors in data. As such, it is difficult to determine whether those extreme values were actually outliers or not. It is assumed instead that since the data passed through EPA’s rigorous QA/QC procedures and that no “true” outliers exist and that these high-emission events are representative of events that occur in the real world. As such, all of these data were retained in model development. When additional data become available,

314 researchers should make it a priority to examine these high emissions events to identify the underlying causal factors.

13.3 Lessons Learned

Because driving mode definitions varied across previous research efforts, findings from these efforts are not directly comparable. This study independently developed driving mode definition through comparison across critical values. Suitable modal activity definition can divide the data into several homogeneous groups according to emission rates and driving conditions. Unlike previous research efforts which only present pairwise comparisons of modal average estimates, or HTBR regression tree analysis, this study compared distributions of engine operating characteristics under proposed vehicle mode definitions in defining applicable vehicle modes.

A representative dataset is the most critical issue for the final version development of the proposed model. This issue plays an important role no matter which modeling approach employed. This representative dataset should reflect real world with respects to vehicle emissions and activity patterns. The dataset used for the proposed model is EPA AATA data and includes 15 buses. At the time this research was conducted, the AATA data was the only applicable data set that contained all required data (second-by-second emission rates, engine load, and applicable operating variables) all collected in parallel. New data sets will improve model performance in future.

A combination of tree, and OLS regression method was used to estimate NOx,

CO and HC emission models from EPA’s transit bus database tested by Sensors. The

HTBR technique was used as a tool to reveal underlying data structure and identify useful explanatory variables and was demonstrated as a powerful tool that researchers can deal with large multivariate data sets with mixed mode (discrete and continuous) variables.

315 13.4 Contributions

This research verifies that vehicle emission rates are highly correlated with modal vehicle activity. Further more, the relationship between engine power and emissions is also significant and is quantified for the available data. Research results indicate that engine power is more powerful than surrogate variables in predicting second-by-second grams/second emission rates. Hence, to improve our understanding of emission rates, it is important to examine not only vehicle operating modes, but also engine power distributions. Based upon the important role of engine power in explaining the variability of emissions, it is critical to include the load data measurement (and collection of all onroad operating parameters to estimate load, such as grade) during emission data collection procedure.

Another major contribution of the work is the establishment of a framework for emission rate model development suitable for predicting emissions at microscopic level.

As more databases become available, the model development steps can be re-run to develop a more robust load-based modal emission model based on the same philosophy.

This living modeling framework provide the ability to integrate necessary vehicle activity data and emission rate algorithms to support second-by-second and link-based emissions prediction. Combined with GIS framework, models derived through this methodology will improve spatial/temporal emissions modeling.

13.5 Recommendation for Further Studies

The methodology developed and applied in this research can, and should be used to estimate similar models for the on-road fleet consisting of transit bus and heavy-duty vehicles. Since emissions of these vehicles are heavily dependent on vehicle dynamics

(that is, load and power), a successful validation will provide further evidence of the

“correctness” of the method employed here. When new data become available and these

316 models are re-derived, modelers can expect further improved performance in applications to the transit bus fleet and eventually to other heavy-duty vehicle fleets.

Given the important role of engine power in explaining the variability of emissions, engine load data should be measured during emission data collection procedure and all parameters necessary to estimate onroad load (such as grade and vehicle payload) should be included in the data collection efforts. Similarly, simulation of engine power demand for onroad operations becomes important in the implementation of emission inventory modeling for heavy-duty transit buses. Refinement of roadway characteristic data (grade, etc.) for urban areas is paramount and research efforts that can quantify drivetrain inertial losses under various operating conditions will help enhance modal model development.

Because all buses tested were of the same model with the same engine, the test data were valuable from the perspective of controlling potential explanatory variables related to vehicle characteristics. But, these data simultaneously constrain the ability to explain the effect of vehicle technology groups and deterioration of emission control technologies on emissions data. Expanded data collection efforts should focus on identification of appropriate vehicle technology groups and high-emitting vehicle groups.

In these test programs, it will also be important to test buses under their real-world operating conditions (on a variety of routes, road types and grade, onroad operating conditions, environmental conditions, passenger loadings, etc.) to better reflect the situation in real world. These high-resolution data collection efforts will provide the data needed by modelers to develop new and enhanced modal emission rate models for a variety of heavy-duty vehicle classes.

317 REFERENCES

Ahanotu, D. (1999). Heavy-Duty Vehicle Weight and Horsepower Distributions: Measurement of Class-Specific Temporal and Spatial Variability. School of Civil and Environmental Engineering. Atlanta, GA, Georgia institute of Technology. Ph.D.

AMS. (2005). "A look at U.S. air pollution laws and their amendments." Retrieved July 30, 2005, from http://www.ametsoc.org/sloan/cleanair/cleanairlegisl.html

Avol, e. a. (2001). "Respiratory effects of relocating to areas of differing air pollution levels." Am J Respir Crit Care Med 164: 2067-2072.

Bachman, W. (1998). A GIS-Based Modal Model of Automobile Exhaust Emissions Final Report. Atlanta, GA, Prepared by Georgia Institute of Technology for U.S. Environmental Protection Agency.

Bachman, W., W. Sarasua, et al. (2000). "Modleing Regional Mobile Source Emissions in a GIS Framework." Transportation Research C 8(1-6): 205-229.

Backman, W., W. Sarasua, S. Hallmark, and R. Guensler (2000). "Modleing Regional Mobile Source Emissions in a GIS Framework." Transportation Research C 8(1- 6): 205-229.

Barth, M., F. An, et al. (1996). "Modal Emission Modeling: A Physical Approach." Transportation Research Record 1520: 81-88.

Barth, M., G. Gcora, et al. (2004). A Modal Emission Model for Heavy Duty Diesel Vehicles. 83rd Transportation Research Board Annual Meeting Proceedings (CD- ROM), Washington, DC.

Barth, M. and J. Norbeck (1997). NCHRP Project 25-11: The Development of a Comprehensive Modal Emission Model. Proceedings of the 7th CRC On-Road Vehicle Emissions Workshop, Coordinating Research Council, Atlanta, GA.

Breiman, L., J. Friedman, et al. (1984). Classification and Regression Trees. Wadsworth International Group, Belmont. CA.

Browning, L. (1998). Update of Heavy-Duty Engine Emission Conversion Factors -- Analysis of Fuel Economy, Non-Engine Fuel Economy Improvements and Fuel Densities, U.S. Environmental Protection Agency.

CARB (1991). Modal Acceleration Testing. Mailout No. 91-12; Mobile Source Division; El Monte, CA.

318 CARB (2002). "Heavy-Duty Diesels Compression Ignition Engine Emissions and Testing." California Air Resources Board Emissions Inventory Series 1(10).

CARB. (2004). " California's Air Quality History Key Events." California Air Resources Board Retrieved July 2, 2004, from http://www.arb.ca.gov/html/brochure/history.htm

Carlock, M. A. (1994). An analysis of High Emitting Vehicles in the On-road Vehicle Fleet. 87th Air and Waste Management Association Annual Meeting Proceeding Pittsburgh, PA.

CEDF. (2002). "Nitrogen Oxides: How NOx Emissions Affect Human Health and the Environment."

CFR (2004b). National Primary and Secondary Ambient Air Quality Standards (40CFR50). Code of Federal Regulations. National Archives and Records Administration.

CFR (2004c). Gross Vehicle Weight Rating (40CFR86.1803). Code of Federal Regulations. National Archives and Records Administration.

CFR (2004d). Useful Lift (40CFR86.1805). Code of Ferderal Regulations. National Archives and Records Adminisration.

Chakravart, L. and Roy (1967). Handbook of Methods of Applied Statistics, Volume I, John Wiley.

Clark, N. N., J. M. Kern, et al. (2002). "Factors Affecting Heavy-Duty Diesel Vehicle Emissions." Journal of the Air & Waste Management Association 52: 84-94.

Clark, N. N., A. S. Khan, et al. (2005). Idle Emissions from Heavy-Duty Diesel Vehicles, Center for Alternative Fuels, Engines, and Emissions (CAFEE), Department of Mechanical and Aerospace Engineering, West Virginia University (WVU).

Conover, W. J. (1980). Practical Non-parametric Statistics, John Wiley and Sons; New York, NY.

Copt, S. and S. Heritier (2006). Robust MM-Estimation and Inference in Mixed Linear Modles. Department of Econometrics, Working Papers, University of Sydney.

Davis, W., K. Wark, et al., Eds. (1998). Air Pollution Its Origin and Control. 3rd Edition, 2003 Special Studies. Addison Wesley Longman, Inc. Menlo Park, California.

Denis, M. J. S., P. Cicero-Fernandez, et al. (1994). "Effects of In-Use Driving Conditions and Vehicle/Engine Operating Parameters on “Off-Cycle” Events: Comparison

319 with Federal Test Procedure Conditions." Journal of the Air & Waste Management Association 44(1): 31-38.

DieselNet. (2006). "Heavy-Duty FTP Transient Cycle." Retrieved December 20, 2006, from http://www.dieselnet.com/standards/cycles/ftp_trans.html

Dreher, D. and R. Harley (1998). "A Fuel-Based Inventory for Heavy-Duty Diesel Truck Emissions." Journal of the Air & Waste Management Association 48: 352-358.

Easton, V. J. and J. H. McColl. (2005). "Statistics Glossary." Retrieved March, 28, 2005, from http://www.stats.gla.ac.uk/steps/glossary/index.html.

Ensfield, C. (2002). On-Road Emissions Testing of 18 Tire 1 Passenger Cars and 17 Diesel Powered Public Transport Buses. Saline, MIchigan, Sensors, inc.

FCAP. (2004). " Ambient Air Quality Trends: An Analysis of Data Collected by the U.S. Environmental Protection Agency." Foundation for Clean Air Progress.

Feng, C., S. Yoon, et al. (2005). Data Needs for a Proposed Modal Heavy-Duty Diesel Vehicle Emission Model. 98th Air and Waste Management Association Annual Meeting Proceeding (CD-ROM), Pittsburgh, PA.

Fomunung, I., S. Washington, et al. (1999). "A Statistical Model for Estimating Oxides of Nitrogen Emissions from Light-Duty Motor Vehicles." Transportation Research D 4D(5): 333-352.

Fomunung, I., S. Washington, et al. (2000). "Validation of the MEASURE Automobile Emissions Model: A Statistical Analysis." Journal of Transportation Statistics 3(2): 65-84.

Fomunung, I. W. (2000). Predicting emissions rates for the Atlanta on-road light duty vehiclar fleet as a function of operating modes, control technologies, and engine characteristics. Civil and Environmental Engineering. Atlanta, Georgia Institute of Technology. Ph.D.

Frey, H. C., A. Unal, et al. (2002). Recommended Strategy for On-Board Emission Data Analysis and Collection for the New Generation Model. Raleigh, NC, Prepared by Computational laboratory for Enginer, Air and Risk, Department of Civil Engineering, North Carolina State University, Prepared for U.S. Environmental Protection Agency.

Frey, H. C. and J. Zheng (2001). Methods and Example Case Study for Analysis of Variability and Uncertainty in Emissions Estimation (AUVEE). Research Triangle Park, NC, Prepared by North Carolina State University for Office of Air Quality Planning and Standards, U.S. Environmental Protection Agency.

320 Gajendran, P. and N. N. Clark (2003). "Effect of Truck Operating Weight on Heavy-Duty Diesel Emissions." Environment Science Technology 37: 4309-4317.

Gauderman, e. a. (2002). "Association between air pollution and lung function growth in Southern California children: Results from a second cohort." Am J Resp Crit Care Med 166(1): 74-84.

Gautam, M. and N. Clark (2003). Heavy-Duty Vehicle Chassis Dynamometer Testing for Emissions Inventory, Air Quality Modeling, Source Apportionment and Air Toxics Emission Inventory; Phase I Report. . Coordinating Research Council, Project No. E-55/E-59.

Gillespie, T. (1992). Fundamentals of Vehicle Dynamics. Warrendale, PA, Society of Automotive Engineers, Inc.

Granell, J. L., R. Guensler, et al. (2002). Using Locality-Specific Fleet Distributions in Emissions Inventories: Current Practice, Problems, and Alternatives. 81st Transportation Research Board Annual Meeting Proceedings (CD-ROM), Washington, DC.

Grant, C., R. Guensler, et al. (1996). Variability of Heavy-Duty Vehicle Operating Mode Frequencies for Prediction of Mobile Emissions. 89th Air and Waste Management Association Annual Meeting Proceeding (CD-ROM), Pittsburgh, PA.

Guensler, R. (1993). "Data Needs for Evolving Motor Vehicle Emission Modeling Approaches." American Society of Civil Engineers.

Guensler, R., Y. S., et al. (2005a). Heavy-Duty Diesel Vehicle Modal Emissions Modeling Framework. Regional Applied Research Effort (RARE) Project. Presented to U.S. Environmental Protection Agency, Georgia Institute of Technology.

Guensler, R., Y. S., et al. (2006). Heavy-Duty Diesel Vehicle Modal Emissions Modeling Framework. Regional Applied Research Effort (RARE) Project. Presented to U.S. Environmental Protection Agency, Georgia Institute of Technology.

Guensler, R., D. Sperling, et al. (1991). Uncertainty in the Emission Inventory for Heavy- Duty Diesel-Powered Trucks. 84th Air and Waste Management Association Annual Meeting Proceedings (CD-ROM), Pittsburgh, PA.

Guensler, R., S. Washington, et al. (1998). "Overview of MEASURE Modeling Framework." Proc Conf Transport Plan Air Quality A: 51-70.

Hallmark, S. L. (1999). Analysis and Prediction of Individual Vehicle Activity for Microscopic Traffic Modeling. Civil and Environmental Engineering. Atlanta, Georgia Institute of Technology. Ph.D.

321 Heywood, J. B. (1998). Internal Combustion Engine Fundamentals. New York, The McGraw Hill Publishing Company

HowStuffWorks. (2005). Retrieved December 30, 2005, from http://www.howstuffworks.com

Jimenez-Palacios, J. (1999). Understanding and Quantifying Motor Vehicle Emissions with Vehicle Specific Power and TILDAS Remote Sensing. Cambridge, MA, Massachusetts Institute of Technology. PhD.

Kelly, N. A. and P. J. Groblicki (1993). "Real-World Emissions from a Modern Production Vehicle Driven in Los Angeles." Journal of the Air & Waste Management Association 43(10): 1351-1357.

Kittelson, D. B., D. F. Dolan, et al. (1978). "Diesel Exhaust Particle Size Distribution - Fuels and Additive Effects." SAE Paper NO. 780787.

Koehler, K. J. and K. Larnz (1980). "An empirical investigation of goodness-of-fit statistics for sparse multinomials." Journal of the American Statistical Association 75: 336-344.

Koupal, J., M. Cumberworth, et al. (2002). Draft Design and Implementation Plan for EPA's Multi-Scale Motor Vehicle and Equipment Emissions System (MOVES), U. S. Environmental Protection Agency.

Koupal, J., N. E. Nam, et al. (2004). The MOVES Approach to Modal Emission Modeling. The 14th CRC On-Road Vehicle Emissions Workshop, Coordinating Research Council, San Diego, CA.

Li, L. (2004). Calculating the Confidence Intervals Using Bootstrap, Department of Statistics, University of Toronto, Presented for a project of Ontario Power Generation on October 28, 2004.

Lindhjem, C. and T. Jackson (1999). Update of Heavy-Duty Emission Levels (Model Years 1998-2004+) for Use in MOBILE6, U.S. Environmental Protection Agency.

Lloyd, A. C. and T. A. Cackette (2001). "Diesel Engines: Environmental Impact and Control." Journal of Air and Waste Management Association 51: 809-847.

Nam, E. K. (2003). Proof of Concept Investigation for the Physical Emission Rate Estimator (PERE) to be Used in MOVES, Ford Research and Advanced Engineering.

322 Neter, J., M. H. Kutner, et al. (1996). Applied Linear Statistical Models, McGraw-Hill: Chicago IL.

Newton, K., W. Steeds, et al. (1996). The Motor Vehicle. Warrendale, PA, Society of Automotive Engineers, Inc.

NRC (2000). Modeling Mobile-Source Emissions. Washington, D.C., National Academy Press, National Research Council.

Peters, J. M., et al. (1999). "A study of twelve Southern California communities with differing levels and types of air pollution. II. Effects on pulmonary function." Am J Respir Crit Care Med 159: 768-775.

Prucz, J. C., N. N. Clark, et al. (2001). "Exhaust Emissions from Engines of the Detroit Diesel Corporation in Transit Buses: A Decade of Trends." Environment Science Technology 35: 1755-1764.

Ramamurthy, R. and N. Clark (1999). "Atmospheric Emissions Inventory Data for Heavy-Duty Vehicles." Environmental Science and Technology 33: 55-62.

Ramamurthy, R., N. N. Clark, et al. (1998). "Models for Predicting Transient Heavy Duty Vehicle Emissions." Society of Automotive Engineers SAE 982652.

Roess, R. P., E. S. Prassas, et al. (2004). Traffic Engineering, Pearson Education, Inc.

SCAQMD (2000). Multiple Air Toxics Exposure Study (MATES-II), South Coast Air Quality Management District Governing Board.

Schlappi, M. G., R. G. Marshall, et al. (1993). "Truck travel in the San Francisco Bay Area." Transportation Research Record 1383: 85-94.

Siegel S, C. N. (1988). Non-parametric Statistics for the Behavioural Sciences 2 Edition.

Singer, B. C. and R. A. Harley (1996). "A fuel-based motor vehicle emission inventory." Journal of the Air & Waste Management Association 46: 581-593.

StatsDirect. (2005). "Statistical Help." http://www.statsdirect.com/ Retrieved May 30, 2005.

TRB (1995). Expanding Metropolitan Highways: Implications for Air Quality and Energy Use. Washington, DC, Transportation Research Board, National Academy Press.

USEPA (1993). User's Guide to Mobile5a.

323 USEPA (1995). National Air Quality and Emission Trends Report 1995, Office of Air Quality Planning and Standards, U.S. Environmental Protection Agency.

USEPA (1997). Emissions Standards Reference Guide for Heavy-Duty and Nonroad Engines.

USEPA (1998). Update of Fleet Characterization Data for Use in MOBILE6 - Final Report, U.S. Environmental Protection Agency.

USEPA (2001a). EPA's New Generation Mobile Source Emissions Model: Initial Proposal and Issues, U.S. Environmental Protection Agency.

USEPA (2001a). Update of Heavy-duty Emission Levels (Model Years 1988-2004) for Use in MOBILE6, U.S. Environmental Protection Agency.

USEPA (2001b). Heavy Duty Diesel Fine Particulate Matter Emissions: Development And Application Of On-Road Measurement Capabilities. Research Triangle Park, NC Prepared by National Risk Management Research Laboratory for Office of Air Quality Planning and Standards, U.S. Environmental Protection Agency.

USEPA (2002a). Update Heavy-Duty Engine Emission Conversion Factors for MOBILE6, Analysis of Fuel Economy, Non-Engine Fuel Economy Improvements and Fuel Densities.

USEPA (2002b). Methodology for Developing Modal Emission Rates for EPA’s Multi- Scale Motor Vehicle and Equipment Emission System. Raleigh, NC, Prepared by North Carolina State University for Office of Transportation and Air Quality, U.S. Environmental Protection Agency.

USEPA (2002c). Update Heavy-duty Engine Emission Conversion Factors for MOBILE6: Analysis of BSFCs and Calculation of Heavy-duty Engine Emission conversion Factors.

USEPA (2003). National Air Quality and Emissions Trends Report, 2003 Special Studies Edition. Research Triangle Park, NC, Office of Air Quality and Standards, U.S. Environmental Protection Agency.

USEPA. (2005a). "Fine Particle (PM2.5) Designations." Retrieved October 20, 2005, from http://www.epa.gov/pmdesignations/index.htm.

USEPA. (2006). "National Ambient Air Quality Standards (NAAQS)." Retrieved October 30, 2006, from http://www.epa.gov/air/criteria.html

Washington, S. (1994). Estimation of a vehicular carbon monoxide modal emissions model and assessment of an intelligent transportation technology, University of California at Davis. Ph.D.

324 Washington, S., J. Leonard, et al. (1997b). Forecasting Vehicle Modes of Operation needed as Input to 'Model' Emissions Models. Proceedings of the 4th International Scientific Symposium on Transport and Air Pollution, Lyon, France.

Washington, S., L. F. Mannering, et al. (2003). Statistical and Econometric Methods for Transportation Data Analysis, CRC Pr I Llc.

Washington, S., J. Wolf, et al. (1997a). "Binary Recursive Partitioning Method for Modeling Hot-Stabilized Emissions from Motor Vehicles." Transportation Research Record 1587: 96-105.

Whitley, E. and J. Ball (2002). "Statistics review 6: Nonparametric methods." Critical Care 6: 509-513.

Wolf-Heinrich, H. (1998). Aerodynamics of Road Vehicles, Society Of Locomotive Engineers Inc., USA.

Wolf, J., R. Guensler, et al. (1998). "High Emitting Vehicle Characterization Using Regression Tree Analysis." Transportation Research Record 1641: 58-65.

Yoon, S. (2005c). A New Heavy-Duty Vehicle Visual Classification and Activity Estimation Method For Regional Mobile Source Emissions Modeling. School of Civil and Environmental Engineering. Atlanta, Georgia Institute of Technology. Ph.D.

Yoon, S., H. Li, et al. (2005a). Transit Bus Engine Power Simulation: Comparison of Speed-Acceleration-Road Grade Matrices to Second-by-Second Speed, Acceleration, and Road Grade Data. 98th Air and Waste Management Association Annual Meeting Proceeding (CD-ROM), Pittsburgh, PA.

Yoon, S., H. Li, et al. (2005b). A Methodology for Developing Transit Bus Speed- Acceleration Matrices to be used in Load-Based Mobile Source Emissions Models. 84th Transportation Research Board Annual Meeting Proceedings (CD- ROM), Washington, DC.

Yoon, S., M. Rodgers, et al. (2004b). "Engine and Vehicle Characteristics of Heavy-Duty Diesel Vehicles in the Development of Emissions Inventories: Model Year, Engine Horsepower and Vehicle Weight." Transportation Research Record(1880): 99-107.

Yoon, S., P. Zhang, et al. (2004a). A Heavy-Duty Vehicle Visual Classification Scheme: Heavy-Duty Vehicle Reclassification Method for Mobile Source Emissions Inventory Development. 97th Air and Waste Management Association Annual Meeting Proceeding (CD-ROM), Pittsburgh, PA.

325 Younglove, T., G. Scora, et al. (2005). Designing On-road Vehicle Test Programs for Effective Vehicle Emission Model Development. 84th Transportation Research Board Annual Meeting Proceedings (CD-ROM), Washington, DC.

Zeldovich, Y. B., P. Y. Sadonikov, et al. (1947). "The oxidation of nitrogen in combustion and explosions." Acta Physicochimica USSR 21(4): 577-628.

326 VITA

CHUNXIA FENG

Chunxia Feng earned a Bachelor’s degree in Transportation Engineering from

Northern Jiaotong University, China. She subsequently pursued her graduate studies and received the Master’s degree in 1996. She enrolled as a doctoral student in the School of

Civil and Environmental Engineering at the Georgia Institute of Technology in the fall of

2000. During her way to pursue her Ph.D. degree, she received the Master’s degrees in

Civil Engineering in 2001 and the Master’s degree in Industrial Engineering in 2003.

Her current research interests include air quality and environment, emission modeling, transportation planning and system analysis, and transportation logistics and supply chain.

327