Using Data Analytics for Cost-Effective Prediction

SUMMARY REPORT
Using Data Analytics for Cost-Effective Prediction of Road Conditions: Case of the Pavement Condition Index
FHWA Publication No.: FHWA-HRT-18-065
FHWA Contact: Deborah Walker, HRDI-30, 202-493-3068, [email protected]
https://highways.dot.gov/research
Research, Development, and Technology, Turner-Fairbank Highway Research Center, 300 Georgetown Pike, McLean, VA 22101-2296

Abstract

Municipalities and transportation departments devote considerable effort to collecting data, particularly in relation to road conditions. Many small municipalities do not have sufficient resources to collect data regularly. In larger municipalities, on the other hand, collecting field-based data may have negative impacts in terms of crew safety and traffic interruptions; data analytics could help reduce these negative impacts. In this study, data analytics is used to test if affordable and easy-to-collect data can be used to predict future values of the Pavement Condition Index (PCI). North American transportation departments frequently use the PCI to assess road conditions. To calculate the PCI, transportation departments and municipalities must collect distress data and their severity levels.

In this study, the Long-Term Pavement Performance (LTPP) database was used as the source of data.(1) Because the LTPP database does not include the PCI values of its road sections, the first step in the study was to develop a program to calculate the PCI from the distress values in the LTPP database. Next, a set of pavement attributes was selected, mainly based on the ease of collection and cost effectiveness. The researchers tested the potential importance of these attributes in predicting PCI using seven ranking algorithms and a heuristic feature-selection algorithm.

Two types of decision trees were trained based on 942 examples of asphalt roads. Using combinations of 14 attributes, a set of decision trees was developed to predict the level of PCI deterioration with an accuracy of more than 70 percent. Finally, the accuracy and confusion matrices of different decision trees were compared to test the impact of each attribute on prediction accuracy. This method can help municipalities and transportation departments identify the most significant attributes to accurately predict road performance indicators (PIs).

Introduction

Understanding and tracking PIs of roads, especially the physical PIs, is critical to a successful asset-management plan. A better understanding of PIs helps decisionmakers schedule remedial actions, increase customer satisfaction, and be proactive in budget planning and risk assessment.

Different PIs are used to assess the condition and remaining life of roads. Some of the most popular PIs include the PCI, International Roughness Index (IRI), Structural Condition Index (SCI), and Present Serviceability Index (PSI). Collecting data for these indices could be a challenge for smaller municipalities, which are usually restricted in terms of human and financial resources. For larger municipalities, in addition to costs, collecting field-based data can have negative impacts on crew safety or traffic flow.(2)

Data analytics can provide valuable support for the data-collection and prediction processes. Recently, the availability of increased amounts of data and computational power and, on the other side, the variety of available analytics algorithms have enabled engineers to move from descriptive statistics and simplistic correlation analyses to more sophisticated analytics. Data-mining and machine-learning techniques can detect patterns in large datasets, hence the growing use of analytics for different purposes in a variety of industries.(3)

This paper demonstrates how machine-learning models can help municipalities predict the PCI values of roads using easy-to-collect and cost-effective attributes. Therefore, the rationale behind choosing attributes was not a conventional mechanistic or engineering reasoning. Rather, it was to find out if affordable and accessible data can do the same job.

The scope of this paper is not limited to predicting the conditions of roads using data analytics. The authors also investigated the relative significance of a road’s attributes in its deterioration. This type of analysis can guide municipalities and transportation departments in crafting a more efficient data-collection and -management policy.

In this study, the PCI was chosen because it is commonly used by municipalities and transportation departments in North America. However, the same methodology can be used for analyzing other PIs, such as the IRI, SCI, and PSI. PCI values vary between 0 and 100. A PCI of 100 represents the best possible condition, and 0 represents the worst. Both ASTM and the Ontario Ministry of Transportation have produced detailed guidelines for calculating the PCI. Both sets of guidelines require collecting distress data, such as fatigue cracking, bleeding, edge cracking, rutting, longitudinal and transversal cracking, and raveling.(4,5)

Related Work: Data and Analytics in Asset Management

Without clear understanding of the value and role of datasets in analyzing asset conditions, municipalities might invest in collecting data that are not generating much value or relevant information. After surveying 50 transportation departments in the United States, Pantelias et al. reported that, in many cases, transportation agencies have created vast databases that do not necessarily supply useful information for decisionmaking.(6) Furthermore, they discovered that, in most transportation departments, data collection is still highly subjective and conventional. In other words, data are collected based on past practices and staff experience rather than solid rational analysis of relevance and value added.

Limited research is available about how to define informative data to collect. Pantelias et al. proposed a framework for data collection that aims to support project selection for rehabilitation.(6) The study provided general guidelines and a framework based on literature reviews and survey results. The framework suggests that decisionmakers must study available and missing data and identify data that are necessary to collect. Another study by Woldesenbet et al. used a social-network-analysis approach to model the use of data in generating information and supporting decisionmaking in road management.(7) Using surveys and interviews to create networks of data interrelationships, they assessed how frequently a specific piece of data was used in decisionmaking.

One of the areas of road asset management that could be improved by data analytics is deterioration modeling. Although deterioration modeling is an integral part of asset-management planning, many municipalities overlook it or use generic models. For instance, a recent study in Canada revealed that most small municipalities in Ontario did not incorporate a deterioration model in their asset-management analyses.(8) The same study reported that municipalities that paid attention to deterioration modeling mostly depended on deterministic deterioration curves to predict the conditions of their assets.(8) These deterioration curves have several pitfalls. First, they are deterministic: users have no guidelines on how to add variability to their values when conducting a probabilistic risk analysis. Second, these models are context insensitive; i.e., PCI deterioration curves predict future PCI values merely based on the length of time. These curves overlook other road attributes, such as pavement type, traffic volumes, and climate.

Stochastic deterioration models do not have the disadvantages standardized deterioration curves have.(9) Markovian models, for example, study and estimate deterioration based on probabilistic analysis. Nonetheless, they often disregard the history of deterioration and previous maintenance actions.(9–11) Additionally, they require longitudinal data, which are not easily available.(8,9) Data-analytics tools that learn or detect patterns from a large dataset can be a suitable alternative. Data analysis is a broad term that has been used to refer to a range of methods, from simple statistical analysis to machine-learning and data-mining techniques. In this summary report, data analytics specifically refers to machine learning and data mining only.

Machine-learning and artificial-intelligence algorithms have become popular in civil engineering, including analytics to predict the condition of roads. For instance, Yang et al. used neural networks to predict variations in the crack index of asphalt roads over a short term.(12) Neural networks have a good learning capability; however, large amounts of data are [...] sections, may raise some questions regarding the reliability and robustness of their models.(2)

Researchers have conducted data analysis on data from the LTPP database to model PIs.(2,15,16) Using the historical distress data in the LTPP database and Minnesota road database, Wu developed a methodology to predict the PCI of asphalt roads over time by calculating the current PCI from distress values and predicting future PCI values using PCI master curves.(17) In another study, Meegoda and Gao developed a quantitative relationship between roughness progression and accumulative traffic load, structural number, annual precipitation, and freezing index.(15) Moreover, they used a Weibull distribution to investigate the reliability of roughness progression models.

Objectives: Predictions Using Cost-Effective Data

The first objective of this research
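
The abstract describes comparing decision trees by their accuracy and confusion matrices. As a rough illustration of that comparison step only, the following sketch computes both measures for two sets of predictions. The class labels ("low", "medium", "high"), the function names, and all sample values are assumptions made up for this example; they are not taken from the report or from the LTPP database, and the sketch does not reproduce the authors' program.

```python
from collections import Counter

# Hypothetical labels for the level of PCI deterioration (an assumption,
# not the classes used in the report).
LEVELS = ["low", "medium", "high"]


def confusion_matrix(actual, predicted, labels=LEVELS):
    """Return a nested dict: matrix[actual_label][predicted_label] -> count."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    # Counter returns 0 for pairs that never occur, so every cell is filled.
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}


def accuracy(matrix):
    """Share of examples on the diagonal, i.e. correctly predicted."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return correct / total if total else 0.0


# Made-up observations and predictions, for illustration only.
actual = ["low", "low", "medium", "high", "medium", "high", "low", "high"]
tree_a = ["low", "medium", "medium", "high", "medium", "medium", "low", "high"]
tree_b = ["low", "low", "low", "high", "high", "medium", "medium", "high"]

matrix_a = confusion_matrix(actual, tree_a)
matrix_b = confusion_matrix(actual, tree_b)

print("tree A accuracy:", accuracy(matrix_a))
print("tree B accuracy:", accuracy(matrix_b))
print("tree A, actual 'high' row:", matrix_a["high"])
```

Reading a row of the matrix shows where a tree goes wrong, not just how often: in this made-up data, tree A mistakes one "high" section for "medium". Retraining with one attribute removed and comparing the two matrices is one plausible way to see which deterioration levels that attribute helps to separate.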
