Available online at www.sciencedirect.com

International Journal of Project Management 30 (2012) 470–478 www.elsevier.com/locate/ijproman

Predicting construction cost and schedule success using artificial neural networks ensemble and support vector machines classification models

Yu-Ren Wang ⁎, Chung-Ying Yu, Hsun-Hsi Chan

Dept. of Civil Engineering, National Kaohsiung University of Applied Sciences, 415 Chien-Kung Road, Kaohsiung, 807, Taiwan

Received 11 May 2011; received in revised form 2 August 2011; accepted 15 September 2011

Abstract

It is commonly perceived that how well planning is performed during the early stage has a significant impact on the final project outcome. This paper outlines the development of artificial neural networks (ANNs) ensemble and support vector machines (SVMs) classification models that predict project cost and schedule success, using the status of early planning as the model inputs. Through an industry survey, early planning and project performance information was collected from a total of 92 building projects. The results show that early planning status can be effectively used to predict project success and that the proposed artificial intelligence models produce satisfactory prediction results. © 2011 Elsevier Ltd. APM and IPMA. All rights reserved.

Keywords: Project success; Early planning; Classification model; ANNs ensemble; Support vector machines

⁎ Corresponding author.
0263-7863/$ - see front matter © 2011 Elsevier Ltd. APM and IPMA. All rights reserved. doi:10.1016/j.ijproman.2011.09.002

1. Introduction

In the past few decades, researchers and industry practitioners have recognized the potential impact of early planning on final project outcomes and have started to put more emphasis on the early planning process (Dvir, 2005; Gibson et al., 2006; Hartman and Ashrafi, 2004). In particular, the Construction Industry Institute (CII, a consortium of more than 100 leading owner, engineering-contractor, and supplier firms from both the public and private arenas, based at the University of Texas at Austin) has established several research projects focusing on early planning since the early 1990s. Their results indicate that early planning is a key process in the project life cycle and that how well it is performed affects cost and schedule performance (CII, 1995, 1999, 2006). Results provided by other researchers also confirm that better early planning improves efficiency and thus leads to profitability (Gigado, 2004; Menches and Hanna, 2006). In particular, research has indicated that project definition in the early planning process is an important factor leading to project success (Le et al., 2010; Thomas and Fernández, 2008; Younga and Samson, 2008). Based on these results, this research intends to further investigate this relationship and to examine whether the status of early planning can be used to predict final project outcomes.

To achieve this goal, a scope definition tool, the Project Definition Rating Index (PDRI), is adopted in this research to evaluate the completeness of project scope definition during the early planning stage. As an easy-to-use tool developed by the Construction Industry Institute (CII), the PDRI is a comprehensive, weighted checklist of crucial scope definition elements that have to be addressed in the early planning process (Gibson and Dumont, 1996). The PDRI for Building Projects consists of 64 elements, which are grouped into 11 categories and further grouped into three main sections. The 64 elements are arranged in a score sheet format and supported by 38 pages of detailed descriptions and checklists. For illustration purposes, Section I, Category A of the PDRI for Building Projects (both elements and their weights) is shown in Fig. 1 (CII, 1999). Designed in a score sheet format, the PDRI can be used to measure the status of early project planning, and a score is obtained after the evaluation. With a maximum score of 1000 points, the PDRI is designed so that a lower score indicates a better-defined project scope. Since its introduction, the PDRI has been widely used by the construction industry, especially within CII member companies, to assist with their early planning process (Gibson et al., 2006). Proven as a successful evaluation tool, the PDRI is incorporated in the survey questionnaire for this study to collect early planning related information from the building construction industry in Taiwan. Meanwhile, project outcomes are measured by comparing each project's original cost and schedule estimates with its final cost and schedule.

Fig. 1. PDRI for building projects: category A.

As demonstrated in previous research, statistical analysis techniques, such as linear and other regression methods, and artificial intelligence techniques (i.e. neural networks) have been successfully applied to project performance prediction (Berlin et al., 2009; Kim et al., 2009; Ko and Cheng, 2007; Ling et al., 2008). Among them, artificial intelligence (neural networks) models produce better prediction results than those obtained from regression models (Wang and Gibson, 2010). To further explore the prediction capability of artificial intelligence techniques, this research uses modified neural networks (neural networks ensemble) and support vector machines (SVMs) to develop project success classification models. Classification models are built and tested with data collected from a total of 92 building construction projects in Taiwan.

2. Research methodology

In order to investigate the relationship between the status of early planning (as measured by the PDRI evaluation) and project success, industry data is collected through a questionnaire survey. From late 2007 to early 2010, early planning and project success information is collected from a total of 92 building projects, representing a total construction cost of approximately US$1.1 billion. The sample covers a wide variety of building projects including schools, houses, apartment buildings, hospitals, offices, temples, recreational facilities, hotels, and department stores. The collected information is used to build and test neural networks ensemble (bootstrap aggregating and adaptive boosting) and SVMs prediction models. These techniques are briefly introduced in the following sections.

2.1. Bootstrap aggregating neural networks

The Bootstrap Aggregating (also known as Bagging) neural networks model generates an aggregated ANNs predictor using multiple sets of artificial neural networks. Sets of training data are generated from bootstrap replicates of the learning set and are used to train ANNs models. If the prediction outcome is numerical, the aggregated prediction is the average of the outcomes from the multiple ANNs models. If the prediction outcome is categorical, a plurality vote is conducted to generate the aggregated prediction. For this study, the model prediction result, project success, is obtained by plurality vote from the ANNs classifiers. Previous research has shown, through tests on real and simulated data sets, that bagging can give substantial gains in accuracy (Breiman, 1996). Through the years, bootstrap aggregating neural networks have been applied to areas such as face detection (Rowley et al., 1996), non-linear modeling (Franke and Neumann, 2000; Zhang, 1999), and regression and classification (Kleinbaum et al., 2002; Zhou et al., 2002). It is found that models built from bootstrap aggregated neural networks are more accurate and robust than those built from single neural networks. As a result, this research applies bootstrap aggregating neural networks to model project success. The bootstrap aggregating neural networks development process is briefly described below.

A learning set L is taken from the data set P, and the remaining data form the testing set T (the difference between P and L). The replicate sub-datasets $L^{(B)}$, each consisting of m cases, are drawn randomly from the learning set L with replacement. A total of n sub-datasets are drawn from the bootstrap distribution approximating the distribution underlying L. These n sub-datasets are then fed into n individual ANNs models as training data. The aggregated prediction outcome for these n models is obtained by plurality voting, as illustrated in Fig. 2. Finally, the overall prediction accuracy of the bootstrap aggregating model is examined with the testing dataset T.

Fig. 2. Bootstrap aggregating neural networks.

2.2. Adaptive boosting neural networks

Introduced by Freund and Schapire (1997), the AdaBoost (Adaptive Boosting) algorithm incorporates the ensemble concept that the final classifier is aggregated from multiple classifiers by voting, which is similar to Bagging. Nevertheless, while Bagging generates the multiple training sets in parallel, AdaBoost generates them in sequence, based on the training results of the previous classifier (Bauer and Kohavi, 1999), as shown in Fig. 3. Unlike Bagging, AdaBoost maintains a set of weights over the training set, instead of randomly drawing a series of independent bootstrap samples from the original training dataset (Quinlan, 1996). The weights are used to determine the probabilities that examples are chosen for training in each round. Initially, all the weights are set equal, and these weights are adjusted after each round of training. Weights of incorrectly classified examples are increased so that the classifier is forced to focus on the hard (or misclassified) examples in the training set in the following round of training. The final or combined classifier is a weighted majority vote of the multiple classifiers, where classifiers with higher accuracy are assigned higher weights in the vote (Schapire, 2002). Real-world applications have demonstrated that AdaBoost can significantly improve neural networks classifiers (Schwenk and Bengio, 2000). This research incorporates AdaBoost neural networks to build the prediction model in the hope of improving the performance of the neural networks model. The AdaBoost neural networks development process is briefly described below.

Let $w_x^t$ denote the weight of example $x$ at the $t$-th round of training, where $w_x^1 = 1/m$ for every $x$. At each trial, a neural networks classifier $C_t$ ($t = 1, 2, \ldots, T$) is developed from the given training dataset under the weights $w^t$. The weight $w_x^t$ reflects the probability of occurrence for

Fig. 3. AdaBoost neural networks.
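The bootstrap aggregating procedure of Section 2.1 can be illustrated with a minimal sketch. It is not the authors' implementation: a single-layer perceptron stands in for the ANNs classifiers, the 92 cases are synthetic rather than the survey data, and the function names, the 70/22 split and the choice of n = 11 models are all hypothetical values chosen for illustration.

```python
import random
from collections import Counter


def predict(w, x):
    """Return class 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + w[-1] > 0 else 0


def train_perceptron(samples, epochs=25, lr=0.1):
    """Fit a single-layer perceptron (a stand-in for an ANN classifier)."""
    w = [0.0] * (len(samples[0][0]) + 1)  # feature weights, then the bias
    for _ in range(epochs):
        for x, y in samples:
            err = y - predict(w, x)  # -1, 0 or +1
            for i, xi in enumerate(x):
                w[i] += lr * err * xi
            w[-1] += lr * err
    return w


def split_learning_testing(P, n_learning, rng):
    """Split data set P into a learning set L and a testing set T = P minus L."""
    shuffled = P[:]
    rng.shuffle(shuffled)
    return shuffled[:n_learning], shuffled[n_learning:]


def bagging_fit(L, n, m, rng):
    """Train n classifiers, each on m cases drawn from L with replacement."""
    return [train_perceptron([rng.choice(L) for _ in range(m)]) for _ in range(n)]


def bagging_predict(models, x):
    """Aggregate the individual predictions by plurality vote."""
    votes = Counter(predict(w, x) for w in models)
    return votes.most_common(1)[0][0]


def accuracy(models, T):
    """Share of testing cases whose voted class matches the recorded class."""
    return sum(bagging_predict(models, x) == y for x, y in T) / len(T)


# Synthetic stand-in data: 92 cases with two features (NOT the survey data).
rng = random.Random(0)
P = []
for _ in range(92):
    x = (rng.random(), rng.random())
    P.append((x, 1 if x[0] + x[1] > 1.0 else 0))

L, T = split_learning_testing(P, n_learning=70, rng=rng)  # hypothetical split
models = bagging_fit(L, n=11, m=len(L), rng=rng)          # hypothetical n
print(f"testing-set accuracy: {accuracy(models, T):.2f}")
```

An odd number of models avoids tied votes when there are only two classes; with an even number, `Counter.most_common` would break a tie by the order in which the classes were first seen.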

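The weight bookkeeping described in Section 2.2 can be sketched in the same spirit. The text available here breaks off before the update formulas, so the reweighting rule below follows the AdaBoost.M1 formulation of Freund and Schapire (1997) as it is commonly stated; that the authors used exactly this variant is an assumption. A one-dimensional decision stump stands in for the neural networks classifier, and all function names and the toy data are hypothetical.

```python
import math
import random


def update_weights(weights, correct):
    """One AdaBoost.M1 reweighting step (assumed variant).

    weights: current example weights, summing to 1.
    correct: booleans, True where this round's classifier was right.
    Returns (new_weights, vote_weight), or None when the weighted error
    is 0 or at least 0.5, in which case boosting stops.
    """
    eps = sum(w for w, ok in zip(weights, correct) if not ok)  # weighted error
    if eps == 0.0 or eps >= 0.5:
        return None
    beta = eps / (1.0 - eps)
    # Shrink the weights of correctly classified examples, then renormalise,
    # which raises the relative weight of the misclassified ones.
    scaled = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return [w / total for w in scaled], math.log(1.0 / beta)


def train_stump(sample):
    """Pick the 1-D threshold and polarity with the fewest errors on the sample."""
    best = None
    for thr in sorted({x for x, _ in sample}):
        for polarity in (1, 0):
            errors = sum((polarity if x >= thr else 1 - polarity) != y
                         for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, thr, polarity)
    _, thr, polarity = best
    return lambda x: polarity if x >= thr else 1 - polarity


def adaboost_fit(data, train, rounds, rng):
    """Train classifiers in sequence, resampling the data by the current weights."""
    m = len(data)
    weights = [1.0 / m] * m  # initial weight 1/m for every example
    ensemble = []            # (classifier, vote weight) pairs
    for _ in range(rounds):
        sample = rng.choices(data, weights=weights, k=m)
        clf = train(sample)
        correct = [clf(x) == y for x, y in data]
        step = update_weights(weights, correct)
        if step is None:
            if all(correct):  # perfect round: keep it with unit weight, then stop
                ensemble.append((clf, 1.0))
            break
        weights, alpha = step
        ensemble.append((clf, alpha))
    return ensemble


def adaboost_predict(ensemble, x):
    """Weighted majority vote: each classifier votes with its own weight."""
    totals = {}
    for clf, alpha in ensemble:
        label = clf(x)
        totals[label] = totals.get(label, 0.0) + alpha
    return max(totals, key=totals.get)


# Toy 1-D data with two flipped labels as noise (hypothetical, for illustration).
rng = random.Random(1)
data = [(i / 20, 1 if i >= 10 else 0) for i in range(20)]
data[3] = (data[3][0], 1)
data[15] = (data[15][0], 0)
ensemble = adaboost_fit(data, train_stump, rounds=5, rng=rng)
print(f"classifiers kept: {len(ensemble)}")
```

As a worked check of the update step: with four examples of weight 0.25 each and one misclassified, the weighted error is 0.25, the misclassified example's weight rises to 0.5, each of the other three drops to 1/6, and the classifier's vote weight is ln 3.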
Gigado, K., 2004. Enhancing the prime contractor's pre-construction planning. Journal of Construction Research 5 (1), 87–106.
Hartman, F., Ashrafi, R., 2004. Development of the SMART™ project planning framework. International Journal of Project Management 22, 449–510.
He, J., Hu, H.J., Harrison, R., Tai, P.C., Pan, Y., 2006. Transmembrane segments prediction and understanding using support vector machine and decision tree. Expert Systems with Applications 30 (1), 64–72.
Hsu, C.W., Chang, C.C., Lin, C.J., 2003. A Practical Guide to Support Vector Classification. Technical report, Department of Computer Science, National Taiwan University.
Kim, D.Y., Han, S.H., Kim, H., Park, H., 2009. Structuring the prediction of project performance for international construction projects: a comparative analysis. Expert Systems with Applications 36 ((2) part 1), 1961–1971.
Kleinbaum, D.G., Klein, M., Pryor, R.R., 2002. Logistic Regression: A Self-Learning Text, 2nd ed. Springer, New York.
Ko, C.H., Cheng, M.Y., 2007. Dynamic prediction of project success using artificial intelligence. Journal of Construction Engineering and Management 133 (4), 316–324.
Lam, K.C., Palaneeswaran, E., Yu, C.Y., 2009. A support vector machine model for contractor prequalification. Automation in Construction 18 (3), 321–329.
Le, T., Caldas, C.H., Gibson, G.E., Thole, M., 2010. Assessing scope and managing risk in the highway project development process. Journal of Construction Engineering and Management 135 (9), 900–910.
Ling, F.Y.Y., Low, S.P., Wang, S.Q., Egbelakin, T., 2008. Models for predicting project performance in China using project management practices adopted by foreign AEC firms. Journal of Construction Engineering and Management 134 (12), 983–990.
Liong, S.Y., Sivapragasam, C., 2002. Flood stage forecasting with support vector machines. Journal of the American Water Resources Association 38 (1), 173–186.
Menches, C.L., Hanna, A.S., 2006. Conceptual planning process for electrical construction. Journal of Construction Engineering and Management 132 (12), 1306–1313.
NeuroDimension, Inc., 2011. http://www.neurosolutions.com/2011.
Quinlan, J.R., 1996. Bagging, Boosting, and C4.5. Proceedings of the Thirteenth National Conference on Artificial Intelligence. AAAI Press, pp. 725–730.
Rowley, H.A., Baluja, S., Kanade, T., 1996. Neural network-based face detection. Ph.D. Thesis, Carnegie Mellon University, PA.
Schapire, R.E., 2002. The Boosting Approach to Machine Learning: An Overview. MSRI Workshop on Nonlinear Estimation and Classification.
Schwenk, H., Bengio, Y., 2000. Boosting neural networks. Neural Computation 12 (8), 1869–1887.
Suykens, J.A.K., Gestel, T.V., Brabanter, J.D., Moor, B.D., Vandewalle, J., 2002. Least Squares Support Vector Machines. World Scientific, Singapore.
Suykens, J.A.K., Gestel, T.V., Brabanter, J.D., Moor, B.D., Vandewalle, J., 2010. LS-SVMlab: Least Squares - Support Vector Machines. http://www.esat.kuleuven.be/sista/lssvmlab 2010 (Version 1.7).
The Construction Industry Institute, 1995. Pre-Project Planning Handbook. Special Publication 39–2. The University of Texas at Austin, Austin.
The Construction Industry Institute, 1999. Project Definition Rating Index (PDRI) - Building Projects. Implementation Resource 155–2. The University of Texas at Austin, Austin.
The Construction Industry Institute, 2006. Front End Planning: Break the Rules, Pay the Price. Research Summary 213–1. The University of Texas at Austin, Austin.
Thomas, G., Fernández, W., 2008. Success in IT projects: a matter of definition? International Journal of Project Management 26 (7), 733–742.
Vapnik, V.N., 1995. The Nature of Statistical Learning Theory, 2nd ed. Springer-Verlag, New York.
Wang, Y.R., 2002. Applying the PDRI in Project Risk Management. Ph.D. Thesis, University of Texas at Austin, Austin, TX.
Wang, Y.R., Gibson, G.E., 2010. A study of preproject planning and project success using ANNs and regression models. Automation in Construction 19, 341–346.
Younga, C.S., Samson, D., 2008. Project success and project team management: evidence from capital projects in the process industries. Journal of Operations Management 26 (6), 749–766.
Zhang, J., 1999. Developing robust non-linear models through bootstrap aggregated neural networks. Neurocomputing 25 (1), 93–113.
Zhou, Z.H., Wu, J., Tang, W., 2002. Ensembling neural networks: many could be better than all. Artificial Intelligence 137 (1), 239–263.