Comparison of Artificial Intelligence Techniques for Project Conceptual Cost Prediction

Haytham H. Elmousalami
Project Engineer and PMP at General Petroleum Company (GPC), Egypt. M.Sc., Department of Construction and Utilities, Faculty of Engineering, Zagazig University, Egypt. E-mail: [email protected]

Abstract: Developing a reliable parametric cost model at the conceptual stage of a project is crucial for project managers and decision makers. Existing methods, such as probabilistic and statistical algorithms, have been developed for project cost prediction. However, these methods are unable to produce accurate results for conceptual cost prediction due to small and unstable data samples. Artificial intelligence (AI) and machine learning (ML) offer numerous models and algorithms for supervised regression applications. Therefore, a comparative analysis of AI models is required to guide practitioners to the appropriate model. The study investigates twenty artificial intelligence (AI) techniques for cost modeling, such as the fuzzy logic (FL) model, artificial neural networks (ANNs), multiple regression analysis (MRA), case-based reasoning (CBR), hybrid models, and ensemble methods such as scalable boosting trees (XGBoost). Field canals improvement projects (FCIPs) are used as an actual case study to analyze the performance of the applied ML models. Out of the 20 AI techniques, the results show that the most accurate and suitable method is XGBoost, with 9.091% Mean Absolute Percentage Error (MAPE) and 0.929 adjusted R². Nonlinear adaptability, handling of missing values and outliers, model interpretation, and uncertainty are discussed for the twenty developed AI models.

Keywords: Artificial intelligence, Machine learning, ensemble methods, XGBoost, evolutionary fuzzy rules generation, Conceptual cost, and parametric cost model.

Introduction

Conceptual cost estimation occurs at 0% to 2% of project completion, when limited information about the project is available and uncertainty and unknown risks are high (Hegazy and Ayed, 1998). Conceptual cost prediction is considered one of the fundamental criteria in project decision making and feasibility studies at the early stages of a project. The estimate should be completed within a limited time period. Therefore, an accurate conceptual cost estimate is a challenging task for cost engineers, project managers, and decision makers (Jrade 2000). Parametric cost modeling develops a model based on logical or statistical relations of the key cost drivers, extracted by qualitative techniques (Elmousalami et al. 2018 a) or statistical analyses such as factor analysis (Marzouk and Elkadi 2016) or the stepwise regression technique (Elmousalami et al. 2018 b). The main motivations to automate cost estimation are that:

1. Quantity surveying is a time- and effort-consuming process (Sabol, 2008).
2. Cost estimation is prone to human error and personal judgment, where biases and inaccuracy can exist.
3. A highly accurate and reliable tool is required for project managers and decision makers.

Artificial intelligence (AI) includes powerful techniques to automate cost estimation with high precision based on collected project data. The accuracy of cost prediction is a major criterion in the success of any construction project, where cost overruns are a critical unknown risk, especially with the current emphasis on tight budgets. Moreover, cost overruns can lead to the cancellation of a project (Feng et al. 2010; AACE 2004). Therefore, improving prediction accuracy is the main requirement in developing the cost model. Small data size, missing data values, maintaining uncertainty, computational complexity, and model interpretation are the key challenges during prediction modeling. Accordingly, what is the best technique to model the project conceptual cost?

Research Methodology

The objective of the study is to answer this question by conducting a comparative analysis of AI techniques for conceptual cost modeling. The scope of this study focuses on the most common AI techniques, such as the support vector machine (SVM), fuzzy logic (FL) model, artificial neural networks (ANNs), multiple regression analysis (MRA), case-based reasoning (CBR), hybrid models, ensemble methods such as decision tree (DT), random forest (RF), AdaBoost, and scalable extreme gradient boosting machines


(XGBoost), and evolutionary computing (EC) such as the genetic algorithm (GA).

Fig. 1. Research methodology.

As illustrated in Fig. 1, the first step in the proposed methodology is a literature review of previous practices. The second step is data collection of FCIPs historical cases. The third step is model development based on the AI techniques for cost prediction. The fourth step is model validation, and the final step is model comparison and analysis.

Literature review

Many previous studies have applied AI techniques and ML models. A semilog regression model was developed to produce cost models for residential building projects in Germany with a prediction accuracy of 7.55% (Stoy et al, 2012). Based on 92 building projects, ANNs and SVM have been used to predict cost and schedule success at the conceptual stage; such a model has a prediction accuracy of 92% and 80% for cost success and schedule success, respectively (Wang et al, 2012). Based on 657 building projects in Germany, a multistep ahead approach was conducted to increase the accuracy of the model's prediction (Dursun and Stoy, 2016). Marzouk and Elkadi (2016) have applied ANNs where the MAPE for the test sets was 21.18%. Fan et al. (2006) have developed a decision tree approach for investigating the relationship between house prices and housing characteristics; Monte Carlo simulation and a multiple linear regression model were developed as a benchmark to evaluate the model's performance, where the MAPE was 7.56%. Wang and Ashuri (2016) have developed a highly accurate model based on random tree ensembles to predict the construction cost index, where the model's accuracy reached 0.8%. Williams and Gong (2014) have built a stacking ensemble learning and text mining model to estimate cost overrun using project contract documents, where the accuracy was 44%. Chou and Lin (2012) have established an ensemble learning model of ANNs, SVM, and a decision tree for predicting the potential for disputes in public-private partnership (PPP) projects with an accuracy of 84%. The analytic hierarchy process (AHP) has been incorporated into CBR to build a reliable cost estimation model for highway projects in South Korea (Kim, 2013). However, the main gap of these studies is that they develop deterministic predictive models without taking the uncertainty nature into account, whereas adding uncertainty to the predicted values improves the quality and reliability of the developed models (Zadeh, 1965, 1973). Consequently, fuzzy theory can be applied to add the uncertainty concept to prediction modeling (Zadeh, 1965, 1973). Based on 568 towers, a four-input fuzzy clustering model and sensitivity analysis were conducted for estimating telecommunication towers with acceptable MAPE (Marzouk and Alaraby, 2014). Shreenaath et al. (2015) have conducted a statistical fuzzy approach for prediction of construction cost overrun. An FL model has been developed for satellite cost estimation; such a model works as a fuzzy expert tool for cost prediction based on two input parameters (Karatas and Ince, 2016). However, these studies developed fuzzy systems without mentioning the method of fuzzy rules generation, or the fuzzy rules were developed based on experts' experience. Determining the fuzzy rules is the main gap of the previous studies. Therefore, a new trend has evolved to solve this problem, namely developing hybrid fuzzy modeling for cost estimation purposes, such as evolutionary-fuzzy modeling.

Zhai et al. (2012) have created an improved fuzzy system based on fuzzy c-means (FCM) to solve the problem of fuzzy rules generation. Zhu et al. (2010) have conducted an evolutionary fuzzy neural network model for cost estimation based on eighteen examples and two examples for training and testing, respectively. Cheng and Roy (2010) have developed a hybrid artificial intelligence (AI) system based on the support vector machine (SVM), FL, and GA for decision making.

Application to Field canals improvement projects (FCIPs)

In this section, the selected AI techniques are applied to the conceptual cost prediction of FCIPs in Egypt as an actual case study.


Case Background

FCIPs are among the main projects in the Irrigation Improvement Projects (IIPs) in Egypt. The strategic aim of these projects is to save fresh water and facilitate water usage and distribution among stakeholders and farmers. To finance these projects, conceptual cost models are important to accurately predict preliminary costs at the early stages of the project (Elmousalami et al. 2018 b; Radwan, 2013).

Data collection and feature selection

Elmousalami et al. (2018 a) have conducted qualitative approaches such as the fuzzy Delphi method and the fuzzy analytical hierarchy process to rank the cost drivers. Moreover, Elmousalami et al. (2018 b) have developed a quantitative hybrid approach based on both Pearson correlation and stepwise regression to filter the key cost drivers. The key cost drivers were area served (P1), pipeline total length (P2), number of irrigation valves (P3), and construction year (P4). Accordingly, a total of 144 FCIPs between 2010 and 2015 have been collected. For validation purposes, this collected sample has been randomly split into a training sample (111 instances) and a testing sample (33 instances). The training sample of 111 instances is sufficiently acceptable to train reliable ML models, as Green (1991) concluded that [50 + 8*N] is the minimum sample size, where N is the number of independent variables (key cost drivers).

Artificial intelligence (AI) techniques developments

AI techniques adapt aspects of human knowledge computationally and have become more vital in system modeling than classical mathematical modeling (Bezdek 1994). Based on AI, an intelligent system can be developed to produce consequent outputs and actions depending on the observed inputs and outputs of the system (Siddique and Adeli 2013; Bishop 2006).

Case-based reasoning (CBR)

Case-based reasoning (CBR) is a sustained learning and incremental approach that solves problems by searching for the most similar past case and reusing it for the new problem situation (Aamodt and Plaza 1994). Therefore, CBR mimics human problem solving (Ross 1989; Kolodner 1992). CBR is a cyclic process of learning from past cases to solve a new case. The main processes of CBR are retrieving, reusing, revising, and retaining. The retrieving process solves a new case by retrieving the past cases; a case is defined by key attributes, and such attributes are used to retrieve the most similar case. The reusing process utilizes the new case information to solve the problem. The revising process evaluates the suggested solution to the problem. Finally, the retaining process updates the stored past cases by incorporating the new case into the existing case base (Aamodt and Plaza 1994). A CBR model is developed to predict the conceptual cost of an FCIP based on the similarity of the attributes of the entered case compared with the stored cases. Once attributes are entered, attribute similarities (AS) can be computed by equation (1) (Kim and Kang, 2004):

AS = Min(AVN, AVR) / Max(AVN, AVR)    (1)

where AS = attribute similarity, AVN = attribute value of the new entered case, and AVR = attribute value of the retrieved case. Depending on AS and the attribute weights (AW), case similarity (CS) can be computed by equation (2) (Perera and Watson, 1998). AW are selected by an expert to emphasize the existence and importance of the case attributes:

CS = Σ_{i=1}^{n} (AS_i × AW_i) / Σ_{i=1}^{n} AW_i    (2)

where CS is case similarity, AS is attribute similarity, AW is attribute weight, and (i) is the number of attributes (key cost drivers).

Fuzzy logic model

Fuzzy logic (FL) models human reasoning by taking uncertainty into account, where incompleteness, randomness, and ignorance of data are represented in the model (Zadeh, 1965, 1973). If–Then rule statements are utilized to formulate the conditional statements that build the FL rule base system. In the example shown in Fig. 2, there are two parameters X1 and X2, where μX1 = {a1, b1, c1, d1}, μX2 = {a2, b2, c2, d2}, μY = {ay, by, cy, dy}, and the fuzzy system consists of two rules as follows:

Rule 1: IF x1 is a1 AND x2 is c2 THEN y is ay.
Rule 2: IF x1 is b1 AND x2 is d2 THEN y is by.

Two inputs are used {X1 = 4, X2 = 6}. These two inputs intersect with the antecedent MFs of the two rules, producing two rule consequents {R1 and R2} based on minimum intersections. The rule consequents are aggregated based on maximum intersections, where the final crisp value is 3. The aggregated outputs for the Ri rules are given by:

Rule 1: μR1 = min [μa1(x1), μc2(x2)]
Rule 2: μR2 = min [μb1(x1), μd2(x2)]
Y: defuzzification of max [R1, R2]
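To make equations (1) and (2) concrete, the following is a minimal Python sketch of the similarity computation, assuming numeric attribute values; the attribute weights are illustrative placeholders, not the expert weights used in the study:

```python
def attribute_similarity(avn, avr):
    """Equation (1): AS = min(AVN, AVR) / max(AVN, AVR)."""
    return min(avn, avr) / max(avn, avr)

def case_similarity(new_case, stored_case, weights):
    """Equation (2): weighted average of attribute similarities."""
    scores = [attribute_similarity(new_case[k], stored_case[k]) for k in weights]
    weighted = sum(s * weights[k] for s, k in zip(scores, weights))
    return weighted / sum(weights.values())

# Hypothetical example: P1 = area served, P2 = pipeline length,
# P3 = number of irrigation valves, P4 = construction year.
weights = {"P1": 0.4, "P2": 0.3, "P3": 0.2, "P4": 0.1}   # placeholder expert weights
new = {"P1": 120, "P2": 800, "P3": 14, "P4": 2013}
stored = {"P1": 100, "P2": 950, "P3": 12, "P4": 2012}
print(round(case_similarity(new, stored, weights), 3))
```

In retrieval, the new case would be compared against every stored case, and the cost of the most similar case (or cases) reused as the estimate.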

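The two-rule firing example above can also be sketched in code. This is an illustrative Mamdani-style min–max inference with made-up triangular membership functions, not the study's actual MFs:

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

y = np.linspace(0, 10, 101)                      # output universe of discourse
x1, x2 = 4.0, 6.0                                # crisp inputs

# Rule firing strengths: minimum of the antecedent memberships.
r1 = min(tri(x1, 0, 2, 5), tri(x2, 3, 6, 9))     # IF x1 is a1 AND x2 is c2
r2 = min(tri(x1, 2, 5, 8), tri(x2, 5, 8, 10))    # IF x1 is b1 AND x2 is d2

# Clip each consequent MF at its rule strength, aggregate by maximum.
agg = np.maximum(np.minimum(tri(y, 0, 3, 6), r1),
                 np.minimum(tri(y, 3, 6, 9), r2))

# Defuzzify by centroid to obtain the crisp output.
crisp = (y * agg).sum() / agg.sum()
print(round(crisp, 2))
```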

Fig. 2. Fuzzy rules firing.

The first step in the FL model is fuzzifying the four key cost drivers and identifying their MFs. The most critical stage is developing the fuzzy rule base; experts are consulted to contribute their experience to develop such rules. As shown in Fig. 3, seven triangular MFs have been used to fuzzify the variables of FCIPs. For example, the input variable construction year consists of seven triangular MFs {MF1, MF2, MF3, MF4, MF5, MF6, MF7}. Accordingly, the number of possible rules equals 7^4 = 2401 rules. Therefore, there is a need to automatically generate such rules. For the FL model, a total of 190 IF-Then rules have been formulated.

Fig. 3. Fuzzy logic model for FCIPs.

Genetic-Fuzzy model

Many approaches exist for evolutionary fuzzy hybridization [(Angelov, 2002); (Pedrycz et al, 1997)]. Traditionally, an expert is consulted to define the fuzzy rules, or the fuzzy designer can use a trial-and-error approach to map the fuzzy rules and MFs. However, such an approach is time-consuming and does not guarantee the optimal set of fuzzy rules. Moreover, the number of fuzzy IF-Then rules increases exponentially with the number of inputs, linguistic variables, or outputs. In addition, experts cannot easily define all required fuzzy rules and the associated MFs. In many engineering problems, evolutionary algorithms (EA) have been used to automatically develop fuzzy rules and MFs to improve system performance (Chou 2006; Loop et al. 2010). A Genetic-Fuzzy model has been developed to optimally generate fuzzy rules. The study has applied the genetic algorithm (GA) to optimally select the fuzzy rules, where 2401 rules represent the whole possible search space for the GA. The formulation of the genetic algorithm model depends mainly on defining two core terms: a chromosome representation and an objective function. First, based on the Michigan approach, the chromosomes represent the fuzzy rules, where the number of chromosomes (CHn) is the number of fuzzy rules. Each chromosome consists of five genes: four genes for the key cost drivers and a fifth gene for the output (the cost of the FCIP). Each gene takes one of the seven membership functions (MFi), where (i) ranges from one to seven (MF1:MF7), as shown in Fig. 4. For example: IF {Area served (P1) is MF5 AND Total length (P2) is MF2 AND Irrigation valves (P3) is MF2 AND Construction year (P4) is MF6} THEN {The cost LE/Mesqa is MF3}.

Fig. 4. The process of the genetic fuzzy system.
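A minimal sketch of the chromosome encoding just described: each rule is a five-gene vector of MF indices (four antecedents plus one consequent). The fuzzy-inference step is stubbed out as a function argument, since it depends on the full MF definitions:

```python
import random

N_MF = 7          # MF1..MF7 per variable
N_GENES = 5       # P1..P4 antecedents + 1 consequent (FCIP cost)

def random_chromosome():
    """One fuzzy rule, e.g. [5, 2, 2, 6, 3] encodes the example rule above."""
    return [random.randint(1, N_MF) for _ in range(N_GENES)]

def fitness(rule_base, cases, infer):
    """Equation (3): MAPE of the fuzzy system built from rule_base.

    `infer(rule_base, inputs)` is a stand-in for the full fuzzy
    inference (fuzzify -> fire rules -> aggregate -> defuzzify)."""
    errors = [abs(y - infer(rule_base, x)) / y for x, y in cases]
    return 100.0 * sum(errors) / len(errors)

population = [random_chromosome() for _ in range(50)]
# A GA loop (selection, crossover at p = 0.7, mutation at p = 0.01 as in
# the study) would then evolve `population` to minimize `fitness`.
```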


Secondly, the fitness function is problem-dependent, where the objective is to enhance the accuracy and quality of the system performance (Hatanaka et al., 2004). The fitness function is formulated to minimize the MAPE, as in equation (3):

F = MAPE = (100/n) Σ_{i=1}^{n} |y_i − ŷ_i| / y_i    (3)

where (F) is the fitness function, (n) is the number of cases, (i) is the case index, ŷ_i is the outcome of the model, and y_i is the actual outcome. As shown in Fig. 4, the process of the developed model consists of five main steps:

1. An initial population of chromosomes is identified to represent the initial state of the fuzzy rules. The four key cost drivers are fed to the fuzzy system.
2. The fuzzy system produces the final output of the system, ŷ_i.
3. The ŷ_i is fed to the fitness function (F) to evaluate the model performance.
4. The GA uses the fitness function (F) to guide the search process, where the crossover probability and mutation probability have been set at 0.7 and 0.01, respectively.
5. A new population of fuzzy rules is produced based on the crossover and mutation processes.

Support Vector Machines (SVM)

SVM is a non-parametric supervised ML algorithm that can be applied to regression and classification problems (Vapnik 1979). The study has applied the radial basis function (RBF) as a kernel for the support vector regression model. A positive slack variable (ξ) is added to handle the non-linearity of the data, as in equation (4) (Cortes and Vapnik 1995):

y_i (W·X_i + b) ≥ 1 − ξ_i,  i = 1, 2, 3, …, m    (4)

The objective is to minimize misclassified cases by optimizing the margin and hyperplane distance, as in equation (5):

Min (1/2) w·w^T + C Σ_{i=1}^{m} ξ_i    (5)

for i = 1, 2, 3, …, m, where m is the number of cases.

Decision trees (DT)

DT is a supervised ML model that divides the cost data into hierarchical rules at each tree node using a repetitive splitting algorithm (Berry and Linoff 1997; Breiman et al. 1984). Classification and regression trees (CART) is a DT model that can be applied to both regression (continuous variables) and classification (categorical variables) applications (Quinlan 1986). CART has been fitted to the FCIPs data. The features of the tree are the key cost drivers, and the terminal tree nodes (leaf nodes) hold continuous values of the project cost.

MRA and transformed regressions

Elmousalami et al. (2018 b) have developed five regression models: standard linear regression, a quadratic model, a reciprocal model, a semilog model, and a power model. The most accurate model is the quadratic model, in which the dependent variable is transformed by taking the square root (Sqrt). The regression model consists of the four key cost drivers as independent variables, and (Y) represents the FCIP cost per field canal as the dependent variable. The quadratic regression model is formulated as equation (6):

(Y)^0.5 = −37032.81 + 2.21·P1 + 0.1691·P2 + 2.265·P3 + 18.594·P4    (6)

ANNs and DNNs

ANNs are a computational method inspired by neuron cells. The major advantage of ANNs is their ability to fit nonlinear data (Siddique and Adeli 2013). Elmousalami et al. (2018 b) have developed three ANN models with structure (4-5-0-1): four is the number of inputs (four key cost drivers), five is the number of hidden nodes in the first hidden layer, zero means no second hidden layer is used, and one is the single node that produces the total cost of the FCIPs. The first model is the untransformed model, whereas the second model is transformed by the square root of the project cost and the third model is transformed by the natural log of the project cost. The type of training is batch, the learning algorithm is the scaled conjugate gradient, and the activation function is the hyperbolic tangent. A standard rectified linear unit (ReLU) is an activation function that can enhance the computing performance of ANNs (LeCun et al. 2015; Nair & Hinton, 2010). Mathematically, ReLU is defined as:
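Equation (6) can be applied directly: squaring the fitted square-root response recovers the predicted cost. A minimal sketch, with hypothetical input values:

```python
def predict_fcip_cost(p1, p2, p3, p4):
    """Quadratic model of equation (6): sqrt(Y) is linear in the drivers,
    so the predicted cost is the square of the fitted expression."""
    sqrt_y = -37032.81 + 2.21 * p1 + 0.1691 * p2 + 2.265 * p3 + 18.594 * p4
    return sqrt_y ** 2

# Hypothetical FCIP: area served, pipeline length, valve count, year.
print(predict_fcip_cost(p1=150, p2=900, p3=15, p4=2013))
```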

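For the SVM of equations (4) and (5), an RBF-kernel support vector regressor can be fitted with scikit-learn. This is a sketch on stand-in data; the data, scaling choice, and hyperparameters are assumptions, not the study's settings:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform([50, 300, 5, 2010], [300, 1500, 30, 2015], size=(144, 4))
y = rng.uniform(2e5, 2e6, size=144)              # stand-in FCIP costs

# Feature scaling is essentially mandatory for RBF kernels (Table 2 lists
# it as an SVM weakness); C controls the slack penalty of equation (5).
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.1))
svr.fit(X[:111], y[:111])                         # 111 training instances
pred = svr.predict(X[111:])                       # 33 testing instances
```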

A(X_i) = X_i if X_i ≥ 0; 0 if X_i < 0

Deep neural networks (DNNs) have also been developed and investigated. The structure of the DNNs model consists of three hidden layers, where each hidden layer contains 100 neurons, and the activation function is the ReLU function. Accordingly, the DNNs' structure is (4-100-100-100-1).

Ensemble methods

Ensemble methods (fusion learning) are elegant data mining techniques that combine multiple learning algorithms to enhance the overall performance (Hansen and Salamon 1990). Ensemble methods can apply ML algorithms such as ANN, DT, and SVM, which are called "base models" or "base learners", as inputs for the ensemble. The concept behind ensemble methods is illustrated in Fig. 5 and expressed mathematically as equation (7) (Chen and Guestrin 2016).

Fig. 5. Additive function concept.

For the given dataset (D) with n examples (144 cases) and m features (4 key cost drivers), D = {(x_i, y_i)} (x_i ∈ R^m, y_i ∈ R), where R is the set of real numbers, K additive functions predict the output as equation (7):

ŷ_i = Σ_{k=1}^{K} f_k(X_i),  f_k ∈ F    (7)

where F = {f(x) = w_{q(x)}} (q: R^m → T, w ∈ R^T) is the space of regression trees. q is the structure of each tree, mapping an example to its corresponding leaf, and T is the number of leaves in the tree. ŷ_i is the predicted dependent variable (FCIP cost LE/project). Each f_k represents an independent tree structure q with leaf weights (w), and (X_i) represents the independent variables.

Bagging

Bagging is a variance reduction algorithm that trains several classifiers based on bootstrap aggregating, as shown in Fig. 6(a). The bagging algorithm randomly draws replicas of the training dataset with replacement to train each classifier (Breiman 1996; Breiman 1999). As a result, diversity is obtained by resampling several data subsets. On average, each bootstrap sample contains 63.2% of the original training dataset. CART is selected as the base learner for the bagging model.

Random Forest (RF)

Random forest is one of the bagging ensemble learning models and can produce accurate performance without overfitting issues (Breiman 2001), as shown in Fig. 6(b). RF algorithms draw bootstrap samples to develop a forest of trees based on random subsets of features. The extremely randomized trees algorithm (Extra Trees) merges the randomization of random subspaces with a random selection of the cut-point during the tree-node splitting process. Extra Trees mainly controls the attribute randomization and smoothing parameters (Geurts 2006).

Boosting and Adaptive boosting (AdaBoost)

Fig. 6. (a) Bagging, (b) RF, and (c) Boosting.

Schapire presented the boosting procedure (also known as adaptive resampling) as an algorithm that boosts the performance of weak learning algorithms (Schapire 1990).
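The bagging, RF, Extra Trees, and AdaBoost ensembles of this section map onto standard scikit-learn regressors with a CART base learner. A sketch reusing the stand-in X and y arrays from the SVR example (hyperparameters are illustrative; the `estimator` keyword assumes scikit-learn 1.2 or later):

```python
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              ExtraTreesRegressor, RandomForestRegressor)
from sklearn.tree import DecisionTreeRegressor

cart = DecisionTreeRegressor(max_depth=6)        # CART base learner
models = {
    "Bagging": BaggingRegressor(estimator=cart, n_estimators=100),
    "RF": RandomForestRegressor(n_estimators=100),
    "ExtraTrees": ExtraTreesRegressor(n_estimators=100),
    "AdaBoost": AdaBoostRegressor(estimator=cart, n_estimators=100),
}
for name, model in models.items():
    model.fit(X[:111], y[:111])                  # bootstrap/boosting handled internally
    print(name, model.predict(X[111:])[:3])
```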

Bagging generates classifiers in parallel, while boosting develops the classifiers sequentially, as shown in Fig. 6(c). Thus, boosting converts weak models into strong ones.

Freund and Schapire (1997) presented the adaptive boosting algorithm (AdaBoost), which is selected as one of the top ten data mining algorithms. AdaBoost serially manipulates the cost data for each base learner: AdaBoost initially assigns equal weights to all cases, and larger weights are then assigned to the misclassified cases. The objective is to place greater focus on the misclassified cases so that they are corrected in the subsequent iteration. In addition, the AdaBoost algorithm assigns another set of weights to rank each individual base learning algorithm based on its accuracy (Bauer and Kohavi 1999). CART is selected as the base learner for the AdaBoost model to predict the conceptual cost of FCIPs.

Extreme Gradient Boosting (XGBoost)

XGBoost is a large-scale ML system that can build a highly scalable end-to-end ensemble tree boosting system for big data processing (Chen and Guestrin 2016). XGBoost is a modified gradient tree model that adds a regularization term to the additive function, as in equation (8):

L(φ) = Σ_{i=1}^{n} l(ŷ_i, y_i) + Σ_{k=1}^{K} Ω(f_k),  where Ω(f) = γT + (1/2)λ‖w‖²    (8)

where l represents a differentiable convex cost function that measures the difference between the predicted output ŷ_i and the actual output y_i. Ω is a regularization term to avoid overfitting and smooth the learned weights (w). The regularization term penalizes the complexity of the regression tree functions.

Stochastic gradient boosting (SGB)

The performance of gradient boosting can be improved iteratively by the stochastic gradient boosting algorithm, which injects randomization into the selected data subsets. Injecting randomization into the boosting algorithm can substantially improve both the fitting accuracy and the computational cost of the gradient boosting algorithm [Breiman (1996); (Freund and Schapire, 1996)]. The training data is randomly drawn at each iteration without replacement from the dataset. Stochastic gradient boosting can be viewed in this sense as a boosting-bagging hybrid.

Evaluation techniques

Evaluation techniques for predictive models include MAPE, the mean squared error (MSE), the root mean squared error (RMSE), the coefficient of determination (R²), and the adjusted R*². MAPE compares the predicted and actual outcomes (Makridakis et al. 1998), as in equation (3). MAPE can be classified as an excellent prediction if it is less than 10%, a good prediction between 10% and 20%, acceptable forecasting between 20% and 50%, and an inaccurate prediction above 50% (Lewis 1982). However, Peurifoy and Oberlender (2002) have defined 20% as the acceptable limit for a conceptual cost estimate based on MAPE. Therefore, this study has categorized model accuracy into three main categories:

MAPE % categorization = { "below 10" if 0 ≤ MAPE ≤ 10%; "below 20" if 10% < MAPE ≤ 20%; "unacceptable" if MAPE > 20% }

where "below 10" indicates a higher accuracy level than "below 20".

The R-squared R² (coefficient of determination) is expressed as equation (9):

R² = 1 − SSE/SST = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²    (9)

where SSE is the sum of squares of the residuals and SST is the total sum of squares. ȳ is the arithmetic mean of the Y variable. R² measures the percentage of the variation of the dependent variable Y explained by the independent variables X. Thus, R² indicates how well the model fits the cost data. An R² value of 0.9 or above is classified as very good, above 0.8 as good, above 0.5 as satisfactory, and below 0.5 as poor (Aczel 1989; Ostertagová 2011). The adjusted R-squared R*² is computed by equation (10):

R*² = R² − (1 − R²)·K / (n − (K + 1))    (10)

where R*² is adjusted for the number of variables (K) included in the proposed model, so R*² is lower than the R² value. For model evaluation, R*² is always preferred over R² to avoid the over-fitting problem (Aczel 1989; Ostertagová 2011).

Comparison and analysis

MAPE and R*² have been used to validate the twenty developed models, as displayed in Table 1.
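A hedged sketch of an XGBoost fit together with the evaluation metrics of equations (3), (9), and (10), again on the stand-in data; the hyperparameters are illustrative, not the tuned values behind Table 1:

```python
import numpy as np
from xgboost import XGBRegressor

model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1,
                     reg_lambda=1.0)   # lambda of the Omega(f) term in eq. (8)
model.fit(X[:111], y[:111])
pred = model.predict(X[111:])
actual = y[111:]

mape = 100.0 * np.mean(np.abs(actual - pred) / actual)                # eq. (3)
r2 = 1.0 - np.sum((actual - pred) ** 2) / np.sum((actual - actual.mean()) ** 2)  # eq. (9)
k = 4                                                                 # key cost drivers
adj_r2 = r2 - (1.0 - r2) * k / (len(actual) - (k + 1))                # eq. (10)
category = "below 10" if mape <= 10 else "below 20" if mape <= 20 else "unacceptable"
print(mape, adj_r2, category)
```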

The developed models have been sorted in descending order from M 1 to M 20 based on MAPE, as shown in Fig. 7. Elmousalami et al. (2018 b) have presented the quadratic regression model (M 2) as the most accurate for FCIPs among the developed regression and ANN models (M 3, M 4, M 5, M 6, M 8, M 13, and M 14), with 9.120 and 0.851 for MAPE and R*², respectively. However, this study shows that XGBoost (M 1) is more accurate than quadratic regression (M 2). XGBoost (M 1) comes in first place, slightly ahead of M 2, with 9.091% and 0.929 for MAPE and R*², respectively. Moreover, the unique advantage of XGBoost is its high scalability, where it can process noisy data and fit high-dimensional data without overfitting. XGBoost applies parallel computing to effectively reduce computational complexity and learn faster (Chen and Guestrin 2016). Another key advantage of XGBoost is its handling of missing values, where a default direction is identified, as shown in Fig. 5. Accordingly, no effort is needed for cleaning the collected data (Fan, 2008).

Ensemble methods such as Extra Trees (M 7), bagging (M 9), RF (M 10), AdaBoost (M 11), and SGB (M 12) have produced highly acceptable performance, with accuracy ranging from 9.714% to 11.008%. The ensemble learning methods can effectively deal with the problems of high-dimensional data, complex data structures, and small sample size. Bagging algorithms can increase generalization by decreasing the variance error (Breiman 1998), while boosting can improve generalization by decreasing the bias error (Schapire et al. 1998). Ensemble methods can effectively handle continuous, categorical, and dummy features with missing values. However, ensemble methods may increase model complexity, which decreases model interpretability (Kuncheva 2004). RF (M 10) is more robust against noisy data and big data than the DT (M 16) algorithm (Breiman, 1996; Dietterich, 2000). However, the RF algorithm is unable to interpret the importance of features or the mechanism producing the results.

DNNs (M 15) produce 12.059% MAPE, less accurate than all the developed MLPs (M 4, M 5, and M 8). Accordingly, DNNs provide poor performance with a small dataset. Conversely, deep learning and DNNs can produce the most accurate performance with high-dimensional data (LeCun et al. 2015). As an alternative to the black-box nature of ANNs and DNNs, DT generates logic statements and interpretable rules which can be used for identifying the importance of data features (Perner et al. 2001). Another advantage of DT is avoiding the curse of dimensionality and providing high-performance computing efficiency through its splitting procedure (Prasad et al. 2006). However, DT produces unsatisfactory performance on time series, noisy, or nonlinear data (Curram and Mingers 1994). Although DT (CART) is inherently used as a base learner for the ensemble methods, DT (M 16) produces 12.488% MAPE, less accurate than all the developed ensemble methods (M 1, M 7, M 9, M 10, M 11, and M 12). Therefore, ensemble methods produce better performance than a single learning algorithm. Moreover, ensemble methods can effectively handle missing values and noisy data due to scalability.

Ensemble methods and data transformation play an important role in prediction accuracy. However, the main gap of the previous models is the lack of uncertainty modeling in the prediction cost model. Therefore, fuzzy logic theory has been applied to maintain the uncertainty concept through the hybrid fuzzy model (M 17) and the fuzzy logic model (M 20). The number of rules generated by the fuzzy genetic model (M 17) is 63 rules, and the MAPE is 14.7%. On the other hand, a traditional fuzzy logic model (M 20) has been built based on the experts' experience, where a total of 190 rules are generated to cover all the possible combinations of the fuzzy system, and the MAPE is 26.3%. Moreover, the fuzzy rules (IF-Then rules) generated by experts contain redundant rules, which can be deleted to improve the model computation and performance. Moreover, the experts' knowledge cannot cover all combinations to represent all possible rules (2401 rules). In addition, the generation of the experts' rules is a time- and effort-consuming process. Consequently, hybrid fuzzy systems are more effective than traditional fuzzy logic systems. Although the prediction accuracy of the fuzzy genetic model (M 17) and the fuzzy logic model (M 20) is 14.7% and 26.3%, respectively, the fuzzy models would produce more reliable prediction results by taking uncertainty into account. However, the traditional fuzzy model gives an unacceptable accuracy of 26.3% MAPE (Peurifoy and Oberlender 2002). Therefore, maintaining uncertainty decreases the predictive model accuracy.

CBR (M 18) produces an acceptable but low accuracy of 17.3% MAPE. The advantage of CBR is dealing with a vast amount of data, where all past cases and new cases are stored using database techniques (Kim and Kang 2004). Moreover, finding similarities and similar cases improves the reliability of and confidence in the output. Hybrid models can be incorporated into CBR to enhance its performance, such as applying GA and decision trees to optimize attribute weights and applying regression analysis in the revising process. SVM can be applied to both regression and classification tasks. SVM (M 19) produces an unacceptable accuracy of 21.217% MAPE (Peurifoy and Oberlender 2002). Finally, Table 2 summarizes the strengths and weaknesses of each developed model.

Table 1: Accuracy of the developed algorithms.

Notation | Algorithm / model      | Algorithm type (supervised regression) | MAPE % | MAPE % categorization | R²    | R*²
M 1      | XGBoost                | Ensemble methods                       | 9.091  | below 10              | 0.931 | 0.929
M 2      | Quadratic regression*  | MRA                                    | 9.120  | below 10              | 0.857 | 0.851
M 3      | Plain regression*      | MRA                                    | 9.130  | below 10              | 0.803 | 0.796
M 4      | Quadratic MLP*         | ANNs                                   | 9.200  | below 10              | 0.904 | 0.902
M 5      | Plain MLP*             | ANNs                                   | 9.270  | below 10              | 0.913 | 0.912
M 6      | Semilog regression*    | MRA                                    | 9.300  | below 10              | 0.915 | 0.910
M 7      | Extra Trees            | Ensemble methods                       | 9.714  | below 10              | 0.948 | 0.947
M 8      | Natural log MLP*       | ANNs                                   | 10.230 | below 20              | 0.905 | 0.910
M 9      | Bagging                | Ensemble methods                       | 10.246 | below 20              | 0.914 | 0.911
M 10     | RF                     | Ensemble methods                       | 10.503 | below 20              | 0.916 | 0.913
M 11     | AdaBoost               | Ensemble methods                       | 10.679 | below 20              | 0.875 | 0.871
M 12     | SGB                    | Ensemble methods                       | 11.008 | below 20              | 0.926 | 0.924
M 13     | Reciprocal regression* | MRA                                    | 11.200 | below 20              | 0.814 | 0.801
M 14     | Power (2) regression*  | MRA                                    | 11.790 | below 20              | 0.937 | 0.931
M 15     | DNNs                   | ANNs                                   | 12.059 | below 20              | 0.785 | 0.779
M 16     | DT                     | Tree model                             | 12.488 | below 20              | 0.886 | 0.883
M 17     | Genetic-Fuzzy          | Hybrid model                           | 14.700 | below 20              | 0.863 | 0.857
M 18     | CBR                    | Case based                             | 17.300 | below 20              | 0.859 | 0.852
M 19     | SVM                    | Kernel based                           | 21.217 | unacceptable          | 0.136 | 0.133
M 20     | Fuzzy                  | Fuzzy theory                           | 26.300 | unacceptable          | 0.857 | 0.851

* (Elmousalami et al. 2018 b)


Table 2: Characteristics of the developed algorithms.

Model | Strengths | Weaknesses | Interpretation | Uncertainty | Noisy data
M 1  | High scalability, handling missing values, high accuracy, low computational cost | No uncertainty and no interpretation | no | no | yes
M 2  | More accurate than plain regression, handles nonlinearity of data | Prone to overfitting | yes | no | no
M 3  | Works on small datasets | Linear assumptions | yes | no | no
M 4  | High accuracy, handling complex patterns | Black-box nature, needs sufficient training data | no | no | no
M 5  | High accuracy, handling complex patterns | Black-box nature | no | no | no
M 6  | Produces better results than plain regression | Unable to capture complex patterns | yes | no | no
M 7  | Handles data randomness | Black-box nature, needs sufficient data | no | no | yes
M 8  | Produces better results than plain MLP | Black-box nature, needs sufficient data | no | no | no
M 9  | Provides higher performance than a single algorithm | Depends on other algorithms' performance | no | no | yes
M 10 | Accurate and high performance on many problems, including nonlinear | No interpretability, overfitting can easily occur, need to choose the number of trees | no | no | yes
M 11 | High scalability and high adaptability | Depends on other algorithms' performance | no | no | yes
M 12 | Handles difficult examples | Highly sensitive to noisy data | no | no | yes
M 13 | Handles data nonlinearity, trains on small sample sizes | Unable to capture complex patterns | yes | no | no
M 14 | Handles data nonlinearity, trains on small sample sizes | Unable to capture complex patterns | yes | no | no
M 15 | Captures complex patterns, processes big data, high-performance computing (HPC) | Needs sufficient training data, high computational cost | no | no | no
M 16 | Works on both linear and nonlinear problems, produces logical expressions | Poor results on too-small datasets, overfitting can easily occur | yes | no | no
M 17 | Handles uncertainty, more accurate than the fuzzy model | More complex than the fuzzy model, needs more computational resources | yes | yes | no
M 18 | Handles small datasets, simple, needs little computational time | Poor performance and accuracy where the optimal case cannot be retrieved | yes | no | no
M 19 | Easily adaptable, works very well on nonlinear problems, not biased by outliers | Feature scaling is compulsory, not well known, more difficult to understand | no | no | no
M 20 | Handles uncertainty | Low accuracy | yes | yes | no


Fig. 7: MAPE and R*² for all algorithms (bar charts of MAPE % and R² for models M 1 through M 20).

Conclusion

This paper presents a comparison of AI techniques to develop a reliable conceptual cost prediction model. Twenty machine learning models are developed utilizing tree-based models, ensemble methods, fuzzy systems, CBR, ANNs, SVM, and transformed regression models. The accuracy of the developed models is tested from two perspectives: MAPE and adjusted R². The results show that the most accurate and suitable method is XGBoost, with 9.091% and 0.929 for MAPE and adjusted R², respectively. The study emphasizes the importance of ensemble methods for improving prediction accuracy and handling noisy and missing data. However, the key limitation of the ensemble methods is the inability to interpret the produced results. In addition, the decision tree algorithm and ensemble methods can provide an alternative to many ML algorithms such as multiple regression analysis and ANNs.

The conceptual cost estimate is conducted under uncertainty. Therefore, this study recommends using fuzzy theory, such as FL, and developing a hybrid model based on FL to capture the uncertainty nature of the developed model and produce more reliable performance. In addition, the study highlights the main problem in fuzzy modeling, which is fuzzy rules generation. This study has discussed the importance of hybrid fuzzy model methodologies to generate rules, such as the fuzzy genetic model. Therefore, this study recommends developing automated hybrid fuzzy rules models rather than traditional fuzzy models. The fusion of AI techniques is called hybrid intelligent systems, and Zadeh (1994) predicted that hybrid intelligent systems will be the way of the future.

It is recommended to develop more than one cost prediction model, such as the regression model, ANNs, FL, ensemble methods, or a CBR model. As a result, the researcher can compare the results of the developed models and set a benchmark to select the most accurate model. In addition, the comparison of the developed models enhances the quality of the cost estimate and the decisions based on it (Amason 1996). Moreover, this paper provides the comprehensive knowledge needed to develop a reliable parametric cost model at the conceptual stage of a project. However, automated cost models are prone to many machine learning problems, such as overfitting issues and hyper-parameter selection.

References

Aamodt, A., and Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, IOS Press, 7(1), 39-59.

Amason, A. (1996). Distinguishing the effects of functional and dysfunctional conflict on strategic decision making: Resolving a paradox for top management teams. Academy of Management Journal, 39, 123-148.

Jrade, A. (2000). A conceptual cost estimating computer system for building projects. Master's thesis, Department of Building, Civil & Environmental Engineering, Concordia University, Montreal, Quebec, Canada.

AACE (Association for the Advancement of Cost Engineering) International. (2004). AACE International recommended practices. Morgantown, W.V.

Aczel, A. D. (1989). Complete Business Statistics. Irwin, p. 1056. ISBN 0-256-05710-8.


Angelov, P. P. (2002). Evolving Rule-Based Models: A Tool for Design of Flexible Adaptive Systems. Physica-Verlag, Wurzburg.

Bauer, E., and Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1-2), 105-139.

Berry, M. J., and Linoff, G. (1997). Data Mining Techniques: For Marketing, Sales, and Customer Support. John Wiley & Sons, Inc.

Bezdek, J. C. (1994). What is computational intelligence? In Computational Intelligence Imitating Life, J. M. Zurada, R. J. Marks II, and C. J. Robinson (eds.), IEEE Press, New York, 1-12.

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, New York, 1-58.

Breiman, L. (1996). Bagging predictors. Machine Learning, 26, 123-140.

Breiman, L. (1999). Pasting small votes for classification in large databases and on-line. Machine Learning, 36(1-2), 85-103.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Breiman, L., Friedman, J. H., Olshen, R., and Stone, C. (1984). Classification and Regression Trees. Wadsworth, California.

Chen, T., and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. ACM.

Cheng, M.-Y., and Roy, A. F. (2010). Evolutionary fuzzy decision model for construction management using support vector machine. Expert Systems with Applications, 37(8), 6061-6069.

Chou, C.-H. (2006). Genetic algorithm-based optimal fuzzy controller design in the linguistic space. IEEE Transactions on Fuzzy Systems, 14(3), 372-385.

Chou, J. S., and Lin, C. (2012). Predicting disputes in public-private partnership projects: Classification and ensemble models. Journal of Computing in Civil Engineering, 27(1), 51-60. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000197

Cortes, C., and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.

Curram, S. P., and Mingers, J. (1994). Neural networks, decision tree induction and discriminant analysis: An empirical comparison. Journal of the Operational Research Society, 45(4), 440-450.

Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2), 139-157.

Elmousalami, H. H., Elyamany, A. H., and Ibrahim, A. H. (2018 a). Evaluation of cost drivers for field canals improvement projects. Water Resources Management.

Elmousalami, H. H., Elyamany, A. H., and Ibrahim, A. H. (2018 b). Predicting conceptual cost for field canal improvement projects. Journal of Construction Engineering and Management, 144(11), 04018102.

Fan, G. Z., Ong, S. E., and Koh, H. C. (2006). Determinants of house price: A decision tree approach. Urban Studies, 43(12), 2301-2315.

Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., and Lin, C. J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9(Aug), 1871-1874.

Freund, Y., and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning, 148-156.

Geurts, P., Ernst, D., and Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3-42.

Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499-510.

Hansen, L. K., and Salamon, P. (1990). Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), 993-1001.

Hatanaka, T., Kawaguchi, Y., and Uosaki, K. (2004). Nonlinear system identification based on evolutionary fuzzy modelling. IEEE Congress on Evolutionary Computation, 1, 646-651.

Hegazy, T., and Ayed, A. (1998). Neural network model for parametric cost estimation of highway projects. Journal of Construction Engineering and Management, 124(3), 210-218.

Karatas, Y., and Ince, F. (2016). Feature article: Fuzzy expert tool for small satellite cost estimation. IEEE Aerospace and Electronic Systems Magazine, 31(5), 28-35.

Kim, G.-H., An, S.-H., and Kang, K.-I. (2004). Comparison of construction cost estimating models based on regression analysis, neural networks, and case-based reasoning. Building and Environment, 39(10), 1235-1242.

Kim, S. (2013). Hybrid forecasting system based on case-based reasoning and analytic hierarchy process for cost estimation. Journal of Civil Engineering and Management, 19(1), 86-96.

Kolodner, J. L. (1992). An introduction to case-based reasoning. Artificial Intelligence Review, 6, 3-34.

Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons.

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553), 436.

Lewis, C. D. (1982). Industrial and Business Forecasting Methods. Butterworths, London.

Loop, B. P., Sudhoff, S. D., Zak, S. H., and Zivi, E. L. (2010). Estimating regions of asymptotic stability of power electronics systems using genetic algorithms. IEEE Transactions on Control Systems Technology, 18(5), 1011-1022.

Makridakis, S., Wheelwright, S. C., and Hyndman, R. J. (1998). Forecasting Methods and Applications. Wiley, New York.

Marzouk, M., and Alaraby, M. (2014). Predicting telecommunication tower costs using fuzzy subtractive clustering. Journal of Civil Engineering and Management, 21(1), 67-74.

Marzouk, M., and Elkadi, M. (2016). Estimating water treatment plants costs using factor analysis and artificial neural networks. Journal of Cleaner Production, 112, 4540-4549.

Nair, V., and Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), 807-814.

Ostertagová, E. (2011). Applied Statistics (in Slovak). Elfa, Košice, Slovakia, p. 161. ISBN 978-80-8086-171-1.

Pedrycz, W. (Ed.) (1997). Fuzzy Evolutionary Computation. Kluwer Academic Publishers, Dordrecht.

Perera, S., and Watson, I. (1998). Collaborative case-based estimating and design. Advances in Engineering Software, 29(10), 801-808.

Perner, P., Zscherpel, U., and Jacobsen, C. (2001). A comparison between neural networks and decision trees based on data from industrial radiographic testing. Pattern Recognition Letters, 22(1), 47-54.

Peurifoy, R. L., and Oberlender, G. D. (2002). Estimating Construction Costs, 5th Edition. McGraw-Hill, New York.

Prasad, A. M., Iverson, L. R., and Liaw, A. (2006). Newer classification and regression tree techniques: Bagging and random forests for ecological prediction. Ecosystems, 9(2), 181-199.

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.

Radwan, H. G. (2013). Sensitivity analysis of head loss equations on the design of improved irrigation on-farm system in Egypt. International Journal of Advancements in Research & Technology, 2(1).

Ross, B. H. (1989). Some psychological results on case-based reasoning. In Case-Based Reasoning Workshop, DARPA, Pensacola Beach. Morgan Kaufmann, 144-147.

Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197-227.

Schapire, R. E., Freund, Y., Bartlett, P., and Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651-1686.

Shreenaath, A., Arunmozhi, S., and Sivagamasundari, R. (2015). Prediction of construction cost overrun in Tamil Nadu: A statistical fuzzy approach. International Journal of Engineering and Technical Research (IJETR).

Siddique, N., and Adeli, H. (2013). Computational Intelligence: Synergies of Fuzzy Logic, Neural Networks and Evolutionary Computing. John Wiley & Sons, Chichester, West Sussex.

Stoy, C., Pollalis, S., and Dursun, O. (2012). A concept for developing construction element cost models for German residential building projects. International Journal of Project Organisation and Management, 4(1), 38.

Vapnik, V. (1979). Estimation of Dependences Based on Empirical Data (in Russian). Nauka, Moscow. (English translation: Springer-Verlag, New York, 1982.)

Wang, Y.-R., Yu, C.-Y., and Chan, H.-H. (2012). Predicting construction cost and schedule success using artificial neural networks ensemble and support vector machines classification models. International Journal of Project Management, 30(4), 470-478.

Williams, T. P., and Gong, J. (2014). Predicting construction cost overruns using text mining, numerical data and ensemble classifiers. Automation in Construction, 43, 23-29.

Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338-353.

Zadeh, L. A. (1973). Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man and Cybernetics, 3, 28-44.

Zadeh, L. A. (1994). Fuzzy logic, neural networks and soft computing. Communications of the ACM, 37, 77-84.

Zhai, K., Jiang, N., and Pedrycz, W. (2012). Cost prediction method based on an improved fuzzy model. The International Journal of Advanced Manufacturing Technology, 65(5-8).

Zhu, W.-J., Feng, W.-F., and Zhou, Y.-G. (2010). The application of genetic fuzzy neural network in project cost estimate. 2010 International Conference on E-Product E-Service and E-Entertainment.