Kernel Methods in Remote Sensing: A Review
Mahesh Pal
Assistant Professor, Department of Civil Engineering
NIT Kurukshetra, 136119
[email protected]

Abstract: Kernel-based algorithms map data from the original input feature space to a kernel feature space of higher dimensionality and solve a linear problem in that space. Over the last decade, kernel-based classification and regression approaches such as support vector machines have been used widely in remote sensing as well as in various civil engineering applications. In spite of their good performance with different datasets, support vector machines still suffer from shortcomings such as the difficulty of visualizing/interpreting the model and the need to choose a kernel, kernel-specific parameters and the regularization parameter. Relevance vector machines are another kernel-based approach that has been explored for classification and regression within the last few years. The advantages of relevance vector machines over support vector machines are the availability of probabilistic predictions, the ability to use arbitrary kernel functions and no requirement to set the regularization parameter. This paper presents a state-of-the-art review of SVM and RVM in remote sensing and provides some details of their use in other civil engineering applications also.

Introduction

Support Vector Machines (SVM) are one of several kernel-based techniques available in the field of machine learning (Cortes and Vapnik, 1995). The SVM is one of the most sophisticated nonparametric machine learning approaches available, with many applications and many different configurations (Burges, 1998) depending upon the kernel and the optimization method used. SVM provides an innovative means of classification and regression using the principle of structural risk minimization (SRM), which arises from statistical learning theory. Kernel-based methods map data from the original input feature space to a kernel feature space of higher dimensionality and solve a linear problem in that space. These methods allow us to interpret and design learning algorithms geometrically in the kernel space (which is nonlinearly related to the input space).

SVM is a good candidate for remote sensing data classification for a number of reasons. Firstly, an SVM can work well with a small training dataset, and the selection of a sufficient number of pure training pixels has always been a problem with remote sensing data. Secondly, SVM has been found to perform with high accuracy on problems involving hundreds of features (Gualtieri and Cromp, 1998); processing such high-dimensional data to extract quality information has always been a challenge for the remote sensing community, and conventional statistical classifiers may fail on high-dimensional data because of their requirement for large training samples. Further, unlike neural networks, SVMs are robust to overfitting because they rely on margin maximization rather than on finding a decision boundary directly from the training samples.

Despite the current popularity of SVM, like a neural network this algorithm also requires the setting of appropriate values for the regularization parameter (C), the choice of kernel and kernel-specific parameters (such as the gamma value of the radial basis function kernel). A recent development in kernel-based classification/regression is the relevance vector machine (RVM) (Tipping, 2001). The RVM is a Bayesian extension of the SVM with some advantages, such as removing the need to define the regularization parameter, the ability to use non-Mercer kernels and the provision of probabilistic output. The aim of this article is to discuss the use of SVM/RVM in remote sensing, including the parameters considered in the design of SVM/RVM. Finally, a brief review of their applications in other areas of civil engineering is also provided.
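
To make these design choices concrete, the short sketch below shows where the regularization parameter C, the kernel and the kernel-specific gamma enter a support vector classifier. It uses scikit-learn and synthetic two-class data, both of which are illustrative assumptions rather than the setups of the studies reviewed here.

```python
# Minimal SVM sketch: the three user-set choices named above (C, kernel,
# gamma) as exposed by scikit-learn's SVC. Synthetic data stands in for
# remote sensing pixels; nothing here reproduces a cited experiment.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(C=10.0,        # regularization: larger C penalizes errors more
          kernel="rbf",  # kernel choice
          gamma=0.5)     # kernel-specific parameter of the RBF kernel
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("support vectors per class:", clf.n_support_)
```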

Support vector machines

SVM is based on statistical learning theory as proposed by Vapnik and Chervonenkis (1971) and discussed in detail by Vapnik (1995). The SVM can be seen as a new way to train polynomial, radial basis function, or multilayer classifiers, in which the weights of the network are found by solving a quadratic programming (QP) problem with linear inequality and equality constraints using structural risk minimisation, rather than by solving a non-convex, unconstrained minimisation problem, as in the standard neural network training technique based on empirical risk minimisation. Empirical risk minimises the misclassification error on the training set, whereas structural risk minimises the probability of misclassifying a previously unseen data point drawn randomly from a fixed but unknown probability distribution. The name SVM reflects the fact that one of the outcomes of the algorithm, in addition to the parameters of the classifier, is a set of data points (the "support vectors") which contain, in a sense, all the information relevant to the classification problem.

An SVM is basically a linear learning machine based on the principle of optimal separation of classes, and it was initially designed to solve classification problems. The aim is to find a linear separating hyperplane between the classes of interest. The hyperplane is a plane in a multidimensional space and is also called a decision surface, an optimal separating hyperplane or an optimal margin hyperplane. The linear separating hyperplane is placed between the classes so that it satisfies two conditions. First, all data vectors that belong to the same class are placed on the same side of the hyperplane. Second, the distance between the closest data vectors of the two classes is maximized (Vapnik, 1995). In other words, the optimum hyperplane is the one that leaves the maximum margin between the two classes, where the margin is defined as the sum of the distances of the hyperplane from the closest data vectors of the two classes. For each class, the data vectors forming the boundary of the class lie on supporting hyperplanes; these data vectors are therefore called the support vectors and are the most significant ones for the SVM (Schölkopf, 1997).

Often, a linear separating hyperplane is unable to classify the input data without error. Under such circumstances, the data are transformed to a higher-dimensional space using a non-linear transformation function that spreads the data apart such that a linear separating hyperplane may be found in that space. But, owing to the very large dimensionality of the feature space, it is practically impossible to compute the inner product of two transformed data vectors directly. This may, however, be achieved by using a kernel function in place of the inner product of the two transformed data vectors in the feature space, which reduces the computational effort by a significant amount. Other advantages of SVM are the ability to adapt its learning characteristics via the kernel function and the ability to classify data adequately in a high-dimensional feature space with a limited number of training samples (Vapnik, 1995).

Linearly separable classes are the simplest case on which to train a support vector machine.

Let the training data with $k$ samples be represented by $\{x_i, c_i\}$, $i = 1, \ldots, k$, where $x \in \mathbb{R}^n$ is an $n$-dimensional vector and $c \in \{-1, +1\}$ is the class label. These training patterns are said to be linearly separable if a vector $w$ and a scalar $b$ can be defined so that inequalities (1) and (2) are satisfied:

$w \cdot x_i + b \ge +1$ for all $c = +1$    (1)

$w \cdot x_i + b \le -1$ for all $c = -1$    (2)

The aim is to find a hyperplane that divides the data so that all the points with the same label are on the same side of the hyperplane. This amounts to finding $w$ and $b$ such that:

$c_i (w \cdot x_i + b) > 0$    (3)

If a hyperplane exists that satisfies (3), the two classes are said to be linearly separable. In this case, it is always possible to rescale $w$ and $b$ so that

$\min_{1 \le i \le k} c_i (w \cdot x_i + b) \ge 1$

that is, so that the distance from the closest point to the hyperplane is $1/\|w\|$. Then (3) can be written as

$c_i (w \cdot x_i + b) \ge 1$    (4)

The hyperplane for which the distance to the closest point is maximal is called the optimal separating hyperplane (OSH). As the distance to the closest point is $1/\|w\|$, the OSH can be found by minimising $\|w\|^2$ under constraint (4).

The minimisation procedure uses Lagrange multipliers and quadratic programming (QP) optimisation methods. If $\lambda_i$, $i = 1, \ldots, k$ are the non-negative Lagrange multipliers associated with constraint (4), the optimisation problem becomes one of maximising:

$L(\lambda) = \sum_i \lambda_i - \frac{1}{2} \sum_{i,j} \lambda_i \lambda_j c_i c_j (x_i \cdot x_j)$    (5)

under the constraints $\lambda_i \ge 0$, $i = 1, \ldots, k$.

If $\lambda^{a} = (\lambda_1^{a}, \ldots, \lambda_k^{a})$ is an optimal solution of the maximisation problem (5), then the optimal separating hyperplane can be expressed as:

$w^{a} = \sum_i c_i \lambda_i^{a} x_i$    (6)

The support vectors are the points for which $\lambda_i^{a} > 0$ when the equality in (4) holds.

If the data are not linearly separable, then slack variables $\xi_i \ge 0$, $i = 1, \ldots, k$ can be introduced (Cortes and Vapnik, 1995) such that (4) can be written as

$c_i (w \cdot x_i + b) - 1 + \xi_i \ge 0$    (7)

and the solution for a generalised OSH, also called a soft margin hyperplane, can be obtained under the conditions:

$\min_{w,\, b,\, \xi_1, \ldots, \xi_k} \left[ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{k} \xi_i \right]$    (8)

$c_i (w \cdot x_i + b) - 1 + \xi_i \ge 0$    (9)

$\xi_i \ge 0$, $i = 1, \ldots, k$.    (10)

The first term in (8) is the same as in the linearly separable case and controls the learning capacity, while the second term controls the number of misclassified points; the parameter $C$ is chosen by the user, and a larger value of $C$ assigns a higher penalty to errors. In situations where a hyperplane defined by linear equations on the training data cannot separate the classes, the techniques discussed for linearly separable data can be extended to allow for non-linear decision surfaces. Boser et al. (1992) suggested mapping the input data into a high-dimensional feature space through some nonlinear mapping. The transformation to a higher-dimensional space spreads the data out in a way that facilitates the finding of a linear hyperplane. After replacing $x$ by its mapping in the feature space, $\Phi(x)$, equation (5) can be written as:

$L(\lambda) = \sum_i \lambda_i - \frac{1}{2} \sum_{i,j} \lambda_i \lambda_j c_i c_j \left(\Phi(x_i) \cdot \Phi(x_j)\right)$    (11)

It is convenient to introduce a kernel function $K$ such that:

$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$    (12)

A kernel function is thus substituted for the dot product of the transformed vectors. The formulation of the kernel function from the dot product is a special case of Mercer's theorem (Vapnik, 1995; Cristianini and Shawe-Taylor, 2000). In this optimisation problem, only the kernel function is computed in place of $\Phi(x)$, which could be computationally expensive. By doing this, the training data are moved into a higher-dimensional feature space where they may be spread further apart and a larger margin may be found for the optimal hyperplane. For further details of SVM, readers are referred to Vapnik (1995) and Cristianini and Shawe-Taylor (2000).
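
As a worked check of equation (12), the sketch below verifies numerically that the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ on two-dimensional inputs equals the dot product $\Phi(x) \cdot \Phi(z)$ under the explicit feature map $\Phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$. The kernel and feature map are textbook examples chosen here for illustration, not taken from the cited works.

```python
# Kernel trick check: K(x, z) = (x . z)^2 equals Phi(x) . Phi(z) for the
# explicit quadratic map Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so the
# feature-space inner product is computed without ever forming Phi.
import numpy as np

def phi(x):
    """Explicit quadratic feature map R^2 -> R^3."""
    return np.array([x[0]**2, np.sqrt(2.0) * x[0] * x[1], x[1]**2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(poly_kernel(x, z))        # (1*3 + 2*(-1))^2 = 1.0
print(np.dot(phi(x), phi(z)))   # 9 - 12 + 4 = 1.0, identical (equation 12)
```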

Relevance Vector Machines

The relevance vector machine (RVM) is a recent development in kernel-based machine learning and can be used as an alternative to SVM for both regression and classification problems. RVM is based on a Bayesian formulation of a linear model with an appropriate prior that results in a sparser representation than that achieved by SVM. The model uses a hierarchical prior: an independent Gaussian prior is defined on the weight parameters in the first level, and an independent Gamma hyperprior is used for the variance parameters in the second level. This results in an overall Student-t prior on the weight parameters, which leads to model sparseness (Tipping, 2001). Relevance vector machines have recently attracted much interest in various civil engineering applications and can be used effectively for regression and classification problems. The major advantages of RVM over SVM are: (1) reduced sensitivity to the hyperparameter settings, (2) the ability to use non-Mercer kernels, (3) probabilistic output, with fewer relevance vectors for a given dataset, and (4) no need to define the parameter C. Like SVM, the RVM was originally designed for binary classification. In a two-class classification by RVM, the aim is, essentially, to predict the posterior probability of membership of one of the classes (0 or 1) for a given input x. A case may then be allocated to the class with which it has the greatest likelihood of membership.

Consider a two-class problem with training data $X = (x_1, \ldots, x_n)$ and class labels $C = (c_1, \ldots, c_n)$, where $c_i \in \{0, 1\}$.

Based on the Bernoulli distribution, the likelihood is expressed as:

$p(c \mid w) = \prod_{i=1}^{n} \sigma\{y(x_i)\}^{c_i} \left[1 - \sigma\{y(x_i)\}\right]^{1 - c_i}$    (13)

where $\sigma(y)$ is the logistic sigmoid function:

$\sigma(y(x)) = \dfrac{1}{1 + \exp(-y(x))}$

To obtain $p(c \mid w)$, an iterative method has to be used. Let $\alpha_i^{*}$ denote the maximum a posteriori estimate of the hyperparameter $\alpha_i$. The maximum a posteriori estimate of the weights ($w_{MP}$) can be obtained by maximizing the following objective function:

$f(w_1, w_2, \ldots, w_n) = \sum_{i=1}^{n} \log p(c_i \mid w_i) + \sum_{i=1}^{n} \log p(w_i \mid \alpha_i^{*})$    (14)

where the first summation term corresponds to the likelihood of the class labels and the second corresponds to the prior on the parameters $w_i$. In the resulting solution, the gradient of $f$ with respect to $w$ is calculated, and only those training data having nonzero coefficients $w_i$ (called the relevance vectors) contribute to the decision function.

The posterior is approximated around $w_{MP}$ by a Gaussian approximation with covariance

$\Sigma = -\left(H\big|_{w_{MP}}\right)^{-1}$

and mean

$\mu = \Sigma \Phi^{T} B c$

where $H$ is the Hessian of $f$, the matrix $\Phi$ has elements $\phi_{ij} = K(x_i, x_j)$, and $B$ is a diagonal matrix with elements $\sigma(y(x_i))\left[1 - \sigma(y(x_i))\right]$.

An iterative analysis is followed to find the set of weights that maximizes function (14), in which the hyperparameters $\alpha_i$ associated with each weight are updated. For further details of RVM, readers are referred to Tipping (2001).
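
scikit-learn provides no RVM class, but its ARDRegression applies the same hierarchical Gaussian/Gamma prior described above to each weight; using an RBF kernel matrix as the design matrix then mimics relevance vector regression. The sketch below rests on those assumptions (scikit-learn, synthetic data, an RVM-flavoured construction rather than Tipping's exact updates):

```python
# RVM-flavoured sparse Bayesian regression: ARDRegression places an
# independent Gaussian prior (with Gamma hyperprior) on each weight and
# prunes most of them; the surviving kernel weights play the role of
# relevance vectors. Illustrative only, not Tipping's (2001) algorithm.
import numpy as np
from sklearn.linear_model import ARDRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-4.0, 4.0, size=(80, 1)), axis=0)
y = np.sinc(X).ravel() + 0.05 * rng.standard_normal(80)

Phi = rbf_kernel(X, X, gamma=0.5)    # design matrix, phi_ij = K(x_i, x_j)
model = ARDRegression().fit(Phi, y)

relevance = np.flatnonzero(np.abs(model.coef_) > 1e-3)
print("relevance vectors:", relevance.size, "of", X.shape[0])

# Probabilistic output: predictive mean and standard deviation per point.
mean, std = model.predict(Phi, return_std=True)
```

The sparsity and the predictive standard deviation illustrate the two RVM properties emphasized above: few kernel terms survive training, and each prediction comes with an uncertainty estimate.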

SVM/RVM for land cover classification

The first reported use of SVM for remote sensing data classification was in 1998 by Gualtieri and Cromp (1998), who used an SVM to classify an Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) image. Zhu and Blumberg (2002) classified an Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) image and reported the results of a classification experiment with an SVM using polynomial and radial basis function (RBF) kernels. Melgani and Bruzzone (2002) used an SVM to classify an AVIRIS image and compared it with K-nearest neighbours (K-NN) and radial basis function neural network (RBF-NN) classifiers. Huang et al. (2002) intensively investigated the accuracy obtained from SVM classifiers on a Landsat TM image and reported a better performance by the SVM classifier in comparison with a back-propagation neural network, a decision tree and the maximum likelihood classifier. Pal and Mather (2003) reported the use of SVM with multi- and hyperspectral data and found that SVM provided an improved performance in comparison to the neural network and maximum likelihood classifiers with both types of data. After 2003, a large number of works reported the use of SVM for various applications of remote sensing data (Archibald and Fann, 2007; Bazi and Melgani, 2006; Camps-Valls et al., 2004; Camps-Valls and Bruzzone, 2005; Camps-Valls et al., 2006; Camps-Valls et al., 2008; Dixon and Candade, 2008; Dalponte et al., 2008; Foody and Mathur, 2004, 2006; Mathur and Foody, 2008; Melgani and Bruzzone, 2004; Mazzoni et al., 2007; Plaza et al., 2008; Pal and Mather, 2004, 2005, 2006; Pal, 2006; Waske and Benediktsson, 2007).

Like SVM, the RVM may be used for land cover classification problems with remote sensing data. An exhaustive literature survey suggests only two applications of RVM for land cover classification. The first, by Demir and Ertürk (2007), utilized RVM for hyperspectral data classification and compared its performance with SVM; it suggested that the performance of RVM is slightly inferior to that of SVM with the same training dataset, but that RVM requires a smaller number of relevance vectors than the number of support vectors used by SVM. Another study, by Foody (2008), used airborne thematic mapper data to classify crop types at an agricultural test site and reported trends in classification accuracy and number of relevance vectors similar to those of Demir and Ertürk (2007). He also suggested that the probabilistic output may be of value to the analyst undertaking the classification and may help to identify routes to refine the analysis to obtain further increases in classification accuracy.

Multiclass approaches with SVM and RVM

Originally, SVM were developed to perform binary classification. However, applications of binary classification are very limited, especially in remote sensing where most classification problems involve more than two classes. A number of methods to generate multiclass SVM have been proposed in the literature, and this is still a continuing research topic. The most commonly used approaches, such as one vs. one, one vs. rest, Directed Acyclic Graph (DAG) and Error Corrected Output Coding (ECOC), create many binary classifiers and combine their results to determine the class label of a test pixel. Another category of multiclass approach modifies the binary objective function to allow simultaneous computation of a multiclass classification by solving a single optimisation problem. A survey of the remote sensing literature suggests that the majority of published work used the one vs. one or one vs. rest multiclass approaches. A study by Pal (2005) examined six approaches for the solution of the multiclass classification problem using remote sensing data with SVM; its results suggest the suitability of the one vs. one approach for the dataset used, in terms of both classification accuracy and computational cost. A recent study by Mathur and Foody (2008) suggested that the multiclass approach proposed by Weston and Watkins (1998) works well in comparison to the one vs. one and one vs. rest approaches for their dataset. For RVM, Demir and Ertürk (2007) used the one vs. one approach, whereas Foody (2008) employed the multinomial approach proposed by Zhang and Malik (2005). Within the last few years a number of works suggesting different multiclass approaches have been reported in the literature. Some, such as a dendrogram-based approach (Benabdeslem and Bennani, 2006) and fuzzy support vector machines (Abe and Inoue, 2002), were found to perform well in comparison to the one vs. one approach. Other approaches such as LaRank (Bordes et al., 2007), Half-Against-Half (Lei and Govindaraju, 2005) and the use of inter-class confusion (Godbole et al., 2002) have also been proposed for multiclass SVM.
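
Both decomposition schemes are simple to reproduce; the hedged sketch below (scikit-learn, synthetic five-class data, not any of the cited remote sensing datasets) wraps a binary SVM in one vs. one and one vs. rest meta-classifiers and reports how many binary problems each creates.

```python
# One vs. one trains k(k-1)/2 binary SVMs, one vs. rest trains k; the
# binary outputs are then combined to label each test pixel.
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=8, n_informative=6,
                           n_classes=5, random_state=0)

ovo = OneVsOneClassifier(SVC(kernel="rbf")).fit(X, y)
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)

print("one vs. one binary classifiers:", len(ovo.estimators_))   # 5*4/2 = 10
print("one vs. rest binary classifiers:", len(ovr.estimators_))  # 5
```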

Choice of Kernel

The choice of a proper kernel function plays an important role in SVM-based classification and regression. A number of kernels are discussed in the literature (Vapnik, 1995), but it is difficult to choose one which gives the best generalization for a given dataset. Pal (2002) used five different kernels (the linear, polynomial, radial basis function (RBF), linear spline and sigmoid kernels) to investigate the effect of kernel choice on classification accuracy with multispectral data, and suggested that the radial basis function and linear spline kernels perform equally well and achieve the highest accuracy for that dataset. Recently, Camps-Valls et al. (2006) and Camps-Valls et al. (2008) proposed new kernels that take account of the spatial and spectral characteristics of remote sensing data for land cover classification and found them to work well in comparison to the RBF kernel. Mercier and Lennon (2003) and Sap and Kohram (2008) used a modified kernel that takes into consideration the spectral similarity between support vectors for the classification of hyperspectral data; this kernel uses the spectral angle to evaluate the distance between support vectors, and results from these studies suggest that the approach can compete with existing classification methods. Recently, Üstün et al. (2006) proposed a universal Pearson VII function based kernel (PUK) for SVM-based regression problems and suggested that this kernel can be an alternative to the linear, polynomial and radial basis function kernels. Studies using remote sensing as well as some other datasets suggest an improved performance by PUK-based SVM in comparison with RBF-based SVM (Table 1).

Table 1: Performance of RBF and PUK kernel based support vector machines on different datasets.

Data set | RBF based support vector machine | PUK based support vector machine
ETM+ data as reported in Pal (2008) | Accuracy = 88.66% | Accuracy = 89.59%
Data as used in Pal and Deswal (2008) | CC = 0.969, RMSE = 0.328 | CC = 0.969, RMSE = 0.327
Dataset used in Pal and Goel (2007) | CC = 0.996, RMSE = 0.0014 | CC = 0.997, RMSE = 0.0013
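
Custom kernels such as PUK are easy to plug in, since most SVM libraries accept any callable that returns a Gram matrix. The sketch below implements the Pearson VII function based kernel in the form given by Üstün et al. (2006) for scikit-learn's SVC; the sigma and omega values (and the synthetic data) are illustrative assumptions, not the tuned settings behind Table 1.

```python
# Pearson VII universal kernel (PUK), following Ustun et al. (2006):
# K(x, z) = 1 / [1 + (2*||x - z||*sqrt(2^(1/omega) - 1) / sigma)^2]^omega
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.svm import SVC

def puk_kernel(X, Z, sigma=1.0, omega=1.0):
    d = euclidean_distances(X, Z)          # pairwise ||x_i - z_j||
    scale = 2.0 * np.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + (d * scale) ** 2) ** omega

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
clf = SVC(kernel=puk_kernel, C=10.0).fit(X, y)  # callable returns Gram matrix
print("training accuracy:", clf.score(X, y))
```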

Parameter selection with SVM

SVM is a powerful machine learning method for both classification and regression, but it requires the user to set two or more parameters which affect the training process; their values can have a profound effect on the final performance, and finding good parameters can become a computational burden as the size of the dataset increases. Most studies in remote sensing have used a grid search together with k-fold cross-validation error for parameter selection with SVM. A grid search tries values of each parameter across a specified search range in geometric steps. Grid searches are computationally expensive because the model must be evaluated at many points within the grid for each parameter. Recently, Yan et al. (2007) proposed an evolutionary algorithm to determine automatically the optimal parameters of SVM, achieving better classification accuracy and generalization ability simultaneously. They used this evolutionary SVM for land cover classification with remote sensing data, and their study suggests that using an evolutionary algorithm to find the optimal parameters improves overall accuracy and generalization ability in comparison to a conventional SVM. Several other approaches using evolution strategies, genetic algorithms, particle swarm optimization and their combination with grid search have been proposed in the literature (Keerthi, 2002; Frohlich and Zell, 2005; Friedrichs and Igel, 2005; Lorena and de Carvalho, 2006; Liu et al., 2006; Chunhong and Licheng, 2004; Guo et al., 2006) and found to perform well in comparison to the grid search method for different datasets.
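
The grid search with k-fold cross-validation described above corresponds to a few lines of code; the sketch below uses scikit-learn, synthetic data and an illustrative geometric grid over C and gamma, all of which are assumptions rather than settings from the cited studies.

```python
# Grid search over C and gamma in geometric steps, scored by 5-fold
# cross-validated accuracy. With a 5 x 5 grid and 5 folds the SVM is
# refit 125 times, which is why grid searches become expensive.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

param_grid = {"C": np.logspace(-1, 3, 5),      # 0.1 ... 1000
              "gamma": np.logspace(-3, 1, 5)}  # 0.001 ... 10
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```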

Visualization of SVM model

SVM builds more accurate models than many other popular machine learning approaches, but it is at a disadvantage in terms of intuitive presentation of the classifier, particularly when compared to techniques such as classification trees. High prediction accuracy and interpretability of models are two important criteria for machine learning algorithms; while high-accuracy classifiers such as SVM have been explored intensively, their interpretability still poses a difficult problem. An SVM provides only the support vectors, a subset of the training data that defines the decision boundary. This reduces the number of data points to consider in the interpretation, but does not answer questions such as "which are the most important factors that determine the class of an instance?", "what is the magnitude of their effect?" and "how do the various factors interact?" (Jakulin et al., 2005). Within the last few years, several works have reported approaches to visualize SVM models. Poulet (2004) proposed a cooperative approach using SVM and visualization methods, whereas Jakulin et al. (2005) used nomograms, an established model visualization technique that can graphically encode the complete model on a single page. Caragea et al. (2004) used tour-based methods to plot aspects of the SVM classifier; their approach provides insights into the cluster structure of the data, the nature of the boundaries between clusters and problematic outliers, and they suggested that tours can be used to assess variable importance. Recently, Üstün et al. (2007) proposed a technique to visualize the information content of the kernel matrix and a way to interpret the ingredients of a support vector regression (SVR) model; their approach gives an idea of what is happening inside the complex SVM model and which variables of the input space are the most important.

SVM and RVM based regression for remote sensing data

In the field of regression, only a few applications of support vector regression (SVR) and RVM-based methods have been published using remote sensing data. Zhan et al. (2003) used SVR as the nonlinear transfer function between oceanic chlorophyll concentration and marine reflectance and found it to work well in comparison to a neural network. Camps-Valls et al. (2006) used the ε-Huber loss function in the SVR formulation for the estimation of biophysical parameters extracted from remotely sensed data and compared it to other cost functions in the SVR, neural networks and classical bio-optical models for the particular case of estimating ocean chlorophyll concentration from satellite remote sensing data; they suggested that the proposed model provides more accurate, less biased and improved results. Wang et al. (2007) used a kernel-based bidirectional reflectance distribution function model inversion method for land surface parameter retrieval. Durbha et al. (2007) used SVR for the retrieval of leaf area index (LAI) from multiangle imaging spectroradiometer data. Camps-Valls et al. (2006) also proposed a novel formulation of the RVM that incorporates prior knowledge of the problem, tested successfully for quantifying chlorophyll-a concentrations from multispectral satellite ocean colour data; they suggested that their approach provides better results than standard RVM formulations, SVR and neural networks.
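
As a minimal illustration of the SVR models referred to above, the sketch below fits an epsilon-insensitive support vector regressor on synthetic one-dimensional data (an assumption for illustration, not the ocean-colour or LAI datasets of the cited studies).

```python
# Epsilon-insensitive SVR: deviations smaller than epsilon are ignored,
# while C penalizes larger ones; only points outside the epsilon tube
# become support vectors.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 5.0, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)

reg = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("support vectors used:", reg.support_.size, "of", X.shape[0])
print("training R^2:", round(reg.score(X, y), 3))
```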

SVM and RVM for civil engineering applications

Soft computing techniques, such as artificial neural networks, are being used to solve various problems in geotechnical, water resource, transportation and structural engineering. Results obtained by neural networks have been compared with different equations proposed in the literature and found to work well in comparison to the empirical relations. However, a neural network based modelling algorithm requires the setting of different learning parameters (such as the learning rate and momentum), the optimal number of nodes in the hidden layer and the number of hidden layers. A large number of training iterations may cause a neural network to overtrain, which may affect the predictive capability of the model. Further, the presence of local minima is another problem with the use of a back-propagation neural network. Within the last few years, a number of research papers have reported the use of support vector machines in civil engineering and found them to work well in comparison to the neural network approach.

Several works report the use of SVM in geotechnical engineering and found it quite effective in comparison to neural networks and empirical relations (Pal, 2006; Gill et al., 2006; Gill et al., 2007; Goh and Goh, 2007; Samui, 2008a, 2008b, 2008c; Samui et al., 2008; Zhao, 2008). A recent study by Pal and Deswal (2008) suggests an improved performance by the generalized regression neural network (GRNN) in comparison to SVM, indicating the need to further explore the potential of GRNN for geotechnical engineering problems. Samui (2007; 2008d) explored the potential of RVM to assess seismic liquefaction potential and the settlement of shallow foundations on cohesionless soils, respectively, and found it to work well in comparison to the neural network approach.

Several studies have also examined the potential of support vector machines in different water resource engineering applications (Dibike et al., 2001; Liong and Sivapragasam, 2002; Cannas et al., 2004; Sivapragasam and Muttil, 2005; Pal and Goel, 2006; Khan and Coulibaly, 2006; Pao-shan et al., 2006; Pal and Goel, 2007; Msiza et al., 2007; Hong and Pai, 2007; Singh et al., 2008; Deswal and Pal, 2008; Hanbay et al., in press; Goel and Pal, in press). The SVM showed good performance and proved to be competitive with neural networks on different problems in water resource engineering, although the study by Msiza et al. (2007) suggested a better performance by the ANN approach in water demand forecasting. An exhaustive literature review reveals few applications of RVM in water resources engineering (Tripathi and Govindaraju, 2007; Ghosh and Mujumdar, 2008); Ghosh and Mujumdar (2008) reported a better performance by RVM for their dataset in comparison to the SVM-based regression approach.

Compared to geotechnical and water resource engineering, few works reporting the use of SVM/RVM in other areas of civil engineering are found in the literature. An et al. (2007) compared the performance of SVM in conceptual cost estimation against a discriminant analysis model and suggested a better performance by the SVM-based approach. Park et al. (2008) used a two-step SVM classifier for railroad track damage identification and reported that a damage estimation rate of 96.67% was achieved. Bulut et al. (2004) used SVM for real-time nondestructive structural health monitoring, whereas Oh (2007) reported the use of RVM in earthquake engineering and structural health monitoring.

Future Perspectives

In this article, we reviewed two kernel-based classification and regression approaches, as well as the factors affecting them, for remote sensing and civil engineering applications. Besides SVM/RVM, nonparametric models such as the Bayesian framework of Gaussian process (GP) models have recently received significant attention in the field of machine learning. In comparison to backpropagation neural networks, GP models are conceptually simpler to understand and implement in practice (Seeger et al., 2003). GP models are closely related to approaches such as support vector machines (Vapnik, 1995) and especially relevance vector machines (Quiñonero-Candela and Rasmussen, 2005) through the use of kernel functions (Bishop and Tipping, 2003). Compared with SVM-based regression models, GP models are probabilistic in nature (Rasmussen and Williams, 2006); this means that they provide a measure of the reliability of the model response for the given input data (Yuan et al., 2008). GP models were initially designed to solve regression problems only, but several approaches have been proposed to use them for classification (Williams and Barber, 1998; Seeger and Jordan, 2004; Girolami and Rogers, 2006). Owing to their good performance in practice, GP models are being applied to solve various problems effectively (Chen et al., 2007; Likar and Kocijan, 2007; Yuan et al., 2008), but so far no study has reported their use in civil engineering. Recently, Pal and Deswal (under communication) reported the use of the GP regression approach to predict the load capacity of driven piles and found it to perform well in comparison to SVM and GRNN for the dataset used in that study (Table 2), suggesting the data-dependent nature of different machine learning algorithms.

Table 2: Performance of GP regression, SVM and GRNN in predicting the load capacity of driven piles (Pal and Deswal, under communication).

Approach | Root mean square error | R² value (coefficient of determination)
Gaussian Process Regression with PUK kernel | 312.18 | 0.898
Gaussian Process Regression with RBF kernel | 335.85 | 0.881
SVMs with PUK kernel | 334.05 | 0.887
SVMs with RBF kernel | 371.89 | 0.855
GRNN | 436.42 | 0.835
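
A hedged sketch of the GP regression referred to above follows, using scikit-learn's GaussianProcessRegressor with an RBF kernel on synthetic data (an assumption, not the pile capacity dataset behind Table 2); the predictive standard deviation is the probabilistic output that distinguishes GP models from standard SVR.

```python
# Gaussian process regression: alongside each predictive mean the model
# returns a standard deviation, i.e. a reliability measure for that
# prediction, which plain SVR does not provide.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 5.0, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(60)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

X_new = np.linspace(0.0, 5.0, 5).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)
for m, s in zip(mean, std):
    print(f"prediction {m:.3f} +/- {s:.3f}")
```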

References:

Archibald, R. and Fann, G. (2007): Feature Selection and Classification of Hyperspectral Images with Support Vector Machines. IEEE Geoscience and Remote Sensing Letters, Vol. 4, No. 2, pp. 674-677.
Abe, S. and Inoue, T. (2002): Fuzzy Support Vector Machines for Multiclass Problems. ESANN'2002 Proceedings - European Symposium on Artificial Neural Networks, Bruges (Belgium), 24-26 April 2002, pp. 113-118.
An, S-H., Park, U-Y., Kang, K-I., Cho, M-Y. and Cho, H-H. (2007): Application of Support Vector Machines in Assessing Conceptual Cost Estimates. J. Comp. in Civ. Engrg., Vol. 21, pp. 259-264.
Bordes, A., Bottou, L., Gallinari, P. and Weston, J. (2007): Solving MultiClass Support Vector Machines with LaRank. Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR.
Benabdeslem, K. and Bennani, Y. (2006): Dendogram Based SVM for Multi-Class Classification. 28th International Conference on Information Technology Interfaces, June 19-22, Cavtat, Croatia, pp. 173-178.
Bulut, A., Singh, A. K., Shin, P., Jasso, H., Fountain, T., Yan, L. and Elgamal, A. (2004): Real-time Nondestructive Structural Health Monitoring using Support Vector Machines and Wavelets. Department of Computer Science, University of California, Santa Barbara, Report ID: 2004-26.
Bazi, Y. and Melgani, F. (2006): Toward an Optimal SVM Classification System for Hyperspectral Remote Sensing Images. IEEE Transactions on Geoscience and Remote Sensing, Vol. 44, No. 11, pp. 3374-3385.
Boser, B., Guyon, I. and Vapnik, V. N. (1992): A training algorithm for optimal margin classifiers. Proceedings of the 5th Annual Workshop on Computational Learning Theory, Pittsburgh, PA: ACM, pp. 144-152.
Bishop, C. M. and Tipping, M. E. (2003): Bayesian regression and classification. In: Suykens, J., Horvath, G., Basu, S., Micchelli, C. and Vandewalle, J. (editors), Advances in Learning Theory: Methods, Models and Applications. NATO Science Series III: Computer and Systems Sciences, Vol. 190. Amsterdam: IOS Press.
Burges, C. J. C. (1998): A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, pp. 121-167.
Chen, T., Morris, J. and Martin, E. (2007): Gaussian process regression for multivariate spectroscopic calibration. Chemometrics and Intelligent Laboratory Systems, Vol. 87, No. 1, pp. 59-71.
Camps-Valls, G., Gómez-Chova, L., Calpe-Maravilla, J., Martín-Guerrero, J. D., Soria-Olivas, E., Alonso-Chordá, L. and Moreno, J. (2004): Robust Support Vector Method for Hyperspectral Data Classification and Knowledge Discovery. IEEE Transactions on Geoscience and Remote Sensing, Vol. 42, No. 7, pp. 1530-1542.
Camps-Valls, G., Gómez-Chova, L., Muñoz-Marí, J., Rojo-Álvarez, J. L. and Martínez-Ramón, M. (2008): Kernel-Based Framework for Multitemporal and Multisource Remote Sensing Data Classification and Change Detection. IEEE Transactions on Geoscience and Remote Sensing, Vol. 46, No. 6, pp. 1822-1834.
Camps-Valls, G. and Bruzzone, L. (2005): Kernel-Based Methods for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing, Vol. 43, No. 6, pp. 1351-1362.
Camps-Valls, G., Gómez-Chova, L., Muñoz-Marí, J., Vila-Francés, J., Amorós-López, J. and Calpe-Maravilla, J. (2006): Retrieval of oceanic chlorophyll concentration with relevance vector machines. Remote Sensing of Environment, Vol. 105, No. 1, pp. 23-33.
Camps-Valls, G., Gómez-Chova, L., Muñoz-Marí, J., Vila-Francés, J. and Calpe-Maravilla, J. (2006): Composite Kernels for Hyperspectral Image Classification. IEEE Geoscience and Remote Sensing Letters, Vol. 3, No. 1, pp. 93-97.
Camps-Valls, G., Bruzzone, L., Rojo-Álvarez, J. L. and Melgani, F. (2006): Robust Support Vector Regression for Biophysical Variable Estimation from Remotely Sensed Images. IEEE Geoscience and Remote Sensing Letters, Vol. 3, No. 3, pp. 339-343.
Cortes, C. and Vapnik, V. N. (1995): Support vector networks. Machine Learning, Vol. 20, pp. 273-297.
Cristianini, N. and Shawe-Taylor, J. (2000): An Introduction to Support Vector Machines. London: Cambridge University Press.
Chunhong, Z. and Licheng, J. (2004): Automatic parameters Selection for SVM based on GA. Proceedings of the 5th World Congress on Intelligent Control and Automation, IEEE, June 15-19, Hangzhou, P.R. China, pp. 1869-1872.
Cannas, B., See, L., Montisci, A., Fanni, A. and Sechi, G. M. (2004): Comparing Artificial Neural Networks and Support Vector Machines for Modelling Rainfall-Runoff. In: Proceedings of Hydroinformatics 2004, Singapore, 21-24 July.
Caragea, D., Cook, D., Wickham, H. and Honavar, V. (2008): Visual Methods for Examining SVM Classifiers. In: Visual Data Mining: Theory, Techniques, and Tools for Visual Analytics. Springer, LNCS Volume 4404.
Dalponte, M., Bruzzone, L. and Gianelle, D. (2008): Fusion of Hyperspectral and LIDAR Remote Sensing Data for Classification of Complex Forest Areas. IEEE Transactions on Geoscience and Remote Sensing, Vol. 46, No. 5, pp. 1416-1427.
Demir, B. and Ertürk, S. (2007): Hyperspectral Image Classification Using Relevance Vector Machines. IEEE Geoscience and Remote Sensing Letters, Vol. 4, No. 4, pp. 586-590.
Dixon, B. and Candade, N. (2008): Multispectral landuse classification using neural networks and support vector machines: one or the other, or both? International Journal of Remote Sensing, Vol. 29, No. 4, pp. 1185-1206.
Deswal, S. and Pal, M. (2008): Modelling of Pan Evaporation Using Support Vector Machines Algorithm. ISH Journal of Hydraulic Engineering (ISSN 0971-5010), Vol. 14, No. 1, pp. 104-116.
Dibike, Y. B., Velickov, S., Solomatine, D. and Abbott, M. B. (2001): Model Induction with Support Vector Machines: Introduction and Applications. J. Comp. in Civ. Engrg., Vol. 15, pp. 208-216.
Durbha, S. S., King, R. L. and Younan, N. H. (2007): Support vector machines regression for retrieval of leaf area index from multiangle imaging spectroradiometer. Remote Sensing of Environment, Vol. 107, pp. 348-361.
Friedrichs, F. and Igel, C. (2005): Evolutionary Tuning of Multiple SVM Parameters. Neurocomputing, Vol. 64(C), pp. 107-117.
Frohlich, H. and Zell, A. (2005): Efficient Parameter Selection for Support Vector Machines in Classification and Regression via Model-Based Global Optimization. Proceedings of the International Joint Conference on Neural Networks, Montreal, Canada, July 31-August 4, pp. 1431-1436.
Foody, G. M. and Mathur, A. (2006): The use of small training sets containing mixed pixels for accurate hard image classification: Training on mixed spectral responses for classification by a SVM. Remote Sensing of Environment, Vol. 103, No. 2, pp. 179-189.
Foody, G. M. (2008): RVM-based multi-class classification of remotely sensed data. International Journal of Remote Sensing, Vol. 29, No. 6, pp. 1817-1823.
Foody, G. M. and Mathur, A. (2004): A Relative Evaluation of Multiclass Image Classification by Support Vector Machines. IEEE Transactions on Geoscience and Remote Sensing, Vol. 42, No. 6, pp. 1335-1343.
Ghosh, S. and Mujumdar, P. P. (2008): Statistical downscaling of GCM simulations to streamflow using relevance vector machine. Advances in Water Resources, Vol. 31, pp. 132-146.
Girolami, M. and Rogers, S. (2006): Variational Bayesian Multinomial Probit Regression with Gaussian Process Priors. Neural Computation, Vol. 18, No. 8, pp. 1790-1817.
Godbole, S., Sarawagi, S. and Chakrabarti, S. (2002): Scaling multi-class support vector machines using inter-class confusion. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002), Edmonton, Canada.
Guo, X. C., Liang, Y. C., Wu, C. G. and Wang, C. Y. (2006): PSO-Based Hyper-Parameters Selection for LS-SVM Classifiers. Lecture Notes in Computer Science 4233, Springer, Berlin/Heidelberg, pp. 1138-1147.
Goh, A. T. C. and Goh, S. H. (2007): Support vector machines: Their use in geotechnical engineering as illustrated using seismic liquefaction data. Computers and Geotechnics, Vol. 34, No. 5, pp. 410-421.
Goel, A. and Pal, M.: Application of support vector machines in scour prediction on grade-control structures. Engineering Applications of Artificial Intelligence (accepted for final publication).
Gill, M. K., Asefa, T., Kemblowski, M. W. and McKee, M. (2006): Soil Moisture Prediction using Support Vector Machines. Journal of the American Water Resources Association, Vol. 42, No. 4, pp. 1033-1046.
Gill, M. K., Kemblowski, M. W. and McKee, M. (2007): Soil Moisture Data Assimilation Using Support Vector Machines and Ensemble Kalman Filter. Journal of the American Water Resources Association, Vol. 43, No. 4, pp. 1004-1015.
Gualtieri, J. A. and Cromp, R. F. (1998): Support vector machines for hyperspectral remote sensing classification. Proceedings of the SPIE, 27th AIPR Workshop: Advances in Computer Assisted Recognition, Washington, DC, October 14-16, pp. 221-232.
Huang, C., Davis, L. S. and Townshend, J. R. G. (2002): An assessment of support vector machines for land cover classification. International Journal of Remote Sensing, Vol. 23, No. 4, pp. 725-749.
Hong, W-C. and Pai, P-F. (2007): Potential assessment of the support vector regression technique in rainfall forecasting. Water Resources Management, Vol. 21, pp. 495-513.
Hanbay, D., Baylar, A. and Batan, M.: Prediction of aeration efficiency on stepped cascades by using least square support vector machines. Expert Systems with Applications (in press).
Jakulin, A., Mozina, M., Bratko, I. and Zupan, B. (2005): Nomograms for Visualizing Support Vector Machines. Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Chicago, Illinois, USA, pp. 108-117.
Keerthi, S. S. (2002): Efficient tuning of SVM hyperparameters using radius/margin bound and iterative algorithms. IEEE Transactions on Neural Networks, Vol. 13, No. 5, pp. 1225-1229.
Khan, M. S. and Coulibaly, P. (2006): Application of Support Vector Machine in Lake Water Level Prediction. J. Hydrologic Engrg., Vol. 11, pp. 199-205.
Lei, H. and Govindaraju, V. (2005): Half-Against-Half Multi-class Support Vector Machines. Multiple Classifier Systems, pp. 156-164.
Liu, R., Liu, E., Yang, J., Li, M. and Wang, F. (2006): Optimizing the hyper-parameters for SVM by combining evolution strategies with a grid search. International Conference on Intelligent Computing, Kunming, China, Vol. 44, pp. 712-721.
Liong, S-Y. and Sivapragasam, C. (2002): Flood Stage Forecasting with Support Vector Machines. Journal of the American Water Resources Association, Vol. 38, No. 1, pp. 173-186.
Likar, B. and Kocijan, J. (2007): Predictive control of a gas-liquid separation plant based on a Gaussian process model. Computers & Chemical Engineering, Vol. 31, No. 3, pp. 142-152.
Lorena, A. C. and de Carvalho, A. C. P. L. F. (2006): Multiclass SVM Design and Parameter Selection with Genetic Algorithms. Proceedings of the Ninth Brazilian Symposium on Neural Networks (SBRN'06), pp. 23-28.
Mazzoni, D., Garay, M. J., Davies, R. and Nelson, D. (2007): An operational MISR pixel classifier using support vector machines. Remote Sensing of Environment, Vol. 107, No. 1-2, pp. 149-158.
Mercier, G. and Lennon, M. (2003): Support vector machines for hyperspectral image classification with spectral-based kernels. Proceedings of IEEE Geoscience and Remote Sensing Symposium, pp. 288-290.
Melgani, F. and Bruzzone, L. (2002): Support vector machines for classification of hyperspectral remote-sensing images. Proceedings of IEEE Geoscience and Remote Sensing Symposium, Toronto, Canada, pp. 506-508.
Melgani, F. and Bruzzone, L. (2004): Classification of Hyperspectral Remote Sensing Images with Support Vector Machines. IEEE Transactions on Geoscience and Remote Sensing, Vol. 42, No. 8, pp. 1778-1790.
Mathur, A. and Foody, G. M. (2008): Multiclass and Binary SVM Classification: Implications for Training and Classification Users. IEEE Geoscience and Remote Sensing Letters, Vol. 5, No. 2, pp. 241-245.
Msiza, I. S., Nelwamondo, F. V. and Marwala, T. (2007): Artificial Neural Networks and Support Vector Machines for water demand time series forecasting. Proceedings of IEEE International Conference on Systems, Man and Cybernetics, pp. 638-643.
Oh, C. K. (2007): Bayesian Learning for Earthquake Engineering Applications and Structural Health Monitoring. Unpublished PhD thesis, California Institute of Technology, Pasadena, California.
Plaza, A., Benediktsson, J. A., Boardman, J., Brazile, J., Bruzzone, L., Camps-Valls, G., Chanussot, J., Fauvel, M., Gamba, P., Gualtieri, A., Marconcini, M., Tilton, J. C. and Trianni, G. (2008): Recent Advances in Techniques for Hyperspectral Image Processing. Remote Sensing of Environment (in press).
Pal, M. (2008): Ensemble of support vector machines for land cover classification. International Journal of Remote Sensing, Vol. 29, No. 10, pp. 3043-3049.
Pal, M. (2002): Factors influencing the accuracy of remote sensing classifications: a comparative study. Ph.D. thesis (unpublished), University of Nottingham, UK.
Pal, M. (2006): Support Vector Machines Based Feature Selection for Land Cover Classification: a case study with DAIS Hyperspectral Data. International Journal of Remote Sensing, Vol. 27, No. 14, pp. 2877-2894.
Pal, M. and Mather, P. M. (2006): Some Issues in the Classification of Remote Sensing Data: A Case Study with DAIS Hyperspectral Data. International Journal of Remote Sensing, Vol. 27, No. 14, pp. 2895-2916.
Pal, M. and Mather, P. M. (2005): Support Vector Machines for Classification in Remote Sensing. International Journal of Remote Sensing, Vol. 26, No. 5, pp. 1007-1011.
Pal, M. and Mather, P. M. (2004): Assessment of the effectiveness of support vector machines for hyperspectral data. Future Generation Computer Systems, Vol. 20, No. 7, pp. 1215-1225.
Pal, M. and Deswal, S. (2008): Modeling Pile Capacity Using Support Vector Machines and Generalized Regression Neural Network. ASCE Journal of Geotechnical and Geoenvironmental Engineering, Vol. 134, No. 7, pp. 1021-1024.
Pal, M. and Goel, A. (2007): Prediction of End-Depth-Ratio and Discharge in Trapezoidal Shaped Channels Using Support Vector Machines. Water Resources Management, Vol. 21, pp. 1763-1780.
Pal, M. (2006): Support vector machines-based modelling of seismic liquefaction potential. International Journal for Numerical and Analytical Methods in Geomechanics, Vol. 30, pp. 983-996.
Pal, M. and Goel, A. (2006): Prediction of End-Depth-Ratio and Discharge in Semi-Circular and Circular Shaped Channels Using Support Vector Machines. Flow Measurement and Instrumentation, Vol. 17, No. 1, pp. 49-57.
Pal, M. (2005): Multiclass Approaches for Support Vector Machine Based Land Cover Classification. Proc. Map India, 8th Annual International Conference and Exhibition in the field of GIS, GPS, Aerial Photography and Remote Sensing, New Delhi, Proceedings on CD.
Pal, M. and Mather, P. M. (2003): Support vector classifiers for land cover classification. Map India 2003, New Delhi, January 28-31, 2003, Proceedings on CD.
Pal, M. and Deswal, S.: Modeling Pile Capacity Using Gaussian Process Regression. Computers and Geotechnics (under correction after first review).
Park, S., Inman, D. J., Lee, J-J. and Yun, C-B. (2008): Piezoelectric Sensor-Based Health Monitoring of Railroad Tracks Using a Two-Step Support Vector Machine Classifier. J. Infrastruct. Syst., Vol. 14, pp. 80-88.
Poulet, F. (2004): SVM and Graphical Algorithms: a Cooperative Approach. Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04), pp. 499-502.
Pao-shan, Y., Shien, T. C. and I-Fan, C. (2006): Support vector regression for real time flood stage forecasting. Journal of Hydrology, Vol. 328, pp. 704-716.
Quiñonero-Candela, J. and Rasmussen, C. E. (2005): Analysis of some methods for reduced rank Gaussian process regression. In: Murray-Smith, R. and Shorten, R. (editors), Switching and Learning in Feedback Systems. Lecture Notes in Computer Science, Vol. 3355, Springer.
Rasmussen, C. E. and Williams, C. K. I. (2006): Gaussian Processes for Machine Learning. The MIT Press, Cambridge, MA.
Sap, M. N. M. and Kohram, M. (2008): Spectral Angle Based Kernels for the Classification of Hyperspectral Images Using Support Vector Machines. Second Asia International Conference on Modelling & Simulation, Kuala Lumpur, Malaysia, 13-15 May, pp. 559-563.
Singh, K. K., Pal, M., Ojha, C. S. and Singh, V. P. (2008): Estimation of removal efficiency for settling basins using neural networks and support vector machines. ASCE Journal of Hydrologic Engineering, Vol. 13, No. 3, pp. 146-155.
Samui, P. (2008a): Prediction of friction capacity of driven piles in clay using the support vector machine. Canadian Geotechnical Journal, Vol. 45, No. 2, pp. 288-295.
Samui, P. (2008b): Predicted ultimate capacity of laterally loaded piles in clay using support vector machine. Geomechanics and Geoengineering, Vol. 3, No. 2, pp. 113-120.
Samui, P. (2008c): Support vector machine applied to settlement of shallow foundations on cohesionless soils. Computers and Geotechnics, Vol. 35, No. 3, pp. 419-427.
Samui, P. (2008d): Relevance vector machine applied to settlement of shallow foundation on cohesionless soils. Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards, Vol. 2, No. 1, pp. 41-47.
Samui, P. (2007): Seismic liquefaction potential assessment by using Relevance Vector Machine. Earthquake Engineering and Engineering Vibration, Vol. 6, No. 4, pp. 331-336.
Samui, P., Sitharam, T. G. and Kurup, P. U. (2008): OCR Prediction Using Support Vector Machine Based on Piezocone Data. J. Geotech. and Geoenvir. Engrg., Vol. 134, pp. 894-898.
Seeger, M., Williams, C. K. I. and Lawrence, N. D. (2003): Fast forward selection to speed up sparse Gaussian process regression. In: Bishop, C. M. and Frey, B. J. (editors), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics.
Seeger, M. and Jordan, M. I. (2004): Sparse Gaussian process classification with multiple classes. Technical Report TR 661, Department of Statistics, University of California at Berkeley.
Schölkopf, B. (1997): Support Vector Learning. Ph.D. thesis, Technische Universität, Berlin.
Sivapragasam, C. and Muttil, N. (2005): Discharge Rating Curve Extension - A New Approach. Water Resources Management, Vol. 19, pp. 505-520.
Tipping, M. (2001): Sparse Bayesian Learning and the Relevance Vector Machine. Journal of Machine Learning Research, Vol. 1, pp. 211-244.
Tripathi, S. and Govindaraju, R. S. (2007): On selection of kernel parameters in relevance vector machines for hydrologic applications. Stochastic Environmental Research and Risk Assessment, Vol. 21, pp. 747-764.
Üstün, B., Melssen, W. J. and Buydens, L. M. C. (2006): Facilitating the application of Support Vector Regression by using a universal Pearson VII function based kernel. Chemometrics and Intelligent Laboratory Systems, Vol. 81, pp. 29-40.
Üstün, B., Melssen, W. J. and Buydens, L. M. C. (2007): Visualisation and interpretation of Support Vector Regression models. Analytica Chimica Acta, Vol. 595, pp. 299-309.
Vapnik, V. N. (1995): The Nature of Statistical Learning Theory. New York: Springer-Verlag.
Vapnik, V. N. and Chervonenkis, A. Y. (1971): On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, Vol. 17, pp. 264-280.
Waske, B. and Benediktsson, J. A. (2007): Fusion of Support Vector Machines for Classification of Multisensor Data. IEEE Transactions on Geoscience and Remote Sensing, Vol. 45, No. 12, pp. 3858-3866.
Weston, J. and Watkins, C. (1998): Multi-class Support Vector Machines. Royal Holloway, University of London, UK, Technical Report CSD-TR-98-04.
Wang, F., Li, X., Nashed, Z., Zhao, F., Yang, H., Guan, Y. and Zhang, H. (2007): Regularized kernel-based BRDF model inversion method for ill-posed land surface parameter retrieval. Remote Sensing of Environment, Vol. 111, No. 1, pp. 36-50.
Williams, C. K. I. and Barber, D. (1998): Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 12, pp. 1342-1351.
Yan, G., Lishan, K., Fujiang, L. and Linlu, M. (2007): Evolutionary support vector machine and its application in remote sensing imagery classification. In: Ju, W. and Zhao, S. (editors), Geoinformatics 2007: Remotely Sensed Data and Information, Proceedings of the SPIE, Vol. 6752, 67523D.
Yuan, J., Wang, K., Yu, T. and Fang, M. (2008): Reliable multi-objective optimization of high-speed WEDM process based on Gaussian process regression. International Journal of Machine Tools & Manufacture, Vol. 48, pp. 47-60.
Zhang, H. and Malik, J. (2005): Selecting shape features using multi-class relevance vector machine. Technical Report UCB/EECS-2006-6, Electrical Engineering and Computer Sciences, University of California at Berkeley, USA.
Zhu, G. and Blumberg, D. G. (2002): Classification using ASTER data and SVM algorithms: The case study of Beer Sheva, Israel. Remote Sensing of Environment, Vol. 80, No. 2, pp. 233-240.
Zhan, H., Shi, P. and Chen, C. (2003): Retrieval of Oceanic Chlorophyll Concentration Using Support Vector Machines. IEEE Transactions on Geoscience and Remote Sensing, Vol. 41, No. 12, pp. 2947-2951.
Zhao, H-b. (2008): Slope reliability analysis using a support vector machine. Computers and Geotechnics, Vol. 35, No. 3, pp. 459-467.