Neurocomputing 186 (2016) 66–73
Software reliability prediction via relevance vector regression

Jungang Lou a,b, Yunliang Jiang b,*, Qing Shen b, Zhangguo Shen b, Zhen Wang c, Ruiqin Wang b

a Institute of Cyber-Systems and Control, Zhejiang University, 310027 Hangzhou, China
b School of Information Engineering, Huzhou University, 313000 Huzhou, China
c College of Computer Science and Technology, Shanghai University of Electric Power, 200090 Shanghai, China

* Corresponding author. E-mail addresses: [email protected] (J. Lou), [email protected] (Y. Jiang), [email protected] (Q. Shen), [email protected] (Z. Shen), [email protected] (Z. Wang), [email protected] (R. Wang).

Article history: Received 21 September 2015; Received in revised form 27 November 2015; Accepted 9 December 2015; Available online 6 January 2016. Communicated by Liang Wang.

Abstract: The aim of software reliability prediction is to estimate future occurrences of software failures to aid in maintenance and replacement. Relevance vector machines (RVMs) are kernel-based learning methods that have been successfully adopted for regression problems. However, they have not been widely explored for use in reliability applications. This study employs an RVM-based model for software reliability prediction so as to capture the inner correlation between software failure time data and the nearest m failure time data. We present a comparative analysis in order to evaluate the RVM's effectiveness in forecasting time-to-failure for software products. In addition, we use the Mann–Kendall test method to explore the trend of predictive accuracy as m varies. A reasonable value range of m is obtained through paired T-tests on 10 frequently used failure datasets from real software projects.

Keywords: Software reliability model; Relevance vector machine; Mann–Kendall test; Paired T-test

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

In the modern world, computers are used for many different applications, and research on software reliability has become increasingly essential. Software reliability describes the probability that software will operate without failure under given environmental conditions during a specified period of time [1]. To date, software reliability models are among the most important tools in software reliability assessment [2]. Most existing software reliability models, known as parametric models, depend on a priori assumptions about software development environments, the nature of software failures, and the probability of individual failures occurring. Parametric models may exhibit different predictive capabilities across different software projects [3–8], and researchers have found it almost impossible to develop a parametric model that provides accurate predictions under all circumstances. To address this problem, several alternative solutions have been introduced over the last decade.

One possible solution is to employ artificial neural networks (ANNs) [9–17]. Karunanithi et al., Dohi et al., Cai et al., Ho et al., Tian and Noore, and Hu et al. used both classical and recurrent multi-layer neural networks to forecast software reliability. ANNs have proven to be universal approximators for any nonlinear continuous function with arbitrary accuracy. Consequently, they represent an alternative method for software reliability modeling and prediction. Unlike traditional statistical models, ANNs are data-driven, nonparametric weak models [9,11,13,14]. ANN-based software reliability models require only the failure history as input, and they can predict future failures more accurately than some commonly used parametric models. However, ANNs suffer from a number of weaknesses, including the need for numerous controlling parameters, difficulty in obtaining a stable solution, and a tendency to overfit. A novel type of learning machine, the kernel machine (KM), is emerging as a powerful modeling tool and has received increasing attention in the domain of software reliability prediction. Kernel-based models can achieve better predictive accuracy and generalization performance, thus arousing the interest of many researchers [18–22]. Generally speaking, KMs have been successfully applied to regression, with remarkable training results even given a relatively small dataset D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\} \subset R^d \times R, where the x_t are input vectors, the y_t are output values, t = 1, 2, \ldots, l, d is the dimension of x_t, and l is the number of observed input/output pairs [18].

Examples of KMs include support vector machines (SVMs) and relevance vector machines (RVMs). Vapnik [18] developed SVMs with the goal of minimizing the upper boundary of the generalization error, consisting of the sum of the training error and a confidence interval, which appears to be less computationally demanding. Tian and Noore [19] proposed an SVM-based model for software reliability prediction that embraces some remarkable characteristics of SVMs, including good generalization performance, absence of local minima, and sparse solution representation. Pai and Hong [20] and Yang and Li [21] also made efforts to develop SVM-based reliability models, showing that these models can achieve good prediction accuracy. However, SVMs are sensitive to uncertainties because of the lack of probabilistic outputs, as well as the need to determine a regularization parameter and select appropriate kernel functions to obtain optimal prediction accuracy [22–25].

This paper proposes a new data-driven approach for predicting software reliability using RVM [23–27] to capture the uncertainties in software failure data. RVM adopts kernel functions to project the input variables into a high-dimensional feature space in order to extract the latent information. Compared to SVM, it uses fewer kernel functions and avoids the use of free parameters [28–30]. The kernel-based software reliability modeling process also requires choosing the number of past observations related to the future value. Some researchers suggest that failure behavior earlier in the testing process has less impact on later failures, and that therefore not all available failure data should be used in model training. However, to the best of our knowledge, such claims lack either theoretical support or experimental evidence. This study uses the Mann–Kendall test and the paired T-test [31–33] to investigate the appropriate number of past observations related to the future value for RVM-based software reliability modeling.

The paper is organized as follows. After explaining the background of the research, Part 2 outlines the principle of RVM for regression. Part 3 introduces the framework for software reliability prediction based on RVM and describes how RVM regression can be used to predict software failure time. Part 4 discusses the process for RVM-based software reliability models and presents the experimental datasets and measures for evaluating predictability. Following that, Part 5 explains the Mann–Kendall test and paired T-test, demonstrates the detailed experimentation process, and analyzes the experimental results on the 10 datasets. Finally, Part 6 concludes the paper.

2. RVM for regression

Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. The goal of SVM classification is to separate an n-dimensional data space (transformed using nonlinear kernels) by an (n-1)-dimensional hyperplane that creates the maximum separation (margin) between two classes. This technique can be extended to regression problems in the form of support vector regression. Regression is essentially an inverse classification problem where, instead of searching for a maximum-margin classifier, a minimum-margin fit needs to be found. However, SVMs are not well suited to software reliability prediction due to the lack of probabilistic outputs. Tipping [24–27] introduced the RVM, which makes probabilistic predictions and yet retains the excellent predictive performance of the support vector machine. It also preserves the sparseness property of the SVM. The RVM is a Bayesian formulation of a generalized linear model of identical functional form to the SVM; it is a Bayesian sparse kernel technique for regression, which introduces a prior over the model weights governed by a set of hyperparameters whose most probable values are iteratively estimated from the data. In addition to the probabilistic interpretation of its output, it uses far fewer kernel functions for comparable performance. We give a brief review of RVM for regression; for a more detailed discussion, readers can refer to [24–27].

Assume that a total of N pairs of training patterns are given during the RVM learning process,

(x_1, t_1), (x_2, t_2), \ldots, (x_i, t_i), \ldots, (x_N, t_N),

where the inputs are n-dimensional vectors x_i \in R^n and the target outputs are continuous values t_i \in R. The RVM model used for function approximation is

t = y(x; w) = \sum_{i=1}^{M} w_i K(x, x_i) + w_0,   (1)

where the \{w_i\} are the parameters of the model, generally called weights, and K(\cdot, \cdot) is the kernel function. Assuming that each example from the data set has been generated independently (an often realistic assumption, although not always true), the likelihood of all the data is given by the product

p(\mathbf{t} \mid \sigma^2) = \prod_{i=1}^{N} N(t_i \mid y(x_i; w), \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left(-\frac{\|\mathbf{t} - \Phi w\|^2}{2\sigma^2}\right),

where w = [w_0, w_1, w_2, \ldots, w_N]^T, \Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_N)]^T, and \phi(x_n) = [1, K(x_n, x_1), K(x_n, x_2), \ldots, K(x_n, x_N)]^T.

Next we introduce a prior distribution over the parameter vector w. The key difference in the RVM is that we introduce a separate hyperparameter \alpha_i for each of the weight parameters w_i instead of a single shared hyperparameter. Thus, the weight prior takes the form

p(w \mid \alpha) = \prod_{i=0}^{N} \sqrt{\frac{\alpha_i}{2\pi}} \exp\left(-\frac{\alpha_i w_i^2}{2}\right), \qquad \alpha = [\alpha_0, \alpha_1, \ldots, \alpha_N].

Having defined the prior, Bayesian inference proceeds by computing, from Bayes' rule, the posterior over all unknowns given the data:

p(w, \alpha, \sigma^2 \mid \mathbf{t}) = \frac{p(\mathbf{t} \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2)}{p(\mathbf{t})}.   (2)

Then, given a new test point x_*, predictions are made for the corresponding target t_* in terms of the predictive distribution

p(t_* \mid \mathbf{t}) = \int p(t_* \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2 \mid \mathbf{t})\, dw\, d\alpha\, d\sigma^2.   (3)

We cannot compute the posterior p(w, \alpha, \sigma^2 \mid \mathbf{t}) in (2) directly. Instead, we decompose the posterior as

p(w, \alpha, \sigma^2 \mid \mathbf{t}) = p(w \mid \mathbf{t}, \alpha, \sigma^2)\, p(\alpha, \sigma^2 \mid \mathbf{t}).

The posterior distribution over the weights is thus given by

p(w \mid \mathbf{t}, \alpha, \sigma^2) = \frac{p(w, \alpha, \sigma^2 \mid \mathbf{t})}{p(\alpha, \sigma^2 \mid \mathbf{t})} = \frac{p(\mathbf{t} \mid w, \sigma^2)\, p(w \mid \alpha)}{p(\mathbf{t} \mid \alpha, \sigma^2)} = \frac{p(\mathbf{t} \mid w, \sigma^2)\, p(w \mid \alpha)}{\int p(\mathbf{t} \mid w, \sigma^2)\, p(w \mid \alpha)\, dw} = (2\pi)^{-(N+1)/2} |\Sigma|^{-1/2} \exp\left(-\frac{(w - \mu)^T \Sigma^{-1} (w - \mu)}{2}\right),

where the posterior mean and covariance are, respectively,

\mu = \sigma^{-2} \Sigma \Phi^T \mathbf{t}, \qquad \Sigma = (A + \sigma^{-2} \Phi^T \Phi)^{-1}, \qquad A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N).

By integrating out the weights, we obtain the marginal likelihood for the hyperparameters:

p(\mathbf{t} \mid \alpha, \sigma^2) = (2\pi)^{-N/2} |\Omega|^{-1/2} \exp\left(-\frac{\mathbf{t}^T \Omega^{-1} \mathbf{t}}{2}\right),   (4)

where \Omega = \sigma^2 I + \Phi A^{-1} \Phi^T. Our goal is now to maximize (4) with respect to the hyperparameters \alpha and \sigma^2. We simply set the required derivatives of the marginal likelihood to zero and obtain the following re-estimation equations:

\alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}, \qquad (\sigma^2)^{new} = \frac{\|\mathbf{t} - \Phi\mu\|^2}{N - \sum_i \gamma_i}, \qquad \gamma_i = 1 - \alpha_i \Sigma_{ii},

where \mu_i is the i-th element of the posterior mean and \Sigma_{ii} is the i-th diagonal element of the posterior covariance.
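For illustration, the re-estimation loop can be condensed into a short numerical routine. The following is a minimal NumPy sketch of these updates under the Gaussian kernel used later in the paper; it is not the authors' implementation, and the initial values, the cap on alpha and the convergence tolerance are illustrative assumptions. Phi is the N x (N+1) design matrix and t the target vector.

```python
import numpy as np

def gaussian_design(X, centers, r):
    """Design matrix Phi = [1, K(x, x_1), ..., K(x, x_N)] with the Gaussian
    kernel K(x, x_i) = exp(-||x - x_i||^2 / r^2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.hstack([np.ones((X.shape[0], 1)), np.exp(-d2 / r ** 2)])

def rvm_fit(Phi, t, n_iter=500, alpha_cap=1e9, tol=1e-6):
    """Iterate alpha_i <- gamma_i / mu_i^2 and
    sigma^2 <- ||t - Phi mu||^2 / (N - sum_i gamma_i)."""
    N, M = Phi.shape
    alpha = np.full(M, 1.0)                 # one hyperparameter per weight
    sigma2 = max(np.var(t) * 0.1, 1e-6)     # illustrative starting noise level
    for _ in range(n_iter):
        A = np.diag(alpha)
        Sigma = np.linalg.inv(A + Phi.T @ Phi / sigma2)   # posterior covariance
        mu = Sigma @ Phi.T @ t / sigma2                   # posterior mean
        gamma = 1.0 - alpha * np.diag(Sigma)              # gamma_i = 1 - alpha_i * Sigma_ii
        alpha_new = np.minimum(gamma / (mu ** 2 + 1e-12), alpha_cap)
        sigma2_new = np.sum((t - Phi @ mu) ** 2) / max(N - gamma.sum(), 1e-12)
        converged = np.max(np.abs(alpha_new - alpha)) < tol * np.max(alpha)
        alpha, sigma2 = alpha_new, sigma2_new
        if converged:
            break
    return mu, Sigma, alpha, sigma2
```

Weights whose alpha_i grows very large are effectively pruned, which is how the sparseness property described above arises in practice.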

We can then compute the predictive distribution, from (2), for a new input x_* using (3):

(\alpha_{MP}, \sigma^2_{MP}) = \arg\max_{\alpha, \sigma^2} p(\alpha, \sigma^2 \mid \mathbf{t}).

The probability distribution of the corresponding output is given by

p(t_* \mid \mathbf{t}, \alpha_{MP}, \sigma^2_{MP}) = \int p(t_* \mid w, \alpha_{MP}, \sigma^2_{MP})\, p(w \mid \mathbf{t}, \alpha_{MP}, \sigma^2_{MP})\, dw.

Since both terms in the integrand are Gaussian, this is readily computed, giving

p(t_* \mid \mathbf{t}) = N(t_* \mid y_*, \sigma_*^2), \qquad y_* = \mu^T \phi(x_*), \qquad \sigma_*^2 = \sigma^2_{MP} + \phi(x_*)^T \Sigma\, \phi(x_*), \qquad \phi(x_*) = [1, K(x_*, x_1), K(x_*, x_2), \ldots, K(x_*, x_N)]^T.

It should be noted that many characteristics of the RVM rely on the kernel function being used. By using a kernel function, one can map x_i nonlinearly into a high-dimensional feature space and perform linear regression in this space. An appropriate kernel function should be adopted to obtain more precise and effective results.

3. Formulation of the RVM-predictor

Suppose that a total of n failures have been observed, and t_i, i = 1, 2, \ldots, n, is the accumulated execution time. The general software reliability prediction model can then be represented as follows:

t_l = f(t_{l-m}, t_{l-m+1}, \ldots, t_{l-1}),

where (t_{l-m}, t_{l-m+1}, \ldots, t_{l-1}) is a vector of lagged variables, and m represents the dimension of the input vector, that is, the number of past observations related to the future value. The RVM approach attempts to identify the appropriate representation of software failure time data. The key to solving the prediction problem lies in approximating the function f. Solving a function regression problem in the modeling process can illuminate the autocorrelation among the data and produce better estimates. In predicting software reliability, the RVM can be trained to first analyze the relationship between past historical reliability indices and the corresponding targets, and then predict future failures. Table 1 illustrates the training patterns designed for the reliability prediction process. The constructed model is first trained on the failures detected; during the model training phase, the data used are as follows: the inputs are the m-dimensional vectors T_i = (t_i, t_{i+1}, \ldots, t_{i+m-1}) and the outputs are the scalars t_{i+m}, i = 1, 2, \ldots, n-m. Therefore, a total of n-m pairs of training patterns are given during the RVM learning process:

(T_1, t_{m+1}), (T_2, t_{m+2}), \ldots, (T_{n-m}, t_n).

The RVM learning scheme is then applied to the failure time data in order to recognize the inherent internal temporal property of the software failure sequence for prediction purposes. Following successful training, the RVM can predict future outcomes \hat{t}_{n+k} at different time steps. If k = 1, the prediction is a one-step-ahead forecast, and when k > 1, the prediction is a multi-step forecast. In practice, one-step-ahead forecasting results are more useful since they provide timely information for preventive and corrective maintenance plans. Therefore, this study only considers one-step-ahead predictions for analysis.

4. Process of software reliability model based on RVM

Fig. 1 describes the process of software reliability modeling based on RVM. This section provides a brief introduction to each step; the paramount topic of this paper is how to determine a suitable value for m. We tested the performance of our proposed approach using the same real-time control application and flight dynamic application datasets cited in Park et al. [23] and Karunanithi et al. [9]. We chose a common baseline in order to compare our results with related work in the literature. Table 2 summarizes all 10 datasets used in the experiments [34,35].

Fig. 1. Process of software reliability model based on RVM. (Flow chart omitted: data scaling and normalization, choosing the number of failure data for learning, choosing the kernel function and kernel width r, RVM learning and parameter optimization, determining the value of m, predicting the next software failure time, and scaling the data back.)

Ohba [36] used the variable-term-predictability approach to propose a two-component predictability measure, the average relative prediction error (AE), that compares the predictive capabilities of models at different fault density ranges. The measure is defined as follows:

AE = \frac{1}{k - m} \sum_{i=m+1}^{k} \left| \frac{\hat{t}_i - t_i}{t_i} \right|,

where \hat{t}_i denotes the predicted value of the failure time and t_i indicates the actual value of the failure time. AE is a measure of how well a model predicts throughout the test phase.
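To make the training-pattern layout and the AE measure concrete, here is a small sketch assuming a NumPy array of accumulated failure times; the function names are illustrative, not from the paper.

```python
import numpy as np

def make_patterns(times, m):
    """Sliding-window pairs from failure times t_1..t_n:
    inputs T_i = (t_i, ..., t_{i+m-1}), target t_{i+m}, for i = 1..n-m."""
    t = np.asarray(times, dtype=float)
    X = np.array([t[i:i + m] for i in range(len(t) - m)])
    y = t[m:]
    return X, y

def average_relative_error(t_actual, t_predicted):
    """Ohba's AE: the mean of |t_hat_i - t_i| / t_i over the predicted points."""
    t_actual = np.asarray(t_actual, dtype=float)
    t_predicted = np.asarray(t_predicted, dtype=float)
    return np.mean(np.abs(t_predicted - t_actual) / t_actual)
```

For example, with m = 8 the first training pair maps (t_1, ..., t_8) to t_9, matching the first row of Table 1.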

Table 1
RVM-based approach to software reliability prediction.

                  i       T_i                                         t_{i+m}
RVM learning      1       t_1, t_2, ..., t_m                          t_{m+1}
                  2       t_2, t_3, ..., t_{m+1}                      t_{m+2}
                  3       t_3, t_4, ..., t_{m+2}                      t_{m+3}
                  ...     ...                                         ...
                  n-m     t_{n-m}, t_{n-m+1}, ..., t_{n-1}            t_n
RVM predicting    n-m+1   t_{n-m+1}, t_{n-m+2}, ..., t_n              \hat{t}_{n+1}
                  n-m+2   t_{n-m+2}, t_{n-m+3}, ..., \hat{t}_{n+1}    \hat{t}_{n+2}
                  ...     ...                                         ...

4.1. Kernel function

In formula (1), K(x, x_i) is defined as the kernel function, which is the inner product of two vectors in the feature space, \phi(x) and \phi(x_i). Introducing the kernel function allows us to deal with feature spaces of arbitrary dimensionality without explicitly computing the mapping \phi(x). Two commonly used kernel functions are the polynomial kernel function and the Gaussian kernel function. Here, we use the Gaussian kernel function to map the original inputs into a high-dimensional feature space. The Gaussian kernel is defined as

K(x, x_i) = \exp\left(-\frac{\|x - x_i\|^2}{r^2}\right),

where r > 0 is a constant that defines the kernel width, a scale parameter that controls the kernel's resolution.
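For reference, the two kernels mentioned above can be written as follows (a sketch; the polynomial degree and offset are illustrative choices, not values used in the paper):

```python
import numpy as np

def gaussian_kernel(x, xi, r):
    """K(x, x_i) = exp(-||x - x_i||^2 / r^2); r is the kernel width."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(xi)) ** 2) / r ** 2)

def polynomial_kernel(x, xi, degree=2, c=1.0):
    """A common polynomial kernel, K(x, x_i) = (x . x_i + c)^degree."""
    return (np.dot(x, xi) + c) ** degree
```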

4.2. Data representation

All the inputs and outputs of the KMs are normalized within the range [0.1, 0.9] to reduce the training cost [37]. The actual values are scaled using the following relationship:

y = \frac{0.8}{\Delta} x + \left(0.9 - \frac{0.8\, x_{max}}{\Delta}\right),

where y is the scaled value fed into the machine, x is the actual value before scaling, x_{max} is the maximum value in the samples, x_{min} is the minimum value, and \Delta is defined as x_{max} - x_{min}. After the training process, we test prediction performance by scaling all the network outputs back to their actual values using the following equation:

x = \frac{y - 0.9}{0.8}\,\Delta + x_{max}.
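The scaling and its inverse can be expressed directly as a small sketch of the two equations above; x_min and x_max are assumed to come from the training samples.

```python
import numpy as np

def scale(x, x_min, x_max):
    """Map actual values into [0.1, 0.9]; equivalently 0.8*(x - x_min)/delta + 0.1."""
    delta = x_max - x_min
    return 0.8 * np.asarray(x, dtype=float) / delta + (0.9 - 0.8 * x_max / delta)

def scale_back(y, x_min, x_max):
    """Invert the scaling: x = (y - 0.9)/0.8 * delta + x_max."""
    return (np.asarray(y, dtype=float) - 0.9) / 0.8 * (x_max - x_min) + x_max
```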

4.3. Determining the kernel parameters

Determining the kernel function's parameter is an optimization problem. Choosing the kernel parameter r requires great care, because it determines the structure of the high-dimensional feature space and governs the complexity of the ultimate solution. Past research adopted trial-and-error approaches to tune this parameter. We set r \in [r_1, r_2] and increased r in steps of r_s; the predictive performance for each candidate r was computed, and the value of r that produced the best predictive performance was adopted as the kernel function's parameter. In order to assess the effect of the kernel width r on predictive accuracy, we performed a preliminary experiment using datasets 1 through 10. In the experiments, the values of r_1, r_2 and r_s were set at 0.1, 2 and 0.1, respectively. Fig. 2 shows the average relative prediction error for each dataset as the kernel width r changes; all values are computed with m = 8, \alpha_i = 0.5 (i = 1, 2, \ldots, n) and \sigma^2 = 1. As the figure illustrates, the average relative prediction error changes with different values of r. Table 3 lists the best values of AE obtained with r \in [0, 2] for datasets 1 through 10.

Table 2
Data sets used for model evaluation.

Data set       Lines of code   Failures observed   Type of software
Data Set 1     21 700          136                 Real-time command and control
Data Set 2     10 000          118                 Flight dynamic application
Data Set 3     35 000          279                 Hardware control software
Data Set 4     870 000         535                 Real-time control application
Data Set 5     200 000         481                 Monitoring and real-time control
Data Set 6     90 000          198                 Monitoring and real-time control
Data Set 7     1 317 000       328                 Database application software
Data Set 8     22 500          180                 Flight dynamic application
Data Set 9     38 500          213                 Flight dynamic application
Data Set 10    Unknown         266                 Real-time control application

Table 3
Selected kernel width r and the corresponding best value of AE for data sets 1-10.

Data set               1      2      3      4      5      6      7      8      9      10
r                      0.9    0.5    0.3    0.7    0.9    1.2    1.6    0.3    1.2    0.5
Best value of AE       1.11   2.04   3.96   1.21   4.11   1.87   1.65   1.71   0.87   1.02
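The trial-and-error search over r described above amounts to a one-dimensional grid search. A sketch follows; train_and_score is a hypothetical callback that fits the RVM with kernel width r and returns its AE.

```python
import numpy as np

def select_kernel_width(train_and_score, r1=0.1, r2=2.0, rs=0.1):
    """Try r = r1, r1+rs, ..., r2 and keep the width with the lowest AE."""
    widths = np.arange(r1, r2 + rs / 2.0, rs)
    scores = np.array([train_and_score(r) for r in widths])
    best = int(np.argmin(scores))
    return widths[best], scores[best]
```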

Fig. 2. The plots of the average relative prediction error for Data Sets 1-10 with different kernel width r. (Plots omitted; each panel shows AE versus kernel width r for one data set.)

5. Experimentation

5.1. Model comparison

Table 4 summarizes the best results from modeling the temporal relationships among software failure time sequences using our proposed RVM approach for datasets 1, 2, 8, and 9. We use the same data sets as cited in Tian and Noore [19], Park et al. [23] and Karunanithi et al. [9] in order to establish a common baseline for comparison. Park et al. applied failure sequence numbers as input and cumulative failure times as desired outputs in a feed-forward neural network (FFNN). Based on input-output learning pairs of cumulative execution time and the corresponding accumulated number of defects disclosed, Karunanithi et al. employed both FFNN and recurrent neural network (RNN) structures to model the failure process. Table 4 also summarizes these results. For example, using our proposed approach the AE on dataset 1 is 1.11, and the average value of AE over the four datasets is 1.43. This is lower than the results obtained by Tian and Noore (1.60) using SVM, Park et al. (2.45) using a feed-forward neural network, and Karunanithi et al. using a recurrent neural network (2.74) and a feed-forward neural network (4.69). In all four datasets, the AE results show that our RVM approach yields a lower average prediction error than the other approaches.

Table 4
Comparison of average relative prediction error for Data Sets 1, 2, 8, 9.

Data set      SVM [19]   FFNN1 [15]   RNN [13]   FFNN2 [13]   RVM
Data Set 1    2.44       2.58         2.05       2.50         1.11
Data Set 2    1.52       3.32         2.97       5.23         2.04
Data Set 8    1.24       2.38         3.64       6.26         1.71
Data Set 9    1.20       1.51         2.28       4.76         0.87
Average       1.60       2.45         2.74       4.69         1.43

5.2. Test result and trend analysis

Fig. 3 shows the average relative prediction error for datasets 1 through 10 in the cases of m = 6, 7, \ldots, 30. The values of r are 0.1, 2, and the best value listed in Table 3 for each dataset. The figure demonstrates that the values of AE change as m varies. Due to this variability, as well as the existence of outliers, it is difficult to visually discern any trends from Fig. 3. Therefore, we apply statistical techniques for trend testing and trend estimation in this section. To test the null hypothesis that a sample x_1, x_2, \ldots, x_n does not exhibit a trend, Mann [32] used a linear function of a test statistic originally developed by Kendall [31] to test whether two sets of rankings are s-independent. The direct application of this test statistic S for our purposes is known as the Mann–Kendall test for trend. The value of the test statistic is computed as follows:

S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \mathrm{sgn}(x_j - x_k), \qquad \mathrm{sgn}(x_j - x_k) = \begin{cases} 1, & x_j - x_k > 0, \\ 0, & x_j - x_k = 0, \\ -1, & x_j - x_k < 0. \end{cases}

For all n' = \binom{n}{2} = n(n-1)/2 pairs of values x_k, x_j (k < j), S counts those pairs for which the earlier observation x_k is smaller than x_j, and subtracts the number of pairs for which the later observation is smaller. An S value close to zero suggests that no trend exists in the data, whereas a high absolute value of the test statistic hints at the existence of a trend. In calculating S, tied pairs, that is, those pairs for which x_k = x_j, are not considered. However, the existence of tied pairs does influence the variance of the test statistic. The variance of S is given by

\mathrm{Var}(S) = \frac{1}{18} n(n-1)(2n+5).

Under the null hypothesis, the distribution of S is symmetric and its expected value is zero. Moreover, as n approaches infinity, the distribution of S converges to the s-normal distribution. Allowing for a continuity correction, the value of the test statistic

Z_{statistic} = \frac{S - \mathrm{sgn}(S)}{\sqrt{\mathrm{Var}(S)}}

can be compared to the quantiles of the standard s-normal distribution in order to determine whether the null hypothesis can be rejected. Table 5 lists the values of Z calculated for the AE series on the 10 datasets. All of the Z values are larger than \lambda_{0.975} = 1.960, the 97.5% quantile of the standard s-normal distribution, except in the case of dataset 9 with r = 1.2. Consequently, in the other 29 cases, the null hypothesis that the time series contains no trend can be rejected at a Type I error level (that is, a long-term probability of rejecting the null hypothesis when it is true) of 5%. In other words, trends exist in the data, and the trends are s-significant. Moreover, the positive signs of the Z values show that an increasing trend exists in most cases. With regard to the AE series on the datasets, we might have expected a decreasing trend rather than the increasing one that we detected. A possible explanation is that recent failure history records the latest characteristics of the testing process, contributing to a more accurate prediction of near-future failure events.
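The Mann–Kendall statistic, its variance and the continuity-corrected Z value described above translate directly into code. A sketch follows; like the formula above, it omits the tie correction to the variance.

```python
import numpy as np
from scipy.stats import norm

def mann_kendall(x):
    """Return S, the continuity-corrected Z statistic, and a two-sided P-value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = sum(np.sign(x[j] - x[k]) for k in range(n - 1) for j in range(k + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    p_two_sided = 2.0 * (1.0 - norm.cdf(abs(z)))
    return s, z, p_two_sided
```

A series shows a significant increasing trend at the 5% level when Z exceeds 1.960, the 97.5% quantile used in Table 5.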
5.3. Additional experiments

In the previous section, we compared the predictive accuracy of RVM-based software reliability prediction models with different values of m. In order to determine relatively better m values for software reliability modeling, we performed further statistical analysis using a paired T-test. In this approach, we viewed the AE measures as outcomes of randomized experiments in which the datasets were randomly selected and the values of m, as treatments, were applied to each of the datasets. These experiments divided the values of m into five sets: \{A\} = \{6, 7, 8, 9, 10\}, \{B\} = \{11, 12, 13, 14, 15\}, \{C\} = \{16, 17, 18, 19, 20\}, \{D\} = \{21, 22, 23, 24, 25\}, \{E\} = \{26, 27, 28, 29, 30\}. There are thus 10 pairs of competing models in total:

\{A\}-\{B\}, \{A\}-\{C\}, \{A\}-\{D\}, \{A\}-\{E\},
\{B\}-\{C\}, \{B\}-\{D\}, \{B\}-\{E\},
\{C\}-\{D\}, \{C\}-\{E\},
\{D\}-\{E\}.

Thus, if a set \{X\} has an AE that is significantly lower than that of another set \{Y\}, then we can infer that \{X\} \succ \{Y\} (where \succ denotes significantly better), and consequently that m \in \{X\} should perform better than m \in \{Y\}.

5.4. Paired T-test

Table 6 describes the paired T-test for small samples, where D represents the difference between observations and the hypothesis D_0 = 0 means that the difference between the two observations is 0. In Table 6, d_i = x_i - y_i, i = 1, 2, \ldots, l; \bar{d} = \frac{1}{l}\sum_{i=1}^{l} d_i; \mu_d = \mu_1 - \mu_2; S_d = \sqrt{\frac{1}{l-1}\sum_{i=1}^{l}(d_i - \bar{d})^2}; and \sigma_d = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\sigma_1\sigma_2\rho}, where \sigma_1 and \sigma_2 denote the population standard deviations of the two samples, \rho denotes the correlation of the two samples, and \alpha is the significance level.

Table 7 shows the detailed test results. In the table, SIG refers to significance, which represents the P-value. When the confidence level is set at 95% for a two-sided test, the condition 0.01 < P < 0.05 means that a statistically significant difference exists, and P < 0.01 indicates an extremely significant difference; both of these outcomes mean that the null hypothesis H_0 should be rejected. Alternatively, P > 0.05 means that the hypothesis should not be rejected. For a one-sided test the marginal value is 0.025, and the hypothesis should not be rejected when P > 0.025. We can see from the table that \{A\} \succ \{B\} \succ \{C\} \succ \{D\} \succ \{E\}, and that the differences for \{A\}-\{C\}, \{A\}-\{D\}, \{A\}-\{E\}, \{B\}-\{C\}, \{C\}-\{D\} and \{C\}-\{E\} are extremely significant. We can therefore conclude that m \in \{6, 7, 8, 9, 10\} gives the best predictive performance among [6, 30] for RVM-based software reliability models on the 10 data sets.
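With D_0 = 0, the paired comparison in Table 6 reduces to a one-sample t-test on the differences. A minimal sketch follows; ae_x and ae_y are matched AE values for two m-ranges, and the routine is equivalent to scipy.stats.ttest_rel.

```python
import numpy as np
from scipy import stats

def paired_t_test(ae_x, ae_y, alpha=0.05):
    """Two-sided paired T-test of H0: mean difference = 0."""
    d = np.asarray(ae_x, dtype=float) - np.asarray(ae_y, dtype=float)
    l = len(d)
    t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(l))
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=l - 1)
    return t_stat, p_value, p_value < alpha
```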

Fig. 3. The plots of the average relative prediction error for Data Sets 1-10 with different m values. (Plots omitted; each panel shows AE versus m for one data set at three kernel widths.)

Table 5
Trend test for data sets 1-10.

Data set       r      Z statistic   Explanation
Data Set 1     0.1    2.8727        Increasing
               0.9    5.0213        Increasing
               2      3.4799        Increasing
Data Set 2     0.1    2.9894        Increasing
               0.5    2.4990        Increasing
               2      2.8054        Increasing
Data Set 3     0.1    4.7190        Increasing
               0.3    4.8578        Increasing
               2      4.1338        Increasing
Data Set 4     0.1    5.7220        Increasing
               0.7    3.0128        Increasing
               2      4.4141        Increasing
Data Set 5     0.1    4.5309        Increasing
               0.9    3.2697        Increasing
               2      5.1614        Increasing
Data Set 6     0.1    3.4565        Increasing
               1.2    3.5032        Increasing
               2      4.2506        Increasing
Data Set 7     0.1    4.6243        Increasing
               1.6    4.7644        Increasing
               2      4.1572        Increasing
Data Set 8     0.1    4.1338        Increasing
               0.3    3.9937        Increasing
               2      5.4974        Increasing
Data Set 9     0.1    4.7177        Increasing
               1.2    1.7049        No significant trend
               2      5.2315        Increasing
Data Set 10    0.1    4.7878        Increasing
               0.5    4.5075        Increasing
               2      3.2330        Increasing

Table 6
Paired T-test for small samples.

                   Two-sided test                        Left-sided test                  Right-sided test
Hypotheses         H_0: \mu_1 - \mu_2 = D_0              H_0: \mu_1 - \mu_2 \ge D_0       H_0: \mu_1 - \mu_2 \le D_0
                   H_\alpha: \mu_1 - \mu_2 \ne D_0       H_\alpha: \mu_1 - \mu_2 < D_0    H_\alpha: \mu_1 - \mu_2 > D_0
T-statistic        t = (\bar{d} - D_0) / (S_d / \sqrt{l})
Rejection region   |t| > t_{\alpha/2}(l-1)               t < -t_\alpha(l-1)               t > t_\alpha(l-1)
Decision rule      P < \alpha: reject H_0
Assumptions        The observed data are from the same subject or from a matched subject and are drawn from a population with a normal distribution.

Table 7
The result of paired T-test under different regression methods.

Pairs     Pair          Mean value   T-statistic   SIG. (two-sided)   Comparison of performance
Pair 1    {A}-{B}       0.54207      7.460         0.048              {A} > {B}
Pair 2    {A}-{C}       0.90620      10.409        0.009              {A} > {C}
Pair 3    {A}-{D}       1.50600      14.866        0.000              {A} > {D}
Pair 4    {A}-{E}       2.08300      16.327        0.007              {A} > {E}
Pair 5    {B}-{C}       0.36413      5.609         0.004              {B} > {C}
Pair 6    {B}-{D}       0.96393      11.960        0.010              {B} > {D}
Pair 7    {B}-{E}       1.54093      12.951        0.010              {B} > {E}
Pair 8    {C}-{D}       0.59980      8.812         0.009              {C} > {D}
Pair 9    {C}-{E}       1.17680      9.999         0.005              {C} > {E}
Pair 10   {D}-{E}       0.57700      5.811         0.026              {D} > {E}

6. Conclusions

In this paper, we first introduced an RVM-based software reliability prediction model. We then analyzed experimental results using 10 datasets collected from real software projects. Using the Mann–Kendall test method, we studied the trend of the AE series on the 10 datasets as m changes, and we confirmed that early failure behavior in the testing process may have less impact on later failure processes. Finally, based on paired T-tests over the 10 datasets, m \in \{6, 7, 8, 9, 10\} was selected as the best range of m for RVM-based software reliability models.

The promising results obtained by this work suggest that the proposed model has potential value as high-technology products increase the demand for forecasting science. In the future, novel hybrid evolutionary algorithms such as differential evolution, genetic algorithms, simulated annealing, chaotic genetic algorithms, and particle swarm optimization should be applied to obtain more appropriate parameters for kernel functions, and consequently to achieve more accurate predictions of software reliability.

Acknowledgments

This research is based upon work supported in part by the National Natural Science Foundation of China (61103051, 61370173, 61402336), the Science Foundation of the Ministry of Education of China (14YJCZH152), the Zhejiang Provincial Natural Science Foundation (LY15F020018, LQ12F02008), the Zhejiang Provincial Science and Technology Plan of China (2015C33247, 2013C31138) and the Huzhou Science and Technology Plan (2014GZ02).

References

[1] J.D. Musa, Software Reliability Engineering, McGraw Hill, New York, 1999.
[2] IEEE Recommended Practice on Software Reliability, IEEE Standard 1633, 2008.
[3] M.X. Liu, L. Miao, D.Q. Zhang, Two-stage cost-sensitive learning for software defect prediction, IEEE Trans. Reliab. 63 (2) (2014) 676–686.
[4] Z.B. Sun, Q.B. Song, X.Y. Gao, Using coding-based ensemble learning to improve software defect prediction, IEEE Trans. Syst. Man Cybern. 42 (6) (2012) 1806–1817.
[5] C.Y. Huang, C.H. Kuo, S.P. Luan, Evaluation and application of bounded generalized Pareto analysis to fault distributions in open source software, IEEE Trans. Reliab. 63 (1) (2014) 309–319.
[6] S. Pham, H. Pham, Quasi-renewal time-delay fault removal consideration in software reliability modeling, IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 39 (1) (2009) 1–10.
[7] C.Y. Huang, W.C. Huang, Software reliability analysis and measurement using finite and infinite server queuing models, IEEE Trans. Reliab. 57 (1) (2008) 192–203.
[8] C.Y. Huang, S.Y. Kuo, M.R. Lyu, An assessment of testing-effort dependent software reliability growth models, IEEE Trans. Reliab. 56 (2) (2007) 198–211.
[9] N. Karunanithi, D. Whitley, Y. Malaiya, Prediction of software reliability using connectionist models, IEEE Trans. Softw. Eng. 18 (7) (1992) 563–574.
[10] J.G. Lou, Y.L. Jiang, Q. Shen, J.H. Jiang, Evaluating the prediction performance of different kernel functions in kernel-based software reliability models, Chin. J. Comput. 36 (6) (2013) 1303–1311.
[11] T. Dohi, Y. Nishio, S. Osaki, Optimal software release scheduling based on artificial neural networks, Ann. Softw. Eng. 8 (2009) 167–185.
[12] K.Y. Cai, L. Cai, W.D. Wang, Z.Y. Yu, D. Zhang, On the neural network approach in software reliability modeling, J. Syst. Softw. 58 (2001) 47–62.

[13] S. Ho, M. Xie, T. Goh, A study of the connectionist models for software reliability prediction, Comput. Math. Appl. 46 (2003) 1037–1045.
[14] L. Tian, A. Noore, Evolutionary neural network modeling for software cumulative failure time prediction, Reliab. Eng. Syst. Saf. 87 (2005) 45–51.
[15] L. Tian, A. Noore, On-line prediction of software reliability using an evolutionary connectionist model, J. Syst. Softw. 77 (2005) 173–180.
[16] Q.P. Hu, M. Xie, S.H. Ng, G. Levitin, Robust recurrent neural network modeling for software fault detection and correction prediction, Reliab. Eng. Syst. Saf. 92 (2007) 332–340.
[17] Y.S. Su, C.Y. Huang, Neural-network-based approaches for software reliability estimation using dynamic weighted combinational models, J. Syst. Softw. 80 (4) (2007) 606–615.
[18] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, Berlin, 1995.
[19] L. Tian, A. Noore, Dynamic software reliability prediction: an approach based on support vector machines, Int. J. Reliab. Qual. Saf. Eng. 12 (4) (2005) 309–321.
[20] P.F. Pai, W.C. Hong, Software reliability forecasting by support vector machines with simulated annealing algorithms, J. Syst. Softw. 79 (6) (2006) 745–755.
[21] B. Yang, X. Li, A study on software reliability prediction based on support vector machines, in: Proceedings of the 2007 IEEE International Conference on Industrial Engineering and Engineering Management, 2007, pp. 1176–1180.
[22] J.G. Lou, J.H. Jiang, C.Y. Shuai, A study on software reliability prediction based on transduction inference, in: Proceedings of the IEEE 19th Asian Test Symposium, 2010, pp. 77–80.
[23] J. Park, N. Lee, J. Baik, On the long-term predictive capability of data-driven software reliability model: an empirical evaluation, in: Proceedings of the 25th International Symposium on Software Reliability Engineering, 2010, pp. 213–224.
[24] M.E. Tipping, Sparse kernel principal component analysis, Adv. Neural Inf. Process. Syst. (2001) 633–639.
[25] M.E. Tipping, Sparse Bayesian learning and the relevance vector machine, J. Mach. Learn. Res. 1 (1) (2001) 211–244.
[26] M.E. Tipping, The relevance vector machine, Adv. Neural Inf. Process. Syst. (2000) 652–658.
[27] M.E. Tipping, Bayesian inference: an introduction to principles and practice in machine learning, Adv. Lect. Mach. Learn. (2004) 41–62.
[28] H. Li, D. Pan, C.L. Chen, Intelligent prognostics for battery health monitoring using the mean entropy and relevance vector machine, IEEE Trans. Syst. Man Cybern.: Syst. 44 (7) (2014) 851–862.
[29] A.I. Khader, M. McKee, Use of a relevance vector machine for groundwater quality monitoring network design under uncertainty, Environ. Model. Softw. 57 (2014) 115–126.
[30] S. Yu, K. Wang, Y.M. Wei, A hybrid self-adaptive particle swarm optimization–genetic algorithm–radial basis function model for annual electricity demand prediction, Energy Convers. Manage. 91 (2015) 176–185.
[31] M.G. Kendall, A new measure of rank correlation, Biometrika 30 (2) (1938) 81–93.
[32] H.B. Mann, Nonparametric tests against trend, Econometrica 13 (3) (1945) 245–259.
[33] H. Lütkepohl, Univariate time series analysis, in: Applied Time Series Econometrics, Cambridge University Press, Cambridge, 2004.
[34] M. Ohba, Software reliability analysis models, IBM J. Res. Dev. 28 (4) (1984) 428–443.
[35] S. Yamada, J. Hishitani, S. Osaki, Software reliability growth model with Weibull testing effort: a model and application, IEEE Trans. Reliab. 42 (1993) 100–105.
[36] M. Ohba, Software reliability analysis models, IBM J. Res. Dev. 28 (4) (1984) 428–443.
[37] P.F. Pai, W.C. Hong, Software reliability forecasting by support vector machines with simulated annealing algorithms, J. Syst. Softw. 79 (6) (2006) 747–755.

Jungang Lou received his M.Sc. in Computational Mathematics (2006) and his Ph.D. in Computer Software and Theory (2010) from Tongji University. He is now an associate professor in the School of Information Engineering at Huzhou University, and a postdoctoral researcher in the Institute of Cyber-Systems and Control, Department of Control Science and Engineering, Zhejiang University. His research interests include dependable computing, software reliability evaluation, computer system performance evaluation, neural network optimization and time series prediction.

Yunliang Jiang received the Ph.D. degree in Computer Science and Technology from Zhejiang University in 2006. He is a Professor in the School of Information Engineering, Huzhou University. His research interests include geographic information systems, artificial intelligence and information fusion.

Qing Shen received her M.E. degree in Computer Application Technology in July 2007 from North University of China. She is now a Lecturer in the School of Information Engineering of Huzhou University. Her current research interests include artificial intelligence, software testing, and reliability evaluation.

Zhangguo Shen, who was born in 1982, holds an M.S. degree. He is now a Lecturer in the Department of Computer, Huzhou University, and his research interests include software reliability evaluation, computer system performance evaluation and dependable computing.

Zhen Wang received the Ph.D. degree in computer architecture from Tongji University. She has worked at Synopsys as a Senior Engineer, and she is now a Teacher at Shanghai University of Electric Power. Her research interests include principal curve theory, reliability analysis of high-level circuits and fault-tolerant computing.

Ruiqin Wang is a Lecturer who was born in 1979. She received her Ph.D. in Computer Science and Technology in June 2009 from Zhejiang University. Dr. Wang's research interests include natural language processing, semantic retrieval and personalized recommendation.