A Hidden-Gamma Model-Based Filtering and Prediction Approach for Monotonic Health Factors in Manufacturing

Gian Antonio Sustoa,∗, Andrea Schirrub, Simone Pampurib, Alessandro Beghia, Giuseppe De Nicolaoc

aDepartment of Information Engineering, University of Padova, via G. Gradenigo 6/B, 35131 Padova, Italy. bStatwolf LTD, 38-39 Baggot Street Lower, Dublin, Ireland. cDepartment of Computer Engineering and Systems Science, University of Pavia, via Ferrata 1, 27100 Pavia, Italy.

Abstract In the context of Smart Monitoring and Fault Detection and Isolation in industrial systems, the aim of Predictive Maintenance technologies is to predict the happening of process or equipment faults. In order for a Predictive Main- tenance technology to be effective, its predictions have to be both accurate and timely for taking strategic decisions on maintenance scheduling, in a cost-minimization perspective. A number of Predictive Maintenance technologies are based on the use of ”health factors”, quantitative indicators associated with the equipment wear that exhibit a monotone evolution. In real industrial environment, such indicators are usually affected by measurement noise and non-uniform sampling time. In this work we present a methodology, formulated as a stochastic filtering problem, to optimally predict the evolution of the aforementioned health factors based on noisy and irregularly sampled observations. In particular, a hidden Gamma process model is proposed to capture the nonnegativity and nonnegativity of the derivative of the health factor. As such filtering problem is not amenable to a closed form solution, a numerical Monte Carlo approach based on particle filtering is here employed. An adaptive parameter identification procedure is proposed to achieve the best trade-off between promptness and low noise sensitivity. Furthermore, a methodology to identify the risk function associated to the observed equipment based on previous maintenance data is proposed. The present study is motivated and tested on a real industrial Predictive Maintenance problem in semiconductor manufacturing, with reference to a dry etching equipment. Keywords: Decision Support System, Gamma Distribution, Industry 4.0, Monte Carlo Methods, Particle Filters, Predictive Maintenance, Prognostic and Health Management, Semiconductor Manufacturing

1. Introduction or, more generally, to failures that can be predicted in ad- vance [8, 9]. Examples of such type of faults are the break- Advanced monitoring is a fundamental activity in the ing of the source in ion-implantation processes in semicon- Industry 4.0 scenario to implement control, maintenance, ductor manufacturing [7], the flute wear in cutting tool quality, reliability, and safety policies [1, 2, 3]. In particu- equipment [10], and the lifespan of lithium-ion batteries lar, Fault Detection and Isolation (FDI) [3] and Predictive [11]. Maintenance (PdM) [4] technologies have proliferated in In this work we focus on the so-called ’Health Fac- the past recent years for diagnosis and prognosis of pro- tors’ (HFs), an important concept in prognostic1. HFs cess/tool failures [5]. While the aim of such technologies is are quantitative indexes used to define the current sta- similar and partly overlapped, PdM technologies are more tus of a tool/process and to assess the future statuses of focused on prognosis. Prognosis can be defined as the ca- the system under exam (or of one of its components/sub- pability to provide early detection of the precursor and/or systems), and its Remaining Useful Life (RUL) [14, 16, 17], incipient fault condition of a component, and to design so that strategic decisions regarding maintenance schedul- tools for managing and predicting the progression of such ing and dynamic sampling plans can be taken [4]. Be- fault condition to component failure [6]. Given their goal, ing in direct relationship with wear, usage or stress of an PdM technologies are typically applied to failures that are equipment/component or system, HFs generally have a associated with wear and usage of the system/process [7], monotone evolution. A HF can be of very different na- ture: in its simplest form, HFs can be observable param- ∗Corresponding author Email addresses: [email protected] (Gian Antonio Susto), [email protected] (Andrea Schirru), 1Health Factors are also indicated as ’Component Health’ [5], [email protected] (Simone Pampuri), ’State of Health’/’Health State’ [12, 13] or as ’Health Indicators’ [email protected] (Alessandro Beghi), [10, 14] by different authors and they are closely in relation with the [email protected]. (Giuseppe De Nicolao) concept of ’degradation data’ [15].

Preprint submitted to Elsevier March 7, 2019 eters that, thanks to specific domain expertise, can be as- proach proposed in this work considers the HF as a mono- sociated with equipment/process health status. Example tonic Gamma process (with time-varying shape parame- of health factors as quantities that are directly related to ter) corrupted by Gaussian noise (hidden-Gamma model). system health, such as the thermal index of a polymeric Such assumptions lead to the lack of closed-form solutions material [18], the scar width in sliding metal wear [19], and for the estimation of model parameters in the proposed the temperature difference in semiconductor manufactur- approach. However, it will be shown that the prediction ing epitaxy processes [20]. HFs can also be the output of problem can be efficiently solved by resorting to particle fil- Soft Sensor modules [21, 22], where the status health is im- tering methods [43, 44, 45], employing Monte Carlo (MC) possible/costly to be monitored. Moreover, HFs can be the simulations to derive the target posterior distributions. Fi- residual of first principle FDI models [23]. In fact, in many nally, a recursive procedure to estimate the time-varying practical examples [1, 17, 23, 24], residuals have a mono- shape parameter is proposed. Such procedure allows to tonic behaviour and threshold-based policies to mainte- optimize a trade-off between the need for promptness and nance management are implemented on such quantities. noise insensitivity/outlier rejection. HFs are therefore relevant quantities in both model-based The paper is organized as follows. In Section 2 the [24, 25, 26] and model-free [1, 3, 27, 28] prognostic ap- hidden-Gamma model is presented. In Section 3.1 the proaches. principles of Particle Filtering (PF) are briefly summa- In the present paper, the problem of designing a HF rized and adapted to Gamma processes. In Section 4 an for Predictive Maintenance (PdM) purposes is considered adaptive recursive scheme for estimation of monotone HFs [7, 29, 30]. In particular, the issue of assessing the probabi- is presented. Section 5 is dedicated to the definition and lity distribution of the HF future values given its past mea- estimation of an appropriate Risk Function for the pro- surements is addressed, under the following assumptions: posed model. In Section 6 some experimental results on (i) the HF is monotonically increasing; (ii) its measure- synthetic datasets are reported, whereas in Section 7 a real ments are subject to random noise that may conceal its PdM semiconductor manufacturing problem related to dry monotonic nature; (iii) measurements are non-uniformly etching is tackled. 2 sampled over time. The aforementioned features are typi- cal traits of HF signals [17, 20, 31, 32, 33], but they are gen- erally not simultaneously accounted for in the related liter- 2. The Hidden Gamma Process ature. Non-stochastic models (see [12] for a broad review 2.1. Gamma Probability Distribution on RUL estimation) for HFs have been presented in litera- ture, as well as inspection and intervention approaches for The most notable property of Gamma distributions is increasing maintenance actions effectiveness and decreas- their non-negative support. We consider a random vari- ing the associated costs. However, such methodologies are able x with Gamma distribution Γ(a, θ), where a is the well suited for noise-free scenarios and, given the aforemen- shape parameter and θ is the scale factor. The first two tioned assumptions on the HF signals, it is here proposed statistical moments of x are E[x] = aθ and V ar[x] = aθ2 to adopt a stochastic filtering paradigm [34]. With the and the probability density function (PDF) is p(x) = − x proposed approach, the HF is treated as a stochastic pro- xa−1e θ Γ(a)θa . Gamma distributed random variables enjoy the cess, with the possibility to combine prior knowledge on following property: the HF with statistical information regarding the observed Property 1 (Infinite Divisibility): if x1 ∼ Γ(a1, θ) and noisy data. A simple approach to deal with the problem x2 ∼ Γ(a2, θ), then the sum x = x1 + x2 has a Gamma at hand is provided by the Wiener and Kalman predictors distribution with shape a1 + a2 and scale factor θ. [20, 35, 36, 37], which are statistically optimal for linear The shape of the Gamma probability distribution for Gaussian models. However, such classical approaches may different values of a and θ is shown in Figure 1. be considered suboptimal for signals with the character- istics given in assumptions (i)-(iii). As a matter of fact, 2.2. The Hidden-Gamma Model far from being Gaussian, the HF derivative is in this work considered to be a nonnegative random variable. In the following, the HF is denoted by x(·). Measure- Given such premises, a framework for HF filtering and ments x(t1), x(t2), . . . , x(tk) are available for time instants prediction based on the Gamma distribution is here pro- t1 ≤ t2 ≤ ... ≤ tk. We indicate with t0 ≤ t1 the ini- posed. PdM applications employing Gamma distributions tial time instant, and with tk+1 ≥ tk the instant points in has been developed since the 1970s [38], especially in me- which predictions of the HF are desired. chanical and civil engineering applications [39, 40, 41] and, In the following, we adopt a Bayesian paradigm by as- recently, in industrial environments [42]. Indeed, if the signing to {xj, j = 1, . . . , k + 1} a joint prior distribution. HF is modeled as the sum of Gamma distributed random variables, such sum is still Gamma distributed, with the 2 advantage that convenient estimation and prediction algo- The present work is an extended version of [46]. Additional material concerns implementation details, the derivation of a risk rithms can be derived. Given that in real-world industrial function associated with the maintenance operation, and the use of applications HFs are usually observed with noise, the ap- synthetic data to better assess performance of the algorithms. 2 future values of the HF, could be performed for example θ 0.35 a = 2.0, = 1 via maximum likelihood estimation (MLE). In such case, a = 2.0, θ = 3 θ the posterior expectation can be employed as point pre- 0.3 a = 2.0, = 5 a = 3.0, θ = 1 dictor a = 3.0, θ = 3 0.25 a = 3.0, θ = 5 xˆk+1 := E[xk+1|xk] = xk + E[wk+1] = xk + αθ(tk+1 − tk). 0.2 p(x) Moreover, the knowledge of the distribution of wk+1 allows 0.15 to define confidence intervals for the HF. Such confidence 0.1 intervals could be exploited in a PdM perspective, by com- puting the probability of exceeding predefined thresholds 0.05 associated with a maintenance action.

0 Unfortunately, in real world scenario, HFs are usually 0 2 4 6 8 10 12 14 16 18 20 affected by measurement noise. For this reason, a mea- x surement equation is added to the model (1): Figure 1: Gamma probability distributions for different values of a and θ. Assumption 2. The HF is observed through the noisy measures As stated in Section 1, the HF is associated with equip- yj = xj + vj, j = 1, . . . , k (3) 2 ment/process degradation, therefore the prior information where vj ∼ N (0, σ ) (Gaussian distribution with 0 mean on the HF can be formalized as follows: and standard deviation σ) is independent from the initial value x0 and {wj}. • HF takes non negative values: xj ≥ 0, ∀ 0 ≤ j ≤ k+1;

The stochastic process yk is named here a hidden Gamma • HF has non negative increments: ∆xj ≥ 0, ∀ k, 1 ≤ process (HGP). Given the presence of the measurement j ≤ k + 1, where ∆xj := xj − xj−1; noise vj in (3), there is no guarantee that the sequence • ∆xj is positively correlated with the length of tj − {yj} is monotonic. In the following, it is assumed that tj−1. a0, α, θ are known, as they can be estimated by MLE even in presence of noise. The formulation of the filtering prob- The previous characteristics for the HF are captured by lem is then the following: the following stochastic model: Problem 1. Given the available measures Yk = {yj, j = Assumption 1. The HF evolution is governed by the fol- 1, . . . , k} and Assumptions 1-2, compute the posterior lowing equation PDF p(xk+1|Yk).

xj+1 = xj + wj+1, j = 1, . . . , k (1) Since p(xk+1|xk, xk−1,...) = p(xk+1|xk), the HGP de- fined in (1) is a first-order Markov process. Then, Problem where x ∼ Γ(a , θ) and w ∼ Γ(α(t − t ), θ). It is sup- 0 0 j j j−1 1 can be approached by looking for a recursive solution posed that wj are mutually independent random variables, where p(xk+1|Yk) is computed by updating p(xk|Yk−1), also independent from x0.  once the measure yk is available. In the noisy conditions It is straightforward to see from Eq. (1) that both the we are considering in this work, such solution must be de- HF and its increments are non negative. Moreover, thanks rived with numerical MC techniques like PF. to Property 1, it can be seen that E[x(tj)] = (a0 + αtj)θ, which means that it is expected that the HF is linearly 3. Particle Filtering of Gamma Processes increasing with time. Remark 1: the discrete-time model described in (1) 3.1. Basics of Particle Filtering can be obtained by sampling the continuous-time Gamma process x(t) that satisfies PFs or Sequential Monte Carlo methods are numeri- cal approaches that allow to approximate intractable or

dx(t) = x(t)dt + dw(t), t ≥ t0, (2) complex distributions by employing discrete distributions whose statistical moments and confidence intervals can be where x(t0) ∼ Γ(a0, θ) and dw(t) is a Wiener process with easily calculated. PFs exploits the generations of N ran- dw(t) ∼ Γ(αdt, θ). Equation (2) can be obtained thanks dom variables, named particles, to approximate the un- to Property 1. Such formulation allow us to estimate the known stochastic process posterior. A basic PF algorithm HF for the generic time instant t 6= tj.  for a hidden state-space system as the one given by (1)- 3 If {xj} were noiseless, the estimation of the unknown (3) is provided here (a graphical representation of the PF parameters {a0, α, θ}, that specify the distribution of the procedure is depicted in Fig. 2): 3 Algorithm 1: Particle Filter (PF) algorithm 1. Set k = 0 (j) 2. Particles x0 , j = 1,...,N are drawn from the initial distribution p(x0) (j) (j) 3. Weights w0 = p(x0 = x0 ), j = 1,...,N are computed 4. Update k = k + 1 5. From a suitable proposal distribution (j) (j) (j) G(xk |xk−1), N particles xk , j = 1,...,N are sampled 6. The weights Figure 3: HGP-based predictions (represented by the shaded areas) in three different future time instants for a synthetic HF. Thanks to (j) (j) (j) the monotonic nature of the HGP model, the prediction uncertainty (j) (j) L(yk; xk )p(xk ; xk−1) grows with time only in one direction. By computing the integral of wk = wk−1 (4) G(x(j)|x(j) ) the red shaded areas it is possible, for fixed thresholds, to estimate k k−1 the threshold-crossing probability at different future time instants. are adjusted, where L is a likelihood function defined by the measurement model (3) and the Beside the selection of the number of particles N, the known statistics of vj, while the state-transition most important design choice of the PF procedure is the (j) (j) (j) (j) probability p(xk ; xk−1) is specified by the selection of G. The simplest choice is to set G(xk |xk−1) = state-space model (1) (j) (j) p(xk ; xk−1), so that only L is required in (4). Notably, 7. The weights are normalized in order to sum to 1 with this choice, yk has no influence on Step 5 of the proce- 8. p(xk|Yk) is approximated by dure, reducing in general the robustness of the estimates. A possible alternative is to use a G that considers a prelim- N X (j) j inary approximation of p(x |Y ); this can be achieved, p(x |Y ) ≈ w δ(x − x ), (5) k k−1 k k k k k for example, by means of a Kalman Filter (KF) approach j=1 [48]. a discrete distribution with support points A second critical design issue in the PF procedure is (j) the resampling step [49]. If a large number of parti- xk , j = 1,...,N, where δ(·) is the Dirac delta measure. 9. Go to Step 4. cles have their respective weights with very small values (j) 1 wk  N , it is necessary that such particles are discarded and re-sampled from the distribution p(xk|Yk−1). This is done in order to allow the uninformative particles to con- tribute again to the estimation. Many resampling strate- gies have computational cost O(N), however less resource- demanding approaches for resampling can be implemented [50]. 40 3.2. Adaptation to Gamma Processes A possible issue affecting the PF problem for system (1)- 30 (3) is related to the nonnegativity of quantities wj. Such issue is related to the fact that an overestimation of the [Arbitrary Unit] 20 lower limit of the distribution of xk will propagate to all estimates xi with i > k. In such case, lower limits of xi will be overestimated as well, leading to the accumulation 10 of one-sided errors. This issue is formally described in the

k−1 k k+1 following proposition.

Figure 2: A graphical representation of the PF procedure. At time Proposition 1. If, for some k, the posterior distribution instant k, the prior state distribution, represented by the blue solid p(xk|Yk−1) is approximated by a representation with dis- curve, is approximated by the N particles (black dots). The posterior crete support, the lower limit of the support of p(x |Y ) distribution arises from the interplay between the prior distribution k+1 k and the observation likelihood, represented by the red dashed curve, is greater or equal to that of p(xk|Yk−1). generated by the noisy observation (the red circle). 3We refer the interested readers to [47] or [45] for more details on PF. 4 Filtering result: comparison between no smoothing (W=1) and smoothing with W=5 1 1

0.8 0.8 Noisy signal Filtering result (W=1) 0.6 0.6

Filtering result (W=5) x x 0.4 0.4

0.2 0.2

0 0 0 5 10 0 5 10 t t

1 1

0.8 0.8

0.6 0.6 x x 0.4 0.4

0.2 0.2

0 0 0 5 10 0 5 10 t t Figure 4: State estimations with different lag values. It can be ap- preciated how larger values of lag are associated with higher stability Figure 5: Expected value (red line) and confidence limits (blue lines) in presence of outliers. corresponding with different choices of α(t) (from top-left, in clock- wise order) constant, linear increasing, linear decreasing and peri- odic. For a proof of Proposition 1, we refer the interested read- ers to [46]. A possible way to mitigate the aforementioned propagation error is to employ a fixed-lag smoother [47], the expected size of the increments. In real world indus- where p(xk+1|Yk+W ) is taken as the basis for future up- trial cases, variations in HF may happen quite suddenly dates instead of p(xk+1|Yk), p(xk+1|Yk+W ) (the integer W with a steep rise of x(t) after flat steady-state behavior denotes a fixed window size). With this approach, the (see the application case discussed in Section 7). For this smoothed p(xk+1|Yk+W ) is generally more accurate and reason, a time-varying shape factor α = α(t) is used here prone to overestimating the lower limit of the distribution, (see Fig. 5), that must be estimated as well by the PF. thanks to the increased availability of information. Considering the discrete-time nature of (1)-(3), the shape The fixed-lag smoother can be derived as follows. The factor can be denoted as αj = α(tj), and the following 0 augmented state vectorx ˜j := [xj xj+1 . . . xj+W ] is in- hypothesis can be made: troduced and, similarly,w ˜ ,v ˜ ,y ˜ . By exploiting (1)-(3) j j j Assumption 3. The shape factor α evolves according to it is possible to obtain j

αj+1 = αj + δj, j = 1, . . . , k (8) x˜j+1 =x ˜j +w ˜j (6) 2 y˜j =x ˜j +v ˜j (7) where δj ∼ N (0, λ ) is independent of x0, {wj} and {vj}. In this perspective, the hyper-parameter λ2 can be The augmented-state model described by equations (6)- tuned to modify the variability of αj. Indeed, large val- 2 (7) allows more robustness in the PF approach. In Fig. ues of λ lead to quickly varying αj and promptly reactive 4 an example with a synthetic HF is illustrated. It can adapting PF. On the other hand, small values of λ2 can be observed that larger lag sizes allow enhanced estima- improve the noise sensitivity at the price of a less respon- tion stability (even in presence of outliers) and prevent sive PF. systematic bias. A maximum a posteriori (MAP) estimate for αk is A final design guideline for the Hidden Gamma Process- adopted based on a moving window approach. Ifα ˜j := PF regards resampling. Conditional resampling has been 0 [αjαj+1 . . . αj+W ] , the MAP estimate α˜ˆk is here implemented to make less likely for a particle to be sampled depending on its distance from the critical edge. α˜ˆk = arg max p(˜αk|y˜k, xk−W , αk−W +1) where While this procedure can lead to the creation of zones α˜k where particles are rarely resampled, a mitigated risk of p(˜αk|y˜k, xk−W , αk−W ) ∝ p(˜yk|α˜k, xk−W )p(˜αk|αk−W ) error propagation is achieved. (9) with xk−W and αk−W set to be equivalent to their point 4. Regularized Adaptive Filtering estimates at previous iteration. Given that the conditional distributions ofy ˜k and The a-priori knowledge on the HF increment from tj−1 α˜k in (9) are Gaussian, the logposterior L = to tj is expressed by the statistics E[wj] = α(tj − tj−1)θ log(p(˜αk|y˜k, xk−W , αk−W )) is defined (up to a constant) and V ar[wj] = α(tj − tj−1): the higher the α, the higher as 5 1 1 L = SSR + 2 R (10) 0.9 λ Probability where SSR and R are respectively the sum of squared 0.8 residuals and the sum of squares of δj, j = k − W, . . . , k. 0.7 Equation (10) is typical of regularization methods [51] 0.6 where a trade off choice between accuracy in fitting the training data and complexity of the estimated function 0.5 has to be made. In regularization methods, the following 0.4 family of penalty terms is usually considered: 0.3

W −1 Threshold−crossing probability 0.2 X q R = |δk−i| . (11) 0.1 i=0 0 For q = 2, the well-known Ridge Regression (RR) [52] is 0 20 40 60 80 100 120 140 160 180 obtain. The advantage of RR is that it admits a closed- Time [working hours] form solution. For q = 1, a LASSO-type [53, 54] regular- Figure 6: Representation of a risk function: thanks to the proper- ization is obtained instead. LASSO provides sparse solu- ties of the proposed methodology, it is possible to assess the risk of tions, an important property that makes LASSO the first crossing the threshold H(t) at a generic future time instant. choice in many applications over RR, even at the price of not admitting a closed-form solution. Values of q larger than 1 can also be adopted; for instance, q ∈]1, 2[ leads 5. Risk Function Evaluation to a penalization region similar to the Elastic Net [55]. The computational cost of this operation for k > W is Once a prediction of the future probability distribution O(W 2k) for q = 1 and O(W 3 + W 2k) for q = 2; optimized of an observed HF is available, it can be compared with approaches [56] are now available for the less frequent case a maintenance threshold to compute and evaluate a risk k < W . function (RF) [57] associated with the maintenance op- Remark 2: The discrete-time model with time-varying eration (Figure 6). Such a maintenance threshold may α can still be interpreted as the sampled-data version of k be given from process/equipment operating conditions, or the continuous time model with time-varying α(t). In fact, inferred from historical, noisy HF data, and it can be p(x(t )|x(t )) in (2) depends on the mean value of α(t) j+1 j time/usage-dependent. In the following, a RF is formally in the interval [t , t ], and not on its evolution inside j j+1 defined and motivated in a maintenance optimization per- the interval. From Property 1, w = x(t ) − x(t ) is j j+1 j spective. Furthermore, resorting to supervised classifica- Gamma distributed, that is, w ∼ Γ(¯α , θ) whereα ¯ := j j j tion theory, a method to estimate the maintenance thresh- R tj+1 1 R tj+1 α(t)dt. Then, by setting αj := α(t)dt it tj tj+1−tj tj old from historical maintenance data is also presented. follows that wj ∼ Γ(αj(tj+1 − tj), θ) and the discrete-time increment model is obtained.  5.1. RF definition 4.1. Implementation notes tk The HGP-PF approach proposed in this work can be Let pτ := p(x(tk +τ)|Yk), τ > 0 be the predicted distri- summarized (for the sequences {tj, yj}, j = 1, . . . , k) as: bution of the observed health factor at time instant tk +τ. 1. Initial parameter are selected: the process noise- According to the paradigm established in the previous sec- tions, pτ is a mixture of Gamma distributions such that related quantities θ and α0, initial state distribution 2 P (x0), measurements noise variance σ , the PF design parameters N, W and λ2. N tk X 2. For j = 1, . . . , k: pτ (x) = wiωi(x, τ) (12) i=1 (a) If (j ≥ W ),α ˜j is updated by solving the regu- larization problem (Section 4); ( −(x−xi)/θ (b) p(˜x |Y ) is approximated; ταk e j j (x − xi) Γ(τα )θταk x ≥ xi 3. Predictions and confidence intervals are computed us- ωi(x, τ) = k 0 x < xi ing the newest estimation; PN It can be shown, given that p(xk+1|xk) is Gamma dis- where i=1 wi = 1 and Γ(·) is the gamma function. Fur- tributed [46], that p(xk+1|Yk) is a continuous mixture of thermore, let H(t) be a continuous real function with Gamma distributions. Therefore p(xk+1|Yk) will be ap- nonnegative codomain representing the time-dependent proximated by a finite mixture of Gamma distributions threshold for the analyzed HF. It follows that the proba- since the PF provides an approximation of p(xk|Yk) with bility of exceeding the threshold H at time instant tk + τ discrete support (see Eq. 5). is 6 measurement. Furthermore, let si be an indicator of the effectiveness of the i-th maintenance cycle. Since it is not possible to know what the status is of the maintained com- ponent before its replacement, two situations can occur. Let si = −1 conventionally represent an early replacement Late maintenance (the component is still functional when replaced) and let Early maintenance si = 1 represent a belated replacement (component re- Support Vectors placed after it has broken). To derive the threshold function H(t) from D, a Sup-

Health Factor port Vector Machine (SVM) approach is hereby proposed. SVM techniques allow to find an optimal nonlinear sepa- ration between two categories of data points (if such cat- egories are separable) or, in the soft-margin version, to produce an optimal robust (with respect to data mislabel- ing) classification. In the following, the focus is set on the

estimation of a H(t) represented by a p-th degree poly- Time ˜ 2 p nomial function of t. Let ti = [ti, ti , . . . , ti ] be the polynomial span of ti up to the p-th degree. Furthermore, Figure 7: A dataset D along with an example of HF reading (in 0 p+1 letz ˜i = [yi t˜i] ∈ . grey). The optimal threshold H(t) is obtained using SVM (p = 1). R It is to be noted that the optimal order of the class separation, p, can The problem of estimating H, according to SVM soft- be determined by means of Generalized Cross Validation (GCV). margin theory, can then be seen as the research of a param- eter vectorc ˜, defined asc ˜ = [c(y), c(t1), c(t2), . . . , c(tp)]0 that solves the following problem.

Problem 2. Find Z H(tk+τ) min max J p(x(tk + τ) > H(tk + τ)|Yk) = 1 − pτ (x)dx c,ξ,b˜ µ,β 0 N H(t +τ) where the cost function J is defined as X Z k = 1 − wi ωi(x, τ)dx N i=1 0 1 Xm J = ||c˜||2 + [(C − β )ξ − µ (s (˜c0z˜ − b) − 1 + ξ )] 2 i i i i i i By observing that i=1

Z H(tk+τ) γ(τα , (H(t + τ) − x )/θ) under the constraints µi ≥ 0, βi ≥ 0. C is a tuning param- ω (x, τ)dx = k k i i Γ(τα ) eter that can be selected by means of Generalized Cross 0 k Validation (GCV). where γ(·, ·) is the incomplete lower Gamma function, it is possible to compute the risk function Problem 2 is the standard representation of the SVM problem with soft margin. Its solution can be obtained by means of a Sequential Minimal Optimization (SMO) N γ(τα , (H(t + τ) − x )/θ) approach [58]. tk X k k i R (τ) := 1 − wi (13) The optimal separating hyperplane resulting from the Γ(ταk) i=1 solution of Problem 2 (Figure 7) satisfies the condition

The risk function R(t) depends on the choice of the p X threshold function H(t). Such choice can be done either b + c(y)y + c(ti)ti = 0 by exploiting experts knowledge (for instance, a threshold i=1 beyond which the machine is known to malfunction) or by The threshold function H(t) is then obtained as analysing historical maintenance data. In the next subsec- tion, classification theory is employed in order to estimate p (t ) X c i b the optimal threshold when such data are available. H(t) = − ti − (14) w(y) c(y) i=1 5.2. Threshold Estimation By combining (13) and (14), the risk function is then

Let D be a set of Nm maintenance operations

tk Nm R (τ) = 1− (15) D = {Ti ∈ R, yi ∈ R, si ∈ {−1, 1}}i=1 N Pp c(ti) i b X γ(ταk, −( (y) (tk + τ) + (y) + xi)/θ) where Ti is the duration of the i-th production cycle (main- w i=1 c c i Γ(τα ) tenance to maintenance) and yi is the last observed HF i=1 k 7 5.3. Use of RF for Maintenance Optimization The main goal of a maintenance management system is the minimization of the costs associated with failures and maintenance operations. The cost of maintenance oper- ations can be associated with the cost of several factors, such as spare parts, equipment/production downtime, staff performing the interventions, scrap products due to the re- qualification of the system (a post-maintenance phase in which the process needs to run before being ’stable’ or within the production standards). At first glance, mini- mizing the number of interventions seems to be the natu- ral approach to minimize the aforementioned costs. How- ever, Run-to-Failure (R2F) policies, where maintenances are performed only after a failure has happened, are gen- erally deprecated because the costs related to unexpected Figure 8: [Sigmoid Data] On the left panel a set of 5 time series from failures can be very high. The trade-off between early- D(1) is shown. The red line (–) represents the maintenance threshold H = 8. On the right panel an histogram of {τ }100 with H = 8. stage maintenances (and associated unexploited lifetime i i=1 UL of the system) and unexpected breaks UB can be opti- Experiment Name Kalman HGP-PF mized by using corresponding agglomerated costs (respec- Sigmoid Data 0.843 0.784 tively cUL and cUB) and the proposed RF. Furthermore, Sigmoid Data with Discontinuities 1.211 1.063 in the proposed approach, reliable predictions can be ob- Industrial Data 2.471 2.118 tained over given time frames, a relevant feature in appli- cations where timeliness is a critical issue, such as when Table 1: Filtering performances of the considered approaches in interventions have to be planned in advance or the mon- terms of RMSE. itoring of RF and re-scheduling can be guaranteed only with a fixed time delay. where ai ∼ U(0.005, 0.02), a uniform distribution of sup- port [0.005, 0.02], and bi ∼ U(750, 1000). Gaussian noise 6. Simulation Results v ∼ N (0, 1) is then added to the HF. A set of 5 time series belonging to D(1) is reported in Fig. 8. For this study a To test the proposed methodology, synthetic datasets fixed maintenance threshold has been set to H = 8. representing HFs have been created. The generic dataset The proposed HGP-PF is compared with a KF-based D has a total of N time series, the i-th time series is defined approach. The KF employed in this Section, and also in as the experimental work detailed in Section 7 is formulated as follows. The following linear state-space model is con-  ni ni Si = {tj, xj}j=1, {tj, yj}j=1, τi sidered: = {[Tn Xn ], [Tn Yn ], τi} , i i i i 0 zj+1 = Azj + Gvj (16) where τ indicates the time instant when the i-th HF i y = Cz + w0 , (17) crosses a predefined threshold H. j j j The accuracy of the prediction at a time instant τ − ∆t i where zj = [xj . . . xj−nO ] is a state vector that contains the is used to assess the performance of the proposed PdM al- gorithm. The initial filter parameters are chosen via like- 100 lihood maximization. A truncated multivariate Normal PF [τ-50] distribution is used as proposal distribution, that is then Kalman [τ-50] 80 PF [τ-100] sampled through a Gibbs sampler [59]. A Kalman Filter Kalman [τ-100] is used to obtain the non-truncated Normal distribution. Ideal Although the use of the Kalman filter is justified by model 60 linearity, an Unscented Kalman Filter can also be used [%] R [60]. 40

6.1. Sigmoid Data 20 A first synthetic dataset D(1) of N = 100 time series has been generated as follows: for each time series the 0 time domain is T = [0, 1,..., 1500], while the i-th HF at τ-80 τ-60 τ-40 τ-20 τ τ+20 τ+40 τ+60 τ+80 τ+100 time t is 10 Figure 9: [Sigmoid Data] Averaged Risk Functions associated with xi(t) = , different prediction approach and different prediction horizons. 1 + e−ai(t−bi) 8 0 estimated present and past values of the HF, vj ∼ N (0,Q) 0 is the model error, and wj ∼ N (0,R) is the measurement 0 0 error. vj and wj are supposed to be uncorrelated. Matrices A, C and G are estimated using the N4SID algorithm [61], while the model order nO is chosen according to the Akaike Information Criterion [62]. The Kalman predictor for the 1-step ahead prediction has the classical formulation   zˆj+1|j = Azˆj + Kj yj − Czˆj|j−1 , (18) 0  0 −1 Kj = APj|j−1C CPj|j−1C + R , (19) where Pj|j−1 is the variance matrix of the prediction error zˆj+1|j − zj+1 and it is updated through the discrete Ric- cati equation. The tuning of Q and R has been done by computing a test on the residuals correlation

REQ,R(σ) = E [eQ,R(j)eQ,R(j + σ)] , (20) with eQ,R(j) = yj − Czˆj|j where the estimationz ˆj|j de- pends on the choice of Q and R. A grid search on different set of values of Q and R has been performed to minimize Figure 10: [Sigmoid Data with Discontinuities] On the left panel a set of 5 time series from D(1) is depicted. The red line (–) represents maxσ>0 |REQ,R(σ)|. The multiple-step ahead prediction the maintenance threshold H = 14. On the right panel an histogram that can be easily derived by exploiting 16-18 [20]. 100 of {τi} with H = 14. In Table 1 the filtering performances of the proposed i=1 HGP-PF are reported, and compared to that of a KF- based approach in terms of RMSE. As for prediction ac- curacy, averaged Risk Functions are reported in Fig. 9 for 2 cases, a 50-step and 100-step ahead predictions. Risk Functions are aligned with respect to the time instant of the fault τ and compared with a ideal function that pro- vides  1 if t > τ R (t) = ideal 0 otherwise The HGP-PF beats the KF both in terms of RMSE and in PdM accuracy. In particular, it can be appreciated in Fig. 9 how the PF of the HGP-PF is qualitatively closer 100 to R (·) than that of the KF. PF [τ-50] ideal Kalman [τ-50] 80 PF [τ-100] Kalman [τ-100] 6.2. Sigmoid Data with Discontinuities Ideal (2) A sigmoid dataset with discontinuities D (with N = 60

100 time series and support T = [0, 1,..., 1500], as before) [%] R has been generated as follows: The i-th HF at time t is 40 10 xi(t) = + 0.6ci, 1 + e−ai(t−bi) 20 where ai ∼ U(0.005, 0.02), bi ∼ U(750, 1000) and ci ∼ 0 B(0.01, 1) is a binomial distribution with 1 trial and 0.01 τ-80 τ-60 τ-40 τ-20 τ τ+20 τ+40 τ+60 τ+80 τ+100 success probability. Gaussian noise v ∼ N (0, 1) is then added to the HF. A set of 5 time series in D(2) is reported Figure 11: [Sigmoid Data with Discontinuities] Averaged Risk Func- tions associated with different prediction approach and different pre- in Fig. 10. A fixed maintenance threshold has been set to diction horizons. H = 14. All of the N generated HFs exceed H in T . In Fig. 11 the averaged Risk Functions of PF and Kalman Filter fot the 50-step and 100-step ahead predictions are reported. RMSE performance is given in Table 1. In this case, too, the HGP-PF ouperforms the KF both in terms of RMSE and in PdM accuracy. 9 FCL FCL Helium Flow Helium Flow

Time

Time Figure 12: Anonymized helium flow data and an example of fixed control limit (FCL). It can be noticed the observation noise (possibly Figure 13: [Industrial Data] Filtering and prediction of the signal resulting in outliers), as well as the time-varying behavior of the presented in Fig. 12. signal.

operation has taken place, and {t , y }ni are the related 7. Application to predictive maintenance of a dry j j j=1 helium flow readings. Specifically, the accuracy of the pre- etching equipment for semiconductor manufac- diction at a time t¯ − ∆t is used to assess the performance turing i of the proposed algorithm. Maintenance optimization strategies based on HFs are In Table 1 the filtering performances of the proposed receiving increasing attention in the field of semiconductor HGP-PF and of a KF in terms of RMSE are reported. As manufacturing [17, 20]. In this Section, the HPG-PF ap- in the simulation studies of previous section, the HGP-PF proach developed in the present paper is tested on a PdM outperforms the KF. Fig. 13 reports filtering and predic- problem related to a dry etching equipment. Etching is tion results based on the data of Fig. 12. In Fig. 3, the a fundamental step in semiconductor fabrication that is predictions distributions are represented with grey and red employed to chemically remove layers from the wafer sur- areas if they are below or above the FCL, respectively. Fig. face. In some etching tools, the wafer is held on a Electro- 14 shows the estimated risk (i.e., the threshold-crossing static Chuck (ESC) thanks to electrostatic charge, while probability) at time instant t¯i − ∆t for the actual failure a backside helium flow cools down the wafer and prevents time t¯i (i.e., R(t¯i) computed with all the available infor- problems during the unloading of the product [63]. Dur- mation at time instant t¯i − ∆t). Rather unsurprisingly, ing this operation, the quartz parts around the ESC (and the best performances are obtained with the smallest ∆t, the ESC itself) undergo the action of aggressive plasma, namely, 100 working hours (left panel of Figure 14). In causing a wearing out that affects both wafer quality and this case, only 3 out of 17 estimated risks are below 50%, process stability. The intensity of the helium flow intensity and only one below 40%. When ∆t is increased to 200 in the etching equipment, represented in Fig. 12, reflects working hours (central panel of Figure 14), the increased the wear of the ESC and can be considered a HF for the uncertainty does not allow to obtain optimal results. For degradation problem at hand. The helium flow exhibits a ∆t = 250 working hours (right panel of Figure 14) the monotonically increasing trend (masked by noise) and it is phenomenon is even more evident, as the distribution of common practice that, in a Condition-based Maintenance the risk function assessment is almost flat. fashion, when a given threshold is exceeded, maintenance From an operational point of view, the most important operations take place, including the (expensive) replace- feature of the proposed methodology is its capability to ment of several components. Predicting future values of precisely assess the fault event probability when the life- the HF is fundamental to timely schedule maintenance ac- cycle of the equipment is coming to an end, thus allowing tions and minimize the trade-off between unnecessary re- to schedule an early maintenance operation. With respect placements and unexpected breaks. For such reasons, sta- to this requirement, the performances presented in Figure bility and reliability of the predictions are primary needs in 14 are satisfactory. The experimental results presented the problem at hand, as in general for HF signal employed in this section show that the choice of a proper predic- in PdM. tion range is crucial: Indeed, a long range would result To test the proposed methodology, a dataset consisting in almost uninformative predictions, while a short range of 17 complete helium flow readings (from maintenance to would yield extremely precise predictions when the opti- maintenance) is employed as benchmark. The i-th test mal time to adjust the maintenance schedule is already series is defined as passed. Such trade-off between precision and timeliness is consistent with the characteristics of proposed prediction S = {t , y }ni , t¯ , i = 1,..., 17 i j j j=1 i paradigm. where τi is the actual time instant at which a maintenance Finally, the HGP-PF based and KF based PdM policies 10 100 hours before maintenance 200 hours before maintenance 250 hours before maintenance 3 4 3

3.5 2.5 2.5

3

2 2 2.5

1.5 2 1.5 Frequency Frequency Frequency 1.5 1 1

1

0.5 0.5 0.5

0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Probability Probability Probability

Figure 14: [Industrial] Risk function assessments for the experimental dataset.

Maintenance Policy cUL/cUB = 25 cUL/cUB = 40 Preventive Maintenance 45 Kalman [ -50] PdM HGP-PF ∆t = 25 11.13 13.25 Kalman [ -100] PdM HGP-PF ∆t = 50 15.81 19.75 40 PF [ -50] PF [ -100] PdM HGP-PF ∆t = 75 17.34 22.43 35 PdM HGP-PF ∆t = 100 20.14 26.29

30 PdM HGP-PF ∆t = 150 23.49 30.71 PdM HGP-PF ∆t = 200 26.56 38.31 25 PdM HGP-PF ∆t = 250 27.12 39.12 J [arbitrary unit] 20 PvM 24.84 37.81

15 Table 2: [Industrial Data] HGP-PF-based PdM and PvM perfor- mances in terms of overall cost J for fixed values of the ratio cUL . 10 cUB 10 15 20 25 30 35 40 45 50 Costs ratio performances for fixed values of the ratio cUL and differ- cUB Figure 15: [Industrial Data] Optimal value of J for the various main- ent values of ∆t for the HGP-PF PdM-based and for the tenance strategies as a function of the ratio of the costs cUL (averaged cUB PvM-based maintenance management policy. are reported results over 100 Monte Carlo simulations). in Table 2. It can be noticed that, for ∆t = 200 and ∆t = 250, the PvM-based policy is more effective than the HGP-PF PdM-based one. It can also be appreciated have been tested versus a simulated Preventive Mainte- that, as expected, the more timely a prediction is, the nance (PvM) tool. PvM is a really popular approach to lower the performance is in terms of costs associate with maintenance that triggers actions based on the amount of unexpected breaks and unexploited lifetime. However, in time passed from previous maintenance. Here the PvM a cost-minimization perspective such lower performance tool has been simulated by computing a risk factor R PvM could be justified in some real world example by the cost on a different set D of N time series from the same di- 0 0 savings associated with timely maintenance planning. stributions of D as follows

PN0 i=i Θ(τi − t) Conclusions and Discussion RPvM(t) = %, (21) N0 In this work, a hidden Gamma process particle-filter ap- where Θ(·) is the Heaviside step function. Observe that the proach for health factor has been presented. The proposed Risk Function (21) is computed on the training data and approach is well suited for real-world industrial health fac- does not depend on current sensor readings. Let ρUB be tors, characterized by monotonic behavior and observed the percentage of unexpected breaks and ρUl the number through irregularly sampled and noisy measurements. The of unexploited runs. In Fig. 15 the overall costs J = proposed approach provides predictions based on a Par- cUBρUB + cULρUL are reported, over a range of different ticle Filter that employs Monte Carlo simulation to ap- values of the ratio cUL , for the different policies. It can be proximate the health factor posterior distribution from the cUB appreciated how PdM approaches are generally superior aforementioned data. To account for changes in variability to PvM and PF outperfoms the Kalman Filtering-based of the health factor, an adaptive filtering scheme, based on PdM policy. a regularization approach, has also been proposed. Fur- To better highlight the trade-off between prediction ac- thermore, the definition and generation of a proper risk curacy and timeliness of the proposed HGP-PF approach, function associated with the model has been discussed. 11 The proposed approach has been tested on both synthetic nance for multi-component systems, Reliability Engineering & data and experimental data coming from a semiconductor System Safety 144 (2015) 83–94. manufacturing application. In both cases, the new ap- [5] J. Sikorska, M. Hodkiewicz, L. Ma, Prognostic modelling op- tions for remaining useful life estimation by industry, Mechani- proach has proved to ensure better performance with re- cal Systems and Signal Processing 25 (5) (2011) 1803–1836. spect to those based on Kalman Filtering when applied to [6] S. J. Engel, B. J. Gilmartin, K. Bongort, A. Hess, Prognos- the definition of Predictive Maintenance policies. Further- tics, the real issues involved with predicting life remaining, in: Aerospace Conference Proceedings, 2000 IEEE, Vol. 6, IEEE, more, the possibility of calculating the future distribution 2000, pp. 457–469. of the health factor can be used to obtain a quantitative [7] G. Susto, A. Schirru, S. Pampuri, S. McLoone, A. Beghi, Ma- assessment of failure risks. chine learning for predictive maintenance: a multiple classifiers In Fault Detection and Isolation, model robustness and approach, IEEE Transactions on Industrial Informatics 11 (3) (2015) 812–820. reliability are crucial issues [64, 65]. For this reason, many [8] D. R. Lewin, Predictive maintenance using pca, Control Engi- data-driven approaches, thanks to their simple forms and neering Practice 3 (3) (1995) 415–421. limited requirements in terms of design and engineering [9] G. A. Susto, S. McLoone, D. Pagano, A. Schirru, S. Pampuri, A. Beghi, Prediction of integral type failures in semiconduc- efforts, have become more and more popular both in in- tor manufacturing through classification methods, in: Emerging dustry and academia [66]. One of the main advantages of Technologies & Factory Automation (ETFA), 2013 IEEE 18th the approach proposed in this work is that it only relies Conference on, IEEE, 2013, pp. 1–4. on the simple model assumption that the Health Factor [10] T. Benkedjouh, K. Medjaher, N. Zerhouni, S. Rechak, Health assessment and life prediction of cutting tools based on support and its increments are nonnegative. As shown in Sec- vector regression, Journal of Intelligent Manufacturing 26 (2) tion 1, such assumption is common and realistic in most (2015) 213–223. industrial/real-world scenarios. [11] L. Liao, F. K¨ottig,A hybrid framework combining data-driven Beside robustness and reliability, current research in and model-based methods for system remaining useful life pre- diction, Applied Soft Computing 44 (2016) 191–199. Fault Detection and Isolation and Predictive Maintenance [12] X.-S. Si, W. Wang, C.-H. Hu, D.-H. Zhou, Remaining useful life is dedicated to the derivation of multi-component, incip- estimation–a review on the statistical data driven approaches, ient faults [67], and model-free solutions. In the present European Journal of Operational Research 213 (1) (2011) 1–14. [13] X. Zhou, J. L. Stein, T. Ersal, Battery state of health mon- work the latter 2 aspects have been addressed, while multi- itoring by estimation of the number of cyclable li-ions, Con- component problems have not been explored. In fact, in trol Engineering Practice 66 (Supplement C) (2017) 51 – 63. complex, real-world industrial scenarios processes are de- doi:https://doi.org/10.1016/j.conengprac.2017.05.009. scribed by a large number of variables that can be re- [14] T. Wang, J. Yu, D. Siegel, J. Lee, A similarity-based prognos- tics approach for remaining useful life estimation of engineered lated to a given fault. For this reason, many works on systems, in: Prognostics and Health Management, 2008. PHM multi-dimensional diagnosis have been presented in the 2008. International Conference on, IEEE, 2008, pp. 1–6. past recent years [66, 68, 3]. However, in terms of prog- [15] D.-G. D. Chen, Y. Lio, H. K. T. Ng, T.-R. Tsai, Statistical nosis, the focus of the proposed approach, Health Factors modeling for degradation data (2017). [16] M. Bressel, M. Hilairet, D. Hissel, B. O. Bouamama, Remaining can be traced back to a single variable/quantity, identi- useful life prediction and uncertainty quantification of proton fied through experience/domain expertise, the output of exchange membrane fuel cell under variable load, IEEE Trans- multi-dimensional Soft Sensor, or the residual of a Fault actions on Industrial Electronics 63 (4) (2016) 2569–2577. Detection & Identification multi-dimensional procedure. [17] S. Butler, J. Ringwood, Particle filters for remaining useful life estimation of abatement equipment used in semiconductor man- One aspect that has not been tackled in this work and ufacturing, in: Control and Fault-Tolerant Systems (SysTol), may be subject of future research activities is the case of 2010 Conference on, 2010, pp. 436–441. multiple fault problems that may be associated with mul- [18] Y. Xie, Z. Jin, Y. Hong, J. H. Van Mullekom, Statistical meth- tiple Health Factors. While the Hidden-Gamma Process- ods for thermal index estimation based on accelerated destruc- tive degradation test data, in: Statistical modeling for degrada- Particle Filter procedure can still be applied in such cases tion data, Springer, 2017, pp. 231–251. (prognostic for each Health Factor can be run in parallel [19] L. Hu, L. Li, Q. Hu, Degradation modeling, analysis, and ap- by different instances of the proposed approach), a jointly plications on lifetime prediction, in: Statistical Modeling for Degradation Data, Springer, 2017, pp. 43–66. risk function has to be considered to implement Predictive [20] G. Susto, A. Beghi, C. D. Luca, A predictive maintenance sys- Maintenance policies considering multiple faults. tem for epitaxy processes based on filtering and prediction tech- niques, Semiconductor Manufacturing, IEEE Transactions on [1] E. Arinton, S. Caraman, J. Korbicz, Neural networks for mod- 25 (4) (2012) 638–649. elling and fault detection of the inter-stand strip tension of a [21] F. Souza, R. Araujo, Online mixture of univariate linear regres- cold tandem mill, Control Engineering Practice 20 (7) (2012) sion models for adaptive soft sensors, IEEE Transactions on 684–694. Industrial Informatics 10 (2) (2014) 937–945. [2] M. Chioua, M. Bauer, S.-L. Chen, J. C. Schlake, G. Sand, [22] D. Wang, J. Liu, R. Srinivasan, Data-driven soft sensor ap- W. Schmidt, N. F. Thornhill, Plant-wide root cause identifi- proach for quality prediction in a refining process, IEEE Trans- cation using plant key performance indicators (kpis) with ap- actions on Industrial Informatics 6 (1) (2010) 11–17. plication to a paper machine, Control Engineering Practice 49 [23] Q. Zhang, M. Canova, Fault detection and isolation of automo- (2016) 149–158. tive air conditioning systems using first principle models, Con- [3] L. Ma, J. Dong, K. Peng, K. Zhang, A novel data-based quality- trol Engineering Practice 43 (2015) 49–58. related fault diagnosis scheme for fault detection and root cause [24] D. Hast, R. Findeisen, S. Streif, Detection and isolation of para- diagnosis with application to hot strip mill process, Control metric faults in hydraulic pumps using a set-based approach and Engineering Practice 67 (2017) 43–51. quantitative–qualitative fault specifications, Control Engineer- [4] K.-A. Nguyen, P. Do, A. Grall, Multi-level predictive mainte- 12 ing Practice 40 (2015) 61–70. Methods in Practice, Springer Verlag, 2001. [25] S. Dey, Z. A. Biron, S. Tatipamula, N. Das, S. Mohon, B. Ay- [46] A. Schirru, S. Pampuri, G. DeNicolao, Particle filtering of hid- alew, P. Pisu, Model-based real-time thermal fault diagnosis of den gamma processes for robust predictive maintenance in semi- lithium-ion batteries, Control Engineering Practice 56 (2016) conductor manufacturing, in: IEEE Conference on Automation 37–48. Science and Engineering (CASE), 2010, pp. 51–56. [26] Q.-N. Xu, K.-M. Lee, H. Zhou, H.-Y. Yang, Model-based fault [47] M. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tuto- detection and isolation scheme for a rudder servo system, IEEE rial on particle filters for online nonlinear/non-gaussian bayesian Transactions on Industrial Electronics 62 (4) (2015) 2384–2396. tracking, IEEE Transactions on Signal Processing 50 (2) (2002) [27] A. Bakdi, A. Kouadri, A. Bensmail, Fault detection and diagno- 174–188. sis in a cement rotary kiln using pca with ewma-based adaptive [48] R. Douc, O. Capp´e,Comparison of resampling schemes for par- threshold monitoring scheme, Control Engineering Practice 66 ticle filtering, in: Image and Signal Processing and Analysis, (2017) 64–75. IEEE, 2005, pp. 64–69. [28] Z. Ge, Z. Song, F. Gao, Review of recent research on data- [49] M. Pitt, N. Shephard, Filtering via simulation: Auxiliary par- based process monitoring, Industrial & Engineering Chemistry ticle filters, Journal of the American Statistical Association Research 52 (10) (2013) 3543–3562. 94 (446) (1999) 590–599. [29] S. Ding, S. Yin, K. Peng, H. Hao, B. Shen, A novel scheme [50] T. Li, M. Bolic, P. M. Djuric, Resampling methods for particle for key performance indicator prediction and diagnosis with ap- filtering: classification, implementation, and strategies, IEEE plication to an industrial hot strip mill, IEEE Transactions on Signal Processing Magazine 32 (3) (2015) 70–86. Industrial Informatics 9 (4) (2013) 2239–2247. [51] M. Seeger, Bayesian modelling in machine learning: A tutorial [30] D. Filev, R. Chinnam, F. Tseng, P. Baruah, An industrial review, Tech. rep., EPFL (2009). strength novelty detection framework for autonomous equip- [52] G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction ment monitoring and diagnostics, IEEE Transactions on Indus- to Statistical Learning, Springer, 2014. trial Informatics 6 (4) (2010) 767–779. [53] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statis- [31] D. Gorinevsky, Monotonic regression filters for trending deteri- tical Learning, Springer, 2009. oration faults, in: American Control Conference, Vol. 6, IEEE, [54] R. Tibshirani, Regression shrinkage and selection via the lasso, 2004, pp. 5394–5399. Journal of the Royal Statistical Society. Series B (Methodolog- [32] B. Saha, K. Goebel, J. Christophersen, Comparison of prognos- ical) 58 (1) (1996) 267–288. tic algorithms for estimating remaining useful life of batteries, [55] H. Zou, T. Hastie, Regularization and variable selection via Transactions of the Institute of Measurement and Control 31 the elastic net, Journal of the Royal Statistical Society Series (2009) 293–308. B(Statistical Methodology) 67 (2) (2005) 301–320. [33] M.-Y. You, L. Li, G. Meng, J. Ni, Cost-effective updated se- [56] B. Wahlberg, S. Boyd, M. Annergren, Y. Wang, An admm algo- quential predictive maintenance policy for continuously moni- rithm for a class of total variation regularized estimation prob- tored degrading systems, Automation Science and Engineering, lems, IFAC Proceedings Volumes 45 (16) (2012) 83–88. IEEE Transactions on 7 (2) (2010) 257–265. [57] X. Ding, Y. Tian, Y. Yu, A real-time big data gathering algo- [34] W. Wang, B. Hussin, T. Jefferis, A case study of condition rithm based on indoor wireless sensor networks for risk analysis based maintenance modelling based upon the oil analysis data of industrial operations, IEEE Transactions on Industrial Infor- of marine diesel engines using stochastic filtering, International matics PP (99) (2015) 1–1. doi:10.1109/TII.2015.2436337. Journal of Production Economics 136 (1) (2012) 84–92. [58] J. Platt, Sequential minimal optimization: A fast algorithm for [35] K. Abdennadher, P. Venet, G. Rojat, J.-M. R´etif,C. Rosset, training support vector machines, Tech. rep., Microsoft (April A real-time predictive-maintenance system of aluminum elec- 1998). trolytic capacitors used in uninterrupted power supplies, Indus- [59] G. Casella, E. George, Explaining the gibbs sampler, The Amer- try Applications, IEEE Transactions on 46 (4) (2010) 1644– ican Statistician 46 (3) (1992) 167–174. 1652. [60] R. VanDerMerwe, A. Doucet, N. DeFreitas, E. Wan, The [36] S. Lu, Y.-C. Tu, H. Lu, Predictive condition-based maintenance unscented particle filter, in: Computer Vision-ECCV 2004, for continuously deteriorating systems, Quality and Reliability Springer, 2004, pp. 28–39. Engineering International 23 (1) (2007) 71–81. [61] P. Van Overschee, B. De Moor, Subspace identification for lin- [37] S. Yang, T. Liu, State estimation for predictive maintenance ear systems: Theory, Implementation, Applications, Springer using kalman filter, Reliability Engineering & System Safety Science & Business Media, 2012. 66 (1) (1999) 29–39. [62] H. Bozdogan, Model selection and akaike’s information crite- [38] M. Abdel-Hameed, A gamma wear process, IEEE Transactions rion (aic): The general theory and its analytical extensions, on Reliability 24 (2) (1975) 152–153. Psychometrika 52 (3) (1987) 345–370. [39] E. Cinlar, E. Osman, Z. Bazoant, Stochastic process for extrap- [63] X. Wang, J. Cheng, K. Wang, Y. Yang, Y. Sun, M. Cao, C. Han, olating concrete creep, Journal of the Engineering Mechanics L. Ji, Modeling of electrostatic chuck and simulation of electro- Division 103 (6) (1977) 1069–1088. static force, Applied Mechanics and Materials 511 (2014) 588– [40] J. Lawless, M. Crowder, Covariates and random effects in a 594. gamma process model with application to degradation and fail- [64] S. X. Ding, Model-based fault diagnosis techniques: design ure, Lifetime Data Analysis 10 (3) (2004) 213–227. schemes, algorithms, and tools, Springer Science & Business [41] D. Lu, M. D. Pandey, W.-C. Xie, An efficient method for the es- Media, 2008. timation of parameters of stochastic gamma process from noisy [65] J. Chen, R. J. Patton, Robust model-based fault diagnosis for degradation measurements, Proceedings of the Institution of dynamic systems, Vol. 3, Springer Science & Business Media, Mechanical Engineers, Part O: Journal of Risk and Reliability 2012. 227 (4) (2013) 425–433. [66] A. Beghi, R. Brignoli, L. Cecchinato, G. Menegazzo, M. Ram- [42] K. LeSon, M. Fouladirad, A. Barros, Remaining useful lifetime pazzo, F. Simmini, Data-driven fault detection and diagnosis estimation and noisy gamma deterioration process, Reliability for hvac water chillers, Control Engineering Practice 53 (2016) Engineering & System Safety 149 (2016) 76 – 87. 79–91. [43] F. Alrowaie, R. Gopaluni, K. Kwok, Fault detection and isola- [67] H. Ji, X. He, J. Shang, D. Zhou, Incipient fault detection with tion in stochastic non-linear state-space models using particle smoothing techniques in statistical process monitoring, Control filters, Control Engineering Practice 20 (10) (2012) 1016–1032. Engineering Practice 62 (2017) 11–21. [44] A. Doucet, On sequential simulation-based methods for [68] G. Li, S. J. Qin, Y. Ji, D. Zhou, Reconstruction based fault bayesian filtering, Tech. rep. (1998). prognosis for continuous processes, Control Engineering Prac- [45] A. Doucet, N. DeFreitas, N. Gordon, Sequential Monte Carlo tice 18 (10) (2010) 1211–1219.

13