Remaining Useful Life Estimation of Batteries Using Dirichlet Process with Variational Bayes Inference /Author=Pajovic, Milutin;

MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com

Remaining Useful Life Estimation of Batteries using Dirichlet Process with Variational Bayes Inference

Pajovic, Milutin; Orlik, Philip V.; Wada, Toshihiro TR2018-152 November 13, 2018

Abstract Rechargeable batteries supply numerous devices with electric power and are critical part in a variety of applications. While estimation of battery’s state of charge (SoC), state of health (SoH) and state of power (SoP) have been in research focus in the past years, prediction of battery degradation has recently started to gain interest. An accurate prediction of the remaining number of charge and discharge cycles a battery can undergo before it can no longer hold charge and is declared dead, is directly related to making timely decision as to when a battery should be replaced so that power interruption of the system it supplies power to is avoided. A methodology for inferring probability distribution of the remaining number of charge-discharge cycles of a battery, based on training dataset containing measured discharge voltage waveforms of one or more batteries of similar type, is presented in this paper. The methodology strongly draws on modeling discharge voltage waveforms using Dirichlet Pro- cess Mixture Model framework and performs approximate inference using variational Bayes approach. The experimental results corroborate that the proposed method is able to provide useful predictions of the remaining useful life of a battery in early stages of its life.

Annual Conference of the IEEE Industrial Electronics Society (IECON)

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonproﬁt educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Remaining Useful Life Estimation of Batteries using Dirichlet Process with Variational Bayes Inference

Milutin Pajovic Philip Orlik Toshihiro Wada Mitsubishi Electric Research Labs Mitsubishi Electric Research Labs Mitsubishi Electric Corporation Cambridge, MA, USA Cambridge, MA, USA Amagasaki City, Japan [email protected] [email protected] [email protected]

Abstract—Rechargeable batteries supply numerous devices prognostics of battery’s state of health has been approached with electric power and are critical part in a variety of applica- from Bayesian perspective in [4]. Particle filtering is used tions. While estimation of battery’s state of charge (SoC), state in a variety of contexts to estimate the remaining useful of health (SoH) and state of power (SoP) have been in research focus in the past years, prediction of battery degradation has life of a battery. As such, [5] tracks battery degradation recently started to gain interest. An accurate prediction of the using a regular particle filter, [6] employs spherical cubature remaining number of charge and discharge cycles a battery can particle filter, [7] designs risk sensitive particle filter, while [8] undergo before it can no longer hold charge and is declared exploits unscented particle filter. A state-of-health regeneration dead, is directly related to making timely decision as to when phenomenon is modeled and predicted using suitably designed a battery should be replaced so that power interruption of the system it supplies power to is avoided. A methodology for particle filter [9]. On-line fault diagnosis and failure prognosis inferring probability distribution of the remaining number of is also approached with particle filter in [10]. In a similar charge-discharge cycles of a battery, based on training dataset vain, diagnosis and prognosis is done using Lebesgue sampling containing measured discharge voltage waveforms of one or more in [11]. batteries of similar type, is presented in this paper. The method- In this paper, we propose an algorithm for inferring remain- ology strongly draws on modeling discharge voltage waveforms using Dirichlet Process Mixture Model framework and performs ing useful life (RUL) of a battery using Dirichlet Process Mix- approximate inference using variational Bayes’ approach. The ture Model (DPMM) with variational Bayes (VB) inference. experimental results corroborate that the proposed method is More specifically, we cluster feature vectors representing dis- able to provide useful predictions of the remaining useful life of charge voltage waveforms of one or more batteries measured a battery in early stages of its life. during their lifetimes. The obtained clusters indicate types of Index Terms—rechargeable battery, remaining useful life, charge-discharge cycle, Dirichel process mixture model, varia- possible aging stages a battery goes through during its life- tional Bayes’, probabilistic inference time. In the operational stage, upon representing a measured discharge voltage waveform of a battery of similar type with a I.INTRODUCTION feature vector, the discharge cycle is probabilistically associ- Rechargeable batteries nowadays supply power to a variety ated with one of the clusters, i.e., aging stage. Consequently, of systems such as electric vehicles, consumer electronic the remaining number of charge-discharge cycles the battery devices, uninterrupted power supply (UPS) systems, etc. They can further undergo is predicted based on the estimated aging also support photo-voltaic systems and smart power grids. stage. Developing methods for estimating battery state of charge The works most relevant to this paper are [12] and [13]. (SoC), state of power (SoP) and state of health (SoH) has While we adopt the same empirical model for discharge been in persistent research focus in the past years [1], [2]. voltage waveform and generative model for feature vectors Predicting battery degradation over time is of utmost impor- as [12], we address a completely different problem. Namely, tance in a number of applications and has recently started we infer remaining useful life (RUL) of a battery at the to gain research interest. More specifically, the problem is end of some discharge cycle, while [12] estimates remaining to accurately predict how long a battery can supply its load, time to full discharge of a battery at some time instant usually expressed with the number of remaining charge and early in the discharge cycle. In addition, [12] performs the discharge cycles it can undergo. This ensures that the battery DPMM inference using Gibbs’ sampling, while we perform is replaced on time, thus avoiding (1) power disruptions due inference using variational Bayes’. A DPMM-based algorithm to unexpected battery failure and (2) discarding the battery for estimating remaining number of charge-discharge cycles before its useful life ends. A related problem, also relevant of a battery is also proposed in [13]. In comparison to [13], in a number of applications, is concerned with predicting the we use different empirical model for discharge voltage and remaining time to discharge of a battery [12], preferably early assume different generative model for feature vectors. Further- in the discharge cycle. more, [13] performs DPMM inference using Gibbs’ sampling, A concise review of methods for battery remaining useful while the models we adopt admit faster inference based on life estimation is given in [3]. In the domain of our interest, the variational Bayes’. The paper is organized as follows. Section II describes charge-discharge cycles the battery can undergo from that time feature vectors used to represent battery aging. Section III instant until it can no longer hold the charge and is declared describes the approach used for inferring remaining useful dead. In the problem setup considered here, the battery is life of a battery. Section IV presents our method based on being charged and discharged according to some predefined DPMM modeling and variational Bayes’ inference. Section template, and the aim is to infer the RUL at the end of a V experimentally validates the proposed method. Section VI discharge cycle, based on the measured voltage during that concludes the paper. discharge cycle. The charge-discharge template specifies how a battery is charged and discharged, for example, a battery II.FEATURE VECTORSFOR BATTERY AGING in the NASA’s dataset is charged with current 1.5 A until its We use experimental battery dataset collected and made voltage reaches 4.2 V, which is then kept constant until the publicly available by the NASA Ames Research Center of current drops below 20 mA, while the discharge is performed Excellence [18]. The dataset contains measured voltage and with constant discharge current of 2 A until the voltage drops current waveforms taken during charge and discharge cycles below 2.5 V. of various batteries undergoing accelerated aging, starting from The RUL of a battery in operational/online stage is inferred nominal capacity of 2 Ah and ending when the capacity falls using training data that contains feature vectors representing below 1.6 Ah, at which point a battery is declared dead. measured discharge voltage waveforms during discharge cy- In addition to voltage and current measurements, the dataset cles of one or more batteries of similar type. More specifically, also contains battery capacity recording at the end of each the training data D contains N pairs discharge cycle, evaluated using Coulomb counting, as well as the battery impedance measurements. We use discharge D = {(a1, k1), (a2, k2),..., (aN , kN )}, (2) voltage measurements as proxy to battery aging. where a is the feature vector computed from measured A voltage waveform, recorded during a single discharge n voltage after some discharge cycle, while k is the number cycle, contains few hundreds of samples. While the method n of remaining charge-discharge cycles left after that discharge described here can, in general, be applied directly to the cycle. In the case the training data D is populated with recorded discharge voltage, a computationally less demanding discharge voltage measurements from only one battery over and analytically more tractable method results from repre- its useful life, N is the overall number of discharge cycles that senting each discharge voltage waveform with parameters of battery underwent during its lifetime, n is the discharge cycle an empirical model for the discharge voltage waveform. We index such that n = 1 corresponds to the first discharge cycle, utilize empirical model from [12], which models the discharge while k = N − n. In general, D may contain measurements voltage V (t) as n from more than one battery, in which case N is the total −a2/t a4t V (t) = E0 − a1e − a3e + a5t, (1) number of measured discharge voltage waveforms, while n where t is the time such that t → 0 corresponds to the indexes data points. Given training data D, the RUL of a battery at the end of beginning of the discharge cycle and E0 is the voltage of a some discharge cycle in the operational/online stage is inferred fully charged battery, where E0 = 4.2 V for dataset [18]. The by extracting k ’s from D whose corresponding a ’s are model parameters aj, j = 1,..., 5, are all real-valued and con- n n 5 close in some sense to feature vector a0 representing measured stitute a feature vector a = a1 a2 a3 a4 a5 ∈ R . Therefore, each discharge cycle i in the dataset is represented discharge voltage during that discharge cycle. This is robustly done by clustering data from D in the training stage, and with feature vector ai, also referred to as the data point, by 0 fitting V (t), measured during that discharge cycle, with the classifying the feature vector a into one of those clusters in empirical model (1). Consequently, only five model parameters the operational stage. Formally, assume D is clustered into per discharge cycle are stored instead of the whole waveform L clusters and let Kl denote the set of remaining number of V (t). More importantly, model training and estimation of the charge-discharge cycles corresponding to training data points battery’s remaining useful life operates over five dimensional that constitute cluster l. More specifically, if some ai belongs to cluster l, then the corresponding k ∈ K . feature vectors ai instead of much longer discharge voltage i l waveforms. Since a cluster l contains data points whose corresponding As a side note, we emphasize that direct measurements numbers of remaining charge-discharge cycles are different of other quantities that depend on battery aging, or the (although relatively close), the remaining useful life of points parameters of the corresponding empirical models can be used from cluster l is represented with a probability distribution X for estimation of battery’s remaining useful life. The notion of p (x|l) ∝ g(x, k), (3) remaining useful life of a battery and outline of an approach x|l k∈Kl for its probabilistic inference are described in the following part. where g(x, k) is a kernel centered at k ∈ Kl, and x is the number of remaining charge-discharge cycles. Without loss of III.INFERENCEOF REMAINING USEFUL LIFE generality, we use Gaussian kernel

The remaining useful life (RUL) of a battery at some time 2 − (x−k) instant is a random variable that represents the number of g(x, k) = e 2σ2 , (4) where σ2 is an appropriately selected kernel width. Other as to yield tractable approximate inference. Therefore, the kernel functions are possible, for example, a discrete delta covariance matrix Σci is given by [12] function δ(x, k) (equal to 1 if x = k and zero otherwise), yields normalized histogram over k’s from K . Σ = {s−1 , . . . , s−1 } l ci diag ci,1 ci,5 (7) In the online/operational stage, measured discharge voltage of a battery is ﬁtted with empirical model (1), resulting in where sci,k ∼ Gamma(β, γ), k = 1,..., 5, are independent feature vector a0. The feature vector a0 is then softly classiﬁed samples from Gamma distribution with hyperparameters β and l p (a0 ∈ l) a0 into clusters , yielding probability a that comes γ. The mean vector Λci is generated according to from cluster l, l = 1,...,L. Finally, the remaining useful life of the battery is evaluated as the weighted combination p(Λci |Σci ) = N (Λci ; a0, hΣci ), (8) of px|l(x|l), l = 1,...,L, where the weights are cluster 0 probabilities p(a ∈ l), where a0 is same across all clusters and h is a hyperparameter.

L This generative model for cluster centers (Λci , Σci ) embodies 0 X 0 pRUL(x|a ) = pa(a ∈ l) px|l(x|l) (5) base distribution H. While the selection of H may seem l=1 arbitrary and not much related to our original problem, we Given the described framework, it remains to specify the point out that, in general, a generative model with hierarchical method for clustering training data D. In general, the RUL structure for observed data is robust to selection of hyperpa- framework places on constraints on the clustering method rameters. that can be used. However, for the reasons that will become The number of clusters in the DPMM is not ﬁxed in advance clear later we employ a powerful Dirichlet Process Mixture and is inferred from data. The DPMM generates clusters Model (DPMM). The following section provides details on according to a stick-breaking process, wherein an imagined the DPMM clustering method. unit-length stick is being broken into a growing number of segments such that each segment represents one cluster with IV. DATA MODELAND INFERENCE probability equal to the length of the corresponding segment. This section provides details on generative model assumed Speciﬁcally, the probability πl of cluster l is given by [15] for feature vectors a’s, describes clustering of the training data l−1 D, and outlines a method for computing class probabilities Y p(a0 ∈ l) of feature points acquired in the online stage. πl = vl (1 − vj), (9) j=1 A. DPMM Generative Model for Data Points

We assume data points ai, i = 1,...,N are gen- where vl ∼ Beta(1, α). Thus, at the beginning of the stick- erated according to the Dirichlet Process Mixture Model breaking process, we are given a unit length stick and break (DPMM) [15]–[17]. The Dirichlet Process (DP) is a generative a segment from it of length v1 ∼ Beta(1, α), which gives model specified with concentration parameter α and base the probability of the first cluster. The remaining segment of distribution H, often denoted as DP(α, H). One may think of a length 1 − v1 is then subject to stick breaking in the second DP as a probability distribution over probability distributions round, so that the length of the segment obtained after breaking such that a sample from the DP is an infinite dimensional a segment of length v2 ∼ Beta(1, α) is π2 = v2(1 − v1) discrete probability distribution. The support of such discrete and is equal to the second cluster probability. The process probability distribution is generated according to the base continues in the same manner and results in infinitely many distribution H, while the values of probability masses are stick segments, the length of each is equal to one cluster controlled by the concentration parameter α. Each support probability. element in the DP is a cluster center of the DPMM, while Finally, we emphasize that although the DPMM has po- its probability mass is the corresponding cluster probability. tential to generate infinitely many clusters, we observe finite More specifically, a data point ai is generated by sampling number of data points N and consequently can see only a cluster ci according to cluster probabilities πl, l = 1,...,L, a finite number of clusters. In fact, one may think that a and then sampling from Gaussian distribution, parameterized newly observed data point a comes from one of the existing with the cluster center (which in turn is generated from H). clusters, or from some previously unseen cluster. The rate at This results in data point ai being associated with cluster which new clusters are created is directly related to cluster ci and generated as a sample from, in our case, Gaussian probabilities πl and both are controlled by the concentration distribution with mean vector Λci and covariance matrix Σci , parameter α. Intuitively, the larger probabilities of existing clusters, the smaller chance a newly acquired data point comes p(a |c , Λ , Σ ) = N (a ; Λ , Σ ) (6) i i ci ci i ci ci from unseen cluster. We model the concentration parameter as

We emphasize that (Λci , Σci ) are associated with the cluster α ∼ Gamma(s1, s2) to fully adhere to hierarchical generative ci. The generative model for mean vector and covariance model so that the inference is less sensitive to the choice of matrix associated with each cluster l is chosen such that the hyperparameters, in this case, s1 and s2 than it would be in prior p(Λci , Σci ) is conjugate to model p(ai|ci, Λci , Σci ) so case α was itself a hyperparameter. B. DPMM Approximate Inference factorize, respectively, over their entries Λl,k and slk, where Given training dataset D, the goal of the training stage is k = 1,..., 5, such that to infer posterior distribution of cluster centers Λl and Σl, 2 Λl,k ∼ N (µl,k, σl,k) and sl,k ∼ Gamma(βl,k, γl,k) (13) cluster probabilities implicitly given by vl, cluster indices ci of training data points ai, and concentration parameter α. Furthermore, the approximate posteriors qvl and qα are given Formally, the aim is to ﬁnd by v ∼ Beta(δ , α ) and α ∼ Gamma(s , s ) (14) N L L l l l 1 2 p , p({ci}i=1, {Λl, Σl}l=1, {vl}l=1, α|D), (10) The approximation for the posterior probability pi,l , qci (l) where L is the implicit number of clusters observed/seen in that the data point ai belongs to cluster l, is evaluated as the data. Since solving (10) using Bayes’ rule is intractable, 5 we need to resort to approximations. Two main approaches 1 X 1 log p ∝ [log s ] − [s ] [(a − Λ )2] for approximate inference are Markov Chain Monte Carlo i,l 2 E l,k 2E l,k E i,k l,k (MCMC) and variational Bayes (VB) [14]. The generative k=1 l−1 model described in the previous part employs conjugate priors X +E[log vl] + E[log(1 − vl)], (15) and is thus suitable for Gibbs’ sampling, as done in [12], n=1 because all posterior conditionals that the Gibbs’ sampler samples from are standard distributions. Nevertheless, we where ai,k is the kth entry in ai. Above, given that sl,k and utilize the mean ﬁeld approximation (MFA) algorithm, which vl are, respectively, Gamma and beta distributed, E[sl,k] = is the most popular VB method because the MFA is faster than βl,k/γl,k, E[log sl,k] = ψ(βl,k) − log(γl,k) and E[log vl] = the Gibbs’ sampling. In addition, the MFA yields closed form ψ(δl) − ψ(δl + αl), where ψ(x) is Digamma function. expressions for iterative parameter updates due to conjugacy Thus, all factors from the approximating distribution q in the data generative model. follow standard distributions. The parameters of those dis- The MFA approximates the true posterior p with a distribu- tributions are also obtained from (12) and alike expressions. tion that factorizes over all unknown variables, Omitting the derivation details, these parameters are iteratively updated as p ≈ q({c }N , {Λ , Σ }L , {v }L , α) = i i=1 l l l=1 l l=1 h PN a p + a N L L L i=1 i,k i,l 0,k µl,k = (16) Y Y Y Y h PN p + 1 qci (ci) qΛi (Λi) qΣi (Σi) qvl (vl)qα(α) (11) i=1 i,l i=1 l=1 l=1 l=1 hγ σ2 = l,k (17) l,k PN The approximating distribution is found by minimizing the βl,k(h i=1 pi,l + 1) Kullback-Liebler (KL) divergence between the two distribu- N ! X tions, KL(q||p) [14]. Using the calculus of variations and βl,k = β + 0.5 pi,l + 1 (18) some algebraic manipulations yield expressions for computing i=1 each factor in q. For example, the resulting approximating 1 2 2 2 γl,k = γ + (a0,k − 2a0,kµl,k + µl,k + σl,k) distribution for ck is given by 2h N N L L 1 X 2 2 2 log qck (ck) ∝ E log p(D, {ci}i=1, {Λl, Σl}l=1, {vl}l=1, α) , + p (a − a µ + µ + σ ) (19) 2 i,l i,k i,k l,k l,k l,k (12) i=1 where the expectation is taken with respect to N Q Q Q Q X i6=k qci l qΛl l qΣl l qvl qα, i.e., with respect to δl = 1 + pi,l (20) the product of all factors in q excluding qck . The update i=1 expressions for other factors in q are given with analogous N l w1 X X expressions. αl = + N − pi,k (21) w2 The distribution under the expectation operator in (12) i=1 k=1 is joint likelihood of the observed data D and unknown w1 = s1 + L − 1 (22) L−1 parameters, and can be expressed as the sum of logarithm X terms by exploiting the conditional independences embedded w2 = s2 − (ψ(αk) − ψ(δk + αk)), (23) in the generative model. Once that is done, each factor term k=1 in q is computed by taking the expectation of the resulting where ψ(x) is the Digamma function. expression with respect to the product of all other factor terms. The iterative routine is initialized with uniform pi,l for each We omit the derivation details as they are tedious and do data point ai, while w1 = s1, w2 = s2 and the maximum not lend any particular insights. In the following, we provide number of clusters to look for in the dataset is set to some the resulting expressions for the factors in the approximating L. The hyperparameters s1, s2, β and γ are set to some distribution q. small values so that the corresponding distributions are non-

The factor distributions qΛl and qΣl that approximate informative, while h is set so that the distribution in (8) is wide 2 posteriors of the mean vector Λl and covariance matrix Σl enough. Then, αl, w1, w2, βl,k, µl,k, σl,k, γl,k and pi,l, where l = 1,...,L, k = 1,..., 5 and i = 1,...,N are iteratively parameters (feature vectors) that fit the specified empirical updated according to (16)–(23). The iterative updates are model (1). The feature vectors a of one battery, labeled as run predefined number of iterations or until convergence is battery B6 in the NASA’s dataset, are part of the training data established. We note that some other ways of initializing the D, while the feature vectors corresponding to the other battery, iterative routine and ordering in updating the factor parameters labeled as B5 in the NASA’s dataset, are used to testing. are also possible, however we believe the presented one is most We start by clustering data points from D using the DPMM suitable. Overall, the iterative procedure results in approximate generative model and the approximate inference, as detailed posterior distributions of unknown parameters, conditioned on in Section IV. For that purpose, we set h = 5 so that the data D. The next problem addressed in the following part is to distribution that generates feature vectors a is wide enough. 0 compute cluster probabilities of a feature vector a , acquired The mean feature vector a0 is set to the mean of all feature in the operational stage, based on clustering results of data D. vectors from D. The hyper-parameters of Gamma distributions in the DPMM model are chosen so that the distributions are C. Cluster Probabilities of Test Data Points −7 −7 non-informative, s1 = γ = 10 and s2 = β = 1+10 . One In the testing/operational stage, measured voltage during of the outputs from the DPMM inference routine are cluster some discharge cycle is fitted using the empirical model (1) probabilities pi,l, representing the probability that a feature 0 0 and represented with feature vector a . Classifying a into vector ai is generated from cluster l. They are shown in Fig. 1 clusters discovered in the training stage consists of computing such that its vertical slice is a resulting cluster probability 0 cluster probability distribution pa(a ∈ l), l = 1,...,L. The distribution of the corresponding feature vector. We note that cluster probability is computed as cluster indices have no particular meaning and some of them Z contain no data. In fact, out of L = 20 clusters that the p (a0 ∈ l) = p(a|l, Λ , Σ )q (Λ )q (Σ )dΛ dΣ . a l l Λl l Σl l l l (24) inference procedure is initialized with, the result indicates the existence of only 10 clusters. Notably, most feature vectors Since q (Λ ) and q (Σ ) factorize into products of con- Λl l Σl l are associated with high probability (close to 1) with only one stituent factors, the cluster probability is after a few steps of cluster. algebraic manipulations further given by 5 0 Y 0 pa(a ∈ l) = EΛl,k,Σl,k [N (ak;Λl,k, Σl,k)] k=1 5 √ s Y sl,k − l,k (a0 −Λ )2 = e 2 k l,k (25) EΛl,k,sl,k 2π k=1 −1 0 0 where Σl,k = sl,k and ak is the kth entry in a . The expectation in (25) is computed using the law of iterated expectations, where the expectation with respect to sl,k is computed first and expressed in terms of Gamma function of first kind Γ(). Skipping the algebraic manipulations, this results in

5 βl,k 1 Y γl,k Γ(βl,k + 0.5) p (a0 ∈ l) = × a 5/2 (2π) Γ(βl,k) k=1 " # 1 Λ (26) E l,k 0 2 βl,k+0.5 (γl,k + 0.5(ak − Λl,k) ) Since the expectation in (26) is not given in a closed form, it Fig. 1. DPMM data clustering probabilities of the training data (battery B6). 2 is computed numerically by sampling Λl,k ∼ N (µl,k, σl,k). This concludes the computation of cluster probability dis- Each feature vector in D is paired with the number of 0 tribution pa(a ∈ l). Substituting the resulting distribution charge-discharge cycles that remain from the cycle that feature 0 into (5) yields probability distribution pRUL(x|a ) of the re- vector corresponds to until the end of the battery’s useful life. maining useful life of a battery at the end of the discharge The number of remaining charge-discharge cycles associated cycle corresponding to feature vector a0. with feature vectors from the same cluster versus cluster index is shown in Fig. 2. As expected, most clusters cover a range of V. EXPERIMENTAL RESULTS the number of remaining number of charge-discharge cycles. The experimental testing of the described methodology is The exception is cluster 1 which contains 4 data points from performed using battery data from NASA Ames Research D whose corresponding remaining number of cycles cover Center of Excellence [18]. The measured discharge voltage relatively wide range, indicating that this cluster effectively waveforms of two selected batteries are represented with collects outliers and is thus not useful for RUL estimation. the true number of remaining charge-discharge cycles is 99. 140 Similarly, the RUL distribution at the end of the 80th cycle is shown in Fig. 5, where the true number of remaining cycles 120 is 29. We observe that the RUL distributions in Figs. 4 and 5 are concentrated around the corresponding true numbers of 100 remaining cycles and fairly accurately predict the remaining

80 useful life of the tested battery at the tested time instants.

60 0.035

RUL (# discharge cycles) 40 0.03

20 0.025

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0.02 Cluster index

0.015 Fig. 2. Remaining number of charge-discharge cycles versus cluster index.

0.01 Each feature vector from the test data is softly classiﬁed into clusters identiﬁed in the training stage, as detailed in 0.005 Section IV-C. The resulting clustering probability distributions 0 are shown in Fig. 3. As can be observed, most of the test 0 20 40 60 80 100 120 140 160 180 200 data points are, with probability close to one, associated with RUL (# discharge cycles) one cluster. Also, a fairly large number of test data points get Fig. 4. Predicted probability distribution of the RUL at the end of 10th clustered in a particular cluster (the one with index 20). charge-discharge cycle. The true number of remaining cycles is 99.

0.03

0.025

0.02

0.015

0.01

0.005

0 0 20 40 60 80 100 120 140 160 180 200 RUL (# discharge cycles)

Fig. 5. Predicted probability distribution of the RUL at the end of 80th Fig. 3. Clustering of feature vectors from the test data (battery B5). charge-discharge cycle. The true number of remaining cycles is 29.

The RUL probability distribution is inferred at the end of Finally, we evaluate a hard estimate of the number of each charge-discharge cycle of the testing data using (5) and remaining charge-discharge cycles of the test battery at the the results shown in Fig. 3. We use σ2 = 4 for the parameter end of each discharge cycle by taking the mean of the in the Gaussian kernel (4) such that if a certain feature vector corresponding RUL distribution. The absolute estimation error is associated with the remaining number of charge-discharge is shown in Fig. 6. As can be observed from the plot, with cycles k, it can also be associated with the remaining number only few exceptions, the estimation error is below 20 charge- of cycles k − 1 and k + 1, with relatively high probability. discharge cycles even at the beginning of battery’s life. In fact, The RUL distribution at the end of the 10th charge-discharge the time span from the beginning until the mid of the battery’s cycle of the tested battery B5 is shown in Fig 4, where useful life is essentially where the proposed methodology provides useful estimates. As the battery gets closer to the end REFERENCES of its useful life, the estimated remaining number of charge- [1] G. O. Sahinoglu, M. Pajovic, Z. Sahinoglu, Y. Wang, P. Orlik and T. discharge cycles becomes unreliable. We hypothesize that as Wada, ”Battery State-of-Charge Estimation Based on Regular/Recurrent the battery approaches its end of life the model parameters do Gaussian Process Regression”, IEEE Transactions on Industrial Elec- tronics, vol. 65, no. 5, May 2018. not capture all the subtleties in the discharge voltage waveform [2] M. Pajovic, Z. Sahinoglu, Y. Wang, P. Orlik and T. Wada, ”Online data- so as to aid accurate prediction of the remaining number driven battery voltage prediction”, IEEE 15th International Conference of cycles. Further consideration of this issue and effort to on Industrial Informatics (INDIN), July 2017. [3] L. Wu, X. Fu and Y. Guan, ”Review of Remaining Useful Life Prog- ﬁnd better empirical model for this regime are left as future nostics of Vehicle Lithium-Ion Batteries Using Data-Driven Methodolo- research. gies”, MPDI Applied Sciences, 2016. [4] B. Saha, K. Goebel, S. Poll and J. Christophersen, ”Prognostics Methods for Battery Health Monitoring Using a Bayesian Framework”, IEEE Transactions on Instrumentation and Measurement, vol. 58, no. 2, Feb. 100 2009. [5] Z. Liu, G. Sun, S. Bu, J. Han, X. Tang, M. Pecht, ”Particle Learning 90 Framework for Estimating the Remaining Useful Life of Lithium-Ion 80 Batteries”, IEEE Transactions on Instrumentation and Measurement, vol. 55, no. 2, Feb. 2017. 70 [6] D. Wong, F. Yang, K. L. Tsui, Q. Zhou and S. J. Bae, ”Remaining Useful Life Prediction of Lithium-Ion Batteries Based on Spherical 60 Cubature Particle Filter”, IEEE Transactions on Instrumentation and 50 Measurement, vol. 65, no. 6, June 2016. [7] M. E. Orchard, L. Tang, B. Saha, K. Goebel, G. Vachtsevanos, ”Risk- 40 Sensitive Particle-Filtering-based Prognosis Framework for Estimation of Remaining Useful Life in Energy Storage Devices”, Studies in 30 Information and Control, vol. 19, no. 3, Sept. 2010.

Estimated EOL absolute error [8] Q. Miao, L. Xie, H. Cui, W. Liang and M. Pecht, ”Remaining useful 20 life prediction of lithium-ion battery with unscented particle filter 10 technique”, Microelectronics Reliability 53 (2013) 805–810. [9] B. E. Olivares, M. A. Cerda Munoz, M. E. Orchard and J. F. Silva, 0 ”Particle-Filtering-Based Prognosis Framework for Energy Storage De- 0 10 20 30 40 50 60 70 80 90 100 110 vices With a Statistical Characterization of State-of-Health Regeneration Discharge cycle Phenomena”, IEEE Transactions on Instrumentation and Measurement, vol. 62, no. 2, Feb. 2013. Fig. 6. Absolute estimation error of the remaining number of charge-discharge [10] M. E. Orchard and G. J. Vachtsevanos, ”A particle-filtering approach cycles of test battery B5. for on-line fault diagnosis and failure prognosis”, Transactions of the Institute of Measurement and Control 31, 3/4 (2009) pp. 221-246. [11] W. Yan, B. Zhang, G. Zhao, J. Weddington and G. Niu, ”Uncertainty Management in Lebesgue-Sampling-Based Diagnosis and Prognosis for Lithium-Ion Battery”, IEEE Transactions on Industrial Electronics, vol. VI.CONCLUSION 54, no. 10, Oct. 2017. [12] Y. Jinsong, L. Shuang, T. Diyin and L. Hao, ”Remaining Discharge This paper presents a methodology for inferring probability Time Prognostics of Lithium-Ion Batteries Using Dirichlet Process Mixture Model and Particle Filtering Method”, IEEE Transactions on distribution of the remaining number of charge-discharge Instrumentation and Measurement, vol. 66, no. 9, Sept. 2017. cycles a battery can undergo before it is declared dead. [13] J. Joseph, F. Doshi-Velez and N. Roy, ”A Bayesian Nonparametric Accurate prediction of the remaining useful life is important Approach to Modeling Battery Health”, IEEE International Conference on Robotics and Automation (ICRA), May 2012. in numerous applications since it ensures making timely [14] D. M. Blei and M. I. Jordan, ”Variational Inference for Dirichlet Process decision as to when a battery should be replaced such that Mixtures”, Bayesian Analysis, 2006. power interruptions of the system it supplies are avoided. The [15] Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei, ”Hierarchical Dirichlet Processes”, Journal of the American Statistical Association, methodology is based on Dirichlet process mixture model, 101(476):1566–1581, Nov. 2005. which is used to cluster feature vectors that represent discharge [16] Y. W. Teh, ”Dirichlet Processes”, Encyclopedia of Machine Learning, voltage waveforms of one or more batteries of similar type Springer, 2010. [17] C. E. Antoniak, ”Mixtures of Dirichlet Processes with Applications to measured over their life spans. The inference of the DPMM Bayesian Nonparametric Problems”, The Annals of Statistics, vol. 2, pp. generative model is performed using variational Bayes, which 1152-1174, 1974. is faster and provides better means to check for convergence [18] B. Saha and K. Goebel, ”Battery Data Set”, NASA Ames Prognostics Data Repository, NASA Ames, Moffett Field, CA, 2007. than sampling-based inference methods. In the operational stage, the discharge voltage of a battery is measured and fitted using the selected empirical model. The obtained feature vector is then classified into one of the DPMM clusters and the probability distribution of the remaining useful life, expressed as the number of remaining charge-discharge cycles, is inferred. The developed method is tested with experiments, which show that it provides useful predictions of the remaining number of charge-discharge cycles of a battery relatively early in its life.