
Letter

Chaotic Ensemble of Online Recurrent Extreme Learning Machine for Temperature Prediction of Control Moment Gyroscopes

Luhang Liu 1,2, Qiang Zhang 2, Dazhong Wei 2, Gang Li 2, Hao Wu 2, Zhipeng Wang 3, Baozhu Guo 1,* and Jiyang Zhang 2

1 China Aerospace Academy of Systems Science and Engineering, Beijing 100037, China; [email protected]
2 Beijing Institute of Control Engineering, Beijing 100094, China; [email protected] (Q.Z.); [email protected] (D.W.); [email protected] (G.L.); [email protected] (H.W.); [email protected] (J.Z.)
3 State Key Lab of Rail Traffic Control & Safety, Beijing Jiaotong University, Beijing 100044, China; [email protected]
* Correspondence: [email protected]

 Received: 30 July 2020; Accepted: 19 August 2020; Published: 25 August 2020 

Abstract: Control moment gyroscopes (CMGs) are crucial components in spacecraft. Since anomalies in the bearing temperature of a CMG show apparent correlation with nearly all critical fault modes, temperature prediction is of great importance for the health management of CMGs. However, due to the complexity of the thermal environment on orbit, the temperature signal of a CMG has strong intrinsic nonlinearity and chaotic characteristics, so it is crucial to study temperature prediction under the framework of chaotic time series theory. There are also several other challenges, including poor data quality, large individual differences and the difficulty of processing streaming data. To overcome these issues, we propose a new method named Chaotic Ensemble of Online Recurrent Extreme Learning Machine (CE-ORELM) for temperature prediction of control moment gyroscopes. By means of the CE-ORELM model, the proposed method is capable of dynamic temperature prediction. The performance of the method was tested on real temperature data acquired from actual CMGs. Experimental results show that the method has high prediction accuracy and strong adaptability to on-orbit temperature data with sudden variations. These advantages indicate that the proposed method can be used for temperature prediction of control moment gyroscopes.

Keywords: control moment gyroscope; temperature prediction; online-recurrent extreme learning machine

1. Introduction

Health management of attitude control systems of spacecraft is a promising research direction at present. Recently, control moment gyroscopes (CMGs) have become the actuator of choice due to their high torque amplification capability, and they play essential roles in the operation of spacecraft. CMGs are capable of producing significant torques and can handle large quantities of momentum over long periods of time. Consequently, CMGs are preferred in precision pointing applications and in the momentum management of large, long-duration spacecraft. However, owing to their short history of application, condition monitoring and health management of CMGs are less studied, so it is essential to investigate this issue.

A CMG is comprised of a rapidly spinning rotor mounted on one or two gimbals, and is accordingly called a single gimbal CMG (SGCMG) or a double gimbal CMG. Due to the limitations of weight, space and energy cost, only a few signals (rotating speed, temperature, electric current and so

Sensors 2020, 20, 4786; doi:10.3390/s20174786 www.mdpi.com/journal/sensors

on) are gathered from on-orbit satellites. Among these signals, the temperature signal shows apparent correlation with nearly all of the critical fault modes, such as bearing failure, rotor jam and frame jam. Therefore, the temperature signal is an important indicator of whether the CMG is working well. However, considering the delay of transmission and processing, health assessment based on monitoring the temperature signal has a certain hysteresis. It is therefore essential to study temperature prediction so as to assess the CMG's health state and detect weak failures as early as possible.

Currently, a number of methods have been proposed for monitoring machine temperatures [1–4]. Deng X. et al. established a mathematical model of the thermal characteristics of a spindle bearing system [5]. Bing C. et al. designed a digital temperature-based temperature alarm system [6]. Ma W. et al. used statistical methods based on historical data to study temperature prediction [7]. In recent years, the rapid development of algorithms has brought new possibilities for temperature prediction. For instance, Luo et al. proposed a long short-term memory-based approach to forecast the temperature trend [8,9]. In addition, many other methods based on temperature data have also been proposed [10,11]. Considering the universal presence of chaotic phenomena and the complex environments of spacecraft, the temperature signal also has strong intrinsic nonlinearity and chaotic characteristics. Therefore, it is crucial to study temperature prediction under the framework of chaotic time series theory.

As a typical feedforward neural network, the online sequential extreme learning machine (OS-ELM) is a good choice for sequence data prediction [12]. Zhang et al. used an OS-ELM-based model to realize real-time prediction of solar radiation [13]. Yu et al.
proposed an improved OS-ELM method and achieved good results in online ship rolling prediction [14]. Park et al. proposed an improved OS-ELM model called the online recurrent extreme learning machine (OR-ELM). Experimental results showed that the model can not only obtain high prediction accuracy, but also has remarkable adaptability to mutation data [15]. However, such individual algorithms vary between different simulation trials and have relatively poor stability [16]. To overcome this weakness, on the basis of chaotic time series theory, we propose an integrated structure that ensembles a number of OR-ELMs as a whole, called the Chaotic Ensemble of Online Recurrent Extreme Learning Machine (CE-ORELM). This structure attempts to improve stability and accuracy by synthesizing the results of several individual OR-ELMs. In this paper, a CE-ORELM-based method is proposed for temperature prediction of control moment gyroscopes.

The structure of this paper is arranged as follows: Section 2 describes the challenges of the CMG temperature prediction problem; Section 3 introduces our prediction framework for the temperature of the CMG; Section 4 introduces the experimental details; Section 5 illustrates the experimental results and discusses them; Section 6 concludes the paper.

2. Challenges of Control Moment Gyroscope Temperature Prediction

As aforementioned, since the inherent relationships and operating environments of control moment gyroscopes are complex and harsh, there are many challenges in temperature prediction.

2.1. Chaotic Characteristics of Raw Signals

According to chaos theory, chaotic phenomena are ubiquitous. Since the temperature signal of a CMG is a typical chaotic time series, strong intrinsic nonlinearity and chaotic characteristics can be found in it and have a critical influence on prediction. Therefore, temperature prediction has to be studied under the framework of chaotic time series theory.

2.2. Poor Quality of Training Data

The monitoring signals are transmitted from spacecraft to the ground. Due to the limitations of transmission modes and hardware resources, the sampling frequency of the data obtained on the ground is greatly reduced. The problem of missing data also occurs from time to time. In addition, data delay and noise interference in the transmission process further reduce data quality.

2.3. Large Individual Differences

The running conditions and environments of CMGs are quite different, and the distributions of temperatures are also related to these factors. It is almost impossible to train one model that can be universally applied to all CMGs. Even if a model performs well in offline prediction, its performance may degrade greatly in practice.

2.4. Streaming Data Processing

The temperature data are transmitted to the computing platform in the form of a network byte stream. The collection, preprocessing and model training methods for streaming data are different from the traditional offline training approach. Streaming data are usually unstable, which requires the algorithm to have the ability of continuous learning and timely adaptation to changing features.

Currently, there is no existing model that can solve all of the aforementioned challenges. Concerning these issues, we introduce a new efficient and accurate prediction framework, which is verified using actual data collected from CMGs in service.

3. Methods

3.1. Phase-Space Reconstruction of Chaotic Time Series

Phase-space reconstruction is an important method for studying dynamic systems [17]. It extends a one-dimensional chaotic time series into a high-dimensional phase space, which can better describe the dynamic morphology of the system. For a time series x(t), t = 1, 2, 3, ..., N, the system embedding dimension m is introduced to construct an m-dimensional phase space, in which x(t) can be expressed as:

X(t) = (x(t), x(t − 1), ..., x(t − (m − 1))). (1)

According to Takens' embedding theorem [18], as long as the dimension satisfies:

m ≥ 2D + 1, (2)

where D is the dimension of the system attractor, the reconstructed system is equivalent to the original dynamic system. Therefore, the determination of the system embedding dimension m is the key issue for phase-space reconstruction, and we first need to calculate the attractor dimension D. In general, the correlation dimension is used: Grassberger and Procaccia proposed an algorithm for calculating the correlation dimension [19], also known as the G–P algorithm, which is widely used. The steps of the algorithm are as follows:

(1) Calculation of the correlation integral: Define N as the number of vectors in the reconstructed phase space; the correlation integral C_m(r) is defined as

C_m(r) = (2 / (N(N − 1))) Σ_{i=1}^{N} Σ_{j=i+1}^{N} H(r − ‖X_i − X_j‖), (3)

where H is the Heaviside function.

(2) A cluster of ln C_m(r)–ln(r) curves is plotted for increasing m. The least squares method is then used to make a linear regression in the approximately linear part of each curve, giving the estimated value of the correlation dimension D, which is defined as

D = lim_{r→0} ln(C_m(r)) / ln(r). (4)
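The G–P procedure above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the test series, radius range and embedding dimension are arbitrary choices, and a careful estimate of D would inspect the ln C_m(r)–ln(r) curves for a genuinely linear region rather than fitting blindly.

```python
import numpy as np

def delay_embed(x, m, tau=1):
    """Reconstruct an m-dimensional phase space from a scalar series x,
    as in Equation (1): X(t) = (x(t), x(t-1), ..., x(t-(m-1)))."""
    n = len(x) - (m - 1) * tau
    return np.array([x[i:i + (m - 1) * tau + 1:tau] for i in range(n)])

def correlation_integral(X, r):
    """G-P correlation integral C_m(r), Equation (3): the fraction of
    distinct point pairs in the reconstructed space closer than r."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(n, k=1)
    return 2.0 / (n * (n - 1)) * np.count_nonzero(dists[iu] < r)

# Estimate D (Equation (4)) as the slope of ln C_m(r) vs ln r over an
# approximately linear range of radii, via a least squares fit.
rng = np.random.default_rng(0)
x = np.sin(0.3 * np.arange(400)) + 0.01 * rng.standard_normal(400)
X = delay_embed(x, m=4)
radii = np.logspace(-0.7, 0.0, 8)          # r from about 0.2 to 1.0
C = np.array([correlation_integral(X, r) for r in radii])
D = np.polyfit(np.log(radii), np.log(C), 1)[0]
```

For the nearly periodic toy signal above, the embedded points lie close to a one-dimensional closed curve, so the fitted slope D comes out near 1.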

3.2. OR-ELM Theory

As a widely used solution to the online time-series prediction problem, the online sequential extreme learning machine (OS-ELM) can quickly track new sequence patterns and performs better than other online learning solutions in most cases [12]. Due to the introduction of an incremental learning algorithm, OS-ELM can update the model parameters sample-by-sample. When new samples are added, it is not necessary to recalculate all the previous data; instead, incremental learning is conducted on the new sample based on the previous model.

Park et al. [15] proposed an improved OS-ELM, called the online recurrent extreme learning machine (OR-ELM), which shows better performance on time-series prediction tasks than other traditional online learning methods such as hierarchical temporal memory (HTM) and online long short-term memory (online LSTM). The OR-ELM algorithm adds an LN layer to the basic OS-ELM and constructs a recurrent neural network as its main framework. The weights of the input layer and hidden layer are updated by two ELM auto-encoders (ELM-AE) [16], the first being the ELM-AE for input weights (ELM-AE-IW) and the second the ELM-AE for hidden weights (ELM-AE-HW). The learning process of OR-ELM is mainly divided into two stages. The first is the initialization stage, in which the input weights, output weights and parameter matrix are initialized. The second is the online sequential learning stage, in which a new chunk of samples is used to update the input weights, output weights and hidden weights. Figure 1 shows the difference between a simple OS-ELM model and the corresponding improved OR-ELM model.

[Figure 1 layer labels: input x(t) → LN layer → hidden layer (with a recurrent edge in OR-ELM) → output layer → y(t).]

Figure 1. A simple online sequential extreme learning machine (OS-ELM) model and a simple online recurrent extreme learning machine (OR-ELM) model.

1. Initialization stage:

For any online prediction model, there is no training data available during initialization, so the algorithm initializes the output weight β_0 as:

β_0 = 0, P_0 = (I/C)^(−1), (5)

where C is the regularization constant of the ELM-AE. In addition, the initial values of ELM-AE-IW's input weights, ELM-AE-HW's input weights and the hidden layer's output H_0 are assigned from the standard normal distribution.

2. Online sequential learning stage:

In the online sequential learning stage, the weight matrices are updated once a new chunk of input data with N_{k+1} training samples arrives; x(k + 1) represents the newly added sample. Following the recursive least squares (RLS) method, the forgetting factor λ is introduced, and the update equations of the model are as follows:

β_0 = 0, P_0 = (I/C)^(−1). (6)

The input weight update is W_{k+1} = (β^i_{k+1})^T, where

β^i_{k+1} = β^i_k + P^i_{k+1} (H^i_{k+1})^T (x(k + 1) − H^i_{k+1} β^i_k); (7)

P^i_{k+1} = (1/λ) [P^i_k − P^i_k (H^i_{k+1})^T (λ² + λ H^i_{k+1} P^i_k (H^i_{k+1})^T)^(−1) H^i_{k+1} P^i_k]. (8)

The hidden weight update is V_{k+1} = (β^h_{k+1})^T, where

β^h_{k+1} = β^h_k + P^h_{k+1} (H^h_{k+1})^T (H_k − H^h_{k+1} β^h_k); (9)

P^h_{k+1} = (1/λ) [P^h_k − P^h_k (H^h_{k+1})^T (λ² + λ H^h_{k+1} P^h_k (H^h_{k+1})^T)^(−1) H^h_{k+1} P^h_k]. (10)

The output matrix update is

H_{k+1} = g(norm(W_{k+1} x(k + 1) + V_{k+1} H_k)), (11)

where g(·) is the activation function. The output weight update is

β_{k+1} = β_k + P_{k+1} (H_{k+1})^T (x(k + 1) − H_{k+1} β_k), (12)

P_{k+1} = (1/λ) [P_k − P_k (H_{k+1})^T (λ² + λ H_{k+1} P_k (H_{k+1})^T)^(−1) H_{k+1} P_k]. (13)

The OR-ELM-based prediction network designed in this paper is shown in Figure 2.
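As a rough illustration of the recursive update, the sketch below implements only the output-weight recursion of Equations (12) and (13) with fixed random input weights, i.e. an OS-ELM-style simplification; the full OR-ELM additionally adapts the input and hidden weights through ELM-AE-IW and ELM-AE-HW, which is omitted here. All dimensions, the toy signal and the forgetting factor are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, lam, C = 6, 20, 0.96, 1.0

# Fixed random input weights; g is a sigmoid activation as in Eq. (11).
W = rng.standard_normal((n_hidden, n_in))
g = lambda z: 1.0 / (1.0 + np.exp(-z))

# Initialization stage, Equation (5): beta_0 = 0, P_0 = (I/C)^(-1).
beta = np.zeros((n_hidden, 1))
P = np.linalg.inv(np.eye(n_hidden) / C)

def rls_step(x, target):
    """Output-weight update of Equations (12)-(13) for one new sample."""
    global beta, P
    H = g(W @ x.reshape(-1, 1))              # hidden output, (n_hidden, 1)
    denom = lam**2 + lam * (H.T @ P @ H)     # scalar term of Eq. (13)
    P = (P - (P @ H) @ (H.T @ P) / denom) / lam
    beta = beta + P @ H * (target - (H.T @ beta).item())

# One-step-ahead prediction on a toy periodic signal.
series = np.sin(0.2 * np.arange(300))
for t in range(n_in, 299):
    rls_step(series[t - n_in:t], series[t])
pred = (g(W @ series[293:299].reshape(-1, 1)).T @ beta).item()
```

The sample-by-sample update never revisits old data, which is what makes the scheme suitable for the streaming telemetry setting described in Section 2.4.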


Figure 2. The architecture of the OR-ELM model.

3.3. Chaotic Ensemble of OR-ELM

The OR-ELM network takes the minimum embedding dimension as its input number. Therefore, the estimation of the minimum embedding dimension has a great influence on the accuracy of the network. However, in practice, it is difficult to obtain the exact embedding dimension using the G–P algorithm, and the stability of a single OR-ELM network is poor, which leads to inaccurate prediction results from a single OR-ELM network. Therefore, a chaotic ensemble of OR-ELM consisting of multiple OR-ELMs (CE-ORELM) is proposed in this paper. The structure is shown in Figure 3. In this model, multiple parallel-connected OR-ELM networks are constructed, denoted as sub-ORELM_i (i = 1, 2, ..., n). The prediction results of each network are integrated with proper weights to obtain the final prediction results of the network, improving the prediction accuracy. In this paper, each subnet uses the default parameters, where the number of hidden layers is 1 and the number of hidden nodes is equal to the number of input vectors.

Figure 3. The structure of the chaotic ensemble of OR-ELM (CE-ORELM).





The input of the model consists of a set of time series data χ = {χ(1), χ(2), ..., χ(N)}, where N represents the length of the time window required. The OR-ELM-based prediction model proposed in this paper is only applicable to the temperature prediction task of one particular position. Each element χ(t) ∈ R^L in the input sample is composed of an L-dimensional vector {χ^t_1, χ^t_2, ..., χ^t_L}, where L represents the length of the feature table and the scalar value χ^t_k represents the value of the kth input at time t. Here we set L to 11, so the L elements in each χ(t) correspond exactly to the L-dimensional inputs. The output of the model is also a set of time series data Y = {y(N+k)}, where y(N+k) represents the predicted temperature after k cycles at the measurement point.
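The windowing just described (L = 11 lagged readings in, a target k cycles ahead) might be assembled as follows; the helper name and the linear stand-in stream are hypothetical, not from the paper.

```python
import numpy as np

def make_windows(series, L=11, k=5):
    """Slice a scalar telemetry stream into training pairs: each input
    chi(t) holds the L most recent readings, and the target y is the
    temperature k cycles ahead of the window."""
    X, y = [], []
    for t in range(L, len(series) - k + 1):
        X.append(series[t - L:t])
        y.append(series[t + k - 1])
    return np.array(X), np.array(y)

temps = np.linspace(20.0, 40.0, 100)   # stand-in for one sensor's stream
X, y = make_windows(temps)
```

Each row of `X` is one χ(t) and the matching entry of `y` is the k-cycle-ahead temperature at the same measurement point.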



The correlation dimension D is obtained by the G–P algorithm described above, and the minimum embedding dimension m is determined according to Formula (2). Let the number of input nodes in the central subnet sub-ORELM_[n/2] be equal to m. The number of input nodes in the other subnets is defined as follows:

In_i = In_{[n/2]} + (i − [n/2]) (i = 1, 2, ..., n), (14)

where n is the total number of subnets in CE-ORELM and In_i represents the number of input nodes in the ith subnet. When i = [n/2], the subnet is called the central subnet. Due to the instability of a single OR-ELM network, each subnet needs to be weighted appropriately to obtain more accurate prediction results. Define the weighted factor as ω; the optimal weighted factors ω_i (i = 1, 2, ..., n) of each subnet are calculated through the least squares regression algorithm:

min J_{CE−ORELM} = min Σ_{t=1}^{N} [x(t + step) − Σ_{i=1}^{n} ω_i x̂_i(t + step)]², (15)




where N is the number of samples, step is the prediction step and x(t + step) is the true value. The final prediction result x̂(t + step) of the CE-ORELM network can be expressed as follows:

x̂(t + step) = Σ_{i=1}^{n} ω_i · x̂_i(t + step), (16)

where n is the total number of subnets in CE-ORELM and x̂_i(t + step) represents the output of the ith subnet.
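Since Equation (15) is linear in the weights ω_i, it reduces to an ordinary least squares problem over the subnet outputs. A minimal sketch, with noisy copies of a toy series standing in for the sub-OR-ELM predictions:

```python
import numpy as np

rng = np.random.default_rng(1)
true = np.sin(0.1 * np.arange(200))               # stands in for x(t + step)
# Stand-ins for n = 5 sub-OR-ELM outputs: the true series plus noise.
preds = np.stack([true + 0.05 * rng.standard_normal(200) for _ in range(5)],
                 axis=1)

# Equation (15): choose the omega_i minimizing the squared ensemble error.
omega, *_ = np.linalg.lstsq(preds, true, rcond=None)

# Equation (16): the CE-ORELM output is the weighted sum of subnet outputs.
ensemble = preds @ omega
```

Because any single subnet corresponds to a feasible weight vector (a one-hot ω), the fitted ensemble can never have a larger squared error on the fitting data than the best individual subnet.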

3.4. Framework of Temperature Prediction

As shown in Figure 4, the CE-ORELM-based temperature prediction framework proposed in this paper is mainly composed of three parts: a data preprocessing part, a CE-ORELM-based model training and prediction part, and an auxiliary alarm part.

The CE-ORELM model predicts the temperature data of each measurement point in the next k cycles. In the comparison link, the prediction value is compared with the temperature warning threshold of each measurement point, and the difference between the prediction value and the external environment temperature is compared with the corresponding warning threshold of each measurement point. When any one of the prediction values exceeds its threshold, a warning signal is sent.
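The comparison link of the alarm part might be expressed as a small predicate like the following; both threshold values here are illustrative placeholders, not the mission's actual settings.

```python
def check_alarm(pred_temp, env_temp, temp_threshold=70.0, rise_threshold=45.0):
    """Auxiliary alarm logic: warn when either the predicted temperature
    or its difference from the external environment temperature exceeds
    its warning threshold (threshold values are placeholders)."""
    if pred_temp > temp_threshold:
        return "warning: predicted temperature exceeds threshold"
    if pred_temp - env_temp > rise_threshold:
        return "warning: rise over environment temperature exceeds threshold"
    return "ok"
```

Running this check on every k-cycle-ahead prediction gives the early-warning behavior the framework is after: the alarm fires before the measured temperature itself crosses the limit.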

[Figure 4 flowchart: streaming sensor data → data preprocessing → CE-ORELM-based model training → CE-ORELM-based model prediction → threshold comparison; if the prediction result exceeds the threshold, a warning message is sent, otherwise prediction continues.]

Figure 4. Framework of the prediction model.

4. Experiments

4.1. Data Description

The data used in the experiments were collected from control moment gyroscopes of a satellite in service. A temperature sensor is installed on the high-speed bearing of each CMG, as shown in Figure 5. We collected the running data over 15 days and stored them in an offline database; the raw data are shown in Figure 6. If the temperature value is larger than 70 °C, the CMG is considered to be breaking down. To simulate the online scenario as closely as possible, we arranged these data into a streaming data format in time order for the prediction task.

There are two hyper parameters that need to be determined before applying the model to prediction tasks: M and λ, which respectively represent the number of hidden nodes and the forgetting factor of the model. Finding the optimal hyper parameters is a key task for data with different distributions. We divided the dataset into three parts, ν_n, ν_t and ν_m.
To evaluate the feasibility and adaptability of the model in response to emergencies, we filtered out a piece of data containing special events (emergency braking in our experiment) from the data set, which is ν_m. Then we divided the remaining data into ν_n and ν_t according to the ratio of 8:2. The data set ν_n was used to train the optimal hyper parameters of the model, and ν_t was used to evaluate the model's performance on this prediction problem and to compare its characteristics with those of other models.

Figure 5. Control moment gyroscopes.

FigureFigure 6. 6. The The raw raw data. data. 4.2. Hyper Parametric Setting ThereThere are are two two hyper hyper parameters parameters th thatat need need to to be be determined determined befo beforere applying applying the the model model to to make make Μ λ predictionspredictionsTo obtain tasks. tasks. the They optimalThey are are hyperΜandand parameters λ , which, which respectively Mrespectivelyand λ, the represent represent data set theν then is number number involved. of of hidden Accordinghidden nodes nodes to and the and structuralthe forgetting characteristics factor of model. of the Finding model itself,the optima the optimall hyper parameters value of the becomes parameter a keyλ istask between for the thedata the forgetting factor of model. Finding the optimal hyper parameters becomesν ν a key taskν for the data intervalwith different 0.9–1. Wedistributions. first fix a certain We dividedλ, and the then dataset test the into prediction three parts accuracyν n ,ν oft and theν modelm . To obtained evaluate with different distributions. We divided the dataset into three parts n , t and m . To evaluate after taking different M (10–1500) on the data set νn. Finally, we can get the optimal M with this λ. thethe feasibility feasibility and and adaptability adaptability of of the the model model in in resp responseonse to to emergencies, emergencies, we we filtered filtered out out a apiece piece of of By constantly adjusting the value of λ, the optimal hyper parameters M and λ applicable to the data datadata containing containing special special events events (e (emergencymergency braking braking in in our our experiment) experiment) from from the the data data set, set, which which is is set can be obtained after some iterative searches.


4.3. Experiment Scheme Design

We conducted experiments to compare the performance of the proposed CE-ORELM for temperature prediction with different parameters. To evaluate the performance of the prediction results on the test data set, we introduce the normalized root mean square error (NRMSE) and mean absolute percentage error (MAPE) indicators for analysis. In time series prediction, NRMSE is the most common evaluation indicator, reflecting the similarity between the predicted series and the actual series. It can be calculated as:

NRMSE = √((1/T) Σ_{k=1}^{T} (Y_k − y(k))²) / Y_mean, (17)

where T represents the length of the data sequence, Y_k represents the observed value at time k, and y(k) represents the value predicted by the algorithm at time k. Although data preprocessing methods were applied, some noise remained in the data. In contrast to NRMSE, MAPE is more suitable for evaluating the performance of prediction models on noisy data. It can be calculated as:

MAPE = \frac{\sum_{k=1}^{T}\left|Y_k - y(k)\right|}{\sum_{k=1}^{T}\left|Y_k\right|}, \quad (18)

where T, Y_k, and y(k) are defined as in NRMSE.
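Both indicators can be implemented in a few lines. A minimal NumPy sketch (not the authors' code) of Equations (17) and (18), with a small illustrative temperature series:

```python
import numpy as np

def nrmse(y_obs, y_pred):
    """Equation (17): RMSE normalized by the mean of the observed series."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_obs - y_pred) ** 2)) / y_obs.mean()

def mape(y_obs, y_pred):
    """Equation (18): sum of absolute errors normalized by the sum of
    absolute observations, as defined in the text."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return np.abs(y_obs - y_pred).sum() / np.abs(y_obs).sum()

obs  = [40.0, 41.0, 42.0, 41.0]   # observed temperatures Y_k
pred = [40.5, 41.0, 41.5, 41.0]   # predicted temperatures y(k)
```

Note that Equation (18) normalizes by the total magnitude of the observations rather than averaging per-point percentage errors, which makes it stable when individual observed values are small.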

4.4. Parameters of CE-ORELM

The correlation dimension D can be obtained through the Grassberger–Procaccia (G–P) method described above. Firstly, a cluster of ln Cm(r)–ln(r) curves of the normal data is plotted by increasing m. Then the least squares regression method is used to obtain the estimated value of D. According to Formula (2), the embedding dimension m is determined as m = 6. After training and testing, a prediction model of the normal state can be determined.
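As an illustration of the G–P procedure, the sketch below (an assumed implementation, not the authors' code) builds the delay embedding, estimates the correlation sum Cm(r) at several radii, and takes the least-squares slope of ln Cm(r) versus ln(r) as the correlation dimension estimate:

```python
import numpy as np

def delay_embed(x, m, tau=1):
    """Phase-space reconstruction: each row is a delay vector of length m."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])

def correlation_dimension(x, m, radii, tau=1):
    """Grassberger-Procaccia estimate: least-squares slope of
    ln C_m(r) versus ln(r) over the given radii."""
    v = delay_embed(np.asarray(x, float), m, tau)
    # All pairwise distances between delay vectors (fine for short series).
    diff = v[:, None, :] - v[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)[np.triu_indices(len(v), k=1)]
    c = np.array([np.mean(dist < r) for r in radii])   # correlation sums C_m(r)
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope
```

The radii must lie in the scaling region where the ln Cm(r)–ln(r) curve is approximately linear; in practice the slope is read off this region for each m until the estimate saturates.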

5. Results and Discussion

The experimental results are illustrated in this section. By analyzing the prediction error of the model with different forgetting factors λ and hidden node numbers M, as shown in Figure 7, we obtained the optimal hyper parameters for this experiment. The algorithm proposed in this paper is verified by the test data νt mentioned above. At the same time, the performance of the proposed CE-ORELM algorithm is compared by the NRMSE and MAPE for different training samples and prediction steps. The results are shown in Table 1. For the training sample of 3000, the NRMSE values are 0.0021, 0.0015, 0.0028 and 0.0030 respectively, which are the lowest among all the NRMSE values, including the training sample of 2500 (0.0042, 0.1400, 0.0224 and 0.1423) and the training sample of 2000 (0.0118, 0.1179, 0.0423 and 0.0466). Therefore, the number of training samples has a certain influence on the performance of CE-ORELM. The MAPE values are 0.0014, 0.0012, 0.0025 and 0.0031 respectively, which are lower than those of the other training samples (2500: 0.0107, 0.0226, 0.0165 and 0.0340; 2000: 0.2250, 0.5130, 0.2515 and 0.3659). Therefore, the CE-ORELM algorithm is robust and effective in forecasting time series with intrinsic nonlinearity and chaotic characteristics.

In order to compare the performance of the models with different prediction steps and training data, we selected prediction steps of 5, 10, 15 and 20 and training samples of 2000, 2500 and 3000, and analyzed the prediction accuracy, as shown in Figure 8. Figure 8a shows that when the number of training samples is 3000, the prediction accuracies of all models are the best compared with training samples of 2500 and 2000, and are all above 90%. Meanwhile, the accuracies with 2500 training samples are around 90%, and the accuracies with 2000 training samples range from 60% to 90%. Therefore, it can be seen from Figure 8a that as the number of training samples increases, the accuracy of CE-ORELM gradually reaches about 90%; the greater the number of training samples, the higher the accuracy achieved. According to Figure 8b, across the different prediction steps, CE-ORELM always has the highest accuracies (no less than 92%) compared with the sub-ORELMs. What is more, as shown in both Figure 8a,b, the prediction accuracy with prediction step 10 is almost the best under different parameters. Therefore, the proposed algorithm with a prediction step of 10 can be used for temperature prediction of control moment gyroscopes.

Figure 7. Performance on the prediction (NRMSE versus number of hidden nodes) by the Chaotic Ensemble of Online Recurrent Extreme Learning Machine (CE-ORELM): (a) forgetting factor λ = 1.0; (b) forgetting factor λ = 0.99; (c) forgetting factor λ = 0.95; (d) forgetting factor λ = 0.93.

Table 1. The analysis of prediction results based on normalized root mean square error (NRMSE) and mean absolute percentage error (MAPE).

Training Sample | Value | Prediction Step 5 | Prediction Step 10 | Prediction Step 15 | Prediction Step 20
3000 | NRMSE | 0.0021 | 0.0015 | 0.0028 | 0.0030
3000 | MAPE  | 0.0014 | 0.0012 | 0.0025 | 0.0031
2500 | NRMSE | 0.0042 | 0.1400 | 0.0224 | 0.1423
2500 | MAPE  | 0.0107 | 0.0226 | 0.0165 | 0.0340
2000 | NRMSE | 0.0118 | 0.1179 | 0.0423 | 0.0466
2000 | MAPE  | 0.2250 | 0.5130 | 0.2515 | 0.3659

Figure 8. Prediction accuracies (a) under different samples; (b) under different sub CE-ORELMs.

To demonstrate the efficiency of the CE-ORELM in detail, a number of results are shown in Table 2 and Figure 9. The weights of the sub-ORELMs with prediction step 10 are shown in Table 2, and the actual and predicted outputs with prediction step 10 are shown in Figure 9. It is obvious that the curve of the actual outputs and the curve of the predicted outputs are almost the same. Moreover, the predicted outputs (ranging from 39 to 42) are not only close to the actual data, but also follow the same variation trend as the actual data, which has obvious variable stages. Therefore, the proposed method is effective in temperature prediction under different operation states.

Table 2. Weights of the sub-online recurrent extreme learning machines (ORELMs).

Sub-ORELM | Embedding Dimension 4 | Embedding Dimension 5 | Embedding Dimension 6 | Embedding Dimension 7 | Embedding Dimension 8
Weight | 0.3454 | 0.1350 | 0.1522 | 0.2358 | 0.1316
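Given the weights in Table 2, the ensemble output at each step is a weighted combination of the five sub-ORELM predictions. A minimal sketch (the sub-model outputs below are illustrative placeholders, not values from the experiment):

```python
import numpy as np

# Sub-ORELM weights for embedding dimensions 4-8, taken from Table 2.
WEIGHTS = np.array([0.3454, 0.1350, 0.1522, 0.2358, 0.1316])

def ensemble_predict(sub_outputs, weights=WEIGHTS):
    """Weighted combination of the sub-ORELM outputs; the weights are
    renormalized in case they do not sum exactly to one."""
    w = np.asarray(weights, float)
    w = w / w.sum()
    # Works for one step (length-5 vector) or a 5 x T matrix of outputs.
    return np.asarray(sub_outputs, float).T @ w

# Illustrative outputs of the five sub-models for a single time step.
subs = [40.1, 40.3, 40.2, 40.0, 40.4]
prediction = ensemble_predict(subs)
```

Because the weights sum to one, the combined prediction always lies within the range spanned by the sub-model outputs.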
Moreover, as shown in Figure 9, it is noticeable that due to the regular change of the thermal environment on orbit, the temperature of the high-speed bearing increases with step-wise changes and the values fluctuate between adjacent stages. In this scenario, the proposed method tends to get close to the later stage. Although this tendency decreases the accuracy, it helps us to make decisions ahead of time. It is also shown that when suddenly varying data emerges, the proposed CE-ORELM still performs well. Therefore, it is obvious that the CE-ORELM has high prediction accuracy and strong adaptability to the on-orbital temperature data with sudden variations.

Figure 9. Prediction result (temperature in degrees Celsius).

6. Conclusions

Temperature prediction is a significant part of the health management system for CMGs. This paper proposes a chaotic ensemble-ORELM-based framework that can be applied to the temperature prediction of CMGs. We tested this framework with actual data acquired during the operation of CMGs. The experimental results show that the prediction accuracy increases with the number of training samples. The prediction step also influences the prediction accuracy, and the accuracy is highest with prediction step 10 in the experiment. The experimental results show that this framework outperforms the others, with a prediction accuracy over 96%. What is more, the proposed framework has a good ability to predict data with sudden variations and is therefore effective in temperature prediction under different operation states. These advantages indicate that this framework approximately meets the requirements of temperature prediction of CMGs in practice. However, the high accuracy depends on the sufficiency of training samples. In practice, the absence of fault data may lead to over-fitting.
Thus, more experiments with small sample data and imbalanced data should be done in the future.

Author Contributions: B.G. and J.Z. conceived and designed the experiments; Q.Z., D.W., G.L., and H.W. performed the experiments; L.L. and Z.W. analyzed the data; L.L. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding: This research is supported by the National Natural Science Foundation of China (Grant Nos. U1837602 and 61803022).

Conflicts of Interest: The authors declare no conflict of interest.


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).