
An Integrated Deep Learning Method Towards Fault Diagnosis of Hydraulic Axial Piston Pump


sensors

Article

Shengnan Tang 1, Shouqi Yuan 1,*, Yong Zhu 1,2,3 and Guangpeng Li 1

1 National Research Center of Pumps, Jiangsu University, Zhenjiang 212013, China; [email protected] (S.T.); [email protected] (Y.Z.); [email protected] (G.L.)
2 State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China
3 Ningbo Academy of Product and Food Quality Inspection, Ningbo 315048, China
* Correspondence: [email protected]; Tel.: +86-0511-88780280

 Received: 14 October 2020; Accepted: 10 November 2020; Published: 18 November 2020 

Abstract: A hydraulic axial piston pump is an essential component of a hydraulic transmission system and plays a key role in modern industry. Considering varying working conditions and the often implicit nature of frequent faults, it is difficult to accurately monitor machinery faults in the actual operating process using current fault diagnosis methods. Hence, it is urgent and significant to investigate effective and precise fault diagnosis approaches for pumps. Owing to the advantages of intelligent fault diagnosis methods in big data processing, methods based on deep learning have accomplished admirable performance for fault diagnosis of rotating machinery. The prevailing convolutional neural network (CNN) displays desirable automatic learning ability. Therefore, an integrated intelligent fault diagnosis method is proposed based on CNN and continuous wavelet transform (CWT), combining feature extraction and classification. Firstly, CWT is used to convert the raw vibration signals into time-frequency representations and achieve the extraction of image features. Secondly, a new framework of deep CNN is established via designing the convolutional layers and sub-sampling layers. The learning process and results are visualized by t-distributed stochastic neighbor embedding (t-SNE). The results of the experiment present a higher classification accuracy compared with other models. It is demonstrated that the proposed approach is effective and stable for fault diagnosis of a hydraulic axial piston pump.

Keywords: convolutional neural network; continuous wavelet transform; intelligent fault diagnosis; hydraulic axial piston pump

1. Introduction

Owing to the advantages of fast response, high power density and high stability, hydraulic transmission systems play a critical role in industry [1–3]. The hydraulic axial piston pump is considered the critical power source of the hydraulic transmission system, and it is meaningful to ensure its stable operation. On account of the severe conditions of high temperature, high pressure and heavy working load, incident and unexpected faults may lead to enormous economic losses and potential safety impacts [4–6]. Therefore, it is significant and valuable to develop effective and accurate fault diagnosis methods for the stability and reliability of the system. Regarding fault diagnosis of hydraulic axial piston pumps, numerous studies have emphasized conventional methods [7,8]. Traditional fault diagnosis methods are mainly based on the analysis of the mechanism, the characteristic frequency or the extraction of fault features. In consideration of the fuzzy fault characteristics and complex structure of the pump, it is difficult to use traditional subjective manual diagnosis methods to achieve exact fault diagnosis.

Sensors 2020, 20, 6576; doi:10.3390/s20226576; www.mdpi.com/journal/sensors

With the development of artificial intelligence, intelligent fault diagnosis technologies have aroused increasing attention from researchers [9–11]. Intelligent diagnosis methods present powerful processing capability for mechanical big data, and no longer rely on professional knowledge and diagnostic experience, as was previously the case. It is worth pointing out that machine learning based diagnostic methods are viewed as the typical representative. Amar et al. constructed a neural network model based on a vibration spectrum for bearing fault diagnosis [12]. As one of the classical and popular machine learning methods, the support vector machine (SVM) has been employed to achieve non-linear classification [13–15]. Moreover, SVM based methods were employed to investigate the fault diagnosis of rotating machinery [16,17]. Contrastive analysis was performed by Han et al., primarily on random forests, artificial neural networks, and SVM for fault diagnosis in rotating machinery [18]. Due to the limitations of traditional machine learning in feature extraction and model training, deep learning (DL) based technology motivates the investigation of intelligent fault diagnosis [19–21]. As one of the effective and precise DL methods, the convolutional neural network (CNN) has been regarded as a potential tool for machinery fault diagnosis. The characteristics of weight sharing and down-sampling mean that CNN outperforms other deep network models. To analyze fault type and severity, a LiftingNet framework was developed, in which varying rotating speed and operation environment were taken into account [22]. Liang et al. used the wavelet transform to accomplish the transformation of signals into images, and established a CNN for fault classification [23]. By introducing simplified shallow information fusion, a new CNN model was employed for fault diagnosis of the high-speed train axle-box bearing [24].
The training time of the network was effectively reduced and the diagnostic performance was promoted in the meantime. By combining Hilbert–Huang transform (HHT) and CNN, Gao et al. developed a novel approach for fault diagnosis of micro-electromechanical system inertial sensors [25]. The proposed method presents remarkable performance in comparison to modern methods. In order to figure out the imbalanced distribution of machinery data, Lei et al. carried out a deep normalized CNN for bearing fault diagnosis [26]. Based on LeNet-5, new diagnostic methods were developed for the fault diagnosis of pumps, the ability of CNN in image processing was admirably demonstrated in the form of predicted fault classification results [21,27]. By integrating CNN and recurrent neural network, Shenfield et al. constructed a novel deep model for the fault detection and distinction of bearing [28]. Motivated by transfer learning, a new method called deep domain-adversarial transfer learning was developed for fault diagnosis of rolling bearings [29]. Based on the transfer learning model of ResNet-50, a multi-scale deep model was constructed for bearing fault diagnosis, enhancing the robustness and generalizability of the model [30]. Moreover, the proposed model was compared with many other models, SVM, CNN, CNN with maximum mean discrepancy (MMD) and so on. The advantage of the established model was fully proved. Inspired by the analysis of the single signals, Ye et al. constructed a new model based on deep neural network, employing the feature fusion on the signals from multi-channel sensors [31]. In contrast with other intelligent methods with signals from a single sensor, including the Back-Propagation neural network and SVM, the proposed method was demonstrated to be more effective and accurate for fault diagnosis. Using continuous wavelet transform (CWT) as a preprocessing method, a CNN was employed for bearing fault diagnosis [32]. 
To solve the problem of insufficient fault data, a combined intelligent approach was developed based on CNN, a nonlinear auto-regression neural network and CWT. The performance was verified on two imbalanced datasets, covering bearings and gears [33]. By integrating CNN and an extreme learning machine, Chen et al. developed a novel method for fault diagnosis of gearboxes and motor bearings, using CWT for the conversion of the raw signal [34]. As a prevailing dimension reduction algorithm in machine learning, t-distributed stochastic neighbor embedding (t-SNE) has been employed for the visualization of the features learned by the CNN model [35–37]. Although many studies based on DL methods have achieved successful results for fault diagnosis of bearings and gears, the research on pumps is still insufficient, especially for hydraulic axial piston pumps. Furthermore, owing to the complex structure and the difficulty in acquiring fault data, it is a great challenge to accomplish precise fault diagnosis of a hydraulic axial piston pump.

In this paper, three key contributions are made: (1) As one of the most widely used types of rotating machinery in many fields, the hydraulic axial piston pump makes fault diagnosis necessary and significant in engineering applications; moreover, present intelligent fault diagnosis methods mainly focus on bearings, gears and gearboxes, and research on hydraulic axial piston pumps is lacking. (2) In consideration of the superiority of the wavelet transform in nonlinear signal processing, CWT is integrated into the approach to transform raw vibration signals into time-frequency representations. (3) The limitations of traditional diagnostic methods and common intelligent fault diagnosis approaches are effectively overcome, and the proposed diagnosis method provides an important concept for exploring new diagnostic methods. Therefore, this research puts emphasis on intelligent fault diagnosis methods for the hydraulic axial piston pump. Firstly, the basic theory of CNN is briefly introduced in Section 2. In Section 3, in order to reduce the difficulty of feature extraction, CWT is selected for the preprocessing of raw vibration signals; in light of the superiority of CNN in feature learning, a new CNN model is employed for fault diagnosis of the pump. In Section 4, the diagnostic performance of the proposed method is validated by experiments, the effectiveness of the model is displayed by confusion matrix and t-SNE, and comparisons are performed with different CNN based models.

2. Basic Algorithm Theory

2.1. Brief Introduction to Convolutional Neural Network

In light of the diverse fault classification methods and the nonlinear characteristics of machinery big data, deep learning based technology has aroused the concern of researchers in the fault diagnosis field [38–40]. As one of the prevailing and effective representatives, CNN presents a powerful automatic learning capability for useful and distinguishing features, compensating for the deficiencies of the fully connected feedforward neural network, namely its large number of parameters and its lack of local invariance. Generally, a typical CNN structure is composed of different layers: a data input layer, a convolution layer, a ReLU (Rectified Linear Unit) layer called the activation layer, a pooling layer and a fully connected layer. These structural layers can be used to complete the feature extraction and final classification. CNN shows superiority over other deep network methods owing to three main traits: local connection, weight sharing and down-sampling. Therefore, a reduction in the number of network parameters to be optimized can be achieved, and the bottleneck of overfitting can be resolved to a certain extent during feature learning [41,42]. Compared to the structures of other deep learning models, the convolution layer and subsampling layer are distinctive for CNN models. The local receptive field in the convolutional layer has the same size as the convolutional kernel. The convolutional kernel is also named a filter and is considered to be a local window; within a local window, the neurons of two adjacent layers are connected to each other [43,44]. The convolutional kernel can be viewed as a linear time-invariant system, and the feature map of the next layer can be calculated by:

$x_j^l = f\left( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l \right)$ (1)

where $*$ denotes the operation of convolution, $x_i^{l-1}$ represents the input feature map from the previous layer, $k_{ij}^l$ denotes the convolutional kernel, and $M_j$ is the set of input feature maps. Then, the convolution of the kernel is performed on the input data.
$b_j^l$ is the bias introduced during the process. Ultimately, the activation function $f$ is employed to obtain nonlinear output features. The pooling layer is also called the sub-sampling layer, and can further reduce the number of parameters on the basis of local connection. Furthermore, it can enhance the generalization ability of the model.
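To make Equation (1) concrete, the forward pass of a single convolutional feature map can be sketched in NumPy as below. The tanh activation, toy map sizes and kernel values are illustrative assumptions, not the paper's actual configuration, and the sliding-window product implements the discrete 2D convolution sum over all input maps in $M_j$:

```python
import numpy as np

def conv_feature_map(inputs, kernels, bias, f=np.tanh):
    """One output map x_j per Eq. (1): x_j = f(sum_i x_i * k_ij + b_j).
    `inputs`: list of 2D input maps x_i; `kernels`: matching 2D kernels k_ij;
    `bias`: scalar b_j; `f`: activation (tanh assumed here). 'Valid' padding."""
    kh, kw = kernels[0].shape
    H, W = inputs[0].shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for x, k in zip(inputs, kernels):          # sum over input maps i in M_j
        for r in range(out.shape[0]):
            for c in range(out.shape[1]):
                out[r, c] += np.sum(x[r:r+kh, c:c+kw] * k)
    return f(out + bias)

# toy usage: two 5x5 input maps with 3x3 kernels -> one 3x3 feature map
maps = [np.ones((5, 5)), np.zeros((5, 5))]
ks = [np.full((3, 3), 0.1), np.full((3, 3), 0.2)]
fm = conv_feature_map(maps, ks, bias=0.0)
print(fm.shape)  # (3, 3)
```

Each output cell sums a 3 × 3 window over every input map before the bias and activation are applied, which is exactly the per-feature-map computation Equation (1) describes.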

The process of the pooling operation can be expressed by,

$a_j^l = f\left( w_j^l \, \mathrm{down}\!\left(M_j^{l-1}\right) + b_j^l \right)$ (2)

Among them, $\mathrm{down}(\cdot)$ denotes the pooling operation, which takes the maximum or the mean value over each region of the convolved features. In the function, $M_j^{l-1}$, $w_j^l$ and $b_j^l$ represent the feature map, the weight and the bias, respectively, and $f$ is the activation function. As for the fully connected layer, a softmax regression model can be considered effective and accurate in conducting multiclass classification.
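The sub-sampling step of Equation (2) can be sketched as follows; the identity activation, unit weight and zero bias are simplifying assumptions so that only the down( ) operation is visible:

```python
import numpy as np

def pool_layer(feature_map, size=3, mode="max", w=1.0, b=0.0, f=lambda z: z):
    """Sub-sampling per Eq. (2): a_j = f(w_j * down(M_j) + b_j).
    down() takes the max (or mean) over non-overlapping size x size regions."""
    H, W = feature_map.shape
    H2, W2 = H // size, W // size
    blocks = feature_map[:H2*size, :W2*size].reshape(H2, size, W2, size)
    down = blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))
    return f(w * down + b)

m = np.arange(36, dtype=float).reshape(6, 6)
print(pool_layer(m, size=3))   # 2x2 map of per-block maxima
```

Pooling a 6 × 6 map with a 3 × 3 region yields a 2 × 2 map, illustrating how the parameter count shrinks at each sub-sampling layer.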

2.2. Basic Principle of Continuous Wavelet Transform

With regard to the basic theory of WT, related studies can provide some references [45]. The mother wavelet can be presented as follows:

$\psi_{u,s}(t) = \frac{1}{\sqrt{s}} \, \psi\!\left( \frac{t-u}{s} \right)$ (3)

Among them, $\psi_{u,s}$ represents the wavelet dictionary, which is generated by a single wavelet; $s$ and $u$ are two variables: the parameter $s$ denotes the scale, and $u$ denotes the translation, with $u \in \mathbb{R}$, which is employed to control the translation of the wavelet function. The WT can accordingly be calculated by:

$W f(u,s) = \left\langle f, \psi_{u,s} \right\rangle = \int x(t) \, \frac{1}{\sqrt{s}} \, \psi\!\left( \frac{t-u}{s} \right) dt$ (4)

Owing to its advantages in processing nonstationary signals, CWT was carried out to accomplish the image transformation for the fault data of the hydraulic axial piston pump under each condition. Compared with regular wavelets, the complex Morlet wavelet (ComplexMorlet) presents good resolution in both the time domain and the frequency domain. Hence, ComplexMorlet is selected as the wavelet basis function.
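A minimal NumPy sketch of Equations (3)–(4) with a complex Morlet wavelet is given below. The wavelet formula and the bandwidth/center-frequency values of 3 follow the parameters stated later in Section 4.1, but the normalization is simplified and the discretization of the integral is an illustrative assumption:

```python
import numpy as np

def cwt_cmor(x, scales, fs, bandwidth=3.0, center=3.0):
    """Minimal CWT scalogram per Eqs. (3)-(4) with a complex Morlet
    mother wavelet psi(t) = exp(2j*pi*center*t) * exp(-t**2 / bandwidth)
    (amplitude normalization omitted). Returns |W(u, s)| per scale."""
    n = len(x)
    t = (np.arange(n) - n // 2) / fs            # centered wavelet support
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        psi = np.exp(2j*np.pi*center*t/s) * np.exp(-(t/s)**2 / bandwidth)
        # correlate signal with the scaled, translated wavelet (Eq. 4)
        w = np.convolve(x, np.conj(psi[::-1]), mode="same") / np.sqrt(s)
        out[i] = np.abs(w) / fs
    return out

fs = 10_000                                     # 10 kHz, as in Section 3.1
tt = np.arange(1024) / fs                       # one 1024-point segment
sig = np.sin(2*np.pi*500*tt)                    # toy 500 Hz component
scalogram = cwt_cmor(sig, scales=np.arange(1, 65), fs=fs)
print(scalogram.shape)  # (64, 1024)
```

Rendering the scalogram matrix as an image gives exactly the kind of time-frequency representation used as CNN input in this paper; a production implementation would more likely use an existing library routine such as PyWavelets' `pywt.cwt` with a `cmor` wavelet.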

3. Proposed Intelligent Fault Diagnosis Method

3.1. Data Description

The experiments were performed on a hydraulic axial piston pump test platform, as shown in Figure 1. The test bench was primarily composed of a motor, a pump, an acceleration sensor, etc. The object of this test was a swash-plate axial piston pump with seven plungers. The rated speed of the pump was 1470 r/min, which means the corresponding rotary frequency was 24.5 Hz. In regard to the data acquisition equipment, a multi-function data acquisition card was provided by National Instruments (NI) Company (Austin, TX, USA); the model number of the equipment is USB-6221. The fault vibration signals were acquired from the data acquisition system. For each condition, the sampling frequency was set to 10 kHz.


Figure 1. The test system for the fault experiment (motor, pump, inlet and outlet pipelines, vibration sensor).

During the experiments, five different health conditions were simulated, comprising normal and faulty states. The obtained data were employed for the following fault diagnosis to demonstrate the classification ability of the CNN model. The index names are short codes corresponding to each fault category and are used to label the input data before it is fed to the neural network. The specific descriptions of the five conditions are expressed in Table 1.

Table 1. The health conditions and type labels of hydraulic axial piston pump.

Health Condition   Description                  Index Name   Type Label
Normal             no fault in hydraulic pump   zc           0
Faulty             swash plate wear             xp           1
Faulty             loose slipper failure        sx           2
Faulty             slipper wear                 hx           3
Faulty             central spring wear          th           4

3.2. Data Preprocessing

In common fault diagnosis methods, data preprocessing technologies are usually used to achieve feature extraction through complex steps [46]. Combining signal acquisition, feature extraction and fault classification, intelligent techniques could be considered a potent direction in developing novel fault classification methods [47]. However, the data input must be eligible for the training of deep network models; in particular, image/graph inputs are required for methods such as a 2D CNN. In addition to CWT, there are many other processing methods for transforming signals into images, including the short time Fourier transform (STFT), S-transform (ST), discrete wavelet transformation (DWT) and cyclic spectral coherence (CSCoh). STFT uses a fixed window function and is usually used to analyze piecewise stationary or quasi-stationary signals; however, frequency resolution and time resolution cannot both be accounted for at the same time [48]. DWT is a discretization of the scale and translation of the basic wavelet and generally refers to the two-scale wavelet transform; compared with CWT, DWT resolves the problem of computational cost [49]. As the inheritance and development of WT and STFT, ST eliminates the selection of the window function and remedies the deficiency of a fixed window width; moreover, the features extracted by ST are not sensitive to noise [50]. Compared to conventional cyclostationary analysis, CSCoh can effectively overcome noise interference and obtain potential fault information via analysis of the relationship between spectral frequency and cyclic frequency [51]. As shown in Figure 2, for each fault type the acquired raw time series are firstly divided into various data segments. Each segment involves 1024 sampling points. Then, on account of the input requirements of the diverse models, different preprocessing methods can be employed in the following steps.
The amount of one-dimensional (1D) data can be increased by data augmentation to expand the training datasets. As for models with two-dimensional (2D) input, the segments should be converted into 2D images or matrices through time-frequency analysis methods, including STFT, ST, CWT and CSCoh [52–55]. Furthermore, the obtained 2D images are taken as the input of the established CNN model. Deep models can include CNN, deep belief networks (DBN), recurrent neural networks (RNN), and generative adversarial networks (GAN); CNN is selected in this work. Through the training and testing of the network, the outputs present the performance of the model, including the training loss and the classification accuracy.
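The segmentation step above can be sketched as follows. The paper states only the 1024-point segment length; the use of overlapping windows as the augmentation mechanism is an assumption for illustration:

```python
import numpy as np

def segment_signal(signal, seg_len=1024, step=None):
    """Split a raw 1D vibration record into fixed-length segments, as in
    Figure 2. A `step` smaller than `seg_len` yields overlapping windows,
    one simple way to augment the 1D data (the exact augmentation scheme
    used in the paper is not specified here)."""
    step = step or seg_len                      # default: no overlap
    n_seg = (len(signal) - seg_len) // step + 1
    return np.stack([signal[i*step : i*step + seg_len] for i in range(n_seg)])

raw = np.random.randn(10_000)                   # 1 s of data at 10 kHz
plain = segment_signal(raw)                     # non-overlapping segments
dense = segment_signal(raw, step=256)           # 75% overlap -> more samples
print(plain.shape, dense.shape)
```

Each row of the returned array is one 1024-point segment, ready to be converted into a time-frequency image by CWT or one of the other transforms.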

Figure 2. Flowchart of fault data preprocessing. CWT: continuous wavelet transform; DWT: discrete wavelet transformation; CSCoh: cyclic spectral coherence; STFT: short time Fourier transform; ST: S-transform; CNN: convolutional neural network; DBN: deep belief networks; RNN: recurrent neural networks; GAN: generative adversarial networks.

3.3. Proposed Intelligent Method

In view of the excellent performance of the popular CNN model in image identification and classification, a new intelligent method based on the CNN model is proposed for fault diagnosis of the hydraulic pump. Firstly, the vibration signal is acquired as raw data. Secondly, the vibration signals are transformed into time-frequency images for the establishment of training and testing datasets. Then, a CNN model is constructed and trained with the training datasets obtained above. Finally, the testing datasets are employed to test and validate the classification performance. Hence, intelligent fault diagnosis is accomplished for the hydraulic axial piston pump.
The network is composed of five convolutional layers (Conv) with varying parameters, three sub-sampling layers and three fully connected layers (Figure 3). Maxpooling is used to reduce the dimension of the features and the overfitting of the model; the size of the pooling area is 3 × 3 for each pooling layer. In order to inhibit overfitting and gradient vanishing of the model, the operation of dropout is taken into account; namely, a dropout layer is introduced among the fully connected layers. Owing to the five different fault types of the hydraulic axial piston pump, the output of the network is set to five. During the classification step, the Softmax function is employed to exponentiate the prediction results of the model, ensuring non-negative probabilities; moreover, it guarantees that the probabilities of each prediction sum to one. In order to obtain the optimized structural parameters of the CNN, the gradient descent algorithm can be employed: the parameters of the network are updated according to the gradient information from back-propagation, the value of the cross-entropy loss function is thereby reduced and, finally, the learning of the network is accelerated. Adam is a typical and effective optimization algorithm proposed by Kingma and Ba [56]. Adam integrates Momentum with the RMSprop algorithm, adopting momentum and an adaptive learning rate to accelerate the convergence speed. Moreover, Adam presents superiority in processing non-stationary objectives and problems with noisy and/or sparse gradients.
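The spatial dimensions flowing through a stack of this shape can be traced with the standard output-size formula. The kernel sizes, strides and paddings below are hypothetical placeholders (the paper's Figure 3 gives the actual values); only the layer count, the 3 × 3 pooling regions and the 224 × 224 input match the text:

```python
def out_size(n, k, s=1, p=0):
    """Spatial size after a conv/pool layer: floor((n + 2p - k)/s) + 1."""
    return (n + 2*p - k) // s + 1

# Hypothetical stack matching the stated layout: five conv layers and
# three 3x3 max-pooling layers (3x3 stride-1 padded convs are assumed).
n = 224                                   # input image side length
for op, k, s, p in [("conv1", 3, 1, 1), ("pool1", 3, 3, 0),
                    ("conv2", 3, 1, 1), ("pool2", 3, 3, 0),
                    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1),
                    ("conv5", 3, 1, 1), ("pool3", 3, 3, 0)]:
    n = out_size(n, k, s, p)
    print(f"{op}: {n}x{n}")
```

Under these assumed parameters, the 224 × 224 input shrinks mainly at the three pooling layers, and the final feature maps are flattened into the three fully connected layers ending in the five-way softmax output.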


Figure 3. The framework of the proposed CNN model.
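The softmax stage described above, converting raw network outputs into non-negative class probabilities that sum to one, can be sketched as follows (the max-shift is a standard numerical-stability trick, not part of the paper's description):

```python
import numpy as np

def softmax(logits):
    """Exponentiate the raw outputs (non-negative) and normalize so the
    five class probabilities sum to one, as in the classification step."""
    z = np.exp(logits - np.max(logits))     # shift by max for stability
    return z / z.sum()

p = softmax(np.array([2.0, 1.0, 0.1, -1.0, 0.5]))   # five fault classes
print(round(p.sum(), 6))   # 1.0
print(p.argmax())          # 0 -> predicted type label
```

The predicted fault type is simply the index of the largest probability, which is also the quantity compared against the true label by the cross-entropy loss during training.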


4. Validation of Proposed CNN Model

4.1. Input Data Description

In the operation process of equipment, mechanical faults may lead to various signals, including impact, environmental noise and other features. It is of great difficulty to classify diverse fault types only from 1D time-domain/frequency-domain analyses. Therefore, 2D time-frequency analysis is considered to be more effective for processing nonlinear signals.

CWT is selected as the preprocessing method for this research. During the CWT operation, ComplexMorlet is chosen as the wavelet basis function. The bandwidth and center frequency are both three, and the length of the scale sequence used in the wavelet transform is 256. The time-frequency images of the five conditions are displayed in Figure 4. The images of the different fault types are similar to a certain degree, although some distinctions can be found between fault types. The frequency varies with time, as depicted in the representations under the various states. However, it is hard to distinguish the various fault types based on experience and diagnostic knowledge alone. This leaves sufficient space for the automatic feature learning of the deep CNN model established in the following, and demonstrates its capability for mining the implied characteristics from such similar representations.


Figure 4. Time-frequency image representations under 5 conditions: (a) normal condition; (b) slipper failure; (c) loose slipper failure; (d) swash plate wear; (e) central spring wear.

In the obtained samples, there was a total of 6000 time-frequency images, and each type of fault involved 1200 images. Before inputting into the CNN model, data transform strategies were used to adjust the raw image size: every image was transformed into the same size of 224 × 224. A random horizontal flip was carried out on the samples in the training dataset. The samples were randomly divided into a training dataset and a testing dataset in the ratio of 7:3; namely, there were 840 training samples and 360 testing samples under each fault category, respectively. The detailed data are described in Table 2. Furthermore, in order to validate the diagnosis performance of the model and effectively avoid overfitting, only the training samples were used for updating the weights and biases of the model in the training process. The network model has never been exposed to the test samples.
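The per-class 7:3 split described above can be sketched with the standard library; the fixed seed and the shuffle-then-cut scheme are illustrative assumptions about how the random division is performed:

```python
import random

def split_per_class(n_per_class=1200, train_ratio=0.7, classes=5, seed=0):
    """Randomly split each fault class's image indices 7:3 into train/test,
    giving 840 training and 360 testing samples per class."""
    rng = random.Random(seed)
    train, test = {}, {}
    for c in range(classes):
        idx = list(range(n_per_class))
        rng.shuffle(idx)
        cut = round(n_per_class * train_ratio)   # 840 of 1200
        train[c], test[c] = idx[:cut], idx[cut:]
    return train, test

tr, te = split_per_class()
print(len(tr[0]), len(te[0]))   # 840 360
```

Because the two index sets are disjoint by construction, the test images can never leak into weight updates, which is the overfitting safeguard the paragraph describes.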

Table 2. The number and label configuration of datasets for hydraulic axial piston pump under 5 conditions.

Fault Type    Fault Description               Train Dataset    Test Dataset    Type Labels
hx            slipper wear                    840              360             0
sx            loose slipper failure           840              360             1
th            central spring wear             840              360             2
xp            swash plate wear                840              360             3
zc            no fault in hydraulic pump      840              360             4
total         -                               4200             1800            -

4.2. Parameter Selection for the Proposed Model

In consideration of their great influence on the classification performance, some critical parameters were analyzed and discussed, including the epoch, batch size, and the number and size of the convolutional kernels. A suitable network model is established via the optimization of these parameters.

Too few epochs will result in underfitting of the model. If a large number of epochs is selected, the classification accuracy may be enhanced, but it will bring about a higher time cost. Therefore, it is vital to choose an appropriate number of epochs for the construction of the model. In order to study the convergence process of the network model, we set the number of epochs to 100 and repeated the trials 10 times. The average results were recorded as the final diagnostic accuracy. As depicted in Figures 5 and 6, the initial training loss was more than one and gradually decreased as the epochs progressed. At the beginning, the classification accuracy was lower than 80%; it then increased gradually with the epochs, contrary to the training loss. When the epoch count exceeded 15, the testing accuracy was more than 94%; beyond 30 epochs, it reached over 96%. With further increases in training epochs, the loss tended to a small value and remained stable. Meanwhile, the classification accuracy fluctuated only slightly, indicating that the CNN model had been trained to convergence. Hence, the training epoch was chosen as 30 in the following research studies.
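Reading the chosen epoch count off curves like Figures 5 and 6 can be expressed as a simple heuristic: pick the first epoch after which the testing accuracy never drops below a threshold. This is an illustrative rule written for this discussion, not a procedure stated in the paper.

```python
def first_stable_epoch(test_acc, threshold=0.96):
    """Return the first (1-based) epoch after which the testing accuracy
    stays at or above `threshold`; fall back to the last epoch."""
    for e in range(1, len(test_acc) + 1):
        if all(a >= threshold for a in test_acc[e - 1:]):
            return e
    return len(test_acc)
```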

Figure 5. The tendency of the relationship between the training loss and epoch.

Figure 6. The tendency of the relationship between the classification accuracy and epoch.

A large batch size may lead to a faster convergence speed of the network model. The training time can then be reduced, and the training curve of the model will be smoother, which can improve the stability of the model. However, with the increase in batch size, the number of weight and bias adjustments will be reduced, and the performance of the model will degrade, resulting in a reduction in its generalization ability. A smaller batch size is favorable for improving the classification effect, but it will bring about a higher computation cost. If the batch size is smaller than the number of categories in the dataset, the model will not converge. Therefore, a proper batch size is necessary when selecting the parameters of the model.
In this research, the batch size was selected in the light of multiple factors, involving the sensitivity of the graphics processing unit (GPU) to values of 2^n, i.e., multiples of eight, and the requirement that the total number of training samples be divisible by the batch size. In consideration of computational time and classification accuracy, the batch size was chosen as 56.

To inhibit overfitting of the model, the operation of dropout was conducted among the FC layers. The models with and without dropout layers were investigated to explore the effect on the classification accuracy. As can be seen from Figure 7, the model without dropout layers presents remarkable fluctuation over the 10 trials. Moreover, its accuracy is less than that of the proposed CNN model. It can be demonstrated that the proposed CNN model is stable, and the design of dropout layers enhances the performance of the model.
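The dropout operation applied between the FC layers can be sketched as inverted dropout. This is the generic textbook formulation, shown for concreteness; it is not the authors' code, and the dropout probability below is only a common default.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with
    probability p and rescale the survivors by 1/(1-p); at inference,
    pass the activations through unchanged."""
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng(0) if rng is None else rng
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```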


(Figure 7 data; testing accuracy per trial 1-10. Proposed CNN: 0.983, 0.983, 0.984, 0.982, 0.982, 0.982, 0.984, 0.983, 0.981, 0.981. No dropout: 0.975, 0.976, 0.971, 0.977, 0.974, 0.967, 0.977, 0.965, 0.978, 0.979.)

Figure 7. The testing accuracy with and without dropout layers for 10 trials.

In order to probe the influence of the pooling layer on the classification performance of the CNN model, average pooling was employed for comparison. As shown in Figure 8, the classification accuracy of the model with average pooling is lower than 98%, which is inferior to that with the maxpooling operation. Therefore, the maxpooling operation was selected to achieve the reduction in the dimension of the data.

(Figure 8 data; testing accuracy per trial 1-10. Proposed CNN: 0.983, 0.983, 0.984, 0.982, 0.982, 0.982, 0.984, 0.983, 0.981, 0.981. Average pooling: 0.9767, 0.9744, 0.9789, 0.9717, 0.9789, 0.9711, 0.9772, 0.9744, 0.9739, 0.9739.)

Figure 8. The testing accuracy with average pooling and maxpooling.


4.3. Performance Validation of the Proposed Model

To validate the reliability and stability of the proposed model, 10 repeated trials were conducted by adopting the optimized training and structural parameters. The maximum pooling method was employed for the dimension reduction of the learned features. The Adam optimizer was used for the optimization of the model, and the initial learning rate was set to 0.0002.

As shown in Figure 9, the difference between testing and training accuracy is not very obvious, and the average testing accuracy over the 10 trials exceeded 98%. Therefore, the effectiveness of the chosen parameters can be demonstrated for the proposed model.
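A single Adam update with the initial learning rate of 0.0002 looks as follows. This is the textbook Adam step, shown for concreteness; it is not the authors' implementation, and the moment-decay defaults are the usual ones rather than values stated in the paper.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=2e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and its
    square, bias correction, then the parameter step."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```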


Figure 9. The accuracy curve of the training and testing samples in the proposed method.

As one of the visualization tools in artificial intelligence, a confusion matrix is employed for the precision evaluation of classification, especially in the process of supervised learning. In order to analyze and discuss the misclassification of the model, a confusion matrix was used to simply and intuitively present the statistical classification and misclassification results of each fault type. The proposed model showed a favorable diagnosis accuracy for the non-linear and non-stationary signals. The accuracy reached 100% in the conditions of xp and zc (Figure 10). The misclassification was primarily concentrated in the conditions of th and sx: 24 samples in the condition of th were misclassified into sx, and one sample was misclassified into hx. The potential reason could be that the hidden features in the images of sx and hx are similar for the CNN, and it is hard to distinguish some of the learned information.

Figure 10. The confusion matrix of the testing samples in the fifth trial.
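A matrix such as the one in Figure 10 is built by tallying (true, predicted) label pairs; a minimal version, with the label order hx, sx, th, xp, zc from Table 2, could look like this (generic formulation, not the paper's code):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=5):
    """cm[i, j] counts samples of true class i predicted as class j; the
    diagonal divided by the row sums gives the per-class accuracy."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```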



In allusion to the complicated and unintelligible internal operations of the CNN, it is of great significance to lift the mysterious mask and reveal the underlying automatic learning process. The visualization of the feature learning results was conducted to demonstrate the performance of the model. The feature extractions of the major layers were selected to observe the effectiveness of the model, involving five convolutional layers (Conv 1, Conv 2, Conv 3, Conv 4, Conv 5) and three fully connected layers (FC 1, FC 2, FC 3). Meanwhile, the results on the raw input data are taken for comparison. As a powerful nonlinear dimension reduction algorithm, t-SNE is employed to reduce the high-dimensional feature representations to two dimensions [57]. The visualization results represent the first two dimensions of the features obtained from t-SNE. Each point denotes a testing sample, and the horizontal and vertical axes display the dimensions of t-SNE; the values on each axis express the results after dimension reduction via t-SNE. It can be found that the useful features of the testing datasets are effectively extracted and represented. From the early Conv layers to the final FC layers, the features of the different fault categories present an increasingly clear separation, as can be seen from Figure 11.

In consideration of the raw input, the distributions of the five fault types are almost uniform, which indicates that it is hard to identify the specific types at this stage. After the convolutional operations, the features of some fault types start to cluster together. As a whole, overlap phenomena of fault features in the earlier layers are apparent; in particular, there is an obvious overlap of the various features in the first two layers. As shown in Conv 1, the features of most fault types are scattered points; only the features of the zc condition present clear clustering, and serious overlaps are observed among the features of the other four fault types. The features of two further conditions, xp and hx, begin to cluster in Conv 2; nevertheless, the representations of sx and th are mixed with each other, and it is hard to distinguish either of the two types. In view of the FC layers, some crossover areas can still be found between the sx and th conditions, which indicates that misclassification between these two fault types may occur with this method. However, the feature representations of the different fault types become very discriminative, and the features of the same fault type are clustered into the same region. It can be inferred that the low-hierarchical features are converted into high-hierarchical ones through the successive network layers, and the fault classification performance can thereby be enhanced.

(Figure 11 panels: Raw input, Conv 1, Conv 2, Conv 3, Conv 4, Conv 5, FC 1, FC 2, FC 3; axes are the two t-SNE dimensions.)

Figure 11. Visualization of different layers via t-SNE: feature representations for the raw input, five convolutional layers and the last fully connected layer, respectively.
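The 2D embeddings plotted in Figure 11 can be produced along these lines with scikit-learn's t-SNE. This is a sketch under the assumption that scikit-learn is available; the perplexity and seed are illustrative defaults, not values reported in the paper.

```python
import numpy as np
from sklearn.manifold import TSNE  # assumes scikit-learn is installed

def embed_features(features, seed=0):
    """Project high-dimensional layer activations (n_samples, n_features)
    to 2D points for scatter-plotting, one point per testing sample."""
    return TSNE(n_components=2, perplexity=30, random_state=seed).fit_transform(features)

# e.g. FC 3 activations of the 1800 testing samples -> an (1800, 2) array,
# coloured by fault label when plotted
```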

In consideration4.4. Contrastive Analysis of raw input, the distributions of the five fault types are almost uniform, which indicates thatIn order it is hardto further to identify explore the the diagnostic specific pe typesrformance at this of stage.the proposed After CNN, convolutional different CNN operation, the featuresmodels of were some employed fault typesfor comparisons, start to cluster including together. Traditional As LeNet a whole, 5 (T-LeNet overlay 5), Improved phenomena LeNet 5 of fault features(I-LeNet in the previous 5), the CNN layers containing are apparent. three convolut Especially,ional therelayers is (CNN-3), an obvious the overlapCNN containing of various four features in the firstconvolutional two layers. layers As shown (CNN-4), in and Conv Traditional 1, the features AlexNet of (T-AlexNet). most fault types are scattered points, only the features in theFrom condition Figure 12, of it zccan present be seen clearthat the clustering; convergence moreover, effect of the serious proposed overlays CNN areis better observed than in the that of other models. During the early stages of feature learning, the CNN models based on LeNet 5 featurespresent of the a four lower fault accuracy. types. When The it reaches features over of 10 the epochs, other the two accuracy conditions of the proposed begin to CNN cluster is more in Conv 2, xp and hx,than respectively; 96%, but it is lower nevertheless, than 90% thefor the representations LeNet 5 based diagnostic of sx and method. th aremixed with each other and it is hard to distinguish either of the two types. In view of the FC layers, some crossover areas can be found in the condition of both sx and th, which indicates that misclassification between the two types of faults may occur with this method. However, the feature representations of different fault types become very discriminative, and the features of the same fault types are clustered into the same region. 
It can be concluded that the low-hierarchical features are converted into high-hierarchical ones through the successive network layers, and the fault classification performance is thereby enhanced.
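The layer-wise visualization described above can be reproduced with scikit-learn's t-SNE implementation. The following is a minimal sketch, not the authors' code: the feature matrix and labels are invented for illustration, standing in for flattened activations extracted from one network layer.

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_layer_features(features, labels, perplexity=30, seed=0):
    """Project high-dimensional layer activations to 2-D with t-SNE.

    features: (n_samples, n_features) activations flattened from one layer
    labels:   (n_samples,) integer fault-type labels (e.g. hx, sx, th, xp, zc)
    """
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    emb = tsne.fit_transform(features)
    return emb  # (n_samples, 2); scatter-plot, coloured by label

# Illustrative use: 100 samples, 5 fault classes, 256-D features from one layer
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 256)).astype(np.float32)
labels = np.repeat(np.arange(5), 20)
emb = embed_layer_features(feats, labels, perplexity=20)
print(emb.shape)
```

Running this once per layer (raw input, each convolutional layer, the fully connected layer) and plotting the embeddings side by side gives figures of the kind shown in Figure 11.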

4.4. Contrastive Analysis

In order to further explore the diagnostic performance of the proposed CNN, different CNN models were employed for comparison, including Traditional LeNet 5 (T-LeNet 5), Improved LeNet 5 (I-LeNet 5), a CNN containing three convolutional layers (CNN-3), a CNN containing four convolutional layers (CNN-4), and Traditional AlexNet (T-AlexNet).

From Figure 12, it can be seen that the convergence of the proposed CNN is better than that of the other models. During the early stages of feature learning, the CNN models based on LeNet 5 present lower accuracy. After 10 epochs, the accuracy of the proposed CNN exceeds 96%, whereas it remains below 90% for the LeNet 5 based diagnostic methods.

Figure 12. The curve of testing accuracy for different CNN models.

As can be seen from Table 3, the average accuracy of the proposed CNN reached 98.44% and its standard deviation (STD) was only 0.001171. The classification accuracy of Traditional LeNet 5 was only 95.22%, obviously inferior to that of the proposed model and of the other models. The proposed model outperformed the other models, showing both a higher average accuracy and a lower STD. This implies that the proposed CNN displays good classification performance and stability for hydraulic pump faults.
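The average accuracy and STD reported in Table 3 summarize repeated test runs of each model. The aggregation can be sketched as follows; the five trial accuracies below are invented for illustration (chosen so the summary matches the proposed CNN's reported values), not the authors' raw data.

```python
import numpy as np

def summarize_trials(accuracies):
    """Return (mean accuracy in %, sample STD on the fraction scale)."""
    acc = np.asarray(accuracies, dtype=float)
    return acc.mean() * 100.0, acc.std(ddof=1)

# Hypothetical accuracies from five repeated trials of one model
mean_pct, std = summarize_trials([0.9830, 0.9845, 0.9850, 0.9838, 0.9857])
print(f"average accuracy: {mean_pct:.2f}%, STD: {std:.6f}")
```

Note that the STD values in Table 3 are on the fraction scale (around 0.001) while the averages are percentages; whether the authors used the sample (ddof=1) or population STD is not stated, so `ddof=1` here is an assumption.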

Table 3. Diagnostic results of different CNN models (average accuracy and standard deviation over repeated trials).

CNN Models      Average Accuracy (%)    STD
T-LeNet 5       95.22                   0.007472
I-LeNet 5       96.36                   0.001172
CNN-3           96.70                   0.004603
CNN-4           96.20                   0.008946
T-AlexNet       95.87                   0.003608
Proposed CNN    98.44                   0.001171

For the purpose of observing the classification effectiveness of each fault type, the same models were used for contrastive analysis. From Figure 13, it can be seen that no obvious difference was obtained in the classification effect on three of the conditions, namely zc, xp and hx. However, for the conditions of sx and th, the proposed CNN model was slightly superior to the other CNN models. The distinction of these two types of faults will be considered as the emphasis of future research.
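The per-fault-type accuracies compared in Figure 13 are the fraction of test samples of each true class that the model classifies correctly. A minimal sketch, with made-up predictions (the toy confusion between sx and th mirrors the behaviour discussed above but is not the authors' data):

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, classes):
    """Fraction of correctly classified samples within each true class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {c: float((y_pred[y_true == c] == c).mean()) for c in classes}

# Toy example with the paper's five condition labels
classes = ["hx", "sx", "th", "xp", "zc"]
y_true = ["hx", "hx", "sx", "sx", "th", "th", "xp", "xp", "zc", "zc"]
y_pred = ["hx", "hx", "sx", "th", "th", "sx", "xp", "xp", "zc", "zc"]
acc = per_class_accuracy(y_true, y_pred, classes)
print(acc)  # -> {'hx': 1.0, 'sx': 0.5, 'th': 0.5, 'xp': 1.0, 'zc': 1.0}
```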


The testing accuracies of each fault type for the different CNN models shown in Figure 13 are:

Fault Type    Proposed CNN    T-LeNet 5    I-LeNet 5    CNN-3     CNN-4     T-AlexNet
hx            0.9944          0.9978       0.9972       0.9917    0.9944    0.9806
sx            0.9861          0.9111       0.9250       0.9111    0.9167    0.9889
th            0.9306          0.9028       0.9139       0.9333    0.9194    0.9056
xp            1.0000          0.9944       0.9972       0.9861    0.9917    0.9972
zc            1.0000          0.9917       0.9944       0.9889    0.9806    1.0000

Figure 13. The curve of testing accuracy for different CNN models.

5. Conclusions

In this paper, an integrated deep learning method was constructed on the basis of CNN for fault diagnosis in a hydraulic axial piston pump. The diagnostic performance was validated by experiments on the hydraulic pump testing platform. In consideration of the deficiencies in directly using raw vibration signals for feature extraction, CWT was employed to convert the time series signals into time-frequency images. The converted images could provide more useful feature information to the deep model. In light of its remarkable superiority in image classification, a CNN was established for feature extraction and fault classification. Adam was used for parameter optimization of the model. Moreover, the dropout strategy was designed in the fully connected layers.

The effectiveness and feasibility of the proposed method were demonstrated by the fault experiment. The faults of the hydraulic pump test rig include hx, sx, xp, th and zc. The highest accuracy of 100% was achieved in the conditions of zc and xp. The average accuracy reached up to 98.44%, which is superior to that of the other CNN models. The stability of the model was demonstrated by the results of the repeated trials. Furthermore, the effectiveness of the model was demonstrated by t-SNE, with the features after dimension reduction representing the learning consequence of the CNN. The proposed model presents desirable visualized classification performance for the different fault types in a hydraulic axial piston pump. Therefore, the proposed model can automatically learn useful fault features from visually similar time-frequency representations, effectively overcoming the existing shortcomings of conventional methods in terms of complex feature extraction and severe dependence on diagnostic knowledge and experience.

Although the model is not desirable for the fault type of th, its classification performance presents an advantage compared with other methods. In future research, different search algorithms, such as random search and grid search, will be exploited for the optimization of the model. In addition, enhancement of the input data will be taken into account: the conversion from raw signals to images will be accomplished through other data preprocessing methods for promoting the performance of the network model.
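The CWT-based conversion of vibration signals into time-frequency images, summarized in the conclusions, can be illustrated with a numpy-only sketch. The naive direct convolution and the complex-Morlet parameters below are the writer's assumptions for illustration, not the authors' implementation (a real pipeline would typically use a wavelet library).

```python
import numpy as np

def morlet_cwt(signal, scales, w0=6.0):
    """Naive complex-Morlet CWT via direct convolution (numpy only)."""
    out = np.empty((len(scales), len(signal)))
    half = (len(signal) - 1) // 2  # cap kernel length at the signal length
    for i, s in enumerate(scales):
        m = min(int(10 * s), half)
        x = np.arange(-m, m + 1) / s
        # Scaled complex Morlet wavelet sampled at the signal's rate
        psi = np.exp(1j * w0 * x) * np.exp(-0.5 * x**2) / np.sqrt(s)
        out[i] = np.abs(np.convolve(signal, psi, mode="same"))
    return out  # (n_scales, n_samples) scalogram "image" for the CNN

# Toy vibration signal: 1 kHz sine sampled at 10 kHz for 20 ms
fs = 10_000
t = np.arange(0, 0.02, 1 / fs)
scalogram = morlet_cwt(np.sin(2 * np.pi * 1000 * t), scales=np.arange(1, 33))
print(scalogram.shape)  # (32, 200)
```

Each row of the scalogram corresponds to one scale (inversely related to frequency); saving the magnitude matrix as an image yields the kind of time-frequency representation fed to the CNN.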

Author Contributions: Conceptualization, S.T.; methodology, S.T. and G.L.; investigation, S.T.; writing—original draft preparation, S.T.; writing—review and editing, Y.Z.; supervision, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding: This work is supported by National Natural Science Foundation of China under grant 51779107 and grant 51805214, National Key Research and Development Program of China (No. 2019YFB2005204), and in part by China Postdoctoral Science Foundation under grant 2019M651722, Postdoctoral Science Foundation of Zhejiang Province (No. ZJ2020090), Ningbo Natural Science Foundation (No. 202003N4034), and Open Foundation of the State Key Laboratory of Fluid Power and Mechatronic Systems (No. GZKF-201905), Natural Science Foundation of Jiangsu Province under grant BK20170548 and the Youth Talent Development Program of Jiangsu University. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Du, J.; Wang, S.P.; Zhang, H.Y. Layered clustering multi-fault diagnosis for hydraulic piston pump. Mech. Syst. Signal Process. 2013, 36, 487–504. [CrossRef] 2. Lan, Y.; Hu, J.; Huang, J.; Niu, L.; Zeng, X.; Xiong, X.; Wu, B. Fault diagnosis on slipper abrasion of axial piston pump based on Extreme Learning Machine. Measurement 2018, 124, 378–385. [CrossRef] 3. Wang, S.; Xiang, J.; Tang, H.; Liu, X.; Zhong, Y. Minimum entropy deconvolution based on simulation–determined band pass filter to detect faults in axial piston pump bearings. ISA Trans. 2019, 88, 186–198. [CrossRef][PubMed] 4. Kumar, S.; Bergada, J.M. The effect of piston grooves performance in an axial piston pumps via CFD analysis. Int. J. Mech. Sci. 2013, 66, 168–179. [CrossRef] 5. Kumar, S.; Bergada, J.M.; Watton, J. Axial piston pump grooved slipper analysis by CFD simulation of three dimensional NVS equation in cylindrical coordinates. Comput. Fluids 2009, 38, 648–663. [CrossRef] 6. Bergada, J.M.; Kumar, S.; Watton, J. Axial Piston Pumps, New Trends and Development; Nova Science Publishers: New York, NY, USA, 2012. 7. Lu, C.Q.; Wang, S.P.; Zhang, C. Fault diagnosis of hydraulic piston pumps based on a two-step EMD method and fuzzy C-means clustering. Inst. Mech. Eng. C- J. Mech. Eng. Sci. 2016, 230, 2913–2928. [CrossRef] 8. Gao, Y.; Zhang, Q.; Kong, X. Wavelet-based pressure analysis for hydraulic pump health diagnosis. Trans. ASAE 2003, 46, 969–976. 9. Hoang, D.T.; Kang, H.J. A survey on deep learning based bearing fault diagnosis. Neurocomputing 2019, 335, 327–335. [CrossRef] 10. Zhang, F.; Appiah, D.; Hong, F.; Zhang, J.; Yuan, S.; Adu-Poku, K.A.; Wei, X. Energy loss evaluation in a side channel pump under different wrapping angles using entropy production method. Int. Commun. Heat Mass. 2020, 113, 104526. [CrossRef] 11. Zhao, D.; Wang, T.; Chu, F. Deep convolutional neural network based planet bearing fault classification. Comput. Ind. 2019, 107, 59–66. [CrossRef] 12. 
Amar, M.; Gondal, I.; Wilson, C. Vibration spectrum imaging: A novel bearing fault classification approach. IEEE Trans. Ind. Electron. 2015, 62, 494–502. [CrossRef] 13. Wendy, F.F.; Moises, R.-L.; Oleg, S.; Felix, F.G.-N.; Javier, R.-C.; Daniel, H.-B.; Julio, C.R.-Q. Combined application of power spectrum centroid and support vector machines for measurement improvement in optical scanning systems. Signal Process. 2014, 9, 37–51. 14. Jacob, F.T.; Landen, D.B.; Kody, M.P. On-line classification of coal combustion quality using nonlinear SVM for improved neural network NOx emission rate prediction. Comput. Chem. Eng. 2020, 141, 106990. 15. Kouziokas, G.N. SVM kernel based on particle swarm optimized vector and Bayesian optimized SVM in atmospheric particulate matter forecasting. Appl. Soft Comput. 2020, 93, 106410. [CrossRef] 16. Tang, B.; Song, T.; Li, F.; Deng, L. Fault diagnosis for a wind turbine transmission system based on manifold learning and Shannon wavelet support vector machine. Renew. Energy 2014, 62, 1–9. [CrossRef] 17. Ren, L.; Lv, W.; Jiang, S.; Xiao, Y. Fault diagnosis using a joint model based on sparse representation and SVM. IEEE Trans. Instrum. Meas. 2016, 65, 2313–2320. [CrossRef] 18. Han, T.; Jiang, D.; Zhao, Q.; Wang, L.; Yin, K. Comparison of random forest, artificial neural networks and support vector machine for intelligent diagnosis of rotating machinery. Trans. Inst. Meas. Control 2018, 40, 2681–2693. [CrossRef] 19. Shao, H.; Jiang, H.; Zhang, H.; Liang, T. Electric locomotive bearing fault diagnosis using a novel convolutional deep belief network. IEEE Trans. Ind. Electron. 2018, 65, 2727–2736. [CrossRef]

20. Xiong, S.; Zhou, H.; He, S.; Zhang, L.; Xia, Q.; Xuan, J.; Shi, T. A novel end-to-end fault diagnosis approach for rolling bearings by integrating wavelet packet transform into convolutional neural network structures. Sensors 2020, 20, 4965. [CrossRef] 21. Wen, L.; Li, X.; Gao, L.; Zhang, Y. A new convolutional neural network-based data-driven fault diagnosis method. IEEE Trans. Ind. Electron. 2018, 65, 5990–5998. [CrossRef] 22. Pan, J.; Zi, Y.; Chen, J.; Zhou, Z.; Wang, B. LiftingNet: A novel deep learning network with layerwise feature learning from noisy mechanical data for fault classification. IEEE Trans. Ind. Electron. 2018, 65, 4973–4982. [CrossRef] 23. Liang, P.; Deng, C.; Wu, J.; Yang, Z.; Zhu, J.; Zhang, Z. Compound fault diagnosis of gearboxes via multi-label convolutional neural network and wavelet transform. Comput. Ind. 2019, 113, 103132. [CrossRef] 24. Luo, H.; Bo, L.; Peng, C.; Hou, D. Fault Diagnosis for High-Speed Train Axle-Box Bearing Using Simplified Shallow Information Fusion Convolutional Neural Network. Sensors 2020, 20, 4930. [CrossRef][PubMed] 25. Gao, T.; Sheng, W.; Zhou, M.; Fang, B.; Luo, F.; Li, J. Method for Fault Diagnosis of Temperature-Related MEMS Inertial Sensors by Combining Hilbert–Huang Transform and Deep Learning. Sensors 2020, 20, 5633. [CrossRef][PubMed] 26. Jia, F.; Lei, Y.; Lu, N.; Xing, S. Deep normalized convolutional neural network for imbalanced fault classification of machinery and its understanding via visualization. Mech. Syst. Signal Process. 2018, 110, 349–367. [CrossRef] 27. Xu, G.; Liu, M.; Jiang, Z.; Shen, W.; Huang, C. Online fault diagnosis method based on transfer convolutional neural networks. IEEE Trans. Instrum. Meas. 2020, 69, 509–520. [CrossRef] 28. Shenfield, A.; Howarth, M. A novel deep learning model for the detection and identification of rolling element-bearing faults. Sensors 2020, 20, 5112. [CrossRef] 29. Lia, F.; Tang, T.; Tang, B.; He, Q. 
Deep convolution domain-adversarial transfer learning for fault diagnosis of rolling bearings. Measurement 2021, 169, 108339. [CrossRef] 30. Wang, X.; Shen, C.; Xia, M.; Wang, D.; Zhu, J.; Zhu, Z. Multi-scale deep intra-class transfer learning for bearing fault diagnosis. Reliab. Eng. Syst. Saf. 2020, 202, 107050. [CrossRef] 31. Ye, Q.; Liu, S.; Liu, C. A deep learning model for fault diagnosis with a deep neural network and feature fusion on multi-channel sensory signals. Sensors 2020, 20, 4300. [CrossRef] 32. Xu, Y.; Li, Z.; Wang, S.; Li, W.; Sarkodie-Gyan, T.; Feng, S. A hybrid deep-learning model for fault diagnosis of rolling bearings. Measurement 2021, 169, 108502. [CrossRef] 33. Zhou, Q.; Li, Y.; Tian, Y.; Jiang, L. A novel method based on nonlinear auto-regression neural network and convolutional neural network for imbalanced fault diagnosis of rotating machinery. Measurement 2020, 161, 107880. [CrossRef] 34. Chen, Z.; Gryllias, K.; Li, W. Mechanical fault diagnosis using Convolutional Neural Networks and Extreme Learning Machine. Mech. Syst. Signal Process. 2019, 133, 106272. [CrossRef] 35. Wang, H.; Xu, J.; Yan, R.; Gao, R.X. A new intelligent bearing fault diagnosis method using SDP representation and SE-CNN. IEEE Trans. Instrum. Meas. 2020, 69, 2377–2389. [CrossRef] 36. Wang, R.; Liu, F.; Hou, F.; Jiang, W.; Hou, Q.; Yu, L. A non-contact fault diagnosis method for rolling bearings based on acoustic imaging and convolutional neural networks. IEEE Access 2020, 8, 132761–132774. [CrossRef] 37. Kim, M.; Jung, J.H.; Ko, J.U.; Kong, H.B.; Lee, J.; Youn, B.D. Direct connection-based convolutional neural network (DC-CNN) for fault diagnosis of rotor systems. IEEE Access 2020, 8, 172043–172056. [CrossRef] 38. Li, X.; Jiang, H.; Niu, M.; Wang, R. An enhanced selective ensemble deep learning method for rolling bearing fault diagnosis with beetle antennae search algorithm. Mech. Syst. Signal Process. 2020, 142, 106752. [CrossRef] 39. 
Wendy, F.F.; Oleg, S.; Felix, F.G.-N.; Moises, R.-L.; Julio, C.R.-Q.; Daniel, H.-B.; Vera, T.; Lars, L. Multivariate outlier mining and regression feedback for 3D measurement improvement in opto-mechanical system. Opt. Quantum Electron. 2016, 48, 403. 40. Shao, S.; Yan, R.; Lu, Y.; Wang, P.; Gao, R.X. DCNN-based multi-signal induction motor fault diagnosis. IEEE Trans. Instrum. Meas. 2020, 69, 2658–2669. [CrossRef] 41. Tang, S.; Yuan, S.; Zhu, Y. Convolutional Neural Network in Intelligent Fault Diagnosis toward Rotatory Machinery. IEEE Access 2020, 8, 86510–86519. [CrossRef]

42. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237. [CrossRef] 43. Chen, Z.Q.; Li, C.; Sanchez, R.V. Gearbox fault identification and classification with convolutional neural networks. Shock Vib. 2015, 10–13, 1–10. [CrossRef] 44. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016. 45. Peng, Z.K.; Chu, F.L.; Tse, P.W. Singularity analysis of the vibration signals by means of wavelet modulus maximal method. Mech. Syst. Signal Process. 2007, 21, 780–794. [CrossRef] 46. Tang, S.; Yuan, S.; Zhu, Y. Data Preprocessing Techniques in Convolutional Neural Network based on Fault Diagnosis towards Rotating Machinery. IEEE Access 2020, 8, 149487–149496. [CrossRef] 47. Tang, S.; Yuan, S.; Zhu, Y. Deep learning-based intelligent fault diagnosis methods towards rotating machinery. IEEE Access 2020, 8, 9335–9346. [CrossRef] 48. Liu, D.; Cheng, W.; Wen, W. Rolling bearing fault diagnosis via STFT and improved instantaneous frequency estimation method. Procedia Manuf. 2020, 49, 166–172. [CrossRef] 49. Hajiabotorabi, Z.; Kazemi, A.; Samavati, F.F.; Ghaini, F.M.M. Improving DWT-RNN model via B-spline wavelet multiresolution to forecast a high-frequency time series. Expert Syst. Appl. 2019, 138, 112842. [CrossRef] 50. Li, P.; Zhang, Q.S.; Zhang, G.L.; Liu, W.; Chen, F.R. Adaptive S transform for feature extraction in voltage sags. Appl. Soft Comput. 2019, 80, 438–449. [CrossRef] 51. Liu, C.; Gryllias, K. A semi-supervised Support Vector Data Description-based fault detection method for rolling element bearings based on cyclic spectral analysis. Mech. Syst. Signal Process. 2020, 140, 106682. [CrossRef] 52. Liu, H.; Li, L.; Ma, J. Rolling bearing fault diagnosis based on STFT-deep learning and sound signals. Shock Vib. 2016, 2016, 1–12. [CrossRef] 53. Zeng, X.; Liao, Y.; Li, W. 
Gearbox fault classification using S-transform and convolutional neural network. In Proceedings of the 10th International Conference on Sensing Technology (ICST), Nanjing, China, 11–13 November 2016; pp. 1–5. 54. ALTobi, M.A.S.; Bevan, G.; Wallace, P.; Harrison, D.; Ramachandran, K.P. Fault diagnosis of a centrifugal pump using MLP-GABP and SVM with CWT. Eng. Sci. Technol. Int. J. 2019, 22, 854–861. [CrossRef] 55. Chen, Z.; Mauricio, A.; Li, W.; Gryllias, K. A deep learning method for bearing fault diagnosis based on cyclic spectral coherence and convolutional neural networks. Mech. Syst. Signal Process. 2020, 140, 106683. [CrossRef] 56. Kingma, D.; Ba, J. Adam: A method for Stochastic Optimization. In Proceedings of the 6th International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–15. 57. van der Maaten, L.J.P.; Hinton, G.E. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).