

Article

A Study on Deep Neural Network-Based DC Offset Removal for Phase Estimation in Power Systems

Sun-Bin Kim 1, Vattanak Sok 1, Sang-Hee Kang 1 , Nam-Ho Lee 2 and Soon-Ryul Nam 1,* 1 Department of Electrical Engineering, Myongji University, Yongin 17058, Korea; [email protected] (S.-B.K.); [email protected] (V.S.); [email protected] (S.-H.K.) 2 Korea Electric Power Research Institute, Daejeon 34056, Korea; [email protected] * Correspondence: [email protected]; Tel.: +82-31-330-6361

 Received: 25 March 2019; Accepted: 26 April 2019; Published: 28 April 2019 

Abstract: The purpose of this paper is to remove the exponentially decaying DC offset in fault current waveforms using a deep neural network (DNN), even in the presence of harmonics and noise distortion. The DNN is implemented using the TensorFlow library based on Python. Autoencoders are utilized to determine the number of neurons in each hidden layer. Then, the number of hidden layers is experimentally decided by comparing the performance of DNNs with different numbers of hidden layers. Once the optimal DNN size has been determined, intensive training is performed using both the supervised and unsupervised training methodologies. Through various case studies, it was verified that the DNN is immune to harmonics, noise distortion, and variation of the time constant of the DC offset. In addition, it was found that the DNN can be applied to power systems with different voltage levels.

Keywords: ; exponentially decaying DC offset; deep neural networks (DNNs); optimal size; supervised training; Tensorflow; unsupervised training

1. Introduction

While the consumption of electrical energy is increasing day by day, other factors such as power quality and reliability also need to be improved. Protective devices have improved dramatically over time to protect power systems and to provide a continuous power supply to users. However, a variation of current magnitude known as DC offset sometimes confuses the operation of protective devices, leading to mal-operation or unnecessary power outages caused by inaccurate current phasor estimation, which is fundamental to the operation of modern protective devices. In order to achieve more accurate estimation, this paper proposes a method for constructing, training, and testing a deep neural network (DNN) for removing the exponentially decaying DC offset in the current waveform when a power system fault occurs.

Background

In recent decades, various algorithms and methods have been proposed to deal with the DC offset problem [1–17]. One approach to removing the DC offset from fault current waveforms is to modify their discrete Fourier transform (DFT) in the phasor domain [2–11]. Most phasor-domain algorithms remove the DC offset by analyzing the relationship between the real and imaginary parts of the phasor. By taking advantage of the superior performance of the DFT, phasor-domain algorithms can effectively eliminate the DC offset and provide a smooth transient response. These algorithms are also robust against noise, but the disadvantage is that the phasor estimation needs more than one cycle. Another approach is to remove the DC offset directly in the time domain [12–17]. Most time-domain algorithms remove the DC offset significantly faster than phasor-domain algorithms. Phasor estimation can also be achieved in the time domain, either simultaneously with or directly after removing the DC offset [12–14]. These methods provide a fast response, but are prone to errors when the fault current waveform contains high-frequency components. To overcome this drawback, some algorithms remove the DC offset in the time domain, then apply the DFT for phasor estimation [15–17]. Similar to phasor-domain algorithms, these new algorithms effectively eliminate the DC offset. However, the time delay imposed by performing the DC offset removal and the DFT means that more than one cycle is needed to estimate the phasor.

In order to eliminate this time delay while removing the DC offset in the time domain, a DNN-based method is proposed in this paper. The DNN output is the current waveform after removing the DC offset, without any time delay. The offset-free output is then applied as an input to a DFT-based phasor estimation algorithm. In addition to the nonlinearity of the DC offset, the phasor estimation accuracy is usually affected by random noise, which is frequently caused by the analog-to-digital conversion (ADC) process [18]. As the DNN is one of the most effective solutions for nonlinear applications [19–23], it was chosen to deal with the nonlinearity of the DC offset. In this study, in order to improve the training speed through parallel computation [24], an NVIDIA GTX 1060 graphics processing unit (GPU) was used for training the DNN. The details of the proposed method, experimental procedures, and results are discussed in the following sections. Section 2 explains the preparation of the training data, while the structure and training of the DNN are demonstrated in Section 3. The results and conclusions are provided in Sections 4 and 5, respectively.

2. Data Acquisition

There are four stages of training data preparation: data generation, moving window formation, normalization, and random shuffling. As the training datasets have a considerable influence on the performance of the DNN, there should be a sufficient amount of data of the required quality, such that the training process can find a good local optimum that satisfies every training dataset given to the DNN. In this study, the theoretical equations for each component (DC offset, harmonics, and noise) were modeled in Python code, and training data were acquired directly through iterative simulations.

2.1. Data Generation

We modeled a simple single-phase RL circuit (60 Hz) to simulate power system faults with DC offset, harmonics, and noise based on theoretical equations. The model was designed so that the time constant, fault inception angle, number of harmonics, and amount of noise can be set manually. The modeling result was then compared with that of a PSCAD/EMTDC simulation, and the comparison showed that the single-phase RL (60 Hz) model is adequate for simulation under the theoretical conditions.

2.1.1. Exponentially Decaying DC Offset

It is well known that a DC offset occurs in the current waveform during faults in a power system. The fault current with DC offset after the fault inception time t0 is given in (1), and its illustration is shown in Figures 1 and 2.

i(t) = \frac{E_m}{Z} \sin(\omega (t - t_0) + \alpha - \phi) + \left[ i(t_0^-) - \frac{E_m}{Z} \sin(\alpha - \phi) \right] e^{-(t - t_0)/\tau},  (1)

where φ is the angle of the RL circuit, which is near to 90°, and α is the inception angle. A sudden increase in current causes a discontinuous section in the waveform. However, according to Faraday's Law of Induction, the current flowing in the line cannot change instantaneously in the presence of line inductance. This is the main cause of the DC offset; the initial value of the DC offset depends on the fault inception angle, while its decay rate depends on the time constant of the power system. In particular, Figure 2b demonstrates the fault current when the inception angle is 90° and its waveform is similar to a fault current waveform without DC offset. Assuming that there is no current just before the fault inception time (i(t_0^-) = 0), the DC offset given in (1) is almost zero because sin(α − φ) is near to zero.


Figure 1. Current increasing during fault occurs at t0 (fault inception angle of 0°).

Figure 2. Exponentially decaying DC offset at different fault inception angles: (a) fault inception angle of 180°; (b) fault inception angle of 90°.

2.1.2. Harmonics

Harmonics are defined in the work of [25] as sinusoidal components of a periodic wave or quantity having a frequency that is an integral multiple of the fundamental frequency, and can be expressed as follows:

i_n(t) = A_n \sin(n \omega t - \theta),  (2)

where n is the harmonic order and A_n is its amplitude. The second, third, fourth, and fifth harmonic components were considered in this paper to maintain an acceptable training speed.

2.1.3. Additive Noise

There are many types of noise interference; however, the typical noise type in power system measurements is quantization noise [26] in AD conversion. As noise components have no specified frequency, the noise was approximated as additive white Gaussian noise (AWGN), and the amount of noise was adjusted by setting the signal-to-noise ratio (SNR), which can be expressed as follows:

SNR_{dB} = 10 \cdot \log(A_{signal} / A_{noise}).  (3)

The noise component will disrupt the normal operation of digital filtering algorithms, as discussed in earlier sections.

2.1.4. Simulation

On the basis of the theories discussed above, we modeled a simple RL circuit in Python code using NumPy for the simulation of current waveforms. Then, we iterated over the parameters shown in Table 1 to obtain a range of situations that can occur in a power system. As shown in Table 1, the total number of simulations performed is 6480. For simple rehearsal training, the parameters are selected less strictly to maintain an acceptable training speed.

Table 1. Parameters for generating the training datasets. SNR—signal-to-noise ratio.

Parameter                     Values
Time constant [ms]            10  20  30  40  50  60  70  80  90  100
Fault inception angle [°]     0  90  180  −90
2nd Harmonics ratio [%]       0  10  20
3rd Harmonics ratio [%]       0  7  14
4th Harmonics ratio [%]       0  5  10
5th Harmonics ratio [%]       0  3  6
SNR [dB]                      25  40
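For reference, a minimal NumPy sketch of the waveform generator described in Section 2.1.4 is given below. It follows Equations (1)–(3) and the parameter ranges of Table 1, but the fault inception time, the Em/Z magnitude, the RL-circuit angle, the harmonic phase, and the use of the noise amplitude as the standard deviation of the Gaussian noise are assumptions made here for illustration, not values reported in the paper.

```python
import numpy as np

def fault_current(tau_ms, alpha_deg, harm=(0.0, 0.0, 0.0, 0.0), snr_db=25.0,
                  f=60.0, fs=3840, duration=0.4, em_over_z=1.0, phi_deg=85.0, t0=0.1):
    """Fault current with exponentially decaying DC offset (Equation (1)),
    2nd-5th harmonics (Equation (2)), and additive Gaussian noise (Equation (3))."""
    t = np.arange(0.0, duration, 1.0 / fs)
    w = 2.0 * np.pi * f
    alpha, phi = np.radians(alpha_deg), np.radians(phi_deg)
    tau = tau_ms / 1000.0

    # Equation (1), with the pre-fault current i(t0-) assumed to be zero
    i = em_over_z * np.sin(w * (t - t0) + alpha - phi) \
        - em_over_z * np.sin(alpha - phi) * np.exp(-(t - t0) / tau)

    # Equation (2): harmonic components, ratios given as fractions of the fundamental
    for n, ratio in enumerate(harm, start=2):
        i += ratio * em_over_z * np.sin(n * w * (t - t0))

    i[t < t0] = 0.0                                    # no load current before the fault

    # Equation (3): additive Gaussian noise scaled to the requested SNR
    a_noise = np.max(np.abs(i)) / (10.0 ** (snr_db / 10.0))
    return t, i + np.random.normal(0.0, a_noise, size=i.shape)
```

A single cell of the Table 1 grid would then correspond to a call such as `fault_current(tau_ms=50, alpha_deg=0, harm=(0.10, 0.07, 0.05, 0.03), snr_db=25)`.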

The validation datasets were generated using the same methodology as the training datasets; however, the validation datasets do not include any of the training datasets. Their generation was conducted by iterating over the parameters given in Table 2.

Table 2. Parameters for generating the validation datasets.

Parameter                     Values
Time constant [ms]            25  32  47
Fault inception angle [°]     45  92  135
2nd Harmonics ratio [%]       15  18
3rd Harmonics ratio [%]       7  9
4th Harmonics ratio [%]       2  4
5th Harmonics ratio [%]       0.5  1
SNR [dB]                      25

2.2. Pre-Processing

Pre-processing includes moving window formation, normalization, and random shuffling. This process was carried out to generate training data in a form suitable for implementing a DNN. In this paper, the DNN was designed to reconstruct one cycle of the fundamental current waveform as its output when it received one cycle of the current waveform including DC offset, harmonics, and noise. As 64 samples/cycle was chosen as the sampling rate, the size of the input and output layers of the DNN was also determined to be 64. After performing 6480 simulations, we used the moving window technique to prepare the training datasets. Because each simulation lasted 0.4 s with a sampling frequency of 3840 Hz (64 samples/cycle), the total number of training datasets was 9,545,040.

Next, we performed data normalization such that every training sample lay between −1.0 and +1.0. This was done so that the final DNN could be generally applied to different power systems under different conditions (voltage level, source impedance, etc.) [27]. For normalization, every value in the input data was divided by the maximum absolute value, and the same procedure was applied to the labels. Finally, we randomly shuffled the entire datasets so that training would not focus on specific data. If random shuffling is not performed, the divided mini-batches in the stochastic gradient descent (SGD) algorithm would have high variance, leading to the cost minimizing process becoming unstable [28].
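A minimal sketch of this pre-processing pipeline is shown below, assuming the distorted waveform and its clean fundamental (the label) are available as NumPy arrays. Whether the label is scaled by its own maximum or by the input window's maximum is not fully specified in the text, so the shared scale factor used here is an assumption.

```python
import numpy as np

def make_training_windows(distorted, clean, window=64):
    """Moving-window formation, normalization to [-1, +1], and random shuffling."""
    n = len(distorted) - window + 1
    inputs = np.stack([distorted[k:k + window] for k in range(n)])
    labels = np.stack([clean[k:k + window] for k in range(n)])

    # Normalize each window by its maximum absolute value
    # (the small epsilon guards against all-zero pre-fault windows).
    scale = np.maximum(np.max(np.abs(inputs), axis=1, keepdims=True), 1e-12)
    inputs, labels = inputs / scale, labels / scale

    order = np.random.permutation(n)        # random shuffling for stable SGD mini-batches
    return inputs[order], labels[order]
```

In practice the windows from all 6480 simulations would be concatenated before the final shuffle.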

3. Design of a DNN and Its Training

The size of a DNN is decided by the number of hidden layers and the number of neurons in each layer. Generally, DNNs tend to have better performance when their size is larger [29]; however, too many layers and neurons may induce the DNN to overfit on the training datasets, leading to it being inappropriate for applications in different environments. Thus, the size of a DNN should be determined carefully to ensure its generality. In this study, we used autoencoders for determining the number of neurons in each hidden layer. Then, the number of hidden layers was experimentally decided by comparing the performance of DNNs with different numbers of hidden layers.

In addition to contributing to the determination of the number of neurons in each layer, the autoencoder also plays a significant role in pre-training [30]. By using autoencoders for pre-training, we can start training from well-initialized weights, so that it is more likely to reach better local optima faster than the training process without pre-training. In other words, pre-training contributes to an improvement in training efficiency.

3.1. Autoencoder

The autoencoder can be described in brief as a self-copying neural network used for unsupervised feature extraction. The autoencoder consists of two parts: the encoding layer and the decoding layer. It is possible to restore the original input using only the features extracted by a well-trained autoencoder. This implies that the extracted features are distinct characteristics representing the original input. Also, we can see that these features may reflect nonlinear characteristics of the input, as the activation function of the encoding layer is selected to be a rectified linear unit (ReLU) function. After one autoencoder is trained sufficiently, we can train the second autoencoder using the features extracted by the first autoencoder as an input. By repeating this process, we can achieve the gradational and multi-dimensional features of the original input. After training several autoencoders in this way, the encoding layers of every trained autoencoder are stacked, known as stacked autoencoders.

3.2. Training Scenario of a DNN

Figure 3 shows the overall process of the DNN training for this simulation. For every training step, the same cost function, optimizer, and training algorithm are utilized.

Figure 3. Deep neural network (DNN) training process. AE—autoencoder.

Root mean square deviation (RMSD) is used as the cost function, as given in (4):

Cost = \sqrt{ \sum_{i=1}^{64} (Output_i - Label_i)^2 / n_{samples} }.  (4)
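As a concrete illustration, the cost in (4) can be written as a custom loss in the Python/TensorFlow environment used in this study; the Keras-style API and the learning rate shown below are assumptions for illustration, not details reported by the authors.

```python
import tensorflow as tf

def rmsd_cost(labels, outputs):
    # Equation (4): root mean square deviation over the 64-sample output window
    return tf.sqrt(tf.reduce_mean(tf.square(outputs - labels)))

# Example usage with the Adam optimizer discussed below (learning rate is a placeholder):
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss=rmsd_cost)
```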

Among several optimizers offered in the TensorFlow library [31], the Adam optimizer was selected. This optimizer is a mixture of the RMSProp optimizer and the Momentum optimizer, which are based on a general gradient descent (GD) algorithm. In the training loop, the RMSProp optimizer takes a different number of update values for each weight, which makes the training process more adaptive. This helps accelerate the training speed and refine the training process. The Momentum optimizer adds inertia to the weight update, to accelerate the convergence speed and allow the possibility of escape from bad local minima. The Adam optimizer is considered one of the most efficient tools for achieving our purpose, as it combines the best features of both optimizers. The Adam optimizer is very effective, straightforward, requires little memory, is well suited to problems that are large in terms of data or parameters, and is appropriate for problems with noisy or sparse gradients [32].

3.2.1. Pre-Training Hidden Layers

Selecting a method to train hidden layers is significant for a DNN. There are many techniques available, such as an autoencoder, principal component analysis (PCA), missing values ratio, low variance filter, and others. However, the goal of this paper is to increase the accuracy of phase estimation, so the autoencoder is selected, as it can replicate the structure of the original data and its performance is very accurate [33]. The key point regarding this process is the necessity of training an autoencoder sufficiently before moving on to the next autoencoder. If an autoencoder is not trained sufficiently, a certain amount of error would exist that would be delivered to the next autoencoder, causing cascading errors. In short, the pre-training is useless if the autoencoders are not properly trained. On this basis, we determined the number of neurons in hidden layers.

In the layer-wise training shown in Figure 4, several autoencoders were implemented with different numbers of neurons.
After training (unsupervised training using the input as the label), the training accuracy is compared to select the optimal size of the layer. Although more neurons give better performance in a neural network, there is a limit to the number of such neurons, beyond which the reconstruction performance of the autoencoder would not be improved further. In addition, the number of neurons in each layer may be different from the input size to prevent overfitting.

Figure 4. Layer-wise training of a DNN using autoencoders to determine the optimal number of neurons in each layer (optimal layer size).
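The layer-wise procedure in Figure 4 can be sketched as follows. This is an illustration only: the Keras functional API, the epoch count, the batch size, and the file name training_windows.npy are assumptions introduced here, not values taken from the paper.

```python
import numpy as np
import tensorflow as tf

def rmsd_cost(labels, outputs):
    return tf.sqrt(tf.reduce_mean(tf.square(outputs - labels)))

def build_autoencoder(dim_in, n_hidden):
    """One autoencoder: a ReLU encoding layer and a linear decoding layer."""
    inp = tf.keras.Input(shape=(dim_in,))
    code = tf.keras.layers.Dense(n_hidden, activation="relu")(inp)   # encoding layer
    rec = tf.keras.layers.Dense(dim_in)(code)                        # decoding layer
    return tf.keras.Model(inp, rec), tf.keras.Model(inp, code)

windows = np.load("training_windows.npy")    # hypothetical array of 64-sample windows

# Try several candidate sizes for AE1 and record the final cost (compare with Figure 5 and Table 3).
final_cost = {}
for size in range(10, 120, 10):
    ae, _ = build_autoencoder(windows.shape[1], size)
    ae.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss=rmsd_cost)
    history = ae.fit(windows, windows, epochs=200, batch_size=256, verbose=0)
    final_cost[size] = history.history["loss"][-1]

# After a size is chosen, the trained encoder's features become the input of the next autoencoder.
ae1, encoder1 = build_autoencoder(windows.shape[1], 50)
ae1.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss=rmsd_cost)
ae1.fit(windows, windows, epochs=200, batch_size=256, verbose=0)
features_for_ae2 = encoder1.predict(windows)
```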

After the size of the first autoencoder is determined, the same procedure is repeated on the second autoencoder; the extracted features from the first autoencoder are used as an input. These steps are repeated several times until there is no further noticeable improvement in the reconstruction performance.

3.2.2. Pre-Training Output Layer

The output layer is known as the regression layer in this study, because we are performing regression through the DNN, not classification. Training an output layer is identical to training an autoencoder, except for the activation function. For an autoencoder, the ReLU function was used to reflect nonlinear characteristics in the extracted features. However, ReLU cannot be used as an activation function for the output layer, as ReLU only passes positive values and cannot express negative values. We aim to obtain a sinusoidal current waveform containing negative values from the output layer; thus, the output layer was set as a linear layer, meaning that it has no activation function. By setting the output layer as a linear layer, it becomes possible to express negative values, and training can proceed. As the signal without interference (DC offset, harmonics, and noise) was used as a label, the training process of the output layer is supervised.

3.2.3. Supervised Fine Tuning

DNN training includes the input layer, hidden layers, and output layer. After completing the pre-training of each layer, every layer is stacked together. To improve the performance of the DNN, it is necessary to connect the pre-trained layers naturally, and to adjust the pre-trained weights precisely while taking every layer into account. The DNN is optimized using this process, which is called supervised fine tuning.
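A minimal sketch of this stacking and fine-tuning step is given below, reusing pre-trained encoding layers such as those produced in the previous listing; the learning rate and epoch count are placeholders rather than reported values.

```python
import tensorflow as tf

def stack_and_fine_tune(pretrained_encoders, train_x, train_y, epochs=200):
    """Stack the pre-trained ReLU encoding layers, append a linear regression
    (output) layer, and fine-tune every weight with supervised labels."""
    inp = tf.keras.Input(shape=(64,))
    h = inp
    for enc in pretrained_encoders:          # Dense(ReLU) layers taken from the trained autoencoders
        h = enc(h)
    out = tf.keras.layers.Dense(64)(h)       # linear output layer (no activation)
    dnn = tf.keras.Model(inp, out)
    dnn.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                loss=lambda y, p: tf.sqrt(tf.reduce_mean(tf.square(p - y))))
    dnn.fit(train_x, train_y, epochs=epochs, batch_size=256, verbose=0)
    return dnn
```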

3.3. Determination of the DNN Size

To determine the DNN size, including the number of neurons in each layer and the number of hidden layers, we used partial datasets instead of the whole datasets, because using 9,545,040 datasets would be time-consuming and ineffective and, as the purpose of this step is to investigate the reconstruction ability of autoencoders with different sizes, there is no issue in using partial datasets. The validation datasets were used as the partial datasets to determine the DNN size. The DNN size was decided according to an experimental procedure based on applying the training process repeatedly. We implemented DNNs with different numbers of hidden layers, so that we could determine the optimal number of layers by analyzing the training results, similar to the determination of the number of neurons in each layer. After completing the determination of the DNN size, we re-initialized all weights before the pre-training process.

3.3.1. Number of Neurons in Each Layer

As described in Section 3.2.1, for the first hidden layer, we implemented autoencoders with different numbers of neurons and recorded the average cost of each epoch during training. Figure 5 shows the effect of the number of neurons on the cost reduction. In every case, regardless of the number of neurons, there was no significant change in cost after about 150 epochs. However, the point of convergence differed depending on the size. In fact, except for sizes 10 and 20, every other case converged to a similar cost value. This implies that 30 neurons may be an adequate number for AE1. To confirm this, we analyzed the training accuracy of each case by calculating the maximum cost, average cost, and its standard deviation. Figures 6 and 7 show the waveforms corresponding to the maximum RMSD for different sizes of AE1. The results of experimental training for AE1 are summarized in Table 3. From the cost reduction curve shown in Figure 5, we expected that AE1 with 30 neurons would be optimal. However, from the RMSD analysis given in Table 3, it was found that AE1 with 50 neurons would be optimal. With more than 40 neurons, the RMSD started to fall within the acceptable range.


Figure 5. Cost reduction curve of autoencoder 1 (AE1) with different sizes. RMSD—root mean square deviation.


Figure 6. The waveform with the maximum RMSD for different sizes of AE1.

Figure 7. The waveform with the maximum RMSD for different sizes of AE1.

This result implies that the performance of the autoencoder does not depend entirely on the final cost of training. Among the eight candidates that fall within the desired range of RMSD, 50 neurons were chosen as the size of AE1 in order to leave some spare capacity, given that partial datasets rather than the whole datasets were used in this sizing step.

As the size of AE1 was determined to be 50, the size of the first hidden layer of the DNN would also be 50. This is the entire procedure for deciding the hidden layer size. We performed three more experiments under the same scenario to determine the sizes of the next hidden layers, and determined the sizes of the hidden layers as follows: Hidden layer 1: 50, Hidden layer 2: 50, Hidden layer 3: 60, and Hidden layer 4: 50.


Table 3. Root mean square deviation (RMSD) analysis (autoencoder 1 (AE1)).

Number of Neurons    Maximum     Average    Standard Deviation
10                   263.3774    95.0713    54.2323
20                   125.2946    7.5627     5.3217
30                   34.1716     6.5513     0.7282
40                   17.7876     6.5220     0.6824
50                   11.7860     6.5247     0.6814
60                   14.9845     6.5217     0.6825
70                   22.3760     6.5260     0.6838
80                   17.8318     6.5221     0.6849
90                   26.2250     6.5210     0.6844
100                  13.7827     6.5215     0.6821
110                  15.7016     6.5884     0.6788

3.3.2. Number of Hidden Layers

After the four different sizes of DNNs are implemented in TensorFlow, as shown in Figure 8, each output layer is trained before performing the fine tuning.

Figure 8. Four DNNs with different numbers of hidden layers.

The training process of the output layer was performed by the layer-wise training method, and the training parameters and the training results for each output layer are shown in Figure 9. Every DNN except DNN4 showed a similar training performance regarding the cost reduction. For a more precise comparison between different numbers of hidden layers, we performed RMSD analysis, and the results are shown in Figure 10. In Figure 10, DNN1 had the lowest average RMSD; however, DNN2 and DNN3 performed better in terms of the maximum and standard deviation error. DNN4 showed the worst performance, even though it had the most hidden layers.

Figure 9. Four DNNs with different numbers of hidden layers.
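For reference, the four candidate networks could be assembled as follows. Mapping DNN1–DNN4 to the hidden-layer sizes 50, 50, 60, and 50 reported in Section 3.3.1 is an assumption about how the configurations were combined, and the Keras API is used for illustration only.

```python
import tensorflow as tf

def build_dnn(hidden_sizes, dim=64):
    """One candidate DNN: ReLU hidden layers followed by a linear output layer."""
    inp = tf.keras.Input(shape=(dim,))
    h = inp
    for size in hidden_sizes:
        h = tf.keras.layers.Dense(size, activation="relu")(h)
    out = tf.keras.layers.Dense(dim)(h)      # linear regression layer
    return tf.keras.Model(inp, out)

# DNN1-DNN4 with one to four hidden layers (sizes assumed from Section 3.3.1).
candidates = {
    "DNN1": build_dnn([50]),
    "DNN2": build_dnn([50, 50]),
    "DNN3": build_dnn([50, 50, 60]),
    "DNN4": build_dnn([50, 50, 60, 50]),
}
```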


Fine tuning was then conducted to confirm the best-performing DNN. Unlike the layer-wise training, fine tuning is a training process that involves all layers of the DNN; thus, we connected every layer in the DNNs, as shown in Figure 8. The results of the fine tuning are shown in Figures 11 and 12. The results of RMSD analysis after fine tuning were significantly different to those from before. In the results before fine tuning, the performance of the DNN appeared to be independent of the number of hidden layers. However, comparing the results after fine tuning, the maximum and standard deviation error appeared to decrease as the number of hidden layers increased; DNN3 and DNN4 were within an acceptable range of error. Finally, DNN3 was chosen because it performed best in terms of the average error and has the simplest structure capable of obtaining the desired accuracy.

Figure 10. Maximum, average, and standard deviation of costs depending on the number of hidden layers after output layer training; the two cases with the best performance are marked in red.


Figure 11. Cost reduction curve of four DNNs with different numbers of nodes after fine tuning.




Figure 12. Maximum, average, and standard deviation of costs depending on the number of hidden layers after fine tuning; the two cases with the best performance are marked in red.



3.4. DNN3 Training Result

Figure 13 shows the cost reduction curves of the three autoencoders and the output layer, and Figure 14 shows the cost reduction curve of fine tuning. Comparing the average cost between the layer-wise trained DNN and the fine-tuned DNN3, the latter exhibited a lower final average cost. This demonstrates the effectiveness of the fine-tuning step for improving performance.


Figure 13. Cost reduction curves of three autoencoders and the output layer.

Figure 14. Cost reduction curve of DNN3 fine-tuning stage.

310 After every trainingFigure step 14. had Cost been reduction completed, curve of a DNN3 validation fine-tuning test was stage. performed. The purpose of the validation test was to examine the generality of the DNN3. Even if the training is conducted successfully, the DNN3 may not operate accurately in new situations it has not experienced before, a phenomenon known as overfitting. The maximum, average, and standard deviation of the RMSD were 11.3786, 2.1417, and 1.3150, respectively; there was no sign of overfitting. The average value of RMSD was low, and the standard deviation was also sufficiently low. A more detailed analysis of validation accuracy is provided in Figures 15 and 16. The DNN3 shows good accuracy even in situations that

have not been experienced before in the training process. Finally, the training process of DNN3 was considered to have been completed successfully.



Figure 15. Validation test case 1. SNR—signal-to-noise ratio.

Figure 16. Validation test case 2.

4. Performance Tests and Discussions

PSCAD/EMTDC is used to model the power system shown in Figure 17 and generate datasets for performance testing.

Figure 17. Power system model for test datasets (PSCAD/EMTDC).

After modeling was completed, single line to ground faults were simulated to acquire fault current waveforms with DC offset. By adding harmonic components and additive noise, we were able to prepare test datasets that can verify the performance of the DNN3. For a comparative analysis between the proposed DNN3 and digital filtering algorithms, we implemented a second-order Butterworth filter and a mimic filter (known as a DC-offset filter), which is commonly used in existing digital distance relays [34]. Even if the noise component is removed using an analog low-pass filter before ADC, the influence of the noise generated after the ADC cannot be eliminated from the current waveform. We have taken this into account while generating the test datasets.

325 existing digital distance relays [34]. Even if the noise component is removed using an analog low-pass
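As an illustration of how such test waveforms can be assembled, the sketch below composes a fundamental component, an exponentially decaying DC offset, harmonic components, and additive noise at a target signal-to-noise ratio. It is a minimal sketch only: the function name make_test_current, the 64 samples-per-cycle rate, and the default amplitudes are assumptions made for illustration, not the parameters of the PSCAD/EMTDC model used in the paper.

```python
import numpy as np

def make_test_current(i_mag=100.0, dc_frac=1.0, tau=0.0210, f0=60.0,
                      fs=64 * 60, duration=0.2, inception_deg=0.0,
                      harmonics=None, snr_db=25.0, seed=0):
    """Compose an illustrative post-fault current sample sequence:
    fundamental + exponentially decaying DC offset + harmonics + noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, duration, 1.0 / fs)
    phi = np.deg2rad(inception_deg)

    fundamental = i_mag * np.sin(2 * np.pi * f0 * t + phi)
    dc_offset = dc_frac * i_mag * np.exp(-t / tau)  # decaying DC component

    # Harmonic components, given as {order: fraction of the fundamental}.
    harm = np.zeros_like(t)
    for order, frac in (harmonics or {}).items():
        harm += frac * i_mag * np.sin(2 * np.pi * order * f0 * t + phi)

    clean = fundamental + dc_offset + harm

    # Additive white noise scaled to the requested signal-to-noise ratio.
    noise_power = np.mean(clean ** 2) / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=t.shape)
    return t, clean + noise

# Example with parameters loosely following Case A below: tau = 0.0210 s,
# 30 degree inception, 2nd-5th harmonics at 11/7/3/0.5 %, 25 dB noise.
t, i_a = make_test_current(tau=0.0210, inception_deg=30.0,
                           harmonics={2: 0.11, 3: 0.07, 4: 0.03, 5: 0.005})
```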

4.1. Response to Currents without Harmonics and Noise

In this test, the current waveform includes only the fundamental current and the DC offset. Regarding the DC offset, three cases were considered: DC offset in a positive direction, DC offset in a negative direction, and a relatively small amount of DC offset.

4.1.1. Instantaneous Current

Figures 18 and 19 show the input, DNN3 output, and DC-offset filter output for three different cases. Every DNN3 output (size of 64) overlaps in the graph, and this demonstrates the distinct features of the DNN3 output. As the DNN3 had no opportunity to be trained on transient states, it exhibits unusual behavior near the fault inception time. However, subsequent tests have confirmed that this phenomenon has no adverse effect on the phasor estimation process, which is based on the full-cycle discrete Fourier transform (DFT). As the output of the DNN3 is in the form of a data window, we plotted the most recent values using a red line, as shown in Figure 20. From this point forwards, the instantaneous current value will be plotted in this way.


Figure 18. Performance test result on DC offset in the positive direction (time constant: 0.0210 s, fault inception angle: 0°): (a) input current signal; (b) DNN3 output; (c) DC-offset filter output.


Figure 19. Performance test result on DC offset in the negative direction (time constant: 0.0210 s, fault inception angle: 180°): (a) input current signal; (b) DNN3 output; (c) DC-offset filter output.






Figure 20. DNN3 response to the transient state.

4.1.2. Results and Comparisons of Instantaneous Current

Figures 21 and 22 show the phasor analysis results for two different cases. The original signal (green waveform), the signal after using the DNN (red waveform), and the signal after using the DC-offset filter (blue waveform) in Figure 19 are used to evaluate their performance using the full-cycle DFT algorithm. While applying the DFT to the three signals, the DNN signal gave the fastest convergence speed. Table 4 summarizes the amplitude convergence time for each method in three different cases. According to the table, the DNN method has the best convergence time for all cases studied.
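For reference, the full-cycle DFT phasor used in these comparisons can be computed with a short routine such as the one below. This is a generic sliding one-cycle DFT estimator, assuming 64 samples per cycle for illustration; it is not claimed to be the authors' exact implementation.

```python
import numpy as np

def full_cycle_dft_phasor(x, n_per_cycle=64):
    """Sliding full-cycle DFT estimate of the fundamental phasor.

    Returns one complex phasor per sample, starting once a full cycle
    of data is available.
    """
    n = n_per_cycle
    k = np.arange(n)
    # One cycle of the complex fundamental-frequency kernel.
    kernel = np.exp(-1j * 2 * np.pi * k / n)
    phasors = []
    for end in range(n, len(x) + 1):
        window = x[end - n:end]
        # The factor 2/N converts the DFT bin to a peak-amplitude phasor.
        phasors.append((2.0 / n) * np.sum(window * kernel))
    return np.asarray(phasors)

# Amplitude curves like those in Figures 21 and 22 would then be np.abs(...)
# of the phasors computed from the raw input, the DNN3 output, and the
# DC-offset filter output.
```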


Figure 21. Comparison of phasor estimation between DNN3 output and DC-offset filter output: CASE 1.


Figure 22. Comparison of phasor estimation between DNN3 output and DC-offset filter output: CASE 2.


Table 4. Amplitude convergence time [ms]. DNN—deep neural network.

Method    Case 1     Case 2     Case 3
Input     140.956    138.495    69.277
DNN3      15.934     16.158     17.727
Filter    17.339     17.285     20.533
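The convergence criterion behind Table 4 is not restated here, so the helper below should be read as one plausible way to measure such a figure: convergence is declared once the amplitude estimate stays within an assumed ±2% band of its settled value. Both the band and the use of the final amplitude as the reference are assumptions for illustration only.

```python
import numpy as np

def convergence_time(amplitude, t, ref=None, tol=0.02):
    """Time after which the amplitude estimate stays within +/- tol of ref.

    `amplitude` and `t` are aligned arrays; `ref` defaults to the final
    (settled) amplitude. The tolerance band and reference choice are
    illustrative assumptions, not the authors' exact criterion.
    """
    ref = amplitude[-1] if ref is None else ref
    outside = np.abs(amplitude - ref) > tol * abs(ref)
    idx = np.flatnonzero(outside)
    if idx.size == 0:
        return t[0]                      # already converged at the start
    last = idx[-1]                       # last sample outside the band
    return t[min(last + 1, len(t) - 1)]  # first time it stays inside
```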

4.1.3. Comparison Using Inaccurate Time Constants

We have discussed the test results for the case in which the time constant is predicted accurately; however, when an inaccurate time constant is used in a DC-offset filter, its performance is adversely affected, as shown in Figure 23.
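The sensitivity illustrated in Figure 23 follows from the structure of the mimic (DC-offset) filter, whose coefficients are derived from an assumed time constant [15]. The sketch below is a Benmouyal-style digital mimic filter normalized to unity gain at the fundamental frequency; the sample rate and normalization details are illustrative assumptions rather than the exact relay implementation of [34].

```python
import numpy as np

def mimic_filter(x, tau_s, fs, f0=60.0):
    """Digital mimic filter designed for an assumed DC-offset time constant.

    tau_s : assumed time constant in seconds (design parameter)
    fs    : sampling rate in Hz
    The DC offset is cancelled exactly only when tau_s matches the actual
    time constant of the waveform, which is why a 20% or 150% design value
    produces the oscillations seen in Figure 23.
    """
    tau_d = tau_s * fs                   # assumed time constant in samples
    theta = 2 * np.pi * f0 / fs          # fundamental angle per sample
    # Gain factor giving unity magnitude response at the fundamental.
    k = 1.0 / np.hypot(1 + tau_d - tau_d * np.cos(theta), tau_d * np.sin(theta))
    y = np.empty_like(x, dtype=float)
    y[0] = k * (1 + tau_d) * x[0]
    y[1:] = k * ((1 + tau_d) * x[1:] - tau_d * x[:-1])
    return y
```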

Figure 23. Phasor estimation with inaccurate time constants (the accurate time constant is 0.0210 s): (a) 20%: 0.0042 s; (b) 150%: 0.0315 s.

In the case of the DC-offset filter, some oscillations appear in the current amplitude after fault inception, depending on the extent of the inaccuracy in the time constant. On the other hand, the DNN shows robust characteristics and has a shorter convergence time.

4.2. Case Study

For further performance analysis of the proposed DNN3, we will discuss three case studies in this section.

4.2.1. Case A (Line to Line Fault)

Up to this point, we have performed tests based only on the single line to ground fault simulated by PSCAD/EMTDC. In this case, the fault type is changed to a line to line fault. In addition, harmonics and noise are considered simultaneously.

• Voltage level: 154 kV
• Time constant: 0.0210 s
• Harmonics: 2nd = 11%, 3rd = 7%, 4th = 3%, 5th = 0.5%
• Noise: 25 dB
• Fault type: line to line fault (phase A and phase B short circuit)
• Fault inception angle: 30°


Figure 24 shows the test results of CASE A, which are summarized in Table 5. DNN3 has a faster convergence speed and lower standard deviation compared with the DC-offset filter. Compared with the original input (where the DC offset was not removed), the DC-offset filter shows remarkable improvement, though it still performs slightly less effectively than the DNN3. However, Figure 24a shows at a glance that the performance of the DC-offset filter is very poor in estimating the instantaneous current waveform. As a result, it was found that the DNN3 performed much better, even though the test was conducted with the DC-offset filter given a precise time constant. Thus, in the case of a line to line fault, the proposed method is better than the DC-offset filter in both factors: convergence time and standard deviation.


Figure 24. Test results of CASE A: (a) instantaneous current; (b) amplitude.

Table 5. Case A: analysis of current amplitude (phasor).

Method    Convergence [ms]    Average [A]    Standard Deviation
Input     123.362             16,659.86      289.72
DNN3      33.085              16,645.36      29.28
Filter    36.253              16,648.87      43.47

4.2.2. Case B (Different Power System Situation)

To verify the generality of the proposed DNN3, we performed a test using a fault current in a different power system model.

• Voltage level: 22.9 kV
• Time constant: 0.0402 s
• Harmonics: 2nd = 18%, 3rd = 12%, 4th = 8%, 5th = 4%
• Noise: 25 dB
• Fault type: single line to ground fault
• Fault inception angle: 0°

The generality of the proposed DNN3 is verified through this test, as the results are close to those of CASE A, as shown in Figure 25 and summarized in Table 6. Even under an increased level of harmonics compared with the former test cases, DNN3 exhibited robust performance with a shorter convergence time and lower standard deviation, while a different power system model was used for the simulations.


Figure 25. Test results of CASE B: (a) instantaneous current; (b) amplitude.


Table 6. Case B: analysis of current amplitude (phasor).

Method    Convergence [ms]    Average [A]    Standard Deviation
Input     115.907             2843.29        53.81
DNN3      33.127              2839.76        9.68
Filter    35.811              2840.58        13.80

4.2.3. Case C (Different Power System Situation)

In this section, we have observed how the proposed DNN3 and the DC-offset filter responded to the presence of fault resistance. The details of the studied system are given below:

• Voltage level: 345 kV
• Time constant: 0.034 s
• Harmonics: 2nd = 12%, 3rd = 4%, 4th = 2%, 5th = 0.2%
• Noise: 25 dB
• Fault type: single line to ground fault (fault resistance 1 ohm)
• Fault inception angle: 0°

In Figure 26a, it is seen that oscillations appear in the result of the DC-offset filter when a fault resistance exists. This oscillation is caused by the fault resistance, which affects the time constant of the DC offset. As summarized in Table 7, the oscillation had an adverse effect on the convergence time of the filter. In contrast, DNN3 was not influenced by the fault resistance, which shows its potential to solve this problem.
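The effect of fault resistance can be reasoned about directly: the DC-offset time constant is the ratio of the fault-loop inductance to its total resistance, so any additional fault resistance shortens the decay and invalidates the time constant assumed by the filter. The snippet below only illustrates this relationship; the numerical values are assumed and are not the parameters of the simulated 345 kV system.

```python
# Illustrative only: how fault resistance shortens the DC-offset time constant
# (tau = L / R for the faulted loop); the numbers below are assumed.
L_loop = 0.11      # fault-loop inductance in henries (assumed)
R_source = 3.2     # source-side loop resistance in ohms (assumed)

for r_fault in (0.0, 1.0):
    tau = L_loop / (R_source + r_fault)
    print(f"R_fault = {r_fault:.1f} ohm -> tau = {tau * 1000:.1f} ms")

# A DC-offset filter tuned to the no-resistance tau no longer matches the
# actual decay once fault resistance is present, producing the oscillations
# in Figure 26a, whereas DNN3 does not rely on an assumed time constant.
```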

Figure 26. Test results of CASE C: comparison between two cases: (a) fault resistance: 1 Ω; (b) no fault resistance.

Table 7. Case C: analysis of current amplitude (phasor).

Method       Convergence [ms]       Average [A]                Standard Deviation
Resistance   1 ohm      0 ohm       1 ohm        0 ohm         1 ohm      0 ohm
Input        65.631     117.102     30,588.60    31,683.76     309.66     371.09
DNN3         33.121     33.091      30,588.03    31,659.30     64.81      62.09
Filter       49.332     35.301      30,557.72    31,675.29     176.84     108.77

5. Conclusions

In this paper, we developed a DNN to effectively remove the DC offset. Autoencoders were used to determine the optimal size of the DNN. Subsequently, intensive training for the DNN was performed using both the supervised and unsupervised training methodologies.


Even under harmonics and noise distortion, the DNN showed good and robust accuracy in instantaneous current reconstruction and phasor estimation compared with the DC-offset filter. It was also confirmed that the errors caused by inaccurate time constants of the DC offset were significantly smaller than those of the DC-offset filter. These results confirm that the method of determining the DNN size using the autoencoder was appropriate; the optimal DNN size in other deep learning applications could therefore be determined based on this methodology. As the performance of the DNN is largely affected by the quality of the training datasets, the DNN could be trained more precisely if more sophisticated training datasets were prepared. Furthermore, it is expected that the secondary current waveform of a current transformer distorted by saturation could be reconstructed by modeling the current transformer mathematically and applying the methodology used in this paper.

Author Contributions: S-B.K. prepared the manuscript and completed the simulations. S-R.N. supervised the study and coordinated the main theme of this paper. V.S. developed the simulation model in the study and revised the manuscript. S-H.K. and N-H.L. discussed the results and implications, and commented on the manuscript. All of the authors read and approved the final manuscript. Funding: This research was supported in part by the Human Resources Program in Energy Technology of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), and granted financial resources from the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20154030200770). This research was also supported in part by Korea Electric Power Corporation (Grant number: R17XA05-2). Conflicts of Interest: The authors declare no conflict of interest.

References

1. Lee, J.S.; Hwang, S.-H. DC Offset Error Compensation Algorithm for PR Current Control of a Single-Phase Grid-Tied Inverter. Energies 2018, 11, 2308. [CrossRef]
2. Jyh-Cherng, G.; Sun-Li, Y. Removal of DC offset in current and voltage signals using a novel Fourier filter algorithm. IEEE Trans. Power Deliv. 2000, 15, 73–79. [CrossRef]
3. Sun-Li, Y.; Jyh-Cherng, G. Removal of decaying DC in current and voltage signals using a modified Fourier filter algorithm. IEEE Trans. Power Deliv. 2001, 16, 372–379. [CrossRef]
4. Soon-Ryul, N.; Sang-Hee, K.; Jong-Keun, P. An analytic method for measuring accurate fundamental frequency components. IEEE Trans. Power Deliv. 2002, 17, 405–411. [CrossRef]
5. Yong, G.; Kezunovic, M.; Deshu, C. Simplified algorithms for removal of the effect of exponentially decaying DC-offset on the Fourier algorithm. IEEE Trans. Power Deliv. 2003, 18, 711–717. [CrossRef]
6. Kang, S.; Lee, D.; Nam, S.; Crossley, P.A.; Kang, Y. Fourier transform-based modified phasor estimation method immune to the effect of the DC offsets. IEEE Trans. Power Deliv. 2009, 24, 1104–1111. [CrossRef]
7. Nam, S.; Park, J.; Kang, S.; Kezunovic, M. Phasor Estimation in the Presence of DC Offset and CT Saturation. IEEE Trans. Power Deliv. 2009, 24, 1842–1849. [CrossRef]
8. Silva, K.M.; Kusel, B.F. DFT based phasor estimation algorithm for numerical digital relaying. Electron. Lett. 2013, 49, 412–414. [CrossRef]
9. Zadeh, M.R.D.; Zhang, Z. A New DFT-Based Current Phasor Estimation for Numerical Protective Relaying. IEEE Trans. Power Deliv. 2013, 28, 2172–2179. [CrossRef]
10. Rahmati, A.; Adhami, R. An Accurate Filtering Technique to Mitigate Transient Decaying DC Offset. IEEE Trans. Power Deliv. 2014, 29, 966–968. [CrossRef]
11. Silva, K.M.; Nascimento, F.A.O. Modified DFT-Based Phasor Estimation Algorithms for Numerical Relaying Applications. IEEE Trans. Power Deliv. 2018, 33, 1165–1173. [CrossRef]
12. Nam, S.-R.; Kang, S.-H.; Sohn, J.-M.; Park, J.-K. Modified Notch Filter-based Instantaneous Phasor Estimation for High-speed Distance Protection. Electr. Eng. 2007, 89, 311–317. [CrossRef]
13. Mahari, A.; Sanaye-Pasand, M.; Hashemi, S.M. Adaptive phasor estimation algorithm to enhance numerical distance protection. IET Gener. Transm. Distrib. 2017, 11, 1170–1178. [CrossRef]
14. Gopalan, S.A.; Mishra, Y.; Sreeram, V.; Iu, H.H. An Improved Algorithm to Remove DC Offsets From Fault Current Signals. IEEE Trans. Power Deliv. 2017, 32, 749–756. [CrossRef]
15. Benmouyal, G. Removal of DC-offset in current waveforms using digital mimic filtering. IEEE Trans. Power Deliv. 1995, 10, 621–630. [CrossRef]

16. Cho, Y.; Lee, C.; Jang, G.; Lee, H.J. An Innovative Decaying DC Component Estimation Algorithm for Digital Relaying. IEEE Trans. Power Deliv. 2009, 24, 73–78. [CrossRef]
17. Pazoki, M. A New DC-Offset Removal Method for Distance-Relaying Application Using Intrinsic Time-Scale Decomposition. IEEE Trans. Power Deliv. 2018, 33, 971–980. [CrossRef]
18. Gosselin, P.; Koukab, A.; Kayal, M. Computing the Impact of White and Flicker Noise in Continuous-Time Integrator-Based ADCs. In Proceedings of the 2016 MIXDES—23rd International Conference Mixed Design of Integrated Circuits and Systems, Lodz, Poland, 23–25 June 2016; pp. 316–320.
19. Nagamine, T.; Seltzer, M.; Mesgarani, N. On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models. In Proceedings of the INTERSPEECH 2016, San Francisco, CA, USA, 8–12 September 2016; pp. 803–807.
20. Hu, W.; Liang, J.; Jin, Y.; Wu, F.; Wang, X.; Chen, E. Online Evaluation Method for Low Frequency Oscillation Stability in a Power System Based on Improved XGboost. Energies 2018, 11, 3238. [CrossRef]
21. Sun-Bin, K.; Sun-Woo, L.; Dae-Hee, S.; Soon-Ryul, N. DC Offset Removal in Power Systems Using Deep Neural Network. In Proceedings of the Innovative Smart Grid Technologies Conference, Washington, DC, USA, 17–20 February 2019; pp. 1–5.
22. Huang, X.; Hu, T.; Ye, C.; Xu, G.; Wang, X.; Chen, L. Electric Load Data Compression and Classification Based on Deep Stacked Auto-Encoders. Energies 2019, 12, 653. [CrossRef]
23. Kim, M.; Choi, W.; Jeon, Y.; Liu, L. A Hybrid Neural Network Model for Power Demand Forecasting. Energies 2019, 12, 931. [CrossRef]
24. Cegielski, M. Estimating of computation time of parallel algorithm of computations of dynamic processes in GPU systems. In Proceedings of the 2015 16th International Conference on Computational Problems of Electrical Engineering (CPEE), Lviv, Ukraine, 2–5 September 2015; pp. 14–16.
25. IEEE Power and Energy Society. IEEE Recommended Practice and Requirements for Harmonic Control in Electric Power Systems; IEEE Std 519-2014 (Revision of IEEE Std 519-1992); IEEE: Piscataway, NJ, USA, 2014; pp. 1–29. [CrossRef]
26. Chiorboli, G.; Franco, G.; Morandi, C. Uncertainties in quantization-noise estimates for analog-to-digital converters. IEEE Trans. Instrum. Meas. 1997, 46, 56–60. [CrossRef]
27. Xie, Y.; Jin, H.; Tsang, E.C.C. Improving the lenet with batch normalization and online hard example mining for digits recognition. In Proceedings of the 2017 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR), Ningbo, China, 9–12 July 2017; pp. 149–153.
28. Bilbao, I.; Bilbao, J. Overfitting problem and the over-training in the era of data: Particularly for Artificial Neural Networks. In Proceedings of the 2017 Eighth International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 5–7 December 2017; pp. 173–177.
29. Shibata, K.; Yusuke, I. Effect of number of hidden neurons on learning in large-scale layered neural networks. In Proceedings of the 2009 ICCAS-SICE, Fukuoka, Japan, 18–21 August 2009; pp. 5008–5013.
30. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504. [CrossRef]
31. TensorFlow. API r1.13. Available online: https://www.tensorflow.org/api_docs/python (accessed on 23 October 2018).
32. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2015, arXiv:1412.6980.
33. Almotiri, J.; Elleithy, K.; Elleithy, A. Comparison of autoencoder and Principal Component Analysis followed by neural network for e-learning using handwritten recognition. In Proceedings of the 2017 IEEE Long Island Systems, Applications and Technology Conference (LISAT), Farmingdale, NY, USA, 5 May 2017; pp. 1–5.
34. GE Multilin. Manual P/N: 1601-0089-F5 (GEK-106440D); GE Multilin: Toronto, ON, Canada, 2009.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).