
Article

Benchmarking Daily Line Loss Rates of Low Voltage Transformer Regions in Power Grid Based on Robust Neural Network

Weijiang Wu 1,2 , Lilin Cheng 3,*, Yu Zhou 1,2, Bo Xu 4, Haixiang Zang 3 , Gaojun Xu 1,2 and Xiaoquan Lu 1,2

1 State Grid Jiangsu Electric Power Company Limited Marketing Service Center, Nanjing 210019, China; [email protected] (W.W.); [email protected] (Y.Z.); [email protected] (G.X.); [email protected] (X.L.)
2 State Grid Key Laboratory of Electric Power Metering, Nanjing 210019, China
3 College of Energy and Electrical Engineering, Hohai University, Nanjing 211100, China; [email protected]
4 Jiangsu Frontier Electric Technology Co., Ltd., Nanjing 211100, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-173-7211-8166

Received: 28 September 2019; Accepted: 3 December 2019; Published: 17 December 2019

Abstract: Line loss is inherent in transmission and distribution stages, and it can affect the profits of power-supply corporations. Thus, line loss rate is an important indicator, and a benchmark value is needed to evaluate daily line loss rates in low voltage transformer regions. However, the number of regions is usually very large, and the dataset of line loss rates contains massive outliers. It is critical to develop a regression model with both great robustness and efficiency when trained on big data samples. In this case, a novel method based on robust neural network (RNN) is proposed. It is a multi-path network model with denoising auto-encoder (DAE), which takes the advantages of dropout, L2 regularization and the Huber loss function. It can achieve several different outputs, which are utilized to compute benchmark values and reasonable intervals. Based on the comparison results, the proposed RNN possesses both superb robustness and accuracy, outperforming the tested conventional regression models. According to the benchmark analysis, there are about 13% outliers in the collected dataset and about 45% of regions hold outliers within a month. Hence, the quality of line loss rate data should still be further improved.

Keywords: benchmark calculation; daily line loss rate; low voltage transformer region; robust neural network

1. Introduction

Line loss rate is a vital indicator in power grid as it can reflect the operation and management levels in both economic and technical aspects [1]. It is inevitable and directly impacts the profits of power-supply corporations [2]. Commonly, line loss can be classified as technical and non-technical. Technical line loss is caused by the electro-thermal effect of conductors, which is unavoidable over the transmission process. The level of technical line loss depends on the structure and operation state of the power grid, the conductor type, as well as the balance condition of three-phase loads [3]. Non-technical line loss usually occurs due to electricity theft, which may lead to abnormal values in metered line loss rates. Hence, power grid operators are urgently concerned with whether the daily line loss rate values are within a reasonable range, i.e., with the pass percentages of daily line loss rates. It is sometimes difficult to distinguish normal line loss rate values from outliers among a large number of collected samples. Besides the intervals, a benchmark value of daily line loss rate is also essential for

transformer regions, as it directly indicates what daily line loss rate value the region should approximately achieve, in order for operators to better know the operating condition of the region, as well as to further improve the level of line loss management. In this case, an accurate calculation method is still needed nowadays to obtain the benchmark values and reasonable intervals of line loss rates, as well as to recognize outliers from those samples of line loss rates.

In the field of data mining and analysis, there are usually four approaches to calculate benchmarks and detect outliers, i.e., the empirical method, statistical method, unsupervised method and supervised method [4–7]. Firstly, the empirical method utilizes practical experience to set an interval with fixed bounds, where a value out of the interval can be treated as an outlier. In the benchmarking of daily line loss rates, an empirical interval is customarily set to be −1%~5%. It is noted that although line loss rate is usually non-negative, a value no less than −1% is acceptable due to unavoidable acquisition errors. This method is simple and easy to realize, whereas a fixed interval is sometimes inaccurate and cannot reflect the influences of relevant factors. Secondly, the statistical method aims to research the distribution of data samples, where the outliers can be eliminated using probabilistic density functions [8] or box-plots [9,10]. Compared to empirical methods, the interval bounds in statistical methods are able to adapt to different testing samples, while the influencing factors of line loss rates can still hardly be involved in this kind of method. Thirdly, the unsupervised method, i.e., the clustering method, is also an efficient way to detect outliers [11]. In clustering, a sample of line loss rate can be treated as a data point, where the values of line loss rate and its influencing factors are the different dimensional attributes of the point. The outliers can be identified based on the distances between the data points and clustering centers [12–14]. Unsupervised methods usually outperform empirical and statistical methods, as multi-dimensional factors can be inputted and analyzed [15]. Nevertheless, unsupervised methods still face several problems. On one hand, it is sometimes difficult to design a proper distance function as those input factors are in dissimilar dimensions and units. On the other hand, some clustering methods involve all data points to update clustering centers, e.g., k-means and fuzzy C-means (FCM), where outliers may easily affect the centers and final clustering results. Reasonable intervals cannot be calculated by unsupervised methods either. Finally, supervised methods utilize machine learning models to solve classification [16,17] and regression problems [18,19], which are suited for outlier detecting and benchmark calculating tasks, respectively. Classification models learn from labeled samples to distinguish normal and abnormal data. However, line loss rate samples are usually unlabeled, as one is unable to recognize whether a collected value of line loss rates is normal or not. According to the relevant references, data-driven methods have been widely applied to estimate line loss values and loss rates, where clustering and regression methods are the two major ones [20].
In [21], FCM is adopted to select a reference value for each type of feeder, in order to calculate the limit line loss rates of feeders for distribution networks. Similarly, another clustering method, namely the Gaussian mixture model (GMM), is utilized to calculate line loss rates for low-voltage transformer regions in [22]. The above two methods both calculate a fixed benchmark value for each cluster, so they are not designed for a single feeder or region. Considering this fact, regression methods are proposed to calculate a benchmark with certain inputs. Those inputs are the influencing factors of line losses from only one feeder or region. Reviewing the state-of-the-art methods, decision tree and its derived boosting models are the most frequently used ones in line loss computation. In [23,24], gradient boosting decision tree (GBDT) is used to predict and estimate line losses for distribution networks and transmission lines, respectively. Among those, power flow and weather information are taken as input factors. In [25], the extreme gradient boosting (XGBT) model is proposed to estimate line losses for distribution feeders, based on the characteristics of feeders.

Regression models can obtain benchmark values for line loss rate samples based on different influencing factors, where a value greatly away from benchmarks can be treated as an outlier. Nevertheless, a regression model may show poor robustness and reliability when directly trained on samples with massive outliers, making it critical to develop a highly robust method. In addition to the boosting models mentioned before, k-nearest neighbors (KNN) is one of the most commonly used machine learning models with great robustness [26–28]. It analyzes the similarity between the predicted sample and the original training ones, in order to calculate an averaged value based on those nearest training samples. This calculating method can relieve certain impacts from outliers, but it increases the computational burden at the application stage and can hardly deal with high-dimensional inputs. Besides, support vector machine (SVM) is also frequently utilized for regression tasks [29]. As the regression results of SVM are strongly correlated with support vectors, the influences of outliers are decreased. However, SVM shows low training efficiency on a large amount of samples.

Due to the great number of regions in practical application, a method with both high efficiency and robustness is proposed in this study, called robust neural network (RNN). A neural network utilizes error back-propagation (BP) and mini-batch gradient descent algorithms to update its parameters, suitable for training on big data samples. Besides, RNN modifies the structure of conventional neural networks, further increasing its robustness against outliers. The main contributions of this study can be summarized in detail as follows.

(1) As the number of researches that focus on benchmarking daily line loss rates is limited, a supervised regression method is proposed in this study to obtain benchmark values of daily line loss rates in different transformer regions. In the proposed supervised method, various influencing factors of line loss rates are considered, so that a high computation accuracy can be ensured.

(2) A novel RNN model is proposed in this study. It possesses a multi-path architecture with denoising auto-encoders (DAEs). Moreover, L2 regularization, dropout layers and the Huber loss function are also applied in RNN. The robustness and reliability of the proposed regression model are greatly improved when compared with conventional machine learning models according to the testing datasets in the case study.

(3) Based on the multiple outputs of the RNN, a method is proposed to calculate benchmark values and reasonable intervals for line loss rate samples. It can precisely evaluate the quality of sampled datasets and eliminate outliers of line loss rates, increasing the stability of data monitoring.

The rest of the paper is organized as follows. The utilized dataset and the proposed method based on RNN are introduced in Section 2. The comparison results and discussions are provided in Section 3. The conclusion is drawn finally in Section 4.

2. Materials and Methods

2.1. Theoretical Computation Equations of Line Losses

Theoretical equations of line losses are proposed to compute technical line losses, which are based on the equivalent resistance method. The method supposes that there is an equivalent resistance at the head of the line, where the energy loss of three-phase three-wire and three-phase four-wire systems can be formulated as [30]:

$$\Delta A_b = N K^2 I_{av}^2 R_{eq} T \times 10^{-3}\ \text{(kWh)}, \quad (1)$$

where $\Delta A_b$ refers to the theoretical line loss with balanced three-phase load. $N$ is the structure coefficient, which is equal to 3 for a three-phase three-wire system and equal to 3.5 for a three-phase four-wire system. $K$, $I_{av}$, $R_{eq}$ and $T$ denote the shape coefficient of load curves, the average current at the head of the line (A), the equivalent resistance of conductors (Ω) and the operating time (h), respectively. Furthermore, $R_{eq}$ can be computed under the following equation:

$$R_{eq} = \frac{\sum_i N_i A_i^2 R_i}{N \left( \sum_j A_j \right)^2}, \quad (2)$$

where $N_i$, $A_i$ and $R_i$ are the structure coefficient, the metered electricity power and the resistance of the ith line segment, respectively. $A_j$ denotes the electricity power collected from the jth power meter. For a system with unbalanced three-phase load, the theoretical line loss should be corrected as:

$$\Delta A_{ub} = \Delta A_b \times K_{ub}, \quad (3)$$

where $K_{ub}$ represents the correction coefficient that can be defined as:

$$K_{ub} = 1 + k \delta_I^2, \quad (4)$$

where $k = 2$ with one phase under heavy load and two phases under light loads. If there are two phases under heavy loads, $k = 8$. $\delta_I$ denotes the unbalance level of three-phase load, which can be calculated as follows:

$$\delta_I = \frac{I_{max} - I_{av}}{I_{av}}, \quad (5)$$

where $I_{max}$ is the current from the phase with the maximum load. Thus, the theoretical line loss defined above is an unavoidable energy loss, the so-called technical line loss. However, the non-technical line loss caused by electricity theft is also of concern to power grid operators. As those non-technical loss situations can lead to abnormal values in the metered daily line loss rates, it is necessary to calculate reasonable intervals for outlier discrimination, which is one of the purposes of this study.
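For concreteness, the computation chain of Equations (1)–(5) can be sketched in Python as below; the function and variable names, as well as the numbers in the example, are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def equivalent_resistance(n_i, a_i, r_i, a_j, n_struct=3.5):
    """Equivalent resistance R_eq of Equation (2)."""
    n_i, a_i, r_i, a_j = map(np.asarray, (n_i, a_i, r_i, a_j))
    return float((n_i * a_i**2 * r_i).sum() / (n_struct * a_j.sum()**2))

def theoretical_line_loss(k_shape, i_av, r_eq, hours, n_struct=3.5,
                          i_max=None, k_unb=2.0):
    """Technical line loss of Equations (1) and (3)-(5), in kWh."""
    # Equation (1): loss with balanced three-phase load
    loss_b = n_struct * k_shape**2 * i_av**2 * r_eq * hours * 1e-3
    if i_max is None:
        return loss_b
    # Equation (5): unbalance level of the three-phase load
    delta_i = (i_max - i_av) / i_av
    # Equation (4): k_unb = 2 for one heavy phase, 8 for two heavy phases
    k_ub = 1.0 + k_unb * delta_i**2
    # Equation (3): corrected loss for unbalanced load
    return loss_b * k_ub

# Hypothetical example: K = 1.05, I_av = 40 A, R_eq = 0.5 ohm, T = 24 h
print(theoretical_line_loss(1.05, 40.0, 0.5, 24.0, i_max=55.0))
```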

2.2. Datasets

In practical application, the pass percentages of daily line loss rates in transformer regions are generally examined once a month in the State Grid Corporation of China. In this case, the line loss rate dataset from July 2017 is utilized in the study, which is collected in daily intervals, in order to examine the pass percentage of line loss rates in that month. The pass percentage quota is especially important in July as it usually meets the peak load period of summer. The dataset is obtained from a total of 19,884 regions, which are located in Wuxi, Jiangsu Province, China. As a result, there are altogether 616,404 samples in this study, which satisfies the demand of big data analysis. Based on the dataset, about 80% of the samples (15,907 regions) are chosen as training ones and the others (3977 regions) are testing samples.

2.2.1. Data Quality Analysis

The research object in this study is the daily line loss rate, some curves of which are presented as examples in Figure 1. Besides, the 25th percentile (q1), median (q2), 75th percentile (q3), maximum (max), minimum (min), mean, standard deviation (std), lower bound (la) and upper bound (ua) values of the overall line loss rate dataset are calculated and provided in Table 1. The distribution box-plots of the line loss rates are provided in Figure 2. It is noted that the lower bound (la) and upper bound (ua) are calculated based on the 25th percentile (q1) and 75th percentile (q3) [31], where a value out of the bounds can be treated as an outlier:

$$\begin{cases} l_a = q_1 - 1.5 \times (q_3 - q_1) \\ u_a = q_3 + 1.5 \times (q_3 - q_1) \end{cases}, \quad (6)$$

According to the curves and quality analysis, the data characteristics of daily line loss rates can be summarized as follows:

1. The line loss rate data possess little daily regularity and show high fluctuation. From Figure 1, the curves of line loss rates in different regions change greatly over the days, where historical line loss rates can hardly be used to estimate future values. Thus, it is vital in this study to select influencing factors of line loss rate.

2. The deviations of outliers in the dataset are sometimes extremely far from normal values, indicating the low dependability of the acquisition and communication equipment. According to Table 1 and Figure 2, the lower and upper bounds of the original dataset in the box-plot are −1.57% and 5.22%, respectively, which is quite close to the project standard (−1% and 5%). However, the maximum and minimum of the collected line loss rates are 100% and −1.69 × 10⁶%, respectively, greatly different from the bounds. In this case, benchmarking line loss rates is still necessary nowadays in practicable applications.

3. The quality of the dataset is too poor to be used directly. As the component analysis of the dataset is presented in Figure 3, there are a large number of outliers and missing values, constituting 8.67% and 6.72% of the overall dataset, respectively. In this study, the spline interpolation method is utilized to fill the missing values. From Table 1 and Figure 2, the dataset after interpolation holds a similar distribution compared to that of the original dataset. On the contrary, although the outliers can be eliminated directly based on la and ua, the distribution will change and it will be difficult to calculate accurate reasonable intervals.
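As a minimal sketch of the box-plot screening in Equation (6), assuming NumPy and hypothetical rate values:

```python
import numpy as np

def boxplot_bounds(rates):
    """Lower/upper bounds l_a and u_a of Equation (6)."""
    q1, q3 = np.percentile(rates, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical daily line loss rates (%) for one region
rates = np.array([1.2, 2.1, 1.8, 2.4, 100.0, 1.9, -1.69e6, 2.2])
la, ua = boxplot_bounds(rates)
outliers = rates[(rates < la) | (rates > ua)]
print(la, ua, outliers)  # the extreme values are flagged as outliers
```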

Figure 1. Examples of daily line loss rates in July 2018 from different regions in Wuxi, Jiangsu Province, China.

Table 1. The data quality analysis based on the overall line loss rate dataset.

| Indicator | Original Dataset | Original Dataset (Without Outliers) | Dataset after Interpolation |
| mean (%) | −5.16 | 1.78 | −10.96 |
| std (%) | 2.47 × 10³ | 1.16 | 3.19 × 10³ |
| min (%) | −1.69 × 10⁶ | −1.57 | −1.69 × 10⁶ |
| max (%) | 100 | 5.22 | 100 |
| la (%) | −1.57 | −1.10 | −1.50 |
| ua (%) | 5.22 | 4.57 | 5.26 |
| q1 (%) | 1.02 | 1.02 | 1.03 |
| q2 (%) | 1.74 | 1.67 | 1.76 |
| q3 (%) | 2.70 | 2.44 | 2.72 |

Figure 2. The distribution box-plots of the original dataset and the dataset after the interpolation operation.

Figure 3. The component result of data quality analysis (84.61% normal values, 8.67% outliers and 6.72% missing values). Of the 616,404 samples: normal 521,510 (84.61%), of which LLR = −1.57%~1%: 126,594 (20.54%), LLR = 1%~3%: 318,831 (51.72%), LLR = 3%~5.22%: 76,085 (12.35%); outliers 53,499 (8.67%), of which LLR < −1.57%: 12,430 (2.01%), LLR > 5.22%: 41,069 (6.66%); missing 41,395 (6.72%).

2.2.2. Influencing Factors of Line Loss Rate

Taking into account both the possible influencing factors and the information recorded, a total of twelve factors are selected as inputs of the regression models, as shown in Table 2. Among those, the third and fourth factors are one-bit codes, while the others are numerical values.

Table 2. Influencing factors of line loss rate utilized in this study.

| No. | Factor | Remarks |
| 1 | Date | - |
| 2 | Capacity of the transformer | - |
| 3 | Type of the transformer | Public transformer (=0); special transformer (=1) |
| 4 | Type of the grid that the region belongs to | City grid (=0); country grid (=1) |
| 5 | Monthly line loss rate | Three inputs, including the last three months |
| 6 | Daily load rate | Daily load rate = daily power supply/(transformer capacity × 24) |
| 7 | Daily maximum load rate | - |
| 8 | Daily average power factor | - |
| 9 | Number of customers | - |
| 10 | Average transformer capacity per customer | Average capacity = transformer capacity/number of customers |
| 11 | Rate of residential capacity | Rate = residential capacity/transformer capacity |
| 12 | Power supply duration | - |
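A sketch of how the twelve factors of Table 2 might be assembled into one input vector; the record keys below are hypothetical names for illustration, not fields from the actual dataset.

```python
def build_features(rec):
    """Assemble the twelve input factors of Table 2 for one region-day.

    `rec` is a hypothetical dict of raw records; its key names are
    illustrative only.
    """
    cap = rec["transformer_capacity_kva"]
    return [
        rec["day_of_month"],                        # 1. date
        cap,                                        # 2. transformer capacity
        1 if rec["special_transformer"] else 0,     # 3. one-bit transformer type
        1 if rec["country_grid"] else 0,            # 4. one-bit grid type
        *rec["monthly_llr_last3"],                  # 5. last three monthly LLRs
        rec["daily_supply_kwh"] / (cap * 24),       # 6. daily load rate
        rec["daily_max_load_rate"],                 # 7.
        rec["daily_avg_power_factor"],              # 8.
        rec["n_customers"],                         # 9.
        cap / rec["n_customers"],                   # 10. avg capacity per customer
        rec["residential_capacity_kva"] / cap,      # 11. residential capacity rate
        rec["supply_duration_h"],                   # 12.
    ]
```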


2.3. Calculation of Benchmark Values and Reasonable Intervals

According to the data quality analysis, the original datasets contain a large quantity of outliers, which are far from the rational range, making it difficult to obtain an accurate result. Therefore, the task of this study is to utilize a robust learning strategy and achieve a stable regression result away from outliers, as shown in Figure 4.

Figure 4. Conventional learning methods may be easily affected by outliers. (a) Common condition; (b) affected by outliers.

Commonly, there is a robust learning solution in which one can set thresholds manually and delete outliers from the dataset according to those thresholds, where the rest of the dataset can be applied to train a machine learning model. However, how to decide precise thresholds still remains a problem. Besides, the computed bounds of the reasonable interval from the learning model may be close to the manual thresholds, which breaks the distribution of the original dataset and makes it meaningless to train a probabilistic learning model. In this case, the calculation method based on RNN is proposed, as shown in Figure 5, which consists of the following steps:

1. Build an RNN. In order to fully expand its robustness, DAE, multi-path architecture, L2 regularization, dropout layers and the Huber loss function are applied. It is noted that an RNN possesses ten output nodes, where each node is connected to one layer with a dissimilar dropout rate (from 0.05 to 0.50).

2. Calculate an average value according to the ten different outputs, which is the final benchmark value of line loss rate, as:

$$\widetilde{y}_i = \frac{1}{10} \sum_{n=1}^{10} y_i^n, \quad (7)$$

where $\widetilde{y}_i$ is the ith benchmark value, and $y_i^n$ is the nth output of the ith line loss rate.

3. Operate error analysis to acquire a reasonable interval. Not only is the absolute error between the benchmark values and the actual line loss rates computed, but the variance of the different outputs is calculated as well. According to the interval result, data points that do not fall between the bounds of the interval are considered as outliers. The operation can be described under the following equations:

$$e_1 = \sqrt{\frac{1}{n_s} \sum_{i=1}^{n_s} \left( \widetilde{y}_i - y_i^* \right)^2}, \quad (8)$$

$$e_2 = \sqrt{\frac{1}{10 - 1} \sum_{n=1}^{10} \left( \widetilde{y}_i - y_i^n \right)^2}, \quad (9)$$

$$\begin{cases} l_i = \widetilde{y}_i - e_1 - e_2 \\ u_i = \widetilde{y}_i + e_1 + e_2 \end{cases}, \quad (10)$$

where $e_1$ and $e_2$ are the results of error analysis. $e_1$ is a constant and $e_2$ changes according to the index i. $n_s$ is the number of training samples; $y_i^*$ is the ith actual line loss rate; $l_i$ and $u_i$ are the lower and upper bounds of the reasonable interval, respectively. Furthermore, as outliers exist in the actual line loss rate values, which may influence the result of $e_1$, a two-tailed test is utilized to

eliminate the possibly abnormal $y_i^*$ values that are smaller than the 0.7 percentile or bigger than the 99.3 percentile, as shown in Figure 6.

Figure 5. The flowchart of the proposed method in this study for calculating benchmark values and reasonable intervals based on robust neural network (RNN).
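The benchmark and interval computation of Equations (7)–(10), together with the two-tailed trimming, can be sketched as follows; the data arrays are random stand-ins, not the real model outputs.

```python
import numpy as np

def benchmark_and_interval(outputs, y_true):
    """Benchmark values and reasonable intervals of Equations (7)-(10).

    outputs: array of shape (n_samples, 10), the ten RNN outputs per sample
    y_true:  array of shape (n_samples,), actual line loss rates
    """
    y_bench = outputs.mean(axis=1)                    # Equation (7)
    # Two-tailed trimming of possibly abnormal actual values (Figure 6)
    lo, hi = np.percentile(y_true, [0.7, 99.3])
    keep = (y_true >= lo) & (y_true <= hi)
    e1 = np.sqrt(np.mean((y_bench[keep] - y_true[keep]) ** 2))   # Equation (8)
    e2 = np.sqrt(np.sum((outputs - y_bench[:, None]) ** 2, axis=1) / (10 - 1))  # Eq. (9)
    lower = y_bench - e1 - e2                         # Equation (10)
    upper = y_bench + e1 + e2
    return y_bench, lower, upper

# Hypothetical usage with random stand-in data
rng = np.random.default_rng(0)
outs = rng.normal(2.0, 0.3, size=(100, 10))
actual = rng.normal(2.0, 1.0, size=100)
bench, lo_b, up_b = benchmark_and_interval(outs, actual)
is_outlier = (actual < lo_b) | (actual > up_b)
```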

Figure 6. A two-tailed test to eliminate possibly abnormal line loss rate values.

2.4. Robust Neural Network

As mentioned before, an RNN is used in this study for robust learning, whose architecture is provided in Figure 7. It is made up of three main paths which are combined by concatenation, where there is a DAE on each main path. The concatenated output nodes are placed in a same layer, which represent high-order features extracted from the original inputs. For the further improvement of robustness, L2 regularization is utilized in this layer to limit the output values of those nodes. Then, ten dropout layers with different dropout rates are stacked after the high-order feature layer, where ten outputs can be obtained. The ten outputs are analyzed to calculate the benchmark value and reasonable interval, as introduced above in Section 2.3.

Appl. Sci. 2019, 9, 5565 9 of 17 Appl.Appl. Sci. Sci. 2019 2019, ,9 9, ,x x FOR FOR PEER PEER REVIEW REVIEW 99 ofof 1717

Figure 7. The architecture of the proposed robust neural network (RNN).

2.4.1. Denoising Auto-Encoder

The architecture of DAE is provided in Figure 8. It is a robust variant of the auto-encoder, which possesses a noise layer before the encoder [32], such as a normal (Gaussian) noise layer:

$$x_{i,n} = x_i + N(0, \sigma^2), \quad (11)$$

where $x_i$ and $x_{i,n}$ are the ith input and the ith output of the noise layer, respectively. $N(0, \sigma^2)$ is a normal distribution with a mean value of 0 and a variance of $\sigma^2$. In this study, σ is set to be 0.05 when the inputs are normalized into [0, 1].

Figure 8. The architecture of denoising auto-encoder (DAE).

Besides,Besides, thethethe encoder encoderencoder and andand decoder decoderdecoder layers layerslayers in DAE inin DADA areEE both areare made bothboth upmademade of conventional upup ofof conventionalconventional fully-connected fully-fully- connected(FC)connected layers, (FC)(FC) whose layers,layers, equation whosewhose can eqequationuation be expressed cancan bebe expressedas:expressed as:as: − l lllll 1−11 y =ywxb=+w=+ijx − + bj, (12) i ywxbiijjjiijjjj ,, (12)(12)

where $y_i^l$ and $x_j^{l-1}$ are the ith output and the jth input of the lth layer, respectively; $w_{ij}^l$ and $b_j^l$ are the weight and bias of the FC layer that connect the jth input and the ith output. For the encoder layer, the number of outputs is smaller than that of inputs, i.e., i < j; and the number of outputs in the decoder layer is equal to that of the original inputs. Hence, after the computation of DAE, the dimension and size of the inputs remain unchanged, whereas the robustness of the features increases as they can resist a certain degree of noise interference.

2.4.2. Multiple Paths Combined by Addition and Concatenation

There are altogether three main paths in RNN, which have similar layers and whose outputs are combined under a concatenation operation:

$$\begin{cases} y_c = C\left( y_{k,mp} \right) \\ y_{k,mp} = f_k\left( x, \left\{ w_k^{n_w}, b_k^{n_b} \right\} \right) \end{cases}, \quad k = 1, 2, 3, \quad (13)$$

where $C(\cdot)$ is the concatenation operation, which combines output nodes from different layers into an entire layer; $y_c$ and $y_{k,mp}$ are the output vector of the concatenation and the output vector of the kth main path, respectively; $f_k(x, \{w_k^{n_w}, b_k^{n_b}\})$ is the computation result of the kth main path; $x$ denotes the input vector of RNN; $w_k^{n_w}$ and $b_k^{n_b}$ are the weight matrixes and bias vectors of the kth path, respectively; $n_w$ and $n_b$ are the numbers of weights and biases, respectively.

Furthermore, a main path is formed by two sub-paths, i.e., a DAE sub-path and an FC layer sub-path. The outputs of the two sub-paths are added as the output of a main path, as follows:
$$y_{k,mp} = f_k\left( x, \left\{ w_k^{n_w}, b_k^{n_b} \right\} \right) = g_k\left( x, \left\{ w_k^{n_w - 1}, b_k^{n_b - 1} \right\} \right) + \left( w_k^{sp} x + b_k^{sp} \right), \quad (14)$$

where $g_k(\cdot)$ represents the computation of DAE on the kth main path; $w_k^{sp}$ and $b_k^{sp}$ are the weight matrix and bias vector of the FC layer sub-path on the kth main path, respectively.

2.4.3. Dropout

Dropout is a special kind of layer, which is efficient to prevent over-fitting [33]. The procedure of dropout can be summarized into two steps, i.e., the training step and the application step. For a conventional FC layer as expressed in Equation (12), there are j input nodes. In the training step, the input nodes will be abandoned under a probability p (0 < p < 1), where those abandoned nodes cannot be connected to outputs [34], as shown in Figure 9. After training, all the input nodes are operating in the application step, whereas the weight values are multiplied by the probability p, which can be described as:

$$y_i^l = p \cdot w_{ij}^l x_j^{l-1} + b_j^l, \quad (15)$$

where p is the probability, called the dropout rate. It is a hyper-parameter and is set from 0.05 to 0.50, with a step of 0.05, in order to obtain ten different outputs in this study.

Input

Dropout

output

Figure 9. The principle of dropout duringduring the training step.
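To make the two dropout steps concrete, here is a small NumPy illustration that follows the weight-scaling convention as written in Equation (15); all arrays are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(8)          # inputs x_j of one FC layer
w = rng.random((4, 8))     # weights w_ij
b = rng.random(4)          # biases
p = 0.25                   # dropout rate

# Training step: each input node is abandoned with probability p
mask = (rng.random(8) >= p).astype(float)
y_train = w @ (x * mask) + b

# Application step, Equation (15): all nodes active, weights scaled by p
y_apply = p * (w @ x) + b
```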

2.4.4. Huber Loss Function

The training process of a neural network is to set a loss function and utilize the back-propagation (BP) gradient descent algorithm to update the parameters layer by layer. One of the most commonly used functions is the mean squared error (MSE):

$$MSE = \frac{1}{n_s} \sum_{i=1}^{n_s} \left( y_i - y_i^* \right)^2, \quad (16)$$

where $n_s$ is the number of training samples; $y_i$ and $y_i^*$ are the ith predicted and the ith actual outputs, respectively. Besides, the mean absolute error (MAE) is another conventional function, which can be defined as follows:

$$MAE = \frac{1}{n_s} \sum_{i=1}^{n_s} \left| y_i - y_i^* \right|, \quad (17)$$

where MAE and MSE can also be called L1 loss and L2 loss, as a linear term and a quadratic term are used in the MAE and MSE, respectively. Comparing MSE and MAE, MSE holds a smoother derivative function, which can benefit the calculation of the gradient descent algorithm, whereas a small difference in MAE may cause a huge change in parameter updating. On the contrary, MAE demonstrates a better capability than MSE when fighting against outliers [35]. In this case, a Huber loss function is applied in this study that combines the merits of both MSE and MAE [36], as shown in Figure 10:

$$Huber = \frac{1}{n_s} \sum_{i=1}^{n_s} \begin{cases} \frac{1}{2} \left( y_i - y_i^* \right)^2, & \left| y_i - y_i^* \right| \le \delta \\ \delta \left| y_i - y_i^* \right| - \frac{1}{2} \delta^2, & \left| y_i - y_i^* \right| > \delta \end{cases}, \quad (18)$$

where δ is a hyper-parameter that needs to be set manually. It is set to be 10% in this study.

Figure 10. The principle of Huber loss function.
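A direct transcription of Equation (18) into Python, with δ = 0.10 as selected in this study; the example values are hypothetical.

```python
import numpy as np

def huber_loss(y_pred, y_true, delta=0.10):
    """Huber loss of Equation (18), with delta = 0.10 as in this study."""
    err = np.abs(np.asarray(y_pred) - np.asarray(y_true))
    quadratic = 0.5 * err**2                  # MSE-like branch, |error| <= delta
    linear = delta * err - 0.5 * delta**2     # MAE-like branch, |error| > delta
    return float(np.mean(np.where(err <= delta, quadratic, linear)))

print(huber_loss([2.1, 2.5], [2.0, 2.0]))  # hypothetical predictions vs. actuals
```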

2.4.5. L2 Regularization

L2 regularization in this study is aimed to set a penalty term for the nodes that hold large activation outputs, in order to prevent over-fitting and increase the robustness of the neural network. The regularization works during the training phase, where a two-norm penalty term is added to the training loss function [37], which can be expressed as follows:

$$L = \lambda \left\| y_c \right\|_2 + Huber = \lambda \left\| y_c \right\|_2 + \frac{1}{n_s} \sum_{i=1}^{n_s} \begin{cases} \frac{1}{2} \left( y_i - y_i^* \right)^2, & \left| y_i - y_i^* \right| \le \delta \\ \delta \left| y_i - y_i^* \right| - \frac{1}{2} \delta^2, & \left| y_i - y_i^* \right| > \delta \end{cases}, \quad (19)$$

where L represents the final loss function for model training; and λ is a hyper-parameter of the penalty term, which is set to be 0.001 in this study.

3. Results and Discussion

The architecture and hyper-parameters of the proposed RNN are presented in Table 3. Moreover, considering the fact that there are a large number of training samples, k-nearest neighbors (KNN), decision tree regression (DTR) and a single-hidden-layer artificial neural network (ANN) are established for comparisons, which hold relatively high training efficiency on big datasets. The training of the deep RNN model is conducted on a personal computer with an NVIDIA GTX 1080 GPU using Python 3.5 and TensorFlow 1.4. The calculation results and discussions are provided in detail as follows. It should be noted that all the hyper-parameters and training configurations of RNN, along with the hyper-parameters mentioned in Section 2.4 (i.e., σ, δ and λ), are chosen via grid search with three-fold cross-validation based on the overall training dataset. The searching spaces and final results of those parameters are listed in Table 4.

Table 3. The architecture and hyper-parameters of the proposed robust neural network (RNN).

| Path | Layer | Hyper-Parameter |
| DAE sub-path (1~3) | FC layer 0 | Node number: 64; Activation: ReLU (Rectified Linear Unit) |
| DAE sub-path (1~3) | Noise layer | Standard deviation: 0.05 |
| DAE sub-path (1~3) | FC layer 1 (encoder) | Node number: 8; Activation: ReLU |
| DAE sub-path (1~3) | FC layer 2 (decoder) | Node number: 64 |
| Fully connected (FC) sub-path (1~3) | FC layer 3 | Node number: 64 |
| Main path (1~3) | Add layer | Inputs: FC layer 2 and FC layer 3 |
| - | Concatenate layer | Inputs: Add layers; Node number: 192; Activation: ReLU; Activity regularization: L2, λ = 1 × 10⁻³ |
| - | Dropout layer (1~10) | Dropout rate: 0.05~0.50 (0.05 step) |
| - | FC layer 4 (output 1~10) | Node number: 1; Activation: ReLU |
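The architecture of Table 3 can be sketched with the Keras functional API as below. This assumes a tf.keras (TensorFlow 2.x) environment rather than the TensorFlow 1.4 actually used in the paper, so it is an approximation of the original implementation, not the authors' code.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_rnn(n_inputs):
    """Sketch of Table 3: three main paths (DAE + FC sub-paths), a
    concatenated high-order feature layer with L2 activity regularization,
    and ten dropout branches with rates 0.05~0.50."""
    inp = layers.Input(shape=(n_inputs,))
    main_paths = []
    for _ in range(3):
        h = layers.Dense(64, activation="relu")(inp)        # FC layer 0
        h = layers.GaussianNoise(0.05)(h)                   # noise layer, Eq. (11)
        h = layers.Dense(8, activation="relu")(h)           # FC layer 1 (encoder)
        dec = layers.Dense(64)(h)                           # FC layer 2 (decoder)
        fc = layers.Dense(64)(inp)                          # FC layer 3 (FC sub-path)
        main_paths.append(layers.Add()([dec, fc]))          # addition, Eq. (14)
    feat = layers.Concatenate()(main_paths)                 # 192 nodes, Eq. (13)
    feat = layers.Activation("relu")(feat)
    feat = layers.ActivityRegularization(l2=1e-3)(feat)     # penalty term of Eq. (19)
    outputs = [layers.Dense(1, activation="relu")(layers.Dropout(0.05 * i)(feat))
               for i in range(1, 11)]                       # ten dropout branches
    model = Model(inp, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss=tf.keras.losses.Huber(delta=0.10))   # Equation (18)
    return model

model = build_rnn(n_inputs=14)  # 12 factors, factor 5 contributing three inputs
model.summary()
```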

Table 4. The searching spaces of selected hyper-parameters in RNN.

| Hyper-Parameter | Searching Space | Result |
| σ (standard error of noise in DAE) | [0.00:0.05:0.50] | 0.05 |
| δ (coefficient in Huber loss) | [0.00:0.10:0.50] | 0.10 |
| λ (penalty coefficient of L2) | 1 × 10^[−4:1:0] | 1 × 10⁻³ |
| Number of nodes in a sub-path | {16, 32, 64, 128} | 64 |
| Number of sub-paths | [2:1:5] | 3 |
| Training optimizer | {Adadelta, Adam} | Adam |
| Number of epochs | [50:50:200] | 100 |
| Batch size | {64, 128, 256, 512, 1024, 2048} | 1024 |
| Learning rate | 1 × 10^[−5:1:0] | 1 × 10⁻⁴ |
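The grid search with three-fold cross-validation might be organized as in the following sketch; `build_rnn` refers to the constructor sketched after Table 3, the data arrays are random stand-ins, and the way `params` is wired into the model is left as a comment since the paper does not detail it.

```python
import itertools
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X, y = rng.random((3000, 14)), rng.random(3000)        # stand-in training data

def cv_loss(params):
    """Mean validation loss of one grid point over a three-fold split."""
    fold_losses = []
    for tr, va in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
        model = build_rnn(X.shape[1])  # constructor sketched above; `params`
        # (e.g. sigma, delta, lambda) would be wired into it here
        model.fit(X[tr], [y[tr]] * 10, epochs=100, batch_size=1024, verbose=0)
        fold_losses.append(model.evaluate(X[va], [y[va]] * 10, verbose=0)[0])
    return float(np.mean(fold_losses))

grid = {"sigma": [0.00, 0.05, 0.10], "delta": [0.10, 0.20]}  # a slice of Table 4
best = min((dict(zip(grid, v)) for v in itertools.product(*grid.values())),
           key=cv_loss)
```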

3.1. Calculation Results

In the case study, six regions from the testing samples are chosen randomly as examples for exhibition, which are shown in Figure 11. Their region IDs are 1100, 1302, 7015, 8125, 12,610 and 14,072, respectively. From the results, the bounds of the reasonable intervals can adjust adaptively according to the multiple input factors, especially in Region No. 1100 and Region No. 8125. Some outliers that are far away from the benchmarks are efficiently picked out, although those outliers may stay between −1% and 5%. Hence, the results of reasonable intervals are better than a fixed interval between −1% and 5%. Besides, the benchmark values show less fluctuation than the actual line loss rate values, indicating a high reliability to estimate daily line loss rates. Rather than a mean value or a median value calculated from the original dataset, benchmark values are able to adaptively reflect the daily operating conditions in transformer regions according to the changes of relevant factors.

Appl. Sci. 2019, 9, Rex gFORion No.1100 PEER REVIEW Region No.1302 Region No.7015 13 of 17

12 Actual loss 12 12 Actual loss Actual loss Benchmark Region No.1100 Region No.1302 Benchmark Region No.7015 Benchmark 10 Interval 10 10 Interval Interval 12 IntervalActual loss bounds 12 12 Actual loss 8 8 ActualInterval loss bounds 8 Interval bounds Benchmark Outliers BenchmarkOutliers BenchmarkOutliers 10 Interval 10 10 6 6 Interval 6 Interval Interval bounds 8 8 Interval bounds 8 Interval bounds Outliers 4 4 Outliers 4 Outliers 6 6 6 2 2 2

4 4 rate (%) loss Line 4 Line loss rate (%) rate loss Line (%) rate loss Line 0 0 0 2 2 2 -2 -2 rate (%) loss Line -2 Line loss rate (%) rate loss Line (%) rate loss Line 0 0 0 -4 -4 -4 -2 -2 -2 -6 -6 -6 -40 5 10 15 20 25 30 -4 0 5 10 15 20 25 30 -4 0 5 10 15 20 25 30 Time (day) Time (day) Time (day) -6 Region No.8125 -6 Region No.12610 -6 Region No.14072 0 5 10 15 20 25 30 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Time (day) Time (day) Time (day) 12 Actual loss 12 12 Region No.8125 Region No.12610 Actual loss Region No.14072 Actual loss Benchmark Benchmark 10 10 Benchmark 10 12 Interval 12 12 Interval Actual loss ActualInterval loss Actual loss Interval bounds Interval bounds 8 Benchmark 8 BenchmarkInterval bounds 8 Benchmark 10 Outliers 10 10 Outliers Interval IntervalOutliers Interval 6 8 Interval bounds 86 Interval bounds 8 6 Interval bounds Outliers Outliers Outliers 4 6 64 6 4

2 4 42 4 2 Line loss rate rate (%) loss Line Line loss rate (%) rate loss Line (%) rate loss Line 0 2 20 2 0 Line loss rate rate (%) loss Line Line loss rate (%) rate loss Line (%) rate loss Line -2 0 -20 0 -2

-4 -2 -2-4 -2 -4

-6 -4 -4-6 -4 -6 0 5 10 15 20 25 30 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Time (day) Time (day) Time (day) -6 -6 -6 0 5 10 15 20 25 30 0 5 10 15 20 25 30 0 5 10 15 20 25 30 Time (day) Time (day) Time (day) Figure 11. The results of benchmark values and reasonable intervals in six testing regions. FigureFigure 11. 11.The The results results of of benchmarkbenchmark valuesvalues andand reasonable intervals in in six six testing testing regions. regions. Based on the proposed RNN, the pass percentage results of line loss rates can be analyzed, which are shownBasedBased onin on 0. the the For proposed proposed the data RNN, RNN,point the analysisthe pass pass percentagepercentageof line loss resultsrerates,sults the of linenumber loss rates ratesof outliers can can be be are analyzed, analyzed, more than which which that areinare 0, shown shownas the in proposed in Figure 0. For 12the method. Fordata the point is dataable analysis pointto precisely analysisof line id loentify ofss rates, line those loss the rates, numberoutliers the ofthat number outliers are greatly of are outliers more different than are morethat from thanthein 0,benchmark that as the in Figure proposed values.3, as method the Furthermore, proposed is able methodto although precisely is able the identify pe torcentages precisely those outliers identifyof missing that those valuesare outliers greatly and that differentoutliers are greatly arefrom not the benchmark values. Furthermore, although the percentages of missing values and outliers are not direlativelyfferent from big in the all benchmark data points, values. which Furthermore,are 6.72% and although 13.06%, respectively, the percentages the regions of missing that values possess and no relatively big in all data points, which are 6.72% and 13.06%, respectively, the regions that possess no outliersmissing are and not abnormal relatively values big in within all data a points,month whichonly occupy are 6.72% 19.84% and of 13.06%, the whole respectively, dataset in the the regions study, missing and abnormal values within a month only occupy 19.84% of the whole dataset in the study, thatindicating possess a no low missing reliability and of abnormal the current values acquisition within a monthequipment. only occupy 19.84% of the whole dataset inindicating the study, a indicating low reliability a low of reliability the current of acquisition the current equipment. acquisition equipment.

Figure 12. Pass percentage analysis of the line loss rates based on robust neural network (RNN). (a) Analysis of data points (616,404 in total: 494,486 normal (80.22%); 41,395 missing (6.72%); 80,523 outliers (13.06%), of which 30,841 (5.00%) below the lower bounds and 49,682 (8.06%) above the upper bounds); (b) analysis of transformer regions (19,884 in total: 3946 normal (19.84%); 6899 with missing values (34.70%); 9039 with outliers (45.46%)).

3.2. Comparisons and Discussion

In order to evaluate the performance of the proposed method, including robustness and accuracy, comparison studies are conducted in this section. The hyper-parameters of the established KNN, DTR and ANN are provided in Table 5. The comparison results are presented and discussed in detail as follows.

Appl. Sci. 2019, 9, x FOR PEER REVIEW 14 of 17

Table 5. The hyper-parameters of the established k-nearest neighbors (KNN), decision tree regression (DTR) and artificial neural network (ANN).

| Model | Hyper-Parameter |
| KNN | Number of neighbors: 10; Method of weighting: based on distance |
| DTR | Method of splitter: choose the best split |
| ANN | Node number in the hidden layer: 64; Activation: logistic |

3.2.1. The Robustness of the Proposed Method

In order to evaluate the robustness of the proposed method, the distributions of the calculated benchmark values based on different testing models are analyzed, as shown in Figure 13. The detailed values of the distribution indicators are presented in Table 6. From the results, the testing ANN model shows the worst performance and is totally unable to calculate a valid benchmark value. The maximum and minimum values from the ANN are 4.49 × 10⁶% and −8.26 × 10⁷%, respectively, which can hardly be applied as benchmarks. KNN and DTR obtain similar results according to the distributions. They both utilize massive training samples that are close to the situation of an unknown testing region to decide new benchmark values. Thus, they are able to achieve a better robustness than ANN in the study and are practicable at most of the testing regions. However, the minimum benchmark of the two models is −8.13 × 10⁴%, which is still not a reasonable value. Furthermore, the proposed RNN achieves the best result among all the four testing models, where the calculated benchmark values are within a reasonable range. The standard deviation of the benchmark values calculated by RNN is only 0.80%, indicating a stable and robust result obtained via the proposed method.
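For reference, the three baselines of Table 5 can be configured as below, assuming scikit-learn (the paper does not name the library actually used).

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor

# Baselines configured with the hyper-parameters of Table 5
knn = KNeighborsRegressor(n_neighbors=10, weights="distance")
dtr = DecisionTreeRegressor(splitter="best")
ann = MLPRegressor(hidden_layer_sizes=(64,), activation="logistic")

# Hypothetical usage: each model is fitted on the same factor matrix
# used for the RNN, e.g. knn.fit(X_train, y_train).
```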

[Figure 13: box-plots of the calculated benchmark values, in line loss rate (%), for the proposed RNN, KNN, DTR and ANN.]

Figure 13. The distributions of the calculated benchmark values based on different testing models.

Table 6. The robustness analysis results of different testing models.

Indicator   Proposed RNN   KNN           DTR           ANN
mean (%)    2.03           1.03          −4.81         −5.83 × 10^6
std (%)     0.80           185.69        194.78        1.87 × 10^7
min (%)     0.00           −8.13 × 10^4  −8.13 × 10^4  −8.26 × 10^7
max (%)     33.61          100.00        100.00        4.49 × 10^6
la (%)      0.81           −1.80         −1.80         −1.17 × 10^6
ua (%)      3.36           5.25          5.24          2.80 × 10^6
q1 (%)      1.76           0.84          0.84          3.23 × 10^5
q2 (%)      2.09           1.65          1.65          8.47 × 10^5
q3 (%)      2.40           2.60          2.60          1.32 × 10^6
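The indicators in Table 6 can be reproduced from an array of benchmark values. A minimal sketch follows, under the assumption that la and ua denote the lower and upper box-plot fences (Tukey's 1.5 × IQR rule) and q1–q3 the quartiles; this reading is our inference, not a definition stated in the table, although it is consistent with the reported values (e.g., for the proposed RNN, q1 = 1.76 and q3 = 2.40 give ua = 2.40 + 1.5 × 0.64 = 3.36).

```python
import numpy as np

def distribution_indicators(b):
    """Summarize an array b of calculated benchmark values (%, one per region)."""
    q1, q2, q3 = np.percentile(b, [25, 50, 75])
    iqr = q3 - q1
    la = q1 - 1.5 * iqr          # lower fence (assumed meaning of 'la')
    ua = q3 + 1.5 * iqr          # upper fence (assumed meaning of 'ua')
    return {"mean": b.mean(), "std": b.std(ddof=1),
            "min": b.min(), "max": b.max(),
            "la": la, "ua": ua, "q1": q1, "q2": q2, "q3": q3}
```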

3.2.2. Accuracy Analysis

Besides robustness, accuracy is another important criterion in this study. Accordingly, three loss indicators are utilized to compare the four testing models, i.e., MAE, MSE and Huber loss, whose equations are discussed in Section 2.4.4. Before the losses are calculated, the outliers in the testing samples are eliminated using the two-tailed test introduced above in Figure 6. The comparison results are shown in Table 7. According to the results, ANN performs the worst, as its three loss indicators are far larger than those of the other models. Hence, the testing ANN is inapplicable when directly trained on samples containing extreme outliers. Besides, although KNN and DTR show similar robustness, their accuracy indicators differ considerably. KNN obtains the best MAE indicator, whereas its MSE is larger than that of the proposed RNN, owing to a small number of outlier benchmarks produced by KNN. Comparing these indicators comprehensively, the proposed RNN shows the best performance, as it achieves the best MSE and Huber loss indicators while keeping a small MAE value.

Table 7. The accuracy analysis results of different testing models.

Indicator                       Proposed RNN   KNN      DTR           ANN
Mean absolute error (MAE, %)    1.46           0.70     5.01          7.26 × 10^6
Mean square error (MSE, %^2)    13.70          574.58   3.90 × 10^3   3.54 × 10^14
Huber loss                      4.44           5.90     48.19         7.26 × 10^7
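For reference, the three accuracy indicators can be computed as sketched below, assuming the standard definitions of MAE, MSE and the Huber loss. The Huber threshold delta and the two-tailed cut-off k are placeholders chosen here for illustration; the paper's exact parameter values and filtering procedure (Figure 6) are not restated in this section.

```python
import numpy as np

def huber(e, delta=1.0):
    """Standard Huber loss per residual; delta is an assumed threshold."""
    a = np.abs(e)
    return np.where(a <= delta, 0.5 * e**2, delta * (a - 0.5 * delta))

def accuracy_indicators(y_true, y_pred, k=3.0, delta=1.0):
    """MAE/MSE/Huber after a two-tailed residual cut (|e - mean| <= k * std),
    mirroring the pre-filtering step described above; k is an assumption."""
    e = y_true - y_pred
    keep = np.abs(e - e.mean()) <= k * e.std(ddof=1)   # two-tailed test
    e = e[keep]
    return {"MAE": np.abs(e).mean(),
            "MSE": (e**2).mean(),
            "Huber": huber(e, delta).mean()}
```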

4. Conclusions

Daily line loss rate is a critical indicator for power-supply corporations, as it greatly affects their profits. In order to better manage line losses and provide guidance for the construction and operation of low-voltage transformer regions, an efficient method is needed to compute benchmark values for daily line loss rates. Based on the benchmarks, reasonable intervals of the daily line loss rates can be further obtained, which help discover abnormal line loss rate values, as well as help operators check and confirm irregular operating conditions. However, only a limited number of studies have researched calculating benchmark values of daily line loss rates and eliminating outliers from collected line loss rate data. As a result, a regression calculation method based on RNN is proposed in this study. It consists of a DAE, three main paths, dropout layers, the Huber loss function, L2 regularization and ten outputs. The benchmarks are calculated as the mean values of the ten outputs. After error analysis, reasonable intervals can be obtained to detect outliers in the original line loss rate samples. From the case study and comparison results, the conventional ANN model fails to calculate valid benchmarks, as it cannot deal with outliers. KNN, DTR and the proposed RNN are proved applicable in the case study, where the proposed RNN outperforms the other two models, showing both the highest accuracy and robustness among all testing models. Furthermore, according to the final result obtained from the proposed RNN, about 13% of the overall data points are outliers.

Only about 20% of the regions hold no missing or abnormal line loss rate values within a month, indicating low dependability of the acquisition equipment. Therefore, a reliable monitoring and management system for line loss data is still necessary in today's power grid.

Author Contributions: Writing—original draft preparation, W.W. and L.C.; reviewing and editing, Y.Z., B.X. and H.Z.; supervision, G.X. and X.L.

Funding: This research was supported by the Science and Technology Project of Jiangsu Frontier Electric Technology Corporation, "Research on key techniques of line loss benchmark value modeling in low-voltage transformer area by data-driven model" (project ID. 0FW-18553-WF).

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Chen, B.J.; Xiang, K.L.; Yang, L.; Su, Q.M.; Huang, D.S.; Huang, T. Theoretical Line Loss Calculation of Distribution Network Based on the Integrated electricity and line loss management system. In Proceedings of the China International Conference on Electricity, Tianjin, China, 17–19 September 2018; pp. 2531–2535.
2. Yang, F.; Liu, J.; Lu, B.B. Design and Application of Integrated Distribution Network Line Loss Analysis System. In Proceedings of the 2016 China International Conference on Electricity Distribution (CICED), Xi'an, China, 10–13 August 2016.
3. Hu, J.H.; Fu, X.F.; Liao, T.M.; Chen, X.; Ji, K.H.; Sheng, H.; Zhao, W.B. Low Voltage Distribution Network Line Loss Calculation Based on The Theory of Three-phase Unbalanced Load. In Proceedings of the 3rd International Conference on Intelligent Energy and Power Systems (IEPS 2017), Hangzhou, China, 10 October 2017; pp. 65–71. [CrossRef]
4. Campos, G.O.; Zimek, A.; Sander, J.; Campello, R.J.G.B.; Micenkova, B.; Schubert, E.; Assent, I.; Houle, M.E. On the evaluation of unsupervised outlier detection: Measures, datasets, and an empirical study. Data Min. Knowl. Disc. 2016, 30, 891–927. [CrossRef]
5. Paulheim, H.; Meusel, R. A decomposition of the outlier detection problem into a set of supervised learning problems. Mach. Learn. 2015, 100, 509–531. [CrossRef]
6. Daneshpazhouh, A.; Sami, A. Entropy-based outlier detection using semi-supervised approach with few positive examples. Pattern Recogn. Lett. 2014, 49, 77–84. [CrossRef]
7. Bhattacharya, G.; Ghosh, K.; Chowdhury, A.S. Outlier detection using neighborhood rank difference. Pattern Recogn. Lett. 2015, 60–61, 24–31. [CrossRef]
8. Domingues, R.; Filippone, M.; Michiardi, P.; Zouaoui, J. A comparative evaluation of outlier detection algorithms: Experiments and analyses. Pattern Recogn. 2018, 74, 406–421. [CrossRef]
9. Dovoedo, Y.H.; Chakraborti, S. Boxplot-Based Outlier Detection for the Location-Scale Family. Commun. Stat.-Simul. Comput. 2015, 44, 1492–1513. [CrossRef]
10. Pranatha, M.D.A.; Sudarma, M.; Pramaita, N.; Widyantara, I.M.O. Filtering Outlier Data Using Box Whisker Plot Method For Fuzzy Time Series Rainfall Forecasting. In Proceedings of the 2018 4th International Conference on Wireless and Telematics (ICWT), Nusa Dua, Indonesia, 12–13 July 2018.
11. Campello, R.J.G.B.; Moulavi, D.; Zimek, A.; Sander, J. Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection. ACM Trans. Knowl. Discov. Data 2015, 10, 5. [CrossRef]
12. Huang, J.L.; Zhu, Q.S.; Yang, L.J.; Cheng, D.D.; Wu, Q.W. A novel outlier cluster detection algorithm without top-n parameter. Knowl.-Based Syst. 2017, 121, 32–40. [CrossRef]
13. Jiang, F.; Liu, G.Z.; Du, J.W.; Sui, Y.F. Initialization of K-modes clustering using outlier detection techniques. Inf. Sci. 2016, 332, 167–183. [CrossRef]
14. Todeschini, R.; Ballabio, D.; Consonni, V.; Sahigara, F.; Filzmoser, P. Locally centred Mahalanobis distance: A new distance measure with salient features towards outlier detection. Anal. Chim. Acta 2013, 787, 1–9. [CrossRef]
15. Jobe, J.M.; Pokojovy, M. A Cluster-Based Outlier Detection Scheme for Multivariate Data. J. Am. Stat. Assoc. 2015, 110, 1543–1551. [CrossRef]
16. An, W.J.; Liang, M.G.; Liu, H. An improved one-class support vector machine classifier for outlier detection. Proc. Inst. Mech. Eng. C J. Mech. 2015, 229, 580–588. [CrossRef]
17. Chen, G.J.; Zhang, X.Y.; Wang, Z.J.; Li, F.L. Robust support vector data description for outlier detection with noise or uncertain data. Knowl.-Based Syst. 2015, 90, 129–137. [CrossRef]
18. Zou, C.L.; Tseng, S.T.; Wang, Z.J. Outlier detection in general profiles using penalized regression method. IIE Trans. 2014, 46, 106–117. [CrossRef]
19. Peng, J.T.; Peng, S.L.; Hu, Y. Partial least squares and random sample consensus in outlier detection. Anal. Chim. Acta 2012, 719, 24–29. [CrossRef]
20. Ni, L.; Yao, L.; Wang, Z.; Zhang, J.; Yuan, J.; Zhou, Y. A Review of Line Loss Analysis of the Low-Voltage Distribution System. In Proceedings of the 2019 IEEE 3rd International Conference on Circuits, Systems and Devices (ICCSD), Chengdu, China, 23–25 August 2019; pp. 111–114.
21. Yuan, X.; Tao, Y. Calculation method of distribution network limit line loss rate based on fuzzy clustering. IOP Conf. Ser. Earth Environ. Sci. 2019, 354, 012029. [CrossRef]
22. Bo, X.; Liming, W.; Yong, Z.; Shubo, L.; Xinran, L.; Jinran, W.; Ling, L.; Guoqiang, S. Research of Typical Line Loss Rate in Transformer District Based on Data-Driven Method. In Proceedings of the 2019 IEEE Innovative Smart Grid Technologies-Asia (ISGT Asia), Chengdu, China, 21–24 May 2019; pp. 786–791.
23. Yao, M.T.; Zhu, Y.; Li, J.J.; Wei, H.; He, P.H. Research on Predicting Line Loss Rate in Low Voltage Distribution Network Based on Gradient Boosting Decision Tree. Energies 2019, 12, 2522. [CrossRef]
24. Zhang, S.; Dong, X.; Xing, Y.; Wang, Y. Analysis of Influencing Factors of Transmission Line Loss Based on GBDT Algorithm. In Proceedings of the 2019 International Conference on Communications, Information System and Computer Engineering (CISCE), Haikou, China, 5–7 July 2019; pp. 179–182.
25. Wang, S.X.; Dong, P.F.; Tian, Y.J. A Novel Method of Statistical Line Loss Estimation for Distribution Feeders Based on Feeder Cluster and Modified XGBoost. Energies 2017, 10, 2067. [CrossRef]
26. Radovanovic, M.; Nanopoulos, A.; Ivanovic, M. Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection. IEEE Trans. Knowl. Data Eng. 2015, 27, 1369–1382. [CrossRef]
27. Yosipof, A.; Senderowitz, H. k-Nearest Neighbors Optimization-Based Outlier Removal. J. Comput. Chem. 2015, 36, 493–506. [CrossRef]
28. Wang, X.C.; Wang, X.L.; Ma, Y.Q.; Wilkes, D.M. A fast MST-inspired kNN-based outlier detection method. Inf. Syst. 2015, 48, 89–112. [CrossRef]
29. Yang, J.H.; Deng, T.Q.; Sui, R. An Adaptive Weighted One-Class SVM for Robust Outlier Detection. Lect. Notes Electr. Eng. 2016, 359, 475–484. [CrossRef]
30. Feng, N.; Jianming, Y. Low-Voltage Distribution Network Theoretical Line Loss Calculation System Based on Dynamic Unbalance in Three Phases. In Proceedings of the 2010 International Conference on Electrical and Control Engineering, Wuhan, China, 25–27 June 2010; pp. 5313–5316.
31. Hubert, M.; Vandervieren, E. An adjusted boxplot for skewed distributions. Comput. Stat. Data Anal. 2008, 52, 5186–5201. [CrossRef]
32. Lamb, A.; Binas, J.; Goyal, A.; Serdyuk, D.; Subramanian, S.; Mitliagkas, I.; Bengio, Y. Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations. arXiv 2018, arXiv:1804.02485.
33. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
34. Cheng, L.L.; Zang, H.X.; Ding, T.; Sun, R.; Wang, M.M.; Wei, Z.N.; Sun, G.Q. Ensemble Recurrent Neural Network Based Probabilistic Wind Speed Forecasting Approach. Energies 2018, 11, 1958. [CrossRef]
35. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [CrossRef]
36. Esmaeili, A.; Marvasti, F. A Novel Approach to Quantized Matrix Completion Using Huber Loss Measure. IEEE Signal Proc. Lett. 2019, 26, 337–341. [CrossRef]
37. Shah, P.; Khankhoje, U.K.; Moghaddam, M. Inverse Scattering Using a Joint L1–L2 Norm-Based Regularization. IEEE Trans. Antennas Propag. 2016, 64, 1373–1384. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).