Statistical Identification and Classification Using Laguerre Functions

by

Pramoda Sachinthana Jayasinghe

A Thesis submitted to the Faculty of Graduate Studies of The University of Manitoba in partial fulfilment of the requirements of the degree of

Master of Science

Department of Statistics
University of Manitoba
Winnipeg

Copyright © 2019 by Pramoda Sachinthana Jayasinghe

Abstract

This research focuses on developing classification methods to automatically identify

partial discharge (PD) signals from multiple sources and developing new system

identification techniques to help increase the accuracy of these classifications. We use a Laguerre functional basis to approximate PD signal waveforms obtained through a lab experiment. To perform the approximation, we considered multiple approaches, such as methods based on objective functions and a deterministic method. In the process of signal approximation, we developed methods of selecting a proper scaling factor for Laguerre functions. We evaluated the use of the Laguerre basis expansion coefficients

to classify PD signals into their respective sources. Linear discriminant analysis (LDA),

quadratic discriminant analysis (QDA) and support vector machines (SVM) were

used as classifiers in this analysis. It was observed that these methods can classify partial discharge signals with high accuracies, even when the signals are visually indistinguishable. We also developed two methods for system identification, based on a deterministic approach in the form of a recursive formula and a stochastic method based on a group Lasso model. The aim of system identification was to improve the classification accuracies by removing the effect of the system from the observed PD waveforms. Through numerical evaluations we showed that there are situations where classification after system identification can improve classification accuracies.

Acknowledgement

I would like to express many thanks to my advisor, Dr. Mohammad Jafari Jozani, for the guidance and support given to me throughout the M.Sc. program. His suggestions and support were invaluable not only for this research but for achieving my future goals as well. I would also like to express my sincere gratitude to my co-advisor, Dr. Behzad Kordi, for his suggestions and comments throughout my research. Without their help, I would not have been able to realize my dream of obtaining a master's degree and continuing my higher studies. Their doors were always open whenever I encountered a problem or had a question about my research.

Many thanks to my thesis committee members, Dr. Alexandre Leblanc and Dr. Saman Muthukumarana for their comments and suggestions to improve my research.

I am also grateful to Dr. Saeed Shahabi and Mr. Ali Nasr Esfahani for their contribution in collecting the laboratory data required for this project.

Last but not least, I would like to thank my family and friends. Many sincere thanks to my lovely wife, my parents, and my sister for their kindness, patience, and support while I was pursuing my studies.

Examining Committee

This thesis was examined and approved by the following examining committee on August 16, 2019:

• Dr. Mohammad Jafari Jozani (advisor)

Department of Statistics University of Manitoba

• Dr. Behzad Kordi (co-advisor)

Department of Electrical & Computer Engineering University of Manitoba

• Dr. Alexandre Leblanc (examiner)

Department of Statistics University of Manitoba

• Dr. Saman Muthukumarana (examiner)

Department of Statistics University of Manitoba

Dedication Page

To my lovely and supportive parents.

Contents

Contents iv

List of Tables viii

List of Figures x

1 Introduction 1

1.1 Introduction to Systems and Classification ...... 1

1.2 Problem Definition and Motivation ...... 3

1.2.1 Output Classification ...... 3

1.2.2 Input Classification and System Identification ...... 4

1.3 Partial Discharge Classification and System Identification ...... 4

1.4 Research Contributions ...... 9

1.5 Publications ...... 11

1.6 Organization of the Thesis ...... 11

2 Signal Approximation 13

2.1 Introduction ...... 13

2.2 Laguerre Functions ...... 15

2.3 Estimating Coefficients ...... 18

2.3.1 Least-Squares Objective Function ...... 19

2.3.2 Least Absolute Objective Function ...... 20

2.3.3 Lasso Objective Function ...... 20

2.4 Deterministic Method of Coefficient Estimation ...... 21

2.5 Examples of Signal Approximation ...... 23

2.5.1 Example 1 (Gaussian Function) ...... 24

2.5.2 Example 2 ...... 27

2.6 Selecting a Scaling Parameter ...... 28

3 Partial Discharge Source Classification 32

3.1 Experimental Setup ...... 33

3.2 Data Description ...... 34

3.3 Signal Reconstruction ...... 35

3.4 Removing Signal Delay ...... 37

3.5 Overview of Some Classification Methods ...... 41

3.5.1 Bayes Classifier ...... 42

3.5.2 Linear Discriminant Analysis ...... 43

3.5.3 Quadratic Discriminant Analysis ...... 45

3.5.4 Support Vector Machines ...... 46

3.6 Classification of Experimental Data ...... 49

3.7 Principal Component Analysis ...... 51

3.8 Normalizing the Signals ...... 52

4 System Identification with Laguerre Functions 55

4.1 System Identification ...... 55

4.2 Studies ...... 58

4.2.1 Example 1 ...... 59

4.2.2 Example 2 ...... 63

4.2.3 Example 3 ...... 65

4.3 System Identification Based on Noisy Signals ...... 66

5 Lasso Methodology for System Identification 70

5.1 Introduction ...... 70

5.2 Statistical Approach Towards System Identification ...... 72

5.3 Group Lasso Objective Function ...... 73

5.3.1 Group Lasso ...... 74

5.4 System Identification Using Input and Output Signals . . . . . 76

5.5 Experimental Setup ...... 77

5.6 Estimated Systems using Group Lasso Methodology ...... 78

5.7 Reconstructed Output ...... 80

5.8 Reconstructed Input ...... 81

5.9 Classification on Reconstructed Input ...... 82

5.10 Simulating for Classification using the Input Signal ...... 84

6 Conclusion and Future Work 90

6.1 Numerical Limitations ...... 93

6.2 Future Work ...... 93

A Smoothing the Signal Using Cubic Smoothing Splines 94

Bibliography 96

List of Tables

2.1 Interval for p for different orders...... 31

3.1 The confusion matrix between the actual and predicted labels for the test data using the LDA model...... 50

3.2 The confusion matrix between the actual and predicted labels for the test data using the QDA model...... 50

3.3 The confusion matrix between the actual and predicted labels for the test data using the SVM model with a Gaussian kernel...... 51

4.1 Resistors, capacitors and inductors in the time and Laplace domains. 59

5.1 The confusion matrix between the actual and predicted labels for the test data using the LDA model...... 83

5.2 The confusion matrix between the actual and predicted labels for the test data using the QDA model...... 83

5.3 The confusion matrix between the actual and predicted labels for the test data using the SVM model...... 84

5.4 The confusion matrix between the actual and predicted labels for the test data using the QDA and SVM models of the input and the QDA classification in case 1...... 88

5.5 The confusion matrix between the actual and predicted labels for the test data using the SVM classifier of case 1...... 88

5.6 The confusion matrix between the actual and predicted labels for the test data using the SVM classifier of case 2...... 88

5.7 The confusion matrix between the actual and predicted labels for the test data using the SVM classifier of case 2...... 88

List of Figures

1.1 A blackbox representation of a system...... 1

2.1 A simple Resistor–Capacitor (RC) circuit which is time invariant if the

values of R and C do not change over time. [Source: Oppenheim et al.

(1996)] ...... 14

2.2 First 5 orders of (a) Laguerre polynomials in the y range of −10 to 20

and (b) Laguerre functions with p = 1...... 16

2.3 Laguerre functions of order n = 4 for multiple scaling parameters (p)...... 17

2.4 Gaussian function f1(t) to be approximated using Laguerre functions. 24

2.5 Approximated Gaussian functions using (a) least square and (b) Lasso

objective functions together with the actual function...... 25

2.6 Approximated Gaussian function using least absolute objective for an

initial value of (a) 1 and (b) 0...... 26

2.7 Approximated Gaussian function using the exact method...... 26

2.8 (a) Function f2(t) to be approximated. Laguerre approximation of

f2(t) using (b) least-squares, (c) least absolute and (d) Lasso objective functions...... 28

2.9 Approximated function f2(t) using the exact method...... 29

2.10 (a) Relationship of n and p with log f_T^{1%}(n, p). (b) Fitted relationship of n and p with log f_T^{1%}(n, p), using a quadratic model...... 30

3.1 Discharge sources, (a) a twisted pair of wires and (b) a needle-plane

setup, used to obtain partial discharge signals in this research. . . . . 34

3.2 Sample of PD pulses for the sources (a) twisted pair of wires, (b)

needle-plane setup and (c) combined, obtained in the lab...... 35

3.3 (a) a regular PD pulse in the source with the twisted pair of wires. (b) –

(f) anomaly pulses generated due to the positive half of the high voltage

signal...... 36

3.4 Sample PD pulses and the approximated signals obtained using the Laguerre basis approximation...... 37

3.5 Sample of PD pulses where the delays are removed, and the approxi- mated signal overlaid...... 41

3.6 Three separating hyperplanes, out of infinite possibilities. [Source:

James et al. (2013).] ...... 47

3.7 Support vectors...... 48

3.8 Errors in support vector classification...... 49

3.9 First 3 Laguerre coefficients for the signals from all sources...... 50

3.10 First 3 principal components of the Laguerre coefficients for the signals from all sources...... 52

3.11 Sample of normalized PD pulses and the approximated signal overlaid. 53

3.12 First 3 Laguerre coefficients for the scaled signals from all sources. . . 54

3.13 Misclassification error for the three sources with the number of selected coefficients...... 54

4.1 Circuit diagram of the simulation setup for the example for system identification...... 59

4.2 (a) Input, (b) output and (c) impulse response functions used in example

1 to demonstrate the developed recursive formula for system identification. 62

4.3 (a) Input and (b) output functions used in Example 1 with the Laguerre

basis approximation. An order of 90 and a scaling parameter of 10 was used for the approximation...... 63

4.4 The estimated system using the Laguerre coefficients calculated using the developed recursive formula...... 64

4.5 (a) Input and (b) output functions used in Example 2 with the Laguerre

basis approximation. An order of 90 and a scaling parameter of 10 was used for the approximation...... 65

4.6 Estimated impulse response functions h(t) for (a) p = 10 and (b) p = 0.1. 65

4.7 (a) Input, (b) output and (c) impulse response functions used in example

3 to demonstrate the developed recursive formula for system identification. 66

4.8 The estimated system with the actual unit response function, for Example 3...... 67

4.9 (a) Input and (b) output functions approximated using a Laguerre basis

with an order of 90 and a scaling parameter of 10, when there is noise. 67

4.10 The estimated system with the actual unit response function in the example with noisy input and output functions...... 68

4.11 Mean squared error (MSE) between the estimated and actual systems

for different levels of noise...... 68

5.1 Sample of input-output signal pairs for the three sources, used for system identification...... 79

5.2 Estimated impulse response functions for the three PD sources, using

a group Lasso objective function method with k_y = k_x = k_h = 90 and

m_0 = 2...... 80

5.3 Sample of reconstructed output signal using the estimated system. . . 81

5.4 Sample of reconstructed input signal using the estimated system. . . 82

5.5 First 3 Laguerre coefficients for the estimated input PD signals. . . . 83

5.6 Impulse response function used in this simulation...... 85

5.7 Three cases considered in the simulation study to show that performing classification on the input might be beneficial to classifying on the

output. Setup for (a) case 1, (b) case 2 and (c) case 3...... 86

5.8 Simulated input functions (a) – (b) and output functions (c) – (h)

showing that performing classification on the input might be beneficial to classifying on outputs...... 87

Chapter 1

Introduction

In this chapter, we provide an introduction to systems and system identification, and motivate the problem of partial discharge source classification and system identification. We then review the literature on partial discharge source classification and on current methodology for system identification. Finally, we provide an overview of the thesis and its content.

1.1 Introduction to Systems and Classification

A system, as depicted in Figure 1.1, is defined as a process which takes an input and produces an output. This process can consist of interconnected components, devices or subsystems to convert the input to an output using different methods.

Most systems are complex in nature and work like a blackbox, which means that a precise description of the system is unavailable. For example, an industrial plant


Figure 1.1: A blackbox representation of a system.


that is given raw material as an input and produces finished goods can be seen as a complex system. Individual steps in the manufacturing process can be complicated and interest could lie in examining the system as a whole. A typical room can be

considered as an acoustic system which produces an echo given an audio signal (for

example, playing music). An econometric system is another example, which is driven

by environmental factors such as cultural factors, politics, and economics (Billings,

2013). Understanding the process within the systems can be of importance for a variety of reasons. For example, this can be used to better understand the system and

improve the associated process (Billings, 2013).

System identification is a useful tool in understanding the inner workings of a system. The aim of system identification is to build a mathematical model to explain the process followed by the system. In the literature, system identification is also referred to as deconvolution since it acts as the complement to the convolution operator

in mathematics (Oppenheim et al., 1996).

Classification is another useful technique which has numerous applications. In statistical learning, classification is used to identify the group or category a test observation belongs to, given a set of training data with known groups. An example is classifying a patient as having or not having a disease, given some observed features such as gender, age, and blood pressure. Classification is referred to as a supervised learning approach, which means that the training data contains a set of features as well as a response. In contrast, unsupervised learning is used when the data comes in the form of a set of features without an associated response variable.

In this research, we focus on developing system identification and classification methods using a Laguerre basis function approximation. We aim to propose novel

methods for system identification based on deterministic and stochastic techniques. Applications of system identification are frequently seen in many areas, such as in the medical field to identify how a drug works, in image and audio signal processing, and in environmental sciences to identify the effect of climate

change (Keesman, 2011). In this thesis, we provide applications of our developed

system identification and classification methodologies in a Partial Discharge (PD)

source classification problem in high voltage engineering using experimental data obtained from an experiment conducted at the McMath High Voltage Laboratory at the University of Manitoba. All numerical studies and simulations are conducted

using the statistical software R.

1.2 Problem Definition and Motivation

Consider an LTI (linear time-invariant) system as in Figure 1.1 such that the output y(t) is given by

y(t) = x(t) ∗ h(t),   (1.1)

where x(t) is the input and h(t) represents the impulse response of the system. In most

practical situations, both h(·) and x(·) are unknown. However, in lab experiments

one can often generate x(·) to have some known inputs. This is particularly useful in

system identification problems. In what follows, we consider two important problems

associated with (1.1).

1.2.1 Output Classification

Consider an output signal y(t) that can be assigned a label according to the source

that generated the signal. In output classification, we attempt to identify the label

of a test signal (where the label is unknown), based on a set of outputs where the

labels are known (training signals). In the problem of output classification, we do not

consider effects from x(t) or h(t).

1.2.2 Input Classification and System Identification

Input classification is a similar process to output classification, with the only difference

being that we use the input signals x(t) to perform classification. Since the input is

unknown in most cases, we can initially estimate h(t) and use that information to

estimate x(t).

The process of estimating h(t) is known as system identification. System identifi-

cation techniques rely on having access to a set of known pairs of y(t) and x(t).

1.3 Partial Discharge Classification and System Identification

Increased use of electrical energy, together with increased transmission distances, has led to an increase in transmission voltages. This, in turn, has led to the need to develop more robust electrical insulators as the stress that is put on these insulators increases. At the same time, due to the rising cost of materials and maintenance, these insulators are expected to be small in both size and weight. Electrical insulators prevent the current from flowing between two wires and are an integral part of any electrical system. Most problems that come up in high voltage insulation can be attributed to electrostatic fields which are directly related to the transmission voltage. Due to defects in the insulator or harmful operating conditions, an insulator might not be able to withstand the electrical stress that is put on it. This can have many negative side effects on the safety of the equipment that is connected to a power transmission line with compromised insulation (Kuffel et al., 2000).

Warne and Haddad (2004) define partial discharges (PDs) as localised dielectric breakdowns of the insulation material which occur when the insulator reaches certain electrical stress limits. These types of discharges are known as “partial” since the local breakdown does not result in a complete breakdown. Partial discharges can release energy in the form of heat, light, or sound. The most common way that partial discharges can occur is when there is damage in the insulator due to cracks or voids in a solid insulator, or bubbles within a liquid insulator. This indicates that PD activity can be used as a symptom of a defective insulator, regardless of the type of insulator. This property is useful to understand the extent to which the insulator has deteriorated, which helps engineers in providing maintenance or assessing the risk of insulation failure.

When PDs are generated, they tend to add to the electrical stress of the insulator, resulting in further degradation. Gao and Noda (2005) have shown that if PDs are caused by a crack in a solid insulator, in time the crack can grow due to the same discharges and continue to deteriorate the insulation material. Repeated exposure to PD releases will lead to irreversible mechanical and chemical damage to the insulation material.

While identification of PDs is a well studied problem and is widely used in practice, identification of the source of discharges can also be of importance. Partial discharge source identification, also referred to as PD source separation, can be used in a variety of applications to create an automatic warning system for the source that generates a PD. This can also aid in assessing the risk of PD activity since all sources might not possess the same amount of damage to the insulator. In general, the aim of PD source

identification is to recognize potential issues with the insulator, before a catastrophic failure occurs.

In the literature, PD source identification is done using a variety of tools and techniques. Most methods rely on the assumption that multiple PD signals from a specific source have similar shapes while PD from different sources have different

forms (Contin and Pastore, 2009).

Use of phase resolved partial discharge (PRPD) patterns is a popular method to

identify the PD source which has been studied for a long time (Altenburger et al.,

2002). PRPD patterns can be defined as the illustration of the PD activity relative

to the 360 degrees of an alternating current (AC) cycle. External noise in data

collection as well as many other measurement errors can cause PRPD patterns to have complicated forms, which can limit the usefulness of certain tools in such analysis.

Okubo and Hayakawa (2005) argue that, to identify the physical mechanism of the

PD, time-resolved PD characteristics or current pulse waveforms provide more insight.

There are many studies on PD source identification using waveforms, in either the

time (Okubo and Hayakawa, 2005) or frequency (Hao et al., 2011) domains. The usual

practice is to develop classification tools based on a set of features constructed from PD signals mainly acquired in a laboratory environment. Different approaches for

classification can be seen in the literature, such as, time-series analyses (Cavallini et al.,

2003; Contin and Pastore, 2009), artificial neural networks (Salama and Bartnikas,

2002), fuzzy algorithms (Contin et al., 2002), support vector machines and hidden

Markov models (Janani and Kordi, 2018). Tools such as transforms have been

useful in extracting sets of features as mentioned in Nik Ali et al. (2014) and Janani

et al. (2017). Another common set of features that has been used in the literature would be direct characteristics of PD signals such as the amplitude, rise-time and fall

time. Principal component analysis (PCA) is a common technique in such applications

to reduce the dimension of the feature set (Nik Ali et al., 2014; Alvarez´ et al., 2016).

Most proposed methods rely on the supervision of a skilled technician to perform source classification since they include techniques such as extracting significant

characteristics from each PD signal, which cannot be automated (Janani and Kordi,

2018). In this research, a method was developed for PD source identification based on

representing PD signals using a Laguerre function basis expansion. One advantage of this method would be the use of the entire signal waveform, rather than selecting a set of characteristics from the signal. The Laguerre basis representation also helps to automatically select the necessary features along with the required computational parameters.

This research also aims to examine system identification techniques to extract the underlying system of the PD experimental setup. System identification is used to build a mathematical model for an unknown dynamic system, using a known pair of inputs and outputs. Most methods of system identification are based on least-squares

techniques (Vajda et al., 1988). Earlier applications of these techniques are frequently

seen in the medical field as evident in the works of Vajda et al. (1988); Verotta (1993);

Zacharakis et al. (1999).

Vajda et al. (1988) propose a method of system identification in relation to a

pharmacokinetic application with small samples. The proposed method uses models to perform continuous-time least-squares system identification. Requiring no prior assumptions, such as assuming a functional form for the input

and/or output, is pointed out as an advantage of this technique. Similarly, Verotta

(1993) applies an inequality-constrained linear regression model after using a spline

basis to represent the system effect.

In Marmarelis (1993), a least-squares approach is used to estimate Laguerre

expansions of kernels. This estimate is used to perform system identification on a dynamic non-linear biological system. The proposed method is an extension of the

Volterra-Wiener approach (Wiener and Teichmann, 1959) which uses time-averaging

of samples to estimate the same Laguerre expansions. The authors argue that this method performs well in the presence of noise in the input and output signals and is computationally attractive compared to the method of Wiener and Teichmann

(1959).

A system identification method suitable for online analysis of fluorescence spectroscopy data is introduced by Dabir et al. (2009). The proposed method is based on a

troscopy data is introduced by Dabir et al. (2009). The proposed method is based on a

principle similar to Verotta (1993), the difference being the use of a Laguerre function

as the basis. A least-squares objective function is used to estimate the Laguerre coefficients which are then used to identify the underlying system effect. An iterative nonlinear least-squares optimization technique has been used to estimate the scaling parameter in the Laguerre function and the estimation order. The research by Liu

et al. (2012) follows the same approach as Dabir et al. (2009), while the authors use a

constrained least-squares objective function instead of an ordinary least-squares to address the problem of overfitting. This method is also applied to a case of fluorescence

measurements of biological tissue (Liu et al., 2012).

Our research follows an approach similar to Dabir et al. (2009) and Liu et al. (2012) and

uses a Laguerre expansion to perform system identification. While using a constrained least-squares objective function in the form of a group Lasso model, we also introduce a deterministic relationship with the Laguerre representation of the functions. In addition to system identification, we study the problem of PD source identification through the same Laguerre basis representation.

1.4 Research Contributions

The main objectives of this research are to develop classification methods to identify partial discharge signals from multiple sources and to develop new system identification techniques to help increase the accuracy of the classification. To this end, we use a Laguerre functional basis to approximate partial discharge signals obtained from a lab experiment. To perform the approximation, we considered multiple approaches, such as methods based on objective functions and a method based on the inner product of functions. We explored least-squares, least absolute and Lasso objective functions to estimate the coefficients of the approximation. Out of the considered techniques, the

deterministic method (based on the inner product) of estimation provided the best fit

to the given partial discharge signals. In the approximation process, the selection of a

scaling parameter (a parameter in Laguerre functions) was found to be essential. We

developed two methods to select a proper scaling parameter for Laguerre functions. One approach was to consider the time it takes for a Laguerre function to fall to

1% of its peak and build a quadratic model to capture the relationship of the order of the Laguerre function and the scaling parameter. Based on this, we obtained an interval for the scaling parameter which tends to be from zero to close to the order of the Laguerre function. Alternatively, we used a grid search approach for this parameter to find a value minimizing the mean squared error between the actual and approximated signals.

We evaluated the use of the Laguerre basis in partial discharge source identification. We used the coefficients of the Laguerre basis expansion to classify PD signals into their respective sources. We employed classifiers such as linear discriminant analysis and support vector machines for this analysis. The method of using basis function expansion coefficients for classification performed well in this application, with some classifiers providing low misclassification rates on test data when only the first three expansion coefficients were used. The partial discharge waveforms were then normalized to make them visually similar, such that a person could not distinguish them by eye, and the classification was performed again. It was seen that, even in this case, the proposed method is able to classify the signals to their sources with high accuracy.

We also developed two methods for system identification, based on a deterministic approach in the form of a recursive formula and a stochastic method based on a group Lasso model. The aim of system identification was to improve classification accuracy by removing the effect of the system from the observed PD waveforms. A separate experiment was conducted to implement these techniques, using the same sources and providing a known input to obtain the corresponding output. The system estimated from this experiment was used to identify the actual partial discharge signal from the previous experiment. We finally evaluated the classification performance in cases with or without system identification. We were not able to find a significant improvement in the classification accuracy with the use of system identification. So, we designed a simulation study and showed that this process might be useful in other applications and with different data.

For system identification, we developed a recursive formula to estimate the basis function expansion coefficients for the system effect. This method was unstable in the presence of noise in the input and output signals. Therefore, we developed an alternative method based on a group Lasso objective function which can be minimized to obtain the basis expansion coefficients for the system. This method performed better than the recursive formula when the same signals are used.

1.5 Publications

• Jayasinghe, P., Jafari Jozani, M., Kordi, B. (2019a). Developing New Statistical

Pattern Recognition and System Identification Techniques for Partial Discharge Analysis. Statistical Society of Canada Annual General Meeting 2019, Calgary. May 28, 2019.

• Jayasinghe, P., Jafari Jozani, M., Kordi, B. (2019b). Developing New Statistical

Pattern Recognition and System Identification Techniques for Partial Discharge Analysis. Joint Statistical Meeting 2019, Denver. July 30, 2019.

• Jayasinghe, P., Jafari Jozani, M., Kordi, B. (2019c) New Statistical System

Identification and Classification Techniques for Partial Discharge Analysis. To be published.

1.6 Organization of the Thesis

This research focuses on developing classification and system identification methods with applications in partial discharge analysis. We use a Laguerre function as a basis to approximate PD signals and obtain mathematical expressions for the observed signals. In Chapter 2, we used least-squares, least absolute and Lasso objective functions to estimate the expansion coefficients of the approximation. These objective-function approaches aim to estimate the coefficients which minimize an error measurement

such as the mean squared error (MSE). To estimate the Laguerre basis expansion

coefficients, we also considered an approach based on the inner product of functions. To obtain an estimate for the coefficients using this method, the trapezoidal rule was used. In the process of the approximation, a proper scaling factor in the Laguerre

function should be selected. To this end, an interval for the scaling factor was developed based on a fitted quadratic regression model in Section 2.6.

After properly approximating the PD signal, the coefficients of the approximation were used as a set of features to train classifiers to group PD signals according to their sources. These classifiers are able to accurately estimate the source of a given new PD signal. This process is explained in Chapter 3. A novel method was developed in Section 3.4 to remove the delay from a signal, which is based on the Laguerre basis expansion outlined in Chapter 2. For the purpose of classification, linear discriminant

analysis (LDA), quadratic discriminant analysis (QDA) and support vector machines

(SVM) were used.

Chapters 4 and 5 are dedicated to the problem of system identification. In Chapter 4, we use simulated data to develop a new recursive formula to identify an underlying system, given the pair of input-output signals of the same system. This approach is based on the Laguerre basis approximation that was used throughout this research. Section 4.3 outlines the limitations of the proposed system identification method and suggests the use of a grid search method to select the scaling parameter of the Laguerre basis to be used in the entire process.

In Chapter 5, we develop an alternative method to the recursive formula in Chapter 4 which is suitable even when there is noise in the input and output signals. This approach

is based on a group Lasso regression methodology (Section 5.3.1).

Chapter 6 summarizes the contributions of this research and provides concluding remarks. This chapter also refers to numerical limitations encountered throughout this thesis and provides suggestions for future improvements of this research.

Chapter 2

Signal Approximation

In this chapter, we use Laguerre functions as a basis in approximating signals.

We study different ways of performing the approximations using the least-squares

and least absolute errors as well as an approach based on a Lasso objective

function. We also present an exact method that is derived from the definition

of basis function expansions. We implement our methods on two examples and

propose a method to find a range for the scaling parameter of Laguerre functions.

2.1 Introduction

Depending on the inputs and outputs, there are two main categories of systems. If we have a continuous-time input signal and the system produces a continuous-time output, the system is known as a continuous-time system. For a discrete-time input that produces a discrete-time output, the system is called a discrete-time system. Linearity and time-invariance are two important properties in systems related to signal processing. A system is time-invariant if the characteristics of the system do

not change with time. A simple resistor–capacitor (RC) circuit is an example of a


Figure 2.1: A simple Resistor–Capacitor (RC) circuit which is time invariant if the values of R and C do not change over time. [Source: Oppenheim et al. (1996)]

time-invariant system, if the values of the resistor (R) and the capacitor (C) do not

change over time (Oppenheim et al., 1996).

A linear system should have the property of superposition, which means that if the input consists of a weighted sum of multiple signals, then the output should be the weighted sum of the outputs associated with each input entered into the same system.

For example, if an input x(t) produces the output y(t), the same system given the

input kx(t) should produce the output ky(t), where k ∈ R (Oppenheim et al., 1996).

A discrete-time linear time-invariant system, with a discrete input x[n] and a

discrete output y[n], can be mathematically represented as

y[n] = \sum_{k=−∞}^{∞} x[k] h[n − k],   (2.1)

where h[n] is the effect of the system, also known as the unit impulse response.

Equation (2.1) is known as the convolution sum and can be expressed using the convolution operator as

y[n] = x[n] ∗ h[n].

Similarly, we can represent a continuous-time linear time-invariant system using a

similar form as

y(t) = \int_{−∞}^{∞} x(τ) h(t − τ) dτ,   (2.2)

where x(t) and y(t) are the continuous input and output and h(t) is the unit impulse

response of the system. This representation can also be written using the convolution

operator as y(t) = x(t) ∗ h(t) (Oppenheim et al., 1996). System identification is the

process of estimating the unit impulse response h[n] or h(t) given the inputs and

outputs in either discrete or continuous-time forms (Gu, 2012).
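To make the convolution sum concrete, the following minimal R sketch evaluates (2.1) directly for hypothetical finite-length signals (the name conv_sum is ours and not taken from the thesis; x and h are treated as zero outside the stored samples):

    # Direct evaluation of the convolution sum (2.1) for finite-length signals,
    # using R's 1-based indexing.
    conv_sum <- function(x, h) {
      y <- numeric(length(x) + length(h) - 1)
      for (k in seq_along(x)) {
        idx <- k:(k + length(h) - 1)
        y[idx] <- y[idx] + x[k] * h    # add x[k] * h[n - k] for all n
      }
      y
    }

    # conv_sum(c(1, 2, 3), c(1, 1)) gives 1 3 5 3, the same result as
    # convolve(c(1, 2, 3), rev(c(1, 1)), type = "open")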

To mathematically represent the input x(t), the output y(t) or the impulse response

functions h(t), we use a basis function approximation. According to the approximation

theory (Lorentz, 1973), under suitable regularity conditions one can use basis functions

to mathematically represent any function. A continuous function f(·) can be expressed

as a linear combination of a set of basis functions in the same functional space as

f(t) = \sum_{i=0}^{∞} f_i φ_i(t),   (2.3)

where φ_i(t) is a suitable basis function with a corresponding coefficient f_i in the linear combination.

2.2 Laguerre Functions

We can use Laguerre functions as a basis for the purpose of approximating partial discharge signals. We selected Laguerre functions as the basis because they possess

some important characteristics with respect to a specific inner product (e.g., see Lemma

4.1.1). Another reason for selecting the Laguerre basis is a special property of these

functions, which helps us with the process of system identification as described in Chapters 4 and 5.


Figure 2.2: First 5 orders of (a) Laguerre polynomials in the y range of −10 to 20 and (b) Laguerre functions with p = 1.

A Laguerre function of order n and parameter p > 0 can be defined as

l_n^p(t) = (−1)^n \sqrt{2p} e^{−pt} L_n(2pt);  t ≥ 0, n = 0, 1, 2, ...,   (2.4)

where L_n(t) is the Laguerre polynomial of order n given by

L_n(t) = \sum_{k=0}^{n} \binom{n}{k} \frac{(−t)^k}{k!};  t ≥ 0, n = 0, 1, 2, ....   (2.5)

In Laguerre functions, p is a scaling parameter which controls the rate at which

the Laguerre functions approach zero. For any p > 0 the set {l_n^p(t)}_{n=0}^{∞} forms an orthonormal basis (Budke, 1989). We can also write the orthonormality property of

Laguerre functions using an inner product as follows

⟨l_i^p(t), l_j^p(t)⟩ = 0 if i ≠ j, and 1 if i = j.   (2.6)

The first 5 Laguerre polynomials and Laguerre functions when p = 1 are shown in

Figures 2.2(a) and 2.2(b), respectively.

When the scaling parameter (p) is large, the function goes to zero faster as we can

Figure 2.3: Laguerre functions of order n = 4 for multiple scaling parameters (p).

see in Figure 2.3. Similarly, if the scaling parameter is small, it will take longer for the Laguerre functions to go to zero.

Using the closed form expression for the Laguerre polynomials is computationally intensive in a case where multiple Laguerre basis functions are needed and there are

multiple values of t to consider. According to Szegő (1939, pp. 102), a recursive

formula for Laguerre polynomials is given by

L_0(t) = 1,

L_1(t) = 1 − t, and

L_k(t) = \frac{(2k − 1 − t) L_{k−1}(t) − (k − 1) L_{k−2}(t)}{k};  for any k ≥ 2,   (2.7)

which will significantly reduce the computation times when evaluating high order Laguerre functions.
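A minimal R sketch of this recursion, combined with the definition (2.4), might look as follows; the names laguerre_poly and laguerre_fn are ours and are not part of the thesis code:

    # Laguerre polynomials L_0(t), ..., L_n(t) via the recursion (2.7);
    # returns a length(t) x (n + 1) matrix whose column k + 1 holds L_k(t).
    laguerre_poly <- function(t, n) {
      L <- matrix(0, nrow = length(t), ncol = n + 1)
      L[, 1] <- 1                      # L_0(t) = 1
      if (n >= 1) L[, 2] <- 1 - t      # L_1(t) = 1 - t
      if (n >= 2) {
        for (k in 2:n) {
          L[, k + 1] <- ((2 * k - 1 - t) * L[, k] - (k - 1) * L[, k - 1]) / k
        }
      }
      L
    }

    # Laguerre functions l_n^p(t) of (2.4) for orders 0, ..., n and scaling p.
    laguerre_fn <- function(t, n, p) {
      L <- laguerre_poly(2 * p * t, n)
      sweep(L, 2, (-1)^(0:n), `*`) * (sqrt(2 * p) * exp(-p * t))
    }

For instance, laguerre_fn(seq(0, 10, by = 0.01), 4, 1) evaluates the first five Laguerre functions shown in Figure 2.2(b).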

To be used as a basis function to approximate a signal, the Laguerre functions

should have a finite energy. The energy of a Laguerre function (E_∞) can be expressed as

E_∞ = \int_0^{∞} |l_n^p(t)|^2 dt < ∞,   (2.8)

where |·| is the magnitude of a complex number, n is the order of the Laguerre function

and p is the scaling parameter. According to Budke (1989), for any p > 0, {l_n^p(t)}_{n=0}^{∞} produces a basis for the Hilbert space L^2([0, ∞)) which satisfies the condition (2.8).

2.3 Estimating Coefficients

As mentioned in Section 2.2, we are aiming to mathematically represent partial

discharge pulses using Laguerre functions. Suppose y(t) is the underlying partial

discharge. Using approximation theory, y(t) can be expressed as

y(t) = \sum_{j=0}^{∞} y_j l_j^p(t),   (2.9)

where l_j^p(t) are Laguerre functions evaluated at time t and y_j are coefficients that need to be evaluated or estimated.

Instead of considering all basis functions, in practice, we can select a finite number

of basis functions k_y (< ∞) such that the approximated signal is close to the actual function. The approximated signal can be expressed as

y(t) ≈ ŷ(t) = \sum_{j=0}^{k_y} y_j l_j^p(t),   (2.10)

where y_j are coefficients that need to be evaluated or estimated.

Let the observed time points be denoted by t_i; i = 1, ..., N, and let t = {t_i}_{i=1}^{N}. Let also ŷ(t) be defined as

ŷ(t) = ( \sum_{j=0}^{k_y} y_j l_j^p(t_1), \sum_{j=0}^{k_y} y_j l_j^p(t_2), ..., \sum_{j=0}^{k_y} y_j l_j^p(t_N) )^\top.   (2.11)

Then, the expression in (2.10) using all the time points t can be written as,

ŷ(t) = L_p y,   (2.12)

with

L_p = \begin{pmatrix} l_0^p(t_1) & l_1^p(t_1) & \cdots & l_{k_y}^p(t_1) \\ l_0^p(t_2) & l_1^p(t_2) & \cdots & l_{k_y}^p(t_2) \\ \vdots & \vdots & & \vdots \\ l_0^p(t_N) & l_1^p(t_N) & \cdots & l_{k_y}^p(t_N) \end{pmatrix},   (2.13)

and

y = (y_0, y_1, ..., y_{k_y})^\top.   (2.14)

To approximate the partial discharge pulses, we have to evaluate or estimate the vector y or in other words the coefficients yj; j = 0, . . . , ky. For this purpose, we can define an objective function which is then minimized to obtain the estimated coefficients. There are different objective functions that we can consider. For this research, we have considered three such functions.

2.3.1 Least-Squares Objective Function

The first objective function that we considered is the least-squares objective function as specified below:

O_{LS} = \sum_{t=t_1}^{t_N} (y(t) − ŷ(t))^2 = \sum_{t=t_1}^{t_N} \left( y(t) − \sum_{j=0}^{k_y} y_j l_j^p(t) \right)^2.   (2.15)

The solution that we get by minimizing this objective function with respect to y is the same as the one obtained by fitting a linear regression model of the form y(t) = L_p y + ε(t) and estimating its parameters. According to the solution of a linear regression model, we can estimate the coefficients as

y = (L_p^\top L_p)^{−1} L_p^\top y(t),   (2.16)

given that L_p^\top L_p is non-singular.
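As a sketch, and using the hypothetical laguerre_fn helper from Section 2.2, the estimate (2.16) can be computed directly; equivalently, the basis columns can be passed to lm, the route taken in Section 2.5.1:

    # Least-squares estimate (2.16) of the Laguerre coefficients.
    # y: sampled signal, t: observation times, order: k_y, p: scaling parameter.
    fit_laguerre_ls <- function(y, t, order, p) {
      Lp <- laguerre_fn(t, order, p)                   # N x (k_y + 1) basis matrix
      coefs <- solve(crossprod(Lp), crossprod(Lp, y))  # (Lp' Lp)^{-1} Lp' y
      list(coef = drop(coefs), fitted = drop(Lp %*% coefs))
    }
    # Equivalently: coef(lm(y ~ Lp - 1)), fitting the basis columns without intercept.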

2.3.2 Least Absolute Objective Function

One can also consider the least absolute objective function for parameter estimation as given below:

O_{Abs} = \sum_{t=t_1}^{t_N} |y(t) − ŷ(t)| = \sum_{t=t_1}^{t_N} \left| y(t) − \sum_{j=0}^{k_y} y_j l_j^p(t) \right|.   (2.17)

This objective function can be minimized to obtain the coefficients that we need. The estimated coefficients do not have a closed form solution and we can use a numerical minimization algorithm to obtain the required results.
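A sketch of this numerical minimization in R, using nlminb (the optimizer also mentioned in Section 2.5.1) and the hypothetical laguerre_fn helper:

    # Least absolute deviation fit of (2.17).  The result is sensitive to the
    # starting values 'start', as illustrated later in Example 1.
    fit_laguerre_lad <- function(y, t, order, p, start = rep(1, order + 1)) {
      Lp  <- laguerre_fn(t, order, p)
      obj <- function(coefs) sum(abs(y - Lp %*% coefs))
      nlminb(start, obj)$par
    }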

2.3.3 Lasso Objective Function

The Lasso is a method popularized by Tibshirani (1996), which works as a regression

modelling technique and provides variable selection within the same process. The main motivation for using Lasso regression models is that they can provide much more interpretable models due to the variable selection component and, at the same time, CHAPTER 2. SIGNAL APPROXIMATION 21

they show the stability of a ridge regression model (Tibshirani, 1996). This property

can be useful in our application too since we can ignore the basis functions which have coefficients close to zero. The Lasso objective function can be written in the form of,

O_{Lasso} = \sum_{t=t_1}^{t_N} (y(t) − ŷ(t))^2 + λ \sum_{j=0}^{k_y} |y_j| = \sum_{t=t_1}^{t_N} \left( y(t) − \sum_{j=0}^{k_y} y_j l_j^p(t) \right)^2 + λ \sum_{j=0}^{k_y} |y_j|.   (2.18)

In the Lasso objective function, λ is an extra shrinking parameter that needs to be estimated. We can use the parameter λ to balance between accuracy and simplicity of the model. The optimal λ for a given set of values can be found using cross-validation

(Tibshirani, 1996).
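As a sketch, the Lasso fit of (2.18) with λ chosen by cross-validation can be obtained with cv.glmnet from the glmnet package (the function used in Section 2.5.1); fit_laguerre_lasso is our hypothetical wrapper around it:

    library(glmnet)

    # Lasso estimate of the Laguerre coefficients (2.18); lambda is selected by
    # cross-validation.  The intercept is suppressed since the basis itself
    # represents the signal.
    fit_laguerre_lasso <- function(y, t, order, p) {
      Lp <- laguerre_fn(t, order, p)
      cv <- cv.glmnet(Lp, y, alpha = 1, intercept = FALSE)
      as.numeric(coef(cv, s = "lambda.min"))[-1]   # drop the intercept row
    }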

In the Lasso method, the objective is to shrink all the coefficients towards zero. If we want only a group of coefficients to be penalized and the rest to be unpenalized, we can use the group Lasso objective function (Friedman et al., 2010). We will be

discussing Lasso and group Lasso methods in detail in Section 5.3.1.

2.4 Deterministic Method of Coefficient Estimation

We also looked at a mathematical method to estimate the same set of parameters, without using an objective function. Let y(t) be defined as in (2.9). Then the inner

product of y(t) and l_i^p(t) can be used to get an expression for the coefficient y_i. To

this end, note that

⟨y(t), l_i^p(t)⟩ = ⟨ \sum_{j=0}^{∞} y_j l_j^p(t), l_i^p(t) ⟩ = \sum_{j=0}^{∞} y_j ⟨l_j^p(t), l_i^p(t)⟩.

Since ⟨l_i^p(t), l_j^p(t)⟩ = 1 if i = j and 0 otherwise, one can easily observe that

y_i = ⟨y(t), l_i^p(t)⟩.   (2.19)

This result can be rewritten using the definition of the inner product as,

y_i = ⟨y(t), l_i^p(t)⟩ = \int_0^{∞} y(τ) l_i^p(τ) dτ.   (2.20)

The coefficients that we are estimating will be directly related to the corresponding Laguerre basis and do not depend on the number of terms of the expansion that we

have. Therefore, the coefficients of the approximated signal in (2.10) can also be written the same way.

In the case of signal approximation using a basis function, the true form of y(t)

is unknown. An approximation to (2.20) can be given using the trapezoidal rule for

integral approximation, when y(t) is observed at multiple discrete time points. To

this end, we have

y_i = \int_0^{∞} y(τ) l_i^p(τ) dτ ≈ \sum_{j=1}^{N−1} \frac{1}{2} [t_{j+1} − t_j] [y(t_j) l_i^p(t_j) + y(t_{j+1}) l_i^p(t_{j+1})] := y_i^*.   (2.21)

If the signal is observed at equal time gaps, t_{j+1} − t_j = Δt for any j = 1, ..., N − 1,

(2.21) can be simplified as

y_i^* = \frac{Δt}{2} \sum_{j=1}^{N−1} [y(t_j) l_i^p(t_j) + y(t_{j+1}) l_i^p(t_{j+1})] = \frac{Δt}{2} [y(t_1) l_i^p(t_1) + 2 y(t_2) l_i^p(t_2) + ··· + 2 y(t_{N−1}) l_i^p(t_{N−1}) + y(t_N) l_i^p(t_N)].   (2.22)

We call this the exact method since this uses the definition of the inner product to obtain the coefficient of the basis expansion mathematically, rather than using an objective function to estimate the coefficients statistically. Throughout the thesis we will use the term exact method to refer to this method and the coefficients obtained using (2.20) or the approximated coefficients given in (2.22).
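A sketch of (2.22) for equally spaced samples, again assuming the hypothetical laguerre_fn helper from Section 2.2:

    # "Exact" (inner-product) coefficients of (2.22) via the trapezoidal rule,
    # assuming equally spaced time points t.
    fit_laguerre_exact <- function(y, t, order, p) {
      Lp <- laguerre_fn(t, order, p)
      dt <- t[2] - t[1]
      w  <- c(1, rep(2, length(t) - 2), 1)      # trapezoidal weights
      drop((dt / 2) * crossprod(Lp, w * y))     # one coefficient per order
    }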

2.5 Examples of Signal Approximation

To demonstrate the above approximation methods, we work with two functions that will also be used in Chapter 4. We use these functions to generate necessary signals to

be used as inputs and/or outputs in our simulation for system identification. System

identification is the process of building a mathematical model for a dynamic system

by looking at the way the system behaves (Sage and Melsa, 1971). In these studies,

Figure 2.4: Gaussian function f_1(t) to be approximated using Laguerre functions.

we consider two settings. In the first example, there is a delay in the start of the input signal and we study the effect of this delay on the signal approximation using our proposed method.

2.5.1 Example 1 (Gaussian Function)

The first example takes the form of a Gaussian function given by

f_1(t) = e^{−2(t−5)^2},   (2.23)

and illustrated in Figure 2.4. In this function, we can see that the start of the signal is not the same as the start of the experiment.

When using the least-squares objective function to approximate f_1(t), we can think in terms of a linear regression model where the parameters should be estimated as mentioned in Section 2.3.1. We get a closed form expression for the coefficients which can be

easily obtained using the function lm in R.
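Putting the sketched helpers from Sections 2.2–2.4 together on this example (the grid, order and scaling parameter below are arbitrary illustrative choices, not the exact values used in the thesis):

    # Approximate f_1(t) = exp(-2 (t - 5)^2) on a time grid.
    t <- seq(0, 10, by = 0.01)
    y <- exp(-2 * (t - 5)^2)                                    # the Gaussian target (2.23)

    ls_fit      <- fit_laguerre_ls(y, t, order = 20, p = 1)     # least squares
    lad_coefs   <- fit_laguerre_lad(y, t, order = 20, p = 1)    # least absolute
    exact_coefs <- fit_laguerre_exact(y, t, order = 20, p = 1)  # exact method

    mean((y - ls_fit$fitted)^2)   # MSE of the least-squares reconstruction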

The approximated functions at orders 1, 3, 10, 20, 50 are shown in Figure 2.5(a).

We can see that the linear model is unable to estimate the required parameters for the approximation of order 50. This is a common issue for higher orders and occurs


Figure 2.5: Approximated Gaussian functions using (a) least square and (b) Lasso objective functions together with the actual function.

because the Laguerre functions at higher orders look similar and are highly correlated. This issue will be discussed in detail in Section 6.1.

We can use an optimization function such as nlminb in R for the numerical minimization of the least absolute objective function (2.17). The resulting fits for the

same orders as above are shown in Figure 2.6(a). This objective function gives a good

fit to the original function, especially at higher orders.

The initial value for all the coefficients was selected to be 1 for the numerical minimization in the case shown in Figure 2.6(a). We can see that the program isn’t able

to estimate the coefficients properly if we select the initial value as 0 (Figure 2.6(b)).

This indicates that the method of the least absolute objective function is sensitive to the initial value that we select, which can be a disadvantage in an automated process where we cannot check the fit for each signal.

In order to estimate the coefficients using the Lasso objective, we used the

cv.glmnet function in the package glmnet in R. Figure 2.5(b) provides the fitted curves for different orders for the function. This result also shows that increasing CHAPTER 2. SIGNAL APPROXIMATION 26


Figure 2.6: Approximated Gaussian function using least absolute objective for an initial value of (a) 1 and (b) 0.

Figure 2.7: Approximated Gaussian function using the exact method.

the order does not make much of a difference for the fit of the function. This is understandable since Lasso shrinks parameters towards zero and has shrunk all high order coefficients to zero.

We can also use the exact method mentioned in Section 2.4 which would provide a good fit for the actual function as seen in Figure 2.7. CHAPTER 2. SIGNAL APPROXIMATION 27

2.5.2 Example 2

We now look at another function used in Chapter 4, of the form

f_2(t) = g(t) − g(t − 2),   (2.24)

where

g(t) = \left[ 1 − e^{−t/2} \left( \cos\left(\frac{\sqrt{3}}{2} t\right) + \frac{\sqrt{3}}{3} \sin\left(\frac{\sqrt{3}}{2} t\right) \right) \right] u(t),   (2.25)

and

u(t) = 0 for t < 0, and u(t) = 1 for t > 0.   (2.26)

The form of the function f2(t) is as shown in Figure 2.8(a). The approximated functions using least-squares, least absolute, Lasso objective functions and the exact

method are shown in Figures 2.8(b), 2.8(c), 2.8(d) and 2.9, respectively.

Using these graphs we can see that all four methods can approximate the given function well. Estimation using the least-squares objective function failed for the case of order 50, similarly to the previous example, due to the singular design matrix. The Lasso method has underestimated the function in this example too, due to the tendency of the Lasso approach to shrink coefficients towards zero. The exact method has been able to properly estimate the function at a lower order compared to the other methods.

We can see that all the methods have performed as well here, or even better, than in the previous example for lower order approximations. This suggests that if we have a time lag in the signal, we will have to use a higher order than what we would use for a signal without a time lag. We should pay attention to the order, which can depend on the function that we try to approximate; at the same time, we should select a proper value for the scaling parameter p as well.



Figure 2.8: (a) Function f2(t) to be approximated. Laguerre approximation of f2(t) using (b) least-squares, (c) least absolute and (d) Lasso objective functions.

2.6 Selecting a Scaling Parameter

In the process of approximating a partial discharge signal using the Laguerre basis functions, we need to choose the scaling parameter (p) properly such that the approximated function represents the true signal waveform. For the purpose of identifying this parameter, Yuan et al. (2005) has outlined a relationship between the order of the Laguerre basis (n) and the scaling factor (p).

As stated by Saboktakinrizi (2011), f_T^{1%}(n) is the time in which the Laguerre

Figure 2.9: Approximated function f_2(t) using the exact method.

function of order n takes to fall to 1% of its peak value. The time f_T^{1%}(n), n and p have been empirically observed to follow a relationship in the form of

f_T^{1%}(n) = 1.92n + 2.64.   (2.27)

Suppose τ is the time window of the signal to be approximated. Saboktakinrizi (2011)

expressed a relationship between τ, f_T^{1%}(n) and p as follows,

τ ≤ \frac{f_T^{1%}(n)}{p}.   (2.28)

We aim to provide a simple model f_T^{1%}(n, p) which can be used to replace the

term f_T^{1%}(n)/p. The relationship of n and p with log f_T^{1%}(n, p) is shown in Figure

2.10(a).

Judging by the shape of the relationship in Figure 2.10(a), we used a quadratic

model to approximate log f_T^{1%}(n, p). Specifically, we estimated the parameters using a linear regression model including linear and quadratic terms as predictors. This provided a model in the form of

E[log f_T^{1%}(n, p)] = 0.84 + 74.34n − 82.85p − 9.84np − 27.53n^2 + 33.35p^2,   (2.29)


Figure 2.10: (a) Relationship of n and p with log f_T^{1%}(n, p). (b) Fitted relationship of n and p with log f_T^{1%}(n, p), using a quadratic model.

which is also shown in Figure 2.10(b). We can use this model together with the

expression τ ≤ f_T^{1%}(n, p) to get a range for p. To this end, note that τ ≤ f_T^{1%}(n, p) is

equivalent to log(τ) ≤ log f_T^{1%}(n, p) and accordingly,

log(τ) ≤ 0.84 + 74.34n − 82.85p − 9.84np − 27.53n^2 + 33.35p^2.   (2.30)

This gives us an interval for p given by [p_1, p_2], where

p_1, p_2 := max\left\{ 0, \frac{82.85 + 9.84n ± \sqrt{3769.82n^2 − 8286.94n + 6751.98 + 133.41 \log(τ)}}{66.70} \right\}.   (2.31)

Table 2.1 provides the values of p1 and p2 when τ = 1.
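A sketch of (2.31) in R (scaling_interval is our name, not one used in the thesis); for τ = 1 it reproduces the entries of Table 2.1:

    # Interval [p1, p2] for the scaling parameter from (2.31);
    # tau is the time window of the signal to be approximated.
    scaling_interval <- function(n, tau = 1) {
      disc  <- 3769.82 * n^2 - 8286.94 * n + 6751.98 + 133.41 * log(tau)
      roots <- (82.85 + 9.84 * n + c(-1, 1) * sqrt(disc)) / 66.70
      pmax(0, roots)
    }

    scaling_interval(40)   # approximately c(0, 43), matching Table 2.1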

Saboktakinrizi (2011) argued that the range for p can also be bounded by the frequency

content of the signal. If the frequency at which the magnitude of the Laguerre

function l_n^p(jω) of order n, where j is the imaginary unit, falls to 10% of its

peak value is calculated using f_freq^{10%}(p) = 1.59p, then the bandwidth of the signal (BW) should satisfy BW ≤ f_freq^{10%}(p). Considering both criteria, we suggest using a value for p which is close to the upper limit, which tends to be close to the order that we are using to approximate the partial discharge signal.

Table 2.1: Interval for p for different orders.

    n                      0     40    80     120    160    200
    Lower Interval (p_1)   0     0     0      0      0      0
    Upper Interval (p_2)   2.5   43    85.7   128.4  171.1  213.8

Since the margins of this interval are wide, we used a grid search approach to find

the best-suited scaling parameter, by calculating the mean squared error (MSE)

between the actual and approximated functions and then selecting the scaling parameter which provides the minimum MSE.
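A sketch of this grid search, using the hypothetical helpers from the earlier sections (the grid itself is an arbitrary illustrative choice):

    # Select the scaling parameter p by grid search: minimize the MSE between
    # the observed signal and its Laguerre reconstruction (exact method).
    select_p <- function(y, t, order, p_grid = seq(0.5, order, by = 0.5)) {
      mse <- sapply(p_grid, function(p) {
        Lp    <- laguerre_fn(t, order, p)
        coefs <- fit_laguerre_exact(y, t, order, p)
        mean((y - Lp %*% coefs)^2)
      })
      p_grid[which.min(mse)]
    }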

In this chapter, we used the Laguerre basis to approximate a known function. We used least-squares, least absolute and Lasso objective functions to estimate the expansion coefficients of the approximation. To estimate the Laguerre basis expansion coefficients, we also considered a deterministic approach based on the inner product of functions which we coined the exact method. To obtain an estimate for the coefficients using the exact method, the trapezoidal rule was used. In the process of the approximation, a proper scaling factor should be selected. To this end, an interval for the scaling factor was developed based on a fitted quadratic regression model. In the next chapter, we will discuss partial discharge source identification using coefficients of the Laguerre basis approximation.

Chapter 3

Partial Discharge Source Classification

In this chapter, we develop partial discharge source classification techniques

using Laguerre functions for signal approximation and implement them on a

real data application. To this end, we work with a set of partial discharge

signals obtained from a lab experiment and approximate them using Laguerre

basis functions. We then use the coefficients of the approximation as a set

of features for classification. We develop different classifiers to classify the

partial discharge pulses to their respective sources. To be more specific, we

perform linear discriminant analysis (LDA), quadratic discriminant analysis

(QDA), and develop support vector machines (SVM) to classify each signal to its

corresponding source. Also, we explore the use of principal component analysis

(PCA) in this context and compare the results with the results obtained without

the use of PCA.

3.1 Experimental Setup

The partial discharge (PD) signals that we used in this research were obtained from

an experiment conducted by Dr. Saeed Shahabi (Shahabi, 2019) and Mr. Ali Nasr

Esfahani (Nasr Esfahani, 2018) in the McMath High Voltage Laboratory at the

University of Manitoba. The signals needed for system identification, which will be detailed in Chapter 5, were separately obtained at the same lab with the help of the same researchers.

We studied two types of sources to collect partial discharge signals in both experiments. One source consisted of a twisted pair of wires as shown in Figure 3.1(a).

The other source consisted of a needle-plane setup (see Figure 3.1(b)). These two

sources were used in parallel and we considered the combined setup as a third partial discharge source, which will be called the combined source. Throughout the thesis, the source with twisted pairs is referred to as source 1, the source with the needle-plane setup as source 2 and both sources in parallel as source 3.

These partial discharge setups were connected to a high voltage source of 3 kV with the partial discharge measurement system connected to a ground wire. An oscilloscope was used to record the partial discharge waveform.

The oscilloscope discretizes the continuous input that is given to it and records the amplitude of the signal at regular intervals. When recording the waveforms in

this experiment, a rate of 1 GigaSamples per second (GSa/s) was used.

The sampling rate of an oscilloscope is the number of samples or the number of data points observed in one second. When a higher sampling rate is used, the recorded signal is much closer to the true continuous signal and at the same time requires more processing power to obtain and more space to store. Therefore, as a compromise


Figure 3.1: Discharge sources, (a) a twisted pair of wires and (b) a needle-plane setup, used to obtain partial discharge signals in this research.

between being close to the continuous signal and not exhausting the oscilloscope's resources, a 1 GSa/s rate was selected. Since there are $10^9$ points within a time window of 1 second, the time gap between two points is $10^{-9}$ seconds, or 1 nanosecond (ns).

During the experiment, the pressure was fixed at 33 kilopascal (kPa) and the frequency

at 2000 Hertz (Hz).

3.2 Data Description

In the data set collected through the experiment, there are 1024 pulses associated with

source 1 (twisted pairs), 512 pulses associated with source 2 (needle-plane setup) and

512 pulses associated with source 3 (combined). A sample of a single pulse from each

source is shown in Figure 3.2. The time axis in Figure 3.2 is given in microseconds

(µs) or $10^{-6}$ seconds, and the oscilloscope considers the time to be zero (0) at the point when the partial discharge pulse triggered the oscilloscope. This leads to reporting the


Figure 3.2: Sample of PD pulses for the sources (a) twisted pair of wires, (b) needle- plane setup and (c) combined, obtained in the lab.

time values as negative in the plots and these time values do not have any meaning in the case of this analysis.

When we analysed all the pulses for the source with twisted pairs of wires, we noticed some pulses with a significantly different pattern than the regular PD pulses

(Figure 3.3). This can be attributed to the positive half of the high voltage input

signal used in this experiment. These pulses can be considered as noise and therefore these anomalies were removed from the data.

We also observed that these anomaly pulses tend to occur one after another. To remove them, a pulse from the twisted-pair source whose amplitude crosses 0.005 on the positive side was identified as the first anomalous pulse, and it was removed from the data together with the 4 successive pulses. After this operation, only 653 pulses are left for source 1.
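A minimal sketch of this cleaning rule in R is given below; the list pulses (one recorded source-1 pulse per element, in acquisition order) is an assumed data layout, and the 0.005 threshold is the one quoted above.

```r
# Flag pulses whose positive amplitude crosses the 0.005 threshold.
flagged <- which(vapply(pulses, function(x) any(x > 0.005), logical(1)))

# Drop each flagged pulse together with the 4 pulses that follow it.
drop_idx <- unique(unlist(lapply(flagged, function(i) i:min(i + 4, length(pulses)))))
pulses_clean <- if (length(drop_idx)) pulses[-drop_idx] else pulses
```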

3.3 Signal Reconstruction

We used the approach mentioned in Chapter 2 to mathematically represent each partial discharge signal in our dataset. Recall that we can approximate a given signal using



Figure 3.3: (a) a regular PD pulse in the source with the twisted pair of wires. (b) – (f) anomaly pulses generated due to the positive half of the high voltage signal.

the basis function expansion as $y(t) = \sum_{j=0}^{k_y} y_j\, l_j^p(t)$, where $y_j$ are the coefficients of the basis functions and $l_j^p(t)$ is a Laguerre function of order $j$ with a scaling parameter $p$. We used a time window $t \in [0, 1]$ since the time recorded in the experiment has no specific meaning, as mentioned in Section 3.2.

Figure 3.4 shows that the approximated signals follow the general pattern of the PD signal, but do not reveal the entire shape of the signal. This is similar to the example in Section 2.5.1, which illustrates the difficulty in observing a basis function approximation close to the original signal, when the signal does not start at time 0. Therefore, a reasonable approach is to remove the delay from the PD signals to make them start at time 0.


Figure 3.4: Sample PD pulses and the approximated signals obtained using the Laguerre basis approximation for sources (a) twisted pair of wires, (b) needle-plane setup and (c) combined. An order of 90 and a scaling parameter of 10 was used for the approximation.

3.4 Removing Signal Delay

In order to approximately identify the starting point of a signal, we propose a method based on the energy of a signal. This method will help us to remove the delay from the signal which will lead to a better approximation of the PD signal, when a Laguerre basis function approximation is used.

Let the total energy of a signal be denoted by $E_\infty$. According to Oppenheim et al. (1996), $E_\infty$ can be written as
$$E_\infty = \langle y(t), y(t)\rangle = \int_0^\infty |y(t)|^2\, dt. \quad (3.1)$$

Using a Laguerre expansion representation of a signal, $y(t) = \sum_{j=0}^{\infty} y_j\, l_j^p(t)$, we can rewrite (3.1) as

$$E_\infty = \int_0^\infty \left[\sum_{i=0}^{\infty} y_i\, l_i^p(t)\right]\left[\sum_{j=0}^{\infty} y_j\, l_j^p(t)\right] dt = \int_0^\infty \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} y_i y_j\, l_i^p(t)\, l_j^p(t)\, dt. \quad (3.2)$$

Since Laguerre functions form an orthonormal basis on $\mathbb{R}^+$, one can easily show that the total energy $E_\infty$ of a signal $y(t)$ is
$$E_\infty = \sum_{i=0}^{\infty} y_i^2 \int_0^\infty \left[l_i^p(t)\right]^2 dt = \sum_{i=0}^{\infty} y_i^2. \quad (3.3)$$

Similarly, the energy of the signal at a time $T\ (<\infty)$ is defined as
$$E_T = \int_0^T |y(t)|^2\, dt = \int_0^T \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} y_i y_j\, l_i^p(t)\, l_j^p(t)\, dt. \quad (3.4)$$

Using Jensen's inequality,
$$|E_T| \le \int_0^T \left|\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} y_i y_j\, l_i^p(t)\, l_j^p(t)\right| dt, \quad (3.5)$$
and with the use of the triangle inequality we obtain
$$\int_0^T \left|\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} y_i y_j\, l_i^p(t)\, l_j^p(t)\right| dt \le \int_0^T \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \left|y_i y_j\, l_i^p(t)\, l_j^p(t)\right| dt, \quad (3.6)$$
which leads to the absolute energy being bounded above as follows:
$$|E_T| \le \int_0^T \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} \left|y_i y_j\, l_i^p(t)\, l_j^p(t)\right| dt. \quad (3.7)$$

The multiplicative property of the absolute value allows us to rewrite this as
$$|E_T| \le \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} |y_i||y_j| \int_0^T |l_i^p(t)|\,|l_j^p(t)|\, dt. \quad (3.8)$$

Our objective is to find $T$ such that $E_T = E_\infty P$, where $P$ is a desired proportion. According to Love (1997), absolute Laguerre functions are bounded above as follows:
$$|l_N^p(t)| \le \frac{\Gamma(N+p+1)\,\Gamma\!\left(p+\tfrac{1}{2}\right)}{|\Gamma(N+1)|\,\Gamma(p+1)\left|\Gamma\!\left(p+\tfrac{1}{2}\right)\right|}\, e^{t}. \quad (3.9)$$

This leads to the expression
$$|l_i^p(t)|\,|l_j^p(t)| \le \left[\frac{\Gamma(i+p+1)\,\Gamma\!\left(p+\tfrac{1}{2}\right)}{|\Gamma(i+1)|\,\Gamma(p+1)\left|\Gamma\!\left(p+\tfrac{1}{2}\right)\right|}\, e^{t}\right]\left[\frac{\Gamma(j+p+1)\,\Gamma\!\left(p+\tfrac{1}{2}\right)}{|\Gamma(j+1)|\,\Gamma(p+1)\left|\Gamma\!\left(p+\tfrac{1}{2}\right)\right|}\, e^{t}\right] = \frac{\Gamma(i+p+1)\,\Gamma(j+p+1)}{|\Gamma(i+1)|\,|\Gamma(j+1)|}\left[\frac{\Gamma\!\left(p+\tfrac{1}{2}\right)}{\Gamma(p+1)\left|\Gamma\!\left(p+\tfrac{1}{2}\right)\right|}\right]^2 e^{2t}, \quad (3.10)$$
and since $i$ and $j$ are positive integers,
$$|l_i^p(t)|\,|l_j^p(t)| \le \frac{(i+p)(j+p)\,\Gamma(i+p)\,\Gamma(j+p)}{i!\,j!}\left[\frac{\Gamma\!\left(p+\tfrac{1}{2}\right)}{\Gamma(p+1)\left|\Gamma\!\left(p+\tfrac{1}{2}\right)\right|}\right]^2 e^{2t} = A_{i,j,p}\, B_p\, e^{2t}, \quad (3.11)$$
where $A_{i,j,p} = \frac{(i+p)(j+p)\,\Gamma(i+p)\,\Gamma(j+p)}{i!\,j!}$ and $B_p = \left[\frac{\Gamma\left(p+\frac{1}{2}\right)}{\Gamma(p+1)\left|\Gamma\left(p+\frac{1}{2}\right)\right|}\right]^2$. Then,

$$\int_0^T |l_i^p(t)|\,|l_j^p(t)|\, dt \le A_{i,j,p}\, B_p \int_0^T e^{2t}\, dt = \frac{A_{i,j,p}\, B_p}{2}\left(e^{2T}-1\right). \quad (3.12)$$

Therefore, expression (3.8) becomes
$$|E_T| \le \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} |y_i||y_j| \int_0^T |l_i^p(t)|\,|l_j^p(t)|\, dt \le \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} |y_i||y_j|\, \frac{A_{i,j,p}\, B_p}{2}\left(e^{2T}-1\right). \quad (3.13)$$

This yields the expression
$$|E_\infty P| \le \frac{B_p}{2}\left(e^{2T}-1\right)\sum_{i=0}^{\infty}\sum_{j=0}^{\infty} |y_i||y_j|\, A_{i,j,p}. \quad (3.14)$$
Let $C_p = \sum_{i=0}^{\infty}\sum_{j=0}^{\infty} |y_i||y_j|\, A_{i,j,p}$. Then,
$$E_\infty P \le \frac{B_p\, C_p}{2}\left(e^{2T}-1\right). \quad (3.15)$$

Finally, we get a bound for the time $T$ at which the signal reaches a certain proportion ($P$) of the total energy:
$$T \ge \frac{1}{2}\log\left(\frac{2 E_\infty P}{B_p C_p} + 1\right). \quad (3.16)$$

One can use this bound to identify the beginning of the signal given an appropriate proportion $P$. Using (3.16) and selecting the proportions to be 0.064, 0.064 and 0.1 for the twisted pair, needle-plane and combined sources, respectively, we found the starting point of the PD signals. When the PD signals with the delays removed were approximated, the fits improved significantly (Figure 3.5).
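A minimal R sketch of the bound (3.16) is shown below; the vector coef (the Laguerre expansion coefficients $y_0,\dots,y_k$ of a pulse), the function name delay_bound and the use of log-scale gamma terms to avoid overflow are illustrative assumptions, not the thesis implementation.

```r
delay_bound <- function(coef, p, P) {
  k    <- length(coef) - 1
  # A_{i,j,p} computed on the log scale to avoid overflow for large orders.
  A    <- outer(0:k, 0:k, function(i, j)
            exp(log(i + p) + log(j + p) + lgamma(i + p) + lgamma(j + p) -
                lfactorial(i) - lfactorial(j)))
  Bp   <- (gamma(p + 0.5) / (gamma(p + 1) * abs(gamma(p + 0.5))))^2
  Cp   <- sum((abs(coef) %o% abs(coef)) * A)
  Einf <- sum(coef^2)                       # total energy, from (3.3)
  0.5 * log(2 * Einf * P / (Bp * Cp) + 1)   # lower bound on T, from (3.16)
}

# Example call for one pulse (values are placeholders):
# T_start <- delay_bound(coef, p = 10, P = 0.064)
```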

In the field of signal processing, researchers tend to remove the noise from the signals before approximation, which is referred to as denoising or smoothing. Techniques such as wavelet transformation and filtering or some statistical methods are commonly used for denoising in practice. In this analysis, we do not require an additional step


Figure 3.5: Sample of PD pulses where the delays are removed, and the approximated signal overlaid, for sources (a) twisted pair of wires, (b) needle-plane setup and (c) combined. An order of 90 and a scaling parameter of 10 was used for the approximation.

for smoothing, as often done in the literature, since the basis function approximation acts as a smoother. This helps us to reduce the computations required. A description of some smoothing methods is given in Appendix A.

The coefficients obtained in the process of approximating the PD signals can be used as a set of features, to reach our objective of classifying the partial discharge signals to their respective sources. An advantage of the proposed method is the ability to perform signal smoothing and construct the set of features simultaneously.

3.5 Overview of Some Classification Methods

In statistical learning, we can identify two main types of learning methods, namely, supervised and unsupervised learning. Supervised learning methods refer to having an output or a response which we hope to predict using a set of inputs also referred to as predictors, features or independent variables. In contrast, unsupervised learning methods have a set of inputs while there is no response variable. Regression and classification are the two most popular supervised learning approaches. A problem can be called a regression problem if the response is quantitative in nature and a

classification problem if the response is qualitative in nature. There are also many instances in which a regression-based approach can be used to perform classification. There are many classification techniques that can be used, the most popular ones being linear and quadratic discriminant analysis (LDA and QDA, respectively), logistic regression, K-nearest neighbours and support vector machines (SVM) (James et al., 2013).

We can formulate the classification problem mathematically, using a set of training data $(X_i, Y_i)$, $i = 1,\dots,N$, where $Y_i$ is the class of the $i$th observation and $X_i \in \mathbb{R}^p$ are the $p$ features of the $i$th observation. If the response has $K$ classes $\{0, 1,\dots,K-1\}$, a classification rule or a classifier is a function $h : \mathcal{X} \to \{0, 1,\dots,K-1\}$, where $\mathcal{X}$ is the domain of $X$. Regardless of the actual class labels, we can encode the original class labels to $\{0, 1,\dots,K-1\}$ and build a classifier. We denote the predicted class of the $i$th observation as $\hat{Y}_i = h(x_i)$, where $x_i$ are the measured features of that observation (James et al., 2013).

In binary classification, there are 2 classes in the response ($K = 2$) and the classification rule is defined as $h : \mathcal{X} \to \{0, 1\}$. We can use a loss function to express the error associated with the binary classification problem as
$$L(y_i, \hat{y}_i) = \begin{cases} 0 & \text{when } y_i = \hat{y}_i, \\ 1 & \text{when } y_i \ne \hat{y}_i. \end{cases} \quad (3.17)$$

We can obtain the classification risk of the classifier $h$ as $R(h) = E\left(L(Y, h(X))\right)$. One can show that $R(h) = \Pr\left(Y \ne h(X)\right)$, which is the probability of misclassification, and minimize the risk to obtain an optimal classifier.

3.5.1 Bayes Classifier

The naive Bayes classifier is built on the concept of minimizing the classification risk.

Suppose that the classification problem has a response with $K\,(\ge 2)$ classes or, in other words, $K$ possible distinct and unordered values. We denote the probability that

a randomly chosen observation comes from the $k$th class as $\pi_k$. This probability is known as the overall or prior probability of class $k$. Let $f_k(x) = \Pr(X = x \mid Y = k)$ represent the density function of the features $X$, given that the observation actually

comes from the kth class (James et al., 2013).

Using Bayes' theorem,
$$p_k(x) := \Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}. \quad (3.18)$$

Through this expression, we can obtain the probability that a certain observation belongs to a specific class, given the predictor value $x$. This leads us to define the Bayes classifier as $\operatorname{argmax}_k\, p_k(x)$. In other words, each observation is assigned to the class which has the largest posterior probability (James et al., 2013).

One can also formulate the Bayes classifier using the risk function which is R(h) =

1 − Pr (Y = h(X)). We can show that minimizing this risk function provides the same

classifier.

To use (3.18), we need knowledge of πk and fk(x). πk can be estimated easily using a random sample of responses from the population and finding the proportions

of observations which belong to each class. Estimating fk(x) is not simple and the practice is to assume a simple distribution and use it instead. One can also use

non-parametric techniques to estimate fk(x).

3.5.2 Linear Discriminant Analysis

Linear discriminant analysis (LDA) assumes a Gaussian distribution for $f_k(x)$. Initially, let's assume a classification problem that uses LDA with a single predictor variable.

In this approach, we assume that in class $k$, the predictor variable $X$ is distributed as
$$f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\left(-\frac{1}{2\sigma_k^2}(x - \mu_k)^2\right), \quad (3.19)$$

where $\mu_k$ and $\sigma_k^2$ are the mean and variance of $X$ in the $k$th class, respectively. In LDA, for simplicity, we assume that the variances of the different classes are equal ($\sigma_1^2 = \cdots = \sigma_K^2 = \sigma^2$). Then (3.18) can be expressed as
$$p_k(x) = \frac{\pi_k \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2\sigma^2}(x-\mu_k)^2\right)}{\sum_{l=1}^{K}\pi_l \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2\sigma^2}(x-\mu_l)^2\right)} = \frac{\pi_k \exp\left(-\frac{1}{2\sigma^2}(x-\mu_k)^2\right)}{\sum_{l=1}^{K}\pi_l \exp\left(-\frac{1}{2\sigma^2}(x-\mu_l)^2\right)}. \quad (3.20)$$

By taking the natural logarithm of both sides of the equation, we obtain
$$\log\left(p_k(x)\right) = \log(\pi_k) + \frac{x\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} - \frac{x^2}{2\sigma^2} - \log\left\{\sum_{l=1}^{K}\pi_l \exp\left(-\frac{1}{2\sigma^2}(x-\mu_l)^2\right)\right\}. \quad (3.21)$$

Intuition suggests selecting the $k$ which produces the largest $p_k(x)$ by looking at the $k$ which gives the largest value of
$$\delta_k(x) = x\,\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k), \quad (3.22)$$
for a given set of features $X = x$, since the term $-\frac{x^2}{2\sigma^2} - \log\left\{\sum_{l=1}^{K}\pi_l \exp\left(-\frac{1}{2\sigma^2}(x-\mu_l)^2\right)\right\}$ is independent of $k$.

We will also need to replace the parameters $\mu_k$ and $\sigma^2$ with their estimates
$$\hat{\mu}_k = \frac{1}{N_k}\sum_{i:\, y_i = k} x_i \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{N-K}\sum_{k=1}^{K}\sum_{i:\, y_i = k} (x_i - \hat{\mu}_k)^2, \quad (3.23)$$

where $N$ is the total number of training data points and $N_k$ is the number of training data points in class $k$. We can estimate $\pi_k$ as
$$\hat{\pi}_k = \frac{N_k}{N}. \quad (3.24)$$

All of these results lead us to define the LDA classifier as
$$\hat{\delta}_k(x) = x\,\frac{\hat{\mu}_k}{\hat{\sigma}^2} - \frac{\hat{\mu}_k^2}{2\hat{\sigma}^2} + \log(\hat{\pi}_k). \quad (3.25)$$
The LDA classifier is linear in terms of $x$, which has led to this method being called linear discriminant analysis.

We can extend this result to a setting with $p$ features. In this case we consider a multivariate Gaussian distribution $N(\mu_k, \Sigma)$, where $\mu_k$ is the mean vector for class $k$ and $\Sigma$ is the covariance matrix common to all $K$ classes. Then we can define the classifier to be
$$\delta_k(x) = x^{T}\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^{T}\Sigma^{-1}\mu_k + \log(\pi_k), \quad (3.26)$$
(James et al., 2013).

3.5.3 Quadratic Discriminant Analysis

Quadratic discriminant analysis (QDA) is similar to LDA, the one major difference being that we do not assume the same covariance structure over all classes (James et al., 2013). Instead, we consider a separate covariance structure $\Sigma_k$ for the $k$th class. This leads to the QDA classifier being defined as
$$\delta_k(x) = -\frac{1}{2}(x - \mu_k)^{T}\Sigma_k^{-1}(x - \mu_k) + \log(\pi_k). \quad (3.27)$$

The classifier is a quadratic function of the features, which leads to the name. The choice between LDA and QDA comes down to the bias-variance trade-off. LDA uses a linear decision boundary, which is less flexible compared to the decision boundary of a QDA model. But the QDA model has to estimate many more additional parameters, since it assumes separate covariance matrices for the different classes. LDA is widely considered to have better performance compared to QDA if the number of training observations is small, and QDA seems to perform better with a large number of training data (James et al., 2013).

3.5.4 Support Vector Machines

Support vector machines (SVM) were developed by computer scientists in the 1990s

and have become popular since then. SVMs are based on a simple classifier called the maximum margin classifier. This is a simple and elegant classifier which gives a linear decision boundary in its original form making it unsuitable for most problems. This linear boundary can be extended to accommodate nonlinear classifiers which produce

nonlinear SVMs (James et al., 2013).

Maximum margin classifiers are based on hyperplanes in the p-dimensional space. A hyperplane is a flat affine subspace with a dimension of p − 1 which can be expressed as,

β0 + β1X1 + β2X2 + ··· + βpXp = 0. (3.28)

For a given training observation (x1, . . . , xp), if a separating hyperplane exists, we can use it to classify based on what side of the hyperplane the observation falls on. In

other words, if $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p > 0$, the observation lies on one side of the hyperplane, whereas if $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p < 0$, the observation lies on the other side. Naturally, this method will only be able to classify binary-class cases,

but we can extend the method to a multi-class case with a voting system (James et al.,

2013).

Figure 3.6: Three separating hyperplanes, out of infinite possibilities. [Source: James et al. (2013).]

If the training data are linearly separable with a hyperplane, we will be able to find an infinite number of such hyperplanes (Figure 3.6). With the maximal margin hyperplane, or optimal separating hyperplane, we aim to find the plane which is the furthest from the training observations that fall above and below it. This is done by calculating the perpendicular distance from each training observation to the separating hyperplane, as shown in Figure 3.7. The smallest of these distances is called the margin. In this classifying method, we hope to find the separating hyperplane such that this margin is maximized (James et al., 2013).

In practice, one might not always be able to classify the data perfectly using a linear separating hyperplane. Even if we are able to find such a plane, it could be very sensitive to individual observations. To get rid of this sensitivity, one can define a support vector classifier which does not separate the observations perfectly. With this classifier, we allow a small subset of observations to be on the wrong side of the margin, as shown in Figure 3.8 (James et al., 2013).

Figure 3.7: An example of a two class problem with the maximal margin separating hyperplane shown in a solid black line. The distance between the solid and dashed lines corresponds to the margin. The points marked on the margins are known as support vectors. [Source: James et al. (2013).]

Support vector machines are an extension of support vector classifiers which can produce nonlinear decision boundaries. This is done by expanding or projecting the feature space into another dimension in which these data can be easily separated using

a linear hyperplane. This projection is done with the help of kernels (James et al.,

2013).

Kernels typically construct a kernel matrix to project the data into another dimension. One advantage of kernels is that, although they change the dimension of the data (typically to a higher dimension), they do not require any calculations in those higher dimensions (Wood, 2017).

Figure 3.8: Observations 1 and 8 are on the wrong side of the margin while 2 and 9 are on the margins. Other observations are classified correctly. [Source: James et al. (2013).]

3.6 Classification of Experimental Data

We use the classification methods discussed earlier to automatically classify the partial discharge signals from the experiment into their respective sources. We used the first 3 coefficients of the Laguerre basis approximation as the set of features for the

classification. An initial visual inspection (Figure 3.9) shows these coefficients can

easily identify the source that each signal belongs to.

Before performing classification, we separated 75% of the signals from each source as training data and the rest as test data. Using the training data, we fitted the LDA model using the lda function in the MASS package in R and the QDA model using the qda function in the same package. For the SVM model we used the svm function in the e1071 package. We applied a Gaussian kernel in SVM and used the tune function to select the optimal cost and gamma parameters using 10-fold cross-validation. To evaluate the performance of these classifiers, we predicted the source for the test data using all three methods and compared the results with the actual source labels.
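The R sketch below mirrors these steps with the packages named above; the data frame feat, with columns c1, c2, c3 (the first three Laguerre coefficients) and a factor column source, is an assumed layout rather than the thesis code.

```r
library(MASS)    # lda(), qda()
library(e1071)   # svm(), tune()

set.seed(1)
# 75% of the pulses from each source form the training set.
train_id <- unlist(lapply(split(seq_len(nrow(feat)), feat$source),
                          function(i) sample(i, floor(0.75 * length(i)))))
train <- feat[train_id, ]
test  <- feat[-train_id, ]

fit_lda <- lda(source ~ c1 + c2 + c3, data = train)
fit_qda <- qda(source ~ c1 + c2 + c3, data = train)

# Gaussian (radial) kernel SVM; tune() uses 10-fold cross-validation by default.
tuned   <- tune(svm, source ~ c1 + c2 + c3, data = train, kernel = "radial",
                ranges = list(cost = 10^(-1:3), gamma = 10^(-2:1)))
fit_svm <- tuned$best.model

# Misclassification rates on the held-out test data.
mean(predict(fit_lda, test)$class != test$source)
mean(predict(fit_qda, test)$class != test$source)
mean(predict(fit_svm, test) != test$source)
```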

Figure 3.9: First 3 Laguerre coefficients for the signals from all sources.

Table 3.1: The confusion matrix between the actual and predicted labels for the test data using the LDA model.

                                  Actual
                      Twisted Pair   Needle-Plane   Both
  Predicted
    Twisted Pair           163             0           1
    Needle-Plane             0            62          21
    Both                     0            66         106

Table 3.2: The confusion matrix between the actual and predicted labels for the test data using the QDA model.

                                  Actual
                      Twisted Pair   Needle-Plane   Both
  Predicted
    Twisted Pair           163             0           0
    Needle-Plane             0           108           0
    Both                     0            20         128

Table 3.1 shows that the LDA classifier has a misclassification error rate of 21.00% on the test data when 3 Laguerre coefficients are selected as features. Comparatively, QDA has a lower misclassification error of 4.77% (Table 3.2). SVM has performed the best, with a rate of 3.10%, as evident from Table 3.3. This indicates that the method of using Laguerre basis expansion coefficients in identifying the source of the PD yields

Table 3.3: The confusion matrix between the actual and predicted labels for the test data using the SVM model with a Gaussian kernel.

                                  Actual
                      Twisted Pair   Needle-Plane   Both
  Predicted
    Twisted Pair           163             0           1
    Needle-Plane             0           125          10
    Both                     0             3         118

results with high accuracies in this case, particularly when used with QDA and SVM.

3.7 Principal Component Analysis

Principal component analysis (PCA) is a commonly used method in the context of

partial discharge source classification. Principal components (PCs) project a set of

data into a different space where the variables are uncorrelated and ordered according to their variance. Since the PCs are ordered according to their variance, we might be able to summarize a large set of variables with a smaller number of representative variables. This smaller number of PCs has the potential to explain most of the variability of the original set of features. Let there be a set of features denoted by

X1,X2,...,Xp. Then the first principal component can be defined as the normalized linear combination of the features

Z1 = a11X1 + a21X2 + ··· + ap1Xp, (3.29)

that has the largest variance. The coefficients of the linear combination $a_{11}, \dots, a_{p1}$ are known as the loadings of the first PC and are normalized in the sense that they satisfy $\sum_{j=1}^{p} a_{j1}^2 = 1$. Similarly, we can find the successive principal components. When we find these loadings such that the variance is maximized, it can be shown that they are the elements of an eigenvector of the covariance matrix of the features (James et al., 2013).

Figure 3.10: First 3 principal components of the Laguerre coefficients for the signals from all sources.

Using the data from the lab experiment, we used the prcomp function in R to find PCs for the coefficients of the Laguerre basis expansion. Visualization of the first 3 PCs in Figure 3.10 shows that the PCs are not as easily separable compared to the case in which the Laguerre coefficients are used directly. When we train the classifiers taking all the PCs as the features, the classifiers show very poor classification power, with misclassification rates of 23.63%, 48.21% and 23.63% for LDA, QDA and SVM, respectively. Therefore, we do not suggest using this method when the Laguerre coefficients are used as the set of features.
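For illustration, the short R sketch below shows how such a PCA could be obtained with prcomp; the matrix coef_mat (one row per pulse, one column per Laguerre coefficient) is an assumed object, and whether to center or scale the coefficients is a choice not specified here.

```r
# Principal components of the Laguerre expansion coefficients.
pc     <- prcomp(coef_mat, center = TRUE, scale. = FALSE)
scores <- pc$x                          # PC scores, one row per pulse

# Variance explained by the first three PCs.
summary(pc)$importance[, 1:3]
```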

3.8 Normalizing the Signals

Figure 3.5 shows that the three partial discharge sources we considered typically produce different waveforms. Although the needle-plane and combined PD sources show similar waveforms, the source with the twisted pair of wires produces a PD with a lower amplitude. Since we can visually separate some signals into their respective sources,


Figure 3.11: Sample of normalized PD pulses and the approximated signal overlaid, for sources (a) twisted pair of wires, (b) needle-plane setup and (c) combined. An order of 90 and a scaling parameter of 10 was used for the approximation.

one can argue against the need to use classifiers such as LDA and SVM. Therefore, to evaluate the robustness of the method introduced earlier, we normalized all the signals and repeated the classification procedure. The resulting signal waveforms are shown in Figure 3.11 in which the pulses from the 3 sources are visually similar.

The normalized signals gave us the first three coefficients as shown in Figure 3.12 which shows that it is slightly difficult to classify the needle-plane and the combined sources with only 3 coefficients.

When we trained the classifiers using the 3 coefficients, we observed misclassification rates of 5.97%, 2.63% and 0.95% for LDA, QDA and SVM, respectively, on the test data. Figure 3.13 shows that the test error becomes minimum when 7 coefficients from the Laguerre function expansion of the signals are used.

This indicates that the implemented method can be used even if the signals are visually similar. Another advantage of this method is the low number of Laguerre coefficients needed to satisfactorily classify the sources of partial discharges. This will reduce the computation burden and time.

In this chapter, we used the coefficients of the Laguerre basis approximation as a

Figure 3.12: First 3 Laguerre coefficients for the scaled signals from all sources.

Figure 3.13: Misclassification error for the three sources with the number of selected coefficients.

set of features to train classifiers to group PD signals according to their sources. In the process of classification, a novel method, based on the same Laguerre expansion, was developed to remove the delay from a signal. For the purpose of classification, linear

discriminant analysis (LDA), quadratic discriminant analysis (QDA) and support vector machines (SVM) were used. In the next chapter, we discuss system identification

and develop a novel method to identify a system given the input and output functions.

Chapter 4

System Identification with Laguerre Functions

In this chapter, we develop a system identification approach using a Laguerre

functional basis expansion. To evaluate our proposed methodology, we perform

simulation studies by simulating examples of an electrical system with an input

and corresponding output. The aim is to estimate the system using our proposed

approach. In order to understand the process of estimating a system given

an input and an output, also known as system identification, we discuss the

convolution operator and related concepts.

4.1 System Identification

Discrete and continuous linear time-invariant (LTI) systems were introduced in Section

2.1. We explained that a continuous LTI system with a unit impulse response $h(t)$ can be expressed as
$$y(t) = \int_0^\infty x(\tau)\, h(t-\tau)\, d\tau, \quad (4.1)$$
where $x(t)$ is the input and $y(t)$ is the corresponding produced output, since the time

$t \in \mathbb{R}^+$. Assuming no prior information on the mathematical forms of the input and output, or on the system, our objective is to estimate the impulse response function (system effect) $h(t)$, given $x(t)$ and $y(t)$. First, we approximate the input and output signals using Laguerre basis functions as
$$x(t) = \sum_{i=0}^{k_x} x_i\, l_i^p(t), \quad y(t) = \sum_{j=0}^{k_y} y_j\, l_j^p(t) \quad \text{and} \quad h(t) = \sum_{m=0}^{k_h} h_m\, l_m^p(t),$$

where $x_i$, $y_j$ and $h_m$ are basis expansion coefficients for the input, output and the system, respectively. With the Laguerre basis approximation of these functions, we can build a relationship between the input, output and the system. To this end, we use the convolution operator:
$$y(t) = x(t) * h(t) = \int_0^\infty x(\tau)\, h(t-\tau)\, d\tau. \quad (4.2)$$

Using the basis function representations for $x(t)$, $y(t)$ and $h(t)$ we get:
$$\sum_{j=0}^{k_y} y_j\, l_j^p(t) = \int_0^t \left[\sum_{i=0}^{k_x} x_i\, l_i^p(\tau)\right]\left[\sum_{m=0}^{k_h} h_m\, l_m^p(t-\tau)\right] d\tau = \int_0^t \sum_{i=0}^{k_x}\sum_{m=0}^{k_h} x_i h_m\, l_i^p(\tau)\, l_m^p(t-\tau)\, d\tau = \sum_{i=0}^{k_x}\sum_{m=0}^{k_h} x_i h_m \int_0^t l_i^p(\tau)\, l_m^p(t-\tau)\, d\tau = \sum_{i=0}^{k_x}\sum_{m=0}^{k_h} x_i h_m \left[l_i^p(t) * l_m^p(t)\right], \quad (4.3)$$
where the expression $\int_0^t l_i^p(\tau)\, l_m^p(t-\tau)\, d\tau$ is the same as $l_i^p(t) * l_m^p(t)$.

Lemma 4.1.1. The convolution of two Laguerre functions with the same scaling parameter $p$ can be rewritten as follows:
$$\sqrt{2p}\left[l_i^p(t) * l_m^p(t)\right] = l_{i+m}^p(t) + l_{i+m+1}^p(t) \quad \text{for any } i, m \ge 0. \quad (4.4)$$

Proof. See Budke (1989).

This property is useful in rewriting (4.3) as
$$\sum_{j=0}^{k_y} y_j\, l_j^p(t) = \frac{1}{\sqrt{2p}}\sum_{i=0}^{k_x}\sum_{m=0}^{k_h} x_i h_m \left[l_{i+m}^p(t) + l_{i+m+1}^p(t)\right], \quad (4.5)$$
which can be used to build expressions for calculating the coefficients of the system.

To estimate the coefficients hm, we look at different orders of Laguerre functions on either side of the equation. Equating corresponding Laguerre functions will result in a relationship between the coefficients of the Laguerre function representation of the input, output and the system.

Considering the 0th order Laguerre functions, we obtain
$$y_0 = \frac{1}{\sqrt{2p}}\, x_0 h_0, \quad (4.6)$$
which leads to estimating $h_0$ as
$$h_0 = \frac{1}{x_0}\sqrt{2p}\, y_0. \quad (4.7)$$

Similarly, by equating the coefficients of the 1st order Laguerre functions, we get
$$y_1 = \frac{1}{\sqrt{2p}}\left(x_0 h_0 + x_0 h_1 + x_1 h_0\right), \quad (4.8)$$
which leads to
$$h_1 = \frac{1}{x_0}\left(\sqrt{2p}\, y_1 - x_0 h_0 - x_1 h_0\right) = \frac{1}{x_0}\left(\sqrt{2p}\, y_1 - \sqrt{2p}\, y_0 - x_1 h_0\right). \quad (4.9)$$

Continuing this process while looking at the pattern of these expressions, we can

obtain an expression for a Laguerre expansion coefficient hm as in Theorem 4.1.2.

Theorem 4.1.2. Let there be a continuous linear time-invariant (LTI) system, with a unit impulse response h(t), an input x(t) and a corresponding output y(t). If h(t),

x(t) and y(t) are approximated using a Laguerre basis expansion as

$$x(t) = \sum_{i=0}^{k_x} x_i\, l_i^p(t), \quad y(t) = \sum_{j=0}^{k_y} y_j\, l_j^p(t) \quad \text{and} \quad h(t) = \sum_{m=0}^{k_h} h_m\, l_m^p(t),$$

the coefficients $h_m$ can be calculated using any of the following recursive formulas:
$$h_m = \frac{1}{x_0}\left[\sqrt{2p}\, y_m - \sqrt{2p}\sum_{j=1}^{m}(-1)^{m+j}\, y_{j-1} - \sum_{i=0}^{m-1} h_i\, x_{m-i}\right], \quad \text{or} \quad (4.10)$$
$$h_m = \frac{1}{x_0}\left[\sqrt{2p}\, y_m - \sum_{i=1}^{m} h_{m-i}\left(x_i + x_{i-1}\right)\right], \quad \text{or} \quad (4.11)$$
$$h_m = \frac{1}{x_0}\left[\sqrt{2p}\, y_m - \sum_{i=1}^{m}\left(h_{m-i}\, x_i + h_{m-i}\, x_{i-1}\right)\right]. \quad (4.12)$$

To use the approach in (4.5) we should have orders that satisfy ky = kx +kh +1, for the equality to hold. If the interest is only in estimating the coefficients of the system, we can just consider the first kh Laguerre functions on either side of the equation.
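As an illustration of Theorem 4.1.2, the R sketch below implements the recursion in the form (4.12); the vector names x_coef and y_coef (the Laguerre expansion coefficients of the input and output, of length at least $k_h + 1$) and the function estimate_system are hypothetical, not the thesis code.

```r
estimate_system <- function(x_coef, y_coef, p, kh) {
  h <- numeric(kh + 1)
  h[1] <- sqrt(2 * p) * y_coef[1] / x_coef[1]            # h_0, from (4.7)
  if (kh >= 1) {
    for (m in 1:kh) {
      s <- 0
      for (i in 1:m) {
        # sum_{i=1}^{m} (h_{m-i} x_i + h_{m-i} x_{i-1}); R vectors are 1-indexed.
        s <- s + h[m - i + 1] * x_coef[i + 1] + h[m - i + 1] * x_coef[i]
      }
      h[m + 1] <- (sqrt(2 * p) * y_coef[m + 1] - s) / x_coef[1]
    }
  }
  h   # estimated coefficients h_0, ..., h_kh
}
```

Because each $h_m$ depends on all previous ones, numerical errors propagate through this loop, which is consistent with the sensitivity to noise and to the scaling parameter reported later in this chapter.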

4.2 Simulation Studies

In this section, we build simulation studies to demonstrate the usability of this method and to identify its limitations. CHAPTER 4. SYSTEM IDENTIFICATION WITH LAGUERRE FUNCTIONS 59

Figure 4.1: Circuit diagram of the simulation setup for the example for system identification.

4.2.1 Example 1

We consider a simple resistor-inductor-capacitor (RLC) circuit, shown in Figure 4.1,

as our underlying system for this simulation study. In using this setup, we can use Laplace transformation of circuit components to build a mathematical function for

the system. The Laplace transform of a function f(t) is defined as

$$F(s) = \mathcal{L}_f\{s\} = \int_0^\infty f(t)\, e^{-st}\, dt, \quad (4.13)$$
where $s$ is a complex number. The components of this circuit can be expressed in the

Laplace domain as in Table 4.1 (Izadian, 2019).

Table 4.1: Resistors, capacitors and inductors in the time and Laplace domains.

  Component    Time Domain   Laplace Domain
  Resistor          R              R
  Capacitor         C            1/(sC)
  Inductor          L             sL
  Voltage           V             V/s

Considering Ohm's law for this circuit (refer to Izadian (2019)), we can express the relationship
$$V = IR, \quad (4.14)$$
where $V$ is the voltage, $I$ is the current that flows through the circuit and $R$ the resistance of the circuit. The Laplace transformation of Ohm's law can be expressed as
$$V(s) = i(s)\, Z(s), \quad (4.15)$$
where $Z(s)$ is known as the Laplace impedance of the circuit elements. Since the

L-R-C components in Figure 4.1 are in series, $Z(s)$ can be written as the sum of the individual Laplace domain representations of the components:
$$Z(s) = R + sL + \frac{1}{sC}. \quad (4.16)$$

Therefore, (4.15) can be written as
$$\frac{V}{s} = i(s) \times \left(R + sL + \frac{1}{sC}\right). \quad (4.17)$$
Then,
$$i(s) = \frac{V}{s\left(R + sL + \frac{1}{sC}\right)} = V\left(\frac{1}{s^2 L + sR + \frac{1}{C}}\right). \quad (4.18)$$

Considering $R = 1\,\Omega$, $L = 1\,\mathrm{H}$, $C = 1\,\mathrm{F}$ and $V = 1\,\mathrm{V}$, we can express the Laplace transformation of $h(t)$ as
$$H(s) = \frac{1}{s^2 + s + 1}, \quad (4.19)$$
where the Laplace transformation $H(s)$ of the function $h(t)$ is defined to be
$$H(s) = \int_0^\infty h(t)\, e^{-st}\, dt. \quad (4.20)$$

Inverse Laplace transformation of $H(s)$ gives the function $h(t)$ as
$$h(t) = \frac{2}{\sqrt{3}}\exp\left(-\frac{t}{2}\right)\sin\left(\frac{\sqrt{3}}{2}\, t\right). \quad (4.21)$$

We used a square pulse as the input to the system, with an amplitude of 1 V and a 2 second duration. The input can be mathematically represented as
$$x(t) = u(t) - u(t-2), \quad (4.22)$$
where
$$u(t) = \begin{cases} 0 & \text{when } t < 0, \\ 1 & \text{when } t > 0. \end{cases} \quad (4.23)$$

The function u(t) is known as a continuous-time unit step function in the field of

electrical engineering (Oppenheim et al., 1996). Note that it is discontinuous at t = 0.

The Laplace transformation $X(s)$ of the input function takes the form
$$X(s) = \mathcal{L}_x(s) = \int_0^\infty \left[u(t) - u(t-2)\right] e^{-st}\, dt = \int_0^\infty e^{-st}\, dt - \int_2^\infty e^{-st}\, dt = \frac{1}{s} - \frac{e^{-2s}}{s} = \frac{1}{s}\left(1 - e^{-2s}\right). \quad (4.24)$$

Since the convolution of two functions in the time domain corresponds to the multiplication of their transforms in the Laplace domain, the Laplace transformation of the output can be expressed as
$$Y(s) = X(s)\, H(s) = \frac{1 - e^{-2s}}{s\left(s^2 + s + 1\right)}, \quad (4.25)$$


Figure 4.2: (a) Input, (b) output and (c) impulse response functions used in Example 1 to demonstrate the developed recursive formula for system identification.

which we can use to identify the output function through an inverse Laplace transformation as follows:

$$y(t) = y_1(t) - y_1(t-2), \quad (4.26)$$
where
$$y_1(t) = \left\{1 - e^{-\frac{t}{2}}\left[\cos\left(\frac{\sqrt{3}}{2}\, t\right) + \frac{\sqrt{3}}{3}\sin\left(\frac{\sqrt{3}}{2}\, t\right)\right]\right\} u(t). \quad (4.27)$$

For numerical evaluation of these functions, t was selected to be a sequence from 0 to 20 with increments of 0.01. Figure 4.2 depicts the resulting signals for the input,

output and the system (impulse response function). Figure 4.3 shows that the Laguerre

function basis expansion has been able to closely approximate the input and output functions.

We used the developed recursive formula to estimate the system, assuming it is unknown. The resulting function is shown in Figure 4.4. We see that the two functions line up perfectly on top of each other, indicating a good estimate of the system. This suggests that this recursive formula provides a good way to obtain the coefficients associated with the system.


Figure 4.3: (a) Input and (b) output functions used in Example 1 with the Laguerre basis approximation. An order of 90 and a scaling parameter of 10 was used for the approximation.

4.2.2 Example 2

In this example, we choose a Gaussian input signal which can be expressed as

$$x(t) = a \exp\left(-\frac{(t-b)^2}{c}\right), \quad (4.28)$$
with $a = 1$, $b = 5$ and $c = \frac{1}{2}$. We specifically choose $b$ to introduce a delay in the start of the signal, an issue which was discussed in Section 2.5 when approximating signals using a Laguerre basis.

We used the same system as in Example 1 (Section 4.2.1), with the unit impulse response
$$h(t) = \frac{2}{\sqrt{3}}\, e^{-\frac{t}{2}}\sin\left(\frac{\sqrt{3}}{2}\, t\right). \quad (4.29)$$

Also, we used the convolution operator to numerically evaluate the output as
$$y(t) = \int_0^t x(\tau)\, h(t-\tau)\, d\tau. \quad (4.30)$$
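A minimal R sketch of this numerical evaluation is given below, using the functional forms (4.28) and (4.29) and a simple Riemann-sum approximation of the convolution integral on a regular grid; the grid, step size and use of convolve() are illustrative choices, not the thesis code.

```r
t  <- seq(0, 20, by = 0.01)
dt <- 0.01
x  <- exp(-(t - 5)^2 / 0.5)                                  # Gaussian input (4.28)
h  <- (2 / sqrt(3)) * exp(-t / 2) * sin(sqrt(3) / 2 * t)     # impulse response (4.29)

# Discrete approximation of y(t) = integral of x(tau) h(t - tau) d tau.
y <- dt * convolve(x, rev(h), type = "open")[seq_along(t)]
```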

Figure 4.4: The estimated system using the Laguerre coefficients calculated using the developed recursive formula.

The functional forms of the input and output are shown in Figure 4.5 with the approximated functions. Similar to the previous example, we used an order of 90 and a scaling parameter of 10. These values were selected by visually observing the fit that they provide. These plots show that the Laguerre basis approximation provides a perfect fit for the two signals. We can use the approximation coefficients

to estimate the system using the introduced method. Figure 4.6(a) shows that the

estimated system is completely different from the actual functional form when the scaling parameter is selected to be 10. If we select the scaling parameter to be a

smaller value, such as p = 0.1 (Figure 4.6(b)), the fit seems to be perfect for this

example. The reason for seeing amplitudes of magnitude $10^{40}$ in Figure 4.6(a) is that the coefficients $h_m$ start to diverge and, due to the recursive nature of the formula, the errors accumulate. This indicates that this recursive formula is extremely sensitive to the value of the scaling parameter, and a proper result is not guaranteed even if the input and output functions show a perfect fit.


Figure 4.5: (a) Input and (b) output functions used in Example 2 with the Laguerre basis approximation. An order of 90 and a scaling parameter of 10 was used for the approximation.


Figure 4.6: Estimated impulse response functions h(t) for (a) p = 10 and (b) p = 0.1.

4.2.3 Example 3

In this example, we use a simple exponential function as the underlying system, given by
$$h(t) = e^{-\frac{t}{\tau}}\, u(t), \quad (4.31)$$


Figure 4.7: (a) Input, (b) output and (c) impulse response functions used in Example 3 to demonstrate the developed recursive formula for system identification.

where $u(t)$ is the continuous-time unit step function. We selected a damped sine wave as the input, in the form
$$x(t) = \sin(\omega t)\left(1 - \frac{t}{t_0}\right). \quad (4.32)$$

The value of τ was selected to be 0.1, ω to be 10π and t0 to be 1. The output is numerically calculated through the definition of the convolution as in the previous example. The resulting functional forms are shown in Figure 4.7. The estimated

system (Figure 4.8) shows a pattern close to the actual function and therefore we can

conclude that the developed recursive formula has been able to estimate the system properly in this case.

4.3 System Identification Based on Noisy Signals

The above examples have smooth and noiseless input, output and impulse response

(system) functions. To test the robustness of our proposed method, we added noise to

the input and output signals. For the purpose of demonstration, we used Gaussian noise with zero mean and a standard deviation of 0.01. The input and output functions with the respective approximated functions are shown in Figure 4.9.

Figure 4.8: The estimated system with the actual unit response function, for Example 3.


Figure 4.9: (a) Input and (b) output functions approximated using a Laguerre basis with an order of 90 and a scaling parameter of 10, when there is noise.

Following the same procedure as in our previous examples, we approximated the

input and output functions using the Laguerre basis functions (Figure 4.9) and used

the established recursive formula to estimate the coefficients of the unit impulse response. The resulting function is shown in Figure 4.10. This plot shows that the estimated system is not as accurate as in the case without noise in the input and output. This indicates that the proposed recursive formula is sensitive to noise in the input and output. We investigated the effect of the noise by calculating the Mean Squared

Figure 4.10: The estimated system with the actual unit response function in the example with noisy input and output functions.

Error (MSE) between the actual and approximated functions, with varying levels of noise (Figure 4.11). We changed the standard deviation of the

Gaussian noise that we are adding to both input and output functions.

Figure 4.11: Mean squared error (MSE) between the estimated and actual systems for different levels of noise.

One can observe that when the noise increases, so does the severity of the error. As a solution to this problem, we suggest tuning the scaling parameter p such that we get a reasonable fit for the system. This can be done by selecting a range of values for the scaling parameter, estimating the system based on each value of p and then evaluating the performance using a criterion such as MSE.

In this chapter, we developed a new method of system identification. The developed recursive formula to identify an underlying system was obtained by considering the Laguerre basis approximation of a pair of input-output signals from the same system. We outlined the limitations of the proposed system identification method. We suggest using a grid search over the scaling parameter of the Laguerre basis to be used in the entire process. In the next chapter, we develop an alternative method for system identification using a group Lasso regression methodology.

Chapter 5

Lasso Methodology for System Identification

We propose a system identification approach using a group Lasso methodology

to obtain more reliable Laguerre expansion coefficients of the underlying system.

The resulting signals are used for PD source classification and the results are

compared with the results in Chapter 3.

5.1 Introduction

Section 2.1 outlined discrete and continuous linear time-invariant (LTI) systems where we explained that a continuous LTI system with a unit impulse response of h(t), can

be expressed as
$$y(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t-\tau)\, d\tau, \quad (5.1)$$
where $x(t)$ is the input and $y(t)$ is the corresponding produced output.

Further, in (4.5) we showed that if the input, output and the unit impulse response


are written using Laguerre basis expansion as

$$x(t) = \sum_{i=0}^{k_x} x_i\, l_i^p(t), \quad y(t) = \sum_{j=0}^{k_y} y_j\, l_j^p(t) \quad \text{and} \quad h(t) = \sum_{m=0}^{k_h} h_m\, l_m^p(t),$$
we can build the expression

$$\sum_{j=0}^{k_y} y_j\, l_j^p(t) = \frac{1}{\sqrt{2p}}\sum_{i=0}^{k_x}\sum_{m=0}^{k_h} x_i h_m \left[l_{i+m}^p(t) + l_{i+m+1}^p(t)\right]. \quad (5.2)$$

We observed that the recursive formula introduced in Chapter 4 was highly sensitive

to noise in the input and output functions (Section 4.3). Therefore, this method is not

suitable for most practical applications. In this chapter, we introduce an alternate approach to the recursive formula developed in Chapter 4, to estimate the expansion

coefficients (hm) of the unit impulse response h(t).

The expression (4.5) can be rearranged to represent the difference between the

output function and the “estimated output function” as

$$\sum_{j=0}^{k_y} y_j\, l_j^p(t) - \frac{1}{\sqrt{2p}}\sum_{i=0}^{k_x}\sum_{m=0}^{k_h} x_i h_m \left[l_{i+m}^p(t) + l_{i+m+1}^p(t)\right] = 0. \quad (5.3)$$

The "estimated output function" is constructed with the known input and an unknown impulse response function. The actual and estimated functions should theoretically be equal. Due to noise in the signals, this condition might not hold exactly. Therefore, we treat this as an optimization problem in which we minimize the difference between the actual and estimated outputs.

5.2 Statistical Approach Towards System Identification

To build the error minimization problem, for a given time t, we can construct a least-squares objective function in the form of

$$O_t(\boldsymbol{h}) = \left\{\sum_{j=0}^{k_y} y_j\, l_j^p(t) - \frac{1}{\sqrt{2p}}\sum_{i=0}^{k_x}\sum_{m=0}^{k_h} x_i h_m \left[l_{i+m}^p(t) + l_{i+m+1}^p(t)\right]\right\}^2, \quad (5.4)$$
where $\boldsymbol{h} = \left(h_0, \dots, h_{k_h}\right)^{\top}$. The goal is to estimate the unknown coefficients $h_m$ by minimizing (5.4).

Since $\sum_{j=0}^{k_y} y_j\, l_j^p(t)$ is an approximation of the actual output signal $y(t)$, we can consider the objective function to be
$$O_t(\boldsymbol{h}) = \left\{y(t) - \frac{1}{\sqrt{2p}}\sum_{i=0}^{k_x}\sum_{m=0}^{k_h} x_i h_m \left[l_{i+m}^p(t) + l_{i+m+1}^p(t)\right]\right\}^2. \quad (5.5)$$

This can be further extended to include repeated observations of the input-output signal pairs $s = \{1,\dots,S\}$ and to consider the difference at all time points $t = \{t_1, \dots, t_N\}$ as
$$O(\boldsymbol{h}) = \sum_{s=1}^{S}\sum_{r=1}^{N}\left\{y_s(t_r) - \frac{1}{\sqrt{2p}}\sum_{i=0}^{k_x}\sum_{m=0}^{k_h} x_{i,s}\, h_m \left[l_{i+m}^p(t_r) + l_{i+m+1}^p(t_r)\right]\right\}^2, \quad (5.6)$$

where $y_s(t_r)$ is the value of the $s$th output signal at time $t_r$ and $x_{i,s}$ is the $i$th Laguerre expansion coefficient of the $s$th input. Expression (5.6) can then be minimized with respect to $h_m$ to obtain the system which makes the estimated output closest to the actual output signal.

A linear regression approach can be used to estimate the required coefficients by rearranging the terms in (5.6). Let $\tilde{y}_v = y_s(t_r)$ and $\tilde{x}_{m,v} = \frac{1}{\sqrt{2p}}\sum_{i=0}^{k_x} x_{i,s}\left[l_{i+m}^p(t_r) + l_{i+m+1}^p(t_r)\right]$, with $v = (r-1)S + s$. This leads to the objective function

$$O(\boldsymbol{h}) = \sum_{v=1}^{NS}\left\{\tilde{y}_v - \sum_{m=0}^{k_h} h_m\, \tilde{x}_{m,v}\right\}^2, \quad (5.7)$$
which can be treated as the problem of estimating the regression coefficients of a model of the form $\tilde{y}_v = \sum_{m=0}^{k_h} h_m\, \tilde{x}_{m,v} + \epsilon_v$ for $v = 1,\dots,NS$.

Computationally, the objective function in (5.7) might not be able to estimate

the required parameters if x˜m,v are highly correlated with each other for different m values. The issue arises from a singular design matrix in the linear regression model. Therefore, a penalized regression approach would be suitable. In this research, we use

a Lasso (Tibshirani, 1996) approach since this method tends to shrink insignificant

coefficients towards zero. This leads to an objective function of the form
$$O(\boldsymbol{h}) = \sum_{v=1}^{NS}\left\{\tilde{y}_v - \sum_{m=0}^{k_h} h_m\, \tilde{x}_{m,v}\right\}^2 + \lambda \sum_{m=0}^{k_h} |h_m|, \quad (5.8)$$
where $\lambda$ is a tuning parameter to be estimated using cross-validation.
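The thesis does not name the software used for this Lasso fit; as one possible illustration, the sketch below uses the glmnet package, assuming the regressor matrix x_tilde (columns indexed by $m$) and response vector y_tilde from (5.7) have already been constructed.

```r
library(glmnet)

# Lasso fit (alpha = 1) with lambda chosen by 10-fold cross-validation;
# no intercept, to match the model form in (5.7).
cv_fit <- cv.glmnet(x_tilde, y_tilde, alpha = 1, intercept = FALSE)
h_hat  <- as.numeric(coef(cv_fit, s = "lambda.min"))[-1]   # drop the intercept row
```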

5.3 Group Lasso Objective Function

When the objective function given in (5.8) is used in practice, the estimated output can underestimate the actual output function, since the Lasso approach attempts to shrink all the parameters $h_m$ of the underlying system towards zero. The first few Laguerre basis functions play an important role in the accurate estimation of $h(t)$; therefore, shrinking the corresponding coefficients will lead to increased bias. This can be solved by imposing an additional restriction to prevent shrinking a set of important

parameters through the following objective function:

$$O(\boldsymbol{h}) = \sum_{v=1}^{NS}\left\{\tilde{y}_v - \sum_{m=0}^{k_h} h_m\, \tilde{x}_{m,v}\right\}^2 + \lambda \sum_{m=m_0}^{k_h} |h_m|, \quad (5.9)$$

where $h_0, \dots, h_{m_0-1}$ are not penalized in the regression procedure. This approach is known as the group Lasso approach (Yuan and Lin, 2006), as described further in the next section.

5.3.1 Group Lasso

In a regression problem, the interest is in evaluating significant explanatory (independent) variables in predicting a response (dependent) variable of interest. Each independent variable can be either categorical or numerical in nature while the response is numerical. An analysis of variance (ANOVA) model is used when all the

explanatory variables are categorical whereas the regression model is an additive model when all the explanatory variables are numerical. The most common use of is to select important explanatory variables to be used to obtain an accurate

prediction of the response (Yuan and Lin, 2006).

Consider a regression model with J explanatory variables as

$$\boldsymbol{y} = \sum_{j=1}^{J} \boldsymbol{X}_j \boldsymbol{\beta}_j + \boldsymbol{\epsilon}, \quad (5.10)$$
where $\boldsymbol{y}$ is a response vector with $n$ elements, $\boldsymbol{\epsilon} \sim N_n(\boldsymbol{0}, \sigma^2 \boldsymbol{I})$, $\boldsymbol{X}_j$ is a matrix of dimension $n \times p_j$ associated with the $j$th independent variable, and $\boldsymbol{\beta}_j$ is a vector of coefficients of size $p_j$, $j = 1,\dots,J$. For simplicity, this relationship can be written as $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where $\boldsymbol{X} = (\boldsymbol{X}_1,\dots,\boldsymbol{X}_J)$ and $\boldsymbol{\beta} = \left(\boldsymbol{\beta}_1^{\top},\dots,\boldsymbol{\beta}_J^{\top}\right)^{\top}$.

In a typical regression problem the matrix X is known as the design matrix and it is required to be full rank. It can be seen that in many cases this requirement is not satisfied, such as in an ANOVA model with unbalanced designs. In such a circumstance, the importance of one independent variable will depend on the effect of other explanatory variables. This makes the method of separating variances to identify the significant variables unusable. Then, a stepwise or a subset selection procedure

can be used based on a criterion such as the Akaike information criterion (AIC), but

this tends to be impractical when the number of independent variables is large. Even in a case which is computationally feasible, a local solution can be reached instead of

the required global solution (Yuan and Lin, 2006).

As an answer to the above mentioned problem, there are many variable selection procedures in the literature, out of which the non-negative garrotte of Breiman (1995)

stands out. Tibshirani (1996) introduced the popular Lasso estimates based on the

non-negative garrotte, defined as

$$\hat{\boldsymbol{\beta}}_{\mathrm{LASSO}}(\lambda) = \underset{\boldsymbol{\beta}}{\operatorname{argmin}}\left(\|\boldsymbol{y} - \boldsymbol{X}\boldsymbol{\beta}\|^2 + \lambda \|\boldsymbol{\beta}\|_{l_1}\right), \quad (5.11)$$
where $\lambda$ is a tuning parameter and $\|\cdot\|_{l_1}$ is the $l_1$-norm of a vector.

Although the Lasso has many computational advantages compared to the traditional methods, it is not designed for factor selection. In a regression setting, factors (categorical independent variables) are represented using a set of dummy variables. Therefore, treating these dummy variables independently, as done in the Lasso, is not suitable. To address this problem, the group Lasso was introduced by Yuan and Lin (2006). It also provides more control over the penalty imposed on the regression coefficients.

The group Lasso objective function is defined as

$$\frac{1}{2}\left\|\boldsymbol{y} - \sum_{j=1}^{J}\boldsymbol{X}_j\boldsymbol{\beta}_j\right\|^2 + \lambda \sum_{j=1}^{J}\left\|\boldsymbol{\beta}_j\right\|_{K_j}, \quad (5.12)$$
where $\|\boldsymbol{\eta}\|_K = \left(\boldsymbol{\eta}^{\top} K \boldsymbol{\eta}\right)^{1/2}$, $\boldsymbol{\eta} \in \mathbb{R}^d$, $d \ge 1$, and $K$ is a positive definite matrix of dimension $d \times d$; $\lambda \ge 0$ is a tuning parameter similar to the Lasso. The R package gglasso can be used to obtain the group Lasso estimates of Yuan and Lin (2006) as mentioned above. The cv.gglasso function in the same package obtains the group Lasso estimates using cross-validation to select the tuning parameter $\lambda$. The group

Lasso objective function mentioned in (5.9) is a simplification of the generic function

(5.12).
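A minimal sketch of how (5.9) could be fitted with gglasso is given below; each coefficient $h_m$ is treated as its own group and the first $m_0$ groups receive a zero penalty factor. The objects x_tilde and y_tilde (built as in (5.7)) and the choice m0 = 2 used later in this chapter are assumptions about the data at hand, not the thesis code.

```r
library(gglasso)

m0  <- 2
kh  <- ncol(x_tilde) - 1
grp <- seq_len(kh + 1)                       # one group per coefficient h_m
pf  <- c(rep(0, m0), rep(1, kh + 1 - m0))    # leave h_0, ..., h_{m0-1} unpenalized

# Group Lasso with least-squares loss; lambda selected by 10-fold cross-validation.
cv_fit <- cv.gglasso(x_tilde, y_tilde, group = grp, pred.loss = "L2",
                     nfolds = 10, pf = pf, intercept = FALSE)
h_hat  <- coef(cv_fit$gglasso.fit, s = cv_fit$lambda.min)[-1]   # drop intercept row
```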

5.4 System Identification Using Mean Input and Output Signals

In a case where multiple pairs of input-output signals are observed, the objective

function (5.9) can be rewritten considering the mean input of all input signals s =

{1,...,S} defined by

$$\bar{x}_{\cdot}(t) = \frac{1}{S}\sum_{s=1}^{S} x_s(t), \quad (5.13)$$

and the mean output of all output signals as

$$\bar{y}_{\cdot}(t) = \frac{1}{S}\sum_{s=1}^{S} y_s(t). \quad (5.14)$$

Section 5.1 mentioned the relationship between the input, output and impulse response functions to be $y_s(t) = \int_{-\infty}^{\infty} x_s(\tau)\, h(t-\tau)\, d\tau$. Considering the mean input and output

signals, this relationship can be written as
$$\frac{1}{S}\sum_{s=1}^{S} y_s(t) = \frac{1}{S}\sum_{s=1}^{S}\int_{-\infty}^{\infty} x_s(\tau)\, h(t-\tau)\, d\tau = \int_{-\infty}^{\infty} \bar{x}_{\cdot}(\tau)\, h(t-\tau)\, d\tau = \bar{x}_{\cdot}(t) * h(t). \quad (5.15)$$

This relationship turns the group Lasso objective function in (5.9) into
$$O(\boldsymbol{h}) = \sum_{r=1}^{N}\left\{\bar{y}_{\cdot}(t_r) - \sum_{m=0}^{k_h} h_m\, \tilde{x}_{m,r}\right\}^2 + \lambda \sum_{m=m_0}^{k_h} |h_m|, \quad (5.16)$$
where

$$\tilde{x}_{m,r} = \frac{1}{\sqrt{2p}}\sum_{i=0}^{k_x} \bar{x}_i\left[l_{i+m}^p(t_r) + l_{i+m+1}^p(t_r)\right], \quad (5.17)$$
and $\bar{x}_i$ is the $i$th Laguerre expansion coefficient of the mean input signal $\bar{x}_{\cdot}(t)$.

In the next section, we study an experiment where the input and output are known. We then use the method described in Section 5.4 to estimate the impulse response function. To test the validity of the estimated impulse response functions, the input is reconstructed using the output together with the estimated function. We use the estimated systems with the observed PD signals from Section 3.1 to estimate the corresponding inputs. These estimated inputs are then used for PD source classification.

5.5 Experimental Setup

To apply the proposed method, a lab experiment was conducted similar to the

experiment outlined in Section 3.1. While the same PD sources were used (twisted pair, needle-plane setup and combined), they were connected to an electrical calibrator instead of a high voltage source. This was done with the intention of providing a known input to the system, which includes components such as measuring equipment and connecting wires. The method described in Chapter 4, as well as those introduced in this chapter, rely on providing a known input while measuring the output. To this end, the calibrator was used to generate a "known input" to the system. Samples of the input-output signal pairs for the three PD sources (twisted pairs, needle-plane setup and combined) are given in Figure 5.1.

In this experiment, 15 input-output pairs were observed from each PD source. The mean input and output signals were calculated and were used in the group Lasso regression model. Only 10 signals were used to calculate the mean signals, while the rest were kept aside for testing the validity of the estimated systems.

5.6 Estimated Systems using Group Lasso Methodology

In using expression (5.16) to estimate the system, a suitable group size $m_0$ should be selected. Since the first coefficients mostly influence the scale of the basis expansion, $m_0$ was set to be 2. The estimated system effects (impulse response functions) for the three sources are shown in Figure 5.2.

According to Figure 5.2, it can be seen that the effects of the three PD sources have a similar shape. This shows that most of the difference between the input and output is due to the common equipment that is connected. Due to this fact, we use the mean of the three estimated impulse responses to denote the "effect of the system" that acts on the input to create the output.


Figure 5.1: Sample of input-output signal pairs for the three sources, used for system identification. CHAPTER 5. LASSO METHODOLOGY FOR SYSTEM IDENTIFICATION 80

Figure 5.2: Estimated impulse response functions for the three PD sources, using a group Lasso objective function method with ky = kx = kh = 90 and m0 = 2.

5.7 Reconstructed Output

To evaluate the accuracy of the estimated system effect, the output is recreated using

the mean input and the estimated system. To recreate the output y[t], we use the

discrete convolution relationship between the input x[t] and the estimated system h[t]

given by

$$y[t] = \sum_{k=0}^{\infty} x[k]\, h[t-k]. \quad (5.18)$$

In this procedure, it was observed that the scales of the actual and reconstructed functions are different. The difference is a result of the shrinkage in the group Lasso estimate of the system (Yuan and Lin, 2006). This was automatically corrected by calculating a scaling factor as
$$\frac{\max \bar{y}_{\cdot}(t)}{\max \hat{\bar{y}}_{\cdot}(t)}, \quad (5.19)$$
where $\hat{\bar{y}}_{\cdot}(t)$ is the reconstructed mean output signal. The scaling factor was used to subsequently correct the estimated system effect.
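A minimal R sketch of this reconstruction and rescaling step is shown below; x_bar and y_bar (the mean input and output sampled on the time grid) and h_hat_t (the estimated impulse response evaluated on the same grid) are assumed objects, and convolve() is one of several ways to evaluate the discrete convolution (5.18).

```r
# Discrete convolution (5.18) of the mean input with the estimated system.
y_rec <- convolve(x_bar, rev(h_hat_t), type = "open")[seq_along(x_bar)]

# Scale correction (5.19) applied to the reconstructed output.
scale_fac <- max(y_bar) / max(y_rec)
y_rec     <- scale_fac * y_rec
```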

Figure 5.3: Sample of reconstructed output signal using the estimated system.

Figure 5.3 shows one instance of an output test signal and its corresponding reconstruction using the method explained above. A test signal refers to a signal that was left out from the initial system identification process. It shows that the reconstructed signal has been able to capture the general pattern of the actual waveform for the first half. It can also be seen that the initial rise is not captured perfectly.

5.8 Reconstructed Input

When the system is estimated properly, reconstructing the input signal can also be of importance. This was the aim of using system identification in this research. To this end, the same group Lasso objective function (5.9) can be used for a single signal:

O(x_s) = \sum_{r=1}^{N} \left\{ y_s(t_r) - \frac{1}{\sqrt{2p}} \sum_{i=0}^{k_x} \sum_{m=0}^{k_h} x_{i,s}\, h_m \left[ l^p_{i+m}(t_r) + l^p_{i+m+1}(t_r) \right] \right\}^2 + \lambda \sum_{i=m_0}^{k_x} |x_{i,s}|,   (5.20)

where x_s = (x_{0,s}, \ldots, x_{k_x,s}), s is the signal pair that is of interest, and h_m are the coefficients of the Laguerre basis approximation of the estimated system.
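To make the input reconstruction step concrete, the minimization of (5.20) can be cast as a standard Lasso problem whose design-matrix column for x_{i,s} is \frac{1}{\sqrt{2p}} \sum_m h_m [l^p_{i+m}(t_r) + l^p_{i+m+1}(t_r)] evaluated at the sampling times. The sketch below, in R, uses the glmnet package as a stand-in solver rather than the thesis's own code; the matrix L of Laguerre functions, the estimated coefficients h_hat, the observed output y_s, the scaling parameter p and the choice of lambda are all illustrative assumptions.

```r
# Minimal sketch (R): solve (5.20) for the input coefficients with glmnet.
#   L     - matrix with L[r, j] = l^p_{j-1}(t_r); needs columns up to order k_x + k_h + 1
#   h_hat - estimated system coefficients h_0, ..., h_{k_h}
#   y_s   - observed output signal at the sampling times t_r
library(glmnet)

k_x <- 90; k_h <- length(h_hat) - 1; m0 <- 2

# Column i + 1 of the design matrix multiplies x_{i,s} in (5.20).
X <- sapply(0:k_x, function(i) {
  terms <- sapply(0:k_h, function(m) h_hat[m + 1] * (L[, i + m + 1] + L[, i + m + 2]))
  rowSums(terms) / sqrt(2 * p)
})

# Lasso fit; penalty.factor leaves the first m0 coefficients unpenalised,
# matching the penalty in (5.20), whose sum starts at i = m0.
fit <- glmnet(X, y_s, alpha = 1, intercept = FALSE, standardize = FALSE,
              penalty.factor = c(rep(0, m0), rep(1, k_x + 1 - m0)))
x_hat <- as.numeric(coef(fit, s = min(fit$lambda)))[-1]  # coefficients at the smallest lambda
```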

The reconstructed input signal and the actual input signal associated with the previous test case are shown in Figure 5.4. Examining the figure indicates that the

Figure 5.4: Sample of reconstructed input signal using the estimated system.

general pattern of the actual input is captured very well by the reconstruction although the initial rise is again not captured properly.

5.9 Classification on Reconstructed Input

Chapter 3 explained the procedure followed to classify PD signals to their respective sources using the observed signals. It would be logical to remove the effect of the equipment used in the experiment to reveal the actual PD signal on which we can perform classification.

The method outlined above (Section 5.8) can be applied to this end, on the normalized PD signals described in Chapter 3. Figure 5.5 illustrates the first 3 Laguerre basis expansion coefficients of the estimated input PD signals. This figure shows that the sources are harder to visually distinguish than the case where the output waveforms were used directly (Figure 3.12).

Three classifiers are used to classify the signals, similar to the previous case, namely linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and support vector machines (SVM). Table 5.1 shows a poor performance, with a misclassification rate of 24.58% for LDA. On the contrary, the QDA and SVM classifiers

Figure 5.5: First 3 Laguerre coefficients for the estimated input PD signals.

Table 5.1: The confusion matrix between the actual and predicted labels for the test data using the LDA model.

                              Actual
  Predicted         Twisted Pair   Needle-Plane   Both
  Twisted Pair            156              8          3
  Needle-Plane              1             70         35
  Both                      6             50         90

have shown low misclassification rates of 1.19% (Table 5.2) for QDA and 0.24% (Table 5.3) for SVM; for example, the QDA rate corresponds to the 5 misclassified signals out of the 419 test signals in Table 5.2.

Table 5.2: The confusion matrix between the actual and predicted labels for the test data using the QDA model.

                              Actual
  Predicted         Twisted Pair   Needle-Plane   Both
  Twisted Pair            161              0          3
  Needle-Plane              0            128          0
  Both                      2              0        125

The misclassification rates when classifying on the output signals were observed to be 5.97%, 2.63% and 0.95% for LDA, QDA and SVM, respectively. Although a

Table 5.3: The confusion matrix between the actual and predicted labels for the test data using the SVM model.

                              Actual
  Predicted         Twisted Pair   Needle-Plane   Both
  Twisted Pair            162              0          0
  Needle-Plane              0            128          0
  Both                      1              0        128

slight decrease in misclassification rates can be seen in the QDA and SVM methods, a significant increase in accuracy cannot be observed.
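For reference, the classification step itself can be sketched in R as follows. The data frame coef_df, with the Laguerre expansion coefficients as columns and a factor column source holding the PD source labels, is an illustrative assumption; the 75%/25% split follows the procedure described in Chapter 3, but the code is a sketch rather than the thesis's own implementation.

```r
# Minimal sketch (R): LDA, QDA and SVM on Laguerre expansion coefficients,
# reporting test misclassification rates and a confusion matrix.
library(MASS)    # lda(), qda()
library(e1071)   # svm()

set.seed(1)
train_id <- sample(nrow(coef_df), size = floor(0.75 * nrow(coef_df)))
train <- coef_df[train_id, ]
test  <- coef_df[-train_id, ]

misclass <- function(pred, truth) mean(pred != truth)

lda_fit <- lda(source ~ ., data = train)
qda_fit <- qda(source ~ ., data = train)
svm_fit <- svm(source ~ ., data = train)

c(LDA = misclass(predict(lda_fit, test)$class, test$source),
  QDA = misclass(predict(qda_fit, test)$class, test$source),
  SVM = misclass(predict(svm_fit, test), test$source))

# Confusion matrix for one classifier, e.g. LDA:
table(Predicted = predict(lda_fit, test)$class, Actual = test$source)
```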

Although removing the effect of the system improved the classification accuracy in the specific example considered, we used a simulation study to examine whether classifying on the input offers a general advantage over classifying on the output.

5.10 Simulating for Classification using the Input Signal

To examine whether classification on the input signal can provide a significant improvement in the classification accuracy, we used two Gaussian functions with different variances as the input functions

x_1(t) = \exp\left(-\frac{(t - 5)^2}{0.5}\right),   (5.21)

and

x_2(t) = \exp\left(-\frac{(t - 5)^2}{1.5}\right).   (5.22)

Figure 5.6: Impulse response function used in this simulation.

These values were selected so that the Gaussian function does not rise before time zero. The system was selected to take the form

h(t) = \frac{2}{\sqrt{3}} \exp\left(-\frac{t}{2}\right) \sin\left(\frac{\sqrt{3}}{2}\, t\right),   (5.23)

which was the system shown in Figure 4.1. Figure 5.6 shows the impulse response function (system effect) used in this simulation.

To obtain multiple signals as the input-output signal pairs, Gaussian noise with mean zero was added to the input functions x_1(t) and x_2(t); the noisy inputs are denoted by x_1^*(t) and x_2^*(t), respectively. The level of noise was varied by randomly assigning a value between 0.01 and 0.05 as the standard deviation of the Gaussian noise. A total of 300 such inputs were generated from x_1(t) and another 300 from x_2(t). Corresponding output functions were calculated under three scenarios, as depicted in Figure 5.7. In the first case, the system effect h(t) was applied to x_1^*(t) and x_2^*(t) to calculate the outputs y_1^*(t) and y_2^*(t) using discrete convolution:

y_i^*[t] = \sum_{k=-\infty}^{\infty} x_i^*[k]\, h[t - k], \quad i = 1, 2.   (5.24)

Figure 5.7: Three cases considered in the simulation study to show that performing classification on the input might be beneficial compared to classifying on the output: setup for (a) case 1, (b) case 2 and (c) case 3.

The second case was built by adding Gaussian noise with the same characteristics to the outputs generated in case 1 as

y_1^{**}(t) = y_1^*(t) + \text{noise},   (5.25)

y_2^{**}(t) = y_2^*(t) + \text{noise}.   (5.26)

In the third case, the output functions y_1(t) and y_2(t) were calculated numerically using the definition of convolution and the input and impulse response functions as

y_i(t) = \int_0^t x_i(\tau)\, h(t - \tau)\, d\tau, \quad i = 1, 2,   (5.27)

and then adding noise to them.
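A minimal sketch of this data-generation scheme for cases 1 and 2 is given below in R; case 3 (the continuous convolution in (5.27)) is omitted. The time grid, the use of runif for the noise level and the helper names are illustrative assumptions, not the thesis's simulation code.

```r
# Minimal sketch (R): simulate noisy Gaussian inputs (5.21)-(5.22) and their
# outputs through the system (5.23), for cases 1 and 2.
t_grid <- seq(0, 20, by = 0.01)
h <- (2 / sqrt(3)) * exp(-t_grid / 2) * sin(sqrt(3) / 2 * t_grid)   # (5.23)

make_case <- function(denom, n = 300) {
  x_clean <- exp(-(t_grid - 5)^2 / denom)                 # (5.21) or (5.22)
  replicate(n, {
    sd_noise <- runif(1, 0.01, 0.05)                      # random noise level
    x_star <- x_clean + rnorm(length(t_grid), 0, sd_noise)
    # Case 1: discrete convolution of the noisy input with h, as in (5.24)
    y_star <- convolve(x_star, rev(h), type = "open")[seq_along(t_grid)]
    # Case 2: case 1 output plus additional noise, as in (5.25)-(5.26)
    y_star_star <- y_star + rnorm(length(t_grid), 0, sd_noise)
    list(x = x_star, y_case1 = y_star, y_case2 = y_star_star)
  }, simplify = FALSE)
}

signals_x1 <- make_case(denom = 0.5)   # 300 signals generated from x_1(t)
signals_x2 <- make_case(denom = 1.5)   # 300 signals generated from x_2(t)
```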

Samples of input and output signals for the three cases are shown in Figure 5.8.

Of the signals from each case, 75% were selected as training signals while the rest were used as test data. Laguerre expansion coefficients were calculated for the training signals with a scaling parameter of 10; this scaling parameter was selected to be the same as in Section 4.2. QDA and SVM were used to classify these signals. Both classifiers classified the inputs and the outputs in case 1 correctly, with zero misclassifications. Neither classifier was able to classify the outputs in case

Figure 5.8: Simulated input functions (a)–(b) and output functions (c)–(h), showing that performing classification on the input might be beneficial compared to classifying on the outputs.

2 or 3, with about 45% misclassification. Twelve Laguerre approximation coefficients were used in this simulation.

Table 5.4: The confusion matrix between the actual and predicted labels for the test data using the QDA and SVM models on the inputs and the QDA classification in case 1.

                           Actual
  Predicted         x_1^*/y_1^*   x_2^*/y_2^*
  x_1^*/y_1^*            75             0
  x_2^*/y_2^*             0            75

Table 5.5: The confusion matrix between the actual and predicted labels for the test data using the SVM classifier of case 1.

                      Actual
  Predicted         y_1^*   y_2^*
  y_1^*                74       5
  y_2^*                 1      70

Table 5.6: The confusion matrix between the actual and predicted labels for the test data using the SVM classifier of case 2.

                        Actual
  Predicted         y_1^{**}   y_2^{**}
  y_1^{**}                42        36
  y_2^{**}                33        39

Table 5.7: The confusion matrix between the actual and predicted labels for the test data using the SVM classifier of case 3.

                         Actual
  Predicted         y_1^{***}   y_2^{***}
  y_1^{***}                36         32
  y_2^{***}                39         43

This simulation shows that, in some cases, classifying on the input signal has an advantage over classifying on the output signal. It also shows that classifying on the input is useful when there is noise in the outputs caused by the effect of the system, which is the case in practice. This demonstrates the importance of removing the system effect from the output signals before performing the analysis.

In this chapter, we developed an alternative method to the recursive formula in Chapter 4. This method is suitable even when there is noise in the input/output signals. Our approach was based on a group Lasso regression model. We observed that this method was able to estimate the system and remove its effect to obtain the actual input PD signal. In the next chapter, we discuss the conclusions and future work of this research.

Chapter 6

Conclusion and Future Work

In this thesis, we introduced a new method for partial discharge source identification while proposing novel methods for system identification. We implemented our developed methods on experimental data obtained from a lab experiment with three PD sources, namely a twisted pair of wires, a needle-plane setup and a combination of the two.

Initially, we discussed the use of Laguerre functions as a basis to obtain a mathe- matical expression for the PD signals. We studied different ways of performing the approximations using the least-squares and least absolute errors as well as an approach based on the Lasso objective function. We also presented a deterministic method derived from the inner product of functions and used a numerical approximation to obtain estimates for the Laguerre expansion coefficients. We implemented our methods on two examples and observed that the least absolute objective function produced the best approximation, compared to the other two statistical methods. It was also observed that the deterministic method, coined the “exact method”, provided a better approximation for the signals than any other statistical approach we used in our analysis.

Signal approximation using the Laguerre function basis depends on a scaling parameter which should be selected properly. To this end, we improved on an existing method in the literature, which focuses on the time it takes for a Laguerre function to fall to 1% of its peak. We proposed using a quadratic model to capture the relationship between the order of the Laguerre function and the scaling factor on one hand, and the time for the function to reach 1% of its peak on the other. Based on the quadratic model, we obtained an interval of possible values for the scaling parameter. It was observed that this interval tends to extend from zero to close to the order of the Laguerre function. In addition to this interval in the time domain, we also considered the behaviour of the function in the frequency domain when selecting a scaling parameter. We argued that a scaling parameter close to the upper limit of the interval defined earlier is most suitable. Alternatively, we used a grid search to find the value of this parameter that minimizes a dissimilarity criterion, based on the mean squared error, between the approximated and actual signals. We then implemented classification and system identification techniques on these approximated signals.

To identify the source of the PD, we used the coefficients of the Laguerre basis expansion as a set of features and trained classification algorithms. To perform the classification, we separated a training set which included 75% of the signals from each source, while the rest were marked for testing. We used linear discriminant analysis, quadratic discriminant analysis and support vector machines to identify the source of the partial discharges. The method of using basis function expansion coefficients for classification performed well in this application, with the support vector classifier providing misclassification rates around 3% on test data using only the first three expansion coefficients. The partial discharge waveforms for the three sources show differences in their amplitudes; therefore, they were normalized and the classification was performed again. The aim was to make the partial discharge waveforms visually similar, such that a person could not easily assign these signals to their respective

sources. It was seen that, even in this case, the proposed method is able to classify the signals to their sources with high accuracy.

It is logical to assume that the observed PD signals include the effect of the measuring equipment used in this experiment. In the hope of improving the PD source classification accuracy, we developed system identification methods to estimate and remove the effect of this equipment from the observed signals. To this end, we developed two methods of system identification. A separate experiment was conducted to obtain data to implement these techniques, using the same sources and providing a known input to obtain the corresponding output. One system identification method was deterministic: we equated the coefficients of the Laguerre representations of the input, output and system functions. This resulted in a recursive formula to estimate the basis function expansion coefficients of the system effect. This method was seen to exhibit unstable behaviour in the presence of noise in the input and output signals.

To combat this issue, we suggested an alternative method based on an objective function. We used a group Lasso objective function which was minimized to obtain the basis expansion coefficients of the system. After estimating the system using this approach, we reconstructed the output to evaluate the validity of the estimated system. We introduced a scaling factor to correct the scale of the estimated system and then reconstructed the inputs of the initial experimental data that were used to perform classification. This showed that, in this specific example, there was a decrease in misclassification rates when classifying on the input compared to classifying on the output. We further showed, through simulation studies, that in some cases there is a significant difference between classifying on the input and classifying on the output signals.

6.1 Numerical Limitations

During this research, we encountered several numerical limitations. The most significant limitation was the generation of Laguerre functions. As we discussed in Section 3.1, the time gap between two points (∆t) measured from the oscilloscope was 1 nanosecond (ns), or 10^{-9} seconds. The generated Laguerre functions do not represent the actual form of the functions when ∆t is set to 10^{-9}. It was also observed that the basis function approximations, the estimates of the approximation coefficients and the estimates of the system were highly sensitive to the value of ∆t.

Another issue this created was that the least-squares objective function was unusable in many cases, due to Laguerre functions of different orders being highly correlated.

6.2 Future Work

As future extensions of this research, we suggest the following research directions.

• Develop a method to calculate Laguerre functions even with small ∆t values.

• Implement the proposed classification and system identification methods in other applications, such as medical problems.

• Try other basis functions and apply the same procedure to compare the results with those obtained using the Laguerre basis.

• Incorporate other environmental variables, such as the pressure at the source and the frequency of the high voltage source, into the classification and system identification processes.

Appendix A

Smoothing the Signal Using Cubic Smoothing Splines

Let there be a partial discharge signal y(t) with an unknown form. We discussed in Chapter 2 that this function can be approximated using a basis function expansion. In that chapter, we used a Laguerre function basis, but other bases are available. Polynomial basis functions can be considered the most intuitive basis but, in cubic smoothing splines, we use a cubic spline basis (Wood, 2017).

A spline fits different polynomials of order k to different parts of the function that we need to approximate. Naturally, cubic splines use cubic polynomials, in other words, polynomials of order 3. These different cubic polynomials are joined such that the final function is continuous in value as well as in its first and second derivatives. The points at which two adjacent polynomials join are known as knots. In the case of smoothing splines, we have to select the knot locations that we wish to use (Wood, 2017).

We can write the approximated function with an expression similar to (2.10) as

y(t) \approx \hat{y}(t) = \sum_{j=0}^{k_y} y_j\, b_j(t),   (A.1)

where y_j are the coefficients of the basis expansion and b_j are the basis functions. If we select knot locations at times t_j^*, j = 0, \ldots, k_y - 2, then according to Wood (2017) we can write the basis functions as

b_0(t) = 1,
b_1(t) = t,
b_{j+2}(t) = R(t, t_j^*) for j = 0, \ldots, k_y - 2,

where

R(x, z) = \frac{\left[(z - \tfrac{1}{2})^2 - \tfrac{1}{12}\right]\left[(x - \tfrac{1}{2})^2 - \tfrac{1}{12}\right]}{4} - \frac{(|x - z| - \tfrac{1}{2})^4 - \tfrac{1}{2}(|x - z| - \tfrac{1}{2})^2 + \tfrac{7}{240}}{24}.
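A minimal sketch of this basis in R is given below, assuming the time axis has been rescaled to [0, 1] as in Wood (2017); the function and variable names are illustrative, and a commented least-squares fit of the coefficients is included for context.

```r
# Minimal sketch (R): cubic spline basis built from R(x, z).
R_cubic <- function(x, z) {
  ((z - 0.5)^2 - 1/12) * ((x - 0.5)^2 - 1/12) / 4 -
    ((abs(x - z) - 0.5)^4 - 0.5 * (abs(x - z) - 0.5)^2 + 7/240) / 24
}

spline_basis <- function(t, knots) {
  # Columns: b_0(t) = 1, b_1(t) = t, b_{j+2}(t) = R(t, t_j^*)
  cbind(1, t, outer(t, knots, R_cubic))
}

# Example (assuming t_obs and y_obs rescaled to [0, 1]):
# B <- spline_basis(t_obs, knots = seq(0.1, 0.9, length.out = 10))
# y_coef <- qr.solve(B, y_obs)      # least-squares basis coefficients
```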

The coefficients of the basis function expansion can be approximated using a least-squares approach similar to what we discussed in Section 2.3.1. We can also control the “wiggliness” of the fitted function by adding a penalty to the least-squares problem (Wood, 2017).

We can use the function smooth.spline in R to fit the cubic smoothing spline. By default, the function automatically sets the number of knots based on the number of unique data points.
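As a usage illustration, a minimal call might look like the following; t_obs and y_obs are assumed to be the vectors of sampling times and observed signal values (illustrative names, not from the thesis code).

```r
# Minimal sketch (R): cubic smoothing spline fit with smooth.spline.
fit <- smooth.spline(t_obs, y_obs)     # knots and smoothing selected by default
y_smooth <- predict(fit, t_obs)$y      # smoothed signal on the original grid
```

Bibliography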

Altenburger, R., C. Heitz, and J. Timmer (2002, jun). Analysis of phase-resolved partial discharge patterns of voids based on a stochastic process approach. J. Phys. D. Appl. Phys. 35(11), 309. (Cited on page 6.)
Álvarez, F., F. Garnacho, A. Khamlichi, and J. Ortego (2016, jul). Classification of partial discharge sources by the characterization of the pulses waveform. In 2016 IEEE Int. Conf. Dielectr., Volume 1, pp. 514–519. (Cited on page 7.)
Billings, S. A. (2013, jul). Nonlinear System Identification. Chichester, UK: John Wiley & Sons, Ltd. (Cited on page 2.)
Breiman, L. (1995, nov). Better Subset Regression Using the Nonnegative Garrote. Technometrics 37(4), 373–384. (Cited on page 75.)
Budke, G. (1989, dec). On a Convolution Property Characterizing the Laguerre Functions. Monatshefte für Math. 107(4), 281–285. (Cited on pages 16, 18 and 57.)
Cavallini, A., G. Montanari, A. Contin, and F. Pulletti (2003, mar). A new approach to the diagnosis of solid insulation systems based on PD signal inference. IEEE Electr. Insul. Mag. 19(2), 23–30. (Cited on page 6.)
Contin, A., A. Cavallini, G. Montanari, G. Pasini, and F. Puletti (2002, jun). Digital detection and fuzzy classification of partial discharge signals. IEEE Trans. Dielectr. Electr. Insul. 9(3), 335–348. (Cited on page 6.)
Contin, A. and S. Pastore (2009, dec). Classification and separation of partial discharge signals by means of their auto-correlation function evaluation. IEEE Trans. Dielectr. Electr. Insul. 16(6), 1609–1622. (Cited on page 6.)
Dabir, A. S., C. A. Trivedi, Y. Ryu, P. Pande, and J. A. Jo (2009). Fully automated deconvolution method for on-line analysis of time-resolved fluorescence spectroscopy data based on an iterative Laguerre expansion technique. J. Biomed. Opt. 14(2), 024030. (Cited on page 8.)
Friedman, J., T. Hastie, and R. Tibshirani (2010, jan). A Note on the Group Lasso and a Sparse Group Lasso. (Cited on page 21.)
Gao, C.-F. and N. Noda (2005, apr). Effects of Partial Discharges on Crack Growth in Dielectrics. Appl. Phys. Lett. 86(16), 162904. (Cited on page 5.)
Gu, G. (2012). System Identification. In Discret. Linear Syst., pp. 343–375. Boston, MA: Springer US. (Cited on page 15.)
Hao, L., P. L. Lewin, J. A. Hunter, D. J. Swaffield, A. Contin, C. Walton, and M. Michel (2011). Discrimination of multiple PD sources using wavelet decomposition and principal component analysis. IEEE Trans. Dielectr. Electr. Insul. (Cited on page 6.)
Izadian, A. (2019). Fundamentals of Modern Electric Circuit Analysis and Filter Synthesis. Cham: Springer International Publishing. (Cited on page 59.)
James, G., D. Witten, T. Hastie, and R. Tibshirani (2013). An Introduction to Statistical Learning, Volume 103 of Springer Texts in Statistics. New York, NY: Springer New York. (Cited on pages xi, 42, 43, 45, 46, 47, 48, 49 and 51.)
Janani, H. and B. Kordi (2018, sep). Towards Automated Statistical Partial Discharge Source Classification using Pattern Recognition Techniques. High Volt. 3(3), 162–169(7). (Cited on pages 6 and 7.)
Janani, H., B. Kordi, and M. Jafari Jozani (2017, feb). Classification of simultaneous multiple partial discharge sources based on probabilistic interpretation using a two-step logistic regression algorithm. IEEE Trans. Dielectr. Electr. Insul. 24(1), 54–65. (Cited on page 6.)
Keesman, K. J. (2011). System Identification. Advanced Textbooks in Control and Signal Processing. London: Springer London. (Cited on page 3.)
Kuffel, J., P. Kuffel, and W. Zaengl (2000). High Voltage Engineering Fundamentals (2nd ed.). Elsevier. (Cited on page 5.)
Liu, J., Y. Sun, J. Qi, and L. Marcu (2012, feb). A novel method for fast and robust estimation of fluorescence decay dynamics using constrained least-squares deconvolution with Laguerre expansion. Phys. Med. Biol. 57(4), 843–865. (Cited on page 8.)
Lorentz, G. G. (1973). Approximation Theory. New York: Academic Press. (Cited on page 15.)
Love, E. R. (1997). Inequalities for Laguerre Functions. J. Inequalities Appl. 1, 293–299. (Cited on page 39.)
Marmarelis, V. Z. (1993, nov). Identification of nonlinear biological systems using Laguerre expansions of kernels. Ann. Biomed. Eng. 21(6), 573–589. (Cited on page 8.)
Nasr Esfahani, A. (2018). Detection and Classification of Partial Discharge Sources Under Variable Frequency and Air Pressure. M.Sc. thesis, University of Manitoba. (Cited on page 33.)
Nik Ali, N. H., J. A. Hunter, P. Rapisarda, and P. L. Lewin (2014, oct). Identification of Multiple Partial Discharge Sources in High Voltage Transformer Windings. In 2014 IEEE Conf. Electr. Insul. Dielectr. Phenom., Des Moines, pp. 188–191. IEEE. (Cited on pages 6 and 7.)
Okubo, H. and N. Hayakawa (2005, aug). A novel technique for partial discharge and breakdown investigation based on current pulse waveform analysis. IEEE Trans. Dielectr. Electr. Insul. 12(4), 736–744. (Cited on page 6.)
Oppenheim, A. V., A. S. Willsky, and S. Hamid (1996). Signals and Systems (2nd ed.). New Jersey: Prentice-Hall. (Cited on pages x, 2, 14, 15, 37 and 61.)
Saboktakinrizi, S. (2011). Time-Domain Distortion Analysis of Wideband Electromagnetic Field Sensors Using Orthogonal Polynomial Subspaces. M.Sc. thesis, University of Manitoba. (Cited on pages 28, 29 and 30.)
Sage, A. P. and J. L. Melsa (1971). System Identification. Academic Press. (Cited on page 23.)
Salama, M. and R. Bartnikas (2002, mar). Determination of neural-network topology for partial discharge pulse pattern recognition. IEEE Trans. Neural Networks 13(2), 446–456. (Cited on page 6.)
Shahabi, S. (2019). Low Air Pressure Partial Discharge Recognition using Statistical Analysis of Time-domain Pulse Features. Ph.D. thesis, University of Manitoba. (Cited on page 33.)
Szegő, G. (1939, dec). Orthogonal Polynomials, Volume 23 of Colloquium Publications. Providence, Rhode Island: American Mathematical Society. (Cited on page 17.)
Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288. (Cited on pages 20, 21, 73 and 75.)
Vajda, S., K. R. Godfrey, and P. Valko (1988, feb). Numerical deconvolution using system identification methods. J. Pharmacokinet. Biopharm. 16(1), 85–107. (Cited on page 7.)
Verotta, D. (1993, oct). Two constrained deconvolution methods using spline functions. J. Pharmacokinet. Biopharm. 21(5), 609–636. (Cited on pages 7 and 8.)
Warne, D. and A. Haddad (Eds.) (2004, jan). Advances in High Voltage Engineering. Stevenage, UK: The Institution of Engineering and Technology (IET). (Cited on page 5.)
Wiener, N. and T. Teichmann (1959, aug). Nonlinear Problems in Random Theory. Phys. Today 12(8), 52–54. (Cited on page 8.)
Wood, S. N. (2017, may). Generalized Additive Models (2nd ed.). Chapman and Hall/CRC. (Cited on pages 48, 94 and 95.)
Yuan, M., J. Koh, T. K. Sarkar, W. Lee, and M. Salazar-Palma (2005). A comparison of performance of three orthogonal polynomials in extraction of wide-band response using early time and low frequency data. IEEE Trans. Antennas Propag. 53(2), 785–792. (Cited on page 28.)
Yuan, M. and Y. Lin (2006, feb). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Statistical Methodology) 68(1), 49–67. (Cited on pages 74, 75, 76 and 80.)
Zacharakis, G., A. Zolindaki, V. Sakkalis, G. Filippidis, E. Koumantakis, and T. G. Papazoglou (1999, feb). Nonparametric characterization of human breast tissue by the Laguerre expansion of the kernels technique applied on propagating femtosecond laser pulses through biopsy samples. Appl. Phys. Lett. 74(5), 771–772. (Cited on page 7.)