
EEG

by

Alex Sanford

Thesis submitted in partial fulfillment of the requirements for the Degree of Bachelor of Computer Science with Honours

Acadia University
March 2012

© Alex Sanford, 2012

This thesis by Alex Sanford is accepted in its present form by the School of Computer Science as satisfying the thesis requirements for the degree of Bachelor of Computer Science with Honours

Approved by the Thesis Supervisor

Dr. Jim Diamond Date

Approved by the Head of the Department

Dr. Danny Silver Date

Approved by the Honours Committee

Date

I, Alex Sanford, grant permission to the University Librarian at Acadia University to reproduce, loan, or distribute copies of my thesis in microform, paper or electronic formats on a non-profit basis. I, however, retain the copyright in my thesis.

Signature of Author

Date

Contents

Abstract

Acknowledgments

1 Introduction

2 Background
  2.1 EEG Details
    2.1.1 EEG Data Characteristics
    2.1.2 EEG Data Capture
  2.2 Data Compression
    2.2.1 Compression Performance
    2.2.2 Lossy vs. Lossless Compression
      Lossless Techniques
      Lossy Techniques
    2.2.3 Audio Compression
    2.2.4 FLAC
      Predictor
      Residual Encoding
  2.3 Digital Signal Filtering
    2.3.1 FIR Filters
    2.3.2 Types of FIR Filters
    2.3.3 FIR Filter Design
  2.4 Related Work

3 Theory and Approach
  3.1 Converting EEG Data to Audio Data Format
  3.2 EEG Data Filtering

4 Data
  4.1 Hand Motion Data
  4.2 Real and Imagined Motion Data
  4.3 P300 Data
  4.4 Sleep Data

5 Method and Results
  5.1 Method
    5.1.1 Quantization
      Smallest Difference
      Maximum Amplitude
    5.1.2 Filtering EEG Data
  5.2 Results
    5.2.1 Compression of Unfiltered EEG
    5.2.2 Compression of Low-pass Filtered EEG
    5.2.3 Compression of Notch Filtered EEG

6 Conclusions and Future Work
  6.1 Future Work

A Fourier Transform

B Fixed Polynomial Predictor

C Linear Predictive Coding

Bibliography

List of Tables

5.1 Average compression ratios of unfiltered data
5.2 Average compression ratios of unfiltered data with generic compression programs
5.3 Some subframe information from audio and EEG data
5.4 Average compression ratios of low-pass filtered data
5.5 Average compression ratios of hand motion data after notch filtering
5.6 Compression performance of hand motion data with various low-pass filters (15 Hz width and 50 dB attenuation)
5.7 Compression performance of sleep data with various low-pass filters (15 Hz width and 50 dB attenuation)
5.8 Average compression ratios of low-pass filtered data with generic compression programs
5.9 Average compression ratios of low-pass filtered data before and after adding high-order fixed polynomial prediction

List of Figures

1.1 One minute of EEG data
1.2 EEG data zoomed in to less than a second
1.3 Recording from the Emotiv EPOC device

2.1 Illustration of applying PCM to a sine wave
2.2 A compound signal
2.3 The frequency components of a compound signal
2.4 Calculating a residual based on the original sample value and a prediction
2.5 Example impulse response plot
2.6 Example frequency response plot
2.7 Frequency response of an ideal low-pass filter
2.8 Specification parameters of a low-pass filter

4.1 Sample of hand motion data
4.2 Sample of real and imagined motion data
4.3 Sample of P300 data
4.4 Sample of sleep data

5.1 Unfiltered EEG data
5.2 EEG data of Figure 5.1 with low-pass filter applied
5.3 FFT of unfiltered EEG data
5.4 FFT of low-pass filtered EEG data
5.5 FFT of notch filtered hand motion data
5.6 FFT of hand motion data with harmonics of 50 Hz filtered
5.7 Compression ratio vs. cutoff frequency of hand motion data
5.8 Compression ratio vs. cutoff frequency of sleep data

Abstract

Electroencephalography (EEG) is the recording of the brain’s electrical activity at the scalp. EEG data is widely used by physicians and researchers for several types of medical tests and psychological research. EEG is also very popular in the emerging field of Brain Computer Interfaces, as evidenced by the current availability of several consumer-level EEG recording devices. EEG recordings can produce large amounts of data, which require substantial storage space and transmission time. One solution to this problem is data compression. This thesis presents a new approach to EEG data compression using audio compression techniques. Digital signal filtering is used as a way of increasing the performance of the compression at the cost of losing some information. The results are compared to the compression results obtained when using some generic compression software. When an appropriate filter is used, audio compression provides very good compression performance for EEG data, especially data recorded at high sampling rates. This method produced compressed EEG files which were 15–30% of the original size. This is superior to the generic compression algorithms, which produced files which were 50–100% of the original size.

Acknowledgments

First, I would like to thank Jim Diamond for being a great supervisor and professor. His experience and knowledge have made this research possible, and his encouragement and patience have helped me to get this far. I greatly respect him as a teacher and as a friend, and have learned more from him than I’ve learned from many others. I would also like to thank the Faculty and Staff in the Jodrey School of Computer Science. The friendly, personal atmosphere of the school has made my research and education here a really great experience. Finally, a big thank you goes out to my family and friends for being there for me and seeing me through. Their love, understanding, and support are very valuable to me, and I couldn’t have done this without them.

Chapter 1

Introduction

Electroencephalography (EEG) is the recording of the brain’s electrical activity at the scalp. This information is useful for several applications, such as medical diagnosis and psychological research, and it is increasingly being used in the field of Brain Computer Interfaces, or BCI (described further below). EEG is a popular way of recording electrical brain activity because it is non-invasive (i.e., it doesn’t require subjects to have surgical implants in order to collect the data). EEG data is collected by placing electrodes on a subject’s scalp. The electrodes are typically spread out over the head and there can be anywhere from a single electrode to over a hundred, depending on the application. For example, the MindSet1 EEG gaming device by Neurosky has only a single electrode, whereas the g.GAMMAsys2 EEG research device by g.tec may have up to 86. Electrodes are often placed in standard locations across the scalp based on the “10-20” system (see page 140 of [NL05]). In the past, EEG data was stored by being written onto paper. The earliest EEG devices were analog rather than digital (as mentioned in the introduction of [Hug08]). Modern EEG, however, is recorded and stored digitally on a computer. The data from each electrode on an EEG device is typically viewed or recorded as a separate channel, similar to how stereo audio data is stored in a left channel and a

1. http://neurosky.com/Products/MindSet.aspx
2. http://www.gtec.at/Products/Electrodes-and-Sensors/g.GAMMAsys-Specs-Features


right channel. See Figure 1.1 for one minute of data from a single channel of EEG and Figure 1.2 for the same data zoomed in to under one second. Also, see Figure 1.3 for a multi-channel recording from the Emotiv EPOC device3. This device has 14 channels of data corresponding to its 14 electrodes. The channels are named based on the standard placement according to the 10-20 system. The channel names for the EPOC, which are listed on the left hand side of the figure, are AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4. Each name specifies the location of the corresponding electrode on the subject’s head. As can be seen, the recorded channels are stacked vertically.

Figure 1.1: One minute of EEG data

A very interesting application of EEG technology, mentioned above, is BCI.

3. http://emotiv.com/store/hardware/epoc-bci/epoc-neuroheadset/

Figure 1.2: EEG data zoomed in to less than a second

A great deal of research has been done around BCI, and some affordable consumer-level products are emerging. The Emotiv EPOC device, for example, is a simple EEG device which can be used to detect mental states, facial expressions, and particular cognitive thoughts. For example, a game which accesses the device could detect when a user starts to get bored and present more difficult challenges. Also, the user could think “push” or “pull” to activate particular functions in the game, such as opening or closing a door. The Emotiv software API has three types of detections:

• The Expressiv detections capture facial expressions, such as winks, eyebrow position, and a smile or frown.

• The Affectiv detections capture emotional state, such as boredom, excitement, and frustration.

Figure 1.3: Recording from the Emotiv EPOC device

• The Cognitiv detections capture thoughts such as “push”, “pull”, “lift”, “drop”, “left”, and “right”. These detections require the user to provide training data to the software so it can learn the EEG patterns of the particular user.

However, BCI is not limited to games, and has also been used for such tasks as letting paralyzed people move robotic limbs (see [Eav11]). As mentioned above, EEG is preferred over other forms of brain data collection because of its non-invasiveness. However, the electrical signal at the scalp is very weak and the electrodes must be very sensitive in order to detect the signal. This causes the EEG data to contain a lot of electrical “noise”, which occurs when electrical activity from sources other than the brain interferes with the EEG signal in the electrodes and in the wires. This noise includes artifacts from electronic devices, AC power lines, and any other electrical activity which may be present. Thus, for some applications, more invasive methods such as electrocorticography (ECoG) are preferred. ECoG involves surgically placing electrodes just inside the skull, but not

touching the brain matter itself. This decreases the amount of interference and thus the level of noise, giving a much cleaner signal. However, this is not appropriate for applications such as consumer level BCI, because of the surgery involved. Since EEG is so widely used, it is inevitable that problems begin to occur as the amount of data recorded, stored, and transmitted accumulates. Some medical procedures require EEG data to be recorded for a very long period of time, such as 24 hours. Even with just a single electrode, this could generate from 15 MB to over 400 MB of data, depending on the required quality of the recording. This means that a lot of disk space is required to store it, and transmitting it to others can take a long time. Furthermore, there may be several electrodes, several patients, and several trials of each. If each electrode for each trial of each patient stores 15–400 MB of data, the size of the data can increase dramatically. One solution to this problem is to employ data compression. Data compression is the art and science of representing data in a compact form. In the context of computing, this means representing, storing, or transmitting an object, such as a file or message, using fewer bits than the usual representation. This has several widespread practical uses which computer users see every day. For example, image, video, and audio data are seldom stored or transferred uncompressed because doing so would take a great deal of storage space and time. Another example of compression is “zipping” a directory. Zipping is a common action which involves archiving the contents of the directory into a single file and then compressing the resulting file. Data compression is a very common method for solving the problem of managing very large collections of data. Thus, one way to deal with increasing storage requirements and lengthy transmission times of EEG data is to compress it. This thesis explores the problems associated with compressing EEG data, develops a novel approach to solve this problem, and presents the results of applying this solution to actual EEG data. The next chapter presents some required background material.

Chapter 2

Background

This chapter will provide the necessary background in the areas of EEG, data compression, and digital signal filtering.

2.1 EEG Details

EEG signals are typically stored in a computer using Pulse Code Modulation (PCM). PCM is a technique which is used in many applications for producing a digital representation of an analog signal. The analog signal is analyzed at regular intervals and the magnitude of the signal at each of the points is converted to an integer. This is done by applying a linear mapping to the analog value and then rounding the result to the nearest integer. The linear mapping is constructed such that it converts all real-valued data in some desired range to integers in a range which can be represented by some predetermined number of bits, such as 16 (i.e., two bytes). The resulting integers are called samples, and the number of samples which are recorded per second is called the sampling rate. The sampling rate is usually expressed in units of Hz, which is equal to the number of samples per second. These samples provide a close approximation to the analog signal. Hardware components which perform the detection of analog signal levels and convert them into integers are called analog-to-digital (A/D) converters. See Figure 2.1 for an illustration of PCM applied to a portion of a sine wave.


In this illustration, the range of the sine wave is −7 to 7, so no linear mapping is necessary if 4-bit signed integers are to be used, as in this case. Each tick on the horizontal axis represents the time when a sample is taken. Each time a sample is taken, the value of the sine wave at that point is determined. This is shown in Figure 2.1 for the first five points (starting at sample x = 1). Afterward, these values are rounded to the nearest integer. The first five integer sample values would be 2, 4, 5, 6, and 7.

Figure 2.1: Illustration of applying PCM to a sine wave
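To make the linear mapping and rounding step concrete, the following short Python/NumPy sketch quantizes a sampled sine wave in the spirit of Figure 2.1. It is an illustration only; the function name, the sampling rate, and the choice of mapping the full amplitude onto the integer range are assumptions made for the example, not details taken from the thesis.

```python
import numpy as np

def pcm_quantize(analog_values, full_scale, n_bits=16):
    # Linear mapping: values with |value| <= full_scale are mapped onto the
    # signed n_bits integer range, then rounded to the nearest integer.
    scale = (2 ** (n_bits - 1) - 1) / full_scale
    return np.round(np.asarray(analog_values) * scale).astype(int)

# A sine wave of amplitude 7 sampled at regular intervals and stored as
# 4-bit signed integers (amplitude 7 already fills the 4-bit range, so the
# mapping reduces to rounding, as in the illustration above).
fs = 16                                   # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                    # one second of sample instants
analog = 7 * np.sin(2 * np.pi * t)        # the analog signal, range -7 to 7
samples = pcm_quantize(analog, full_scale=7, n_bits=4)
print(samples)                            # the resulting integer PCM samples
```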

Any analog or digital signal, including EEG, can be analyzed in the time domain or the frequency domain. The time domain represents the signal as a varying magnitude of some measurement (such as voltage) over time. This is how signals are often viewed. However, in some cases, it is more practical to study or discuss signals in

the frequency domain. Fourier analysis is a well-studied subject area which treats all signals as a sum of sinusoids at various frequencies and magnitudes. In practical terms, a signal can be broken into several components at various frequencies, where each component has its own magnitude. These components are called frequency components. Figure 2.2 shows an example of a compound signal which is made up of the frequency components shown in Figure 2.3. All frequency components in a digital signal with a given sampling rate fs have frequency strictly less than fs/2 (according to the Nyquist Sampling Theorem; see Appendix A).

Figure 2.2: A compound signal

Figure 2.3: The frequency components of a compound signal
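The decomposition of Figures 2.2 and 2.3 can be reproduced numerically with the discrete Fourier transform. The sketch below (Python/NumPy; the frequencies, magnitudes, and sampling rate are arbitrary choices for the example, not the components used in the figures) builds a compound signal from two sinusoids and recovers their frequencies and magnitudes from its spectrum.

```python
import numpy as np

fs = 256                                    # sampling rate in Hz
t = np.arange(fs) / fs                      # one second of samples
# A compound signal: a 5 Hz component of magnitude 1.0 plus a 20 Hz
# component of magnitude 0.5.
compound = 1.0 * np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

spectrum = np.fft.rfft(compound)            # frequency-domain representation
freqs = np.fft.rfftfreq(len(compound), d=1.0 / fs)
magnitudes = 2 * np.abs(spectrum) / len(compound)

# The two spectral peaks recover the frequency components of the signal.
for f in (5, 20):
    print(f, "Hz component, magnitude approximately",
          round(float(magnitudes[freqs == f][0]), 3))
```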

2.1.1 EEG Data Characteristics

Some characteristics of EEG signals are linked to particular activities of the brain. For example, the EEG signal from a given subject who is sleeping will be different than the EEG signal of the same subject while trying to solve a math problem (the discussion here is limited to human subjects). The frequency components of EEG data are typically characterized into the following frequency ranges:

• Delta waves are in the frequency range 0–4 Hz. They are dominant when the subject is sleeping.

• Theta waves are in the range 4–8 Hz. They are typically dominant when the subject is drowsy or meditating.

• Alpha waves are in the range 8–13 Hz. They are dominant when the subject is relaxing, and there are also some characteristic patterns in this range which emerge over the sensorimotor cortex (the part of the brain responsible for physical movement) when the subject moves part of their body, such as an arm.

• Beta waves are in the range 13–30 Hz. They are associated both with physical movement and concentration.

• Gamma waves are in the range 30–100 Hz and are believed to be associated with particular cognitive or motor functions.

These definitions are widely used among EEG researchers, although the exact frequencies may vary. See [Bio12] for a summary of Delta, Theta, Alpha, and Beta waves. See [Hug08] for a lengthy discussion on a study of Gamma waves. In general, medical EEG applications are limited to activity in the 0–30 Hz range. Signals in the Gamma range have gained more research interest recently with digital EEG capture, but haven’t gained widespread medical use. This is because early analog EEG systems could only capture EEG data up to around 25 Hz and so Gamma waves are not as well understood (see [Hug08]). Thus, many EEG applications do not use them, and in most cases they can be ignored.

2.1.2 EEG Data Capture

EEG data is captured using specialized hardware, which includes an A/D converter as well as other components. The specialized hardware includes the following (see [GPR+03]):

• Electrodes which are placed on the subject’s head. Usually a conductive gel or liquid is applied to the electrodes to facilitate the detection of the electrical signals.

• An amplifier which boosts the weak electrical signals from the scalp, making it possible to detect them. It is extremely important to have protective circuitry at this stage, since a fault could apply a direct electrical shock to the subject’s head and brain, which could have severe consequences.

• An A/D converter which converts the EEG data to integer samples.

• A recording device, which is usually a computer. The integer samples are stored and/or viewed on this device.

As mentioned above, A/D converters convert the real-valued voltages into integer samples. However, in many cases these integer samples are more easily interpreted by humans after being converted into floating point values which approximate the voltages that were detected. Thus some EEG data, especially data stored in the MATLAB file format, is stored on the computer as floating point values, rather than the original integer values which were generated by the A/D converter.

2.2 Data Compression

The goal of data compression is to represent an object (such as a file) using a smaller number of bits than the usual representation. There are several techniques which are used to compress files, most of which are application specific; for example, a compression technique which works well on image files may not work well on audio files. However, all data compression techniques follow some common theoretic principles. Data compression is tied to the mathematical study of information theory. Information theory involves the quantification of information. Usually we talk about information being inherent in a particular event. The definition of an event is application specific, and in the case of data compression, is usually some part of the object we’re trying to represent. For example, a pixel in an image file which has a particular color may be considered an event. The amount of information this pixel contains depends on its color, as well as the color of surrounding pixels. In the study of data compression, the goal is to assign a smaller number of bits to the representation of an event which contains less information, and consequently a larger number of bits to an event containing more information. Events which occur often contain less information, since they are less “surprising”. To illustrate this point, consider words in English containing the letter q. The letter q is almost always followed by the letter u in English words. Thus, one method of

compressing English words would be to remove all u’s which occur after a q, given the understanding that every q is followed by a u by default. Then, whenever a word occurs which contains a q with no subsequent u, output some character after the q which represents the fact that there is no u after the q. This is an example of using more representation (or more bits) for events which contain more information (or are more surprising). Thus, if we assign fewer bits to the more common events, we can benefit from using a small representation for many events, even if that means we have to use a larger representation for the less common events. An early form of data compression which takes advantage of commonly occurring letters in the English language is Morse Code (see page 2 of [Say00]). This code uses short sequences for commonly used letters (such as e and a) and long sequences for uncommon letters (such as q and j). This allows people to send messages quickly, since they spend little time transmitting common letters. Even though they consequently have to spend more time to transmit uncommon letters, the gain almost always outweighs the loss. A similar form of data compression which is very common today is known as Huffman coding (Chapter 3 of [Say00]). The Huffman coding algorithm encodes a sequence of characters (such as a text file) one character at a time. Given the probability of each character occurring in the file, an encoding is created such that characters with a high probability are encoded using fewer bits, and characters with a low probability are encoded using more bits. Similar to Morse Code, the gains usually outweigh the losses, making Huffman coding a moderately good data compression technique for most files.
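To make the idea concrete, here is a minimal Huffman coding sketch in Python. It is an illustration of the principle, not production code or code from the thesis: the function name, the tie-breaking scheme, and the example string are all choices made for the example.

```python
import heapq
from collections import Counter

def huffman_code(text):
    # Build a Huffman code (character -> bit string) from the character
    # frequencies observed in text.
    freq = Counter(text)
    if len(freq) == 1:                       # degenerate case: one symbol only
        return {next(iter(freq)): "0"}
    # Heap entries: (subtree frequency, tie-breaker, {char: code so far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)  # the two least frequent subtrees
        f2, _, codes2 = heapq.heappop(heap)
        merged = {ch: "0" + c for ch, c in codes1.items()}
        merged.update({ch: "1" + c for ch, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

text = "this is an example of huffman coding"
codes = huffman_code(text)
encoded = "".join(codes[ch] for ch in text)
# Frequent characters (such as the space) receive short codes; rare ones
# receive long codes, so the total length is usually well below 8 bits/char.
print(len(encoded), "bits, versus", 8 * len(text), "bits uncompressed")
```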

2.2.1 Compression Performance

There are several standard ways to measure the performance of a compression technique. Two very common measures are the time complexity of the compression algorithm, and the amount by which a file’s size decreases after compression. The latter is usually of the most interest, although other measurements are also very important (for example, if a compression technique compresses audio data better than any other

technique but the algorithm takes several hours to run on typical audio files, then it is not a good technique to use in general). In this thesis, we will use the term compression performance to mean the amount by which the file is compressed. A high compression performance means that the compression technique performed well, and the compressed file is much smaller than the original. There are several common ways to measure compression performance. We will use the measurement of the compression ratio to determine the compression performance of an experiment. The compression ratio may be defined in a number of ways, but the following definition is used in this thesis:

\[
\text{Compression Ratio} = \frac{\text{Compressed File Size}}{\text{Original File Size}}
\]

Thus a small compression ratio implies high compression performance. The compression ratio will be expressed as a percentage. For a more detailed discussion of compression performance, see Section 1.1.3 of [Say00].

2.2.2 Lossy vs. Lossless Compression

There are two fundamental categories of data compression techniques, namely “lossless” and “lossy” techniques. As the name implies, lossless techniques compress the data without losing any information. However, lossy techniques involve the loss of information in order to increase compression performance. More formally, assume a data string S is compressed yielding an encoded string C using some compression technique. C is then decompressed yielding the decoded string S′. If a lossless compression technique is used, the string S′ will exactly equal the original string S. If a lossy technique is used, S′ will not be equal to S in general.

Lossless Techniques

Huffman coding, as described earlier, is an example of a lossless compression technique; i.e., when a string is encoded using Huffman coding, and the result is then decoded, the final decoded string will be identical to the original string before it was

encoded. Lossless compression is important for applications where alterations in the data can cause problems. Text compression is one such example, since changing a character in a word can make the word meaningless or change its meaning.

Lossy Techniques

The problem with lossless techniques is that they usually do not compress the data as well as lossy techniques. In other words, lossy techniques can usually represent the original data using fewer bits than a lossless technique (at the cost of losing some information). In many cases, an exact reproduction of the original data is not necessary, as a close approximation will do. For example, some audio compression algorithms remove sounds from the audio file which cannot be perceived by the human ear. Since information has been removed from the file, it can be represented using a smaller number of bits, resulting in higher compression performance. Obviously, this is only appropriate if the purpose of the audio file is to be listened to by a human. If the audio file contains data such as sonar signals to be inspected visually, removing inaudible portions of the signal is not acceptable. Lossy compression techniques raise the issue of acceptable loss vs. compression performance. The amount of acceptable loss is very application specific (as discussed above) and can be subjective. Some lossy compression techniques, such as JPEG and MP3 audio compression, allow the user to specify the amount of compression necessary. The user can inspect the file after compression to ensure that only an acceptable amount of information has been lost (e.g., an image still looks good and is not too grainy).

2.2.3 Audio Compression

There have been various studies into the performance of data compression techniques when applied to EEG data. An overview of some of these studies is given in Section 2.4. A technique which is not employed in any of the mentioned studies is audio compression. Audio data is stored as multi-channel PCM and is recorded using an A/D converter, after converting the acoustic sound waves into an electrical signal

using hardware such as a microphone. Since EEG is stored as multi-channel PCM data, it has several similarities with audio data. Because of these similarities, it is possible to compress EEG data using audio compression techniques. Many audio compression algorithms exist, both lossless and lossy. Since audio data and EEG data are perceived differently and have different properties, it doesn’t make sense to use lossy audio compression techniques on EEG data. As stated in Section 2.2.2, the acceptable amount of loss is very dependent on the application. For example, removing frequencies which are not perceivable by the human ear is not acceptable for EEG data since most of the meaningful information is in very low frequency waves, outside the range of human hearing. Thus the audio compression algorithms used in this thesis will be lossless algorithms. In particular, the use of FLAC, the Free Lossless Audio Codec (see [Coa08]), will be discussed. Audio compression is like text compression in that the compressor attempts to encode events which contain less information using a smaller number of bits. In the case of text compression, the relevant events are the occurrences of particular characters. Huffman coding determines the amount of information contained in the character occurrence by calculating the probability of occurrence of each character. For audio data, the relevant events are audio samples. A common method of determining the amount of information contained in an audio sample is to use a predictor. A predictor is a mathematical function which predicts the expected value for a sample given some number of previous samples. This prediction is then subtracted from the actual value of the sample being predicted. This difference is known as the residual. See Figure 2.4 for an example of a predictor and residual. The black data points represent the sample values and the dotted line is the predictor. This particular predictor uses the first two black points to generate a prediction for the third one. The prediction is shown by the white data point. The residual of this calculation is the value of the black data point at x = 3 minus the value of the diamond shaped data point, giving a residual value of 9 − 8 = 1. Assuming we have a good predictor, we can say that the information contained in a sample is related to the magnitude of the residual. For sequences of samples containing little information, in principle it should be possible to build a predictor

Figure 2.4: Calculating a residual based on the original sample value and a prediction

whose predictions are very close to the actual sample values, resulting in residuals with low magnitude. For samples containing more information, the predictor may not produce predictions which are as close to the actual values, thus yielding residuals with larger magnitude. After these residuals are produced they are encoded. Ideally, the encoded residuals each take up fewer bits than their corresponding audio samples, and so the compressed file is smaller than the original. Residuals with low magnitude (corresponding to predictions which are very close to the sample value) can be encoded with fewer bits than residuals with high magnitude. As with text compression, some samples may contain more information and thus may produce residuals with higher magnitude (resulting in a greater number of bits after encoding), but typically the gains outweigh

the losses.
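The following sketch shows the predictor-and-residual idea of Figure 2.4 in Python. The predictor used here is one plausible choice of this kind (linear extrapolation from the two previous samples, i.e., a second-order fixed polynomial predictor); the figure does not state its predictor's exact formula, and the sample values below are invented for the example.

```python
import numpy as np

def predict_and_residual(samples):
    # Predict each sample (from the third one on) by extrapolating a straight
    # line through the two previous samples, then subtract the prediction
    # from the actual value to obtain the residual.
    samples = np.asarray(samples, dtype=int)
    predictions = 2 * samples[1:-1] - samples[:-2]
    residuals = samples[2:] - predictions
    return predictions, residuals

samples = [5, 7, 9, 10, 10, 9]
predictions, residuals = predict_and_residual(samples)
print(predictions)   # first prediction: 2*7 - 5 = 9
print(residuals)     # small values, cheaper to encode than the raw samples
```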

2.2.4 FLAC

FLAC is software which uses the concepts described above (among others) to compress audio files. The operation of FLAC includes partitioning the audio file, building the predictors, computing the residuals, and encoding the residuals, all of which is described in this section. This thesis uses the terminology and spelling which is used by the author of FLAC. This terminology, at the time of this writing, is defined at http://flac.sourceforge.net/format.html#definitions and is discussed below. Before compression, FLAC partitions the audio file into several sections called blocks. Each block is made up of several subblocks. There is one subblock for each channel of audio data. Each subblock contains N consecutive samples of the corresponding audio channel. The number N is configurable and is known as the blocksize. By default, the blocksize is 4096 samples. Each subblock is encoded independently using, in general, a different predictor with different parameters. This is beneficial because the nature of the audio data varies throughout the data file, and also varies between channels, making different predictors perform better on different sections of the audio. A compressed subblock is called a subframe, and the collection of compressed subblocks within a block is called a frame.

Predictor

FLAC uses one of four methods for encoding a subblock to produce a subframe: constant, verbatim, fixed, and LPC :

• A constant subframe is one for which each sample in the original subblock had the same value. This can be compressed trivially, and so FLAC simply stores the value of the samples and the blocksize. No further encoding is necessary for this type of subframe.

• A verbatim subframe is one for which no attempted compression algorithm delivered any benefit. In other words, every attempted compression algorithm produced a representation of the subframe which used more bits than the original subblock, rather than fewer. In this case, FLAC will simply store the original data in the subframe without predicting or encoding.

• A fixed subframe is a subframe which uses a fixed polynomial predictor to compute the residuals for the audio samples. Fixed polynomial predictors usually do not perform as well as the more powerful LPC method described below, but they require much less overhead space. The only parameter which needs to be stored is the predictor order. For details on the mathematics behind fixed polynomial predictors, see Appendix B.

• An LPC subframe uses linear predictive coding (LPC) as a predictor. LPC is much more flexible than fixed polynomial prediction, and consequently often provides better compression performance, at the cost of requiring more overhead space. LPC requires more parameters to be stored than fixed polynomial predictors: the order n of the predictor must be stored, and n coefficients must also be stored. In other words, LPC usually performs better than fixed polynomial predictions, but in the case that it doesn’t perform very much better, the fixed polynomial predictor is preferred because it has smaller overhead space requirements. For details on the mathematics behind LPC and how the coefficients are determined, see Appendix C.

The first thing that FLAC does with each subblock is determine which subframe type will use the smallest number of bits. It typically tries each subframe type and chooses the best one. In the cases of fixed and LPC subframes, it uses heuristics to determine the predictor order which should be used. It may also choose several predictor orders or even exhaustively search through all supported orders and try a different predictor with each one. Once the predictor and its coefficients are chosen, FLAC computes the residuals and encodes them.
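The sketch below illustrates the flavour of this search for fixed subframes only: each supported fixed predictor order is tried on a subblock and the order whose residuals look cheapest is kept. The cost estimate (the sum of absolute residuals) and the handling of the first few samples are simplifications made for the example, not FLAC's actual heuristic.

```python
import numpy as np

def best_fixed_order(subblock, max_order=4):
    # Try fixed polynomial predictor orders 0..max_order and return the order
    # whose residuals appear cheapest to encode, plus the per-order costs.
    subblock = np.asarray(subblock, dtype=np.int64)
    costs = {}
    for order in range(max_order + 1):
        # The residual of the order-k fixed predictor is the k-th difference
        # of the signal (see Appendix B for the underlying mathematics).
        residuals = np.diff(subblock, n=order) if order else subblock
        costs[order] = int(np.abs(residuals).sum())
    return min(costs, key=costs.get), costs

order, costs = best_fixed_order([10, 12, 15, 19, 24, 30, 37])
print("chosen order:", order)   # this smoothly growing example favours a high order
print(costs)
```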

Residual Encoding

FLAC uses Rice coding to encode the residuals after they are computed. Rice coding involves choosing a parameter k (known as the Rice parameter) and dividing the residual by 2^k. The quotient is then encoded in unary and the remainder in binary, along with a sign bit to represent the sign of the residual. Since the magnitude of the remainder is a number between 0 and 2^k − 1, it can be represented in binary using k bits. Also, one bit is required to separate the unary portion and the remainder of the residual. Thus each residual uses q + k + 2 bits where q is the quotient. FLAC also uses heuristics for choosing the Rice parameter and can be configured to search exhaustively through a range of numbers for the best value. Furthermore, beyond splitting the audio data file into blocks, frames, and subframes, an individual subframe is further split into partitions before encoding, where each partition has its own Rice parameter. The number of partitions is another parameter which must be decided by FLAC, and also may be searched for exhaustively. This is useful when the predictor performance varies throughout the subblock. When the predictor produces large residuals, it is best to have a larger value of k to ensure that the unary coded values of q do not become too large. However, when the predictor produces small residuals, a smaller value of k gives better compression. Thus the number of partitions can greatly affect the compression performance.
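A small Python sketch of the encoding just described is given below. The exact bit layout (sign bit first, then the unary quotient with a terminating 0, then the k-bit remainder) is one reasonable reading of the description above; it is an illustration of Rice coding, not FLAC's actual bitstream format.

```python
def rice_encode(residual, k):
    # Encode one residual with Rice parameter k: a sign bit, the quotient
    # in unary (terminated by a 0 bit), and the remainder in k binary bits.
    sign = "1" if residual < 0 else "0"
    quotient, remainder = divmod(abs(residual), 2 ** k)
    unary = "1" * quotient + "0"
    binary = format(remainder, "0{}b".format(k)) if k else ""
    return sign + unary + binary              # q + k + 2 bits in total

for r in (1, -3, 17):
    bits = rice_encode(r, k=2)
    print(r, "->", bits, "({} bits)".format(len(bits)))
```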

2.3 Digital Signal Filtering

Digital signal filtering is a powerful technique for analyzing and modifying digital signals. It is commonly used in EEG data analysis. A digital filter is often used to attenuate components of a signal in a given frequency range or set of ranges. In the case of this study, filtering was used as a technique for increasing compression performance. In medical EEG analysis, any frequency components above 30 Hz are usually considered noise (see Section 2.1.1), and there is often interference which comes from electrical lines because of the sensitivity of the electrodes. This interference occurs at 50 Hz or 60 Hz, depending on the electrical standards of the location where the

data is recorded. Thus, it is usually helpful to filter out data above 30 Hz as it is often not needed. However, filtering is a lossy process as it cannot be undone. Thus, if it is employed to improve compression, then the compression algorithm must be considered lossy, even if the filter output is subsequently compressed using a lossless technique. A discussion of digital signal filtering follows.

2.3.1 FIR Filters

There are several types of digital filters, but this discussion will be limited to a particular type of linear filter called a Finite Impulse Response (FIR) filter. A FIR filter works by applying a linear operation called convolution (described below) to a signal and a list of coefficients (this process is described fully in Section 5.2 of [Lyo01]). These coefficients (called the filter coefficients) define the operation of the filter. Convolution is a fundamental operation on two continuous functions or two discrete arrays which is used very commonly in signal processing and analysis. The convolution of two signals x and y produces a signal x ∗ y as a result. The operation is defined as follows:

\[
(x * y)[n] = \sum_{m=-\infty}^{+\infty} x[m]\, y[n-m],
\]

where any undefined value of x or y is assumed to be zero. This means that even though m varies from −∞ to +∞, whenever x[m] = 0 or y[n − m] = 0 the corresponding element of the sum will also be 0 so it can be ignored. Thus the sum can be done over a finite number of nonzero elements. There exist algorithms to compute the convolution of two signals in O(N log N) time where N is the number of samples in the larger of the two signals. Two important properties of a FIR filter are the impulse response and the frequency response. The impulse response of the filter is the output which results from an input signal whose first sample value is 1 and all subsequent sample values are 0 (this type of signal is called an impulse). See Figure 2.5 for a plot of an impulse response of a particular FIR filter. Because of the nature of convolution, the impulse

response of a filter is a signal whose sample values are equal to the filter coefficients, and so the filter coefficients are often referred to as the impulse response of the filter.

Figure 2.5: Example impulse response plot
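A brief sketch of these ideas in Python/NumPy follows: the filter is nothing more than its list of coefficients, filtering is the convolution defined above, and convolving an impulse with the coefficients returns the coefficients themselves. The 5-tap moving-average coefficients are an arbitrary example, not a filter taken from the thesis.

```python
import numpy as np

coefficients = np.ones(5) / 5.0          # the filter's impulse response

# Apply the FIR filter to an arbitrary input signal by convolution.
signal_in = np.sin(2 * np.pi * 3 * np.arange(64) / 64)
filtered = np.convolve(signal_in, coefficients)

# Feeding in an impulse (a 1 followed by zeros) returns the coefficients,
# which is why the coefficients are called the impulse response.
impulse = np.zeros(8)
impulse[0] = 1
print(np.convolve(impulse, coefficients)[:5])   # equals `coefficients`
```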

The frequency response of a FIR filter is obtained by taking the Fourier Transform (described below) of the impulse response. The Fourier Transform of a time domain signal produces a representation of the signal in the frequency domain. See Appendix A for details. The frequency response is typically shown as a plot and it represents the multiplicative change in magnitude of the various frequency components after going through the filter. For example, a frequency response with a value of 1.0 at 5 Hz indicates that the magnitude of the 5 Hz frequency component of a signal will be untouched by the filter (i.e., multiplied by 1.0). Similarly, a frequency response with a value of 0.5 at 10 Hz indicates that the magnitude of the 10 Hz

frequency component of a signal will be reduced by half after going through the filter. See Figure 2.6 for a plot of the frequency response corresponding to the impulse response in Figure 2.5.

Figure 2.6: Example frequency response plot

2.3.2 Types of FIR Filters

Signal filters which are designed to attenuate particular frequencies and let others pass are typically grouped into one of four categories based on which frequencies pass through the filter unaffected (the passband), and which frequencies are attenuated (the stopband):

• Low-pass filters allow low frequencies to pass while attenuating high frequencies.

• High-pass filters allow high frequencies to pass while attenuating low frequencies.

• Band-pass filters allow frequencies within a given range to pass. All other frequencies are attenuated.

• Band-stop filters attenuate frequencies which are within a given range. All other frequencies are passed. A band-stop filter with a very small stopband is called a notch filter.

In order to discuss the parameters and construction of a FIR filter, only low-pass filters need to be considered. This is because any other type of filter can be constructed from a set of low-pass filters and some basic operations. For example, to construct a high-pass filter which attenuates frequencies below 50 Hz, we can first design a low-pass filter which attenuates frequencies above 50 Hz, filter the signal, and subtract the filter output from the original signal. This removes all the frequency components which would pass through the low-pass filter, thus creating a high-pass filter. In practice, the low-pass filter and the subtraction operation are combined to produce a new impulse response representing the desired high-pass filter. Similarly, band-pass and band-stop filters can be created from low-pass filters.

2.3.3 FIR Filter Design

There are various parameters to consider when designing a FIR low-pass filter. Ideally, the filter should pass all frequency components below a given cutoff frequency fc and remove all frequency components above the cutoff. The ideal frequency response of a low-pass filter is shown in Figure 2.7. Unfortunately, the ideal frequency response requires an infinitely long impulse response, which is obviously not possible with a FIR filter. Thus, a perfect low-pass filter cannot be constructed, and must instead be approximated. For a real low-pass filter, in general, the frequency components in the passband will not be passed completely unaffected, the frequency components in the stopband will not be perfectly attenuated, and the passband and stopband will not be

adjacent, but rather there will be a range of frequencies between them. The following quantities are used to specify the extent of each of these issues for a low-pass filter:

Figure 2.7: Frequency response of an ideal low-pass filter

• The passband ripple, δp, is the amount by which the frequency response within the passband may vary from 1. In other words, this quantity specifies how much the frequency components in the passband will be affected by the filter.

• The stopband ripple, δs, is the amount by which the frequency response within the stopband may vary from 0. The smaller this value is, the more attenuation will be applied to the frequency components in the stopband.

• The stopband edge frequency, θs, is the lowest frequency in the stopband.

• The passband edge frequency, θp, is the highest frequency in the passband.

See Figure 2.8 for a graphical view of these quantities. A similar figure is given in Section 8.2.1 of [Por97]. Two other quantities which are frequently discussed are the transition width TW = θs − θp (also shown in Figure 2.8) and the stopband attenuation As = −20 log(δs). The stopband attenuation is in units of decibels (dB) and a high stopband attenuation implies a low stopband ripple.

Figure 2.8: Specification parameters of a low-pass filter

The way to choose these parameters is very dependent on the goal. More demanding filters, with a narrow transition width, high stopband attenuation, and low ripple, require more filter coefficients and consequently take more time to apply to a signal. This may be required for very accurate automated analysis, but if the filter is going

to be used, for example, by a human to visually search for particular artifacts, this level of precision may not be required.
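As an illustration of how these specification quantities translate into an actual filter, the sketch below uses SciPy (which the thesis' scripts are built on) to design a low-pass filter with a 30 Hz cutoff, a 15 Hz transition width, and 50 dB of stopband attenuation for data sampled at 500 Hz; these numbers match filters discussed later in the thesis, but the Kaiser-window design method is simply one standard choice and is not claimed to be the method the thesis' scripts used.

```python
import numpy as np
from scipy import signal

fs = 500.0                        # sampling rate in Hz
nyquist = fs / 2.0
width = 15.0 / nyquist            # transition width TW, normalized to Nyquist
attenuation_db = 50.0             # stopband attenuation As in dB

# Number of taps and Kaiser window parameter needed to meet the specification.
numtaps, beta = signal.kaiserord(attenuation_db, width)
taps = signal.firwin(numtaps, 30.0 / nyquist, window=("kaiser", beta))

# Apply the filter to one channel of synthetic EEG-like data: a 10 Hz
# component (kept) plus a 50 Hz interference component (attenuated).
t = np.arange(0, 2.0, 1.0 / fs)
eeg_like = np.sin(2 * np.pi * 10 * t) + 0.3 * np.sin(2 * np.pi * 50 * t)
filtered = signal.lfilter(taps, 1.0, eeg_like)
print(numtaps, "filter coefficients")
```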

2.4 Related Work

Some work has been done in the field of EEG data compression by various groups. A brief discussion of some of these studies follows. Magotra et al. describe a process of lossless EEG compression using reversible filters and entropy coding in [MMM96]. The EEG data is filtered with a reversible filter, and the result from that filter is encoded using a type of entropy coding known as arithmetic coding. They achieved compression ratios (as defined in Section 2.2.1) of roughly 60%. Antoniol and Tonella give a summary of several EEG compression techniques in [AT91], as well as results. These techniques were used on EEG recorded with 8-bit samples, rather than 16-bit samples as used in this thesis (see Chapter 4). Ylostalo also gives an overview of compression techniques which are used for EEG data in [Ylo99]. This paper does not include any experimental results. Memon and Cinkler describe an experiment in [MC99] done on EEG signals obtained from one-week-old piglets. Several compression methods were used, including linear predictive coding (as used by FLAC). With the linear predictive coding method, they were able to obtain compression ratios of roughly 50%. It is very difficult to compare the results of these mentioned studies to the results of this study. Some broad comparisons can be made, but in general, compression algorithms are very data dependent. Since the data used in these studies is not publicly available, there is no way to directly compare compression performance by using the same data. Furthermore, differences such as the amount of pre-filtering applied and the use of different types of EEG data (such as adult human and one-week-old piglet) make direct comparisons meaningless. The next chapter outlines the new approach which is taken in this thesis to compress EEG data.

Chapter 3

Theory and Approach

In this thesis, a novel approach is taken to EEG data compression. First, the EEG data sets are converted into an audio data format, and the data is then compressed using FLAC. This conversion is possible because of the similarities between the storage techniques employed for audio and EEG data (see Section 3.1). Many of the experiments used signal filtering as a tool to improve compression performance. When this is done, the compression is considered lossy. Otherwise, it is lossless. The filters which were used are described in Section 3.2. As seen in Section 2.4, EEG data compression has been approached using a wide variety of methods, but unlike audio or video data, no compression methods have been standardized for EEG compression. Standard EEG data file formats store EEG data uncompressed. Likewise, there is no baseline test data for EEG compression performance, such as the Lenna image used for image compression1. Thus in this thesis, comparisons will be made between the compression performance of FLAC and the compression performance of some generic compression software. It is assumed that FLAC should perform better than generic compression software. This is because EEG is stored as a PCM waveform. This makes it very similar to audio data, which FLAC compresses well. The generic programs are not built specifically for audio or EEG data, but rather are designed to work with a wide variety of data.

1. http://www.lenna.org


3.1 Converting EEG Data to Audio Data Format

As mentioned in Section 2.2.3, audio data and EEG data are both stored on computers using PCM. Audio data is stored in a number of different file formats, so the simple wav format was chosen to represent EEG data. A wav file stores some information about the audio file in a header, and subsequently stores the data samples. EEG files store the same information, but in various formats. Thus to convert an EEG data file to a wav data file, the header information and EEG data have to be read from the EEG file, and then rewritten in the wav file format. This process does not alter the EEG data, but represents it in a way which can be read and processed by tools designed for audio data, such as FLAC.
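A minimal sketch of this conversion for a single channel is shown below, using SciPy's wav support. The file name, sampling rate, and synthetic samples are placeholders; the thesis' own conversion scripts are not reproduced here.

```python
import numpy as np
from scipy.io import wavfile

sampling_rate = 500                        # EEG sampling rate in Hz (example)
# One channel of EEG samples, already quantized to 16-bit integers.
channel = (1000 * np.sin(2 * np.pi * 10 * np.arange(5 * sampling_rate)
                         / sampling_rate)).astype(np.int16)

# A wav file stores a small header (sampling rate, sample width, channel
# count) followed by the PCM samples, mirroring what an EEG file records.
wavfile.write("eeg_channel_01.wav", sampling_rate, channel)

# Reading the file back recovers the samples exactly: the conversion itself
# is lossless.
rate, data = wavfile.read("eeg_channel_01.wav")
assert rate == sampling_rate and np.array_equal(data, channel)
```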

3.2 EEG Data Filtering

As mentioned in Section 2.3, a technique which was used to increase compression performance was filtering the EEG data. The prediction functions of FLAC tend to compress lower frequency signals better than higher frequency signals (as discussed in Section 5.2). Removing the high frequency signals from the EEG data improves the performance of the predictor, thus reducing the magnitude of the residuals and increasing compression performance. A great deal of the high frequency noise in EEG data is electrical noise originating from the AC electrical power in the building where the recording took place. The recordings were done in Europe, so the AC power noise occurred at 50 Hz, in keeping with the frequency of electrical power systems in Europe. If the recording was from North America, the noise would have been at 60 Hz instead. It was also seen that the harmonics of this noise were present in the signal. The harmonics of a frequency component are all frequency components whose frequencies are an integer multiple of that component’s frequency. For example, the harmonics of the 50 Hz noise are the frequency components at 100 Hz, 150 Hz, 200 Hz, etc. Several different filters were used on the EEG data sets. These include a low-pass filter with a cutoff of 30 Hz and a notch filter to remove the 50 Hz noise. Another

filtering technique which was attempted was to apply a bank of notch filters to remove the 50 Hz noise and its harmonics. The signal was fed through each notch filter in series in order to remove each harmonic. Finally, several low-pass filters were used with varying cutoff frequencies in order to explore the effect of filtering out varying amounts of high frequency data.
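The sketch below shows one way such a bank of notch filters could be built with SciPy: an FIR band-stop filter is designed for 50 Hz and for each harmonic below the Nyquist frequency, and the signal is passed through the filters in series. The notch width, number of taps, and synthetic test signal are assumptions made for the example, not the parameters used in the thesis.

```python
import numpy as np
from scipy import signal

def notch_bank(data, fs, base_freq=50.0, half_width=2.0, numtaps=501):
    # Build and apply, in series, one FIR band-stop ("notch") filter per
    # harmonic of base_freq below the Nyquist frequency.
    nyquist = fs / 2.0
    harmonic = base_freq
    while harmonic + half_width < nyquist:
        band = [(harmonic - half_width) / nyquist,
                (harmonic + half_width) / nyquist]
        taps = signal.firwin(numtaps, band)      # two cutoffs -> band-stop filter
        data = signal.lfilter(taps, 1.0, data)   # remove this harmonic, then the next
        harmonic += base_freq
    return data

fs = 500.0
t = np.arange(0, 2.0, 1.0 / fs)
noisy = (np.sin(2 * np.pi * 10 * t)              # EEG-like 10 Hz component
         + 0.5 * np.sin(2 * np.pi * 50 * t)      # power-line noise
         + 0.2 * np.sin(2 * np.pi * 100 * t))    # its first harmonic
cleaned = notch_bank(noisy, fs)                  # 50 Hz, 100 Hz, ... removed
```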

Chapter 4

Data

Several data files were used from various sources for experimentation. This chapter describes the data which will be discussed in this thesis.

4.1 Hand Motion Data

This data was collected from a 21-year-old, right-handed male subject with no known medical conditions. There are three data sets:

• Baseline - the subject made no movements

• Left - the subject made random movements with his left hand

• Right - the subject made random movements with his right hand

Each file contains 19 channels of data corresponding to 19 electrodes. The data samples are stored in the MATLAB file format as 64-bit floats. The recordings are each 128.60 seconds long, with a sampling rate of 500 Hz, giving 64,300 samples in each file. The data can be found (as of the time of this writing) at http://sites.google.com/site/projectbci/, dataset 1. Download and extract the three RAR files to get a file Subject1 1D.mat. This is the MATLAB file which contains the data. Figure 4.1 shows a sample plot of this data.


Figure 4.1: Sample of hand motion data

4.2 Real and Imagined Motion Data

This data was also collected from a 21-year-old, right-handed male subject with no known medical conditions. There are eighteen data files which involve the subject performing the following activities:

• Three trials of left hand forward movement

• Three trials of left hand backward movement

• Three trials of right hand forward movement

• Three trials of right hand backward movement

• One trial of imagined left hand forward movement

• One trial of imagined left hand backward movement

• One trial of imagined right hand forward movement

• One trial of imagined right hand backward movement

• One trial of left leg movement

• One trial of right leg movement

Each file contains 19 channels of data corresponding to 19 electrodes. The data samples are stored in the MATLAB file format as 64-bit floats. All data is recorded at a sampling rate of 500 Hz. The hand movement recordings are each 6.02 seconds long, giving 3008 samples. The imagined hand movement recordings are each 14.08 seconds long, giving 7040 samples. The leg movement recordings are each 20.10 seconds long, giving 10,048 samples. The data can be found at http://sites.google.com/site/projectbci/, dataset 2. Directly download the Subject1 2D.mat file which contains all the data. Figure 4.2 shows a sample plot of this data.

4.3 P300 Data

This data contains various trials of data used for P300 detection. P300 is a particular EEG waveform which occurs when the subject recognizes an infrequent task-related stimulus, for example, when the image they are looking at on a screen disappears. The subjects were presented with various stimuli at various points in time while remaining still. The data consists of six subjects, each with four sessions of six trials. However, only the first two subjects, with the first two trials of their first two sessions, were used, giving a total of 8 trials. Each file contains 34 channels of data corresponding to 34 electrodes. The data samples are stored in the MATLAB file format as 64-bit floats. All data is recorded at a sampling rate of 2048 Hz. The trials have varying durations between 50 and

Figure 4.2: Sample of real and imagined motion data

70 seconds (102,400 and 143,360 samples, respectively). The data can be found at http://mmspg.epfl.ch/cms/page-58322.html. The files subject1.zip through subject9.zip contain all the MATLAB files organized by session and trial. See Figure 4.3 for a sample plot of this data.

4.4 Sleep Data

This data was collected from Caucasian males and females, 21–35 years old, without any medication. The original data files contained data from various sources, such as EOG (Electrooculography: a measure of the resting potential of the retina) and rectal temperature which were recorded while the subject was sleeping. Three data files corresponding to three subjects were used for this study. Two EEG channels were extracted from each data file. The samples are stored in the EDF1

1. European Data Format. See http://www.edfplus.info/

Figure 4.3: Sample of P300 data

file format as 16-bit integers. All data is recorded at a sampling rate of 100 Hz. The three subjects were recorded for almost 24 hours. The data can be found at http://www.physionet.org/physiobank/database/sleep-edf/. The EDF files used in this thesis, which can be directly downloaded, are sc4012e0.rec, sc4102e0.rec, and sc4112e0.rec. See Figure 4.4 for a sample plot of this data. In order to work with these data files, some steps were taken to convert them into a format which could be easily compressed. The next chapter provides the details on the method of preparing the data files for compression, followed by a description of the experiments which were performed and their results.

Figure 4.4: Sample of sleep data

Chapter 5

Method and Results

This chapter will describe the compression method used for the experiments in this thesis and the results.

5.1 Method

In order to compress the EEG data, some particular software packages were used, including the scripting language Python, the scientific software libraries SciPy and NumPy, and the FLAC project which performs compression of audio data. First, since the EEG data files used for experimentation were in various formats, a standard file format was chosen for EEG data storage. The format was provided by the NumPy library. Several Python scripts were written to convert each EEG data file into this format. Once the data was in this consistent format, another Python script was written to convert the EEG data into wav files. Each EEG file produced one single channel wav file per channel of EEG data. Often this required the data to be quantized (see Section 5.1.1), and a Python module was written to quantize the EEG data. The wav files were then compressed by FLAC using the command line tool. Several other scripts and modules were written to apply filters and other manipulations to the EEG data, and generate plots such as time and frequency domain plots of the EEG data.
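As a sketch of the final compression step, each single-channel wav file can be handed to the flac command line tool from Python and the compression ratio computed from the resulting file sizes. The file name is a placeholder, and no particular flac options are shown because the thesis does not list the options its scripts passed.

```python
import os
import subprocess

wav_path = "eeg_channel_01.wav"                  # placeholder input file
flac_path = os.path.splitext(wav_path)[0] + ".flac"

# Running "flac <file>.wav" produces "<file>.flac" alongside the input.
subprocess.run(["flac", wav_path], check=True)

# Compression ratio as defined in Section 2.2.1: compressed size / original size.
ratio = os.path.getsize(flac_path) / os.path.getsize(wav_path)
print("compression ratio: {:.1%}".format(ratio))
```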


5.1.1 Quantization

Some of the EEG data used in this study used 64-bit floating point numbers to store the EEG samples. However, FLAC only supports compression of wav files with integer samples. Thus, in order to compress this data with FLAC, the samples had to be converted to a format supported by FLAC, namely 16-bit integers, before being saved into a wav file. To accomplish this, two quantization methods were tested. These are described in the next two subsections. After quantization, it is straightforward to create a wav file from the data.

Smallest Difference

Since the original EEG samples were integers and were converted to floating point after being recorded (see Section 2.1.2), it was assumed that this process could be easily reversed. To do so, one could find the smallest non-zero difference between any two samples in the data. This difference is assumed to represent a single bit of precision in the original integer data. Thus the difference between any two samples should be an integer multiple of this smallest difference. Therefore, using this difference and assuming that the data is centered at zero, a linear mapping from the floating point samples to integer samples can be constructed as follows:

$$\text{Quantized Sample} = \operatorname{round}\left(\frac{\text{Floating Point Sample}}{\text{Smallest Difference}}\right) - C,$$

where C is chosen such that the quantized samples fit into an integer data type of the appropriate number of bits, usually 16. Typically C is chosen so that the resulting average value of the integer data is zero. Unfortunately, in the data which was analyzed, the differences between sample values were not always integer multiples of the smallest difference found. When this occurred, some error was introduced into the quantization because the sample values had to be rounded to the nearest integer multiple of the smallest difference. This technique introduced errors into all of the data sets.
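A minimal sketch of this "smallest difference" quantization, assuming the channel is a one-dimensional NumPy array; the function name and exact choice of C are ours, not taken from the thesis scripts.

```python
import numpy as np

def quantize_smallest_difference(x, dtype=np.int16):
    """Divide by the smallest non-zero difference between sample values and
    round, then subtract an offset C so the integer data is centred at zero."""
    diffs = np.diff(np.unique(x))      # gaps between adjacent sorted values
    step = diffs[diffs > 0].min()      # assumed to be one bit of original precision
    q = np.round(x / step)             # rounding here is where the error creeps in
    c = int(np.round(q.mean()))        # choose C so the average maps to zero
    return (q - c).astype(dtype)       # assumes the result fits in 16 bits
```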

Maximum Amplitude

The maximum possible error which can occur for a given sample using the above quantization approach is exactly one half the smallest difference (i.e., the divisor). This occurs when a floating point sample value falls directly between two integer multiples of the smallest difference, and thus maps (before rounding) to a number exactly halfway between two integers. There was enough drift in every data file that some samples in each one produced this maximum possible error. Because this error appeared to be inevitable, a second quantization method was used which minimized this error. Since the maximum possible error for a sample is half the divisor, the error can be minimized by minimizing the divisor. To do this, the mapping is constructed such that the largest possible integer sample value maps to the absolute maximum value in the floating point data. This gives the maximum granularity in the integer data and minimizes the divisor, as required.
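A corresponding sketch of the "maximum amplitude" mapping; again the function is illustrative rather than the thesis's actual script.

```python
import numpy as np

def quantize_max_amplitude(x, bits=16):
    """Scale so that the largest absolute float value maps to the largest
    integer value, minimizing the divisor and hence the maximum rounding error."""
    max_int = 2 ** (bits - 1) - 1      # 32767 for 16-bit samples
    step = np.abs(x).max() / max_int   # the smallest divisor that still fits
    return np.round(x / step).astype(np.int16)
```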

5.1.2 Filtering EEG Data

For each filter which was applied (described in Section 3.2), several steps were taken. First, a script was written to read the original EEG data, construct the desired filter, apply the filter to the data, and write the resulting data to a new file. Then, the new filtered data file was converted into a set of wav files to be compressed by FLAC, as described in Section 3.1. Some scripts were also written to generate plots of the filtered data. See Figures 5.1 and 5.2 for sample plots of EEG data before and after applying a low-pass filter at 30 Hz. Note that this low-pass filter removes the 50 Hz noise as well as all of its harmonics. All of the scripts were implemented in Python, using the SciPy and NumPy scientific libraries. As was also mentioned in Section 2.3, filtering the data cannot be undone and thus makes the compression process lossy. However, since it was determined that the data being filtered out is typically not used by physicians, this was judged to be acceptable in most cases (see Section 2.1.1).
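The thesis does not state which SciPy design routine was used; the sketch below uses a Kaiser-window FIR design (scipy.signal.kaiserord and firwin), which can meet the same specification of a 30 Hz cutoff, 15 Hz transition width, and 50 dB attenuation.

```python
import numpy as np
from scipy.signal import kaiserord, firwin, lfilter

def lowpass(data, fs, cutoff=30.0, width=15.0, atten_db=50.0):
    """FIR low-pass filter: pass 0-30 Hz, 50 dB attenuation in the stop band."""
    numtaps, beta = kaiserord(atten_db, width / (0.5 * fs))  # width relative to Nyquist
    taps = firwin(numtaps, cutoff, window=("kaiser", beta), fs=fs)
    return lfilter(taps, 1.0, data)

# Example: filter a placeholder channel sampled at 500 Hz (the hand motion data rate).
channel = np.random.randn(5000)
filtered = lowpass(channel, fs=500.0)
```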

Figure 5.1: Unfiltered EEG data

5.2 Results

Several experiments were done using FLAC to compress EEG data. This section will provide a description of these experiments and their results. In each experiment, compression was applied to all wav files in a given data set. As described in Chapter 4, each data set consists of several data files, each of which contains several channels of EEG data. Section 5.1 described the process of converting each of these files into a set of single-channel wav files. For each experiment, this set of wav files is compressed and the compression ratio is computed for each one. Because there are many wav files in each data set, and because the compression ratios of the wav files within a data set tended to be close to each other, the results for each data set in each experiment will be presented in the form R ± S%, where R is the average compression ratio of all the wav files in the data set and S is the standard deviation. As mentioned in Section 2.2.1, the compression ratio is expressed as a percentage.

Figure 5.2: EEG data of Figure 5.1 with low-pass filter applied
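For concreteness, the R ± S% figures reported below could be computed along these lines; the file names here are hypothetical.

```python
import os
import numpy as np

def compression_ratio(original, compressed):
    """Compression ratio as a percentage of the original size (smaller is better)."""
    return os.path.getsize(compressed) / os.path.getsize(original) * 100

pairs = [("trial01_ch00.wav", "trial01_ch00.flac"),
         ("trial01_ch01.wav", "trial01_ch01.flac")]
ratios = np.array([compression_ratio(w, f) for w, f in pairs])
print(f"{ratios.mean():.1f} ± {ratios.std():.1f}%")   # reported as R ± S%
```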

5.2.1 Compression of Unfiltered EEG

The first experiment was to use FLAC to compress the EEG data sets without any filtering. In the case of the sleep data, no quantization was required. This is because the original data was stored as 16-bit integer samples (see Section 4.4). Thus this experiment provided lossless compression for this data set. For the rest of the data, this technique was close to lossless, but there was a small amount of error introduced by quantizing the original floating point samples (see Section 5.1.1). This experiment did not yield results as positive as expected.

For typical audio files, FLAC may give a compression ratio of 40–60% or less. However, three of the four unfiltered EEG data sets gave an average compression ratio of roughly 70–80%, and one data set did not compress at all, as its average compression ratio was greater than 100%. See Table 5.1 for details.

Data Set                         Compression Ratio
Hand Motion Data                 82.1 ± 3.6%
Real and Imagined Motion Data    192.5 ± 41.1%
P300 Data                        70.5 ± 3.9%
Sleep Data                       77.7 ± 5.3%

Table 5.1: Average compression ratios of unfiltered data

As shown in Table 5.1, the Real and Imagined Motion Data has a compression ratio greater than 100%, which means that after running the compression algorithm, the output data file was larger than the original. This means that it would have been better, from a data compression standpoint, not to attempt to compress the data at all. Upon inspection, there is nothing apparently out of the ordinary in the EEG data itself. However, the data files are very small. Each trial contains only 3008 or 7040 samples of data (see Section 4.2). The overhead which is written out to every FLAC file is high enough that a file this short could never be compressed very well. For comparison, compression was attempted using some generic data compression programs, namely gzip, bzip2, and lzip. In most cases, at least one of these generic compression algorithms performed better than FLAC. These results are shown in Table 5.2. This was unexpected, since the generic programs do not take the structure and properties of the EEG data in the wav files into consideration. The implementation of FLAC was examined to determine why the EEG files were not compressing well. Since FLAC is an open-source project, it was possible to add additional logging in order to display information about each subframe as it was produced. This logging provided information which included the type of predictor which was used, the predictor order, the residuals which were computed, the number of Rice partitions and the Rice parameter for each partition, and the quotients and remainders of the residual encoding.

Data Set                         Compression Ratio
                                 FLAC            gzip           bzip2          lzip
Hand Motion Data                 82.1 ± 3.6%     93.7 ± 2.5%    77.1 ± 4.9%    81.9 ± 3.9%
Real and Imagined Motion Data    192.5 ± 41.1%   98.7 ± 2.1%    93.9 ± 6.7%    96.8 ± 3.5%
P300 Data                        70.5 ± 3.9%     88.6 ± 3.4%    76.1 ± 6.1%    77.3 ± 4.3%
Sleep Data                       77.7 ± 5.3%     73.4 ± 6.7%    53.7 ± 4.5%    58.6 ± 4.8%

Table 5.2: Average compression ratios of unfiltered data with generic compression programs

It was determined that there were various factors which made compressing the EEG data less effective than compressing audio data. The residuals produced by FLAC for the EEG files were much larger than typical residuals produced for audio files. This caused FLAC to use larger Rice parameters for the EEG files, thus using more bits per sample in the compressed file. Also, FLAC tended to use mostly LPC subframes with high order for EEG files, generating more overhead than for lower-order subframes. Upon further inspection of some audio files, it was seen that FLAC occasionally produced some subframes which used high order predictors and had large residuals. These subframes tended to occur when high frequency sounds, such as cymbal crashes, were introduced into the audio. These subframes were very similar to the subframes which occurred in the EEG files. Table 5.3 gives a comparison of some subframe information from an audio file and an EEG file. The information includes the LPC order of the subframe, the maximum magnitude residual value, and the average Rice parameter among the partitions (Rice partitions are described in Section 2.2.4). Because of these similarities, it was hypothesized that the reason for the poor compression in the EEG files was due to the existence of high-frequency waves in the EEG data.
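To illustrate why large residuals are costly, here is a rough sketch of the Rice coding cost model (the zigzag mapping and the unary-quotient-plus-remainder structure follow the usual Rice/Golomb scheme; this is not FLAC's actual code).

```python
def zigzag(r):
    """Map a signed residual to an unsigned value: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * r if r >= 0 else -2 * r - 1

def rice_bits(residual, k):
    """Bits needed to Rice-code one residual with parameter k:
    a unary quotient, one stop bit, and k remainder bits."""
    return (zigzag(residual) >> k) + 1 + k

print(rice_bits(80, 5))     # small, audio-like residual: 11 bits
print(rice_bits(3000, 5))   # EEG-sized residual with a small k: 193 bits
print(rice_bits(3000, 9))   # a larger k keeps the code short, but costs 9 bits/sample
```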

5.2.2 Compression of Low-pass Filtered EEG

To confirm this hypothesis, the next experiment involved filtering out some of the high frequency noise which is generally not useful in EEG analysis (see Section 2.1.1 for a discussion of the various frequency components of EEG data).

Subframe type          LPC Order   Maximum Residual Magnitude   Average Rice Parameter
Normal audio data      7           76                           5
                       5           99                           5
                       5           83                           4.25
Audio data with        8           1272                         7.5
high frequencies       8           5775                         9
                       8           1457                         7.875
EEG data               8           3022                         9
                       8           2588                         9
                       8           2827                         9.5

Table 5.3: Some subframe information from audio and EEG data

Initially, the filter used was a low-pass filter with a cutoff at 30 Hz, transition width of 15 Hz, and attenuation of 50 dB. The filter was designed this way in order to pass all useful frequencies of EEG data (0–30 Hz), remove the noise resulting from AC power lines (50 Hz or 60 Hz, depending on the power supply of the country where the EEG data was recorded), but still have a reasonably wide transition band to maintain efficiency of the filter. The low-pass filter significantly increased the compression performance of FLAC. Table 5.4 contains the resulting average compression ratios. The average compression ratio decreased significantly for the first three data sets in the table. The last data set's compression ratio also decreased, but not by as much. It is speculated that this moderate change in performance is because of the lower sampling rate of this data set. When filtering the first two data sets at 30 Hz, the filter attenuates all components of the signal between 30 Hz and 250 Hz, which is the highest frequency stored in these files. Similarly, for the third data set, the filter attenuates all components between 30 Hz and 1024 Hz. However, for the fourth data set, the filter only attenuates the components between 30 Hz and 50 Hz, so it is no great surprise that the compression ratio did not change by a large amount. Along the same lines, note that the third data set had very good compression performance because of its high sampling rate. This is because a great deal of the high frequency noise in the signal was attenuated, namely, all frequency components between 30 Hz and 1024 Hz. So EEG data with a high sampling rate should tend to give better compression after filtering, as this one does.

Data Set                         Sampling Rate (Hz)   Compression Ratio, Unfiltered   Compression Ratio, Low-Pass Filter
Hand Motion Data                 500                  82.1 ± 3.6%                     32.4 ± 1.9%
Real and Imagined Motion Data    500                  192.5 ± 41.1%                   148.9 ± 46.6%
P300 Data                        2048                 70.5 ± 3.9%                     15.2 ± 1.2%
Sleep Data                       100                  77.7 ± 5.3%                     62.2 ± 3.6%

Table 5.4: Average compression ratios of low-pass filtered data

5.2.3 Compression of Notch Filtered EEG

Initially, it was suspected that the 30 Hz low-pass filter improved performance so greatly because it filtered out the 50 Hz or 60 Hz electrical line interference. Figure 5.3 shows the FFT plot (FFT plots are described in Appendix A) of a channel of data from the hand motion data. We see from this plot that there is a large spike in the frequency spectrum around 50 Hz, meaning that the frequency component at 50 Hz has a much higher magnitude than the other high-frequency components. Other than that, most of the power appears to be in the low frequencies, as expected. Figure 5.4 shows the FFT plot of the data after having the low-pass filter applied. The next experiment was to apply a notch filter to remove the 50 Hz electrical noise, as previously described in Section 3.2. The notch filter attenuated frequencies from 40–60 Hz with a transition width of 9.5 Hz on each side; the attenuation was 50 dB. This filter is less lossy than the low-pass filter as it does not remove all frequencies above 30 Hz. Afterward, another filter was tried which applied a notch filter at 50 Hz as well as at all the harmonics of 50 Hz, also described in Section 3.2. As seen in Figure 5.3, there are significant portions of the signal contained in these harmonics. The signal was passed through a bank of filters, where each filter had an attenuation of 50 dB.

Figure 5.3: FFT of unfiltered EEG data

There were four notch filters, with cutoff bands of 40–60 Hz, 90–110 Hz, 140–160 Hz, and 190–210 Hz. Each notch filter used a transition width of 9.5 Hz. Finally, a low-pass filter was applied with a cutoff of 230 Hz and a width of 15 Hz to remove the harmonic near the Nyquist frequency (250 Hz). The results of both experiments are shown in Table 5.5 and their FFT plots are shown in Figures 5.5 and 5.6. The notch filters did improve the compression, but not by as much as the low-pass filter. Filtering out the harmonics of 50 Hz improved the compression by more than just filtering out the 50 Hz noise. However, it appears that the high frequency noise outside the harmonics of 50 Hz contributes significantly to the compression performance as well. To explore this effect, the next experiment was to apply several low-pass filters to the data at various cutoff frequencies. This experiment was done for both the Hand Motion Data and the Sleep Data.
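A sketch of how the notch filter bank described above (four band-stop filters followed by a final low-pass) might be built with SciPy; the Kaiser-window FIR design is our assumption, not necessarily the routine used for the experiments.

```python
from scipy.signal import kaiserord, firwin, lfilter

FS = 500.0  # sampling rate of the hand motion data, in Hz

def bandstop(data, low, high, width=9.5, atten_db=50.0, fs=FS):
    """One FIR notch (band-stop) filter with a 9.5 Hz transition on each side."""
    numtaps, beta = kaiserord(atten_db, width / (0.5 * fs))
    numtaps |= 1                      # band-stop FIR filters need an odd number of taps
    taps = firwin(numtaps, [low, high], window=("kaiser", beta), pass_zero=True, fs=fs)
    return lfilter(taps, 1.0, data)

def remove_line_noise(data, fs=FS):
    """Notch out 50 Hz and its harmonics, then low-pass away the 250 Hz harmonic."""
    for low, high in [(40, 60), (90, 110), (140, 160), (190, 210)]:
        data = bandstop(data, low, high, fs=fs)
    numtaps, beta = kaiserord(50.0, 15.0 / (0.5 * fs))
    taps = firwin(numtaps, 230.0, window=("kaiser", beta), fs=fs)
    return lfilter(taps, 1.0, data)
```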

Figure 5.4: FFT of low-pass filtered EEG data

As expected, the compression ratio decreased as the cutoff frequency decreased. See Tables 5.6 and 5.7 and Figures 5.7 and 5.8 for results. In the case of the Hand Motion Data and P300 Data, filtering the data and then compressing with FLAC gave better compression performance than compressing with generic compression programs (compare Tables 5.2 and 5.4). After performing the low-pass filter, the generic compression programs were used again to compress the filtered data. In this case, FLAC performed better on all the data sets except for the Real and Imagined Motion Data. These results are shown in Table 5.8. The subframe information was examined for the low-pass filtered data after compression. Interestingly, for the filtered data, FLAC used subframes coded with fixed polynomial predictors far more often than LPC.

Filter Cut   Average Compression Ratio
None         82.1 ± 3.6%
40–60 Hz     80.5 ± 3.9%
Harmonics    77.2 ± 4.2%

Table 5.5: Average compression ratios of hand motion data after notch filtering

Filter type   Cut (Hz)   Average Compression Ratio
Low-pass      10         24.6 ± 0.8%
Low-pass      20         29.2 ± 1.3%
Low-pass      30         32.4 ± 1.9%
Low-pass      40         40.2 ± 3.8%
Low-pass      50         50.7 ± 2.3%
Low-pass      60         52.0 ± 2.6%
Low-pass      70         54.9 ± 2.6%
Low-pass      80         56.9 ± 2.5%
Low-pass      90         58.9 ± 2.4%
Low-pass      100        61.4 ± 2.3%
None          —          82.1 ± 3.6%

Table 5.6: Compression performance of hand motion data with various low-pass filters (15 Hz width and 50 dB attenuation)

Recall that before filtering, high-order LPC predictors were the dominant predictor used. FLAC uses fixed polynomial predictors of order 0–4 when encoding subframes. Most of the subframes in the filtered EEG file used a fixed polynomial predictor of order 4. Since order 4 was the maximum possible, it was thought that higher orders might perform better if FLAC had the capability to use them. An extension was implemented for FLAC so that it could use fixed polynomial predictors of order 5, 6, or 7, in addition to orders 0–4. Even after this change, relatively few subframes used predictors of order greater than 4, and the compression ratio did not greatly improve. In fact, in one case it got worse. See Table 5.9. The next chapter provides conclusions and ideas for future work.

Figure 5.5: FFT of notch filtered hand motion data

Figure 5.6: FFT of hand motion data with harmonics of 50 Hz filtered

Filter type   Cut (Hz)   Average Compression Ratio
Low-pass      4          30.0 ± 3.2%
Low-pass      8          36.6 ± 3.5%
Low-pass      12         42.0 ± 3.5%
Low-pass      16         46.2 ± 3.4%
Low-pass      20         50.7 ± 3.4%
Low-pass      24         54.8 ± 3.2%
Low-pass      28         59.7 ± 3.5%
Low-pass      32         64.5 ± 3.8%
Low-pass      36         68.5 ± 3.9%
Low-pass      40         71.3 ± 4.0%
None          —          77.7 ± 5.3%

Table 5.7: Compression performance of sleep data with various low-pass filters (15 Hz width and 50 dB attenuation)

Data Set                         Compression Ratio
                                 FLAC            gzip            bzip2           lzip
Hand Motion Data                 32.4 ± 1.9%     94.1 ± 1.8%     87.7 ± 3.3%     76.9 ± 4.1%
Real and Imagined Motion Data    192.5 ± 41.1%   96.4 ± 4.7%     100.5 ± 7.2%    87.4 ± 7.9%
P300 Data                        15.2 ± 1.2%     31.4 ± 14.7%    15.5 ± 10.0%    20.2 ± 9.7%
Sleep Data                       62.2 ± 3.6%     90.7 ± 2.7%     78.4 ± 3.7%     83.2 ± 3.7%

Table 5.8: Average compression ratios of low-pass filtered data with generic compression programs

Data Set                         Compression Ratio Before Change   Compression Ratio After Change
Hand Motion Data                 32.4 ± 1.9%                       31.9 ± 1.7%
Real and Imagined Motion Data    148.9 ± 46.6%                     148.0 ± 47.8%
P300 Data                        15.2 ± 1.2%                       15.1 ± 1.3%
Sleep Data                       62.2 ± 3.6%                       69.2 ± 10.1%

Table 5.9: Average compression ratios of low-pass filtered data before and after adding high-order fixed polynomial prediction

Figure 5.7: Compression ratio vs. cutoff frequency of hand motion data

Figure 5.8: Compression ratio vs. cutoff frequency of sleep data

Chapter 6

Conclusions and Future Work

In this thesis, the problem of large data sizes for EEG data was discussed. The proposed solution was to apply data compression to the EEG data in order to reduce storage space and transfer time. Signal filtering was used as a method for improving the compression performance. For most data, a compression ratio of 15–30% was achieved. In comparison, the best compression ratio which was achieved using generic compression programs was around 50%, so using FLAC along with signal filtering was shown to be a superior method of EEG compression. As mentioned before, a 24 hour recording of a single electrode could generate 15–400 MB of uncompressed data. With a 30% compression ratio, this data can be stored or transmitted using only 4.5–120 MB, a little less than one third the amount of space required for the uncompressed data. This means that EEG researchers or medical specialists may use three times as many electrodes, three times the recording quality, or record for three times as long without taking up extra space. Furthermore, compression performance is better for data with higher sampling rates, which corresponds to higher quality EEG recordings. Thus, EEG can be recorded at very high quality and then be greatly compressed so that it does not take up too much space. EEG data compression has the potential to change the field of consumer-level BCI as well. Since many games and applications are moving data processing to the cloud, it is probable that games and applications which use BCI will begin to do the same.


Like image and audio data files, EEG data will have to be compressed in order to be transferred efficiently across computer networks. EEG data compression opens up new possibilities for medical applications as well. EEG data could be monitored or analyzed at high quality in a remote location since the compressed signal can be transferred quickly. If EEG is to be recorded for a very long period of time, doctors will not have to decrease the recording quality or the number of electrodes by as much in order to make the data size manageable.

6.1 Future Work

In the future, a more in-depth investigation of FLAC's behaviour with EEG data would be useful. FLAC compresses audio data reasonably well, even though audio data has a wide frequency range with the possibility of high frequency sounds. It remains unclear why FLAC performed so poorly on the unfiltered EEG data with its high frequency components retained. Another useful study would be to show the filtered EEG data to people such as doctors and EEG researchers who use EEG data on a daily basis. The cutoff frequency of the low-pass filters was chosen so that the frequency bands of the EEG data which are traditionally most useful remain mostly unaffected. However, modern research may require access to higher frequencies, and as the research progresses in the future, some medical and BCI applications may require these high frequencies as well. Along with determining which frequencies are needed for which applications, a study of the compression of high-frequency EEG data specifically could also be useful. Beyond having doctors and researchers visually inspect the data, there is also software which can be used to detect particular EEG characteristics. This software can be used to determine whether particular characteristics can be detected even after lossy compression is employed. Higgins et al. used this method in [HFM+10] to determine the acceptable amount of loss for EEG compression. However, they used a different compression algorithm, so it would be valuable to apply this idea to the compression algorithm presented in this thesis.

A study of how the various parameters of EEG recording affect its compression would be very useful. It has been mentioned in this thesis that higher quality recordings produce data which compresses better. However, other parameters could be explored as well, such as the number of bits per sample, the type of electrodes used, the amount of amplification and filtering before the A/D converter, etc. Since some error was introduced into the data by quantizing the floating point sample values, it would be good to at least quantify this error and preferably avoid it in the future. Obtaining the integer samples directly rather than floating point samples would be beneficial for any future research.

Appendix A

Fourier Transform

The Fourier Transform describes a signal in the frequency domain, rather than the time domain. Every time domain signal can be decomposed into a (possibly infinite) set of sinusoids with various frequencies, amplitudes, and phase shifts. For discrete signals, such as EEG data, we take the Discrete Fourier Transform (DFT) to get this information. The DFT is a mathematical transformation which takes a discrete signal x of length n as input and produces an array of complex numbers X of length n. If x consists only of real numbers, then the result of the DFT will have the property that the first ⌊n/2⌋ elements will equal the complex conjugates of the last ⌊n/2⌋ elements. Formally:

$$X[i] = \bar{X}[n - i], \qquad 0 \le i < \left\lfloor \frac{n}{2} \right\rfloor$$

This assumes 0-based indexing for X. These complex numbers represent the magnitude and phase angle of the sinusoidal frequency component of x at a given frequency. The frequency represented by element i is $f_i = i \cdot f_s / n$, where $f_s$ is the sampling rate. The Nyquist Sampling Theorem states that all frequency components in a signal which can be represented by sampling at a given rate $f_s$ are strictly less than half of that rate (the quantity $f_s/2$ is known as the Nyquist Rate). That is, all values in X at indices greater than or equal to n/2 are meaningless in the context of real-valued signals. Thus when analyzing the DFT of a signal, often only the first ⌊(n − 1)/2⌋ samples are considered.


The mathematical definition of the DFT is as follows, assuming a signal x of length n, where j is the imaginary unit ($j = \sqrt{-1}$):

$$X[k] = \sum_{i=0}^{n-1} x[i] \exp\left(-\frac{j 2\pi k i}{n}\right), \qquad 0 \le k \le n - 1$$

An algorithm which directly computes the DFT would run in $O(n^2)$ time. However, in 1965 Cooley and Tukey produced a method for computing the DFT in $O(n \log n)$ time, which is now called the Fast Fourier Transform, or FFT. A discussion of the details of the FFT method is beyond the scope of this thesis; however, most software libraries which compute DFTs use an FFT algorithm because of its performance. For a detailed discussion of the DFT and FFT, see chapters 4 and 5 of [Por97]. In this thesis, when an FFT is applied to a signal and then shown in a plot, the plot is referred to as an FFT Plot and displays the signal power in dB with respect to frequency. The power of a frequency component in dB is given by $P = -20 \log M$, where M is the magnitude of the frequency component. This measurement is often more convenient since the plot is displayed on a logarithmic scale. Often the magnitudes of various frequency components vary widely, so viewing them on a logarithmic scale gives a better sense of how the frequency components compare.
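A minimal sketch of producing such a plot with NumPy and Matplotlib; it plots magnitude in dB via 20 log₁₀ M, which differs only in sign from the convention stated above.

```python
import numpy as np
import matplotlib.pyplot as plt

def fft_plot(x, fs):
    """Plot signal power in dB against frequency for a real-valued signal,
    keeping only the non-redundant half of the spectrum."""
    X = np.fft.rfft(x)                            # DFT of a real signal (uses an FFT)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)   # f_i = i * fs / n
    db = 20 * np.log10(np.abs(X) + 1e-12)         # magnitude on a logarithmic scale
    plt.plot(freqs, db)
    plt.xlabel("Frequency (Hz)")
    plt.ylabel("Power (dB)")
    plt.show()
```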

Appendix B

Fixed Polynomial Predictor

A fixed-order polynomial predictor computes an expected value for a sample given some number of previous samples. It does this by fitting a polynomial curve to the previous samples, using the resulting curve to compute the expected value of the next sample. The only parameter of the predictor is the order n. Given a predictor of order n, it will construct a polynomial of order n − 1 to fit to the previous n points in the data. Specifically:

• n = 0: the predictor is the line y = 0

• n = 1: the predictor is the line y = a which passes through one previous sample

• n = 2: the predictor is the line y = ax + b which passes through two previous samples

• n = 3: the predictor is the parabolic function y = ax² + bx + c which passes through three previous samples

• n = 4: the predictor is the cubic function y = ax³ + bx² + cx + d which passes through four previous samples

This is very space efficient since the only prediction parameter which needs to be stored is the order of the predictor. This is also very time efficient because the prediction polynomials can be calculated very efficiently using the Lagrange Interpolation Polynomial1.



Given a set of $(k + 1)$ data points $(x_0, y_0), \ldots, (x_j, y_j), \ldots, (x_k, y_k)$, and given a value $x$, we can predict the value $y = L(x)$ as follows:

$$L(x) = \sum_{j=0}^{k} y_j\, l_j(x)$$

where

$$l_j(x) = \prod_{\substack{0 \le m \le k \\ m \ne j}} \frac{x - x_m}{x_j - x_m}$$

However, if these points are the sample values to be fed into our predictor, then it is a special case where the $x_i$ are all integers, $x_j - x_i = j - i$, and the point to be predicted is $y_{k+1}$. Thus:

$$\begin{aligned}
l_j(x_{k+1}) &= \prod_{\substack{0 \le m \le k \\ m \ne j}} \frac{x_{k+1} - x_m}{x_j - x_m} \\
&= \prod_{\substack{0 \le m \le k \\ m \ne j}} \frac{(k+1) - m}{j - m} \\
&= \left[\frac{(k+1)-0}{j-0} \cdot \frac{(k+1)-1}{j-1} \cdots \frac{(k+1)-(j-1)}{j-(j-1)}\right]
   \left[\frac{(k+1)-(j+1)}{j-(j+1)} \cdots \frac{(k+1)-k}{j-k}\right] \\
&= \left[\frac{k+1}{j} \cdot \frac{k}{j-1} \cdots \frac{k-j+2}{1}\right]
   \left[\frac{k-j}{-1} \cdots \frac{1}{j-k}\right]
\end{aligned}$$

The first term in square brackets can be rewritten:

$$\frac{k+1}{j} \cdot \frac{k}{j-1} \cdots \frac{k-j+2}{1} = \frac{(k+1)!}{j!\,((k+1)-j)!} = \binom{k+1}{j}$$

1For example, see http://mathworld.wolfram.com/LagrangeInterpolatingPolynomial.html

The second term can also be rewritten. There are $k - j$ terms being multiplied together, each with a negative denominator. So the sign of the product is determined by $(-1)^{k-j}$. Thus:

$$\frac{k-j}{-1} \cdots \frac{1}{j-k} = (-1)^{k-j}\left[\frac{k-j}{1} \cdot \frac{k-j-1}{2} \cdots \frac{2}{k-j-1} \cdot \frac{1}{k-j}\right] = (-1)^{k-j}\,\frac{(k-j)!}{(k-j)!} = (-1)^{k-j}$$

So with this special case, we have the following:

$$l_j(x_{k+1}) = (-1)^{k-j} \binom{k+1}{j}.$$

Thus:

$$L(x_{k+1}) = \sum_{j=0}^{k} y_j\, l_j(x_{k+1}) = \sum_{j=0}^{k} (-1)^{k-j} \binom{k+1}{j}\, y_j.$$

This final equation can be hard-coded for supported orders of the fixed polynomial predictor. The only variables are the $y_j$'s; the rest can be hard-coded constants. This makes the fixed polynomial predictor very computationally efficient, along with being space efficient.
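As an illustration (not taken from FLAC's source), the prediction can be written directly from the final equation, with the binomial coefficients acting as the hard-coded constants:

```python
from math import comb

def fixed_predict(previous, order):
    """Predict the next sample from the last `order` samples using the
    order-(order-1) interpolating polynomial, i.e. a fixed polynomial predictor."""
    if order == 0:
        return 0
    window = previous[-order:]                 # y_0, ..., y_k with k = order - 1
    k = order - 1
    return sum((-1) ** (k - j) * comb(k + 1, j) * y
               for j, y in enumerate(window))

# An order-2 predictor extrapolates the line through the last two samples:
print(fixed_predict([1, 3, 5, 7], 2))          # 2*7 - 5 = 9
```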

Appendix C

Linear Predictive Coding

Linear Predictive Coding (LPC) is a method of predicting the next value of a sequence based on some number of previous values. For a model of order n, the previous n samples are used. The prediction is made by applying a linear combination to the previous sample values. To get a prediction $\hat{x}_i$ for sample i, we compute:

$$\hat{x}_i = a_1 x_{i-1} + a_2 x_{i-2} + \cdots + a_n x_{i-n} = \sum_{j=1}^{n} a_j x_{i-j},$$

where the $a_j$ are the LPC coefficients.

The coefficients $a_j$ are chosen so that the sum of the squared residuals is minimized. The residual $e_i$ for sample i is computed as follows:

$$e_i = x_i - \hat{x}_i.$$

Thus the sum of the squared residuals is (assuming a signal x of length M):

$$E = \sum_i e_i^2 = \sum_i \left( x_i - \sum_{j=1}^{n} a_j x_{i-j} \right)^2$$

The values $a_j$ are then chosen to minimize E. The procedure for finding the $a_j$ values can be found in Chapter 11 of [Say05].
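For illustration only, the coefficients can be estimated by solving the least-squares problem directly with NumPy; actual codecs typically use a more efficient procedure based on the autocorrelation of the signal (again, see Chapter 11 of [Say05]).

```python
import numpy as np

def lpc_coefficients(x, order):
    """Estimate a_1..a_n by minimizing E = sum_i (x_i - sum_j a_j x_{i-j})^2."""
    # Row r of A holds the `order` samples preceding the target sample x[order + r].
    A = np.column_stack([x[order - j:len(x) - j] for j in range(1, order + 1)])
    b = x[order:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a, b - A @ a                        # coefficients and residuals e_i

x = np.sin(np.linspace(0, 20, 200))            # toy signal standing in for EEG samples
a, residuals = lpc_coefficients(x, order=4)
print(a, np.max(np.abs(residuals)))            # small residuals are cheap to encode
```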

Bibliography

[AT91] G Antoniol and P Tonella. EEG data compression techniques. IEEE Transactions on Biomedical Engineering, 44(2):417–423, 1991.

[Bio12] Biomedical signals acquisition, 2012. Retrieved February 29, 2012 from the World Wide Web: http://www.medicine.mcgill.ca/physio/vlab/biomed_signals/vlabmenubiosig.htm.

[Coa08] Josh Coalson. FLAC — Free Lossless Audio Codec, 2000–2008. Retrieved February 28, 2012 from FLAC's official website: http://flac.sourceforge.net/.

[Eav11] Paralyzed man controls a robotic arm with high precision, via thought, 2011. Retrieved February 28, 2012 from the World Wide Web: http://www.element14.com/community/groups/robotics/blog/2011/10/18/paralyzed-man-controls-a-robotic-arm-with-high-precision-via-thought.

[GPR+03] Dan Griffiths, Jim Peters, Andreas Robinson, Jack Spaar, Yaniv Vilnai, and Nelo (sic). The ModularEEG design, 2003. Retrieved February 23, 2012 from the World Wide Web: http://openeeg.sourceforge.net/doc/modeeg/modeeg_design.html.

[HFM+10] Garry Higgins, Stephen Faul, Robert P McEvoy, Brian McGinley, Martin Glavin, William P Marnane, and Edward Jones. EEG compression using JPEG2000: how much loss is too much? Conference proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, 2010:614–7, January 2010.



[Hug08] John R Hughes. Gamma, fast, and ultrafast waves of the brain: their relationships with epilepsy and behavior. Epilepsy & Behavior: E&B, 13(1):25–31, July 2008.

[Lyo01] R G Lyons. Understanding Digital Signal Processing. Prentice Hall, 2001.

[MC99] N. Memon and J. Cinkler. Context-based lossless and near-lossless compression of EEG signals. IEEE Transactions on Information Technology in Biomedicine, 3(3):231–238, 1999.

[MMM96] N. Magotra, G. Mandyam, and W. McCoy. Lossless compression of electroencephalographic (EEG) data. In 1996 IEEE International Symposium on Circuits and Systems. Circuits and Systems Connecting the World. ISCAS 96, volume 2, pages 313–315. IEEE, 1996.

[NL05] Ernst Niedermeyer and Fernando Lopes da Silva. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields, volume 1. Lippincott Williams & Wilkins, 2005.

[Por97] Boaz Porat. A Course in Digital Signal Processing. John Wiley and Sons, 1997.

[Say00] Khalid Sayood. Introduction to Data Compression. The Morgan Kaufmann Series in Multimedia Information and Systems. Morgan Kaufmann Publishers, second edition, 2000.

[Say05] Khalid Sayood. Introduction to Data Compression. The Morgan Kaufmann Series in Multimedia Information and Systems. Morgan Kaufmann Publishers, third edition, 2005.

[Ylo99] Jyri Ylostalo. Data compression methods for EEG. Technology and Health Care, 7(4):285–300, June 1999.