Understanding 1D Convolutional Neural Networks Using Multiclass Time-Varying Signals

A Thesis Presented to the Graduate School of Clemson University

In Partial Fulfillment of the Requirements for the Degree
Master of Science
Computer Engineering

by
Ravisutha Sakrepatna Srinivasamurthy
August 2018

Accepted by:
Dr. Robert J. Schalkoff, Committee Chair
Dr. Harlan B. Russell
Dr. Ilya Safro

Abstract

In recent times, we have seen a surge in the use of convolutional neural networks (CNNs) to solve all kinds of problems, from handwriting recognition to object recognition and from natural language processing to detecting exoplanets. Though the technology has been around for quite some time, there is still a lot of scope for research on what is really happening 'under the hood' in a CNN model. CNNs are often treated as black boxes that learn something from complex data and produce the desired results. In this thesis, an effort has been made to explain exactly what CNNs are learning by training the network with carefully selected input data. The data considered here are one-dimensional time-varying signals, and hence 1D convolutional neural networks are used to train, test, and analyze the learned weights. The field of digital signal processing (DSP) gives a lot of insight into the seemingly random weights learned by a CNN. In particular, the concepts of the Fourier transform, Savitzky-Golay filters, Gaussian filters, and FIR filter design light up the seemingly dark alleys of CNNs. As a result of this study, a few interesting inferences can be made regarding dropout regularization, optimal kernel length, and the optimal number of convolution layers.

Acknowledgments

I would like to thank my advisor, Dr. Robert J. Schalkoff, for his expert advice and encouragement, without which this work would never have seen the light of day. His courses ECE 8650 and ECE 8720 have greatly helped me understand machine learning and artificial neural network concepts in depth and have inspired me to explore uncharted territories. I would also like to thank Dr. Russell and Dr. Safro for being part of my committee and for their timely suggestions. Finally, I am grateful to all the professors who have taught me so much over the past two years.

Table of Contents

Title Page
Abstract
Acknowledgments
List of Tables
List of Figures
1 Introduction
  1.1 Objective
  1.2 Evolution of Convolutional Neural Networks
  1.3 Analysis Using DSP Tools
  1.4 Related Works
  1.5 Contributions
2 Basics of 1D CNN
  2.1 Forward Propagation
  2.2 Convolution Layer
  2.3 Multilayer Feed Forward Network
  2.4 Output Layer - One Hot Encoding
  2.5 Activation Functions
  2.6 Dropouts
  2.7 Back Propagation
3 Time Domain Analysis
  3.1 Classification of Sinusoids
  3.2 A More Challenging Dataset
  3.3 Classification of Triangular Waveforms
  3.4 Weights Learned by Fully Connected Network
4 Frequency Domain Tools
  4.1 Discrete Fourier Transform
  4.2 DFT of Triangular and Rectangular Waveforms
  4.3 FIR Filters
  4.4 Analyzing the FFT of a Filter
5 Frequency Domain Analysis
  5.1 Classification of Sinusoids
  5.2 Classification of Sinusoids in the Presence of Noise
  5.3 Effect of Dropout Regularization
  5.4 Effect of Different Types of Padding
  5.5 Triangular or Rectangular Waveform Classification at Different Frequencies
  5.6 Conclusion
6 Implications and Extensions
  6.1 Time Domain Implications
  6.2 Frequency Domain Implications
  6.3 Distinguishing Sequences
7 Conclusions and Future Work
Appendices
  A Introduction to Keras
Bibliography

List of Tables

2.1 Keras non-causal convolution code
2.2 Keras causal convolution code
3.1 Dataset 1
3.2 CNN Model 1
3.3 CNN Model 1 validation
3.4 Dataset
3.5 Experiment 2 validation
3.6 Dataset
3.7 Experiment 3 validation
5.1 Dataset
5.2 CNN Model 1
5.3 CNN Model 1 validation
5.4 Dataset description
5.5 CNN Model 1 validation
5.6 Dataset description
6.1 Dataset
6.2 CNN Model 2
6.3 Dataset
6.4 CNN Model
6.5 Results
6.6 Result
6.7 Result - Dataset With Noise
6.8 Result - CNN with Layer 2
6.9 Results - Dataset Without Noise
List of Figures

1.1 A simple ANN model consisting of an input layer, a hidden layer, and an output layer
1.2 ILSVRC-2012 winning CNN model [14]
2.1 A typical CNN structure
2.2 Flatten output and fully connected network
2.3 Plot of the ReLU activation function
2.4 Dropout
3.1 Two classes - sinusoids of 5 Hz and 10 Hz
3.2 CNN Model 1
3.3 Loss plot vs. epoch
3.4 Kernel 1
3.5 Kernel 2
3.6 Two ways of overlapping
3.7 Dataset 2 - Overlapped triangular and rectangular waveforms
3.8 Differentiation of sigmoids
3.9 Learned Kernels
3.10 Comparing learned kernel weights and artificially generated Gaussian filter weights
3.11 Example inputs
3.12 Example outputs of convolution
3.13 Dataset
3.14 First run
3.15 Second run
3.16 Possible filter coefficients to achieve second-order differentiation via convolution
3.17 Second run - Comparing Savitzky-Golay filter output with the original kernel
3.18 Second run
3.19 Last layer weights
4.1 Magnitude FFT response of the input sequence
4.2 Rectangular waveform and its DFT
4.3 Triangular waveform and its DFT
5.1 Weights
5.2 FFT of Kernels 1 and 2
5.3 Example 1: 5 Hz input
5.4 Example 2: 10 Hz input
5.5 Dataset
5.6 Frequency Response
5.7 Examples
5.8 CNN Model 2
5.9 Example outputs with dropout - p(drop) = 0.8
5.10 Frequency response of the convolution kernels for p(drop) = 0.8
5.11 Triangular and Rectangular datasets
5.12
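As a concrete illustration of the workflow the abstract describes, the following is a minimal sketch, in Keras (which Appendix A introduces), of the experiment pattern developed in Chapters 3 and 5: train a small 1D CNN to separate sinusoids of two frequencies (cf. Figure 3.1, "Two classes - sinusoids of 5 Hz and 10 Hz") and then inspect the learned convolution kernels through their magnitude FFT (cf. Figure 5.2, "FFT of Kernels 1 and 2"). The sampling rate, dataset sizes, layer sizes, and training settings below are illustrative assumptions, not the exact configuration of the thesis models.

    # A minimal sketch of the thesis's experiment pattern: train a small 1D CNN
    # to classify sinusoids of two frequencies, then inspect the learned kernels
    # as FIR filters via their magnitude FFT. Sampling rate, dataset sizes, and
    # layer sizes are illustrative assumptions.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    fs = 100          # assumed sampling rate in Hz
    n_points = 100    # samples per example (one second of signal)
    n_samples = 1000  # number of training examples
    t = np.arange(n_points) / fs

    # Two classes: 5 Hz and 10 Hz sinusoids with random phase (cf. Figure 3.1).
    rng = np.random.default_rng(0)
    freqs = rng.choice([5, 10], size=n_samples)
    phases = rng.uniform(0, 2 * np.pi, size=n_samples)
    X = np.sin(2 * np.pi * freqs[:, None] * t + phases[:, None])[..., None]
    y = (freqs == 10).astype(int)  # label 0 -> 5 Hz, label 1 -> 10 Hz

    # One convolution layer with two kernels, flattened into a softmax output.
    model = keras.Sequential([
        layers.Conv1D(2, 50, activation='relu', input_shape=(n_points, 1)),
        layers.Flatten(),
        layers.Dense(2, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(X, y, epochs=20, validation_split=0.2, verbose=0)

    # Treat each learned kernel as an FIR filter: the peak of its zero-padded
    # magnitude FFT shows which frequency the kernel passes (cf. Figure 5.2).
    kernels = model.layers[0].get_weights()[0]  # shape: (kernel_size, 1, n_filters)
    for k in range(kernels.shape[-1]):
        response = np.abs(np.fft.rfft(kernels[:, 0, k], n=256))
        print(f"Kernel {k + 1}: peak response near "
              f"{np.argmax(response) * fs / 256:.1f} Hz")

In a successful run of this kind of experiment, each kernel's frequency response tends to peak near one of the two class frequencies, which is the central observation the frequency-domain analysis of Chapter 5 builds on.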