Unitary and Symmetric Structure in Deep Neural Networks
University of Kentucky
UKnowledge
Theses and Dissertations--Mathematics
Mathematics

2020

Unitary and Symmetric Structure in Deep Neural Networks

Kehelwala Dewage Gayan Maduranga
University of Kentucky, [email protected]
Author ORCID Identifier: https://orcid.org/0000-0002-6626-9024
Digital Object Identifier: https://doi.org/10.13023/etd.2020.380

Recommended Citation
Maduranga, Kehelwala Dewage Gayan, "Unitary and Symmetric Structure in Deep Neural Networks" (2020). Theses and Dissertations--Mathematics. 77. https://uknowledge.uky.edu/math_etds/77

This Doctoral Dissertation is brought to you for free and open access by the Mathematics at UKnowledge. It has been accepted for inclusion in Theses and Dissertations--Mathematics by an authorized administrator of UKnowledge. For more information, please contact [email protected].

STUDENT AGREEMENT:
I represent that my thesis or dissertation and abstract are my original work. Proper attribution has been given to all outside sources. I understand that I am solely responsible for obtaining any needed copyright permissions. I have obtained needed written permission statement(s) from the owner(s) of each third-party copyrighted matter to be included in my work, allowing electronic distribution (if such use is not permitted by the fair use doctrine) which will be submitted to UKnowledge as Additional File.

I hereby grant to The University of Kentucky and its agents the irrevocable, non-exclusive, and royalty-free license to archive and make accessible my work in whole or in part in all forms of media, now or hereafter known. I agree that the document mentioned above may be made available immediately for worldwide access unless an embargo applies.

I retain all other ownership rights to the copyright of my work. I also retain the right to use in future works (such as articles or books) all or part of my work. I understand that I am free to register the copyright to my work.

REVIEW, APPROVAL AND ACCEPTANCE
The document mentioned above has been reviewed and accepted by the student's advisor, on behalf of the advisory committee, and by the Director of Graduate Studies (DGS), on behalf of the program; we verify that this is the final, approved version of the student's thesis including all changes required by the advisory committee.

The undersigned agree to abide by the statements above.

Kehelwala Dewage Gayan Maduranga, Student
Dr. Qiang Ye, Major Professor
Dr. Benjamin Braun, Director of Graduate Studies

Unitary and Symmetric Structure in Deep Neural Networks

DISSERTATION

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the College of Arts and Sciences at the University of Kentucky

By
Kehelwala Dewage Gayan Maduranga
Lexington, Kentucky
Director: Dr. Qiang Ye, Professor of Mathematics
Lexington, Kentucky
2020

Copyright © Kehelwala Dewage Gayan Maduranga 2020
https://orcid.org/0000-0002-6626-9024

ABSTRACT OF DISSERTATION

Unitary and Symmetric Structure in Deep Neural Networks

Recurrent neural networks (RNNs) have been used successfully on a wide range of sequential data problems. A well-known difficulty in using RNNs is the vanishing or exploding gradient problem. Recently, several RNN architectures have been proposed that mitigate this issue by maintaining an orthogonal or unitary recurrent weight matrix. One such architecture is the scaled Cayley orthogonal recurrent neural network (scoRNN), which parameterizes the orthogonal recurrent weight matrix through a scaled Cayley transform. This parametrization contains a diagonal scaling matrix with entries equal to positive or negative one, which cannot be optimized by gradient descent. Thus the scaling matrix is fixed before training, and a hyperparameter is introduced to tune the matrix for each particular task. In the first part of this thesis, we develop a unitary RNN architecture based on a complex scaled Cayley transform. Unlike the real orthogonal case, the transformation uses a diagonal scaling matrix with entries on the complex unit circle, which can be optimized by gradient descent and no longer requires tuning a hyperparameter. We compare the performance of the scaled Cayley unitary recurrent neural network (scuRNN) with scoRNN and other unitary RNN architectures.
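In outline, and as a brief sketch rather than the full development given in Chapter 2, the scaled Cayley construction obtains the recurrent weight matrix as
\[
W = (I + A)^{-1}(I - A)\,D,
\]
where in the real (scoRNN) case $A$ is skew-symmetric ($A^T = -A$) and $D$ is a fixed diagonal matrix with $\pm 1$ entries, making $W$ orthogonal, while in the complex (scuRNN) case $A$ is skew-Hermitian ($A^* = -A$) and $D = \operatorname{diag}(e^{i\theta_1}, \dots, e^{i\theta_n})$, making $W$ unitary, with the angles $\theta_j$ trainable by gradient descent alongside $A$.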
Convolutional neural networks (CNNs) are a class of deep neural networks most commonly applied to analyzing visual imagery. Deep neural networks also now play an important role in biological problems such as modeling RNA sequences and protein sequences. The second part of the thesis explores deep learning approaches involving recurrent and convolutional networks to directly infer RNA secondary structures or protein contact maps, both of which have a symmetric feature matrix as output. We develop a CNN architecture with a suitable symmetric parameterization of the convolutional kernel that naturally produces symmetric feature matrices. We apply this architecture to the inference tasks for RNA secondary structures and protein contact maps. We compare our symmetrized CNN architecture with the usual convolutional network architecture and show that our approach can improve prediction results while using an equal or smaller number of trainable parameters.
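As a rough illustration of the symmetrization idea (a simplified single-channel version for intuition, not the exact multi-channel parameterization developed in Chapter 3), a two-dimensional kernel $K$ can be replaced by its symmetrization over the spatial indices,
\[
\tilde{K}_{p,q} = \tfrac{1}{2}\left(K_{p,q} + K_{q,p}\right),
\]
so that $\tilde{K}_{p,q} = \tilde{K}_{q,p}$. If the input matrix $X$ is symmetric, the convolution output $Y_{ij} = \sum_{p,q} \tilde{K}_{p,q}\, X_{i+p,\, j+q}$ then satisfies $Y_{ij} = Y_{ji}$, so symmetric feature matrices are produced by construction.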
KEYWORDS: Deep Learning, Recurrent Neural Networks, Convolutional Neural Networks, Symmetrized CNN Architecture, RNA Secondary Structure, Protein Contact Map

Kehelwala Dewage Gayan Maduranga
August 6, 2020

Unitary and Symmetric Structure in Deep Neural Networks

By
Kehelwala Dewage Gayan Maduranga

Qiang Ye
Director of Dissertation

Benjamin Braun
Director of Graduate Studies

August 6, 2020
Date

This dissertation is dedicated to my family. Without your patience and understanding, this would not have been achievable.

ACKNOWLEDGMENTS

I feel exceptional appreciation toward the many people who supported me during my time in graduate school. A few people were particularly important to the creation of this thesis and deserve individual thanks.

Thank you to my advisor, Dr. Qiang Ye, for taking me on as a student, and for all of your patience, support, and guidance throughout the past three years. I want to thank Dr. Lawrence Harris and Dr. David Murrugarra for serving on my Ph.D. committee and for their guidance and teaching. Thank you, Dr. YuMing Zhang and Dr. Neil Moore, for agreeing to serve as outside committee members on short notice. I want to thank Dr. Guowei Wei and Menglun Wang for introducing me to the contact map prediction problem and for allowing me to access the Michigan State University HPC computers. I want to thank Devin Willmott for his enthusiasm in discussing all manner of topics in machine learning and programming, for being a great academic sibling, and for helping me understand the RNA problem. Thank you to Kyle Helfrich for being a great academic sibling and a neighborly, faithful collaborator. Finally, I want to thank all of the staff and faculty in the math department, and my friends and family outside Kentucky.

TABLE OF CONTENTS

Acknowledgments iii
Table of Contents iv
List of Tables v
List of Figures vi

Chapter 1 Introduction 1
1.1 Machine Learning 1
1.2 Neural Networks 5
1.3 Deep Learning 9
1.4 Neural Network Training 9
1.5 Neural Network Types 12
1.6 Thesis Outline 16

Chapter 2 The Scaled Cayley Unitary RNN 18
2.1 Introduction 18
2.2 Background 19
2.3 Scaled Cayley Unitary RNN (scuRNN) 26
2.4 ModReLU Activation Function 32
2.5 Experiments 34
2.6 Conclusion 45

Chapter 3 Symmetrizing Convolutional Neural Networks 46
3.1 Convolutional Kernel 46
3.2 Symmetrizing the Convolutional Kernel 48
3.3 RNA Direct Secondary Structure Inference 53
3.4 Contact Map Prediction for Protein Sequences 65
3.5 Conclusion 72

Bibliography 74
Vita 80

LIST OF TABLES

2.1 Results for the unpermuted MNIST classification problem. The best epoch test accuracy over the entire 70-epoch run is recorded. Entries marked by an asterisk are results reported in [48]. 36
2.2 Results for the permuted MNIST classification problem. The best epoch test accuracy over the entire 70-epoch run is recorded. Entries marked by an asterisk are results reported in [39]. 37
2.3 Additional convergence results for various optimizers/learning rates for A, D, and all other weights on the MNIST classification problem with n = 116 (RMS. = RMSprop, Adg. = Adagrad). 38
2.4 Hidden sizes and total numbers of trainable parameters of the machines used in the copying problem experiment. 40
2.5 Additional convergence results for various optimizers/learning rates for A, D, and all other weights on the copying problem with T = 2000 (RMS. = RMSprop, Iter. = first iteration below baseline, It. MSE = MSE at iteration 2000). 42
2.6 Hidden sizes and total numbers of trainable parameters of the machines used in the adding problem experiment. 43
2.7 Results for the TIMIT speech set, evaluated by MSE. 45
3.1 Results on the 16S rRNA secondary structure inference task using our symmetrized deep learning method (RNN/SCNN) and the RNN/CNN method. 62
3.2 Secondary structure inference accuracy comparison on the Rogers set with neural networks trained on 16S rRNA sequences. 63
3.3 Sequence results comparison on the Sukosd set for the 16S rRNA secondary structure inference task. 64
3.4 Results on the 16S rRNA secondary structure inference task using our symmetrized deep learning method (RNN/SCNN) and the RNN/CNN method, with our method adjusted to match the number of trainable parameters of the RNN/CNN model. The 16S Rogers set and Sukosd set are distinct test sets used to evaluate the trained models. 65
3.5 Contact map prediction accuracy on the training, validation, and test sets using our symmetrized deep learning method (RNN/SCNN) and the general deep learning RNN/CNN method.