Patel Udel 0060D 136

SUPERVISED MACHINE LEARNING FOR SMALL RNA INFORMATICS AND BIG DATA ANALYTICS IN PLANTS by Parth Patel A dissertation submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics and Systems Biology Winter 2019 © 2019 Parth Patel All Rights Reserved SUPERVISED MACHINE LEARNING FOR SMALL RNA INFORMATICS AND BIG DATA ANALYTICS IN PLANTS by Parth Patel Approved: __________________________________________________________ Cathy H. Wu, Ph.D. Chair of Bioinformatics & Computational Biology Approved: __________________________________________________________ Levi T. Thompson, Ph.D. Dean of the College of Engineering Approved: __________________________________________________________ Douglas J. Doren, Ph.D. Interim Vice Provost for Graduate and Professional Education I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy. Signed: __________________________________________________________ Blake Meyers, Ph.D. Professor in charge of dissertation I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy. Signed: __________________________________________________________ Hagit Shatkay, Ph.D. Member of dissertation committee I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy. Signed: __________________________________________________________ Li Liao, Ph.D. Member of dissertation committee I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy. Signed: __________________________________________________________ Michael Axtell, Ph.D. Member of dissertation committee ACKNOWLEDGMENTS I would like to thank my dissertation advisor Dr. Blake Meyers for providing me an opportunity to work in his group, the financial support, and tremendous guidance academically and professionally during the years of my Ph.D. study. I learnt a lot from you, especially how to manage work-life balance and still work hard and stay focused on goals. You taught me an important lesson in life, during the time of manuscript rejection, that failure is another opportunity to be successful again. I will never forget that. It was pleasure and honor to work for you. I would like to sincerely thank you for being kind and flexible with me and my work throughout my Ph.D. study. I also would like to thank my committee members, including Dr. Hagit Shatkay, Dr. Michael Axtell, and Dr. Li Liao for their invaluable assistance, scientific suggestions, and ideas to my projects. You have invested significant amounts of time in attending my committee meetings and providing me timely suggestions in the past few years. I could not ask for more supportive committee. I also would like to sincerely thank Dr. Cathy Wu for advising me academically and professionally. I greatly appreciate your time and guidance. I am also thankful to so many great colleagues during my Ph.D. study in the Meyers lab, including Dr. Atul Kakrana, Dr. Sandra Mathioni, Reza Hammond, Saleh iv Tamim, Dr. Tzuu-fen Lee, Dr. Arikit Siwaret, Dr. Kun Huang, Dr. Rui Xia, Dr. Chong Teng, Mayumi Nakano, Ayush Dusia, Deepti Ramachandruni, Dr. Qili Fei, Dr. Alex Harkess, Dr. Patricia Baldrich, Dr. Margaret Frank, Suresh Pokhrel, Guna Ranjan Gurazada, Josh Rothhaupt, Ryan DelPercio, and Aleksandra Beric. I would like to thank Dr. Atul Kakrana and Dr. Sandra Mathioni for providing me important technical and scientific guidance. Special thanks to Ayush Dusia and Mayumi Nakano for providing me plenty of help and guidance when I freshly joined the lab. Special thanks to Deepti Ramachandruni for giving me an emotional support after the Meyers lab moved to the Danforth Plant Science Center in Missouri. I would also like to thank Dr. Jixian Zhai for valuable discussion and ideas when I joined the lab in 2013. I would like to thank Dr. Karen Hoober and Susan Phipps for taking care of administrative work. Special thanks to Donna Holesinger, Tracy Walsh, Connie Brantley, and Zaharo Pappoulis for helping me setup meetings and room reservations. I would like to thank my family and friends in Delaware, Philadelphia, New Jersey, Canada, and India for being there during tough times. I would also like to thank my cricket group (“Strikers”) for helping me relax and maintain work-life balance. Finally, I would like to thank my parents for their undivided support, encouragement, and love. It was your belief and hard work that I am here today. Thank you mummy and papa for always being there for me. v TABLE OF CONTENTS LIST OF TABLES ......................................................................................................... x LIST OF FIGURES ....................................................................................................... xi ABSTRACT ................................................................................................................ xiv Chapter 1 INTRODUCTION .............................................................................................. 1 1.1 Phased Secondary SiRNAs are Crucial Regulators of Development, Reproduction, and Plant Defense .............................................................. 4 1.2 Pre-Dissertation State of Approaches to Characterize PhasiRNAs ........... 5 1.3 Rationale for Developing a New Machine Learning Based Tools in Plant Reproductive Biology ...................................................................... 7 1.4 Rationale for Exploring PhasiRNAs Beyond Grasses ............................... 9 1.5 Overview of Dissertation Research ......................................................... 11 1.6 Publications from This Dissertation ........................................................ 14 2 DE NOVO PREDICTION OF REPRODUCTIVE PHAS LOCI IN GENOMES OF GRASSES .............................................................................. 16 2.1 Methods ................................................................................................... 17 2.1.1 Workflow ..................................................................................... 17 2.1.2 Classification method .................................................................. 18 2.1.3 Data .............................................................................................. 20 2.1.3.1 Cross validation dataset ................................................ 20 2.1.3.2 The cross-species validation dataset ............................. 23 2.1.4 Performance evaluation ............................................................... 24 2.1.5 Feature extraction ........................................................................ 25 2.2 Results ..................................................................................................... 27 2.2.1 Evaluation of machine learning classifier ................................... 27 2.2.2 Validation on cross species test ................................................... 30 2.2.3 Informative Features .................................................................... 31 vi 2.3 Chapter Summary .................................................................................... 37 2.4 Author Contributions ............................................................................... 39 3 REPRODUCTIVE PHASIRNAS IN GRASSES ARE COMPOSITIONALLY DISTINCT FROM OTHER CLASSES OF SMALL RNAS ................................................................................................. 40 3.1 Methods ................................................................................................... 41 3.1.1 Classification via machine learning ............................................. 41 3.1.2 Dataset used for cross validation study ....................................... 42 3.1.3 Development of a machine learning classifier for plant small RNAs ........................................................................................... 45 3.1.4 Features included in the machine learning algorithm, and their selection ....................................................................................... 47 3.1.4.1 Sequence-based features ............................................... 47 3.1.4.2 Positional features ........................................................ 48 3.1.4.3 Shannon entropy ........................................................... 49 3.1.5 Generation and computational analysis of sequencing data from maize and Setaria viridis .................................................... 54 3.1.6 Extraction of a set of maize 22-nt hc-siRNAs ............................. 55 3.1.7 Performance evaluation ............................................................... 57 3.2 Results ..................................................................................................... 57 3.2.1 Cross validation results distinguishes reproductive phasiRNAs from other sRNAs ........................................................................ 57 3.2.2 Position-specific biases in phasiRNAs relative to other small RNAs ..........................................................................................

Patel Udel 0060D 136

Reproductive Phasirna Loci and DICER-LIKE5, but Not Microrna

Low Risk Aquarium and Pond Plants

Echinodorus Uruguayensis Arechav

Guidelines for the USDA-APHIS- PPQ Weed Risk Assessment Process

Bromelcairns Bimonthly Newsletter of Cairns Bromeliad Societ Inc

Threats to Australia's Grazing Industries by Garden

October 2020 Mpeda Newsletter 1 Cpf (India) Private Limited Cpf 3 Best Approach for Aquaculture

Alismataceae)

The Taxonomy of Echinodorus Palaefolius (NEES Et MART.) Macbr

Invasive Aquatic Plants and the Aquarium and Ornamental Pond Industries Shakira Stephanie Elaine Azan

Alismataceae

Bromeliaceae