An Investigation of miRNA Repertoires in Bdelloid

By

Anupriya Dutta

B.A., Rutgers University, 2002

A Dissertation Submitted in Partial Fulfillment of the Requirements for

the Degree of Doctor of Philosophy in the Division of Biology and

Medicine at Brown University

Providence, Rhode Island

May 2013

© Copyright 2013 by Anupriya Dutta

This dissertation by Anupriya Dutta is accepted in its present form by the Division of Biology and Medicine as satisfying the dissertation requirement for the degree of Doctor of Philosophy.

Date______David Mark Welch, Ph.D., Advisor

Recommended to the Graduate Council

Date______Michael Mckeown, Ph.D., Reader

Date______Richard Bennett, Ph.D., Reader

Date______Irina Arkhipova, Ph.D., Reader

Date______Kevin Chen, Ph.D., Reader Rutgers University

Approved by the Graduate Council

Date______Peter Weber Dean of the Graduate School

Acknowledgements

First and foremost, I want to thank the many people who have mentored and inspired me throughout my life. The following pages are a product of your effort and influence, and I am deeply grateful for your guidance. Specifically, I would like to thank my advisor, David Mark Welch, for all the helpful discussions and feedback. I would also like to express my gratitude to my thesis committee members, Richard Bennett, Irina Arkhipova, and Mike Mckeown, who have also been immensely helpful throughout this experience.

I thank Kevin Peterson for introducing me to a project that transformed the course of my thesis work.

I thank my outside reader, Kevin Chen.

I thank the Brown-MBL program for their support.

I gratefully acknowledge the two funding sources that supported this work: NSF and the Watson Fellowship.

I thank my family for their support.

I found the strength to overcome challenges thanks to a group of caring people I am fortunate enough to call friends. This journey would have been a lot tougher without their comfort, concern and companionship.

Lastly, I thank my computer and car for surviving this long. I realize I have not been easy on you these past couple of years, but you have dutifully kept going, and that has helped keep me going.

Table of Contents

List of Figures
List of Tables
Abstract
Chapter I: Introduction
  Introduction to microRNAs
    The Discovery of microRNAs
    miRNA Biogenesis
    miRNA Target Interaction
    Mechanisms of miRNA Regulation
    Implications of miRNA Regulation
    miRNA Evolution
  Introduction to Bdelloid Rotifers
    Overview of Bdelloid Rotifers
    Comparison of Bdelloids and Monogononts
    Bdelloid Genome Structure
Chapter II: General Methods
Chapter III: miRNA informatics
Chapter IV: Loss of widely conserved miRNAs let-7 and miR-100 in bdelloid rotifers
Chapter V: Frequent substitutions in some Adineta vaga miRNA suggest RNA editing
Chapter VI: Future Directions
Appendix A: Secondary Structures of miRNAs
  Appendix A-1: A. vaga secondary structures
  Appendix A-2: B. manjavacas secondary structures

List of Figures

Figure 1-1 Overview of the miRNA pathway
Figure 3-1 let-7 family members in nematodes
Figure 3-2 let-7 sequences of flatworms
Figure 3-3 Processing pipeline of SMD
Figure 3-4 Schematic depiction of SMD accessory programs
Figure 3-5 Mixed top hits returned by BLAST
Figure 3-6 Advantage of the SMD algorithm in identifying correct miRNA homolog match for queried Dme miRNA
Figure 3-7 Gap tolerance beyond the seed sequence to minimize substitutions within the match chosen by SMD
Figure 3-8 SMD return of divergent miRNA homologs
Figure 3-9 The conserved let-7 cluster in B. manjavacas
Figure 3-10 SMD vs. miRBase (BLAST)
Figure 3-11 Seed shift in a rotifer miRNA homolog
Figure 3-12 Sample output from SMD accessory programs
Figure 4-1 Alignment of rotifer miRNAs
Figure 4-2 Ct tables of qPCR reactions
Figure 4-3 qPCR amplification of miRNAs
Figure 4-4 Northern for let-7 and miR-87
Figure 4-5 Reduced stringency search for A. vaga let-7
Figure 4-6 Alignment of miR-100/let-7/miR-125 genomic cluster
Figure 4-7 Sequences from miRBase of miR-100 and let-7
Figure 4-8 lin-41 binding sites in B. manjavacas
Figure 4-9 lin-41 binding sites in A. vaga
Figure 4-10 Binding sites on hbl-1 and dicer-1
Figure 5-1 miR-125 editing in A. vaga
Figure 5-2 miR-125 reads from B. manjavacas
Figure 5-3 Alignment of isomirs corresponding to conserved miRNAs from SMD output
Figure 5-4 Lack of C-to-U substitutions in miR-1175 and miR-315

List of Tables

Table 3-1 Summary of SMD and BLAST identification of Dme miRNAs
Table 3-2 SMD and BLAST identification of Dme miRNAs
Table 3-3 SMD and BLAST assignment of Dme miRNAs
Table 4-1 Sequences of qPCR probes
Table 4-2 Abundance of conserved miRNAs from small RNA libraries
Table 4-3 Sequences of conserved miRNA homologs from rotifer surveys
Table 4-4 Genomic environment of miR-125 loci in A. vaga


Abstract

An exceptional group of aquatic invertebrates makes up Class Bdelloidea.

Bdelloid rotifers are the only known group of ancient asexual animals. A second distinction that sets them apart from all other metazoans is their outstanding capacity for DNA repair.

A class of small noncoding regulatory RNAs known as microRNAs (miRNAs) has been implicated in numerous cellular processes. The roles of miRNAs in the DNA damage response and in asexual reproduction prompted an investigation of the bdelloid miRNA repertoire. As part of a comparative approach, the miRNA repertoire of the monogononts, a facultatively sexual sister clade of bdelloids, was also surveyed. Examination of small RNA libraries from a bdelloid and a monogonont species revealed the absence of two widely conserved miRNAs, let-7 and miR-100, in the bdelloid. These results were rigorously evaluated through a small RNA library survey of a second bdelloid species, along with qPCR and Northern blot assays. The apparent loss of let-7 and miR-100 in bdelloids suggested that other conserved miRNAs might exhibit sequence divergence. A program with flexible stringency parameters was therefore created to identify conserved miRNAs with varying levels of sequence divergence, together with accessory programs to identify miRNA sequence variants. This software revealed several divergent miRNA homologs in bdelloid rotifers that could not be identified with existing software. Potentially edited variants of some conserved miRNAs were also identified.

The conserved miRNA repertoire of bdelloids does indeed reflect unique aspects of bdelloid biology: let-7 is an integral part of the DNA damage response and has been shown to be suppressed in other asexual species. The miRNA editing phenomenon may be a compensatory mechanism for the absence of conserved miRNAs. Cumulatively, these findings suggest that the unique miRNA evolution of bdelloid rotifers has contributed to their successful asexual evolution.

Chapter I

Introduction

Introduction to microRNAs

The central dogma of molecular biology describes a cascade in which DNA is transcribed into RNA and RNA is translated into protein, assigning RNA a merely intermediary role. In fact, RNA is far more dynamic, with the potential to shape both the DNA and the protein landscape of the cell, and the central dogma’s depiction of it is too simple given the multitude of functions it carries out. Beyond mRNAs lies a world of noncoding RNAs with tremendously diverse functions, many of which actively shape and engage the mRNA population. The most prominent noncoding RNAs are ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs), which function specifically in the translation of mRNA. Further exploration of other noncoding RNAs continues to reveal that their interactions with mRNAs are not restricted to translation, but are embedded within far more complex regulatory networks.

Noncoding RNAs tend to dominate transcriptional profiles, and their sheer abundance can make accurate quantification of messenger RNA difficult in sequencing-based approaches. Although their presence can hinder accurate estimates of mRNA copy number, it is, ironically, the output of mRNA that is sometimes greatly influenced by the function of noncoding RNAs. Within this pool of molecules exist distinct functional classes of RNA, some of which, not so long ago, were dismissed as “junk.” However, in a flurry of revolutionary findings, many of these noncoding RNAs have been implicated as critical components of countless biological processes.

The Discovery of microRNAs

Research on a group of small noncoding RNAs called microRNAs (miRNAs) has contributed substantially to the effort to characterize and understand noncoding RNAs. miRNAs were the first small noncoding RNAs to be discovered, in 1993 (Hamilton and Baulcombe, 1999; Lee et al., 1993; Wightman et al., 1993).

Collaboration between the Ambros and Ruvkun labs, using forward genetic screens for developmental mutants in C. elegans, led to the identification of the founding member of the miRNA family, lin-4 (Lee and Ambros, 2001; Reinhart et al., 2000). Several years later came the discovery of let-7, whose sequence conservation revealed that miRNAs are common among bilaterians and arose early in the history of Metazoa (Lagos-Quintana et al., 2001; Pasquinelli et al., 2000). The mechanism by which lin-4 and let-7 operate became apparent soon after their discovery: the genes lin-14 and lin-41 were shown to be downregulated through binding of their transcripts by lin-4 and let-7, respectively (Lee et al., 1993; Wightman et al., 1993). These early experiments found that these 22 nt noncoding RNAs bound to the 3’ UTRs of their mRNA targets with varying degrees of complementarity. Further research on lin-4 and let-7 laid much of the foundation of this field, and their examples are therefore used here to describe the important insights gleaned from their study.


In the nearly twenty years since the discovery of the first miRNA, great strides have been made in elucidating the workings of the miRNA pathway, which is found in both plants and animals (Hamilton and Baulcombe, 1999).

Eukaryotic RNA interference (RNAi), the pathway that silences genes through double-stranded RNA (dsRNA), has co-opted much of its machinery from archaea, bacteria and phage (Grimson et al., 2008; Song et al., 2004). The miRNA pathway employs much of the same machinery, but differs distinctly from RNAi in how biogenesis occurs within the cell and in certain consequences of target recognition.

Figure 1-1 Overview of the miRNA pathway

miRNA Biogenesis

The metazoan miRNA pathway begins in the nucleus (Figure 1-1). Primary miRNAs (pri-miRNAs) are encoded in the nuclear genome, and their transcripts adopt a characteristic hairpin structure. miRNAs are sometimes found on polycistronic transcripts. Reports indicate that regions of eukaryotic genomes may be enriched for miRNAs, as approximately 50% of miRNAs are believed to lie in close proximity to other miRNAs (Lagos-Quintana et al., 2001; Lau et al., 2001; Mourelatos et al., 2002). While many miRNAs have their own loci, a fraction, known as mirtrons, is generated from the introns of host transcripts. There are even reports of miRNAs that arise from ribosomal genes (Xiong, 2010). Pri-miRNAs are transcribed by RNA polymerase II and are capped and polyadenylated like other genic transcripts (Grosshans, 2010). They are cleaved by a complex known as the microprocessor, with the exception of mirtrons, which are processed by the spliceosome (Denli et al., 2004; Okamura et al., 2007). The microprocessor consists of an RNase III enzyme, Drosha, and its cofactor, DGCR8 (Han et al., 2006; Krol et al., 2010).

Cleavage yields a ~70 nt precursor miRNA (pre-miRNA) that retains the hairpin structure and carries a 2 nt 3’ overhang. The overhang marks the pre-miRNA for further processing within the miRNA pathway as it is exported to the cytoplasm.

In the cytoplasm, another RNase III enzyme, Dicer, cleaves the pre-miRNA to yield a duplex of approximately 22 nt that contains the mature miRNA. The duplex is then unwound, and the guide strand is incorporated into a miRNA-induced silencing complex (miRISC). The other strand, known as the miRNA* or passenger strand, is generally discarded; however, there are reports of silencing activity by miRNA* species (Stark et al., 2005). This nomenclature has been revised for miRNAs that arise from opposite arms of the pre-miRNA: the suffixes “5p” and “3p” indicate whether the miRNA derives from the 5’ or the 3’ arm of the pre-miRNA (Kozomara and Griffiths-Jones, 2011). The strand whose 5’ end is less thermodynamically stable is usually the one selected for miRISC incorporation (Khvorova et al., 2003; Schwarz et al., 2003). A protein of the Argonaute (Ago) family sits at the heart of miRISC. Ago has a bilobed structure with three functional domains. The PAZ domain, near the N-terminus, binds the 2 nt 3’ overhang of the cleaved precursor product (Djuranovic et al., 2011). The MID domain, located on the other lobe, binds the 5’ end of the miRNA. The PIWI domain, near the C-terminus, has a ribonuclease H-like region responsible for endonucleolytic activity (Carthew and Sontheimer, 2009). Once anchored within miRISC, the miRNA guides the complex to its target transcript.

Accessory proteins such as GW182, a glycine-tryptophan (GW) repeat protein, then associate with miRISC to mediate translational repression or degradation of the target (Behm-Ansmant et al., 2006a).
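The “5p”/“3p” naming described above depends only on which arm of the hairpin a mature sequence comes from. The Python sketch below assigns an arm by comparing the midpoint of the mature sequence with the midpoint of the precursor. This is a simplifying assumption made for illustration: real annotation relies on the predicted hairpin structure and terminal loop position, which need not sit at the exact center of the precursor. The function name `assign_arm` and the toy hairpin are hypothetical, not taken from any annotation tool.

```python
def assign_arm(precursor: str, mature: str) -> str:
    """Label a mature miRNA as '5p' or '3p' (illustrative heuristic only).

    Assumes the terminal loop lies near the middle of the precursor, so a
    mature sequence centered before the midpoint is taken to be on the 5' arm.
    """
    precursor, mature = precursor.upper(), mature.upper()
    start = precursor.find(mature)  # first exact occurrence, 0-based
    if start == -1:
        raise ValueError("mature sequence not found in precursor")
    mature_center = start + len(mature) / 2
    return "5p" if mature_center < len(precursor) / 2 else "3p"


# Hypothetical toy hairpin: 5' arm + loop + 3' arm (not a real miRNA).
toy_precursor = "GGGAAACCC" + "UUUU" + "GGGUUUCCC"
print(assign_arm(toy_precursor, "GGGAAA"))  # prints 5p
print(assign_arm(toy_precursor, "GUUUCC"))  # prints 3p
```

A midpoint rule keeps the sketch free of any RNA-folding dependency; swapping in a loop position from a folding program would be the natural refinement.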

miRNA Target Interaction

Within the miRNA, nucleotides 2-7 or 2-8, counted from the 5’ end, are considered the seed sequence (Figure 1-2). Mutational analyses of miRNA::target interactions have helped identify the regions of the miRNA that are critical for pairing (Vella et al., 2004), a conclusion reinforced by the sequence conservation observed among miRNA homologs (Wheeler et al., 2009). Together, these lines of research support a model of miRNA::target interaction (Figure 1-2). The seed sequence is commonly followed by a bulge of unpaired bases and then by downstream pairing involving nucleotides 13-16. Beyond this region, there is generally partial complementarity (indicated by dashed red lines in Figure 1-2) (Filipowicz et al., 2008; Grimson et al., 2007; Wheeler et al., 2009). As mentioned before, miRNAs bind to their targets with varying degrees of complementarity. The seed sequence largely determines target specificity because it must pair with the target with full complementarity. There are, however, variations to this rule. Heptamer and octamer seed matches have been reported (Baek et al., 2008; Friedman et al., 2009; Nielsen et al., 2007), seed sequences themselves can be offset, and extensive 3’ complementarity can compensate for mismatches between the seed and the target (Bartel, 2009; Yekta et al., 2004). The degree of complementarity between a miRNA and its target also determines the fate of the target. miRNAs with extensive complementarity to their targets follow an RNAi-like pathway that results in target degradation. Such extensive pairing is rare in animals but far more common in plants. In animals, poor complementarity allows a single miRNA to have multiple targets, which increases the complexity of its possible interactions and makes prediction of its true targets more difficult. A recent study complicates the picture further, reporting that the affinity of a miRNA for its targets depends on the concentrations of both the miRNA and its targets (Ragan et al., 2011).
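The seed rule described above lends itself to a simple computational check. The Python sketch below extracts a miRNA seed (nucleotides 2-8 by default, or 2-7) and reports where its reverse complement occurs in a target 3’ UTR. It assumes sequences written 5’ to 3’ in the RNA alphabet and counts only perfect Watson-Crick seed pairing; G:U wobble, offset seeds, and compensatory 3’ pairing are deliberately ignored, so it illustrates the seed rule rather than serving as a target predictor. The let-7 sequence is the C. elegans let-7 mature sequence; the UTR fragment is made up, and the function names are hypothetical.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def seed(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return miRNA nucleotides start..end (1-based, inclusive) from the 5' end."""
    return mirna.upper()[start - 1:end]


def seed_match_sites(mirna: str, utr: str, start: int = 2, end: int = 8) -> list:
    """Return 0-based UTR positions where the seed pairs perfectly with the target."""
    site = reverse_complement(seed(mirna, start, end))
    utr = utr.upper()
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


let7 = "UGAGGUAGUAGGUUGUAUAGUU"  # C. elegans let-7 mature sequence
toy_utr = "AAACUACCUCAAA"        # hypothetical 3' UTR fragment
print(seed(let7))                             # prints GAGGUAG
print(seed_match_sites(let7, toy_utr))        # prints [3]  (seed nt 2-8)
print(seed_match_sites(let7, toy_utr, 2, 7))  # prints [4]  (seed nt 2-7)
```

Because the miRNA pairs antiparallel with its target, the site searched for in the UTR is the reverse complement of the seed; for let-7 this is the familiar CUACCUC motif.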


Mechanisms of miRNA Regulation

miRNA regulation is perhaps the most challenging subject in miRNA biology. Many of its aspects cannot be captured by a single, straightforward biochemical pathway, and the postulated mechanisms, like their outcomes, are numerous. The complexity begins with the timing of miRISC targeting relative to translation: there is evidence of miRISC targeting both before translational initiation and during translational elongation. To distinguish between these two possibilities, researchers have used polysomal profiling and ribosomal profiling. Polysomal profiling examines the distribution of a targeted mRNA between translationally active and inactive fractions, while ribosomal profiling examines the distribution of ribosomes along the targeted molecule over time.

If miRISC targets the mRNA before translational initiation, Ago’s interaction with the methylated cap of the mRNA is believed to prevent recruitment of eIF4, rendering the mRNA translationally inactive. A polysomal profiling study of let-7 and lin-4 targets in C. elegans supported this scenario (Ding and Grosshans, 2009). In wild-type animals, lin-41 mRNA shifted to submonosomal fractions coincident with let-7 expression during development, whereas in the let-7 mutant lin-41 increased in the polysomal fractions. Mutational analyses of miRISC proteins such as GW182 and AIN-1/2 showed that these proteins are necessary for mRNA degradation, which involves sequestering the mRNA in P-bodies. It is important to note, however, that localization and eventual degradation of the targets within P-bodies is not immediate, as AIN-1/2 were immunoprecipitated together with miRNA target mRNAs (Zhang, 2007).

Intriguingly, support for miRISC targeting at the elongation step comes from a ribosomal profiling study of the very same let-7 and lin-4 targets, also in C. elegans. If miRISC targeting inhibited translational initiation, the number of ribosome-occupied sites on target mRNAs would be expected to decrease over the period of miRNA expression. If instead it affected elongation, ribosomes would pause on the target, producing peaks of increased ribosome density in the ribosomal profile. If miRISC targeting caused ribosome drop-off, ribosomes would tend to map toward the 5’ end of the target mRNA. The study concluded that, for the surveyed let-7 and lin-4 targets, miRISC targeting produced a variety of mechanisms and outcomes that no single scenario could explain: ribosome drop-off occurred for some targets, while deadenylation occurred for others. Deadenylation is not restricted to the elongation step; it can also occur earlier, before translational initiation. It is mediated by the interaction of Ago2-GW182 with the CAF1–CCR4–NOT deadenylase complex (Behm-Ansmant et al., 2006b; Eulalio et al., 2009; Eulalio et al., 2007).

Following deadenylation, the decapping enzyme DCP2, together with a host of cofactors, removes the cap, and the transcript is subsequently degraded 5’-to-3’ by XRN1. The most surprising ribosomal profiling result was that lin-41 mRNA did not show a marked difference following let-7 expression, which led the authors to propose that this result may reflect ribosomal pausing. miRISC targeting can also cause repression through degradation of the nascent polypeptide and premature termination of translation (Nottrott et al., 2006).

The polysomal shifts and ribosomal profiles following miRNA expression are not dramatic in either of the experiments described, so the data do not strongly favor one outcome over the other. Populations of a given mRNA are likely repressed at both translational steps, and an interpretation in which both possibilities occur simultaneously may reconcile the discrepancy for targeting mediated by a specific miRNA. A further confounding factor is that these experiments examine global populations of mRNA and miRNA, whereas expression and targeting are likely restricted to certain cell types. Nonetheless, the results provide an important basis for understanding the intricate details of miRNA::target regulation.

Implications of miRNA Regulation

The global impact of miRNA regulation may be observed at the organismal level, yet it is mediated through exquisite spatiotemporal specificity within individual cells. The overarching theme of miRNA regulation is the fine-tuning of gene expression (Bartel, 2009). The screens for developmental mutants that revealed lin-4 and let-7 show how important miRNA regulation is during development. Mutants of lin-4 and let-7 exhibit defects in postembryonic development, which in worms consists of four larval stages (L1-L4), reiterating the L1 and L4 stages, respectively (Slack et al., 2000; Wightman et al., 1993). Both lin-4 and let-7 target, and are targeted by, genes expressed during these stages. let-7 mutants display the most overt phenotype, vulval bursting, while lin-4 mutants lack adult structures. The lin-4/let-7 regulatory network demonstrates the complexity of miRNA interactions: it contains negative and positive feedback loops as well as redundancy among interactions. For example, lin-4 and let-7 both act on hbl-1 (Bagga et al., 2005), although during different periods of development. The miRNA regulatory network may therefore be wired with a degree of redundancy.

A perplexing feature of miRNA perturbations is that in many instances they have little or no effect on mRNA and protein output (Miska et al., 2005); single miRNA deletions were found not to greatly alter mRNA and protein levels. While redundancy of miRNA function does not completely explain these effects, the lin-4/let-7 regulatory network suggests a second idea: miRNA networks are wired with buffering capacity for contingencies. A single miRNA may have multiple targets, and each target may have multiple miRNA binding sites. These inherent features may help explain why miRNA regulation is robust to single deletions.

Further complicating matters, miRNA regulation may either antagonize or promote overall gene expression. The interactions most difficult to untangle involve miRNAs in coherent and incoherent feed-forward loops. In incoherent feed-forward loops, a third component is affected by, or also affects, the other two components within the loop, and in such cases single deletions may not have a pronounced effect. A few recent studies exploring miRNA function in sensitized genetic backgrounds have observed mutant phenotypes, supporting a role for miRNAs as buffering agents (Brenner et al., 2010).

Beyond development, miRNAs function in myriad cellular processes, and their responses to environmental factors extend their buffering role to environmental contingencies. A recent study by Qiu et al. explored the role of miRNAs in responding to abiotic and biotic stress factors. Using known miRNAs implicated in environmental response, as catalogued in the miREnvironment database, the study examined the profiles of these miRNAs across 17 different species in an effort to understand miRNA signatures in cancer treatment (Qiu et al., 2012). The findings were built on the correlation between miRNA expression and environmental stress factors related to cancer treatment, and the authors concluded that miRNA profiles were reliable predictors of outcomes under different therapies. This parallels the role of miRNAs during development: miRNAs respond to environmental stimuli to aid the cell during periods of stress. The miRNA profile during these treatments is thus an accurate indicator of cellular response and, therefore, a gauge of the patient’s response to treatment.

miRNA Evolution

As mentioned previously, a defining feature of miRNAs is their sequence conservation. The evolution of miRNAs has been investigated both at the sequence level and at a more macroscopic level, looking at general evolutionary patterns of innovation and loss. At the sequence level, the way miRNAs bind to their targets largely influences how these short RNA molecules evolve. Across diversely sampled taxa, identified miRNA homologs exhibit conservation of the seed sequence and partial conservation downstream of it. Where different miRNAs bear the same seed sequence, homology is determined by the remainder of the sequence, and miRNAs sharing a seed sequence are grouped into families (Elefant et al., 2011). miRNA sequence evolution was studied in a survey of miRNAs in four divergent nematode species (de Wit et al., 2009). Arm-switching, in which the mature miRNA shifts from the 5’ arm of the precursor to the 3’ arm, was commonly observed among the conserved miRNAs detected. Gene duplication had occurred in many of these cases, supporting the hypothesis that duplication events lead to arm-switching. Other phenomena, such as seed-shifting and favoring of the 3’ arm miRNA variant, were also observed. Not surprisingly, the most divergent nematode species shared the fewest miRNAs with the other three species surveyed.
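Grouping miRNAs into families by shared seed, as described above, can be sketched in a few lines of Python. The example assumes the seed is nucleotides 2-8 and compares sequences outside the seed with a simple ungapped mismatch count, whereas real homology assignment would use a proper alignment. The let-7 and miR-100 entries are the C. elegans let-7 and human miR-100 mature sequences; `variant_1` is a hypothetical same-seed sequence, and the function names are illustrative only.

```python
from collections import defaultdict


def seed(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return miRNA nucleotides start..end (1-based, inclusive) from the 5' end."""
    return mirna.upper()[start - 1:end]


def group_by_seed(mirnas: dict) -> dict:
    """Map each seed sequence to the sorted names of miRNAs that carry it."""
    families = defaultdict(list)
    for name, sequence in mirnas.items():
        families[seed(sequence)].append(name)
    return {s: sorted(names) for s, names in families.items()}


def mismatches_outside_seed(a: str, b: str, start: int = 2, end: int = 8) -> int:
    """Count ungapped mismatches outside the seed over the shared length."""
    n = min(len(a), len(b))
    return sum(1 for i in range(n)
               if not (start - 1 <= i < end) and a[i].upper() != b[i].upper())


mirnas = {
    "let-7": "UGAGGUAGUAGGUUGUAUAGUU",      # C. elegans let-7
    "variant_1": "UGAGGUAGUAGGUUGUGUGGUU",  # hypothetical same-seed sequence
    "miR-100": "AACCCGUAGAUCCGAACUUGUG",    # human miR-100
}
print(group_by_seed(mirnas))
# prints {'GAGGUAG': ['let-7', 'variant_1'], 'ACCCGUA': ['miR-100']}
print(mismatches_outside_seed(mirnas["let-7"], mirnas["variant_1"]))  # prints 2
```

The seed key places let-7 and the variant in one family, and the mismatch count outside the seed is the kind of remainder-of-sequence comparison the text describes for resolving homology within a family.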

Overall, the prevailing view of miRNA evolution in animals is that the innovation of miRNAs has contributed to innovations in the animal body plan. Once acquired, miRNAs are difficult to lose (Sempere et al., 2006), and certain miRNAs, such as the widely conserved let-7, are believed to be essential and therefore indispensable (Pasquinelli et al., 2000; Wheeler et al., 2009). Novel miRNAs are thought to contribute to lineage-specific phenotypic variation. Such novel miRNAs may arise from local and non-local duplications of existing miRNAs. A miRNA that arose from a non-local duplication would have a greater chance of subfunctionalization or neofunctionalization because it would be transcriptionally unlinked from its paralog. Novel miRNAs may also arise from loci of conserved miRNAs through RNA editing (Yang et al., 2006). Although few edited miRNAs have been reported thus far, they are among the more easily identified novel miRNAs because of their homology to conserved miRNAs combined with the absence of a genomic locus encoding their exact sequence. Through genome evolution and epigenetic mechanisms, novel miRNAs can carve out new functions for themselves.
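The identification of edited miRNAs described above rests on comparing sequenced reads with the genomic sequence they would otherwise match. The Python sketch below assumes reads have already been aligned without gaps to an equal-length reference, tallies substitution types, and reports a type only if it dominates the pooled mismatches. For orientation, A-to-I editing appears as A>G in sequencing data because inosine is read as guanosine. The threshold, function names, and toy sequences are hypothetical, the sketch is not presented as the procedure used in this work, and sequencing errors, polymorphisms, and non-templated nucleotide additions can all produce similar mismatches.

```python
from collections import Counter


def substitution_spectrum(read: str, reference: str) -> Counter:
    """Count reference>read substitution types over an ungapped, equal-length alignment."""
    if len(read) != len(reference):
        raise ValueError("sketch assumes an ungapped, equal-length alignment")
    # Treat T and U as the same base so DNA and RNA strings can be compared.
    read = read.upper().replace("T", "U")
    reference = reference.upper().replace("T", "U")
    return Counter(f"{ref}>{obs}" for ref, obs in zip(reference, read) if ref != obs)


def dominant_substitution(reads, reference, min_fraction=0.5):
    """Return the most common substitution type if it makes up at least
    min_fraction of all substitutions pooled across reads, else None."""
    pooled = Counter()
    for read in reads:
        pooled.update(substitution_spectrum(read, reference))
    total = sum(pooled.values())
    if total == 0:
        return None
    kind, count = pooled.most_common(1)[0]
    return kind if count / total >= min_fraction else None


# Hypothetical reference and reads (toy sequences, not real data).
reference = "ACGUACGUAC"
reads = ["GCGUACGUAC", "ACGUGCGUAC", "AUGUACGUAC"]
print(dominant_substitution(reads, reference))  # prints A>G (2 of 3 substitutions)
```

Pooling across reads rather than judging each read alone reflects the idea that a consistent, position-independent substitution type is more suggestive of editing than scattered mismatches.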

Introduction to Bdelloid Rotifers

Overview of Bdelloid Rotifers

Parthenogenesis, an uncommon mode of reproduction among eukaryotic lineages with long evolutionary lifespans, carries both costs and advantages. Meiosis shuffles gene combinations, which are then distributed into reduced genomes. In contrast, the most common form of parthenogenesis, apomictic parthenogenesis, bypasses meiosis entirely, as eggs are produced through mitotic division (Suomalainen, 1962). From a theoretical standpoint, the advantages of asexual reproduction (i.e., avoiding the loss of half the maternal genome, the risk of sexually transmitted diseases, and the energy and time spent searching for a mate) seem to far outweigh those conferred by sex; yet nature has decidedly favored sex over parthenogenesis as the most common mode of reproduction among eukaryotes. The seeming indispensability of sex may be broadly summarized in two key postulated reasons dealing with fitness and natural selection: (1) sex increases the overall average fitness of offspring; and (2) the resulting increase in genetic and fitness variation allows natural selection to operate more effectively and improves adaptability to harsh environmental conditions (de Visser and Elena, 2007). Alternatively, the maintenance of sex can be explained by the utility of the recombinational machinery in overcoming DNA damage (Bernstein et al., 1987).

The competing theories dealing with the advantages and disadvantages of sex are not mutually exclusive, as they differ only in the temporal scale considered. Should a sexual lineage abandon meiosis, it would enjoy short-term success by avoiding the two-fold cost of sex, i.e., the production of males, which halves the number of offspring relative to asexual counterparts. However, for the reasons stated above, such success is generally short-lived. The short evolutionary lifespans of asexual lineages are best evidenced by how few of them have diversified or achieved high taxonomic rank (Arkhipova and Meselson, 2005).

The existence of an ancient asexual lineage is therefore remarkable, because it runs counter to these theoretical expectations. Class Bdelloidea, belonging to the phylum Rotifera, is reputed to be the only ancient asexual animal lineage. The class is made up of obligately parthenogenetic aquatic invertebrates and is estimated to have arisen tens of millions of years ago (Mark Welch et al., 2008). Despite observation for over 300 years, males have yet to be identified. Anatomical structures resembling sperm are absent, as is any evidence of chromosome pairing or reduction during oogenesis, supporting the claim that bdelloids have persisted without meiosis and syngamy (Hsu, 1956). Theories abound as to which characteristics are specific to an ancient asexual lineage (Lam et al., 2011). The hypothesized ideal ancient asexual: would we know it if we came across it? How many of its attributes are genome-dependent or genome-specific? Confounding factors such as homoplasy and infrequent sexual reproduction within an asexual lineage are also exceedingly difficult to tease apart, making identification of a truly ancient asexual lineage even more problematic. Given that the existence of an ancient asexual animal lineage is considered such an anomaly, the strongest evidence for its asexual evolution may lie in the peculiarities that set it apart from all sexual metazoan taxa. Indeed, bdelloid rotifers possess no shortage of such peculiarities, from their desiccation-tolerant lifestyle to their degenerate tetraploid genome. How these properties coalesced in one asexual lineage is best understood after examining their evolutionary history and physical traits.

Bdelloids, along with monogononts and seisonoids, belong to a phylum occupying a basal position among the triploblast protostomes. Unifying characteristics of the phylum Rotifera include a mastax and a corona. The mastax is a food-processing pharynx; the corona is a ciliated region around the mouth used for food gathering and locomotion. Common physical characteristics include a nervous system with ganglia, as well as photosensitive, tactile, and secretory organs. Basic features such as muscles, a digestive system, and gonads are also present (Ricci, 2000). Rotifers are direct developers and eutelic, possessing approximately 1,000 nuclei throughout their lives. Gross developmental features are largely shared among all rotifers, possibly with the exception of key aspects of oogenesis discussed below. During development, holoblastic cleavage produces a modified spiral pattern of blastomeres characteristic of Spiralia (Ricci and Boschetti, 2003). Gastrulation begins at the 16-cell stage and proceeds through epibolic movements of the blastomeres, finally leading to specification of the three germ layers. Organogenesis results in the differentiation of the specialized cells that make up the animal. Although much remains unknown about rotifer organogenesis, it is observed to be the longest period of embryonic development (Boschetti et al., 2005).

Comparison of Bdelloids and Monogononts

Bdelloids’ suite of uniquely evolved characteristics is better appreciated when compared with their closest relatives, the monogononts. According to molecular clock estimates, bdelloids separated from the facultatively sexual class of rotifers, the monogononts, approximately 100 million years ago. Following the split, bdelloids diversified into the most ancient asexual metazoan taxon, comprising four families, 19 genera, and at least 460 species (Donner, 1965; Segers, 2007).

Bdelloid and monogonont rotifers share the synapomorphies of Rotifera, reflecting developmental programs that do not differ markedly.

Bdelloid and amictic monogonont eggs are produced in much the same manner: eggs arise from oocytes through mitotic divisions, although the exact number of mitotic divisions that occur during bdelloid oogenesis remains unclear. There have been conflicting reports of either one or two polar bodies being extruded during bdelloid oogenesis, whereas in monogononts only one polar body is released (Hsu, 1956; Ricci and Boschetti, 2003). If a second polar body were released during bdelloid oogenesis, the two processes would differ fundamentally in the number of mitotic divisions required to produce an egg. These details have yet to be determined.

As facultatively sexual relatives of bdelloids, monogononts have a life cycle that includes mixis. Diploid amictic females release a mixis signal in response to crowding. This signal initiates the sexual cycle in the population, stimulating the production of diploid mictic females. The fundamental developmental difference between monogononts and bdelloids is that these mictic females produce haploid eggs through meiosis; if left unfertilized, such eggs develop into males. If they are fertilized, they form resting eggs capable of surviving harsh environmental conditions (García-Roger et al., 2006).

Bdelloids are found in a wide array of freshwater habitats, often ephemerally aquatic ones such as moss, lichens, and temporary freshwater pools; monogononts reside not only in fresh water but also in brackish and marine environments (Witek et al., 2009). As a necessary adaptation to desiccation-prone environments, bdelloids frequently undergo anhydrobiosis, a quiescent state in which they curl up into a ball, called a tun, to weather desiccation. In monogononts, desiccation tolerance is restricted to diapausing resting eggs, whereas bdelloids can desiccate at any point in their life (García-Roger et al., 2006; Ricci et al., 2007). When desiccated, bdelloids may incur a large number of DNA breaks, which must be repaired for the animal to recover. Although bdelloids are known primarily for their ancient asexual status, their outstanding DNA repair abilities are equally remarkable. Experiments testing the ability of bdelloids to withstand ionizing radiation revealed that they can survive doses of up to 1,000 Gy (Gladyshev and Meselson, 2008). No other metazoan has been observed to come close to such a feat. Presumably, this radioresistance exists because desiccation inflicts DNA damage comparable in extent to that caused by these radiation treatments.

Bdelloid Genome Structure

Present-day bdelloid genome structure and organization may have largely been shaped by the desiccation process and consequently tailored to accommodate it.

Molecular and cytogenetic evidence suggests that the progenitor of bdelloids underwent whole-genome duplication followed by partial loss of the duplicated genome, rendering bdelloids degenerate tetraploids (Hur et al., 2009; Mark Welch et al., 2008; Mark Welch, 1998). The hypothesis is founded primarily on sequencing and FISH analysis of 40-70 kb stretches around the hsp82 loci of two distantly related bdelloid rotifers, Adineta vaga and Philodina roseola. Analyses of loci around the hox genes and the histone cluster further support the hypothesis, as both are concordant with the organization observed around hsp82. Altogether, these results reveal a genome organized in a quartet structure comprising two collinear pairs. The collinear pairs belong to two ancient lineages, A and B. Within a collinear pair, genes and their synteny are preserved. However, indel polymorphisms, at times including whole introns, exist between the two copies. Nucleotide divergence between sequenced gene copies is reflected in synonymous divergence (Ks) values ranging from zero to 20%. Overall, the average nucleotide difference of aligned regions between members of a collinear pair is 4%. The picture is vastly different when lineages are compared: between lineages, the regions are virtually unalignable outside of coding sequences. Although synteny is preserved across lineages, the lineages differ starkly, including segmental deletions that remove entire genes.
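The 4% figure is an average pairwise nucleotide difference over aligned regions. As a minimal sketch, assuming that alignment columns containing a gap are simply skipped, the uncorrected p-distance between two aligned copies can be computed as below. Ks, by contrast, is restricted to synonymous sites and requires a codon-aware method (for example Nei-Gojobori), which is not shown; the toy sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped, so indels
    do not count as substitutions in this simple, uncorrected measure.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


# Hypothetical toy alignment of two collinear copies (not real bdelloid data).
copy_1 = "ATGGCTAAGT--CTGACCGTAA"
copy_2 = "ATGGCTGAGTCACTGACCGTAA"
print(f"{p_distance(copy_1, copy_2):.3f}")  # 0.050
```

Here 1 of the 20 gap-free columns differs, giving 0.05; real estimates would be averaged over many aligned regions.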

Perhaps the strongest support for asexual evolution in bdelloids comes from comparing lineages across bdelloid species. This comparison shows that lineage A in one species is more similar to lineage A in other species than to lineage B within the same species. The shared ancestry of lineages and the common genomic architecture among distantly related bdelloids highlight one particularly important conclusion: degenerate tetraploidy arose early in bdelloid history and is a synapomorphy of the entire class.

Although collinearity between chromosomes also exists in sexually reproducing organisms, it is believed to serve a different purpose in bdelloids. Studies of bdelloid oogenesis maintain that chromosome pairing and reduction do not occur; eggs are produced mitotically. Chromosomal collinearity is most likely maintained to repair DNA damage during anhydrobiosis. Under this hypothesis, the intact member of a collinear pair serves as the template for repairing its damaged partner through homologous recombination.

Gene conversion occurs as a consequence, as evidenced by regions of near-complete identity (Ks ≈ 0) between members of a collinear pair. Recent work by Connallon and Clark (2010) examined how this phenomenon may help an asexually evolving lineage escape the detrimental effects of Muller’s ratchet:

Nonrecombining chromosomes, such as the Y, are expected to degenerate over time due to reduced efficacy of natural selection compared to chromosomes that recombine. However, gene duplication, coupled with gene conversion between duplicate pairs, can potentially counteract forces of evolutionary decay that accompany asexual reproduction.

Muller’s ratchet, the accumulation of slightly deleterious mutations over time through the stochastic loss of the least-mutated individuals, is a serious problem for asexual lineages. Through a combination of analytical models and computer simulations of gene conversion and duplication on the Y chromosome, the authors determined:

Gene conversion appears to constrain accumulation of deleterious mutations in a way that is identical to crossing over in traditional models of Muller’s Ratchet. Under both models, the rate at which the ratchet “clicks” – the least mutated class of individuals is lost – is highest when individual mutations are weakly deleterious and/or the chromosome-wide mutation rate (an increasing function of the mutation rate per locus and the number of loci) is high (Charlesworth & Charlesworth 2000; Bachtrog 2008).

An analogous scenario is likely present in bdelloids, but at a genome-wide level. Gene conversion and tetraploidy are both part of the bdelloid arsenal, perhaps working in a similar manner to prevent the hypothesized gene degradation.
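The ratchet “click” described in the quoted passage can be illustrated with a deliberately minimal sketch. It is not the model of Connallon and Clark (2010): it assumes a generic asexual Wright-Fisher population in which each individual is reduced to a count of slightly deleterious mutations, with multiplicative fitness and no recombination, gene conversion, or back mutation. The function names and parameter values are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Poisson draw via Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_ratchet(pop_size: int, mut_rate: float, sel_coef: float,
                     generations: int, seed: int = 1) -> list[int]:
    """Asexual Wright-Fisher sketch of Muller's ratchet.

    Each individual is reduced to its count of deleterious mutations k, with
    fitness (1 - s)**k. Returns, for each generation, the mutation count of the
    least-loaded class; every increase corresponds to a click of the ratchet.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size  # start from a mutation-free population
    least_loaded = []
    for _ in range(generations):
        weights = [(1.0 - sel_coef) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)  # selection plus drift
        loads = [k + poisson(rng, mut_rate) for k in parents]      # new mutations only
        least_loaded.append(min(loads))
    return least_loaded


# Hypothetical parameters: N = 200, U = 0.5 new mutations per genome per generation, s = 0.01.
history = simulate_ratchet(pop_size=200, mut_rate=0.5, sel_coef=0.01, generations=300)
print("mutations in the least-loaded class after 300 generations:", history[-1])
```

Because offspring can only gain mutations, a lost least-loaded class can never be regenerated, which is the irreversibility that crossing over, or gene conversion in the quoted model, is proposed to counteract.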

The last bdelloid distinctions to be described concern the composition of the genome. Transposable elements (TEs) are repetitive sequences capable of inserting at random positions in the genome. TE expansion makes persistence difficult for an asexually reproducing species: proliferating TEs would quickly introduce deleterious mutations across the genome with no means of purging them, ultimately leading to the extinction of the asexual lineage. Retrotransposons of the LINE and gypsy superfamilies are nearly absent from the bdelloid genome. PCR screens revealed that, unlike in nearly all sexually reproducing taxa, these elements are not maintained at high copy number in bdelloids (Gladyshev et al., 2008). The retrotransposon remnants found in the bdelloid genome appear to have been inactivated by segmental deletions, most likely a consequence of DNA repair following desiccation. Although common retrotransposons are nearly absent in bdelloids, DNA TEs were detected, and sequencing of telomeric regions showed that DNA TEs are enriched within these regions.

The telomeric regions are also hotspots for horizontal gene transfer, which is rarely documented in other metazoans. Genomic profiling of telomeric regions shows an accumulation of foreign DNA in addition to DNA TEs (Gladyshev et al., 2008). This accumulation is believed to be a genomic signature of desiccation, as foreign DNA is taken up in the midst of DNA repair. The foreign DNA integrated within telomeric regions is largely of nonmetazoan origin. A survey of a subset of the foreign genes shows that they are expressed and perhaps even functional.

While numerous pitfalls are thought to drive asexual lineages to extinction, some asexual lineages may possess attributes that protect them from it. The unique features of bdelloid biology carry with them unforeseen advantages for sustaining an asexual lineage in the face of environmental adversity. Hallmarks of bdelloid genome structure germane to the subject of this dissertation include degenerate tetraploidy and occasional gene conversion, both of which affect gene copy number. Furthermore, the accumulation of heterozygosity in an asexually evolving lineage allows gene sequences to diverge unchecked, and the regulatory elements that control expression of those genes are equally prone to accumulating heterozygosity. Because heterozygosity may differ considerably among independently evolving asexual species, questions about how fluctuating gene copy numbers (produced by gene conversion) and alleles are managed are difficult to address, as the answers may vary from one lineage to another. They would also depend on the genetic background of each of the many clonal lineages within Bdelloidea. Detailed answers to such questions will likely require population-level studies. However, the findings presented thus far make clear that the common features required for long-term asexual evolution in bdelloids arose early in their history. The qualities that bind the species of Bdelloidea together are the very ones that have enabled their success.

References

Arkhipova, I., and Meselson, M. (2005). Deleterious transposable elements and the extinction of asexuals. BioEssays : news and reviews in molecular, cellular and developmental biology 27, 76-85.

Baek, D., Villen, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008). The impact of microRNAs on protein output. Nature 455, 64-71.

Bagga, S., Bracht, J., Hunter, S., Massirer, K., Holtz, J., Eachus, R., and Pasquinelli, A.E. (2005). Regulation by let-7 and lin-4 miRNAs results in target mRNA degradation. Cell 122, 553-563.

Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233.

Behm-Ansmant, I., Rehwinkel, J., Doerks, T., Stark, A., Bork, P., and Izaurralde, E. (2006a). mRNA degradation by miRNAs and GW182 requires both CCR4:NOT deadenylase and DCP1:DCP2 decapping complexes. Genes & development 20, 1885-1898.

Behm-Ansmant, I., Rehwinkel, J., and Izaurralde, E. (2006b). MicroRNAs silence gene expression by repressing protein expression and/or by promoting mRNA decay. Cold Spring Harbor symposia on quantitative biology 71, 523-530.

Bernstein, H., Hopf, F.A., and Michod, R.E. (1987). The molecular basis of the evolution of sex. Advances in genetics 24, 323-370.

Boschetti, C., Ricci, C., Sotgia, C., and Fascio, U. (2005). The Development of a Bdelloid Egg: A Contribution after 100 years. Hydrobiologia 546, 323-331.

Brenner, J.L., Jasiewicz, K.L., Fahley, A.F., Kemp, B.J., and Abbott, A.L. (2010). Loss of individual microRNAs causes mutant phenotypes in sensitized genetic backgrounds in C. elegans. Current biology : CB 20, 1321-1325.

Carthew, R.W., and Sontheimer, E.J. (2009). Origins and Mechanisms of miRNAs and siRNAs. Cell 136, 642-655.

Connallon, T., and Clark, A.G. (2010). Gene duplication, gene conversion and the evolution of the Y chromosome. Genetics 186, 277-286.

de Visser, J.A., and Elena, S.F. (2007). The evolution of sex: empirical insights into the roles of epistasis and drift. Nature reviews Genetics 8, 139-149.

de Wit, E., Linsen, S.E., Cuppen, E., and Berezikov, E. (2009). Repertoire and evolution of miRNA genes in four divergent nematode species. Genome Res 19, 2064-2074.

Denli, A.M., Tops, B.B., Plasterk, R.H., Ketting, R.F., and Hannon, G.J. (2004). Processing of primary microRNAs by the Microprocessor complex. Nature 432, 231-235.

Ding, X.C., and Grosshans, H. (2009). Repression of C. elegans microRNA targets at the initiation level of translation requires GW182 proteins. The EMBO journal 28, 213-222.

Djuranovic, S., Nahvi, A., and Green, R. (2011). A parsimonious model for gene regulation by miRNAs. Science 331, 550-553.

Donner, J. (1965). Ordnung Bdelloidea (Akademie-Verlag, Berlin).

Elefant, N., Altuvia, Y., and Margalit, H. (2011). A wide repertoire of miRNA binding sites: prediction and functional implications. Bioinformatics 27, 3093-3101.

Eulalio, A., Huntzinger, E., Nishihara, T., Rehwinkel, J., Fauser, M., and Izaurralde, E. (2009). Deadenylation is a widespread effect of miRNA regulation. RNA 15, 21-32.

Eulalio, A., Rehwinkel, J., Stricker, M., Huntzinger, E., Yang, S.F., Doerks, T., Dorner, S., Bork, P., Boutros, M., and Izaurralde, E. (2007). Target-specific requirements for enhancers of decapping in miRNA-mediated gene silencing. Genes & development 21, 2558-2570.

Filipowicz, W., Bhattacharyya, S.N., and Sonenberg, N. (2008). Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight? Nature reviews Genetics 9, 102-114.

Friedman, R.C., Farh, K.K., Burge, C.B., and Bartel, D.P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19, 92-105.

García-Roger, E.M., Carmona, M.J., and Serra, M. (2006). Patterns in rotifer diapausing egg banks: Density and viability. Journal of Experimental Marine Biology and Ecology 336, 198-210.

Gladyshev, E., and Meselson, M. (2008). Extreme resistance of bdelloid rotifers to ionizing radiation. Proc Natl Acad Sci U S A 105, 5139-5144.

Gladyshev, E.A., Meselson, M., and Arkhipova, I.R. (2008). Massive horizontal gene transfer in bdelloid rotifers. Science 320, 1210-1213.

Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Molecular cell 27, 91-105.

Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N., Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. (2008). Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455, 1193-1197.

Grosshans, H. (2010). Regulation of microRNAs. Preface. Advances in experimental medicine and biology 700, v-vi.

Hamilton, A.J., and Baulcombe, D.C. (1999). A species of small antisense RNA in posttranscriptional gene silencing in plants. Science 286, 950-952.

Han, J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y., Zhang, B.T., and Kim, V.N. (2006). Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125, 887-901.

Hsu, W.S. (1956). Oogenesis in the Bdelloidea rotifer Philodina roseola. La Cellule 57, 283-296.

Hur, J.H., Van Doninck, K., Mandigo, M.L., and Meselson, M. (2009). Degenerate tetraploidy was established before bdelloid rotifer families diverged. Molecular biology and evolution 26, 375-383.

Khvorova, A., Reynolds, A., and Jayasena, S.D. (2003). Functional siRNAs and miRNAs exhibit strand bias. Cell 115, 209-216.

Kozomara, A., and Griffiths-Jones, S. (2011). miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic acids research 39, D152-157.

Krol, J., Loedige, I., and Filipowicz, W. (2010). The widespread regulation of microRNA biogenesis, function and decay. Nature reviews Genetics 11, 597-610.

Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294, 853-858.

Lam, F., Langley, C.H., and Song, Y.S. (2011). On the genealogy of asexual diploids. Journal of computational biology : a journal of computational molecular cell biology 18, 415-428.

Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.

Lee, R.C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864.

Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843-854.

Mark Welch, D.B., Mark Welch, J.L., and Meselson, M. (2008). Evidence for degenerate tetraploidy in bdelloid rotifers. Proc Natl Acad Sci U S A 105, 5145-5149.

Mark Welch, J.L. (1998). Karyotypes of bdelloid rotifers from three families. Hydrobiologia 387/388, 403–407.

Miska, E.A., Alvarez-Saavedra, E., Abbott, A.L., Lau, N.C., Hellman, A.B., McGonagle, S.M., Bartel, D., Ambros, V., and Horvitz, H.R. (2005). Most Caenorhabditis elegans microRNAs are individually not essential for development or viability. PLoS genetics preprint, e215.

Mourelatos, Z., Dostie, J., Paushkin, S., Sharma, A., Charroux, B., Abel, L., Rappsilber, J., Mann, M., and Dreyfuss, G. (2002). miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes & development 16, 720-728.

Nielsen, C.B., Shomron, N., Sandberg, R., Hornstein, E., Kitzman, J., and Burge, C.B. (2007). Determinants of targeting by endogenous and exogenous microRNAs and siRNAs. RNA 13, 1894-1910.

Nottrott, S., Simard, M.J., and Richter, J.D. (2006). Human let-7a miRNA blocks protein production on actively translating polyribosomes. Nature structural & molecular biology 13, 1108-1114.

Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. (2007). The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130, 89-100.

Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M.I., Maller, B., Hayward, D.C., Ball, E.E., Degnan, B., Muller, P., et al. (2000). Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86-89.

Qiu, C., Chen, G., and Cui, Q. (2012). Towards the understanding of microRNA and environmental factor interactions and their relationships to human diseases. Scientific reports 2, 318.

Ragan, C., Zuker, M., and Ragan, M.A. (2011). Quantitative prediction of miRNA-mRNA interaction based on equilibrium concentrations. PLoS computational biology 7, e1001090.

Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E., Horvitz, H.R., and Ruvkun, G. (2000). The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901-906.

Ricci, C., and Balsamo, M. (2000). The biology and ecology of lotic rotifers and gastrotrichs. Freshwater Biol 44, 15-28.

Ricci, C., and Boschetti, C. (2003). Bdelloid rotifers as model system to study developmental biology in space. Advances in space biology and medicine 9, 25-39.

Ricci, C., Caprioli, M., and Fontaneto, D. (2007). Stress and fitness in parthenogens: is dormancy a key feature for bdelloid rotifers? BMC Evol Biol 7 Suppl 2, S9.

Schwarz, D.S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P.D. (2003). Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199-208.

Segers, H. (2007). Annotated checklist of the rotifers (Phylum Rotifera), with notes on nomenclature, taxonomy and distribution. Zootaxa 1564, 1-104.

Sempere, L.F., Cole, C.N., McPeek, M.A., and Peterson, K.J. (2006). The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J Exp Zoolog B Mol Dev Evol 306, 575-588.

Slack, F.J., Basson, M., Liu, Z., Ambros, V., Horvitz, H.R., and Ruvkun, G. (2000). The lin-41 RBCC gene acts in the C. elegans heterochronic pathway between the let-7 regulatory RNA and the LIN-29 transcription factor. Molecular cell 5, 659-669.

Song, J.J., Smith, S.K., Hannon, G.J., and Joshua-Tor, L. (2004). Crystal structure of Argonaute and its implications for RISC slicer activity. Science 305, 1434-1437.

Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. (2005). Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell 123, 1133-1146.

Suomalainen, E. (1962). Significance of parthenogenesis in the evolution of insects. Annual Review of Entomology 7, 349-366.

Vella, M.C., Choi, E.Y., Lin, S.Y., Reinert, K., and Slack, F.J. (2004). The C. elegans microRNA let-7 binds to imperfect let-7 complementary sites from the lin-41 3'UTR. Genes & development 18, 132-137.

Wheeler, B.M., Heimberg, A.M., Moy, V.N., Sperling, E.A., Holstein, T.W., Heber, S., and Peterson, K.J. (2009). The deep evolution of metazoan microRNAs. Evol Dev 11, 50-68.

Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855-862.

Witek, A., Herlyn, H., Ebersberger, I., Mark Welch, D.B., and Hankeln, T. (2009). Support for the monophyletic origin of Gnathifera from phylogenomics. Molecular phylogenetics and evolution 53, 1037-1041.

Xiong, J., Du, Q., and Liang, Z. (2010). Tumor-suppressive microRNA-22 inhibits the transcription of E-box-containing c-Myc target genes by silencing c-Myc binding protein. Oncogene 29, 4980-4988.

Yang, W., Chendrimada, T.P., Wang, Q., Higuchi, M., Seeburg, P.H., Shiekhattar, R., and Nishikura, K. (2006). Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nature structural & molecular biology 13, 13-21.

Yekta, S., Shih, I.H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of HOXB8 mRNA. Science 304, 594-596.

Chapter II

General Methods

454 small RNA Library Construction

Preparation of RNA

Whole animals, representative of all life stages, were collected from Philodina acuticornis and Brachionus manjavacas cultures. Culture density varied over the collection period, but cultures were maintained for optimal growth by weekly feeding and frequent cleaning. Briefly, 50 ml volumes of culture were collected in a 40 µm cell strainer (BD Biosciences). The biomass was added to TRIzol Reagent® (Invitrogen, Carlsbad, CA), homogenized using a Dounce homogenizer, and flash-frozen. TRIzol was used at 1 ml per 100 mg of rotifer biomass, and the total volume of homogenized TRIzol sample from each species was approximately 50 ml. Following the TRIzol extraction protocol, 200-300 µg of total RNA was obtained as starting material for small RNA library construction (Heimberg et al., 2008).

Small RNA Library Construction

The following protocol was adapted from Lau et al. (2001). Unless otherwise specified, all buffers were made with RNase-free H2O. Total RNA was extracted from each species and resuspended in no more than 150 µl of RNase-free H2O at a concentration of 1.3 mg/ml. For size fractionation of small RNA from total RNA, a 1.5 mm, 15% TBE-urea denaturing gel was cast using RNase-free 10X TBE (Invitrogen); gel concentrate EC-830, gel diluent EC-840, and gel buffer EC-835 (National Diagnostics); and 8 M urea. After the gel polymerized, the wells were flushed with 1X TBE buffer prior to sample loading. Each resuspended total RNA sample was brought to a final volume of 300 µl with 2X urea loading dye.

The 300 µl sample was split into 30 µl aliquots. FITC-labeled 18 and 28 nt markers (Integrated DNA Technologies) were added to each aliquot, which was heated for 5 min at 80 °C, briefly chilled on ice, and loaded onto the gel. The gel chamber was filled with 1X TBE, and the gel was run for 1-2 hr at 2 W or until the dye front was approximately 1 cm from the bottom. The gel was removed from the apparatus, wrapped in Saran wrap, and placed over a UV light box, and a square was drawn around the region containing the visible markers. After the gel was removed from the light box, the outlined fragment was excised, cut into smaller pieces, and placed in pre-weighed Eppendorf tubes. The mass of each gel piece was determined, and the pieces were crushed in the tubes using the blunt end of a pipette tip. Three gel volumes of 0.3 M NaCl were then added to each tube, and the tubes were rotated overnight at 4 °C to elute the size-selected RNA. The following day, as much liquid as possible was removed from each tube and transferred to a new tube. The tubes were spun briefly to pellet any residual gel fragments, and the supernatant was transferred to a new tube with enough room for two volumes of 100% ethanol, which was added along with glycogen at a concentration of 10 µg/ml. The tubes were inverted to mix the added reagents and stored at -20 °C overnight to precipitate the size-selected RNA.
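The volumes in the elution and precipitation steps scale with the gel mass and the recovered supernatant. A small calculator makes the arithmetic explicit. It assumes that 1 mg of gel occupies about 1 µl, reads the glycogen figure as a final concentration, and uses a hypothetical 20 mg/ml glycogen stock; none of these is specified in the protocol, and the names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ElutionVolumes:
    nacl_ul: float      # 0.3 M NaCl added to the crushed gel pieces
    ethanol_ul: float   # 100% ethanol added to the recovered supernatant
    glycogen_ul: float  # glycogen carrier taken from a stock solution


def elution_volumes(gel_mass_mg: float, supernatant_ul: float,
                    glycogen_final_ug_per_ml: float = 10.0,
                    glycogen_stock_mg_per_ml: float = 20.0) -> ElutionVolumes:
    """Reagent volumes for gel elution and ethanol precipitation.

    Assumptions not stated in the protocol: 1 mg of gel is ~1 ul, the 10 ug/ml
    glycogen figure is a final concentration in supernatant + ethanol, and the
    glycogen stock is a hypothetical 20 mg/ml.
    """
    gel_volume_ul = gel_mass_mg * 1.0        # assumed gel density of ~1 g/ml
    nacl_ul = 3.0 * gel_volume_ul            # three gel volumes of 0.3 M NaCl
    ethanol_ul = 2.0 * supernatant_ul        # two volumes of 100% ethanol
    final_volume_ul = supernatant_ul + ethanol_ul
    stock_ug_per_ml = glycogen_stock_mg_per_ml * 1000.0
    # C1 * V1 = C2 * V2, ignoring the small volume the glycogen itself adds
    glycogen_ul = glycogen_final_ug_per_ml * final_volume_ul / stock_ug_per_ml
    return ElutionVolumes(nacl_ul, ethanol_ul, glycogen_ul)


# Hypothetical example: a 150 mg gel slice from which 400 ul of eluate is recovered.
print(elution_volumes(gel_mass_mg=150.0, supernatant_ul=400.0))
# ElutionVolumes(nacl_ul=450.0, ethanol_ul=800.0, glycogen_ul=0.6)
```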

The next day, the tubes were removed from -20 °C and spun at 13,000 x g for 30 min at 4 °C. The supernatant was removed, and the pellet was air-dried for 10 min and resuspended in 5 µl of RNase-free water. To the resuspended RNA were added 2 µl of the 3’ adaptor DNA oligonucleotide 5’ pCTGTAGGCACCATCAAx 3’ (p: phosphorylated 5’ end; x: DMT-O-C3–CPG) at a concentration of 100 µM, together with 2 µl of 5X T4 DNA ligase buffer and 1 µl of T4 DNA ligase (Invitrogen). The sample was incubated for 2 hours at room temperature. Another 15% TBE-urea denaturing gel was cast. The ligation reaction was stopped by the addition of 10 µl of 2X urea loading dye.

Then 28 and 40 nt fluorescein-labeled DNA markers were added to the reaction, which was heated to 80 °C prior to loading onto the gel. The ligation reaction was separated on a polyacrylamide gel using the running conditions described above, and gel fragment excision, elution, and precipitation of the eluted RNA also followed the procedures described above. Following resuspension of the eluted RNA in 10 µl of RNase-free water, 2 µl of the 5’ adaptor DNA/RNA oligonucleotide (5’ ATCGTaggcacctgaaa 3’; lowercase letters indicate RNA) at a concentration of 200 µM, 2 µl of 5X ligation buffer, 1 µl of 4 mM ATP, and 1 µl of T4 RNA ligase were added. The ligation reaction was incubated for 6 hr at room temperature and then stopped by the addition of 13 µl of 2X urea loading dye together with 2 µl of a 50 nt FITC molecular marker. The sample was run on a 15% TBE-urea denaturing gel, and the 3’ and 5’ adaptor-ligated RNA fraction was eluted, precipitated, and resuspended in 10 µl of RNase-free water.

For RT-PCR, 5 µl of the ligated RNA sample was combined with 1 µl of 100 µM RT/5’ primer (5’ ATTGATGGTGCCTAC 3’) and heated for 2 min at 80 °C. After a brief spin, 6 µl of 5X first strand buffer (Invitrogen), 7 µl of 10X dNTPs, and 3 µl of 100 mM DTT were added, and the reaction was heated at 48 °C for 2 min. Then 3 µl of the reaction was removed as an RT control, and 1 µl of SuperScript III (Invitrogen) was added to the remainder, which was incubated at 48 °C for 1 hr. The reaction was then incubated with 1 µl of RNase H for 30 min.

PCR amplification of the cDNA was used to generate libraries for 454 sequencing. The PCR reaction was set up with 5 µl of cDNA, 10 µl of 10X PCR buffer, 10 µl of 10X dNTPs (0.2 mM of each dNTP), 1 µl each of 100 µM barcode A (5’ GCCTCCCTCGCGCCATCAGTACG ATCGTAGGCACCTGAAA 3’) and barcode B (5’ GCCTTGCCAGCCCGCT CAGTACGATTGATGGTGCCTACAG 3’), 72 µl of H2O, and 1 µl of Taq polymerase. Cycling conditions were as follows: 96 °C for 1 min; then 96 °C for 10 sec, 50 °C for 1 min, and 72 °C for 15 sec, repeated 33 times.
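The cycling program can be written as structured data, which makes explicit which steps are repeated. The sketch below reads "repeated 33 times" as 33 cycles in total (an assumption) and sums only the hold times, ignoring ramp rates; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    temp_c: float
    seconds: int


def total_hold_seconds(initial: Hold, cycle: list[Hold], n_cycles: int) -> int:
    """Total time spent at hold temperatures; ramps between steps are ignored."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle)


INITIAL_DENATURATION = Hold(96, 60)
CYCLE = [Hold(96, 10), Hold(50, 60), Hold(72, 15)]  # denature, anneal, extend
N_CYCLES = 33  # assumption: "repeated 33 times" is read as 33 cycles in total

print(total_hold_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES) / 60)  # 47.75 (minutes)
```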

PCR products were resolved on a 3% agarose gel, and the 100 bp bands were excised and eluted. The amplified products contained a 5’ linker region introduced during PCR amplification with the barcoded primers. This linker region binds to its complementary sequence, which is tethered to beads during 454 sequencing.
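The RT primer and barcoded PCR primers only work if they match the adaptors ligated to the small RNAs. As a sketch, the check below confirms from the sequences quoted above that the RT primer and barcode B carry the reverse complement of the 3’ adaptor and that barcode A ends in the 5’ adaptor sequence. The spaces in the printed barcode sequences are dropped on the assumption that they are layout artifacts, and the constant names are hypothetical.

```python
# Translation table for base complements; U (RNA) pairs like T.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement as DNA; lowercase (RNA) letters and U are accepted."""
    return seq.upper().translate(COMPLEMENT)[::-1]


ADAPTOR_3 = "CTGTAGGCACCATCAA"   # 3' adaptor without the 5' phosphate and 3' block
ADAPTOR_5 = "ATCGTaggcacctgaaa"  # 5' DNA/RNA adaptor (lowercase = RNA)
RT_PRIMER = "ATTGATGGTGCCTAC"
BARCODE_A = "GCCTCCCTCGCGCCATCAGTACGATCGTAGGCACCTGAAA"
BARCODE_B = "GCCTTGCCAGCCCGCTCAGTACGATTGATGGTGCCTACAG"

target = reverse_complement(ADAPTOR_3)       # sequence that anneals to the 3' adaptor
print(target)                                # TTGATGGTGCCTACAG
print(RT_PRIMER[1:] == target[:14])          # True: primer 3' end matches the adaptor
print(BARCODE_A.endswith(ADAPTOR_5.upper())) # True: barcode A ends in the 5' adaptor
print(BARCODE_B.endswith(target))            # True: barcode B ends in the RT target site
```

Reading the segments upstream of these matching regions as the sequencing linker is an interpretation consistent with the paragraph above, not something the check itself establishes.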

Bioinformatic Processing of 454 Sequenced Libraries

The small RNA libraries were submitted to 454 Life Sciences (Branford, CT, USA) and the Yale Center for Genomics and Proteomics Sequencing Facility for sequencing. A total of 68,671 reads were generated from the P. acuticornis library and 50,224 reads from the B. manjavacas library. Pyrosequencing reads shorter than 17 nt or longer than 25 nt were excluded; the remaining reads were collapsed to unique sequences, with the number of reads representing each sequence recorded in its sequence ID. Conserved miRNAs were identified using the software described in the next chapter. Sequences not identified as conserved miRNAs were compared to the NCBI nt database using megablast; sequences that matched over their entire length with greater than 95% identity were considered contamination and removed. The remaining sequences represented by at least two reads were retained as potential non-conserved miRNAs.
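The read-processing steps above can be condensed into a short sketch. It applies the 17-25 nt length filter, collapses identical reads while recording the read count in a hypothetical `seq<rank>_x<count>` ID, and keeps sequences with at least two reads. The conserved-miRNA software and the megablast contamination screen are external tools that are not reproduced here; sequences they would flag are simply passed in through the `exclude` argument.

```python
from collections import Counter
from typing import Iterable


def collapse_reads(reads: Iterable[str], min_len: int = 17, max_len: int = 25,
                   min_count: int = 2,
                   exclude: frozenset[str] = frozenset()) -> list[tuple[str, str]]:
    """Length-filter reads, collapse identical sequences, and keep supported ones.

    Returns (ID, sequence) pairs ordered by read count, where the ID embeds the
    count. Sequences in `exclude` (a stand-in for megablast contamination hits)
    and sequences seen fewer than `min_count` times are dropped.
    """
    kept = [r.strip().upper() for r in reads if min_len <= len(r.strip()) <= max_len]
    counts = Counter(kept)
    records = []
    for rank, (seq, n_reads) in enumerate(counts.most_common(), start=1):
        if n_reads >= min_count and seq not in exclude:
            records.append((f"seq{rank}_x{n_reads}", seq))
    return records


# Toy input, not real library data.
reads = ["TGAGGTAGTAGGTTGTATAGTT"] * 3 + [
    "ACGTACGTACGTACGTACGT",          # 20 nt but only one read: dropped
    "ACGT",                          # shorter than 17 nt: dropped
    "AAAAACCCCCGGGGGTTTTTAAAAACC",   # 27 nt, longer than 25 nt: dropped
] + ["GGCAAGATGTTGGCATAGCTGA"] * 2   # flagged as contamination below
print(collapse_reads(reads, exclude=frozenset({"GGCAAGATGTTGGCATAGCTGA"})))
# [('seq1_x3', 'TGAGGTAGTAGGTTGTATAGTT')]
```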

Illumina small RNA Library Construction

RNA Extraction

RNA extraction was carried out as described above, using the bdelloid Adineta vaga.

Small RNA Library Construction

Library construction required 100 µg of total RNA and followed steps similar to the protocol described above, using an adaptation of the Bartel lab small RNA library protocol for the Illumina sequencing platform; only the modifications are described in detail.

Size fractionation of small RNA species on a 15% denaturing gel used a 10 bp DNA Ladder (Invitrogen), and a larger region corresponding to 17-30 nt was excised from lanes loaded with total RNA. The ladder was used to approximate the area of the gel to excise. Gel elution and RNA precipitation were carried out as described above, and the RNA was resuspended in 8 µl of H2O. The 3’ adaptor ligation reaction used 0.5 µl of the adaptor 5’ pTCGTATGCCGTCTTCTGCTTGidT 3’ (inverted dT at the 3’ terminus) at a concentration of 100 uM. To the reaction, 2 µl of T4 RNA Ligase buffer and 1 µl of T4 RNA ligase (Promega or GE Amersham, FPLC pure) were added.

Reactions were incubated at 18 °C for 2-4 hours or longer. Following this step, 3’-ligated RNAs were size-fractionated on a 15% TBE-urea denaturing gel, using the 10 bp DNA ladder to approximate the region corresponding to ~38-51 nt. This region was excised, and the ligated RNA was eluted from it and precipitated. Purified 3’ adaptor-ligated RNAs were resuspended in 10 µl of H2O, to which 4 µl of 100 uM 5’ RNA adaptor (5’ GUUCAGAGUUCUACAGUCCGACGA UCCCAA 3’, barcode underlined) was added, along with 5 µl of T4 RNA Ligase buffer, 1 µl of 5 mM ATP and 1 µl of T4 RNA ligase. The reaction was incubated overnight at 22 °C. The 5’ adaptor ligation products were size-fractionated on a 10% urea denaturing gel, and the region around 64-77 nt was excised, gel eluted and precipitated. Following precipitation, the 5’- and 3’-ligated RNAs were resuspended in 10 µl of H2O, to which 1 µl of 100 uM RT/5’ primer (5’ CAAGCAGAAGACGGCATA 3’) was added. The sample was heated for 5 min at 65 °C and spun briefly. To the sample, 6 µl of 5X first-strand buffer (Invitrogen), 6 µl of 10X dNTPs (2 mM), 2 µl of 100 mM DTT, and 1 µl of SuperScript III RT (200 U/µl) were added. The sample was incubated at 50 °C for 1 hr, then at 75 °C for 15 min, and finally cooled to 4 °C. After RT, the following were added to 5 µl of the RT reaction: 5 µl of 5X PCR buffer, 6 µl of 2 mM dNTPs, 0.5 µl of 25 uM 3’ PCR primer (5’ AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA 3’), 0.5 µl of 0.25 uM 5’ PCR primer, 31 µl of H2O, and 1 µl of Phusion polymerase (Fineberg et al., 2009). The following PCR program was used to amplify products for Illumina sequencing: 98 °C for 3 min; 15 cycles of 94 °C for 30 sec, 60 °C for 30 sec and 72 °C for 15 sec; 72 °C for 10 min; hold at 4 °C.

Bioinformatic Processing of Reads

The A. vaga small RNA library was combined with another small RNA library carrying a different barcode, and the multiplexed libraries were run on the same lane of an Illumina GAIIx. The sequencing run generated a total of 27,502,262 reads for the lane. These reads were demultiplexed using CASAVA 1.7 to separate libraries according to the barcodes incorporated during small RNA library construction (Hosseini et al., 2010).

Following demultiplexing, the reads were passed through the Illumina chastity filter, a purity filter that removes reads with high base-calling error caused by the proximity of neighboring clusters during the cluster generation step of Illumina sequencing. The read IDs in the .qseq files were annotated with this information. The .qseq files were converted to .fastq using a Perl script, qseq2fastq.pl, available through the Brown University bioinformatics group; the script was run with the option to filter out sequences that did not pass the chastity filter. The resulting .fastq file was inspected using FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/), which provides a graphical interface for visualizing the read properties in a .fastq file. Reads with an average quality score > 30 were retained in the output fasta file; the A. vaga fasta file contained 9,861,255 reads that met this criterion. Adaptor sequences were then trimmed from the reads using cutadapt (ver 0.8). Reads in the 17-25 nt size range were selected using a Python script, which also removed any reads with ambiguous base calls (designated by "N" in the sequence). The resulting file contained a total of 3,474,005 reads.
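A minimal Python sketch of this read-processing chain follows, assuming the standard 11-column .qseq layout (sequence, quality string and pass-filter flag in the last three columns) and Phred+64 quality encoding. The adapter trimming is a simplified exact-match stand-in for cutadapt, and all function names are hypothetical; this is not qseq2fastq.pl or the original size-selection script.

```python
def parse_qseq_line(line):
    """Split one .qseq record, assuming 11 tab-separated fields with the
    sequence, quality string and pass-filter flag in the last three."""
    fields = line.rstrip("\n").split("\t")
    seq, qual, flag = fields[8], fields[9], fields[10]
    # .qseq marks no-calls with '.', which FASTQ writes as 'N'.
    return seq.replace(".", "N"), qual, flag == "1"

def mean_phred(qual, offset=64):
    """Mean Phred score, assuming Phred+64 quality encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def trim_3prime_adapter(seq, adapter, min_overlap=6):
    """Cut at the first full adapter occurrence, else at a partial adapter
    (a prefix of at least min_overlap nt) at the read's 3' end. Exact
    matching only; cutadapt also tolerates mismatches."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    for k in range(min(len(adapter), len(seq)) - 1, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq

def process_qseq(lines, adapter="TCGTATGCCGTCTTCTGCTTG",
                 min_mean_q=30, min_len=17, max_len=25):
    """Chastity filter, mean quality > min_mean_q, adapter trimming,
    17-25 nt size selection, then removal of reads containing 'N'."""
    kept = []
    for line in lines:
        seq, qual, passed_chastity = parse_qseq_line(line)
        if not passed_chastity or mean_phred(qual) <= min_mean_q:
            continue
        insert = trim_3prime_adapter(seq, adapter)
        if min_len <= len(insert) <= max_len and "N" not in insert:
            kept.append(insert)
    return kept
```

The steps run in the order given in the text: quality is judged on the full read before trimming, and size and ambiguity filters are applied to the trimmed insert.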

Conserved miRNAs were identified using the software described in the next chapter.

Northern

The following protocol is also adapted from Bartel lab protocols (http://bartellab.wi.mit.edu/protocols). Again, all buffers were made with RNase-free H2O. Synthesized RNA oligos of 18, 21, and 78 nt were labeled with γ-33P ATP in a standard 5’ kinasing reaction using 10 pmoles of RNA and 50 uCi of 33P. The oligos were labeled under the following reaction conditions: 1 µl of 10 pmol of each oligo was added to 2 µl of γ-33P ATP (10 mCi/ml; Perkin Elmer), 2 µl of 10X PNK buffer (Ambion), 2 µl of T4 PNK (Ambion), and 13 µl of water (from the KinaseMax kit). The reaction was incubated at 37 °C for 1 hr. Oligos were subsequently gel purified and resuspended in 100 µl of H2O. Total RNA was extracted as described above from Trizol flash-frozen samples of A. vaga. A total of 10 µg of RNA was resuspended in 15 µl of H2O and 12 µl of 8M urea loading dye. To make the loading dye, 8M urea loading buffer (24 g urea, 2 ml 0.5 M EDTA pH 8.0, 0.1 ml 1 M Tris pH 7.5, 50 ml H2O) was first made and filter-sterilized.

Xylene cyanol and bromophenol blue stocks were made by adding 140 mg of each dye to separate 1 ml aliquots of 8M urea loading buffer. Loading dye was made by adding 12 µl of xylene cyanol stock and 4 µl of bromophenol blue stock to 1 ml of urea loading buffer. Following addition of loading dye to the RNA, the sample was heated at 80 °C for 5-10 min. For markers, 1 µl of the synthesized oligos was added to 9 µl of H2O and 12 µl of loading dye, then heated at 80 °C for 5-10 min. A 15% TBE-urea denaturing gel was cast on glass plates (19.5 cm x 16 cm) with a 20-well comb (4 mm x 15 mm x 0.8 mm). The gel was run in 0.5X TBE buffer at 200 V for 1 hr, and the voltage was then raised to 500 V until the bromophenol blue dye front ran off the gel, which took approximately 1.5 hr. The gel was wrapped in Saran wrap and stained with 4 µg/ml EtBr in 0.5X TBE for 5 min to check for intact ribosomal bands.

For transfer, Hybond NX membrane (Amersham) was cut to the dimensions of the lanes used, and 3 mm Whatman paper was cut to the dimensions of the entire gel. Membrane and Whatman paper were presoaked in 0.5X TBE. The soaked membrane was then placed on top of the gel (still in Saran wrap), and the unused area of the gel was excised. Three sheets of soaked Whatman paper were then placed on top of the membrane. The gel, membrane and Whatman paper were flipped over and placed on the bottom of the semi-dry apparatus. The Saran wrap was then removed, and 3 additional sheets of soaked Whatman paper were placed on top of the gel. To remove air bubbles, a pipette was rolled along the gel/membrane/filter paper sandwich, and a small amount of 0.5X TBE was added to keep the sandwich moist. A lid was placed on top of the semi-dry apparatus, which was connected to a Hoefer EPS 2A200 power supply (Amersham Pharmacia). Transfer was carried out at a constant current of 3.3 mA/cm2 for 35 minutes, with the voltage remaining around 20 V. Following transfer, the sandwich was removed and flipped over so that the membrane side was up. With a syringe, a hole was made in the membrane on the side corresponding to the top of the gel. The gel was restained to check for sufficient transfer, and the corner of the blot was marked with pencil to indicate the side containing RNA.

While the blot was still moist, the RNA was chemically crosslinked by adding 2 µl of EDC (1-ethyl-3-(3-dimethylaminopropyl) carbodiimide) crosslinking solution, made at 0.16 M EDC. To make the crosslinking solution, 245 µl of 12.5 M 1-methylimidazole was first added to 9 ml of DEPC-treated water and the pH adjusted to 8.0 with 1 M HCl. Immediately prior to use, 0.753 g of EDC was added and the volume brought up to 24 ml with DEPC-treated water, giving a working solution of 0.16 M EDC in 0.13 M 1-methylimidazole at pH 8. The blot was covered with EDC crosslinking solution and incubated at 60 °C for 1 hr (Fineberg et al., 2009; Pall and Hamilton, 2008). Following crosslinking, the blot was transferred to a bottle containing 25 ml of Prehyb/Hyb solution with 1 mg of sheared salmon sperm DNA (Sigma) that had been heat denatured for 5 min.
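The stated concentrations of the crosslinking solution can be checked with a little arithmetic. The sketch below assumes the EDC reagent is the hydrochloride salt (MW 191.70 g/mol); under that assumption, 0.753 g in 24 ml gives about 0.164 M EDC, and 245 µl of neat (12.5 M) 1-methylimidazole in 24 ml gives about 0.128 M, consistent with the reported 0.16 M and 0.13 M.

```python
EDC_HCL_MW = 191.70  # g/mol; assumes EDC hydrochloride

def molarity(grams, mw_g_per_mol, volume_ml):
    """Molar concentration of `grams` of solute brought to `volume_ml`."""
    return grams / mw_g_per_mol / (volume_ml / 1000)

final_volume_ml = 24
edc_m = molarity(0.753, EDC_HCL_MW, final_volume_ml)
# 245 ul of 12.5 M 1-methylimidazole: mmol per ml equals mol per L.
methylimidazole_m = 12.5 * 0.245 / final_volume_ml

print(f"EDC: {edc_m:.3f} M; 1-methylimidazole: {methylimidazole_m:.3f} M")
```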

Prehyb/Hyb solution was made up of 12.5 ml of 20X SSC, 1 ml of 1 M Na2HPO4 pH 7.2, 35 ml of 10% SDS, and 1.5 ml of 100X Denhardt’s Solution. The hybridization oven was set to 50 °C for prehybridization, and the blot was left in prehybridization solution overnight. During prehybridization, the hybridization probe was prepared: an oligo corresponding to the let-7 probe used by Pasquinelli et al. (2003) (5’ AACTATACAACCTACTACCTCACCGGATCC 3’) was labeled in a kinase reaction using 2 µl of 10 µM oligo (~20mer, 20 pmoles), 2 µl of 10X T4 PNK buffer, 2.5 µl of γ-32P ATP (>7000 Ci/mmole; Perkin-Elmer), 12 µl of dH2O and 1 µl of T4 polynucleotide kinase (Fineberg et al., 2009). The reaction was incubated at 37 °C for 1 hr and then at 68 °C for 10 min, after which 30 µl of H2O was added. To purify the labeled probe, the entire volume was applied to a G-25 MicroSpin column (Amersham Pharmacia). Next, 1 mg of denatured salmon sperm DNA was added to a bottle of 25 ml of pre-warmed Prehyb/Hyb solution. Then 25 µl of purified radiolabeled probe was added to the bottle and incubated in the hybridization oven overnight at 50 °C. Non-stringent and stringent wash solutions were prepared for the following day and incubated overnight at 50 °C. The non-stringent wash solution was made using 30 ml of 20X SSC, 5 ml of 1 M NaH2PO4 pH 7.5, 100 ml of 10% SDS, 20 ml of 100X Denhardt’s Solution, and 45 ml of dH2O. The stringent wash solution was made using 10 ml of 20X SSC, 20 ml of 10% SDS, and 170 ml of dH2O.

The next day, the hybridization solution was poured into a radioactive liquid waste container, and 40 ml of non-stringent wash solution was added to the blot in the bottle and incubated at 50 °C for 30 min. This solution was discarded, and a fresh 40 ml of non-stringent wash solution was added and incubated again for 30 min at 50 °C. Then 80 ml of stringent wash solution was added and incubated at 50 °C for 5 min. After removal of the stringent wash, the blot was taken out of the bottle, wrapped in Saran Wrap, and taped to the inside of a cassette with a cleared phosphorimager screen for overnight exposure. The next day, the screen was imaged on a PhosphorImager.

qPCR

Total RNA was extracted from the biomass of three rotifer species: A. vaga, P. acuticornis and B. manjavacas. Total RNA was also extracted from Nematostella vectensis, with animals collected at both the planula (larval) and adult life stages. Using the miScript miRNA detection kit (Qiagen), 1 µg of total RNA from each sample was added to separate tubes containing 4 µl of 5X miScript RT buffer and 1 µl of miScript Reverse Transcriptase Mix. Samples were brought up to 20 µl with RNase-free H2O, kept on ice, and then incubated for 1 hr at 37 °C followed by 5 min at 95 °C for heat inactivation. The samples were placed on ice, and qPCR reactions were set up using primer assays for miR-87, miR-100, miR-125 and let-7 (sequences listed in Chapter 4, Table 4-1). Each qPCR reaction contained 10 µl of 2X QuantiTect SYBR Green PCR master mix, 2 µl of 10X miScript Universal Primer, 2 µl of 10X miScript Primer Assay, 1 µl of Uracil N-glycosylase (Life Technologies), 2 µl of cDNA from the previous step and 3 µl of H2O. Reactions using rotifer and N. vectensis cDNA with the miR-100, miR-125 and let-7 primer assays were set up in triplicate, whereas single reactions were set up for rotifer cDNA with the miR-87 primer assay. For all primer assays, a no-template control was also set up in triplicate, except for the miR-100 reactions for N. vectensis and the miR-87 reactions. qPCR reactions were run on an ABI StepOne thermal cycler using the following conditions: 50 °C for 2 min; 95 °C for 15 min; then 40 cycles of 94 °C for 15 sec, 55 °C for 30 sec and 70 °C for 30 sec, with fluorescence data acquired at the 70 °C step.

Secondary Structure Prediction

Reads matching conserved miRNAs returned by SMD (Software for miRNA Detection), described in the next chapter, were passed to an additional Python script, Genomicreturnofprecursor. Given the precursor and mature sequences of the miRNA that matches the read, the script finds the genomic context of the miRNA precursor region, and the secondary structure of the returned genomic segment is then predicted by mFold (Markham and Zuker, 2005). Small RNA reads matching conserved miRNAs were accepted if the corresponding precursor region yielded a secondary structure with a free energy of folding at or below -20 kcal/mol.
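Genomicreturnofprecursor itself is not reproduced here, but the idea of placing a read in its genomic context can be sketched as follows, under stated assumptions: the read is located by exact match on either strand, the window is positioned so that the read sits where the mature miRNA sits in its reference precursor, and the window length equals that precursor's length. The folding step (mFold) is not reimplemented; its reported mfe is simply compared with -20 kcal/mol, accepting values at or below the threshold as in the Chapter III criterion. Function names are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def precursor_windows(read, mature, precursor, genome):
    """Return (contig, strand, segment) for the first exact hit of `read` on
    each strand of each contig. The segment is placed so the read occupies
    the mature miRNA's position in the reference precursor and spans the
    precursor's length (clipped at contig ends)."""
    mature = mature.upper().replace("U", "T")
    precursor = precursor.upper().replace("U", "T")
    offset = precursor.find(mature)  # length of the 5' flank
    if offset == -1:
        raise ValueError("mature sequence not found in precursor")
    windows = []
    for name, contig in genome.items():
        for strand, s in (("+", contig), ("-", revcomp(contig))):
            hit = s.find(read)
            if hit == -1:
                continue
            start = max(0, hit - offset)
            end = min(len(s), start + len(precursor))
            windows.append((name, strand, s[start:end]))
    return windows

def passes_hairpin_filter(mfe_kcal_per_mol, threshold=-20.0):
    """Accept a hairpin whose minimum free energy (from a folding program
    such as mFold, not computed here) is at or below the threshold."""
    return mfe_kcal_per_mol <= threshold
```

With 200 bp genomic reads, many windows will be clipped at read ends; a fuller version would report such truncation rather than fold a partial hairpin.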

miRNA Target Prediction

RNAhybrid was used to predict miRNA::target interactions (Kruger and Rehmsmeier, 2006), with the maximum free energy threshold for accepted interactions set to -14 kcal/mol. Only widely conserved miRNAs, such as let-7 and miR-125, were tested for binding to putative targets.


References

Fineberg, S.K., Kosik, K.S., and Davidson, B.L. (2009). MicroRNAs potentiate neural development. Neuron 64, 303-309.

Heimberg, A.M., Sempere, L.F., Moy, V.N., Donoghue, P.C., and Peterson, K.J. (2008). MicroRNAs and the advent of vertebrate morphological complexity. Proc Natl Acad Sci U S A 105, 2946-2950.

Hosseini, P., Tremblay, A., Matthews, B.F., and Alkharouf, N.W. (2010). An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets. BMC research notes 3, 183.

Kruger, J., and Rehmsmeier, M. (2006). RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic acids research 34, W451-454.

Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.

Markham, N.R., and Zuker, M. (2005). DINAMelt web server for nucleic acid melting prediction. Nucleic acids research 33, W577-581.

Pall, G.S., and Hamilton, A.J. (2008). Improved northern blot method for enhanced detection of small RNA. Nature protocols 3, 1077-1084.

Pasquinelli, A.E., McCoy, A., Jimenez, E., Salo, E., Ruvkun, G., Martindale, M.Q., and Baguna, J. (2003). Expression of the 22 nucleotide let-7 heterochronic RNA throughout the Metazoa: a role in life history evolution? Evol Dev 5, 372-378.

Chapter III

miRNA informatics

Abstract

The presence of miRNAs in diverse metazoan taxa suggests that miRNA regulation was a common feature early in animal evolution. However, only a small fraction of widely conserved miRNA orthologs show near-complete sequence identity. Without genome data, some divergent miRNA homologs in protostome species are difficult to identify with available tools, which rely on assumptions of sequence identity that fail to account for the pattern of variation likely to occur as a metazoan miRNA sequence evolves. Since the target sites of a miRNA determine its evolutionary fate following innovation, a search for miRNA homologs must reflect the way in which a miRNA binds its targets. Here we present a method to identify miRNA sequences built on the assumption that constraints on miRNA sequence conservation are imposed by miRNA::target site interaction. The method was implemented within a program, SMD (Software for miRNA Detection), and evaluated using known datasets to test its sensitivity and specificity. The method was then used to identify the conserved miRNA complement of a previously unsampled rotifer species, a member of an early-branching metazoan phylum evolutionarily distant from established model systems and their annotated miRNA repertoires. Because the algorithm specifies both the regions and the degree of accepted sequence divergence to reflect miRNA::target interaction, it returned both conserved and divergent miRNA homologs. Several rotifer miRNA homologs were undetected or miscalled by other miRNA detection tools, demonstrating the utility of the program.

A class of small noncoding RNAs (~22 nt), called microRNAs (miRNAs), has changed our understanding of the post-transcriptional regulatory landscape. In animals, these small RNAs bind with imperfect complementarity to their mRNA targets to mediate translational repression. Sequence similarity among miRNA homologs has aided their identification across diverse animal phyla. These findings suggest that, once innovated, miRNAs are infrequently lost because their functions in numerous cellular processes are indispensable (Wheeler et al., 2009). Determining the occurrence and distribution of miRNA orthologs across diverse phyla strengthens our understanding of animal phylogeny and of the role of miRNAs in the evolution of the animal body plan.

Sequence similarity among miRNA homologs tends to follow a logic characteristic of their function. miRNA homologs show complete sequence conservation towards the 5’ end in a region called the seed sequence and then partial identity downstream. Since the seed sequence generally binds with specific complementarity to the target, it is the primary determinant of miRNA function as well as miRNA homology. The classification of the sequence as a miRNA homolog or member of a miRNA family is decided by the degree of similarity in the remainder of the sequence, following the seed. Functionally, the level of 3’ sequence complementarity to the target is responsible for varying degrees of target specificity, which is largely dependent on the sequence composition of the target site.

The lessons of miRNA::target interaction from the well-studied models of let-7 and its paralogs are useful for examining key aspects of miRNA evolution. They provide fundamental examples of the ways in which miRNA targets may, and may not, constrain miRNA sequence as it evolves. The most salient example of miRNA sequence conservation is let-7, which exhibits sequence identity from worms to humans (Pasquinelli et al., 2003; Pasquinelli et al., 2000). In contrast, members of the let-7 family that arose within the nematode lineage show varying levels of sequence divergence (Figure 3-1A and 3-1B). An alignment of let-7 with its family members miR-84, miR-48 and miR-241 shows substitutions and indels (Figure 3-1A), while their orthologs show sequence divergence even within the nematode lineage (Figure 3-1B).

The answer to why the sequences of let-7 family members were able to diverge while let-7 orthologs were conserved may lie in the targeting repertoires of these miRNAs. Sequence similarity among let-7 family members is poor beyond the seed sequence (Figure 3-1A); however, experimental evidence shows that they are capable of binding the same target sites on hbl-1 and let-60/KRAS (Abbott et al., 2005; Johnson et al., 2005). In contrast, both let-7 and its target site in the 3’ UTR of lin-41 show a sequence conservation that reflects the sole function of the molecule: to bind its target (Slack et al., 2000).

The let-7 complementary sites (LCSs) in the 3’ UTR of lin-41 are specifically bound by let-7 to the exclusion of other let-7 family members (Cevec et al., 2008, 2010). Structural data show that despite the bulge within the seed sequence of LCS1 on the lin-41 binding site, the base-pairing interactions of both LCS1 and LCS2 with let-7 are quite stable. This suggests that the lin-41 binding site imposes a degree of specificity on the miRNA sequence that binds it. Binding by the other let-7 paralogs is likely not as energetically favorable, which could have led to the preference of these LCSs on lin-41 for let-7. The specificity of this target site for let-7 may constrain the let-7 sequence, preventing it from diverging like other members of the let-7 family. This hypothesis is further supported by the absence of lin-41 in conjunction with divergent let-7 homologs identified in Platyhelminthes (Copeland et al., 2009; Friedlander et al., 2009; Simoes et al., 2011) (Figure 3-2). The let-7 designation of these miRNA sequences may be contentious, but their existence raises questions about miRNA homology, specifically orthology (Hertel, 2012). Since there does not appear to be a lin-41 homolog in Schistosoma, the constraints on let-7 sequence conservation may be different there.

The evolution of protein-coding genes can be understood by examining their function. Similarly, the evidence presented thus far suggests that the evolution of miRNA sequences can also be understood through their function. From these data, it could be inferred that the binding sites of miRNAs do not always impose similar levels of sequence constraint, which allows a miRNA sequence to evolve without losing its targeting repertoire. Over short phylogenetic distances, such as among orthologs of let-7 family members in the nematode lineage, slight changes to the miRNA sequence do not greatly affect the targeting repertoire, and miRNA::target interactions are generally conserved (Calvi et al., 2007; Chan et al., 2005; Stark et al., 2005). However, miRNA-target conservation is rare over larger phylogenetic distances. An examination of miRNA target sites across worms, flies and vertebrates reveals that only a small fraction of sites are preserved (Chen and Rajewsky, 2006; Lall et al., 2006). This finding is supported largely by the differences in the target repertoires of miRNAs that arose early in animal evolution. As would be expected, miRNA orthologs separated by larger phylogenetic distances exhibit greater sequence divergence. The divergent let-7 sequences that coincide with the absence of its conserved target lin-41 in Platyhelminthes suggest that sequence divergence among distant miRNA orthologs is not just stochastic but is influenced by their targets. Rotifers are a sister group to Platyhelminthes and may also lack the miRNA targets that constrained the evolution of miRNA sequences in previously investigated taxa. Because whole-genome data were not available for the sampled rotifer species, a novel method was needed to recognize both divergent and well-conserved homologs. If a method that uses miRNA::target interaction to define conservation among miRNA homologs is correct in its assumption, it should be more effective than existing methods based on a different algorithm at identifying conserved miRNA homologs of taxa underrepresented in miRBase (version 18).

Here we describe a method that implements such an algorithm within a Python script, SMD (Software for miRNA Detection), to find miRNA homologs in small RNA data without the aid of an assembled genome. The method could be implemented in any programming language to search for miRNA homologs with varying levels of sequence divergence. While this tolerance for divergence may compromise the specificity of the program, the algorithm compensates by applying a stringent selection in its next step. Both filtering steps mimic the nature of miRNA::target interactions to identify conserved miRNA homologs. The overall program design enables the sequence similarity-based search to be conducted with precision, within user-defined boundaries of sequence divergence. For a queried read, the program first searches for a homolog match comparable to the sequence identity among let-7 orthologs. If none of the miRNA homologs in its supplied database meets this strict sequence identity requirement, it relaxes its stringency parameters to find a miRNA homolog match. It does so by incorporating an indel and substitution tolerance downstream of the seed sequence, tailored to conservatively reflect the sequence divergence among the let-7 paralogs in the nematode lineage. As observed in the let-7 paralog examples as well as numerous other orthologs, the contiguous sequence identity enforced by local alignment algorithms is not a necessary feature of miRNA sequence conservation or miRNA::target interaction.

Available datasets were first used to test the sensitivity and specificity of the method in comparison to algorithms used in existing methods. The method was then tested on a small RNA survey of a protostome species belonging to the phylum Rotifera, Brachionus manjavacas. This species belongs to a class of facultatively sexual rotifers known as monogononts. The two other rotifer species, Adineta vaga and Philodina acuticornis, belong to a class of obligately asexual rotifers called bdelloids.

Methods

Rotifer small RNA library construction and sequencing

Whole animals representative of all developmental stages were collected from Philodina acuticornis, Brachionus manjavacas and Adineta vaga cultures. Total RNA was extracted using Trizol (Invitrogen Life Technologies, Carlsbad, CA). For B. manjavacas and P. acuticornis, 200-300 µg of total RNA was used to construct small RNA libraries following the protocol described in Wheeler et al., and the libraries were sequenced on a 454 GS FLX, yielding ~60,000 reads from the small RNA library of each species. For A. vaga, ~100 µg of total RNA was used following the library construction protocol of Lau et al. (2001), and the library was sequenced on an Illumina GA IIx. Primers and adaptors for library construction are listed in Chapter 2. Both Illumina and 454 reads were filtered to exclude those shorter than 17 nt or longer than 25 nt; the remaining data were collapsed to unique sequences, and the number of reads representing each sequence was recorded. The Illumina reads totaled around 1.8 million and were filtered to retain those with an average quality score above 30.

Identification of conserved miRNAs in small RNA library

The similarity of a queried sequence to conserved miRNAs in miRBase was evaluated in a step-wise fashion within the program (Figure 3-3). First, the program determined whether a miRNA seed occurred anywhere in the first 7 bases of the sequence; a seed-shifting mode allowed a match anywhere in the first 8 bases to accommodate either incomplete 5’ trimming of adaptor sequences or slight shifts in the seed sequence. If a match to a seed region was found, the query was compared to the full-length sequence of each miRNA with a matching seed using agrep, an approximate string-matching tool (Wu, 1994).
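The seed screen can be sketched in Python as below. The sketch assumes the seed is miRNA nucleotides 2-7 (a 6-mer) and looks for it in the first 7 bases of the read, or the first 8 in seed-shifting mode; SMD's exact seed definition is not given here, and the function names are hypothetical. The returned positions mark where the downstream full-length comparison would begin on each sequence.

```python
def seed_of(mirna, start=1, length=6):
    """Seed assumed to be nucleotides 2-7 of the mature miRNA."""
    return mirna[start:start + length]

def find_seed_matches(read, mirbase, shift=False):
    """Return (name, read_pos, mirna_pos) for every reference miRNA whose
    seed occurs in the first 7 read bases (first 8 with seed shifting).
    read_pos and mirna_pos mark where the seed starts in each sequence."""
    head = read[:8 if shift else 7]
    hits = []
    for name, mirna in mirbase.items():
        pos = head.find(seed_of(mirna))
        if pos != -1:
            hits.append((name, pos, 1))
    return hits
```

With a single untrimmed 5’ base, a let-7 read is missed in the default 7-base window but recovered in seed-shifting mode.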

Agrep was set to tolerate at most 4 nt substitutions for an accepted match. The -I1 -D1 flags were also used to set the cost of insertions and deletions, so that each gap, like each substitution, was counted as 1 error toward the accepted total. The compared strings started at the seed portion of the read and of the miRNA, disregarding any bases preceding the matching seed in either sequence. If this condition was met, the matching conserved miRNA was added to a second list of potential miRNA homolog matches. The agrep step thus serves as a net that specifies the exact degree of divergence tolerated for homolog matches. Because the seed-matching criterion had already been applied, substitutions should occur only downstream of the seed.

The accumulated list of miRNA homolog matches was then ranked by similarity to the queried read using a Needleman-Wunsch (NW) alignment of each conserved miRNA match generated for the read. This alignment considered only the overlapping region between the miRNA homolog and the queried read; the remaining unaligned regions were recorded, as were the gaps required to form the alignment, and an NW score was returned. At this step the program preferred the miRNA homolog match with the greatest identity to the read, the fewest substitutions and no gaps. If these criteria could not be fulfilled, the algorithm attempted to find a less preferred match that at least met the following criteria: NW score > 12, at most 1 gap, at most 4 substitutions, and the shortest length of unaligned regions. The highest-scoring conserved miRNA homolog match, reflecting the fewest substitutions, was reported as the read’s miRNA homolog match. At this step, the algorithm applied a built-in preference for matches without gaps and with the shortest unaligned regions. The match of the conserved miRNA homolog to the queried read was assessed using all of these factors reported by the NW alignment. This last filtering step was particularly useful for overcoming the 3’ heterogeneity often reported among miRNAs detected in small RNA libraries (Lee et al., 2010; Neilsen et al., 2012), because it assesses homology without penalizing the read or the compared miRNA for differences in length.
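The approximate-match net and the alignment-based ranking can be sketched together in Python. This is an illustration under assumptions, not SMD's code: agrep is replaced by a unit-cost edit distance (substitutions, insertions and deletions each cost 1, as with -I1 -D1) in which unmatched 3’ overhang is free, and the Needleman-Wunsch step uses free end gaps with assumed scores of +1 (match), -1 (mismatch) and -2 (gap). The ranking key (gaps, then substitutions, then unaligned length, then score) is a simplification of the preferences described above, and all function names are hypothetical.

```python
def prefix_edit_distance(a, b):
    """Unit-cost edit distance (substitution, insertion and deletion each
    cost 1) in which a trailing overhang of either string is free."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1,
                          D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    # Free 3' overhang: best cell in the last row or last column.
    return min(min(D[n]), min(row[m] for row in D))

def align_overlap(read, mirna, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch alignment with free end gaps, so only the overlap
    is scored. Returns (score, substitutions, gaps, unaligned_bases)."""
    n, m = len(read), len(mirna)
    S = [[0] * (m + 1) for _ in range(n + 1)]  # zero borders: free leading gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if read[i - 1] == mirna[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Free trailing gaps: trace back from the best last-row/column cell.
    score, i, j = max([(S[n][c], n, c) for c in range(m + 1)] +
                      [(S[r][m], r, m) for r in range(n + 1)])
    unaligned = (n - i) + (m - j)
    subs = gaps = 0
    while i > 0 and j > 0:
        same = read[i - 1] == mirna[j - 1]
        if S[i][j] == S[i - 1][j - 1] + (match if same else mismatch):
            subs += not same
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            gaps += 1
            i -= 1
        else:
            gaps += 1
            j -= 1
    unaligned += i + j  # leading overhang left at the matrix border
    return score, subs, gaps, unaligned

def rank_matches(read, candidates, max_errors=4, min_score=12):
    """candidates maps miRNA names to sequences (read and miRNAs taken from
    the seed onward). Returns names of accepted matches, best first."""
    ranked = []
    for name, mirna in candidates.items():
        if prefix_edit_distance(read, mirna) > max_errors:  # agrep-like net
            continue
        score, subs, gaps, unaligned = align_overlap(read, mirna)
        if score > min_score and gaps <= 1 and subs <= 4:
            # Prefer no gaps, then fewest substitutions, shortest overhang,
            # then the highest score.
            ranked.append(((gaps, subs, unaligned, -score), name))
    return [name for _, name in sorted(ranked)]
```

Because end gaps are free, a read that is two bases shorter than the reference at its 3’ end aligns with no substitutions or gaps and simply reports 2 unaligned bases, which is how the sketch tolerates 3’ heterogeneity.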

Modifications and adjustments were made to the algorithm at each step to increase both sensitivity and specificity and to ensure accurate assignment of a conserved miRNA to the queried read. SMD created three output files (not including a log file). The first is a fasta file of all the reads, with their conserved miRNA match appended to the read ID. The second is an alignment file of each read with the conserved miRNA it matches. The last is a counts file, which stores the number of times a read matches a conserved miRNA. Two accessory programs, miRreadscollapser and miRseqvars, helped to improve detection of other sequence variants of an SMD-reported read in the rotifer small RNA libraries (Figure 3-4). The first program, miRreadscollapser, was used to identify sequences in the output file that shared an exact subsequence. It returned, along with the queried read, any sequence matching the region of the read from its second nucleotide to two nucleotides from its end. This feature also provided a way to examine 5’ and 3’ heterogeneity within identified miRNA homologs. The second program, miRseqvars, scanned the input file of small RNAs for any variants that differed by at most 4 nt from an SMD-reported read. In this way, a matrix of homology was built for each newly identified miRNA homolog, depicted by the concentric circles in Figure 3-4. As shown in the example in Figure 3-4, miRreadscollapser was not only a tool for identifying 5’ and 3’ heterogeneity within sequences that correspond to a conserved miRNA homolog, but was also useful for identifying instances where different lengths of a read correspond to different miRNAs. This feature prevented false inflation of miRNA diversity, a problem that would otherwise arise from not anchoring the reads to an assembled genome. miRseqvars looked for polymorphisms within reads in the input file and was built with a feature to also search for the genomic correspondence of discovered sequence variants, if genomic data were available for a particular species. In Figure 3-4, the concentric circles represent the output of both programs and show the overlap between them. Reads Z and Y would also be returned by miRseqvars along with the reads in the red circle. An alignment tool was then used to compare the outputs from both programs to determine which reads fell into each of the aforementioned categories.
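The two accessory searches can be approximated with simple string operations, as sketched below. The core region is assumed to be the read minus its first base and last two bases, and variant detection is assumed to count substitutions only between reads of equal length; miRseqvars' genomic lookup is omitted, and the function names are hypothetical.

```python
def differs_by_at_most(a, b, k):
    """True if equal-length strings a and b differ at no more than k positions."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= k

def collapse_variants(query, reads):
    """miRreadscollapser-like step: reads containing the query's core
    (assumed to be the query minus its first base and last two bases)."""
    core = query[1:-2]
    return [r for r in reads if core in r]

def sequence_variants(query, reads, max_diff=4):
    """miRseqvars-like step: other reads of the same length differing from
    the query at up to max_diff positions (substitutions only)."""
    return [r for r in reads if r != query and differs_by_at_most(r, query, max_diff)]

def overlap(query, reads):
    """Reads reported by both searches (the overlap of the two circles)."""
    collapsed = set(collapse_variants(query, reads))
    return [r for r in sequence_variants(query, reads) if r in collapsed]
```

Length variants (5’ or 3’ heterogeneity) surface only in the first search, internal substitutions only in the second, and a change confined to the first base appears in both.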

Genomic validation of SMD-identified miRNAs

The characteristic hairpin structures of putative miRNA homolog precursors were obtained by extracting their genomic context from a sequenced genomic library of B. manjavacas comprising 200 bp reads. A simple script was created to extract the precursor region from the file of genomic sequences; the length of the extracted region was determined by the precursor of the miRNA homolog to which the read best corresponded. The minimum free energy of folding (mfe) was calculated using the RNA folding program mFold, and only hairpin structures with an mfe at or below -20 kcal/mol were accepted. This criterion was the final filter applied before an identified miRNA homolog was reported.

Results

Detection of Drosophila melanogaster miRNAs using SMD

BLAST is routinely used for mining small RNA surveys for conserved miRNAs. To demonstrate the utility of SMD over BLAST-based algorithms in identifying conserved miRNAs without whole genome data, SMD’s detection of conserved miRNAs was compared to that of BLAST using datasets of known miRNAs. The first test was designed to identify the subset of D. melanogaster miRNAs that have known homologs outside the Drosophila genus. First, all Drosophila miRNAs were removed from miRBase. Then a file containing only Drosophila melanogaster miRNAs was given to SMD with its default settings. The same file was also given to BLAST, with a word size of 7 as the only restriction. There are a total of 430 D. melanogaster (Dme) miRNAs in miRBase.

These miRNAs were parsed into three tables (Tables 3-1, 3-2 and 3-3). The first table, Table 3-1, lists the Dme miRNAs that belong to miRNA families found outside the Drosophila genus. For simplicity of comparison, this table is organized by miRNA terms. A single miRNA term represents all miRNA isoforms with the same designation; for example, detection of miR-2 in the table indicates recognition of any miR-2 sequence by either method. Because homologs of these miRNAs exist in the reference database depleted of all Drosophila miRNAs, a method for detecting conserved miRNAs should be able to identify these Dme miRNAs. Table 3-1 was simplified to miRNA terms because the two isoforms originating from a pre-miRNA are not always both conserved, so detection of either isoform was considered sufficient for assignment (ref). Tables 3-2 and 3-3 detail the assignment of each of the 430 Dme miRNAs by both methods. These tables summarize the output files from the SMD and BLAST analyses, which can be found in Supplementary Files 1 and 2. Table 3-2 lists the miRNA isoforms represented by the miRNA terms in Table 3-1. Table 3-3 consists of miRNA isoforms that are found only within the Drosophila genus. Assignment of a miRNA homolog to any of these miRNA isoforms was considered a false positive because they do not possess homologs outside of the Drosophila genus.

It is evident from the data presented in Tables 3-2 and 3-3 that SMD outperforms BLAST in both sensitivity and specificity when identifying conserved miRNAs.

These results will be summarized below, but first, specific examples taken from the output of both methods will be reviewed to describe how the two methods were scored. These examples also illustrate how easily SMD output can be interpreted and the advantages of its algorithm over the BLAST algorithm; in particular, the utility of SMD’s seed sequence requirement and its gap tolerance will be highlighted. The output formats of SMD and BLAST differed. SMD returned one miRNA homolog match for a given miRNA sequence, while BLAST returned a set of top hits. In certain instances, the top hits were heterogeneous, as shown in the examples in Figures 3-5A and 3-5C. From the alignment shown in Figure 3-5B, it would be difficult to determine from these data alone whether the queried sequence corresponded to a legitimate conserved miRNA homolog or to contamination. Despite the occasional lack of agreement among the top hits returned by BLAST, the BLAST assignment was still scored as correct if just one of these top hits matched the miRNA term of the queried Dme miRNA (Tables 3-1, 3-2 and 3-3). In contrast, the SMD output was simpler, returning only a single best match identified by the built-in ranking process of its algorithm.
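Scoring at the level of miRNA terms, including the rule that a set of BLAST top hits counts as correct if any one hit shares the query’s term, can be sketched as below. The regular expressions are a hypothetical simplification of miRBase naming written for this illustration: they strip a species prefix, drop isoform letters, paralog numbers and arm suffixes, and keep named terms such as let-7, bantam and miR-iab. Family-level relationships (the assignments shown in blue in the tables) cannot be recovered from names alone, so this sketch scores them as incorrect.

```python
import re
from typing import Optional

# Hypothetical simplification of miRBase names (not an official parser).
SPECIES_PREFIX = re.compile(r"^[a-z]{3,4}-(?=miR|let|bantam)")
TERM = re.compile(r"^(let-7|bantam|miR-iab|miR-?\d+)")


def mirna_term(name: str) -> Optional[str]:
    """Reduce a miRNA name such as 'dme-miR-2a-1-5p' to its term, 'miR-2'."""
    name = SPECIES_PREFIX.sub("", name)
    m = TERM.match(name)
    if m is None:
        return None
    term = m.group(1)
    numbered = re.fullmatch(r"miR-?(\d+)", term)
    # Normalise plant-style names such as 'miR3700' to 'miR-3700'.
    return f"miR-{numbered.group(1)}" if numbered else term


def score_assignment(query: str, assigned: str) -> str:
    """Score one assignment as 'correct', 'incorrect' or 'not found'."""
    if not assigned:
        return "not found"
    q_term = mirna_term(query)
    if q_term is not None and q_term == mirna_term(assigned):
        return "correct"
    return "incorrect"


def score_top_hits(query: str, hits: list[str]) -> str:
    """BLAST-style scoring: correct if any one top hit shares the query's term."""
    if not hits:
        return "not found"
    if any(score_assignment(query, h) == "correct" for h in hits):
        return "correct"
    return "incorrect"
```

For example, `score_assignment("miR-304-5p", "miR-3477")` is scored as incorrect by name, even though the text argues that miR-3477 may belong to the same family; that distinction still requires inspecting the alignment.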

The first step within SMD requires a 6 nt seed sequence match for any reported match. Unlike the BLAST top hits returned for a queried sequence, SMD matches therefore did not require visual inspection of each alignment to verify that the seed sequences match. The problem of mismatched seed sequences among the top hits reported by BLAST is highlighted in Figure 3-6.
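The first two filters of the pipeline summarized in Figure 3-3 can be sketched as below. This is a minimal, hypothetical Python illustration rather than SMD’s implementation: it assumes the seed is nucleotides 2 to 7 of the reference miRNA, that "the 5’ region of the read" means the seed must start within the first three read positions (`max_start=2`, 0-based), and it counts substitutions without gaps from the seed anchor, whereas SMD’s alignment also tolerates gaps. Raising `max_start` serves here as a stand-in for seed-shifted mode.

```python
from typing import Optional

SEED_LEN = 6  # SMD requires a 6 nt seed match


def find_seed(read: str, ref: str, max_start: int = 2) -> Optional[int]:
    """Return the 0-based read position where the reference seed starts.

    Assumption: the seed is ref[1:7] (nucleotides 2-7) and must begin within
    read positions 0..max_start.
    """
    seed = ref[1:1 + SEED_LEN]
    for start in range(max_start + 1):
        if read[start:start + SEED_LEN] == seed:
            return start
    return None


def anchored_substitutions(read: str, ref: str, start: int) -> int:
    """Count ungapped mismatches after anchoring ref on the seed position."""
    shift = start - 1  # read index that lines up with ref index 0
    return sum(
        1
        for i, base in enumerate(ref)
        if 0 <= i + shift < len(read) and read[i + shift] != base
    )


def passes_filters(read: str, ref: str, max_subs: int = 4, max_start: int = 2) -> bool:
    """Step 1 (seed in the 5' region) and step 2 (at most max_subs substitutions)."""
    start = find_seed(read, ref, max_start)
    if start is None:
        return False
    return anchored_substitutions(read, ref, start) <= max_subs
```

Because no candidate can pass without a seed match, every reported match shares the seed by construction, which is what removes the need to inspect each alignment by eye.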

For dme-miR-9c-3p, the top hit from BLAST is an unrelated miRNA homolog (Figure 3-6A). The alignment shows that BLAST has no requirement that seed sequences match among its top hits. In contrast, the miRNA homolog match returned by SMD for the same queried Dme miRNA allowed for a shifted seed sequence, which enabled SMD to find the matching homolog outside the Drosophila genus (Figure 3-6B). Gap tolerance also aided its recovery of the correct miRNA homolog. There is one instance in Table 3-2 where SMD returns a miRNA homolog match that is not identified as a member of the miRNA family of the queried Dme miRNA: dme-miR-304 is matched to miR-3477 by SMD, whereas BLAST returns miR-304 as its match. The miRNA homolog chosen by SMD may be a member of the same miRNA family, as indicated by the sequence similarity (Figure 3-7), so this assignment does not preclude it from being considered a legitimate Dme miRNA homolog. SMD chose miR-3477 over the top hit reported by BLAST, again because the SMD algorithm tolerates gaps in order to minimize substitutions within the alignment of the queried sequence and its chosen miRNA homolog match. Overall, this gap tolerance is an asset in finding the correct miRNA homolog match, as shown in Figure 3-6B.
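The benefit of gap tolerance, and the ranking step that reports a single best match, can be illustrated with a textbook Needleman-Wunsch global alignment. The scoring values (match +1, mismatch -1, gap -1), the overlap tie-breaker (taken here as the length shared by read and reference) and the function names are assumptions for this sketch, not SMD’s actual parameters.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score (linear gap penalty, two-row DP)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def ungapped_mismatches(a: str, b: str) -> int:
    """Mismatches when the two sequences are compared position by position."""
    return sum(x != y for x, y in zip(a, b))


def best_match(read: str, candidates: dict[str, str]) -> str:
    """Return the single candidate name with the highest score, then most overlap."""
    def rank(item: tuple[str, str]) -> tuple[int, int]:
        _, seq = item
        return nw_score(read, seq), min(len(read), len(seq))
    return max(candidates.items(), key=rank)[0]


ref = "TGAGGTAGTAGGTTGTATAGTT"          # let-7a-like reference
read = ref[:11] + "C" + ref[11:]        # same sequence with one inserted base
print(ungapped_mismatches(read, ref))   # 9 mismatches without gaps
print(nw_score(read, ref))              # 21: 22 matches and a single gap
```

A single inserted base turns into nine apparent substitutions when gaps are not allowed, but costs only one gap in the global alignment, which is the sense in which tolerating gaps minimizes substitutions.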

SMD matches at minimum resemble miRNA family members and in most cases match the miRNA term of the queried Dme miRNA. The incorrect top hits returned by BLAST often do not share the seed sequence of the queried Dme miRNA. In total, SMD analysis excluded only a single conserved miRNA homolog from detection, dme-miR-958, which belongs to a miRNA family that includes miR-1175.

The other Dme miRNAs missed by SMD but returned by BLAST are isoforms or paralogs of Dme miRNAs that SMD correctly identified (Table 3-2).

Furthermore, many of these miRNAs fall into the category of mixed BLAST results shown in Figure 3-5C. Setting aside the heterogeneity among BLAST top hits, BLAST would miss 3 miRNA homologs, dme-miR-311, dme-miR-313, and dme-miR-958, because of poor alignment with their top hits. Granted, BLAST does return a miRNA family member, miR-92, as one of its hits for dme-miR-313-3p. This hit, however, is not among its top hits, and visual inspection of the alignment would be needed to identify it as the correct homolog. In contrast, SMD returns miR-92 as its match because its algorithm weights the features most important to a miRNA match, namely seed identity and a minimal number of substitutions.

The real advantage of SMD is its low rate of false positives. In total, SMD produced 3 false positives in the Dme miRNA analysis. For BLAST, discounting the false positives for miRNAs that are isoforms of conserved Dme miRNA homologs in Table 3-2, there are 246 false positives listed in Table 3-3. Alignments of these miRNAs with their BLAST top hits show little seed sequence correspondence. These alignments exemplify the imprecision of BLAST in matching sequences within a small RNA survey to conserved miRNA homologs. Applying a common e-value cutoff of 0.01 to the BLAST hits would reduce the false positives reported in Table 3-3, but at the cost of 13 correctly identified miRNAs reported by BLAST in Table 3-2 (Table 3-4). Although the miRNAs listed in Table 3-3 are conserved miRNAs within the Drosophila genus, they serve as random sequences for this analysis. From the BLAST alignments, it would be difficult to determine that these sequences are legitimate Drosophila miRNAs. In the absence of a sequenced genome, it is impossible to distinguish these sequences from other random sequences in the small RNA library by their BLAST results alone.
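Grouping BLAST hits into top hits, and the effect of an e-value cutoff such as 0.01, can be sketched from BLAST’s tabular output. This hypothetical helper assumes the 12 default columns of `-outfmt 6` (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, e-value, bit score) and defines the top hits of a query as all subjects tied for the highest bit score; it is not the pipeline used in this chapter.

```python
import csv
import io
from collections import defaultdict
from typing import Optional


def read_outfmt6(text: str) -> dict[str, list[tuple[str, float, float]]]:
    """Parse default BLAST tabular output into {query: [(subject, evalue, bitscore)]}."""
    hits: dict[str, list[tuple[str, float, float]]] = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hits[row[0]].append((row[1], float(row[10]), float(row[11])))
    return dict(hits)


def top_hits(hits, max_evalue: Optional[float] = None) -> dict[str, list[str]]:
    """Subjects tied for the best bit score per query, after an optional e-value cutoff."""
    result = {}
    for query, rows in hits.items():
        if max_evalue is not None:
            rows = [r for r in rows if r[1] <= max_evalue]
        if not rows:
            continue  # every hit was removed by the cutoff
        best = max(r[2] for r in rows)
        result[query] = sorted({r[0] for r in rows if r[2] == best})
    return result
```

A query whose tied top hits belong to different miRNA terms corresponds to the heterogeneous BLAST output discussed above, and applying `max_evalue=0.01` shows how a cutoff can remove a query, correct or not, from the results entirely.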

SMD detection of divergent miRNA homologs using other datasets

A second comparative test used the divergent let-7 sequences described earlier. This test highlighted the utility of the algorithm’s flexible substitution parameters for uncovering more divergent miRNA sequences iteratively. The Platyhelminthes let-7 sequences are the most divergent let-7 sequences (Figure 3-8A). Again, a reference database was constructed as described above, with all Platyhelminthes miRNAs removed. The Platyhelminthes let-7 sequences were then given to both SMD and BLAST (with the same settings as before), which searched the reference database lacking Platyhelminthes sequences for a miRNA homolog match. SMD returned only 3 of the 9 conserved miRNAs, as these were the only miRNAs that met the criteria enforced by the program’s default settings (Figure 3-8B). These three sequences represent one let-7 homolog from each of the species surveyed.

The other, undetected S. mediterranea let-7-5p variants are the only known protostome let-7 paralogs and are therefore even more divergent than the let-7 ortholog. BLAST returned only two of these sequences. When SMD was modified to tolerate 5 substitutions, 6 of the 9 Platyhelminthes let-7 miRNAs were returned. BLAST, by contrast, could not be relaxed further to retrieve the identified let-7 homologs from Platyhelminthes.

These let-7 homologs were initially uncovered using available whole genome data. In this instance, modifications of SMD parameters recovered 5 of the 6 let-7 Platyhelminthes homologs without relying on whole genome data.
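Relaxing the substitution tolerance step by step, as was done for the Platyhelminthes let-7 sequences, can be sketched as a loop that starts at the default of 4 substitutions and stops at the first tolerance that yields a match. The helper names, the ungapped comparison aligned at the 5’ end, and the requirement that nucleotides 2 to 7 match exactly are simplifying assumptions for illustration only.

```python
from typing import Optional


def seeded_substitutions(read: str, ref: str) -> Optional[int]:
    """Ungapped substitutions, or None if the assumed seed (nt 2-7) differs."""
    if read[1:7] != ref[1:7]:
        return None
    return sum(a != b for a, b in zip(read, ref))


def relax_until_match(read: str, references: dict[str, str],
                      default: int = 4, limit: int = 6) -> Optional[tuple[int, str]]:
    """Return (tolerance, reference name) for the lowest tolerance that matches."""
    for tolerance in range(default, limit + 1):
        hits = []
        for name, ref in references.items():
            subs = seeded_substitutions(read, ref)
            if subs is not None and subs <= tolerance:
                hits.append(name)
        if hits:
            return tolerance, sorted(hits)[0]
    return None  # no match even at the most relaxed setting
```

Reporting the tolerance at which a match first appears keeps track of how far a recovered homolog had to be relaxed, which is useful when judging highly divergent assignments.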

For a final set of divergent miRNA homologs, a seed-shifted let-7 sequence from Nereis diversicolor (ndi) was taken from the Wheeler et al. (2009) survey, along with the most divergent miRNA sequences from two parasitic nematode species, Brugia pahangi (bpa) and Haemonchus contortus (hco) (Winter et al., 2012). In addition to this second set of let-7 homologs, a divergent miR-100 homolog and divergent members of the miR-2 and miR-55 families were also used in this test (Figure 3-8C). Interestingly, the level of sequence divergence of these miRNAs from other members of their miRNA families is similar to the level of divergence of the reported Platyhelminthes let-7 sequences from other known let-7 sequences. However, in parasitic nematodes this divergence warranted a different homolog assignment for these miRNAs, whereas in Platyhelminthes they were considered paralogs, despite the absence of evidence for genomic clustering of these miRNAs.

Sequences were recovered using different SMD settings to show the utility of these features. The alignments with matched miRNA orthologs are shown in Figure 3-7D. To retrieve a match for the seed-shifted let-7, SMD was run in seed-shifted mode. To retrieve a match for hco-miR-5899, the substitution tolerance was raised to 6 substitutions. The other matches were retrieved using default settings, which do not accommodate severe seed shifting or more than 4 substitutions between the queried read and the miRNA homolog match. The divergent nematode miRNAs were originally identified using a different strategy: reads were mapped to a genome, and those with hairpin structure in their genomic context were compared by BLAST to other nematode miRNAs. With the exception of bpa-miR-5870, SMD, with either default or relaxed settings, identified these divergent miRNA sequences without the aid of a genome. These results again demonstrate the advantage of modifiable parameters within the algorithm, which enabled the detection of divergent miRNA homologs without relying on assembled genome data.

SMD specificity test using miRBase

The flexibility of the algorithm implemented within SMD accommodates searches for both well-conserved and poorly conserved miRNA homologs in ways that other search tools do not. However, it is equally important that the algorithm performs with the same level of specificity as previously applied algorithms. To test this, all sequences listed in miRBase were given to SMD, and its ability to return each queried sequence as the matching miRNA homolog was assessed. In this self-to-self test, SMD returned the correct sequence match for every miRBase sequence, as shown in the alignment file of the output (Supplementary File 4). For all 21,643 sequences, the sequence returned by the program matched the sequence of the queried miRNA. There were 14 instances where the assigned miRNA homolog differed in name from the queried miRNA (e.g., rno-miR-3596a for bfl-let-7b). In these instances, a single sequence represents several different miRNAs; because enumerating the entire list rather than a single match would clutter the output file, only one homolog is selected.

The same mature miRNA sequence may carry different homolog assignments because of differences in the sequence composition and origin of the precursors.

The program therefore performs as well as local alignment searches in finding the best matching miRNA, while also allowing flexible search criteria to find the best match for the entire queried sequence. This demonstrates that the algorithm first looks for the best local alignment match and, if this criterion is not met, then searches for the match with the greatest homology to the entire queried sequence.
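The self-to-self test, and why a small number of queries are reported under a different name, can be illustrated with a toy database. In this hypothetical sketch, records are (name, sequence) pairs, U is converted to T, and the first name seen for a sequence is reported as its single homolog; the names and sequences below are placeholders, not miRBase entries.

```python
from collections import defaultdict


def normalise(seq: str) -> str:
    """Upper-case and convert RNA to DNA letters so identical sequences compare equal."""
    return seq.upper().replace("U", "T")


def shared_sequences(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Mature sequences that carry more than one miRNA name."""
    names_by_seq: dict[str, set[str]] = defaultdict(set)
    for name, seq in records:
        names_by_seq[normalise(seq)].add(name)
    return {s: sorted(n) for s, n in names_by_seq.items() if len(n) > 1}


def self_to_self(records: list[tuple[str, str]]) -> list[str]:
    """Query every record against the database itself.

    Lookup is by exact sequence, so the returned sequence always matches the
    query; only the reported name can differ, when several names share one
    sequence. Returns the queries reported under a different name.
    """
    reported_for: dict[str, str] = {}
    for name, seq in records:
        reported_for.setdefault(normalise(seq), name)  # report one homolog per sequence
    return [name for name, seq in records if reported_for[normalise(seq)] != name]


records = [
    ("mirA", "UGAGGUAGUAGGUUGUAUAGUU"),  # placeholder names and sequences
    ("mirB", "TGAGGTAGTAGGTTGTATAGTT"),
    ("mirC", "UAAGGCACGCGGUGAAUGCC"),
]
```

With this toy input, only mirB is reported under a different name (mirA), which is analogous to the renamed cases in the miRBase test: the sequence is right, and only the label chosen for a shared sequence differs.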

Identification of conserved rotifer miRNAs by SMD versus other software

The final test of the SMD algorithm was conducted by mining the small RNA libraries of B. manjavacas, P. acuticornis and A. vaga. The sequences from the small RNA surveys were processed by software that uses a BLAST-based algorithm (Griffiths-Jones et al., 2006; Wheeler et al., 2009). The same sequences were processed by SMD with default settings. While the BLAST-based program uncovered only a small subset of conserved miRNAs, the full complement of conserved miRNAs was recovered only by SMD (Table 3-5). This table consists of miRNAs that are supported by hairpin secondary structure (Appendix A). The full rotifer miRNA complement from all three species will be discussed in detail in Chapter 4. In this section, the miRNAs listed in Table 3-5 are used to demonstrate the utility of the SMD algorithm.

Further support for the legitimacy of these miRNAs comes from miR-100 and miR-125, which were returned only by SMD. Inspection of partial genome assemblies of B. manjavacas revealed that these miRNAs, along with let-7, lie in a conserved genomic cluster (Figure 3-9A) (Hertel, 2012). Neither of these miRNA homologs, missed by the BLAST-based software, is divergent enough from known miRNA homologs to explain why established software failed to recognize it (Figure 3-9B). Figure 3-10A presents an example where the search tool within miRBase reports the incorrect homolog match for a conserved miRNA found in A. vaga and P. acuticornis. A local alignment algorithm gives a better score to miR-184, the match selected by miRBase. However, as the alignment of the miRNA homologs to the queried small RNA read shows, miR-748, chosen by SMD, is more likely to be the homolog match. The tolerance for gaps to minimize substitutions beyond the seed sequence allows SMD to find a homolog assignment different from that of miRBase in Figure 3-10B. Finally, the seed-shifting accommodation allowed identification of a seed-shifted miR-281 found exclusively in the A. vaga small RNA library (Figure 3-11). The advantages of SMD’s accessory programs will be highlighted in the subsequent chapters. Select examples from the output of the accessory programs are shown in Figure 3-12.

In Figures 3-12A and 3-12B, miRreadscollapser identifies different miRNA homolog matches for truncated versions of the same read. In Figure 3-12C, the isomiRs of miR-748 are identified. This last set of findings from experimental data establishes the method as a novel and improved tool for mining conserved miRNAs in small RNA libraries.

Discussion

In the absence of contextual genomic data for conserved miRNAs, SMD outperformed BLAST-based algorithms in every comparison presented here because of its tolerance for gaps and the greater stringency of its ranking process. These features reflect the way in which miRNAs evolve and function. Examination of the individual queried miRNAs in Tables 3-2 and 3-3 also shows that there is no instance where the BLAST algorithm is more effective than SMD in its search for conserved miRNA homologs. A tool such as SMD can therefore be useful for studying the conservation of miRNA homologs between taxa and across many taxa.


The advantage of the algorithm’s flexibility is demonstrated by the adjustable parameters used to recover miRNA family members in the nematode dataset. The adjustable parameters of SMD provide a means to relax or restrict the sequence divergence accepted among returned miRNA homolog matches. The method’s ability to recover divergent miRNA homologs shows this to be a useful feature.

This algorithm is tailored to the nature of miRNA evolution in a way that modification of BLAST parameters such as e-value and percent identity cannot mimic. Furthermore, the algorithm can be modified to accommodate seed mismatches when identifying conserved miRNAs, whereas BLAST cannot. The algorithm defines a seed as 6 nt, while the word size within BLAST can go only as low as a perfect match of 7 nt. The 6 nt seed definition was modeled on data on the conservation of miRNA::target interactions, as this length was found to be sufficient to recover conserved miRNA targets (Lall et al., 2006). Once again, the algorithm within SMD follows the logic of miRNA evolution and function, which enabled its improved detection of miRNA homologs.

Finally, its detection of a greater number of miRNA homologs from the experimental rotifer dataset demonstrates that the algorithm is indeed a necessary tool for improving detection of miRNA homologs in previously unsampled species. It also underscores the importance of this approach for other datasets in which the mining of conserved miRNAs relied solely on the ability of reads to map to an assembled genome. In such cases, BLAST cannot be guaranteed to correctly identify all conserved miRNA homologs in a small RNA library. This is especially important when whole genome data do not exist for sister taxa.

References

Abbott, A.L., Alvarez-Saavedra, E., Miska, E.A., Lau, N.C., Bartel, D.P., Horvitz, H.R., and Ambros, V. (2005). The let-7 MicroRNA family members mir-48, mir-84, and mir-241 function together to regulate developmental timing in Caenorhabditis elegans. Developmental cell 9, 403-414.

Calvi, B.R., Byrnes, B.A., and Kolpakas, A.J. (2007). Conservation of epigenetic regulation, ORC binding and developmental timing of DNA replication origins in the genus Drosophila. Genetics 177, 1291-1301.

Cevec, M., Thibaudeau, C., and Plavec, J. (2008). Solution structure of a let-7 miRNA:lin-41 mRNA complex from C. elegans. Nucleic acids research 36, 2330-2337.

Cevec, M., Thibaudeau, C., and Plavec, J. (2010). NMR structure of the let-7 miRNA interacting with the site LCS1 of lin-41 mRNA from Caenorhabditis elegans. Nucleic acids research 38, 7814-7821.

Chan, C., Elemento, O., and Tavazoie, S. (2005). Revealing post-transcriptional regulatory interactions through network-level conservation. PLoS computational biology preprint, e69.

Chen, K., and Rajewsky, N. (2006). Deep conservation of microRNA-target relationships and 3'UTR motifs in vertebrates, flies, and nematodes. Cold Spring Harbor symposia on quantitative biology 71, 149-156.

Copeland, C.S., Marz, M., Rose, D., Hertel, J., Brindley, P.J., Santana, C.B., Kehr, S., Attolini, C.S., and Stadler, P.F. (2009). Homology-based annotation of non-coding RNAs in the genomes of Schistosoma mansoni and Schistosoma japonicum. BMC Genomics 10, 464.

Friedlander, M.R., Adamidi, C., Han, T., Lebedeva, S., Isenbarger, T.A., Hirst, M., Marra, M., Nusbaum, C., Lee, W.L., Jenkin, J.C., et al. (2009). High-resolution profiling and discovery of planarian small RNAs. Proc Natl Acad Sci U S A 106, 11546-11551.

Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., and Enright, A.J. (2006). miRBase: microRNA sequences, targets and gene nomenclature. Nucleic acids research 34, D140-144.

Hertel, J. (2012). Evolution of the let-7 microRNA Family. RNA biology 9:3, 1–11.

Johnson, S.M., Grosshans, H., Shingara, J., Byrom, M., Jarvis, R., Cheng, A., Labourier, E., Reinert, K.L., Brown, D., and Slack, F.J. (2005). RAS is regulated by the let-7 microRNA family. Cell 120, 635-647.

Lall, S., Grun, D., Krek, A., Chen, K., Wang, Y.L., Dewey, C.N., Sood, P., Colombo, T., Bray, N., Macmenamin, P., et al. (2006). A genome-wide map of conserved microRNA targets in C. elegans. Current biology : CB 16, 460-471.

Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.

Lee, L.W., Zhang, S., Etheridge, A., Ma, L., Martin, D., Galas, D., and Wang, K. (2010). Complexity of the microRNA repertoire revealed by next-generation sequencing. RNA 16, 2170-2180.

Lightfoot, H.L., Bugaut, A., Armisen, J., Lehrbach, N.J., Miska, E.A., and Balasubramanian, S. (2011). A LIN28-dependent structural change in pre-let-7g directly inhibits dicer processing. Biochemistry 50, 7514-7521.

McBrayer, Z., Ono, H., Shimell, M., Parvy, J.P., Beckstead, R.B., Warren, J.T., Thummel, C.S., Dauphin-Villemant, C., Gilbert, L.I., and O'Connor, M.B. (2007). Prothoracicotropic hormone regulates developmental timing and body size in Drosophila. Developmental cell 13, 857-871.

Neilsen, C.T., Goodall, G.J., and Bracken, C.P. (2012). IsomiRs - the overlooked repertoire in the dynamic microRNAome. Trends in genetics : TIG.

Pasquinelli, A.E., McCoy, A., Jimenez, E., Salo, E., Ruvkun, G., Martindale, M.Q., and Baguna, J. (2003). Expression of the 22 nucleotide let-7 heterochronic RNA throughout the Metazoa: a role in life history evolution? Evol Dev 5, 372-378.

Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M.I., Maller, B., Hayward, D.C., Ball, E.E., Degnan, B., Muller, P., et al. (2000). Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86-89.

Simoes, M.C., Lee, J., Djikeng, A., Cerqueira, G.C., Zerlotini, A., da Silva-Pereira, R.A., Dalby, A.R., LoVerde, P., El-Sayed, N.M., and Oliveira, G. (2011). Identification of Schistosoma mansoni microRNAs. BMC Genomics 12, 47.

Slack, F.J., Basson, M., Liu, Z., Ambros, V., Horvitz, H.R., and Ruvkun, G. (2000). The lin-41 RBCC gene acts in the C. elegans heterochronic pathway between the let-7 regulatory RNA and the LIN-29 transcription factor. Molecular cell 5, 659-669.

Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. (2005). Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell 123, 1133-1146.

Wheeler, B.M., Heimberg, A.M., Moy, V.N., Sperling, E.A., Holstein, T.W., Heber, S., and Peterson, K.J. (2009). The deep evolution of metazoan microRNAs. Evol Dev 11, 50-68.

Winter, A., Weir, W., Hunt, M., Berriman, M., Gilleard, J.S., Devaney, E., and Britton, C. (2012). Diversity in parasitic nematode genomes: the microRNAs of Brugia pahangi and Haemonchus contortus are largely novel. BMC Genomics 13, 4.

Wu, S., and Manber, U. (1994). Fast algorithm for multi-pattern searching (Chung-Cheng University, University of Arizona).

Figure 3-1 let-7 family members in nematodes. (A) Alignment of the let-7 family of paralogs in C. elegans. (B) Alignment of the orthologs of the let-7 paralogs from the nematodes C. briggsae, C. remanei (crm), Pristionchus pacificus (ppc), and Ascaris suum.

Figure 3-2 let-7 sequences of flatworms. Alignment of lophotrochozoan let-7 sequences from Lottia gigantea (lgi), Capitella teleta (cte), Schmidtea mediterranea (sme), Schistosoma mansoni (sma), and Schistosoma japonicum (sja).

[Figure 3-3 schematic: Steps 1 to 3 applied to Illumina read X; labels include "seeds from miRBase", "agrep -4 -I1 -D1", "conserved miRNA match?", "Needleman-Wunsch alignment", "score > 12?" and "output file".]

Figure 3-3 Processing pipeline of SMD. The first step asks whether any seed sequence from miRBase occurs in the 5’ region of the read. If so, the second filter aligns the read and the conserved miRNA from the seed sequence onward and asks whether at most 4 substitutions occur between the read and the matching conserved sequence. If so, the match is stored in a list. The third step ranks the matches to the read by Needleman-Wunsch alignment. The alignment returns a score, and the program also records the extent of overlap. The match with the highest score and the most overlap with the read is identified as its ortholog and reported in the output file.

[Figure 3-4 schematic: concentric circles labeled miRreadscollapser and miRseqvars.]

Figure 3-4 Schematic depiction of SMD accessory programs. The queried read is represented by read X at the center of the circle. The accessory program miRreadscollapser returns any read that contains a portion of read X. In this case, it returns reads Z and Y, both of which correspond to miR-Q and are longer reads of the same sequence. More distant sequence variants of that read may be found using the miRseqvars program, which returns sequences within 4 nt substitutions of the original read X.

Table 3-1 Summary of SMD and BLAST identification of Dme miRNAs that have homologs outside of the Drosophila genus. All isoforms of a miRNA are represented by a miRNA term. Blank boxes represent instances where a conserved homolog was not found by a method. miRNA homologs in red are incorrect assignments. miRNA homologs in blue represent instances where a member of the miRNA family of the queried Dme miRNA homolog was returned by the method.

No.  Orthologs outside  SMD         BLAST
     Drosophila genus   assignment  assignment
1    miR-1              miR-1       miR-1
2    miR-2              miR-2       miR-2
3    miR-3              miR-309     miR-309
4    miR-4              miR-9       miR-9
5    miR-7              miR-7       miR-7
6    miR-8              miR-8       miR-8
7    miR-9              miR-9       miR-9
8    miR-10             miR-10      miR-10
9    miR-11             miR-11      miR-11
10   miR-12             miR-12      miR-12
11   miR-13             miR-13      miR-13
12   miR-14             miR-14      miR-14
13   miR-263            miR-263     miR-263
14   miR-184            miR-184     miR-184
15   miR-274            miR-274     miR-274
16   miR-275            miR-275     miR-275
17   miR-92             miR-92      miR-92
18   miR-219            miR-219     miR-219
19   miR-276            miR-276     miR-276
20   miR-277            miR-277     miR-277
21   miR-278            miR-278     miR-278
22   miR-133            miR-133     miR-133
23   miR-279            miR-279     miR-279
24   miR-33a            miR-33      miR-33
25   miR-281            miR-281     miR-281
26   miR-282            miR-282     miR-282
27   miR-283            miR-283     miR-283
28   miR-34             miR-34      miR-34
29   miR-124            miR-124     miR-124
30   miR-79             miR-79      miR-79
31   miR-210            miR-210     miR-210
32   miR-285            miR-285     miR-285
33   miR-100            miR-100     miR-100
34   miR-286            miR-286     miR-286
35   miR-87             miR-87      miR-87
36   bantam             bantam      bantam
37   miR-31             miR-31      miR-31b
38   miR-304            miR-3477    miR-304
39   miR-305            miR-305     miR-305
40   miR-307            miR-307     miR-307a
41   miR-306            miR-306     miR-306
42   let-7              let-7       let-7
43   miR-125            miR-125     miR-125
44   miR-308            miR-308     miR-308
45   miR-309            miR-309     miR-309
46   miR-310            miR-310     miR-92
47   miR-311            miR-310     miR-32
48   miR-313            miR-92      miR-4009c-3p
49   miR-315            miR-315     miR-315
50   miR-316            miR-316     miR-316
51   miR-317            miR-317     miR-317
52   miR-318            miR-318     pab-miR3700
53   miR-iab            miR-iab     miR-iab
54   miR-190            miR-190     miR-190
55   miR-193            miR-193     miR-193
56   miR-957            miR-957     miR-957
57   miR-375            miR-375     miR-375
58   miR-932            miR-932     miR-932
59   miR-965            miR-965     miR-965
60   miR-970            miR-970     miR-970
61   miR-971            miR-971     miR-971
62   miR-980            miR-980     miR-980
63   miR-981            miR-981     miR-981
64   miR-927            miR-927     miR-927
65   miR-985            miR-985     miR-985
66   miR-988            miR-988     miR-988
67   miR-989            miR-989     miR-989
68   miR-137            miR-137     miR-137
69   miR-929            miR-929     miR-929
70   miR-993            miR-993     miR-993
71   miR-995            miR-995     miR-995
72   miR-958
73   miR-996            miR-996     miR-996
74   miR-252            miR-252     miR-252
75   miR-998            miR-998     miR-998
76   miR-999            miR-999     miR-999
77   miR-1000           miR-1000    miR-1000
78   miR-1006           miR-1006    miR-1006

Table 3-2 SMD and BLAST identification of Dme miRNAs corresponding to conserved miRNA homologs. Blank boxes represent instances where a conserved homolog was not found by a method. miRNA homologs in red are incorrect assignments. miRNA homologs in blue represent instances where a member of the miRNA family of the queried Dme miRNA homolog was returned by the method. Yellow highlighted boxes represent instances where BLAST correctly identified a miRNA isoform when SMD did not. Green highlighted boxes represent instances where SMD correctly identified a miRNA isoform and BLAST did not.

Dme miRNA SMD assignment BLAST assignment 1 miR-1-5p miR397 2 miR-1-3p miR-1 miR-1

3 miR-2a-1-5p miR-2-1 miR-2 4 miR-2a-3p miR-2 miR-2 5 miR-2a-2-5p miR-2-2 miR-2

6 miR-2b-1-5p miR-4446 7 miR-2b-3p miR-2 miR-2

8 miR-2b-2-5p miR-2 9 miR-2c-5p miR-130a*

10 miR-2c-3p miR-2 miR-2 11 miR-3-5p miR-18b

12 miR-3-3p miR-309 miR-309

13 miR-4-5p miR-9* miR-9b* 14 miR-4-3p miR-9* miR-9b* 15 miR-7-5p miR-7 miR-7 16 miR-7-3p miR-71 17 miR-8-5p miR-8 miR-8 18 miR-8-3p miR-8 miR-8 19 miR-9a-5p miR-9 miR-9 20 miR-9a-3p miR-9 miR-9 21 miR-9b-5p miR-9 miR-9 22 miR-9b-3p miR-3140 23 miR-9c-5p miR-9 miR-9 24 miR-9c-3p miR-9 miR-579 25 miR-10-5p miR-10 miR-10 26 miR-10-3p miR-10 miR-10 27 miR-11-5p miR-876 28 miR-11-3p miR-11 miR-11 29 miR-12-5p miR-12 miR-12 30 miR-12-3p miR-12 miR-12 31 miR-13a-5p miR-13 32 miR-13a-3p miR-13 miR-13 33 miR-13b-1-3p miR-545 34 miR-13b-3p miR-13b miR-13 35 miR-13b-2-5p miR-13 36 miR-14-5p iltv-miR-I5 37 miR-14-3p miR-14 miR-14 38 miR-263a-5p miR-263 39 miR-263a-3p miR-263 miR-263 40 miR-263b-5p miR-263 miR-263 41 miR-263b-3p miR-1573 42 miR-184-5p miR-184 miR-184 43 miR-184-3p miR-184 miR-184 44 miR-274-5p miR-274 miR-274 45 miR-274-3p miR-274 miR-274 46 miR-275-5p miR4387d 47 miR-275-3p miR-275 miR-275 48 miR-92a-5p miR-H6

49 miR-92a-3p miR-92 miR-92a 50 miR-92b-5p miR-2162 51 miR-92b-3p miR-92 miR-92 52 miR-219-5p miR-219 miR-219 53 miR-219-3p miR3464 54 miR-276a-5p miR-276 miR-276 55 miR-276a-3p miR-276 miR-276 56 miR-276b-5p miR-276 miR-276 57 miR-276b-3p miR-276 miR-276 58 miR-277-5p miR-277 miR-277 59 miR-277-3p miR-277 miR-277 60 miR-278-5p miR3695 61 miR-278-3p miR-278 miR-278 62 miR-133-5p miR-133 miR-133 63 miR-133-3p miR-133 miR-133 64 miR-279-5p miR-1230 65 miR-279-3p miR-279 miR-279 66 miR-281-1-5p miR-281 miR-281 67 miR-281-3p miR-281 miR-281 68 miR-281-2-5p miR-281 miR-281 69 miR-282-5p miR-282 miR-282 70 miR-282-3p miR-282 miR-282 71 miR-283-5p miR-283 miR-283 72 miR-283-3p miR-29 73 miR-33-5p miR-33 miR-33 74 miR-33-3p miR-98 75 miR-34-5p miR-34 miR-34 76 miR-34-3p miR-34 77 miR-124-5p miR-192 78 miR-124-3p miR-124 miR-124 79 miR-79-5p miR-79 miR-79 80 miR-79-3p miR-79 miR-79 81 miR-210-5p miR-210 miR-210 82 miR-210-3p miR-210 miR-210 83 miR-285-5p miR-5597 84 miR-285-3p miR-285 miR-285 85 miR-100-5p miR-100 miR-100 86 miR-100-3p miR-5408 87 miR-286-5p miR2592 88 miR-286-3p miR-286 miR-286 89 miR-87-5p miR-5343

90 miR-87-3p miR-87 miR-87 91 bantam-5p bantam 92 bantam-3p bantam bantam 93 miR-31b-5p miR-31b miR-31b 94 miR-31b-3p miR-4186 95 miR-31a-5p miR-31 miR-31 96 miR-31a-3p miR-18 97 miR-304-5p miR-3477 miR-304 98 miR-304-3p miR4244 99 miR-305-5p miR-305 miR-305 100 miR-305-3p miR-305 miR-305 101 miR-307a-5p miR-307 miR-307 102 miR-307a-3p miR-307 miR-307 103 miR-307b-5p miR-307 104 miR-307b-3p miR-307 105 miR-306-5p miR-306 miR-306 106 miR-306-3p miR-214 107 let-7-5p let-7 let-7 108 let-7-3p let-7 miR-3307 109 miR-125-5p miR-125 miR-125 110 miR-125-3p miR-125 111 miR-190-5p miR-190 miR-190 112 miR-190-3p miR-190 miR-190 113 miR-193-5p miR-193 miR-193 114 miR-193-3p miR-193 115 miR-308-5p miR-153 116 miR-308-3p miR-308 miR-308 117 miR-309-5p miR-3287 118 miR-309-3p miR-309 miR-309 119 miR-310-5p miR-3692 120 miR-310-3p miR-310 miR-92 121 miR-311-5p miR-669d-2 122 miR-311-3p miR-310 miR-32 123 miR-313-5p miR-3827-5p 124 miR-313-3p miR-92 miR-4009c-3p 125 miR-315-5p miR-315 miR-315 126 miR-315-3p miR-315 127 miR-316-5p miR-316 miR-316 128 miR-316-3p miR-5006 129 miR-317-5p miR-317 miR-317 130 miR-317-3p miR-317 miR-317

87 131 miR-318-5p miR3700 132 miR-318-3p miR-653 133 miR-375-5p miR-4446 134 miR-375-3p miR-375 miR-375 135 miR-iab-4-5p miR-iab-4 miR-iab-4 136 miR-iab-4-3p miR-iab-4 miR-iab-4 137 miR-iab-8-5p miR-iab-8 miR-iab-8 138 miR-iab-8-3p miR-iab-8 miR-iab-4 139 miR-957-5p miR-3813 140 miR-957-3p miR-957 miR-957 141 miR-932-5p miR-932 miR-932 142 miR-932-3p miR-147 143 miR-965-5p miR-2036 144 miR-965-3p miR-965 miR-965 145 miR-970-5p miR-5405 146 miR-970-3p miR-970 miR-970 147 miR-971-5p miR-2 148 miR-971-3p miR-971 miR-971 149 miR-980-5p miR-4633 150 miR-980-3p miR-980 miR-980 151 miR-981-5p miR-503 152 miR-981-3p miR-981 miR-981 153 miR-927-5p miR-927 miR-927 154 miR-927-3p miR-927 miR-927 155 miR-985-5p miR1222 156 miR-985-3p miR-985 miR-985 157 miR-988-5p miR5081 158 miR-988-3p miR-988 miR-988 159 miR-989-5p miR5504 160 miR-989-3p miR-989 miR-989 161 miR-137-5p miR-5689 162 miR-137-3p miR-137 miR-137 163 miR-929-5p miR-929 miR-929 164 miR-929-3p miR-929 miR-929 165 miR-993-5p miR-993 miR-993 166 miR-993-3p miR-993 miR-993 167 miR-995-5p miR2604 168 miR-995-3p miR-995 miR-995 169 miR-996-5p miR-279 170 miR-996-3p miR-996 miR-996 171 miR-252-5p miR-252 miR-252

88 172 miR-252-3p miR-252 miR-252 173 miR-958-5p miR-3317 174 miR-958-3p miR-5695 175 miR-998-5p miR-1175 176 miR-998-3p miR-998 miR-998 177 miR-999-5p miR-124 178 miR-999-3p miR-999 miR-999 179 miR-1000-5p miR-1000 miR-1000 180 miR-1000-3p miR-4516 181 miR-1006-5p vvi-miR3626 182 miR-1006-3p miR-1006 miR-1006

Table 3-3 SMD and BLAST assignment of Dme miRNAs found only within the Drosophila genus. Blank boxes represent instances where a conserved homolog was not found by SMD. miRNA homologs in red are incorrect assignments.

Nonconserved miRNAs   SMD assignment   BLAST assignment
1 miR-5-5p miR-2944 miR-1843b-3p
2 miR-5-3p miR-2 bantam-5p
3 miR-6-1-5p miR-5399*
4 miR-6-3p miR1104
5 miR-6-2-5p miR-2944a
6 miR-6-3-5p miR-5322
7 miR-280-5p miR-5580-5p
8 miR-284-5p miR-250
9 miR-284-3p miR773b
10 miR-287-3p miR4341
11 miR-288-3p miR-5550-3p
12 miR-289-5p miR-3925-5p
13 miR-303-5p miR-K12-1
14 miR-303-3p miR2275c-3p
15 miR-314-5p miR-125a
16 miR-314-3p miR171g
17 miR-312-5p miR-92 miR399
18 miR-312-3p miR-4818
19 miR-954-5p miR-4046-5p
20 miR-954-3p miR-669j
21 miR-955-5p miR162
22 miR-955-3p miR-1252
23 miR-956-5p miR-5194
24 miR-956-3p miR-3869-5p
25 miR-959-5p miR-5441
26 miR-959-3p miR-754d
27 miR-960-5p miR828
28 miR-960-3p miR-411a-5p
29 miR-961-5p miR-340-3p
30 miR-961-3p miR-125a
31 miR-962-5p miR-4812-3p
32 miR-962-3p miR-4738-5p
33 miR-963-5p miR-374a
34 miR-963-3p miR-3171
35 miR-964-5p miR-3152-3p
36 miR-964-3p miR-3152-5p
37 miR-966-5p miR-3490
38 miR-966-3p miR-4821
39 miR-967-5p miR-993
40 miR-967-3p miR-4714-3p
41 miR-1002-5p miR-381-5p
42 miR-1002-3p miR-2235
43 miR-968-5p miR-3683
44 miR-968-3p miR-144
45 miR-969-5p miR-4000h-5p
46 miR-969-3p miR-3267a
47 miR-972-5p
48 miR-972-3p miR-3323
49 miR-973-5p miR-4020b-5p
50 miR-973-3p miR3636
51 miR-974-5p miR-452-3p
52 miR-974-3p miR843
53 miR-975-5p miR-1823
54 miR-975-3p miR-2411*
55 miR-976-5p miR-2404
56 miR-976-3p miR-4924
57 miR-977-5p miR4371c
58 miR-977-3p miR780.2
59 miR-978-5p miR-5342
60 miR-978-3p miR-3177-3p
61 miR-979-5p miR-182d
62 miR-979-3p miR-355
63 miR-982-5p miR-2807c
64 miR-982-3p miR169
65 miR-983-5p miR-5600
66 miR-983-3p miR-196a
67 miR-984-5p miR2592bl
68 miR-984-3p miR-10b
69 miR-986-5p miR-5407
70 miR-986-3p miR-5446
71 miR-987-5p miR-1422q*
72 miR-987-3p miR-3038
73 miR-990-5p miR-571
74 miR-990-3p miR-5623-3p
75 miR-991-5p miR472b
76 miR-991-3p miR-3970
77 miR-992-5p miR-5343
78 miR-992-3p miR-9*
79 miR-994-5p miR-5356b*
80 miR-994-3p miR-2365
81 miR-997-5p miR-3813-3p
82 miR-997-3p miR-5197-5p
83 miR-1001-5p miR-5408c-5p
84 miR-1001-3p miR415
85 miR-1003-5p miR-4570
86 miR-1003-3p miR-3924
87 miR-1004-3p miR-3391
88 miR-1005-3p miR1886.3
89 miR-1007-5p miR-548ah-5p
90 miR-1007-3p miR-452
91 miR-1008-3p miR-888
92 miR-1009-3p miR4222
93 miR-1010-5p miR-2958
94 miR-1010-3p miR-788
95 miR-1011-3p miR869.1
96 miR-1012-5p miR-9b*
97 miR-1012-3p miR-3259
98 miR-1013-3p miR-2245
99 miR-1014-5p miR-2702
100 miR-1014-3p miR-3827-3p
101 miR-1015-3p miR529
102 miR-1016-5p miR157c*
103 miR-1016-3p miR-1759
104 miR-1017-3p miR5285c
105 miR-2279-5p miR-4159-3p
106 miR-2279-3p miR5628
107 miR-2280-5p miR-5092
108 miR-2280-3p miR-750
109 miR-2281-5p miR1030j
110 miR-2281-3p miR5213*
111 miR-2282-3p miR478f
112 miR-2283-5p miR-5697
113 miR-2283-3p miR-4905
114 miR-2489-5p miR-1c*
115 miR-2489-3p miR-4842
116 miR-2490-5p let-7d*
117 miR-2490-3p lin-4*
118 miR-2491-5p miR-1501
119 miR-2491-3p miR-1538
120 miR-2492-5p miR-1d
121 miR-2492-3p miR3637*
122 miR-2493-5p
123 miR-2493-3p miR-375*
124 miR-2494-5p miR-495-3p
125 miR-2494-3p miR-3803-3p
126 miR-2495-5p miR846
127 miR-2495-3p miR-3901-3p
128 miR-2496-5p miR4379
129 miR-2496-3p miR-574-5p
130 miR-2497-5p miR-2062*
131 miR-2497-3p miR-3143
132 miR-2498-5p miR-4098-5p
133 miR-2498-3p miR-4681
134 miR-2499-5p miR-4856b
135 miR-2499-3p miR-124-3p
136 miR-2500-5p miR-4194-3p
137 miR-2500-3p miR-669n
138 miR-2501-5p miR-2382
139 miR-2501-3p miR-2789
140 miR-3641-5p miR-4850
141 miR-3641-3p miR-rL1-6
142 miR-3642-5p miR-551a
143 miR-3642-3p miR-3813-5p
144 miR-3643-5p miR5389
145 miR-3643-3p miR-1a-1*
146 miR-3644-5p miR-4704-5p
147 miR-3644-3p miR-2220-5p
148 miR-3645-5p miR-1420a*
149 miR-3645-3p miR-285*
150 miR-4910-5p miR5537
151 miR-4911-3p miR-H3-5p
152 miR-4912-5p miR-545-5p
153 miR-4913-3p miR5564a
154 miR-4914-5p miR-HSUR5-3p
155 miR-4915-5p miR-98*
156 miR-4916-3p miR-1823
157 miR-2535b-3p miR-3616-3p
158 miR-4917-3p miR780.1
159 miR-4918-5p miR-2752
160 miR-4919-5p miR-10c*
161 miR-4908-3p miR-625-3p
162 miR-4909-3p bantam
163 miR-4939-3p miR1030j
164 miR-4940-5p miR-1018
165 miR-4940-3p miR5511
166 miR-4941-5p miR-449c*
167 miR-4941-3p miR-939
168 miR-4942-3p miR5242
169 miR-4943-5p miR-BART2-3p
170 miR-4943-3p miR-5366
171 miR-4944-5p miR-55*
172 miR-4944-3p miR-1231-5p
173 miR-4945-5p miR1134
174 miR-4946-5p miR-582-3p
175 miR-4947-5p miR482
176 miR-4948-5p miR-1307
177 miR-4948-3p miR-5617-5p
178 miR-4949-5p miR-574-5p
179 miR-4949-3p miR-210
180 miR-4950-5p miR-1493
181 miR-4950-3p miR5554c
182 miR-4951-5p miR-553
183 miR-4951-3p miR-4651
184 miR-4952-5p miR1441
185 miR-4952-3p miR-5360*
186 miR-4953-3p let-7b*
187 miR-4954-5p miR-208*
188 miR-4954-3p miR439j
189 miR-4955-5p
190 miR-4955-3p miR-1649*
191 miR-4956-5p miR-K12-7
192 miR-4956-3p miR-3187-5p
193 miR-4957-5p miR-2277-5p
194 miR-4957-3p miR167b
195 miR-4958-5p miR-760
196 miR-4958-3p miR-3557-3p
197 miR-4959-5p miR-153*
198 miR-4960-3p miR-583
199 miR-4961-5p miR-467g
200 miR-4961-3p miR-3598-3p
201 miR-4962-5p miR-4495
202 miR-4962-3p miR156f*
203 miR-4963-5p miR-76
204 miR-4963-3p miR-4734
205 miR-4964-3p miR-4857
206 miR-4965-5p miR-3354
207 miR-4965-3p miR-M1-2
208 miR-4966-5p miR-4009a-3p
209 miR-4966-3p miR-4541
210 miR-4967-5p miR5509
211 miR-4967-3p miR5238
212 miR-4968-5p miR2634
213 miR-4969-5p miR-1709
214 miR-4969-3p miR-3799
215 miR-4970-5p miR-3075-5p
216 miR-4971-5p miR-1632
217 miR-4972-5p miR-3231
218 miR-4972-3p miR-5608
219 miR-4973-5p miR-3379*
220 miR-4973-3p miR-595
221 miR-4974-5p miR-84*
222 miR-4974-3p miR-1277-5p
223 miR-4975-5p miR-26b*
224 miR-4976-5p miR-4663
225 miR-4976-3p miR-1949
226 miR-4977-5p miR-4445-3p
227 miR-4977-3p miR5659
228 miR-4978-5p miR-4311
229 miR-4979-5p miR-5590-5p
230 miR-4979-3p miR-5011-3p
231 miR-4980-3p miR-2342
232 miR-4981-3p miR858*
233 miR-4982-5p miR-4666b
234 miR-4982-3p miR-4188-5p
235 miR-4983-5p miR-19b*
236 miR-4983-3p miR828
237 miR-4984-5p miR-2448
238 miR-4984-3p miR-4046-3p
239 miR-4985-5p miR4372b
240 miR-4985-3p miR-3401
241 miR-4986-5p miR-1239
242 miR-4986-3p miR5556*
243 miR-4987-5p miR-252a
244 miR-4987-3p miR-1456*
245 miR-5613* miR-4104-5p
246 miR-5613 miR-4104-5p
247 miR-5614* miR-4104-5p
248 miR-5614 miR-1945

Figure 3-5 Mixed top hits returned by BLAST. (A) Top hits returned by BLAST for Dme-miR-7-5p correspond to both miR-7 homologs and miR-3529. (B) Alignment of Dme-miR-7-5p with miRNA homologs returned by BLAST. (C) Other Dme miRNAs listed in Table 3-2 with mixed top hits.

Figure 3-6 Advantage of the SMD algorithm in identifying the correct miRNA homolog for a queried Dme miRNA. (A) Alignment of the top hit returned by BLAST shows that the returned match does not have a seed sequence matching Dme-miR-9c-3p. (B) Owing to weighted features of the SMD algorithm, such as the seed sequence requirement and gap tolerance, the correct homolog is matched to the queried Dme-miR-9c-3p.

Figure 3-7 Gap tolerance beyond the seed sequence to minimize substitutions within the match chosen by SMD. The top hit from BLAST corresponds to tca-miR-304. SMD picks tca-miR-3477 because the overall number of substitutions in the alignment is smaller for the SMD choice.

Figure 3-8 SMD return of divergent miRNA homologs. (A) SMD was given the flatworm let-7 sequences; the C. elegans let-7 sequence is shown as a reference. (B) The alignments returned by SMD for the identified let-7 sequences. (C) Seed-shifted let-7 sequence from Nereis diversicolor (ndi). Divergent nematode miRNA homologs not found in miRBase from Brugia pahangi (bpa) and Haemonchus contortus (hco) are listed with their miRNA family classification. (D) The alignments returned by SMD for the identified homologs.

Table 3-4 Identification of rotifer miRNA homologs by SMD.

miRNA !" #"$%&'(')('&" miRNA homolog !" #"*%)+,,')(""""" -..*/%0'%(" homolog !" 1 miR-125 17 let-7 2 miR-87 !" 18 miR-100 3 miR-1 !" !" 19 miR-79 4 miR-7 !" 20 bantam !" 5 miR-281 !" !" 21 miR-263 6 miR-315 !" 22 miR-184 !" 7 miR-242 !" 23 miR-12 8 miR-190 !" 24 miR-304 !" 9 miR-981 25 miR-9 10 miR-153 !" !" 26 miR-277 !" 11 miR-1175 !" 27 miR-219 !" 12 miR-750 !" 13 miR-375 !" 28 miR-2 !" 14 miR-124 29 miR-71 !" 15 miR-36 30 miR-279 16 miR-29 31 miR-748 !"!"

Figure 3-9 The conserved let-7 cluster in B. manjavacas. (A) The three miRNAs are arranged in tandem within a span of 500 bp in B. manjavacas. (B) The miRNA homolog matches of let-7, miR-100 and miR-125 as assigned by SMD.

Figure 3-10 SMD vs. miRBase (BLAST). (A) The three best matches returned by miRBase for the queried sequence are listed first, ranked by e-value. The second image shows how SMD views and processes the same matches and why the first choice is preferred. Black arrows indicate substitutions between the queried sequence and the conserved miRNA. (B) The same read, sequence-X, was run through both SMD and miRBase. miRBase calls this sequence spu-miR-29, whereas SMD calls it tca-miR-995-3p. If a gap is tolerated, the read shows greater downstream homology to the SMD match than to the miRBase match.

Figure 3-11 Seed shift in a rotifer miRNA homolog. Alignment of miR-281 homologs from Brachionus manjavacas (Bm), Adineta vaga (Av) and Philodina acuticornis (Pa).

Figure 3-12 Sample output from SMD accessory programs. (A) miRreadscollapser takes the portion of a read from its second position to two positions from its end and pulls out all sequences in the output file that contain this string. In the example, SMD finds greater homology to spu-miR-29 for the shorter version of the read. The program identifies these instances so that an incorrect conserved miRNA is not called. (B) miRreadscollapser identifies the read corresponding to miR-184 as a truncated version of the read corresponding to miR-748. The genomic context of the miR-748 read shows that it is the correct sequence. (C) miRseqvars returns any sequence within 4 nt substitutions of the queried read from both the input file used for SMD and the output file. The black arrow indicates the substitution in the sequence that lacks genomic context, making it an editing candidate.
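The two checks in Figure 3-12 reduce to simple string operations. The Python sketch below is a hypothetical re-implementation, not the original miRreadscollapser or miRseqvars code; it assumes that "from its second position to two positions from its end" means 1-based positions 2 through L-2 of a read of length L, and that substitutions are counted only between sequences of equal length. All function names are illustrative.

```python
"""Hypothetical sketch of the two SMD accessory checks in Figure 3-12;
not the original miRreadscollapser / miRseqvars code."""


def inner_core(read: str) -> str:
    """Positions 2 .. L-2 (1-based) of a read of length L (assumed reading)."""
    return read[1:len(read) - 2]


def reads_containing_core(read: str, sequences: list[str]) -> list[str]:
    """Other sequences containing the read's inner core, i.e. length variants
    that could be called as a different conserved miRNA."""
    core = inner_core(read)
    return [s for s in sequences if s != read and core in s]


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length sequences."""
    return sum(1 for x, y in zip(a, b) if x != y)


def substitution_variants(read: str, sequences: list[str], max_subs: int = 4) -> list[str]:
    """Equal-length sequences within max_subs substitutions of the read
    (candidates for sequencing error or RNA editing)."""
    return [
        s for s in sequences
        if s != read and len(s) == len(read) and hamming(s, read) <= max_subs
    ]


if __name__ == "__main__":
    let7 = "UGAGGUAGUAGGUUGUAUAGUU"  # cel-let-7-5p, used only as example input
    pool = ["UGAGGUAGUAGGUUGUAUAGU", "UGAGGUAGUAGGUUGUAUAGUC", "C" * 22]
    print(reads_containing_core(let7, pool))
    print(substitution_variants(let7, pool))
```

Using a plain substring test for the collapser and a Hamming distance for the variant search keeps both checks independent of any aligner.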

Chapter IV

Loss of widely conserved miRNAs let-7 and miR-100 in bdelloid rotifers

ABSTRACT

The role of microRNA (miRNA) regulation in determining cell identity provides a possible mechanism for understanding the phenotypic diversity and successful evolution of bdelloid rotifers, an ancient asexual group of aquatic microinvertebrates. We examined small RNA libraries made from two bdelloid rotifers and a representative of their sexual relatives, a monogonont rotifer. All three rotifers had many conserved miRNAs, with the majority of the conserved miRNA repertoire shared among them. We found the highly conserved miRNAs let-7, miR-100, and miR-125 in the monogonont and determined from genomic data that they occur in tandem. All three miRNAs have a role in developmental timing, most notably the widely conserved let-7, which is critical for the transition towards a differentiated cell state during development and in later life stages. In contrast, both bdelloid libraries contained miR-125, but neither contained let-7 nor miR-100. Northern and qPCR assays of let-7 and let-7 variants failed to reveal this miRNA in total RNA from several bdelloid species. Finally, we identified two miR-125 loci in the genome of a bdelloid but no potential loci for let-7 or miR-100. Regions contiguous with the miR-125 loci did not contain any legitimate pre-miRNA hairpins with homology to either let-7 or miR-100.

Overall, these findings suggest that the remarkable conserved miRNA complement of bdelloids is another feature underlying the unique biology and evolution of this asexual lineage.

The surprising absence of the two earliest metazoan miRNAs is perhaps best understood after considering the evolutionary history and unusual lifestyle of bdelloid rotifers. Bdelloids are microinvertebrates that live in ephemerally aquatic habitats and are consequently prone to frequent desiccation. Studies show bdelloids are remarkably robust to DNA damage, which is most likely an adaptation to their desiccation-prone lifestyle (Gladyshev and Meselson, 2008).

They are the most successful known group of obligately asexual animals.

According to current views of bdelloid evolutionary history based on molecular and cytogenetic evidence, bdelloids diverged from a facultatively sexual class of rotifers known as monogononts approximately 100 million years ago. Following the split of bdelloids and monogononts, whole genome duplication occurred, with subsequent gene loss rendering bdelloids degenerate tetraploids. Today, Class Bdelloidea boasts over 400 species. Morphological variation observed in jaws from independently evolving bdelloid species resembles the diversity of a sexual population. The mechanisms underlying the unexpected degree of phenotypic diversity observed within this group of ancient asexual invertebrates remain a mystery. The instrumental role of small non-coding RNAs (~22 nucleotides) known as microRNAs (miRNAs) in shaping cell identity and contributing to phenotypic diversity provides one possible source of such variation (Jovelin and Cutter, 2011). miRNAs are involved in post-transcriptional gene regulation. In animals, miRNAs most commonly bind to the 3' untranslated region (UTR) of messenger RNA (mRNA) transcripts, resulting in translational repression. As the functional roles of miRNAs were elucidated, so too emerged their conservation, further underscoring their importance within regulatory networks. Most noted among miRNAs for sequence conservation is let-7, one of the first miRNAs to be discovered. It plays a critical role in developmental timing and, in mammals, in cancer (Ambros, 2000; Boyerinas et al., 2010). let-7 was first found to mediate the cell state transition from the late larval stage to adulthood in C. elegans. It has since been shown to play a role in numerous other cellular processes across Metazoa (Ding et al., 2008; Hayes et al., 2006; Zhu et al., 2011). The expression pattern of let-7 and miR-125 was inherited from the oldest conserved miRNA, miR-100, which also plays a key role in development; all three miRNAs arose early in animal evolution (Christodoulou et al., 2010). There is also evidence from many taxa that these miRNAs are coordinately expressed (Bashirullah et al., 2003; Christodoulou et al., 2010).

With the ever-increasing sampling of miRNAs throughout Metazoa, the conservation and evolution of the let-7 cluster containing both miR-125 and miR-100 is becoming clearer. Hertel et al. and others have elaborated on this subject, culling information reported from numerous small RNA surveys. Their study concludes that the bilaterian ancestor possessed an intact let-7 cluster, which contained both miR-100 and miR-125. Following the split of protostomes and deuterostomes, the cluster underwent duplication in vertebrates (coincident with whole genome duplication), while in the protostome branch rearrangement and/or loss was more commonly observed. However, in all such cases let-7 was found to be conserved within a phylum. In the instances where miR-125 or miR-100 was lost, existing members of their miRNA families were found (e.g., miR-51 in C. elegans belongs to the miR-100 family). As evidenced by these and numerous other studies, miRNAs that arose early in animal evolution have become integrated into many regulatory networks, as supported by miRNA::target interaction data from diverse species. This provides the functional basis for their conservation. The observed expansion of miRNA repertoires at specific nodes of animal evolution also bolsters the idea that miRNAs play an integral role in the evolution of the animal body plan (Wheeler et al., 2009). The evolution of the body plan requires new cell types. As new cell types arise, they also inherit the expression pattern of conserved miRNAs. As a result, highly conserved miRNAs are broadly expressed across numerous cell types (Christodoulou et al., 2010).

Materials and Methods

Small RNA Library Construction and Sequencing

Whole animals representing all stages of development were collected from Philodina acuticornis, Brachionus manjavacas and Adineta vaga cultures. Total RNA was extracted using Trizol. Approximately 200-300 µg of total RNA was used to construct small RNA libraries for B. manjavacas and P. acuticornis following the protocol described in Wheeler et al. (2009), and these libraries were sequenced using 454 sequencing technology. Approximately 100 µg of total RNA from A. vaga was used following the library construction protocol of Lau et al., and the library was deep sequenced on an Illumina GA IIx (Lau et al., 2001).


Northern analysis

Approximately 10 µg of total RNA extracted from A. vaga and from B. manjavacas, representing all life stages, was run on a 15% TBE-urea denaturing gel, then transferred onto a nylon membrane and chemically crosslinked using 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide. Hybridization conditions were adapted from Reinhart et al. Briefly, probes were end-labeled with [γ-32P]ATP using T4 polynucleotide kinase. The let-7 probe used for Northern blot assays was taken from Pasquinelli et al. (Pasquinelli et al., 2003). The oligonucleotide probe for miR-87 is as follows: 5’ CACTCGTTTCAAAATCCACAT 3’.

Quantitative-PCR Detection of miRNAs

For detection of miRNAs uncovered from sequencing surveys, the commercially available miScript assay (Qiagen) was used. Custom primers to bdelloid- and monogonont-specific miRNAs were ordered as part of the kit, although only a single probe was used for detection of each miRNA. A list of sequences can be found in Table 4-1. For all reactions, 1 µg of total RNA was used. Briefly, the assay employs a poly(A) polymerase to polyadenylate small RNAs (including miRNAs); cDNA is then made using an oligo-dT primer that carries a tag on its 5’ end. The complement to this tag is used as the reverse primer, while the forward primer is the exact sequence of the miRNA.

Bioinformatic processing of sequence reads

454 and Illumina reads were filtered to exclude those shorter than 17 nt or longer than 25 nt; the remaining data were collapsed to unique sequences, each with the number of reads representing it. Illumina reads were additionally filtered to retain only those with an average quality score above 30. All size-filtered reads were processed using SMD. Briefly, SMD compares sequences to conserved miRNAs in miRBase and identifies database sequences as potential homologs of a query if they meet three criteria: 1) the lengths of the query and subject sequences must match within 4 nt starting from the seed position; 2) positions 2-7 of the subject sequence, corresponding to the seed sequence, must match the query exactly; and 3) the rest of the alignment must contain no more than 4 mismatches, including the allowance of a single deletion or insertion. For secondary structure prediction, reads were counted if they showed at least 90% identity to genomic loci (Nygaard et al., 2009). The secondary structure of all identified miRNAs was predicted by mFOLD (Markham and Zuker, 2005). Small RNA reads were mapped to the miR-125 loci using Bowtie from the Galaxy software package (http://galaxy.psu.edu).
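The preprocessing steps and the three SMD criteria described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not SMD itself: quality strings are assumed to be Phred+33 encoded, criterion 1 is read as a length difference of at most 4 nt, and the post-seed tails (position 8 onward) are compared over their shared length with at most one single-nucleotide indel, which counts toward the 4 allowed differences. All function names are hypothetical.

```python
"""Hypothetical sketch of read preprocessing and the three SMD matching
criteria; function names are illustrative, not SMD's own."""
from collections import Counter

SEED = slice(1, 7)  # seed = positions 2-7 (1-based) of a mature miRNA


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_and_collapse(reads, min_len=17, max_len=25, min_mean_q=None):
    """Keep 17-25 nt reads (optionally with mean quality > min_mean_q) and
    collapse them into unique sequences with read counts."""
    counts = Counter()
    for seq, qual in reads:
        if not min_len <= len(seq) <= max_len:
            continue
        if min_mean_q is not None and mean_phred(qual) <= min_mean_q:
            continue
        counts[seq] += 1
    return counts


def _mismatches(x: str, y: str) -> int:
    """Mismatches over the shared (shorter) length of two sequences."""
    return sum(1 for a, b in zip(x, y) if a != b)


def tail_differences(q: str, s: str) -> int:
    """Differences between post-seed tails, allowing at most one
    single-nucleotide indel (counted as one difference)."""
    best = _mismatches(q, s)
    for x, y in ((q, s), (s, q)):
        for i in range(len(x)):
            best = min(best, 1 + _mismatches(x[:i] + x[i + 1:], y))
    return best


def smd_like_match(query: str, subject: str, max_len_diff: int = 4,
                   max_diffs: int = 4) -> bool:
    """Criteria 1-3: similar length, identical seed, <= 4 tail differences."""
    if abs(len(query) - len(subject)) > max_len_diff:
        return False
    if query[SEED] != subject[SEED]:
        return False
    return tail_differences(query[7:], subject[7:]) <= max_diffs


if __name__ == "__main__":
    dme_let7 = "UGAGGUAGUAGGUUGUAUAGU"   # dme-let-7-5p
    cel_let7 = "UGAGGUAGUAGGUUGUAUAGUU"  # cel-let-7-5p
    print(smd_like_match(dme_let7, cel_let7))  # True: same seed, shared tail
```

Trying each single deletion explicitly is quadratic in tail length, which is negligible for ~15-nt tails and keeps the "one indel" rule easy to audit.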

miRNA:Target prediction

RNAhybrid was used with default settings and no energy cutoff to find putative binding sites for the identified miRNAs (Kruger and Rehmsmeier, 2006). Binding sites that displayed either pairing within the seed sequence with partial complementarity downstream, or compensatory pairing downstream, were considered valid.
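The acceptance rule for predicted binding sites can be expressed as a small predicate over which miRNA positions are paired. This sketch is hypothetical: it assumes the RNAhybrid duplex has already been parsed into per-position pairing flags, and the thresholds (3 paired downstream positions, a run of 4 contiguous compensatory pairs, at most 1 unpaired seed position) are illustrative choices, not values given in the text.

```python
"""Hypothetical sketch of the site-acceptance rule. Pairing flags would have
to be parsed from RNAhybrid output; all thresholds are assumptions."""


def longest_run(flags) -> int:
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def is_valid_site(paired, min_downstream=3, min_compensatory=4, max_seed_gaps=1):
    """paired[i] is True when miRNA position i + 1 (1-based) is base paired."""
    seed, downstream = paired[1:7], paired[7:]
    if all(seed):
        # full seed pairing plus partial complementarity downstream
        return sum(downstream) >= min_downstream
    if seed.count(False) > max_seed_gaps:
        return False
    # imperfect seed rescued by contiguous compensatory pairing downstream
    return longest_run(downstream) >= min_compensatory
```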


Results

The absence of let-7 and miR-100 in bdelloid small RNA libraries

The 454 libraries consisted of approximately 30,000 reads following size filtration (out of ~60,000 initial reads), while the Illumina library consisted of approximately 5.3 million reads; these reads were searched for homology to reported conserved miRNAs. Comparison of the conserved miRNA repertoires of the monogonont and bdelloids revealed many conserved miRNAs with the characteristic hairpin secondary structure in all three rotifer libraries, with the majority of miRNAs shared between the groups (Tables 4-2, 4-3, Appendix A). Several miRNAs were bdelloid-specific, and bdelloid miRNA homologs followed the trend noted in the previous chapter of greater sequence divergence from their closest homolog match (Tables 4-2, 4-3). In the A. vaga small RNA library, sequences corresponding to miR-748 and miR-71 were the fourth and seventh most abundant reads, respectively. However, the general abundance of other small RNA species in the miRNA size range inflated the overall read count for the library and masked this result. Both miR-100 and let-7 were noticeably absent from both bdelloid libraries. A comparison of the miR-125 sequences from the two bdelloid species and the monogonont showed only a few nucleotide changes among the three rotifer species (Figure 4-1A). The monogonont sequences of let-7 and miR-100 showed sequence similarity to known miRNA homologs (Figure 4-1B,C). In B. manjavacas, miR-125 was found in a genomic cluster with miR-100 and let-7. The common ancestor of the bdelloids and monogononts likely possessed this intact cluster. Therefore, sequences homologous to B. manjavacas let-7 and miR-100 would be expected, if present, in the bdelloid small RNA libraries.

Detection of identified conserved miRNAs

Select miRNAs identified from sequencing surveys were amplified using qPCR. qPCR reactions profiled the expression of let-7, miR-100, and miR-125 in all three rotifer species. Amplification was detected for miR-125 in all three species; however, no amplification could be detected in the miR-100 and let-7 reactions using RNA from P. acuticornis or A. vaga (Figures 4-2, 4-3). The data are displayed as lists of Ct values, where miRNAs represented in the sequencing surveys (i.e., miR-87 and miR-125) serve as a relative measure of expression. A comparison of the Ct values across all reactions shows that expression of the detected miRNAs was comparable among them and concordant with the sequencing results; expression of miR-100 and let-7 could not be confirmed in the bdelloid samples. The absence of let-7 has been shown in the cnidarian Nematostella vectensis (Pasquinelli et al., 2000), and therefore its cDNA was used as a negative control with the let-7 probe. The amplification plots for all reactions are shown in Figure 4-4. A Northern blot was also performed using total RNA from A. vaga and B. manjavacas. The assay tested for simple hybridization to miRNAs using labeled oligonucleotides complementary to let-7 and miR-87; miR-87 served as a positive control. Again, both miR-87 and let-7 were detected in the monogonont, B. manjavacas, but no signal was observed for let-7 in the bdelloid, A. vaga (Figure 4-4).
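The Ct values above are compared only qualitatively. One common way to place such comparisons on a relative scale, not one the text states was used, is the ΔCt method against a reference miRNA such as miR-125. The sketch assumes roughly 100% amplification efficiency (one doubling per cycle), treats undetected reactions as missing, and uses made-up Ct values rather than data from this study.

```python
"""Hypothetical delta-Ct sketch; assumes ~100% amplification efficiency.
The Ct values are invented for illustration."""
from typing import Optional


def delta_ct(ct_target: Optional[float], ct_reference: float) -> Optional[float]:
    """Ct(target) - Ct(reference); None when the target was not detected."""
    return None if ct_target is None else ct_target - ct_reference


def relative_level(ct_target: Optional[float], ct_reference: float) -> Optional[float]:
    """2 ** -deltaCt: target abundance relative to the reference miRNA."""
    d = delta_ct(ct_target, ct_reference)
    return None if d is None else 2.0 ** (-d)


if __name__ == "__main__":
    reference_ct = 24.0  # e.g. a miR-125 reaction (hypothetical value)
    for name, ct in {"miR-87": 25.0, "let-7": None}.items():
        print(name, relative_level(ct, reference_ct))
```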

Reduced stringency search for candidate bdelloid let-7 and miR-100

The maintenance of the let-7 sequence across Bilateria is the most often cited example of miRNA sequence conservation. Rotifers are protostomes. A survey of the protostome let-7 sequences reported in miRBase revealed that the let-7 sequences in Platyhelminthes were greatly diverged from the other protostome let-7 sequences (Figure 4-5A). While these sequences all bear the canonical let-7 seed, they exhibit marked sequence divergence downstream in comparison to other let-7 sequences. As mentioned in the previous chapter, the let-7 designation of these miRNA sequences appears to be contentious, and they may require further functional validation before they can be accepted as legitimate let-7 homologs (Hertel et al., 2006). Nonetheless, these divergent sequences were included in a regular expression search for the let-7 sequence. The regular expression was constructed by aligning the known orthologs of the let-7 sequence. All polymorphisms at each position were reflected in the expression, and only the seed sequence was fixed. In this way, the bdelloid small RNA reads were searched for putative let-7 homologs that met this loosest possible criterion for let-7 sequence identity. The search revealed a sequence, Av-sme-let-7a, which was represented by only 7 reads in the deep-sequenced Illumina library (Figure 4-5B). This sequence failed to meet the minimum read count required by miRBase criteria (Kozomara and Griffiths-Jones, 2011) and also failed secondary structure criteria (Figure 4-5C). Compensatory downstream base pairing has been reported to overcome a mismatch within the seed sequence, possibly allowing for alteration of the seed sequence (Yekta et al., 2004). Stringency was therefore relaxed further to find a viable let-7 sequence, allowing one mismatch within the seed sequence in an attempt to uncover any “let-7-like” sequences in the small RNA library. The search turned up only a few candidates that met the relaxed criterion, the most abundant of which, Av-sme-let-7d-like, was represented by 30 reads in the deep-sequenced Illumina library (Figure 4-5B). Genomic data also invalidated these candidates: the genomic sequence surrounding the putative miRNAs failed to fold into the characteristic hairpin secondary structure of a precursor miRNA (Figure 4-5C). The same exhaustive search approach was repeated for miR-100 candidates within the small RNA Illumina library. This search failed to return any candidates even bearing the miR-100 seed sequence. The secondary structure of Av-miR-125, the only member of the let-7 cluster found in A. vaga, is also shown to demonstrate that precursors of bdelloid miRNAs do exhibit the characteristic hairpin structure (Figure 4-5C).
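The two let-7 searches described above can be sketched in Python as follows. This is an assumption-laden illustration, not the script used in this work: the orthologs are assumed to be pre-aligned and gap-padded with '-', the seed is fixed to the first (reference) sequence, and the third aligned sequence is an invented variant included only to produce a polymorphic column. The first two sequences are dme-let-7-5p and cel-let-7-5p.

```python
"""Hypothetical sketch of the regular-expression and relaxed-seed searches;
names and the invented third ortholog are illustrative only."""
import re

SEED_COLS = range(1, 7)  # alignment columns holding seed positions 2-7


def ortholog_pattern(aligned: list[str]):
    """Column-wise character classes from a gap-padded ('-') alignment,
    with the seed fixed to the reference (first) sequence."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to a common length")
    parts = []
    for col in range(length):
        if col in SEED_COLS:
            parts.append(aligned[0][col])  # seed base held fixed
            continue
        bases = sorted({s[col] for s in aligned} - {"-"})
        if not bases:
            continue  # all-gap column
        token = bases[0] if len(bases) == 1 else "[" + "".join(bases) + "]"
        if any(s[col] == "-" for s in aligned):
            token = "(?:" + token + ")?"  # column absent in some orthologs
        parts.append(token)
    return re.compile("".join(parts))


def seed_variants(seed: str, alphabet: str = "ACGU") -> set[str]:
    """The seed itself plus every sequence one substitution away from it."""
    variants = {seed}
    for i, base in enumerate(seed):
        for alt in alphabet:
            if alt != base:
                variants.add(seed[:i] + alt + seed[i + 1:])
    return variants


def relaxed_seed_hits(reads, seed: str) -> list[str]:
    """Reads whose positions 2-7 are within one mismatch of the seed."""
    allowed = seed_variants(seed)
    return [r for r in reads if r[1:7] in allowed]


if __name__ == "__main__":
    aligned = [
        "UGAGGUAGUAGGUUGUAUAGU-",  # dme-let-7-5p
        "UGAGGUAGUAGGUUGUAUAGUU",  # cel-let-7-5p
        "UGAGGUAGUCGGUUGUAUAGUU",  # invented variant (illustration only)
    ]
    pattern = ortholog_pattern(aligned)
    print(pattern.pattern)
    print(bool(pattern.fullmatch("UGAGGUAGUCGGUUGUAUAGU")))
```

Using `fullmatch` makes the trailing optional column absorb 3' length variation without letting longer unrelated reads match.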

A. vaga miR-125 genomic loci missing let-7 and miR-100

The above findings indicate that let-7 and miR-100 are not expressed in A. vaga.

As the draft genome of A. vaga became available (Flot et al., in prep.), the regions surrounding the miR-125 loci were interrogated for the presence or absence of let-7 and miR-100 within their expected genomic context. Two allelic variants of miR-125 exist in the A. vaga genome; each allele was found in three copies within the assembled degenerate tetraploid genome (Table 4-4). Each copy had the predicted hairpin structure of the miR-125 precursor. However, the upstream regions of only 2 loci, one of each allele, could be inspected for let-7 and miR-100 candidates (Figure 4-6). The syntenic region from a B. manjavacas miR-125 locus was aligned to the two A. vaga miR-125 loci. In B. manjavacas, miR-100 and let-7 occur less than 400 bp upstream of miR-125. Small RNA reads from the deep-sequenced A. vaga small RNA library were mapped to the regions 1000 bp upstream of the miR-125 loci in A. vaga; the only reads that mapped to these regions corresponded to miR-125. Other attempts to look for let-7 within the A. vaga miR-125 loci included taking the regions surrounding any miR-100 or let-7 seed-containing sequences and determining whether they folded into the precursor hairpin secondary structure; this attempt did not turn up any candidates. The full alignment of the three rotifer miR-125 loci is shown to display the absence of sequences resembling miR-100 and let-7 (Figure 4-6). The two inspected miR-125 loci in the A. vaga genome both possessed genes approximately 1000 bp upstream (Table 4-4), and predicted genes were also found downstream of these two loci. Of the four remaining loci, two had predicted ORFs downstream, but their upstream regions were not part of the assembly.
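Extracting the upstream window and checking it for reads is straightforward. In this hypothetical sketch, exact substring matching stands in for the Bowtie alignment actually used, coordinates are assumed to be 0-based and half-open, and a minus-strand locus is handled by reverse-complementing its flank so the region is returned in the locus's own orientation. The function names are illustrative.

```python
"""Hypothetical sketch: upstream flank extraction and exact read matching
(a stand-in for Bowtie). Coordinates are 0-based, half-open."""

DNA_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def upstream_region(contig: str, start: int, end: int, strand: str,
                    flank: int = 1000) -> str:
    """Sequence 5' of the locus [start, end), in the locus's orientation."""
    if strand == "+":
        return contig[max(0, start - flank):start]
    return revcomp(contig[end:end + flank])


def reads_in_region(region: str, reads) -> list[str]:
    """Reads (RNA alphabet) that occur exactly within the DNA region."""
    return [r for r in reads if r.replace("U", "T") in region]
```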

There still remains the possibility of a rearrangement within the bdelloid let-7 cluster following the split between bdelloids and monogononts. Such a rearrangement is hypothesized to have occurred only once, in the Caenorhabditis let-7 cluster. To test whether this was the case in bdelloids, the same regular expression used for the results in Figure 4-5 was reapplied to search the genomic data for any miR-100 and let-7 candidates (Figure 4-7A). Again, no miR-100 candidates met these criteria, and the let-7 candidate returned did not have the proper secondary structure (Figure 4-7B).

let-7 binding sites on the 3’ UTR of its conserved targets

The absence of let-7 at both the transcriptional and genomic levels raises questions about the regulation of let-7 targets in bdelloids. The 3’ UTR of a conserved let-7 target, lin-41, was examined in A. vaga for let-7 binding sites, and the 3’ UTR of B. manjavacas lin-41 was used for comparison (Figure 4-8). The average length of bdelloid 3’ UTRs has yet to be determined; however, monogonont 3’ UTRs were found to range from 100 nt to 400 nt in length (Suga et al., 2007). The lin-41 sequences were aligned according to their conserved C-terminal NHL repeat (Figure 4-8A). Because the average length of bdelloid 3’ UTRs is uncertain, the 1000 bases following the stop codon of each miRNA target were searched for potential miRNA binding sites. Binding sites were reported if the minimum requirements for miRNA::target interaction were met (Elkayam et al., 2012). Within 500 bases of the B. manjavacas lin-41 3’ UTR, three potential let-7 binding sites were found using the B. manjavacas let-7 sequence (Figure 4-8B). Three copies of lin-41 were found in the A. vaga genome and confirmed by reciprocal BLAST matches (Figure 4-9). Their 3’ UTRs were interrogated for Bm-let-7 binding sites, and single or multiple binding sites were discovered within the 3’ UTR of each copy of A. vaga lin-41. However, the pairing of let-7 and lin-41 in A. vaga was markedly different from the putative binding sites found within the B. manjavacas lin-41 3’ UTR. Closer examination of these miRNA::target interactions revealed multiple G:U pairs within the seed region of nearly all predicted let-7 binding sites, a feature that has been shown to reduce the level of miRNA-mediated repression (Doench and Sharp, 2004). In addition, characterized let-7 binding sites are often repeated within the 3’ UTRs of their targets (Reinhart et al., 2000; Vella et al., 2004), as seen with the predicted let-7 binding sites in the 3’ UTR of monogonont lin-41 (Figure 4-8B). The predicted A. vaga binding sites lack this repetition, even though it is evident within a single 3’ UTR of monogonont lin-41. Finally, the lin-41 3’ UTRs from A. vaga and B. manjavacas were aligned to look for conserved motifs matching the seed sequence of the let-7 binding site shown in Figure 4-8B. No conserved motifs could be identified within the alignment.
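The seed-site scan and the G:U check described above can be sketched in Python. This is a minimal illustration, not the pipeline used in this study: all helper names are hypothetical, the seed is assumed to be miRNA nucleotides 2-7 (the seed hexamer referred to in this section), and only seed pairing is scored, not the full interaction requirements of Elkayam et al. (2012). The UTR in the usage line is a toy sequence, not real data.

```python
from __future__ import annotations

# Hypothetical helpers for illustration; not the software used in this study.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (U is read as T)."""
    seq = seq.upper().replace("U", "T")
    return "".join(DNA_COMPLEMENT[base] for base in reversed(seq))


def utr_window(transcript: str, stop_index: int, length: int = 1000) -> str:
    """Up to `length` nt after the stop codon; `stop_index` is the 0-based
    position of the first base of the stop codon."""
    start = stop_index + 3
    return transcript[start:start + length]


def seed_sites(utr: str, mirna: str) -> list[int]:
    """0-based start positions of perfect matches to the seed (miRNA nt 2-7)."""
    site = reverse_complement(mirna[1:7])
    utr = utr.upper().replace("U", "T")
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


def classify_seed_pairs(mirna: str, site: str) -> list[str]:
    """Label each seed position (nt 2-7) as 'WC', 'GU' or 'mismatch'.
    `site` is the 6-nt target written 5'->3'; it pairs antiparallel with the seed."""
    seed = mirna.upper().replace("T", "U")[1:7]
    target = site.upper().replace("T", "U")[::-1]  # 3' end of site faces seed nt 2
    labels = []
    for m, t in zip(seed, target):
        if (m, t) in WATSON_CRICK:
            labels.append("WC")
        elif (m, t) in WOBBLE:
            labels.append("GU")
        else:
            labels.append("mismatch")
    return labels


bm_let7 = "TGAGGTAGTTGGTTGTATGGTT"  # B. manjavacas let-7 (Table 4-3)
print(seed_sites("AAATACCTCAAATACCTCA", bm_let7))  # [3, 12]
print(classify_seed_pairs(bm_let7, "TACTTC"))      # ['WC', 'WC', 'GU', 'WC', 'WC', 'WC']
```

In this sketch, a site that only matches the seed when G:U pairs are tolerated shows up with one or more `'GU'` labels, which is the kind of interaction flagged for the predicted A. vaga lin-41 sites.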

The 3’ UTRs of other conserved let-7 targets, such as hbl-1 and dicer-1, were also retrieved from the genome and searched for let-7 binding sites (Grosshans et al., 2005; Saito K, 2005) (Figure 4-10). Four copies each of hbl-1 and dicer-1 were identified in the genome; however, let-7 binding sites could not be identified in all of their 3’ UTRs. The binding sites that most closely resembled canonical let-7 binding sites are shown in Figures 4-10A and 4-10B. Canonical let-7 binding sites also could not be identified within the 3’ UTRs of the remaining two copies of A. vaga hbl-1 and dicer-1. Tandem let-7 binding sites were not identified in the 3’ UTRs of targets that possessed at least one let-7 binding site. The average GC content of the A. vaga 3’ UTRs was 30%. The only strict requirement for pairing was the occurrence of the let-7 seed sequence hexamer, and at this GC content such a hexamer is expected to occur by chance approximately once in 7000 bases. Because the cumulative length of all interrogated let-7 target regions exceeds 7000 nt, the presence of let-7 binding sites in some target regions was not considered statistically significant.
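The "once in 7000 bases" estimate can be reproduced with a short calculation. This is a sketch under stated assumptions: bases are treated as independent, G and C each occur at half the 30% GC content (A and T at 35% each), only the mRNA strand is scanned, and the site is the 6-nt complement of the let-7 seed (nt 2-7 of GAGGUA...). The function name is hypothetical.

```python
def site_probability(site: str, gc: float) -> float:
    """Chance that a random position matches `site`, assuming independent bases
    with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2."""
    base_prob = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in site.upper().replace("U", "T"):
        prob *= base_prob[base]
    return prob


site = "TACCTC"            # target-strand match to the let-7 seed GAGGUA (nt 2-7)
p = site_probability(site, gc=0.30)
spacing = 1 / p            # mean spacing between chance matches, in nt
length = 7000              # cumulative length of interrogated UTR sequence (nt)
expected = p * (length - len(site) + 1)
print(f"one chance match per ~{spacing:,.0f} nt; expected in {length} nt: {expected:.2f}")
# one chance match per ~6,911 nt; expected in 7000 nt: 1.01
```

Because the site is AT-rich (three T/A and three C), it is more common in a 30% GC background than the naive 1 in 4^6 = 4096 would suggest for C-rich sites and less common than that for A/T-only sites; with roughly one chance match expected per 7000 nt, a few sites across the pooled target regions are unremarkable.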

The absence of let-7’s negative regulator

The negative regulator of let-7, lin-28, possesses two conserved RNA-binding domains: a cold shock domain and a pair of retroviral-type CCHC zinc knuckles (Lightfoot et al., 2011; Nam et al., 2011). Searches of both monogonont and bdelloid cDNAs did not return any lin-28 candidates, and a search of the A. vaga genome also failed to turn up any positive hits.

Discussion

The accumulated evidence for the loss of let-7 and miR-100 at the transcriptomic and genomic levels in the bdelloid A. vaga, together with the inability to detect either miRNA in the distantly related bdelloid P. acuticornis, suggests that these widely conserved miRNAs were lost early in the evolutionary history of bdelloids. For the shared rotifer miRNAs listed in Table 4-2, sequences differ by no more than 4 nt between monogonont and bdelloid homologs. The ability of single probes to amplify shared miRNAs across three rotifer species shows that this degree of sequence similarity is sufficient for probe-based detection. Furthermore, miR-125 and let-7 are known to regulate the same target, hbl-1 (albeit through separate binding sites), at different time points during development in C. elegans (Grosshans et al., 2005); in well-characterized systems, let-7 and miR-125 are intertwined in a complex regulatory network. The existence of polycistronic let-7 clusters containing miR-100 and miR-125 in other species supports the suggestion that the genomic cluster of miR-100, let-7, and miR-125 in B. manjavacas is similarly polycistronic.
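The "no more than 4 nt" comparison can be checked directly for miR-125 using the mature sequences listed in Table 4-3. This is a minimal sketch assuming an ungapped, position-by-position comparison of equal-length sequences; the dictionary and function names are illustrative only.

```python
from itertools import combinations

# Mature miR-125 sequences as listed in Table 4-3 (DNA alphabet).
MIR125 = {
    "Bm": "TCCCTGAGACCCTAACTTGTGA",
    "Pa": "ACCCTGAGACCCTAACTTGAGA",
    "Av": "ACCCTGAGACCTTAATTTGAGA",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    return sum(x != y for x, y in zip(a, b))


for (sp1, seq1), (sp2, seq2) in combinations(MIR125.items(), 2):
    print(sp1, sp2, hamming(seq1, seq2))
# Bm Pa 2
# Bm Av 4
# Pa Av 2
```

The two Pa/Av differences fall at positions 12 and 16 (C in Pa, T in Av); the A. vaga sequence in Table 4-3 carries T at position 16, i.e. it corresponds to the T16 allele.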

Therefore, it is unlikely that sequence conservation constraints would be relaxed for let-7 and miR-100 while still being exerted on miR-125 in bdelloids. This is further supported by the lack of sequences resembling either of the lost miRNAs: the small RNA libraries do not contain candidates with the same seed sequences as miR-100 and let-7. Cumulatively, experimental and bioinformatic lines of evidence both support the loss of let-7 and miR-100 in bdelloids.

The copy numbers of miR-125 and of let-7 targets in the A. vaga genome are intriguing in the context of let-7 absence. The predicted let-7 binding sites among the A. vaga let-7 targets appear questionable because their predicted miRNA::target interactions have features, such as G:U pairs in the seed, that are believed to compromise miRNA-mediated repression.

However, the maintenance of these targets in more than two copies, as well as the multiple loci of miR-125, may relate to features that have contributed to the successful asexual evolution of bdelloids. Approximately 77% of genes were found in two copies in the A. vaga genome. The multiple copy number of genes may be significant because the bdelloid genome has lost only genes related to sexual reproduction. Perhaps, whereas genes related to sexual reproduction were lost, genes that favored successful asexual evolution, such as the miR-125 and let-7 targets, have conversely been enriched.

It is conceivable that, during a desiccation event early in bdelloid history, a microhomology-mediated deletion took place within the let-7 cluster, leaving miR-125 intact while eliminating miR-100 and let-7. The consequences of this conserved miRNA loss are especially intriguing when considering the biology of bdelloids and what makes them unique among metazoans. let-7 is deeply integrated within the DNA damage response pathway: it has been shown to be downregulated in response to DNA damage, and the let-7 targets Ras, Rad51, cyclin D1, and Rad18 play integral roles in responding to DNA damage (Saleh et al., 2011). Another key component of the DNA damage response pathway, p53, downregulates miRNA-processing machinery such as Dicer, Drosha, and DGCR8 (Nittner et al., 2012). Although little is known about the desiccation process in bdelloids, their ability to repair considerable DNA damage is thought to ensure their survival during this period; they are the most radioresistant metazoans known (Gladyshev and Meselson, 2008). Taken together, the absence of let-7 and the remarkable DNA repair ability of bdelloids suggest that dispensing with let-7 might have been beneficial for the bdelloid lifestyle, which requires extreme robustness to DNA damage. The absence of let-7’s negative regulator, lin-28, may be a synapomorphy of Rotifera, which may have necessitated the loss of let-7 in the early ancestor of bdelloids. This may have facilitated a quicker and more effective response to DNA damage during desiccation.

Data on let-7 expression in asexual organisms are comparatively scarce; however, a survey of parthenogenetic and sexual aphids found that let-7 expression was markedly upregulated in the sexual strain compared with the parthenogenetic one (Legeai et al., 2010). How this relates to the ancient asexuality of bdelloids remains to be seen, but miRNAs, or perhaps their absence, may be the link between both aspects of bdelloid biology.

References

Christodoulou, F., Raible, F., Tomer, R., Simakov, O., Trachana, K., Klaus, S., Snyman, H., Hannon, G.J., Bork, P., and Arendt, D. (2010). Ancient animal microRNAs and the evolution of tissue identity. Nature 463, 1084-1088.

Elkayam, E., Kuhn, C.D., Tocilj, A., Haase, A.D., Greene, E.M., Hannon, G.J., and Joshua-Tor, L. (2012). The structure of human argonaute-2 in complex with miR-20a. Cell 150, 100-110.

Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B., and Bartel, D.P. (2005). The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310, 1817-1821.

Fontaneto, D., Herniou, E.A., Boschetti, C., Caprioli, M., Melone, G., Ricci, C., and Barraclough, T.G. (2007). Independently evolving species in asexual bdelloid rotifers. PLoS biology 5, e87.

Gladyshev, E., and Meselson, M. (2008). Extreme resistance of bdelloid rotifers to ionizing radiation. Proc Natl Acad Sci U S A 105, 5139-5144.

Grosshans, H., Johnson, T., Reinert, K.L., Gerstein, M., and Slack, F.J. (2005). The temporal patterning microRNA let-7 regulates several transcription factors at the larval to adult transition in C. elegans. Developmental cell 8, 321-330.

Hertel, J. (2012). Evolution of the let-7 microRNA Family. RNA biology 9:3, 1-11.

Hertel, J., Lindemeyer, M., Missal, K., Fried, C., Tanzer, A., Flamm, C., Hofacker, I.L., and Stadler, P.F. (2006). The expansion of the metazoan microRNA repertoire. BMC Genomics 7, 25.

Jovelin, R., and Cutter, A.D. (2011). MicroRNA sequence variation potentially contributes to within-species functional divergence in the nematode Caenorhabditis briggsae. Genetics 189, 967-976.

Kosik, K.S. (2010). MicroRNAs and cellular phenotypy. Cell 143, 21-26.

Kozomara, A., and Griffiths-Jones, S. (2011). miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic acids research 39, D152-157.

Kruger, J., and Rehmsmeier, M. (2006). RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic acids research 34, W451-454.

Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.

Legeai, F., Rizk, G., Walsh, T., Edwards, O., Gordon, K., Lavenier, D., Leterme, N., Mereau, A., Nicolas, J., Tagu, D., et al. (2010). Bioinformatic prediction, deep sequencing of microRNAs and expression analysis during phenotypic plasticity in the pea aphid, Acyrthosiphon pisum. BMC Genomics 11, 281.

Lightfoot, H.L., Bugaut, A., Armisen, J., Lehrbach, N.J., Miska, E.A., and Balasubramanian, S. (2011). A LIN28-dependent structural change in pre-let-7g directly inhibits dicer processing. Biochemistry 50, 7514-7521.

Mark Welch, D.B., Mark Welch, J.L., and Meselson, M. (2008). Evidence for degenerate tetraploidy in bdelloid rotifers. Proc Natl Acad Sci U S A 105, 5145-5149.

Mark Welch, J.L. (1998). Karyotypes of bdelloid rotifers from three families. Hydrobiologia 387/388, 403-407.

Markham, N.R., and Zuker, M. (2005). DINAMelt web server for nucleic acid melting prediction. Nucleic acids research 33, W577-581.

Minoche, A.E., Dohm, J.C., and Himmelbauer, H. (2011). Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol 12, R112.

Nam, Y., Chen, C., Gregory, R.I., Chou, J.J., and Sliz, P. (2011). Molecular basis for interaction of let-7 microRNAs with Lin28. Cell 147, 1080-1091.

Nittner, D., Lambertz, I., Clermont, F., Mestdagh, P., Kohler, C., Nielsen, S.J., Jochemsen, A., Speleman, F., Vandesompele, J., Dyer, M.A., et al. (2012). Synthetic lethality between Rb, p53 and Dicer or miR-17-92 in retinal progenitors suppresses retinoblastoma formation. Nature cell biology 14, 958-965.

Nygaard, S., Jacobsen, A., Lindow, M., Eriksen, J., Balslev, E., Flyger, H., Tolstrup, N., Moller, S., Krogh, A., and Litman, T. (2009). Identification and analysis of miRNAs in human breast cancer and teratoma samples using deep sequencing. BMC medical genomics 2, 35.

Olsen, P.H., and Ambros, V. (1999). The lin-4 regulatory RNA controls developmental timing in Caenorhabditis elegans by blocking LIN-14 protein synthesis after the initiation of translation. Dev Biol 216, 671-680.

Pasquinelli, A.E., McCoy, A., Jimenez, E., Salo, E., Ruvkun, G., Martindale, M.Q., and Baguna, J. (2003). Expression of the 22 nucleotide let-7 heterochronic RNA throughout the Metazoa: a role in life history evolution? Evol Dev 5, 372-378.

Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M.I., Maller, B., Hayward, D.C., Ball, E.E., Degnan, B., Muller, P., et al. (2000). Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86-89.

Saleh, A.D., Savage, J.E., Cao, L., Soule, B.P., Ly, D., DeGraff, W., Harris, C.C., Mitchell, J.B., and Simone, N.L. (2011). Cellular stress induced alterations in microRNA let-7a and let-7b expression are dependent on p53. PloS one 6, e24429.

Shaw, W.R., Armisen, J., Lehrbach, N.J., and Miska, E.A. (2010). The conserved miR-51 microRNA family is redundantly required for embryonic development and pharynx attachment in Caenorhabditis elegans. Genetics 185, 897-905.

Sokol, N.S., Xu, P., Jan, Y.N., and Ambros, V. (2008). Drosophila let-7 microRNA is required for remodeling of the neuromusculature during metamorphosis. Genes & development 22, 1591-1596.

Suga, K., Welch, D.M., Tanaka, Y., Sakakura, Y., and Hagiwara, A. (2007). Analysis of expressed sequence tags of the cyclically parthenogenetic rotifer Brachionus plicatilis. PloS one 2, e671.

Wheeler, B.M., Heimberg, A.M., Moy, V.N., Sperling, E.A., Holstein, T.W., Heber, S., and Peterson, K.J. (2009). The deep evolution of metazoan microRNAs. Evol Dev 11, 50-68.

Yekta, S., Shih, I.H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of HOXB8 mRNA. Science 304, 594-596.

Tables and Figures

Probe | Sequence
Nv-miR-100 | GACCCGTAGATCCGAACTTGTG
Bm-miR-100 | CACCCGTAATTCCGAACTTGAGT
Pa-miR-125 | ACCCTGAGACCCTAACTTGAGA
Pa-miR-87 | GTGAGCAAAGTTTCAGGTGTAG

Table 4-1 Sequences of qPCR probes. Sequence homology allowed common probes to be used for species with shared miRNAs. Nv (N. vectensis), Pa (P. acuticornis), Bm (B. manjavacas), Av (A. vaga).

Table 4-2 Abundance of conserved miRNAs from rotifer small RNA libraries. miRNA homologs are color-coded to show conservation across species.

B. manjavacas conserved miRNAs | P. acuticornis conserved miRNAs | A. vaga conserved miRNAs

miRNA | Reads | Normalized Reads (%) | miRNA | Reads | Normalized Reads (%) | miRNA | Reads | Normalized Reads (%)
miR-125 | 195 | 0.39 | miR-125 | 177 | 0.26 | miR-125 | 4709 | 0.09
miR-87 | 3987 | 7.94 | miR-87 | 410 | 0.60 | miR-87 | 5815 | 0.11
miR-1 | 1498 | 2.98 | miR-1 | 1344 | 1.96 | miR-1 | 1646 | 0.03
miR-7 | 432 | 0.86 | miR-7 | 242 | 0.35 | miR-7 | 1045 | 0.02
miR-281 | 113 | 0.22 | miR-281 | 5* | 0.01 | miR-281 | 360** | 0.01
miR-315 | 99 | 0.20 | miR-315 | 89 | 0.13 | miR-315 | 219 | 0.00
miR-242 | 85 | 0.17 | miR-242 | 13 | 0.02 | miR-242 | 913 | 0.02
miR-190 | 69 | 0.14 | miR-190 | 12 | 0.02 | miR-190 | 687 | 0.01
miR-981 | 32 | 0.06 | miR-981 | 4* | 0.01 | miR-981 | 435 | 0.01
miR-153 | 29 | 0.06 | miR-153 | 11 | 0.02 | miR-153 | 66 | 0.00
miR-1175 | 150 | 0.30 | miR-1175 | 472 | 0.69 | miR-1175 | 352 | 0.01
miR-750 | 129 | 0.26 | miR-750 | 132 | 0.19 | miR-750 | 870 | 0.02
miR-375 | 98 | 0.20 | miR-375 | 20 | 0.03 | miR-375 | 1015 | 0.02
miR-79 | 309 | 0.62 | miR-9 | 5 | 0.01 | miR-9 | 1871 | 0.03
miR-100 | 65 | 0.13 | miR-277 | 735 | 1.07 | miR-277 | 729 | 0.01
bantam | 53 | 0.11 | miR-219 | 47 | 0.07 | miR-219 | 143 | 0.00
miR-263 | 14 | 0.03 | miR-2 | 11 | 0.02 | miR-2 | 8191 | 0.15
let-7 | 4525 | 9.01 | miR-71 | 23 | 0.03 | miR-71 | 52171 | 0.97
miR-184 | 744 | 1.48 | miR-279 | 16 | 0.02 | miR-279 | 171 | 0.00
miR-12 | 150 | 0.30 | miR-748 | 48 | 0.07 | miR-748 | 67748 | 1.26
miR-304 | 337 | 0.67 | | | | miR-36 | 29 | 0.00
 | | | | | | miR-995 | 2224 | 0.04
 | | | | | | miR-124 | 15 | 0.00

*Read cutoff of 10 was overridden due to conservation of homologs.

Table 4-3 Sequences of conserved miRNA homologs from rotifer surveys. E-values are reported from the miRBase search tool. Multiple sequences for a single homolog represent alleles.

B. manjavacas Conserved miRNAs | P. acuticornis Conserved miRNAs | A. vaga Conserved miRNAs

Columns for each species: Conserved miRNA, Reads, Normalized Reads (%), Sequence

miR-125 195 0.39 TCCCTGAGACCCTAACTTGTGA miR-125 177 0.26 ACCCTGAGACCCTAACTTGAGA miR-125 4709 0.09 ACCCTGAGACCTTAATTTGAGA

miR-87 3987 7.94 GTGAGCAAAGTTTCAAGTGTGT miR-87 410 0.60 GTGAGCAAAGTTTCAGGTGTAG miR-87 5815 0.11 GTGAGCAAAGTTTTAGGTGTA TGGAATGTAAAGAAGTTTGTG [1a] miR-1 1498 2.98 TGGAATGTAGTGAAGTGCGTTG [1c] miR-1 1344 1.96 TGGAATGTAATAGAAGTATGC miR-1a 1646 0.03 TGGAATGTAATAGAAGTATGC TGGAAGACCAGTGATTTTGTGT miR-7 432 0.86 TGGAAGACTAATGATTTTGTGT miR-7 242 0.35 TGGAAGACCTTTGATTTAGTGT miR-7a-5p 1045 0.02 TGGAAGACCTTTGATTTAGTG

miR-281 113 0.22 TGTCATGGAGTCGCTCTCGCAT miR-281 5* 0.01 TGTCATGGAATTGCTCTCCTC miR-281 360** 0.01 TTGTCATGGGGATTGCTCTCTTC

miR-315 99 0.20 TTTTGATTGTTGCTCAGAATGT miR-315 89 0.13 TTTTGATTGTTGCTCAGAGAGT miR-315 219 0.00 TTTTGATTGTTGCTCAGAGAG

TTGCGTAGGCGTTCTTGCAAGGA miR-242 85 0.17 TTGCGTAGGCGTTTTGCACTG miR-242 13 0.02 TTGCGTAGGCGTTCTTGCAAGG miR-242 913 0.02 TTGCGTAGGCGTTTTTGCAAGGA

miR-190 69 0.14 TGATATGTTGGATATTTGGTT miR-190 12 0.02 AGATATGTTTGATATTTGGTTG miR-190 687 0.01 TGATATGTTTGACATTTGGT

miR-981 32 0.06 TTCGTTGTCTTCGAAACCTGC miR-981 4* 0.01 TTCGTTGTCGTCAAAACCTGT miR-981 435 0.01 TTCGTTGTCGTCAAAACCTGT

miR-153 29 0.06 TTGCATAGTCACAAAAGCGATT miR-153 11 0.02 TTGCATAGTCACAAAAGCGACC miR-153 66 0.00 TTGCATAGTCACAAAAGCGA

miR-1175-3p 150 0.30 TGAGATTCAACTAACTTCACTTG miR-1175-3p 472 0.69 TGAGATTCAACTCCTCCACTTC miR-1175-3p 352 0.01 TGAGATTCAACTCCTCCACTTCT

miR-750 129 0.26 TCAGATCTAACTCTTTTGGCATT miR-750 132 0.19 CCAGATCTATATTCTTCCAGCTCA miR-750 870 0.02 CCAGATCTATATTCTTCCAGCTC

TAAATGCATTGGTCTGGTACGAT miR-375 98 0.20 TTTGTTCGTTAGGCTCGCACTA miR-277a 735 1.07 TAAATGCATTGGTCTGGTACGA miR-277 729 0.01

miR-79 309 0.62 ATAAAGCTAGATTATCAATGG miR-9 5 0.01 ATAAAGCTTGAATACCGGAGGA miR-9 1871 0.03 ATAAAGCTTGAATACCGGAGGAT

miR-100 65 0.13 CACCCGTAATTCCGAACTTGAGT miR-219 47 0.07 TGATTGTCTATACGCATTTCGT miR-219 143 0.00 TGATTGTCTATACGCATTTCGTT TATCACAGTCTTGCTTTGTTGAC TATCACAGTCTTGCTTTGTTGA bantam 53 0.11 TGAGATCATTGTGAAAACTGAT miR-2 11 0.02 TATCACAGTCTTGCTTTGTTGA miR-2 8191 0.15

miR-263 14 0.03 AATGGCACTAGGAAAACTCACG miR-71 23 0.03 TGAAAGACATGGGTAGTGAGAT miR-71-5p 52171 0.97 TGAAAGACATAGGCATTAATGAT

let-7 4525 9.01 TGAGGTAGTTGGTTGTATGGTT miR-375 20 0.03 TTTGTTCGTCTGGCTCGTATTA miR-375 1015 0.02 TTTGTTCGTCTGGCTCGTATTAT

miR-184 744 1.48 TGGACGGAGAATTGATAAGG miR-279 16 0.02 TGACTAGATCTCACACTCATC miR-279 171 0.00 TGACTAGATCTCACACTCATCC

miR-12 150 0.30 TGAGTATTACATCAGGTATTGAA miR-748 48 0.07 TGGACGGAGGTTTGACGAGGA miR-748 67748 1.26 TGGACGGAGGTTTGATGAGGAAT

miR-304 337 0.67 TAATCTCAGCTTGTAACATGGAG miR-36 29 0.00 TCACCGGGTATAACACTCTTCC

miR-995-3p 2224 0.04 AAGCACCAGGTGATATCAGCTTC

miR-124 15 0.00 TAAGGCACGTGGTTATGAAT

Sequences listed are largely representative but not inclusive of all reads that correspond to the listed miRNA family. Read minimum set to 20 reads.

Reads reported in B. manjavacas and A. vaga libraries are genomically supported.

*Read minimum requirement overridden due to conservation. **Sequence if the seed sequence is shifted by 1 nt.


Figure 4-1 Alignment of rotifer miRNAs that occur on the miR-125/let-7/miR-100 cluster. (A) Rotifer miR-125 sequences from B. manjavacas (Bm), P. acuticornis (Pa), and A. vaga (Av). Allelic variants of miR-125 in A. vaga are denoted by -C16 or -T16. (B) B. manjavacas let-7 sequence aligned to let-7 from Danio rerio. (C) B. manjavacas miR-100 sequence aligned to let-7 from Drosophila pseudoobscura (dps).

A
Reaction | Probe | Ct value | Detected
Av-let-7-1 | let-7 | 39.49 | No
Av-let-7-2 | let-7 | Undetermined | No
Av-let-7-3 | let-7 | Undetermined | No
Pa-let-7-1 | let-7 | Undetermined | No
Pa-let-7-2 | let-7 | Undetermined | No
Pa-let-7-3 | let-7 | Undetermined | No
Nv-let-7-1 | let-7 | 37.00 | No
Nv-let-7-2 | let-7 | 37.14 | No
Nv-let-7-3 | let-7 | 39.07 | No
Bm-let-7-1 | let-7 | 20.06 | Yes
Bm-let-7-2 | let-7 | 21.35 | Yes
Bm-let-7-3 | let-7 | 22.01 | Yes
let-7-primer only-1 | let-7 | 35.58 |
let-7-primer only-2 | let-7 | 35.12 |
let-7-primer only-3 | let-7 | 35.18 |

B
Reaction | Probe | Ct value | Detected
Av-miR-100-1 | miR-100bm | Undetermined | No
Av-miR-100-2 | miR-100bm | 38.50 | No
Av-miR-100-3 | miR-100bm | Undetermined | No
Pa-miR-100-1 | miR-100bm | Undetermined | No
Pa-miR-100-2 | miR-100bm | 39.53 | No
Pa-miR-100-3 | miR-100bm | Undetermined | No
Bm-miR-100-1 | miR-100bm | 24.73 | Yes
Bm-miR-100-2 | miR-100bm | 25.16 | Yes
Bm-miR-100-3 | miR-100bm | 27.97 | Yes
Nv-miR-100nv-1 | miR-100nv | 19.51 | Yes
Nv-miR-100nv-2 | miR-100nv | 18.97 | Yes
Nv-miR-100nv-3 | miR-100nv | 18.33 | Yes
miR-100-only-1 | miR-100bm | Undetermined |
miR-100-only-2 | miR-100bm | Undetermined |
miR-100-only-3 | miR-100bm | Undetermined |
miR-100nv-only-1 | miR-100nv | Undetermined |

C
Reaction | Probe | Ct value | Detected
Av-miR-125-1 | miR-125 | 24.83 | Yes
Av-miR-125-3 | miR-125 | 22.73 | Yes
Bm-miR-125-1 | miR-125 | 23.02 | Yes
Bm-miR-125-2 | miR-125 | 20.74 | Yes
Bm-miR-125-3 | miR-125 | 16.60 | Yes
Pa-miR-125-1 | miR-125 | 26.92 | Yes
Pa-miR-125-2 | miR-125 | 19.38 | Yes
Pa-miR-125-3 | miR-125 | 22.54 | Yes
miR-125-only-1 | miR-125 | 36.28 |
miR-125-only-2 | miR-125 | 38.57 |
miR-125-only-3 | miR-125 | 37.88 |

D
Reaction | Probe | Ct value | Detected
Av-miR-87 | miR-87 | 29.35 | Yes
Pa-miR-87 | miR-87 | 27.10 | Yes
Bm-miR-87 | miR-87 | 26.82 | Yes
miR-87-only | miR-87 | 37.00 |

Figure 4-2. Ct tables of qPCR reactions using let-7, miR-125, miR-100, and miR-87 probes on N. vectensis (Nv), P. acuticornis (Pa), A. vaga (Av), and B. manjavacas (Bm). Shaded gray boxes indicate primer-only reactions. (A) Table of all let-7 reactions, with N. vectensis serving as a cDNA negative control. (B) miR-100 reactions. (C) miR-125 reactions. (D) miR-87 reactions.

Column headings: N. vectensis, A. vaga, P. acuticornis, B. manjavacas. Row headings: let-7, miR-100, miR-125. Separate panel: miR-87 (Bm, Av, Pa).

Figure 4-3 qPCR amplification of miRNAs on the miR-100/let-7/miR-125 cluster. All experimental reactions were done in triplicate as shown (the third A. vaga miR-125 replicate is not shown). miR-87, a positive control amplified in all three rotifer species, was not done in triplicate. Primer-only amplification profiles are denoted by (*).

Figure 4-4 Northern blot for let-7 and miR-87. B. manjavacas and A. vaga were selected as representative species for monogononts and bdelloids, respectively.

A

B
sme-let-7a                        TGAGGTAGAATGTTGGATGAC
Av-let-7 candidate (freq = 7)     TGAGGTAGAGCGTCGGA----
sme-let-7d                        AGAGGTAGTGATTCAAAAAGTT
Av-let-7 candidate (freq = 30)    AGAGGTGGTGATTCAAGA-G--

C
Av-sme-let-7a    Av-sme-let-7d-like    Av-miR-125

Figure 4-5 Reduced-stringency search for A. vaga let-7 in the Illumina small RNA library. (A) Alignment of basal metazoan let-7s from Schmidtea mediterranea (sme), Schistosoma mansoni (sma), and Echinococcus multilocularis (emu) with B. manjavacas (Bm) let-7. (B) Alignment of let-7 candidates with the best corresponding homolog match returned from the reduced seed-sequence stringency search. (C) Predicted secondary structures of Av-sme-let-7a, Av-sme-let-7d-like, and Av-miR-125. The outlined red box indicates the position of the seed sequence.

Figure 4-6 Alignment of miR-100/let-7/miR-125 genomic cluster from B. manjavacas (Bm) with miR-125-T16 and miR-125-C16 loci from A. vaga (Av).

"#$%&'$%()$*$ !"#$%&'()**+*+ ,"-.)/0+(1234 53-.6+)! ,"-.)/0+(1234 ,78/-.6+)! =/0>)6)0.+6"?+@( !"#$%&'$9%:(073;(% !! !! :< 367.+"/ =/0>)6)0.+6"?+@( 37*;1B,C$6"27-+4( !"#$%&'$9%:(073;(& %&:% 367.+"/ %A:& D*;07>;@67*)-+$*"E+ !"#$%&'$9%:(073;(A !! !! !! !! @;/+/"/(>+)H;( !"#$%&'$F%:(073;(% %%GG 0>)"/(& <&I JKB,(L)!"*;(367."+/ =/0>)6)0.+6"?+@( !"#$%&'$F%:(073;(& !! !! %: 367.+"/ !"#$%&'$F%:(073;(A !! !! !! !!

Table 4-4 Genomic environment of miR-125 loci in A. vaga. Insufficient sequence data indicated by (--).

A  miR-100 variants: no matches in A. vaga genome
   let-7 variants

B  Av genomic return: let-7 candidate

Figure 4-7 Sequences of miR-100 and let-7 from miRBase were collapsed to unique sequences, and polymorphic positions are indicated as shown above. (A) A regex search of the draft genome designed to capture any variant of miR-100 returned no hits. A regex search designed to capture any variant of let-7 returned the sequence shown above, which met the criterion of being within 3 nt substitutions of the nearest known let-7. (B) The let-7 candidate does not fold into a hairpin structure and is not found in the small RNA library.
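The regex strategy in the Figure 4-7 legend can be sketched as follows: each column of the collapsed, aligned variants becomes either a fixed base or a character class, and the genome is scanned on both strands. This is an illustrative sketch, not the search actually run for this study; the function names are hypothetical, the variants are assumed to be pre-aligned to equal length, and the two example variants (human let-7a and the B. manjavacas let-7 from Table 4-3) stand in for the full miRBase set.

```python
from __future__ import annotations

import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def variant_regex(variants: list[str]) -> re.Pattern:
    """Compile a regex with a character class at every polymorphic column.
    Assumes the variants are already aligned and of equal length."""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("variants must be aligned to equal length")
    parts = []
    for column in zip(*(v.upper() for v in variants)):
        bases = sorted(set(column))
        parts.append(bases[0] if len(bases) == 1 else "[" + "".join(bases) + "]")
    return re.compile("".join(parts))


def scan_both_strands(genome: str, pattern: re.Pattern) -> list[tuple[int, str, str]]:
    """(start, strand, match) for non-overlapping hits on each strand; minus-strand
    starts are converted to forward-strand coordinates."""
    genome = genome.upper()
    hits = [(m.start(), "+", m.group()) for m in pattern.finditer(genome)]
    rc = reverse_complement(genome)
    hits += [(len(genome) - m.end(), "-", m.group()) for m in pattern.finditer(rc)]
    return hits


def within_substitutions(candidate: str, known: str, max_subs: int = 3) -> bool:
    """True if two equal-length sequences differ at no more than `max_subs` positions."""
    return len(candidate) == len(known) and sum(
        a != b for a, b in zip(candidate, known)) <= max_subs


let7_variants = ["TGAGGTAGTAGGTTGTATAGTT",   # human let-7a
                 "TGAGGTAGTTGGTTGTATGGTT"]   # B. manjavacas let-7 (Table 4-3)
pattern = variant_regex(let7_variants)
print(pattern.pattern)   # TGAGGTAGT[AT]GGTTGTAT[AG]GTT
```

A column-wise pattern like this only captures combinations of bases already seen in the variant set; a looser follow-up test such as `within_substitutions` is what implements a "within 3 nt of a known let-7" criterion for any hit.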


Figure 4-8 lin-41 binding sites in B. manjavacas. (A) Alignment of the NHL repeat in the C-terminal domain of lin-41 in B. manjavacas (Bm) and A. vaga (Av). (B) Predicted binding sites of Bm-let-7 in the 3’ UTR of Bm-lin-41.


Figure 4-9 Predicted binding sites of Bm-let-7 in the 3’ UTRs of the copies of Av-lin-41.


Figure 4-10 Predicted binding sites of Bm-let-7 on the copies of hbl-1 (A) and dicer-1 (B).

Chapter V

Frequent substitutions in some Adineta vaga miRNAs suggest RNA editing

ABSTRACT

Examination of small RNA reads in Adineta vaga that map to the miR-125 loci shows a comparatively high degree of sequence polymorphism. This is attributed to an elevated rate of C-to-U substitutions among the representative reads. The level of substitution is above sequencing error rates and above the variability reported for miRNA sequences in other metazoan small RNA libraries. The remaining identified miRNA homologs do not show the same level of sequence heterogeneity as the miR-125 reads. The abundance of cytidine deaminases within the Adineta vaga genome, and their detected expression, provide a possible mechanism of RNA editing through which these polymorphic miRNAs may arise. The C-to-U change increases the targeting repertoire of these variants, as uracil is able to pair with both adenosine and guanine. The finding of edited variants of miR-125, in conjunction with the absence of the other members of the let-7 cluster, let-7 and miR-100, suggests a compensatory role for these edited miRNAs in Adineta vaga.
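The claim that C-to-U editing enlarges the targeting repertoire can be made concrete by counting how many distinct target hexamers can pair with a seed when G:U wobble is allowed. This is a sketch under stated assumptions: the seed is taken as nt 2-7, only pairing possibilities are counted (G:U pairs in the seed are known to weaken repression, so the extra sites are not necessarily functional), and the edited sequence is a hypothetical C-to-U change at miRNA position 2 of the A. vaga miR-125 sequence from Table 4-3.

```python
# Target bases that can pair with each miRNA base, allowing G:U wobble.
PARTNERS = {"A": "U", "C": "G", "G": "CU", "U": "AG"}


def compatible_site_count(seed: str) -> int:
    """Number of distinct target hexamers that can pair with `seed` when
    Watson-Crick pairs and G:U wobble are both allowed."""
    count = 1
    for base in seed.upper().replace("T", "U"):
        count *= len(PARTNERS[base])
    return count


mir125 = "ACCCTGAGACCTTAATTTGAGA"          # Av miR-125 (Table 4-3)
seed = mir125[1:7]                        # nt 2-7: CCCTGA
edited = "T" + seed[1:]                   # hypothetical C-to-U at miRNA position 2
print(compatible_site_count(seed), compatible_site_count(edited))  # 4 8
```

Each C in the seed can only pair with G, whereas a U can pair with A or (by wobble) G, so every C-to-U change in the seed doubles the number of pairing-compatible hexamers in this simple count.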

miRNA sequence polymorphism is observed in numerous small RNA surveys. These polymorphic variants are termed "isomirs." The different types of isomirs include nucleotide substitutions, indels, and sequence length variation (Reese et al., 2010; Sdassi et al., 2009). The action of deaminases and allelic variation provide ways to alter the miRNA sequence, while Drosha-, Dicer-, and Ago2-mediated cleavage have been shown to contribute to miRNA length polymorphisms (Kim, 2005; Mi et al., 2008; Takeda et al., 2008). The effect of such polymorphism varies. In some instances, either the frequency or the nature of the polymorphism has seemingly negligible effects (Doench and Sharp, 2004); in other cases, the effects are more pronounced (Abelson et al., 2005). A further factor, both critical and confounding, in understanding sequence complexity within a miRNA repertoire is that detected miRNA variation itself depends heavily on the bioinformatic analysis through which it is obtained.

Isomirs have the potential to alter an existing gene regulatory program through the acquisition of different targets or changes in the strength of miRNA regulation. Among the different types of isomirs, base substitution is possibly one of the more dynamic ways to epigenetically alter gene regulation, especially if the change occurs within the seed sequence. Thus far, two types of nucleotide editing have been observed in miRNAs: adenosine-to-inosine (A-to-I) and cytidine-to-uracil (C-to-U) (Blow et al., 2006; Kawahara et al., 2007b; Luciano et al., 2004; Mi et al., 2008). Only A-to-I editing of miRNAs has been documented in metazoans; C-to-U editing has been observed in plants. The catalogue of known metazoan miRNA-editing enzymes consists only of a family of adenosine deaminases called ADARs. ADARs act on dsRNA and have been shown to edit both pri-miRNA and pre-miRNA molecules (Luciano et al., 2004; Nishikura, 2006). Editing of either precursor usually prevents it from participating in downstream processing steps. Other groups observed that edited pri-miRNAs and pre-miRNAs are degraded by the inosine-specific ribonuclease Tudor-SN (Scadden, 2005). This form of RNA editing generally has antagonistic effects on miRNA-mediated translational repression (Kawahara et al., 2007a). Contrary to these findings, however, A-to-I-edited miRNA variants have also been observed among mature miRNAs, which shows that degradation is not always the outcome of A-to-I editing (Pfeffer et al., 2005). From these findings, it appears that the fate of A-to-I-edited miRNAs varies in metazoans, perhaps depending on when the editing occurs and on the position of the nucleotide change within the pri- or pre-miRNA molecule. In plants, a family of cytidine deaminases (CDARs) mediates C-to-U editing of miRNAs (Ebhardt et al., 2009).

Earlier reports showed that CDAR activity occurred predominantly in mitochondria and other plant organelles. However, the finding of miRNA editing, combined with the absence of miRNAs from these organelles, suggested that miRNA editing takes place in either the nucleus or the cytoplasm.

The most commonly noted form of isomir in metazoan small RNA libraries results from 5’ and 3’ heterogeneity (Wyman et al., 2011). A comprehensive survey of miRNA complexity in small RNA libraries made from different human and mouse cell types revealed that the greatest fraction of isomirs falls into this category, and that 5’ heterogeneity occurred outside of the seed sequence (Lee et al., 2010). To assay the contribution of small RNA library construction and sequencing error to miRNA complexity, a synthetic RNA oligonucleotide was spiked into the RNA prior to library construction. The lack of isomirs arising from this synthetic oligo suggested that small RNA library construction and deep sequencing contribute minimally to the complexity of a miRNA repertoire. Differences in isomir distribution among cell types indicated that some miRNA processing mechanisms may be specific to certain cell types.

Internal nucleotide substitutions can have the greatest effect on miRNA targeting. Depending on its position, a single nucleotide change within the miRNA sequence can upset target specificity, and substitutions in the seed sequence have the most profound effects (Doench and Sharp, 2004). G:U wobble pairing is also strongly disfavored in miRNA::target interactions. Overall, substitutions at or beyond the 5’ and 3’ termini have the least effect on targeting specificity. While substitutions outside of the seed sequence might not affect targeting dramatically, they may destabilize the secondary structure or affect downstream miRNA processing steps (Habig et al., 2007).

Material and Methods

Detailed descriptions of the methods used for this section are provided in the preceding chapters. Brief summaries are given below for the bioinformatic processing of small RNA reads.

Small RNA library construction

A full description of the method is given in Chapter 2.

Bioinformatic Processing of Reads

Small RNA reads were collapsed with fastx_collapser (Chapter 2). Each collapsed read is given an identifier made of two numbers (>x-y), where x is the rank of the read in the library and y is its total count. In the Chapter 4 methods, small RNA reads that matched conserved miRNAs returned by SMD were required to have at least 90% identity to their genomic context. Bowtie was run using default settings, which allowed a maximum of 3 substitutions when mapping to genomic loci. To comprehensively analyze miRNA sequence variants in the small RNA surveys of B. manjavacas and A. vaga, the program miRseqvars (Chapter 3) was used. miRseqvars requires a 100% match to genomic context to ascertain internal miRNA sequence heterogeneity.
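The identifier format and the identity thresholds above can be illustrated with a small sketch. The helper names are hypothetical, and the identity calculation assumes ungapped, equal-length comparisons between a read and its genomic context (no indels); it is not a reimplementation of SMD, Bowtie, or miRseqvars.

```python
from __future__ import annotations


def parse_collapsed_header(header: str) -> tuple[int, int]:
    """Split a collapsed-read identifier of the form '>x-y' into (rank x, count y)."""
    rank, count = header.strip().lstrip(">").split("-")
    return int(rank), int(count)


def percent_identity(read: str, reference: str) -> float:
    """Percent identity of two equal-length, ungapped sequences."""
    if len(read) != len(reference):
        raise ValueError("ungapped comparison requires equal lengths")
    matches = sum(a == b for a, b in zip(read.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def passes_identity(read: str, reference: str, threshold: float = 90.0) -> bool:
    """True if the read meets the identity threshold (90% as in Chapter 4;
    threshold=100.0 mimics an exact-match requirement)."""
    return percent_identity(read, reference) >= threshold


ref = "ACCCTGAGACCTTAATTTGAGA"        # Av miR-125 (Table 4-3)
read = "ATCCTGAGACTTTAATTTGAGA"       # hypothetical read with 2 mismatches
print(parse_collapsed_header(">1-67748"))
print(round(percent_identity(read, ref), 1), passes_identity(read, ref))
```

For a 22-nt read, a 90% identity cutoff tolerates two mismatches (20/22, about 90.9%) but not three (19/22, about 86.4%), whereas an exact-match rule tolerates none.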

Results

The discovery of C-to-U substitutions in miR-125 in A. vaga

There are two alleles of the miR-125 locus in the A. vaga genome, differing at position 16 of the miRNA sequence by a C or T; the alleles were accordingly named Av-miR-125-T16 and Av-miR-125-C16. Reads that mapped to these loci exhibited sequence polymorphism in the form of C-to-U substitutions (Figure 5-1). Substitutions occurred at every cytidine position except the cytidine at position 10 of the sequence. Many of these substitutions were in the seed sequence of the miRNA, and in some cases multiple substitutions occurred on the same read. miRseqvars was run on the small RNA surveys of all libraries.
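Classifying a read as a C-to-U variant amounts to listing its mismatches against the allele it maps to and checking that every one is a reference C read as T (U). The sketch below assumes ungapped, equal-length alignment to the A. vaga miR-125 sequence listed in Table 4-3, which carries T at position 16; the helper names and the example read are hypothetical, not library data.

```python
from __future__ import annotations


def substitutions(read: str, reference: str) -> list[tuple[int, str, str]]:
    """1-based (position, reference base, read base) for each mismatch."""
    if len(read) != len(reference):
        raise ValueError("read and reference must be the same length")
    return [(i + 1, r, q)
            for i, (r, q) in enumerate(zip(reference.upper(), read.upper()))
            if r != q]


def is_c_to_u_only(read: str, reference: str) -> bool:
    """True if the read differs from the reference and every difference is C -> T/U."""
    subs = substitutions(read, reference)
    return bool(subs) and all(r == "C" and q in ("T", "U") for _, r, q in subs)


AV_MIR125_T16 = "ACCCTGAGACCTTAATTTGAGA"   # Av miR-125 as listed in Table 4-3
read = "ATCCTGAGACTTTAATTTGAGA"          # hypothetical read, not library data
print(substitutions(read, AV_MIR125_T16))   # [(2, 'C', 'T'), (11, 'C', 'T')]
print(is_c_to_u_only(read, AV_MIR125_T16))  # True
```

Reporting positions as 1-based keeps them comparable with the position numbering used in the text (for example, position 16 distinguishing the T16 and C16 alleles).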

Overall, many isomirs occurred at a frequency of 10 reads or fewer in both small RNA libraries and were therefore discounted as significant contributors to miRNA complexity. Instances of 5’/3’ heterogeneity were also returned by miRseqvars, but the analysis focused specifically on internal nucleotide substitutions.

The miR-125 sequences of the 454 libraries showed little variation in this respect.

The miR-125 sequences of P. acuticornis had only single substitutions, in singleton reads, and B. manjavacas miR-125 showed little internal sequence variation (Figure 5-2).

Other miRNA homologs in A. vaga show little sequence heterogeneity

The number of different miR-125 isomirs and their relative frequencies prompted closer examination of miRNA sequences within the Illumina A. vaga small RNA library. Hydrolytic deamination of the cytidines may have occurred during library construction steps or during the sequencing process (Friedberg E.C. 1995).

To determine the background level of C-to-U substitution in the remainder of the A. vaga small RNA library, several cytidine-containing reads corresponding to miRNAs were aligned with their respective isomirs (Figure 5-3). For each miRNA homolog, the percentage of potentially edited isomirs was calculated as the proportion of isomirs without genomic context among all isomirs. The edited version of miR-748 was originally identified by SMD because of its sequence similarity to a known miRNA homolog (Figure 5-3A). The unedited version, which has genomic correspondence, was not returned by SMD but was identified by miRseqvars; it is the most abundant miRNA sequence in the small RNA library, yet only a small fraction of its isomirs are editing candidates. The remaining homologs, miR-242, miR-375, and miR-2, all show a C-to-U substitution rate of around 1% (Figure 5-3B-D). Two additional cytidine-containing miRNAs, miR-1175 and miR-315, lacked internal C-to-U substitutions (Figure 5-4).
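The percentage described above can be computed as in the following Python sketch. It treats an isomir as having genomic context when it occurs with 100% identity inside a supplied locus sequence (for example a pre-miRNA hairpin), mirroring the miRseqvars criterion. The function names are hypothetical, and because the text does not say whether isomirs were counted once each or weighted by reads, both options are offered.

```python
from typing import Dict, Iterable


def has_genomic_context(seq: str, loci: Iterable[str]) -> bool:
    """True if `seq` occurs exactly (100% identity) within any genomic locus."""
    s = seq.upper()
    return any(s in locus.upper() for locus in loci)


def percent_potentially_edited(isomir_counts: Dict[str, int], loci: Iterable[str],
                               weight_by_reads: bool = False) -> float:
    """Percentage of isomirs lacking genomic context.

    By default each distinct isomir counts once; with weight_by_reads=True
    each isomir is weighted by its read count instead.
    """
    loci = list(loci)  # allow reuse when a generator is passed in
    total = 0
    edited = 0
    for seq, n in isomir_counts.items():
        weight = n if weight_by_reads else 1
        total += weight
        if not has_genomic_context(seq, loci):
            edited += weight
    return 100.0 * edited / total if total else 0.0
```

Weighting by reads can give a much lower percentage than counting distinct isomirs when the genomically encoded sequence dominates the read counts.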

Discussion

The rate of C-to-U substitution among miR-125 variants is unusually high compared with the other cytidine-containing miRNA homologs. Among these homologs, miR-2 isomirs occur at a frequency close to that of miR-125, but the percentage of edited isomirs is far lower. Likewise, miR-748 isomirs overwhelmingly dominate the A. vaga miRNA repertoire in the small RNA library, yet edited isomirs make up only a small fraction of the total.

C-to-U editing of miRNAs has not previously been observed in any metazoan, although C-to-U editing of mRNA has been demonstrated. The APOBECs are among the best-studied metazoan cytidine deaminases; the family takes its name from its first identified function, editing of apolipoprotein B mRNA (Teng et al., 1993). These deaminases rely on a mooring sequence for substrate recognition. Because miRNAs are short and pre-miRNAs form secondary structures, miRNAs are unlikely to be substrates of these deaminases. Whole-genome sequencing of A. vaga revealed as many as 200 copies of cytidine deaminases and two copies of adenosine deaminases (IA, personal communication). Many of these cytidine deaminases have been detected in cDNA libraries made from axenic cultures of A. vaga.

Most of the cytidine deaminases in A. vaga bear a mitochondrial targeting signal (mts) and correspond most closely to plant cytidine deaminases. As has been suggested for the plant enzymes, they may not reside exclusively in mitochondria.

The predominance of plant-like cytidine deaminases in the A. vaga genome provides a possible mechanism for C-to-U editing of miRNAs in A. vaga. Alternatively, ADARs may mediate A-to-I editing of the pre-miRNA, which, if repaired, would create A-to-G pairing (Gu et al., 2012) and thereby result in U-to-C editing on the complementary strand. Such changes were also noted among the miRNA sequence variants.

Together with the loss of miR-100 and let-7 in A. vaga, these data suggest that editing of miR-125 may play a compensatory role in the absence of these two conserved miRNAs, assisting in the repression of former let-7 and miR-100 targets.

It is also important to note that, because the let-7 seed sequence lacks contiguous uracils, no other miRNA sequence could be edited to acquire the targeting capacity of let-7. The C-to-U substitutions are nonetheless probably not restricted to miRNAs; they may equally well modify target sites, mediating co-evolution of miRNA:target interactions.
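The underlying constraint can be made explicit: C-to-U editing only converts C to U, so one sequence can be edited into another only where the two differ solely by C-to-U changes. The Python check below is a minimal sketch under that assumption alone (no other editing types, equal lengths, no positional shifts); the function name is hypothetical. Compared with the canonical let-7 seed GAGGUAG (nucleotides 2-8 of let-7a, used here as an assumption because rotifer let-7 sequences are not shown in this chapter), the A. vaga miR-125 seed CCCUGAG from Figure 5-1 fails the check.

```python
def reachable_by_c_to_u(source: str, target: str) -> bool:
    """True if `target` can be produced from `source` by C-to-U edits alone.

    Every position must either match exactly or be a C in the source that
    becomes a U in the target; any other difference rules editing out.
    """
    src = source.upper().replace("T", "U")
    tgt = target.upper().replace("T", "U")
    if len(src) != len(tgt):
        return False
    return all(s == t or (s == "C" and t == "U") for s, t in zip(src, tgt))
```

For instance, the edited seed CUUUGAG is reachable from CCCUGAG, but the reverse is not, since editing never turns U back into C.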

Curiously, miR-125 of P. acuticornis showed no instances of editing in the small RNA library. The failure to recover edited isomirs of miR-125 in P. acuticornis might be due to a lack of sequencing depth. However, miR-748 of P. acuticornis is identical to the edited miR-748 of A. vaga, which might suggest that the edited form is the one present in P. acuticornis (see Table 4-3 in Chapter 4). The relatively low levels of C-to-U substitution among some other miRNA homologs suggest that these homologs may also be substrates for cytidine deaminases. Sequencing errors and small RNA library construction errors have not previously been shown to be biased towards C-to-U substitutions, which are the predominant sequence polymorphism observed in this small RNA library. Furthermore, previous surveys of such errors focused on substitutions near the 5’ and 3’ termini of the sequences. Therefore, the isomirs resulting from C-to-U substitutions are likely to be true novel miRNAs arising from editing of the genomically derived miRNA homologs.
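Whether C-to-U changes dominate the substitution spectrum can be checked by tallying substitution types across reads, weighted by read count. A minimal Python sketch, assuming reads are aligned to the genomic allele from the 5' end without indels; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict


def substitution_spectrum(reference: str, reads: Dict[str, int]) -> Counter:
    """Tally substitution types such as 'C>T' across reads, weighted by count.

    Only positions shared with the reference are compared, so 3' extensions
    beyond the reference are not counted as substitutions.
    """
    ref = reference.upper().replace("U", "T")
    spectrum: Counter = Counter()
    for seq, n in reads.items():
        for ref_nt, read_nt in zip(ref, seq.upper().replace("U", "T")):
            if ref_nt != read_nt:
                spectrum[f"{ref_nt}>{read_nt}"] += n
    return spectrum
```

Run on the miR-125-T16 allele with the reads ACTTTGAGACCCTAATTTGAGA (23 reads) and ACCCTGAGACCTTAATTTGAGA (100 reads) from Figure 5-1, every counted substitution is C>T (146 in total).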


References

Abelson, J.F., Kwan, K.Y., O'Roak, B.J., Baek, D.Y., Stillman, A.A., Morgan, T.M., Mathews, C.A., Pauls, D.L., Rasin, M.R., Gunel, M., et al. (2005). Sequence variants in SLITRK1 are associated with Tourette's syndrome. Science 310, 317-320.

Backus, J.W., and Smith, H.C. (1991). Nucleic Acids Res 19, 6781-6786.

Blow, M.J., Grocock, R.J., van Dongen, S., Enright, A.J., Dicks, E., Futreal, P.A., Wooster, R., and Stratton, M.R. (2006). RNA editing of human microRNAs. Genome Biol 7, R27.

Doench, J.G., and Sharp, P.A. (2004). Specificity of microRNA target selection in translational repression. Genes & development 18, 504-511.

Ebhardt, H.A., Tsang, H.H., Dai, D.C., Liu, Y., Bostan, B., and Fahlman, R.P. (2009). Meta-analysis of small RNA-sequencing errors reveals ubiquitous post-transcriptional RNA modifications. Nucleic acids research 37, 2461-2470.

Friedberg, E.C., Walker, G.C., and Siede, W. (1995). DNA Repair and Mutagenesis (Washington, DC: ASM Press), pp. 135-190.

Gu, T., Buaas, F.W., Simons, A.K., Ackert-Bicknell, C.L., Braun, R.E., et al. (2012). Canonical A-to-I and C-to-U RNA editing is enriched at 3′UTRs and microRNA target sites in multiple mouse tissues. PLoS ONE 7, e33720.

Habig, J.W., Dale, T., and Bass, B.L. (2007). miRNA editing--we should have inosine this coming. Molecular cell 25, 792-793.

Kawahara, Y., Zinshteyn, B., Chendrimada, T.P., Shiekhattar, R., and Nishikura, K. (2007a). RNA editing of the microRNA-151 precursor blocks cleavage by the Dicer-TRBP complex. EMBO reports 8, 763-769.

Kawahara, Y., Zinshteyn, B., Sethupathy, P., Iizasa, H., Hatzigeorgiou, A.G., and Nishikura, K. (2007b). Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315, 1137-1140.

Kim, V.N. (2005). MicroRNA biogenesis: coordinated cropping and dicing. Nature reviews Molecular cell biology 6, 376-385.

Lee, L.W., Zhang, S., Etheridge, A., Ma, L., Martin, D., Galas, D., and Wang, K. (2010). Complexity of the microRNA repertoire revealed by next-generation sequencing. RNA 16, 2170-2180.

Luciano, D.J., Mirsky, H., Vendetti, N.J., and Maas, S. (2004). RNA editing of a miRNA precursor. RNA 10, 1174-1177.

Mi, S., Cai, T., Hu, Y., Chen, Y., Hodges, E., Ni, F., Wu, L., Li, S., Zhou, H., Long, C., et al. (2008). Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5' terminal nucleotide. Cell 133, 116-127.

Nishikura, K. (2006). Editor meets silencer: crosstalk between RNA editing and RNA interference. Nature reviews Molecular cell biology 7, 919-931.

Pfeffer, S., Sewer, A., Lagos-Quintana, M., Sheridan, R., Sander, C., Grasser, F.A., van Dyk, L.F., Ho, C.K., Shuman, S., Chien, M., et al. (2005). Identification of microRNAs of the herpesvirus family. Nat Methods 2, 269-276.

Reese, T.A., Xia, J., Johnson, L.S., Zhou, X., Zhang, W., and Virgin, H.W. (2010). Identification of novel microRNA-like molecules generated from herpesvirus and host tRNA transcripts. Journal of virology 84, 10344-10353.

Scadden, A.D. (2005). The RISC subunit Tudor-SN binds to hyper-edited double-stranded RNA and promotes its cleavage. Nature structural & molecular biology 12, 489-496.

Sdassi, N., Silveri, L., Laubier, J., Tilly, G., Costa, J., Layani, S., Vilotte, J.L., and Le Provost, F. (2009). Identification and characterization of new miRNAs cloned from normal mouse mammary gland. BMC Genomics 10, 149.

Takeda, A., Iwasaki, S., Watanabe, T., Utsumi, M., and Watanabe, Y. (2008). The mechanism selecting the guide strand from small RNA duplexes is different among argonaute proteins. Plant & cell physiology 49, 493-500.

Teng, B., Burant, C.F., and Davidson, N.O. (1993). Molecular cloning of an apolipoprotein B messenger RNA editing protein. Science 260, 1816-1819.

Wyman, S.K., Knouf, E.C., Parkin, R.K., Fritz, B.R., Lin, D.W., Dennis, L.M., Krouse, M.A., Webster, P.J., and Tewari, M. (2011). Post-transcriptional generation of miRNA variants by multiple nucleotidyl transferases contributes to miRNA transcriptome complexity. Genome Res 21, 1450-1461.

Figures

miR-125-T16

Perfect match reads: ACCCTGAGACCCTAATTTGAG = 2697 reads

Candidate reads for RNA editing:
ACTTTGAGACCCTAATTTGAGA = 23 reads*
ACCCTGAGACCTTAATTTGAGA[A] = 100 reads
ACCCTGAGACTTTAATTTGAGA = 79 reads
ACTCTGAGACCCTAATTTGAGA = 86 reads*
ACCCTGAGACTCTAATTTGAGAA = 50 reads
ACCCTGAGACTTTAATTTGAGA = 33 reads
ACCTTGAGACCCTAATTTGAGA = 44 reads*
ACCCTGAGACTCTAATTTGAGA = 50 reads
ATTCTGAGACCCTAATTTGAGAA = 27 reads*
ATCCTGAGACCCTAATTTGAGA = 7 reads*
ACTCTGAGACCCTAATTTGAGA = 7 reads*

miR-125-C16

Perfect match reads: ACCCTGAGACCCTAACTTGAG = 1269 reads

Candidate reads for RNA editing:
ACTCTGAGACCCTAACTTGAGA = 40 reads*
ACCCTGAGATTCTAACTTGAGA = 15 reads
ACCCTGAGACCTTAACTTGAGA = 45 reads
ACCCTGAGATCCTAACTTGAGA = 24 reads
ACCTTGAGACCCTAACTTGAG = 39 reads*
ACTTTGAGACTCTAACTTGAGA = 5 reads*
ACCCTGAGACTCTAACTTGAGA = 5 reads
ATTCTGAGACCCTAACTTGAGA = 26 reads*
ACCCTGAGACTTTAACTTGAGA = 9 reads

Figure 5-1 miR-125 editing in A. vaga. The two alleles of miR-125 are shown as miR-125-T16 and miR-125-C16. Reads with genomic context for either allele are grouped under perfect match reads, below the locus to which they map. Reads that lack genomic context but mapped to the miR-125 loci are listed under candidate reads for RNA editing, also below the locus to which they map. Read numbers give the number of occurrences of each sequence. Letters in red indicate C-to-U changes and letters in blue indicate any other type of change from the genomic sequence. An asterisk (*) marks reads whose substitutions change the seed sequence.

[Figure 5-2, panels A and B]

Figure 5-2 miR-125 reads from the B. manjavacas library (A) and the P. acuticornis library (B). Reads are homogeneous, with the exception of insertion and substitution variants in miR-125 from P. acuticornis.

[Figure 5-3, panels A-D]

Figure 5-3 Alignment of isomirs corresponding to conserved miRNAs from SMD output. Each read id gives the read's abundance rank, followed by a “-” and its frequency, then the conserved miRNA match. For reads matched by SMD, the miRNA homolog match and NW_score are appended to the read id. The letters “A” and “E” denote allelic and edited variants, respectively. The percentage of edited variants is shown next to each alignment. (A) Alignment of isomirs corresponding to miR-748. (B) Alignment of isomirs corresponding to miR-242, showing two alleles. (C) Alignment of isomirs corresponding to miR-375. (D) Alignment of isomirs corresponding to miR-2a, showing two alleles.

[Figure 5-4, panels A and B]

Figure 5-4 Lack of C-to-U substitutions in miR-1175 (A) and miR-315 (B). Each read id gives the read's abundance rank, followed by a “-” and its frequency, then the conserved miRNA match. For reads matched by SMD, the miRNA homolog match and NW_score are appended to the read id.


Chapter VI

Future Directions

The exceptional miRNA repertoire of bdelloid rotifers is perhaps not surprising considering their other outstanding qualities. Knowledge of this repertoire provides insight into their unique biology. The absence of let-7 in the most radioresistant animal known not only sheds light on how bdelloids may have achieved radioresistance but, taken together with the absence of lin-28, also offers new avenues for understanding let-7’s role in DNA damage resistance.

Intriguingly, the dispensability of let-7 in bdelloids provides a compelling link between asexuality and DNA damage, as let-7 likely has a role in both. Editing of miR-125 and several other miRNAs may buffer the absence of conserved miRNAs. Perhaps the low-level editing observed in other miRNAs reflects a general mechanism that buffers against miRNA loss in animals. Characterizing the remaining novel miRNAs that make up the bdelloid complement should also be revealing. Given the abundance of miR-748 in the small RNA library, it is likely that other highly expressed novel miRNAs in the A. vaga library also contribute to bdelloid biology, possibly by regulating genes that undergo gene conversion. However, these candidates will require more rigorous validation.

As previously mentioned, body plan evolution is brought about through the innovation of new cell types. One of the more intriguing theories stemming from this idea is that canalization during the transition to new cell types is achieved through miRNA regulation. During cell state transitions, miRNAs are known to act essentially by buffering transcriptomic noise. Canalization through miRNAs allows new cell identities to evolve by exploiting the pool of precursors within progenitor cells to generate new cell types, while avoiding the pitfalls that may accompany the transition (Kosik, 2010; Peterson et al., 2009). The utility of miRNAs is therefore especially compelling in the context of bdelloid biology, foremost because miRNAs may provide the tools to overcome hurdles in the adaptation of an asexually reproducing organism. If adaptation necessitates new cell types, these could also arise through modification of existing miRNAs by RNA editing.

Elucidating let-7 and miR-100 function in B. manjavacas

B. manjavacas is an established model system for RNAi work. Therefore, let-7 and miR-100 can be knocked out to determine their putative targets in monogononts.

Because lin-41 has predicted binding sites for let-7, its expression profile will be monitored by qPCR and will serve as a guide to de-repression. To confirm these targets experimentally, the 3’UTRs of candidate targets will then be cloned into the heterologous system described by Brennecke et al. (2005). Endogenous let-7 will also be knocked out, B. manjavacas let-7 will be transfected into the cells, and repression will then be measured as described in that study.

Using SMD to find other novel rotifer miRNAs and miRNA-target prediction

One scheme to identify other novel miRNAs within the sequenced small RNA libraries would require appending the recovered rotifer miRNA sequences to the miRBase database. Previously identified small RNA reads would be removed, and only unmatched reads would be given to SMD as input.
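The proposed filtering step could be scripted along these lines. This is a sketch under stated assumptions: reference miRNAs are held as a name-to-sequence dictionary, collapsed reads as (header, sequence) pairs, and exact sequence identity defines a previously identified read. The function names are hypothetical, and the sketch does not reproduce how SMD itself reads its inputs.

```python
from typing import Dict, Iterable, List, Tuple


def extended_reference(mirbase: Dict[str, str], rotifer: Dict[str, str]) -> Dict[str, str]:
    """Append rotifer miRNAs to a miRBase-style {name: sequence} mapping.

    If a name is already present, the existing miRBase entry is kept.
    """
    merged = dict(mirbase)
    for name, seq in rotifer.items():
        merged.setdefault(name, seq)
    return merged


def unmatched_reads(reads: Iterable[Tuple[str, str]],
                    identified: Iterable[str]) -> List[Tuple[str, str]]:
    """Drop reads whose sequence was already assigned to a miRNA."""
    known = {s.upper() for s in identified}
    return [(h, s) for h, s in reads if s.upper() not in known]


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Serialize (header, sequence) pairs as FASTA text for the next SMD run."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```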

As explained in Chapter 3, tools for miRNA analysis in nonmodel organisms are scarce, which makes it difficult to assess the impact of miRNA regulation in rotifers. The available cDNA data, however, begin to resolve this problem. Although SMD was designed for miRNA sequence prediction, it could be adapted to search for potential miRNA binding sites by applying the rules of miRNA-target interaction to the current algorithm. The resulting miRNA-target data may reveal that, as shown in other organisms (Li et al., 2008), miRNAs preferentially target genes present in multiple copies, a common feature of the degenerate tetraploid bdelloid genome.
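As an illustration of the kind of rule that could be layered onto SMD, the sketch below scans a 3'UTR for canonical seed-match sites. The site definitions (6mer, 7mer-A1, 7mer-m8 and 8mer, anchored on Watson-Crick pairing to miRNA nucleotides 2-7) are the widely used ones from the target-prediction literature, not rules taken from SMD; G:U wobble pairs are ignored, and the function names are hypothetical.

```python
from typing import List, Tuple

_PAIR = str.maketrans("ACGU", "UGCA")  # Watson-Crick complements in RNA


def _to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement in the RNA alphabet."""
    return _to_rna(seq).translate(_PAIR)[::-1]


def find_seed_sites(mirna: str, utr: str) -> List[Tuple[int, str]]:
    """Return (start, site_type) for each seed-match site in a 3'UTR.

    `start` is the 0-based index of the 6-nt core that pairs with miRNA
    nucleotides 2-7. The site becomes 7mer-m8 if the preceding UTR base
    pairs with miRNA nucleotide 8, 7mer-A1 if the next UTR base is an A
    (opposite miRNA position 1), and 8mer if both hold.
    """
    mirna, utr = _to_rna(mirna), _to_rna(utr)
    core = reverse_complement(mirna[1:7])  # pairs with miRNA nt 2-7
    pair8 = reverse_complement(mirna[7])   # pairs with miRNA nt 8
    sites: List[Tuple[int, str]] = []
    start = utr.find(core)
    while start != -1:
        m8 = start > 0 and utr[start - 1] == pair8
        a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if m8 and a1:
            kind = "8mer"
        elif m8:
            kind = "7mer-m8"
        elif a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites
```

For the A. vaga miR-125-T16 sequence in Figure 5-1, the core motif is UCAGGG, and a UTR containing CUCAGGGA would be reported as an 8mer site.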

Expression profiling of select conserved and novel miRNAs

To quantify the effect of miRNA editing in bdelloids, a qPCR strategy can be employed to screen for miR-125 isomirs relative to the genomic loci. DNA and RNA can be extracted from the same samples, and the expression profile can then be normalized to the DNA copy number of pre-miR-125. For a given sample, equivalent concentrations of cDNA and DNA can be added to qPCR reactions, with primers amplifying miR-125 from cDNA and pre-miR-125 from genomic DNA. The relative expression of miR-125 variants can thus be measured precisely against DNA copy number within the same sample. A similar approach can be used to amplify miR-100, miR-125 and let-7 in monogononts, so that the relative expression of these conserved miRNAs can be determined.
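The text does not specify how Ct values would be converted into relative expression. One common choice is the comparative Ct calculation, sketched below in Python under the assumption that each assay's starting template is proportional to E^-Ct, where E is the per-cycle amplification factor (2.0 for perfect doubling). This is a swapped-in technique rather than a protocol from this work, and the function name is hypothetical.

```python
def relative_to_dna_copy(ct_mirna: float, ct_dna: float,
                         eff_mirna: float = 2.0, eff_dna: float = 2.0) -> float:
    """Estimate miRNA template (cDNA assay) relative to pre-miRNA DNA copies.

    Assumes starting template is proportional to efficiency ** -Ct for each
    assay, where efficiency is the fold-increase per cycle (2.0 = perfect
    doubling). With both efficiencies at 2.0 this equals
    2 ** -(ct_mirna - ct_dna).
    """
    if eff_mirna <= 1.0 or eff_dna <= 1.0:
        raise ValueError("amplification efficiency must exceed 1.0 (fold per cycle)")
    return eff_dna ** ct_dna / eff_mirna ** ct_mirna
```

For example, a miR-125 isomir assay crossing threshold three cycles after the pre-miR-125 DNA assay (Ct 25 versus 22, hypothetical values) gives 2^-3 = 0.125. Measured efficiencies can be passed in when they depart from perfect doubling.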

References

Brennecke, J., Stark, A., Russell, R.B., and Cohen, S.M. (2005). Principles of microRNA-target recognition. PLoS biology 3, e85.

Kosik, K.S. (2010). MicroRNAs and cellular phenotypy. Cell 143, 21-26.

Li, J., Musso, G., and Zhang, Z. (2008). Preferential regulation of duplicated genes by microRNAs in mammals. Genome Biol 9, R132.

Peterson, K.J., Dietrich, M.R., and McPeek, M.A. (2009). MicroRNAs and metazoan macroevolution: insights into canalization, complexity, and the Cambrian explosion. BioEssays : news and reviews in molecular, cellular and developmental biology 31, 736-747.

Appendix A

Secondary Structures of rotifer miRNAs

Appendix A-1 A. vaga secondary structures

[Figures: predicted secondary structures of A. vaga miRNA precursors, including Av-miR-277; one structure is mfold_util 4.6 sir_graph output with dG = -27.594]

Appendix A-2 B. manjavacas secondary structures
