
Conceptual and Empirical Investigations of Eukaryotic Transposable Element Evolution

by

Tyler Adam Elliott

A Thesis

presented to

The University of Guelph

In partial fulfilment of requirements

for the degree of

Doctor of Philosophy in Integrative Biology

Guelph, Ontario, Canada

© Tyler Adam Elliott, December, 2016

ABSTRACT

Conceptual and Empirical Investigations of Eukaryotic Transposable Element Evolution

Tyler Adam Elliott
University of Guelph, 2016

Advisor: Professor T.R. Gregory

Transposable elements (TEs), mobile pieces of self-replicating DNA, are one of the driving forces behind genomic evolution in eukaryotic organisms. Their contribution to genome size variation and their role as mutagens have led researchers to pursue their study in the hope of better understanding the evolution of genomic properties and organismal phenotypes. But TEs can also be thought of in a multi-level evolutionary context, with TEs best understood as evolving populations residing within (and interacting with) the host genome. I argue, with empirical evidence from the literature, that the multi-level approach advocated by the classic "selfish DNA" papers of 1980 has become less commonly invoked over the past 35 years, in favour of a strictly organism-centric view. I also make the case that an exploration of evolution at the level of TEs within genomes is required, one which articulates the similarities and differences between a TE population and a traditional population of organisms. A comprehensive analysis of sequenced genomes outlines the landscape of how TE superfamilies are distributed, but also reveals that how TEs are reported needs to be addressed. A proper exploration of evolution at the TE level will require a dramatic change to how TE information is annotated, curated, and stored, and I make several specific recommendations in this regard.

ACKNOWLEDGEMENTS

This thesis represents the most taxing work I've done, for various reasons, and there are many people to thank for it finally coming to fruition. I'd like to thank my advisor Ryan Gregory for guiding me through this sometimes strange PhD experience. I'll always be grateful for the opportunities I've been given by working with an advisor who is always willing to help me explore interesting and new areas of research. Dr. Stefan Linquist introduced me to the world of philosophy years ago, and I greatly value his friendship, and the new perspective this has afforded me.

Dr. Teri Crease has always been a sympathetic and wise ear when I was having a problem, and is nearly always able to answer the questions I have so very easily. She is a font of knowledge, and all graduate students should be so lucky as to have her just two doors away. My thanks go to Dr. Cort Griswold, for being subjected to countless meetings where we discussed TEs. Although that particular project didn't come to fruition, I know that Cort has incorporated TEs into his lectures. I consider that an immense win.

Without the support of my lab mates and colleagues, graduate school would have been way less enjoyable. Dr. Nick Jeffery was a joy to sit beside for 5 years, and in him I will always have a life-long friend. Brent Saylor was always there when I needed a fellow TE person to complain to about the quirks of our field, and his knowledge was invaluable to this PhD. And it amazes me that we never drove each other nuts on the 10-hour-long drives to TE meetings at Cold Spring Harbor and Woods Hole. I would also like to thank all past members of the Gregory, Hebert, Crease, Lynn and Adamowicz labs for their friendship and help over the years. To Haley, Darren, Katherine and Ella, thank you for bringing countless smiles to my face this past year.


Many others contributed in smaller ways to this thesis, in the form of advice, both in person, at the pub and through mail, electronic or otherwise. These include: Dr. Sally Adamowicz, Dr. Peter Cock and the Biostar forum, the TE community on Twitter, Dr. Gabriel da Luz Wallau, Dr. Arnaud Le Rouzic, Dr. Roland Vergilino, Dr. Ilia Leitch, Dr. Arvid Ågren, Dr. Peter A. Peterson, and Dr. Ellen Clarke. I would also like to thank Dr. Roy G. Danzmann and Dr. Cedric Feschotte for serving on my defence committee and providing useful suggestions and stimulating conversation.

To the Science and Philosophy squad (I guess that's what we are now): Dr. Karl Cottenie, Dr. Stefan Kremer, Dr. Stefan Linquist, Ryan and Brent. Our weekly meetings were always enlightening, even the ones where we got side-tracked on inexplicable tangents that had nothing to do with TEs, community ecology, computer science or philosophy.

NSERC grants to Ryan, and OGS and OGSST grants to myself, provided the resources to be able to learn as a vocation.

We all know academic departments cannot run smoothly without the efforts of wonderful staff. Thanks go to Lori Ferguson, Monica McKay, Debbie Bailey and Mary Ann Davis.

That which has kept me sane over these 5 years deserves accolades: Mike, Joel, the Mads and the Bots, Ed Greenwood and his Forgotten Realms, the Tell'Em Steve-Dave podcast, and the folks at Red Letter Media.

My friends Dan, Victoria, Matt, Bill, Brandon, Ryan and Shane were always there throughout the years to provide laughs and comfort. I look forward to being able to spend more time with them in the future, and being able to respond with more than "finishing my thesis" when they ask what I'm doing.

My family has always been there for me. I'm grateful to have grown up with four sisters whom I admire, love and respect. And without my brother-in-law, I wouldn't know the power of positive thinking. To Shannon, Derek, Kirstin, Marissa, Rhianna, Adrian, Kevin, Christina and Oliver: I love you all.

More so than any other people, I owe the greatest thanks to my parents, Douglas Elliott and Sonia O'Brien. They must have seen the budding scientist in me all those many decades ago and never stopped encouraging me. I dedicate this thesis to them.


STATEMENT OF AUTHORSHIP AND CONTRIBUTIONS

All chapters herein were written by the author of this thesis alone. Chapter 3 is based upon two publications published in The Philosophical Transactions of the Royal Society of London B (2015, Vol. 370, pg. 20140331) and BMC Evolutionary Biology (2015, Vol. 15, pg. 69), both co-authored with Dr. T. Ryan Gregory. For these papers, and for Chapter 2, both co-authors had input on study design, but data collection and analysis were performed by me. The ideas from Chapter 4 were conceived of by the author of this thesis, with some input from Dr. T. Ryan Gregory.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS
STATEMENT OF AUTHORSHIP AND CONTRIBUTIONS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
LIST OF APPENDICES

Chapter 1
A general introduction to transposable element biology and evolution
  Abstract
  Introduction
  The diversity and evolution of eukaryotic transposable elements
  Class I – Retrotransposons
    i) Penelope
    ii) LINEs/Non-LTRs
    iii) SINEs
    iv) LTR retrotransposons (Ty3/Gypsy, Ty1/Copia, BEL/Pao)
    v) Dictyostelium intermediate repeat sequences (DIRS) and relatives
    vi) Miscellaneous non-autonomous retrotransposons
  Class II – DNA Transposons
    i) Inverted terminal repeat elements
    ii) Maverick/Polinton
    iii) Helitrons
    iv) Cryptons
    v) Miscellaneous non-autonomous DNA transposons
  TE biology and behavior
  Vertical inheritance and horizontal transmission
  TE-organisms interactions
  Overview of this thesis
  Where we have been
  What we know
  Where we are going
  Tables and figures

Chapter 2
Where we have been: the Selfish DNA hypothesis, then and now
  Abstract
  Introduction
  Setting the context: views on genome function before 1980
  The concept of selfish DNA before 1980
  What the 1980 papers on selfish DNA did not say
  Doolittle and Sapienza, 1980
  Orgel and Crick, 1980
  Misconception about the selfish DNA hypothesis
  Early reactions to the selfish DNA papers
  Elaboration of the selfish DNA concept
  How the selfish DNA papers have been cited over the past 35 years
  An updated view of selfish DNA
  Concluding remarks
  Tables and figures

Chapter 3
Where we are: Transposable element abundance and diversity in genomes of different sizes
  Abstract
  Introduction
  Methods
  Baseline analyses of genomic parameters and determining their general patterns
  Genome parameters data set
  Genome size data set
  TE richness data set
  Statistical analyses
  Follow-up analyses and testing predictions
  Statistical analyses
  Results
  Baseline analyses of genomic parameters and determining their general patterns
  Genomic parameters
  TE richness
  Follow-up analyses and testing predictions
  Lack of reporting TE information
  Discussion
  Genome parameters
  Predictive value of genome size
  TE richness
  TE community composition
  How TE information is reported
  Concluding remarks
  Tables and figures

Chapter 4
Where we are going: Transposable element evolution from an element-level perspective and the future of TE research
  Abstract
  Introduction
  Function, selfish DNA, multi-level selection, and misconceptions
  Developing the analogy
  An expanded continuum
  The element-level perspective in the post-genomic era
  Why we need the element-level perspective more than ever
  Distinguishing TE ecology from TE evolution
  How further development of the element-level perspective can benefit genome biology
  Outstanding questions about TE biology from an element-level perspective
  Boundary conditions
    i) What is a TE individual?
    ii) What is a TE population?
    iii) Are there TE species?
  Evolutionary factors
    i) What is TE effective population size?
    ii) What are the rates and dynamics of and selection at the level of TE populations?
    iii) Do TEs have different reproductive modes?
  A new TE biology which acknowledges multiple levels of evolution
  Stress and TE activation in the light of the TE level
  Experimentally disentangling element-level evolution from organism-level evolution
  Concluding remarks
  The future of TE research
  Tables and figures

References
Phylogeny references
Genome data references

LIST OF TABLES

Chapter 3

Table 3.1. Summary of genome data included in Elliott and Gregory (2015b)

Table 3.2. Average values for several genomic parameters from each taxonomic group, as determined by data from Elliott and Gregory (2015a; 2015b)

Table 3.3. Relationships between genome size and other genomic parameters, as shown by correlation coefficients (r)

Table 3.4. Statistics for cross-validation and relationships between predicted and observed values (represented by correlation coefficients [r]) for genomic parameters from newly collected genomes

Chapter 4

Table 4.1. Conditions and selection pressures under which the experimental evolution of a TE population could be studied in a Saccharomyces cerevisiae system

LIST OF FIGURES

Chapter 1

Figure 1.1. Composite phylogeny of eukaryotic retrotransposons based on reverse transcriptase

Figure 1.2. Eukaryotic retrotransposons

Figure 1.3. Eukaryotic retrotransposons continued

Figure 1.4. Eukaryotic DNA transposons

Figure 1.5. Phylogenetic relationships of eukaryotic DNA transposons at the superfamily level based upon information

Figure 1.6. Forces through which selection at the host-level acts to effect the TE distribution within the genome

Figure 1.7. The TE lifecycle

Chapter 2

Figure 2.1. The percentage of selfish DNA paper citations classified as Multi-level over time from 1980 until the end of 2013

Figure 2.2. The total number of citations of the selfish DNA papers from 1980 until the end of 2013

Chapter 3

Figure 3.1. Process of compiling comprehensive TE richness data for eukaryote genomes

Figure 3.2. Log10 cytological genome size and assembled genome size (Mbp) for animals, land plants, fungi and protists

Figure 3.3. Log10 cytological genome size and discrepancy between cytological and assembled genome size in animals

Figure 3.4. Log10 cytological genome size and discrepancy between cytological and assembled genome size in land plants

Figure 3.5. Log10 cytological genome size and gene number in animals

Figure 3.6. Log10 cytological genome size and gene number in fungi

Figure 3.7. Log10 cytological genome size and gene number in land plants

Figure 3.8. Log10 cytological genome size and coding proportion of the genome in animals, fungi, land plants and protists

Figure 3.9. Log10 cytological genome size and intron proportion of the genome in land plants

Figure 3.10. Log10 cytological genome size and proportion of the genome composed of (A) repeats and (B) TEs in animals

Figure 3.11. Log10 cytological genome size and proportion of the genome composed of (A) repeats and (B) TEs in fungi

Figure 3.12. Log10 cytological genome size and TE richness in land plants

Figure 3.13. Log10 cytological genome size and TE richness in fungi

Figure 3.14. Log10 cytological genome size and TE richness in animals, fungi, land plants and protists

Figure 3.15. Log10 cytological genome size and (A) DNA transposon and (B) retrotransposon richness in animals, fungi, land plants and protists

Figure 3.16. Log10 cytological genome size and TE richness in animals, fungi, land plants and protists with genomes smaller than 500 Mbp

Figure 3.17. Log10 predicted and observed gene number from animals, predicted from the cytological genome size values of novel genome sequences

Figure 3.18. Log10 predicted and observed gene number from fungi, predicted from the cytological genome size values of novel genome sequences

Figure 3.19. Log10 predicted and observed gene number from land plants, predicted from the cytological genome size values of novel genome sequences

Figure 3.20. Log10 predicted and observed discrepancy between cytological and assembled genome size, predicted from the cytological genome size values of novel genome sequences

LIST OF ABBREVIATIONS

7SL: signal recognition particle RNA
ATPase: adenosine triphosphatase
AP-EN: apurinic/apyrimidinic endonuclease
AP: aspartyl protease
BAC: bacterial artificial chromosome
bp: base pairs
DDD/E: aspartic acid mediated transposase
DIRS: Dictyostelium intermediate repeat sequence
dbRDA: distance-based redundancy analysis
dsDNA: double-stranded DNA
EN: endonuclease
ENV: envelope protein
ERV: endogenous retrovirus
GAG: group-specific antigen protein
Gbp: gigabase pairs
GIRI: Genetic Information Research Institute
GyDB: Gypsy Database
HT: horizontal transfer
INT: integrase
IS: insertion sequence
ITR: inverted terminal repeat
Kbp: kilobase pairs
LARD: large retrotransposon derivative
LINE: long interspersed nuclear element
LTR: long terminal repeat
Mbp: megabase pairs
MITE: miniature inverted-repeat transposable element
M/P: Maverick/Polinton
mRNA: messenger RNA
Nc: census population size
Ne: effective population size
ORF: open reading frame
PBS: primer binding site
PIC: phylogeny independent contrast
RC: rolling circle
rDNA: ribosomal DNA
REL-EN: restriction enzyme-like endonuclease
RNH: ribonuclease H
RPA: replication protein A
RT: reverse transcriptase
SINE: short interspersed nuclear element
ssDNA: single-stranded DNA
TE: transposable element
TPRT: target-primed reverse transcription
TRIM: terminal repeat retrotransposon in miniature
tRNA: transfer RNA
TSD: target-site duplication
VLP: virus-like particle
YR: tyrosine recombinase

LIST OF APPENDICES

Appendix 1. Selfish DNA citations data set.

Appendix 1.1 Selfish DNA papers and citation coding.csv

Appendix 1.2 Selfish DNA citations data set.csv

Appendix 2. Genome parameters data set.

Appendix 2.1 Genome parameters data set.csv

Appendix 3. TE richness data set.

Appendix 3.1 TE richness in 257 .csv

Appendix 3.2 DNA transposon richness.csv

Appendix 3.3 Retrotransposon richness.csv

Appendix 3.4 TE richness in eukaryotes with genomes smaller than 500 Mbp.csv

Appendix 3.5 TE richness in eukaryotes with genomes larger than 500 Mbp.csv

Appendix 3.6 Survey sequenced genomes coverage and read length data.csv

Appendix 3.7 Excluded genome TE richness matrix.csv

Appendix 4. Tree used for phylogenetic corrections in genome parameters data set.

Appendix 4.1 Genome parameters tree.nex

Appendix 5. Tree used for phylogenetic corrections in TE richness data set.

Appendix 5.1 TE richness tree.nex

Appendix 6. New genome parameters data set.csv

Appendix 6.1 New genome size measurements.csv

Appendix 6.2 New genome TE richness matrix.csv

Appendix 6.3 New TE richness predictions.csv


Appendix 6.4 New gene number predictions.csv

Appendix 6.5 New genome size discrepancy predictions.csv


Chapter 1: A general introduction to transposable element biology and evolution

Abstract

Genome size varies enormously across the eukaryotic tree of life. Much of this variation is caused by changes in the proportion of the genome made up of repetitive DNA, such as transposable elements (TEs). TEs are genetic entities which encode their own means to move from place to place within the genome and replicate themselves. By virtue of this replication and movement, TEs are mutagenic agents, causing harmful, neutral, and beneficial changes in an organism's genome. In this introductory chapter I describe the phylogenetic diversity of TEs in eukaryotes, how they evolve, and how they have affected the genomes of organisms over the billions of years of coevolution between the two. I conclude with an outline of the rest of this PhD thesis.

Introduction

The size of genomes varies quite substantially across all life, from 138 kilobase pairs (Kbp) in a mealybug bacterial endosymbiont (McCutcheon and von Dohlen, 2011) to the 152.230 gigabase pair (Gbp) genome of the lily Paris japonica (Pellicer et al., 2010) – a more than one million-fold range. Genomes can be relatively simple and straightforward, such as a single, circular piece of DNA containing thousands of genes which encode the biochemical processes necessary for the survival and reproduction of a unicellular bacterium. In such genomes, genes can be spaced so close together that where one ends another begins. At the other extreme, genes can be separated by thousands or hundreds of thousands of base pairs (bp), spread across anywhere from one to hundreds of linear chromosomes. These genes can regulate the growth and development of organisms which start out as a single-celled zygote and grow into a large-bodied entity with tens to hundreds of specialized cell types which carry out the day-to-day workings of life.
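
The fold range quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that takes the two genome sizes exactly as stated in the text; the variable names are illustrative only and are not drawn from any data set used in this thesis.

```python
# Genome sizes as quoted in the text, expressed in base pairs.
KBP = 1_000        # base pairs per kilobase pair
MBP = 1_000_000    # base pairs per megabase pair

smallest_bp = 138 * KBP      # mealybug bacterial endosymbiont (138 Kbp)
largest_bp = 152_230 * MBP   # Paris japonica (152.230 Gbp = 152,230 Mbp)

# Ratio of the largest to the smallest genome size.
fold_range = largest_bp / smallest_bp
print(f"Fold range: {fold_range:,.0f}")
```

With these two values the ratio comes out at roughly 1.1 million, consistent with the "more than one million-fold" description.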

Genes are not the only components of the genome. Non-protein coding regulatory elements, such as transcription factor binding sites, interact with one another to modulate the expression of genes in response to stimuli and in accordance with developmental pathways. Interspersed across the genome are loci which generate a variety of RNAs of various sizes, some of which may also serve to regulate the expression of genes, others of which are crucial for making proteins, and others of which may be nothing more than artifacts of a messy transcription process (Cech and Steitz, 2014; Palazzo and Lee, 2015). In the eukaryotes, where DNA is enclosed in a protective nucleus, genes are partitioned into chromosomes, which require expanses of repeats known as centromeres to ensure the proper transmission of a full set of chromosomes during DNA replication and cell division (Verdaasdonk and Bloom, 2011). Other repeats called telomeres play essential roles at the ends of chromosomes, but not all repeats within genomes can be assigned a simple function for the organism (Blasco, 2005). One of the driving forces behind the great variation in genome size, in particular the variation found within eukaryotes, is the proportion of the genome composed of repetitive DNA (Elliott and Gregory, 2015a).

From the small genome to the large, we find repeated, molecular entities which differ quite remarkably from other molecular components of the genome. Although these sequences are also passively replicated by the polymerases and accessory proteins found in the cell, much like any feature of the genome, they also possess their own means of replication. In between, and sometimes even within, genes can be found a vibrant community of molecular replicators that have evolved to exploit the resource-rich domains of cells without necessarily contributing anything in return for using these resources. Amongst the genes, proteins, and RNAs carrying out their tasks for the greater good of the genomic whole can be found these genomic parasites, known as transposable elements (TEs). This chapter will serve as an introduction to the biology and evolution of the TEs found within eukaryotes. Their structure, mechanisms of transposition, and phylogenetic diversity will be discussed along with how they interact with the genome of the organism they inhabit. The chapter will conclude with an overview of the contents of this PhD thesis.

The diversity and evolution of eukaryotic transposable elements

TEs are molecular replicators composed of DNA that either encode the means to mobilize and replicate themselves independently of the rest of the genome, or use the molecular machinery of other TEs for this purpose. This is often at the expense of the rest of the genome, because energy and resources are diverted from normal cellular functions, and the mobility and replication of TEs is often quite harmful to the host organism (Belancio et al., 2010; Nellåker et al., 2012). The fate of some TEs is to be suppressed by inhibitory RNAs, cytosine methylation and histone proteins, which render them silent. In some genomes, the dead remains of TEs occur within the chromosomal landscape to such an extent that the majority of the genome is composed of them, contributing to the vast diversity in genome size seen in eukaryotes (Elliott and Gregory, 2015a). However, transpositionally active individual elements can be found, although many are inevitably doomed to be rendered inactive by the process of mutation. Some genomes are dominated by a few, high copy number lineages of elements, with most of these copies being inactive. In other cases, a rich community exists, with many active members from a variety of different lineages reproducing and moving from place to place within the genome. Some genomes seem not to tolerate much space being occupied by these parasites and their remains, while others tolerate a great deal. It is TEs in eukaryotes, their evolution and how best to perceive them, to which this thesis will be devoted.

What follows is an overview of the phylogenetic diversity of eukaryotic TEs, which have been traditionally divided into two classes based upon whether or not an RNA intermediate is used in the transposition mechanism (Finnegan, 1989). Those TEs that do use an RNA intermediate, the Class I elements, are known as retrotransposons based upon their means of replication. Retrotransposons are first transcribed into mRNA, which leaves the nucleus and is translated by ribosomes into the products of the various open reading frames (ORFs) it encodes; among these is a reverse transcriptase (RT), an enzyme which can synthesize a DNA copy from an RNA template (a list of abbreviations is provided in Table 1.1). The element mRNA is then reverse transcribed back into DNA and is inserted into the genome through several different mechanisms. Examples of Class I elements include the Penelope elements, long interspersed nuclear elements (LINEs), long terminal repeat (LTR) elements, short interspersed nuclear elements (SINEs), and other derivatives of retrotransposons.

Class II elements, known as the DNA transposons, lack an RNA intermediate and thus a direct copy mechanism. Some DNA transposons encode proteins known as transposases or recombinases that recognize the boundaries of a given element, excise that element from one position in the genome, and insert it into another ("cut and paste"). This can result in an increase in copy number depending on when this excision and reinsertion occurs during the cell cycle, by the repair of the empty transposon site and/or by the element reinserting into a region of the genome that has not been replicated yet (Engels et al., 1990). In the following sections, I provide more details on the structure and diversity of both classes.

Class I – Retrotransposons

Early attempts to determine the evolutionary relationships between various RTs suggested that those encoded by retroplasmids, retrons, positive strand RNA viruses, and group II mobile introns occupy the earliest branches on the tree, all of which are presumably descended from RNA-dependent RNA polymerases (Xiong and Eickbush, 1988; 1990). Group II mobile introns are mobile DNA found in the genomes of Bacteria and Archaea and in the organelle genomes of some eukaryotes, and are also thought to share a common ancestor with the introns found within the genes of eukaryotes (Lambowitz and Zimmerly, 2010). Problems with mutation saturation over the geological time scale that various RTs have been evolving from one another make it difficult to determine a root for the tree, and can make determining branching orders within some of these early lineages next to impossible (Simon et al., 2009; Figure 1.1).

i) Penelope

Penelope is a superfamily of retrotransposons present in metazoans, fungi, and some protists (Arkhipova et al., 2002; Arkhipova, 2006). They are between 2 Kbp and 5 Kbp long and possess a single ORF encoding an RT and a GIY-YIG amino acid motif endonuclease (EN) that is used to cut the host DNA to allow for reinsertion of the new copy of the element made by the RT (Jurka et al., 2005). Arkhipova et al. (2002) found a well-supported sister relationship between the RT sequences of Penelope elements and those of the telomerase enzyme of eukaryotes. There is strong support for the hypothesis that Penelope and long interspersed nuclear elements (LINEs) diverged from a common ancestor prior to the Cambrian period, although later work on long terminal repeat (LTR) elements suggests this may have happened much earlier (Malik et al., 1999; Arkhipova, 2006; Llorèns et al., 2009).

ii) LINEs/Non-LTRs

LINEs, also known as non-LTR retrotransposons, are between 2.5 Kbp and 10 Kbp long, and encode one or two ORFs depending on the clade of elements being considered (Jurka et al., 2005). All LINEs include RT and EN coding regions, sometimes accompanied by separate RNA-binding and RNase H (RNH) ORFs (Han, 2010). These EN genes come in the form of either a restriction enzyme-like (REL-EN) or an apurinic/apyrimidinic (AP-EN) endonuclease, depending on the relevant lineage. Replication begins with the transcription of an mRNA copy of the element, usually from an internal promoter, and the subsequent nuclear export and translation of the transcript by the ribosome. The translated protein products associate with the mRNA and the entire complex is transported back into the nucleus. LINEs and Penelope elements undergo a process known as target-primed reverse transcription (TPRT) in which the EN makes a cut in the bottom strand of the intended target site, and the RT then uses the exposed single-stranded DNA as a primer to start reverse transcription of the associated element mRNA (Luan et al., 1993).

The process of TPRT commonly results in the RT aborting reverse transcription prematurely, resulting in the frequent creation of new insertions which are truncated at the 5' end. This occurs so frequently that most new LINE insertions are incomplete. Non-LTR elements are the most prolific elements in all three major clades of mammals, as seen in the platypus, woolly mammoth and Tammar wallaby genomes (Warren et al., 2008; Zhao et al., 2009; Renfree et al., 2011).

Non-LTR elements can be divided into two distinct groups of early and late branching clades separated by two related diagnostic features: insertion site specificity and the type of EN ORF. Members of the early branching R2 clade have very specific insertion sites, such as in ribosomal DNA (rDNA), and possess a REL-EN which mediates their reinsertion back into the genome (Malik et al., 1999). At some point an R2-clade collateral ancestor acquired a second EN-coding region, this time from the AP-EN group of elements, the descendants of which formed the Dualen clade (Kojima and Fujiwara, 2005). The subsequent loss of the original REL-EN coding region resulted in the loss of strict site-specificity and the explosive diversification of the later branching clades, including the clade containing the pervasive L1 element (Malik et al., 1999). In theory, the loss of site-specificity allowed members of later-branching clades to exploit genomic niches that were lacking non-LTRs, and the greater number of non-site-specific clades compared to those that are site-specific suggests this event may have been a significant transition point. This EN-switching event was also accompanied by the acquisition of another ORF upstream of that encoding the RT, characterized as a single-stranded RNA-binding protein in mammalian L1 elements (Martin, 2006). The final major acquisition by some later branching clades of non-LTR elements was a type I RNH coding region, used to process RNA intermediates in the retrotransposition process, which may have subsequently been lost in several lineages (Malik, 2005).

iii) SINEs

Short interspersed nuclear elements (SINEs) are small retrotransposons, ranging from ~80 bp to 1400 bp, which do not encode any proteins (Jurka et al., 2005). SINEs are all derived from small RNA loci such as transfer RNAs (tRNAs), 7SL RNA and 5S ribosomal RNA (rRNA), and retrotranspose by parasitizing the enzymes of LINE elements. Owing to their derivation from RNA-coding loci, SINEs are transcribed from internal RNA polymerase III promoters, rather than by polymerase II as in genes and other TEs (Kramerov and Vassetzky, 2011). Some SINEs are partially derived from certain LINEs, and sharing the 3' regulatory region of their cognate LINE allows them to use the LINE proteins for replication (Okada et al., 1997). Not all SINEs replicate this way; others take advantage of the promiscuous interactions some LINE RTs have with competing RNA templates, the same mechanism that generates processed pseudogenes, which are non-functional copies of genes (Esnault et al., 2000). Due to their dependence on LINEs, SINEs also undergo TPRT and are reinserted back into the genome in much the same way as LINEs. The human genome contains 7SL-derived SINEs called Alu elements, which have reached a copy number of over 1 million by utilizing the proteins of LINE1 elements (International Human Genome Sequencing Consortium, 2001; Dewannieux et al., 2003).

Although they are effectively non-autonomous non-LTR elements, SINEs often do not share any sequence similarity with their autonomous LINE partners, and this, combined with other factors, merits giving them separate consideration. SINEs have evolved multiple times independently throughout eukaryotic evolution, with 500 million year-old CORE-SINEs being some of the oldest and the primate-specific Alu SINEs being some of the youngest (Gilbert and Labuda, 1999; Batzer and Deininger, 2002). SINEs typically consist of several modules including a derivation of the original small RNA, a conserved 'core' region shared by related elements, a poly-A tail that is a by-product of mRNA reverse transcription, and sometimes a LINE-derived region (Kramerov and Vassetzky, 2011).

iv) Long terminal repeat retrotransposons (Ty3/Gypsy, Ty1/Copia, BEL/Pao)

LTR retrotransposons are between 2 Kbp and 12 Kbp in length and are divided into three superfamilies: Ty3/Gypsy, Ty1/Copia and BEL/Pao (Jurka et al., 2005). The different superfamilies are separated based not only on sequence identity, but also on the order of their

ORFs, 5' to 3' (Figure 1.2 and Figure 1.3). Some elements also encode an extra envelope (ENV) protein that mediates the transfer between cells and has been crucial to the evolution of retroviruses in vertebrates (Lerat and Capy, 1999; Lloréns et al., 2008). LTR elements are bound on each end by long terminal repeat structures that house the transcriptional regulatory regions of the element. During replication, transcription starts within the 5' LTR and continues through the rest of the element until it terminates at the end of the 3' LTR (Havecker et al., 2004). Reverse transcription begins with a tRNA binding to the complementary primer binding site (PBS) located downstream of the 5' LTR, or through a unique self-priming mechanism found in some elements (Levin, 1995). The nucleic acid-protein complex is then shuttled to the nucleus where the new DNA copy is reinserted back into the genome by the element-encoded integrase (INT).

LTR elements make up a major component of plant genomes in particular, and represent the most numerous TEs in, amongst others, the maize, English oak and Fritillaria lily genomes

(Baucom et al., 2009; Ambrožová et al., 2011; Rampant et al., 2011). LTR retrotransposons are thought to have a chimeric origin, as first proposed by Capy et al. (1995, 1997). Shared ancestry between the integrases of LTR elements and the transposases of Class II elements led the aforementioned authors to hypothesize that the origins of LTR elements lay in the fusion of a DNA transposon with a non-LTR retrotransposon, with the subsequent acquisition of a suite of other ORFs. This conjecture received a great deal of support with the discovery of the Banshee/Ginger/GIN superfamily of DNA transposons (Barrie and

Pritham, unpublished; Bao et al., 2010; Marín, 2010). The transposases of this superfamily are divided into two lineages, Ginger1 and Ginger2, the latter appearing to be closely related to the ancestral transposon, which was thought to have partially given rise to LTR elements.

Interestingly, Ginger1 is thought to have evolved from a Ty3/Gypsy integrase and represents the transition of an integrase back to a Class II element (Bao et al., 2010). In a ground-breaking paper, Llorèns et al. (2009) used conventional and network-based phylogenetic approaches to propose that the first LTR elements appeared over 2 billion years ago in an ancestral eukaryote through the previously-mentioned fusion event. LTR elements are thought to have acquired their

RNH from the earlier branching non-LTR elements based on phylogenetic analysis (Malik and

Eickbush, 2001).

Which fusion event occurred first is unknown, but the first LTR elements differed from non-LTR elements in several other ways. The proto LTR elements gained an aspartyl protease

(AP), used in element protein processing, and a group-specific antigen (GAG) protein, serving as a structural scaffold upon which element replication could occur, most likely from their host genome (Krylov and Koonin, 2001). Ty1/Copia elements are thought to be the earliest branching

lineage of integrase-containing LTR elements, having split from a common ancestor with

Ty3/Gypsy elements some 1.9-2.7 billion years ago (Llorèns et al., 2009). BEL/Pao elements evolved from a Ty3/Gypsy-like ancestor some 1.3-1.5 billion years ago in a metazoan host prior to the split between cnidaria and bilateria, and were recently found to be the second most numerous lineage of LTR elements in eukaryotes, with Ty3/Gypsy elements first followed by

Ty1/Copia as a distant third (Llorèns et al., 2009; de la Chaux and Wagner, 2011). The final major bifurcation event in the LTR lineage was the evolution of retroviruses from at least three separate lineages of Ty3/Gypsy elements by acquiring an ENV protein coding region (Llorèns et al., 2008).

v) Dictyostelium intermediate repeat sequences (DIRS) and relatives

DIRS superfamily elements are phylogenetically related to LTR retrotransposons through most of their ORFs, but their means of replication and integration are thought to be different.

They are divided up into the DIRS, PAT, Ngaro and VIPER lineages, but the phylogenetic relationships between these groups remain cryptic (Goodwin and Poulter 2004; Wicker et al.,

2007; Szitenberg et al., 2015). The INT ORF in DIRS elements has been replaced by a tyrosine recombinase (YR), thought to have been acquired from an ancestor of Crypton DNA transposons

(Goodwin et al., 2003). As well, instead of having LTRs, DIRS elements have combinations of both inverted and 'split' direct repeats at their termini (Goodwin and Poulter, 2001). They are thought to use these unique repeats after reverse transcription to generate circular, double-stranded DNA intermediates that are then directly recombined back into the genome using the

YR protein (Goodwin and Poulter, 2004). This means of integration does not generate a target site duplication (TSD) of DNA at the site of insertion back into the genome, unlike most other

forms of TEs. Although they have not been extensively studied, preliminary evidence suggests that DIRS elements are more common within crustacean genomes, including those of decapods and the freshwater microcrustacean Daphnia pulex (Piednoël and Bonnivard, 2009; Rho et al., 2010).

vi) Miscellaneous non-autonomous retrotransposons

Due to the occurrence of mutation and a lack of host-level selection to maintain their

ORFs, many retroelements in genomes cannot produce functional proteins. Provided these elements can still produce mRNA transcripts, they can use the proteins of functional elements for reverse transcription and reintegration. The entire 'life-cycle' of SINE elements is based on their immensely successful parasitism of the proteins produced by LINE elements. Besides elements without functional ORFs, there are several lineages of specialized non-autonomous elements which make use of the proteins of the various retrotransposon superfamilies. While SINEs are typically seen as the 'poster-child' of non-autonomous non-LTR elements, the Anopheles gambiae genome harbours a family of non-LTR elements named Sponge, which encode only an RNA-binding protein and would require the RT activity of a related autonomous family for retrotransposition (Biedler and Tu, 2003). A similar deletion derivative termed HAL1 was discovered in several mammal genomes, including human, and is presumed to parasitize L1 elements (Bao and Jurka, 2010). Groups of non-autonomous LTR elements that are smaller

(Terminal-repeat retrotransposons in miniature [TRIM]) and larger (Large retrotransposon derivatives [LARD]) than their autonomous counterparts have also been identified in plant genomes (Witte et al. 2001; Kalendar et al., 2004). Specialized non-autonomous lineages of

DIRS and Penelope elements have not been identified, although some may remain to be discovered in these under-studied groups.


Class II - DNA Transposons

Class II elements represent a diverse group of TEs that do not actually share a common ancestor. Instead, they are grouped because they either transpose directly or via a DNA copy from one position in the genome to another, without requiring an RNA intermediate. These include the cut-and-paste, inverted terminal repeat (ITR) elements encoding transposases, and

Cryptons, which encode YRs instead. Other types of DNA transposons utilize more direct means of copy number increase: the Helitrons use a rolling circle (RC) replication-like mechanism similar to that employed by plasmids, whereas the Maverick/Polinton (M/P) elements, like many viruses, encode their own means of producing new copies of themselves (Kapitonov and Jurka, 2001; Feschotte and Pritham, 2005; Kapitonov and Jurka, 2006).

i) Inverted terminal repeat elements

The most well-studied and understood Class II elements are the cut-and-paste DNA transposons, all of which encode a transposase and are usually flanked by inverted terminal repeat (ITR) sequences of DNA at the 3' and 5' termini of the element (Figure 1.4). They were first famously discovered by Barbara McClintock in maize as the causative agents behind the changing colour of corn kernels that had been treated with radiation (McClintock, 1946; 1947).

They range in size from >200 bp to ~15 Kbp, are divided into at least 19 superfamilies and are widespread within all eukaryotes (Feschotte and Pritham, 2007; Yuan and Wessler, 2011; Böhne et al., 2012; Kojima and Jurka, 2013). ITR elements replicate via the transposase protein(s) recognizing and binding to the ITRs at either end of the element, followed by nucleophilic attack of phosphodiester bonds mediated by a divalent cation bound to the catalytic pocket (Hickman et al., 2010). The complex formed by the transposase proteins and transposon DNA, or

transpososome, then moves to a different location in the genome where the transposase cuts open the new insertion site and the element is reinserted back into the genome. Thus, unlike the "copy-and-paste" mechanism of retrotransposons, DNA transposons employ a "cut-and-paste" mechanism. Because an mRNA transcript is made only of the transposase ORF and not the entire element, once the transposase protein returns to the nucleus it does not have a cis preference for the specific insertion from which it was transcribed, unlike retrotransposons.

Transposition typically occurs during or very close to S phase in the cell cycle, during which

DNA is replicated and the cellular repair machinery is most active (Bell and Dutta, 2002; Izsvák et al., 2009; Walisko et al., 2009).
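The defining structural feature of these elements, the inverted terminal repeat, can be made concrete with a short Python sketch. It is only an illustration under simplifying assumptions: it looks for perfect ITRs, whereas real ITRs are often imperfect, and the function names and toy sequence are hypothetical rather than taken from any annotation tool.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_perfect_itr(seq: str, max_len: int = 50) -> int:
    """Length of the longest perfect ITR at the termini of `seq`.

    A perfect ITR of length k exists when the first k bases equal the
    reverse complement of the last k bases. Position i of the reverse
    complement of the whole sequence is the complement of the i-th base
    counted from the 3' end, so the two strings are compared base by base.
    """
    seq = seq.upper()
    limit = min(max_len, len(seq) // 2)  # the two repeats may not overlap
    rc = reverse_complement(seq)
    k = 0
    while k < limit and seq[k] == rc[k]:
        k += 1
    return k


# Toy element: 6 bp ITRs (CAGTGA ... TCACTG) around an arbitrary interior.
toy_element = "CAGTGA" + "AACCCTTTGGG" + "TCACTG"
print(longest_perfect_itr(toy_element))  # 6
```

A practical search would tolerate mismatches and small indels in the repeats; the exact-match rule is used here only to keep the definition visible.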

DNA replication is the key to copy number increase in ITR elements, which occurs through several related mechanisms tied to the presence of multiple copies of the same chromosomes and the very active DNA repair proteins. If a particular copy of an element transposes when an identical sister chromatid is present, a net increase in copy number can occur when, during gap repair of the element excision site, the repair proteins use the sister chromatid as a template to repair the gap (Engels et al., 1990). Copy number can also increase if the element transposes into a section of chromosome that has yet to be replicated, the result being a new TE copy on each copy of a particular chromosome destined for the two daughter cells (Chen et al., 1992). Although usually less numerous than retrotransposons, ITR elements make up the majority of TEs in the genomes of the nematode Caenorhabditis elegans, rice Oryza sativa, and the frog Xenopus tropicalis (Feschotte and Pritham, 2007; Hellsten et al., 2010).

Most phylogenetic work done on ITR DNA transposons has involved determining within-superfamily relationships, and even these studies are limited in number (Ahn et al., 2008;

Arensburger et al., 2011). All ITR elements possess the transposase coding region, which

consists of a catalytic motif formed by either a DDD or DDE amino acid triad, accompanied by a

DNA-binding domain, either fused to the catalytic motif or present in a separate ORF

(Kapitonov and Jurka, 2004; Hickman et al., 2010). Eukaryotic ITR elements are most likely descended from TEs found in the ancestors of modern Bacteria and Archaea, and in some cases similarity has been found between transposases from certain superfamilies and those encoded by particular families of insertion sequence (IS) elements (Doak et al., 1994; Zhang et al., 2001; Feschotte, 2004). However, IS relatives have not been identified for most of the 19 or so currently recognized superfamilies, simply because they have not been studied, due to most of these superfamilies only receiving a superficial description in Repbase and little else in the conventional literature (Jurka et al., 2005). Yuan and Wessler (2011) sought to determine the relationship between the different ITR-element superfamilies by finding and aligning the DDD/E triads in transposase coding regions, in the cases where they were unknown. From this alignment, 'signature strings' of conserved residues were identified in each superfamily and their presence or absence was used as a binary character in a parsimony-based matrix method to construct a phylogeny (Figure 1.5).
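The presence/absence coding described above can be sketched in a few lines of Python. This is not the published pipeline: the superfamily names, the signature labels, and the helper function are all hypothetical placeholders, and the sketch stops at building the binary character matrix that a parsimony program would take as input.

```python
from __future__ import annotations


def presence_absence_matrix(
    signatures: dict[str, set[str]],
) -> tuple[list[str], dict[str, list[int]]]:
    """Code each signature string as a binary character per taxon.

    `signatures` maps a taxon (here, a superfamily) to the set of
    signature strings observed in it. Returns the ordered character
    list and, for each taxon, a row of 1 (present) / 0 (absent).
    """
    characters = sorted(set().union(*signatures.values()))
    matrix = {
        taxon: [1 if c in observed else 0 for c in characters]
        for taxon, observed in signatures.items()
    }
    return characters, matrix


# Hypothetical input: placeholder taxa and placeholder signature labels.
example = {
    "superfamily_A": {"sig1", "sig2"},
    "superfamily_B": {"sig2", "sig3"},
    "superfamily_C": {"sig3"},
}
characters, matrix = presence_absence_matrix(example)
for taxon, row in matrix.items():
    print(taxon, row)
```

Each column of the resulting matrix is one binary character; shared presences (such as `sig2` in the first two placeholder taxa) are what a parsimony analysis would treat as potential evidence of relationship.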

ii) Maverick/Polinton

Maverick/Polinton (M/P) elements were first characterized in silico in the genome sequences of several species of protists, fungi and animals (Feschotte and Pritham, 2005;

Kapitonov and Jurka, 2006; Pritham et al., 2007). They represent the largest TEs found to date, ranging from 2Kb to 42Kb in size and are bound by ITRs on each end (Jurka et al., 2005).

Having perhaps the most complex replication cycle of all transposons, M/P elements are self-synthesizing DNA transposons mediated by numerous proteins including polymerase B, an integrase related to those from LTR retrotransposons and ITR transposases, adenosine triphosphatase (ATPase) and cysteine proteases, all encoded by the element (Kapitonov and

Jurka, 2006). The element is thought to be excised from the genome, after which one or both strands of the element are used as a template for protein-primed DNA replication carried out by the proteins produced from the element. The new copy is then presumably reinserted back into the genome via the element's INT protein, although the exact mechanisms of M/P replication have not been investigated. M/P elements make up nearly one-third of the genome of the parabasalid parasite Trichomonas vaginalis (Pritham et al., 2007).

When first discovered, M/P elements were suspected to be related to either double-stranded DNA viruses or linear plasmids based on their ORFs and their suspected protein-primed mode of replication (Kapitonov and Jurka, 2006; Pritham et al., 2007). Fischer and Suttle (2011) reported a novel dsDNA virus that parasitizes the CroV virus from the marine flagellate Cafeteria roenbergensis. When the authors sought out the phylogenetic history of this virophage, or virus that infects another virus, they discovered not only synteny between some

ORFs, but also homology with those of a Maverick element from the slime mold,

Polysphondylium pallidum (Fischer and Suttle, 2011). They suggest that Maverick elements may have arisen from one or more endogenization events of a virophage in an ancestral eukaryote followed by the loss of ORFs used in large DNA virus infection, and the acquisition of others useful for intracellular propagation.

iii) Helitrons

Helitrons were first discovered in the thale cress, rice and C. elegans genomes, and have been found in a broad range of taxa including protists, cnidarians, echinoderms, insects, vertebrates, fungi and plants (Kapitonov and Jurka, 2001; Kapitonov and Jurka, 2007). They are


700 bp – 15 Kbp long elements encoding a Rep helicase and sometimes other ORFs such as replication protein A (RPA), as well as ENs and zinc-finger proteins responsible for DNA binding (Kapitonov and Jurka, 2007). Rather than being flanked by ITRs, Helitrons are bounded by 5'-TC and CTRR-3' sequences along with a 15-20 bp hairpin-forming sequence close to the 3' end, which is thought to be a terminator sequence (Kapitonov and Jurka, 2007). Helitrons are thought to use a rolling circle (RC) means of replication similar to some plasmids and bacterial

IS elements (Kapitonov and Jurka, 2007). However, recent evidence from various maize lines suggests that Helitrons can also undergo excision, much like a cut and paste transposon, without copy number increase suggesting flexibility in the means of mobilization (Li and Dooner, 2009).

Helitrons make up as much as 2% of the maize genome and have recently undergone a burst of activity in the genome of the little brown bat, Myotis lucifugus (Pritham and Feschotte, 2007; Du et al., 2009).

The origin of Helitron elements has not been extensively investigated by the transposon biology community, nor has a detailed dissection of how they mobilize been published yet. Helitrons were proposed to have arisen from an RC IS element from Bacteria and Archaea, namely relatives of the IS91, IS801 and IS1294 families, with the subsequent acquisition of other ORFs from their hosts in different Helitron lineages, such as RPA, EN and ssDNA binding protein

(Kapitonov and Jurka, 2001). Even so, the derivation of Helitrons from other RC-utilizing types of mobile DNA, such as ssDNA bacteriophages and some plasmids, cannot be ruled out completely.

iv) Cryptons

Cryptons are 2 Kbp to 3 Kbp long DNA transposons that use a YR protein as opposed to a transposase for replication. They have received little study but have been identified in diatoms, oomycetes, fungi, and animals (Goodwin et al., 2003; Jurka et al., 2005; Kojima and Jurka,

2011). Some Cryptons have a 4-6 bp sequence that is repeated at each end, similar to ITRs, but they do not appear to create TSDs during replication (Goodwin et al., 2003). The YR protein is thought to recognize these repeats as a substrate for recombination, and then excises the element from the genome. The YR-element transpososome is then thought to reinsert into the genome at a site with sequence similarity to the 4-6 bp repeats. This replication mechanism is based on reactions performed by other recombinases in , but remains to be functionally characterized using an actual Crypton. Copy number increase in Cryptons presumably occurs in a similar fashion to other cut and paste DNA transposons.

Phylogenetic analysis of the most conserved region of the YR ORF from Cryptons and other YRs revealed that Cryptons appear to be monophyletic and nested within the recombinases encoded by DIRS elements (Goodwin et al., 2003). However, more recent analyses suggest both

Crypton and DIRS element YRs are not reciprocally monophyletic (Kojima and Jurka, 2011;

Szitenberg et al., 2014). Both groups are nested within bacteriophage and bacterial tyrosine recombinases, although the authors state that the low sequence similarity of tyrosine recombinases in general makes it difficult to assign the most likely sister taxa for Cryptons

(Kojima and Jurka, 2011).

v) Miscellaneous non-autonomous DNA transposons

As with retrotransposons, mutation can generate non-autonomous derivatives of DNA transposons that often make up the bulk of the population of elements within a genome. As well, the process of gap repair, which creates new copies of ITR elements, can be interrupted, and this abortive gap repair results in new copies that are missing internal sequences, including sections of, or entire, transposase coding regions (Rubin and Levy, 1997). This is one of the mechanisms behind the creation of small, non-autonomous ITR elements called miniature inverted-repeat elements, or MITEs (Bureau and Wessler, 1994). MITEs range from ~100 bp to 1600 bp and are predicted to form very stable secondary structures (Feschotte et al., 2002). It can usually be determined which superfamily of ITR elements is parasitized by a particular MITE by comparing

ITR and TSD sequences (Feschotte et al., 2002). MITEs like these are clearly deletion derivatives; however, many MITEs share so little sequence similarity (or none at all) with known superfamilies that they either cannot be assigned an autonomous partner, or can only be assigned to one using transposition assays (Yang et al., 2009; Abe et al., 2010). In some cases,

MITEs with cryptic partners have either diverged from their autonomous ancestor to the point where it cannot be inferred, or they arose de novo from a non-mobile sequence within the host genome (Feschotte et al., 2002).

MITEs have reached copy numbers into the tens of thousands in the genomes of some plants, including O. sativa (Oki et al., 2008). Although non-autonomous Helitron elements are also common, Du et al. (2009) discovered elements they termed Helitirs in the maize genome.

These non-autonomous Helitrons lack any of the ORFs present in autonomous elements and have 37 bp ITRs that are similar to the 3' end of Helitrons, including a 16 bp hairpin-forming sequence. Whether Helitirs excise like ITR elements or transpose through RC replication is unknown.


TE biology and behaviour

Coevolution between TEs and their hosts for millions of years has resulted in TE-level adaptations to aid in their own survival and reproduction. Regulation of transposition timing, and where it occurs in the case of multicellular organisms, is one such feature. Research on a few model elements, such as the Ac and P DNA transposons, Ty elements from yeast, and L1 from mammals suggests that transposition is regulated by the cell cycle, with the greatest amount of activity during or just before DNA synthesis (Chen et al., 1992; Menees and Sandmeyer, 1994;

Shi et al., 2007; Walisko et al., 2009). This is thought to be adaptive for the elements because this is when the DNA repair activity necessary for mobility is highest, and when the greatest potential for copy number increase occurs. For example, P elements have been shown to jump preferentially into un-replicated origins of replication during S phase, which allows a single transposition event to produce multiple copies in a single round of DNA synthesis (Spradling et al., 2011). High rates of activity and mobility in gametes and support cells in reproductive tissues, or cells that will become reproductive tissues in the case of plants, make transmission of new element copies to the next generation easier, and are less harmful to the host than indiscriminate activity in somatic tissues (Dupressoir and Heidmann, 1996; Pasyukova et al.,

1997; Ergün et al., 2004; McVicker and Green, 2010). Because low transposition rates equate to slow copy number increase, and high rates might increase the deleterious effect on the host, the modulation of transposition rate is an important feature of TEs. In mariner elements, this manifests as the phenomenon of overproduction inhibition (OPI) whereby high concentrations of transposase protein produce non-functional multimeric forms that cannot carry out the transposition reaction (Lohe and Hartl, 1996; Townsend and Hartl, 2000). Non-autonomous and

inactive elements from both DNA transposons and retrotransposons alike have been proposed to contribute to rate modulation in a variety of ways, such as titrating functional proteins away from active elements and multimer poisoning by mutated transposases (Simmons and Bucholz, 1985;

De Aguiar and Hartl, 1999).

TE insertion in general has a modest degree of non-randomness when directly assayed, first in the form of a preference for short sequences of DNA that will become part of the TSD that can be used as a diagnostic character for some DNA transposon superfamilies (Feschotte and

Pritham, 2007). Some retrotransposons also have a slight preference for particular sequences, such as the favouring of TnAn TnAn sites and GC rich regions of the genome by the EN of human

L1 elements (Cost and Boeke, 1998; Ovchinnikov et al., 2001). The folding and structure of surrounding DNA, independent of sequence is also known to be important for both the insertion and excision of DNA transposons (Crénès et al., 2009; Esnault et al., 2011). At its most extreme, target site selection can become quite specific ranging from heterochromatic targeting in

'chromovirus' lineages of Ty3/gypsy elements, and mating-type loci targeting of Ty5 in yeast, to Pokey elements inserting at a specific site in the large ribosomal subunit RNA coding region in Daphnia (Zou et al., 1996; Penton et al., 2002; Kojima and Fujiwara, 2003; Gao et al.,

2008). A further degree of non-randomness manifests based on the negative effects of insertions on the host genome. How TEs can influence their host genome negatively has historically been partitioned into three categories, with the recent inclusion of a fourth (Hua-Van et al., 2005;

Hollister and Gaut, 2009). These include the ectopic pairing of non-homologous TE loci during meiosis, the toxic effects of transposition through DNA breakage, mutagenesis via insertion, and epigenetic disruption through heterochromatin nucleation. These four forces, along with non-random insertion, contribute to the distribution of TEs throughout the genome. There is a general

trend towards TE accumulation in heterochromatic regions, making it difficult to determine which force is most influential (Figure 1.6). Observed target site selection is essentially a combination of where the element lands and where it is tolerated, mediated by the strength of selection at the level of the organism acting upon the insertion and by how deleterious the insertion is.

Vertical inheritance and horizontal transmission

TEs have persisted in genomes through both Mendelian and non-Mendelian patterns of inheritance. Like host loci in the genome, they are segregated and assorted during meiosis and mitosis and passed down to progeny, whether they are cells in the soma or offspring in the next generation. However, because of their mobility and ability to increase their copy number, TEs can defeat the 'fairness' normally implicit in chromosomal segregation and be present in more than two copies per genome, usually in much greater numbers than host genes as well. A TE 'lifecycle' was proposed originally by Kidwell and Lisch (2001) and further refined by others to outline the general evolutionary path taken by TEs in the host genome (Hua-Van et al., 2005;

Schaack et al., 2010). This cycle begins with the invasion of a TE into a new genome or the evolution of a new, distinct TE lineage from a previously existing one (Figure 1.7). If this new element can establish itself in the genome before being lost by genetic drift, a period of copy number increase follows until the host can mount a defense and proliferation can be curtailed.

Before or after this phase, inactive and non-autonomous elements will begin to appear in the population of elements due to mutation. Some insertions that are beneficial at the host level may become exapted and/or co-evolution between elements and the host may foster a variety of other interactions, such as mutualism rather than parasitism. Diversification of one lineage into several

new lineages might occur and the cycle begins anew. Conversely, due to mutational processes, all autonomous and active elements may be rendered inactive or non-autonomous, resulting in the extinction of that lineage, the loss of some copies and the retention of others as molecular fossils.

The length of time a typical lineage of elements spends in each of these phases could vary from lineage to lineage as well as with the class of elements in question.

Another way the cycle may continue before inactivation by mutation is for a single element to be transferred from one species to another in a horizontal transfer event (HT) (Wallau et al., 2012). HT occurs when an element from one species is able to successfully colonize the genome of another species in such a way that it will be passed down to the next generation, such as inserting into germline genomes in multicellular organisms. HT events have been inferred when TE phylogenies do not match host phylogenies, when two elements have higher similarity at the nucleotide level than typical genes from their hosts, and when elements are seen in patchy distributions in the host phylogeny (Schaack et al., 2010; Wallau et al., 2012). It has been pointed out that all three of these criteria can have alternate explanations, including unequal rates of evolution in different element families and lineage sorting of said families in the host phylogeny, and that all three of the criteria should be present in order for the putative HT event to have strong support. In the case of TEs in drosophilid flies, for example, this is true only for about 15% of purported HT cases (Capy et al., 1994; Cummings, 1994; Loreto et al., 2008).

Vectors for HT in eukaryotes remain largely unknown, although evidence has lent strong support to the role of viruses and other parasites in transporting elements from one species to another

(Piskurek and Okada, 2007; Laha et al., 2007; Marquez and Pritham, 2010; Gilbert et al., 2010).

Most recorded examples of HT events involving TEs come from within drosophilid flies, reflecting the bias in research focus and resource availability. Thus, the extent to which HT

contributes to overall TE evolution remains cryptic (Schaack et al., 2010). Most HT events also involve either LTR retrotransposons or DNA transposons, with only a few cases known for

LINEs and SINEs (Wallau et al. 2012).
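The three lines of evidence used to infer HT, and the requirement that all three be present for strong support, can be summarized in a small Python sketch. The class, field names, and example values are hypothetical, and the identity comparison is deliberately naive; real analyses compare TE divergence against a distribution of host gene divergences rather than a single value.

```python
from dataclasses import dataclass


@dataclass
class HTEvidence:
    """Observations for one putative HT event (all fields hypothetical)."""
    phylogenies_incongruent: bool  # TE tree conflicts with host tree
    te_identity: float             # nucleotide identity between the two TEs
    host_gene_identity: float      # typical identity of host genes
    patchy_distribution: bool      # element scattered across host phylogeny


def criteria_met(e: HTEvidence) -> int:
    """Count how many of the three criteria are satisfied."""
    return sum([
        e.phylogenies_incongruent,
        e.te_identity > e.host_gene_identity,
        e.patchy_distribution,
    ])


def strongly_supported(e: HTEvidence) -> bool:
    """Strong support requires all three criteria together."""
    return criteria_met(e) == 3


# Made-up example values, for illustration only.
candidate = HTEvidence(True, 0.98, 0.85, True)
weak = HTEvidence(True, 0.80, 0.85, False)
print(strongly_supported(candidate), criteria_met(weak))  # True 1
```

Requiring the conjunction is a conservative design choice: each criterion alone has alternative explanations, such as unequal rates of evolution or lineage sorting, as noted above.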

TE-organism interactions

By virtue of their ability to excise DNA and move from one position in the genome to another, TEs can introduce a variety of mutations into the genome of their host organism. The most obvious and easily observed mutation is insertion into or near genes, which changes either the regulation of a gene or the sequence of the protein coding region. Many TEs were first discovered when they inserted into and disrupted a gene with an easy-to-identify phenotype, such as the Ant1 element disrupting the nitrate reductase gene in Aspergillus niger, and Tam3 inserting into the chalcone synthase gene in Antirrhinum majus (Sommer et al., 1985; Glayzer et al., 1995). The fitness effects of TE insertions in general have been studied in Drosophila, and run the gamut from deleterious to neutral to a small number of potentially beneficial effects (Albornoz and Domínguez, 1999; Houle and Nuzhdin, 2004; Pasyukova et al.,

2004; Gonzalez et al., 2010).

Both non-LTR retrotransposons and Helitrons have been shown to capture host sequence, including genes and fragments thereof, downstream of their 3′ termini due to weak termination mechanisms, and move these fragments throughout the genome (Goodier et al., 2000; Jameson et al., 2008). This can lead to gene duplications or the incorporation of gene fragments into other genes upon reinsertion, generating variation on which host-level selection can act. DNA transposons can also introduce variation into a locus after excising from it by the incomplete repair of the double-strand breaks, creating what are known as ‘footprints’ or remnants of the element. These footprints are usually derived from sequences near the termini of the element such as the ITRs (Labrador and Corces, 1997). Similarly, internal recombination events between the repeated termini of LTR elements remove most of the element but leave behind a single LTR sequence whose remaining regulatory sequences can still affect nearby genes (Roeder and Fink, 1980; Vitte and Panaud, 2003).

Because populations of TEs consist of very similar DNA sequences distributed at non-homologous loci throughout the genome, they present a very serious problem for the recombination machinery during meiosis or DNA repair. Given sufficient identity at the sequence level between two TE insertions, they may pair during recombination, even if present on non-homologous chromosomes, and resolution of the recombination event might create a variety of chromosomal rearrangements including translocations, duplications and deletions (Hedges and Deininger, 2007). This ectopic recombination was predicted in models generated by Langley et al. (1988) to result in host-level selection removing TE insertions from high-recombination regions of the genome. Beyond ectopic recombination, the repair of double-strand breaks after the transposition of DNA transposons can also generate a variety of rearrangements depending upon the chromosomal position and orientation of said elements (Huang and Dooner, 2008; Zhang et al., 2009; Krishnaswamy et al., 2010; Yu et al., 2011).

Gould and Vrba (1982) coined the term “exaptation” for the shift in function some traits have undergone during the course of evolution. One of their examples of exaptation was the possible co-option or “domestication” of TEs and TE-derived sequences into the normal molecular machinery of the host cell in which they reside. This was first observed by Miller et al. (1992) in Drosophila guanche, where P element loci encoding defective repressor transposases appeared to have gone to fixation in the host population due to their ability to repress transposition of active P elements. Other TE proteins have also been exapted for a variety of functions including DNA repair and maintenance in humans, chromosome segregation in fission yeasts and embryonic development in Arabidopsis (Bundock and Hooykaas, 2005; Casola et al., 2008; Shaheen et al., 2010). A more frequent occurrence is the exaptation of partial sequences of TEs, often accompanied by subsequent modifications to enhance their new host function. Examples of this form of exaptation encompass the breadth of the various features found in the genome including exons, regulatory sequences and small RNA encoding loci (Jordan et al., 2003; Piriyapongsa et al., 2007; Sela et al., 2010).

The deleterious effects of TEs are suppressed through various forms of silencing that fall under the umbrella of epigenetic modifications (Law and Jacobsen, 2010). These include the reversible methylation of cytosine residues, disruption of transcription and translation due to interfering RNAs, and the sequestering of TEs into condensed heterochromatin through modifications to histone proteins (Lippman et al., 2004; Jordan and Miller, 2009). All three of these mechanisms have been shown to counteract the activity and mobility of various TEs. While silencing active TEs would clearly be beneficial to the host, it was demonstrated in Arabidopsis thaliana that this benefit is not without a cost (Hollister and Gaut, 2009). TEs located very close to genes were at low frequency but tended not to be methylated because silencing through methylation of a TE can spread to nearby loci, possibly shutting down a crucial gene at the expense of silencing a single TE insertion. Because of this effect, any insertions near a gene that change its regulation for the better might be favoured by selection. Over time, that insertion might be modified and be exapted into the normal regulatory machinery of the organism (Slotkin and Martienssen, 2007). Some have theorized that cytosine methylation and other epigenetic forms of regulation may have in fact first evolved to combat the replication and spread of molecular parasites such as TEs and viruses and were later co-opted for gene regulation at large (Yoder et al., 1997; Matzke et al., 1999).

By virtue of their ability to copy themselves, another simple but powerful way TEs can influence their hosts is through effects on the size of the genome. Genome size is known to vary more than 60,000-fold in eukaryotes and the percentage of the genome composed of TEs is known to positively correlate with genome size across this range (Kidwell, 2002; Lynch and Conery, 2003; Gregory, 2001). For example, although Zea mays and Zea luxurians only diverged from a common ancestor approximately 140,000 years ago, the genome of Z. luxurians is 50% larger than that of its sister species, and most of this increase is due to new TE insertions (Tenaillon et al., 2011). TEs can also reduce genome size through the deletion of DNA during recombination, such as intra-element recombination between the terminal repeats within an LTR element (Vitte and Panaud, 2003), or illegitimate recombination among non-homologous copies of the same element types. Genome size has been shown to correlate in a number of taxa with a variety of variables causally related to the larger cell size that often accompanies larger genome size (Gregory and Hebert, 1999). These include parameters such as metabolic rate and development, which are thought to be influenced by the effects of genome size on cell surface area to volume ratios and the length of time taken to replicate DNA during cell division (Gregory, 2002). Mediated through genome size, TE copy number increase and decrease can affect these traits, which are important at the organism level, in addition to causing other mutagenic effects.

Due to the deleterious nature of TEs, any demographic forces that reduce or increase the efficiency of selection in the host will have top-down effects on the ability of new TE copies to persist in the genome. The most well-known example is the relationship between TE dynamics and the breeding system of the host organism (Wright and Schoen, 1999; Wright and Finnegan, 2001; Arkhipova and Meselson, 2005). Sexual organisms are theoretically better able to deal with deleterious TE insertions due to removal of insertions by recombination, by low TE load individuals breeding with each other, and by having larger effective population sizes and accordingly stronger selection (Charlesworth and Wright, 2001). On the other hand, asexual or apomictic species are expected to accumulate deleterious insertions through a lack of recombination, eventually culminating in either the loss of individuals or clones with active TEs, or extinction of the species or population (Arkhipova and Meselson, 2005). From the element perspective, sexual reproduction in the host allows for easier spread within the host population, as was demonstrated by Hickey (1982). The types and copy number of TEs in the asexual rotifer Adineta vaga suggest that they are hard-pressed to survive and proliferate in its genome (Arkhipova and Meselson, 2005; Gladyshev and Arkhipova, 2010). Paradoxically, sexual reproduction in the host is thought to both slow and increase the spread of TEs, but which is the more likely outcome? Data suggest that TE dynamics and host breeding system interact in a number of ways, and comparisons between closely related species with different breeding systems suggest both increases and decreases in TE load can occur with a switch to asexuality (de la Chaux et al., 2012; Kraaijeveld et al., 2012; Ågren et al., 2014).

There are several more ways that TEs are thought to influence host genomes that deserve mentioning. A variety of authors have commented on the idea that TEs might be a driving force in due to the large effects TE insertions can have on gene regulation and chromosome structure in a single mutational step (McClintock, 1984; McDonald, 1990; Oliver and Greene, 2009). A similar argument has been raised with respect to the ability of TEs to insert into new regions and attract epigenetic marks which could influence local gene regulation substantially, and possibly introduce a reproductive barrier that would facilitate speciation (Zeh et al., 2009; Rebollo et al., 2010). Taking a more theoretical approach, McFadden and Knowles (1997) modelled populations with and without TEs and found that populations with TEs were better able to move from one adaptive peak to another on the fitness landscape as a consequence of deleterious TE insertions. Oliver and Greene (2011) outlined conditions under which the types, copy number and activity of TEs might aid lineage survival and diversification, and used evidence from TE exaptation in primate genomes as a possible example. A recent counterargument by Jurka et al. (2011) suggested that frequent population subdivision and the fixation of different TE variants caused by drift results in different TE distributions. Which scenario is more prevalent, if either, remains to be seen.

TEs have been known for some time to be responsive, either in the form of increased transcription or increased mobility, to a variety of stressors including temperature fluctuation, UV light, starvation, and cocaine exposure (Rep et al., 2005; Ueki and Nishii, 2008; Ramallo et al., 2008; Maze et al., 2011). Some have attributed this correlation to the adaptive benefit of stress-induced TE activity for the individual organism, or for the species at large (Arnault and Dufournel, 1994; Capy, 2000; Sacerdot et al., 2005). If the mutation spectra of TEs are at all similar to those of other mutagens, then TEs as generators of variation can only be beneficial at levels above the individual, as was pointed out by Doolittle (1982). Species or lineages that contain TEs as variation generators during stress might have an advantage over those that do not and thus be able to respond to selection pressures more easily, but testing this is difficult. No research has demonstrated that there is a net benefit to TE hyperactivity during stress for a population of organisms after considering probable reductions in fecundity caused by this activity.

Overview of this thesis

Although TEs were discovered by McClintock some 70 years ago, their study by a wider community did not begin until the late 1960s with the discovery of mobile DNA in other organisms. As with McClintock's work, the discovery of other TEs was often motivated by the detection of mutant phenotypes in organisms of interest that turned out to be caused by the mutagenic action of TEs. TE research was born from an interest in the functioning and variation of traditional, organism-focused biology, and I feel this is a very important point in trying to understand modern TE research. This perspective is not difficult to understand, as interests and funding would likely drive researchers to focus more on how TEs have impacted the health and evolution of ourselves and the economically important model organisms that receive attention in thousands of labs around the world. But it has made me wonder whether this is the only course of action that the TE community should be taking.

Like any relatively young discipline, transposon biology still has its fair share of unanswered questions. For example, to what extent do the presence, abundance, and distribution of TEs vary across eukaryotes? Why do some categories of TEs dominate in some genomes while not in others? How often do horizontal transfer events occur and how much does this help to explain the variation mentioned previously? Under what conditions does mutually beneficial coevolution occur between TEs and their host organisms? How much of the taxonomic diversity of TEs remains to be discovered in the unannotated repeats of sequenced genomes? Under what conditions will a trait such as high insertion site-specificity evolve? Why do TEs so often activate transcriptionally and/or transpositionally during stressful conditions for the host organism? Many of these questions will be approached in different ways depending on the conceptual foundation used, that is, focusing on the organism or focusing on the elements. This thesis will focus on the current conceptual and empirical status of eukaryotic transposon biology, and will be organized as chapters under the following themes: ‘Where we have been’, ‘What we know’ and ‘Where we are going’.

‘Where we have been’ will consider the historical reasons that organism-centric thinking tends to dominate in the literature, and why a perspective that incorporates both organism and element levels is needed. ‘What we know’ encompasses a survey of genomic information as an overview of our knowledge of both TE and other content found within over 500 sequenced eukaryotic genomes. ‘Where we are going’ will be argued in the form of a counterargument to the dominance of the organism-level perspective. A fully articulated description of element-level evolution will be advocated, and questions that need to be answered regarding this shift will be outlined.

Where we have been

Chapter 2 will examine the history of how TEs have been thought of since their discovery up until very recently, framed within the context of the seminal selfish DNA papers of 1980 (Doolittle and Sapienza, 1980; Orgel and Crick, 1980). Since their discovery and throughout the 60s and 70s, many have seen TEs in light of their mutagenic and evolutionary contributions at the organism level, but others saw that a level of evolution could exist below that of the individual organism (e.g., Östergren, 1945; Peterson, 1970). This was fully articulated in 1980 by the authors of the selfish DNA papers, who suggested that the behavior of mobile DNA should be viewed as coming from entities competing with one another in the genome. Genomes in fact contain several nested levels within the evolutionary hierarchy, and need to be thought of this way. I have noticed a growing trend in the TE literature whereby citations of the selfish DNA papers tend to focus on an organism-level interpretation of the message of said papers, rather than one more focused on the elements or the interaction between the two levels of the TE and the organism. I felt this was indicative of a change in the way researchers thought about TEs and a gradual shift away from the actual message of the selfish DNA papers. With that in mind, a citation analysis of the 1980 papers was performed whereby papers published from 1980 until the end of 2013 were categorized on whether they stressed the organism level or took into account both the organism and TE levels. The proportion of multi-level citations was shown to decline dramatically over time, emphasizing the need to stress the multi-level nature of the genome in modern discussions of TEs and their biology.
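As a rough illustration of how such a proportion can be tracked over time, the following sketch bins categorized citations into decade-long periods and reports the share tagged as multi-level in each. The records, category labels, and function name are hypothetical placeholders chosen for the example; they are not the data or the procedure used in Chapter 2.

```python
from collections import defaultdict

# Hypothetical records: (publication year, category assigned to the citing paper).
citations = [
    (1981, "multi-level"), (1983, "multi-level"), (1985, "organism-level"),
    (1996, "multi-level"), (1998, "organism-level"), (1999, "organism-level"),
    (2010, "organism-level"), (2011, "organism-level"),
    (2012, "multi-level"), (2013, "organism-level"),
]


def multilevel_proportion_by_period(records, start=1980, width=10):
    """Return {period start year: share of citations tagged 'multi-level'}."""
    totals = defaultdict(int)
    multi = defaultdict(int)
    for year, category in records:
        # Map each year onto the first year of its period (e.g. 1996 -> 1990).
        period = start + ((year - start) // width) * width
        totals[period] += 1
        if category == "multi-level":
            multi[period] += 1
    return {period: multi[period] / totals[period] for period in sorted(totals)}


proportions = multilevel_proportion_by_period(citations)
for period, share in proportions.items():
    print(period, round(share, 2))
```

Periods with no citing papers are simply absent from the result, which avoids dividing by zero and makes gaps in the record explicit.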

What we know

We now exist in a world where the sequencing and assembly of genomes are much more routine than at the dawn of genomics ~16 years ago. A wealth of genomic data now exists, which allows us to assess how genomic content varies across the eukaryote tree. A comparative dataset of genomic information for sequenced eukaryotes was compiled from the literature for Chapter 3. This included TE superfamily richness, abundance, gene content, intron content, genome size and a variety of other parameters from over 500 genomes (Elliott and Gregory, 2015a; 2015b). A phylogeny for all species in the data set was also assembled from the literature to allow examination of phylogeny-corrected relationships. This allowed previously identified or proposed relationships between various parameters and genome size to be tested on a broad scale. The data set was then supplemented with new data published since 2015, which were used to assess how well trends from the original data could predict the values for genomic parameters.

It has been proposed that organisms with relatively small genomes, such as and Takifugu rubripes, tend to have more diverse TE communities, with more superfamilies, than those with relatively larger genomes, such as Homo sapiens (Volff et al., 2003; Gregory, 2005). As well, the relationship between genome size and other parameters such as gene number, intron number and GC content has been analyzed before and found to be positive, although with far fewer species than in the present analysis and without any phylogenetic corrections (Vinogradov, 1998; Vinogradov, 1999; Waltari and Edwards, 2002; Wendel et al., 2002; Hou and Lin, 2009; Friar et al., 2012; Zhang and Edwards, 2012; Šmarda et al., 2014). These relationships were examined, but the most striking result of this analysis was what was missing from the data, namely detailed information about the TE community within the genome. Problems with assembling the data set will be discussed, in particular with respect to the presentation of data and how assembling TE catalogues for the entirety of the 500 genome data set was impossible.

Where we are going

Chapter 4 will expand upon conclusions in Chapter 2, namely that a multi-level perspective of TEs was championed in the selfish DNA papers but was subsequently not adopted. However, the literature is replete with glimpses of element-level thinking, and the rise of the ‘genome ecology’ movement seemed to indicate a need for a more fleshed-out element-level perspective. In Chapter 4, I argue that while the organism level has enjoyed dominance, the time has come to explore and understand the level of the elements as well. TEs can be understood as populations nested within the organism population, and efforts need to be made to understand how well element populations and evolution map onto processes at the organism level. I will advocate for a perspective of TE evolution that acknowledges the place of TEs in the hierarchy of evolution, and explain why viewing TEs as a population of evolving entities might help to address some long-standing questions in TE biology.

The work presented in this thesis will demonstrate that a more balanced view of TE biology could be taken, where the properties of the elements themselves are considered, without totally ignoring factors at the organism level. The analysis of genomic data will show that a great deal of data concerning TEs and their diversity has not been properly collected. Recently, the methods used to annotate TEs in genome sequence data have been criticized, as has the ability of the current TE classification system to properly capture and categorize the phylogenetic diversity of mobile DNA within genomes (Hoen et al., 2015; Piégu et al., 2015). Current online resources for TEs are scattered across dozens of databases, and while possessing useful features, they are often taxon or TE-type specific, poorly supported, and have little to no interconnectivity with each other or other online resources. With these facts in mind, I will lay out a number of recommendations for the TE community to undertake in the future to ensure a better standardization of how TE data is collected and analyzed.


Tables and figures

Figure 1.1. Composite phylogeny of eukaryotic retrotransposons based on reverse transcriptase proteins (Lorenzi et al., 2006; Llorèns et al., 2011; Koonin and Dolja, 2014; Szitenberg et al., 2014; Poulter and Butler, 2015). Crucial transitions have been indicated with arrows. RBD: RNA-binding domain, GAG: group-specific antigen protein, ENV: envelope protein, YR: tyrosine recombinase.


Figure 1.2. Eukaryotic retrotransposons. Sequences are not to scale. Boxes within each element represent conserved features, with named ones representing open reading frames and triangles representing terminal repeats. RT: reverse transcriptase, REL-EN: restriction enzyme-like endonuclease, APE: apurinic/apyrimidinic endonuclease, RBD: RNA-binding domain, RNH: RNase H, GIY-YIG-EN: GIY-YIG motif endonuclease, AAAn: poly-adenosine tail sequence, Tail: tail sequence characteristic of the type of SINE.


Figure 1.3. Eukaryotic retrotransposons continued. GAG: structural protein, AP: aspartyl protease, INT: integrase, YR: tyrosine recombinase, MT: methyltransferase, ENV: envelope protein. Identical colours shared between different ORFs (e.g., RNaseH and Integrase) indicate identifiable common ancestry.


Figure 1.4. Eukaryotic DNA transposons. Triangles represent inverted repeats and horse-shoes are DNA hairpins. INT: integrase, ATPase: adenosine triphosphatase, Cys Prot: cysteine protease, Pol B: polymerase B, RPA: replication protein A. Identical colours shared between different ORFs (e.g., Transposase and Integrase) indicate identifiable common ancestry.


Figure 1.5. Phylogenetic relationships of eukaryotic DNA transposons at the superfamily level based upon protein information. Relationships between the transposase encoding elements and integrase encoding elements (such as some retrotransposons) are illustrated in the tree. The likely origins of DNA transposons which either do not encode transposases/integrases or encode other proteins as well are illustrated below the tree. Adapted from Yuan and Wessler (2011), and other sources (Goodwin et al., 2003; Lloréns et al., 2011; Kojima and Jurka, 2013; Koonin and Dolja, 2014; Kapitonov and Koonin, 2015).


Figure 1.6. Forces through which selection at the host level acts to affect the TE distribution within the genome.


Figure 1.7. The TE lifecycle. Adapted from Kidwell and Lisch (2001), Hua-Van et al. (2005) and Schaack et al. (2010).


Chapter 2. Where we have been: the Selfish DNA hypothesis, then and now

Abstract

The selfish DNA papers by Orgel and Crick and by Doolittle and Sapienza were published in 1980 in the journal Nature, and served as a counterargument to the adaptationist explanations for the presence and abundance of repetitive DNA in genomes. The central tenet of these papers was that if repetitive DNA could replicate independently of the rest of the genome, and possessed variation with respect to this property, the conditions for natural selection to occur within the genome are satisfied and no other reasons for the presence of repeats need to be invoked. This effectively introduced the idea that genomes are the product of multiple levels of evolution. Similar ideas had been introduced decades earlier, but this was the first time the idea was explicitly stated and discussed. The original selfish DNA papers have been cited over 1000 times, but the citation analysis I performed revealed that citations stressing the multi-level nature of the genome and transposable elements (TEs) have declined greatly over time and been surpassed by citations stressing a single level, often that of the organism. A focus on organisms, and the occasional adaptive mutations caused by TEs, may have led many in the TE and genomics community to favour an organism-level perspective when thinking about the genome and TEs. I argue that a return to the message of the selfish DNA papers is crucial to better understand the co-evolution between TEs and the genomes in which they reside.


Introduction

“This story begins more than 20 years ago with the observations of the Vendrely's (1948) and of Mirsky & Ris (1951) that different species contain different amounts of DNA in their nuclei. This harmless information caused some discomfort when it was learned that primitive amphibians and fish contained more than 20 times as much DNA per nucleus as did man. It was argued that mammals display a greater developmental complexity than primitive fish, therefore, they must have more genes, yet why should the lower forms have more DNA, if DNA is the chemical basis of the gene?” – Thomas, 1971

Over the past 65 years, estimates of genome size – the quantity of DNA contained in a single copy of the nuclear genome – have been made for some 8,200 species of plants and 5,600 animals (Bennett and Leitch, 2012; Gregory, 2016). This reflects an ongoing exploration of genome size diversity that began in the mid-20th century with small-scale surveys of a handful of key species (e.g., Vendrely and Vendrely, 1948; Mirsky and Ris, 1951). Even in the earliest days of genome size analyses, it had become apparent that genome size is largely a species-specific characteristic, with relative constancy seen among conspecifics (hence the term “C-value” in reference to a “constant” haploid DNA content within species). This constancy, in comparison to widely variable protein content from cell to cell, was taken as evidence for the role of DNA as the molecular basis of inheritance prior to the resolution of the DNA versus protein debate. However, as these early studies also showed, genome size is enormously variable across taxa, including among many species that exhibit similar levels of complexity or share close phylogenetic affinity. Conversely, many species displaying comparatively simple morphological or developmental features were found to possess genomes much larger than those of apparently more complicated organisms.

These two observations – constancy of genome size within species but extensive diversity among species – were very difficult to reconcile at first. That is, DNA amount per nucleus (C-value) was thought to be constant in quantity within species because it is the material of which genes are composed, and yet it also appeared to be unrelated to organismal complexity and, by extension, the expected number of genes. Two decades later, these apparently contradictory findings remained sufficiently perplexing to be dubbed the “C-value paradox” (Thomas, 1971).

The solution to this “paradox” is now well known: most of the DNA in eukaryotic genomes does not encode proteins (Elliott and Gregory, 2015). As such, there is nothing paradoxical about a “simple” organism with a large genome or about two morphologically similar species with widely different amounts of DNA in their nuclei. Modern discussions of genome size diversity and evolution tend to focus on specific questions relating to this non-genic majority, which together form a broader “C-value enigma” (Gregory, 2001). These questions include: To what extent, and following what distributional patterns, does genome size vary among taxa? Which types of sequences make up the non-genic DNA, and does the proportion of different elements differ among genomes? By what mechanisms are these sequences gained and lost over evolutionary time? What impacts, if any, does this non-genic majority have on the cellular and organismal phenotype? Does any or all of the non-genic DNA have a function (versus effects; Doolittle et al., 2014) at the organism level? Why do some groups exhibit massive quantities of DNA in their nuclei, while others have much more diminutive genomes?

There has been much progress toward answering the question of genome composition in recent years, in no small part because of the explosion in available genome sequence data. For the first time, it has become possible to explore in detail the non-genic constituents and their relative abundances across genomes of different sizes (though not, as yet, in truly large genomes). This includes a number of non-genic sequence types: introns, simple repeats (e.g., satellite DNA), pseudogenes, and a wide variety of transposable elements (TEs). In particular, it has become clear that much of the variability in genome size in eukaryotes can be attributed to the differential abundance of TEs (Kidwell and Lisch, 2000; Elliott and Gregory, 2015).

In an important sense, resolving the long-standing enigma of genome size diversity – an essential component in any comprehension of genome form, function, and evolution – hinges on an understanding of TE biology. Unfortunately, there is often a tendency to consider TEs from a relatively narrow perspective, such as their role as mutagens (for good or ill) at the organism level, as potentially or actually functional regions involved in gene regulation relevant to organismal development and physiology, or simply by lumping them into a broad category of “junk DNA” that includes any sequences without an organism-level function. In other words, the biology of TEs is often viewed only through an organism-level lens.

Not all authors have taken such an organism-centric view of TEs. Thirty-five years ago, two papers appeared side by side in Nature that sought to challenge this tendency (Doolittle and Sapienza, 1980; Orgel and Crick, 1980). These papers differed from many discussions published before and since by invoking evolutionary processes that operate within the genome (e.g., natural selection among competing TEs, interactions between TEs and the host genome), and not only among organisms within populations. This was presented as an explicit counterpoint to what they called the “phenotypic paradigm” in which the mere presence of vast quantities of non-genic DNA was taken to imply that it must have some, as yet unknown, function at the organism level. Instead, the evolution of TEs could often (but not always) be understood at their own evolutionary level without requiring constant recourse to organism-level explanations.

This review seeks to clarify the historical and conceptual context in which the classic “selfish DNA” papers were written. In particular, this chapter highlights the importance of multi-level selection as opposed to the single-level view that preceded these papers and persists to this day in much of the literature. The early reaction to, and interpretation of, these papers, and how these have changed over time, are discussed. In particular, I describe how the central message of these papers appears to have been missed by current authors, and track the ways in which the papers have been cited over the past three decades. I then make the case that the central theme of these classic papers – a pluralistic, multi-level framework – needs to be incorporated into the modern understanding of TE diversity, abundance, and evolution – and thus, genome biology at large.

Setting the context: views on genome function before 1980

“A concept that is repugnant to us is that about half of the DNA of higher organisms is trivial or permanently inert (on an evolutionary time scale).” – Britten and Kohne, 1968

The view of TEs and their significance has undergone a great deal of change since their discovery. The initial characterization of TEs by McClintock (1946; 1947) was a result of their phenotypic effects on kernel colouration in maize, which appeared to be tied to particular developmental steps. This association between TE activity and phenotype led McClintock (1961) to suggest that these ‘controlling elements’, as she called them, were responsible for modulating gene expression during ontogeny. Contrary to commonly recounted history, the occurrence of transposition by certain genetic elements was largely accepted by the scientific community at the time; however, McClintock’s ideas regarding the orchestration of gene expression by these elements were not (Comfort, 1995; 2001). Notably, other contemporary researchers working on maize doubted the central developmental role of these elements but not their ability to move around the genome, leading Wood and Brink (1956) to propose ‘transposable elements’ in place of ‘controlling elements’ as a term less committed to a particular functional hypothesis (Comfort, 2001).

The general notion that TEs and other repetitive DNA sequences play some functional role(s) at the organism level continued after the rise of early genome-scale analyses. The reassociation kinetic studies of the late 1960s and early 1970s revealed that eukaryotic genomes are highly repetitive (Britten and Kohne 1968; Britten and Davidson 1971), although at the time it was not known that a very large fraction of this repetitive DNA consisted of TEs. Findings such as the very high copy number of repeats, their interspersion among protein-coding regions, and similarities of the repeat sequences in the genomes of distantly related species (e.g., cow and sea urchin) led Britten and Davidson (1971) to propose a novel model for gene regulation in eukaryotes in which repetitive DNA played a central role. Indeed, it seemed inconceivable – repugnant, even – that this much DNA could be present without being functional at the organism level. The Britten and Davidson (1971) model proposed that repeats interspersed with non-repetitive sequences exert regulatory effects on nearby genes through the action of RNA or proteins produced by the repeats. The replication of repeats through polymerase errors, recombination, and duplication would spread them throughout the genome, and favourable configurations could be fixed by natural selection for their effect on gene regulation (Britten and Davidson, 1971). This complemented McClintock’s proposed role for ‘controlling elements’, although the similarity of the ideas was not discussed by the authors themselves.

An important counterpoint to arguments of function for repetitive sequences came in 1972, when Susumu Ohno formalized the concept of junk DNA. Junk DNA is DNA without any function for the host organism, and upon which selection does not operate in a sequence-specific way; at the time, Ohno (1972) was speaking mainly about the fate of duplicated genes. Given that most mutations are neutral or deleterious, he proposed that the fate of most duplicated genes was inactivation, and that eukaryote genomes should be littered with such genetic fossils, which we now know as pseudogenes (Ohno, 1972). This would explain the size differences among genomes better than vastly differing amounts of coding or functional DNA, given that the mutation load in eukaryotic genomes would preclude selection from maintaining such large amounts of functional DNA (Ohno, 1972). Ohno’s view, however, was not held by all, and many still entertained functional roles for repeats and other non-genic DNA.

Sequences similar to controlling elements were discovered in bacteria and drosophilids in the late 1960s, being mobile and able to exert disruptive effects on the expression of nearby genes (Jordan et al., 1968; Green, 1969; Shapiro, 1969). The structural and behavioural similarity of these sequences led some to infer similar proposed functions for them across the eukaryote-prokaryote divide. Mobile elements in prokaryotes were thought to be involved in gene expression by acting as genetic switches, and as a mechanism for large-scale chromosomal rearrangements (Cohen, 1976; Shapiro, 1977). Several researchers suggested in reviews that similar sequences in eukaryotes, the middle repeats, might be involved in a variety of functions such as gene regulation, chromatin organization, cell differentiation and development, or as units for future adaptive functions via rearrangements (Cohen, 1976; Shapiro, 1977; Kleckner, 1977; Nevers and Saedler, 1977; Roeder and Fink, 1980). The discovery that some repetitive DNA was mobile in bacteria, drosophilids and yeast occurred in the 1970s (Saedler and Heiss, 1973; Hu et al., 1975a; 1975b; Ilyin et al., 1977; Ananiev et al., 1978; Potter et al., 1979; Young, 1979; Strobel et al., 1979; Cameron et al., 1979). At the time, the structures of the bulk of middle repeats in eukaryotes, namely TEs, were not as well characterized as they were in prokaryotes, and sequencing of even the prokaryotic elements was just starting. Antibiotic resistance genes were known to be found on mobile DNA in prokaryote genomes (Sharp et al., 1973; Foster et al., 1975), and having such an obvious host-beneficial trait present on a mobile piece of DNA could conceivably have led people to think that mobile genetic elements were present primarily for the benefit of the host organism.

The tendency to view any given trait as likely to be functional is known as adaptationism, a view not restricted to the topic of repeats and their significance but widespread within evolutionary biology at the time. Gould and Lewontin (1979) penned a critique of the prevailing adaptationist framework and stressed that natural selection should not automatically be the null hypothesis when trying to explain the forms of traits in organisms. Instead, alternative explanations should be considered, such as the form of certain traits being more likely the product of genetic drift or correlated by-products of other traits (Gould and Lewontin, 1979). Although adaptationist thinking prevailed in the explanation of TE function during the 1970s, these cases are better understood as examples of single-level thinking. Evolution can be viewed as occurring at multiple levels of biological organization at once, where selection and other forces can act from the level of the DNA to cells/organelles, organisms, groups and species (Lewontin, 1970; Wilson, 1975; Jablonski, 2008). Although some aspects of this theory are controversial, levels below the organism are less so, such as cell-level selection acting in the opposite direction to selection at the organism level in the case of cancer (Cairns, 1975; Pepper et al., 2009). Although adaptive, single-level explanations for the presence of repeats tended to dominate the discourse of mid-20th century literature, several alternative explanations were proposed, showing that the root concept of selfish DNA and multi-level thinking predates the discovery of repeats.


The concept of selfish DNA before 1980

“It is not necessary that they [supernumerary chromosomes] are useful to the plants. They need only be ‘useful’ to themselves.” - Östergren, 1945

The kernel of the selfish DNA concept appeared again and again in the decades before it was popularized in the 1980s. Östergren (1945) discussed the nature of accessory chromosomes in plants, better known today as B chromosomes, which are not essential for the organism and are often missing from individuals within a given population (Camacho, 2005). Östergren (1945) conceded that some accessory chromosomes might be of benefit to the organism, but he argued that some might be seen ‘differently’ by selection, and those which possess mechanisms to accumulate and spread in the population would do so, even if they had some negative effects upon the organism. He described these chromosomes as being “useful to themselves” and not necessarily to the plants in which they are found. He even suggested that coevolution could occur between organism and chromosomes, in which adaptations to restrict accessory chromosomes might be favoured and the accessory chromosomes could in turn evolve to become less harmful to the organism.

Although McClintock is perhaps the best known of the early researchers who studied mobile elements in maize, she was by no means alone. Whereas McClintock favoured an organism-level functional role for her Ac and Ds DNA transposons, others dissented from this opinion (e.g., Wood and Brink, 1956). Peterson (1970; 2002) saw these so-called ‘controlling elements’ as randomly distributed throughout the maize genome, which he interpreted as evidence against their role in gene regulation. He proposed his alternative ‘insert concept’, whereby ‘controlling elements’ are more akin to bacterial episomes and might in fact be parasitic or symbiotic residents within the genome. The movements of elements that McClintock interpreted as orchestrated for gene regulation, Peterson (1970) saw as a result of their being disrupted from a quiescent state by the x-ray treatment that the plant had undergone. Fincham and Sastry (1974) saw the demonstrated ability of ‘controlling elements’ to multiply differentially as evidence that they could be “cryptic chromosomal parasites”, similar again to bacterial episomes, whose evolution could be considered partially independent of the organism. They also suggested that the origin and function of ‘controlling elements’ are logically independent, meaning that the elements could sometimes be involved in adaptation, but that this has no bearing on their original function and how they originally arose.

By extending his selfish gene idea, Dawkins (1976) saw the large amounts of seemingly unnecessary DNA in many eukaryotes as evidence for DNA that was good at surviving and replicating within the genome as a parasite or hitchhiker, without necessarily having any function for the organism. Cavalier-Smith (1978), too, postulated that selection unrelated to the organismal phenotype could result in the proliferation and movement of sequences within genomes, which did not exclude these sequences sometimes inadvertently becoming useful to the organism. Cameron et al. (1979) saw the ability of the yeast Ty1 retrotransposon to transpose and maintain multiple copies as important for its persistence and dispersal if it were parasitic in a manner similar to viruses.

The term ‘selfish DNA’ itself was first used by Crick (1979) to describe stretches of DNA that preferentially replicated and spread within the genome, essentially a shorter description of the concept discussed in 1980. Finally, in the discussion section of Walker (1979), Bodmer, Fincham and Crick (two of whom were already familiar with the concept) discussed the function of repeats, which, if they could have their own fitness associated with selective replication itself, need not have a function for the organism. Although not explicitly so, these authors were all thinking of the genome within the framework of multiple levels of selection and evolution, hinting at a level within the genome, nested within that of the organism. It was this idea that was first explored in detail in the “selfish DNA” papers in 1980.

What the 1980 papers on selfish DNA did and did not say

“It would be surprising if the host genome did not occasionally find some use for particular selfish DNA sequences, especially if there were many different sequences widely distributed over the chromosomes. One obvious use … would be for control purposes at one level or another.” - Orgel and Crick, 1980

Although the two 1980 selfish DNA papers have been cited numerous times, very few papers have discussed in detail the arguments that were presented. To better understand the impact of, and response to, these papers, what follows is a detailed elaboration of the main arguments found in both. These are then contrasted with the misconceptions that are generally attributed to these papers.

Doolittle and Sapienza, 1980

Doolittle and Sapienza (1980) begin by characterizing the fallacy of the phenotype paradigm, whereby the way a sequence of DNA or gene can ensure its propagation is by benefiting the organism in which it finds itself. This form of adaptationism at the genic level attempts to assign function to any and every genetic sequence, in particular mobile and repetitive DNA, and when these attempts fail, said sequences are proposed to facilitate genetic rearrangements and be beneficial in a long-term evolutionary sense. Quoting Gould and Lewontin’s (1979) paper on the dangers of rampant adaptationism, the authors warn against the same phenomenon occurring in the assignment of roles to sequences within the genome. Instead, they explain the presence of certain sequences, such as repeats, by proposing that their maintenance is the product of a form of selection that is not relevant to the phenotype of the organism itself. If the cell itself is considered an environment for a given set of genetic sequences, then those which have a higher relative fitness require no other explanation for their existence.

At the time, transposable elements were known in both prokaryotes and eukaryotes, although the exact molecular mechanisms of transposition were often not well understood. Their ability to move from place to place within the genome and generate new copies ensured survival regardless of effects on the host, unless those effects were very negative. The behaviour of individual copies of these elements within the genome can be understood as one would consider organisms in an environment. New copies may diverge in sequence from their parent insertions, but only in ways that do not compromise their ability to mobilize; otherwise they would cease to transpose and would produce no further copies. Those that are better at surviving and/or transposing in the genome are more likely to leave more copies than competitors, and perhaps supplant them in the long term. In eukaryotes, the middle repetitive portion of the genome, composed primarily of TEs, was known to vary substantially between organisms in the types of repeats present. This was thought to be too large a fraction to be functional by the selfish DNA proponents, based on conventions from the neutral theory of molecular evolution (Doolittle and Sapienza, 1980). This variation was known to extend to the amount of DNA as well, and the relative intensities of host-level and selfish DNA-level selection might determine the size of the genome. Doolittle and Sapienza conclude by recognizing that although non-phenotypic selection is the reason for the maintenance and persistence of selfish DNA, this does not mean that selfish DNA could never contribute to the evolution of the organism:

“We do not deny that prokaryotic transposable elements or repetitive and unique-sequence DNAs not coding for proteins in eukaryotes may have roles of immediate phenotypic benefit to the organism. Nor do we deny roles for these elements in the evolutionary process. We do question the almost automatic invocation of such roles for DNAs whose function is not obvious, when another and perhaps simpler explanation for their origin and maintenance is possible.”

Doolittle and Sapienza (1980)

Orgel and Crick, 1980

The titular selfish DNA is described as arising when a sequence spreads by creating additional copies of itself within the genome while making no specific contribution to the phenotype of the organism, except perhaps as a burden, like a not-too-harmful parasite. The term selfish DNA was suggested by Orgel and Crick for the bulk of most eukaryotic genomes, namely repeats, introns and intergenic regions. They argue, from the vast differences in genome size between species and the marked divergence in the amount and type of repeats between closely related species, that repeats are less likely to be functionally relevant for the organism. These selfish sequences were thought to spread by some sort of multiplication mechanism. There would need to be some balancing process between the levels of selfish DNA and the host, perhaps manifesting as the accumulation of selfish sequences in non-transcribed and non-translated regions of the genome. Selfish sequences could also further their own survival by spreading between chromosomes and even across species boundaries, necessitating the use of a modified form of population genetics to understand their dynamics.

Orgel and Crick also had thoughts about whether or not selfish DNA could contribute in a beneficial manner to the organism:

“Thus, some selfish DNA may acquire a useful function and confer a selective advantage on the organism. Using the analogy of parasitism, slightly harmful infestation may ultimately be transformed into a symbiosis. What we would stress is that not all selfish DNA is likely to become useful. Much of it may have no specific function at all. It would be folly in such cases to hunt obsessively for one. To continue our analogy, it is difficult to accept the idea that all human parasites have been selected by human beings for their own advantage.”

Orgel and Crick also addressed the effects of bulk DNA content on phenotypes such as generation time, but argued that not enough data had been collected at that point to reach a definitive conclusion. To the authors, the selfish DNA concept was eminently testable, as any good scientific idea is. They reasoned that certain groups of repeats should be in different chromosomal positions in different organisms, if their presence was auxiliary to the fitness of the organism. The movement of selfish DNA should be amenable to experimental manipulation and study, as should the phenotypic effects of extra DNA on traits such as metabolism and generation time. Again, they conclude with a balanced view of the purpose of different components of the genome:

“While proper care should be exercised both in labelling as selfish DNA every piece of DNA whose function is not immediately apparent and in invoking plausible but unproven hypotheses concerning the details of natural selection, the idea seems a useful one to bear in mind when exploring the complexities of the genomes of higher organisms.”


Misconceptions about the selfish DNA hypothesis

There is a minority of cases in the literature where the terms junk DNA and selfish DNA are used interchangeably or together. Only a few dozen citations of the selfish DNA papers either lump them together with Ohno (1972) or incorrectly attribute the junk DNA concept to the selfish DNA authors (see citation analysis). While much of the TE-derived DNA in genomes might be completely non-functional (junk), any actively transposing elements arguably are not, since they have a function; it is just not one applicable to the level of the organism but rather to that of the element itself (Elliott et al., 2014). Doolittle (1982) was the first to draw this distinction between the two categories, calling true junk DNA “meaningless and uninteresting”, while calling selfish DNA “subcellular organisms” worthy of study. How then did this association between the selfish DNA papers and junk DNA come about? Balaram (2012) noted that Orgel and Crick (1980) might themselves tie junk DNA to selfish DNA in their discussion of how excess DNA is eliminated, so the association could perhaps stem from that. Although the two terms are distinct in the minds of some, it is easy to see how they could be confused or synonymized; Ohno (1981) himself initially mistook selfish DNA for a revival of junk DNA.

Selfish DNA represents the application of Darwin’s four postulates to the level of sequences of DNA, the implication being that a level within the evolutionary hierarchy exists within the genome, nested within the organism. A common claim in the TE literature is that when TEs or TE-derived sequences are found to provide a benefit to the organism, this invalidates the concept of selfish DNA. This is similar to the false equivalence of junk DNA and selfish DNA. As was shown above, neither set of selfish DNA authors rules out selfish DNA ever being able to contribute to the genome in a manner beneficial to the host organism. By virtue of being composed of DNA, TEs and other selfish DNA have the obvious potential to cause mutations in the genome, a minority of which might be beneficial. TEs which might live in a mutualistic relationship with their host, such as telomeric non-LTR elements in drosophilids, and pieces of TEs that have been exapted into the host level, all arose at the level of selfish DNA (Pardue and DeBaryshe, 2011; Alzohairy et al., 2013). These elements have since had their fitness interests more closely aligned with those of the host, or have ceased to function and had their raw material incorporated into the higher level. Again, what is of paramount concern is the idea that multiple levels of evolution exist within the genome and need to be considered when interpreting a set of observations.

Early reactions to the selfish DNA papers

“Both we [Doolittle and Sapienza] and Drs Orgel and Crick were surprised at the sometimes rather violent negative reactions provoked by our explicit articulation of ideas which we thought intuitively obvious. … Apparently, what we did was point out a serious disharmony between the ways in which many molecular biologists (who believe they understand the evolutionary process) and most contemporary population geneticists (who probably do understand it) think about how natural selection really works.” - Doolittle, 1982

Two follow-ups to the 1980 selfish DNA papers were published later that same year in Nature: commentary from the scientific community, and later a response by most of the original authors to these comments and criticisms. In the first set of response papers, an editor is conflicted concerning the use of the word ‘selfish’ when applied to DNA sequences, recognizing its appropriateness but cautioning against anthropomorphism and associations with sociobiology (Anonymous, 1980). Cavalier-Smith (1980) took issue with the selfish DNA hypothesis being the final answer to the sometimes vast differences in genome size between organisms, and suggested that selection on genome size itself must be taken into account. In this respect, selfish action by TEs would provide the variation upon which selection on genome size would act. Any selection pressure on genome size would, in effect, set the maximum population size for TEs within the genome depending on what was favoured. Cavalier-Smith (1980) also pointed out that calling natural selection at the level of selfish DNA ‘non-phenotypic’ was perhaps a poor choice of words, as selfish DNAs themselves would have phenotypic analogs upon which selection at their level would act. With this in mind, he proposed that the term intragenomic selection be used to differentiate it from other levels, including selection between genomes for bulk DNA content. Orgel, Crick and Sapienza (1980) agreed with this wholeheartedly and endorsed the use of this new term.

Many of the commenters took issue with a perceived absolutism, in that no other explanation need be given for extra DNA if it were found to be capable of self-replication. Orgel et al. (1980) agreed that selfish DNA could affect the host phenotype in both positive and negative ways, as was also stated in the original papers. They asked whether the current frequency of a given repetitive sequence in a population was due mostly to selfish replication or to the benefit it confers upon the host organism. This would determine whether it was largely selfish, or perhaps a form of symbiotic DNA, positively co-evolving with the host organism. In both cases the sequences arose selfishly; the question of how they are maintained within the genome will depend on the above qualifiers (Doolittle, 1982). Selfish DNA can be viewed like any mutagen, possessing a spectrum of neutral, deleterious, and beneficial mutation outcomes for the host, the latter category most likely being in the minority.

Smith (1980) saw the usefulness of the selfish DNA concept but suggested more theoretical and empirical support was needed to understand how such sequences could spread and the quantitative effects they would have on the fitness of their host organism. Many population geneticists took up this call and showed that selfish DNA could spread, and to what degree, even while reducing the fitness of the organisms that carried it (Hickey, 1982; Le Rouzic and Deceliere, 2005; Le Rouzic and Capy, 2009). In addition, the molecular details of transposition for many types of selfish DNA and TEs have been described to one degree or another, showing how they propagate and generate deleterious effects on the host genome (Hickman et al., 2010).

Some critics raised the issue that the selfish DNA hypothesis is not testable, and Doolittle admitted that it would be impossible to demonstrate that a candidate selfish piece of DNA did not arise through individual or group selection for the sole benefit of either. This may still be the case, but no evidence has come to light to suggest that any TE, or other form of selfish DNA, arose for the benefit of the organism and not due to selection between molecular replicators below the level of the organism. Ample examples have been found of the exaptation of former selfish DNA for the benefit of the host, and of intricate, co-evolutionary relationships between some TEs and their hosts, but these are distinct from the case above (Sinzelle et al., 2009; Chalker and Yao, 2011; Pardue and DeBaryshe, 2011; Alzohairy et al., 2013). The presence of host-beneficial traits, such as heavy metal or antibiotic resistance loci in some bacterial and archaeal TEs, is advantageous for both host and element, and it is more likely that the ancestral state of the element was the lack of a resistance gene (Doolittle, 1982).

Some at the time, such as Jain (1980), suggested that the genomic rearrangements caused by much selfish DNA, or its ability to cause them, are the reason for its maintenance in genomes. Although it is possible that selfish DNA might cause beneficial mutations, Doolittle (1982) and the other authors argue that we cannot assume that selfish DNAs arose and are maintained because of this. The ability of selfish DNA to be beneficial because of the variability it can generate is possible under the terms of group selection, but as Doolittle (1982) pointed out, it is rarely phrased this way by molecular biologists, who often may not understand the specific circumstances under which it would apply. More carefully articulated arguments for this position have been proposed recently (e.g., Oliver and Greene, 2009), but the question remains as to how to collect the data to evaluate whether this is likely. If species with more selfish DNA, or more of a certain type, were found to have a higher fitness than species without, it would be incredibly interesting, but did the selfish DNA arise for this benefit or merely persist for longer because of it? Which is the cause and which the effect?

Noted evolutionary biologist and paleontologist Stephen J. Gould’s criticism of selfish DNA focused on the term selfish as well, although he praised the concept itself (Gould, 1983; 2002). First, he felt that it still places priority on the level of the host organism, namely that selfish DNA is no more ‘selfish’ than any other Darwinian individual engaged in a struggle for survival and reproduction in its own environment (Gould, 1983; 2002). Second, he felt that the phrase itself is too close to the ‘selfish gene’, the name for a concept put forth by Dawkins that advocates a gene’s-eye view of evolution, rather than focusing on organisms (Dawkins 1976). Incidentally, Dawkins embraced the concept of selfish DNA due to its partial foundation within his selfish gene idea, and found it to be both “compelling and obvious” (Dawkins 1982). The point that the term selfish might be poorly chosen does have some validity, in that the term ‘selfish’ itself may have influenced how people think of this idea (see the citation analysis in a later section). However, the 1980 selfish DNA papers were written in response to organism-centric, adaptationist thinking about repetitive DNA, and a strong word may have been necessary to counteract this viewpoint.

What of the researcher who originally discovered TEs themselves, Barbara McClintock? There appears to be no record of her reaction to the publication of the selfish DNA papers. Peterson (2002), a contemporary of McClintock in maize ‘controlling element’ research, notes that she thought his ‘insert’ concept was “nonsense”, so she may have shared a similar opinion of selfish DNA itself.

Elaboration of the selfish DNA concept

“It is exciting to think about selection as operating at several levels simultaneously, the products of selection at one level not necessarily being adaptive at the next. Failure to adopt such an hierarchical view certainly underlies much of the confusion about transposable elements. Acceptance of hierarchy permits one at the very least to begin to construct a population genetics of the components of the genome – a very complex sort of thing because it must be nested within an already complex population genetics of organisms” – Doolittle, 1984

It will perhaps come as little surprise that the final word on selfish DNA by the originators of the term was not delivered in the rebuttals and commentary found in Nature during 1980. In particular, Doolittle continued to elaborate on the selfish DNA concept throughout the early 1980s (Doolittle, 1982; 1984). These papers appear to have received little attention: according to the Web of Science, each has been cited no more than 10-20 times in the past ~30 years. They are unified by their emphasis on the multi-level nature of the genome and by the analogies they draw between the level of organisms and the level of TEs.


Orgel et al. (1980) wrote a rebuttal to the first round of comments and criticism generated by the original selfish DNA papers. They acknowledged that not all repeats in genomes are necessarily selfish forms of DNA, and that there might actually be a fuzzy separation between functionless junk and functional sequences within the genome. Dover and Doolittle (1980) similarly acknowledged that not all repeats need be selfish but that some could be termed ‘ignorant DNA’. This ignorant DNA most likely applies more to the types of DNA sequences associated with Dover’s molecular drive concept, a process involving the turnover of sequences and biases in this process that might favour the persistence of certain types of these sequences independent of their benefit to the host organism (Dover, 1982). There would also be a fuzzy continuum between true parasitic DNA sequences and those that exist more as symbionts with the host, either not being harmful or perhaps providing some benefit. This claim is known primarily from Kidwell and Lisch (2001), although the seeds of the idea clearly predate its popularization by those authors. These responses also reinforced the idea of multiple levels being important in the genome, agreeing with critics that selfish DNA itself could have its own phenotype relevant to its level, which should be differentiated from the phenotype of the host. Sapienza and Doolittle (1981) also took aim at critics about the amount of DNA within genomes that could be considered selfish. They argued that many repeats and forms of mobile DNA in prokaryotes exhibit the traits that are characteristic of selfish DNA, and that if the same traits were to be found in any repeats in eukaryotes they should be considered selfish as well. They also predicted that most of the middle-repetitive DNA in eukaryotes would turn out to be TEs and thus selfish. They outlined for the first time the evolution and fate of TEs over time, and did so explicitly to differentiate the level of selfish DNA from that of the host, describing how parts of selfish DNA sequences are able to diverge through mutation if they are not relevant to the fitness of the TE. But the level of the host cannot be forgotten, as unchecked transposition would be constrained by top-down host-level selection should the increased selfish DNA load be unfavourable (Sapienza and Doolittle, 1981).

Because TEs are nested within the host genome and can influence its evolution, it is not uncommon to see claims made about their presence and/or activity being beneficial overall to the organism (e.g., Freeling, 1984; Shapiro, 2010). These often take the form that induced activity of elements can generate beneficial mutations, and that this observation, along with known cases of beneficial insertions, disproves the selfish DNA hypothesis (e.g., Allen et al., 2004; Pidpala et al., 2008; Spadafora et al., 2008; Muotri et al., 2009). The ability of TEs to act as mutagens, and sometimes create beneficial mutations, is not anathema to their classification as selfish DNA, and was predicted in the original papers. It is also possible that the presence or activity of TEs in genomes might be beneficial, but we once again must be conscious of the level in the hierarchy that we are discussing. Most evidence indicates that the activity of TEs in the genome is primarily benign or detrimental to the organism (Hickey, 1982; Wilke and Adams, 1992; Albornoz and Domínguez, 1999; Boissinot et al., 2001; Hollister and Gaut, 2009; Lukic and Chen, 2011; Nellåker et al., 2012). However, at levels above the organism, it is possible that greater TE activity in one deme, species, or lineage as compared to other competing entities might give that entity a long-term benefit that can be identified retrospectively (McFadden and Knowles, 1997; Oliver and Greene, 2009; Oliver and Greene, 2011). This possibility is not incompatible with TEs being selfish DNAs whose fitness interests may or may not be aligned with entities at other levels (Doolittle, 1982; Doolittle, 1989; Werren, 2011; Brunet and Doolittle, 2015).

What the Doolittle papers reiterate is the fact that genomes are the product of multiple separate, but interacting, levels within the evolutionary hierarchy. Accepting this fact provides the foundation by which to understand TE behaviour, abundance, diversity and distribution within genomes. TEs within genomes are analogous to populations of organisms within an environment, with individuals that vary in their ability to survive and reproduce within the population of the genome, and whose variation could be understood in the context of a TE population genetics (Doolittle, 1984; Doolittle, 1987). However, the dynamics of these TE populations are also filtered through interactions at the level of the organism, and possibly higher levels as well. Bottom-up effects from the TE level and top-down effects from the organism level interact to affect both levels simultaneously. Asserting causal primacy at one level over the other without evidence will lead to misinterpretations of patterns and processes.

How the selfish DNA papers have been cited over the past 35 years

"Since TEs are no longer considered to be just selfish or junk DNA, but have been shown to have important functions in the genomes of their hosts (Biémont and Vieira 2005, 2006; Medstrand et al. 2005), it is of fundamental interest to study the way they have evolved in closely related genomes." - Fablet et al., 2007

Today, it is very common to find the 1980 selfish DNA papers cited as the prime example of a long-standing dismissal of non-coding DNA as useless junk or mere parasites of the genome, in contrast to newly-discovered functions for (small amounts of) non-genic DNA. How these papers are viewed provides an important gauge of the prevailing conceptual framework for studying and understanding TEs: that is, whether they are seen as an outmoded argument for non-functionality at the organism level, or in their proper context of a multi-level evolutionary perspective on TEs, genomes, and organisms. Speculation or general impressions are insufficient to assess this, and as such a thorough citation analysis was undertaken to determine how these papers have been referenced over the past 35 years.

Papers citing each of the two 1980 selfish DNA papers (Orgel and Crick, 1980; Doolittle and Sapienza, 1980) were identified using the Web of Science, from their date of publication up until the end of 2013, for a total of 1311 papers which cited one or both papers. All but 59 were obtained electronically, or in hard copy at the University of Guelph, or via Interlibrary Loan. Those not included primarily represented foreign-language papers from the 1980s. Each paper was consulted directly and classified according to three categories based on how its authors cited the selfish DNA papers:

1. Multi-level thinking. Accurately reflects the point of the selfish DNA papers by discussing parasitic/selfish DNA that need not contribute to the host and can be harmful, with selection acting at its own level.

2. Organism-level thinking. Generally takes one (or both) of two forms. A) Equating with junk, i.e., the focus is on whether it has function for organisms or not, often cited along with Ohno (1972). Citation incorrectly implies that the main claim of selfish DNA papers was that TEs and other elements have no phenotypic consequences for organisms, such that if you find some elements with consequences, there is no junk or selfish DNA. B) Assuming that selfish DNA means strictly parasitic. Citation incorrectly implies that the idea of selfish DNA is that all elements are strictly detrimental parasites, such that if you find any examples of function, then there is no parasitic DNA (i.e., selection is only on the organism level). Type B citations were 1.2 times more frequent than type A citations.

3. Tangential citations. Citations of 1980 papers not related to selfish DNA, but rather to other aspects of the papers (genome size, origins of introns, etc.) or something completely unrelated. These included perfunctory citations, those where the paper(s) are cited but not discussed in any meaningful way.

No paper was counted more than once for any given category of citation, even if the selfish DNA papers were cited in multiple instances in the text. Tangential citations were removed from subsequent analyses because they were not informative on how those who cited the papers perceived the selfish DNA hypothesis and the 1980 Nature papers. The data set used for this analysis can be found in Appendix 1.

The results of this detailed citation analysis were unambiguous and striking. The percentage of Multi-level citations declined over time and was overtaken by Organism-level citations around 2001 (r = -0.881, p < 0.0001; Figure 2.1). This cannot be explained by a significant decline in citations over time since the publication of the selfish DNA papers (r = -0.2298, p = 0.191; Figure 2.2). Immediately after the publication of the selfish DNA papers, citations of a Multi-level nature peaked at over 80%, while at its lowest this number was 23.53% in 2012, and 30% in 2013 (Figure 2.1).
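The calculation behind these statistics can be illustrated with a minimal sketch. It assumes that Tangential citations have already been removed, that each year is summarized as a pair of Multi-level and Organism-level counts, and that the reported r is a Pearson correlation between year and percentage (the text does not name the coefficient). The function names and the counts below are hypothetical and are not drawn from Appendix 1.

```python
from math import sqrt


def percent_multilevel(counts):
    """Percentage of Multi-level citations per year.

    `counts` maps year -> (multi_level, organism_level); Tangential
    citations are assumed to have been excluded beforehand.
    """
    return {
        year: 100.0 * multi / (multi + organism)
        for year, (multi, organism) in counts.items()
        if multi + organism > 0
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical counts for illustration only (NOT the thesis data).
toy_counts = {1981: (8, 2), 1991: (6, 4), 2001: (5, 5), 2011: (3, 7)}
pct = percent_multilevel(toy_counts)
years = sorted(pct)
r = pearson_r(years, [pct[y] for y in years])
```

A strongly negative r on the percentages, alongside a weak r for total citations per year, is the pattern described above: the share of Multi-level citations falls even though the papers continue to be cited at a similar rate.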

How could this marked decline in the proportion of Multi-level citations, and the rise of Organism-level citations, over the past 35 years be interpreted? There are several possibilities, none necessarily mutually exclusive. The rise of Junk citations could be explained if the papers were written ambiguously and there is genuine misunderstanding of what the original authors said. There is some evidence for this given that Orgel and Crick (1980) did mention junk DNA, and the fact that Orgel, Crick and Sapienza (1980) had to clarify some things in a subsequent issue of Nature that year. As mentioned previously, selfish DNA was confused for junk DNA by Ohno (1972) himself. Doolittle (1982) clarified the difference between junk and selfish DNA; however, this paper is rarely cited, most likely because it is difficult to obtain and not well known to the relevant audience.

Despite it being more than 30 years since the most famous critique of adaptationism was published, it can be argued that adaptationism is still a persistent problem (Gould and Lewontin, 1979). The paradigm of carving the biological world up into traits that need to be explained in terms of their adaptive benefit to a given organism would appear to be all too tempting a trap into which to fall. When assessing adaptation in the genome, although the signatures of selection are seemingly more evident than at the organism level, the inherent multi-level nature of the genome makes finding adaptations all the more complex. Assigning function is complicated because mere effects need to be distinguished from true functions and, when function is likely, the level at which it is thought to apply (host, TE or other) needs to be carefully evaluated (Doolittle et al., 2014; Elliott et al., 2014). The non-intuitive complexity of the situation within the genome suggests another reason for the multi-level decline over time, namely that the relationships between TEs, their hosts and concepts of function and adaptation are difficult to grasp, and favouring the host level is a much easier path to drawing conclusions.

Another option is that a large number of authors who cite the original papers may not have read them carefully. Gould (1991) coined the 'creeping fox terrier effect' for a phenomenon whereby a particular interpretation of an idea appears in the literature and then is seized upon and copied by future works without referring to the original source. Early on, papers showcasing the organism-level interpretations of the selfish DNA papers may have been widely read, and their interpretation was taken to be the strongest message that the original Nature papers contributed. This may have been compounded by the fact that subsequent papers by Doolittle that discussed the hypothesis further, and stressed selfish DNA as its own level, were not as widely cited, due to the difficulty of accessing them. For whatever reason, it is clear from the analysis that the aspects of the selfish DNA papers dealing with TEs from a multi-level perspective, and the benefits this perspective might impart, have been greatly, and progressively, downplayed over the past 35 years.

An updated view of selfish DNA

"[M]olecular evolutionists who are debating 'selfish DNA' and 'junk DNA' have not been at all careful in recognizing that they are dealing with 'simultaneous multi-level selection' in their arguments." - Jungck, 1982

Over time, researchers writing about selfish DNA have thought markedly less about the two levels of element and host, tending instead to think of the host alone. Why should we strive for a more balanced view? A recent example of why viewing selfish DNA, and genomes at large, as multi-level entities might yield some benefit can be found in the controversy surrounding the analysis of the ENCODE dataset (The ENCODE Project Consortium, 2012). The ENCODE consortium sought to exhaustively characterize functional elements within human genomes, spanning a plethora of cell types using a variety of assays to identify regions including those that are transcribed, bind transcription factors, are in areas of open chromatin and have methylated cytosine bases (The ENCODE Project Consortium, 2012; Thurman et al., 2012; Wang et al., 2012). Based on this extensive analysis, the flagship publication of the consortium concluded that approximately 80% of the human genome appeared in their analyses and was therefore functional (The ENCODE Project Consortium, 2012). The subsequent debate in the literature was well publicized and will not be reviewed here (Eddy, 2012; Graur et al., 2013; Doolittle, 2013; Kellis et al., 2014; Palazzo and Gregory, 2014; Brunet and Doolittle, 2014; Graur et al., 2015). While ENCODE did generate data that will doubtless prove to be useful in the years to come, it did not attempt to distinguish sequences residing within TEs, and TE-derived sequences, from those that did not. Function cannot be assigned without first acknowledging the level (either TE, host organism or both) to which a certain feature would pertain, and more so whether or not a given trait or feature truly is functional (Gould and Lewontin, 1979).

I do not wish to say that much of the work involving TEs and selfish DNA over the past 35 years has been incorrect or misguided. Many researchers have demonstrated a more hierarchy-conscious approach to selfish DNA that is very much in line with the initial spirit of the selfish DNA papers (e.g., Deininger and Batzer, 1999; Belancio et al., 2008; Wilkins, 2010). Kidwell and Lisch (2001) argued for a more nuanced understanding of TEs corresponding to the fleshed-out approach taken by those studying host-parasite interactions. They viewed a continuum on which selfish DNA entities could exist, ranging from strict, harmful parasites at one end to mutualists on the other, providing some benefit to the host organisms that would also enhance their intragenomic survival (Kidwell and Lisch, 2001). The long timescales over which selfish DNA such as TEs and their hosts interact facilitate the evolution of a variety of types of relationships that fall along this continuum. Interestingly, this idea appeared first in a nascent form just after the release of the initial selfish DNA papers, when Orgel, Crick and Sapienza (1980) said that it might be difficult to draw a line between parasitic DNA on the one hand and symbiotic DNA on the other. Whatever its source, at its heart this idea is a recognition of interaction between levels whose signature is found written in the genome for us to interpret.

Selfish DNA renewed

How should the idea of selfish DNA be understood now? Selfish DNA represents a level of selection below that of the organism, although not entirely divorced from the organism due to its nested nature. At its most basic, selfish DNA occurs when Darwin's four postulates are applied to DNA individuals within the environment of the genome. Given heritable variation between DNA sequences in their capacity to survive and reproduce in a genome, where survival and reproduction are not guaranteed, natural selection will favour variants best able to copy themselves in the given environment. Selfish DNA need not necessarily contribute positively, or at all, to host fitness to survive. However, due to co-evolution or the inherent mutational activity and subsequent loss of mobility, some sequences may become incorporated into the host genome fully as something positively selected at that level. Fitness interests may also be aligned between the host and selfish DNA levels to achieve a more mutualistic relationship with the host organism (Durand and Michod, 2010). Both possible positive contributions to the host organism, and the possibility of a mutualistic relationship with the host, were recognized early on by the selfish DNA authors, but were not greatly expanded upon, and would resurface much later from other authors. The type of relationship a particular selfish DNA and a host organism will have will depend on many factors.

In cases such as prokaryotic TEs that carry antibiotic resistance genes, it stands to reason that negative selection against elements is so strong by the host that elements that incorporate such genes increase their chances of surviving and propagating in the genome. Discoveries of mutualism with the host and/or discoveries of exaptation of pieces of formerly selfish DNA by the host organism do not invalidate the hypothesis/idea that a level of selection exists below that of the host organism, nor do they invalidate it for a particular family of TEs where some insertions have been exapted. Rather, this represents an example of how selection at multiple levels interacts within the genome to influence its architecture. Accumulated evidence suggests strongly that TEs have affected the evolution of their host organisms, and this is an important observation, but no longer a novel one. The question now remains as to how much they have affected the evolution of their hosts and in what capacity. Either way, this does not invalidate the selfish DNA hypothesis. It remains possible that the activity of TEs within a species or lineage relative to other species or lineages might give the former taxon an advantage over longer evolutionary time scales. However, this does not invalidate the existence of a level below the organism where selfish DNAs survive by propagating themselves (Brunet and Doolittle, 2015).

Concluding remarks

The ideas found within the 1980 selfish DNA papers and subsequent publications provide a baseline from which to understand the behaviour and evolution of TEs. TEs encompass their own level within the hierarchy of evolution and both factors at the element level, and levels above, will influence their abundance, distribution, diversity and evolution. The multi-level nature of the genome can no longer be ignored. Unfortunately, evidence from the literature suggests that this message from the selfish DNA papers has been emphasized less and less over the years. Neither malicious nor intentional, this trend perhaps reflects the charismatic mutational contributions TEs have made to numerous organisms, in the form of TE-related adaptations. It can be easy to focus on the effects of TEs at the expense of how those effects arose due to the interactions between the TE and organism levels.

Without ignoring the level of the organism and higher levels, the time has come to reacquaint ourselves with the TE level, to explore how similar it is to that of the organism and populations. This will require drawing parallels between TEs and organisms, and determining whether the ideas and tools of organismal evolution can be directly applied to TEs. Ideas such as these will be explored in Chapter 4 of this thesis. Developing an element-level perspective will first require determining the current state of TE data, the subject of Chapter 3.

Tables and figures

Figure 2.1. The percentage of selfish DNA paper citations classified as Multi-level over time from 1980 until the end of 2013. There is a significant, negative relationship between the percentage of multi-level citations and the year (r = -0.881, p < 0.0001).

Figure 2.2. The total number of citations of the selfish DNA papers from 1980 until the end of 2013. There is no significant relationship between total citations and the year (r = -0.2298, p = 0.191).

Chapter 3. Where we are: Transposable element abundance and diversity in genomes of different sizes

Abstract

The explosion of sequence information in the past decade allows us to compare the content of eukaryote genomes as has never been possible before. This allows us to compare how communities of transposable elements (TEs), mobile sequences of DNA, differ across the tree of eukaryotes, and how these relate to the important parameter of genome size. Observations in the early stages of the explosion in genomic resources suggested that smaller genomes contained fewer kinds of TEs than larger ones. To test this relationship with a larger data set, TE richness data, as measured by the number of superfamilies in a given genome, was collected from over 500 eukaryote genome sequencing papers. This was supplemented with searches in the primary literature and databases to create as comprehensive a TE catalog for eukaryotes as possible. Genomic parameters such as gene number, intron number and repeat content were also collected. Some strong, linear relationships were found between genome size and many genomic parameters. However, the relationship between TE richness and genome size proved to be more complex, with TE richness and variation in TE richness being small in the largest and smallest genomes. The origins and implications of this relationship are discussed, as well as a trend in how TE content is reported.

Introduction

In previous chapters, I laid out how potential biases in the way transposable elements (TEs) are thought of by the research community have influenced the way they are studied. This comes in the form of favouring a perspective of TE evolution centred on the level of the organism, rather than acknowledging the multiple levels present in the genome. This was supported in Chapter 2 by using data about how the seminal selfish DNA papers (Orgel and Crick, 1980; Doolittle and Sapienza, 1980) have been cited over the past 30 years, and the reduction over time in the element- and multi-level interpretation advocated by the original authors. To remedy this, I suggested that evolution at the element level needs to be explored much more thoroughly, in line with the ideas and spirit of Doolittle (1982; 1989) and the original selfish DNA papers. Taking the view that TEs are evolutionary individuals in their own genomic population, this will be extended in Chapter 4 to determine which facets of traditional, organism-focused evolutionary biology would be useful in understanding the evolution, abundance, pervasiveness and diversity of TEs. With that in mind, this chapter explores TE abundance and diversity from an empirical perspective, along with determining the degree to which general information is available on TE content now that a great many eukaryote genomes have been sequenced.

The analogy between genomes and ecosystems was first made by Holmquist (1989), who suggested that different components of the genome might localize to different niches within the genome, the way species do within ecosystems. Such niches to Holmquist included the darkly staining, heterochromatic bands of chromosomes, rich in repetitive DNA such as TEs. TEs replicate and use resources within the genome much as organisms do within their habitat. TEs also interact with each other in the form of parasitism between autonomous and non-autonomous elements, and compete with each other for resources such as space and the proteins necessary for replication (Leonardo and Nuzhdin, 2002). This has led some to propose that the genome-as-ecosystem analogy might be a useful tool in understanding TE abundance and diversity, even though caution must be taken in how ideas and methods from ecology are applied to TEs within the genome (Venner et al., 2009; Linquist et al., 2013; 2015).

In traditional ecosystems, it has been known for some time that a reliable relationship exists between the number of species and the size of the ecosystem they are found in. As ecosystem size goes up so does the number of species, also known as species richness (Magurran, 1988; Lomolino, 2000). This typically takes the form of a sigmoidal relationship, with species richness increasing rapidly with ecosystem size and then tending to plateau (Lomolino, 2000). In the genome-as-ecosystem analogy, genome size, or C-value for constant value, best represents the size of the genomic ecosystem that TEs occupy. Genome size is a trait which has been of interest to biologists since the 1940s. Although genome size is not related to intuitive notions of organismal complexity, it has been found to correlate with a variety of factors important for organism function and evolution, such as cell size, development and metabolic rate (Cavalier-Smith, 1978; Gregory, 2000; 2001; 2002a; 2002b; Kozlowski et al., 2003).

With regard to TE information, data from some of the earliest sequenced genomes suggested that TEs make up a larger proportion of larger genomes, and indeed this fact was known about repetitive DNA in general since the 1960s (Britten and Kohne, 1968). Gregory (2005) and Volff et al. (2003) observed a different relationship between the human genome (3200 megabase pairs [Mbp]), and several other smaller genomes, namely that of the Takifugu pufferfish (400 Mbp) and Drosophila melanogaster (180 Mbp). The relatively smaller genomes of the latter two species have richer TE communities in the form of more lineages than the larger human genome. These comparisons were done with very limited sample size, so it was of interest to this author to investigate whether this pattern was supported by a wider range of taxa with varying genome sizes. It was also of interest to see if a sigmoidal relationship between richness and genome size manifested as it does in traditional ecosystems. Instead of focusing on the many molecular details, like specific sequence, copy number or bulk abundance, superfamily presence/absence data was gathered to give an idea of the overall TE community in sequenced genomes, which could then be compared to the genome size of those host organisms.

Determining the superfamily count, or TE richness, in genomes not only provides information on how large TE communities are from genome to genome, but also about their composition. TEs are divided into a variety of phylogenetic lineages depending on their mode of transposition and other diagnostic features (Chapter 1). New TEs are discovered routinely, and undoubtedly more remain hidden in the unclassified repeat regions in sequenced genomes, and in the wide swath of eukaryotes that have yet to be the target of any sequencing. Although the particular compositions of TE communities are often mentioned, and how they might differ from other genomes, a comprehensive resource of the exact TE community composition of many genomes does not exist. The collection of TE richness data allows such a resource to be made, which could have a multitude of uses, such as determining which forces might be important in generating the makeup of a genomic TE community, and providing a resource to non-specialists who might wish to know what TEs are likely to be part of a soon to be sequenced genome. It also allows us to determine whether TE richness follows the same positive relationship with genome size, as species richness has with ecosystem size.

Databases for eukaryote genome sizes already exist (Kullman et al. 2005; Gregory et al. 2007; Bennett and Leitch 2012; Gregory 2016), but estimates are also found in genome papers themselves, sometimes using cytological methods and other times using assembly statistics to estimate it. Genomic parameters and genome size have been of interest to researchers since the late 1990s, when genetic information was beginning to accumulate to sufficient levels for initial analyses (Vinogradov, 1999). Parameters such as intron size have been shown to positively correlate with genome size in a variety of organisms (Waltari and Edwards, 2002; Wendel et al., 2002; Zhang and Edwards, 2012). Genome size and gene number have been found to have a positive relationship in several studies based on data from 55 (Hou and Lin, 2009) and 106 eukaryote genomes (Friar et al., 2012). The copy number of ribosomal DNA genes was also found to be strongly positively correlated with genome size in 166 animals and land plants (Prokopowich et al., 2003). A large number of new genomes have been published, which would allow for this relationship to be verified. These past studies often did not take into account shared common ancestry between species, so whether these relationships remain after phylogenetic correction remains to be tested. With that in mind I also wanted to assay the relationship between various genomic parameters and genome size, beyond TE richness, within a phylogeny-corrected context.

New sequencing technologies have made unraveling the genomes of more organisms easier, and our ability to generate sequences for new genomes might now outstrip our ability to analyze them properly. This wealth of sequence information now gives us the opportunity to determine what sort of information is easily accessible concerning the basic parameters of genomes, such as gene number and intron number, but more importantly information on repeats and TEs. If a more balanced view of TEs, which acknowledges their place nested within the evolutionary hierarchy, is to take hold, it is useful to understand the current state of information. Such a survey was undertaken (Elliott and Gregory, 2015a; 2015b), the largest survey and collection of such information to date. A selection of those results will be detailed and discussed in this chapter.

It was also of interest to determine how predictive genome size might be of genomic parameters once this large data set was gathered. To that end, the data from Elliott and Gregory (2015a; 2015b) were used for curve-fitting to generate functions to represent the relationships between genome size and a given parameter. These were used, in conjunction with novel genome size data from previously unanalyzed genomes, to generate and test predictions about genomic parameter values from unanalyzed genomes, such as gene number and TE richness. If genome size could be shown to have predictive capability, it would further solidify the usefulness of generating genome size estimates prior to possibly expensive genome sequencing and assembly.
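As an illustration of this fit-then-predict approach, the sketch below fits a straight line to log10-transformed values by ordinary least squares and uses it to predict a parameter value for a genome of known size. The choice of a log-log linear (power-law) form is an assumption made for the example, since the functional forms actually fitted are not specified here. The function names are hypothetical and the data points are synthetic.

```python
from math import log10


def fit_loglog(genome_sizes_mbp, values):
    """Least-squares line through (log10 size, log10 value) pairs.

    Returns (slope, intercept) of log10(value) = intercept + slope * log10(size).
    """
    xs = [log10(g) for g in genome_sizes_mbp]
    ys = [log10(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(genome_size_mbp, slope, intercept):
    """Back-transform the fitted line to predict a parameter value."""
    return 10 ** (intercept + slope * log10(genome_size_mbp))


# Synthetic points lying exactly on value = 2 * size**0.5 (illustration only).
sizes = [100.0, 400.0, 1600.0, 6400.0]
values = [2.0 * s ** 0.5 for s in sizes]
slope, intercept = fit_loglog(sizes, values)
predicted = predict(900.0, slope, intercept)  # close to 60 for this synthetic curve
```

In practice the prediction for a newly measured genome would be compared with the value later reported from its sequence, which is the test of predictive capability described above.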

Methods

Baseline analyses of genomic parameters and determining their general patterns

Genome parameters data set

A search of the primary literature was performed to find published papers describing animal, fungi, plant and other eukaryote genome sequences. This included completed, draft and survey sequenced genomes that were available up until September 2014 via search in PubMed. This was supplemented by consulting other online sources such as the Genomes Online Database (Reddy et al., 2015), CoGePedia sequenced plant genome wiki, Phytozome (Goodstein et al., 2012) and the Wikipedia list of sequenced protist genomes.

From these papers data was collected or derived about 22 genomic parameters, including information on gene, intron and exon content, repetitive and TE content, base-pair composition and chromosome number. When necessary, this information was gathered from additional sources, details of which can be found in the Genome data reference list. The full data set can be found as a CSV file in Appendix 2. This compiled data set represents information from 502 eukaryote species, including 148 species of animals, 81 land plants, 202 fungi and 70 protists.

Genome size data set

Cytological estimates of genome size in Mbp were compiled from taxon-specific online databases including the Animal Genome Size Database (Gregory, 2016), Plant DNA C-Values Database (Bennett and Leitch, 2012), and the Fungal Genome Size Database (Kullman et al., 2005). Cytological estimates for other eukaryotes were collected from genome sequence papers when available or the rest of the primary literature. In some instances, discretion was used when determining which estimate of genome size to use. Older estimates of genome size within the various databases that conflicted significantly with assembly estimates of genome size were supplemented, when available, with newer independent estimates of genome size performed during the given sequencing project using methods such as flow cytometry or Feulgen image analysis densitometry. The sequence assembly size of each genome was also compiled from each genome sequence paper.

TE richness data set

Completed, draft and survey-sequenced genome papers published up until January 2014 were used to compile a presence/absence matrix of TE superfamily data for each species. Survey-sequenced genomes include those for which only a fraction of the genome lower than 20% was sequenced. These data represent the TE community in each genome without taking into account abundance of each superfamily, and will be referred to as TE richness (referred to as TE diversity in Elliott and Gregory 2015b). In total, this data set included 541 genomes; 176 animals, 117 land plants, 170 fungi, and 69 protists. Of these, 74 were from BAC-end, fosmid or survey-sequenced genomes. PubMed and resources previously mentioned were similarly used to compile this data set (see Genome data references). The superfamily level of the TE taxonomic hierarchy was chosen because this level is most commonly reported, and is the least ambiguous and most information-rich level below that of Class. The superfamily assignments found in RepBase (Jurka et al., 2005; Bao et al., 2015) were used as the basis for classification, along with some modifications suggested by phylogenetic work performed by Yuan and Wessler (2011). That work suggested collapsing certain superfamilies due to well-supported monophyly, and those conventions were used here.

The publications on which this data set was based spanned a period of time during which new TE superfamilies were discovered. Therefore, published genome papers would not necessarily reflect our complete knowledge of a given genome's TE complement. The rest of the primary literature was searched to supplement the information found in the genome papers. This included searching Repbase (Bao et al., 2015), papers characterizing new superfamilies, the Gypsy Database (Llorens et al., 2011), SINEbase (Vassetzky and Kramerov, 2013), and taxon-specific databases. Naming conventions for TEs are not universal, and occasionally TE catalogs were listed partially, or only, as family names, which can make it difficult to determine the superfamily to which a particular family belongs. In these cases, the literature was used for superfamily assignments of families. To help account for possibly novel but uncharacterized TE superfamilies, unknown categories were added for the major TE groupings (DNA transposon, LTR element, ERV, LINE, SINE) as well as novel TEs of unknown classification. These were used when unknown elements were mentioned, and where potentially novel superfamilies were listed but not well described. In total there were 75 superfamily categories, 69 of which were known superfamilies and six of which were categories for unknown elements, such as unknown DNA transposon, unknown LTR retrotransposon, etc. An example of this process is illustrated in Figure 1. In this case, the paper describing the genome of the freshwater microcrustacean Daphnia pulex was searched for information relevant to its TE content (Colbourne et al., 2011). This resulted in the identification of 21 superfamilies. Next, a variety of TE databases were searched for D. pulex entries, adding 7 more superfamilies found in Repbase (Jurka et al., 2005). A search of the primary literature was then conducted, in this case adding two more superfamilies to the count (Schaack et al., 2010; Kojima and Jurka, 2013). This process was repeated for all other genomes in the data set. This data set can be found as a CSV file in Appendix 3.
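The compilation procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the original data set was assembled by hand from the literature, the superfamily names below are placeholders rather than the actual D. pulex catalog, and the function names are hypothetical.

```python
# Placeholder superfamily names; the real D. pulex catalog is not reproduced here.
genome_paper = {f"paper_sf_{i}" for i in range(21)}      # 21 from the genome paper
databases = {f"database_sf_{i}" for i in range(7)}       # 7 added from TE databases
literature = {f"literature_sf_{i}" for i in range(2)}    # 2 added from the literature


def merge_sources(*sources):
    """Union of superfamily names across sources; a name reported twice counts once."""
    present = set()
    for names in sources:
        present.update(names)
    return present


def presence_absence_row(present, categories):
    """One row of the matrix: 1 if the category was reported for the genome, else 0."""
    return [1 if name in present else 0 for name in categories]


catalog = merge_sources(genome_paper, databases, literature)
# Hypothetical column set: every observed category plus one that is absent here.
categories = sorted(catalog) + ["unreported_sf"]
row = presence_absence_row(catalog, categories)
richness = sum(row)  # TE richness = number of superfamily categories present
```

Because richness is a count of categories present, it ignores how abundant each superfamily is, which matches the definition of TE richness used in this chapter.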

Statistical analyses

Summary statistics and correlation coefficients were calculated using standard methods. Genomic parameters, including genome size, were log-transformed because the distributions of the untransformed variables violated normality. Because shared common ancestry violates the assumption of independence of data points, significant relationships were corrected using Felsenstein's phylogenetically independent contrasts (PICs) (Felsenstein, 1985). These were positivized, forced through the origin, and computed using the PDAP module in Mesquite 2.75 (Maddison and Maddison, 2008; Midford et al., 2011). PICs were generated from a eukaryote-wide phylogeny that was assembled from the literature. Due to the broad nature of the taxonomic comparisons involved in this data set, this phylogeny conveyed topology only and not distance between taxa. This was done using information provided in the Tree of Life database (Maddison and Schulz, 2007), as well as a variety of other sources from the primary literature, detailed in the Phylogeny Reference list. This information was used to generate phylogenies for data sets in Elliott and Gregory (2015a; 2015b). All branch lengths were set to one, and for each soft polytomy present on the tree, one degree of freedom was subtracted from the PIC analysis (Purvis and Garland, 1993). All analyses were repeated using the branch length transformation methods of Nee, Grafen and Pagel in Mesquite (Maddison and Maddison, 2008). The method of Nee sets the distance from each tip to its closest node equal to the log10-transformed number of all tips descending from that node. The method of Grafen sets each branch length to the number of descendant tips from that branch minus 1. Finally, the method of Pagel sets each branch length equal to the maximum of the number of bifurcation levels originating from both bifurcation events at that node plus 1. The trees that were assembled can be viewed as NEXUS files in Appendix 4 (520 species) and Appendix 5 (257 species).
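For readers unfamiliar with independent contrasts, the following is a minimal Python sketch of Felsenstein's algorithm on a fully bifurcating tree with all branch lengths set to one, followed by positivization and a regression through the origin. It is not the PDAP/Mesquite implementation used for the analyses; the four-tip tree, the trait values, and the function names are hypothetical, and polytomies are not handled.

```python
import math


def pic(tree, trait, unit=1.0):
    """Return (contrasts, node_value, branch_length) for a bifurcating tree.

    `tree` is a tip name (str) or a 2-tuple of subtrees. Every branch starts
    with the same length `unit`, mirroring 'all branch lengths set to one'.
    """
    if isinstance(tree, str):
        return [], trait[tree], unit
    left, right = tree
    c_left, x_left, v_left = pic(left, trait, unit)
    c_right, x_right, v_right = pic(right, trait, unit)
    # Standardized contrast between the two daughter lineages.
    contrast = (x_left - x_right) / math.sqrt(v_left + v_right)
    # Ancestral estimate: average weighted by inverse branch length.
    x_node = (x_left / v_left + x_right / v_right) / (1 / v_left + 1 / v_right)
    # The branch below this node is lengthened to reflect estimation error.
    v_node = unit + v_left * v_right / (v_left + v_right)
    return c_left + c_right + [contrast], x_node, v_node


def slope_through_origin(cx, cy):
    """Positivize the x contrasts, then fit y = b * x with no intercept."""
    pairs = [(x, y) if x >= 0 else (-x, -y) for x, y in zip(cx, cy)]
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)


# Hypothetical four-tip example with made-up log-transformed trait values.
tree = (("A", "B"), ("C", "D"))
log_size = {"A": 1.0, "B": 3.0, "C": 6.0, "D": 10.0}
log_genes = {"A": 2.0, "B": 6.0, "C": 12.0, "D": 20.0}
cx, _, _ = pic(tree, log_size)
cy, _, _ = pic(tree, log_genes)
slope = slope_through_origin(cx, cy)
```

A tree with n tips yields n - 1 contrasts when fully resolved, which is why a degree of freedom is subtracted for each soft polytomy.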

Analyses were conducted on the entire data set, as well as on taxonomic subsets of the data set, such as animals, land plants, fungi, etc. For the TE richness data, subsets based upon genome size ranges were also used. Although data was collected for protists, it was not analyzed as a distinct group, because protists had the least data available of any category and are not a monophyletic grouping. To account for the multitude of tests performed, the Bonferroni correction (Bonferroni, 1935) was implemented, with adjusted alpha values found in Table 3.3.
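As a brief illustration of the correction, the Bonferroni-adjusted threshold is simply the nominal alpha divided by the number of tests in the family. The sketch below assumes a nominal alpha of 0.05 and uses made-up p-values; neither is taken from Table 3.3, and the function names are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold for a family of n_tests comparisons."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_correction(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni-adjusted threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical p-values from three tests within one taxonomic subset.
p_values = [0.001, 0.02, 0.04]
flags = significant_after_correction(p_values)
```

With three tests the threshold is 0.05 / 3, so in this made-up example only the first p-value remains significant.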

To assess the possible influence of genome size and host phylogeny on TE community composition, distance-based redundancy analysis (dbRDA) was used. An RDA is a means to measure how much a given set of proposed explanatory factors influences a set of response variables. In this case, the TE community of each species, as represented by the TE superfamily presence/absence matrix, was analyzed to determine what, if any, influence the factors of genome size and host phylogeny had on each community. This was performed in the statistical analysis software R (v. 3.2.1) using the packages vegan, ape, and phytools on the original TE richness data set (Paradis et al., 2004; Revell, 2012; R Core Team, 2015; Oksanen et al., 2015; Elliott and Gregory, 2015b). All species confirmed to lack TEs, and thus to have empty matrices, were removed, and in all instances where multiple unknown superfamilies of the same category (e.g., unknown DNA transposon = 2 instances) were present, these were set to one instance. The Jaccard dissimilarity index, appropriate for presence/absence matrices, was used for computing distances between TE communities.
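The two preprocessing steps and the distance measure can be illustrated with a short Python sketch. The analysis itself was run in R with vegan; the vectors below are hypothetical and the function names are illustrative only.

```python
def binarize(counts):
    """Collapse counts to presence/absence (e.g. two 'unknown' entries become one)."""
    return [1 if c > 0 else 0 for c in counts]


def jaccard_dissimilarity(a, b):
    """1 - |shared categories| / |categories present in either community|."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        # Undefined for two empty communities, one reason TE-free species
        # have to be dropped before computing distances.
        raise ValueError("both communities are empty")
    return 1 - shared / either


# Hypothetical communities scored over five superfamily categories.
genome_a = binarize([1, 0, 2, 1, 0])   # the count of 2 is set to one instance
genome_b = binarize([1, 1, 0, 1, 0])
distance = jaccard_dissimilarity(genome_a, genome_b)
```

Here two categories are shared and four are present in at least one genome, giving a dissimilarity of 0.5. Joint absences do not contribute, which is what makes the index suitable for presence/absence data.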

Follow-up analyses and testing predictions

The availability of new genome sequences not available to Elliott and Gregory (2015a, 2015b) provides an opportunity to test how representative and predictive the previous results would be for new genome data. A follow-up data set was assembled, which included 180 new genomes that had been published after collection of the original data sets, as well as 70 genomes which were already published but had not been included in the analysis found in either paper (Elliott and Gregory, 2015a; 2015b). The parameters gene number, discrepancy between estimated and assembly genome size, and TE richness were collected from these new genomes. Gene number was available from 76 animal, 37 land plant, 55 fungal and 12 protist genomes; discrepancy data from 27 animal, 27 land plant, and 1 fungal genome; and 91 animal, 46 land plant, 97 fungal and 16 protist genomes were analyzed for new TE richness data.

Statistical analyses

Before gene number, discrepancy and TE richness were collected from these genomes, cytological estimates of genome size were collected using previously mentioned methods to see whether or not genome size is a useful predictive factor for each of the parameters. The shape of the data for these parameters from the original data set was used to make predictions about the expected values for the new data set. Local regression (Loess curve fitting) was used on subsets of the original data to find functions that best represented those complex distributions, such as genome size and TE richness in animals. In most cases, the Loess curve was not substantially more useful than fitting a regular regression line, so to avoid over-fitting in those cases, a general linear regression line was fitted. Cytological values for genome size from the new data set were log-transformed (base 10) and used in combination with the function generated from either Loess curve or linear regression fitting to generate expected values for gene number, discrepancy and TE richness in animals, land plants and fungi.
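The linear-regression branch of this procedure can be sketched as follows. This is an illustrative Python version rather than the R code used for the analyses: the training values are made up, the function names are hypothetical, and the guard against extrapolating beyond the original range of genome sizes is included as an assumption about how out-of-range cases might be handled.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(genome_size_mbp, model, log_range):
    """Expected value for a new genome, or None outside the fitted range."""
    x = math.log10(genome_size_mbp)
    low, high = log_range
    if not low <= x <= high:
        return None  # no extrapolation beyond the original data
    slope, intercept = model
    return slope * x + intercept


# Hypothetical original data: genome sizes (Mbp) and a log-scaled response.
sizes_mbp = [10.0, 100.0, 1000.0]
response = [1.0, 2.0, 3.0]
log_sizes = [math.log10(s) for s in sizes_mbp]
model = fit_line(log_sizes, response)
expected = predict(316.0, model, (min(log_sizes), max(log_sizes)))
out_of_range = predict(10000.0, model, (min(log_sizes), max(log_sizes)))
```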

Cross-validation was performed on all linear models to ensure they provided robust predictions based on the given data set. This method involves dividing the data set into K subsets, systematically leaving out one subset of observations at a time and recalculating the linear model, which is then used to predict the left-out values given the known genome size values. Values of 6, 20 and the number of observations in a given data set were used for K. The output of cross-validation is a raw and adjusted delta value, representing the estimates of prediction error without and with the cross-validation method. The magnitude of these prediction errors (in the units of the predicted variables) and how different they are from one another represents how predictive the model is for the given data set. The standard error was reported for Loess-fitted lines. Predicted values generated for the new data set were then regressed against the actual values, and linear models were fitted to these distributions to provide another means of determining how predictive the original data set was for new data. In some cases, where new genome size estimates fell outside the range of the original data set, values for genomic parameters could not be predicted. All curve fitting, regressions, and leave-one-out cross validations were performed in R (v. 3.2.1) using native functions, as well as the package boot (R Core Team, 2015; Canty and Ripley, 2015). The new data set for both genome parameters and TE richness can be found in Appendix 6 as a CSV file. Papers used for this data set can be found in the Genome data reference list.
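The leave-one-out case (K equal to the number of observations) can be sketched in Python as below. This is a simplified illustration, not the boot routine used for the analyses: it assumes a squared-error cost, returns only a single raw estimate of prediction error, and does not attempt to reproduce the adjusted delta. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_error(xs, ys):
    """Mean squared error when each point is predicted from all the others."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predicted = slope * xs[i] + intercept
        squared_errors.append((ys[i] - predicted) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical log10 genome sizes and a log-scaled response variable.
log_sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
noisy_response = [2.1, 3.9, 6.2, 7.8, 10.1]
error = loocv_error(log_sizes, noisy_response)
```

A model that generalizes well gives a cross-validated error close to its ordinary residual error; a large gap between the two suggests over-fitting.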

Results

Baseline analyses of genomic parameters and determining their general patterns

Genomic parameters

Data for 22 parameters was assembled but only a small portion of possible relationships were examined. Summary statistics for some parameters can be found in Table 3.1, and details about correlations, phylogenetic corrections, correlations with branch length transformations, and adjusted α values can be found in Table 3.2. It was of interest to see how well the genome size as measured by the assembly of scaffolds from the sequencing project compared to reported values for genome size from traditional cytological methods, such as Feulgen densitometry and flow cytometry. A strong, positive relationship (Table 3.1; Figure 3.2) was found between these two parameters in animals, fungi and land plants. Overall there was close agreement between traditional and sequence-based methods. Anecdotally, it has been noticed that assembly sizes are often smaller than sizes reported by cytological methods. To address this, the discrepancy between cytological and assembly methods was compared to the cytological size to determine if discrepancy increased with genome size. Discrepancy was positively correlated with cytological genome size in animals (Figure 3.3) and land plants (Figure 3.4); however, there was not enough data for fungi. There were 29 instances where the assembly size was larger than the cytological size, and the remaining 194 were instances where the cytological size was larger than the assembly size. Of these 223 instances of discrepancy, ~75% represented discrepancies of greater than 10% difference in size between the two estimates (Elliott and Gregory, 2015a). It's possible that sequencing coverage could explain these discrepancies, with lower coverage genomes displaying a larger discrepancy. In the instances of 10% or greater discrepancy, this was not found to be the case: coverage and the size of the discrepancy were not related.
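To make the discrepancy measure concrete, the short Python sketch below expresses the difference between the two estimates as a fraction of the cytological size and flags cases exceeding 10%. Scaling by the cytological estimate is an assumption made for illustration, and the species labels and sizes are hypothetical.

```python
def relative_discrepancy(cytological_mbp, assembly_mbp):
    """Signed discrepancy as a fraction of the cytological estimate.

    Positive values mean the assembly is smaller than the cytological size.
    """
    return (cytological_mbp - assembly_mbp) / cytological_mbp


# Hypothetical (cytological, assembly) genome sizes in Mbp.
estimates = {
    "species_a": (1000.0, 850.0),   # assembly 15% smaller
    "species_b": (200.0, 195.0),    # assembly 2.5% smaller
    "species_c": (50.0, 56.0),      # assembly 12% larger
}

large = [
    name
    for name, (cyto, assembly) in estimates.items()
    if abs(relative_discrepancy(cyto, assembly)) > 0.10
]
```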

The number of genes was positively correlated with cytological genome size in animals (Figure 3.5), fungi (Figure 3.6) and land plants (Figure 3.7). In contrast, the proportion of the genome composed of genes displayed a strong negative relationship (Figure 3.8) with cytological genome size in all taxa. Although gene number increased with genome size, the proportion of the genome it accounts for did the opposite. The proportion of the genome composed of introns was strongly negatively correlated with genome size in land plants (Figure 3.9), although this relationship appears to be driven by two large genome outliers (Zea mays and Picea abies). There was no significant relationship between the proportion of the genome composed of introns and cytological genome size in either of the remaining two major taxonomic groups.

Although GC percentage has been a genomic parameter of interest, there was no significant relationship between it and cytological genome size in animals, fungi, or land plants. Chromosome number varied from 3 in some animals to 84 in the sea lamprey Petromyzon marinus, and displayed a positive relationship with genome size in animals, but this was non-significant when incorporating branch length in the analysis. There was a very weak, non-significant relationship between these two variables in both fungi and land plants.

The proportion of the genome composed of repeats and TEs was positively correlated with cytological genome size in animals (Figure 3.10a and 3.10b), and more strongly so in fungi (Figure 3.11a and 3.11b). These relationships remained after estimating branch lengths for fungi, but were no longer significant in animals. No significant relationship was found in land plants in either case. On average, the percentage of the genome composed of repeats was 14% in fungi (range 0.01%-57%), 27% in animals (3%-73%), and 51% in land plants (3%-87%).

TE richness

Although the data set consisted of 541 genomes, TE content was reported at the superfamily level in only 256 (47.32%) of these. This included 75 animal genomes, 80 land plants, 77 fungi and 25 protists, of which 45 were BAC-end, fosmid and survey sequences. TE richness was not significantly correlated with genome size in animals as a whole, nor in vertebrates alone (Table 3.2). TE richness in land plants (Figure 3.12) was negatively correlated with genome size, but in fungi it was positively correlated (Figure 3.13). Examination of the overall distribution of TE richness and genome size (Figure 3.14) showed a pattern of low richness values at small genome sizes, an expansion of variance as genome size increased to approximately 500 Mbp, and then a decrease in variance at genome sizes larger than that. This same distribution was found when DNA transposons alone (Figure 3.15a) and retrotransposons alone (Figure 3.15b) were plotted against genome size. Consequently, there is a positive relationship between TE richness and genome size in genomes less than 500 Mbp (Figure 3.16), but no significant relationship in genomes larger than 500 Mbp.

The dbRDA sought to determine whether the factors of genome size and phylogeny had any power to explain the composition of the TE community, rather than just the richness of the community. The analysis on the original Elliott and Gregory (2015b) data set showed that genome size explained <1% of the variation in TE community composition, and phylogeny only 3.89%, albeit both significantly (p<0.001). Even after estimating branch lengths, the amount of variation explained by the phylogeny did not exceed ~5% (p<0.001).

Follow-up analyses and testing predictions

Using the data distributions from the Appendix 2 data set and the new genome size values collected, predictions were made about gene number, discrepancy between assembly and cytological size, and TE richness. The linear model fitted to the original animal gene number and genome size data set showed excellent predictive power (Table 3.4). When predicted and observed values were plotted, I observed a weak, significant, positive relationship (Figure 3.17).

The linear model fitted to the original fungal gene number and genome size data set showed excellent predictive power and a strong, significant, positive relationship (Figure 3.18) between predicted and observed values. The linear model fitted to the original land plant gene number and genome size data set showed excellent predictive power as well but a weak, significant, positive relationship (Figure 3.19) between predicted and observed values.

The linear model fitted to the original discrepancy and genome size data set showed excellent predictive power, and a significant positive relationship (Figure 3.20) between predicted and observed values was found. Overall, the positive relationship between cytological estimate and genome size discrepancy is still found in the new data, with assembly size being smaller than cytological size in the majority of cases (n=47 of 53).

The Loess model fitted to the original animal TE richness and genome size data set showed weak predictive power (SE= ±8.03 superfamilies). Considering the average TE richness of an animal genome is 18.3 superfamilies, the predictive model is on average 43.8% different from observed TE richness. Consequently, the relationship between predicted and observed values in the new data set was weak and non-significant. There was not enough data to regress predicted versus observed values of TE richness for fungi or land plants with any statistical robustness. The new data set included 91 animal genomes (19 useable), 46 land plant genomes (3 useable), 16 protist genomes (1 useable) and 97 fungal genomes (6 useable), for a total of 250 new genomes. Only 11.6% (29) of the 250 new genomes had TE content that could be adequately assigned to superfamilies. This represents a much smaller proportion than in the original data set from Elliott and Gregory (2015b). Only the relationship between genome size and animal genome TE richness could be analyzed due to small sample sizes in the other taxon groups. Again, as with the original data set, this relationship was weak and not significant.

Lack of reporting of TE information

Of the 791 genomes in both the newly collected and older data set from Elliott and Gregory (2015b), TE content was not reported at the superfamily level in a combined 63.9% of genomes (506/791). The lack of resolution down to this level took many forms. In some cases, TE content was neither reported nor discussed at all (209/791), as was often the case for genome sequences that were short reports lacking any in-depth information (57/791). Some genomes included a mixture of superfamily-level descriptions and order-level descriptions, the latter being less informative categories such as LTR, LINE, SINE or ITR DNA transposon (107/791). Order-only descriptions were also prevalent (80/791). In some cases, the way information was presented or particular circumstances for a given genome strongly indicated that information for the full TE catalog was not available (60/791). This included instances in ciliates where the macronuclear genome was sequenced; the full TE catalog, which would likely only be present in the inherited micronuclear genome, was therefore not available. Other genomes could not be included because the work focused more intensely on a few types of elements (11/791). Class-level descriptions of TE content (i.e., DNA transposon, retrotransposon) made up a small minority of cases (3/791).

Occasionally, what appeared to be raw RepeatMasker tables were presented (9/791), which included hits to all known element superfamilies from Repbase, such as in the Rhizophagus irregularis genome. In 12 cases, superfamilies were lumped together, making it difficult or impossible to assess actual TE content. This included combining Helitrons with LTR elements in Camelina sativa, Ty3/Gypsy elements with DIRS elements in Symbiodinium kawagutii, and combining all DNA transposons in Agaricus bisporus. This cataloguing of TE content also revealed at least three probable cases of human DNA contamination, indicated by the overabundance of human and mammal-specific families and superfamilies found in the genomes of several fungal species.

Discussion

Genome parameters

The compilation and analysis of genomic information from 502 genomes showed that the size of the genome as reported from a sequence assembly, and that estimated from more traditional cytological methods, are very strongly positively correlated. This also revealed that significant discrepancies between these two estimates are not uncommon, and that the size of these discrepancies increases with genome size. Although coverage of the genome was a possible explanatory variable for this disconnect, coverage had no relationship with the size of the discrepancy. Furthermore, the size as determined by sequence assembly is smaller than that determined by cytological methods in the majority of instances. Several things could explain this phenomenon. Cytological estimates, using flow cytometry and Feulgen image analysis, could inherently over-estimate genome size. This possibility would be hard to evaluate currently, but theoretically could be tested if an artificial genome of a known size could be constructed and then subjected to cytological size quantification. Efforts are currently underway to re-engineer the S. cerevisiae genome by removing introns and TEs and adding sites to rearrange the genome, to create a minimized, customizable yeast genome (Annaluru et al., 2014). With a platform such as this, sequences of a known length could be added to a genome to create a known sequence-based size which could then be measured independently. Alternatively, using genome size reported by sequence assembly could inherently underestimate genome size relative to cytological methods by lacking particularly hard-to-sequence repeats in the final assembly.

As found by others (Kidwell and Lisch, 2000; Biémont, 2010; Chalopin et al., 2015), repeats and TEs display a positive relationship with genome size, demonstrating their role as a driver of genome size change along with polyploidy. Consequently, the proportion of the genome composed of genes declines with genome size. However, in land plants, neither TEs nor repeats were found to be significantly correlated with genome size, nor were they in animals after branch lengths and testing for multiple hypotheses were taken into account. If the expansion of TEs and other repeats is thought to be the causative agent behind genome size change, why isn't this signal stronger? The answer might lie in the strong relationship between genome size and the discrepancy between how large a genome is and how much of it can be assembled properly. Repeats are the most difficult part of the genome to assemble (Treangen and Salzberg, 2012), which suggests that the discrepancy is due to unassembled repeats in larger genomes. A small analysis comparing cytological genome size and repeat proportion of the genome as determined from non-sequencing methods (e.g., reassociation kinetics) showed a much stronger relationship (r= 0.8724, p<0.001, n= 10) even after correcting for phylogeny (data taken from: Britten and Davidson, 1971; Bigot et al., 1991; Peterson et al., 2002; Liu et al., 2011). As well, a re-annotation of repeats in the chicken genome using methods which find repeats in a de novo manner rather than using a reference library suggested an additional ~20% of the genome was repetitive, bringing it much closer to pre-sequencing estimates of repetitive content for this species (Guizard et al., 2016). This suggests that the missing signal of the repetitive proportion driving genome size change is due to the discrepancy between the assembly and cytological size, and incomplete repeat annotation.

There was a positive relationship between gene number and genome size, as had been demonstrated previously (Hou and Lin, 2009; Friar et al., 2012), but now with a larger data set incorporating phylogenetic corrections. This is consistent with the Lynchian paradigm of genome evolution, namely that large genomes in eukaryotes are thought to be the result of relaxed selection due to a reduced effective population size (Lynch and Conery, 2003; Lynch, 2007). While this explanatory framework is typically applied to introns and TE insertions, Lynch and Force (2000) suggested that rates of gene duplicate retention were higher in organisms with effective population sizes smaller than 10^5. This might occur through the sub-functionalization of gene duplicates, whereby original and duplicate gene copies incur mutations that are complemented by the other member, thereby increasing gene number but still maintaining the function of the original, single gene. It should be remembered that while this is a tempting, simple explanation for the evolution of genome content, the relationship between genome size and effective population size has been found to disappear after phylogenetic correction (Whitney and Garland, 2010). Furthermore, the taxonomic scale at which the relationship is meant to apply remains nebulous (Daubin and Moran, 2004; Gregory and Witt, 2008). Still, the idea that effective population size drives genome evolution is, at the very least, a useful null hypothesis against which to test alternatives.

No significant relationship was found between genome size and intron number, with the exception of a negative relationship in land plants, albeit with a relatively small sample size (n=20). Nor was there a significant relationship with GC percentage. Šmarda et al. (2014) found a quadratic relationship between GC content and genome size in over 200 species of monocot land plants. The land plant data set I collected was less than half the size of this and was not restricted to monocots (n=16); it is therefore possible that there was not enough statistical power to detect this relationship.

Predictive value of genome size

One goal of this paper was to determine if the data set from Elliott and Gregory (2015a) could be used to predict genomic parameter values given just genome size information for new genomes. Internal cross-validation of gene number and discrepancy data sets showed excellent internal predictive power, and when predicted and observed gene numbers were compared, positive relationships were found in all three taxon groups, although the strongest relationship was found in fungi. The weaker relationships in animals and land plants might reflect their larger genome size range, as well as more within-group taxon-specific patterns that cannot be represented when combining all animals or land plants into their respective data sets. While smaller data subsets could have been used (e.g., mammals, grasses, etc.), this creates the risk of less statistical robustness due to smaller sample sizes. This can only be remedied by more data.

The strongest relationship was found between predicted and observed genome size discrepancy, suggesting that based on cytological genome size alone the proportion of the genome that will not be resolved through sequencing can be estimated prior to sequencing. It would be useful if future sequencing projects tested this by producing new cytological estimates for a species and then compared that value to the assembly size.

TE richness

The diversity of TEs was assessed in 541 genomes, resulting in 256 estimates of TE richness at the superfamily level. These values were compared to estimates of genome size to determine what, if any, relationship might exist between the two, as suggested by researchers over 10 years ago (Volff et al., 2003; Gregory, 2005). Fungi and land plants displayed opposite relationships, the former being positive and the latter being negative, while animals showed no significant relationship at all. Examination of the overall data set showed a humped distribution between log genome size and TE richness, with an inflection point around 500 Mbp. Consequently, genomes smaller than 500 Mbp showed a positive relationship between genome size and TE richness, although genomes larger than 500 Mbp did not show a negative relationship. Of the 256 genomes, 45 were survey sequences where only a small percentage of the genome was sequenced. It's possible that some TE superfamilies may not have been captured in this subset of the data, so both coverage and average read length were compared to TE richness. No relationship was found in either case. I investigated whether or not the repeat identifying methodology used affected the number of TE superfamilies detected. Repeat identification can be broadly divided into methods that rely on a library of known repeats and identify repeats in a genome using homology, and those that identify repeats in a de novo fashion using a variety of other methods, such as aligning the genome to itself and identifying repeated sequences. The distributions for TE richness values between homology-only and combined (homology and de novo methods) genomes were not different (see Elliott and Gregory, 2015b for more details). Regarding this last point, it should be noted that a recent re-annotation of the same mammal genomes using homology and de novo methods separately did show the shortcomings of homology-only procedures in producing a complete TE catalog (Platt et al., 2016).

Contrary to prior thoughts, the relationship between TE richness and genome size is not as simple as a monotonic decrease in richness as genome size increases. Also, rather than a relationship which tightly follows a curve, it is instead more a contraction of variance in richness at very small and very large genomes, with an expansion around 500 Mbp. In a simulation study by Kijima and Innan (2013), TE proliferation, mutations, activity and simulated neighbour-joining trees of TEs were tracked over time while variables such as the strength of natural selection against insertions were modulated. They showed that when natural selection against insertions is weak, copy number increased, as expected, but the proportion of transpositionally active TEs within a genome decreased. The opposite occurred with strong selection, resulting in a decrease in copy number but an increased proportion of active elements. This mirrors the state in some genomes; for instance, the relatively large human genome is dominated by millions of copies of retrotransposons, yet only a few thousand of these might be active (Mills et al., 2007). Conversely, smaller genomes such as Takifugu rubripes or Drosophila melanogaster have TE copy numbers in the thousands, but a great deal of these are thought to be active (Kofler et al., 2015; Chalopin et al., 2015). While not directly related to TE richness, this simulation might provide some help to explain it. If this relationship between genome size, TE copy number and TE activity plays out over much longer time-scales, perhaps the attrition of entire superfamilies occurs as genome size and TE copy number increase, affecting not just the proportion of active elements within a family but active families and superfamilies themselves, which then are rendered harder to detect through rearrangements and mutation. However, in the smallest genomes we do not see a small, highly diverse TE community, but rather a handful of superfamilies. Is there a limit to the number of superfamilies that can be supported in a smaller genome? If so, why? It would also be interesting to determine if the predictions of Kijima and Innan (2013) hold true across the diversity of both genome size and TE richness. For a given small range of genome size values there might be vastly different TE richness values, so will all families in those genomes display the same or similar ratios between active and inactive elements commensurate with the size of the genome they inhabit?

The TE richness-genome size distribution might also be showing susceptibility to horizontal transfer (HT) of TEs, with the richest genomes being those that can most easily acquire new TEs (Blumenstiel, pers. comm.). Parasites and viruses have been implicated as TE vectors, so knowledge of the parasite pool for various species might be useful in explaining the TE richness distribution. The tendency to acquire new TEs might also be related to factors such as native range size, or ability to disperse, which would be interesting factors to test for their explanatory power (Schaack et al., 2010). The inclusion of new data for TE richness did little to expand the data set, as close to 90% of new genomes did not report TE content down to the superfamily level. This meant that using the previous data to predict TE richness for new genome size values was also difficult, and was only possible in animals. Even so, this analysis showed that predictions poorly matched observed values, which, given the high variance in TE richness found in the original animal genomes data set, should not be surprising. It is doubtful that richness would be easily predictable from genome size alone, except perhaps in the largest and smallest genomes where variance in richness contracts.

TE community composition

A dbRDA was used to explain the composition of the TE communities within genomes, rather than just the number of superfamilies. This analysis showed that neither the phylogeny of host organisms nor genome size was able to explain most of the variation in TE community composition, and in fact together they explained only about 5%. Assuming that just genome size and host phylogeny influence TE communities is obviously naive. Other factors could be included in an RDA, such as the breeding system of the organisms in question, mode of reproduction (sexual or asexual), their demographic history, or the particulars of their TE silencing machinery, although these may in fact be more informative about TE abundance than about community composition. TEs themselves are not independent of one another, sharing various degrees of common ancestry depending on the element lineage, which would need to be taken into account (Llorens et al., 2009; Kapitonov et al., 2009; Yuan and Wessler, 2011).

If the vertical inheritance of TE superfamilies appears to explain little of TE community variation, what about the HT of TEs? HT has become a hot topic in TE biology within the last 10 years (Schaack et al., 2010; Wallau et al., 2012). Despite cataloguing more and more instances of HT, we suffer from a data shortage on the basic parameters of the phenomenon itself. Reports of HT events in TEs are heavily biased toward one genus, Drosophila, which accounts for 41.2% of suspected HT events (Dotto et al., 2015). Given that host phylogeny does not seem to explain TE communities very well, might we be looking at a very coarse signal of horizontal transfer? Perhaps, but the superfamily level is not the best at which to judge whether HT has occurred. Family-level assignment would be necessary, as this is the level most commonly used for HT detection, and even this would be limited to the families found by the repeat identifying process as opposed to de novo ones. This would be best determined through sequence analysis-based methods for detecting HT and associated tools, of which several now exist (Wallau et al., 2012; Modolo et al., 2014; Wallau et al., 2016). An unanswered question in TE biology is the extent to which HT occurs, and the availability of genomic resources has made this easier to answer, at least within the confines of what has already been sequenced. If genomes with the highest TE richness also show evidence of extensive HT, it would suggest that HT is a driving force in shaping TE communities.

It should not be forgotten that an important, and little considered, factor influencing the TE community in genomes would be interactions between TEs themselves. The most obvious include interactions between autonomous and non-autonomous elements, but also between phylogenetically unrelated elements, such as competition between rDNA-inserting LINEs in arthropods, and MITEs preferentially inserting into LINEs in Xenopus (Christensen et al., 2000; Ye et al., 2005). This factor might be the least studied of all and deserves investigation. Analyses such as those by Saylor et al. (2013) show that the TE community can exhibit a spatial pattern along chromosomes analogous to that of species along transects in an environment. By combining this or similar data with rough estimates of TE age it might be possible to infer how TEs interact with one another, synergistically or antagonistically, over time and how these interactions may have changed. Over short time scales, these interactions would help to determine the abundance of various TEs in the same way they shape the abundance of species in conventional habitats. These short-term interactions would in turn partially determine longer-term interactions, over which evolution can also occur and influence the interactions themselves. Both are important, but the magnitude of each remains to be investigated.

The current analysis did not differentiate between evolutionary and ecological influences on TE communities; ecological in this sense referring to the ecological proxy forces within the genome at the TE level, as opposed to ecological forces at the organism level (Linquist et al., 2013). The effect of ecological proxies would be better studied in a more phylogenetically constrained group of host organisms, such as drosophilids or primates, which would arguably be most similar to traditional community ecology sampling. The large-scale analysis conducted here is more akin to a global ecosystem survey across disparate environments, such as islands, deserts, jungles and the ocean floor. Global surveys of biodiversity tend to show an increased species richness at the equator and a decrease as one moves closer to the poles, which has been attributed to various ecological and evolutionary forces, such as species diversification rates, geographical area and the ability of those species to adapt to cold environments (Rosenzweig, 1992; Hillebrand, 2004; Rolland et al., 2014). A large-scale pattern was also found within genomes, and this is likely the result of multiple forces interacting.

TEs are often referred to as the parasites of the genome, so it would be of interest to see how parasite species richness is distributed in nature. A recent meta-analysis of 62 studies of parasite species richness in fungal, plant and animal hosts showed that parasites did not assort along a latitudinal gradient (Kamiya et al., 2014). Rather, the factors of host body size, geographical range size and population density were determinants of parasite richness across hosts of all taxonomic groups. What these theoretically have in common is that they could affect the frequency of successful parasite transmission from one individual host to another, whether intra- or interspecifically. Whether or not these same factors, or analogs thereof, increase the rates of HT of TEs and result in more TE-rich genomes deserves further investigation.

How TE information is reported

Remarkably, 52.5% of the 506 genomes where TE data was searched for could not be used because adequate information on their TE content was not present. Of these 506 genome papers, 41.3% neither discussed nor reported TE content at all. And while gene number information was reported for over 94% of the 520 genomes from Elliott and Gregory (2015a), the proportion of the genome composed of TEs and repeats was only reported for 51% and 53% of these, respectively. When the way TE reporting was handled across the combined 791 genome data set was examined, 63.9% of genomes could not have their TE catalogs assigned to the superfamily level. This included the 23.64% of genomes in which TEs were described at a mixture of the superfamily and ordinal levels. Over a quarter of the combined dataset featured no discussion or reporting of TE content whatsoever.

Unsurprisingly, there are very complete TE catalogs for the genomes of model organisms such as Drosophila melanogaster and our own genome, based on the genome papers themselves, numerous other studies in the primary literature, and in databases. Details about what particular superfamilies they have, their abundances and copy numbers are not difficult to find. Even so, TE richness values for both of these species are lower than the average value for animals. A mixture of superfamily/ordinal level reporting, or no reporting at all, makes up the bulk of the TE content landscape. A small number of cases feature non-ideal TE reports that non-experts would not be able to distinguish from a full TE catalog. These took the form of raw RepeatMasker tables, which appear to show a given genome possessing hits to all known TE superfamilies, such as in Rhizophagus irregularis and Flammulina velutipes. The latter species, with a genome size of 35.6 Mbp, seems unlikely to harbor over 60 superfamilies, given the known TE richness and genome size relationship in other fungi. Interestingly, R. irregularis has a genome size larger than anything included in the analysis of Elliott and Gregory (2015b), at 141 Mbp, and given the strong, positive relationship in fungi, it is possible this species does have one of the highest TE richness values in sequenced fungi. Perhaps the least useful instance of TE reporting comes from the Apis cerana genome, where a table is given with human TE entries, but with no hits to these human-specific elements. The TEs actually present in this organism are relegated to unclassified and ordinal-level, non-descriptive categories such as LTR elements and DNA transposons.


Why this variation in the detail given to TE and repeat catalogs? One reason could be the difficulty of navigating the plethora of tools available for repeat identification and annotation (Janicki et al., 2011). Proper TE identification and annotation requires researchers to be knowledgeable in the field, with the time and resources to devote to sifting through the sometimes hundreds to thousands of Mbp of repeats (Bao et al., 2015; Platt et al., 2016). Given the comparatively small TE research community, and the still smaller number of people with the expertise to annotate, there are limits to what can reasonably be done currently by TE biologists.

It seems likely that this problem of the ‘black box’ genome will only escalate. Large collaborative projects are underway to sequence thousands of animal (1000 Plants and Animals; Haussler et al., 2009; Evans et al., 2013; GIGA, 2014), plant (1000 Plants; Goff et al., 2011), and fungal genomes (1000 Fungal Genomes), which will possibly amplify the problem.

The implications of this paucity of readily available comprehensive TE data will be discussed more in Chapter 4.

Concluding remarks

This large survey of the sequencing literature revealed that large-scale relationships between genome size and some genomic parameters are widespread throughout the eukaryotic tree, and remain significant even when using the most basic method of taking shared common ancestry into account. In some cases, the genome sizes that have been generated using cytological methods in use for decades can be useful in predicting what will be found in the virgin territory of soon-to-be sequenced genomes. The reporting of TE content lags behind that of other genomic parameters, and flipping a coin will be just as useful for predicting whether or not a new genome sequence will have a suitably detailed TE catalog. What information is available shows that the relationship between the number and type of TE superfamilies and genome size is more complex than a simple linear relationship, in contrast to other genomic parameters.

The results of the current study have suggested that while TE information is often lacking, the information that has been gathered does point to interesting areas of further research.

The distribution of TE richness values across the range of genome size requires explanation, and a host of explanatory variables are likely to be responsible. These include properties of the host organisms, such as breeding system, demographic history and effective population size. The RDA also suggested that host factors cannot be the only ones considered, as element properties and perhaps interactions between elements might also inform how a TE community is structured. Although not considered in this work, the impact of the activity composition of the TE population on this distribution also deserves further work; for example, whether or not all large genomes are composed of a few, high-copy number superfamilies with a small number of active elements.

Now that we have a broad idea of what data are available, and what broad patterns exist in genomes, how can we understand this information? As we learned in Chapter 2, despite the publication of the selfish DNA papers, the organism-centric view of TEs persisted and is arguably the dominant paradigm in genome and TE biology. The selfish DNA papers laid the groundwork for understanding TEs as one level in a multi-level genome, and subsequent papers introduced ideas about TEs as populations of entities struggling for survival within the environment of the genome. While a TE-centric perspective shouldn’t replace the organism-centric one, a better balance must be struck between these two levels in trying to understand the patterns and evolution of TEs. In the final chapter of this thesis, I will attempt to lay the foundations for what an element-level perspective would look like, and what research questions could be asked within this framework.


Tables and figures

Table 3.1. Summary of genome data included in Elliott and Gregory (2015b).

Values are given as mean ± error, with sample size (n) in parentheses.

Parameter | Animals | Land Plants | Fungi | Protists

Genome size
Average Assembled Genome Size (Mbp) | 1,153.98 ± 100.18 (n=149) | 1,065.82 ± 176.86 (n=83) | 34.98 ± 1.74 (n=218) | 61.11 ± 9.76 (n=70)
Cytological Genome Size (Mbp) | 1,294.71 ± 110.65 (n=149) | 1,498.43 ± 232.52 (n=83) | 35.71 ± 1.92 (n=218) | 75.51 ± 21.49 (n=70)
Discrepancy (Mbp) | 165.107 ± 25.73 (n=127) | 448.83 ± 206.26 (n=80) | 26.38 ± 12.34 (n=6) | 100.82 ± 87.56 (n=10)
Average Genome Size for Each Genome Size Database (Mbp) | 4,176.06 ± 112.36 | 6,120.79 ± 107.58 | 65.89 ± 5.31 | NA
Maximum Database Genome Size (Mbp) | 129,907.74 (n=5635) | 148,852 (n=8257) | 5,800 (n=1932) | NA

Gene content
Number of protein-coding genes | 18,943 ± 451.82 (n=139) | 35,577 ± 1641.08 (n=80) | 9,953 ± 315.16 (n=202) | 12,589 ± 1148.69 (n=70)
Average amount of coding DNA (Mbp) | 27.58 ± 1.26 (n=90) | 39.23 ± 1.81 (n=64) | 13.059 ± 0.56 (n=97) | 18.55 ± 2.16 (n=49)
Coding % of cytological GS | 10.4% ± 1.12% (n=90) | 7.86% ± 0.87% (n=64) | 46.66% ± 1.62% (n=97) | 42.31 ± 3.23 (n=49)
Average exon length (bp) | 218.8 ± 9.28 (n=70) | 256.35 ± 6.06 (n=55) | 498.77 ± 41.21 (n=72) | 600.05 ± 53.37 (n=40)
Amount of exonic DNA per gene (bp) | 1,489.54 ± 35.65 (n=91) | 1,159.98 ± 27.49 (n=63) | 1,392.89 ± 24.72 (n=89) | 1497.27 ± 54.49 (n=49)
Average intron length (bp) | 2,172.5 ± 255.34 (n=72) | 430.091 ± 28.08 (n=50) | 133.34 ± 6.87 (n=80) | 204.37 ± 20.44 (n=44)
Average number of introns per gene | 5.05 ± 0.47 (n=26) | 3.94 ± 0.38 (n=20) | 1.72 ± 0.24 (n=50) | 2.61 ± 0.66 (n=27)
Average amount of intronic DNA per gene (bp) | 8191.5 ± 2033.71 (n=25) | 1804.45 ± 287.43 (n=18) | 201.23 ± 25.23 (n=38) | 1047.99 ± 472.46 (n=19)
Average gene region size (introns + exons)/gene (bp) | 9533.11 ± 2050.22 (n=25) | 2956.72 ± 302.28 (n=18) | 1655.47 ± 48.73 (n=30) | 2487.97 ± 552.27 (n=17)

Repetitive content
Repeats as % of assembly GS | 27.35% ± 1.83% (n=102) | 50.6% ± 3% (n=54) | 14.38% ± 1.75% (n=92) | 19.45 ± 3.98 (n=26)
Average amount of repetitive DNA per species (Mbp) | 459.41 ± 69.96 (n=102) | 946.23 ± 202.51 (n=54) | 8.81 ± 1.76 (n=92) | 24.88 ± 7.56 (n=26)
TEs as % of assembly GS | 23% ± 1.85% (n=100) | 38.88% ± 2.44% (n=61) | 13.59% ± 2.75% (n=74) | 13.84 ± 3.19 (n=28)

Base pair composition
GC % | 37.68% ± 0.55% (n=76) | 36% ± 0.62% (n=31) | 45.73% ± 0.58% (n=161) | 47.24% ± 1.76% (n=56)


Table 3.2. Average values for several genomic parameters from each taxonomic group, as determined by data from Elliott and Gregory (2015a; 2015b).

Taxon Group | Average Cytological Genome Size | TE Content | Average Superfamily Count | Common Superfamilies (% of genomes in analysis where present)

Animals | 1294 Mbp | 23% | 18 ± 9 | hAT and Tc1/Mariner (88%), CR1/L3 (78.76%), Ty3/Gypsy (76%), L1 (68%), Helitron (60%), RTE (58.67%), PiggyBac (56%), L2 (54.67%), PIF/Harbinger + ISL2EU (49.3%), Ty1/Copia (48%), Penelope (45.33%), SINE2 tRNA (46.67%), Mutator + Rehavkus (41.3%), Maverick (40%), ERV1 (37.33%), DIRS (38.67%), P (32%)

Land Plants | 1498 Mbp | 38.88% | 11 ± 3 | Ty3/Gypsy and Ty1/Copia (100%), CMC (95.06%), Mutator + Rehavkus (90.12%), hAT (88.88%), PIF/Harbinger + ISL2EU, Helitron (66.25%), Tc1/Mariner (60%), L1 (50%), RTE (18.75%), SINE2 tRNA (16.25%)

Fungi | 35 Mbp | 13.59% | 6 ± 3 | Ty3/Gypsy (87%), Ty1/Copia (77.92%), Tc1/Mariner (68.83%), hAT (40.26%), Helitron (35.06%), Mutator + Rehavkus (32.47%)


Table 3.3. Relationships between genome size and other genomic parameters, as shown by correlation coefficients (r). Branch length transformation statistics represent the average values for the three types of branch length transformation previously discussed. The adjusted α value is the result of the Bonferroni correction for testing multiple hypotheses.

Relationship | Taxon | Statistics | Branch Length Transformation Applied | Adjusted α Value

Cytological Genome Size vs Assembly Genome Size | Animals | r= 0.9208, n= 148, p<0.0001 | r= 0.9136, n= 148, p<0.0001 | 0.002083
Cytological Genome Size vs Assembly Genome Size | Fungi | r= 0.9897, n= 217, p<0.0001 | r= 0.9821, n= 217, p<0.0001 | 0.002083
Cytological Genome Size vs Assembly Genome Size | Land Plants | r= 0.9382, n= 82, p<0.0001 | r= 0.9184, n= 82, p<0.0001 | 0.002083

Cytological Genome Size vs Genome Size Discrepancy | Animals (Figure 3.3) | r= 0.4284, n= 126, p<0.00001 | r= 0.4488, n= 126, p= 1.61E-7 | 0.002083
Cytological Genome Size vs Genome Size Discrepancy | Land Plants (Figure 3.4) | r= 0.6936, n= 79, p<0.00001 | r= 0.6041, n= 79, p= 5.962E-9 | 0.002083

Genome Coverage vs >10% Genome Size Discrepancy | All | r= -0.028, n= 143, p= 0.73 | NA | 0.002083

Cytological Genome Size vs Gene Number | Animals (Figure 3.5) | r= 0.3621, n= 138, p<0.0001 | r= 0.4403, n= 138, p= 5.83E-7 | 0.002083
Cytological Genome Size vs Gene Number | Fungi (Figure 3.6) | r= 0.5868, n= 201, p<0.0001 | r= 0.4436, n= 201, p= 4.42E-9 | 0.002083
Cytological Genome Size vs Gene Number | Land Plants (Figure 3.7) | r= 0.5032, n= 79, p<0.00001 | r= 0.5681, n= 79, p= 6.46E-8 | 0.002083

Cytological Genome Size vs Coding Proportion | Animals | r= -0.6595, n= 89, p<0.00001 | r= -0.8857, n= 89, p<0.0001 | 0.002083
Cytological Genome Size vs Coding Proportion | Fungi | r= -0.7680, n= 96, p<0.0001 | r= -0.8021, n= 96, p<0.0001 | 0.002083
Cytological Genome Size vs Coding Proportion | Land Plants | r= -0.9351, n= 63, p<0.0001 | r= -0.8902, n= 63, p<0.0001 | 0.002083

Cytological Genome Size vs Intron Proportion | Land Plants (Figure 3.9) | r= -0.733, n= 20, p<0.001 | r= -0.7857, n= 20, p= 6.48E-5 | 0.002083

Cytological Genome Size vs Chromosome Number | Animals | r= 0.296, n= 112, p<0.002 | r= 0.1882, n= 112, p= 0.056 | 0.002083

Cytological Genome Size vs Repeat Proportion | Animals (Figure 3.10a) | r= 0.313, n= 101, p<0.002 | r= 0.308, n= 101, p= 0.00219 | 0.002083
Cytological Genome Size vs Repeat Proportion | Fungi (Figure 3.11a) | r= 0.511, n= 91, p<0.00001 | r= 0.5661, n= 91, p<0.00001 | 0.002083

Cytological Genome Size vs TE Proportion | Animals (Figure 3.10b) | r= 0.4628, n= 99, p<0.00001 | r= 0.369, n= 99, p= 0.00323 | 0.002083
Cytological Genome Size vs TE Proportion | Fungi (Figure 3.11b) | r= 0.5204, n= 73, p<0.00001 | r= 0.46, n= 73, p<0.0001 | 0.002083

Cytological Genome Size vs TE Richness | Animals | r= -0.1548, n= 75, p= 0.18 | NA | 0.00625
Cytological Genome Size vs TE Richness | Vertebrates | r= 0.03, n= 34, p= 0.86 | NA | 0.00625
Cytological Genome Size vs TE Richness | Fungi (Figure 3.13) | r= 0.649, n= 76, p<0.0001 | r= 0.677, n= 76, p= 3.02E-11 | 0.00625
Cytological Genome Size vs TE Richness | Land Plants (Figure 3.12) | r= -0.306, n= 79, p<0.006 | r= -0.3683, n= 79, p= 9.44E-4 | 0.00625

Cytological Genome Size vs Genomes <500 Mbp | All (Figure 3.16) | r= 0.4136, n= 150, p<0.00001 | r= 0.4126, n= 150, p= 3.73E-7 | 0.00625

Cytological Genome Size vs Genomes >500 Mbp | All | r= -0.2492, n= 106, p<0.01 | r= -0.275, n= 106, p= 4.28E-3 | 0.00625

Cytological Genome Size vs New Dataset TE Richness | Animals | r= 0.1255, n= 19, p= 0.655 | NA | NA
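
The adjusted α values in Table 3.3 can be illustrated with a minimal sketch of a Bonferroni correction. The family-wise α of 0.05 and the test counts of 24 and 8 are assumptions: they are not stated in the table, but they are consistent with the tabulated thresholds of 0.002083 and 0.00625. The function names are hypothetical and chosen only for this illustration.

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def is_significant(p_value: float, adjusted_alpha: float) -> bool:
    """A test passes only if its p-value falls below the adjusted threshold."""
    return p_value < adjusted_alpha


# Assumed inputs: family-wise alpha of 0.05, with 24 and 8 tests per family.
alpha_main = bonferroni_alpha(0.05, 24)      # close to the tabulated 0.002083
alpha_richness = bonferroni_alpha(0.05, 8)   # the tabulated 0.00625

# p-values taken from the branch length transformation column of Table 3.3.
examples = {
    "chromosome number, animals": (0.056, alpha_main),
    "repeat proportion, animals": (0.00219, alpha_main),
    "TE richness, land plants": (9.44e-4, alpha_richness),
}
for label, (p_value, alpha) in examples.items():
    print(label, is_significant(p_value, alpha))
```

Under these assumptions, a p-value such as 0.00219 would pass an uncorrected 0.05 threshold yet fall just short of the corrected one, which is why the adjusted α is reported alongside each relationship.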


Table 3.4. Statistics for cross-validation and relationships between predicted and observed values (represented by correlation coefficients [r]) for genomic parameters from newly collected genomes. Linear models were fitted to data distributions, data sets were divided into three different numbers of subsets (K), after which each subset of data was left out of the dataset and the linear model was recalculated. Raw and adjusted delta values represent the prediction error of the models with and without data removal and recalculation.

Relationship | Taxon | Cross-Validation Statistics | Predicted vs Observed Values

Cytological Genome Size vs Gene Number | Animals (Figure 3.17) | K= 6, raw delta= 1.025945, adjusted delta= 1.026152; K= 20, raw delta= 1.025971, adjusted delta= 1.025959; K= 139, raw delta= 1.025989, adjusted delta= 1.026154 | r= 0.344, n= 72, p<0.01

Cytological Genome Size vs Gene Number | Fungi (Figure 3.18) | K= 6, raw delta= 1.03086, adjusted delta= 1.030751; K= 20, raw delta= 1.030635, adjusted delta= 1.03061; K= 202, raw delta= 1.030911, adjusted delta= 1.030908 | r= 0.91054, n= 46, p<0.0001

Cytological Genome Size vs Gene Number | Land Plants (Figure 3.19) | K= 6, raw delta= 1.040223, adjusted delta= 1.039751; K= 20, raw delta= 1.038859, adjusted delta= 1.038762; K= 80, raw delta= 1.038317, adjusted delta= 1.038297 | r= 0.5581, n= 32, p<0.0001

Cytological Genome Size vs Genome Size Discrepancy | All (Figure 3.20) | K= 6, raw delta= 1.845953, adjusted delta= 1.843323; K= 20, raw delta= 1.839352, adjusted delta= 1.838774; K= 223, raw delta= 1.83809, adjusted delta= 1.838042 | r= 0.7691, n= 53, p<0.00001
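
The cross-validation procedure summarised in Table 3.4 can be sketched as follows. This is a minimal illustration, not the analysis actually run: it assumes a simple least-squares line, uses synthetic (hypothetical) log10 values in place of the real data, and returns only a pooled mean squared prediction error, which is roughly analogous to a raw delta. The bias-adjusted delta is not reproduced, and the function names are invented for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kfold_mse(xs, ys, k, seed=0):
    """Mean squared prediction error from K-fold cross-validation."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of observations")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k roughly equal subsets
    squared_error = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        # Refit the model with the held-out subset removed.
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Score the refitted model on the observations it never saw.
        squared_error += sum((ys[i] - (intercept + slope * xs[i])) ** 2
                             for i in held_out)
    return squared_error / n


# Hypothetical data standing in for log10 genome size and log10 gene number.
rng = random.Random(1)
log_genome_size = [1.0 + 0.1 * i for i in range(40)]
log_gene_number = [3.5 + 0.3 * x + rng.gauss(0, 0.1) for x in log_genome_size]

# Three values of K, the largest being leave-one-out, as in Table 3.4.
for k in (6, 20, len(log_genome_size)):
    print(k, kfold_mse(log_genome_size, log_gene_number, k))
```

Prediction errors that stay nearly constant across values of K, as they do in Table 3.4, suggest that the fitted model is not sensitive to which subset of genomes is withheld.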


Figure 3.1. Process of compiling comprehensive TE richness data for eukaryote genomes.


Figure 3.2. Log10 cytological genome size and assembled genome size (Mbp) for animals, land plants, fungi and protists.


Figure 3.3. Log10 cytological genome size and discrepancy between cytological and assembled genome size in animals.


Figure 3.4. Log10 cytological genome size and discrepancy between cytological and assembled genome size in land plants.


Figure 3.5. Log10 cytological genome size and gene number in animals.


Figure 3.6. Log10 cytological genome size and gene number in fungi.


Figure 3.7. Log10 cytological genome size and gene number in land plants.


Figure 3.8. Log10 cytological genome size and coding proportion of the genome in animals, fungi, land plants and protists.


Figure 3.9. Log10 cytological genome size and intron proportion of the genome in land plants.


Figure 3.10. Log10 cytological genome size and proportion of the genome composed of (A) repeats and (B) TEs in animals.

Figure 3.11. Log10 cytological genome size and proportion of the genome composed of (A) repeats and (B) TEs in fungi.


Figure 3.12. Log10 cytological genome size and TE richness in land plants.


Figure 3.13. Log10 cytological genome size and TE richness in fungi.


Figure 3.14. Log10 cytological genome size and TE richness in animals, fungi, land plants and protists.


Figure 3.15. Log10 cytological genome size and (A) DNA transposon and (B) retrotransposon richness in animals, fungi, land plants and protists.


Figure 3.16. Log10 cytological genome size and TE richness in animals, fungi, land plants and protists with genomes smaller than 500 Mbp.


Figure 3.17. Log10 predicted and observed gene number from animals, predicted from the cytological genome size values of novel genome sequences. The linear model used was derived from the animal data set from Elliott and Gregory (2015b).


Figure 3.18. Log10 predicted and observed gene number from fungi, predicted from the cytological genome size values of novel genome sequences. The linear model used was derived from the fungi data set from Elliott and Gregory (2015b).


Figure 3.19. Log10 predicted and observed gene number from land plants, predicted from the cytological genome size values of novel genome sequences. The linear model used was derived from the land plant data set from Elliott and Gregory (2015b).


Figure 3.20. Log10 predicted and observed discrepancy between cytological and assembled genome size, predicted from the cytological genome size values of novel genome sequences. The linear model used was derived from the data set from Elliott and Gregory (2015b).


Chapter 4. Where we are going: Transposable element evolution from an element-level perspective and the future of TE research

Abstract

Historically, the study of TEs has been approached from the perspective of the organism that they inhabit, and their behaviour interpreted within the functional context of said host organism. The selfish DNA concept provided a counterargument to this by suggesting that TEs are but one level in the hierarchy of evolution and need not contribute anything to an organism to persist. In this chapter, I expand upon this idea by further developing this element-level perspective to balance out the organism-centric perspective that has been dominant for some time. Others have touched upon this in the form of ideas such as ‘genome ecology’, which suggests that recognizing a level below that of the organism might have explanatory power. I describe how best to separate TE ecology from TE evolution, as well as TE evolution from evolution at the level of the organism. I raise a series of questions concerning fundamental factors in understanding TE evolution, such as how to count TE individuals and whether or not TEs exhibit analogs for different modes of reproduction. I close by arguing that the way we annotate and store TE information needs to be addressed, not only to better investigate evolution at the TE level, but also to understand how TEs have affected the evolution of organisms.


Introduction

Genetic information filtered through environmental influences shapes the morphology and behaviour of all living things on this planet. This process starts at the level of the genome, where the expression of signals from genetic loci is turned on in different cell types, contexts, and time periods, and transformed into proteins and RNAs to carry out the varied functions of the cell and organism at large. Genomes vary in size across the diversity of life, from 138 kilobase pairs (Kbp) in a bacterial endosymbiont to 152.23 gigabase pairs (Gbp) in the Japanese lily, yet the amount of genomic real estate that the instruction-bearing protein-coding and regulatory sequences occupy varies significantly less (Pellicer et al., 2010; McCutcheon and von Dohlen, 2011; Elliott and Gregory, 2015a). Part of the reason for this disconnect is that the remaining sequences in many cellular organisms possess a behaviour and dynamism unlike the instruction-bearing DNA. As reviewed in Chapter 1, these dynamic sequences include transposable elements (TEs), which can move from place to place within the genome and copy themselves (Hua-Van et al., 2011).

The ability of TEs to move within the genome and reinsert back into DNA is what first brought TEs to the attention of researchers over 70 years ago. This mobility imbues TEs with a unique mutagenic potential that has captured the attention of many as they have tried to determine how TEs have influenced the evolution of their hosts, and how they have contributed to disease (O’Donnell and Burns, 2010; Levin and Moran, 2011; Alzohairy et al., 2013). It is through this host-centric lens that much, although not all, TE research has been carried out. Functional consequences of TEs for the host usually frame the research program in TE biology. Consequently, unanswered questions in TE biology concerning their evolution and their varying abundance and diversity in different genomes are addressed in large part only from the perspective of the host. Complementary to this view is one advocated some 30 years ago which argues that TEs occupy their own level in the evolutionary hierarchy, along with organisms, groups, populations, species, and lineages, and this fact must be considered in interpreting their origins, evolution, persistence, and pervasiveness in genomes (Orgel and Crick, 1980; Doolittle and Sapienza, 1980; Doolittle, 1982; 1984; 1989; Brunet and Doolittle, 2015). Arguably, this is one of the most significant contributions to understanding TE evolution: to view them as biological entities in their own right. This chapter reviews the historical development of such ideas, outlines where the TE-level perspective stands today in the post-genomic era, and discusses why it is crucial that it continue to be developed.

Function, selfish DNA, multi-level selection, and misconceptions

The history of TE research is primarily one of attempting to explain their presence, abundance, diversity and pervasiveness in the context of how they function for and/or harm the host organism. Their discoverer, Barbara McClintock, proposed from the beginning that they have a role in the development and gene expression of the organisms they inhabit (McClintock, 1961). Repeats in general were ascribed a regulatory role, and the subsequent discovery of TEs in prokaryotes had researchers assigning them regulatory and evolutionary functions (Britten and Davidson, 1971; Cohen, 1976; Shapiro, 1977). Yet, from the discovery of TEs in the late 1940s through to the 1970s, there was always a minority of researchers who disagreed with these assertions, intimating that much of the DNA in the larger eukaryote genomes was unlikely to be functional, or that mobile DNAs and other repeats might in fact represent parasitic sequences operationally separate, to a certain extent, from the rest of the genome (Östergren, 1945; Peterson, 1970; Cavalier-Smith, 1978).

The publication of the selfish DNA papers sought to codify this idea more, and to combat rampant adaptationism in explaining genomic features, by suggesting that many repeats and mobile DNA might represent a level in the evolutionary hierarchy below that of the organism (see Chapter 2). At this level, sequences of DNA that can ensure their own survival and proliferation, independently of contributing to the fitness of the organism, would be analogous to organisms competing in an ecosystem (Orgel and Crick, 1980; Doolittle and Sapienza, 1980). Given variation in their ability to survive and reproduce, the ability to pass on this propensity, and the possibility that more competing sequences might be produced than survive, the ingredients for evolution via natural selection would exist within the genome as well as within populations. However, when considering TEs, the host organism cannot be ignored either, because factors such as silencing and the deleterious effects of TEs upon the host must be taken into account (Blumenstiel, 2011; Lisch, 2012). This necessitates taking a multi-level view of how TEs and the organism interact to influence the evolution of one another.

As seen in Chapter 2, although a multi-level view of TEs was articulated in the early 1980s, a host-centric approach to TEs remains very common in the genomics community. Misconceptions persist about what the selfish DNA hypothesis represents, such as claims that evidence of exaptation, or of co-evolution between the host and TEs, undermines the concept of selfish DNA as a framework for understanding TE biology (e.g., Allen et al., 2004; Muotri et al., 2009). Statements such as “It remains a curiosity why sequences without any apparent purpose continue to thrive in genomes” (Levin and Moran, 2011) persist in the literature, when the answer to such a question has already been proffered. There is a need to move beyond the word ‘selfish’, and the misconceptions it invokes, and instead focus on the genome as a multi-level entity.

Developing the analogy

As noted in Chapter 2, after 1980, Doolittle and colleagues published other papers where the selfish DNA hypothesis was discussed further, and aspects of the TE as a level within the evolutionary hierarchy were expanded upon. Some of these papers can be difficult to access, which may explain why they are rarely cited in the TE literature. No more than a year after the publication of the selfish DNA hypothesis, Sapienza and Doolittle (1981) suggested that the evolution of TEs “should be comprehensible in the same terms as the evolution of organisms within their environments”. Using this approach and knowledge of , they predicted that intragenomic selection, selection operating between replicators within the genome, would tend to favour sequences within elements that promoted mobility, but that the lack of cis-activity in what we know today as DNA transposons would allow for non-autonomous cheater elements to evolve. Sapienza and Doolittle (1981) suggested that these non-autonomous cheaters might put pressure on ‘mother’ elements to ‘evolve away’, so that they no longer have proteins or recognition sequences compatible with the cheater and thus escape the parasitism. Lampe et al. (2001) provided empirical data, based upon ITR-transposase interactions between different lineages of Mariner elements, on how this escape would be theoretically possible. Sapienza and Doolittle (1981) also expected that nucleotides of no selective value within elements would be subject to drift, much as neutral variation within genomes of the host is. The application of concepts from traditional, organismal evolution to the level of TEs was expanded upon by Doolittle, such as describing TEs as subcellular organisms selected for intragenomic survival (Doolittle, 1982). A few years later, Doolittle (1984) took this further and stated that one could construct a “population genetics of the components of the genome”, with TE copies of the same family as the individuals of interest at their own level in the evolutionary hierarchy. Doolittle also argued that taking a multi-level viewpoint with regards to TEs and genome evolution in general was essential, and that much of the confusion with regards to TEs arose when it was not taken (Doolittle, 1984; 1987; 1989). These papers by Doolittle (1984; 1987; 1989) offered an interesting counterbalance to the host-centrism typically seen in the literature. Moreover, while focussing only on the element level would be just as ill-advised as focussing only on the host level, it can be argued that TE biology would benefit from having the conceptual pendulum swing back more towards the element side.

An expanded continuum

The selfish DNA hypothesis continued in largely the same form for some 20 years before a significant expansion of it came in the seminal review paper by Kidwell and Lisch (2001). They discussed two examples of interactions between elements and their hosts that seem to defy the selfish DNA hypothesis. The first is a phenomenon observed in several species of arthropods and in the green alga Chlorella vulgaris, where telomeres are not maintained by repeat addition via the telomerase enzyme, but instead by the insertion of one or several lineages of non-LTR retrotransposons (Higashiyama et al., 1997; Takahashi et al., 1997; Pardue and DeBaryshe, 2011). These elements display site-specific insertion at the telomeres and have unique adaptations suited to maintaining telomere function, including GAG proteins, which target the elements to telomeres, and unique promoter and end-extension mechanisms that help ensure a population of active elements (Rashkova et al., 2002; George et al., 2010; Traverse et al., 2010). The second is a phenomenon particular to hypotrichous ciliated protists that involves complex genomic rearrangements during sexual reproduction (Chalker and Yao, 2011). These rearrangements involve the removal of large sections of DNA from chromosomes, re-ligation of the remaining fragments and the amplification of others to produce a second ‘somatic’ copy of the genome used for protein production, separate from the intact ‘germline’ copy. This is carried out by thousands of copies of specialized DNA transposons in Oxytricha trifallax, partially guided by RNAs, which bind to and cut specific sequences in the genome. In other species, such as Paramecium spp., the same process is mediated by an exapted transposase under complete host control (Nowacki et al., 2008; 2009; Baudry et al., 2009; Cheng et al., 2010).

TEs involved in telomere maintenance and genome remodelling were previously thought to be altruistically serving their host by performing beneficial functions. However, Kidwell and Lisch (2001) interpreted these interactions slightly differently from those who discovered the respective phenomena. In light of these examples, they argued that the selfish tag attributed to TEs was limited in that it ignored the complex situations that can arise during the long-term, co-evolutionary relationship between a host genome and its resident TEs. Instead, they suggested that a continuum of interactions would be more appropriate to describe TE-host relationships, spanning the breadth commonly found in organism-organism relationships, from strict parasitism at one extreme, through commensalism midway, to mutualism at the other extreme. This concept was first mentioned by Orgel and Crick (1980), but it was Kidwell and Lisch (2001) who formally described it.

The telomeric and genome remodelling examples represent instances where the direction of selection at two levels, that of the elements and that of the host, is effectively aligned when the survival of the elements happens to also be beneficial for the survival of the host (Durand and Michod, 2010). This idea of a continuum added depth and complexity to the conceivable types of interactions between TEs and their hosts, and provided a plausible explanation for unusual examples such as retrotransposons that maintain telomeres.

The element-level perspective in the post-genomic era

Why we need the element-level perspective more than ever

Since the publication of Kidwell and Lisch (2001), much has changed in the world of TEs and genomics. Known TE diversity has increased with the discovery of new superfamilies, such as rolling-circle replicating Helitrons (Kapitonov and Jurka, 2001; Thomas and Pritham, 2015), self-synthesizing M/P elements (Feschotte and Pritham, 2005; Kapitonov and Jurka, 2006; Pritham et al., 2007), tyrosine recombinase-bearing Cryptons and DIRS-like elements (Goodwin and Poulter, 2001; Goodwin et al., 2003), and a plethora of new cut-and-paste DNA transposons (Bao et al., 2009; Bao et al., 2010; Böhne et al., 2012; Kojima and Jurka, 2013). It is almost certain that the un-annotated ‘unknown repeats’ found in many genomes also hide undiscovered forms of TEs. Whole genome sequences available for eukaryotes have ballooned from a handful to hundreds, and large sequencing consortiums seek to increase that number to the thousands over the next decade (Genome 10K Community of Scientists, 2009; i5K Consortium, 2013; 1000 Fungal Genomes Project, 2015). Advances in sequencing technology have made the assembly of larger, more repeat-rich genomes easier to manage, and survey-sequencing, the sequencing of only a small percentage of the genome, has given us glimpses into the TE communities of some truly bloated genomes (Kovach et al., 2010; Ambrožová et al., 2011; Sun et al., 2012; 2014). Sequencing efforts within species have expanded as well, with multiple genomes available for a number of species, revealing intraspecific variation in genomic content (Pelak et al., 2010; Cao et al., 2011; Engel and Cherry, 2013; Grenier et al., 2015). Some patterns have started to emerge, such as plant genomes dominated by LTR retrotransposons, and mammalian genomes dominated by mostly inactive non-LTR retrotransposons and inactive DNA transposons (Giordano et al., 2007; Chapter 3). However, notable exceptions exist, such as the Xenopus tropicalis and Arabidopsis thaliana genomes dominated by DNA transposons (The Arabidopsis Genome Initiative, 2000; Hellsten et al., 2010), and the particularly active DNA transposons in bats (Ray et al., 2007; Ray et al., 2008). Do these interesting exceptions represent patterns more dominant than we think due to limited sampling, or are they truly exceptions that need to be investigated to tease out causation?

As in genomics in general, this explosion of data represents a new opportunity in our quest to understand the forces that shape the evolution and distribution of TEs in genomes. What limits us most are the computational tools needed to wade through this vast sea of data, and the conceptual framework through which we view and interpret that data. Host-centric thinking with regard to TEs has been the dominant view for most of their history, as discussed in Chapter 2.

TEs are not just interesting loci present in the genomes of organisms, but interesting entities in their own right. I would argue that acknowledging TEs as their own level in the evolutionary hierarchy, and shifting our perspective on occasion, will yield valuable insights. This view has existed as a minor contributor in TE biology throughout its history, and the recent interest in ‘genome ecology’ represents an attempt to take this view.

Distinguishing TE ecology from TE evolution

The continuum of TE-host interactions discussed by Kidwell and Lisch (2001) describes TEs as being capable of interacting with the host in a variety of ways, similar to interactions between different species in an ecosystem. This idea of an ‘ecology of the genome’ had been stated before by Holmquist (1989), Brookfield (1995), Kidwell and Lisch (1997), and Bennetzen (2000), but had not been expanded upon, and Kidwell and Lisch (2001) did so in only a limited fashion with the concept of TEs occupying different genomic niches. These niches typified different insertion strategies, with certain elements favouring gene-poor heterochromatin to avoid deleterious effects on the host while others favoured gene-rich euchromatin, with a higher probability of causing harm but possibly greater transcriptional activity (Kidwell and Lisch, 1997; 2001). Leonardo and Nuzhdin (2002) extended the idea to interactions between TEs and discussed possible examples of parasite-host, competitive and cooperative interactions. Le Rouzic and Capy (2006) and Le Rouzic et al. (2007) modelled competitive and mutualistic interactions between TEs and were able to demonstrate the long-term stability and cyclical nature of these, much like host-parasite interactions. Similar cyclical dynamics were predicted using a Lotka-Volterra predator-prey framework by Abrusán and Krambeck (2006).
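To make the reference to a Lotka-Volterra framework concrete, the following is a minimal sketch of the classic two-variable predator-prey equations, dx/dt = alpha*x - beta*x*y and dy/dt = delta*x*y - gamma*y, integrated with a fourth-order Runge-Kutta step. It is not the model of Abrusán and Krambeck (2006). The function names, the parameter values, and the reading of the 'prey' and 'predator' variables as two interacting populations within a genome are assumptions made purely for illustration.

```python
import math


def lotka_volterra(prey0, pred0, alpha, beta, delta, gamma, dt, steps):
    """Integrate the classic Lotka-Volterra equations with RK4.

    prey0, pred0 : starting abundances (hypothetical units, e.g. copy number)
    alpha        : intrinsic growth rate of the prey
    beta         : rate at which predators remove prey
    delta        : predator growth per prey consumed
    gamma        : predator decay rate
    Returns two lists holding the prey and predator trajectories.
    """
    def deriv(x, y):
        return alpha * x - beta * x * y, delta * x * y - gamma * y

    x, y = prey0, pred0
    xs, ys = [x], [y]
    for _ in range(steps):
        k1x, k1y = deriv(x, y)
        k2x, k2y = deriv(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
        k3x, k3y = deriv(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
        k4x, k4y = deriv(x + dt * k3x, y + dt * k3y)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        y += dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        xs.append(x)
        ys.append(y)
    return xs, ys


def conserved_quantity(x, y, alpha, beta, delta, gamma):
    """Quantity that stays constant along an exact Lotka-Volterra orbit.

    Its constancy is why the trajectories form closed cycles rather than
    settling to an equilibrium or diverging.
    """
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


if __name__ == "__main__":
    # Illustrative parameters only; the equilibrium is at prey = gamma/delta = 25
    # and predator = alpha/beta = 10.
    prey, pred = lotka_volterra(40.0, 5.0, alpha=1.0, beta=0.1,
                                delta=0.02, gamma=0.5, dt=0.01, steps=5000)
    print("prey range:", min(prey), max(prey))
    print("predator range:", min(pred), max(pred))
```

Because the model contains no heritable change in either population, it describes purely ecological dynamics; both abundances oscillate around the equilibrium indefinitely.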

The goal of ‘genome ecology’ is to explore whether or not ecological concepts and tools have any value in deciphering the varied abundance and distribution patterns of TEs in eukaryotic genomes. However, when the existing literature concerning genome ecology is examined closely, there is a trend of fundamental confusion among authors about what exactly constitutes ecology or evolution at the TE level (Linquist et al., 2013). Ecological models of competition between elements take place over thousands of host generations (Abrusán and Krambeck, 2006), during which one would assume evolution at both host and element levels would take place. Indeed, questions that Brookfield (2005a) identifies as answerable within a genome ecology context are in fact more relevant to TE evolution than TE ecology.

It seems that the genome ecology movement represents a recognition by some in the TE community that current research models are ‘missing something’ in their ability to explain TE patterns at large. Genome ecology proposes an analogy between ecological communities and TE communities, to some extent implying that multiple levels of ecology are possible and that this new research model could have explanatory power (Mauricio, 2005; Le Rouzic et al., 2007). Linquist et al. (2013) identified two examples of conceptual confusion. First, ecology and evolution at the TE level were being used interchangeably even though these are two different approaches, distinguished by whether heritable change is occurring over successive TE generations (Linquist et al., 2013). If heritable change is likely to be a factor at the host and/or element level, then using ecological approaches (e.g. Lotka-Volterra dynamics) is not likely to be useful. Second, the term genome ecology was more appropriate to studying how ecological factors at the host level might affect TEs, such as host/organism range size and exposure to TEs through horizontal transfer events (Venner et al., 2009). This should be distinguished from transposon/TE ecology, which encompasses the effects that ecological proxies at the genome level, such as how TEs are arranged spatially on chromosomes, have upon TEs (Linquist et al., 2013). This confusion in the literature between ecology and evolution strongly suggests that the TE level needs to be defined much more clearly, namely by identifying the boundaries of both TE ecology and TE evolution (Linquist et al., 2013). The genome ecology push was a way to emphasize the benefit that a TE-level perspective might have in conducting research, albeit with a focus on ecology.

How further development of the element-level perspective can benefit genome biology

Some of the greatest unanswered questions in genome biology involve repeats in general, but TEs in particular because they, on average, make up most of the identified repeats in sequenced eukaryote genomes (Elliott and Gregory, 2015a; Chapter 3). The types of TEs found in genomes vary across the broad swath of genome sizes, with the number of superfamilies being highest in small and lowest in very large genomes, and variance in superfamily number increasing between them (Elliott and Gregory, 2015b; Chapter 3). Both the abundance and presence of the different superfamilies vary across the diversity of eukaryotes, and the reasons for this remain unknown. Recent work has sought to quantify the changes in abundance and diversity at the superfamily and family levels between genomes in greater detail (e.g. Serra et al., 2013; Staton and Burke, 2015). While these studies go beyond the much simpler descriptions of TE content, they are usually focussed on host-level processes as causal drivers of change within the populations of TEs. While evolutionary and demographic changes in the host population will most certainly influence TE populations and TE community composition, one must not forget the inherent hierarchical nature of the genome, with TEs nested within genomes, genomes within individual organisms, and organisms within populations (Doolittle, 1989; Gregory, 2004). TE abundance, distribution and diversity across genomes are the product of interactions at the host, element and perhaps even higher levels in the hierarchy of evolution (Brunet and Doolittle, 2015). When looking at the composition of communities of organisms, one would not suggest that a satisfactory understanding could be reached by studying the abundances of species alone, without considering factors such as how they have evolved and their phylogeography. TEs are best understood as being influenced by evolutionary forces at both the host level and the element level. This has been at least implicitly recognized by some researchers, who value understanding the genetic variation of TE families as another explanatory variable for their current distribution and abundance in genomes (Senerchia et al., 2013; Middleton et al., 2013; El Baidouri and Panaud, 2013).

Another example of how element-level thinking could benefit the field of TE biology is in deciphering the effects of host breeding system on TE dynamics. Early models predicted that, depending on how selection acts upon new TE insertions, the amount of selfing/asexual reproduction a genome undergoes would result in either a decrease or an increase in TE copy number (Wright and Schoen, 1999; Morgan, 2001). Empirical investigations of TEs and the mode of host reproduction have also yielded a variety of outcomes. The asexual bdelloid rotifer Adineta vaga has a variety of TEs, in low copy number and with very few active elements (Flot et al., 2013). Zeyl et al. (1994) observed TE dynamics in sexual and asexual lines of the alga Chlamydomonas reinhardtii over the course of a year and saw no change in TE abundance, nor did Kraaijeveld et al. (2012) see any significant difference in TE abundance between sexual and asexual Leptopilina clavipes wasp lineages. The selfer Arabidopsis thaliana has fewer unique families and a recent reduction in TE copy number compared to its outcrossing relative, A. lyrata (de la Chaux et al., 2012). Conversely, Ågren et al. (2015) found no relationship between mode of reproduction, TE dynamics and genome size increase in a genus of evening primroses.

The complexity of the relationship between breeding system and TE dynamics has gained some recognition recently, and factors such as host demography and the length of time a host species has been selfing or asexual are thought to be important (Boutin et al., 2012; Crespi and Schwander, 2012; Ågren et al., 2014). Assuming that uniform constraints affect all TEs in a given genome due to differences in breeding system is perhaps too simplistic. TEs within a genome will vary in their rates of transposition, regions of insertion, and length of time present in the genome, not only between TE families but possibly within them as well, given enough genetic variation in a single family. If an asexual means of host reproduction imparts pressure upon the element, its ability to respond to that pressure will depend upon multiple factors. These factors include the relevant evolutionary forces acting at the level of that particular TE population, in addition to forces at the organism level, as discussed in the next section.

An effective TE response could include evading host defenses, evolving a more efficient form of self-regulation, or the evolution of a less deleterious transposition reaction, such as the de novo evolution of site specificity for particular regions or sequences within the genome (Charlesworth and Langley, 1986; Chalker and Sandmeyer, 1992; Bestor, 1999). Some TE populations within a genome could respond in one or several of these ways, or not at all, depending on the variation present within those populations and on whether any new mutations pertinent to adaptation appear. What remains is to generate a framework with which to answer these questions, one which takes into account the nuances of TE biology and whether or not these differ from how traditional populations of organisms evolve.

Outstanding questions about TE biology from an element-level perspective

In order for the element level to receive as much attention as the host level has received over the past ~35 years, certain questions need to be addressed. A non-exhaustive list of questions can be found below, divided into two sections. The first concerns what I call ‘boundary conditions’, namely how best to delineate particular TE individuals, populations and species. The second concerns how the evolutionary factors of effective population size (Ne), mutation, selection and mode of reproduction manifest at the level of TEs.

Boundary conditions

Boundary conditions encompass how best to separate different demographic units from one another at the TE level. These bounded entities begin with individuals: where does one individual TE end and another begin? Similarly, the best means to differentiate TE populations from each other, and whether or not TE species exist, require discussion.

i) What is a TE individual?

At its most basic level, much of ecologically- and biodiversity-oriented biology starts with counting and characterizing individual members within the same population or species in a given area of interest. While counting individual organisms can be straightforward (at least in principle), for example with large, conspicuous mammals, it can sometimes be quite difficult to know what to count, such as in the case of clonal but connected organisms including some plants and fungi (Clarke, 2012). TEs appear to be easily counted by copy number, but upon further consideration this is not necessarily the only option. Imagine a group of DNA transposons of the same family within a genome. If a burst of transposition had recently occurred, these elements might all be identical, or nearly so. Because transcription and translation are decoupled in eukaryotes, cut-and-paste transposases lack a cis preference for the element they were transcribed from and will bind to any ITR structures from elements within that recently transposed population. Due to their near identity and effective functional equivalence with respect to transposase interaction, the entire collection of elements could conceivably be considered a single individual, given the lack of selective differences between the individual copies. Over time, as the elements diverge and the ability of transposases from different elements to cross-mobilize other copies diminishes, it could be argued that new TE individuals arise. Conversely, the interactions between the proteins and mRNA of non-LTR elements are thought to be much stronger, with both staying associated with each other throughout the reverse transcription process (Han et al., 2010), suggesting that each active copy is truly akin to an individual. This is one possible way of assigning individuals within a TE population, but not the only conceivable one. The question of TE individuality may not have a simple answer, but if counting entities is the first step in trying to understand their evolution, it must be considered.

ii) What is a TE population?

Populations of organisms can be small and isolated, or massive and spread over vast distances as interconnected metapopulations (Waples and Gaggiotti, 2006). Population structure is an important feature of a species, one which affects all the classical evolutionary forces (Holsinger and Weir, 2009). The boundaries of a given TE population could theoretically correspond very closely to those of its host organism’s population, although local extinctions of particular TEs could occur in small, isolated host populations (Jurka et al., 2011). A TE population could have added structure in that, within a genome, its copies are spread across different chromosomes, and the propensity for inserting into the same or different chromosomes during transposition might vary (e.g. Zhang and Spradling, 1993; Montiel et al., 2015). Do different types of TEs vary substantially in how they are structured within and between genomes? How is variation in a given TE population partitioned between and within both chromosomes and genomes? How do rates of transposition between chromosomes and within chromosomes vary? Can TEs be considered to be structured into metapopulations (Quesneville and Anxolabéhère, 1998; Le Rouzic, 2002)? In genomes with low deletion rates where older TE copies are allowed to accumulate, does the population display age structure, or are older elements more akin to a different population?

iii) Are there TE species?

TEs have their own taxonomic hierarchy, organized at the highest level into classes based on the mode of transposition. While the highest level of organization is less controversial (but see Piégu et al., 2015 for a dissenting opinion), there is less agreement about the boundaries between lower levels of the organization, such as family and sub-family. Moreover, it remains unclear to what extent TE classification reflects evolutionary relationships rather than convenient but biologically arbitrary constructs. The family and sub-family levels are generally assigned through sequence similarity via a variety of thresholds (Wicker et al., 2007; Bao et al., 2015). Using sequence similarity to assign families has been criticized by some for allowing a mixture of mono-, para-, and polyphyletic groups to be considered families; these authors argue that families should reflect a shared phylogenetic history (Petersen and Seberg, 2009; Seberg and Petersen, 2009).
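As a rough illustration of threshold-based family assignment, and of why it can produce groups that do not reflect shared ancestry, the sketch below clusters toy sequences by single linkage: any two copies whose pairwise identity meets a threshold are placed in the same family. The sequences, the 0.85 threshold, the function names and the assumption of pre-aligned, equal-length copies are all hypothetical; this is not the procedure of Wicker et al. (2007) or Bao et al. (2015).

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / len(seq_a)


def assign_families(copies, threshold):
    """Group TE copies into 'families' by single-linkage clustering.

    copies    : dict mapping a copy name to its aligned sequence
    threshold : minimum pairwise identity for two copies to be linked
    Returns a sorted list of families, each a sorted list of copy names.
    """
    names = list(copies)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if identity(copies[first], copies[second]) >= threshold:
                parent[find(first)] = find(second)

    families = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical 10-bp aligned copies, invented for this example.
TOY_COPIES = {
    "copy_a": "ACGTACGTAC",
    "copy_b": "ACGTACGTAA",
    "copy_c": "ACGTACGGAA",
    "copy_d": "TTTTGGGGCC",
}

if __name__ == "__main__":
    print(assign_families(TOY_COPIES, threshold=0.85))
```

In this toy data copy_a and copy_c are only 80% identical, yet they end up in the same family because each is 90% identical to copy_b. Chaining of this kind is one way in which a similarity threshold alone can yield families whose membership depends on which intermediate copies happen to be sampled.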

Related to this is the question of whether or not the families we assign based upon phylogeny, sequence similarity or a combination of the two are biologically meaningful categories. Can TE families be considered species? A species, like an individual, implies some internal cohesion and external separation from comparable units within the same environment. Are cohesion and separation maintained via the interactions between the proteins and nucleic acids of a given prospective TE family? Is the frequency of recombination (see below) a barrier to mixing between two distinct TE species, as suggested by Flavell et al. (1997)? When can two TE lineages be considered distinct species, and how does the speciation process occur at this level? Do new TE families mainly arise through drift, as suggested by Jurka et al. (2011), or via intragenomic selection and adaptation to their genomic environment?

Evolutionary factors

Separate from, but related to, boundary conditions are the evolutionary factors acting upon TEs. These are the same factors relevant to the evolution of organisms, but they cannot necessarily be translated from the organism level to the TE level without careful consideration.

i) What is TE effective population size?

The property of Ne has been of interest in evolutionary biology, particularly population and conservation genetics, for some time (see Charlesworth, 2009). Ne represents the size that an idealized population would need to be in order to have the same value for a specified property as a real population of organisms possesses. These idealized populations are assumed to have much simpler properties than a real population, such as equal sex ratios and random mating. Factors such as migration rate, inbreeding, census population size (Nc) and mode of reproduction all inform Ne for a given species or population, and typically Ne is much smaller than Nc (Charlesworth, 2009). Conventionally, populations with smaller relative Ne experience a greater change in frequencies mediated by genetic drift, relative to the contribution of natural selection. Being able to estimate Ne is therefore useful in assessing how a population has evolved in the past and how it might continue to evolve in the future. Being able to determine the Ne for a particular TE population would be necessary in assessing how strongly drift and selection at the intragenomic level would act, and how well said TE population could respond to selection pressures. Interpreting the behaviour of TEs solely through the lens of intragenomic selection would just be adaptationism applied to the TE level, which would be best remedied by a study of how much drift affects TEs. Methods to estimate TE Ne would need to be adapted, or developed de novo, perhaps using approaches developed in the 1980s as a starting point (Ohta, 1984; Slatkin, 1985; Brookfield, 1986; Hudson and Kaplan, 1986; Charlesworth, 1986). A TE population being nested within a host population would perhaps present novel factors that would influence TE Ne, and intuitively one would think that TE Ne would always be smaller than host Ne, but need that always be the case? How do host Nc and Ne affect TE Ne for the TEs found within that genome? Is there a uniform effect across the different taxonomic categories of TEs? How predictive is current TE Ne for the success of a particular family in a given host genome? What are the criteria for an idealized TE population that would be comparable to the criteria in a Wright-Fisher organismal population?
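Two standard textbook results help illustrate why Ne is usually smaller than Nc and why a smaller Ne implies stronger drift. Under fluctuating population size, Ne is approximated by the harmonic mean of the per-generation sizes, and under pure drift expected heterozygosity decays geometrically at a rate set by 1/Ne. The sketch below applies both to a hypothetical TE family. Treating TE copies as haploid, asexually replicating individuals (hence the factor 1/Ne rather than 1/(2Ne)), and the copy numbers used, are assumptions for illustration only and not an established method for estimating TE Ne.

```python
def harmonic_mean_ne(sizes):
    """Approximate Ne for a population whose size fluctuates over time.

    sizes : census size in each generation (here, hypothetical copy numbers)
    The harmonic mean is dominated by the smallest values, so a single
    bottleneck pulls Ne far below the arithmetic mean of the census sizes.
    """
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty list of positive numbers")
    return len(sizes) / sum(1.0 / n for n in sizes)


def expected_heterozygosity(h0, ne, generations):
    """Expected heterozygosity after drift alone in a haploid population.

    h0          : starting heterozygosity (gene diversity)
    ne          : effective population size, assumed constant
    generations : number of generations of drift
    Assumes no mutation, selection or gene flow.
    """
    return h0 * (1.0 - 1.0 / ne) ** generations


if __name__ == "__main__":
    # Hypothetical copy numbers for a family that bursts and then contracts.
    copy_numbers = [20, 20, 2000, 2000, 20]
    ne = harmonic_mean_ne(copy_numbers)
    mean_nc = sum(copy_numbers) / len(copy_numbers)
    print("arithmetic mean copy number:", mean_nc)
    print("harmonic mean (approximate Ne):", ne)
    print("heterozygosity left after 50 generations:",
          expected_heterozygosity(0.5, ne, 50))
```

Under these assumptions, a family that spends most of its history at low copy number behaves, with respect to drift, like a small population even if it is currently abundant.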

ii) What are the rates and dynamics of the factors of mutation and selection at the level of TE populations?

Like any population of evolving entities, a TE population would be influenced by other evolutionary factors, such as selection, mutation and gene flow, in addition to drift. Intragenomic selection, acting between elements in competition for space and resources, is often mentioned in the TE literature, but explicit discussions of this concept are few and far between. Witherspoon and colleagues (Witherspoon et al., 1997; Witherspoon, 1999; 2000) have discussed intragenomic selection in cut-and-paste DNA transposons with the most nuance. Because a transposase protein lacks cis-preference for the element it was produced from, intragenomic selection is ineffective at maintaining active elements in a DNA transposon population, due to the decoupling of transposase and element fitness (Witherspoon, 2000). Witherspoon (1999) outlines situations where selection could favour elements which produce functional transposase, one of which was a trait group selection model that was shown to be stable under a variety of conditions (Witherspoon, 2000). The ideas broached in these works are thought-provoking but have received little attention in the literature, and beg to be tested using the wealth of genomic data now available. In general, the dynamics and strength of selection acting on TEs have yet to be investigated beyond observing evidence of purifying selection acting upon the protein-coding regions of DNA transposons (Subramanian et al., 2007; Mota et al., 2010; Zhou et al., 2010), LTR retrotransposons (Jordan and McDonald, 1999; Gómez et al., 2006; Jiang et al., 2011; Novikov et al., 2012), and non-LTR retrotransposons (Bao and Jurka, 2010; Granzotto et al., 2011). However, evidence suggests that more than just selection for functional mobility proteins is relevant to TE evolution, as that only covers the fecundity component of TE fitness. The RNA interference apparatus found within most eukaryotes is a known policing mechanism against TE activity, and consequently a number of interesting putative adaptations to evade this system have been found within TEs (Nosaka et al., 2012; McCue et al., 2013; Fu et al., 2013). As well, a number of TEs have been found to contain other ORFs of unknown or cryptic function, which suggests that other selection pressures may be acting upon TEs (Warbrick et al., 1998; Novikova, 2010; Böhne et al., 2012; Gao et al., 2012; Abrusán et al., 2013). An extensive review of the types of pressures experienced by TEs has yet to be done to address questions such as: how much does the strength of intragenomic selection vary across TEs, and what factors affect it? Do smaller TE populations exhibit weaker signals of selection and less of an ability to respond to selection pressures, as in organisms?

Mutation rates for TEs are usually assumed to be equal to the neutral, background rates measured in the host (Ma and Bennetzen, 2004; Ray et al., 2008; Hellen and Brookfield, 2012). Lack of proofreading in the reverse transcriptase (RT) protein probably drives this rate up in retrotransposons. How many and what types of mutations can TEs accrue before they are rendered inactive as opposed to non-autonomous? How much does the mutational process differ between families and classes of TEs, as well as between different genomes (e.g. Hellen and Brookfield, 2012)? Doolittle (1984) spoke of there being a population genetics of the components of the genome (e.g. TEs), and developing such a framework would be essential to understanding element-level evolution. W.F. Bodmer alluded to something similar in the commentary section discussing Walker’s (1979) conference presentation, when talking about understanding drift and selection of the “organisms within organisms” represented by selfish DNAs such as TEs. The foundation for such a framework already exists in the literature which seeks to model the spread of TEs within host populations (Le Rouzic and Deceliere, 2005). Brookfield (2005b) modelled the spread of a beneficial mutation through a TE population, and showed that times to fixation are immense because the mutation has to spread through both the host and element populations. This model did not include recombination between elements as a parameter, so for some TEs these fixation times might be much shorter than predicted. More work of this nature needs to be done, incorporating more accurate information about TE mutation, recombination, drift, selection and rates of gene flow to better understand the evolutionary factors governing TE populations. This TE population genetics would also be nested within a host undergoing its own array of evolutionary factors, and how these two levels interact to sort TE variation remains to be properly understood.

iii) Do TEs have different reproductive modes?

The reproductive mode of organisms varies across living things and has been extensively studied for its effects on the evolution of populations (de Visser and Elena, 2007). Whether a species reproduces sexually or asexually, and at what frequency and in what manner, has a variety of effects on the factors that influence its evolution. Although rarely explicitly discussed, TEs have been thought of as asexual entities (Brookfield, 1995). But, when cast in the light of recombination, there is evidence to suggest that processes analogous to sexual reproduction do occur in TEs. Holmquist (1989) was the first to suggest that mobile DNA could have different forms of reproduction depending on the intra-group recombination rate. The process of LTR retrotransposition requires that two copies of the mRNA be present in the virus-like particle (VLP) prior to reverse transcription (Havecker et al., 2004). These two copies need not come from the same parental insertion, however, and during the production of the cDNA progeny copy, the reverse transcriptase can jump between the two RNA genomes, resulting in a recombinant progeny element, analogous to sexual recombination (see Bleykasten-Grosshans et al., 2011 for an example in yeast). LTR elements appear to be the only category of TEs with an explicit step in their replication cycle that is analogous to sex, but evidence of recombination has been found in DNA transposons (Fischer et al., 2003; Novick et al., 2011; Vergilino et al., 2013), SINEs (Yadav et al., 2012) and LINEs (Bringaud et al., 2006; Sharma et al., 2008). Whether these examples occur frequently enough to be analogous to sexual reproduction would require investigation. Do TEs which frequently undergo recombination enjoy similar benefits to sexually reproducing organisms (Schaack et al., 2010)? How often does recombination occur in a TE population, and is it enough to confer the same benefits as sexual reproduction? Does the rate of recombination vary across different taxonomic categories of TEs, is it host genome dependent, or both? Do TE taxa with higher rates of recombination also have higher abundances in genomes and/or are they found to be more widely distributed across eukaryotes?
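The template switching described for LTR elements can be illustrated with a deliberately simplified sketch. It assumes two aligned RNA templates of equal length and a caller-supplied list of switch positions; real reverse transcription involves obligatory strand transfers and second-strand synthesis that are ignored here. The function name and its arguments are hypothetical.

```python
def template_switch(rna_a, rna_b, switch_points):
    """Build a recombinant product from two co-packaged templates.

    Copying starts on rna_a and moves to the other template at each
    0-based position listed in switch_points. Illustration only: the
    templates are treated as aligned strings of equal length.
    """
    if len(rna_a) != len(rna_b):
        raise ValueError("templates must be aligned to the same length")
    templates = (rna_a, rna_b)
    switches = set(switch_points)
    current = 0  # index of the template currently being copied
    product = []
    for i in range(len(rna_a)):
        if i in switches:
            current = 1 - current  # jump to the other RNA genome
        product.append(templates[current][i])
    return "".join(product)
```

With two templates derived from different parental insertions, any switch yields a progeny copy carrying segments of both parents, whereas two identical templates yield a product indistinguishable from clonal replication.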

A new TE biology which acknowledges multiple levels of evolution

What would the field of TE biology look like if both the levels of organism and element were given more equal footing than they enjoy now? Phenomena involving TEs need to be seen at both the organism and element levels, much like how adaptationism in conventional biology should be tempered by carefully considering non-adaptive explanations for traits as well.

Stress and TE activation in the light of the TE level

Let us explore, through an example, how viewing TEs as their own level might lead to a better understanding of them. The stress-induced transcription and/or transposition of TEs has been observed in a variety of organisms under a variety of conditions, such as excess heat (Matsunaga et al., 2012), cold (Ueki et al., 2008), salt (De Felice et al., 2008) and radiation (Tanaka et al., 2012). Interpretation of this activity swings the pendulum substantially to the organism side, positing that this stress activity generates beneficial mutations for the organism when they are needed (Syvanen, 1984; Wessler, 1996; Kidwell and Lisch, 1997; 2000; 2001; Chénais et al., 2012). It has been established that new TE insertions generated during periods of stress can change gene expression and sometimes bring new genes under stress-induced regulatory control (e.g., Naito et al., 2009; Feng et al., 2013). What remains unknown are the consequences of such activity. It is possible that activity could be beneficial at levels above the organism, bestowing demes or species with an advantage over competitors that lack, or have reduced, TE activity during stress (Brunet and Doolittle, 2015).

What if we view this phenomenon from the level of elements? What advantages might stress-induced activity bestow? Many stressors are known to cause DNA damage (e.g., Schwarz, 1996; Sinha and Häder, 2002; Velichoko et al., 2012), and while TEs can induce double-strand breaks in DNA themselves, they do not possess the means to repair these breaks, and so rely on the organism. Perhaps TE activity during stress is adaptive for elements by taking advantage of a time when DNA repair processes will be active outside of meiosis or mitosis. Bennetzen (2000) drew parallels between viruses and TEs and proposed that stress-activity might be adaptive for the horizontal propagation of TEs. Such stresses could include viral or parasitic infections, both of which could be vectors to invade new genomes, or a new host in the latter case (Gilbert et al., 2010; Kuraku et al., 2012; Gilbert et al., 2014; Guo et al., 2014).

We must not fall into the adaptationist trap when considering phenomena from the perspective of TEs either. In cases where TEs lack explicit stress-responsive promoters but are still active during stress, the explanation could be more accidental than adaptive. Stress is known to alter the state of normally-silenced regions of chromatin, and allow TEs found within these regions to be more active (Kaeppler et al., 2000; Kim et al., 2001; Madlung and Comai, 2004). The activity of some TEs during stress may be nothing more than a side-effect of activating other genes during the stress response, or a general failure of the silencing machinery. Activity then might be neutral or even maladaptive at both the organism and element levels. The activity of TEs in some somatic tissues of multi-cellular organisms under some conditions could be interpreted in this light (Collier and Largaespada, 2007; Kano et al., 2009).

Experimentally disentangling element-level evolution from organism-level evolution

Attending to the TE level when thinking about TE evolution is useful not only conceptually; practical applications stemming from it can also be imagined. TE biology enjoys a rich history in molecular biology, and molecular investigations into the transposition mechanisms and the genetic and epigenetic consequences of TEs have been ongoing since the late 1960s (see chapters in Craig et al., 2015). Understanding TEs requires incorporating the consequences of causal factors at several levels, which can make interpreting patterns seen in genomes difficult. These patterns in TE abundance and distribution are the product of history and contingency, which cannot always be properly accounted for when sussing out causation.

While there will always be a place for looking at natural TE patterns in genomes, perhaps it is time that a new TE biology, informed by multiple levels of evolution, made use of the methods found in experimental evolution (Kawecki et al., 2012). Experimental evolution seeks to understand the causes and consequences of evolutionary factors on populations via their more direct control and manipulation in an artificial setting. A system would be needed where this level of control could be exerted over both the host organism and the TEs of interest. Some simple experiments have already been performed to assess the effects of breeding system on TE copy number over time (Zeyl et al., 1994; Zeyl et al., 1996), and to measure transposition rates of various elements (Harada et al., 1990; Domínguez and Albornoz, 1996; Nuzhdin et al., 1997), but these lacked any form of direct manipulation of the genetic composition of elements or the tracking of elements over time. Copy number changes can affect both the organism and element levels, but this is more analogous to pure demographic changes in the population and not evolution via changes in allele frequencies.

Saccharomyces cerevisiae, brewer's yeast, has been a model organism for decades, and the few families of Ty1/Copia and Ty3/Gypsy LTR retrotransposons that inhabit its genome have served as model TEs as well (Curcio et al., 2015; Sandmeyer et al., 2015). Saccharomyces cerevisiae is an easily cultured eukaryote with a short generation time and a well characterized genome with multiple strains sequenced, and it is very amenable to genetic manipulation (Botstein and Fink, 2011; Engel and Cherry, 2013). This flexibility has already made yeast a successful platform for experimental evolution (Zeyl, 2006). Efforts are currently underway to create a synthetic yeast genome lacking features such as introns and TEs, while inserting loxP inducible recombination sites to allow for rearrangement of the genome upon command (Annaluru et al., 2014). More modest genetic changes that can be made to yeast include engineering asexual strains via deletions to meiosis genes or forced homozygosity at mating loci (Birdsell and Wills, 1996; Goddard et al., 2005), manipulations of ploidy level (Selmecki et al., 2015), and targeted mutations to select genetic loci using the versatile CRISPR/Cas system from Streptococcus pyogenes (DiCarlo et al., 2013). Although mutation rates in yeast can be relatively low, at 1 × 10⁻⁴ single substitutions/diploid cell, mutator strains of yeast can be created via deletions to the DNA repair machinery, such as the MSH mismatch repair genes, to increase rates of mutation 100-fold (Drotschmann et al., 1999). These features would make yeast an excellent system where control over the organism level could be used to better understand evolutionary responses to a variety of pressures at the TE level. Although not able to reproduce as rapidly as bacteria, yeast populations have been allowed to evolve experimentally for thousands of generations (Francis and Hansche, 1972; McDonald et al., 2012), giving time for a TE population to evolve in response to a given selection pressure, or for the effect of mutant TEs, whether introduced into a population or naturally generated through an increased mutation rate, to be tracked.

S. cerevisiae already supports the transposition of the two major groups of LTR retrotransposons, and assays have demonstrated the ability of the yeast genome to support the transposition of a Zorro3 non-LTR retrotransposon from Candida albicans (Dong et al., 2009), along with various cut-and-paste DNA transposons from other eukaryotes (Weil and Kunze, 2000; Yang et al., 2009). All of the other major taxonomic categories of TEs (Cryptons, Helitrons, M/P elements, SINEs, Penelope, DIRS) are found in other yeasts and fungi (see data set in Elliott and Gregory, 2015b); the feasibility of their transposition within S. cerevisiae could be tested and, if they prove viable, they could be used in experimental evolution. A small number of individual TEs could be introduced into the yeast genome as a starting population, but if more control were needed, methods could be adapted which use the LTRs of endogenous Ty1 elements as integration sites to allow for the seeding of genomes with TE copies in a copy number-specific manner (Parekh et al., 1996).

Yeast populations could be held as constant as possible at large population sizes in chemostats to remove top-down organism effects on the TE population caused by genetic drift. Should the interest be in organism effects on the TE population, it would not be difficult to test the effects of a selection regime, genetic drift at the organism level through dilution and serial transfers, fluctuating population size, migration from other yeast populations, and a variety of other factors. As previously mentioned, mode of reproduction and polyploidy can be induced and controlled to a degree in yeast, which would allow their effects on TE evolution to be evaluated experimentally. Although S. cerevisiae lacks the genes for mounting an RNAi response, thought to be an important selection pressure acting on TEs, these have been found in other yeasts, and when introduced into the S. cerevisiae genome, generated interfering RNAs against its own TEs (Drinnenberg et al., 2009). Observing how the presence of a functional RNAi response affected the evolution of a TE population in real time would be valuable. The evolution of a TE population in response to a variety of factors, and how factors at the element and organism level interact, could be studied experimentally (see Table 4.1 for a list of examples). Samples of the yeast, and consequently element, population could be taken throughout a given experiment, stored and used for comparative, competitive or growth experiment purposes to assess the fitness of the organism, elements or both. This would also be a useful system for testing TE ecology more explicitly without manipulating the genetic composition of the TEs, such as by manipulating the number of TE superfamilies in a genome and tracking abundance within each over time, or looking at the dynamics of elements competing for a common resource, such as proteins involved in transposition (Le Rouzic and Capy, 2006; Robillard et al., 2016) or shared genomic target sites (Ye et al., 2005; Elliott et al., 2013).
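How strongly a serial-transfer regime imposes organism-level drift can be estimated before an experiment is run. The sketch below rests on two textbook approximations rather than on anything specific to this thesis: the number of generations between transfers is the base-2 logarithm of the dilution factor, and the effective population size under fluctuating numbers is approximated by the harmonic mean of the per-generation sizes. The function names and the idealized one-doubling-per-generation growth are assumptions made for illustration.

```python
import math


def generations_per_transfer(dilution_factor):
    """Doublings needed to regrow to the pre-dilution density."""
    return math.log2(dilution_factor)


def harmonic_mean_ne(bottleneck_size, doublings):
    """Harmonic-mean population size over one growth cycle.

    Assumes the population starts at bottleneck_size and doubles once per
    generation for a whole number of generations (doublings >= 1).
    """
    sizes = [bottleneck_size * 2 ** t for t in range(doublings)]
    return len(sizes) / sum(1 / n for n in sizes)
```

For example, `generations_per_transfer(1024)` gives 10 generations per cycle, and `harmonic_mean_ne(1000, 3)` is about 1714, much closer to the bottleneck size than to the final population size. This is why small transfer volumes are an effective way to impose drift at the organism level.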

Concluding remarks

The propensity to view TEs more so in light of how they contribute to the evolution of organisms is understandable. As we are organisms ourselves, it is logical to start at the organism level to begin to understand function, origins, ecology and evolution within biology. The mutagenic nature of TEs makes them capable of influencing the evolution of organisms, and focussing on this aspect given our predilection for the organismal level is not unusual. As well, in an ever more competitive world for sources of research funding, relating one's work to medically important or applied goals adds another motive to stress the host level. To paraphrase Maslow (1962), those who only have a hammer view the world as nothing but nails. The selfish DNA concept, and its precursors, provided a different way of viewing the genome and TEs that stressed the existence of multiple levels. However, organism-centrism proved to be a hard trope to break free of, for the reasons above, and others mentioned in a previous chapter. Nevertheless, it is difficult to deny the fact that genomes are composed of multiple levels of interacting entities, sometimes with divergent fitness interests (Doolittle, 1989; Gregory, 2004). This makes genomes much more complex than a collection of alleles only at the mercy of evolutionary and demographic factors imposed at the level of the organism, an already complicated entity on its own. If genomes were simpler, an organism-centric perspective would not be as problematic.

As in any field, many questions in transposon biology remain unanswered, but a better way forward should involve integration across both the levels of the organism and the elements. Abandonment of the organism level for that of the elements would be just as unwise as the previous dominance of organism-level thinking at the expense of the element level. The genome is the product of evolutionary factors acting simultaneously at several levels, sometimes in concert and sometimes in conflict. A population of TEs within a genome grows, contracts and responds to selection pressures to which it is exposed. This is analogous to any organismal population, with the important exception that the variation present is sorted again by factors operating at the organism level. What remains to be done now is developing this analogy of the TE population, and determining how much of the wealth of theory and methodology from organismal biology and evolution can be transposed to the TE level.

The future of TE research

I would argue that the field of transposon biology has reached an inflection point, and I believe that many fellow researchers would agree with this assessment. The lack of a more fully developed element level is just one aspect of how change is needed in transposon biology. One of the greatest difficulties in genome assembly is the repetitive portion, and attempting to make sense of said portion is a further hurdle. The tools to accomplish this second task are varied, using homology, de novo and structure-based methods to identify and categorize the TE content of a given genome (Bergman and Quesneville, 2007; Janicki et al., 2011). Within these subcategories there are still a number of different tools that can be used, as well as custom-made ones (see Elliott and Gregory, 2015b: supplementary material concerning repeat annotation methods). This variation presents a problem, namely whether these methods produce TE annotations of comparable quality. Ragupathy et al. (2013) were among the first to comment on the non-standard annotation of TEs in plant genomes, namely in the context of how this lack of a standard affects the ability to understand the effects of TEs and other repeats on epigenetic regulation in plants. More recently, Hoen et al. (2015) argued that a benchmark for TE annotation needs to be found, not only so better comparisons between genomes can be made, but also so there is some sort of baseline for the development of better tools for annotation. Platt et al. (2016) compared de novo and homology-based approaches to TE annotation on the same data sets and concluded that a hybrid approach was best, pointing towards requirements for a benchmark. They observed that homology alone resulted in annotation of a significantly different number of TE copies and lineages than de novo or combination approaches. Moreover, the identification of recently active families and the detection of novel repeats were seriously hindered.

Annotation of TEs is not the only aspect of transposon biology that deserves evaluation. The classification of TEs began in the late 1980s, with Finnegan's (1989) proposal that TEs be divided into two Classes based on whether or not they employed an RNA intermediate during transposition. What this and subsequent alternatives/expansions to classification lack is the assignment of groups based on phylogenetic relationships, as argued by Piégu et al. (2015). For instance, PiggyBac and Crypton superfamily elements are both included as Class II, or DNA, transposons even though the proteins they encode do not share common ancestry and the mechanisms of their transposition are quite different. Also, the integration of LTR elements shares recent evolutionary history with TEs encoding transposases, yet these two groups of elements are not in the same class (Piégu et al., 2015). Current classification systems for TEs also tend to exclude related elements from non-eukaryotes, even though phylogenetic relationships between TEs from the two groups are evident. Piégu et al. (2015) propose a new classification system but also advocate that a conversation needs to occur within the community on the best approach overall, and on ways to include prokaryotic elements and other forms of selfish DNA that are usually excluded, and to account for a probable lack of common ancestry for all mobile DNA. In essence, what is sought is a classification system with evolutionary and biological meaning, and the same could be said for annotation and curation. For myself, the question of whether the way TE families are assigned has universal, biological meaning remains unanswered as well. Do members of TE families have cohesiveness similar to species that supports their grouping together?

TE research would be more difficult without the hard work and resources made available by those who maintain various TE databases. These run the gamut from highly specific databases, such as the repository of plant Sireviruses (Darzentas et al., 2010), to more general and widely known databases such as Repbase (Jurka et al., 2005). In general, useful information can be found in databases, but these are often limited to a certain taxonomic subset of TEs or organisms, and some seem to receive little support from the community. For instance, the placement of families within the phylogeny of Ty3/Gypsy elements in GyDB (Lloréns et al., 2011) would be valuable information to extend to a wider array of TE lineages and then make easily accessible. WikiPoson is a wiki that is open to registration by the general community to upload and modify entries about TEs; however, it appears to have received little activity in the past 9 years (Wicker et al., 2007).

Repbase is the prime example of a TE database, having been created as part of the Genetic Information Research Institute (GIRI), founded in 1994. Repbase has been the premier database for TE information for both the TE community and sequencing community at large, providing a library of curated repeat entries with which to query genomes. Despite its long history serving the TE and genomics communities, it is not without flaws. It remains cryptic as to how new families or superfamilies are assigned, and a number of superfamilies that have been discovered and named by GIRI lack a formal description in the primary literature. Searching Repbase can be difficult, and although entries within the database contain a number of fields of metadata, filtering on this metadata is not an option. Repbase also displays a common taxonomic bias which can be found in genomics in general, with the curated TE entries of 60% of genomes being from 80 species of vertebrates and arthropods (Bao et al., 2015). This is probably due to the fact that voluntary submissions of new TE information to the database from outside researchers are extremely rare, and most new information submitted to the database has come from the small number of GIRI researchers over the past 22 years (Bao et al., 2015). Sadly, Jurka passed away in the summer of 2014, and since then, the other fixture at GIRI has left. Moreover, sources of funding to maintain Repbase have dwindled, leaving the future of this TE database in question.

The evolution of the annotation, curation, classification and storage of TE information will require effort on the part of the TE community itself. Has the time come for a body to form, such as a society of transposon biology, which could serve to set standards for TE research? A body that could act as a resource for non-specialists to consult when seeking out TE information, and perhaps as a home for a centralized TE database? Piégu et al. (2015) have already proposed that within the next ~4 years, a committee should form to address the classification of TEs, but I would argue we could go further. A society or institute could serve to maintain benchmarks for annotation, and make these available for use in new sequencing projects. A society could also serve as a better-supported, stable entity around which a central database could be maintained for TE information. This database could include curated catalogues of TE content for a variety of genomes, but would ideally include information of interest to a wide variety of researchers. This could include metadata on suspected horizontal transfer events, proposed instances of exapted TEs, insertions associated with particular phenotypes or disease pathologies, and a variety of other categories. A resource that aims to serve as much of the TE and non-TE research community as possible would be more likely to attract public funding than resources tailored to highly specific research avenues. As larger and larger data sets of genomes are sequenced, the hurdle to understanding TEs and their effects on organisms becomes larger and larger without standards in place.

No single researcher will be able to answer most or all of the questions in TE biology. Such an endeavour will take the effort of many people within the community. Part of this will involve taking a critical look at the history of TE biology, and its current state, and accepting the fact that things could be improved. There is great opportunity in the coming years for advancing our understanding of how TEs have evolved, how they interact with each other and with host genomes, and how they influence organismal phenotypes. Achieving this understanding will require continued developments in both the conceptual framework and the empirical tools used to study these extremely abundant and diverse genomic entities.


Tables and figures

Table 4.1. Conditions and selection pressures under which the experimental evolution of a TE population could be studied in a Saccharomyces cerevisiae system.

TE population factors which could be controlled:

    Starting TE population size
    TE size modulated through spacer DNA
    Proportion of TE population composed of autonomous and non-autonomous individuals
    Number and type of TE alleles in TE population
    Constitutive and/or inducible expression of transposition

Organism population factors which could be controlled:

    Varying rates of migration and population structure at the organism level (Deceliere et al., 2005)
    Organism level selection on trait(s) unrelated to TEs
    Presence or absence of RNAi response (Drinnenberg et al., 2009)
    Sexual versus asexual reproduction (Boutin et al., 2012)
    Organism population size, and fluctuation thereof
    Unicellularity versus multicellularity (Ratcliff et al., 2012)
    Ploidy (Selmecki et al., 2015)

TE evolution of interest to be observed:

    Spread of intragenomically favoured TE allele throughout TE and organism population (Brookfield, 2005b)
    Effect of element size on ectopic recombination frequency (Petrov et al., 2003) and transposition
    Possible benefits of stress-induced TE activity on TEs, organisms or organism population
    Effect of increased variation in the starting TE population on long-term success
    Effect of different ratios of autonomous to non-autonomous elements on long-term success of TE population


References

1000 Fungal Genomes Project. http://1000.fungalgenomes.org/home

1000 Plants and Animals. http://ldl.genomics.cn/page/pa-research.jsp

1000 Plants. https://pods.iplantcollaborative.org/wiki/display/iptol/OneKP+Capstone+Wiki

Abe, H., Fujii, T., Shimada, T., Mita, K. 2010. Novel non-autonomous transposable elements on W chromosome of the silkworm, Bombyx mori. J. Genet. 89, 375-387.

Abrusán, G., Krambeck, H.-J. 2006. Competition may determine the diversity of transposable elements. Theor. Pop. Biol. 70, 364-375.

Abrusán, G., Szilágyi, A., Zhang, Y., Papp, B. 2013. Turning gold into 'junk': transposable elements utilize central proteins of cellular networks. Nucl. Acids Res. 41, 3190-3200.

Ågren, J.A., Greiner, S., Johnson, M.T.J., Wright, S.I. 2015. No evidence that sex and transposable elements drive genome size variation in evening primroses. Evolution. 69, 1053-1062.

Ågren, J.A., Wang, W., Koenig, D., Neuffer, B., Weigel, D., Wright, S.I. 2014. Mating system shifts and transposable element evolution in the plant genus Capsella. BMC Genomics. 15, 602.

Ahn, S.J., Kim, M., Jang, J.H., Lim, S.U., Lee, H.H. 2008. MMTS, a new subfamily of Tc1-like transposons. Molec. Cells. 26, 387-395.

Albornoz, J., Domínguez, A. 1999. Spontaneous changes in Drosophila melanogaster transposable elements and their effects on fitness. Heredity. 83, 663-670.

Allen, T.A., Von Kaenel, S., Goodrich, J.A., Kugel, J.F. 2004. The SINE-encoded mouse B2 RNA represses mRNA transcription in response to heat shock. Nat. Struc. Mol. Biol. 11, 816-821.

Alzohairy, A.M., Gyulai, G., Jansen, R.K., Bahieldin, A. 2013. Transposable elements domesticated and neofunctionalized by eukaryotic genomes. Plasmid. 69, 1-15.

Ambrožová, K., Mandáková, T., Bureš, P., Neumann, P., Leitch, I.J., Koblížková, A., Macas, J., Lysak, M.A. 2011. Diverse retrotransposon families and AT-rich satellite DNA revealed in giant genomes of Fritillaria lilies. Ann. Bot. 107, 255-268.

Ananiev, E.V., Gvozdev, V.A., Ilyin, Y.V., Tchurikov, N.A., Georgiev, G.P. 1978. Reiterated genes with varying location in intercalary heterochromatin regions of Drosophila melanogaster polytene chromosomes. Chromosoma. 17, 1-17.

Annaluru, N., Muller, H., Mitchell, L.A., Ramalingam, S., Stracquadanio, G., Richardson, S.M., Dymond, J.S., Kuang, Z., Scheifele, L.Z., Cooper, E.M., et al. 2014. Total synthesis of a functional designer eukaryotic chromosome. Science. 344, 55-58.

Anonymous. 1980. Can DNA properly be called selfish? Nature. 285, 604.

Arensburger, P., Hice, R.H., Zhou, L., Smith, R.C., Tom, A.C., Wright, J.A., Knapp, J., O'Brochta, D.A., Craig, N.L., Atkinson, P.W. 2011. Phylogenetic and functional characterization of the hAT transposon superfamily. Genetics. 188, 45-57.

Arkhipova, I.R. 2006. Distribution and phylogeny of Penelope-like elements in eukaryotes. Syst. Biol. 55, 875-885.

Arkhipova, I.R., Meselson, M. 2005. Diverse DNA transposons in rotifers of the class Bdelloidea. P. Natl. Acad. Sci. USA. 102, 11781-11786.

Arkhipova, I.R., Pyatkov, K.I., Meselson, M., Evgen'ev, M.B. 2002. Retroelements containing introns in diverse invertebrate taxa. Nat. Genet. 33, 123-124.

Arnault, C., Dufournel, I. 1994. Genome and stresses: reactions against aggressions, behavior of transposable elements. Genetica. 93, 149-160.

Balaram, P. 2012. DNA: Finding uses for junk. Curr. Sci. India. 103, 607-608.

Bao, W., Jurka, J. 2010. Origin and evolution of LINE-1 derived "half-L1" retrotransposons (HAL1). Gene. 465, 9-16.

Bao, W., Jurka, M.G., Kapitonov, V.V., Jurka, J. 2009. New superfamilies of eukaryotic DNA transposons and their internal divisions. Mol. Biol. Evol. 26, 983-993.

Bao, W., Kapitonov, V., Jurka, J. 2010. Ginger DNA transposons in eukaryotes and their evolutionary relationships with long terminal repeat retrotransposons. Mob. DNA. 1, 3.

Bao, W., Kojima, K.K., Kohany, O. 2015. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA. 6, 11.

Batzer, M.A., Deininger, P.L. 2002. Alu repeats and human genomic diversity. Nat. Rev. Genet. 3, 370-379.

Baucom, R.S., Estill, J.C., Chaparro, C., Upshaw, N., Jogi, A., Deragon, J.-M., Westerman, R.P., SanMiguel, P.J., Bennetzen, J.L. 2009. Exceptional diversity, non-random distribution, and rapid evolution of retroelements in the B73 maize genome. PLoS Genet. 5, e1000732.

Baudry, C., Malinsky, S., Restituito, M., Kapusta, A., Rosa, S., Meyer, E., Bétermier, M. 2009. PiggyMac, a domesticated piggyBac transposase involved in programmed genome rearrangements in the ciliate Paramecium tetraurelia. Genes Dev. 23, 2478-2483.

Belancio, V. P., Roy-Engel, A. M., Deininger, P. L. 2010. All y'all need to know 'bout retroelements in cancer. Sem. Cancer Biol. 20, 200-210.

Belancio, V.P., Hedges, D.J., Deininger, P. 2008. Mammalian non-LTR retrotransposons: for better or worse, in sickness and in health. Genome Res. 18, 343-358.

Bell, S., Dutta, A. 2002. DNA replication in eukaryotic cells. Ann. Rev. Biochem. 71, 333-374.

Bennett, M.D., Leitch, I.J. 2012. Plant DNA C-values Database (release 6.0). http://www.kew.org/cvalues

Bennetzen, J.L. 2000. Transposable element contributions to plant gene and genome evolution. Plant Mol. Biol. 42, 251-269.

Bergman, C.M., Quesneville, H. 2007. Discovering and detecting transposable elements in genome sequences. Brief. Bioinfo. 8, 382-392.

Bestor, T.H. 2000. Sex brings transposons and genomes into conflict. Genetica. 107, 289-295.

Biedler, J.K., Tu, Z. 2003. Non-LTR retrotransposons in the African malaria mosquito, Anopheles gambiae: unprecedented diversity and evidence of recent activity. Mol. Biol. Evol. 20, 1811-1825.

Biémont, C. 2010. A brief history of the status of transposable elements: from junk DNA to major players in evolution. Genetics. 186, 1085-1093.

Bigot, Y., Hamelin, M.-H., Periquet, G. 1991. Molecular analysis of the genomic organization of the Hymenoptera Diadromus pulchellus and Eupelmus vuilleti. J. Evol. Biol. 4, 541-556.

Birdsell, J., Wills, C. 1996. Significant competitive advantage conferred by meiosis and syngamy in the yeast Saccharomyces cerevisiae. P. Natl. Acad. Sci. USA. 93, 908-912.

Blasco, M.A. 2005. Telomeres and human disease: ageing, cancer and beyond. Nat. Rev. Genet. 6, 611-622.

Bleykasten-Grosshans, C., Jung, P.P., Fritsch, E.S., Potier, S., de Montigny, J., Souciet, J.-L. 2011. The Ty1 LTR-retrotransposon population in Saccharomyces cerevisiae genome: dynamics and sequence variations during mobility. FEMS Yeast Res. 11, 334-344.

Blumenstiel, J.P. 2011. Evolutionary dynamics of transposable elements in a small RNA world. Trends Genet. 27, 23-31.

Böhne, A., Zhou, Q., Darras, A., Schmidt, C., Schartl, M., Galiana-Arnoux, D., Volff, J.-N. 2012. Zisupton – a novel superfamily of DNA transposable elements recently active in fish. Mol. Biol. Evol. 29, 631-645.

Boissinot, S., Entezam, A., Furano, A.V. 2001. Selection against deleterious LINE-1-containing loci in the human lineage. Mol. Biol. Evol. 18, 926-935.

Bonferroni, C. E. 1935. Il calcolo delle assicurazioni su gruppi di teste. In Studi in Onore del Professore Salvatore Ortu Carboni. Rome, Italy.

Botstein, D., Fink, G.R. 2011. Yeast: an experimental organism for 21st century biology. Genetics. 189, 695-704.

Boutin, T.S., Le Rouzic, A., Capy, P. 2012. How does selfing affect the dynamics of selfish transposable elements? Mob. DNA. 3, 5.

Bringaud, F., Bartholomeu, D.C., Blandin, G., Delcher, A., Baltz, T., El-Sayed, N.M.A., Ghedin, E. 2006. The Trypanosoma cruzi L1Tc and NARTc non-LTR retrotransposons show relative site specificity for insertion. Mol. Biol. Evol. 23, 411-420.

Britten, R., Kohne, D. 1968. Repeated sequences in DNA. Science. 161, 529-540.

Britten, R.J., Davidson, E.H. 1971. Repetitive and non-repetitive DNA sequences and a speculation on the origins of evolutionary novelty. Q. Rev. Biol. 46, 111-138.

Brookfield, J.F.Y. 1986. A model for DNA sequence evolution within transposable element families. Genetics. 112, 393-407.

Brookfield, J.F.Y. 1995. Transposable elements as selfish DNA. In: Sherratt, D.J. (ed.), Mobile Genetic Elements. Oxford University Press, Oxford, UK.

Brookfield, J.F.Y. 2005a. The ecology of the genome — mobile DNA elements and their hosts. Nat. Rev. Genet. 6, 128-136.

Brookfield, J.F.Y. 2005b. Evolutionary forces generating sequence homogeneity and heterogeneity within retrotransposon families. Cyt. Gen. Res. 110, 383-391.

Brunet, T.D.P., Doolittle, W.F. 2014. Getting "function" right. P. Natl. Acad. Sci. USA. 111, E3365.

Brunet, T.D.P., Doolittle, W.F. 2015. Multilevel selection theory and the evolutionary functions of transposable elements. Genome Biol. Evol. 7, 2445-2457.

Bundock, P., Hooykaas, P. 2005. An Arabidopsis hAT-like transposase is essential for plant development. Nature. 436, 282-284.

Bureau, T.E., Wessler, S.R. 1994. Mobile inverted-repeat elements of the Tourist family are associated with the genes of many cereal grasses. P. Natl. Acad. Sci. USA. 91, 1411-1415.

Cairns, J. 1975. Mutation selection and the natural history of cancer. Nature. 255, 197-200.

Camacho, J.P.M. 2005. B chromosomes. In: Gregory, T.R. (ed.), The Evolution of the Genome. Elsevier, San Diego, CA, USA.

Cameron, J.R., Loh, E.Y., Davis, R.W. 1979. Evidence for transposition of dispersed repetitive DNA families in yeast. Cell. 16, 739-751.

Campbell, A. 1981. Evolutionary significance of accessory DNA elements in bacteria. Annu. Rev. Microbiol. 35, 55-83.

Canty, A., Ripley, B. 2015. boot: Bootstrap R (S-Plus) functions. R package version 1.3-17.

Capy, P., Anxolabéhère, D., Langin, T. 1994. The strange phylogenies of transposable elements: are horizontal transfers the only explanation? Trends Genet. 10, 7-12.

Capy, P., Gasperi, G., Biémont, C., Bazin, C. 2000. Stress and transposable elements: co-evolution or useful parasites? Heredity. 85, 101-106.

Capy, P., Langin, T., Higuet, D., Maurer, P., Bazin, C. 1997. Do the integrases of LTR-retrotransposons and class II element transposases have a common ancestor? Genetica. 100, 63-72.

Capy, P., Vitalis, R., Langin, T., Higuet, D., Bazin, C. 1995. Relationships between transposable elements based upon the integrase-transposase domains: is there a common ancestor? J. Mol. Evol. 42, 359-368.

Casola, C., Hucks, D., Feschotte, C. 2008. Convergent domestication of pogo-like transposases into centromere-binding proteins in fission yeast and mammals. Mol. Biol. Evol. 25, 29-41.

Cavalier-Smith, T. 1978. Nuclear volume control by nucleoskeletal DNA, selection for cell volume and cell growth rate, and the solution of the DNA C-value paradox. J. Cell Sci. 34, 247-278.

Cavalier-Smith, T. 1980. How selfish is DNA? Nature. 285, 617-618.

Cech, T.R., Steitz, J.A. 2014. The noncoding RNA revolution - Trashing old rules to forge new ones. Cell. 157, 77-94.

Chalker, D.L., Sandmeyer, S.B. 1992. Ty3 integrates within the region of RNA polymerase III transcription initiation. Genes Dev. 6, 117-128.

Chalker, D.L., Yao, M.-C. 2011. DNA elimination in Ciliates: transposon domestication and genome surveillance. Annu. Rev. Genet. 45, 227-246.

Chalopin, D., Naville, M., Plard, F., Galiana, D., Volff, J.-N. 2015. Comparative analysis of transposable elements highlights mobilome diversity and evolution in vertebrates. Genome Biol. Evol. 7, 567-580.

Charlesworth, B. 1986. Genetic divergence between transposable elements. Genetical Res. 48, 111-118.

Charlesworth, B. 2009. Effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10, 195-205.

Charlesworth, B., Langley, C.H. 1986. The evolution of self-regulated transposition of transposable elements. Genetics. 112, 359-383.

Charlesworth, D., Wright, S.I. 2001. Breeding systems and genome evolution. Curr. Op. Genet. Dev. 11, 685-690.

Chen, J., Greenblatt, I.M., Dellaporta, S.L. 1992. Molecular analysis of Ac transposition and DNA replication. Genetics. 130, 665-676.

Chénais, B., Caruso, A., Hiard, S., Casse, N. 2012. The impact of transposable elements on eukaryotic genomes: From genome size increase to genetic adaptation to stressful environments. Gene. 509, 7-15.

Cheng, C.-Y., Vogt, A., Mochizuki, K., Yao, M.-C. 2010. A domesticated piggyBac transposase plays key roles in heterochromatin dynamics and DNA cleavage during programmed DNA deletion in Tetrahymena thermophila. Mol. Biol. Cell. 21, 1753-1762.

Christensen, S., Pont-Kingdon, G., Carroll, D. 2000. Target specificity of the endonuclease from the Xenopus laevis non-long terminal repeat retrotransposon, Tx1L. Mol. Cell. Biol. 20, 1219-1226.

Clarke, E. 2012. Plant individuality: a solution to the demographer's dilemma. Biol. Phil. 27, 321-361.

CoGePedia Sequenced Plant Genomes Wiki. https://genomevolution.org/wiki/index.php/Sequenced_plant_genomes

Cohen, S.N. 1976. Transposable genetic elements and plasmid evolution. Nature. 263, 731-738.

Colbourne, J.K., Pfrender, M.E., Gilbert, D., Thomas, W.K., Tucker, A., Oakley, T.H., Tokishita, S., Aerts, A., Arnold, G.J., Basu, M.K. et al. 2011. The ecoresponsive genome of Daphnia pulex. Science. 331, 555-561.

Collier, L.S., Largaespada, D.A. 2007. Transposable elements and the dynamic somatic genome. Genome Biol. 8(Suppl 1), S5.

Comfort, N. 2001. The Tangled Field. Harvard University Press, Cambridge, MA, USA.

Comfort, N.C. 1995. Two genes, no enzyme: a second look at Barbara McClintock and the 1951 Cold Spring Harbor Symposium. Genetics. 140, 1161-1166.

Corradi, N., Pombert, J.-F., Farinelli, L., Didier, E.S., Keeling, P.J. 2010. The complete sequence of the smallest known nuclear genome from the microsporidian Encephalitozoon intestinalis. Nat. Comm. 1, 77.

Cost, G.J., Boeke, J.D. 1998. Targeting of human retrotransposon integration is directed by the specificity of the L1 endonuclease for regions of unusual DNA structure. Biochemistry. 37, 18081-18093.

Craig, N.L., Chandler, M., Gellert, M., Lambowitz, A.M., Rice, P.A., Sandmeyer, S.B. (eds). 2015. Mobile DNA III. American Society for Microbiology Press, Washington, DC, USA.

Crénès, G., Moundras, C., Demattei, M.-V., Bigot, Y., Petit, A., Renault, S. 2009. Target site selection by the mariner-like element, Mos1. Genetica. 138, 509-517.

Crespi, B., Schwander, T. 2012. Asexual evolution: do intragenomic parasites maintain sex? Mol. Ecol. 21, 3893-3895.

Crick, F.H.C. 1979. Split genes and RNA splicing. Science. 204, 264-271.

Cummings, M.P. 1994. Transmission patterns of eukaryotic transposable elements: arguments for and against horizontal transfer. Trends Ecol. Evol. 9, 141-145.

Curcio, M.J., Lutz, S., Lesage, P. 2015. The Ty1 LTR-retrotransposon of budding yeast, Saccharomyces cerevisiae. In: Craig, N.L., Chandler, M., Gellert, M., Lambowitz, A.M., Rice, P.A., Sandmeyer, S.B. (eds.), Mobile DNA III. American Society for Microbiology Press, Washington, DC, USA.

Darzentas, N., Bousios, A., Apostolidou, V., Tsaftaris, A.S. 2010. MASiVE: mapping and analysis of Sirevirus elements in plant genome sequences. Bioinformatics. 26, 2452-2454.

Daubin, V., Moran, N.A. 2004. Comment on "The origins of genome complexity". Science. 306, 978a-978b.

Dawkins, R. 1976. The Selfish Gene. Oxford University Press, Oxford, UK.

Dawkins, R. 1982. The Extended Phenotype. Oxford University Press, Oxford, UK.

De Aguiar, D., Hartl, D.L. 1999. Regulatory potential of nonautonomous mariner elements and subfamily crosstalk. Genetica. 107, 79-85.

De Felice, B., Wilson, R.R., Argenziano, C., Kafantaris, I., Conicella, C. 2008. A transcriptionally active copia-like retroelement in Citrus limon. Cell. Mol. Biol. Lett. 14, 289-304.

de la Chaux, N., Tsuchimatsu, T., Shimizu, K.K., Wagner, A. 2012. The predominantly selfing plant Arabidopsis thaliana experienced a recent reduction in transposable element abundance compared to its outcrossing relative Arabidopsis lyrata. Mob. DNA. 3, 2.

de la Chaux, N., Wagner, A. 2011. BEL/Pao retrotransposons in metazoan genomes. BMC Evol. Biol. 11, 154.

de Visser, J.A.G.M., Elena, S.F. 2007. The evolution of sex: empirical insights into the roles of epistasis and drift. Nat. Rev. Genet. 8, 139-149.

Deceliere, G., Charles, S., Biémont, C. 2005. The dynamics of transposable elements in structured populations. Genetics. 169, 467-474.

Deininger, P., Batzer, M.A. 1999. Alu repeats and human disease. Mol. Genet. Metabolis. 67, 183-193.

Dewannieux, M., Esnault, C., Heidmann, T. 2003. LINE-mediated retrotransposition of marked Alu sequences. Nat. Genet. 35, 41-48.

DiCarlo, J.E., Norville, J.E., Mali, P., Rios, X., Aach, J., Church, G.M. 2013. Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucl. Acids Res. 41, 4336-4343.

Doak, T.G., Doerder, F.P., Jahn, C.L., Herrick, G. 1994. A proposed superfamily of transposase genes: transposon-like elements in ciliated protozoa and a common "D35E" motif. P. Natl. Acad. Sci. USA. 91, 942-946.

Domínguez, A., Albornoz, J. 1996. Rates of movement of transposable elements in Drosophila melanogaster. Mol. Gen. Genet. 251, 130-138.

Dong, C., Poulter, R.T., Han, J.S. 2009. LINE-like retrotransposition in Saccharomyces cerevisiae. Genetics. 181, 301-311.

Doolittle, W.F. 1982. Selfish DNA after Fourteen Months. In: Dover, G.A. and Flavell, R.B. (eds.), Genome Evolution. Academic Press, London, UK.

Doolittle, W.F. 1984. Some broader evolutionary issues which emerge from contemporary molecular biological data. P. Bienn. Phil. Sci. Assoc. 1984, 129-144.

Doolittle, W.F. 1987. What introns have to tell us: hierarchy in genome evolution. Cold Spring Harbor Symp. Quant. Biol. 52, 907-913.

Doolittle, W.F. 1989. Hierarchical approaches to genome evolution. Canadian J. Phil. Supp 14, 101-133.

Doolittle, W.F. 2013. Is junk DNA bunk? A critique of ENCODE. P. Natl. Acad. Sci. USA. 110, 5294-5300.

Doolittle, W.F., Brunet, T.D.P., Linquist, S., Gregory, T.R. 2014. Distinguishing between "function" and "effect" in genome biology. Genome Biol. Evol. 6, 1234-1237.

Doolittle, W.F., Sapienza, C. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature. 284, 601-603.

Dotto, B.R., Carvalho, E.L., Freitas, A., Duarte, L.F., Pinto, M., Ortiz, M.F., Wallau, G.L. 2015. HTT-DB - Horizontally transferred transposable elements database. Bioinformatics. 31, 2915-2917.

Dover, G. 1982. Molecular drive: a cohesive mode of species evolution. Nature. 299, 111-117.

Drinnenberg, I.A., Weinberg, D.E., Xie, K.T., Mower, J.P., Wolfe, K.H., Fink, G.R., Bartel, D.P. 2009. RNAi in budding yeast. Science. 326, 544-550.

Drotschmann, K., Clark, A.B., Tran, H.T., Resnick, M.A., Gordenin, D.A., Kunkel, T.A. 1999. Mutator phenotypes of yeast strains heterozygous for mutations in the MSH2 gene. P. Natl. Acad. Sci. USA. 96, 2970-2975.

Du, C., Fefelova, N., Caronna, J., He, L., Dooner, H.K. 2009. The polychromatic Helitron landscape of the maize genome. P. Natl. Acad. Sci. USA. 106, 19916-19920.

Dupressoir, A., Heidmann, T. 1996. Germ line-specific expression of intracisternal A-particle retrotransposons in transgenic mice. Mol. Cell. Biol. 16, 4495-4503.

Durand, P.M., Michod, R.E. 2010. Genomics in the light of evolutionary transitions. Evolution. 64, 1533-1540.

Eddy, S.R. 2012. The C-value paradox, junk DNA and ENCODE. Curr. Biol. 22, R898-R899.

El Baidouri, M., Panaud, O. 2013. Comparative genomic paleontology across plant kingdom reveals the dynamics of TE-driven genome evolution. Genome Biol. Evol. 5, 954-965.

Elliott, T.A., Gregory, T.R. 2015a. What's in a genome? The C-value enigma and the evolution of eukaryotic genome content. Phil. Trans. R. Soc. B. 370, 20140331.

Elliott, T.A., Gregory, T.R. 2015b. Do larger genomes contain more diverse transposable elements? BMC Evol. Biol. 15, 69.

Elliott, T.A., Linquist, S., Gregory, T.R. 2014. Conceptual and empirical challenges of ascribing functions to transposable elements. Am. Nat. 184, 14-24.

Elliott, T.A., Stage, D.E., Crease, T.J., Eickbush, T.H. 2013. In and out of the rRNA genes: characterization of Pokey elements in the sequenced Daphnia genome. Mob. DNA. 4, 20.

Engel, S.R., Cherry, J.M. 2013. The new modern era of yeast genomics: community sequencing and the resulting annotation of multiple Saccharomyces cerevisiae strains at the Saccharomyces Genome Database. Database. 2013, bat012.

Engels, W.R., Johnson-Schlitz, D.M., Eggleston, W.S., Sved, J. 1990. High-frequency P element loss in Drosophila is homolog dependent. Cell. 62, 515-525.

Ergün, S., Buschmann, C., Heukeshoven, J., Dammann, K., Schnieders, F., Lauke, H., Chalajour, F., Kilic, N., Strätling, W.H., Schumann, G.G. 2004. Cell type-specific expression of LINE-1 open reading frames 1 and 2 in fetal and adult human tissues. J. Biol. Chem. 279, 27753-27763.

Esnault, C., Maestre, J., Heidmann, T. 2000. Human LINE retrotransposons generate processed pseudogenes. Nat. Genet. 24, 363-367.

Esnault, C., Palavesam, A., Pilitt, K., O'Brochta, D.A. 2011. Intrinsic characteristics of neighboring DNA modulate transposable element activity in Drosophila melanogaster. Genetics. 187, 319-331.

Evans, J.D., Brown, S.J., Hackett, K.J.J., Robinson, G., Richards, S., Lawson, D., Elsik, C., Coddington, J., Edwards, O., Emrich, S., et al. 2013. The i5K initiative: Advancing arthropod genomics for knowledge, human health, agriculture, and the environment. J. Hered. 104, 595-600.

Fablet, M., Souames, S., Biémont, C., Vieira, C. 2007. Evolutionary pathways of the tirant LTR retrotransposon in the Drosophila melanogaster subgroup of species. J. Mol. Evol. 64, 438-447.

Felsenstein, J. 1985. Phylogenies and the comparative method. Am. Nat. 125, 1-15.

Feng, G., Leem, Y.-E., Levin, H.L. 2013. Transposon integration enhances expression of stress response genes. Nucl. Acids Res. 41, 775-789.

Feschotte, C. 2004. Merlin, a new superfamily of DNA transposons identified in diverse animal genomes and related to bacterial IS1016 insertion sequences. Mol. Biol. Evol. 21, 1769-1780.

Feschotte, C., Pritham, E.J. 2005. Non-mammalian c-integrases are encoded by giant transposable elements. Trends Genet. 21, 551-552.

Feschotte, C., Pritham, E.J. 2007. DNA transposons and the evolution of eukaryotic genomes. Annu. Rev. Genet. 41, 331-368.

Feschotte, C., Zhang, X., Wessler, S.R. 2002. Miniature inverted-repeat transposable elements and their relationship to established DNA transposons. In: Craig, N.L., Craigie, R., Gellert, M., Lambowitz, A.M. (eds), Mobile DNA II. American Society for Microbiology Press, Washington, DC, USA.

Fincham, J.R.S., Sastry, G.R.K. 1974. Controlling elements in maize. Annu. Rev. Genet. 8, 15-50.

Finnegan, D.J. 1989. Eukaryotic transposable elements and genome evolution. Trends Genet. 5, 103-107.

Fischer, M.G., Suttle, C.A. 2011. A virophage at the origin of large DNA transposons. Science. 332, 231-233.

Fischer, S.E.J., Wienholds, E., Plasterk, R.H.A. 2003. Continuous exchange of sequence information between dispersed Tc1 transposons in the Caenorhabditis elegans genome. Genetics. 164, 127-134.

Flavell, A.J., Pearce, S.R., Heslop-Harrison, J.S., Kumar, A. 1997. The evolution of Ty1-copia group retrotransposons in eukaryotic genomes. Genetica. 100, 185-195.

Flot, J.-F., Hespeels, B., Li, X., Noel, B., Arkhipova, I., Danchin, E.G.J., Hejnol, A., Henrissat, B., Koszul, R., Aury, J.-M., et al. 2013. Genomic evidence for ameiotic evolution in the bdelloid rotifer Adineta vaga. Nature. 500, 453-457.

Foster, T.J., Howe, T.G.B., Richmond, K.M.V. 1975. Translocation of the tetracycline resistance determinant from R100-1 to the Escherichia coli K-12 chromosome. J. Bacteriol. 124, 1153-1158.

Francis, J.C., Hansche, P.E. 1972. Directed evolution of metabolic pathways in microbial populations. I. modification of the acid phosphatase pH optimum in S. cerevisiae. Genetics. 70, 59-73.

Freeling, M. 1984. Plant transposable elements and insertion sequences. Annu. Rev. Plant Physiol. 35, 277-298.

Friar, J.L., Goldman, T., Perez-Mercader, J. 2012. Genome sizes and the Benford distribution. PLoS One. 7, e36624.

Fu, Y., Kawabe, A., Etcheverry, M., Ito, T., Toyoda, A., Fujiyama, A., Colot, V., Tarutani, Y., Kakutani, T. 2013. Mobilization of a plant transposon by expression of the transposon-encoded anti-silencing factor. The EMBO J. 32, 2407-2417.

Gao, D., Jimenez-Lopez, J.C., Iwata, A., Gill, N., Jackson, S.A. 2012. Functional and structural divergence of an unusual LTR retrotransposon family in plants. PLoS One. 7, e48595.

Gao, X., Hou, Y., Ebina, H., Levin, H.L., Voytas, D.F. 2008. Chromodomains direct integration of retrotransposons to heterochromatin. Genome Res. 18, 359-369.

Genome 10K Community of Scientists. 2009. Genome 10K: a proposal to obtain whole-genome sequence for 10 000 vertebrate species. J. Hered. 100, 659-674.

George, J.A., Traverse, K.L., DeBaryshe, P.G., Kelley, K.J., Pardue, M.-L. 2010. Evolution of diverse mechanisms for protecting chromosome ends by Drosophila TART telomere retrotransposons. P. Natl. Acad. Sci. USA. 107, 21052-21057.

GIGA Community of Scientists. 2014. The Global Invertebrate Genomics Alliance (GIGA): Developing community resources to study diverse invertebrate genomes. J. Hered. 105, 1-18.

Gilbert, C., Chateigner, A., Ernenwein, L., Barbe, V., Bézier, A., Herniou, E.A., Cordaux, R. 2014. Population genomics supports baculoviruses as vectors of horizontal transfer of insect transposons. Nat. Comm. 5, 3348.

Gilbert, C., Schaack, S., Pace II, J.K., Brindley, P.J., Feschotte, C. 2010. A role for host-parasite interactions in the horizontal transfer of transposons across phyla. Nature. 464, 1347-1350.

Gilbert, N., Labuda, D. 1999. CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs. P. Natl. Acad. Sci. USA. 96, 2869-2874.

Gladyshev, E.A., Arkhipova, I.R. 2010. Genome structure of bdelloid rotifers: shaped by asexuality or desiccation? J. Hered. 101, S85-S93.

Glayzer, D.C., Roberts, I.N., Archer, D.B., Oliver, R.P. 1995. The isolation of Ant1, a transposable element from Aspergillus niger. Mol. Gen. Genet. 249, 432-438.

Goddard, M.R., Godfray, H.C.J., Burt, A. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature. 434, 636-640.

Goff, S.A., Vaughn, M., McKay, S., Lyons, E., Stapleton, A.E., Gessler, D., Matasci, N., Wang, L., Hanlon, M., Lenards, A., et al. 2011. The iPlant collaborative: cyberinfrastructure for plant biology. Front. Plant Sci. 2, 34.

Gómez, E., Schulman, A.H., Martínez-Izquierdo, J.A., Vicient, C.M. 2006. Integrase diversity and transcription of the maize retrotransposon Grande. Genome. 49, 558-562.

Goodier, J.L., Ostertag, E.M., Kazazian, H.H. 2000. Transduction of 3'-flanking sequences is common in L1 retrotransposition. Human Mol. Genet. 9, 653-657.

Goodstein, D.M., Shu, S., Howson, R., Neupane, R., Hayes, R.D., Fazo, J., Mitros, T., Dirks, W., Hellsten, U., Putnam, N. et al. 2012. Phytozome: A comparative platform for green plant genomics. Nucl. Acids Res. 40, 1178-1186.

Goodwin, T.J.D., Butler, M.I., Poulter, R.T.M. 2003. Cryptons: a group of tyrosine-recombinase encoding DNA transposons from pathogenic fungi. Microbiology. 149, 3099-3109.

Goodwin, T.J.D., Poulter, R.T.M. 2001. The DIRS1 group of retrotransposons. Mol. Biol. Evol. 18, 2067-2082.

Goodwin, T.J.D., Poulter, R.T.M. 2004. A new group of tyrosine recombinase-encoding retrotransposons. Mol. Biol. Evol. 21, 746-759.

Gould, S.J. 1983. What Happens to Bodies if Genes Act for Themselves? Hen's Teeth and Horse's Toes. Norton, New York, USA.

Gould, S.J. 1991. The Case of the Creeping Fox Terrier Clone. Bully for Brontosaurus: Reflections in Natural History. Norton, New York, USA.

Gould, S.J. 2002. The Structure of Evolutionary Theory. Belknap Press, Cambridge, MA, USA.

Gould, S.J., Lewontin, R.C. 1979. The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist program. P. Roy. Soc. Lon. B. 205, 581-598.

Gould, S.J., Vrba, E.S. 1982. Exaptation - a missing term in the science of form. Paleobiol. 8, 4-15.

Granzotto, A., Lopes, F.R., Vieira, C., Carareto, C.M.A. 2011. Vertical inheritance and bursts of transposition have shaped the evolution of the BS non-LTR retrotransposon in Drosophila. Mol. Genet. Genomics. 286, 57-66.

Graur, D., Zheng, Y., Azevedo, R.B.R. 2015. An evolutionary classification of genomic function. Genome Biol. Evol. 7, 642-645.

Graur, D., Zheng, Y., Price, N., Azevedo, R.B.R., Zufall, R.A., Elhaik, E. 2013. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biol. Evol. 5, 578-590.

Green, M. 1969. Controlling element mediated transpositions of the white gene in Drosophila melanogaster. Genetics. 61, 429-441.

Gregory, T.R. 2000. Nucleotypic effects without nuclei: genome size and erythrocyte size in mammals. Genome. 43, 895-901.

Gregory, T.R. 2001. Coincidence, coevolution, or causation? DNA content, cell size, and the C-value enigma. Biol. Rev. 76, 65-101.

Gregory, T.R. 2002a. A bird's eye view of the C-value enigma: genome size, cell size, and metabolic rate in the class Aves. Evolution. 56, 121-130.

Gregory, T.R. 2002b. Genome size and developmental complexity. Genetica. 115, 131-146.

Gregory, T.R. 2004. Macroevolution, hierarchy theory, and the C-value enigma. Paleobiol. 30, 179-202.

Gregory, T.R. 2005. Synergy between sequence and size in large-scale genomics. Nat. Rev. Genet. 6, 699-708.

Gregory, T.R. 2016. Animal Genome Size Database. http://www.genomesize.com

Gregory, T.R., Hebert, P.D.N. 1999. The modulation of DNA content: proximate causes and ultimate consequences. Genome Res. 9, 317-324.

Gregory, T.R., Nicol, J.A., Tamm, H., Kullman, B., Kullman, K., Leitch, I.J., Murray, B.G., Kapraun, D.F., Greilhuber, J., Bennett, M.D. 2007. Eukaryotic genome size databases. Nucl. Acids Res. 35, D332-D338.

Gregory, T.R., Witt, J.D.S. 2008. Population size and genome size in fishes: a closer look. Genome. 51, 309-313.

Grenier, J.K., Arguello, J.R., Moreira, M.C., Gottipati, S., Mohammed, J., Hackett, S.R., Boughton, R., Greenberg, A.J., Clark, A.G. 2015. Global Diversity Lines - A five continent reference panel of sequenced Drosophila melanogaster strains. G3. 5, 593-603.

Guizard, S., Piégu, B., Arensburger, P., Guillou, F., Bigot, Y. 2016. Deep landscape update of dispersed and tandem repeats in the genome model of the red jungle fowl, Gallus gallus, using a series of de novo investigating tools. BMC Genomics. 17, 659.

Guo, X., Gao, J., Li, F., Wang, J. 2014. Evidence of horizontal transfer of non-autonomous Lep1 Helitrons facilitated by host-parasite interactions. Sci. Rep. 4, 5119.

Han, J.S. 2010. Non-long terminal repeat (non-LTR) retrotransposons: mechanisms, recent developments, and unanswered questions. Mob. DNA. 1, 15.

Harada, K., Yukuhiro, K., Mukai, T. 1990. Transposition rates of movable genetic elements in Drosophila melanogaster. P. Natl. Acad. Sci. USA. 87, 3248-3252.

Haussler, D., O'Brien, S.J., Ryder, O.A., Barker, F.K., Clamp, M., Crawford, A.J., Hanner, R., Hanotte, O., Johnson, W.E., McGuire, J.A., et al. 2009. Genome 10K: A proposal to obtain whole-genome sequence for 10 000 vertebrate species. J. Hered. 100, 659-674.

Havecker, E.R., Gao, X., Voytas, D.F. 2004. The diversity of LTR retrotransposons. Genome Biol. 5, 225.

Hedges, D.J., Deininger, P.L. 2007. Inviting instability: transposable elements, double-strand breaks, and the maintenance of genomic integrity. Mut. Res. 616, 46-59.

Hellen, E.H.B., Brookfield, J.F.Y. 2012. The diversity of class II transposable elements in mammalian genomes has arisen from ancestral phylogenetic splits during ancient waves of proliferation through the genome. Mol. Biol. Evol. 30, 100-108.

Hellsten, U., Harland, R.M., Gilchrist, M.J., Hendrix, D., Jurka, J., Kapitonov, V., Ovcharenko, I., Putnam, N.H., Shu, S., Taher, L., et al. 2010. The genome of the western clawed frog Xenopus tropicalis. Science. 328, 633-636.

Hickey, D.A. 1982. Selfish DNA: a sexually-transmitted nuclear parasite. Genetics. 101, 519-531.

Hickman, A.B., Chandler, M., Dyda, F. 2010. Integrating prokaryotes and eukaryotes: DNA transposases in light of structure. Crit. Rev. Biochem. Mol. Biol. 45, 50-69.

Higashiyama, T., Noutoshi, Y., Fujie, M., Yamada, T. 1997. Zepp, a LINE-like retrotransposon accumulated in the Chlorella telomeric region. The EMBO J. 16, 3715-3723.

Hillebrand, H. 2004. On the generality of the latitudinal gradient. Am. Nat. 163, 192-211.

Hoen, D.R., Hickey, G., Bourque, G., Casacuberta, J., Cordaux, R., Feschotte, C., Fiston-Lavier, A.-S., Hua-Van, A., Hubley, R., Kapusta, A. et al. 2015. A call for benchmarking transposable element annotation methods. Mob. DNA. 6, 13.

Hollister, J.D., Gaut, B.S. 2009. Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Res. 19, 1419-1428.

Holmquist, G. 1989. Evolution of chromosome bands: molecular ecology of noncoding DNA. J. Mol. Evol. 28, 469-486.

Holsinger, K.E., Weir, B.S. 2009. Genetics in geographically structured populations: defining, estimating and interpreting Fst. Nat. Rev. Genet. 10, 639-650.

Hou, Y., Lin, S. 2009. Distinct gene number-genome size relationships for eukaryotes and noneukaryotes: gene content estimation for dinoflagellate genomes. PLoS One. 4, e6978.

Houle, D., Nuzhdin, S.V. 2004. Mutation accumulation and the effect of copia insertions in Drosophila melanogaster. Genetical Res. 83, 7-18.

Hu, S., Ohtsubo, E., Davidson, N. 1975. Electron microscope heteroduplex studies of sequence relations among plasmids of Escherichia coli: structure of F13 and related F-primes. J. Bacteriol. 122, 749-763.

Hu, S., Ptashne, K., Cohen, S.N., Davidson, N. 1975. αβ sequence of F is IS3. J. Bacteriol. 123, 687-692.

Huang, J.T., Dooner, H.K. 2008. Macrotransposition and other complex chromosomal restructuring in maize by closely linked transposons in direct orientation. Plant Cell. 20, 2019-2032.

Hua-Van, A., Le Rouzic, A., Boutin, T.S., Filée, J., Capy, P. 2011. The struggle for life of the genome's selfish architects. Biol. Direct. 6, 19.

Hua-Van, A., Le Rouzic, A., Maisonhaute, C., Capy, P. 2005. Abundance, distribution and dynamics of retrotransposable elements and transposons: similarities and differences. Cytogenet. Genome Res. 110, 426-440.

Hudson, R.R., Kaplan, N.L. 1986. On the divergence of members of a transposable element family. J. Math. Biol. 24, 207-215.

i5K Consortium. 2013. The i5K Initiative: advancing arthropod genomics for knowledge, human health, agriculture, and the environment. J. Hered. 104, 595-600.

Ilyin, Y.V., Tchurikov, N.A., Ananiev, E.A., Ryskov, A.P., Yenikolopov, G.N., Limborska, S.A., Maleeva, N.E., Gvozdev, V.A., Georgiev, G.P. 1977. Studies on the DNA fragments of mammals and Drosophila containing structural genes and adjacent sequences. Cold Spring Harbor Symp. Quant. Biol. 42, 959-969.

International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature. 409, 860-921.

Izsvák, Z., Wang, Y., Ivics, Z. 2009. Interactions of transposons with the cellular DNA repair machinery. In: Lankenau, D.-H., Volff, J.-N. (eds), Transposons and the Dynamic Genome. Springer-Verlag, Berlin/Heidelberg, Germany.

Jablonski, D. 2008. Species selection: theory and data. Annu. Rev. Ecol. Evol. Syst. 39, 501-524.

Jain, H.K. 1980. Incidental DNA. Nature. 288, 647-658.

Jameson, N., Georgelis, N., Fouladbash, E., Martens, S., Hannah, L.C., Lal, S. 2008. Helitron mediated amplification of cytochrome P450 monooxygenase gene in maize. Plant Mol. Biol. 67, 295-304.

Janicki, M., Rooke, R., Yang, G. 2011. Bioinformatics and genomic analysis of transposable elements in eukaryotic genomes. Chromosome Res. 19, 787-808.

Jiang, B., Lou, Q.-F., Wang, D., Wu, Z.-M., Zhang, W.-P., Chen, J.-F. 2011. Allopolyploidization induced the activation of Ty1-copia retrotransposons in Cucumis hytivus, a newly formed Cucumis allotetraploid. Bot. Studies. 52, 145-152.

Jordan, E., Saedler, H., Starlinger, P. 1968. Oo and strong polar mutations in the gal operon are insertions. Mol. Gen. Genet. 102, 353-363.

Jordan, I.K., McDonald, J.F. 1999. Tempo and mode of Ty element evolution in Saccharomyces cerevisiae. Genetics. 151, 1341-1351.

Jordan, I.K., Miller, W.J. 2009. Genome defense against transposable elements and the origins of regulatory RNA. In: Lankenau, D.-H., Volff, J.-N. (eds), Transposons and the Dynamic Genome. Springer-Verlag, Berlin/Heidelberg, Germany.

Jordan, I.K., Rogozin, I.B., Glazko, G.V., Koonin, E.V. 2003. Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet. 19, 68-72.

Jungck, J.R. 1982. Is the Neo-Darwinian synthesis robust enough to withstand the challenge of recent discoveries in molecular biology and molecular evolution? Proc. Bienn. Phil. Sci. Assoc. 1982, 322-328.

Jurka, J., Bao, W., Kojima, K.K. 2011. Families of transposable elements, population structure and the origin of species. Biol. Direct. 6, 44.

Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., Walichiewicz, J. 2005. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462-467.

Kaeppler, S.M., Kaeppler, H.F., Rhee, Y. 2000. Epigenetic aspects of somaclonal variation in plants. Plant Mol. Biol. 43, 179-188.

Kalendar, R., Vicient, C.M., Peleg, O., Anamthawat-Jonsson, K., Bolshoy, A., Schulman, A.H. 2004. Large retrotransposon derivatives: abundant, conserved but nonautonomous retroelements of barley and related genomes. Genetics. 166, 1437-1450.

Kamiya, T., O'Dwyer, K., Nakagawa, S., Poulin, R. 2014. What determines species richness of parasitic organisms? A meta-analysis across animal, plant and fungal hosts. Biol. Rev. 89, 123-134.

Kano, H., Godoy, I., Courtney, C., Vetter, M.R., Gerton, G.L., Ostertag, E.M., Kazazian, H.H. 2009. L1 retrotransposition occurs mainly in embryogenesis and creates somatic mosaicism. Genes Dev. 23, 1303-1312.

Kapitonov, V.V., Jurka, J. 2007. Helitrons on a roll: eukaryotic rolling-circle transposons. Trends Genet. 23, 521-529.

Kapitonov, V.V., Jurka, J. 2001. Rolling-circle transposons in eukaryotes. P. Natl. Acad. Sci. USA. 98, 8714-8719.

Kapitonov, V.V., Jurka, J. 2004. Harbinger transposons and an ancient HARBI1 gene derived from a transposase. DNA Cell Biol. 23, 311-324.

Kapitonov, V.V., Jurka, J. 2006. Self-synthesizing DNA transposons in eukaryotes. P. Natl. Acad. Sci. USA. 103, 4540-4545.

Kapitonov, V.V., Koonin, E.V. 2015. Evolution of the RAG1-RAG2 locus: both proteins came from the same transposon. Biol. Direct. 10, 20.

Kapitonov, V.V., Tempel, S., Jurka, J. 2009. Simple and fast classification of non-LTR retrotransposons based on phylogeny of their RT domain protein sequences. Gene. 448, 207-213.

Kawecki, T.J., Lenski, R.E., Ebert, D., Hollis, B., Olivieri, I., Whitlock, M.C. 2012. Experimental evolution. Trends Ecol. Evol. 27, 547-560.

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G.E., Dekker, J., et al. 2014. Defining functional DNA elements in the human genome. P. Natl. Acad. Sci. USA. 111, 6131-6138.

Kidwell, M.G. 2002. Transposable elements and the evolution of genome size in eukaryotes. Genetica. 115, 49-63.

Kidwell, M.G., Lisch, D.R. 1997. Transposable elements as sources of variation in animals and plants. P. Natl. Acad. Sci. USA. 94, 7704-7711.

Kidwell, M.G., Lisch, D.R. 2000. Transposable elements and host genome evolution. Trends Ecol. Evol. 15, 95-99.

Kidwell, M.G., Lisch, D.R. 2001. Perspective: transposable elements, parasitic DNA, and genome evolution. Evolution. 55, 1-24.

Kijima, T.E., Innan, H. 2013. Population genetics and molecular evolution of DNA sequences in transposable elements. I. A simulation framework. Genetics. 195, 957-967.

Kim, C., Rubin, C.M., Schmid, C.W. 2001. Genome-wide chromatin remodeling modulates the Alu heat shock response. Gene. 276, 127-133.

Kleckner, N. 1977. Translocatable elements in procaryotes. Cell. 11, 11-23.

Kofler, R., Nolte, V., Schlötterer, C. 2015. Tempo and mode of transposable element activity in Drosophila. PLoS Genet. 11, e1005406.

Kojima, K.K., Fujiwara, H. 2003. Evolution of target specificity in R1 clade non-LTR retrotransposons. Mol. Biol. Evol. 20, 351-361.

Kojima, K.K., Fujiwara, H. 2005. Long-term inheritance of the 28S rDNA-specific retrotransposon R2. Mol. Biol. Evol. 22, 2157-2165.

Kojima, K.K., Jurka, J. 2011. Crypton transposons: identification of new diverse families and ancient domestication events. Mob. DNA. 2, 12.

Kojima, K.K., Jurka, J. 2013. A superfamily of DNA transposons targeting multicopy small RNA genes. PLoS One. 8, e68260.

Koonin, E.V., Dolja, V.V. 2014. Virus world as an evolutionary network of viruses and capsidless selfish elements. Micro. Mol. Biol. Rev. 78, 278-303.

Kovach, A., Wegrzyn, J.L., Parra, G., Holt, C., Bruening, G.E., Loopstra, C.A., Hartigan, J., Yandell, M., Langley, C.H., Korf, I., Neale, D.B. 2010. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences. BMC Genomics. 11, 420.

Kozlowski, J., Konarzewski, M., Gawelczyk, A.T. 2003. Cell size as a link between noncoding DNA and metabolic rate scaling. P. Natl. Acad. Sci. USA. 100, 14080-14085.

Kraaijeveld, K., Zwanenburg, B., Hubert, B., Vieira, C., De Pater, S., Van Alphen, J.J.M., Den Dunnen, J.T., De Knijff, P. 2012. Transposon proliferation in an asexual parasitoid. Mol. Ecol. 21, 3898-3906.

Kramerov, D.A., Vassetzky, N.S. 2011. Origin and evolution of SINEs in eukaryotic genomes. Heredity. 107, 487-495.

Krishnaswamy, L., Zhang, J., Peterson, T. 2010. Fusion of reverse-oriented Ds termini following abortive transposition in Arabidopsis: implications for the mechanism of Ac/Ds transposition. Plant Cell Rep. 29, 413-417.

Krylov, D.M., Koonin, E.V. 2001. A novel family of predicted retroviral-like aspartyl proteases with a possible key role in eukaryotic cell cycle control. Curr. Biol. 11, R584-R587.

Kullman, B., Tamm, H., Kullman, K. 2005. Fungal Genome Size Database. http://www.zbi.ee/fungal-genomesize

Kuraku, S., Qiu, H., Meyer, A. 2012. Horizontal transfers of Tc1 elements between teleost fishes and their vertebrate parasites, lampreys. Genome Biol. Evol. 4, 817-824.

Labrador, M., Corces, V.G. 1997. Transposable element-host interactions: regulation of insertion and excision. Annu. Rev. Genet. 31, 381-404.

Laha, T., Loukas, A., Wattanasatitarpa, S., Somprakhon, J., Kewgrai, N., Sithithaworn, P., Kaewkes, S., Mitreva, M., Brindley, P.J. 2007. The bandit, a new DNA transposon from a hookworm - possible horizontal genetic transfer between host and parasite. PLoS Negl. Trop. Dis. 1, e35.

Lambowitz, A.M., Zimmerly, S. 2010. Group II introns: mobile ribozymes that invade DNA. Cold Spring Harb. Perspect. Biol. 3, 8.

Lampe, D.J., Walden, K.K.O., Robertson, H.M. 2001. Loss of transposase-DNA interaction may underlie the divergence of mariner family transposable elements and the ability of more than one mariner to occupy the same genome. Mol. Biol. Evol. 18, 954-961.

Langley, C.H., Montgomery, E., Hudson, R., Kaplan, N., Charlesworth, B. 1988. On the role of unequal exchange in the containment of transposable element copy number. Genet. Res. 52, 223-235.

Law, J.A., Jacobsen, S.E. 2010. Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat. Rev. Genet. 11, 204-220.

Le Rouzic, A. 2002. Application du modèle des méta-populations à la dynamique des éléments transposables. MSc Thesis. DEA Biodiversité. Université Pierre et Marie Curie, Paris, France.

Le Rouzic, A., Capy, P. 2006. Population genetics models of competition between transposable element subfamilies. Genetics. 174, 785-793.

Le Rouzic, A., Capy, P. 2009. Theoretical approaches to the dynamics of transposable elements in genomes, populations, and species. In: Lankenau, D.-H., Volff, J.-N. (eds.), Transposons and the Dynamic Genome. Springer-Verlag, Berlin/Heidelberg, Germany.

Le Rouzic, A., Deceliere, G. 2005. Models of the population genetics of transposable elements. Genet. Res. 85, 171-181.

Le Rouzic, A., Dupas, S., Capy, P. 2007. Genome ecosystem and transposable elements species. Gene. 390, 214-220.

Leonardo, T.E., Nuzhdin, S.V. 2002. Intracellular battlegrounds: conflict and cooperation between transposable elements. Genet. Res. 80, 155-161.

Lerat, E., Capy, P. 1999. Retrotransposons and retroviruses: analysis of the envelope gene. Mol. Biol. Evol. 16, 1198-1207.

Levin, H.L. 1995. A novel mechanism of self-primed reverse transcription defines a new family of retroelements. Mol. Cell. Biol. 15, 3310-3317.

Levin, H.L., Moran, J.V. 2011. Dynamic interactions between transposable elements and their hosts. Nat. Rev. Genet. 12, 615-627.

Lewontin, R.C. 1970. The units of selection. Annu. Rev. Ecol. Evol. Syst. 1, 1-18.

Li, Y., Dooner, H.K. 2009. Excision of Helitron transposons in maize. Genetics. 182, 399-402.

Linquist, S., Saylor, B., Cottenie, K., Elliott, T.A., Kremer, S.C., Gregory, T.R. 2013. Distinguishing ecological from evolutionary approaches to transposable elements. Biol. Rev. 88, 573-584.

Linquist, S., Cottenie, K., Elliott, T.A., Saylor, B., Kremer, S.C., Gregory, T.R. 2015. Applying ecological models to communities of genetic elements: the case of Neutral Theory. Mol. Ecol. 24, 3232-3242.

Lippman, Z., Gendrel, A.-V., Black, M., Vaughn, M.W., Dedhia, N., McCombie, W.R., Lavine, K.L., Mittal, V., May, B., Kasschau, K.D. et al. 2004. Role of transposable elements in heterochromatin and epigenetic control. Nature. 430, 471-476.

Lisch, D. 2012. How important are transposons for plant evolution? Nat. Rev. Genet. 14, 49-61.

List of sequenced protist genomes. 2015. Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=List_of_sequenced_protist_genomes&oldid=719467044

Liu, W., Thummasuwan, S., Sehgal, S.K., Chouvarine, P., Peterson, D.G. 2011. Characterization of the genome of bald cypress. BMC Genomics. 12, 553.

Llorens, C., Fares, M.A., Moya, A. 2008. Relationships of gag-pol diversity between Ty3/Gypsy and Retroviridae LTR retroelements and the three kings hypothesis. BMC Evol. Biol. 8, 276.

Llorens, C., Futami, R., Covelli, L., Dominguez-Escriba, L., Viu, J.M., Tamarit, D., Aguilar-Rodriguez, J., Vicente-Ripolles, M., Fuster, G., Bernet, G.P. et al. 2011. The Gypsy database (GyDB) of mobile genetic elements: release 2.0. Nucl. Acids Res. 39, D70-D74.

Llorens, C., Muñoz-Pomer, A., Bernad, L., Botella, H., Moya, A. 2009. Network dynamics of eukaryote LTR retroelements beyond phylogenetic trees. Biol. Direct. 4, 41.

Lohe, A.R., Hartl, D.L. 1996. Autoregulation of mariner transposase activity by overproduction and dominant-negative complementation. Mol. Biol. Evol. 13, 549-555.

Lomolino, M. 2000. Ecology's most general, yet protean pattern: The species-area relationship. J. Biogeo. 27, 17-26.

Lorenzi, H.A., Robledo, G., Levin, M.J. 2006. The VIPER elements of trypanosomes constitute a novel group of tyrosine recombinase-encoding retrotransposons. Mol. Biochem. Parasitol. 145, 184-194.

Loreto, E.L.S., Carareto, C.M.A., Capy, P. 2008. Revisiting horizontal transfer of transposable elements in Drosophila. Heredity. 100, 545-554.

Luan, D.D., Korman, M.H., Jakubczak, J.L., Eickbush, T.H. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell. 72, 595-605.

Lukic, S., Chen, K. 2011. Human piRNAs are under selection in Africans and repress transposable elements. Mol. Biol. Evol. 28, 3061-3067.

Lynch, M. 2007. The Origins of Genome Architecture. Sinauer Associates Inc., Sunderland, MA, USA.

Lynch, M., Conery, J.S. 2003. The origins of genome complexity. Science. 302, 1401-1404.

Lynch, M., Force, A. 2000. The probability of duplicate gene preservation by subfunctionalization. Genetics. 154, 459-473.

Ma, J., Bennetzen, J.L. 2004. Rapid recent growth and divergence of rice nuclear genomes. P. Natl. Acad. Sci. USA. 101, 12404-12410.

Maddison, D.R., Schulz, K.-S. (eds.) 2007. The Tree of Life Web Project. http://tolweb.org

Maddison, W.P., Maddison, D.R. 2008. MESQUITE: a modular system for evolutionary analysis. http://mesquiteproject.org

Madlung, A., Comai, L. 2004. The effect of stress on genome regulation and structure. Ann. Bot. 94, 481-495.

Magurran, A.E. 1988. Ecological Diversity and its Measurement. Croom Helm Ltd., London, UK.

Malik, H.S. 2005. Ribonuclease H evolution in retrotransposable elements. Cytogenet. Genome Res. 110, 392-401.

Malik, H.S., Eickbush, T.H. 2001. Phylogenetic analysis of Ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses. Genome Res. 11, 1187-1197.

Malik, H.S., Burke, W.D., Eickbush, T.H. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16, 793-805.

Marín, I. 2010. GIN transposons: genetic elements linking retrotransposons and genes. Mol. Biol. Evol. 27, 1903-1911.

Marquez, C.P., Pritham, E.J. 2010. Phantom, a new subclass of Mutator DNA transposons found in insect viruses and widely distributed in animals. Genetics. 185, 1507-1517.

Martin, S. 2006. The ORF1 protein encoded by LINE-1: structure and function during L1 retrotransposition. J. Biomed. Biotech. 2006, 45621.

Maslow, A.H. 1962. Toward a Psychology of Being. D. van Nostrand, Princeton, NJ, USA.

Matsunaga, W., Kobayashi, A., Kato, A., Ito, H. 2012. The effects of heat induction and the siRNA biogenesis pathway on the transgenerational transposition of ONSEN, a copia-like retrotransposon in Arabidopsis thaliana. Plant Cell Physiol. 53, 824-833.

Matzke, M.A., Mette, M.F., Aufsatz, W., Jakowitsch, J., Matzke, A.J.M. 1999. Host defenses to parasitic sequences and the evolution of epigenetic control mechanisms. Genetica. 107, 271-287.

Mauricio, R. 2005. Can ecology help genomics: the genome as ecosystem? Genetica. 123, 205-209.

Maze, I., Feng, J., Wilkinson, M.B., Sun, H., Shen, L., Nestler, E.J. 2011. Cocaine dynamically regulates heterochromatin and repetitive element unsilencing in nucleus accumbens. P. Natl. Acad. Sci. USA. 108, 3035-3040.

McClintock, B. 1946. Maize genetics. Carnegie Inst. Wash. 45, 176-186.

McClintock, B. 1947. Cytogenetic studies of maize and Neurospora. Carnegie Inst. Wash. 46, 146-152.

McClintock, B. 1961. Some parallels between gene control systems in maize and in bacteria. Am. Nat. 95, 265-277.

McClintock, B. 1984. The significance of responses of the genome to challenge. Science. 226, 792-801.

McCue, A.D., Nuthikattu, S., Slotkin, R.K. 2013. Genome-wide identification of genes regulated in trans by transposable element small interfering RNAs. RNA Biol. 10, 1-17.

McCutcheon, J., von Dohlen, C. 2011. An interdependent metabolic patchwork in the nested symbiosis of mealybugs. Curr. Biol. 21, 1366-1372.

McDonald, J.F. 1990. Macroevolution and retroviral elements. BioSci. 40, 183-191.

McDonald, M.J., Hsieh, Y.-Y., Yu, Y.-H., Chang, S.-L., Leu, J.-Y. 2012. The evolution of low mutation rates in experimental mutator populations of Saccharomyces cerevisiae. Curr. Biol. 22, 1-6.

McFadden, J., Knowles, G. 1997. Escape from evolutionary stasis by transposon-mediated deleterious mutations. J. Theor. Biol. 186, 441-447.

McVicker, G., Green, P. 2010. Genomic signatures of germline gene expression. Genome Res. 20, 1503-1511.

Menees, T.M., Sandmeyer, S.B. 1994. Transposition of the yeast retrovirus-like element Ty3 is dependent on the cell cycle. Mol. Cell. Biol. 14, 8229-8240.

Middleton, C.P., Stein, N., Keller, B., Kilian, B., Wicker, T. 2013. Comparative analysis of genome composition in Triticeae reveals strong variation in transposable element dynamics and nucleotide diversity. Plant J. 73, 347-356.

Midford, P.E., Garland, T., Maddison, W.P. 2011. PDAP, PDTREE package for Mesquite, v. 1.16. http://mesquiteproject.org/pdap_mesquite

Miller, W.J., Hagemann, S., Reiter, E., Pinsker, W. 1992. P-element homologous sequences are tandemly repeated in the genome of Drosophila guanche. P. Natl. Acad. Sci. USA. 89, 4018-4022.

Mills, R.E., Bennett, E.A., Iskow, R.C., Devine, S.E. 2007. Which transposable elements are active in the human genome? Trends Genet. 23, 183-191.

Mirsky, A.E., Ris, H. 1951. The deoxyribonucleic acid content of animal cells and its evolutionary significance. J. Gen. Physiol. 34, 451-462.

Modolo, L., Picard, F., Lerat, E. 2014. A new genome-wide method to track horizontally transferred sequences: Application to Drosophila. Genome Biol. Evol. 6, 416-432.

Montiel, E.E., Ruiz-Ruano, F.J., Cabrero, J., Marchal, J.A., Sánchez, A., Perfectti, F., López-León, M.D., Camacho, J.P.M. 2015. Intragenomic distribution of RTE retroelements suggests intrachromosomal movement. Chromosome Res. 23, 211-223.

Morgan, M.T. 2001. Transposable element number in mixed mating populations. Genet. Res. 77, 261-275.

Mota, N.R., Ludwig, A., da Silva Valente, V.L., Loreto, E.L.S. 2010. harrow: new Drosophila hAT transposons involved in horizontal transfer. Insect Mol. Biol. 19, 217-228.

Muotri, A.R., Zhao, C., Marchetto, M.C.N., Gage, F.H. 2009. Environmental influence on L1 retrotransposons in the adult hippocampus. Hippocampus. 19, 1002-1007.

Naito, K., Zhang, F., Tsukiyama, T., Saito, H., Hancock, C.N., Richardson, A.O., Okumoto, Y., Tanisaka, T., Wessler, S.R. 2009. Unexpected consequences of a sudden and massive transposon amplification on rice gene expression. Nature. 461, 1130-1134.

Nellåker, C., Keane, T.M., Yalcin, B., Wong, K., Agam, A., Belgard, T.G., Flint, J., Adams, D.J., Frankel, W.N., Ponting, C.P. 2012. The genomic landscape shaped by selection on transposable elements across 18 mouse strains. Genome Biol. 13, R45.

Nevers, P., Saedler, H. 1977. Transposable genetic elements as agents of gene instability and chromosomal rearrangements. Nature. 268, 109-115.

Nosaka, M., Itoh, J.-I., Nagato, Y., Ono, A., Ishiwata, A., Sato, Y. 2012. Role of transposon-derived small RNAs in the interplay between genomes and parasitic DNA in rice. PLoS Genet. 8, e1002953.

Novick, P.A., Smith, J.D., Floumanhaft, M., Ray, D.A., Boissinot, S. 2011. The evolution and diversity of DNA transposons in the genome of the lizard Anolis carolinensis. Genome Biol. Evol. 3, 1-14.

Novikov, A., Smyshlyaev, G., Novikova, O. 2012. Evolutionary history of LTR retrotransposon chromodomains in plants. Int. J. Plant Genomics. 2012, 874743.

Novikova, O.S. 2010. Diversity and evolution of LTR retrotransposons in the genome of Phanerochaete chrysosporium (Fungi: Basidiomycota). Russ. J. Genet. 46, 637-644.

Nowacki, M., Higgins, B.P., Maquilan, G.M., Swart, E.C., Doak, T.G., Landweber, L.F. 2009. A functional role for transposases in a large eukaryotic genome. Science. 324, 935-938.

Nowacki, M., Vijayan, V., Zhou, Y., Schotanus, K., Doak, T.G., Landweber, L.F. 2008. RNA-mediated epigenetic programming of a genome-rearrangement pathway. Nature. 451, 153-159.

Nuzhdin, S.V., Pasyukova, E.G., Mackay, T.F. 1997. Accumulation of transposable elements in laboratory lines of Drosophila melanogaster. Genetica. 100, 167-175.

O'Donnell, K.A., Burns, K.H. 2010. Mobilizing diversity: transposable element insertions in genetic variation and disease. Mob. DNA. 1, 21.

Ohno, S. 1972. So much "junk" DNA in our genome. In: Smith, H.H. (ed.), Evolution of Genetic Systems. Gordon and Breach, New York, USA.

Ohno, S. 1981. (AGCTG)(AGCTG)(AGCTG)(GGGTG) as the primordial sequence of intergenic switches: the role in immunoglobulin class switch. Differentiation. 18, 65-74.

Ohta, T. 1984. Population genetics of transposable elements. J. Math. Appl. Med. Biol. 1, 17-29.

Okada, N., Hamada, M., Ogiwara, I., Oshima, K. 1997. SINEs and LINEs share common 3' sequences: a review. Gene. 205, 229-243.

Oki, N., Yano, K., Okumoto, Y., Tsukiyama, T., Teraishi, M., Tanisaka, T. 2008. A genome-wide view of miniature inverted-repeat transposable elements (MITEs) in rice, Oryza sativa ssp. japonica. Genes Genet. Syst. 83, 321-329.

Oksanen, J., Blanchet, F.G., Kindt, R., Legendre, P., Minchin, P.R., O'Hara, R.B., Simpson, G.L., Solymos, P., Stevens, M.H.H., Wagner, H. 2015. vegan: Community Ecology Package. R package version 2.3-2. http://CRAN.R-project.org/package=vegan

Oliver, K.R., Greene, W.K. 2009. Transposable elements: powerful facilitators of evolution. BioEssays. 31, 703-714.

Oliver, K.R., Greene, W.K. 2011. Mobile DNA and the TE-Thrust hypothesis: supporting evidence from the primates. Mob. DNA. 2, 8.

Orgel, L.E., Crick, F.H.C. 1980. Selfish DNA: the ultimate parasite. Nature. 284, 604-607.

Orgel, L.E., Crick, F.H.C., Sapienza, C. 1980. Selfish DNA. Nature. 288, 645-646.

Östergren, G. 1945. Parasitic nature of extra fragment chromosomes. Bot. Notiser. 2, 157-163.

Ovchinnikov, I., Troxel, A.B., Swergold, G.D. 2001. Genomic characterization of recent human LINE-1 insertions: evidence supporting random insertion. Genome Res. 11, 2050-2058.

Palazzo, A.F., Gregory, T.R. 2014. The case for junk DNA. PLoS Genet. 10, e1004351.

Palazzo, A.F., Lee, E.S. 2015. Non-coding RNA: what is functional and what is junk? Front. Genet. 6, 2.

Paradis, E., Claude, J., Strimmer, K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 20, 289-290.

Pardue, M.-L., DeBaryshe, P.G. 2011. Retrotransposons that maintain chromosome ends. P. Natl. Acad. Sci. USA. 108, 20317-20324.

Parekh, R.N., Shaw, M.R., Wittrup, K.D. 1996. An integrating vector for tunable, high copy, stable integration into the dispersed Ty delta sites of Saccharomyces cerevisiae. Biotech. Prog. 12, 16-21.

Pasyukova, E., Nuzhdin, S., Li, W., Flavell, A.J. 1997. Germ line transposition of the copia retrotransposon in Drosophila melanogaster is restricted to males by tissue-specific control of copia RNA levels. Mol. Gen. Genet. 255, 115-124.

Pasyukova, E.G., Nuzhdin, S.V., Morozova, T.V., Mackay, T.F.C. 2004. Accumulation of transposable elements in the genome of Drosophila melanogaster is associated with a decrease in fitness. J. Hered. 95, 284-290.

Pelak, K., Shianna, K.V., Ge, D., Maia, J.M., Zhu, M., Smith, J.P., Cirulli, E.T., Fellay, J., Dickson, S.P., Gumbs, C.E., et al. 2010. The characterization of twenty sequenced human genomes. PLoS Genet. 6, e1001111.

Pellicer, J., Fay, M.F., Leitch, I.J. 2010. The largest eukaryotic genome of them all? Bot. J. Linn. Soc. 164, 10-15.

Penton, E.H., Sullender, B.W., Crease, T.J. 2002. Pokey, a new DNA transposon in Daphnia (Cladocera:Crustacea). J. Mol. Evol. 55, 664-673.

Pepper, J.W., Findlay, C.S., Kassen, R., Spencer, S.L., Maley, C.C. 2009. Cancer research meets evolutionary biology. Evol. Appl. 2, 62-70.

Petersen, G., Seberg, O. 2009. Stowaway MITEs in Hordeum (Poaceae): evolutionary history, ancestral elements and classification. Cladistics. 25, 1-11.

Peterson, D.G., Schulze, S.R., Sciara, E.B., Lee, S.A., Bowers, J.E., Nagel, A., Jiang, N., Tibbitts, D.C., Wessler, S.R., Paterson, A.H. 2002. Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery. Genome Res. 12, 795-807.

Peterson, P.A. 1970. Controlling elements and mutable loci in maize: their relationship to bacterial episomes. Genetica. 41, 33-56.

Peterson, P.A. 2002. Early beginnings of mobile element studies: controlling elements vs gene inserts, pre-molecular concepts. Maydica. 47, 147-167.

Petrov, D.A., Aminetzach, Y.T., Davis, J.C., Bensasson, D., Hirsh, A.E. 2003. Size matters: non-LTR retrotransposable elements and ectopic recombination in Drosophila. Mol. Biol. Evol. 20, 880-892.

Pidpala, O.V., Yatsishina, A.P., Lukash, L.L. 2008. Human mobile genetic elements: structure, distribution and functional role. Cytol. Genet. 42, 420-430.

Piednoël, M., Bonnivard, E. 2009. DIRS1-like retrotransposons are widely distributed among Decapoda and are particularly present in hydrothermal vent organisms. BMC Evol. Biol. 9, 86.

Piégu, B., Bire, S., Arensburger, P., Bigot, Y. 2015. A survey of transposable element classification systems - a call for a fundamental update to meet the challenge of their diversity and complexity. Mol. Phylo. Evol. 86, 90-109.

Piriyapongsa, J., Mariño-Ramírez, L., Jordan, I.K. 2007. Origin and evolution of human microRNAs from transposable elements. Genetics. 176, 1323-1337.

Piskurek, O., Okada, N. 2007. Poxviruses as possible vectors for horizontal transfer of retroposons from reptiles to mammals. P. Natl. Acad. Sci. USA. 104, 12046-12051.

Platt, R.N., Blanco-Berdugo, L., Ray, D.A. 2016. Accurate transposable element annotation is vital when analyzing new genome assemblies. Genome Biol. Evol. 8, 403-410.

Poulter, R.T., Butler, M.I. 2015. Tyrosine recombinase retrotransposons and transposons. In: Craig, N.L., Chandler, M., Gellert, M., Lambowitz, A.M., Rice, P.A., Sandmeyer, S.B. (eds.), Mobile DNA III. American Society for Microbiology Press, Washington, DC, USA.

Pritham, E.J., Putliwala, T., Feschotte, C. 2007. Mavericks, a novel class of giant transposable elements widespread in eukaryotes and related to DNA viruses. Gene. 390, 3-17.

Prokopowich, C.D., Gregory, T.R., Crease, T.J. 2003. The correlation between rDNA copy number and genome size in eukaryotes. Genome. 46, 48-50.

Purvis, A., Garland, T. 1993. Polytomies in comparative analyses of continuous characters. Syst. Biol. 42, 569-575.

Quesneville, H., Anxolabéhère, D. 1998. Dynamics of transposable elements in metapopulations: a model of P element invasion in Drosophila. Theor. Pop. Biol. 54, 175-193.

R Core Team 2015. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org

Ragupathy, R., You, F.M., Cloutier, S. 2013. Arguments for standardizing transposable element annotation in plant genomes. Trends Plant Sci. 18, 367-376.

Ramallo, E., Kalendar, R., Schulman, A.H., Martínez-Izquierdo, J.A. 2008. Reme1, a copia retrotransposon in melon, is transcriptionally induced by UV light. Plant Mol. Biol. 66, 137-150.

Rampant, P.F., Lesur, I., Boussardon, C., Bitton, F., Martin-Magniette, M.-L., Bodenes, C., Le Provost, G., Berges, H., Fluch, S., Kremer, A., Plomion, C. 2011. Analysis of BAC end sequences in oak, a keystone forest tree species, providing insight into the composition of its genome. BMC Genomics. 12, 292.

Rashkova, S., Karam, S.E., Kellum, R., Pardue, M.-L. 2002. Gag proteins of the two Drosophila telomeric retrotransposons are targeted to chromosome ends. J. Cell Biol. 159, 397-402.

Ratcliff, W.C., Denison, R.F., Borrello, M., Travisano, M. 2012. Experimental evolution of multicellularity. P. Natl. Acad. Sci. USA. 109, 1595-1600.

Ray, D.A., Feschotte, C., Pagan, H.J.T., Smith, J.D., Pritham, E.J., Arensburger, P., Atkinson, P.W., Craig, N.L. 2008. Multiple waves of recent DNA transposon activity in the bat, Myotis lucifugus. Genome Res. 18, 717-728.

Ray, D.A., Pagan, H.J.T., Thompson, M.L., Stevens, R.D. 2007. Bats with hATs: evidence for recent DNA transposon activity in genus Myotis. Mol. Biol. Evol. 24, 632-639.

Rebollo, R., Horard, B., Hubert, B., Vieira, C. 2010. Jumping genes and epigenetics: towards new species. Gene. 454, 1-7.

Reddy, T.B.K., Thomas, A.D., Stamatis, D., Bertsch, J., Isbandi, M., Jansson, J., Mallajosyula, J., Pagani, I., Lobos, E.A., Kyrpides, N.C. 2015. The Genomes OnLine Database (GOLD) v. 5: A metadata management system based on a four level (meta)genome project classification. Nucl. Acids Res. 43, D1099-D1106.

Renfree, M.B., Papenfuss, A.T., Deakin, J.E., Lindsay, J., Heider, T., Belov, K., Rens, W., Waters, P.D., Pharo, E.A., Shaw, G., et al. 2011. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development. Genome Biol. 12, R81.

Rep, M., van de Does, H.C., Cornelissen, B.J.C. 2005. Drifter, a novel, low copy hAT-like transposon in Fusarium oxysporum is activated during starvation. Fungal Genet. Biol. 42, 546-553.

Revell, L.J. 2012. phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217-223.

Rho, M., Schaack, S., Gao, X., Kim, S., Lynch, M., Tang, H. 2010. LTR elements in the genome of Daphnia pulex. BMC Genomics. 11, 425.

Roeder, G.S., Fink, G.R. 1980. DNA rearrangements associated with a transposable element in yeast. Cell. 21, 239-249.

Robillard, E., Le Rouzic, A., Zhang, Z., Capy, P., Hua-Van, A. 2016. Experimental evolution reveals hyperparasitic interactions among transposable elements. P. Natl. Acad. Sci. USA. 10.1073/pnas.1524143113.

Rolland, J., Condamine, F.L., Jiguet, F., Morlon, H. 2014. Faster speciation and reduced extinction in the tropics contribute to the mammalian latitudinal diversity gradient. PLoS Biol. 12, e1001775.

Rosenzweig, M.L. 1992. Species diversity gradients: we know more and less than we thought. J. Mammalogy. 73, 715-730.

Rubin, E., Levy, A.A. 1997. Abortive gap repair: underlying mechanism for Ds element formation. Mol. Cell. Biol. 17, 6294-6302.

Sacerdot, C., Mercier, G., Todeschini, A.L., Dutreix, M., Springer, M., Lesage, P. 2005. Impact of ionizing radiation on the life cycle of Saccharomyces cerevisiae Ty1 retrotransposon. Yeast. 22, 441-445.

Saedler, H., Heiss, B. 1973. Multiple copies of the insertion-DNA sequences IS1 and IS2 in the chromosome of E. coli K-12. Mol. Gen. Genet. 122, 267-277.

Sandmeyer, S., Patterson, K., Bilanchone, V. 2015. Ty3, a position-specific retrotransposon in budding yeast. In: Craig, N.L., Chandler, M., Gellert, M., Lambowitz, A.M., Rice, P.A., Sandmeyer, S.B. (eds.), Mobile DNA III. American Society for Microbiology Press, Washington, DC, USA.

Sapienza, C., Doolittle, W.F. 1981. Genes are things you have whether you want them or not. Cold Spring Harbor Symp. Quant. Biol. 45, 177-182.

Saylor, B., Elliott, T.A., Linquist, S., Kremer, S.C., Gregory, T.R., Cottenie, K. 2013. A novel application of ecological analyses to assess transposable element distributions in the genome of the domestic cow, Bos taurus. Genome. 56, 521-533.

Schaack, S., Choi, E., Lynch, M., Pritham, E.J. 2010. DNA transposons and the role of recombination in mutation accumulation in Daphnia pulex. Genome Biol. 11, R46.

Schaack, S., Gilbert, C., Feschotte, C. 2010. Promiscuous DNA: horizontal transfer of transposable elements and why it matters for eukaryotic evolution. Trends Ecol. Evol. 25, 537-546.

Schaack, S., Pritham, E.J., Wolf, A., Lynch, M. 2010. DNA transposon dynamics in populations of Daphnia pulex with and without sex. Proc. Royal Soc. London B. 277, 2381-2387.

Schwarz, K.B. 1996. Oxidative stress during viral infection: a review. Free Radical Biol. Med. 21, 641-649.

Seberg, O., Petersen, G. 2009. A unified classification system for eukaryotic transposable elements should reflect their phylogeny. Nat. Rev. Genet. 10, 276.

Sela, N., Mersch, B., Hotz-Wagenblatt, A., Ast, G. 2010. Characteristics of transposable element exonization within human and mouse. PLoS One. 5, e10907.

Selmecki, A.M., Maruvka, Y.E., Richmond, P.A., Guillet, M., Shoresh, N., Sorenson, A.L., De, S., Kishony, R., Michor, F., Dowell, R., Pellman, D. 2015. Polyploidy can drive rapid adaptation in yeast. Nature. 519, 349-352.

Senerchia, N., Wicker, T., Felber, F., Parisod, C. 2013. Evolutionary dynamics of retrotransposons assessed by high-throughput sequencing in wild relatives of wheat. Genome Biol. Evol. 5, 1010-1020.

Shaheen, M., Williamson, E., Nickoloff, J., Lee, S.-H., Hromas, R. 2010. Metnase/SETMAR: a domesticated primate transposase that enhances DNA repair, replication, and decatenation. Genetica. 138, 559-566.

Shapiro, J. 1969. Mutations caused by the insertion of genetic material in the galactose operon of Escherichia coli. J. Mol. Biol. 40, 93-105.

Shapiro, J.A. 1977. DNA insertion elements and the evolution of chromosome primary structure. Trends Biochem. Sci. 129, 176-180.

Shapiro, J.A. 2010. Mobile DNA and evolution in the 21st century. Mob. DNA. 1, 4.

Sharma, A., Schneider, K.L., Presting, G.G. 2008. Sustained retrotransposition is mediated by nucleotide deletions and interelement recombinations. P. Natl. Acad. Sci. USA. 105, 15470-15474.

Sharp, P.A., Cohen, S.N., Davidson, N. 1973. Electron microscope heteroduplex studies of sequence relations among plasmids of Escherichia coli II. Structure of drug resistance (R) factors and F factors. J. Mol. Biol. 75, 235-255.

Shi, X., Seluanov, A., Gorbunova, V. 2007. Cell divisions are required for L1 retrotransposition. Mol. Cell. Biol. 27, 1264-1270.

Simmons, M.J., Bucholz, L.M. 1985. Transposase titration in Drosophila melanogaster: a model of cytotype in the P-M system of hybrid dysgenesis. P. Natl. Acad. Sci. USA. 82, 8119-8123.

Simon, D.M., Kelcher, S.A., Zimmerly, S. 2009. A broadscale phylogenetic analysis of group II intron RNAs and intron-encoded reverse transcriptases. Mol. Biol. Evol. 26, 2795-2808.

Sinha, R.P., Häder, D.-P. 2002. UV-induced DNA damage and repair: a review. Photochem. Photobiol. Sci. 1, 225-236.

Sinzelle, L., Izsvák, Z., Ivics, Z. 2009. Molecular domestication of transposable elements: From detrimental parasites to useful host genes. Cell. Mol. Life Sci. 66, 1073-1093.

Slatkin, M. 1985. Genetic differentiation of transposable elements under mutation and unbiased gene conversion. Genetics. 110, 145-158.

Slotkin, R.K., Martienssen, R. 2007. Transposable elements and the epigenetic regulation of the genome. Nat. Rev. Genet. 8, 272-285.

Šmarda, P., Bureš, P., Horová, L., Leitch, I.J., Mucina, L., Pacini, E., Tichý, L., Grulich, V., Rotreklová, O. 2014. Ecological and evolutionary significance of genomic GC content diversity in monocots. P. Natl. Acad. Sci. USA. 111, E4096-E4102.

Smith, T. 1980. Occam's razor. Nature. 285, 620.

Sommer, H., Carpenter, R., Harrison, B.J., Saedler, H. 1985. The transposable element Tam3 of Antirrhinum majus generates a novel type of sequence alterations upon excision. Mol. Gen. Genet. 199, 225-231.

Spadafora, C. 2008. A reverse transcriptase-dependent mechanism plays central roles in fundamental biological processes. Syst. Biol. Reprod. Med. 54, 11-21.

Spradling, A.C., Bellen, H.J., Hoskins, R.A. 2011. Drosophila P elements preferentially transpose to replication origins. P. Natl. Acad. Sci. USA. 108, 15948-15953.

Strobel, E., Dunsmuir, P., Rubin, G.M. 1979. Polymorphisms in the chromosomal locations of elements of the 412, copia and 297 dispersed repeated gene families in Drosophila. Cell. 17, 429-439.

Subramanian, R.A., Arensburger, P., Atkinson, P.W., O'Brochta, D.A. 2007. Transposable element dynamics of the hAT element Herves in the human malaria vector Anopheles gambiae s.s. Genetics. 176, 2477-2487.

Sun, C., Shepard, D.B., Chong, R.A., López Arriaza, J., Hall, K., Castoe, T.A., Feschotte, C., Pollock, D.D., Mueller, R.L. 2012. LTR retrotransposons contribute to genomic gigantism in plethodontid salamanders. Genome Biol. Evol. 4, 168-183.

Sun, C., Wyngaard, G., Walton, D.B., Wichman, H.A., Mueller, R.L. 2014. Billions of basepairs of recently expanded, repetitive sequences are eliminated from the somatic genome during copepod development. BMC Genomics. 15, 186.

Syvanen, M. 1984. The evolutionary implications of mobile genetic elements. Annu. Rev. Genet. 18, 271-293.

Szitenberg, A., Koutsovoulos, G., Blaxter, M.L., Lunt, D.H. 2014. The evolution of tyrosine-recombinase elements in Nematoda. PLoS One. 9, e106630.

Takahashi, H., Okazaki, S., Fujiwara, H. 1997. A new family of site-specific retrotransposons, SART1, is inserted into telomeric repeats of the silkworm, Bombyx mori. Nucl. Acids Res. 25, 1578-1584.

Tanaka, A., Nakatani, Y., Hamada, N., Jinno-Oue, A., Shimizu, N., Wada, S., Funayama, T., Mori, T., Islam, S., Hoque, S.A., et al. 2012. Ionising irradiation alters the dynamics of human long interspersed nuclear elements 1 (LINE1) retrotransposon. Mutagenesis. 27, 599-607.

Tenaillon, M.I., Hufford, M.B., Gaut, B.S., Ross-Ibarra, J. 2011. Genome size and transposable element content as determined by high-throughput sequencing in maize and Zea luxurians. Genome Biol. Evol. 3, 219-229.

The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 408, 796-815.

The ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature. 489, 57-74.

Thomas, C. A. 1971. The genetic organization of chromosomes. Annu. Rev. Genet. 5, 237-256.

Thomas, J., Pritham, E.J. 2015. Helitrons, the eukaryotic rolling circle transposable elements. In: Craig, N.L., Chandler, M., Gellert, M., Lambowitz, A.M., Rice, P.A., Sandmeyer, S.B. (eds.), Mobile DNA III. American Society for Microbiology Press, Washington, DC, USA.

Thurman, R.E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M.T., Haugen, E., Sheffield, N.C., Stergachis, A.B., Wang, H., Vernot, B., et al. 2012. The accessible chromatin landscape of the human genome. Nature. 489, 75-82.

Townsend, J.P., Hartl, D.L. 2000. The kinetics of transposable element autoregulation. Genetica. 108, 229-237.

Traverse, K.L., George, J.A., DeBaryshe, P.G., Pardue, M.-L. 2010. Evolution of species-specific promoter-associated mechanisms for protecting chromosome ends by Drosophila Het-A telomeric transposons. P. Natl. Acad. Sci. USA. 107, 5064-5069.

Treangen, T.J., Salzberg, S.L. 2012. Repetitive DNA and next-generation sequencing, computational challenges and solutions. Nat. Rev. Genet. 13, 36–44.

Ueki, N., Nishii, I. 2008. Idaten is a new cold-inducible transposon of Volvox carteri that can be used for tagging developmentally important genes. Genetics. 180, 1343-1353.

Vassetzky, N.S., Kramerov, D.A. 2013. SINEBase: a database and tool for SINE analysis. Nucl. Acids Res. 41, D83-D89.

Vendrely R., Vendrely C. 1948. La teneur du noyau cellulaire en acide désoxyribonucléique à travers les organes, les individus et les espèces animales: Techniques et premiers résultats. Experientia. 4, 434–436.

Venner, S., Feschotte, C., Biémont, C. 2009. Dynamics of transposable elements: towards a community ecology of the genome. Trends Genet. 25, 317-323.

Verdaasdonk, J.S., Bloom, K. 2011. Centromeres: unique chromatin structures that drive chromosome segregation. Nat. Rev. Mol. Cell Biol. 12, 320-332.

Vergilino, R., Elliott, T.A., Desjardins-Proulx, P., Crease, T.J., Dufresne, F. 2013. Evolution of a transposon in Daphnia hybrid genomes. Mob. DNA. 4, 7.

Vinogradov, A.E. 1998. Genome size and GC-percent in vertebrates as determined by flow cytometry: the triangular relationship. Cytometry. 31, 100–109.

Vinogradov, A.E. 1999. Intron-genome size relationship on a large evolutionary scale. J. Mol. Evol. 49, 376-384.

Vitte, C., Panaud, O. 2003. Formation of solo-LTRs through unequal homologous recombination counterbalances amplifications of LTR retrotransposons in rice Oryza sativa L. Mol. Biol. Evol. 20, 528-540.

Volff, J.-N., Bouneau, L., Ozouf-Costaz, C., Fischer, C. 2003. Diversity of retrotransposable elements in compact pufferfish genomes. Trends Genet. 19, 674-678.

Walisko, O., Jursch, T., Izsvák, Z., Ivics, Z. 2009. Transposon-host cell interactions in the regulation of Sleeping Beauty transposition. In: Lankenau, D.-H., Volff, J.-N. (eds.), Transposons and the Dynamic Genome. Springer-Verlag, Berlin/Heidelberg, Germany.

Walker, P.M.B. 1979. Genes and non-coding sequences. In: Porter, R. and O'Connor, M. (eds.), Human Genetics: Possibilities and Realities. Excerpta Medica, Amsterdam, Netherlands.

Wallau, G.L., Capy, P., Loreto, E., Le Rouzic, A., Hua-Van, A. 2016. VHICA, a new method to discriminate between vertical and horizontal transposon transfer: application to the mariner family within Drosophila. Mol. Biol. Evol. 33, 1094-1109.

Wallau, G.L., Ortiz, M.F., Loreto, E.L.S. 2012. Horizontal transposon transfer in Eukarya: detection, bias and perspectives. Genome Biol. Evol. 4, 689-699.

Waltari, E., Edwards, S.V. 2002. Evolutionary dynamics of intron size, genome size, and physiological correlates in archosaurs. Am. Nat. 160, 539-552.

Wang, J., Zhuang, J., Iyer, S., Lin, X., Whitfield, T.W., Greven, M.C., Pierce, B.G., Dong, X., Kundaje, A., Cheng, Y., et al. 2012. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 22, 1798-1812.

Waples, R.S., Gaggiotti, O. 2006. What is a population? An empirical evaluation of some genetic methods for identifying the number of gene pools and their degree of connectivity. Mol. Ecol. 15, 1419-1439.

Warbrick, E., Heatherington, W., Lane, D.P., Glover, D.M. 1998. PCNA binding proteins in Drosophila melanogaster: the analysis of a conserved PCNA binding domain. Nucl. Acids Res. 26, 3925-3932.

Warren, W.C., Hillier, L.W., Graves, J.A.M., Birney, E., Ponting, C.P., Grützner, F., Belov, K., Miller, W., Clarke, L., et al. 2008. Genome analysis of the platypus reveals unique signatures of evolution. Nature. 453, 175-183.

Weil, C.F., Kunze, R. 2000. Transposition of maize Ac/Ds transposable elements in the yeast Saccharomyces cerevisiae. Nat. Genet. 26, 187-190.

Wendel, J.F., Cronn, R.C., Alvarez, I., Liu, B., Small, R.L., Senchina, D.S. 2002. Intron size and genome size in plants. Mol. Biol. Evol. 19, 2346-2352.

Werren, J.H. 2011. Selfish genetic elements, genetic conflict, and evolutionary innovation. P. Natl. Acad. Sci. USA. 108, 10863-10870.

Wessler, S.R. 1996. Plant retrotransposons: Turned on by stress. Curr. Biol. 6, 959-961.

Whitney, K.D., Garland, T. 2010. Did genetic drift drive increases in genome complexity? PLoS Genet. 6, e1001080.

Wicker, T., Sabot, F., Hua-Van, A., Bennetzen, J.L., Capy, P., Chalhoub, B., Flavell, A., Leroy, P., Morgante, M., Panaud, O., et al. 2007. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973-982.

Wilke, C.M., Adams, J.H. 1992. Fitness effects of Ty transposition in Saccharomyces cerevisiae. Genetics. 131, 31-42.

Wilkins, A.S. 2010. The enemy within: an epigenetic role of retrotransposons in cancer initiation. BioEssays. 32, 856-865.

Wilson, D.S. 1975. A theory of group selection. P. Natl. Acad. Sci. USA. 72, 143-146.

Witherspoon, D.J., Doak, T.G., Williams, K.R., Seegmiller, A., Seger, J., Herrick, G. 1997. Selection on the protein-coding genes of the TBE1 family of transposable elements in the ciliates Oxytricha fallax and O. trifallax. Mol. Biol. Evol. 14, 696-706.

Witherspoon, D.J. 1999. Selective constraints on P-element evolution. Mol. Biol. Evol. 16, 472-478.

Witherspoon, D.J. 2000. Natural selection on transposable elements in eukaryotes, PhD Thesis. Department of Biology. University of Utah, Salt Lake City, Utah, USA.

Witte, C., Le, Q.H., Bureau, T., Kumar, A. 2001. Terminal-repeat retrotransposons in miniature (TRIM) are involved in restructuring plant genomes. P. Natl. Acad. Sci. USA. 98, 13778-13783.

Wood, D., Brink, R.A. 1956. Frequency of somatic mutation to self color in maize plants homozygous and heterozygous for variegated pericarp. P. Natl. Acad. Sci. USA. 42, 514-519.

Wright, S., Finnegan, D. 2001. Genome evolution: sex and the transposable element. Curr. Biol. 11, 296-299.

Wright, S.I., Schoen, D.J. 1999. Transposon dynamics and the breeding system. Genetica. 107, 139-148.

Xiong, Y., Eickbush, T.H. 1988. Similarity of reverse transcriptase-like sequences of viruses, transposable elements, and mitochondrial introns. Mol. Biol. Evol. 5, 675-690.

Xiong, Y., Eickbush, T.H. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9, 3353-3362.

Yadav, V.P., Mandal, P.K., Bhattacharya, A., Bhattacharya, S. 2012. Recombinant SINEs are formed at high frequency during induced retrotransposition in vivo. Nat. Comm. 3, 854.

Yang, G., Nagel, D.H., Feschotte, C., Hancock, C.N., Wessler, S.R. 2009. Tuned for transposition: molecular determinants underlying the hyperactivity of a Stowaway MITE. Science. 325, 1391-1394.

Ye, J., Pérez-González, C.E., Eickbush, D.G., Eickbush, T.H. 2005. Competition between R1 and R2 transposable elements in the 28S rRNA genes of insects. Cytogenet. Genome Res. 110, 299-306.

Yoder, J.A., Walsh, C.P., Bestor, T.H. 1997. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 13, 335-340.

Young, M.W. 1979. Middle repetitive DNA: a fluid component of the Drosophila genome. P. Natl. Acad. Sci. USA. 76, 6274-6278.

Yu, C., Zhang, J., Peterson, T. 2011. Genome rearrangements in maize induced by alternative transposition of reversed Ac/Ds termini. Genetics. 188, 59-67.

Yuan, Y.-W., Wessler, S.R. 2011. The catalytic domain of all eukaryotic cut-and-paste transposase superfamilies. P. Natl. Acad. Sci. USA. 108, 7884-7889.

Zeh, D.W., Zeh, J.A., Ishida, Y. 2009. Transposable elements and an epigenetic basis for punctuated equilibria. BioEssays. 31, 715-726.

Zeyl, C. 2006. Experimental evolution with yeast. FEMS Yeast Res. 6, 685-691.

Zeyl, C., Bell, G., Da Silva, J. 1994. Transposon abundance in sexual and asexual populations of Chlamydomonas reinhardtii. Evolution. 48, 1406-1409.

Zeyl, C., Bell, G., Green, D.M. 1996. Sex and the spread of retrotransposon Ty3 in experimental populations of Saccharomyces cerevisiae. Genetics. 143, 1567-1577.

Zhang, J., Yu, C., Pulletikurti, V., Lamb, J., Danilova, T., Weber, D.F., Birchler, J., Peterson, T. 2009. Alternative Ac/Ds transposition induces major chromosomal rearrangements in maize. Genes Dev. 23, 755-765.

Zhang, P., Spradling, A.C. 1993. Efficient and dispersed local P element transposition from Drosophila females. Genetics. 133, 361-373.

Zhang, Q., Edwards, S.V. 2012. The evolution of intron size in amniotes: A role for powered flight? Genome Biol. Evol. 4, 1033-1043.

Zhang, X., Feschotte, C., Zhang, Q., Jiang, N., Eggleston, W.B., Wessler, S.R. 2001. P instability factor: an active maize transposon system associated with the amplification of Tourist-like MITEs and a new superfamily of transposases. P. Natl. Acad. Sci. USA. 98, 12572-12577.

Zhao, F., Qi, J., Schuster, S.C. 2009. Tracking the past: interspersed repeats in an extinct Afrotherian mammal, Mammuthus primigenius. Genome Res. 19, 1384-1392.

Zhou, M.-B., Lu, J.-J., Zhong, H., Liu, X.-M., Tang, D.-Q. 2010. Distribution and diversity of PIF-like transposable elements in the Bambusoideae subfamily. Plant Sci. 179, 257-266.

Zou, S., Ke, N., Kim, J.M., Voytas, D.F. 1996. The Saccharomyces retrotransposon Ty5 integrates preferentially into regions of silent chromatin at the telomeres and mating loci. Genes Dev. 10, 634-645.

Phylogeny references

Alamouti, S.M., Tsui, C.K.M., Breuil, C. 2009. Multigene phylogeny of filamentous ambrosia fungi associated with ambrosia and bark beetles. Mycol. Res. 113, 822-835.

Baguñà, J., Riutort, M. 2004. Molecular phylogeny of the Platyhelminthes. Canadian J. Zool. 82, 168-193.

Bain, O., Casiraghi, M., Martin, C., Uni, S. 2008. The nematoda Filarioidea, critical analysis linking molecular and traditional approaches. Parasite. 15, 342-348.

Baroin-Tourancheau, A., Delgado, P., Perasso, R., Adoutte, A. 1992. A broad molecular phylogeny of ciliates, identification of major evolutionary trends and radiations within the phylum. Proc. Natl. Acad. Sci. USA. 89, 9764-9768.

Beakes, G.W., Glockling, S.L., Sekimoto, S. 2012. The evolutionary phylogeny of the oomycete "fungi". Protoplasma. 249, 3-19.

Begerow, D., Stoll, M., Bauer, R. 2006. A phylogenetic hypothesis of Ustilaginomycotina based on multiple gene analyses and morphological data. Mycologia. 98, 906-916.

Beilstein, M.A., Al-Shehbaz, I.A., Mathews, S., Kellogg, E.A. 2008. Brassicaceae phylogeny inferred from phytochrome A and ndhF sequence data: Tribes and trichomes revisited. Am. J. Bot. 95, 1307-1327.

Binder, M., Justo, A., Riley, R., Salamov, A., Lopez-Giraldez, F., Sjökvist, E., Copeland, A., Foster, B., Sun, H., Larsson, E., et al. 2013. Phylogenetic and phylogenomic overview of the Polyporales. Mycologia. 105, 1350-1373.

Blair, J.E., Coffey, M.D., Park, S.Y., Geiser, D.M., Kang, S. 2008. A multi-locus phylogeny for Phytophthora utilizing markers derived from complete genome sequences. Fungal Genet. Biol. 45, 266-277.

Cannon, P.F., Damm, U., Johnston, P.R., Weir, B.S. 2012. Colletotrichum-current status and future directions. Stud. Mycol. 73, 181-213.

Crouch, J.A., Tomaso-Peterson, M. 2012. Anthracnose disease of centipedegrass turf caused by Colletotrichum eremochloae, a new fungal species closely related to Colletotrichum sublineola. Mycologia. 104, 1085-1096.

Diezmann, S., Cox, C.J., Schönian, G., Vilgalys, R.J., Mitchell, T.G. 2004. Phylogeny and evolution of medical species of Candida and related taxa: a multigenic analysis. J. Clin. Microbiol. 42, 5624-5635.

Fitzpatrick, D.A., Logue, M.E., Stajich, J.E., Butler, G. 2006. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis. BMC Evol. Biol. 6, 99.

Freshwater, D.W., Fredericq, S., Butler, B.S., Hommersand, M.H., Chase, M.W. 1994. A gene phylogeny of the red algae (Rhodophyta) based on plastid rbcL. Proc. Natl. Acad. Sci. USA. 91, 7281-7285.

Geiser, D.M., Gueidan, C., Miadlikowska, J., Lutzoni, F., Kauff, F., Hofstetter, V., Fraker, E., Schoch, C.L., Tibell, L., Untereiner, W.A., et al. 2006. Eurotiomycetes, Eurotiomycetidae and Chaetothyriomycetidae. Mycologia. 98, 1053-1064.

Groenewald, M., Kang, J.-C., Crous, P.W., Gams, W. 2001. ITS and β-tubulin phylogeny of Phaeoacremonium and Phaeomoniella species. Mycol. Res. 105, 651-657.

Guillon, J.M., Guéry, L., Hulin, V., Girondot, M. 2012. A large phylogeny of turtles (Testudines) using molecular data. Contrib. Zool. 81, 147-158.

Houbraken, J., Samson, R.A. 2011. Phylogeny of Penicillium and the segregation of Trichocomaceae into three families. Stud. Mycol. 70, 1-51.

Huhndorf, S.M., Miller, A.N., Fernández, F.A. 2004. Molecular systematics of the Sordariales, the order and the family Lasiosphaeriaceae redefined. Mycologia. 96, 368-387.

Jansa, S.A., Weksler, M. 2004. Phylogeny of muroid rodents: Relationships within and among major lineages as determined by IRBP gene sequences. Mol. Phylogenet. Evol. 31, 256-276.

Johansson, U.S., Fjeldså, J., Bowie, R.C.K. 2008. Phylogenetic relationships within Passerida (Aves, Passeriformes): A review and a new molecular phylogeny based on three nuclear intron markers. Mol. Phylogenet. Evol. 48, 858-876.

Kurtzman, C.P., Robnett, C.J., Basehoar-Powers, E. 2008. Phylogenetic relationships among species of Pichia, Issatchenkia and Williopsis determined from multigene sequence analysis, and the proposal of Barnettozyma gen. nov., Lindnera gen. nov. and Wickerhamomyces gen. nov. FEMS Yeast Res. 8, 939-954.

Kurtzman, C.P., Robnett, C.J. 2003. Phylogenetic relationships among yeasts of the 'Saccharomyces complex' determined from multigene sequence analyses. FEMS Yeast Res. 3, 417-432.

Kurtzman, C.P., Robnett, C.J. 2013. Relationships among genera of the Saccharomycotina (Ascomycota) from multigene phylogenetic analysis of type species. FEMS Yeast Res. 13, 23-33.

Leliaert, F., Smith, D.R., Moreau, H., Herron, M.D., Verbruggen, H., Delwiche, C.F., De Clerck, O. 2012. Phylogeny and molecular evolution of the green algae. Crit. Rev. Plant Sci. 31, 1-46.

Li, Y., Hyde, K.D., Jeewon, R., Cai, L., Vijaykrishna, D., Zhang, K. 2005. Phylogenetics and evolution of nematode-trapping fungi (Orbiliales) estimated from nuclear and protein coding genes. Mycologia. 97, 1034-1046.

Machouart, M., Samerpitak, K., de Hoog, G.S., Gueidan, C. 2013. A multigene phylogeny reveals that Ochroconis belongs to the family Sympoventuriaceae (Venturiales, Dothideomycetes). Fungal Divers. 65, 77-88.

Matheny, P.B., Curtis, J.M., Hofstetter, V., Aime, M.C., Moncalvo, J.-M., Ge, Z.-W., Slot, J.C., Ammirati, J.F., Baroni, T.J., Bougher, N.L., et al. 2006. Major clades of Agaricales, a multilocus phylogenetic overview. Mycologia. 98, 982-995.

Mühlhausen, S., Kollmar, M. 2014. Molecular phylogeny of sequenced Saccharomycetes reveals polyphyly of the alternative yeast codon usage. Genome Biol. Evol. 6, 1253.

Nilsson, M.A., Churakov, G., Sommer, M., van Tran, N., Zemann, A., Brosius, J., Schmitz, J. 2010. Tracking marsupial evolution using archaic genomic insertions. PLoS Biol. 8, e1000436.

Oliveira, M.C., Bhattacharya, D. 2000. Phylogeny of the Bangiophycidae (Rhodophyta) and the secondary endosymbiotic origin of algal plastids. Am. J. Bot. 87, 482-492.

Pan, S., Sigler, L., Cole, G.T. 1994. Evidence for a phylogenetic connection between Coccidioides immitis and Uncinocarpus reesii (Onygenaceae). Microbiology. 140, 1481-1494.

Park, J.-K., Sultana, T., Lee, S.-H., Kang, S., Kim, H.K., Min, G.-S., Eom, K.S., Nadler, S.A. 2011. Monophyly of clade III nematodes is not supported by phylogenetic analysis of complete mitochondrial genome sequences. BMC Genomics. 12, 392.

Peterson, S.W. 2008. Phylogenetic analysis of Aspergillus species using DNA sequences from four loci. Mycologia. 100, 205-226.

Potter, D., Eriksson, T., Evans, R.C., Oh, S., Smedmark, J.E.E., Morgan, D.R., Kerr, M., Robertson, K.R., Arsenault, M., Dickinson, T.A., et al. 2007. Phylogeny and classification of Rosaceae. Plant Syst. Evol. 266, 5-43.

Rehner, S.A., Samuels, G.J. 1995. Molecular systematics of the Hypocreales, a teleomorph gene phylogeny and the status of their anamorphs. Canadian J. Bot. 73, 816-823.

Reidenbach, K.R., Cook, S., Bertone, M.A., Harbach, R.E., Wiegmann, B.M., Besansky, N.J. 2009. Phylogenetic analysis and temporal diversification of mosquitoes (Diptera, Culicidae) based on nuclear genes and morphology. BMC Evol. Biol. 9, 298.

Romón, P., de Beer, Z.W., Zhou, X., Duong, T.A., Wingfield, B.D., Wingfield, M.J. 2014. Multigene phylogenies of Ophiostomataceae associated with Monterey pine bark beetles in Spain reveal three new fungal species. Mycologia. 106, 119-132.

Rossman, A.Y., Farr, D.F., Castlebury, L.A. 2007. A review of the phylogeny and biology of the Diaporthales. Mycoscience. 48, 135-144.

Särkinen, T., Bohs, L., Olmstead, R.G., Knapp, S. 2013. A phylogenetic framework for evolutionary study of the nightshades (Solanaceae), a dated 1000-tip tree. BMC Evol. Biol. 13, 214.

Schoch, C.L., Crous, P.W., Groenewald, J.Z., Boehm, E.W.A., Burgess, T.I., de Gruyter, J., de Hoog, G.S., Dixon, L.J., Grube, M., Gueidan, C., et al. 2009. A class-wide phylogenetic assessment of Dothideomycetes. Stud. Mycol. 64, 1-15.

Schönian, G., Cupolillo, E., Mauricio, I. 2013. Molecular evolution and phylogeny of Leishmania. In: Drug Resistance in Leishmania Parasites. Edited by Ponte-Sucre, A. 259-284.

Seetharam, A.S., Stuart, G.W. 2013. Whole genome phylogeny for 21 Drosophila species using predicted 2b-RAD fragments. PeerJ. 1, e226.

Steiner, U., Leibner, S., Schardl, C.L., Leuchtmann, A., Leistner, E. 2011. Periglandula, a new fungal genus within the Clavicipitaceae and its association with Convolvulaceae. Mycologia. 103, 1133-1145.

Sugiyama, J., Hosaka, K., Suh, S.-O. 2006. Early diverging Ascomycota, phylogenetic divergence and related evolutionary enigmas. Mycologia. 98, 996-1005.

Suh, S.-O., Blackwell, M., Kurtzman, C.P., Lachance, M.-A. 2006. Phylogenetics of Saccharomycetales, the ascomycete yeasts. Mycologia. 98, 1006-1017.

Suh, S.-O., Zhou, J.J. 2010. Methylotrophic yeasts near Ogataea (Hansenula) polymorpha: a proposal of Ogataea angusta comb. nov. and Candida parapolymorpha sp. nov. FEMS Yeast Res. 10, 631-638.

Sung, G.H., Hywel-Jones, N.L., Sung, J.M., Luangsa-ard, J.J., Shrestha, B., Spatafora, J.W. 2007. Phylogenetic classification of Cordyceps and the clavicipitaceous fungi. Stud. Mycol. 57, 5-59.

Sung, G.H., Spatafora, J.W., Zare, R., Hodge, K.T., Gams, W. 2001. A revision of Verticillium sect. Prostrata. II. Phylogenetic analyses of SSU and LSU nuclear rDNA sequences from anamorphs and teleomorphs of the Clavicipitaceae. Nova Hedwigia. 72, 311-328.

Sung, G.H., Sung, J.M., Hywel-Jones, N.L., Spatafora, J.W. 2007. A multi-gene phylogeny of Clavicipitaceae (Ascomycota, Fungi): Identification of localized incongruence using a combinational bootstrap approach. Mol. Phylogenet. Evol. 44, 1204-1223.

Tanaka, E., Tanaka, C. 2008. Phylogenetic study of clavicipitaceous fungi using acetaldehyde dehydrogenase gene sequences. Mycoscience. 49, 115-125.

Tokuoka, T. 2007. Molecular phylogenetic analysis of Euphorbiaceae sensu stricto based on plastid and nuclear DNA sequences and ovule and seed character evolution. J. Plant Res. 120, 511-522.

Tomšovský, M., Kolarík, M., Pažoutová, S., Homolka, L. 2006. Molecular phylogeny of European Trametes (Basidiomycetes, Polyporales) species based on LSU and ITS (nrDNA) sequences. Nova Hedwigia. 82, 269-280.

Tsagkogeorga, G., Turon, X., Hopcroft, R.R., Tilak, M.-K., Feldstein, T., Shenkar, N., Loya, Y., Huchon, D., Douzery, E.J.P., Delsuc, F. 2009. An updated 18S rRNA phylogeny of tunicates based on mixture and secondary structure models. BMC Evol. Biol. 9, 187.

Tsui, C.K.M., Daniel, H.M., Robert, V., Meyer, W. 2008. Re-examining the phylogeny of clinically relevant Candida species and allied genera based on multigene analyses. FEMS Yeast Res. 8, 651-659.

van der Aa Kühle, A., Jespersen, L. 2003. The taxonomic position of Saccharomyces boulardii as evaluated by sequence analysis of the D1/D2 domain of 26S rDNA, the ITS1-5.8S rDNA-ITS2 region and the mitochondrial cytochrome-c oxidase II gene. Syst. Appl. Microbiol. 26, 564-571.

Vossbrinck, C.R., Baker, M.D., Andreadis, T.G. 2010. Phylogenetic position of Octosporea muscaedomesticae (Microsporidia) and its relationship to Octosporea bayeri based on small subunit rDNA analysis. J. Invertebr. Pathol. 105, 366-370.

Vossbrinck, C.R., Debrunner-Vossbrinck, B.A. 2005. Molecular phylogeny of the Microsporidia: Ecological, ultrastructural and taxonomic consideration. Folia Parasit. 52, 131-142.

Wang, Z., Johnston, P.R., Takamatsu, S., Spatafora, J.W., Hibbett, D.S. 2006. Toward a phylogenetic classification of the Leotiomycetes based on rDNA data. Mycologia. 98, 1065-1075.

Watanabe, M., Yonezawa, T., Lee, K.-I., Kumagai, S., Sugita-Konishi, Y., Keiichi, G., Hara-Kudo, Y. 2011. Molecular phylogeny of the higher and lower taxonomy of the Fusarium genus and differences in the evolutionary histories of multiple genes. BMC Evol. Biol. 11, 322.

Wedin, M., Wiklund, E., Crewe, A., Döring, H., Ekman, S., Nyberg, A., Schmitt, I., Lumbsch, H.T. 2005. Phylogenetic relationships of Lecanoromycetes (Ascomycota) as revealed by analyses of mtSSU and nLSU rDNA sequence data. Mycol. Res. 109, 159-172.

Wojciechowski, M.F. 2003. Reconstructing the phylogeny of legumes (Leguminosae), an early 21st century perspective. In: Advances in Legume Systematics. Edited by Klitgaard, B.B., Bruneau, A. Royal Botanic Gardens, 5-35.

Yamada, O., Takara, R., Hamada, R., Hayashi, R., Tsukahara, M., Mikami, S. 2011. Molecular biological researches of Kuro-Koji molds, their classification and safety. J. Biosci. Bioeng. 112, 233-237.

Zhan, Z., Xu, K., Warren, A., Gong, Y. 2009. Reconsideration of phylogenetic relationships of the subclass peritrichia (Ciliophora, Oligohymenophorea) based on small subunit ribosomal RNA gene sequences, with the establishment of a new subclass Mobilia Kahl, 1933. J. Eukaryot. Microb. 56, 552-558.

Zhang, N., Castlebury, L.A., Miller, A.N., Huhndorf, S.M., Schoch, C.L., Seifert, K.A., Rossman, A.Y., Rogers, J.D., Kohlmeyer, J., Volkmann-Kohlmeyer, B., et al. 2006. An overview of the systematics of the Sordariomycetes based on a four-gene phylogeny. Mycologia. 98, 1076-1087.

Genome data references

Abad, P., Gouzy, J., Aury, J.-M., Castagnone-Sereno, P., Danchin, E.G.J., Deleury, E., Perfus-Barbeoch, L., Anthouard, V., Artiguenave, F., Blok, V.C., et al. 2008. Genome sequence of the metazoan plant-parasitic nematode Meloidogyne incognita. Nat. Biotech. 26, 909-915.

Abrahamsen, M.S., Templeton, T.J., Enomoto, S., Abrahante, J.E., Zhu, G., Lancto, C.A., Deng, M., Liu, C., Widmer, G., Tzipori, S., et al. 2004. Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science. 304, 441-445.

Adelson, D.L., Raison, J.M., Garber, M., Edgar, R.C. 2010. Interspersed repeats in the horse (Equus caballus); spatial correlations highlight conserved chromosomal domains. Animal Genet. 41, 1-9.

Aeschlimann, S.H., Jönsson, F., Postberg, J., Stover, N.A., Petera, R.L., Lipps, H.-J., Nowacki, M., Swart, E.C. 2014. The draft assembly of the radically organized Stylonychia lemnae macronuclear genome. Genome Biol. Evol. 6, 1707-1723.

Aflitos, S., Schijlen, E., De Jong, H., De Ridder, D., Smit, S., Finkers, R., Wang, J., Zhang, G., Li, N., Mao, L., Bakker, F., et al. 2014. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing. Plant J. 80, 136-148.

Akiyoshi, D.E., Morrison, H.G., Lei, S., Feng, X., Zhang, Q., Corradi, N., Mayanja, H., Tumwine, J.K., Keeling, P.J., Weiss, L.M., et al. 2009. Genomic survey of the non-cultivatable opportunistic human pathogen, Enterocytozoon bieneusi. PLoS Pathog. 5, e1000261.

Albertin, C.B., Simakov, O., Mitros, T., Wang, Z.Y., Pungor, J.R., Edsinger-Gonzales, E., Brenner, S., Ragsdale, C.W., Rokhsar, D.S. 2015. The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature. 524, 220-224.

Al-Dous, E.K., George, B., Al-Mahmoud, M.E., Al-Jaber, M.Y., Wang, H., Salameh, Y.M., Al-Azwani, E.K., Chaluvadi, S., Pontaroli, A.C., Debarry, J., et al. 2011. De novo genome sequencing and comparative genomics of date palm (Phoenix dactylifera). Nat. Biotech. 29, 521-527.

Alföldi, J., Di Palma, F., Grabherr, M., Williams, C., Kong, L., Mauceli, E., Russell, P., Lowe, C.B., Glor, R.E., Jaffe, J.D., et al. 2011. The genome of the green anole lizard and a comparative analysis with birds and mammals. Nature. 477, 587-591.

Almeida, L.M., Silva, I.T., Silva, W.A., Castro, J.P., Riggs, P.K., Carareto, C.M., Amaral, M.E.J. 2007. The contribution of transposable elements to Bos taurus gene structure. Gene. 390, 180-189.

Amborella Genome Project. 2013. The Amborella genome and the evolution of flowering plants. Science. 342, 1241089.

Amemiya, C.T., Alföldi, J., Lee, A.P., Fan, S., Philippe, H., MacCallum, I., Braasch, I., Manousaki, T., Schneider, I., Rohner, N., et al. 2013. The African coelacanth genome provides insights into tetrapod evolution. Nature. 496, 311-316.

Amselem, J., Cuomo, C.A., van Kan, J.A.L., Viaud, M., Benito, E.P., Couloux, A., Coutinho, P.M., de Vries, R.P., Dyer, P.S., Fillinger, S., et al. 2011. Genomic analysis of the necrotrophic fungal pathogens Sclerotinia sclerotiorum and Botrytis cinerea. PLoS Genet. 7, e1002230.

Amyotte, S.G., Tan, X., Pennerman, K., Del Mar Jimenez-Gasco, M., Klosterman, S.J., Ma, L.-J., Dobinson, K.F., et al. 2012. Transposable elements in phytopathogenic Verticillium spp.: insights into genome evolution and inter- and intra-specific diversification. BMC Genomics. 13, 314.

Andersen, M.R., Salazar, M.P., Schaap, P.J., van de Vondervoort, P.J.I., Culley, D., Thykaer, J., Frisvad, J.C., Nielsen, K.F., Albang, R., Albermann, K., et al. 2011. Comparative genomics of citric-acid-producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88. Genome Res. 21, 885-897.

Antonielli, L., Strauss, J., Sessitsch, A., Berger, H. 2014. Draft genome sequence of Phaeomoniella chlamydospora strain RR-HG1, a grapevine trunk disease (Esca)-related member of the Ascomycota. Genome Announce. 2, e00098-00014.

Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J.-M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A., et al. 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 297, 1301-1310.

Aragona, M., Minio, A., Ferrarini, A., Valente, M.T., Bagnaresi, P., Orrù, L., Tononi, P., Zamperin, G., Infantino, A., Valè, G., et al. 2014. De novo genome assembly of the soil-borne and tomato pathogen Pyrenochaeta lycopersici. BMC Genomics. 15, 313.

Arai, R. 2011. Fish Karyotypes: A Checklist. Springer, Tokyo, Japan.

Arensburger, P., Megy, K., Waterhouse, R.M., Abrudan, J., Amedeo, P., Antelo, B., Bartholomay, L., Bidwell, S., Caler, E., Camara, F., et al. 2010. Sequencing of Culex quinquefasciatus establishes a platform for mosquito comparative genomics. Science. 330, 86-88.

Argout, X., Salse, J., Aury, J.-M., Guiltinan, M.J., Droc, G., Gouzy, J., Allegre, M., Chaparro, C., Legavre, T., Maximova, S.N., et al. 2011. The genome of Theobroma cacao. Nat. Genet. 43, 101-108.

Arkhipova, I.R., Yushenova, I.A., Rodriguez, F. 2013. Endonuclease-containing Penelope retrotransposons in the bdelloid rotifer Adineta vaga exhibit unusual structural features and play a role in expansion of host gene families. Mob. DNA. 4, 19.

Armbrust, E.V., Berges, J.A., Bowler, C., Green, B.R., Martinez, D., Putnam, N.H., Zhou, S., Allen, A.E., Apt, K.E., Bechner, M., et al. 2004. The genome of the diatom Thalassiosira pseudonana, ecology, evolution, and metabolism. Science. 306, 79-86.

Ashman, T.-L., Bachtrog, D., Blackmon, H., Goldberg, E.E., Hahn, M.W., Kirkpatrick, M., Kitano, J., Mank, J.E., Mayrose, I., Ming, R., et al. 2014. Tree of Sex: A database of sexual systems. Scientific Data. 1, 140015.

Aury, J.-M., Jaillon, O., Duret, L., Noel, B., Jubin, C., Porcel, B.M., Ségurens, B., Daubin, V., Anthouard, V., Aiach, N., et al. 2006. Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature. 444, 171-178.

Aversano, R., Contaldi, F., Ercolano, M.R., Grosso, V., Iorizzo, M., Tatino, F., Xumerle, L., Dal Molin, A., Avanzato, C., Ferrarini, A., et al. 2015. The Solanum commersonii genome sequence provides insights into adaptation to stress conditions and genome evolution of wild potato relatives. Plant Cell. 27, 954-968.

Aylward, F.O., Burnum-Johnson, K.E., Tringe, S.G., Teiling, C., Tremmel, D.M., Moeller, J.A., Scott, J.J., Barry, K.W., Piehowski, P.D., Nicora, C.D., et al. 2013. Leucoagaricus gongylophorus produces diverse enzymes for the degradation of recalcitrant plant polymers in leaf-cutter ant fungus gardens. Appl. Environ. Microb. 79, 3770-3778.

Bai, X., Adams, B.J., Ciche, T.A., Clifton, S., Gaugler, R., Kim, K., Spieth, J., Sternberg, P.W., Wilson, R.K., Grewal, P.S. 2013. A lover and a fighter: the genome sequence of an entomopathogenic nematode Heterorhabditis bacteriophora. PLoS One. 8, e69618.

Baker, E., Wang, B., Bellora, N., Peris, D., Hulfachor, A.B., Koshalek, J.A., Adams, M., Libkind, D., Hittinger, C.T. 2015. The genome sequence of Saccharomyces eubayanus and the domestication of lager-brewing yeasts. Mol. Biol. Evol. 32, 2818-2831.

Bakowski, M.A., Priest, M., Young, S., Cuomo, C.A., Troemel, E.R. 2014. Genome sequence of the microsporidian species Nematocida sp1 strain ERTm6 (ATCC PRA-372). Genome Announce. 2, 1-2.

Banks, J.A., Nishiyama, T., Hasebe, M., Bowman, J.L., Gribskov, M., DePamphilis, C., Albert, V.A., Aono, N., Aoyama, T., Ambrose, B.A., et al. 2011. The Selaginella genome identifies genetic changes associated with the evolution of vascular plants. Science. 322, 960-963.

Barbier, G., Oesterhelt, C., Larson, M.D., Halgren, R.G., Wilkerson, C., Garavito, R.M., Benning, C., Weber, A.P. 2005. Comparative genomics of two closely related unicellular thermo-acidophilic red algae, Galdieria sulphuraria and Cyanidioschyzon merolae, reveals the molecular basis of the metabolic flexibility of Galdieria sulphuraria and significant differences in carbohydrate metabolism of both algae. Plant Phys. 137, 460-474.

Baroncelli, R., Piaggeschi, G., Fiorini, L., Bertolini, E., Zapparata, A., Pè, M.E., Sarrocco, S., Vannacci, G. 2015. Draft whole-genome sequence of the biocontrol agent Trichoderma harzianum T6776. Genome Announce. 3, e00647-00615.

Baroncelli, R., Sanz-Martín, J.M., Rech, G.E., Sukno, S.A., Thon, M.R. 2014. Draft genome sequence of Colletotrichum sublineola, a destructive pathogen of cultivated sorghum. Genome Announce. 2, 10-11.

Baroncelli, R., Sreenivasaprasad, S., Sukno, S.A., Thon, M.R., Holub, E. 2014. Draft genome sequence of Colletotrichum acutatum Sensu Lato (Colletotrichum fioriniae). Genome Announce. 2, e00112-e00114.

Bartoš, J., Paux, E., Kofler, R., Havránková, M., Kopecký, D., Suchánková, P., Šafár, J., Simkova, H., Town, C.D., Lelley, T. 2008. A first survey of the rye (Secale cereale) genome composition through BAC end sequencing of the short arm of chromosome 1R. BMC Plant Biol. 8, 95.

Basnayake, S., Maclean, D.J., Whisson, S.C., Drenth, A. 2009. Identification and occurrence of the LTR-copia-like retrotransposon, PSCR and other copia-like element in the genome of Phytophthora sojae. Curr. Genet. 55, 521-536.

Baucom, R.S., Estill, J.C., Chaparro, C., Upshaw, N., Jogi, A., Deragon, J.-M., Westerman, R.P., Sanmiguel, P.J., Bennetzen, J.L. 2009. Exceptional diversity, non-random distribution, and rapid evolution of retroelements in the B73 maize genome. PLoS Genet. 5, e1000732.

Baxter, L., Tripathy, S., Ishaque, N., Boot, N., Cabral, A., Kemen, E., Thines, M., Ah-Fong, A., Anderson, R., Badejoko, W., et al. 2010. Signatures of adaptation to obligate biotrophy in the Hyaloperonospora arabidopsidis genome. Science. 330, 1549-1551.

Belknap, W.R., Wang, Y., Huo, N., Wu, J., Rockhold, D.R., Gu, T.Q., Stover, E. 2011. Characterizing the citrus cultivar Carrizo genome through 454 shotgun sequencing. Genome. 54, 1005-1015.

Bennett, H.M., Mok, H.P., Gkrania-Klotsas, E., Tsai, I.J., Stanley, E.J., Antoun, N.M., Coghlan, A., Harsha, B., Traini, A., Ribeiro, D.M., et al. 2014. The genome of the sparganosis tapeworm Spirometra erinaceieuropaei isolated from the biopsy of a migrating brain lesion. Genome Biol. 15, 510.

Bennett, M.D., Leitch, I.J. 2011. Nuclear DNA amounts in angiosperms, targets, trends and tomorrow. Ann. Bot-London. 107, 467-590.

Bennetzen, J.L., Schmutz, J., Wang, H., Percifield, R., Hawkins, J., Pontaroli, A.C., Estep, M., Feng, L., Vaughn, J.N., Grimwood, J., et al. 2012. Reference genome sequence of the model plant Setaria. Nat. Biotech. 30, 555-561.

Berka, R.M., Grigoriev, I.V., Otillar, R., Salamov, A., Grimwood, J., Reid, I., Ishmael, N., John, T., Darmond, C., Moisan, M.-C., et al. 2011. Comparative genomic analysis of the thermophilic biomass-degrading fungi Myceliophthora thermophila and Thielavia terrestris. Nat. Biotech. 29, 922-927.

Bernet, G.P., Muñoz-Pomer, A., Domínguez-Escribá, L., Covelli, L., Bernad, L., Ramasamy, S., Futami, R., Sempere, J.M., Moya, A., Llorens, C. 2011. GyDB mobilomics: LTR retroelements and integrase-related transposons of the pea aphid Acyrthosiphon pisum genome. Mob. Genet. Elements. 1, 97-102.

Berriman, M., Ghedin, E., Hertz-Fowler, C., Blandin, G., Renauld, H., Bartholomeu, D.C., Lennard, N.J., Caler, E., Hamlin, N.E., Haas, B., et al. 2005. The genome of the African trypanosome Trypanosoma brucei. Science. 309, 416-422.

Berriman, M., Haas, B., LoVerde, P.T., Wilson, R.A., Dillon, G.P., Cerqueira, G.C., Mashiyama, S.T., Al-Lazikani, B., Andrade, L.F., Ashton, P.D., et al. 2009. The genome of the blood fluke Schistosoma mansoni. Nature. 460, 352-360.

Berry, D., Cox, M.P., Scott, B. 2015. Draft genome sequence of the filamentous fungus Penicillium paxilli (ATCC 26601). Genome Announce. 3, e00071-00015.

Berthelot, C., Brunet, F., Chalopin, D., Juanchich, A., Bernard, M., Noël, B., Bento, P., Da Silva, C., Labadie, K., Alberti, A., et al. 2014. The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nature Comm. 5, 3657.

Bertioli, D.J., Vidigal, B., Nielen, S., Ratnaparkhe, M.B., Lee, T.-H., Leal-Bertioli, S.C.M., Kim, C., Guimaraes, P.M., Seijo, G., Schwarzacher, T., et al. 2013. The repetitive component of the A genome of peanut (Arachis hypogaea) and its role in remodelling intergenic sequence space since its evolutionary divergence from the B genome. Annals Bot. 112, 545-559.

Bhattacharya, D., Price, D.C., Xin Chan, C., Qiu, H., Rose, N., Ball, S., Weber, A.P.M., Arias, M.C., Henrissat, B., Coutinho, P.M., et al. 2013. Genome of the red alga Porphyridium purpureum. Nat. Comm. 4, 1941.

Blanc, G., Agarkova, I., Grimwood, J., Kuo, A., Brueggeman, A., Dunigan, D.D., Gurnon, J., Ladunga, I., Lindquist, E., Lucas, S., et al. 2012. The genome of the polar eukaryotic microalga Coccomyxa subellipsoidea reveals traits of cold adaptation. Genome Biol. 13, R39.

Blanc, G., Duncan, G., Agarkova, I., Borodovsky, M., Gurnon, J., Kuo, A., Lindquist, E., Lucas, S., Pangilinan, J., Polle, J., et al. 2010. The Chlorella variabilis NC64A genome reveals adaptation to photosymbiosis, coevolution with viruses, and cryptic sex. Plant Cell. 22, 2943-2955.

Blanco-Ulate, B., Rolshausen, P.E., Cantu, D. 2013. Draft genome sequence of the grapevine dieback fungus Eutypa lata UCR-EL1. Genome Announce. 1, 3-4.

Blazejewski, T., Nursimulu, N., Pszenny, V., Dangoudoubiyam, S., Namasivayam, S., Chiasson, M.A., Chessman, K., Tonkin, M., Swapna, L.S., Hung, S.S., et al. 2015. Systems-based analysis of the Sarcocystis neurona genome identifies pathways that contribute to a heteroxenous life cycle. mBio. 6, 1-16.

Bolger, A., Scossa, F., Bolger, M.E., Lanz, C., Maumus, F., Tohge, T., Quesneville, H., Alseekh, S., Sørensen, I., Lichtenstein, G., et al. 2014. The genome of the stress-tolerant wild tomato species Solanum pennellii. Nat. Genet. 46, 1034-1038.

Bonasio, R., Zhang, G., Ye, C., Mutti, N.S., Fang, X., Qin, N., Donahue, G., Yang, P., Li, Q., Li, C., et al. 2010. Genomic comparison of the ants Camponotus floridanus and Harpegnathos saltator. Science. 329, 1068-1071.

Bowler, C., Allen, A.E., Badger, J.H., Grimwood, J., Jabbari, K., Kuo, A., Maheswari, U., Martens, C., Maumus, F., Otillar, R.P., et al. 2008. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature. 456, 239-244.

Brawand, D., Wagner, C.E., Li, Y.I., Malinsky, M., Keller, I., Fan, S., Simakov, O., Ng, A.Y., Lim, Z.W., Bezault, E., et al. 2014. The genomic substrate for adaptive radiation in African cichlid fish. Nature. 513, 375-381.

Brayton, K.A., Lau, A.O.T., Herndon, D.R., Hannick, L., Kappmeyer, L.S., Berens, S.J., Bidwell, S.L., Brown, W.C., Crabtree, J., Fadrosh, D., et al. 2007. Genome sequence of Babesia bovis and comparative analysis of apicomplexan hemoprotozoa. PLoS Pathog. 3, 1401-1413.

Brenchley, R., Spannagl, M., Pfeifer, M., Barker, G.L.A., D'Amore, R., Allen, A.M., McKenzie, N., Kramer, M., Kerhornou, A., Bolser, D., et al. 2012. Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature. 491, 705-710.

Bringaud, F., Ghedin, E., El-Sayed, N.M.A., Papadopoulou, B. 2008. Role of transposable elements in trypanosomatids. Microbes Infect. 10, 575-581.

Brugère, J.F., Cornillot, E., Méténier, G., Bensimon, A., Vivarès, C.P. 2000. Encephalitozoon cuniculi (Microspora) genome, physical map and evidence for telomere-associated rDNA units on all chromosomes. Nucl. Acids Res. 28, 2026-2033.

Burmester, A., Shelest, E., Glöckner, G., Heddergott, C., Schindler, S., Staib, P., Heidel, A., Felder, M., Petzold, A., Szafranski, K., et al. 2011. Comparative and functional genomics provide insights into the pathogenicity of dermatophytic fungi. Genome Biol. 12, R7.

Bushley, K.E., Raja, R., Jaiswal, P., Cumbie, J.S., Nonogaki, M., Boyd, A.E., Owensby, C.A., Knaus, B.J., Elser, J., Miller, D., et al. 2013. The genome of Tolypocladium inflatum, evolution, organization, and expression of the cyclosporin biosynthetic . PLoS Genet. 9, e1003496.

Butler, G., Rasmussen, M.D., Lin, M.F., Santos, M.A.S., Sakthikumar, S., Munro, C.A., Rheinbay, E., Grabherr, M., Reedy, J.L., Agrafioti, I., et al. 2010. Evolution of pathogenicity and sexual reproduction in eight Candida genomes. Nature. 459, 657-662.

Buyyarapu, R., Kantety, R.V., Yu, J.Z., Xu, Z., Kohel, R.J., Percy, R.G., Macmil, S., Wiley, G.B., Roe, B.A., Sharma, G.C. 2013. BAC-pool sequencing and analysis of large segments of A12 and D12 homoeologous chromosomes in upland cotton. PLoS One. 8, e76757.

Byrne, S.L., Nagy, I., Pfeifer, M., Armstead, I., Swain, S., Studer, B., Mayer, K., Campbell, J.D., Czaban, A., Hentrup, S., et al. 2015. A synteny-based draft genome sequence of the forage grass Lolium perenne. Plant J. 84, 816-826.

Cai, J., Liu, X., Vanneste, K., Proost, S., Tsai, W.-C., Liu, K.-W., Chen, L.-J., He, Y., Xu, Q., Bian, C., et al. 2014. The genome sequence of the orchid Phalaenopsis equestris. Nature Genet. 47, 65-72.

Campbell, S.E., Williams, T.A., Yousuf, A., Soanes, D.M., Paszkiewicz, K.H., Williams, B.A.P. 2013. The genome of Spraguea lophii and the basis of host-microsporidian interactions. PLoS Genet. 9, e1003676.

Cañestro, C., Albalat, R. 2012. Transposon diversity is higher in amphioxus than in vertebrates: functional and evolutionary inferences. Brief. Funct. Genomics. 11, 131-141.

Cannarozzi, G., Plaza-Wüthrich, S., Esfeld, K., Larti, S., Wilson, Y.S., Girma, D., de Castro, E., Chanyalew, S., Blösch, R., Farinelli, L., et al. 2014. Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef). BMC Genomics. 15, 581.

Cantu, D., Govindarajulu, M., Kozik, A., Wang, M., Chen, X., Kojima, K.K., Jurka, J., Michelmore, R.W., Dubcovsky, J. 2011. Next generation sequencing provides rapid access to the genome of Puccinia striiformis f. sp. tritici, the causal agent of wheat stripe rust. PLoS One. 6, e24230.

Cao, Z., Yu, Y., Wu, Y., Hao, P., Di, Z., He, Y., Chen, Z., Yang, W., Shen, Z., He, X., et al. 2013. The genome of Mesobuthus martensii reveals a unique adaptation model of arthropods. Nat. Comm. 4, 2602.

Carlton, J.M., Angiuoli, S.V., Suh, B.B., Kooij, T.W., Pertea, M., Silva, J.C., Ermolaeva, M.D., Allen, J.E., Selengut, J.D., Koo, H.L., et al. 2002. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature. 419, 512-519.

Carlton, J.M., Adams, J.H., Silva, J.C., Bidwell, S.L., Lorenzi, H., Caler, E., Crabtree, J., Angiuoli, S.V., Merino, E.F., Amedeo, P., et al. 2008. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 455, 757-763.

Carlton, J.M., Hirt, R.P., Silva, J.C., Delcher, A.L., Schatz, M., Zhao, Q., Wortman, J.R., Bidwell, S.L., Alsmark, U.C.M., Besteiro, S., et al. 2007. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science. 315, 207-212.

Carr, M., Suga, H. 2014. The Holozoan Capsaspora owczarzaki possesses a diverse complement of active transposable element families. Genome Biol. Evol. 6, 949-963.

Castoe, T.A., de Koning, A.J., Hall, K.T., Yokoyama, K.D., Gu, W., Smith, E.N., Feschotte, C., Uetz, P., Ray, D.A., Dobry, J., et al. 2011. Sequencing the genome of the Burmese python (Python molurus bivittatus) as a model for studying extreme adaptations in snakes. Genome Biol. 12, 406.

Castoe, T.A., De Koning, A.P.J., Hall, K.T., Card, D.C., Schield, D.R., Fujita, M.K., Ruggiero, R.P., Degner, J.F., Daza, J.M., Gu, W., et al. 2013. The Burmese python genome reveals the molecular basis for extreme adaptation in snakes. P. Natl. Acad. Sci. USA. 110, 20645-20650.

Cerqueira, G.C., Arnaud, M.B., Inglis, D.O., Skrzypek, M.S., Binkley, G., Simison, M., Miyasato, S.R., Binkley, J., Orvis, J., Shah, P., et al. 2014. The Aspergillus Genome Database: Multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations. Nucl. Acids Res. 42, 705-710.

Chagné, D., Crowhurst, R.N., Pindo, M., Thrimawithana, A., Deng, C., Ireland, H., Fiers, M., Dzierzon, H., Cestaro, A., Fontana, P., et al. 2014. The draft genome sequence of European pear (Pyrus communis L. 'Bartlett'). PLoS One. 9, e92644.

Chalhoub, B., Denoeud, F., Liu, S., Parkin, I.A.P., Tang, H., Wang, X., Chiquet, J., Belcram, H., Tong, C., Samans, B., et al. 2014. Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science. 345, 950-953.

Chan, A.P., Crabtree, J., Zhao, Q., Lorenzi, H., Orvis, J., Puiu, D., Melake-Berhan, A., Jones, K.M., Redman, J., Chen, G., et al. 2010. Draft genome sequence of the oilseed species Ricinus communis. Nat. Biotech. 28, 951-956.

Chan, C.L., Yew, S.M., Na, S.L., Tan, Y.-C., Lee, K.W., Yee, W.-Y., Ngeow, Y.F., Ng, P.K. 2014. Draft genome sequence of Ochroconis constricta UM 578, isolated from human skin scraping. Genome Announce. 2, e000740-000714.

Chan, G.F., Bamadhaj, H.M., Gan, H.M., Rashid, N.A.A. 2012. Genome sequence of Aureobasidium pullulans AY4, an emerging opportunistic fungal pathogen with diverse biotechnological potential. Eukaryot. Cell. 11, 1419-1420.

Chapman, J.A., Kirkness, E.F., Simakov, O., Hampson, S.E., Mitros, T., Weinmaier, T., Rattei, T., Balasubramanian, P.G., Borman, J., Busam, D., et al. 2010. The dynamic Hydra genome. Nature. 464, 592-596.

Chatterjee, S., Alampalli, S.V., Nageshan, R.K., Chettiar, S.T., Joshi, S., Tatu, U.S. 2015. Draft genome of a commonly misdiagnosed multidrug resistant pathogen Candida auris. BMC Genomics. 16, 686.

Chen, C.-H., Kuo, T.C.-Y., Yang, M.-H., Chien, T.-Y., Chu, M.-J., Huang, L.-C., Chen, C.-Y., Lo, H.-F., Jeng, S.-T., Chen, L.-F.O. 2014. Identification of cucurbitacins and assembly of a draft genome for Aquilaria agallocha. BMC Genomics. 15, 578.

Chen, J., Huang, Q., Gao, D., Wang, J., Lang, Y., Liu, T., Li, B., Bai, Z., Luis Goicoechea, J., Liang, C., et al. 2013. Whole-genome sequencing of Oryza brachyantha reveals mechanisms underlying Oryza genome evolution. Nat. Comm. 4, 1595.

Chen, S., Xu, J., Liu, C., Zhu, Y., Nelson, D.R., Zhou, S., Li, C., Wang, L., Guo, X., Sun, Y., et al. 2012. Genome sequence of the model medicinal mushroom Ganoderma lucidum. Nat. Comm. 3, 913.

Chen, S., Zhang, G., Shao, C., Huang, Q., Liu, G., Zhang, P., Song, W., An, N., Chalopin, D., Volff, J.-N., et al. 2014. Whole-genome sequence of a flatfish provides insights into ZW sex chromosome evolution and adaptation to a benthic lifestyle. Nat. Genet. 46, 253-260.

Chen, X.-G., Jiang, X., Gu, J., Xu, M., Wu, Y., Deng, Y., Zhang, C., Bonizzoni, M., Dermauw, W., Vontas, J., et al. 2015. Genome sequence of the Asian Tiger mosquito, Aedes albopictus, reveals insights into its biology, genetics, and evolution. P. Natl. Acad. Sci. USA. 112, E5907-5915.

Cheng, S., van den Bergh, E., Zeng, P., Zhong, X., Xu, J., Liu, X., Hofberger, J., de Bruijn, S., Bhide, A.S., Kuelahoglu, C., et al. 2013. The Tarenaya hassleriana genome provides insight into reproductive trait and genome evolution of crucifers. Plant Cell. 25, 2813-2830.

Chibucos, M.C., Etienne, K.A., Orvis, J., Lee, H., Daugherty, S., Lockhart, S.R., Ibrahim, A.S., Bruno, V.M. 2015. The genome sequence of four isolates from the family Lichtheimiaceae. Path. Disease. 73, ftv024.

Chipman, A.D., Ferrier, D.E.K., Brena, C., Qu, J., Hughes, D.S.T., Schröder, R., Torres-Oliva, M., Znassi, N., Jiang, H., Almeida, F.C., et al. 2014. The first Myriapod genome sequence reveals conservative arthropod gene content and genome organisation in the centipede Strigamia maritima. PLoS Biol. 12, e1002005.

Chiu, J.C., Jiang, X., Zhao, L., Hamm, C.A., Cridland, J.M., Saelao, P., Hamby, K.A., Lee, E.K., Kwok, R.S., Zhang, G., et al. 2013. Genome of Drosophila suzukii, the spotted wing Drosophila. G3. 3, 2257-2271.

Cho, Y.S., Hu, L., Hou, H., Lee, H., Xu, J., Kwon, S., Oh, S., Kim, H.-M., Jho, S., Kim, S., et al. 2013. The tiger genome and comparative analysis with lion and snow leopard genomes. Nat. Comm. 4, 2433.

Chu, W.S., Magee, B.B., Magee, P.T. 1993. Construction of an SfiI macrorestriction map of the Candida albicans genome. J. Bacteriol. 175, 6637-6651.

Cissé, O.H., Almeida, J.M.G.C.F., Fonseca, Á., Kumar, A., Salojärvi, J., Overmyer, K. 2013. Genome sequencing of the plant pathogen Taphrina deformans, the causal agent of peach leaf curl. mBio. 4, e00055-00013.

Cissé, O.H., Pagni, M., Hauser, P.M. 2013. De novo assembly of the Pneumocystis jirovecii genome from a single bronchoalveolar lavage fluid specimen from a patient. mBio. 4, e00428-00412.

Clarke, M., Lohan, A.J., Liu, B., Lagkouvardos, I., Roy, S., Zafar, N., Bertelli, C., Schilde, C., Kianianmomeni, A., Burglin, T.R., et al. 2013. Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol. 14, R11.

Cock, J.M., Sterck, L., Rouzé, P., Scornet, D., Allen, A.E., Amoutzias, G., Anthouard, V., Artiguenave, F., Aury, J.-M., Badger, J.H., et al. 2010. The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature. 465, 617-621.

Colbourne, J.K., Pfrender, M.E., Gilbert, D., Thomas, W.K., Tucker, A., Oakley, T.H., Tokishita, S., Aerts, A., Arnold, G.J., Basu, M.K., et al. 2011. The ecoresponsive genome of Daphnia pulex. Science. 331, 555-561.

Coleman, J.J., Rounsley, S.D., Rodriguez-Carres, M., Kuo, A., Wasmann, C.C., Grimwood, J., Schmutz, J., Taga, M., White, G.J., Zhou, S., et al. 2009. The genome of Nectria haematococca: contribution of supernumerary chromosomes to gene expansion. PLoS Genet. 5, e1000618.

Collen, J., Porcel, B., Carre, W., Ball, S.G., Chaparro, C., Tonon, T., Barbeyron, T., Michel, G., Noel, B., Valentin, K., et al. 2013. Genome structure and metabolic features in the red seaweed Chondrus crispus shed light on evolution of the Archaeplastida. P. Natl. Acad. Sci. USA. 110, 5247-5252.

Collins, C., Keane, T.M., Turner, D.J., O'Keeffe, G., Fitzpatrick, D.A., Doyle, S. 2013. Genomic and proteomic dissection of the ubiquitous plant pathogen, Armillaria mellea, toward a new infection model system. J. Proteome Res. 12, 2552-2570.

Condon, B.J., Leng, Y., Wu, D., Bushley, K.E., Ohm, R.A., Otillar, R., Martin, J., Schackwitz, W., Grimwood, J., MohdZainudin, N., et al. 2013. Comparative genome structure, secondary metabolite, and effector coding capacity across Cochliobolus pathogens. PLoS Genet. 9, e1003233.

Cong, Q., Borek, D., Otwinowski, Z., Grishin, N.V. 2015. Tiger Swallowtail genome reveals mechanisms for speciation and caterpillar chemical defense. Cell Rep. 10, 910-919.

Cornillot, E., Hadj-Kaddour, K., Dassouli, A., Noel, B., Ranwez, V., Vacherie, B., Augagneur, Y., Brès, V., Duclos, A., Randazzo, S., et al. 2012. Sequencing of the smallest Apicomplexan genome from the human pathogen Babesia microti. Nucl. Acids Res. 40, 9102-9114.

Cornman, R.S., Chen, Y.P., Schatz, M.C., Street, C., Zhao, Y., Desany, B., Egholm, M., Hutchison, S., Pettis, J.S., Lipkin, W.I., et al. 2009. Genomic analyses of the microsporidian Nosema ceranae, an emergent pathogen of honey bees. PLoS Pathog. 5, e1000466.

Corradi, N., Haag, K.L., Pombert, J.-F., Ebert, D., Keeling, P.J. 2009. Draft genome sequence of the Daphnia pathogen Octosporea bayeri: insights into the gene content of a large microsporidian genome and a model for host-parasite interactions. Genome Biol. 10, R106.

Corradi, N., Pombert, J.-F., Farinelli, L., Didier, E.S., Keeling, P.J. 2010. The complete sequence of the smallest known nuclear genome from the microsporidian Encephalitozoon intestinalis. Nat. Comm. 1, 77.

Cotton, J.A., Lilley, C.J., Jones, L.M., Kikuchi, T., Reid, A.J., Thorpe, P., Tsai, I.J., Beasley, H., Blok, V., Cock, P.J., et al. 2014. The genome and life-stage specific transcriptomes of Globodera pallida elucidate key aspects of plant parasitism by a cyst nematode. Genome Biol. 15, R43.

Coyne, R.S., Hannick, L., Shanmugam, D., Hostetler, J.B., Brami, D., Joardar, V.S., Johnson, J., Radune, D., Singh, I., Badger, J.H., et al. 2011. Comparative genomics of the pathogenic ciliate Ichthyophthirius multifiliis, its free-living relatives and a host species provide insights into adoption of a parasitic lifestyle and prospects for disease control. Genome Biol. 12, R100.

Crameri, G., Todd, S., Grimley, S., McEachern, J.A., Marsh, G.A., Smith, C., Tachedjian, M., de Jong, C., Virtue, E.R., Yu, M., et al. 2009. Establishment, immortalisation and characterisation of pteropid bat cell lines. PLoS One. 4, e8266.

Cuesta, I., González, L.M., Estrada, K., Grande, R., Zaballos, A., Lobo, C.A., Barrera, J., Sanchez-Flores, A., Montero, E. 2014. High-quality draft genome sequence of Babesia divergens, the etiological agent of cattle and human babesiosis. Genome Announce. 2, 6-7.

Cuomo, C.A., Güldener, U., Xu, J.-R., Trail, F., Turgeon, B.G., Di Pietro, A., Walton, J.D., Ma, L.-J., Baker, S.E., Rep, M., et al. 2007. The Fusarium graminearum genome reveals a link between localized polymorphism and pathogen specialization. Science. 317, 1400-1402.

Cuomo, C.A., Valle, N.R.-D., Perez-Sanchez, L., Abouelleil, A., Goldberg, J., Young, S., Zeng, Q., Birren, B.W. 2014. Genome sequence of the pathogenic fungus Sporothrix schenckii (ATCC 58251). Genome Announce. 2, e00446-00414.

Curtin, C.D., Borneman, A.R., Chambers, P.J., Pretorius, I.S. 2012. De-novo assembly and analysis of the heterozygous triploid genome of the wine spoilage yeast Dekkera bruxellensis AWRI1499. PLoS One. 7, e33840.

Curtis, B.A., Tanifuji, G., Burki, F., Gruber, A., Irimia, M., Maruyama, S., Arias, M.C., Ball, S.G., Gile, G.H., Hirakawa, Y., et al. 2012. Algal genomes reveal evolutionary mosaicism and the fate of nucleomorphs. Nature. 492, 59-65.

D'Hont, A., Denoeud, F., Aury, J.-M., Baurens, F.-C., Carreel, F., Garsmeur, O., Noel, B., Bocs, S., Droc, G., Rouard, M., et al. 2012. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 488, 213-217.

Dalloul, R.A., Long, J.A., Zimin, A.V., Aslam, L., Beal, K., Blomberg, L.A., Bouffard, P., Burt, D.W., Crasta, O., Crooijmans, R.P.M.A., et al. 2010. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol. 8, e1000475.

Dastjerdi, A., Robert, C., Watson, M. 2014. Low coverage sequencing of two Asian elephant (Elephas maximus) genomes. GigaSci. 3, 12.

Davey, M.W., Gudimella, R., Harikrishna, J.A., Sin, L.W., Khalid, N., Keulemans, J. 2013. A draft Musa balbisiana genome sequence for molecular genetics in polyploid, inter- and intra-specific Musa hybrids. BMC Genomics. 14, 683.

de Carvalho, J.F., Chelaifa, H., Boutte, J., Poulain, J., Couloux, A., Wincker, P., Bellec, A., Fourment, J., Berges, H., Salmon, A., et al. 2013. Exploring the genome of the salt-marsh Spartina maritima (Poaceae, Chloridoideae) through BAC end sequence analysis. Plant Mol. Biol. 83, 591-606.

de la Chaux, N., Tsuchimatsu, T., Shimizu, K.K., Wagner, A. 2012. The predominantly selfing plant Arabidopsis thaliana experienced a recent reduction in transposable element abundance compared to its outcrossing relative Arabidopsis lyrata. Mob. DNA. 3, 2.

de Wit, P.J.G.M., van der Burgt, A., Ökmen, B., Stergiopoulos, I., Abd-Elsalam, K.A., Aerts, A.L., Bahkali, A.H., Beenen, H.G., Chettri, P., Cox, M.P., et al. 2012. The genomes of the fungal plant pathogens Cladosporium fulvum and Dothistroma septosporum reveal adaptation to different hosts and lifestyles but also signatures of common ancestry. PLoS Genet. 8, e1003088.

Dean, R.A., Talbot, N.J., Ebbole, D.J., Farman, M.L., Mitchell, T.K., Orbach, M.J., Thon, M., Kulkarni, R., Xu, J.-R., Pan, H., et al. 2005. The genome sequence of the rice blast fungus Magnaporthe grisea. Nature. 434, 980-986.

Dehal, P., Satou, Y., Campbell, R.K., Chapman, J., Degnan, B., De Tomaso, A., Davidson, B., Di Gregorio, A., Gelpke, M., Goodstein, D.M., et al. 2002. The draft genome of Ciona intestinalis, insights into chordate and vertebrate origins. Science. 298, 2157-2167.

Deligios, M., Fraumene, C., Abbondio, M., Mannazzu, I., Tanca, A., Addis, M.F., Uzzau, S. 2015. Draft genome sequence of Rhodotorula mucilaginosa, an emergent opportunistic pathogen. Genome Announce. 3, e00201-00215.

DeMarco, R., Venancio, T.M., Verjovski-Almeida, S. 2006. SmTRC1, a novel Schistosoma mansoni DNA transposon, discloses new families of animal and fungi transposons belonging to the CACTA superfamily. BMC Evol. Biol. 6, 8.

Denoeud, F., Henriet, S., Mungpakdee, S., Aury, J.-M., Da Silva, C., Brinkmann, H., Mikhaleva, J., Olsen, L.C., Jubin, C., Canestro, C., et al. 2010. Plasticity of animal genome architecture unmasked by rapid evolution of a pelagic tunicate. Science. 330, 1381-1385.

Denoeud, F., Carretero-Paulet, L., Dereeper, A., Droc, G., Guyot, R., Pietrella, M., Zheng, C., Alberti, A., Anthony, F., Aprea, G., et al. 2014. The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science. 345, 1181-1184.

Dereeper, A., Guyot, R., Tranchant-Dubreuil, C., Anthony, F.O., Argout, X., de Bellis, F., Combes, M.C., Gavory, F., de Kochko, A., Kudrna, D., et al. 2013. BAC-end sequences analysis provides first insights into coffee (Coffea canephora P.) genome composition and evolution. Plant Mol. Biol. 83, 177-189.

Derelle, E., Ferraz, C., Rombauts, S., Rouzé, P., Worden, A.Z., Robbens, S., Partensky, F., Degroeve, S., Echeynié, S., Cooke, R., et al. 2006. Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. P. Natl. Acad. Sci. USA. 103, 11647-11652.

Desjardins, C.A., Cerqueira, G.C., Goldberg, J.M., Hotopp, J.C.D., Haas, B.J., Zucker, J., Ribeiro, J.M.C., Saif, S., Levin, J.Z., Fan, L., et al. 2013. Genomics of Loa loa, a Wolbachia-free filarial parasite of humans. Nat. Genet. 45, 495-500.

Desjardins, C.A., Champion, M.D., Holder, J.W., Muszewska, A., Goldberg, J., Bailão, A.M., Brigido, M.M., Ferreira, M.E.D.S., Garcia, A.M., Grynberg, M., et al. 2011. Comparative genomic analysis of human fungal pathogens causing paracoccidioidomycosis. PLoS Genet. 7, e1002345.

Detering, H., Aebischer, T., Dabrowski, P.W., Radonic, A., Nitsche, A., Renard, B., Kiderlen, A. 2015. First Draft Genome Sequence of , the causative agent of amoebic encephalitis. Genome Announce. 3, e01013-01015.

Dieterich, C., Clifton, S.W., Schuster, L.N., Chinwalla, A., Delehaunty, K., Dinkelacker, I., Fulton, L., Fulton, R., Godfrey, J., Minx, P., et al. 2008. The Pristionchus pacificus genome provides a unique perspective on nematode lifestyle and parasitism. Nat. Genet. 40, 1193-1198.

Dietrich, F.S., Voegeli, S., Brachat, S., Lerch, A., Gates, K., Steiner, S., Mohr, C., Pöhlmann, R., Luedi, P., Choi, S., et al. 2004. The Ashbya gossypii genome as a tool for mapping the ancient Saccharomyces cerevisiae genome. Science. 304, 304-307.

Dietrich, F.S., Voegeli, S., Kuo, S., Philippsen, P. 2013. Genomes of Ashbya fungi isolated from insects reveal four mating-type loci, numerous translocations, lack of transposons, and distinct gene duplications. G3. 3, 1225-1239.

Diguistini, S., Wang, Y., Liao, N.Y., Taylor, G., Tanguay, P., Feau, N., Henrissat, B., Chan, S.K., Hesse-Orce, U., Alamouti, S.M., et al. 2010. Genome and transcriptome analyses of the mountain pine beetle-fungal symbiont Grosmannia clavigera, a lodgepole pine pathogen. P. Natl. Acad. Sci. USA. 108, 2504-2509.

Dohm, J.C., Minoche, A.E., Holtgräwe, D., Capella-Gutiérrez, S., Zakrzewski, F., Tafer, H., Rupp, O., Sörensen, T.R., Stracke, R., Reinhardt, R., et al. 2014. The genome of the recently domesticated crop plant sugar beet (Beta vulgaris). Nature. 505, 546-549.

Dorn, K.M., Fankhauser, J.D., Wyse, D.L., Marks, M.D. 2013. De novo assembly of the pennycress (Thlaspi arvense) transcriptome provides tools for the development of a winter cover crop and biodiesel feedstock. Plant J. 75, 1028-1038.

Doyle, J.M., Katzner, T.E., Bloom, P.H., Ji, Y., Wijayawardena, B.K., Dewoody, J.A. 2014. The genome sequence of a widespread apex predator, the golden eagle (Aquila chrysaetos). PLoS One. 9, e95599.

Dritsou, V., Topalis, P., Windbichler, N., Simoni, A., Hall, A., Lawson, D., Hinsley, M., Hughes, D., Napolioni, V., Crucianelli, F., et al. 2015. A draft genome sequence of an invasive mosquito: an Italian Aedes albopictus. Path. Glob. Health. 109, 207-220.

Drosophila 12 Genomes Consortium. 2007. Evolution of genes and genomes on the Drosophila phylogeny. Nature. 450, 203-218.

Du, J., Grant, D., Tian, Z., Nelson, R.T., Zhu, L., Shoemaker, R.C., Ma, J. 2010. SoyTEdb: a comprehensive database of transposable elements in the soybean genome. BMC Genomics. 11, 113.

Dujon, B., Sherman, D., Fischer, G., Durrens, P., Casaregola, S., Lafontaine, I., De Montigny, J., Marck, C., Neuvéglise, C., Talla, E., et al. 2004. Genome evolution in yeasts. Nature. 430, 35-44.

Duplessis, S., Cuomo, C.A., Lin, Y.-C., Aerts, A., Tisserant, E., Veneault-Fourrey, C., Joly, D.L., Hacquard, S., Amselem, J., Cantarel, B.L., et al. 2011. Obligate biotrophy features unraveled by the genomic analysis of rust fungi. P. Natl. Acad. Sci. USA. 108, 9166-9171.

Durand, P.M., Oelofse, A.J., Coetzer, T.L. 2006. An analysis of mobile genetic elements in three Plasmodium species and their potential impact on the nucleotide composition of the P. falciparum genome. BMC Genomics. 7, 282.

Eastwood, D.C., Floudas, D., Binder, M., Majcherczyk, A., Schneider, P., Aerts, A., Asiegbu, F.O., Baker, S.E., Barry, K., Bendiksby, M., et al. 2011. The plant cell wall-decomposing machinery underlies the functional diversity of forest fungi. Science. 333, 762-765.

Ehrenkaufer, G.M., Weedall, G.D., Williams, D., Lorenzi, H.A., Caler, E., Hall, N., Singh, U. 2013. The genome and transcriptome of the enteric parasite Entamoeba invadens, a model for encystation. Genome Biol. 14, R77.

Eichinger, L., Pachebat, J.A., Glöckner, G., Rajandream, M.-A., Sucgang, R., Berriman, M., Song, J., Olsen, R., Szafranski, K., Xu, Q., et al. 2005. The genome of the social amoeba Dictyostelium discoideum. Nature. 435, 43-57.

Eisen, J.A., Coyne, R.S., Wu, M., Wu, D., Thiagarajan, M., Wortman, J.R., Badger, J.H., Ren, Q., Amedeo, P., Jones, K.M., et al. 2006. Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biol. 4, e286.

Ellegren, H., Smeds, L., Burri, R., Olason, P.I., Backström, N., Kawakami, T., Künstner, A., Mäkinen, H., Nadachowska-Brzyska, K., Qvarnström, A., et al. 2012. The genomic landscape of species divergence in Ficedula flycatchers. Nature. 491, 756-760.

Ellison, C.E., Stajich, J.E., Jacobson, D.J., Natvig, D.O., Lapidus, A., Foster, B., Aerts, A., Riley, R., Lindquist, E.A., Grigoriev, I.V., et al. 2011. Massive changes in genome architecture accompany the transition to self-fertility in the filamentous fungus Neurospora tetrasperma. Genetics. 189, 55-69.

El-Sayed, N.M., Myler, P.J., Bartholomeu, D.C., Nilsson, D., Aggarwal, G., Tran, A.-N., Ghedin, E., Worthey, E.A., Delcher, A.L., Blandin, G., et al. 2005. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science. 309, 409-415.

Espagne, E., Lespinet, O., Malagnac, F., Da Silva, C., Jaillon, O., Porcel, B.M., Couloux, A., Aury, J.-M., Ségurens, B., Poulain, J., et al. 2008. The genome sequence of the model ascomycete fungus Podospora anserina. Genome Biol. 9, R77.

Etienne, K.A., Chibucos, M.C., Su, Q., Orvis, J., Daugherty, S., Ott, S., Sengamalay, N.A., Fraser, C.M., Lockhart, S.R., Bruno, M. 2014. Draft genome sequence of Mortierella alpina isolate CDC-B6842. Genome Announce. 2, e01180-01113.

Fairclough, S.R., Chen, Z., Kramer, E., Zeng, Q., Young, S., Robertson, H.M., Begovic, E., Richter, D.J., Russ, C., Westbrook, M.J., et al. 2013. Premetazoan genome evolution and the regulation of cell differentiation in the choanoflagellate Salpingoeca rosetta. Genome Biol. 14, R15.

Fang, X., Mou, Y., Huang, Z., Li, Y., Han, L., Zhang, Y., Feng, Y., Chen, Y., Jiang, X., Zhao, W., et al. 2012. The sequence and analysis of a Chinese pig genome. GigaSci. 1, 16.

Fang, X., Nevo, E., Han, L., Levanon, E.Y., Zhao, J., Avivi, A., Larkin, D., Jiang, X., Feranchuk, S., Zhu, Y., et al. 2014. Genome-wide adaptive complexes to underground stresses in blind mole rats Spalax. Nat. Comm. 5, 3966.

Fedorova, N.D., Khaldi, N., Joardar, V.S., Maiti, R., Amedeo, P., Anderson, M.J., Crabtree, J., Silva, J.C., Badger, J.H., Albarraq, A., et al. 2008. Genomic islands in the pathogenic filamentous fungus Aspergillus fumigatus. PLoS Genet. 4, e1000046.

Fernandez-Fueyo, E., Ruiz-Dueñas, F.J., Ferreira, P., Floudas, D., Hibbett, D.S., Canessa, P., Larrondo, L.F., James, T.Y., Seelenfreund, D., Lobos, S., et al. 2012. Comparative genomics of Ceriporiopsis subvermispora and Phanerochaete chrysosporium provide insight into selective ligninolysis. P. Natl. Acad. Sci. USA. 109, 5458-5463.

Fernández-Medina, R.D., Struchiner, C.J., Ribeiro, J.M.C. 2011. Novel transposable elements from Anopheles gambiae. BMC Genomics. 12, 260.

Fierro, F., Gutiérrez, S., Diez, B., Martín, J.F. 1993. Resolution of four large chromosomes in penicillin-producing filamentous fungi, the penicillin gene cluster is located on chromosome II (9.6 Mb) in Penicillium notatum and chromosome 1 (10.4 Mb) in Penicillium chrysogenum. Mol. Gen. Genet. 241, 573-578.

Flegontov, P., Votýpka, J., Skalický, T., Logacheva, M.D., Penin, A.A., Tanifuji, G., Onodera, N.T., Kondrashov, A.S., Volf, P., Archibald, J.M., et al. 2013. Paratrypanosoma is a novel early-branching trypanosomatid. Curr. Biol. 23, 1787-1793.

Flot, J.-F., Hespeels, B., Li, X., Noel, B., Arkhipova, I., Danchin, E.G.J., Hejnol, A., Henrissat, B., Koszul, R., Aury, J.-M., et al. 2013. Genomic evidence for ameiotic evolution in the bdelloid rotifer Adineta vaga. Nature. 500, 453-457.

Floudas, D., Binder, M., Riley, R., Barry, K., Blanchette, R.A., Henrissat, B., Martínez, A.T., Otillar, R., Spatafora, J.W., Yadav, J.S., et al. 2012. The Paleozoic origin of enzymatic lignin decomposition reconstructed from 31 fungal genomes. Science. 336, 1715-1719.

Foflonker, F., Price, D.C., Qiu, H., Palenik, B., Wang, S., Bhattacharya, D. 2014. Genome of the halotolerant green alga Picochlorum sp. reveals strategies for thriving under fluctuating environmental conditions. Environ. Microbiol. 17, 412-426.

Foulongne-Oriol, M., Murat, C., Castanera, R., Ramírez, L., Sonnenberg, A.S.M. 2013. Genome-wide survey of repetitive DNA elements in the button mushroom Agaricus bisporus. Fungal Genet. Biol. 55, 6-21.

Frantz, L.A., Schraiber, F.J.G., Madsen, O., Megens, H.-J., Bosse, M., Paudel, Y., Semiadi, G., Meijard, E., Crooijmans, R.P., Archibald, A.L., Slatkin, M., et al. 2013. Genome sequencing reveals fine scale diversification and reticulation history during speciation in Sus. Genome Biol. 14, R107.

Franzén, O., Jerlström-Hultqvist, J., Castro, E., Sherwood, E., Ankarklev, J., Reiner, D.S., Palm, D., Andersson, J.O., Andersson, B., Svärd, S.G. 2009. Draft genome sequencing of Giardia intestinalis assemblage B isolate GS: is human giardiasis caused by two different species? PLoS Pathog. 5, e1000560.

Freel, K.C., Sarilar, V., Neuvéglise, C., Devillers, H., Friedrich, A., Schacherer, J. 2014. Genome sequence of the yeast Cyberlindnera fabianii (Hansenula fabianii). Genome Announce. 2, e00638-00614.

Fritz-Laylin, L.K., Prochnik, S.E., Ginger, M.L., Dacks, J.B., Carpenter, M.L., Field, M.C., Kuo, A., Paredez, A., Chapman, J., Pham, J., et al. 2010. The genome of Naegleria gruberi illuminates early eukaryotic versatility. Cell. 140, 631-642.

Fujii, T., Koike, H., Sawayama, S., Yano, S., Inoue, H. 2015. Draft genome sequence of Talaromyces cellulolyticus strain Y-94, a source of lignocellulosic biomass-degrading enzymes. Genome Announce. 3, e00014-00015.

Fungal Genome Size Database. 2005. Kullman, B., Tamm, H., Kullman, K. http://www.zbi.ee/fungal-genomesize

Futagami, T., Mori, K., Yamashita, A., Wada, S., Kajiwara, Y., Takashita, H., Omori, T., Takegawa, K., Tashiro, K., Kuhara, S., et al. 2011. Genome sequence of the white koji mold Aspergillus kawachii IFO 4308, used for brewing the Japanese distilled spirit shochu. Eukaryot. Cell. 10, 1586-1587.

Galagan, J.E., Calvo, S.E., Borkovich, K.A., Selker, E.U., Read, N.D., Jaffe, D., FitzHugh, W., Ma, L.-J., Smirnov, S., Purcell, S., et al. 2003. The genome sequence of the filamentous fungus Neurospora crassa. Nature. 422, 859-868.

Galagan, J.E., Calvo, S.E., Cuomo, C., Ma, L.-J., Wortman, J.R., Batzoglou, S., Lee, S.-I., Bastürkmen, M., Spevak, C.C., Clutterbuck, J., et al. 2005. Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature. 438, 1105-1115.

Galeote, V., Bigey, F., Devillers, H., Neuvéglise, C., Dequin, S. 2013. Genome sequence of the food spoilage yeast Zygosaccharomyces bailii. Genome Announce. 1, e00606-00613.

Gao, C., Wang, Y., Shen, Y., Yan, D., He, X., Dai, J., Wu, Q. 2014. Oil accumulation mechanisms of the oleaginous microalga Chlorella protothecoides revealed through its genome, transcriptomes, and proteomes. BMC Genomics. 15, 582.

Gao, M., Li, G., McCombie, W.R., Quiros, C.F. 2005. Comparative analysis of a transposon-rich Brassica oleracea BAC clone with its corresponding sequence in A. thaliana. Theoret. Appl. Genet. 111, 949-955.

Gao, Q., Jin, K., Ying, S.-H., Zhang, Y., Xiao, G., Shang, Y., Duan, Z., Hu, X., Xie, X.-Q., Zhou, G., et al. 2011. Genome sequencing and comparative transcriptomics of the model entomopathogenic fungi Metarhizium anisopliae and M. acridum. PLoS Genet. 7, e1001264.

Gao, R., Cheng, Y., Wang, Y., Guo, L., Zhang, G. 2015. Genome sequence of Phytophthora fragariae var. fragariae, a quarantine plant-pathogenic fungus. Genome Announce. 3, e00034-00015.

Gao, Y., Gao, Q., Zhang, H., Wang, L., Zhang, F., Yang, C., Song, L. 2014. Draft sequencing and analysis of the genome of pufferfish Takifugu flavidus. DNA Res. 21, 627-637.

Garcia, S., Leitch, I.J., Anadon-Rosell, A., Canela, M.Á., Gálvez, F., Garnatje, T., Gras, A., Hidalgo, O., Johnston, E., De Xaxars, G.M., et al. 2014. Recent updates and developments to plant genome size databases. Nucl. Acids Res. 42, 1-8.

Garcia-Mas, J., Benjak, A., Sanseverino, W., Bourgeois, M., Mir, G., González, V.M., Hénaff, E., Câmara, F., Cozzuto, L., Lowy, E., et al. 2012. The genome of melon (Cucumis melo L.). P. Natl. Acad. Sci. USA. 109, 11872-11877.

Gardner, M.J., Hall, N., Fung, E., White, O., Berriman, M., Hyman, R.W., Carlton, J.M., Pain, A., Nelson, K.E., Bowman, S., et al. 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 419, 498-511.

Ge, R.-L., Cai, Q., Shen, Y.-Y., San, A., Ma, L., Zhang, Y., Yi, X., Chen, Y., Yang, L., Huang, Y., et al. 2013. Draft genome sequence of the Tibetan antelope. Nat. Comm. 4, 1858.

Genet, C., Dehais, P., Palti, Y., Gao, G., Gavory, F., Wincker, P., Quillet, E., Bouhassa, M. 2011. Analysis of BAC-end sequences in rainbow trout: content characterization and assessment of synteny between trout and other fish genomes. BMC Genomics. 12, 314.

Gentles, A.J., Wakefield, M.J., Kohany, O., Gu, W., Batzer, M.A., Pollock, D.D., Jurka, J. 2007. Evolutionary dynamics of transposable elements in the short-tailed opossum Monodelphis domestica. Genome Res. 17, 992-1004.

Ghedin, E., Wang, S., Spiro, D., Caler, E., Zhao, Q., Crabtree, J., Allen, J.E., Delcher, A.L., Guiliano, D.B., Miranda-Saavedra, D., et al. 2007. Draft genome of the filarial nematode parasite Brugia malayi. Science. 317, 1756-1760.

Gianoulis, T.A., Griffin, M.A., Spakowicz, D.J., Dunican, B.F., Alpha, C.J., Sboner, A., Sismour, A.M., Kodira, C., Egholm, M., Church, G.M., et al. 2012. Genomic analysis of the hydrocarbon-producing, cellulolytic, endophytic fungus Ascocoryne sarcoides. PLoS Genet. 8, e1002558.

Gibbs, R.A., Weinstock, G.M., Metzker, M.L., Muzny, D.M., Sodergren, E.J., Scherer, S., Scott, G., Steffan, D., Worley, K.C., Burch, P.E., et al. 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 428, 493-521.

Gilchrist, A., Shearman, D.C., Frommer, M., Raphael, K.A., Deshpande, N.P., Wilkins, M.R., Sherwin, W.B., Sved, J.A. 2014. The draft genome of the pest tephritid fruit fly Bactrocera tryoni: resources for the genomic analysis of hybridising species. BMC Genomics. 15, 1153.

Gillece, J.D., Schupp, J.M., Balajee, S.A., Harris, J., Pearson, T., Yan, Y., Keim, P., DeBess, E., Marsden-Haug, N., Wohrle, R., et al. 2011. Whole genome sequence analysis of Cryptococcus gattii from the Pacific Northwest reveals unexpected diversity. PLoS One. 6, e28550.

Giorello, F.M., Berná, L., Greif, G., Camesasca, L., Salzman, V., Medina, K., Robello, C., Gaggero, C., Aguilar, P.S. 2014. Genome sequence of the native apiculate wine yeast Hanseniaspora vineae T02/19AF. Genome Announce. 2, e00530-00514.

Gobler, C.J., Berry, D.L., Dyhrman, S.T., Wilhelm, S.W., Salamov, A., Lobanov, A.V., Zhang, Y., Collier, J.L., Wurch, L.L., Kustka, A.B., et al. 2011. Niche of harmful alga Aureococcus anophagefferens revealed through ecogenomics. P. Natl. Acad. Sci. USA. 108, 4352-4357.

Godel, C., Kumar, S., Koutsovoulos, G., Ludin, P., Nilsson, D., Comandatore, F., Wrobel, N., Thompson, M., Schmid, C.D., Goto, S., et al. 2012. The genome of the heartworm, Dirofilaria immitis, reveals drug and vaccine targets. FASEB J. 26, 4650-4661.

Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M., et al. 1996. Life with 6000 genes. Science. 274, 546, 563-567.

Gomez-Angulo, J., Vega-Alvarado, L., Escalante-García, Z., Grande, R., Gschaedler-Mathis, A., Amaya-Delgado, L., Arrizon, J., Sanchez-Flores, A. 2015. Genome sequence of Torulaspora delbrueckii NRRL Y-50541, isolated from mezcal fermentation. Genome Announce. 3, e00438-00415.

González, L.G., Deyholos, M.K. 2012. Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome. BMC Genomics. 13, 644.

Goodstein, D.M., Shu, S., Howson, R., Neupane, R., Hayes, R.D., Fazo, J., Mitros, T., Dirks, W., Hellsten, U., Putnam, N., et al. 2012. Phytozome: A comparative platform for green plant genomics. Nucl. Acids Res. 40, 1178-1186.

Goodwin, S.B., M'barek, S.B., Dhillon, B., Wittenberg, A.H.J., Crane, C.F., Hane, J.K., Foster, A.J., Van der Lee, T.A.J., Grimwood, J., Aerts, A., et al. 2011. Finished genome of the fungal wheat pathogen Mycosphaerella graminicola reveals dispensome structure, chromosome plasticity, and stealth pathogenesis. PLoS Genet. 7, e1002070.

Goodwin, T.J.D., Poulter, R.T.M. 2004. A new group of tyrosine recombinase-encoding retrotransposons. Mol. Biol. Evol. 21, 746-759.

Grbic, M., Van Leeuwen, T., Clark, R.M., Rombauts, S., Rouzé, P., Grbic, V., Osborne, E.J., Dermauw, W., Thi Ngoc, P.C., Ortego, F., et al. 2011. The genome of Tetranychus urticae reveals herbivorous pest adaptations. Nature. 479, 487-492.

Green, R.E., Braun, E.L., Armstrong, J., Earl, D., Nguyen, N., Hickey, G., Vandewege, M.W., John, J.A.S., Capella-Gutiérrez, S., Castoe, T.A., et al. 2014. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs. Science. 346, 1355.

Groenen, M.A.M., Archibald, A.L., Uenishi, H., Tuggle, C.K., Takeuchi, Y., Rothschild, M.F., Rogel-Gaillard, C., Park, C., Milan, D., Megens, H.-J., et al. 2012. Analyses of pig genomes provide insight into porcine demography and evolution. Nature. 491, 393-398.

Gui, Y.-J., Zhou, Y., Wang, Y., Wang, S., Wang, S.-Y., Hu, Y., Bo, S.-P., Chen, H., Zhou, C.P., Ma, N.X., et al. 2010. Insights into the bamboo genome: syntenic relationships to rice and sorghum. J. Integr. Plant Biol. 51, 1008-1015.

Guo, S., Zhang, J., Sun, H., Salse, J., Lucas, W.J., Zhang, H., Zheng, Y., Mao, L., Ren, Y., Wang, Z., et al. 2013. The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nat. Genet. 45, 51-58.

Haas, B.J., Kamoun, S., Zody, M.C., Jiang, R.H.Y., Handsaker, R.E., Cano, L.M., Grabherr, M., Kodira, C.D., Raffaele, S., Torto-Alalibo, T., et al. 2009. Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans. Nature. 461, 393-398.

Haddad, N.J., Loucif-Ayad, W., Adjlane, N., Saini, D., Manchiganti, R., Krishnamurthy, V., AlShagoor, B., Batainh, A.M., Mugasimangalam, R. 2015. Draft genome sequence of the Algerian bee Apis mellifera intermissa. Genomic Data. 4, 24-25.

Hahn, C., Fromm, B., Bachmann, L. 2014. Comparative genomics of flatworms (platyhelminthes) reveals shared genomic features of ecto- and endoparasitic neodermata. Genome Biol. Evol. 6, 1105-1117.

Hahn, M.W., Han, M.V., Han, S.G. 2007. Gene family evolution across 12 Drosophila genomes. PLoS Genet. 3, 2135-2146.

Hamberger, B., Hall, D., Yuen, M., Oddy, C., Hamberger, B., Keeling, C.I., Ritland, C., Ritland, K., Bohlmann, J. 2009. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome. BMC Plant Biol. 9, 106.

Hane, J.K., Lowe, R.G.T., Solomon, P.S., Tan, K.-C., Schoch, C.L., Spatafora, J.W., Crous, P.W., Kodira, C., Birren, B.W., Galagan, J.E., et al. 2007. Dothideomycete plant interactions illuminated by genome sequencing and EST analysis of the wheat pathogen Stagonospora nodorum. Plant Cell. 19, 3347-3368.

Hao, D.-C., Yang, L., Xiao, P.-G. 2011. The first insight into the Taxus genome via fosmid library construction and end sequencing. Mol. Genet. Genomics. 285, 197-205.

Haridas, S., Wang, Y., Lim, L., Alamouti, S.M., Jackman, S., Docking, R., Robertson, G., Birol, I., Bohlmann, J., Breuil, C. 2013. The genome and transcriptome of the pine saprophyte Ophiostoma piceae, and a comparison with the bark beetle-associated pine pathogen Grosmannia clavigera. BMC Genomics. 14, 373.

Haudry, A., Platts, A.E., Vello, E., Hoen, D.R., Leclercq, M., Williamson, R.J., Forczek, E., Joly-Lopez, Z., Steffen, J.G., Hazzouri, K.M., et al. 2013. An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat. Genet. 45, 891-898.

Hauser, P.M., Burdet, F.X., Cissé, O.H., Keller, L., Taffé, P., Sanglard, D., Pagni, M. 2010. Comparative genomics suggests that the fungal pathogen Pneumocystis is an obligate parasite scavenging amino acids from its host's lungs. PLoS One. 5, e15152.

He, N., Zhang, C., Qi, X., Zhao, S., Tao, Y., Yang, G., Lee, T.-H., Wang, X., Cai, Q., Li, D., et al. 2013. Draft genome sequence of the mulberry tree Morus notabilis. Nat. Comm. 4, 2445.

Heinz, E., Williams, T.A., Nakjang, S., Noël, C.J., Swan, D.C., Goldberg, A.V., Harris, S.R., Weinmaier, T., Markert, S., Becher, D., et al. 2012. The genome of the obligate intracellular parasite Trachipleistophora hominis, new insights into microsporidian genome dynamics and reductive evolution. PLoS Pathog. 8, e1002979.

Heitlinger, E., Spork, S., Lucius, R., Dieterich, C. 2014. The genome of Eimeria falciformis - reduction and specialization in a single host apicomplexan parasite. BMC Genomics. 15, 696.

Hellborg, L., Piškur, J. 2009. Complex nature of the genome in a wine spoilage yeast, Dekkera bruxellensis. Eukaryot. Cell. 8, 1739-1749.

Hellsten, U., Harland, R.M., Gilchrist, M.J., Hendrix, D., Jurka, J., Kapitonov, V., Ovcharenko, I., Putnam, N.H., Shu, S., Taher, L., et al. 2010. The genome of the western clawed frog Xenopus tropicalis. Science. 328, 633-636.

Herrera-Estrella, A., Goldman, G.H., van Montagu, M., Geremia, R.A. 1993. Electrophoretic karyotype and gene assignment to resolved chromosomes of Trichoderma spp. Mol. Microbiol. 7, 515-521.

Hertweck, K. 2013. Assembly and comparative analysis of transposable elements from low coverage genomic sequence data in Asparagales. Genome. 56, 1-8.

Hirai, H., Taguchi, T., Saitoh, Y., Kawanaka, M., Sugiyama, H., Habe, S., Okamoto, M., Hirata, M., Shimada, M., Tiu, W.U., et al. 2000. Chromosomal differentiation of the Schistosoma japonicum complex. Int. J. Parasitol. 30, 441-452.

Hirakawa, H., Shirasawa, K., Miyatake, K., Nunome, T., Negoro, S., Ohyama, A., Yamaguchi, H., Sato, S., Isobe, S., Tabata, S., Fukuoka, H. 2014. Draft genome sequence of eggplant (Solanum melongena L.): the representative Solanum species indigenous to the old world. DNA Res. 21, 649-660.

Hirakawa, H., Shirasawa, K., Kosugi, S., Tashiro, K., Nakayama, S., Yamada, M., Kohara, M., Watanabe, A., Kishida, Y., Fujishiro, T., et al. 2013. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species. DNA Res. 29, 169-181.

Hirakawa, H., Okada, Y., Tabuchi, H., Shirasawa, K., Watanabe, A., Tsuruoka, H., Minami, C., Nakayama, S., Sasamoto, S., Kohara, M., et al. 2015. Survey of genome sequences in a wild sweet potato, Ipomoea trifida (H. B. K.) G. Don. DNA Res. 22, 171-179.

Holt, R.A., Subramanian, G.M., Halpern, A., Sutton, G.G., Charlab, R., Nusskern, D.R., Wincker, P., Clark, A.G., Ribeiro, J.M.C., Wides, R., et al. 2002. The genome sequence of the malaria mosquito Anopheles gambiae. Science. 298, 129-149.

Hong, X., Scofield, D.G., Lynch, M. 2006. Intron size, abundance, and distribution within untranslated regions of genes. Mol. Biol. Evol. 23, 2392-2404.

Hori, K., Maruyama, F., Fujisawa, T., Togashi, T., Yamamoto, N., Seo, M., Sato, S., Yamada, T., Mori, H., Tajima, N., et al. 2014. Klebsormidium flaccidum genome reveals primary factors for plant terrestrial adaptation. Nat. Comm. 5, 3978.

Horn, F., Habel, A., Scharf, D.H., Dworschak, J., Brakhage, A.A., Guthke, R., Hertweck, C., Linde, J. 2015. Draft genome sequence and gene annotation of the entomopathogenic fungus Verticillium hemipterigenum. Genome Announce. 3, e01439-01414.

Hovde, B.T., Deodato, C.R., Hunsperger, H.M., Ryken, S.A., Yost, W., Jha, R.K., Patterson, J., Monnat, R.J., Barlow, S.B., Starkenburg, S.R., Cattolico, R.A. 2015. Genome sequence and transcriptome analyses of Chrysochromulina tobin: metabolic tools for enhanced algal fitness in the prominent order Prymnesiales (Haptophyceae). PLoS Genet. 11, 1-31.

Howe, K., Clark, M.D., Torroja, C.F., Torrance, J., Berthelot, C., Muffato, M., Collins, J.E., Humphray, S., McLaren, K., Matthews, L., et al. 2013. The zebrafish reference genome sequence and its relationship to the human genome. Nature. 496, 498-503.

Hsu, C.-C., Chung, Y.-L., Chen, T.-C., Lee, Y.-L., Kuo, Y.-T., Tasi, W.-C., Hsiao, Y.-Y., Chen, Y.W., Wu, W.L., Chen, H.H. 2011. An overview of the Phalaenopsis orchid genome through BAC end sequence analysis. BMC Plant Biol. 11, 3.

Hu, T.T., Pattyn, P., Bakker, E.G., Cao, J., Cheng, J.-F., Clark, R.M., Fahlgren, N., Fawcett, J.A., Grimwood, J., Gundlach, H., et al. 2011. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat. Genet. 43, 476-481.

Huang, S., Ding, J., Deng, D., Tang, W., Sun, H., Liu, D., Zhang, L., Niu, X., Zhang, X., Meng, M., et al. 2013. Draft genome of the kiwifruit Actinidia chinensis. Nat. Comm. 4, 2640.

Huang, S., Li, R., Zhang, Z., Li, L., Gu, X., Fan, W., Lucas, W.J., Wang, X., Xie, B., Ni, P., et al. 2009. The genome of the cucumber, Cucumis sativus L. Nat. Genet. 41, 1275-1281.

Huang, Y., Li, Y., Burt, D.W., Chen, H., Zhang, Y., Qian, W., Kim, H., Gan, S., Zhao, Y., Li, J., et al. 2013. The duck genome and transcriptome provide insight into an avian influenza virus reservoir species. Nat. Genet. 45, 776-783.

Ibarra-Laclette, E., Lyons, E., Hernández-Guzmán, G., Pérez-Torres, C.A., Carretero-Paulet, L., Chang, T.-H., Lan, T., Welch, A.J., Juárez, M.J.A., Simpson, J., et al. 2013. Architecture and evolution of a minute plant genome. Nature. 498, 94-98.

International Chicken Genome Sequencing Consortium. 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 432, 695-715.

International Glossina Genome Initiative. 2014. Genome sequence of the tsetse fly (Glossina morsitans), vector of African trypanosomiasis. Science. 344, 380-386.

International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature. 409, 860-921.

Islam, M.S., Haque, M.S., Islam, M.M., Emdad, E.M., Halim, A., Hossen, Q.M.M., Hossain, M.Z., Ahmed, B., Rahim, S., Rahman, M.S., et al. 2012. Tools to kill: genome of one of the most destructive plant pathogenic fungi Macrophomina phaseolina. BMC Genomics. 13, 493.

Istvanek, J., Jaros, M., Krenek, A., Repkova, J. 2014. Genome assembly and annotation for red clover (Trifolium pratense, Fabaceae). Am. J. Bot. 101, 1-11.

Ivens, A.C., Peacock, C.S., Worthey, E.A., Murphy, L., Aggarwal, G., Berriman, M., Sisk, E., Rajandream, M.-A., Adlem, E., Aert, R., et al. 2005. The genome of the kinetoplastid parasite, Leishmania major. Science. 309, 436-442.

Jackson, A.P., Gamble, J.A., Yeomans, T., Moran, G.P., Saunders, D., Harris, D., Aslett, M., Barrell, J.F., Butler, G., Citiulo, F., et al. 2009. Comparative genomics of the fungal pathogens Candida dubliniensis and Candida albicans. Genome Res. 19, 2231-2244.

Jaeckisch, N., Yang, I., Wohlrab, S., Glöckner, G., Kroymann, J., Vogel, H., Cembella, A., John, U. 2011. Comparative genomic and transcriptomic characterization of the toxigenic marine dinoflagellate Alexandrium ostenfeldii. PLoS One. 6, e28012.

Jaillon, O., Aury, J.-M., Brunet, F., Petit, J.-L., Stange-Thomann, N., Mauceli, E., Bouneau, L., Fischer, C., Ozouf-Costaz, C., Bernot, A., et al. 2004. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 431, 946-957.

James, T.Y., Pelin, A., Bonen, L., Ahrendt, S., Sain, D., Corradi, N., Stajich, J.E. 2013. Shared signatures of parasitism and phylogenomics unite Cryptomycota and microsporidia. Curr. Biol. 23, 1548-1553.

Janbon, G., Ormerod, K.L., Paulet, D., Byrnes, E.J., Yadav, V., Chatterjee, G., Mullapudi, N., Hon, C.-C., Billmyre, R.B., Brunel, F., et al. 2014. Analysis of the genome and transcriptome of Cryptococcus neoformans var. grubii reveals complex RNA expression and microevolution leading to virulence attenuation. PLoS Genet. 10, e1004261.

Jeffries, T.W., Grigoriev, I.V., Grimwood, J., Laplaza, J.M., Aerts, A., Salamov, A., Schmutz, J., Lindquist, E., Dehal, P., Shapiro, H., et al. 2007. Genome sequence of the lignocellulose-bioconverting and xylose-fermenting yeast Pichia stipitis. Nat. Biotech. 25, 319-326.

Jex, A.R., Liu, S., Li, B., Young, N.D., Hall, R.S., Li, Y., Yang, L., Zeng, N., Xu, X., Xiong, Z., et al. 2011. Ascaris suum draft genome. Nature. 479, 529-533.

Jex, A.R., Nejsum, P., Schwarz, E.M., Hu, L., Young, N.D., Hall, R.S., Korhonen, P.K., Liao, S., Thamsborg, S., Xia, J., et al. 2014. Genome and transcriptome of the porcine whipworm Trichuris suis. Nat. Genet. 46, 701-706.

Jia, J., Zhao, S., Kong, X., Li, Y., Zhao, G., He, W., Appels, R., Pfeifer, M., Tao, Y., Zhang, X., et al. 2013. Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature. 496, 91-95.

Jiang, X., Peery, A., Hall, A.B., Sharma, A., Chen, X.-G., Waterhouse, R.M., Komissarov, A., Riehle, M.M., Shouche, Y., Sharakhova, M.V., et al. 2014. Genome analysis of a major urban malaria vector mosquito, Anopheles stephensi. Genome Biol. 15, 459.

Jiang, Y., Xie, M., Chen, W., Talbot, R., Maddox, J.F., Faraut, T., Wu, C., Muzny, D.M., Li, Y., Zhang, W., et al. 2014. The sheep genome illuminates biology of the rumen and lipid metabolism. Science. 344, 1168-1173.

Jones, T., Federspiel, N.A., Chibana, H., Dungan, J., Kalman, S., Magee, B.B., Newport, G., Thorstenson, Y.R., Agabian, N., Magee, P.T., et al. 2004. The diploid genome sequence of Candida albicans. P. Natl. Acad. Sci. USA. 101, 7329-7334.

Joneson, S., Stajich, J.E., Shiu, S.-H., Rosenblum, E.B. 2011. Genomic transition to pathogenicity in chytrid fungi. PLoS Pathog. 7, e1002338.

Kagale, S., Koh, C., Nixon, J., Bollina, V., Clarke, W.E., Tuteja, R., Spillane, C., Robinson, S.J., Links, M.G., Clarke, C., et al. 2014. The emerging biofuel crop Camelina sativa retains a highly undifferentiated hexaploid genome structure. Nat. Comm. 5, 3706.

Kakumani, P.K., Malhotra, P., Mukherjee, S.K., Bhatnagar, R.K. 2014. A draft genome assembly of the army worm, Spodoptera frugiperda. Genomics. 104, 134-143.

Kaminker, J.S., Bergman, C.M., Kronmiller, B., Carlson, J., Svirskas, R., Patel, S., Frise, E., Wheeler, D.A., Lewis, S.E., Rubin, G.M., et al. 2002. The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective. Genome Biol. 3, R0084.

Kämper, J., Kahmann, R., Bölker, M., Ma, L.-J., Brefort, T., Saville, B.J., Banuett, F., Kronstad, J.W., Gold, S.E., Müller, O., et al. 2006. Insights from the genome of the biotrophic fungal plant pathogen Ustilago maydis. Nature. 444, 97-101.

Kang, Y.J., Satyawan, D., Shim, S., Lee, T., Lee, J., Hwang, W.J., Kim, S.K., Lestari, P., Laosatit, K., Kim, K.H., et al. 2015. Draft genome sequence of adzuki bean, Vigna angularis. Sci. Rep. 5, 8069.

Kang, Y.J., Kim, S.K., Kim, M.Y., Lestari, P., Kim, K.H., Ha, B.-K., Jun, T.H., Hwang, W.J., Lee, T., Lee, J., et al. 2014. Genome sequence of mungbean and insights into evolution within Vigna species. Nat. Comm. 5, 5443.

Kapraun, D.F. 2005. Nuclear DNA content estimates in multicellular green, red and brown algae: Phylogenetic considerations. Ann. Bot-London. 95, 7-44.

Kapraun, D.F. 2007. Nuclear DNA content estimates in green algal lineages, Chlorophyta and Streptophyta. Ann. Bot-London. 99, 677-701.

Karro, J.E., Yan, Y., Zheng, D., Zhang, Z., Carriero, N., Cayting, P., Harrison, P., Gerstein, M. 2007. Pseudogene.org: A comprehensive database and comparison platform for pseudogene annotation. Nucl. Acids Res. 35, 55-60.

Kasahara, M., Naruse, K., Sasaki, S., Nakatani, Y., Qu, W., Ahsan, B., Yamada, T., Nagayasu, Y., Doi, K., Kasai, Y., et al. 2007. The medaka draft genome and insights into vertebrate genome evolution. Nature. 447, 714-719.

Katinka, M.D., Duprat, S., Cornillot, E., Méténier, G., Thomarat, F., Prensier, G., Barbe, V., Peyretaillade, E., Brottier, P., Wincker, P., et al. 2001. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature. 414, 450-453.

Keeling, C.I., Yuen, M.M.S., Liao, N.Y., Docking, T.R., Chan, S.K., Taylor, G.A., Palmquist, D.L., Jackman, S.D., Nguyen, A., Li, M., et al. 2013. Draft genome of the mountain pine beetle, Dendroctonus ponderosae Hopkins, a major forest pest. Genome Biol. 14, R27.

Kelley, J.L., Peyton, J.T., Fiston-Lavier, A.-S., Teets, N.M., Yee, M.-C., Johnston, J.S., Bustamante, C.D., Lee, R.E., Denlinger, D.L. 2014. Compact genome of the Antarctic midge is likely an adaptation to an extreme environment. Nat. Comm. 5, 1-8.

Kellis, M., Birren, B.W., Lander, E.S. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 428, 617-624.

Kellis, M., Patterson, N., Endrizzi, M., Birren, B., Lander, E.S. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 423, 241-254.

Kemen, E., Gardiner, A., Schultz-Larsen, T., Kemen, A.C., Balmuth, A.L., Robert-Seilaniantz, A., Bailey, K., Holub, E., Studholme, D.J., Maclean, D., et al. 2011. Gene gain and loss during evolution of obligate parasitism in the white rust pathogen of Arabidopsis thaliana. PLoS Biol. 9, e1001094.

Khatri, I., Akhtar, A., Kaur, K., Tomar, R., Prasad, G.S., Ramya, T.N.C., Subramanian, S. 2013. Gleaning evolutionary insights from the genome sequence of a probiotic yeast Saccharomyces boulardii. Gut Pathog. 5, 30.

Kikuchi, T., Cotton, J.A., Dalzell, J.J., Hasegawa, K., Kanzaki, N., McVeigh, P., Takanashi, T., Tsai, I.J., Assefa, S.A., Cock, P.J.A., et al. 2011. Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus. PLoS Pathog. 7, e1002219.

Kim, C., Lee, T.-H., Compton, R.O., Robertson, J.S., Pierce, G.J., Paterson, A.H. 2013. A genome-wide BAC end-sequence survey of sugarcane elucidates genome composition, and identifies BACs covering much of the euchromatin. Plant Mol. Biol. 81, 139-147.

Kim, E.B., Fang, X., Fushan, A.A., Huang, Z., Lobanov, A.V., Han, L., Marino, S.M., Sun, X., Turanov, A.A., Yang, P., et al. 2011. Genome sequencing reveals insights into physiology and longevity of the naked mole rat. Nature. 479, 223-227.

Kim, J.M., Vanguri, S., Boeke, J.D., Gabriel, A., Voytas, D.F. 1998. Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res. 8, 464-478.

Kim, J.-S., Baek, J.-H., Park, N.-H., Kim, C. 2015. Complete genome sequence of halophilic yeast Meyerozyma caribbica MG20W isolated from rhizosphere soil. Genome Announce. 3, e00127-00115.

Kim, M.Y., Lee, S., Van, K., Kim, T.-H., Jeong, S.-C., Choi, I.-Y., Kim, D.-S., Lee, Y.-S., Park, D., Ma, J., et al. 2010. Whole-genome sequencing and intensive analysis of the undomesticated soybean (Glycine soja Sieb. and Zucc.) genome. P. Natl. Acad. Sci. USA. 107, 22032-22037.

Kim, S., Park, M., Yeom, S.-I., Kim, Y.-M., Lee, J.M., Lee, H.-A., Seo, E., Choi, J., Cheong, K., Kim, K.-T., et al. 2014. Genome sequence of the hot pepper provides insights into the evolution of pungency in Capsicum species. Nat. Genet. 46, 270-278.

Kimbacher, S., Gerstl, I., Velimirov, B., Hagemann, S. 2009. Drosophila P transposons of the urochordata Ciona intestinalis. Mol. Genet. Genomics. 282, 165-172.

King, N., Westbrook, M.J., Young, S.L., Kuo, A., Abedin, M., Chapman, J., Fairclough, S., Hellsten, U., Isogai, Y., Letunic, I., et al. 2008. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature. 451, 783-788.

Kirkness, E.F., Haas, B.J., Sun, W., Braig, H.R., Perotti, M.A., Clark, J.M., Lee, S.H., Robertson, H.M., Kennedy, R.C., Elhaik, E., et al. 2010. Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle. P. Natl. Acad. Sci. USA. 107, 12168-12173.

Kitashiba, H., Li, F., Hirakawa, H., Kawanabe, T., Zou, Z., Hasegawa, Y., Tonosaki, K., Shirasawa, S., Fukushima, A., Yokoi, S., et al. 2014. Draft sequences of the radish (Raphanus sativus L.) genome. DNA Res. 21, 481-490.

Klosterman, S.J., Subbarao, K.V., Kang, S., Veronese, P., Gold, S.E., Thomma, B.P.H.J., Chen, Z., Henrissat, B., Lee, Y.-H., Park, J., et al. 2011. Comparative genomics yields insights into niche adaptation of plant vascular wilt pathogens. PLoS Pathog. 7, e1002137.

Kreppel, L., Fey, P., Gaudet, P., Just, E., Kibbe, W.A., Chisholm, R.L., Kimmel, A.R. 2004. dictyBase: a new Dictyostelium discoideum genome database. Nucl. Acids Res. 32, D332-D333.

Krishnan, N.M., Pattnaik, S., Jain, P., Gaur, P., Choudhary, R., Vaidyanathan, S., Deepak, S., Hariharan, A.K., Krishna, P.G.B., Nair, J., et al. 2012. A draft of the genome and four transcriptomes of a medicinal and pesticidal angiosperm Azadirachta indica. BMC Genomics. 13, 464.

Kuan, C.S., Yew, S.M., Toh, Y.F., Chan, C.L., Ngeow, Y.F., Lee, K.W., Na, S.L., Yee, W.Y., Hoh, C.C., Ng, K.P. 2014. Dissecting the fungal biology of Bipolaris papendorfii: from phylogenetic to comparative genomic analysis. DNA Res. 22, 219-232.

Kubicek, C.P., Herrera-Estrella, A., Seidl-Seiboth, V., Martinez, D.A., Druzhinina, I.S., Thon, M., Zeilinger, S., Casas-Flores, S., Horwitz, B.A., Mukherjee, P.K., et al. 2011. Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma. Genome Biol. 12, R40.

Kuhl, H., Beck, A., Wozniak, G., Canario, A.V.M., Volckaert, F.A.M., Reinhardt, R. 2010. The European sea bass Dicentrarchus labrax genome puzzle: comparative BAC-mapping and low coverage shotgun sequencing. BMC Genomics. 11, 68.

Kumar, S., Randhawa, A., Ganesan, K., Raghava, G.P.S., Mondal, A.K. 2012. Draft genome sequence of salt-tolerant yeast Debaryomyces hansenii var. hansenii MTCC 234. Eukaryot. Cell. 11, 961-962.

Kutsenko, A., Svensson, T., Nystedt, B., Lundeberg, J., Björk, P., Sonnhammer, E., Giacomello, S., Visa, N., Wieslander, L. 2014. The Chironomus tentans genome sequence and the organization of the Balbiani ring genes. BMC Genomics. 15, 819.

Labbé, J., Murat, C., Morin, E., Tuskan, G.A., Le Tacon, F., Martin, F. 2012. Characterization of transposable elements in the ectomycorrhizal fungus Laccaria bicolor. PLoS One. 7, e40197.

Lamour, K.H., Mudge, J., Gobena, D., Hurtado-Gonzales, O.P., Schmutz, J., Kuo, A., Miller, N.A., Rice, B.J., Raffaele, S., Cano, L.M., et al. 2012. Genome sequencing and mapping reveal loss of heterozygosity as a mechanism for rapid adaptation in the vegetable pathogen Phytophthora capsici. Mol. Plant Microbe In. 25, 1350-1360.

Laurie, J.D., Ali, S., Linning, R., Mannhaupt, G., Wong, P., Güldener, U., Münsterkötter, M., Moore, R., Kahmann, R., Bakkeren, G., et al. 2012. Genome comparison of barley and maize smut fungi reveals targeted loss of RNA silencing components and species-specific presence of transposable elements. Plant Cell. 24, 1733-1745.

Lavoie, C. A., Platt, R.N., Novick, P.A., Counterman, B.A., Ray, D.A. 2013. Transposable element evolution in Heliconius suggests genome diversity within Lepidoptera. Mob. DNA. 4, 21.

Lee, R. M., Thimmapuram, J., Thinglum, K.A., Gong, G., Hernandez, A.G., Wright, C.L., Kim, R.W., Mikel, M.A., Tranel, P.J. 2009. Sampling the waterhemp (Amaranthus tuberculatus) genome using pyrosequencing technology. Weed Sci. 57, 463-469.

Lenassi, M., Gostincar, C., Jackman, S., Turk, M., Sadowski, I., Nislow, C., Jones, S., Birol, I., Cimerman, N.G., Plemenitaš, A. 2013. Whole genome duplication and enrichment of metal cation transporters revealed by de novo genome sequencing of extremely halotolerant black yeast Hortaea werneckii. PLoS One. 8, e71328.

Lerat, E., Burlet, N., Biémont, C., Vieira, C. 2011. Comparative analysis of transposable elements in the melanogaster subgroup sequenced genomes. Gene. 473, 100-109.

Lertwattanasakul, N., Kosaka, T., Hosoyama, A., Suzuki, Y., Rodrussamee, N., Matsutani, M., Murata, M., Fujimoto, N., Suprayogi, Tsuchikane, K., Limtong, S., Fujita, N., Yamada, M. 2015. Genetic basis of the highly efficient yeast Kluyveromyces marxianus: complete genome sequence and transcriptome analyses. Biotech. Biofuels 8, 47.

Leushkin, E.V., Sutormin, R.A., Nabieva, E.R., Penin, A.A., Kondrashov, A.S., Logacheva, M.D. 2013. The miniature genome of a carnivorous plant Genlisea aurea contains a low number of genes and short non-coding sequences. BMC Genomics. 14, 476.

Levasseur, A., Lomascolo, A., Chabrol, O., Ruiz-Dueñas, F.J., Boukhris-Uzan, E., Piumi, F., Kües, U., Ram, A.F.J., Murat, C., Haon, M., et al. 2014. The genome of the white-rot fungus Pycnoporus cinnabarinus, a basidiomycete model with a versatile arsenal for lignocellulosic biomass breakdown. BMC Genomics. 15, 486.

Lévesque, C.A., Brouwer, H., Cano, L., Hamilton, J.P., Holt, C., Huitema, E., Raffaele, S., Robideau, G.P., Thines, M., Win, J., et al. 2010. Genome sequence of the necrotrophic plant pathogen Pythium ultimum reveals original pathogenicity mechanisms and effector repertoire. Genome Biol. 11, R73.

Lewis, N.E., Liu, X., Li, Y., Nagarajan, H., Yerganian, G., O'Brien, E., Bordbar, A., Roth, A.M., Rosenbloom, J., Bian, C., et al. 2013. Genomic landscapes of Chinese hamster ovary cell lines as revealed by the Cricetulus griseus draft genome. Nat. Biotech. 31, 759-765.

Li, F., Fan, G., Wang, K., Sun, F., Yuan, Y., Song, G., Li, Q., Ma, Z., Lu, C., Zou, C., et al. 2014. Genome sequence of the cultivated cotton Gossypium arboreum. Nat. Genet. 46, 567-572.

Li, R., Fan, W., Tian, G., Zhu, H., He, L., Cai, J., Huang, Q., Cai, Q., Li, B., Bai, Y., et al. 2010. The sequence and de novo assembly of the giant panda genome. Nature. 463, 311-317.

Li, S., Darwish, O., Alkharouf, N., Matthews, B., Ji, P., Domier, L.L., Zhang, N., Bluhm, B.H. 2015. Draft genome sequence of Phomopsis longicolla isolate MSPL 10-6. Genomics Data. 3, 55-56.

Lin, J., Kudrna, D., Wing, R.A. 2011. Construction, characterization, and preliminary BAC-end sequence analysis of a bacterial artificial chromosome library of the tea plant (Camellia sinensis). J. Biomed. Biotech. 2011, 476723.

Lin, K., Limpens, E., Zhang, Z., Ivanov, S., Saunders, D.G.O., Mu, D., Pang, E., Cao, H., Cha, H., Lin, T., et al. 2014. Single nucleus genome sequencing reveals high similarity among nuclei of an endomycorrhizal fungus. PLoS Genet. 10, e1004078.

Lin, S., Cheng, S., Song, B., Zhong, X., Lin, X., Li, W., Li, L., Zhang, Y., Zhang, H., Ji, Z., et al. 2015. The Symbiodinium kawagutii genome illuminates dinoflagellate gene expression and coral symbiosis. Science. 350, 691-694.

Lind, M., van der Nest, M., Olson, Å., Brandström-Durling, M., Stenlid, J. 2012. A 2nd generation linkage map of Heterobasidion annosum s.l. based on in silico anchoring of AFLP markers. PLoS One. 7, 1-8.

Lindblad-Toh, K., Wade, C.M., Mikkelsen, T.S., Karlsson, E.K., Jaffe, D.B., Kamal, M., Clamp, M., Chang, J.L., Kulbokas, E.J., Zody, M.C., et al. 2005. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 438, 803-819.

Linde, J., Schwartze, V., Binder, U., Voigt, K., Horn, F. 2014. De novo whole-genome sequence and genome annotation of Lichtheimia ramosa. Genome Announce. 2, e0088-0014.

Ling, H.-Q., Zhao, S., Liu, D., Wang, J., Sun, H., Zhang, C., Fan, H., Li, D., Dong, L., Tao, Y., et al. 2013. Draft genome of the wheat A-genome progenitor Triticum urartu. Nature. 496, 87-90.

Liti, G., Ba, A.N.N., Blythe, M., Müller, C.A., Bergström, A., Cubillos, F.A., Dafhnis-Calas, F., Khoshraftar, S., Malla, S., Mehta, N., et al. 2013. High quality de novo sequencing and assembly of the Saccharomyces arboricolus genome. BMC Genomics. 14, 69.

Liu, D., Gong, J., Dai, W., Kang, X., Huang, Z., Zhang, H.-M., Liu, W., Liu, L., Ma, J., Xia, Z., et al. 2012. The genome of Ganoderma lucidum provides insights into triterpenes biosynthesis and wood degradation. PLoS One. 7, e36146.

Liu, D., Zeng, S.-H., Chen, J.-J., Zhang, Y.-J., Xiao, G., Zhu, L.-Y., Wang, Y. 2013. First insights into the large genome of Epimedium sagittatum (Sieb. et Zucc) Maxim, a Chinese traditional medicinal plant. Int. J. Mol. Sci. 14, 13559-13576.

Liu, K., Zhang, W., Lai, Y., Xiang, M., Wang, X., Zhang, X., Liu, X. 2014. Drechslerella stenobrocha genome illustrates the mechanism of constricting rings and the origin of nematode predation in fungi. BMC Genomics. 15, 114.

Liu, M.-J., Zhao, J., Cai, Q.-L., Liu, G.-C., Wang, J.-R., Zhao, Z.-H., Liu, P., Dai, L., Yan, G., Wang, W.-J., et al. 2014. The complex jujube genome provides insights into fruit tree biology. Nat. Comm. 5, 5315.

Liu, S., Liu, Y., Yang, X., Tong, C., Edwards, D., Parkin, I.A.P., Zhao, M., Ma, J., Yu, J., Huang, S., et al. 2014. The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes. Nat. Comm. 5, 3930.

Liu, W., Thummasuwan, S., Sehgal, S.K., Chouvarine, P., Peterson, D.G. 2011. Characterization of the genome of bald cypress. BMC Genomics. 12, 553.

Liu, X., Zhao, B., Zheng, H.-J., Hu, Y., Lu, G., Yang, C.-Q., Chen, J.-D., Chen, J.-J., Chen, D.-Y., Zhang, L., et al. 2015. Gossypium barbadense genome sequence provides insight into the evolution of extra-long staple fiber and specialized metabolites. Sci. Rep. 5, 14139.

Liu, X., Kaas, R.S., Jensen, P.R., Workman, M. 2012. Draft genome sequence of the yeast Pachysolen tannophilus CBS 4044/NRRL Y-2460. Eukaryot. Cell 11, 827.

Locke, D.P., Hillier, L.W., Warren, W.C., Worley, K.C., Nazareth, L.V., Muzny, D.M., Yang, S.-P., Wang, Z., Chinwalla, A.T., Minx, P., et al. 2011. Comparative and demographic analysis of orang-utan genomes. Nature. 469, 529-533.

Loftus, B., Anderson, I., Davies, R., Alsmark, U.C.M., Samuelson, J., Amedeo, P., Roncaglia, P., Berriman, M., Hirt, R.P., Mann, B.J., et al. 2005. The genome of the protist parasite Entamoeba histolytica. Nature. 433, 865-868.

Loftus, B.J., Fung, E., Roncaglia, P., Rowley, D., Amedeo, P., Bruno, D., Vamathevan, J., Miranda, M., Anderson, I.J., Fraser, J.A., et al. 2005. The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans. Science. 307, 1321-1324.

Lommer, M., Specht, M., Roy, A.-S., Kraemer, L., Andreson, R., Gutowska, M.A., Wolf, J., Bergner, S.V., Schilhabel, M.B., Klostermeier, U.C., et al. 2012. Genome and low-iron response of an oceanic diatom adapted to chronic iron limitation. Genome Biol. 13, R66.

Lorenz, S., Guenther, M., Grumaz, C., Rupp, S., Zibek, S., Sohn, K. 2014. Genome sequence of the Basidiomycetous fungus Pseudozyma aphidis DSM70725, an efficient producer of biosurfactant mannosylerythritol lipids. Genome Announce. 2, 13-14.

Lorenzi, H., Thiagarajan, M., Haas, B., Wortman, J., Hall, N., Caler, E. 2008. Genome wide survey, discovery and evolution of repetitive elements in three Entamoeba species. BMC Genomics. 9, 595.

Lu, L., Chen, Y., Wang, Z., Li, X., Chen, W., Tao, Z., Shen, J., Tian, Y., Wang, D., Li, G., et al. 2015. The goose genome sequence leads to insights into the evolution of waterfowl and susceptibility to fatty liver. Genome Biol. 16, 89.

Ma, L.-J., Ibrahim, A.S., Skory, C., Grabherr, M.G., Burger, G., Butler, M., Elias, M., Idnurm, A., Lang, B.F., Sone, T., et al. 2009. Genomic analysis of the basal lineage fungus Rhizopus oryzae reveals a whole-genome duplication. PLoS Genet. 5, e1000549.

Ma, L.-J., van der Does, H.C., Borkovich, K.A., Coleman, J.J., Daboussi, M.-J., Di Pietro, A., Dufresne, M., Freitag, M., Grabherr, M., Henrissat, B., et al. 2010. Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature. 464, 367-373.

Ma, T., Wang, J., Zhou, G., Yue, Z., Hu, Q., Chen, Y., Liu, B., Qiu, Q., Wang, Z., Zhang, J., et al. 2013. Genomic insights into salt adaptation in a desert poplar. Nat. Comm. 4, 2797.

Macas, J., Kejnovský, E., Neumann, P., Novák, P., Koblížková, A., Vyskot, B. 2011. Next generation sequencing-based analysis of repetitive DNA in the model dioecious plant Silene latifolia. PLoS One. 6, e27335.

Macas, J., Neumann, P., Navrátilová, A. 2007. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula. BMC Genomics. 8, 427.

Machida, M., Asai, K., Sano, M., Tanaka, T., Kumagai, T., Terai, G., Kusumoto, K.-I., Arima, T., Akita, O., Kashiwagi, Y., et al. 2005. Genome sequencing and analysis of Aspergillus oryzae. Nature. 438, 1157-1161.

Magbanua, Z. V., Ozkan, S., Bartlett, B.D., Chouvarine, P., Saski, C.A., Liston, A., Cronn, R.C., Nelson, C.D., Peterson, D.G. 2011. Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of Loblolly pine. PLoS One. 6, e16214.

Malapi-Wight, M., Salgado-Salazar, C., Demers, J., Veltri, D., Crouch, A. 2015. Draft genome sequence of Dactylonectria macrodidyma, a plant-pathogenic fungus in the Nectriaceae. Genome Announce. 3, e00278-00215.

Malkus, A., Song, Q., Cregan, P., Arseniuk, E., Ueng, P.P. 2009. Genetic linkage map of Phaeosphaeria nodorum, the causal agent of Stagonospora nodorum blotch disease of wheat. Eur. J. Plant Pathol. 124, 681-690.

Mandal, P. K., Kazazian, H.H. 2008. SnapShot: vertebrate transposons. Cell. 135, 192.

Mansour, L., Ben Hassine, O.K., Vivares, C.P., Cornillot, E. 2013. Spraguea lophii (Microsporidia) parasite of the teleost fish, Lophius piscatorius from Tunisian coasts: Evidence for an extensive chromosome length polymorphism. Parasitol. Int. 62, 66-74.

Marcet-Houben, M., Ballester, A.-R., de la Fuente, B., Harries, E., Marcos, J.F., González-Candelas, L., Gabaldón, T. 2012. Genome sequence of the necrotrophic fungus Penicillium digitatum, the main postharvest pathogen of citrus. BMC Genomics. 13, 646.

Mardanov, A.V., Beletsky, A.V., Kadnikov, V.V., Ignatov, A.N., Ravin, N.V. 2014. Draft genome sequence of Sclerotinia borealis, a psychrophilic plant pathogenic fungus. Genome Announce. 2, e01175-01113.

Marinotti, O., Cerqueira, G.C., de Almeida, L.G.P., Ferro, M.I.T., Loreto, E.L.D.S., Zaha, A., Teixeira, S.M.R., Wespiser, A.R., Almeida, E.S.A., Schlindwein, A.D., et al. 2013. The genome of Anopheles darlingi, the main neotropical malaria vector. Nucl. Acids Res. 41, 7387-7400.

Marmoset Genome Sequencing and Analysis Consortium. 2014. The common marmoset genome provides insight into primate biology and evolution. Nat. Genet. 46, 850-857.

Martin, F., Aerts, A., Ahrén, D., Brun, A., Danchin, E.G.J., Duchaussoy, F., Gibon, J., Kohler, A., Lindquist, E., Pereda, V., et al. 2008. The genome of Laccaria bicolor provides insights into mycorrhizal symbiosis. Nature. 452, 88-92.

Martin, F., Kohler, A., Murat, C., Balestrini, R., Coutinho, P.M., Jaillon, O., Montanini, B., Morin, E., Noel, B., Percudani, R., et al. 2010. Périgord black truffle genome uncovers evolutionary origins and mechanisms of symbiosis. Nature. 464, 1033-1038.

Martinez, D., Challacombe, J., Morgenstern, I., Hibbett, D., Schmoll, M., Kubicek, C.P., Ferreira, P., Ruiz-Duenas, F.J., Martinez, A.T., Kersten, P., et al. 2009. Genome, transcriptome, and secretome analysis of wood decay fungus Postia placenta supports unique mechanisms of lignocellulose conversion. P. Natl. Acad. Sci. USA. 106, 1954-1959.

Martinez, D., Larrondo, L.F., Putnam, N., Gelpke, M.D.S., Huang, K., Chapman, J., Helfenbein, K.G., Ramaiya, P., Detter, J.C., Larimer, F., et al. 2004. Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78. Nat. Biotech. 22, 695-700.

Martinez, D.A., Oliver, B.G., Gräser, Y., Goldberg, J.M., Li, W., Martinez-Rossi, N.M., Monod, M., Shelest, E., Barton, R.C., Birch, E., et al. 2012. Comparative genome analysis of Trichophyton rubrum and related dermatophytes reveals candidate genes involved in infection. mBio. 3, e00259-00212.

Matsui, M., Yokoyama, T.Y., Nemoto, K., Kumagai, T., Terai, G., Arita, M., Machida, M., Shibata, T. 2015. Genome sequence of Fungal Species no.11243, which produces the antifungal antibiotic FR901469. Genome Announce. 3, e00118-00115.

Matsuzaki, M., Misumi, O., Shin-I, T., Maruyama, S., Takahara, M., Miyagishima, S.-Y., Mori, T., Nishida, K., Yagisawa, F., Nishida, K., et al. 2004. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature. 428, 643-657.

Matsuzawa, T., Koike, H., Saika, A., Fukuoka, T., Sato, S., Habe, H., Kitamoto, D., Morita, T. 2015. Draft genome sequence of the yeast Starmerella bombicola NBRC10243, a producer of sophorolipids, glycolipid biosurfactants. Genome Announce. 3, e00176-00115.

Maumus, F., Allen, A.E., Mhiri, C., Hu, H., Jabbari, K., Vardi, A., Grandbastien, M.-A., Bowler, C. 2009. Potential impact of stress activated retrotransposons on genome evolution in a marine diatom. BMC Genomics. 10, 624.

Meinhardt, L.W., Costa, G.G.L., Thomazella, D.P.T., Teixeira, P.J.P.L., Carazzolle, M.F., Schuster, S.C., Carlson, J.E., Guiltinan, M.J., Mieczkowski, P., Farmer, A., et al. 2014. Genome and secretome analysis of the hemibiotrophic fungal pathogen, Moniliophthora roreri, which causes frosty pod rot disease of cacao, mechanisms of the biotrophic and necrotrophic phases. BMC Genomics. 15, 164.

Merchant, S.S., Prochnik, S.E., Vallon, O., Harris, E.H., Karpowicz, S.J., Witman, G.B., Terry, A., Salamov, A., Fritz-Laylin, L.K., Maréchal-Drouard, L., et al. 2007. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science. 318, 245-250.

Metcalfe, C. J., Filée, J., Germon, I., Joss, J., Casane, D. 2012. Evolution of the Australian lungfish (Neoceratodus forsteri) genome: a major role for CR1 and L2 LINE elements. Mol. Biol. Evol. 29, 3529-3539.

Mikkelsen, T.S., Wakefield, M.J., Aken, B., Amemiya, C.T., Chang, J.L., Duke, S., Garber, M., Gentles, A.J., Goodstadt, L., Heger, A., et al. 2007. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature. 447, 167-177.

Miller, W., Drautz, D.I., Ratan, A., Pusey, B., Qi, J., Lesk, A.M., Tomsho, L.P., Packard, M.D., Zhao, F., Sher, A., et al. 2008. Sequencing the nuclear genome of the extinct woolly mammoth. Nature. 456, 387-390.

Miller, W., Hayes, V.M., Ratan, A., Petersen, C., Wittekindt, N.E., Miller, J., Walenz, B., Knight, J., Qi, J., Zhao, F., et al. 2012. Genetic diversity and population structure of the endangered marsupial Sarcophilus harrisii (Tasmanian devil). P. Natl. Acad. Sci. USA. 108, 12348-12353.

Min, B., Park, H., Jang, Y., Kim, J.J., Kim, K.H., Pangilinan, J., Lipzen, A., Riley, R., Grigoriev, I.V., Spatafora, J.W., Choi, I.-G. 2015. Genome sequence of a white rot fungus Schizopora paradoxa KUC8140 for wood decay and mycoremediation. J. Biotech. 211, 42-43.

Ming, R., Hou, S., Feng, Y., Yu, Q., Dionne-Laporte, A., Saw, J.H., Senin, P., Wang, W., Ly, B.V., Lewis, K.L.T., et al. 2008. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature. 452, 991-996.

Ming, R., VanBuren, R., Wai, C.M., Tang, H., Schatz, M.C., Bowers, J.E., Lyons, E., Wang, M.-L., Chen, J., Biggers, E., et al. 2015. The pineapple genome and the evolution of CAM photosynthesis. Nat. Genet. 47, 1435-1442.

Ming, R., Vanburen, R., Liu, Y., Yang, M., Han, Y., Li, L.-T., Zhang, Q., Kim, M.-J., Schatz, M.C., Campbell, M., et al. 2013. Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.). Genome Biol. 14, R41.

Minrong, C., Hangqin, L., Zhimei, G., Daoquan, C. 1996. The karyotype and the C-band patterns of the Baiji dolphin, Lipotes vexillifer. Acta Hydrobiol. Sin. 20, 138-143.

Mitreva, M., Jasmer, D.P., Zarlenga, D.S., Wang, Z., Abubucker, S., Martin, J., Taylor, C.M., Yin, Y., Fulton, L., Minx, P., et al. 2011. The draft genome of the parasitic nematode Trichinella spiralis. Nat. Genet. 43, 228-235.

Moghe, G.D., Hufnagel, D.E., Tang, H., Xiao, Y., Dworkin, I., Town, C.D., Conner, J.K., Shiu, S.-H. 2014. Consequences of whole-genome triplication as revealed by comparative genomic analyses of the wild radish Raphanus raphanistrum and three other Brassicaceae species. Plant Cell. 26, 1925-1937.

Mondego, J.M.C., Carazzolle, M.F., Costa, G.G.L., Formighieri, E.F., Parizzi, L.P., Rincones, J., Cotomacci, C., Carraro, D.M., Cunha, A.F., Carrer, H., et al. 2008. A genome survey of Moniliophthora perniciosa gives new insights into Witches' Broom Disease of cacao. BMC Genomics. 9, 548.

Moore, G.G., Mack, B.M., Beltz, S.B. 2015. Genomic sequence of the aflatoxigenic filamentous fungus Aspergillus nomius. BMC Genomics. 16, 551.

Morales, L., Noel, B., Porcel, B., Marcet-Houben, M., Hullo, M.-F., Sacerdot, C., Tekaia, F., Leh-Louis, V., Despons, L., Khanna, V., et al. 2013. Complete DNA sequence of Kuraishia capsulata illustrates novel genomic features among budding yeasts (Saccharomycotina). Genome Biol. Evol. 5, 2524-2539.

Moreau, H., Verhelst, B., Couloux, A., Derelle, E., Rombauts, S., Grimsley, N., Van Bel, M., Poulain, J., Katinka, M., Hohmann-Marriott, M.F., et al. 2012. Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. Genome Biol. 13, R74.

Morin, E., Kohler, A., Baker, A.R., Foulongne-Oriol, M., Lombard, V., Nagye, L.G., Ohm, R.A., Patyshakuliyeva, A., Brun, A., Aerts, A.L., et al. 2012. Genome sequence of the button mushroom Agaricus bisporus reveals mechanisms governing adaptation to a humic-rich ecological niche. P. Natl. Acad. Sci. USA. 109, 17501-17506.

Moroz, L.L., Kocot, K.M., Citarella, M.R., Dosung, S., Norekian, T.P., Povolotskaya, I.S., Grigorenko, A.P., Dailey, C., Berezikov, E., Buckley, K.M., et al. 2014. The ctenophore genome and the evolutionary origins of neural systems. Nature. 510, 109-114.

Morrison, H.G., McArthur, A.G., Gillin, F.D., Aley, S.B., Adam, R.D., Olsen, G.J., Best, A.A., Cande, W.Z., Chen, F., Cipriano, M.J., et al. 2007. Genomic minimalism in the early diverging intestinal parasite Giardia lamblia. Science. 317, 1921-1926.

Mouse Genome Sequencing Consortium. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature. 420, 520-562.

Murchison, E.P., Schulz-Trieglaff, O.B., Ning, Z., Alexandrov, L.B., Bauer, M.J., Fu, B., Hims, M., Ding, Z., Ivakhno, S., Stewart, C., et al. 2012. Genome sequencing and analysis of the Tasmanian devil and its transmissible cancer. Cell. 148, 780-791.

Myburg, A.A., Grattapaglia, D., Tuskan, G.A., Hellsten, U., Hayes, R.D., Grimwood, J., Jenkins, J., Lindquist, E., Tice, H., Bauer, D., et al. 2014. The genome of Eucalyptus grandis. Nature. 509, 356-362.

Nakamura, Y., Mori, K., Saitoh, K., Oshima, K., Mekuchi, M., Sugaya, T., Shigenobu, Y., Ojima, N., Muta, S., Fujiwara, A., et al. 2013. Evolutionary changes of multiple visual pigment genes in the complete genome of Pacific bluefin tuna. P. Natl. Acad. Sci. USA. 110, 11061-11066.

Nakamura, Y., Sasaki, N., Kobayashi, M., Ojima, N., Yasuike, M., Shigenobu, Y., Satomi, M., Fukuma, Y., Shiwaku, K., Tsujimoto, A., et al. 2013. The first symbiont-free genome sequence of marine red alga, Susabi-nori (Pyropia yezoensis). PLoS One. 8, e57122.

Natali, L., Cossu, R.M., Barghini, E., Giordani, T., Buti, M., Mascagni, F., Morgante, M., Gill, N., Kane, N.C., Rieseberg, L., et al. 2013. The repetitive component of the sunflower genome as shown by different procedures for assembling next generation sequencing reads. BMC Genomics. 14, 686.

Natsume, S., Takagi, H., Shiraishi, A., Murata, J., Toyonaga, H., Patzak, J., Takagi, M., Yaegashi, H., Uemura, A., Mitsuoka, C., et al. 2015. The draft genome of hop (Humulus lupulus), an essence for brewing. Plant Cell Physiol. 56, 428-441.

Neafsey, D.E., Barker, B.M., Sharpton, T.J., Stajich, J.E., Park, D.J., Whiston, E., Hung, C.-Y., McMahan, C., White, J., Sykes, S., et al. 2010. Population genomic sequencing of Coccidioides fungi reveals recent hybridization and transposon control. Genome Res. 20, 938-946.

Neale, D.B., Wegrzyn, J.L., Stevens, K.A., Zimin, A.V., Puiu, D., Crepeau, M.W., Cardeno, C., Koriabine, M., Holtz-Morris, A.E., Liechty, J.D., et al. 2014. Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biol. 15, R59.

Nefedova, L. N., Kim, A.I. 2009. Molecular phylogeny and systematics of Drosophila retrotransposons and retroviruses. Mol. Biol. 43, 747-756.

Negre, B., Simpson, P. 2013. Diversity of transposable elements and repeats in a 600 kb region of the fly Calliphora vicina. Mob. DNA. 4, 13.

Nemri, A., Saunders, D.G.O., Anderson, C., Upadhyaya, N.M., Win, J., Lawrence, G.J., Jones, D.A., Kamoun, S., Ellis, J.G., Dodds, P.N. 2014. The genome sequence and effector complement of the flax rust pathogen Melampsora lini. Front. Plant Sci. 5, 98.

Nene, V., Wortman, J.R., Lawson, D., Haas, B., Kodira, C., Tu, Z.J., Loftus, B., Xi, Z., Megy, K., Grabherr, M., et al. 2007. Genome sequence of Aedes aegypti, a major arbovirus vector. Science. 316, 1718-1723.

Ng, K.P., Ngeow, Y.F., Yew, S.M., Hassan, H., Soo-Hoo, T.S., Na, S.L., Chan, C.L., Hoh, C.-C., Lee, K.-W., Yee, W.-Y. 2012. Draft genome sequence of Daldinia eschscholzii isolated from blood culture. Eukaryot. Cell 11, 703-704.

Ng, K.P., Yew, S.M., Chan, C.L., Soo-Hoo, T.S., Na, S.L., Hassan, H., Ngeow, Y.F., Hoh, C.-C., Lee, K.-W., Yee, W.-Y. 2012. Sequencing of Cladosporium sphaerospermum, a Dematiaceous fungus isolated from blood culture. Eukaryot. Cell 11, 705-706.

Nierman, W.C., Pain, A., Anderson, M.J., Wortman, J.R., Kim, H.S., Arroyo, J., Berriman, M., Abe, K., Archer, D.B., Bermejo, C., et al. 2005. Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus. Nature. 438, 1151-1156.

Nikaido, M., Noguchi, H., Nishihara, H., Toyoda, A., Suzuki, Y., Kajitani, R., Suzuki, H., Okuno, M., Aibara, M., Ngatunga, B.P., et al. 2013. Coelacanth genomes reveal signatures for evolutionary transition from water to land. Genome Res. 23, 1740-1748.

Nishida, H., Nagatsuka, Y., Sugiyama, J. 2011. Draft genome sequencing of the enigmatic basidiomycete Mixia osmundae. J. Gen. Appl. Microbiol. 57, 63-67.

Nossa, C.W., Havlak, P., Yue, J.-X., Lv, J., Vincent, K.Y., Brockmann, H.J., Putnam, N.H. 2014. Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication. GigaSci. 3, 9.

Novick, P.A., Basta, H., Floumanhaft, M., McClure, M.A., Boissinot, S. 2009. The evolutionary dynamics of autonomous non-LTR retrotransposons in the lizard Anolis carolinensis shows more similarity to fish than mammals. Mol. Biol. Evol. 26, 1811-1822.

Novick, P.A., Smith, J.D., Floumanhaft, M., Ray, D.A., Boissinot, S. 2011. The evolution and diversity of DNA transposons in the genome of the lizard Anolis carolinensis. Genome Biol. Evol. 3, 1-14.

Novikova, O.S. 2010. Diversity and evolution of LTR retrotransposons in the genome of Phanerochaete chrysosporium (Fungi: Basidiomycota). Russian J. Genet. 46, 637-644.

Nowrousian, M., Stajich, J.E., Chu, M., Engh, I., Espagne, E., Halliday, K., Kamerewerd, J., Kempken, F., Knab, B., Kuo, H.-C., et al. 2010. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads, Sordaria macrospora, a model organism for fungal morphogenesis. PLoS Genet. 6, e1000891.

Nygaard, S., Zhang, G., Schiøtt, M., Li, C., Wurm, Y., Hu, H., Zhou, J., Ji, L., Qiu, F., Rasmussen, M., et al. 2011. The genome of the leaf-cutting ant Acromyrmex echinatior suggests key adaptations to advanced social life and fungus farming. Genome Res. 21, 1339-1348.

Nystedt, B., Street, N.R., Wetterbom, A., Zuccolo, A., Lin, Y.-C., Scofield, D.G., Vezzi, F., Delhomme, N., Giacomello, S., Alexeyenko, A., et al. 2013. The Norway spruce genome sequence and conifer genome evolution. Nature. 497, 579-584.

O'Connell, R.J., Thon, M.R., Hacquard, S., Amyotte, S.G., Kleemann, J., Torres, M.F., Damm, U., Buiate, E.A., Epstein, L., Alkan, N., et al. 2012. Lifestyle transitions in plant pathogenic Colletotrichum fungi deciphered by genome and transcriptome analyses. Nat. Genet. 44, 1060-1065.

Ohm, R.A., de Jong, J.F., Lugones, L.G., Aerts, A., Kothe, E., Stajich, J.E., de Vries, R.P., Record, E., Levasseur, A., Baker, S.E., et al. 2010. Genome sequence of the model mushroom Schizophyllum commune. Nat. Biotech. 28, 957-963.

Oka, T., Ekino, K., Fukuda, K., Nomura, Y. 2014. Draft genome sequence of the formaldehyde-resistant fungus Byssochlamys spectabilis No. 5 (anamorph Paecilomyces variotii No. 5) (NBRC109023). Genome Announce. 2, e01162-01113.

Okagaki, L.H., Nunes, C.C., Sailsbery, J., Clay, B., Brown, D., John, T., Oh, Y., Young, N., Fitzgerald, M., Haas, B.J., et al. 2015. Genome sequences of three phytopathogenic species of the Magnaporthaceae family of Fungi. G3. 5, 2539-2545.

Olson, A., Aerts, A., Asiegbu, F., Belbahri, L., Bouzid, O., Broberg, A., Canbäck, B., Coutinho, P.M., Cullen, D., Dalman, K., et al. 2012. Insight into trade-off between wood decay and parasitism from the genome of a fungal forest pathogen. New Phytol. 194, 1001-1013.

Opperman, C.H., Bird, D.M., Williamson, V.M., Rokhsar, D.S., Burke, M., Cohn, J., Cromer, J., Diener, S., Gajan, J., Graham, S., et al. 2008. Sequence and genetic map of Meloidogyne hapla: a compact nematode genome for plant parasitism. P. Natl. Acad. Sci. USA. 105, 14802-14807.

Orlando, L., Ginolhac, A., Zhang, G., Froese, D., Albrechtsen, A., Stiller, M., Schubert, M., Cappellini, E., Petersen, B., Moltke, I., et al. 2013. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature. 499, 74-78.

Osanai-Futahashi, M., Suetsugu, Y., Mita, K., Fujiwara, H. 2008. Genome-wide screening and characterization of transposable elements and their distribution analysis in the silkworm, Bombyx mori. Insect Biochem. Mol. Biol. 38, 1046-1057.

Padamsee, M., Kumar, T.K.A., Riley, R., Binder, M., Boyd, A., Calvo, A.M., Furukawa, K., Hesse, C., Hohmann, S., James, T.Y., et al. 2012. The genome of the xerotolerant mold Wallemia sebi reveals adaptations to osmotic stress and suggests cryptic sexual reproduction. Fungal Genet. Biol. 49, 217-226.

Pain, A., Böhme, U., Berry, A.E., Mungall, K., Finn, R.D., Jackson, A.P., Mourier, T., Mistry, J., Pasini, E.M., Aslett, M.A., et al. 2008. The genome of the simian and human malaria parasite Plasmodium knowlesi. Nature. 455, 799-803.

Pain, A., Renauld, H., Berriman, M., Murphy, L., Yeats, C.A., Weir, W., Kerhornou, A., Aslett, M., Bishop, R., Bouchier, C., et al. 2005. Genome of the host-cell transforming parasite Theileria annulata compared with T. parva. Science. 309, 131-133.

Palenik, B., Grimwood, J., Aerts, A., Rouzé, P., Salamov, A., Putnam, N., Dupont, C., Jorgensen, R., Derelle, E., Rombauts, S., et al. 2007. The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. P. Natl. Acad. Sci. USA. 104, 7705-7710.

Park, D., Jung, J.W., Choi, B.-S., Jayakodi, M., Lee, J., Lim, J., Yu, Y., Choi, Y.-S., Lee, M.-L., Park, Y., et al. 2015. Uncovering the novel characteristics of Asian honey bee, Apis cerana, by whole genome sequencing. BMC Genomics. 16, 1.

Park, G.M., Im, K., Huh, S., Yong, T.S. 2000. Chromosomes of the liver fluke, Clonorchis sinensis. Korean J. Parasitol. 38, 201-206.

Park, S.D.E., Magee, D.A., McGettigan, P.A., Teasdale, M.D., Edwards, C.J., Lohan, A.J., Murphy, A., Braud, M., Donoghue, M.T., Liu, Y., et al. 2015. Genome sequencing of the extinct Eurasian wild aurochs, Bos primigenius, illuminates the phylogeography and evolution of cattle. Genome Biol. 16, 234.

Park, S.-Y., Choi, J., Kim, J.A., Jeong, M.-H., Kim, S., Lee, Y.-H., Hur, J.-S. 2013. Draft genome sequence of Cladonia macilenta KoLRI003786, a lichen-forming fungus producing biruloquinone. Genome Announce. 1, e00695-00613.

Park, S.-Y., Choi, J., Kim, J.A., Yu, N.-H., Kim, S., Kondratyuk, S.Y., Lee, Y.-H., Hur, J.-S. 2013. Draft genome sequence of lichen-forming fungus Caloplaca flavorubescens strain FoLRI002931. Genome Announce. 1, e00678-00613.

Park, S.-Y., Choi, J., Lee, G.-W., Jeong, M.-H., Kim, J.A., Oh, S.-O., Lee, Y.-H., Hur, J.-S. 2014. Draft genome sequence of Umbilicaria muehlenbergii KoLRILF000956, a lichen-forming fungus amenable to genetic manipulation. Genome Announce. 2, e00357-00314.

Park, S.-Y., Choi, J., Lee, G.-W., Kim, J.A., Oh, S.-O., Jeong, M.-H., Yu, N.-H., Kim, S., Lee, Y.-H., Hur, J.-S. 2014. Draft genome sequence of lichen-forming fungus Cladonia metacorallifera strain KoLRI00260. Genome Announce. 2, e01065-01013.

Park, S.-Y., Choi, J., Lee, G.-W., Park, C.-H., Kim, J.A., Oh, S.-O., Lee, Y.-H., Hur, J.-S. 2014. Draft genome sequence of Endocarpon pusillum strain KoLRILF000583. Genome Announce. 2, e00452-00414.

Park, S.-Y., Choi, J., Kim, J.A., Yu, N.-H., Kim, S., Kondratyuk, S.Y., Lee, Y.-H., Hur, J.-S. 2013. Draft genome sequence of lichen-forming fungus Caloplaca flavorubescens strain FoLRI002931. Genome Announce. 1, 6-7.

Park, Y.-J., Baek, J.H., Lee, S., Kim, C., Rhee, H., Kim, H., Seo, J.-S., Park, H.-R., Yoon, D.-E., Nam, J.-Y., et al. 2014. Whole genome and global gene expression analyses of the model mushroom Flammulina velutipes reveal a high capacity for lignocellulose degradation. PLoS One. 9, e93560.

Parlange, F., Oberhaensli, S., Breen, J., Platzer, M., Taudien, S., Simková, H., Wicker, T., Dolezel, J., Keller, B. 2011. A major invasion of transposable elements accounts for the large size of the Blumeria graminis f.sp. tritici genome. Funct. Integr. Genomics. 11, 671-677.

Passoth, V., Hansen, M., Klinner, U., Emeis, C.C. 1992. The electrophoretic banding pattern of the chromosomes of Pichia stipitis and Candida shehatae. Curr. Genet. 22, 429-431.

Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach, H., Haberer, G., Hellsten, U., Mitros, T., Poliakov, A., et al. 2009. The Sorghum bicolor genome and the diversification of grasses. Nature. 457, 551-556.

Pawar, H., Sahasrabuddhe, N.A., Renuse, S., Keerthikumar, S., Sharma, J., Kumar, G.S.S., Venugopal, A., Sekhar, N.R., Kelkar, D.S., Nemade, H., et al. 2012. A proteogenomic approach to map the proteome of an unsequenced pathogen - Leishmania donovani. Proteomics. 12, 832-844.

Peacock, C.S., Seeger, K., Harris, D., Murphy, L., Ruiz, J.C., Quail, M.A., Peters, N., Adlem, E., Tivey, A., Aslett, M., et al. 2007. Comparative genomic analysis of three Leishmania species that cause diverse human disease. Nat. Genet. 39, 839-847.

Pel, H.J., de Winde, J.H., Archer, D.B., Dyer, P.S., Hofmann, G., Schaap, P.J., Turner, G., de Vries, R.P., Albang, R., Albermann, K., et al. 2007. Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88. Nat. Biotech. 25, 221-231.

Peng, X., Alföldi, J., Gori, K., Eisfeld, A.J., Tyler, S.R., Tisoncik-Go, J., Brawand, D., Law, G.L., Skunca, N., Hatta, M., et al. 2014. The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory disease. Nat. Biotech. 32, 1250-1255.

Peng, Y., Lai, Z., Lane, T., Nageswara-Rao, M., Okada, M., Jasieniuk, M., O'Geen, H., Kim, R.W., Sammons, R.D., Rieseberg, L.H., et al. 2014. De novo genome assembly of the economically important weed horseweed using integrated data from multiple sequencing platforms. Plant Physiol. 166, 1241-1254.

Peng, Z., Lu, Y., Li, L., Zhao, Q., Feng, Q., Gao, Z., Lu, H., Hu, T., Yao, N., Liu, K., et al. 2013. The draft genome of the fast-growing non-timber forest species moso bamboo (Phyllostachys heterocycla). Nat. Genet. 45, 456-461.

Perlin, M.H., Amselem, J., Fontanillas, E., Toh, S.S., Chen, Z., Goldberg, J., Duplessis, S., Henrissat, B., Young, S., Zeng, Q., et al. 2015. Sex and parasites: genomic and transcriptomic analysis of Microbotryum lychnidis-dioicae, the biotrophic and plant-castrating anther smut fungus. BMC Genomics. 16, 461.

Permanyer, J., Gonzàlez-Duarte, R., Albalat, R. 2003. The non-LTR retrotransposons in Ciona intestinalis: new insights into the evolution of chordate genomes. Genome Biol. 4, R73.

Piednoël, M., Aberer, A.J., Schneeweiss, G.M., Macas, J., Novak, P., Gundlach, H., Temsch, E.M., Renner, S.S. 2012. Next-generation sequencing reveals the impact of repetitive DNA across phylogenetically closely related genomes of Orobanchaceae. Mol. Biol. Evol. 29, 3601-3611.

Piškur, J., Ling, Z., Marcet-Houben, M., Ishchuk, O.P., Aerts, A., LaButti, K., Copeland, A., Lindquist, E., Barry, K., Compagno, C. 2012. The genome of wine yeast Dekkera bruxellensis provides a tool to explore its food-related properties. Int. J. Food Microbiol. 157, 202-209.

Piskurek, O., Nishihara, H., Okada, N. 2009. The evolution of two partner LINE/SINE families and a full-length chromodomain-containing Ty3/Gypsy LTR element in the first reptilian genome of Anolis carolinensis. Gene. 441, 111-118.

Plomion, C., Aury, J.-M., Amselem, J., Alaeitabar, T., Barbe, V., Belser, C., Bergès, H., Bodénès, C., Boudet, N., Boury, C., et al. 2015. Decoding the oak genome: public release of sequence data, assembly, annotation and publication strategies. Mol. Ecol. Resources. 16, 254-265.

Polashock, J., Zelzion, E., Fajardo, D., Zalapa, J., Georgi, L., Bhattacharya, D., Vorsa, N. 2014. The American cranberry: first insights into the whole genome of a species adapted to bog habitat. BMC Plant Biol. 14, 165.

Poma, A., Venora, G., Miranda, M., Pacioni, G. 2002. The karyotypes of three Tuber species (Pezizales, Ascomycota). Caryologia. 55, 307-313.

Pombert, J.-F., Blouin, N.A., Lane, C., Boucias, D., Keeling, P.J. 2014. A lack of parasitic reduction in the obligate parasitic green alga Helicosporidium. PLoS Genet. 10, e1004355.

Pontius, J.U., Mullikin, J.C., Smith, D.R., Agencourt Sequencing Team, Lindblad-Toh, K., Gnerre, S., Clamp, M., Chang, J., Stephens, R., Neelam, B., et al. 2007. Initial sequence and comparative analysis of the cat genome. Genome Res. 17, 1675-1689.

Post, R. 2005. The chromosomes of the Filariae. Filaria J. 4, 10.

Price, D.C., Chan, C.X., Yoon, H.S., Yang, E.C., Qiu, H., Weber, A.P.M., Schwacke, R., Gross, J., Blouin, N.A., Lane, C., et al. 2012. Cyanophora paradoxa genome elucidates origin of photosynthesis in algae and plants. Science. 335, 843-847.

Pritham, E.J., Feschotte, C., Wessler, S.R. 2005. Unexpected diversity and differential success of DNA transposons in four species of Entamoeba protozoans. Mol. Biol. Evol. 22, 1751-1763.

Prochnik, S.E., Umen, J., Nedelcu, A.M., Hallmann, A., Miller, S.M., Nishii, I., Ferris, P., Kuo, A., Mitros, T., Fritz-Laylin, L.K., et al. 2010. Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri. Science. 329, 223-226.

Prüfer, K., Munch, K., Hellmann, I., Akagi, K., Miller, J.R., Walenz, B., Koren, S., Sutton, G., Kodira, C., Winer, R., et al. 2012. The bonobo genome compared with the chimpanzee and human genomes. Nature. 486, 527-531.

Putnam, N.H., Butts, T., Ferrier, D.E.K., Furlong, R.F., Hellsten, U., Kawashima, T., Robinson-Rechavi, M., Shoguchi, E., Terry, A., Yu, J.-K., et al. 2008. The amphioxus genome and the evolution of the chordate karyotype. Nature. 453, 1064-1071.

Putnam, N.H., Srivastava, M., Hellsten, U., Dirks, B., Chapman, J., Salamov, A., Terry, A., Shapiro, H., Lindquist, E., Kapitonov, V.V., et al. 2007. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science. 317, 86-94.

Qin, C., Yu, C., Shen, Y., Fang, X., Chen, L., Min, J., Cheng, J., Zhao, S., Xu, M., Luo, Y., et al. 2014. Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. P. Natl. Acad. Sci. USA. 111, 5135-5140.

Qin, X., Evans, J.D., Aronstein, K.A., Murray, K.D., Weinstock, G.M. 2006. Genome sequences of the honey bee pathogens Paenibacillus larvae and Ascosphaera apis. Insect Mol. Biol. 15, 715-718.

Qiu, Q., Zhang, G., Ma, T., Qian, W., Wang, J., Ye, Z., Cao, C., Hu, Q., Kim, J., Larkin, D.M., et al. 2012. The yak genome and adaptation to life at high altitude. Nat. Genet. 44, 946-949.

Quandt, C.A., Bushley, K.E., Spatafora, J.W. 2015. The genome of the truffle-parasite Tolypocladium ophioglossoides and the evolution of antifungal peptaibiotics. BMC Genomics. 16, 553.

Que, Y., Xu, L., Wu, Q., Liu, Y., Ling, H., Liu, Y., Zhang, Y., Guo, J., Su, Y., Chen, J., Wang, S., Zhang, C. 2014. Genome sequencing of Sporisorium scitamineum provides insights into the pathogenic mechanisms of sugarcane smut. BMC Genomics. 15, 996.

Radakovits, R., Jinkerson, R.E., Fuerstenberg, S.I., Tae, H., Settlage, R.E., Boore, J.L., Posewitz, M.C. 2012. Draft genome sequence and genetic transformation of the oleaginous alga Nannochloropsis gaditana. Nat. Comm. 3, 686.

Rahman, A.Y.A., Usharraj, A.O., Misra, B.B., Thottathil, G.P., Jayasekaran, K., Feng, Y., Hou, S., Ong, S.Y., Ng, F.L., Lee, L.S., et al. 2013. Draft genome sequence of the rubber tree Hevea brasiliensis. BMC Genomics. 14, 75.

Ramezani-Rad, M., Hollenberg, C., Lauber, J., Wedler, H., Griess, E., Wagner, C., Albermann, K., Hani, J., Piontek, M., Dahlems, U. 2003. The Hansenula polymorpha (strain CBS4732) genome sequencing and analysis. FEMS Yeast Res. 4, 207-215.

Rampant, P.F., Lesur, I., Boussardon, C., Bitton, F., Martin-Magniette, M.-L., Bodenes, C., Le Provost, G., Berges, H., Fluch, S., Kremer, A., et al. 2011. Analysis of BAC end sequences in oak, a keystone forest tree species, providing insight into the composition of its genome. BMC Genomics. 12, 292.

Rands, C.M., Darling, A., Fujita, M., Kong, L., Webster, M.T., Clabaut, C., Emes, R.D., Heger, A., Meader, S., Hawkins, M.B., et al. 2013. Insights into the evolution of Darwin's finches from comparative analysis of the Geospiza magnirostris genome sequence. BMC Genomics. 14, 95.

Rat Genome Sequencing Project Consortium. 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 428, 493-521.

Ravin, N.V., Eldarov, M.A., Kadnikov, V.V., Beletsky, A.V., Schneider, J., Mardanova, E.S., Smekalova, E.M., Zvereva, M.I., Dontsova, O.A., Mardanov, A.V., et al. 2013. Genome sequence and analysis of methylotrophic yeast Hansenula polymorpha DL1. BMC Genomics. 14, 837.

Rawal, K., Dorji, S., Kumar, A., Ganguly, A., Grewal, A.S. 2013. Identification and characterization of MGEs and their insertion sites in the genome. Mob. Genet. Elements. 3, e25675.

Read, B.A., Kegel, J., Klute, M.J., Kuo, A., Lefebvre, S.C., Maumus, F., Mayer, C., Miller, J., Monier, A., Salamov, A., et al. 2013. Pan genome of the phytoplankton Emiliania underpins its global distribution. Nature. 499, 209-213.

Redman, E., Grillo, V., Saunders, G., Packard, E., Jackson, F., Berriman, M., Gilleard, J.S. 2008. Genetics of mating and sex determination in the parasitic nematode Haemonchus contortus. Genetics. 180, 1877-1887.

Reid, A.J., Vermont, S.J., Cotton, J.A., Harris, D., Hill-Cawthorne, G.A., Könen-Waisman, S., Latham, S.M., Mourier, T., Norton, R., Quail, M.A., et al. 2012. Comparative genomics of the apicomplexan parasites Toxoplasma gondii and Neospora caninum: Coccidia differing in host range and transmission strategy. PLoS Pathog. 8, e1002567.

Renfree, M.B., Papenfuss, A.T., Deakin, J.E., Lindsay, J., Heider, T., Belov, K., Rens, W., Waters, P.D., Pharo, E.A., Shaw, G., et al. 2011. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development. Genome Biol. 12, R81.

Renny-Byfield, S., Chester, M., Kovarík, A., Le Comber, S.C., Grandbastien, M.-A., Deloger, M., Nichols, R.A., Macas, J., Novak, P., Chase, M.W., et al. 2011. Next generation sequencing reveals genome downsizing in allotetraploid Nicotiana tabacum, predominantly through the elimination of paternally derived repetitive DNAs. Mol. Biol. Evol. 28, 2843-2854.

Rensing, S.A., Lang, D., Zimmer, A.D., Terry, A., Salamov, A., Shapiro, H., Nishiyama, T., Perroud, P.-F., Lindquist, E.A., Kamisugi, Y., et al. 2008. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science. 319, 64-69.

Retrobase. http://biocadmin.otago.ac.nz/fmi/xsl/retrobase/home.xsl

Rhesus Macaque Genome Sequencing and Analysis Consortium. 2007. Evolutionary and biomedical insights from the rhesus macaque genome. Science. 316, 222-234.

Rhind, N., Chen, Z., Yassour, M., Thompson, D.A., Haas, B.J., Habib, N., Wapinski, I., Roy, S., Lin, M.F., Heiman, D.I., et al. 2011. Comparative functional genomics of the fission yeasts. Science. 332, 930-936.

Riccombeni, A., Vidanes, G., Proux-Wéra, E., Wolfe, K.H., Butler, G. 2012. Sequence and analysis of the genome of the pathogenic yeast Candida orthopsilosis. PLoS One. 7, e35750.

Richards, S., Liu, Y., Bettencourt, B.R., Hradecky, P., Letovsky, S., Nielsen, R., Thornton, K., Hubisz, M.J., Chen, R., Meisel, R.P., et al. 2005. Comparative genome sequencing of Drosophila pseudoobscura, chromosomal, gene, and cis-element evolution. Genome Res. 15, 1-18.

Robb, S.M.C., Gotting, K., Ross, E., Alvarado, A.S. 2015. SmedGD 2.0: The Schmidtea mediterranea genome database. Genesis. 53, 535-546.

Rogers, M.B., Hilley, J.D., Dickens, N.J., Wilkes, J., Bates, P.A., Depledge, D.P., Harris, D., Her, Y., Herzyk, P., Imamura, H., et al. 2011. Chromosome and gene copy number variation allow major structural change between species and strains of Leishmania. Genome Res. 21, 2129-2142.

Rouxel, T., Grandaubert, J., Hane, J.K., Hoede, C., van de Wouw, A.P., Couloux, A., Dominguez, V., Anthouard, V., Bally, P., Bourras, S., et al. 2011. Effector diversification within compartments of the Leptosphaeria maculans genome affected by Repeat-Induced Point mutations. Nat. Comm. 2, 202.

Rupp, O., Brinkrolf, K., Buerth, C., Kunigo, M., Schneider, J., Jaenicke, S., Goesmann, A., Puhler, A., Jaeger, K.E., Ernst, J.F. 2015. The structure of the Cyberlindnera jadinii genome and its relation to Candida utilis analyzed by the occurrence of single nucleotide polymorphisms. J. Biotech. 211, 20-30.

Ryan, J.F., Pang, K., Schnitzler, C.E., Nguyen, A.-D., Moreland, R.T., Simmons, D.K., Koch, B.J., Francis, W.R., Havlak, P., Smith, S.A., et al. 2013. The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science. 342, 1336.

Saika, A., Koike, H., Hori, T., Fukuoka, T., Sato, S., Habe, H., Kitamoto, D., Morita, T. 2014. Draft genome sequence of the yeast Pseudozyma antarctica type strain JCM10317, a producer of the glycolipid biosurfactants, mannosylerythritol lipids. Genome Announce. 2, e00878-00814.

Sanggaard, K.W., Bechsgaard, J.S., Fang, X., Duan, J., Dyrlund, T.F., Gupta, V., Jiang, X., Cheng, L., Fan, D., Feng, Y., et al. 2014. Spider genomes provide insight into composition and evolution of venom and silk. Nat. Comm. 5, 3765.

Santana, M.F., Silva, J.C.F., Batista, A.D., Ribeiro, L.E., da Silva, G.F., de Araújo, E.F., de Queiroz, M.V. 2012. Abundance, distribution and potential impact of transposable elements in the genome of Mycosphaerella fijiensis. BMC Genomics. 13, 720.

Sarilar, V., Devillers, H., Freel, K.C., Schacherer, J., Neuvéglise, C. 2015. Draft genome sequence of Lachancea lanzarotensis CBS 12615 T, an ascomycetous yeast isolated from grapes. Genome Announce. 3, e00292-00215.

Sato, S., Hirakawa, H., Isobe, S., Fukai, E., Watanabe, A., Kato, M., Kawashima, K., Minami, C., Muraki, A., Nakazaki, N., et al. 2011. Sequence analysis of the genome of an oil-bearing tree, Jatropha curcas L. DNA Res. 18, 65-76.

Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E., Kato, T., Nakao, M., Sasamoto, S., Watanabe, A., Ono, A., Kawashima, K., et al. 2008. Genome structure of the legume, Lotus japonicus. DNA Res. 15, 227-239.

Scally, A., Dutheil, J.Y., Hillier, L.W., Jordan, G.E., Goodhead, I., Herrero, J., Hobolth, A., Lappalainen, T., Mailund, T., Marques-Bonet, T., et al. 2012. Insights into hominid evolution from the gorilla genome sequence. Nature. 483, 169-175.

Schaeffer, S.W., Bhutkar, A., McAllister, B.F., Matsuda, M., Matzkin, L.M., O'Grady, P.M., Rohde, C., Valente, V.L.S., Aguadé, M., Anderson, W.W., et al. 2008. Polytene chromosomal maps of 11 Drosophila species: The order of genomic scaffolds inferred from genetic and physical maps. Genetics. 179, 1601-1655.

Schafhauser, T., Wibberg, D., Ruckert, C., Winkler, A., Flor, L., van Pee, K.H., Fewer, D.P., Sivonen, K., Jahn, L., Ludwig-Muller, J., et al. 2015. Draft genome sequence of Talaromyces islandicus ("Penicillium islandicum") WF-38-12, a neglected mold with significant biotechnological potential. J. Biotech. 211, 101-102.

Schardl, C.L., Young, C.A., Hesse, U., Amyotte, S.G., Andreeva, K., Calie, P.J., Fleetwood, D.J., Haws, D.C., Moore, N., Oeser, B., et al. 2013. Plant-symbiotic fungi as chemical engineers, multi-genome analysis of the Clavicipitaceae reveals dynamics of alkaloid loci. PLoS Genet. 9, e1003323.

Schartl, M., Walter, R.B., Shen, Y., Garcia, T., Catchen, J., Amores, A., Braasch, I., Chalopin, D., Volff, J.-N., Lesch, K.-P., et al. 2013. The genome of the platyfish, Xiphophorus maculatus, provides insights into evolutionary adaptation and several complex traits. Nat. Genet. 45, 567-572.

Schmutz, J., Cannon, S.B., Schlueter, J., Ma, J., Mitros, T., Nelson, W., Hyten, D.L., Song, Q., Thelen, J.J., Cheng, J., et al. 2010. Genome sequence of the palaeopolyploid soybean. Nature. 463, 178-183.

Schmutz, J., McClean, P.E., Mamidi, S., Wu, G.A., Cannon, S.B., Grimwood, J., Jenkins, J., Shu, S., Song, Q., Chavarro, C., et al. 2014. A reference genome for common bean and genome-wide analysis of dual domestications. Nat. Genet. 46, 707-713.

Schnable, P.S., Ware, D., Fulton, R.S., Stein, J.C., Wei, F., Pasternak, S., Liang, C., Zhang, J., Fulton, L., Graves, T.A., et al. 2009. The B73 maize genome, complexity, diversity, and dynamics. Science. 326, 1112-1115.

Schneider, J., Andrea, H., Blom, J., Jaenicke, S., Rückert, C., Schorsch, C., Szczepanowski, R., Farwick, M., Goesmann, A., Pühler, A., et al. 2012. Draft genome sequence of Wickerhamomyces ciferrii NRRL Y-1031 F-60-10. Eukaryot. Cell. 11, 1582-1583.

Schneider, J., Rupp, O., Trost, E., Jaenicke, S., Passoth, V., Goesmann, A., Tauch, A., Brinkrolf, K. 2012. Genome sequence of Wickerhamomyces anomalus DSM 6766 reveals genetic basis of biotechnologically important antimicrobial activities. FEMS Yeast Res. 12, 382-386.

Schwartze, V.U., Winter, S., Shelest, E., Marcet-Houben, M., Horn, F., Wehner, S., Linde, J., Valiante, V., Sammeth, M., Riege, K., et al. 2014. Gene expansion shapes genome architecture in the human pathogen Lichtheimia corymbifera: an evolutionary genomics analysis in the ancient terrestrial mucorales (Mucoromycotina). PLoS Genet. 10, e1004496.

Schwarz, E.M., Korhonen, P.K., Campbell, B.E., Young, N.D., Jex, A.R., Jabbar, A., Hall, R.S., Mondal, A., Howe, A.C., Pell, J., et al. 2013. The genome and developmental transcriptome of the strongylid nematode Haemonchus contortus. Genome Biol. 14, R89.

Schwarz, E.M., Hu, Y., Antoshechkin, I., Miller, M.M., Sternberg, P.W., Aroian, R.V. 2015. The genome and transcriptome of the zoonotic hookworm Ancylostoma ceylanicum identify infection-specific gene families. Nat. Genet. 47, 416-422.

Scott, J.G., Warren, W.C., Beukeboom, L.W., Bopp, D., Clark, A.G., Giers, S.D., Hediger, M., Jones, A.K., Kasai, S., Leichter, C.A., et al. 2014. Genome of the house fly, Musca domestica L., a global vector of diseases with adaptations to a septic environment. Genome Biol. 15, 466.

Sea Urchin Genome Sequencing Consortium. 2006. The genome of the sea urchin Strongylocentrotus purpuratus. Science. 314, 941-952.

Seabury, C.M., Dowd, S.E., Seabury, P.M., Raudsepp, T., Brightsmith, D.J., Liboriussen, P., Halley, Y., Fisher, C.A., Owens, E., Viswanathan, G., et al. 2013. A multi-platform draft de novo genome assembly and comparative analysis for the scarlet macaw (Ara macao). PLoS One. 8, e62415.

Seim, I., Fang, X., Xiong, Z., Lobanov, A.V., Huang, Z., Ma, S., Feng, Y., Turanov, A.A., Zhu, Y., Lenz, T.L., et al. 2013. Genome analysis reveals insights into physiology and longevity of the Brandt's bat Myotis brandtii. Nat. Comm. 4, 2212.

Shaffer, H.B., Minx, P., Warren, D.E., Shedlock, A.M., Thomson, R.C., Valenzuela, N., Abramyan, J., Amemiya, C.T., Badenhorst, D., Biggar, K.K., et al. 2013. The western painted turtle genome, a model for the evolution of extreme physiological adaptations in a slowly evolving lineage. Genome Biol. 14, R28.

Shapiro, M.D., Kronenberg, Z., Li, C., Domyan, E.T., Pan, H., Campbell, M., Tan, H., Huff, C.D., Hu, H., Vickrey, A.I., et al. 2013. Genomic diversity and evolution of the head crest in the rock pigeon. Science. 339, 1063-1067.

Sharma, M.K., Sharma, R., Cao, P., Jenkins, J., Bartley, L.E., Qualls, M., Grimwood, J., Schmutz, J., Rokshar, D., Ronald, P.C. 2012. A genome-wide survey of switchgrass genome structure and organization. PLoS One. 7, e33892.

Sharpton, T.J., Stajich, J.E., Rounsley, S.D., Gardner, M.J., Wortman, J.R., Jordar, V.S., Maiti, R., Kodira, C.D., Neafsey, D.E., Zeng, Q., et al. 2009. Comparative genomic analyses of the human fungal pathogens Coccidioides and their relatives. Genome Res. 19, 1722-1731.

Shinzato, C., Shoguchi, E., Kawashima, T., Hamada, M., Hisata, K., Tanaka, M., Fujie, M., Fujiwara, M., Koyanagi, R., Ikuta, T., et al. 2011. Using the Acropora digitifera genome to understand coral responses to environmental change. Nature. 476, 320-324.

Shoguchi, E., Shinzato, C., Kawashima, T., Gyoja, F., Mungpakdee, S., Koyanagi, R., Takeuchi, T., Hisata, K., Tanaka, M., Fujiwara, M., et al. 2013. Draft assembly of the Symbiodinium minutum nuclear genome reveals dinoflagellate gene structure. Curr. Biol. 23, 1399-1408.

Shulaev, V., Sargent, D.J., Crowhurst, R.N., Mockler, T.C., Folkerts, O., Delcher, A.L., Jaiswal, P., Mockaitis, K., Liston, A., Mane, S.P., et al. 2010. The genome of woodland strawberry (Fragaria vesca). Nat. Genet. 43, 109-118.

Sierro, N., Battey, J.N.D., Ouadi, S., Bakaher, N., Bovet, L., Willig, A., Goepfert, S., Peitsch, M.C., Ivanov, N.V. 2014. The tobacco genome sequence and its comparison with those of tomato and potato. Nat. Comm. 5, 3833.

Simakov, O., Marletaz, F., Cho, S.-J., Edsinger-Gonzales, E., Havlak, P., Hellsten, U., Kuo, D.-H., Larsson, T., Lv, J., Arendt, D., et al. 2013. Insights into bilaterian evolution from three spiralian genomes. Nature. 493, 526-531.

Simmen, M.W., Bird, A. 2000. Sequence analysis of transposable elements in the sea squirt, Ciona intestinalis. Mol. Biol. Evol. 17, 1685-1694.

Singh, R., Ong-Abdullah, M., Low, E.-T.L., Manaf, M.A.A., Rosli, R., Nookiah, R., Ooi, L.C.-L., Ooi, S.E., Chan, K.-L., Halim, M.A., et al. 2013. Oil palm genome sequence reveals divergence of interfertile species in Old and New worlds. Nature. 500, 335-339.

Slotte, T., Hazzouri, K.M., Ågren, J.A., Koenig, D., Maumus, F., Guo, Y.-L., Steige, K., Platts, A.E., Escobar, J.S., Newman, L.K., et al. 2013. The Capsella rubella genome and the genomic consequences of rapid mating system evolution. Nat. Genet. 45, 831-835.

Smith, C.D., Zimin, A., Holt, C., Abouheif, E., Benton, R., Cash, E., Croset, V., Currie, C.R., Elhaik, E., Elsik, C.G., et al. 2011. Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile). P. Natl. Acad. Sci. USA. 108, 5673-5678.

Smith, C.R., Smith, C.D., Robertson, H.M., Helmkampf, M., Zimin, A., Yandell, M., Holt, C., Hu, H., Abouheif, E., Benton, R., et al. 2011. Draft genome of the red harvester ant Pogonomyrmex barbatus. P. Natl. Acad. Sci. USA. 108, 5667-5672.

Smith, J.J., Kuraku, S., Holt, C., Sauka-Spengler, T., Jiang, N., Campbell, M.S., Yandell, M.D., Manousaki, T., Meyer, A., Bloom, O.E., et al. 2013. Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution. Nat. Genet. 45, 415-421.

Soliai, M.M., Meyer, S.E., Udall, J.A., Elzinga, D.E., Hermansen, R.A., Bodily, P.M., Hart, A.A., Coleman, C.E. 2014. De novo genome assembly of the fungal plant pathogen Pyrenophora semeniperda. PLoS One. 9, e87045.

Souciet, J.-L., Dujon, B., Gaillardin, C., Johnston, M., Baret, P.V., Cliften, P., Sherman, D.J., Weissenbach, J., Westhof, E., Wincker, P., et al. 2009. Comparative genomics of protoploid Saccharomycetaceae. Genome Res. 19, 1696-1709.

Spanu, P.D., Abbott, J.C., Amselem, J., Burgis, T.A., Soanes, D.M., Stuber, K., Loren van Themaat, E.V., Brown, J.K.M., Butcher, S.A., Gurr, S.J., et al. 2010. Genome expansion and gene loss in powdery mildew fungi reveal tradeoffs in extreme parasitism. Science. 330, 1543-1546.

Srivastava, M., Begovic, E., Chapman, J., Putnam, N.H., Hellsten, U., Kawashima, T., Kuo, A., Mitros, T., Salamov, A., Carpenter, M.L., et al. 2008. The Trichoplax genome and the nature of placozoans. Nature. 454, 955-960.

Srivastava, M., Simakov, O., Chapman, J., Fahey, B., Gauthier, M.E.A., Mitros, T., Richards, G.S., Conaco, C., Dacre, M., Hellsten, U., et al. 2010. The Amphimedon queenslandica genome and the evolution of animal complexity. Nature. 466, 720-726.

Staats, M., van Kan, J.A.L. 2012. Genome update of Botrytis cinerea strains B05.10 and T4. Eukaryot. Cell. 11, 1413-1414.

Stajich, J.E., Dietrich, F.S., Roy, S.W. 2007. Comparative genomic analysis of fungal genomes reveals intron-rich ancestors. Genome Biol. 8, R223.

Stajich, J.E., Wilke, S.K., Ahrén, D., Au, C.H., Birren, B.W., Borodovsky, M., Burns, C., Canbäck, B., Casselton, L.A., Cheng, C.K., et al. 2010. Insights into evolution of multicellular fungi from the assembled chromosomes of the mushroom Coprinopsis cinerea (Coprinus cinereus). P. Natl. Acad. Sci. USA. 107, 11889-11894.

Star, B., Nederbragt, A.J., Jentoft, S., Grimholt, U., Malmstrøm, M., Gregers, T.F., Rounge, T.B., Paulsen, J., Solbakken, M.H., Sharma, A., et al. 2011. The genome sequence of Atlantic cod reveals a unique immune system. Nature. 477, 207-210.

Staton, S.E., Bakken, B.H., Blackman, B.K., Chapman, M.A., Kane, N.C., Tang, S., Ungerer, M.C., Knapp, S.J., Rieseberg, L.H., Burke, J.M. 2012. The sunflower (Helianthus annuus L.) genome reflects a recent history of biased accumulation of transposable elements. Plant J. 72, 142-153.

Stein, L.D., Bao, Z., Blasiar, D., Blumenthal, T., Brent, M.R., Chen, N., Chinwalla, A., Clarke, L., Clee, C., Coghlan, A., et al. 2003. The genome sequence of Caenorhabditis briggsae, a platform for comparative genomics. PLoS Biol. 1, E45.

Sterflinger, K., Lopandic, K., Blasi, B., Poyntner, C., de Hoog, S., Tafer, H. 2015. Draft genome of Cladophialophora immunda, a black yeast and efficient degrader of polyaromatic hydrocarbons. Genome Announce. 3, e01283-01214.

Sterflinger, K., Lopandic, K., Pandey, R.V., Blasi, B., Kriegner, A. 2014. Nothing special in the specialist? Draft genome sequence of Cryomyces antarcticus, the most extremophilic fungus from Antarctica. PLoS One. 9, e109908.

Straub, S.C.K., Fishbein, M., Livshultz, T., Foster, Z., Parks, M., Weitemier, K., Cronn, R.C., Liston, A. 2011. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing. BMC Genomics. 12, 211.

Sucgang, R., Kuo, A., Tian, X., Salerno, W., Parikh, A., Feasley, C.L., Dalin, E., Tu, H., Huang, E., Barry, K., et al. 2011. Comparative genomics of the social amoebae Dictyostelium discoideum and Dictyostelium purpureum. Genome Biol. 12, R20.

Suen, G., Teiling, C., Li, L., Holt, C., Abouheif, E., Bornberg-Bauer, E., Bouffard, P., Caldera, E.J., Cash, E., Cavanaugh, A., et al. 2011. The genome sequence of the leaf-cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle. PLoS Genet. 7, e1002007.

Suga, H., Chen, Z., de Mendoza, A., Sebé-Pedrós, A., Brown, M.W., Kramer, E., Carr, M., Kerner, P., Vervoort, M., Sánchez-Pons, N., et al. 2013. The Capsaspora genome reveals a complex unicellular prehistory of animals. Nat. Comm. 4, 1-9.

Sun, C., Shepard, D.B., Chong, R.A., López Arriaza, J., Hall, K., Castoe, T.A., Feschotte, C., Pollock, D.D., Mueller, R.L. 2012. LTR retrotransposons contribute to genomic gigantism in plethodontid salamanders. Genome Biol. Evol. 4, 168-183.

Sun, Y.-B., Xiong, Z.-J., Xiang, X.-Y., Liu, S.-P., Zhou, W.-W., Tu, X.-L., Zhong, L., Wang, L., Wu, D.-D., Zhang, B.-L., et al. 2015. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes. P. Natl. Acad. Sci. USA. 112, E1257-E1262.

Sun, Z.-B., Sun, M.-H., Li, S.-D. 2015. Draft genome sequence of mycoparasite Clonostachys rosea Strain 67-1. Genome Announce. 3, e00546-00515.

Sunil, M., Hariharan, A.K., Nayak, S., Gupta, S., Nambisan, S.R., Gupta, R.P., Panda, B., Choudhary, B., Srinivasan, S. 2014. The draft genome and transcriptome of Amaranthus hypochondriacus, a C4 dicot producing high-lysine edible pseudo-cereal. DNA Res. 21, 1-18.

Suzuki, H., MacDonald, J., Syed, K., Salamov, A., Hori, C., Aerts, A., Henrissat, B., Wiebenga, A., VanKuyk, P.A., Barry, K., et al. 2012. Comparative genomics of the white-rot fungi, Phanerochaete carnosa and P. chrysosporium, to elucidate the genetic basis of the distinct wood types they colonize. BMC Genomics. 13, 444.

Suzuki, T., Hoshino, T., Matsushika, A. 2014. Draft genome sequence of Kluyveromyces marxianus strain DMB1, isolated from sugarcane bagasse hydrolysate. Genome Announce. 2, e00733-00714.

Sveinsson, S., Gill, N., Kane, N.C., Cronk, Q. 2013. Transposon fingerprinting using low coverage whole genome shotgun sequencing in Cacao (Theobroma cacao L.) and related species. BMC Genomics. 14, 502.

Swaminathan, K., Alabady, M.S., Varala, K., De Paoli, E., Ho, I., Rokhsar, D.S., Arumuganathan, A.K., Ming, R., Green, P.J., Meyers, B.C., et al. 2010. Genomic and small RNA sequencing of Miscanthus x giganteus shows the utility of sorghum as a reference genome sequence for Andropogoneae grasses. Genome Biol. 11, R12.

Swart, E.C., Bracht, J.R., Magrini, V., Minx, P., Chen, X., Zhou, Y., Khurana, J.S., Goldman, A.D., Nowacki, M., Schotanus, K., et al. 2013. The Oxytricha trifallax macronuclear genome, a complex eukaryotic genome with 16,000 tiny chromosomes. PLoS Biol. 11, e1001473.

Tachibana, S.-I., Sullivan, S.A., Kawai, S., Nakamura, S., Kim, H.R., Goto, N., Arisue, N., Palacpac, N.M.Q., Honma, H., Yagi, M., et al. 2012. Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade. Nat. Genet. 44, 1051-1055.

Tafer, H., Lopandic, K., Blasi, B., Poyntner, C., Sterflinger, K. 2015. Draft genome sequence of Exophiala mesophila, a black yeast with high bioremediation potential. Genome Announce. 3, e00203-00215.

Takeda, I., Tamano, K., Yamane, N., Ishii, T., Miura, A., Umemura, M., Terai, G., Baker, S., Koike, H., Machida, M. 2014. Genome sequence of the Mucoromycotina fungus Umbelopsis isabellina, an effective producer of lipids. Genome Announce. 2, e00071-00014.

Takeuchi, T., Kawashima, T., Koyanagi, R., Gyoja, F., Tanaka, M., Ikuta, T., Shoguchi, E., Fujiwara, M., Shinzato, C., Hisata, K., et al. 2012. Draft genome of the pearl oyster Pinctada fucata, a platform for understanding bivalve biology. DNA Res. 19, 117-130.

Tamazian, G., Simonov, S., Dobrynin, P., Makunin, A., Logachev, A., Komissarov, A., Shevchenko, A., Brukhin, V., Cherkasov, N., Svitin, A., et al. 2014. Annotated features of domestic cat - Felis catus genome. GigaSci. 3, 13.

Tang, J.D., Perkins, A.D., Sonstegard, T.S., Schroeder, S.G., Burgess, S.C., Diehl, S.V. 2012. Short-read sequencing for genomic analysis of the brown rot fungus Fibroporia radiculosa. Appl. Environ. Microb. 78, 2272-2281.

Terol, J., Naranjo, M.A., Ollitrault, P., Talon, M. 2008. Development of genomic resources for Citrus clementina: characterization of three deep-coverage BAC libraries and analysis of 46,000 BAC end sequences. BMC Genomics. 9, 423.

Terrapon, N., Li, C., Robertson, H.M., Ji, L., Meng, X., Booth, W., Chen, Z., Childers, C.P., Glastad, K.M., Gokhale, K., et al. 2014. Molecular traces of alternative social organization in a termite genome. Nat. Comm. 5, 3636.

The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 408, 796-815.

The Bovine Genome Sequencing and Analysis Consortium, Elsik, C.G., Tellam, R.L., Worley, K.C. 2009. The genome sequence of taurine cattle, a window to ruminant biology and evolution. Science. 324, 522-528.

The Brassica rapa Genome Sequencing Project Consortium. 2011. The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43, 1035-1039.

The C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans, a platform for investigating biology. Science. 282, 2012-2018.

The Chimp Sequencing and Analysis Consortium. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 437, 69-87.

The French-Italian Public Consortium for Grapevine Genome Characterization. 2007. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 449, 463-467.

The Genolevures Consortium. 2009. Comparative genomics of protoploid Saccharomycetaceae. Genome Res. 19, 1696-1709.

The Heliconius Genome Consortium. 2012. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature. 487, 94-98.

The Honeybee Genome Sequencing Consortium. 2006. Insights into social insects from the genome of the honeybee Apis mellifera. Nature. 443, 931-949.

The International Aphid Genomics Consortium. 2010. Genome sequence of the pea aphid Acyrthosiphon pisum. PLoS Biol. 8, e1000313.

The International Barley Genome Sequencing Consortium. 2012. A physical, genetic and functional sequence assembly of the barley genome. Nature. 491, 711-716.

The International Brachypodium Initiative. 2010. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 463, 763-768.

The International Peach Genome Initiative. 2013. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat. Genet. 45, 487-494.

The International Wheat Genome Sequencing Consortium. 2014. A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science. 345, 1251788-1251781.

The Nasonia Genome Working Group. 2010. Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science. 327, 343-348.

The Pneumocystis Genome Project. 2015. http://pgp.cchmc.org

The Potato Genome Sequencing Consortium. 2011. Genome sequence and analysis of the tuber crop potato. Nature. 475, 189-197.

The Schistosoma japonicum Genome Sequencing and Functional Analysis Consortium. 2009. The Schistosoma japonicum genome reveals features of host–parasite interplay. Nature. 460, 345-352.

The Tomato Genome Consortium. 2012. The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 485, 635-641.

Thon, M. R., Pan, H., Diener, S., Papalas, J., Taro, A., Mitchell, T.K. 2006. The role of transposable element clusters in genome evolution and loss of synteny in the rice blast fungus Magnaporthe oryzae. Genome Biol. 7, R16.

Thon, M. R., Martin, S.L., Goff, S., Wing, R.A., Dean, R.A. 2004. BAC end sequences and a physical map reveal transposable element content and clustering patterns in the genome of Magnaporthe grisea. Fungal Genet. Biol. 41, 657-666.

Tian, Y., Zeng, Y., Zhang, J., Yang, C., Yan, L., Wang, X., Shi, C., Xie, J., Dai, T., Peng, L., et al. 2015. High quality reference genome of drumstick tree (Moringa oleifera Lam.), a potential perennial crop. Sci. China: Life Sci. 58, 627-638.

Tisserant, E., Malbreil, M., Kuo, A., Kohler, A., Symeonidi, A., Balestrini, R., Charron, P., Frei dit Frey, N., Gianinazi-Pearson, V., Gilbert, L., et al. 2013. Genome of an arbuscular mycorrhizal fungus provides insight into the oldest plant symbiosis. P. Natl. Acad. Sci. USA. 110, 20117-20122.

Toome, M., Kuo, A., Henrissat, B., Lipzen, A., Tritt, A., Yoshinaga, Y., Zane, M., Barry, K., Grigoriev, I., Spatafora, J., et al. 2014. Draft genome sequence of a rare smut relative, Tilletiaria anomala UBC 951. Genome Announce. 2, e00539-00514.

Toome, M., Ohm, R.A., Riley, R.W., James, T.Y., Lazarus, K.L., Henrissat, B., Albu, S., Boyd, A., Chow, J., Clum, A., et al. 2014. Genome sequencing provides insight into the reproductive biology, nutritional mode and ploidy of the fern pathogen Mixia osmundae. New Phytol. 202, 554-564.

Traeger, S., Altegoer, F., Freitag, M., Gabaldon, T., Kempken, F., Kumar, A., Marcet-Houben, M., Pöggeler, S., Stajich, J.E., Nowrousian, M. 2013. The genome and development-dependent transcriptomes of Pyronema confluens, a window into fungal evolution. PLoS Genet. 9, e1003820.

Traut, W., Vogel, H., Glöckner, G., Hartmann, E., Heckel, D.G. 2013. High-throughput sequencing of a single chromosome: a moth W chromosome. Chromosome Res. 21, 491-505.

Triana, S., Ohm, R.A., Cock, H.D., Restrepo, S., Celis, A. 2015. Draft genome sequence of the animal and human pathogen Malassezia pachydermatis strain CBS 1879. Genome Announce. 3, e01197-01115.

Tribolium Genome Sequencing Consortium. 2008. The genome of the model beetle and pest Tribolium castaneum. Nature. 452, 949-955.

Tsai, I.J., Zarowiecki, M., Holroyd, N., Garciarrubio, A., Sanchez-Flores, A., Brooks, K.L., Tracey, A., Bobes, R.J., Fragoso, G., Sciutto, E., et al. 2013. The genomes of four tapeworm species reveal adaptations to parasitism. Nature. 496, 57-63.

Tsuji, M., Kudoh, S., Hoshino, T. 2015. Draft genome sequence of cryophilic Basidiomycetous yeast Mrakia blollopis SK-4, isolated from an algal mat of Naga-ike Lake in the Skarvsnes ice-free area, east Antarctica. Genome Announce. 3, e01454-01414.

Turk, M., Plemenitaš, A. 2002. The HOG pathway in the halophilic black yeast Hortaea werneckii: Isolation of the HOG1 homolog gene and activation of HwHog1p. FEMS Microbiol. Lett. 216, 193-199.

Tuskan, G.A., Difazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph, S., Rombauts, S., Salamov, A., et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 313, 1596-1604.

Tyler, B.M., Tripathy, S., Zhang, X., Dehal, P., Jiang, R.H.Y., Aerts, A., Arredondo, F.D., Baxter, L., Bensasson, D., Beynon, J.L., et al. 2006. Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science. 313, 1261-1266.

Vaidya, M. B., Sajnani, M.R., Ramani, U.V., Tripathi, A.K., Bhatt, V.D., Patel, J.S., Patel, M.M. 2012. A preliminary analysis of repetitive sequence organisation in Bubalus bubalis genome. Indian J. Biotech. 11, 62-66.

Valdes, J., Tapia, P., Cepeda, V., Varela, J., Godoy, L., Cubillos, F.A., Silva, E., Martinez, C., Ganga, M.A. 2014. Draft genome sequence and transcriptome analysis of the wine spoilage yeast Dekkera bruxellensis LAMAP2480 provides insights into genetic diversity, metabolism and survival. FEMS Microbiol. Lett. 361, 104-106.

van Bakel, H., Stout, J.M., Cote, A.G., Tallon, C.M., Sharpe, A.G., Hughes, T.R., Page, J.E. 2011. The draft genome and transcriptome of Cannabis sativa. Genome Biol. 12, R102.

van den Berg, M.A., Albang, R., Albermann, K., Badger, J.H., Daran, J.-M., Driessen, A.J.M., Garcia-Estrada, C., Fedorova, N.D., Harris, D.M., Heijne, W.H.M., et al. 2008. Genome sequencing and analysis of the filamentous fungus Penicillium chrysogenum. Nat. Biotech. 26, 1161-1168.

van het Hoog, M., Rast, T.J., Martchenko, M., Grindle, S., Dignard, D., Hogues, H., Cuomo, C., Berriman, M., Scherer, S., Magee, B.B., et al. 2007. Assembly of the Candida albicans genome into sixteen supercontigs aligned on the eight chromosomes. Genome Biol. 8, R52.

Vandeputte, P., Ghamrawi, S., Rechenmann, M., Iltis, A., Giraud, S., Fleury, M., Thornton, C., Delhaès, L., Meyer, W., Papon, N., Bouchara, J.-P. 2014. Draft genome sequence of the pathogenic fungus Scedosporium apiospermum. Genome Announce. 2, e00988-00914.

Varshney, R.K., Chen, W., Li, Y., Bharti, A.K., Saxena, R.K., Schlueter, J.A., Donoghue, M.T.A., Azam, S., Fan, G., Whaley, A.M., et al. 2012. Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat. Biotech. 30, 83-89.

Varshney, R.K., Song, C., Saxena, R.K., Azam, S., Yu, S., Sharpe, A.G., Cannon, S., Baek, J., Rosen, B.D., Tar'an, B., et al. 2013. Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement. Nat. Biotech. 31, 240-246.

Vega-Alvarado, L., Gómez-Angulo, J., Escalante-García, Z., Grande, R., Gschaedler-Mathis, A., Amaya-Delgado, L., Sanchez-Flores, A., Arrizon, J. 2015. High-quality draft genome sequence of Candida apicola NRRL Y-50540. Genome Announce. 3, e00437-00415.

Velasco, R., Zharkikh, A., Affourtit, J., Dhingra, A., Cestaro, A., Kalyanaraman, A., Fontana, P., Bhatnagar, S.K., Troggio, M., Pruss, D., et al. 2010. The genome of the domesticated apple (Malus × domestica Borkh.). Nat. Genet. 42, 833-839.

Velasco, R., Zharkikh, A., Troggio, M., Cartwright, D.A., Cestaro, A., Pruss, D., Pindo, M., Fitzgerald, L.M., Vezzulli, S., Reid, J., et al. 2007. A high quality draft consensus sequence of the genome of a heterozygous grapevine variety. PLoS One. 2, e1326.

Venancio, T.M., Wilson, R.A., Verjovska-Almeida, S., DeMarco, R. 2010. Bursts of transposition from non-long terminal repeat retrotransposon families of the RTE clade in Schistosoma mansoni. Int. J. Parasitol. 40, 743-749.

Venkatesh, B., Lee, A.P., Ravi, V., Maurya, A.K., Lian, M.M., Swann, J.B., Ohta, Y., Flajnik, M.F., Sutoh, Y., Kasahara, M., et al. 2014. Elephant shark genome provides unique insights into gnathostome evolution. Nature. 505, 174-179.

Venkatesh, B., Kirkness, E.F., Loh, Y.-H., Halpern, A.L., Lee, A.P., Johnson, J., Dandona, N., Viswanathan, L.D., Tay, A., Venter, J.C., et al. 2007. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii) genome. PLoS Biol. 5, e101.

Vinson, J.P., Jaffe, D.B., O'Neill, K., Karlsson, E.K., Stange-Thomann, N., Anderson, S., Mesirov, J.P., Satoh, N., Satou, Y., Nusbaum, C., et al. 2005. Assembly of polymorphic genomes, algorithms and application to Ciona savignyi. Genome Res. 15, 1127-1135.

Vitte, C., Estep, M.C., Leebens-Mack, J., Bennetzen, J.L. 2013. Young, intact and nested retrotransposons are abundant in the onion and asparagus genomes. Ann. Bot. 112, 881-889.

Volleth, M., Heller, K.-G. 2012. Variations on a theme: karyotype comparison in Eurasian Myotis species and implications for phylogeny. Vespertilio. 16, 329-350.

Vonk, F.J., Casewell, N.R., Henkel, C.V., Heimberg, A.M., Jansen, H.J., McCleary, R.J.R., Kerkkamp, H.M.E., Vos, R.A., Guerreiro, I., Calvete, J.J., et al. 2013. The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system. P. Natl. Acad. Sci. USA. 110, 20651-20656.

Voskoboynik, A., Neff, N.F., Sahoo, D., Newman, A.M., Pushkarev, D., Koh, W., Passarelli, B., Fan, H.C., Mantalas, G.L., Palmeri, K.J., et al. 2013. The genome sequence of the colonial chordate, Botryllus schlosseri. eLife. 2, e00569-e00569.

Wade, C.M., Giulotto, E., Sigurdsson, S., Zoli, M., Gnerre, S., Imsland, F., Lear, T.L., Adelson, D.L., Bailey, E., Bellone, R.R., et al. 2009. Genome sequence, comparative analysis, and population genetics of the domestic horse. Science. 326, 865-867.

Wan, Q.-H., Pan, S.-K., Hu, L., Zhu, Y., Xu, P.-W., Xia, J.-Q., Chen, H., He, G.-Y., He, J., Ni, X.-W., et al. 2013. Genome analysis and signature discovery for diving and sensory properties of the endangered Chinese alligator. Cell Res. 23, 1091-1105.

Wang, W., Haberer, G., Gundlach, H., Gläßer, C., Nussbaumer, T., Luo, M.C., Lomsadze, A., Borodovsky, M., Kerstetter, R.A., Shanklin, J., et al. 2014. The Spirodela polyrhiza genome reveals insights into its neotenous reduction fast growth and aquatic lifestyle. Nat. Comm. 5, 3311.

Wang, K., Wang, Z., Li, F., Ye, W., Wang, J., Song, G., Yue, Z., Cong, L., Shang, H., Zhu, S., et al. 2012. The draft genome of a diploid cotton Gossypium raimondii. Nat. Genet. 44, 1098-1103.

Wang, L., Chen, W., Feng, Y., Ren, Y., Gu, Z., Chen, H., Wang, H., Thomas, M.J., Zhang, B., Berquin, I.M., et al. 2011. Genome characterization of the oleaginous fungus Mortierella alpina. PLoS One. 6, e28319.

Wang, N., Thomson, M., Bodles, W.J.A., Crawford, R.M.M., Hunt, H.V., Featherstone, A.W., Pellicer, J., Buggs, R.J.A. 2013. Genome sequence of dwarf birch (Betula nana) and cross-species RAD markers. Mol. Ecol. 22, 3098-3111.

Wang, S., Zhang, L., Meyer, E., Bao, Z. 2010. Genome-wide analysis of transposable elements and tandem repeats in the compact placozoan genome. Biol. Direct. 5, 18.

Wang, X., Chen, W., Huang, Y., Sun, J., Men, J., Liu, H., Luo, F., Guo, L., Lv, X., Deng, C., et al. 2011. The draft genome of the carcinogenic human liver fluke Clonorchis sinensis. Genome Biol. 12, R107.

Wang, X., Fang, X., Yang, P., Jiang, X., Jiang, F., Zhao, D., Li, B., Cui, F., Wei, J., Ma, C., et al. 2014. The locust genome provides insight into swarm formation and long-distance flight. Nat. Comm. 5, 2957.

Wang, X., Liu, Q., Wang, H., Luo, C.-X., Wang, G., Luo, M. 2013. A BAC based physical map and genome survey of the rice false smut fungus Villosiclava virens. BMC Genomics. 14, 883.

Wang, X., Zhang, X., Liu, L., Xiang, M., Wang, W., Sun, X., Che, Y., Guo, L., Liu, G., Guo, L., Wang, C., Yin, W.-B., Stadler, M., Zhang, X., Liu, X. 2015. Genomic and transcriptomic analysis of the endophytic fungus Pestalotiopsis fici reveals its lifestyle and high potential for synthesis of natural products. BMC Genomics. 16, 28.

Wang, Y.-Y., Liu, B., Zhang, X.-Y., Zhou, Q.-M., Zhang, T., Li, H., Yu, Y.-F., Zhang, X.-L., Hao, X.-Y., Wang, M., et al. 2014. Genome characteristics reveal the impact of lichenization on lichen-forming fungus Endocarpon pusillum Hedwig (Verrucariales, Ascomycota). BMC Genomics. 15, 34.

Wang, Z., Hobson, N., Galindo, L., Zhu, S., Shi, D., McDill, J., Yang, L., Hawkins, S., Neutelings, G., Datla, R., et al. 2012. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J. 72, 461-473.

Wang, Z., Pascual-Anaya, J., Zadissa, A., Li, W., Niimura, Y., Huang, Z., Li, C., White, S., Xiong, Z., Fang, D., et al. 2013. The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan. Nat. Genet. 45, 701-706.

Warren, R.L., Keeling, C.I., Yuen, M.M.S., Raymond, A., Taylor, G.A., Vandervalk, B.P., Mohamadi, H., Paulino, D., Chiu, R., Jackman, S.D., et al. 2015. Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism. Plant J. 83, 189-212.

Warren, W. C., Clayton, D.F., Ellegren, H., Arnold, A.P., Hillier, L.W., Künstner, A., Searle, S., White, S., Vilella, A.J., Fairley, S., et al. 2010. The genome of a songbird. Nature. 464, 757-762.

Warren, W.C., Hillier, L.W., Graves, J.A.M., Birney, E., Ponting, C.P., Grützner, F., Belov, K., Miller, W., Clarke, L., Chinwalla, A.T., et al. 2008. Genome analysis of the platypus reveals unique signatures of evolution. Nature. 453, 175-183.

Wawrzyniak, I., Courtine, D., Osman, M., Hubans-Pierlot, C., Cian, A., Nourrisson, C., Chabe, M., Poirier, P., Bart, A., Polonais, V., et al. 2015. Draft genome sequence of the intestinal parasite Blastocystis subtype 4-isolate WR1. Genomics Data. 4, 22-23.

Wegrzyn, J.L., Lin, B.Y., Zieve, J.J., Dougherty, W.M., Martínez-García, P.J., Koriabine, M., Holtz-Morris, A., deJong, P., Crepeau, M., Langley, C.H., et al. 2013. Insights into the loblolly pine genome: characterization of BAC and fosmid sequences. PLoS One. 8, e72439.

Wegrzyn, J. L., Liechty, J.D., Stevens, K.A., Wu, L.-S., Loopstra, C.A., Vasquez-Gross, H.A., Dougherty, W.M., Lin, B.Y., Zieve, J.J., Martinez-Garcia, P.J., et al. 2014. Unique features of the Loblolly Pine (Pinus taeda L.) megagenome revealed through sequence annotation. Genetics. 196, 891-909.

Wendland, J., Walther, A. 2011. Genome evolution in the Eremothecium clade of the Saccharomyces complex revealed by comparative genomics. G3. 1, 539-548.

Wendland, J., Walther, A. 2014. Chromosome number reduction in Eremothecium coryli by two telomere-to-telomere fusions. Genome Biol. Evol. 6, 1186-1198.

Wibberg, D., Jelonek, L., Rupp, O., Hennig, M., Eikmeyer, F., Goesmann, A., Hartmann, A., Borriss, R., Grosch, R., Pühler, A., et al. 2013. Establishment and interpretation of the genome sequence of the phytopathogenic fungus Rhizoctonia solani AG1-IB isolate 7/3/14. J. Biotech. 167, 142-155.

Wicker, T., Oberhaensli, S., Parlange, F., Buchmann, J.P., Shatalina, M., Roffler, S., Ben-David, R., Doležel, J., Simková, H., Schulze-Lefert, P., et al. 2013. The wheat powdery mildew genome shows the unique evolution of an obligate biotroph. Nat. Genet. 45, 1092-1096.

Wilken, P.M., Steenkamp, E.T., Wingfield, M.J., de Beer, Z.W., Wingfield, B.D. 2013. Draft nuclear genome sequence for the plant pathogen, Ceratocystis fimbriata. IMA Fungus. 4, 357-358.

Winckler, T., Szafranski, K., Glöckner, G. 2005. Transfer RNA gene-targeted integration: an adaptation of retrotransposable elements to survive in the compact Dictyostelium discoideum genome. Cytogenet. Genome Res. 110, 288-298.

Wohlbach, D.J., Kuo, A., Sato, T.K., Potts, K.M., Salamov, A.A., Labutti, K.M., Sun, H., Clum, A., Pangilinan, J.L., Lindquist, E.A., et al. 2011. Comparative genomics of xylose-fermenting fungi for enhanced biofuel production. P. Natl. Acad. Sci. USA. 108, 13212-13217.

Wollenberg, T., Schirawski, J. 2014. Comparative genomics of plant fungal pathogens: The Ustilago-Sporisorium paradigm. PLoS Pathog. 10, e1004218.

Woo, P.C.Y., Lau, S.K.P., Liu, B., Cai, J.J., Chong, K.T.K., Tse, H., Kao, R.Y.T., Chan, C.-M., Chow, W.-N., Yuen, K.-Y. 2011. Draft genome sequence of Penicillium marneffei strain PM1. Eukaryot. Cell. 10, 1740-1741.

Wood, V., Gwilliam, R., Rajandream, M.A., Lyne, M., Lyne, R., Stewart, A., Sgouros, J., Peat, N., Hayles, J., Baker, S., et al. 2002. The genome sequence of Schizosaccharomyces pombe. Nature. 415, 871-880.

Worden, A.Z., Lee, J.-H., Mock, T., Rouzé, P., Simmons, M.P., Aerts, A.L., Allen, A.E., Cuvelier, M.L., Derelle, E., Everett, M.V., et al. 2009. Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. Science. 324, 268-272.

Wormbase (release Ws246). 2015. http://www.wormbase.org

Wu, C., Zhang, D., Kan, M., Lv, Z., Zhu, A., Su, Y., Zhou, D., Zhang, J., Zhang, Z., Xu, M., et al. 2014. The draft genome of the large yellow croaker reveals well-developed innate immunity. Nat. Comm. 5, 5227.

Wu, G.A., Prochnik, S., Jenkins, J., Salse, J., Hellsten, U., Murat, F., Perrier, X., Ruiz, M., Scalabrin, S., Terol, J., et al. 2014. Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nat. Biotech. 32, 656-662.

Wu, H.-J., Zhang, Z., Wang, J.-Y., Oh, D.-H., Dassanayake, M., Liu, B., Huang, Q., Sun, H.-X., Xia, R., Wu, Y., et al. 2012. Insights into salt tolerance from the genome of Thellungiella salsuginea. P. Natl. Acad. Sci. USA. 109, 12219-12224.

Wu, J., Wang, Z., Shi, Z., Zhang, S., Ming, R., Zhu, S., Khan, M.A., Tao, S., Korban, S.S., Wang, H., et al. 2013. The genome of the pear (Pyrus bretschneideri Rehd.). Genome Res. 23, 396-408.

Wu, J., Gu, Y.Q., Hu, Y., You, F.M., Dandekar, A.M., Leslie, C.A., Aradhya, M., Dvorak, J., Luo, M.C. 2012. Characterizing the walnut genome through analyses of BAC end sequences. Plant Mol. Biol. 78, 95-107.

Wurm, Y., Wang, J., Riba-Grognuz, O., Corona, M., Nygaard, S., Hunt, B., Ingram, K., Falquet, L., Nipitwattanaphon, M., Gotzek, D., et al. 2011. The genome of the fire ant Solenopsis invicta. P. Natl. Acad. Sci. USA. 108, 5679-5684.

Xia, Q., Zhou, Z., Lu, C., Cheng, D., Dai, F., Li, B., Zhao, P., Zha, X., Cheng, T., Chai, C., et al. 2004. A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science. 306, 1937-1940.

Xiao, H., Yong-Jie, Z., Guo-Hua, X., Peng, Z., Yong-Liang, X., Xing-Yu, Z., St Leger, R., Xing-Zhong, L., Cheng-Shu, W. 2013. Genome survey uncovers the secrets of sex and lifestyle in caterpillar fungus. Chinese Sci. Bull. 58, 2846-2854.

Xu, H.-E., Zhang, H.-H., Xia, T., Han, M.-J., Shen, Y.-H., Zhang, Z. 2013. BmTEdb: a collective database of transposable elements in the silkworm genome. Database. 2013, bat055.

Xu, J., Saunders, C.W., Hu, P., Grant, R.A., Boekhout, T., Kuramae, E.E., Kronstad, J.W., Deangelis, Y.M., Reeder, N.L., Johnstone, K.R., et al. 2007. Dandruff-associated Malassezia genomes reveal convergent and divergent virulence traits shared with plant and human fungal pathogens. P. Natl. Acad. Sci. USA. 104, 18730-18735.

Xu, P., Widmer, G., Wang, Y., Ozaki, L.S., Alves, J.M., Serrano, M.G., Puiu, D., Manque, P., Akiyoshi, D., Mackey, A.J., et al. 2004. The genome of Cryptosporidium hominis. Nature. 431, 1107-1112.

Xu, P., Zhang, X., Wang, X., Li, J., Liu, G., Kuang, Y., Xu, J., Zheng, X., Ren, L., Wang, G., et al. 2014. Genome sequence and genetic diversity of the common carp, Cyprinus carpio. Nat. Genet. 46, 1212-1219.

Xu, Q., Chen, L.-L., Ruan, X., Chen, D., Zhu, A., Chen, C., Bertrand, D., Jiao, W.-B., Hao, B.-H., Lyon, M.P., et al. 2013. The draft genome of sweet orange (Citrus sinensis). Nat. Genet. 45, 59-66.

Xue, M., Yang, J., Li, Z., Hu, S., Yao, N., Dean, R.A., Zhao, W., Shen, M., Zhang, H., Li, C., et al. 2012. Comparative analysis of the genomes of two field isolates of the rice blast fungus Magnaporthe oryzae. PLoS Genet. 8, e1002869.

Yagi, M., Kosugi, S., Hirakawa, H., Ohmiya, A., Tanase, K., Harada, T., Kishimoto, K., Nakayama, M., Ichimura, K., Onozaki, T., et al. 2013. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.). DNA Res. 21, 231-241.

Yan, G., Zhang, G., Fang, X., Zhang, Y., Li, C., Ling, F., Cooper, D.N., Li, Q., Li, Y., van Gool, A.J., et al. 2011. Genome sequencing and comparison of two nonhuman primate animal models, the cynomolgus and Chinese rhesus macaques. Nat. Biotech. 29, 1019-1023.

Yang, D., Pomraning, K., Kopchinskiy, A., Karimi Aghcheh, R., Atanasova, L., Chenthamara, K., Baker, S.E., Zhang, R., Shen, Q., et al. 2015. Genome sequence and annotation of Trichoderma parareesei, the ancestor of the cellulase producer Trichoderma reesei. Genome Announce. 3, e00885-00815.

Yang, E., Chow, W.-N., Wang, G., Woo, P.C.Y., Lau, S.K.P., Yuen, K.-Y., Lin, X., Cai, J.J. 2014. Signature gene expression reveals novel clues to the molecular mechanisms of dimorphic transition in Penicillium marneffei. PLoS Genet. 10, e1004662.

Yang, H., Wang, Y., Zhang, Z., Yang, R., Zhu, D. 2014. Whole-genome shotgun assembly and analysis of the genome of Shiraia sp. strain Slf14, a novel endophytic fungus producing huperzine A and hypocrellin A. Genome Announce. 2, e00011-00014.

Yang, J., Wang, L., Ji, X., Feng, Y., Li, X., Zou, C., Xu, J., Ren, Y., Mi, Q., Wu, J., et al. 2011. Genomic and proteomic analyses of the fungus Arthrobotrys oligospora provide insights into nematode-trap formation. PLoS Pathog. 7, e1002179.

Yang, K., Tian, Z., Chen, C., Luo, L., Zhao, B., Wang, Z., Yu, L., Li, Y., Sun, Y., Li, W., et al. 2015. Genome sequencing of adzuki bean (Vigna angularis) provides insight into high starch and low fat accumulation and domestication. P. Natl. Acad. Sci. USA. 112, 13213-13218.

Yang, R., Jarvis, D.E., Chen, H., Beilstein, M.A., Grimwood, J., Jenkins, J., Shu, S., Prochnik, S., Xin, M., Ma, C., et al. 2013. The reference genome of the halophytic plant Eutrema salsugineum. Front. Plant Sci. 4, 46.

Yang, Y., Xiong, J., Zhou, Z., Huo, F., Miao, W., Ran, C., Liu, Y., Zhang, J., Feng, J., Wang, M., Wang, M., Wang, L., Yao, B. 2014. The genome of the myxosporean Thelohanellus kitauei shows adaptations to nutrient acquisition within its fish host. Genome Biol. Evol. 6, 3182-3198.

Yang, Y., Zhao, H., Barrero, R.A., Zhang, B., Sun, G., Wilson, I.W., Xie, F., Walker, K.D., Parks, J.W., Bruce, R., et al. 2014. Genome sequencing and analysis of the paclitaxel-producing endophytic fungus Penicillium aurantiogriseum NRRL 62431. BMC Genomics. 15, 69.

Yap, H.-Y.Y., Chooi, Y.-H., Firdaus-Raih, M., Fung, S.-Y., Ng, S.-T., Tan, C.-S., Tan, N.-H. 2014. The genome of the Tiger Milk mushroom, Lignosus rhinocerotis, provides insights into the genetic basis of its medicinal properties. BMC Genomics. 15, 635.

Yim, H.-S., Cho, Y.S., Guang, X., Kang, S.G., Jeong, J.-Y., Cha, S.-S., Oh, H.-M., Lee, J.-H., Yang, E.C., Kwon, K.K., et al. 2013. Minke whale genome and aquatic adaptation in cetaceans. Nat. Genet. 46, 88-92.

You, M., Yue, Z., He, W., Yang, X., Yang, G., Xie, M., Zhan, D., Baxter, S.W., Vasseur, L., Gurr, G.M., et al. 2013. A heterozygous moth genome provides insights into herbivory and detoxification. Nat. Genet. 45, 220-225.

Young, N.D., Debellé, F., Oldroyd, G.E.D., Geurts, R., Cannon, S.B., Udvardi, M.K., Benedito, V.A., Mayer, K.F.X., Gouzy, J., Schoof, H., et al. 2011. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature. 480, 520-524.

Youssar, L., Grüning, B.A., Erxleben, A., Günther, S., Hüttel, W. 2012. Genome sequence of the fungus Glarea lozoyensis, the first genome sequence of a species from the Helotiaceae family. Eukaryot. Cell. 11, 250.

Youssef, N.H., Couger, M.B., Struchtemeyer, C.G., Liggenstoffer, A.S., Prade, R.A., Najar, F.Z., Atiyeh, H.K., Wilkins, M.R., Elshahed, M.S. 2013. Genome of the anaerobic fungus Orpinomyces sp. C1A reveals the unique evolutionary history of a remarkable plant biomass degrader. Appl. Environ. Microb. 79, 4620-4634.

Yu, J., Hu, S., Wang, J., Wong, G.K.-S., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 296, 79-92.

Yu, J., Zhao, M., Wang, X., Tong, C., Huang, S., Tehrim, S., Liu, Y., Hua, W., Liu, S. 2013. Bolbase: a comprehensive genomics database for Brassica oleracea. BMC Genomics. 14, 664.

Zajc, J., Liu, Y., Dai, W., Yang, Z., Hu, J., Gostincar, C., Gunde-Cimerman, N. 2013. Genome and transcriptome sequencing of the halophilic fungus Wallemia ichthyophaga, haloadaptations present and absent. BMC Genomics. 14, 617.

Zhan, X., Pan, S., Wang, J., Dixon, A., He, J., Muller, M.G., Ni, P., Hu, L., Liu, Y., Hou, H., et al. 2013. Peregrine and saker falcon genome sequences provide insights into evolution of a predatory lifestyle. Nat. Genet. 45, 563-566.

Zhang, G., Cowled, C., Shi, Z., Huang, Z., Bishop-Lilly, K.A., Fang, X., Wynne, J.W., Xiong, Z., Baker, M.L., Zhao, W., et al. 2013. Comparative analysis of bat genomes provides insight into the evolution of flight and immunity. Science. 339, 456-460.

Zhang, G., Fang, X., Guo, X., Li, L., Luo, R., Xu, F., Yang, P., Zhang, L., Wang, X., Qi, H., et al. 2012. The oyster genome reveals stress adaptation and complexity of shell formation. Nature. 490, 49-54.

Zhang, L., Li, L., Xu, F., Qi, H., Wang, X., Que, H., Zhang, G. 2013. Fosmid library construction and end sequences analysis of the Pacific oyster, Crassostrea gigas. Molluscan Res. 33, 65-73.

Zhang, Q., Chen, W., Sun, L., Zhao, F., Huang, B., Yang, W., Tao, Y., Wang, J., Yuan, Z., Fan, G., et al. 2012. The genome of Prunus mume. Nat. Comm. 3, 1318.

Zhang, T., Hu, Y., Jiang, W., Fang, L., Guan, X., Chen, J., Zhang, J., Saski, C.A., Scheffler, B.E., Stelly, D.M., et al. 2015. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat. Biotech. 33, 531-537.

Zhang, X., Wessler, S.R. 2004. Genome-wide comparative analysis of the transposable elements in the related species Arabidopsis thaliana and Brassica oleracea. P. Natl. Acad. Sci. USA. 101, 5589-5594.

Zhang, Y., Zhang, K., Fang, A., Han, Y., Yang, J., Xue, M., Bao, J., Hu, D., Zhou, B., Sun, X., et al. 2014. Specific adaptation of Ustilaginoidea virens in occupying host florets revealed by comparative and functional genomics. Nat. Comm. 5, 3849.

Zhao, C., Zhang, T., Zhang, X., Hu, S., Xiang, J. 2012. Sequencing and analysis of four BAC clones containing innate immune genes from the Zhikong scallop (Chlamys farreri). Gene. 502, 9-15.

Zhao, F., Qi, J., Schuster, S.C. 2009. Tracking the past: interspersed repeats in an extinct Afrotherian mammal, Mammuthus primigenius. Genome Res. 19, 1384-1392.

Zheng, H., Zhang, W., Zhang, L., Zhang, Z., Li, J., Lu, G., Zhu, Y., Wang, Y., Huang, Y., Liu, J., et al. 2013. The genome of the hydatid tapeworm Echinococcus granulosus. Nat. Genet. 45, 1168-1175.

Zheng, W., Huang, L., Huang, J., Wang, X., Chen, X., Zhao, J., Guo, J., Zhuang, H., Qiu, C., Liu, J., et al. 2013. High genome heterozygosity and endemic genetic recombination in the wheat stripe rust fungus. Nat. Comm. 4, 2673.

Zhou, F., Xu, Y. 2009. RepPop: a database for repetitive elements in Populus trichocarpa. BMC Genomics. 10, 14.

Zhou, P., Zhang, G., Chen, S., Jiang, Z., Tang, Y., Henrissat, B., Yan, Q., Yang, S., Chen, C.-F., Zhang, B., et al. 2014. Genome sequence and transcriptome analyses of the thermophilic zygomycete fungus Rhizomucor miehei. BMC Genomics. 15, 294.

Zhou, W., Hu, Y., Sui, Z., Fu, F., Wang, J., Chang, L., Guo, W., Li, B. 2013. Genome survey sequencing and genetic background characterization of Gracilariopsis lemaneiformis (Rhodophyta) based on next-generation sequencing. PLoS One. 8, e69909.

Zhou, X., Sun, F., Xu, S., Fan, G., Zhu, K., Liu, X., Chen, Y., Shi, C., Yang, Y., Huang, Z., et al. 2013. Baiji genomes reveal low genetic variability and new insights into secondary aquatic adaptations. Nat. Comm. 4, 2708.

Zhu, H., Blackmon, B.P., Sasinowski, M., Dean, R.A. 1999. Physical map and organization of chromosome 7 in the rice blast fungus, Magnaporthe grisea. Genome Res. 9, 739-750.

Zimin, A.V., Delcher, A.L., Florea, L., Kelley, D.R., Schatz, M.C., Puiu, D., Hanrahan, F., Pertea, G., Van Tassell, C.P., Sonstegard, T.S., et al. 2009. A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol. 10, R42.

Zuccaro, A., Basiewicz, M., Zurawska, M., Biedenkopf, D., Kogel, K.H. 2009. Karyotype analysis, genome organization, and stable genetic transformation of the root colonizing fungus Piriformospora indica. Fungal Genet. Biol. 46, 543-550.

Zuccaro, A., Lahrmann, U., Güldener, U., Langen, G., Pfiffi, S., Biedenkopf, D., Wong, P., Samans, B., Grimm, C., Basiewicz, M., et al. 2011. Endophytic life strategies decoded by genome and transcriptome analyses of the mutualistic root symbiont Piriformospora indica. PLoS Pathog. 7, e1002290.

Zuccolo, A., Bowers, J.E., Estill, J.C., Xiong, Z., Luo, M., Sebastian, A., Goicoechea, J.L., Collura, K., Yu, Y., Jiao, Y., et al. 2011. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure. Genome Biol. 12, R48.
