Evolutionary and Functional Impacts of Short Interspersed Nuclear Elements (SINEs) Revealed via Genomic Assessment of Felid CanSINEs

By Kathryn B. Walters-Conte

B. S., May 2000, University of Maryland, College Park M. S., May 2002, The George Washington University

A Dissertation Submitted to

The Faculty of

Columbian College of Arts and Sciences of The George Washington University

in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

May 15th, 2011

Dissertation Directed By

Diana L.E. Johnson Associate Professor of Biology

Jill Pecon-Slattery Staff Scientist, National Cancer Institute

The Columbian College of Arts and Sciences of The George Washington University certifies that Kathryn Walters-Conte has passed the Final Examination for the degree of Doctor of Philosophy as of March 24th, 2011. This is the final and approved form of the dissertation.

Evolutionary and Functional Impacts of Short Interspersed Nuclear Elements (SINEs) Revealed via Genomic Assessment of Felid CanSINEs

Kathryn Walters-Conte

Dissertation Research Committee:

Diana L.E. Johnson, Associate Professor of Biology, Dissertation Co-Director

Jill Pecon-Slattery, Staff Scientist, National Cancer Institute, Dissertation Co-Director

Diana Lipscomb, Ronald Weintraub Chair and Professor, Committee Member

Marc W. Allard, Research Microbiologist, U.S. Food and Drug Administration, Committee Member

Acknowledgements

I would like to first thank my advisor and collaborator, Dr. Jill Pecon-Slattery, at the National Cancer Institute of the National Institutes of Health, for generously permitting me to join her research group. Without her mentorship this dissertation would never have been possible. I would also like to express gratitude to my advisor at the George

Washington University, Dr. Diana L. E. Johnson, who provided invaluable moral and scientific support over the years. I have had a dynamic committee who deserve my thanks:

Dr. Marc Allard, my first advisor, for bringing me to the University and introducing me to the world of phylogenetics; Dr. Warren Johnson for teaching me all about ecology and

Dr. Diana Lipscomb, for ensuring that I could see this endeavor through to the end.

I would also like to thank Carrie McCracken, Nicole Crumpler, Clare Holleley,

Carlos Driscoll, Victor David, Joan Pontius, Sher Hendrickson and everyone else in the

Laboratory of Genomic Diversity and the NCI “Core” Lab at the National Cancer Institute who gave invaluable scientific support, provided world class resources and stirred insightful discussions. I also extend a special thanks to Dr. Stephen O’Brien for including me in his extended family of conservation geneticists.

I would not have been able to complete my doctorate without the support of my friends and family, especially: my father, Edward Walters, a pioneer stay-at-home Dad who taught me to read when I was two years old; my mother, Wanda Walters, who introduced me to science with a microscope at the age of five; and my brother, Ryan, who continues to defy the odds. Lastly, I thank my husband, Matthew Conte, for his never-ending patience.

Abstract of Dissertation

Evolutionary and Functional Impacts of Short Interspersed Elements (SINEs) Revealed via Genomic Assessment of Felid CanSINEs

Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element that comprise over 10% of mammalian genomes, and play a vital role in genome structure and gene function. SINEs have also been promoted as valuable evolutionary markers at the population, species, and familial strata. While SINEs are prolific throughout

Mammalia, historically SINE-based inquiry has primarily occurred amongst primates, rodents and cetaceans. Recent developments in genomic resources enable high-throughput investigations of SINEs within a variety of mammalian lineages.

Publication of the domestic dog (Canis familiaris) and domestic cat (Felis catus) whole genome sequences in 2005 and 2007, respectively, provides reference sequences for comparative investigations of the Carnivora-specific SINE family, CanSINEs. This dissertation explores the evolutionary implications of CanSINEs within the Felidae (cat) family, a charismatic carnivoran clade of 38 species that includes biomedical model organisms, companion animals, and ecologically imperiled species. Capitalizing on the relative conservation of chromosome arrangements among Felidae species, comparative genomics methods were used to find CanSINE insertions that were initially located in domestic cat and exotic species across the entire Felidae family and in other Feliform suborder representatives.

Comparative analyses of CanSINEs elucidate many aspects of SINE biology, including: characterization of two Feliform-specific SINE subfamilies, the phylogenetic consistency of CanSINE insertion loci, the evolutionary distribution of loci in divergent species following rapid speciation, a non-random retrotransposition process wherein new inserts occur at specified DNA motifs, and the utility of both presence/absence data and sequence data derived from SINE inserts for de novo phylogenetic reconstruction. The methods employed in this dissertation allow prior assumptions regarding the functional and evolutionary activity of mammalian SINEs to be evaluated empirically, ultimately providing a framework for the study of transposable elements in all .

Table of Contents

Acknowledgments...... iii

Abstract of Dissertation...... iv

List of Figures...... vii

List of Tables...... ix

General Introduction...... x

Chapter 1: Carnivores specific SINEs (Can-SINEs): Distribution, Evolution and Genomic Impact (written for publication in Journal of Heredity)...... 1

Chapter 2: Comparative SINE analyses reveal feliform specific CanSINE subfamilies and the complexities of Felidae evolution (written for publication in Genome Research)……28

Chapter 3: Phylogeny and Rapid Speciation of the Felidae Illuminated by CanSINE Insertion Analysis (written for publication in Molecular Phylogenetics and Evolution)...... 64

Chapter 4: Computational Comparative Analyses of CanSINEs in Felidae (written for publication in BMC Biology)………… ...... 110

General Conclusions...... 154


List of Figures

Figure 1.1…………………………………………………………………………………16

Figure 1.2…………………………………………………………………………………18

Figure 2.1…………………………………………………………………………… ……44

Figure 2.2…………….…………………………………………………………………...46

Figure 2.3…………………………………………………………………………………48

Figure 2.4…………………………………………………………………………………49

Figure 2.5…………………………………………………………………………………50

Figure 2.s1……………………………………..………………………………………….62

Figure 2.s2……………………………………..………………………………………….62

Figure 3.1…………………………………………………………………………… ……84

Figure 3.2…………………………………………………………………………… ……86

Figure 3.3…………………………………………………………………………… ……88

Figure 3.4…………………………………………………………………………… ……91

Figure 3.5…………………………………………………………………………… ……93

Figure 3.6…………………………………………………………………………… ……95

Figure 3.s1……………………………………………………………………………….104

Figure 3.s2……………………………………………………………………………….105

Figure 3.s3……………………………………………………………………………….107

Figure 4.1………………………………………………………………………………..126

Figure 4.2………………………………………………………………………………..127

Figure 4.3………………………………………………………………………………..128


Figure 4.4………………………………………………………………………………..129

Figure 4.5………………………………………………………………………………..131

Figure 4.6………………………………………………………………………………..133

Figure 4.s1………………………………………………………………………………142

Figure 4.s2………………………………………………………………………………147

Figure 4.s3………………………………………………………………………………148

Figure 4.s4………………………………………………………………………………149

Figure 4.s5………………………………………………………………………………150

Figure 4.s6………………………………………………………………………………151

Figure 4.s7………………………………………………………………………………152

Figure 4.s8………………………………………………………………………………153


List of Tables

Table 1.1………………………………………………………………………………….14

Table 2.1………………………………………………………………………………….43

Table 2.s1…………………………………………………………………………………59

Table 2.s2…………………………………………………………………………………60

Table 2.s3…………………………………………………………………………………61

Table 3.1………………………………………………………………………………….81

Table 3.s1………………………………………………………………………….. …...102

Table 3.s2………………………………………………………………………………..103

Table 4.1…………………………………………………………………………………125

Table 4.2…………………………………………………………………………………125

Table 4.s1………………………………………………………………………………..141


GENERAL INTRODUCTION

Central to our understanding of mammalian diversity is an assessment of how

genome composition and structure have evolved throughout time. Recent publication of

multiple genome-sequencing projects allows ascertainment of the abundant repetitive DNA

sequences known as retrotransposable elements, which can comprise up to 42% of the

mammalian genome. Long interspersed nuclear elements (LINEs) and short interspersed

nuclear elements (SINEs) are classes of retrotransposable elements that play significant

roles in the size and repetitive nature of mammalian genomes. However, little is known

about the processes regulating the evolution, structure and function of these elements

within host genomes. SINEs are of particular interest for evolutionary analyses due to size

(100-500 bp), abundance (up to 1x10^6 genomic copies) and insertional polymorphism (up to 7% are heterozygous in the dog genome). Comparative genomic approaches allow evaluation of these sequences with regard to their contribution to genome architecture and utility in evolutionary studies across a wide range of organisms.

The Felidae family, which is in the Feliformia suborder of the order Carnivora, is a study system in which the evolution and functional impacts of SINE elements may be examined. Two whole genome shotgun sequence assemblies (1.9X and 2.8X) were recently obtained for the domestic cat (Felis catus) (Pontius et al. 2007) in addition to

highly resolved Felidae and Carnivora phylogenies derived from multiple nuclear and

mitochondrial DNA segments (Eizirik et al. 2010; Johnson et al. 2006). Despite these

resources, a comprehensive assessment of SINEs within the Felidae or any other Feliformia

clade has yet to be completed. Thus, this dissertation characterizes the Carnivora specific

SINE family (CanSINEs) within the Felidae lineage.

Novel methods to localize conserved and polymorphic SINE insertion sites in exotic

Felidae species were employed, resulting in the discovery of lineage specific SINE subfamilies and processes of targeted SINE integration. In addition, SINE insertion sites were evaluated as homologous sources of sequence variation for phylogenetic reconstruction and as synapomorphic markers for cladistic evolutionary analyses. An understanding of felid CanSINEs is essential to the continued development of the Felidae as a biomedical and ecological model system. Further, the insertion sites characterized in this study will potentially aid conservation strategies by providing molecular markers for rapid identification of exotic species.

Study System

The Carnivora are widely distributed and exhibit extreme morphological diversity

(Gittleman and Purvis 1998). While not all species are obligate carnivores, all members of the order possess four carnassial teeth in the front of the jaw that are adaptations for meat consumption (Greaves 1985).

There are 286 extant Carnivora species divided into two suborders: Caniformia and Feliformia. Caniformia, the “dog-like” suborder, is composed of seven extant families: the land-dwelling Canidae (, wolves, ), (, ), (, ) and Ursidae (bears) along with the marine Otariidae (fur seals, sea lions), Odobenidae (walruses) and Phocidae (true seals), collectively known as pinnipeds.

The “cat-like” suborder, Feliformia, was previously divided into 4 extant families: Felidae (cats), Herpestidae (mongooses), Hyaenidae (hyenas) and Viverridae (civets). However, the placement of several species including the Asian linsangs and the Malagasy carnivores was controversial (Gaubert and Veron 2003; Yoder et al. 2002). A recent large-scale molecular phylogeny revised the Feliformia suborder by splitting Viverridae into four familial groups: Nandiniidae (African palm civet), Viverridae (other civets), Hyaenidae (hyenas), Eupleridae (Malagasy carnivores), and Prionodontidae (Asian linsangs) (Eizirik et al. 2010). The divergence between the Caniformia and Feliformia

suborders occurred ~59 million years ago (MYA) followed by early divergences of the

modern feliforms ~44.5 MYA, and radiation of the modern Felidae ~10 MYA (Eizirik et

al. 2010).

Historically, the establishment of a reliable Felidae phylogeny was difficult owing

to several periods of rapid speciation and conflicting morphological characteristics.

However, in 2006 an extensive molecular phylogenetic analysis based on over 23,000 base

pairs of nuclear and mitochondrial sequence described eight major Felidae lineages

(Johnson et al. 2006). These lineages diverged in a step-wise manner beginning with the Panthera lineage (Pantherinae subfamily) ~10 MYA, followed by the bay cat lineage ~9.5 MYA, the caracal lineage ~8.5 MYA, the ocelot lineage ~8 MYA, the lynx lineage ~7 MYA, the puma lineage ~6.7 MYA, and finally the split of the Asian leopard cat and domestic cat lineages ~6.2 MYA.

The Panthera lineage, also known as the “big cats”, originated in Asia where the

clouded leopard (Neofelis nebulosa) diverged from the progenitor of the Panthera genus.

Over time Panthera spread throughout the world following prey across reappearing land

bridges. Of the five extant Panthera genus species, tigers (P. tigris) and snow leopards (P. uncia) are endemic to Asia; jaguars (P. onca) are endemic to the Americas; lions (P. leo) are endemic to Africa and the Gir forest of India; and leopards (P. pardus) are endemic to both Africa and Asia (O'Brien and Johnson 2005).

The bay cat lineage consists of three Asian species: marbled cat (Pardofelis marmorata), Asian golden cat (P. temminckii) and the bay cat (P. badia). The caracal lineage was the first of the Felinae (non-Pantherinae species) to become established outside of Asia and includes three extant African and Asian species: serval (Profelis serval), African golden cat (P. aurata) and caracal (P. caracal) (O’Brien et al. 2008).

Approximately 8 MYA a land bridge led to a migration of cats from Asia to the Americas,

resulting in a South American cat lineage known as the ocelot lineage. Extant species of

this lineage include the namesake ocelot (Leopardus pardalis), margay (L. wiedii), Andean mountain cat (L. jacobita), pampas cat (L. colocolo), Geoffroy’s cat (L. geoffroyi), kodkod (L. guigna), and tigrina (L. tigrina). Currently all ocelot lineage species are imperiled and

the Andean mountain cat is listed as endangered on the IUCN Red List (www.iucn.org)

(Nowell and Jackson 1996).

The lynx lineage is comprised of four extant species. Migrations of lynx

progenitors between Asia, North America and Europe have resulted in the modern day

bobcat (Lynx rufus), Canadian lynx (L. canadensis), Eurasian lynx (L. lynx) and the

Iberian lynx ( L. pardinus ) (www.iucn.org) (Johnson et al. 2004;

O'Brien and Johnson 2005).

The species of the puma lineage, cheetah (Acinonyx jubatus), puma (Puma concolor) and jaguarundi (P. yagouaroundi), are frequently the subjects of conservation studies. The African cheetah is infamous for its remarkable genetic uniformity, which stems from a near-extinction event ~12,000 years ago and results in high susceptibility to infectious disease (O'Brien and Johnson 2005). The Florida panther (P. concolor coryi) is also vulnerable to disease and infant mortality owing to a population bottleneck following overhunting and fragmented ranges that diminished the population. Recent introduction of P.

concolor from Texas into Florida along with efforts to minimize hunting and

vehicle accidents has resulted in a resurgence of Florida panthers (Johnson et al. 2010).

The Asian leopard cat lineage includes five species separated into two genera:

Prionailurus and Otocolobus. The Prionailurus genus is comprised of the Asian leopard cat (P. bengalensis), flat-headed cat (P. planiceps), rusty-spotted cat (P. rubiginosus) and fishing cat (P. viverrinus). The Otocolobus genus contains only the Pallas's cat (O.

manul ), which occupies a unique geographic niche in the barren, rocky and mountainous

regions of central Asia (Masuda et al. 1996) and has at times been placed as an early

divergence of the domestic cat lineage (Yu and Zhang 2005).

The domestic cat lineage is primarily comprised of the wild relatives of the

common house cat: the black-footed cat ( Felis nigripes ), the jungle cat ( F. chaus ), the sand

cat ( F. margarita ) and the wildcat ( F. silvestris ) species complex. The modern house cat

(F. catus ) was recently described as derived from the near eastern wildcat subspecies ( F.

silvestris libyca) (Driscoll et al. 2007). For the purposes of this dissertation, sequences derived from the cat whole genome sequence assembly or the commercially available cat DNA supplied by EMD4 Biosciences are designated F. catus. A sample from

the Scottish wildcat population is referred to as F. silvestris . F. bieti denotes an individual

belonging to the Chinese mountain cat subspecies and F. libyca denotes a wild individual

from the African wildcat subspecies.

Chapter Summaries

The aim of this dissertation is to provide a detailed evolutionary characterization of

CanSINEs within the Felidae family. The comparative genomic methods presented here incorporate computational analyses of the domestic cat whole genome sequence with molecular biology techniques.

Prior to applying experimental approaches, an understanding of previous discoveries and insights on SINEs in Carnivora was needed. Thus, the first chapter of the dissertation entitled “Carnivore SINEs (CanSINEs): Distribution, Evolution and Genomic

Advances” is a review that describes the discovery, characterization and distribution of

CanSINEs, and details modern approaches for comparative SINE analyses within

Carnivora.

In the second chapter entitled, “Comparative SINE analyses reveal Feliform specific CanSINE subfamilies and the complexities of Felidae evolution”, 31 insertion sites located during the initial cat genome annotation were examined at homologous loci across all Felidae and some Feliformia. As a result, 14 additional previously unknown CanSINE insertions were localized in several non-domestic cat species. The effect of rapid speciation on SINE insertion polymorphism is addressed, two Feliform specific SINE subfamilies are characterized, and a SINE excision is described.

In the third chapter entitled “Phylogeny and Rapid Speciation of the Felidae

Illuminated by CanSINE Insertion Analysis”, libraries of genomic DNA flanked by

CanSINEs from exotic Felidae species were compared to the cat whole genome sequence and potentially informative insertion sites were queried across Felidae. As a result 53 previously unknown and informative CanSINE loci were identified and evaluated as cladistic evolutionary markers.
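The use of insertion loci as cladistic markers can be illustrated with a minimal sketch. It assumes a simple presence (1) / absence (0) scoring of each locus in each species; the locus names, the species sample and every value in the matrix are invented for illustration and are not data from this dissertation. The helper `loci_supporting` is likewise hypothetical: it reports loci whose insertion is present in every member of a proposed clade and absent from all other sampled taxa.

```python
# Hypothetical presence (1) / absence (0) calls for CanSINE insertions.
# Locus names and all values are invented for illustration only.
MATRIX = {
    "locus_A": {"F_catus": 1, "F_silvestris": 1, "L_rufus": 0, "P_leo": 0, "P_tigris": 0},
    "locus_B": {"F_catus": 1, "F_silvestris": 1, "L_rufus": 1, "P_leo": 1, "P_tigris": 1},
    "locus_C": {"F_catus": 0, "F_silvestris": 0, "L_rufus": 0, "P_leo": 1, "P_tigris": 1},
    "locus_D": {"F_catus": 1, "F_silvestris": 0, "L_rufus": 0, "P_leo": 1, "P_tigris": 0},
}


def loci_supporting(clade, matrix):
    """Return loci whose insertion is shared by exactly the taxa in `clade`."""
    clade = set(clade)
    supporting = []
    for locus, calls in matrix.items():
        carriers = {taxon for taxon, state in calls.items() if state == 1}
        if carriers == clade:
            supporting.append(locus)
    return supporting
```

In this toy matrix, `loci_supporting({"P_leo", "P_tigris"}, MATRIX)` returns `["locus_C"]`, while `locus_B` (present in every taxon) and `locus_D` (a pattern that matches no proposed clade) support neither grouping. Real analyses would also need to handle missing data and polymorphic loci, which this sketch ignores.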

The CanSINE insertion sites discussed in chapters 2 and 3 were incorporated into computational analyses for the final chapter, “Computational Comparative Analyses of CanSINEs in Felidae”. The subfamilies introduced in chapter 2 are characterized in further detail. In addition, the target site duplications generated by SINE retrotransposition are evaluated and mechanisms for CanSINE genomic integration are proposed. Finally, conserved CanSINE loci are used as orthologous sequence data for molecular tree-building algorithms. The resultant phylogenies are resolved at nearly every node and are largely congruent with previous hypotheses of Felidae evolution.

Collectively, the research described in this dissertation has several broad applications. This study will 1) provide effective techniques for localizing SINEs in species lacking whole genome sequences, 2) provide further understanding of the non-random insertion dynamics of mammalian SINEs, 3) show how SINEs evolve and are maintained in successive lineages, 4) verify hypotheses of phylogenetic relationships within Felidae, and 5) provide molecular markers that can be applied to the identification of threatened and .


Literature Cited

Driscoll, C., M. Menotti-Raymond, A.L. Roca, K. Hupe, W.E. Johnson, E. Geffen, E.H.

Harley, M. Delibes, D. Pontier, A.C. Kitchener, N. Yamaguchi, S.J. O'Brien, and

D.W. MacDonald. 2007. The near eastern origin of cat domestication. Science 317:

519-523.

Eizirik, E., W.J. Murphy, K.P. Koepfli, W.E. Johnson, J. Dragoo, R.K. Wayne, and S.J.

O'Brien. 2010. Pattern and timing of diversification of the mammalian order

Carnivora inferred from multiple nuclear gene sequences. Molecular Phylogenetics

and Evolution 56: 49-63.

Gaubert, P. and G. Veron. 2003. Exhaustive sample set among Viverridae reveals the

sister-group of felids: the linsangs as a case of extreme morphological convergence

within Feliformia. Proc Biol Sci 270: 2523–2530.

Gittleman, J.L. and A. Purvis. 1998. Body size and species-richness in carnivores and primates. Proceedings of the Royal Society: Biological Sciences 265: 113-119.

Greaves, W.S. 1985. The generalized carnivore jaw. Zoological Journal of the Linnean

Society 85: 267-274.

Johnson, W.E., E. Eizirik, J. Pecon-Slattery, W.J. Murphy, A. Antunes, E. Teeling, and S.J.

O'Brien. 2006. The late Miocene radiation of modern Felidae: a genetic assessment.

Science 311: 73-77.

Johnson, W.E., J.A. Godoy, F. Palomares, M. Delibes, M. Fernandes, E. Revilla, and S.J. O'Brien. 2004. Phylogenetic and phylogeographic analysis of Iberian lynx

populations. Journal of Heredity 95: 19-28.


Johnson, W.E., D.R. Onorato, M.E. Roelke, E.D. Land, M.W. Cunningham, R.C. Belden,

R. McBride, D. Jansen, M. Lotz, D. Shindle, J. Howard, D.E. Wildt, L.M. Penfold,

J.A. Hostetler, M.K. Oli, and S.J. O'Brien. 2010. Genetic restoration of the Florida

panther. Science 329: 1641-1645.

Masuda, R., J.V. Lopez, J. Pecon-Slattery, N. Yuhki, and S.J. O'Brien. 1996. Molecular

phylogeny of mitochondrial cytochrome b and 12S rRNA sequences in the Felidae:

ocelot and domestic cat lineages. Molecular Phylogenetics and Evolution 6: 351-365.

Nowell, K. and P. Jackson. 1996. WILD CATS: Status survey and conservation action

plan. IUCN, Gland, Switzerland .

O'Brien, S.J. and W.E. Johnson. 2005. Big cat genomics. Annu. Rev. Genomics Hum.

Genet. 6: 407-429.

O’Brien, S.J., W. Johnson, C. Driscoll, J. Pontius, J. Pecon-Slattery, and M. Menotti-

Raymond. 2008. State of cat genomics. Trends in Genetics 24: 268-279.

Pontius, J., J. Mullikin, D. Smith, K. Lindblad-Toh, S. Gnerre, M. Clamp, J. Chang, R.

Stephens, B. Neelam, N. Volfovsky, A. Schaffer, R. Agarwala, K. Narfstrom, W.

Murphy, U. Giger, A. Roca, A. Antunes, M. Menotti-Raymond, N. Yuhki, J.

Pecon-Slattery, and W. Johnson. 2007. Initial sequence and comparative analysis of

the cat genome. Genome Research 17: 1675-1689.

Yoder, A.D., M. Burns, S. Zehr, T. Delefosse, G. Veron, S. Goodman, and J. Flynn. 2002.

Single origin of Malagasy Carnivora from an African ancestor. Nature 421: 734-

737.


Yu, L. and Y.-p. Zhang. 2005. Phylogenetic studies of pantherine cats (Felidae) based on

multiple genes, with novel application of nuclear ß-fibrinogen intron 7 to

carnivores. Molecular Phylogenetics and Evolution 35: 483-495.


CHAPTER ONE: CARNIVORES SPECIFIC SINES (CAN-SINES):

DISTRIBUTION, EVOLUTION AND GENOMIC IMPACT

Abstract

Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, called CanSINEs, has aided comparative genomic studies by providing rare genomic changes and neutrally evolving sequence variants that are often needed to resolve phylogenetic relationships within rapidly diverged lineages. In addition, CanSINEs constitute a significant source of genotypic and phenotypic diversity within Carnivora. Publication of the whole genome sequences of domestic dog, domestic cat and giant panda serves as a valuable resource for comparative genomic inferences gleaned from CanSINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of CanSINE motifs as well as their composition, distribution and effect on genome function. As the contribution of non-coding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics.


Introduction

Short interspersed nuclear elements (SINEs) are repetitive genomic sequences,

members of class 1 transposable elements (retrotransposons), that are present in all

eukaryotic organisms (Wicker, Sabot et al. 2007), including monotreme, marsupial and

eutherian mammals, (Nishihara, Hasegawa et al. 2006; Gu, Ray et al. 2007; Munemasa,

Nikaido et al. 2008). Characterized by unique features in structure, proliferation and

genome distribution, SINEs are chimeras of transcribed RNA genes and simple repeats that

proliferate via reverse-transcription using the enzymatic machinery of autonomous

elements, which recognize internal polymerase III promoter sequences (Collier and

Largaespada 2007). Approximately 40 tRNA, 7SL RNA and 5S rRNA derived SINE

families have been described in mammals thus far, many of which are present in more than

10^4 copies (Miyamoto 1999; Gu, Ray et al. 2007). As a source of insertional mutagenesis,

SINEs have been linked to unequal recombination events (Callinan, Wang et al. 2005) and genetic diseases (Deininger and Batzer 1999) as well as proven to be informative evolutionary markers across genomes. With the publication of the whole genome sequences from domestic dog ( Canis familiaris ) (Lindblad-Toh, Wade et al. 2005),

domestic cat ( Felis catus ) (Pontius, Mullikin et al. 2007) and most recently giant panda

(Ailuropoda melanoleuca) (Li, Fan et al. 2010) new insights are possible on the family of

SINEs unique to the Carnivora order termed CanSINEs. Here we review the structural,

functional and evolutionary impact of CanSINEs on carnivore genomes.


Discovery and Characterization of CanSINEs

CanSINEs were first described in the early 1990s when an interspersed repetitive element was discovered on the X-chromosome of the American mink (Mustela vison) that shared 55% sequence similarity with tRNA-lysine derived rodent B2 SINEs (Lawrence, McDonnell et al. 1985; Lavrentieva, Rivkin et al. 1991). Subsequent studies of other caniform suborder species including harbour seal (Phoca vitulina concolor), dog (C. familiaris), wolf (C. lupus), coyote (C. latrans) and mink (M. vison) affirmed the presence of these sequences in high copy number (Coltman and Wright 1994). Initially not observed in the feliform suborder representative, F. catus, this newly characterized SINE family appeared caniform specific and therefore was named CanSINE (Coltman and Wright 1994). However, subsequent hybridization studies with F. catus (van der Vlugt and Lenstra 1995; Vassetzky and Kramerov 2002) as well as a sequence study of three Y-chromosome genes among all members of the Felis genus, the bobcat (Lynx rufus), C. familiaris, and several species (Pecon-Slattery, Murphy et al. 2000) conclusively demonstrated that CanSINEs are ubiquitous across Carnivora.

CanSINEs are defined by a tRNA-related region, which includes A and B promoter boxes, followed by a (CT)n microsatellite, and terminate with a poly-A/T tail containing the polyadenylation signal AATAAA (Figure 1) (Lavrentieva, Rivkin et al. 1991). The tRNA-related region has primary sequence similarity of 70-79% to lysine-tRNAs (Vassetzky and

Kramerov 2002) and most inserts are between 150 and 300 base pairs (bp) in length, depending on the size of the (CT)n and poly A/T regions (Vassetzky and Kramerov 2002;

Pecon-Slattery, Wilkerson et al. 2004). Each CanSINE locus is flanked by target site duplications of 8-15 bp generated during retrotransposition. The number of SINE repeats within carnivore genomes, based on estimates from the C. familiaris genome, ranges from 1.1x10^6 (Lindblad-Toh, Wade et al. 2005) to 1.3x10^6 (Coltman and Wright 1994) copies.
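The 3' architecture described above, a (CT)n microsatellite followed by a poly-A/T tail carrying the AATAAA signal, lends itself to a simple pattern search. The following is a minimal sketch only: the toy sequence is invented (it is not a real CanSINE or tRNA sequence), the function name and the `min_ct_repeats` threshold are assumptions made for illustration, and the A and B promoter boxes are not modeled.

```python
import re

# Invented stand-in for the tRNA-related region; not a real CanSINE sequence.
TRNA_LIKE = "GGGGCTCCTGGGTGGCTCAGTCGGTTAAGCG"
# Toy insert: tRNA-like region + (CT)8 microsatellite + poly-A tail with AATAAA.
TOY_INSERT = TRNA_LIKE + "CT" * 8 + "AAAAT" + "A" * 10


def annotate_3prime_features(seq, min_ct_repeats=3):
    """Locate the longest (CT)n run and the first AATAAA signal after it.

    Returns None when no (CT)n run of at least `min_ct_repeats` units exists.
    """
    seq = seq.upper()
    pattern = r"(?:CT){%d,}" % min_ct_repeats
    runs = list(re.finditer(pattern, seq))
    if not runs:
        return None
    longest = max(runs, key=lambda m: m.end() - m.start())
    return {
        "ct_start": longest.start(),                       # 0-based start of (CT)n
        "ct_repeats": (longest.end() - longest.start()) // 2,
        "polyA_signal": seq.find("AATAAA", longest.end()),  # -1 if absent
        "insert_length": len(seq),
    }
```

Calling `annotate_3prime_features(TOY_INSERT)` reports a run of 8 CT units beginning immediately after the tRNA-like region, with the AATAAA signal located in the tail. Exact-match searching is a deliberate simplification; diverged inserts would need a more tolerant approach such as alignment against consensus sequences.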

CanSINE Amplification

Although the specific retrotransposition mechanism has yet to be fully described,

comparative evidence indicates that CanSINEs proliferate in a manner similar to other

mammalian SINEs (Gentles, Kohany et al. 2005). In general, SINE amplification occurs

through a ‘copy and paste’ mechanism known as target primed reverse transcription

wherein a few SINE master or source copies are transcribed in high copy number

(Cordaux, Hedges et al. 2004; Tong, Gan et al. 2010). Lacking functional coding regions,

SINE amplification and integration is dependent on enzymes derived from the host genome

and long interspersed nuclear elements (LINEs) (Kajikawa and Okada 2002). Proliferation

is initiated by recognition of the promoter boxes residing in the RNA-related region by

host-derived RNA-polymerase III and followed by cleavage of the genomic DNA at

TTAAAA motifs between the T’s and A’s, by LINE derived enzymes (Gentles, Kohany et

al. 2005; Cordaux and Batzer 2009) . This process allows the poly-A/T tail of the SINE

transcript to bind to the single stranded genomic DNA, creating a primer for LINE derived

reverse transcriptase to synthesize complementary SINE DNA (Christensen, Ye et al. 2006;

Kurzynska-Kokorniak, Jamburuthugoda et al. 2007). A subsequent nick in the opposite

genomic strand 6-10 bp downstream from the initial cut site results in a novel SINE insert

flanked by target site duplications (TSDs) (Jurka 1997; Christensen, Ye et al. 2006) (Figure 2).
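How staggered nicks give rise to target site duplications can be sketched on a single-strand string representation. This is a toy model under stated assumptions, not a description of the enzymology: the genome string and the `[SINE]` placeholder are invented, strand orientation and reverse transcription are ignored, and the function names and the default stagger of 8 bp (a value chosen to fall within the ranges quoted in the text) are hypothetical.

```python
def find_nick_sites(genome, motif="TTAAAA"):
    """Return 0-based positions of the first nick, between the T's and A's of each motif."""
    sites = []
    start = genome.find(motif)
    while start != -1:
        sites.append(start + 2)  # nick falls after the two T's
        start = genome.find(motif, start + 1)
    return sites


def insert_with_tsd(genome, insert, nick, stagger=8):
    """Insert `insert` so that the `stagger` bases after `nick` flank it on both sides.

    The bases between the two staggered nicks are copied during repair,
    which is modeled here by repeating them on either side of the insert.
    """
    tsd = genome[nick:nick + stagger]
    if len(tsd) < stagger:
        raise ValueError("nick is too close to the end of the sequence")
    product = genome[:nick + stagger] + insert + genome[nick:]
    return product, tsd
```

For the toy genome `"GCGCGTTAAAAGGCCATGCATGC"`, `find_nick_sites` returns `[7]`, and inserting at that site with the default stagger yields a product in which `AAAAGGCC` appears directly before and directly after the placeholder insert.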

The involvement of LINEs in SINE proliferation has been attributed to structural

similarities between the 3’ portions (microsatellite and poly A/T regions) of each

transposable element. For example, monotreme Mon-1 and marsupial Ther-2 SINE

families share 3’ end sequences with LINEs of the same genome, which are believed to

facilitate SINE retrotransposition (Ohshima and Okada 2005). Conversely, the endogenous

L1 family LINEs that facilitate rodent and primate 7SL-derived SINE proliferation, do not

share primary sequence homology (Ohshima and Okada 2005). Whole genome analyses of

C. familiaris and F. catus reveal that at least 19% and 8% of the genome is comprised of

LINE sequences respectively, derived from the L1 and L2 families (Lindblad-Toh, Wade et al. 2005; Pontius, Mullikin et al. 2007). In addition, complete open reading frames may be found amongst carnivore L1s, indicating recent retrotransposition activity of this LINE family (Pontius, Mullikin et al. 2007). However, further comparative genomic analyses are essential in clarifying the functional and evolutionary associations between carnivore specific SINEs and LINEs.

CanSINE Voucher Sequences and Subfamilies

At any given time only a few SINE insertions will act as the source for novel SINE transcripts during amplification (Cordaux, Hedges et al. 2004). Gradual accumulation of mutations will eventually inhibit transcription of a given master copy, which then becomes dormant and is replaced by an alternate copy. This process results in SINE subfamilies with diagnostic nucleotide sequence that are classified into phylogenetic lineages (Ray,

Xing et al. 2006). The 16 CanSINE voucher sequences described within the Repbase library and RepeatMasker software to date (Jurka, Kapitonov et al. 2005; Smit and Green

2005) serve as prototypes for subfamily classification schemes. CanSINE sequences are collectively designated ‘SINEC’ followed by putative subfamily and species of first discovery identifiers (Figure 1). For example, the first CanSINE insertion sequence found in the A. melanoleuca genome is designated SINEC1_AMe (Li, Fan et al. 2010).

Phylogenetic analysis of existing CanSINE voucher sequences defines two

evolutionary lineages concordant with genome origin as either caniform or feliform (Figure

1). Amongst the caniform SINEs, divergent lineages represented by A. melanoleuca

(family Ursidae) and P. vitulina (family Phocidae) are interspersed with putative Canis

(family Canidae) specific sequences. This structure, unrelated to carnivore taxonomy,

suggests SINE master copies that originated early in caniforms have remained active in

SINE proliferation within independent lineages (Figure 1). Future comparative research

will fully characterize the CanSINE subfamilies and provide further insight to the carnivore

genome diversity.

Biomedical Effects of CanSINEs

Transposable elements, including SINEs, contribute broadly to the functional

diversity of mammalian genomes. While the vast majority of novel insertions that manage

to evade purifying selection and genetic drift are functionally benign, a few confer neutral

or deleterious phenotypic variants (Belancio, Hedges et al. 2008). SINE insertions may be

co-opted by host genomes as sites for non-homologous genome rearrangement, as sources

for coding sequence and as regulatory elements. Numerous transposable element insertions

have been implicated in human disease (Deininger and Batzer 1999; Batzer and Deininger

2002) and transgenic mice engineered for overexpression of the transposable element type

known as “Sleeping Beauty” experienced increased cancer development (Dupuy, Akagi et

al. 2005). Relative to primates and rodents, the phenotypic consequences of SINE

insertions among Carnivores are less thoroughly documented. However, an increasing

number of SINE-derived variants have been found amongst the domestic dog breeds,

which provide well-defined populations ideal for studying morphological variation and disease susceptibility.

The merle coat pattern of C. familiaris is an incompletely dominant inherited phenotype noted by marbled coat pigmentation that is sometimes accompanied by symptoms analogous to those associated with human Waardenburg syndrome including auditory, visual, skeletal, cardiac and reproductive impairments (Clark, Wahl et al. 2006).

This phenotype, common in the Corgi, Dachshund, Australian Shepherd and other domestic breeds, segregates with a region of the C. familiaris genome that includes the SILV “silver” gene, which is responsible for coat pigment in multiple mammalian species including mouse and horse (Kwon, Halaban et al. 1995; Theos, Truschel et al. 2005). In merle canines, the SILV gene harbors a CanSINE insertion in reverse orientation at the boundary of intron 10 and exon 11 that causes alternative splicing of exon 11 and ultimately results in the merle phenotype (Clark, Wahl et al. 2006).

The autosomal recessive version of spontaneous centronuclear myopathy (CNM), a muscular disorder that affects multiple mammalian species including humans, occurs in

Labrador retrievers. The disease is characterized by muscle weakness, muscular atrophy and other afflictions attributed to compromised muscle fibers (Hu, Laporte et al. 1996;

Laporte, Hu et al. 1996). Disease associated genotypes were mapped to the canine autosomal protein tyrosine phosphatase-like member A gene ( PTPLA ), the mouse and human homologs of which are expressed in myogenic precursors during embryogenesis and in adult skeletal muscles (Uwanogho, Hardcastle et al. 1999; Breen, Hitte et al. 2004).

Characterization of canine PTPLA revealed a CanSINE insertion in the reverse orientation within exon 2 that segregates with CNM disease. The disorder conforms to a recessive

model of inheritance in which unaffected carriers are heterozygous while affected

individuals are homozygous for the insertion. CanSINE presence results in 7 transcript

isoforms of which only two are identical to wild-type PTPLA while the remaining 5 are presumably ineffective or toxic (Pelé, Tiret et al. 2005).

Narcolepsy is a sleep disorder that affects humans and other mammals including domestic dogs. A mutation within the hypocretin receptor 2 gene ( HCRTR2 ) is among the multiple genetic causes of human early onset narcolepsy (Peyron, Faraco et al. 2000).

Doberman Pinschers with narcolepsy also have an HCRTR2 mutation in the form of a

CanSINE insertion prior to the fourth exon that results in inefficient pre-mRNA splicing.

Other domestic dog breeds with increased incidence of narcolepsy, however, do not have

the HCRTR2 insertion and thus, similar to the human disease, there may be multiple causes of canine narcolepsy (Lin, Faraco et al. 1999).

Domestic dogs are characterized by profound body size diversity, ranging from

2kg Chihuahuas to 82kg Mastiffs. Recent efforts to unveil genetic factors influencing canine body size have uncovered multiple CanSINE insertion variants in linkage disequilibrium with size determining genes. Within the second intron of the insulin-like growth factor 1 ( IGF1 ) gene, a SNP and CanSINE segregate with a wide variety of small breeds (Sutter, Bustamante et al. 2007). Although SINE insertions are demonstrated factors in gene expression (Hasler and Strub 2006; Lin, Shen et al. 2008), the causative mutation in IGF1 has yet to be determined. From an evolutionary perspective, while the

‘small’ haplotype is not observed in the domestic dog forebear, the gray wolf ( Canis lupus ), overall genetic similarity to the homologous locus in Middle Eastern wolves

suggests a Middle Eastern origin for a small dog variant that occurred after the initial domestication (Gray, Sutter et al. 2010).

Retrogenes are derived from processed mRNA sequences that have been retrotransposed into the genome via reverse transcription and are usually not expressed due to the absence of regulatory elements (Brosius 1999). However, in rare instances, retrogenes can employ the internal promoters of nearby LINEs or SINEs for transcription and expression (Brosius 1999; Esnault, Maestre et al. 2000). Chondrodysplasia (shortened limbs) is a feature of several domestic dog breeds and is linked with a fibroblast growth factor 4 ( fgf4 ) retrogene. This retrogene has been inserted into a LINE sequence that is in close proximity to multiple CanSINE insertions, suggesting that these elements provide promoter sequences that allow expression of fgf4 at critical time points in development (Parker, VonHoldt et al. 2009).

The CanSINE induced phenotypes described above, associated with dog domestication and breed development, serve as models for the role of SINEs in gene function and complex genetic diseases in natural populations. Increased genomic data and advances in bioinformatics tools will further elucidate the interplay between CanSINEs, coding sequences and regulatory regions (Spady and Ostrander 2008).

Evolutionary Insights

SINEs have been used as synapomorphic markers in the reconstruction of several mammalian phylogenies, including the afrotherian, cetacean and artiodactyl lineages

(Nikaido, Rooney et al. 1999; Nikaido, Matsuno et al. 2001; Nishihara, Satta et al. 2005;

Nishihara, Hasegawa et al. 2006). However, very few studies have investigated the utility of CanSINEs in carnivore systematics. Prior to the availability of carnivore whole genome sequences, CanSINEs were primarily characterized as a by-product of other genetic studies.

For example, a survey of microsatellite loci inadvertently uncovered a species-specific

SINE insertion in the Mel08 locus of American mink ( Mustela vison) (Lopez-Giraldez,

Andres et al. 2006) that was subsequently used in conservation efforts to distinguish invasive M. vison from the ecologically imperiled European mink ( M. lutreola) (Lopez-Giraldez, Gomez-Moliner et al. 2005) (Table 1). Similarly, sequence analysis of the transthyretin gene inadvertently revealed an orthologous CanSINE locus that is a synapomorphy amongst caniforms (Zehr, Nedbal et al. 2001). Comparative studies of the feliform species bay cat ( Profelis temminckii) and Pallas cat ( Otocolobus manul) , as well as the caniform taxa grey wolf ( C. lupus) , red wolf ( C. rufus) , Eurasian otter ( Lutra lutra ) and striped skunk ( Mephitis mephitis ), unveiled multiple, independently derived CanSINE insertions in the ß-fibrinogen gene. Moreover, the M. mephitis insert is a chimera, illustrating the tendency for SINEs to be incorporated within or adjacent to existing SINEs (Yu, Li et al. 2004; Yu, Liu et al. 2008) (Table 1).

The Y-chromosome is described as a ‘sink’ for transposable elements where SINEs can accumulate in the non-recombining region (Krzywinskia, Nusskernb et al. 2004; Jurka,

Kapitonov et al. 2007). CanSINE insertions have been found within the Zfy gene of the

Felis genus, the bear (Ursidae) family, the Japanese badger ( Meles anakuma ) and the stoat

(Mustela erminea) (Pecon-Slattery, Murphy et al. 2000; Yamada and Masuda 2010). The

Smcy gene hosts an orthologous insertion in the sand cat ( F. margarita ) and the wildcat ( F. silvestris ) species complex. Species-specific insertions are also found in the Smcy gene of

Ursus arctos and L. rufus (Pecon-Slattery, Wilkerson et al. 2004). Otocolobus manul has a species-specific insertion in the Ube1y gene (Pecon-Slattery, Murphy et al. 2000) (Table 1).

As with all mammalian SINEs, CanSINE insertion distributions are generally

congruent with existing hypotheses of speciation (Ray, Xing et al. 2006). However, the

intronic insertions found in all Felis species and L. rufus (described above) occur at

identical genomic coordinates (Pecon-Slattery, Murphy et al. 2000; Pecon-Slattery,

Wilkerson et al. 2004), which if interpreted as a single synapomorphy, would unite

distantly related species. However, phylogenetic reconstruction with sequences adjacent to

the insertion site and differences in the poly A/T tails indicate that the two insertions are the

result of independent SINE integration events at identical loci that occurred after the

divergence of the major cat lineages (Pecon-Slattery, Wilkerson et al. 2004; Johnson,

Eizirik et al. 2006).
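The logic of testing whether an insertion behaves as a clean synapomorphy can be sketched in a few lines. The example below is a minimal illustration, assuming a rooted tree written as nested tuples and a hypothetical five-taxon topology with placeholder labels A to E; it does not encode the real felid phylogeny. An insertion is treated as congruent when the set of taxa carrying it equals the leaf set of some clade.

```python
def clades(tree):
    """Return (leaves, all_clade_leaf_sets) for a tree given as nested tuples of names."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the node itself defines a clade
    return leaves, found


def is_congruent(carriers, tree):
    """True if the taxa carrying an insertion form exactly one clade of the tree."""
    _, all_clades = clades(tree)
    return frozenset(carriers) in all_clades


# Hypothetical topology with placeholder taxon labels.
TOY_TREE = ((("A", "B"), "C"), ("D", "E"))
```

With this toy tree, an insertion shared by A and B is congruent, whereas one shared by A, B and D is not. A pattern of the second kind is what prompts the follow-up checks described above, such as comparing flanking sequence and poly A/T tails to test for independent insertions.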

Comparative Genomics Fosters CanSINEs Analyses

Whole genome sequencing technologies enable inclusion of SINEs and other

transposable elements in comparative analyses. Computational tools for analysis of non-

coding regions littered with SINE insertions are becoming more accessible while sequences

from model organisms provide a point of reference for the pursuit of informative SINEs in

closely related and divergent taxa (Wang, Song et al. 2006; Liu, Alkan et al. 2009;

Schröder, Bleidorn et al. 2009).

Whole genome sequences derived from a Standard Poodle and Boxer indicate

CanSINEs account for approximately 8% of the C. familiaris genome (Kirkness, Bafna et

al. 2003; Lindblad-Toh, Wade et al. 2005). Nearly all CanSINE sequences that most

closely align to the SINEC_Cf2 voucher sequence are homozygous and conserved between

the two individuals, indicating that this subfamily is inactive. However, approximately 7%

of sequences that most closely resemble the SINEC_Cf voucher sequence are unfixed,

either within or between the two canines, which suggests recent proliferation of the

corresponding subfamily (Kirkness, Bafna et al. 2003). In contrast, recently acquired

insertions account for only 0.5% of the Alu content in the human genome, of which only

25% are unfixed (Batzer and Deininger 2002). Approximately 7.9% of the giant panda ( A. melanoleuca) genome is comprised of SINE insertions that are <10% diverged from

Repbase voucher sequences (Li, Fan et al. 2010). Further, de novo characterization of transposable elements identified an additional 0.1% of sequence belonging to SINE elements that were not yet identified in Repbase, which suggests the presence of panda specific subfamilies (Li, Fan et al. 2010). Initial estimates of the transposable element content in F. catus found that 11% of the genome is comprised of CanSINEs similar to

SINEC_FC1, SINEC_FC2 and SINEC_C1 voucher sequences and mammalian-wide MIR

SINEs. However, novel CanSINEs were also identified (Pontius, Mullikin et al. 2007),

suggesting feliform specific subfamilies in addition to those in Repbase.
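The contrast between canine and human unfixed insertions can be made explicit with a small worked calculation using only the percentages quoted above. This is a rough sketch: the variable names are illustrative, and the two denominators differ (sequences resembling the SINEC_Cf voucher versus total Alu content), so the ratio is an order-of-magnitude comparison only.

```python
# Human: recently acquired insertions are 0.5% of Alu content, 25% of those unfixed.
recent_alu_fraction = 0.005
unfixed_among_recent_alu = 0.25
unfixed_alu_fraction = recent_alu_fraction * unfixed_among_recent_alu  # 0.125% of Alu content

# Dog: about 7% of sequences resembling the SINEC_Cf voucher are unfixed.
unfixed_sinec_cf_fraction = 0.07

# Approximate fold difference between the two unfixed fractions.
fold_difference = unfixed_sinec_cf_fraction / unfixed_alu_fraction
```

Under these figures the unfixed share of SINEC_Cf-like sequences is on the order of fifty-fold higher than the unfixed share of human Alu content.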

Whole genome sequences may be used as references for comparative analyses,

allowing large-scale identification of CanSINE insertion sites in both coding and non-

coding regions that are population, species, or lineage specific. Within C. familiaris , 64

unfixed SINEC_Cf insertions have been localized that segregate with breed (Wang and

Kirkness 2005). In addition, 17 intronic parsimony informative CanSINE loci have been

located among 21 caniform species, all of which are congruent with other molecular data

(Schröder, Bleidorn et al. 2009). Within Feliformia, comparative genomics analysis

resulted in the identification of over 60 informative loci that can discern suborder, familial,

and species relationships (Walters-Conte, unpublished data).
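The term parsimony informative has a simple operational meaning for presence/absence data, sketched below. The sketch assumes the standard criterion for a binary character (each state observed in at least two taxa) and an illustrative encoding in which 1 is insertion present, 0 is absent, and any other value is missing data; the function name and the example taxon labels are hypothetical.

```python
def is_parsimony_informative(states):
    """Check a presence/absence locus given as {taxon: 1, 0, or anything else for missing}."""
    scored = [s for s in states.values() if s in (0, 1)]
    # Informative only if both states are each seen in at least two taxa.
    return scored.count(1) >= 2 and scored.count(0) >= 2


# Illustrative loci with placeholder taxon labels.
shared_locus = {"taxon1": 1, "taxon2": 1, "taxon3": 0, "taxon4": 0, "taxon5": "?"}
private_locus = {"taxon1": 1, "taxon2": 0, "taxon3": 0, "taxon4": 0, "taxon5": 0}
```

A species-specific insertion such as private_locus is diagnostic for that species but carries no information about how the remaining taxa group together.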

Conclusions

Our understanding of mammalian transposable elements has accelerated in the last

20 years as a consequence of advances in sequencing and computational technologies.

Through these advances we find that carnivore specific CanSINEs have a significant impact on genome content and gene function. These abundant retroelement insertions can directly alter phenotypes by becoming incorporated into coding regions and by providing promoter sequences that disrupt the transcriptional regulation of adjacent genes. As sources of , CanSINEs have proven to be highly informative markers to differentiate the protected Mustela lutreola from the invasive M. vison. In addition,

CanSINEs can define domestic dog breeds as well as diagnose species, genus, familial and suborder relationships throughout Carnivora. Whole genome sequences provide references to investigate the plethora of CanSINEs that are clustered within intergenic regions. Future advancements in sequencing and bioinformatics technologies will provide further insights into CanSINE biology, which will add to our general knowledge of the function and evolution of mammalian SINEs.

Taxonomic Group              Locus Name                Publication

Suborder
Caniformia                   Transthyretin intron 1    Zehr et al 2001
Caniformia                   CF_L002d                  Schröder et al 2009
Caniformia                   CF_L003c                  Schröder et al 2009
Caniformia                   CF_L006a                  Schröder et al 2009
Caniformia                   CF_L010                   Schröder et al 2009
Caniformia                   CF_L013                   Schröder et al 2009

Superfamily
Arctoidea                    CF_L003b                  Schröder et al 2009
Musteloidea                  CF_L002b                  Schröder et al 2009

Family
Ursidae                      Zfy                       Pecon-Slattery et al 2000
Canidae                      CF_L001                   Schröder et al 2009
Canidae*                     CF_L003a                  Schröder et al 2009
Canidae                      CF_L004                   Schröder et al 2009
Canidae                      CF_L007a                  Schröder et al 2009
Canidae                      CF_L007b                  Schröder et al 2009
Canidae                      CF_L008                   Schröder et al 2009
Canidae                      CF_L011                   Schröder et al 2009
Canidae                      CF_L014                   Schröder et al 2009
Canidae                      CF_L015                   Schröder et al 2009
Odobenidae/Otariidae         CF_L006b                  Schröder et al 2009
Mustelidae (except Meles)    CF_L004b                  Schröder et al 2009
Canidae*                     CF_L016                   Schröder et al 2009

Subfamily
Caninae                      CF_L002e                  Schröder et al 2009
Caninae*                     CF_L009b                  Schröder et al 2009

Genus
Felis                        Zfy                       Pecon-Slattery et al 2000
Canis                        ß-fibrinogen intron 7     Yu and Zhang 2005

Species
Mustela vison                Mel08 locus               Lopez-Giraldez et al 2005
Lynx rufus                   Smcy                      Pecon-Slattery et al 2004
Otocolobus manul             Ube1y                     Pecon-Slattery et al 2000
Otocolobus manul             ß-fibrinogen intron 7     Yu and Zhang 2005
Profelis temminckii          ß-fibrinogen intron 7     Yu and Zhang 2005
Mustela erminea              Zfy                       Yamada and Masuda 2010
Meles anakuma                Zfy                       Yamada and Masuda 2010
Mephitis mephitis            ß-fibrinogen intron 7     Yu and Zhang 2008
Lutra lutra                  ß-fibrinogen intron 4     Yu and Zhang 2008
Ursus arctos                 Smcy                      Pecon-Slattery et al 2000
Felis margarita/silvestris   Smcy                      Pecon-Slattery et al 2004
Ursus arctos*                CF_L002a                  Schröder et al 2009
Mephitis mephitis*           CF_L002c                  Schröder et al 2009
Mephitis mephitis*           CF_L008a                  Schröder et al 2009
Canis familiaris*            CF_L005                   Schröder et al 2009
Canis familiaris*            CF_L012                   Schröder et al 2009
Procyon lotor*               CF_L009a                  Schröder et al 2009
californianus                CF_L017                   Schröder et al 2009
Ursus arctos*                CF_L018                   Schröder et al 2009
Procyon lotor*               CF_L021                   Schröder et al 2009

Table 1: A sampling of clade and species-specific CanSINE insertion sites published in previous evolutionary studies. *Phylogenetic range is not yet conclusive as data is not available from all related taxa. Data from functional studies were not included.

Figure 1. Alignment and minimum evolution phylogeny of Repbase CanSINE voucher sequences. All CanSINE vouchers begin with the designation “SINEC”, followed by species of first discovery where “Fc”= Felis catus , “CF”= Canis familiaris , “Ame”=

Ailuropoda melanoleuca , and “Pv”= Phoca vitulina. A) The initial global alignment of

Lysine-tRNA derived segments was estimated using the Geneious alignment module (ref) and refined by hand. Lines indicate RNA polymerase III promoter boxes A and B. B) The neighbor-joining clustering algorithm was used to estimate phylogeny with the Tamura-Nei distance model and branch support approximated using 1000 bootstrap replications.

Feliform SINEs form a distinct clade while SINEs amongst the caniform lineages are intermingled and do not cluster according to lineage.
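The bootstrap step mentioned in the legend can be sketched as column resampling of an alignment. This is a generic illustration, not the Geneious implementation: the function name is hypothetical, the Tamura-Nei distance calculation and neighbor-joining tree construction that would follow each replicate are omitted, and the input is assumed to be a dictionary of equal-length aligned sequences.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    # The same sampled column indices are applied to every sequence,
    # so each resampled column keeps its original site pattern.
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Yield n_replicates resampled alignments (1000 matches the figure legend)."""
    rng = random.Random(seed)
    for _ in range(n_replicates):
        yield bootstrap_alignment(alignment, rng)
```

Branch support would then be the proportion of replicate trees that recover a given grouping.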

Figure 2. Retrotransposition of mammalian SINE insertions. A) The master SINE copy is transcribed by RNA polymerase III. B) A LINE derived enzyme nicks a chromosome strand at the AATTTT sequence. C) The poly-A tail of the transcribed SINE binds to the free TTTT and acts as a primer for LINE derived reverse transcriptase. A nick on the opposite strand frees the complementary target site duplication sequence. D) Reverse transcriptase synthesizes complementary strands. E) A new SINE insertion with characteristic target site duplications. Adapted from (Cordaux and Batzer, 2009).
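The outcome shown in panel E, a new insertion flanked by target site duplications, can be mimicked with a simplified string model. The sketch below is an assumption-laden illustration: it works on a single strand, ignores the enzymatic steps in panels A to D, and uses a made-up host sequence, SINE sequence, and duplication length.

```python
def insert_sine(genome, site, tsd_len, sine):
    """Insert a SINE after a target site, duplicating the target site on both flanks."""
    target = genome[site:site + tsd_len]
    upstream = genome[:site + tsd_len]    # ends with the original target site
    downstream = genome[site + tsd_len:]
    # The target site is copied again after the insert, producing the duplication.
    return upstream + sine + target + downstream


# Hypothetical sequences for illustration only.
host = "GGGAATTTTCCC"
new_copy = insert_sine(host, site=3, tsd_len=6, sine="gcgcaaaa")
```

In this model the lower-case insert ends up bracketed by two identical copies of the AATTTT target site, which is the signature used to recognise intact insertions.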

Literature Cited

Batzer, M. A. and P. L. Deininger (2002). "Alu repeats and human genomic diversity."

Nature Reviews Genetics 3: 370-379.

Belancio, V. P., D. J. Hedges, et al. (2008). "Mammalian non-LTR retrotransposons: For

better or worse, in sickness and health." Genome Res 18 : 343-358.

Breen, M., C. Hitte, et al. (2004). "An integrated 4249 marker FISH/RH map of the canine

genome." BMC Genomics 5(1): 65.

Brosius, J. (1999). "Many G-protein-coupled receptors are encoded by retrogenes." Trends

in Genetics 15 (8): 304-305.

Brosius, J. (1999). "Transmutation of tRNA." Nature 22 : 8-9.

Callinan, P. A., J. Wang, et al. (2005). "Alu retrotransposition-mediated deletion." J Mol

Biol 348 : 791-800.

Christensen, S., J. Ye, et al. (2006). "RNA from the 5 ′ end of the R2 retrotransposon

controls R2 protein binding to and cleavage of its DNA target site." PNAS 103 (47):

17602-17607.

Clark, L. A., J. M. Wahl, et al. (2006). "Retrotransposon insertion in SILV is responsible

for merle patterning of the domestic dog." PNAS 103 (5): 1376-1381.

Collier, L. S. and D. A. Largaespada (2007). "Transposable elements and the dynamic

somatic genome." Genome Biology 8(Suppl 1): S5.

Coltman, D. W. and J. M. Wright (1994). "Can SINEs: a family of tRNA-derived

retroposons specific to the superfamily Canoidea." Nucl. Acids Res. 22 (14): 2726-

2730.

Cordaux, R. and M. Batzer (2009). "The impact of retrotransposons on human genome

evolution." Nature Reviews Genetics 10 : 691-703.

Cordaux, R., D. J. Hedges, et al. (2004). "Retrotransposition of Alu elements: how many

sources?" Trends in Genetics 20 (10): 464-467.

Deininger, P. and M. Batzer (1999). "Alu repeats and human disease." Molecular Genetics

and Metabolism 67 (3): 183-193.

Dupuy, A. J., K. Akagi, et al. (2005). "Mammalian mutagenesis using a highly mobile

somatic Sleeping Beauty transposon system." Nature 436 : 221-226.

Esnault, C., J. Maestre, et al. (2000). "Human LINE retrotransposons generate processed

pseudogenes." Nature Genet 24 : 363-367.

Gentles, A. J., O. Kohany, et al. (2005). "Evolutionary diversity and potential

recombinogenic role of integration targets of non-LTR retrotransposons." Mol Biol

Evol 22 : 1983-1991.

Gray, M., N. Sutter, et al. (2010). "The IGF1 small dog haplotype is derived from Middle

Eastern grey wolves." BMC Biology 8: 16.

Gu, W., D. A. Ray, et al. (2007). "SINEs, evolution and genome structure in the opossum."

Gene 396 (1): 46-58.

Hasler, J. and K. Strub (2006). "Survey and summary- Alu elements as regulators of gene

expression." Nucl. Acids Res. 34 : 5491-5497.

Hu, L.-J., J. Laporte, et al. (1996). "X-linked myotubular myopathy: refinement of the gene

to a 280-kb region with new and highly informative microsatellite markers." Hum

Genet 98 (2): 178-181.

Johnson, W. E., E. Eizirik, et al. (2006). "The late Miocene radiation of modern Felidae: a

genetic assessment." Science 311 (5757): 73-77.

Jurka, J. (1997). "Sequence patterns indicate an enzymatic involvement in integration of

mammalian retroposons." PNAS 94 : 1872-1877.

Jurka, J., V. Kapitonov, et al. (2007). "Repetitive sequences in complex genomes: structure

and evolution." Annu. Rev. Genomics Hum. Genet. 8: 241-259.

Jurka, J., V. V. Kapitonov, et al. (2005). "Repbase Update, a database of eukaryotic

repetitive elements " Cytogenetic and Genome Research 110 : 462-467.

Kajikawa, M. and N. Okada (2002). "LINEs mobilize SINEs in the eel through a shared 3'

sequence." Cell 111 ( 3): 433 - 444.

Kirkness, E. F., V. Bafna, et al. (2003). "The dog genome: survey sequencing and

comparative analysis." Science 301 : 1898 - 903.

Krzywinskia, J., D. R. Nusskernb, et al. (2004). "Isolation and Characterization of Y

Chromosome Sequences From the African Malaria Mosquito Anopheles gambiae."

Genetics 166 : 1291-1302.

Kurzynska-Kokorniak, A., V. Jamburuthugoda, et al. (2007). "DNA-directed DNA

polymerase and strand displacement activity of the reverse transcriptase encoded by

the R2 retrotransposon." J Mol Biol 374 (2): 322-333.

Kwon, B., R. Halaban, et al. (1995). "Mouse silver mutation is caused by a single base

insertion in the putative cytoplasmic domain of Pmel 17." Nucl. Acids Res. 23 (1):

154-158.

Laporte, J., L. Hu, et al. (1996). "A gene mutated in X-linked myotubular myopathy

defines a new putative tyrosine phosphatase family conserved in yeast." Nature

Genet 13 (2): 175-182.

Lavrentieva, M. V., M. I. Rivkin, et al. (1991). "B2-like repetitive sequence from the X

chromosome of the American mink (Mustela vison)." Mammalian Genome 1(3):

165-170.

Lawrence, C., D. McDonnell, et al. (1985). "Analysis of repetitive sequence elements

containing tRNA-like sequences." Nucl. Acids Res. 13 (12): 4239-4252.

Li, R., W. Fan, et al. (2010). "The sequence and de novo assembly of the giant panda

genome." Nature 463 (7279): 311 - 317.

Lin, L., J. Faraco, et al. (1999). "The sleep disorder canine narcolepsy is caused by a

mutation in the hypocretin (Orexin) receptor 2 gene." Cell 98 : 365-376.

Lin, L., S. Shen, et al. (2008). "Diverse splicing patterns of exonized Alu elements in

human tissues." PLoS Genet 4: e1000225.

Lindblad-Toh, K., C. M. Wade, et al. (2005). "Genome sequence, comparative analysis and

haplotype structure of the domestic dog." Nature 438 (7069): 803-19.

Liu, G., C. Alkan, et al. (2009). "Comparative analysis of Alu repeats in primate genomes."

Genome Res 19 : 876-885.

Lopez-Giraldez, F., O. Andres, et al. (2006). "Analyses of carnivore microsatellites and

their intimate association with tRNA-derived SINEs." BMC Genomics 7(269).

Lopez-Giraldez, F., B. J. Gomez-Moliner, et al. (2005). "Genetic distinction of American

and European mink ( Mustela vison and M. lutreola ) and European polecat ( M.

putorius ) hair samples by detection of species-specific SINE and RFLP assay."

Journal of Zoology, London 265 : 405-410.

Miyamoto, M. M. (1999). "Molecular systematics: Perfect SINEs of evolutionary history?"

Current Biology 9: 816 -819.

Munemasa, M., M. Nikaido, et al. (2008). "Newly discovered young CORE-SINEs in

marsupial genomes." Gene 407 (1-2): 176-185.

Nikaido, M., F. Matsuno, et al. (2001). "Retroposon analysis of major cetacean lineages:

The monophyly of toothed whales and the paraphyly of river dolphins." PNAS

98 (13): 7384-7389.

Nikaido, M., A. P. Rooney, et al. (1999). "Phylogenetic relationships among cetartiodactyls

based on insertions of short and long interspersed elements: Hippopotamuses are

the closest extant relatives of whales." PNAS 96 (18): 10261-10266.

Nishihara, H., M. Hasegawa, et al. (2006). "Pegasoferae, an unexpected mammalian clade

revealed by tracking ancient retroposon insertions." PNAS 103 (26): 9929-9934.

Nishihara, H., Y. Satta, et al. (2005). "Retrotransposon analysis of Afrotherian phylogeny."

Mol Biol Evol 9: 1823-1833.

Ohshima, K. and N. Okada (2005). "SINEs and LINEs: symbionts of eukaryotic genomes

with a common tail." Cytogenetic and Genome Research 110 (1-4): 475- 490.

Parker, H. G., B. VonHoldt, et al. (2009). "An expressed fgf4 retrogene is associated with

breed-defining chondrodysplasia in domestic dogs." Science 325 (5943): 995-998.

Pecon-Slattery, J., W. J. Murphy, et al. (2000). "Patterns of diversity among SINE elements

isolated from three Y-chromosome genes in carnivores." Mol Biol Evol 17 (5): 825-

829.

Pecon-Slattery, J., A. J. Wilkerson, et al. (2004). "Phylogenetic assessment of introns and

SINEs within the Y chromosome using the cat family Felidae as a species tree "

Mol Biol Evol 21 (12): 2299-2309.

Pelé, M., L. Tiret, et al. (2005). "SINE exonic insertion in the PTPLA gene leads to

multiple splicing defects and segregates with the autosomal recessive centronuclear

myopathy in dogs " Human Molecular Genetics 14 (11): 1417-1427.

Peyron, C., J. Faraco, et al. (2000). "A mutation in a case of early onset narcolepsy and a

generalized absence of hypocretin peptides in human narcoleptic brains." Nat Med

6(9): 991-997.

Pontius, J., J. Mullikin, et al. (2007). "Initial sequence and comparative analysis of the cat

genome." Genome research 17 (11): 1675-1689.

Ray, D. A., J. Xing, et al. (2006). "SINEs of a nearly perfect character." Syst. Biol 55 (6):

928-935.

Schröder, C., C. Bleidorn, et al. (2009). "Occurrence of Can-SINEs and intron sequence

evolution supports robust phylogeny of carnivores and their terrestrial

relatives." Gene 448 (2): 221-226.

Smit, A. and P. Green (2005). "Repeat Masker Website and Server."

Spady, T. and E. Ostrander (2008). "Canine behavioral genetics: pointing out the

phenotypes and herding up the genes." American Journal of Human Genetics 82 :

10-18.

Sutter, N., C. Bustamante, et al. (2007). "A single IGF1 allele is a major determinant of

small size in dogs." Science 316 (5821): 112-115.

Theos, A. C., S. T. Truschel, et al. (2005). "The Silver locus product

Pmel17/gp100/Silv/ME20: controversial in name and in function." Pigment Cell

Research 18 (5): 322-336.

Tong, C., X. Gan, et al. (2010). "Multiple source genes of HAmo SINE actively expanded

and ongoing retroposition in cyprinid genomes relying on its partner LINE." BMC

Evolutionary Biology 10 (115): 115.

Uwanogho, D. A., Z. Hardcastle, et al. (1999). "Molecular cloning, chromosomal

mapping, and developmental expression of a novel protein tyrosine phosphatase-

like gene." Genomics 62 (3): 406-416.

van der Vlugt, H. H. and J. A. Lenstra (1995). "SINE elements of carnivores." Mammalian

Genome 6(1): 49-51.

Vassetzky, N. S. and D. A. Kramerov (2002). "CAN—a pan-carnivore SINE family."

Mammalian Genome 13 (1): 50- 57.

Wang, J., L. Song, et al. (2006). "Whole genome computational comparative genomics: A

fruitful approach for ascertaining Alu insertion polymorphisms." Gene 365 : 11-20.

Wang, W. and E. F. Kirkness (2005). "Short interspersed elements (SINEs) are a major

source of canine genomic diversity." Genome Res. 15 : 1798-1808.

Wicker, T., F. Sabot, et al. (2007). "A unified classification system for eukaryotic

transposable elements." Nat Rev Genet 8(12): 973 -982.

Yamada, C. and R. Masuda (2010). "Molecular phylogeny and evolution of sex-

chromosomal genes and SINE sequences in the family Mustelidae." Mammal Study

35 : 17-30.

Yu, L., Q.-w. Li, et al. (2004). "Phylogenetic relationships within mammalian order

Carnivora indicated by sequences of two nuclear DNA genes." Molecular

Phylogenetics and Evolution 33 : 694-705.

Yu, L., J. Liu, et al. (2008). "New insights into the evolution of intronic sequences of the

Beta-fibrinogen gene and their application in reconstructing mustelid phylogeny."

Zoological Science 25 (6): 622-672.

Zehr, S. M., M. A. Nedbal, et al. (2001). "Tempo and mode of evolution in an orthologous

Can SINE." Mammalian Genome 12 (1): 38-44.

CHAPTER TWO: COMPARATIVE SINE ANALYSES REVEAL FELIFORM SPECIFIC CANSINE SUBFAMILIES AND THE COMPLEXITIES OF FELIDAE EVOLUTION

Abstract

Recent developments in genomic resources enable high-throughput discovery of mammalian short interspersed nuclear element (SINE) insertions. Here the recently completed domestic cat whole genome sequence is used as a reference to locate 45 previously uncharacterized Carnivore specific SINE (CanSINE) insertions within the

Feliformia suborder. Sequence diversity and differential distribution of the loci reveal two new SINE subfamilies, one feliform specific and the other Felidae specific. The utility of the insertion sites as evolutionary markers was also assessed in regard to Feliformia suborder phylogeny with detailed analyses in the Felidae family. Among the 38 Felidae species and five additional Feliformia species examined here, SINE presence/absence data were largely congruent with previous phylogenetic analyses. The few phylogenetic inconsistencies found provide new insights into processes of incomplete lineage sorting following rapid speciation and SINE-mediated genome rearrangement.

Introduction

Short interspersed nuclear elements (SINEs) are a type of transposable element,

<500 base pairs (bp), that accumulate in genomes by retrotransposition (Jurka et al. 2007;

Kajikawa and Okada 2002). These abundant sequences, comprised of one or more RNA

gene derived regions, a di-nucleotide repeat region and/or a poly A/T tail (Coltman and

Wright 1994; Ohshima and Okada 1994), account for up to 10% of mammalian genome

sequence (Jurka et al. 2007; Lindblad-Toh et al. 2005; Pontius et al. 2007a) and thus play a

significant role in genomic organization (Jurka et al. 2007). While SINEs are found

throughout Mammalia, most inferences regarding the evolution of these sequences are

derived from analyses of only a few clades. Investigations of primate Alu sequences reveal a copy and paste proliferation mechanism that generates SINE families and subfamilies differentiated by sequence variation and presence in specific evolutionary lineages (Jurka

1997; Liu et al. 2009; Ray and Batzer 2005). Orthologous SINE insertion loci are informative synapomorphic markers (Lum et al. 2000; Nikaido et al. 2001; Nikaido et al.

1999; Ray 2007) that have enhanced phylogenetic studies of cetaceans (whales, dolphins and porpoises), artiodactyls (ungulates) and afrotherians (a morphologically diverse clade of African mammals) (Lum et al. 2000; Nikaido et al. 2001; Nishihara et al. 2005).

Although generally considered informative markers of speciation, underlying mechanisms governing SINE evolution and proliferation may hinder accurate phylogenetic inference. Previously, SINE discovery was either a by-product of sequence studies of other genomic targets (Pecon-Slattery et al. 2000; Yu and Zhang 2005; Zehr et al. 2001) or involved arduous hybridization and cloning protocols (Okada et al. 2004; Shimamura et al.

1999). In addition, processes such as incomplete lineage sorting of ancestral

polymorphisms (Miyamoto 1999; Takahashi et al. 2001), or removal via recombination

(Callinan et al. 2005; van de Lagemaat et al. 2005), can compromise phylogenetic

consistency.

In recent years, the study of SINE evolution has experienced a renaissance; a result

of mammalian whole genome sequencing projects that facilitate genome-wide comparative

analyses of transposable elements. Numerous insertion loci located throughout the

genomes of exotic species, such as anteaters, mammoths and non-human primates, allow

characterization of SINE subfamilies within a variety of mammalian orders (Liu et al.

2009; Nishihara et al. 2007; Zhao et al. 2009). Furthermore, newly available phylogenies

provide an evolutionary context for proper interpretation of complex speciation events and

SINE-mediated recombination (Callinan et al. 2005; Ray 2007; Takahashi et al. 2001; Yu

and Zhang 2005).

The cat family, Felidae, provides a reference system for SINE evolution. The

current felid phylogeny is derived from analyses of multiple independent genetic markers

that partition the extant species into eight statistically supported evolutionary lineages

(Johnson et al. 2006). In addition, a recent revision of the Carnivora order phylogeny

allows examination of felid SINEs with respect to the Feliformia suborder (Eizirik et al.

2010). Here we explore the family of SINEs unique to Carnivora, CanSINEs, using a

comparative genomics approach that incorporates the domestic cat ( F. catus ) whole genome sequence (Pontius et al. 2007a) with the highly resolved Felidae and Carnivora phylogenies (Eizirik et al. 2010; Johnson et al. 2006). Forty-five previously undocumented

CanSINE insertion loci are identified and their distribution determined among 38 Felidae

species and 5 additional outgroup feliforms. Two feliform specific CanSINE subfamilies

are delimited, effects of rapid speciation on the phylogenetic distribution of SINE insertion

loci are empirically evaluated and evidence for an imprecise SINE excision is revealed.

Results and Discussion

Multiple insertions occur at homologous CanSINE loci

Feliform specific CanSINE insertion loci were characterized by combining in silico analysis of the F. catus whole genome sequence with direct PCR amplification and sequencing. An initial assessment of CanSINEs within the 1.9X domestic cat whole genome sequence identified 322 intact loci (Pontius et al. 2007a). Here, a subset of 60 loci were amplified and sequenced in 38 Felidae species and 5 additional species representing feliform outgroup taxa (Supplemental Table 1). Of these, amplification was successful for at least 80% of the sample taxa at 31 loci (Supplemental Table 2). The nearly 50% amplification failure rate can be attributed to the difficulty of obtaining interpretable PCR and sequencing products from loci occurring within or adjacent to other transposable elements, a clustering pattern that was observed in 18 loci.

PCR amplification using 31 F. catus based primer sets revealed 10 loci containing two or more independent insertions, resulting in the discovery of 14 additional CanSINE insertions (Supplemental Figure 1; Table 3). Eleven species-specific insertions were distributed among taxa representing each of the eight major Felidae lineages and one

Hyaenidae species. Three synapomorphic (shared derived) loci occurred: one locus present in both snow leopard (Panthera uncia) and tiger (P. tigris), and two polymorphic loci distributed among the lynx species (Table 1, Figure 1). The discovery of additional SINE integration events indicates a continual proliferation of CanSINEs throughout Felidae.

These results are also consistent with recent observations within the Caniformia suborder

wherein amplification of five putative C. familiaris CanSINE loci revealed eight additional

insertions in related species (Schröder et al. 2009).

In total, 45 previously uncharacterized CanSINE insertions were discovered in this

study. Repeated observations of SINEs mapped in close genomic proximity to other transposable elements support previous assertions that certain genomic regions are prone to SINE integration (Jurka et al. 2005; Yu and Zhang 2005). Endonucleases derived from the L1 long interspersed nuclear element (LINE) family facilitate integration of primate

Alu elements by cleaving genomic DNA at the motif TTAAAA(N)0-8TYTNR (Jurka 2004).

Similarly, over 20% of C. familiaris CanSINE integration sites include TTAAAA (Gentles et al. 2005). Thus, felid SINEs are expected to cluster in genomic regions containing L1 endonuclease cleavage sites, including those nested in the 3’ poly A/T tails of existing

SINE insertions. Integrated retrotransposons are generally fixed in regions of the genome where large insertions are tolerated, such as regions where gene expression will not be adversely affected (Jurka et al. 2005). In accordance with these prior observations, the 45

CanSINEs located within this study reside in non-coding regions adjacent to other SINE or

LINE elements, and in some instances, occur within the poly A/T tails of other retrotransposons.
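The L1 endonuclease cleavage motif cited above can be located in a sequence with a regular expression. The following is a minimal sketch, not the procedure used in this study: it assumes the standard IUPAC meanings (N = any base, Y = C or T, R = A or G), scans only the forward strand, and uses a hypothetical function name and a made-up toy sequence.

```python
import re

# TTAAAA(N)0-8TYTNR written as a regular expression.
# IUPAC codes: N = any base, Y = C or T, R = A or G.
L1_SITE = re.compile(r"TTAAAA[ACGT]{0,8}T[CT]T[ACGT][AG]")


def find_l1_sites(sequence):
    """Return (start, matched_text) for each non-overlapping motif match.

    Only the given strand is scanned; reverse-complement matches are ignored.
    """
    return [(m.start(), m.group()) for m in L1_SITE.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from the cat genome.
example = "GGGTTAAAACCTTTAGGG"
print(find_l1_sites(example))
```

A fuller screen would also scan the reverse complement and would report overlapping candidate sites, which this sketch does not attempt.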

Feliform specific CanSINE subfamilies

Multiple sequence alignments of the newly described insertions reveal two feliform specific CanSINE subfamilies (I and II) that each may be further subdivided into subtypes (A and B) based on the presence of diagnostic nucleotides in the tRNA-related region. Subfamily I insertions include a ‘CCTGATT’ motif at position 36 and subfamily II

insertions include a ‘GGCTCGG’ motif at position 118 (Figure 2). Within subfamily I, an

‘RG’ insertion at position 116 distinguishes subtype B from subtype A. Within subfamily

II, subtype A insertions are differentiated from subtype B by a ‘T’ deletion at position 51

(Figure 2a). When represented in a phylogenetic context, subfamilies I and II form distinct

clades with internal clusters of subtypes A or B (Figure 2b).
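The two diagnostic motifs suggest a simple way to sort a tRNA-related region into subfamily I or II. The sketch below is illustrative only and is not the alignment-based procedure used here: it assumes that the mere presence of a motif is sufficient, ignores the alignment positions (36 and 118) and the subtype A/B characters, and uses hypothetical names and toy sequences.

```python
# Diagnostic motifs reported for the two feliform CanSINE subfamilies.
SUBFAMILY_MOTIFS = {"I": "CCTGATT", "II": "GGCTCGG"}


def classify_subfamily(trna_region):
    """Assign a subfamily from motif presence alone (a simplifying assumption).

    Alignment gaps ('-') are stripped first so that a gapped row from a
    multiple alignment can be passed in directly.
    """
    seq = trna_region.upper().replace("-", "")
    found = [name for name, motif in SUBFAMILY_MOTIFS.items() if motif in seq]
    if len(found) == 1:
        return found[0]
    return "ambiguous" if found else "unassigned"


# Toy sequences for illustration only.
print(classify_subfamily("ACGTCCTGATTACGT"))
print(classify_subfamily("ACGGGCTCGGTT"))
```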

The evolutionary distributions of subfamilies I and II differ: subfamily I insertion

sites occur in all feliforms sampled while subfamily II insertions only occur within the

Felidae/Prionodontidae clade. Two of the subfamily II subtype A insertions are conserved in

all Felidae, one occurs in the pampas cat (Leopardus colocolo) and one occurs in the clouded leopard ssp. (Neofelis). All subfamily II subtype B insertions are species or sister-species specific and present within each of the major Felidae lineages (Figure 2a).

Average pair-wise sequence identities among the feliform SINE subfamilies are consistent with time of proliferation. Subfamily I insertions have an average similarity score of 77% while the more recently acquired subfamily II insertions have an average similarity score of 91%. Similarity scores of subtypes A and B within subfamily I are 79% and 82%, while similarity scores of subtypes A and B within subfamily II are 92% and

93%. Within subfamily I, mutation rates derived from the tRNA-related region were estimated at 1.05 × 10^-9 mutations per site per year for subtype A and 1.29 × 10^-9 mutations per site per year for subtype B. Within subfamily II, mutation rates derived from the tRNA-related region were estimated at 1.00 × 10^-9 mutations per site per year for subtype A and 1.28 × 10^-9 mutations per site per year for subtype B. These rates are slightly faster than estimates of average mutation rates among mammalian genes (2.2 × 10^-9 mutations per site per year)

(Kumar and Subramanian 2002), a possible reflection of reduced sequence-level selection

on non-coding SINE elements. However, the CanSINE mutation rates presented here are

slower than rates derived from human whole genome analyses (1.8 × 10^-8 mutations per site per year) (Roach et al. 2010) that include simple and microsatellite repeats. The slower mutation rate among the tRNA derived domains of SINEs is likely due to fewer errors during DNA replication prior to meiosis, or possibly to positive selection that maintains the integrity of SINE sequences.

Ascertainment of CanSINE loci throughout Felidae and Feliformia permitted characterization of two unique feliform specific subfamilies with mutation rates of approximately 1.15 × 10^-9 mutations per site per year. Subfamily I proliferated ~45-60

million years ago (MYA) (Eizirik et al. 2010) in the common ancestor of all Feliformia and

is potentially derived from two distinct master copy sequences, resulting in two subtypes.

Subfamily II is derived from two master SINE copies that have proliferated since the

beginning of the Felidae and Prionodontidae radiation approximately 35 MYA (Eizirik et al.

2010) and is presumably the source of current insertional mutagenesis.

SINE insertion loci reflect Felidae phylogeny

Large scale sequencing of the Felidae family provides a robust reference phylogeny onto which SINE insertion events can be mapped (Johnson et al. 2006). Of the 45 CanSINE insertion loci assessed in this study, 43 are congruent with the established felid phylogeny.

Twenty-seven of the 28 loci designated as subfamily I insertions support the monophyly of

Feliformia (Figure 1). Moreover, two insertions support the monophyly of Felidae, and one insertion supports the sister group relationship between Felidae and Prionodontidae

(Figure 1). This finding corroborates recent molecular analyses that place the Asian linsangs (Prionodon linsang and P. pardicolor) into a newly described Prionodontidae

family that is the sister lineage of Felidae (Eizirik et al. 2010). Within Felidae, a more

recent insertion is diagnostic of the Panthera uncia/P. tigris lineage (Figure 1).

Evidence for incomplete lineage sorting of an ancestral insertion

While over 90% of orthologous SINE loci presented here are consistent with the current feliform phylogeny, the remainder provide empirical evidence of hypothesized caveats to

SINE-based phylogenetic analysis. Incomplete lineage sorting, whereby differential survival of polymorphic ancestral alleles results in an allele phylogeny inconsistent with species phylogeny, is a potential source of phylogenetic incongruence in SINE analyses

(Hillis 1999; Takahashi et al. 2001). Within this study, two CanSINEs at a single locus provide empirical examples of SINE lineage sorting in carnivore evolution.

The genus Lynx diverged from the other Felidae approximately 6 MYA, followed

by the divergence of the bobcat ( L. rufus ) progenitor approximately 3 MYA. The three

other extant lynx species, Canadian lynx (L. canadensis ), Eurasian lynx ( L. lynx ) and

Iberian lynx ( L. pardinus ), evolved rapidly ~1.2-1.6 MYA (Johnson et al. 2004).

Morphologically distinguished by size, L. canadensis , L. lynx and L. pardinus have not had

overlapping ranges since the Pleistocene (Kurten and Granqvist 1987) and are currently

separated by geological barriers. In addition, L. pardinus is critically endangered with as

few as 500 individuals remaining in Spain and Portugal (Piris and Fernandes 2003). With

no evidence of pervasive hybridization or introgression, the lynx are a model system for

ascertainment of SINE distributions following rapid speciation.

Initial analysis of locus 106265 (chromosome B1, nucleotides 191,469,873 –

191,484,794) revealed an orthologous insertion is present in the L. canadensis and L. lynx

voucher specimens while an alternate insertion is present in the L. pardinus voucher

(Figure 3). This pattern suggests L. canadensis and L. lynx are sister taxa to the exclusion of L. pardinus , a hypothesis that is incongruent with prior evolutionary depictions based on concatenated nuclear and mitochondrial DNA sequences (Johnson et al. 2004; Meli et al.

2009) (Figure 4) but consistent with a phylogeny based on the single cytochrome b (cyt b) gene

(Agnarsson et al. 2010). Analysis of an additional 21 L. canadensis, 22 L. lynx and 8 L. pardinus individuals revealed insertion polymorphism within L. lynx. Seventeen percent of the L. lynx individuals have the insertion pattern similar to L. pardinus, 61% the pattern similar to

L. canadensis, and 22% are heterozygous, carrying a single copy of each insertion on homologous chromosomes. To address the possibility of hybridization, mitochondrial haplotypes were obtained and found to be consistent with species designation (data not shown). In addition, no correlation was observed between the geographic origin of L. lynx individuals and SINE profile (Supplemental Table 3).
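The reported L. lynx percentages can be related to genotype counts and per-chromosome insertion frequencies with a short calculation. The underlying counts are not given in the text; the values below (4, 14 and 5 out of 23 individuals, i.e. the 22 additional animals plus the voucher) are an assumption chosen only because they reproduce the reported 17%, 61% and 22%, and all names are hypothetical.

```python
# ASSUMED counts (not reported in the text): 23 L. lynx individuals,
# chosen only because they round to the published 17% / 61% / 22%.
genotype_counts = {"pardinus_like": 4, "canadensis_like": 14, "heterozygous": 5}


def genotype_percentages(counts):
    """Percentage of individuals in each genotype class, rounded to integers."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


def insertion_frequencies(counts):
    """Per-chromosome frequency of each insertion pattern (diploid individuals)."""
    chromosomes = 2 * sum(counts.values())
    pard = 2 * counts["pardinus_like"] + counts["heterozygous"]
    can = 2 * counts["canadensis_like"] + counts["heterozygous"]
    return {"pardinus_like": pard / chromosomes,
            "canadensis_like": can / chromosomes}


print(genotype_percentages(genotype_counts))
print(insertion_frequencies(genotype_counts))
```

Under these assumed counts the L. canadensis-like pattern would sit on roughly 72% of sampled chromosomes; the actual figure depends on the true counts.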

Given the rapid divergence of the three lynx species and improbability of hybridization or introgression, incomplete lineage sorting presumably generated the

CanSINE distribution at scaffold 106265. While generally considered a source of error in

SINE-based analyses, the extent to which incomplete lineage sorting actually impairs evolutionary inference is debatable (Churakov et al. 2009). Of the 45 orthologous insertion sites described here, only one is affected by incomplete lineage sorting, a rare phenomenon.

Furthermore, the SINE distributions described here present a more composite depiction of the nearly simultaneous divergence of three lynx species. One may infer that the most recent common ancestor of the three lynx species was host to two unfixed insertions and diverged so rapidly approximately 1.5 MYA that L. lynx, with its relatively large habitat and effective population size, maintained both insertion sites while either one locus or the

other was fixed in L. canadensis and L. pardinus. Thus, SINE based analyses convey complex speciation events that are not immediately apparent in phylogenetic reconstructions using other types of molecular data.

Excision of a SINE locus

SINE insertions are generally considered unidirectional characters free from reversal (Hillis 1999; Ray et al. 2006). However, within Felidae there is evidence for near perfect excision of a SINE locus in the puma (Puma concolor). Sequencing of 17

individuals representing multiple P. concolor subspecies revealed an 18 bp fragment that

aligns to a reverse-oriented SINE in place of the expected 220 bp CanSINE and 63 bp of

upstream sequence (Figure 5). Each of the eight cheetah (Acinonyx jubatus ) and

jaguarundi (P. yagouaroundi ) individuals representing the P. concolor sister species

possess the insertion, indicating the deletion occurred and became fixed after the P.

yagouaroundi and P. concolor divergence approximately 4 MYA but prior to the

expansion of the extant subspecies.

Potential mechanisms for SINE excision include deletion via inter- or intra-chromosomal

recombination between insertions of the same SINE family (Jurka et al.

2007) or between flanking target site duplications (TSDs) (Callinan et al. 2005; van de

Lagemaat et al. 2005). Although the mechanism for SINE excision in P. concolor is

unclear, the inverted 18bp may be the vestige of a recombination event, or the simple

repeats that surround the integration site may have formed a loop structure that was omitted

during DNA replication (Figure 5). In either case, similar to the thoroughly examined

primate Alu ’s (Callinan et al. 2005; van de Lagemaat et al. 2005), CanSINEs are indeed

subject to rare instances of removal (1 excision out of 1,339 synapomorphic insertions).

In summary, comparative genomics methods provide a means to efficiently identify

SINE loci and examine the patterns of proliferation, insertion, and degradation. Felidae

SINEs may be categorized into subfamilies based on sequence structure and taxonomic

placement. Patterns of insertion also support species designations, affirming SINEs as

systematic markers. In addition, complex evolutionary and genomic processes including

incomplete lineage sorting following rapid species divergence and SINE-mediated genome

rearrangement are described empirically. Ongoing and future large-scale comparative

genomic screening across mammals will further define the evolutionary impacts of SINEs.

Methods

In Silico Localization of Insertion Loci

Previously, feliform specific SINE insertion loci were mapped to F. catus chromosomes using a combination of computational tools. First, RepeatMasker

(www.repeatmasker.org, version open-3.1.0 with Cross_match version 0.990329, RepBase

Update 10.04, RM database version 20050523) was used to identify instances of feliform specific CanSINE insertions (SINEC_Fc and SINEC_Fc2) in the 1.9X F. catus genome assembly (Pontius et al. 2007b). From the resulting list of 322 felid SINEs, select loci were retrieved from the cat genome assembly on the UCSC genome browser (genome.ucsc.edu).

In the ‘Browser’ view, the status of particular SINE insertions in F. catus and C. familiaris

was determined by comparing two alignment tracks. The ‘Conservation’ track displays an

alignment of homologous segments from dog, mouse and human. The ‘Repeating

Elements’ track aligns known repetitive elements to the input sequence (Supplemental

Figure 2). Regions found to contain feliform specific CanSINEs in F. catus with

homologous flanking sequence in both F. catus and C. familiaris were selected for

amplification and sequencing across Feliformia.
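The first screening step, pulling the SINEC_Fc and SINEC_Fc2 annotations out of RepeatMasker output, can be sketched as follows. This is a hedged illustration rather than the pipeline used in this study: it assumes the standard whitespace-delimited RepeatMasker `.out` column order (score, divergence, deletions, insertions, query name, query begin, query end, bases left, strand, repeat name, class/family, ...), and the sample rows, scaffold names and function name are invented for the example.

```python
TARGET_REPEATS = {"SINEC_Fc", "SINEC_Fc2"}

# Invented rows laid out like a RepeatMasker .out file (illustration only).
SAMPLE_OUT = """\
   SW  perc perc perc  query      position in query       matching   repeat        position in repeat
score  div. del. ins.  sequence   begin   end   (left)    repeat     class/family  begin  end (left)  ID

 1320 15.6  6.2  0.0  scaffold_1    100    320  (5000) +  SINEC_Fc   SINE/tRNA         1  220    (0)   1
  850 20.1  1.0  2.0  scaffold_1   1000   1300  (4020) C  L1_Carn1   LINE/L1       (100)  900    600   2
 1100 10.0  0.0  0.0  scaffold_2     50    270  (9000) C  SINEC_Fc2  SINE/tRNA       (0)  221      1   3
"""


def feliform_sine_hits(out_text, targets=TARGET_REPEATS):
    """Return (query, begin, end, strand, repeat_name) for each target repeat."""
    hits = []
    for line in out_text.splitlines():
        fields = line.split()
        # Data rows start with an integer alignment score; header rows do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        if fields[9] in targets:
            hits.append((fields[4], int(fields[5]), int(fields[6]),
                         fields[8], fields[9]))
    return hits


for hit in feliform_sine_hits(SAMPLE_OUT):
    print(hit)
```

The retained coordinates could then be widened by the flanking sequence needed for primer design; that step is not shown.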

Alignment regions were extended 100-300 base pairs on either side of the insertion

site, so that flanking forward and reverse PCR primers could be designed. Primers were

optimized to amplify 60 SINE insertion regions present in F. catus and absent in the homologous region of C. familiaris . The UCSC genome scaffold annotations were matched

to corresponding cat chromosome locations using the F. catus genome browser, GARfield

(http://lgd.abcc.ncifcrf.gov/cgi-bin/gbrowse/cat/ ) (Pontius and O'Brien 2007). Within the

context of this study each region is named for the UCSC genome browser scaffold from

which the reference sequence was obtained (Supplemental Table 1).

DNA Specimens

Distribution of SINE insertion loci was ascertained in at least one individual

representing nearly every Felidae species. Samples from the domestic cat lineage were

domestic cat ( Felis catus ), Chinese desert cat ( F. bieti ), jungle cat ( F. chaus ), European

wildcat ( F. silvestris ), African wildcat ( F. libyca ), black-footed cat ( F. nigripes ), and sand

cat (F. margarita). The Panthera lineage species sampled are leopard (Panthera pardus),

tiger ( P. tigris ), jaguar (P. onca ), snow leopard ( P. uncia ), lion ( P. leo ), and clouded leopard

(Neofelis nebulosa). The puma lineage includes the puma (Puma concolor), jaguarundi (P.

yagouaroundi ) and cheetah ( Acinonyx jubatus ). The lynx lineage is comprised of bobcat

(Lynx rufus), Siberian lynx ( L. lynx ), Canadian lynx ( L. canadensis ), and Iberian lynx ( L.

pardinus ). The Asian leopard cat lineage consists of the Asian leopard cat ( Prionailurus

bengalensis ), fishing cat ( P. viverrina ), the rusty-spotted cat ( P. rubiginosa ), flat-headed cat

(P. planiceps) and the pallas cat (Otocolobus manul). The caracal lineage is represented by

caracal (Profelis caracal), African golden cat (P. aurata) and serval (P. serval). The bay cat

(Pardofelis badia), Asian golden cat (P. temmincki), and marbled cat (P. marmorata) represent the bay cat lineage. The ocelot lineage includes the ocelot (Leopardus pardalis), tigrina (L. tigrina), margay (L. wiedii), pampas cat (L. colocolo), kodkod (L. guigna), and

Geoffroy's cat ( L. geoffroyi ). The critically endangered Andean Mountain Cat, Leopardus jacobita , is the only Felidae species not sampled due to difficulties in obtaining a high quality biological sample.

Representative samples from 5 additional Feliformia families were also obtained.

The taxonomy of Feliformia was modified in a recent study by Eizirik et al. (2010), which will be used as the reference terminology for this study. Feliformia species sampled in this study are Prionodon linsang, of the Prionodontidae family, hyena , of the Hyaenidae

family, hirtula of the Herpestidae family, Cryptoprocta ferox of the Eupleridae

family and Genetta genetta from the Viverridae family (Supplemental Table 2). The

domestic dog, C. familiaris, of the Caniformia suborder was the outgroup taxon. Commercial

genomic DNA from F. catus and C. familiaris was purchased from EMBD Biosciences

Product No: 69235 and 69234 respectively.

PCR, Sequencing and Cloning

Approximately 20ng of extracted genomic DNA was used in each PCR reaction.

All reactions consisted of 0.1U of AmpliTaq DNA polymerase, 0.75µM forward and reverse

primer, 2.5mM MgCl2, 0.2mM of each deoxynucleotide triphosphate and the appropriate

amount of 10X AmpliTaq Buffer II and water for a 20µL reaction. Touchdown PCR

conditions were 5 min at 94°C, 10 cycles of 30 sec at 94°C, 30 sec at 63°C* and 60 sec at

72°C, with a decrease in the annealing temperature (*) at a rate of 0.5°C per cycle,

followed by 30 cycles of 30sec at 95°C, 30sec at 58°C** and 60sec at 72°C, then a final

elongation step of 7min at 72°C. Initial annealing temperatures (*) were set to 5°C warmer

than the final annealing temperature (**). Final annealing temperatures varied from

50-64°C depending on the primer set.
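The touchdown program described above fixes the annealing temperature of every cycle once the final annealing temperature is chosen. The sketch below lists those temperatures; it assumes, as the text states, 10 touchdown cycles with a 0.5°C drop per cycle starting 5°C above the final temperature, followed by 30 cycles at the final temperature. The function and parameter names are hypothetical.

```python
def touchdown_annealing_schedule(final_anneal_c, touchdown_cycles=10,
                                 step_c=0.5, plateau_cycles=30):
    """Return the annealing temperature (°C) used in each PCR cycle.

    The first touchdown cycle starts touchdown_cycles * step_c above the
    final annealing temperature and drops by step_c every cycle.
    """
    start = final_anneal_c + touchdown_cycles * step_c
    touchdown = [start - i * step_c for i in range(touchdown_cycles)]
    plateau = [float(final_anneal_c)] * plateau_cycles
    return touchdown + plateau


# Example for a primer set with a 58°C final annealing temperature.
schedule = touchdown_annealing_schedule(58)
print(schedule[:10])   # touchdown phase
print(len(schedule))   # total number of cycles
```

For primer sets at the ends of the reported range, passing 50 or 64 as the final temperature gives the corresponding schedules.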

To confirm amplification and assess the sizes of DNA fragments, 5µL of PCR product was fractionated by gel electrophoresis in a 1.0% agarose gel containing ethidium bromide (Supplemental Figure 1). Prior to cloning or sequencing, 20µL of PCR product was purified using the ExoSAP protocol with 0.72µL Shrimp Alkaline Phosphatase (SAP) and 1.44µL Exonuclease I (ExoI) (Amersham Pharmacia, Piscataway, NJ).

Cycle sequencing reactions consisted of 0.25U BigDye® Terminator v3.1 Ready Reaction

Mix, 0.075µM primer, 5µL of sequencing buffer (Applied Biosystems), 1.5µL of purified

PCR product and enough water for a 10µL reaction. Cycle sequencing was performed under the following conditions: 94°C for 10 sec, 52°C for 5 sec, and 72°C for 2 min for 45 cycles. Products from cycle sequencing reactions were run on an ABI 3730 DNA

Analyzer. Sequence results were visualized and edited with Sequencher v4.8 (GeneCodes).

Multiple gel electrophoresis bands or illegible preliminary sequencing traces were resolved by cloning PCR amplification products with the TOPO TA Cloning Kit (Invitrogen) followed by purification with the Qiagen GeneClean Kit according to manufacturer’s instructions. Purified fragments were then sequenced in a cycle sequencing reaction consisting of 0.25U BigDye® Terminator v3.1 Ready Reaction Mix, 1µL of forward or reverse M13 primer provided in the TOPO TA Cloning Kit, 5µL of sequencing buffer

(Applied Biosystems), 2.5µL of purified PCR product and enough water for a 10µL

reaction. Cycle sequencing was performed under the following conditions: 94°C for 10 sec,

52°C for 5 sec, and 72°C for 4 min for 45 cycles. Products from cycle sequencing were run

on an ABI 3730 DNA Analyzer. Sequence results were visualized and edited with

Sequencher v4.8 (GeneCodes).

Determining SINE Presence or Absence

A specific SINE insertion site is delimited by the exact sequence of the 6-10 base

pair target site duplication (TSD). If a SINE is present the amplification product will

include the forward primer sequence, 5’ genomic sequence, a TSD, the SINE element, the

same TSD, 3’ genomic sequence and the reverse primer sequence. If a SINE is absent, the

amplification product will include the primer sequences plus 5’ and 3’ genomic sequence

bracketing one copy of the TSD sequence. Note that the absence of any PCR product

signifies amplification failure and does not imply that the SINE is absent in the

homologous region of a non-model species. Thus, the criteria for successful amplification of a locus

are 1) PCR products from F. catus include the target SINE insertion and therefore are about

200-400 base pairs larger than the amplification products of C. familiaris that lack the target SINE insertion; 2) the sequence of the TSD can be determined by examining sequence traces of F. catus and C. familiaris; and 3) PCR products yield sufficiently legible sequences such that SINE presence or absence at the TSD can be ascertained in at least

80% of the sample taxa. All final sequences will be submitted to GenBank.
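The presence/absence logic described above (two TSD copies bracketing the element versus a single TSD copy) can be expressed as a small scoring function. This is a simplified sketch, not the scoring procedure actually applied to the sequence traces: it assumes exact TSD matches, treats any other outcome as undetermined, and the threshold, function name and toy sequences are hypothetical.

```python
def classify_amplicon(amplicon, tsd, min_insert=100):
    """Score a sequenced PCR product as 'present', 'absent' or 'undetermined'.

    present      : two exact TSD copies separated by at least min_insert bases
    absent       : exactly one TSD copy (the empty, pre-integration site)
    undetermined : no TSD copy, or two copies that are too close together
    """
    amplicon, tsd = amplicon.upper(), tsd.upper()
    first = amplicon.find(tsd)
    if first == -1:
        return "undetermined"
    second = amplicon.find(tsd, first + len(tsd))
    if second == -1:
        return "absent"
    gap = second - (first + len(tsd))
    return "present" if gap >= min_insert else "undetermined"


# Toy sequences (not real loci): flank + TSD + 220 bp filler + TSD + flank.
FLANK5, FLANK3, TSD = "ACGTACGTAC", "TGCATGCATG", "AAGTCTGA"
filled_site = FLANK5 + TSD + "GC" * 110 + TSD + FLANK3
empty_site = FLANK5 + TSD + FLANK3

print(classify_amplicon(filled_site, TSD))
print(classify_amplicon(empty_site, TSD))
```

Real traces would need tolerance for sequencing errors and diverged TSD copies, for example by approximate matching, which this sketch leaves out.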

Evolutionary Analysis

The tRNA related regions of sequences representing each full-length SINE insertion locus were aligned using the MAFFT algorithm implemented in the Geneious software package (Drummond et al. 2009; Katoh et al. 2009). Phylogenetic analyses were

performed using neighbor-joining, maximum parsimony and maximum likelihood.

Neighbor-joining and maximum parsimony methods were implemented in PAUP with equal character weighting among sites and substitution types (Swofford 2003). The TrN+G

(Tamura-Nei plus gamma) model was selected as the optimal nucleotide substitution model for likelihood analyses using Modeltest with the AIC criterion (Posada and Buckley 2004;

Posada and Crandall 2001). Maximum likelihood was implemented in GARLI through the

Lattice Project Grid computing system using the general time reversible model (nearest option to TrN) and a gamma distribution to account for among-site rate variation (Bazinet and Cumming 2008; Zwickl 2006). Bootstrap support values for all three analyses were obtained from 1000 repetitions. Genetic distances were estimated using average pair-wise percent identities calculated by the Geneious software package (Drummond et al. 2009).
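An average pair-wise percent identity of the kind reported here can be computed directly from aligned sequences. The sketch below uses one simple definition, which is an assumption and may differ in detail from the Geneious calculation: columns where both sequences have a gap are skipped, and a gap opposite a base counts as a mismatch. Function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(aligned):
    """Average percent identity over all pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Toy alignment for illustration only.
print(mean_pairwise_identity(["ACGT", "ACGA", "ACGT"]))
```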

Mutation rates were calculated using the equation: m / (t*n*l) where m= the total number of mutations across an alignment, t= average estimated time since the last common ancestor, n= number of taxa and l= alignment length.
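The rate equation m / (t*n*l) translates directly into code. In the sketch below the function name is hypothetical and the input values are invented solely to show the units (mutations per site per year); they are not data from this study.

```python
def mutation_rate(m, t_years, n_taxa, alignment_length):
    """Mutations per site per year: m / (t * n * l).

    m                : total number of mutations across the alignment
    t_years          : average estimated time since the last common ancestor
    n_taxa           : number of taxa in the alignment
    alignment_length : alignment length in sites
    """
    return m / (t_years * n_taxa * alignment_length)


# Invented inputs: 100 mutations, 50 million years, 10 taxa, 200 sites.
rate = mutation_rate(100, 50e6, 10, 200)
print(f"{rate:.2e} mutations per site per year")
```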

Target Insertion Regions             Additional Insertion
Scaffold  Distribution               Scaffold  Distribution
4743      Feliformia                 106265    Iberian lynx, Eurasian lynx
95036     Feliformia                 106265    Canadian lynx, Eurasian lynx
95892     Feliformia                 106265    Marbled cat
99463     Feliformia                 134292    Tiger, Snow leopard
101187    Feliformia                 131713
106265    Feliformia                 204133    Eurasian lynx
115304    Felidae/Prionodontidae     122236    Pampas cat
117885    Feliformia                 145617    Marbled cat
122236    Felidae*                   164336    Flat-headed cat
131713    Feliformia                 99463     Black-footed cat
134292    Feliformia                 101187    Serval
135256    Felidae*                   101187    Jaguarundi
136949    Feliformia                 199572    Hyaena
136982    Feliformia                 199572    Clouded leopard
145617    Feliformia
145754    Feliformia                 Deletion
150913    Feliformia                 174511    Puma
151787    Feliformia
158776    Feliformia
164336    Feliformia
170176    Feliformia
174511    Feliformia
179189    Feliformia
180515    Feliformia
194731    Feliformia
199572    Feliformia
203536    Feliformia
204133    Feliformia
212906    Feliformia
213652    Feliformia
217179    Feliformia

Table 1: Twenty-eight of the 31 target CanSINE insertions examined are conserved throughout Feliformia. * Presence of loci 122236 and 135256 in Prionodontidae could not be determined, due to non-amplification of the P. linsang sample. Fourteen additional species and sister-species specific insertions were located. One apparent deletion of a

SINE was found in the puma, P. concolor.

Figure 1: CanSINE insertion events are mapped onto the Feliformia/Felidae species tree adapted from Johnson et al. 2006 (branch lengths not drawn to scale). Conserved insertion events corresponding to the 31 original target CanSINE loci are indicated by arrow boxes.

The 14 additional species and sister-species specific insertions are indicated by triangles.

One SINE loss was observed in puma (P. concolor), denoted by a black triangle.

* Eurasian lynx (Lynx lynx) polymorphic insertion sites that are shared with either Iberian lynx (L. pardinus) or Canadian lynx (L. canadensis).

Figure 2: A) Alignment of tRNA-related regions from feliform CanSINE insertions. Each sequence is identified by the genomic scaffold number, as annotated on the UCSC genome browser, and by the species or clade in which the insertion is found. Feliform specific CanSINE subfamilies are defined by the age of insertion and characteristic sequence motifs. Further, each subfamily may be divided into two subtypes based on time of insertion and sequence.

B) A minimum evolution phylogeny of feliform CanSINEs, obtained from the aligned tRNA-related regions, depicts two SINE subfamilies (I and II) with internal clades of subtypes A and B respectively. Numbers in bold indicate bootstrap support scores based on 1000 pseudo-replicates from minimum evolution/parsimony/likelihood optimizations.

Figure 3: Alignment of Lynx lineage individuals at locus 106265 reveals a conserved

CanSINE in all species (bp 51-217), a CanSINE immediately adjacent that is fixed in L.

pardinus and polymorphic in L. lynx (bp 231-466), and a different CanSINE insertion (bp

782-1003) fixed in L. canadensis and polymorphic in L. lynx . Each insertion has unique

TSDs, as well as distinct SNPs throughout the SINE sequence.

Figure 4: Alternate hypotheses of lynx phylogeny. Hypothesis 1 based on maximum-likelihood reconstruction using 18,853 bp of nuclear DNA (Johnson et al. 2006).

Hypothesis 2 based on SINE insertion patterns in voucher individuals representing each of the lynx species.

Figure 5: Loss of the target CanSINE insertion and 68 adjacent nucleotides (bp 99-379) from P. concolor and replacement by an 18 nucleotide sequence (bp 81-98) similar in primary sequence to a CanSINE in the opposite orientation.

Literature Cited

Agnarsson, I., M. Kuntner, and L. May-Collado. 2010. Dogs, cats, and kin: A molecular

species-level phylogeny of Carnivora. Molecular Phylogenetics and Evolution 54:

726-745.

Bazinet, A. and M. Cumming. 2008. The lattice project: a grid research and production

environment combining multiple grid computing models. Distributed & Grid

Computing- Science Made Transparent for Everyone. Principles, Applications and

Supporting Communities. : 2-13.

Callinan, P.A., J. Wang, S.W. Herke, R.K. Garber, P. Liang, and M.A. Batzer. 2005. Alu

retrotransposition-mediated deletion. J Mol Biol 348: 791-800.

Churakov, G., J. Kriegs, R. Baertsch, A. Zemann, J. Brosius, and J. Schmitz. 2009. Mosaic

retrotransposon insertion patterns in placental mammals. Genome Res 19: 868-875.

Coltman, D.W. and J.M. Wright. 1994. Can SINEs: a family of tRNA-derived retroposons

specific to the superfamily Canoidea. Nucl. Acids Res. 22: 2726-2730.

Drummond, A., B. Ashton, M. Cheung, J. Heled, M. Kearse, R. Moir, S. Stones-Havas, T.

Thierer, and A. Wilson. 2009. Geneious v4.7.

Eizirik, E., W.J. Murphy, K.P. Koepfli, W.E. Johnson, J. Dragoo, R.K. Wayne, and S.J.

O'Brien. 2010. Pattern and timing of diversification of the mammalian order

Carnivora inferred from multiple nuclear gene sequences. Molecular Phylogenetics

and Evolution 56: 49-63.

Gentles, A.J., O. Kohany, and J. Jurka. 2005. Evolutionary diversity and potential

recombinogenic role of integration targets of non-LTR retrotransposons. Mol Biol

Evol 22: 1983-1991.

Hillis, D.M. 1999. SINEs of the perfect character. PNAS 96: 9979-9981.

Johnson, W.E., E. Eizirik, J. Pecon-Slattery, W.J. Murphy, A. Antunes, E. Teeling, and S.J.

O'Brien. 2006. The late Miocene radiation of modern Felidae: a genetic assessment.

Science 311: 73-77.

Johnson, W.E., J.A. Godoy, F. Palomares, M. Delibes, M. Fernandes, E. Revilla, and S.J.

O'Brien. 2004. Phylogenetic and phylogeographic analysis of Iberian lynx

populations. Journal of Heredity 95: 19-28.

Jurka, J. 1997. Sequence patterns indicate an enzymatic involvement in integration of

mammalian retroposons. PNAS 94: 1872-1877.

Jurka, J. 2004. Evolutionary impact of human Alu repetitive elements. Current Opinion in

Genetics and Development 14: 603-608.

Jurka, J., V. Kapitonov, O. Kohany, and M.V. Jurka. 2007. Repetitive sequences in

complex genomes: structure and evolution. Annu. Rev. Genomics Hum. Genet. 8:

241-259.

Jurka, J., O. Kohany, A. Pavlicek, V. Kapitonov, and M.V. Jurka. 2005. Clustering,

duplication and chromosomal distribution of mouse SINE retrotransposons.

Cytogenetic and Genome Research 110: 117-123.

Kajikawa, M. and N. Okada. 2002. LINEs mobilize SINEs in the eel through a shared 3'

sequence. Cell 111: 433 - 444.

Katoh, K., G. Asimenos, and H. Toh. 2009. Multiple alignment of DNA sequences and

MAFFT. Methods Mol Biol 537: 39-64.

Kumar, S. and S. Subramanian. 2002. Mutation rates in mammalian genomes. PNAS 99:

803-808.

Kurten, B. and E. Granqvist. 1987. Fossil pardel lynx ( Lynx pardina spelaea Boule ) from a

cave in southern France. Ann Zool Fennici 24: 39-43.

Lindblad-Toh, K. C.M. Wade T.S. Mikkelsen E.K. Karlsson D.B. Jaffe M. Kamal M.

Clamp J.L. Chang E.J. Kulbokas, 3rd M.C. Zody E. Mauceli X. Xie M. Breen R.K.

Wayne E.A. Ostrander C.P. Ponting F. Galibert D.R. Smith P.J. DeJong E.

Kirkness P. Alvarez T. Biagi W. Brockman J. Butler C.W. Chin A. Cook J. Cuff

M.J. Daly D. DeCaprio S. Gnerre M. Grabherr M. Kellis M. Kleber C. Bardeleben

L. Goodstadt A. Heger C. Hitte L. Kim K.P. Koepfli H.G. Parker J.P. Pollinger

S.M. Searle N.B. Sutter R. Thomas C. Webber J. Baldwin A. Abebe A. Abouelleil

L. Aftuck M. Ait-Zahra T. Aldredge N. Allen P. An S. Anderson C. Antoine H.

Arachchi A. Aslam L. Ayotte P. Bachantsang A. Barry T. Bayul M. Benamara A.

Berlin D. Bessette B. Blitshteyn T. Bloom J. Blye L. Boguslavskiy C. Bonnet B.

Boukhgalter A. Brown P. Cahill N. Calixte J. Camarata Y. Cheshatsang J. Chu M.

Citroen A. Collymore P. Cooke T. Dawoe R. Daza K. Decktor S. DeGray N.

Dhargay K. Dooley P. Dorje K. Dorjee L. Dorris N. Duffey A. Dupes O.

Egbiremolen R. Elong J. Falk A. Farina S. Faro D. Ferguson P. Ferreira S.

M. FitzGerald K. Foley C. Foley A. Franke D. Friedrich D. Gage M. Garber G.

Gearin G. Giannoukos T. Goode A. Goyette J. Graham E. Grandbois K. Gyaltsen

N. Hafez D. Hagopian B. Hagos J. Hall C. Healy R. Hegarty T. Honan A. Horn N.

Houde L. Hughes L. Hunnicutt M. Husby B. Jester C. Jones A. Kamat B. Kanga C.

Kells D. Khazanovich A.C. Kieu P. Kisner M. Kumar K. Lance T. Landers M. Lara

W. Lee J.P. Leger N. Lennon L. Leuper S. LeVine J. Liu X. Liu Y. Lokyitsang T.

Lokyitsang A. Lui J. Macdonald J. Major R. Marabella K. Maru C. Matthews S.

McDonough T. Mehta J. Meldrim A. Melnikov L. Meneus A. Mihalev T. Mihova

K. Miller R. Mittelman V. Mlenga L. Mulrain G. Munson A. Navidi J. Naylor T.

Nguyen N. Nguyen C. Nguyen R. Nicol N. Norbu C. Norbu N. Novod T. Nyima P.

Olandt B. O'Neill K. O'Neill S. Osman L. Oyono C. Patti D. Perrin P. Phunkhang F.

Pierre M. Priest A. Rachupka S. Raghuraman R. Rameau V. Ray C. Raymond F.

Rege C. Rise J. Rogers P. Rogov J. Sahalie S. Settipalli T. Sharpe T. Shea M.

Sheehan N. Sherpa J. Shi D. Shih J. Sloan C. Smith T. Sparrow J. Stalker N.

Stange-Thomann S. Stavropoulos C. Stone S. Stone S. Sykes P. Tchuinga P.

Tenzing S. Tesfaye D. Thoulutsang Y. Thoulutsang K. Topham I. Topping T.

Tsamla H. Vassiliev V. Venkataraman A. Vo T. Wangchuk T. Wangdi M. Weiand

J. Wilkinson A. Wilson S. Yadav S. Yang X. Yang G. Young Q. Yu J. Zainoun L.

Zembek A. Zimmer and E.S. Lander. 2005. Genome sequence, comparative

analysis and haplotype structure of the domestic dog. Nature 438: 803-819.

Liu, G., C. Alkan, L. Jiang, S. Zhao, and E.E. Eichler. 2009. Comparative analysis of Alu

repeats in primate genomes. Genome Res 19: 876-885.

Lum, J.K., M. Nikaido, M. Shimamura, S. Hidetoshi, A.M. Shedlock, N. Okada, and M.

Hasegawa. 2000. Consistency of SINE insertion topology and flanking sequence

tree: quantifying relationships among Cetartiodactyls. Mol Biol Evol 17: 1417-1424.

Meli, M.L., V. Cattori, F. Martínez, G. López, A. Vargas, M.A. Simón, I. Zorrilla, A.

Muñoz, F. Palomares, J.V. López-Bao, J. Pastor, R. Tandon, B. Willi, R. Hofmann-Lehmann,

and H. Lutz. 2009. Feline leukemia virus and other pathogens as

important threats to the survival of the critically endangered Iberian Lynx (Lynx

pardinus). PLOS One 4: e4744.

Miyamoto, M.M. 1999. Molecular systematics: Perfect SINEs of evolutionary history?

Current Biology 9: 816-819.

Nikaido, M., F. Matsuno, H. Hamilton, R.L. Brownell, Jr., Y. Cao, W. Ding, Z. Zuoyan,

A.M. Shedlock, R.E. Fordyce, M. Hasegawa, and N. Okada. 2001. Retroposon

analysis of major cetacean lineages: The monophyly of toothed whales and the

paraphyly of river dolphins. PNAS 98: 7384-7389.

Nikaido, M., A.P. Rooney, and N. Okada. 1999. Phylogenetic relationships among

cetartiodactyls based on insertions of short and long interspersed elements:

Hippopotamuses are the closest extant relatives of whales. PNAS 96: 10261-10266.

Nishihara, H., S. Kuno, M. Nikaido, and N. Okada. 2007. MyrSINEs: A novel SINE family

in the anteater genomes. Gene 400: 98-103.

Nishihara, H., Y. Satta, M. Nikaido, J.G.M. Thewissen, M. Stanhope, and N. Okada. 2005.

Retrotransposon analysis of Afrotherian phylogeny. Mol Biol Evol 22: 1823-1833.

Ohshima, K. and N. Okada. 1994. Generality of the tRNA origin of short interspersed

repetitive elements (SINEs): characterization of three different tRNA-derived

retroposons in the octopus. Journal of Molecular Biology 243: 25-37.

Okada, N., A.M. Shedlock, and M. Nikaido. 2004. Retroposon mapping in molecular

systematics. In Mobile Genetic Elements: protocols and genomic applications (eds.

W. Miller and P. Capy), pp. 189-226. Humana Press Inc.

Pecon-Slattery, J., W.J. Murphy, and S.J. O'Brien. 2000. Patterns of diversity among SINE

elements isolated from three Y-chromosome genes in carnivores. Mol Biol Evol 17:

825-829.

Piris, A. and M. Fernandes. 2003. Last in Portugal? Molecular approaches in a pre-

extinction scenario. Conservation Genetics 4: 525-532.

Pontius, J., J. Mullikin, D. Smith, K. Lindblad-Toh, S. Gnerre, M. Clamp, J. Chang, R.

Stephens, B. Neelam, N. Volfovsky, A. Schaffer, R. Agarwala, K. Narfstrom, W.

Murphy, U. Giger, A. Roca, A. Antunes, M. Menotti-Raymond, N. Yuhki, J.

Pecon-Slattery, and W. Johnson. 2007a. Initial sequence and comparative analysis

of the cat genome. Genome Res 17: 1675-1689.

Pontius, J.U., J.C. Mullikin, D.R. Smith, K. Lindblad-Toh, S. Gnerre, M. Clamp, J. Chang,

R. Stephens, B. Neelam, N. Volfovsky, A.A. Schaffer, R. Agarwala, K. Narfstrom,

W.J. Murphy, U. Giger, A.L. Roca, A. Antunes, M. Menotti-Raymond, N. Yuhki, J.

Pecon-Slattery, W.E. Johnson, G. Bourque, G. Tesler, and S.J. O'Brien. 2007b.

Initial sequence and comparative analysis of the cat genome. Genome Res 17:

1675-1689.

Pontius, J.U. and S.J. O'Brien. 2007. Genome annotation resource fields—GARFIELD: a

genome browser for Felis catus. Journal of Heredity 98: 386-389.

Posada, D. and T.R. Buckley. 2004. Model selection and averaging in phylogenetics:

advantages of Akaike information criterion and Bayesian approaches over

likelihood ratio tests. Syst. Biol 53: 793-818.

Posada, D. and K.A. Crandall. 2001. Selecting the best-fit model of nucleotide substitution.

Syst. Biol 50: 580-601.

Ray, D.A. 2007. SINEs of progress: Mobile element applications to molecular ecology.

Molecular Ecology 16: 19-33.

Ray, D.A. and M.A. Batzer. 2005. Tracking Alu evolution in new world primates. BMC

Evolutionary Biology 5.

Ray, D.A., J. Xing, A.H. Salem, and M.A. Batzer. 2006. SINEs of a nearly perfect

character. Syst. Biol 55: 928-935.

Roach, J.C., G. Glusman, A. Smit, C.D. Huff, R. Hubley, P.T. Shannon, L. Rowen, K.

Pant, N. Goodman, M. Bamshad, J. Shendure, D. Radoje, L.B. Jorde, L. Hood, and

D.J. Galas. 2010. Analysis of genetic inheritance in a family quartet by whole-

genome sequencing. Science 328: 636-639.

Schröder, C., C. Bleidorn, S. Hartmann, and R. Tiedemann. 2009. Occurrence of Can-

SINEs and intron sequence evolution supports robust phylogeny of pinniped

carnivores and their terrestrial relatives. Gene 448: 221-226.

Shimamura, M., H. Abe, M. Nikaido, K. Ohshima, and N. Okada. 1999. Genealogy of

families of SINEs in cetaceans and artiodactyls: the presence of a huge superfamily

of tRNA-Glu derived families of SINEs. Mol Biol Evol 16: 1046-1060.

Swofford, D. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods)

version 4. Sinauer Associates.

Takahashi, K., Y. Terai, M. Nishida, and N. Okada. 2001. Phylogenetic relationships and

ancient incomplete lineage sorting among cichlid fishes in Lake Tanganyika as

revealed by analysis of the insertion of retroposons. Mol Biol Evol 18: 2057-2066.

van de Lagemaat, L.N., L. Gagnier, P. Medstrand, and D.L. Mager. 2005. Genomic

deletions and precise removal of transposable elements mediated by short identical

DNA segments in primates. Genome Res 15: 1243-1249.

Yu, L. and Y.-p. Zhang. 2005. Evolutionary implications of multiple SINE insertions in an

intronic region from diverse mammals. Mammalian Genome 16: 651-660.

Zehr, S.M., M.A. Nedbal, and J.J. Flynn. 2001. Tempo and mode of evolution in an

orthologous Can SINE. Mammalian Genome 12: 38-44.

Zhao, F., J. Qi, and S.C. Schuster. 2009. Tracking the past: interspersed repeats in an

extinct Afrotherian mammal, Mammuthus primigenius. Genome Res 19: 1384-1392.

Zwickl, D. 2006. Genetic algorithm approaches for the phylogenetic analysis of large

biological sequence datasets under maximum likelihood criterion. PhD dissertation,

The University of Texas at Austin.

| Species | Common Name | Code | Lineage | Family |
|---|---|---|---|---|
| Felis bieti | Chinese Desert Cat | FBI | Domestic Cat | Felidae |
| Felis catus | Domestic Cat | FCA | Domestic Cat | Felidae |
| Felis chaus | Jungle Cat | FCH | Domestic Cat | Felidae |
| Felis libyca | African Wild Cat | FLI | Domestic Cat | Felidae |
| Felis margarita | Desert Cat | FMA | Domestic Cat | Felidae |
| Felis nigripes | Black-footed Cat | FNI | Domestic Cat | Felidae |
| Felis silvestris | European Wild Cat | FSI | Domestic Cat | Felidae |
| Leopardus colocolo | Pampas Cat | LCO | Ocelot | Felidae |
| Leopardus geoffroyi | Geoffroy's Cat | OGE | Ocelot | Felidae |
| Leopardus guigna | Kodkod | OGU | Ocelot | Felidae |
| Leopardus jacobitus | Andean Mt. Cat | OJA | Ocelot | Felidae |
| Leopardus pardalis | Ocelot | LPA | Ocelot | Felidae |
| Leopardus tigrinus | Tigrina | LTI | Ocelot | Felidae |
| Leopardus wiedii | Margay | LWI | Ocelot | Felidae |
| Lynx canadensis | Canadian Lynx | LCA | Lynx | Felidae |
| Lynx lynx | Eurasian Lynx | LLY | Lynx | Felidae |
| Lynx pardinus | Iberian Lynx | LYP | Lynx | Felidae |
| Lynx rufus | Bobcat | LRU | Lynx | Felidae |
| Panthera leo | Lion | PLE | Panthera | Felidae |
| Panthera onca | Jaguar | PON | Panthera | Felidae |
| Panthera pardus | Leopard | PPA | Panthera | Felidae |
| Panthera tigris | Tiger | PTI | Panthera | Felidae |
| Uncia uncia | Snow Leopard | PUN | Panthera | Felidae |
| Neofelis nebulosa | Clouded Leopard | NNE | Panthera | Felidae |
| Pardofelis badia | Bay Cat | PBA | Bay Cat | Felidae |
| Pardofelis marmorata | Marbled Cat | PMA | Bay Cat | Felidae |
| Pardofelis temminckii | Asian Golden Cat | PTE | Bay Cat | Felidae |
| Otocolobus manul | Pallas Cat | OMA | Leopard Cat | Felidae |
| Prionailurus bengalensis | Asian Leopard Cat | PBE | Leopard Cat | Felidae |
| Prionailurus planiceps | Flat-headed Cat | IPL | Leopard Cat | Felidae |
| Prionailurus rubiginosus | Rusty Spotted Cat | PRU | Leopard Cat | Felidae |
| Prionailurus viverrinus | Fishing Cat | PVI | Leopard Cat | Felidae |
| Profelis aurata | African Golden Cat | PAU | Caracal | Felidae |
| Profelis caracal | Caracal | CCA | Caracal | Felidae |
| Profelis serval | Serval | LSE | Caracal | Felidae |
| Puma concolor | Puma | PCO | Puma | Felidae |
| Puma yagouaroundi | Jaguarundi | HYA | Puma | Felidae |
| Acinonyx jubatus | Cheetah | AJU | Puma | Felidae |
| Helogale hirtula | Dwarf | HPA | Other Feliformia | Herpestidae |
| Genetta genetta | Genet | GGE | Other Feliformia | Viverridae |
| Hyaena hyaena | Hyena | HHY | Other Feliformia | Hyaenidae |
| Prionodon linsang | Banded Linsang | PLI | Other Feliformia | Prionodontidae |
| Cryptoprocta ferox | Fossa | CFE | Other Feliformia | Eupleridae |

Supplemental Table 1: Every Felidae species was sampled except Andean Mountain Cat.

Five additional species, each representing different Feliformia families, were also queried.

| Scaffold | Forward Primer 5’-3’ | Reverse Primer 5’-3’ | Chromosome Position |
|---|---|---|---|
| 4743 | AGTCAGGACACCTGGTTTGG | GTGATGTTGGTAATTATATCCTCCC | C1: 124821471 - 124931865 |
| 95036 | TTGTTGCCTGGCAAG | ATAGTAAATTATGGCACAGATAA | UN36: 783726 - 786810 |
| 95892 | CTGTACCATTTTCTCTTTTCAT | TCATATATCTGACAAAGGATTA | X: 106422338 - 106431094 |
| 99463 | GTTTATTCCTTCTGGATTTCTTCAG | GAAAGTTCACAGAAAAGCTAGATGA | B4: 88534201 - 88540012 |
| 101187 | CTTAACCTCTCTGTGCC | ATCATTGAGGTAATCCCT | C2: 125094248 - 125106032 |
| 106265 | TAGGAAGATATTGTTATGGAAAG | TGAGTAATCAACCTTGGTATT | B1: 191469873 - 191484794 |
| 115304 | TCATATAACAGTCCCACTCTGG | AGCATATGAATGTGCACATG | UN3: 10045041 - 10084868 |
| 117885 | ACATGGCTGATGGAAAACT | AAGTTTGTTGTGTTCCTGAGA | UN36: 4121470 - 4126051 |
| 122236 | TCATTGGCTAAGAGGTTG | TCTCTGTTGCCAGACCTTC | UN9: 2561094 - 2563938 |
| 131713 | TTGGAAATCTCACTGTCTTTTCTC | GTGTGCTTCCATAACAAGGC | D2: 18001674 - 18212792 |
| 134292 | TGTAATGAGAAYATAAAATTTAACTATAAT | GCAAAGCTTTTAACAATTTATTC | D1: 112776152 - 113004154 |
| 135256 | GGGAGTGACTGCTAATGAGTA | TTTGAGAATGCCCATTTC | D1: 135487868 - 135678524 |
| 136949 | CAGAAAACTCCAGATTTTCTACTT | ATTTAGGTCTTTGATCCATTTT | D1: 23281897 - 23598106 |
| 136982 | AATTCTCATCTTTTCTGAATTGAG | CCTCAAGCCATCCTACAGTAA | D1: 25280168 - 25429612 |
| 145617 | AGATGCTCTCCTCACCCT | TCAGCAACCTTGGTGAGA | B3: 101408163 - 101878758 |
| 145754 | CTCTGTACTTTTGCTCAATTT | TGGCCAAGAAAGCTAATT | B3: 105227423 - 105339802 |
| 150913 | CTTCAGAACACATGCAAGG | GCTGAAACCATTATCAGTTCA | E2: 66170238 - 66723524 |
| 151787 | GTGCCAAACCACAGCCT | ATCCTGAGTCCCAAGCTGTT | UN9: 1208555 - 1276996 |
| 158776 | GTAAAATCATCCCTAAACAAATT | ATTTAAGATCAGTGGCTCATT | A3: 75423320 - 75426336 |
| 164336 | ACTAAATTTTTAAATAGCTCTAAA | TGCCCATTTTTTAACCTA | A3: 5375696 - 5567150 |
| 170176 | TCTGCCTTTTGTTCCATAGACTT | GCTGGGAAGAAGGTGAGTCT | C2: 62008241 - 62329430 |
| 174511 | CAATTTTATGAGGAAAATAATATCTCTAT | TTGGGAGGCTCATTTTCT | UN13: 9741721 - 9761046 |
| 179189 | CCAGGGCTTTAATGTAAAGTTTATT | GCTAAATGCAACCACTTGC | B1: 3729770 - 3958145 |
| 180515 | TAGTATCATCTGGAARAATGC | CTGTGGACACCTCTTATCTC | A1: 249714913 - 249755626 |
| 194731 | CTGAGCAGACCTGGATTC | TTAGAACACTTYATGAAGGT | A2: 200196491 - 200445527 |
| 199572 | ACATGCATTCCTCTTTGAAT | TAATCAATACAGGTACCAGTGACTC | D4: 33412603 - 33662084 |
| 203536 | TTTCCCAAAACTCCTTTCTTGT | ACATGTTTCCATGGCCCA | X: 69879474 - 70012608 |
| 204133 | ACCTACTCAGTAACCTAATTCTGC | GCTAATTAATTTCAAATATTTCCAT | X: 93071968 - 93626326 |
| 212906 | CCAAATGTCCATCAACAGG | GATCCTTCCCATCCCATTC | A1: 14131285 - 14217310 |
| 213652 | TGCATCTGAGAGATCAAAAAT | TCAGGGAGGGGATGTTG | E1: 57845691 - 58503085 |
| 215012 | TGAGCTGTCGGAGGC | CTTTGGTGCTCATGGATT | A2: 67400366 - 67772895 |
| 217179 | ATAGGCATTTTATTTTGAATTT | GAAAAGGAAATTCAAATGG | X: 65114741 - 65370428 |

Supplemental Table 2: PCR primers flanking 60 feliform specific CanSINE insertions.

Each primer pair is designated by the corresponding UCSC genome browser scaffold number and chromosome coordinates.

| Sample-ID | Origin | M/F | Type |
|---|---|---|---|
| L. pardinus-1 | Spain | U | 1 |
| L. pardinus-4 | Spain | F | 1 |
| L. pardinus-5 | Spain | U | 1 |
| L. pardinus-10 | Spain | U | 1 |
| L. pardinus-16 | Spain | F | 1 |
| L. pardinus-17 | Spain | U | 1 |
| L. pardinus-19 | Spain | M | 1 |
| L. pardinus-20 | Spain | U | 1 |
| L. lynx-12 | Switzerland | M | 2 |
| L. lynx-29 | Belarus | U | 1 & 2 |
| L. lynx-34 | Tibet | F | 2 |
| L. lynx-35 | Xinjiang | M | 2 |
| L. lynx-36 | Xingxia | F | 1 & 2 |
| L. lynx-37 | Xingxia | M | 2 |
| L. lynx-38 | Xinjiang | M | 1 |
| L. lynx-39 | Xinjiang | F | 2 |
| L. lynx-40 | Gansu | M | 1 |
| L. lynx-42 | Gansu | F | 2 |
| L. lynx-43 | Gansu | F | 2 |
| L. lynx-44 | Gansu | M | 1 & 2 |
| L. lynx-45 | Ningxia | F | 1 & 2 |
| L. lynx-46 | Ningxia | F | 1 |
| L. lynx-47 | Qinghai | M | 2 |
| L. lynx-48 | Qinghai | M | 2 |
| L. lynx-49 | Qinghai | F | 1 |
| L. lynx-50 | Qinghai | F | 1 & 2 |
| L. lynx-51 | Yunnan | F | 2 |
| L. lynx-53 | Russia | F | 2 |
| L. lynx-54 | Russia | F | 2 |
| L. lynx-55 | Russia | M | 2 |
| L. lynx-56 | Russia | U | 2 |
| L. canadensis-410 | Quebec | U | 2 |
| L. canadensis-901 | Maine | M | 2 |
| L. canadensis-904 | Maine | M | 2 |
| L. canadensis-905 | Maine | M | 2 |
| L. canadensis-908 | Maine | M | 2 |
| L. canadensis-909 | Maine | M | 2 |
| L. canadensis-41 | Newfoundland | U | 2 |
| L. canadensis-43 | Newfoundland | U | 2 |
| L. canadensis-47 | Newfoundland | U | 2 |
| L. canadensis-52 | Newfoundland | U | 2 |
| L. canadensis-53 | Newfoundland | U | 2 |
| L. canadensis-55 | Newfoundland | U | 2 |
| L. canadensis-56 | Newfoundland | U | 2 |
| L. canadensis-57 | Newfoundland | U | 2 |
| L. canadensis-62 | Newfoundland | U | 2 |
| L. canadensis-299 | Quebec | U | 2 |
| L. canadensis-307 | Quebec | U | 2 |
| L. canadensis-308 | Quebec | U | 2 |
| L. canadensis-315 | Quebec | U | 2 |
| L. canadensis-318 | Quebec | U | 2 |
| L. canadensis-319 | Quebec | U | 2 |
| L. canadensis-322 | Quebec | U | 2 |

| Species | Type 1 | Type 1 & 2 | Type 2 |
|---|---|---|---|
| L. canadensis | 0% | 0% | 100% |
| L. lynx | 17% | 22% | 61% |
| L. pardinus | 100% | 0% | 0% |

Supplemental Table 3: Distribution of informative CanSINE insertion sites among Lynx pardinus, L. lynx, and L. canadensis individuals. Presence of SINE 1 (S1) and SINE 2 (S2) does not appear to be correlated with geographic location in L. lynx.
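The L. lynx summary percentages in Supplemental Table 3 can be re-derived from the per-individual scores. The short Python sketch below is illustrative only: the dictionary simply transcribes the 23 L. lynx rows of the table, and the names `LYNX_LYNX_TYPES` and `type_percentages` are hypothetical helpers, not part of the original analysis.

```python
from collections import Counter

# Insertion type scored for each L. lynx individual in Supplemental Table 3
# ("1" = SINE 1 only, "2" = SINE 2 only, "1&2" = both insertions observed).
LYNX_LYNX_TYPES = {
    "12": "2", "29": "1&2", "34": "2", "35": "2", "36": "1&2", "37": "2",
    "38": "1", "39": "2", "40": "1", "42": "2", "43": "2", "44": "1&2",
    "45": "1&2", "46": "1", "47": "2", "48": "2", "49": "1", "50": "1&2",
    "51": "2", "53": "2", "54": "2", "55": "2", "56": "2",
}


def type_percentages(types):
    """Return the rounded percentage of individuals in each type class."""
    counts = Counter(types)
    total = sum(counts.values())
    return {t: round(100 * counts[t] / total) for t in ("1", "1&2", "2")}


print(type_percentages(LYNX_LYNX_TYPES.values()))
```

With 4, 5 and 14 individuals in the three classes, the rounded values match the 17%, 22% and 61% reported for L. lynx.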

Supplemental Figure 1: Gel electrophoresis indicates presence of a feliform specific SINE insertion in all Feliformia species. An additional insertion can be seen in Eurasian lynx (Lynx lynx) at this locus. The lack of a band for the lion (Panthera leo) and desert cat (Felis margarita) indicates a failed PCR reaction.

Supplemental Figure 2: The UCSC Genome Browser was used to locate instances of SINE insertions present in Felis catus and absent in Canis familiaris at homologous loci. Conserved primers for PCR were designed flanking either end of the insertion site.

CHAPTER THREE: PHYLOGENY AND RAPID SPECIATION OF THE FELIDAE ILLUMINATED BY CanSINE INSERTION ANALYSIS

Abstract

SINEs (short interspersed nuclear elements) are a class of retrotransposable elements that may be used as phylogenetic markers owing to a proliferation method that produces nearly homoplasy-free presence or absence character states. The Felidae (cat) family within the order Carnivora has undergone periods of explosive radiation, such that the order of speciation during the initial Felidae radiation 6.5-11.0 million years ago, as well as within the Lynx and Leopardus genera, has yet to be fully resolved. With the Felis catus whole genome sequence as a reference, a SINE-to-SINE PCR methodology was developed that localized 53 previously uncharacterized synapomorphic and autapomorphic Carnivora specific SINE (CanSINE) insertions throughout the Felidae family. Among the 23 synapomorphic sites, 18 are congruent with current reconstructions of the Felidae phylogeny derived from 23 thousand base pairs of nuclear and mitochondrial DNA. The five incongruent sites can be attributed to incomplete lineage sorting, species hybridization, truly non-bifurcating lineages or alternative orders of speciation. This study demonstrates the utility of SINE insertions as phylogenetic markers and reveals many of the complex evolutionary relationships within Felidae.

Introduction

The Felidae, commonly known as the cat family, is a charismatic clade within the order Carnivora comprising 36-39 species, including domestic, wild, ubiquitous and endangered varieties. The evolutionary history of Felidae includes multiple periods of rapid speciation and is further complicated by the potential for hybridization and introgression between extant species (Johnson et al. 2004; Johnson et al. 1999; Luo et al. 2010; Masuda et al. 1996; O'Brien and Johnson 2005; O'Brien et al. 2008). Recently, a comprehensive Felidae phylogeny was published that incorporated over 23,000 base pairs (bp) of sequence data gathered from nuclear and mitochondrial genes. This analysis defined eight highly supported major Felidae lineages (panthera, bay cat, caracal, ocelot, lynx, puma, Asian leopard cat, and domestic cat) that diverged in rapid succession ~6.5-11.0 million years ago (MYA). Within each of these lineages three to seven species emerged (Figure 1). However, the precise order of speciation at the base of the Felidae phylogeny remains ambiguous, with statistical support scores as low as 50% depending on the optimality criterion (Johnson et al. 2006). In addition, while most divergences within the major Felidae lineages are resolved, a few relationships remain tenuous, a reflection of ongoing speciation and/or near simultaneous radiation.

Recent mammalian phylogenetic analyses have been aided by assessments of short interspersed nuclear element (SINE) insertions at orthologous genomic loci (Nishihara et al. 2009; Schröder et al. 2009; Zhao et al. 2009). Owing to proliferation via retrotransposition (Jurka 1997; Weiner 2002), SINEs, delimited by unique target site duplications (TSDs) (Jurka and Klonowski 1996), are generally unidirectional molecular characters in which the absence of the insertion is the ancestral state while insertion presence is derived (Ray 2007). In addition, excision of fixed insertions and precisely parallel integration events are extremely rare, making SINEs nearly homoplasy-free molecular markers (Ray et al. 2006) such that the presence or absence of individual insertion sites can diagnose evolutionary affinities. While SINE insertion sites have been extolled as powerful phylogenetic characters (Hillis 1999; Ray 2007), incomplete lineage sorting following rapid speciation can generate SINE insertion profiles incongruent with other molecular data (Churakov et al. 2009; Nishihara et al. 2009; Takahashi et al. 2001). In addition, recent revelations about the retrotransposition process in primates indicate SINE integration is not entirely random. Rather, long interspersed nuclear element (LINE) derived endonuclease facilitates integration of SINE transcripts at specific genomic locations, resulting in precisely parallel or near parallel insertion events (Jurka et al. 2007; Jurka and Klonowski 1996; Jurka et al. 2005). With advances in whole genome sequencing technologies and an increasing number of reference phylogenies, the explicatory power of SINEs in evolutionary studies can now be determined empirically.

Here we present a novel method for the discovery of phylogenetically informative Carnivore specific SINEs (CanSINEs) throughout Felidae. Using a combination of bioinformatics and PCR-based methods, 53 CanSINE integration events are characterized. These insertions were mapped to the Felidae phylogeny in order to 1) elucidate SINE integration patterns within felid genomes, 2) provide clade and species diagnostic molecular markers, 3) confirm newly described evolutionary relationships, 4) provide further support for weakly resolved nodes and 5) observe the effect of incomplete lineage sorting following rapid speciation on SINE distributions.

Methods

Acquisition of Genomic DNA from Felidae Species

The distribution of SINE insertion loci was assessed in one or more individuals representing 39 extant Felidae species and subspecies (Supplemental Table 1). Commercial genomic DNA from F. silvestris catus (F. catus) was purchased from EMBD Biosciences, Product No: 69235. Blood and tissue samples were obtained from the remaining taxa, which are organized into eight major lineages. The domestic cat lineage comprises four official species: jungle cat (F. chaus), black-footed cat (F. nigripes), sand cat (F. margarita), and the wildcat species complex (F. silvestris). For the purpose of this study F. silvestris is represented by four subspecies: Chinese desert cat (F. bieti), European wildcat (F. silvestris), African wildcat (F. libyca) and the domestic cat (F. catus). The panthera lineage species sampled are leopard (Panthera pardus), tiger (P. tigris), jaguar (P. onca), snow leopard (P. uncia), lion (P. leo), and the two clouded leopard species (Neofelis nebulosa and N. diardi). The puma lineage includes the puma (Puma concolor), jaguarundi (P. yagouaroundi) and cheetah (Acinonyx jubatus). The lynx lineage comprises bobcat (Lynx rufus), Siberian lynx (L. lynx), Canadian lynx (L. canadensis), and Iberian lynx (L. pardinus). The Asian leopard cat lineage consists of the Asian leopard cat (Prionailurus bengalensis), fishing cat (P. viverrinus), rusty-spotted cat (P. rubiginosus), flat-headed cat (P. planiceps) and Pallas cat (Otocolobus manul). The caracal lineage is represented by caracal (Profelis caracal), African golden cat (P. aurata) and serval (P. serval). The bay cat (Pardofelis badia), Asian golden cat (P. temminckii), and marbled cat (P. marmorata) represent the bay cat lineage. The ocelot lineage includes the ocelot (Leopardus pardalis), Andean mountain cat (L. jacobita), tigrina (L. tigrinus), margay (L. wiedii), pampas cat (L. colocolo), kodkod (L. guigna), and Geoffroy's cat (L. geoffroyi). Genomic DNA was extracted from blood or tissue samples using the Qiagen DNeasy Blood & Tissue Kit.

SINE-to-SINE PCR and Cloning

SINE-to-SINE PCR amplification primers anneal to diagnostic motifs within the tRNA-related region of two Feliform specific CanSINE subfamilies (I and II), characterized in Chapter 2, on the anti-sense strand. Primers 1 (ATCAGACTCTTGATTTCAGCTCA) and 2 (AGCTCAGGTCATGATCCCAGG) specifically bind to subfamily I, while primers 3 (TCCGACTTCAGCCAGGTC), 4 (TGATGGCTCGGAGCCT) and 5 (TCCGACTTCGGCTCAGGTC) specifically bind to subfamily II (Supplemental Figure 1). Single primer PCR was performed on approximately 20ng of extracted genomic DNA from eight Felidae species: N. nebulosa, P. onca, P. marmorata, P. badia, L. guigna, L. rufus, O. manul and P. viverrinus. Reactions consisted of 0.1U of AmpliTaq DNA polymerase, 1.5µM primer, 2.5mM MgCl2, 0.2mM of each deoxynucleotide triphosphate and the appropriate amount of 10X AmpliTaq Buffer II and water for a 20µL reaction. PCR conditions were 5 min at 94°C, 40 cycles of 30 sec at 94°C, 30 sec at 54°C and 90 sec at 72°C, followed by a final elongation step of 5 min at 72°C.

SINE-to-SINE amplifications resulted in a collection of DNA fragments flanked by CanSINE segments. To confirm amplification and assess the size range of DNA fragments, 5µL of PCR product was fractionated by gel electrophoresis in a 1.0% agarose gel containing ethidium bromide. Prior to cloning, 15µL of PCR product was purified using the ExoSAP protocol with 0.72µL Shrimp Alkaline Phosphatase (SAP) and 1.44µL Exonuclease I (ExoI) (Amersham Pharmacia, Piscataway, NJ). Isolation of SINE flanked fragments was completed using the TOPO TA Cloning kit (Invitrogen). Twelve to 24 clones from each query species were purified using the GeneClean kit (Qiagen) and sequenced. Cycle sequencing reactions consisted of 0.25U BigDye® Terminator v3.1 Ready Reaction mix, 1µL of forward or reverse M13 primer provided in the TOPO TA Cloning kit, 5µL of sequencing buffer (Applied Biosystems), 2.5µL of purified clone extract and enough water for a 10µL reaction. Cycle sequencing was performed under the following conditions: 94°C for 10 sec, 52°C for 5 sec, and 72°C for 4 min for 45 cycles. Products from cycle sequencing were run on an ABI 3730 DNA Analyzer and raw sequence files were visualized with Sequencher v4.8 (GeneCodes) or with the Geneious Pro 5.1.7 software package (Biomatters). Multiple sequence alignments were completed in Geneious using the global alignment module with gap open penalty set to 12 and gap extension penalty set to 2. Alignments were also refined by hand.

Identification of Novel Informative SINE loci

Sequenced DNA fragments consisted of genomic sequence from the host species flanked at either end by the tRNA-related region of a feliform specific CanSINE insertion. After masking for low complexity repeats using RepeatMasker, the segments were aligned to the 10X F. catus whole genome sequence with the BLAST algorithm. When possible, the resulting homologous F. catus regions were extended 200 bp on either end, imported into Sequencher and aligned to the appropriate contig. Two different screening strategies were then employed depending on insertion presence or absence status in F. catus. If the SINEs identified in the non-model species were absent in F. catus, primers were built around the putative insertion sites and all Felidae species were then amplified by direct PCR. Alternatively, if a SINE was initially identified in both a non-panthera lineage species and F. catus, primers were built around the putative insertion site and direct PCR was performed on a Pantherinae species. If the insertion is present in Pantherinae (the most basal cat lineage), then the insertion must have occurred in the ancestor of all Felidae and thus is not parsimony informative. However, if the insertion is absent from Pantherinae, the insertion event must have occurred during the initial Felidae radiation. The site was then assessed by direct PCR and sequencing in all Felidae species to determine the evolutionary time of insertion.
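The two screening strategies amount to a small decision tree. The following Python sketch restates that logic for illustration; the function name `screening_outcome`, its arguments and the returned strings are hypothetical and were not part of the laboratory workflow.

```python
def screening_outcome(present_in_f_catus, present_in_pantherinae=None):
    """Return the next screening step for a CanSINE found in a non-model felid.

    present_in_f_catus: True if the insertion is also present in F. catus.
    present_in_pantherinae: True/False once a Pantherinae species has been
        tested by direct PCR, or None if that test has not been run yet.
    """
    if not present_in_f_catus:
        # Strategy 1: insertion absent in F. catus, so screen every species.
        return "amplify all Felidae species by direct PCR"
    if present_in_pantherinae is None:
        # Strategy 2 starts by testing the most basal cat lineage.
        return "test a Pantherinae species by direct PCR"
    if present_in_pantherinae:
        # Insertion predates the Felidae radiation.
        return "discard: not parsimony informative"
    # Insertion arose during the initial Felidae radiation.
    return "amplify and sequence all Felidae species"


print(screening_outcome(present_in_f_catus=True, present_in_pantherinae=False))
```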

In either scenario direct PCR reactions consisted of approximately 20ng of extracted genomic DNA, 0.1U of AmpliTaq DNA polymerase, 0.75µM forward and reverse primer, 2.5mM MgCl2, 0.2mM of each deoxynucleotide triphosphate and the appropriate amount of 10X AmpliTaq Buffer II and water for a 20µL reaction. Touchdown PCR conditions were 5 min at 94°C, 10 cycles of 30 sec at 94°C, 30 sec at 63°C* and 60 sec at 72°C, with a decrease in the annealing temperature (*) at a rate of 0.5°C per cycle, followed by 30 cycles of 30 sec at 95°C, 30 sec at 58°C** and 60 sec at 72°C, then a final elongation step of 7 min at 72°C. Initial annealing temperatures (*) were set 5°C warmer than the final annealing temperature (**). Final annealing temperatures varied from 50-64°C depending on the primer set.
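The touchdown profile can be written out cycle by cycle. The Python sketch below is a minimal illustration of the schedule described above; `touchdown_annealing_schedule` and its parameter names are hypothetical, with defaults taken from the stated conditions (10 touchdown cycles, 0.5°C decrease per cycle, a start 5°C above the final temperature, 30 plateau cycles).

```python
def touchdown_annealing_schedule(final_temp_c, touchdown_cycles=10,
                                 step_c=0.5, offset_c=5.0, plateau_cycles=30):
    """Return the annealing temperature (°C) used in each PCR cycle."""
    start = final_temp_c + offset_c
    # Touchdown phase: start warm and drop by step_c every cycle.
    touchdown = [start - step_c * i for i in range(touchdown_cycles)]
    # Plateau phase: remaining cycles at the final annealing temperature.
    plateau = [float(final_temp_c)] * plateau_cycles
    return touchdown + plateau


schedule = touchdown_annealing_schedule(58)
print(schedule[:10])   # touchdown phase
print(schedule[10])    # first plateau cycle
```

For a primer set with a final annealing temperature of 58°C, the touchdown phase runs from 63.0°C down to 58.5°C before the 30 cycles at 58°C.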

To confirm amplification and assess the sizes of DNA fragments, 5µL of PCR product was fractionated by gel electrophoresis in a 1.0% agarose gel containing ethidium bromide. Prior to cloning or sequencing, 20µL of PCR product was purified using the ExoSAP protocol with 0.72µL Shrimp Alkaline Phosphatase (SAP) and 1.44µL Exonuclease I (ExoI) (Amersham Pharmacia, Piscataway, NJ). Cycle sequencing reactions consisted of 0.25U BigDye® Terminator v3.1 Ready Reaction Mix, 0.075µM primer, 5µL of sequencing buffer (Applied Biosystems), 1.5µL of purified PCR product and enough water for a 10µL reaction. In some instances, internal sequencing primers were necessary to recover the entire sequence (Supplemental Table 2). Cycle sequencing was performed under the following conditions: 94°C for 10 sec, 52°C for 5 sec, and 72°C for 2 min for 45 cycles. Products from cycle sequencing reactions were run on an ABI 3730 DNA Analyzer. Sequence results were visualized and edited with Sequencher and Geneious.

Determining SINE Presence or Absence

Full-length SINE insertion sites are delimited by the exact sequence of the 6-10 base pair target site duplication (TSD). If a SINE is present, the amplification product will include the forward primer sequence, 5’ genomic sequence, a TSD, the SINE element, the same TSD, 3’ genomic sequence and the reverse primer sequence. If a SINE is absent, the amplification product will include the primer sequences and genomic sequence bracketing one copy of the TSD sequence. Note that the absence of any PCR product signifies amplification failure and does not imply that the SINE is absent in the homologous region of a non-model species. Thus, the criteria for a successfully amplified locus are 1) PCR products that include target SINE insertions are about 200-400 base pairs larger than amplification products that lack a SINE insertion, 2) the sequence of the TSD can be determined by examining the sequence traces, and 3) PCR products yield sufficiently legible sequences such that SINE presence or absence at the TSD can be ascertained in at least 80% of the query species. All final sequences will be submitted to GenBank.
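The scoring rules can be sketched in Python as follows. This is an illustrative simplification, not the procedure used in the study, where sequence traces were inspected directly. The function names and the 150 bp minimum insert size are assumptions chosen for the example, and exact string matching of the TSD ignores sequencing errors and chance matches of short TSDs.

```python
def score_insertion(amplicon, tsd, min_insert_bp=150):
    """Score one amplicon as "1" (SINE present), "0" (absent) or "?" (missing).

    An empty amplicon (failed PCR) or an amplicon without the TSD is treated
    as missing data, never as absence.
    """
    if not amplicon:
        return "?"
    first = amplicon.find(tsd)
    if first == -1:
        return "?"
    end_first = first + len(tsd)
    # Look for a second TSD copy separated by an insert-sized stretch.
    pos = amplicon.find(tsd, end_first)
    while pos != -1:
        if pos - end_first >= min_insert_bp:
            return "1"
        pos = amplicon.find(tsd, pos + 1)
    return "0"


def locus_passes(scores, min_fraction=0.8):
    """Apply criterion 3: state resolved in at least 80% of query species."""
    scored = sum(1 for s in scores.values() if s in ("0", "1"))
    return scored / len(scores) >= min_fraction


# Toy amplicons built from placeholder sequence, for demonstration only.
tsd = "AAATAAAGATTT"
filled = "ACGTACGTAC" + tsd + "G" * 250 + tsd + "CCGGTTAACC"
empty = "ACGTACGTAC" + tsd + "CCGGTTAACC"
print(score_insertion(filled, tsd), score_insertion(empty, tsd))
```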

Results and Discussion

Non-random accumulation of novel insertion sites

Using the SINE-to-SINE amplification method, 32 genomic regions containing informative CanSINE insertions were identified (Table 1). Twelve of the loci contained two or more independent insertion events, resulting in identification of 53 synapomorphic and autapomorphic markers that were mapped to the Johnson et al. (2006) Felidae phylogeny (Table 1B) (Figure 1). Owing to the more recent origin of CanSINE subfamily II (Chapter 2), initial amplification with subfamily II specific primers yielded the majority, 30 of 32, of the informative genomic regions.

Among the 12 loci containing multiple independent retroposition events in unrelated species, striking accumulations were observed in loci 133135 and 212075 (Table 1B). All 4 integration events at locus 133135 occur within a single 16 bp motif. An insertion shared across the ocelot lineage (absent only from L. wiedii and L. colocolo; Table 1B) has the TSD ‘AAAATAAAGATTTATTT’, the bobcat (L. rufus) genome contains an insertion with TSD ‘TAAAG’, caracal (Profelis caracal) has an insertion flanked by ‘AAATAAAGATTT’ and marbled cat (Pardofelis marmorata) contains an insertion flanked by ‘TTTAAATAAAGATT’. The independence of the 4 insertion events is verified by variance in microsatellite segments and the reverse orientation of the L. rufus SINE (Table 1B, Figure 2). Locus 212075 includes 6 independent insertion sites. Three of these sites are species-specific and occur within the Prionailurus genus. The Asian leopard cat (P. bengalensis) and flat-headed cat (P. planiceps) have unfixed insertions, while the single rusty-spotted cat (P. rubiginosus) individual is homozygous for an insertion (Table 1B, Figure 3). In addition, two pairs of insertions are each flanked by nearly identical TSDs: those in black-footed cat (Felis nigripes) and P. bengalensis, and those in the bay cat lineage and P. planiceps.
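The overlap among the four locus 133135 TSDs can be checked by finding the longest substring common to all of them. The sketch below is illustrative only; the function name is hypothetical and the TSD strings are copied from the text above.

```python
def longest_common_substring(seqs):
    """Return the longest substring shared by every sequence (brute force).

    Candidates are drawn from the shortest sequence, longest first,
    so the first hit is a longest shared substring.
    """
    shortest = min(seqs, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


# TSDs reported in the text for locus 133135.
tsds_133135 = {
    "ocelot lineage": "AAAATAAAGATTTATTT",
    "L. rufus": "TAAAG",
    "P. caracal": "AAATAAAGATTT",
    "P. marmorata": "TTTAAATAAAGATT",
}
core = longest_common_substring(list(tsds_133135.values()))
```

For these four strings the shared core is the entire bobcat TSD, ‘TAAAG’, which sits inside each of the three longer duplications.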

These results indicate that Felidae-specific CanSINE integrations do not occur randomly across the genome, which is consistent with observations among other mammalian SINEs (Yu and Zhang 2005). The motif ‘TTAAAA’ is a candidate enzymatic target for LINE-derived endonuclease (Jurka 1997; Jurka and Klonowski 1996) and is observed at the beginning of target site duplications among dog, primate and rodent SINEs, with the first endonucleolytic site between the thymine and adenine residues (Gentles et al. 2005). A ‘TTAAAA’ or similar motif is presumably the enzymatic signal for novel felid CanSINE integrations, resulting in an accumulation of SINE retroposition events in AT-rich regions or, in the case of loci 133135 and 212075, nearly identical integration sites among independent lineages. The observed felid CanSINE clustering also reflects SINE tolerance in regions where insertion presence does not negatively impact the host. SINEs can adversely affect host organisms by supplying promoters that disrupt the regulation of adjacent genes, or by becoming integrated into coding sequence (Krull et al. 2007; Singer et al. 2004). Consequently, germ line insertions are generally limited to benign regions of the host genome (Jurka 2004).
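A simple way to screen TSDs for a ‘TTAAAA’-like start and for AT-richness is sketched below. This is an illustrative screen under stated assumptions, not an analysis performed in this study: the function names are hypothetical, and comparing only the first six bases of each TSD is a simplification.

```python
def mismatches_to_motif(tsd, motif="TTAAAA"):
    """Count mismatches between the motif and the first bases of a TSD.

    Returns None when the TSD is shorter than the motif.
    """
    prefix = tsd[:len(motif)].upper()
    if len(prefix) < len(motif):
        return None
    return sum(1 for a, b in zip(prefix, motif) if a != b)


def at_fraction(seq):
    """Fraction of A and T bases in a sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


# TSDs taken from Table 1 (scaffold numbers as keys).
examples = {
    "161275": "TTAAAAATTGTTTTGA",
    "150951 (F. nigripes)": "TTAAAAATAATGTTCCC",
    "206537": "CGAGCCCCGCGTC",
}
report = {k: (mismatches_to_motif(v), round(at_fraction(v), 3)) for k, v in examples.items()}
```

The first two TSDs begin with an exact ‘TTAAAA’, whereas the GC-rich TSD at scaffold 206537 does not resemble the motif.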

Felidae-specific CanSINE insertions with identical integration sites yet distinct internal SNPs were located previously in L. rufus and the Felis genus (Pecon-Slattery et al. 2004). In addition, insertion ‘hotspots’ with nearly identical integration sites have been observed in rodents and cetaceans (Cantrell et al. 2001; Kass et al. 2000; Nikaido et al. 1999). Here, we present additional evidence for near-parallel insertion events that are distinguished from incomplete lineage sorting by precise TSDs, internal SNPs and orientation.

SINEs as Evolutionary Markers

CanSINE insertions are genetic markers that have been invaluable in distinguishing the protected European mink (Mustela lutreola) from the invasive American mink (M. vison) (Lopez-Giraldez et al. 2005). Here, 30 autapomorphic insertion loci were identified amongst the Felidae (Figure 1) that can potentially be used as species markers for wildlife management. Insertions occur in P. bengalensis, L. rufus, C. aurata, C. caracal, P. onca, L. canadensis, and O. manul as well as in the threatened F. nigripes, P. planiceps, P. rubiginosus, P. viverrinus, P. badia, P. marmorata, and P. uncia species (IUCN Red List, http://www.iucnredlist.org/). Further investigations of these loci with large numbers of unrelated individuals will determine if they are truly species-specific fixed insertions that can provide simple biological markers.

Orthologous SINE insertions have been described as shared derived traits that can diagnose evolutionary groups (Ray 2007; Shedlock et al. 2004). Of the 23 parsimony-informative insertions located in this study, 18 are consistent with a recent reconstruction of Felidae evolution that utilized 23 kbp of nuclear and mitochondrial DNA (Figure 1). Single synapomorphic insertions support 7 of the 8 major Felidae lineages, while 11 other sites support phylogenetic affinities within the lineages. Within the most basal Felidae lineage, the panthera lineage, one site is diagnostic of the genus Panthera and another is diagnostic of the clouded leopard genus (Neofelis). The two clouded leopard species, Neofelis nebulosa and N. diardi, were recently reclassified from subspecies status due to genetic, morphological and geographical distinctions (Buckley-Beason et al. 2006). Although the insertion located here does not distinguish these two endangered species, it may prove useful for Neofelis conservation as a molecular marker that can (after extensive validation) be ascertained by a simple PCR or probe-based assay. In addition, the presence of a Neofelis-specific insertion suggests that species-specific insertions may exist that could aid future conservation efforts. Within the caracal lineage, P. caracal and P. aurata share one insertion site. The ocelot lineage includes 2 insertions that support the L. geoffroyi, L. guigna and L. tigrinus clade, as well as an insertion supporting the sister taxa relationship between L. geoffroyi and L. guigna. In the Asian leopard cat lineage, 3 distinct insertions are synapomorphies of the Prionailurus genus. Another insertion site supports the affinity of P. bengalensis, P. planiceps, and P. viverrinus. Two informative insertions occur in the domestic cat lineage; one supports the grouping of F. margarita with all F. silvestris subspecies, while another is a synapomorphy of all F. silvestris subspecies (Figure 1).
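Whether an insertion is consistent with a reference topology amounts to asking whether the taxa carrying it form a clade on that tree. A minimal sketch follows; the nested-tuple tree is a simplified, hypothetical rendering of the Asian leopard cat lineage (the three-species group is left unresolved), and the function names are illustrative.

```python
def leaves(node):
    """Return the set of taxon names under a node (a name or a tuple of nodes)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def is_clade(tree, carriers):
    """True if the carrier taxa are exactly the leaves of some node in the tree."""
    carriers = set(carriers)
    if leaves(tree) == carriers:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, carriers) for child in tree)


# Simplified, assumed topology for illustration only.
tree = ("O. manul",
        ("P. rubiginosus",
         ("P. bengalensis", "P. planiceps", "P. viverrinus")))

consistent = is_clade(tree, {"P. bengalensis", "P. planiceps", "P. viverrinus"})
discordant = is_clade(tree, {"P. rubiginosus", "P. bengalensis", "P. viverrinus"})
```

Under this assumed tree, the first carrier set maps to a single node, while the second (the distribution reported for locus 161275) does not, which flags the locus for closer inspection.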

O. manul is a Felidae species with an often-contested evolutionary history. Studies of morphology, immunology and karyology indicate that O. manul is an early divergence of the domestic cat lineage, while some molecular markers indicate that this solitary Mongolian species is an early divergence of the Asian leopard cat lineage (Johnson et al. 2006; Masuda et al. 1996; Yu et al. 2004; Yu and Zhang 2005). Within this study, an orthologous insertion site located in O. manul and the Prionailurus genus provides further support for the inclusion of O. manul in the Asian leopard cat lineage.

Disentangling SINE distributions following rapid speciation

When effective population sizes are large and intervals between speciation events are short, ancestral polymorphisms are retained in successive lineages. This process eventually leads to incomplete lineage sorting, wherein the pattern of allele fixations does not reflect the order of speciation (Avise and Robinson 2008). The Felidae family has experienced periods of explosive speciation, beginning with the radiation of the 8 major extant Felidae lineages, which occurred in less than 4 MY. Ninety-five percent confidence intervals of divergence date estimates are nearly indistinguishable for the nodes separating the ocelot, lynx and puma lineages (6.7-11.7 MYA and 6.3-11.0 MYA) and the nodes separating the puma, Asian leopard cat and domestic cat lineages (5.6-9.8 MYA and 5.3-9.2 MYA) (Johnson et al. 2006). Divergences within the ocelot lineage have been rapid as well, with the L. jacobita and L. colocolo divergence occurring after only 20-thousand years of separation from the stem lineage, and the divergence of L. tigrinus, L. guigna and L. geoffroyi occurring in under 20-thousand years (Johnson et al. 2006; Johnson et al. 1999). Forty-thousand years separate both the L. canadensis, L. lynx and L. pardinus species complex and the P. bengalensis, P. viverrinus and P. planiceps species complex (Johnson et al. 2006; Johnson et al. 2004; Luo et al. 2010). These rapid speciation events are reflected in CanSINE insertion profiles that are at times incongruent with prior phylogenetic hypotheses.
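The dependence of incomplete lineage sorting on population size and speciation interval can be made concrete with the standard neutral coalescent result for three taxa: the probability that a gene tree disagrees with the species tree is (2/3)·exp(−t/(2Ne)), where t is the length of the internal branch in generations and Ne is the diploid effective population size. The sketch below is illustrative only; the function name and all parameter values are hypothetical and are not estimates for any felid lineage.

```python
import math


def discordance_probability(interval_years, generation_time_years, effective_size):
    """Probability that a three-taxon gene tree conflicts with the species tree.

    Uses the neutral coalescent expression (2/3) * exp(-T), with
    T = generations / (2 * Ne) for a diploid population.
    """
    generations = interval_years / generation_time_years
    coalescent_units = generations / (2 * effective_size)
    return (2.0 / 3.0) * math.exp(-coalescent_units)


# Hypothetical comparison: a short versus a long interval between
# successive speciation events, same generation time and Ne.
short_interval = discordance_probability(40_000, 5, 50_000)
long_interval = discordance_probability(2_000_000, 5, 50_000)
```

With these assumed values the short interval leaves a much higher chance of discordance than the long interval, matching the qualitative argument that brief internodes and large populations favor retention of ancestral polymorphism.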

Initial PCR and sequencing of the lynx voucher samples revealed an orthologous insertion at locus 134463 that is present in L. canadensis and L. lynx but absent from L. pardinus. Analyses of an additional 18 L. canadensis, 14 L. lynx and 6 L. pardinus individuals revealed that the insert is a shared derived trait of L. canadensis and L. lynx (Figure 4a). The 134463 insertion pattern is inconsistent with maximum likelihood phylogenetic estimations derived from concatenated segments of nuclear and mitochondrial DNA (Johnson et al. 2006; Johnson et al. 2004), but consistent with a recent Bayesian reconstruction based on a single mitochondrial gene (Agnarsson et al. 2010).

Previously, we proposed that incomplete lineage sorting following rapid divergence of L. canadensis, L. lynx and L. pardinus ~1.5 MYA resulted in the paraphyletic distribution of an ancestrally polymorphic insertion at locus 106265 (Chapter 2). Here, the distribution of SINE 134463 also indicates rapid speciation events in the lynx lineage. However, given the weak statistical support (50-60%) for the L. lynx and L. pardinus clade in prior phylogenetic analyses (Johnson et al. 2006), CanSINE locus 134463 could instead support a sister taxa relationship between L. lynx and L. canadensis. Additional molecular data from forthcoming whole genome sequencing efforts, such as the Genome 10K project (www.genome10K.soe.ucsc.edu), will elucidate the relationships among the lynx species and strengthen models of SINE proliferation and fixation following rapid speciation.

Discordant SINE profiles are also observed within the ocelot lineage, wherein an insertion at locus 133135 is present in all 10 L. pardalis, 2 L. jacobita, 9 L. tigrinus, 3 L. guigna and 11 L. geoffroyi individuals sampled yet absent in the 6 L. wiedii and 3 L. colocolo individuals sampled (Figure 4b). This distribution is in conflict with prior statistically supported phylogenetic hypotheses in which L. pardalis and L. wiedii are sister species and form a lineage to the exclusion of L. tigrinus, L. guigna and L. geoffroyi (Johnson et al. 2006). To determine whether L. pardalis inherited the insertion through incomplete lineage sorting or hybridization, DNA from each Leopardus individual was sequenced at the SINE locus and at the mitochondrial gene NADH5 (Supplemental Figure 1). L. pardalis has a species-specific mutation (A>C) in the poly A/T tail of SINE 133135 as well as multiple unfixed A>T mutations. All L. pardalis individuals also have a species-specific mitochondrial haplotype that is most similar to the haplotype of L. wiedii (data not shown). Since there is currently no evidence for hybridization with other Leopardus species, the most plausible explanation for the polyphyletic distribution of the SINE locus, with respect to L. pardalis, is incomplete lineage sorting.

The phylogenetic position of L. jacobita and L. colocolo within the ocelot lineage has yet to be resolved with consistency. Depending on the molecular data types examined and the optimality criterion employed, these two species have been placed as sister taxa or as belonging to other clades within the ocelot lineage (Johnson et al. 1998; Johnson et al. 2006; Johnson et al. 1999). Here, the distribution of SINE locus 133135, wherein the insertion is present in L. jacobita but absent in L. colocolo, does not support a sister taxa relationship between the two species. To explore the source of the L. jacobita SINE 133135 further, sequences at the SINE locus and at the mitochondrial gene NADH5 were ascertained for all the ocelot lineage samples (Supplemental Figure 2). The single L. jacobita SINE sequence is identical to those from L. geoffroyi and L. guigna, with the exception of an A>G mutation. In addition, both L. jacobita and L. colocolo NADH5 haplotypes are found amongst L. tigrinus, L. guigna and L. geoffroyi. Hence, whether the presence of SINE 133135 in L. jacobita is due to incomplete lineage sorting of a SINE that was present in the Leopardus ancestor or due to a closer evolutionary relationship between L. jacobita and the L. tigrinus, L. geoffroyi and L. guigna clade cannot be determined.

Unfixed SINE insertion sites within extant species reflect ongoing processes of ancestral allele sorting. Within our sampling of the Asian leopard cat lineage, an orthologous insertion site at locus 161275 is polymorphic in 2 P. rubiginosus and 9 P. bengalensis individuals, present in all 7 P. viverrinus individuals and absent from all 5 P. planiceps individuals. This distribution is incongruent with prior statistically supported phylogenies and a fixed insertion site at chromosome C1 that is diagnostic of the P. bengalensis, P. planiceps and P. viverrinus clade (Johnson et al. 2006; Luo 2006; Luo et al. 2010) (Figure 4c).

To determine whether CanSINE presence amongst the diminutive (2-3.5 pound) and ecologically vulnerable P. rubiginosus is due to incomplete lineage sorting of an ancestral SINE during the ~2 MY radiation of the Prionailurus genus (Johnson et al. 2006) or more recent hybridization events, the sequences of each SINE and a NADH5 segment were obtained (Supplemental Figure 3). NADH5 haplotypes from both P. rubiginosus individuals are identical to those found within our P. bengalensis sampling. In addition, the two P. rubiginosus SINE sequences differ from each other and are each identical to those found amongst P. bengalensis. Thus hybridization between P. rubiginosus and P. bengalensis cannot be ruled out as the source of the P. rubiginosus SINE. Alternatively, the SINE sequence variants could have developed within the Prionailurus progenitor such that extant P. rubiginosus still possesses the ancestral diversity.

The Asian leopard cat species, P. bengalensis, also serves as a model of an ongoing SINE fixation process. P. bengalensis is divided into two putative subspecies that separated approximately 2.5 MYA: a ‘northern’ population on the Asian mainland and a ‘southern’ population on the Malay Peninsula (Luo 2006; Luo et al. 2010). The 4 individuals examined from the northern population are polymorphic at locus 161275, while all 4 southern individuals sampled are homozygous for SINE presence. Thus the southern population is on a path towards fixation of the insertion at locus 161275, while the northern population will likely, barring a selective sweep, remain polymorphic and gradually drift toward fixation of presence or absence.

Rapid speciation during the late-Miocene

As demonstrated above, incomplete lineage sorting results in SINE insertion profiles in direct conflict with other phylogenetic reconstructions. Alternatively, conflicting SINE profiles may be the hallmark of a truly non-bifurcating mode of evolution. Here we find that the distribution of 2 SINE insertions mapped to the “backbone” of the Felidae tree is indicative of an explosive mammalian radiation. The insertion at locus 154966 is present in all species of the domestic cat, Asian leopard cat and lynx lineages, and absent in the puma, ocelot, caracal, bay cat and panthera lineages (Figures 1 and 5). The insertion at locus 214534 is present in all Felidae species except those of the caracal and panthera lineages (Figures 1 and 6).

SINE insertion profiles at loci 154966 and 214534 reflect the relatively weak node supports between the major Felidae lineages obtained during prior phylogenetic analyses (Agnarsson et al. 2010; Johnson et al. 2006). The insertion at locus 154966 suggests that the lynx lineage is more recently derived than the puma lineage, which is consistent with prior minimum evolution, maximum parsimony and Bayesian analyses (Johnson et al. 2006). However, maximum likelihood reconstruction indicates, with a bootstrap support of just 55%, that the puma lineage is more recently derived (Johnson et al. 2006). Similarly, the insertion at locus 214534 suggests a more basal position of the caracal lineage within Felidae than the bay cat lineage, while previous phylogenetic reconstructions place the bay cat lineage at a more basal position than the caracal lineage, with statistical support from 50-100% depending on the optimality criterion (Johnson et al. 2006). The insertion pattern at locus 214534 can be attributed to the nearly simultaneous divergences of the bay cat and caracal lineages ~8-8.5 MYA, resulting in “ancient” incomplete lineage sorting, a phenomenon observed in cichlid species that diverged during a similar period of time, ~5-10 MYA (Terai et al. 2003).

Given the plethora of genomic data now available for analyzing evolutionary relationships, one would expect resolution of controversial phylogenetic nodes (Hallstrom and Janke 2010). However, even when over 23,000 bp of DNA are used in phylogenetic tree reconstruction in conjunction with SINE mapping, the order of divergence during the initial Felidae radiation cannot be resolved with confidence. Similarly, among the basal mammalian lineages, SINE assessments combined with massive amounts of genomic data could not produce indisputable bifurcating lineages (Churakov et al. 2009; Nishihara et al. 2009). When species radiation occurs in short time frames, under 5 MY, a bifurcating topology may not provide an accurate representation of evolution since rapid speciation generates mosaic genomes with conflicting phylogenetic signals (Hallstrom and Janke 2010). In such instances a polytomy or split network may be a more accurate depiction of evolutionary history.

A.

Scaffold   Chromosome Location        Target Site Duplication  Taxonomic Group
unmatched  chrD4:80722469-80722612    TAATGGCTCACA             Leopardus guigna/geoffroyi
206537     chrUn2:3605781-3605984     CGAGCCCCGCGTC            Leopardus guigna/geoffroyi/tigrinus
213566     chrE2:66187777-66188092    TGTTAAGAGGAGTCAATTGC     Otocolobus manul
                                      C
212331     Unknown                    TCAAAAGCAGTGAATCTA       Otocolobus manul
215112     chrC2:108111706-108111935  undefined                Otocolobus manul
150162     chrE2:39923417-39923704    CGTATGTGGGAAA            Pardofelis badia
213798     chrD3:47912789-47913122    GCTAGCACA                Leopardus guigna/geoffroyi/tigrinus
146417     unknown                    GAACACATGGAATA           Pardofelis badia
216162     chrUn12:15996523-15996731  TCAAAGATTATT             Pardofelis marmorata
216524     chrA2:218731772-218732047  TTGGACCTAAAGAA           Pardofelis marmorata
5313       chrA3:7066904-7067187      GACATAAGCATGG            Panthera onca
782        chrF1:82759656-82759439    TTCCCTGTGGGT             Panthera genus
unmatched  chrB1:32807332-32807799    AGAAAATGAACAAACA         Panthera onca
161275     chrC1:184822652-184823028  TTAAAAATTGTTTTGA         Prionailurus rubiginosus/bengalensis/viverrinus
125972     chrC1:74363856-74363642    AAAGAGCTAGCTTG           Prionailurus viverrinus
168013     unknown                    AAAAGACATAC              Prionailurus viverrinus
unmatched  chrC1:181748497-181748747  CTCGTACTYTTATC           Prionailurus planiceps/bengalensis/viverrinus
130416     chrB4:9912326-9912585      AGGATCAAAA               Prionailurus genus

B.

Scaffold   Chromosome Location        Target Site Duplication  Taxonomic Group
133135     chrD2:92520918-92520698    AAAATAAAGATTTATTT        Ocelot Lineage except Leopardus wiedii/colocolo
                                      TAAAG                    Lynx rufus
                                      AAAATAAAGATTT            Profelis caracal
                                      TTTAAAATAAAGATT          Pardofelis marmorata
212075     chrD2:7263045-7262816      TATAAGGGAGG              Bay Cat Lineage
                                      CGAACTTGAT               Profelis caracal/aurata
                                      AAGAAGTAAAGCTTTA         Felis nigripes
                                      AAACGAACTTGGT            Prionailurus rubiginosus
                                      AAGGTATAAGGG             Prionailurus planiceps
                                      AAGAAGTAAGGCTT           Prionailurus bengalensis
150951     chrE2:71800542-71800192    AAAAAGCCATTCTA           Asian Leopard Cat Lineage
                                      TTAAAAATAATGTTCCC        Felis nigripes
                                      TAAGACCCAAAGATTAT        Ocelot Lineage
unmatched  chrA1:248917944-248918545  GTTTTAAATAATTT           Panthera Lineage
                                      AAAGTAACA                Pardofelis marmorata
                                      AAAGATACATGTTCA          Prionailurus genus
105890     chrA2:162899126-162899564  TAAACTTA                 Otocolobus manul
                                      TCATGATG                 Caracal Lineage
                                      TTAGCC                   Domestic Cat Lineage
203464     chrX:73848142-73848698     CAAGATTCTTA              Felis silvestris/margarita
                                      CAATAGATTC               Panthera uncia
139216     chrB4:109463719-109463931  AATAAAATAGGCAATATCA      Lynx Lineage
                                      TGGTGAGGATATTTTGA        Otocolobus manul
212733     chrB3:91276896-91276402    TTTCATCAGATTTT           Otocolobus manul
                                      AAAAGCCTT                Neofelis nebulosa
73133      chrA1:78064263-78064761    TATATAGACTTTTT           Otocolobus manul
                                      AATCACATCCA              Lynx canadensis
188620     chrB2:122398146-122398467  GTAAGGGGTGGT             Panthera onca
                                      ATCTTAGGGGG              Pardofelis aurata
150853     chrE2:65441081-65441305    AAAATGGAAGTTGCTA         Prionailurus genus
                                      TACACACCA                Pardofelis marmorata
134463     chrD1:118666712-118666980  AAATTTAA                 Prionailurus genus
                                      AGTCAAGAAG               Lynx canadensis/lynx

Table 1 Synapomorphic and autapomorphic SINE insertion loci annotated by chromosome position, UCSC genome browser scaffold number and the specific target site duplication (TSD) sequence. A) Insertion loci with a single insertion event among the Felidae. B) Insertion loci containing multiple insertion events among the Felidae.

Figure 1 SINE insertion events mapped onto the Felidae species tree adapted from Johnson et al. (2006). Historical insertion events are indicated by triangles. Multiple insertion events are indicated by numbers inside the triangles. SINEs that presumably inserted during the initial Felidae radiation are annotated on the right. Branch lengths are not drawn to time scale. Note that the excess of species-specific insertions in O. manul, P. viverrinus, P. badia and P. marmorata is not indicative of increased retroposition activity; rather, these species were used in the initial SINE-to-SINE PCR.

[Figure 2: sequence alignment, positions 1-600. Rows: P. caracal, P. marmorata, Leopardus, L. rufus, F. catus.]

Figure 2 Alignment of unique CanSINE insertion events occurring at locus 133135 in the caracal (Caracal caracal), marbled cat (Pardofelis marmorata), ocelot lineage (genus: Leopardus) and bobcat (Lynx rufus) with the homologous F. catus sequence as a reference. The independent insertions have overlapping target site duplications (shaded).

[Figure 3: sequence alignment, positions 1-790. Rows: P. viverrinus, P. planiceps 1, P. bengalensis 1, P. bengalensis 2, P. planiceps 2, P. rubiginosus.]

Figure 3 Alignment of three unique insertion events occurring within the Asian leopard cat lineage at locus 212075. The insertions occurring in the Asian leopard cat (P. bengalensis) and flat-headed cat (P. planiceps) are unfixed. Target site duplications are shaded.

Figure 4 Three SINE insertion sites are incongruent with prior phylogenetic analyses. The model topology is shown based on maximum-likelihood reconstruction using 18,853 bp of nuclear DNA, with bootstrap scores (Johnson et al. 2006). A) An insertion at locus 134463 is present in all Canadian and Eurasian lynx individuals and absent in all Iberian lynxes. B) An insertion at locus 133135 has a paraphyletic distribution among the ocelot lineage species. C) An insertion at locus 161275 is present in all fishing cat individuals examined, absent in all flat-headed cats examined and polymorphic among multiple Asian leopard cat and rusty-spotted cat individuals. However, 4 other SINE insertion sites support the monophyly of Prionailurus and one insertion supports the monophyly of Asian leopard cat, fishing cat and flat-headed cat.

[Figure 5: sequence alignment, positions 1-380. Rows (lineage consensus sequences): Domestic, Asian Leopard, Puma, Lynx, Ocelot, Caracal, Bay, Panthera.]

Figure 5 Alignment of a SINE insertion at locus 154966 that occurred during the initial Felidae radiation and is present in the domestic cat, Asian leopard cat and lynx lineages. Each lineage is represented by a consensus sequence and target site duplications are shaded.

[Figure 6: sequence alignment, positions 1-320. Rows (lineage consensus sequences): Domestic, Asian Leopard, Lynx, Puma, Ocelot, Caracal, Bay, Panthera.]

Figure 6 Alignment of a SINE insertion at locus 214534, which occurred during the initial Felidae radiation. The insertion is present in all lineages except the panthera and caracal lineages. Each lineage is represented by a consensus sequence and target site duplications are shaded.

Literature Cited

Agnarsson, I., M. Kuntner, and L. May-Collado. 2010. Dogs, cats, and kin: A molecular species-level phylogeny of Carnivora. Molecular Phylogenetics and Evolution 54: 726-745.

Avise, J.C. and T.J. Robinson. 2008. Hemiplasy: a new term in the lexicon of phylogenetics. Syst. Biol. 57.

Buckley-Beason, V.A., W.E. Johnson, W.G. Nash, R. Stanyon, J.C. Menninger, C.A. Driscoll, J. Howard, M. Bush, J.E. Page, M.E. Roelke, G. Stone, P. Martelli, C. Wen, L. Ling, R.K. Duraisingam, P. Lam, and S. O'Brien. 2006. Molecular evidence for species-level distinctions in clouded leopards. Current Biology 16: 2371-2376.

Cantrell, M.A., B.J. Filanoski, A.R. Ingermann, K. Olsson, N. DiLuglio, Z. Lister, and H.A. Wichman. 2001. An ancient retrovirus-like element contains hot spots for SINE insertion. Genetics 158: 769-777.

Churakov, G., J. Kriegs, R. Baertsch, A. Zemann, J. Brosius, and J. Schmitz. 2009. Mosaic retrotransposon insertion patterns in placental mammals. Genome Res 19: 868-875.

Gentles, A.J., O. Kohany, and J. Jurka. 2005. Evolutionary diversity and potential recombinogenic role of integration targets of non-LTR retrotransposons. Mol Biol Evol 22: 1983-1991.

Hallstrom, B. and A. Janke. 2010. Mammalian evolution may not be strictly bifurcating. Mol Biol Evol 27: 2804-2816.

Hillis, D.M. 1999. SINEs of the perfect character. PNAS 96: 9979-9981.

97

Johnson, W.E., M. Culver, J.A. Iriate, E. Eizirik, K.L. Seymour, and S. O'Brien. 1998.

Tracking the evolution of the elusive Andean mountain cat (Oreailurus jacobita)

from mitochondrial DNA. J Hered 89: 227-232.

Johnson, W.E., E. Eizirik, J. Pecon-Slattery, W.J. Murphy, A. Antunes, E. Teeling, and S.J.

O'Brien. 2006. The late Miocene radiation of modern Felidae: a genetic assessment.

Science 311: 78-77.

Johnson, W.E., J.A. Godoy, F. Palomares, M. Delibes, M. Fernandes, E. Revilla, and a.S.J.

O'Brien 2004. Phylogenetic and phylogeographic analysis of Iberian lynx

populations. Journal of Heredity 95: 19-28.

Johnson, W.E., J. Pecon-Slattery, E. Eizirik, J.-H. Kim, M. Menotti-Raymond, C. Bonacic,

R. Cambre, P. Crawshaw, A. Nunes, H.N. Seuanez, M.A.M. Moreira, K.L.

Seymour, F. Simon, W. Swanson, and S.J. O'Brien. 1999. Disparate

phylogeographic patterns of molecular genetic variation in four closely related

South American small cat species. Molecular Ecology 8: S79-S94.

Jurka, J. 1997. Sequence patterns indicate an enzymatic involvement in integration of

mammalian retroposons. PNAS 94: 1872-1877.

Jurka, J. 2004. Evolutionary impact of human Alu repetitive elements Current Opinion in

Genetics and Development 14: 603-608.

Jurka, J., V. Kapitonov, O. Kohany, and M.V. Jurka. 2007. Repetitive sequences in

complex genomes: structure and evolution. Annu. Rev. Genomics Hum. Genet. 8:

241-259.

Jurka, J. and P. Klonowski. 1996. Integration of retroposable elements in mammals:

selection of target sites. Journal of Molecular Evolution 43: 685-689. 98

Jurka, J., O. Kohany, A. Pavlicek, V. Kapitonov, and M.V. Jurka. 2005. Clustering,

duplication and chromosomal distribution of mouse SINE retrotransposons.

Cytogenetic and Genome Research 110: 117-123.

Kass, D.H., M.E. Raynor, and T.M. Willams. 2000. Evolutionary history of B1 retroposons

in the genus Mus . J. Mol. Evol. 51: 256-226.

Krull, M., M. Petrusma, W. Makalowski, J. Brosius, and J. Schmitz. 2007. Functional

persistence of exonized mammalian-wide interspersed repeat elements (MIRs).

Genome Res. 17: 1139-1145.

Lopez-Giraldez, F., B.J. Gomez-Moliner, J. Marmi, and X. Domingo-Roura. 2005. Genetic

distinction of American and European mink ( Mustela vison and M. lutreola ) and

European polecat ( M. putorius ) hair samples by detection of species-specific SINE

and RFLP assay. Journal of Zoology, London 265: 405-410.

Luo, S. 2006. Comparative phylogeography of sympatric wild cats: Implications for

biogeography and conservation in Asian biodiversity hotspots. University of

Minnesota- Doctoral Dissertation .

Luo, S., W.E. Johnson, P. Martelli, A. Antunes, J.L.D. Smith, and S.J. O'Brien. 2010.

Phylogenetic partitions of Asian felids reveal significant Indochinese-Sundaic

transition. Mol Biol Evol In Press .

Masuda, R., J.V. Lopez, J. Pecon-Slattery, N. Yuhki, and S.J. O'Brien. 1996. Molecular

phylogeny of mitochondrial cytochrome b and 12S rRNA sequences in the Felidae:

ocelot and domestic cat lineages. Molecular Phylogenetics and Evolution 6: 351-

365.

99

Nikaido, M., A.P. Rooney, and N. Okada. 1999. Phylogenetic relationships among

cetartiodactyls based on insertions of short and long interspersed elements:

Hippopotamuses are the closest extant relatives of whales. PNAS 96: 10261-10266.

Nishihara, H., S. Maruyama, and N. Okada. 2009. Retrotransposon analysis and recent

geological data suggest near-simultaneous divergence of the three superorders of

mammals. Proc Natl Acad Sci U S A 106: 5235-5240.

O'Brien, S.J. and W.E. Johnson. 2005. Big cat genomics. Annu. Rev. Genomics Hum.

Genet. 6: 407-429.

O’Brien, S.J., W. Johnson, C. Driscoll, J. Pontius, J. Pecon-Slattery, and M. Menotti-

Raymond. 2008. State of cat genomics. Trends in Genetics 24: 268-279.

Pecon-Slattery, J., A.J. Wilkerson, W.J. Murphy, and S.J. O'Brien. 2004. Phylogenetic

assessment of introns and SINEs within the Y chromosome using the cat family

Felidae as a species tree Mol Biol Evol 21: 2299-2309.

Ray, D.A. 2007. SINEs of progress: Mobile element applications to molecular ecology.

Molecular Ecology 16: 19-33.

Ray, D.A., J. Xing, A.H. Salem, and M.A. Batzer. 2006. SINEs of a nearly perfect

character. Syst. Biol 55: 928-935.

Schröder, C., C. Bleidorn, S. Hartmann, and R. Tiedemann. 2009. Occurrence of Can-

SINEs and intron sequence evolution supports robust phylogeny of pinniped

carnivores and their terrestrial relatives. Gene 448: 221-226.

Shedlock, A.M., K. Takahashi, and N. Okada. 2004. SINEs of speciation: tracking lineages

with retroposons. Trends in Ecology & Evolution 19: 545-553.

100

Singer, S.S., D.N. Mannel, T. Hehlgans, J. Brosius, and J. Schmitz. 2004. From "junk" to

gene: curriculum vitae of a primate receptor isoform gene. Journal of Molecular

Biology 341: 883-886.

Takahashi, K., Y. Terai, M. Nishida , and N. Okada. 2001. Phylogenetic relationships and

ancient incomplete lineage sorting among cichlid fishes in Lake Tanganyika as

revealed by analysis of the insertion of retroposons. Mol Biol Evol 18: 2057-2066.

Terai, Y., K. Takahashi, M. Nishida , T. Sato, and N. Okada. 2003. Using SINEs to probe

ancient explosive speciation: ‘‘Hidden’’ radiation of African cichlids? Mol Biol

Evol 20: 924-930.

Weiner, A.M. 2002. SINEs and LINEs: the art of biting the hand that feeds you. Current

Opinion in Cell Biology 14: 343 - 350.

Yu, L., Q.-w. Li, O. Ryder, and Y.-p. Zhang. 2004. Phylogenetic relationships within

mammalian order Carnivora indicated by sequences of two nuclear DNA genes.

Molecular Phylogenetics and Evolution 33: 694-705.

Yu, L. and Y.-p. Zhang. 2005. Evolutionary implications of multiple SINE insertions in an

intronic region from diverse mammals. Mammalian Genome 16: 651-660.

Zhao, F., J. Qi, and S.C. Schuster. 2009. Tracking the past: interspersed repeats in an

extinct Afrotherian mammal, Mammuthus primigenius . Genome Res 19: 1384-

1392.


Species | Common Name | Code | Lineage | Family
Felis bieti | Chinese Desert Cat | FBI | Domestic Cat | Felidae
Felis catus | Domestic Cat | FCA | Domestic Cat | Felidae
Felis chaus | Jungle Cat | FCH | Domestic Cat | Felidae
Felis libyca | African Wild Cat | FLI | Domestic Cat | Felidae
Felis margarita | Desert Cat | FMA | Domestic Cat | Felidae
Felis nigripes | Black-footed Cat | FNI | Domestic Cat | Felidae
Felis silvestris | European Wild Cat | FSI | Domestic Cat | Felidae
Leopardus colocolo | Pampas Cat | LCO | Ocelot | Felidae
Leopardus geoffroyi | Geoffroy's Cat | OGE | Ocelot | Felidae
Leopardus guigna | Kodkod | OGU | Ocelot | Felidae
Leopardus jacobitus | Andean Mt. Cat | OJA | Ocelot | Felidae
Leopardus pardalis | Ocelot | LPA | Ocelot | Felidae
Leopardus tigrinus | Tigrina | LTI | Ocelot | Felidae
Leopardus wiedii | Margay | LWI | Ocelot | Felidae
Lynx canadensis | Canadian Lynx | LCA | Lynx | Felidae
Lynx lynx | Eurasian Lynx | LLY | Lynx | Felidae
Lynx pardina | Iberian Lynx | LYP | Lynx | Felidae
Lynx rufus | Bobcat | LRU | Lynx | Felidae
Panthera leo | Lion | PLE | Panthera | Felidae
Panthera onca | Jaguar | PON | Panthera | Felidae
Panthera pardus | Leopard | PPA | Panthera | Felidae
Panthera tigris | Tiger | PTI | Panthera | Felidae
Uncia uncia | Snow Leopard | PUN | Panthera | Felidae
Neofelis nebulosa | Clouded Leopard | NNE | Panthera | Felidae
Pardofelis badia | Bay Cat | PBA | Bay Cat | Felidae
Pardofelis marmorata | Marbled Cat | PMA | Bay Cat | Felidae
Pardofelis temminckii | Asian Golden Cat | PTE | Bay Cat | Felidae
Otocolobus manul | Pallas Cat | OMA | Asian Leopard Cat | Felidae
Prionailurus bengalensis | Asian Leopard Cat | PBE | Asian Leopard Cat | Felidae
Prionailurus planiceps | Flat-headed Cat | IPL | Asian Leopard Cat | Felidae
Prionailurus rubiginosus | Rusty Spotted Cat | PRU | Asian Leopard Cat | Felidae
Prionailurus viverrinus | Fishing Cat | PVI | Asian Leopard Cat | Felidae
Profelis aurata | African Golden Cat | PAU | Caracal | Felidae
Profelis caracal | Caracal | CCA | Caracal | Felidae
Profelis serval | Serval | LSE | Caracal | Felidae
Puma concolor | Puma | PCO | Puma | Felidae
Puma yagouaroundi | Jaguarundi | HYA | Puma | Felidae
Acinonyx jubatus | Cheetah | AJU | Puma | Felidae
Helogale hirtula | Dwarf mongoose | HPA | Other Feliformia | Herpestidae
Genetta genetta | Genet | GGE | Other Feliformia | Viverridae
Hyaena hyaena | Hyena | HHY | Other Feliformia | Hyaenidae
Prionodon linsang | Banded Linsang | PLI | Other Feliformia | Prionodontidae
Cryptoprocta ferox | Fossa | CFE | Other Feliformia | Eupleridae

Supplemental Table 1: CanSINE sequences were obtained from all Felidae family species along with additional species representing other Feliformia families.


Scaffold | Forward Primer 5'-3' | Reverse Primer 5'-3' | Chromosome
782 | GTGCTTAGCTCTGAGTGTTGGA | GAGACTGGGAGCCTCCAGA | chrF1:82759656-82759439
5313 | TGAAGGTGGCAGTGCATGT | ACCTGTGAATCTAGGCTCCCTA | chrA3:7066904-7067187
125972 | ATGGCTAAAAGTCTACTTCAGACT | CAGAGGCCACGTTAGACATTA | chrC1:74363856-74363642
130416 | AGCTTGTCATACATCAGACTTCA | TAGCGCATTTATCTCTGTGTTCT | chrB4:9912326-9912585
133135 | TAATGCAAGTGTGTCTCATTCTA | ACAGTAGGGTCCTGCTTCAG | chrD2:92520918-92520698
134463 | GATGTCTCTCAAATGATTCATGTCT | AGTGTCTTACGTTTCTGAATCTCTTCT | chrD1:118666712-118666980
146417 | AACTAAACAGAAGATCCACAGG | TGATTGGAGAACATGCTTG | unknown
150162 | TAGACTGTTGCTATGAGCAGG | GACTGGCTGTAGCAACATTAG | chrE2:39923417-39923704
150853 | AGGCTTCAAAGTCTGGTGT | TCTTCTGTCACAGAGTTGACC | chrE2:65441081-65441305
150951 | CCTTGACTGGCCAGTTGACT | AGTCACTCTGAGTCTTACTGAAATGA | chrE2:71800542-71800192
161275 | ATTGCTGTAACAATCTACATTAAGAAAC | CCAAGAGGTTAGCAATTTCCAT | chrC1:184822652-184823028
168013 | ACTGCTGTATTCTTAGGACCCT | GTCCATCTTGTCTGGTGTGA | unknown
188620 | ACTATTTAAGCTGGGTCTGGA | CTTTGTGATGCCTCTGAACA | chrB2:122398146-122398467
206537 | CTCATTAGAAACGTGATGTCTG | GTGCTACCCGGAGGAA | chrUn2:3605781-3605984
212075 | CCATGACCCTATCTTTAAATAAGAGC | GCAATGTGTTCCACATCTTAGAGTT | chrD2:7263045-7262816
213798 | TGCCAATCAAAGATGTAAGTTACA | AGTAAATGGTGTATTGACATATGGGA | chrD3:47912789-47913122
n/a | ACTACATGTCCCTACCATTCA | TAAATTACAAGATCATTTTGGATA | chrA1:248917944-248918545
n/a | TTCCCTAACAGGTTGAAATG | TATGAAATTTATTGTCAAATTGGT | chrB1:32807332-32807799
n/a | CAGGTGAAGCTACCTCCACTA | AGTCCCTGGCACTGTGC | chrC1:181748497-181748747
n/a | TGGTTTATGCAGAGAACTGGTA | CATCACGCAGCCACAAA | chrD4:80722469-80722612
154966 | TAGATTTTCCTTCCAATTTCCTATGTAC | AGAATTAGGACGGTGTCAAGCT | unknown
214534 | TGGCCAAATGCGAGAAGTA | GCATGGAAACTTGCTTATGACA | chrA3:41096151-41096453
73133 | GCTGTACATATTATGGTAGTCAGTTC | CAAGTGGTCTCATGTATTGACA | chrA1:78064263-78064761
216524 | ATGCAGTGAATCTTCATGCATA | ATCTCGGGCAAACATCTTAGT | chrA2:218731772-218732047
216162 | ATGACACAATTCAGGACTTACA | AAGTATAATACTCACCCATCTGAAG | chrUn12:15996523-15996731
213566 | ACGCTCATCACCCAACTGT | CAGGTCACAACGGAGGTCA | chrE2:66187777-66188092
212733 | ACTACCTTGAGAAAGCAATATGCT | ACAAGGTATGACCATGCAAG | chrB3:91276896-91276402
139216 | CTGAACTTATCCGTCTGTGATCAATA | CCCAGATTGAGTCTCTTGAGATCT | chrB4:109463719-109463931
212331 | AGACTCTGTGCAATAGCTCATTTA | GCATTCTCCGGCATTTATAT | unknown
215112 | TCAGATATACATAGCACTGCTATAAGAAT | AAGGATAGGCATCAGGTTAGG | chrC2:108111706-108111935
203464 | ATGCGCCYTGCATACTG | GCCCATAAACTGGATAATCAGTC | chrX:73848142-73848698
150890 | CATGGACAAACTAGGAATGCC | GCTCCAACACCCCACTGT | chrA2:162899126-162899564

Supplemental Table 2: PCR primers for 31 genomic loci containing informative CanSINE insertions. Each primer pair is designated by the corresponding UCSC genome browser scaffold number (if known) and chromosome coordinates (if known).


Supplemental Figure 1 Alignment of the tRNA-related regions of CanSINE insertions localized in Chapter 2. Each sequence is identified by the UCSC genome browser scaffold number and the species or clade in which the insertion occurs. Primers for inter-SINE PCR (underlined) were designed to amplify motifs specific to SINEs that were active in the feliform progenitors or within subsequent lineages.

10 20 30 40 50 L. pardalis _125,86(B) GTTTAAAATAAAGATTTATTTGGGGCGCCTGGGTGGCGCAGTCGGTTAAGCGTCC L. pardalis _87(B) ...... L. pardalis _88(B) ...... L. pardalis _90,91,98(B) ...... L. pardalis _116(B) ...... L. tigrinus _21(C) ...... L. tigrinus _22(C) ...... L. tigrinus _23,47(D,C) ...... L. tigrinus _15(E) ...... L. tigrinus _34(C ...... L. tigrinus _3(E) ...... L. tigrinus _43(C ...... L. geoffroyi _21(D) ...... L. geoffroyi _3,73(D) ...... L. geoffroyi _40,28(D) ...... L. geoffroyi _48(D) ...... L. geoffroyi _71(D) ...... L. geoffroyi _72(D) ...... L. geoffroyi _60(D) ...... L. guigna _32,33,37(D) ...... L. jacobita _4(E) ......

60 70 80 90 100 L. pardalis _125,86(B) GACTTCAGCCAGGTCACGATCTCGCGGTCCGTGAGTTCGAGCCCCGCGTCGGGCT L. pardalis _87(B) ...... L. pardalis _88(B) ...... L. pardalis _90,91,98(B) ...... L. pardalis _116(B) ...... L. tigrinus _21(C) ...... L. tigrinus _22(C) ...... L. tigrinus _23,47(D,C) ...... L. tigrinus _15(E) ...... L. tigrinus _34(C) ...... T...... T...... L. tigrinus _3(E) ...... T...... T...... L. tigrinus _43(C) ...... A...... L. geoffroyi _21(D) ...... L. geoffroyi _3,73(D) ...... L. geoffroyi _40,28(D) ...... L. geoffroyi _48(D) ...... L. geoffroyi _71(D) ...... L. geoffroyi _72(D) ...... L. geoffroyi _60(D) ...... A...... A...... L. guigna _32,33,37(D) ...... A...... L. jacobita _4(E) ......

120 130 140 150 160 L. pardalis _125,86(B) CTGGGCTGATGGCTCAGAGCCTGGAGCCTGTTTCCGATTCTGTGTCTCTGTCTCT L. pardalis _87(B) ...... L. pardalis _88(B) ...... L. pardalis _90,91,98(B) ...... L. pardalis _116(B) ...... L. tigrinus _21(C) ...... L. tigrinus _22(C) ...... L. tigrinus _23,47 ...... L. tigrinus _15 ...... L. tigrinus _34(C) ...... ----.. L. tigrinus _3(E) ...... ----.. L. tigrinus _43(C) ...... L. geoffroyi _21(D) ...... L. geoffroyi _3,73(D) ...... L. geoffroyi _40,28(D) ...... L. geoffroyi _48(D) ...... L. geoffroyi _71(D) ...... L. geoffroyi _72(D) ...... L. geoffroyi _60(D) ...... 105

L. guigna _32,33,37(D) ...... L. jacobita _4(E) ......

170 180 190 200 L. pardalis _125,86(B) CTCTGTCCCAAAAATAAATAAACTTT-AAAAAAAATTAATTAATTACAAAT L. pardalis _87(B) ...... -....T....A...... L. pardalis _88(B) ...... -...... A...... A...... L. pardalis _90,91,98(B) ...... -...... A...A...A...... L. pardalis _91 ...... -...... A...A...A...... L. pardalis _98 ...... -...... A...A...A...... L. pardalis _116(B) ...... A....T....A...A...A...... L. tigrinus _21(C) ...... A...... A...A...A.A.... L. tigrinus _22(C) ...... TT...... A...A...A.A.... L. tigrinus _23,47 ...... A...... A...A...A.A.... L. tigrinus _15 ...... A...... A...A...A.A.... L. tigrinus _15 ...... A...... A...A...A.A.... L. tigrinus _34(C) ...... A...... A...A...A.A.... L. tigrinus _3(E) ...... A...... A...A...A.A.... L. tigrinus _43(C) ...... A...... A...A...A.A.... L. tigrinus _47 ...... A...... A...A...A.A.... L. geoffroyi _21(D) ...... -...... -A...A...A.A...A L. geoffroyi _3,73(D) ...... -...... A...A...A.A.... L. geoffroyi _40,28(D) ...... A...... A...A...A.A.... L. geoffroyi _48(D) ...... -...... ATT.A.T.A.A.... L. geoffroyi _71(D) ...... A...... A.T.A...A.A.... L. geoffroyi _72(D) ...... T....A...... A...A...A.A.... L. geoffroyi _73 ...... -...... A...A...A.A.... L. geoffroyi _28 ...... A...... A...A...A.A.... L. geoffroyi _60(D) ...... A...... A...A...A.A.... L. guigna _32,33,37(D) ...... A...... A...A...A.A.... L. jacobita _4(E) ...... A...... A...A...A.AG...

Supplemental Figure 2: Alignment of DNA sequences from the scaffold 133135 SINE amongst ocelot lineage species, with NADH5 haplotypes for each individual indicated in parentheses. Species-specific SINE variants are seen in the ocelot (L. pardalis), indicating the SINE was inherited from a common ocelot ancestor. The Andean mountain cat (L. jacobita) sequence is nearly identical to those from the kodkod (L. guigna), Geoffroy’s cat (L. geoffroyi) and tigrina (L. tigrinus). Thus, it cannot be determined whether the L. jacobita SINE is derived from incomplete lineage sorting in the common ancestor of the ocelot lineage or whether L. jacobita is more closely related to the L. guigna, L. geoffroyi and L. tigrinus clade.


10 20 30 40 50 P. viverrinus -27,34,44(A,B,B) TGAACAATAAGGAAATACTGAATGTCCTTTTTAAGAACTGATTTGAGGGGCGCCT P. viverrinus -38(A) ...... P. viverrinus -35(A) ...... P. viverrinus -20(A) ...... P. viverrinus -3(A) ...... A...... P. rubiginosus -2(C) ...... P. rubiginosus -3(C) ...... P. bengalensis -6(U) ...... P. bengalensis -9(C) ...... P. bengalensis -18(B) ...... P. bengalensis -47,102,105(B) ......

60 70 80 90 100 P. viverrinus -27,34,44(A,B,B) GGGTGGCGCAGTCGGTTAAGCGTCCGACTTCAGCCAGGTCACGATCTTGCGGTCC P. viverrinus -38(A) ...... P. viverrinus -35(A) ...... P. viverrinus -20(A) ...... A...... P. viverrinus -3(A) ...... P. rubiginosus -2(C) ...... A...... P. rubiginosus -3(C) ...... T...... C...... P. bengalensis -6(U)-8 ...... T...... C...... P. bengalensis -9(C) ...... A...... P. bengalensis -18(B) ...... A...... P. bengalensis -47,102,105(B) ...... A......

120 130 140 150 160 P. viverrinus -27,34,44(A,B,B) GTGAGTTCGAGCCCCACGTCGGGCCCTGGGCTGATGGCTCAGAGCCTGGAGCCTG P. viverrinus -38(A) ...... P. viverrinus -35(A) ...... P. viverrinus -20(A) A...... P. viverrinus -3(A) ...... P. rubiginosus -2(C) ...... G...... P. rubiginosus -3(C) ...... G.C...... P. bengalensis -6(U) ...... G...... P. bengalensis -9(C) ...... G...... P. bengalensis -18(B) ...... G...... R...... P. bengalensis -47,102,105(B) ...... G......

170 180 190 200 210 P. viverrinus -27,34,44(A,B,B) TTTCCGATTCTGTGTCTCCCTCTCTCTCTGCCCTTCCCCCGTTCATGCTCTGTCT P. viverrinus -38(A) ...... P. viverrinus -35(A) ...... P. viverrinus -20(A) ...... P. viverrinus -3(A) ...... P. rubiginosus -2(C) ...... P. rubiginosus -3(C) ...... P. bengalensis -6(U) ...... P. bengalensis -9(C) ...... P. bengalensis -18(B) ...... P. bengalensis -47,102,105(B) ......

230 240 250 260 270 P. viverrinus -27,34,44(A,B,B) CTCTCTGTCCCAAAAATAAATAAACGTTGA------AAACAAAA P. viverrinus -38(A) ...... ------...... P. viverrinus -35(A) ...... ------...... P. viverrinus -20(A) ...... ------...... P. viverrinus -3(A) ...... ------...... P. rubiginosus -2(C) ...... ------...... P. rubiginosus -3(C) ...... AAAAAAAATTTTTTTTT...A.... P. bengalensis -6(U) ...... AAAAAAAATTTTTTTTT...A.... P. bengalensis -9(C) ...... ------...... P. bengalensis -18(B) ...... ------...... P. bengalensis -47,102,105(B) ...... ------......

280 290 300 P. viverrinus -27,34,44(A,B,B) ATTAAGAACTGATTTGAAATTAATTTCAATCCAT P. viverrinus -38(A) .....A...... C. P. viverrinus -35(A) ...... G.....T..... P. viverrinus -20(A) ...... P. viverrinus -3(A) ...... P. rubiginosus -2(C) ...... 107

P. rubiginosus -3(C) ...... -...... P. bengalensis -6(U) ...... P. bengalensis -9(C) ...... P. bengalensis -18(B) .....A...... P. bengalensis -47,102,105(B) ...... -......

Supplemental Figure 3: Alignment of DNA sequences from the scaffold 161275 SINE amongst Asian leopard cat lineage species, with NADH5 haplotypes for each individual indicated in parentheses. The two rusty-spotted cat (P. rubiginosus) individuals have SINE sequences that differ from each other and are found amongst the Asian leopard cat (P. bengalensis). Thus it cannot be determined whether the rusty-spotted cats acquired their SINEs through incomplete lineage sorting of ancestral polymorphisms or through hybridization with the Asian leopard cat.


CHAPTER FOUR: COMPUTATIONAL COMPARATIVE ANALYSES OF CanSINEs IN FELIDAE

Abstract

Background: The Felidae (cat) family is a clade within the order Carnivora that serves as a model system for biomedical, conservation and evolutionary biology. Recent whole genome analysis of the domestic cat revealed that the Carnivora-specific short interspersed nuclear element (SINE) family called CanSINEs comprises more than 8% of Felidae genomes. Current comparative genomic analyses suggest that repetitive sequences, such as SINEs, have significant impacts on gene function and are highly useful genetic markers for phylogenetic studies.

Results: Previously, 98 intergenic CanSINE insertions were characterized amongst 38 extant Felidae species and 5 feliform outgroup species. Here, alignments of these insertions are used to delineate two feliform-specific CanSINE subfamilies that are differentiated by sequence and were active during distinct time points in feliform evolution. In addition, comparison of CanSINE flanking sequences allowed inference of the preferred endonuclease-mediated SINE integration sites. Our results are consistent with previous findings among primate Alu elements that SINE retrotransposition most often occurs at sites containing TTAAAA motifs. Finally, the collection of CanSINEs was used to infer Felidae phylogeny through concatenation of nucleotide sequences from plesiomorphic insertion sites and by using synapomorphic insertion sites in a transformation series for parsimony analysis based on SINE presence or absence. The resulting phylogenies are largely consistent with prior assessments, including the division of Felidae into eight major lineages.

Conclusion: This study presents a robust characterization of CanSINEs within the dynamic Felidae family that demonstrates the impact of these sequences on the structure of the cat genome and their utility as sources of evolutionary data.


Background

Transposable elements comprise a significant portion of eukaryotic genomes and can be classified into retrotransposons and DNA transposons, of which the former propagate via RNA-mediated transposition (Wicker et al. 2007). Among the classes of retrotransposons are long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). While LINEs are autonomously mobilized by an internally encoded reverse transcriptase, SINEs rely on other mobile elements, including LINEs, and on the host genome for transposition (Pritham 2009; Weiner 2002). SINEs are 100-300 base pair (bp) sequences that are particularly abundant in mammalian genomes, where they account for over 10% of the genome, and they elicit phenotypic changes through ‘exonization’, transcriptional regulation and genomic rearrangements (Batzer and Deininger 2002; Callinan et al. 2005; Clark et al. 2006; Cordaux and Batzer 2009).

Much of our current understanding of mammalian SINEs is derived from extensive study of primate Alu sequences. Alu elements are divided into subfamilies based on the extent of sequence diversity, diagnostic mutations and evolutionary distribution (Liu et al. 2009; Shimamura et al. 1999). Prior analyses have shown that integration of novel Alu transcripts occurs at sequences containing two genomic sites that are cleaved by a LINE-derived endonuclease. The 5’ nicking site contains six consensus nucleotides, TT-AAAA, where the hyphen indicates the cleavage site (Gentles et al. 2005), while the second nicking site has not been fully characterized but typically occurs 0-20 base pairs downstream from the 5’ site (Jurka 1997). Once integrated, orthologous SINE insertion sites are generally clade specific and permanent, such that the presence of a SINE at a particular locus can serve as a phylogenetic marker (Ray et al. 2007; Ray et al. 2006). Thus, collections of informative insertions may be incorporated into parsimony analyses for de novo inference of phylogeny (Nikaido et al. 2001; Sasaki et al. 2004). Nucleotide data from orthologous SINE loci can also provide valuable phylogenetic information, although this application is seldom employed (Zehr et al. 2001).

The recently published genomes of the domestic dog (Canis familiaris) and domestic cat (Felis catus) shed light on the Carnivora-specific SINE family, CanSINEs (Lindblad-Toh et al. 2005; Pontius et al. 2007). In addition, revisions of the carnivore phylogeny, including a survey of the entire Felidae family, provide a reference for evaluation of CanSINEs in an evolutionary context (Eizirik et al. 2010; Johnson et al. 2006). Using these resources, nearly 100 intergenic CanSINE insertions were described (Chapters 2 and 3). In this study, I further examine these insertions in functional and evolutionary contexts through 1) a thorough characterization of feliform-specific CanSINE subfamilies, 2) determination of target insertion sites, 3) utilization of plesiomorphic insertion sites as sources of orthologous sequence data for phylogenetic analysis and 4) implementation of informative insertion sites in transformation series to reconstruct phylogenetic relationships within the Felidae.

Methods

Acquisition of CanSINE insertion loci

Previously, two strategies were employed to localize feliform-specific SINE insertion loci (Chapters 2 and 3). First, 31 CanSINE loci mapped to the F. catus whole genome sequence using RepeatMasker (www.repeatmasker.org, version open-3.1.0 with Cross_match version 0.990329, Repbase Update 10.04, RM database version 20050523) were characterized in all other extant Felidae species and five other feliform species (Hyaena hyaena, Genetta genetta, Prionodon linsang, Cryptoprocta ferox and Helogale hirtula). This method yielded the presence/absence status of the 31 loci across Feliformia and also identified 14 previously unknown CanSINE insertions amongst wild species. The second approach utilized SINE-to-SINE PCR, wherein primers specific to feliform CanSINE sequences were used to amplify genomic DNA from a variety of wild Felidae species. The resulting amplicons were isolated, sequenced and mapped to the F. catus genome. Potentially informative regions were then queried across all other extant Felidae, resulting in the discovery of 53 CanSINE loci. The names of all feliform species sampled are listed in Supplemental Table 1.

Subfamily Characterization

The tRNA-related regions from orthologous loci were aligned using the MAFFT algorithm (Katoh et al. 2009) and adjusted by hand in the Geneious software package (Drummond and Ashton B 2010). To account for sampling bias, consensus sequences for insertions present throughout the feliform suborder were obtained by aligning orthologous sequences from single individuals representing each feliform family sampled. Likewise, consensus sequences for insertions present across Felidae were obtained by aligning segments from single individuals representing each of the eight major Felidae lineages. Consensus sequences for all other insertions were obtained by aligning sequences from all species in which the insertion is found.
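The consensus step described above can be illustrated with a minimal Python sketch. It assumes a simple majority rule with ties reported as N; the consensus settings actually used in Geneious are not stated here, so the function name majority_consensus, the tie handling and the toy sequences are all hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_seqs, ambiguous="N"):
    """Return a majority-rule consensus of equal-length aligned sequences.

    Assumption: the most common state in each column wins and ties are
    reported with the `ambiguous` symbol. Gaps are counted like any state.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(base.upper() for base in column)
        (top, top_n), *rest = counts.most_common()
        # A tie between the two most frequent states is left ambiguous.
        if rest and rest[0][1] == top_n:
            consensus.append(ambiguous)
        else:
            consensus.append(top)
    return "".join(consensus)


# Hypothetical toy alignment: one representative individual per lineage.
representatives = ["GGGCGCCTGG", "GGGCGCCTGG", "GGGTGCCTGG", "GGGCGC-TGG"]
print(majority_consensus(representatives))
```

Using one representative per family or lineage, as in the text, keeps heavily sampled groups from dominating the column counts.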

Phylogenetic analyses were performed using neighbor-joining, maximum parsimony and maximum likelihood algorithms. Neighbor-joining and maximum parsimony methods were implemented in PAUP with equal character weighting among sites and the HKY85 substitution model (Swofford 2003). Maximum likelihood was implemented in GARLI through the Lattice Project Grid computing system using the general time reversible model and a gamma distribution to account for among-site rate variation (Bazinet and Cumming 2008; Zwickl 2006). Bootstrap support values for all three analyses were obtained from 1000 replicates. Genetic distances were estimated using average pair-wise percent identities calculated by the Geneious software package (Drummond and Ashton B 2010). Again, to account for sampling bias, pair-wise percent identities of insertions present across Feliformia were calculated by comparing only one representative per taxonomic family.

Target Site Duplications

Target site duplication (TSD) sequences were determined by first running a self-BLAST search (http://www.proweb.org/proweb/Tools/selfblast.html) on full-length SINE sequences plus 60 nucleotides of 5’ and 3’ flanking sequence to identify perfectly matching repeat sequences. However, accumulation of mutations necessitated manual estimation of most TSDs by locating a TT-AAAA or similar hexamer (the putative endonuclease nick site) 5’ to the SINE start site and inferring the TSD as the sequence in between. The matching sequence was then located downstream of the 3’ end of the SINE. While this technique was usually successful, in some cases neither a perfect match, a putative nicking hexamer nor a 3’ duplication could be located, and these sites were omitted from further analyses. In total, the TSDs and flanking 16 bp were obtained for 71 CanSINE sequences. Nucleotide frequencies were calculated for each relative position and χ2 statistics were calculated using the R program (http://www.R-project.org), with significance assessed at p values of 0.05 and 0.01.
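The per-position test can be sketched as follows. This is a minimal Python illustration rather than the R code used in this study; the function name position_chi_square and the toy flank sequences are hypothetical, and the expected counts are assumed to come from the pooled base composition of all input sequences. With four bases there are three degrees of freedom, for which the standard χ2 critical values are 7.815 (p=0.05) and 11.345 (p=0.01).

```python
from collections import Counter

BASES = "ACGT"
# Standard chi-square critical values for 3 degrees of freedom.
CRITICAL = {0.05: 7.815, 0.01: 11.345}


def position_chi_square(flanks):
    """Chi-square statistic per alignment column for equal-length sequences.

    Expected counts at each column are the pooled base frequencies of all
    sequences multiplied by the number of scored bases in that column.
    """
    pooled = Counter(b for seq in flanks for b in seq if b in BASES)
    total = sum(pooled.values())
    background = {b: pooled[b] / total for b in BASES}

    stats = []
    for column in zip(*flanks):
        observed = Counter(b for b in column if b in BASES)
        n_col = sum(observed.values())
        stat = 0.0
        for b in BASES:
            expected = background[b] * n_col
            if expected > 0:  # bases absent from the pool contribute nothing
                stat += (observed[b] - expected) ** 2 / expected
        stats.append(stat)
    return stats


# Hypothetical toy flanks; real input would be the 71 aligned TSD regions.
toy_flanks = ["TTAAAAGC", "TTAAAACG", "TTAAGAGT", "CTAAAATC"]
for pos, stat in enumerate(position_chi_square(toy_flanks), start=1):
    flag = "**" if stat > CRITICAL[0.01] else "*" if stat > CRITICAL[0.05] else ""
    print(pos, round(stat, 2), flag)
```

With only a handful of sequences the expected counts are too small for the χ2 approximation to be reliable, so the toy output is illustrative only.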


Molecular Phylogeny

Two supermatrices were used to construct molecular Felidae phylogenies. The first consisted of 23 concatenated tRNA-related regions from orthologous SINE loci, for a total alignment length of 2,965 bp. The second consisted of a concatenation of those same tRNA-related regions plus 18 orthologous CanSINE microsatellite regions, for a total alignment length of 4,008 bp. Phylogeny was estimated using maximum parsimony, maximum likelihood and Bayesian optimality criteria. Maximum parsimony methods were implemented in PAUP with equal character weighting among sites, the HKY85 substitution model and gaps coded as fifth characters (Swofford 2003). The general time reversible plus proportion invariant plus gamma (GTR+I+G) model was selected as the optimal nucleotide substitution model for likelihood analyses using Modeltest with the Akaike information criterion (AIC) (Posada and Buckley 2004; Posada and Crandall 2001). Maximum likelihood was implemented in GARLI through the Lattice Project Grid computing system using the general time reversible substitution model and a gamma distribution to account for among-site rate variation (Bazinet and Cumming 2008; Zwickl 2006). Bayesian analyses were performed in BEAST using the general time reversible substitution model and a strict molecular clock (Drummond and Rambaut 2007). Bootstrap support values for parsimony and likelihood analyses were obtained from 1000 replicates.
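Concatenation of per-locus alignments into a single matrix can be sketched as below. This is an illustrative Python example under stated assumptions, not the procedure used to build the matrices in this study: the function name concatenate_alignments, the use of '?' for taxa missing at a locus and the toy loci are all hypothetical.

```python
def concatenate_alignments(alignments, missing="?"):
    """Join per-locus alignments (dict: taxon -> aligned sequence) end to end.

    Assumption: a taxon that was not sampled at a locus is padded with the
    `missing` symbol so every row of the result has the same length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each locus must be aligned to a single length")
        locus_len = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * locus_len))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy loci keyed by the species codes of Supplemental Table 1.
toy_loci = [
    {"FCA": "ACGT", "PLE": "ACGA"},
    {"FCA": "GG", "LRU": "GC"},
]
for taxon, row in concatenate_alignments(toy_loci).items():
    print(taxon, row)
```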

Insertion Site Phylogeny

A matrix of presence/absence character states was compiled for 31 parsimony informative CanSINE insertion sites using WinClada (Nixon 2003). All characters were weighted equally and treated as non-additive. A heuristic parsimony search was also completed using WinClada with the TBR branch-swapping method, and unsupported nodes were collapsed in the resultant phylogeny.
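The notion of a parsimony informative presence/absence character can be made concrete with a short Python sketch. It applies the usual criterion that a binary character is informative only when each state occurs in at least two taxa; the function names, the 0/1/None coding and the toy loci are hypothetical and do not reproduce the WinClada workflow.

```python
def is_parsimony_informative(states):
    """True if both absence (0) and presence (1) occur in at least two taxa.

    Unscored taxa (any value other than 0 or 1, e.g. None) are ignored.
    """
    scored = [s for s in states if s in (0, 1)]
    return scored.count(0) >= 2 and scored.count(1) >= 2


def informative_loci(matrix):
    """Return the loci of a dict (locus -> {taxon: state}) that are informative."""
    return [
        locus
        for locus, row in matrix.items()
        if is_parsimony_informative(list(row.values()))
    ]


# Hypothetical toy matrix using species codes from Supplemental Table 1.
toy_matrix = {
    "locus_A": {"FCA": 1, "FSI": 1, "PLE": 0, "PTI": 0},  # shared by two taxa
    "locus_B": {"FCA": 1, "FSI": 0, "PLE": 0, "PTI": 0},  # autapomorphy
    "locus_C": {"FCA": 1, "FSI": 1, "PLE": 1, "PTI": 1},  # present in all
}
print(informative_loci(toy_matrix))
```

Insertions restricted to a single species, or present in every sampled species, carry no grouping information under parsimony and are filtered out by this check.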

Combined Sequence and Presence/Absence Phylogeny

A combined matrix of 31 parsimony informative CanSINE insertion sites and 4,008 bp of sequence data from orthologous SINE loci was composed using Mega 5 (Tamura et al. 2011). Presence was coded as ‘G’ and absence was coded as ‘A’. Maximum parsimony reconstructions with equal character weighting were performed using Mega 5 with the close-neighbor-interchange algorithm, and maximum parsimony reconstructions with variable character weighting were performed using WinClada with the TBR branch-swapping method.
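The coding scheme, in which insertion presence is written as G and absence as A so that the characters can sit alongside nucleotide data, can be sketched in Python as follows. The function names, the '?' symbol for unscored loci and the toy data are hypothetical; the sketch only illustrates the recoding and does not reproduce the Mega 5 matrix.

```python
def encode_insertions(states, present="G", absent="A", missing="?"):
    """Recode presence (1) / absence (0) states as pseudo-nucleotides."""
    return "".join(
        present if s == 1 else absent if s == 0 else missing for s in states
    )


def combine(sequences, insertion_states):
    """Append the recoded insertion characters to each taxon's sequence row."""
    return {
        taxon: seq + encode_insertions(insertion_states[taxon])
        for taxon, seq in sequences.items()
    }


# Hypothetical toy data: 4 bp of sequence plus three insertion loci per taxon.
toy_sequences = {"FCA": "ACGT", "PLE": "ACGA"}
toy_states = {"FCA": [1, 0, None], "PLE": [0, 0, 1]}
print(combine(toy_sequences, toy_states))
```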

Results and Discussion

CanSINE Subfamilies

Nearly 100 intergenic CanSINE insertion loci were isolated and sequenced across all Felidae species and other feliforms. Alignment of consensus sequences from the tRNA-related regions of each orthologous locus revealed two SINE subfamilies. Subfamily I comprises insertions present in all Feliformia species examined and thus was presumably active in the genome of the common Feliformia ancestor. However, a C. ferox specific insertion indicates more recent proliferation of subfamily I among the Malagasy carnivores. Subfamily II comprises more recently derived insertions that were encountered only within the Felidae and Prionodontidae lineages, as plesiomorphies, synapomorphies or autapomorphies.

The two subfamilies are distinguished by an eight base pair indel at alignment position 35 and a seven base pair indel at alignment position 108. Alignment of subfamily I insertions also reveals two sequence subtypes, A and B, that are differentiated by insertions and single nucleotide polymorphisms (Supplemental Figure 1). Within subfamily II, the insertion sequences are also divided into two subtypes. Subtype B arose after the initial Felidae radiation (as synapomorphies or autapomorphies) and has a thymine deletion at alignment position 120 relative to subtype A (Supplemental Figure 1). Phylogenetic trees drawn using neighbor-joining, maximum parsimony and maximum likelihood algorithms each result in statistically supported subfamily I and II clades (Figure 1). In addition, within subfamilies I and II, subtypes A and B each form monophyletic lineages.

Once a new SINE insert becomes fixed within a population, it acquires mutations through subsequent speciation events such that the degree of sequence divergence between species is proportional to the age of the insertion (Nishihara et al. 2007). Accordingly, subfamily I insertions, which have existed for ~40-60 MY (Eizirik et al. 2010), have an average pair-wise percent similarity of 83.45% (standard deviation 7.23%) amongst the species in which they occur, while the 11 MY or younger (Johnson et al. 2006) subfamily II insertions have an average pair-wise percent similarity of 96.32% (standard deviation 2.71%) (Figure 2).
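As an illustration of how such averages can be obtained, the sketch below computes pair-wise percent identity for the aligned sequences of one locus and summarizes it by mean and standard deviation. It is a minimal sketch under stated assumptions: the function names are hypothetical, and skipping columns in which either sequence has a gap is a choice made here, since the treatment of gaps in the reported values is not specified.

```python
from itertools import combinations
from statistics import mean, stdev


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def summarize_locus(aligned):
    """Mean and sample standard deviation of all pair-wise identities."""
    scores = [percent_identity(a, b) for a, b in combinations(aligned.values(), 2)]
    return mean(scores), (stdev(scores) if len(scores) > 1 else 0.0)


# Toy alignment with hypothetical species labels.
toy_locus = {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ACGT"}
locus_mean, locus_sd = summarize_locus(toy_locus)
```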

Although transposable element databases such as Repbase (Jurka et al. 2005) and RepeatMasker (Smit and Green 2005) include a broad collection of mammalian SINE voucher sequences, the breadth of species in which a particular SINE family or subfamily occurs is often undocumented. Thorough delineation of SINE sequences is essential to future genome annotation. Here, two feliform-specific SINE subfamilies are characterized in greater detail than is currently available in published databases. The RepeatMasker voucher sequences, SINEC_Fc and SINEC_Fc2, each cluster with Felidae-wide subfamily II insertions (Supplemental Figure 1), while Repbase’s SINEC_Fc3 sequence does not cluster with any of the feliform-specific SINEs characterized in this study (data not shown).

Insertion Site Patterns

The target site duplication (TSD) sequence, 16 bp of upstream genomic sequence and 16 bp of downstream genomic sequence were obtained for 32 subfamily I and 39 subfamily II insertions. Relative base frequencies were calculated at each nucleotide position (Figure 3). As in primates, rodents and other carnivores, thymine is disproportionately represented in the two base pairs immediately preceding the first endonucleolytic nicking site of feliform SINEs, while adenine is overrepresented in the four base pairs following the first nicking site. This suggests that the preferred 5’ nicking site lies between TT and AAAA. The statistical significance of the nicking site structure was verified by a χ2 test that compares the observed to expected base occurrences at individual nucleotide positions, based on the overall base composition of the 5’ flanking repeats and the 16 bp adjacent regions (Figure 4) (Gentles et al. 2005; Jurka 1997). In addition, a tally of hexamers located in positions -2 through +4 shows an abundance (63%) of TTAAAA motifs or variants that differ by one nucleotide (Figure 4). Comparatively less base composition structure is observed at the 3’ ends of the flanking repeats. Here, only base occurrences at the 14th position preceding the second nick site are significant at p=0.01, and this structure is likely a reflection of the 5’ nick site described above (Figures 3 and 4).

Insertion site preference does not vary significantly between subfamily I and subfamily II insertions at either the 5’ or 3’ ends.
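The position-wise χ2 comparison and the hexamer tally described above can be sketched as follows. This is an illustrative reconstruction rather than the code used in the study: the function names, the toy sites and the uniform background are hypothetical, and the use of 3 degrees of freedom (four bases minus one) for the critical values is an assumption.

```python
from collections import Counter

BASES = "ACGT"
# Upper-tail chi-square critical values for 3 degrees of freedom (assumed df).
CRITICAL_3DF = {0.01: 11.345, 0.001: 16.266}


def position_chi2(sites, background):
    """Chi-square statistic per column of equal-length, nick-aligned sites.

    background -- dict: base -> expected frequency from overall composition
    """
    scores = []
    for column in zip(*sites):
        counts = Counter(base for base in column if base in BASES)
        n = sum(counts.values())
        if n == 0:
            scores.append(0.0)
            continue
        scores.append(sum((counts[b] - n * background[b]) ** 2 / (n * background[b])
                          for b in BASES))
    return scores


def fraction_near_motif(hexamers, motif="TTAAAA", max_mismatches=1):
    """Fraction of hexamers matching the motif with at most max_mismatches."""
    def mismatches(a, b):
        return sum(x != y for x, y in zip(a, b))

    hits = sum(1 for h in hexamers
               if len(h) == len(motif) and mismatches(h.upper(), motif) <= max_mismatches)
    return hits / len(hexamers)


# Toy data (hypothetical): four identical sites against a uniform background.
uniform = {b: 0.25 for b in BASES}
toy_scores = position_chi2(["TTAA"] * 4, uniform)
toy_fraction = fraction_near_motif(["TTAAAA", "TTAAAG", "GGCCGG", "TAAAAA"])
```

A column would be flagged as significant when its score exceeds the chosen entry of `CRITICAL_3DF`; with real data the background frequencies would come from the pooled flanking sequence rather than a uniform distribution.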


The structured base composition at the 5’ end of the flanking repeat supports the involvement of L1 LINE endonuclease in the generation of staggered nicks prior to CanSINE retrotransposition. This mechanism was described for primate and rodent SINEs, and the endonuclease was shown to nick DNA in A/T-rich regions in vitro (Feng et al. 1996; Gentles et al. 2005; Jurka 1997). The position of the second nick site is less defined and may depend on the distance between the active sites of the endonucleolytic enzyme rather than on sequence (Jurka 1997), as indicated by the observation that over 50% of TSDs are between 14 and 17 base pairs in length (Table 1). Non-random insertion patterns also explain the near-parallel insertions that are repeatedly observed amongst CanSINEs (Pecon-Slattery et al. 2004; Schröder et al. 2009; Walters-Conte, Chapters 2 and 3).
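The share of TSDs falling in a given length window can be tallied directly from the counts listed in Table 1. The short sketch below does this; the function and variable names are hypothetical, and the dictionary simply transcribes the table.

```python
# TSD length (bp) -> number of observations, as listed in Table 1.
TSD_LENGTH_COUNTS = {7: 2, 8: 1, 9: 1, 10: 4, 11: 4, 12: 8, 13: 7, 14: 10,
                     15: 10, 16: 12, 17: 9, 18: 4, 20: 1, 23: 1, 24: 1}


def fraction_in_range(counts, shortest, longest):
    """Fraction of observations with shortest <= length <= longest."""
    total = sum(counts.values())
    inside = sum(n for length, n in counts.items() if shortest <= length <= longest)
    return inside / total


def modal_length(counts):
    """Length with the largest number of observations."""
    return max(counts, key=counts.get)


share_14_to_17 = fraction_in_range(TSD_LENGTH_COUNTS, 14, 17)
most_common_length = modal_length(TSD_LENGTH_COUNTS)
```

With the counts as tabulated, lengths of 14 to 17 bp account for just over half of the observations, and the mode is 16 bp.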

SINE Sequences for Phylogenetic Reconstruction

Phylogenetic reconstructions ideally utilize orthologous loci that are distributed throughout the genome, present in all taxa, easy to align and variable enough to provide sufficient numbers of informative sites (Cruickshank 2002). Mammalian SINEs fulfill these requirements, hence Felidae phylogenies were constructed using orthologous CanSINE loci (Johnson et al. 2006).

Sequence data were compiled from 23 orthologous CanSINE sequences present either in all Feliformia or in all Felidae species. Data set 1 consisted of an alignment of approximately 130 bp of tRNA-related regions from each locus for a total of 2,965 bp, while data set 2 was an alignment of the tRNA-related regions plus the microsatellite regions of 18 loci for a total of 4,008 bp. Phylogenies were constructed using maximum parsimony, maximum likelihood, and Bayesian analyses, with gaps coded as a fifth character state to maximize the number of informative sites in the parsimony analysis.

Each of the major Felidae lineages is clearly defined under all optimality criteria (Figure 5). However, Bayesian estimation included a weakly supported bay cat and lynx lineage clade, with posterior probability scores of 94% (Supplemental Figure 4). The inclusion of SINE microsatellite regions allowed resolution of many within-lineage relationships that are largely congruent with prior comprehensive Felidae topologies (Johnson et al. 2006). Among the exceptions is the placement of P. planiceps as the basal Prionailurus species, with the P. bengalensis, P. rubiginosus and P. viverrinus clade receiving support values of 74% for maximum parsimony, 60% for maximum likelihood and 88% for Bayesian estimation. Given that the SINE data sets are less than one-fifth the size of the Johnson et al. (2006) data set, some inconsistencies and unresolved nodes were expected. Overall, sequences derived from orthologous SINE insertions contain sufficient variable sites to allow phylogenetic resolution to at least the family level.

SINE Presence/Absence for Phylogenetic Reconstruction

A phylogenetic tree of the Felidae was constructed on the basis of SINE insertion status from a total of 31 previously characterized informative loci (Chapters 2 and 3). As expected, because there were fewer characters than taxa, the phylogeny was not fully resolved. Nonetheless, a single most parsimonious tree (L=33, CI=93, RI=97) was obtained that is largely congruent with previous assessments of Felidae evolution, including resolution of each of the eight major lineages and several internal nodes (Figure 6). The phylogeny required a single character reversal within the Asian leopard cat lineage and one parallel insertion event in the lynx lineage. These two inconsistencies were discussed previously as consequences of incomplete lineage sorting following rapid speciation (Chapter 3).
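To make the tree statistics above concrete, the sketch below counts the minimum number of state changes a single presence/absence character requires on a fixed tree, using Fitch's small-parsimony procedure, and computes a consistency index as minimum steps divided by observed steps. It is a generic illustration, not the software used in this study; the function names, the nested-tuple tree encoding and the toy taxa are all hypothetical.

```python
def fitch_steps(tree, states):
    """Minimum number of changes for one character on a rooted binary tree.

    tree   -- nested 2-tuples with leaf names (strings) at the tips
    states -- dict: leaf name -> observed state (e.g. 0 or 1)
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:
            return shared
        steps += 1  # no shared state: one change is needed at this node
        return left | right

    visit(tree)
    return steps


def consistency_index(minimum_steps, observed_steps):
    """Ensemble consistency index: minimum possible steps / observed steps."""
    return minimum_steps / observed_steps


# Toy tree with hypothetical taxa A-D.
toy_tree = ((("A", "B"), "C"), "D")
clean = fitch_steps(toy_tree, {"A": 1, "B": 1, "C": 0, "D": 0})      # one gain
conflicting = fitch_steps(toy_tree, {"A": 1, "B": 0, "C": 1, "D": 0})  # homoplasy
ci_example = consistency_index(31, 33)
```

Assuming each of the 31 binary characters needs at least one step, the reversal and the parallel insertion described above add two steps, giving 33 in total and a consistency index of 31/33, or about 0.94, in line with the reported value.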

While SINE insertions have been lauded as ideal characters for cladistic phylogenetic analysis (Shedlock et al. 2004), the relative difficulty of localizing adequate numbers of independent insertions has limited the use of SINEs as characters in a transformation series (Nikaido et al. 1999). SINE insertions are more often employed as additional evidence for existing phylogenies (Murphy et al. 2007; Schröder et al. 2009; Yu et al. 2008). Using recently developed comparative genomic approaches, a sufficient number of loci were characterized such that a de novo Felidae phylogeny could be constructed. With additional resources, including further whole-genome sequencing projects, it is now possible to fully resolve ambiguous mammalian phylogenies based solely on SINE insertion data (Xing et al. 2005).

Combined Matrix for Phylogenetic Reconstruction

A phylogenetic analysis using a matrix of 4,008 bp of SINE-derived sequence plus 31 SINE presence/absence characters was performed using maximum parsimony. With equal character weighting, heuristic searches retained 11 most parsimonious trees with consistency index (CI) = 0.80 and retention index (RI) = 0.78. The topology of the majority-rule consensus is nearly identical to topologies obtained from sequence-only analyses (Supplemental Figures 2 and 8). Bootstrap support scores obtained from 1000 replicates are noted at equivalent nodes on the sequence-derived tree in Figure 5. To balance the relative contributions of the data partitions, a series of maximum parsimony trees was constructed with variable weighting of the presence/absence data. Although the topologies of the resulting trees did not vary significantly, consistency and retention indices increased with additional weighting of presence/absence characters (Table 2).


The combined matrix of sequence and presence/absence data provides molecular characters that are evolving at different rates: a fast-evolving microsatellite portion with an inherent risk of homoplasy from mutational saturation, a medium-evolving tRNA-region-derived portion with fewer variable sites but less mutational saturation, and slowly evolving presence/absence characters with few phylogenetic inconsistencies. Here, the combined data generally improve the resolution of the Felidae tree (Figure 5). Bootstrap support scores also increased at several nodes with the addition of presence/absence data. However, reduced support for the L. pardinus and L. lynx lineage (non-resolution of the lynx clade) with the addition of presence/absence data can be attributed to an insertion present in L. lynx and L. canadensis that is absent in L. pardinus, as well as two insertions that are present in either L. canadensis or L. pardinus but polymorphic in L. lynx (Chapters 2 and 3).

The diagnostic power of presence/absence data in the combined data phylogeny is also demonstrated by consistency and retention indices obtained under different character weighting schemes (Table 2). As the relative weight of the presence/absence partition increases, the tree length, CI and RI all increase. While tree length will inherently increase each time a SINE character is used to diagnose a clade, higher CI and RI indicate that synapomorphies defined by presence/absence characters are strongly supported in light of conflicting signal from the faster evolving tRNA-related and microsatellite regions.
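The dependence of tree length on the weight of the presence/absence partition can be checked against the values in Table 2. The sketch below computes the increase in tree length per unit of added weight between successive rows; the variable and function names are hypothetical, and the interpretation offered afterwards is a reading of the numbers rather than a result reported in the text.

```python
# (presence/absence character weight, tree length in steps), from Table 2.
TABLE2_LENGTHS = [(1, 1324), (2, 1359), (5, 1460), (10, 1625), (30, 2285), (60, 3275)]


def steps_per_unit_weight(rows):
    """Increase in tree length per unit of added weight between adjacent rows."""
    return [(length_b - length_a) / (weight_b - weight_a)
            for (weight_a, length_a), (weight_b, length_b) in zip(rows, rows[1:])]


increments = steps_per_unit_weight(TABLE2_LENGTHS)
```

From a weight of 5 upward the length grows by exactly 33 steps per unit of weight, which matches the 33-step length of the presence/absence-only tree and suggests that the contribution of the sequence partition to tree length is constant over that range.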

Increased resolution of the SINE-derived Felidae phylogeny when presence/absence data are combined with sequence data is consistent with a growing number of phylogenetic studies wherein analyses of combined morphological and molecular sequence data sets yield increased levels of resolution and support compared to analyses of molecular data alone (Wortley and Scotland 2005). Here, SINE presence/absence data are similar to morphological data in that changes are relatively rare and thus provide synapomorphic traits.

Conclusions

Using comparative genomics approaches that utilize the F. catus genome as a reference, CanSINE insertion sequences were surveyed throughout the Felidae family. Alignment of consensus sequences derived from orthologous loci reveals two subfamilies that have been active at different times during Feliformia evolution. The collection of sequences termed subfamily I consists of inactive sequences derived from the common feliform ancestor, while subfamily II has been active since the common Felidae ancestor and is a continual source of insertional mutagenesis. Examination of TSDs and flanking sequence indicates that CanSINE insertions proliferate in a manner similar to rodent and primate SINEs, wherein an endonucleolytic enzyme derived from the L1 LINEs nicks genomic DNA at TTAAAA or similar motifs, followed by a second nick approximately 15 base pairs downstream. This highly targeted enzymatic activity has resulted in the near-parallel and parallel SINE insertions that are often observed among CanSINEs (Pecon-Slattery et al. 2004; Schröder et al. 2009; Yu and Zhang 2005). Findings from non-traditional phylogenetic analyses implemented in this study have broad implications for the use of SINEs in future comparative analyses. Orthologous insertion sites may indeed be used to build phylogenetic trees, and inclusion of (CT)n microsatellite regions increases phylogenetic resolution. In addition, given sufficient synapomorphic insertions, a familial phylogeny can be generated from SINE presence or absence data. Finally, in accordance with a recent surge in evolutionary studies combining multiple data types (Spaulding et al. 2010; Wortley and Scotland 2005), a combined matrix of SINE sequence and presence/absence data yields highly resolved phylogenetic reconstructions.


Length (bp)   Observations
7             2
8             1
9             1
10            4
11            4
12            8
13            7
14            10
15            10
16            12
17            9
18            4
20            1
23            1
24            1

Table 1 The unimodal length distribution of 71 feliform-specific CanSINE target site duplications (TSDs). Over 50% of insertions are flanked by 14-17 base pairs of duplicated sequence.

Presence/Absence    Most Parsimonious   Steps      Consistency   Retention
Character Weight    Trees               (Length)   Index         Index
1 (equal)           11                  1324       80            78
2                   3                   1359       80            80
5                   5                   1460       81            83
10                  5                   1625       82            86
30                  5                   2285       85            92
60                  5                   3275       88            94

Table 2 Tree scores from maximum parsimony reconstructions using a combined matrix of SINE sequence and presence/absence characters.


Figure 1 A minimum evolution phylogeny of feliform CanSINEs, obtained from 93 aligned tRNA-related regions, depicts two SINE subfamilies (I and II), each with internal clades of subtypes A and B. Numbers in bold indicate bootstrap support scores based on 1000 pseudo-replicates from minimum evolution/parsimony/likelihood optimizations. The full alignment including terminal labels is in Supplemental Figure 1.


[Figure 2 plot: Pairwise % ID vs Last Common Ancestor Age. x-axis: Age of Last Common Ancestor (MY), 0-40; y-axis: pairwise % identity, 60-100; series: Subfamily I, Subfamily II.]

Figure 2 Pair-wise % similarity plotted against the age of the last common ancestor for each insertion. Within subfamily II there is a linear relationship between age of insertion and % sequence divergence. All subfamily I insertions plotted are conserved across Feliformia, thus the age of the last common ancestor is always the age of the Feliformia progenitor.


[Figure 3 plots: A) 5' Insertion Site Composition; B) 3' Insertion Site Composition. x-axis: Position (-16 to 16); y-axis: relative frequency of T, C, A and G.]

Figure 3 Nucleotide composition of the feliform CanSINE integration sites. A) Base positions -16 to -1 represent the 16 bp sequence before the first nick site, while positions 1 to 16 represent the TSD following the first nick site. B) Base positions -16 to -1 represent the 16 bp within the TSD that are before the second nick site, while positions 1 to 16 represent the genomic sequence following the second nick site. Note that few pre-integration sites are greater than 17 bp.

Figure 4 χ2 values for each sequence position of subfamily I and subfamily II integration sites. Horizontal lines correspond to significance levels P=0.01 (grey) and P=0.001 (black). A) At the 5’ integration site a cluster of significant values occurs at the positions immediately surrounding the nick site, between -2 and 4. B) At the 3’ end of the CanSINEs, only a single site 16 bp before the second nick site was found to have significant base composition bias at p=0.001.


A. [Tree topology with numbered nodes]

B.

Node  tMP   tML   tBAY  tmMP  tmML  tmBay  tmsMP  Description
1     99    86    100   100   96    100    100    Felidae base
2     74    <     100   59    92    100    80     (Bay Cat, Ocelot, Lynx, Puma, A. Leopard, Domestic)
3     <     <     <     <     67    92     66     (A. Leopard, Domestic)
4     99    100   100   95    90    100    97     Domestic cat lineage
5     <     <     99    <     <     63     67     (F.margarita,F.catus,F.silvestris,F.libyca,F.bieti)
6     <     <     99    91    80    100    77     (F.catus,F.silvestris,F.libyca,F.bieti)
7     77    <     100   94    80    100    92     (F.bieti,F.silvestris,F.libyca)
8     73    69    100   99    88    100    89     (F.silvestris,F.libyca)
9     97    93    100   99    98    100    97     Asian Leopard Cat lineage
10    98    99    100   98    96    100    99     (P.rubiginosus,P.bengalensis,P.viverrinus,P.planiceps)
11    59    59    95    74    60    98     73     (P.bengalensis,P.viverrinus,P.rubiginosus)
12    <     <     <     60    <     84     66     (P.viverrinus,P.bengalensis)*
13    99    100   100   99    100   100    100    Ocelot lineage
14    94    99    100   98    97    100    99     (L.tigrinus,L.geoffroyi,L.guigna)
15    89    82    100   95    98    100    98     (L.geoffroyi,L.guigna)
16    99    98    100   99    100   100    99     Puma lineage
17    72    <     99    100   87    100    79     (P.concolor,P.yagouaroundi)
18    99    100   100   100   99    100    100    Lynx lineage
19    96    99    100   82    95    100    99     (L.canadensis,L.pardinus,L.lynx)
20    61    51    96    67    57    95     <      (L.pardinus,L.lynx)
21    99    69    100   97    78    100    81     Bay cat lineage
22    100   100   100   100   100   100    100    (P.badia,P.temminckii)*
23    85    96    100   93    100   100    97     Panthera lineage
24    94    90    100   99    96    100    98     (P.leo,P.onca,P.pardus,P.tigris,P.uncia)
25    69    59    100   <     <     95     53     (P.tigris,P.uncia,P.pardus)
26    85    79    100   70    67    100    78     (P.tigris,P.uncia)
27    <     <     96    51    <     100    62     (P.leo,P.onca)
28    89    85    100   99    95    100    94     Caracal lineage
29    99    99    100   99    100   100    100    (P.caracal,P.aurata)
30    98    100   100   93    100   100    93     (H.hyaena,C.ferox,H.hirtula)
31    85    61    95    91    62    89     72     (C.ferox,H.hirtula)

Figure 5 Felidae phylogenies derived from orthologous CanSINE sequences via maximum parsimony, maximum likelihood and Bayesian optimization. A) The topology displayed is from maximum likelihood reconstruction using tRNA-related and microsatellite regions. Individual topologies are shown in Supplemental Figures 2-8. Node labels correspond to support values listed in (B). B) Analyses denoted “t” were completed using 23 concatenated tRNA-related regions, analyses denoted “tm” were completed using a concatenation of 23 tRNA-related regions and 18 microsatellite regions, and the analysis denoted “tms” was completed using a concatenation of 23 tRNA-related regions, 18 microsatellite regions and 31 SINE presence/absence characters.


Genetta genetta (GGE)           100000000000000000000000000000000
Hyaena hyaena (HHY)             100000000000000000000000000000000
Helogale hirtula (HPA)          100000000000000000000000000000000
Cryptoprocta ferox (CFE)        100000000000000000000000000000000
Profelis serval (LSE)           111000000000000000000000000001000
P.caracal (CCA)                 111000000000000100000000000001000
P.aurata (PAU)                  111000000000000100000000000001000
Neofelis nebulosa (NNE)         111000000000000000100000000000000
Panthera leo (PLE)              111000000100000000100000000000000
P.onca (PON)                    111000000100000000100000000000000
P.pardus (PPA)                  111000000100000000100000000000000
P.uncia (PUN)                   111001000100000000100000000000000
P.tigris (PTI)                  111001000100000000100000000000000
Puma concolor (PCO)             111000000000000000000000000010010
P.yagouaroundi (HYA)            111000000000000000000000000010010
A.jubatus (AJU)                 111000000000000000000000000010010
Pardofelis badia (PBA)          111000000000001000000000000000010
P.temminckii (PTE)              111000000000001000000000000000010
P.marmorata (PMA)               111000000000001000000000000000010
Leopardus pardalis (LPA)        111000000000010001000000000000010
L.wiedii (LWI)                  111000000000010000000000000000010
L.colocolo (LCO)                111000000000010000000000000000010
L.tigrinus (LTI)                111000011000010001000000000000010
L.geoffroyi (OGE)               111000111000010001000000000000010
L.guigna (OGU)                  111000111000010001000000000000010
Lynx pardinus (LYP)             111100000000000000000001000000110
L.rufus (LRU)                   111000000000000000000001000000110
L.lynx (LLY)                    111110000000000000000001001000110
L.canadensis (LCA)              111010000000000000000001001000110
Otocolobus manul (OMA)          111000000000000010000000000000110
Prionailurus rubiginosus (PRU)  111000000000100010010000110000111
P.bengalensis (PBE)             111000000011100010010000110000111
P.viverrinus (PVI)              111000000011100010010000110000111
P.planiceps (IPL)               111000000011100010010000110000110
Felis catus (FCA)               111000000000000000000110000100110
F.nigripes (FNI)                111000000000000000000100000000110
F.chaus (FCH)                   111000000000000000000100000000110
F.margarita (FMA)               111000000000000000000110000000110
F.silvestris (FSI)              111000000000000000000110000100110
F.libyca (FLI)                  111000000000000000000110000100110
F.bieti (FBI)                   111000000000000000000110000100110

Figure 6 Phylogenetic relationships among the Felidae. A) Matrix showing the character states for 31 retropositional events used to generate the phylogeny. 0 = absence; 1 = presence. B) Insertion sites (numbered) are mapped onto the single maximum parsimony tree inferred from these data. The consistency index was 93 and the retention index was 98 for this 34-step tree (where each step equals an insertion/deletion event).


Literature Cited

Batzer, M.A. and P.L. Deininger. 2002. Alu repeats and human genomic diversity. Nature Reviews Genetics 3: 370-379.

Bazinet, A. and M. Cumming. 2008. The lattice project: a grid research and production environment combining multiple grid computing models. Distributed & Grid Computing - Science Made Transparent for Everyone. Principles, Applications and Supporting Communities: 2-13.

Callinan, P.A., J. Wang, S.W. Herke, R.K. Garber, P. Liang, and M.A. Batzer. 2005. Alu retrotransposition-mediated deletion. J Mol Biol 348: 791-800.

Clark, L.A., J.M. Wahl, C.A. Rees, and K.E. Murphy. 2006. Retrotransposon insertion in SILV is responsible for merle patterning of the domestic dog. PNAS 103: 1376-1381.

Cordaux, R. and M. Batzer. 2009. The impact of retrotransposons on human genome evolution. Nature Reviews Genetics 10: 691-703.

Cruickshank, R.H. 2002. Molecular markers for the phylogenetics of mites and ticks. Systematic & Applied Acarology 7: 3-14.

Drummond, A., B. Ashton, M. Cheung, A. Cooper, J. Heled, M. Kearse, R. Moir, S. Stones-Havas, S. Sturrock, T. Thierer, and A. Wilson. 2010. Geneious v5.1.

Drummond, A. and A. Rambaut. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7.

Eizirik, E., W.J. Murphy, K.P. Koepfli, W.E. Johnson, J. Dragoo, R.K. Wayne, and S.J. O'Brien. 2010. Pattern and timing of diversification of the mammalian order Carnivora inferred from multiple nuclear gene sequences. Molecular Phylogenetics and Evolution 56: 49-63.

Feng, Q., J.V. Moran, H.H. Kazazian, and J.D. Boeke. 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87: 905-916.

Gentles, A.J., O. Kohany, and J. Jurka. 2005. Evolutionary diversity and potential recombinogenic role of integration targets of non-LTR retrotransposons. Mol Biol Evol 22: 1983-1991.

Johnson, W.E., E. Eizirik, J. Pecon-Slattery, W.J. Murphy, A. Antunes, E. Teeling, and S.J. O'Brien. 2006. The late Miocene radiation of modern Felidae: a genetic assessment. Science 311: 73-77.

Jurka, J. 1997. Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. PNAS 94: 1872-1877.

Jurka, J., V.V. Kapitonov, A. Pavlicek, P. Klonowski, O. Kohany, and J. Walichiewicz. 2005. Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research 110: 462-467.

Katoh, K., G. Asimenos, and H. Toh. 2009. Multiple alignment of DNA sequences and MAFFT. Methods Mol Biol 537: 39-64.

Lindblad-Toh, K. C.M. Wade T.S. Mikkelsen E.K. Karlsson D.B. Jaffe M. Kamal M. Clamp J.L. Chang E.J. Kulbokas, 3rd M.C. Zody E. Mauceli X. Xie M. Breen R.K. Wayne E.A. Ostrander C.P. Ponting F. Galibert D.R. Smith P.J. DeJong E. Kirkness P. Alvarez T. Biagi W. Brockman J. Butler C.W. Chin A. Cook J. Cuff M.J. Daly D. DeCaprio S. Gnerre M. Grabherr M. Kellis M. Kleber C. Bardeleben L. Goodstadt A. Heger C. Hitte L. Kim K.P. Koepfli H.G. Parker J.P. Pollinger S.M. Searle N.B. Sutter R. Thomas C. Webber J. Baldwin A. Abebe A. Abouelleil L. Aftuck M. Ait-Zahra T. Aldredge N. Allen P. An S. Anderson C. Antoine H. Arachchi A. Aslam L. Ayotte P. Bachantsang A. Barry T. Bayul M. Benamara A. Berlin D. Bessette B. Blitshteyn T. Bloom J. Blye L. Boguslavskiy C. Bonnet B. Boukhgalter A. Brown P. Cahill N. Calixte J. Camarata Y. Cheshatsang J. Chu M. Citroen A. Collymore P. Cooke T. Dawoe R. Daza K. Decktor S. DeGray N. Dhargay K. Dooley P. Dorje K. Dorjee L. Dorris N. Duffey A. Dupes O. Egbiremolen R. Elong J. Falk A. Farina S. Faro D. Ferguson P. Ferreira S. Fisher M. FitzGerald K. Foley C. Foley A. Franke D. Friedrich D. Gage M. Garber G. Gearin G. Giannoukos T. Goode A. Goyette J. Graham E. Grandbois K. Gyaltsen N. Hafez D. Hagopian B. Hagos J. Hall C. Healy R. Hegarty T. Honan A. Horn N. Houde L. Hughes L. Hunnicutt M. Husby B. Jester C. Jones A. Kamat B. Kanga C. Kells D. Khazanovich A.C. Kieu P. Kisner M. Kumar K. Lance T. Landers M. Lara W. Lee J.P. Leger N. Lennon L. Leuper S. LeVine J. Liu X. Liu Y. Lokyitsang T. Lokyitsang A. Lui J. Macdonald J. Major R. Marabella K. Maru C. Matthews S. McDonough T. Mehta J. Meldrim A. Melnikov L. Meneus A. Mihalev T. Mihova K. Miller R. Mittelman V. Mlenga L. Mulrain G. Munson A. Navidi J. Naylor T. Nguyen N. Nguyen C. Nguyen R. Nicol N. Norbu C. Norbu N. Novod T. Nyima P. Olandt B. O'Neill K. O'Neill S. Osman L. Oyono C. Patti D. Perrin P. Phunkhang F. Pierre M. Priest A. Rachupka S. Raghuraman R. Rameau V. Ray C. Raymond F. Rege C. Rise J. Rogers P. Rogov J. Sahalie S. Settipalli T. Sharpe T. Shea M. Sheehan N. Sherpa J. Shi D. Shih J. Sloan C. Smith T. Sparrow J. Stalker N. Stange-Thomann S. Stavropoulos C. Stone S. Stone S. Sykes P. Tchuinga P. Tenzing S. Tesfaye D. Thoulutsang Y. Thoulutsang K. Topham I. Topping T. Tsamla H. Vassiliev V. Venkataraman A. Vo T. Wangchuk T. Wangdi M. Weiand J. Wilkinson A. Wilson S. Yadav S. Yang X. Yang G. Young Q. Yu J. Zainoun L. Zembek A. Zimmer and E.S. Lander. 2005. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438: 803-819.

Liu, G., C. Alkan, L. Jiang, S. Zhao, and E.E. Eichler. 2009. Comparative analysis of Alu repeats in primate genomes. Genome Res 19: 876-885.

Murphy, W.J., T.H. Pringle, T.A. Crider, M.S. Springer, and W. Miller. 2007. Using genomic data to unravel the root of the placental mammal phylogeny. Genome Res 17: 413-421.

Nikaido, M., F. Matsuno, H. Hamilton, R.L. Brownell, Jr., Y. Cao, W. Ding, Z. Zuoyan, A.M. Shedlock, R.E. Fordyce, M. Hasegawa, and N. Okada. 2001. Retroposon analysis of major cetacean lineages: The monophyly of toothed whales and the paraphyly of river dolphins. PNAS 98: 7384-7389.

Nikaido, M., A.P. Rooney, and N. Okada. 1999. Phylogenetic relationships among cetartiodactyls based on insertions of short and long interspersed elements: Hippopotamuses are the closest extant relatives of whales. PNAS 96: 10261-10266.

Nishihara, H., S. Kuno, M. Nikaido, and N. Okada. 2007. MyrSINEs: A novel SINE family in the anteater genomes. Gene 400: 98-103.

Nixon, K.C. 2003. WinClada ver. 1.00.08 update. Published by the Author, Ithaca, NY.

Pecon-Slattery, J., A.J. Wilkerson, W.J. Murphy, and S.J. O'Brien. 2004. Phylogenetic assessment of introns and SINEs within the Y chromosome using the cat family Felidae as a species tree. Mol Biol Evol 21: 2299-2309.

Pontius, J., J. Mullikin, D. Smith, K. Lindblad-Toh, S. Gnerre, M. Clamp, J. Chang, R. Stephens, B. Neelam, N. Volfovsky, A. Schaffer, R. Agarwala, K. Narfstrom, W. Murphy, U. Giger, A. Roca, A. Antunes, M. Menotti-Raymond, N. Yuhki, J. Pecon-Slattery, and W. Johnson. 2007. Initial sequence and comparative analysis of the cat genome. Genome Research 17: 1675-1689.

Posada, D. and T.R. Buckley. 2004. Model selection and averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst. Biol 53: 793-818.

Posada, D. and K.A. Crandall. 2001. Selecting the best-fit model of nucleotide substitution. Syst. Biol 50: 580-601.

Pritham, E. 2009. Transposable elements and factors influencing their success in eukaryotes. J Hered 100: 648-655.

Ray, D.A., J.A. Walker, A. Hall, B. Llewellyn, J. Ballantyne, A.T. Christian, K. Turtletaub, and M.A. Batzer. 2007. Inference of human geographic origins using Alu insertion polymorphisms. Forensic Science International 153: 117-124.

Ray, D.A., J. Xing, A.H. Salem, and M.A. Batzer. 2006. SINEs of a nearly perfect character. Syst. Biol 55: 928-935.

Sasaki, T., K. Takahashi, M. Nikaido, S. Miura, Y. Yasukawa, and N. Okada. 2004. First application of the SINE (short interspersed repetitive element) method to infer phylogenetic relationships in reptiles: an example from the turtle superfamily Testudinoidea. Mol Biol Evol 21: 705-715.

Schröder, C., C. Bleidorn, S. Hartmann, and R. Tiedemann. 2009. Occurrence of Can-SINEs and intron sequence evolution supports robust phylogeny of pinniped carnivores and their terrestrial relatives. Gene 448: 221-226.

Shedlock, A.M., K. Takahashi, and N. Okada. 2004. SINEs of speciation: tracking lineages with retroposons. Trends in Ecology & Evolution 19: 545-553.

Shimamura, M., H. Abe, M. Nikaido, K. Ohshima, and N. Okada. 1999. Genealogy of families of SINEs in cetaceans and artiodactyls: the presence of a huge superfamily of tRNA-Glu derived families of SINEs. Mol Biol Evol 16: 1046-1060.

Smit, A. and P. Green. 2005. Repeat Masker Website and Server.

Spaulding, M., M. O'Leary, and J. Gatesy. 2010. Relationships of cetacea (artiodactyla) among mammals: increased taxon sampling alters interpretations of key fossils and character evolution. PLOS One 4: e7062.

Swofford, D. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods) version 4. Sinauer Associates.

Tamura, K., D. Peterson, N. Peterson, G. Stecher, M. Nei, and S. Kumar. 2011. MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol (submitted).

Weiner, A.M. 2002. SINEs and LINEs: the art of biting the hand that feeds you. Current Opinion in Cell Biology 14: 343-350.

Wicker, T., F. Sabot, A. Hua-Van, J.L. Bennetzen, P. Capy, B. Chalhoub, A. Flavell, P. Leroy, M. Morgante, O. Panaud, E. Paux, P. SanMiguel, and A.H. Schulman. 2007. A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8: 973-982.

Wortley, A. and R. Scotland. 2005. The Effect of Combining Molecular and Morphological Data in Published Phylogenetic Analyses. Syst. Biol 55: 677-685.

Xing, J., H. Wang, K. Han, D.A. Ray, C.H. Huang, L.G. Chemnick, C.-B. Steward, T.R. Disotell, O.A. Ryder, and M. Batzer. 2005. A mobile element based phylogeny of Old World monkeys. Molecular Phylogenetics and Evolution 37: 872-880.

Yu, L., J. Liu, P.-t. Luan, H. Lee, M. Lee, M.-s. Min, O. Ryder, L. Chemnick, H. Davis, and Y.-p. Zhang. 2008. New insights into the evolution of intronic sequences of the Beta-fibrinogen gene and their application in reconstructing mustelid phylogeny. Zoological Science 25: 622-672.

Yu, L. and Y.-p. Zhang. 2005. Evolutionary implications of multiple SINE insertions in an intronic region from diverse mammals. Mammalian Genome 16: 651-660.

Zehr, S.M., M.A. Nedbal, and J.J. Flynn. 2001. Tempo and mode of evolution in an orthologous Can SINE. Mammalian Genome 12: 38-44.

Zwickl, D. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under maximum likelihood criterion. PhD dissertation, The University of Texas at Austin.


Species                   Common Name         Code  Lineage            Family
Felis bieti               Chinese Desert Cat  FBI   Domestic Cat       Felidae
Felis catus               Domestic Cat        FCA   Domestic Cat       Felidae
Felis chaus               Jungle Cat          FCH   Domestic Cat       Felidae
Felis libyca              African Wild Cat    FLI   Domestic Cat       Felidae
Felis margarita           Desert Cat          FMA   Domestic Cat       Felidae
Felis nigripes            Black-footed Cat    FNI   Domestic Cat       Felidae
Felis silvestris          European Wild Cat   FSI   Domestic Cat       Felidae
Leopardus colocolo        Pampas Cat          LCO   Ocelot             Felidae
Leopardus geoffroyi       Geoffroy's Cat      OGE   Ocelot             Felidae
Leopardus guigna          Kodkod              OGU   Ocelot             Felidae
Leopardus jacobitus       Andean Mt. Cat      OJA   Ocelot             Felidae
Leopardus pardalis        Ocelot              LPA   Ocelot             Felidae
Leopardus tigrinus        Tigrina             LTI   Ocelot             Felidae
Leopardus wiedii          Margay              LWI   Ocelot             Felidae
Lynx canadensis           Canadian Lynx       LCA   Lynx               Felidae
Lynx lynx                 Eurasian Lynx       LLY   Lynx               Felidae
Lynx pardina              Iberian Lynx        LYP   Lynx               Felidae
Lynx rufus                Bobcat              LRU   Lynx               Felidae
Panthera leo              Lion                PLE   Panthera           Felidae
Panthera onca             Jaguar              PON   Panthera           Felidae
Panthera pardus           Leopard             PPA   Panthera           Felidae
Panthera tigris           Tiger               PTI   Panthera           Felidae
Uncia uncia               Snow Leopard        PUN   Panthera           Felidae
Neofelis nebulosa         Clouded Leopard     NNE   Panthera           Felidae
Pardofelis badia          Bay Cat             PBA   Bay Cat            Felidae
Pardofelis marmorata      Marbled Cat         PMA   Bay Cat            Felidae
Pardofelis temminckii     Asian Golden Cat    PTE   Bay Cat            Felidae
Otocolobus manul          Pallas Cat          OMA   Asian Leopard Cat  Felidae
Prionailurus bengalensis  Asian Leopard Cat   PBE   Asian Leopard Cat  Felidae
Prionailurus planiceps    Flat-headed Cat     IPL   Asian Leopard Cat  Felidae
Prionailurus rubiginosus  Rusty Spotted Cat   PRU   Asian Leopard Cat  Felidae
Prionailurus viverrinus   Fishing Cat         PVI   Asian Leopard Cat  Felidae
Profelis aurata           African Golden Cat  PAU   Caracal            Felidae
Profelis caracal          Caracal             CCA   Caracal            Felidae
Profelis serval           Serval              LSE   Caracal            Felidae
Puma concolor             Puma                PCO   Puma               Felidae
Puma yagouaroundi         Jaguarundi          HYA   Puma               Felidae
Acinonyx jubatus          Cheetah             AJU   Puma               Felidae
Helogale hirtula          Dwarf mongoose      HPA   Other Feliformia   Herpestidae
Genetta genetta           Genet               GGE   Other Feliformia   Viverridae
Hyaena hyaena             Hyena               HHY   Other Feliformia   Hyaenidae
Prionodon linsang         Banded Linsang      PLI   Other Feliformia   Prionodontidae
Cryptoprocta ferox        Fossa               CFE   Other Feliformia   Eupleridae

Supplemental Table 1: CanSINE sequences were obtained from all Felidae family species along with additional species representing other Feliformia families.


[Supplemental Figure 1: sequence alignment; the per-row alignment columns did not survive text extraction and are not reproduced here. The alignment is printed in three column blocks, with rulers labeled 10 to 40, 50 to 80, and 100 to 130. Rows are grouped as Subfamily II Subtype B, Subfamily II Subtype A, Subfamily I Subtype B, and Subfamily I Subtype A, and each row is labeled with a locus identifier and the taxon or clade carrying the insertion. The reference row, 212075 F. nigripes, reads as follows in the three blocks:

GGGGCGCCTGGGTGGCGCAGTCGGTTAAGCGTCCGAC------TT
CAGC-CAGGTCACGATCTCGCGGTCCGTGAGT-TCGAGCCCCGCGT
CGGGCTC--TGGGCTGATGGCTCAGAGCCTGGAGCCTG-TTTCCG]

Supplemental Figure 1 Alignment of 92 complete SINE sequences found among the feliform suborder. Diagnostic indels are attributed to subfamily and subtype distinctions.
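
Alignments of this kind are usually printed in dot-matching notation. The following is a minimal Python sketch of how such a row can be expanded back into full sequence, assuming (as is conventional, though not stated in the caption) that a dot means identity to the reference row. The function names and the example row are hypothetical; only the reference fragment is taken from the first block of Supplemental Figure 1.

```python
def expand_row(reference: str, row: str, match_char: str = ".") -> str:
    """Replace each match character in `row` with the reference base at that column."""
    if len(reference) != len(row):
        raise ValueError("row and reference must have the same aligned length")
    return "".join(ref if base == match_char else base
                   for ref, base in zip(reference, row))


def differing_columns(reference: str, row: str, match_char: str = ".") -> list:
    """Return 1-based alignment columns where the expanded row differs from the reference."""
    expanded = expand_row(reference, row, match_char)
    return [i for i, (ref, base) in enumerate(zip(reference, expanded), start=1)
            if ref != base]


# Reference fragment: first alignment block of the 212075 F. nigripes row.
REFERENCE = "GGGGCGCCTGGGTGGCGCAGTCGGTTAAGCGTCCGAC------TT"

# Hypothetical row, invented for illustration only: one substitution at column 10.
HYPOTHETICAL_ROW = "." * 9 + "A" + "." * 27 + "------" + ".."

if __name__ == "__main__":
    print(expand_row(REFERENCE, HYPOTHETICAL_ROW))
    print(differing_columns(REFERENCE, HYPOTHETICAL_ROW))
```

Columns where a whole group of rows shares a gap that the other groups lack would be the candidates for the diagnostic indels mentioned in the caption.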


Supplemental Figure 2 Majority-rules consensus tree derived from 108 most parsimonious trees inferred using a matrix of the tRNA-related regions from 23 concatenated CanSINE sequences. Branches corresponding to partitions reproduced in fewer than 50% of trees are collapsed. Bootstrap support scores displayed next to corresponding nodes are based on 1000 replicates.


Supplemental Figure 3 Maximum-likelihood tree inferred using a matrix of the tRNA-related regions from 23 concatenated CanSINE sequences. Branches corresponding to partitions reproduced in fewer than 50% of trees are collapsed. Bootstrap support scores displayed next to corresponding nodes are based on 1000 replicates.


Supplemental Figure 4 Bayesian phylogeny estimation inferred using a matrix of the tRNA-related regions from 23 concatenated CanSINE sequences. Branches corresponding to partitions reproduced in fewer than 50% of trees are collapsed. Posterior-probability scores are displayed next to corresponding nodes.


Supplemental Figure 5 Majority-rules consensus tree derived from 22 most parsimonious trees inferred using a matrix of 23 tRNA-related regions and 18 microsatellite regions from CanSINE sequences. Branches corresponding to partitions reproduced in fewer than 50% of trees are collapsed. Bootstrap support scores displayed next to corresponding nodes are based on 1000 replicates.


Supplemental Figure 6 Maximum-likelihood tree inferred using a matrix of 23 tRNA-related regions and 18 microsatellite regions from CanSINE sequences. Branches corresponding to partitions reproduced in fewer than 50% of trees are collapsed. Bootstrap support scores displayed next to corresponding nodes are based on 1000 replicates.

Supplemental Figure 7 Bayesian phylogeny estimation inferred using a matrix of 23 tRNA-related regions and 18 microsatellite regions from CanSINE sequences. Branches corresponding to partitions reproduced in fewer than 50% of trees are collapsed. Posterior-probability scores are displayed next to corresponding nodes.


Supplemental Figure 8 A bootstrap consensus tree derived from 11 most parsimonious trees inferred using a combined matrix of 23 tRNA-related regions and 18 microsatellite regions from CanSINE sequences plus 31 presence/absence characters. Branches corresponding to partitions reproduced in fewer than 50% of trees are collapsed. The consistency index is 0.80 and the retention index is 0.78. Bootstrap support scores above 50% based on 1000 replicates are noted next to corresponding nodes.
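
For readers unfamiliar with the two indices quoted in the Supplemental Figure 8 caption, their standard parsimony definitions are given below. The symbols M, S and G are notation introduced here for illustration and do not come from the dissertation.

```latex
% M: minimum number of character-state changes the data could require on any tree
% S: number of changes actually required on the tree being evaluated
% G: maximum number of changes the data could require (on a fully unresolved tree)
\[
  \mathrm{CI} = \frac{M}{S},
  \qquad
  \mathrm{RI} = \frac{G - S}{G - M}
\]
```

Both indices approach 1 as homoplasy decreases. Under these definitions, a consistency index of 0.80 means the tree requires S = M / 0.80 = 1.25 M steps, that is, 25% more changes than the theoretical minimum.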


GENERAL CONCLUSIONS

In the 60 years since Barbara McClintock’s discovery of transposable elements, it has become evident that these sequences are significant components of mammalian genomes and not merely 'junk DNA'. We are still in the process of understanding how SINE insertions influence specific phenotypes and the extent to which they contribute to genetic diversity. The findings of this dissertation will bolster future research on the abundant carnivore-specific SINE insertions known as CanSINEs.

A review of prior CanSINE studies finds that these sequences are indeed Carnivora-wide (not caniform-specific, as was originally hypothesized) and a major source of genetic variation, accounting for as much as 8% of the genome. CanSINEs contribute to phenotypic diversity by providing promoter sequences for adjacent genes and by incorporation into coding sequences, altering expression patterns. CanSINEs are also informative phylogenetic markers, with clade-specific insertion sites throughout Carnivora. Recent carnivore whole-genome sequencing projects also reveal clade-specific CanSINE subfamilies in the giant panda (A. melanoleuca), alluding to the maintenance of retrotranspositional activity in multiple carnivore lineages.

Using the domestic cat (F. catus) whole-genome sequence as a reference and two novel comparative methods, I located and characterized 98 novel informative SINE insertion loci specific to Felidae. These insertions are diagnostic of evolutionary relationships at the suborder, family, genus, sister-species and species levels. Mapping these insertion sites to an existing Felidae phylogeny based on large-scale sequence data provided additional support for newly hypothesized evolutionary relationships, such as the sister-taxon relationship between Prionodontidae and Felidae and the inclusion of O. manul in the Asian leopard cat lineage.

These new data directly address the accuracy of SINEs in phylogenetic inference. One long-standing controversy regarding SINE-based evolutionary analyses is the extent to which incomplete lineage sorting following rapid speciation results in paraphyletic SINE distributions. Because felids evolved in an explosive radiation, they present an opportunity to test this hypothesis. Here, the five insertion loci incongruent with other molecular data are indicative of either incomplete lineage sorting or truly non-bifurcating species histories. For example, polymorphic and fixed insertions within the lynx lineage reflect the near-simultaneous divergence (occurring in less than 40,000 years) of L. canadensis, L. lynx and L. pardinus. Likewise, combined examinations of SINE loci and NADH5 haplotypes indicate hybridization between Prionailurus species. Examination across Felidae also identified an otherwise plesiomorphic SINE locus that is entirely deleted from P. concolor, representing the first documentation of SINE excision in Carnivora.
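
The distinction drawn above between clade-diagnostic and incongruent insertion loci can be illustrated with a minimal Python sketch. It assumes each locus is scored as the set of species carrying the insertion and compares that set with the clades of a reference tree. Species codes follow Supplemental Table 1; the clade memberships, the function name and the locus data are hypothetical and are not the dissertation's reference tree or results.

```python
from typing import Dict, FrozenSet, Iterable

# Species codes follow Supplemental Table 1. The clade memberships below are
# an illustrative assumption, not the reference tree used in the dissertation.
REFERENCE_CLADES: Dict[str, FrozenSet[str]] = {
    "Lynx lineage": frozenset({"LCA", "LLY", "LYP", "LRU"}),
    "L. lynx + L. pardinus": frozenset({"LLY", "LYP"}),
}


def classify_locus(carriers: Iterable[str],
                   clades: Dict[str, FrozenSet[str]]) -> str:
    """Label an insertion by comparing its carriers with the reference clades."""
    present = frozenset(carriers)
    if not present:
        return "absent"
    if len(present) == 1:
        return "species-specific"
    for name, members in clades.items():
        if present == members:
            return f"diagnostic: {name}"
    # Carriers match no reference clade: a candidate for incomplete lineage
    # sorting, hybridization or excision. This sketch cannot tell which.
    return "incongruent"


# Hypothetical presence calls, invented for illustration (not real data).
HYPOTHETICAL_LOCI = {
    "locus_a": {"LCA", "LLY", "LYP", "LRU"},
    "locus_b": {"LCA", "LLY"},
    "locus_c": {"LRU"},
}

if __name__ == "__main__":
    for locus, carriers in HYPOTHETICAL_LOCI.items():
        print(locus, classify_locus(carriers, REFERENCE_CLADES))
```

A locus flagged as incongruent here is only a starting point: separating incomplete lineage sorting from hybridization requires additional evidence, such as the NADH5 haplotypes used for Prionailurus.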

Characterization of numerous CanSINE loci throughout the Felidae lineage clarifies the role of SINEs within the host genome. Feliform-specific SINE subfamilies were defined based on species distribution and sequence. Similar to primate Alu elements, CanSINEs are integrated at specific genomic locations and rely on LINE-derived enzymes for proliferation. As neutrally evolving orthologous loci, CanSINE integration sites are valid sources of sequence data for tree-building algorithms and provide sufficient presence/absence data for de novo phylogenies. Furthermore, combining sequence and insertion data provides robust phylogenetic resolution. In conclusion, mammalian SINEs are not merely ‘junk DNA’; rather, they comprise a rich genomic resource that has helped shape the diversity of species we observe today.

FUNDING

This dissertation was funded by the National Science Foundation Doctoral Dissertation Improvement Grant No. CCLS20499F, the George Washington University Facilitating Fund and the Laboratory of Genomic Diversity at the National Cancer Institute.

ADDITIONAL SUPPORT

I would like to thank the Laboratory of Genomic Diversity for access to tissue samples and laboratory resources. All samples were collected in full compliance with specific Federal Fish and Wildlife permits, Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES); Endangered and Threatened Species, Captive Bred issued to the National Cancer Institute-National Institutes of Health (S.J. O'Brien, principal officer) by the U.S. Fish and Wildlife Service of the Department of the Interior. I also thank Joan Pontius, Carrie McCracken, Victor David and Warren Johnson for technical assistance and expertise.
