
Advances in felid genetics and genomics

Georgina Samaha Bachelor of and Veterinary Biosciences (Hons I), Bachelor of Arts and Sciences

Faculty of Science Sydney School of Veterinary Science

A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy

2021

Acknowledgements

I thank Assoc. Prof. Bianca Waud, my PhD supervisor and mentor. You instilled in me an enthusiasm for scientific practice that I will carry into future endeavours. A first-time experience for both of us, your door was always open, and you took everything in stride. While I am sure it was difficult to watch me flail at times, I am grateful for the freedom to get lost, make mistakes and come to things in my own time. I also thank my auxiliary supervisor, Prof. Claire Wade, who was always there to thoughtfully tease out a problem. I am grateful to have been a beneficiary of your scientific acumen; I could always trust you to recognise a dead end and cut through the B.S. I am grateful to have been a member of the Wade-Waud laboratory and acknowledge the postgraduate students I studied alongside over the course of my candidature: Mitchell O’Brien, Bobbie Cansdale, Tracy Chew, Jessica Gurr, Niruba Kandasamy and Lillian Brancalion. I am especially appreciative of Mitchell’s friendship and collaboration. I also give sincere thanks to Trung Doan and Dr Maura Carrai for their technical expertise and support.

I thank my internal collaborators: Dr Catherine Grueber, Prof. Julia Beatty and Dr Hamutal Mazrier, who helped me develop my communication skills and taught me the value of interdisciplinary collaboration and lateral thinking in scientific practice. I also thank my external collaborators Prof. Leslie Lyons and Dr Linda Fleeman, and the services and computing resources that the University of Sydney’s High Performance Computing cluster, Sydney Informatics Hub and ICT department provided for all analyses presented here. I am also grateful to have been the recipient of an Australian Postgraduate Award and for the funding provided by the Jenna O’Grady Donley Fund.

Finally, I give thanks to my friends and family. I thank my parents for their love and support in pursuing the path I chose, and my partner Emilia for her indulgence and patience.

Statement of originality

I declare that this thesis is my own work and contains no material which has been accepted for the award of any other degree or diploma in my name at any other university or institution of tertiary education. To the best of my knowledge this thesis contains no material previously published or written by another person, except where referenced in the text. I certify that the intellectual content of this thesis is the product of my own work and that all assistance received in preparing this thesis and sources have been acknowledged.

Georgina Samaha 28th February 2021

Table of Contents

Abstract

Chapter 1: Introduction
1.1 A brief history of felid taxonomy
1.2 From wildcat to house cat
1.3 The expansion of feline genomics
1.4 The domestic cat as a model for mapping heritable diseases in humans
1.5 Felid conservation
1.6 Aims of thesis
1.7 References

Chapter 2: The domestic cat as a novel genetic model of complex disease
2.1 The Burmese cat as a genetic model of type 2 diabetes in humans
2.2 Mapping the genetic basis of diabetes mellitus in the Australian Burmese cat

Chapter 3: Cross-species applications of the feline reference genome for the benefit of conservation research
3.1 Exploiting genomic synteny in Felidae: using cross-species genome alignment and SNV discovery to inform conservation management in big cats

Chapter 4: Cross-species applications of low-density SNV discovery and genotyping techniques for the benefit of conservation research
4.1 Harnessing cross-species SNV methodologies for conservation genomics: A comparison of genotyping array and reduced-representation sequencing methods in wild felids

Chapter 5: Concluding remarks

Appendices
Appendix I: Supplementary data for chapter 2.2
Appendix II: Supplementary data for chapter 3
Appendix III: Supplementary data for chapter 4

Manuscripts and conference proceedings

This thesis contains published manuscripts, manuscripts currently under review for publication and research that was presented at national and international conferences. They are as listed below.

2021 Under review: Samaha, G., Wade, C.M., Mazrier, H., Grueber, C.E., Haase, B. Harnessing cross-species SNP methodologies for conservation genomics: A comparison of genotyping array and reduced-representation sequencing methods in wild felids. Molecular Ecology Resources

2020 Under review: Samaha, G., Wade, C.M., Mazrier, H., Grueber, C. E., & Haase, B. Exploiting genomic synteny in Felidae: using cross-species genome alignment and SNP discovery to inform conservation management in big cats. BMC Genomics

Samaha, G., Wade, C.M., Beatty, J., Lyons, L.A., Fleeman, L.M., Haase, B. (2020) Mapping the genetic basis of diabetes mellitus in the Australian Burmese cat (Felis catus). Scientific Reports 10, 19194 doi: 10.1038/s41598-020-76166-3

Samaha, G. Advances in Felid Genomics. Postgraduate conference, November 2020, Sydney School of Veterinary Science, The University of Sydney, NSW, Australia.

2019 Samaha, G., Beatty, J., Wade, C., Haase, B. (2019) The Burmese cat as a genetic model for type 2 diabetes. Animal Genetics 50(4) doi: 10.1111/age.12799

Samaha, G., Lamande, S., Haase, B. Exploring the functional role of TRPV4 variant c.2041G>A in osteochondrodysplasia in cats. The 10th International Conference on Canine and Feline Genetics and Genomics, May 2019, Bern, Switzerland.

2017 Samaha, G., Beatty, J., Wade, C., Haase, B. The Burmese cat as a spontaneous model for type 2 diabetes. 2017 EMRC Conference, Charles Perkins Centre, Sydney, NSW, Australia.

Samaha, G., Beatty, J., Lyons, L., Wade, C., Haase, B. Revealing the genetic basis of diabetes mellitus in Burmese cats. The 9th International Conference on Canine and Feline Genetics and Genomics, May 2017, Saint Paul, Minnesota, USA.

2016 Samaha, G. & Haase, B. Conservation genomics and genetic disorder management of threatened zoo felids. Postgraduate conference, November 2016, Faculty of Veterinary Science, The University of Sydney, NSW, Australia.

Authorship Attribution Statement

Chapter 2.1 of this thesis is published as Samaha, G., Beatty, J., Wade, C., Haase, B. (2019) The Burmese cat as a genetic model for type 2 diabetes. Animal Genetics 50(4) doi: 10.1111/age.12799

I wrote this review under the supervision of Assoc. Prof. Bianca Waud (Haase) and Prof. Claire Wade. Together with Assoc. Prof. Bianca Waud, Prof. Claire Wade and Prof. Julia Beatty, I conceptualised the research. I drafted the manuscript and developed the ideas and arguments within it. Critical revisions were made by myself, Assoc. Prof. Bianca Waud, Prof. Claire Wade and Prof. Julia Beatty.

Chapter 2.2 of this thesis is published as Samaha, G., Wade, C.M., Beatty, J., Lyons, L.A., Fleeman, L.M., Haase, B. (2020) Mapping the genetic basis of diabetes mellitus in the Australian Burmese cat (Felis catus). Scientific Reports 10, 19194 doi: 10.1038/s41598-020-76166-3

I conceptualised this study with Assoc. Prof. Bianca Waud, Prof. Claire Wade and Prof. Julia Beatty, under the supervision of Assoc. Prof. Bianca Waud and Prof. Claire Wade. I performed all experimental work, data analysis and visualisation. Samples were provided by Prof. Julia Beatty, Prof. Leslie Lyons and Dr Linda Fleeman. Clinical diagnoses were performed by Prof. Julia Beatty and Dr Linda Fleeman. I drafted the original manuscript. Critical revisions were made by myself, Assoc. Prof. Bianca Waud, Prof. Claire Wade, Prof. Julia Beatty, Prof. Leslie Lyons and Dr Linda Fleeman.

Chapter 3 of this thesis is currently under review with the journal BMC Genomics. It was submitted as Samaha, G., Wade, C.M., Mazrier, H., Grueber, C. E., & Haase, B. Exploiting genomic synteny in Felidae: using cross-species genome alignment and SNP discovery to inform conservation management in big cats. BMC Genomics.

I conceptualised this study with Assoc. Prof. Bianca Waud, Prof. Claire Wade, Dr Hamutal Mazrier and Dr Catherine Grueber and performed all experimental work, data analyses and visualisation under the supervision of Assoc. Prof. Bianca Waud and Prof. Claire Wade. I drafted the original manuscript and critical revisions were made by myself, Assoc. Prof. Bianca Waud, Prof. Claire Wade, Dr Hamutal Mazrier and Dr Catherine Grueber.

Chapter 4 of this thesis is currently under review with the journal Molecular Ecology Resources. It was submitted as Samaha, G., Wade, C.M., Mazrier, H., Grueber, C.E., Haase, B. Harnessing cross-species SNP methodologies for conservation genomics: A comparison of genotyping array and reduced-representation sequencing methods in wild felids. Molecular Ecology Resources.

Georgina Samaha 28/02/21

As supervisor for the candidature upon which this thesis is based, I can confirm that the authorship attribution statements above are correct.

Assoc. Prof. Bianca Waud 28/02/21

Abbreviations

CVD Congenital vestibular disease
FH Familial hypercholesterolaemia
FDM Feline diabetes mellitus
GWAS Genome-wide association studies
IA Islet amyloidosis
IUCN International Union for the Conservation of Nature
Kb Kilobase
Kya Thousand years ago
LCA Last common ancestor
LD Linkage disequilibrium
MAF Minor allele frequency
Mb Megabase
MDS Multidimensional scaling
MHC Major histocompatibility complex
mtDNA Mitochondrial DNA
MY Million years
Mya Million years ago
NGS Next generation sequencing
OMIA Online Mendelian Inheritance in Animals
OMIM Online Mendelian Inheritance in Man
PCR Polymerase chain reaction
RFLP Restriction fragment length polymorphisms
RH Radiation hybrid
ROH Runs of homozygosity
SNP Single nucleotide polymorphism
SNV Single nucleotide variant
SSP Species Survival Plan
STR Short tandem repeats
T2D Type 2 diabetes
VCF Variant call format
VEP Variant effect predictor
WGS Whole genome sequence

Abstract

The cat family (Felidae) has a unique evolutionary history, and its highly conserved genomic architecture offers us a glimpse into the ancestral genome organisation of mammals and the processes of domestication and adaptive evolution. Since the development of the first feline genome assembly, next generation sequencing technologies and genetic resources have been used to expand our understanding of cat family biology and population demographics. Patterns of genomic structure among cats have revealed speciation events of the 38 extant felid species and the consequences of domestication and breed establishment. Given their value to comparative studies, felids increasingly serve as models for studying species conservation, evolution and adaptation, domestication, and disease gene discovery. The ongoing development of genomic tools in felids has largely focused on the domestic cat (Felis catus) and its biomedical relevance to heritable diseases in humans. Generations of inbreeding for aesthetic traits in domestic cats have resulted in genetically isolated populations with a simplified genetic landscape, suitable for genome-wide surveys that explore population structure and breed-specific traits and diseases. While recent genomic advances have increased opportunities for conservation research in similar ways to model species, species of interest to conservation research typically lack the basic genomic resources necessary to perform efficient, cost-effective and pragmatic genome-scale research. This limits the scope of inquiry into their population structures, the demographic events that have shaped them, and their ability to overcome the anthropogenic threats they face. Domestic cats share a high degree of genomic synteny with their wild counterparts, which are among the most vulnerable species in the world. In the absence of high-quality reference genomes, cross-species genome alignment and variant calling methods can serve as a reliable, cost-effective method for single nucleotide polymorphism discovery, thanks to the high degree of genomic synteny among felids.

In this thesis, genomic tools developed for the domestic cat are applied to demographic and health scenarios in a domestic cat breed and endangered wild felid species. I demonstrate the advantages of using pedigreed breeds to construct the genetic landscape of complex diseases in humans, using the Burmese breed as a naturally occurring model of type 2 diabetes. I then demonstrate the cross-species utility of genomic resources in wild felid conservation using commonly applied reference-based methods. Using the feline reference genome to perform variant detection across Sumatran tigers (Panthera tigris sumatrae), snow leopards (Panthera uncia) and cheetahs (Acinonyx jubatus), we gain insights into population dynamics, evolutionary history, and disease management of wild cat species relative to the domestic cat genome. Further, I compare two commonly used low-density variant datasets, reduced representation sequencing and the feline genotyping array, in their ability to estimate population structure and identify genomic regions underlying selection in Sumatran tigers, snow leopards and cheetahs.

Chapter 1: Introduction

Domestic cats and their wild counterparts have a unique evolutionary history that makes them an exceptional model for studying comparative, evolutionary, biomedical and conservation genomics. The genomic architecture of the cat family, Felidae, is highly conserved, giving us a glimpse into the ancestral genome organisation of mammals and the processes of domestication and adaptive evolution. Since their domestication, the serendipitous consequences of selective breeding have left distinct genomic traces in domestic cat breeds that can be leveraged to map the genes underlying breed-defining traits and heritable diseases. Advances in genomic tools and bioinformatic techniques, including annotated reference genome assemblies, linkage maps and the feline genotyping array, have been essential in expanding our understanding of the adaptive signatures of feline domestication and the relevance of the domestic cat as a biomedical model for heritable human diseases. Felidae comprises some of the most successful and threatened predators on Earth. The extension of genomic studies in wild cat species has illuminated patterns of genetic variance relating to demographic history, speciation and adaptation. However, there remains a notable gap between these studies and the integration of genomics in the conservation management of these species. In this chapter, I provide an overview of recent genomic advances that have expanded our knowledge of felid biology and discuss how invaluable the cat is to comparative genomic studies across a range of disciplines.

1.1 A brief history of felid taxonomy

Felids are highly specialised ambush predators. Unique among carnivores, they display a strict adherence to eating meat (hypercarnivory). Consequently, felids share many morphological and physiological features essential to stalking, capturing, subduing and consuming their prey (1) and the metabolic demands of a prey-based diet (2, 3). This specialisation has been essential to the limited degree of taxonomic diversification observed across felids (4). The cat body plan is notably consistent, despite displaying a range in size over two orders of magnitude, from the rusty spotted cat (Prionailurus rubiginosus ~1kg) to the tiger (Panthera tigris ~300kg). Specialised morphological adaptations associated with hypercarnivory in cats include cranial and facial features that impart high bite force, supination of the forelimb and protractile claws for handling struggling prey (5), and enrichment of olfactory and auditory faculties (6-8).

All living cats share a last common ancestor (LCA) ~10-15 million years ago (Mya) (9-11), with all major felid lineages established during a brief period spanning the late Miocene and early Pliocene (6.4-2.9 Mya). Today, at least 38 felid species are recognised globally. Prior to the use of DNA sequencing, partitioning felid species based on generic associations was inconsistent. Attempts to clarify these generic associations based on incomplete fossil records (12, 13) and living felid morphology were hampered by the uniform morphology of felids (14, 15) (reviewed by (16)). These studies also lacked a robust methodology with which to construct relationships among felid species. Before the advent of molecular techniques, felid species were split into two groups based on morphological characteristics: big cats (Panthera) and small cats (Felis) (14). This classification was based on the presence of an elastic ligament called the epihyoideum in the hyoid apparatus of big cats, which is completely ossified in smaller cats (14, 17). This difference in hyoid structure was believed responsible for differences in species’ vocal repertoires, with big cats able to roar but not purr and smaller cats able to purr but not roar. More recently, it has been shown that the fundamental morphological difference between roaring and purring cats was the vocal cord structure in the larynx (18), undermining the classification of big and smaller cats based on vocal repertoire, as snow leopards (Panthera uncia) are not known to roar and, unlike the tiger, lion (Panthera leo), leopard (Panthera pardus) and jaguar (Panthera onca), do not possess the laryngeal structure required to roar (19, 20).

Generic grouping of species is complicated by morphological similarities among species, an incomplete fossil record (21), introgression between lineages (22) and recent (<1 million years (My)) individual speciation events (11). Molecular techniques, like polymerase chain reaction (PCR) (23, 24) and Sanger sequencing (25), coupled with improvements in computing technologies, were a turning point in felid systematics. Using these techniques, researchers were able to compare DNA sequences between species. The inclusion of genetic classifications confirmed many morphological species classifications and led to an expansion in the number of recognised genera (26-28). Today, there are 14 recognised felid genera, arranged using a combination of genetic and morphological techniques (9, 29, 30) (Figure 1). All species are organised in two subfamilies, Pantherinae and Felinae; this includes the cheetah (Felinae), which was long regarded as an early subfamily separation. Molecular techniques allowed researchers to standardise phylogenies by correlating times of divergence among species with known evolutionary events. For example, 16S rRNA and mitochondrial DNA (mtDNA) were used to divide Felidae into two subfamilies, Pantherinae and Felinae (28), and clarify the divergence times within the Panthera lineage (31).

While molecular techniques found much common ground with morphological approaches in broader generic classification of felids, some topological conflict among and within molecular studies has persisted. This is exemplified by inconsistencies between molecular phylogenetic studies of the Pantherinae genera that have posed a variety of different relationship hypotheses (28, 29, 32-34). Congruence between molecular studies is essential to confidence in resultant reconstructions of the evolutionary history of a group of organisms, and conflicting results likely reflect differences in methodology (35). While many cladistic studies based on molecular techniques focused on single genes, other studies (33, 36, 37) demonstrated the variable rates of gene evolution between felid species, cautioning against approaches that construct phylogenies in this way. More recently, genomic approaches have further demonstrated this through discordant phylogenetic pattern resolution across maternal and paternal genomic partitions and between nuclear genome and mitogenome phylogenies (6, 9).

As has been observed in humans (Homo sapiens) and chimpanzees (Pan troglodytes), cat evolution has occurred too recently for accumulated ancestral polymorphisms to effectively separate extant species into monophyletic lineages (9, 28, 33, 38). The challenges posed by rapid felid diversification to resolving phylogenetic relationships may also offer a unique opportunity to clarify the mechanisms underlying their adaptive divergence. A high degree of genomic synteny has been observed across felid species using cytological and genomic tools. Early chromosome staining techniques described cytogenetic homogeneity among felid genera (39). The felid chromosome complement includes 18 autosomes and an XY sex-determining chromosome set (40, 41). Felid karyotypes are still characterised by historical chromosome grouping based on size and telomeric position. The standard felid karyotype has three large metacentric chromosomes (A1-A3), five large subtelomeric chromosomes (B1-B4, X), two medium-sized metacentric chromosomes (C1, C2), four subtelomeric chromosomes (D1-D4), three small metacentric chromosomes (E1-E3) and two small acrocentric chromosomes (F1, F2). While all of Felidae share a similar karyotype, some rearrangements have been characterised, namely the ocelot chromosome complement, comprised of 17 autosomes (39) with a Robertsonian translocation of chromosomes F1 and F2 forming chromosome C3 in this lineage (41). Some variation in the felid karyotype is provided by other small pericentric alterations to small chromosomes but the domestic cat chromosomal architecture is archetypal for felids (39, 42). Beyond this, cat chromosome architecture is ancestral among (43) and along with humans (2N=46), the cat karyotype (2N=38) is highly similar to the ancient mammalian founder karyotype (44). Genomic studies have repeatedly revealed a high degree of genome-wide synteny and repeat composition shared between felids (6, 45-47), and striking levels of concordance among gene pathways implicated in interspecies introgression support conserved morphology among Panthera species (22).
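The chromosome counts in the preceding paragraph can be checked with a short tally. The sketch below is illustrative only: the function names are hypothetical, and it assumes nothing beyond the chromosome labels listed in the text and the standard definition of a Robertsonian translocation (two chromosomes fusing into one). It confirms that 18 autosomes plus a sex chromosome pair give 2N=38, and that a single F1/F2 fusion reduces the autosome count by one.

```python
# Autosome groups of the standard felid karyotype, as listed in the text.
# The X chromosome is grouped with B by size but is counted separately here.
FELID_GROUPS = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2", "B3", "B4"],
    "C": ["C1", "C2"],
    "D": ["D1", "D2", "D3", "D4"],
    "E": ["E1", "E2", "E3"],
    "F": ["F1", "F2"],
}


def autosomes(groups):
    """Flatten the group dictionary into a single list of autosome labels."""
    return [chrom for members in groups.values() for chrom in members]


def diploid_number(autosome_list):
    """Two copies of each autosome plus two sex chromosomes."""
    return 2 * len(autosome_list) + 2


def robertsonian_fusion(autosome_list, first, second, fused):
    """Hypothetical helper: replace two chromosomes with one fused chromosome."""
    remaining = [c for c in autosome_list if c not in (first, second)]
    return remaining + [fused]


domestic = autosomes(FELID_GROUPS)
ocelot_lineage = robertsonian_fusion(domestic, "F1", "F2", "C3")

print(len(domestic), diploid_number(domestic))
print(len(ocelot_lineage), diploid_number(ocelot_lineage))
```

Under these assumptions the domestic cat tally is 18 autosomes (2N=38) and the fused complement is 17 autosomes (2N=36).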

Figure 1: Cladogram displaying the phylogenetic relationships of living felid species. Molecular time estimates are based on work by Johnson et al. (2006) and Li et al. (2016). Figure is author’s own work, cladogram made with TimeTree (Kumar et al. 2017).

1.2 From wildcat to house cat

Despite being known as aloof and temperamental, the domestic cat has become one of the most populous and well-loved household pets. They have served us as important cultural symbols, pest-control agents and companions. Evidence suggests that cat domestication was unique among domestic species, driven largely by a commensal relationship with humans, rather than selected by humans to perform specific tasks (48). Domestic cats are descended from wildcats (Felis silvestris), a polytypic species comprised of five subspecies (49), distributed across Africa, the , and Central . Determining which of the five wildcat subspecies preceded the domestic cat has been complicated by regular interbreeding of domestic cats with wildcats (50-53), shared morphological features and widespread evidence of multiple independent feline domestication events from around the globe (48, 54-57). In a phylogenetic analysis using mtDNA and short tandem repeats (STR) (49), domestic cats were found to cluster distinctly with the African wildcat (Felis silvestris lybica) to the exclusion of other subspecies. The earliest archaeological evidence of a relationship between cats and humans comes from Cyprus, ~9 Kya (57). Cats are not endemic to Cyprus, and fossilised remains of a cat on the island demonstrate that cats were introduced by humans 1,500 years earlier (55, 56). Cats developed a commensal relationship with Neolithic settler communities growing cereal, encouraged by people to settle in villages to control rodent populations attracted to stored grains (58).

While animals can be domesticated rapidly, given the right conditions (59, 60), a cat’s temperament makes it an unlikely candidate for domestication. Most other domesticated species were purpose-bred and played an essential role in the Neolithic agricultural revolution (61-63). Animal domestication involves the development of a mutually beneficial relationship between humans and wild species through the exploitation of favourable behaviours. These favourable characteristics in domesticates-to-be typically include hierarchical social structures, dominant males, parent-offspring bonding, low reactivity to humans and generalist feeding (64-66). Across domesticated species this process can largely be grouped into three pathways: the commensal pathway, the prey pathway and the directed pathway (66). A surprising range of species have taken the commensal path along with the domestic cat, including chickens (Gallus domesticus) (67), dogs (Canis familiaris) (62) and pigs (Sus scrofa) (68). It is believed these species were each drawn to human settlements to prey on small commensal species and scavenge scraps, and developed a tolerance to humans along the way. Interestingly, domestication did not dramatically alter the behavioural, morphological, physiological and ecological traits of cats as it has in other species, like the dog (69). The process of domestication has not been linear: as cats formed commensal relationships with humans and inevitably became domesticated, some bred with local wildcat subspecies and others became feral (70, 71). This interbreeding likely underlies the genetic congruence between wildcats and randomly bred cat populations (9, 72, 73).

Around the globe over the last ten thousand years, small groups of cats underwent artificial selection based on regional aesthetic tastes. This gave rise to the feline landraces predating the establishment of pedigreed breeds (Table 1) (74). Genetically distinct regional populations across Asia, Western Europe, East Africa and the Mediterranean basin each gave way to a subset of modern breeds that shared a basic genetic architecture. From these regional landraces, various phenotypic forms that differed in coat colour and patterning, craniofacial structure, ear and tail conformation and hair type were developed. This diversity has mostly arisen over the past ~150 years as cat fanciers have developed 71 internationally recognised pedigreed breeds (http://tica.org). Cat breed development has involved selective breeding of a subset of cats from a natural population (like the Persian or ), selection for spontaneous de novo mutations that define a new breed (like the Scottish Fold (75), Munchkin (76) and Rex (77) breeds) and crossbreeding across ’ (like the Bengal, Savannah and breeds (78)).

Table 1: Traditional domestic cat breeds and their origins.

Region | Breed name | Origin | Year of establishment | Derivative breeds
Asian | Abyssinian | India | <1871 | Australian Mist, Chausie, Somali, Ocicat, Singapura
Asian | | Myanmar | <1871 | Snowshoe, Ragdoll
Asian | Burmese | Myanmar | 1350-1767* | , , Bombay, Singapura
Asian | | Kuril Island | <1871 |
Asian | Japanese Bobtail | Japan | 1600s |
Asian | Korat | Thailand | 1350-1767* |
Asian | Thai | Thailand | 1350-1767* | Siamese, , Burmese
Mediterranean | Aegean | Greece | <1871 |
Mediterranean | Turkish angora | Turkey | <1871 | Persian
Mediterranean | Turkish van | Turkey | <1871 |
European | Chartreux | France | <1558 |
European | Manx | Isle of Man | <1871 | Cymric
European | Maine Coon | USA | <1871 |
European | Norwegian forest | Norway | <1871 |
European | Russian blue | Russia | <1871 |
European | Siberian | Russia | <1871 |
East Africa | Arabian Mau | Middle East | <1100 BC |
East Africa | Egyptian Mau | Egypt | <1000 | Bengal
East Africa | Persian | Middle East | <1620 | Exotic Shorthair, Himalayan, Ragdoll
East Africa | | Kenya | <1871 |

*Dates taken from http://tica.org, http://cfa.org, (79), (80)

Most modern breeds have been developed from natural populations and refined over time based on selection for a unique body conformation. The Persian is one of the oldest cat breeds, characterised by its long coat, round face and short muzzle. From its origins as a in Iran, it was imported to Italy in the 1620s and from there exported around the world (81). Persians were shown at the first cat show in London in 1871, organised by Harrison Weir, who was responsible for establishing the Standards of the Cat Fancy as they are known today. Selection for short faces, snub nose and round full cheeks shifted to the more extreme ‘Peke-faced’ Persian standard of today. This modern standard was popularised in the USA and today, selection for an extremely brachycephalic face is cause for welfare concerns among veterinarians (82). Evidence suggests that this extreme phenotype carries with it an elevated predisposition to heritable conditions including facial fold pyoderma, facial dermatitis, epitrichial cysts, corneal sequestrum and immune deficiencies resulting in susceptibility to infection (83, 84). The Persian breed serves as a good example of the deleterious consequences that can follow selective breeding. While not usually so extreme, the closed breeding systems used to maintain pedigreed lines leave distinct traces in the genome, particularly when demographic events like population bottlenecks occur. Breeding from a small pool of individuals leads to a loss of genetic variation in subsequent populations, creating a founder effect. This practice has resulted in populations of low genetic variability, including the Persian breed, which has one of the highest levels of inbreeding among all pedigreed breeds (74). The simplified genetic architecture of pedigreed breeds consists of regions of high homozygosity, referred to as runs of homozygosity (ROH), surrounding genes responsible for breed-defining traits (85-87). In Persians, these regions encompassed genes CHL1 and CNTN6, known to determine face shape, as well as genes underlying neurological function and behaviour (LRRN1, RYK and SLC16A7) (84).
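To make the idea of a run of homozygosity concrete, the sketch below scans one chromosome for stretches of consecutive homozygous genotype calls. It is a simplified illustration and not the method used in the studies cited above: the function name, the 0/1/2 genotype coding (count of alternate alleles) and both thresholds are assumptions chosen for the example. Established tools typically use sliding windows and tolerate occasional heterozygous or missing calls, which this sketch does not.

```python
def find_roh(positions, genotypes, min_snps=3, min_length_bp=1000):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous calls.

    positions: base-pair coordinates of markers, sorted in ascending order.
    genotypes: one call per marker, coded 0 or 2 (homozygous), 1
               (heterozygous) or None (missing). Any call that is not
               homozygous ends the current run (a strict assumption).
    """
    runs = []
    start = None  # index of the first marker in the current run

    def close_run(first, last):
        n_snps = last - first + 1
        length = positions[last] - positions[first]
        # Keep only runs that satisfy both hypothetical thresholds.
        if n_snps >= min_snps and length >= min_length_bp:
            runs.append((positions[first], positions[last], n_snps))

    for i, call in enumerate(genotypes):
        if call in (0, 2):
            if start is None:
                start = i
        elif start is not None:
            close_run(start, i - 1)
            start = None

    if start is not None:  # a run that extends to the last marker
        close_run(start, len(genotypes) - 1)
    return runs


# Toy data: eight markers on one chromosome.
example_positions = [100, 600, 1500, 2400, 3000, 3500, 9000, 9500]
example_genotypes = [0, 2, 2, 0, 1, 0, 2, 1]
example_runs = find_roh(example_positions, example_genotypes)
print(example_runs)
```

With the toy data, the first four markers form a single run spanning 100 to 2400 bp, while the two homozygous calls at 3500 and 9000 bp are discarded because they fall below the minimum marker count.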

A smaller collection of breeds has been developed through selection of a novel mutation that arose spontaneously in a single individual, with these traits typically being under Mendelian inheritance and caused by variation within a single gene. Resulting breeds are defined by their unique physical characteristics like the Scottish Fold with its forward-folded ears, the ‘werewolf’ and dwarfed Munchkin. As with breeds developed from natural populations (like the Persian), allelic variants that define these breeds often have adverse off-target effects in other tissues. For example, the allelic variation underlying the Scottish Fold’s ear phenotype also predisposes it to osteochondrodysplasia, the malformation of the distal limbs and tail, accompanied by accelerated degenerative joint disease (88, 89). The Scottish Fold was developed in the early 1960s from a single barn cat in Scotland that had folded ears. The folded ear phenotype is an autosomal-dominant trait associated with a c.1024G>T substitution in TRPV4 (75). In the Scottish Fold ear, the TRPV4 variant causes a weakness in the pinna cartilage, causing the ear to fold forward. Dysfunction of this gene has been shown to disrupt endochondral ossification on articular bone surfaces, resulting in reduced length and malformation of metacarpal and metatarsal bones (90, 91). While age of onset and clinical signs of osteochondrodysplasia are inconsistent between homozygous and heterozygous cats (92), homozygous cats display more severe clinical signs of limb deformity and degenerative joint disease from a young age. Welfare concerns regarding chronic pain and disability changed breeding practices globally, ensuring no fold-to-fold matings take place and desexing of homozygous Scottish Folds (93, 94). While osteochondrodysplasia is of serious welfare concern, as a single gene disorder, it is likely easier for breeders to select against than a complex trait like brachycephaly in Persians. Additionally, reported variability of the disease state in heterozygous cats does not definitively support an end to breeding Scottish Folds.

Breeding strategies have resulted in cat populations with unique genetic profiles, marked by the signatures of population bottlenecks (Figure 2), artificial selection and inbreeding. Cats are unique among domesticated species, having experienced a relatively short period of intense artificial selection in the last 150 years. Additionally, unlike other domesticated species, artificial selection has focused mostly on aesthetic traits, not complex behavioural traits. Coat colour and length, both under Mendelian inheritance, are the major focus of selective breeding in cats. Sometimes these simple traits are all that delineate pedigreed breed standards; this is the case between the longhaired Persian and breeds (80). Some of these traits are required for individual breed registration, as is the case for the KRT71 c.445-1G>C variant and the previously mentioned Scottish Fold TRPV4 c.1024G>T variant. Together, the short period of breed development and selective breeding has resulted in populations with a simplified genetic landscape, suitable for mapping the genetic basis of breed-specific traits. The frequent inbreeding required to fix these traits in a breed creates extensive tracts of linkage disequilibrium (LD) among loci on each chromosome (95). Researchers have taken advantage of this structure to develop tools with which to map the genetic basis of breed-specific traits, heritable diseases and domestication (96).
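The strength of LD between two loci is commonly summarised with the r² statistic. The following is a minimal sketch of that calculation, assuming phased haplotypes with alleles coded 0 or 1; the function name and the input layout are illustrative assumptions and are not taken from the analyses in this thesis.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic loci.

    `haplotypes` is a list of (allele_at_locus_1, allele_at_locus_2) tuples,
    one per phased chromosome, with each allele coded 0 or 1 (assumed layout).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if tuple(h) == (1, 1)) / n  # 1-1 haplotype frequency
    d = p_ab - p_a * p_b                      # coefficient of disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


# Two loci that always travel together, as expected within a long shared haplotype
linked = [(1, 1)] * 5 + [(0, 0)] * 5
# Two loci whose alleles combine independently
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r_squared(linked), ld_r_squared(unlinked))
```

Values near 1 indicate that alleles at the two loci are almost always inherited together, which is the pattern that allows a sparse marker set to tag a causal variant in an inbred breed.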

Figure 2: Changing haplotype structure of domestic cats through breed establishment. Breeding for aesthetic traits in domestic cats has resulted in genetically isolated populations that are defined by long runs of homozygosity, extensive linkage disequilibrium (LD) and low levels of genetic diversity. Figure is author’s own work.

1.3 The expansion of feline genomics

In recent years, the domestic cat has been the subject of investigations into the genetic basis of disease susceptibility, host immune responses, domestication and mammalian genome evolution (97). Because the cat is an important model for the advancement of hereditary disease research, a growing bank of genomic tools and resources has been developed to characterise genome organisation and genes of interest in the species. These include genetic linkage maps, somatic cell and radiation hybrid maps, annotated reference genome assemblies, a single nucleotide polymorphism (SNP) array and, more recently, the 99 Lives Project. This growing bank of resources is largely due to the biomedical relevance of the cat as a model species for heritable and infectious diseases in humans. Studying heritable traits in populations with a simplified genetic landscape, like a pedigreed breed, is beneficial for genome-wide mapping studies (95). Further, the pathological and clinical similarities in heritable diseases of cats and humans are well documented (98). Naturally occurring orthologous diseases can capture the complex clinicopathological and physiological reality of a disease state. The inheritance pattern as well as clinical and physiological details of over 360 heritable traits and diseases have been documented in the cat and over 200 of these are considered potential models for human disease (see omia.angis.org.au (99)).

Following the successful completion of the human genome project (100), low coverage (2x) genome assemblies for 29 species representing the breadth of mammalian diversity were generated (101). This was primarily intended to aid in the annotation of the human genome and characterise divergence among mammalian radiations through comparative sequence alignments. The species selected for this endeavour included primates, domestic ruminants, bats, rodents, and companion animals, among others. The chosen feline representative was a female named Cinnamon. The initial feline genome assembly consisted of 663,480 contigs at 1.9x coverage (47). Assembly, mapping and annotation relied on cross-referencing the cat with six previously developed genome assemblies (human, mouse, rat, dog, chimpanzee and cow) and successfully mapped 92% of canine genes, 89% of cow and chimpanzee genes and 83% of human genes, highlighting the value of the feline genome sequence to comparative genomic studies (100, 102). Despite its low coverage, features of the initial feline assembly included: homologous synteny blocks shared among all seven species, details of species- and lineage-specific inter- and intra-chromosomal rearrangements, 20,285 orthologous genes, an abundance of novel retroviral elements and patterns of homozygosity useful for LD mapping.

Increasingly detailed gene maps and genome assemblies of the domestic cat have been developed to provide tools for discerning the heritable basis of morphological variation and disease. Early developments included somatic cell, ZOO FISH and radiation hybrid maps (103-106), genetic linkage maps (40, 107) and several genomic libraries for genome assembly and gene discovery (108, 109). Radiation hybrid (RH) maps were critical to gene discovery in domestic cats, used to construct gene maps and capture the natural history of genome organisation. Because they were based on microsatellite genotyping, they had low marker densities (104, 110) and relied heavily on comparative mapping to improve long-range precision (105, 110). These maps were constructed using multi-generational domestic cat pedigrees (111-113) and inter-species hybrids (104, 107). Although autosomal marker densities were consistently improved upon, the maps remained limited in their ability to improve the quality of the feline reference assembly, narrow candidate regions and identify causative mutations. Four revisions of the cat reference assembly have since improved upon the initial low-coverage assembly, each offering more accuracy with fewer misassemblies and greater genome coverage than its predecessor. The felCat8 assembly used a high-resolution linkage map based on SNP genotyping of multigenerational pedigrees of domestic cats to improve the contiguity of the earlier cat reference assembly to a read depth of 20x (114). The high-resolution linkage map was based on the 62,897 SNPs on the Illumina Infinium iSelect 63K genotyping array. The inter-marker distance of the feline SNP array was ~1 SNP/50 Kb (Figure 2), offering a 20-fold improvement on previous microsatellite-based maps (104, 105, 113, 115).

Following the release of the initial low-coverage (1.9x) feline draft assembly, the cat’s first high-density SNP map was developed (47, 116). This initial map consisted of over 3 million SNPs and served as the basis of the feline Illumina Infinium iSelect 63K genotyping array (116). The 62,897 SNPs on the feline array make it a relatively low-density genotyping array compared with other domestic species currently (117-120). Despite this, it has been essential to trait and population-based analyses in cats (96), having also been used to validate and arrange chromosome scaffolds of the felCat8 assembly (114). Initially mapped to the felCat4 assembly, SNP positions have been remapped to all feline genome assemblies (76, 96, 121, 122), demonstrating a high genotyping rate across breeds (<0.01%) compared with other domestic species (96, 117, 118, 123). The feline array has an inter-marker distance of ~38 Kb, 20 inter-marker gaps >500 Kb and low coverage of the X chromosome (Figure 3). Substantial differences in LD among breeds genotyped on the 63K array have shown that Eastern breeds (Birman, Burmese, , Siamese, ) have more extensive LD (96), making them more suited to association and population-based analyses using the array. Extended haplotypes and LD within cat breeds have been used to localise breed-specific traits through genome-wide association studies (GWAS) using the feline array (96). This has included: the folded ear phenotype and osteochondrodysplasia in the Scottish Fold (75, 95), hypokalaemia (124) and craniofacial dysplasia in the Burmese cat (125), hereditary myopathy in the and Sphynx (126) and the curly coat phenotype of Selkirk Rex cats (127). While complex trait analyses using the feline array have been limited, those that display elevated prevalence in breeds displaying extensive LD have been successful like the Burmese (121), Persian (84) and Birman (128).
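Array summaries such as mean inter-marker distance and the number of large inter-marker gaps can be derived directly from marker coordinates. The sketch below illustrates the arithmetic for a single chromosome; the function name, the 500 Kb default threshold and the example coordinates are assumptions chosen for illustration and are not values from the array manifest.

```python
def marker_spacing(positions_bp, gap_threshold_bp=500_000):
    """Summarise spacing between markers on one chromosome.

    Returns (mean_spacing_bp, number_of_gaps_larger_than_threshold).
    """
    ordered = sorted(positions_bp)
    if len(ordered) < 2:
        raise ValueError("at least two markers are needed to measure spacing")
    # Distance between each pair of neighbouring markers
    spacings = [right - left for left, right in zip(ordered, ordered[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    large_gaps = sum(1 for s in spacings if s > gap_threshold_bp)
    return mean_spacing, large_gaps


# Hypothetical marker coordinates (bp) with one poorly covered interval
example_positions = [0, 40_000, 80_000, 700_000]
mean_bp, gaps = marker_spacing(example_positions)
print(f"mean spacing: {mean_bp:.0f} bp; gaps > 500 Kb: {gaps}")
```

Because a single large gap inflates the mean, a median spacing may be a more robust summary when coverage is uneven, as it is on the X chromosome.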

Genome-scale science promised comprehensive investigation of genetic differences that underpinned essential traits, adaptive evolution and heritable diseases in species spanning the tree of life. Across all species, reference genomes have reached a mythic status with ambitious consortia such as the Earth Biogenome Project (129) and the Genome 10K Project (130) aiming to sequence and categorise the genomes of the Earth’s eukaryotic biodiversity in the hopes of preserving biodiversity and revolutionising our understanding of biology and evolution. Improved sequencing technologies now offer long read sequencing with unbiased coverage for generating de novo genome assemblies and the promise of publicly available, gold standard assemblies for a growing number of species. Reference genomes have fast become the foundation of genomic data analyses in model species used for: scaffolding of future assemblies, variant calling, RNA sequencing, gene annotation and functional analyses.


Figure 3: Distribution of SNPs on the Illumina Infinium iSelect genotyping array across autosomes (A1-F2) and the X chromosome mapped to the felCat9 genome assembly. The key indicates the number of SNPs within 1 Mb windows. Figure is the author’s own work, ideogram made using R package CMPlot (Yin et al. 2020). SNP locations taken from the SNP manifest presented in Samaha et al. 2020.

While reference assemblies have contributed significantly to biological research, they are a type-specimen, not a baseline for the global population of a species. As a type-specimen, Cinnamon was specifically selected because of her genetic history, defined by three episodes of historic inbreeding that simplified her genome to a patchwork of long stretches of homozygosity (57% of her genome), interspersed among regions of heterozygosity. These events were: an initial domestication event (~10 Kya), the development of the Abyssinian breed (49) and pedigree establishment (131). Given this history, Cinnamon’s allelic profile is not representative of the global cat population, which is dominated by randomly bred cats. Like their human companions, domestic species display a wide degree of allelic diversity and geographic and ethnic population clustering (74). A reliance on pedigreed breeds for generating genomic data may streamline the search for simply and recessively inherited variants, but it also limits our ability to capture the global allelic variation in domestic cats. Further, as a type-specimen, a reference genome can distort the results of genome alignments in regions where it is not ‘typical’. This distortion, known as reference bias, refers to the tendency of some sequencing reads to map more reliably to reference alleles than non-reference alleles. Reference bias can induce errors in variant calling, a loss of rare alleles, and a bias toward reflecting the properties of the reference genome (132).
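One simple way to look for reference bias is to examine read support at sites believed to be heterozygous, where roughly half of the reads are expected to carry each allele. The sketch below computes the pooled fraction of reads supporting the reference allele; the function name, input layout and read counts are hypothetical, and this is an illustrative diagnostic rather than a method used in this thesis.

```python
def reference_allele_fraction(sites):
    """Pooled fraction of reads supporting the reference allele.

    `sites` is an iterable of (ref_read_count, alt_read_count) pairs taken
    at sites assumed to be truly heterozygous.
    """
    sites = list(sites)
    ref_total = sum(ref for ref, _ in sites)
    alt_total = sum(alt for _, alt in sites)
    total = ref_total + alt_total
    if total == 0:
        raise ValueError("no reads at the supplied sites")
    return ref_total / total


# Hypothetical read counts at three heterozygous sites
counts = [(12, 8), (15, 10), (9, 6)]
print(f"reference allele fraction: {reference_allele_fraction(counts):.2f}")
```

A pooled fraction well above 0.5 across many heterozygous sites would be consistent with reads carrying non-reference alleles aligning less reliably, although other factors such as genotyping error can produce a similar signal.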

The feline reference assembly is only a small part of the complex model by which we can estimate how both an individual’s genes and their environment interact to create a unique combination of phenotypic traits. To better capture the global genetic variation in cats, the 99 Lives Cat Genome Sequencing Initiative (http://felinegenetics.missouri.edu/) (herein referred to as ‘99 Lives Project’) has recently been established. Following in the footsteps of similar initiatives in humans (133, 134) and other domestic species (135, 136), the 99 Lives Project is a collaborative effort that seeks to sequence a range of breeds to better capture the true extent of genetic variation in cats through the sharing of genome sequence data among the research community. To date, cats included in the 99 Lives Project represent over 50 heritable trait and disease research projects, including wild cat species, that have been used to develop clinical genetic tests for heritable diseases and breed-defining clinical traits (137, 138).

1.4 The domestic cat as a model for mapping heritable disease in humans

Domestic cats are subjected to extensive veterinary surveillance, presenting a myriad of disorders that bear close resemblance to analogous conditions in humans. Naturally occurring, heritable diseases in companion animals are increasingly used as informative models for diseases in humans, particularly in genetic mapping studies (139). Comparative studies have shown us that the sequences of genes determining biological function among eukaryotes are highly conserved (140-142). As such, knowledge of the biological role played by a gene or its protein products in one organism can be used to illuminate its role in other organisms. The domestic cat’s utility as a model species was a major driving force behind the development of cat-specific genomic tools (143). Humans and cats share 15,962 orthologous genes and selection pressures within these genes are highly correlated (76). Further, the total number of genome-wide variants discovered in cats surpasses those in dogs (135, 144, 145), rats (146), pigs (147), horses (148) and rhesus macaques (149), all of which benefit from a growing bank of genomic resources as model species. Over 36 million biallelic SNPs have been discovered in cats (76), twice as many as have been discovered in humans (134). Animals with higher levels of genomic diversity than humans offer a unique opportunity to discover pathogenic and tolerated loss of function variants responsible for analogous disease states in humans. Variant discovery across orthologous genes can reveal rare, loss of function, risk, and pathogenic loci of biomedical relevance to humans. In cats, sequencing efforts like the 99 Lives Project continue to improve resolution of rare and pathogenic variants of potential clinical significance in humans.

Many of the practical and ethical considerations regarding the use of animal models include their ability to convey meaningful information regarding a disease state (150). Using naturally occurring disease models circumvents many of the drawbacks associated with in vivo experimental work involved in induced illness and transgenic animals, and allows us to observe the natural progression of an illness over time (139, 151, 152). Cats have a long history as models for studying human disease, used to characterise clinical presentation and allelic segregation of Mendelian traits long before they were used to map disease-causing allelic variants and evaluate gene therapies (153). Orthologous and naturally occurring disease models for humans are rarely identified in non-human primates; however, as our companions, cats share our homes and many of the environmental and lifestyle factors that complicate our understanding of complex heritable diseases and traits. Together with the reduced genetic heterogeneity shown by pedigreed breeds, shared environment and lifestyle factors can be leveraged to map the genetic basis of simple and complex traits (95). Demographic processes of selection, genetic drift and crossing have fixed favoured alleles in populations of pedigreed breeds. In some cases, these fixed alleles are accompanied by undesirable, disease-causing gene variants that are responsible for increased prevalence of a heritable disease among specific breeds or geographical populations. Many of the variants that have arisen in these populations have only a minor effect on reducing fitness and can persist in populations (154). Further, selective breeding practices using a limited number of individuals reduce effective population size, thereby increasing the burden of deleterious variants across the genome, not just around swept regions under intense selection (155).

Currently, 362 heritable traits and disorders have been characterised in the domestic cat (99), 228 of which may serve as potential models for human diseases. Likely causal mutations for more than 65 of these conditions have been characterised and commercial tests are available for many of them (Table 2) (137, 138). Early cytogenetic techniques confirming karyotypic similarities between cats and humans (43) established cats as an informative model for some well-known chromosomal abnormalities in humans, like Turner’s Syndrome (XO) (156, 157) and Klinefelter’s Syndrome (XXY) (158-160). Following the discovery of X-inactivation in tortoiseshell cats (161, 162), sex chromosome abnormalities in calico and tortoiseshell males have led to the development of karyotypic and gene-based assays for sex development disorders (163-167). The availability of these models has been essential to understanding the pathophysiology of these disorders, linking supernumerary X chromosomes with germ cell loss and dysfunction and cognitive deficits. Many of the genes underlying simple traits in cats have been identified using a candidate gene approach, which requires a clear understanding of gene function and disease pathophysiology. Most disease variants identified using a candidate gene approach are uniquely observed in selected pedigreed breeds, have near complete penetrance and an autosomal mode of inheritance (137) (Table 2). This is demonstrated by spontaneous retinal diseases segregating in Bengal and Abyssinian breeds and by segregation of GM2 gangliosidosis in Burmese and Korat breeds (168-170), which have led to gene-based therapies in humans (131, 171-175).

More recently, GWAS, in which a set of SNPs across an individual’s genome are surveyed for significant association with an observed phenotype, have been used to identify causal or risk loci that underlie a disease. The transition from a hypothesis-driven approach focused on biologically relevant genes to GWAS provided a novel framework for naming the genetic variants underlying common and complex disorders. Gene mapping can be used to clarify an incomplete understanding of heritable disease pathophysiology. In domestic cats, the structure of pedigreed breeds is advantageous to localising the genetic basis of these heritable traits and diseases using GWAS (95, 102, 117). Generations of inbreeding have resulted in populations with small effective population sizes, extensive LD, few, and long common haplotypes and long ROH (74, 95) (Figure 2). Cat breeds with the lowest levels of genetic variation and highest degree of LD, like the Burmese, Persian, Birman and Abyssinian, require fewer SNPs to map genetic loci associated with a trait or disease and can be better suited to association and selective sweep analyses using the feline genotyping array than other breeds (74, 95). The Burmese breed has been the subject of various GWAS to identify causative and risk variants underlying the breed’s predisposition to hypokalaemia (WNK4) (124), craniofacial dysplasia (ALX1) (125), and diabetes mellitus (ANK1) (121). Similarly, the Persian breed presents with a heightened predisposition to progressive retinal atrophy (122) and Chediak-Higashi syndrome (176).
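At its simplest, the association test applied to each SNP in a case-control GWAS compares allele counts between affected and unaffected animals. The sketch below shows a basic allelic chi-square test with one degree of freedom for a single SNP. The function name and the allele counts are hypothetical, and the test is a simplified stand-in: it ignores relatedness, population structure and multiple testing, all of which published analyses must account for.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Allelic chi-square test (1 d.f.) on a 2x2 table of allele counts.

    Returns (chi_square_statistic, p_value).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the allele table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square distribution with one degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 20 cases and 20 controls, two alleles per cat
chi2, p = allelic_chi_square(case_alt=30, case_ref=10, control_alt=10, control_ref=30)
print(f"chi-square = {chi2:.1f}, p = {p:.2e}")
```

In a genome-wide scan this test is repeated for every marker, so the resulting p-values are judged against a threshold corrected for the number of tests performed.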

Table 2: Monogenic disorders displaying increased breed prevalence in cats serve as a model for analogous diseases in humans.

| Disease | Affected breeds | Gene | Variant | OMIA entry | Relevant OMIM entry |
|---|---|---|---|---|---|
| Autoimmune lymphoproliferative disease (ALPS) | British shorthair | FASLG | c.413_414insA | 002064-9685 | 601859 |
| Brachycephaly | Burmese | ALX1 | c.496delCTCTCAGGACTG | 001551-9685 | 613456 |
| Muscular dystrophy-dystroglycanopathy (limb-girdle) | Sphynx | COLQ | c.1190G>A | 001621-9685 | 603034 |
| Hypertrophic cardiomyopathy | Maine Coon, Ragdoll | TNNT2 | c.95-108G>A | 002304-9685 | 191045, 601494, 612422, 115195, 601494 |
| | | MYBPC3 | c.91G>C, c.2460C>T | 000515-9685 | 615396, 115197 |
| Verrucous epidermal keratinocytic nevi | Domestic shorthair | NSDHL | c.397A>G | 002117-9685 | 308050, 300275 |
| Gangliosidosis 1 | Korat, Siamese | GLB1 | c.1448G>C | 000402-9685 | 230500, 230650 |
| Gangliosidosis 2 | Burmese, Korat | HEXB | c.1356_1362delGTTCTCA, c.39delC | 01462-0985 | 268800 |
| Glycogen storage disorder IV | Norwegian Forest Cat | GBE1 | IVS11+1552_IVS12-1339 del6.2kb ins334 | 000420-9685 | 232500 |
| Chondrodysplasia | Munchkin | UGDH | c.950G>A | 000187-9685 | 225500 |
| Mannosidosis | Domestic longhair, domestic shorthair, Persian | MAN2B1 | c.1749_1752delCCAG | 000625-9685 | 248500 |
| Progressive retinal atrophy | Bengal | KIF3B | c.1000G>A | 002267-9685 | 603754 |
| Retinal degeneration II | Abyssinian | CEP290 | IVS50 + 9T>G | 001244-9685 | 611755 |
| Chediak-Higashi syndrome | Persian | LYST | c.8347-2422_9548+1749dup | 000185-9685 | 214500 |
| Mucolipidosis II | Domestic shorthair | GNPTAB | c.2644C>T | 001248-9685 | 252500 |
| Cystinuria type A | Domestic shorthair | SLC3A1 | c.1342C>T | 000256-9685 | 220100 |

The genetic profile of pedigreed breeds closely resembles that of genetically isolated human populations, which have been most useful in modelling common and complex diseases with differential phenotypic presentation and allelic heterogeneity among ethnic populations (177). Genetically isolated populations are derived from a small number of individuals following a founder event (e.g. geographical, social or cultural barriers) that results in restricted admixture from other populations. Much like pedigreed animal breeds, genetically isolated human populations are characterised by reduced genetic heterogeneity, extensive LD and extended haplotypes (178). They also display an increased prevalence of heritable diseases (179, 180). Their extended haplotypes have been used to empower association analyses, while reducing sample size and the number of genetic markers required to map an association to a genetic locus (181-185). This population structure is particularly beneficial to complex disease mapping, capable of detecting rare recessive disease alleles often missed in large global populations (186). In small founder populations, the number of risk alleles predisposing individuals to a complex disease is more restricted than in a heterogeneous population. Population-specific allele frequency profiles are responsible for the differences in risk factors and phenotypes between populations. For example, an impressive number of candidate loci and population-specific T2D risk alleles have been identified across African populations (187), Southeast Asians (188), Greenlandic Inuit (177, 189) and Aboriginal Australians (190). This is comparable to the discovery of breed-specific allelic profiles and causative variants, as has been observed in hypertrophic cardiomyopathy affecting Ragdoll (MYBPC3 (191)) and Maine Coon (MYBPC3 (192) and TNNT2 (193)) breeds.
While these discoveries cannot always be replicated in other populations, their value ultimately lies in their contribution to exposing the stratified aetiopathogenesis of heritable diseases and the functional relevance of genes.

1.5 Felid conservation

Today, wild felids are among the most threatened mammals and many of the 38 extant cat species are the subject of global conservation action plans. They are essential members of a diverse range of boreal and tropical forest, savannah and desert habitats. As apex predators, wild cat species play a pivotal role in maintaining healthy, functional ecosystems (194), through control of herbivores and small predators (195, 196). While all cat species have experienced population declines (197-201), big cats (Panthera lineage) have suffered the most dramatically (202, 203) (figure 4). Together with their low reproductive rates and long lifespan, their requirement for large home ranges for hunting makes wide-ranging felids the most vulnerable to anthropogenic threats and persecution (194, 201, 204, 205). Genetic diversity in these vulnerable species is increasingly threatened by interactions with humans (206) and domestic species (51), habitat loss and fragmentation (207), human-wildlife conflict (208, 209) and population fragmentation. Given their ongoing vulnerability, ecological value and place in our collective consciousness, wild felids often serve as flagship and umbrella species for habitat conservation endeavours (see Sumatran tigers (210), snow leopards in the Tibetan plateau (211) and Saharan cheetahs (212)).

Flagship felid species are the subjects of internationally coordinated species recovery plans designed to stabilise declining populations and maintain their long-term viability. This is achieved through a framework of actions that includes mitigating threats like persecution and habitat loss, ex situ management, and ecological, genetic and biological research to better understand a species’ ecological niche and adaptive traits. The IUCN Red List of Threatened Species is considered the leading authority on species conservation classification and extinction risk and a key part of species recovery plans. The IUCN Red List classification of a species is based on recent demographic trends and habitat quality (202, 213, 214). Inherent in these extinction risk categorisations is the assumption that demographic trends leading to reduced population sizes will inadvertently result in a loss of genetic diversity from the population and subsequent loss of fitness (215-217). These processes are dominated by genetic drift and inbreeding which act in concert to bring about an accumulation of deleterious alleles in a population, potentially leading to inbreeding depression, compromised adaptive potential due to the loss of genetic diversity, and genetic isolation of fragmented populations (218-221).
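The link between reduced population size and loss of genetic diversity can be illustrated with the standard population-genetic expectation that heterozygosity decays by a factor of (1 - 1/(2Ne)) per generation under drift alone. The sketch below applies that expectation; it assumes an idealised, closed, randomly mating population with constant effective size, no mutation, migration or selection, and the function name and parameter values are illustrative only.

```python
def expected_heterozygosity(h0, effective_size, generations):
    """Expected heterozygosity after drift in an idealised population.

    Uses H_t = H_0 * (1 - 1 / (2 * Ne)) ** t, which assumes constant effective
    size and no mutation, migration or selection.
    """
    if effective_size <= 0:
        raise ValueError("effective population size must be positive")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return h0 * (1 - 1 / (2 * effective_size)) ** generations


# Compare a small and a large hypothetical population over 20 generations
for ne in (10, 100):
    remaining = expected_heterozygosity(0.5, ne, 20)
    print(f"Ne = {ne}: expected heterozygosity after 20 generations = {remaining:.3f}")
```

Under these assumptions the smaller population loses heterozygosity far more quickly, which is the reasoning behind treating small, fragmented populations as being at elevated genetic risk.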

Figure 4: The of the 38 listed felid species, organised by lineage. The Panthera (big cat) lineage has the highest proportion of vulnerable and endangered species indicated as a percentage. Data was sourced from the IUCN Red List of Threatened Species (IUCN 2020). Figure is author’s own work.

A significant challenge in the conservation of threatened populations is the ability to sustain effective population size and genetic diversity in the face of habitat fragmentation, illegal poaching and climate change. In natural populations, evolutionary and adaptive mechanisms of natural selection, genetic drift and gene flow act together to cause changes in allelic frequencies over time. However, the balance of these mechanisms is disrupted in threatened populations due to human-mediated habitat fragmentation and destruction. As such, inbreeding and rates of hybridisation are on the rise across many species and endemic gene pools have been severely endangered (222, 223). A loss of genetic diversity due to genetic drift is particularly of concern in small, fragmented populations in which fixation of deleterious alleles can increase a population’s extinction risk by reducing their overall fitness (215). Descriptions of historic bottleneck events in cheetahs (224), Siberian tigers (225), and panthers (226), have demonstrated the ongoing effects of small effective population sizes and inbreeding depression on reproductive and immune function. Recent widespread hybridisation between domestic cats and European wildcats (Felis silvestris), intensified by human-mediated range expansion of domestic cats, poses a threat to the long-term viability of wildcats (50, 51, 53, 73). The introgression of domestic alleles is expected to reduce fitness in wildcats and have broader ecological impacts. In situ population dynamics of wild species underscore the perceived value of ex situ management strategies including captive breeding programmes and genetic rescue interventions (207, 227-230). However, the success of these programmes is predicated on a number of factors relating to the genetic health of founder populations and the environmental quality in captivity. The success of captive breeding programmes also varies dramatically between species, with many captive felid species showing reproductive difficulties including high infant mortality (231), and low sperm count in males (230, 232, 233). Nevertheless, researchers have used animals bred in captivity to develop an understanding of the damage inbreeding and genetic drift can cause (218, 234, 235).

As ex situ conservation techniques are advocated by the IUCN, captive breeding programmes are designed to grow a population’s size to a level that could be successfully reintroduced into a species’ historical range. Mounting evidence suggests that captivity relaxes natural selection pressures and results in unintentional domestication (236, 237). This has best been exemplified in cheetahs, which are notoriously difficult to breed in captivity. Estimates of heterozygosity among cheetahs using microsatellites showed >90% less diversity compared with other mammals (238, 239). These estimates were comparable to other fragmented felid populations like African lions and cougars (240-242). This work was supplemented with functional studies on skin-graft acceptance that demonstrated reduced functional allelic variation at the cheetah’s major histocompatibility complex (MHC), one of the most polymorphic loci in mammals, important in innate immunity and disease susceptibility. Early studies of cheetahs using microsatellites, mtDNA and MHC genes showed the level of inbreeding in cheetahs was similar to deliberately inbred domestic species (238). This was attributed to two historical bottleneck events ~100 kya and again ~11-12 kya (224). Another recognised consequence of these events was poor reproductive performance including low fecundity and malformed spermatozoa in males (243, 244). Genomic datasets have since been used to confirm this and have revealed an accumulation of deleterious variants in reproductive gene families in cheetahs, particularly relating to spermatozoa development (46).

In the absence of high-quality genomic resources, conservation geneticists have mostly used traditional molecular techniques like microsatellites, mtDNA, and restriction fragment length polymorphisms (RFLPs) to characterise the impact of genetic drift, inbreeding and gene flow within populations. More recently, a shift toward genomic approaches has introduced the scope to address a wider range of demographic factors through an increased coverage of neutral loci and the adaptive loci underlying these processes (245, 246). In model species, SNPs are the genetic marker of choice for the advancement of functional and quantitative research. They offer a favourable alternative to traditional marker sets, given their high frequency across coding and non-coding regions, comparability across datasets and low typing error rates. Reference genome assemblies simplify SNP discovery efforts by lowering input costs, computational burden and sampling requirements, and have been developed for 13 felid species (Table 3). Reference genomes are essential to modern-day species conservation efforts, allowing us to investigate demography, inbreeding, disease susceptibility, adaptation and behavioural ecology (247). They provide genomic context for identifying annotated gene regions essential for understanding adaptive or deleterious variant consequences. It is important to mention that considerable computational, financial and bioinformatic resources are required to build high-quality reference genome assemblies, rarely available to conservation researchers and managers (248). This is demonstrated by the elevated proportion of draft versus chromosome-level felid assemblies (Table 3).


Table 3: Sequencing and assembly details of all publicly available felid reference genome assemblies

| Species | Assembly | Reference | Assembly level | Genome coverage | Total length (Gb) | Scaffold N50 (Mb) | Contig N50 (Mb) | Unplaced (Mb) |
|---|---|---|---|---|---|---|---|---|
| Domestic cat | felCat3 | Pontius et al. 2007 | Scaffold | 1.9x | 1.64 | 0.117 | 0.002 | - |
| | felCat4 | Mullikin et al. 2010 | Chromosome | 2.8x | 2 | 0.162 | 0.005 | 170.6 |
| | felCat5 | Montague et al. 2014 | Chromosome | 2x | 2.35 | 4.7 | 0.021 | 111 |
| | felCat8 | Li et al. 2016 | Chromosome | 20x | 2.64 | 18.07 | 0.045 | 73.7 |
| | felCat9 | Buckley et al. 2020 | Chromosome | 72x | 2.48 | 83.97 | 41.92 | 46.02 |
| Amur tiger | PanTig1.0 | Cho et al. 2013 | Scaffold | 99x | 2.39 | 8.86 | 0.030 | - |
| Lion | PanLeo1.0 | Armstrong et al. 2020 | Chromosome | 46x | 2.4 | 136 | 0.286 | 246.6 |
| Jaguar | PanOnc_v1 | Figueiro et al. 2017 | Scaffold | 60.7x | 2.5 | 0.116 | 0.006 | - |
| Leopard | PanPar1.0 | Kim et al. 2016 | Scaffold | 158.5x | 2.58 | 21.7 | 0.002 | - |
| Cheetah | aciJub1 | Dobrynin et al. 2015 | Scaffold | 75x | 2.37 | 3.12 | 0.003 | - |
| | Aci_Jub_2 | Dobrynin et al. 2015 | Scaffold | 71.74x | 2.38 | 48.5 | 0.179 | - |
| Puma | PumCon1.0 | Ochoa et al. 2019 | Scaffold | 47x | 2.43 | 100.5 | 0.002 | - |
| Iberian lynx | LYPA1.0 | Abascal et al. 2016 | Scaffold | 135x | 2.41 | 1.51 | 0.099 | - |
| Canadian lynx | mLynCan4_v1 | G10K Consortium | Chromosome | 72x | 2.4 | 147.3 | 7.5 | 1.08 |
| Jaguarundi | PumYag | Felidae Consortium | Scaffold | 37x | 2.47 | 49.27 | 0.1 | - |
| Black footed cat | FelNig_v1 | Felidae Consortium | Scaffold | 17.1x | 2.43 | 0.0186 | 0.016 | - |
| Caracal | CarCar1.0 | Felidae Consortium | Scaffold | 47x | 2.42 | 2.08 | 0.033 | - |
| Asian leopard cat | euptilurus_v03 | Bredemeyer et al. 2020 | Scaffold | 100x | 2.42 | 0.214 | 0.0204 | - |
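The scaffold and contig N50 values reported in Table 3 summarise assembly contiguity: N50 is the length of the sequence at which, when sequences are ordered from longest to shortest, the running total first reaches half of the total assembly length. The sketch below shows the calculation on a small set of hypothetical sequence lengths; the function name and the example values are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    half_total = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        # The first sequence that takes the running total to half the assembly
        if running_total >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (total 300, so the halfway point is 150)
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, it says nothing about whether sequences are correctly ordered or free of misassemblies, so it is best read alongside the assembly level and the amount of unplaced sequence.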

Despite being considered ‘works in progress’, draft genome assemblies are of value to research communities (249). Draft sequences provide a comprehensive set of estimates of the number of genes in a species' genome, their functional classes and their sequence similarity to other relevant species. However, some publications potentially mislead end users as to the utility and limitations of an assembly by not clarifying its build level (22, 46, 250). High-quality genome assemblies are differentiated from draft-quality assemblies by their lower error rates, fewer gaps (e.g. chromosome-level assembly) and high-quality annotations. The relative lack of scaffold order in a draft assembly interferes with its application in comparative studies, which are of particular interest in felids given the high degree of genomic synteny discussed earlier. Recently, a correction of the tiger draft assembly was published (251), highlighting the potential for inadequately validated draft assemblies to bias genomic observations and the biological conclusions drawn from them (22). While genomics is gaining momentum in conservation, the trend toward reference genomes carries potential risks for under-resourced research groups unable to resolve an assembly to a chromosome-level build. Researchers working on model species are increasingly turning to technologies such as Hi-C, a chromosome conformation capture technique that resolves genome-wide contact maps between loci within chromosomes (252). Projects such as Genome 10K (130) and DNA Zoo (dnazoo.org) are using long-range scaffolding techniques like Hi-C to construct highly contiguous chromosome-level reference genome assemblies for species of conservation value like the African lion (253).
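The contrast between draft and chromosome-level assemblies drawn above is often summarised numerically with the N50 statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following Python sketch is illustrative only. The function name n50 and the example scaffold lengths are hypothetical and are not taken from any assembly discussed here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (assumed values, for illustration only).
# Both toy assemblies have the same total length but differ in contiguity.
fragmented = [8, 6, 5, 4, 3, 2, 1, 1]
contiguous = [12, 10, 8]

print(n50(fragmented))
print(n50(contiguous))
```

In this toy comparison the two assemblies span the same total length, yet the more contiguous one yields the larger N50, which is why the statistic is commonly reported alongside the build level.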

1.6 Aims of thesis

In this thesis, genomic tools developed for the domestic cat are applied to demographic and health scenarios in a domestic cat breed and in endangered wild felid species. With this approach I demonstrate the value of the domestic cat as a comparative model for heritable diseases in humans and for conservation genomics in wild felids. First, to demonstrate the value of the domestic cat as a model for studying heritable diseases in humans, I used the feline reference genome assembly and genotyping array to map the polygenic basis of feline diabetes mellitus in the Burmese breed, identifying risk haplotypes and candidate genes. Further, I propose the Australian Burmese breed as a unique, naturally occurring genetic model of type 2 diabetes in humans. I then establish and evaluate the cross-species utility of genomic resources developed for the domestic cat in felid conservation research in two stages. In the first stage, I evaluated the utility of the feline reference genome for variant detection across Sumatran tigers, snow leopards and cheetahs, providing a resource for future studies of population dynamics, evolutionary history and disease management in wild cat species. In the second stage, I used this dataset to simulate two common low-density genotyping approaches: reduced representation sequencing and the feline genotyping array. I compared the cross-species application of these tools for estimating demographic processes of conservation value, and the genes underlying them, in Sumatran tigers, snow leopards and cheetahs.

1.7 References

1. Van Valkenburgh B. Déjà vu: the evolution of feeding morphologies in the Carnivora. Integrative and Comparative Biology. 2007;47(1):147-63.
2. Verbrugghe A, Hesta M. Cats and Carbohydrates: The Carnivore Fantasy? Veterinary sciences. 2017;4(4).
3. Eisert R. Hypercarnivory and the brain: protein requirements of cats reconsidered. Journal of comparative physiology B, Biochemical, systemic, and environmental physiology. 2011;181(1):1-17.
4. Holliday JA, Steppan SJ. Evolution of Hypercarnivory: The Effect of Specialization on Morphological and Taxonomic Diversity. Paleobiology. 2004;30(1):108-28.
5. Meachen-Samuels J, Van Valkenburgh B. Forelimb indicators of prey-size preference in the Felidae. Journal of morphology. 2009;270(6):729-44.
6. Montague MJ, Li G, Gandolfi B, Khan R, Aken BL, Searle SMJ, et al. Comparative analysis of the domestic cat genome reveals genetic signatures underlying feline biology and domestication. Proceedings of the National Academy of Sciences. 2014;111(48):17230-5.
7. Salazar I, Sanchez Quinteiro P, Cifuentes JM, Garcia Caballero T. The vomeronasal organ of the cat. J Anat. 1996;188(Pt 2):445-54.
8. Decraemer WF, Khanna SM. Measurement, visualisation and quantitative analysis of complete three-dimensional kinematical data sets of human and cat middle ear. Middle Ear Mechanics in Research and Otology. 3-10.
9. Li G, Davis BW, Eizirik E, Murphy WJ. Phylogenomic evidence for ancient hybridization in the genomes of living cats (Felidae). Genome Res. 2016;26(1):1-11.
10. Koufos GD, Kostopoulos DS, Vlachou TD. Neogene/Quaternary mammalian migrations in eastern Mediterranean. Belgian Journal of Zoology. 2005;135(2):181.
11. Luo SJ, Zhang Y, Johnson WE, Miao L, Martelli P, Antunes A, et al. Sympatric Asian felid phylogeography reveals a major Indochinese-Sundaic divergence. Molecular ecology. 2014;23(8):2072-92.
12. Matthew WD. The phylogeny of the Felidae. Bulletin of the AMNH; v. 28, article 26. 1910.
13. Beaumont Gd. Notes complementaries sur quelques felides (Carnivores). Archives des Sciences. 1978;31:219-27.
14. Pocock RI. XL.—The classification of existing Felidæ. Annals and Magazine of Natural History. 1917;20(119):329-50.
15. Herrington SJ. Phylogenetic Relationships of the Wild Cats of the World. Lawrence: University of Kansas; 1986.
16. Werdelin L, Yamaguchi N, Johnson WE, O’Brien SJ. Phylogeny and evolution of cats (Felidae). Biology and conservation of wild felids Oxford. 2010:59-82.
17. Owen R. On the anatomy of the cheetah, Felis jubata, Schreb. The Transactions of the Zoological Society of London. 1834;1(2):129-36.
18. Hast MH. The larynx of roaring and non-roaring cats. J Anat. 1989;163:117-21.
19. Klemuk SA, Riede T, Walsh EJ, Titze IR. Adapted to roar: functional morphology of tiger and lion vocal folds. PLoS One. 2011;6(11):e27029-e.
20. Weissengruber GE, Forstenpointner G, Peters G, Kübber-Heiss A, Fitch WT. Hyoid apparatus and pharynx in the lion (Panthera leo), jaguar (Panthera onca), tiger (Panthera tigris), cheetah (Acinonyx jubatus) and domestic cat (Felis silvestris f. catus). J Anat. 2002;201(3):195-209.
21. Van Valkenburgh B. Iterative evolution of hypercarnivory in canids (Mammalia: Carnivora): evolutionary interactions among sympatric predators. Paleobiology. 1991;17(4):340-62.
22. Figueiró HV, Li G, Trindade FJ, Assis J, Pais F, Fernandes G, et al. Genome-wide signatures of complex introgression and adaptive evolution in the big cats. Science Advances. 2017;3(7):e1700299.
23. Kleppe K, Ohtsuka E, Kleppe R, Molineux I, Khorana HG. Studies on polynucleotides: XCVI. Repair replication of short synthetic DNA's as catalyzed by DNA polymerases. Journal of Molecular Biology. 1971;56(2):341-61.
24. Mullis K, Faloona F, Scharf S, Saiki R, Horn G, Erlich H. Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harbor symposia on quantitative biology. 1986;51 Pt 1:263-73.
25. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A. 1977;74(12):5463-7.
26. Mattern MY, McLennan DA. Phylogeny and Speciation of Felids. Cladistics. 2000;16(2):232-53.
27. Collier GE, O'Brien SJ. A MOLECULAR PHYLOGENY OF THE FELIDAE: IMMUNOLOGICAL DISTANCE. Evolution. 1985;39(3):473-87.
28. Johnson WE, O’Brien SJ. Phylogenetic reconstruction of the felidae using 16S rRNA and NADH-5 mitochondrial genes. Journal of Molecular Evolution. 1997;44(1):S98-S116.
29. Johnson WE, Eizirik E, Pecon-Slattery J, Murphy WJ, Antunes A, Teeling E, et al. The Late Miocene Radiation of Modern Felidae: A Genetic Assessment. Science. 2006;311(5757):73-7.
30. Kitchener AC, Breitenmoser-Würsten C, Eizirik E, Gentry A, Werdelin L, Wilting A, et al. A revised taxonomy of the Felidae: The final report of the Cat Classification Task Force of the IUCN Cat Specialist Group. Cat News. 2017.
31. Zhang W, Zhang M. Complete mitochondrial genomes reveal phylogeny relationship and evolutionary history of the family Felidae. Genet Mol Res. 2013;12:3256-62.
32. Janczewski DN, Modi WS, Stephens JC, O'Brien SJ. Molecular evolution of mitochondrial 12S RNA and cytochrome b sequences in the pantherine lineage of Felidae. Molecular Biology and Evolution. 1995;12(4):690-707.
33. Yu L, Zhang YP. Phylogenetic studies of pantherine cats (Felidae) based on multiple genes, with novel application of nuclear beta-fibrinogen intron 7 to carnivores. Molecular phylogenetics and evolution. 2005;35(2):483-95.
34. Burger J, Rosendahl W, Loreille O, Hemmer H, Eriksson T, Götherström A, et al. Molecular phylogeny of the extinct cave lion Panthera leo spelaea. Molecular phylogenetics and evolution. 2004;30(3):841-9.
35. Hillis D. Molecular Versus Morphological Approaches To Systematics. Annual Review of Ecology and Systematics. 2003;18:23-42.
36. Jae-Heup K, Eizirik E, O'Brien SJ, Johnson WE. Structure and patterns of sequence variation in the mitochondrial DNA control region of the great cats. Mitochondrion. 2001;1(3):279-92.
37. O'Brien SJ, Yuhki N. Comparative genome organization of the major histocompatibility complex: lessons from the Felidae. Immunological Reviews. 1999;167(1):133-44.
38. Masuda R, Lopez JV, Slattery JP, Yuhki N, O'Brien SJ. Molecular phylogeny of mitochondrial cytochrome b and 12S rRNA sequences in the Felidae: ocelot and domestic cat lineages. Molecular phylogenetics and evolution. 1996;6(3):351-65.
39. Hsu TC, Rearden HH, Luquette GF. Karyological Studies of Nine Species of Felidae. The American Naturalist. 1963;97(895):225-34.
40. O'Brien SJ, Nash WG. Genetic Mapping in Mammals: Chromosome Map of Domestic Cat. Science. 1982;216(4543):257-65.
41. Wurster-Hill DH, Gray CW. Giemsa banding patterns in the chromosomes of twelve species of cats (Felidae). Cytogenetics and cell genetics. 1973;12(6):388-97.
42. Perelman PL, Graphodatsky AS, Serdukova NA, Nie W, Alkalaeva EZ, Fu B, et al. Karyotypic conservatism in the suborder (Order Carnivora). Cytogenetic and genome research. 2005;108(4):348-54.
43. Nie W, Wang J, Su W, Wang D, Tanomtong A, Perelman PL, et al. Chromosomal rearrangements and karyotype evolution in carnivores revealed by chromosome painting. Heredity. 2012;108(1):17-27.
44. Rettenberger G, Klett C, Zechner U, Bruch J, Just W, Vogel W, et al. ZOO-FISH analysis: cat and human karyotypes closely resemble the putative ancestral mammalian karyotype. Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology. 1995;3(8):479-86.
45. Cho YS, Hu L, Hou H, Lee H, Xu J, Kwon S, et al. The tiger genome and comparative analysis with lion and snow leopard genomes. Nature Communications. 2013;4(1):2433.
46. Dobrynin P, Liu S, Tamazian G, Xiong Z, Yurchenko AA, Krasheninnikova K, et al. Genomic legacy of the African cheetah, Acinonyx jubatus. Genome Biology. 2015;16(1):277.
47. Pontius JU, Mullikin JC, Smith DR, Agencourt Sequencing T, Lindblad-Toh K, Gnerre S, et al. Initial sequence and comparative analysis of the cat genome. Genome Res. 2007;17(11):1675-89.
48. Hu Y, Hu S, Wang W, Wu X, Marshall FB, Chen X, et al. Earliest evidence for commensal processes of cat domestication. Proceedings of the National Academy of Sciences. 2014;111(1):116-20.
49. Driscoll CA, Menotti-Raymond M, Roca AL, Hupe K, Johnson WE, Geffen E, et al. The Near Eastern origin of cat domestication. Science (New York, NY). 2007;317(5837):519-23.
50. Daniels MJ, Beaumont MA, Johnson PJ, Balharry D, Macdonald DW, Barratt E. Ecology and genetics of wild-living cats in the north-east of Scotland and the implications for the conservation of the wildcat. Journal of Applied Ecology. 2001:146-61.
51. Oliveira R, Godinho R, Randi E, Alves PC. Hybridization versus conservation: are domestic cats threatening the genetic integrity of wildcats (Felis silvestris silvestris) in Iberian Peninsula? Philosophical Transactions of the Royal Society B: Biological Sciences. 2008;363(1505):2953-61.
52. Oliveira R, Godinho R, Randi E, Ferrand N, Alves PC. Molecular analysis of hybridisation between wild and domestic cats (Felis silvestris) in Portugal: implications for conservation. Conservation Genetics. 2008;9(1):1-11.
53. Nussberger B, Wandeler P, Weber D, Keller LF. Monitoring introgression in European wildcats in the Swiss Jura. Conservation Genetics. 2014;15(5):1219-30.
54. Van Neer W, Linseele V, Friedman R, De Cupere B. More evidence for cat taming at the Predynastic elite cemetery of Hierakonpolis (Upper Egypt). Journal of Archaeological Science. 2014;45:103-11.
55. Vigne J-D, Briois F, Zazzo A, Willcox G, Cucchi T, Thiébault S, et al. First wave of cultivators spread to Cyprus at least 10,600 y ago. Proceedings of the National Academy of Sciences. 2012;109(22):8445-9.
56. Vigne J-D, Carrère I, Briois F, et al. New Evidence from the Pre-Neolithic and Pre-Pottery Neolithic in Cyprus The Early Process of Domestication in the Near East. Current Anthropology. 2011;52(S4):S255-S71.
57. Vigne J-D, Guilaine J, Debue K, Haye L, Gérard P. Early Taming of the Cat in Cyprus. Science. 2004;304(5668):259-.
58. Driscoll CA, Clutton-Brock J, Kitchener AC, O'Brien SJ. The Taming of the cat. Genetic and archaeological findings hint that wildcats became housecats earlier--and in a different place--than previously thought. Sci Am. 2009;300(6):68-75.
59. Trut L, Oskina I, Kharlamova A. Animal evolution during domestication: the domesticated fox as a model. Bioessays. 2009;31(3):349-60.
60. Wang X, Pipes L, Trut LN, Herbeck Y, Vladimirova AV, Gulevich RG, et al. Genomic responses to selection for tame/aggressive behaviors in the silver fox (Vulpes vulpes). Proceedings of the National Academy of Sciences. 2018;115(41):10398-403.
61. Scheu A. Neolithic animal domestication as seen from ancient DNA. Quaternary International. 2018;496:102-7.
62. Ollivier M, Tresset A, Frantz LA, Bréhard S, Bălăşescu A, Mashkour M, et al. Dogs accompanied humans during the Neolithic expansion into Europe. Biology letters. 2018;14(10):20180286.
63. Botigué LR, Song S, Scheu A, Gopalan S, Pendleton AL, Oetjens M, et al. Ancient European dog genomes reveal continuity since the Early Neolithic. Nature communications. 2017;8(1):1-11.
64. Price EO. Behavioral Aspects of Animal Domestication. The Quarterly Review of Biology. 1984;59(1):1-32.
65. Price EO. Behavioral development in animals undergoing domestication. Applied Animal Behaviour Science. 1999;65(3):245-71.
66. Zeder MA. Pathways to animal domestication. Biodiversity in agriculture: domestication, evolution, and sustainability. 2012:227-59.
67. Miao Y, Peng M-S, Wu G-S, Ouyang Y, Yang Z, Yu N, et al. Chicken domestication: an updated perspective based on mitochondrial genomes. Heredity. 2013;110(3):277-82.
68. Redding RW, Rosenberg M. Ancestral pigs: a New (Guinea) model for domestication in the Middle East. MASCA research papers in science and archaeology. 1998;15:65-76.
69. Axelsson E, Ratnakumar A, Arendt M-L, Maqbool K, Webster MT, Perloski M, et al. The genomic signature of dog domestication reveals adaptation to a starch-rich diet. Nature. 2013;495(7441):360-4.
70. Mattucci F, Oliveira R, Lyons LA, Alves PC, Randi E. populations are subdivided into five main biogeographic groups: consequences of Pleistocene climate changes or recent anthropogenic fragmentation? Ecology and Evolution. 2016;6(1):3-22.
71. Ottoni C, Van Neer W, De Cupere B, Daligault J, Guimaraes S, Peters J, et al. The palaeogenetics of cat dispersal in the ancient world. Nature Ecology & Evolution. 2017;1(7):0139.
72. Wiseman R, O'Ryan C, Harley E. Microsatellite analysis reveals that domestic cat (Felis catus) and southern African wild cat (F. lybica) are genetically distinct. Animal Conservation. 2000;3(3):221-8.
73. Beaumont M, Barratt EM, Gottelli D, Kitchener AC, Daniels MJ, Pritchard JK, et al. Genetic diversity and introgression in the Scottish wildcat. Molecular ecology. 2001;10(2):319-36.
74. Lipinski MJ, Froenicke L, Baysac KC, Billings NC, Leutenegger CM, Levy AM, et al. The ascent of cat breeds: genetic evaluations of breeds and worldwide random-bred populations. Genomics. 2008;91(1):12-21.
75. Gandolfi B, Alamri S, Darby WG, Adhikari B, Lattimer JC, Malik R, et al. A dominant TRPV4 variant underlies osteochondrodysplasia in Scottish fold cats. Osteoarthritis and Cartilage. 2016;24(8):1441-50.
76. Buckley RM, Davis BW, Brashear WA, Farias FHG, Kuroki K, Graves T, et al. A new domestic cat genome assembly based on long sequence reads empowers feline genomic medicine and identifies a novel gene for dwarfism. PLoS Genet. 2020;16(10):e1008926-e.
77. Gandolfi B, Alhaddad H, Affolter VK, Brockman J, Haggstrom J, Joslin SEK, et al. To the Root of the Curl: A Signature of a Recent Selective Sweep Identifies a Mutation That Defines the Cat Breed. PLoS One. 2013;8(6):e67105.
78. Murphy W. Genetic Analysis of Feline Interspecies Hybrids Tufts' Canine and Feline Breeding and Genetics Conference, 2015. 2015.
79. Gebhardt RH. The Complete Cat Book. New York: Howell Book House; 1991.
80. Kurushima JD, Lipinski MJ, Gandolfi B, Froenicke L, Grahn JC, Grahn RA, et al. Variation of cats under domestication: genetic assignment of domestic cats to breeds and worldwide random-bred populations. Animal Genetics. 2013;44(3):311-24.
81. TICA. The Persian Breed 2018 [Available from: https://tica.org/breeds/browse-all-breeds?view=article&id=864:persian-breed&catid=79].
82. Malik R, Sparkes A, Bessant C. Brachycephalia--a bastardisation of what makes cats special. Journal of feline medicine and surgery. 2009;11(11):889-90.
83. Schlueter C, Budras KD, Ludewig E, Mayrhofer E, Koenig HE, Walter A, et al. Brachycephalic feline noses: CT and anatomical study of the relationship between head conformation and the nasolacrimal drainage system. Journal of feline medicine and surgery. 2009;11(11):891-900.
84. Bertolini F, Gandolfi B, Kim ES, Haase B, Lyons LA, Rothschild MF. Evidence of selection signatures that shape the Persian cat breed. Mammalian genome : official journal of the International Mammalian Genome Society. 2016;27(3-4):144-55.
85. Curik I, Ferenčaković M, Sölkner J. Inbreeding and runs of homozygosity: A possible solution to an old problem. Livestock Science. 2014;166:26-34.
86. Clark DW, Okada Y, Moore KHS, Mason D, Pirastu N, Gandin I, et al. Associations of autozygosity with a broad range of human phenotypes. Nature Communications. 2019;10(1):4957.
87. Zhao G, Zhang T, Liu Y, Wang Z, Xu L, Zhu B, et al. Genome-Wide Assessment of Runs of Homozygosity in Chinese Wagyu Beef Cattle. Animals. 2020;10(8):1425.
88. Chang J, Jung J, Oh S, Lee S, Kim G, Kim H, et al. Osteochondrodysplasia in three Scottish Fold cats. J Vet Sci. 2007;8(3):307-9.
89. Malik R, Allan G, Howlett C, Thompson D, James G, McWhirter C, et al. Osteochondrodysplasia in Scottish fold cats. Australian veterinary journal. 1999;77(2):85-92.
90. Everaerts W, Nilius B, Owsianik G. The vanilloid transient receptor potential channel TRPV4: from structure to disease. Progress in biophysics and molecular biology. 2010;103(1):2-17.
91. White JPM, Cibelli M, Urban L, Nilius B, McGeown JG, Nagy I. TRPV4: Molecular Conductor of a Diverse Orchestra. Physiological Reviews. 2016;96(3):911-73.
92. Takanosu M, Takanosu T, Suzuki H, Suzuki K. Incomplete dominant osteochondrodysplasia in heterozygous Scottish Fold cats. Journal of Small Animal Practice. 2008;49(4):197-9.
93. ACF. ACF BREEDING POLICY FOR THE SCOTTISH FOLD (LONGHAIR AND SHORTHAIR) AND SCOTTISH LONGHAIR AND SHORTHAIR CATS. 2018.
94. Federation WC. WCF Breed Standards: Scottish Fold. 2010.
95. Alhaddad H, Khan R, Grahn RA, Gandolfi B, Mullikin JC, Cole SA, et al. Extent of Linkage Disequilibrium in the Domestic Cat, Felis silvestris catus, and Its Breeds. PLoS One. 2013;8(1):e53537.
96. Gandolfi B, Alhaddad H, Abdi M, Bach LH, Creighton EK, Davis BW, et al. Applications and efficiencies of the first cat 63K DNA array. Scientific Reports. 2018;8(1):7024.
97. O'Brien SJ, Johnson W, Driscoll C, Pontius J, Pecon-Slattery J, Menotti-Raymond M. State of cat genomics. Trends Genet. 2008;24(6):268-79.
98. Menotti-Raymond M, O’Brien SJ. The domestic cat, Felis catus, as a model of hereditary and infectious disease. Sourcebook of models for biomedical research: Springer; 2008. p. 221-32.
99. Nicholas FW. Online Mendelian Inheritance in Animals (OMIA): a record of advances in animal genetics, freely available on the Internet for 25 years. Animal Genetics. 2021;52(1):3-9.
100. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860-921.
101. Lindblad-Toh K, Garber M, Zuk O, Lin MF, Parker BJ, Washietl S, et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011;478(7370):476-82.
102. Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005;438(7069):803-19.
103. O'Brien SJ, Cevario SJ, Martenson JS, Thompson MA, Nash WG, Chang E, et al. Comparative gene mapping in the domestic cat (Felis catus). The Journal of heredity. 1997;88(5):408-14.
104. Menotti-Raymond M, David VA, Agarwala R, Schäffer AA, Stephens R, O'Brien SJ, et al. Radiation hybrid mapping of 304 novel microsatellites in the domestic cat genome. Cytogenetic and genome research. 2003;102(1-4):272-6.
105. Murphy WJ, Davis B, David VA, Agarwala R, Schäffer AA, Pearks Wilkerson AJ, et al. A 1.5-Mb-resolution radiation hybrid map of the cat genome and comparative analysis with the canine and human genomes. Genomics. 2007;89(2):189-96.
106. Murphy WJ, O'Brien SJ. Designing and optimizing comparative anchor primers for comparative gene mapping and phylogenetic inference. Nature protocols. 2007;2(11):3022-30.
107. Menotti-Raymond M, David VA, Lyons LA, Schäffer AA, Tomlin JF, Hutton MK, et al. A genetic linkage map of microsatellites in the domestic cat (Felis catus). Genomics. 1999;57(1):9-23.
108. Beck TW, Menninger J, Voigt G, Newmann K, Nishigaki Y, Nash WG, et al. Comparative feline genomics: a BAC/PAC contig map of the major histocompatibility complex class II region. Genomics. 2001;71(3):282-95.
109. Beck TW, Menninger J, Murphy WJ, Nash WG, O'Brien S J, Yuhki N. The feline major histocompatibility complex is rearranged by an inversion with a breakpoint in the distal class I region. Immunogenetics. 2005;56(10):702-9.
110. Davis BW, Raudsepp T, Pearks Wilkerson AJ, Agarwala R, Schäffer AA, Houck M, et al. A high-resolution cat radiation hybrid and integrated FISH mapping resource for phylogenomic studies across Felidae. Genomics. 2009;93(4):299-304.
111. Schmidt-Küntzel A, Nelson G, David VA, Schäffer AA, Eizirik E, Roelke ME, et al. A domestic cat X chromosome linkage map and the sex-linked orange locus: mapping of orange, multiple origins and epistasis over nonagouti. Genetics. 2009;181(4):1415-25.
112. Schmidt-Küntzel A, Eizirik E, O'Brien SJ, Menotti-Raymond M. Tyrosinase and tyrosinase related protein 1 alleles specify domestic cat coat color phenotypes of the albino and brown loci. The Journal of heredity. 2005;96(4):289-301.
113. Menotti-Raymond M, David VA, Schäffer AA, Tomlin JF, Eizirik E, Phillip C, et al. An autosomal genetic linkage map of the domestic cat, Felis silvestris catus. Genomics. 2009;93(4):305-13.
114. Li G, Hillier LW, Grahn RA, Zimin AV, David VA, Menotti-Raymond M, et al. A High-Resolution SNP Array-Based Linkage Map Anchors a New Domestic Cat Draft Genome Assembly and Provides Detailed Patterns of Recombination. G3: Genes|Genomes|Genetics. 2016;6(6):1607-16.
115. Murphy WJ, Sun S, Chen Z, Yuhki N, Hirschmann D, Menotti-Raymond M, et al. A radiation hybrid map of the cat genome: implications for comparative mapping. Genome Res. 2000;10(5):691-702.
116. Mullikin JC, Hansen NF, Shen L, Ebling H, Donahue WF, Tao W, et al. Light whole genome sequence for SNP discovery across domestic cat breeds. BMC Genomics. 2010;11(1):406.
117. McCue ME, Bannasch DL, Petersen JL, Gurr J, Bailey E, Binns MM, et al. A high density SNP array for the domestic horse and extant Perissodactyla: utility for association mapping, genetic diversity, and phylogeny studies. PLoS Genet. 2012;8(1):e1002451.
118. Vaysse A, Ratnakumar A, Derrien T, Axelsson E, Pielberg GR, Sigurdsson S, et al. Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping. PLoS Genet. 2011;7(10):e1002316.
119. Kranis A, Gheyas AA, Boschiero C, Turner F, Yu L, Smith S, et al. Development of a high density 600K SNP genotyping array for chicken. BMC genomics. 2013;14(1):1-13.
120. Dash S, Singh A, Bhatia AK, Jayakumar S, Sharma A, Singh S, et al. Evaluation of Bovine High-Density SNP Genotyping Array in Indigenous Dairy Cattle Breeds. Animal Biotechnology. 2018;29(2):129-35.
121. Samaha G, Wade CM, Beatty J, Lyons LA, Fleeman LM, Haase B. Mapping the genetic basis of diabetes mellitus in the Australian Burmese cat (Felis catus). Scientific Reports. 2020;10(1):19194.
122. Alhaddad H, Gandolfi B, Grahn RA, Rah HC, Peterson CB, Maggs DJ, et al. Genome-wide association and linkage analyses localize a progressive retinal atrophy locus in Persian cats. Mammalian genome : official journal of the International Mammalian Genome Society. 2014;25(7-8):354-62.
123. Matukumalli LK, Lawley CT, Schnabel RD, Taylor JF, Allan MF, Heaton MP, et al. Development and characterization of a high density SNP genotyping assay for cattle. PLoS One. 2009;4(4):e5350.
124. Gandolfi B, Gruffydd-Jones TJ, Malik R, Cortes A, Jones BR, Helps CR, et al. First WNK4-hypokalemia animal model identified by genome-wide association in Burmese cats. PLoS One. 2012;7(12):e53173-e.
125. Lyons LA, Erdman CA, Grahn RA, Hamilton MJ, Carter MJ, Helps CR, et al. Aristaless-Like Homeobox protein 1 (ALX1) variant associated with craniofacial structure and frontonasal dysplasia in Burmese cats. Dev Biol. 2016;409(2):451-8.
126. Gandolfi B, Grahn RA, Creighton EK, Williams DC, Dickinson PJ, Sturges BK, et al. COLQ variant associated with Devon Rex and Sphynx feline hereditary myopathy. Anim Genet. 2015;46(6):711-5.
127. Gandolfi B, Alhaddad H, Joslin SE, Khan R, Filler S, Brem G, et al. A splice variant in KRT71 is associated with curly coat phenotype of Selkirk Rex cats. Sci Rep. 2013;3:2000.
128. Golovko L, Lyons LA, Liu H, Sørensen A, Wehnert S, Pedersen NC. Genetic susceptibility to feline infectious peritonitis in Birman cats. Virus Research. 2013;175(1):58-63.
129. Lewin HA, Robinson GE, Kress WJ, Baker WJ, Coddington J, Crandall KA, et al. Earth BioGenome Project: Sequencing life for the future of life. Proceedings of the National Academy of Sciences. 2018;115(17):4325-33.
130. Koepfli K-P, Paten B, Genome KCoS, O'Brien SJ. The Genome 10K Project: a way forward. Annu Rev Anim Biosci. 2015;3:57-111.
131. Menotti-Raymond M, David VA, Schäffer AA, Stephens R, Wells D, Kumar-Singh R, et al. Mutation in CEP290 discovered for cat model of human retinal degeneration. The Journal of heredity. 2007;98(3):211-20.
132. Günther T, Nettelblad C. The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS Genet. 2019;15(7):e1008302.
133. Gibbs RA, Belmont JW, Hardenbol P, Willis TD, Yu F, Yang H, et al. The international HapMap project. 2003.
134. Auton A, Abecasis GR, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68-74.
135. Wang G-D, Larson G, Kidd JM, vonHoldt BM, Ostrander EA, Zhang Y-P. Dog10K: the International Consortium of Canine Genome Sequencing. Natl Sci Rev. 2019;6(4):611-3.
136. Gibbs RA, Taylor JF, Van Tassell CP, Barendse W, Eversole KA, Gill CA, et al. Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds. Science. 2009;324(5926):528-32.
137. Lyons LA, Buckley RM. Direct-to-Consumer Genetic Testing for Domestic Cats. The Veterinary clinics of Small animal practice. 2020;50(5):991-1000.
138. Buckley RM, Lyons LA. Precision/Genomic Medicine for Domestic Cats. Veterinary Clinics of North America: Small Animal Practice. 2020;50(5):983-90.
139. Shearin AL, Ostrander EA. Leading the way: canine models of genomics and disease. Dis Model Mech. 2010;3(1-2):27-34.
140. Hecker N, Hiller M. A genome alignment of 120 mammals highlights ultraconserved element variability and placenta-associated enhancers. GigaScience. 2020;9(1).
141. McLean C, Bejerano G. Dispensability of mammalian DNA. Genome Res. 2008;18(11):1743-51.
142. Graphodatsky AS, Trifonov VA, Stanyon R. The genome diversity and karyotype evolution of mammals. Molecular cytogenetics. 2011;4:22-.
143. O'Brien SJ, Menotti-Raymond M, Murphy WJ, Yuhki N. The Feline Genome Project. Annual Review of Genetics. 2002;36(1):657-86.
144. Plassais J, Kim J, Davis BW, Karyadi DM, Hogan AN, Harris AC, et al. Whole genome sequencing of canids reveals genomic regions under selection and variants influencing morphology. Nature Communications. 2019;10(1):1489.
145. Wang C, Wallerman O, Arendt M-L, Sundström E, Karlsson Å, Nordin J, et al. A new long-read dog assembly uncovers thousands of exons and functional elements missing in the previous reference. bioRxiv. 2020:2020.07.02.185108.
146. Hermsen R, de Ligt J, Spee W, Blokzijl F, Schäfer S, Adami E, et al. Genomic landscape of rat strain and substrain variation. BMC Genomics. 2015;16(1):357.
147. Warr A, Affara N, Aken B, Beiki H, Bickhart DM, Billis K, et al. An improved pig reference genome sequence to enable pig genetics and genomics research. GigaScience. 2020;9(6).
148. Jagannathan V, Gerber V, Rieder S, Tetens J, Thaller G, Drögemüller C, et al. Comprehensive characterization of horse genome variation by whole-genome sequencing of 88 horses. Anim Genet. 2019;50(1):74-7.
149. Xue C, Raveendran M, Harris RA, Fawcett GL, Liu X, White S, et al. The population genomics of rhesus macaques (Macaca mulatta) based on whole-genome sequences. Genome Res. 2016;26(12):1651-62.
150. Hooijmans CR, Leenaars M, Ritskes-Hoitinga M. A gold standard publication checklist to improve the quality of animal studies, to fully integrate the Three Rs, and to make systematic reviews more feasible. Alternatives to laboratory animals : ATLA. 2010;38(2):167-82.
151. Hoenig M. The cat as a model for human obesity and diabetes. J Diabetes Sci Technol. 2012;6(3):525-33.
152. Rowell JL, McCarthy DO, Alvarez CE. Dog models of naturally occurring cancer. Trends Mol Med. 2011;17(7):380-8.
153. Burkholder T, Feliciano CL, VandeWoude S, Baker HJ. Biology and Diseases of Cats. Laboratory Animal Medicine. 2015:555-76.
154. Sams AJ, Boyko AR. Fine-Scale Resolution of Runs of Homozygosity Reveal Patterns of Inbreeding and Substantial Overlap with Recessive Disease Genotypes in Domestic Dogs. G3: Genes|Genomes|Genetics. 2019;9(1):117-23.
155. Marsden CD, Ortega-Del Vecchyo D, O’Brien DP, Taylor JF, Ramirez O, Vilà C, et al. Bottlenecks and selective sweeps during domestication have increased deleterious genetic variation in dogs. Proceedings of the National Academy of Sciences. 2016;113(1):152-7.
156. Omoe K, Endo A. Relationship between the monosomy X phenotype and Y-linked ribosomal protein S4 (Rps4) in several species of mammals: a molecular evolutionary analysis of Rps4 homologs. Genomics. 1996;31(1):44-50.
157. Szczerbal I, Nizanski W, Dzimira S, Nowacka-Woszuk J, Ochota M, Switonski M. X monosomy in a virilized female cat. Reproduction in domestic animals = Zuchthygiene. 2015;50(2):344-8.
158. Lyons LA. Genetic testing in domestic cats. Mol Cell Probes. 2012;26(6):224-30.
159. Centerwall WR, Benirschke K. An animal model for the XXY Klinefelter's syndrome in man: tortoiseshell and calico male cats. American journal of veterinary research. 1975;36(9):1275-80.
160. Pedersen AS, Berg LC, Almstrup K, Thomsen PD. A tortoiseshell male cat: chromosome analysis and histologic examination of the testis. Cytogenetic and genome research. 2014;142(2):107-11.
161. Barr ML, Bertram EG. A Morphological Distinction between Neurones of the Male and Female, and the Behaviour of the Nucleolar Satellite during Accelerated Nucleoprotein Synthesis. Nature. 1949;163(4148):676-7.
162. Lyon MF. Sex chromatin and gene action in the mammalian X-chromosome. Am J Hum Genet. 1962;14(2):135-48.
163. Szczerbal I, Stachowiak M, Dzimira S, Sliwa K, Switonski M. The first case of 38,XX (SRY-positive) disorder of sex development in a cat. Molecular Cytogenetics. 2015;8(1):22.
164. Schlafer DH, Valentine B, Fahnestock G, Froenicke L, Grahn RA, Lyons LA, et al. A case of SRY-positive 38,XY true hermaphroditism (XY sex reversal) in a cat. Vet Pathol. 2011;48(4):817-22.
165. Szczerbal I, Krzeminska P, Dzimira S, Tamminen TM, Saari S, Nizanski W, et al. Disorders of sex development in cats with different complements of sex chromosomes. Reproduction in domestic animals = Zuchthygiene. 2018;53(6):1317-22.
166. De Lorenzi L, Banco B, Previderè C, Bonacina S, Romagnoli S, Grieco V, et al. Testicular XX (SRY-Negative) Disorder of Sex Development in Cat. Sexual development : genetics, molecular biology, evolution, endocrinology, embryology, and pathology of sex determination and differentiation. 2017;11(4):210-6.
167. Szczerbal I, Stachowiak M, Nowacka-Woszuk J, Dzimira S, Szczepanska K, Switonski M. Disorder of sex development in a cat with chromosome mosaicism 37,X/38,X,r(Y). Reproduction in domestic animals = Zuchthygiene. 2017;52(5):914-7.
168. Cork LC, Munnell JF, Lorenz MD, Murphy JV, Baker HJ, Rattazzi MC. GM2 ganglioside lysosomal storage disease in cats with beta-hexosaminidase deficiency. Science. 1977;196(4293):1014-7.
28 169. Martin DR, Cox NR, Morrison NE, Kennamer DM, Peck SL, Dodson AN, et al. Mutation of the GM2 activator protein in a feline model of GM2 gangliosidosis. Acta Neuropathologica. 2005;110(5):443-50. 170. Martin DR, Krum BK, Varadarajan GS, Hathcock TL, Smith BF, Baker HJ. An inversion of 25 base pairs causes feline GM2 gangliosidosis variant 0. Experimental Neurology. 2004;187(1):30-7. 171. Menotti-Raymond M, Deckman KH, David V, Myrkalo J, O'Brien SJ, Narfström K. Mutation discovered in a feline model of human congenital retinal blinding disease. Invest Ophthalmol Vis Sci. 2010;51(6):2852-9. 172. Lyons LA, Reilly CM, Ofri R, Maggs DJ, Gandolfi B, Alhaddad H, et al. Leber’s Congenital Amaurosis and Retinitis Pigmentosa mutations in the domestic cat. Invest Ophthalmol Vis Sci. 2015;56(7):2871-. 173. Narfström K, Holland Deckman K, Menotti-Raymond M. The Domestic Cat as a Large Animal Model for Characterization of Disease and Therapeutic Intervention in Hereditary Retinal Blindness. Journal of Ophthalmology. 2011;2011:906943. 174. McCurdy VJ, Rockwell HE, Arthur JR, Bradbury AM, Johnson AK, Randle AN, et al. Widespread correction of central nervous system disease after intracranial gene therapy in a feline model of Sandhoff disease. Gene therapy. 2015;22(2):181-9. 175. Rockwell HE, McCurdy VJ, Eaton SC, Wilson DU, Johnson AK, Randle AN, et al. AAV-mediated gene delivery in a feline model of Sandhoff disease corrects lysosomal storage in the central nervous system. ASN Neuro. 2015;7(2):1759091415569908. 176. Buckley RM, Grahn RA, Gandolfi B, Herrick JR, Kittleson MD, Bateman HL, et al. Assisted reproduction mediated resurrection of a feline model for Chediak-Higashi syndrome caused by a large duplication in LYST. Scientific Reports. 2020;10(1):64. 177. Grarup N, Moltke I, Albrechtsen A, Hansen T. Diabetes in Population Isolates: Lessons from Greenland. Rev Diabet Stud. 2015;12(3-4):320-9. 178. Arcos-Burgos M, Muenke M. Genetics of population isolates. 
Clinical Genetics. 2002;61(4):233-47. 179. Kääriäinen H, Muilu J, Perola M, Kristiansson K. Genetics in an isolated population like Finland: a different basis for genomic medicine? Journal of Community Genetics. 2017;8(4):319-26. 180. Keijser SFA, Meijndert LE, Fieten H, Carrière BJ, van Steenbeek FG, Leegwater PAJ, et al. Disease burden in four populations of dog and cat breeds compared to mixed-breed dogs and cats. Preventive veterinary medicine. 2017;140:38-44. 181. Hatzikotoulas K, Gilly A, Zeggini E. Using population isolates in genetic association studies. Brief Funct Genomics. 2014;13(5):371-7. 182. Heutink P, Oostra BA. Gene finding in genetically isolated populations. Human Molecular Genetics. 2002;11(20):2507-15. 183. McDonald SP, Hoy WE, Maguire GP, Duarte NL, Wilcken DE, Wang XL. The p53Pro72Arg polymorphism is associated with albuminuria among aboriginal Australians. Journal of the American Society of Nephrology. 2002;13(3):677-83. 184. Thomson RJ, McMorran B, Hoy W, Jose M, Whittock L, Thornton T, et al. New Genetic Loci Associated With Chronic Kidney Disease in an Indigenous Australian Population. Frontiers in Genetics. 2019;10(330). 185. Andersen MK, Pedersen C-ET, Moltke I, Hansen T, Albrechtsen A, Grarup N. Genetics of Type 2 Diabetes: the Power of Isolated Populations. Current Diabetes Reports. 2016;16(7):65. 186. Kristiansson K, Naukkarinen J, Peltonen L. Isolated populations and complex disease gene identification. Genome biology. 2008;9(8):109-.

29 187. Chen J, Sun M, Adeyemo A, Pirie F, Carstensen T, Pomilla C, et al. Genome-wide association study of type 2 diabetes in Africa. Diabetologia. 2019;62(7):1204-11. 188. Sim X, Ong RT-H, Suo C, Tay W-T, Liu J, Ng DP-K, et al. Transferability of Type 2 Diabetes Implicated Loci in Multi-Ethnic Cohorts from Southeast Asia. PLoS Genet. 2011;7(4):e1001363. 189. Grarup N, Moltke I, Andersen MK, Bjerregaard P, Larsen CVL, Dahl-Petersen IK, et al. Identification of novel high-impact recessively inherited type 2 diabetes risk variants in the Greenlandic population. Diabetologia. 2018;61(9):2005-15. 190. Busfield F, Duffy DL, Kesting JB, Walker SM, Lovelock PK, Good D, et al. A genomewide search for type 2 diabetes-susceptibility genes in indigenous Australians. Am J Hum Genet. 2002;70(2):349-57. 191. Meurs KM, Norgard MM, Ederer MM, Hendrix KP, Kittleson MD. A substitution mutation in the myosin binding protein C gene in ragdoll hypertrophic cardiomyopathy. Genomics. 2007;90(2):261-4. 192. Ontiveros ES, Ueda Y, Harris SP, Stern JA. Precision medicine validation: identifying the MYBPC3 A31P variant with whole-genome sequencing in two Maine Coon cats with hypertrophic cardiomyopathy. Journal of feline medicine and surgery. 2019;21(12):1086-93. 193. McNamara JW, Schuckman M, Becker RC, Sadayappan S. A Novel Homozygous Intronic Variant in TNNT2 Associates With Feline Cardiomyopathy. Frontiers in Physiology. 2020;11(1500). 194. Ripple WJ, Estes JA, Beschta RL, Wilmers CC, Ritchie EG, Hebblewhite M, et al. Status and ecological effects of the world’s largest carnivores. Science. 2014;343(6167). 195. Ritchie EG, Johnson CN. Predator interactions, mesopredator release and biodiversity conservation. Ecology letters. 2009;12(9):982-98. 196. Colman NJ, Gordon CE, Crowther MS, Letnic M. Lethal control of an apex predator has unintended cascading effects on forest mammal assemblages. Proceedings of the royal society B: biological sciences. 2014;281(1782):20133094. 197. 
Vijay V, Pimm SL, Jenkins CN, Smith SJ. The impacts of oil palm on recent deforestation and biodiversity loss. PLoS One. 2016;11(7):e0159668. 198. Boron V, Deere NJ, Xofis P, Link A, Quiñones-Guerrero A, Payan E, et al. Richness, diversity, and factors influencing occupancy of mammal communities across human-modified landscapes in Colombia. Biological Conservation. 2019;232:108-16. 199. Sollmann R, Hunter LT, Slotow R, Macdonald DW, Henschel P. Effects of human land-use on Africa's only forest-dependent felid: The African golden cat Caracal aurata. Biological Conservation. 2016;199:1-9. 200. Cruz P, Iezzi ME, De Angelo C, Varela D, Di Bitetti MS, Paviolo A. Effects of human impacts on habitat use, activity patterns and ecological relationships among medium and small felids of the Atlantic Forest. PLoS One. 2018;13(8):e0200806. 201. Krafte Holland K, Larson LR, Powell RB. Characterizing conflict between humans and big cats Panthera spp: A systematic review of research trends and management opportunities. PLoS One. 2018;13(9):e0203877. 202. IUCN. The IUCN Red List of Threatened Species Version 2020-3 2020 [Available from: https://www.iucnredlist.org. 203. Carbone C, Pettorelli N, Stephens PA. The bigger they come, the harder they fall: body size and prey abundance influence predator–prey ratios. Biology letters. 2011;7(2):312-5. 204. Johnston B. The endangerment and conservation of cheetahs (Acinonyx jubatus), leopards (Panthera pardus), lions (Panthera leo), and tigers (Panthera tigris) in Africa and Asia.

30 205. Sorci G, Cornet S, Faivre B. Immunity and the emergence of virulent pathogens. Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases. 2013;16:441-6. 206. Alexander JS, Gopalaswamy AM, Shi K, Hughes J, Riordan P. Patterns of Snow Leopard Site Use in an Increasingly Human-Dominated Landscape. PLoS One. 2016;11(5):e0155309. 207. van de Kerk M, Onorato DP, Hostetler JA, Bolker BM, Oli MK. Dynamics, Persistence, and Genetic Management of the Endangered Florida Panther Population. Wildlife Monographs. 2019;203(1):3-35. 208. Lubis MI, Pusparini W, Prabowo SA, Marthy W, Tarmizi, Andayani N, et al. Unraveling the complexity of human–tiger conflicts in the Leuser Ecosystem, Sumatra. Animal Conservation. 2020;23(6):741-9. 209. Inskip C, Zimmermann A. Human-felid conflict: a review of patterns and priorities worldwide. Oryx. 2009;43(1):18-34. 210. IUCN. Saving the Sumatran tiger 2020 [Available from: https://www.iucn.org/theme/species/our- work/action-ground/integrated-tiger-habitat-conservation-programme/saving-sumatran-tiger. 211. WWF. Snow leopard | WWF China 2021 [Available from: https://en.wwfchina.org/en/what_we_do/species/fs/snow_leopard/. 212. Belbachir F, Pettorelli N, Wacher T, Belbachir-Bazi A, Durant SM. Monitoring Rarity: The Saharan Cheetah as a Flagship Species for a Threatened Ecosystem. PLoS One. 2015;10(1):e0115136. 213. Akçakaya HR, Bennett EL, Brooks TM, Grace MK, Heath A, Hedges S, et al. Quantifying species recovery and conservation success to develop an IUCN Green List of Species. Conservation Biology. 2018;32(5):1128-38. 214. Betts J, Young RP, Hilton-Taylor C, Hoffmann M, Rodríguez JP, Stuart SN, et al. A framework for evaluating the impact of the IUCN Red List of threatened species. Conservation Biology. 2020;34(3):632-43. 215. Willi Y, Van Buskirk J, Schmid B, Fischer M. Genetic isolation of fragmented populations is exacerbated by drift and selection. 
Journal of Evolutionary Biology. 2007;20(2):534-42. 216. Grueber CE, Wallis GP, Jamieson IG. Heterozygosity–fitness correlations and their relevance to studies on inbreeding depression in threatened species. Molecular ecology. 2008;17(18):3978-84. 217. Hedrick PW, Garcia-Dorado A. Understanding inbreeding depression, purging, and genetic rescue. Trends in ecology & evolution. 2016;31(12):940-52. 218. Ralls K, Ballou JD, Templeton A. Estimates of Lethal Equivalents and the Cost of Inbreeding in Mammals. Conservation Biology. 1988;2(2):185-93. 219. Lacy RC. Importance of genetic variation to the viability of mammalian populations. Journal of Mammalogy. 1997;78(2):320-35. 220. Keller LF, Waller DM. Inbreeding effects in wild populations. Trends in ecology & evolution. 2002;17(5):230-41. 221. Spielman D, Brook BW, Frankham R. Most species are not driven to extinction before genetic factors impact them. Proc Natl Acad Sci U S A. 2004;101(42):15261-4. 222. Todesco M, Pascual MA, Owens GL, Ostevik KL, Moyers BT, Hübner S, et al. Hybridization and extinction. Evol Appl. 2016;9(7):892-908. 223. Reed DH, Lowe EH, Briscoe DA, Frankham R. Inbreeding and extinction: effects of rate of inbreeding. Conservation Genetics. 2003;4(3):405-10. 224. Menotti-Raymond M, O'Brien SJ. Dating the genetic bottleneck of the African cheetah. Proceedings of the National Academy of Sciences. 1993;90(8):3172-6.

31 225. Alasaad S, Soriguer RC, Chelomina G, Sushitsky YP, Fickel J. Siberian tiger's recent population bottleneck in the Russian Far East revealed by microsatellite markers. Mammalian Biology. 2011;76(6):722-6. 226. Culver M, Hedrick PW, Murphy K, O'Brien S, Hornocker MG. Estimation of the bottleneck size in Florida panthers. Animal Conservation. 2008;11(2):104-10. 227. O’Brien SJ, Johnson WE, Driscoll CA, Dobrynin P, Marker L. Conservation Genetics of the Cheetah: Lessons Learned and New Opportunities. Journal of Heredity. 2017;108(6):671-7. 228. Henry P, Miquelle D, Sugimoto T, McCullough DR, Caccone A, Russello MA. In situ population structure and ex situ representation of the endangered Amur tiger. Molecular ecology. 2009;18(15):3173-84. 229. Russello MA, Gladyshev E, Miquelle D, Caccone A. Potential genetic consequences of a recent bottleneck in the Amur tiger of. Conservation Genetics. 2004;5(5):707-13. 230. Roelke ME, Martenson JS, O'Brien SJ. The consequences of demographic reduction and genetic depletion in the endangered Florida panther. Current Biology. 1993;3(6):340-50. 231. Kohler IV, Preston SH, Lackey LB. Comparative mortality levels among selected species of captive animals. Demographic Research. 2006;15:413-34. 232. Koester DC, Freeman EW, Brown JL, Wildt DE, Terrell KA, Franklin AD, et al. Motile Sperm Output by Male Cheetahs (Acinonyx jubatus) Managed Ex Situ Is Influenced by Public Exposure and Number of Care- Givers. PLoS One. 2015;10(9):e0135847-e. 233. Morato R, Conforti V, Azevedo F, Jacomo A, Silveira L, Sana D, et al. Comparative analyses of semen and endocrine characteristics of free-living versus captive jaguars (Panthera onca). Reproduction. 2001;122(5):745-51. 234. Ballou J, Ralls K. Inbreeding and juvenile mortality in small populations of ungulates: A detailed analysis. Biological Conservation. 1982;24(4):239-72. 235. Laikre L, Ryman N. Inbreeding Depression in a Captive (Canis lupus) Population. Conservation Biology. 
1991;5(1):33-40. 236. Christie MR, Marine ML, French RA, Blouin MS. Genetic adaptation to captivity can occur in a single generation. Proceedings of the National Academy of Sciences. 2012;109(1):238-42. 237. Lynch M, O'Hely M. Captive breeding and the genetic fitness of natural populations. Conservation Genetics. 2001;2(4):363-78. 238. O’Brien SJ, Wildt DE, Goldman D, Merril CR, Bush M. The cheetah is depauperate in genetic variation. Science. 1983;221(4609):459-62. 239. Yuhki N, O'Brien SJ. DNA variation of the mammalian major histocompatibility complex reflects genomic diversity and population history. Proceedings of the National Academy of Sciences. 1990;87(2):836- 40. 240. Dubach J, Briggs M, White P, Ament B, Patterson B. Genetic perspectives on “lion conservation units” in Eastern and Southern Africa. Conservation Genetics. 2013;14(4):741-55. 241. Dures SG, Carbone C, Loveridge AJ, Maude G, Midlane N, Aschenborn O, et al. A century of decline: Loss of genetic diversity in a southern African lion-conservation stronghold. Diversity and Distributions. 2019;25(6):870-9. 242. Miotto R, Cervini M, Figueiredo M, Begotti R, Galetti P. Genetic diversity and population structure of pumas (Puma concolor) in southeastern Brazil: implications for conservation in a human-dominated landscape. Conservation Genetics. 2011;12(6):1447-55. 243. Lindburg DG, Durrant BS, Millard SE, Oosterhuis JE. Fertility assessment of cheetah males with poor quality semen. Zoo Biology. 1993;12(1):97-103.

32 244. Durrant BS, Millard SE, Zimmerman DM, Lindburg DG. Lifetime semen production in a cheetah (Acinonyx jubatus). Zoo Biology. 2001;20(5):359-66. 245. Ouborg NJ, Pertoldi C, Loeschcke V, Bijlsma R, Hedrick PW. Conservation genetics in transition to conservation genomics. Trends in Genetics. 2010;26(4):177-87. 246. Allendorf FW, Hohenlohe PA, Luikart G. Genomics and the future of conservation genetics. Nature Reviews Genetics. 2010;11(10):697-709. 247. Brandies P, Peel E, Hogg CJ, Belov K. The Value of Reference Genomes in the Conservation of Threatened Species. Genes (Basel). 2019;10(11):846. 248. Taylor H, Dussex N, van Heezik Y. Bridging the conservation genetics gap by identifying barriers to implementation for conservation practitioners. Global Ecology and Conservation. 2017;10:231-42. 249. Mardis E, McPherson J, Martienssen R, Wilson RK, McCombie WR. What is Finished, and Why Does it Matter. Genome Res. 2002;12(5):669-71. 250. Kim S, Cho YS, Kim H-M, Chung O, Kim H, Jho S, et al. Comparison of carnivore, omnivore, and herbivore mammalian genomes with a new leopard assembly. Genome Biology. 2016;17(1):211. 251. Mittal P, Jaiswal SK, Vijay N, Saxena R, Sharma VK. Comparative analysis of corrected tiger genome provides clues to its neuronal evolution. Scientific Reports. 2019;9(1):18459. 252. Renschler G, Richard G, Valsecchi CIK, Toscano S, Arrigoni L, Ramírez F, et al. Hi-C guided assemblies reveal conserved regulatory topologies on X and autosomes despite extensive genome shuffling. Genes & Development. 2019;33(21-22):1591-612. 253. Armstrong EE, Taylor RW, Miller DE, Kaelin CB, Barsh GS, Hadly EA, et al. Long live the king: chromosome-level assembly of the lion ( Panthera leo ) using linked-read, Hi-C, and long-read data. BMC Biology. 2020;18(1):3.

Chapter 2 The domestic cat as a novel genetic model of complex disease

2.1 Synopsis- The Burmese cat as a genetic model of type 2 diabetes in humans

Since their domestication, cats have undergone a relatively short period of selective breeding for aesthetic traits that are typically under a Mendelian mode of inheritance. These pedigreed populations display a simplified genetic landscape that makes them suitable for mapping the genetic loci underlying breed-specific traits and diseases. Breed-based clustering of certain diseases in cats mirrors the familial and ethnic population clustering of many complex diseases in humans. This population-wide genetic signature is useful when studying common and complex diseases that have a variable phenotypic presentation, are globally distributed and have a large environmental component. In this chapter, I present the Burmese cat breed as a naturally occurring model for studying the genetic basis of type 2 diabetes (T2D) in humans. Feline diabetes mellitus (FDM) is a common condition in cats that shares more clinical and pathological features with T2D than does diabetes in any other species. The Burmese breed has an elevated risk of developing FDM compared with other breeds. In section 2.1, I present a published review that discusses the genetic profile of the Burmese breed and the clinical and pathological similarities between FDM and T2D that make the Burmese cat a unique, naturally occurring model of T2D. This manuscript was published in the journal Animal Genetics.

REVIEW doi: 10.1111/age.12799
The Burmese cat as a genetic model of type 2 diabetes in humans

G. Samaha*, J. Beatty†, C. M. Wade‡ and B. Haase*
*Sydney School of Veterinary Science, University of Sydney, Sydney, NSW, 2006, Australia.
†Sydney School of Veterinary Science, Valentine Charlton Cat Centre, University of Sydney, Sydney, NSW, 2006, Australia.
‡School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, 2006, Australia.

Summary
The recent extension of genetic tools to the domestic cat, together with the serendipitous consequences of selective breeding, has been essential to the study of the genetic diseases that affect cats. Cats are increasingly presented for veterinary surveillance and share many heritable diseases with humans, allowing them to serve as natural models of these conditions. Feline diabetes mellitus is a common condition in domestic cats that bears close pathological and clinical resemblance to type 2 diabetes in humans, including pancreatic β-cell dysfunction and peripheral insulin resistance. In Australia, New Zealand and Europe, diabetes mellitus is almost four times more common in cats of the Burmese breed than in other breeds. This geographically based breed predisposition parallels familial and population clustering of type 2 diabetes in humans. As a genetically isolated population, the Australian Burmese breed provides a spontaneous, naturally occurring genetic model of type 2 diabetes. Genetically isolated populations typically exhibit extended linkage disequilibrium and increased opportunity for deleterious variants to reach high frequencies over many generations due to genetic drift. Studying complex diseases in such populations allows for tighter control of confounding factors including environmental heterogeneity, allelic frequencies and population stratification. The homogeneous genetic background of Australian Burmese cats may provide a unique opportunity to either refine genetic signals previously associated with type 2 diabetes or identify new risk factors for this disease.

Keywords animal model, diabetes mellitus, feline, genetics, type 2 diabetes

Address for correspondence
G. Samaha, The University of Sydney, RMC Gunn Building, Regimental Dr, Sydney, NSW 2006, Australia.
E-mail: [email protected]
Accepted for publication 11 March 2019

© 2019 Stichting International Foundation for Animal Genetics, 50, 319–325

Introduction

The global prevalence of type 2 diabetes (T2D) in adult humans has nearly doubled in the past 30 years and is expected to rise further over succeeding decades (World Health Organisation 2001). T2D pathogenesis is characterised by hyperglycaemia caused by dysfunctional insulin secretion and action. Several processes are involved in the development of T2D, specifically targeting insulin-producing pancreatic β-cells with consequential insulin deficiency and insulin resistance in peripheral tissues (ADA 2018). Determining the aetiology of T2D has been difficult, as it is a complex endocrine disorder that develops slowly over time and affects almost all body systems. The growing population and economic costs associated with this disorder and its affiliated complications are exacerbated by an incomplete understanding of the metabolic consequences of T2D pathogenesis.

Large-scale, open-population genome-wide association studies (GWASes) have been integral in explaining the genetic complexity of T2D. These association studies have consistently identified loci associated with altered glucose metabolism and pancreatic β-cell decompensation (Mahajan et al. ; Xue et al. 2018). However, many of these signals exhibit heterogeneous expression across different populations. More recently, GWASes extended to genetically isolated populations, disproportionately affected by T2D, have proven useful in simplifying the identification of robust risk variants with association studies (Hanson et al. 2017; Moltke et al. 1999). These studies have provided insights into population-dependent T2D presentation, improving the detection of founder alleles present at a higher frequency in specific populations, thereby informing the broader underlying biological mechanisms and genetic risks faced by specific populations.

Feline diabetes mellitus (FDM; OMIA 000277-9685) is a common endocrinopathy of domestic cats, with Burmese cats in Australia, New Zealand and Europe having an increased risk of becoming diabetic compared with other breeds (Rand et al. 2013; Wade et al. 2014; Lederer et al. 2010; Öhlund et al. 1996). Although one in 180 cats from the general cat population is affected by FDM, one in 50 Burmese cats develop the condition. Compared with the general cat population, Burmese cats are at three to four times greater risk than are non-Burmese cats (McCann et al. 2005). Like genetically isolated human populations, the domestic cat (Felis silvestris catus), specifically Australian representatives of the Burmese breed, may provide a clinically relevant genetic population for the application of T2D GWASes. Companion animals are increasingly used to model analogous human diseases and have been the subjects of various GWASes (Karlsson et al. 2006; Gandolfi et al. 2016). This is due to their homogeneous genetic architecture resulting from selective inbreeding over many generations.

Simplifying the genetic landscape of T2D

Although T2D is a common disease, its phenotypic presentation covers a spectrum of pathophysiological processes across patient populations (Ahlqvist et al. 2018; Udler et al. 2014). Many GWAS loci associated with T2D have only a modest effect size and display allelic heterogeneity across different populations (Kuo et al. 2010; Liu et al. 2008) and varied allelic effects between these populations (Guo et al. 2018; Cho et al. 2012). Despite the successful identification of numerous T2D-associated risk loci using GWASes, robustly implicating genes and causal variants associated with these loci in the pathophysiology of this disease has proven difficult, limiting opportunities for clinical translation. GWASes in large, outbred East Asian and European populations have dominated the search for T2D risk loci. In outbred populations, which typically exhibit greater haplotype diversity for a given locus, causative variants of complex traits are likely to be diluted among several haplotypic backgrounds, impeding their detection. These GWASes have identified approximately 100 T2D susceptibility loci, over 80 of which are common (minor allele frequency ≥5%; Morris et al. 2009; Mahajan et al. ; Fuchsberger et al. 2016). These loci overwhelmingly reside in non-coding or regulatory regions of the genome and are estimated to explain less than 15% of T2D familial aggregation (Morris et al. 2009). These association signals tend to span great distances of the genome and include multiple genes through which T2D phenotypes may be mediated. Except for a few functionally validated genes, these loci are commonly named according to their closest, most compelling candidate gene. As such, there has been limited progress in defining the genetic vulnerabilities that underlie T2D.

Although T2D is a common disease, differential phenotypic presentation among patients of different ancestry is often observed. For example, T2D in people of East Asian descent is often characterised by an earlier age of onset and lower body mass index than in Europeans (Yoon et al. 2006). Transferability studies have shown allelic heterogeneity at several of these common T2D loci between different ethnic populations that may underlie this phenotypic variation (Kuo et al. 2010; Liu et al. 2008). Genetically isolated populations are characterised by reduced phenotypic, environmental and genetic heterogeneity. Their small effective population size results in high levels of linkage disequilibrium (LD) and homozygosity, increasing the statistical power to detect disease variants due to genetic drift (Service et al. 1997). Leveraging the reduced genetic variation of isolated populations with unique disease prevalence can refine association signals that lead to identifying susceptibility loci, founder alleles and novel disease pathways (Hatzikotoulas et al. 2014). GWASes across population isolates, including Native Americans, Greenlandic Inuit, Icelanders and Finns, have contributed to the catalogue of T2D risk loci, including common variants that have been widely replicated in European cohorts (Scott et al. 2005; Moltke et al. 1999), novel variants (Hanson et al. 2017; Steinthorsdottir et al. 2014) and causal variants with therapeutic potential (Flannick et al. 2014). As more T2D-affected populations are subjected to GWASes, it has become apparent that population-specific allele frequency profiles are responsible for differences in T2D risk factors and phenotypes between populations (Ma & Chan 2015; Willet & Haase 2004; Fumagalli et al. 2015). Although these novel discoveries cannot always be realised in other populations, their value ultimately lies in their contribution to exposing the stratified aetiopathogenesis of T2D.

Genetic studies in companion animals offer similar statistical advantages to those performed in genetically isolated human populations. Selective breeding of domestic cats is a relatively new practice, occurring only in the past 150 years. Founder effects associated with selective breeding have resulted in genetically distinct breeds characterised by low within-population variance and heterozygosity and extensive LD, making them less genetically diverse than their random-bred counterparts (Lipinski et al. 2016; Gandolfi et al. 2018). Domesticated breeds that are dispersed across a geographical range have often undergone more than one bottleneck event. This likely confers an increased risk of complex diseases caused by a combination of rare and common alleles, and as was shown in the domestic dog (Karlsson et al. 2006), such structures provide additional opportunities for exposing risk variants.

Genomic resources for health studies in the domestic cat

In recent years, the domestic cat has increasingly been the subject of investigation into the genetic basis of disease susceptibility. Forty-two pedigreed breeds are now recognised globally (Cat Fanciers’ Association 2018; The International Cat Association 2015; 1999). Breeds are defined by standardised phenotypic characteristics including coat colour, pattern, texture and length. The simplified genetic landscape of domestic cat breeds has resulted in hundreds of breed-specific traits and pathologies that offer novel opportunities to explore the genetic basis of simple and complex traits (https://omia.org/). Dozens of genetic variants have been directly implicated in phenotypic and inherited diseases in the domestic cat. Many of these diseases segregate in a breed- and population-specific manner, and some benefit from commercial genetic tests (Lyons ).

The understanding of the genetic basis of these diseases is a result of a range of genomic resources available for the domestic cat. These resources include annotated reference genomes (Pontius et al. 1990; Tamazian et al. 2009), a dense single nucleotide polymorphism (SNP) map and manifest (Mullikin et al. 2012; Willet & Haase 2018), the Illumina Infinium iSelect 63K DNA SNP array, whole genome sequencing (Montague et al. 2014; Li et al. 2013), radiation hybrid and linkage maps (Menotti-Raymond et al. 2007; Davis et al. 2009; Li et al. 2013) and the 99 Lives Cat Genome Sequencing Initiative (http://felinegenetics.missouri.edu/99lives). These resources have been used to map and understand heritable characteristics, including genetic variants responsible for coat phenotypes (OMIA 000202-9685, 001199-9685), morphology (OMIA 000319-9685) and heritable diseases (OMIA 001949-9685).

The Australian Burmese cat: a genetic model of T2D

In Australia and Europe, FDM is observed at a significantly higher rate in the Burmese breed compared with the general cat population (Fig. 1). Approximately 2% of the Burmese population is affected by FDM, and that number rises to 10% in cats over eight years of age (Rand et al. 2013; Lederer et al. 2010). This heightened risk among Burmese has not been observed in North American breeding populations, where the Burmese breed is genetically distinct (Panciera et al. ; Prahl et al. 2011; Lipinski et al. 2016). Burmese cats exhibit signs of low genetic diversity associated with inbreeding, having the second lowest level of heterozygosity, highest inbreeding coefficient and largest extent of LD of any cat breed (Lipinski et al. 2016; Alhaddad et al. 2013). Given these genetic features, the Burmese breed is well placed as a population isolate in identifying and refining genetic association signals that lead to identifying susceptibility loci and novel disease pathways. Several hereditary diseases have segregated in the breed, including brachycephaly (OMIA 001551-9685), gangliosidosis type II (OMIA 001462-9685), hypokalaemic periodic paralysis (OMIA 001759-9685) and an orofacial pain syndrome (Heath 2014). Australian Burmese cats are also reported to have inherited derangements in lipid metabolism that may be important in the development of FDM (Crispin 2002; Kluger et al. 2017).

Pathological features of T2D and FDM

Naturally occurring diabetes in cats presents significant parallels to the understanding of T2D in humans. FDM and T2D share key features involved in the complex interplay between genetic, metabolic and acquired factors that define the diabetic state. Rodents are the preferred animal models for exploring the genetic basis of T2D; however, these are typically induced, single-gene models that are best suited to addressing the specific roles of genes of interest. Naturally occurring animal models provide the opportunity to study the aetiopathogenesis of this disease in its natural context.

Current classification, diagnosis and management of FDM follow those of T2D in humans and have previously been reviewed by others (Rand & Marshall 2008; Caney 2013; Rand 2007; Gottlieb & Rand 1998). The presentation of diabetic cats to veterinary clinics has been increasing in recent years, with approximately one in 200 cats diagnosed as diabetic in the UK (McCann et al. 2005). The similarities between FDM and T2D clinical and molecular phenotypes have been extensively reviewed elsewhere (Henson & O’Brien 2001; Osto et al. 2013; Gilor et al. 2016). Several pathogenic processes can lead to FDM in genetically vulnerable individuals, accelerated by acquired risk factors of obesity, excess caloric intake and physical inactivity (Kahn et al. 2011; Slingerland et al. 2012; Öhlund et al. 2015).

Glucotoxicity, islet amyloidosis and β-cell dysfunction

At the cellular level, pancreatic islets of FDM and T2D patients are characterised by reduced β-cell mass and islet amyloidosis (IA). Although the precise reasons for reduced β-cell mass remain unknown, some factors responsible for this decline include glucotoxicity, ageing and genetic abnormalities. Human and feline β-cells exposed to hyperglycaemic conditions display strongly impaired function, resulting in exhaustion, decreased mass and decreased insulin gene expression (Maedler et al. 2013; Zini et al. 2009). Glucotoxicity inhibits the activity of key β-cell transcription factors, silencing β-cell specific genes and resulting in the dedifferentiation of β-cells (Gao et al. 2014; Nishimura et al. 2010; Gutierrez et al. 2007). β-cell mass is reduced up to 65% at diagnosis in T2D patients, suggesting that β-cell loss is a prerequisite factor for initiating diabetic pathogenesis (Yoon et al. 2003; Rahier et al. 2007; Chen et al. 2017). Hyperglycaemia induces systemic inflammation similarly in humans and cats; however, the local islet inflammatory response observed in hyperglycaemic humans has not been shown to the same extent in cats, suggesting that β-cell dysfunction in cats may occur through mechanisms that do not involve a local inflammatory response (Zini et al. 2009).

© 2019 Stichting International Foundation for Animal Genetics, 50, 319–325


[Figure 1 chart: bar plot of prevalence (%) of FDM cases in the general population and in Burmese cats across five studies: UK (McCann et al. 2007), N = 14030; UK (O'Neill et al. 2016), N = 194563; Australia (Rand et al. 1997), N = 4402; Australia (Baral et al. 2003), N = 4299; Australia (Lederer et al. 2009), N = 12576.]

Figure 1 Prevalence studies of FDM in Australian and British veterinary and insured cat populations have identified a higher rate of FDM among Burmese cats compared with the general cat population.

Islet amyloidosis has been implicated in the progressive loss of β-cell mass in insulin-resistant and diabetic individuals. This has been observed in diabetic cats, with a 50% loss of β-cell mass associated with structural changes, lipid derangement, IA deposition and glycogen accumulation (O’Brien et al. 1985; Zini et al. 2016). IA is observed as naturally occurring in only humans, primates and cats (O’Brien et al. 1986, 1993; Zini et al. 2016). In T2D, the extracellular deposition of insoluble islet amyloid polypeptide (IAPP) impairs nutrient and oxygen uptake, resulting in cytotoxicity of β-cells and not adjacent α- or δ-cells (Law et al. 2013; Spijker et al. 2006). These deposits have been observed in over 90% of T2D patients, and their frequency varies considerably in diabetic felines, from 22% to 100% (Yano et al. 1981; O’Brien et al. ; Goossens et al. 2016; Jurgens et al. 2006; Zini et al. 2016). It is also important to note that within these same studies, amyloid deposits have been observed to varying extent in non-diabetic humans and felines. Genetic studies have revealed associations between T2D and IAPP metabolism; however, many of these signals have failed to be consistently replicated across various populations (Zeggini et al. 2007; Lam et al. 2008). These inconsistencies, coupled with the extent of amyloid deposition in diabetic compared with non-diabetic cats, call into question the contribution of IA to T2D and whether it is a cause or consequence of declining β-cell mass.

Insulin resistance and dyslipidaemia

Insulin resistance is also essential to the development of FDM and T2D. Obesity is a major contributing factor in insulin resistance and is observed similarly in both diseases (Appleton et al. 2001). Obesity is a persistent state of hyperinsulinemia, and the primary abnormality in an insulin-resistant state such as obesity is an increased requirement for insulin to maintain glucose homeostasis. An early functional defect observed in T2D is decreased insulin gene expression (Poitout et al. 2013). Downregulation of insulin signalling genes is observed in obese cats, mirroring that of obese, insulin-resistant humans (Mori et al. 2014). Obese cats and humans are significantly more likely to develop FDM than are their optimal-weight counterparts and have shown decreased glucose effectiveness (Appleton et al. 2001; Hoenig et al. 2006; Boffetta et al. 2011).

Polymorphisms associated with the melanocortin 4 receptor gene (MC4R) locus are associated with the risk of obesity, insulin resistance and T2D in humans (Chambers et al. 2008; Xi et al. 2012). This gene may have an impact on energy expenditure and food intake. A candidate gene study found a polymorphism in MC4R (c.92C>T) to be significantly associated with FDM in obese domestic shorthaired cats (P < 0.01; Forcada et al. 2014). It has been hypothesised that, because cats are obligate carnivores and adapted to a high-protein, low-carbohydrate diet, commercial dry foods may exacerbate insulin demand and predispose them to FDM (Rand et al. 2004). Diets high in carbohydrates lead to elevated postprandial glucose and insulin concentrations in insulin-resistant cats and humans compared with high-fat and high-protein diets (McAuley et al. 2014; Keller et al. 2007). Cats have been shown to tolerate high-carbohydrate diets, and although low-carbohydrate diets contribute to improved glycaemic control in diabetic cats, there is no evidence thus far that carbohydrate


intake should be considered a risk factor for FDM (Laflamme 2013; Slingerland et al. 2012).

The underlying mechanisms predisposing Burmese to FDM have yet to be described; however, inherited derangements in lipid metabolism in the form of lipid aqueous and delayed triglyceride clearance have been described in this breed (Crispin 2002; Kluger et al. 2017). Gene expression profiling demonstrated lean Burmese adiponectin and cholesterol lipoprotein profiles as analogous to those of obese cats (Lee et al. 2009). Similar profiles of lipid dysregulation have been observed in Japanese individuals with an increased prevalence of T2D and obesity (Weyer et al. 2018). In this study, low plasma proinflammatory cytokine adiponectin was more closely reflective of the severity of insulin resistance than was the degree of glucose intolerance or adiposity. As such, deranged lipid metabolism may be related to the prevalence of FDM in some Burmese breeding populations, occurring prior to the development of an insulin-resistant state.

Conclusion

In the case of complex diseases such as T2D, targeting genetically isolated populations with unique disease prevalence for gene discovery can inform the understanding of underlying disease pathology. As such a population, the Australian Burmese breed can increase the power of genetic association studies and potentially pinpoint population-specific allele frequency profiles that may explain discrepancies in T2D risk factors and phenotypes between populations. The shared clinical and pathophysiological features of FDM and T2D make FDM a promising clinical model of T2D in humans. Identifying the genetic basis of this condition in cats may provide valuable insight into the pathogenesis of human T2D.

References

Ahlqvist E., Storm P., Käräjämäki A. et al. (2018) Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. The Lancet Diabetes and Endocrinology 6, 361–9.
Alhaddad H., Khan R., Grahn R.A. et al. (2013) Extent of linkage disequilibrium in the domestic cat, Felis silvestris catus, and its breeds. PLoS ONE 8, e53537.
American Diabetes Association (ADA) (2018) Classification and diagnosis of diabetes. Standards of Care in Diabetes-2019, 41 (Suppl), 13–27.
Appleton D.J., Rand J.S. & Sunvold G.D. (2001) Insulin sensitivity decreases with obesity, and lean cats with low insulin sensitivity are at greatest risk of glucose intolerance with weight gain. Journal of Feline Medicine and Surgery 3, 211–28.
Boffetta P., McLerran D., Chen Y. et al. (2011) Body mass index and diabetes in Asia: a cross-sectional pooled analysis of 900,000 individuals in the Asia cohort consortium. PLoS ONE 6, e19930.
Caney S.M. (2013) Management of cats on Lente insulin: tips and traps. Veterinary Clinics of North America-Small Animal Practice 43, 267–82.
Cat Fanciers Association (2018) http://cfa.org/.
Chambers J.C., Elliott P., Zabaneh D., Zhang W., Li Y., Froguel P., Balding D., Scott J. & Kooner J.S. (2008) Common genetic variation near MC4R is associated with waist circumference and insulin resistance. Nature Genetics 40, 716–8.
Chen C., Cohrs C.M., Stertmann J., Bozsak R. & Speier S. (2017) Human beta cell mass and function in diabetes: recent advances in knowledge and technologies to understand disease pathogenesis. Molecular Metabolism 6, 943–57.
Cho Y.S., Lee J.-Y., Park K.S. & Nho C.W. (2012) Genetics of type 2 diabetes in East Asian populations. Current Diabetes Reports 12, 68–96.
Crispin S. (2002) Ocular lipid deposition and hyperlipoproteinaemia. Progress in Retinal and Eye Research 21, 169–224.
Davis B.W., Raudsepp T., Pearks Wilkerson A.J., Agarwala R., Schäffer A.A., Houck M., Chowdhary B.P. & Murphy W.J. (2009) A high-resolution cat radiation hybrid and integrated FISH mapping resource for phylogenomic studies across Felidae. Genomics 93, 299–304.
Flannick J., Thorleifsson G., Beer N.L. et al. (2014) Loss-of-function mutations in SLC30A8 protect against type 2 diabetes. Nature Genetics 46, 357–63.
Forcada Y., Holder A., Church D.B. & Catchpole B. (2014) A polymorphism in the melanocortin 4 receptor gene (MC4R: C.92C>T) is associated with diabetes mellitus in overweight domestic shorthaired cats. Journal of Veterinary Internal Medicine 28, 458–64.
Fuchsberger C., Flannick J., Teslovich T.M. et al. (2016) The genetic architecture of type 2 diabetes. Nature 536, 41–7.
Fumagalli M., Moltke I., Grarup N. et al. (2015) Greenlandic Inuit show genetic signatures of diet and climate adaptation. Science 349, 1343–7.
Gandolfi B., Alamri S., Darby W.G. et al. (2016) A dominant TRPV4 variant underlies osteochondrodysplasia in Scottish fold cats. Osteoarthritis Cartilage 24, 1441–50.
Gandolfi B., Alhaddad H., Abdi M. et al. (2018) Applications and efficiencies of the first cat 63K DNA array. Scientific Reports 8, 7024.
Gao T., McKenna B., Li C. et al. (2014) Pdx1 maintains β cell identity and function by repressing an α cell program. Cell Metabolism 19, 259–71.
Gilor C., Niessen S.J., Furrow E. & DiBartola S.P. (2016) What’s in a name? Classification of diabetes mellitus in veterinary medicine and why it matters. Journal of Veterinary Internal Medicine 30, 927–40.
Goossens M.M., Nelson R.W., Feldman E.C. & Griffey S.M. (1998) Response to insulin treatment and survival in 104 cats with diabetes mellitus (1985–1995). Journal of Veterinary Internal Medicine 12, 1–6.
Gottlieb S. & Rand J. (2018) Managing feline diabetes: current perspectives. Veterinary Medicine: Research and Reports 9, 33–42.
Guo T., Hanson R.L., Traurig M. et al. (2007) TCF7L2 is not a major susceptibility gene for type 2 diabetes in Pima Indians. Clinical Research 12, 3082–8.
Gutierrez G.D., Bender A.S., Cirulli V., Mastracci T.L., Kelly S.M., Tsirigos A., Kaestner K.H. & Sussel L. (2017) Pancreatic β cell identity requires continual repression of non-β cell programs. Journal of Clinical Investigation 127, 244–59.
Hanson R.L., Muller Y.L., Kobes S. et al. (2014) A genome-wide association study in American Indians implicates DNER as a susceptibility locus for type 2 diabetes. Diabetes 63, 369–76.


Hatzikotoulas K., Gilly A. & Zeggini E. (2014) Using population isolates in genetic association studies. Briefings in Functional Genomics 13, 371–7.
Heath S. (2001) Orofacial pain syndrome in cats. Veterinary Record 149, 660.
Henson M.S. & O’Brien T.D. (2006) Feline models of type 2 diabetes. ILAR Journal 47, 234–42.
Hoenig M., Thomaseth K., Brandao J., Waldron M. & Ferguson D.C. (2006) Assessment and mathematical modeling of glucose turnover and insulin sensitivity in lean and obese cats. Domestic Animal Endocrinology 31, 373–89.
Jurgens C.A., Toukatly M.N., Fligner C.L. et al. (2011) β-cell loss and β-cell apoptosis in human type 2 diabetes are related to islet amyloid deposition. American Journal of Pathology 178, 2632–40.
Kahn S.E., Hull R.L. & Utzschneider K.M. (2006) Mechanisms linking obesity to insulin resistance and type 2 diabetes. Nature 444, 840–6.
Karlsson E.K., Baranowska I., Wade C.M. et al. (2007) Efficient mapping of Mendelian traits in dogs through genomewide association. Nature Genetics 39, 1321–8.
Keller C., Liesegang A., Frey D. & Wichert B. (2017) Metabolic response to three different diets in lean cats and cats predisposed to overweight. BMC Veterinary Research 13, 1–10.
Kluger E.K., Caslake M., Baral R.M., Malik R. & Govendir M. (2010) Preliminary post-prandial studies of Burmese cats with elevated triglyceride concentrations and/or presumed lipid aqueous. Journal of Feline Medicine and Surgery 12, 621–30.
Kuo J.Z., Sheu W.H.H., Assimes T.L. et al. (2013) Trans-ethnic fine mapping identifies a novel independent locus at the 3′ end of CDKAL1 and novel variants of several susceptibility loci for type 2 diabetes in a Han Chinese population. Diabetologia 56, 2619–28.
Laflamme D. (2008) Letter to the editor: cats and carbohydrates. Topics in Companion Animal Medicine 23, 159–60.
Lam V.K.L., Ma R.C.W., Lee H.M. et al. (2013) Genetic association of type 2 diabetes with islet amyloid polypeptide processing and degrading pathways in Asian populations. PLoS ONE 8, e62378.
Law E., Lu S., Kieffer T.J., Warnock G.L., Ao Z., Woo M. & Marzban L. (2010) Differences between amyloid toxicity in alpha and beta cells in human and mouse islets and the role of caspase-3. Diabetologia 53, 1415–27.
Lederer R., Rand J.S., Jonsson N.N., Hughes I.P. & Morton J.M. (2009) Frequency of feline diabetes mellitus and breed predisposition in domestic cats in Australia. Veterinary Journal 179, 254–8.
Lee P., Mori A., Coradini M., Mori N., Sagara F., Yamamoto I., Rand J.S. & Arai T. (2013) Potential predictive biomarkers of obesity in Burmese cats. Veterinary Journal 195, 221–7.
Li G., Hillier L.W., Grahn R.A. et al. (2016) A high-resolution SNP array-based linkage map anchors a new domestic cat draft genome assembly and provides detailed patterns of recombination. G3: Genes, Genomes, Genetics 6, 1607–16.
Lipinski M.J., Froenicke L., Baysac K.C. et al. (2008) The ascent of cat breeds: genetic evaluations of breeds and worldwide random-bred populations. Genomics 91, 12–21.
Liu C.T., Raghavan S., Maruthur N. et al. (2016) Trans-ethnic meta-analysis and functional annotation illuminates the genetic architecture of fasting glucose and insulin. American Journal of Human Genetics 99, 56–75.
Lyons L.A. (2015) DNA mutations of the cat: the good, the bad and the ugly. Journal of Feline Medicine and Surgery 17, 203–19.
Ma R.C.W. & Chan J.C.N. (2013) Type 2 diabetes in East Asians: similarities and differences with populations in Europe and the United States. Annals of the New York Academy of Sciences 1281, 64–91.
Maedler K., Sergeev P., Ris F., Oberholzer J., Joller-Jemelka H.I., Spinas G.A., Kaiser N., Halban P.A. & Donath M.Y. (2002) Glucose-induced β cell production of IL-1β contributes to glucotoxicity in human pancreatic islets. Journal of Clinical Investigation 110, 851–60.
Mahajan A., Go M.J., Zhang W. et al. (2014) Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nature Genetics 46, 234–44.
McAuley K.A., Hopkins C.M., Smith K.J., McLay R.T., Williams S.M., Taylor R.W. & Mann J.I. (2005) Comparison of high-fat and high-protein diets with a high-carbohydrate diet in insulin-resistant obese women. Diabetologia 48, 8–16.
McCann T.M., Simpson K.E., Shaw D.J., Butt J.A. & Gunn-Moore D.A. (2007) Feline diabetes mellitus in the UK: the prevalence within an insured cat population and a questionnaire-based putative risk factor analysis. Journal of Feline Medicine and Surgery 9, 289–99.
Menotti-Raymond M., David V.A., Lyons L.A., Schäffer A.A., Tomlin J.F., Hutton M.K. & O’Brien S.J. (1999) A genetic linkage map of microsatellites in the domestic cat (Felis catus). Genomics 57, 9–23.
Moltke I., Grarup N., Jørgensen M.E. et al. (2014) A common Greenlandic TBC1D4 variant confers muscle insulin resistance and type 2 diabetes. Nature 512, 190–3.
Montague M.J., Li G., Gandolfi B. et al. (2014) Comparative analysis of the domestic cat genome reveals genetic signatures underlying feline biology and domestication. Proceedings of the National Academy of Sciences of the United States of America 111, 17230–5.
Mori A., Lee P., Takemitsu H., Iwasaki E., Kimura N., Yagishita M., Hayasaka M. & Arai T. (2009) Decreased gene expression of insulin signaling genes in insulin sensitive tissues of obese cats. Veterinary Research Communications 33, 315–29.
Morris A., Voight B. & Teslovich T. (2012) Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genetics 44, 981–90.
Mullikin J.C., Hansen N.F., Shen L. et al. (2010) Light whole genome sequence for SNP discovery across domestic cat breeds. BMC Genomics 11, 1–8.
Nishimura W., Takahashi S. & Yasuda K. (2014) MafA is critical for maintenance of the mature beta cell phenotype in mice. Diabetologia 58, 566–74.
O’Brien T.D., Hayden D.W., Johnson K.H. & Stevens J.B. (1985) High dose intravenous glucose tolerance test and serum insulin and glucagon levels in diabetic and non-diabetic cats: relationships to insular amyloidosis. Veterinary Pathology 22, 250–61.
O’Brien T.D., Hayden D.W., Johnson K.H. & Fletcher T.F. (1986) Immunohistochemical morphometry of pancreatic endocrine cells in diabetic, normoglycaemic glucose-intolerant and normal cats. Journal of Comparative Pathology 96, 357–69.
O’Brien T., Butler P., Westermark P. & Johnson K. (1993) Islet amyloid polypeptide: a review of its biology and potential roles in the pathogenesis of diabetes-mellitus. Veterinary Pathology 30, 317–32.
O’Brien T.D., Wagner J.D., Litwak K.N., Carlson C.S., Cefalu W.T., Jordan K., Johnson K.H. & Butler P.C. (1996) Islet amyloid and islet amyloid polypeptide in cynomolgus macaques (Macaca fascicularis): an animal model of human noninsulin-dependent diabetes mellitus. Veterinary Pathology 33, 479–85.
Öhlund M., Fall T., Ström Holst B., Hansson-Hamlin H., Bonnett B. & Egenvall A. (2015) Incidence of diabetes mellitus in insured Swedish Cats in relation to age, breed and sex. Journal of Veterinary Internal Medicine 29, 1342–7.
Öhlund M., Egenvall A., Fall T., Hansson-Hamlin H., Röcklinsberg H. & Holst B.S. (2017) Environmental risk factors for diabetes mellitus in cats. Journal of Veterinary Internal Medicine 31, 29–35.
Osto M., Zini E., Reusch C.E. & Lutz T.A. (2013) Diabetes from humans to cats. General and Comparative Endocrinology 182, 48–53.
Panciera D.L., Thomas C.B., Eicker S.W. & Atkins C.E. (1990) Epizootiologic patterns of diabetes mellitus in cats: 333 cases (1980–1986). Journal of the American Veterinary Medical Association 197, 1504–8.
Poitout V., Amyot J., Semache M., Zarrouki B., Hagman D. & Fontes G. (2011) Glucolipotoxicity of the pancreatic beta cell. Biochimica et Biophysica Acta (BBA) – Molecular and Cell Biology of Lipids 1801, 289–98.
Pontius J.U., Mullikin J.C., Smith D.R. et al. (2007) Initial sequence and comparative analysis of the cat genome. Genome Research 17, 1675–89.
Prahl A., Guptill L., Glickman N.W., Tetrick M. & Glickman L.T. (2007) Time trends and risk factors for diabetes mellitus in cats presented to veterinary teaching hospitals. Journal of Feline Medicine and Surgery 9, 351–8.
Rahier J., Guiot Y., Goebbels R.M., Sempoux C. & Henquin J.C. (2008) Pancreatic beta-cell mass in European subjects with type 2 diabetes. Diabetes, Obesity and Metabolism 10, 32–42.
Rand J.C. (2013) Pathogenesis of feline diabetes. Veterinary Clinics of North America: Small Animal Practice 43, 221–31.
Rand J.C. & Marsall R.D. (2005) Diabetes mellitus in cats. Veterinary Clinics of North America: Small Animal Practice 35, 211–24.
Rand J.C., Bobbermien L.M., Hendrickz J.K. & Copland M. (1997) Over representation of Burmese cats with diabetes mellitus. Australian Veterinary Journal 75, 402–5.
Rand J., Fleeman J.S., Farrow H.A., Appleton D.J. & Lederer R. (2004) Canine and feline diabetes mellitus: nature or nurture? Journal of Nutrition 134, 2072–80.
Scott R.A., Lagou V., Welch R.P. et al. (2012) Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nature Genetics 44, 991–1005.
Service S., DeYoung J., Karayiorgou M. et al. (2006) Magnitude and distribution of linkage disequilibrium in population isolates and implications for genome-wide association studies. Nature Genetics 38, 556–60.
SIGMA Type 2 Diabetes Consortium, Williams A.L., Jacobs S.B. et al. (2014) Sequence variants in SLC16A11 are a common risk factor for type 2 diabetes in Mexico. Nature 506, 97–101.
Slingerland L.I., Fazilova V.V., Plantinga E.A., Kooistra H.S. & Beynen A.C. (2009) Indoor confinement and physical inactivity rather than the proportion of dry food are risk factors in the development of feline type 2 diabetes mellitus. Veterinary Journal 179, 247–53.
Spijker H.S., Song H., Ellenbroek J.H. et al. (2015) Loss of β-cell identity occurs in type 2 diabetes and is associated with islet amyloid deposits. Diabetes 64, 1–45.
Steinthorsdottir V., Thorleifsson G., Sulem P. et al. (2014) Identification of low-frequency and rare sequence variants associated with elevated or reduced risk of type 2 diabetes. Nature Genetics 46, 294–8.
Tamazian G., Simonov S., Dobrynin P. et al. (2014) Annotated features of domestic cat – Felis catus genome. GigaScience 3, 13.
The International Cat Association (2018) https://www.tica.org/en/.
Udler M.S., Kim J., Grotthuss M.V. et al. (2018) Clustering of type 2 diabetes genetic loci by multi-trait associations identifies disease mechanisms and subtypes. PLoS Medicine 15, e1002654.
Wade C.M., Gething M. & Rand J.S. (1999) Evidence of a genetic basis for diabetes mellitus in Burmese cats. Journal of Veterinary Internal Medicine 13, 269.
Weyer C., Funahashi T., Tanaka S., Hotta K., Matsuzawa Y., Pratley R.E. & Tataranni P.A. (2001) Hypoadiponectinemia in obesity and type 2 diabetes: close association with insulin resistance and hyperinsulinemia. Journal of Clinical Endocrinology and Metabolism 86, 1930–5.
Willet C.E. & Haase B. (2014) An updated felCat5 SNP manifest for the Illumina Feline 63k SNP genotyping array. Animal Genetics 45, 614–5.
World Cat Federation (2018) http://www.wcf-online.de/WCF-EN/.
World Health Organisation (2016) Global Report on Diabetes. https://www.who.int/diabetes/global-report/en/
Xi B., Takeuchi F., Chandak G., Kato N., Pan H.W., AGEN-T2D Consortium, Zhou D.H., Pan H.Y. & Mi J. (2012) Common polymorphism near the MC4R gene is associated with type 2 diabetes: data from a metaanalysis of 123,373 individuals. Diabetologia 55, 2660–6.
Xue A., Wu Z., Zhu Z. et al. (2018) Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nature Communications 9, 2941.
Yano B.L., Hayden D.W. & Johnson K.H. (1981) Feline insular amyloid: association with diabetes mellitus. Veterinary Pathology 18, 621–7.
Yoon K.H., Ko S.H., Cho J.H. et al. (2003) Selective β-cell loss and α-cell expansion in patients with type 2 diabetes mellitus in Korea. Journal of Clinical Endocrinology and Metabolism 88, 2300–8.
Yoon K., Lee J., Kim J., Cho J.H., Choi Y.H., Ko S.H., Zimmet P. & Son H.Y. (2006) Epidemic obesity and type 2 diabetes in Asia. Lancet 368, 1681–8.
Zeggini E., Weedon M.N., Lindgren C.M. et al. (2007) Multiple type 2 diabetes susceptibility genes following genome-wide association scan in UK samples. Science 316, 1336–41.
Zini E., Osto M., Franchini M. et al. (2009) Hyperglycaemia but not hyperlipidaemia causes beta cell dysfunction and beta cell loss in the domestic cat. Diabetologia 52, 336–46.
Zini E., Ferro S., Lunardi F. et al. (2016) Endocrine pancreas in cats with diabetes mellitus. Veterinary Pathology 53, 145–52.


2.2 Synopsis- Mapping the genetic basis of diabetes mellitus in the Australian Burmese cat

Pedigreed cat breeds exhibit extensive linkage disequilibrium (LD), compared with humans and other randomly breeding populations. LD is an essential component of genome-wide mapping techniques, as populations with extensive LD reduce the number of markers and samples required to detect genetic loci underpinning traits of interest. Further, the extent of LD can be used to narrow regions of linkage or regions of association to a candidate gene. In section 2.2, I present an original research article that describes two approaches for mapping the genetic basis of feline diabetes mellitus in the Burmese breed. Taking advantage of the simplified genetic landscape of the Burmese breed, I performed genome-wide association study and selective sweep analyses to identify diabetes-associated haplotypes and candidate genes. Mapping this disease in the Burmese breed facilitated tighter control of confounding factors like population stratification, varied allelic frequency and environmental heterogeneity. The identification of risk alleles segregating in genes previously implicated in lipid dysregulation and type 2 diabetes (T2D) pathophysiology in humans supports the use of the Burmese cat as a naturally occurring animal model of T2D. This manuscript was published in the journal Scientific Reports.
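To make the role of LD in the synopsis above concrete, the following is a minimal sketch of the pairwise statistic r² between two biallelic markers. It is an illustration only and not code from the study: the function name `ld_r2` and the 0/1 allele coding of phased haplotypes are assumptions made for this example.

```python
def ld_r2(haplotypes):
    """Pairwise LD (r^2) between two biallelic loci.

    `haplotypes` is a list of (a, b) tuples, one per phased haplotype,
    with alleles coded 0/1 at locus A and locus B (an assumed encoding).
    Returns None when either locus is monomorphic, where r^2 is undefined.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n       # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n       # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                          # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


# Two markers that always travel together versus two unlinked markers.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(linked), ld_r2(unlinked))
```

When r² is high across long stretches, as described for pedigreed breeds, one genotyped marker can stand in for many neighbouring variants, which is why fewer markers are needed.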

www.nature.com/scientificreports

Mapping the genetic basis of diabetes mellitus in the Australian Burmese cat (Felis catus)

Georgina Samaha1*, Claire M. Wade2, Julia Beatty1,3, Leslie A. Lyons4, Linda M. Fleeman5 & Bianca Haase1

Diabetes mellitus, a common endocrinopathy affecting domestic cats, shares many clinical and pathologic features with type 2 diabetes in humans. In Australia and Europe, diabetes mellitus is almost four times more common among Burmese cats than in other breeds. As a genetically isolated population, the diabetic Australian Burmese cat provides a spontaneous genetic model for studying diabetes mellitus in humans. Studying complex diseases in pedigreed breeds facilitates tighter control of confounding factors including population stratification, allelic frequencies and environmental heterogeneity. We used the feline SNV array and whole genome sequence data to undertake a genome-wide association study and runs of homozygosity analysis of a case–control cohort of Australian and European Burmese cats. Our results identified diabetes-associated haplotypes across chromosomes A3, B1 and E1 and selective sweeps across the Burmese breed on chromosomes B1, B3, D1 and D4. The locus on chromosome B1, common to both analyses, revealed coding and splice region variants in candidate genes, ANK1, EPHX2 and LOXL2, implicated in diabetes mellitus and lipid dysregulation. Mapping this condition in Burmese cats has revealed a polygenic spectrum, implicating loci linked to pancreatic beta cell dysfunction, lipid dysregulation and insulin resistance in the pathogenesis of diabetes mellitus in the Burmese cat.

Domestic cats (Felis catus), a common household pet, share with humans many environmental and lifestyle risk factors for metabolic disorders like diabetes mellitus. Feline diabetes mellitus (FDM; OMIA 000277-9685) affects approximately one in 200 cats. Similarities in clinical presentation, pathological findings and risk factors between FDM and human type 2 diabetes (T2D) are well-established1–4. Both are characterised by inadequate insulin secretion resulting in absolute or partial insulin deficiency and varying degrees of insulin resistance in peripheral tissues. The prevalence of FDM has been increasing over recent decades, with obesity, diet, over-nutrition and physical inactivity all implicated as risk factors5,6. In humans, T2D is known to have a complex underlying genetic architecture, with many genetic risk loci identified by genome-wide association studies (GWAS). Such studies have consistently identified T2D-associated loci containing genes that alter glucose metabolism or pancreatic β-cell function7,8. Among recognised cat breeds, the incidence of FDM is highest in Burmese cats9,10. In Australia, Britain and Europe, Burmese cats are approximately four times more likely to develop FDM than domestic cats of other breeds or breed mixes11. In contrast, the same heightened risk has not been observed in the North American Burmese population, which is regarded as genetically distinct12–14. Companion animals are increasingly used to model human diseases15,16. Studying the basis of diabetes mellitus in the naturally occurring Burmese model circumvents the drawbacks associated with in vivo experimental work involving induced illness in healthy animals and transgenic animals. These include welfare concerns and the inability to adequately model the disease state17. Compared with humans, the relatively short lifespan of cats permits observation of the natural progression of an illness over a compressed timeline.

1Faculty of Science, Sydney School of Veterinary Science, University of Sydney, Sydney, NSW, Australia. 2School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, Australia. 3Department of Infectious Diseases and Public Health, City University of Hong Kong, Kowloon, Hong Kong SAR, People’s Republic of China. 4Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, MO, USA. 5Animal Diabetes Australia, Melbourne, VIC, Australia. *email: georgina.samaha@sydney.edu.au

Scientific Reports | (2020) 10:19194 | https://doi.org/10.1038/s41598-020-76166-3

Breed-specific phenotypes in cats result from selective breeding for desired characteristics such as coat colour and texture, body size and morphology. Breed-specific traits are often accompanied by breed-specific susceptibility to genetic diseases compared with outbred populations. Heritable diseases disproportionately affecting Burmese cats include craniofacial defect18, hypokalaemia19, hyperlipidaemia20, orofacial pain syndrome21 and diabetes22. Regarding genetic diversity, the Burmese breed has among the lowest levels of heterozygosity, most extensive linkage disequilibrium (LD) and highest inbreeding coefficients in recognised cat breeds14,23. This simplified genetic architecture is beneficial for mapping the genetic basis of breed-specific traits. Extensive LD reduces the number of markers required to tag segregating haplotypes23,24, making Burmese suitable for analysis on the Illumina Infinium iSelect 63 k Cat DNA genotyping array that is used in this study. The closed breeding systems used to maintain pedigreed breed lines leave distinct genomic traces, particularly when population bottlenecks, such as the importation of individuals to new countries, create founder effects. The extent and structure of LD varies widely across chromosomes and regions of long-range LD can indicate partial or complete selective sweeps of functional significance in a population. Across species, the process of intensive selection for breed-specific traits influences the frequency, distribution and length of runs of homozygosity (ROH)25. ROH exist as long tracts of homozygous genotypes and are often the result of consanguineous matings, but they can arise by other mechanisms25–27. Detecting ROH can identify genes that have been subjected to selection in population genetic analyses28,29. For example, ROH detection has been used to map causative recessive variants segregating within human families and canine breeds in complex and Mendelian diseases30–33.
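The notion of an ROH as a long tract of homozygous genotypes can be sketched with a simple linear scan. This is a deliberately simplified illustration and not the detection procedure used in the study: the function name `find_roh`, the 0/1/2 genotype coding (1 = heterozygous) and both thresholds are placeholder assumptions.

```python
def find_roh(positions, genotypes, min_snps=3, min_length_bp=1000):
    """Return runs of consecutive homozygous calls as (start_bp, end_bp, n_snps).

    `positions` are base-pair coordinates sorted along one chromosome and
    `genotypes` are coded 0/2 (homozygous) or 1 (heterozygous); any other
    value, such as None for a missing call, also ends the current run.
    """
    runs = []
    start = None
    # A trailing sentinel heterozygote closes any run still open at the end.
    for i, g in enumerate(list(genotypes) + [1]):
        if g in (0, 2):
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            n_snps = end - start + 1
            length = positions[end] - positions[start]
            if n_snps >= min_snps and length >= min_length_bp:
                runs.append((positions[start], positions[end], n_snps))
            start = None
    return runs


positions = [1000, 2000, 3000, 4000, 5000, 6000, 7000]
genotypes = [0, 0, 2, 1, 2, 2, 0]
print(find_roh(positions, genotypes))
```

Production tools typically use sliding windows and tolerate a small number of heterozygous or missing calls per run; the strict scan here only conveys the basic idea.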
Enrichment of deleterious variants surrounding regions of selection in domesticated breeds suggests deleterious alleles may ‘hitchhike’ with nearby positively selected alleles, supporting a link between disease heritability and breed-specific traits33–35. The unique demographic history of the Burmese breed is characterised by reduced effective population sizes, potential founder effects and genetic bottlenecks that may account for differential enrichment of FDM between different breeding populations. In a population with an already reduced genetic diversity, we predict GWAS and ROH analyses offer an opportunity to detect FDM-risk loci segregating in the Australian Burmese breeding population. We used genotyping array and whole genome sequence data to (1) conduct case–control GWAS of diabetic and non-diabetic Burmese cats, (2) identify FDM-associated risk haplotypes and (3) perform genome-wide characterisation of Burmese ROH to expose risk loci that differentiate the heightened prevalence of FDM in this breed.

Results

Remapping array variants to feline 9.0 genome assembly. As inconsistencies between genome assemblies can complicate GWAS, the SNV locations of array variants were updated to the most recent feline reference assembly (felCat9). Original locations were included in a manifest provided with the Illumina iSelect 63 k Cat DNA genotyping array. For the 62,897 variants on the feline genotyping array, 61,371 (97.6%) were remapped. This comprised 58,674 autosomal variants and 2697 markers on chromosome X. The marker counts and statistics per chromosome (Table S1a) and the updated map file (Table S1b) are presented in Supplementary data. The largest gap between consecutive markers spanned 82.3 Mb and was detected on chromosome E1. The average distance between markers was 39.7 kb. As expected, gaps and low-density distributions of SNVs were consistently observed flanking centromeres for each chromosome.
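The remapping figures quoted above can be checked with a few lines of arithmetic, and the same sketch shows how the largest and mean inter-marker gaps might be derived from a list of positions. The helper names `remap_rate` and `gap_stats` and the toy coordinates are assumptions for illustration; the study's own summary statistics are those in its Table S1a.

```python
def remap_rate(remapped, total):
    """Percentage of array variants placed on the new assembly."""
    return 100.0 * remapped / total


def gap_stats(positions):
    """Largest and mean distance (bp) between consecutive markers on one chromosome."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return max(gaps), sum(gaps) / len(gaps)


# Counts reported in the text: 58,674 autosomal + 2,697 chromosome X markers.
remapped = 58674 + 2697
print(remapped, round(remap_rate(remapped, 62897), 1))

# Toy coordinates (hypothetical), only to show the gap calculation.
print(gap_stats([10, 50, 20]))
```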

Genome-wide and haplotype association analyses. Samples from eighty-two Burmese cats that were genotyped on the Illumina Infinium Feline 63 k iSelect DNA array passed quality control. Multidimensional scaling (MDS) revealed some geographical clustering of cat populations (Fig. 1a). Cats from Europe (EU) clustered distinctly from Australian (AUS) and British (UK) samples. Given geographical clustering of cases, the GWAS was first run in Australian cats only to control for population stratification (analysis 1). Analysis 1 comprised 22 cases and 20 controls, which were evenly dispersed across the population cohort (herein referred to as the ‘Australian cluster’) (Fig. 1b). For analysis 1, the QQ-plot revealed a deviation from the expected P-value distribution only in the tail (λ = 1.03) (Fig. 1c). This analysis totalled 30,212 SNVs and one marker at E2:6,883,182 (Praw = 4.68 × 10⁻⁶) passed the empirical genome-wide significance threshold (Fig. 2a). The 3021 SNVs that comprised the top 10% associated signals from analysis 1 were run in an expanded cohort of Australian, European and British Burmese cats (22 cases, 60 controls) (analysis 2). SNVs passing genome-wide significance from analysis 2 were observed on chromosomes A3, B1, C2 and E1 (Fig. 2b). The most significantly associated marker from analysis 2 was C2:139,466,787 (Pgenome = 1.26 × 10⁻⁵). This SNV was found not to be in LD with any surrounding markers and was excluded from further analysis.
Additional SNVs passing genome-wide significance on chromosomes A3, B1 and E1 were observed within intronic regions of the Guanine nucleotide-binding protein G subunit (GNAS) (A3:4,665,802; Pgenome = 4.44 × 10⁻⁰⁵), Growth regulating oestrogen receptor binding 1 (GREB1) (A3:134,518,080; Pgenome = 6.13 × 10⁻⁰⁵), Zinc finger matrin-type 4 (ZMAT4) (B1:44,235,566; Pgenome = 6.85 × 10⁻⁰⁵) and EF-hand calcium binding domain 5 (EFCAB5) (E1:16,917,823; Pgenome = 4.02 × 10⁻⁰⁵) genes. Five regions of association on chromosomes A3, B1 and E1 were defined based on LD clumping with r² > 0.8 to the top regional SNV and used to identify FDM-associated haplotypes within each of these clumps (Table 1). The highest associated haplotype spanned the interval chrA3:134,425,431–134,601,739 bp. This region spanned the genes Lipin-1 (LPIN1), Neurotensin receptor 2 (NTSR2) and GREB1 and harboured a SNV within GREB1 (P = 4.33 × 10⁻⁰⁶) unique to cases. The second highest associated haplotype spanned the interval chrB1:48,056,021–48,848,288 bp (P = 4.69 × 10⁻⁰⁶) and was present in 27.3% of cases and 3.3% of controls. Across this region, seven synonymous

Scientific Reports | (2020) 10:19194 | https://doi.org/10.1038/s41598-020-76166-3 | www.nature.com/scientificreports/

Figure 1. Multi-dimensional scaling and quantile–quantile plot of diabetic and non-diabetic Burmese cats. (a) Multi-dimensional scaling distribution of Burmese cats of Australian, British and European (HK) provenance in two dimensions. Samples included within the Australian cluster (circled) were included in the initial case–control association analysis. (b) Multi-dimensional scaling distribution of diabetic and non-diabetic Burmese cats in the Australian cluster shown in two dimensions. (c) Quantile–quantile plot showing limited inflation of the test statistics with a genomic inflation factor (λ = 1.03).
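
The genomic inflation factor quoted for the QQ-plot can be illustrated with a short calculation. The sketch below is not the authors' code. It assumes λ is obtained in the usual way, as the median 1-df chi-square statistic implied by the observed P values divided by the median expected under the null (about 0.455). The function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# derived from the standard normal quantile at 0.75.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def genomic_inflation(p_values):
    """Return lambda: the median observed statistic over the expected median."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


# P values spread evenly over (0, 1) mimic a test with no inflation.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(uniform_p)
```

A value of λ close to 1, as in this evenly spread example, indicates that the bulk of the test statistics follows the null distribution.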

Figure 2. Case–control genome-wide association analysis performed in two stages in the Burmese breed. (a) Manhattan plot summarising initial analysis of 22 cases and 20 controls in the Australian cluster shows a marker on chromosome E1. (b) The top 10% of markers (3022 SNVs) run in the initial association analysis were run in an expanded association analysis of 82 Burmese cats of Australian, European and British provenance, comprising 22 cases and 60 controls. Signals that maintained or increased their significance above the empirical genome-wide significance threshold (P < 7.6 × 10⁻⁵) are highlighted in red. Loci passing this threshold were observed on chromosomes A3, B1 and E1.


Chr | Position | P | Allele | Haplotype block | Size (kb) | Associated haplotype | Case:control freq | Haplotype P | Genes
A3 | 4,665,802 | 4.44E−05 | C | 4,396,199–4,741,413 | 345 | GGCG | 0.232:0.042 | 0.0372 | ATP5F1E, TUBB1, PRELID3B, NELFCD, GNAS
A3 | 134,518,080 | 6.13E−05 | A | 134,425,431–134,601,739 | 176 | GAAG | 0.168:0 | 7.00E−04 | LPIN1, NTSR2, GREB1
B1 | 44,398,531 | 1.88E−05 | C | 43,116,211–44,435,725 | 1319 | AGGAGAGAGAGCGAAGGGCA | 0.182:0.017 | 2.50E−04 | ANK1, GINS4, GPAT4, SFRP1, ZMAT4
B1 | 48,213,950 | 3.98E−05 | A | 48,056,021–48,848,288 | 792 | AAAGGGCAAGAACGAGA | 0.273:0.033 | 4.69E−06 | UNC5D
E1 | 16,917,823 | 4.02E−05 | A | 16,577,377–17,262,839 | 685 | GGGGGCAAGAGGAA | 0.636:0.898 | 0.0052 | TMIGD1, BLMH, SLC6A4, NSRP1, EFCAB5, SSH2

Table 1. LD clumping and haplotype analysis of association signals from analysis 2 identified risk haplotypes on chromosomes A3, B1 and E1 across 82 Burmese cats.
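
As a rough illustration of the r² thresholds behind the clumped regions, the sketch below computes r² as the squared Pearson correlation between allele-count vectors and collects the markers around an index SNV that exceed a threshold. It is a simplified stand-in for PLINK's clumping procedure, not a reimplementation of it. The record layout (id, chrom, pos, geno) and all genotypes are assumed for the example.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two allele-count vectors (0/1/2).

    Samples with a missing call (None) in either vector are skipped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: correlation undefined, treated as no LD
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    return (sxy * sxy) / (sxx * syy)


def clump(index_snv, snvs, r2_min, window_bp):
    """Return ids of SNVs within window_bp of the index SNV with r2 > r2_min."""
    members = []
    for snv in snvs:
        if snv["id"] == index_snv["id"] or snv["chrom"] != index_snv["chrom"]:
            continue
        if abs(snv["pos"] - index_snv["pos"]) > window_bp:
            continue
        if r_squared(index_snv["geno"], snv["geno"]) > r2_min:
            members.append(snv["id"])
    return members


# Hypothetical markers on one chromosome.
top = {"id": "s1", "chrom": "B1", "pos": 1_000_000, "geno": [0, 1, 2, 0, 1, 2]}
markers = [
    top,
    {"id": "s2", "chrom": "B1", "pos": 1_200_000, "geno": [0, 1, 2, 0, 1, 2]},
    {"id": "s3", "chrom": "B1", "pos": 1_500_000, "geno": [2, 1, 0, 2, 1, 0]},
    {"id": "s4", "chrom": "B1", "pos": 2_500_000, "geno": [0, 0, 1, 1, 2, 2]},
    {"id": "s5", "chrom": "B1", "pos": 10_000_000, "geno": [0, 1, 2, 0, 1, 2]},
]
weak = clump(top, markers, r2_min=0.2, window_bp=5_000_000)
strong = clump(top, markers, r2_min=0.8, window_bp=2_000_000)
```

In this toy data the weak-LD pass keeps three neighbours and the high-LD pass narrows the region to two, mirroring the two-step refinement in miniature.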

Chromosome | Position (gDNA) | Sequence change (cDNA) | Variant id | Gene | Consequence | Location
A3 | g.4531693 | c.1803G > A | – | CTSZ | Synonymous | Exon 5
A3 | g.4538198 | c.811C > T | – | NELFCD | Synonymous | Exon 7
A3 | g.4538472 | c.751C > T | rs783705891 | NELFCD | Synonymous | Exon 6
A3 | g.4678284 | c.528A > G | – | NESP55 | Synonymous | Exon 1
A3 | g.4678485 | c.729G > A | – | NESP55 | Synonymous | Exon 1
B1 | g.35028512 | c.474G > A | – | LOXL2 | Missense | Exon 2
B1 | g.35028658 | c.620C > T | – | LOXL2 | Synonymous | Exon 2
B1 | g.35028712 | c.674G > A | – | LOXL2 | Synonymous | Exon 2
B1 | g.35028781 | c.743C > T | – | LOXL2 | Synonymous | Exon 2
B1 | g.35274559 | c.136G > A | – | PEBP4 | Missense | Exon 1
B1 | g.43228852 | c.768C > T | – | ANK1 | Synonymous | Exon 5
B1 | g.43227755C > T | - | – | ANK1 | Splice region | Intron 4
B1 | g.43235911C > T | - | – | ANK1 | Splice region | Intron 9
B1 | g.48700205 | c.1176G > A | rs783910583 | UNC5D | Synonymous | Exon 10

Table 2. Coding variants matching FDM-risk haplotypes across GWAS loci on chromosomes A3 and B1 and variants segregating in cases across both ROH on B1.

and splice region variants in Cathepsin Z (CTSZ), Negative elongation factor CD (NELFCD), Neuroendocrine secretory protein 55 (NESP55), Ankyrin 1 (ANK1) and UNC-5 netrin receptor (UNC5D) (Table 2), 708 intronic, two splice region variants, two 5′ UTR variants in GPAT4 and GINS4 and seven 3′ UTR variants in CTSZ and NELFCD matched risk haplotypes across the five FDM-associated haplotype blocks (Table S3). Most notably, variants matching the FDM-risk haplotype on B1 were identified in the common T2D candidate gene, ANK1. Of the WGS samples, two cases were heterozygous for both ANK1 splice variants; all other individuals were homozygous for the reference allele. The SNV g.43227755C > T was located within 5 bp of a constitutive donor splice site at the start of intron 4 and g.43235911C > T was located 2 bp upstream of a constitutive acceptor splice site at the end of intron 9.

Runs of homozygosity. The total number of SNVs included in ROH analysis was 52,933. A total of 5079 ROH were identified across all individuals. The distribution of sizes of ROH was consistent across populations. Most of the observed ROH were less than 20 Mb in size, with 46.9%, 48.1% and 46.5% being observed between 5 and 10 Mb in size (Fig. 3a). Limited by the distribution and density of SNVs on the feline genotyping array, no ROH < 1 Mb were detected. Australian samples displayed larger spans of ROH, with two samples displaying a total length of ROH greater than 750 Mb. These two samples displayed the highest inbreeding coefficients. No correlation was observed between case–control status and the total length of ROH, number of ROH or extent of inbreeding in Burmese cats. Inbreeding coefficient measures FROHcov and FROHaut across all samples were highly correlated (Fig. 3b; r = 1.0; P < 2.26 × 10⁻¹⁶). While FROHaut is the more commonly reported inbreeding coefficient statistic, FROHcov allows maximal possible genome coverage. Hence, we have reported both here. FROHaut and FROHcov reported an inbreeding coefficient of ~ 0.24 across all samples. Inbreeding (FROHcov) was consistent among Australian (FROHcov = 0.25), UK (FROHcov = 0.23) and EU (FROHcov = 0.22) populations (Fig. 3b). Genomic regions containing the most common ROH across all samples were identified across all autosomes (Fig. 3c). ROH on chromosomes B1, B3, D1 and D4 revealed regions potentially under selection in the breed, with both ROH on chromosome B1 containing FDM-risk variants (Table 2). ROH spanning chrB3:20,948,939–28,536,357 contained 22 genes, none of which are associated with known feline phenotypes. This ROH was observed in > 75% of samples and was syntenic with a region of human chromosome 15, previously


Figure 3. ROH analyses on 82 Burmese samples of Australian, British and European provenance. (a) Most ROH were found in the size ranges 1–20 Mb across all individuals. (b) Inbreeding coefficient measurements were equivalent across all populations. (c) Autosomal distribution of the incidence of SNVs in ROH measured across all samples.
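
The two ROH-based inbreeding coefficients can be sketched as the summed ROH length of an individual divided by either the autosomal genome length (FROHaut) or the length covered by array markers (FROHcov). The example below is illustrative only: the denominators, cat names and ROH segments are hypothetical values, not figures from this study, and the helper names are assumptions.

```python
def f_roh(roh_segments, denominator_bp):
    """Fraction of the genome in ROH; segments are (start_bp, end_bp) pairs."""
    return sum(end - start for start, end in roh_segments) / denominator_bp


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical denominators and ROH calls, for illustration only.
AUTOSOME_BP = 2_000_000_000   # assumed autosomal genome length
COVERED_BP = 1_900_000_000    # assumed length covered by array markers
cats = {
    "cat_a": [(1_000_000, 201_000_000), (300_000_000, 600_000_000)],
    "cat_b": [(5_000_000, 105_000_000)],
    "cat_c": [(0, 400_000_000), (900_000_000, 1_250_000_000)],
}
f_aut = [f_roh(segments, AUTOSOME_BP) for segments in cats.values()]
f_cov = [f_roh(segments, COVERED_BP) for segments in cats.values()]
r = pearson_r(f_aut, f_cov)
```

When every individual shares the same two denominators, the two measures are proportional to each other, so a correlation of essentially 1 between them is the expected outcome under this formulation.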

implicated in Prader Willi syndrome (PWS) (OMIM 176270) and oculocutaneous albinism (OMIM 203200). FDM-risk variants were found to segregate in Non-Imprinted gene in Prader-Willi syndrome/Angelman syndrome (NIPA1; OMIM:608145) and Gamma-aminobutyric Acid Receptor, Gamma-3 (GABRG3; OMIM:600233). An ROH spanning chrD1:37,995,037–57,240,116 upstream of the FDM-associated locus was observed in > 90% of all Burmese cats. This locus contained the Tyrosinase (TYR) gene that has been previously Intronic risk variants in glutamate receptor, metabotropic 5 (GRM5), Dexamethasone-induced gene-2 (DIG2) and Myosin VIIA (MYO7A) genes were found across WGS samples. The FDM locus with indicative association on chromosome D1 was 40 Mb upstream of the ROH spanning chrD1:37,995,037–57,240,116. The ROH on chrD4:34,892,052–40,360,453, observed in 62% of Burmese samples, spanned the Tyrosinase-related protein 1 (TYRP1) gene, implicated in brown coat colour in cats (OMIA 001249-9685). Intronic risk variants in protein tyrosine phosphatase receptor delta (PTPRD) were detected in WGS samples. Two ROH flanking the centromere on chromosome B1 were detected across all populations. Both were syntenic to a region on human chromosome 8 and overlapped the putative ticked coat phenotype locus in cats (OMIA 001484-9685). The first spanned chrB1:34,395,302–38,202,712 at a lower incidence (0.49) than the second, chrB1:40,408,022–54,187,390 bp (0.69). Missense and synonymous risk variants in Lysyl Oxidase homolog 2 (LOXL2) and Phosphatidylethanolamine-Binding Protein 4 (PEBP4) genes segregated in FDM-cases (Table 2). SNVs within the chrB1:34,395,302–38,202,712 bp ROH were in moderate LD (r² = 0.5) to the highest associated SNV in the chrB1:40,408,022–54,187,390 bp ROH (Fig. 4a). ChrB1:40,408,022–54,187,390 overlapped the most significant region of FDM-association identified in analysis 2 (Fig. 4b), was observed in 80% of controls and 49% of cases and contained 65 genes.
FDM-risk variants segregating in cases could not be detected as only one diabetic WGS sample had the chrB1:40,408,022–54,187,390 ROH. A homozygous missense variant (c.163A > G) in exon 1 of Epoxide hydrolase 2 (EPHX2) was fixed across all WGS samples. Annotated FDM-risk variants discovered across ROH are presented in Table S4.

Discussion

We sought to identify novel genetic loci associated with FDM in the Australian Burmese population. GWAS exploits the non-random coinheritance of genetic variants (LD) to assay thousands of markers for an association with any phenotypic trait. The Burmese breed is characterised by a low fraction of GWAS-informative SNVs on the 63 k array, and only 49% of the available array markers were included in our analysis. However, the low allele frequency spectrum used in array-based association analyses, coupled with the genetic profile of this breed limits

Figure 4. Overlapping FDM-associated loci on chromosome B1 were identified in both ROH and GWAS analyses. (a) The distribution of SNVs flanking the highest associated SNV (red). Haplotype testing narrowed this to a region spanning 42–49 Mb. (b) Two ROH were observed on chromosome B1. ROH_1 (orange) was 4 Mb downstream of the GWAS-identified FDM-association loci. ROH_2 (blue) was found to overlap the FDM-associated haplotype block. The incidence of SNVs included in ROH differed between cases and controls, with two adjacent peaks of higher frequency in controls than in cases. (c) Targeted variant discovery in these ROH and GWAS-identified regions revealed coding variants in genes.

the need to control for false-positives. Given the extent of LD observed in pedigreed cat breeds, traditionally conservative forms of correction (i.e. Bonferroni correction) are considered excessive. Empirical estimations of genome-wide significance (i.e. 95% CIs from the distribution of raw P values) are more conservative than what is typically used in human GWAS but are adequate in population isolates16,36.

Genomic clustering of samples was concordant with geographical provenance. Geographic population stratification was expected as the cats in this study were collected from British, European and Australian populations that have each been subjected to population bottlenecks during their local breed development. Population substructure in GWAS cohorts is identified as a cause of potentially spurious associations37. Imprecise modelling of genetic relatedness within GWAS sample populations can cause substantial inflation of test statistics and potentially false association signals. Further, limiting study samples to entirely unrelated individuals is difficult in pedigreed breeds, such as the Burmese. To limit the effects of hidden relatedness and population stratification, the linear mixed-model (LMM) approach implemented by EMMAX was used. LMM have been shown to perform well in comparison to traditional family-based association tests38.

The Burmese cats in this study displayed high homozygosity and long ROH consistent with previous reports14. The distribution of ROH across the Burmese samples was consistent with previous results39. Long ROH (> 5 Mb) are indicative of recent inbreeding, unbroken by historical recombination events. No detectable difference in inbreeding coefficients between FDM cases and control groups was identified. Several factors influence the resolution of ROH calling, including marker density, marker distribution and genotype call quality.
Therefore, for most populations, medium density genotyping arrays do not lend themselves to high resolution analysis of ROH40. For this analysis, minor allele frequency (MAF) and LD pruning were not employed in the ROH detection as both have been shown to mask the detection of true ROH by limiting genome coverage39. Long ROH (1–10 Mb) made up the largest size category observed across all populations, accounting for over 70% of ROH observations. Large ROH may persist in a population because of low rates of local recombination, particularly in genomic regions that are subject to positive selection pressures. The risk of T2D is increased by moderate inbreeding and consanguinity within isolated populations41–43. ROH did not implicate autozygosity as a risk factor for FDM in Burmese cats, as diabetic cats did not display a higher number of ROH or higher inbreeding coefficient than non-diabetic cats.

Selective sweeps, indicative of positive selection, manifest by the reduction of genetic variation surrounding a beneficial mutation, and occur due to positive selection pressure increasing the frequency of the favourable allele over time44. In cats, at least eight loci are involved in coat colour determination, with various combinations of these responsible for extensive phenotypic variation. The genes TYR and TYRP1 inside ROH support strong selection of tyrosine metabolism and common Burmese coat ‘ticking’ and colour phenotypes45–47. Tyrosine metabolism influences the development, differentiation and proliferation of melanocytes, the construction and transport of the melanosome and the synthesis of melanin48. Short selective sweeps, indicative of extensive LD, have been associated with T2D in isolated human populations49. Sweeps can cause a shift in the allelic frequency of a selected allele and the ‘hitchhiking’ alleles in their vicinity. Pancreatic islets of FDM and T2D patients are characterised by reduced beta cell function and decreased insulin gene expression50,51. FDM-risk variants within these selective sweeps identified GRM5 and DLG2 as candidates for beta cell dysregulation, as both are involved in glucose-stimulated insulin secretion and hyperglycaemia susceptibility in rodents and humans52–54. Additionally, PTPRD contained within the chromosome D4 ROH has been associated with progression to diabetes in humans through enhanced insulin resistance55,56. The ROH on chromosome B3 overlapped a syntenic locus in humans implicated in Prader Willi Syndrome (PWS). T2D has been found to affect between 7 and 24% of PWS patients, far exceeding the prevalence in the general population57,58.
This is likely a consequence of insulin resistance resulting from morbid obesity; however, PWS patients also exhibit a state of hypoinsulinaemia without expected insulin resistance despite their obese state. Deficits in pancreatic islet development may play a role in the PWS phenotype59,60. NIPA1 is a magnesium transporter, upregulated in response to reduced magnesium concentration61. Variants in NIPA1 have previously been associated with T2D risk62 and GABRG3 is an early childhood obesity gene contributing to PWS phenotype63. The basis of the diabetic state in obese PWS patients is currently unclear but the PWS locus contains epigenetically imprinted genes that serve as viable candidates for FDM.

The most compelling FDM-association was provided by the locus on chromosome B1, identified in both ROH and GWAS analyses. Coding variants matching the risk-haplotype segregated in ANK1, an established T2D candidate gene associated with decreased beta-cell function in humans64,65. Feline ANK1 is highly orthologous (93.89%) to human ANK1 sequence and multiple ANK1 isoforms with affinities for various target proteins are expressed in a tissue specific manner66–68. Transcripts of varying size are present in tissues essential to glucose metabolism: skeletal muscle69, pancreas, adipose and liver70. Splice region variants identified in this study may be influential in the molecular function of feline ANK1 isoforms, but any potential impact of these will be dependent on tissue-specific expression of ANK1 isoforms. Alternatively-spliced ANK1 transcripts are functionally diverse and variants in regulatory regions have been implicated in altered molecular function of the various human isoforms and their role in T2D71. The physiological role of ANK1 in T2D pathogenesis is unconfirmed but Ankyrin B (ANK2) regulates ATP sensitivity in murine pancreatic beta cells72.
Additionally, SNVs in the ANK1 promoter region have been found to increase intramuscular fat content in pigs73 and to increase muscle-specific ANK1 expression in human skeletal muscle71, implying SNVs in the ANK1 promoter region may play a role in the development of insulin resistance. ROH analysis implicated two loci on chromosome B1 as FDM-risk loci, containing risk variants segregating in cases and across the breed, further highlighting the genetic complexity of FDM. Both partially overlapped the semi-dominant Abyssinian ‘ticked’ coat locus responsible for the homogenous agouti coat with no body markings characteristic of Burmese cats47. Across chrB1:34,395,302–38,202,712, variants segregated in cases in LOXL2, previously implicated in nephropathy and retinopathy in T2D patients74,75, indicating these may play a critical role in the progression to FDM. Diabetic nephropathy is not routinely recognised in diabetic cats, although diabetic retinopathy has been reported76. Dyslipidaemia is a critical factor in the early inflammatory response in the development of retinopathy77. LOXL2 overexpression is considered a potential target for treatment of vascular changes involved in diabetic retinopathy78. Inherited derangements in lipid metabolism have been described in the Burmese breed20,79 and may be related to the high prevalence of FDM. Familial hypercholesterolaemia (FH) is a common autosomal dominant disorder of lipoprotein metabolism, with variable phenotypic presentation in humans80. Variants in EPHX2 have been found to exacerbate the dysfunction of the Low-density lipoprotein receptor (LDLR) gene in FH81 and are associated with insulin resistance in T2D patients82. Defects in LDLR result in disturbed clearance of low-density lipoprotein cholesterol (LDL-C) and the clinical presentation varies widely depending on the segregation of risk alleles and haplotype.
A missense substitution (c.163A > G) in EPHX2 was fixed across all WGS Burmese samples within the chrB1:40,408,022–54,187,390 bp ROH. Phenotypic heterogeneity is present between homozygous and heterozygous FH patients, with a range of increased plasma triglyceride levels, likely influenced by modifying gene–gene interactions. A syndrome similar to FH has been described in Burmese cats with affected individuals exhibiting significantly elevated plasma triglyceride concentrations, lipid aqueous and delayed triglyceride clearance compared with other breeds83–85. Post-prandial hypertriglyceridemia has been described in cats with inherited Lipoprotein lipase (LPL) factors86 with variable phenotypic presentation in homozygous and heterozygous individuals. Further studies will be required to determine whether the lipid metabolism defects in Burmese cats can be attributed to EPHX2 dysfunction and any potential contribution to increased risk of FDM.

This study has unveiled genomic regions underlying dysfunctional metabolic processes predisposing Burmese cats to FDM, highlighting lipid metabolism as a contributing factor. Larger cohorts of comprehensively phenotyped individuals across multiple breeds are needed to validate the genetic risk factors presented here. The segregation of polymorphisms across autozygous regions and identification of risk-haplotypes reveal a potential highly penetrant recessive locus on chromosome B1. This strategy for identifying risk-loci across a predisposed population suggests the interaction of multiple genes across a fixed number of loci is likely responsible for FDM in Australian Burmese cats. The regions reported here characterise a window of FDM association containing candidate genes for further investigation in larger studies with prospective clinical phenotyping. Detailed characterisation of the genetic risk factors involved in the pathogenesis of FDM, including the allelic variants and candidate genes presented here, will provide a more comprehensive understanding of the molecular mechanisms involved and the genetic interaction profiles responsible for this disease, and for T2D in humans. Building on insights gained from this study, future work can potentially pinpoint population specific allele frequency profiles.

Materials and methods

Clinical diagnosis and sample collection. Eighty-two Burmese cats, comprising 22 diabetic and 60 non-diabetic individuals, were submitted for genotyping analysis on the Illumina Infinium iSelect 63 k Cat DNA genotyping array. Genomic DNA was extracted from whole blood using the DNeasy blood and tissue kit (Qiagen GmbH, Germany) or Performagene PG-100 buccal swabs (DNA Genotek, Canada). All cases (15 male, 7 female) had been diagnosed with FDM based on persistent fasting hyperglycaemia and glycosuria with clinical signs (weight loss, polydipsia and polyphagia) by a qualified veterinarian at the University of Sydney Veterinary Teaching Hospital and Animal Diabetes Australia. Unaffected Burmese were collected from a database of adult Burmese cats. This included 23 Australian, 24 British and 13 European Burmese; nine European cats were members of the same pedigree19. Samples from non-diabetic Burmese (26 male, 34 female) were all collected by a qualified veterinarian. These cats showed no clinical signs of FDM at the time of their presentation for clinical examination and sample collection and had never previously been diagnosed with FDM. Veterinary records of controls were checked for indicators of disease related to diabetes before they were included.

Remapping array variants to feline 9.0 genome assembly. To determine the exact position of feline array single nucleotide variants (SNV) on the Felis_catus_9.0 genome assembly, we performed a Basic Local Alignment Search87 using SNV probes available in the feline manifest88. A FASTA file was created using the SNV identifier and flanking genomic sequence upstream and downstream of each SNV in top orientation. The longest flanking sequence was retained and output in FASTA format. A custom BLAST database was prepared using BLAST+87 to make the reference FASTA sequence searchable. A nucleotide BLAST search was performed and the output was filtered to collect the single best hit for each sequence. Hits that failed to return any result, hits that failed to reach the base position immediately adjacent to the SNV and those that had an E-value greater than 1e−05 were rejected.
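
The hit-filtering step described above can be sketched as follows. This is an illustrative reconstruction rather than the pipeline used in the study. It assumes the search was written in the standard 12-column BLAST tabular format, that the best hit is the one with the highest bit score, and that a hypothetical lookup (adjacent_base) gives, for each probe, the query coordinate of the base next to the SNV.

```python
import csv
import io

# Column names of the standard 12-column BLAST tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text, adjacent_base, max_evalue=1e-5):
    """Pick the top-scoring hit per probe, then reject hits failing the filters."""
    best = {}
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row

    kept = {}
    for probe, row in best.items():
        if float(row["evalue"]) > max_evalue:
            continue  # alignment too weak
        target = adjacent_base[probe]
        if not int(row["qstart"]) <= target <= int(row["qend"]):
            continue  # alignment stops short of the base next to the SNV
        kept[probe] = row
    return kept


# Hypothetical hits for three probes whose SNV-adjacent base is query position 100.
rows = [
    ["snv1", "chrA1", "100.0", "100", "0", "0", "1", "100", "5000", "5099", "1e-40", "180"],
    ["snv1", "chrB2", "92.0", "60", "5", "0", "1", "60", "700", "759", "1e-08", "90"],
    ["snv2", "chrC1", "85.0", "30", "4", "0", "71", "100", "40", "69", "0.001", "35"],
    ["snv3", "chrD3", "100.0", "80", "0", "0", "1", "80", "900", "979", "1e-30", "150"],
]
text = "\n".join("\t".join(r) for r in rows) + "\n"
kept = filter_hits(text, {"snv1": 100, "snv2": 100, "snv3": 100})
```

Probes with no hit at all never appear in the tabular output, so they are absent from the result without any explicit check.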

Population structure and quality control. Quality control was carried out for 82 samples and 61,386 SNVs using PLINK 1.989. First, we identified possible duplicate samples and outliers based on pairwise genetic distances (--genome) and removed one sample from each pair with an identity by descent (IBD) estimate > 0.65. Population stratification was evaluated using an MDS plot with two dimensions (--mds-plot). Marker-based QC included pruning the total set of SNVs at a MAF (--maf) of 0.05 and a SNV (--geno) and individual (--mind) call rate of > 90%.
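
The call-rate and MAF filters can be illustrated on a small genotype matrix. The sketch below is a simplified illustration, not PLINK's implementation. Genotypes are coded as counts of one allele (0, 1, 2) with None for a missing call. Applying the sample filter before the marker filter, and treating the thresholds as inclusive, are assumptions of this example.

```python
def call_rate(calls):
    """Proportion of non-missing calls in a list (0.0 for an empty list)."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """Minor allele frequency from allele counts, ignoring missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq)


def quality_control(genotypes, min_call_rate=0.9, min_maf=0.05):
    """Return (kept sample names, kept marker indices).

    genotypes maps each sample name to a list of calls in a shared marker order.
    """
    kept_samples = [name for name, calls in genotypes.items()
                    if call_rate(calls) >= min_call_rate]
    n_markers = len(next(iter(genotypes.values())))
    kept_markers = []
    for j in range(n_markers):
        column = [genotypes[name][j] for name in kept_samples]
        if (call_rate(column) >= min_call_rate
                and minor_allele_frequency(column) >= min_maf):
            kept_markers.append(j)
    return kept_samples, kept_markers


# Hypothetical genotypes: four cats, four markers.
toy = {
    "cat1": [0, 1, 2, 0],
    "cat2": [0, 1, 2, 0],
    "cat3": [0, 2, None, 0],
    "cat4": [None, None, None, 0],
}
samples, markers = quality_control(toy, min_call_rate=0.7, min_maf=0.05)
```

In the toy data one poorly genotyped cat is dropped first, after which only the second marker is both well called and polymorphic.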

Case–control genome-wide association study and haplotype analysis. Testing for association between FDM and markers on the feline genotyping array was performed in two stages using Efficient Mixed-Model Association eXpedited (EMMAX)90. Given all cases were of Australian provenance, an initial association analysis (analysis 1) was performed on samples within the unstratified Australian cluster. To validate the associations observed in the Australian cluster, a second association analysis (analysis 2) run in EMMAX included an expanded group of individuals from Australian, UK and EU populations (22 cases, 60 controls) and the top 10% of association signals from analysis 1. The genome-wide significance threshold in both analyses was calculated based on empirical 95% confidence intervals (CIs). The probability distribution was determined by running the GWAS 1000 times with randomly permuted phenotypes. The genome-wide significance threshold was set at the 97.5% upper CI (based on a two-tailed distribution) (P < 7.6 × 10⁻⁵) (Karlsson et al.16). Regions of association were refined using LD clumping (--clump) in PLINK. First, a region of weak LD (r² > 0.2) was defined within 5 Mb of each top SNV and then the region was narrowed to a single locus of high LD (r² > 0.8) within 2 Mb of the highest associated SNV per locus. The refined regions were submitted for haplotype analysis with HAPLOVIEW91. FDM-associated haplotype blocks were examined in all 82 individuals and were defined using the four-gamete rule92. Consecutive haplotype blocks with a multiallelic r² value of 1 were combined and blocks containing the top associated SNVs were used to define our FDM-associated loci. Significance of FDM-associated haplotype blocks was measured by running 10,000 permutations.
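
The permutation-based threshold can be sketched in a few lines. This is an illustration of the general idea only and does not reproduce EMMAX or the exact procedure of this study. It assumes that the smallest P value is recorded for each permuted data set and that the threshold is taken as the 2.5th percentile of those values. A simple normal-approximation test on allele counts stands in for the mixed model, and all names are hypothetical.

```python
import random
from statistics import NormalDist, mean, pvariance


def allele_count_p(calls, status):
    """Two-sided P for a case/control difference in mean allele count.

    Uses a normal approximation with the pooled variance of all calls;
    this is a stand-in test, not a linear mixed model.
    """
    cases = [c for c, s in zip(calls, status) if s == 1]
    controls = [c for c, s in zip(calls, status) if s == 0]
    pooled = pvariance(calls)
    if pooled == 0 or not cases or not controls:
        return 1.0  # monomorphic marker or empty group: no evidence
    se = (pooled * (1 / len(cases) + 1 / len(controls))) ** 0.5
    z = (mean(cases) - mean(controls)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def empirical_threshold(markers, status, n_perm=1000, quantile=0.025, seed=0):
    """Return the chosen quantile of the best P value over permuted phenotypes."""
    rng = random.Random(seed)
    shuffled = list(status)
    best = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        best.append(min(allele_count_p(calls, shuffled) for calls in markers))
    best.sort()
    return best[int(quantile * n_perm)]


# Hypothetical data: 20 cats (10 cases, 10 controls) and 20 random markers.
data_rng = random.Random(1)
status = [1] * 10 + [0] * 10
markers = [[data_rng.choice([0, 1, 2]) for _ in status] for _ in range(20)]
threshold = empirical_threshold(markers, status, n_perm=200, seed=42)
```

An observed P value below the returned threshold would then be treated as genome-wide significant for that marker set and cohort.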

Runs of homozygosity analysis. ROH analysis was performed to identify FDM-associated regions of autozygosity using PLINK (--homozyg function). The input settings for minimal density of SNVs, maximal gap size, scanning window length and threshold settings were determined empirically39. The ROH analysis was run using a maximal gap size of 300 kb (--homozyg-gap) and scanning window size setting of minimal density (--homozyg-density) at 80 kb/SNV at a genome coverage of 98.9%. Additional settings were: a scanning window hit rate of 0.05 (--homozyg-window-threshold), maximum of one heterozygous SNV per final ROH segment (--homozyg-het) and a minimum of 94 SNVs in each final ROH (--homozyg-snp) (Table S2). Inbreeding coefficients (FROH) for genotyped individuals were calculated based on the length of the autosomal genome and the length of the genome covered by feline array markers (Meyermans et al. 2019). Correlations between inbreeding coefficients FROHcov and FROHaut were measured using a Pearson’s correlation test. We did not prune SNVs based on MAF or LD. ROH on chromosomes B1, B3, D1 and D4 were examined for syntenic regions using the Ensembl Bioinformatics database and a database search was performed using the BLAST algorithm. Comparative genome analyses of ROH on chromosomes B1 and B3 were undertaken. ROH distributions were compared between the Australian cluster and the expanded group of Burmese and between cases and controls across all autosomes to identify any regions associated with FDM.
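
To make the role of the gap, heterozygote and minimum-SNV settings concrete, the sketch below calls ROH with a plain consecutive-run scan along one chromosome. It is deliberately simpler than PLINK's sliding-window method and will not reproduce its output. The function name is hypothetical, and the default values simply echo the settings quoted above.

```python
def find_roh(positions, calls, min_snvs=94, max_gap_bp=300_000, max_het=1):
    """Return ROH as (start_bp, end_bp, n_snvs) tuples for one chromosome.

    positions: sorted base-pair coordinates of the SNVs.
    calls: allele counts per SNV (0 or 2 homozygous, 1 heterozygous,
           None missing; missing calls do not break a run).
    """
    segments = []
    start = last = None  # indices of the first and last SNV in the open run
    hets = 0

    def close_run():
        if start is not None and last - start + 1 >= min_snvs:
            segments.append((positions[start], positions[last], last - start + 1))

    for i, call in enumerate(calls):
        # A gap wider than max_gap_bp ends the open run.
        if start is not None and positions[i] - positions[last] > max_gap_bp:
            close_run()
            start = None
        if call == 1:
            if start is None:
                continue  # never open a run on a heterozygous call
            if hets == max_het:
                close_run()  # one heterozygote too many: end the run here
                start = None
                continue
            hets += 1
        elif start is None:
            start, hets = i, 0
        last = i
    close_run()
    return segments


# Hypothetical chromosome with 12 evenly spaced SNVs.
pos = [i * 100_000 for i in range(12)]
geno = [0, 0, 2, 1, 0, 0, 1, 2, 2, 2, 2, 2]
runs = find_roh(pos, geno, min_snvs=4)
```

In the example the first heterozygous call is tolerated inside the run, while the second one closes it and a new run begins at the next homozygous SNV.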

Risk variant discovery and annotation. Genomic DNA was extracted from whole blood samples of five Australian Burmese cats (3 cases, 2 controls) included in GWAS analysis, using a phenol–chloroform extraction protocol. DNA samples were submitted for whole genome sequencing (WGS). Illumina paired-end libraries were prepared and sequenced on the Illumina HiSeq 2000 platform with 150 bp paired-end reads (~ 12–21 × coverage). An additional three WGS samples, of unknown diabetic phenotype, available on Sequence Read Archive (SRA) were downloaded (SRX2376210, SRX2669131, SRX2376201) (~ 12–27 × coverage). All samples were aligned to the Felis_catus_9.0 reference assembly using BWA-mem93. Base quality score recalibration, indel realignment and duplicate removal were performed using Genome Analysis ToolKit (GATK)94. Genome-wide variant detection was performed using GATK’s HaplotypeCaller according to best practices and variants were filtered for sequence depth (DP > 10), genotype quality (GQ > 20) and strand bias95. Given the complex phenotypic presentation of FDM and the unknown clinical phenotype of the SRA samples, associated haplotype blocks were filtered according to presence or absence of risk haplotypes across the eight WGS samples, rather than clinical information of each individual. ROH were examined for FDM risk variants segregating in affected WGS samples and potential risk factors across the breed. Filtered variants were analysed with Ensembl’s Variant Effect Predictor (VEP) tool96. Sequence variants were further filtered for genic variants and annotated with the Sorting Intolerant From Tolerant (SIFT) tool97.
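
The depth and genotype-quality thresholds can be illustrated on the per-sample fields of a VCF record. The sketch below is an assumption-laden example, not the filtering commands used in the study. It treats DP and GQ as the per-sample FORMAT fields, applies the thresholds as strict inequalities, and leaves out the strand-bias filter because its cut-off is not stated. The function name is hypothetical.

```python
def passes_filters(format_keys, sample_field, min_dp=10, min_gq=20):
    """Check one VCF sample column against depth and genotype-quality cut-offs.

    format_keys: the FORMAT column, e.g. "GT:AD:DP:GQ:PL".
    sample_field: the matching per-sample column, e.g. "0/1:7,8:15:99:255,0,230".
    """
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    try:
        return int(values["DP"]) > min_dp and int(values["GQ"]) > min_gq
    except (KeyError, ValueError):
        return False  # DP or GQ absent, or reported as "." for a missing call


fmt = "GT:AD:DP:GQ:PL"
good = passes_filters(fmt, "0/1:7,8:15:99:255,0,230")
shallow = passes_filters(fmt, "0/1:5,5:10:99:120,0,110")
missing = passes_filters(fmt, "./.:.:.:.:.")
```

Calls with missing depth or quality are rejected here rather than passed through, which is the more cautious of the two possible conventions.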

Ethical approval. Recommendations from the Australian Code for the Care and Use of Animals for Scientific Purposes were strictly adhered to throughout this study. Research was conducted at The University of Sydney, under Animal Ethics Committee approval no: N00/9–2009/3/5109, 24 September 2009. Blood and buccal swab samples were all collected by a veterinarian or donated by owners and breeders.

Data availability

Genotyping array data is available at 10.6084/m9.figshare.12561815 and 10.6084/m9.figshare.12561770. Whole genome sequence data for five phenotyped Australian Burmese samples can be accessed freely upon request to the 99Lives Consortium Coordinator, L. A. Lyons ([email protected]). Whole genome sequence data for three Burmese samples of unknown clinical status can be accessed via NCBI Sequence Read Archive under accession codes: SRX2376208, SRX2376209 and SRX2376210.

Received: 29 June 2020; Accepted: 22 October 2020

References 1. Henson, M. S. & O’Brien, T. D. Feline models of type 2 diabetes mellitus. ILAR J. 47, 234–242. https://doi.or​ g/10.1093/ilar.47.3.234 (2006). 2. Osto, M., Zini, E., Reusch, C. E. & Lutz, T. A. Diabetes from humans to cats. Gen. Comp. Endocrinol. 182, 48–53. https​://doi. org/10.1016/j.ygcen​.2012.11.019 (2013). 3. Hoenig, M. Comparative aspects of human, canine, and feline obesity and factors predicting progression to diabetes. Vet. Sci. https​ ://doi.org/10.3390/vetsc​i1020​121 (2014). 4. Samaha, G., Beatty, J., Wade, C. M. & Haase, B. Te Burmese cat as a genetic model of type 2 diabetes in humans. Anim. Genet. 50, 319–325. https​://doi.org/10.1111/age.12799​ (2019). 5. Slingerland, L. I., Fazilova, V. V., Plantinga, E. A., Kooistra, H. S. & Beynen, A. C. Indoor confnement and physical inactivity rather than the proportion of dry food are risk factors in the development of feline type 2 diabetes mellitus. Vet. J. 179, 247–253. https​://doi.org/10.1016/j.tvjl.2007.08.035 (2009). 6. Öhlund, M. et al. Environmental risk factors for diabetes mellitus in cats. J. Vet. Intern. Med. 31, 29–35. https​://doi.org/10.1111/ jvim.14618​ (2017). 7. DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium. et al. Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat. Genet. 46, 234–244. https​://doi.org/10.1038/ ng.2897 (2014). 8. Xue, A. et al. Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nat. Commun. 9, 2941. https​://doi.org/10.1038/s4146​7-018-04951​-w (2018). 9. Baral, R., Rand, J., Catt, M. & Farrow, H. Prevalence of feline diabetes mellitus in a feline private practice [abstract]. J. Vet. Intern. Med. 17, 433–434 (2003). 10. McCann, T. M., Simpson, K. E., Shaw, D. J., Butt, J. A. & Gunn-Moore, D. A. 
Feline diabetes mellitus in the UK: the prevalence within an insured cat population and a questionnaire-based putative risk factor analysis. J. Feline Med. Surg. 9, 289–299. https​:// doi.org/10.1016/j.jfms.2007.02.001 (2007). 11. Lederer, R., Rand, J. S., Jonsson, N. N., Hughes, I. P. & Morton, J. M. Frequency of feline diabetes mellitus and breed predisposition in domestic cats in Australia. Vet. J. 179, 254–258. https://do​ i.org/10.1016/j.tvjl.2007.09.019 (2009). 12. Panciera, D. L., Tomas, C. B., Eicker, S. W. & Atkins, C. E. Epizootiologic patterns of diabetes mellitus in cats: 333 cases (1980– 1986). J. Am. Vet. Med. Assoc. 197, 1504–1508 (1990). 13. Prahl, A., Guptill, L., Glickman, N. W., Tetrick, M. & Glickman, L. T. Time trends and risk factors for diabetes mellitus in cats presented to veterinary teaching hospitals. J. Feline Med. Surg. 9, 351–358. https​://doi.org/10.1016/j.jfms.2007.02.004 (2007). 14. Lipinski, M. J. et al. Te ascent of cat breeds: Genetic evaluations of breeds and worldwide random-bred populations. Genomics 91, 12–21. https://do​ i.org/10.1016/j.ygeno.2007​ .10.009 (2008). 15. Gandolf, B. et al. A dominant TRPV4 variant underlies osteochondrodysplasia in Scottish fold cats. Osteoarthr. Cartil. 24, 1441– 1450. https​://doi.org/10.1016/j.joca.2016.03.019 (2016). 16. Karlsson, E. K. et al. Genome-wide analyses implicate 33 loci in heritable dog osteosarcoma, including regulatory variants near CDKN2A/B. Genome Biol. 14, R132. https​://doi.org/10.1186/gb-2013-14-12-r132 (2013).

Scientific Reports | (2020) 10:19194 | https://doi.org/10.1038/s41598-020-76166-3 | www.nature.com/scientificreports/

17. King, A. J. F. The use of animal models in diabetes research. Br. J. Pharmacol. 166, 877–894. https​://doi.org/10.111 1/j.1476-5381.2012.01911​.x (2012). 18. Lyons, L. A. et al. Aristaless-like homeobox protein 1 (ALX1) variant associated with craniofacial structure and frontonasal dys- plasia in Burmese cats. Dev. Biol. 409, 451–458. https​://doi.org/10.1016/j.ydbio​.2015.11.015 (2016). 19. Gandolf, B. et al. First WNK4-hypokalemia animal model identifed by genome-wide association in Burmese cats. PLoS ONE 7, e53173–e53173. https​://doi.org/10.1371/journ​al.pone.00531​73 (2012). 20. Kluger, E. K., Caslake, M., Baral, R. M., Malik, R. & Govendir, M. Preliminary post-prandial studies of Burmese cats with ele- vated triglyceride concentrations and/or presumed lipid aqueous. J. Feline Med. Surg. 12, 621–630. https​://doi.org/10.1016/j. jfms.2010.04.002 (2010). 21. Rusbridge, C. et al. Feline orofacial pain syndrome (FOPS): a retrospective study of 113 cases. J. Feline Med. Surg. 12, 498–508. https​://doi.org/10.1016/j.jfms.2010.03.005 (2010). 22. Rand, J. S., Bobbermien, L. M., Hendrikz, J. K. & Copland, M. Over representation of Burmese cats with diabetes mellitus. Aust. Vet. J. 75, 402–405. https​://doi.org/10.1111/j.1751-0813.1997.tb143​40.x (1997). 23. Alhaddad, H. et al. Extent of linkage disequilibrium in the domestic cat, Felis silvestris catus, and its breeds. PLoS ONE 8, e53537– e53537. https​://doi.org/10.1371/journ​al.pone.00535​37 (2013). 24. Gandolf, B. et al. Applications and efciencies of the frst cat 63K DNA array. Sci. Rep. 8, 7024. https​://doi.org/10.1038/s4159 8-018-25438​-0 (2018). 25. Aramburu, O. et al. Genomic signatures afer fve generations of intensive selective breeding: runs of homozygosity and genetic diversity in representative domestic and wild populations of turbot (Scophthalmus maximus). Front. Genet. https://doi.or​ g/10.3389/ fgene​.2020.00296​ (2020). 26. Gibson, J., Morton, N. E. & Collins, A. 
Extended tracts of homozygosity in outbred human populations. Hum. Mol. Genet. 15, 789–795. https​://doi.org/10.1093/hmg/ddi49​3 (2006). 27. Purfeld, D. C., McParland, S., Wall, E. & Berry, D. P. Te distribution of runs of homozygosity and selection signatures in six commercial meat sheep breeds. PLoS ONE 12, e0176780–e0176780. https​://doi.org/10.1371/journ​al.pone.01767​80 (2017). 28. Metzger, J. et al. Runs of homozygosity reveal signatures of positive selection for reproduction traits in breed and non-breed horses. BMC Genom. 16, 764. https​://doi.org/10.1186/s1286​4-015-1977-3 (2015). 29. Xie, R. et al. Genome-wide scan for runs of homozygosity identifes candidate genes in three pig breeds. Animals (Basel) 9, 518. https​://doi.org/10.3390/ani90​80518​ (2019). 30. Lencz, T. et al. Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. 104, 19942– 19947. https​://doi.org/10.1073/pnas.07100​21104​ (2007). 31. Nalls, M. A. et al. Extended tracts of homozygosity identify novel candidate genes associated with late-onset Alzheimer’s disease. Neurogenetics 10, 183–190. https​://doi.org/10.1007/s1004​8-009-0182-4 (2009). 32. Alkuraya, F. S. Te application of next-generation sequencing in the autozygosity mapping of human recessive diseases. Hum. Genet. 132, 1197–1211. https​://doi.org/10.1007/s0043​9-013-1344-x (2013). 33. Sams, A. J. & Boyko, A. R. Fine-scale resolution of runs of homozygosity reveal patterns of inbreeding and substantial overlap with recessive disease genotypes in domestic dogs. G3 Genes Genom. Genet. 9, 117–123. https://doi​ .org/10.1534/g3.118.200836​ (2019). 34. Marsden, C. D. et al. Bottlenecks and selective sweeps during domestication have increased deleterious genetic variation in dogs. Proc. Natl. Acad. Sci. USA 113, 152–157. https​://doi.org/10.1073/pnas.15125​01113​ (2016). 35. Derks, M. F. L. et al. A survey of functional genomic variation in domesticated chickens. Genet. Sel. Evol. 50, 17. 
https​://doi. org/10.1186/s1271​1-018-0390-1 (2018). 36. Kanai, M., Tanaka, T. & Okada, Y. Empirical estimation of genome-wide signifcance thresholds based on the 1000 genomes project data set. J. Hum. Genet. 61, 861–866. https​://doi.org/10.1038/jhg.2016.72 (2016). 37. Tian, C., Gregersen, P. K. & Seldin, M. F. Accounting for ancestry: population substructure and genome-wide association studies. Hum. Mol. Genet. 17, R143-150. https​://doi.org/10.1093/hmg/ddn26​8 (2008). 38. Eu-ahsunthornwattana, J. et al. Comparison of methods to account for relatedness in genome-wide association studies with family- based data. PLoS Genet. 10, e1004445. https​://doi.org/10.1371/journ​al.pgen.10044​45 (2014). 39. Meyermans, R., Gorssen, W., Buys, N. & Janssens, S. How to study runs of homozygosity using PLINK? A guide for analyzing medium density SNP data in livestock and pet species. BMC Genom. 21, 94. https​://doi.org/10.1186/s1286​4-020-6463-x (2020). 40. Ceballos, F. C., Hazelhurst, S. & Ramsay, M. Assessing runs of Homozygosity: a comparison of SNV array and whole genome sequence low coverage data. BMC Genom. 19, 106–106. https​://doi.org/10.1186/s1286​4-018-4489-0 (2018). 41. Dajani, R. et al. Diabetes mellitus in genetically isolated populations in Jordan: prevalence, awareness, glycemic control, and associated factors. J. Diabetes Complicat. 26, 175–180. https​://doi.org/10.1016/j.jdiac​omp.2012.03.009 (2012). 42. Saxena, R. et al. Genome-wide association study identifes a novel locus contributing to type 2 diabetes susceptibility in Sikhs of Punjabi origin From India. Diabetes 62, 1746. https​://doi.org/10.2337/db12-1077 (2013). 43. Gosadi, I. M., Goyder, E. C. & Teare, M. D. Investigating the potential efect of consanguinity on type 2 diabetes susceptibility in a Saudi population. Hum. Hered. 77, 197–206. https​://doi.org/10.1159/00036​2447 (2014). 44. Smith, J. M. & Haigh, J. Te hitch-hiking efect of a favourable gene. Genet. Res. 23, 23–35 (1974). 45. Lyons, L. 
A., Foe, I. T., Rah, H. C. & Grahn, R. A. Chocolate coated cats: TYRP1 mutations for brown color in domestic cats. Mamm. Genome Of. J. Int. Mamm. Genome Soc. 16, 356–366. https​://doi.org/10.1007/s0033​5-004-2455-4 (2005). 46. Schmidt-Küntzel, A., Eizirik, E., O’Brien, S. J. & Menotti-Raymond, M. Tyrosinase and tyrosinase related protein 1 alleles specify domestic cat coat color phenotypes of the albino and brown Loci. J. Hered. 96, 289–301. https​://doi.org/10.1093/jhere​d/esi06​6 (2005). 47. Eizirik, E. et al. Defning and mapping mammalian coat pattern genes: multiple genomic regions implicated in domestic cat stripes and spots. Genetics 184, 267–275. https​://doi.org/10.1534/genet​ics.109.10962​9 (2010). 48. D’Mello, S. A., Finlay, G. J., Baguley, B. C. & Askarian-Amiri, M. E. Signaling pathways in melanogenesis. Int. J. Mol. Sci. https​:// doi.org/10.3390/ijms1​70711​44 (2016). 49. Acosta, J. L. et al. Rare intronic variants of TCF7L2 arising by selective sweeps in an indigenous population from Mexico. BMC Genet. 17, 68–68. https​://doi.org/10.1186/s1286​3-016-0372-7 (2016). 50. Maedler, K. Beta cells in type 2 diabetes—a crucial contribution to pathogenesis. Diabetes Obes. Metab. 10, 408–420. https​://doi. org/10.1111/j.1463-1326.2007.00718​.x (2008). 51. Zini, E. et al. Endocrine pancreas in cats with diabetes mellitus. Vet. Pathol. 53, 136–144. https​://doi.org/10.1177/03009​85815​ 59107​8 (2016). 52. Brice, N. L., Varadi, A., Ashcrof, S. J. & Molnar, E. Metabotropic glutamate and GABA(B) receptors contribute to the modulation of glucose-stimulated insulin secretion in pancreatic beta cells. Diabetologia 45, 242–252. https​://doi.org/10.1007/s0012​5-001- 0750-0 (2002). 53. Storto, M. et al. Insulin secretion is controlled by mGlu5 metabotropic glutamate receptors. Mol. Pharmacol. 69, 1234–1241. https ://doi.org/10.1124/mol.105.01839​0 (2006). 54. Yang, C. H. et al. 
E2f8 and Dlg2 genes have independent efects on impaired insulin secretion associated with hyperglycaemia. Diabetologia 63, 1333–1348. https​://doi.org/10.1007/s0012​5-020-05137​-0 (2020). 55. Tsai, F. J. et al. A genome-wide association study identifes susceptibility variants for type 2 diabetes in Han Chinese. PLoS Genet. 6, e1000847. https​://doi.org/10.1371/journ​al.pgen.10008​47 (2010).


56. Chang, Y. C. et al. Replication of genome-wide association signals of type 2 diabetes in Han Chinese in a prospective cohort. Clin. Endocrinol. 76, 365–372. https​://doi.org/10.1111/j.1365-2265.2011.04175​.x (2012). 57. Vogels, A. & Fryns, J. P. Age at diagnosis, body mass index and physical morbidity in children and adults with the Prader-Willi syndrome. Genet. Couns. (Geneva, Switzerland) 15, 397–404 (2004). 58. Sinnema, M. et al. Physical health problems in adults with Prader-Willi syndrome. Am. J. Med. Genet. Part A 155a, 2112–2124. https​://doi.org/10.1002/ajmg.a.34171​ (2011). 59. Stefan, M. et al. Global defcits in development, function, and gene expression in the endocrine pancreas in a deletion mouse model of Prader–Willi syndrome. Am. J. Physiol. Endocrinol. Metab. 300, E909–E922. https://doi.or​ g/10.1152/ajpendo.00185​ .2010​ (2011). 60. Goldstone, A. P., Holland, A. J., Butler, J. V. & Whittington, J. E. Appetite hormones and the transition to hyperphagia in children with Prader–Willi syndrome. Int. J. Obes. 36, 1564–1570. https​://doi.org/10.1038/ijo.2011.274 (2012). 61. Goytain, A., Hines, R. M., El-Husseini, A. & Quamme, G. A. NIPA1(SPG6), the basis for autosomal dominant form of hereditary spastic paraplegia, encodes a functional Mg2+ transporter. J. Biol. Chem. 282, 8060–8068. https://doi.or​ g/10.1074/jbc.M610314200​ ​ (2007). 62. Chan, K. H. K. et al. Genetic variations in magnesium-related ion channels may afect diabetes risk among African American and Hispanic American women. J. Nutr. 145, 418–424. https​://doi.org/10.3945/jn.114.20348​9 (2015). 63. Ebert, M. H., Schmidt, D. E., Tompson, T. & Butler, M. G. Elevated plasma gamma-aminobutyric acid (GABA) levels in individuals with either Prader–Willi syndrome or Angelman syndrome. J. Neuropsychiatry Clin. Neurosci. 9, 75–80. https​://doi.org/10.1176/ jnp.9.1.75 (1997). 64. Morris, A. P. et al. 
Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990. https​://doi.org/10.1038/ng.2383 (2012). 65. Harder, M. N. et al. Type 2 diabetes risk alleles near BCAR1 and in ANK1 associate with decreased β-cell function whereas risk alleles near ANKRD55 and GRB14 associate with decreased insulin sensitivity in the Danish Inter99 cohort. J. Clin. Endocrinol. Metab. 98, E801–E806. https​://doi.org/10.1210/jc.2012-4169 (2013). 66. Gallagher, P. G. & Forget, B. G. An alternate promoter directs expression of a truncated, muscle-specifc isoform of the human Ankyrin 1 gene. J. Biol. Chem. 273, 1339–1348. https​://doi.org/10.1074/jbc.273.3.1339 (1998). 67. Gallagher, P. G., Tse, W. T., Scarpa, A. L., Lux, S. E. & Forget, B. G. Structure and organization of the human Ankyrin-1 gene: basis for complexity of pre-mRNA Processing. J. Biol. Chem. 272, 19220–19228. https​://doi.org/10.1074/jbc.272.31.19220​ (1997). 68. Rubtsov, A. M. & Lopina, O. D. Ankyrins. FEBS Lett. 482, 1–5. https​://doi.org/10.1016/s0014​-5793(00)01924​-4 (2000). 69. Sun, L., Zhang, X., Wang, T., Chen, M. & Qiao, H. Association of ANK1 variants with new-onset type 2 diabetes in a Han Chinese population from northeast China. Exp. Ter. Med. 14, 3184–3190. https​://doi.org/10.3892/etm.2017.4866 (2017). 70. Imamura, M. et al. A single-nucleotide polymorphism in ANK1 is associated with susceptibility to type 2 diabetes in Japanese populations. Hum. Mol. Genet. 21, 3042–3049. https​://doi.org/10.1093/hmg/dds11​3 (2012). 71. Yan, R. et al. A novel type 2 diabetes risk allele increases the promoter activity of the muscle-specifc small ankyrin 1 gene. Sci. Rep. 6, 25105. https​://doi.org/10.1038/srep2​5105 (2016). 72. Kline, C. F. et al. Dual role of KATP channel C-terminal motif in membrane targeting and metabolic regulation. Proc. Natl. Acad. Sci. 106, 16669–16674. https​://doi.org/10.1073/pnas.09071​38106​ (2009). 73. Aslan, O. 
et al. Association between promoter polymorphisms in a key cytoskeletal gene (Ankyrin 1) and intramuscular fat and water-holding capacity in porcine muscle. Mol. Biol. Rep. 39, 3903–3914. https​://doi.org/10.1007/s1103​3-011-1169-4 (2012). 74. Chen, J., Ren, J., Loo, W. T. Y., Hao, L. & Wang, M. Lysyl oxidases expression and histopathological changes of the diabetic rat nephron. Mol. Med. Rep. 17, 2431–2441. https​://doi.org/10.3892/mmr.2017.8182 (2018). 75. Subramanian, M. L. et al. Upregulation of lysyl oxidase expression in vitreous of diabetic subjects: implications for diabetic retin- opathy. Cells 8, 1122. https​://doi.org/10.3390/cells​81011​22 (2019). 76. Linsenmeier, R. A. et al. Retinal hypoxia in long-term diabetic cats. Invest. Ophthalmol. Vis. Sci. 39, 1647–1657 (1998). 77. Al-Shabrawey, M., Ibrahim, A., Beasley, S., Wang, F. & Tawfk, A. Bioactive lipids and early infammatory response in diabetic retinopathy. Acta Ophthalmol. https​://doi.org/10.1111/j.1755-3768.2015.0061 (2015). 78. Li, Z. et al. Overexpression of 15-lipoxygenase-1 in oxygen-induced ischemic retinopathy inhibits retinal neovascularization via downregulation of vascular endothelial growth factor-A expression. Mol. Vis. 18, 2847–2859 (2012). 79. Lee, P. et al. Potential predictive biomarkers of obesity in Burmese cats. Vet. J. 195, 221–227. https​://doi.org/10.1016/j. tvjl.2012.06.027 (2013). 80. Takada, D. et al. Interaction between the LDL-receptor gene bearing a novel mutation and a variant in the apolipoprotein A-II promoter: molecular study in a 1135-member familial hypercholesterolemia kindred. J. Hum. Genet. 47, 656–664. https​://doi. org/10.1007/s1003​80200​101 (2002). 81. Sato, K. et al. Soluble epoxide hydrolase variant (Glu287Arg) modifes plasma total cholesterol and triglyceride phenotype in familial hypercholesterolemia: intrafamilial association study in an eight-generation hyperlipidemic kindred. J. Hum. Genet. 49, 29–34. 
https​://doi.org/10.1007/s1003​8-003-0103-6 (2004). 82. Ohtoshi, K. et al. Association of soluble epoxide hydrolase gene polymorphism with insulin resistance in type 2 diabetic patients. Biochem. Biophys. Res. Commun. 331, 347–350. https​://doi.org/10.1016/j.bbrc.2005.03.171 (2005). 83. Crispin, S. Ocular lipid deposition and hyperlipoproteinaemia. Prog. Retin. Eye Res. 21, 169–224. https​://doi.org/10.1016/S1350​ -9462(02)00004​-6 (2002). 84. Hardman, C. Lipid aqueous as a sign of hyperlipidaemia in Burmese cats. 261 (1999). 85. Kluger, E. K. et al. Triglyceride response following an oral fat tolerance test in Burmese cats, other pedigree cats and domestic crossbred cats. J. Feline Med. Surg. 11, 82–90. https​://doi.org/10.1016/j.jfms.2008.05.005 (2009). 86. Ginzinger, D. G. et al. A mutation in the lipoprotein lipase gene is the molecular basis of chylomicronemia in a colony of domestic cats. J. Clin. Investig. 97, 1257–1266. https​://doi.org/10.1172/JCI11​8541 (1996). 87. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421. https​://doi.org/10.1186/1471-2105-10-421 (2009). 88. Willet, C. E. & Haase, B. An updated felCat5 SNP manifest for the Illumina Feline 63k SNP genotyping array. Anim. Genet. 45, 614–615. https​://doi.org/10.1111/age.12169​ (2014). 89. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7. https​://doi. org/10.1186/s1374​2-015-0047-8 (2015). 90. Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354. https​://doi.org/10.1038/ng.548 (2010). 91. Barrett, J. C., Fry, B., Maller, J. & Daly, M. J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265. https​://doi.org/10.1093/bioin​forma​tics/bth45​7 (2004). 92. Wang, N., Akey, J. M., Zhang, K., Chakraborty, R. & Jin, L. 
Distribution of recombination crossovers and the origin of haplo- type blocks: the interplay of population history, recombination, and mutation. Am. J. Hum. Genet. 71, 1227–1234. https​://doi. org/10.1086/34439​8 (2002). 93. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 25, 1754–1760. https​://doi.org/10.1093/bioin​forma​tics/btp32​4 (2009).


94. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303. https://doi.org/10.1101/gr.107524.110 (2010). 95. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498. https://doi.org/10.1038/ng.806 (2011). 96. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122. https://doi.org/10.1186/s13059-016-0974-4 (2016). 97. Ng, P. C. & Henikoff, S. Predicting deleterious amino acid substitutions. Genome Res. 11, 863–874. https://doi.org/10.1101/gr.176601 (2001).

Acknowledgements

Funding for this project was provided by Sydney School of Veterinary Science- Constance H. Aird Bequest, Australian Companion Health Fund Ref No.:023/2013 and the Feline Health Research Fund. The authors acknowledge the contributions of Tim Gruffydd-Jones and Sarah Pierard, who provided samples, the cat owners, and Frank Nicholas for his insightful advice.

Author contributions

B.H., C.M.W., J.B., and G.S. conceptualised the research and contributed to the progression of overall aims. B.H. and C.M.W. were responsible for the provision of resources. Experimental work, data analysis and visualisation were conducted by G.S. under the supervision of B.H. and C.M.W. Samples were provided by L.A.L., J.B. and L.M.F. Clinical diagnosis was performed by J.B. and L.M.F. The manuscript was written by G.S.; all authors read, edited and approved the final manuscript.

Competing interests

The authors declare no competing interests.

Additional information

Supplementary information is available for this paper at https://doi.org/10.1038/s41598-020-76166-3.

Correspondence and requests for materials should be addressed to G.S.

Reprints and permissions information is available at www.nature.com/reprints.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2021

Chapter 3 Cross-species applications of the feline reference genome for the benefit of conservation research

3.1 Synopsis- Exploiting genomic synteny in Felidae: using cross-species genome alignment and SNV discovery to inform conservation management in big cats

While genomics has enabled vast improvements in the quantification of genome-wide diversity and the identification of adaptive and deleterious alleles in model species, wildlife species have not reaped the same benefits. In resource-constrained species, alternative genomic approaches that reduce costs and computational resources can be used to inform conservation management. In the absence of a species-specific reference genome, a reference genome from a closely related species can provide a reliable substrate for variant discovery. Domestic cats benefit from high-quality reference genome assemblies and share a high degree of genomic synteny with their wild counterparts. In this chapter, I demonstrate the cross-species application of the most recent feline reference genome assembly for variant calling and annotation in the Sumatran tiger, snow leopard and cheetah. This dataset offers insights into species-specific genetic adaptations associated with domestication and the evolutionary success of big cats as hypercarnivores, and supports the use of cross-species genome alignment methods for variant discovery in felids. This chapter is presented in the format of a manuscript that has been submitted to the journal BMC Genomics.


Exploiting genomic synteny in Felidae: using cross-species genome alignment and SNV discovery to inform conservation management in big cats

Georgina Samaha1*, Claire M. Wade2, Hamutal Mazrier1, Catherine E. Grueber2ƚ & Bianca Haase1ƚ

1Sydney School of Veterinary Science, Faculty of Science, University of Sydney, NSW, Australia

2School of Life and Environmental Sciences, The University of Sydney, NSW, Australia

*Corresponding author: [email protected]

ƚ These authors share senior authorship

Abstract

Background: While recent advances in genomics have enabled vast improvements in the quantification of genome-wide diversity and the identification of adaptive and deleterious alleles in model species, wildlife and non-model species have largely not reaped the same benefits. This has been attributed to the resources and infrastructure required to develop essential genomic datasets such as reference genomes. In the absence of a high-quality reference genome, cross-species alignments can provide reliable, cost-effective methods for single nucleotide variant (SNV) discovery. Here, we demonstrate the utility of cross-species genome alignment methods in gaining insights into population structure and functional genomic features in cheetah (Acinonyx jubatus), snow leopard (Panthera uncia) and Sumatran tiger (Panthera tigris sumatrae), relative to the domestic cat (Felis catus).

Results: Alignment of big cats to the domestic cat reference assembly yielded nearly complete sequence coverage of the reference genome. From this, 38,839,061 variants in cheetah, 15,504,143 in snow leopard and 13,414,953 in Sumatran tiger were discovered and annotated. This method was able to delineate population structure but was limited in its ability to detect rare variants. Enrichment analysis of fixed and species-specific SNVs revealed insights into adaptive traits, evolutionary history and the pathogenesis of heritable diseases.

Conclusions: The high degree of synteny among felid genomes enabled the successful application of the domestic cat reference in high-quality SNV detection. The datasets presented here provide a useful resource for future studies into population dynamics, evolutionary history and genetic and disease management of big cats. This cross-species method of variant discovery provides genomic context for identifying annotated gene regions essential to understanding adaptive and deleterious variants that can improve conservation outcomes.

Keywords Genomics, conservation, felids, cheetah, tiger, snow leopard, cat, cross-species, SNV, WGS

Background

As natural habitats and ecosystems are increasingly impacted by anthropogenic events, a growing number of species require some form of ex situ management to prevent extinction. Big cat species are amongst the most vulnerable, having experienced dramatic population declines as a result of illegal poaching, trade and . All 38 wild felid species have a negative global population trend (1). Despite these challenges, wild felids are revered as cultural symbols and are important flagship species for engendering public interest in conservation programmes. As keystone species, strategies for the protection of large felids also benefit their ecosystems (2). Allelic variation is essential to preserving species’ genetic integrity and maintaining functioning ecosystems, as genetically diverse populations tend to have higher fitness and adaptive capacity (3). Given their conservation status, many big cats are the subjects of Species Survival Plans (SSP) (4, 5): internationally coordinated programmes that manage ex situ breeding and aim for healthy, self-sustaining and genetically diverse populations. Retaining genetic diversity in a closed, captive population is challenging and expensive (6, 7). Reductions in genetic diversity due to inbreeding, genetic drift and selection can lead to an accumulation of deleterious mutations in a captive-bred population, threatening its long-term viability (8, 9).

With genomic resources becoming increasingly accessible for a wide diversity of species, single nucleotide variant (SNV)-based genetic analyses offer conservationists higher resolution for measuring diversity and addressing conservation questions than was previously possible (10). In model species, SNVs are the genetic marker of choice for the advancement of functional, quantitative and evolutionary research (11-13). Their high frequency across coding and non-coding regions, low typing error rates and ease of comparability across datasets make them a favourable alternative or complement to low density markers such as microsatellites, allozymes and mtDNA (14). Genome-wide SNV discovery methods have enabled the transition from estimates of inbreeding using pedigree data and microsatellite markers to the direct quantification of inbreeding via genome-wide scans of individual homozygosity (15-17). The use of genome-wide SNV data in big cats has illuminated patterns of genetic variation relating to population history, physiological adaptation and speciation (18-22). However, there remains a notable gap between these studies and the integration of genomics into the conservation management of these species. Recent reviews have highlighted several barriers to the widespread uptake of genomic data in conservation, including high costs associated with sequencing and sampling, lack of computational infrastructure, need for specialist bioinformatic expertise, and the absence of genomic resources (i.e. reference data) for non-model species (23, 24). Here, ‘non-model species’ refers to those with limited genomic resources, specifically reference genomes.

A shortage of high-quality reference genomes is often cited as a barrier to SNV discovery and genotyping in non-model species (24-26). Reference-guided SNV discovery is more computationally efficient than methods that do not use a reference, offering higher accuracy at lower sequencing depths, as well as the ability to physically map and determine linkage disequilibrium between SNVs. The genomic context provided by a reference genome allows for identifying annotated gene regions essential to understanding any potential adaptive or deleterious variant consequences. Felid genomics has been studied more extensively than that of many other carnivorous genera: a number of domestic cat reference assemblies have been developed since 2007, followed by draft assemblies of tiger (Panthera tigris) (20), cheetah (Acinonyx jubatus) (18), leopard (Panthera pardus) (27), jaguar (Panthera onca) (21) and lion (28). These studies required considerable financial, computational and bioinformatic resources, which are rarely available to conservation managers. High-quality genome assemblies are differentiated from draft-quality assemblies by their lower error rates, fewer gaps (e.g. chromosome-level assembly), and high-quality annotations. Recently, a correction of the tiger draft assembly was published (29), highlighting the potential for inadequately validated draft assemblies to bias the outcomes of genomic observations and biological conclusions.

Cross-species reference-guided methods (30, 31), reduced-representation libraries (19, 32, 33) and pooled sequencing (poolseq) (34, 35) have successfully been used to study genomics in non-model species, circumventing the costs, computational resources and extensive sampling required to develop a high-quality reference genome. Here, we performed cross-species whole genome sequence (WGS) alignment and SNV discovery using the domestic cat (Felis catus) reference genome assembly for three big cat species: Sumatran tiger (Panthera tigris sumatrae), snow leopard (Panthera uncia) and cheetah. We compare the utility of non-barcoded pooled versus individual WGS data for cross-species alignment and SNV discovery. We present an annotated catalogue of high-quality variants from seventeen whole genome sequences comprising twenty-six individuals across the three species. We show that reference genomes from distantly related species can be successfully used for SNV discovery to inform conservation management. Gene enrichment and gene diversity analyses provided a proof of principle of the effectiveness of cross-species application of the domestic cat reference genome in variant calling in big cat species.

Results

Cross-species WGS alignment and variant calling using the felCat9 reference assembly

Genomic DNA of four Sumatran tigers, four cheetahs and four snow leopards was pooled by species in equimolar ratios and sequenced. Whole genome sequences for seven Sumatran tigers, six cheetahs and one snow leopard were downloaded from the Sequence Read Archive (SRA) (36). Individual and pooled samples were aligned to the felCat9 (37) reference assembly. The number of reads in each species pool was 830 M for cheetah, 960.7 M for snow leopard and 896.7 M for Sumatran tiger. Overall sequencing performance is summarised in Table S1 and results for alignments and variant calling for all individuals and pools are presented in Table 1. An average of 170 M cheetah reads, 627 M snow leopard reads, and 251 M Sumatran tiger reads were mapped to the Felis_catus_9.0 (felCat9; GCA_000181335.4) reference assembly. On average, 94% of cheetah, 93% of snow leopard and 95% of Sumatran tiger reads were properly paired and mapped to felCat9 chromosomes (Figure S1). The proportion of successfully paired and mapped reads was lower for Sumatran tiger and cheetah pools compared with their individually sequenced counterparts. The read depth for all samples ranged between 5.56x and 35.91x. Genome coverage was highest over a greater portion of bases for all species pools compared with their SRA counterparts (Figure S2). Read coverage for species pools ranged from 25.78x to 50.14x and alignment of species pools to the felCat9 reference assembly resulted in greater than 90% coverage of the reference at a minimum depth of 20x compared with a depth of ~5x, ~7x and 10x in cheetah, Sumatran tiger and snow leopard individual samples, respectively. To compare the utility of non-barcoded pooled versus individual WGS, variant calling was performed in diploid mode for all samples. Across Sumatran tiger and cheetah samples, fewer biallelic variants were called for individuals than for their pooled counterparts, while the snow leopard pool and individual samples showed approximately equal numbers of variants (Table 1). Individual nucleotide diversity (π) is reported under the assumption that both parental sets of chromosomes have been sequenced to equivalent coverage depths. This statistic was similar among pools and individuals for Sumatran tigers and snow leopards; however, the cheetah pool displayed markedly higher π compared with all sequenced individuals and with pools for the other two species. Despite having the lowest average read coverage, cheetahs had the highest proportion of reads mapped and high-quality SNVs called against the felCat9 reference assembly (Table 1). Cheetahs had a higher density of SNVs per kilobase across all felCat9 chromosomes than Sumatran tigers and snow leopards.
The cheetah and Sumatran tiger pools had a significantly higher density of SNVs across all chromosomes compared with individuals of the respective species. In total, 13,414,953, 15,504,143 and 38,839,061 biallelic SNVs passed quality filtering in Sumatran tiger, snow leopard and cheetah, respectively. Of these, 10,472,528 in Sumatran tigers, 9,124,699 in snow leopards and 26,430,702 in cheetahs were transitions (Ts) and 5,030,622 in Sumatran tigers, 4,285,891 in snow leopards and 12,258,571 in cheetahs were transversions (Tv). Ts/Tv ratios for pooled samples were higher than individual samples (Table 1). Sumatran tiger individuals had a mean Ts/Tv ratio of 1.7 (σ = 0.06) while the Sumatran tiger pool reported a Ts/Tv ratio of 2.09. Cheetah individuals similarly reported a lower mean Ts/Tv ratio of 1.8 (σ=0.02) compared with their pooled counterparts, which had a Ts/Tv ratio of 2.1, potentially indicating a higher rate of false variants in the individual sample variant sets. Ts/Tv ratios for species variant sets ranged from 2.08 to 2.15.
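As a worked illustration of the Ts/Tv statistic reported above, the following minimal Python sketch classifies biallelic SNVs as transitions (purine to purine, or pyrimidine to pyrimidine) or transversions and computes their ratio. The function names and the toy input are illustrative assumptions; they are not part of the pipeline used in this study, which relied on VCFtools.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True if a biallelic SNV is a transition, False if a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid SNV: {ref}>{alt}")
    pair = {ref, alt}
    # Both bases in the same chemical class means a transition.
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs):
    """Compute Ts/Tv from an iterable of (ref, alt) pairs."""
    snvs = list(snvs)
    ts = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ts
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv


# Hypothetical toy data: two transitions and one transversion.
example = [("A", "G"), ("C", "T"), ("A", "C")]
print(ts_tv_ratio(example))
```

Because random sequencing errors are expected to produce transversions about twice as often as transitions, a depressed genome-wide Ts/Tv ratio is commonly read as a sign of false-positive calls, which is the interpretation applied to the individual sample variant sets above.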

Population structure and demographic statistics

Population structure and demographic statistics were calculated using minor allele frequency (MAF) filtered datasets for all species. Multi-dimensional scaling (MDS) was used to partition total genomic variation among individuals in each species group (Figure 1). There was clear differentiation between individuals and pools in all three species, and among individuals of different geographical provenance. The first component (C1) corresponded to the axis of differentiation among Tanzanian and Namibian cheetahs, with the pool distinctly clustering with Namibian cheetahs. Among Sumatran tigers, the second component (C2) accounted for differentiation between Indonesian (IDN) and American (USA) individuals, with two distinct clusters of Indonesian individuals separated by C1. Estimates of genetic variability and pairwise similarity were measured in Sumatran tiger and cheetah individuals (only one individual sample was available for snow leopard) (Table 2). Co-ancestry coefficients (ϴ) were calculated as the probability of finding identical alleles when randomly sampling one allele from each heterozygous individual and suggested unrelated kinship among all Sumatran tiger individuals. Expected (He) and observed heterozygosity (Ho) were calculated to measure genetic diversity of each population. Ho was lower than He across all samples, indicating a deviation from Hardy-Weinberg equilibrium (HWE) and possible inbreeding (non-random mating). The mean individual inbreeding coefficient (F) was 0.346±0.062 (range: 0.259-0.419) among cheetahs and 0.585±0.033 (range: 0.233-0.273) among Sumatran tigers. These results may reflect close relationships among samples within each species; however, this cannot be confirmed as pedigree data were not made available. The proportion of pairwise identity by state (IBS) allelic similarity among cheetahs ranged from 0.394 to 0.472 (μ=0.437±0.028). Mean IBS was 0.412 among Namibian cheetahs and 0.398 among Tanzanian cheetahs. Among Sumatran tigers, IBS ranged from 0.392 to 0.521 (μ=0.5±0.026).
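To make the relationship between heterozygosity and the inbreeding coefficient concrete, the sketch below computes expected heterozygosity at a biallelic locus under HWE and a per-individual F. It assumes the method-of-moments form F = (O − E) / (N − E), where O and E are the observed and expected numbers of homozygous sites and N is the number of genotyped sites; this is the estimator we understand VCFtools to report, and the function names and example numbers are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity (He) at a biallelic locus under HWE: 2p(1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * p * (1.0 - p)


def inbreeding_coefficient(observed_hom, expected_hom, n_sites):
    """Method-of-moments F from counts of homozygous sites.

    F > 0 indicates an excess of homozygosity (fewer heterozygotes than
    expected under HWE); F < 0 indicates an excess of heterozygotes.
    """
    denominator = n_sites - expected_hom
    if denominator <= 0:
        raise ValueError("expected homozygous sites must be fewer than total sites")
    return (observed_hom - expected_hom) / denominator


# Hypothetical individual: 1,000 sites, 600 expected and 700 observed homozygous.
print(expected_heterozygosity(0.5))             # He at p = 0.5
print(inbreeding_coefficient(700, 600, 1000))   # excess homozygosity
```

Under this form, any factor that inflates apparent homozygosity, including preferential alignment to reference alleles, raises F even without true inbreeding.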

Functional annotation of genomic variants

Variant datasets were annotated using Variant Effect Predictor (VEP) in order to identify coding variants of potential functional significance. VEP assigned 29,059,874, 25,194,581 and 73,915,269 functional classes to SNVs in snow leopards, Sumatran tigers and cheetahs, respectively, based on the felCat9 reference assembly annotation. The number of functional classes defined by VEP is higher than the total number of SNVs because some sites have multiple annotations. The quantity of quality-filtered, fixed and MAF-filtered variants annotated by VEP varied among species; however, the number of transcripts and genes overlapped was consistent across all species (Table 3). Functional annotation of transcript-associated variants was also similar across all three species, with over 60% of coding variants labelled as synonymous for all species (Figure 2). A summary of functional annotation of MAF-filtered and fixed SNVs is provided for each species dataset (Table S2).

Cheetah MAF-filtered variants were enriched for over 90 terms that were reduced to best representative terms: macromolecular localisation (GO:0033036; Padj=5.67x10^-9), regulation of biological quality (GO:0065008; Padj=7.36x10^-8), cytoskeleton organisation (GO:0007010; Padj=5.09x10^-9), developmental process (GO:0032502; Padj=7.91x10^-17) and cytoskeleton (GO:0005856; Padj=9.20x10^-12) (Table S3a). In Sumatran tigers, variants passing the MAF-filter were enriched for protein binding (GO:0005515; Padj=3.54x10^-4), cilium (GO:0005929; Padj=1.53x10^-4), and collagen-containing extracellular matrix (GO:0062023; Padj=3.53x10^-4) (Table S3b). The snow leopard MAF-filtered dataset was not found to be functionally enriched. Calling variants in diploid mode from one snow leopard individual and one pool has likely confounded the number of fixed and MAF-filtered variants in snow leopards. In snow leopards, the lack of significant functional enrichment among genes containing MAF-filtered variants may be attributed to small sample size.

Genomic signatures of adaptation among big cats

Signatures of selection within each species were tested using nucleotide and gene diversity of synonymous and non-synonymous SNVs. Most genes had a ratio of non-synonymous to synonymous nucleotide diversity (πN/πS) < 1, while 54 genes in cheetahs, six genes in Sumatran tigers and one gene in snow leopards revealed signs of positive selection (Table S4). All genes displaying signatures of positive selection in tiger and snow leopard were involved in olfaction. In cheetahs, genes showing signs of selection were involved in olfaction (LOC101084218, LOC101085032, LOC101086178, LOC101095034 and LOC101101377) and immune responses (TLR3, SYTL2, OAS3 and RAB44). Genes under positive selection also included dynein axonemal heavy chain genes DNAH2 and DNAH6, involved in flagellum-dependent cell motility (GO:0001539), and SGCG and XIRP1, expressed exclusively in skeletal muscle. SNVs fixed in each species group were collected and annotated to identify genes potentially involved in species-specific phenotypic signatures of adaptation. This also served to highlight phenotypic differences between the domestic cat and big cat species. In cheetahs, genes harbouring homozygous SNVs related to KEGG pathways were: HEXB, HARS1, PPP2CA, TGDS, ALG11 and PDE4D. Fixed missense alleles in cheetahs also occurred in ACTN3, previously associated with athletic performance in humans and horses (38, 39). When reduced to the most representative subset of GO terms by semantic similarity, a total of 144 representative GO terms were analysed for fixed SNVs in Sumatran tigers and 95 in snow leopards (Table S5a,b). In Sumatran tigers, genes harbouring fixed variants were enriched for metabolic pathways (KEGG:01100; Padj=1.94x10^-17). Fixed non-synonymous alleles shared by snow leopards and Sumatran tigers were enriched for growth (GO:0040007), locomotion (GO:0040011), and developmental process (GO:0032502). These included missense variants in LCORL, a gene previously associated with body size, that were unique to Panthera species (Table S6). Pathway enrichment analysis was performed to gain mechanistic insight into genes containing fixed variants. Fixed non-synonymous variants in 418 genes common to all three big cat species were annotated for cadherin signalling (Padj=1.51x10^-5) and Wnt signalling (Padj=0.001) pathways. Genes in the protocadherin family displayed fixed allelic differences between all big cats and the domestic cat genome (Table S7).
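The πN/πS criterion used above can be sketched as follows. This is an illustrative reading only: the per-gene diversity values are hypothetical, the function names are our own, and in this study the underlying πN and πS estimates were produced by SNPGenie rather than by code of this kind.

```python
def pin_pis_ratio(pi_n, pi_s):
    """Ratio of non-synonymous to synonymous nucleotide diversity for one gene.

    Returns None when pi_s is zero, because the ratio is then undefined.
    """
    if pi_n < 0 or pi_s < 0:
        raise ValueError("nucleotide diversity cannot be negative")
    if pi_s == 0:
        return None
    return pi_n / pi_s


def selection_signal(pi_n, pi_s):
    """Label a gene by comparing its piN/piS ratio with the neutral value of 1."""
    ratio = pin_pis_ratio(pi_n, pi_s)
    if ratio is None:
        return "undefined"
    if ratio > 1:
        return "candidate positive selection"
    if ratio < 1:
        return "consistent with purifying selection"
    return "neutral expectation"


# Hypothetical per-gene estimates (pi_n, pi_s).
genes = {"geneA": (0.002, 0.001), "geneB": (0.001, 0.004), "geneC": (0.001, 0.0)}
for name, (pi_n, pi_s) in genes.items():
    print(name, selection_signal(pi_n, pi_s))
```

Genes without synonymous diversity are reported as undefined rather than forced into either class, since a zero denominator is common when few variants fall within a gene.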

Genetic insights into heritable conditions affecting big cats

To identify genes potentially underlying conditions of clinical importance in each species, a non-redundant list of genes categorised by GO terms relevant to known heritable conditions, immune and reproductive function was collected for each species. Genes containing deleterious variants (SIFT score 0-0.05) were collected from MAF-filtered datasets and observed in 201 genes annotated for relevant GO terms in cheetahs, in six genes in snow leopards and in 44 genes in Sumatran tigers (Table S8). In cheetahs, these genes were annotated for spermatogenesis (GO:0007283), cilium assembly (GO:0060271), sperm flagellum (GO:0036126), B cell mediated immunity (GO:0019724) and embryo development (GO:0009790) (Figure S3a). In cheetahs, genes housing variants previously associated with known disease relating to ciliary dysfunction included PCDH15, HOMER2, SPEF2, NAGLU, PHGDH, ATR and ABL1. In Sumatran tigers, genes containing deleterious variants were restricted to terms relating to cilium structure and assembly (GO:0060271, GO:0032420) (Figure S3b) and ATR was also the only gene included in Sumatran tiger phenotype-associated genes. Snow leopard genes containing deleterious variants were: ADAM29, ADCY10, CCNYL1, CCT3, KIF20B and LAMB1.

Discussion

The cross-species application of the domestic cat reference presented here takes advantage of the high degree of synteny between Felidae genomes. Previously, cheetah (18) and tiger (20, 22) genomes have shown a high level of conserved synteny and repeat composition when compared with the domestic cat. This similarity is supported by the high alignment quality of all samples included here. Successful alignment of big cat samples and pools resulted in >99% coverage of the felCat9 reference assembly at varying sequence depths. Despite differences in phylogenetic distances between Acinonyx and Panthera lineages and domestic cats, the proportion of reads that aligned for each species did not appear to decrease with phylogenetic distance. The high affinity alignment presented here indicates strong genomic conservation within Felidae. This likely reflects the relatively contemporary speciation of modern felids, occurring <11 MYA (40). Genomic synteny between species provides opportunities to interpret genomic structure and gene function in an evolutionary context. The high degree of synteny observed here is of crucial importance in studying diversity among felid species and mechanisms underlying local adaptation that differ among them (21, 41, 42). Additionally, the highly conserved synteny of the cat genome with that of humans and other mammalian species has given insight into ancestral genome organisation (43-45), supporting the cat as a valuable biomedical model for heritable human diseases (46-49).

Estimates of heterozygosity are vulnerable to reference bias

Demographic parameters based on estimates of heterozygosity were inconsistent with previous studies. This may result from small sample sizes, unknown relatedness among samples and the cross-species genome alignment method employed here. While felid genomes may be highly homologous, and previous reports indicate no large-scale chromosomal rearrangements among felids (50), reference sequence bias can substantially impact estimates of heterozygosity and downstream population genomic analyses (51, 52). Cross-species alignments have been found to bias heterozygosity estimates while correctly measuring population structure (28, 53). Cheetahs were previously found to have a lower rate of heterozygous SNVs when aligned to the draft cheetah assembly (0.0019-0.0021 (18), compared with 0.012 when aligned to the domestic cat reference herein). A similar pattern was also observed for the snow leopard (0.0002 (20), compared with 0.004 herein). In genomic studies, reference-guided variant calling will always be biased toward the properties of the reference genome, rather than those shared across a population. Reference genomes are idiosyncratic type-specimens, and preferential alignment of genomic sequences to the reference alleles results in underestimating the level of variation in aligned samples from different populations. This problem is demonstrated by the higher SNV frequency in cheetahs (more closely related to the domestic cat), compared with snow leopards and Sumatran tigers. When aligned to its own species’ reference assembly, the cheetah genome displays lower overall SNV frequency and a significantly higher proportion of homozygous stretches than the domestic cat (18). Any demographic statistics that rely on low-frequency variants may be affected by this bias, which likely accounts for the high inbreeding coefficients and low heterozygosity observed here. Population structure analyses driven by common variants are largely unaffected by reference bias, as MDS showed samples of the same geographic provenance clustered together. Multi-genome alignment techniques that overcome these biases are available in humans, but these resources will likely remain scarce in the context of big cat conservation as they require large sample sets (54-56). We recommend against using demographic inferences based on cross-species genome alignments to inform conservation management.

Poolseq is a cost-effective means of SNV discovery

We compared sequencing of individual genomes with sequencing of pooled DNA from multiple (N=4) individuals (poolseq). After quality filtering, pooled samples yielded more SNVs than lower coverage individual WGS (Sumatran tiger and cheetah) and an equivalent number of SNVs to high coverage individual WGS (snow leopard). Poolseq can be used to provide a high-quality approach for genotyping the collective genomic profile of a population, comparable to population-level allele frequency estimates of individual WGS (57). Poolseq is a cost-effective method for assessing population structure and genome-wide patterns of variation (58, 59), but does present statistical challenges in deriving estimates of demographic and other inferences that rely on individual heterozygosity. Estimating allele frequencies from poolseq is vulnerable to experimental noise and bias at a number of protocol stages, from pooling equimolar ratios of DNA (60) through to library construction (61), sequencing and analysis. Poolseq is a less efficient technique for discovering SNVs than individual WGS when coverage is low and sample sizes are small (57). Previous studies have shown that effects of experimental error are greater when a pool is small (N < 10) and sequencing depth is low (62). Suitably large sample sizes (N>50) are often unfeasible for conservationists working with endangered species, particularly large carnivores that exist in low densities in both the wild and ex situ management settings. To compensate for small pool size (N=4), pools were sequenced at high coverage, and quality-based and MAF-based filtering of SNVs was performed. MAF thresholds for each species group were used to exclude variants present in a heterozygous state in one individual only. This approach can be used to reliably characterise the allele frequency spectrum; however, the downside is that attempts to control for high error rates will have excluded low-frequency or rare alleles. As a result of excluding these alleles, we did not include poolseq samples in estimations of demographic parameters. Choosing poolseq under these circumstances means accepting the trade-off of losing information about rare alleles in favour of a cost-effective estimation of genome-wide allele frequency of a population.
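The logic of a MAF threshold that excludes variants seen as a single heterozygous call can be illustrated with the sketch below. It assumes a cohort of diploid genotype calls, in which one heterozygote contributes one minor allele out of 2N chromosomes; the function names are hypothetical and the thresholds actually applied to each species are those given in Table 2, which this sketch does not reproduce.

```python
from fractions import Fraction


def singleton_frequency(n_samples):
    """Allele frequency of a variant seen once as a heterozygote among n diploid samples."""
    if n_samples < 1:
        raise ValueError("at least one sample is required")
    return Fraction(1, 2 * n_samples)


def passes_maf_filter(minor_allele_count, n_samples):
    """Keep a variant only if its MAF exceeds the single-heterozygote frequency."""
    total_alleles = 2 * n_samples
    if not 0 <= minor_allele_count <= n_samples:
        raise ValueError("minor allele count must lie between 0 and n_samples")
    maf = Fraction(minor_allele_count, total_alleles)
    return maf > singleton_frequency(n_samples)


# Hypothetical cohort of seven diploid samples (14 chromosomes).
print(singleton_frequency(7))    # frequency of a single heterozygous call
print(passes_maf_filter(1, 7))   # single heterozygote: excluded
print(passes_maf_filter(2, 7))   # two copies of the minor allele: retained
```

Exact fractions are used so that the comparison at the threshold is not affected by floating-point rounding. The cost of such a filter is the one described above: true rare alleles are discarded along with likely errors.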

Genomic signatures into big cat hypercarnivory

Considering our data in the genomic and phenotypic context of the domestic cat has highlighted a suite of defining characteristics of domestication. These characteristics typically include neurological and behavioural changes associated with tameness, a shift toward a polyoestrous reproductive cycle, altered dietary habits and morphological changes (63). Fixed variants presented here highlight differences in metabolic function, body size and neurological processes in wild and domestic cats. Changes in neural crest-related genes are believed to underlie the evolution of tameness across a range of domesticated species (42, 64-67). The expression of cell adhesion proteins including cadherins and protocadherins during neural crest cell development is regulated by Wnt signalling (68, 69). Fixed differences in protocadherin genes are consistent with comparative studies in wildcats (Felis silvestris) (42) and domesticated foxes (70). As hyper-carnivorous ambush predators, cats share physiological traits essential for hunting and endogenous glucose demands (66, 71, 72). Fixed variants in big cats reveal adaptive physiological functions essential to their evolutionary success as carnivorous species (73, 74). In cheetahs this included genes involved in spatiotemporal awareness (HARS1) (75) and skeletal muscle function (ACTN3, SACS, MEGF10, SGCG and XIRP1) (38, 76-78). Among Sumatran tigers and snow leopards these included unique missense variants in LCORL, a gene previously associated with body size in domestic mammals (13, 79, 80). These results reflect species-specific genetic adaptations associated with hyper-carnivory, highlighting candidate genes underlying species-specific adaptive mechanisms integral to the evolutionary success of big cats. Carnivorous diets are associated with increased metabolism, faster growth rates and higher fecundity (27, 81) and are dependent on an abundance of prey species. Habitat loss and prey depletion threaten the ecological niche occupied by big cats as keystone species (82). Ongoing genomic studies of these traits can highlight mechanisms by which big cats interact with their ecosystems, complementing ecological studies and serving as an essential component of holistic management plans (83-86).

Genomic insights into the pathogenesis of diseases affecting big cats

Adding to the ecological complexities of big cat conservation is the impact of infectious and heritable diseases (87, 88). Captive cheetahs, snow leopards and Sumatran tigers have historically presented a range of infectious and degenerative diseases, while their wild counterparts have remained unaffected (87, 89, 90). Captive breeding programmes typically operate to maintain within-population genetic diversity; however, for many threatened species, population bottlenecks in the wild have resulted in genetically depauperate populations that display impaired fitness and increased susceptibility to infectious diseases (15, 91). Domestic animals are increasingly used to model complex and simple genetic diseases in humans (46, 48, 92). The methods employed by these studies, including genome-wide association studies, can be adapted by conservationists studying the genetic basis of heritable diseases. As an example, candidate genes relevant to health and reproductive success studied here can be used to inform studies of the genetic basis of documented conditions of ciliary dysfunction, including congenital vestibular disease (CVD) in Sumatran tigers (89), poor spermatozoan quality in cheetahs (93) and chronic respiratory infections (87, 88), by testing whether any of these variants are associated with clinical findings. Sumatran tiger cubs in Australian zoos have been reportedly affected by CVD with a heritable component since 1990. Pedigree and segregation analyses suggested an autosomal dominant mode of inheritance with incomplete penetrance of CVD within the Australian population (89). Stereocilia in the vestibular system play a functional role in spatial navigation and self-motion perception (94) and their dysfunction can cause neurological symptoms, such as those observed in the Sumatran tiger. Genes containing deleterious variants of potential clinical significance identified here included: SPEF2, previously implicated in primary ciliary dyskinesia (95) and vestibular stereocilia function (96); HOMER2, a stereociliary scaffolding protein essential for normal hearing and vestibular function in humans and mice (97, 98); and PCDH15, implicated in Usher syndrome in humans and balance disorders and deafness in mice (99). Similar enrichment of ciliary genes was observed in the cheetah with deleterious variants observed in seven genes (IFT140, CPLANE1, DYNC2H1, CCDC39, CC2DA, and RPGRIP1L) involved in cilium development and structure. Studies in mice and humans have shown defects in ciliary structure and dysmotility are typically present from birth, with affected individuals suffering recurrent respiratory infections and poor fertility (100, 101). Respiratory infections have long been observed as a significant cause of mortality in captive cheetahs (87, 102). Data presented here can be used to pinpoint mechanistic bases of ciliary dysfunction in mucociliary clearance of the respiratory tract, paranasal sinuses and middle ear during respiratory infections. Eighteen genes known to cause primary ciliary dyskinesia in humans (103) were found to contain missense variants in the cheetah dataset; of these, CCDC39, DNAH8 and LRRC6 contained deleterious variants. These genes are highly conserved between felids and humans and as such may serve as valuable candidates for understanding the pathogenesis of reproductive, respiratory and vestibular diseases present in big cat populations and aid in improved diagnosis and treatment by veterinarians.

Conclusion

We have demonstrated the utility of cross-species genome alignments in gaining insights into population structure and functional genomic features in big cat species. The datasets presented here provide a useful resource for future studies into population dynamics, evolutionary history and genetic and disease management of big cats. The high degree of synteny among felid genomes enabled the successful application of the domestic cat reference for visualising population structure, discovering variants associated with adaptive traits, genes under selection and pathogenesis of heritable diseases. Importantly, however, this method is limited in its capacity to adequately quantify heterozygosity and low-frequency variants. Poolseq proved a low-cost method for genotypic profiling of each species. This cross-species method of variant discovery provides genomic context for identifying annotated gene regions essential to understanding the genomic landscape underpinning traits and diseases that can be used to improve conservation outcomes.

Methods

Animals and DNA sequencing

Whole genome sequences for seven Sumatran tigers, six cheetahs and one snow leopard were downloaded from the Sequence Read Archive (SRA) in fastq format (Table S9). Whole blood samples from four snow leopards, four cheetahs and four Sumatran tigers currently housed in Australian zoos were collected as a part of routine health examinations by registered veterinarians employed by each zoo and submitted to the University for infectious disease screening. Genomic DNA was isolated from whole blood by phenol-chloroform extraction and pooled by species in equimolar ratios (4 individuals/species pool). Library preparation and whole genome sequencing of species pools was performed by the Ramaciotti Centre for Genomics, University of New South Wales (Kensington, Australia). Illumina paired-end libraries were prepared and sequenced on the Illumina HiSeq 2000 platform with 150 bp paired-end reads.

Reference genome alignment, variant calling and filtering

All samples were aligned to the felCat9 reference assembly using BWA-MEM (104). Aligned reads were sorted using Samtools 1.9 (105). Base quality score recalibration was performed using Genome Analysis ToolKit (GATK) (106). Summary statistics of alignments were collected using the Samtools stats function. GATK best practices (107) were used for SNV and short indel calling. For each sample, GATK’s HaplotypeCaller tool was used to call variants from the recalibrated bam file. GVCF files containing unfiltered SNV and short indel calls for all sites were then submitted for joint genotyping with GATK’s GenotypeGVCFs tool. A VCF was generated for each species cohort and then passed to the VariantFiltration tool for hard filtering according to GATK recommendations (QUAL>40.0, QD>2.0, MQ>50.0, FS<50) and then to the SelectVariants tool to remove indel calls. Summary statistics of quality-filtered VCF datasets and diversity estimates for individuals and species cohorts, including nucleotide diversity (π), Ts/Tv ratio and individual heterozygosity (Hs), were calculated using the VCFtools --site-pi, --tstv and --het flags (108). Species-specific SNVs were collected from hard-filtered species cohort VCFs using minor allele frequency (MAF) thresholds (Table 2) with the BCFtools view function (109). Population structure within species was detected by MDS using PLINK (110) with MAF-filtered SNV datasets. To examine whether any individuals were closely related, relatedness among individuals in each species cohort was calculated using the VCFtools --relatedness2 function.
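The hard-filtering criteria above can be expressed as a simple predicate. The sketch below is illustrative only: it represents each variant as a plain Python dictionary keyed by the annotation names given in the text (QUAL, QD, MQ, FS), which is our own assumption about data layout, and it does not call or reproduce the GATK VariantFiltration tool.

```python
# Thresholds as stated in the Methods; a variant must satisfy all four.
MIN_QUAL = 40.0   # variant quality score
MIN_QD = 2.0      # quality by depth
MIN_MQ = 50.0     # RMS mapping quality
MAX_FS = 50.0     # Fisher strand bias


def passes_hard_filters(record):
    """Return True if a variant record meets every hard-filter threshold."""
    return (
        record["QUAL"] > MIN_QUAL
        and record["QD"] > MIN_QD
        and record["MQ"] > MIN_MQ
        and record["FS"] < MAX_FS
    )


# Hypothetical variant records for illustration.
variants = [
    {"id": "snv1", "QUAL": 812.3, "QD": 24.1, "MQ": 60.0, "FS": 1.2},
    {"id": "snv2", "QUAL": 35.0, "QD": 3.0, "MQ": 58.0, "FS": 0.0},   # fails QUAL
    {"id": "snv3", "QUAL": 90.0, "QD": 5.5, "MQ": 59.0, "FS": 71.4},  # fails FS
]
kept = [v["id"] for v in variants if passes_hard_filters(v)]
print(kept)
```

All comparisons are strict, mirroring the inequalities as written, so a variant sitting exactly on a threshold is excluded.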

Variant annotation and gene enrichment analysis

Variant datasets were annotated using the Ensembl Variant Effect Predictor (VEP) tool (111) and the NCBI annotation release 104 of the felCat9 genome build. Gene annotation and enrichment analyses were performed for fixed and MAF-filtered datasets on protein change variants annotated by VEP with the following impact terms: missense_variant, start_lost, stop_gained, stop_lost, stop_retained_variant, splice_acceptor_variant, splice_donor_variant. Genes with an accelerated rate of non-synonymous to synonymous substitutions (dN/dS) and nucleotide diversity of non-synonymous to synonymous coding sites (πN/πS) within each species were identified using SNPGenie v1.2 (112). Enrichment analysis of GO terms and pathways was performed using gProfiler (113). The gProfiler gOSt function gathers functional annotation terms from various annotation sources including gene ontology terms, biological pathways, protein databases and human phenotype ontology. Enrichment terms were considered significant if they passed a significance threshold of P<0.001, corrected for multiple testing. This was done for fixed variant sets for each species to highlight genes of potential significance in felid evolution. It was also performed for MAF-filtered SNV datasets to highlight genes of potential significance within each species. GO enrichment output of fixed and MAF-filtered datasets from gProfiler was reduced to the most specific GO terms using REVIGO (114). Network pathway enrichment analysis was performed using WebGestalt (115) overrepresentation analysis and EnrichNet (116) to identify the network interconnectivity score (XD score) and overlap-based enrichment score (Fisher’s exact test) of genes containing fixed variants and known phenotypic annotations. Phenotypic annotation of genes containing fixed SNVs for each species to the OMIM (117) database was performed using Ensembl’s BioMart (118). A list of genes previously associated with body size in domestic species was collected from the literature (Table S10) and cross-searched using fixed variant datasets in all big cat species. MEGA-X software (119) was used to perform multiple sequence alignment of LCORL protein, previously implicated in body size variation among mammals. LCORL protein sequences for domestic dog (Canis lupus familiaris), domestic cat, horse (Equus caballus), cow (Bos taurus) and lion (Panthera leo) were downloaded from Ensembl and aligned to consensus FASTA sequences of cheetah, Sumatran tiger and snow leopard. Multiple sequence alignment was created using the MUSCLE algorithm.

To identify genes potentially implicated in the reproductive success and overall health of captive-bred big cats, a list of clinically relevant genes was curated using GO annotation terms with AmiGO2 (120) in humans, dogs, pigs, cats, rats and mice (Table S11). This list comprised genes implicated in a list of heritable conditions affecting captive-bred cheetahs, snow leopards and Sumatran tigers, compiled from the literature (Table S12), and genes implicated in reproduction, immunity and embryonic development. Variants in these genes were collected from each species cohort MAF-filtered VCF file and submitted for annotation analysis using GOnet (121) and WebGestalt, using over-representation analysis and the disease OMIM and GLAD4U (122) functional databases. High impact variants in these genes were included if they had been annotated as ‘deleterious’ by VEP.

Declarations

Ethics approval and consent to participate
Research was conducted at The University of Sydney, under Animal Ethics Committee approval no: N00/9–2009/3/5109, 24 September 2009. Whole blood samples were collected under Zoos Victoria ethics approval ZV09006 and Taronga Conservation Society approval #R12B127.

Consent for publication
Not applicable.

Availability of data and materials
Sequence data for cheetah, Sumatran tiger and snow leopard individual samples are available on the NCBI Sequence Read Archive. Cheetah samples SRR2737540, SRR2737541, SRR2737542, SRR2737543, SRR2737544 and SRR2737545 were deposited under BioProject no. PRJNA297824. Sumatran tiger samples SRR7152379, SRR7152382, SRR7152383, SRR7152384, SRR7152385, SRR7152386 and SRR7152388 were deposited under BioProject no. PRJNA437782. Snow leopard sample SRR836372 was deposited under BioProject no. PRJNA182708. Final VCF files are available at 10.6084/m9.figshare.12996920 for cheetahs, 10.6084/m9.figshare.12996947 for Sumatran tigers and 10.6084/m9.figshare.12996977 for snow leopards.

Competing interests
The authors declare that they have no competing interests.

Funding
The research undertaken in this project was funded by the Jenna O’Grady Donley Fund, Sydney School of Veterinary Science.

Author information
Catherine E. Grueber and Bianca Haase contributed equally to this work.

Author contributions
All authors conceptualised the study design. B.W and C.M.W were responsible for the provision of resources. Experimental work, bioinformatic analyses and visualisation were conducted by G.S under the supervision of B.H and C.M.W. C.E.G provided valuable advice on statistical analysis. G.S drafted the original manuscript, which all authors edited. All authors approved the final manuscript.

Acknowledgements
The authors acknowledge the University of Sydney’s high-performance computing cluster Artemis for providing the high-performance computing resources that have contributed to the research reported within this manuscript. We would also like to acknowledge Dr Jacqueline Norris, Zoos Victoria, Melbourne Zoo, Mogo Zoo, Werribee Open Range Zoo and Taronga Conservation Society Australia for providing whole blood samples of big cats included in poolseq datasets.

References
1. IUCN. The IUCN Red List of Threatened Species. Version 2020-2. 2020. Available from: https://www.iucnredlist.org
2. Ordiz A, Bischof R, Swenson JE. Saving large carnivores, but losing the apex predator? Biological Conservation. 2013;168:128-33.
3. Reed DH, Frankham R. Correlation between Fitness and Genetic Diversity. Conservation Biology. 2003;17(1):230-7.
4. Grisham J, editor Cheetah Species Survival Plan in situ Conservation Programs. Proceedings of the 58th Annual Meeting, World Association of Zoos and Aquariums, WAZA, 16-20 November, 2003: Cooperation Between Zoos in Situ and Ex Situ Conservation Programmes; 2004: World Association of Zoos and Aquariums.
5. Tetzloff J. Role of Zoos in Snow Leopard Conservation: The Species Survival Plan in North America. Snow Leopards: Elsevier; 2016. p. 301-10.
6. Moran E, Cullen R, F. D. Hughey K. The costs of single species programmes and the budget constraint. Pacific Conservation Biology. 2008;14(2):108-18.
7. Willoughby JR, Ivy JA, Lacy RC, Doyle JM, DeWoody JA. Inbreeding and selection shape genomic diversity in captive populations: Implications for the conservation of endangered species. PLOS ONE. 2017;12(4):e0175996.
8. Frankham R, Loebel DA. Modeling problems in conservation genetics using captive Drosophila populations: rapid genetic adaptation to captivity. Zoo Biology. 1992;11(5):333-42.
9. Theodorou K, Couvet D. The efficiency of close inbreeding to reduce genetic adaptation to captivity. Heredity (Edinb). 2015;114(1):38-47.
10. Supple MA, Shapiro B. Conservation of biodiversity in the genomics era. Genome Biology. 2018;19(1):131.
11. Gandolfi B, Alhaddad H, Abdi M, Bach LH, Creighton EK, Davis BW, et al. Applications and efficiencies of the first cat 63K DNA array. Scientific Reports. 2018;8(1):7024.
12. Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L. Natural selection has driven population differentiation in modern humans. Nat Genet. 2008;40(3):340-5.
13. Plassais J, Kim J, Davis BW, Karyadi DM, Hogan AN, Harris AC, et al. Whole genome sequencing of canids reveals genomic regions under selection and variants influencing morphology. Nature Communications. 2019;10(1):1489.

14. Jones AG, Small CM, Paczolt KA, Ratterman NL. A practical guide to methods of parentage analysis. Molecular Ecology Resources. 2010;10(1):6-30.
15. Hedrick PW, Garcia-Dorado A. Understanding Inbreeding Depression, Purging, and Genetic Rescue. Trends Ecol Evol. 2016;31(12):940-52.
16. Keller MC, Visscher PM, Goddard ME. Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics. 2011;189(1):237-49.
17. Galla SJ, Moraga R, Brown L, Cleland S, Hoeppner MP, Maloney RF, et al. A comparison of pedigree, genetic and genomic estimates of relatedness for informing pairing decisions in two critically endangered birds: Implications for conservation breeding programmes worldwide. Evolutionary Applications. 2020;13(5):991-1008.
18. Dobrynin P, Liu S, Tamazian G, Xiong Z, Yurchenko AA, Krasheninnikova K, et al. Genomic legacy of the African cheetah, Acinonyx jubatus. Genome Biology. 2015;16(1):277.
19. Natesh M, Atla G, Nigam P, Jhala YV, Zachariah A, Borthakur U, et al. Conservation priorities for endangered Indian tigers through a genomic lens. Scientific Reports. 2017;7(1):9614.
20. Cho YS, Hu L, Hou H, Lee H, Xu J, Kwon S, et al. The tiger genome and comparative analysis with lion and snow leopard genomes. Nature Communications. 2013;4(1):2433.
21. Figueiró HV, Li G, Trindade FJ, Assis J, Pais F, Fernandes G, et al. Genome-wide signatures of complex introgression and adaptive evolution in the big cats. Science Advances. 2017;3(7):e1700299.
22. Liu Y-C, Sun X, Driscoll C, Miquelle DG, Xu X, Martelli P, et al. Genome-Wide Evolutionary Analysis of Natural History and Adaptation in the World’s Tigers. Current Biology. 2018;28(23):3840-9.e6.
23. Galla SJ, Buckley TR, Elshire R, Hale ML, Knapp M, McCallum J, et al. Building strong relationships between conservation genetics and primary industry leads to mutually beneficial genomic advances. Mol Ecol. 2016;25(21):5267-81.
24. Shafer ABA, Wolf JBW, Alves PC, Bergström L, Bruford MW, Brännström I, et al. Genomics and the challenging translation into conservation practice. Trends in Ecology & Evolution. 2015;30(2):78-87.
25. Brandies P, Peel E, Hogg CJ, Belov K. The Value of Reference Genomes in the Conservation of Threatened Species. Genes. 2019;10(11):846.
26. Ouborg NJ, Pertoldi C, Loeschcke V, Bijlsma R, Hedrick PW. Conservation genetics in transition to conservation genomics. Trends in Genetics. 2010;26(4):177-87.
27. Kim S, Cho YS, Kim H-M, Chung O, Kim H, Jho S, et al. Comparison of carnivore, omnivore, and herbivore mammalian genomes with a new leopard assembly. Genome Biology. 2016;17(1):211.
28. Armstrong EE, Taylor RW, Miller DE, Kaelin CB, Barsh GS, Hadly EA, et al. Long live the king: chromosome-level assembly of the lion (Panthera leo) using linked-read, Hi-C, and long-read data. BMC Biology. 2020;18(1):3.

29. Mittal P, Jaiswal SK, Vijay N, Saxena R, Sharma VK. Comparative analysis of corrected tiger genome provides clues to its neuronal evolution. Scientific Reports. 2019;9(1):18459.
30. Galla SJ, Forsdick NJ, Brown L, Hoeppner MP, Knapp M, Maloney RF, et al. Reference Genomes from Distantly Related Species Can Be Used for Discovery of Single Nucleotide Polymorphisms to Inform Conservation Management. Genes. 2019;10(1):9.
31. Minias P, Dunn PO, Whittingham LA, Johnson JA, Oyler-McCance SJ. Evaluation of a Chicken 600K SNP genotyping array in non-model species of grouse. Scientific Reports. 2019;9(1):6407.
32. Blåhed I-M, Königsson H, Ericsson G, Spong G. Discovery of SNPs for individual identification by reduced representation sequencing of moose (Alces alces). PLOS ONE. 2018;13(5):e0197364.
33. Wright B, Farquharson KA, McLennan EA, Belov K, Hogg CJ, Grueber CE. From reference genomes to population genomics: comparing three reference-aligned reduced-representation sequencing pipelines in two wildlife species. BMC Genomics. 2019;20(1):453.
34. Kurland S, Wheat CW, de la Paz Celorio Mancera M, Kutschera VE, Hill J, Andersson A, et al. Exploring a Pool-seq-only approach for gaining population genomic insights in nonmodel species. Ecol Evol. 2019;9(19):11448-63.
35. Micheletti SJ, Narum SR. Utility of pooled sequencing for association mapping in nonmodel organisms. Mol Ecol Resour. 2018;18(4):825-37.
36. Leinonen R, Sugawara H, Shumway M, Collaboration INSD. The sequence read archive. Nucleic acids research. 2010;39(suppl_1):D19-D21.
37. Buckley RM, Davis BW, Brashear WA, Farias FHG, Kuroki K, Graves T, et al. A new domestic cat genome assembly based on long sequence reads empowers feline genomic medicine and identifies a novel gene for dwarfism. bioRxiv. 2020:2020.01.06.896258.
38. Pickering C, Kiely J. ACTN3: More than Just a Gene for Speed. Frontiers in Physiology. 2017;8(1080).
39. Ropka-Molik K, Stefaniuk-Szmukier M, Musiał AD, Piórkowska K, Szmatoła T. Sequence analysis and expression profiling of the equine ACTN3 gene during exercise in Arabian horses. Gene. 2019;685:149-55.
40. Johnson WE, Eizirik E, Pecon-Slattery J, Murphy WJ, Antunes A, Teeling E, et al. The Late Miocene Radiation of Modern Felidae: A Genetic Assessment. Science (New York, NY). 2006;311(5757):73.
41. Xu X, Dong G-X, Hu X-S, Miao L, Zhang X-L, Zhang D-L, et al. The Genetic Basis of White Tigers. Current Biology. 2013;23(11):1031-5.
42. Montague MJ, Li G, Gandolfi B, Khan R, Aken BL, Searle SMJ, et al. Comparative analysis of the domestic cat genome reveals genetic signatures underlying feline biology and domestication. Proceedings of the National Academy of Sciences. 2014;111(48):17230.
43. Rettenberger G, Klett C, Zechner U, Bruch J, Just W, Vogel W, et al. ZOO-FISH analysis: cat and human karyotypes closely resemble the putative ancestral mammalian karyotype. Chromosome Research. 1995;3(8):479-86.

44. Murphy WJ, Sun S, Chen ZQ, Pecon-Slattery J, O'Brien SJ. Extensive conservation of sex chromosome organization between cat and human revealed by parallel radiation hybrid mapping. Genome Res. 1999;9(12):1223-30.
45. O'Brien SJ, Cevario S, Martenson JS, Thompson M, Nash WG, Chang E, et al. Comparative gene mapping in the domestic cat (Felis catus). Journal of Heredity. 1997;88(5):408-14.
46. Samaha G, Beatty J, Wade CM, Haase B. The Burmese cat as a genetic model of type 2 diabetes in humans. Animal Genetics. 2019;50(4):319-25.
47. Narfström K, Holland Deckman K, Menotti-Raymond M. The Domestic Cat as a Large Animal Model for Characterization of Disease and Therapeutic Intervention in Hereditary Retinal Blindness. Journal of Ophthalmology. 2011;2011:906943.
48. Gandolfi B, Alamri S, Darby WG, Adhikari B, Lattimer JC, Malik R, et al. A dominant TRPV4 variant underlies osteochondrodysplasia in Scottish fold cats. Osteoarthritis and cartilage. 2016;24(8):1441-50.
49. Gandolfi B, Gruffydd-Jones TJ, Malik R, Cortes A, Jones BR, Helps CR, et al. First WNK4-hypokalemia animal model identified by genome-wide association in Burmese cats. PloS one. 2012;7(12):e53173-e.
50. Wurster-Hill DH, Centerwall WR. The interrelationships of chromosome banding patterns in canids, mustelids, hyena, and felids. Cytogenet Cell Genet. 1982;34(1-2):178-92.
51. Günther T, Nettelblad C. The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS genetics. 2019;15(7):e1008302.
52. Ross MG, Russ C, Costello M, Hollinger A, Lennon NJ, Hegarty R, et al. Characterizing and measuring bias in sequence data. Genome Biology. 2013;14(5):R51.
53. Gopalakrishnan S, Samaniego Castruita JA, Sinding M-HS, Kuderna LFK, Räikkönen J, Petersen B, et al. The wolf reference genome sequence (Canis lupus lupus) and its implications for Canis spp. population genomics. BMC Genomics. 2017;18(1):495.
54. Shukla HG, Bawa PS, Srinivasan S. hg19KIndel: ethnicity normalized human reference genome. BMC Genomics. 2019;20(1):459.
55. Huang L, Popic V, Batzoglou S. Short read alignment with populations of genomes. Bioinformatics. 2013;29(13):i361-70.
56. Rasmussen M, Guo X, Wang Y, Lohmueller KE, Rasmussen S, Albrechtsen A, et al. An Aboriginal Australian genome reveals separate human dispersals into Asia. Science. 2011;334(6052):94-8.
57. Futschik A, Schlötterer C. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics. 2010;186(1):207-18.
58. Dorant Y, Benestan L, Rougemont Q, Normandeau E, Boyle B, Rochette R, et al. Comparing Pool-seq, Rapture, and GBS genotyping for inferring weak population structure: The American lobster (Homarus americanus) as a case study. Ecology and Evolution. 2019;9(11):6606-23.

59. Doyle JM, Willoughby JR, Bell DA, Bloom PH, Bragin EA, Fernandez NB, et al. Elevated Heterozygosity in Adults Relative to Juveniles Provides Evidence of Viability Selection on Eagles and Falcons. Journal of Heredity. 2019;110(6):696-706.
60. Craig JE, Hewitt AW, McMellon AE, Henders AK, Ma L, Wallace L, et al. Rapid inexpensive genome-wide association using pooled whole blood. Genome research. 2009;19(11):2075-80.
61. Kofler R, Nolte V, Schlötterer C. The impact of library preparation protocols on the consistency of allele frequency estimates in Pool-Seq data. Molecular Ecology Resources. 2016;16(1):118-22.
62. Hivert V, Leblois R, Petit EJ, Gautier M, Vitalis R. Measuring genetic differentiation from Pool-seq data. bioRxiv. 2018:282400.
63. Wilkins AS, Wrangham RW, Fitch WT. The “domestication syndrome” in mammals: a unified explanation based on neural crest cell behavior and genetics. Genetics. 2014;197(3):795-808.
64. Pendleton AL, Shen F, Taravella AM, Emery S, Veeramah KR, Boyko AR, et al. Comparison of village dog and wolf genomes highlights the role of the neural crest in dog domestication. BMC Biology. 2018;16(1):64.
65. Sánchez-Villagra MR, Geiger M, Schneider RA. The taming of the neural crest: a developmental perspective on the origins of morphological covariation in domesticated mammals. R Soc Open Sci.3(6):160107.
66. Frantz LAF, Schraiber JG, Madsen O, Megens H-J, Cagan A, Bosse M, et al. Evidence of long-term gene flow and selection during domestication from analyses of Eurasian wild and domestic pig genomes. Nat Genet. 2015;47(10):1141-8.
67. Salinas PC. Wnt signaling in the vertebrate central nervous system: from axon guidance to synaptic function. Cold Spring Harbor perspectives in biology. 2012;4(2):a008003.
68. Simões-Costa M, Bronner ME. Establishing neural crest identity: a gene regulatory recipe. Development. 2015;142(2):242-57.
69. Cousin H. Cadherins function during the collective cell migration of Xenopus Cranial Neural Crest cells: revisiting the role of E-cadherin. Mechanisms of Development. 2017;148:79-88.
70. Wang X, Pipes L, Trut LN, Herbeck Y, Vladimirova AV, Gulevich RG, et al. Genomic responses to selection for tame/aggressive behaviors in the silver fox (Vulpes vulpes). Proceedings of the National Academy of Sciences. 2018;115(41):10398-403.
71. Eisert R. Hypercarnivory and the brain: protein requirements of cats reconsidered. Journal of Comparative Physiology B. 2011;181(1):1-17.
72. Sicuro FL, Oliveira LFB. Skull morphology and functionality of extant Felidae (Mammalia: Carnivora): a phylogenetic and evolutionary perspective. Zoological Journal of the Linnean Society. 2011;161(2):414-62.

73. Depauw S, Hesta M, Whitehouse-Tedd K, Vanhaecke L, Verbrugghe A, Janssens GP. Animal fibre: the forgotten nutrient in strict carnivores? First insights in the cheetah. Journal of animal physiology and animal nutrition. 2013;97(1):146-54.
74. Winchester B. Lysosomal metabolism of glycoproteins. Glycobiology. 2005;15(6):1R-15R.
75. Safka Brozkova D, Deconinck T, Beth Griffin L, Ferbert A, Haberlova J, Mazanec R, et al. Loss of function mutations in HARS cause a spectrum of inherited peripheral neuropathies. Brain. 2015;138(8):2161-72.
76. Parfitt DA, Michael GJ, Vermeulen EGM, Prodromou NV, Webb TR, Gallo J-M, et al. The ataxia protein sacsin is a functional co-chaperone that protects against polyglutamine-expanded ataxin-1. Human Molecular Genetics. 2009;18(9):1556-65.
77. Logan CV, Lucke B, Pottinger C, Abdelhamed ZA, Parry DA, Szymanska K, et al. Mutations in MEGF10, a regulator of satellite cell myogenesis, cause early onset myopathy, areflexia, respiratory distress and dysphagia (EMARDD). Nat Genet. 2011;43(12):1189-92.
78. Hudson PE, Corr SA, Payne-Davis RC, Clancy SN, Lane E, Wilson AM. Functional anatomy of the cheetah (Acinonyx jubatus) forelimb. Journal of Anatomy. 2011;218(4):375-85.
79. Saif R, Henkel J, Jagannathan V, Drögemüller C, Flury C, Leeb T. The LCORL Locus Is under Selection in Large-Sized Pakistani Goat Breeds. Genes. 2020;11(2).
80. Metzger J, Schrimpf R, Philipp U, Distl O. Expression levels of LCORL are associated with body size in horses. PloS one. 2013;8(2):e56497.
81. Muñoz-Garcia A, Williams JB. Basal metabolic rate in carnivores is associated with diet after controlling for phylogeny. Physiological and biochemical Zoology. 2005;78(6):1039-56.
82. Wolf C, Ripple WJ. Prey depletion as a threat to the world's large carnivores. R Soc Open Sci.3(8):160252.
83. Durant SM, Mitchell N, Groom R, Pettorelli N, Ipavec A, Jacobson AP, et al. The global decline of cheetah Acinonyx jubatus and what it means for conservation. Proceedings of the National Academy of Sciences. 2017;114(3):528.
84. Chapron G, Miquelle DG, Lambert A, Goodrich JM, Legendre S, Clobert J. The impact on tigers of poaching versus prey depletion. Journal of Applied Ecology. 2008;45(6):1667-74.
85. Alexander JS, Gopalaswamy AM, Shi K, Hughes J, Riordan P. Patterns of snow leopard site use in an increasingly human-dominated landscape. PLoS One. 2016;11(5):e0155309.
86. Luskin MS, Albert WR, Tobler MW. Sumatran tiger survival threatened by deforestation despite increasing densities in parks. Nature Communications. 2017;8(1):1783.
87. Terio KA, Mitchell E, Walzer C, Schmidt-Küntzel A, Marker L, Citino S. Diseases Impacting Captive and Free-Ranging Cheetahs. Cheetahs: Biology and Conservation. 2018:349-64.

88. Ostrowski S, Gilbert M. Diseases of Free-Ranging Snow Leopards and Primary Prey Species. Snow Leopards: Elsevier; 2016. p. 97-112.
89. Wheelhouse JL, Hulst F, Beatty JA, Hogg CJ, Child G, Wade CM, et al. Congenital vestibular disease in captive Sumatran tigers (Panthera tigris ssp. sumatrae) in Australasia. The Veterinary Journal. 2015;206(2):178-82.
90. Herrin KV, Allan G, Black A, Aliah R, Howlett CR. STIFLE OSTEOCHONDRITIS DISSECANS IN SNOW LEOPARDS (UNCIA UNCIA). Journal of Zoo and Wildlife Medicine. 2012;43(2):347-54, 8.
91. Smallbone W, van Oosterhout C, Cable J. The effects of inbreeding on disease susceptibility: Gyrodactylus turnbulli infection of guppies, Poecilia reticulata. Experimental Parasitology. 2016;167:32-7.
92. Karlsson EK, Sigurdsson S, Ivansson E, Thomas R, Elvers I, Wright J, et al. Genome-wide analyses implicate 33 loci in heritable dog osteosarcoma, including regulatory variants near CDKN2A/B. Genome Biol. 2013;14(12):R132.
93. Crosier AE, Wachter B, Schulman M, Lüders I, Koester DC, Wielebnowski N, et al. Chapter 27 - Reproductive Physiology of the Cheetah and Assisted Reproductive Techniques. In: Nyhus PJ, Marker L, Boast LK, Schmidt-Küntzel A, editors. Cheetahs: Biology and Conservation: Academic Press; 2018. p. 385-402.
94. de Lahunta A, Glass E. Vestibular System: Special Proprioception. Veterinary Neuroanatomy and Clinical Neurology. 2009:319-47.
95. Sironen A, Kotaja N, Mulhern H, Wyatt TA, Sisson JH, Pavlik JA, et al. Loss of SPEF2 function in mice results in spermatogenesis defects and primary ciliary dyskinesia. Biol Reprod. 2011;85(4):690-701.
96. Scheffer DI, Shen J, Corey DP, Chen Z-Y. Gene Expression by Mouse Inner Ear Hair Cells during Development. The Journal of Neuroscience. 2015;35(16):6366.
97. Azaiez H, Decker AR, Booth KT, Simpson AC, Shearer AE, Huygen PLM, et al. HOMER2, a Stereociliary Scaffolding Protein, Is Essential for Normal Hearing in Humans and Mice. PLOS Genetics. 2015;11(3):e1005137.
98. Azaiez H, Decker AR, Booth KT, Simpson AC, Shearer AE, Huygen PLM, et al. HOMER2, a stereociliary scaffolding protein, is essential for normal hearing in humans and mice. PLoS genetics. 2015;11(3):e1005137-e.
99. Alagramam KN, Murcia CL, Kwon HY, Pawlowski KS, Wright CG, Woychik RP. The mouse Ames waltzer hearing-loss mutant is caused by mutation of Pcdh15, a novel protocadherin gene. Nature Genetics. 2001;27(1):99-102.
100. Tilley AE, Walters MS, Shaykhiev R, Crystal RG. Cilia dysfunction in lung disease. Annu Rev Physiol. 2015;77:379-406.
101. Sironen A, Shoemark A, Patel M, Loebinger MR, Mitchison HM. Sperm defects in primary ciliary dyskinesia and related causes of male infertility. Cellular and Molecular Life Sciences. 2020;77(11):2029-48.

102. Bell K. Morbidity and Mortality in Hand Reared Cheetah Cubs. Animal Keeper's Forum. 2005. p. 306-14.
103. Lobo J, Zariwala MA, Noone PG. Primary ciliary dyskinesia. Semin Respir Crit Care Med. 2015;36(2):169-79.
104. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England). 2009;25(14):1754-60.
105. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-9.
106. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research. 2010;20(9):1297-303.
107. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43(5):491-8.
108. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156-8.
109. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (Oxford, England). 2011;27(21):2987-93.
110. Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4(1):7.
111. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biology. 2016;17(1):122.
112. Nelson CW, Moncla LH, Hughes AL. SNPGenie: estimating evolutionary parameters to detect natural selection using pooled next-generation sequencing data. Bioinformatics. 2015;31(22):3709-11.
113. Raudvere U, Kolberg L, Kuzmin I, Arak T, Adler P, Peterson H, et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Research. 2019;47(W1):W191-W8.
114. Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO Summarizes and Visualizes Long Lists of Gene Ontology Terms. PLOS ONE. 2011;6(7):e21800.
115. Liao Y, Wang J, Jaehnig EJ, Shi Z, Zhang B. WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Research. 2019;47(W1):W199-W205.
116. Glaab E, Baudot A, Krasnogor N, Schneider R, Valencia A. EnrichNet: network-based gene set enrichment analysis. Bioinformatics. 2012;28(18):i451-i7.

117. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic acids research. 2005;33(suppl_1):D514-D7.
118. Kinsella RJ, Kähäri A, Haider S, Zamora J, Proctor G, Spudich G, et al. Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database. 2011;2011.
119. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Molecular biology and evolution. 2018;35(6):1547-9.
120. Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009;25(2):288-9.
121. Pomaznoy M, Ha B, Peters B. GOnet: a tool for interactive Gene Ontology analysis. BMC Bioinformatics. 2018;19(1):470.
122. Jourquin J, Duncan D, Shi Z, Zhang B. GLAD4U: deriving and prioritizing gene lists from PubMed literature. BMC genomics. 2012;13(S8):S20.

Tables and figures

Table 1: Summary of alignment of big cat individuals and pools to the domestic cat reference assembly.

| Species | Sample ID | Reads mapped (%) | Read depth | Quality filtered SNVs | TsTv ratio | Nucleotide diversity (π) |
|---|---|---|---|---|---|---|
| Cheetah | CHEETAH_NAM1 | 97.74 | 12.41 | 27,854,391 | 1.82 | 0.00596 |
| | CHEETAH_NAM2 | 97.44 | 13.43 | 31,254,807 | 1.83 | 0.00804 |
| | CHEETAH_NAM3 | 97.45 | 11.75 | 32,936,565 | 1.85 | 0.00940 |
| | CHEETAH_TZA1 | 96.94 | 11.93 | 29,981,645 | 1.81 | 0.00765 |
| | CHEETAH_TZA2 | 97.41 | 12.52 | 27,435,836 | 1.80 | 0.00637 |
| | CHEETAH_TZA3 | 97.35 | 11.46 | 31,533,437 | 1.85 | 0.00801 |
| | pool (N=4) | 79.23 | 12.42 | 38,246,085 | 2.12 | 0.04701 |
| Sumatran tiger | SUM_IDN1 | 95.20 | 25.78 | 2,245,646 | 1.70 | 0.00243 |
| | SUM_IDN2 | 96.94 | 50.14 | 2,984,324 | 1.67 | 0.00237 |
| | SUM_IDN3 | 93.87 | 5.56 | 2,213,024 | 1.79 | 0.00213 |
| | SUM_IDN4 | 95.43 | 7.15 | 2,626,994 | 1.60 | 0.00243 |
| | SUM_IDN5 | 94.98 | 8.55 | 2,442,662 | 1.78 | 0.00194 |
| | SUM_USA1 | 94.47 | 35.91 | 2,205,499 | 1.73 | 0.00203 |
| | SUM_USA2 | 96.58 | 7.16 | 2,361,962 | 1.68 | 0.00243 |
| | pool (N=4) | 88.84 | 6.28 | 18,859,425 | 2.10 | 0.00740 |
| Snow leopard | SNOW | 91.30 | 7.34 | 10,634,946 | 1.73 | 0.00246 |
| | pool (N=4) | 95.58 | 43.59 | 9,340,412 | 1.83 | 0.00281 |
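For readers unfamiliar with the TsTv column in Table 1, the following is a minimal sketch of how a transition/transversion ratio can be computed from a list of biallelic SNVs. It is illustrative only: the ratios in the table came from the authors' variant-calling pipeline, and the function name and the `(ref, alt)` input format here are assumptions.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = "ACGT"


def ts_tv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) alleles.

    Records that are not single-base substitutions (indels, ambiguous bases,
    ref == alt) are skipped rather than counted.
    """
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in BASES or alt not in BASES:
            continue
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv
```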

Figure 1: Distinct clustering of samples within species groups based on multi-dimensional scaling (MDS). a. Cheetah individuals, labelled as either Namibian (CHEETAH_NAM) or Tanzanian (CHEETAH_TZA) based on geographic provenance. b. Snow leopard pool and individual samples. c. Sumatran tiger individuals, labelled as either American (SUM_USA) or Indonesian (SUM_IDN) based on geographic provenance.
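MDS plots of this kind are typically built from a pairwise distance matrix between samples. As a hedged sketch, assuming genotypes are coded as alternate-allele counts (0, 1 or 2, with `None` for missing calls), an identity-by-state (IBS) similarity and the corresponding distance matrix could be computed as below. The function names and the input layout are hypothetical, and the exact distance used for Figure 1 is not stated in this section.

```python
def ibs_similarity(g1, g2):
    """Proportion of alleles shared identical-by-state between two samples.

    g1, g2 -- equal-length sequences of alternate-allele counts (0, 1, 2),
              with None marking a missing genotype.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip sites missing in either sample
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this site
        compared += 1
    if compared == 0:
        raise ValueError("no sites with genotypes in both samples")
    return shared / (2 * compared)


def ibs_distance_matrix(samples):
    """Return (names, matrix) of 1 - IBS for a dict of sample -> genotypes."""
    names = sorted(samples)
    matrix = [
        [1 - ibs_similarity(samples[i], samples[j]) for j in names]
        for i in names
    ]
    return names, matrix
```

The resulting matrix is the input a classical MDS routine would take; the scaling step itself is omitted here.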


Table 2: Heterozygosity, pairwise relatedness (mean±SD) and similarity coefficients of each species group, calculated using individual data only (i.e. pools excluded). The heterozygous SNV rate was calculated as the ratio of heterozygous SNVs to the felCat9 genome assembly length. Snow leopard values correspond to the single individual sample ‘SNOW’.

| Species | Within species IBS | Coancestry coefficient (ϴ) | Expected heterozygosity | Observed heterozygosity | Heterozygous SNV rate | Inbreeding coefficient (F) | FMin | FMax |
|---|---|---|---|---|---|---|---|---|
| Snow leopard (N=1) | 0.5 | - | 0.49 | 0.38 | 0.004 | 0.233 | - | - |
| Sumatran tiger (N=7) | 0.43±0.16 | 1.11±0.44 | 0.429±0.0002 | 0.178±0.014 | 0.001±0.0001 | 0.585±0.033 | 0.553 | 0.645 |
| Cheetah (N=6) | 0.36±0.14 | -0.42±0.31 | 0.392±0.0002 | 0.245±0.025 | 0.012±0.0008 | 0.346±0.062 | 0.259 | 0.419 |
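The relationship between the heterozygosity columns and the inbreeding coefficient can be sketched as follows. This assumes the common moment estimator F = 1 − Ho/He, which this section does not state explicitly, so the reported values (means of per-individual estimates) need not match it exactly. Function names are hypothetical.

```python
def inbreeding_coefficient(h_obs, h_exp):
    """Moment estimate of F from observed and expected heterozygosity.

    Assumes F = 1 - Ho/He; this is an illustrative assumption, not a
    statement of the estimator used to produce Table 2.
    """
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1 - h_obs / h_exp


def heterozygous_snv_rate(n_het_snvs, genome_length):
    """Heterozygous SNVs divided by assembly length, as described in the caption."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return n_het_snvs / genome_length
```

As a usage example, `inbreeding_coefficient(0.178, 0.429)` evaluates to roughly 0.585 when applied to the Sumatran tiger means, in line with the tabulated F; for the cheetah and snow leopard rows the same calculation on the means only approximates the reported values.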

Table 3: Summary of SNVs called and annotated for each species. The total number of samples corresponds to the number of individual WGS samples plus the individuals included in each pool (N=4). Fixed SNVs are variants at which all samples in a species carry the alternative allele relative to the domestic cat reference.

| Category | Metric | Cheetah | Snow leopard | Sumatran tiger |
|---|---|---|---|---|
| Quality filtered SNV | Total | 38,839,061 | 15,504,143 | 13,414,953 |
| | Coding variants | 449,996 | 144,156 | 126,127 |
| | Transcripts overlapped | 53,786 | 53,707 | 53,640 |
| | Genes overlapped | 29,043 | 28,978 | 28,930 |
| Fixed SNV | Total | 1,737,447 | 13,882,181 | 3,755,816 |
| | Coding variants | 9,910 | 122,446 | 37,539 |
| MAF filtered SNV | Total | 2,671,858 | 55,604 | 409,014 |
| | MAF threshold* | 0.125 | 0.25 | 0.143 |
| | Missense | 17,333 | 623 | 2,737 |
| | Deleterious¥ | 4,734 | 177 | 772 |

*MAF thresholds were used to exclude SNVs that did not appear in at least one individual in a homozygous state or at least two individuals in a heterozygous state.
¥ SNVs annotated as ‘deleterious’ by VEP had a SIFT score of 0-0.05.
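The two footnote rules can be expressed as a short sketch. Requiring one homozygous or two heterozygous carriers is equivalent to requiring at least two copies of the minor allele, which for N diploid individuals gives a minimum MAF of 2/(2N). The sample counts passed in below are illustrative; only N = 7 for the Sumatran tiger individuals is stated in this section. Function names are hypothetical.

```python
def min_maf_threshold(n_individuals):
    """Smallest MAF at which the minor allele is seen at least twice.

    Two copies among 2N chromosomes corresponds to one homozygous carrier
    or two heterozygous carriers, matching the footnote's rule.
    """
    if n_individuals < 1:
        raise ValueError("need at least one individual")
    return 2 / (2 * n_individuals)


def passes_maf_filter(minor_allele_count):
    """True if the minor allele is carried at least twice in the cohort."""
    return minor_allele_count >= 2


def is_deleterious(sift_score):
    """Apply the SIFT range quoted in the footnote (0 to 0.05 inclusive)."""
    return 0 <= sift_score <= 0.05
```

For example, `min_maf_threshold(7)` rounds to 0.143, which agrees with the Sumatran tiger threshold in Table 3.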


Figure 2: The distribution of major transcript-associated SNV annotation categories was proportionally similar across all three big cat species.

Chapter 4 Cross-species applications of low-density SNV discovery and genotyping techniques for the benefit of conservation research

4.1 Synopsis: Harnessing cross-species SNV methodologies for conservation genomics: A comparison of genotyping array and reduced-representation sequencing methods in wild felids

Efforts to integrate genomic datasets into wildlife conservation have included low-density approaches like reduced-representation sequencing and genotyping arrays. These approaches offer an informative and cost-effective means to estimate demographic processes and can provide a reference-free approach to population-scale genomics in species without an available reference assembly. However, when paired with a reference assembly, these techniques are more computationally efficient, less prone to error and offer functional annotations of adaptive and deleterious variants. Improving the widespread implementation of genomic methods in the conservation management of wild species requires pragmatic and descriptive studies that demonstrate what these methods can offer and which aspects of conservation management they can inform. In this chapter, I present a comparison of cross-species applications of reduced-representation sequencing and genotyping array methodologies in the Sumatran tiger, snow leopard and cheetah. I compare the quantity and location of polymorphic markers called by both methods and the ability of each dataset to resolve population structure within species and identify genomic regions under selection. Results showed that reduced-representation sequencing offered a higher density of markers and therefore higher resolution of population structure and regions under selection. This chapter is presented in the format of a manuscript that has been submitted to the journal Molecular Ecology Resources.

Harnessing cross-species SNP methodologies for conservation genomics: A comparison of genotyping array and reduced-representation sequencing methods in wild felids

Georgina Samaha1*, Claire M. Wade2, Hamutal Mazrier1, Catherine E. Grueber2† & Bianca Haase1†

1Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, NSW, Australia

2School of Life and Environmental Sciences, The University of Sydney, NSW, Australia

* Corresponding author: [email protected]

†These authors share senior authorship

Abstract
Recent genomic advances have increased opportunities for wildlife conservation research. The growing range of sequencing and genotyping options raises the question of which dataset is best suited for estimating demographic processes of conservation value in a population, such as natural selection, population bottlenecks, and relatedness. SNP genotyping and discovery techniques are most informative and computationally efficient when combined with a high-quality reference genome assembly, which is not commonly available for wildlife species. In the absence of a species-specific reference genome, the reference genome of a closely related species can be used, with the added benefit of offering comparative insights into ecologically adaptive traits and heritable diseases. Here, we compared the cross-species utility of two commonly used techniques: reduced-representation sequencing and genotyping arrays. We simulated single nucleotide polymorphism datasets from whole genome sequence data for snow leopards (Panthera uncia), Sumatran tigers (Panthera tigris sumatrae) and cheetahs (Acinonyx jubatus) and aligned them to the domestic cat (Felis catus) reference genome assembly. We compared the quantity and genomic location of polymorphic markers called by reduced-representation sequencing and the feline genotyping array against the feline reference genome, and the ability of each dataset to estimate population structure and relatedness within species and identify genomic regions under selection. We showed cross-species applications of the feline SNP array and reference-based RRS to be reproducible across three Felidae species. Compared with the genotyping array, reduced-representation sequencing offered higher SNP density and greater resolution of population structure and genotype-phenotype associations.

Keywords reduced-representation sequencing, genotyping array, conservation, cat, SNP

Introduction

Recent decades have seen the emergence of genomics as a tool for conservation management. Next-generation sequencing technologies have dramatically increased the quantity and quality of genomic markers available for studies in model and non-model species alike. Screening a population across thousands or millions of single nucleotide polymorphisms (SNPs) can increase the precision of diversity estimates over traditional markers, and more readily differentiate between neutral and functional variation (Zimmerman, Aldridge, & Oyler-McCance, 2020). While traditional genetic markers, including microsatellites, have been used to estimate demographic history, genetic drift, gene flow and effective population size (e.g. Gooley, Hogg, Belov, & Grueber, 2017; Takezaki & Nei, 2009; Terrell et al., 2016), genomic marker sets are typically over four orders of magnitude larger than those of traditional markers, giving a significantly higher coverage of a species’ genome. Coupled with the use of a reference genome, this higher-density data allows researchers to investigate questions regarding species’ ecological fitness with unprecedented insight into gene-environment interactions (De La Torre, Wilhite, & Neale, 2019; Pendleton et al., 2018; Wright et al., 2015).

Today, the genome of an individual of any species can be sequenced and assembled at a moderate financial cost. However, this requires considerable bioinformatic and computing resources (Mardis, 2010), and calls have been made to improve accessibility of the infrastructure, methods and analytical pipelines that are essential to this work (McMahon, Teeling, & Höglund, 2014; Shafer et al., 2015). While whole genome sequencing (WGS) is the gold standard for population-scale genomics, affordable, more accessible low-density methods including genotyping arrays and reduced-representation sequencing (RRS) are increasingly used (Minias, Dunn, Whittingham, Johnson, & Oyler-McCance, 2019; Wright et al., 2020). These low-density approaches outperform traditional marker sets and are a cheaper alternative to WGS, targeting both coding and non-coding regions of the genome.

The growing range of sequencing and genotyping options raises the question of which dataset is best suited to estimating the demographic processes of conservation value in a population, such as adaptive selection, population bottlenecks and relatedness. Genotyping arrays profile a targeted set of SNPs and are the low-density approach of choice in model species, taking advantage of extensive linkage disequilibrium (LD) among markers. Genotyping arrays have been used to identify signatures of selection (Wang et al., 2018), resolve breed structure (Schaefer et al., 2017; Yang et al., 2019) and reveal the basis of heritable diseases (O’Brien et al., 2020; Samaha et al., 2020). In comparison, RRS methods have been more widely adopted by ecologists and plant geneticists working with non-model species, due to their relatively low costs and ability to identify novel genetic variation without a reference genome (Paun, Verhoeven, & Richards, 2019).


Both approaches offer a comparable number of SNPs, commonly used to delineate population structure (Judkins, Couger, Warren, & Van Den Bussche, 2020), measure genetic diversity (Natesh et al., 2017; Wright et al., 2015), perform genome-wide association studies (GWAS) (Margres et al., 2018; Xu et al., 2013), and identify ecologically relevant alleles (Gagnaire, Normandeau, Pavey, & Bernatchez, 2013; Husby et al., 2015). RRS methods have offered conservation geneticists substantial advantages over other genotyping techniques in resolving conservation questions (Choquet et al., 2019; Paun et al., 2019; Wright et al., 2020). Although some studies have found RRS methods can generate an order of magnitude fewer SNPs than SNP arrays (Negro et al., 2019), genotyping of RRS loci has the advantage of not requiring any prior knowledge of the genome of the target species, making it particularly suitable for non-model species.

Reference genomes of non-model species remain scarce, and this shortage is often cited as a barrier to SNP discovery (Galla et al., 2016; Shafer et al., 2015; Supple & Shapiro, 2018; Taylor, Dussex, & van Heezik, 2017). Nevertheless, cross-species applications have the benefit of offering comparative insights into ecologically adaptive traits and heritable diseases (Galla et al., 2019; Kharzinova et al., 2018; Minias et al., 2019; Ogden, Baird, Senn, & McEwing, 2012). Using reference genomes and SNP arrays developed for model species can provide biological context in the form of variant and gene annotation, and is more computationally efficient than methods that do not use a reference (Lischer & Shimizu, 2017). We previously demonstrated the utility of cross-species SNP discovery using the domestic cat (Felis catus) reference genome assembly (Samaha et al., Under Review) and successfully called and annotated SNPs relevant to population structure, adaptive traits, evolutionary history and the pathogenesis of heritable diseases in cheetah (Acinonyx jubatus), snow leopard (Panthera uncia), and Sumatran tiger (Panthera tigris sumatrae).

As human populations encroach on their ranges, wild cat conservation has become an important global priority (Qi, Holyoak, Ning, & Jiang, 2020; Ripple et al., 2014). Implementing genomics in conservation management of these species requires approaches that can deliver reliable, contemporary, and pragmatic information. Ongoing demonstration and evaluation of these technologies and their ability to answer relevant questions is essential to their widespread inclusion in conservation management. Here, we simulated RRS and genotyping array datasets from big cat WGS data, aligned to the most recent feline reference genome assembly, to evaluate workflows for generating low-density SNP markers that can be used in a conservation management context. We compared a reference-aligned RRS method, restriction-site associated DNA sequencing (RADseq), with the feline Illumina Infinium iSelect 63K DNA genotyping array using data from cheetah, snow leopard, and Sumatran tiger. We compared the ability of each of these tools to delineate population structure and relatedness, quantify the proportion of coding and non-coding variants and answer questions relating to phylogenetic relationships and adaptive diversity.

Materials and Methods

Ethics approval

Research was conducted at The University of Sydney, under Animal Ethics Committee approval no: N00/9–2009/3/5109, 24 September 2009. Whole blood samples were collected under Zoos Victoria ethics approval ZV09006 and Taronga Conservation Society approval #R12B127.

Reference genome alignment and variant calling

As previously described (Samaha, Wade, Mazrier, Grueber, & Haase, Under Review), genomic DNA of four cheetahs, four snow leopards and four Sumatran tigers was extracted. Equimolar DNA pools consisting of four individuals per species were sequenced with paired-end 150-bp reads on an Illumina HiSeq 2000. Individuals included in species pools were collected from a database of EDTA-stabilised whole blood from captive managed big cats housed in Australian zoos. WGS data for seven Sumatran tigers, one snow leopard and six cheetahs were downloaded from the Sequence Read Archive (SRA) (Table S1) (Samaha et al., Under Review). Of the SRA samples, cheetahs had been collected from two wild subpopulations (Namibia and Tanzania), whereas the snow leopard and Sumatran tiger samples came from captive-bred animals. SRA Sumatran tigers had been collected from two zoo populations in Indonesia and the USA. Pooled and individual WGS samples were aligned to the felCat9 reference genome assembly (Buckley et al., 2020) using the Burrows-Wheeler Aligner (BWA-MEM) (Li & Durbin, 2009).

For each sample, variant calling of SNPs and short indels was performed with the Genome Analysis Toolkit (GATK) HaplotypeCaller, with a minimum base quality score of 20 and a minimum mapping quality score of 20, and GenotypeGVCFs tools following best practices (DePristo et al., 2011; McKenna et al., 2010). Variants in variant call format (VCF) were filtered on quality (QUAL, QD, MQ, FS) using GATK’s VariantFiltration, and indels were removed using GATK’s SelectVariants tool. For each sample, a consensus FASTA file was created from the felCat9 FASTA sequence and that sample’s filtered SNV dataset using the BCFtools (Li, 2011) consensus function, forcing the alternate allele to be called at heterozygous sites. Following alignment to the felCat9 reference assembly, read coverage for species pools ranged from 25.78x to 50.14x, with greater than 90% of the reference covered at a minimum depth of 20x. By comparison, individual cheetah, Sumatran tiger and snow leopard samples had depths of ~5x, ~7x and 10x, respectively.
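The consensus step can be illustrated with a minimal Python sketch. It is not the BCFtools implementation; the function name `build_consensus`, the toy reference and the genotype strings are hypothetical, and the only behaviour taken from the text is that the alternate allele is written at heterozygous as well as homozygous-alternate sites.

```python
def build_consensus(reference, snvs):
    """Return `reference` with ALT alleles substituted at SNV sites.

    `snvs` maps a 1-based position to (ref, alt, genotype), where genotype
    is a string such as "0/0", "0/1" or "1/1". The ALT allele is applied
    whenever the genotype carries at least one ALT copy, so heterozygous
    sites are forced to the alternate allele.
    """
    seq = list(reference)
    for pos, (ref, alt, genotype) in snvs.items():
        if seq[pos - 1] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        alleles = genotype.replace("|", "/").split("/")
        if "1" in alleles:
            seq[pos - 1] = alt
    return "".join(seq)


# Toy example (hypothetical data): one het, one hom-alt, one hom-ref site.
reference = "ACGTACGT"
snvs = {2: ("C", "T", "0/1"), 5: ("A", "G", "1/1"), 7: ("G", "A", "0/0")}
consensus = build_consensus(reference, snvs)
print(consensus)
```

Forcing the alternate allele at heterozygous sites means a single haploid sequence per sample carries every non-reference call, at the cost of discarding the reference allele at those sites.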

Genotyping array simulation: ‘array dataset’

To identify the coordinates of array variants on the Felis_catus_9.0 reference genome assembly, a Basic Local Alignment Search Tool (BLAST) search was performed using array marker probe sequences available in the feline Infinium genotyping array manifest (Willet & Haase, 2014) and the consensus FASTA sequences for each sample. A FASTA file was created using the marker identifier and flanking genomic sequence upstream and downstream of each SNP in top orientation. The longest flanking sequence was retained and output in FASTA format. A custom BLAST database was prepared using BLAST+ to make the reference FASTA sequence searchable. A nucleotide BLAST search was performed, and output was filtered to collect the single best hit for each sequence. Markers that failed to return any hit, hits that failed to reach the base position immediately adjacent to the marker, and hits that had an E-value greater than 1 × 10⁻⁵ were rejected. Positions of successfully mapped markers were extracted and reformatted to PLINK 1.9 (Chang et al., 2015) format (herein referred to as the ‘array dataset’).
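The hit-filtering rules can be sketched in Python as follows. This is an illustration under stated assumptions rather than the script used here: the `Hit` record and its field names are hypothetical, the "best" hit is assumed to be the one with the highest bit score, and whether a hit reaches the base adjacent to the marker is assumed to have been worked out beforehand and stored as a boolean.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    marker: str        # array marker identifier
    chrom: str         # reference chromosome of the hit
    position: int      # inferred SNP coordinate on the reference
    evalue: float      # BLAST E-value
    bitscore: float    # BLAST bit score
    reaches_snp: bool  # alignment extends to the base adjacent to the SNP


E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def filter_hits(hits):
    """Keep the single best hit per marker, then apply the rejection rules."""
    best = {}
    for hit in hits:
        current = best.get(hit.marker)
        # Assumption: "best" means the highest bit score.
        if current is None or hit.bitscore > current.bitscore:
            best[hit.marker] = hit
    return {
        marker: hit
        for marker, hit in best.items()
        if hit.reaches_snp and hit.evalue <= E_VALUE_CUTOFF
    }


# Hypothetical hits for three markers.
hits = [
    Hit("snp1", "A1", 1000, 1e-30, 180.0, True),
    Hit("snp1", "B2", 5000, 1e-8, 60.0, True),     # weaker secondary hit
    Hit("snp2", "C1", 2000, 1e-3, 30.0, True),     # E-value above cutoff
    Hit("snp3", "D1", 3000, 1e-20, 150.0, False),  # stops short of the SNP
]
mapped = filter_hits(hits)
print(sorted(mapped))
```

Markers with no hit at all never enter the dictionary, so they are rejected implicitly.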

In silico RRS digestion of WGS: ‘RRS dataset’

The Perl module RestrictionDigest (Wang, Guofan, Haigang, Li, & Xuedi, 2016) was used to perform in silico double-digest restriction-site associated sequencing (RADseq) of consensus FASTA files using the enzyme pair NlaIII and MluCI (Natesh et al., 2017). The RestrictionDigest module was run in double-digest mode with boundary and step parameters left at their defaults. Digested fragments in FASTA format were reformatted to FASTQ format and realigned to the felCat9 reference assembly using BWA-MEM. Variant calling was performed using GATK as above. In order to make comparisons between species, each species cohort variant call format (VCF) file was merged into a single VCF using BCFtools (--merge) and missing SNP genotypes were called as the reference allele to standardise the combined SNP sets. Multiallelic variants differing between species were split into individual entries in the species VCF using BCFtools (--norm). VCF files were reformatted to PLINK format (herein referred to as the ‘RRS dataset’).
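The principle of an in silico double digest can be shown with a short Python sketch. It does not reproduce the RestrictionDigest module: the cut positions (NlaIII cutting after CATG, MluCI cutting before AATT) are taken from standard enzyme catalogues rather than from the text, and keeping only fragments flanked by one cut from each enzyme is an assumption based on common double-digest practice. The function names and the toy sequence are hypothetical.

```python
import re

# Recognition site and cut offset within the site for each enzyme
# (assumed: NlaIII = CATG^, MluCI = ^AATT).
ENZYMES = {"NlaIII": ("CATG", 4), "MluCI": ("AATT", 0)}


def cut_positions(sequence, site, offset):
    """Return every cut coordinate for one enzyme, allowing overlapping sites."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", sequence)]


def double_digest(sequence):
    """Return fragments bounded by cuts from two different enzymes."""
    cuts = sorted(
        (position, name)
        for name, (site, offset) in ENZYMES.items()
        for position in cut_positions(sequence, site, offset)
    )
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        # Keep non-empty fragments with a different enzyme at each end.
        if left != right and end > start:
            fragments.append(sequence[start:end])
    return fragments


fragments = double_digest("GGCATGAAACCCAATTGGGCATGTT")
print(fragments)
```

A real digest would also apply a fragment size window before simulating reads; that step is left out of the sketch.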

Diversity summary statistics and population structure analysis

Diversity summary statistics and population structure analysis were conducted for both RRS and array datasets using PLINK. Principal component analysis (PCA) was performed to evaluate and compare the ability of each dataset to reflect the genetic structure of populations and individuals within each species group. PCA was performed using SNPs genotyped in at least 90% of individuals (--geno) and polymorphic in at least two individuals in each species group. To quantify relatedness among individuals within a species based on allele-sharing proportions, pairwise identical by state (IBS) Hamming distances were calculated using PLINK (--ibs-matrix). To determine the extent to which association mapping could be used for each dataset, pairwise measures of linkage disequilibrium (LD) between markers across all autosomes were collected. Using PLINK (--r2, --ld-window, --ld-window-r2, --ld-window-kb), we measured LD between markers at a distance of 1 kb to 15 Mb from one another. Only species cohorts including seven or more samples were included in LD quantification.
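The allele-sharing proportion underlying a pairwise IBS value can be sketched in Python. This is a simplified illustration, not PLINK’s implementation: genotypes are assumed to be coded as alternate-allele counts (0, 1 or 2) with `None` for missing calls, loci missing in either sample are skipped, and the function name `ibs_similarity` is hypothetical.

```python
def ibs_similarity(sample_a, sample_b):
    """Proportion of alleles shared identical by state between two samples.

    Each locus contributes 2 - |a - b| shared alleles out of a possible 2.
    """
    shared = 0
    compared = 0
    for geno_a, geno_b in zip(sample_a, sample_b):
        if geno_a is None or geno_b is None:
            continue  # skip loci not genotyped in both samples
        shared += 2 - abs(geno_a - geno_b)
        compared += 2
    if compared == 0:
        raise ValueError("no loci genotyped in both samples")
    return shared / compared


# Hypothetical genotypes at six loci for two individuals.
cat_a = [0, 1, 2, 2, None, 0]
cat_b = [0, 2, 2, 0, 1, 0]
similarity = ibs_similarity(cat_a, cat_b)
print(round(similarity, 3))
```

Under this coding, one minus the similarity is the normalised Hamming-style distance between the two genotype vectors.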

Variant annotation and analysis

Given the small number of samples for each species, species-specific polymorphic markers were collected using minor allele frequency (MAF) thresholds that allowed for a variant to be present in at least two individuals in a heterozygous form or one individual in homozygous form for each dataset. Polymorphic SNP array and RRS datasets were annotated using the Ensembl Variant Effect Predictor (VEP) tool (McLaren et al., 2016) and the NCBI annotation release 104 of the felCat9 genome assembly. SNP datasets were evaluated for taxonomic association between the two Panthera species (Sumatran tiger and snow leopard) and the cheetah. To identify taxonomically segregating variants, two case-control association analyses were run for each dataset in PLINK (--assoc) using genus as a “phenotype”. SNPs passing a Bonferroni adjusted significance threshold of 1.39 × 10⁻⁵ for the array dataset and 6.34 × 10⁻⁷ for the RRS dataset were considered significantly associated. Functional annotation of Gene Ontology (GO) terms (Ashburner et al., 2000) was performed using DAVID (Huang da, Sherman, & Lempicki, 2009a, 2009b) and AgriGO (Tian et al., 2017). Phenotypic annotation of genes containing fixed SNPs for each species to the OMIM database (Amberger, Bocchini, Schiettecatte, Scott, & Hamosh, 2015) was performed using Ensembl’s BioMart software suite (Smedley et al., 2009).
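The polymorphism criterion described above (two heterozygous carriers or one homozygous carrier) amounts to requiring at least two copies of the minor allele within a species cohort. A minimal Python sketch of that reading follows; the function name `is_polymorphic` and the 0/1/2 genotype coding are assumptions made for illustration, and the actual MAF thresholds applied per cohort are not reproduced.

```python
def is_polymorphic(genotypes, min_minor_alleles=2):
    """Return True if the minor allele is seen at least `min_minor_alleles` times.

    `genotypes` holds alternate-allele counts (0, 1 or 2) per individual,
    with None for missing calls.
    """
    called = [g for g in genotypes if g is not None]
    alt_count = sum(called)
    ref_count = 2 * len(called) - alt_count
    return min(alt_count, ref_count) >= min_minor_alleles


# Hypothetical four-animal cohorts.
two_hets = is_polymorphic([0, 1, 1, 0])       # two heterozygous carriers
one_hom = is_polymorphic([0, 0, 2, 0])        # one homozygous carrier
singleton = is_polymorphic([0, 1, 0, 0])      # a single minor allele copy
fixed = is_polymorphic([2, 2, 2, 2])          # fixed for the alternate allele
print(two_hets, one_hom, singleton, fixed)
```

Sites fixed for the alternate allele fail the check because they carry no reference allele within the cohort, which separates within-species polymorphism from fixed differences relative to the domestic cat reference.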

Results

In silico genotyping of 63K feline array SNPs

Of the 62,897 variants on the feline genotyping array, 59,248 (94.2%) were called successfully in our genomic data for Sumatran tigers, 59,313 (94.3%) in snow leopards, and 58,592 (93.1%) in cheetahs. The average distance between array markers was 40,018 bp across all chromosomes realigned to the felCat9 reference assembly. The mean call rate (± SD) of all genotyped SNPs across samples was 95 ± 0.017% in cheetahs, 95.9 ± 0.002% in snow leopards and 95.8 ± 0.007% in Sumatran tigers. Of the successfully genotyped SNPs, cheetahs exhibited the highest proportion of polymorphic markers (706 SNPs, 1.19%), followed by Sumatran tigers (343 SNPs, 0.57%) and snow leopards (91 SNPs, 0.15%) (Figure 1a).

In silico restriction-site associated digestion and SNP calling

A RADseq digest was simulated for eight Sumatran tiger samples, two snow leopard samples and seven cheetah samples using WGS data. Given the high proportion of aligned reads across all species to the felCat9 reference assembly, the total number of digested fragments generated across all samples showed high consistency (range 8,467,761 bp to 8,471,424 bp). Digested fragments were distributed evenly across chromosomes, covering 40.3-47.5% of each chromosome. Following quality and MAF-based filtering, SNPs that were polymorphic in at least two samples within each species group were collected. In Sumatran tigers this included 23,638 SNPs, in snow leopards 4,475, and in cheetahs 540,917 SNPs; 107 SNPs were common to all three species (Figure 1b).

SNP distribution across the genome and variant annotation

Across all three species, the array and RRS datasets differed in the distribution and density of SNPs. SNPs were observed at a significantly higher density in the RRS dataset than the array dataset across all species (Figure 1; Table S2). Across the RRS datasets, genotyping rates of polymorphic markers were consistent across all chromosomes, and proportional to their size. Large metacentric (A1, A2, A3), midsize metacentric (C1) and large subtelomeric (B1, B4, X) chromosome classes had the highest number of polymorphic markers with the lowest intermarker distances of all chromosomes in cheetahs and Sumatran tigers, but not snow leopards. The marker density of both array and RRS datasets was reflected in genome-wide LD profiles. In the array dataset, inconsistent LD decay reflected the low density of polymorphic markers (Figure S1a). For the RRS dataset, short-range LD was observed in both Sumatran tiger and cheetah datasets, but LD decayed over a larger distance in the Sumatran tiger (Figure S1b). In the RRS dataset, LD did not drop below 0.15 across a 15 Mb range, which is likely a reflection of the small sample size (Sumatran tiger N=8, cheetah N=7) used for this analysis. The mean r² value for SNPs in the RRS dataset up to 50 kb apart was 0.65 for cheetahs and 0.45 for Sumatran tigers. Snow leopards were excluded from LD analysis due to an insufficient number of samples.

For both the array and RRS datasets, snow leopards had the highest inter-marker distances and lowest chromosome density compared to the other two species, likely due to low sampling rate. For cheetahs the average inter-marker distance was 3.5 Mb between array markers and 3.8 kb between RRS markers (Figure 2a), for snow leopards it was 25.2 Mb between array markers and 275 kb between RRS markers (Figure 2b), and in Sumatran tigers, the average inter-marker distance between polymorphic array markers was 7.5 Mb and 99.7 kb between RRS markers (Figure 2c). For both datasets, the largest gaps and lowest density of SNPs were observed in snow leopards, relative to the other two species (Table S2). A small proportion of SNPs were identified in both the array and RRS datasets in cheetahs (57 SNPs), Sumatran tigers (2 SNPs), and snow leopard (1 SNP) (Table S3). A low proportion of polymorphic markers <1 Mb apart was observed in cheetahs (array: 30.7%), Sumatran tigers (array: 21.3%), and snow leopards (array: 5.5%). The proportion of array markers with an inter-marker distance >1 Mb was >95% across all species, and at >1 kb was 43.5% for cheetahs, 28.8% for Sumatran tigers and 30.2% for snow leopards.
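Inter-marker distance summaries of the kind reported above can be derived from marker coordinates with a few lines of Python. The sketch below is illustrative only: the function name `intermarker_distances` and the toy coordinates are hypothetical, and distances are taken between neighbouring markers on the same chromosome.

```python
from statistics import mean


def intermarker_distances(positions_by_chrom):
    """Return distances (bp) between adjacent markers on each chromosome."""
    distances = []
    for positions in positions_by_chrom.values():
        ordered = sorted(positions)
        distances.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return distances


# Hypothetical polymorphic marker coordinates on two chromosomes.
markers = {"A1": [1_000, 5_000, 12_000], "B1": [2_000, 2_500]}
distances = intermarker_distances(markers)
mean_distance = mean(distances)
share_over_1kb = sum(d > 1_000 for d in distances) / len(distances)
print(distances, mean_distance, share_over_1kb)
```

The same list of distances supports both the mean spacing and the proportion of gaps above any chosen threshold.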

To examine the ability of each dataset to detect functional variants, variant effect predictor (VEP) was used to assign functional classes to SNPs in big cats, based on the felCat9 reference assembly annotation. These annotations were limited to the most severe consequence per variant (Table 1). Across all species in both datasets, non-coding variant categories made up the majority of SNPs. In both array and RRS datasets, variants segregating in genes unique to the cheetah were enriched for neuron projection morphogenesis (P_array = 6.21 × 10⁻⁶; P_RRS = 2.91 × 10⁻⁶; GO:0048812). SNPs segregating in genes in the snow leopard array dataset were not functionally enriched; however, those in the RRS dataset were enriched for dendrite development (P_RRS = 1.98 × 10⁻³; GO:0050773). Similarly, in Sumatran tigers, SNPs segregating in genes were enriched for neuron projection morphogenesis (P_array = 4.94 × 10⁻⁶; GO:0048812) and RRS markers segregated in genes enriched for dendrite morphogenesis (P_RRS = 1.61 × 10⁻⁶; GO:0048813). Twenty-nine of the 107 variants common to all species in the RRS dataset were associated with 14 genes (Table S4).

Detecting population structure and relatedness

To test whether the array and RRS datasets convey similar information about population structure, principal component analysis (PCA) was conducted to visualise pairwise distances based on a variance-standardised relationship matrix among all samples and within species for each dataset. PCA of the array dataset was performed using 221 autosomal SNPs (Figure 3a). PCA of the RRS dataset was performed using 373 autosomal SNPs (Figure 3b). Both array and RRS datasets successfully differentiated species and individuals. Among cheetahs there was clear sub-population clustering of Namibian and Tanzanian samples in the RRS dataset. Sumatran tigers were spread, regardless of geographic provenance (Indonesia or USA). Pairwise IBS/Hamming distances were >0.99 for all samples across all species cohorts in the array dataset (cheetahs 0.997 ± 0.0007, snow leopards 0.992, Sumatran tigers 0.998 ± 0.0001). In the RRS dataset, cheetahs had an average IBS value of 0.68 ± 0.035, Sumatran tigers had an average IBS value of 0.67 ± 0.02 and snow leopards had an IBS of 0.83.

Variant associations among species

To test the ability of array and RRS datasets to explore the genetic basis of stratification among individuals, a case-control association analysis comparing Panthera species to cheetah was run for both datasets, using the first two principal components (described in previous section) as covariates. For the array dataset, 4,252 polymorphic SNPs were used to perform a case-control GWAS to detect differences between Panthera species and cheetah. In total, 1,936 fixed SNPs passing a Bonferroni-corrected genome-wide significance level of 3.19 × 10⁻⁵ segregated in 866 genes across all autosomes (Figure S2a). Of the segregating SNPs, 71 synonymous and missense variants were observed across 31 genes. GO annotation by DAVID highlighted taxonomic differences in cilium organisation (GO:0044782; P = 5.5 × 10⁻²) and cellular protein complex assembly (GO:0034622; P = 7.0 × 10⁻²). To examine potential functional significance of these genes, AgriGO gene ontology singular enrichment analysis (SEA) found ATPase activity (GO:0016887) to be significantly enriched (FDR <0.05) among SNPs showing fixed differences between cheetah and Panthera species (Figure S3a).

In the RRS dataset, 115,483 SNPs were included in the case-control GWAS comparing cheetah with Panthera species. Of these, 11,841 SNPs passed the Bonferroni-corrected genome-wide significance threshold of 4.32 × 10⁻⁷ (Figure S2b). In total, 64 missense and synonymous SNPs segregating in 61 genes displayed fixed differences between Panthera species and cheetahs. The segregating SNPs included 13 missense variants (Table S5). One missense variant in SLC45A2 was uniquely observed in cheetahs. GO annotation highlighted taxonomic differences in ion transport (GO:0006811; P = 2.5 × 10⁻²), sensory perception of light stimulus (GO:0050953; P = 1.6 × 10⁻²), Schwann cell development (GO:0014044; P = 1.9 × 10⁻³) and actomyosin structure organisation (GO:0031032; P = 8.8 × 10⁻²). AgriGO gene ontology SEA found significant enrichment (FDR <0.05) of six GO terms involved in neurological function (Figure S3b).
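A Bonferroni-corrected threshold of this kind is conventionally obtained by dividing the family-wise error rate by the number of tests. The Python sketch below assumes an error rate of 0.05, which is not stated in the text, and uses the hypothetical helper name `bonferroni_threshold`. With the 115,483 RRS SNPs tested it gives a value of roughly 4.33 × 10⁻⁷, close to the RRS threshold reported above.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold for `n_tests` independent tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Number of RRS SNPs entering the case-control analysis (from the text);
# alpha = 0.05 is an assumption.
rrs_threshold = bonferroni_threshold(115_483)
print(f"{rrs_threshold:.3e}")
```

Because the correction treats every SNP as an independent test, it is conservative when markers are in strong LD, as they are in these datasets.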

Discussion

When choosing a suitable SNP identification and genotyping method, researchers must consider whether their dataset of choice can reflect the relevant aspects of a population’s genetic profile. The fundamental requirement that SNP-based population analyses must fulfill is the ability to adequately interrogate loci driving population stratification, including both neutral loci and those under selective pressure. Here, the RRS dataset consistently outperformed the feline genotyping array in genome and gene coverage and in resolving population structure and genotype-phenotype investigations.

Compared with the array, the RRS dataset offered a far more comprehensive SNP set across all species. The quantity and distribution of markers provided by RRS depends largely on the choice of restriction enzymes and sequencing depth (Wang, Li, Qi, Du, & Zhang, 2016). Here, we simulated a double-digest RADseq method using a restriction enzyme combination previously used to estimate genetic differentiation and signatures of selection in wild Bengal tigers (Panthera tigris tigris) across India (Natesh et al., 2017). Compared with Natesh et al. (2017), we generated fewer SNPs across individual Sumatran tiger samples and a comparable number in the pooled sample. This can be attributed to differences in sequencing depth of samples, reference genome assemblies and quality-based filtering criteria. Natesh et al. (2017) used the draft tiger reference assembly (Cho et al., 2013), which has since been corrected due to erroneous base calls affecting 4,472 genes (Mittal, Jaiswal, Vijay, Saxena, & Sharma, 2019).

The low proportion of polymorphic array markers is consistent with previous cross-species applications of SNP arrays in other mammals (Kharzinova et al., 2015; Miller, Poissant, Kijas, Coltman, & the International Sheep Genomics, 2011; Ogden et al., 2012) and the feline array (Gandolfi et al., 2018; Li, Davis, Eizirik, & Murphy, 2016). In mammals, cross-species applications of SNP arrays typically show a linear decrease in genotyping rates with divergence (~1.5% per million years), coupled with an exponential decrease in polymorphic calls (Hoffman, Thorne, McEwing, Forcada, & Ogden, 2013; Miller, Kijas, Heaton, McEwan, & Coltman, 2012). The domestic cat shares a divergence time of ~6.7 million years (MY) with the cheetah (Johnson et al., 2006), and ~11 MY with Panthera species (Cho et al., 2013). SNP arrays are developed for model species to measure genetic diversity and investigate signatures of selection and disease. As such, they contain species-specific SNPs and typically reflect a genetic profile associated with domestication and breed development. The feline SNP array was developed using 47 pedigreed cat breeds and has been used to study the genetic basis of simple traits (Yu, Creighton, Buckley, Lyons, & Consortium, 2020) and complex disease (Samaha et al., 2020), and to resolve breed structure (Alhaddad et al., 2013; Gandolfi et al., 2018). It is unlikely that the low proportion of amplified polymorphic markers accurately reflects the overall pattern of diversity in the domestic cat’s wild relatives.

As a viable alternative to WGS, low-density SNP typing methodologies take advantage of LD among markers to maximise efficient coverage of the whole genome. Including SNPs that are in sufficient LD (r² > 0.8) with each other is critical for capturing causative variants (Tam et al., 2019). Given its higher SNP density, the RRS dataset was able to delineate genome-wide LD, unlike the array dataset. LD between markers persisted at higher levels in both cheetahs and Sumatran tigers than has previously been observed in domesticated species including the domestic cat (Alhaddad et al., 2013). While this pattern may reflect the genomic profile of critically endangered (Sumatran tiger) and vulnerable (cheetah) species, with a relatively low level of genetic diversity (Dobrynin et al., 2015; Liu et al., 2018; Luo, Johnson, & O'Brien, 2010), these estimates may also reflect the elevated correlation among SNPs in cross-species genome alignments (Malomane et al., 2018). Fixed SNPs that arise from cross-species differences between the domestic cat reference and wild cat species can overestimate genetic similarity among samples (Armstrong et al., 2020; Gopalakrishnan et al., 2017). Given the small sample sizes, we were unable to determine whether monomorphic SNPs represent fixed differences between the domestic cat and wild cat species or true homozygous sites.

Another promising indicator of a SNP set’s utility is its ability to identify causative alleles; a major benefit of reference-based approaches is the biological context that gene and variant annotation databases can provide. By studying the genetic basis of phenotypic traits in wild populations, we improve our understanding of adaptation under natural selection (Kozma et al., 2019), species-specific signatures of selection (Liu et al., 2018) and the genetic basis of heritable disease (Storfer et al., 2018). In recent years the domestic cat has been the subject of a growing bank of genomic resources and has served as a model for heritable diseases in humans (Gurda, Bradbury, & Vite, 2017; Mazrier et al., 2003; Menotti-Raymond & O’Brien, 2008; Samaha, Beatty, Wade, & Haase, 2019). The high degree of genomic synteny between humans and cats (Murphy et al., 2000) allows us to take advantage of annotated human gene and metabolic pathway databases in studying the genetic basis of adaptive traits and heritable diseases in wild and domestic cat species. Variant annotation and GWAS analyses of both the array and RRS datasets reported here revealed fixed variants in genes that have been previously associated with phenotypic differences between the cheetah and Panthera species. However, the higher resolution of the RRS dataset offered a more comprehensive genomic picture. We identified missense variants in genes potentially underlying the cheetah’s reputation as the fastest land mammal. This included GO annotation implicating PPFIA1 and EPB41L4A in actomyosin protein complexes, which have been shown to affect striated muscle contraction and heart rate (Guo & Guilford, 2006). Additionally, missense variants were observed in genes responsible for cardiac development including LAMA2 (Yarnitzky & Volk, 1995) and ADGRG6 (Patra et al., 2013). Cheetahs have been described as having enlarged hearts, lungs and bronchi as adaptations for high-speed running (Eaton, 1974).
Further, a missense variant unique to cheetahs was observed in SLC45A2, a gene involved in melanin synthesis (Holl et al., 2019; Le et al., 2020; H. Wang et al., 2016) and previously implicated in white coat colour in tigers (Xu et al., 2013).

Neutral markers have traditionally been used to identify population structure; however, these do not give insight into adaptive divergence among and within populations. Quantifying relatedness among individuals and population structure using SNP data aids conservation management by informing the distribution of diversity in wild populations (Kitchener et al., 2017; Schmidt-Küntzel et al., 2018), variation in individual fitness (Wright et al., 2020), and selecting mating pairs in captive breeding programmes (Ivy, Putnam, Navarro, Gurr, & Ryder, 2016). Pairwise Hamming distances in the RRS dataset resolved a higher degree of population structure within the cheetah cohort than the array dataset. Similarly, the RRS dataset PCA was able to differentiate between species, subpopulations within species and individuals. The tighter clustering of Panthera species in the RRS PCA and Hamming distances compared with the array dataset reflects the closer phylogenetic relationship between Panthera species (snow leopard and Sumatran tiger), compared with cheetahs (Mattern & McLennan, 2000). As a result of habitat fragmentation, there are currently four genetically distinct sub-populations of cheetah, two of which reside in Namibia and Tanzania (Kitchener et al., 2017; Schmidt-Küntzel et al., 2018). Only the RRS dataset was able to distinguish this geographic population structure among cheetahs. SNP discovery methods like RRS do not face the same limitations as SNP genotyping arrays in cross-species applications and are therefore able to resolve a higher degree of genome complexity (Miller et al., 2012). The ability to capture background relatedness among samples is also an essential component of GWAS, as cryptic relatedness is a major source of confounding (Thomson & McWhirter, 2017).

Conclusion

We demonstrated cross-species applications of the feline SNP array and reference-based RRS to be reproducible across three Felidae species, with limited overlap between RRS and array marker sets. Compared with the array dataset, the RRS dataset offered higher SNP density and greater resolution of population structure and genotype-phenotype associations. Our findings have highlighted the potential role that applications of low-density cross-species genomic methods can play in conservation biology. Genome-wide SNP profiles are key to answering a range of pressing biological and ecological questions, from describing levels of diversity within and between species to identifying genes underlying phenotypic traits of interest.

Acknowledgements

The research undertaken in this project was funded by the Jenna O’Grady Donley Fund, Sydney School of Veterinary Science. The authors acknowledge the University of Sydney’s high-performance computing cluster Artemis for providing computing resources that have contributed to the research reported within this manuscript. We would also like to acknowledge Dr Jacqueline Norris, Zoos Victoria, Melbourne Zoo, Mogo Zoo, Werribee Open Range Zoo and Taronga Conservation Society Australia for providing whole blood samples of big cats included in poolseq datasets.

References

Alhaddad, H., Khan, R., Grahn, R. A., Gandolfi, B., Mullikin, J. C., Cole, S. A., . . . Longeri, M. (2013). Extent of linkage disequilibrium in the domestic cat, Felis silvestris catus, and its breeds. PLoS One, 8(1), e53537.

Amberger, J. S., Bocchini, C. A., Schiettecatte, F., Scott, A. F., & Hamosh, A. (2015). OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res, 43(Database issue), D789-798. doi:10.1093/nar/gku1205

Armstrong, E. E., Taylor, R. W., Miller, D. E., Kaelin, C. B., Barsh, G. S., Hadly, E. A., & Petrov, D. (2020). Long live the king: chromosome-level assembly of the lion (Panthera leo) using linked-read, Hi-C, and long-read data. BMC Biology, 18(1), 3. doi:10.1186/s12915-019-0734-5

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., . . . Sherlock, G. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1), 25-29. doi:10.1038/75556

Buckley, R. M., Davis, B. W., Brashear, W. A., Farias, F. H., Kuroki, K., Graves, T., . . . Middleton, R. (2020). A new domestic cat genome assembly based on long sequence reads empowers feline genomic medicine and identifies a novel gene for dwarfism. bioRxiv.

Chang, C. C., Chow, C. C., Tellier, L. C., Vattikuti, S., Purcell, S. M., & Lee, J. J. (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4(1), s13742-015-0047-8.

Cho, Y. S., Hu, L., Hou, H., Lee, H., Xu, J., Kwon, S., . . . Bhak, J. (2013). The tiger genome and comparative analysis with lion and snow leopard genomes. Nature Communications, 4(1), 2433. doi:10.1038/ncomms3433

Choquet, M., Smolina, I., Dhanasiri, A. K. S., Blanco-Bercial, L., Kopp, M., Jueterbock, A., . . . Hoarau, G. (2019). Towards population genomics in non-model species with large genomes: a case study of the marine zooplankton Calanus finmarchicus. Royal Society Open Science, 6(2), 180608. doi:10.1098/rsos.180608

De La Torre, A. R., Wilhite, B., & Neale, D. B. (2019). Environmental Genome-Wide Association Reveals Climate Adaptation Is Shaped by Subtle to Moderate Allele Frequency Shifts in Loblolly Pine. Genome Biology and Evolution, 11(10), 2976-2989. doi:10.1093/gbe/evz220

DePristo, M. A., Banks, E., Poplin, R., Garimella, K. V., Maguire, J. R., Hartl, C., . . . Hanna, M. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5), 491.

Dobrynin, P., Liu, S., Tamazian, G., Xiong, Z., Yurchenko, A. A., Krasheninnikova, K., . . . Johnson, W. (2015). Genomic legacy of the African cheetah, Acinonyx jubatus. Genome Biology, 16(1), 1-20.

Eaton, R. (1974). The Cheetah. The Biology, Ecology, and Behavior of an Endangered Species. New York: Van Nostrand Reinhold.

Gagnaire, P. A., Normandeau, E., Pavey, S. A., & Bernatchez, L. (2013). Mapping phenotypic, expression and transmission ratio distortion QTL using RAD markers in the Lake Whitefish (Coregonus clupeaformis). Mol Ecol, 22(11), 3036-3048. doi:10.1111/mec.12127

Galla, S. J., Buckley, T. R., Elshire, R., Hale, M. L., Knapp, M., McCallum, J., . . . Steeves, T. E. (2016). Building strong relationships between conservation genetics and primary industry leads to mutually beneficial genomic advances. Molecular Ecology, 25(21), 5267-5281. doi:10.1111/mec.13837

Galla, S. J., Forsdick, N. J., Brown, L., Hoeppner, M. P., Knapp, M., Maloney, R. F., . . . Steeves, T. E. (2019). Reference genomes from distantly related species can be used for discovery of single nucleotide polymorphisms to inform conservation management. Genes, 10(1), 9.

Gandolfi, B., Alhaddad, H., Abdi, M., Bach, L. H., Creighton, E. K., Davis, B. W., . . . Lyons, L. A. (2018). Applications and efficiencies of the first cat 63K DNA array. Scientific Reports, 8(1), 7024. doi:10.1038/s41598-018-25438-0

Gooley, R., Hogg, C. J., Belov, K., & Grueber, C. E. (2017). No evidence of inbreeding depression in a Tasmanian devil insurance population despite significant variation in inbreeding. Scientific Reports, 7(1), 1830. doi:10.1038/s41598-017-02000-y

Gopalakrishnan, S., Samaniego Castruita, J. A., Sinding, M.-H. S., Kuderna, L. F. K., Räikkönen, J., Petersen, B., . . . Gilbert, M. T. P. (2017). The wolf reference genome sequence (Canis lupus lupus) and its implications for Canis spp. population genomics. BMC Genomics, 18(1), 495. doi:10.1186/s12864-017-3883-3

Guo, B., & Guilford, W. H. (2006). Mechanics of actomyosin bonds in different nucleotide states are tuned to muscle contraction. Proceedings of the National Academy of Sciences, 103(26), 9844. doi:10.1073/pnas.0601255103

Gurda, B. L., Bradbury, A. M., & Vite, C. H. (2017). Canine and Feline Models of Human Genetic Diseases and Their Contributions to Advancing Clinical Therapies. The Yale Journal of Biology and Medicine, 90(3), 417-431.

Hoffman, J. I., Thorne, M. A., McEwing, R., Forcada, J., & Ogden, R. (2013). Cross-amplification and validation of SNPs conserved over 44 million years between seals and dogs. PLoS One, 8(7), e68365.

Holl, H. M., Pflug, K. M., Yates, K. M., Hoefs-Martin, K., Shepard, C., Cook, D. G., . . . Brooks, S. A. (2019). A candidate gene approach identifies variants in SLC45A2 that explain dilute phenotypes, pearl and sunshine, in compound heterozygote horses. Anim Genet, 50(3), 271-274. doi:10.1111/age.12790

Huang da, W., Sherman, B. T., & Lempicki, R. A. (2009a). Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res, 37(1), 1-13. doi:10.1093/nar/gkn923

Huang da, W., Sherman, B. T., & Lempicki, R. A. (2009b). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc, 4(1), 44-57. doi:10.1038/nprot.2008.211

Husby, A., Kawakami, T., Rönnegård, L., Smeds, L., Ellegren, H., & Qvarnström, A. (2015). Genome-wide association mapping in a wild avian population identifies a link between genetic and phenotypic variation in a life-history trait. Proceedings of the Royal Society B: Biological Sciences, 282(1806), 20150156.

Ivy, J. A., Putnam, A. S., Navarro, A. Y., Gurr, J., & Ryder, O. A. (2016). Applying SNP-Derived Molecular Coancestry Estimates to Captive Breeding Programs. Journal of Heredity, 107(5), 403-412. doi:10.1093/jhered/esw029

Johnson, W. E., Eizirik, E., Pecon-Slattery, J., Murphy, W. J., Antunes, A., Teeling, E., & O'Brien, S. J. (2006). The Late Miocene Radiation of Modern Felidae: A Genetic Assessment. Science, 311(5757), 73. doi:10.1126/science.1122277

Judkins, M. E., Couger, B. M., Warren, W. C., & Van Den Bussche, R. A. (2020). A 50K SNP array reveals genetic structure for bald eagles (Haliaeetus leucocephalus). Conservation Genetics, 21(1), 65-76. doi:10.1007/s10592-019-01216-x

Kharzinova, V. R., Dotsev, A. V., Deniskova, T. E., Solovieva, A. D., Fedorov, V. I., Layshev, K. A., . . . Reyer, H. (2018). Genetic diversity and population structure of domestic and wild reindeer (Rangifer tarandus L. 1758): A novel approach using BovineHD BeadChip. PLoS One, 13(11), e0207944.

Kharzinova, V. R., Sermyagin, A. A., Gladyr, E. A., Okhlopkov, I. M., Brem, G., & Zinovieva, N. A. (2015). A Study of Applicability of SNP Chips Developed for Bovine and Ovine Species to Whole-Genome Analysis of Reindeer Rangifer tarandus. Journal of Heredity, 106(6), 758-761. doi:10.1093/jhered/esv081

Kitchener, A. C., Breitenmoser-Würsten, C., Eizirik, E., Gentry, A., Werdelin, L., Wilting, A., . . . Driscoll, C. (2017). A revised taxonomy of the Felidae: The final report of the Cat Classification Task Force of the IUCN Cat Specialist Group. Cat News.

Le, L., Escobar, I. E., Ho, T., Lefkovith, A. J., Latteri, E., Haltaufderhyde, K. D., . . . Marks, M. S. (2020). SLC45A2 protein stability and regulation of melanosome pH determine melanocyte pigmentation. Mol Biol Cell, 31(24), 2687-2702. doi:10.1091/mbc.E20-03-0200

Li, G., Davis, B. W., Eizirik, E., & Murphy, W. J. (2016). Phylogenomic evidence for ancient hybridization in the genomes of living cats (Felidae). Genome Res, 26(1), 1-11. doi:10.1101/gr.186668.114

Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (Oxford, England), 27(21), 2987-2993. doi:10.1093/bioinformatics/btr509

Lischer, H. E. L., & Shimizu, K. K. (2017). Reference-guided de novo assembly approach improves genome reconstruction for related species. BMC Bioinformatics, 18(1), 474. doi:10.1186/s12859-017-1911-6

Liu, Y.-C., Sun, X., Driscoll, C., Miquelle, D. G., Xu, X., Martelli, P., . . . Luo, S.-J. (2018). Genome-Wide Evolutionary Analysis of Natural History and Adaptation in the World’s Tigers. Current Biology, 28(23), 3840-3849.e3846. doi:10.1016/j.cub.2018.09.019

Luo, S.-J., Johnson, W. E., & O'Brien, S. J. (2010). Applying molecular genetic tools to tiger conservation. Integrative zoology, 5(4), 351-362. doi:10.1111/j.1749-4877.2010.00222.x

Malomane, D. K., Reimer, C., Weigend, S., Weigend, A., Sharifi, A. R., & Simianer, H. (2018). Efficiency of different strategies to mitigate ascertainment bias when using SNP panels in diversity studies. BMC Genomics, 19(1), 22. doi:10.1186/s12864-017-4416-9

Mardis, E. R. (2010). The $1,000 genome, the $100,000 analysis? Genome Medicine, 2(11), 84. doi:10.1186/gm205

Margres, M. J., Jones, M. E., Epstein, B., Kerlin, D. H., Comte, S., Fox, S., . . . Storfer, A. (2018). Large-effect loci affect survival in Tasmanian devils (Sarcophilus harrisii) infected with a transmissible cancer. Molecular Ecology, 27(21), 4189-4199. doi:10.1111/mec.14853

Mattern, M. Y., & McLennan, D. A. (2000). Phylogeny and Speciation of Felids. Cladistics, 16(2), 232-253. doi:10.1006/clad.2000.0132

Mazrier, H., Van Hoeven, M., Wang, P., Knox, V. W., Aguirre, G. D., Holt, E., . . . Giger, U. (2003). Inheritance, Biochemical Abnormalities, and Clinical Features of Feline Mucolipidosis II: The First Animal Model of Human I-Cell Disease. Journal of Heredity, 94(5), 363-373. doi:10.1093/jhered/esg080

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., . . . DePristo, M. A. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research, 20(9), 1297-1303. doi:10.1101/gr.107524.110

McLaren, W., Gil, L., Hunt, S. E., Riat, H. S., Ritchie, G. R. S., Thormann, A., . . . Cunningham, F. (2016). The Ensembl Variant Effect Predictor. Genome biology, 17(1), 122. doi:10.1186/s13059-016-0974-4

McMahon, B. J., Teeling, E. C., & Höglund, J. (2014). How and why should we implement genomics into conservation? Evolutionary Applications, 7(9), 999-1007.

Menotti-Raymond, M., & O’Brien, S. J. (2008). The Domestic Cat, Felis catus, as a Model of Hereditary and Infectious Disease. In P. M. Conn (Ed.), Sourcebook of Models for Biomedical Research (pp. 221-232). Totowa, NJ: Humana Press.

Miller, J. M., Kijas, J. W., Heaton, M. P., McEwan, J. C., & Coltman, D. W. (2012). Consistent divergence times and allele sharing measured from cross-species application of SNP chips developed for three domestic species. Molecular Ecology Resources, 12(6), 1145-1150. doi:10.1111/1755-0998.12017

Miller, J. M., Poissant, J., Kijas, J. W., Coltman, D. W., & the International Sheep Genomics, C. (2011). A genome-wide set of SNPs detects population substructure and long range linkage disequilibrium in wild sheep. Molecular Ecology Resources, 11(2), 314-322. doi:10.1111/j.1755-0998.2010.02918.x

Minias, P., Dunn, P. O., Whittingham, L. A., Johnson, J. A., & Oyler-McCance, S. J. (2019). Evaluation of a Chicken 600K SNP genotyping array in non-model species of grouse. Scientific Reports, 9(1), 6407. doi:10.1038/s41598-019-42885-5

Mittal, P., Jaiswal, S. K., Vijay, N., Saxena, R., & Sharma, V. K. (2019). Comparative analysis of corrected tiger genome provides clues to its neuronal evolution. Scientific Reports, 9(1), 18459. doi:10.1038/s41598-019-54838-z

Murphy, W. J., Sun, S., Chen, Z., Yuhki, N., Hirschmann, D., Menotti-Raymond, M., & O'Brien, S. J. (2000). A radiation hybrid map of the cat genome: implications for comparative mapping. Genome research, 10(5), 691-702. doi:10.1101/gr.10.5.691

Natesh, M., Atla, G., Nigam, P., Jhala, Y. V., Zachariah, A., Borthakur, U., & Ramakrishnan, U. (2017). Conservation priorities for endangered Indian tigers through a genomic lens. Scientific Reports, 7(1), 9614. doi:10.1038/s41598-017-09748-3

Negro, S. S., Millet, E. J., Madur, D., Bauland, C., Combes, V., Welcker, C., . . . Nicolas, S. D. (2019). Genotyping-by-sequencing and SNP-arrays are complementary for detecting quantitative trait loci by tagging different haplotypes in association studies. BMC plant biology, 19(1), 318-318. doi:10.1186/s12870-019-1926-4

O’Brien, M. J., Beijerink, N. J., Sansom, M., Thornton, S. W., Chew, T., & Wade, C. M. (2020). A large deletion on CFA28 omitting ACSL5 gene is associated with intestinal lipid malabsorption in the Australian Kelpie dog breed. Scientific Reports, 10(1), 18223. doi:10.1038/s41598-020-75243-x

Ogden, R., Baird, J., Senn, H., & McEwing, R. (2012). The use of cross-species genome-wide arrays to discover SNP markers for conservation genetics: a case study from Arabian and scimitar-horned oryx. Conservation Genetics Resources, 4(2), 471-473. doi:10.1007/s12686-011-9577-2

Patra, C., van Amerongen, M. J., Ghosh, S., Ricciardi, F., Sajjad, A., Novoyatleva, T., . . . Engel, F. B. (2013). Organ-specific function of adhesion G protein-coupled receptor GPR126 is domain-dependent. Proc Natl Acad Sci U S A, 110(42), 16898-16903. doi:10.1073/pnas.1304837110

Paun, O., Verhoeven, K. J. F., & Richards, C. L. (2019). Opportunities and limitations of reduced representation bisulfite sequencing in plant ecological epigenomics. New Phytologist, 221(2), 738-742. doi:10.1111/nph.15388

Pendleton, A. L., Shen, F., Taravella, A. M., Emery, S., Veeramah, K. R., Boyko, A. R., & Kidd, J. M. (2018). Comparison of village dog and wolf genomes highlights the role of the neural crest in dog domestication. BMC Biology, 16(1), 64. doi:10.1186/s12915-018-0535-2

Qi, J., Holyoak, M., Ning, Y., & Jiang, G. (2020). Ecological thresholds and large carnivores conservation: Implications for the Amur tiger and leopard in China. Global Ecology and Conservation, 21, e00837. doi:10.1016/j.gecco.2019.e00837

Ripple, W. J., Estes, J. A., Beschta, R. L., Wilmers, C. C., Ritchie, E. G., Hebblewhite, M., . . . Wirsing, A. J. (2014). Status and Ecological Effects of the World’s Largest Carnivores. Science, 343(6167), 1241484. doi:10.1126/science.1241484

Samaha, G., Beatty, J., Wade, C. M., & Haase, B. (2019). The Burmese cat as a genetic model of type 2 diabetes in humans. Animal Genetics, 50(4), 319-325. doi:10.1111/age.12799

Samaha, G., Wade, C., Mazrier, H., Grueber, C. E., & Haase, B. (Under Review). Exploiting genomic synteny in Felidae: using cross-species genome alignment and SNP discovery to inform conservation management in big cats. BMC Genomics.

Samaha, G., Wade, C. M., Beatty, J., Lyons, L. A., Fleeman, L. M., & Haase, B. (2020). Mapping the genetic basis of diabetes mellitus in the Australian Burmese cat (Felis catus). Scientific Reports, 10(1), 19194. doi:10.1038/s41598-020-76166-3

Schaefer, R. J., Schubert, M., Bailey, E., Bannasch, D. L., Barrey, E., Bar-Gal, G. K., . . . McCue, M. E. (2017). Developing a 670k genotyping array to tag ~2M SNPs across 24 horse breeds. BMC Genomics, 18(1), 565. doi:10.1186/s12864-017-3943-8

Schmidt-Küntzel, A., Dalton, D. L., Menotti-Raymond, M., Fabiano, E., Charruau, P., Johnson, W. E., . . . O’Brien, S. J. (2018). Conservation genetics of the cheetah: Genetic history and implications for conservation. Cheetahs: Biology and Conservation, 71.

Shafer, A. B., Wolf, J. B., Alves, P. C., Bergström, L., Bruford, M. W., Brännström, I., . . . Ekblom, R. (2015). Genomics and the challenging translation into conservation practice. Trends in ecology & evolution, 30(2), 78-87.

Smedley, D., Haider, S., Ballester, B., Holland, R., London, D., Thorisson, G., & Kasprzyk, A. (2009). BioMart – biological queries made easy. BMC Genomics, 10(1), 22. doi:10.1186/1471-2164-10-22

Storfer, A., Hohenlohe, P. A., Margres, M. J., Patton, A., Fraik, A. K., Lawrance, M., . . . Jones, M. E. (2018). The devil is in the details: Genomics of transmissible cancers in Tasmanian devils. PLOS Pathogens, 14(8), e1007098. doi:10.1371/journal.ppat.1007098

Supple, M. A., & Shapiro, B. (2018). Conservation of biodiversity in the genomics era. Genome Biology, 19(1), 131. doi:10.1186/s13059-018-1520-3

Takezaki, N., & Nei, M. (2009). Genomic Drift and Evolution of Microsatellite DNAs in Human Populations. Molecular Biology and Evolution, 26(8), 1835-1840. doi:10.1093/molbev/msp091

Tam, V., Patel, N., Turcotte, M., Bossé, Y., Paré, G., & Meyre, D. (2019). Benefits and limitations of genome-wide association studies. Nature Reviews Genetics, 20(8), 467-484. doi:10.1038/s41576-019-0127-1

Taylor, H., Dussex, N., & van Heezik, Y. (2017). Bridging the conservation genetics gap by identifying barriers to implementation for conservation practitioners. Global Ecology and Conservation, 10, 231-242. doi:10.1016/j.gecco.2017.04.001

Terrell, K. A., Crosier, A. E., Wildt, D. E., O'Brien, S. J., Anthony, N. M., Marker, L., & Johnson, W. E. (2016). Continued decline in genetic diversity among wild cheetahs (Acinonyx jubatus) without further loss of semen quality. Biological Conservation, 200, 192-199. doi:10.1016/j.biocon.2016.05.034

Thomson, R., & McWhirter, R. (2017). Adjusting for Familial Relatedness in the Analysis of GWAS Data. Methods Mol Biol, 1526, 175-190. doi:10.1007/978-1-4939-6613-4_10

Tian, T., Liu, Y., Yan, H., You, Q., Yi, X., Du, Z., . . . Su, Z. (2017). agriGO v2.0: a GO analysis toolkit for the agricultural community, 2017 update. Nucleic Acids Res, 45(W1), W122-W129. doi:10.1093/nar/gkx382

Wang, H., Xue, L., Li, Y., Zhao, B., Chen, T., Liu, Y., . . . Wang, J. (2016). Distribution and expression of SLC45A2 in the skin of sheep with different coat colors. Folia Histochem Cytobiol, 54(3), 143-150. doi:10.5603/FHC.a2016.0015

Wang, J., Guofan, Z., Haigang, Q., Li, L., & Xuedi, D. (2016). RestrictionDigest: A powerful Perl module for simulating genomic restriction digests. Electronic Journal of Biotechnology, 21, 36-42. doi:10.1016/j.ejbt.2016.02.003

Wang, J., Li, L., Qi, H., Du, X., & Zhang, G. (2016). RestrictionDigest: A powerful Perl module for simulating genomic restriction digests. Electronic Journal of Biotechnology, 21, 36-42. doi:10.1016/j.ejbt.2016.02.003

Wang, K., Wu, P., Yang, Q., Chen, D., Zhou, J., Jiang, A., . . . Tang, G. (2018). Detection of Selection Signatures in Chinese Landrace and Yorkshire Pigs Based on Genotyping-by-Sequencing Data. Frontiers in genetics, 9, 119-119. doi:10.3389/fgene.2018.00119

Willet, C. E., & Haase, B. (2014). An updated felCat5 SNP manifest for the Illumina Feline 63k SNP genotyping array. Anim Genet, 45(4), 614-615. doi:10.1111/age.12169

Wright, B., Farquharson, K. A., McLennan, E. A., Belov, K., Hogg, C. J., & Grueber, C. E. (2020). A demonstration of conservation genomics for threatened species management. Molecular ecology resources, 20(6), 1526-1541. doi:10.1111/1755-0998.13211

Wright, B., Morris, K., Grueber, C. E., Willet, C. E., Gooley, R., Hogg, C. J., . . . Belov, K. (2015). Development of a SNP-based assay for measuring genetic diversity in the Tasmanian devil insurance population. BMC Genomics, 16(1), 791. doi:10.1186/s12864-015-2020-4

Xu, X., Dong, G.-X., Hu, X.-S., Miao, L., Zhang, X.-L., Zhang, D.-L., . . . Luo, S.-J. (2013). The Genetic Basis of White Tigers. Current Biology, 23(11), 1031-1035. doi:10.1016/j.cub.2013.04.054

Yang, Q., Chen, H., Ye, J., Liu, C., Wei, R., Chen, C., & Huang, L. (2019). Genetic Diversity and Signatures of Selection in 15 Chinese Indigenous Dog Breeds Revealed by Genome-Wide SNPs. Frontiers in Genetics, 10(1174). doi:10.3389/fgene.2019.01174

Yarnitzky, T., & Volk, T. (1995). Laminin is required for heart, somatic muscles, and gut development in the Drosophila embryo. Developmental biology, 169(2), 609-618. doi:10.1006/dbio.1995.1173

Yu, Y., Creighton, E. K., Buckley, R. M., Lyons, L. A., & Consortium, L. (2020). A Deletion in GDF7 is Associated with a Heritable Forebrain Commissural Malformation Concurrent with Ventriculomegaly and Interhemispheric Cysts in Cats. Genes, 11(6), 672.

Zimmerman, S. J., Aldridge, C. L., & Oyler-McCance, S. J. (2020). An empirical comparison of population genetic analyses using microsatellite and SNP data for a species of conservation concern. BMC Genomics, 21(1), 382. doi:10.1186/s12864-020-06783-9

Data sharing and benefit-sharing statement

Sequence data for cheetah, Sumatran tiger and snow leopard individual samples are available on the NCBI Sequence Read Archive. Cheetah samples SRR2737540, SRR2737541, SRR2737542, SRR2737543, SRR2737544 and SRR2737545 were uploaded under BioProject no. PRJNA297824. Sumatran tiger samples SRR7152379, SRR7152382, SRR7152383, SRR7152384, SRR7152385, SRR7152386 and SRR7152388 were uploaded under BioProject no. PRJNA437782. Snow leopard sample SRR836372 was uploaded under BioProject no. PRJNA182708.

Author contributions

All authors conceptualised the study design. B.W and C.M.W were responsible for the provision of resources. Experimental work, bioinformatic analyses and visualisation were conducted by G.S under the supervision of B.H and C.E.G. G.S drafted the original manuscript, which all authors edited. All authors approved the final manuscript.

Tables and figures

Figure 1: Comparison of unique and shared polymorphic markers across both datasets for each species. a. Genotyping array: the proportion of genotyped SNPs that were polymorphic varied across species (range: 0.15%-1.19%). No markers were shared across all species. b. Reduced-representation sequencing: 107 SNPs were shared across all three species.
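The unique and shared marker counts summarised in Figure 1 can be pictured as set operations on marker coordinates. The following is a minimal sketch, assuming each species' polymorphic markers are held as a set of (chromosome, position) tuples on the felCat9 coordinate system; the function name and the toy coordinates are hypothetical and are not taken from the study.

```python
from itertools import combinations


def marker_overlap(markers_by_species):
    """Count markers found in exactly each combination of species.

    markers_by_species maps a species name to a set of
    (chromosome, position) tuples. The result maps each combination of
    species to the number of markers present in all of those species
    and absent from every other species (one Venn diagram region).
    """
    species = sorted(markers_by_species)
    regions = {}
    for size in range(1, len(species) + 1):
        for combo in combinations(species, size):
            inside = set.intersection(*(markers_by_species[s] for s in combo))
            others = [markers_by_species[s] for s in species if s not in combo]
            outside = set().union(*others) if others else set()
            regions[combo] = len(inside - outside)
    return regions


# Toy coordinates for illustration only.
toy = {
    "cheetah": {("A1", 100), ("A1", 200), ("B2", 50)},
    "snow_leopard": {("A1", 200), ("B2", 50), ("C1", 10)},
    "tiger": {("B2", 50), ("C1", 10), ("D1", 5)},
}
overlap = marker_overlap(toy)
```

Because every marker falls into exactly one region, the region counts sum to the number of distinct markers, which offers a quick sanity check on a real dataset.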


Figure 2: Comparison of the distribution of polymorphic markers from the feline genotyping array (ARRAY) and reduced-representation sequencing (RRS) datasets across all felCat9 chromosomes. Across all species, the RRS dataset yielded a larger and more evenly distributed marker set. a. Comparison of array and RRS datasets in the cheetah, b. in the snow leopard, and c. in the Sumatran tiger. The colour of each line corresponds to the number of SNPs per 1 Mb window. Chromosome nomenclature corresponds to chromosome size and centromeric position, as is the convention in the domestic cat.
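The per-window SNP counts that underlie a density plot like Figure 2 can be sketched as follows. This is an illustrative example only, assuming markers are supplied as (chromosome, 1-based position) pairs; the function name and the toy positions are hypothetical, and the plotting step is omitted.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # 1 Mb windows, as in the figure legend


def snp_density(markers, window_bp=WINDOW_BP):
    """Count SNPs per non-overlapping window.

    markers is an iterable of (chromosome, position) pairs with 1-based
    positions. Keys of the returned Counter are (chromosome, window_index),
    where window 0 covers positions 1..window_bp.
    """
    counts = Counter()
    for chrom, pos in markers:
        counts[(chrom, (pos - 1) // window_bp)] += 1
    return counts


# Toy positions for illustration only.
density = snp_density([
    ("A1", 1),
    ("A1", 1_000_000),
    ("A1", 1_000_001),
    ("B1", 2_500_000),
])
```

Windows with no SNPs are simply absent from the counter, so a plotting routine would need to fill them with zero using the chromosome lengths of the reference assembly.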

Table 1: Number of annotated variants (and their predicted effects) in array and reduced-representation sequencing (RRS) datasets for three species.

                      Cheetah                Snow leopard          Sumatran tiger
Variant class         Array     RRS          Array    RRS          Array     RRS
                      (N=419)   (N=22,252)   (N=60)   (N=2,414)    (N=189)   (N=5,506)
3’ UTR                2         18           1        35           2         40
5’ UTR                1         3            0        1            0         4
Downstream of gene †  45        313          5        304          19        402
Intergenic            273       4,346        31       2,074        148       5,038
Intron                269       3,052        36       1,455        127       2,657
Missense              4         18           1        36           3         47
Stop gained           0         0            0        0            0         1
Synonymous            9         19           4        38           4         69
Upstream of gene †    37        345          3        230          13        342

† Downstream and upstream gene distances were defined as 5000 bp.


Figure 3: Principal component analyses (PCA) comparing the ability of the two marker datasets to model population structure between and within species. a. Array dataset: clustering of individuals within each species (top left) and differentiation of individuals within species groups (top right: cheetah, bottom right: Sumatran tiger and bottom left: snow leopard). b. Reduced-representation sequencing (RRS) dataset: clustering of individuals within species (top left), differentiation between individuals within species groups (bottom right: Sumatran tiger, bottom left: snow leopard) and differentiation of sub-populations (top right: cheetah).

Chapter 5: Concluding remarks

Throughout this thesis, I explored the use of the domestic cat as a powerful model for medical and conservation genetic studies. The work presented here has utilised the unique population structures and patterns of genome organisation among cat populations to demonstrate the value of the domestic cat as a comparative model. I used NGS technologies to interrogate regions of the feline genome relevant to disease pathology of T2D in humans, hypercarnivory in wild cat species and domestication. Over recent decades, improvements in NGS technologies and bioinformatic resources have led to the creation of high-quality reference genome assemblies, linkage maps and a genotyping array. The development and ongoing improvement of these tools has largely been spurred on by the relevance of cats to human disease research and comparative genomics.

The inbreeding required to establish pedigreed breed standards has created populations suitable for selection scans and association analyses using low-density marker sets. In Chapter 2, I presented a naturally occurring feline model of T2D in the Burmese breed. I used the feline genotyping array and WGS data to identify selective sweeps and regions of disease association across the Australian Burmese breeding population. The genomic profile of the Burmese breed was marked by generations of inbreeding, with extensive ROH across multiple chromosomes surrounding genes underlying breed-specific traits. Not only is FDM a common condition in Burmese cats, but they also experience an elevated prevalence of inherited derangements in lipid metabolism. This work revealed a polygenic spectrum of candidate genes previously implicated in T2D and lipid dysregulation pathologies in humans. Both ROH and GWAS analyses identified a risk haplotype and splice region variants segregating in a known T2D candidate gene, ANK1.
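To make the idea of a run of homozygosity (ROH) concrete, the sketch below scans one individual's genotypes along a single chromosome and reports uninterrupted homozygous stretches. It is a deliberately simplified illustration and not the procedure used in Chapter 2: established tools use sliding windows that tolerate a limited number of heterozygous or missing calls, whereas here any such call ends the run. The function name, genotype coding and thresholds are assumptions chosen for the example.

```python
def find_roh(positions, genotypes, min_snps=3, min_length_bp=1000):
    """Return runs of homozygosity as (start_bp, end_bp, n_snps) tuples.

    positions: SNP positions in ascending order on one chromosome.
    genotypes: alternate-allele counts per SNP (0 or 2 = homozygous,
               1 = heterozygous, None = missing).
    A run is kept only if it holds at least min_snps SNPs and spans at
    least min_length_bp base pairs. Heterozygous and missing calls
    both terminate the current run (a simplifying assumption).
    """
    runs = []
    current = []  # positions of SNPs in the run being extended

    def close_run():
        if len(current) >= min_snps and current[-1] - current[0] >= min_length_bp:
            runs.append((current[0], current[-1], len(current)))
        current.clear()

    for pos, gt in zip(positions, genotypes):
        if gt in (0, 2):
            current.append(pos)
        else:
            close_run()
    close_run()  # flush a run that reaches the end of the chromosome
    return runs


# Toy data for illustration only.
toy_positions = [100, 600, 1500, 2000, 2600, 3000, 3900, 5000]
toy_genotypes = [0, 2, 2, 1, 0, 0, 2, 2]
toy_runs = find_roh(toy_positions, toy_genotypes)
```

In real array data the thresholds would be expressed in hundreds of kilobases or more, and would be tuned to marker density and the expected level of linkage disequilibrium in the population.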

As a mapping study, this work sought to establish the Burmese cat as a genetic model for T2D in humans, characterising the genetic landscape of a predisposed population and paving the way for further investigations in larger cohorts. GWAS in human populations have identified numerous T2D risk loci; however, attempts to localise these signals to genes and causal variants have proven difficult due to a high degree of phenotypic variance and population stratification. Mapping studies in pedigreed animal populations offer statistical advantages over those performed in human populations, reducing marker density and sampling requirements. Naturally occurring FDM shares key pathological features involved in the complex interplay between genetic, metabolic and acquired factors that define T2D, making it preferable to other animal models for genome-wide mapping studies. In addition to simplifying the genetic landscape of the Burmese breed, selective breeding and its associated founder effects have likely conferred an increased risk of FDM caused by a combination of rare and common alleles. The identification of genes involved in lipid metabolism and diabetic pathogenesis in this study supports previous work suggesting that deranged lipid metabolism may precede the development of FDM in some Burmese breeding populations.

Despite extensive ROH and LD among Burmese cats, the density of markers on the feline genotyping array limits its ability to resolve the complex architecture of polygenic traits. Further, the limited number of phenotyped WGS samples used here constrained the clarification of FDM-associated haplotypes. The current feline array has mostly been used to localise simple Mendelian traits, and an increase in marker density would improve our ability to resolve the genetic architecture of complex traits. Other domestic species used to model human diseases, like the domestic dog, have benefitted from access to high-density genotyping arrays that allow for better clarification of complex traits and diseases. Genotyping arrays are a cost-effective and accessible tool for studying population genetics, and are widely used in companion animal genetics. The continued development of genetic resources like genotyping technologies has expanded ‘hypothesis-free’ population-based investigations in cats, moving away from pedigree and candidate-gene approaches. This has been an essential component of refining comparative feline models for biomedical research in humans.

Following the use of reference-based genomic datasets in domestic cats, Chapters 3 and 4 used felids to demonstrate what conservation genetics more broadly can gain from leveraging genomic resources developed for model species. The inclusion of genome-scale data in conservation genetics research can inform species delineation, reveal signatures of selection through the identification of adaptive alleles, and improve genetic rescue and ex situ management in the most vulnerable species. Despite long being recognised as an important conservation management tool, the uptake of conservation genomics has been slow and it is not routinely integrated into management strategies. A lack of financial, computational and bioinformatic resources, combined with poor collaboration between conservation managers and researchers, is often cited as the reason for this. Genomic resources offer essential genomic context to population-level analyses for identifying gene regions underlying adaptive and deleterious variants. As such, Chapters 3 and 4 demonstrated alternative reference-based methods that take advantage of the high degree of genomic synteny observed between cat species, and interrogated the advantages and limitations of these approaches.

In Chapter 3, I presented a cross-species genome alignment and variant calling method for the benefit of conservation research in wild felids. Alignment of Sumatran tiger, snow leopard and cheetah WGS yielded nearly complete coverage of the feline reference genome. Variant annotation and enrichment analysis revealed species-specific variants that offered insight into adaptive traits, evolutionary history and the pathogenesis of heritable diseases in these species. Further, as a comparative approach, this cross-species method was able to highlight a suite of defining characteristics of domestication, including neurological and behavioural adaptations, reproductive and metabolic function and differences in body size. The variant dataset, presented in Variant Call Format (VCF), is offered as a resource for future studies into population dynamics, evolutionary history and genetic and disease management in wild cat species. We were constrained in our capacity to collect WGS samples and, as such, our dataset comprised publicly available WGS data and pooled sequencing. We compared sequencing of individual genomes with pooled data and found that pooled WGS data served as a cost-effective means of SNP discovery and could be used to reliably characterise the allele frequency spectrum of a population. Demonstrations that include the use of publicly available data and pooled WGS are useful across genomic disciplines, and particularly to conservationists, given the financial and sampling considerations that constrain the generation of new data from wildlife species.
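The way pooled sequencing can characterise an allele frequency spectrum may be easier to see with a small worked example. The sketch below estimates the alternate allele frequency at each site as the proportion of reads carrying that allele and then bins folded (minor) allele frequencies. It assumes each individual contributed equally to the pool and that sequencing errors have already been filtered; the function names, the depth threshold and the toy read counts are hypothetical and do not reproduce the Chapter 3 pipeline.

```python
def pooled_allele_frequency(ref_reads, alt_reads, min_depth=20):
    """Estimate the alternate allele frequency at one site from read counts.

    Returns None when total depth is below min_depth, because the read
    proportion is too noisy to interpret at low coverage.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    return alt_reads / depth


def folded_spectrum(sites, n_bins=5, min_depth=20):
    """Bin minor allele frequencies (0 to 0.5) into n_bins equal bins.

    sites is an iterable of (ref_reads, alt_reads) pairs. Low-depth and
    monomorphic sites are skipped. Integer arithmetic is used for the
    bin index to avoid floating-point edge effects.
    """
    spectrum = [0] * n_bins
    for ref_reads, alt_reads in sites:
        depth = ref_reads + alt_reads
        minor = min(ref_reads, alt_reads)
        if depth < min_depth or minor == 0:
            continue
        bin_index = min(2 * minor * n_bins // depth, n_bins - 1)
        spectrum[bin_index] += 1
    return spectrum


# Toy read counts for illustration only: (reference reads, alternate reads).
toy_sites = [(95, 5), (65, 35), (25, 75), (50, 50), (5, 5), (100, 0)]
toy_spectrum = folded_spectrum(toy_sites)
```

In practice the precision of each estimate depends on both the number of individuals in the pool and the read depth at the site, so thresholds would be chosen with both in mind.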

A comprehensive exploration of the advantages and limitations of a cross-species variant calling approach was presented for the benefit of conservation practitioners. The most notable limitation was the vulnerability of demographic parameter estimations to reference sequence bias. As type specimens, reference assemblies can distort variant calling results and cause a loss of rare alleles. A higher frequency of variants observed in cheetahs aligned to the domestic cat, compared with an alignment to the cheetah reference assembly, demonstrated this clearly. However, population structure analyses like multi-dimensional scaling were largely unaffected by reference bias. In better-resourced model species like humans, multi-genome alignment techniques are used to overcome this bias; however, these resources will likely remain scarce in a conservation setting, given their requirement for large sample sizes, which are often difficult to obtain for vulnerable large mammals.

Following the cross-species genome alignment demonstration, I explored two commonly used low-density variant genotyping and discovery methods in Chapter 4. The growing range of NGS options raises the question of which dataset is best suited for estimating demographic processes of conservation value. While many have espoused the cost-effectiveness of low-density genomic approaches like genotyping arrays and RRS, practical comparisons for the benefit of conservation researchers are not widely available in the literature. In Chapter 4, I compared the location of polymorphic markers called by RRS and the feline genotyping array in Sumatran tigers, snow leopards and cheetahs against the feline reference genome. Further, we evaluated the ability of each dataset to estimate population structure and identify genomic regions under selection using association analyses. As expected, the RRS dataset offered a higher number of variants across all species compared with polymorphic SNPs on the feline array. The low proportion of polymorphic array markers here is consistent with the exponential decrease in polymorphic markers previously observed in other cross-species applications of genotyping arrays in domestic mammals. Genotyping arrays are purpose-built; in the case of those developed for domestic species, marker sets will have been selected to reflect a genetic profile associated with domestication and breed development. As such, their usage in the wild relatives of domestic species is limited, and this was clearly demonstrated here. Both the RRS and genotyping array datasets were simulated from the WGS data generated in Chapter 3. This may have elevated the number of variants observed across all species in the RRS dataset.
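The proportion of polymorphic array markers discussed above can be illustrated with a short sketch. It assumes a marker counts as polymorphic when more than one allele is observed among the called genotypes of a sample set. The genotype encoding, marker names and calls are hypothetical and do not reproduce the Chapter 4 data or pipeline.

```python
def is_polymorphic(genotypes, missing="--"):
    """Return True if more than one allele is seen among the called genotypes."""
    alleles = set()
    for genotype in genotypes:
        if genotype != missing:
            alleles.update(genotype)  # "AG" contributes alleles "A" and "G"
    return len(alleles) > 1


def polymorphic_fraction(marker_genotypes):
    """Fraction of markers that are polymorphic within one sample set."""
    flags = [is_polymorphic(calls) for calls in marker_genotypes.values()]
    return sum(flags) / len(flags)


# Hypothetical genotype calls for four array markers in three animals.
calls = {
    "marker_1": ["AA", "AA", "AA"],  # monomorphic
    "marker_2": ["AG", "AA", "GG"],  # polymorphic
    "marker_3": ["CC", "--", "CC"],  # monomorphic, one missing call
    "marker_4": ["TT", "TC", "--"],  # polymorphic
}
fraction = polymorphic_fraction(calls)
print(fraction)
```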

Advances in sequencing technologies and computational power have improved accessibility to genomics for non-model species, while bioinformatic pipelines have become more readily available to scientists without a bioinformatics or computer science background. Genomic approaches require researchers to make simple methodological decisions that will impact downstream analyses and the ability to answer specific research questions. In Chapters 3 and 4, we explored the different questions that can be answered using a variety of commonly used genomics approaches. Species capable of using existing genomic resources developed for model species have additional methodological considerations that impact their ability to garner useful information from these datasets. This work extended only to a small group of samples from a few species; as such, it could not capture the broader applicability of this method across all wild felid lineages. Cheetahs consistently yielded more polymorphic markers than the other species across both Chapters 3 and 4. This is likely a reflection of the unique genetic profile of cheetahs, which have experienced historical bottleneck events resulting in fragmented populations defined by low levels of heterozygosity. Aligning cheetah samples to the domestic cat likely overestimates genetic diversity in these samples.

Collectively, my findings demonstrate the utility of genomic tools developed for the domestic cat across a range of scenarios affecting felid species. Further, this work supports the value of the domestic cat as a comparative model for genetic research.

Appendix I Supplementary data for Chapter 2.2

Table S1a: Remapping of Illumina Infinium iSelect 63 k Cat DNA genotyping array markers on felCat9 assembly. Marker and gap sizes per chromosome.

Chromosome  Marker count  Average intermarker distance (kb)  Number of gaps >100kb
A1          6287          38.50                              589
A2          4357          39.29                              405
A3          3402          42.10                              338
B1          5023          40.08                              486
B2          3768          40.27                              369
B3          3714          40.27                              380
B4          3714          40.11                              364
C1          5655          40.00                              603
C2          4179          38.57                              449
D1          3072          39.74                              329
D2          2340          39.67                              258
D3          2528          39.60                              263
D4          2442          39.59                              260
E1          1526          39.65                              154
E2          1651          38.75                              161
E3          1086          39.65                              122
F1          1791          39.66                              169
F2          2139          39.63                              222
X           2697          40.02                              241
NULL        1526          -                                  -
Total       61371         39.74                              6163
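The per-chromosome summaries in Table S1a can be sketched as follows, assuming that intermarker distance is measured between consecutive markers sorted by position and that a gap is any such distance above 100 kb. The thesis does not state the exact procedure, so the function name and the example positions are hypothetical.

```python
def intermarker_summary(positions_bp, gap_threshold_bp=100_000):
    """Summarise marker spacing on one chromosome from base-pair positions."""
    positions = sorted(positions_bp)
    # Distance between each pair of consecutive markers.
    distances = [b - a for a, b in zip(positions, positions[1:])]
    mean_kb = sum(distances) / len(distances) / 1000 if distances else None
    large_gaps = sum(1 for d in distances if d > gap_threshold_bp)
    return {
        "marker_count": len(positions),
        "mean_intermarker_kb": mean_kb,
        "gaps_over_threshold": large_gaps,
    }


# Hypothetical marker positions (bp) on a single chromosome.
summary = intermarker_summary([10_000, 50_000, 90_000, 250_000, 290_000])
print(summary)
```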

Table S1b: Updated locations of the 62,897 markers on the Illumina Infinium iSelect 63 k Cat DNA genotyping array remapped to the felCat9 assembly.

https://static-content.springer.com/esm/art%3A10.1038%2Fs41598-020-76166-3/MediaObjects/41598_2020_76166_MOESM1_ESM.xlsx

Table S2: Runs of homozygosity analysis parameters calculated according to Meyermans et al. 2019 recommendations.

Number of genotyped animals                            82
Total SNPs before quality control                      62897
Unmapped                                               2777
Non-autosomal                                          5455
Call rate <0.95                                        1732
Total removed SNPs                                     9964
Total passed QC                                        52933
α                                                      0.05
ns                                                     52933
ni                                                     82
Het (--freq)                                           0.17
Minimal SNP/ROH (L)                                    98.10162155
Desired number of outer SNPs to be discarded (Nout)    4
Scanning window threshold (t)                          0.050967557
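The derived values in Table S2 can be checked with a short calculation. The table does not state the formulas, so the sketch below assumes the commonly used expressions L = ln(α / (ns × ni)) / ln(1 − het) for the minimal number of SNPs per ROH and t = (Nout + 1) / L for the scanning window threshold. With the tabulated inputs these appear to reproduce the listed values of L and t to several decimal places. The function names are hypothetical.

```python
import math


def min_snps_per_roh(alpha, n_snps, n_individuals, mean_het):
    """Assumed formula: L = ln(alpha / (ns * ni)) / ln(1 - het)."""
    return math.log(alpha / (n_snps * n_individuals)) / math.log(1 - mean_het)


def window_threshold(n_out, min_snps):
    """Assumed formula: t = (Nout + 1) / L."""
    return (n_out + 1) / min_snps


# Quality control counts taken from Table S2.
total_snps = 62897
removed = 2777 + 5455 + 1732  # unmapped + non-autosomal + call rate < 0.95
passed_qc = total_snps - removed

L = min_snps_per_roh(alpha=0.05, n_snps=passed_qc, n_individuals=82, mean_het=0.17)
t = window_threshold(n_out=4, min_snps=L)
print(removed, passed_qc, L, t)
```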

Table S3: Variants segregating with the risk haplotype on chromosomes A3, B1 and E1.

Chromosome  gDNA position  Risk allele  SNP ID       Variant type            Gene name  Transcript ID (Ensembl)  Exon   Intron
A3          4526903        A                         Intron                  CTSZ       ENSFCAT00000000844.5     -      2/5
A3          4527589        C                         Intron                  CTSZ       ENSFCAT00000000844.5     -      2/5
A3          4528175        -            rs783536464  Intron                  CTSZ       ENSFCAT00000000844.5     -      2/5
A3          4528991        T                         Intron                  CTSZ       ENSFCAT00000000844.5     -      2/5
A3          4529112        T                         Intron                  CTSZ       ENSFCAT00000000844.5     -      2/5
A3          4530182        C            rs784016879  Intron                  CTSZ       ENSFCAT00000000844.5     -      3/5
A3          4531548        A                         Intron                  CTSZ       ENSFCAT00000000844.5     -      4/5
A3          4531693        A                         Synonymous              CTSZ       ENSFCAT00000000844.5     5/6    -
A3          4532014        G                         Intron                  CTSZ       ENSFCAT00000000844.5     -      5/5
A3          4532071        T                         Intron                  CTSZ       ENSFCAT00000000844.5     -      5/5
A3          4532541        T                         Intron                  CTSZ       ENSFCAT00000000844.5     -      5/5
A3          4532583        T                         Intron                  CTSZ       ENSFCAT00000000844.5     -      5/5
A3          4532783        C                         3' UTR                  CTSZ       ENSFCAT00000000844.5     6/6    -
A3          4532891        T                         3' UTR                  CTSZ       ENSFCAT00000000844.5     6/6    -
A3          4533076        T                         3' UTR                  NELFCD     ENSFCAT00000000843.5     15/15  -
A3          4533128        C                         3' UTR                  NELFCD     ENSFCAT00000000843.5     15/15  -
A3          4533135        T                         3' UTR                  NELFCD     ENSFCAT00000000843.5     15/15  -
A3          4533214        A                         3' UTR                  NELFCD     ENSFCAT00000000843.5     15/15  -
A3          4533500        G                         3' UTR                  NELFCD     ENSFCAT00000000843.5     15/15  -
A3          4534045        T                         Intron                  NELFCD     ENSFCAT00000000843.5     -      13/14
A3          4534664        C                         Intron                  NELFCD     ENSFCAT00000000843.5     -      11/14
A3          4535981        T                         Intron                  NELFCD     ENSFCAT00000000843.5     -      10/14
A3          4538198        T                         Synonymous              NELFCD     ENSFCAT00000000843.5     7/15   -
A3          4538472        T            rs783705891  Synonymous              NELFCD     ENSFCAT00000000843.5     6/15   -
A3          4538644        T            rs785004103  Intron                  NELFCD     ENSFCAT00000000843.5     -      5/14
A3          4538967        A                         Intron                  NELFCD     ENSFCAT00000000843.5     -      5/14
A3          4539145        C                         Intron                  NELFCD     ENSFCAT00000000843.5     -      5/14
A3          4539376        C                         Intron                  NELFCD     ENSFCAT00000000843.5     -      4/14
A3          4539581        T                         Intron                  NELFCD     ENSFCAT00000000843.5     -      4/14
A3          4540079        A                         Intron                  NELFCD     ENSFCAT00000000843.5     -      4/14
A3          4540108        A                         Intron                  NELFCD     ENSFCAT00000000843.5     -      4/14
A3          4540287        A                         Intron                  NELFCD     ENSFCAT00000000843.5     -      4/14
A3          4541883        G                         Intron                  NELFCD     ENSFCAT00000000843.5     -      1/14
A3          4542139        T                         Intron                  NELFCD     ENSFCAT00000000843.5     -      1/14
A3          4542155        T                         Intron                  NELFCD     ENSFCAT00000000843.5     -      1/14
A3          4543039        T                         Intron                  NELFCD     ENSFCAT00000000843.5     -      1/14
A3          4543895        -                         Intron                  NELFCD     ENSFCAT00000000843.5     -      1/14
A3          4544174        G                         Intron                  NELFCD     ENSFCAT00000000843.5     -      1/14
A3          4544573        C                         Intron                  NELFCD     ENSFCAT00000000843.5     -      1/14
A3          4545250        A                         Intron                  NELFCD     ENSFCAT00000000843.5     -      1/14
A3          4597945        -                         Intron                  GNAS       ENSFCAT00000075327.1     -      13/13
A3          4603168        A                         Intron                  GNAS       ENSFCAT00000075327.1     -      13/13
A3          4663320        G                         Intron                  GNAS       ENSFCAT00000075708.1     -      2/13
A3          4678284        A                         Synonymous              -          ENSFCAT00000016341.3     1/1    -
A3          4678485        G                         Synonymous              -          ENSFCAT00000016341.3     1/1    -
A3          134511428      A                         Intron                  GREB1      ENSFCAT00000008516.6     -      5/31
A3          134513483      T                         Intron                  GREB1      ENSFCAT00000008516.6     -      4/31
A3          134519568      T                         Intron                  GREB1      ENSFCAT00000008516.6     -      3/31
A3          134519600      T                         Intron                  GREB1      ENSFCAT00000008516.6     -      3/31
A3          134519738      -                         Intron                  GREB1      ENSFCAT00000008516.6     -      3/31
A3          134521723      T                         Intron                  GREB1      ENSFCAT00000008516.6     -      3/31
B1          43116760       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43117915       G            rs783202713  Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43118192       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43121243       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43124840       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43125913       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43125929       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43126420       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43132250       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43135163       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43137070       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43139016       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43139482       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43140167       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43144094       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43144438       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43144613       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43144898       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43146785       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43146794       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43146841       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43147021       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43147420       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43147987       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43147990       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43148127       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43148200       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43148285       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43148431       C            rs784151064  Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43148890       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43149869       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43151795       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43152095       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43152219       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43152462       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43153556       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43153925       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43154424       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43155041       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43155494       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43155495       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43155542       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43155821       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43156020       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43156116       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43157618       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43158314       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43159527       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43159832       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43160279       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43161135       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43161205       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43161681       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43161980       C            rs785844696  Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43163137       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43163360       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43163619       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43163624       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43163725       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43163820       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43164847       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43165215       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43165917       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43166426       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43169885       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43169936       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43173094       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43173610       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43173747       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43183711       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43187833       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43189378       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43189870       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43189891       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43189908       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43190484       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43191827       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43191851       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43192257       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43192445       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43193574       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43193653       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43194293       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43194428       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43195503       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      1/43
B1          43211590       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      2/43
B1          43213057       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      2/43
B1          43213076       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      2/43
B1          43214426       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      2/43
B1          43215111       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      2/43
B1          43215464       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      2/43
B1          43216767       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      2/43
B1          43217344       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      2/43
B1          43218009       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      2/43
B1          43224149       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      3/43
B1          43224755       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      3/43
B1          43227359       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      3/43
B1          43227377       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      3/43
B1          43227755       T                         Intronic splice region  ANK1       ENSFCAT00000037334.3     -      4/43
B1          43227920       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      4/43
B1          43228164       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      4/43
B1          43228434       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      4/43
B1          43228570       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      4/43
B1          43228646       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      4/43
B1          43228811       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      4/43
B1          43228852       T                         Synonymous              ANK1       ENSFCAT00000037334.3     5/44   -
B1          43229123       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      5/43
B1          43230391       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      6/43
B1          43230802       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      6/43
B1          43231260       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      7/43
B1          43231663       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      7/43
B1          43231824       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      7/43
B1          43232268       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      8/43
B1          43232494       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      9/43
B1          43233953       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      9/43
B1          43234822       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      9/43
B1          43235379       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      9/43
B1          43235911       T                         Intronic splice region  ANK1       ENSFCAT00000037334.3     -      9/43
B1          43236466       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      10/43
B1          43237137       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      10/43
B1          43238339       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      12/43
B1          43238875       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      13/43
B1          43238934       -                         Intron                  ANK1       ENSFCAT00000037334.3     -      13/43
B1          43240772       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      14/43
B1          43241275       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      15/43
B1          43245170       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      16/43
B1          43247715       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      17/43
B1          43248304       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      19/43
B1          43249984       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      21/43
B1          43250858       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      22/43
B1          43251102       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      22/43
B1          43258504       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      30/43
B1          43267176       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      39/43
B1          43270693       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      39/43
B1          43282451       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      39/43
B1          43284252       TG                        Intron                  ANK1       ENSFCAT00000037334.3     -      39/43
B1          43286356       T            rs44077554   Intron                  ANK1       ENSFCAT00000037334.3     -      40/43
B1          43291441       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      41/43
B1          43292646       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      42/43
B1          43292804       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      42/43
B1          43292851       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      42/43
B1          43293211       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      42/43
B1          43295705       C                         Intron                  ANK1       ENSFCAT00000037334.3     -      43/43
B1          43295848       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      43/43
B1          43295911       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      43/43
B1          43296123       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      43/43
B1          43296303       A                         Intron                  ANK1       ENSFCAT00000037334.3     -      43/43
B1          43297662       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      43/43
B1          43298105       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      43/43
B1          43298132       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      43/43
B1          43298727       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      43/43
B1          43298773       G                         Intron                  ANK1       ENSFCAT00000037334.3     -      43/43
B1          43299250       T                         Intron                  ANK1       ENSFCAT00000037334.3     -      43/43
B1          43328193       A                         Intron                  GPAT4      ENSFCAT00000012952.5     -      11/11
B1          43332490       T                         Intron                  GPAT4      ENSFCAT00000012952.5     -      9/11
B1          43332529       G                         Intron                  GPAT4      ENSFCAT00000012952.5     -      9/11
B1          43332686       C                         Intron                  GPAT4      ENSFCAT00000012952.5     -      9/11
B1          43332886       T                         Intron                  GPAT4      ENSFCAT00000012952.5     -      9/11
B1          43333112       G                         Intron                  GPAT4      ENSFCAT00000012952.5     -      9/11
B1          43334723       -                         Intron                  GPAT4      ENSFCAT00000012952.5     -      7/11
B1          43335595       A                         Intron                  GPAT4      ENSFCAT00000012952.5     -      7/11
B1          43338883       C                         Intron                  GPAT4      ENSFCAT00000012952.5     -      2/11
B1          43341380       A                         Intron                  GPAT4      ENSFCAT00000012952.5     -      1/11
B1          43345572       G                         Intron                  GPAT4      ENSFCAT00000012952.5     -      1/11
B1          43345576       C                         Intron                  GPAT4      ENSFCAT00000012952.5     -      1/11
B1          43346201       C                         Intron                  GPAT4      ENSFCAT00000012952.5     -      1/11
B1          43347021       -                         5' UTR                  GPAT4      ENSFCAT00000012952.5     1/12   -
B1          43396981       T                         Intron                  GINS4      ENSFCAT00000012951.5     -      6/9
B1          43399167       C                         Intron                  GINS4      ENSFCAT00000012951.5     -      5/9
B1          43400100       T                         Intron                  GINS4      ENSFCAT00000012951.5     -      4/9
B1          43400625       -                         Intron                  GINS4      ENSFCAT00000012951.5     -      4/9
B1          43401660       G            rs783157977  Intron                  GINS4      ENSFCAT00000012951.5     -      4/9
B1          43401971       -                         Intron                  GINS4      ENSFCAT00000012951.5     -      4/9
B1          43403924       -                         Intron                  GINS4      ENSFCAT00000012951.5     -      3/9
B1          43415769       -                         Intron                  GINS4      ENSFCAT00000012951.5     -      3/9
B1          43421304       C                         Intron                  GINS4      ENSFCAT00000012951.5     -      3/9
B1          43432637       A                         Intron                  GINS4      ENSFCAT00000012951.5     -      2/9
B1          43435105       G                         5' UTR                  GINS4      ENSFCAT00000012951.5     1/10   -
B1          43630113       -                         Intron                  SFRP1      ENSFCAT00000044536.3     -      2/2
B1          43630859       -                         Intron                  SFRP1      ENSFCAT00000044536.3     -      2/2
B1          43630882       T                         Intron                  SFRP1      ENSFCAT00000044536.3     -      2/2
B1          43631079       A                         Intron                  SFRP1      ENSFCAT00000044536.3     -      2/2
B1          43635275       C                         Intron                  SFRP1      ENSFCAT00000044536.3     -      2/2
B1          43637162       T                         Intron                  SFRP1      ENSFCAT00000044536.3     -      2/2
B1          43650554       A                         Intron                  SFRP1      ENSFCAT00000044536.3     -      2/2
B1          43654923       G                         Intron                  SFRP1      ENSFCAT00000044536.3     -      2/2
B1          43657160       A            rs44078852   Intron                  SFRP1      ENSFCAT00000044536.3     -      2/2
B1          43668769       -                         Intron                  SFRP1      ENSFCAT00000082995.1     -      2/2
B1          43671611       G            rs783092900  Intron                  SFRP1      ENSFCAT00000082995.1     -      2/2
B1          43672535       C                         Intron                  SFRP1      ENSFCAT00000082995.1     -      2/2
B1          43676771       A            rs44078864   Intron                  SFRP1      ENSFCAT00000082995.1     -      2/2
B1          43676788       T                         Intron                  SFRP1      ENSFCAT00000082995.1     -      2/2
B1          43678080       T                         Intron                  SFRP1      ENSFCAT00000082995.1     -      2/2
B1          43679323       G            rs44078868   Intron                  SFRP1      ENSFCAT00000082995.1     -      2/2
B1          43679800       T                         Intron                  SFRP1      ENSFCAT00000082995.1     -      2/2
B1          43682556       A                         Intron                  SFRP1      ENSFCAT00000082995.1     -      2/2
B1          43910457       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43916143       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43918115       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43926544       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43927944       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43935553       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43938442       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43944014       G            rs784346186  Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43949138       A            rs784710239  Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43949628       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43950222       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43950489       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43961917       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43967573       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43967935       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43970990       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43972447       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43977268       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43978241       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43978705       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43978791       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      1/6
B1          43982710       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43983799       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43984487       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43984903       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43984946       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43985172       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43985769       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43985968       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43986149       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43987342       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43987762       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43987784       GA                        Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43987860       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43988174       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43988980       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43989259       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43989467       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43990339       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43990806       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43991663       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43995605       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43995665       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43995863       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43997871       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          43999891       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          44002740       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          44002859       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          44008612       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          44012331       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          44015434       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          44015859       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          44015949       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          44020778       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          44021362       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          44029876       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      2/6
B1          44031799       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44032096       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44034075       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44037448       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44039621       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44039816       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44041494       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44042024       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44042710       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44042711       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44042869       T            rs784700671  Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44042941       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44043000       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44043072       G            rs783699422  Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44043098       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44044505       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44044935       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44045118       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44045130       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44045149       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44045163       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44045204       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44045314       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44045393       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44045637       G            rs785743355  Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44045928       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44046134       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44046267       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44046497       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44046823       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44047295       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44047461       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44047610       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44048026       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44048167       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44048378       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44048743       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44048847       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44049316       T            rs784221549  Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44049491       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44049570       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44049592       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44049803       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44049895       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44050602       TT                        Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44050786       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44051902       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44051948       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44052312       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44052968       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44054384       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44054505       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44055370       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44056305       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44056927       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44056948       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44057948       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44058714       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44058853       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44060372       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44061316       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44062016       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44063892       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44065277       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44068711       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44069385       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44069698       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44071477       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44077596       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44085026       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44086922       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44087153       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44087154       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44092207       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44093143       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44093361       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44099336       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      3/6
B1          44100562       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44100780       G            rs784275009  Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44101308       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44107034       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44107035       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44109269       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44109853       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44110575       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44111016       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44111132       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44112170       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44112835       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44112931       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44112996       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44113596       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44113624       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44113920       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44113945       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44114200       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44114212       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44114384       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44114470       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44114859       T            rs784226689  Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44114902       C            rs783557475  Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44115239       A            rs784703583  Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44115287       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44115731       A            rs784643800  Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44115944       T            rs786194520  Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44116400       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44117182       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44117228       A            rs785220305  Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44117585       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44118519       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44119057       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44122832       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44122987       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44123095       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44123456       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44124601       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44125322       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44126232       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44126376       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44126554       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      4/6
B1          44128274       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44134666       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44134668       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44137068       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44138040       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44138571       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44138887       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44140122       G            rs784908939  Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44140137       A            rs785147194  Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44140406       G            rs784044427  Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44144955       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44145589       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44145590       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44148043       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44149110       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44149233       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44150376       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44151604       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44152443       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44152553       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44152685       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44157337       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44158208       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44159420       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44162362       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44164409       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44168299       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44168681       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44169049       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44171724       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44172933       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44174304       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44175526       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44175739       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44176046       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44176375       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44176537       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44177791       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44180595       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44182044       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44183317       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44183426       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44184914       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44186201       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44187405       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44191029       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44191841       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44192215       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44192624       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44193674       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44195822       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44195826       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44195832       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44196095       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44196146       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44196569       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44198044       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44199418       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44200542       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44200899       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44201133       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44201545       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44202383       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44202548       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44202725       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44203573       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44203924       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44207134       -                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44207218       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44207837       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44209688       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      5/6
B1          44212405       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44215631       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44217493       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44218717       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44219040       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44219392       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44220032       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44221012       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44223246       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44225532       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44226260       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44227013       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44227324       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44228771       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44228879       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44229134       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44229643       C                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44229974       G                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44230291       T                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1          44231111       A                         Intron                  ZMAT4      ENSFCAT00000034824.3     -      6/6
B1 44231500 C Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44231539 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44231689 - Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44231878 C Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44231934 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44232206 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44234174 T Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44234253 T Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44234438 C Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44234608 G Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44235054 C Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44235056 T Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44235185 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44236499 G Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44236755 - Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44237215 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44237228 G Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44237594 - Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44237689 C Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44237784 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44237787 T Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44238702 T Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44239425 C Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44241831 C Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44246623 G Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44247345 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44248607 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44248858 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44249781 G rs786049643 Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44250034 A rs784687481 Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44250330 - Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44251631 - Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44252309 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44254148 T Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44254168 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44254214 T Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44254406 G Intron ZMAT4 ENSFCAT00000034824.3 - 6/6

B1 44254651 G Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44254989 T Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44255607 G rs785543572 Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44255763 G Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44256205 C Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44256206 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44256338 A Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44256996 - Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44257475 G Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 44258490 G Intron ZMAT4 ENSFCAT00000034824.3 - 6/6
B1 48645148 A Intron UNC5D ENSFCAT00000051223.1 - 17/17
B1 48645223 A Intron UNC5D ENSFCAT00000051223.1 - 17/17
B1 48647623 A Intron UNC5D ENSFCAT00000051223.1 - 16/17
B1 48649036 T Intron UNC5D ENSFCAT00000051223.1 - 16/17
B1 48649218 - Intron UNC5D ENSFCAT00000051223.1 - 16/17
B1 48650415 C Intron UNC5D ENSFCAT00000051223.1 - 16/17
B1 48653687 T Intron UNC5D ENSFCAT00000051223.1 - 15/17
B1 48656198 T Intron UNC5D ENSFCAT00000051223.1 - 15/17
B1 48656229 T Intron UNC5D ENSFCAT00000051223.1 - 15/17
B1 48656401 - Intron UNC5D ENSFCAT00000051223.1 - 15/17
B1 48657016 T Intron UNC5D ENSFCAT00000051223.1 - 15/17
B1 48657456 T Intron UNC5D ENSFCAT00000051223.1 - 15/17
B1 48658308 G Intron UNC5D ENSFCAT00000051223.1 - 15/17
B1 48658589 T Intron UNC5D ENSFCAT00000051223.1 - 15/17
B1 48659593 T Intron UNC5D ENSFCAT00000051223.1 - 15/17
B1 48659602 C Intron UNC5D ENSFCAT00000051223.1 - 15/17
B1 48659818 T Intron UNC5D ENSFCAT00000051223.1 - 15/17
B1 48660259 - Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48661328 G Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48661580 A Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48661641 A Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48661773 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48661795 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48661875 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48662100 C Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48662351 A Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48662827 A Intron UNC5D ENSFCAT00000051223.1 - 14/17

B1 48662915 - Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48663134 G Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48663185 A Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48663420 - Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48663444 C Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48663694 A Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48664006 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48664765 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48664957 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48666591 T rs785320749 Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48666793 G Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48666942 - Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48667060 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48667130 C Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48667368 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48667388 - Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48667913 A Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48667940 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48667998 C Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48668285 A rs786089320 Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48668367 C Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48668379 G Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48668483 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48668585 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48668704 G Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48668790 G Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48668936 - Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48668979 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48669037 T Intron UNC5D ENSFCAT00000051223.1 - 14/17
B1 48669672 G Intron UNC5D ENSFCAT00000051223.1 - 13/17
B1 48669843 - Intron UNC5D ENSFCAT00000051223.1 - 13/17
B1 48670416 T Intron UNC5D ENSFCAT00000051223.1 - 13/17
B1 48670417 G Intron UNC5D ENSFCAT00000051223.1 - 13/17
B1 48670531 T Intron UNC5D ENSFCAT00000051223.1 - 13/17
B1 48670898 C Intron UNC5D ENSFCAT00000051223.1 - 13/17
B1 48671450 A Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48671502 C Intron UNC5D ENSFCAT00000051223.1 - 12/17

B1 48671578 C Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48671686 T Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48672136 C Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48672328 T Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48672494 T Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48672527 T Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48673905 T Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48675114 - Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48675701 A Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48675852 T Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48676133 G Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48679145 C Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48680652 A Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48682988 C Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48683301 C Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48684017 C Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48685496 G Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48685811 T Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48685812 G Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48685907 A Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48686742 T Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48686778 C Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48687931 G Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48688193 T Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48689504 A Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48690820 A Intron UNC5D ENSFCAT00000051223.1 - 12/17
B1 48692606 G Intron UNC5D ENSFCAT00000051223.1 - 11/17
B1 48693033 G Intron UNC5D ENSFCAT00000051223.1 - 11/17
B1 48693860 T Intron UNC5D ENSFCAT00000051223.1 - 11/17
B1 48693923 T Intron UNC5D ENSFCAT00000051223.1 - 11/17
B1 48693978 A Intron UNC5D ENSFCAT00000051223.1 - 11/17
B1 48694137 - Intron UNC5D ENSFCAT00000051223.1 - 11/17
B1 48694612 C Intron UNC5D ENSFCAT00000051223.1 - 11/17
B1 48694651 G Intron UNC5D ENSFCAT00000051223.1 - 11/17
B1 48696620 T Intron UNC5D ENSFCAT00000051223.1 - 10/17
B1 48697721 T Intron UNC5D ENSFCAT00000051223.1 - 10/17
B1 48698163 A Intron UNC5D ENSFCAT00000051223.1 - 10/17

B1 48698285 - Intron UNC5D ENSFCAT00000051223.1 - 10/17
B1 48699168 T Intron UNC5D ENSFCAT00000051223.1 - 10/17
B1 48699360 - Intron UNC5D ENSFCAT00000051223.1 - 10/17
B1 48699362 - Intron UNC5D ENSFCAT00000051223.1 - 10/17
B1 48699903 A Intron UNC5D ENSFCAT00000051223.1 - 10/17
B1 48700041 G Intron UNC5D ENSFCAT00000051223.1 - 10/17
B1 48700205 A Synonymous UNC5D ENSFCAT00000051223.1 10/18 -
B1 48701363 C Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48701366 A rs785263179 Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48701542 A rs783257523 Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48701719 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48701758 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48701899 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48702192 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48702916 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48703353 A Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48703607 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48704047 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48704128 C Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48704634 A Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48705925 A Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48706233 A Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48706398 AG Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48706713 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48706915 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48707109 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48710239 - Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48710307 A rs784771839 Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48710861 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48710899 A Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48710954 A Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48711164 - Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48711199 T Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48714921 G Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48715293 C Intron UNC5D ENSFCAT00000051223.1 - 8/17
B1 48716136 C Intron UNC5D ENSFCAT00000051223.1 - 7/17
B1 48754267 - Intron UNC5D ENSFCAT00000051223.1 - 4/17

B1 48758672 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48764829 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48764917 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48765041 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48765482 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48765565 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48765602 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48765613 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48765670 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48765685 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48765723 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48765757 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48765992 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48766150 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48766235 - Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48766575 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48766594 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48766893 - Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48767383 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48767678 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48768364 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48769742 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48770898 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48771343 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48772660 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48773112 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48774785 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48776173 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48776246 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48776325 - Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48776443 - Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48776829 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48777138 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48778633 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48781488 - Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48782105 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48783697 G Intron UNC5D ENSFCAT00000051223.1 - 4/17

B1 48785051 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48785661 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48786021 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48786619 - Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48786622 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48787982 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48789395 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48793848 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48795080 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48795507 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48795640 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48797592 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48797672 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48801189 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48801889 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48802390 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48802966 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48804802 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48805047 - Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48805350 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48808167 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48808868 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48811438 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48811450 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48812252 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48812388 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48816732 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48817816 A Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48818702 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48818761 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48818952 - Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48820243 G Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48820408 C Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48822203 T Intron UNC5D ENSFCAT00000051223.1 - 4/17
B1 48826120 C Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48826435 C Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48827220 G Intron UNC5D ENSFCAT00000051223.1 - 3/17

B1 48827482 T Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48828450 T Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48828704 A Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48828728 G Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48830890 T Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48830996 G Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48831544 T Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48831643 A Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48831653 G Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48832273 - Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48832348 A Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48832626 T Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48832650 T Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48832909 T Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48833656 T Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48834127 A Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48834230 A Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48836306 A Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48837633 A Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48839467 T Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48839783 C Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48841430 A Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48845862 C Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48847225 C Intron UNC5D ENSFCAT00000051223.1 - 3/17
B1 48848151 A Intron UNC5D ENSFCAT00000051223.1 - 3/17
E1 16759102 C Intron NSRP1 ENSFCAT00000035282.3 - 3/6
E1 16806357 T Intron NSRP1 ENSFCAT00000035282.3 - 2/6
E1 16842398 T Intron EFCAB5 ENSFCAT00000004651.6 - 20/24
E1 16842405 T Intron EFCAB5 ENSFCAT00000004651.6 - 20/24
E1 16858310 - Intron EFCAB5 ENSFCAT00000004651.6 - 17/24
E1 16866767 T Intron EFCAB5 ENSFCAT00000004651.6 - 16/24
E1 16873603 A Intron EFCAB5 ENSFCAT00000004651.6 - 16/24
E1 16919167 A Intron EFCAB5 ENSFCAT00000004651.6 - 7/24
E1 16951343 A Intron EFCAB5 ENSFCAT00000004651.6 - 4/24
E1 16976234 C Intron EFCAB5 ENSFCAT00000004651.6 - 3/24
E1 17040537 C Intron SSH2 ENSFCAT00000058999.1 - 1/16
E1 17111896 - Intron SSH2 ENSFCAT00000058999.1 - 2/16

E1 17116191 C Intron SSH2 ENSFCAT00000058999.1 - 2/16

Table S4: Annotated FDM-risk variants observed across runs of homozygosity on chromosomes B1, D1 and D4.

Chromosome | gDNA | Risk allele | Consequence | Gene name | Transcript ID (Ensembl) | Exon | Intron | SIFT score
B1 41220823 A Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41220867 C Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41220872 C Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41220917 A Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41220933 A Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41220961 T Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41220969 A Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41220985 T Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41220995 C Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221492 G Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221493 T Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221508 G Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221511 T Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221523 G Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221529 T Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221545 C Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221549 G Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221552 G Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221558 C Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221562 C Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221566 T Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221569 G Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221572 A Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221573 G Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221574 A Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221580 G Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 41221581 T Intron PSD3 ENSFCAT00000040631.3 - 2/18 -
B1 42999385 T Intron KAT6A ENSFCAT00000056563.2 - 2/15 -
B1 45073032 T Intron ADAM18 ENSFCAT00000026594.4 - 20/20 -

B1 45073103 T Intron ADAM18 ENSFCAT00000026594.4 - 20/20 -
B1 45734703 C Intron ADAM32 ENSFCAT00000029945.4 - 6/23 -
B1 45905364 C Intron ADAM9 ENSFCAT00000002595.6 - 1/21 -
B1 45905371 A Intron ADAM9 ENSFCAT00000002595.6 - 1/21 -
B1 45905373 A Intron ADAM9 ENSFCAT00000002595.6 - 1/21 -
B1 45905378 C Intron ADAM9 ENSFCAT00000002595.6 - 1/21 -
B1 46124262 A Intron TACC1 ENSFCAT00000030063.4 - 1/13 -
B1 52330420 A Intron INTS9 ENSFCAT00000037745.3 - 10/17 -
B1 52330421 A Intron INTS9 ENSFCAT00000037745.3 - 10/17 -
B1 52330444 T Intron INTS9 ENSFCAT00000037745.3 - 10/17 -
B1 52330446 A Intron INTS9 ENSFCAT00000037745.3 - 10/17 -
B1 34515442 T 5'UTR STC1 ENSFCAG00000026611 1/5 - -
B1 34515445 C 5'UTR STC1 ENSFCAG00000026611 1/5 - -
B1 34515634 A 5'UTR STC1 ENSFCAG00000026611 1/5 - -
B1 34517154 AA Intron STC1 ENSFCAG00000026611 - 1/4 -
B1 34517155 A Intron STC1 ENSFCAG00000026611 - 1/4 -
B1 34517296 C Intron STC1 ENSFCAG00000026611 - 1/4 -
B1 34517692 T Intron STC1 ENSFCAG00000026611 - 1/4 -
B1 34517747 G Intron STC1 ENSFCAG00000026611 - 1/4 -
B1 34518057 G Intron STC1 ENSFCAG00000026611 - 2/4 -
B1 34518188 C Intron STC1 ENSFCAG00000026611 - 2/4 -
B1 34518363 T Intron STC1 ENSFCAG00000026611 - 2/4 -
B1 34518497 T Intron STC1 ENSFCAG00000026611 - 2/4 -
B1 34519277 - Intron STC1 ENSFCAG00000026611 - 3/4 -
B1 34519997 G Intron STC1 ENSFCAG00000026611 - 3/4 -
B1 34520410 G Intron STC1 ENSFCAG00000026611 - 3/4 -
B1 34520603 - Intron STC1 ENSFCAG00000026611 - 3/4 -
B1 34520926 C Intron STC1 ENSFCAG00000026611 - 3/4 -
B1 34522105 G Intron STC1 ENSFCAG00000026611 - 3/4 -
B1 34522701 C Intron STC1 ENSFCAG00000026611 - 3/4 -
B1 34523070 C Intron STC1 ENSFCAG00000026611 - 3/4 -
B1 34523149 G Intron STC1 ENSFCAG00000026611 - 3/4 -
B1 34524505 A Intron STC1 ENSFCAG00000026611 - 3/4 -
B1 34714800 G 3' UTR NKX3-1 ENSFCAG00000039688 2/2 - -
B1 34524662 A Intron STC1 ENSFCAG00000026611 - 3/4 -

B1 34826477 A Intron SLC25A37 ENSFCAG00000005418 - 2/3 -
B1 34826557 G Intron SLC25A37 ENSFCAG00000005418 - 2/3 -
B1 34827492 A Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34828318 - Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34828361 C Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34828819 G Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34832020 G Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34835526 A Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34836106 A Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34836155 - Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34836634 T Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34837186 C Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34839686 T Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34840283 C Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34845203 A Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34845976 C Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34847570 C Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34847704 - Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34849586 G Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34857027 G Intron SLC25A37 ENSFCAG00000005418 - 1/3 -
B1 34936157 G Intron ENTPD4 ENSFCAG00000005417 - 1/13 -
B1 34938583 T Intron ENTPD4 ENSFCAG00000005417 - 2/13 -
B1 34945041 - Intron ENTPD4 ENSFCAG00000005417 - 6/13 -
B1 34945436 - Intron ENTPD4 ENSFCAG00000005417 - 6/13 -
B1 34947775 A Intron ENTPD4 ENSFCAG00000005417 - 6/13 -
B1 34950114 C Intron ENTPD4 ENSFCAG00000005417 - 6/13 -
B1 34950770 T Intron ENTPD4 ENSFCAG00000005417 - 6/13 -
B1 34959957 C Intron ENTPD4 ENSFCAG00000005417 - 10/13 -
B1 34959971 G Intron ENTPD4 ENSFCAG00000005417 - 10/13 -
B1 34961590 C Intron ENTPD4 ENSFCAG00000005417 - 12/13 -
B1 34963775 T 3' UTR ENTPD4 ENSFCAG00000005417 14/14 - -
B1 34963413 T Intron ENTPD4 ENSFCAG00000005417 - 13/13 -
B1 34969254 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34969404 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34969511 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -

B1 34969553 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34969638 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34969904 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34969981 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34969993 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34969998 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34970097 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34970194 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34970214 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34971238 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34971657 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34971680 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34971887 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34972667 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34973982 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34974355 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34974453 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34974584 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34974814 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34975699 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34976600 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34976919 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34977010 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34977221 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34977251 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34977353 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34977422 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34977716 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34980240 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34980324 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34980399 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34980524 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34980787 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34980853 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34981173 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -

B1 34981349 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34981637 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34981939 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34982109 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34982254 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34982403 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34982409 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34982436 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34982742 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34982912 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34983104 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34983256 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34984521 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34984823 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34985361 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34985525 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34985924 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34985992 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34986212 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34986349 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34987228 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34987380 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34987518 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34987614 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34987847 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34987908 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34987979 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34988180 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34988204 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34988373 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34988432 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34988459 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34988521 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34988757 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34988764 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -

B1 34988813 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34989349 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34989361 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34989769 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34989781 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34989914 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34989926 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34989977 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34989983 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34990362 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34990639 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34990728 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34990845 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34990850 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34990942 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34991009 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34991013 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34991142 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34991185 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34991249 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34991305 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34991359 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34991362 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34991587 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34991927 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34992417 G Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34992461 - Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34993100 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34993246 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34993665 T Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34993792 C Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34993901 A Intron ENTPD4 ENSFCAG00000005417 - 12/12 -
B1 34999970 T Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35000160 G Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35000364 A Intron LOXL2 ENSFCAG00000012400 - 1/13 -

B1 35000504 T Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35001470 C Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35001881 G Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35002019 C Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35002822 T Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35019153 C Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35019161 A Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35019170 C Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35019300 C Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35019434 T Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35020232 C Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35020265 T Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35020347 A Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35022642 T Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35023249 A Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35027316 T Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35027744 T Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35028055 T Intron LOXL2 ENSFCAG00000012400 - 1/13 -
B1 35028512 A Missense LOXL2 ENSFCAG00000012400 2/14 - tolerated_low_confidence(0.08)
B1 35028658 T Synonymous LOXL2 ENSFCAG00000012400 2/14 - -
B1 35028712 A Synonymous LOXL2 ENSFCAG00000012400 2/14 - -
B1 35028781 T Synonymous LOXL2 ENSFCAG00000012400 2/14 - -
B1 35028915 T Intron LOXL2 ENSFCAG00000012400 - 2/13 -
B1 35029088 T Intron LOXL2 ENSFCAG00000012400 - 2/13 -
B1 35029150 G Intron LOXL2 ENSFCAG00000012400 - 2/13 -
B1 35029241 - Intron LOXL2 ENSFCAG00000012400 - 2/13 -
B1 35029320 C Intron LOXL2 ENSFCAG00000012400 - 2/13 -
B1 35029336 T Intron LOXL2 ENSFCAG00000012400 - 2/13 -
B1 35044010 G Intron LOXL2 ENSFCAG00000012400 - 3/13 -
B1 35052707 - Intron LOXL2 ENSFCAG00000012400 - 4/13 -
B1 35053822 G Intron LOXL2 ENSFCAG00000012400 - 4/13 -
B1 35058799 C Intron LOXL2 ENSFCAG00000012400 - 5/13 -
B1 35058973 T Intron LOXL2 ENSFCAG00000012400 - 5/13 -
B1 35059083 G Intron LOXL2 ENSFCAG00000012400 - 5/13 -
B1 35059430 G Intron LOXL2 ENSFCAG00000012400 - 5/13 -

B1 35059698 C Intron LOXL2 ENSFCAG00000012400 - 5/13 -
B1 35059706 C Intron LOXL2 ENSFCAG00000012400 - 5/13 -
B1 35060058 A Intron LOXL2 ENSFCAG00000012400 - 6/13 -
B1 35060059 C Intron LOXL2 ENSFCAG00000012400 - 6/13 -
B1 35060339 G Intron LOXL2 ENSFCAG00000012400 - 6/13 -
B1 35060992 C Intron LOXL2 ENSFCAG00000012400 - 6/13 -
B1 35061080 C Intron LOXL2 ENSFCAG00000012400 - 6/13 -
B1 35061190 A Intron LOXL2 ENSFCAG00000012400 - 6/13 -
B1 35062883 G Intron LOXL2 ENSFCAG00000012400 - 6/13 -
B1 35063310 A Intron LOXL2 ENSFCAG00000012400 - 6/13 -
B1 35063612 A Intron LOXL2 ENSFCAG00000012400 - 6/13 -
B1 35063762 A Intron LOXL2 ENSFCAG00000012400 - 6/13 -
B1 35071536 T Intron LOXL2 ENSFCAG00000012400 - 7/13 -
B1 35084891 A Intron LOXL2 ENSFCAG00000012400 - 10/13 -
B1 35084988 T Intron LOXL2 ENSFCAG00000012400 - 10/13 -
B1 35128911 T Intron R3HCC1 ENSFCAG00000012395 - 1/7 -
B1 35129333 G Intron R3HCC1 ENSFCAG00000012395 - 1/7 -
B1 35130023 C Intron R3HCC1 ENSFCAG00000012395 - 1/7 -
B1 35220652 C Intron RHOBTB2 ENSFCAG00000005670 - 7/10 -
B1 35227496 C Intron RHOBTB2 ENSFCAG00000005670 - 2/10 -
B1 35238676 C Intron RHOBTB2 ENSFCAG00000005670 - 1/10 -
B1 35274559 A Missense PEBP4 ENSFCAG00000029653 1/7 - deleterious_low_confidence(0)
B1 35277539 T Intron PEBP4 ENSFCAG00000029653 - 1/6 -
B1 35277744 C Intron PEBP4 ENSFCAG00000029653 - 1/6 -
B1 35277895 A Intron PEBP4 ENSFCAG00000029653 - 1/6 -
B1 35280605 G Intron PEBP4 ENSFCAG00000029653 - 1/6 -
B1 35283483 C Intron PEBP4 ENSFCAG00000029653 - 1/6 -
B1 35284007 C Intron PEBP4 ENSFCAG00000029653 - 1/6 -
B1 38097605 - Intron LPL ENSFCAG00000012161 - 1/8 -
B3 21131950 - Intron APBA2 ENSFCAG00000008067 - 4/12 -
B3 26563124 - Intron GABRG3 ENSFCAG00000023805 - 3/9 -
B3 26615318 - Intron GABRG3 ENSFCAG00000023805 - 3/9 -
B3 28224967 C Intron NIPA1 ENSFCAG00000013033 - 1/4 -
B3 28224969 C Intron NIPA1 ENSFCAG00000013033 - 1/4 -
B3 28224973 C Intron NIPA1 ENSFCAG00000013033 - 1/4 -

B3 28224977 C Intron NIPA1 ENSFCAG00000013033 - 1/4 -
B3 28224981 C Intron NIPA1 ENSFCAG00000013033 - 1/4 -
D1 25511704 G Intron ETS1 ENSFCAT00000076275.1 - 2/8 -
D1 25708472 G Intron FLI1 ENSFCAT00000056583.2 - 1/8 -
D1 26346056 T Intron BARX2 ENSFCAT00000002881.5 - 1/3 -
D1 28665092 T Intron NTM ENSFCAT00000025549.3 - 1/7 -
D1 28665103 C Intron NTM ENSFCAT00000025549.3 - 1/7 -
D1 31169049 C Intron GLB1L2 ENSFCAT00000083030.1 - 5/20 -
D1 40570250 G Intron SESN3 ENSFCAT00000057835.2 - 1/9 -
D1 40718491 A Missense KDM4B ENSFCAT00000042333.3 1/2 - deleterious_low_confidence(0.04)
D1 40718502 A Synonymous KDM4B ENSFCAT00000042333.3 1/2 - -
D1 40718535 T Synonymous KDM4B ENSFCAT00000042333.3 1/2 - -
D1 40718573 G Missense KDM4B ENSFCAT00000042333.3 1/2 - deleterious_low_confidence(0.01)
D1 40718617 G Missense KDM4B ENSFCAT00000042333.3 1/2 - deleterious_low_confidence(0)
D1 40719218 G Synonymous KDM4B ENSFCAT00000042333.3 1/2 - -
D1 40719269 G Synonymous KDM4B ENSFCAT00000042333.3 1/2 - -
D1 40734112 A Missense KDM4D ENSFCAT00000048124.3 1/1 - tolerated(0.15)
D1 40734159 T Synonymous KDM4D ENSFCAT00000048124.3 1/1 - -
D1 40734282 T Synonymous KDM4D ENSFCAT00000048124.3 1/1 - -
D1 40734302 G Missense KDM4D ENSFCAT00000048124.3 1/1 - deleterious(0.01)
D1 40734438 C Synonymous KDM4D ENSFCAT00000048124.3 1/1 - -
D1 40734947 G Synonymous KDM4D ENSFCAT00000048124.3 1/1 - -
D1 40734998 A Synonymous KDM4D ENSFCAT00000048124.3 1/1 - -
D1 45245100 A Intron - ENSFCAT00000076162.1 - 2/4 -
D1 45245143 A Intron - ENSFCAT00000076162.1 - 2/4 -
D1 45319790 A 3' UTR - ENSFCAT00000082514.1 5/5 - -
D1 45319883 G 3' UTR - ENSFCAT00000082514.1 5/5 - -
D1 45320197 A Missense TRIM43 ENSFCAT00000082514.1 5/5 - tolerated(1)
D1 45320246 G Missense TRIM43 ENSFCAT00000082514.1 5/5 - deleterious(0)
D1 45321011 C Intron - ENSFCAT00000082514.1 - 4/4 -
D1 45321208 T Intron - ENSFCAT00000082514.1 - 4/4 -
D1 45322565 T Intron - ENSFCAT00000082514.1 - 3/4 -
D1 45322608 T Intron - ENSFCAT00000082514.1 - 3/4 -
D1 45322644 A Intron - ENSFCAT00000082514.1 - 3/4 -
D1 45323329 T Intron - ENSFCAT00000082514.1 - 2/4 -

D1 45673707 C Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45673725 G Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45673749 T Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45673752 A Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45673762 G Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45673766 C Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45673769 C Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45673772 A Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45673788 A Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45673793 C Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45673798 A Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45673803 G Intron NOX4 ENSFCAT00000022277.4 - 11/17 -
D1 45837983 C Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45837986 G Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45837998 C Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45838001 A Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45838008 C Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45838011 T Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45838033 C Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45838037 G Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45838043 A Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45838061 C Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45838066 C Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45838067 G Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 45838070 T Intron TYR ENSFCAT00000029640.4 - 5/6 -
D1 46052822 T Intron GRM5 ENSFCAT00000050923.2 - 2/9 -
D1 48983488 T Intron SYTL2 ENSFCAT00000068868.1 - 15/18 -
D1 48983498 G Intron SYTL2 ENSFCAT00000068868.1 - 15/18 -
D1 49184884 G Intron DLG2 ENSFCAT00000056330.2 - 2/26 -
D1 49225396 T Intron DLG2 ENSFCAT00000056330.2 - 2/26 -
D1 49482620 G Intron DLG2 ENSFCAT00000056330.2 - 3/26 -
D1 49482692 G Intron DLG2 ENSFCAT00000056330.2 - 3/26 -
D1 49537891 T Intron DLG2 ENSFCAT00000056330.2 - 4/26 -
D1 49537916 T Intron DLG2 ENSFCAT00000056330.2 - 4/26 -
D1 49537945 T Intron DLG2 ENSFCAT00000056330.2 - 4/26 -

D1 50150155 A Intron DLG2 ENSFCAT00000056330.2 - 6/26 -
D1 50705037 G Intron DLG2 ENSFCAT00000056330.2 - 14/26 -
D1 50758227 C Intron DLG2 ENSFCAT00000056330.2 - 15/26 -
D1 55433159 T Intron TENM4 ENSFCAT00000064454.2 - 3/29 -
D1 55433188 T Intron TENM4 ENSFCAT00000064454.2 - 3/29 -
D1 55433234 G Intron TENM4 ENSFCAT00000064454.2 - 3/29 -
D1 55844373 T Intron NARS2 ENSFCAT00000007986.6 - 6/13 -
D1 45832512 - Intron TYR ENSFCAG00000024128 - 5/6 -
D1 46052822 T Intron GRM5 ENSFCAG00000035511 - 2/9 -
D1 47782913 - Intron TMEM135 ENSFCAG00000028774 - 3/14 -
D1 49225396 T Intron DLG2 ENSFCAG00000027392 - 2/26 -
D1 57205183 - Intron MYO7A ENSFCAG00000014446 - 1/47 -
D4 35955267 T Intron PTPRD ENSFCAG00000024457 - 19/34 -
D4 35999761 A Intron PTPRD ENSFCAG00000024457 - 8/34 -
D4 36069647 T Intron PTPRD ENSFCAG00000024457 - 3/34 -

Appendix II Supplementary data for Chapter 3

Table S1: Sequencing and alignment performance for all individual samples and species pools. Cheetah_pool, tiger_pool and snowleopard_pool refer to species pools consisting of 4 multiplexed individuals each.

[Table S1 columns: Study ID; SRA ID; Total; Duplicated; Mapped; % mapped; Paired; Read1; Read2; Properly paired; % properly paired; Both pairs mapped; Mate mapped to different chr; Mate mapped to different chr (Q>=5)]

Figure S1: Mapped and total reads across WGS samples and pools. The total number of mapped reads (black) includes singletons and pairs. The number of paired reads mapped is indicated by the grey line. Tiger_zoo, snowleopard_zoo and cheetah_zoo refer to multiplexed pool samples.

Figure S2: Plots of genome coverage for each sample BAM file aligned to the felCat9 reference assembly for the (a) cheetah, (b) Sumatran tiger and (c) snow leopard cohorts. On each panel, the key indicates the line colour of each sample, with its sequencing depth in brackets.

Table S2: Functional annotation of all fixed and within-species SNPs for each species

| Variant type | Sumatran tiger, fixed | Sumatran tiger, within species | Snow leopard, fixed | Snow leopard, within species | Cheetah, fixed | Cheetah, within species |
|---|---|---|---|---|---|---|
| splice_acceptor_variant | 104 | 26 | 352 | 1 | 622 | 92 |
| splice_donor_variant | 108 | 10 | 313 | 2 | 727 | 58 |
| stop_gained | 84 | 57 | 375 | 11 | 705 | 250 |
| stop_lost | 17 | 5 | 77 | 2 | 157 | 19 |
| start_lost | 37 | 0 | 131 | 0 | 252 | 35 |
| missense_variant | 19,012 | 2,737 | 60,808 | 623 | 94,623 | 17,333 |
| splice_region_variant | 8,396 | 843 | 26,349 | 94 | 42,760 | 4,650 |
| stop_retained_variant | 30 | 4 | 107 | 3 | 146 | 23 |
| synonymous_variant | 37,539 | 4,330 | 122,446 | 743 | 186,854 | 24,391 |
| 5_prime_UTR_variant | 5,103 | 555 | 17,690 | 86 | 32,661 | 5,357 |
| 3_prime_UTR_variant | 13,561 | 1,901 | 47,397 | 442 | 97,389 | 13,317 |
| non_coding_transcript_exon_variant | 15,637 | 2,291 | 59,041 | 239 | 132,397 | 16,239 |
| intron_variant | 4,722,034 | 463,442 | 17,028,335 | 59,801 | 32,761,264 | 3,201,762 |
| non_coding_transcript_variant | 476,553 | 52,425 | 1,712,728 | 6,380 | 3,165,903 | 315,529 |
| upstream_gene_variant | 344,265 | 39,732 | 1,282,784 | 5,624 | 2,650,874 | 289,107 |
| downstream_gene_variant | 339,310 | 39,610 | 1,255,622 | 6,517 | 2,669,476 | 292,755 |
| intergenic_variant | 1,643,862 | 190,707 | 6,190,833 | 26,298 | 11,568,383 | 1,196,177 |

Table S3a: Top 20 gene ontology terms (GOterms) enriched for species-specific SNVs within cheetahs. P-values were adjusted for multiple testing using Benjamini-Hochberg false discovery rate.

| GOtermID | GOterm name | Padj |
|---|---|---|
| GO:0005515 | protein binding | 3.23E-23 |
| GO:0051179 | localization | 6.31E-22 |
| GO:0032502 | developmental process | 7.91E-17 |
| GO:0048856 | anatomical structure development | 1.45E-16 |
| GO:0007275 | multicellular organism development | 2.97E-15 |
| GO:0005488 | binding | 5.64E-15 |
| GO:0120025 | plasma membrane bounded cell projection | 3.26E-14 |
| GO:0042995 | cell projection | 3.77E-14 |
| GO:0048731 | system development | 2.71E-13 |
| GO:0043167 | ion binding | 8.85E-12 |
| GO:0005856 | cytoskeleton | 9.20E-12 |
| GO:0006928 | movement of cell or subcellular component | 3.60E-11 |
| GO:0051234 | establishment of localization | 5.54E-11 |
| GO:0097367 | carbohydrate derivative binding | 7.83E-11 |
| GO:0030554 | adenyl nucleotide binding | 1.25E-10 |
| GO:0032559 | adenyl ribonucleotide binding | 1.65E-10 |
| GO:0048869 | cellular developmental process | 2.48E-10 |
| GO:0043168 | anion binding | 6.42E-10 |
| GO:0036094 | small molecule binding | 1.27E-09 |
| GO:0005524 | ATP binding | 1.56E-09 |
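The Benjamini-Hochberg adjustment named in the GO enrichment table captions can be illustrated with a short sketch. The Python function below is a generic implementation of the procedure, not the code or tool used to produce these tables; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Generic illustration only; the name and interface are hypothetical.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone (each is capped by the one above it).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw={p:.3g} adjusted={q:.3g}")
```

Each raw p-value is scaled by the number of tests divided by its rank, and the backward pass keeps the adjusted values non-decreasing with rank, which is what allows a single threshold to be applied to the Padj column.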

Table S3b: Top 20 gene ontology terms (GOterms) enriched for species-specific SNVs within Sumatran tigers. P-values were adjusted for multiple testing using Benjamini-Hochberg false discovery rate.

| GOtermID | GOterm name | Padj |
|---|---|---|
| GO:0005515 | protein binding | 6.01E-95 |
| GO:0003824 | catalytic activity | 9.19E-56 |
| GO:0043168 | anion binding | 2.36E-52 |
| GO:0036094 | small molecule binding | 4.08E-40 |
| GO:1901265 | nucleoside phosphate binding | 1.18E-39 |
| GO:0000166 | nucleotide binding | 1.47E-39 |
| GO:0097367 | carbohydrate derivative binding | 2.59E-38 |
| GO:0030554 | adenyl nucleotide binding | 2.74E-37 |
| GO:0032559 | adenyl ribonucleotide binding | 7.81E-37 |
| GO:0017076 | purine nucleotide binding | 1.74E-34 |
| GO:0032553 | ribonucleotide binding | 1.81E-34 |
| GO:0005524 | ATP binding | 2.87E-34 |
| GO:0032555 | purine ribonucleotide binding | 7.19E-34 |
| GO:0035639 | purine ribonucleoside triphosphate binding | 1.09E-31 |
| GO:0019899 | enzyme binding | 5.86E-29 |
| GO:0016787 | hydrolase activity | 7.10E-28 |
| GO:0016772 | transferase activity, transferring phosphorus-containing groups | 9.10E-28 |
| GO:0016301 | kinase activity | 1.84E-27 |
| GO:0016773 | phosphotransferase activity, alcohol group as acceptor | 2.25E-27 |
| GO:0008092 | cytoskeletal protein binding | 5.09E-26 |

Table S4: Genes under positive selection identified as those displaying elevated πN/πS ratios across all three species.

| Species | Gene name | Gene symbol | Gene ID | πN/πS |
|---|---|---|---|---|
| Sumatran tiger | Alpha 1-3-galactosyltransferase | LOC101087791 | ENSFCAG00000025873 | 3.04 |
| Sumatran tiger | Vomeronasal type-1 receptor | LOC101091239 | ENSFCAG00000025070 | 1.45 |
| Sumatran tiger | none | none | ENSFCAG00000053257 | 1.22 |
| Sumatran tiger | none | none | ENSFCAG00000051471 | 1.16 |
| Sumatran tiger | Olfactory receptor 5H2 | LOC101089384 | ENSFCAG00000049952 | 1.16 |
| Sumatran tiger | none | none | ENSFCAG00000045227 | 1.13 |
| Snow leopard | none | LOC101089645 | ENSFCAG00000040741 | 1.47 |
| Cheetah | sidekick cell adhesion molecule 1 | SDK1 | ENSFCAG00000012315 | 77.66 |
| Cheetah | N-terminal EF-hand calcium binding protein 2 | NECAB2 | ENSFCAG00000026632 | 18.62 |
| Cheetah | membrane spanning 4-domains A7 | MS4A7 | ENSFCAG00000007252 | 15.65 |
| Cheetah | epoxide hydrolase 1 | EPHX1 | ENSFCAG00000006965 | 5.62 |
| Cheetah | olfactory receptor 2A1/2A42-like | LOC101085032 | ENSFCAG00000043770 | 2.87 |
| Cheetah | RAB44 | RAB44 | ENSFCAG00000023379 | 2.79 |
| Cheetah | none | none | ENSFCAG00000029091 | 2.27 |
| Cheetah | G2 and S-phase expressed 1 | GTSE1 | ENSFCAG00000006532 | 2.26 |
| Cheetah | dynein axonemal heavy chain 2 | DNAH2 | ENSFCAG00000009626 | 2.12 |
| Cheetah | synaptotagmin like 2 | SYTL2 | ENSFCAG00000029903 | 2.06 |
| Cheetah | toll like receptor 3 | TLR3 | ENSFCAG00000031090 | 1.91 |
| Cheetah | 2'-5'-oligoadenylate synthetase 3 | OAS3 | ENSFCAG00000000320 | 1.87 |
| Cheetah | Fc receptor like 4 | FCRL4 | ENSFCAG00000038609 | 1.86 |
| Cheetah | adaptor related protein complex 3 subunit sigma 1 | AP3S1 | ENSFCAG00000029920 | 1.82 |
| Cheetah | none | none | ENSFCAG00000051726 | 1.78 |
| Cheetah | ribosome biogenesis protein NSA2 homolog | LOC101094823 | ENSFCAG00000025379 | 1.73 |
| Cheetah | BTB domain containing 16 | BTBD16 | ENSFCAG00000003652 | 1.72 |
| Cheetah | olfactory receptor 151-like | LOC101084218 | ENSFCAG00000033436 | 1.71 |
| Cheetah | dynein axonemal heavy chain 6 | DNAH6 | ENSFCAG00000000768 | 1.68 |
| Cheetah | none | none | ENSFCAG00000052912 | 1.66 |
| Cheetah | none | none | ENSFCAG00000049647 | 1.63 |
| Cheetah | migration and invasion inhibitory protein | MIIP | ENSFCAG00000000077 | 1.59 |
| Cheetah | microtubule associated serine/threonine kinase family member 4 | MAST4 | ENSFCAG00000025450 | 1.46 |
| Cheetah | none | none | ENSFCAG00000048579 | 1.43 |
| Cheetah | sortilin related VPS10 domain containing receptor 2 | SORCS2 | ENSFCAG00000015288 | 1.36 |
| Cheetah | maestro heat like repeat family member 8 | MROH8 | ENSFCAG00000011925 | 1.34 |
| Cheetah | olfactory receptor-like protein OLF3 | LOC101101377 | ENSFCAG00000040453 | 1.27 |
| Cheetah | coiled-coil domain containing 136 | CCDC136 | ENSFCAG00000043434 | 1.25 |
| Cheetah | adhesion G protein-coupled receptor G6 | ADGRG6 | ENSFCAG00000024629 | 1.24 |
| Cheetah | olfactory receptor 5V1-like | LOC101095034 | ENSFCAG00000027476 | 1.23 |
| Cheetah | cortactin | CTTN | ENSFCAG00000004767 | 1.22 |
| Cheetah | sarcoglycan gamma | SGCG | ENSFCAG00000033144 | 1.2 |
| Cheetah | cytoskeleton associated protein 2 | CKAP2 | ENSFCAG00000010021 | 1.19 |
| Cheetah | polo like kinase 5 | PLK5 | ENSFCAG00000001784 | 1.18 |
| Cheetah | putative olfactory receptor 5AK3 | LOC101086178 | ENSFCAG00000027084 | 1.16 |
| Cheetah | C2 domain containing 3 centriole elongation regulator | C2CD3 | ENSFCAG00000028884 | 1.14 |
| Cheetah | none | none | ENSFCAG00000024754 | 1.1 |
| Cheetah | xin actin binding repeat containing 1 | XIRP1 | ENSFCAG00000013533 | 1.1 |
| Cheetah | leucine rich repeat containing 66 | LRRC66 | ENSFCAG00000042278 | 1.09 |
| Cheetah | none | none | ENSFCAG00000035027 | 1.09 |
| Cheetah | none | none | ENSFCAG00000053257 | 1.08 |
| Cheetah | ribokinase | RBKS | ENSFCAG00000003303 | 1.08 |
| Cheetah | hemicentin 2 | HMCN2 | ENSFCAG00000030267 | 1.08 |
| Cheetah | fms related tyrosine kinase 4 | FLT4 | ENSFCAG00000001882 | 1.07 |
| Cheetah | none | none | ENSFCAG00000040052 | 1.05 |
| Cheetah | proline rich 14 like | PRR14L | ENSFCAG00000029759 | 1.04 |
| Cheetah | olfactory receptor 51G2-like | LOC101092885 | ENSFCAG00000025216 | 1.04 |
| Cheetah | RAB11 family interacting protein 1 | RAB11FIP1 | ENSFCAG00000028568 | 1.03 |
| Cheetah | anoctamin 7 | ANO7 | ENSFCAG00000010453 | 1.03 |
| Cheetah | none | none | ENSFCAG00000003805 | 1.03 |
| Cheetah | PBX homeobox interacting protein 1 | PBXIP1 | ENSFCAG00000001344 | 1.03 |
| Cheetah | EF-hand calcium-binding domain-containing protein 3 | LOC102899173 | ENSFCAG00000003882 | 1.02 |
| Cheetah | ATR serine/threonine kinase | ATR | ENSFCAG00000018078 | 1 |
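As a rough illustration of the ratio reported in Table S4, the sketch below computes a per-gene ratio of nonsynonymous to synonymous nucleotide diversity from allele frequencies. It assumes biallelic sites and the standard unbiased heterozygosity estimator, and it is not necessarily the estimator or software used to generate the table. The function names, arguments and example values are all hypothetical.

```python
def nucleotide_diversity(alt_allele_freqs, n_chromosomes, n_sites):
    """Mean per-site nucleotide diversity (pi) over a class of sites.

    alt_allele_freqs: alternate-allele frequency at each variant site
                      (biallelic sites assumed).
    n_chromosomes:    number of sampled chromosomes (2 x diploid individuals).
    n_sites:          total number of sites of this class in the gene,
                      including invariant sites.
    """
    if n_chromosomes < 2 or n_sites <= 0:
        raise ValueError("need at least 2 chromosomes and a positive site count")
    # Small-sample correction n / (n - 1) for expected heterozygosity.
    correction = n_chromosomes / (n_chromosomes - 1)
    heterozygosity = sum(2 * p * (1 - p) for p in alt_allele_freqs)
    return correction * heterozygosity / n_sites


def pi_n_pi_s(nonsyn_freqs, syn_freqs, n_chromosomes, n_nonsyn_sites, n_syn_sites):
    """Ratio of nonsynonymous to synonymous diversity; None if pi_S is zero."""
    pi_n = nucleotide_diversity(nonsyn_freqs, n_chromosomes, n_nonsyn_sites)
    pi_s = nucleotide_diversity(syn_freqs, n_chromosomes, n_syn_sites)
    if pi_s == 0:
        return None  # ratio is undefined without synonymous diversity
    return pi_n / pi_s


if __name__ == "__main__":
    # Hypothetical gene: two nonsynonymous and one synonymous variant,
    # sampled from four diploid individuals (8 chromosomes).
    ratio = pi_n_pi_s([0.5, 0.5], [0.5], n_chromosomes=8,
                      n_nonsyn_sites=300, n_syn_sites=100)
    print(f"piN/piS = {ratio:.3f}")
```

Under this formulation a ratio above 1 means nonsynonymous sites carry more diversity per site than synonymous sites. Genes with no synonymous diversity return `None` rather than an infinite ratio.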

Table S5a: Top 20 gene ontology terms (GOterms) enriched across fixed SNVs in snow leopards. P-values were adjusted for multiple testing using Benjamini-Hochberg FDR (false discovery rate).

| GOtermID | GOterm name | Padj |
|---|---|---|
| GO:0005515 | protein binding | 4.33E-72 |
| GO:0032502 | developmental process | 1.05E-58 |
| GO:0048856 | anatomical structure development | 5.03E-58 |
| GO:0005488 | binding | 2.69E-52 |
| GO:0005737 | cytoplasm | 1.15E-50 |
| GO:0007275 | multicellular organism development | 6.02E-50 |
| GO:0048731 | system development | 1.39E-48 |
| GO:0043227 | membrane-bounded organelle | 2.47E-45 |
| GO:0005622 | intracellular | 3.96E-43 |
| GO:0048518 | positive regulation of biological process | 2.30E-39 |
| GO:0043231 | intracellular membrane-bounded organelle | 2.54E-37 |
| GO:0048522 | positive regulation of cellular process | 1.82E-35 |
| GO:0048513 | animal organ development | 3.84E-35 |
| GO:0048519 | negative regulation of biological process | 7.14E-35 |
| GO:0048869 | cellular developmental process | 8.16E-35 |
| GO:0030154 | cell differentiation | 2.10E-34 |
| GO:0043226 | organelle | 9.86E-34 |
| GO:0051179 | localization | 2.55E-33 |
| GO:0048523 | negative regulation of cellular process | 2.17E-31 |
| GO:0051239 | regulation of multicellular organismal process | 1.21E-30 |

Table S5b: Top 20 gene ontology terms (GOterms) enriched across fixed SNVs in Sumatran tigers. P-values were adjusted for multiple testing using Benjamini-Hochberg FDR (false discovery rate).

| GOtermID | GOterm name | Padj |
|---|---|---|
| GO:0005515 | protein binding | 6.01E-95 |
| GO:0003824 | catalytic activity | 9.19E-56 |
| GO:0043168 | anion binding | 2.36E-52 |
| GO:0036094 | small molecule binding | 4.08E-40 |
| GO:1901265 | nucleoside phosphate binding | 1.18E-39 |
| GO:0000166 | nucleotide binding | 1.47E-39 |
| GO:0097367 | carbohydrate derivative binding | 2.59E-38 |
| GO:0030554 | adenyl nucleotide binding | 2.74E-37 |
| GO:0032559 | adenyl ribonucleotide binding | 7.81E-37 |
| GO:0017076 | purine nucleotide binding | 1.74E-34 |
| GO:0032553 | ribonucleotide binding | 1.81E-34 |
| GO:0005524 | ATP binding | 2.87E-34 |
| GO:0032555 | purine ribonucleotide binding | 7.19E-34 |
| GO:0035639 | purine ribonucleoside triphosphate binding | 1.09E-31 |
| GO:0019899 | enzyme binding | 5.86E-29 |
| GO:0016787 | hydrolase activity | 7.10E-28 |
| GO:0016772 | transferase activity, transferring phosphorus-containing groups | 9.10E-28 |
| GO:0016301 | kinase activity | 1.84E-27 |
| GO:0016773 | phosphotransferase activity, alcohol group as acceptor | 2.25E-27 |
| GO:0008092 | cytoskeletal protein binding | 5.09E-26 |

Table S6: Multi-species alignment of LCORL (ENSFCAG00000029474) revealed Panthera-specific conservation of six missense variants. Cheetah, snow leopard and Sumatran tiger refer to samples aligned to the domestic cat (felCat9) reference assembly. Protein positions are reported relative to the Ensembl transcript ENSFCAT00000081895.1.

| Genomic position | B1:194193762 | B1:194194515 | B1:194194793 | B1:194194989 | B1:194195030 | B1:194196732 | B1:194196965 | B1:194197053 | B1:194197775 | B1:194197785 | B1:194197940 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Nucleotide variant | A/G | G/A | G/C | C/T | T/C | T/C | C/T | G/A | T/C | G/A | A/T |
| Protein position | 393 | 644 | 737 | 802 | 816 | 1383 | 1461 | 1490 | 1731 | 1734 | 1786 |
| Cat amino acid variant | N/S | R/K | A/P | A/V | S/P | L/P | R/C | R/Q | F/L | R/K | N/Y |
| Cheetah | S | K | A | A | S | L | R | R | F | R | N |
| Snow leopard | S | K | P | V | P | P | C | Q | L | K | Y |
| Sumatran tiger | S | K | P | V | P | P | C | Q | L | K | Y |
| Lion | S | K | P | V | P | P | C | R | L | K | Y |
| Domestic cat | N | R | A | A | S | L | R | R | F | R | N |
| Domestic dog | S | K | A | T | S | T | H | R | L | R | N |
| Cow | N | K | A | T | S | A | H | Q | L | R | N |
| Horse | H | K | A | T | P | A | H | R | L | R | N |

Table S7: Protocadherin genes containing fixed non-synonymous SNVs common to all big cat species relative to the domestic cat (felCat9) reference assembly.

| Gene | Transcript ID | gDNA | Reference allele (felCat9) | Big cat allele | Amino acid | Consequence |
|---|---|---|---|---|---|---|
| PCDHAC2 | ENSFCAT00000078597.1 | A1:119042691 | C | A | Q/K | Missense |
| PCDHB1 | ENSFCAT00000003688.4 | A1:119266039 | G | A | A/T | Missense |
| PCDHB4 | ENSFCAT00000001366.6 | A1:119316197 | A | G | S/N | Missense |
| | | A1:119316209 | T | G | W/L | Missense |
| PCDHB5 | ENSFCAT00000065939.1 | A1:119316539 | C | T | S/P | Missense |
| | | A1:119317400 | A | G | V/I | Missense |
| PCDHB13 | ENSFCAT00000045876.3 | A1:119350686 | A | G | R/Q | Missense |
| PCDHGB1 | ENSFCAT00000049228.3 | A1:119479145 | T | G | V/L | Missense |
| | | A1:119480423 | A | C | P/T | Missense |
| PCDHGA6 | ENSFCAT00000028046.4 | A1:119504728 | C | T | - | Splice donor |

Table S8: Genes containing deleterious SNVs implicated in heritable conditions affecting big cats, grouped by species.

Sumatran tiger: APOB, ATAD5, ATR, BTBD18, CALR3, CCP110, CCT3, CENPJ, CEP19, CEP350, CEP72, DDIAS, DNAH7, DRD5, DUSP13, DYNC2H1, ENO1, FHDC1, GALNTL5, GAPDH, GCNT3, HAUS3, HAUS6, HOMER2, IL2RA, LRGUK, LRRK2, MAST2, MGAT4D, MROH2B, MSH6, MT-ND4, NSRP1, PAQR5, PCDH15, PCM1, PHACTR4, SPACA1, SPAG6, SPATA7, SPEF2, TMF1, ZFX, ZP2

Snow leopard: ADAM29, ADCY10, CCNYL1, CCT3, KIF20B, LAMB1

Cheetah: ABL1, ACAN, ADAM20, ADAM29, ADAR, ADGRF4, ADGRF5, AHI1, ALMS1, AMER1, AMH, APOB, AR, ARID4B, ASXL2, ATP2B4, ATP8B3, ATR, AURKA, BARD1, BRCA1, C2CD3, C5, C6, C8A, CABYR, CACNA1S, CAPN2, CAST, CC2D2A, CCDC33, CCDC39, CCDC42, CCDC62, CCN1, CD19, CD1B, CDH23, CDK16, CENPU, CEP290, CFAP44, CFAP69, CLEC4G, CLEC7A, CLOCK, CLUAP1, CNTRL, COMT, CP, CPLANE1, CR2, CRTAP, CST11, CTSH, CUL3, DDO, DLC1, DNAAF1, DNAH1, DNAH2, DNAH8, DNAJA1, DNALI1, DRC7, DUOX2, DUSP13, DYNC2H1, EFCAB9, EHMT2, EPO, ERBIN, FABP9, FCRL6, FDPS, FETUB, FGF18, FLT1, FRAS1, FREM1, FREM2, GALNTL5, GGN, GINS4, GPR31, H1-6, HERPUD2, HMX2, HSP90AA1, HSPA8, HSPE1-RS1, HSPG2, HTR2B, IFT140, IL4R, INVS, IQCG, KDM6A, KIAA1217, KLHL12, KLRK1, KNL1, KRT9, LAMA3, LRGUK, LRIG3, LRP1B, LRP2, LRRK2, LUM, MAP7, MASTL, MATN3, MBTPS2, MEIKIN, MMP13, MORC1, MROH2B, MVK, MYCBPAP, MYO3B, NAGLU, NBN, NCAPG2, NES, NODAL, NUP210L, OFD1, OR7C1, OVOL1, PALLD, PANX1, PAQR7, PAX8, PCDH15, PCDH8, PCDHGA4, PHGDH, PHLDB1, PLEKHA1, PLK4, PLPP4, POU5F1, PRDX3, PRKDC, PRSS55, RAD21L1, RBBP6, RECK, RIPK3, ROPN1L, RPGRIP1L, RTN4, SAXO1, SCUBE2, SERPINA5, SERPINH1, SHBG, SIK3, SIRT2, SLC26A3, SLC38A10, SLC9B2, SOS1, SPACA1, SPATA5, SPATA6, SPEF2, SPTBN4, SRD5A2, STPG4, SUSD4, SYNE1, TACR2, TBC1D21, TCOF1, TDRD1, TEKT4, TEPP, TESMIN, TET1, TEX14, TEX15, TLR8, TP53BP1, TPPP2, TRAF3IP1, TRPC6, TTC21A, TTK, TUT7, USP9X, USPL1, UTP25, WDR38, WDR66, WNK1, ZAN, ZEB2, ZFPM1, ZFX

Figure S3a: Gene ontology annotation of clinically significant GOterms in cheetahs. Deleterious variants were observed in 201 genes included in a custom list of GO terms relevant to known heritable conditions, immune and reproductive function. These genes were annotated for terms relevant to reproductive and immune function. Network interaction graph produced by GOnet (https://tools.dice-database.org/GOnet/).

Figure S3b: Gene ontology annotation of clinically significant GOterms in Sumatran tigers. Deleterious variants were observed in 44 genes included in a custom list of GO terms relevant to known heritable conditions, immune and reproductive function. These genes were annotated for terms relevant to cilium structure. Network interaction graph produced by GOnet (https://tools.dice-database.org/GOnet/).

Table S9: Samples downloaded from Sequence Read Archive (SRA) comprised six cheetahs, one snow leopard and seven Sumatran tigers.

| Species | Sample | SRA ID | Sex | Population | Sequencing platform | Reference |
|---|---|---|---|---|---|---|
| Cheetah | CHEETAH_TZA3 | SRR2737545 | F | Tanzania | Illumina HiSeq 2000 | Dobrynin et al. 2015 |
| Cheetah | CHEETAH_TZA2 | SRR2737544 | M | Tanzania | Illumina HiSeq 2000 | Dobrynin et al. 2015 |
| Cheetah | CHEETAH_TZA1 | SRR2737543 | M | Tanzania | Illumina HiSeq 2000 | Dobrynin et al. 2015 |
| Cheetah | CHEETAH_NAM3 | SRR2737542 | M | Namibia | Illumina HiSeq 2000 | Dobrynin et al. 2015 |
| Cheetah | CHEETAH_NAM2 | SRR2737541 | M | Namibia | Illumina HiSeq 2000 | Dobrynin et al. 2015 |
| Cheetah | CHEETAH_NAM1 | SRR2737540 | F | Namibia | Illumina HiSeq 2000 | Dobrynin et al. 2015 |
| Snow leopard | SNOW | SRR836372 | F | Captive, Korea | Illumina HiSeq 2000 | Cho et al. 2013 |
| Sumatran tiger | SUM_IND1 | SRR7152379 | M | Taman Safari, Indonesia | Illumina HiSeq 2500 | Liu et al. 2018 |
| Sumatran tiger | SUM_USA1 | SRR7152382 | F | Phoenix Zoo, USA | Illumina HiSeq 2500 | Liu et al. 2018 |
| Sumatran tiger | SUM_IND2 | SRR7152383 | F | Taman Safari, Indonesia | Illumina HiSeq 2500 | Liu et al. 2018 |
| Sumatran tiger | SUM_IND3 | SRR7152384 | F | Taman Safari, Indonesia | Illumina HiSeq 2500 | Liu et al. 2018 |
| Sumatran tiger | SUM_IND4 | SRR7152385 | M | Taman Safari, Indonesia | Illumina HiSeq 2500 | Liu et al. 2018 |
| Sumatran tiger | SUM_IND5 | SRR7152386 | F | Taman Safari, Indonesia | Illumina HiSeq 2500 | Liu et al. 2018 |
| Sumatran tiger | SUM_USA2 | SRR7152388 | M | Atlanta Zoo, USA | Illumina HiSeq 2500 | Liu et al. 2018 |

Table S10: Genes associated with size in domestic species identified from a literature search

| Gene symbol | Gene name | First cited in |
|---|---|---|
| ADAMTSL9-AS9 | ADAM metallopeptidase with thrombospondin type 1 motif 9 | Plassais et al., 2019 |
| ACSL4 | Acyl-CoA Synthetase Long Chain Family Member 4 | Plassais et al., 2017 |
| GHR1 | Growth Hormone Receptor 1 | Rimbault et al., 2013 |
| GHR2 | Growth Hormone Receptor 2 | Rimbault et al., 2013 |
| HNF4G | Hepatocyte nuclear factor 4 gamma | Plassais et al., 2017 |
| HMGA2 | High-mobility group AT-hook 2 | Rimbault et al., 2013 |
| IGSF1 | Immunoglobulin superfamily, member 1 | Rimbault et al., 2013 |
| IGF1 | Insulin Growth Factor 1 | Sutter et al., 2007 |
| IGFBP2 | Insulin Like Growth Factor 2 MRNA Binding Protein 2 | Jones et al., 2008 |
| IGF1R | Insulin-like growth factor-1 receptor | Hoopes et al., 2012 |
| IRS4 | Insulin Receptor Substrate 4 | Plassais et al., 2017 |
| LCORL | Ligand dependent nuclear receptor corepressor | Plassais et al., 2019 |
| R3HDM1 | R3H domain containing 1 | Plassais et al., 2019 |
| SMAD2 | SMAD Family Member 2 | Rimbault et al., 2013 |
| STC2 | Stanniocalcin 2 | Rimbault et al., 2013 |
| ZNF608 | Zinc finger Protein 608 | Plassais et al., 2019 |

Table S11: Selected GO enrichment terms used to classify deleterious variants potentially implicated in reproductive success and overall health of captive bred big cats.

| GO term | Name | Ontology | Definition |
|---|---|---|---|
| GO:0007283 | spermatogenesis | biological_process | The developmental process by which male germ line stem cells self renew or give rise to successive cell types resulting in the development of a spermatozoa. |
| GO:0048477 | oogenesis | biological_process | The complete process of formation and maturation of an ovum or female gamete from a primordial female germ cell. Examples of this process are found in Mus musculus and Drosophila melanogaster. |
| GO:0019953 | sexual reproduction | biological_process | A reproduction process that creates a new organism by combining the genetic material of two gametes, which may come from two organisms or from a single organism, in the case of self-fertilizing hermaphrodites, e.g. C. elegans, or self-fertilization in plants. It occurs both in eukaryotes and prokaryotes: in multicellular eukaryotic organisms, an individual is created anew; in prokaryotes, the initial cell has additional or transformed genetic material. In a process called genetic recombination, genetic material (DNA) originating from two gametes join up so that homologous sequences are aligned with each other, and this is followed by exchange of genetic information. After the new recombinant chromosome is formed, it is passed on to progeny. |
| GO:0048232 | male gamete generation | biological_process | Generation of the male gamete; specialised haploid cells produced by meiosis and along with a female gamete takes part in sexual reproduction |
| GO:0007292 | female gamete generation | biological_process | Generation of the female gamete; specialised haploid cells produced by meiosis and along with a male gamete takes part in sexual reproduction |
| GO:0009790 | embryo development | biological_process | The process whose specific outcome is the progression of an embryo from its formation until the end of its embryonic life stage. The end of the embryonic stage is organism-specific. For example, for mammals, the process would begin with zygote formation and end with birth. For insects, the process would begin at zygote formation and end with larval hatching. For plant zygotic embryos, this would be from zygote formation to the end of seed dormancy. For plant vegetative embryos, this would be from the initial determination of the cell or group of cells to form an embryo until the point when the embryo becomes independent of the parent plant. |
| GO:0060136 | embryonic process involved in female pregnancy | biological_process | A reproductive process occurring in the embryo or fetus that allows the embryo or fetus to develop within the mother. |
| GO:0007565 | female pregnancy | biological_process | The set of physiological processes that allow an embryo or foetus to develop within the body of a female animal. It covers the time from fertilization of a female ovum by a male spermatozoon until birth |
| GO:0044849 | estrous cycle | biological_process | A type of ovulation cycle, which occurs in most mammalian therian females, where the endometrium is resorbed if pregnancy does not occur. |
| GO:0042448 | progesterone metabolic process | biological_process | The chemical reactions and pathways involving progesterone, a steroid hormone produced in the ovary which prepares and maintains the uterus for pregnancy. Also found in plants |
| GO:0090714 | innate immunity memory response | biological_process | An immune response mediated by the innate immune system and directed against a previously encountered immunologic stimulus, being quicker and quantitatively better compared with the initial response to that stimulus. |
| GO:0019724 | B cell mediated immunity | biological_process | Any process involved with the carrying out of an immune response by a B cell, through, for instance, the production of antibodies or cytokines, or antigen presentation to T cells. |
| GO:0002456 | T cell mediated immunity | biological_process | Any process involved in the carrying out of an immune response by a T cell. |
| GO:0042287 | MHC protein binding | molecular_function | Interacting selectively and non-covalently with major histocompatibility complex molecules; a set of molecules displayed on cell surfaces that are responsible for lymphocyte recognition and antigen presentation |
| GO:0002396 | MHC protein complex assembly | biological_process | The aggregation, arrangement and bonding together of a set of components to form an MHC protein complex. |
| GO:0060271 | cilium assembly | biological_process | The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. |
| GO:0060121 | vestibular receptor cell stereocilium organization | biological_process | A process that is carried out at the cellular level which results in the assembly, arrangement of constituent parts, or disassembly of a stereocilium. A stereocilium is an actin-based protrusion from the apical surface of vestibular hair cells |
| GO:0032420 | stereocilium | cellular_component | An actin-based protrusion from the apical surface of auditory and vestibular hair cells and of neuromast cells. These protrusions are supported by a bundle of cross-linked actin filaments (an actin cable), oriented such that the plus (barbed) ends are at the tip of the protrusion, capped by a tip complex which bridges to the plasma. Bundles of stereocilia act as mechanosensory organelles. |
| GO:0061977 | hip joint articular cartilage development | biological_process | The process whose specific outcome is the progression of hip joint articular cartilage over time, from its formation to the mature structure. |
| GO:0050728 | negative regulation of inflammatory response | biological_process | Any process that stops, prevents, or reduces the frequency, rate or extent of the inflammatory response. |
| GO:0061827 | sperm head | cellular_component | The part of the late spermatid or spermatozoon that contains the nucleus and acrosome |
| GO:0036126 | sperm flagellum | cellular_component | A microtubule-based flagellum (or cilium) that is part of a sperm, a mature male germ cell that develops from a spermatid. |
| GO:0061914 | negative regulation of growth plate cartilage chondrocyte proliferation | biological_process | Any process that decreases the rate, frequency, or extent of the multiplication or reproduction of chondrocytes in a growing endochondral bone, resulting in the expansion of a cell population |
| GO:0035988 | chondrocyte proliferation | biological_process | The multiplication or reproduction of chondrocytes by cell division, resulting in the expansion of their population. A chondrocyte is a polymorphic cell that forms cartilage |
| GO:0003415 | chondrocyte hypertrophy | biological_process | The growth of a chondrocyte, where growth contributes to the progression of the chondrocyte over time |
| GO:0003433 | chondrocyte development involved in endochondral bone morphogenesis | biological_process | The progression of a chondrocyte over time from after its commitment to its mature state where the chondrocyte will contribute to the shaping of an endochondral bone |
| GO:0001501 | skeletal system development | biological_process | The process whose specific outcome is the progression of the skeleton over time, from its formation to the mature structure. The skeleton is the bony framework of the body in vertebrates (endoskeleton) or the hard outer envelope of insects (exoskeleton or dermoskeleton) |
| GO:0061386 | closure of optic fissure | biological_process | The closure of the temporary ventral gap in the optic cup that contributes to its shaping |
| GO:1990000 | amyloid fibril formation | biological_process | The generation of amyloid fibrils, insoluble fibrous protein aggregates exhibiting beta sheet structure, from proteins. |
| GO:1905906 | regulation of amyloid fibril formation | biological_process | Any process that modulates the frequency, rate or extent of amyloid fibril formation. |

Table S12: Known heritable conditions affecting captive bred big cat species and relevant GOterms

Species | Condition | Reference | GO terms
Sumatran tiger | Congenital vestibular disorder | Wheelhouse et al. 2015 | GO:0060271 cilium assembly; GO:0060121 vestibular receptor cell stereocilium organization; GO:0032420 stereocilium
Snow leopard | Hip dysplasia | Suedmeyer et al. 2000 | GO:0061977 hip joint articular cartilage development; GO:0001501 skeletal system development
Snow leopard | Osteochondritis dissecans | Herrin et al. 2012 | GO:0061914 negative regulation of growth plate cartilage chondrocyte proliferation; GO:0035988 chondrocyte proliferation; GO:0003415 chondrocyte hypertrophy; GO:0003433 chondrocyte development involved in endochondral bone morphogenesis
Snow leopard | Multiple ocular coloboma | Barnett et al. 2002 | GO:0061386 closure of optic fissure
Cheetah | Kinked tail | Marker 1997 | GO:0001501 skeletal system development
Cheetah | Sperm deformities | Crosier et al. 2007 | GO:0007283 spermatogenesis; GO:0048232 male gamete generation; GO:0061827 sperm head; GO:0036126 sperm flagellum
Cheetah | Amyloidosis | Caughey et al. 2008 | GO:1990000 amyloid fibril formation; GO:1905906 regulation of amyloid fibril formation
Cheetah | Bilateral carpal valgus deformities | Bell et al. 2011 | GO:0001501 skeletal system development

Appendix III Supplementary data for Chapter 4

Table S1: Details of cheetah, snow leopard and Sumatran tiger samples downloaded from NCBI’s Sequence Read Archive.

Species | Sample | SRA id | Sex | Population | BioSample | Sequencing platform | Tissue | Reference
Cheetah | CHEETAH_TZA3 | SRR2737545 | F | Tanzania | SAMN04147028 | Illumina HiSeq 2000 | Blood | Dobrynin et al. 2015
Cheetah | CHEETAH_TZA2 | SRR2737544 | M | Tanzania | SAMN04147027 | Illumina HiSeq 2000 | Blood | Dobrynin et al. 2015
Cheetah | CHEETAH_TZA1 | SRR2737543 | M | Tanzania | SAMN04147026 | Illumina HiSeq 2000 | Blood | Dobrynin et al. 2015
Cheetah | CHEETAH_NAM3 | SRR2737542 | M | Namibia | SAMN04147025 | Illumina HiSeq 2000 | Blood | Dobrynin et al. 2015
Cheetah | CHEETAH_NAM2 | SRR2737541 | M | Namibia | SAMN04147024 | Illumina HiSeq 2000 | Blood | Dobrynin et al. 2015
Cheetah | CHEETAH_NAM1 | SRR2737540 | F | Namibia | SAMN04147023 | Illumina HiSeq 2000 | Blood | Dobrynin et al. 2015
Snow leopard | SNOW | SRR836372 | F | Captive, Korea | SAMN02086968 | Illumina HiSeq 2000 | Muscle | Cho et al. 2013
Sumatran tiger | SUM_IND1 | SRR7152379 | M | Taman Safari, Indonesia | SAMN09080450 | Illumina HiSeq 2500 | Blood | Liu et al. 2018
Sumatran tiger | SUM_USA1 | SRR7152382 | F | Phoenix Zoo, USA | SAMN09080447 | Illumina HiSeq 2500 | Blood | Liu et al. 2018
Sumatran tiger | SUM_IND2 | SRR7152383 | F | Taman Safari, Indonesia | SAMN09080454 | Illumina HiSeq 2500 | Blood | Liu et al. 2018
Sumatran tiger | SUM_IND3 | SRR7152384 | F | Taman Safari, Indonesia | SAMN09080453 | Illumina HiSeq 2500 | Blood | Liu et al. 2018
Sumatran tiger | SUM_IND4 | SRR7152385 | M | Taman Safari, Indonesia | SAMN09080452 | Illumina HiSeq 2500 | Blood | Liu et al. 2018
Sumatran tiger | SUM_IND5 | SRR7152386 | F | Taman Safari, Indonesia | SAMN09080451 | Illumina HiSeq 2500 | Blood | Liu et al. 2018
Sumatran tiger | SUM_USA2 | SRR7152388 | M | Atlanta Zoo, USA | SAMN09080455 | Illumina HiSeq 2500 | Blood | Liu et al. 2018
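The run accessions in Table S1 can be turned into download commands. The following is a minimal sketch only: it assumes the NCBI sra-tools programs prefetch and fasterq-dump are used, which this appendix does not state, and the function name fetch_commands and the output directory "fastq" are hypothetical. The commands are built but not executed.

```python
# Run accessions copied from Table S1 (sample name -> SRA run id).
RUNS = {
    "CHEETAH_TZA3": "SRR2737545",
    "CHEETAH_TZA2": "SRR2737544",
    "CHEETAH_TZA1": "SRR2737543",
    "CHEETAH_NAM3": "SRR2737542",
    "CHEETAH_NAM2": "SRR2737541",
    "CHEETAH_NAM1": "SRR2737540",
    "SNOW": "SRR836372",
    "SUM_IND1": "SRR7152379",
    "SUM_USA1": "SRR7152382",
    "SUM_IND2": "SRR7152383",
    "SUM_IND3": "SRR7152384",
    "SUM_IND4": "SRR7152385",
    "SUM_IND5": "SRR7152386",
    "SUM_USA2": "SRR7152388",
}


def fetch_commands(run_accession, out_dir="fastq"):
    """Return sra-tools command lines for one run without executing them.

    Assumption: prefetch followed by fasterq-dump is the chosen route;
    out_dir is a hypothetical output directory.
    """
    return [
        ["prefetch", run_accession],
        ["fasterq-dump", "--split-files", "--outdir", out_dir, run_accession],
    ]


if __name__ == "__main__":
    for sample, run in RUNS.items():
        for command in fetch_commands(run):
            print(sample, " ".join(command))
```

Each command list could be passed to subprocess.run once the tools are installed; keeping construction separate from execution makes the accession list easy to check against the table first.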

Table S2: Summary of marker distribution among feline genotyping array and reduced representation sequencing datasets of cheetahs, Sumatran tigers and snow leopards aligned to the domestic cat reference genome assembly (felCat9).

Columns: CHR (A1-A3, B1-B4, C1, C2, D1-D4, E1-E3, F1, F2, X); COUNT, INTER and MAX for the ARRAY and RRS datasets of each species (Sumatran tiger, snow leopard, cheetah).


Figure S1: Genome-wide linkage disequilibrium decay for Sumatran tiger and cheetah cohorts. a. Average pairwise r² values measured using the simulated feline array dataset and b. Average pairwise r² values measured using the simulated reduced representation sequence dataset.
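For readers unfamiliar with the statistic plotted in Figure S1, the following is a minimal sketch of how pairwise r² and its average by inter-marker distance can be computed. It assumes unphased genotypes coded as allele dosages (0, 1 or 2) and uses the squared Pearson correlation between dosages, one common estimator; the estimator and software actually used for the figure are not stated in this appendix, and the function names and bin size are hypothetical.

```python
def pairwise_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two vectors of allele dosages."""
    if len(dosage_a) != len(dosage_b):
        raise ValueError("both SNPs must be genotyped in the same individuals")
    n = len(dosage_a)
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # r2 is undefined when a SNP is monomorphic
    return cov * cov / (var_a * var_b)


def mean_r2_by_distance(pairs, bin_size):
    """Average r2 within distance bins.

    pairs: iterable of (distance_bp, r2) tuples.
    Returns a dict mapping each bin's start (bp) to the mean r2 in that bin.
    """
    sums, counts = {}, {}
    for distance, r2 in pairs:
        bin_index = distance // bin_size
        sums[bin_index] = sums.get(bin_index, 0.0) + r2
        counts[bin_index] = counts.get(bin_index, 0) + 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}


if __name__ == "__main__":
    # Illustrative dosages only, not data from the thesis.
    snp1 = [0, 1, 2, 1, 0, 2]
    snp2 = [0, 1, 2, 1, 0, 2]
    print(pairwise_r2(snp1, snp2))
```

Plotting the binned means against distance gives a decay curve of the kind shown in the figure.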

Table S3: SNPs that were identified across array and reduced-representation sequencing datasets in cheetahs.

Chromosome | Position
A2 | 88770082
A2 | 150402373
A3 | 2759677
A3 | 6736194
A3 | 13262857
A3 | 25347949
A3 | 83373426
A3 | 141551825
B1 | 28420727
B1 | 164264967
B2 | 76630148
B3 | 37443485
B3 | 127473147
B3 | 144150063
C1 | 844621
C1 | 40854128
C1 | 87024247
C1 | 187606645
D1 | 113817544
D2 | 24322970
D2 | 87514927
D3 | 74221989
D4 | 21930201
F1 | 37151873
F1 | 50281262
F2 | 31646593
X | 16203803
X | 20553796
X | 26320734
X | 40014870
X | 93715647
X | 96363271

Table S4: Annotation of SNPs common to all three species in the reduced representation sequencing dataset.

Chromosome | Position (bp) | Allele | Consequence | Gene name | Ensembl gene ID
A1 | 241640498 | T | Intronic | BRD9 | ENSFCAG00000009535
A1 | 241651585 | A | Intronic | BRD9 | ENSFCAG00000009535
A2 | 46076155 | A | Intronic | SUMF1 | ENSFCAG00000011512
A2 | 71402303 | G | Intronic | LANCL2 | ENSFCAG00000027536
A2 | 71404646 | T | Intronic | LANCL2 | ENSFCAG00000027536
A2 | 71404649 | C | Intronic | LANCL2 | ENSFCAG00000027536
A2 | 71416009 | T | Intronic | LANCL2 | ENSFCAG00000027536
A2 | 71416010 | G | Intronic | LANCL2 | ENSFCAG00000027536
A2 | 72073295 | A | 3' UTR | BLVRA | ENSFCAG00000026122
A2 | 72073299 | A | 3' UTR | BLVRA | ENSFCAG00000026122
A2 | 72073330 | A | 3' UTR | BLVRA | ENSFCAG00000026122
A2 | 77423975 | T | Intronic | TRG | ENSFCAG00000050998
A3 | 68083244 | A | Intronic | PRKCE | ENSFCAG00000005102
B1 | 77025175 | T | Downstream gene variant | TMEM154 | ENSFCAG00000035945
B1 | 77025178 | G | Downstream gene variant | TMEM154 | ENSFCAG00000035945
B4 | 25481607 | T | Upstream gene variant | RAB18 | ENSFCAG00000027161
C1 | 8887137 | A | Intronic | TNFRSF8 | ENSFCAG00000001453
C1 | 27071498 | G | Missense | SFPQ | ENSFCAG00000042862
C1 | 124711073 | G | Intronic | DPP10 | ENSFCAG00000005743
C1 | 124711081 | A | Intronic | DPP10 | ENSFCAG00000005743
C1 | 124711084 | C | Intronic | DPP10 | ENSFCAG00000005743
C1 | 124711090 | T | Intronic | DPP10 | ENSFCAG00000005743
C1 | 124712175 | G | Intronic | DPP10 | ENSFCAG00000005743
C1 | 124712194 | C | Intronic | DPP10 | ENSFCAG00000005743
C1 | 173095208 | T | Missense | DNAJC10 | ENSFCAG00000026076
C1 | 173095214 | A | Missense | DNAJC10 | ENSFCAG00000026076
C2 | 65334000 | T | Intronic | IGSF11 | ENSFCAG00000029999
C2 | 65334007 | T | Intronic | IGSF11 | ENSFCAG00000029999
E2 | 27443419 | G | Synonymous | NETO2 | ENSFCAG00000044763


Figure S2: Manhattan plots of genome-wide association analysis comparing Panthera species (Sumatran tiger and snow leopard) to cheetah using a. array datasets with a genome-wide significance threshold of 1.39 × 10⁻⁵ and b. reduced representation sequencing dataset with a genome-wide significance threshold of 6.34 × 10⁻⁷.


Figure S3: Fixed coding variants between Panthera species and cheetah were differentially enriched for a. molecular function relating to ATPase activity in the array dataset and b. biological processes in the reduced representation sequencing datasets.


Table S5: Missense variants associated with differences between cheetah and Panthera genera in the reduced representation dataset.

Chr | Position (bp) | Gene name | Cheetah | Panthera | Amino acid change | Ensembl gene ID | cDNA position | Protein position
A1 | 174,063,967 | EPB41L4A | T | G | N/H | ENSFCAG00000008228 | 1,450 | 376
A1 | 214,531,476 | SLC45A2 | T | C | Y/H | ENSFCAG00000002354 | 590 | 173
B1 | 77,956,596 | FAM160A1 | G | T | L/M | ENSFCAG00000025713 | 3,224 | 953
B2 | 31,519,454 | MDC1 | A | G | I/T | ENSFCAG00000007696 | 4,251 | 1,237
B2 | 119,672,933 | LAMA2 | A | G | T/A | ENSFCAG00000005957 | 8,755 | 2,919
B2 | 131,217,610 | ADGRG6 | G | A | R/K | ENSFCAG00000024629 | 3,017 | 1,006
B3 | 118,697,876 | PCNX1 | G | T | A/S | ENSFCAG00000004193 | 997 | 333
C2 | 134,398,332 | COL6A6 | A | G | M/T | ENSFCAG00000013504 | 177 | 49
D1 | 114,293,024 | PPFIA1 | G | C | C/S | ENSFCAG00000004765 | 941 | 314
D4 | 52,261,011 | LRRC19 | T | G | Y/S | ENSFCAG00000025095 | 974 | 325
D4 | 59,065,834 | UNC13B | T | A | F/L | ENSFCAG00000012434 | 4,131 | 1,377
X | 46,114,600 | TSPYL2 | G | A | S/N | ENSFCAG00000022093 | 1,922 | 641
X | 82,885,690 | SRPX2 | T | G | S/A | ENSFCAG00000003306 | 598 | 23
