Whole genome data provides evidence of divergent selection and gene flow between two populations of red grouse Lagopus lagopus scotica with implications for conservation

Grace Walsh

Degree project in biology, Master of science (2 years), 2021
Examensarbete i biologi 60 hp till masterexamen, 2021
Biology Education Centre and Department of Ecology and Genetics, Uppsala University
Supervisors: Jacob Höglund and Barry John McMahon
External opponent: Chaz Hyseni

ABSTRACT

Red grouse Lagopus lagopus scotica are endangered in Ireland but are widespread in other parts of their range. Their decline in Ireland is attributed to increases in generalist predators and habitat loss due to changing land use. Once widespread in Ireland, the population is now reduced and fragmented. For the effective conservation of this species, an understanding of genomic diversity and local adaptation is important. In this study, whole-genome sequencing of contemporary and historic samples was used to assess the level of differentiation, inbreeding and population structure between two populations of red grouse, as well as to identify candidate genes putatively under divergent selection. Irish and English populations are shown to be differentiated (FST=0.095). There is clear population structure between English and Irish red grouse, with evidence of admixture into the Irish population. This is evidence of gene flow from the English to the Irish population. There were signs of recent inbreeding in the contemporary Irish population, more so than in the historical or English samples. The contemporary Irish population has significantly more long runs of homozygosity than the other two populations. Outlier analysis between the contemporary samples identified 661 candidate genes under putative divergent selection. These were involved in a large variety of processes including immune response, pigmentation and food intake. This study provides more evidence that Irish red grouse are locally adapted and stresses that conservation efforts should focus on conserving the Irish population as one unit.


TABLE OF CONTENTS

1. INTRODUCTION
1.1 BACKGROUND
1.2 STUDY SPECIES
1.2.1 Red Grouse (Lagopus lagopus scotica)
1.2.2 Taxonomic status and overview of genetic studies
1.2.3 Parasite mediated selection
1.2.4 Pigmentation
1.3 STUDY OBJECTIVES
2. METHODS
2.1 STUDY SAMPLES
2.2 DNA EXTRACTIONS
2.3 LIBRARY PREPARATIONS
2.4 SEQUENCING
2.5 MAPPING AND SNP CALLING
2.6 ANALYSES
2.6.1 Genetic diversity
2.6.2 Population structure and local adaptation using pcadapt
2.6.3 Analysis of Admixture
2.6.4 Fst outlier analysis
2.6.5 Gene ontology (GO) enrichment analysis
2.6.6 Runs of Homozygosity analysis
3. RESULTS
3.1 SEQUENCES, MAPPING AND SNP CALLING
3.2 GENETIC DIVERSITY
3.3 PRINCIPAL COMPONENT ANALYSIS AND OUTLIER DETECTION
3.4 ANALYSIS OF ADMIXTURE
3.5 WINDOW-BASED FST SCANS
3.6 OUTLIER ANALYSIS
3.6.1 Immune response
3.6.2 Pigmentation
3.6.3 General genes and outlier method overlaps
3.6.4 Gene ontology (GO) enrichment analysis
3.7 RUNS OF HOMOZYGOSITY
3.7.1 Short/medium ROH
3.7.2 Long ROH
4. DISCUSSION
4.1 GENETIC DIVERSITY
4.2 GENETIC DIFFERENTIATION
4.3 OUTLIER ANALYSIS
4.3.1 Immune response
4.3.2 Pigmentation
4.3.3 Other relevant genes
4.3.4 Gene ontology (GO) enrichment analysis
4.4 RUNS OF HOMOZYGOSITY
4.5 MANAGEMENT IMPLICATIONS
4.6 STUDY LIMITATIONS
4.7 CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
SUPPLEMENTARY INFORMATION


1. INTRODUCTION

1.1 Background

Current global extinction rates are 100-1000 times higher than the natural background level as calculated over geological time (Proença & Pereira 2017). This has led to the current period being considered the sixth mass extinction (Barnosky et al. 2011, Ceballos et al. 2015). This mass extinction is inherently detrimental for wildlife but is also a threat to civilisation, although to what extent is unknown (Ceballos et al. 2020). There are repeated calls for action to halt these drastic reductions in biodiversity (e.g. Díaz et al. 2019, Ceballos et al. 2020). Nevertheless, many populations continue to be reduced and fragmented through over-exploitation, changes in agriculture and urbanisation (Maxwell et al. 2016). At the species or population level, once numbers are low enough, the ecosystem functions that the species once carried out are lost and this, in turn, can lead to further extinctions (Estes et al. 2016). In addition, population decline may lead to a reduction in resilience to environmental change (Reed & Frankham 2003).

Increased fragmentation causes populations to become smaller and more isolated, which can lead to reductions in genetic diversity and a subsequent increase in extinction risk. This is due to small populations being more sensitive to environmental stochasticity such as disease (Frankham 1995). In addition, these small populations are more affected by genetic drift (Wright 1969, Kimura 1983) and by the accumulation of deleterious alleles, leading to subsequent reductions in fitness (Frankham 2010). Population decline may lead to reductions in genetic variation and thus less variation for selection to act on, which results in these populations having reduced fitness in several traits, including weight (Mattila et al. 2012), fertility (Reed & Frankham 2003) and resistance to disease and pathogens (Spielman et al. 2004).

On top of this, smaller populations tend to have higher levels of inbreeding and thus an increased risk of inbreeding depression. When sections of the genome are identical by descent (IBD) in the parents and are then passed on to their offspring, this results in homozygous regions. As these homozygous regions become more numerous in successive generations, this can result in inbreeding depression, with inbred individuals having lower fecundity than outbred ones (Hedrick & Kalinowski 2000). These homozygous regions can be studied by scanning along the genome and identifying Runs of Homozygosity (ROH): long sections of the genome that are IBD. The length and number of these ROH can give insights into past and recent inbreeding levels (Ceballos et al. 2018).
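To make the idea of scanning for ROH concrete, the following is a minimal sketch in Python. It assumes genotypes coded as 0 or 2 (homozygous) and 1 (heterozygous) at ordered SNP positions on one chromosome, and it ends a run at the first heterozygous site. The function name, the thresholds and the toy data are illustrative assumptions; dedicated ROH software typically tolerates some heterozygous or missing calls within a run.

```python
from typing import List, Tuple


def find_roh(
    positions: List[int],
    genotypes: List[int],
    min_snps: int = 3,
    min_length_bp: int = 1000,
) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_snps) for each run of homozygous genotypes.

    genotypes: 0 or 2 = homozygous, 1 = heterozygous (assumed coding).
    A run is reported only if it spans at least min_snps sites and
    at least min_length_bp base pairs (both thresholds are illustrative).
    """
    runs: List[Tuple[int, int, int]] = []
    run_start = None  # position of the first SNP in the current run
    run_end = 0       # position of the last SNP in the current run
    n_snps = 0        # number of SNPs in the current run

    def close_run() -> None:
        # Store the current run if it is long enough.
        if (
            run_start is not None
            and n_snps >= min_snps
            and run_end - run_start >= min_length_bp
        ):
            runs.append((run_start, run_end, n_snps))

    for pos, gt in zip(positions, genotypes):
        if gt in (0, 2):
            if run_start is None:
                run_start, n_snps = pos, 0
            run_end = pos
            n_snps += 1
        else:
            close_run()
            run_start = None
    close_run()  # a run may extend to the last SNP
    return runs


# Toy example: two homozygous stretches separated by heterozygous sites.
toy_positions = [100, 600, 1200, 1500, 2000, 2600, 3400, 3500]
toy_genotypes = [0, 0, 2, 1, 2, 2, 0, 1]
print(find_roh(toy_positions, toy_genotypes))
```

Raising `min_length_bp` keeps only the longer runs, which mirrors the distinction drawn in the text between short runs from older inbreeding and long runs from recent inbreeding.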

There are several examples of fragmentation affecting birds (Lindsay et al. 2008, Bruggeman et al. 2010). As the loss of genetic variation from fragmentation continues, so can inbreeding depression, the accumulation of deleterious alleles and negative effects on fitness. As these impacts increase, populations become smaller and are thus more at risk of these effects. This cycle of decreasing population size and decreasing fitness can cause species to enter an “extinction vortex” (Gilpin & Soulé 1986). This occurs as an increasing effect of drift coupled with reduced fitness causes the population to decrease, which in turn increases the effect of drift. Genetic rescue, which increases genetic variation through translocations or increased connectivity, is a way to counter this (Tallmon et al. 2004). However, this is not a straightforward task, nor is it suitable in all situations. When two populations have diverged, they can become locally adapted to their respective environments (Hereford 2009). Local adaptation can have important consequences for conservation efforts. When two populations are locally adapted, translocations and resulting mating could lead to outbreeding depression through the introduction of maladapted alleles (Frankham et al. 2011). Therefore, for effective conservation of small isolated populations, often with low genetic diversity, it is important to understand the extent of divergence among populations. The lower the level of divergence, the lower the risk of outbreeding depression (Whiteley et al. 2015). By incorporating museum samples into these studies, temporal changes in the genetic make-up of populations can also be assessed. Even though there are inherent problems with sampling museum specimens, such as DNA fragmentation and deamination of cytosine to uracil, they are becoming an important tool in conservation (Nakahama 2021).

Some of the most drastic species declines seen in Europe today have occurred in ground-nesting birds, with 74% of species in decline across Europe (compared to 41% of other nesting birds) (McMahon et al. 2020). A recent review of birds of conservation concern in Ireland saw six species moving off the red list and 23 being added to it, a 43% increase (Gilbert et al. 2021). Some species in Ireland are very close to extinction, such as the Eurasian Curlew (Numenius arquata) with c. 150 breeding pairs left (O’Donoghue et al. 2020) and the Ring Ouzel (Turdus torquatus), for which exact numbers are not known but the breeding population is reduced to two areas with low numbers (Mee 2018). Upland birds are particularly at risk in Ireland, with a large proportion red-listed compared to other groups (Gilbert et al. 2021). Reasons for these declines include increases in generalist predators (McMahon et al. 2020), afforestation, exploitation of peat, overstocking and the erosion caused by these activities (Cummins et al. 2010). Given these stark declines in breeding birds in Ireland, there is a need to quantify the genetic viability of more species in order to examine the resilience of populations and to inform conservation efforts.

1.2 Study species

1.2.1 Red Grouse (Lagopus lagopus scotica)

Red grouse (Lagopus lagopus scotica) is a sub-species of the willow grouse (Lagopus lagopus) which occurs exclusively in the British Isles (Quintela et al. 2010). Willow grouse have an almost circumpolar distribution with an east/west division and a generally outbred population (Höglund et al. 2013). Red grouse are behaviourally and phenotypically different to willow grouse. Unlike willow grouse, they do not develop white winter plumage but remain red/brown year-round (Skoglund & Höglund 2010). SNP markers have shown that red grouse and willow grouse are diverged lineages (Quintela et al. 2010). In addition, whole-genome data supports this inference, with red grouse and willow grouse forming distinct groups in principal component and phylogenetic analyses (Kozma et al. 2019). Kozma et al. (2019) found two genes, SUN3 and EDIL3, that were under strong selection only in red grouse and not in the other grouse study species. Studies in other taxa show that SUN3 is involved in spermiogenesis in mammals and also in cytoskeleton anchoring (Kozma et al. 2019), while the EDIL3 gene has been suggested to play a role in the first stages of eggshell biomineralization in chickens (Gallus gallus) (Stapane et al. 2020).


Figure 1. Map of the British Isles showing the year-round distribution of red grouse. Distribution is shown at the 10 km grid square level, with each dot representing a grid square where red grouse were detected. English samples are shown in red, Recent Irish samples (2007-2020) in orange and Museum samples (c. 1880) in purple. Map reproduced from Bird Atlas 2007–11, which is a joint project between BTO, BirdWatch Ireland and the Scottish Ornithologists’ Club. Map reproduced with permission from the British Trust for Ornithology.

Red grouse numbers in Great Britain were estimated at 265,000 breeding pairs in 2005 (Robinson 2005); however, current numbers are likely lower as populations have been in decline since then (Balmer et al. 2013). There are an estimated 4,200 (95% CI: 3,800 – 4,700) individual birds in the Republic of Ireland (Cummins et al. 2015) and 202-221 in Northern Ireland (Allen et al. 2005). Red grouse in Ireland are the only bird species found exclusively on peatland habitats. Irish red grouse are associated with bog habitats, whereas in Britain red grouse occur on heather (Calluna vulgaris) dominated moorland (Balmer et al. 2013). In Ireland, they are mostly found on montane and raised bog (Bracken et al. 2008), with the highest number on mountain blanket bog followed by upland blanket bog. The most recent survey confirmed losses of greater than 50% since 1968-72 (Cummins et al. 2015). There has also been a significant reduction in the genetic diversity of Irish red grouse (Freeland et al. 2007). In Britain, grouse moors are intensively managed through predator control and habitat management, leading to high grouse densities (Sotherton et al. 2009, Ludwig et al. 2017).


1.2.2 Taxonomic status and overview of genetic studies

In the past, red grouse on the islands of Britain and Ireland (along with birds from the Outer Hebrides) have been described as separate subspecies, Tetrao dresseri and Tetrao hibernica respectively. This distinction was due to differences in plumage, with Irish and Outer Hebridean birds being generally lighter in colour than their counterparts in Britain (Kleinschmidt 1919). Today they are mostly considered the same subspecies, L. lagopus scotica. Drastic declines in Irish grouse of 50% since 1968-72 (Cummins et al. 2015) have led to more interest in the taxonomy of Irish red grouse, in particular to identify whether or not translocations from Britain are a viable conservation option. This has led to three published genetic studies on levels of divergence and the potential of an L. lagopus hibernica subspecies in Ireland.

Firstly, there is quantitative evidence that Irish birds have lighter plumage than British birds. There is a disparity in the plumage of birds found in Ireland versus those found in Britain (Kleinschmidt 1919), believed to be a result of the different habitats (Balmer et al. 2013). In Great Britain, birds are thought to occupy areas with denser heather cover, whereas in Ireland the birds are typically found in bogs and areas with higher densities of grass (Hutchinson 1989, Finnerty et al. 2007, Balmer et al. 2013). When reflectance scores of feathers from Irish, British and Scandinavian grouse were examined, Irish birds had significantly lighter breast feathers than Scandinavian and British birds. In addition, two genes involved in pigmentation, DCT and MC1R, showed significant differences in sequences between British and Irish birds (Höglund & McMahon, unpublished observations). However, no dedicated study has looked at the extent to which the habitats differ between Ireland and Britain or how well this correlates with observed phenotypes.

British and Irish red grouse are considered the same subspecies when using mtDNA sequences (ca 300bp). Analysis of the mtDNA sequences from 22 Irish, 9 British and 2 Outer Hebridean red grouse, along with willow grouse from Norway, Russia and Sweden, revealed 12 unique haplotypes. The two haplotypes which occurred most often were present in both the Irish and British populations. Overall, the haplotype network did not indicate that individuals from Ireland were more related to each other than to those from Britain. These shared haplotypes were not thought to be from recent hybridisation or introgression, and therefore the authors rejected the existence of an endemic subspecies in Ireland. From these results, the study concluded that translocation could be a viable management option (Freeland et al. 2007).

In contrast, microsatellite markers have shown that there is significant differentiation between red grouse from Ireland and Scotland (FST = 0.07, 95% CI = 0.04–0.10) (McMahon et al. 2012). This study suggested that the Irish population consisted of four clusters, labelled Cork, Munster, Wicklow, and the North-west and West. Overall, the highest genetic diversity was in the Wicklow population, corresponding to the highest effective population size (Ne) of 151. A factorial correspondence analysis showed no overlap between Scottish and Irish populations for multilocus genotypes and found no evidence of recent gene flow between the Irish and British populations. The heterogeneity of the Irish red grouse in this study was on the same level as that of threatened and isolated populations. This study recommended against using British birds to reinforce the Irish population.

McMahon et al. (2012) and Höglund & McMahon (unpublished observations) both looked at neutral genetic markers, and therefore it cannot be ruled out that the differences shown are due to genetic drift rather than selection. To address this, adaptive markers were used. By using both adaptive and neutral markers, the relative roles of genetic drift and selection can be evaluated. The major histocompatibility complex (MHC) genes, which are involved in the immune response, have been well studied and are some of the best-understood genes for this purpose. Meyer-Lucht et al. (2016) looked at MHC class II genes and neutral SNPs in Scottish versus Irish red grouse. Irish and Scottish grouse in this study were shown to be significantly differentiated both at adaptive MHC markers (FST: 0.090 - 0.116) and at neutral SNPs (FST = 0.084). There was significantly more differentiation at adaptive markers than at neutral SNPs, indicating local adaptation. Irish birds were also shown to have a higher diversity of the MHC-BLB genes. This could mean that Irish birds are locally adapted to a more diverse or different parasite community (Meyer-Lucht et al. 2016). The Irish birds, like the Scottish birds, were shown to be scattered in a PCA (Meyer-Lucht et al. 2016), unlike previous research where the Scottish birds were scattered but not the Irish (McMahon et al. 2012), showing somewhat contradictory measures of genetic diversity. This study also found an Ne of 106 for all of Ireland, which the authors point out is on the verge of being critical (Lynch & Lande 1998). Meyer-Lucht et al. (2016) state that genetic diversity is reduced in the Irish birds but not to a great degree. To fully understand the BLB genes and the host-parasite relationship in Irish birds, data on parasite communities in Ireland must be collected.
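As a rough illustration of how an FST value summarises allele-frequency differences between two populations, here is a minimal sketch in Python using Hudson's estimator, combined across sites as a ratio of sums. The choice of estimator, the function name and the toy frequencies are assumptions made for illustration; the studies cited above may have used different estimators.

```python
from typing import Sequence


def hudson_fst(
    p1: Sequence[float],
    p2: Sequence[float],
    n1: int,
    n2: int,
) -> float:
    """Hudson's FST across sites, as a ratio of summed numerators and denominators.

    p1, p2: per-site frequencies of the same allele in populations 1 and 2.
    n1, n2: number of sampled chromosomes per population (assumed constant
            across sites, which is a simplification).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ.
        denominator += a * (1 - b) + b * (1 - a)
    if denominator == 0:
        raise ValueError("no variable sites: FST is undefined")
    return numerator / denominator


# Toy frequencies (not real data): a weakly and a strongly differentiated locus set.
neutral_like = hudson_fst([0.50, 0.30, 0.80], [0.55, 0.35, 0.70], n1=20, n2=20)
adaptive_like = hudson_fst([0.90, 0.10], [0.20, 0.70], n1=20, n2=20)
print(round(neutral_like, 3), round(adaptive_like, 3))
```

Comparing the value for a set of candidate adaptive loci against the value for neutral loci is the logic behind the contrast described above: markedly higher differentiation at adaptive markers is taken as a sign of local adaptation.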

In contrast to these three studies, which found no evidence for a legacy of British birds in the Irish population, Höglund & McMahon (unpublished observations), when looking at 15 microsatellite loci, found that birds from Northern Ireland fell between birds from the rest of Ireland and Britain, indicating that there has been some recent gene flow in the north.

Overall, there is clear evidence that Irish and British red grouse are genetically differentiated at neutral (McMahon et al. 2012, Meyer-Lucht et al. 2016) and adaptive markers (Meyer-Lucht et al. 2016). There is some evidence for divergence in genes involved with pigmentation (Höglund & McMahon, unpublished observations) and the immune response (Meyer-Lucht et al. 2016) and there is evidence of gene flow from British red grouse to the population in the north of Ireland (Höglund & McMahon, unpublished observations). While these studies do not fully address whether Irish red grouse warrant sub-species status, they do present a strong case against translocations.

1.2.3 Parasite mediated selection

The role of parasites in regulating red grouse populations has been heavily studied in the UK, particularly regarding the role of the nematode Trichostrongylus tenuis (Martínez-Padilla et al. 2019). Parasites can have an effect on testosterone levels, which affects comb size and male aggression and in turn can affect mate choice and intrasexual selection (Mougeot et al. 2005). Higher parasite burdens correlate with lower breeding success (Hudson 1986). A microarray experiment with varying T. tenuis infection and testosterone levels identified 52 genes, all of which were upregulated in caecal tissue following infection and 51 of which were downregulated when testosterone was high (Webster et al. 2011a).

Genome-wide association (GWAS) modelling identified 5 SNPs explaining T. tenuis burden (Wenzel & Piertney 2015) and another study associated 12 genes with T. tenuis burden (Wenzel et al. 2015). These genes were involved in the immune response but also in metabolism, oxidative stress and detoxification mechanisms. A subsequent landscape-scale study looked at 21 red grouse populations and found no evidence of balancing or directional selection acting on any of these pre-identified genes (Wenzel & Piertney 2015, Wenzel et al. 2016). However, evidence of selection at the population level regarding cytosine methylation states was found (Wenzel & Piertney 2014).


A reason put forward in a review of this subject for the lack of evidence of selection is that the selective forces are too weak to counter the effects of genetic drift and gene flow (Martínez-Padilla et al. 2019). There is evidence for some gene flow between Ireland and Britain (Höglund & McMahon, unpublished observations) but this is almost certainly less than the levels within Scotland and therefore selection can have a bigger effect. When considering MHC genes in red grouse in Ireland vs Britain, there was evidence for directional selection and local adaptation to a more diverse or different parasite load in Irish red grouse compared to Scottish red grouse. There are varying parasite loads within North-East Scotland (Wenzel et al. 2015) and the potential for population-level selection in this regard. Therefore, it is not a stretch to consider that there would be different parasite burdens and consequently different selection pressures on Irish vs British red grouse. However, published data on parasite communities in Ireland is very scarce. The nematode T. tenuis is widespread and even controlled on grouse moors in Britain (Newborn & Foster 2002). However, it has only been recorded twice in Irish birds. This may be due to insufficient survey work and further studies are needed (Finnerty & Dunne 2007). The tick Ixodes ricinus, a vector of louping ill virus, is a parasite of red grouse in both Britain and Ireland (Jeffries et al. 2014), with experimental infections resulting in c. 80% mortality (Reid 1975). In the British Isles, molecular genetic analysis shows that there are three distinct populations of the virus: the Irish, Welsh and British populations. In Ireland, unlike other areas in the British Isles, two of the populations co-occur: the Irish and the British. There is also a greater proportion of the virus in Ireland in domestic species due to higher rainfall allowing the tick I. ricinus to survive in a wider range of habitats (McGuire et al. 1998). This could potentially result in a more intense and also more diverse parasite burden in Ireland.

1.2.4 Pigmentation

Crypsis occurs when a species resembles its background, where and when it is most often preyed upon, so as to avoid detection by predators that hunt by sight (Endler 1981). It has been well studied in many systems. A recent review of the topic has emphasized the importance of understanding the demography of the focal species. Several studies relate crypsis to variation in the same melanin pathway, which involves the genes MC1R and Agouti (Harris et al. 2020). There is evidence that plumage colour varies between birds in Great Britain and birds in Ireland (Höglund & McMahon, unpublished observations), which has been attributed to their different habitats in the two places (Hutchinson 1989, Finnerty et al. 2007). There is no differentiation in the region containing Agouti between English red grouse, willow grouse and rock ptarmigan (Lagopus muta), so it has been hypothesised that this gene is involved in the brown plumage which all three species share in the summer months and which red grouse have year-round (Kozma et al. 2019).

These differences in plumage at certain times of the year correspond to local environmental conditions, since the parts of the British Isles where red grouse occur are mostly snow-free all year. Willow grouse, however, develop a white winter plumage (Skoglund & Höglund 2010), signifying local adaptation to differing environments. There is quantitative evidence that Irish birds have lighter plumage than British birds. When reflectance scores of feathers from Irish, British and Scandinavian grouse were examined, Irish birds were found to have significantly lighter breast feathers than Scandinavian and British birds (Höglund & McMahon, unpublished observations). When the coding regions of DCT, MC1R, TYR and TYR1 were sequenced, they were not found to be involved in the differing plumages (Skoglund & Höglund 2010); however, differences in the sequences of DCT and MC1R have been found between British and Irish red grouse, with MC1R showing nonsynonymous substitutions in four variable positions. The DCT gene showed significant sequence differences between Irish and British birds. In this study, however, a direct link between genotype and phenotype was not established (Höglund & McMahon, unpublished observations).

Crypsis has been studied using whole-genome (WG) sequences to calculate site frequency spectra (SFS) to investigate the role of Agouti in determining the seasonal camouflage change of snowshoe hares (Jones et al. 2018) and in the different crypsis adaptations of Nebraska deer mice (Pfeifer et al. 2018). WG sequences and SFS have also been used to investigate the role of MC1R in selection and adaptation in sand lizards showing crypsis (Laurent et al. 2016).

Previous studies have used relatively small genetic fragments such as microsatellites, sets of candidate genes and neutral SNPs to study selection and differentiation in red grouse. To date, no study has used WG data to address these questions. Now, with advances in high-throughput screening technology, the genomics of local adaptation is being studied more (Savolainen et al. 2013). Together with improvements in methodologies for studying museum specimens, these questions can now be better addressed. The use of museum samples in conservation genomics has become more common with improving methods. In particular, it is useful for comparing populations before and after severe bottlenecks and for assessing how diversity and variation have changed over time (Nakahama 2020).

1.3 Study objectives

(i) Does WG data support previous findings of reduced genetic diversity and significant differentiation between Irish and English Red Grouse?
(ii) Is there any evidence for local adaptation in these populations, specifically regarding genes involved in pigmentation and immune response?
(iii) Are there signs of different levels of inbreeding between the two populations?
(iv) Is there evidence for a legacy of introduced British birds in the Irish population of Red Grouse?
(v) From WG data, what is suggested regarding translocations of this species?

2. METHODS

2.1 Study samples

Whole genome sequences of 23 individuals were analysed in this study. Details of these are shown in Table S1 and Fig. 1. Eight individuals were sampled from museum skins at the Natural History Museum, Dublin. The museum samples were from Irish individuals and were all catalogued in 1881-1882. There were also feather samples from four Irish birds, collected in 2007 and stored at Uppsala University. Two more samples, one feather and one muscle, were obtained from Irish birds in 2020. In addition, nine sequenced genomes from English red grouse were available at Uppsala University from a previous study (Kozma et al. 2019); these came from two spleen and seven liver samples collected in the Yorkshire Dales in the UK in 2013. In this report, the samples are referred to as English, Museum and Recent: English refers to the English samples, Recent to the feather and muscle samples collected in Ireland since 2007, and Museum to the Irish samples originally catalogued in 1881-1882. The Recent and Museum samples are referred to collectively as the Irish samples.


2.2 DNA extractions

The DNA extraction protocol varied for the different samples. For the nine English samples, DNA extractions were carried out as described in Kozma et al. (2019) with a Qiagen DNeasy Blood & Tissue Kit® following the manufacturer’s instructions (Qiagen). The quality of DNA was assessed on an agarose gel and with a Qubit® Fluorometer. Extractions for the recent muscle samples were carried out using a Thermo Scientific KingFisher Cell and Tissue DNA Kit following the manufacturer’s instructions for the KingFisher Duo purification system. Before commencing, the tissue samples were broken down mechanically using a scalpel. DNA yield was assessed using a Qubit 3.0 Fluorometer. DNA extractions for the feather and toepad samples were carried out using a QIAamp DNA Micro Kit following the manufacturer's protocol for the isolation of genomic DNA from tissues of less than 10 mg, with some deviations. During the lysis stage, 20 µl of DTT was added to aid tissue breakdown. There was an extra incubation step of 10 minutes at 72°C during the lysis stage. Lastly, the final elution step was split into two, with 30 µl added each time and a centrifugation in between.

2.3 Library preparations

For the nine English samples, library preparation was carried out using the Illumina TruSeq protocol as described in Kozma et al. (2019). The muscle and feather sample libraries were prepared with the Illumina TruSeq PCR-free protocol. These library preparations were carried out by NGI Solna. Library preparations for the feather samples were carried out at the Department of Bioinformatics and Genetics at the Swedish Museum of Natural History in Stockholm, Sweden. The protocol followed “Illumina Sequencing Library Preparation for Highly Multiplexed Target Capture and Sequencing” (Meyer & Kircher 2010), modified by adding USER enzyme and dual indexing (Briggs et al. 2010).

2.4 Sequencing

The English samples were sequenced using an Illumina HiSeq 2500, producing paired-end reads of 125 bp with a target insert size of 350 bp. This was done at the SNP&SEQ technology platform at Uppsala University (Kozma et al. 2019). The feather and muscle samples were sequenced on a quarter lane of S4 NovaSeq6000 (NovaSeq Control Software 1.7.0/RTA v3.4.4) using the standard protocol. They were paired-end with a read length of 2x150 bp. The museum samples were sequenced on one NovaSeq6000 (NovaSeq Control Software 1.7.0/RTA v3.4.4) and were also paired-end. The feather, muscle and museum samples were sequenced at NGI Solna.

2.5 Mapping and SNP calling

Mapping and SNP calling were carried out using a bioinformatic pipeline which followed the best practices workflow suggested by the Genome Analysis Toolkit (GATK) (Van der Auwera et al. 2013, BroadInstitute 2020).

Sequencing output fastq files were merged for each individual. Quality checks were then carried out on these samples using the tool FastQC (v0.11.9) (Andrews 2010), which provides an overview of the data quality and highlights potential issues. The software Trimmomatic (v0.36) (Bolger et al. 2014) was used to identify and remove adapter sequences as well as to remove low-quality regions. Trimmomatic was used with the following settings: [LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 MINLEN:50]. With these settings, bases with a quality below 5 were removed from the start and end of each read, each read was scanned with a window 4 base pairs wide and cut when the average quality within the window dropped below 15, and reads shorter than 50 base pairs were dropped. This resulted in four files per individual consisting of paired forward and reverse reads and unpaired forward and reverse reads. For further analysis, the paired read files were used.
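The effect of the four trimming settings can be sketched in a few lines of Python. This is a simplified illustration of the logic only, not Trimmomatic's implementation: it assumes Phred qualities given as a list of integers, cuts the read at the start of the first window whose mean quality falls below the threshold, and leaves reads shorter than the window untouched by the window step. The function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple


def trim_read(
    seq: str,
    quals: List[int],
    leading: int = 5,
    trailing: int = 5,
    window: int = 4,
    min_avg: int = 15,
    min_len: int = 50,
) -> Optional[Tuple[str, List[int]]]:
    """Apply LEADING, TRAILING, SLIDINGWINDOW and MINLEN style trimming.

    Returns the trimmed (sequence, qualities) or None if the read is dropped.
    """
    # LEADING: remove low-quality bases from the start of the read.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1

    # TRAILING: remove low-quality bases from the end of the read.
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1

    # SLIDINGWINDOW: scan from the 5' end and cut at the first window
    # whose mean quality is below the threshold (simplified behaviour).
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break

    # MINLEN: drop reads that are too short after trimming.
    if end - start < min_len:
        return None
    return seq[start:end], quals[start:end]


# Toy read of 60 bases: poor ends and a low-quality stretch near the 3' end.
toy_seq = "A" * 60
toy_quals = [2, 2] + [30] * 52 + [10] * 4 + [3, 3]
trimmed = trim_read(toy_seq, toy_quals)
print(None if trimmed is None else len(trimmed[0]))
```

In the toy read, the two poor bases at each end are removed by the leading and trailing steps, and the window step then removes the stretch of quality-10 bases, which individually pass the end thresholds of 5 but fail the window mean of 15.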

The resulting paired forward and reverse reads were then aligned to the reference chicken genome using the Burrows-Wheeler Aligner (BWA) (v0.7.17). BWA was run using the mem algorithm, which supports long reads and split alignment, on 20 threads [-t] and with Picard compatibility [-M]. The output was one BAM file for each individual. The tool MarkDuplicates was then used to identify and mark duplicates so that they would be ignored in later steps. The tool index from SAMtools (v.1.5) (Li et al. 2016) was used to index the resulting files. QualiMap (v.2.2.1) (García-Alcalde et al. 2012) was used to assess read quality.
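The idea behind duplicate marking can be illustrated with a short conceptual sketch in Python. It assumes that read pairs mapping to the same chromosome, start positions and strand are copies of one original fragment, and that the copy with the highest summed base quality is kept as the representative. The `ReadPair` class and its fields are hypothetical simplifications and do not reproduce the data model or exact rules of MarkDuplicates.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ReadPair:
    name: str
    chrom: str
    start: int        # mapped start of the first read in the pair
    mate_start: int   # mapped start of its mate
    strand: str       # "+" or "-"
    qual_sum: int     # summed base qualities of the pair
    is_duplicate: bool = False


def mark_duplicates(pairs: List[ReadPair]) -> List[ReadPair]:
    """Flag all but the best-quality pair at each mapping signature."""
    groups: Dict[Tuple[str, int, int, str], List[ReadPair]] = defaultdict(list)
    for pair in pairs:
        groups[(pair.chrom, pair.start, pair.mate_start, pair.strand)].append(pair)

    for group in groups.values():
        best = max(group, key=lambda p: p.qual_sum)
        for pair in group:
            # Duplicates are flagged rather than deleted, so later
            # steps can choose to ignore them.
            pair.is_duplicate = pair is not best
    return pairs


# Toy example: pairs a and b share a mapping signature, c does not.
toy_pairs = [
    ReadPair("a", "chr1", 1000, 1300, "+", qual_sum=7200),
    ReadPair("b", "chr1", 1000, 1300, "+", qual_sum=6900),
    ReadPair("c", "chr1", 1050, 1340, "+", qual_sum=7000),
]
for pair in mark_duplicates(toy_pairs):
    print(pair.name, pair.is_duplicate)
```

Flagging rather than removing duplicates keeps the alignment file complete while still allowing downstream variant calling to skip the redundant copies.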

Variant calling was carried out according to the steps outlined by GATK (v4.2.0.0) (Van der Auwera et al. 2013) using three tools: HaplotypeCaller, CombineGVCFs and GenotypeGVCFs. HaplotypeCaller was run on each sample with [-ERC GVCF] to create an intermediate g.vcf, so its output is one g.vcf per individual. Before these can be genotyped they must be combined into one file with CombineGVCFs. The combined file was then joint genotyped with GenotypeGVCFs, resulting in a single variant call format (VCF) file containing all samples.
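The order of the three GATK steps can be sketched as command lines assembled in Python. This is an illustrative sketch rather than the pipeline used in the study: the function name, file names and sample identifiers are hypothetical, and only the core arguments of each tool are shown.

```python
def joint_calling_commands(reference, bams,
                           cohort_gvcf="cohort.g.vcf.gz",
                           out_vcf="cohort.vcf.gz"):
    """Build GATK4 command lines for per-sample calling and joint genotyping.

    bams maps a sample name to its BAM path (hypothetical names).
    """
    commands, gvcfs = [], []
    # Step 1: one intermediate g.vcf per individual.
    for sample, bam in bams.items():
        gvcf = f"{sample}.g.vcf.gz"
        gvcfs.append(gvcf)
        commands.append(["gatk", "HaplotypeCaller", "-R", reference,
                         "-I", bam, "-O", gvcf, "-ERC", "GVCF"])
    # Step 2: combine all per-sample g.vcf files into one.
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]
    combine += ["-O", cohort_gvcf]
    commands.append(combine)
    # Step 3: joint genotyping of the combined file.
    commands.append(["gatk", "GenotypeGVCFs", "-R", reference,
                     "-V", cohort_gvcf, "-O", out_vcf])
    return commands


cmds = joint_calling_commands("chicken.fa",
                              {"ind1": "ind1.bam", "ind2": "ind2.bam"})
```

Each returned list could be passed to `subprocess.run`; the commands are only built here, not executed.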

The next step was filtering the VCF, for which several tools were used. SNPs were extracted in two steps. Firstly, SelectVariants was used with [--select-type-to-include SNP] to select SNPs and [--restrict-sites-to BIALLELIC] to include only biallelic sites, and filtered sites were excluded [--exclude-filtered true]. Then VariantFiltration was used with the filtering criteria: [QUAL < 30, MQ < 40.00, SOR > 4.000, QD < 2.00, FS > 60.000, MQRankSum < -12.5, ReadPosRankSum < -8.000]. Finally, SNPs in repetitive regions were removed with BEDTools (v.2.27.1) (Quinlan & Hall 2010) by running the command subtract with default settings and the chicken repeat track as the reference. This could potentially introduce some error, as the chicken repeat track may not correspond exactly to the repetitive regions in red grouse.

A similar process was carried out for indels: SelectVariants was used to select the indels [--select-type-to-include INDEL] and filtered sites were excluded [--exclude-filtered true]. VariantFiltration was then run with the criteria: [QUAL < 0, SNP, MQ < 40.00, SOR > 10.000, QD < 2.00, SNPfilter4, FS > 200.000, ReadPosRankSum < -20.000].

BEDTools (v2.27.1) was then used to remove SNPs within 5 bp of an indel using the commands window and subtract. The flag [-w 5] was used with window to find SNPs lying within 5 base pairs upstream or downstream of a feature in the indel file, and these SNPs were then removed with subtract. The output is a hard-filtered VCF from which SNPs in complex repetitive regions and SNPs within 5 bp of indels have been removed.
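The SNP hard-filtering criteria can be read as a set of per-annotation thresholds. The following Python sketch applies the SNP thresholds listed above to one variant's annotations. It is only an illustration of the logic, not GATK code; the function name and example values are hypothetical, and skipping annotations that are absent is an assumption made here.

```python
# SNP thresholds from the text: a variant fails if the comparison is true.
SNP_FILTERS = {
    "QUAL": ("<", 30.0),
    "MQ": ("<", 40.0),
    "SOR": (">", 4.0),
    "QD": ("<", 2.0),
    "FS": (">", 60.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
}


def failed_filters(annotations, filters=SNP_FILTERS):
    """Return the names of the criteria that a variant fails."""
    failed = []
    for name, (op, threshold) in filters.items():
        value = annotations.get(name)
        if value is None:
            continue  # assumption: an absent annotation is not evaluated
        if (op == "<" and value < threshold) or (op == ">" and value > threshold):
            failed.append(name)
    return failed


# Hypothetical annotation values for one SNP.
good_snp = {"QUAL": 500.0, "MQ": 60.0, "SOR": 1.0, "QD": 20.0,
            "FS": 2.0, "MQRankSum": 0.1, "ReadPosRankSum": 0.3}
bad_snp = dict(good_snp, QD=1.5, FS=70.0)
```

A SNP is retained only when `failed_filters` returns an empty list.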

The last filtering step was carried out using VCFtools (v.0.1.15) (Danecek et al. 2011). The Z and W sex chromosomes were removed. The file was filtered to include only sites with a minor allele frequency greater than or equal to 0.05 [--maf 0.05], and sites where more than 10% of genotypes were missing were removed [--max-missing 0.9]. This dataset was now ready for analysis. The same dataset was then pruned for linkage disequilibrium at a threshold of 0.4 using PLINK (v1.9) (Purcell et al. 2007).

2.6 Analyses

For some analyses, the museum samples were omitted for two reasons. Firstly, the levels of deamination in the museum samples were not quantified or filtered for. In historic samples, there can be damage at CpG regions due to cytosine deamination and conversion into uracil (e.g. Van der Valk et al. 2019). This could lead to incorrect conclusions regarding divergence. Secondly, the c. 120-year time gap between the Irish samples could have affected the results. Therefore, the museum samples were left out of analyses where these issues were considered likely to have a large effect.
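As an illustration of the site-level thresholds applied with VCFtools in section 2.5 ([--maf 0.05] and [--max-missing 0.9]), the sketch below computes the minor allele frequency and call rate for one biallelic site. It is a minimal sketch and not VCFtools code; genotypes are assumed to be coded as the number of alternate alleles (0, 1 or 2) with `None` for a missing call, and the function names and example data are hypothetical.

```python
def site_stats(genotypes):
    """Return (call_rate, minor_allele_frequency) for one biallelic site."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0
    alt_freq = sum(called) / (2 * len(called))
    return call_rate, min(alt_freq, 1 - alt_freq)


def keep_site(genotypes, min_maf=0.05, min_call_rate=0.9):
    """Keep a site only if it passes both the missingness and MAF thresholds."""
    call_rate, maf = site_stats(genotypes)
    return call_rate >= min_call_rate and maf >= min_maf


# Hypothetical sites scored in 20 individuals.
common = [0] * 16 + [1] * 4                     # MAF 0.10, no missing data
rare = [0] * 19 + [1]                           # MAF 0.025
patchy = [None] * 3 + [0] * 13 + [1] * 4        # 15% missing genotypes
```

Here only `common` would be retained: `rare` fails the frequency threshold and `patchy` fails the missing-data threshold.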

2.6.1 Genetic diversity

Nucleotide diversity (π) was calculated among the English and Irish samples. For this, the Irish samples were split into Recent and Museum samples to investigate whether there had been a drop in genetic diversity over time in Irish birds. This was carried out in VCFtools (v.0.1.15) (Danecek et al. 2011) with the command [--window-pi 15000] to measure nucleotide diversity per 15 kb window.
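The idea behind windowed nucleotide diversity can be shown with a small Python sketch. For a biallelic site with k copies of the alternate allele among n sampled chromosomes, the per-site diversity is 2k(n-k)/(n(n-1)); summing this over the variable sites in a window and dividing by the window length gives π per base. This is a simplified illustration under those assumptions and may not match the VCFtools calculation exactly, for example in how missing genotypes are handled. The function names and example sites are hypothetical.

```python
def site_pi(alt_count, n_chrom):
    """Per-site nucleotide diversity for a biallelic site."""
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def windowed_pi(sites, window=15000):
    """Return {window_start: pi per base} for non-overlapping windows.

    sites is an iterable of (position, alt_count, n_chrom) with 1-based
    positions on one chromosome. Windows without variable sites are omitted.
    """
    sums = {}
    for pos, alt_count, n_chrom in sites:
        start = (pos - 1) // window * window + 1
        sums[start] = sums.get(start, 0.0) + site_pi(alt_count, n_chrom)
    return {start: total / window for start, total in sums.items()}


# Hypothetical sites: two in the first 15 kb window, one in the second.
pi_by_window = windowed_pi([(100, 6, 12), (14000, 6, 12), (15001, 1, 2)])
```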

2.6.2 Population structure and local adaptation using pcadapt

To investigate population structure and identify evidence of local adaptation, the package pcadapt (Privé et al. 2020) was used in R (v.4.0.4). Firstly, a Principal Component Analysis (PCA) with all samples was carried out to visualise genetic variation. Following this, a second PCA was carried out with only the Recent Irish and English samples. The software pcadapt then detects outliers based on the principal components (PCs). It computes k PCs from the genetic variation, which capture population structure. Initially, 10 PCs were used and this was graphically inspected using a scree plot (Fig. S2(a)) to assess the variance explained by each PC and with a score plot to see how individuals clustered. It was decided to retain two PCs which were believed to be related to population structure. SNPs were then regressed against the two PCs and z-scores were calculated. Mahalanobis distances of these z-scores were then calculated and converted into p-values. The p-values for each SNP were transformed into q-values with the R package qvalue (Storey et al. 2019) and a false-discovery rate (FDR) of 0.05 was used to detect outliers. For the outlier analysis, only the Recent Irish and English samples were used, from the dataset which had been pruned for linkage disequilibrium.
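The step from z-scores to p-values can be illustrated for the case of two retained PCs. The Python sketch below computes a squared Mahalanobis distance for each SNP from its pair of z-scores and converts it to a p-value using the chi-square distribution with 2 degrees of freedom, whose upper tail is exp(-d²/2). This is a simplified sketch and not the pcadapt implementation: it uses the ordinary sample covariance rather than a robust estimate, it omits any rescaling of the test statistic, and the q-value step is not shown. The function name and example z-scores are hypothetical.

```python
import math


def mahalanobis_pvalues(z):
    """Return one p-value per SNP from a list of (z1, z2) score pairs."""
    n = len(z)
    m1 = sum(a for a, _ in z) / n
    m2 = sum(b for _, b in z) / n
    # Sample covariance matrix [[s11, s12], [s12, s22]].
    s11 = sum((a - m1) ** 2 for a, _ in z) / (n - 1)
    s22 = sum((b - m2) ** 2 for _, b in z) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in z) / (n - 1)
    det = s11 * s22 - s12 ** 2
    pvalues = []
    for a, b in z:
        da, db = a - m1, b - m2
        # Squared distance using the inverse of the 2x2 covariance matrix.
        d2 = (s22 * da * da - 2 * s12 * da * db + s11 * db * db) / det
        # Chi-square survival function with 2 degrees of freedom.
        pvalues.append(math.exp(-d2 / 2))
    return pvalues


# Hypothetical z-scores: a 3x3 grid of unremarkable SNPs plus one extreme SNP.
scores = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)] + [(6, -6)]
pvals = mahalanobis_pvalues(scores)
```

In this toy example the last SNP has by far the smallest p-value and would be the candidate outlier.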

2.6.3 Analysis of Admixture

To estimate ancestry, the software ADMIXTURE (Pritchard et al. 2000) was used. For this analysis, the BED file that had been pruned for linkage disequilibrium was used, which is a requirement of this software. This was run with the number of populations (k) set to 1-5. By using [--cv], a cross-validation error was obtained for each k. This error indicates how well the model predicts the data, with lower values indicating a better fit. The k with the lowest value is considered to be the most likely number of populations. The ADMIXTURE results were then visualised in R (v.4.0.4) using tidyverse and ggplot2.
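Choosing k from the cross-validation errors can be sketched in Python as follows. The sketch assumes that each ADMIXTURE log contains a line of the form `CV error (K=2): 0.61`; the error values below are made up for illustration and are not results from this study, and the function names are hypothetical.

```python
import re

# Assumed format of the cross-validation line in an ADMIXTURE log.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(log_text):
    """Return {k: cross-validation error} for every CV line in the text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors, exclude=()):
    """Return the k with the lowest CV error, ignoring any excluded k."""
    candidates = {k: e for k, e in cv_errors.items() if k not in exclude}
    return min(candidates, key=candidates.get)


# Made-up log lines, concatenated from separate runs with k = 1, 2, 3.
log = "CV error (K=1): 0.50\nCV error (K=2): 0.61\nCV error (K=3): 0.74\n"
errors = parse_cv_errors(log)
```

The `exclude` argument allows k=1 to be set aside when the question is how many clusters best describe structured data.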

2.6.4 Fst outlier analysis

To assess genetic differentiation further, pairwise fixation indices (FST) (Weir & Cockerham 1984) were calculated for Recent Irish vs English red grouse. FST is a measure of population differentiation ranging from 0 to 1, with 0 being the most similar and 1 being the most differentiated. This was calculated using PLINK (v1.9) (Purcell et al. 2007) and the [--fst] function. In addition, window-based FST across the genome was calculated for 15 kb windows, also with the Recent Irish and English samples. This was done with VCFtools and the options [--weir-fst-pop] and [--fst-window-size 15000]. The dataset had not been thinned for linkage disequilibrium. The FST scores were transformed into ZFST scores, which show how many standard deviations a data point is from the mean. The results were then visualised in R with the qqman package.
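The ZFST transformation is a standard z-score applied to the windowed FST values. A minimal Python sketch is given below; whether the sample or the population standard deviation was used in the study is not stated, so the sample standard deviation is an assumption here, and the FST values are invented for illustration.

```python
import statistics


def zfst(fst_values):
    """Convert windowed FST values to z-scores (ZFST)."""
    mean = statistics.mean(fst_values)
    sd = statistics.stdev(fst_values)  # assumption: sample standard deviation
    return [(f - mean) / sd for f in fst_values]


# Invented windowed FST values: 20 unremarkable windows and one extreme window.
z_scores = zfst([0.05] * 10 + [0.15] * 10 + [0.9])
```

In this invented example only the last window lies more than three standard deviations above the mean.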

These window-based ZFST values were used to determine genes in regions with high FST. Windows with ZFST >3 were considered outliers and were mapped against the chicken genome. Genes within 5 kb upstream and downstream of these windows were extracted. Particular attention was paid to genes with functions in immune response or plumage colour, as these have been previously studied in the species and put forward as being under divergent selection (Skoglund and Höglund, 2010, Meyer-Lucht et al. 2016).
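Extracting genes near outlier windows amounts to an interval overlap test after extending each window by 5 kb on both sides. The Python sketch below shows this under simple assumptions: windows and genes are given as plain tuples, coordinates are 1-based and inclusive, and all gene names and positions are hypothetical.

```python
def genes_near_outliers(windows, genes, z_threshold=3.0, flank=5000):
    """Return sorted names of genes overlapping outlier windows plus flanks.

    windows: list of (chrom, start, end, zfst)
    genes:   list of (name, chrom, start, end)
    """
    hits = set()
    for chrom, w_start, w_end, z in windows:
        if z <= z_threshold:
            continue  # not an outlier window
        lo, hi = w_start - flank, w_end + flank
        for name, g_chrom, g_start, g_end in genes:
            # Two intervals overlap when each starts before the other ends.
            if g_chrom == chrom and g_start <= hi and g_end >= lo:
                hits.add(name)
    return sorted(hits)


# Hypothetical windows and gene annotations.
example_windows = [("6", 15001, 30000, 9.5), ("3", 1, 15000, 1.2)]
example_genes = [
    ("geneA", "6", 32000, 34000),   # within 5 kb downstream of the outlier
    ("geneB", "6", 36000, 40000),   # beyond the 5 kb flank
    ("geneC", "3", 16000, 17000),   # near a window that is not an outlier
]
nearby = genes_near_outliers(example_windows, example_genes)
```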

2.6.5 Gene ontology (GO) enrichment analysis

ShinyGO v.0.61 (Ge et al. 2020) was used to carry out gene ontology (GO) enrichment analysis, with a false discovery rate (FDR) of <0.05 taken as statistically significant. This analysis was run to identify gene ontology enrichment in Biological Processes, Molecular Function, Cellular Components and KEGG pathways.

2.6.6 Runs of Homozygosity analysis

Runs of homozygosity (ROH) were detected with PLINK and the [--homozyg] option and then classified using two different length criteria. The scanning window was 50 SNPs wide and set so that SNPs that were over 1000 kb apart could not be considered in the same ROH. Only stretches containing at least 50 SNPs and spanning at least 100 kb were considered. Following this, the ROH output was split into short/medium and long ROH. Short/medium ROH were defined as homozygous stretches between 100 and 1000 kb, and long ROH as those greater than 1000 kb. Short/medium ROH are signs of past demographic effects such as ancient inbreeding and bottlenecks, whereas long ROH are considered signs of recent inbreeding (Ceballos et al. 2018). The total number, average length and total length of ROH per individual were visualised in R using the packages ggplot2 and tidyverse.
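The split into length classes and the per-individual summaries (NROH, SROH and mean LROH) can be sketched in Python as below. This is an illustration of the bookkeeping only and does not reproduce the PLINK scan itself; segment lengths are assumed to be in kb, segments of exactly 1000 kb are counted as short/medium, and the individual identifiers and lengths are hypothetical.

```python
def summarise_roh(segments, short_min=100, long_min=1000):
    """Summarise ROH per individual.

    segments: list of (individual, length_kb).
    Returns {individual: {"n_short", "n_long", "sum_kb", "mean_kb"}}.
    """
    summary = {}
    for individual, length_kb in segments:
        if length_kb < short_min:
            continue  # below the minimum ROH length considered
        stats = summary.setdefault(
            individual, {"n_short": 0, "n_long": 0, "sum_kb": 0.0})
        if length_kb > long_min:
            stats["n_long"] += 1      # long ROH: > 1000 kb
        else:
            stats["n_short"] += 1     # short/medium ROH: 100-1000 kb
        stats["sum_kb"] += length_kb
    for stats in summary.values():
        stats["mean_kb"] = stats["sum_kb"] / (stats["n_short"] + stats["n_long"])
    return summary


# Hypothetical segments for two individuals.
roh = summarise_roh([("ind1", 150), ("ind1", 1200), ("ind1", 90), ("ind2", 400)])
```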

3. RESULTS

3.1 Sequences, Mapping and SNP calling

After mapping and filtering the reads the mean coverage per sample was 24.3x. Details of these samples are shown in Table S1. The final SNP dataset with all individuals consisted of 3345069 SNPs. The dataset which was pruned for linkage disequilibrium contained 662871 SNPs. The final SNP dataset for only Recent Irish and English individuals contained 5873882 SNPs of which 849131 remained after pruning for linkage disequilibrium.

3.2 Genetic Diversity

Nucleotide diversity (π) was similar in all populations. It was lowest in the museum samples at 0.000879 (± 0.000586 SD) and very similar in the English and Recent Irish samples at 0.001157 (± 0.00071 SD) and 0.00107 (± 0.00067 SD) respectively.

3.3 Principal component analysis and outlier detection

The PCA showed genetic structure corresponding to the sample locations. Together PC1 and PC2 explained 20.33% of the variance, as shown in Fig. 2. The Irish and English samples split along PC1 but not PC2. The analysis also shows the English samples grouping more closely together than the Irish samples, which corresponds to the geographical extent of sampling in both countries (see Fig. 1).

Figure 2. (a) Principal component analysis (PCA) of genetic variation for 23 individuals of red grouse from Ireland and England. Each dot represents one individual with orange representing English birds and blue representing Irish birds. (b) Bar chart showing the amount of variation explained by the first 20 principal components.

PC3 and PC4 each explained c. 5% of the variation, as shown in Fig. S1 (b). When the effect of population (PC1) is removed, the Irish samples cluster closely together and the English samples are spread out (Fig. S1 (b)). This suggests that, once the effect of location is removed, the Irish samples have less genetic variation than the English samples.

3.4 Analysis of admixture

For the ADMIXTURE analysis the lowest cross-validation error, except for k=1, was found for k=2 (Fig. 3(b)). This is the number of genetic clusters which fits the data best. Fig. 3(a) shows the admixture results for 2-5 populations. When k is set to 2 it shows all English samples forming one distinct population. Of the 14 Irish individuals, two of them show admixture of c. 30-40% and another two individuals show minor admixture from the English population. The admixed individuals still show predominant ancestry from the Irish population. All admixed individuals from Ireland come from the Recent samples.

Figure 3. (a) Admixture analysis performed with ADMIXTURE for K=2-5 genetic clusters. Each vertical bar represents one individual and the colours group them by genetic cluster. The solid black lines separate the three different populations of Recent (n=6), Museum (n=8) and English (n=9). The y-axis shows the ancestral fractions. Bars of a single colour imply ancestry from a single population. (b) Cross-validation error for ADMIXTURE runs with K set to 1-5.

3.5 Window-based FST scans

The mean FST between the Recent Irish and English samples was 0.095. The spatial distribution of ZFST across the genome is shown in Fig. 4 and it shows several peaks of FST. The major peaks are present on chromosomes 3, 4, 6 and 7. The windows with the highest ZFST scores are on chromosome 3 at 11.3 and 11.1 followed by chromosome 6 with two windows of 10.65 ZFST. The distribution of ZFST along the genome does not appear to be uniform.

Figure 4. Manhattan plot of genome-wide FST outlier scan for autosomal chromosomes from Recent Irish vs. English birds. The y-axis shows ZFST scores in 15 kb windows. A score >3 (above the green line) was considered an outlier. The changing colours indicate changing chromosomes. Each dot represents the average ZFST value for a 15 kb window.

3.6 Outlier Analysis

Two methods were used to identify outlier candidate genes associated with local adaptation. Outlier regions were mapped to the annotated chicken genome to extract genes within them. Firstly, the package pcadapt was used to identify outlier SNPs. This was only carried out for the contemporary samples. A total of 12016 outlier SNPs were found. These SNPs were not located within any coding genes. Within a 5 kb region upstream and downstream there were 545 genes extracted and when duplicates were removed 333 remained. These SNPs are likely located in promoter regions or transcription factors and therefore could have functions relating to the nearby genes. Window-based FST scans were also used to detect outliers. FST values were transformed to ZFST. The genes in any windows with ZFST >3 were extracted. This method identified 345 genes within 5 kb upstream and downstream of the 15 kb outlier regions. Together, the two methods resulted in a total list of 661 outlier genes. Of these, 16 genes were identified by both methods (Table S2). The full lists of genes for the two outlier methods are shown in Tables S3 and S4 with colour indicating function (green = immune, orange = pigmentation and red = other interesting functions). Outlier genes involved in the immune response are shown in Table 1, pigmentation in Table 2 and other genes with relevant functions of note are shown in Table 3.

3.6.1 Immune response

There were up to 39 genes that were potentially involved in the immune response. The two genes which showed the highest differentiation were ADAM8 (ADAM metallopeptidase domain 8) with a ZFST of 9.52 and RSFR (leukocyte ribonuclease A-2) with a ZFST of 9.2. Of these genes, several were associated with microbial or viral infection (RSFR, IRAK2, CCL4, CCL5, B2M, HIST1H2B7L4, HIST1H2BO, CD79B, GAL4, OvoDA1, TNIP1) in chickens. A further 19 genes (MAL, ADAMTS1, C1QBP, TFEB, DMBT1, LPO, PLG, CACTIN, CARD9, CXCR4, IFI35, MAL, TRIL, CSF3, IL20, IL22, TRIL, ZBTB24, CSF3) were associated with response to pathogens in mammals. There were at least five genes associated with cancer, four in mammals (TUSC2, GKN2, ADAMTS1, FBXL8) and one in chickens (AQP5). The gene FBXL8 was the only immune response associated gene found by both methods.

Table 1. Outlier genes believed to be involved in the immune response which occurred within 15 kb windows along the genome with z-transformed FST >3 (method=ZFST) or genes 5 kb up and downstream of outlier SNPs detected with pcadapt (method=pcadapt).

| Chromosome | ZFST | Ensembl Gene ID | Gene name | Justification | Source | Method |
|---|---|---|---|---|---|---|
| 6 | 9.52 | ENSGALG00000053549 | ADAM8 | Involved in the extravasation of leukocytes in | Gómez-Gaviro et al. 2007 | ZFST |
| 6 | 9.2 | ENSGALG00000027165 | RSFR | Bactericidal and angiogenic properties in chickens and a role in tissue repair | Nitto et al. 2006, Rychlik et al. 2014 | ZFST |
| 27 | 7.95 | ENSGALG00000003267 | STAT3 | IL20 and IL22 may work through STAT3 | Blumberg et al. 2001 | ZFST |
| 12 | 7.6 | ENSGALG00000041338 | TUSC2 | Highly conserved lung cancer candidate | Mariniello et al. 2020 | ZFST |
| 22 | 5.57 | ENSGALG00000039187 | GKN2 | Anti-inflammatory in humans and linked to gastric cancer progression | Menheniott et al. 2016 | ZFST |
| 12 | 4.7 | ENSGALG00000008407 | IRAK2 | Associated with infectious bronchitis virus in chickens | Liu et al. 2018 | ZFST |
| 11 | 4.68 | ENSGALG00000003201 | FBXL8 | Tumour suppressor in humans | Yoshida et al. 2021 | Both |
| 33 | 4.46 | ENSGALG00000041878 | AQP5 | Involved in ovarian cancers in chickens | Tiwari et al. 2014 | ZFST |
| 26 | 4.05 | ENSGALG00000039878 | TFEB | Associated with humoral immunity in mice | Huan et al. 2006 | ZFST |
| 31 | 4.04 | ENSGALG00000047283 | DMBT1 | Host defence and binding of pathogens in mice and humans | Rosenstiel et al. 2007, Madsen et al. 2010 | ZFST |
| 19 | 3.76 | ENSGALG00000034478 | CCL4 | Associated with feather damage related to pecking behaviour in chickens and recruitment of T-cells in mice | Bystry et al. 2001, Biscarini et al. 2010 | ZFST |
| 19 | 3.76 | ENSGALG00000043603 | CCL5 | Pro-inflammatory cytokine in chickens | Hughes et al. 2007, Shini & Kaiser 2009 | ZFST |
| 19 | 3.76 | ENSGALG00000039554 | LPO | Microbial defences in the airways of humans and sheep | Gerson et al. 2000, Wijkstrom-Frei et al. 2003 | ZFST |
| 3 | 3.42 | ENSGALG00000004293 | PLG | Breaks down blood clots, inflammatory response and macrophage recruitment in mammals | as reviewed in Heissig et al. 2020 | ZFST |
| 1 | 3.34 | ENSGALG00000015795 | ADAMTS5 | Regulator of lymphocyte migration | McMahon et al. 2016 | ZFST |
| 10 | 3.28 | ENSGALG00000002160 | B2M | Class I MHC component in humans and mice and presentation of peptide antigens to T-cells. Expression is upregulated during viral infection in chickens. | Smith et al. 2015, Yu et al. 2013, Li et al. 2016 | ZFST |
| 1 | 3.19 | ENSGALG00000051251 | HIST1H2B7L4 | H2B histones have broad-spectrum antimicrobial activity, present in chicken ovaries and oviducts | Silphaduang et al. 2006 | ZFST |
| 1 | 3.19 | ENSGALG00000027571 | HIST1H2B7L4 | H2B histones have broad-spectrum antimicrobial activity, present in chicken ovaries and oviducts | Silphaduang et al. 2006 | ZFST |
| 1 | 3.19 | ENSGALG00000050309 | HIST1H2BO | H2B histones have broad-spectrum antimicrobial activity, present in chicken ovaries and oviducts | Silphaduang et al. 2006 | ZFST |
| 1 | - | ENSGALG00000040342 | ADAMTS1 | Inflammatory response to tumours in mice | Rodríguez-Baena et al. 2018 | pcadapt |
| 19 | - | ENSGALG00000001654 | C1QBP | Pro-inflammatory response in humans | Anders et al. 2018 | pcadapt |
| 28 | - | ENSGALG00000024391 | CACTIN | Innate immune response regulating toll-like receptors and targets MHC Class III genes in humans | Atzei et al. 2010 | pcadapt |
| 17 | - | ENSGALG00000001889 | CARD9 | Innate immune response | Colonna 2007 | pcadapt |
| 2 | - | ENSGALG00000011734 | CCR8L | Enriched in response to immune stress in chickens | Guo et al. 2020 | pcadapt |
| 27 | - | ENSGALG00000000251 | CD79B | Response to viral infection in chickens, chicken B-cell development | Sayegh et al. 2000, Meydan et al. 2011 | pcadapt |
| 10 | - | ENSGALG00000031270 | CSK | Tumour suppression and B-cell activity | Hata et al. 1994, Okada 2012 | pcadapt |
| 7 | - | ENSGALG00000012357 | CXCR4 | Formation of endocrine b-cells in chickens and the movement of lymphocytes in mammals | Katsumoto & Kume 2011, Stein & Nombela-Arrieta 2005 | pcadapt |
| 3 | - | ENSGALG00000019843 | GAL4 | Bactericidal activity in chickens against Salmonella spp. | Milona et al. 2007 | pcadapt |
| 27 | - | ENSGALG00000002832 | IFI35 | Innate immune response in humans | Das et al. 2015 | pcadapt |
| 26 | - | ENSGALG00000000911 | IL20 | Angiogenic, proinflammatory in humans that potentially acts through STAT3 | Hsieh et al. 2006 | pcadapt |
| 1 | - | ENSGALG00000009904 | IL22 | Response to inflammation to reduce tissue damage and aid in its repair, in particular regarding avian epithelial and liver cells | Kim et al. 2012 | pcadapt |
| 25 | - | ENSGALG00000011570 | ILF2 | Believed to have functions against viral infections in chickens | Stricker et al. 2010 | pcadapt |
| 4 | - | ENSGALG00000011849 | interleukin-2 | T-cell proliferative activity in chicken | Kaiser & Mariani 1999, Gu et al. 2010 | pcadapt |
| 3 | - | ENSGALG00000008552 | MAL | T-cell proliferative activity in humans | Alonso & Weissman 1987 | pcadapt |
| 3 | - | ENSGALG00000032737 | OvoDA1 | Anti-microbial peptides that are located in the oviduct of chickens | Whenham et al. 2015 | pcadapt |
| 13 | - | ENSGALG00000004496 | TNIP1 | Associated with immunological and inflammatory diseases | Ramirez et al. 2015 | pcadapt |
| 2 | - | ENSGALG00000011145 | TRIL | Functional role in the TLR4 complex involved in microbe recognition in humans | Carpenter et al. 2009 | pcadapt |
| 3 | - | ENSGALG00000015231 | ZBTB24 | Mutations associated with immunodeficiency | de Greef et al. 2011 | pcadapt |
| 27 | - | ENSGALG00000030907 | CSF3 | Immune response in fish and humans and is present in chickens and is upregulated following parasitic disease | Santos et al. 2006, Hamilton 2008, Giles et al. 2019 | pcadapt |

3.6.2 Pigmentation

Regarding pigmentation, several genes were identified. Those with the highest ZFST, between 8.25 and 10.26, were LIPML1, LIPML2 and LIPML3 (lipase member M-like 1, 2 and 3). These are potentially involved in the final steps of keratinocyte differentiation. There is also a gene called HSDL1 (hydroxysteroid dehydrogenase-like 1) which is involved in the maintenance of secondary sexual characteristics, which in birds includes plumage colour. This gene was identified using both outlier methods.

There were also 5 feather keratin-like genes (LOC107055272, LOC112530353, LOC770940, LOC107055236, LOC431320) and one feather gene. In total there were 13 keratin and keratin-like genes identified. Of these, three (KRT75L4, KRT6A, KRT75) were identified by both methods. MC1R, which codes for the melanocortin 1 receptor, was identified by pcadapt. This is a well-known and well-studied gene involved in pigmentation. A closer look at the region containing this gene showed that one SNP within the MC1R region had a particularly high FST of 0.3, indicating high differentiation at that SNP. Other notable genes involved in melanin production include HPS1 and SNAPIN.

Table 2. Outlier genes believed to be involved in pigmentation which occurred within 15 kb windows along the genome with z-transformed FST >3 (method=ZFST) or genes 5 kb up and downstream of outlier SNPs detected with pcadapt (method=pcadapt).

| Chromosome | ZFST | Ensembl Gene ID | Gene name | Justification | Source | Method |
|---|---|---|---|---|---|---|
| 6 | 10.26 | ENSGALG00000046392 | LIPML1 | Lipase member M proteins are involved in the final steps of keratinocyte differentiation | Toulza et al. 2007 | ZFST |
| 6 | 8.25 | ENSGALG00000045194 | LIPML2 | Lipase member M proteins are involved in the final steps of keratinocyte differentiation | Toulza et al. 2007 | ZFST |
| 6 | 8.25 | ENSGALG00000003557 | LIPML3 | Lipase member M proteins are involved in the final steps of keratinocyte differentiation | Toulza et al. 2007 | ZFST |
| 11 | 5.81 | ENSGALG00000003273 | HSDL1 | Maintenance of secondary sexual characteristics and sex differentiation by metabolising hormones | Gloux et al. 2019 | ZFST |
| 33 | 3.6 | ENSGALG00000030629 | KRT6A | Associated with frizzle feather | Guo X et al. 2018 | Both |
| 33 | 3.06 | ENSGALG00000044875 | KRT75L4 | Frizzle chicken condition in Chinese indigenous chicken | Dong et al. 2018 | Both |
| 33 | 3.01 | ENSGALG00000035972 | KRT75 | Role in hair and nail formation and responsible for frizzle chicken condition | Ng et al. 2012 | Both |
| 6 | - | ENSGALG00000017419 | HPS1 | Role in BLOC-3 complex in humans which is involved in melanin production and melanosome biogenesis, involved in iris development, intensified evolution in barn owl eye-development. | Martina et al. 2003, Borges et al. 2019 | pcadapt |
| 11 | - | ENSGALG00000054486 | MC1R | Pigmentation in several species including birds | Harris et al. 2020 | pcadapt |
| 25 | - | ENSGALG00000026796 | SNAPIN | BLOC-1 complex, which is involved in melanin production and melanosome biogenesis | Setty et al. 2007 | pcadapt |

3.6.3 General genes and outlier method overlaps

There were also many other genes identified with important functions. There were 8 genes associated with the eye, most often with processes in the retina in both chickens and mammals (RDH8, PDC, ATOH7, CALB2, FOXG1, RLBP1, OPNP, SIX5). There were also some genes involved in male fertility and spermatogenesis in humans and/or mice (DNAH3, DDX25, SYT6, TMEM203, TSGA10) and one gene associated with fertilization (WBP2NL). There were genes involved in stress responses including hypoxia in chickens (EIF2AK1, FOXG1) and cold stress (TRH). Lastly, there were genes involved in various behavioural traits such as food intake, sleep and memory and energy balance (GHRL, NPW, HCRT, GALR2, GALR3, GRIN1, RHNO1). Of these, genes with particularly high differentiation include RDH8, DDX25 and DNAH3.

There were also a further 9 genes that were found by both methods and did not fall into any of the above categories. These are shown in Table S2. They included one keratin-like gene, LOC112529929 (keratin, type II cytoskeletal 4-like), and two histones, HIST1H2A4L3 and HIST1H2B5, as well as PIGC (phosphatidylinositol glycan anchor biosynthesis class C), HS6ST3 (heparan sulfate 6-O-sulfotransferase 3), HEBP1 (heme-binding protein 1) and S100B.

Table 3. Outlier genes with interesting functions which occurred within 15 kb windows along the genome with z-transformed FST >3 (method=ZFST) or genes 5 kb up and downstream of outlier SNPs detected with pcadapt (method=pcadapt).

| Chromosome | ZFST | Ensembl Gene ID | Gene name | Justification | Source | Method |
|---|---|---|---|---|---|---|
| 6 | 7.89 | ENSGALG00000029823 | SPRN | Prp(C)-like neuroprotective activity in mice | Watts et al. 2007 | ZFST |
| 14 | 6.36 | ENSGALG00000002350 | DNAH3 | Male fertility in humans | Ben Khelifa et al. 2014 | ZFST |
| 1 | 6.03 | ENSGALG00000013422 | RHNO1 | Down-regulated in efficient residual feed intake chickens | Yang et al. 2020 | Both |
| 19 | 5.48 | ENSGALG00000039863 | DDX25 | Spermatogenesis in mice | Sheng et al. 2006 | ZFST |
| 12 | 4.7 | ENSGALG00000008411 | GHRL | Feeding and weight gain in chickens | Jin et al. 2014 | ZFST |
| 27 | 4.51 | ENSGALG00000011485 | HCRT | Energy balance and potentially food intake | Miranda et al. 2013 | Both |
| 12 | 4.26 | ENSGALG00000008490 | TRH | Varying roles in ovarian steroidogenesis, cold stress in chickens; in humans potentially involved in hair growth | Wang & Xu 2008, Gáspár et al. 2010 | ZFST |
| 8 | 4.13 | ENSGALG00000030417 | PDC | Found in the retina of rats, vision in mice and humans | Zhu & Craft 2000 | ZFST |
| 17 | 3.92 | ENSGALG00000008898 | GRIN1 | Energy balance, feeding and response to food deprivation in mice | Liu et al. 2012 | ZFST |
| 17 | 3.72 | ENSGALG00000008925 | TMEM203 | Spermatogenesis in humans | Shambharkar et al. 2015 | ZFST |
| 3 | 3.42 | ENSGALG00000004293 | PLG | Breaks down blood clots, inflammatory response, macrophage recruitment in mammals | as reviewed in Heissig et al. 2020 | ZFST |
| 1 | 3.41 | ENSGALG00000016757 | TSGA10 | Spermatogenesis in humans | Ye et al. 2020 | ZFST |
| 8 | 3.05 | ENSGALG00000032319 | RDH8 | Crucial steps of the visual cycle | Lhor & Salesse 2014 | ZFST |
| 26 | 3.03 | ENSGALG00000001968 | SYT6 | Sperm function in mice | Hutt et al. 2005 | ZFST |
| 14 | 3.01 | ENSGALG00000003391 | EIF2AK1 | Response to several stresses in humans including heme deficiency, oxidative stress, osmotic shock, mitochondrial dysfunction and heat shock and potentially involved in hypoxic stress in chickens | Guo et al. 2020, Hu et al. 2020 | ZFST |
| 6 | - | ENSGALG00000052760 | ATOH7 | Retina development in chickens | Skowronska-Krawczyk et al. 2009 | pcadapt |
| 4 | - | ENSGALG00000014975 | DRD5 | Potentially involved in feeding behaviour but needs further study | Jiang et al. 2021 | pcadapt |
| 5 | - | ENSGALG00000036364 | FOXG1 | Retina development in mice and main candidate gene for hypoxia in Tibetan chicken | Fotaki et al. 2013, Jiang et al. 2018 | pcadapt |
| 18 | - | ENSGALG00000002088 | GALR2 | Hormone receptor highly expressed in chicken oviduct and oviposition in quail. In chicken, wide expression in tissues involved in glucose homeostasis and is regulated by food deprivation | Ho et al. 2012, Kołodziejski et al. 2021 | pcadapt |
| 1 | - | ENSGALG00000012307 | GALR3 | Hormone receptor, wide expression in tissues involved in glucose homeostasis and is regulated by food deprivation in chickens | Kołodziejski et al. 2021 | pcadapt |
| 14 | - | ENSGALG00000005878 | MLST8 | Component of mtorc1 and mtorc2 which have roles in cell growth and survival in response to nutrients in mammals | Jacinto et al. 2004 | pcadapt |
| 14 | - | ENSGALG00000045096 | NPW | Associated with the regulation of food intake, sleep, social behaviour and fear memory in mammals and is highly conserved between chicken and other vertebrates | Takenoya et al. 2010, Nagata-Kuroiwa et al. 2011, Bu et al. 2016 | pcadapt |
| 10 | - | ENSGALG00000006676 | RLBP1 | Needed for function of rod and cone photoreceptors | He et al. 2009, Xue et al. 2015 | pcadapt |
| 1 | - | ENSGALG00000011908 | WBP2NL | Fertilization in mammals | Wu et al. 2007, Wu et al. 2007 | pcadapt |
| 5 | - | ENSGALG00000017389 | SIX5 | Expressed in adult eyes particularly in the lens epithelium | Winchester et al. 1999 | pcadapt |
| 10 | - | ENSGALG00000006702 | MFGE8 | Mineralization of eggs in chicken | Stapane et al. 2019 | pcadapt |

3.6.4 Gene ontology (GO) enrichment analysis

Gene ontology (GO) enrichment analysis was carried out with all 661 outliers. The gene ontology enrichment analysis results are shown in Table 4. There were no enrichment terms for Biological Processes. There were 12 enriched GO terms for Cellular Components. The most significantly enriched category was keratin filament.

Table 4. Gene ontology enrichment analysis for outlier genes identified using window-based FST outlier scans and pcadapt for Irish vs English red grouse.

| Category | Enrichment FDR | Genes in list | Total genes | Functional Category/Pathway | GO term |
|---|---|---|---|---|---|
| Cellular Component | 2.00E-08 | 10 | 16 | Keratin filament | GO:0045095 |
| Cellular Component | 4.00E-07 | 29 | 206 | Intermediate filament | GO:0005882 |
| Cellular Component | 4.00E-07 | 31 | 234 | Intermediate filament cytoskeleton | GO:0045111 |
| Cellular Component | 2.80E-06 | 15 | 66 | Nucleosome | GO:0000786 |
| Cellular Component | 1.10E-05 | 15 | 74 | DNA packaging complex | GO:0044815 |
| Cellular Component | 9.30E-04 | 17 | 131 | Protein-DNA complex | GO:0032993 |
| Cellular Component | 9.30E-04 | 39 | 482 | Polymeric cytoskeletal fibre | GO:0099513 |
| Cellular Component | 6.40E-03 | 43 | 611 | Supramolecular complex | GO:0099080 |
| Cellular Component | 6.40E-03 | 43 | 611 | Supramolecular polymer | GO:0099081 |
| Cellular Component | 6.40E-03 | 43 | 610 | Supramolecular fibre | GO:0099080 |
| Cellular Component | 2.20E-02 | 15 | 146 | Ribosomal subunit | GO:0044391 |
| Cellular Component | 4.20E-02 | 16 | 173 | Ribosome | GO:0005840 |
| Molecular Function | 7.00E-06 | 54 | 604 | Structural molecule activity | GO:0005198 |
| Molecular Function | 2.40E-02 | 16 | 138 | Structural constituent of ribosome | GO:0003735 |
| Molecular Function | 4.30E-02 | 18 | 181 | Structural constituent of cytoskeleton | GO:0005200 |
| KEGG pathway | 3.50E-02 | 13 | 115 | Ribosome | |

There were several genes, including KRT18 (keratin 18), KRT6A (keratin 6A), KRT75 (keratin 75), ENSGALG00000043689 (IF rod domain-containing protein), KRT75L4 (keratin, type II cytoskeletal 75-like 4), LOC112529929 (keratin, type II cytoskeletal 4-like) and KRT8 (keratin 80), which were enriched for seven GO enrichment categories: intermediate filament, keratin filament, intermediate filament cytoskeleton, polymeric cytoskeletal fibre, supramolecular complex, supramolecular polymer and supramolecular fibre. There were three categories that showed enrichment for Molecular Function, which were Structural molecule activity, Structural constituent of ribosome and Structural constituent of cytoskeleton. The KEGG pathway ribosome was also significantly enriched.

3.7 Runs of Homozygosity

There were 7270 runs of homozygosity (ROH) detected. Of these, 7112 were considered short/medium ROH and 158 were considered long ROH. The average (± sd) length of individual ROH (LROH) was 307 (±180), 229 (±398) and 220 (± 196) for the Recent, Museum and English populations respectively. There was a significant effect of population on LROH (p <0.001), with a Tukey test revealing significant differences between the Recent samples and both the Museum and English samples. The average (± sd) number of ROH per individual (NROH) was 403 (± 90), 379 (±223.9) and 203 (± 9) for Recent, Museum and English respectively. This was significantly affected by population (p=0.019), with significant differences between Recent and English (p-adj=0.046) and between Museum and English (p-adj=0.035), as shown by a Tukey test.

The average (± sd) sum of ROH segments (SROH) was 123786.26 (±43287.5), 118321.68 (±33453.42) and 45328.54 (±4237.13) for the Recent, Museum and English populations respectively.

[Figure 5: four panels. (a) Average length of short/medium ROH (kb) and (b) average length of long ROH (kb) for the England, Museum and Recent populations. (c) Frequency of short/medium ROH and (d) frequency of long ROH against sum of ROH lengths (Mb), coloured by population (England, Museum, Recent).]

Figure 5. Boxplots showing the average length of (a) short ROH segments (100-1000 kb) and (b) long ROH segments (>1000 kb) for each population. Frequency of (c) short/medium and (d) long ROH against the sum of ROH for each individual with the colours indicating the different populations.

3.7.1 Short/medium ROH

Short/medium LROH was bounded between 100-1000 kb. There were 7112 short/medium ROH, with an average (± sd) NROH of 309.2 (±163) per individual across all populations. The distribution of LROH per population is shown in Fig. 5 (a). This shows there is more variation for the Museum samples, which are overall similar to the English samples. The Recent samples show substantially longer short/medium LROH than the other two populations. The average LROH per population was 209.97 (±140.87), 242.81 (±178) and 217.28 (±147.64) for the English, Recent and Museum populations respectively. For short/medium ROH there was a significant effect of population (p <0.001) on LROH, following the same trend as for all ROH: a Tukey test showed that Recent individuals were statistically different from English (p-adj <0.001) and Museum (p-adj <0.001) individuals, whereas English and Museum samples were not statistically different from each other (p-adj=0.310).

3.7.2 Long ROH

Long ROH were above 1000 kb in length and ranged in size from 1002-6476 kb. Long ROH were recorded in 7/9 English samples, 5/8 Museum samples and 6/6 Recent samples. The longest ROH in any English, Museum and Recent individual was 1619 kb, 4033 kb and 6473 kb respectively. The LROH per population (± sd) was 1340.63 (±348.75) in the English population, 1655.09 (±874.72) in the Recent and 1376.65 (±562.482) in the Museum. This is shown in Fig. 5 (b). The SROH for long ROH segments against their frequency is shown in Fig. 5 (d) and indicates that English individuals have the shortest and least frequent long ROH, followed by the Museum and then the Recent individuals. For short/medium ROH, by contrast, the Museum and Recent individuals were not differentiated in this regard, although the English individuals were still the lowest. There was no significant effect of population on long ROH length (p=0.076). The average NROH for long ROH (±sd) was 2.4 (±1), 18.3 (±9.89) and 6.2 (±5.6) for the English, Recent and Museum populations respectively. The NROH was significantly affected by population (p <0.001), with a Tukey test revealing a significant difference between Recent and English. There is one individual, as shown in Fig. 5 (d), that has more and longer ROH than all other individuals. This is an Irish individual from Co. Roscommon in the midlands and has 5 out of the 10 longest ROH. The Irish individual with the lowest ROH was from Co. Wicklow in the southeast.

The SROH for long ROH were 30343.26 (±21847.54), 8120.5 (±7646.68) and 2790.67 (±1166.86) for the Recent, Museum and English populations respectively.
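The three ROH summary statistics used in this section (NROH, SROH and mean LROH, split into short/medium and long classes) can be illustrated with a minimal sketch. It assumes ROH calls are available as (individual, length in kb) pairs; the function names and the toy data are hypothetical and are not taken from the pipeline used in this study.

```python
from collections import defaultdict
from statistics import mean

SHORT_MIN_KB = 100   # lower bound of short/medium ROH, as stated in the text
LONG_MIN_KB = 1000   # segments above this length are treated as long ROH


def classify_roh(length_kb):
    """Return 'long', 'short_medium', or None for segments below 100 kb."""
    if length_kb > LONG_MIN_KB:
        return "long"
    if length_kb >= SHORT_MIN_KB:
        return "short_medium"
    return None


def summarise_roh(segments):
    """Summarise ROH per individual and size class.

    segments: iterable of (individual_id, length_kb) tuples (assumed input format).
    Returns {individual: {class: {"NROH", "SROH_kb", "mean_LROH_kb"}}}.
    """
    lengths = defaultdict(lambda: defaultdict(list))
    for individual, length_kb in segments:
        roh_class = classify_roh(length_kb)
        if roh_class is not None:
            lengths[individual][roh_class].append(length_kb)

    summary = {}
    for individual, by_class in lengths.items():
        summary[individual] = {
            roh_class: {
                "NROH": len(values),          # number of ROH
                "SROH_kb": sum(values),       # summed ROH length
                "mean_LROH_kb": mean(values)  # mean length of individual ROH
            }
            for roh_class, values in by_class.items()
        }
    return summary


# Hypothetical toy segments, not real data from this study.
toy_segments = [
    ("bird_A", 150), ("bird_A", 400), ("bird_A", 1200),
    ("bird_B", 90), ("bird_B", 250),
]
toy_summary = summarise_roh(toy_segments)
```

Population-level values such as those reported above would then be obtained by averaging these per-individual summaries within each population.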

4. DISCUSSION

A major aim of this study was to address the practicality of using British red grouse to reinforce the Irish population using whole genome data. This was done using outlier detection analysis to identify candidate genes under putative divergent selection and population genomics to assess population structure. From these analyses, the level of differentiation between the two populations was determined. While this topic has been addressed before, this is the first time whole-genome sequence data has been used.

4.1 Genetic diversity

Previous studies have found a significant reduction in the genetic diversity of Irish red grouse when comparing historical (c. 1880) samples with recent samples (Freeland et al. 2007). This was not seen here; in contrast, there was higher diversity in the Recent samples than in the Museum samples. Freeland et al. found nucleotide diversity in historic samples to be 0.00247 (±0.00051), compared to the estimate here from WGS data of 0.00089 (±0.00059) in the Museum samples. The difference between the Recent estimates was much smaller, with Freeland et al. reporting nucleotide diversity of 0.00117 (±0.00052) and this study recording it as 0.00109 (±0.00067).
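For readers unfamiliar with the statistic being compared, nucleotide diversity is the average number of pairwise differences per site among sampled sequences. The sketch below illustrates that definition on aligned toy sequences; it is not the estimator or software used in this study or by Freeland et al., and the function name and sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site among aligned, equal-length sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    n_sites = len(sequences[0])
    if any(len(seq) != n_sites for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    pairs = list(combinations(sequences, 2))
    # Count mismatching sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(seq1, seq2)) for seq1, seq2 in pairs
    )
    return total_diffs / (len(pairs) * n_sites)


# Hypothetical toy alignment: three sequences of ten sites.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
toy_pi = nucleotide_diversity(toy_alignment)
```

In the toy alignment the three pairs differ at 1, 1 and 2 sites, so the estimate is 4 differences over 3 pairs and 10 sites.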

Given what is known of the large population declines and fragmentation of Irish red grouse (Cummins et al. 2015) and the issues in this study concerning the Museum samples, the results from Freeland et al. showing a significant reduction in genetic diversity are more convincing. The discrepancy is difficult to interpret, however, since it could also reflect improved methods for extracting DNA from museum samples providing a more accurate measurement. Nevertheless, since there are issues with the museum samples in this study, it would be best to assume that the diversity levels reported by Freeland et al. (2007) are more accurate until this analysis can be repeated with correctly filtered museum sequences. The results for the Recent samples were very similar between the two studies, which also potentially indicates a problem with the diversity analysis of the Museum samples. In addition, in analyses of variation among contemporary Irish and British populations, diversity levels in Ireland were lower (McMahon et al. 2012, Meyer-Lucht et al. 2012), although Meyer-Lucht et al. (2012) found that they were not lower by a large degree. Here the PCA plots show the Irish samples having more variation than the English. This is likely due to the larger area sampled in Ireland as well as the temporal difference between the Recent samples from 2007-2020 and the Museum samples, which were catalogued in c. 1881. When the PCA was carried out with only the contemporary samples (Fig S1 (b)), the English population seems to have more genetic variation than the Irish.

The ROH analysis suggested low levels of genetic diversity in both Museum and Recent samples, with long ROH indicative of lower genetic diversity. However, there was much more spread in the Museum samples, which would suggest that perhaps there was historic admixture and that the contemporary Irish individuals have undergone more recent declines in genetic diversity.

4.2 Genetic differentiation

The PCA showed a clear separation between red grouse corresponding to the two populations along PC1, with all Irish individuals clustering together and all English individuals doing the same. There is more substructure within the Irish population, which can be explained by the samples being from a larger geographic extent and also likely by the different sampling times. When the Museum samples are removed (Fig. S1 (b)) it can be seen that there is indeed more variation in the English samples compared to the Recent Irish. There are, however, two Irish individuals that are noticeably closer on the PCA to the English population than the other Irish individuals. When looking at the ADMIXTURE results (k=2) this can be explained by admixture from the English population into four of the Irish individuals, most prominently in those that are placed closer to the English population on the PCA plot. This indicates that there has been recent gene flow between British and Irish red grouse, which other studies did not find (Freeland et al. 2007, McMahon et al. 2012, Meyer-Lucht et al. 2016). There has been evidence of recent gene flow into Northern Ireland (Höglund & McMahon, unpublished observations). The birds showing considerable admixture (30-40%) in this study are from Co. Tipperary and Co. Wicklow in the south/south-west of Ireland. Only when increasing k to 5 is any evidence of admixture between the two populations absent. The two individuals that show minor admixture for k=2 and k=3 are from Co. Clare in the west of Ireland and Co. Monaghan in the north. This implies that there is likely admixture from Britain in multiple Irish populations. Even with this admixture, there is a clear separation between the two populations.

The pairwise FST estimate between Recent Irish and English red grouse was 0.095, which is in line with what previous studies have found for adaptive MHC-BLB genes of 0.09-0.116 (Meyer-Lucht et al. 2016) and somewhat higher than what has been shown for neutral microsatellite markers of 0.07 (McMahon et al. 2012) and 0.084 for neutral SNPs (Meyer-Lucht et al. 2016). Window-based FST scans along the genome were also carried out to identify the relative effects of genetic drift and selection on FST. When selection is the dominant force shaping differentiation it is expected that there will be peaks of FST along the genome as particular loci are under selection. Regions showing FST higher than the background level are evidence for directional selection, and lower regions are evidence for purifying selection (e.g. Kozma et al. 2019). In contrast, when genetic drift is the dominant force there are no obvious peaks or patterns. In this case, there are several peaks higher than the background level, which is evidence that there is divergent selection between Irish and English red grouse. In addition, outlier analysis using ZFST and pcadapt identified 661 candidate genes under putative divergent selection. It should be noted that there are several genes on chromosome 6 that occur very close together with particularly high ZFST. These genes include LIPML1, LIPML2, LIPML3, RSFR, ADAM8 and SPRN, which all have a ZFST >7.4 in a 4.5 Mb region. This indicates that there could be an effect of linkage on the high FST values here and that perhaps not all of these genes are under direct selection but some are rather hitchhiking on those that are.

4.3 Outlier analysis

There were two different methods used to identify outlier genes under putative divergent selection. Together these methods identified 661 genes, of which 16 were found using both methods. The ZFST outlier method was used to identify regions with ZFST >3 and then map these regions (± 5 kb) back to the annotated chicken genome and extract the genes. This has previously been used to assess local adaptation in grouse species (Kozma et al. 2020). The second method was pcadapt, which detects outliers based on principal components and has successfully identified signatures of divergence in birds (e.g. Sendell-Price et al. 2020). There were two categories of genes decided upon a priori to which particular attention would be paid, as there was evidence for them being differentiated. These were immune response genes (Meyer-Lucht et al. 2016) and pigmentation genes (Höglund & McMahon, unpublished observations). In addition, genes with other important functions were noted.
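The ZFST outlier step described above can be sketched as follows. This is only an illustration under stated assumptions: windows are given as (chromosome, start, end, FST) tuples, ZFST is computed as the window FST minus the genome-wide mean divided by the standard deviation across windows, and genes are given as simple coordinate records. The function names, gene names and toy coordinates are hypothetical and do not come from the pipeline used in this study.

```python
from statistics import mean, pstdev


def z_transform(fst_values):
    """Standardise window FST values: (FST - mean) / standard deviation."""
    mu = mean(fst_values)
    sigma = pstdev(fst_values)
    if sigma == 0:
        raise ValueError("FST values have zero variance")
    return [(fst - mu) / sigma for fst in fst_values]


def outlier_windows(windows, threshold=3.0):
    """Return (chrom, start, end, zfst) for windows with ZFST above the threshold."""
    z_scores = z_transform([w[3] for w in windows])
    return [
        (chrom, start, end, z)
        for (chrom, start, end, _), z in zip(windows, z_scores)
        if z > threshold
    ]


def genes_near(regions, genes, pad_bp=5000):
    """Return sorted names of genes overlapping any region extended by pad_bp."""
    hits = set()
    for chrom, start, end, _ in regions:
        lo, hi = start - pad_bp, end + pad_bp
        for name, g_chrom, g_start, g_end in genes:
            if g_chrom == chrom and g_start <= hi and g_end >= lo:
                hits.add(name)
    return sorted(hits)


# Hypothetical toy data: 20 windows of 50 kb on one chromosome, one with high FST.
toy_windows = [("6", i * 50_000, (i + 1) * 50_000, 0.05) for i in range(20)]
toy_windows[10] = ("6", 500_000, 550_000, 0.60)

# Hypothetical gene annotations (name, chromosome, start, end).
toy_genes = [
    ("geneA", "6", 552_000, 560_000),  # within 5 kb of the outlier window
    ("geneB", "6", 700_000, 710_000),  # same chromosome, too far away
    ("geneC", "7", 500_000, 510_000),  # different chromosome
]

toy_outliers = outlier_windows(toy_windows, threshold=3.0)
toy_hits = genes_near(toy_outliers, toy_genes, pad_bp=5000)
```

Note that with few windows a ZFST above 3 cannot occur at all, since the maximum attainable value grows with the number of windows; genome-wide scans have many thousands of windows, so this is only a concern for toy examples such as this one.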

4.3.1 Immune response

At least 39 genes involved in immune response were identified. Previously, MHC-BLB genes have been suggested to potentially be locally adapted between the populations (Meyer-Lucht et al. 2016) and while this study did not identify any BLB genes, the ZFST outlier method did highlight B2M (beta-2-microglobulin) which is a component of class I MHC. In humans and mice, it is involved in brain function (Smith et al. 2015) and the presentation of peptide antigens to T-cells (Li et al. 2016). In chickens, its expression is upregulated during viral infection (Yu et al. 2013). The gene RSFR (leukocyte ribonuclease A-2) was identified and had a high ZFST of 9.19. This gene is very likely to have bactericidal activity through granulocyte white blood cells and is angiogenic (Nitto et al. 2006). In chickens the protein encoded by this gene increases in macrophages following Salmonella enteritidis infection (Sekelova et al. 2017) which is a zoonotic pathogen mainly reservoired in birds (Obukhovska 2013). The only immune gene which was identified by both methods was FBXL8 (F-box and leucine rich repeat protein 8) which is involved in tumour suppression in humans (Yoshida et al. 2021).

There were four interleukins identified as outliers. Firstly, IL20 (interleukin 20) and STAT3 (signal transducer and activator of transcription 3) were identified. The cytokine IL20 in humans has a role in the function of the epidermis and functions through STAT3 (Blumberg et al. 2001). This gene may play a role in inflammatory disorders of the skin (Sabat et al. 2007) along with gastrointestinal diseases (Niess et al. 2018). There were two other interleukins identified: IL22 (interleukin 22) and ILF2 (interleukin enhancer-binding factor 2). IL22 in chickens has a role in response to inflammation in epithelial and liver cells and is also known to act through STAT3 (Kim et al. 2012, Pickert et al. 2009). ILF2 is believed to function against viral infections in chickens (Stricker et al. 2010). IRAK2 (interleukin 1 receptor associated kinase 2) has been associated with avian infectious bronchitis virus (IBV) in chickens, and it is downregulated during infection (Liu et al. 2018).

There were five chemokine or chemokine-like genes identified as outliers. Chemokines are involved in the immune response and lymphocyte migration (Moser & Loetscher 2001). CCL4 may be involved in chicken feather damage related to pecking behaviour (Biscarini et al. 2010). In chickens, it is involved in the recruitment of T-cells (Bystry et al. 2001) and is highly expressed in lung cells during avian influenza virus infection, as was IL22 (Ranaware et al. 2016). CCR8L is upregulated in response to immune stress in chickens (Guo et al. 2020). Another chemokine gene, CXCR4, was also identified; it is upregulated in red grouse with high T. tenuis burden (Webster et al. 2011b) and, in chickens, is involved in germ cell migration (Lee et al. 2017) and the formation of pancreatic cells and endocrine β-cells (Katsumoto & Kume 2011). It has also been shown to play a role in the movement of lymphocytes in mammals (Stein & Nombela-Arrieta 2005).

There were also three histone genes identified: HIST1H2B7L4 (histone cluster 1, H2B-VII-like 4; similar to human histone cluster 1, class H2B, member N), HIST1H2B7L2 (histone cluster 1, H2B-VII-like 2; similar to human histone cluster 1, class H2B, member N) and HIST1H2BO (histone cluster 1, H2bo). HIST1H2B7L4 and HIST1H2B7L2 are members of the histone class H2B, members of which have broad-spectrum antimicrobial activity and are present in chicken ovaries (Silphaduang et al. 2006). The gene OvoDA1 (Ovodefensin A1) was also identified, which codes for an anti-microbial peptide that is located in the oviduct of chickens (Whenham et al. 2015).

Several other genes were recorded which are associated with the response to pathogens in chickens. GAL4 (avian beta-defensin 4) shows bactericidal activity against Salmonella spp. (Milona et al. 2007). CD79B (CD79b molecule) may be involved in resistance to the viral Marek's disease (Meydan et al. 2011), and has a speculated role in T-cell development and a likely role in B-cell development (Sayegh et al. 2009). The gene CSK (C-terminal Src kinase) is involved in tumour suppression and B-cell activity (Hata et al. 1994, Okada 2012) in chickens and humans.

Overall, it can be seen that there are several genes involved in a variety of immune response functions which are under putative directional selection in red grouse and thus associated with local adaptation.

4.3.2 Pigmentation

The MC1R gene, which encodes the melanocortin 1 receptor, is important for normal pigmentation in several organisms (Mundy 2005). An interaction between this gene and the Agouti gene plays an integral role in pigmentation in many species (Harris et al. 2020). Pigmentation is believed to be under divergent selection in red grouse between Ireland and Britain (Höglund & McMahon, unpublished observations). MC1R was identified as an outlier with the pcadapt method and was located ~2.8 Mb downstream from an outlier SNP. In addition, a SNP within the MC1R region had a high FST of 0.3. Given this gene's important role in crypsis (Harris et al. 2020) and that it has already been shown to be differentiated between Irish and British red grouse, there is evidence for this gene being differentiated and thus for divergent selection regarding pigmentation.

HSDL1 was an outlier using both methods and may play a role in pigmentation as it is involved in the maintenance of secondary sexual characteristics and sex differentiation by metabolising hormones (Gloux et al. 2019). The genes HPS1 and SNAPIN were also extracted as outliers and in humans play a role in the BLOC-3 (Martina et al. 2003) and BLOC-1 (Setty et al. 2007) complexes respectively, which are involved in melanin production and melanosome biogenesis. Mutations in HPS1 have been shown to cause changes in pigmentation in mice (Nguyen & Wei 2007).


4.3.3 Other relevant genes

Genes with other important functions were involved in reproduction, processes in the eye and food intake. Previously, it was found that the EDIL3 gene was under divergent selection between red grouse and other grouse species (Kozma et al. 2019). This gene was not identified here as an outlier. However, the gene is involved in egg mineralization in chickens along with another gene MFGE8 (milk fat globule-EGF factor 8 protein) which was identified here as an outlier (Stapane et al. 2019).

Interestingly, several genes involved in the eye were also detected. SIX5 (SIX homeobox 5) is expressed in adult human eyes and particularly in the lens epithelium (Winchester et al. 1999). RLBP1 (retinaldehyde binding protein 1) is required for the correct functioning of rod and cone photoreceptors, with mutations in this gene affecting visual pigments in humans (He et al. 2009), and is very important for vision in mice (Xue et al. 2015). In mice, FOXG1 (forkhead box G1) is involved in the development of the retina (Fotaki et al. 2013). ATOH7 (atonal bHLH transcription factor 7) is involved in retina development in chickens (Skowronska-Krawczyk et al. 2009). Also, the HPS1 (HPS1, biogenesis of lysosomal organelles complex 3 subunit 1) gene plays a role in iris development (Jardón et al. 2015) and has undergone intensified evolution in the development of barn owl eyes. The study states that nocturnal owls overall have dark irises, which is consistent with HPS1 being under intensified selection (Borges et al. 2019). Lastly, PDC (phosducin) is present in mammal retinas and is associated with vision in mice and humans (Lee 1984, Zhu & Craft 2000).

There were also two genes associated with high altitude one of which is the main candidate gene for hypoxia in Tibetan chicken - FOXG1 (forkhead box G1) (Jiang et al. 2018). The second is EIF2AK1 (eukaryotic translation initiation factor 2 alpha kinase 1) and is potentially involved in the adaptation to hypoxic stress in Tibetan chickens (Hu et al. 2020). There were several genes involved in spermatogenesis and sperm function in mice such as DDX25 (DEAD-box helicase 25) (Sheng et al. 2006) and SYT6 (synaptotagmin 6) (Hutt et al. 2005). Also, there were two genes involved in spermatogenesis in humans TSGA10 (Ye et al. 2020) and TMEM203 (Shambharkar et al. 2015).

Lastly, there were several genes involved in energy control and food intake. This is particularly interesting because the habitats in Ireland and Britain differ, with heather believed to be less dense in Ireland than in Britain. Heather makes up 75% of the diet of red grouse chicks in Scotland and adults are known to rely more heavily on heather than their chicks (Savory 1977). It would therefore fit with the different habitats hypothesis that red grouse in Ireland perhaps have less optimal food available to them. However, this is rather speculative since very little data exists from Ireland. It has been shown, however, that even though there likely is not as much heather in Ireland, it still makes up a large portion of the red grouse diet at least in some populations, with 90% of adult droppings and 70-90% of chick droppings consisting of heather in a fragmented population in the west of Ireland (Lance & Mahon 1975). With this in mind, there were two genes, GALR2 (galanin receptor 2) and GALR3 (galanin receptor 3), that have been studied in chickens regarding food deprivation. It has been shown that the expression of these two genes in tissues involved in the maintenance of glucose homeostasis is regulated by short-term food deprivation (Kołodziejski et al. 2021). Also, the gene GHRL (ghrelin/obestatin prepropeptide) is significantly associated with body weight, body weight gain, and feed conversion ratio (how food relates to biomass) in chickens (Jin et al. 2014). The gene HCRT (hypocretin neuropeptide precursor) is potentially involved in energy balance in chickens (Miranda et al. 2013). This gene was identified using both methods. The gene DRD5 may be involved with feeding behaviour in chickens but needs further study (Jiang et al. 2021). Lastly, the gene RHNO1 was recorded with both methods and, while no evidence has been found for an association with food deprivation, it is downregulated in chickens which are very efficient with feed intake (Yang et al. 2020).

Other genes for which there is evidence of associations with food intake and energy balance in mammals include NPW (neuropeptide W), which in mammals is also associated with sleep, social behaviour and fear memory. It is highly conserved between chickens and other vertebrates (Takenoya et al. 2010, Nagata-Kuroiwa et al. 2011, Bu et al. 2016). The gene GRIN1 (glutamate ionotropic receptor NMDA type subunit 1) is associated with energy balance and the response to food deprivation in mice (Liu et al. 2012). Lastly, the gene MLST8 (MTOR associated protein, LST8 homolog) was identified, which in mammals is a component of the mTORC1 and mTORC2 complexes, which have roles in the growth and survival of cells in response to nutrients (Jacinto et al. 2004).

4.3.4 Gene ontology (GO) enrichment analysis

Gene ontology (GO) enrichment analysis was carried out on the list of 661 outlier genes to identify whether certain groups of genes, defined by their GO annotations for biological process, cellular component or molecular function, were over-represented in the list compared to their background representation in the genome. Since the gene list here was a group of differentiated outliers, any over-represented categories could uncover processes that are under divergent selection.
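The idea of over-representation can be made concrete with a hypergeometric upper-tail test, which is a common basis for enrichment analyses. The sketch below is an assumption about the general approach rather than a description of the tool used in this study; in particular the background size of 15000 annotated genes is a hypothetical placeholder, and the resulting value is a raw probability, not an FDR-corrected one as reported in Table 4.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are annotated with the GO term of interest."""
    max_k = min(n, K)
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, max_k + 1)
    )
    return favourable / total


# Small worked example: 20 genes in the background, 5 annotated with the term,
# 4 genes in the list, 2 of which carry the term.
p_toy = hypergeom_upper_tail(k=2, n=4, K=5, N=20)

# Keratin filament row of Table 4: 10 of the 16 annotated genes are in the
# outlier list of 661 genes. ASSUMED_BACKGROUND is hypothetical.
ASSUMED_BACKGROUND = 15000
p_keratin = hypergeom_upper_tail(k=10, n=661, K=16, N=ASSUMED_BACKGROUND)
```

Under this assumed background fewer than one keratin filament gene would be expected in a list of 661 by chance, which is why observing 10 yields a very small probability.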

The gene ontology enrichment analysis showed divergent genes were significantly enriched for several GO categories, mostly concerning Cellular Components. The most enriched categories involved the keratins and were keratin filament (GO:0045095), intermediate filament (GO:0005882) and intermediate filament cytoskeleton (GO:0045111). In the hierarchy of GO terms, keratin filament is an intermediate filament, which in turn is part of the intermediate filament cytoskeleton. Of the genes which were enriched, there were 25 keratins. Intermediate filaments are essential for maintaining cell structure and plasticity (Herrmann et al. 2009), and mutations in them lead to several skin disorders in humans (Lane & McLean 2004). In birds, they contribute to the formation of beaks, feathers, claws and also scales (Ng et al. 2014). Bird feathers consist of α and β keratins, with β keratins unique to birds and reptiles (Wu et al. 2015). There was one β-keratin-like gene, LOC431320, which was enriched for six categories and one β-keratin related protein, BKJ, which was also enriched for six. There were also five feather keratin-like genes, LOC112530353, LOC107055236, LOC107055272, LOC107055272, LOC770940, which showed significant enrichment for several categories. This indicates that red grouse in Ireland and England are differentiated according to functions involving keratin. Keratin can play a role in the feather colour of birds as melanin granules are present in keratins (Shawkey et al. 2009, Shawkey & D’Alba 2017).

The other functional categories for which enrichment was found include nucleosome (GO:0000786), DNA packaging complex (GO:0044815) and protein-DNA complex (GO:0032993). The genes enriched for these categories were exclusively histones, except HOXA11 (homeobox A11), which was enriched for Protein-DNA complex. In the hierarchy of GO terms, nucleosome is a DNA packaging complex and a protein-DNA complex.


4.4 Runs of Homozygosity

Runs of homozygosity (ROH) are the result of an individual inheriting identical haplotypes from both parents; the more recent the common ancestor, the longer the ROH segment. This is because over time recombination will break up ROH segments and, in the absence of inbreeding, they will become smaller. Therefore long ROH are considered to result from recent inbreeding (Ceballos et al. 2018). In relative terms, the Recent Irish population has undergone more recent inbreeding, with a higher SROH for long ROH at 30343 kb (± 21847) for the Recent Irish individuals compared to 8690 kb (± 8405) for the Museum specimens and 3256 kb (± 1628) for the English individuals. For the Recent Irish samples, the SROH of short/medium ROH was c. 3 times bigger than for long ROH. This difference was larger for the Museum samples, where the short/medium SROH was c. 10 times the long SROH, and for the English samples, where it was c. 7 times. This points again to higher levels of inbreeding in the Recent Irish population. The NROH for long segments was also significantly higher in the Recent Irish population, pointing to an increased level of inbreeding here (Ceballos et al. 2018).

Within the Irish samples, there is one individual that shows relatively low SROH for long segments at 9238 kb. This is an admixed individual so this pattern is expected (Ceballos et al. 2018), and this individual is from the Wicklow population, which is recorded as having the most genetic diversity of the Irish populations (McMahon et al. 2012). The other admixed individual showed the second-highest SROH for long segments at 30114 kb. In addition, one Irish individual had an extremely large long SROH (72147.95 kb) relative to other individuals. This individual came from Roscommon, which is in the midlands and is an area that has experienced particularly serious declines (Cummins et al. 2015). These population declines and long ROH indicate that at least some populations of red grouse in Ireland seem to be inbred (Ceballos et al. 2018).

Short ROH indicate historic bottlenecks (Ceballos et al. 2018). All samples had higher NROH and SROH for short/medium segments compared to long. The difference was largest in the Museum and English samples. This indicates a lesser effect of recent inbreeding in the Museum and English populations and that these short ROH are signs of historic inbreeding. There were two Museum samples with relatively short ROH. These were from Limerick and the Tipperary/Waterford border. This perhaps indicates that there has been some historical admixture into these populations, which are members of the same genetic cluster (as shown in McMahon et al. 2012). In comparison, the contemporary sample from Tipperary/Waterford is admixed and shows the second highest long SROH, indicating an increase in the levels of inbreeding on the Tipperary/Waterford border. The ROH for the Museum individuals were more spread than those for the Recent individuals, indicating that some populations did not suffer as much from inbreeding in the past but that overall the levels have now increased. There were two other pairs of historic and contemporary samples from the same location. In both instances, there was a higher NROH for short/medium segments in the Museum samples and long ROH were longer in the contemporary samples. This indicates that there is a higher level of inbreeding in the Recent Irish individuals and therefore a risk of inbreeding depression (Hedrick & Kalinowski 2000, Ceballos et al. 2018).

Given the issue with the Museum samples, the analysis would need to be repeated with correctly filtered sequences to confirm these results. Regardless, these results do show that there seem to be higher levels of inbreeding in the contemporary Irish population compared to the English.


4.5 Management Implications

The three most recent studies on red grouse (McMahon et al. 2012, Meyer-Lucht et al. 2016, Höglund & McMahon, unpublished observations) concluded Ireland should be treated as its own “Management Unit” (MU) as it showed significant and important differentiation from the other populations (Moritz 1994). This report agrees with this and has added to the evidence on important differentiation with a list of candidate genes under putative divergent selection.

Unlike some of these studies, however, this study does conclude that there has been recent gene flow from Britain to Ireland. Previously this had only been shown for the north of Ireland (Höglund & McMahon, unpublished observations) but now there is evidence of substantial admixture from gene flow in at least two of the southern populations. Selection seems to be playing an important role in shaping the differentiation between these two populations, meaning this gene flow could introduce maladapted phenotypes into the Irish environment. The two populations show divergence in many genes with important biological functions including the immune response, pigmentation and food intake. This suggests red grouse in Ireland are adapted to the local conditions of potentially divergent parasite communities as well as different levels of heather in each habitat. The introduction of maladapted phenotypes and genotypes can result in outbreeding depression. This is when admixed individuals have reduced fitness (Frankham et al. 2011), and this could hamper conservation efforts. In addition, the contemporary Irish population has significantly more and longer ROH than the English individuals. The ROH analysis indicated higher inbreeding in Ireland, that this has potentially increased over time and that there is likely reduced genetic diversity in the Irish population (Freeland et al. 2007).

It is clear that, with increased levels of inbreeding, reduced genetic diversity and continuing population declines (Cummins et al. 2015), conservation action is needed for the Irish population; however, in practice this can be hard to implement (National Red Grouse Steering Committee 2013). There is still a large enough population of red grouse in Ireland that habitat management alone could be sufficient to increase the species, but this will not be the case if there are further declines. The estimate from 2016 of an Ne of 100 means that if populations continue declining it may not be possible to conserve the Irish red grouse due to loss of genetic variation (Lynch & Lande 1998, Meyer-Lucht et al. 2016). It is therefore critical that this does not happen.
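To give a sense of scale for the concern about loss of genetic variation, the sketch below uses the textbook expectation that, in an idealised population, heterozygosity declines by a factor of 1 - 1/(2Ne) per generation through drift alone. This is an illustration only, not a calculation made in this study or in the cited sources; it ignores mutation, selection, migration and further changes in population size, and the function name is hypothetical.

```python
def expected_heterozygosity_retained(ne, generations):
    """Fraction of initial heterozygosity expected to remain after the given
    number of generations of drift in an idealised population of size ne."""
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


# Ne = 100 is the 2016 estimate quoted in the text; the time points are arbitrary.
retained_ne100 = {
    t: expected_heterozygosity_retained(100, t) for t in (1, 10, 50, 100)
}
```

Under these simplifying assumptions roughly 0.5% of heterozygosity is lost per generation at Ne = 100, and the rate of loss increases as Ne falls.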

It is advised here, in agreement with previous studies (McMahon et al. 2012, Meyer-Lucht et al. 2016), that translocations from Britain be avoided. This recommendation is made based on the two populations showing divergence in several important genes and the risk of introducing genetic variation from maladapted birds. This recommendation is also in line with what is stated in the Species Action Plan (SAP), which notes that there are introductions of birds from Britain to Ireland. The SAP, along with the current study, recommends that this practice should end. Most importantly, to reduce the effects of genetic drift and inbreeding, gene flow between the small and fragmented populations should be restored and at the bare minimum retained through habitat management. Restoring gene flow between the four main populations in Ireland through translocations could also help minimise the effects of inbreeding. If this was to occur, it is imperative to ensure that the chosen source population is healthy enough to support the removal of individuals (National Red Grouse Steering Committee 2013). As there is evidence of admixture in the Wicklow population and others, it is important to ensure maladapted genotypes are not moved around Ireland to isolated populations.

4.6 Study Limitations

Firstly, the distribution of the samples was suboptimal. In the Irish samples, several populations were accounted for. However, this was not true for the British populations, where all samples came from two areas in England in the Yorkshire Dales. There are British populations in Wales and the Hebridean Islands that have not been sampled by McMahon et al. (2012) or Meyer-Lucht et al. (2016). Since red grouse in the Outer Hebrides have been previously described as very similar to red grouse in Ireland (Kleinschmidt 1919), it would be useful to have them included in comparative genetic studies looking at red grouse variation across their range. Nonetheless, this study provides clear evidence that at least red grouse in Ireland and England show high levels of differentiation.

Secondly, as mentioned, the museum samples were not optimally filtered. It has been shown for historical samples that the number of cytosine to thymine substitutions increases over time. This deamination accumulates at the ends of DNA fragments (Sawyer et al. 2012, Weiß et al. 2016). USER enzyme was used to remove the uracil residues created by cytosine deamination in these damaged DNA fragments. However, this method is less efficient at removing the effects of deamination at CpG sites (Briggs et al. 2010). Ideally, these sites would have been quantified and subsequently removed so as not to interfere with downstream analysis.

4.7 Conclusions

There is evidence of the negative effects associated with fragmented and declining populations in Irish red grouse. The ROH analysis indicates there has been recent inbreeding in the contemporary Irish population. The English population had slightly higher genetic diversity than the Recent Irish samples. However, it must be taken into account that these English samples all come from one population in England, whereas the Irish samples come from the four genetically distinct populations in Ireland. Had a comparison of all British populations been made with all Irish populations, this gap would likely be larger. In addition, the PCA also indicated there is more variation in the English samples.

The English and Irish samples are differentiated, as shown by the PCA, ADMIXTURE and FST analyses. However, there is evidence of recent gene flow from the English population into some of the Irish individuals. This may be causing negative effects through outbreeding depression. The window-based FST scans, pcadapt outlier detection and GO enrichment analysis further indicated that there are high levels of differentiation between Irish and English red grouse in genes involved in immune response, pigmentation and food intake.

These results provide further insights from a whole-genome perspective on the taxonomy and conservation needs of red grouse. Taken together, previous studies (Freeland et al. 2007, McMahon et al. 2012, Meyer-Lucht et al. 2016, Höglund & McMahon, unpublished observations), current downward population trends for Irish red grouse (Gilbert et al. 2021) and the evidence shown here of increased levels of inbreeding indicate a clear need to continue monitoring the genetic diversity of Irish grouse. In addition, the genetic viability of Irish red grouse needs to be increased through means other than translocations from Britain in order to reduce the potential effects of inbreeding depression.

ACKNOWLEDGEMENTS

Firstly, I would like to thank my supervisor Jacob Höglund for invaluable support, guidance and encouragement throughout the project. I would also like to thank my second supervisor Barry John McMahon for providing contacts for samples and reviewing the manuscript. I am also very grateful to Patrik Rödin-Mörch for providing his bioinformatics expertise and guidance. A thank you also to Filip Thörn and Martin Irestedt for their expertise and help in carrying out the lab work at the Department of Bioinformatics and Genetics at the Natural History Museum in Stockholm. Also, a special thank you to Séamus Butler, Pat O’Sullivan and Rupert Butler of the ABGN Gun Club and John Leech, who brought me out to see some red grouse and to hear about the conservation work they have been doing.

Historic samples were generously provided by The National Museum of Ireland – Natural History in Dublin with the help of Aidan O’Hanlon. Fresh samples were provided by Séamus Butler and Frank Reynolds. Feather samples were in storage at Uppsala University. The English samples were taken from a previous study (Kozma et al. 2019). The author acknowledges support from the National Genomics Infrastructure in Stockholm funded by Science for Life Laboratory, the Knut and Alice Wallenberg Foundation and the Swedish Research Council, and SNIC/Uppsala Multidisciplinary Center for Advanced Computational Science for assistance with massively parallel sequencing and access to the UPPMAX computational infrastructure.

REFERENCES

Allen D, Mellon C, Mawhinney K. 2004. The Status of Red Grouse in Northern Ireland 2004. Environment & Heritage Service, Belfast.
Anders E, Nebel D, Westman J, Herwald H, Nilsson B-O, Svensson D. 2018. Globular C1q receptor (p33) binds and stabilizes pro-inflammatory MCP-1: a novel mechanism for regulation of MCP-1 production and function. The Biochemical Journal 475: 775–786.
Andrews S. 2010. FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. Available online at: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Alonso MA, Weissman SM. 1987. cDNA cloning and sequence of MAL, a hydrophobic protein associated with human T-cell differentiation. Proceedings of the National Academy of Sciences of the United States of America 84: 1997–2001.
Atzei P, Gargan S, Curran N, Moynagh PN. 2010. Cactin Targets the MHC Class III Protein IκB-like (IκBL) and Inhibits NF-κB and Interferon-regulatory Factor Signaling Pathways. Journal of Biological Chemistry 285: 36804–36817.
Balmer DE, Gillings S, Caffrey BJ, Swann RL, Downie IS, Fuller RJ. 2013. Bird Atlas 2007-11: the breeding and wintering birds of Britain and Ireland. Thetford: BTO.
Barnosky AD, Matzke N, Tomiya S, Wogan GOU, Swartz B, Quental TB, Marshall C, McGuire JL, Lindsey EL, Maguire KC, Mersey B, Ferrer EA. 2011. Has the Earth’s sixth mass extinction already arrived? Nature 471: 51–57.
Barton NH. 2000. Genetic hitchhiking. Philosophical Transactions of the Royal Society B: Biological Sciences 355: 1553–1562.
Ben Khelifa M, Coutton C, Zouari R, Karaouzène T, Rendu J, Bidart M, Yassine S, Pierre V, Delaroche J, Hennebicq S, Grunwald D, Escalier D, Pernet-Gallay K, Jouk P-S, Thierry-Mieg N, Touré A, Arnoult C, Ray PF. 2014. Mutations in DNAH1, which encodes an inner arm heavy chain dynein, lead to male infertility from multiple morphological abnormalities of the sperm flagella. American Journal of Human Genetics 94: 95–104.
Biscarini F, Bovenhuis H, van der Poel J, Rodenburg TB, Jungerius AP, van Arendonk JAM. 2010. Across-Line SNP Association Study for Direct and Associative Effects on Feather Damage in Laying Hens. Behavior Genetics 40: 715–727.
Blumberg H, Conklin D, Xu WF, Grossmann A, Brender T, Carollo S, Eagan M, Foster D, Haldeman BA, Hammond A, Haugen H, Jelinek L, Kelly JD, Madden K, Maurer MF, Parrish-Novak J, Prunkard D, Sexson S, Sprecher C, Waggie K, West J, Whitmore TE, Yao L, Kuechle MK, Dale BA, Chandrasekher YA. 2001. Interleukin 20: discovery, receptor identification, and role in epidermal function. Cell 104: 9–19.
Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.
Borges R, Fonseca J, Gomes C, Johnson WE, O’Brien SJ, Zhang G, Gilbert MTP, Jarvis ED, Antunes A. 2019. Avian Binocularity and Adaptation to Nocturnal Environments: Genomic Insights from a Highly Derived Visual Phenotype. Genome Biology and Evolution 11: 2244–2255.
Bracken F, McMahon BJ, Whelan J. 2008. Breeding bird populations of Irish peatlands. Bird Study 55: 169–178.
Briggs AW, Stenzel U, Meyer M, Krause J, Kircher M, Pääbo S. 2010. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Research 38: e87.
BroadInstitute. 2020a. Best Practices Workflow. online 2020: https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows. Accessed January 10, 2021.

Bruggeman DJ, Wiegand T, Fernández N. 2010. The relative effects of habitat loss and fragmentation on population genetic variation in the red-cockaded woodpecker (Picoides borealis). Molecular Ecology 19: 3679–3691.
Bu G, Lin D, Cui L, Huang L, Lv C, Huang S, Wan Y, Fang C, Li J, Wang Y. 2016. Characterization of Neuropeptide B (NPB), Neuropeptide W (NPW), and Their Receptors in Chickens: Evidence for NPW Being a Novel Inhibitor of Pituitary GH and Prolactin Secretion. Endocrinology 157: 3562–3576.
Bystry RS, Aluvihare V, Welch KA, Kallikourdis M, Betz AG. 2001. B cells and professional APCs recruit regulatory T cells via CCL4. Nature Immunology 2: 1126–1132.
Carpenter S, Carlson T, Dellacasagrande J, Garcia A, Gibbons S, Hertzog P, Lyons A, Lin L-L, Lynch M, Monie T, Murphy C, Seidl KJ, Wells C, Dunne A, O’Neill LAJ. 2009. TRIL, a Functional Component of the TLR4 Signaling Complex, Highly Expressed in Brain. The Journal of Immunology 183: 3989–3995.
Ceballos FC, Joshi PK, Clark DW, Ramsay M, Wilson JF. 2018. Runs of homozygosity: windows into population history and trait architecture. Nature Reviews Genetics 19: 220–234.
Ceballos G, Ehrlich PR, Raven PH. 2020. Vertebrates on the brink as indicators of biological annihilation and the sixth mass extinction. Proceedings of the National Academy of Sciences 117: 13596–13602.
Colonna M. 2007. All roads lead to CARD9. Nature Immunology 8: 554–555.
Cummins S, Bleasdale A, Douglas C, Newton SF, O’Halloran J, Wilson HJ. 2015. Densities and population estimates of Red Grouse Lagopus lagopus scotica in Ireland based on the 2006-2008 national survey. Irish Birds 10: 197–210.
Cummins S, Bleasdale A, Douglas C, Newton S, O’Halloran J, Wilson HJ. 2010. The status of Red Grouse in Ireland and the effects of land use, habitat and habitat quality on their distribution. Irish Wildlife Manuals, No. 50. National Parks and Wildlife Service, Department of the Environment, Heritage and Local Government, Dublin, Ireland.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group. 2011. The variant call format and VCFtools. Bioinformatics 27: 2156–2158.
Das A, Dinh PX, Pattnaik AK. 2015. Trim21 regulates Nmi-IFI35 complex-mediated inhibition of innate antiviral response. Virology 485: 383–392.
de Greef JC, Wang J, Balog J, den Dunnen JT, Frants RR, Straasheijm KR, Aytekin C, van der Burg M, Duprez L, Ferster A, Gennery AR, Gimelli G, Reisli I, Schuetz C, Schulz A, Smeets DFCM, Sznajer Y, Wijmenga C, van Eggermond MC, van Ostaijen-ten Dam MM, Lankester AC, van Tol MJD, van den Elsen PJ, Weemaes CM, van der Maarel SM. 2011. Mutations in ZBTB24 Are Associated with Immunodeficiency, Centromeric Instability, and Facial Anomalies Syndrome Type 2. The American Journal of Human Genetics 88: 796–804.
Díaz S, Settele J, Brondízio ES, Ngo HT, Agard J, Arneth A, Balvanera P, Brauman KA, Butchart SHM, Chan KMA, Garibaldi LA, Ichii K, Liu J, Subramanian SM, Midgley GF, Miloslavich P, Molnár Z, Obura D, Pfaff A, Polasky S, Purvis A, Razzaque J, Reyers B, Chowdhury RR, Shin Y-J, Visseren-Hamakers I, Willis KJ, Zayas CN. 2019. Pervasive human-driven decline of life on Earth points to the need for transformative change. Science 366(6471).
Dong J, He C, Wang Z, Li Y, Li S, Tao L, Chen J, Li D, Yang F, Li N, Zhang Q, Zhang L, Wang G, Akinyemi F, Meng H, Du B. 2018. A novel deletion in KRT75L4 mediates the frizzle trait in a Chinese indigenous chicken. Genetics Selection Evolution 50: 68.
Endler JA. 1981. An overview of the relationships between mimicry and crypsis. Biological Journal of the Linnean Society 16: 25–31.
Estes JA, Burdin A, Doak DF. 2016. Sea otters, kelp forests, and the extinction of Steller’s sea cow. Proceedings of the National Academy of Sciences 113: 880–885.

Finnerty EJ, Dunne J. 2007. Trichostrongylus tenuis Found in Red Grouse Lagopus scoticus in the Connemara National Park. Irish Naturalists Journal 28: 471.
Finnerty EJ, Dunne J, McMahon BJ. 2007. Evaluation of Red Grouse (Lagopus scoticus) habitat in the Connemara National Park. Irish Birds 8: 207–214.
Fotaki V, Smith R, Pratt T, Price DJ. 2013. Foxg1 is required to limit the formation of ciliary margin tissue and Wnt/β-catenin signalling in the developing nasal retina of the mouse. Developmental Biology 380: 299–313.
Frankham R. 1995. Conservation Genetics. Annual Review of Genetics 29: 305–327.
Frankham R. 2010. Where are we in conservation genetics and where do we need to go? Conservation Genetics 11: 661–663.
Frankham R, Ballou JD, Eldridge MDB, Lacy RC, Ralls K, Dudash MR, Fenster CB. 2011. Predicting the Probability of Outbreeding Depression: Predicting Outbreeding Depression. Conservation Biology 25: 465–475.
Freeland JR, Anderson S, Allen D, Looney D. 2007. Museum samples provide novel insights into the taxonomy and genetic diversity of Irish red grouse. Conservation Genetics 8: 695–703.
García-Alcalde F, Okonechnikov K, Carbonell J, Cruz LM, Götz S, Tarazona S, Dopazo J, Meyer TF, Conesa A. 2012. Qualimap: evaluating next-generation sequencing alignment data. Bioinformatics (Oxford, England) 28: 2678–2679.
Gáspár E, Hardenbicker C, Bodó E, Wenzel B, Ramot Y, Funk W, Kromminga A, Paus R. 2010. Thyrotropin releasing hormone (TRH): a new player in human hair-growth control. The FASEB Journal 24: 393–403.
Ge SX, Jung D, Yao R. 2020. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 36: 2628–2629.
Gerson C, Sabater J, Scuri M, Torbati A, Coffey R, Abraham JW, Lauredo I, Forteza R, Wanner A, Salathe M, Abraham WM, Conner GE. 2000. The lactoperoxidase system functions in bacterial clearance of airways. American Journal of Respiratory Cell and Molecular Biology 22: 665–671.
Gilbert G, Stanbury A, Lewis L. 2021. Birds of Conservation Concern in Ireland. Irish Birds 43: 1–22.
Giles T, van Limbergen T, Sakkas P, Belkhiri A, Maes D, Kyriazakis I, Mendez J, Barrow P, Foster N. 2019. Differential gene response to coccidiosis in modern fast growing and slow growing broiler genotypes. Veterinary Parasitology 268: 1–8.
Gilpin ME, Soulé ME. 1986. Minimum Viable Populations: Processes of Species Extinction. In: Soulé ME (ed.). Conservation Biology: The Science of Scarcity and Diversity, pp. 19–34. Sinauer Associates, Inc., Sunderland (Mass.).
Gloux A, Duclos MJ, Brionne A, Bourin M, Nys Y, Réhault-Godbert S. 2019. Integrative analysis of transcriptomic data related to the liver of laying hens: from physiological basics to newly identified functions. BMC Genomics 20: 1-16.
Gómez-Gaviro M, Domínguez-Luis M, Canchado J, Calafat J, Janssen H, Lara-Pezzi E, Fourie A, Tugores A, Valenzuela-Fernández A, Mollinedo F, Sánchez-Madrid F, Díaz-González F. 2007. Expression and regulation of the metalloproteinase ADAM-8 during human neutrophil pathophysiological activation and its catalytic activity on L-selectin shedding. Journal of Immunology (Baltimore, Md: 1950) 178: 8053–8063.
Gu J, Ruan X, Huang Z, Chen J, Zhou J. 2010. Identification of functional domains of chicken Interleukin 2. Veterinary Immunology and Immunopathology 134: 230–238.
Guo X, Li Y-Q, Wang M-S, Wang Z-B, Zhang Q, Shao Y, Jiang R-S, Wang S, Ma C-D, Murphy RW, Wang G-Q, Dong J, Zhang L, Wu D-D, Du B-W, Peng M-S, Zhang Y-P. 2018. A parallel mechanism underlying frizzle in domestic chickens. Journal of Molecular Cell Biology 10: 589–591.

Guo Y, Jiang R, Su A, Tian H, Zhang Y, Li W, Tian Y, Li K, Sun G, Han R, Yan F, Kang X. 2020. Identification of genes related to effects of stress on immune function in the spleen in a chicken stress model using transcriptome analysis. Molecular Immunology 124: 180–189.
Hamilton JA. 2008. Colony-stimulating factors in inflammation and autoimmunity. Nature Reviews Immunology 8: 533–544.
Harris RB, Irwin K, Jones MR, Laurent S, Barrett RDH, Nachman MW, Good JM, Linnen CR, Jensen JD, Pfeifer SP. 2020. The population genetics of crypsis in vertebrates: recent insights from mice, hares, and lizards. Heredity 124: 1–14.
Hata A, Sabe H, Kurosaki T, Takata M, Hanafusa H. 1994. Functional analysis of Csk in signal transduction through the B-cell antigen receptor. Molecular and Cellular Biology 14: 7306–7313.
He X, Lobsiger J, Stocker A. 2009. Bothnia dystrophy is caused by domino-like rearrangements in cellular retinaldehyde-binding protein mutant R234W. Proceedings of the National Academy of Sciences 106: 18545–18550.
Heissig B, Salama Y, Takahashi S, Osada T, Hattori K. 2020. The multifaceted role of plasminogen in inflammation. Cellular Signalling 75: 109761.
Hereford J. 2009. A Quantitative Survey of Local Adaptation and Fitness Trade-Offs. The American naturalist 173: 579–88.
Herrmann H, Strelkov SV, Burkhard P, Aebi U. 2009. Intermediate filaments: primary determinants of cell architecture and plasticity. The Journal of Clinical Investigation 119: 1772–1783.
Ho JCW, Jacobs T, Wang Y, Leung FC. 2012. Identification and characterization of the chicken galanin receptor GalR2 and a novel GalR2-like receptor (GalR2-L). General and Comparative Endocrinology 179: 305–312.
Höglund J, Wang B, Axelsson T, Quintela M. 2013. Phylogeography of willow grouse (Lagopus lagopus) in the Arctic: taxonomic discordance as inferred from molecular data: Phylogeography of L. Lagopus. Biological Journal of the Linnean Society 110: 77–90.
Hsieh M-Y, Chen W-Y, Jiang M-J, Cheng B-C, Huang T-Y, Chang M-S. 2006. Interleukin-20 promotes angiogenesis in a direct and indirect manner. Genes & Immunity 7: 234–242.
Hu Y, Su J, Cheng L, Lan D, Li D. 2020. Pectoral muscle transcriptome analyses reveal high-altitude adaptations in Tibetan chickens. Animal Biology 70: 385–400.
Huan C, Kelly ML, Steele R, Shapira I, Gottesman SRS, Roman CAJ. 2006. Transcription factors TFE3 and TFEB are critical for CD40 ligand expression and thymus-dependent humoral immunity. Nature immunology 7: 1082–1091.
Hudson PJ. 1986. The Effect of a Parasitic Nematode on the Breeding Production of Red Grouse. Journal of Animal Ecology 55: 85–92.
Hughes S, Poh T-Y, Bumstead N, Kaiser P. 2007. Re-evaluation of the chicken MIP family of chemokines and their receptors suggests that CCL5 is the prototypic MIP family chemokine, and that different species have developed different repertoires of both the CC chemokines and their receptors. Developmental & Comparative Immunology 31: 72–86.
Hutchinson C. 2010. Birds in Ireland. T & AD Poyser, Calton.
Hutt DM, Baltz JM, Ngsee JK. 2005. Synaptotagmin VI and VIII and Syntaxin 2 Are Essential for the Mouse Sperm Acrosome Reaction. Journal of Biological Chemistry 280: 20197–20203.
Jacinto E, Loewith R, Schmidt A, Lin S, Rüegg MA, Hall A, Hall MN. 2004. Mammalian TOR complex 2 controls the actin cytoskeleton and is rapamycin insensitive. Nature Cell Biology 6: 1122–1128.
Jeffries CL, Mansfield KL, Phipps LP, Wakeley PR, Mearns R, Schock A, Bell S, Breed AC, Fooks AR, Johnson N. 2014. Louping ill virus: an endemic tick-borne disease of Great Britain. The Journal of General Virology 95: 1005–1014.

Jiang SY, Xu HY, Shen ZN, Zhao CJ, Wu C. 2018. Genome-wide association analysis reveals novel loci for hypoxia adaptability in Tibetan chicken. Animal Genetics 49: 337–339.
Jiang J, Qi L, Lv Z, Wei Q, Shi F. 2021. Dietary stevioside supplementation increases feed intake by altering the hypothalamic transcriptome profile and gut microbiota in broiler chickens. Journal of the Science of Food and Agriculture 101: 2156–2167.
Jin S, Chen S, Li H, Lu Y, Xu G, Yang N. 2014. Associations of polymorphisms in GHRL, GHSR, and IGF1R genes with feed efficiency in chickens. Molecular Biology Reports 41: 3973–3979.
Jones MR, Mills LS, Alves PC, Callahan CM, Alves JM, Lafferty DJR, Jiggins FM, Jensen JD, Melo-Ferreira J, Good JM. 2018. Adaptive introgression underlies polymorphic seasonal camouflage in snowshoe hares. Science 360: 1355–1358.
Kaiser P, Mariani P. 1999. Promoter sequence, exon:intron structure, and synteny of genetic location show that a chicken cytokine with T-cell proliferative activity is IL2 and not IL15. Immunogenetics 49: 26–35.
Katsumoto K, Kume S. 2011. Endoderm and mesoderm reciprocal signaling mediated by CXCL12 and CXCR4 regulates the migration of angioblasts and establishes the pancreatic fate. Development (Cambridge, England) 138: 1947–1955.
Kim S, Faris L, Cox CM, Sumners LH, Jenkins MC, Fetterer RH, Miska KB, Dalloul RA. 2012. Molecular characterization and immunological roles of avian IL-22 and its soluble receptor IL-22 binding protein. Cytokine 60: 815–827.
Kimura M. 1983. The Neutral Theory of Molecular Evolution. doi 10.1017/CBO9780511623486.
Kleinschmidt O. 1919. Die Färbungen des schottischen Moorhuhns. Falco 15: 2-4.
Kołodziejski PA, Pruszyńska-Oszmałek E, Hejdysz M, Sassek M, Leciejewska N, Ziarniak K, Bień J, Ślósarz P, Kubiś M, Kaczmarek S. 2021. Effect of Fasting on the Spexin System in Broiler Chickens. Animals 11(2): 518.
Kozma R, Rödin-Mörch P, Höglund J. 2019. Genomic regions of speciation and adaptation among three species of grouse. Scientific Reports 9: 812.
Lynch M, Lande R. 1998. The critical effective size for a genetically secure population. Animal Conservation 1: 70–72.
Lane EB, McLean WHI. 2004. Keratins and skin disorders. The Journal of Pathology 204: 355–366.
Laurent S, Pfeifer SP, Settles ML, Hunter SS, Hardwick KM, Ormond L, Sousa VC, Jensen JD, Rosenblum EB. 2016. The population genomics of rapid adaptation: disentangling signatures of selection and demography in white sands lizards. Molecular Ecology 25: 306–323.
Lhor M, Salesse C. 2014. Retinol dehydrogenases: Membrane-bound enzymes for the visual function. Biochemistry and Cell Biology 92: 1–14.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.
Li L, Dong M, Wang X-G. 2016. The Implication and Significance of Beta 2 Microglobulin: A Conservative Multifunctional Regulator. Chinese Medical Journal 129: 448–455.
Lindsay D, Barr K, Lance R, Tweddale S, Hayden T, Leberg P. 2008. Habitat fragmentation and genetic diversity of an endangered, migratory songbird, the golden-cheeked warbler (Dendroica chrysoparia). Molecular ecology 17: 2122–33.
Liu H, Yang X, Zhang Z, Zou W, Wang H. 2018. miR-146a-5p promotes replication of infectious bronchitis virus by targeting IRAK2 and TNFRSF18. Microbial Pathogenesis 120: 32–36.
Liu T, Kong D, Shah BP, Ye C, Koda S, Saunders A, Ding JB, Yang Z, Sabatini BL, Lowell BB. 2012. Fasting Activation of AgRP Neurons Requires NMDA Receptors and Involves Spinogenesis and Increased Excitatory Tone. Neuron 73: 511–522.

Ludwig SC, Roos S, Bubb D, Baines D. 2017. Long-term trends in abundance and breeding success of red grouse and hen harriers in relation to changing management of a Scottish grouse moor. Wildlife Biology 2017: wlb.00246.
Madsen J, Mollenhauer J, Holmskov U. 2010. Review: Gp-340/DMBT1 in mucosal innate immunity. Innate Immunity 16: 160–167.
Mariniello RM, Orlandella FM, Stefano AED, Iervolino PLC, Smaldone G, Luciano N, Cervone N, Munciguerra F, Esposito S, Mirabelli P, Salvatore G. 2020. The TUSC2 Tumour Suppressor Inhibits the Malignant Phenotype of Human Thyroid Cancer Cells via SMAC/DIABLO Protein. International Journal of Molecular Sciences 21(3): 702.
Martina J, Moriyama K, Bonifacino J. 2003. BLOC-3, a protein complex containing the Hermansky-Pudlak syndrome gene products HPS1 and HPS4. The Journal of biological chemistry 278: 29376–84.
Martinez-Padilla J, Wenzel M, Mougeot F, Pérez-Rodríguez L, Piertney S, Redpath S. 2019. Parasite-mediated selection in red grouse - consequences for population dynamics and mate choice. In: Wilson K, Fenton A, Tompkins D (eds). Wildlife Disease Ecology: Linking Theory to Data and Application, pp. 296-320. Cambridge University Press.
Mattila A, Duplouy A, Kirjokangas M, Lehtonen R, Rastas P, Hanski I. 2012. High genetic load in an old isolated butterfly population. Proceedings of the National Academy of Sciences of the United States of America 109: E2496-505.
Maxwell SL, Fuller RA, Brooks TM, Watson JEM. 2016. Biodiversity: The ravages of guns, nets and bulldozers. Nature News 536: 143.
McGuire K, Holmes E, Gao G, Reid H, Gould E. 1998. Tracing the origin of louping ill virus by molecular phylogenetic analysis. The Journal of general virology 79: 981–8.
McMahon BJ, Johansson MP, Piertney SB, Buckley K, Höglund J. 2012. Genetic variation among endangered Irish red grouse (Lagopus lagopus hibernicus) populations: implications for conservation and management. Conservation Genetics 13: 639–647.
McMahon BJ, Doyle S, Gray A, Kelly SBA, Redpath SM. 2020. European bird declines: Do we need to rethink approaches to the management of abundant generalist predators? Journal of Applied Ecology 57: 1885–1890.
Mee A. 2018. The status and ecology of a remnant population of Ring Ouzel Turdus torquatus in the MacGillycuddy’s Reeks, Kerry. 11: 13–22.
Menheniott TR, O’Connor L, Chionh YT, Däbritz J, Scurr M, Rollo BN, Ng GZ, Jacobs S, Catubig A, Kurklu B, Mercer S, Minamoto T, Ong DE, Ferrero RL, Fox JG, Wang TC, Sutton P, Judd LM, Giraud AS. 2016. Loss of gastrokine-2 drives premalignant gastric inflammation and tumor progression. The Journal of Clinical Investigation 126: 1383–1400.
Meydan H, Yildiz MA, Dodgson JB, Cheng HH. 2011. Allele-specific expression analysis reveals CD79B has a cis-acting regulatory element that responds to Marek’s disease virus infection in chickens. Poultry Science 90: 1206–1211.
Meyer M, Kircher M. 2010. Illumina Sequencing Library Preparation for Highly Multiplexed Target Capture and Sequencing. Cold Spring Harbor protocols 2010: pdb.prot5448.
Meyer-Lucht Y, Mulder KP, James MC, McMahon BJ, Buckley K, Piertney SB, Höglund J. 2016. Adaptive and neutral genetic differentiation among Scottish and endangered Irish red grouse (Lagopus lagopus scotica). Conservation Genetics 17: 615–630.
Milona P, Townes CL, Bevan RM, Hall J. 2007. The chicken host peptides, gallinacins 4, 7, and 9 have antimicrobial activity against Salmonella serovars. Biochemical and Biophysical Research Communications 356: 169–174.
Miranda B, Esposito V, de Girolamo P, Sharp PJ, Wilson PW, Dunn IC. 2013. Orexin in the chicken hypothalamus: immunocytochemical localisation and comparison of mRNA concentrations during the day and night, and after chronic food restriction. Brain Research 1513: 34–40.

Moritz C. 1994. Defining ‘Evolutionarily Significant Units’ for conservation. Trends in Ecology & Evolution 9: 373–375.
Moser B, Loetscher P. 2001. Lymphocyte traffic control by chemokines. Nature Immunology 2: 123–128.
Mougeot F, Evans SA, Redpath SM. 2005a. Interactions between population processes in a cyclic species: parasites reduce autumn territorial behaviour of male red grouse. Oecologia 144: 289–298.
Mundy NI. 2005. A window on the genetics of evolution: MC1R and plumage colouration in birds. Proceedings of the Royal Society B: Biological Sciences 272: 1633–1640.
Nagata-Kuroiwa R, Furutani N, Hara J, Hondo M, Ishii M, Abe T, Mieda M, Tsujino N, Motoike T, Yanagawa Y, Kuwaki T, Yamamoto M, Yanagisawa M, Sakurai T. 2011. Critical role of neuropeptides B/W receptor 1 signaling in social behavior and fear memory. PloS One 6: e16972.
Nakahama N. 2021. Museum specimens: An overlooked and valuable material for conservation genetics. Ecological Research 36: 13–23.
National Red Grouse Steering Committee. 2013. Red Grouse Species Action Plan. Available: https://www.npws.ie/sites/default/files/publications/pdf/2013_RedGrouse_SAP.pdf [Accessed 14-05-2021].
Newborn D, Foster R. 2002. Control of parasite burdens in wild red grouse Lagopus lagopus scoticus through the indirect application of anthelmintics. Journal of Applied Ecology 39: 909–914.
Ng CS, Wu P, Foley J, Foley A, McDonald M-L, Juan W-T, Huang C-J, Lai Y-T, Lo W-S, Chen C-F, Leal SM, Zhang H, Widelitz RB, Patel PI, Li W-H, Chuong C-M. 2012. The Chicken Frizzle Feather Is Due to an α-Keratin (KRT75) Mutation That Causes a Defective Rachis. PLOS Genetics 8: e1002748.
Ng CS, Wu P, Fan W-L, Yan J, Chen C-K, Lai Y-T, Wu S-M, Mao C-T, Chen J-J, Lu M-YJ, Ho M-R, Widelitz RB, Chen C-F, Chuong C-M, Li W-H. 2014. Genomic Organization, Transcriptomic Analysis, and Functional Characterization of Avian α- and β-Keratins in Diverse Feather Forms. Genome Biology and Evolution 6: 2258–2273.
Niess JH, Hruz P, Kaymak T. 2018. The Interleukin-20 Cytokines in Intestinal Diseases. Frontiers in Immunology 9: 1373.
Nitto T, Dyer KD, Czapiga M, Rosenberg HF. 2006. Evolution and Function of Leukocyte RNase A Ribonucleases of the Avian Species, Gallus gallus. Journal of Biological Chemistry 281: 25622–25634.
O’Donoghue BG, Carey JGJ. 2020. Curlew Conservation Programme Annual Report 2020. National Parks & Wildlife Service, Killarney.
Okada M. 2012. Regulation of the Src Family Kinases by Csk. International Journal of Biological Sciences 8: 1385–1397.
Obukhovska O. 2013. The Natural Reservoirs of Salmonella Enteritidis in Populations of Wild Birds. Online Journal of Public Health Informatics 5(1): e171.
Pfeifer SP, Laurent S, Sousa VC, Linnen CR, Foll M, Excoffier L, Hoekstra HE, Jensen JD. 2018. The Evolutionary History of Nebraska Deer Mice: Local Adaptation in the Face of Strong Gene Flow. Molecular Biology and Evolution 35: 792–806.
Pickert G, Neufert C, Leppkes M, Zheng Y, Wittkopf N, Warntjen M, Lehr H-A, Hirth S, Weigmann B, Wirtz S, Ouyang W, Neurath MF, Becker C. 2009. STAT3 links IL-22 signaling in intestinal epithelial cells to mucosal wound healing. Journal of Experimental Medicine 206: 1465–1472.
Pritchard JK, Stephens M, Donnelly P. 2000. Inference of Population Structure Using Multilocus Genotype Data. Genetics 155: 945–959.

Privé F, Luu K, Vilhjálmsson BJ, Blum MGB. 2020. Performing Highly Efficient Genome Scans for Local Adaptation with R Package pcadapt Version 4. Molecular Biology and Evolution 37: 2153–2154.
Proença V, Pereira HM. 2017. Comparing Extinction Rates: Past, Present, and Future. Reference Module in Life Sciences, doi 10.1016/B978-0-12-809633-8.02128-2.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. 2007. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. American Journal of Human Genetics 81: 559–575.
Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842.
Quintela M, Berlin S, Wang B, Höglund J. 2010. Genetic diversity and differentiation among Lagopus lagopus populations in Scandinavia and Scotland: evolutionary significant units confirmed by SNP markers. Molecular Ecology 19: 2380–2393.
Ramirez VP, Krueger W, Aneskievich BJ. 2015. TNIP1 reduction of HSPA6 gene expression occurs in promoter regions lacking binding sites for known TNIP1-repressed transcription factors. Gene 555: 430–437.
Ranaware PB, Mishra A, Vijayakumar P, Gandhale PN, Kumar H, Kulkarni DD, Raut AA. 2016. Genome Wide Host Gene Expression Analysis in Chicken Lungs Infected with Avian Influenza Viruses. PloS One 11: e0153671.
Reed DH, Frankham R. 2003. Correlation between Fitness and Genetic Diversity. Conservation Biology 17: 230–237.
Reid HW. 1975. Experimental infection of red grouse with louping-ill virus (Flavivirus group): I. The viraemia and antibody response. Journal of Comparative Pathology 85: 223–229.
Robinson RA. 2005. BirdFacts: profiles of birds occurring in Britain & Ireland. BTO, Thetford (http://www.bto.org/birdfacts, accessed on 08 December 2020).
Rodríguez-Baena FJ, Redondo-García S, Peris-Torres C, Martino-Echarri E, Fernández-Rodríguez R, Plaza-Calonge M del C, Anderson P, Rodríguez-Manzaneque JC. 2018. ADAMTS1 protease is required for a balanced immune cell repertoire and tumour inflammatory response. Scientific Reports 8: 13103.
Rosenstiel P, Sina C, End C, Renner M, Lyer S, Till A, Hellmig S, Nikolaus S, Fölsch UR, Helmke B, Autschbach F, Schirmacher P, Kioschis P, Hafner M, Poustka A, Mollenhauer J, Schreiber S. 2007. Regulation of DMBT1 via NOD2 and TLR4 in Intestinal Epithelial Cells Modulates Bacterial Recognition and Invasion. The Journal of Immunology 178: 8203–8211.
Rychlik I, Elsheimer-Matulova M, Kyrova K. 2014. Gene expression in the chicken caecum in response to infections with non-typhoid Salmonella. Veterinary Research 45: 119.
Savory J. 2008. The food of Red Grouse chicks Lagopus l. scoticus. Ibis 119: 1–9.
Sabat R, Wallace E, Endesfelder S, Wolk K. 2007. IL-19 and IL-20: two novel cytokines with importance in inflammatory diseases. Expert Opinion on Therapeutic Targets 11: 601–612.
Santos MD, Yasuike M, Hirono I, Aoki T. 2006. The granulocyte colony-stimulating factors (CSF3s) of fish and chicken. Immunogenetics 58: 422–432.
Savolainen O, Lascoux M, Merilä J. 2013. Ecological genomics of local adaptation. Nature Reviews Genetics 14: 807–820.
Sawyer S, Krause J, Guschanski K, Savolainen V, Pääbo S. 2012. Temporal Patterns of Nucleotide Misincorporations and DNA Fragmentation in Ancient DNA. PLOS ONE 7: e34131.
Sayegh CE, Demaries SL, Pike KA, Friedman JE, Ratcliffe MJH. 2000. The chicken B-cell receptor complex and its role in avian B-cell development. Immunological Reviews 175: 187–200.

Sekelova Z, Stepanova H, Polansky O, Varmuzova K, Faldynova M, Fedr R, Rychlik I, Vlasatikova L. 2017. Differential protein expression in chicken macrophages and heterophils in vivo following infection with Salmonella Enteritidis. Veterinary Research 48: 35.
Sendell-Price AT, Ruegg KC, Clegg SM. 2020. Rapid morphological divergence following a human-mediated introduction: the role of drift and directional selection. Heredity 124: 535–549.
Setty SRG, Tenza D, Truschel ST, Chou E, Sviderskaya EV, Theos AC, Lamoreux ML, Di Pietro SM, Starcevic M, Bennett DC, Dell’Angelica EC, Raposo G, Marks MS. 2007. BLOC-1 Is Required for Cargo-specific Sorting from Vacuolar Early Endosomes toward Lysosome-related Organelles. Molecular Biology of the Cell 18: 768–780.
Shambharkar PB, Bittinger M, Latario B, Xiong Z, Bandyopadhyay S, Davis V, Lin V, Yang Y, Valdez R, Labow MA. 2015. TMEM203 Is a Novel Regulator of Intracellular Calcium Homeostasis and Is Required for Spermatogenesis. PLOS ONE 10: e0127480.
Shawkey MD, Morehouse NI, Vukusic P. 2009. A protean palette: colour materials and mixing in birds and butterflies. Journal of The Royal Society Interface 6: S221–S231.
Shawkey MD, D’Alba L. 2017. Interactions between colour-producing mechanisms and their effects on the integumentary colour palette. Philosophical Transactions of the Royal Society B: Biological Sciences 372: 20160536.
Sheng Y, Tsai-Morris C-H, Gutti R, Maeda Y, Dufau ML. 2006. Gonadotropin-regulated Testicular RNA Helicase (GRTH/Ddx25) Is a Transport Protein Involved in Gene-specific mRNA Export and Protein Translation during Spermatogenesis. Journal of Biological Chemistry 281: 35048–35056.
Shini S, Kaiser P. 2009. Effects of stress, mimicked by administration of corticosterone in drinking water, on the expression of chicken cytokine and chemokine genes in lymphocytes. Stress 12: 388–399.
Silphaduang U, Hincke MT, Nys Y, Mine Y. 2006. Antimicrobial proteins in chicken reproductive system. Biochemical and Biophysical Research Communications 340: 648–655.
Skoglund P, Höglund J. 2010. Sequence Polymorphism in Candidate Genes for Differences in Winter Plumage between Scottish and Scandinavian Willow Grouse (Lagopus lagopus). PLOS ONE 5: e10334.
Skowronska-Krawczyk D, Chiodini F, Ebeling M, Alliod C, Kundzewicz A, Castro D, Ballivet M, Guillemot F, Matter-Sadzinski L, Matter J-M. 2009. Conserved regulatory sequences in Atoh7 mediate non-conserved regulatory responses in retina ontogenesis. Development (Cambridge, England) 136: 3767–3777.
Smith LK, He Y, Park J-S, Bieri G, Snethlage CE, Lin K, Gontier G, Wabl R, Plambeck KE, Udeochu J, Wheatley EG, Bouchard J, Eggel A, Narasimha R, Grant JL, Luo J, Wyss-Coray T, Villeda SA. 2015. β2-microglobulin is a systemic pro-aging factor that impairs cognitive function and neurogenesis. Nature Medicine 21: 932–937.
Sotherton N, Tapper S, Smith A. 2009. Hen harriers and red grouse: economic aspects of red grouse shooting and the implications for moorland conservation. Journal of Applied Ecology 46: 955–960.
Spielman D, Brook B, Briscoe D, Frankham R. 2004. Does Inbreeding and Loss of Genetic Diversity Decrease Disease Resistance? Conservation Genetics 5: 439–448.
Stapane L, Le Roy N, Hincke MT, Gautron J. 2019. The glycoproteins EDIL3 and MFGE8 regulate vesicle-mediated eggshell calcification in a new model for avian biomineralization. Journal of Biological Chemistry 294: 14526–14545.
Stapane L, Le Roy N, Ezagal J, Rodriguez-Navarro AB, Labas V, Combes-Soia L, Hincke MT, Gautron J. 2020. Avian eggshell formation reveals a new paradigm for vertebrate mineralization via vesicular amorphous calcium carbonate. Journal of Biological Chemistry 295: 15853–15869.

Stein JV, Nombela-Arrieta C. 2005. Chemokine control of lymphocyte trafficking: a general overview. Immunology 116: 1–12.
Storey JD, Bass AJ, Dabney A, Robinson D. 2020. qvalue: Q-value estimation for false discovery rate control. R package version 2.22.0.
Stricker RLO, Behrens S-E, Mundt E. 2010. Nuclear Factor NF45 Interacts with Viral Proteins of Infectious Bursal Disease Virus and Inhibits Viral Replication. Journal of Virology 84: 10592–10605.
Takenoya F, Kageyama H, Shiba K, Date Y, Nakazato M, Shioda S. 2010. Neuropeptide W: a key player in the homeostatic regulation of feeding and energy metabolism? Annals of the New York Academy of Sciences 1200: 162–169.
Tallmon D, Luikart G, Waples R. 2004. The alluring simplicity and complex reality of genetic rescue. Trends in Ecology & Evolution 19: 489–496.
Tiwari A, Hadley J, Ramachandran R. 2014. Aquaporin 5 expression is altered in ovarian tumors and ascites-derived ovarian tumor cells in the chicken model of ovarian tumor. Journal of Ovarian Research 7: 99.
Toulza E, Mattiuzzo NR, Galliano M-F, Jonca N, Dossat C, Jacob D, de Daruvar A, Wincker P, Serre G, Guerrin M. 2007. Large-scale identification of human genes implicated in epidermal barrier function. Genome Biology 8: R107.
Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA. 2013. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Current Protocols in Bioinformatics 43(1110): 11.10.1–11.10.33.
Van der Valk T, Díez-del-Molino D, Marques-Bonet T, Guschanski K, Dalén L. 2019. Historical Genomes Reveal the Genomic Consequences of Recent Population Decline in Eastern Gorillas. Current Biology 29: 165–170.e6.
Wang JW, Xu SW. 2008. Effects of Cold Stress on the Messenger Ribonucleic Acid Levels of Corticotrophin-Releasing Hormone and Thyrotropin-Releasing Hormone in Hypothalami of Broilers. Poultry Science 87: 973–978.
Watts J, Drisaldi B, Ng V, Yang J, Strome R, Horne P, Sy M-S, Yoong L, Young R, Mastrangelo P, Bergeron C, Fraser P, Carlson G, Mount H, Schmitt-Ulms G, Westaway D. 2007. The CNS glycoprotein Shadoo has PrPC-like protective properties and displays reduced levels in prion infections. The EMBO Journal 26: 4038–4050.
Webster LMI, Paterson S, Mougeot F, Martinez-Padilla J, Piertney SB. 2011a. Transcriptomic response of red grouse to gastro-intestinal nematode parasites and testosterone: implications for population dynamics. Molecular Ecology 20: 920–931.
Webster LMI, Mello LV, Mougeot F, Martinez-Padilla J, Paterson S, Piertney SB. 2011b. Identification of genes responding to nematode infection in red grouse. Molecular Ecology Resources 11: 305–313.
Weir BS, Cockerham CC. 1984. Estimating F-Statistics for the Analysis of Population Structure. Evolution 38: 1358–1370.
Weiß CL, Schuenemann VJ, Devos J, Shirsekar G, Reiter E, Gould BA, Stinchcombe JR, Krause J, Burbano HA. 2016. Temporal patterns of damage and decay kinetics of DNA retrieved from plant herbarium specimens. Royal Society Open Science 3: 160239.
Wenzel MA, Piertney SB. 2014. Fine-scale population epigenetic structure in relation to gastrointestinal parasite load in red grouse (Lagopus lagopus scotica). Molecular Ecology 23: 4256–4273.
Wenzel MA, James MC, Douglas A, Piertney SB. 2015. Genome-wide association and genome partitioning reveal novel genomic regions underlying variation in gastrointestinal nematode burden in a wild bird. Molecular Ecology 24: 4175–4192.

Wenzel MA, Piertney SB. 2015. Digging for gold nuggets: uncovering novel candidate genes for variation in gastrointestinal nematode burden in a wild bird species. Journal of Evolutionary Biology 28: 807–825.
Wenzel MA, Douglas A, James MC, Redpath SM, Piertney SB. 2016. The role of parasite-driven selection in shaping landscape genomic structure in red grouse (Lagopus lagopus scotica). Molecular Ecology 25: 324–341.
Whenham N, Lu TC, Maidin MBM, Wilson PW, Bain MM, Stevenson ML, Stevens MP, Bedford MR, Dunn IC. 2015. Ovodefensins, an Oviduct-Specific Antimicrobial Gene Family, Have Evolved in Birds and Reptiles to Protect the Egg by Both Sequence and Intra-Six-Cysteine Sequence Motif Spacing. Biology of Reproduction 92(6): 154, 1–13.
Whiteley AR, Fitzpatrick SW, Funk WC, Tallmon DA. 2015. Genetic rescue to the rescue. Trends in Ecology & Evolution 30: 42–49.
Wickham H. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
Wijkstrom-Frei C, El-Chemaly S, Ali-Rachedi R, Gerson C, Cobas MA, Forteza R, Salathe M, Conner GE. 2003. Lactoperoxidase and human airway host defense. American Journal of Respiratory Cell and Molecular Biology 29: 206–212.
Winchester C, Ferrier R, Sermoni A, Clark B, Johnson K. 1999. Characterization of the expression of DMPK and SIX5 in the human eye and implications for pathogenesis in myotonic dystrophy. Human Molecular Genetics 8: 481–492.
Wright OR. 1969. Summary of Research on the Selection Interview Since 1964. Personnel Psychology 22: 391–413.
Wu ATH, Sutovsky P, Manandhar G, Xu W, Katayama M, Day BN, Park K-W, Yi Y-J, Xi YW, Prather RS, Oko R. 2007. PAWP, a Sperm-specific WW Domain-binding Protein, Promotes Meiotic Resumption and Pronuclear Development during Fertilization. Journal of Biological Chemistry 282: 12164–12175.
Wu P, Ng CS, Yan J, Lai Y-C, Chen C-K, Lai Y-T, Wu S-M, Chen J-J, Luo W, Widelitz RB, Li W-H, Chuong C-M. 2015. Topographical mapping of α- and β-keratins on developing chicken skin integuments: Functional interaction and evolutionary perspectives. Proceedings of the National Academy of Sciences 112: E6770–E6779.
Xue Y, Shen SQ, Jui J, Rupp AC, Byrne LC, Hattar S, Flannery JG, Corbo JC, Kefalov VJ. 2015. CRALBP supports the mammalian retinal visual cycle and cone vision. The Journal of Clinical Investigation 125: 727–738.
Yang L, He T, Xiong F, Chen X, Fan X, Jin S, Geng Z. 2020. Identification of key genes and pathways associated with feed efficiency of native chickens based on transcriptome data via bioinformatics analysis. BMC Genomics 21: 292.
Ye Y, Wei X, Sha Y, Li N, Yan X, Cheng L, Qiao D, Zhou W, Wu R, Liu Q, Li Y. 2020. Loss-of-function mutation in TSGA10 causes acephalic spermatozoa phenotype in human. Molecular Genetics & Genomic Medicine 8: e1284.
Yoshida A, Choi J, Jin HR, Li Y, Bajpai S, Qie S, Diehl JA. 2021. Fbxl8 suppresses lymphoma growth and hematopoietic transformation through degradation of cyclin D3. Oncogene 40: 292–306.
Yu C, Liu Q, Qin A, Hu X, Xu W, Qian K, Shao H, Jin W. 2013. Expression kinetics of chicken β2-microglobulin and Class I MHC in vitro and in vivo during Marek’s disease viral infections. Veterinary Research Communications 37: 277–283.
Zhu X, Craft CM. 2000. Modulation of CRX Transactivation Activity by Phosducin Isoforms. Molecular and Cellular Biology 20: 5216–5226.

SUPPLEMENTARY INFORMATION

Figure S1. (a) Scree plot showing the variance explained by the first 10 PCs including all samples (Recent, Museum and English) using pcadapt. (b) PCA score plot showing PC3 and PC4 for English and all Irish samples.

Figure S2. (a) Scree plot showing the variance explained by the first 10 PCs for the comparison of contemporary Irish and English samples. Score plots for (b) PC1 and PC2 and (c) PC3 and PC4. (d) Q-Q plot of -log10(p-values) for the SNPs, (e) Manhattan plot of -log10(p-values) and (f) histogram of the frequency of p-values.

Table S1. Information on samples.

| Sample ID | Museum ID | Location | Sample type | Sample year* | Mean coverage (x) |
|---|---|---|---|---|---|
| IreScot1 | - | Clare | Feather | 2006/2007 | 21.0 |
| IreScot2 | - | Monaghan | Feather | 2007 | 22.4 |
| IreScot3 | - | Cork | Feather | 2007 | 21.9 |
| IreScot5 | - | Roscommon | Feather | 2007 | 23.3 |
| IreScot6 | - | Knockmealdowns, Tipperary | Feather | 2020 | 21.1 |
| IreScot9 | - | Wicklow | Muscle | 2020 | 20.9 |
| MusScot1 | 1881.590.1 | Sligo | Toepad | 1881 | 23.7 |
| MusScot2 | 2003.30.9 | Tipperary | Toepad | 1881 | 23.7 |
| MusScot3 | 1881.589.1 | Offaly | Toepad | 1882 | 29.0 |
| MusScot4 | 2003.30.17 | Monaghan | Toepad | 1881 | 23.6 |
| MusScot5 | 1881.594 | Kerry | Toepad | 1881 | 23.8 |
| MusScot6 | 2003.30.27 | Wicklow | Toepad | 1881 | 23.7 |
| MusScot7 | 2003.30.23 | Limerick | Toepad | c. 1881 | 24.0 |
| MusScot9 | 2003.30.24 | Knockmealdowns, Waterford | Toepad | 1881 | 19.7 |
| Scot1 | - | Feetham, Yorkshire Dales, UK | Liver | c. 1881-82 | 26.5 |
| Scot2 | - | Feetham, Yorkshire Dales, UK | Liver | 1882 | 26.9 |
| Scot3 | - | Feetham, Yorkshire Dales, UK | Liver | 2013 | 30.3 |
| Scot4 | - | Feetham, Yorkshire Dales, UK | Liver | 2013 | 26.3 |
| Scot5 | - | Gunnerside, Yorkshire Dales, UK | Liver | 2013 | 29.6 |
| Scot6 | - | Gunnerside, Yorkshire Dales, UK | Spleen | 2013 | 26.8 |
| Scot7 | - | Gunnerside, Yorkshire Dales, UK | Spleen | 2013 | 27.2 |
| Scot8 | - | Gunnerside, Yorkshire Dales, UK | Liver | 2013 | 27.4 |
| Scot9 | - | Gunnerside, Yorkshire Dales, UK | Liver | 2013 | 28.6 |

Table S2. Outlier genes that were recorded by both outlier methods.

| Chromosome | ZFST | Ensembl Gene ID | Gene name | Description | Source |
|---|---|---|---|---|---|
| 10 | 7.4 | ENSGALG00000002097 | LOC415324 | epididymal protein-like | - |
| 27 | 6.97 | ENSGALG00000011485 | HCRT | Energy balance and potentially food intake | Miranda et al. 2013 |
| 1 | 6.03 | ENSGALG00000013422 | RHNO1 | Down-regulated in efficient residual feed intake birds | Yang et al. 2020 |
| 7 | 5.8 | ENSGALG00000006217 | S100B | Involved in chicken skeletal muscle development | Acruri et al. 2002 |
| 11 | 5.84 | ENSGALG00000003273 | HSDL1 | Maintenance of secondary sexual characteristics and sex differentiation by metabolising hormones | Gloux et al. 2019 |
| 33 | 5.60 | ENSGALG00000047132 | LOC112529929 | keratin, type II cytoskeletal 4-like | - |
| 33 | 4.86 | ENSGALG00000042837 | LOC100858942 | glucagon-like | - |
| 11 | 4.68 | ENSGALG00000003201 | FBXL8 | Tumour suppressor in humans | Yoshida et al. 2021 |
| 1 | 3.98 | ENSGALG00000011798 | HEBP1 | Heme-binding protein 1 | - |
| 1 | 3.98 | ENSGALG00000011799 | HEBP1 | Heme-binding protein 1 | - |
| 1 | 3.67 | ENSGALG00000025906 | HS6ST3 | heparan sulfate 6-O-sulfotransferase 3 | - |
| 33 | 3.60 | ENSGALG00000030629 | KRT6A | Associated with frizzle feather | Guo X et al. 2018 |
| 33 | 3.60 | ENSGALG00000035972 | KRT75 | Role in hair and nail formation and responsible for frizzle chicken condition | Ng et al. 2012 |
| 33 | 3.26 | ENSGALG00000044875 | KRT75L4 | Frizzle chicken condition in Chinese indigenous chicken | Dong et al. 2018 |
| 8 | 3.23 | ENSGALG00000027608 | PIGC | phosphatidylinositol glycan anchor biosynthesis class C | - |
| 1 | 3.19 | ENSGALG00000032645 | HIST1H2A4L3 | Major component of nucleosome. | - |
| 1 | 3.19 | ENSGALG00000052649 | HIST1H2B5 | Major component of nucleosome. | - |

Table S3. Outlier genes identified as putatively under divergent selection using genome-wide FST outlier scans between Irish and English red grouse. Colour signifies associated function: green = immune, orange = pigmentation, pink = interesting/relevant functions and blank = unassigned.

| Chromosome | ZFST | Ensembl Gene ID | NCBI Accession number | Gene description | Gene name |
|---|---|---|---|---|---|
| 6 | 9.52 | ENSGALG00000053549 | 423672 | ADAM metallopeptidase domain 8 | ADAM8 |
| 6 | 8.40 | ENSGALG00000053549 | 423672 | ADAM metallopeptidase domain 8 | ADAM8 |
| 6 | 7.89 | ENSGALG00000053549 | 423672 | ADAM metallopeptidase domain 8 | ADAM8 |
| 1 | 3.34 | ENSGALG00000015795 | 427971 | ADAM metallopeptidase with thrombospondin type 1 motif 5 | ADAMTS5 |
| 33 | 4.46 | ENSGALG00000041878 | 431305 | aquaporin 5 | AQP5 |
| 33 | 3.65 | ENSGALG00000041878 | 431305 | aquaporin 5 | AQP5 |
| 33 | 3.46 | ENSGALG00000041878 | 431305 | aquaporin 5 | AQP5 |
| 10 | 3.28 | ENSGALG00000002160 | 414830 | beta-2-microglobulin | B2M |
| 19 | 3.76 | ENSGALG00000034478 | 395468 | C-C motif chemokine ligand 4 | CCL4 |
| 19 | 3.76 | ENSGALG00000043603 | 417465 | chemokine (C-C motif) ligand 5 | CCL5 |
| 8 | 3.51 | ENSGALG00000005619 | 429084 | coagulation factor III, tissue factor | F3 |
| 31 | 4.04 | ENSGALG00000047283 | 426819 | deleted in malignant brain tumors 1 | DMBT1 |
| 22 | 5.57 | ENSGALG00000039187 | 419515 | gastrokine 2 | GKN2 |
| 22 | 3.07 | ENSGALG00000039187 | 419515 | gastrokine 2 | GKN2 |
| 1 | 4.42 | ENSGALG00000014442 | 374193 | glyceraldehyde-3-phosphate dehydrogenase | GAPDH |
| 2 | 3.00 | ENSGALG00000010949 | 428431 | glycoprotein nmb | GPNMB |
| 1 | 3.98 | ENSGALG00000011798 | 417961 | heme binding protein 1 | HEBP1 |
| 1 | 3.98 | ENSGALG00000011799 | 417961 | heme binding protein 1 | HEBP1 |
| 1 | 3.19 | ENSGALG00000027571 | 770188 | histone cluster 1, H2B-VII-like 2 (similar to human histone cluster 1, class H2B, member N) | HIST1H2B7L4 |
| 1 | 3.19 | ENSGALG00000050309 | 100858607 | histone cluster 1, H2bo | HIST1H2BO |
| 1 | 3.19 | ENSGALG00000051251 | 770267 | histone cluster 1, H2B-VII-like 4 (similar to human histone cluster 1, class H2B, member N) | HIST1H2B7L4 |
| 12 | 7.60 | ENSGALG00000002138 | 415908 | hyaluronoglucosaminidase 2 | HYAL2 |
| 12 | 5.55 | ENSGALG00000002138 | 415908 | hyaluronoglucosaminidase 2 | HYAL2 |
| 7 | 5.62 | ENSGALG00000007511 | 396181 | integrin subunit beta 2 | ITGB2 |
| 7 | 4.91 | ENSGALG00000007511 | 396181 | integrin subunit beta 2 | ITGB2 |
| 7 | 4.83 | ENSGALG00000007511 | 396181 | integrin subunit beta 2 | ITGB2 |
| 12 | 4.70 | ENSGALG00000008407 | 416118 | interleukin 1 receptor associated kinase 2 | IRAK2 |
| 33 | 3.60 | ENSGALG00000030629 | 408041 | keratin 6A | KRT6A |
| 33 | 3.06 | ENSGALG00000030629 | 408041 | keratin 6A | KRT6A |
| 33 | 3.01 | ENSGALG00000030629 | 408041 | keratin 6A | KRT6A |

| 19 | 3.76 | ENSGALG00000039554 | 417466 | lactoperoxidase | LPO |
| 6 | 9.20 | ENSGALG00000027165 | 423668 | leukocyte ribonuclease A-2 | RSFR |
| 6 | 7.56 | ENSGALG00000027165 | 423668 | leukocyte ribonuclease A-2 | RSFR |
| 6 | 7.41 | ENSGALG00000027165 | 423668 | leukocyte ribonuclease A-2 | RSFR |
| 17 | 5.35 | ENSGALG00000008952 | 417282 | NADPH oxidase activator 1 | NOXA1 |
| 17 | 3.21 | ENSGALG00000027231 | | NOTCH regulated ankyrin repeat protein | |
| 10 | 3.56 | ENSGALG00000039164 | 415336 | paracaspase 3 | PCASP3 |
| 4 | 3.67 | ENSGALG00000010122 | 422487 | SH2 domain containing 4A | SH2D4A |
| 27 | 7.95 | ENSGALG00000003267 | 420027 | signal transducer and activator of transcription 3 | STAT3 |
| 27 | 6.97 | ENSGALG00000003267 | 420027 | signal transducer and activator of transcription 3 | STAT3 |
| 27 | 5.20 | ENSGALG00000003267 | 420027 | signal transducer and activator of transcription 3 | STAT3 |
| 10 | 3.37 | ENSGALG00000030672 | 415330 | thyroid hormone receptor interactor 4 | TRIP4 |
| 26 | 4.05 | ENSGALG00000039878 | 419922 | transcription factor EB | TFEB |
| 12 | 7.60 | ENSGALG00000041338 | 415910 | tumor suppressor candidate 2 | TUSC2 |
| 12 | 5.55 | ENSGALG00000041338 | 415910 | tumor suppressor candidate 2 | TUSC2 |
| 27 | 6.30 | ENSGALG00000047812 | 107055259 | feather keratin 2-like | LOC112530353 |
| 27 | 6.30 | ENSGALG00000052795 | 112530354 | feather keratin 3-like | LOC107055272 |
| 27 | 6.30 | ENSGALG00000054042 | 112530353 | feather keratin 3-like | LOC107055272 |
| 27 | 6.30 | ENSGALG00000051293 | 107055272 | feather keratin 3-like | LOC112530353 |
| 27 | 6.30 | ENSGALG00000054373 | 100859500 | feather keratin 3-like | LOC770940 |
| 27 | 3.81 | ENSGALG00000052287 | 107055227 | feather keratin Cos1-1/Cos1-3/Cos2-1-like | LOC107055236 |
| 11 | 5.81 | ENSGALG00000003273 | 415703 | hydroxysteroid dehydrogenase like 1 | HSDL1 |
| 11 | 5.82 | ENSGALG00000003273 | 415703 | hydroxysteroid dehydrogenase like 1 | HSDL1 |
| 33 | 5.61 | ENSGALG00000030002 | 101749333 | keratin 18 | KRT18 |
| 33 | 3.06 | ENSGALG00000032672 | 407779 | keratin 5 | KRT5 |
| 33 | 3.60 | ENSGALG00000032672 | 407779 | keratin 5 | KRT5 |
| 33 | 3.01 | ENSGALG00000035972 | 408042 | keratin 75 | KRT75 |
| 33 | 3.60 | ENSGALG00000035972 | 408042 | keratin 75 | KRT75 |
| 33 | 3.06 | ENSGALG00000044875 | 100858302 | keratin, type II cytoskeletal 75-like 4 | KRT75L4 |
| 33 | 3.26 | ENSGALG00000044875 | 100858302 | keratin, type II cytoskeletal 75-like 4 | KRT75L4 |
| 6 | 10.26 | ENSGALG00000046392 | 770890 | lipase member M-like 1 | LIPML1 |
| 6 | 10.64 | ENSGALG00000046392 | 770890 | lipase member M-like 1 | LIPML1 |
| 6 | 10.65 | ENSGALG00000046392 | 770890 | lipase member M-like 1 | LIPML1 |
| 6 | 8.25 | ENSGALG00000045194 | 770883 | lipase member M-like 2 | LIPML2 |
| 6 | 10.26 | ENSGALG00000045194 | 770883 | lipase member M-like 2 | LIPML2 |
| 6 | 10.64 | ENSGALG00000045194 | 770883 | lipase member M-like 2 | LIPML2 |
| 6 | 8.25 | ENSGALG00000003557 | 770870 | lipase member M-like 3 | LIPML3 |
| 6 | 10.26 | ENSGALG00000003557 | 770870 | lipase member M-like 3 | LIPML3 |

| 6 | 10.30 | ENSGALG00000003557 | 770870 | lipase member M-like 3 | LIPML3 |
| 8 | 3.05 | ENSGALG00000032319 | 429086 | retinol dehydrogenase 8 (all-trans) | RDH8L |
| 8 | 3.99 | ENSGALG00000032319 | 429086 | retinol dehydrogenase 8 (all-trans) | RDH8L |
| 27 | 3.18 | ENSGALG00000028031 | 430993 | complement C1q like 1 | C1QL1 |
| 19 | 5.48 | ENSGALG00000039863 | 427073 | DEAD-box helicase 25 | DDX25 |
| 14 | 6.36 | ENSGALG00000002350 | 427004 | dynein axonemal heavy chain 3 | DNAH3 |
| 14 | 3.01 | ENSGALG00000003391 | 395360 | eukaryotic translation initiation factor 2 alpha kinase 1 | EIF2AK1 |
| 12 | 4.70 | ENSGALG00000008411 | 408185 | ghrelin/obestatin prepropeptide | GHRL |
| 17 | 3.92 | ENSGALG00000008898 | 404296 | glutamate ionotropic receptor NMDA type subunit 1 | GRIN1 |
| 17 | 6.55 | ENSGALG00000008898 | 404296 | glutamate ionotropic receptor NMDA type subunit 1 | GRIN1 |
| 27 | 4.51 | ENSGALG00000011485 | 374005 | hypocretin neuropeptide precursor | HCRT |
| 27 | 6.97 | ENSGALG00000011485 | 374005 | hypocretin neuropeptide precursor | HCRT |
| 8 | 4.13 | ENSGALG00000030417 | 429078 | phosducin | PDC |
| 8 | 4.65 | ENSGALG00000030417 | 429078 | phosducin | PDC |
| 8 | 4.67 | ENSGALG00000030417 | 429078 | phosducin | PDC |
| 27 | 3.18 | ENSGALG00000045212 | 428279 | phospholipase C delta 3 | PLCD3 |
| 3 | 3.42 | ENSGALG00000004293 | 421580 | plasminogen | PLG |
| 6 | 7.89 | ENSGALG00000029823 | | Shadow of prion protein | SPRN |
| 6 | 8.25 | ENSGALG00000029823 | | Shadow of prion protein | SPRN |
| 6 | 10.26 | ENSGALG00000029823 | | Shadow of prion protein | SPRN |
| 6 | 10.30 | ENSGALG00000029823 | | Shadow of prion protein | SPRN |
| 28 | 3.02 | ENSGALG00000000430 | 420056 | signal peptide peptidase like 2C | SPPL2C |
| 28 | 3.88 | ENSGALG00000000430 | 420056 | signal peptide peptidase like 2C | SPPL2C |
| 26 | 3.03 | ENSGALG00000001968 | 428270 | synaptotagmin 6 | SYT6 |
| 1 | 3.41 | ENSGALG00000016757 | 418696 | testis specific 10 | TSGA10 |
| 12 | 4.26 | ENSGALG00000008490 | 414344 | thyrotropin releasing hormone | TRH |
| 17 | 3.72 | ENSGALG00000008925 | 112529981 | transmembrane protein 203 | TMEM203 |
| 17 | 4.94 | ENSGALG00000008925 | 112529981 | transmembrane protein 203 | TMEM203 |
| 2 | 3.00 | ENSGALG00000046897 | | | . |
| 14 | 3.01 | ENSGALG00000003378 | | | . |
| 14 | 3.01 | ENSGALG00000049190 | | | . |
| 3 | 3.05 | ENSGALG00000051018 | | | . |
| 6 | 3.13 | ENSGALG00000005613 | | | . |
| 14 | 3.13 | ENSGALG00000003378 | | | . |
| 14 | 3.13 | ENSGALG00000049190 | | | . |
| 4 | 3.16 | ENSGALG00000054035 | | | . |
| 1 | 3.18 | ENSGALG00000049532 | | | . |
| 1 | 3.19 | ENSGALG00000052402 | | | . |

| 8 | 3.26 | ENSGALG00000047029 | | | . |
| 8 | 3.30 | ENSGALG00000048056 | | | . |
| 7 | 3.37 | ENSGALG00000050048 | | | . |
| 4 | 3.43 | ENSGALG00000029286 | | | . |
| 1 | 3.50 | ENSGALG00000049532 | | | . |
| 14 | 3.50 | ENSGALG00000042810 | | | . |
| 1 | 3.50 | ENSGALG00000046911 | | | . |
| 1 | 3.50 | ENSGALG00000051313 | | | . |
| 4 | 3.50 | ENSGALG00000044816 | | | . |
| 5 | 3.64 | ENSGALG00000053204 | | | . |
| 2 | 3.66 | ENSGALG00000050611 | | | . |
| 7 | 3.69 | ENSGALG00000046256 | | | . |
| 7 | 3.69 | ENSGALG00000050089 | | | . |
| 7 | 3.69 | ENSGALG00000051229 | | | . |
| 7 | 3.69 | ENSGALG00000050048 | | | . |
| 11 | 3.73 | ENSGALG00000041408 | | | . |
| 27 | 3.81 | ENSGALG00000052680 | | | . |
| 13 | 3.86 | ENSGALG00000052502 | | | . |
| 1 | 3.90 | ENSGALG00000050991 | | | . |
| 28 | 3.93 | ENSGALG00000047550 | | | . |
| 1 | 3.98 | ENSGALG00000011796 | | | . |
| 4 | 4.00 | ENSGALG00000054035 | | | . |
| 28 | 4.07 | ENSGALG00000047550 | | | . |
| 2 | 4.08 | ENSGALG00000048735 | | | . |
| 2 | 4.09 | ENSGALG00000050611 | | | . |
| 1 | 4.14 | ENSGALG00000052140 | | | . |
| 2 | 4.22 | ENSGALG00000047510 | | | . |
| 2 | 4.22 | ENSGALG00000046824 | | | . |
| 2 | 4.22 | ENSGALG00000052157 | | | . |
| 7 | 4.23 | ENSGALG00000046256 | | | . |
| 1 | 4.33 | ENSGALG00000051865 | | | . |
| 9 | 4.37 | ENSGALG00000031071 | | | . |
| 14 | 4.51 | ENSGALG00000042810 | | | . |
| 28 | 4.52 | ENSGALG00000047550 | | | . |
| 7 | 4.70 | ENSGALG00000046256 | | | . |
| 14 | 4.76 | ENSGALG00000052056 | | | . |
| 7 | 4.77 | ENSGALG00000050048 | | | . |
| 2 | 4.80 | ENSGALG00000050611 | | | . |

| 2 | 4.85 | ENSGALG00000046897 | | | . |
| 14 | 4.92 | ENSGALG00000042810 | | | . |
| 14 | 5.09 | ENSGALG00000053304 | | | . |
| 14 | 5.09 | ENSGALG00000031384 | | | . |
| 14 | 5.32 | ENSGALG00000042810 | | | . |
| 19 | 5.48 | ENSGALG00000047563 | | | . |
| 1 | 5.59 | ENSGALG00000048907 | | | . |
| 8 | 5.93 | ENSGALG00000048056 | | | . |
| 14 | 5.99 | ENSGALG00000053304 | | | . |
| 14 | 5.99 | ENSGALG00000031384 | | | . |
| 9 | 6.09 | ENSGALG00000031071 | | | . |
| 27 | 6.30 | ENSGALG00000054001 | | | . |
| 2 | 6.31 | ENSGALG00000048735 | | | . |
| 2 | 6.42 | ENSGALG00000046897 | | | . |
| 8 | 6.49 | ENSGALG00000048056 | | | . |
| 18 | 6.55 | ENSGALG00000029660 | | | . |
| 4 | 6.94 | ENSGALG00000053520 | | | . |
| 13 | 6.98 | ENSGALG00000036310 | | | . |
| 8 | 7.39 | ENSGALG00000053885 | | | . |
| 4 | 7.80 | ENSGALG00000044816 | | | . |
| 13 | 8.41 | ENSGALG00000036310 | | | . |
| 8 | 8.54 | ENSGALG00000053885 | | | . |
| 13 | 8.60 | ENSGALG00000052502 | | | . |
| 3 | 3.03 | ENSGALG00000051378 | | | . |
| 33 | 3.06 | ENSGALG00000043689 | | | . |
| 1 | 3.19 | ENSGALG00000046904 | | | . |
| 26 | 3.60 | ENSGALG00000031807 | | | . |
| 1 | 3.67 | ENSGALG00000041311 | | | . |
| 8 | 3.80 | ENSGALG00000006091 | | | . |
| 27 | 3.81 | ENSGALG00000048288 | | | . |
| 27 | 3.81 | ENSGALG00000047069 | | | . |
| 19 | 4.37 | ENSGALG00000042819 | | | . |
| 9 | 4.37 | ENSGALG00000046790 | | | . |
| 9 | 6.09 | ENSGALG00000046790 | | | . |
| 19 | 3.76 | ENSGALG00000044527 | 770506 | 1-acylglycerol-3-phosphate O-acyltransferase 1 | AGPAT1 |
| 27 | 4.51 | ENSGALG00000003345 | 420029 | 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase delta-4-like | PIBPPDD4L |
| 27 | 6.97 | ENSGALG00000003345 | 420029 | 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase delta-4-like | PIBPPDD4L |
| 17 | 3.46 | ENSGALG00000045310 | 427775 | 3-ketosteroid-9-alpha-hydroxylase oxygenase subunit-like | LOC427775 |

| 17 | 3.92 | ENSGALG00000045310 | 427775 | 3-ketosteroid-9-alpha-hydroxylase oxygenase subunit-like | LOC427775 |
| 17 | 4.01 | ENSGALG00000045310 | 427775 | 3-ketosteroid-9-alpha-hydroxylase oxygenase subunit-like | LOC427775 |
| 6 | 6.18 | ENSGALG00000003689 | 423678 | 3'-phosphoadenosine 5'-phosphosulfate synthase 2 | PAPSS2 |
| 6 | 6.74 | ENSGALG00000003689 | 423678 | 3'-phosphoadenosine 5'-phosphosulfate synthase 2 | PAPSS2 |
| 6 | 7.70 | ENSGALG00000003689 | 423678 | 3'-phosphoadenosine 5'-phosphosulfate synthase 2 | PAPSS2 |
| 1 | 3.01 | ENSGALG00000029746 | 427914 | 5'-nucleotidase domain containing 3 | NT5DC3 |
| 14 | 5.09 | ENSGALG00000002200 | 426999 | acyl-CoA synthetase medium-chain family member 3 | ACSM3 |
| 14 | 5.30 | ENSGALG00000002200 | 426999 | acyl-CoA synthetase medium-chain family member 3 | ACSM3 |
| 14 | 5.99 | ENSGALG00000002200 | 426999 | acyl-CoA synthetase medium-chain family member 3 | ACSM3 |
| 10 | 3.97 | ENSGALG00000035202 | 769343 | adaptor related protein complex 3 beta 2 subunit | AP3B2 |
| 10 | 7.41 | ENSGALG00000035202 | 769343 | adaptor related protein complex 3 beta 2 subunit | AP3B2 |
| 2 | 4.09 | ENSGALG00000027757 | 420589 | ADP ribosylation factor like GTPase 4A | ARL4A |
| 10 | 3.59 | ENSGALG00000037950 | 768936 | alpha kinase 3 | ALPK3 |
| 17 | 3.21 | ENSGALG00000030255 | 770795 | alpha-1-microglobulin/bikunin precursor | AMBP |
| 17 | 3.61 | ENSGALG00000030255 | 770795 | alpha-1-microglobulin/bikunin precursor | AMBP |
| 14 | 3.01 | ENSGALG00000003400 | 769030 | aminoacyl tRNA synthetase complex interacting multifunctional protein 2 | AIMP2 |
| 4 | 3.32 | ENSGALG00000010866 | 428752 | amphiregulin | AREG |
| 13 | 3.86 | ENSGALG00000001132 | 101750838 | amyloid beta (A4) precursor protein-binding, family B, member 3 | APBB3 |
| 13 | 8.60 | ENSGALG00000001132 | 101750838 | amyloid beta (A4) precursor protein-binding, family B, member 3 | APBB3 |
| 19 | 4.37 | ENSGALG00000041502 | 395773 | angiopoietin-related protein 1-like | ANGPT1L |
| 14 | 3.01 | ENSGALG00000051639 | 101752127 | ankyrin repeat domain 61 | ANKRD61 |
| 9 | 3.17 | ENSGALG00000041501 | 100859787 | APC membrane recruitment protein 3 | AMER3 |
| 9 | 4.88 | ENSGALG00000041501 | 100859787 | APC membrane recruitment protein 3 | AMER3 |
| 9 | 4.89 | ENSGALG00000041501 | 100859787 | APC membrane recruitment protein 3 | AMER3 |
| 1 | 3.19 | ENSGALG00000053511 | 769889 | apolipoprotein L domain containing 1 | APOLD1 |
| 33 | 3.46 | ENSGALG00000037878 | 431304 | aquaporin 2 | AQP2 |
| 33 | 3.65 | ENSGALG00000037878 | 431304 | aquaporin 2 | AQP2 |
| 33 | 4.46 | ENSGALG00000037878 | 431304 | aquaporin 2 | AQP2 |
| 33 | 5.37 | ENSGALG00000037878 | 431304 | aquaporin 2 | AQP2 |
| 8 | 3.44 | ENSGALG00000005647 | 424487 | ATP binding cassette subfamily D member 3 | ABCD3 |
| 8 | 3.51 | ENSGALG00000005647 | 424487 | ATP binding cassette subfamily D member 3 | ABCD3 |
| 6 | 5.15 | ENSGALG00000003652 | 423676 | ATPase family, AAA domain containing 1 | ATAD1 |
| 6 | 6.18 | ENSGALG00000003652 | 423676 | ATPase family, AAA domain containing 1 | ATAD1 |
| 6 | 6.74 | ENSGALG00000003652 | 423676 | ATPase family, AAA domain containing 1 | ATAD1 |
| 10 | 3.15 | ENSGALG00000001798 | 415318 | Bardet-Biedl syndrome 4 | BBS4 |
| 33 | 4.65 | ENSGALG00000043400 | 431303 | BCDIN3 domain containing RNA methyltransferase | BCDIN3D |
| 33 | 5.37 | ENSGALG00000043400 | 431303 | BCDIN3 domain containing RNA methyltransferase | BCDIN3D |
| 33 | 6.19 | ENSGALG00000043400 | 431303 | BCDIN3 domain containing RNA methyltransferase | BCDIN3D |

| 19 | 4.37 | ENSGALG00000036647 | 100857890 | BCL tumor suppressor 7B | BCL7B |
| 6 | 7.41 | ENSGALG00000032838 | 395651 | beta-keratin related protein | BKJ |
| 6 | 7.56 | ENSGALG00000032838 | 395651 | beta-keratin related protein | BKJ |
| 26 | 3.60 | ENSGALG00000052763 | 419932 | BTG anti-proliferation factor 2 | BTG2 |
| 26 | 4.22 | ENSGALG00000052763 | 419932 | BTG anti-proliferation factor 2 | BTG2 |
| 1 | 4.42 | ENSGALG00000028638 | 100499455 | C-type natriuretic peptide 1 | CNP1 |
| 26 | 3.03 | ENSGALG00000037744 | 395985 | calcium voltage-gated channel subunit alpha1 S | CACNA1S |
| 4 | 4.17 | ENSGALG00000030209 | 395266 | carboxypeptidase Z | CPZ |
| 17 | 7.09 | ENSGALG00000053517 | 417269 | cell division cycle 26 | CDC26 |
| 2 | 3.07 | ENSGALG00000035978 | 420608 | cell division cycle associated 7 like | CDCA7L |
| 28 | 5.10 | ENSGALG00000043352 | 420050 | ceramide synthase 4 | CERS4 |
| 1 | 3.83 | ENSGALG00000030933 | 772057 | chromosome 1 open reading frame, human C12orf73 | C1H12ORF73 |
| 12 | 3.66 | ENSGALG00000002240 | 415914 | chromosome 12 C3orf18 homolog | C3orf18 |
| 12 | 4.95 | ENSGALG00000002240 | 415914 | chromosome 12 C3orf18 homolog | C3orf18 |
| 5 | 3.15 | ENSGALG00000036504 | 423524 | chromosome 5 open reading frame, human C14orf39 | C14ORF39 |
| 7 | 4.91 | ENSGALG00000047637 | 424040 | chromosome 7 C21orf58 homolog | C7H21orf58 |
| 7 | 5.62 | ENSGALG00000047637 | 424040 | chromosome 7 C21orf58 homolog | C7H21orf58 |
| 7 | 7.13 | ENSGALG00000047637 | 424040 | chromosome 7 C21orf58 homolog | C7H21orf58 |
| 8 | 4.65 | ENSGALG00000005080 | 424455 | chromosome 8 C1orf27 homolog | C8H1orf27 |
| 4 | 3.43 | ENSGALG00000009128 | 772269 | chromosome X open reading frame 40B | CXorf40B |
| 33 | 7.46 | ENSGALG00000034825 | 425200 | chymotrypsin like elastase family member 1 | CELA1 |
| 9 | 3.58 | ENSGALG00000038964 | 424768 | claudin 15 | CLDN15 |
| 9 | 4.00 | ENSGALG00000038964 | 424768 | claudin 15 | CLDN15 |
| 2 | 4.32 | ENSGALG00000053276 | 426291 | coiled-coil domain containing 166 | CCDC166 |
| 8 | 4.72 | ENSGALG00000005889 | 424497 | coiled-coil domain containing 18 | CCDC18 |
| 9 | 4.37 | ENSGALG00000006714 | 424880 | coiled-coil domain containing 61 | CCDC61 |
| 9 | 6.09 | ENSGALG00000006714 | 424880 | coiled-coil domain containing 61 | CCDC61 |
| 28 | 3.66 | ENSGALG00000000337 | | cortexin 1 | |
| 28 | 4.08 | ENSGALG00000000337 | | cortexin 1 | |
| 28 | 4.86 | ENSGALG00000000337 | | cortexin 1 | |
| 12 | 4.70 | ENSGALG00000008420 | 416121 | cullin associated and neddylation dissociated 2 (putative) | CAND2 |
| 6 | 9.20 | ENSGALG00000003249 | 101749626 | CWF19-like 1, cell cycle control (S. pombe) | CWF19L1 |
| 12 | 5.55 | ENSGALG00000050390 | 100857844 | cytochrome b561 family member D2 | CYB561D2 |
| 12 | 7.60 | ENSGALG00000050390 | 100857844 | cytochrome b561 family member D2 | CYB561D2 |
| 14 | 4.62 | ENSGALG00000002267 | 427002 | defective in cullin neddylation 1 domain containing 3 | DCUN1D3 |
| 27 | 3.18 | ENSGALG00000000870 | 419967 | dephospho-CoA kinase domain containing | DCAKD |
| 17 | 6.55 | ENSGALG00000008873 | 417275 | DNA polymerase epsilon 3, accessory subunit | POLE3 |
| 8 | 3.05 | ENSGALG00000005858 | 424496 | down-regulator of transcription 1 | DR1 |

| 8 | 3.99 | ENSGALG00000005858 | 424496 | down-regulator of transcription 1 | DR1 |
| 9 | 3.58 | ENSGALG00000000399 | 430912 | dual specificity phosphatase 28 | DUSP28 |
| 9 | 4.00 | ENSGALG00000000399 | 430912 | dual specificity phosphatase 28 | DUSP28 |
| 11 | 4.68 | ENSGALG00000003258 | | dynein axonemal assembly factor 1 | |
| 11 | 5.81 | ENSGALG00000003258 | | dynein axonemal assembly factor 1 | |
| 17 | 3.72 | ENSGALG00000021274 | 427778 | ectonucleoside triphosphate diphosphohydrolase 1 | LOC427778 |
| 17 | 4.94 | ENSGALG00000021274 | 427778 | ectonucleoside triphosphate diphosphohydrolase 1 | LOC427778 |
| 17 | 5.35 | ENSGALG00000021274 | 427778 | ectonucleoside triphosphate diphosphohydrolase 1 | LOC427778 |
| 17 | 4.94 | ENSGALG00000008932 | 374095 | ectonucleoside triphosphate diphosphohydrolase 8 | ENTPD8 |
| 17 | 5.35 | ENSGALG00000008932 | 374095 | ectonucleoside triphosphate diphosphohydrolase 8 | ENTPD8 |
| 17 | 3.72 | ENSGALG00000008936 | 417281 | ectonucleoside triphosphate diphosphohydrolase 8-like | ENTPD8L |
| 17 | 4.94 | ENSGALG00000008936 | 417281 | ectonucleoside triphosphate diphosphohydrolase 8-like | ENTPD8L |
| 17 | 5.35 | ENSGALG00000008936 | 417281 | ectonucleoside triphosphate diphosphohydrolase 8-like | ENTPD8L |
| 18 | 5.63 | ENSGALG00000001036 | 417313 | elaC ribonuclease Z 2 | ELAC2 |
| 6 | 3.13 | ENSGALG00000025772 | 770955 | ELOVL fatty acid elongase 3 | ELOVL3 |
| 10 | 3.97 | ENSGALG00000002097 | 415324 | epididymal protein-like | LOC415324 |
| 10 | 7.41 | ENSGALG00000002097 | 415324 | epididymal protein-like | LOC415324 |
| 6 | 8.40 | ENSGALG00000003308 | 423670 | ER lipid raft associated 1 | ERLIN1 |
| 6 | 9.52 | ENSGALG00000003308 | 423670 | ER lipid raft associated 1 | ERLIN1 |
| 6 | 10.65 | ENSGALG00000003308 | 423670 | ER lipid raft associated 1 | ERLIN1 |
| 14 | 5.09 | ENSGALG00000026038 | 427000 | ERI1 exoribonuclease family member 2 | ERI2 |
| 14 | 5.30 | ENSGALG00000026038 | 427000 | ERI1 exoribonuclease family member 2 | ERI2 |
| 14 | 5.99 | ENSGALG00000026038 | 427000 | ERI1 exoribonuclease family member 2 | ERI2 |
| 13 | 3.86 | ENSGALG00000048830 | | eukaryotic translation initiation factor 4E binding protein 3 | EIF4EBP3 |
| 13 | 8.60 | ENSGALG00000048830 | | eukaryotic translation initiation factor 4E binding protein 3 | EIF4EBP3 |
| 8 | 3.46 | ENSGALG00000051438 | 424465 | exostosin like glycosyltransferase 2 | EXTL2 |
| 11 | 3.04 | ENSGALG00000003201 | 415701 | F-box and leucine rich repeat protein 8 | FBXL8 |
| 11 | 4.26 | ENSGALG00000003201 | 415701 | F-box and leucine rich repeat protein 8 | FBXL8 |
| 11 | 4.68 | ENSGALG00000003201 | 415701 | F-box and leucine rich repeat protein 8 | FBXL8 |
| 8 | 4.01 | ENSGALG00000005918 | 429087 | family with sequence similarity 69 member A | DIPK1A |
| 23 | 5.46 | ENSGALG00000040986 | 426384 | family with sequence similarity 76 member A | FAM76A |
| 33 | 3.46 | ENSGALG00000031615 | 426899 | Fas apoptotic inhibitory molecule 2 | FAIM2 |
| 33 | 5.37 | ENSGALG00000031615 | 426899 | Fas apoptotic inhibitory molecule 2 | FAIM2 |
| 33 | 6.19 | ENSGALG00000031615 | 426899 | Fas apoptotic inhibitory molecule 2 | FAIM2 |
| 26 | 3.21 | ENSGALG00000003447 | 419923 | fibroblast growth factor receptor substrate 3 | FRS3 |
| 26 | 4.05 | ENSGALG00000003447 | 419923 | fibroblast growth factor receptor substrate 3 | FRS3 |
| 26 | 3.60 | ENSGALG00000034067 | 395814 | fibromodulin | FMOD |
| 26 | 4.22 | ENSGALG00000034067 | 395814 | fibromodulin | FMOD |

1 | 4.73 | ENSGALG00000013420 | 430534 | forkhead box M1 | FOXM1
1 | 4.82 | ENSGALG00000013420 | 430534 | forkhead box M1 | FOXM1
14 | 3.55 | ENSGALG00000027146 | 431418 | G protein-coupled receptor 139 | GPR139
14 | 5.14 | ENSGALG00000027146 | 431418 | G protein-coupled receptor 139 | GPR139
3 | 3.38 | ENSGALG00000015068 | | G protein-coupled receptor 6 | GPR6
4 | 4.11 | ENSGALG00000015595 | 428798 | G protein-coupled receptor 78 | GPR78
8 | 3.38 | ENSGALG00000005252 | | G protein-coupled receptor 88 |
14 | 3.50 | ENSGALG00000002028 | 101747453 | G protein-coupled receptor class C group 5 member B | GPRC5B
14 | 4.51 | ENSGALG00000002028 | 101747453 | G protein-coupled receptor class C group 5 member B | GPRC5B
14 | 4.92 | ENSGALG00000002028 | 101747453 | G protein-coupled receptor class C group 5 member B | GPRC5B
1 | 3.19 | ENSGALG00000031874 | | Gallus gallus histone cluster 1, H1.01 (similar to human histone cluster 1, class H1 genes) (HIST1H101), mRNA. | HIST1H101
1 | 3.19 | ENSGALG00000011783 | | Gallus gallus histone cluster 1, H1.11L (similar to human histone cluster 1, class H1 genes) (HIST1H111L), mRNA. | HIST1H111R
26 3.21 ENSGALG00000028489 428272 gastricsin-like GASTL 26 4.05 ENSGALG00000028489 428272 gastricsin-like GASTL 27 4.51 ENSGALG00000003333 420028 GH3 domain containing GHDC 27 6.97 ENSGALG00000003333 420028 GH3 domain containing GHDC 27 7.95 ENSGALG00000003333 420028 GH3 domain containing GHDC 33 3.65 ENSGALG00000042837 100858942 glucagon-like LOC100858945 33 4.43 ENSGALG00000042837 100858942 glucagon-like LOC100858944 33 4.46 ENSGALG00000042837 100858942 glucagon-like LOC100858943 33 4.86 ENSGALG00000042837 100858942 glucagon-like LOC100858942 18 4.11 ENSGALG00000041800 427787 glucagon-like peptide 2 receptor GLP2R 1 3.83 ENSGALG00000033789 374163 heat shock protein 90 beta family member 1 HSP90B1 11 4.26 ENSGALG00000003224 427540 heat shock transcription factor 4 HSF4 11 4.68 ENSGALG00000003224 427540 heat shock transcription factor 4 HSF4 19 3.22 ENSGALG00000001031 417471 heat shock transcription factor 5 HSF5 14 3.76 ENSGALG00000009346 416763 hematological and neurological expressed 1 like HN1L 14 4.17 ENSGALG00000009346 416763 hematological and neurological expressed 1 like HN1L 14 4.50 ENSGALG00000009346 416763 hematological and neurological expressed 1 like HN1L 12 3.66 ENSGALG00000002253 415915 HemK methyltransferase family member 1 HEMK1 1 3.67 ENSGALG00000025906 heparan sulfate 6-O-sulfotransferase 3 HS6ST3 13 6.23 ENSGALG00000000949 395654 heparin binding EGF like growth factor HBEGF 13 8.42 ENSGALG00000000949 395654 heparin binding EGF like growth factor HBEGF 28 3.20 ENSGALG00000000377 420054 heterogeneous nuclear ribonucleoprotein M HNRNPM 8 3.80 ENSGALG00000050568 424509 HFM1, ATP dependent DNA helicase homolog HFM1 12 3.39 ENSGALG00000004930 100858928 histamine receptor H1 HRH1

1 | 3.19 | ENSGALG00000027064 | 100858681 | histone cluster 1 H3 family member h | HIST1H3H
1 | 3.19 | ENSGALG00000052030 | 427891 | histone cluster 1, H2A-IV-like 2 (similar to human histone cluster 2, class H2A, member C) | HIST1H2A4L2
1 | 3.19 | ENSGALG00000032645 | 417955 | histone cluster 1, H2A-IV-like 3 (similar to human histone cluster 2, class H2A, member C) | HIST1H2A4L3
1 | 3.19 | ENSGALG00000052284 | 427895 | histone cluster 1, H2A-IV-like 4 (similar to human histone cluster 2, class H2A, member C) | HIST1H2A4L4
1 | 3.19 | ENSGALG00000054083 | 404299 | histone cluster 1, H2A, IV (similar to human histone cluster 2, class H2A, member C) | HIST1H2A4
1 | 3.19 | ENSGALG00000052649 | 417957 | histone cluster 1, H2B-V (similar to human histone cluster 1, class H2B) | H2B-V
2 | 3.39 | ENSGALG00000026631 | 776143 | homeobox A10 | HOXA10
2 | 3.39 | ENSGALG00000040021 | 395327 | homeobox A11 | HOXA11
2 | 3.39 | ENSGALG00000027234 | 373934 | homeobox A13 | HOXA13
12 | 7.60 | ENSGALG00000000785 | 426739 | hyaluronoglucosaminidase 1 | HYAL1
4 | 3.43 | ENSGALG00000009154 | 422392 | iduronate 2-sulfatase | IDS
9 | 4.37 | ENSGALG00000029152 | 100857200 | IMP4 homolog, U3 small nucleolar ribonucleoprotein | IMP4
9 | 6.09 | ENSGALG00000029152 | 100857200 | IMP4 homolog, U3 small nucleolar ribonucleoprotein | IMP4
8 | 4.91 | ENSGALG00000045555 | 426862 | influenza virus NS1A binding protein | IVNS1ABP
1 | 4.28 | ENSGALG00000014335 | 418262 | integrin alpha FG-GAP repeat containing 2 | ITFG2
1 | 5.06 | ENSGALG00000014335 | 418262 | integrin alpha FG-GAP repeat containing 2 | ITFG2
1 | 4.42 | ENSGALG00000014443 | 418276 | intermediate filament family orphan 1 | IFFO1
2 | 3.03 | ENSGALG00000030991 | 374185 | iroquois homeobox 1 | IRX1
33 | 3.01 | ENSGALG00000034868 | 395772 | keratin 7 | KRT7
33 | 3.06 | ENSGALG00000034868 | 395772 | keratin 7 | KRT7
33 | 3.60 | ENSGALG00000034868 | 395772 | keratin 7 | KRT7
33 | 5.61 | ENSGALG00000050400 | 426896 | keratin 8 | KRT8
33 | 3.06 | ENSGALG00000039634 | 431302 | keratin 80 | KRT80
33 | 4.27 | ENSGALG00000039634 | 431302 | keratin 80 | KRT80
33 | 4.65 | ENSGALG00000039634 | 431302 | keratin 80 | KRT80
33 | 5.61 | ENSGALG00000047132 | 112529929 | keratin, type II cytoskeletal 4-like | LOC112529929
10 | 3.21 | ENSGALG00000034288 | 101748248 | KIAA0101 | KIAA0101
10 | 3.37 | ENSGALG00000034288 | 101748248 | KIAA0101 | KIAA0101
27 | 3.18 | ENSGALG00000031553 | 419968 | kinesin family member 18B | KIF18B
7 | 4.83 | ENSGALG00000007503 | 424041 | kynurenine 3-monooxygenase | KMO
7 | 4.91 | ENSGALG00000007503 | 424041 | kynurenine 3-monooxygenase | KMO
7 | 5.62 | ENSGALG00000007503 | 424041 | kynurenine 3-monooxygenase | KMO
7 | 3.69 | ENSGALG00000006198 | 424037 | lanosterol synthase | LSS
7 | 4.23 | ENSGALG00000006198 | 424037 | lanosterol synthase | LSS

60 7 5.88 ENSGALG00000006198 424037 lanosterol synthase LSS 4 3.52 ENSGALG00000043447 422174 LAS1 like, ribosome biogenesis factor LAS1L 17 3.92 ENSGALG00000045901 101749800 leucine rich repeat containing 26 LRRC26 17 6.55 ENSGALG00000045901 101749800 leucine rich repeat containing 26 LRRC26 14 3.13 ENSGALG00000003217 374125 lipopolysaccharide induced TNF factor LITAF 1 3.41 ENSGALG00000016758 418697 lipoyltransferase 1 LIPT1 28 3.88 ENSGALG00000000407 420055 LSM7 homolog, U6 small nuclear RNA and mRNA degradation associated LSM7 1 3.10 ENSGALG00000011271 417891 lumican LUM 14 4.62 ENSGALG00000002273 427003 LYR motif containing 1 LYRM1 14 6.67 ENSGALG00000002042 426994 lysine-rich nucleolar protein 1 KNOP1 14 6.79 ENSGALG00000002042 426994 lysine-rich nucleolar protein 1 KNOP1 14 6.82 ENSGALG00000002042 426994 lysine-rich nucleolar protein 1 KNOP1 28 4.59 ENSGALG00000040471 420049 major vault protein MVP 19 3.66 ENSGALG00000005830 427840 MAX network transcriptional repressor MNT 19 3.41 ENSGALG00000030401 417679 mediator complex subunit 31 MED31 11 5.13 ENSGALG00000003293 415704 membrane bound transcription factor peptidase, site 1 MBTPS1 11 5.82 ENSGALG00000003293 415704 membrane bound transcription factor peptidase, site 1 MBTPS1 12 4.65 ENSGALG00000002362 107048996 mesencephalic astrocyte derived neurotrophic factor MANF 12 5.49 ENSGALG00000002362 107048996 mesencephalic astrocyte derived neurotrophic factor MANF 8 3.54 ENSGALG00000043087 395174 metal response element binding transcription factor 2 MTF2 1 3.41 ENSGALG00000016759 418698 microtubule interacting and trafficking domain containing 1 MITD1 2 3.00 ENSGALG00000010954 420616 mitochondrial assembly of ribosomal large subunit 1 MALSU1 3 3.95 ENSGALG00000019967 421691 mitochondrial fission regulator 2 MTFR2 1 3.48 ENSGALG00000039783 771615 mitochondrial ribosomal protein L57 MRPL57 12 4.39 ENSGALG00000008506 416129 mitochondrial ribosomal protein S25 MRPS25 6 8.25 ENSGALG00000003549 791224 mitochondrial 
ribosome associated GTPase 1 MTG1 6 10.26 ENSGALG00000003549 791224 mitochondrial ribosome associated GTPase 1 MTG1 6 10.30 ENSGALG00000003549 791224 mitochondrial ribosome associated GTPase 1 MTG1 2 4.32 ENSGALG00000029279 426292 mitogen-activated protein kinase 15 MAPK15 19 4.37 ENSGALG00000034960 101751014 MLX interacting protein like MLXIPL 6 8.03 ENSGALG00000003695 395356 multiple inositol-polyphosphate phosphatase 1 MINPP1 6 9.72 ENSGALG00000003695 395356 multiple inositol-polyphosphate phosphatase 1 MINPP1 20 3.18 ENSGALG00000005932 419249 myelin transcription factor 1 MYT1 2 3.00 ENSGALG00000038635 770011 myosin light chain 12B MYL12B 2 3.49 ENSGALG00000038635 770011 myosin light chain 12B MYL12B 2 3.00 ENSGALG00000014854 396284 myosin, light chain 12A, regulatory, non-sarcomeric MYL12A 27 3.18 ENSGALG00000000857 419966 N-myristoyltransferase 1 NMT1 17 3.72 ENSGALG00000008920 417280 NADPH dependent diflavin oxidoreductase 1 NDOR1

61 10 3.36 ENSGALG00000043506 415333 neuromedin B NMB 2 3.01 ENSGALG00000038838 421137 neutral sphingomyelinase activation associated factor NSMAF 12 5.55 ENSGALG00000002131 425132 NPR2-like, GATOR1 complex subunit 12 7.60 ENSGALG00000002131 425132 NPR2-like, GATOR1 complex subunit 26 3.03 ENSGALG00000001956 419882 olfactomedin like 3 OLFML3 10 3.28 ENSGALG00000002167 415327 PAT1 homolog 2 PATL2 10 3.28 ENSGALG00000002210 396447 peptidylprolyl isomerase B (cyclophilin B) PPIB 8 3.23 ENSGALG00000027608 424387 phosphatidylinositol glycan anchor biosynthesis class C PIGC 26 3.03 ENSGALG00000038902 100858838 Pim-1 proto-oncogene, serine/threonine kinase PIM1 26 3.52 ENSGALG00000038902 100858838 Pim-1 proto-oncogene, serine/threonine kinase PIM1 2 3.31 ENSGALG00000007036 420462 pitrilysin metallopeptidase 1 PITRM1 2 3.42 ENSGALG00000007036 420462 pitrilysin metallopeptidase 1 PITRM1 9 3.21 ENSGALG00000002169 424758 pleckstrin homology domain containing B2 PLEKHB2 9 3.83 ENSGALG00000002169 424758 pleckstrin homology domain containing B2 PLEKHB2 6 8.25 ENSGALG00000054212 100857693 polyamine oxidase PAOX 6 10.26 ENSGALG00000054212 100857693 polyamine oxidase PAOX 6 10.30 ENSGALG00000054212 100857693 polyamine oxidase PAOX 27 5.20 ENSGALG00000003261 396006 polymerase I and transcript release factor PTRF 33 7.46 ENSGALG00000041101 429501 polypeptide N-acetylgalactosaminyltransferase 6 GALNT6 27 4.51 ENSGALG00000003354 772091 potassium voltage-gated channel subfamily H member 4 KCNH4 13 6.23 ENSGALG00000000946 416142 prefoldin subunit 1 PFDN1 13 6.95 ENSGALG00000000946 416142 prefoldin subunit 1 PFDN1 9 4.00 ENSGALG00000002315 424767 presenilin associated rhomboid like PARL 9 4.68 ENSGALG00000002315 424767 presenilin associated rhomboid like PARL 26 3.21 ENSGALG00000028262 100859351 prickle planar cell polarity protein 4 PRICKLE4 26 3.21 ENSGALG00000003435 395690 progastricsin (pepsinogen C) PGC 26 4.05 ENSGALG00000003435 395690 progastricsin (pepsinogen C) PGC 8 5.92 
ENSGALG00000033635 396451 prostaglandin-endoperoxide synthase 2 PTGS2 8 6.71 ENSGALG00000033635 396451 prostaglandin-endoperoxide synthase 2 PTGS2 7 3.69 ENSGALG00000006141 395112 protein O-fucosyltransferase 2 POFUT2 7 4.70 ENSGALG00000006141 395112 protein O-fucosyltransferase 2 POFUT2 13 3.96 ENSGALG00000029387 430546 purine rich element binding protein A PURA 13 6.16 ENSGALG00000029387 430546 purine rich element binding protein A PURA 13 7.53 ENSGALG00000029387 430546 purine rich element binding protein A PURA 13 8.30 ENSGALG00000029387 430546 purine rich element binding protein A PURA 1 3.65 ENSGALG00000028018 428095 RAB38, member RAS oncogene family RAB38 12 4.26 ENSGALG00000008500 416128 rabenosyn, RAB effector RBSN 12 4.39 ENSGALG00000008500 416128 rabenosyn, RAB effector RBSN

62 1 4.73 ENSGALG00000013422 426083 RAD9-HUS1-RAD1 interacting nuclear orphan 1 RHNO1 1 4.82 ENSGALG00000013422 426083 RAD9-HUS1-RAD1 interacting nuclear orphan 1 RHNO1 1 6.03 ENSGALG00000013422 426083 RAD9-HUS1-RAD1 interacting nuclear orphan 1 RHNO1 6 4.49 ENSGALG00000002395 423623 rap1 GTPase-GDP dissociation stimulator 1-like RAP1GDS1L 12 5.55 ENSGALG00000038951 101750777 Ras association domain family member 1 RASSF1 12 7.60 ENSGALG00000038951 101750777 Ras association domain family member 1 RASSF1 11 4.03 ENSGALG00000044466 415708 receptor-interacting serine-threonine kinase 3 RIPK3 11 6.27 ENSGALG00000044466 415708 receptor-interacting serine-threonine kinase 3 RIPK3 11 6.62 ENSGALG00000044466 415708 receptor-interacting serine-threonine kinase 3 RIPK3 17 3.28 ENSGALG00000008881 395110 regulator of G-protein signaling 3 RGS3 17 6.25 ENSGALG00000054209 395110 regulator of G-protein signaling 3 RGS3 9 3.34 ENSGALG00000002193 Rho guanine nucleotide exchange factor 4 1 5.26 ENSGALG00000017276 419033 Rho guanine nucleotide exchange factor 5 ARHGEF5 6 7.41 ENSGALG00000003196 396194 ribonuclease A family member k6 ANG 6 7.56 ENSGALG00000003196 396194 ribonuclease A family member k6 ANG 5 3.08 ENSGALG00000005948 770018 ribosomal protein L27a RPL27A 12 4.70 ENSGALG00000027142 416122 ribosomal protein L32 RPL32 8 4.01 ENSGALG00000005922 395269 ribosomal protein L5 RPL5 12 4.65 ENSGALG00000050969 770664 RNA binding motif protein 15B RBM15B 12 5.49 ENSGALG00000050969 770664 RNA binding motif protein 15B RBM15B 14 4.62 ENSGALG00000002239 427001 RNA exonuclease 5 REXO5 14 5.30 ENSGALG00000002239 427001 RNA exonuclease 5 REXO5 14 5.99 ENSGALG00000002239 427001 RNA exonuclease 5 REXO5 7 3.53 ENSGALG00000006217 424038 S100 calcium binding protein B S100B 7 3.69 ENSGALG00000006217 424038 S100 calcium binding protein B S100B 7 4.23 ENSGALG00000006217 424038 S100 calcium binding protein B S100B 7 5.88 ENSGALG00000006217 424038 S100 calcium binding protein B S100B 10 3.36 
ENSGALG00000037553 415334 SEC11 homolog A, signal peptidase complex subunit SEC11A 12 4.70 ENSGALG00000008419 416119 SEC13 homolog, nuclear pore and COPII coat complex component SEC13 23 5.46 ENSGALG00000037603 100858796 sestrin-2-like LOC100858796 27 4.51 ENSGALG00000003282 395556 signal transducer and activator of transcription 5A STAT5A 27 6.97 ENSGALG00000003282 395556 signal transducer and activator of transcription 5A STAT5A 27 7.95 ENSGALG00000003282 395556 signal transducer and activator of transcription 5A STAT5A 5 3.15 ENSGALG00000020359 395843 SIX homeobox 6 SIX6 4 3.35 ENSGALG00000009192 100857927 SLIT and NTRK like family member 2 SLITRK2 4 3.61 ENSGALG00000009192 100857927 SLIT and NTRK like family member 2 SLITRK2 4 7.74 ENSGALG00000009192 100857927 SLIT and NTRK like family member 2 SLITRK2 14 3.67 ENSGALG00000031841 416665 SLX4 structure-specific endonuclease subunit homolog SLX4

63 19 3.41 ENSGALG00000005995 solute carrier family 13 member 5 10 3.59 ENSGALG00000030863 415338 solute carrier family 28 (sodium-coupled nucleoside transporter), member 2 SLC28A2 10 3.70 ENSGALG00000030863 415338 solute carrier family 28 (sodium-coupled nucleoside transporter), member 2 SLC28A2 17 7.09 ENSGALG00000008844 417268 solute carrier family 31 member 1 SLC31A1 11 5.13 ENSGALG00000003302 100857309 solute carrier family 38 member 8 SLC38A8 11 5.57 ENSGALG00000003302 100857309 solute carrier family 38 member 8 SLC38A8 11 5.82 ENSGALG00000003302 100857309 solute carrier family 38 member 8 SLC38A8 33 4.43 ENSGALG00000031274 425441 solute carrier family 4 member 8 SLC4A8 33 4.86 ENSGALG00000031274 425441 solute carrier family 4 member 8 SLC4A8 13 8.42 ENSGALG00000044387 769199 solute carrier family 4 member 9 SLC4A9 13 8.60 ENSGALG00000044387 769199 solute carrier family 4 member 9 SLC4A9 12 3.79 ENSGALG00000034289 416033 solute carrier family 41 member 3 SLC41A3 10 3.28 ENSGALG00000002189 427480 sorting nexin 1 SNX1 10 3.28 ENSGALG00000046823 769119 sorting nexin 22 SNX22 1 3.48 ENSGALG00000017128 418948 spindle and kinetochore associated complex subunit 3 SKA3 9 3.97 ENSGALG00000043460 396105 SRY-box 2 SOX2 13 3.86 ENSGALG00000040453 427608 steroid receptor RNA activator 1 SRA1 13 8.60 ENSGALG00000040453 427608 steroid receptor RNA activator 1 SRA1 8 4.94 ENSGALG00000034230 100859817 SWT1, RNA endoribonuclease homolog SWT1 19 3.41 ENSGALG00000027950 417680 thioredoxin domain containing 17 TXNDC17 14 3.67 ENSGALG00000039951 416666 TNF receptor associated protein 1 TRAP1 11 3.04 ENSGALG00000003195 415700 TNFRSF1A associated via death domain TRADD 11 4.26 ENSGALG00000003195 415700 TNFRSF1A associated via death domain TRADD 11 4.68 ENSGALG00000003195 415700 TNFRSF1A associated via death domain TRADD 8 3.79 ENSGALG00000005776 424491 trans-2,3-enoyl-CoA reductase TECR 8 4.25 ENSGALG00000005776 424491 trans-2,3-enoyl-CoA reductase TECR 19 4.37 ENSGALG00000031263 
101751077 transducin (beta)-like 2 TBL2 28 3.48 ENSGALG00000028937 100859352 translocase of inner mitochondrial membrane 13 TIMM13 28 4.13 ENSGALG00000028937 100859352 translocase of inner mitochondrial membrane 13 TIMM13 28 3.06 ENSGALG00000000352 420053 translocase of inner mitochondrial membrane 44 TIMM44 28 3.20 ENSGALG00000000352 420053 translocase of inner mitochondrial membrane 44 TIMM44 28 3.51 ENSGALG00000000352 420053 translocase of inner mitochondrial membrane 44 TIMM44 8 8.82 ENSGALG00000005105 424457 translocated promoter region, nuclear basket protein TPR 8 4.72 ENSGALG00000005904 424499 transmembrane p24 trafficking protein 5 TMED5 28 3.48 ENSGALG00000000458 428320 transmembrane protease, serine 9 TMPRSS9 28 4.13 ENSGALG00000000458 428320 transmembrane protease, serine 9 TMPRSS9 12 5.55 ENSGALG00000050948 429757 transmembrane protein 115 TMEM115 12 7.60 ENSGALG00000050948 429757 transmembrane protein 115 TMEM115

14 | 4.76 | ENSGALG00000027173 | 427005 | transmembrane protein 159 | TMEM159
4 | 3.43 | ENSGALG00000009139 | 422391 | transmembrane protein 185B | TMEM185B
1 | 6.03 | ENSGALG00000013424 | 426085 | tubby like protein 3 | TULP3
6 | 7.89 | ENSGALG00000003526 | 423673 | tubulin gamma complex associated protein 2 | TUBGCP2
6 | 10.30 | ENSGALG00000003526 | 423673 | tubulin gamma complex associated protein 2 | TUBGCP2
28 | 3.12 | ENSGALG00000034271 | 420154 | ubiquitin like with PHD and ring finger domains 1 | UHRF1
11 | 3.04 | ENSGALG00000021345 | | UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 9 |
11 | 3.25 | ENSGALG00000021345 | | UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 9 |
11 | 4.26 | ENSGALG00000021345 | | UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 9 |
27 | 6.30 | ENSGALG00000047783 | 107055273 | uncharacterized LOC107055273 | LOC100858100
31 | 4.04 | ENSGALG00000047761 | 769729 | uncharacterized LOC769729 | LOC769729
4 | 3.58 | ENSGALG00000010878 | 422575 | USO1 vesicle transport factor | USO1
3 | 3.57 | ENSGALG00000019958 | 100857259 | vesicular inhibitory amino acid transporter-like | LOC100857259
3 | 5.49 | ENSGALG00000019958 | 100857259 | vesicular inhibitory amino acid transporter-like | LOC100857259
12 | 4.70 | ENSGALG00000028009 | 416117 | von Hippel-Lindau tumor suppressor | VHL
19 | 3.41 | ENSGALG00000005928 | 417677 | XIAP associated factor 1 | XAF1
7 | 3.69 | ENSGALG00000047621 | 107053767 | ybeY metallopeptidase (putative) | YBEY
7 | 4.70 | ENSGALG00000047621 | 107053767 | ybeY metallopeptidase (putative) | YBEY
4 | 4.33 | ENSGALG00000042725 | 422176 | zinc finger C4H2-type containing | ZC4H2
4 | 3.52 | ENSGALG00000039391 | 422175 | zinc finger CCCH-type containing 12B | ZC3H12B
1 | 3.20 | ENSGALG00000017126 | 418947 | zinc finger DHHC-type containing 20 | ZDHHC20
12 | 5.55 | ENSGALG00000035883 | 112529928 | zinc finger MYND-type containing 10 | ZMYND10
12 | 7.60 | ENSGALG00000035883 | 112529928 | zinc finger MYND-type containing 10 | ZMYND10
6 | 7.89 | ENSGALG00000003533 | 428940 | zinc finger protein 511 | ZNF511
6 | 8.25 | ENSGALG00000003533 | 428940 | zinc finger protein 511 | ZNF511
6 | 10.30 | ENSGALG00000003533 | 428940 | zinc finger protein 511 | ZNF511
14 | 4.76 | ENSGALG00000002428 | 427006 | zona pellucida glycoprotein 2 | ZP2

Table S4. Outlier genes identified as putatively under divergent selection using principal component analysis (pcadapt) between Irish and English red grouse. Colour signifies associated function: green = immune, orange = pigmentation, pink = interesting/relevant functions and blank = unassigned.

Chromosome | Ensembl Gene ID | NCBI Accession Number | Gene description | Gene name
1 | ENSGALG00000040342 | 418479 | ADAM metallopeptidase with thrombospondin type 1 motif 1 | ADAMTS1
19 | ENSGALG00000001654 | 395538 | complement C1q binding protein | C1QBP
28 | ENSGALG00000024391 | 420073 | cactin, spliceosome C complex subunit | CACTIN
17 | ENSGALG00000001889 | 427756 | caspase recruitment domain family member 9 | CARD9
2 | ENSGALG00000011734 | 776400 | C-C chemokine receptor 8 like | CCR8L
27 | ENSGALG00000000251 | 419940 | CD79b molecule | CD79B
10 | ENSGALG00000031270 | 396396 | C-terminal Src kinase | CSK
7 | ENSGALG00000012357 | 395324 | C-X-C motif chemokine receptor 4 | CXCR4
27 | ENSGALG00000035369 | 425527 | dual specificity phosphatase 3 | DUSP3
3 | ENSGALG00000019843 | 414342 | avian beta-defensin 4 | GAL4
1 | ENSGALG00000012550 | 396287 | heme oxygenase 1 | HMOX1
27 | ENSGALG00000002832 | 420011 | interferon-induced protein 35 | IFI35
26 | ENSGALG00000000911 | 428265 | interleukin 20 | IL20
1 | ENSGALG00000009904 | 417838 | interleukin 22 | IL22
1 | ENSGALG00000009904 | 417838 | interleukin 22 | IL22
25 | ENSGALG00000011570 | 425709 | interleukin enhancer binding factor 2 | ILF2
4 | ENSGALG00000011849 | 373958 | interleukin 2 | interleukin-2
3 | ENSGALG00000008552 | 416707 | mal, T cell differentiation protein | MAL
10 | ENSGALG00000006702 | 415494 | milk fat globule-EGF factor 8 protein | MFGE8
3 | ENSGALG00000032737 | 422030 | Ovodefensin A1 | OvoDA1
12 | ENSGALG00000004517 | 771158 | poly(ADP-ribose) polymerase family member 3 | PARP3
13 | ENSGALG00000004588 | 416275 | ribosomal protein S14 | RPS14
10 | ENSGALG00000002157 | 374053 | ribosomal protein S17 | RPS17
15 | ENSGALG00000005862 | 416923 | somatomedin B and thrombospondin type 1 domain containing like | SBSPONL
25 | ENSGALG00000044505 | 425978 | SH2 domain containing 2A | SH2D2A
5 | ENSGALG00000036069 | 395472 | T-cell immune regulator 1, ATPase H transporting V0 subunit a3 | TCIRG1
13 | ENSGALG00000004496 | 100858009 | TNFAIP3 interacting protein 1 | TNIP1
2 | ENSGALG00000011145 | 428436 | TLR4 interactor with leucine rich repeats | TRIL
4 | ENSGALG00000006808 | 422262 | tetraspanin 6 | TSPAN6
19 | ENSGALG00000003589 | 395935 | vitronectin | VTN
3 | ENSGALG00000015231 | 421763 | zinc finger and BTB domain containing 24 | ZBTB24
27 | ENSGALG00000030907 | 396216 | colony stimulating factor 3 | CSF3

25 | ENSGALG00000028706 | 769269 | feather keratin I | F-KER
26 | ENSGALG00000023933 | 419860 | G0/G1 switch 2 | G0S2
6 | ENSGALG00000017419 | 429879 | HPS1, biogenesis of lysosomal organelles complex 3 subunit 1 | HPS1
25 | ENSGALG00000000341 | 425968 | keratin associated protein 10-4 | JAC
33 | ENSGALG00000030629 | 408041 | keratin 6A | KRT6A
33 | ENSGALG00000035972 | 408042 | keratin 75 | KRT75
33 | ENSGALG00000035972 | 408042 | keratin 75 | KRT75
33 | ENSGALG00000044875 | 100858302 | keratin, type II cytoskeletal 75-like 4 | KRT75L4
27 | ENSGALG00000036846 | 420041 | keratin, type I cytoskeletal 9-like | KRT9
25 | ENSGALG00000050271 | 431316 | scale keratin-like | LOC100859790
33 | ENSGALG00000047132 | 112529929 | keratin, type II cytoskeletal 4-like | LOC112529929
25 | ENSGALG00000054186 | 431317 | scale keratin-like | LOC431317
25 | ENSGALG00000048160 | 431320 | feather beta keratin-like | LOC431320
11 | ENSGALG00000054486 | 427562 | melanocortin 1 receptor | MC1R
6 | ENSGALG00000052760 | 395388 | atonal bHLH transcription factor 7 | ATOH7
11 | ENSGALG00000026903 | 396255 | calbindin 2 | CALB2
10 | ENSGALG00000014121 | 769735 | cytochrome c oxidase subunit 5A | COX5A
6 | ENSGALG00000008121 | 425056 | cytochrome P450 family 17 subfamily A member 1 | CYP17A1
4 | ENSGALG00000014975 | 427552 | dopamine receptor D5 | DRD5
5 | ENSGALG00000036364 | 396110 | forkhead box G1 | FOXG1
18 | ENSGALG00000002088 | 770030 | galanin receptor 2 | GALR2
1 | ENSGALG00000012307 | 427904 | galanin receptor 3 | GALR3
1 | ENSGALG00000012307 | 427904 | galanin receptor 3 | GALR3
5 | ENSGALG00000028839 | 428919 | glycoprotein hormone beta 5 | GPHB5
27 | ENSGALG00000011485 | 374005 | hypocretin neuropeptide precursor | HCRT
14 | ENSGALG00000005878 | 416558 | MTOR associated protein, LST8 homolog | MLST8
14 | ENSGALG00000045096 | 101748764 | neuropeptide W | NPW
5 | ENSGALG00000032687 | 423088 | pleckstrin homology like domain family A member 2 | PHLDA2
10 | ENSGALG00000006676 | 415492 | retinaldehyde binding protein 1 | RLBP1
25 | ENSGALG00000026796 | 100858163 | SNAP associated protein | SNAPIN
1 | ENSGALG00000050619 | 395219 | suppressor of cytokine signaling 2 | SOCS2
1 | ENSGALG00000011908 | 770780 | WBP2 N-terminal like | WBP2NL
19 | ENSGALG00000004924 | 396377 | opsin, pineal |
19 | ENSGALG00000004924 | 396377 | opsin, pineal |
5 | ENSGALG00000017389 | 395601 | SIX homeobox 5 | SIX5
28 | ENSGALG00000003024 | 420112 | armadillo repeat containing 6 | ARMC6
1 | ENSGALG00000038981 | 417701 | Cbl proto-oncogene like 1 | CBLL1

67 9 ENSGALG00000006402 424857 negative regulator of reactive oxygen species NRROS 28 ENSGALG00000001146 428329 UBX domain protein 6 UBXN6 25 ENSGALG00000024272 426356 S100 calcium binding protein A12 S100A12 21 ENSGALG00000021598 419479 arylacetamide deacetylase like 3C AADACL3C 24 ENSGALG00000006795 428242 ATP binding cassette subfamily G member 4 ABCG4 22 ENSGALG00000041634 396084 actin, gamma 2, smooth muscle, enteric ACTG2 17 ENSGALG00000029150 396002 adenylate kinase 1 AK1 4 ENSGALG00000007599 422314 APC membrane recruitment protein 1 AMER1 9 ENSGALG00000050605 424755 AMMECR1 like AMMECR1L 9 ENSGALG00000031312 769605 anaphase promoting complex subunit 13 ANAPC13 5 ENSGALG00000011376 423469 ankyrin repeat domain 9 ANKRD9 3 ENSGALG00000009041 421298 ankyrin repeat domain 9-like ANKRD9L 18 ENSGALG00000033376 417431 apolipoprotein H APOH 17 ENSGALG00000004378 417194 ankyrin repeat and SOCS box containing 6 ASB6 8 ENSGALG00000010444 424616 ATP synthase mitochondrial F1 complex assembly factor 1 ATPAF1 21 ENSGALG00000001555 419415 aurora kinase A interacting protein 1 AURKAIP1 3 ENSGALG00000052020 100858701 avian beta-defensin 14 AvBD14 17 ENSGALG00000021553 417126 UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase-like B3GNTL 6 ENSGALG00000037714 100858359 BBSome interacting protein 1 BBIP1 26 ENSGALG00000050352 772088 BCL2 like 15 BCL2L15 8 ENSGALG00000043605 771694 bestrophin 4 BEST4 6 ENSGALG00000008124 423869 BLOC-1 related complex subunit 7 BORCS7 20 ENSGALG00000006662 419289 BPI fold containing family B member 3 BPIFB3 8 ENSGALG00000043998 107054005 Bartter syndrome, infantile, with sensorineural deafness (Barttin) BSND 8 ENSGALG00000028978 100858105 BTB domain containing 19 BTBD19 13 ENSGALG00000006506 416321 chromosome 13 C5orf15 homolog C13H5orf15 21 ENSGALG00000004102 395744 chromosome 1 open reading frame 158 C1orf158 23 ENSGALG00000045736 419639 chromosome 23 open reading frame, human C1orf94 C23H1ORF94 24 ENSGALG00000007950 419799 chromosome 
24 C11orf1 homolog C24H11orf1 26 ENSGALG00000000497 419812 chromosome 26 open reading frame, human C6orf222 C26H6ORF222 28 ENSGALG00000001273 428334 C2 calcium dependent domain containing 4C C2CD4C 5 ENSGALG00000006077 395520 chromosome 5 C11orf58 homolog C5H11orf58 11 ENSGALG00000042664 771265 cerebellin 1 precursor CBLN1 1 ENSGALG00000036867 107053684 chromobox 6 CBX6 1 ENSGALG00000016981 418848 spermatid associated CBY2 12 ENSGALG00000006787 100858691 coiled-coil domain containing 71 CCDC71 13 ENSGALG00000023812 771567 cyclin I family member 2 CCNI2

68 17 ENSGALG00000035576 417226 cyclin dependent kinase 9 CDK9 9 ENSGALG00000006405 424858 centrosomal protein 19 CEP19 5 ENSGALG00000010027 423320 cofilin 2 CFL2 5 ENSGALG00000027874 423209 ChaC glutathione specific gamma-glutamylcyclotransferase 1 CHAC1 1 ENSGALG00000011970 417998 chondroadherin-like CHADL 10 ENSGALG00000003014 386578 cholinergic receptor nicotinic alpha 3 subunit CHRNA3 2 ENSGALG00000030969 395386 C-type lectin domain family 3 member B CLEC3B 10 ENSGALG00000001362 415295 CDC like kinase 3 CLK3 3 ENSGALG00000008757 421266 cannabinoid receptor interacting protein 1 CNRIP1 3 ENSGALG00000027440 421493 COX20, cytochrome c oxidase assembly factor COX20 9 ENSGALG00000045776 424899 carboxypeptidase N subunit 2 CPN2 11 ENSGALG00000000507 415851 copine VII CPNE7 1 ENSGALG00000030101 425183 cold shock domain containing C2, RNA binding CSDC2 26 ENSGALG00000000318 396176 cysteine and glycine rich protein 1 CSRP1 9 ENSGALG00000008178 770119 cytochrome P450, family 2, subfamily AB, polypeptide 2 CYP2AB2 28 ENSGALG00000047617 426942 DET1 and DDB1 associated 1 DDA1 28 ENSGALG00000003130 420116 DEAD-box helicase 49 DDX49 3 ENSGALG00000009330 421327 delta 4-desaturase, sphingolipid 1 DEGS1 26 ENSGALG00000000154 421153 DENN domain containing 2D DENND2D 19 ENSGALG00000004009 100858939 dehydrogenase/reductase 13 DHRS13 17 ENSGALG00000048117 772152 DNL-type zinc finger DNLZ 13 ENSGALG00000026914 429960 docking protein 3 DOK3 2 ENSGALG00000011237 420643 diphthamide biosynthesis 3 pseudogene 1 DPH3P1 1 ENSGALG00000007084 425349 dual specificity phosphatase 12 DUSP12 15 ENSGALG00000007153 427708 dynein light chain 1 cytoplasmic DYL1 15 ENSGALG00000007147 770215 dynein light chain 2 cytoplasmic DYL2 14 ENSGALG00000006122 427672 glutamyl-tRNA synthetase 2, mitochondrial EARS2 9 ENSGALG00000006272 100857430 endothelin converting enzyme 2 ECE2 25 ENSGALG00000031901 110224124 epidermal differentiation protein beta EDbeta 25 ENSGALG00000038478 101747731 keratin-associated 
protein 5-1-like EDMPN2 25 ENSGALG00000030242 110224125 epidermal differentiation protein starting with MTF motif and rich in histidine EDMTFH 25 ENSGALG00000054721 101751162 keratin-associated protein 9-1-like EDQCM 25 ENSGALG00000049060 107055097 uncharacterized LOC107055097 EDYM2 2 ENSGALG00000037310 428434 even-skipped homeobox 1 EVX1 14 ENSGALG00000024041 416538 fumarylacetoacetate hydrolase domain containing 1 FAHD1 1 ENSGALG00000015324 418395 family with sequence similarity 172 member B, pseudogene FAM172BP 27 ENSGALG00000000914 101747521 family with sequence similarity 187 member A FAM187A

69 11 ENSGALG00000003201 415701 F-box and leucine rich repeat protein 8 FBXL8 4 ENSGALG00000013663 428779 fibroblast growth factor 20 FGF20 1 ENSGALG00000017029 428073 fibrinogen-like protein 1-like FGL1L 9 ENSGALG00000006440 429133 galactose-3-O-sulfotransferase 4 GAL3ST4 1 ENSGALG00000016918 418811 putative glutamine amidotransferase like class 1 domain containing 3A-like1 GATD3AL1 1 ENSGALG00000033149 430581 GRIP and coiled-coil domain containing 1 GCC1 27 ENSGALG00000000909 419969 glial fibrillary acidic protein GFAP 27 ENSGALG00000000997 396501 gap junction protein gamma 3 GJC1 4 ENSGALG00000010588 396117 Gnot2 homeodomain protein GNOT2 6 ENSGALG00000002048 395648 G protein regulated inducer of neurite outgrowth 2 GPRIN2 13 ENSGALG00000050577 768554 GrpE like 2, mitochondrial GRPEL2 4 ENSGALG00000031160 422646 G-rich RNA sequence binding factor 1 GRSF1 1 ENSGALG00000052649 417957 histone cluster 1, H2B-V (similar to human histone cluster 1, class H2B) H2B-V 1 ENSGALG00000048097 100858319 histone cluster 1, H4i H4-I 1 ENSGALG00000011798 417961 heme binding protein 1 HEBP1 1 ENSGALG00000011799 417961 heme binding protein 1 HEBP1 8 ENSGALG00000001959 429055 HEN1 methyltransferase homolog 1 HENMT1 21 ENSGALG00000001141 419392 hes family bHLH transcription factor 5 HES5 24 ENSGALG00000006806 419756 histone H4 transcription factor HINFP 6 ENSGALG00000023415 429199 H6 family homeobox 3 HMX3 7 ENSGALG00000009274 396178 homeobox D12 HOXD12 1 ENSGALG00000025906 heparan sulfate 6-O-sulfotransferase 3 HS6ST3 11 ENSGALG00000003273 415703 hydroxysteroid dehydrogenase like 1 HSDL1 15 ENSGALG00000007383 416988 heat shock protein family B (small) member 8 HSPB8 5 ENSGALG00000026846 423221 jumonji domain containing 7 JMJD7 5 ENSGALG00000026846 423221 jumonji domain containing 7 JMJD7 5 ENSGALG00000012142 395721 potassium voltage-gated channel subfamily A member 4 KCNA4 20 ENSGALG00000007980 419347 potassium voltage-gated channel modifier subfamily G member 1 KCNG1 3 
ENSGALG00000016470 428653 potassium voltage-gated channel modifier subfamily S member 3 KCNS3 18 ENSGALG00000050762 417390 kinesin family member 2B KIF2B 27 ENSGALG00000003536 428313 kelch like family member 11 KLHL11 1 ENSGALG00000017332 101747275 kelch like family member 35 KLHL35 10 ENSGALG00000003505 415373 lactamase beta LACTB 8 ENSGALG00000011047 424699 leptin receptor overlapping transcript LEPROT 31 ENSGALG00000043938 430516 DNA ligase 1 LIG1 5 ENSGALG00000029892 374129 LIM domain only 2 LMO2 25 ENSGALG00000026070 100857512 death-associated protein kinase 2-like LOC100857512

25 ENSGALG00000026070 100857512 death-associated protein kinase 2-like LOC100857512
5 ENSGALG00000053515 100857834 pepsin A-like LOC100857834
33 ENSGALG00000042837 100858942 glucagon-like LOC100858942
15 ENSGALG00000040869 100858984 dynein light chain 2, cytoplasmic-like LOC100858984
1 ENSGALG00000053778 101747255 serologically defined colon cancer antigen 3 homolog LOC101747255
8 ENSGALG00000043245 101747588 ankyrin-3-like LOC101747588
4 ENSGALG00000005747 101748744 serine protease 23-like LOC101748744
1 ENSGALG00000048309 107050476 uncharacterized LOC107050476 LOC107050476
17 ENSGALG00000021553 417126 UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase-like LOC107050476
25 ENSGALG00000050822 107055098 uncharacterized LOC107055098 LOC107055098
25 ENSGALG00000032318 107055131 proline-rich protein 4-like LOC107055131
21 ENSGALG00000027002 107057363 transcription factor HES-5-like LOC107057363
21 ENSGALG00000027002 107057363 transcription factor HES-5-like LOC107057363
10 ENSGALG00000002097 415324 epididymal protein-like LOC415324
21 ENSGALG00000001136 419390 hairy and enhancer of split 5-like LOC419390
4 ENSGALG00000009792 422442 uncharacterized LOC422442 LOC422442
8 ENSGALG00000010203 424588 E3 ubiquitin-protein ligase HECTD3-like LOC424588
27 ENSGALG00000019797 428277 parathyroid hormone/parathyroid hormone-related peptide receptor-like LOC428277
8 ENSGALG00000039159 429098 fatty-acid amide hydrolase 1-like LOC429098
1 ENSGALG00000019312 693258 noggin 4 LOC693258
8 ENSGALG00000010743 424655 leucine rich repeat containing 42 LRRC42
5 ENSGALG00000009112 423236 leucine rich repeat containing 57 LRRC57
4 ENSGALG00000015967 771502 leucine rich repeat transmembrane neuronal 1 LRRTM1
2 ENSGALG00000043582 420301 lymphocyte antigen 6 complex, locus E-like LY6CLEL
23 ENSGALG00000002013 419617 MYST/Esa1-associated factor 6 MEAF6
14 ENSGALG00000005411 416539 meiosis specific with OB domains MEIOB
7 ENSGALG00000008493 424094 methyltransferase like 21A METTL21A
5 ENSGALG00000031212 428925 mannosyl (alpha-1,6-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase MGAT2
5 ENSGALG00000031212 428925 mannosyl (alpha-1,6-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase MGAT2
1 ENSGALG00000017003 428072 motilin receptor MLNR
3 ENSGALG00000040307 101751661 MORN repeat containing 2 MORN2
2 ENSGALG00000042082 428537 v-mos Moloney murine sarcoma viral oncogene homolog MOS
10 ENSGALG00000029296 769765 mannose phosphate isomerase MPI
28 ENSGALG00000036004 426941 mitochondrial ribosomal protein L34 MRPL34
10 ENSGALG00000006782 415499 mitochondrial ribosomal protein L46 MRPL46
3 ENSGALG00000009890 421385 mitochondrial ribosomal protein S10 MRPS10
8 ENSGALG00000030028 424436 mitochondrial ribosomal protein S14 MRPS14

19 ENSGALG00000002379 417538 mitochondrial ribosomal protein S17 MRPS17
26 ENSGALG00000000164 421154 myosin binding protein H like MYBPH
27 ENSGALG00000000585 396472 myosin, light chain 4, alkali; atrial, embryonic MYL4
27 ENSGALG00000000585 396472 myosin, light chain 4, alkali; atrial, embryonic MYL4
8 ENSGALG00000021074 424391 myocilin MYOC
3 ENSGALG00000047977 421467 N-acetylneuraminic acid phosphatase NANP
4 ENSGALG00000005753 422214 nuclear cap binding protein subunit 2, 20kDa NCBP2
4 ENSGALG00000008613 772150 NADH:ubiquinone oxidoreductase subunit A1 NDUFA1
18 ENSGALG00000036659 422071 NADH:ubiquinone oxidoreductase complex assembly factor 8 NDUFAF8
7 ENSGALG00000008239 424078 NADH:ubiquinone oxidoreductase subunit B3 NDUFB3
15 ENSGALG00000008085 417020 neurofilament heavy polypeptide NEFH
9 ENSGALG00000006361 429131 sialidase 4 NEU4
3 ENSGALG00000036869 428549 NK2 homeobox 2 NKX2-2
1 ENSGALG00000037807 418277 NOP2 nucleolar protein NOP2
27 ENSGALG00000003651 420034 5'-nucleotidase, cytosolic IIIB NT5C3B
1 ENSGALG00000016994 418858 nudix hydrolase 15 NUDT15
3 ENSGALG00000016508 421982 numb homolog (Drosophila)-like NUMBL
19 ENSGALG00000032280 417501 nucleoporin 88 NUP88
10 ENSGALG00000039166 768801 olfactory receptor family 10 subfamily A member 4 OR10A4
3 ENSGALG00000008732 395484 cutaneous T-cell lymphoma-associated antigen 1 OTOR
9 ENSGALG00000039432 107049027 purinergic receptor P2Y13 P2RY13
1 ENSGALG00000017327 396114 pyrimidinergic receptor P2Y6 P2RY3
9 ENSGALG00000021180 100858936 progestin and adipoQ receptor family member 9 PAQR9
11 ENSGALG00000001214 101748842 par-6 partitioning defective 6 homolog alpha (C. elegans) PARD6A
8 ENSGALG00000010473 424619 PDZK1 interacting protein 1 PDZK1IP1
10 ENSGALG00000006534 415486 peroxisomal biogenesis factor 11 alpha PEX11A
17 ENSGALG00000001595 772213 PHD finger protein 19 PHF19
8 ENSGALG00000027608 424387 phosphatidylinositol glycan anchor biosynthesis class C PIGC
19 ENSGALG00000003665 417571 phosphatidylinositol glycan anchor biosynthesis class S PIGC
15 ENSGALG00000006942 416964 phosphoinositide-3-kinase interacting protein 1 PIK3IP1
6 ENSGALG00000005086 396424 plasminogen activator, urokinase PLAU
1 ENSGALG00000038599 417989 phosphomannomutase 1 PMM1
2 ENSGALG00000044362 428440 protein phosphatase 2C-like domain containing 1 PP2D1
10 ENSGALG00000048349 770336 protein regulator of cytokinesis 1 PRC1
27 ENSGALG00000000349 428274 proteasome 26S subunit, ATPase 5 PSMC5
18 ENSGALG00000026225 417425 proteasome 26S subunit, non-ATPase 12 PSMD12
27 ENSGALG00000033162 426133 proteasome 26S subunit, non-ATPase 3 PSMD3

14 ENSGALG00000004203 416463 proteasome assembly chaperone 3 PSMG3
14 ENSGALG00000004203 416463 proteasome assembly chaperone 3 PSMG3
6 ENSGALG00000009610 771778 phosphoseryl-tRNA kinase PSTK
9 ENSGALG00000043329 424931 prothymosin, alpha PTMA
1 ENSGALG00000016538 418614 retinoic acid induced 2 RAI2
4 ENSGALG00000006061 422232 RAP2C, member of RAS oncogene family RAP2C
18 ENSGALG00000021552 417349 RAS-like, family 10, member A RASL10A
18 ENSGALG00000021552 417349 RAS-like, family 10, member A RASL10A
20 ENSGALG00000003952 419185 recombination signal binding protein for immunoglobulin kappa J region like RBPJL
14 ENSGALG00000040648 395621 regulator of G-protein signaling 11 RGS11
1 ENSGALG00000013422 426083 RAD9-HUS1-RAD1 interacting nuclear orphan 1 RHNO1
2 ENSGALG00000034803 420226 ring finger protein 151 RNF151
21 ENSGALG00000039967 426504 ATP-dependent RNA helicase DDX19B; Uncharacterized protein RP11-529K1.3
9 ENSGALG00000009312 100858296 ribosomal protein L22 like 1 RPL22L1
27 ENSGALG00000002837 396280 ribosomal protein L27 RPL27
2 ENSGALG00000035252 420182 ribosomal protein L7 RPL7
10 ENSGALG00000052307 430848 ribonuclease P/MRP subunit p25 RPP25
3 ENSGALG00000036790 395796 ribosomal protein S27a RPS27A
25 ENSGALG00000000576 768974 ribosomal RNA adenine dimethylase domain containing 1 RRNAD1
3 ENSGALG00000028056 100858699 ribosomal RNA processing 36 RRP36
7 ENSGALG00000006217 424038 S100 calcium binding protein B S100B
6 ENSGALG00000004431 423711 secretion associated Ras related GTPase 1A SAR1A
15 ENSGALG00000005862 416923 somatomedin B and thrombospondin type 1 domain containing like SBSPONL
4 ENSGALG00000047990 422563 stimulator of chondrogenesis 1 SCRG1
19 ENSGALG00000037709 427832 stromal cell derived factor 2 SDF2
10 ENSGALG00000007162 415522 selenoprotein S SELENOS
11 ENSGALG00000002247 101750922 SET domain containing 6 SETD6
4 ENSGALG00000013968 422760 sarcoglycan beta SGCB
4 ENSGALG00000033908 770263 solute carrier family 10 member 4 SLC10A4
1 ENSGALG00000011514 417924 solute carrier family 25 member 3 SLC25A3
18 ENSGALG00000007018 768556 solute carrier family 26 member 11 SLC26A11
14 ENSGALG00000010497 425593 solute carrier family 29 member 4 SLC29A4
9 ENSGALG00000007478 424916 solute carrier family 51 alpha subunit SLC51A
9 ENSGALG00000007478 424916 solute carrier family 51 alpha subunit SLC51A
20 ENSGALG00000006194 419270 solute carrier family 52 member 3 SLC52A3
21 ENSGALG00000033204 419472 PQ loop repeat containing 2 SLC66A1
4 ENSGALG00000007242 422287 SLIT and NTRK like family member 4 SLITRK4

1 ENSGALG00000011743 769813 single-pass membrane protein with coiled-coil domains 3 SMCO3
23 ENSGALG00000030261 100858028 small integral membrane protein 12 SMIM12
3 ENSGALG00000015812 421827 small integral membrane protein 8 SMIM8
1 ENSGALG00000011323 768769 small nuclear RNA activating complex polypeptide 3 SNAPC3
22 ENSGALG00000028409 771344 small nuclear ribonucleoprotein polypeptide G pseudogene 15 SNRPGP15
27 ENSGALG00000009929 429784 sclerostin SOST
18 ENSGALG00000004418 395305 somatostatin receptor 2 SS2R
20 ENSGALG00000007541 419315 syntaxin 16 STX16
21 ENSGALG00000002906 419453 TAR DNA binding protein TARDBP
14 ENSGALG00000027823 430233 transcription factor AP-4 TFAP4
9 ENSGALG00000053550 429271 THAP domain containing 4 THAP4
2 ENSGALG00000007677 100859434 threonine synthase like 1 THNSL1
26 ENSGALG00000000086 421147 translocase of inner mitochondrial membrane 17 homolog B (yeast) TIMM17B
4 ENSGALG00000015710 422900 transmembrane protein 129 TMEM129
11 ENSGALG00000040878 100857564 transmembrane protein 208 TMEM208
21 ENSGALG00000035387 419411 transmembrane protein 240 TMEM240
23 ENSGALG00000043733 769637 transmembrane protein 50A TMEM50A
21 ENSGALG00000001534 419414 transmembrane protein 88B TMEM88B
21 ENSGALG00000001024 419384 tumor protein p63 regulated 1-like TPRG1L
1 ENSGALG00000028872 396363 tumor protein, translationally-controlled 1 TPT1
1 ENSGALG00000028872 396363 tumor protein, translationally-controlled 1 TPT1
1 ENSGALG00000013056 418166 tubulin alpha 8a TUBA8A
1 ENSGALG00000012532 426978 thioredoxin 2 TXN2
2 ENSGALG00000014897 421060 thymidylate synthetase TYMS
6 ENSGALG00000033807 769373 trypsin domain containing 1 TYSND
14 ENSGALG00000006117 771251 ubiquitin family domain containing 1 UBFD1
4 ENSGALG00000006412 770844 vestigial like family member 1 VGLL1
5 ENSGALG00000043089 101748485 von Willebrand factor C and EGF domains VWCE
7 ENSGALG00000008470 424089 WD repeat domain 12 WDR12
3 ENSGALG00000008760 769534 WD repeat domain 92 WDR92
18 ENSGALG00000007367 422097 WAP, follistatin/kazal, immunoglobulin, kunitz and netrin domain containing 2 WFIKKN2
20 ENSGALG00000047383 419192 WNT1 inducible signaling pathway protein 2 WISP2
26 ENSGALG00000037597 428266 YOD1 deubiquitinase YOD1
1 ENSGALG00000034265 771155 zygote arrest 1-like ZAR1L
17 ENSGALG00000050757 417207 zinc finger DHHC-type containing 12 ZDHHC12
1 ENSGALG00000025906 heparan sulfate 6-O-sulfotransferase 3
1 ENSGALG00000037167 100858354 histone cluster 1, H1.03 (similar to human histone cluster 1, class H1 genes)

1 ENSGALG00000032645 417955 histone cluster 1, H2A-IV-like 3 (similar to human histone cluster 2, class H2A, member C)
1 ENSGALG00000049268 427881 histone cluster 1, H2A, III (similar to human histone cluster 2, class H2A, member C)
1 ENSGALG00000049268 427881 histone cluster 1, H2A, III (similar to human histone cluster 2, class H2A, member C)
1 ENSGALG00000011931 small nuclear ribonucleoprotein 13
2 ENSGALG00000026092 428525 histamine H3 receptor-like LOC428525
3 ENSGALG00000009253 SDE2 telomere maintenance homolog
4 ENSGALG00000026653 ADP ribosylation factor like GTPase 9
6 ENSGALG00000009718 family with sequence similarity 53 member B
6 ENSGALG00000052512 neuropeptide S
7 ENSGALG00000037378 reprimo, TP53 dependent G2 arrest mediator homolog
8 ENSGALG00000003443 leucine rich repeat containing 52
9 ENSGALG00000042553 RAP2B, member of RAS oncogene family
10 ENSGALG00000053823 family with sequence similarity 219 member B
10 ENSGALG00000002000 GRAM domain containing 2A
10 ENSGALG00000007022 pyroglutamyl-peptidase I like
13 ENSGALG00000027809 inhibitory synaptic factor family member 2B
18 ENSGALG00000029882 myeloid associated differentiation marker like 2
18 ENSGALG00000003114 noggin
21 ENSGALG00000037593 PTEN induced kinase 1
23 ENSGALG00000003301 family with sequence similarity 167 member B
24 ENSGALG00000006795 428242 ATP binding cassette subfamily G member 4 ABCG4
25 ENSGALG00000045721 interferon stimulated exonuclease gene 20 like 2
