<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Title

2 De Novo Assembly of the Northern (Cardinalis cardinalis) Genome Reveals

3 Candidate Regulatory Regions for Sexually Dichromatic Red Plumage Coloration

4

5 Author

6 Simon Yung Wa Sin †‡*, Lily Lu †, Scott V. Edwards †

7

8 Author affiliation

9 † Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology,

10 Harvard University, 26 Oxford Street, Cambridge, MA 02138, USA.

11 ‡ School of Biological Sciences, The University of Hong Kong, Pok Fu Lam Road, Hong

12 Kong.

13

14

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

15 Running title

16 genome assembly

17

18 Keywords

19 AllPaths-LG

20 Cis-regulatory elements

21 CYP2J19 gene

22 Ketocarotenoid pigments

23 Transcription factors

24

25 * Corresponding author

26 Simon Y. W. Sin

27 Email: [email protected]

28 Mailing address: School of Biological Sciences, Kadoorie Biological Sciences Building, The

29 University of Hong Kong, Pok Fu Lam Road, Hong Kong.

30

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

31 Abstract

32 Northern cardinals (Cardinalis cardinalis) are common, mid-sized widely

33 distributed in North America. As an iconic species with strong sexual dichromatism, it has

34 been the focus of extensive ecological and evolutionary research, yet genomic studies

35 investigating the evolution of genotype–phenotype association of plumage coloration and

36 dichromatism are lacking. Here we present a new, highly contiguous assembly for C.

37 cardinalis. We generated a 1.1 Gb assembly comprised of 4,762 scaffolds, with a scaffold

38 N50 of 3.6 Mb, a contig N50 of 114.4 kb and a longest scaffold of 19.7 Mb. We identified

39 93.5% complete and single-copy orthologs from an Aves dataset using BUSCO,

40 demonstrating high completeness of the genome assembly. We annotated the genomic region

41 comprising the CYP2J19 gene, which plays a pivotal role in the red coloration in .

42 Comparative analyses demonstrated non-exonic regions unique to the CYP2J19 gene in

43 passerines and a long insertion upstream of the gene in C. cardinalis. Transcription factor

44 binding motifs discovered in the unique insertion region in C. cardinalis suggest potential

45 androgen-regulated mechanisms underlying sexual dichromatism. Pairwise Sequential

46 Markovian Coalescent (PSMC) analysis of the genome reveals fluctuations in historic

47 effective population size between 100,000–250,000 in the last 2 millions years, with declines

48 concordant with the beginning of the Pleistocene epoch and Last Glacial Period. This draft

49 genome of C. cardinalis provides an important resource for future studies of ecological,

50 evolutionary, and functional genomics in cardinals and other birds.

51

52 Introduction

53 The northern cardinal (Cardinalis cardinalis) is a mid-sized (~42-48 g) broadly

54 distributed in eastern and central North America, with a range encompassing northern Central

55 America to southeastern Canada (Smith et al. 2011). It has high genetic and phenotypic

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

56 diversity and is currently divided into 18 subspecies (Paynter 1970; Smith et al. 2011).

57 Cardinals have been studied extensively on research areas such as song and communication

58 (e.g. Anderson and Conner 1985; Yamaguchi 1998; Jawor and MacDougall-Shackleton

59 2008), sexual selection (e.g. Jawor et al. 2003), physiology (e.g. DeVries and Jawor 2013;

60 Wright and Fokidis 2016), phylogeography (e.g. Smith et al. 2011), and plumage coloration

61 (e.g. Linville and Breitwisch 1997; Linville et al. 1998; McGraw et al. 2003). C. cardinalis is

62 an iconic species in Cardinalidae and has strong sexual dichromatism, with adult males

63 possessing bright red plumage and adult females tan (Fig. 1).

64 The red plumage coloration of many species plays important roles in social and

65 sexual signaling. With the advance of genomic technologies in the last several years, the

66 genetic basis and evolution of plumage coloration in birds is under intense investigation

67 (Lopes et al. 2016; Mundy et al. 2016; Twyman et al. 2016; Toomey et al. 2018; Twyman et

68 al. 2018a; Twyman et al. 2018b; Funk and Taylor 2019; Kim et al. 2019; Gazda et al. 2020).

69 The red color of feathers is generated by the deposition of ingested carotenoids, modified

70 endogenously. A ketolase is involved in an oxidative reaction to convert dietary yellow

71 carotenoids into red ketocarotenoids (Friedman et al. 2014). Recently the gene encoding the

72 carotenoid ketolase has been shown to be a cytochrome P450 enzyme, CYP2J19 (Lopes et al.

73 2016; Mundy et al. 2016). The identification of CYP2J19 as the gene responsible for red

74 coloration in hybrid canary (Serinus canaria) plumage (Lopes et al. 2016) and zebra finch

75 (Taeniopygia guttata) bill and legs (Mundy et al. 2016) suggests its general role in red

76 pigmentation of multiple tissues across birds. C. cardinalis is an excellent candidate to

77 further our understanding of the regulation and development of red plumage coloration in

78 birds. In particular, little attention has been paid to noncoding, regulatory signatures in the

79 region surrounding CYP2J19, and genomic resources for cardinals would greatly facilitate

80 such work.

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

81 Despite extensive ecological and evolutionary research on C. cardinalis, we lack a

82 highly contiguous genome of this species and other species in the family Cardinalidae. A

83 genome assembly of C. cardinalis will facilitate studies of genotype–phenotype association

84 of sexually dichromatic plumage color in this species, and facilitate comparative genomic

85 analysis in birds generally. Here we generate a new genome assembly from a male collected

86 in Texas, USA, with a voucher specimen in the ornithology collection of the Museum of

87 Comparative Zoology. We used the AllPaths-LG (Gnerre et al. 2011) method to perform de

88 novo genome assembly. Given the lack of genomic resources currently available for the

89 Cardinalidae, this C. cardinalis draft genome will facilitate future studies on genomic studies

90 of cardinals and other passerines.

91

92 Methods & Materials

93 Sample collection and DNA extraction

94 We collected a male C. cardinalis on the DNA Works Ranch, Afton, Dickens County, Texas,

95 United States (33.76286°, -100.8168°) on 19 January 2016 (MCZ Ornithology cat. no.:

96 364215). We collected muscle, heart, and liver and snap frozen the tissues in liquid nitrogen

97 immediately in the field. Upon returning from the field, tissues were stored in the

98 cryopreservation facility of the Museum of Comparative Zoology, Harvard University until

99 DNA extraction. We isolated genomic DNA using the DNeasy Blood and Tissue Kit

100 (Qiagen, Hilden, Germany) following the manufacturer’s protocol. We confirmed the sex of

101 the individual using published PCR primers targeting CHD1 genes with different sizes of

102 intron on the W and Z chromosomes (2550F & 2718R; Fridolfsson and Ellegren 1999) and

103 measured DNA concentration with a Qubit dsDNA HS Assay Kit (Invitrogen, Carlsbad,

104 USA).

105

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

106 Library preparation and sequencing

107 We performed whole-genome library preparation and sequencing following Grayson et al.

108 (2017). In brief, a DNA fragment library of 220 bp insert size was prepared using the PrepX

109 ILM 32i DNA Library Kit (Takara), and mate-pair libraries of 3 kb insert size were prepared

110 using the Nextera Mate Pair Sample Preparation Kit (cat. No. FC-132-1001, Illumina). We

111 then assessed library quality using the HS DNA Kit (Agilent) and quantified the libraries

112 with qPCR prior to sequencing (KAPA library quantification kit). We sequenced the libraries

113 on an Illumina HiSeq instrument (High Output 250 kit, PE 125 bp reads) at the Bauer Core

114 facility at Harvard University.

115

116 De novo genome assembly and assessment

117 We assessed the quality of the sequencing data using FastQC and removed adapters using

118 Trimmomatic (Bolger et al. 2014) and assembled the genome using AllPaths-LG v52488

119 (Gnerre et al. 2011), which allowed us to estimate the genome size from k-mer frequencies

120 and assess the contiguity of the de novo genome. We estimated the completeness of the

121 assembled genome with BUSCO v2.0 (Simão et al. 2015) and used the aves_odb9 dataset to

122 search for 4915 universal single-copy orthologs in birds.

123

124 Analysis of the CYP2J19 genomic region

125 We annotated the genomic region that comprises the CYP2J19 gene using blastn with the

126 CYP2J19, CYP2J40, HOOK1 and NFIA genes from S. canaria and T. guttata as queries.

127 Conserved non-exonic elements (CNEEs) were obtained from a published UCSC Genome

128 Browser track hub containing a progressiveCactus alignment of 42 bird and reptile species

129 and CNEE annotations (viewed in the UCSC genome browser at

130 https://ifx.rc.fas.harvard.edu/pub/ratiteHub8/hub.txt; Sackton et al. 2019). We identified

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

131 CYP2J19 genes from 30 bird species (Twyman et al. 2018a) in the NCBI by blasting their

132 genome assemblies. The alignment was compiled using MUSCLE (Edgar 2004) and viewed

133 with Geneious (Kearse et al. 2012). We also aligned the genomic regions up- and

134 downstream of CYP2J19 in 10 passerines to identify any potential conserved regulatory

135 regions present in species with red carotenoid coloration.

136 We used the MEME Suite (Bailey et al. 2009; Bailey et al. 2015) to discover potential

137 regulatory DNA motifs in the region upstream of the CYP2J19 gene that we found to be

138 unique to C. cardinalis and to predict transcription factors (TFs) binding to those motifs. We

139 used MEME (Bailey and Elkan 1994) to identify the top five most statistically significant

140 motifs and their corresponding positions in the region. We used Tomtom (Gupta et al. 2007)

141 to compare the identified motifs against databases (e.g. JASPAR) of known motifs and

142 identify potential TFs specific to those matched motifs. To search for potential biological

143 roles of these motifs, we used GOMo (Buske et al. 2010) to scan all promoters in Gallus

144 gallus using the identified motifs to determine if any motifs are significantly associated with

145 genes linked to any Genome Ontology (GO) terms.

146

147 Inference of demographic history

148 To investigate historical demographic changes in the northern cardinal we used the Pairwise

149 Sequential Markovian Coalescent (PSMC) model (Li and Durbin 2011) based on the diploid

150 whole-genome sequence to reconstruct the population history. We generated consensus

151 sequences for all autosomes using SAMtools’ (v1.5; Li et al. 2009) mpileup command and

152 the vcf2fq command from vcfutils.pl. We applied filters for base quality and mapping quality

153 below 30. The settings for the PSMC atomic time intervals were “4+30*2+4+6+10”. 100

154 bootstraps were used to compute the variance in estimates of Ne. To convert inferred

155 population sizes and times to numbers of individuals and years, respectively, we used the

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

156 estimate of mutation rate of 3.44e-09 per site per generation from the medium ground finch

157 (Geospiza fortis, Thraupidae, a closely related passerine clade) (Nadachowska-Brzyska et al.

158 2015). We estimated the generation time of the northern cardinal as age of sexual maturity

159 multiplied by a factor of two (Nadachowska-Brzyska et al. 2015), yielding a generation time

160 of 2 years.

161

162 Data availability (Data will be available later prior to publication)

163 Data from all sequencing runs and final genome assembly are available from NCBI

164 (BioProject number will be provided later). Short-fragment and mate-pair libraries are also

165 available from the NCBI SRA (number to be provided later).

166

167 Results and Discussion

168 Genome assembly and evaluation

169 We generated 703,754,970 total reads from two different sequencing libraries, including

170 335,713,090 reads from the fragment library and 368,041,880 reads from the mate-pair

171 library. The genome size estimated by AllPaths-LG from k-mers was 1.1 Gb (Table 1). The

172 sequence coverage estimated was ~59x. The assembly consisted of 32,783 contigs and 4,762

173 scaffolds. The largest scaffold was 19.7 Mb. The contig N50 was 114.4 kb and the scaffold

174 N50 was 3.6 Mb (Table 1). The number of contigs per Mb was 31.4 and the number of

175 scaffolds per Mb was 4.6. The average GC content of the assembly was 42.1%. BUSCO

176 scores (Simão et al. 2015) suggest high completeness of the genome, with 93.5% of single-

177 copy orthologs for birds identified (Table 2). The genome contiguity and completeness is

178 better than most genomes presented in Jarvis et al. (2014) and Grayson et al. (2017).

179

180 Candidate non-coding regulatory regions of CYP2J19

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

181 The CYP2J19 gene was identified on scaffold 100 of the C. cardinalis genome, flanked by

182 CYP2J40 and NFIA genes (Fig. 2), an arrangement consistent with other species (e.g. Lopes

183 et al. 2016; Mundy et al. 2016). The length of the CYP2J19 gene in C. cardinalis is 8925 bp,

184 comprising 9 exons. We identified 26 CNEEs in the CYP2J19 gene, 24 CNEEs within ~6 kb

185 upstream and 1 CNEE within ~6 kb downstream of the gene. Of those 51 CNEEs, 10 CNEEs

186 in CYP2J19 and 7 CNEEs upstream of the gene are at least 50 bp in length.

187 Three intronic insertions, as well as one non-coding region upstream and one

188 downstream of the gene were found to be present only in passerine species (Fig. 3). In

189 addition, we identified unique insertions of large genomic regions in the two passerine

190 species in our alignment with red carotenoid coloration, i.e. C. cardinalis and T. guttata. A

191 unique 5920 bp insertion is present 339 bp upstream of the CYP2J19 gene in C. cardinalis

192 (Fig. 4a). There is a 11322 bp insertion downstream of CYP2J19 in T. guttata comprising the

193 CYP2J19B gene (Fig. 4b; Mundy et al. 2016) that is not present in C. cardinalis. The most

194 similar sequence to the C. cardinalis upstream insertion identified by blasting (~84%

195 identity, 25% query cover) is a sequence annotated as “RNA-directed DNA polymerase from

196 mobile element jockey-like” in the American crow (Corvus brachyrhynchos) genome

197 assembly (Zhang et al. 2014), suggesting that the insertion may contain a non-long terminal

198 repeat (non-LTR) retrotransposon (Ivanov et al. 1991). An endogenous retroviral insertion

199 located upstream of the aromatase gene is proposed to be the mechanism of gene activation

200 that lead to the henny feathers phenotype in chickens (Matsumine et al. 1991). The possibility

201 that the upstream retrovirus may promote CYP2J19 gene activation is worth more

202 investigation.

203 We discovered a total of 25 motifs clustered in a ~1.7 kb region in the unique

204 insertion sequence of C. cardinalis (Fig. 5a). Fourteen TFs are predicted for the 5 identified

205 motif types (Fig. S1, Table 3). No significant GO term appears to be associated with motifs

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

206 1–3 and 5, whereas significant GO terms associated with motif 3 predict molecular functions

207 of GTP binding and hexose transmembrane transporter activity. Three predicted TFs (i.e.

208 Sp1, SREBF, and RREB1; Table 3) are associated with androgen regulation of gene

209 expression.

210 Cis-regulatory elements play a crucial role in the development of sexually dimorphic

211 traits (Williams and Carroll 2009). In birds, sexual dimorphism is strongly affected by steroid

212 sex hormones (Kimball 2006). Sex hormones such as androgen may indirectly influence the

213 expression of androgen-regulated genes, through binding to transcription factors that interact

214 with regulatory elements such as enhancer and cause sex-biased gene expression, which leads

215 to sex-specific phenotypes (Coyne et al. 2008; Mank 2009). Whereas male plumage appears

216 to be testosterone-dependent in passerines (Kimball 2006), the molecular mechanisms

217 controlling sex-biased gene expression and development of sexually dimorphic traits in birds

218 is largely unknown (Kraaijeveld 2019; Gazda et al. in press). In this study, we identified

219 potential TFs that may regulate the expression of the red plumage color gene CYP2J19.

220 Many CYP450s are differentially expressed between the sexes (Rinn and Snyder 2005).

221 Some of our TFs predicted to interact with the insertion unique to C. cardinalis are androgen-

222 regulated, including Sp1, which is involved in the androgen activation of the vas deferens

223 protein promoter in mice (Darne et al. 1997) and sterol regulatory element-binding factor

224 (SREBF, aka SREBP), which is involved in androgen-regulated activation mechanisms of

225 target genes (Heemers et al. 2006). Another predicted TF, the zinc finger protein Ras-

226 responsive element binding protein (RREB1), is a co-regulator of androgen receptor (AR)

227 and plays a role in androgenic signaling by affecting AR-dependent transcription

228 (Mukhopadhyay et al. 2007). The interaction between androgens, the implicated TFs and

229 their corresponding binding motifs may underlie the development of sexually dimorphic red

230 plumage color in C. cardinalis. In addition, the TF nuclear factor erythroid-derived 2-like 2

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

231 (Nfe2l2, also known as nuclear factor erythroid 2-related factor 2, Nrf2) is predicted to bind

232 to all but motif 2 (Fig. 5). The activity of Nrf2 is influenced by changes in oxidative stress,

233 and it regulates the expression of numerous antioxidant proteins (Schultz et al. 2014).

234 Because the production of red ketacarotenoids via CYP2J19 is sensitive to oxidative state

235 (Lopes et al. 2016), this oxidation-dependent TF suggests a possible link between production

236 of red pigments and the oxidative stress faced by the organism, which may reflect an

237 association between red carotenoid coloration and individual quality (Hill and Johnson 2012).

238 The above non-exonic regions in and outside of the CYP2J19 gene are candidates of

239 regulatory elements responsible for modulating red carotenoid-based coloration in passerines

240 and C. cardinalis. Noncoding regulatory regions may be subject to less pleiotropic constraint

241 than protein-coding genes (Carroll et al. 2008), and many CNEEs act as enhancers with

242 regulatory functions (Seki et al. 2017; Sackton et al. 2019). In particular, the large non-exonic

243 region rich with TF-binding motifs upstream of the C. cardinalis CYP2J19 gene may play a

244 role in the strong sexual dichromatism and striking bright red male color in this species. The

245 mechanism of plumage color development and sexual dichromatism, and the regulatory role

246 of those genomic regions identified in this study are fruitful areas for future research.

247

248 Demographic historic of C. cardinalis

249 The PSMC analysis (Fig. 6) suggested a demographic history of C. cardinalis characterized

250 by a fluctuation in the historic effective population size between 100,000–250,000 around

251 ~2,000,000 years ago (ya). The population was at ~160,000 at the beginning of Pleistocene

252 epoch at 2,580,000 ya, then decreased to 110,000 in ~800,000 ya, and increased to ~230,000

253 at the beginning of the Last Glacial Period around 115,000 ya. The population started to

254 declined again after the beginning of the Last Glacial Period (LGP). The decrease in effective

255 population size observed at the beginning of Pleistocene might be due to the divergence of C.

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

256 cardinalis into different subspecies (Provost et al. 2018). The population decline after the

257 start of the LGP is consistent with the ecological niche model that indicates a dramatic range

258 reduction for C. cardinalis during the LGP (Smith et al. 2011).

259

260 Conclusion

261 We used small-fragment and mate-pair libraries Illumina sequencing data to generate a draft

262 genome assembly of the northern cardinal, C. cardinalis. Comparative analyses revealed

263 conserved non-exonic regions unique to the CYP2J19 gene in passerines and in C. cardinalis,

264 which may play a role in the regulation of red carotenoid-based plumage coloration. The

265 motifs discovered in the lineage-specific upstream region of CYP2J19 in C. cardinalis

266 suggest potential cis-regulatory mechanisms underlying sexual dichromatism. The assembled

267 C. cardinalis genome is therefore useful for studying genotype–phenotype associations and

268 sexual dichromatism in this species. As one of the first genome sequenced in Cardinalidae,

269 and the highest in quality, it will also be an important resource for the comparative study of

270 plumage color evolution in passerines and birds in general.

271

272 Acknowledgments

273 We would like to thank Jonathan Schmitt and Katherine Eldridge for collecting and preparing

274 the specimen and tissues. We thank Clarence Stewart for sharing the northern cardinal photo.

275 This research was supported by research funds through Harvard University. The computing

276 resources for this study were provided by the Odyssey cluster supported by the FAS Division

277 of Science, Research Computing Group at Harvard University.

278

279 Literature Cited

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

280 Anderson, M. and R. Conner. 1985. Northern cardinal song in three forest habitats in eastern

281 Texas. The Wilson Bulletin:436–449.

282 Bailey, T., M. Boden, F. Buske, M. Frith, C. Grant, L. Clementi, J. Ren, W. Li, and W.

283 Noble. 2009. MEME SUITE: tools for motif discovery and searching. Nucleic Acids

284 Res. 37:W202–W208.

285 Bailey, T. and C. Elkan. 1994. Fitting a mixture model by expectation maximization to

286 discover motifs in biopolymers. Proceedings of the Second International Conference

287 on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park,

288 California:28–36.

289 Bailey, T., J. Johnson, C. Grant, and W. Noble. 2015. The MEME suite. Nucleic Acids Res.

290 43:W39–W49.

291 Bolger, A., M. Lohse, and B. Usadel. 2014. Trimmomatic: a flexible trimmer for Illumina

292 sequence data. Bioinformatics 30:2114–2120.

293 Buske, F., M. Bodén, D. Bauer, and T. Bailey. 2010. Assigning roles to DNA regulatory

294 motifs using comparative genomics. Bioinformatics 26:860–866.

295 Carroll, S., B. Prud’homme, and N. Gompel. 2008. Regulating evolution. Sci. Am. 298:60–

296 67.

297 Coyne, J., E. Kay, and S. Pruett-Jones. 2008. The genetic basis of sexual dimorphism in

298 birds. Evolution: International Journal of Organic Evolution 62:214–219.

299 Darne, C., L. Morel, F. Claessens, M. Manin, S. Fabre, G. Veyssiere, W. Rombauts, and C.

300 Jean. 1997. Ubiquitous transcription factors NF1 and Sp1 are involved in the

301 androgen activation of the mouse vas deferens protein promoter. Mol. Cell.

302 Endocrinol. 132:13–23.

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

303 DeVries, M. and J. Jawor. 2013. Natural variation in circulating testosterone does not predict

304 nestling provisioning rates in the northern cardinal, Cardinalis cardinalis. Anim.

305 Behav. 85:957–965.

306 Edgar, R. 2004. MUSCLE: multiple sequence alignment with high accuracy and high

307 throughput. Nucleic Acids Res. 32:1792–1797.

308 Fridolfsson, A. K. and H. Ellegren. 1999. A simple and universal method for molecular

309 sexing of non-ratite birds. J. Avian Biol. 30:116–121.

310 Friedman, N., K. McGraw, and K. Omland. 2014. Evolution of carotenoid pigmentation in

311 caciques and meadowlarks (Icteridae): repeated gains of red plumage coloration by

312 carotenoid C4-oxygenation. Evolution 68:791–801.

313 Funk, E. and S. Taylor. 2019. High-throughput sequencing is revealing genetic associations

314 with avian plumage color. The Auk 136:ukz048.

315 Gazda, M. A., P. M. Araújo, R. J. Lopes, M. B. Toomey, P. Andrade, S. Afonso, C. Marques,

316 L. Nunes, P. Pereira, S. Trigo, G. E. Hill, J. C. Corbo, and M. Carneiro. in press. A

317 genetic mechanism for sexual dichromatism in birds. Science.

318 Gazda, M. A., M. B. Toomey, P. M. Araújo, R. J. Lopes, S. Afonso, C. A. Myers, K. Serres,

319 P. D. Kiser, G. E. Hill, J. C. Corbo, and M. Carneiro. 2020. Genetic basis of de novo

320 appearance of carotenoid ornamentation in bare parts of canaries. Mol. Biol. Evol.

321 37:1317–1328.

322 Gnerre, S., I. MacCallum, D. Przybylski, F. Ribeiro, J. Burton, B. Walker, T. Sharpe, G. Hall,

323 T. Shea, S. Sykes, and A. Berlin. 2011. High-quality draft assemblies of mammalian

324 genomes from massively parallel sequence data. Proceedings of the National

325 Academy of Sciences 108:1513–1518.

326 Grayson, P., S. Y. W. Sin, T. B. Sackton, and S. V. Edwards. 2017. Comparative genomics as

327 a foundation for evo-devo studies in birds. Humana Press, New York, NY.

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

328 Gupta, S., J. Stamatoyannopoulos, T. Bailey, and W. Noble. 2007. Quantifying similarity

329 between motifs. Genome Biology 8:R24.

330 Heemers, H., G. Verhoeven, and J. Swinnen. 2006. Androgen activation of the sterol

331 regulatory element-binding protein pathway: Current insights. Mol. Endocrinol.

332 20:2265–2277.

333 Hill, G. and J. Johnson. 2012. The vitamin A–redox hypothesis: a biochemical basis for

334 honest signaling via carotenoid pigmentation. The American Naturalist 180:E127–

335 E150.

336 Ivanov, V., A. Melnikov, A. Siunov, I. Fodor, and Y. Ilyin. 1991. Authentic reverse

337 transcriptase is coded by jockey, a mobile Drosophila element related to mammalian

338 LINEs. The EMBO journal 10:2489–2495.

339 Jarvis, E., S. Mirarab, A. Aberer, B. Li, P. Houde, C. Li, S. Ho, B. Faircloth, B. Nabholz, J.

340 Howard, and A. Suh. 2014. Whole-genome analyses resolve early branches in the tree

341 of life of modern birds. Science 346:1320–1331.

342 Jawor, J., S. Linville, S. Beall, and R. Breitwisch. 2003. Assortative mating by multiple

343 ornaments in northern cardinals (Cardinalis cardinalis). Behav. Ecol. 14:515–520.

344 Jawor, J. and S. MacDougall-Shackleton. 2008. Seasonal and sex-related variation in song

345 control nuclei in a species with near-monomorphic song, the northern cardinal.

346 Neurosci. Lett. 443:169–173.

347 Kearse, M., R. Moir, A. Wilson, S. Stones-Havas, M. Cheung, S. Sturrock, S. Buxton, A.

348 Cooper, S. Markowitz, C. Duran, and T. Thierer. 2012. Geneious Basic: an integrated

349 and extendable desktop software platform for the organization and analysis of

350 sequence data. Bioinformatics 28:1647–1649.

351 Kim, K., B. Jackson, H. Zhang, D. Toews, S. Taylor, E. Greig, I. Lovette, M. Liu, A.

352 Davison, S. Griffith, and K. Zeng. 2019. Genetics and evidence for balancing

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

353 selection of a sex-linked colour polymorphism in a songbird. Nature Communications

354 10:1–11.

355 Kimball, R. 2006. Hormonal control of coloration. Pp. 431–468 in G. Hill, and K. McGraw,

356 eds. Bird coloration. I. Mechanisms and measurements. Harvard Univerasity Press,

357 Cambridge, MA.

358 Kraaijeveld, K. 2019. Genetic architecture of novel ornamental traits and the establishment of

359 sexual dimorphism: insights from domestic birds. Journal of Ornithology 160:861–

360 868.

361 Li, H. and R. Durbin. 2011. Inference of human population history from individual whole-

362 genome sequences. Nature 475:493–496.

363 Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis,

364 and R. Durbin. 2009. The sequence alignment/map format and SAMtools.

365 Bioinformatics 25:2078–2079.

366 Linville, S. and R. Breitwisch. 1997. Carotenoid availability and plumage coloration in a wild

367 population of northern cardinals. The Auk 114:796–800.

368 Linville, S., R. Breitwisch, and A. Schilling. 1998. Plumage brightness as an indicator of

369 parental care in northern cardinals. Anim. Behav. 55:119–127.

370 Lopes, R. J., J. D. Johnson, M. B. Toomey, M. S. Ferreira, P. M. Araujo, J. Melo-Ferreira, L.

371 Andersson, G. E. Hill, J. C. Corbo, and M. Carneiro. 2016. Genetic basis for red

372 coloration in birds. Curr. Biol. 26:1427–1434.

373 Mank, J. 2009. Sex chromosomes and the evolution of sexual dimorphism: lessons from the

374 genome. The American Naturalist 173:141–150.

375 Matsumine, H., M. A. Herbst, S. H. Ou, J. D. Wilson, and M. J. McPhaul. 1991. Aromatase

376 mRNA in the extragonadal tissues of chickens with the henny-feathering trait is

377 derived from a distinctive promoter structure that contains a segment of a retroviral

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

378 long terminal repeat. Functional organization of the Sebright, Leghorn, and Campine

379 aromatase genes. J. Biol. Chem. 266:19900–19907.

380 McGraw, K., G. Hill, and R. Parker. 2003. Carotenoid pigments in a mutant cardinal:

381 implications for the genetic and enzymatic control mechanisms of carotenoid

382 metabolism in birds. The Condor 105:587–592.

383 Mukhopadhyay, N., B. Cinar, L. Mukhopadhyay, M. Lutchman, A. Ferdinand, J. Kim, L.

384 Chung, R. Adam, S. Ray, A. Leiter, and J. Richie. 2007. The zinc finger protein ras-

385 responsive element binding protein-1 is a coregulator of the androgen receptor:

386 implications for the role of the Ras pathway in enhancing androgenic signaling in

387 prostate cancer. Mol. Endocrinol. 21:2056–2070.

388 Mundy, N. I., J. Stapley, C. Bennison, R. Tucker, H. Twyman, K. W. Kim, T. Burke, T. R.

389 Birkhead, S. Andersson, and J. Slate. 2016. Red carotenoid coloration in the zebra

390 finch is controlled by a cytochrome P450 gene cluster. Curr. Biol. 26:1435–1440.

391 Nadachowska-Brzyska, K., C. Li, L. Smeds, G. Zhang, and H. Ellegren. 2015. Temporal

392 dynamics of avian populations during Pleistocene revealed by whole-genome

393 sequences. Curr. Biol. 25:1375–1380.

394 Paynter, R. J. 1970. Subfamily Emberizinae. Museum of Comparitive Zoology, Cambridge,

395 MA, USA.

396 Provost, K., W. M. III, and B. Smith. 2018. Genomic divergence in allopatric Northern

397 Cardinals of the North American warm deserts is linked to behavioral differentiation.

398 Ecology and Evolution 8:12456–12478.

399 Rinn, J. and M. Snyder. 2005. Sexual dimorphism in mammalian gene expression. Trends

400 Genet. 21:298–305.

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

401 Sackton, T., P. Grayson, A. Cloutier, Z. Hu, J. Liu, N. Wheeler, P. Gardner, J. Clarke, A.

402 Baker, M. Clamp, and S. Edwards. 2019. Convergent regulatory evolution and loss of

403 flight in paleognathous birds. Science 364:74–78.

404 Schultz, M., S. Hagan, A. Datta, Y. Zhang, M. Freeman, S. Sikka, A. Abdel-Mageed, and D.

405 Mondal. 2014. Nrf1 and Nrf2 transcription factors regulate androgen receptor

406 transactivation in prostate cancer cells. PloS One 9.

407 Seki, R., C. Li, Q. Fang, S. Hayashi, S. Egawa, J. Hu, L. Xu, H. Pan, M. Kondo, T. Sato, and

408 H. Matsubara. 2017. Functional roles of Aves class-specific cis-regulatory elements

409 on macroevolution of bird-specific features. Nature Communications 8:1–14.

410 Simão, F. A., R. M. Waterhouse, P. Ioannidis, E. V. Kriventseva, and E. M. Zdobnov. 2015.

411 BUSCO: assessing genome assembly and annotation completeness with single-copy

412 orthologs. Bioinformatics 31:3210–3212.

413 Smith, B., P. Escalante, B. Baños, A. Navarro-Sigüenza, S. Rohwer, and J. Klicka. 2011. The

414 role of historical and contemporary processes on phylogeographic structure and

415 genetic diversity in the Northern Cardinal, Cardinalis cardinalis. BMC Evol. Biol.

416 11:136.

417 Toomey, M., C. Marques, P. Andrade, P. Araújo, S. Sabatino, M. Gazda, S. Afonso, R.

418 Lopes, J. Corbo, and M. Carneiro. 2018. A non-coding region near Follistatin controls

419 head colour polymorphism in the Gouldian finch. Proceedings of the Royal Society B

420 285:20181788.

421 Twyman, H., S. Andersson, and N. Mundy. 2018a. Evolution of CYP2J19, a gene involved in

422 colour vision and red coloration in birds: positive selection in the face of conservation

423 and pleiotropy. BMC Evol. Biol. 18:22.

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

424 Twyman, H., M. Prager, N. I. Mundy, and S. Andersson. 2018b. Expression of a carotenoid-

425 modifying gene and evolution of red coloration in weaverbirds (Ploceidae). Mol.

426 Ecol. 27:449–458.

427 Twyman, H., N. Valenzuela, R. Literman, S. Andersson, and N. I. Mundy. 2016. Seeing red

428 to being red: conserved genetic mechanism for red cone oil droplets and co-option for

429 red coloration in birds and turtles. Proc. R. Soc. Lond. B Biol. Sci. 283:20161208.

430 Williams, T. and S. Carroll. 2009. Genetic and molecular insights into the development and

431 evolution of sexual dimorphism. Nat. Rev. Genet. 10:797–804.

432 Wright, S. and H. Fokidis. 2016. Sources of variation in plasma corticosterone and

433 dehydroepiandrosterone in the male northern cardinal (Cardinalis cardinalis): II.

434 Effects of urbanization, food supplementation and social stress. Gen. Comp.

435 Endocrinol. 235:201–209.

436 Yamaguchi, A. 1998. A sexually dimorphic learned birdsong in the Northern Cardinal. The

437 Condor 100:504–511.

438 Zhang, G., C. Li, Q. Li, B. Li, D. Larkin, C. Lee, J. Storz, A. Antunes, M. Greenwold, R.

439 Meredith, and A. Ödeen. 2014. Comparative genomics reveals insights into avian

440 genome evolution and adaptation. Science 346:1311–1320.

441

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

442 Tables

443

444

445 Table 1 De novo assembly metrics for northern cardinal genome

Metric Value Estimated genome size 1.10 Gb %GC content 42.1 Total depth of coverage 59x Total contig length (bp) 1,019,501,986 Total scaffold length (bp, gapped) 1,044,184,327 Number of contigs 32,783 Contig N50 114.4 kb Number of scaffolds 4,762 Scaffold N50 (with gaps) 3.6 Mb Largest scaffold 19.7 Mb 446

447

448

449 Table 2 Output from BUSCO analyses to assess genome completeness by searching for

450 single-copy orthologs from aves dataset

Aves % Complete BUSCOs 4642 94.4 Complete and single-copy BUSCOs 4596 93.5 Complete and duplicated BUSCOs 46 0.9 Fragmented BUSCOs 167 3.4 Missing BUSCOs 106 2.2 Total BUSCO groups searched 4915 451 452 453 454 455 456 457 458 459 460 461

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

462 463 Table 3 Transcription factors (TFs) predicted for the 5 motifs. 464 Motif no. TF name a p-value 1 KLF13 6.32×10-04 Nfe2l2 2.15×10-03 SP1 3.62×10-03 SREBF1 4.81×10-03 Ascl2 secondary 5.18×10-03 2 SOX8 DBD5 3.99×10-03 3 Nfe2l2 3.72×10-04 MAF NFE2 2.78×10-03 Bach1 Mafk 3.14×10-03 Zfp128 primary 3.40×10-03 E2F7 5.52×10-03 4 Nfe2l2 1.05×10-03 RREB1 2.86×10-03 SP1 4.73×10-03 Bach1 Mafk 4.91×10-03 5 SP1 2.04×10-03 Ascl2 secondary 2.06×10-03 SREBF1 2.18×10-03 EBF1 3.42×10-03 Nfe2l2 3.95×10-03 KLF16 4.83×10-03 SP3 4.83×10-03 RREB1 4.96×10-03 465 a Bold names indicates TFs predicted for more than one motif. 466 467 468 469

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

470 Figures

471

472

473

474 Figure 1 A pair of northern cardinals, a common, sexually dichromatic passerine bird. The

475 adult male (left) has bright red plumage whereas the adult female (right) is primarily tan in

476 color. Photo © Clarence Stewart.

477

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 2 Annotation of the CYP2J19 region in scaffold 100 of the northern cardinal genome. The CYP2J19 gene is flanked by the NFIA and CYP2J40 genes as in other passerines. Conserved non-exonic elements (CNEEs) are shown in black boxes. Exons of the CYP2J19 gene are in yellow triangles.

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 3 Multiple sequence alignment of the CYP2J19 gene in birds, with red highlighted areas showing regions present in passerines but not other birds. Exons of the CYP2J19 gene are depicted as yellow triangles. The dark red highlighted region upstream of the CYP2J19 gene in the northern cardinal is different from other passerines (see figure 5).

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 4 Multiple sequence alignment of (a) upstream and (b) downstream regions of the CYP2J19 gene in passerines. A large insertion upstream of CYP2J19 unique to the northern cardinal is highlighted in red. The insertion of a large region downstream of CYP2J19 highlighted in orange in the zebra finch indicates the CYP2J19B gene in the CYP2J2-like cluster in this species. The names of species possessing red carotenoid coloration (i.e. northern cardinal and zebra finch) are in red. Exons of the CYP2J19 gene are depicted as yellow triangles.

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

a Name Combined p -value Motif locationsMotif Locations Motif 1 Motif 2 Motif 3 p-value Motif 4 Motif 5 + car 5.715.71e-111×10-111 - 1000 bp Motif Symbol Motif Consensus b1. GCGGGCAAAVAGGGGTGTGGTGAGTCTCTGGGGCTTATACAAGGCAGTTWMotif 1 Motif 4 2. TGAGTCTCTGRGGYGTATACMAGGCAGTTTGAGYGGGCARACAGGRGYGK2 2 3. GGRTGTGCTGAGTCTCTGGGGCGTATACMAGGCAGTTTGAGCGGGCARAC

1 1 bits 4. -28 ACAGGGGTGTGSTGAGTCTCTGGGGCTTATACAAGGCARTWWSAGBRGbits -25 5. 2.6×10 ACAGGGGTGTGGTGAGTCYCTGGGGCKTATACMWGGCARTTTGAGCRGGC2.1×10 C A G A TT C G G TTG GG G T G G T G G G T C C C T C C C T CG G T A CA A T T G A C C A C GG C GG G G C G T A C A C A T A A T TT T A 0 C T G C GC C GG C CC GC C C 2 3 4 5 6 7 8 9 A A A T A 1 G 0 C 2 3 4 5 6 7 8 9 1 A A A A AA A 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 G G A A G TGTG GAGTCTCTGGGG AC AGGCMEMEA (no SSC)T 28.04.2020 14:32 A AGGGGT TG GAGT T TGGG C T AGGCA T MEMEA G(no SSC) 04.05.2020G 10:15 Motif 2 Motif 5

2 2

1

bits 1 2.0×10-32 1.5×10-29 bits G C A C G G T T T T G AA G G G C G G G G G C G G T T T T A A A A A A C G C C G G G C G G 0 T A A T TT A C 2 3 4 5 6 7 8 9 1 C C G A A A A T T A A 0 G C A C C GA CA A 2 3 4 5 6 7 8 9 1 C G CT CT C A A 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 MEME (no SSC) 28.04.2020 14:31 TGAG CTCTG GG TATAC GGCAGTTTG GG A AC G G A A GG TGTG TGAG C CTG GG T T GG A TTTG MEMEC (no SSC) 04.05.2020GC 10:16 Motif 3

2

1 2.0×10-35 bits A C A C AG C G GGGTG CG C C A C A 0 G T T C C G GA 2 3 4 5 6 7 8 9 1 G A A 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 GG G G G G C C GGGGC C GGC G GGMEME (no SSC) 28.04.2020 14:30 T T T A T T T TATA A A TTT

Figure 5 Identification of potential DNA motifs in the unique 5920 bp insertion upstream of the CYP2J19 gene in C. cardinalis. (a) Locations of 25 motifs identified and their distribution in the insertion sequence. Sites on the positive (+) strand are shown above the line. Scale is shown below the sequence. (b) Logos of five motifs with statistically significant e-values provided next to the logos.

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

40

35 ) 4 30

25

20

15

10 Effective population size (x10 5

0 104 105 106 Years Time ((g=2, µya=0.3x10) -8)

Figure 6 Demographic history of the northern cardinal inferred using PSMC. Bold red line is

the median effective population size estimate, whereas thin lines are 100 individual bootstrap

replicates. The light and dark grey areas indicate the Pleistocene epoch and Last Glacial

Period, respectively.

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.12.092080; this version posted May 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

UP00099_2 MA0150.2 Supplementary2 Information 2

1 1 bits bits

GGG GT G C GCT AT AA TG A A A T G T AC G C TT TCATATT C A C G G A G A C C T C TG G A A C C G T TG CCT ACACG C C G T G G G GT AA CT A G T a d G AACCT A C AGC 0 T 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 A Motif 4 16 15 14 13 12 11 10 T 15 14 13 12 11 Motif 1 GC TCA10 2 2

1

-28 bits 1 2.6×10 2.1×10-25 bits C G T G G TTG GG T GA GA GT C C G T G G GG C GG G G C G T C A C A C A C T G C C C A T TT T A GT C AC GG CA A CC TGC C C A T A G A C C A C C 0 A A T T T 9 8 7 6 5 4 3 2 1 A T A C T G G C GC 0 C 9 8 7 6 5 4 3 2 1 A AA A 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 A A 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 1 10 Tomtom (no SSC) 01.05.2020 05:46 4 G G A A G TGTG GAGTCTCTGGGG AC AGGCA T A AGGGGT TG GAGT T TGGG C T AGGCA T TomtomA G(no SSC) 04.05.2020G 10:21 Transcription factors Transcription factors KLF13_full MA0150.2 2 2 KLF13 1 Nfe2l2 bits 1 bits -04 -03 p = 6.32×10 GCA C A CA T p = 1.05×10 CA C A CTG T C T T G T A T T T A A GGG C C G A T GA A T T A A C G G C C C T T C G T G G G T G 0 A GT A 2 3 4 5 6 7 8 9 1 CT A G T G G AACCT A AGC T C 10 11 12 13 14 15 16 17 18 0 2 3 4 5 6 7 8 9 MA0150.2 T1 A 10 11 12 13 14 MA0073.115 2 GGGCG GC T 2 C 2 T A Nfe2l2 2 1 RREB1 bits 1 1 bits -03 bits 1 p = 2.15×10 -03 bits A T T A C G p = 2.86×10 G C C C T G G G C GT G G G GA G GT A CT A G T A T G AATCCT A C GC G G G T G G G G 0 9 8 7 6 5 4 3 2 1 A T C T T T T C A A T C C 15 14 13 12 11 T G 10 G G ATATAAGCT GT AT T G TTG GG C SP1_DBD 0 T T GGTG T T 2 3 4 5 6 7 8 9 GG C GG G G C G 1 T A AC A T TTCT A A C C GG C CC GC C C G T G G A A A T A C 10 11 12 13 14 15 16 17 18 19 0 20 2 3 4 5 6 7 8 9 1 C T G A GC G T C C C G GT C T A SP1_DBD A G C C A C C C A T T 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 2 50 A T A T C T G G C GC 0 C 2 3 4 5 6 7 8 9 1 A A A AA A 1 Tomtom (no SSC) 01.05.2020 05:45 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 T A 48 2 G G 2 4 G G A A G TGTG GAGTCTCTGGGG AC AGGCA T T Tomtom (no SSC) 04.05.2020 10:21 2 A AGGGGT TG GAGT T TGGG C T AGGCA T AG G SP1 1

bits SP1 1 1 bits -03 bits 1 p = 3.62×10 bits G C G T G G -03 G A T G AT TT A G T A C C CA C C T T GA A T G C 0 C A A T p = 4.73×10 9 8 7 6 5 4 3 2 T G 1 G G G G C A T 11 10 G AT MA0595.1 TT A GG C GG G G C G T A C C CA C C G G G A AC A T TTCT A A A G TT G AC GG CA A CC TGC C C A GA T 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 A C 1 G T G G C 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 11 10 T 2 C T G C C C GT C T A G A C C A C G C GG A MA0591.1A T T T 1 T A Tomtom (no SSC) 01.05.2020 05:45 C T G G C GC 0 C 2 3 4 5 6 7 8 9 1 A A A AA A 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 G G G G G G G C C GGGG C GGC 48 2 G A A T T A T T T A A A T 2 GG 4 Tomtom (no SSC) 04.05.2020 10:26 2 A AGGGGT TG GAGT T TGGG C T AGGCA T AG G SREBF1 1 bits Bach1 Mafk 1 1 bits

-03 bits 1 p = 4.81×10 G G T bits A GGC C T G C CT C G T -03 0 A 9 8 7 6 5 4 3 2 C A 1 G A T C

10 T T G C G G T GG C GG G G UP00099_2 C G p = 4.91×10 GC C C C A C CA C GGC A A G G G G G T A T TT T A A A T G A C T G T A G A C GG C CC GC C C T G T C A TT 0 A A A T A G A G 9 8 7 6 5 4 3 2 1 A CG SP1_DBD 0 2 3 4 5 6 7 8 9 C 1 G T G G C 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 T T 2 C T C C C 10 11 12 13 14 15 G GT C G T A G A C C A C C 1 Tomtom (no SSC) 01.05.2020 05:45 A A TTT A T T T A 2 C T G G C GC 0 C 9 8 7 6 5 4 3 2 1 G C A A A AA A 2 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 G G A A G TGTG GAGTCTCTGGGG AC AGGCA T 10 CT A A G 4 Tomtom (no SSC) 04.05.2020 10:26 Ascl2 GGGG G G GT GGG C GGC G G 2 A A T T A T T T T A A T A

1 secondary bits 1

1 bits bits

-03 1 p = 5.18×10 GGG SOX8_DBD_5 bits G GT G C GCT AT AA TG A A G A CC T TA TG T T TCA T C G A G G A C C T 2 TG A A GTG CCTGACACG C T C 0 C G 9 8 7 6 5 4 3 2 A 1 A T G G G G A T G T AT TT A 16 15 14 13 12 11 10 T A C C C GG C GG G G C G CA A C C Motif 5 A T A G A AC T TTCT A A 0 2 3 4 5 6 7 8 9 0 AC GG CA A CC TGC C C A 1 9 8 7 6 5 4 3 2 1 A G G 10 G 11 TTG G 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 C G T G G e C 1 T Tomtom (no SSC) 01.05.2020 05:46 C T G C C C GT C T A G A C C A C CA A TTT A T GC T G G C GC 0 C 2 3 4 5 6 7 8 9 G G G G G G G C C GGGG C GGC 1 G A A A AA A 2 G 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 A A T T A T T T A A A T 48 4 1 2 Tomtom (no SSC) 04.05.2020 10:26 bits A AGGGGT TG GAGT T TGGG C T AGGCA T AG G

1 G

bits G

AT 0 AGC T -29 1 2 3 4 5 6 7 8 9 1 bits 10 11 12 13 b Motif 2 G C 14 1.5×10 G G T C T C T T A AAT A A A T T G G G 2 GG C C GG G G C G A G G G A AC T TTCT A A T AA 0 AC GG CA A CC TGC C C A 9 8 7 6 5 4 3 2 1 A T G C G G 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 C G G G T T C T G G 1 Tomtom (no SSC) 01.05.2020 05:46 C A A A A A A G C A C G C C GA C CA A 0 C C 2 3 4 5 6 7 8 9 1 T T A A 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 G G A A G TGTG GAGTCTCTGGGG AC AGGCA T 50 5 Tomtom (no SSC) 04.05.2020 10:28 A A GG TGTG TGAG C CTG SP1_DBDGG T T GG A TTTG C GC 1 2 2.0×10-32 bits T G CG AA AGCG C G AG GGT T SP1 0 C T CG CA A T TT AC G 2 3 4 5 6 7 8 9 1 A T T A A 1 bits 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 2 50 G G C C G GG C SOX8_DBD_5GGC G G GG C GTomtom (no SSC) 30.04.2020G 17:05 T A T T TATA A TTT A A -03 2 p = 2.04×10 G C G T G G A T G AT TT A T A C C CAGA A CT C 0 2 3 4 5 6 7 8 9 1 UP00099_2 10 SOX8 DBD5 11 2 GG G 1 bits Ascl2 2 -03 p = 3.99×10 MA0758.1 G G 2 1 AT secondary GC T 0 A bits 2 3 4 5 6 7 8 9 1 10 11 12 13 14 1 G C G C bits T AAT T A T A -03 2 p = 2.06×10 GGG GT G C GCT AT AA TG A A G AC G C TT TCATATT C G A G A C C T 1 TG A A TG CCTGACACG C bits 0 9 8 7 6 5 4 3 2 1 T G AA G G MA0595.1 16 15 14 13 12 11 10 T G C G G C G G G T T C T G G C A A A A A A 2 G C A C G C C GA C CA A 0 C C 2 3 4 5 6 7 8 9 1 T T A A 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 5 50 1 T 2 Tomtom (no SSC) 04.05.2020 10:28 A c bits T GG G G G G C C G GG GG G C GC T CC A C C G A A T T T A T T T A TTT 0 2 3 4 5 6 7 8 9 A1 10 11 12 13 G C Motif 3 C T GGCGGGG A 14 SREBF1 A TT T T 1 AA bits 2 T G A AG G C AG G 0 C T CG CA A T TT AC G 2 3 4 5 6 7 8 9 1 A T T A A 1 bits 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 2 50 -03 G G C C G GG C GGC G G GG C GTomtom (no SSC) 30.04.2020G 17:05 p = 2.18×10 G T A T T TATA A TTT A A T G A GGC C C CT T 0 A C G G G G 2 3 4 5 6 7 8 9 -35 1 G T AA 1 10 T EBF1_full

bits G C G G C G G G T T C T G G 2.0×10 C A A A A A A G C A C G C C GA C CA A 20 C C 9 8 7 6 5 4 3 2 1 T T A A 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 T 10 TGA 5 A C G 2 A A GG TGTG TGAG C CTG GG T T GG A TTTG TomtomC (no SSC) 04.05.2020GC 10:28 A C A C G GGGTG CG C C A C A 0 G G T T C C G GA 2 3 4 5 6 7 8 9 1 A A 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 EBF1 3 Tomtom (no SSC) 01.05.2020 08:14 1 GG G G G G C C GGGGC C GGC G GG bits T T T A T T T MA0150.2TATA A A TTT 1 2 p = 3.42×10-03 bits

AT TT C G A AT GC A CA A G T AA TC A CCG GG GTTG 0 T 9 8 7 6 5 4 3 2 1 T AG AA G G 14 13 12 11 T MA0150.210 G C C G G C G G G T T T C T G G Nfe2l2 C A A A A A A 2 G C A C G C C GA C CA A 1 0 C C 2 3 4 5 6 7 8 9 1 C C G T T A A bits G G 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 5 50 2 A A GG TGTG TGAG C CTG GG T T GG A TTTG TomtomC (no SSC) 04.05.2020GC 10:28 p = 3.72×10-04 A T T A G CCG C C G T C GT G G GT AA CT A G T G AATCCT A C AGC Nfe2l2 0 9 8 7 6 5 4 3 2 1 A T 1 bits 15 14 13 12 11 GC 10 MA0501.1 C 1 2 T A -03 bits 2 p = 3.95×10

A T T A G CCG C C G T C GT G G GT AA CT A G T G AATCCT ATC AGC G AA G G 0 9 8 7 6 5 4 3 2 1 A T T KLF16_DBD G C G G 15 14 13 12 11 C G G G 10 T T C T G G MAF NFE2 C A A A A A A G C A C G C C GA C CA A 20 C C 9 8 7 6 5 4 3 2 1 GC T T A A 1 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10

bits C 1 5 Tomtom (no SSC) 04.05.2020 10:29

bits T A 2 A A GG TGTG TGAG C CTG GG T T GG A TTTG C GC p = 2.78×10-03 T AA A C GT AT C G T GACT T A C A A C G A CT A CCG G CC G T C A C A KLF16 0 1 9 8 7 6 5 4 3 2 1 G G T G

T bits 15 14 13 12 11 C 10 G G G G C C A C A C A GT C C G G 0 G T T A 9 8 7 6 5 4 3 2 1 C MA0591.1 A A 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 G 10 C 3 Tomtom (no SSC) 01.05.2020 08:13 1 2 GG G G G GTCAC GGGGC C GGC G GG -03 bits 2 T T T A T T T TATA A A TTT p = 4.83×10 G C T GC T T T T T AA AG A T C A CCAC G G A 0 AT G T 2 3 4 5 6 7 8 9 1 T G AA G G 10 11 T SP3_DBD Bach1 Mafk G G G G C G G C G G G T T C T G G 2 C A A A A A A G C A C G C C GA C CA A 1 0 C C 9 8 7 6 5 4 3 2 1 T T A A

bits GG 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 1 10

bits 5 2 GG G G G G C C G GG GG G TomtomC (no SSC) 04.05.2020GC 10:29 -03 A A T T T A T T T A TTT p = 3.14×10 TC T A GC C C C G SP3 CA C GGC A ATC T TAG G A G TG GCTCG A GATA A C A 0 G T G 1 9 8 7 6 5 4 3 2 1

T C G G G G C C bits 15 14 13 12 11 10 C A C A 0 G G T T C C G GA 9 8 7 6 5 4 3 2 1 G C A A UP00094_1 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 T 10 A 1 3 Tomtom (no SSC) 01.05.2020 08:14 C A bits G -03 2 GG G G G GTC C GGGGC C GGC G GG 2 T T T A T T T TATA A A TTT p = 4.83×10 G G C G T A T C TA T T T A G T TC A ACAAA T G A CTGAC CGCGC 0 2 3 4 5 6 7 8 9 1 T G AA G G MA0073.1 10 11 T Zfp128 primary G G C G G C G G G T T C T G G 2 G A A A G C A A A G C A C G C C GA C CA A 0 C C 2 3 4 5 6 7 8 9 1 T T A A 1 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 bits 1 5 Tomtom (no SSC) 04.05.2020 10:29 bits 2 A A GG TGTG TGAG C CTG GG T T GG A TTTG C GC p = 3.40×10-03 T CCC G RREB1 G A CT G A G T A T G TA A GTC G C C T TT A A 1 A A G G T T C C A T A A A A 0 G T G bits 2 3 4 5 6 7 8 9 C 1 G G G G G C C C C 10 11 12 13 14 15 16 17 A A 0 G G T T C C G GA 9 8 7 6 5 4 3 2 1 G A A 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 MA0758.1 1 C 3 C Tomtom (no SSC) 01.05.2020 08:14 bits -03 2 GG TGTG TGAGTCTCTGGGGC TATAC AGGCAGTTT GG 2 p = 4.96×10 G G G G G T G G G G T G G T TT C C C ATATAAGCT T AT T 0 T T GGTG T T 2 3 4 5 6 7 8 9 1 T T G AA G G 10 11 12 13 14 15 16 17 18 19 G T G 20 G C G G E2F7 C G G G T T C T G G C A A A A A A G C A C G C C GA C CA A 0 C C 2 3 4 5 6 7 8 9 1 T T A A 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 G G 50 1 T bits 5 1 2 Tomtom (no SSC) 04.05.2020 10:29 bits A A GG TGTG TGAG C CTG GG T T GG A TTTG C GC p = 5.52×10-03 T A A C GT T CC 0 A CGA CT C GA G 9 8 7 6 5 4 3 2 A1 14 13 12 11 C G G G G 10 C C C A C A 0 G G T T C C G GA 2 3 4 5 6 7 8 9 1 A A 1 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 A 50 T G bits 3 GG G G G G C C GGGGC C GGC G TT GCGGGTomtomA (noA SSC) 01.05.2020 08:14 2 T T T A T T T TATA A A TTT T T G AA G G G C G G C G G G T T C T G G C A A A A A A G C A C G C C GA C CA A 0 C C 2 3 4 5 6 7 8 9 1 T T A A 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 5 Tomtom (no SSC) 04.05.2020 10:29 1 Figure S1 Identificationbits of potential binding transcription factorsA A GG TG TinG T GtheAG C CuniqueTG GG T T 5920GG A T TbpTG C GC A C A C AG C G GGGTG CG C C A C A 0 G G T T C C G GA 9 8 7 6 5 4 3 2 1 A A 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 3 insertion upstreamGG TGTG T GofAGT CtheTCTGG GCYP2J19GC TATAC AGGCAG TgeneTT GinGTomtom (no SSC) 01.05.2020C. 08:14 cardinalis. (a–e) Logos of five motifs with statistically significant e-values provided next to the logos. Transcription factors predicted for the 5 motifs are shown below the query motifs.

28