
Article

Platypus and echidna genomes reveal mammalian biology and evolution

https://doi.org/10.1038/s41586-020-03039-0
Received: 4 December 2019
Accepted: 30 July 2020
Published online: 6 January 2021
Open access

Yang Zhou1,2,36, Linda Shearwin-Whyatt3,36, Jing Li4,36, Zhenzhen Song1,5, Takashi Hayakawa6,7, David Stevens3, Jane C. Fenelon8, Emma Peel9, Yuanyuan Cheng9, Filip Pajpach3, Natasha Bradley3, Hikoyu Suzuki10, Masato Nikaido11, Joana Damas12, Tasman Daish3, Tahlia Perry3, Zexian Zhu4, Yuncong Geng13, Arang Rhie14, Ying Sims15, Jonathan Wood15, Bettina Haase16, Jacquelyn Mountcastle16, Olivier Fedrigo16, Qiye Li1, Huanming Yang1,17,18,19, Jian Wang1,17, Stephen D. Johnston20, Adam M. Phillippy14, Kerstin Howe15, Erich D. Jarvis21,22, Oliver A. Ryder23, Henrik Kaessmann24, Peter Donnelly25, Jonas Korlach26, Harris A. Lewin12,27,28, Jennifer Graves29,30,31, Katherine Belov9, Marilyn B. Renfree8, Frank Grutzner3 ✉, Qi Zhou4,32,33 ✉ & Guojie Zhang1,2,34,35 ✉

Egg-laying mammals (monotremes) are the only extant mammalian outgroup to therians (marsupial and eutherian mammals) and provide key insights into mammalian evolution1,2. Here we generate and analyse reference genomes of the platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus), which represent the only two extant monotreme lineages. The nearly complete platypus genome assembly has anchored almost the entire genome onto chromosomes, markedly improving the genome continuity and gene annotation. Together with our echidna sequence, the genomes of the two species allow us to detect the ancestral and lineage-specific genomic changes that shape both monotreme and mammalian evolution. We provide evidence that the monotreme sex chromosome complex originated from an ancestral chromosome ring configuration. The formation of such a unique chromosome complex may have been facilitated by the unusually extensive interactions between the multi-X and multi-Y chromosomes that are shared by the autosomal homologues in humans. Further comparative genomic analyses unravel marked differences between monotremes and therians in haptoglobin genes, lactation genes and chemosensory receptor genes for smell and taste that underlie the ecological adaptation of monotremes.

The iconic egg-laying monotremes of Australasia represent one of the three major mammalian lineages. The monotreme lineage comprises two extant families, the semi-aquatic Ornithorhynchidae (platypus) and the terrestrial Tachyglossidae (echidna). At present, the single species of platypus has a restricted distribution in Eastern Australia, whereas four echidna species (T. aculeatus and three Zaglossus spp.) are present in Australia and New Guinea (Supplementary Information). Platypuses and echidnas feature radical differences in diet (carnivorous compared with insectivorous), neurophysiology (electroreception-oriented compared with olfaction-oriented), as well as specific intraspecific conflict and defence adaptations1. Owing to their distinct ecological, anatomical and physiological features, monotremes are interesting mammals well-suited for the study of the evolution of ecological adaptation. Of particular interest are their sex chromosomes, which originated independently from those of therian mammals through additions of autosomes onto an ancestral XY pair, resulting in a multiple sex chromosome system that assembles as a chain during meiosis3.

The previous female platypus genome assembly (OANA5) provided many important insights into monotreme biology and mammalian evolution. However, only about 25% of its sequence was assigned to chromosomes2. The incomplete platypus assembly without Y chromosome sequences and the lack of an echidna genome have limited the interpretation of the evolution of mammals and monotremes. Here we combined PacBio long-read, 10× linked-read, chromatin conformation (Hi-C) and physical map data to produce a highly accurate chromosome-scale assembly of the platypus genome. We also produced a less-continuous assembly for the short-beaked echidna, which enables us to infer the genomic changes that occurred in the ancestral monotremes and other mammals.

Chromosome-scale monotreme genomes
Our new male platypus genome assembly (mOrnAna1) shows a 1,390-fold improvement for the contig N50 and a 49-fold improvement for the scaffold N50 compared with the previous Sanger-based assembly (OANA5) (Fig. 1a). We performed extensive error correction and manual curation to polish and anchor the assembly at the chromosome scale (Extended Data Fig. 1a, b). Ambiguous chromosome assignments were resolved with fluorescence in situ hybridization (FISH) experiments (Extended Data Fig. 1c, d). We also produced a male echidna

A list of affiliations appears at the end of the paper.

756 | Nature | Vol 592 | 29 April 2021

[Fig. 1 graphic. Panel a: number of bases (Mb) plotted against contig length (kb, top of interval) for the assemblies OANA5, mOrnAna1 and mTacAcu1. Panel b: mammalian karyotype tree with diploid chromosome numbers (2n), rearrangement counts per branch and the platypus chromosomes X1 to X5.]

Fig. 1 | Chromosome assembly of monotreme and mammalian genome evolution. a, The contig length distribution among the three monotreme assemblies shows a large improvement in the sequence continuity of the platypus assembly, and at least equivalent quality of the echidna assembly. b, Mammalian karyotype evolution trajectory. 2n = 60 ancestral karyotypes were inferred for the common ancestor of mammals. Conserved blocks were colour-coded in accordance with their chromosomal source in the mammalian ancestor. Numbers of estimated rearrangements are shown for each branch. Silhouettes of the human and opossum are from https://www.flaticon.com/. Silhouettes of the platypus and Tasmanian devil are created by S. Werning and are reproduced under the Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0/).
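The continuity of an assembly, shown in Fig. 1a as a contig length distribution, is usually summarised by the N50 statistic: the length L such that contigs of length L or longer contain at least half of all assembled bases. The following is a minimal sketch of that standard definition. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the assembly.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print("N50 of toy assembly:", n50(toy_contigs))
```

A fold improvement such as those quoted for mOrnAna1 relative to OANA5 is then simply the ratio of the two assemblies' N50 values.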

genome (mTacAcu1) from a variety of short- and long-insert-size libraries, and further scaffolded it using the same methods as in platypus. The resulting mTacAcu1 sequence shows better sequence continuity than OANA5, with a scaffold N50 size of 32.51 Mb (Supplementary Table 2).

To study the origin and evolution of monotreme sex chromosomes, we greatly improved the assembly of the platypus sex chromosomes. We anchored 172 Mb of X-borne sequences (92%, compared with 22% in OANA5) to chromosomes (Supplementary Tables 4, 6). This includes one 1.6-Mb segment that was previously misassigned to chromosome 14 (Extended Data Fig. 1e). We determined all of the pseudoautosomal regions (PARs) except for X4, on the basis of the different read coverage between sexes and representation of FISH markers (Supplementary Table 3). We also mapped 92% of the platypus Y-borne sequences to the five Y chromosomes using PacBio reads produced using Y-borne bacterial artificial chromosome (BAC) clones4 (Supplementary Tables 5, 6). Owing to a lack of echidna linkage markers, we used the platypus X chromosomes as a reference to anchor a similar length (177 Mb, 96%) of X chromosomes and identified 8.6 Mb of Y-borne sequences in echidna.

In the final curated platypus genome (mOrnAna1), 98% of the sequence was assigned to the 21 autosomes, 5 X and 5 Y chromosomes (Supplementary Table 7), with putative telomeres and centromeres annotated for half of the chromosomes (Supplementary Table 8). mOrnAna1 fills around 90% of the gaps in OANA5 (Supplementary Table 9), recovering 161 Mb of previously missed genomic sequences, most of which are long interspersed nuclear elements (LINE)/L2 and short interspersed nuclear elements (SINE)/MIR (Supplementary Tables 10, 11). We also removed 68 Mb of redundant sequences in OANA5 (Extended Data Fig. 1f–h). The repeat elements comprising about half of the monotreme genomes are dominated by LINE/L2 elements, making them more similar to reptile genomes than to those of therian mammals (which comprise mostly LINE/L1)5 (Supplementary Table 12). The highly continuous assembly also substantially improves gene annotation. We identified 20,742 and 22,029 protein-coding genes in mOrnAna1 and mTacAcu1, respectively (Supplementary Table 13). Specifically, 19,576 coding exons from 8,303 platypus genes were recovered from the gapped regions of OANA5. Among them, 454 genes were completely missed in OANA5, and 3,961 fragmented genes in OANA5 now have complete open-reading frames. We corrected 2,395 genes that were previously split or misannotated in OANA5 (Extended Data Fig. 1i, j).

Insights into mammalian genome evolution
Our phylogenomic reconstruction shows that monotremes diverged from therians around 187 million years ago, and the two monotremes diverged around 55 million years ago (Extended Data Fig. 2a). This estimate provides a date for the monotreme–therian split that is earlier than previous estimates (about 21 million years ago)2, but agrees with recent analyses of few genes and fossil evidence6. We also inferred that monotremes had genome substitution rates (approximately 2.6 × 10⁻³ substitutions per site per million years) similar to those of other mammals (Supplementary Table 15). About 14 Mb of mammalian-specific highly conserved elements were identified by comparison among vertebrates (Methods): around 90% of elements were located in non-coding regions (Extended Data Fig. 2c), and are associated with genes that are enriched

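[Fig. 2 graphic. Panel a: circular plot of platypus chromosomes X1 to X5 with PAR and Y-fragment tracks, strata S0 to S6, DNA and RNA female-to-male ratio tracks and a GC content track (30% to 80%); triangles mark shared or independent time of sex chromosome formation. Panel b: homology between X1 to X5 and Y1 to Y5, scaled by sequence similarity (42.8% to 100.0%), with AMHX and AMHY marked.]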

Fig. 2 | Origin and evolution of the sex chromosomes of the platypus. a, Genomic composition of the platypus sex chromosomes. From the outer to inner rings: the X chromosomes with the PARs (light colours) and SDRs (dark colours) labelled; the assembled Y chromosome fragments within SDRs showing the colour-scaled sequence divergence levels with the homologous X chromosomes; female-to-male (F/M) ratios of short sequencing-read coverage in non-overlapping 5-kb windows; F/M expression ratios (each red dot is one gene) of the adult kidney and the smoothed expression trend; and GC content in non-overlapping 2-kb windows. In addition, we labelled the positions on the X chromosome ring of the gametologue pairs that have suppressed recombination before the divergence of monotremes (‘shared’, orange triangles) or after the divergence (‘independent’, blue triangles). b, Homology between X and Y chromosomes of platypus. In particular, most of Y5 shows homology with X1 and X2, which suggests an ancestral ring conformation of the platypus sex chromosomes. We also labelled the position of the putative sex-determining gene AMH. The platypus silhouette is created by S. Werning and is reproduced under the Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0/).
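Two of the tracks in Fig. 2a are simple per-window statistics, and a short sketch may make them concrete. For X-linked sequence, the female-to-male coverage ratio is expected to be close to 1 in PARs (both sexes carry two recombining copies) and close to 2 in SDRs (two X copies in females, one in males). The code below is an illustration under stated assumptions, not the authors' pipeline: the function names, the depth normalisation and the 1.5 cutoff are all hypothetical choices, and the coverage values are made up.

```python
def fm_ratios(female_cov, male_cov, female_mean, male_mean):
    """Female-to-male coverage ratio per window.

    Each sample is scaled by its own genome-wide mean depth so that
    differences in sequencing effort cancel out. Windows with zero
    male coverage yield None.
    """
    ratios = []
    for f, m in zip(female_cov, male_cov):
        if m == 0:
            ratios.append(None)
        else:
            ratios.append((f / female_mean) / (m / male_mean))
    return ratios


def label_windows(ratios, cutoff=1.5):
    """Label windows 'PAR' (ratio near 1) or 'SDR' (ratio near 2).

    The cutoff of 1.5 is an illustrative assumption.
    """
    labels = []
    for r in ratios:
        if r is None:
            labels.append("NA")
        else:
            labels.append("SDR" if r >= cutoff else "PAR")
    return labels


def first_boundary(labels):
    """Index of the first window whose label differs from the previous one."""
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            return i
    return None


def gc_windows(seq, size):
    """GC fraction in consecutive non-overlapping windows of `size` bases."""
    fractions = []
    for start in range(0, len(seq) - size + 1, size):
        window = seq[start:start + size].upper()
        fractions.append((window.count("G") + window.count("C")) / size)
    return fractions


# Made-up mean coverage per window along a hypothetical X chromosome.
female = [31, 29, 30, 58, 61, 60]
male = [30, 30, 29, 30, 31, 29]
labels = label_windows(fm_ratios(female, male, 30, 30))
print(labels, "boundary at window", first_boundary(labels))
print(gc_windows("GGCCAATT", 4))
```

In practice the window size (5 kb for coverage and 2 kb for GC content in the figure) trades positional resolution against noise in the per-window estimate.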

in processes such as brain development (Extended Data Fig. 2d, e, Supplementary Results and Supplementary Tables 18–20).

Next we used chromosome information from human, opossum, Tasmanian devil, platypus, chicken and common wall lizard genomes to reconstruct the mammalian ancestral karyotype (Methods). This analysis reveals 30 mammalian ancestral chromosomes (MACs) (2n = 60) at a resolution of 500 kb, covering around 66% of the human genome and approximately 67% of the platypus genome (Fig. 1b and Supplementary Tables 24–26). Of these, 25 MACs were maintained without breaks in a single chromosome of the therian ancestor, and 17 of them have fused with other MACs in therians. Sixteen MACs were still maintained in a single human chromosome, but only MAC28 had not undergone any intrachromosomal rearrangements during therian evolution (Extended Data Fig. 2f, g). We detected at least 918 chromosome breakage events, and confirmed that the X chromosome in humans was derived from the fusion of an original therian X chromosome with an autosomal region after the divergence from marsupials7 (Fig. 1b and Extended Data Fig. 2f, g). The five X chromosomes in platypus were derived from different MACs by multiple fusion and translocation events.

We found that gene families associated with the immune response and hair growth were expanded considerably in the mammalian ancestor, perhaps contributing to the evolution of immune adaptation and fur, respectively, in mammals (Supplementary Table 30). We further manually annotated major histocompatibility complex (MHC) genes and other immune genes (Supplementary Results). As in nonmammalian vertebrates, the monotreme MHC class Ia genes colocalize with antigen-processing genes and MHC class II genes (Extended Data Fig. 3a and Supplementary Table 31). The defensin genes gave rise to unique defensin-like peptides (OavDLP genes) in platypus venom8. By contrast, echidna has only a single OavDLP pseudogene (Extended Data Fig. 3f–h), suggesting the loss of the key venom gene family in this species.

Monotreme sex chromosome evolution
To elucidate the detailed genomic composition of the monotreme sex chromosomes, we compared regions that share sequences between the sex chromosomes (that is, the PARs) with regions that have become sexually differentiated (SDRs). PAR boundaries show a sharp shift in the female-to-male sequencing coverage ratio as expected (Fig. 2a and Extended Data Fig. 4a). Both monotremes showed generally nonbiased gene expression levels between sexes within PARs, but pronounced female-biased expression within SDRs, indicating the absence of complete chromosome-wide dosage compensation in monotremes as previously suggested9 (Extended Data Fig. 4b).

The short PARs of platypus chromosomes X2–X5 have a significantly higher GC content (one-sided Wilcoxon rank-sum test, P < 0.01) than the SDRs or the longer PARs (Extended Data Fig. 4c), which probably reflects strong GC-biased gene conversion that is caused by a high recombination rate10. This is similar to the pattern of the short GC-rich human PAR, the recombination rate of which is 17-fold higher than the genome-wide average11. Notably, chicken orthologous sequences of these monotreme PARs are all located on the microchromosomes, which also have a high GC content12 (one-sided Wilcoxon rank-sum test, P < 0.01) (Extended Data Fig. 4c, d). This highly conserved recombination landscape might be partially selected in monotremes for maintaining the sequence polymorphism and balanced dosage of MHC genes, which reside in the PARs of the chromosome X3–Y3 and Y4–X5 pairs in platypus13 (Extended Data Fig. 3a). The regional selection for high recombination may also counteract further expansion of SDRs on these sex chromosomes.

Sex chromosomes of both eutherians and birds formed through stepwise suppression of recombination, resulting in a pattern of pairwise sequence divergence between SDRs termed ‘evolutionary strata’14,15. We identified at least seven strata in monotremes, named S0 to S6 from the oldest to the youngest strata (Fig. 2a and Extended Data Fig. 4a), by ranking their levels of pairwise synonymous sequence divergence between the X–Y gametologues and the phylogeny (Extended Data Fig. 5a, b). All but the most recent strata (S5 and S6) are shared by platypus and echidna. However, the PARs that border S5 and S6, as well as the shorter PARs of chromosomes X2 and X5 (Extended Data Fig. 5c, d), formed independently after their divergence. Overall, the distribution of evolutionary strata suggested a time order of incorporating different

[Fig. 3 graphic. Panel a: Hi-C interaction maps for platypus and human. Panel b: FISH images of Y2 and Y3 signals showing no interaction and interaction. Panel c: percentage of cells showing association for Y2–Y3, Y2–X1 and Y2–WSB1. Panel d: CTCF binding sites per 10 kb (scale 0 to 14.1) along the X1 to X5 homologous regions in platypus, human and chicken.]

Fig. 3 | Interactions between the platypus sex chromosomes. a, Interchromosomal interactions among the platypus sex chromosomes detected by Hi-C data of liver tissue in platypus (top) and human (bottom). The bars between the Hi-C panels show the platypus sex chromosomes and their orthologues in the human genome. Grey, intrachromosomal interactions; red, interchromosomal interactions. Red lines link the regions with significantly high interchromosomal interactions. The interchromosomal interactions seem to be conserved in mammals, as indicated by the homologous chromosomal fragments of the human and platypus sex chromosomes and their Hi-C contact patterns. b, FISH with BAC probes to detect sex chromosomes Y2, Y3 or X1 and autosome chromosome 17 (WSB1) in interphase platypus fibroblasts. Examples show no interaction between chromosomes Y2 and Y3 (top, n = 593, 3 independent experiments) and interaction (bottom, n = 56, 3 independent experiments). Scale bars, 10 μm. c, The significantly higher frequency of interaction between Y2 and Y3 than that between Y2 and X1, and between Y2 and WSB1 (chromosome 17). n = 185, 206, 258 cells for the three independent replicate experiments of Y2–Y3, n = 258, 250, 205 cells for the three independent replicate experiments of Y2–X1, n = 298, 262, 220 cells for the three independent replicate experiments of Y2–WSB1. Data are mean ± s.d. ***P < 0.001 (Y2–Y3 versus Y2–X1, P = 0.0004675; Y2–Y3 versus Y2–WSB1, P = 6.376 × 10⁻⁵), one-sided Fisher’s exact test. d, Putative CTCF-binding-site density plot showing its enrichment among homologous regions in the platypus, human and chicken genomes.
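The comparison in Fig. 3c rests on a one-sided Fisher's exact test applied to counts of cells with and without overlapping FISH signals. As a hedged illustration of how such a test works, the sketch below computes the upper-tail hypergeometric probability for a 2 × 2 table using only the Python standard library. The function name and the cell counts are hypothetical and are not the paper's data. How the authors pooled their replicates is not stated here, so no attempt is made to reproduce the reported P values.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided (greater) Fisher's exact test for the table [[a, b], [c, d]].

    Rows are the two probe pairs; columns are cells with and without an
    association. Returns P(X >= a) when the margins are held fixed, where
    X follows the hypergeometric distribution.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed.
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / denom


# Hypothetical counts: 30 of 200 cells show association for one probe
# pair, against 5 of 200 cells for a control pair.
p = fisher_one_sided(30, 170, 5, 195)
print(f"one-sided P = {p:.3g}")
```

Because the test conditions on the table margins, it stays exact at small counts, which suits data in which only a few per cent of cells show an association.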

ancestral autosomes into the sex chromosome chain: it started from the S0 region of X1 containing a sex-determining gene (see below), followed by X2, X3 and X5. X4 and individual regions of X3 and X1 underwent suppression of recombination after the monotreme divergence.

Despite episodes of independent evolution, most sex chromosome regions of the platypus and echidna are homologous (Extended Data Fig. 6a), suggesting that the complex formed in the monotreme ancestor16. To reconstruct its origin, we projected the platypus sex chromosomes onto their chicken homologues (Supplementary Table 39). This refined homology map (Extended Data Fig. 4d) suggests that both fusions and reciprocal translocations among the ancestral micro- and macrochromosomal fragments gave rise to the monotreme sex chromosome complex. The platypus X chromosomes contain homologous sequences of the entire or partial chicken microchromosomes 11, 16, 17, 25 and 28. These microchromosomes also have orthologues in the spotted gar17, suggesting that they were ancestral vertebrate microchromosomes, and fused in the ancestral monotreme or mammalian chromosomes. Evidence of reciprocal translocations came from the observation that parts of every two neighbouring sex chromosomes are homologous to two adjacent regions of the same chicken chromosome (Extended Data Fig. 6c, d). For example, platypus chromosomes X1 and X2 are both homologous to parts of chicken microchromosome 12 and chromosome 13, whereas X2 and X3 are both homologous to chicken chromosome 2.

Notably, X1 at one end of the meiotic chain and Y5 at the other share this alternately overlapping relationship, and both are homologous to chicken microchromosome 28. Indeed, most of the genes on Y5 are not found on its pairing partner X5, but on X1 (Fig. 2b and Supplementary Table 40). Chromosomes X1 and Y5 do not pair at meiosis, but this homology suggests that the origin of the extant monotreme sex chromosome complex involved the opening of the ancestral chromosomal ‘ring’ as degeneration proceeded18. A conserved vertebrate sex-determining gene, the anti-Müllerian hormone, is located on chromosome Y5 (AMHY) and S0 of chromosome X1 (AMHX)14 (Fig. 2b). The ancestral X1–Y5 pairing region that encompasses AMH could, therefore, be the site at which homologous recombination was first suppressed. The degeneration of chromosome Y5 then caused the loss of homology with X1 and led to the break of the chromosome ring. Indeed, synonymous substitution rates (dS) between the retained X1–Y5 gametologue pairs are significantly higher (one-sided Wilcoxon rank-sum test, P < 0.01) than those of any other sex chromosome pairs (Extended Data Fig. 6e). A chromosome ring configuration has been reported in plants19, but not in any animal species. Alternatively, the ancestral ring structure might have evolved after the emergence of the proto-X1–Y5 pair by translocations that involve other autosomes, so that sexually antagonistic alleles could be linked to the sex-determining genes20.

Interactions between sex chromosomes
The platypus sex chromosomes exhibit an unusual association with each other compared to autosomes during and after meiosis21. As little is known about their spatial organization in platypus somatic cells, we investigated this using Hi-C data (male liver) and chromosomal FISH with sex-chromosome-specific and autosomal BAC probes (male

[Fig. 4 graphic. Panel a: TAS2R, OR and V1R gene repertoires, including a monotreme-specific cluster. Panel b: HP gene phylogeny and synteny, annotated with the ancestral HP duplicated from MASP, HP loss events, red blood cell enucleation, the therian-specific HP (greater haem affinity and CD163 interaction) and the primate-specific HP duplication. Panel c: synteny of the casein (CSN) locus and neighbouring genes.]

Fig. 4 | Genomic features related to biological characteristics of the monotremes. a, Differences in numbers of TAS2R, OR and V1R genes between platypus and echidna. b, Phylogeny and synteny of the HP gene. Regions are not drawn to scale. c, Synteny conservation of the region surrounding caseins (CSN genes) and the ancestral teeth genes (ODAM, FDCSP, AMTN, AMBN and ENAM). Silhouettes of the human, opossum, koala and frog are from https://www.flaticon.com/. Silhouettes of the platypus and Tasmanian devil are created by S. Werning and the emu silhouette is created by D. Naish (vectorized by T. M. Keesey); all three silhouettes are reproduced under the Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0/).

fibroblasts). Notably, Hi-C data showed that chromosomes Y2 and Y3 undergo frequent interchromosomal interactions, whereas autosomes confine their interactions mostly within chromosomes (Fig. 3a and Extended Data Fig. 7a–d). FISH showed that chromosomes Y2 and Y3 signals overlapped more frequently (5.2- and 7.6-fold) than signals between chromosomes Y2 and X1 or Y2 and an autosome (chromosome 17) (P = 8.67 × 10⁻⁴ and 8.57 × 10⁻⁵, respectively) (Fig. 3b, c and Supplementary Table 41). These interactions allow us to predict a zigzag three-dimensional conformation of the sex chromosomes at interphase (Extended Data Fig. 7e). A similar pattern was also present in echidna (Extended Data Fig. 7f). Notably, the high interaction frequency is conserved in human orthologous autosomal regions (Fig. 3a), suggesting functional importance unrelated to the evolution or function of sex chromosomes.

We further examined the distribution of putative binding sites of the CTCF protein, which is usually enriched at the boundaries of topologically associated domains (TADs) and mediates both intra- and interchromosomal interactions22. This revealed considerable enrichment of putative CTCF-binding sites at the TAD boundaries of the platypus genome (Extended Data Fig. 7g), which are more enriched along the interacting sex chromosomes X2 and X4, as well as along their orthologous regions in human and chicken (Fig. 3d and Extended Data Fig. 7h). These results suggest that an ancestral interaction landscape facilitated by local enrichment of CTCF-binding sites could have promoted the reciprocal translocations between spatially adjacent autosomal fragments that gave rise to the sex chromosome complex in the monotremes.

Eco-evolutionary adaptation of diet
Platypuses consume aquatic invertebrates whereas echidnas feed predominantly on social insects. Although the recent ancestor of monotremes had adult teeth, both extant monotremes lack teeth23. Of eight genes involved in tooth development24, four genes were lost in both monotreme genomes, suggesting that the loss occurred in their recent common ancestors (Extended Data Fig. 8a and Supplementary Table 42), consistent with other toothless or enamel-less eutherians25. Echidnas (but not platypuses) further lost two enamel genes. Analysis of genes involved in stomach function revealed that the considerable loss of digestive genes (reported in platypus26) is shared with echidna and probably occurred in the monotreme ancestor, although NGN3 (which is essential for stomach and pancreas development) has been maintained in both species (Extended Data Fig. 8b–g and Supplementary Table 43).

Chemosensory systems mediate animal behaviour that is essential for survival and reproduction through the direct interaction with environmental chemical cues27. For example, eutherian mammals have more than 25 copies of bitter taste receptor genes (TAS2R genes)27,28, whereas this gene family is considerably smaller in monotremes (Extended Data Fig. 9a) with only 7 in platypus (Supplementary Tables 44, 45). The number is reduced to three in echidna (Fig. 4a and Supplementary Results). This reduction is also observed in pangolins, which suggests convergent evolution that results from the insectivore diet of both echidnas and pangolins29.

The nasal cavity of the platypus is closed off during diving and the size of the main olfactory bulb of the platypus is much smaller than that of the echidna1. Consistent with this, the number of olfactory receptors (OR genes) in platypus (299) is much smaller than in echidna (693) (Fig. 4a and Supplementary Table 46). The difference in the large olfactory bulb and OR repertoire in echidna may contribute to the ability to search for odours of underground prey, whereas the platypus relies on electroreception to detect prey in the water. However, the size of the accessory olfactory bulb is larger in the platypus than in the echidna1. The accessory olfactory bulb receives projections from the vomeronasal organ, and there is a marked expansion of the number of vomeronasal type-1 receptors (V1R genes) in the platypus (262) compared with the echidna (28) (Fig. 4a and Supplementary Table 47). Vomeronasal receptors probably have important roles in courtship, , induction of lactation and milk ejection in monotremes23. Therefore, the diversification of the olfactory bulb and accessory olfactory bulb systems in monotremes provides an interesting example of the eco-evolutionary trade-off. V1R amplification has been associated with the size of the vomeronasal organ and nocturnal activity30. This is also consistent with the fact that the platypus closes its eyes when diving and therefore relies entirely on other senses underwater and in the burrow.

Haemoglobin degradation in monotremes
The semi-aquatic lifestyle of the platypus is supported by particularly high haemoglobin levels and large numbers of small red blood cells31. The haemoglobin–haem detoxification system in mammals provides efficient clearance to minimize oxidative damage32 in which haptoglobin is the haemoglobin chaperone32 and free haem is bound by haemopexin and alpha-1 microglobulin33. Both the haemopexin and alpha-1 microglobulin genes are found in the monotreme genomes, whereas the haptoglobin gene is absent (Fig. 4b, Extended Data Fig. 10a, b and Supplementary Table 48), which suggests that monotremes evolved a haemoglobin clearance system that is different from that of other mammals. Haptoglobin evolved in the common ancestor of vertebrates from an immune gene of the MASP family33 but has neofunctionalized in mammals to bind to haemoglobin with a higher affinity and to bind to the CD163A receptor, which is also absent in monotremes, for clearance in macrophages34. The absence of the haptoglobin gene and CD163A in monotremes suggests that the neofunctionalization of haptoglobin happened after the divergence of monotremes from therians, not before it as previously thought34, and long after the evolution of enucleated red blood cells in the common ancestor of mammals35. Several nonmammalian vertebrates have lost haptoglobin, including chicken34 (Fig. 4b), in which an alternative, secreted CD163 family member, PIT54, is the haemoglobin-binding chaperone33. Phylogenetic analysis shows that monotremes lack genes that cluster with haptoglobin in the MASP family or a PIT54 orthologue (Extended Data Fig. 10c–e and Supplementary Table 50). We confirmed the expansion of the CD163 family in platypus2 (ten members) and found five in echidna, compared with two and three in humans and mice, respectively (Extended Data Fig. 10e, f). As mammalian CD163A can bind to haemoglobin in the absence of haptoglobin36 and one CD163 family member has become the haemoglobin chaperone in chicken, the CD163 family protein(s) may have evolved this role in monotremes.

Transition from oviparity to viviparity
Monotremes provide the key to understanding how viviparity evolved in mammals. They are not as dependent on egg proteins as egg-laying avian and reptilian species owing to their nutrient acquisition from uterine secretions23,37, and the subsequent reliance of the young on lactation. Whereas reptiles have three functional copies of the major egg protein vitellogenin (VTG)38, in monotremes we found only one functional copy (VTG2) (Extended Data Fig. 10g and Supplementary Table 52) and a partial sequence for VTG1.

Similar to marsupials, monotremes have an extended lactation period and the composition of the milk changes dynamically as the development progresses to match the changing needs of the young37. SPINT3, a major milk-specific protein that is present in early lactation of therians with a probable role in the protection of immunoincompetent young in marsupials39, is absent in monotremes. Syntenic analysis confirmed that this region is conserved in platypus but contains two copies of a new protein that contains a Kunitz domain (Extended Data Fig. 10h and Supplementary Table 53). The Kunitz family is a rapidly evolving family, and one of the new members could have an immunoprotective function similar to SPINT3 in monotremes.

The monotreme genomes contain most of the milk genes that have been identified in therian mammals38,40. Most mammals have three casein genes41, which encode the most abundant milk proteins secreted throughout lactation (Fig. 4c). In addition to these genes, monotremes have extra caseins of unknown function that are not found in therian mammals: an extra copy of CSN2 (CSN2B) (previously reported40) and an extra copy of CSN3 (CSN3B) in platypus (described here), which has the classic structure of CSN342 (Extended Data Fig. 10i and Supplementary Table 54). All caseins are members of the secretory calcium-binding phosphoprotein (SCPP) gene family and are thought to have evolved from other SCPP genes, namely the teeth-related gene ODAM through its derivatives FDCSP and SCPPPQ142. As reported above (see ‘Eco-evolutionary adaptation of diet’), extant monotremes appear to have lost both ODAM and FDCSP. Syntenic analysis showed that the additional monotreme casein genes (CSN2B and CSN3B) are found in the same therian chromosomal region as ODAM and FDCSP and within the casein locus (Fig. 4c), providing further evidence that caseins evolved from odontogenic genes.

Summary
Complete and accurate reference genomes and annotations are critical for evolutionary and functional analyses. It remains a challenge to produce a highly accurate chromosome-level assembly, particularly for differentiated sex chromosomes. We have produced a high-quality platypus genome using a combination of single-molecule sequencing technology and multiple sources of physical mapping methods to assign most of the sequences to a chromosome-scale assembly. This permits better-resolved analyses of the origin and diversification of the complex sex chromosome system that evolved specifically in monotremes. We delineate ancient and lineage-specific changes in the sensory system, haemoglobin degradation and reproduction that represent some of the most fascinating biology of platypus and echidna. The new genomes of both species will enable further insights into therian innovations and the biology and evolution of these extraordinary egg-laying mammals.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions

and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-03039-0.

Nature | Vol 592 | 29 April 2021 | 761

1. Ashwell, K. Neurobiology of Monotremes: Brain Evolution in Our Distant Mammalian Cousins (CSIRO PUBLISHING, 2013).
2. Warren, W. C. et al. Genome analysis of the platypus reveals unique signatures of evolution. Nature 453, 175–183 (2008).
3. Grützner, F. et al. In the platypus a meiotic chain of ten sex chromosomes shares genes with the bird Z and mammal X chromosomes. Nature 432, 913–917 (2004).
4. Kortschak, R. D., Tsend-Ayush, E. & Grützner, F. Analysis of SINE and LINE repeat content of Y chromosomes in the platypus, Ornithorhynchus anatinus. Reprod. Fertil. Dev. 21, 964–975 (2009).
5. Boissinot, S. & Sookdeo, A. The evolution of LINE-1 in vertebrates. Genome Biol. Evol. 8, 3485–3507 (2016).
6. Phillips, M. J., Bennett, T. H. & Lee, M. S. Molecules, morphology, and ecology indicate a recent, amphibious ancestry for echidnas. Proc. Natl Acad. Sci. USA 106, 17089–17094 (2009).
7. Bellott, D. W. et al. Mammalian Y chromosomes retain widely expressed dosage-sensitive regulators. Nature 508, 494–499 (2014).
8. Whittington, C. M. et al. Defensins and the convergent evolution of platypus and reptile venom genes. Genome Res. 18, 986–994 (2008).
9. Julien, P. et al. Mechanisms and evolutionary patterns of mammalian and avian dosage compensation. PLoS Biol. 10, e1001328 (2012).
10. Rousselle, M., Laverré, A., Figuet, E., Nabholz, B. & Galtier, N. Influence of recombination and GC-biased gene conversion on the adaptive and nonadaptive substitution rate in mammals versus birds. Mol. Biol. Evol. 36, 458–471 (2019).
11. Hinch, A. G., Altemose, N., Noor, N., Donnelly, P. & Myers, S. R. Recombination in the human pseudoautosomal region PAR1. PLoS Genet. 10, e1004503 (2014).
12. Burt, D. W. Origin and evolution of avian microchromosomes. Cytogenet. Genome Res. 96, 97–112 (2002).
13. Dohm, J. C., Tsend-Ayush, E., Reinhardt, R., Grützner, F. & Himmelbauer, H. Disruption and pseudoautosomal localization of the major histocompatibility complex in monotremes. Genome Biol. 8, R175 (2007).
14. Cortez, D. et al. Origins and functional evolution of Y chromosomes across mammals. Nature 508, 488–493 (2014).
15. Zhou, Q. et al. Complex evolutionary trajectories of sex chromosomes across bird taxa. Science 346, 1246338 (2014).
16. Veyrunes, F. et al. Bird-like sex chromosomes of platypus imply recent origin of mammal sex chromosomes. Genome Res. 18, 965–973 (2008).
17. Braasch, I. et al. The spotted gar genome illuminates vertebrate evolution and facilitates human–teleost comparisons. Nat. Genet. 48, 427–437 (2016).
18. Gruetzner, F., Ashley, T., Rowell, D. M. & Marshall Graves, J. A. How did the platypus get its sex chromosome chain? A comparison of meiotic multiples and sex chromosomes in plants and animals. Chromosoma 115, 75–88 (2006).
19. Golczyk, H., Massouh, A. & Greiner, S. Translocations of chromosome end-segments and facultative heterochromatin promote meiotic ring formation in evening primroses. Plant Cell 26, 1280–1293 (2014).
20. de Waal Malefijt, M. & Charlesworth, B. A model for the evolution of translocation heterozygosity. Heredity 43, 315–331 (1979).
21. Casey, A. E., Daish, T. J., Barbero, J. L. & Grützner, F. Differential cohesin loading marks paired and unpaired regions of platypus sex chromosomes at prophase I. Sci. Rep. 7, 4217 (2017).
22. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
23. Griffiths, M. The Biology of Monotremes (Academic, 1978).
24. Meredith, R. W., Zhang, G., Gilbert, M. T., Jarvis, E. D. & Springer, M. S. Evidence for a single loss of mineralized teeth in the common avian ancestor. Science 346, 1254390 (2014).
25. Springer, M. S. et al. Odontogenic ameloblast-associated (ODAM) is inactivated in toothless/enamelless placental mammals and toothed whales. BMC Evol. Biol. 19, 31 (2019).
26. Ordoñez, G. R. et al. Loss of genes implicated in gastric function during platypus evolution. Genome Biol. 9, R81 (2008).
27. Hayakawa, T., Suzuki-Hashido, N., Matsui, A. & Go, Y. Frequent expansions of the bitter taste receptor gene repertoire during evolution of mammals in the Euarchontoglires clade. Mol. Biol. Evol. 31, 2018–2031 (2014).
28. Johnson, R. N. et al. Adaptation and conservation insights from the koala genome. Nat. Genet. 50, 1102–1111 (2018).
29. Liu, Z. et al. Dietary specialization drives multiple independent losses and gains in the bitter taste gene repertoire of Laurasiatherian mammals. Front. Zool. 13, 28 (2016).
30. Hunnicutt, K. E. et al. Comparative genomic analysis of the pheromone receptor class 1 family (V1R) reveals extreme complexity in mouse lemurs (genus, Microcebus) and a chromosomal hotspot across mammals. Genome Biol. Evol. 12, 3562–3579 (2020).
31. Johansen, K., Lenfant, C. & Grigg, G. C. Respiratory properties of blood and responses to diving of platypus Ornithorhynchus anatinus (Shaw). Comp. Biochem. Physiol. 18, 597–608 (1966).
32. Alayash, A. I. Haptoglobin: old protein with new functions. Clin. Chim. Acta 412, 493–498 (2011).
33. Wicher, K. B. & Fries, E. Haptoglobin, a hemoglobin-binding plasma protein, is present in bony fish and mammals but not in frog and chicken. Proc. Natl Acad. Sci. USA 103, 4168–4173 (2006).
34. Redmond, A. K. et al. Haptoglobin is a divergent MASP family member that neofunctionalized to recycle hemoglobin via CD163 in mammals. J. Immunol. 201, 2483–2491 (2018).
35. Huttenlocker, A. K. & Farmer, C. G. Bone microvasculature tracks red blood cell size diminution in Triassic mammal and dinosaur forerunners. Curr. Biol. 27, 48–54 (2017).
36. Schaer, D. J. et al. CD163 is the macrophage scavenger receptor for native and chemically modified hemoglobins in the absence of haptoglobin. Blood 107, 373–380 (2006).
37. Griffiths, M. Echidnas (Pergamon, 1968).
38. Brawand, D., Wahli, W. & Kaessmann, H. Loss of egg yolk genes in mammals and the origin of lactation and placentation. PLoS Biol. 6, e63 (2008).
39. Pharo, E. A. et al. The mammary gland-specific marsupial ELP and eutherian CTI share a common ancestral gene. BMC Evol. Biol. 12, 80 (2012).
40. Lefèvre, C. M., Sharp, J. A. & Nicholas, K. R. Characterisation of monotreme caseins reveals lineage-specific expansion of an ancestral casein locus in mammals. Reprod. Fertil. Dev. 21, 1015–1027 (2009).
41. Holt, C., Carver, J. A., Ecroyd, H. & Thorn, D. C. Invited review: Caseins and the casein micelle: their biological functions, structures, and behavior in foods. J. Dairy Sci. 96, 6127–6146 (2013).
42. Kawasaki, K., Lafont, A. G. & Sire, J. Y. The evolution of milk casein genes from tooth genes before the origin of mammals. Mol. Biol. Evol. 28, 2053–2061 (2011).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2021

1BGI-Shenzhen, Shenzhen, China. 2Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark. 3School of Biological Sciences, The Environment Institute, The University of Adelaide, Adelaide, South Australia, Australia. 4MOE Laboratory of Biosystems Homeostasis and Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, China. 5BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China. 6Faculty of Environmental Earth Science, Hokkaido University, Sapporo, Japan. 7Japan Monkey Centre, Inuyama, Japan. 8School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia. 9School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia. 10digzyme Inc, Tokyo, Japan. 11School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan. 12The Genome Center, University of California, Davis, CA, USA. 13Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA. 14Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA. 15Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK. 16The Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA. 17James D. Watson Institute of Genome Sciences, Hangzhou, China. 18University of the Chinese Academy of Sciences, Beijing, China. 19Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen, China. 20School of Agriculture and Food Sciences, The University of Queensland, Gatton, Queensland, Australia. 21Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA. 22Howard Hughes Medical Institute, Chevy Chase, MD, USA. 23San Diego Zoo Global, Escondido, CA, USA. 24Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany. 25Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK. 26Pacific Biosciences, Menlo Park, CA, USA. 27Department of Evolution and Ecology, College of Biological Sciences, University of California, Davis, CA, USA. 28Department of Reproduction and Population Health, School of Veterinary Medicine, University of California, Davis, CA, USA. 29Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia. 30Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia. 31School of Life Sciences, La Trobe University, Melbourne, Victoria, Australia. 32Department of Neuroscience and Developmental Biology, University of Vienna, Vienna, Austria. 33Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China. 34State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China. 35Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China. 36These authors contributed equally: Yang Zhou, Linda Shearwin-Whyatt, Jing Li. ✉e-mail: frank.[email protected]; [email protected]; [email protected]

Methods

Data reporting
No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Ethics and sample collection
Pmale08, Pmale09 and Emale12 were collected under AEC permits S-49-2006, S-032-2008 and S-2011-146 at Upper Barnard River (New South Wales, Australia) during the breeding season. Emale01 was collected under San Diego Zoo Global IACUC approval 18-024 and vouched at San Diego Natural History Museum.

Sequencing and assembling
Skeletal muscle of Pmale09 was used for PacBio, 10X and BioNano genome sequencing and the liver of Pmale09 was used for Hi-C (Phase Genomics); the liver of Pmale08 was used for Chicago Hi-C (Dovetail Genomics). Heart muscle of Emale01 was used for a variety of library construction and Illumina sequencing analyses. Muscle of Emale12 was used for 10X and BioNano genome sequencing and liver of Emale12 was used for Hi-C (Phase Genomics). Echidna RNA was extracted from brain, cerebellum, kidney, liver, testis and ovary and sequenced using a previously published procedure43. Platypus Y chromosome BAC isolation via hybridization was performed using a previously published procedure4 and the BACs were sequenced with PacBio. The platypus genome was assembled following VGP assembly pipeline v.1.0. The echidna genome was assembled using Platanus44 (v.1.2.1), followed by three steps of scaffolding in the order of 10X, BioNano and Hi-C. Manual curation was performed for both assemblies. Details are available in the Supplementary Methods.

Sex-borne sequence identification
Female and male reads were mapped to the genome using BWA ALN45 (v.0.7.12). The read depth of each sex was calculated in 5-kb non-overlapping windows to identify X-borne sequences and 2-kb non-overlapping windows to identify Y-borne sequences, normalized against the median depth. To identify X-borne sequences, we calculated the female-to-male (F/M) depth ratio of regions that were covered by both sexes in each scaffold, requiring a minimum coverage of 80%, and assigned sequences to X-borne if the depth ratio ranged between 1.5 and 2.5. To identify Y-borne sequences, we calculated the F/M depth ratio as well as the F/M coverage ratio and assigned scaffolds to Y-borne if either ratio was within the range of 0.0–0.3. Parameter evaluation details are available in the Supplementary Methods.

Chromosome anchoring
We collected 75 BACs and 179 marker genes (Supplementary Table 3) and ordered them according to their relative order in the source publications. Protein sequences of the gene markers were compared to mOrnAna1 using TBLASTN and the best hit was kept, after which the markers were analysed using GeneWise46 (v.2.4.1) to obtain the location within a scaffold. BAC-end reads were mapped to the assembly using BWA MEM47 (v.0.7.12) and the best hits were kept. We also used the anchored sequences of OANA5, except for the sequence of chromosome 14, to anchor the scaffolds into chromosomes. Scaffolds were orientated and ordered first based on the order of FISH or gene markers, then on the order in OANA5. All identified PARs were included in chromosome X. We collected assembled Y contigs from a previously published study4 and generated some Y-BAC PacBio sequencing data. Assembled Y contigs were mapped to the platypus assembly using BWA MEM and Y-BAC PacBio reads were mapped using minimap2 (v.2.13)48. As evidence of both Y2 and Y3 was found on scaffold_229_arrow_ctg1 and scaffold_269_arrow_ctg1 and the covered regions overlapped, these two scaffolds were excluded from the chromosome Y classification. Classified Y-borne scaffolds could not be anchored or oriented owing to a lack of information. We also curated and anchored some echidna X-borne scaffolds to chromosome X based on Mashmap49 (v.2.0) one-to-one results with platypus50.

Annotation
We identified repetitive elements in both assemblies using the same pipeline, which included homology-based and de novo prediction. For the homology-based method, we used the default repeat library from Repbase (v.21.11)51 for RepeatMasker (v.4.0.6)52, trf (v.4.07)53 and Proteinmasker (v.4.0.6)52 to annotate. For the de novo method, we first ran RepeatModeler (v.1.0.8) to construct the consensus sequence library for each monotreme using their genome as input, then aligned the genome against each consensus library to identify repeats using RepeatMasker.
Gene annotation was performed by merging the homology, de novo prediction and transcriptome analyses to build a consensus gene set of each species. Protein sequences from human, mouse, opossum, platypus, chicken and green lizard (Anolis carolinensis) from Ensembl54 (release 87) were aligned to the genome using TBLASTN55 (v.2.2.26) (e < 1 × 10−5). Candidate gene regions were refined using GeneWise for more accurate gene models. We randomly selected 1,000 high-scoring homology-based genes to train Augustus56 (v.3.0.3) for de novo prediction on a repeat N-masked genome. We also mapped RNA-sequencing reads of the platypus from a previously published study57 and echidna to their respective assemblies using HISAT2 (ref. 58) (v.2.0.4), and constructed transcripts using StringTie59 (v.1.2.3). Results from these three methods were merged into a nonredundant gene set. Possible retrogenes were filtered according to their hit to the SwissProt database60 (release 2015_12) or Iprscan61 (v.5.16-55.0). We used the SwissProt database (e < 1 × 10−5) to annotate the function of the genes. Iprscan was used to annotate the GO of genes. Detailed descriptions of the manual annotation, curation and phylogenetic analysis of genes related to imprinting, immune system, reproduction and haemoglobin degradation can be found in the Supplementary Methods.

Gap analysis
We identified gap-filling regions using an alignment-based strategy similar to a previously published study62. We considered gaps for which both flanking regions mapped to mOrnAna1 as closed gaps. Only properly closed gaps, defined as those for which (1) both flanking regions were aligned but did not overlap and (2) the closed gap size was within 100 times the estimated gap size in OANA5, were considered for repeat and gene improvement analysis.

Redundant sequences analysis
We performed two rounds of Mashmap with parameters ‘-f one-to-one -s 2000’ using mOrnAna1 as reference and OANA5 as query. A one-to-one relationship was obtained in the first round of Mashmap. In the second round of Mashmap, those OANA5 sequences that were unmapped in the first round of mapping were used as query. Candidate redundant sequences were obtained from the second round Mashmap result, but excluded regions that were gaps in OANA5. Female and male reads were then mapped to OANA5 and mOrnAna1 using BWA ALN and normalized by the mode depth.

Gene set comparison
We performed LASTZ63 (v.1.04.00) alignment using OANA5 as reference with parameter set ‘--hspthresh=4500 --gap=600,150 --ydrop=15000 --notransition --format=axt’ and a score matrix for the comparison of closely related species to generate a chain file for gene location liftover from OANA5 to mOrnAna1. Gene coordinates in OANA5 were first converted to mOrnAna1 using in-house-generated scripts with the chain file. We searched for overlap between the converted OANA5 gene set and mOrnAna1 gene set. Fragmented genes were defined as multiple converted OANA5 genes that overlapped with a single mOrnAna1 gene. A one-to-one gene pair between the two gene sets was defined as the liftover of the OANA5 gene when it overlapped with only one mOrnAna1 gene. Only one-to-one pairs were used for the comparison of open-reading frame completeness. We defined a gene as having a complete open-reading frame if its first codon is a start codon and the last codon is a stop codon.

Identification of one-to-one orthologues and synteny blocks between the human sequence and sequences of other species
We defined one-to-one orthologues between the human sequence and the sequences of other species by considering both reciprocal best BLASTP hits (RBH) and synteny, taking the human sequence as reference, as previously described64. First, we conducted BLASTP for all protein sequences from human and other species including mouse, opossum, platypus, echidna, chicken and green lizard with an e-value cut-off of 1 × 10−7, and combined local alignments with SOLAR (http://treesoft.svn.sourceforge.net/viewrc/treesoft/). Next, we identified RBH orthologues between human and every other species on the basis of the following parameters: alignment score, alignment rate and identity. From these RBH orthologues, we retained those pairs with conserved synteny across species. Synteny was determined based on their flanking genes. If RBH orthologous gene pairs shared the same flanking genes, we retained the genes for downstream analyses. Finally, we merged pairwise orthologue lists according to the human coordinates. In this way, we produced the final one-to-one orthologue set across species.
We used the human genome as the reference and aligned it with other species using LASTZ with parameter set ‘--hspthresh=4500 --gap=600,150 --ydrop=15000 --notransition --format=axt’ and a score matrix for the comparison of closely related species. Alignments were converted into ‘chain’ and ‘net’ results with different levels of alignment scores using utilities of the UCSC Genome Browser (http://genomewiki.ucsc.edu/index.php/), and the pairwise synteny blocks between genomes of each species and the human genome were extracted according to the net result. Only alignments larger than 10 kb were kept. The synteny blocks were further cleaned of overlapping genes. N50 and the total length of the synteny block inferred from each human–species pair were calculated based on the human coordinates.

Phylogenetic analysis
The phylogenetic tree was constructed using concatenated fourfold-degenerate sites from the 7,946 one-to-one orthologues using RAxML65 (v.8.2.4) with parameter set ‘-m GTRCAT -# 100 -p 12345 -x 12345 -f a’ and chicken and green lizard were specified as the outgroup. MCMCtree in PAML66 (v.4.7) was used to estimate divergence time of each species with calibration points obtained from a previously published study67 using the same data. Points and time range included the most recent common ancestor of human–mouse, 85–94 million years ago; human–opossum, 150–167 million years ago; human–platypus, 163–191 million years ago; human–chicken, 297–326 million years ago; anole–chicken, 276–286 million years ago. The seed used for MCMC was 1192664277.

Substitution rate analysis
We first performed pairwise whole-genome LASTZ alignment using 12 mammals (Macaca mulatta, Tupaia belangeri, Mus musculus, Canis lupus familiaris, Myotis lucifugus, Bos taurus, Sorex araneus, Loxodonta africana, Dasypus novemcinctus, Monodelphis domestica, O. anatinus and T. aculeatus) with the human genome as the reference genome, with the parameter set ‘--step=19 --hspthresh=2200 --inner=2000 --ydrop=3400 --gappedthresh=10000 --format=axt’ and a score matrix for the comparison of distantly related species. Pairwise alignments were merged using MULTIZ68 (v.11.2). The fourfold-degenerate site alignment was extracted based on the human gene set (Ensembl release 87), concatenated and fed to phyloFit in the PHAST package69 (v.1.5) for the calculation of branch lengths (substitutions per site). The substitution rate was calculated by dividing the branch length to the mammalian common ancestor by the mammal–reptile divergence time.

Gene family analysis
Gene families across the seven species were generated using orthoMCL70 (v.2.0.9) with BLASTP results (e < 1 × 10−7) and were fed to CAFE71 (v.4.2) along with the phylogenetic tree. We first estimated the assembly error by excluding families with more than 100 members. Then the estimated rate was used to infer the family size at every node for each family. The ancestral node gene numbers of families with more than 100 members among extant species were inferred separately. We extracted genes based on the human gene set for GO enrichment (χ2 test) of the significantly expanded families (Viterbi P < 0.05) for the mammalian ancestor. A false-discovery rate (FDR) adjustment was used for multiple-test corrections in GO enrichment analyses.

Mammalian-specific highly conserved element analysis
We used the same MULTIZ alignment as in the substitution rate analysis and identified mammalian-specific highly conserved elements (MSHCEs) using a strategy similar to one previously described72. At least 80% of species and at least one species in eutherians, marsupials and monotremes were required to be present in alignments. Type-I MSHCEs were defined as HCEs to which no outgroup could be aligned; type-II MSHCEs were HCEs that were significantly conserved (P < 0.01) in mammals compared to mammals + outgroup calculated using phyloP (Benjamini–Hochberg adjusted). We considered four sets of outgroup combinations: (1) green lizard only; (2) chicken only; (3) two reptiles and one frog; and (4) two reptiles, one frog and one fish, and only kept those that were significantly conserved in all four sets of statistical tests (Benjamini–Hochberg adjusted P < 0.01). Only elements ≥20 bp were kept for further analysis.
To annotate MSHCEs to possible functional elements, we used the human annotation (Ensembl release 87) as a reference and classified the elements into the coding sequence, 5′ and 3′ untranslated regions, non-coding RNA, pseudogene, intron, upstream 10-kb region (from the start codon), downstream 10-kb region (from the stop codon) and intergenic regions, with the same hierarchical order if the regions overlapped. Genes located within the upstream or downstream 10-kb range of MSHCEs were considered to be MSHCE-associated genes, and ordered by the length of the element. The top-300 MSHCE-associated genes were used in the GO enrichment analysis (χ2 test, FDR-adjusted) and visualized using REVIGO73.

Mammalian karyotype reconstruction
We used pairwise LASTZ alignments of the opossum, Tasmanian devil, platypus, chicken and common wall lizard (Podarcis muralis) genomes to the human genome as input. Echidna was not used here as most of the sequences were not anchored to chromosomes, which would lead to a more fragmented reconstruction. With the net and chain results, conserved segments that were uniquely and universally present in all six species were obtained using inferCARs74 (release 2006-Jun-16). Marsupial and therian ancestral karyotypes were inferred using ANGES75 (v.1.01) using the branch-and-bound algorithm, and the resulting contiguous ancestral regions (CARs) were further reorganized based on the previously predicted configuration76 (Supplementary Tables 22, 23). We replaced the conserved segments of the human, opossum and Tasmanian devil genomes with those of the reconstructed therian ancestral karyotype and reconstructed marsupial ancestral karyotype using ANGES with the same parameters except setting the target reconstruction node to mammalian ancestor. We reorganized CARs on the basis of gene synteny among ingroups and outgroups inferred using MCScanX77 (release 08-05-2012), requiring that there is synteny across CARs in at least one ingroup–outgroup pair (Supplementary Tables 22, 23). Pairwise MCScanX was run among the six species with BLASTP (e < 1 × 10−7).
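
The thresholds given under ‘Sex-borne sequence identification’ can be written out as a small decision rule. The Python sketch below is illustrative only and is not the in-house code used in the study: the function name, the argument names, the ‘unassigned’ label and the order in which the two rules are tested are assumptions, and the per-scaffold ratios are taken as already computed from depths normalized by the median depth.

```python
from typing import Optional


def classify_scaffold(
    fm_depth_ratio: Optional[float],
    fm_coverage_ratio: Optional[float],
    shared_coverage: float,
) -> str:
    """Assign a scaffold to 'X-borne', 'Y-borne' or 'unassigned'.

    fm_depth_ratio    -- female/male ratio of median-normalized read depth
    fm_coverage_ratio -- female/male ratio of covered bases
    shared_coverage   -- fraction of the scaffold covered by both sexes (0-1)
    A ratio of None means it could not be computed (hypothetical convention).
    """
    # Y-borne: either ratio falls within 0.0-0.3 (females contribute few reads).
    for ratio in (fm_depth_ratio, fm_coverage_ratio):
        if ratio is not None and 0.0 <= ratio <= 0.3:
            return "Y-borne"
    # X-borne: at least 80% coverage and an F/M depth ratio between 1.5 and 2.5,
    # as expected for two copies in females versus one in males.
    if (
        fm_depth_ratio is not None
        and shared_coverage >= 0.80
        and 1.5 <= fm_depth_ratio <= 2.5
    ):
        return "X-borne"
    # Everything else (autosomal or ambiguous) is left unassigned in this sketch.
    return "unassigned"


if __name__ == "__main__":
    print(classify_scaffold(2.0, 1.0, 0.95))
    print(classify_scaffold(0.1, 0.05, 0.20))
    print(classify_scaffold(1.0, 1.0, 0.99))
```

An autosomal scaffold is expected to have an F/M depth ratio close to 1 and therefore falls through both rules.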
Rearrangement events in each lineage were inferred using GRIMM78 as a bait by fimo in MEME87 (v.4.12.0) to identify putative CTCF-binding (v.2.1) by taking the karyotypes of the most recent ancestor and the sites. CTCF densities in every 100 kb non-overlapping sliding window child as input. The breakpoint number in each lineage was calculated along the platypus sex chromosomes or scaled homologous sequences on the basis of the output of GRIMM using an in-house-generated script, of echidna, human and chicken were compared. in which one breakpoint was counted in fission, two breakpoints were counted in translocation, and one or two breakpoints were counted FISH in inversion, depending on whether the inversion happened at the BACs were obtained from the Children’s Hospital Oakland Research end of the chromosome. Calculations were done using resolutions Institute from the platypus BAC library CH236: CH236-775N6 of 500 kb and 300 kb, and using the raw ANGES output and reorgan- 13q2; CH236-97I3 15p1 and CUGI BAC/EST resource centre from ized output, respectively (Supplementary Table 28). Differences in the platypus BAC library Oa_Ab: Oa_Bb-155A12 autosomal (WSB1); breakpoint rates compared to the average of all branches were tested Oa_Bb-145P09 Y2; Oa_Bb-397I21 Y3. The Super_Scaffold_40-specific as previously described79. probe was amplified from platypus genomic DNA. Gene ENSOANT 00000009075.3 was amplified using primers GTCTAAAGACAAGTG Gametologue identification TACATCTGTGAC and GTGACTTCTCTTGCGAACACAC. The 3.9-kb We used BLASTP to compare all Y-borne genes to all X-borne genes product was cloned into pGEM-T Easy (Promega). BAC probes were (e < 1 × 10−5) and kept the best hit for each Y-borne gene. Candidate directly labelled with dUTP Alexa Fluor 594-dUTP, aminoallyl- gametologue pairs were further confirmed if both of the genes were dUTP-XX-ATTO-488 (Jena Bioscience) using the Nick Transla- mapped to the same gene in NCBI or the SwissProt database. 
Four gametologues (platypus AMHX and FEM1CX from OANA5, and SDHAY and HNRNPKY from ref. 14) were added as they were missing in mOrnAna1. Translated genes were aligned using PRANK80, filtered using Gblocks81, and converted back into the alignment of the coding sequence. dS was calculated using codeml in PAML with 'runmode=-2'.

Demarcate evolutionary strata
We aligned all platypus Y-borne scaffolds (N-masked) to all platypus X-borne sequences (N-masked), and aligned all echidna Y-borne scaffolds (N-masked) to all echidna X-borne sequences (N-masked), using LASTZ with the parameter set '--step=19 --hspthresh=2200 --inner=2000 --ydrop=3400 --gappedthresh=10000 --format=axt' and a score matrix set for the comparison of distantly related species. On the basis of the net and 'maf' results, the identity of each alignment block was calculated in 2-kb non-overlapping windows and the aligned Y-borne sequences were oriented along the X chromosomes. Identity along X chromosomes was colour-coded for visualization.

Expression calculation
RNA-sequencing reads of platypus (SRP102989) and echidna were mapped to the genome using HISAT2. Uniquely mapped reads were used in the calculation and normalization of the reads per kilobase per million reads (RPKM) using DESeq82 (v.1.28.0) to generate an expression matrix for each species. For tissues that were available in both sexes, we computed the median RPKM of each X-borne gene, and computed its F/M RPKM ratio (requiring RPKM in both sexes to be ≥1) to determine dosage-compensation status. We used the median expression value in each tissue to calculate the tissue specificity index TAU83 for each gene. We defined a gene as tissue-specific if it shows at least twofold higher expression in the tissue with the highest expression than in any other tissue, with the highest RPKM > 1 and TAU > 0.8.

Building genome-wide Hi-C interaction maps
Genome-wide interaction maps at a 100-kb resolution were generated for platypus, echidna and human (SRX641267) with HiC-Pro84 (v.2.10.0). For echidna, we only retained scaffolds >10 kb as the large number of short scaffolds would cause ICE normalization failure. The normalized sex chromosomes submatrix was extracted for quantification and plotting with ggplot2 (v.3.2.1). For human, we used the scaled homologous sequences of platypus for quantification and plotting.

Identification of TADs and CTCF-binding sites
HiC-Pro interaction maps were transformed to h5 format using hicConvertFormat and fed to hicFindTADs with the parameters '--outPrefix TAD --numberOfProcessors 32 --correctForMultipleTesting fdr' to identify TADs with HiCExplorer85 (v.3.0). The human CTCF motif86 was used

tion Kit (Roche Diagnostics) and the Super_Scaffold_40-specific probe labelled with biotin using the Biotin-Nick Translation Mix (1175824919, Roche Diagnostics). The FISH protocol was carried out on cultured fibroblasts from platypus (authenticated by karyotype, not mycoplasma tested) obtained from animals captured at the Upper Barnard River (New South Wales, Australia) during the breeding season (AEC permits S-49-2006, S-032-2008 and S-2011-146) as previously described88 with the following exceptions. Slides were denatured at 70 °C for 3 min in 70% formamide in 2× SSC, 1 mg DNA probe was used per slide, and pre-annealing of repetitive DNA sequences was done at 37 °C for 30–60 min. Detection of biotin-labelled probes was done using Rhodamine Avidin D (Vector Laboratories, A-2002), goat Biotinylated anti-avidin D (Vector Laboratories, BA-0300) and Rhodamine Avidin D. Slides were blocked in 4 × SSC, 1% BSA fraction V, for 30 min at 37 °C. Rhodamine Avidin D, Biotinylated anti-avidin D and the second Rhodamine Avidin D were diluted in 4 × SSC, 1% BSA fraction V and were incubated on slides for 45 min at 37 °C. After each step, washes were done in 4 × SSC; 4 × SSC, 0.1% triton; and 4 × SSC at room temperature for 10 min each. Slides were mounted in VECTASHIELD with DAPI (Vector Laboratories, H-1200). Sample size was determined according to ref. 89, but was limited by material availability. Images were captured on a Nikon Ti Microscope using NIS-Elements AR 4.20.00 software and processed with ImageJ (v.2.0.0). Fisher's exact test was performed with a matrix containing the means of associated and non-associated cells from the three replicates. No blinding nor randomization was performed.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
The platypus whole-genome shotgun project has been deposited at GenBank (project accessions PRJNA489114 and PRJNA489115), CNSA (https://db.cngb.org/cnsa/) of CNGBdb (accession CNP0000130) and GenomeArk (https://vgp.github.io/genomeark-curated-assembly/Ornithorhynchus_anatinus/). The echidna whole-genome shotgun project has been deposited at GenBank (project accession PRJNA576333), CNSA of CNGBdb (accession CNP0000697) and GenomeArk (https://vgp.github.io/genomeark/Tachyglossus_aculeatus/). Echidna RNA-sequencing data have been deposited at GenBank (project accession PRJNA591380) and CNSA of CNGBdb (accession CNP0000779). Public databases used in this study include: NCBI (https://www.ncbi.nlm.nih.gov/), Ensembl (release 87) (http://dec2016.archive.ensembl.org/index.html), Uniprot (https://www.uniprot.org/) and Repbase (https://www.girinst.org/repbase/). Accession codes of genes are available in Supplementary Tables 31, 33, 37, 49, 51.
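The thresholds stated under 'Expression calculation' (F/M RPKM ratio only when RPKM ≥ 1 in both sexes; tissue-specific when the top tissue is at least twofold higher than any other, its RPKM > 1 and TAU > 0.8) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes TAU is computed on untransformed RPKM values as the sum of (1 − x_i/x_max) over tissues divided by (n − 1), and all function names are hypothetical.

```python
from statistics import median


def tau_index(expr):
    """Tissue-specificity index: sum(1 - x_i / x_max) / (n - 1).

    Assumption: expr holds untransformed per-tissue RPKM values.
    """
    n = len(expr)
    x_max = max(expr) if expr else 0.0
    if n < 2 or x_max <= 0:
        return 0.0
    return sum(1 - x / x_max for x in expr) / (n - 1)


def is_tissue_specific(expr_by_tissue, fold=2.0, min_rpkm=1.0, min_tau=0.8):
    """Apply the three criteria quoted in the text to one gene.

    expr_by_tissue maps tissue name -> median RPKM (hypothetical layout).
    """
    values = sorted(expr_by_tissue.values(), reverse=True)
    if len(values) < 2:
        return False
    top, second = values[0], values[1]
    return (top > min_rpkm
            and top >= fold * second
            and tau_index(values) > min_tau)


def female_to_male_ratio(female_rpkm, male_rpkm, min_rpkm=1.0):
    """F/M ratio of median RPKM; None when either sex is below min_rpkm."""
    f, m = median(female_rpkm), median(male_rpkm)
    if f < min_rpkm or m < min_rpkm:
        return None
    return f / m
```

For example, `is_tissue_specific({"brain": 20, "liver": 1, "kidney": 0.5, "heart": 0})` evaluates the three criteria for a gene expressed almost exclusively in brain, and `female_to_male_ratio` returns `None` for genes that fail the RPKM ≥ 1 filter in either sex.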

Code availability
In-house-generated scripts used in this study are shared on GitHub (https://github.com/ZhangLabSZ/MonotremeGenome).

43. Cardoso-Moreira, M. et al. Gene expression across mammalian organ development. Nature 571, 505–509 (2019).
44. Kajitani, R. et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 24, 1384–1395 (2014).
45. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
46. Birney, E., Clamp, M. & Durbin, R. Genewise and genomewise. Genome Res. 14, 988–995 (2004).
47. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
48. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
49. Jain, C., Koren, S., Dilthey, A., Phillippy, A. M. & Aluru, S. A fast adaptive algorithm for computing whole-genome homology maps. Bioinformatics 34, i748–i756 (2018).
50. Rens, W. et al. The multiple sex chromosomes of platypus and echidna are not completely identical and several share homology with the avian Z. Genome Biol. 8, R243 (2007).
51. Bao, W., Kojima, K. K. & Kohany, O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
52. Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 5, 4.10.1–4.10.14 (2004).
53. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
54. Yates, A. et al. Ensembl 2016. Nucleic Acids Res. 44, D710–D716 (2016).
55. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
56. Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7, 62 (2006).
57. Brawand, D. et al. The evolution of gene expression levels in mammalian organs. Nature 478, 343–348 (2011).
58. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
59. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
60. UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
61. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
62. Bickhart, D. M. et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat. Genet. 49, 643–650 (2017).
63. Harris, R. S. Improved Pairwise Alignment of Genomic DNA. PhD thesis, Pennsylvania State Univ. (2007).
64. Zhang, G. et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science 346, 1311–1320 (2014).
65. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
66. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
67. Benton, M. J. et al. Constraints on the timescale of animal evolutionary history. Palaeontol. Electronica 18, 1–106 (2015).
68. Blanchette, M. et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708–715 (2004).
69. Hubisz, M. J., Pollard, K. S. & Siepel, A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief. Bioinform. 12, 41–51 (2011).
70. Li, L., Stoeckert, C. J., Jr & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
71. Han, M. V., Thomas, G. W., Lugo-Martinez, J. & Hahn, M. W. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol. 30, 1987–1997 (2013).
72. Seki, R. et al. Functional roles of Aves class-specific cis-regulatory elements on macroevolution of bird-specific features. Nat. Commun. 8, 14229 (2017).
73. Supek, F., Bošnjak, M., Škunca, N. & Šmuc, T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE 6, e21800 (2011).
74. Ma, J. et al. Reconstructing contiguous regions of an ancestral genome. Genome Res. 16, 1557–1565 (2006).
75. Jones, B. R., Rajaraman, A., Tannier, E. & Chauve, C. ANGES: reconstructing ANcestral GEnomeS maps. Bioinformatics 28, 2388–2390 (2012).
76. Deakin, J. E. et al. Reconstruction of the ancestral marsupial karyotype from comparative gene maps. BMC Evol. Biol. 13, 258 (2013).
77. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
78. Tesler, G. GRIMM: genome rearrangements web server. Bioinformatics 18, 492–493 (2002).
79. Kim, J. et al. Reconstruction and evolutionary history of eutherian chromosomes. Proc. Natl Acad. Sci. USA 114, E5379–E5388 (2017).
80. Löytynoja, A. Phylogeny-aware alignment with PRANK. Methods Mol. Biol. 1079, 155–170 (2014).
81. Talavera, G. & Castresana, J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56, 564–577 (2007).
82. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
83. Yanai, I. et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21, 650–659 (2005).
84. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
85. Ramírez, F. et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 9, 189 (2018).
86. Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).
87. Bailey, T. L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
88. Tsend-Ayush, E. et al. Higher-order genome organization in platypus and chicken sperm and repositioning of sex chromosomes during mammalian evolution. Chromosoma 118, 53–69 (2009).
89. Ling, J. Q. et al. CTCF mediates interchromosomal colocalization between Igf2/H19 and Wsb1/Nf1. Science 312, 269–272 (2006).
90. Parra, Z. E. et al. Comparative genomic analysis and evolution of the T cell receptor loci in the opossum Monodelphis domestica. BMC Genomics 9, 111 (2008).
91. Van Laere, A. S., Coppieters, W. & Georges, M. Characterization of the bovine pseudoautosomal boundary: documenting the evolutionary history of mammalian sex chromosomes. Genome Res. 18, 1884–1895 (2008).

Acknowledgements We thank members of BGI-Shenzhen, China National GeneBank and VGP, and P. Baybayan, R. Hall and J. Howard for help carrying out the sequencing of the platypus and echidna genomes, M. Asahara for discussion, and D. Charlesworth for comments. Work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB31020000), the National Key R&D Program of China (MOST) grant 2018YFC1406901, International Partnership Program of Chinese Academy of Sciences (152453KYSB20170002), Carlsberg foundation (CF16-0663) and Villum Foundation (25900) to G.Z. Q.Z. is supported by the National Natural Science Foundation of China (31722050, 31671319 and 32061130208), Natural Science Foundation of Zhejiang Province (LD19C190001), European Research Council Starting Grant (grant agreement 677696) and start-up funds from Zhejiang University. F.G., L.S.-W. and T.D. are supported by Australian Research Council (FT160100267, DP170104907 and DP110105396). M.B.R., J.C.F. and S.D.J. are supported by the Australian Research Council (LP160101728). We acknowledge the Kyoto University Research Administration Office for support and Human Genome Center, the Institute of Medical Science, the University of Tokyo for the super-computing resource for supporting T.H.'s research facilities. T.H. was financed by JSPS KAKENHI grant numbers 16K18630 and 19K16241 and the Sasakawa Scientific Research Grant from the Japan Science Society. The echidna RNA-sequencing analysis was supported by H.K.'s grant from the European Research Council (615253, OntoTransEvol). This work was supported by Guangdong Provincial Academician Workstation of BGI Synthetic Genomics No. 2017B090904014 (H.Y.), Robert and Rosabel Osborne Endowment, Howard Hughes Medical Institute (E.D.J.), Rockefeller University start-up funds (E.D.J.), Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (A.R. and A.M.P.), Korea Health Technology R&D Project through the Korea Health Industry Development Institute HI17C2098 (A.R.). This work used the computational resources of BGI-Shenzhen and the NIH HPC Biowulf cluster (https://hpc.nih.gov). Animal icons are from https://www.flaticon.com/ (made by Freepik) and http://phylopic.org/.

Author contributions G.Z., Q.Z. and F.G. initiated and designed the monotreme genome project with early input from J.G. G.Z., F.G., Q.L., T.D., B.H., J.M., O.F., E.D.J., O.A.R., H.K., P.D., J.K., T.P., H.Y. and J. Wang coordinated the project and were involved in the collection, extraction and sequencing of samples. A.M.P., A.R., G.Z., Y.Z., K.H., Y.S., J. Wood, Q.L., Q.Z., F.G. and L.S.-W. performed genome assembling, evaluation and chromosome assignment. G.Z., Y.Z., Z.S., Y.G. and F.G. performed the annotation and mammalian macro-evolutionary analysis including divergence time estimation, gene family and MSHCE analysis. G.Z., Y.Z., H.A.L., J.D. and Q.Z. performed ancestral karyotype analysis. Q.Z., J.L., Z.Z., F.G., F.P., G.Z., Y.Z., L.S.-W. and Y.G. performed analysis to sex chromosome and FISH validation. K.B., E.P., Y.C., Y.Z., D.S., Z.S., G.Z. and F.G. performed the immune gene analysis. Y.Z., J.C.F., M.B.R., Z.S. and G.Z. performed the tooth-related gene analysis. N.B., F.G. and Y.Z. performed the digestive gene analysis and PCR validation. T.H., H.S., M.N. and F.G. performed the chemosensory gene analysis. L.S.-W., F.G., Y.Z. and Z.S. performed the haemoglobin-degradation gene analysis. M.B.R., J.C.F. and S.J. performed the reproductive gene analysis. G.Z., Q.Z., F.G., Y.Z., L.S.-W., J.L., M.B.R., J.C.F., K.B., E.P., Y.C., D.S., N.B., F.P., T.H., H.A.L., J.D., A.M.P., A.R., K.H., J. Wood, O.F. and J.G. wrote the manuscript.

Competing interests J.K. is an employee of Pacific Biosciences, a company that develops single-molecule sequencing technologies.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-03039-0.
Correspondence and requests for materials should be addressed to F.G., Q.Z. or G.Z.
Peer review information Nature thanks Janine Deakin, Rebecca Johnson and Hugues Roest Crollius for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.

Extended Data Fig. 1 | Platypus genome assembly and evaluation. a, b, Hi-C two-dimensional juicebox maps of mOrnAna1 before (a) and after (b) manual assembly curation. The grey lines depict scaffold boundaries. The off-diagonal matches between scaffolds indicate potential missed joins, whereas 'empty' areas within scaffold boundaries indicate misjoins. The gEVAL-supported manual assembly curation led to a notably improved arrangement with >96% of the assembly sequence inside chromosome-scale scaffolds. c, d, The Super_Scaffold_40 was misassigned to chromosome 15 in OANA5 but FISH on metaphase spreads from platypus fibroblasts maps it to chromosome 13. c, Co-hybridization of the BAC of chromosome 15 (green, top arrow) and Super_Scaffold_40 probe (red, bottom arrow) showing an absence of co-localization (14 nuclei scored, 2 independent experiments). Inset, interphase example. d, Co-hybridization of the BAC of chromosome 13 (green) and Super_Scaffold_40 probe (red) showing co-localization (arrows, 40 nuclei scored, 5 independent experiments). Scale bars, 10 μm. e, An example of scaffold chromosome misassignment in OANA5. Female-to-male (F/M) depth ratio, normalized female depth and normalized male depth along OANA5 chromosome 14 in 5-kb non-overlapping windows. Depth ratio, normalized female depth and normalized male depth all suggest that OANA5 chromosome 14 should be an X-borne rather than autosomal sequence. f, g, Normalized depth distribution of redundant sequences and one-to-one sequences in male (f) and female (g). Redundant sequences (red) in OANA5 are probably assembly artefacts due to heterozygosity of the sequenced individual of OANA5, and are therefore featured with 0.5× normalized depth in OANA5 but 1× normalized depth in mOrnAna1 in both male and female. One-to-one sequences in OANA5 (black) have 1× normalized depth in both OANA5 and mOrnAna1 in reads that are mapped from both sexes as expected. Each dot represents one mapping region between OANA5 and mOrnAna1 by Mashmap, and the normalized depth values of each dot are calculated as the mean depth across the mapping region in OANA5 and mOrnAna1. The small peak in the one-to-one sequence density plot in the male indicates candidate X-linked sequences. h, Example redundant sequences Contig40802, Contig44497 and Contig35847 in OANA5 that could be interpreted as false duplications. Dot plot is generated between the target region of mOrnAna1 chromosome 1 and OANA5 Contig1255, Contig40802, Contig44497 and Contig35847 by FlexiDot. Candidate redundant sequences are those mapped to the same region in mOrnAna1 chromosome 1, highlighted by dashed lines in the dot plot and grey in the normalized depth plot. Normalized male and female read depths along each sequence are calculated in 500-bp windows, and plotted along each sequence. Although the normalized depth is always around 1 in the region of mOrnAna1 chromosome 1, normalized depth drops by half in Contig40802, Contig44497, Contig35847 and the aligned regions in Contig1255, indicating that Contig40802, Contig44497 and Contig35847 are probably redundant sequences in OANA5. i, j, Examples of gene annotation artefacts in OANA5: CIT (i) and PBRM1 (j) have been fragmented into multiple small artificial genes in OANA5 (purple) but have now been fully recovered in mOrnAna1 (orange). Orthologous human genes (grey) are also shown to indicate that the mOrnAna1 rather than OANA5 annotation has a similar gene structure to that of the human genes.

Extended Data Fig. 2 | Mammalian genome evolution. a, Phylogenetic tree constructed using fourfold degenerate sites from 7,946 one-to-one orthologues among seven representative species (human, mouse, opossum, platypus, echidna, chicken and green lizard). The fossil time calibrations of the nodes marked by circles were obtained from a previously published study67. The numbers of gene families that have undergone significant (Viterbi P < 0.05) lineage-specific expansions (green) and contractions (red) are marked on each branch. Exact P values are available in Supplementary Table 29. No multiple-testing correction was applied. b, Examples of some imprinting gene clusters improved in mOrnAna1 compared to OANA5. The first line of each synteny plot represents mOrnAna1 and the second line represents OANA5. Names of genes that have been found to be imprinted in human and mouse are highlighted in black, and non-imprinting genes in red. Fragmented genes with alignment rate lower than 70% are marked by triangles. The double slash represents the intermediate region longer than 100 kb. c, Distribution of MSHCEs on genomic elements. d, Enriched GO terms in the top-300 MSHCE-associated genes. P values of enrichment are calculated using a χ2 test, and FDRs are computed to adjust for multiple testing. GO terms are clustered based on semantic similarity. GO terms related to nervous system development are highlighted in bold. e, A case of one MSHCE in BCL11A that overlaps with the enhancer signals inferred from H3K27ac ChIP-seq experiments at 8.5 and 12 weeks after conception (p.c.w.). f, Evolution highway comparative chromosome browser visualization of reconstructed MACs at a 500-kb resolution. Blocks overlaid on each MAC represent human syntenic fragments. Numbers within blocks indicate the homologous human chromosome. g, Evolution highway comparative chromosome browser visualization of the human genome at a 500-kb resolution, with each block overlaid on each human chromosome representing putative chromosome fragments of the ancestral mammalian genome. Numbers within blocks depict the ancestral mammal chromosome numbers. Silhouettes of the human and opossum are from https://www.flaticon.com/. The silhouette of the platypus is created by S. Werning and is reproduced under a Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0/).

Extended Data Fig. 3 | Evolution of immune gene family in monotremes. a, MHC genes in platypus and echidna are located on two different chromosomes, but the classical class I and II genes involved in antigen presentation are located within a single cluster in each genome. b, Phylogenetic relationship of class I genes in representative mammals and chicken. Classical class I genes (red) in monotremes exhibit high similarity, which is rarely observed in other species. Only bootstrap values with >50% support are shown. c, d, Phylogenetic relationship of MHC class II alpha (c) and beta (d) genes. Genes with prefix 'HLA', 'Modo', 'Phci', 'Oran', 'Taac' and 'Gaga' indicate genes in human, opossum, koala, platypus, echidna and chicken, respectively. Only bootstrap values with >50% support are shown. e, Phylogenetic relationship among putative functional Vγ sequences from platypus (yellow), echidna (purple), koala (green), mouse (orange), human (red), sheep (grey), cow (dark red) and chicken (dark yellow). Groups according to a previous study90 are displayed around the outside of the tree, with the putative marsupial–monotreme-specific group denoted by a '?'. Only bootstrap values with greater than 50% support are shown. f, Synteny conservation of beta-defensin genes in monotremes and loss of functional venom defensins in echidna. Venom defensins (OavDLP genes) and venom-like defensin (DEFB-VL genes) are shown in red. Only putative functional defensins are shown. g, Putative OavDLP loss in echidna. OavDLP genes and DEFB-VL each contain two exons (indicated by a box and triangle) in platypus. Both exons of platypus DEFB-VL can be mapped to echidna chromosome X2. A single platypus OavDLP exon can be mapped to echidna chromosome X2 while the second exons cannot. Grey links indicate platypus–echidna LASTZ alignment. h, Phylogenetic relationship of DEFB-VL and OavDLP genes suggested that ancestral monotremes had all three OavDLP genes but that echidna has lost two of them (OavDLP-B and OavDLP-C). Branch length is not shown. ta, echidna; oa, platypus. Silhouettes of the human, opossum, koala and frog are from https://www.flaticon.com/. The silhouette of the platypus is created by S. Werning and is reproduced under a Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0/).

Extended Data Fig. 4 | Genomic composition of monotreme sex chromosomes. a, Composition of the echidna sex chromosomes. The circos plot (from outer to inner rings) shows: X chromosomes with PARs shown as light colours and SDRs as dark colours; assembled Y chromosome fragments showing the colour-scaled sequence similarity levels with homologous X chromosomes; normalized F/M ratios of Illumina DNA-sequencing depth in non-overlapping 5-kb windows; F/M expression ratios (each red dot is one gene) of adult kidney and smoothed expression ratio trend; and GC content in non-overlapping 2-kb windows. In addition, Y-linked fragments with a similar level of sequence divergence from the X chromosome indicate a pattern of evolutionary strata. As expected, F/M DNA depth ratio is centred at 1 at PARs, but is around 2 at SDRs. Some PARs show significantly higher GC content than the regions that suppressed recombination between X and Y. b, Partial dosage compensation in monotremes. The four point range plots show log2-transformed values of the male-to-female (M/F) expression ratio in the brain, kidney, heart and liver of platypus and echidna. As expected, log2-transformed values of the M/F expression ratio are close to 0 for genes on autosomes (A) and PARs, whereas for genes on SDRs, the expression is female-biased in all tissues, which suggests that monotremes have partial dosage compensation. Whiskers indicate the 25–75th percentiles and circles are the median value. c, Some PARs show significantly higher GC content than SDRs. For platypus, some PARs (X2-PAR-S, X3-PAR-S, X4-PAR-L, X5-PAR-S and X5-PAR-L (where -S is the shorter PAR of the chromosome and -L the longer PAR)) show significantly (P < 0.01) higher GC content (1-kb non-overlapping windows) than the SDRs of the same chromosome, which are labelled as asterisks in the heat map. We also checked their orthologous sequences in chicken, as a proxy for the ancestral status before the chromosome became a sex chromosome, and found similarly higher GC content in the orthologous region of PARs than those of SDRs in chicken. ***P < 0.01 (all P < 2.2 × 10^−16), one-sided Wilcoxon rank-sum test. d, Atlas of orthologous chicken fragments along each platypus sex chromosome. The PARs between the platypus X and Y chromosomes are indicated by crosses. We also labelled the position of the putative sex-determining gene AMH.
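The depth-based reasoning in Extended Data Fig. 4a (F/M DNA depth ratio near 1 at PARs and near 2 at SDRs, in non-overlapping 5-kb windows) can be sketched in Python as follows. This is an illustrative sketch only, not the authors' pipeline: the per-base depth lists, the per-sex normalization factors (for example, a genome-wide median depth) and the 1.5 cutoff between the two expected values are all assumptions, and the function names are hypothetical.

```python
def window_means(depth, window=5000):
    """Mean per-base depth in consecutive non-overlapping windows.

    A trailing partial window is dropped.
    """
    return [sum(depth[i:i + window]) / window
            for i in range(0, len(depth) - window + 1, window)]


def fm_depth_ratio(female_depth, male_depth, female_norm, male_norm,
                   window=5000):
    """Normalized female-to-male depth ratio per window.

    female_norm and male_norm are assumed normalization factors,
    e.g. the genome-wide median depth of each sex.
    """
    ratios = []
    for f, m in zip(window_means(female_depth, window),
                    window_means(male_depth, window)):
        f_n, m_n = f / female_norm, m / male_norm
        ratios.append(f_n / m_n if m_n > 0 else None)
    return ratios


def classify_window(ratio, cutoff=1.5):
    """Label a window by its ratio; the 1.5 cutoff is a hypothetical midpoint."""
    if ratio is None:
        return "NA"
    return "SDR-like" if ratio >= cutoff else "PAR-like"
```

A window in which males show half the female normalized depth yields a ratio of 2 and is labelled SDR-like, whereas equal normalized depth in both sexes yields a ratio of 1 and is labelled PAR-like.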

Extended Data Fig. 5 | Evolution of PARs after the platypus and echidna divergence. a, The distribution of pairwise dS values of platypus and echidna sex chromosomes. In both platypus and echidna, gametologue pairs in the X1 S0 region (Fig. 2), which is largely homologous to chicken chromosome 28, have a higher dS value than those of any other sex-linked regions. This suggests that X1 S0 is the oldest evolutionary stratum. Therefore, we also show platypus genes of X2 with an orthologue on chicken chromosome 28 separately from others (X1_S0_chr28). Following the order of dS values of different chromosome regions, we inferred the time order of formation of evolutionary strata, called S0–S6. For platypus, n = 5, 5, 2, 2, 1, 1, 4, 2 and 6 XY gametologue pairs are plotted, from left to right. For echidna, n = 7, 2, 1, 1, 4, 2 and 1 XY gametologue pairs are plotted, from left to right. Box plots show median, quartiles (boxes) and range (whiskers). b, Phylogenetic tree examples of gametologues that evolved in the common ancestor of monotremes (EF2 in X2) and independently in two monotreme species (IRF4 in X3). c, Alignments of platypus and echidna X chromosomes (PAR, light colours; SDR, dark colours; the top chromosomes are from platypus and the bottom chromosomes are from echidna) were used to infer that X2-PAR-S and X5-PAR-L of platypus evolved independently from echidna after their divergence, given their different lengths. This is supported by the Venn diagrams of PAR genes between platypus and echidna, in which most genes are not shared within independently evolved PARs. d, Alignments of PAR–SDR boundaries between platypus and echidna. Alignments of genes (±1 Mb around the boundaries) support independent evolution of X2-PAR-S and X5-PAR-L in platypus and echidna, as most of their genes are not homologous at the PAR–SDR boundaries (blue, PAR genes; red, SDR genes; platypus, top chromosome; echidna, bottom chromosome). We used lines to connect the genes of the two species, whenever they are orthologous to each other. For each X chromosome, we also labelled its repeat information. Six repeat tracks between each X pair are shown, from top to bottom: the overall repeat content of platypus; LINE/L2 elements of platypus; SINE/MIR elements of platypus; SINE/MIR elements of echidna; LINE/L2 elements of echidna and overall repeat content of echidna. We did not find obvious repeat enrichment at PAR–SDR boundaries, as shown previously in cow91.

Extended Data Fig. 6 | Sex chromosome evolution in monotremes. a, Mummerplot showing homology between platypus (x axis) and echidna (y axis) X chromosomes. Blue lines: forward alignment; red lines: reverse alignment. For echidna, X1, X2 and X3 are homologous to platypus X1, X2 and X3, respectively. Echidna X4 is homologous to platypus X5. Echidna X5 is not homologous to any platypus sex chromosome; instead, it is homologous to platypus chromosome 12. b, Homology between platypus X chromosomes (x axis) and human chromosomes (y axis). c, Homologous relationships between platypus sex chromosomes and chicken. d, Alignment between platypus and chicken showing the alternating pairing pattern of the platypus sex chromosome chain. e, X/Y pairwise dS comparison between gametologues on X1–Y5 pair (n = 18) and other sex chromosome pairs (n = 10). Box plots show median, quartiles (boxes) and range (whiskers). ***P < 0.001 (P = 0.0002954), one-sided Wilcoxon rank-sum test.

Extended Data Fig. 7 | Chromatin conformation of monotreme sex chromosomes. a, Hi-C interactions between platypus sex chromosomes, with chromosome 1 shown as control. a, b, There are unexpected interchromosomal interactions (shown in red) between platypus sex chromosomes detected by Hi-C data (a), whereas most interactions are within the same chromosomes (shown in red in b) for the other chromosomes (b). c, The Hi-C interchromosomal interaction strength among platypus sex chromosomes (inter_XY, n = 2,711 100-kb windows) is significantly higher than that among autosomes (inter_A, n = 14,342,930 100-kb windows). Box plots show median, quartiles (boxes) and range (whiskers). ***P < 0.0001 (P < 2.2 × 10^−16), one-sided Wilcoxon rank-sum test. d, The interaction strength is higher between Y2 and Y3 than the interaction strengths between Y2 and other chromosomes. n = 1,002, 228, 5,025, 67,313 and 6,904,867 100-kb windows are shown in Y2-Y2, Y2-Y3, Y2-other.sex.chr, Y2-A and A-A, respectively. Box plots show median, quartiles (boxes) and range (whiskers). ***P < 0.0001 (P < 2.2 × 10^−16), one-sided Wilcoxon rank-sum test. e, Inferred three-dimensional structure of the platypus sex chromosome system. X chromosomes are shown in red and Y chromosomes in blue, with PARs in light colour. Interchromosomal interactions inferred from Hi-C are shown by dashed lines. f, Hi-C interactions reveal unexpected interchromosomal interactions between the echidna sex chromosomes. g, Putative CTCF-binding sites are enriched at TAD boundaries in platypus and echidna sex chromosomes. For each X chromosome of platypus, we calculated its putative CTCF-binding-site density per 10 kb and plotted it along the ±500 kb of TAD boundaries. Platypus X4 and echidna X5 are not shown because fewer than 10 TAD boundaries are detected. h, Putative CTCF-binding-site density plot showing its enrichment among the homologous regions of platypus, echidna, human and chicken.

Extended Data Fig. 8 | Loss of dietary-related genes in monotremes. a, Tooth-related gene loss in representative mammals and reptiles. b–f, Potential loss of digestion-related genes in both monotremes shown by whole-genome alignment and read mapping. In each panel there are three lines in the synteny plot, representing the orthologous region of the genes in platypus, human and echidna from top to bottom, respectively. Grey links indicate human–platypus and human–echidna LASTZ alignments. Each rectangle or triangle represents an exon. Fragmented genes are marked by dashed lines. Illumina reads of platypus and echidna are aligned to the platypus or human genome (Ensembl release 87) and the flanking region of each gene is visualized by pyGenomicTrack. GAPDH region is also plotted as a control. g, RT–PCR expression analysis shows expression of NGN3 in brain, stomach, intestine and pancreas of both platypus and echidna. These results are similar to other mammals. This, together with sequencing results, shows that NGN3 in monotremes is present and is likely to be functioning normally. NGN3, NGN3 primers; b-actin, β-actin primers; -ve, negative control, no template; gDNA, genomic DNA template; brain, brain cDNA template; stom, stomach cDNA template; int, intestine cDNA template; panc, pancreas cDNA template. Lanes 1 (top), 1, 8 (middle) and 9 (bottom) are a 100-bp DNA ladder: 1,517, 1,200, 1,000, 900, 800, 700, 600, 500/517, 400, 300, 200 and 100 bp. The expected size of the PCR product for NGN3 is 157 bp in platypus and 145 bp in echidna, and the PCR product for the β-actin genomic region is 597 bp and cDNA is 348 bp. Silhouettes of human and opossum are from https://www.flaticon.com/. The silhouette of the platypus is created by S. Werning and is reproduced under a Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0/).

Extended Data Fig. 9 | Taste-receptor evolution and olfactory-receptor organization in monotremes. a, Maximum-likelihood mammalian-wide gene tree of the bitter taste receptors (TAS2R genes). There are 28 eutherian (Eu), 27 marsupial (Ma) and 7 monotreme-specific (Mo) orthologous gene groups (supported by ≥95% bootstrap values), where the nodes of orthologous gene group clades are indicated by white open circles. Bootstrap values of ≥70% in the nodes connecting orthologous gene group clades are indicated by asterisks. There are 3 therian (I, II and III), 2 eutherian (I and II), 3 marsupial (I, II and III) and one monotreme-specific clusters in which massive expansion events occurred in the common ancestor of each taxon after the split from its previous ancestors. b, Genomic organization of the intact class I olfactory receptor (OR) cluster spanning over 1.2 Mb on platypus chromosome 2 (138,375,798–139,616,970 bp). The vertical lines indicate the 48 intact class I OR genes. The white open box indicates the J element, a presumable cis-regulatory element (enhancer) for the mammalian class I OR cluster (chromosome 2: 139,639,465–139,639,907 bp). Silhouettes of human, opossum and koala are from https://www.flaticon.com/. Silhouettes of the platypus and Tasmanian devil are created by S. Werning and are reproduced under a Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0/).

Extended Data Fig. 10 | Genomic features related to haemoglobin clearance and reproduction in monotremes. a, b, Confirmation of HP absence in monotremes by whole-genome alignment (a) and read mapping (b). Grey links indicate human–platypus and human–echidna LASTZ alignments. Illumina reads of platypus and echidna are aligned to the human genome (Ensembl release 87) and coding regions of HP are visualized by pyGenomeTracks. Limited coverage is found at the exons of HP, suggesting the absence of HP in monotremes. c, Phylogenetic tree of HP and related proteases across different species using the maximum-likelihood method. Node IDs are in the format ‘species geneID’. Branch length is not shown here. d, Gene synteny plot of the PIT54 region between chicken and platypus. Echidna is not shown in the figure as the flanking orthologues of PIT54 are on different scaffolds, preventing us from determining the presence of the gene by synteny. e, Phylogenetic tree of members of the group B scavenger receptor cysteine-rich family across different species using the neighbour-joining method. Gene IDs are formatted as ‘species geneID’. Branch length is not shown here. f, Confirmation of the SCART1 number difference by dot plot and mapping depth of SCART1 orthologous regions between platypus and echidna. The region of the SCART1 cluster in platypus is plotted along the x axis while the sequence of echidna is plotted along the y axis. Lines in the dot plot are drawn according to the LASTZ alignment between the two species. Normalized male and female read depths are calculated in 500-bp windows and plotted along each sequence. The normalized depth of both sexes, especially in the shaded region, is centred at 1 in both species, confirming that the SCART1 number difference between the two species is true and is not due to assembly issues. g, Synteny conservation of vitellogenin genes. Synteny conservation of the region surrounding the vitellogenin (VTG) genes VTG1, VTG2 and VTG3. Pseudogenes are marked by a dashed outline. Monotremes have pseudogene VTG1, functional VTG2 and no VTG3; and there is a pseudogene VTG2 in koala. Syntenic maps are shown for human (Homo sapiens), koala (Phascolarctos cinereus), chicken (Gallus gallus), platypus (O. anatinus) and echidna (T. aculeatus). Koala scaffold 1, NW_018343984.1; koala scaffold 2, NW_018344134.1. Gene distances are not to scale. h, Synteny conservation of regions containing SPINT3. Synteny conservation of the region surrounding serine peptidase inhibitor, Kunitz-type, 3 (SPINT3). No copy of SPINT3 is detected in platypus but many of the other flanking genes in the region are conserved. Other members with a WFDC domain are detected including two Kunitz-domain members that did not align to any known gene (labelled KDCP1). Syntenic maps are reported for human (H. sapiens), cow (B. taurus), grey short-tailed opossum (Monodelphis domestica), koala (P. cinereus) and platypus (O. anatinus). Koala scaffold 1, NW_018343967.1; koala scaffold 2, NW_018344098.1. Gene distances are not to scale. i, Casein 3 (CSN3) protein sequence alignment in monotremes. All three CSN3 proteins identified in the monotremes have the classic five-exon structure of CSN3 with the untranslated exons I and IV (not shown), the signal peptide in exon II, a small exon III coding for 11 residues, a pSER cluster (S**) at the 5′ end of exon IV and a relatively large P/Q-rich exon IV. OA, O. anatinus (platypus); TA, T. aculeatus (short-beaked echidna). Silhouettes of human, opossum and koala are from https://www.flaticon.com/. The silhouette of the platypus is created by S. Werning and is reproduced under a Creative Commons Attribution 3.0 Unported licence (http://creativecommons.org/licenses/by/3.0/).

nature research | reporting summary
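The depth check described for Extended Data Fig. 10f (read depth summarized in 500-bp windows and normalized so that single-copy sequence sits near 1) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the legend does not state how depth was normalized, so dividing by the median window depth is an assumption, and the function name `windowed_normalized_depth` is hypothetical.

```python
from statistics import median

def windowed_normalized_depth(depth, window=500):
    """Mean depth per non-overlapping window, divided by the median window mean.

    `depth` is a sequence of per-base read depths along one sequence.
    Normalizing by the median is an assumption made for this sketch.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not depth:
        raise ValueError("depth must not be empty")
    means = []
    for i in range(0, len(depth), window):
        chunk = depth[i:i + window]          # last window may be shorter
        means.append(sum(chunk) / len(chunk))
    baseline = median(means)
    if baseline == 0:
        raise ValueError("median window depth is zero; cannot normalize")
    return [m / baseline for m in means]

# Toy example with 2-bp windows: one window at twice the typical depth.
toy_depth = [30, 30, 30, 30, 60, 60, 30, 30, 30, 30]
toy_profile = windowed_normalized_depth(toy_depth, window=2)
print(toy_profile)
```

In a profile like this, values near 1 across a region argue against collapsed or artificially duplicated copies in the assembly, whereas a window near 2 (as in the toy data) would point to a collapsed duplicate.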

Corresponding author(s): Guojie Zhang

Last updated by author(s): Jul 6, 2020

Reporting Summary

Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics

For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

Items (each marked n/a or Confirmed):
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code

Policy information about availability of computer code

Data collection: Fluorescence in situ hybridization (FISH) data were collected on a Nikon Ti microscope using NIS-Elements software (supplied with a Nikon Eclipse-Ti inverted epifluorescent microscope) and processed with ImageJ.

Data analysis: Common bioinformatic and statistical analysis software packages were used, including smrtanalysis (v5.1.0.26412), FALCON (v5.1.1), FALCON-Unzip (v1.0.2), Scaff10X (v2.1 git 4.28.2018), BioNano Solve (v3.2.1_04122018), SALSA (v2.0), Longranger (v2.2.2), FreeBayes (v1.2.0), BCFtools (v1.8), gEVAL (https://vgp-geval.sanger.ac.uk), Platanus (v1.2.1), BWA (v0.7.12), Samtools (v1.2), BLAST (v2.2.26), NCBI BLAST (v2.2.31), minimap2 (v2.13), Mashmap (v2.0), RepeatMasker (v4.0.6), trf (v4.07), ProteinMasker (v4.0.6), RepeatModeler (v1.0.8), GeneWise (v2.4.1), Augustus (v3.0.3), HISAT2 (v2.0.4), stringTie (v1.2.3), Iprscan (v5.16-55.0), BEDTools (v2.26.0), LASTZ (v1.04.00), BUSCO (v3.0.2), RaxML (v8.2.4), PAML (v4.7), MULTIZ (v11.2), PHAST (v1.5), orthoMCL (v2.0.9), CAFE (v4.2), REVIGO (http://revigo.irb.hr/), inferCARs (2006-Jun-16), ANGES (v1.01), MCScanX (08-05-2012), GRIMM (v2.1), DESeq (v1.28.0), HiC-Pro (v2.10.0), ggplot2 (v3.2.1), HiCExplorer (v3.0), MEME (v4.12.0), EMBOSS (v6.5.7), hmmer (v3.1b2), genscan (v1.0), MEGA (X, 7, v5.5.2), exonerate (v2.2.0), pyGenomeTracks (v2.1), MAFFT (v6.857b), MUSCLE (v3.8.31), TreeBest (v1.9.2), ggtree (v1.16.6), TOPCONS (v2.0), CD-HIT (v4.6.8), FATE (v2.7.0), GHOSTZ (v1.0.2), Clustal Omega (v1.2.4), Clustal W (v2.1), ImageJ (v2.0.0) and NIS-Elements software (v4.20.00). Custom scripts are open source and available on GitHub at https://github.com/ZhangLabSZ/MonotremeGenome.

For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data

Policy information about availability of data

All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability

The platypus whole-genome shotgun project has been deposited at GenBank under project accessions PRJNA489114 and PRJNA489115, at CNSA (https://db.cngb.org/cnsa/) of CNGBdb under accession CNP0000130, and at GenomeArk at https://vgp.github.io/genomeark-curated-assembly/Ornithorhynchus_anatinus/. The echidna whole-genome shotgun project has been deposited at GenBank under project accession PRJNA576333, at CNSA of CNGBdb under accession CNP0000697, and at GenomeArk at https://vgp.github.io/genomeark/Tachyglossus_aculeatus/. Echidna RNA-seq data have been deposited at GenBank under project accession PRJNA591380 and at CNSA of CNGBdb under accession CNP0000779.

Field-specific reporting

Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

- Life sciences
- Behavioural & social sciences
- Ecological, evolutionary & environmental sciences

For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design

All studies must disclose on these points even when the disclosure is negative.

Sample size: Sample-size determination does not apply to the genome sequencing in this study. Sample sizes for FISH experiments were determined according to previously published work in the field (Ling et al. 2006), but were also limited by the availability of material. Bioinformatic analyses were performed with all available data.

Data exclusions: None

Replication: FISH experiments showing that Super_Scaffold_40 did not co-localize with the chr15 BAC were carried out in two independent experiments, and co-localization of Super_Scaffold_40 with the chr13 BAC was confirmed in five independent experiments. Potential interchromosomal interactions indicated by Hi-C data were validated in three independent FISH experiments for each of the Y2-Y3, Y2-X1 and Y2-WSB1 (chr17) pairs.

Randomization: Randomization was not performed in this study. Only one cell line was used (established from a single individual) and all available data were collected in all experiments to fulfill the criteria for the sample size. In the strata analysis, gametologues were grouped according to pairwise dS and phylogeny. Grouping in the interchromosomal interaction analysis was based on the chromosome assignment of each sequence.

Blinding: Our study was not an intervention study and therefore blinding was not required.

Reporting for specific materials, systems and methods

We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Materials & experimental systems (n/a or involved in the study):
- Antibodies
- Eukaryotic cell lines
- Palaeontology and archaeology
- Animals and other organisms
- Human research participants
- Clinical data
- Dual use research of concern

Methods (n/a or involved in the study):
- ChIP-seq
- Flow cytometry
- MRI-based neuroimaging

Eukaryotic cell lines

Policy information about cell lines

Cell line source(s): Fibroblast primary cell culture derived from adult male platypus individual number 1, collected on a field expedition in New South Wales (NSW) in 2008.

Authentication: None of the cell lines used were authenticated.

Mycoplasma contamination: Cell lines were not tested for mycoplasma contamination.

Commonly misidentified lines (see ICLAC register): No commonly misidentified lines were used.

Animals and other organisms

Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research

Laboratory animals: Study did not involve laboratory animals.

Wild animals: Study did not involve wild animals.

Field-collected samples: Platypus and echidna male individuals were collected during the breeding season in the Upper Barnard River area (New South Wales) and euthanised with an intraperitoneal injection of pentobarbitone sodium (Nembutal) at a dose of 0.1 mg/g body weight. Tissues were snap-frozen in liquid N2 immediately. Emale01 was collected from Melbourne Zoo and provided by San Diego Zoo Global, with an estimated date of birth of 12/31/67.

Ethics oversight: Platypus and echidna samples (Pmale08, Pmale09, Emale12) were collected under the AEC permits S-49-2006, S-032-2008 and S-2011-146 at the Upper Barnard River (New South Wales, Australia) during the breeding season. Emale01 was collected under San Diego Zoo Global IACUC approval 18-024, and is vouchered at the San Diego Natural History Museum.

Note that full information on the approval of the study protocol must also be provided in the manuscript.


Minerva Access is the Institutional Repository of The University of Melbourne

Author/s: Zhou, Y; Shearwin-Whyatt, L; Li, J; Song, Z; Hayakawa, T; Stevens, D; Fenelon, JC; Peel, E; Cheng, Y; Pajpach, F; Bradley, N; Suzuki, H; Nikaido, M; Damas, J; Daish, T; Perry, T; Zhu, Z; Geng, Y; Rhie, A; Sims, Y; Wood, J; Haase, B; Mountcastle, J; Fedrigo, O; Li, Q; Yang, H; Wang, J; Johnston, SD; Phillippy, AM; Howe, K; Jarvis, ED; Ryder, OA; Kaessmann, H; Donnelly, P; Korlach, J; Lewin, HA; Graves, J; Belov, K; Renfree, MB; Grutzner, F; Zhou, Q; Zhang, G

Title: Platypus and echidna genomes reveal mammalian biology and evolution

Date: 2021-01-06

Citation: Zhou, Y., Shearwin-Whyatt, L., Li, J., Song, Z., Hayakawa, T., Stevens, D., Fenelon, J. C., Peel, E., Cheng, Y., Pajpach, F., Bradley, N., Suzuki, H., Nikaido, M., Damas, J., Daish, T., Perry, T., Zhu, Z., Geng, Y., Rhie, A., ... Zhang, G. (2021). Platypus and echidna genomes reveal mammalian biology and evolution. NATURE, 592 (7856), pp. 756-+. https://doi.org/10.1038/s41586-020-03039-0.

Persistent Link: http://hdl.handle.net/11343/274271

File Description: Published version

License: CC BY