A comparative genomic investigation of high-elevation adaptation in ectothermic snakes

Supplementary Information Appendix

Table of contents

1. Materials and sequencing

1.1. De novo genome sequencing

1.2. Transcriptome sequencing

1.3. Genome resequencing

2. Assembly

2.1. De novo genome assembly of a female Tibetan hot spring snake

2.2. De novo transcriptome assembly of the Tibetan hot spring snake

2.3. De novo genome assembly of five re-sequenced snakes

2.4. Mitochondrial genome assembly

3. Repeat

3.1. Repeat annotation

3.2. Transposable element (TE) expansion history analysis

3.3. Discussion

4. Gene annotation

4.1. Gene structural annotation

4.2. Gene functional annotation

5. GC contents

5.1. Isochore structure in vertebrates

6. Genome evolution

6.1. Identification of gene families

6.2. Phylogenetic tree construction

www.pnas.org/cgi/doi/10.1073/pnas.1805348115

6.3. Divergence time estimation

6.4. Expanded and contracted gene families

6.5. Positively selected genes (PSGs)

7. Whole genome alignments and sex chromosome evolution

8. High-altitude adaptation

8.1. Shared amino acid substitutions

8.2. Functional assay of FEN1

8.3. Functional assay of EPAS1

9. Supplementary Figures

10. Supplementary Tables

REFERENCES

1. Materials and sequencing

1.1. De novo genome sequencing

Blood samples from a female Tibetan hot spring snake (Thermophis baileyi, sample name: 1-13) captured in Yangbajing, Xizang, China were used for de novo sequencing. Whole-genome shotgun sequencing was employed: short paired-end insert libraries (280 bp and 450 bp) and long mate-pair insert libraries (2 kb, 5 kb, and 10 kb) were constructed using a standard protocol provided by Illumina (San Diego, USA). Paired-end sequencing was performed on the Illumina HiSeq 2000 system (SI Appendix, Table S1).

1.2. Transcriptome sequencing

Six tissues (liver, brain, heart, lung, muscle, and ovary) were collected from the same Tibetan hot spring snake individual (sample name: 1-13) used for genome sequencing. Total RNA was extracted from the pooled tissues using a TRIzol® kit (Life Technologies, Carlsbad, USA). Poly(A) messenger RNAs (mRNAs) were isolated using oligo(dT) magnetic beads and fragmented into short segments, followed by cDNA synthesis using random hexamer primers and reverse transcriptase. After end repair, adapter ligation and polymerase chain reaction (PCR) amplification, each paired-end cDNA library was sequenced with a read length of 150 bp on the Illumina HiSeq 2000 platform.

1.3. Genome resequencing

Samples used for genome resequencing were acquired from three Thermophis and two false cobra (Pseudoxenodon) individuals. A male Tibetan hot spring snake (T. baileyi), a female Sichuan hot spring snake (T. zhaoermii), and a female Shangri-La hot spring snake (T. shangrila) were obtained from Yangbajing, Xizang, China; Quhe village, Litang, Sichuan, China; and Tianshengqiao, Shangri-La, Yunnan, China, respectively. A female large-eyed false cobra (Pseudoxenodon macrops, http://www.iucnredlist.org/details/191926/0) and a male bamboo false cobra (Pseudoxenodon bambusicola, http://www.iucnredlist.org/details/192230/0) were collected from Honghe, Yunnan, China and Quanzhou, Fujian, China, respectively. For each of the five snakes, two paired-end libraries with an average insert size of 450 bp were constructed. Each library was prepared according to the appropriate Illumina protocol and sequenced with a read length of 150 bp on the HiSeq 2000 instrument (SI Appendix, Table S2).

2. Assembly

2.1. De novo genome assembly of a female Tibetan hot spring snake

Short-insert (280 bp and 450 bp) paired-end reads were filtered by removing adaptor sequences, PCR duplicates and low-quality reads using Trimmomatic (v3.20)(1), followed by error correction using SOAPec (v2.01)(2). Where the target DNA fragment size was less than twice the single-end read length, the paired reads could overlap (e.g., 150 bp Illumina reads from the 280 bp insert size library); for such read-through libraries, overlapping paired-end reads were merged into a single longer fragment using PEAR(3). Long mate-pair reads (2 kb and 5 kb) were trimmed using NextClip(4), and fragments with the junction adapter in at least one of the paired reads were used (SI Appendix, Table S3).

To estimate the genome size of the Tibetan hot spring snake, the KmerFreq_AR program in the SOAPec (v2.01) package was used to construct a k-mer frequency spectrum (k=17) using data from library I310 (SI Appendix, Fig. S1 and Table S3). In our analysis, the total number of input reads was 301,627,168, the total number of bases was 41,624,549,184, the total k-mer number (K_num) was 36,798,245,614, and the expected depth (K_depth) was 20. Assuming that genome size (G) can be estimated as G=K_num/K_depth(5), the Tibetan hot spring snake genome size was estimated as 1.76 Gb.

Whole-genome shotgun assembly of the Tibetan hot spring snake was performed using the short oligonucleotide analysis package SOAPdenovo2(2). Qualified reads from the short-insert libraries were used to construct a de Bruijn graph without using paired-end information. Contigs were constructed by merging the bubbles and resolving the small repeats. All of the qualified reads were then realigned to the contig sequences, and the paired-end relationships between the reads were used to establish linkages between the contigs. Scaffolds were constructed step by step, using the paired-end relationships from the short-insert libraries first and those from the long-distance mate pairs last. Gaps were then closed with GapCloser (version 1.10)(2), which uses the paired-end information to retrieve read pairs in which one end maps to a unique contig and the other is located in the gap region (SI Appendix, Table S4).

To assess assembly quality, we plotted the GC-depth distribution for the assemblies (SI Appendix, Fig. S2). We also used the Core Eukaryotic Genes Mapping Approach (CEGMA)(6) to identify the core genes in the Tibetan hot spring snake genome assembly; 235 of 248 core eukaryotic genes (94%) were found in the assembly (SI Appendix, Table S5).
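The k-mer-based genome size estimate in Section 2.1 reduces to a single division, G=K_num/K_depth. The Python sketch below is an illustration only (KmerFreq_AR produced the actual spectrum, and the function name is ours). It plugs in the values reported for library I310. With the quoted depth of 20 the formula gives about 1.84 Gb, whereas the reported 1.76 Gb corresponds to a depth near 20.9, so the sketch assumes that the depth quoted in the text is rounded.

```python
def genome_size(k_num: int, k_depth: float) -> float:
    """Estimate genome size (bp) as total k-mer count divided by peak k-mer depth."""
    return k_num / k_depth


K_NUM = 36_798_245_614  # total 17-mers reported for library I310
K_DEPTH = 20            # expected depth as quoted in the text (assumed to be rounded)

size_bp = genome_size(K_NUM, K_DEPTH)

# Depth that would reproduce the reported estimate of 1.76 Gb.
implied_depth = K_NUM / 1.76e9

print(f"G at depth {K_DEPTH}: {size_bp / 1e9:.2f} Gb")
print(f"Depth implied by 1.76 Gb: {implied_depth:.1f}")
```

The estimate is sensitive to the peak depth: a shift of one unit in K_depth changes G by roughly 5% here.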

2.2. De novo transcriptome assembly of the Tibetan hot spring snake

Before de novo assembly, Illumina raw reads were filtered as follows. Read pairs with adapter contamination were removed, and each remaining read was cropped by 10 bp at the head and 5 bp at the tail. Read pairs with an N content greater than 5% and over 50% low-quality bases (Q3) were also removed. Finally, potentially low-quality regions were trimmed using Trimmomatic (v3.20)(1): bases with a quality score below 3 at the beginning or end of a read were trimmed off, and 3' or 5' ends with an average quality score below Q15 in a four-base pair sliding window were trimmed. Reads that became shorter than 25 bp were excluded from assembly. All of the cleaned reads were then assembled using Trinity (version 2.0.3)(7) with the default parameters (SI Appendix, Table S6).

2.3. De novo genome assembly of five re-sequenced snakes

Before de novo assembly, Illumina raw reads were filtered (SI Appendix, Table S7) as follows. Read pairs with adapter contamination were removed, and potentially low-quality regions were trimmed using Trimmomatic (v3.20)(1): bases with a quality score below 3 at the beginning or end of a read were trimmed off, and 3' or 5' ends with an average quality score below Q15 in a four-base pair sliding window were trimmed. Reads shorter than 32 bp were excluded from assembly. The cleaned reads of each individual were then assembled using SOAPdenovo2(2) with K=31, and the gap regions of each assembly were filled using GapCloser (version 1.10)(2) (SI Appendix, Table S8).
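The quality trimming described in Sections 2.2 and 2.3 can be sketched as follows. This is a simplified Python illustration of the stated thresholds (end bases below Q3, four-base window mean below Q15, minimum length 25 bp or 32 bp), not Trimmomatic's implementation; the function name, the argument names, and the choice to scan windows from the 5' end and cut at the first failing window are our assumptions.

```python
def trim_read(quals, lead=3, trail=3, window=4, min_avg=15, min_len=25):
    """Return (start, end) of the retained slice, or None if the read is dropped.

    quals is a list of per-base Phred quality scores.
    """
    start, end = 0, len(quals)
    # Remove bases below the end thresholds from the start and the end of the read.
    while start < end and quals[start] < lead:
        start += 1
    while end > start and quals[end - 1] < trail:
        end -= 1
    # Scan windows from the 5' end; cut at the first window whose mean is below min_avg.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break
    # Discard reads that end up shorter than the minimum length.
    return (start, end) if end - start >= min_len else None


# Hypothetical read: 2 poor bases, 30 good bases, a decaying tail, 1 poor base.
example_quals = [2, 2] + [30] * 30 + [10] * 6 + [2]
kept = trim_read(example_quals, min_len=25)
```

For the resequencing data the same sketch would be called with min_len=32.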

2.4. Mitochondrial genome assembly

The mitochondrial genome of each individual was reconstructed (SI Appendix, Table S9) directly from genomic next-generation sequencing reads using MITObim v1.8(8), which implements a baiting and iterative mapping approach. The complete mitochondrial genome of T. zhaoermii (National Center for Biotechnology Information (NCBI) Accession No.: GQ166168.1) was used as the initial reference, and 24 million pairs of clean reads (SI Appendix, Table S3 and Table S7) were used as input for each individual.

3. Repeat

3.1. Repeat annotation

A de novo repeat annotation of the Tibetan hot spring snake genome was conducted using RepeatModeler(9) and RepeatMasker(10). Tibetan hot spring snake-specific de novo repeat libraries were constructed by combining results from LTR_STRUC(11) and RepeatModeler(9), both run with the default parameters. The consensus sequences in these libraries and their classification information were then combined with the RepeatMasker(10) library, and RepeatMasker(10) was run on the assembled scaffolds with the combined library, followed by tandem repeat identification using TRF(12). The Ophiophagus hannah, Python bivittatus, Boa constrictor, Thamnophis sirtalis, Ophisaurus gracilis, Pogona vitticeps, Anolis carolinensis, and Gekko japonicus genomes were annotated with the same pipeline (SI Appendix, Fig. S3 and Tables S10-S12).

3.2. Transposable element (TE) expansion history analysis

An in-house PERL script was written to parse the RepeatMasker(10) result (.out file) and extract the divergence of each TE copy identified in the genome from its consensus sequence in the library. The divergence distributions of each TE subtype in each species were then plotted (SI Appendix, Fig. S4) and compared using a Wilcoxon rank-sum test (SI Appendix, Table S13). We used percentage divergence as a proxy for TE age to detect TE expansion: a small divergence indicated a recent TE expansion, while a larger divergence indicated a more ancient expansion. We also calculated the percentage of the genome occupied by each TE family at each divergence level (ranging from 0.0 to 0.60); TE families that constituted less than 1% of the genome were excluded from further analysis. If TE activity were uniform, we would expect a consistent accumulation of TEs at different evolutionary time points. An excessive accumulation of TEs at a particular evolutionary time point therefore indicated a potential TE expansion in the host genome (SI Appendix, Fig. S5).
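The parsing step in Section 3.2 can be illustrated with a short Python sketch. It is not the in-house PERL script; it only assumes the standard whitespace-delimited RepeatMasker .out layout, in which the second column is the percentage divergence, columns six and seven are the query start and end, and column eleven is the repeat class/family. The function name, the 1% bin width, and the two example rows (including the repeat names) are invented for illustration.

```python
from collections import defaultdict


def divergence_profile(out_lines, bin_width=1.0):
    """Sum masked bases per (class/family, divergence bin) from RepeatMasker .out lines."""
    profile = defaultdict(int)
    for line in out_lines:
        fields = line.split()
        # Data rows start with the integer Smith-Waterman score; header rows do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        perc_div = float(fields[1])
        begin, end = int(fields[5]), int(fields[6])
        family = fields[10]
        bin_start = int(perc_div // bin_width) * bin_width
        profile[(family, bin_start)] += end - begin + 1
    return dict(profile)


# Invented example rows in .out layout (two header lines, a blank line, two hits).
example = [
    "   SW   perc perc perc  query      position in query     matching   repeat",
    "score   div. del. ins.  sequence   begin  end   (left)   repeat     class/family",
    "",
    "  463    1.3  0.6  1.7  scaffold1      1   468  (1000) + L1-1_Tb    LINE/L1      1  463   (0)  1",
    "  239   12.9  0.0  0.0  scaffold1    600   699   (769) C Gypsy-3   LTR/Gypsy  (10)  500   401  2",
]
profile = divergence_profile(example)
```

Dividing each bin total by the assembly length gives the percentage of the genome per family and divergence level, which is the quantity plotted against divergence in an expansion-history figure.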

3.3. Discussion

We identified repetitive sequences in five snake species using a combination of structure- and homology-based analyses (SI Appendix, Table S10). In the snakes included in this analysis, the proportion of the genome made up of transposons is much higher than that in mammals, birds, and (SI Appendix, Table S12). SINE-type TE content is similar across all five snake species considered here. LINE-type TEs make up a lower proportion of the genome in the suborder Colubroidea (the Tibetan hot spring snake, the common garter snake, and the king cobra) than in the suborder Henophidia (python and boa), while SINE-type TEs are higher in Colubroidea than in Henophidia (SI Appendix, Table S12). We identified ~791 Mb of repetitive sequences, predominantly LTRs and other unknown TEs (SI Appendix, Table S12), comprising 45.28% of the T. baileyi genome assembly. This is the highest TE content among the five snakes with a sequenced genome (SI Appendix, Table S10), and thus the assembled genome of T. baileyi is larger than those of the others.

TE expansion is one of the major sources of vertebrate genome expansion. In our analyses, we found that patterns of TE expansion are highly similar within the Colubroidea and Henophidia (SI Appendix, Fig. S5A). Within Colubroidea, even though the Tibetan hot spring snake is more closely related to the common garter snake, the TE expansion patterns are far more similar between the common garter snake and the king cobra. In contrast, the pattern of TE expansion in the Tibetan hot spring snake differs from that of the common garter snake far more than would be expected from phylogenetic inertia, suggesting a potential relationship between TE expansion and novel environmental factors, which has also been observed in the invasive inbreeding ant Cardiocondyla obscurior (13). Notably, a more recent wave of TE insertions was detected in Thermophis, including a Thermophis-specific expansion of the LTR/Gypsy and DNA/hAT-Charlie families (SI Appendix, Fig. S5A).

4. Gene annotation

4.1. Gene structural annotation

Transcriptome alignment, de novo gene prediction, and homology-based prediction were used for gene prediction (SI Appendix, Fig. S6). Briefly, for transcriptome alignment, RNA-Seq reads were assembled into transcripts using Trinity as described above, and the transcripts were aligned to the genome to obtain gene structure annotation using PASA(14). For de novo gene prediction, SNAP(15), GeneMark-ET(16) and Augustus(17) were used to predict genes on TE-hard-masked genome sequences, and a high-quality data set for training these ab initio gene predictors was generated by PASA(14). For sequence homology-based gene prediction, sequences from the SwissProt vertebrate database and from four species (human, chicken, and green anole from Ensembl 78, and python (GCF_000186305.1) from NCBI) were incorporated into MAKER2(18) to generate homology-based gene structures. All of the predicted gene structures were integrated into consensus gene models using MAKER2(18), and the distributions of genic elements were compared to those of related species (SI Appendix, Fig. S7).

To evaluate whether our gene models were contaminated with a large number of TE-related proteins, the proteomes of the Tibetan hot spring snake and a well-annotated model organism (zebrafish, Danio rerio, Ensembl 78) were BLASTed against the TE sequences (protein and nucleotide sequences) in RepBase(19). In general, if a large number of TEs in the genome were not identified and/or not hard-masked, the resulting gene models would contain a high percentage of TE-related proteins, and the gene number would be inflated. From SI Appendix, Table S14, we found that the number of such proteins was low, even lower than in zebrafish. Therefore, almost all of the TE-related proteins were masked before gene annotation.

A simple method of ascertaining whether the predicted protein-coding genes are fragmented or complete is to compare them with their full-length homologs; therefore, we BLASTed the Tibetan hot spring snake proteome against the SwissProt database(20) and used the top hit for each protein. The length ratio was computed as the length of the Tibetan hot spring snake protein divided by the length of its corresponding SwissProt homologous sequence, and the frequencies of these ratios were plotted (SI Appendix, Fig. S8).

4.2. Gene functional annotation

To determine the functional annotation of the gene models, a BLASTP(21) search with an E-value ≤ 1e-5 was performed against the protein databases NR (non-redundant protein sequences in NCBI), SwissProt(20), RefSeq(22), and TrEMBL(20). The resulting NR BLASTP hits were processed by BLAST2GO(23) to retrieve the associated Gene Ontology (GO) terms(24) describing biological processes (BP), molecular functions (MF), and cellular components (CC). The motifs and domains of each gene model were predicted by InterProScan(25) 4.8 against public protein databases, including ProDom(26), PRINTS(27), Pfam(28), SMART(29), PANTHER(30), PROSITE(31) and TIGR(32) (SI Appendix, Table S15).

5. GC contents

5.1. Isochore structure in vertebrates

GC content, (G+C)/(A+T+G+C), was calculated in 50 kb sliding windows with a step size of 5 kb in the genomes of the species listed in SI Appendix, Table S16. Scaffolds shorter than 50 kb were excluded, as were terminal windows shorter than 50 kb and windows in which ambiguous ‘N’ bases exceeded 20%. The mean, variance and SD of GC content were calculated for each species (SI Appendix, Table S17). Wilcoxon rank-sum tests were conducted to compare the standard deviation of GC content among the categories (SI Appendix, Table S18).

The varying levels of GC content along sequences (isochores) appear to be related to recombination rate (33). Specifically, isochores with high GC content can result from higher rates of GC-biased gene conversion (gBGC) in regions of higher recombination (33). The genomes of the lizards Anolis carolinensis(34) and Pogona vitticeps(35) have been found to lack GC-rich isochores. By examining variation in GC content (measured as the standard deviation; SI Appendix, Fig. S12), we find that the snake genomes sequenced to date have a higher degree of GC-isochore structure than lizards (that is, the standard deviation in GC content is higher in snakes), and across , GC-isochore structure increases even more strongly in birds (SI Appendix, Fig. S12). Among the taxa included in this analysis, birds and mammals have the highest variance in GC-isochore structure (SI Appendix, Fig. S12). These findings suggest that ancestral Sauropsida species more likely eroded towards GC homogeneity, and that snakes and other Sauropsida species might have re-evolved GC-isochore structure towards heterogeneity since their divergence from lizards. Because GC-rich isochores are attributed to gBGC in regions of higher recombination (33), the moderate level of GC heterogeneity observed in the Tibetan hot spring snake genome, compared to other snakes, may reflect only moderate recombination rates despite extreme environmental conditions. However, the taxon sampling of snake genomes is too sparse to make specific predictions about where major changes in GC-isochore content have occurred among snakes.
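The windowed GC calculation in Section 5.1 can be sketched in Python as follows. This is a minimal illustration under the stated settings (50 kb windows, 5 kb step, windows with more than 20% N removed, short scaffolds and partial terminal windows dropped); the function and parameter names are ours, and the original scripts may differ in detail.

```python
def gc_windows(seq, window=50_000, step=5_000, max_n_frac=0.2):
    """Return GC fractions for full-length sliding windows along one scaffold."""
    seq = seq.upper()
    values = []
    # Scaffolds shorter than one window yield no windows; the range below
    # also never produces a partial terminal window.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        # Skip windows dominated by ambiguous bases.
        if chunk.count("N") / window > max_n_frac:
            continue
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        if acgt:
            values.append(gc / acgt)
    return values


# Toy scaffold with a GC-rich half and an AT-rich half, using small windows.
toy = "GGCC" * 5 + "AATT" * 5
toy_gc = gc_windows(toy, window=20, step=10)
```

Pooling the window values of all scaffolds of a species and taking their standard deviation (for example with statistics.stdev) gives a per-species heterogeneity measure of the kind summarized in Table S17.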

6. Genome evolution

6.1. Identification of gene families

Protein sequences of 23 vertebrate species were downloaded from Ensembl (release 78) and NCBI (SI Appendix, Table S19). For genes with alternative splicing variants, only the longest transcript was retained. Genes encoding fewer than 50 amino acids were removed. Self-to-self alignments of the pooled protein sequences were conducted using BLASTP(21) with an E-value of 1e-5, and low-quality hits (identity <30% and coverage <30%) were removed. Orthologous groups were constructed by ORTHOMCL(36) v2.0.9 using the default settings based on the filtered BLASTP results (SI Appendix, Table S20). Genes that could not be clustered into any gene family and for which only one species sample was available were considered species-specific. GO terms that were significantly over-represented among the Tibetan hot spring snake-specific genes were identified using BiNGO(37) in Cytoscape(38) by conducting a hypergeometric test. The entire set of GO annotations of the Tibetan hot spring snake genes was used as the reference set, and the Benjamini and Hochberg false discovery rate (FDR) correction was applied (SI Appendix, Table S21).
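The Benjamini and Hochberg FDR correction applied to the GO enrichment p-values can be written in a few lines. The Python sketch below is a generic implementation of the step-up procedure, shown for illustration; BiNGO performs this correction internally, and the function name and example p-values are ours.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

A GO term is then reported as over-represented when its adjusted p-value falls below the chosen cutoff.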

6.2. Phylogenetic tree construction

Single-copy gene families were retrieved from the ORTHOMCL(36) results described above and used for phylogenetic tree construction. Families containing any sequence shorter than 200 amino acids were removed. The protein sequences from each family were aligned using MUSCLE v3.8.31(39) with the default parameters, and the corresponding CDS alignments were back-translated from the protein alignments. Conserved CDS alignments were extracted by Gblocks(40), and the resulting CDS alignments of each family were used for further phylogenomic analyses. For phylogenetic tree construction, the CDS alignments of each single-copy family were concatenated to generate a matrix of 100,000 unambiguously aligned nucleotide positions. Four-fold degenerate nucleotide sites (4DTV) were extracted from these supergenes, and MrBayes3.22(41) was used to generate a Bayesian tree from the 4DTV sites under the GTR+I+Γ model (SI Appendix, Fig. S9). The Markov chain Monte Carlo (MCMC) process was run for 5,000,000 generations, and trees were sampled every 100 generations with the first 10,000 samples dropped.
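Extraction of four-fold degenerate sites from a codon alignment can be sketched as below. This Python illustration uses one conservative convention, which is our assumption rather than the documented procedure: a codon column is kept only when every species carries the same two-base prefix and that prefix is four-fold degenerate under the standard genetic code. The function name and the toy alignment are invented.

```python
# Two-base codon prefixes whose third position is four-fold degenerate
# in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_sites(cds_alignment):
    """Extract third-position bases of shared four-fold degenerate codons.

    cds_alignment maps species name to an in-frame aligned CDS of equal length.
    """
    seqs = {name: s.upper() for name, s in cds_alignment.items()}
    length = len(next(iter(seqs.values())))
    out = {name: [] for name in seqs}
    for pos in range(0, length - 2, 3):
        codons = {name: s[pos:pos + 3] for name, s in seqs.items()}
        prefixes = {c[:2] for c in codons.values()}
        third_ok = all(c[2] in "ACGT" for c in codons.values())
        # Keep the column only if all species share one four-fold degenerate prefix.
        if len(prefixes) == 1 and prefixes <= FOURFOLD_PREFIXES and third_ok:
            for name, c in codons.items():
                out[name].append(c[2])
    return {name: "".join(sites) for name, sites in out.items()}


toy_alignment = {"sp_a": "GCTATGCCA", "sp_b": "GCCATGCCG", "sp_c": "GCAATGCCT"}
toy_4d = fourfold_sites(toy_alignment)
```

Codons containing gaps are skipped automatically because a gapped prefix is never in the four-fold set.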

6.3. Divergence time estimation

The concatenated supergenes were separated into three categories that corresponded with the 1st, 2nd, and 3rd codon sites in the CDS. Divergence times were estimated under a relaxed clock model using the MCMCTREE program in the PAML4.7 package(42). The “Independent rates (clock=2)” model and the “JC69” model in MCMCTREE were used for the calculation. The MCMC process was run for 6,000,000 iterations, after a burn-in of 2,000,000 iterations. We ran the program twice for each data type to confirm that the results were similar between runs. The constraints are listed in SI Appendix, Table S22.

6.4. Expanded and contracted gene families

Gene family expansion and contraction analyses were performed using CAFÉ 3.1(43). First, expanded and contracted gene families on each branch of the tree were detected by comparing the cluster size at each branch with the maximum likelihood cluster size at the ancestral node leading to that branch: a smaller ancestral size indicated gene family expansion, while a larger one indicated gene family contraction. The overall p-value (the family-wide p-value in CAFÉ 3.1(43), which is based on a Monte Carlo resampling procedure) of each branch and node was then calculated, and exact p-values (Viterbi method in CAFÉ 3.1(43)) were calculated for each gene family with a significant overall p-value (≤0.01). Finally, for each branch and node, an expanded or contracted gene family with both an overall p-value and an exact p-value ≤0.01 was defined as a “significantly expanded and contracted gene family” (SI Appendix, Table S23). Significantly over-represented GO terms among these significantly expanded gene families were identified using the topGO(44) package in the R programming language (https://www.r-project.org/), and the Benjamini and Hochberg FDR correction was applied. GO terms with corrected p-values ≤0.05 were considered significantly over-represented (SI Appendix, Table S24 and Table S25).

6.5. Positively selected genes (PSGs)

To identify potential PSGs in the Tibetan hot spring snake lineage, gene families of the Tibetan hot spring snake and seven other species (Gallus gallus, Alligator sinensis, Chelonia mydas, Thamnophis sirtalis, Python bivittatus, Ophiophagus hannah, and Anolis carolinensis) were retrieved from the ORTHOMCL(36) results described above. Single-copy gene families were then extracted, and the protein sequences from each family were aligned using MUSCLE(39) v3.8.31 with the default parameters. The corresponding CDS alignments were back-translated from the protein alignments using PAL2NAL(45), and conserved CDS alignments were extracted by Gblocks(40) and used for PSG identification. The branch-site model of CODEML in PAML(42) 4.7 was used to test for potential PSGs, with the Tibetan hot spring snake set as the foreground branch and the others as background branches. The null hypothesis was that the ω value of each site on each branch was less than or equal to 1, while the alternative hypothesis was that the ω values of particular sites on the foreground branch were greater than 1. A likelihood ratio test (LRT) was then performed; the null distribution was a 50:50 mixture of a chi-squared distribution with 1 degree of freedom and a point mass at zero. The p-values calculated from this mixture distribution were further corrected for multiple testing by conducting an FDR test with a Bonferroni correction. PSGs were required to have a corrected p-value <0.01 and to contain at least one positively selected site with a posterior probability greater than 0.99 according to a Bayes Empirical Bayes (BEB) analysis (SI Appendix, Table S26). Significantly over-represented GO terms among the PSGs were identified using the topGO(44) package in R, with a corrected p-value cutoff of ≤0.05 (SI Appendix, Table S27).

Of the 517 genes examined that are involved in the “response to DNA damage stimulus” (GO term), 12 showed evidence of positive selection in the Tibetan hot spring snake (SI Appendix, Table S28).
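The p-value for the branch-site likelihood ratio test in Section 6.5 follows directly from the stated null distribution. The Python sketch below is illustrative; the function name and the example log-likelihoods are ours, and CODEML supplies the actual lnL values. It halves the chi-squared tail probability with 1 degree of freedom, which for 1 degree of freedom can be written with the complementary error function.

```python
import math


def branch_site_lrt_pvalue(lnl_null, lnl_alt):
    """P-value under a 50:50 mixture of a point mass at zero and chi-squared (1 d.f.)."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        # All of the point mass at zero lies at or above a non-positive statistic.
        return 1.0
    # Chi-squared survival function with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods for one gene (null and alternative models).
p_example = branch_site_lrt_pvalue(-1000.0, -998.0)
```

The resulting per-gene p-values are the inputs to the multiple-testing correction described above.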

7. Whole genome alignments and sex chromosome evolution

Pairwise alignments between the Tibetan hot spring snake and the green anole lizard, as well as between the five re-sequenced genome assemblies, were performed using the LASTZ program(46). The LASTZ outputs in the axt format were chained using the axtChain program. The chained alignments were processed into nets with chainNet, and the netSyntenic Best-chain alignments in the axt format were extracted using the netToAxt program. We mapped the Tibetan hot spring snake scaffolds to green anole lizard chromosomes, and then linked these mapped scaffolds into pseudo-chromosomes. This information was used for the identification of Z-linked sequences in snakes. We used the BWA(47) program with the default parameters to map the Illumina clean reads from the sequenced male and female hot spring snakes to the Tibetan hot spring snake de novo assembly.

As far as is known, all snakes have genotypic (as opposed to environmental) sex determination (48). Sex chromosomes are derived from autosomes when sex-linked alleles become fixed on particular chromosomes. In many snakes, homomorphic sex chromosomes have become differentiated into ZW heteromorphic chromosomes. Pythons and boas possess homomorphic sex chromosomes, which are assumed to be the ancestral state in snakes, whereas many colubroid snakes have heteromorphic sex chromosomes (48, 49). However, the evolutionary history and structure of sex chromosomes in Thermophis remain unclear. To investigate the origin and evolution of Thermophis sex chromosomes, male and female genomic reads were mapped to the de novo assembled scaffolds using the BWA(47) program to estimate their coverage.

As these snakes have a ZW sex chromosome system, Z-linked regions with a degenerate W homolog should have only half the genomic coverage in females relative to males, whereas autosomal regions and undifferentiated sex-linked regions (pseudo-autosomal regions, or PARs) should have equal genomic coverage in both sexes. Considering that karyotypes and synteny have been well conserved during the evolution of reptiles (48), genomic scaffolds from the Tibetan hot spring snake were assigned to the chromosomes of the green anole lizard according to sequence similarity, and male and female coverage was then mapped to green anole lizard chromosomes using the Thermophis-Anolis synteny relationships (SI Appendix, Fig. S13). This coverage analysis reveals that Thermophis scaffolds homologous to chromosomes 1–5 of Anolis have similar coverage in males and females, while scaffolds homologous to chromosome 6 of Anolis show a nearly 2-fold reduction in female-to-male coverage relative to the other chromosomes in this species. These results suggest that the ancestral snake sex chromosomes were derived from homologs of Anolis chromosome 6. Moreover, we do not identify any segments along Anolis chromosome 6 with similar coverage in the two sexes, as would be expected for pseudo-autosomal regions (SI Appendix, Fig. S13), indicating that, at the nucleotide level, Tibetan hot spring snakes have completely heteromorphic sex chromosomes.
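The coverage expectation used above (equal female and male depth for autosomes and PARs, about half the depth in females for Z-linked regions) can be expressed as a simple classifier on the log2 female-to-male depth ratio. The Python sketch below is illustrative only: the function name, the labels, and the tolerance of 0.3 log2 units are hypothetical and were not used in this study, and the depths are assumed to be normalized for sequencing effort beforehand.

```python
import math


def classify_scaffold(female_depth, male_depth, tolerance=0.3):
    """Classify a scaffold from normalized female and male read depth.

    The tolerance (in log2 units) is a hypothetical choice for illustration.
    """
    if female_depth <= 0 or male_depth <= 0:
        return "undetermined"
    ratio = math.log2(female_depth / male_depth)
    if abs(ratio) <= tolerance:
        # Equal coverage in both sexes: autosome or pseudo-autosomal region.
        return "autosomal_or_PAR"
    if abs(ratio + 1.0) <= tolerance:
        # Females (ZW) carry one Z copy, males (ZZ) carry two.
        return "Z_linked"
    return "undetermined"


# Hypothetical normalized depths (female, male).
labels = [classify_scaffold(f, m) for f, m in [(30, 30), (15, 30), (5, 30)]]
```

Applied per scaffold and ordered along the Anolis chromosomes, such labels would reproduce the kind of contrast described for chromosome 6 versus chromosomes 1–5.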

8. High-altitude adaptation

8.1. Shared amino acid substitutions

The Tibetan hot spring snake proteome was independently aligned with those of Ophisaurus gracilis, Anolis carolinensis, Pogona vitticeps, Ophiophagus hannah, Python bivittatus, Boa constrictor, and Thamnophis sirtalis using BLASTP. Reciprocal best hits (RBHs) were extracted from each pair and merged into groups according to their homology with the Tibetan hot spring snake. Groups with fewer than two lizard and two snake species were removed. The protein sequences from each group were then aligned using MUSCLE(39) v3.8.31 with the default parameters, and conserved amino acid alignments were extracted using Gblocks(40). Tibetan hot spring snake-specific amino acid substitutions were extracted from the conserved amino acid alignments, and the amino acids at these substitution sites were checked in the five re-sequenced snakes (three hot spring and two low-land snakes, see SI Appendix, Table S2) using multiple genome alignments of the Tibetan hot spring snake and the five re-sequenced genome assemblies. These multiple genome alignments were obtained by merging the whole-genome pairwise alignments (from the LASTZ pipeline described above) using Multiz(50). If the amino acids at a site were the same in all four hot spring snakes and differed from those in the lizard and low-land snake species, the site was defined as a hot spring snake-specific amino acid substitution. In total, 9,680 hot spring snake-specific amino acid substitution sites from 5,543 genes were identified. The functional effects of these substitutions were further evaluated by SIFT(51) and PolyPhen2(52). In total, 27 sites (from 27 genes) predicted as “DELETERIOUS” by SIFT and as “damaging” by PolyPhen2 were identified (Fig. 2C, SI Appendix, Fig. S10 and Table S29).

Sites of 12 PSGs that may play a role in DNA damage repair in the Tibetan hot spring snake (SI Appendix, Table S28) were also checked in the five re-sequenced snakes (three hot spring and two low-land snakes, see SI Appendix, Table S2) using the same multiple genome alignments. We retained only the positively selected sites at which the amino acids in the four hot spring snake individuals are the same but differ from those of the other species (SI Appendix, Fig. S11).
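The site-level criterion in Section 8.1 (identical in all four hot spring snakes, different from every lizard and low-land snake) can be sketched as a test on one alignment column. The Python below is an illustration; the function name, the sample labels, and the treatment of gaps and unknown residues as uninformative are our assumptions.

```python
def is_hot_spring_specific(column, hot_spring, others):
    """Check one alignment column for a hot spring snake-specific substitution.

    column maps sample name to the amino acid at this site.
    """
    hs_residues = {column[s] for s in hot_spring}
    other_residues = {column[s] for s in others}
    # Treat gaps and unknown residues as uninformative (assumption).
    if {"-", "X"} & (hs_residues | other_residues):
        return False
    # One shared residue in the hot spring snakes that no other species carries.
    return len(hs_residues) == 1 and hs_residues.isdisjoint(other_residues)


# Hypothetical sample labels and residues for a single site.
HOT_SPRING = ["T_baileyi_female", "T_baileyi_male", "T_zhaoermii", "T_shangrila"]
OTHERS = ["P_macrops", "P_bambusicola", "A_carolinensis", "P_bivittatus"]
site = {name: "T" for name in HOT_SPRING}
site.update({name: "A" for name in OTHERS})
specific = is_hot_spring_specific(site, HOT_SPRING, OTHERS)
```

The same check applies unchanged to the positively selected sites of the 12 DNA damage repair PSGs.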

8.2. Functional assay of FEN1

HEK293 cells were cultivated in DMEM (Gibco C11995500BT) with 10% FBS (Gibco 10099141) and then transiently transfected with pCMV-3×FLAG-FEN1 or pCMV-3×FLAG-FEN1 p.Ala200Thr. At 24 h after transfection (Lipofectamine™ 3000 Transfection Reagent, Gibco L3000015), the cells were passaged evenly into 5 plates. After 48 h, the cells were irradiated with UV at 40 J·m⁻²·min⁻¹ for 0 min, 2 min, 5 min, 15 min, and 30 min, harvested and immediately lysed (RIPA Lysis Buffer, Beyotime P0013B; PMSF, Beyotime ST506), and subjected to immunoblotting with anti-FLAG (Sigma 1804), anti-LAMIN (Abcam ab83472) or anti-GAPDH (SUNGENE BIOTECH KM9002T) antibodies. Gray-scale images were produced with Quantity One from three biological replicates.

8.3. Functional assay of EPAS1

To investigate whether endogenous EPO transcriptional up-regulation differs between EPAS1 and EPAS1 p.His65Arg, we constructed three plasmids: pIRES2-EPAS1-EGFP, pIRES2-EPAS1p.His65Arg-EGFP and pIRES2-EGFP. These plasmids were over-expressed in 293T cells. For mRNA extraction, 5×10⁴ 293T cells were plated on six-well plates in triplicate, and when the cells reached 50% confluence, the plasmids (1 μg/well) were transfected to the cells after 48 h using Lipofectamine® 3000 (Thermo Fisher L3000015). Semiquantitative RT-PCR was performed using first-strand cDNA (RevertAid H Minus First Strand cDNA Synthesis Kit, THERMO , #K1631) synthesized from total RNA samples (Total RNA Kit II, OMEGA R6934-01) and GoTaq Colorless Master Mix (PROMEGA M7133) to ascertain whether the plasmids were successfully transfected. The amplified products were separated on 1% agarose gels, stained with Goldview (SBS 090804), and photographed. Real-time quantitative PCR was performed using SYBR® Premix Ex Taq™ II (Takara RR820A) with first-strand cDNA to evaluate EPO expression. The primers used, annealing temperatures, and expected product sizes are described in SI Appendix, Table S30.


9. Supplementary Figures Fig. S1 Estimation of Tibetan hot spring snake genome size based on a 17-mer analysis. The x-axis represents the depth (X) and the y-axis represents the ratio, i.e., the frequency at that depth divided by the total frequency at all depths.

[Fig. S1 plot. x-axis: Kmer Depth (X), 0 to 80; y-axis: Ratio, 0 to 0.025; legend: Kmer Species, Kmer Individual.]
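The ratio plotted in Fig. S1 and the usual k-mer-based size estimate can be sketched as follows. This is a minimal illustration, assuming a histogram that maps each depth to the number of distinct k-mers seen at that depth, and the common approximation that genome size is the total k-mer count divided by the peak depth. The histogram values, the function names and the `min_depth` cutoff for skipping low-depth error k-mers are hypothetical.

```python
def kmer_ratio(hist):
    """Frequency at each depth divided by the total frequency over all depths."""
    total = sum(hist.values())
    return {depth: count / total for depth, count in hist.items()}


def peak_depth(hist, min_depth=5):
    """Depth with the highest frequency, ignoring low depths (likely errors)."""
    candidates = {d: c for d, c in hist.items() if d >= min_depth}
    return max(candidates, key=candidates.get)


def estimate_genome_size(hist, min_depth=5):
    """Approximate genome size as (total k-mer count) / (peak depth)."""
    total_kmers = sum(depth * count for depth, count in hist.items())
    return total_kmers / peak_depth(hist, min_depth)


# Hypothetical toy histogram: depth -> number of distinct 17-mers.
toy_hist = {1: 200, 19: 300, 20: 500, 21: 300}
ratios = kmer_ratio(toy_hist)
size = estimate_genome_size(toy_hist)
```

Real pipelines usually also correct for sequencing errors and heterozygosity, which this sketch ignores.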

Fig. S2 Distributions of guanine-cytosine (GC) contents and sequencing depths. The x-axis represents the GC contents and the y-axis represents the average depth.


Fig. S3 Identification of repeats using Tibetan hot spring snake-specific repeat libraries


Fig. S4 Distribution of divergence of each transposable element (TE) type


Fig. S5 (A) Comparison of TEs in five snake genomes. The y-axis indicates a specific TE family at a given divergence from the consensus sequence and the x-axis indicates its percentage of the genome. (B) Comparison of TEs in lizard and gecko genomes. The x-axis indicates a specific TE family at a given divergence from the consensus sequence and the y-axis indicates its percentage of the genome.


Fig. S6 Pipeline of gene annotation.


Fig. S7 Distributions of coding DNA sequence (CDS) length, intron length, transcript length, and the CDS numbers of each transcript.


Fig. S8 Frequency of the length ratio of a homologous pair. Homologous pairs contain two homologous proteins from two species. In our analysis, one was the target species and the other was a sequence from the SwissProt database.

Fig. S9 Phylogeny of 24 vertebrate genomes.


Fig. S10 Twenty-seven genes with hot spring snake-specific amino acid substitutions.


Fig. S11 Genes from the hypoxia inducible factor (HIF) super family with hot spring snake-specific amino acid substitutions.


Fig. S12 Variation in GC content among various genomes. The y-axis shows the standard deviation of GC content (10-kb windows) for each species. The species in each category are listed in SI Appendix, Table S16.

Fig. S13 Distribution of normalized read coverage depth for female (green) and male (orange) Tibetan hot spring snakes. Tibetan hot spring snake scaffolds were ordered along the green anole lizard chromosomes and along three additional unanchored lizard scaffolds larger than 10 Mb (GL343214.1, GL343208.1, and GL343193.1).


10. Supplementary Tables Table S1 Ten libraries used by the Tibetan hot spring snake genome sequencing project. Insert sizes include paired-end read lengths. Coverage was calculated assuming a genome size of 1.76 Gb. For libraries sequenced on several lanes, each lane is listed on its own row and the library total is given on the first of those rows.

Insert size  Sequence library  Read pairs  Bases  Length  Total bases for each library  Depth (X)
280bp  I301  190,512,843  57,153,852,900  150  57,153,852,900  32.47
280bp  I303  163,156,184  48,946,855,200  150  48,946,855,200  27.81
280bp  I305  201,583,472  60,475,041,600  150  60,475,041,600  34.36
280bp  I306  170,847,513  51,254,253,900  150  51,254,253,900  29.12
450bp  I309  163,405,410  49,021,623,000  150  49,021,623,000  27.85
450bp  I310  177,028,127  53,108,438,100  150  53,108,438,100  30.18
2k  A002  81,471,862  24,441,558,600  150  73,945,660,800  42.01
2k  A002  83,554,995  25,066,498,500  150
2k  A002  81,458,679  24,437,603,700  150
2k  A007  83,435,425  25,030,627,500  150  73,751,185,800  41.90
2k  A007  82,645,013  24,793,503,900  150
2k  A007  79,756,848  23,927,054,400  150
5k  A021  79,989,538  23,996,861,400  150  72,289,345,500  41.07
5k  A021  81,518,079  24,455,423,700  150
5k  A021  79,456,868  23,837,060,400  150
10k  10k  149,607,583  14,661,543,134  49  14,661,543,134  8.33
Total  -  1,949,428,439  554,607,799,934  -  554,607,799,934  315.12
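The Bases and Depth (X) columns of Table S1 follow from simple arithmetic, sketched below for a few libraries. Bases equal read pairs times two reads times the read length, and depth equals bases divided by the assumed genome size of 1.76 Gb. The function names are illustrative only.

```python
GENOME_SIZE_BP = 1.76e9  # genome size assumed in the Table S1 caption


def bases_from_pairs(read_pairs, read_length):
    """Total bases for a paired-end run: two reads per pair."""
    return read_pairs * 2 * read_length


def depth(bases, genome_size=GENOME_SIZE_BP):
    """Sequencing depth (X) = sequenced bases / genome size."""
    return bases / genome_size


# (read pairs, read length) taken from Table S1
libraries = {
    "I301": (190_512_843, 150),
    "I303": (163_156_184, 150),
    "10k": (149_607_583, 49),
}

depths = {
    name: round(depth(bases_from_pairs(pairs, length)), 2)
    for name, (pairs, length) in libraries.items()
}
```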

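Table S2 Raw Illumina read data of the five snakes. For libraries sequenced on several lanes, each lane is listed on its own row and the library total is given on the first of those rows.

Species  Sample  Library  Lane read pairs  Lane bases  Library read pairs  Library bases
T. baileyi  1hao (Male)  I311  70,465,469  21,139,640,700  132,323,744  39,697,123,200
T. baileyi  1hao (Male)  I311  20,873,624  6,262,087,200
T. baileyi  1hao (Male)  I311  20,622,715  6,186,814,500
T. baileyi  1hao (Male)  I311  20,361,936  6,108,580,800
T. baileyi  1hao (Male)  I312  39,828,338  11,948,501,400  118,522,076  35,556,622,800
T. baileyi  1hao (Male)  I312  39,160,352  11,748,105,600
T. baileyi  1hao (Male)  I312  39,533,386  11,860,015,800
T. shangrila  2-2 (Female)  333  -  -  126,301,661  37,890,498,300
T. shangrila  2-2 (Female)  334  -  -  105,375,253  31,612,575,900
T. zhaoermii  A3 (Female)  332  -  -  91,350,706  27,405,211,800
T. zhaoermii  A3 (Female)  331  -  -  97,847,001  29,354,100,300
P. macrops  B3 (Female)  327  -  -  152,833,124  45,849,937,200
P. macrops  B3 (Female)  328  -  -  137,559,901  41,267,970,300
P. bambusicola  C2 (Male)  329  -  -  140,385,912  42,115,773,600
P. bambusicola  C2 (Male)  330  -  -  162,689,488  48,806,846,400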


Table S3 Coverage was calculated assuming a genome size of 1.76 Gb.

Insert size  Library  Type  Read pairs  Bases  Depth (X)
280bp  I301  overlap  109,124,258  28,506,587,682  16.20
280bp  I301  R1  45,527,754  6,393,615,059  3.63
280bp  I301  R2  45,527,754  5,812,110,105  3.30
280bp  I303  overlap  97,372,588  25,305,349,605  14.38
280bp  I303  R1  49,944,851  7,100,485,923  4.03
280bp  I303  R2  49,944,851  6,545,648,704  3.72
280bp  I305  overlap  75,238,336  19,898,306,833  11.31
280bp  I305  R1  91,627,756  13,268,610,171  7.54
280bp  I305  R2  91,627,756  12,530,591,144  7.12
280bp  I306  overlap  68,779,823  18,126,332,374  10.30
280bp  I306  R1  69,597,702  10,033,866,703  5.70
280bp  I306  R2  69,597,702  9,422,170,138  5.35
500bp  I309  R1  141,908,944  19,175,467,432  10.90
500bp  I309  R2  141,908,944  18,413,270,069  10.46
500bp  I310  R1  148,278,489  20,046,290,029  11.39
500bp  I310  R2  148,278,489  19,263,090,243  10.94
2k  A002  R1  125,052,763  11,682,845,543  6.64
2k  A002  R2  125,052,763  12,116,870,206  6.88
2k  A007  R1  130,121,078  11,938,670,907  6.78
2k  A007  R2  130,121,078  12,310,611,364  6.99
5k  A21  R1  121,272,636  11,557,951,802  6.57
5k  A21  R2  121,272,636  11,937,668,354  6.78
10k  10k  R1  146,016,323  7,154,799,827  4.07
10k  10k  R2  146,016,323  7,154,799,827  4.07
Total  -  -  2,489,211,597  325,696,010,044  185.05

Table S4 Statistics for the de novo assemblies. N50 is the length of the shortest sequence among the longest sequences that together make up 50% of the entire assembly. N10 to N90 are defined similarly.

Statistic  Scaffold length (bp)  Scaffold number  Contig length (bp)  Contig number
Max length  18,661,274  -  151,693  -
N10  6,660,994  18  45,574  2,695
N30  3,992,054  88  26,220  12,313
N50  2,413,955  202  16,800  27,763
N70  1,258,892  401  10,152  52,391
N90  363,689  889  4,351  99,157
Total length  1,747,683,645  -  1,608,265,774  -
number>=200bp  -  20,741  -  193,595
number>=2000bp  -  3,161  -  133,147

GC rate: 0.38 (scaffold), 0.412 (contig)
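The N-statistics in Table S4 can be computed as sketched below. This is a minimal illustration of the standard definition, not the assembler's own code: sequences are sorted from longest to shortest and accumulated until the running total reaches the requested percentage of the assembly. The function returns both the length and the number of sequences needed, matching the paired length and number columns. The toy lengths are hypothetical.

```python
def nx(lengths, x):
    """Return (Nx length, number of sequences needed) for percentage x.

    Sequences are accumulated from longest to shortest until they cover
    at least x percent of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must not be empty")


# Hypothetical toy assembly with a total length of 30.
toy_lengths = [2, 10, 4, 8, 6]
n50 = nx(toy_lengths, 50)
n90 = nx(toy_lengths, 90)
```

Here the two longest sequences (10 and 8) already cover half of the assembly, so N50 is 8 with a count of 2.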


Table S5 Statistics of the completeness of the Tibetan hot spring snake genome based on 248 core eukaryotic genes (CEGs) using CEGMA (6).

Set  #Prots  %Completeness  #Total  Average  %Ortho
Complete  206  83.06  247  1.20  17.48
Complete, Group 1  57  86.36  66  1.16  15.79
Complete, Group 2  42  75.00  49  1.17  16.67
Complete, Group 3  50  81.97  64  1.28  22.00
Complete, Group 4  57  87.69  68  1.19  15.79
Partial  235  94.76  339  1.44  31.91
Partial, Group 1  63  95.45  88  1.40  28.57
Partial, Group 2  54  96.43  77  1.43  37.04
Partial, Group 3  56  91.80  85  1.52  35.71
Partial, Group 4  62  95.38  89  1.44  27.42

These results are based on the set of genes selected by Genis Parra.
#Prots = number of 248 ultra-conserved CEGs present in the genome
%Completeness = percentage of 248 ultra-conserved CEGs present
#Total = total number of CEGs present, including putative orthologs
Average = average number of orthologs per CEG
%Ortho = percentage of detected CEGs that have more than 1 ortholog

Table S6 Statistics of transcriptome sequencing data and assemblies.

Raw read pairs  32,580,435
Raw bases  9,774,130,500
Clean read pairs  32,526,305
Clean bases  8,782,102,350
Total trinity 'genes'  169,314
Total trinity transcripts  194,594
Percent GC  43.62

Stats based on ALL transcript contigs
Total bases  175,099,782
Contig N10  5,930
Contig N20  4,343
Contig N30  3,322
Contig N40  2,559
Contig N50  1,873

Stats based on ONLY LONGEST ISOFORM per 'GENE'
Total bases  123,502,821
Contig N10  4,947
Contig N20  3,431
Contig N30  2,524
Contig N40  1,780
Contig N50  1,181


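Table S7 Clean Illumina read data of the five snakes. For libraries sequenced on several lanes, each lane is listed on its own row and the library total is given on the first of those rows.

Species  Sample  Library  Lane read pairs  Lane bases  Library read pairs  Library bases
T. baileyi  1hao (Male)  I311  18,850,243  5,415,626,504  122,665,630  35,218,278,916
T. baileyi  1hao (Male)  I311  18,146,624  5,212,471,228
T. baileyi  1hao (Male)  I311  18,597,934  5,327,997,034
T. baileyi  1hao (Male)  I311  67,070,829  19,262,184,150
T. baileyi  1hao (Male)  I312  35,834,105  10,260,863,446  106,033,260  30,342,281,546
T. baileyi  1hao (Male)  I312  35,014,363  10,033,560,228
T. baileyi  1hao (Male)  I312  35,184,792  10,047,857,872
T. shangrila  2-2 (Female)  333  -  -  124,265,568  35,801,267,690
T. shangrila  2-2 (Female)  334  -  -  103,505,830  30,481,425,618
T. zhaoermii  A3 (Female)  332  -  -  89,716,139  25,761,434,100
T. zhaoermii  A3 (Female)  331  -  -  95,290,211  27,823,100,404
P. macrops  B3 (Female)  327  -  -  148,311,991  41,277,678,938
P. macrops  B3 (Female)  328  -  -  133,239,864  38,891,999,620
P. bambusicola  C2 (Male)  329  -  -  138,373,586  40,004,506,104
P. bambusicola  C2 (Male)  330  -  -  160,391,586  47,396,387,846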

Table S8 Statistics for the assemblies of five re-sequenced individuals.

Species  Sample  Scaffold total  Scaffold N50  Contig total  Contig N50
T. baileyi  1hao  1,438,423,223  9,113  1,396,272,257  5,543
T. shangrila  2-2  1,493,006,050  9,547  1,417,418,923  5,051
T. zhaoermii  A3  1,473,650,348  8,639  1,391,896,387  4,661
P. macrops  B3  1,460,561,383  11,573  1,360,186,608  5,385
P. bambusicola  C2  1,422,951,373  15,898  1,366,923,414  7,022

Table S9 Length of assembled mitochondria.

Species  Individual  Length (bp)
T. baileyi  1hao  17,492
T. shangrila  2-2  17,476
T. zhaoermii  A3  17,462
P. macrops  B3  17,859
P. bambusicola  C2  17,715
T. baileyi  1-13  17,476


Table S10 Repeat content of selected genomes.

Species  Genome size  Count  Length (bp)  Repeat content (%)
T. baileyi  1,747,683,645  5,139,385  791,366,059  45.28%
T. sirtalis  1,424,897,867  3,251,924  455,021,517  31.93%
O. hannah  1,594,074,654  3,997,749  571,811,891  35.87%
P. bivittatus  1,435,034,535  2,890,868  448,009,867  31.22%
B. constrictor  1,447,999,364  2,749,106  505,406,434  34.90%
P. vitticeps  1,758,749,122  4,464,365  677,831,025  38.54%
A. carolinensis  1,799,143,587  5,413,643  917,754,158  51.01%
O. gracilis  1,781,356,875  6,243,817  892,740,773  50.12%
G. japonicus  2,490,257,917  8,336,120  1,373,404,326  55.15%

Table S11 Size of each repeat and non-repeat class in the genomes.

Species  Genome size  TE count  TE bp  TE %  Non-TE-repeat count  Non-TE-repeat bp  Non-TE-repeat %
T. baileyi  1,747,683,645  4,883,675  772,532,609  44.20%  255,710  18,833,450  1.08%
T. sirtalis  1,424,897,867  2,954,978  419,555,245  29.44%  296,946  35,466,272  2.49%
O. hannah  1,594,074,654  3,754,115  535,326,201  33.58%  243,634  36,485,690  2.29%
P. bivittatus  1,435,034,535  2,610,313  420,430,061  29.30%  280,555  27,579,806  1.92%
B. constrictor  1,447,999,364  2,492,737  480,358,226  33.17%  256,369  25,048,208  1.73%
P. vitticeps  1,758,749,122  4,041,462  624,432,067  35.50%  422,903  53,398,958  3.04%
A. carolinensis  1,799,143,587  5,308,739  909,085,950  50.53%  104,904  8,668,208  0.48%
O. gracilis  1,781,356,875  5,776,099  838,055,895  47.05%  467,718  54,684,878  3.07%
G. japonicus  2,490,257,917  8,203,395  1,361,357,078  54.67%  132,725  12,047,248  0.48%

Table S12 Repeat content (subtypes) of selected genomes. The DNA, LTR, LINE, SINE, and Unknown columns are transposable element (TE) subtypes.

Species  DNA  LTR  LINE  SINE  Unknown  Non-TE-Repeats
T. baileyi  4.35%  20.43%  8.80%  1.47%  9.17%  1.08%
T. sirtalis  6.13%  12.42%  8.37%  2.19%  0.33%  2.49%
O. hannah  4.30%  16.27%  10.79%  1.66%  0.56%  2.29%
P. bivittatus  5.32%  6.24%  14.80%  2.22%  0.71%  1.92%
B. constrictor  5.37%  9.60%  15.09%  2.35%  0.77%  1.73%
P. vitticeps  7.76%  11.97%  12.48%  2.80%  0.49%  3.04%
A. carolinensis  9.20%  20.07%  12.64%  3.09%  5.52%  0.48%
O. gracilis  9.53%  22.39%  12.24%  1.53%  1.35%  3.07%
G. japonicus  1.87%  31.09%  8.98%  2.45%  10.28%  0.48%


Table S13 Comparison of median divergences of each transposable element (TE) type. For each TE type, the lower triangle gives the significance of each comparison (Wilcoxon rank-sum test), and the upper triangle gives the estimated difference in median divergence between the species in the row and the species in the column. Negative values are highlighted in blue, and positive values are highlighted in light red. For example, the value -1.60E+00 in row Tbail, column Tsirt (marked as * in the Table) indicates a difference of -0.0160 between Tbail and Tsirt. Tbail (Thermophis baileyi), Ohann (Ophiophagus hannah), Pbivi (Python bivittatus), Tsirt (Thamnophis sirtalis), Ograc (Ophisaurus gracilis), Pvitt (Pogona vitticeps), Acaro (Anolis carolinensis), Gjapo (Gekko japonicus), and Bcons (Boa constrictor).

TE  Species  Tbail  Tsirt  Ohann  Pbivi  Bcons  Pvitt  Acaro  Ograc  Gjapo
DNA  Tbail  NA  -1.60E+00  -2.50E+00  -4.20E+00  -4.40E+00  -4.60E+00  1.00E-01  -4.10E+00  -2.00E+00
DNA  Tsirt  0.00E+00  NA  -8.00E-01  -2.70E+00  -2.80E+00  -3.00E+00  1.70E+00  -2.40E+00  -3.00E-01
DNA  Ohann  0.00E+00  0.00E+00  NA  -1.90E+00  -2.00E+00  -2.20E+00  2.50E+00  -1.60E+00  4.00E-01
DNA  Pbivi  0.00E+00  0.00E+00  0.00E+00  NA  -2.00E-01  -3.00E-01  4.30E+00  4.00E-01  2.50E+00
DNA  Bcons  0.00E+00  0.00E+00  0.00E+00  1.78E-31  NA  -1.00E-01  4.40E+00  5.00E-01  2.60E+00
DNA  Pvitt  0.00E+00  0.00E+00  0.00E+00  1.15E-98  9.89E-26  NA  4.70E+00  6.00E-01  2.80E+00
DNA  Acaro  6.64E-08  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA  -4.10E+00  -2.00E+00
DNA  Ograc  0.00E+00  0.00E+00  0.00E+00  4.13E-215  0.00E+00  0.00E+00  0.00E+00  NA  2.10E+00
DNA  Gjapo  0.00E+00  2.03E-83  8.39E-144  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA
LINE  Tbail  NA  -1.60E+00  -2.00E+00  -4.00E+00  -4.70E+00  -9.00E-01  4.00E-01  -1.20E+00  -1.40E+00
LINE  Tsirt  0.00E+00  NA  -3.00E-01  -2.60E+00  -3.30E+00  6.00E-01  1.90E+00  2.00E-01  8.84E-05
LINE  Ohann  0.00E+00  8.76E-105  NA  -2.40E+00  -3.10E+00  9.00E-01  2.20E+00  5.00E-01  3.00E-01
LINE  Pbivi  0.00E+00  0.00E+00  0.00E+00  NA  -5.00E-01  3.30E+00  4.40E+00  2.80E+00  2.50E+00
LINE  Bcons  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA  3.90E+00  5.00E+00  3.40E+00  3.20E+00
LINE  Pvitt  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA  1.20E+00  -4.00E-01  -6.00E-01
LINE  Acaro  8.07E-119  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA  -1.60E+00  -1.80E+00
LINE  Ograc  0.00E+00  3.89E-59  1.05E-305  0.00E+00  0.00E+00  5.45E-217  0.00E+00  NA  -2.00E-01
LINE  Gjapo  0.00E+00  9.40E-02  1.21E-100  0.00E+00  0.00E+00  0.00E+00  0.00E+00  1.61E-63  NA
SINE  Tbail  NA  2.60E+00  -1.10E+00  -3.10E+00  1.20E+00  -1.10E+00  2.30E+00  -2.30E+00  4.00E-01
SINE  Tsirt  0.00E+00  NA  -3.60E+00  -5.50E+00  -1.60E+00  -3.60E+00  -4.00E-01  -4.80E+00  -2.10E+00
SINE  Ohann  0.00E+00  0.00E+00  NA  -2.10E+00  2.20E+00  -3.77E-05  3.30E+00  -1.20E+00  1.40E+00
SINE  Pbivi  0.00E+00  0.00E+00  0.00E+00  NA  3.80E+00  2.00E+00  5.10E+00  9.00E-01  3.60E+00
SINE  Bcons  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA  -2.10E+00  1.20E+00  -3.20E+00  -7.00E-01
SINE  Pvitt  0.00E+00  0.00E+00  6.08E-02  0.00E+00  0.00E+00  NA  3.30E+00  -1.10E+00  1.50E+00
SINE  Acaro  0.00E+00  7.95E-61  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA  -4.40E+00  -1.80E+00
SINE  Ograc  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA  2.70E+00
SINE  Gjapo  8.26E-81  0.00E+00  0.00E+00  0.00E+00  4.71E-173  0.00E+00  0.00E+00  0.00E+00  NA
Retro  Tbail  NA  3.50E+00  4.10E+00  -1.70E+00  -3.20E+00  -2.40E+00  7.10E+00  -6.00E-01  -1.00E-01
Retro  Tsirt  3.42E-51  NA  3.00E-01  -4.60E+00  -5.70E+00  -5.20E+00  3.60E+00  -3.50E+00  -3.50E+00
Retro  Ohann  1.06E-53  1.08E-39  NA  -4.90E+00  -5.90E+00  -5.40E+00  3.20E+00  -3.70E+00  -4.10E+00
Retro  Pbivi  2.29E-15  0.00E+00  0.00E+00  NA  -1.10E+00  -7.00E-01  8.30E+00  1.10E+00  1.50E+00
Retro  Bcons  6.79E-37  0.00E+00  0.00E+00  1.55E-301  NA  4.00E-01  9.50E+00  2.20E+00  2.90E+00
Retro  Pvitt  9.62E-24  0.00E+00  0.00E+00  3.74E-161  3.82E-55  NA  8.90E+00  1.80E+00  2.20E+00
Retro  Acaro  5.30E-92  2.44E-34  1.83E-26  1.27E-130  2.98E-127  2.07E-133  NA  -7.20E+00  -7.10E+00
Retro  Ograc  5.95E-03  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  7.46E-98  NA  4.00E-01
LTR  Tbail  NA  -1.80E+00  -3.20E+00  -3.70E+00  -2.40E+00  -3.80E+00  -1.00E+00  -2.30E+00  -2.30E+00
LTR  Tsirt  0.00E+00  NA  -1.30E+00  -1.90E+00  -5.00E-01  -2.00E+00  9.00E-01  -5.00E-01  -5.00E-01
LTR  Ohann  0.00E+00  0.00E+00  NA  -6.00E-01  8.00E-01  -7.00E-01  2.20E+00  9.00E-01  8.00E-01
LTR  Pbivi  0.00E+00  0.00E+00  0.00E+00  NA  1.40E+00  -1.00E-01  2.80E+00  1.40E+00  1.40E+00
LTR  Bcons  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA  -1.40E+00  1.40E+00  1.00E-01  5.33E-05
LTR  Pvitt  0.00E+00  0.00E+00  0.00E+00  2.84E-28  0.00E+00  NA  2.90E+00  1.50E+00  1.50E+00
LTR  Acaro  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA  -1.30E+00  -1.40E+00
LTR  Ograc  0.00E+00  0.00E+00  0.00E+00  0.00E+00  5.37E-10  0.00E+00  0.00E+00  NA  -8.54E-08
LTR  Gjapo  0.00E+00  0.00E+00  0.00E+00  0.00E+00  1.60E-04  0.00E+00  0.00E+00  4.46E-05  NA
Unknown  Tbail  NA  -2.00E+00  -1.60E+00  -7.40E+00  -7.10E+00  -5.80E+00  -5.50E+00  -4.60E+00  -3.20E+00
Unknown  Tsirt  0.00E+00  NA  3.00E-01  -5.20E+00  -5.00E+00  -3.70E+00  -3.40E+00  -2.50E+00  -1.30E+00
Unknown  Ohann  0.00E+00  1.08E-06  NA  -5.60E+00  -5.20E+00  -4.00E+00  -3.70E+00  -2.80E+00  -1.50E+00
Unknown  Pbivi  0.00E+00  0.00E+00  0.00E+00  NA  1.00E-01  1.50E+00  1.70E+00  2.70E+00  3.80E+00
Unknown  Bcons  0.00E+00  0.00E+00  0.00E+00  9.82E-04  NA  1.30E+00  1.50E+00  2.50E+00  3.60E+00
Unknown  Pvitt  0.00E+00  0.00E+00  0.00E+00  0.00E+00  5.89E-249  NA  3.00E-01  1.20E+00  2.40E+00
Unknown  Acaro  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  2.35E-19  NA  1.00E+00  2.10E+00
Unknown  Ograc  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA  1.20E+00
Unknown  Gjapo  0.00E+00  9.38E-183  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00  NA

Table S14 Number of potential transposable element (TE)-related proteins in the selected genomes.

Species  # gene models  Total  RepBase (protein)  RepBase (nucleotide)
Zebrafish  26,206  784  55  729
Tibetan hot spring snake  20,995  259  60  199

Table S15 Number of proteins in the Tibetan hot spring snake annotated with their functions.

# gene models  SwissProt  RefSeq  NR  GO  InterPro
20,995  18,978  19,138  19,832  15,962  14,112

Table S16 Species used for a comparison of guanine-cytosine (GC) content variation.

Category  Species name
Lizard  Ophisaurus gracilis
Lizard  Anolis carolinensis
Lizard  Pogona vitticeps
Snake  Ophiophagus hannah
Snake  Python bivittatus
Snake  Thermophis baileyi
Snake  Boa constrictor
Snake  Thamnophis sirtalis
Snake  Vipera berus
Turtle  Chrysemys picta
Turtle  Pelodiscus sinensis
Turtle  Chelonia mydas
Alligator  Alligator mississippiensis
Alligator  Alligator sinensis
Alligator  Crocodylus porosus
Alligator  Gavialis gangeticus
Bird  Meleagris gallopavo
Bird  Gallus gallus
Bird  Taeniopygia guttata
Bird  Nipponia nippon
Bird  Ficedula albicollis
Bird  Anas platyrhynchos
Bird  Columba livia
Bird  Pygoscelis adeliae
Mammal  Ornithorhynchus anatinus
Mammal  Monodelphis domestica
Mammal  Dasypus novemcinctus
Mammal  Loxodonta africana
Mammal  Ovis aries
Mammal  Homo sapiens
Mammal  Mus musculus
Mammal  Canis familiaris


Table S17 Means, variances and standard deviations (SD) of the guanine-cytosine (GC) content in 32 species.

Category  Latin name  Mean  Variance  SD
Lizard  Ophisaurus gracilis  0.43632  0.00070  0.02648
Lizard  Anolis carolinensis  0.40166  0.00027  0.01649
Lizard  Pogona vitticeps  0.41734  0.00083  0.02882
Snake  Ophiophagus hannah  0.38311  0.00078  0.02793
Snake  Python bivittatus  0.39305  0.00097  0.03117
Snake  Thermophis baileyi  0.41058  0.00106  0.03259
Snake  Boa constrictor  0.40283  0.00127  0.03568
Snake  Thamnophis sirtalis  0.39102  0.00110  0.03311
Snake  Vipera berus  0.40290  0.00142  0.03770
Turtle  Chrysemys picta  0.43740  0.00174  0.04177
Turtle  Pelodiscus sinensis  0.43981  0.00149  0.03863
Turtle  Chelonia mydas  0.43221  0.00148  0.03851
Alligator  Alligator mississippiensis  0.43929  0.00099  0.03145
Alligator  Alligator sinensis  0.44399  0.00130  0.03610
Alligator  Crocodylus porosus  0.44062  0.00137  0.03696
Alligator  Gavialis gangeticus  0.43750  0.00116  0.03405
Bird  Meleagris gallopavo  0.40426  0.00179  0.04228
Bird  Gallus gallus  0.41507  0.00247  0.04968
Bird  Taeniopygia guttata  0.41427  0.00219  0.04679
Bird  Nipponia nippon  0.41672  0.00255  0.05051
Bird  Ficedula albicollis  0.43547  0.00428  0.06545
Bird  Anas platyrhynchos  0.40588  0.00228  0.04774
Bird  Columba livia  0.41290  0.00233  0.04823
Bird  Pygoscelis adeliae  0.41226  0.00179  0.04235
Mammal  Ornithorhynchus anatinus  0.44105  0.00129  0.03589
Mammal  Monodelphis domestica  0.37808  0.00122  0.03487
Mammal  Dasypus novemcinctus  0.40452  0.00262  0.05117
Mammal  Loxodonta africana  0.40730  0.00141  0.03753
Mammal  Ovis aries  0.41761  0.00236  0.04859
Mammal  Homo sapiens  0.40999  0.00272  0.05211
Mammal  Mus musculus  0.41668  0.00159  0.03985
Mammal  Canis familiaris  0.41178  0.00349  0.05908
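The per-species mean, variance and SD in Table S17 summarize GC content over genome windows (10-kb windows according to Fig. S12). The sketch below shows one way such values could be computed. It assumes non-overlapping windows, ignores non-ACGT characters, and uses the population variance. None of these details are stated in the source, and the function name and toy sequence are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def window_gc(sequence, window=10_000):
    """GC fraction of each non-overlapping window of the given size."""
    values = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        acgt = sum(chunk.count(base) for base in "ACGT")
        if acgt == 0:
            continue  # window made up entirely of N or other symbols
        gc = chunk.count("G") + chunk.count("C")
        values.append(gc / acgt)
    return values


# Hypothetical toy sequence with a window of 4 bp instead of 10 kb.
toy_gc = window_gc("GGCC" + "ATAT" + "GCAT", window=4)
toy_mean = mean(toy_gc)
toy_variance = pvariance(toy_gc)
toy_sd = pstdev(toy_gc)
```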

Table S18 Wilcoxon rank-sum test of the standard deviation in guanine-cytosine (GC) content among six categories.

           Lizard  Snake  Turtle  Alligator  Bird
Snake      0.048
Turtle     0.100   0.024
Alligator  0.057   0.476  0.057
Bird       0.012   0.001  0.012   0.004
Mammal     0.012   0.008  0.776   0.048      0.574
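The pairwise comparisons in Table S18 can be illustrated with an exact two-sided rank-sum test on the SD values from Table S17. The sketch below enumerates every possible assignment of ranks, which is feasible only for small groups, and assumes there are no tied values. It is an illustration, not the software used by the authors, which the source does not name.

```python
from itertools import combinations


def exact_rank_sum_p(x, y):
    """Two-sided exact Wilcoxon rank-sum p-value (assumes no ties)."""
    pooled = sorted(x + y)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    observed = sum(rank[value] for value in x)
    # Rank sums of every possible way to pick len(x) ranks from the pool.
    all_sums = [sum(c) for c in combinations(range(1, len(pooled) + 1), len(x))]
    lower = sum(s <= observed for s in all_sums) / len(all_sums)
    upper = sum(s >= observed for s in all_sums) / len(all_sums)
    return min(1.0, 2 * min(lower, upper))


# SD of GC content per species, copied from Table S17.
lizard = [0.02648, 0.01649, 0.02882]
snake = [0.02793, 0.03117, 0.03259, 0.03568, 0.03311, 0.03770]
turtle = [0.04177, 0.03863, 0.03851]

p_lizard_snake = exact_rank_sum_p(lizard, snake)
p_lizard_turtle = exact_rank_sum_p(lizard, turtle)
```

For these two pairs the exact p-values round to 0.048 and 0.100, in line with the corresponding entries of Table S18.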


Table S19 Abbreviations and Latin names of the species used for the identification of gene families and downstream analyses of genome evolution.

Abbreviation  Latin name  Category
Lcha  Latimeria chalumnae  Fish
Npar  Nanorana parkeri  Frog
Xtro  Xenopus tropicalis  Frog
Oana  Ornithorhynchus anatinus  Mammal
Mdom  Monodelphis domestica  Mammal
Cfam  Canis lupus familiaris  Mammal
Mmus  Mus musculus  Mammal
Hsap  Homo sapiens  Mammal
Ogra  Ophisaurus gracilis  Lizard
Pvit  Pogona vitticeps  Lizard
Acar  Anolis carolinensis  Lizard
Pbiv  Python bivittatus  Snake
Ohan  Ophiophagus hannah  Snake
Tsir  Thamnophis sirtalis  Snake
Cbel  Chrysemys picta bellii  Turtle
Psin  Pelodiscus sinensis  Turtle
Cmyd  Chelonia mydas  Turtle
Amis  Alligator mississippiensis  Crocodile
Asin  Alligator sinensis  Crocodile
Ggal  Gallus gallus  Bird
Tgut  Taeniopygia guttata  Bird
Nnip  Nipponia nippon  Bird
Apla  Anas platyrhynchos  Bird


Table S20 Statistics of the gene families identified by OrthoMCL (36). The “(gene)” subtext indicates that the column gives a number of genes, and the “(family)” subtext indicates that the column gives a number of gene families. “Unclustered” indicates genes that could not be clustered into any gene family. “Unique” indicates a gene family present in only one species. “Common” indicates a gene family in which all the species are present, and “Single copy” indicates a common gene family containing only one gene per species.

Species  Genes  Unclustered (gene)  Clustered (gene)  Clustered (family)  Unique (family)  Unique (gene)  Common (family)  Common (gene)  Single copy
Acar  18264  827  17437  12865  40  137  3937  6545  1362
Amis  18008  554  17454  13943  3  6  3937  6535  1362
Apla  15011  665  14346  11825  7  18  3937  6014  1362
Asin  18371  628  17743  13911  26  55  3937  6556  1362
Cbel  20797  1217  19580  14656  85  202  3937  6659  1362
Cfam  19520  1511  18009  14167  40  114  3937  6514  1362
Cmyd  17615  632  16983  13632  9  19  3937  6348  1362
Ggal  15355  770  14585  11748  24  88  3937  5994  1362
Tbai  20943  2612  18331  14300  92  354  3937  6629  1362
Hsap  18945  510  18435  14425  78  310  3937  6441  1362
Lcha  19436  1169  18267  12989  162  785  3937  6584  1362
Mdom  20806  1534  19272  13401  108  499  3937  6488  1362
Mmus  21724  1040  20684  14583  197  1423  3937  6667  1362
Nnip  15571  569  15002  11631  23  790  3937  5855  1362
Npar  21799  2819  18980  13246  295  1200  3937  6220  1362
Oana  21326  4608  16718  12140  108  383  3937  6378  1362
Ogra  19488  3582  15906  12294  179  741  3937  6115  1362
Ohan  18240  4286  13954  11494  32  106  3937  5855  1362
Pbiv  18962  912  18050  13964  23  58  3937  6923  1362
Psin  19040  1207  17833  13539  116  461  3937  6271  1362
Pvit  18545  594  17951  13564  64  224  3937  6524  1362
Tgut  17002  1695  15307  11588  56  529  3937  6155  1362
Tsir  18242  1669  16573  13006  13  41  3937  6542  1362
Xtro  18324  630  17694  12279  123  712  3937  6760  1362
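The family categories defined in the Table S20 caption can be expressed as a short classification routine. This is a sketch under assumptions: the input structure (family ID mapped to per-species gene counts), the function name and the toy families are hypothetical, and “Single copy” is interpreted as a family with exactly one gene in every species. It does not reproduce OrthoMCL itself.

```python
def classify_families(families, all_species):
    """Sort gene families into unique, common and single-copy sets.

    families: dict mapping family ID -> dict of species -> gene count.
    """
    summary = {"unique": [], "common": [], "single_copy": []}
    for family_id, counts in families.items():
        present = {sp for sp, n in counts.items() if n > 0}
        if len(present) == 1:
            summary["unique"].append(family_id)      # one species only
        if present == set(all_species):
            summary["common"].append(family_id)      # every species present
            if all(counts[sp] == 1 for sp in all_species):
                summary["single_copy"].append(family_id)
    return summary


# Hypothetical toy input with three species.
species = ["Tbai", "Acar", "Hsap"]
toy_families = {
    "fam1": {"Tbai": 3},
    "fam2": {"Tbai": 1, "Acar": 1, "Hsap": 1},
    "fam3": {"Tbai": 2, "Acar": 1, "Hsap": 1},
    "fam4": {"Acar": 1, "Hsap": 2},
}
result = classify_families(toy_families, species)
```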


Table S21 Significantly over-represented Gene Ontology (GO) terms among Tibetan hot spring snake-specific genes. “x” is the number of Tibetan hot spring snake-specific genes assigned to that GO term. GO terms with a corrected p-value below 0.05 were selected as significantly enriched.

GO-ID  p-value  corrected p-value  x  Description
7608  9.60E-28  6.25E-24  59  sensory perception of smell
7606  8.79E-26  2.86E-22  62  sensory perception of chemical stimulus
4871  1.60E-14  3.48E-11  158  signal transducer activity
50907  2.50E-14  4.00E-11  34  detection of chemical stimulus involved in sensory perception
50911  3.69E-14  4.00E-11  32  detection of chemical stimulus involved in sensory perception of smell
4984  3.69E-14  4.00E-11  32  olfactory receptor activity
7600  1.46E-13  1.35E-10  77  sensory perception
9593  8.59E-13  6.98E-10  36  detection of chemical stimulus
50906  7.82E-12  5.48E-09  39  detection of stimulus involved in sensory perception
60089  8.43E-12  5.48E-09  162  molecular transducer activity
50877  4.32E-11  2.55E-08  100  neurological system process
4888  9.19E-11  4.98E-08  108  transmembrane signaling receptor activity
38023  5.44E-10  2.72E-07  115  signaling receptor activity
51606  5.12E-09  2.38E-06  43  detection of stimulus
3008  5.84E-09  2.53E-06  121  system process
4872  1.41E-07  5.72E-05  119  receptor activity
7187  2.06E-07  7.90E-05  52  G-protein coupled receptor signaling pathway, coupled to cyclic nucleotide second messenger
1885  1.07E-06  3.86E-04  14  endothelial cell development
4497  1.48E-06  5.08E-04  25  monooxygenase activity
32993  4.15E-06  1.35E-03  23  protein-DNA complex
786  1.56E-05  4.60E-03  17  nucleosome
1990104  1.56E-05  4.60E-03  17  DNA bending complex
20037  1.89E-05  5.34E-03  27  heme binding
7186  2.18E-05  5.90E-03  79  G-protein coupled receptor signaling pathway
45446  2.36E-05  6.14E-03  15  endothelial cell differentiation
4930  2.56E-05  6.39E-03  61  G-protein coupled receptor activity
16712  2.89E-05  6.96E-03  11  oxidoreductase activity, acting on paired donors…
46906  3.43E-05  7.97E-03  27  tetrapyrrole binding
44391  3.92E-05  8.30E-03  25  ribosomal subunit
5198  3.94E-05  8.30E-03  61  structural molecule activity
44815  3.95E-05  8.30E-03  17  DNA packaging complex
3735  5.02E-05  1.02E-02  29  structural constituent of ribosome
5840  9.07E-05  1.79E-02  47  ribosome
7165  1.11E-04  2.13E-02  294  signal transduction
3158  1.34E-04  2.48E-02  15  endothelium development
22626  1.80E-04  3.19E-02  20  cytosolic ribosome
35666  1.81E-04  3.19E-02  6  TRIF-dependent toll-like receptor signaling pathway
23052  2.66E-04  4.44E-02  310  signaling
44700  2.66E-04  4.44E-02  310  single organism signaling
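The source does not name the method used to obtain the corrected p-values in Table S21. The leading rows appear consistent with a Benjamini-Hochberg style adjustment (corrected p ≈ p × m / rank, with ties carried down from larger ranks), so the sketch below illustrates that procedure as an assumption. The function name and input values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
```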


Table S22 Calibration points for estimating divergence times. Myr, million years.

Clade  Clade  Min (100 Myr)  Max (100 Myr)  Reference
Homo sapiens  Latimeria chalumnae  -  4.160  (53)
Homo sapiens  Xenopus tropicalis  3.304  3.501  (53)
Homo sapiens  Monodelphis domestica  1.430  1.780  (54)
Homo sapiens  Mus musculus  0.617  1.005  (53)
Homo sapiens  Gallus gallus  3.123  3.304  (53)
Gallus gallus  Taeniopygia guttata  0.660  0.865
Gallus gallus  Alligator mississippiensis  2.350  2.504
Gallus gallus  Anolis carolinensis  2.597  2.998
Anolis carolinensis  Ophiophagus hannah  1.450  1.940
Ophiophagus hannah  Thamnophis sirtalis  0.333  -

Table S23 Expanded and contracted gene families on each branch and node. Ex, expanded; Co, contracted; Un, unchanged. The first three value columns carry no group label in the source; the next three are labeled family-wide and the last two Viterbi.

Species      Ex  Un  Co  family-wide Ex  family-wide Co  family-wide Un  Viterbi Ex  Viterbi Co
Acar         671  21,490  2,001  37  22  76  16  0
Amis         323  22,907  932  8  39  88  3  8
Apla         315  22,700  1,147  10  14  111  2  2
Asin         490  22,741  931  35  8  92  21  2
Cbel         761  22,790  611  50  7  78  29  2
Cfam         535  22,089  1,538  15  38  82  6  2
Cmyd         200  22,139  1,823  8  50  77  3  23
Ggal         295  22,636  1,231  16  13  106  10  0
Hsap         381  22,884  897  11  36  88  6  4
Tbai         794  21,934  1,434  29  18  88  17  2
Lcha         1,062  11,647  11,453  23  107  5  9  1
Mdom         932  20,414  2,816  39  29  67  14  2
Mmus         814  22,672  676  38  13  84  22  1
Nnip         271  22,516  1,375  10  27  98  6  5
Npar         1,121  21,200  1,841  22  26  87  8  2
Oana         1,264  17,352  5,546  22  49  64  7  1
Ogra         690  19,442  4,030  15  61  59  6  5
Ohan         383  18,336  5,443  9  53  73  5  15
Pbiv         900  20,543  2,719  34  28  73  13  0
Psin         518  21,139  2,505  33  30  72  12  1
Pvit         642  22,299  1,221  38  20  77  11  0
Tgut         779  21,978  1,405  19  24  92  14  1
Tsir         758  20,446  2,958  24  35  76  15  7
Xtro         728  20,888  2,546  29  23  83  11  1
Birds        130  19,523  4,509  6  61  68  1  2
Crocodiles   448  20,487  3,227  33  36  66  4  0
Turtles      438  21,451  2,273  49  32  54  13  0
Snakes       194  22,466  1,502  14  36  85  2  2
Lizards      10  21,988  2,164  5  13  117  0  0
             204  20,999  2,959  20  35  80  2  0
Mammals      131  17,852  6,179  22  55  58  2  0
Amphibians   371  14,158  9,633  31  75  29  12  0


Table S24 Gene Ontology (GO) enrichment of significantly expanded gene families in the Tibetan hot spring snake.

GO  Class  # in genome  # in significant  P  Corrected P  Term
GO:0004021  MF  16  13  2.00E-25  2.74E-22  L-alanine:2-oxoglutarate aminotransferase activity
GO:0004104  MF  13  10  2.50E-19  1.71E-16  cholinesterase activity
GO:0030170  MF  67  13  6.40E-15  2.92E-12  pyridoxal phosphate binding
GO:0005509  MF  611  24  1.00E-10  3.42E-08  calcium ion binding
GO:0016712  MF  35  8  3.10E-10  7.07E-08  oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced flavin or flavoprotein as one donor, and incorporation of one atom of oxygen
GO:0020037  MF  152  13  3.10E-10  7.07E-08  heme binding
GO:0005506  MF  186  13  3.80E-09  7.43E-07  iron ion binding
GO:0016881  MF  100  9  1.20E-07  2.05E-05  acid-amino acid ligase activity
GO:0005488  MF  9698  85  8.80E-06  0.001338578  binding
GO:0004497  MF  118  13  0.0003  0.04107  monooxygenase activity
GO:0046982  MF  327  10  0.00035  0.043559091  protein heterodimerization activity
GO:0007156  BP  120  24  3.00E-28  1.95E-24  homophilic cell adhesion
GO:0042851  BP  18  13  8.30E-25  2.69E-21  L-alanine metabolic process
GO:0006103  BP  24  13  2.30E-22  4.97E-19  2-oxoglutarate metabolic process
GO:0015976  BP  37  13  3.10E-19  5.03E-16  carbon utilization
GO:0006531  BP  54  13  8.70E-17  1.13E-13  aspartate metabolic process
GO:0006334  BP  69  6  1.20E-05  0.012968  nucleosome assembly

Table S25 Significantly contracted gene families in the Tibetan hot spring snake.

Cluster  KEGG ortholog  Description
Cluster617  tgu:100229757  similar to olfactory receptor, family 5, subfamily U, member 1; K04257 olfactory receptor
Cluster46  ptr:470805  ZNF167; zinc finger protein 167; K09229 KRAB and SCAN domains-containing zinc finger protein

Table S26 Positively selected genes in the Tibetan hot spring snake. HSS, hot spring snake.

Cluster ID  Tibetan HSS gene ID  P value  Adjusted P  #Sites  NCBI Accession
Cluster2353  Tba000159.1  0.00E+00  0.00E+00  9  gi|565310502|gb|ETE63686.1|
Cluster3311  Tba000235.1  0.00E+00  0.00E+00  11  gi|565308180|gb|ETE62158.1|
Cluster5771  Tba000301.1  0.00E+00  0.00E+00  33  gi|565311480|gb|ETE64339.1|
Cluster6708  Tba000462.1  0.00E+00  0.00E+00  41  gi|602675876|ref|XP_007443413.1|
Cluster3956  Tba000553.1  1.47E-13  6.03E-10  6  gi|565307737|gb|ETE61894.1|
Cluster9001  Tba000608.1  1.02E-06  4.17E-03  2  gi|602677687|ref|XP_007444289.1|
Cluster3305  Tba000638.1  2.81E-13  1.15E-09  7  gi|637335944|ref|XP_008115549.1|
Cluster4958  Tba000716.1  2.67E-09  1.09E-05  4  gi|565299294|gb|ETE57725.1|
Cluster4543  Tba000777.1  0.00E+00  0.00E+00  12  gi|602629800|ref|XP_007421823.1|
Cluster7374  Tba000782.1  0.00E+00  0.00E+00  19  gi|602649318|ref|XP_007431067.1|
Cluster7324  Tba000796.1  2.66E-10  1.09E-06  5  gi|602632687|ref|XP_007423232.1|
Cluster4545  Tba000824.1  4.48E-13  1.83E-09  15  gi|602656952|ref|XP_007434202.1|
Cluster4943  Tba000843.1  0.00E+00  0.00E+00  20  gi|602662878|ref|XP_007437078.1|


Cluster4882  Tba000881.1  0.00E+00  0.00E+00  18  gi|565312584|gb|ETE65128.1|
Cluster9302  Tba000908.1  0.00E+00  0.00E+00  18  gi|602648497|ref|XP_007430665.1|
Cluster2112  Tba000919.1  6.66E-16  2.73E-12  10  gi|565323038|gb|ETE73662.1|
Cluster5850  Tba001182.1  2.14E-13  8.77E-10  26  gi|565315313|gb|ETE67227.1|
Cluster5657  Tba001468.1  1.89E-12  7.74E-09  6  gi|602659252|ref|XP_007435321.1|
Cluster3184  Tba001709.1  8.12E-11  3.32E-07  23  gi|602665887|ref|XP_007438525.1|
Cluster2298  Tba001742.1  6.93E-11  2.84E-07  3  gi|602658256|ref|XP_007434834.1|
Cluster6856  Tba001774.1  2.16E-10  8.86E-07  4  gi|565319652|gb|ETE70708.1|
Cluster2321  Tba001831.1  6.25E-12  2.56E-08  4  gi|565320236|gb|ETE71208.1|
Cluster4941  Tba001878.1  6.11E-16  2.50E-12  9  gi|602663032|ref|XP_007437153.1|
Cluster4736  Tba002021.1  0.00E+00  0.00E+00  35  gi|602633856|ref|XP_007423805.1|
Cluster5169  Tba002229.1  0.00E+00  0.00E+00  14  gi|327272259|ref|XP_003220903.1|
Cluster6386  Tba002404.1  0.00E+00  0.00E+00  13  gi|565320219|gb|ETE71196.1|
Cluster6393  Tba002407.1  0.00E+00  0.00E+00  7  gi|565306797|gb|ETE61346.1|
Cluster4766  Tba002432.1  3.00E-15  1.23E-11  7  gi|602656831|ref|XP_007434144.1|
Cluster8386  Tba002463.1  0.00E+00  0.00E+00  16  gi|602664122|ref|XP_007437684.1|
Cluster4919  Tba002479.1  0.00E+00  0.00E+00  7  gi|602651043|ref|XP_007431908.1|
Cluster8877  Tba002538.1  0.00E+00  0.00E+00  35  gi|565301361|gb|ETE58529.1|
Cluster7115  Tba002774.1  3.05E-10  1.25E-06  19  gi|602673429|ref|XP_007442212.1|
Cluster3813  Tba002831.1  2.55E-15  1.05E-11  9  gi|565300139|gb|ETE58039.1|
Cluster7386  Tba002907.1  2.67E-11  1.09E-07  3  gi|602669088|ref|XP_007440083.1|
Cluster5495  Tba002908.1  2.85E-13  1.17E-09  4  gi|602673347|ref|XP_007442171.1|
Cluster7040  Tba003195.1  5.77E-12  2.36E-08  9  gi|602654382|ref|XP_007433030.1|
Cluster4257  Tba003316.1  5.82E-08  2.38E-04  1  gi|602646270|ref|XP_007429568.1|
Cluster6220  Tba003319.1  4.32E-09  1.77E-05  3  gi|602646262|ref|XP_007429564.1|
Cluster5607  Tba003336.1  4.36E-09  1.78E-05  2  gi|602652003|ref|XP_007432381.1|
Cluster5526  Tba003584.1  7.48E-12  3.06E-08  1  gi|565323264|gb|ETE73853.1|
Cluster5524  Tba003586.1  5.27E-12  2.16E-08  4  gi|542168804|ref|XP_005492964.1|
Cluster5412  Tba003611.1  0.00E+00  0.00E+00  8  gi|637327638|ref|XP_003224145.2|
Cluster10344  Tba003656.1  1.25E-09  5.10E-06  3  gi|727016671|ref|XP_010397584.1|
Cluster4331  Tba003713.1  2.94E-07  1.20E-03  2  gi|602640588|ref|XP_007427102.1|
Cluster5017  Tba003736.1  0.00E+00  0.00E+00  20  gi|565312553|gb|ETE65101.1|
Cluster5746  Tba003785.1  8.58E-09  3.51E-05  2  gi|565312333|gb|ETE64941.1|
Cluster6594  Tba003934.1  1.33E-10  5.43E-07  1  gi|514781372|ref|XP_005027835.1|
Cluster6577  Tba003936.1  5.14E-08  2.10E-04  3  gi|602639154|ref|XP_007426397.1|
Cluster8641  Tba003992.1  0.00E+00  0.00E+00  14  gi|602672820|ref|XP_007441911.1|
Cluster5714  Tba004320.1  2.61E-11  1.07E-07  8  gi|602669691|ref|XP_007440380.1|
Cluster4297  Tba004402.1  0.00E+00  0.00E+00  5  gi|602650750|ref|XP_007431768.1|
Cluster3905  Tba004566.1  2.22E-16  9.09E-13  5  gi|565316134|gb|ETE67859.1|
Cluster7250  Tba004568.1  0.00E+00  0.00E+00  10  gi|565316129|gb|ETE67854.1|
Cluster7356  Tba004594.1  0.00E+00  0.00E+00  28  gi|565322176|gb|ETE72900.1|
Cluster4284  Tba004930.1  1.76E-09  7.22E-06  5  gi|568968875|ref|XP_006514339.1|
Cluster5181  Tba004943.1  3.71E-07  1.52E-03  4  gi|602644988|ref|XP_007428938.1|
Cluster4232  Tba005081.1  0.00E+00  0.00E+00  28  gi|602649807|ref|XP_007431304.1|
Cluster12395  Tba005098.1  0.00E+00  0.00E+00  20  gi|602663777|ref|XP_007437514.1|
Cluster7192  Tba005319.1  1.18E-14  4.82E-11  6  gi|565311437|gb|ETE64306.1|

Cluster6243 Tba005420.1 2.22E-16 9.09E-13 15 gi|565314620|gb|ETE66688.1| Cluster7621 Tba005422.1 1.06E-09 4.35E-06 5 gi|602663217|ref|XP_007437243.1| Cluster7463 Tba005504.1 0.00E+00 0.00E+00 28 gi|565307649|gb|ETE61831.1| Cluster6558 Tba005571.1 1.06E-13 4.36E-10 6 gi|637340338|ref|XP_008116354.1| Cluster5787 Tba005616.1 0.00E+00 0.00E+00 11 gi|565313549|gb|ETE65858.1| Cluster4738 Tba005644.1 1.82E-06 7.46E-03 1 gi|602646036|ref|XP_007429454.1| Cluster3659 Tba005667.1 2.77E-14 1.13E-10 6 gi|602647025|ref|XP_007429942.1| Cluster12144 Tba005743.1 8.77E-15 3.59E-11 5 gi|602646716|ref|XP_007429789.1| Cluster5439 Tba005862.1 1.42E-09 5.82E-06 15 gi|565308113|gb|ETE62119.1| Cluster2682 Tba006022.1 4.28E-13 1.75E-09 7 gi|637282392|ref|XP_008105505.1| Cluster5185 Tba006042.1 5.68E-13 2.32E-09 6 gi|565310648|gb|ETE63780.1| Cluster8638 Tba006149.1 3.31E-13 1.35E-09 22 gi|602677685|ref|XP_007444288.1| Cluster8872 Tba006155.1 0.00E+00 0.00E+00 15 gi|602655253|ref|XP_007433376.1| Cluster10245 Tba006165.1 1.79E-09 7.32E-06 5 gi|602633420|ref|XP_007423589.1| Cluster3185 Tba006208.1 5.73E-09 2.34E-05 2 gi|565309526|gb|ETE63031.1| Cluster6828 Tba006231.1 2.34E-11 9.56E-08 6 gi|536986775|gb|AGU42404.1| Cluster4932 Tba006244.1 0.00E+00 0.00E+00 9 gi|602654726|ref|XP_007433173.1| Cluster9566 Tba006276.1 5.55E-10 2.27E-06 4 gi|565303864|gb|ETE59720.1| Cluster7367 Tba006621.1 1.52E-14 6.20E-11 9 gi|602634488|ref|XP_007424116.1| Cluster6346 Tba006692.1 3.38E-08 1.38E-04 2 gi|565312200|gb|ETE64845.1| Cluster2771 Tba006789.1 0.00E+00 0.00E+00 18 gi|565315640|gb|ETE67481.1| Cluster5768 Tba006933.1 0.00E+00 0.00E+00 9 gi|602665921|ref|XP_007438541.1| Cluster4522 Tba006973.1 5.55E-17 2.27E-13 6 gi|602658340|ref|XP_007434874.1| Cluster4460 Tba006989.1 5.83E-08 2.38E-04 9 gi|426345597|ref|XP_004040492.1| Cluster4203 Tba007060.1 4.65E-11 1.90E-07 17 gi|565320863|gb|ETE71743.1| Cluster10211 Tba007172.1 0.00E+00 0.00E+00 13 gi|602659325|ref|XP_007435356.1| Cluster8123 Tba007173.1 1.05E-09 
4.29E-06 4 gi|565315037|gb|ETE67012.1| Cluster6693 Tba007196.1 6.07E-08 2.49E-04 10 gi|565317923|gb|ETE69283.1| Cluster6596 Tba007229.1 7.42E-07 3.04E-03 1 gi|602627057|ref|XP_007420478.1| Cluster8276 Tba007247.1 0.00E+00 0.00E+00 13 gi|602646468|ref|XP_007429665.1| Cluster8905 Tba007368.1 0.00E+00 0.00E+00 13 gi|602627815|ref|XP_007420851.1| Cluster5082 Tba007418.1 7.72E-08 3.16E-04 2 gi|565313099|gb|ETE65521.1| Cluster7584 Tba007593.1 1.41E-08 5.79E-05 3 gi|565315723|gb|ETE67545.1| Cluster2745 Tba007617.1 5.32E-07 2.18E-03 1 gi|565315179|gb|ETE67126.1| Cluster5760 Tba007618.1 0.00E+00 0.00E+00 20 gi|602669721|ref|XP_007440395.1| Cluster6766 Tba007637.1 5.77E-14 2.36E-10 10 gi|602647164|ref|XP_007430010.1| Cluster4903 Tba007736.1 3.69E-12 1.51E-08 17 gi|565300934|gb|ETE58359.1| Cluster12431 Tba007757.1 0.00E+00 0.00E+00 14 gi|602661663|ref|XP_007436480.1| Cluster4712 Tba008097.1 0.00E+00 0.00E+00 18 gi|565315255|gb|ETE67185.1| Cluster2863 Tba008099.1 0.00E+00 0.00E+00 16 gi|602637888|ref|XP_007425782.1| Cluster9731 Tba008295.1 0.00E+00 0.00E+00 7 gi|602656401|ref|XP_007433935.1| Cluster6535 Tba008318.1 0.00E+00 0.00E+00 27 gi|602660610|ref|XP_007435968.1| Cluster6252 Tba008475.1 1.56E-10 6.38E-07 3 gi|602628399|ref|XP_007421136.1| Cluster5187 Tba008508.1 0.00E+00 0.00E+00 14 gi|565320331|gb|ETE71288.1| Cluster4652 Tba008570.1 1.11E-16 4.54E-13 17 gi|602630200|ref|XP_007422018.1| Cluster4661 Tba008573.1 4.57E-08 1.87E-04 3 gi|602630206|ref|XP_007422021.1|

Cluster5422 Tba008683.1 7.33E-12 3.00E-08 5 gi|602631768|ref|XP_007422781.1| Cluster7009 Tba008799.1 9.86E-13 4.04E-09 5 gi|584004662|ref|XP_006797332.1| Cluster5620 Tba008842.1 5.55E-16 2.27E-12 56 gi|565310181|gb|ETE63467.1| Cluster8892 Tba008938.1 1.69E-14 6.91E-11 4 gi|602666717|ref|XP_007438929.1| Cluster5705 Tba008946.1 1.51E-07 6.17E-04 59 gi|565317243|gb|ETE68752.1| Cluster8912 Tba009111.1 1.11E-16 4.54E-13 16 gi|538259959|dbj|BAN82074.1| Cluster2932 Tba009166.1 4.66E-13 1.91E-09 28 gi|327274378|ref|XP_003221954.1| Cluster6662 Tba009289.1 9.44E-10 3.87E-06 3 gi|565320337|gb|ETE71293.1| Cluster4391 Tba009294.1 6.30E-08 2.58E-04 5 gi|565319887|gb|ETE70908.1| Cluster5682 Tba009305.1 1.56E-10 6.40E-07 6 gi|565312767|gb|ETE65266.1| Cluster4622 Tba009379.1 0.00E+00 0.00E+00 10 gi|602663801|ref|XP_007437526.1| Cluster4306 Tba009500.1 0.00E+00 0.00E+00 11 gi|565313965|gb|ETE66168.1| Cluster6706 Tba009509.1 3.02E-13 1.23E-09 6 gi|565322607|gb|ETE73283.1| Cluster4940 Tba009521.1 4.62E-08 1.89E-04 3 gi|698381520|ref|XP_009816517.1| Cluster5343 Tba009527.1 4.43E-10 1.81E-06 2 gi|565319120|gb|ETE70260.1| Cluster8913 Tba009581.1 5.55E-16 2.27E-12 4 gi|602660941|ref|XP_007436131.1| Cluster4428 Tba009723.1 3.13E-10 1.28E-06 3 gi|602637850|ref|XP_007425764.1| Cluster2831 Tba009726.1 3.55E-07 1.45E-03 1 gi|565322266|gb|ETE72978.1| Cluster3103 Tba009836.1 3.76E-07 1.54E-03 2 gi|602638612|ref|XP_007426133.1| Cluster9220 Tba009861.1 7.18E-07 2.94E-03 1 gi|565311282|gb|ETE64201.1| Cluster10150 Tba010051.1 1.62E-08 6.62E-05 7 gi|641758336|ref|XP_008165181.1| Cluster7303 Tba010198.1 0.00E+00 0.00E+00 22 gi|602655025|ref|XP_007433287.1| Cluster5166 Tba010217.1 0.00E+00 0.00E+00 6 gi|602630948|ref|XP_007422382.1| Cluster7119 Tba010224.1 0.00E+00 0.00E+00 7 gi|602665104|ref|XP_007438141.1| Cluster9357 Tba010248.1 6.35E-09 2.60E-05 6 gi|327282670|ref|XP_003226065.1| Cluster2917 Tba010343.1 1.90E-09 7.76E-06 4 gi|530626811|ref|XP_005302931.1| Cluster4678 Tba010433.1 9.88E-08 4.04E-04 2 
gi|602651185|ref|XP_007431978.1| Cluster8618 Tba010453.1 8.05E-12 3.30E-08 6 gi|565323068|gb|ETE73685.1| Cluster10632 Tba010510.1 0.00E+00 0.00E+00 11 gi|602641962|ref|XP_007427777.1| Cluster6605 Tba010513.1 1.89E-15 7.73E-12 9 gi|565314760|gb|ETE66796.1| Cluster4687 Tba010514.1 1.27E-07 5.18E-04 2 gi|527260822|ref|XP_005148659.1| Cluster6339 Tba010523.1 0.00E+00 0.00E+00 22 gi|565315330|gb|ETE67239.1| Cluster4261 Tba010535.1 0.00E+00 0.00E+00 9 gi|565312998|gb|ETE65445.1| Cluster9024 Tba010552.1 3.56E-10 1.46E-06 4 gi|565322405|gb|ETE73105.1| Cluster4972 Tba010669.1 0.00E+00 0.00E+00 25 gi|565303892|gb|ETE59734.1| Cluster11243 Tba010692.1 0.00E+00 0.00E+00 18 gi|664712988|ref|XP_008513455.1| Cluster11727 Tba010704.1 7.22E-16 2.95E-12 29 gi|565319561|gb|ETE70634.1| Cluster4618 Tba010736.1 4.70E-14 1.92E-10 19 gi|602639933|ref|XP_007426784.1| Cluster5012 Tba010787.1 1.72E-07 7.05E-04 12 gi|351699075|gb|EHB01994.1| Cluster6860 Tba010809.1 1.96E-09 8.00E-06 3 gi|327269410|ref|XP_003219487.1| Cluster4371 Tba010820.1 0.00E+00 0.00E+00 13 gi|602636177|ref|XP_007424946.1| Cluster3307 Tba010852.1 3.06E-10 1.25E-06 3 gi|602630455|ref|XP_007422143.1| Cluster3122 Tba010928.1 8.75E-11 3.58E-07 7 gi|602637299|ref|XP_007425498.1| Cluster4606 Tba010935.1 1.00E-10 4.10E-07 3 gi|565317652|gb|ETE69070.1| Cluster6336 Tba011018.1 2.37E-11 9.71E-08 7 gi|602658779|ref|XP_007435090.1| Cluster12392 Tba011044.1 2.18E-14 8.91E-11 9 gi|602657796|ref|XP_007434612.1|

Cluster3947 Tba011085.1 6.03E-11 2.47E-07 9 gi|602631470|ref|XP_007422636.1| Cluster7708 Tba011105.1 9.54E-13 3.91E-09 10 gi|565313397|gb|ETE65749.1| Cluster4148 Tba011129.1 9.96E-11 4.08E-07 4 gi|637246336|ref|XP_008105604.1| Cluster8101 Tba011234.1 0.00E+00 0.00E+00 20 gi|327277550|ref|XP_003223527.1| Cluster7424 Tba011464.1 7.63E-14 3.12E-10 6 gi|565317715|gb|ETE69117.1| Cluster5774 Tba011764.1 1.90E-12 7.78E-09 15 gi|602643628|ref|XP_007428272.1| Cluster5020 Tba011788.1 8.32E-14 3.40E-10 5 gi|565313187|gb|ETE65591.1| Cluster3282 Tba011976.1 2.63E-14 1.08E-10 4 gi|602625976|ref|XP_007419951.1| Cluster4929 Tba012071.1 0.00E+00 0.00E+00 13 gi|602634763|ref|XP_007424252.1| Cluster9872 Tba012142.1 3.01E-10 1.23E-06 2 gi|565310374|gb|ETE63599.1| Cluster5158 Tba012254.1 0.00E+00 0.00E+00 46 gi|565307664|gb|ETE61842.1| Cluster7654 Tba012300.1 1.14E-14 4.68E-11 5 gi|602642082|ref|XP_007427834.1| Cluster12060 Tba012337.1 0.00E+00 0.00E+00 37 gi|637266307|ref|XP_008102382.1| Cluster11144 Tba012496.1 1.11E-15 4.54E-12 6 gi|565321362|gb|ETE72183.1| Cluster5523 Tba012519.1 1.06E-07 4.32E-04 6 gi|602626643|ref|XP_007420275.1| Cluster7362 Tba012530.1 3.47E-08 1.42E-04 3 gi|602677033|ref|XP_007443970.1| Cluster5465 Tba012535.1 3.13E-13 1.28E-09 10 gi|602626123|ref|XP_007420023.1| Cluster8154 Tba012564.1 3.44E-07 1.41E-03 12 gi|620985207|ref|XP_001520077.3| Cluster6532 Tba012577.1 0.00E+00 0.00E+00 24 gi|565314170|gb|ETE66326.1| Cluster3006 Tba012604.1 0.00E+00 0.00E+00 11 gi|602635250|ref|XP_007424493.1| Cluster2469 Tba012634.1 0.00E+00 0.00E+00 15 gi|565306207|gb|ETE61002.1| Cluster8226 Tba012712.1 2.18E-06 8.91E-03 2 gi|565303325|gb|ETE59453.1| Cluster5480 Tba012744.1 0.00E+00 0.00E+00 26 gi|602627104|ref|XP_007420501.1| Cluster5462 Tba012768.1 1.42E-13 5.81E-10 4 gi|565311614|gb|ETE64435.1| Cluster5388 Tba012785.1 0.00E+00 0.00E+00 10 gi|602655009|ref|XP_007433281.1| Cluster8691 Tba012853.1 1.63E-13 6.68E-10 12 gi|602657331|ref|XP_007434386.1| Cluster13348 Tba012882.1 
0.00E+00 0.00E+00 12 gi|565313418|gb|ETE65767.1| Cluster8861 Tba012937.1 6.27E-12 2.57E-08 7 gi|565310535|gb|ETE63708.1| Cluster9919 Tba012957.1 7.34E-10 3.01E-06 4 gi|602669570|ref|XP_007440320.1| Cluster6925 Tba013019.1 0.00E+00 0.00E+00 28 gi|602639180|ref|XP_007426410.1| Cluster5110 Tba013162.1 1.29E-07 5.26E-04 3 gi|565314210|gb|ETE66357.1| Cluster5226 Tba013330.1 1.16E-09 4.74E-06 6 gi|602653978|ref|XP_007432837.1| Cluster5963 Tba013336.1 2.14E-07 8.77E-04 16 gi|602632213|ref|XP_007423000.1| Cluster4567 Tba013351.1 4.27E-12 1.75E-08 4 gi|602633157|ref|XP_007423461.1| Cluster9029 Tba013405.1 2.91E-13 1.19E-09 9 gi|602650200|ref|XP_007431497.1| Cluster5833 Tba013809.1 0.00E+00 0.00E+00 16 gi|565311609|gb|ETE64432.1| Cluster6742 Tba013858.1 3.82E-14 1.56E-10 5 gi|602638031|ref|XP_007425849.1| Cluster9853 Tba013864.1 0.00E+00 0.00E+00 21 gi|565308350|gb|ETE62266.1| Cluster3573 Tba013898.1 2.25E-14 9.22E-11 4 gi|565322666|gb|ETE73333.1| Cluster5751 Tba013937.1 6.11E-16 2.50E-12 6 gi|565310816|gb|ETE63894.1| Cluster3170 Tba014028.1 0.00E+00 0.00E+00 18 gi|637298221|ref|XP_008108359.1| Cluster3636 Tba014066.1 8.02E-13 3.28E-09 7 gi|602647269|ref|XP_007430062.1| Cluster5571 Tba014082.1 1.23E-11 5.02E-08 5 gi|565315945|gb|ETE67714.1| Cluster5200 Tba014086.1 0.00E+00 0.00E+00 38 gi|565321019|gb|ETE71880.1| Cluster4075 Tba014095.1 0.00E+00 0.00E+00 8 gi|565309647|gb|ETE63115.1| Cluster5482 Tba014148.1 7.05E-08 2.89E-04 3 gi|565310931|gb|ETE63966.1|

Cluster3139 Tba014365.1 0.00E+00 0.00E+00 15 gi|602629665|ref|XP_007421757.1| Cluster2702 Tba014421.1 0.00E+00 0.00E+00 59 gi|602645769|ref|XP_007429321.1| Cluster4342 Tba014471.1 0.00E+00 0.00E+00 5 gi|602645130|ref|XP_007429007.1| Cluster5675 Tba014570.1 0.00E+00 0.00E+00 8 gi|565305364|gb|ETE60529.1| Cluster5662 Tba014577.1 7.17E-10 2.93E-06 8 gi|723132655|ref|XP_010282806.1| Cluster2965 Tba014612.1 3.62E-14 1.48E-10 3 gi|565312361|gb|ETE64961.1| Cluster8177 Tba014666.1 5.72E-07 2.34E-03 6 gi|602644586|ref|XP_007428745.1| Cluster12520 Tba014688.1 1.99E-07 8.16E-04 2 gi|602628578|ref|XP_007421223.1| Cluster3676 Tba014777.1 5.55E-17 2.27E-13 22 gi|602645351|ref|XP_007429115.1| Cluster5163 Tba014792.1 2.95E-08 1.21E-04 4 gi|565322108|gb|ETE72842.1| Cluster4682 Tba014829.1 0.00E+00 0.00E+00 32 gi|602637271|ref|XP_007425484.1| Cluster8699 Tba014844.1 1.04E-13 4.26E-10 7 gi|602673421|ref|XP_007442208.1| Cluster9656 Tba014950.1 0.00E+00 0.00E+00 16 gi|602648019|ref|XP_007430429.1| Cluster11085 Tba014956.1 6.55E-07 2.68E-03 3 gi|602648017|ref|XP_007430428.1| Cluster7124 Tba015176.1 5.90E-10 2.42E-06 4 gi|565320066|gb|ETE71068.1| Cluster4157 Tba015344.1 0.00E+00 0.00E+00 25 gi|602666910|ref|XP_007439024.1| Cluster5309 Tba015373.1 0.00E+00 0.00E+00 20 gi|538260145|dbj|BAN82167.1| Cluster5532 Tba015418.1 0.00E+00 0.00E+00 9 gi|565320186|gb|ETE71169.1| Cluster5870 Tba015445.1 6.21E-13 2.54E-09 4 gi|602646530|ref|XP_007429696.1| Cluster5352 Tba015589.1 0.00E+00 0.00E+00 28 gi|565315195|gb|ETE67138.1| Cluster4724 Tba015670.1 0.00E+00 0.00E+00 15 gi|602654099|ref|XP_007432897.1| Cluster5542 Tba015697.1 1.56E-08 6.39E-05 5 gi|565317598|gb|ETE69024.1| Cluster7442 Tba015719.1 0.00E+00 0.00E+00 17 gi|602673433|ref|XP_007442214.1| Cluster6459 Tba015923.1 2.15E-08 8.79E-05 1 gi|565310837|gb|ETE63909.1| Cluster8590 Tba016031.1 1.39E-10 5.70E-07 17 gi|327277384|ref|XP_003223445.1| Cluster2442 Tba016101.1 1.74E-09 7.14E-06 3 gi|602660033|ref|XP_007435701.1| Cluster2653 Tba016151.1 
0.00E+00 0.00E+00 32 gi|465981703|gb|EMP35920.1| Cluster4585 Tba016181.1 1.96E-08 8.02E-05 6 gi|565316101|gb|ETE67836.1| Cluster6154 Tba016361.1 0.00E+00 0.00E+00 41 gi|602628484|ref|XP_007421177.1| Cluster6819 Tba016550.1 8.99E-15 3.68E-11 12 gi|530603929|ref|XP_005293709.1| Cluster5053 Tba016690.1 4.81E-11 1.97E-07 4 gi|565311197|gb|ETE64137.1| Cluster3291 Tba016712.1 1.33E-07 5.46E-04 5 gi|118100253|ref|XP_415805.2| Cluster5298 Tba016759.1 1.59E-09 6.50E-06 4 gi|565310665|gb|ETE63791.1| Cluster4817 Tba016774.1 0.00E+00 0.00E+00 19 gi|704334283|ref|XP_010173141.1| Cluster8623 Tba016828.1 0.00E+00 0.00E+00 26 gi|602666217|ref|XP_007438686.1| Cluster5266 Tba016830.1 1.81E-12 7.40E-09 7 gi|565305331|gb|ETE60506.1| Cluster3929 Tba016960.1 6.06E-08 2.48E-04 7 gi|565308251|gb|ETE62203.1| Cluster2969 Tba016964.1 9.08E-10 3.72E-06 5 gi|602628999|ref|XP_007421430.1| Cluster2987 Tba017020.1 2.43E-13 9.93E-10 4 gi|602655658|ref|XP_007433573.1| Cluster8825 Tba017029.1 2.16E-07 8.83E-04 2 gi|602658070|ref|XP_007434744.1| Cluster7492 Tba017041.1 1.51E-13 6.18E-10 12 gi|565317082|gb|ETE68617.1| Cluster7262 Tba017091.1 0.00E+00 0.00E+00 41 gi|565308231|gb|ETE62192.1| Cluster4880 Tba017187.1 0.00E+00 0.00E+00 22 gi|602634446|ref|XP_007424095.1| Cluster5083 Tba017190.1 9.50E-07 3.89E-03 2 gi|565321449|gb|ETE72257.1| Cluster7784 Tba017292.1 6.61E-12 2.70E-08 14 gi|565322182|gb|ETE72905.1| Cluster5246 Tba017402.1 1.11E-16 4.54E-13 10 gi|637281554|ref|XP_008105337.1|

Cluster4081 Tba017428.1 7.74E-10 3.17E-06 7 gi|565313090|gb|ETE65516.1| Cluster6456 Tba017469.1 5.55E-17 2.27E-13 10 gi|602661164|ref|XP_007436240.1| Cluster8005 Tba017513.1 0.00E+00 0.00E+00 27 gi|565311808|gb|ETE64562.1| Cluster3143 Tba017592.1 9.83E-12 4.02E-08 3 gi|602664875|ref|XP_007438052.1| Cluster12704 Tba017643.1 8.93E-11 3.65E-07 7 gi|565318656|gb|ETE69884.1| Cluster8866 Tba017774.1 4.85E-10 1.99E-06 7 gi|565309932|gb|ETE63307.1| Cluster6354 Tba017883.1 1.11E-16 4.54E-13 15 gi|565312110|gb|ETE64781.1| Cluster4321 Tba017915.1 2.75E-10 1.12E-06 3 gi|602664579|ref|XP_007437906.1| Cluster6311 Tba017944.1 5.55E-17 2.27E-13 12 gi|565311782|gb|ETE64546.1| Cluster9828 Tba018056.1 0.00E+00 0.00E+00 18 gi|602630383|ref|XP_007422107.1| Cluster7585 Tba018059.1 5.58E-08 2.28E-04 5 gi|602641798|ref|XP_007427696.1| Cluster7571 Tba018066.1 4.09E-11 1.67E-07 4 gi|602656439|ref|XP_007433954.1| Cluster5735 Tba018120.1 0.00E+00 0.00E+00 10 gi|602634807|ref|XP_007424274.1| Cluster4258 Tba018143.1 1.96E-07 8.01E-04 2 gi|641758760|ref|XP_008165331.1| Cluster5331 Tba018194.1 3.14E-09 1.28E-05 5 gi|602637339|ref|XP_007425517.1| Cluster3366 Tba018316.1 0.00E+00 0.00E+00 9 gi|565312779|gb|ETE65275.1| Cluster6108 Tba018378.1 8.69E-09 3.56E-05 5 gi|602648100|ref|XP_007430469.1| Cluster3673 Tba018417.1 4.22E-15 1.73E-11 8 gi|565318176|gb|ETE69489.1| Cluster9375 Tba018507.1 1.41E-09 5.78E-06 4 gi|565321076|gb|ETE71931.1| Cluster5182 Tba018534.1 6.03E-10 2.47E-06 5 gi|602656621|ref|XP_007434043.1| Cluster5666 Tba018555.1 0.00E+00 0.00E+00 11 gi|602628659|ref|XP_007421263.1| Cluster2331 Tba018579.1 2.89E-12 1.18E-08 5 gi|602672645|ref|XP_007441824.1| Cluster6293 Tba018606.1 0.00E+00 0.00E+00 10 gi|565321615|gb|ETE72404.1| Cluster7626 Tba018633.1 9.61E-09 3.93E-05 2 gi|565317616|gb|ETE69041.1| Cluster4254 Tba018639.1 0.00E+00 0.00E+00 8 gi|565317614|gb|ETE69039.1| Cluster3712 Tba018685.1 9.33E-09 3.82E-05 4 gi|602633107|ref|XP_007423436.1| Cluster4238 Tba018751.1 0.00E+00 0.00E+00 11 
gi|565322379|gb|ETE73082.1| Cluster5220 Tba018791.1 0.00E+00 0.00E+00 29 gi|602667504|ref|XP_007439313.1| Cluster4553 Tba018909.1 1.22E-15 5.00E-12 8 gi|637315886|ref|XP_008111752.1| Cluster2992 Tba019051.1 6.39E-13 2.62E-09 4 gi|602654318|ref|XP_007433000.1| Cluster8761 Tba019065.1 0.00E+00 0.00E+00 46 gi|565317764|gb|ETE69161.1| Cluster6436 Tba019091.1 0.00E+00 0.00E+00 25 gi|565316525|gb|ETE68167.1| Cluster5630 Tba019186.1 1.49E-08 6.11E-05 2 gi|602641258|ref|XP_007427434.1| Cluster3762 Tba019216.1 3.30E-11 1.35E-07 9 gi|565322865|gb|ETE73508.1| Cluster7392 Tba019219.1 7.25E-11 2.97E-07 5 gi|602645486|ref|XP_007429182.1| Cluster5516 Tba019220.1 0.00E+00 0.00E+00 18 gi|565322870|gb|ETE73513.1| Cluster8294 Tba019228.1 0.00E+00 0.00E+00 25 gi|602656745|ref|XP_007434104.1| Cluster5373 Tba019389.1 3.48E-07 1.42E-03 3 gi|565310312|gb|ETE63560.1| Cluster6419 Tba019391.1 8.20E-09 3.36E-05 3 gi|602647416|ref|XP_007430134.1| Cluster4548 Tba019476.1 1.89E-06 7.73E-03 1 gi|675407742|ref|XP_008932310.1| Cluster4566 Tba019481.1 0.00E+00 0.00E+00 20 gi|565298939|gb|ETE57599.1| Cluster6771 Tba019644.1 7.77E-16 3.18E-12 11 gi|565318906|gb|ETE70086.1| Cluster7265 Tba019652.1 2.96E-14 1.21E-10 11 gi|695138210|ref|XP_009508534.1| Cluster5038 Tba019671.1 3.22E-08 1.32E-04 1 gi|565323161|gb|ETE73764.1| Cluster5031 Tba019673.1 0.00E+00 0.00E+00 21 gi|602657638|ref|XP_007434535.1| Cluster5026 Tba019680.1 0.00E+00 0.00E+00 22 gi|602664271|ref|XP_007437753.1|

Cluster6927 Tba019745.1 3.53E-09 1.45E-05 3 gi|565317805|gb|ETE69191.1| Cluster4875 Tba019783.1 0.00E+00 0.00E+00 24 gi|565307596|gb|ETE61800.1| Cluster9542 Tba019848.1 1.31E-11 5.36E-08 4 gi|602655858|ref|XP_007433670.1| Cluster4457 Tba019890.1 0.00E+00 0.00E+00 37 gi|602632086|ref|XP_007422937.1| Cluster3321 Tba019950.1 1.78E-08 7.30E-05 3 gi|602629239|ref|XP_007421547.1| Cluster4808 Tba020001.1 0.00E+00 0.00E+00 18 gi|139948756|ref|NP_076115.3| Cluster6275 Tba020056.1 0.00E+00 0.00E+00 13 gi|564268544|ref|XP_006273061.1| Cluster4765 Tba020131.1 0.00E+00 0.00E+00 56 gi|677411865|gb|KFQ19303.1| Cluster4338 Tba020144.1 6.57E-09 2.69E-05 5 gi|602634749|ref|XP_007424245.1| Cluster6296 Tba020210.1 2.01E-07 8.24E-04 3 gi|565311638|gb|ETE64451.1| Cluster8806 Tba020224.1 7.77E-16 3.18E-12 5 gi|602634700|ref|XP_007424221.1| Cluster7457 Tba020229.1 0.00E+00 0.00E+00 14 gi|699670237|ref|XP_009884499.1| Cluster5057 Tba020245.1 0.00E+00 0.00E+00 9 gi|565307642|gb|ETE61828.1| Cluster9232 Tba020391.1 6.76E-07 2.77E-03 7 gi|602655313|ref|XP_007433406.1| Cluster8624 Tba020563.1 2.07E-13 8.47E-10 3 gi|602660743|ref|XP_007436033.1| Cluster9934 Tba020706.1 1.11E-16 4.54E-13 17 gi|602637991|ref|XP_007425829.1| Cluster7378 Tba020717.1 0.00E+00 0.00E+00 23 gi|602658927|ref|XP_007435162.1| Cluster5380 Tba020776.1 7.79E-10 3.19E-06 7 gi|602652422|ref|XP_007432584.1| Cluster5339 Tba020790.1 5.55E-16 2.27E-12 15 gi|602667609|ref|XP_007439365.1|

Table S27 Gene Ontology (GO) enrichment of positively selected genes in the Tibetan hot spring snake by topGO (42).
GO	Class	# in genome	# in significant	P	Term
GO:0016922	MF	16	3	0.0021	ligand-dependent nuclear receptor binding
GO:0003714	MF	114	7	0.0027	transcription corepressor activity
GO:0016624	MF	8	2	0.007	oxidoreductase activity, acting on the aldehyde or oxo group of donors, disulfide as acceptor
GO:0016861	MF	10	2	0.0111	intramolecular oxidoreductase activity, interconverting aldoses and ketoses
GO:0004693	MF	11	2	0.0134	cyclin-dependent protein serine/threonine kinase activity
GO:0015095	MF	12	2	0.0159	magnesium ion transmembrane transporter activity
GO:0030374	MF	35	3	0.0194	ligand-dependent nuclear receptor transcription coactivator activity
GO:0003755	MF	36	3	0.021	peptidyl-prolyl cis-trans isomerase activity
GO:0003676	MF	3072	64	0.0219	nucleic acid binding
GO:0004672	MF	666	10	0.0264	protein kinase activity
GO:0035064	MF	41	3	0.0295	methylated histone binding
GO:0030331	MF	17	2	0.0311	estrogen receptor binding
GO:0046966	MF	18	2	0.0346	thyroid hormone receptor binding
GO:0003697	MF	47	3	0.0418	single-stranded DNA binding
GO:0004812	MF	50	3	0.0487	aminoacyl-tRNA ligase activity
GO:0006367	BP	74	6	0.0023	transcription initiation from RNA polymerase II promoter
GO:0042789	BP	6	2	0.0034	mRNA transcription from RNA polymerase II promoter
GO:0030879	BP	58	3	0.0048	mammary gland development
GO:0032656	BP	7	2	0.0048	regulation of interleukin-13 production
GO:0010839	BP	8	2	0.0063	negative regulation of keratinocyte proliferation
GO:0043174	BP	8	2	0.0063	nucleoside salvage
GO:0043101	BP	8	2	0.0063	purine-containing compound salvage
GO:0032674	BP	10	2	0.0099	regulation of interleukin-5 production
GO:0033158	BP	10	2	0.0099	regulation of protein import into nucleus, translocation
GO:0003071	BP	10	2	0.0099	renal system process involved in regulation of systemic arterial blood pressure
GO:0055074	BP	186	3	0.0122	calcium ion homeostasis
GO:0072574	BP	8	2	0.0154	hepatocyte proliferation
GO:0090398	BP	34	3	0.0154	cellular senescence
GO:0000413	BP	35	3	0.0167	protein peptidyl-prolyl isomerization
GO:0043967	BP	60	4	0.0179	histone H4 acetylation
GO:0021542	BP	14	2	0.0193	dentate gyrus development
GO:0090136	BP	14	2	0.0193	epithelial cell-cell adhesion
GO:0015693	BP	14	2	0.0193	magnesium ion transport
GO:0003085	BP	14	2	0.0193	negative regulation of systemic arterial blood pressure
GO:0010824	BP	15	2	0.022	regulation of centrosome duplication
GO:0043968	BP	15	2	0.022	histone H2A acetylation
GO:0006974	BP	517	12	0.0239	cellular response to DNA damage stimulus
GO:0035855	BP	16	2	0.0249	megakaryocyte development
GO:0046487	BP	17	2	0.0279	glyoxylate metabolic process
GO:0035909	BP	18	2	0.0311	aorta morphogenesis
GO:0050890	BP	191	4	0.0313	cognition
GO:0009164	BP	992	12	0.0314	nucleoside catabolic process
GO:0045737	BP	19	2	0.0344	positive regulation of cyclin-dependent protein serine/threonine kinase activity
GO:0006412	BP	494	13	0.0352	translation
GO:0048821	BP	21	2	0.0414	erythrocyte development
GO:0051402	BP	183	4	0.0416	neuron apoptotic process
GO:0006418	BP	51	3	0.0444	tRNA aminoacylation for protein translation
GO:0006506	BP	22	2	0.0451	GPI anchor biosynthetic process
GO:2000379	BP	22	2	0.0451	positive regulation of reactive oxygen species metabolic process
GO:0071156	BP	61	3	0.0456	regulation of cell cycle arrest
GO:0060019	BP	8	2	0.0456	radial glial cell differentiation
GO:0050866	BP	60	2	0.0458	negative regulation of cell activation
GO:0000307	CC	18	3	0.003	cyclin-dependent protein kinase holoenzyme complex
GO:0005763	CC	9	2	0.0091	mitochondrial small ribosomal subunit
GO:0000812	CC	11	2	0.0136	Swr1 complex
GO:0016592	CC	37	3	0.0229	mediator complex
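
As a rough illustration of how the "# in genome" and "# in significant" columns of Table S27 can feed into an enrichment P value, the sketch below computes a one-sided hypergeometric (Fisher-type) tail probability in plain Python. It is only a sketch under stated assumptions: the appendix section shown here does not say which topGO algorithm or test statistic was used, and the totals `n_annotated_genes` and `n_selected_genes` are hypothetical placeholders that are not given in the table, so the result is not expected to reproduce the tabulated P values.

```python
from math import comb


def hypergeom_upper_tail(k, term_size, n_selected, n_total):
    """Return P(X >= k) for X ~ Hypergeometric.

    n_total    : number of annotated genes in the background (hypothetical here)
    term_size  : genes annotated to the GO term ("# in genome")
    n_selected : positively selected genes with annotation (hypothetical here)
    k          : selected genes annotated to the term ("# in significant")
    """
    upper = min(term_size, n_selected)
    total = comb(n_total, n_selected)
    favourable = sum(
        comb(term_size, i) * comb(n_total - term_size, n_selected - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical totals, chosen only for illustration (not taken from the source).
n_annotated_genes = 10000
n_selected_genes = 200

# Counts for GO:0016922 as listed in Table S27: 16 genes in genome, 3 significant.
p_value = hypergeom_upper_tail(3, 16, n_selected_genes, n_annotated_genes)
print(f"GO:0016922 illustrative P = {p_value:.4g}")
```

Using exact integer binomial coefficients avoids floating-point underflow for large backgrounds, at the cost of speed; a statistics library would be the usual choice for a full analysis.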

Table S28 Twelve candidate genes that may play a role in DNA damage response in the Tibetan hot spring snake.
Gene	P	Corrected P	#Sites	Description
ERCC6 (Tba002774.1)	3.05E-10	1.25E-06	19	PGBD3, ERCC6; piggyBac transposable element derived 3; K10841 DNA excision repair protein ERCC-6
SMARCAL1 (Tba003713.1)	2.94E-07	1.20E-03	2	K14440 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily A-like protein 1 [EC:3.6.4.12]
MEIOB (Tba005644.1)	1.82E-06	7.46E-03	1	meiosis-specific with OB domain-containing protein
ING4 (Tba007247.1)	0.00E+00	0.00E+00	13	similar to inhibitor of growth family, member 4; K11346 inhibitor of growth protein 4
RBBP5 (Tba008683.1)	7.33E-12	3.00E-08	5	RBBP5; retinoblastoma binding protein 5; K14961 COMPASS component SWD1
DMAP1 (Tba012535.1)	3.13E-13	1.28E-09	10	DMAP1; DNA methyltransferase 1 associated protein 1; K11324 DNA methyltransferase 1-associated protein 1
MSH2 (Tba012634.1)	0.00E+00	0.00E+00	15	MSH2; mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli); K08735 DNA mismatch repair protein MSH2
p21 (Tba012937.1)	6.27E-12	2.57E-08	7	CIP1; cdk inhibitor CIP1 (p21); K06625 cyclin-dependent kinase inhibitor 1A
CCNT2 (Tba015373.1)	0.00E+00	0.00E+00	20	ccnt2a, ccnt2, cycT, wu:fi17h07, zgc:55357; cyclin T2a; K15188 cyclin T
USP7 (Tba015670.1)	0.00E+00	0.00E+00	15	ubiquitin specific protease 7; K11838 ubiquitin carboxyl-terminal hydrolase 7 [EC:3.1.2.15]
RBM38 (Tba017041.1)	1.51E-13	6.18E-10	12	RNA-binding protein 38
APLF (Tba018633.1)	9.61E-09	3.93E-05	2	C2orf13; chromosome 2 open reading frame 13; K13295 aprataxin and PNK-like factor [EC:4.2.99.18]
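
Table S28 does not state how the "Corrected P" column was derived from "P". As a simple consistency check, the sketch below divides each corrected value by its raw value for the rows with non-zero P. The observation that the ratios cluster around a single constant is only compatible with a multiplicative (Bonferroni-style) correction; it is an inference from the tabulated numbers, not a statement of the authors' method. The variable and function names are illustrative.

```python
# (P, Corrected P) pairs copied from Table S28; rows with P = 0 are omitted
# because a ratio cannot be formed for them.
TABLE_S28 = {
    "Tba002774.1": (3.05e-10, 1.25e-06),
    "Tba003713.1": (2.94e-07, 1.20e-03),
    "Tba005644.1": (1.82e-06, 7.46e-03),
    "Tba008683.1": (7.33e-12, 3.00e-08),
    "Tba012535.1": (3.13e-13, 1.28e-09),
    "Tba012937.1": (6.27e-12, 2.57e-08),
    "Tba017041.1": (1.51e-13, 6.18e-10),
    "Tba018633.1": (9.61e-09, 3.93e-05),
}


def correction_factors(rows):
    """Return corrected/raw P ratios per gene, skipping rows with raw P of zero."""
    return {gene: corrected / raw for gene, (raw, corrected) in rows.items() if raw > 0}


factors = correction_factors(TABLE_S28)
for gene, factor in sorted(factors.items()):
    print(f"{gene}\t{factor:.0f}")
```

Small differences between the ratios are expected because both columns are rounded to three significant figures.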

Table S29 Twenty-seven genes with hot spring snake-specific amino acid substitutions.
Protein	SwissProt Accession	Function
Mitosis
Tba013290.1	sp|Q76I89|NDC80_CHICK	Kinetochore protein NDC80 homolog
Tba019434.1	sp|Q9P2P6|STAR9_HUMAN	StAR-related lipid transfer protein 9
Reproduction
Tba004255.1	sp|Q6ZQQ6|WDR87_HUMAN	WD repeat-containing protein 87
Tba004503.1	sp|O77726|ZP2_MACRA	Zona pellucida sperm-binding protein 2
Tba008098.1	sp|Q6DMN8|SPAT4_PANTR	Spermatogenesis-associated protein 4
5'-nucleotidase activity
Tba019485.1	sp|Q5ZIZ4|5NTC_CHICK	Cytosolic purine 5'-nucleotidase
Tba005347.1	sp|Q86UY8|NT5D3_HUMAN	5'-nucleotidase domain-containing protein 3
ROS and DNA damage
Tba017301.1	sp|A5YM72|CRNS1_HUMAN	Carnosine synthase 1
Tba013094.1	sp|Q5I4H3|FEN1_XIPMA	Flap endonuclease 1
Tba015247.1	sp|Q5FWL3|RNF41_XENLA	E3 ubiquitin-protein ligase NRDP1
Brain and nervous system
Tba000661.1	sp|Q9PWB0|CER1_CHICK	Cerberus
Tba007237.1	sp|Q16538|GP162_HUMAN	Probable G-protein coupled receptor 162
Tba008685.1	sp|Q02246|CNTN2_HUMAN	Contactin-2
Tba012913.1	sp|E7F9T0|MICA1_DANRE	Protein-methionine sulfoxide oxidase mical1
Tba001417.1	sp|Q96Q04|LMTK3_HUMAN	Serine/threonine-protein kinase LMTK3
Tba015947.1	sp|Q8N3K9|CMYA5_HUMAN	Cardiomyopathy-associated protein 5
Immune
Tba001154.1	sp|Q62658|FKB1A_RAT	Peptidyl-prolyl cis-trans isomerase FKBP1A
Tba004573.1	sp|Q9NZM3|ITSN2_HUMAN	Intersectin-2
Tba000789.1	sp|O43916|CHST1_HUMAN	Carbohydrate sulfotransferase 1
Tba002785.1	sp|Q6ZS81|WDFY4_HUMAN	WD repeat- and FYVE domain-containing protein 4
Tba007554.1	sp|P24394|IL4RA_HUMAN	Interleukin-4 receptor subunit alpha
Heart
Tba006950.1	sp|P10288|CADH2_CHICK	Cadherin-2
Tba000999.1	sp|Q9Y3Q4|HCN4_HUMAN	Potassium/sodium hyperpolarization-activated cyclic nucleotide-gated channel 4
Others
Tba004500.1	sp|Q00341|VIGLN_HUMAN	Vigilin
Tba005715.1	sp|Q6W2J9|BCOR_HUMAN	BCL-6 corepressor
Tba007505.1	sp|Q91WK1|SPRY4_MOUSE	SPRY domain-containing protein 4
Tba013088.1	sp|A0JPN2|S39A4_RAT	Zinc transporter ZIP4

Table S30 Primers, annealing temperatures, and expected product sizes.
Primer	Sequence	Annealing temperature (°C)	Product size (bp)
EPAS1-R	GGTCAAGTGGTTACAGGGCA	59.89	104
EPAS1-F	GGCTGCAGGTTCCGAGTATT	60.11	104
EPO-F	TTGCGGAAAGTGTCAGCAGT	60.46	104
EPO-R	GCTGCATGTGGATAAAGCCG	59.97	104
GAPDH-R	TTGAGGTCAATGAAGGGGTC	57.12	117
GAPDH-F	GAAGGTGAAGGTCGGAGTCA	59.03	117
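
A quick sanity check of the primers in Table S30 can be sketched as follows: the snippet reports the length and GC fraction of each sequence. This is an illustrative check only; it does not reproduce the tabulated temperatures, since the method used to compute them is not described in this section, and the helper names are hypothetical.

```python
# Primer sequences copied from Table S30.
PRIMERS = {
    "EPAS1-R": "GGTCAAGTGGTTACAGGGCA",
    "EPAS1-F": "GGCTGCAGGTTCCGAGTATT",
    "EPO-F": "TTGCGGAAAGTGTCAGCAGT",
    "EPO-R": "GCTGCATGTGGATAAAGCCG",
    "GAPDH-R": "TTGAGGTCAATGAAGGGGTC",
    "GAPDH-F": "GAAGGTGAAGGTCGGAGTCA",
}


def gc_fraction(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def summarize(primers):
    """Map each primer name to a (length, GC fraction) tuple."""
    return {name: (len(seq), gc_fraction(seq)) for name, seq in primers.items()}


summary = summarize(PRIMERS)
for name, (length, gc) in summary.items():
    print(f"{name}\t{length} nt\tGC = {gc:.2f}")
```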

REFERENCES

1. Bolger AM, Lohse M, & Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114-2120.
2. Luo R, et al. (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1(1):18.
3. Zhang J, Kobert K, Flouri T, & Stamatakis A (2014) PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30(5):614-620.
4. Leggett RM, Clavijo BJ, Clissold L, Clark MD, & Caccamo M (2014) NextClip: an analysis and read preparation tool for Nextera Long Mate Pair libraries. Bioinformatics 30(4):566-568.
5. Li R, et al. (2009) The sequence and de novo assembly of the giant panda genome. Nature 463(7279):311-317.
6. Parra G, Bradnam K, & Korf I (2007) CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23(9):1061-1067.
7. Grabherr MG, et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29(7):644-652.
8. Hahn C, Bachmann L, & Chevreux B (2013) Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads—a baiting and iterative mapping approach. Nucleic Acids Res.:gkt371.
9. Smit A & Hubley R (2010) RepeatModeler Open-1.0. Repeat Masker Website.
10. Smit AF, Hubley R, & Green P (1996-2010) RepeatMasker Open-3.0.
11. McCarthy EM & McDonald JF (2003) LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics 19(3):362-367.
12. Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27(2):573.
13. Schrader L, et al. (2014) Transposable element islands facilitate adaptation to novel environments in an invasive species. Nat. Commun. 5:5495.
14. Haas BJ, et al. (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31(19):5654-5666.
15. Korf I (2004) Gene finding in novel genomes. BMC Bioinformatics 5(1):59.
16. Lomsadze A, Burns PD, & Borodovsky M (2014) Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res.:gku557.
17. Stanke M, et al. (2006) AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34(suppl 2):W435-W439.
18. Holt C & Yandell M (2011) MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12(1):491.
19. Jurka J, et al. (2005) Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110(1-4):462-467.
20. Boeckmann B, et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31(1):365-370.
21. Camacho C, et al. (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421.
22. Pruitt KD, et al. (2014) RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 42(D1):D756-D763.
23. Conesa A & Gotz S (2008) Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics 2008:619832.
24. Ashburner M, et al. (2000) Gene Ontology: tool for the unification of biology. Nat. Genet. 25(1):25-29.
25. Quevillon E, et al. (2005) InterProScan: protein domains identifier. Nucleic Acids Res. 33(Web Server issue):W116-120.
26. Bru C, et al. (2005) The ProDom database of protein domain families: more emphasis on 3D. Nucleic Acids Res. 33(suppl 1):D212-D215.
27. Attwood T, Beck M, Bleasby A, & Parry-Smith D (1994) PRINTS--a database of protein motif fingerprints. Nucleic Acids Res. 22(17):3590.
28. Bateman A, et al. (2004) The Pfam protein families database. Nucleic Acids Res. 32(suppl 1):D138-D141.
29. Ponting CP, Schultz J, Milpetz F, & Bork P (1999) SMART: identification and annotation of domains from signalling and extracellular protein sequences. Nucleic Acids Res. 27(1):229-232.
30. Mi H, et al. (2005) The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res. 33(suppl 1):D284-D288.
31. Hulo N, et al. (2006) The PROSITE database. Nucleic Acids Res. 34(suppl 1):D227-D230.
32. Haft DH, Selengut JD, & White O (2003) The TIGRFAMs database of protein families. Nucleic Acids Res. 31(1):371-373.
33. Consortium I (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432(7018):695-716.
34. Alföldi J, et al. (2011) The genome of the green anole lizard and a comparative analysis with birds and mammals. Nature 477(7366):587-591.
35. Georges A, et al. (2015) High-coverage sequencing and annotated assembly of the genome of the Australian dragon lizard Pogona vitticeps. GigaScience 4(1):45.
36. Fischer S, et al. (2011) Using OrthoMCL to Assign Proteins to OrthoMCL‐DB Groups or to Cluster Proteomes Into New Ortholog Groups. Current Protocols in Bioinformatics:6.12.11-16.12.19.
37. Maere S, Heymans K, & Kuiper M (2005) BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21(16):3448-3449.
38. Smoot ME, Ono K, Ruscheinski J, Wang P-L, & Ideker T (2011) Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27(3):431-432.
39. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32(5):1792-1797.
40. Talavera G & Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56(4):564-577.
41. Ronquist F & Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19(12):1572-1574.
42. Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24(8):1586-1591.
43. De Bie T, Cristianini N, Demuth JP, & Hahn MW (2006) CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22(10):1269-1271.
44. Alexa A & Rahnenfuhrer J (2010) topGO: enrichment analysis for gene ontology. R package version 2(0).
45. Suyama M, Torrents D, & Bork P (2006) PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34(Web Server issue):12.
46. Harris RS (2007) Improved pairwise alignment of genomic DNA (The Pennsylvania State University).
47. Li H & Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25(14):1754-1760.
48. Vicoso B, Emerson J, Zektser Y, Mahajan S, & Bachtrog D (2013) Comparative sex chromosome genomics in snakes: differentiation, evolutionary strata, and lack of global dosage compensation. PLoS Biol. 11(8):e1001643.
49. Yin W, et al. (2016) Evolutionary trajectories of snake genes and genomes revealed by comparative analyses of five-pacer viper. Nat. Commun. 7:13107.
50. Blanchette M, et al. (2004) Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14(4):708-715.
51. Kumar P, Henikoff S, & Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature protocols 4(7):1073-1081.
52. Adzhubei I, Jordan DM, & Sunyaev SR (2013) Predicting functional effect of human missense mutations using PolyPhen‐2. Current protocols in human genetics:7.20.21-27.20.41.
53. Benton MJ & Donoghue PC (2007) Paleontological evidence to date the tree of life. Mol. Biol. Evol. 24(1):26-53.
54. Luo ZX, Yuan CX, Meng QJ, & Ji Q (2011) A Jurassic eutherian mammal and divergence of marsupials and placentals. Nature.
