bioRxiv preprint doi: https://doi.org/10.1101/685743; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

The whale shark genome reveals patterns of gene family evolution

Milton Tan¹*, Anthony K. Redmond², Helen Dooley³, Ryo Nozu⁴, Keiichi Sato⁴,⁵, Shigehiro Kuraku⁶, Sergey Koren⁷, Adam M. Phillippy⁷, Alistair D.M. Dove⁸, Timothy D. Read⁹

1. Illinois Natural History Survey at University of Illinois Urbana-Champaign, Champaign, IL, USA.
2. Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland.
3. University of Maryland School of Medicine, Institute of Marine & Environmental Technology, Baltimore, MD, USA.
4. Okinawa Churashima Research Center, Okinawa Churashima Foundation, Okinawa, Japan.
5. Okinawa Churaumi Aquarium, Motobu, Okinawa, Japan.
6. RIKEN Center for Biosystems Dynamics Research (BDR), RIKEN, Kobe, Japan.
7. National Human Genome Research Institute, Bethesda, MD, USA.
8. Georgia Aquarium, Atlanta, GA, USA.
9. Department of Infectious Diseases, Emory University School of Medicine, Atlanta, GA, USA.

Keywords: Elasmobranchii, Cartilaginous fishes, comparative genomics, innate immunity, gigantism, vertebrate evolution

Abstract

Chondrichthyes (cartilaginous fishes) are fundamental for understanding vertebrate evolution, yet their genomes are understudied. We report long-read sequencing of the whale shark genome to generate the best gapless chondrichthyan genome assembly yet, with higher contig contiguity than all other cartilaginous fish genomes, and studied vertebrate genomic evolution of ancestral gene families, immunity, and gigantism. We found a major increase in gene families at the origin of gnathostomes (jawed vertebrates) independent of their genome duplication. We studied vertebrate pathogen recognition receptors (PRRs), which are key in initiating innate immune defense, and found diverse patterns of gene family evolution, demonstrating that adaptive immunity in gnathostomes did not fully displace germline-encoded PRR innovation. We also discovered a new Toll-like receptor (TLR29) and three NOD1 copies in the whale shark. We found chondrichthyan and giant vertebrate genomes had decreased substitution rates compared to other vertebrates, but gene family expansion rates varied among vertebrate giants, suggesting substitution and expansion rates of gene families are decoupled in vertebrate genomes. Finally, we found gene families that shifted in expansion rate in vertebrate giants were enriched for human cancer-related genes, consistent with gigantism requiring adaptations to suppress cancer.


Introduction

Jawed vertebrates (Gnathostomata) comprise two extant major groups, the cartilaginous fishes (Chondrichthyes) and the bony vertebrates (Osteichthyes, including Tetrapoda) (Venkatesh et al., 2014). Comparison of genomes between these two groups not only provides insight into early gnathostome evolution and the emergence of various biological features, but also enables inference of ancestral jawed vertebrate traits (Venkatesh et al., 2014). The availability of sequence data from many species across vertebrate lineages is key to the success of such studies. Until very recently, genomic data from cartilaginous fishes were significantly underrepresented compared to other vertebrate lineages. The first cartilaginous fish genome, that of Callorhinchus milii (known colloquially as ghost shark, elephant shark, or elephant fish), was used to study the early evolution of genes related to bone development and emergence of the adaptive immune system (Venkatesh et al., 2014). As a member of the Holocephali (chimaeras, ratfishes), one of the two major groups of cartilaginous fishes, C. milii separated from the Elasmobranchii (sharks, rays, and skates) approximately 420 million years ago, shortly after the divergence from bony vertebrates. Sampling other elasmobranch genomes for comparison is therefore critically important to our understanding of vertebrate genome evolution (Redmond et al., 2018).

Until recently, few genetic resources have been available for elasmobranchs in general, and for the whale shark (Rhincodon typus) in particular. The first draft elasmobranch genome published was for a male whale shark of Taiwanese origin by Read et al. (2017). Famously representing one of Earth's ocean giants, the whale shark is by far the largest of all extant fishes, reaching a maximum confirmed length of nearly 19 meters (McClain et al., 2015).
Due to its phylogenetic position relative to other vertebrates, the scarcity of shark genomes, and its unique biology, the previous whale shark genome was used to address questions related to vertebrate genome evolution (Hara et al., 2018; Marra et al., 2019), the relationship of gene evolution in sharks and unique shark traits (Hara et al., 2018; Marra et al., 2019), as well as the evolution of gigantism (Weber et al., 2020). A toll-like receptor (TLR) similar to TLR21 was also found in this first whale shark genome draft assembly, suggesting that TLR21 was derived in the most recent common ancestor of jawed vertebrates. While this represented an important step forward for elasmobranch genomics, the genome was fragmentary, and substantial improvements to the genome contiguity and annotation were expected from reassembling the genome using PacBio long-read sequences (Read et al., 2017).

Despite the prior lack of genomic information, much recent work has focused on further sequencing, assembling, and analyzing the whale shark nuclear genome (Hara et al., 2018; Read et al., 2017; Weber et al., 2020). Hara et al. reassembled the published whale shark genome data, and sequenced transcriptome data from blood cells sampled from a different individual for annotation (Hara et al., 2018). Alongside the work on the whale shark genome, genomes have also been assembled for the bamboo shark (Chiloscyllium punctatum), cloudy catshark (Scyliorhinus torazame), white shark (Carcharodon carcharias), and white-spotted bamboo shark (Chiloscyllium plagiosum). Comparative analyses of shark genomes have supported numerous inferences about shark genome evolution, including a slow rate of shark genome evolution, a reduction in olfactory gene diversity, positive selection of wound healing genes, proliferation of CR1-like LINEs within introns related to their larger genomes, and rapid evolution in immune-related genes (Hara et al., 2018; Marra et al., 2019; Weber et al., 2020; Zhang et al., 2020).

Long-read sequencing is an important factor in assembling longer contigs to resolve repetitive regions, which comprise the majority of vertebrate genomes (Koren et al., 2017). Herein we report on the best gapless assembly of the whale shark genome to date, based on de novo assembly of long reads obtained with the PacBio single molecule real-time sequencing platform. We used this assembly and new annotation in a comparative genomic approach to investigate the origins and losses of gene families, aiming to identify patterns of gene family evolution associated with major early vertebrate evolutionary transitions. Building upon our previous finding of a putative TLR21 in the initial draft whale shark genome assembly, we performed a detailed examination of the evolution of jawed vertebrate pathogen recognition receptors (PRRs), which are innate immune molecules that play a vital role in the detection of pathogens. Despite their clear functional importance, PRRs (and innate immune molecules in general) have been poorly studied in cartilaginous fishes until now. Given that cartilaginous fishes are the most evolutionarily distant lineage relative to humans to possess both an adaptive and innate immune system, the study of their PRR repertoire is important to understanding the integration of the two systems in early jawed vertebrates. For example, previous work has shown that several deuterostome invertebrate genomes possess greatly expanded PRR repertoires when compared to the relatively conserved repertoires found in bony vertebrates, which suggests that adaptive immunity may have negated the need for many new PRRs in jawed vertebrates (Huang et al., 2008; Rast et al., 2006; Smith et al., 2013), although this hypothesis has not been formally tested.
Using the new whale shark genome assembly, we therefore investigated the repertoires of three major PRR families: NOD-like receptors, RIG-like receptors, and Toll-like receptors. Next, we compared the rates of functional genomic evolution in multiple independent lineages of vertebrates in which gigantism has evolved, including the whale shark, to test for relationships between gigantism and genomic evolution among vertebrates. Finally, we considered cancer suppression. Larger-bodied organisms tend to have lower cancer rates than expected given their increased numbers of cells relative to smaller-bodied organisms (Peto et al., 1975), suggesting genes involved in cancer suppression may evolve differently in vertebrate giants. Recent research in giant mammals such as elephants and whales supports this hypothesis and has identified selection or duplication of various gene families that are related to suppressing cancer in humans (Abegglen et al., 2015; Sulak et al., 2016; Tollis et al., 2019). Hence, we studied whether gene families that have shifted in gene duplication rates were enriched for orthologs of known cancer-related genes.

Results and Discussion

Gapless Genome Assembly. We added to our previously sequenced short-read Illumina data (~30X coverage) by generating 61.8 GB of long-read PacBio sequences; relative to a non-sequencing-based estimate of genome size of 3.73 Gbp (Hara et al., 2018), this corresponds to an expected long-read coverage of about 16X, for a total of ~46X coverage. The new whale shark genome assembly represented the best gapless assembly to date for the whale shark (Appendix, Supplementary table 1). The total length of the new assembly was 2.96 Gbp.
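The coverage figures quoted above follow from dividing sequenced bases by genome size. The short sketch below reproduces that arithmetic; it assumes that 1 GB of sequence means 10^9 bases, and the function and variable names are illustrative only.

```python
# Figures quoted in the text.
long_read_bases = 61.8e9     # 61.8 GB of PacBio sequence; 1 GB taken as 1e9 bases (assumption)
genome_size_bp = 3.73e9      # non-sequencing-based genome size estimate (Hara et al., 2018)
short_read_coverage = 30.0   # approximate coverage of the earlier Illumina data


def expected_coverage(total_bases, genome_size):
    """Expected mean depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


long_read_coverage = expected_coverage(long_read_bases, genome_size_bp)
total_coverage = long_read_coverage + short_read_coverage

print(f"long-read coverage: {long_read_coverage:.1f}X")
print(f"total coverage:     {total_coverage:.1f}X")
```

Under these assumptions the long-read depth comes out between 16X and 17X and the combined depth between 46X and 47X, in line with the rounded values in the text.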


The total size of the assembly was very similar to the genome size of approximately 2.79 Gbp estimated with the k-mer based approach GenomeScope, suggesting the genome is fairly complete. On the other hand, it was smaller than a non-sequencing based estimate of the whale shark genome size of 3.73 Gbp by Hara et al. (2018), which suggests that sections of the genome, potentially comprising primarily repetitive elements, are still missing. Repetitive elements were annotated to comprise roughly 50.34% of the genome assembly (Appendix 1A).

The new assembly had 57,333 contigs with a contig N50 of 144,422 bp, that is, fewer contigs than the number of scaffolds of previous assemblies and a higher contig N50, representing a dramatic improvement in gapless contiguity compared to the existing whale shark genome assemblies (Supplementary table 1). This higher contiguity at the contig level (vs. scaffold level) was also better than the published Callorhinchus, brownbanded bamboo shark, cloudy catshark, and white shark genomes (Hara et al., 2018; Marra et al., 2019; Venkatesh et al., 2014). The scaffolded genome had 39,176 scaffolds and a scaffold N50 of 344,460 bp. Relative to the previously published most contiguous assembly by Weber et al. (2020), they reported a far higher scaffold N50, whereas our scaffolded assembly has far fewer scaffolds (Supplementary table 1).

Based on GenomeScope, the whale shark genome had an estimated level of heterozygosity of approximately 0.0797–0.0828%, consistent with the k-mer coverage plot showing a unimodal distribution (Supplementary figure 1).
Comparison of the k-mer profile with the presence of k-mers in the assembly revealed that they were relatively concordant, with no indication that many k-mers were represented twice in the assembly (which could be due to a diploid individual having phased haplotypes assembled into separate contigs) (Supplementary figure 2). Mapping the reads to the genome assembly and calling SNPs using freebayes provided a similar estimate of 2,189,244 SNPs, a rate of 0.0739% heterozygosity, or an average of a SNP every 1353 bases. This suggests that the heterozygosity of the whale shark genome is relatively low.

To assess gene completeness, we first used BUSCO v2 (Simão et al., 2015). Of 2,586 orthologs conserved among vertebrates searched by BUSCO, we found 2,033 complete orthologs in the whale shark genome, of which 1,967 were single-copy and 66 had duplicates. A further 323 orthologs were detected as fragments, while 230 were not detected by BUSCO. With 78.7% complete genes, this represents a marked improvement over the previous whale shark genome assembly, which only had 15% complete BUSCO genes (Hara et al., 2018). We also evaluated gene completeness using a rigid 1-to-1 ortholog core vertebrate gene (CVG) set that is better tuned for finding gene families in elasmobranchs (Hara et al., 2015), as implemented in gVolante server version 1.2.1 (Nishimura et al., 2017; accessed 2019 Apr 23). We found that 85% of core vertebrate genes were complete and that 97.4% of core vertebrate genes were recovered when partial genes were included, which compares favorably to completeness statistics in other shark assemblies (Hara et al., 2018) (Supplementary table 3). The gene content of this whale shark assembly was thus quite complete and informative for questions regarding vertebrate gene evolution.

Ancestral Vertebrate Genome Evolution.
We sought to use the new whale shark genome assembly to infer the evolutionary history of protein-coding gene families (i.e. orthogroups) across vertebrate phylogeny, aiming to provide insight into the evolution of biological innovations during major transitions in vertebrate evolution. Orthogroups are defined as all genes descended from a single gene in the common ancestor of the species considered (Emms and Kelly, 2015); hence, they are dependent to a degree on the phylogenetic breadth of species included in the analysis. Genes from the proteomes for 37 species deriving primarily from Ensembl and RefSeq (O’Leary et al., 2016; Yates et al., 2020) (Supplementary table 4), including 35 representative vertebrates, a sea squirt (Ciona), and a lancelet (Branchiostoma), were assigned to 18,435 orthologous gene families using OrthoFinder, with a mean of 10,688 orthogroups per genome. We then inferred the history of gene family origin and loss by comparing the presence and absence of gene families across species. Although accurate inference of gene family size evolution in vertebrates may benefit from further taxon sampling among invertebrates, Branchiostoma is notable among animals for retaining a relatively high number of gene families present in the animal stem branch (Richter et al., 2018); hence it is one of the best outgroup species to vertebrates for studying the origins of novel genes within the vertebrate clade.

We inferred a consistent increase in the total number of gene families from the root to the MRCA (most recent common ancestor) of Gnathostomata, but only slight increases following this in the MRCAs of bony fishes and cartilaginous fishes (black numbers, Figure 1, Appendix). We also found that numbers of novel gene families increased from the root to a peak in the MRCA of gnathostomes, and then novelty decreased precipitously towards the bony and cartilaginous fish descendants (numbers indicated by + symbol, Figure 1). Gene families conserved in all members of a clade may be considered core genes.
There was also an increase in the number of core genes of each ancestor from the most inclusive to the least inclusive clades, as expected with decreasing phylogenetic breadth of the less inclusive clades (black parenthetical numbers, Figure 1). We found that a decreasing number of novel core gene families in vertebrate ancestors were retained between the MRCA of Olfactores (tunicates + vertebrates) and the MRCA of Gnathostomata (parenthetical numbers indicated by +, Figure 1), in contrast to the general pattern of increasing numbers of novel gene families along these same branches. Overall, this implied that the origin of jawed vertebrates established a large proportion of novel gene families of both bony vertebrates and cartilaginous fishes, with variable retention among descendant lineages.

The inclusion of multiple chondrichthyan lineages was important in the inference of gnathostome-derived gene families. The selachians (true sharks) lost fewer gene families than Callorhinchus (144 vs. 2,269 overall gene families, 83 vs. 1,422 gnathostome-derived gene families), but future improved taxon sampling may further increase this estimate. Additional high-quality genomes of holocephalans may demonstrate that some of these losses in Callorhinchus are lineage-specific or due to assembly or annotation errors, and the addition of batoids (skates and rays) could recover gene families independently lost in holocephalans and selachians. Thus, increasing chondrichthyan taxon sampling allowed for more confidence in the origin and loss of gene families in vertebrate history and assignment of hundreds of genes as having originated prior to the MRCA of gnathostomes.
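The presence/absence reasoning described above can be illustrated with a minimal sketch. It assumes a single origin for each gene family, placed at the MRCA of the species that carry it, with absences inside that clade read as losses; this is only one simple way to frame the inference and is not necessarily the procedure used in the Appendix. The toy tree, the function names, and the example families are all hypothetical.

```python
# Hypothetical, heavily simplified species tree as a child -> parent map.
PARENT = {
    "Branchiostoma": "Chordata",
    "Olfactores": "Chordata",
    "Ciona": "Olfactores",
    "Vertebrata": "Olfactores",
    "Petromyzon": "Vertebrata",
    "Gnathostomata": "Vertebrata",
    "Chondrichthyes": "Gnathostomata",
    "Osteichthyes": "Gnathostomata",
    "Rhincodon": "Chondrichthyes",
    "Callorhinchus": "Chondrichthyes",
    "Homo": "Osteichthyes",
    "Danio": "Osteichthyes",
}
# Leaves are nodes that never appear as a parent.
LEAVES = set(PARENT) - set(PARENT.values())


def lineage(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def infer_origin(present_in):
    """Place the family's origin at the MRCA of the species that carry it."""
    species = sorted(present_in)
    shared = set(lineage(species[0]))
    for sp in species[1:]:
        shared &= set(lineage(sp))
    # Walking up from a tip, the first shared node is the most recent one.
    return next(node for node in lineage(species[0]) if node in shared)


def leaves_under(clade):
    """All sampled species that descend from (or are) the given clade."""
    return {sp for sp in LEAVES if clade in lineage(sp)}


def inferred_losses(present_in):
    """Species inside the origin clade that lack the family."""
    return leaves_under(infer_origin(present_in)) - set(present_in)


def is_core(present_in):
    """A family is core for its origin clade if no descendant lacks it."""
    return not inferred_losses(present_in)


# Hypothetical families, described by the species in which they are present.
family_a = {"Rhincodon", "Homo", "Danio"}
family_b = {"Rhincodon", "Callorhinchus", "Homo", "Danio"}
print(infer_origin(family_a), sorted(inferred_losses(family_a)), is_core(family_a))
print(infer_origin(family_b), sorted(inferred_losses(family_b)), is_core(family_b))
```

In this toy example both families are placed at the gnathostome ancestor, but only the second counts as a core family because the first is missing from Callorhinchus. The example also shows why taxon sampling matters: without the whale shark, the first family would be placed at the bony vertebrate ancestor instead.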
The burst in emergence of novel gene families that we observed along the ancestral jawed vertebrate branch coincides with the two rounds (2R) of whole genome duplication that occurred early in vertebrate evolution, resulting in gene duplicates referred to as ohnologs (Braasch et al., 2016; Ohno et al., 1968; Singh et al., 2015). Hypothetically, divergent ohnologs may be erroneously assigned to novel gene families and artefactually inflate our estimate for gene family birth along the ancestral jawed vertebrate branch. To estimate the potential extent of ohnolog family splitting, we compared the 2,885 gene families inferred as novel at the base of jawed vertebrates to ohnolog families previously inferred by Singh et al. (2015). Generally, most ohnologs had all their copies assigned to single gene families (1131–1609 ohnologs per species). We found that only 157 of the 2,885 gene families (5.4%) that we inferred to have originated in the MRCA of jawed vertebrates corresponded to split ohnologs. Hence, the split ohnologs are not a large proportion of the novel gene families in the jawed vertebrate ancestor. In addition, we also found that only 3 gene families were inferred to be derived at the MRCA of teleosts, coinciding with the teleost-specific genome duplication, which also supports that gene family inference is robust to genome duplication. This finding reinforces the importance of this evolutionary transition for genomic novelty, not just due to the vertebrate 2R whole genome duplication, but also through the addition of novel gene families.

Next, we tested whether gene families that were gained or lost during vertebrate evolution were enriched for certain GO (Gene Ontology) or Pfam annotations (Appendix; Supplementary table 5), potentially indicating functional genomic shifts preceding the origin of these clades. Functional annotations were assigned using InterProScan 5.32-71.0 and Kinfin 1.0 (Laetsch and Blaxter, 2017), and functional enrichment was determined using Fisher's exact test.
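As an illustration of the enrichment test named above, the sketch below computes a one-sided (over-representation) Fisher's exact test p-value from the hypergeometric distribution using only the Python standard library. It is a minimal sketch under stated assumptions: the text does not say whether a one-sided or two-sided test was used or how multiple testing was handled, and the annotation counts in the usage example are invented for illustration. Only the totals of 18,435 gene families and 711 novel Olfactores families come from the text.

```python
from math import comb


def enrichment_pvalue(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= hits_in_set), where X is the number of annotated families
    in a random draw of set_size families from a background containing
    hits_in_background annotated families out of background_size in total.
    """
    upper = min(set_size, hits_in_background)
    favourable = sum(
        comb(hits_in_background, i)
        * comb(background_size - hits_in_background, set_size - i)
        for i in range(hits_in_set, upper + 1)
    )
    return favourable / comb(background_size, set_size)


# Textbook check: 2x2 table [[3, 1], [1, 3]] has a one-sided p-value of 17/70.
p_check = enrichment_pvalue(3, 4, 4, 8)
print(f"check p-value: {p_check:.4f}")

# Hypothetical usage: suppose 10 of the 711 novel families carried an
# annotation found in only 20 of the 18,435 families overall (invented counts).
p_example = enrichment_pvalue(10, 711, 20, 18435)
print(f"example p-value: {p_example:.3g}")
```

Summing exact integer binomial coefficients and dividing once avoids floating-point underflow in the individual terms, which matters when the background contains many thousands of families.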
Overall, there were 8,700 gene families (47.2%) annotated for Gene Ontology (GO) functional terms and 14,727 gene families (79.9%) annotated for Pfam protein domains. For example, for the 711 novel gene families in the MRCA of Olfactores, we found an enrichment of connexin function (Supplementary table 6). This is consistent with prior work that determined that the origin of connexin gap junction proteins was in the MRCA of Olfactores (Alexopoulos et al., 2004). We also found enrichment of ankyrin repeat domains, a motif found widely across eukaryotes which has diverse functions in mediation of protein–protein interactions (Li et al., 2006), and hence may be involved in the evolution of novel protein complexes in Olfactores. Also, specific to the evolution of the whale shark, we inferred 7 novel gene families and a loss of 1,501 gene families. Neither of these sets of gene families was enriched for any functional terms, suggesting whale shark-specific traits are not attributable to functional genomic shifts due to the origins or losses of gene families.

Among the novel proteins in the MRCA of vertebrates, we found enrichment of several protein domain types including rhodopsin family 7-transmembrane (7-TM) receptor domains, immunoglobulin V-set domain, collagen triple helix repeats, zona pellucida domain, and C2H2-type zinc finger domain (Supplementary table 7; Appendix). The enrichment of collagen function is consistent with the importance of these collagens at the origin of vertebrates and their potential involvement in the origin of vertebrate traits, such as bone and teeth (Boot-Handford and Tuckwell, 2003). The enrichment of the zona pellucida domain at the origin of vertebrates is consistent with previous evidence showing that zona pellucida proteins likely originated in vertebrates (Litscher and Wassarman, 2014).
Inner-ear proteins also contain the zona pellucida domain, making its appearance in the vertebrate ancestor coincident with the origin of inner ears (Popper et al., 1992). The 7-TM domain proteins include a wide variety of receptors but were not enriched for any particular GO term. Some example receptors include those involved in binding a variety of ligands (e.g. fatty acids, neuropeptides, and hormones), and receptors with immune relevance (e.g. chemokine, bradykinin, and protease-activated receptors). The immunoglobulin V-set domain was found in several proteins, most of which had roles in cell adhesion and other functions (Appendix). We also found enrichment among novel vertebrate genes for the C2H2-type zinc finger domain, a well-characterized zinc finger domain primarily responsible for nucleotide–protein, as well as protein–protein interactions (Brayer and Segal, 2008; Wolfe et al., 2000). These novel genes were also not enriched for any particular GO term, but play a role in a variety of developmental signaling pathways and cell cycle regulation (Supplementary table 7; Appendix). The enrichment of these varied functional protein domains in the MRCA of vertebrates demonstrates their importance in the origin of diverse vertebrate traits, including responding to stimuli, fertilization, immunity, and signaling. Although the origins of some of these gene families were in the vertebrate ancestor, subsequent gene diversification in jawed vertebrates continued to increase the functional diversity of these gene families, such as in the collagens, which were duplicated in the jawed vertebrate genome duplication (Haq et al., 2019; Wada et al., 2006).

In the MRCA of jawed vertebrates, we found enrichment of a variety of immune-related protein domains including immunoglobulin V-set domain, immunoglobulin C1-set domain, and interleukin-8-like small cytokines, with functional enrichment of immune response and hormone activity.
Immunoglobulin-domain containing gene families included many immunoglobulins, interleukins, interleukin receptors, T-cell receptors, sialic acid-binding immunoglobulin-type lectins (Siglec proteins), chemokines, cluster of differentiation (CD) proteins, and MHC proteins (Supplementary table 8), consistent with the evolution of immunoglobulin-/T-cell receptor-based adaptive immunity in gnathostomes (Boehm, 2012; Kaiser et al., 2004). We also found enrichment for hormone activity, related to the origin of genes for many hormones at the origin of vertebrates (Supplementary table 8). This finding complemented previous work that identified hormones with a role in homeostasis originating in the MRCA of jawed vertebrates, but it emphasizes that hormone activity is also a predominant function among earlier novel vertebrate gene families (Hara et al., 2018).

Differences between bony vertebrates and cartilaginous fishes might be due to functional differences in gene families specific to each lineage. For gene families exclusive to bony vertebrates (including the 414 gene families derived in the MRCA of bony vertebrates and 366 gene families lost in cartilaginous fishes), we found several enriched sets of protein domains and functions including GPCR domain, lectin C-type domain, and C2H2-type zinc-finger proteins (Supplementary Tables 9–10). GPCRs are 7-TM proteins that transmit signals in response to extracellular stimuli to G proteins (Pierce et al., 2002). This enrichment of GPCR protein function is consistent with the relative paucity of these receptors in cartilaginous fishes, noted previously (Marra et al., 2019). We found many of the GPCRs gained in the MRCA of bony vertebrates were olfactory receptors, which is also consistent with the relatively low number of olfactory receptors noted in cartilaginous fishes previously (Hara et al., 2018; Marra et al., 2019).
We also found that one of the GPCR gene families included MAS1 and its relatives. MAS1 is important in response to angiotensin and regulating blood pressure, and although sharks produce angiotensin I (Takei et al., 1993), the precursor to angiotensin-(1-7), the lack of MAS1 and related receptors in cartilaginous fishes suggests that such responses are mediated by alternative receptors and that blood pressure regulation is distinct between cartilaginous fishes and bony vertebrates. Among the lectin C-type domain proteins, we found no orthologs of the NK gene cluster in cartilaginous fishes (e.g. CD69, KLRC), a conserved complex of genes found across bony vertebrates, which implies potential differences in the natural killer complex in cartilaginous fishes (Kelley et al., 2005). Gene families lost in cartilaginous fishes were also enriched for the KRAB box domain, which plays a role in transcriptional repression (Margolin et al., 1994). There was also enrichment for genes including the C-type lectin domain, which bind a variety of ligands and have functions that include roles in immunity (Brown et al., 2018). By contrast, we did not find enrichment in domains or functions among the gene families derived in cartilaginous fishes, which may in part be due to fewer annotations among gene families that are not present in bony vertebrates (Appendix). In summary, the functional genomic differences we detected between bony vertebrates and cartilaginous fishes stem from gene families present in bony vertebrates, with some related to immunity, chemosensation, and signaling.

Our analyses imply a dynamic history of gene family gain and loss across early vertebrate evolution. Of particular importance was the number of gene families gained in the MRCA of jawed vertebrates in establishing the gene families that are present in bony vertebrates and cartilaginous fishes, with these novel gene families being enriched for immune-related functions. The whale shark genome provided an important additional resource to study the origins of gene families in early vertebrate evolution.

Table 1. Vertebrate and invertebrate PRR repertoires. Superscripts indicate these citations: ¹Chen et al. (2021), ²Howe et al. (2016), ³Mukherjee et al. (2014), ⁴Kasamatsu et al. (2010), ⁵Buckley & Rast (2015), ⁶Tassia et al. (2017).
Species                            Toll-like Receptors (TLRs)   NOD-like Receptors (NLRs)   RIG-like Receptors (RLRs)

Jawed vertebrates
Homo sapiens (Human)               10                           21                          3
Danio rerio (Zebrafish)            20¹                          421²                        3³
Rhincodon typus (Whale Shark)      13                           43                          3

Jawless vertebrates
Petromyzon marinus (Lamprey)       16⁴ / 19⁵                    34⁵                         2³

Invertebrate deuterostomes
Ciona intestinalis                 3⁵                           16⁵                         2³
Branchiostoma floridae             19⁶ / 72⁵                    92⁵                         5³
Strongylocentrotus purpuratus      104⁶ / 253⁵                  203⁵                        6³
Cephalodiscus hodgsoni             6⁶                           -                           -
Ptychodera flava                   14⁶                          -                           -
Saccoglossus kowalevskii           10⁶                          -                           3³

Protostomes
Drosophila melanogaster            9⁵                           0⁵                          0
Daphnia pulex                      7⁵                           2⁵                          -
Caenorhabditis elegans             1⁵                           0⁵                          2³
Capitella teleta                   105⁵                         55⁵                         2³
Helobdella robusta                 16⁵                          0⁵                          2³
Lottia gigantea                    60⁵                          1⁵                          3³

Non-bilaterian animals
Nematostella vectensis             1⁵                           42⁵                         2³
Amphimedon queenslandica           0⁵                           135⁵                        2³

Evolution of jawed vertebrate innate immune receptors
Cartilaginous fishes are the most distant human relatives to possess an adaptive immune system based on immunoglobulin antibodies and T cell receptors (Dooley, 2014; Flajnik and Kasahara, 2010). This has driven extensive functional study of their adaptive immune system and an in-depth, although controversial, analysis of the evolution of adaptive immune genes in the elephant shark genome (Dooley, 2014; Redmond et al., 2018; Venkatesh et al., 2014). By comparison, the cartilaginous fish innate immune system has been overlooked


(Krishnaswamy Gopalan et al., 2014), despite its importance to understanding the impact that the emergence of adaptive immunity had on innate immune innovation. For example, some previous analyses of deuterostome invertebrate genomes identified greatly expanded PRR repertoires. Yet vertebrate PRR repertoires are considered to be highly conserved, leading to the suggestion that the need for vast PRR repertoires in vertebrates was superseded by the presence of adaptive immunity (Huang et al., 2008; Rast et al., 2006; Smith et al., 2013). Notable exceptions to this include an expansion of TLRs in codfishes due to a proposed loss of CD4 and MHC class II (Solbakken et al., 2017), and expansion of fish-specific NOD-like receptors (NLRs) in some other teleosts (Howe et al., 2016; Stein et al., 2007). As such, we sought to use the whale shark genome to determine whether cartilaginous fishes have a similar PRR set to bony vertebrates, with which they share an adaptive immune system, and to search for evidence of PRR expansions, with the aim of better understanding vertebrate innate immune evolution. To this end, we used BLAST to identify whale shark sequences corresponding to three major PRR families: NLRs, RIG-like receptors (RLRs), and Toll-like receptors (TLRs). We then reconstructed their phylogeny among published, curated vertebrate PRR gene datasets.

NLRs are intracellular receptors that detect a wide array of pathogen- and damage-associated molecular patterns (e.g. flagellin, extracellular ATP, glucose) (Fritz and Kufer, 2015). We identified 43 putative NLRs in the whale shark. We found direct orthologs of almost all human NLRs (UFBOOT=100 for all; Table 1), of which 23 contained a clearly identifiable NACHT domain (a signature of NLRs) (NOD1, NOD2, CIITA, NLRC5, NLRC3) while other putative orthologs did not (NLRX1, NLRC4, BIRC1, NWD1, TEP-1, and NLRP).
While inclusion 352 of these sequences lacking an apparent NACHT might seem questionable, the false-negative 353 rate for NACHT domain detection is high, even for some human NLRs. The presence of these 354 orthologs in whale shark indicates the presence of a conserved core NLR repertoire in jawed 355 vertebrates (Supplementary table 13, Figure 2 and Figure 2–figure supplement 1, Appendix). 356 Surprisingly, we found three orthologs of NOD1 in the whale shark, which is a key receptor for 357 detection of intracellular bacteria, rather than a single copy as in humans (ultrafast bootstrap 358 support, UFBOOT=100; Figure 2 and Figure 2–figure supplement 2 4). Further analyses 359 intimate that the three NOD1 copies resulted from tandem duplication events in the ancestor of 360 cartilaginous fishes (Appendix). Sequence characterization suggests that all three of the whale 361 shark NOD1s possess a canonical NACHT domain and so should retain a NOD1-like binding- 362 mechanism, but may have unique recognition specificity (Figure 2–figure supplement 2, 363 Appendix). Thus, we hypothesize that the three NOD1s present in cartilaginous fishes 364 potentiate broader bacterial recognition or more nuanced responses to intracellular pathogens. 365 In contrast to the scenario observed for NOD1, we did not find NACHT domain-containing 366 orthologs of any of the 14 human NLRP genes, many of which activate inflammatory responses 367 (Schroder and Tschopp, 2010), in whale shark, and only a single sequence lacking a detectable 368 NACHT domain (Supplementary table 13, Figure 2 and Figure 2–figure supplement 1, 369 Appendix). However, we did identify an apparently novel jawed vertebrate NLR gene family that 370 appears to be closely related to the NLRPs (UFBOOT=67; Figure 2 and Figure 2–figure 371 supplement 1). 
This family has undergone significant expansion in the whale shark 372 (UFBOOT=100; Figure 2 and Figure 2–figure supplement 1), and we tentatively suggest that 373 this may compensate for the paucity of true NLRPs in whale shark. Nonetheless, these results 374 imply that the NLR-based inflammasomes in humans and whale sharks are not directly


375 orthologous, and hence that NLR-based induction of inflammation and inflammation-induced 376 programmed cell death (Schroder and Tschopp, 2010) are functionally distinct in human and 377 whale shark. Interestingly, each of the vertebrate species we examined (human, zebrafish, and 378 whale shark) has independently expanded different NLR subfamilies relative to the other 379 species included in the analysis, with NLRP genes expanded in human (clade UFBOOT=99; 380 Figure 2 and Figure 2–figure supplement 1) and the previously-identified ‘fish-specific’ FISNA in 381 zebrafish (clade UFBOOT=86; Figure 2 and Figure 2–figure supplement 1). For the latter, we 382 unexpectedly found a whale shark ortholog (UFBOOT=74), suggesting this gene was present in 383 the jawed vertebrate ancestor and is not a teleost novelty. In all, while we found evidence for a 384 core set of NLRs in jawed vertebrates, our analyses also show that multiple, independent NLR 385 repertoire expansions, with probable immunological-relevance, have occurred during jawed 386 vertebrate evolution despite the presence of the adaptive immune system. 387 RIG-like receptors (RLRs) are intracellular receptors that detect viral nucleic acid and 388 initiate immune responses through Mitochondrial Antiviral Signaling Protein (MAVS) (Mukherjee 389 et al., 2014). Bony vertebrates have three RLR: RIG-1 (encoded by DDX58), MDA5 (IFIH1), and 390 LGP2 (DHX58). Structurally, these are all DEAD-Helicase domain containing family proteins 391 with a viral RNA binding C-terminal RD (RNA recognition domain), and an N-terminal CARD 392 domain pair that mediates interaction with MAVS (Loo and Gale, 2011; Mukherjee et al., 2014). 
Previous phylogenetic studies either did not include RLRs from cartilaginous fishes or failed to definitively identify each of the three canonical vertebrate RLRs in this lineage, meaning that the ancestral jawed vertebrate RLR repertoire remained unknown (Krishnaswamy Gopalan et al., 2014; Mukherjee et al., 2014). Our phylogenetic analyses of DEAD-Helicase and CARD domains indicate that orthologs of each of these genes exist in whale shark, revealing that all three RLRs had already diverged in the last common ancestor of extant jawed vertebrates (UFBOOT values all 100; Table 1, Figure 2 and Figure 2–figure supplement 2, Supplementary table 13). Further, and consistent with past findings (Mukherjee et al., 2014), we found that MDA5 and LGP2 are the result of a vertebrate-specific duplication, while RIG-1 split from these genes much earlier in animal evolution (Figure 2 and Figure 2–figure supplement 2, Appendix). We also identified MAVS orthologs in whale shark, elephant shark, and, despite difficulties identifying a sequence previously (Boudinot et al., 2014), coelacanth (UFBOOT=100; Figure 2 and Figure 2–figure supplement 2, Supplementary table 13, Appendix). These results show that the mammalian RLR repertoire (and MAVS) was established prior to the emergence of extant jawed vertebrates and has been highly conserved since, consistent with a lack of evidence for large RLR expansions in invertebrates.

Toll-like receptors (TLRs) recognize a wide variety of PAMPs and are probably the best known of all innate immune receptors. While large expansions have been observed in several invertebrate lineages (Huang et al., 2008; Rast et al., 2006), many studies suggest that the vertebrate TLR repertoire is largely conserved (Boudinot et al., 2014; Braasch et al., 2016; Wang et al., 2016).
Some teleosts appear to be an exception to this rule; however, this is likely due to the teleost-specific whole-genome duplication, and loss of CD4 and MHC class II in codfishes (Solbakken et al., 2017). We identified 13 putative TLRs in whale shark (Table 1, Supplementary table 13; Appendix), 11 of which are orthologous to TLR1/6/10, TLR2/28 (x2), TLR3, TLR7, TLR8, TLR9 (x2), TLR21, TLR22/23, and TLR27 (UFBOOT values all ≥99; Figure 2 and Figure 2–figure supplement 3; Appendix). The remaining two, along with a coelacanth sequence, represent a novel ancestral jawed vertebrate TLR gene family related to TLR21, for which we propose the name TLR29 (UFBOOT = 99; Figure 2 and Figure 2–figure supplement 3). Thus, the whale shark TLR repertoire is a unique combination when compared to all other vertebrates previously studied, being formed from a mix of classic mammalian and teleost TLRs, supplemented with TLR27 and the new TLR29. Our rooted phylogenies indicate that the ancestor of extant vertebrates possessed at least 15 TLRs, while the ancestor of jawed vertebrates possessed at least 19 TLRs (including three distinct TLR9 lineages), both of which are larger repertoires than possessed by modern 2R species (Figure 2 and Figure 2–figure supplement 3; Appendix). Unlike invertebrates, where both loss and expansion of TLRs have been extensive, our data suggest that many jawed vertebrate TLRs existed in the jawed vertebrate ancestor, with lineage-specific diversification of jawed vertebrate TLRs primarily resulting from differential loss (as well as genome duplications), supplemented by occasional gene duplication events.

Overall, our findings imply that the jawed vertebrate ancestor possessed a core set of PRRs that has largely shaped the PRR repertoires of modern jawed vertebrates. We propose the budding adaptive immune system formed alongside this core set of PRRs, with concomitant genome duplication-driven expansion of immunoregulatory genes leading these PRRs to become embedded within new, combined innate-adaptive immunity networks. Our results suggest that the impact of this on the propensity for large expansions is PRR type-specific, with expansion of NLRs being recurrent during jawed vertebrate evolution, and massive expansion of TLRs constrained without degeneration of the adaptive immune system.
Although reliance 440 upon innate immune receptors is offset in vertebrates due to the presence of the adaptive 441 immune system, our results suggest that differences in PRR repertoires between vertebrates 442 and invertebrates are driven by specific functional needs on a case-by-case basis. Thus, rather 443 than a simple replacement scenario, the interaction with the adaptive immune system, and 444 associated regulatory complexity, is likely a major factor restraining the proliferation of certain 445 vertebrate PRRs. Although there is clear evidence that PRR expansions can and do occur in 446 jawed vertebrates despite the presence of an adaptive immune system, further in-depth 447 analyses are needed to help better tease out the changes in tempo of PRR diversification 448 across animal phylogeny and whether this associates in any way with the emergence of 449 adaptive immunity. 450 451 Rates of functional genomic evolution and gigantism. Rates of genomic evolution vary 452 considerably across vertebrates, either across clades or in relationship to other biological 453 factors, including body size. We compared rates in two different aspects of genomic evolution 454 with potential functional relationship to gigantism in the whale shark, to those of other 455 vertebrates: rates of amino acid substitution in protein-coding genes, and rates of evolution in 456 gene family size. 457 Substitution rates across a set of single-copy orthologs varied across vertebrate 458 genomes, and these rates were relatively low in the whale shark compared to most other 459 vertebrates (Figure 3). We tested for different rates of substitution among vertebrate clades 460 using the two-cluster test implemented by LINTRE (Takezaki et al., 1995). Previous use of this 461 test to compare the elephant shark (Callorhinchus milii) genome to other vertebrates determined


462 that C. milii has a slower substitution rate than the coelacanth, teleosts, and 463 (Venkatesh et al., 2014), and that cartilaginous fishes are slower than bony vertebrates (Hara et 464 al., 2018). We also found that cartilaginous fishes have a significantly slower rate of substitution 465 (p = 0.0004). In addition, C. milii (p = 0.0004) was found to be significantly slower than sharks, 466 consistent with prior work (Weber et al., 2020). The whale shark was not significantly different in 467 rate compared to the brownbanded bamboo shark (p = 0.1802), its closest relative included in 468 the analysis. We also found that cartilaginous fishes had a lower rate of molecular evolution 469 compared to subsets of bony vertebrates including ray-finned fishes (p = 0.0004), 470 sarcopterygians (p = 0.001), the coelacanth (p = 0.0004), and tetrapods (p = 0.0088), but not 471 significantly slower than the spotted gar (p = 0.1416). In addition, we found some patterns 472 among other vertebrates consistent with previous studies on rates of molecular substitution, 473 including that ray-finned fishes evolved more rapidly than sarcopterygians (p = 0.0004) and that 474 spotted gar evolved more slowly than teleosts (p = 0.0004). 475 We then tested whether rates of molecular substitution differed on those branches 476 leading to gigantism among vertebrates. The origins of gigantism in elephants, whales, and the 477 whale shark have previously been shown to correspond to shifts in the rate or mode of body 478 size evolution (Pimiento et al., 2019; Puttick and Thomas, 2015; Slater et al., 2017). We 479 estimated time-varying rates of body size evolution in cartilaginous fishes using BAMM 480 (Rabosky et al., 2013). 
Consistent with previous research (Pimiento et al., 2019), we found that gigantism in whale shark corresponds to a discrete shift in the rate of body size evolution to five times the background rate in cartilaginous fishes (Appendix; Supplementary figure 3). Using PAML, we fit a model in which the rate of amino acid substitution on branches leading to vertebrate giants differed from that in other vertebrates. This model differed significantly from the strict clock (likelihood ratio test p = 1.76 × 10^-56), indicating a significantly different rate of molecular substitution in vertebrate giants. This finding is consistent with earlier evidence that larger-bodied taxa have lower rates of protein evolution (Martin and Palumbi, 1993). However, given that the whale shark genome did not appear to evolve significantly more slowly than the brownbanded bamboo shark genome (noted above), or other small-bodied sharks as found previously when focusing on fourfold degenerate sites (Hara et al., 2018), there may not be an additive effect on substitution rates in the whale shark genome as both a vertebrate giant and a cartilaginous fish. This implies that body size may have less effect on substitution rates in cartilaginous fishes, which are already slowly evolving overall, in contrast to the pattern seen in other vertebrates.

Rates of change in gene family sizes, due to gain and loss of gene copies within gene families, can also vary across species (Han et al., 2013). This represents another axis of genomic evolution that is potentially independent of substitution rates. We estimated rates of gene family size evolution for 10,258 gene families present in the MRCA of vertebrates using CAFE 4.2.1 (Han et al., 2013). Average global rates of gene gain and loss in vertebrates were estimated to be 0.0006092 gains/losses per million years.
We found that the rate of gene family 501 size evolution in giant vertebrates was significantly faster than in the remaining branches, 502 roughly double the rate in non-giant lineages (p < 0.002). Mean change in gene family size 503 shows that rates of gene family size evolution vary across all taxa including giant lineages, and 504 that an increase in gene family size evolution is not a consistent result of gigantism. However, 505 more complex models in which giant lineages were allowed to vary in rate did not converge.


These results suggest that the relationship between gigantism and substitution rates does not necessarily predict other forms of genomic evolution, including rates of gene family size evolution.

Replicated shifts in gene gain or loss for specific gene families in independent giant lineages might indicate the consistent effect of selection related to gigantism in particular functional genes. We inferred that 1,387 gene families had a rate shift in gene family size evolution on at least one branch in the vertebrate phylogeny (at p < 0.05, Supplementary file 9). For those gene families that had a rate shift, on average, around 7 independent rate shifts occurred among the vertebrate species considered. Gene families with any shift across the vertebrate phylogeny were enriched for ribosomal genes, as well as for a few dynein heavy chain gene families (Supplementary table 11). No gene families independently shifted in all giant taxa exclusive of other vertebrates, and only five gene families independently shifted in any giant taxa exclusive of other branches in the vertebrate phylogeny (Supplementary file 9). This indicates no consistent signal of selection for a rate shift in gene family size evolution for any particular gene family in the evolution of vertebrate giants.

Interestingly, the gene families that shifted in rates of gene gain and loss anywhere in vertebrate phylogeny were also enriched for human orthologs in the Cancer Gene Census (Fisher's exact test, odds ratio = 1.43, p = 0.00414) (Sondka et al., 2018), suggesting that these cancer-relevant gene families were also more likely to shift in their expansion/contraction rate across vertebrates. However, this analysis did not consider which branches the shifts occurred on.
We hence explored whether gene families that shifted in rate of gene family size evolution 527 across any branch leading to gigantism were enriched for cancer genes. Cancer suppression 528 has evolved by different mechanisms across mammalian lineages, including gene family 529 expansions (Tollis et al., 2017); for example, the duplication of tumor suppressor protein TP53 530 has been implicated in reduced cancer rates in proboscideans (elephants and relatives) relative 531 to other mammals (Sulak et al., 2016). By contrast, this same gene family is not expanded in 532 baleen whales (Tollis et al., 2017), thus it is already known the same gene families do not 533 expand in all mammalian giants, so we should not expect this to be the case when including 534 fishes. We confirmed a rate shift in gene family size evolution in TP53 in the lineage leading to 535 elephant, but this gene family also had a rate shift for TP53 along the branch leading to minke 536 whale (but not bowhead whale), as well as the non-giant bamboo shark. Therefore, we also 537 tested if all gene families that shifted along a branch leading to a vertebrate giant were enriched 538 for cancer-related genes, including gene families that shifted along non-giant branches. Here, 539 we found that all gene families that shifted in gene gain and loss rate along vertebrate giant 540 lineages (including gene families that shifted along other branches) were enriched for cancer 541 genes relative to all gene families that shifted in any branch in the vertebrate phylogeny (odds 542 ratio = 2.66, p = 0.00199), with over twice as many cancer-related gene families estimated to 543 have shifted in rate among all vertebrate lineages than the null expectation. That these gene 544 families were not enriched for any particular GO function or protein domain implies that cancer- 545 suppression can evolve through various mechanisms. 
For comparison, we also did the same 546 test but focusing on any gene families that shifted along branches leading to the non-giant 547 vertebrates sister to the giant vertebrates (i.e. brownbanded bamboo shark, pufferfishes, hyrax, 548 bottlenose dolphin), and found that there was no significant enrichment for cancer gene families 549 along those branches (odds ratio = 0.88, p = 0.624). Furthermore, we confirmed the significance


of the observed effect size by randomly drawing sets of branches across the vertebrate phylogeny to test for enrichment of cancer genes along random sets of branches, and found that the observed odds ratio of 2.66 was more extreme than 98% of random odds ratios (i.e. p = 0.02). This reinforces the finding that gene families with shifts in size evolution along giant branches are significantly enriched for cancer genes.

For the 1,387 gene families that had a significant rate shift in gene family size evolution, we then tested whether rates were significantly greater in cancer genes than in non-cancer genes, depending on whether or not branches led to giant taxa. By fitting a linear mixed model using lme4, we found a significantly higher rate of gene family size evolution along branches leading to giant taxa versus other branches (coefficient 0.0203, p = 5.95e-6), no effect of whether a gene was related to cancer on rates (coefficient -0.00245, p = 0.219), but a significant interaction of cancer-suppression function and gigantism (coefficient 0.0102, p = 8.51e-5), such that rates of gene family size evolution in cancer-related genes along branches leading to giant taxa are even higher than expected from the effect of being on a giant branch alone (Figure 4). The significantly higher rate of gene family size evolution in vertebrate giants is consistent with the genome-wide patterns estimated above. In these gene families where a rate shift occurred, the mean rate of gene family size evolution along branches leading to giant taxa was 3.32 times greater than the mean rate along other branches in cancer genes, but only 2.60 times greater in genes not related to cancer.
Overall, this suggests that the dynamics of cancer-related gene family size evolution among the sampled vertebrate taxa are driven by the evolution of gigantism.
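The enrichment and random-branch checks reported in this section can be illustrated with a minimal Python sketch. It is not the authors' code: the software used for Fisher's exact test is not stated, the one-sided (enrichment) alternative is an assumption, and the function names are hypothetical. Only the total of 1,387 shifted gene families comes from the text; the split of the 2x2 table and the list of null odds ratios are invented for illustration.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value: P(X >= a) with all margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as enriched as observed.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / total


def empirical_p(observed, null_values):
    """Share of null statistics that are at least as large as the observed one."""
    return sum(v >= observed for v in null_values) / len(null_values)


# Hypothetical counts (NOT from the paper); they only sum to the 1,387 families.
#                            cancer-related   not cancer-related
# shifted on a giant branch       a = 30           b = 270
# shifted elsewhere only          c = 40           d = 1047
a, b, c, d = 30, 270, 40, 1047
print("odds ratio:", odds_ratio(a, b, c, d))
print("one-sided p:", fisher_exact_greater(a, b, c, d))

# Hypothetical odds ratios from random sets of branches, compared with 2.66.
null_ors = [0.8, 1.1, 0.9, 2.9, 1.3]
print("empirical p:", empirical_p(2.66, null_ors))
```

With real data, each entry of `null_ors` would be the odds ratio recomputed for one randomly drawn set of branches, so the empirical p-value is simply the fraction of random draws that match or exceed the observed value.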

Conclusions
As a representative of cartilaginous fishes, a lineage for which only a few genomes have thus far been sequenced, the whale shark genome provides an important resource for vertebrate comparative genomics. The new long-read genome assembly reported here represents the best gapless genome assembly thus far among cartilaginous fishes. Comparison of the whale shark to other vertebrates not only expands the number of shared gene families that were ancestral to jawed vertebrates but also demonstrates that the early vertebrate genome duplications were accompanied by a burst in the evolution of novel genes. These early gene families are involved in a diversity of functions including reproduction, metabolism, development, and adaptive immunity. We also find differences in gene families implying functional genomic differences between bony vertebrates and cartilaginous fishes, including MAS1 and the NK gene cluster. With specific respect to genes involved in innate immune protection, we found divergent patterns of gene gain and loss between NLRs, RLRs, and TLRs, which provide insight into their repertoires in the jawed vertebrate ancestor. These results reject a scenario where the importance of PRRs is muted in vertebrates by the presence of adaptive immunity, instead indicating the ongoing necessity of ancient PRRs, which were integrated with the new adaptive immune system in the jawed vertebrate ancestor. Finally, we


demonstrated that rates of gene family size evolution and rates of substitution are decoupled in their relationship to gigantism, and that gene families that shifted in gene expansion and contraction rate leading to vertebrate giants were enriched for genes with cancer relevance. The whale shark genome helps to build a foundation in shark and vertebrate comparative genomics, which is useful to answer questions of broader vertebrate evolution and convergent evolution of distinctive traits. Further sequencing of high-quality elasmobranch genomes will continue to enhance research ranging from finding unique, whale shark-specific evolutionary change to illuminating broader patterns of vertebrate evolution.

Acknowledgements
The sequencing service was provided by the Norwegian Sequencing Centre (www.sequencing.uio.no), a national technology platform hosted by the University of Oslo and supported by the "Functional Genomics" and "Infrastructure" programs of the Research Council of Norway and the Southeastern Regional Health Authorities. We thank F. Thibaud-Nissen for assistance with genome annotation through NCBI RefSeq. We thank B. Morgan and the High Performance Computing oversight committee for access and assistance with the Center for Advanced Science Innovation and Commerce (CASIC) supercomputer at Auburn University, R. A. Petit III for assistance with computing at Emory University, and High Performance Computing at George Washington University for assistance with Colonial One and Pegasus. We thank the staff of Laboratory of Phyloinformatics in RIKEN BDR for transcriptome sequencing. Inference of gigantism in the whale shark was made possible by body size data kindly provided by C. Mull. We are thankful for funding provided by the Georgia Aquarium and the Emory School of Medicine Development. S. Koren and A.M. Phillippy were supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. Silhouettes used throughout via PhyloPic: Callorhinchus, CC BY-SA by Milton Tan, originally by Tony Ayling; whale shark, CC BY-SA by Scarlet23 (vectorized by T.
Michael Keesey); spotted 618 gar, tilapia, stickleback, CC BY-NC-SA by Milton Tan; Asian arowana, coelacanth, CC BY-NC- 619 SA by Maija Karala; clawed frog, anole, platypus, opossum, elephant, CC BY by Sarah 620 Werning; alligator, CC BY-NC-SA by Scott Hartman; mouse, CC BY-SA by David Liao; dolphin, 621 CC BY-SA by Chris Huh; catshark, brownbanded bamboo shark, white shark, zebrafish, pike, 622 cod, ricefish, ocean sunfish, chicken, armadillo, hyrax, human, dog, pig, cow, minke whale, 623 bowhead whale, public domain.

624 Author Contributions 625 M.T. contributed writing of the manuscript, experimental design, and implementing analyses of 626 genome polishing, phylogenomics, orthology determination, gene family evolution, RNAseq 627 mapping, and submission of genomic data to NCBI for accessioning and annotation by RefSeq. 628 A.K.R. and H.D. performed analyses and wrote the results, supplementary text, and methods 629 sections on innate immune genes. S.Koren and A.P. assembled the genome using Canu. R.N., 630 K.S., and S.Kuraku generated whale shark blood transcriptome data. T.D.R. and A.D.M.D. 631 initiated and managed the genome project. All authors provided feedback on the manuscript.


Methods
Genome sequence assembly and assessment. To improve on our earlier efforts to sequence and assemble the whale shark genome (Read et al., 2017), we generated PacBio long read sequences from the same DNA sample. These sequences are available on NCBI SRA under the accession SRX3471980. This resulted in 61.8 Gbp of sequences, equivalent to ~20x coverage. The initial assembly was performed using Canu 1.2 (Koren et al., 2017) with adjusted parameters to account for the lower input coverage:

    canu -p asm -d shark genomeSize=3.5g corMhapSensitivity=high corMinCoverage=2 errorRate=0.035

Illumina reads from all paired end read libraries from Read et al. (2017) were trimmed using Trimmomatic v0.39 with the following settings: ILLUMINACLIP:adapters.fa:2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 MINLEN:31, where adapters.fa is a fasta file containing all Illumina sequence adapters packaged with Trimmomatic (Bolger et al., 2014). Illumina reads from the single mate pair library from Read et al. (2017) were trimmed using NxTrim v0.4.3 using default settings (O’Connell et al., 2015). All Illumina reads were aligned to the genome using BWA-MEM (Li, 2013) v0.7.12-r1039 with default parameters, and alignments were used as input into Pilon v1.18 under default settings (Walker et al., 2014) to correct errors in the draft assembly. All reads were then used to scaffold the genome using Platanus 1.2.4 (Kajitani et al., 2014). The runs for each library were provided as separate input libraries to platanus scaffold so that insert sizes were treated as different for each library, and the resulting scaffolded assembly was passed to platanus gap_close, both with default settings. Genome size and quality statistics were computed using QUAST v5.0.2 on default settings (Gurevich et al., 2013) and compared to the published values from prior studies.
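Contiguity statistics such as N50 are central to the assembly comparison described here. As a rough illustration only, the Python sketch below computes N50 from a list of sequence lengths and breaks a scaffold into contigs at gaps. The function names are hypothetical, and splitting at every run of N is an assumption about how gaps are represented; the analyses in this study relied on QUAST and seqkit rather than on code like this.

```python
import re


def split_into_contigs(scaffold, min_gap=1):
    """Split a scaffold into contigs at runs of N/n of length >= min_gap."""
    parts = re.split(r"[Nn]{%d,}" % min_gap, scaffold)
    # Drop empty strings produced by leading or trailing gaps.
    return [p for p in parts if p]


def n50(lengths):
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy scaffolds (hypothetical sequences, for illustration only).
scaffolds = ["ACGTACGTNNNNACGT", "ACGTACGTACGT", "NNACGNNNNAC"]
contigs = [c for s in scaffolds for c in split_into_contigs(s)]
print("scaffold N50:", n50([len(s) for s in scaffolds]))
print("contig N50:", n50([len(c) for c in contigs]))
```

Raising `min_gap` would keep short runs of N inside a contig, which changes the contig N50; tools differ on this threshold, so values from different studies are only comparable when the same convention is used.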
The white shark genome paper did not report contig N50; we decomposed the scaffolds into contigs and determined N50 using seqkit v0.10.1 (Shen et al., 2016).

We performed k-mer analysis using Jellyfish version 2.2.6 on all Illumina reads using the -C setting to count canonical k-mers, a k-mer size (-m) of 21, and a hash size (-s) of 100M (Marçais and Kingsford, 2011). Then, we used GenomeScope to fit a model that allows for assembly-free estimation of genome size, heterozygosity, and repeat content (Vurture et al., 2017), providing the k-mer size of 21, read length of 100, and using the default maximum k-mer coverage of 1000 (GenomeScope accessed 2017 June 05). We used KAT v2.2.0 to plot the k-mers and visualize the copy number in the genome of k-mers in the raw Illumina read data (Mapleson et al., 2016). We first used kat comp to compare the Jellyfish k-mer counts to the genome assembly, and plotted the results using kat plot spectra-cn.

We assessed gene completeness with conserved vertebrate orthologs using BUSCO v2 (Simão et al., 2015) and CVG orthologs using gVolante (version 1.2.1; accessed 2019 April 23) (Hara et al., 2015), and by mapping RNA-seq reads (Appendix). The trimmed reads were then re-used to call SNPs to assess heterozygosity using freebayes under default settings (Garrison and Marth, 2012). We then used the vcflib tools vcffilter to filter the results for a minimum quality of 20 (-f "QUAL >20") and vcfstats to count the number of SNPs (Garrison et al., 2021).

Transcriptome sequencing. Approximately 30 million short read pairs for whale shark transcripts were obtained with paired-end 127 cycles from blood cells of a male and a female by the Illumina HiSeq 1500, as described previously (Hara et al., 2018). Animal handling and sample collections at Okinawa Churaumi Aquarium were conducted by veterinary staff without restraining the individuals, in accordance with the Husbandry Guidelines approved by the Ethics and Welfare Committee of Japanese Association of Zoos and Aquariums. Downstream handling of nucleic acids was conducted in accordance with the Guideline of the Institutional Animal Care and Use Committee (IACUC) of RIKEN Kobe Branch (Approval ID: H16-11). Transcriptome sequence data are available at NCBI BioProject ID PRJDB8472 and DDBJ DRA ID DRA008572.

Gene prediction. Gene predictions were provided to us by RefSeq using their genome annotation pipeline version 7.3 (Warren et al., 2017); details of the resulting annotation are publicly available (“Rhincodon typus Annotation Report,” 2018). This annotation included alignments of RNAseq data from grey bambooshark Chiloscyllium griseum kidney and spleen, nurse shark Ginglymostoma cirratum spleen and thymus, and brownbanded bambooshark Chiloscyllium punctatum retina, as well as protein alignments from , and RefSeq protein sequences for Asian arowana Scleropages formosus, coelacanth, spotted gar, zebrafish, clawed frog, and human. After preliminary orthology determination, we identified additional genes that are conserved among vertebrates but were absent in whale shark, which we annotated by aligning protein sequences for these genes from human, gar, coelacanth, and mouse (Table S3) to whale shark using genBLAST v1.39 (She et al., 2011) with these settings: -p genblastg -e 1e-5 -g T -gff -cdna -pro -pid (Appendix, Supplementary file 3).

Orthology inference. We identified orthologs from the whale shark genome by comparison to publicly available chordate genomes.
We compiled chordate proteomes for 32 species representing major vertebrate clades, the sea squirt Ciona intestinalis, and lancelet Branchiostoma floridae (Supplementary table 4). In selecting representative vertebrates, we specifically included the ocean sunfish, African elephant, and two baleen whale genomes (minke whale, bowhead whale), and the most closely related genomes available for these taxa (Takifugu rubripes and Dichotomyctere nigroviridis, rock hyrax, and bottlenose dolphin). These ortholog clusters were used for the identification of origins of gene families in chordate evolution and genes that originated in the most recent common ancestor of jawed vertebrates, studying enrichment or changes in functional annotation associated with these orthogroups, phylogenomics, estimation of rates of molecular substitution, and estimation of rates of gene duplication and loss.

Ortholog clusters from proteomes were determined using OrthoFinder v2.2.6 with default settings (Emms and Kelly, 2015). With the resulting hits, OrthoFinder adjusts scores for reciprocal best hits while accounting for gene length bias and phylogenetic distance, then proceeds with clustering genes into orthogroups. Preliminary orthology determination suggested many potential missing orthologs in the elephant shark and whale shark genomes. We thus performed orthology-based annotation using genBLAST (She et al., 2011) as noted under Gene prediction methods, added newly identified proteins to the proteomes of whale shark and Callorhinchus, and reran the OrthoFinder pipeline including these proteins.

All proteins were then annotated for Gene Ontology (GO) and Pfam terms using InterProScan 5.32-71.0 (Jones et al., 2014), and representative annotations were assigned to each chordate orthogroup using KinFin 1.0 with the --infer-singletons option on to interpret gene families absent from clusters as singletons, then running the functional_annotation_of_clusters.py script packaged with KinFin under default settings, which assigns an annotation to a gene family if at least 75% of proteins in the gene family have that annotation, and at least 75% of taxa within the cluster have a protein with that annotation (Laetsch and Blaxter, 2017) (Supplementary file 6).

Gene Family Origin and Loss. A custom R script is provided for analyses run for this section (Supplementary file 10). To infer when gene families (as inferred from OrthoFinder) were gained and lost in vertebrate evolution, we mapped the origins and losses of gene families to the species tree parsimoniously, assuming that gene families have a single origin, but can be lost (Laetsch and Blaxter, 2017). We were then able to count the number of gene families present at the MRCA of nodes, the number of novel gene families that originated along each branch, and the number of gene families lost along each branch (including gene families uniquely lost along each branch). We also determined the number of gene families conserved in all descendants (core genes), and the number of novel gene families conserved in all descendants (novel core genes).

To confirm that the number of novel gene families in the jawed vertebrate ancestor was not inflated by artefactual oversplitting of ohnologs (gene duplicates that arose from two rounds of whole genome duplication early in vertebrate evolution), we used the ohnologs that Singh et al. (2015) independently identified in a subset of vertebrate genomes using a synteny-aware method. We compared our assignment of human orthologs to gene families to the assignment of human orthologs to Singh et al. ohnolog families (downloaded 2020 June 2).
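
The parsimony mapping of gene family origins and losses described in this section can be sketched as follows. This is a minimal Python illustration rather than the custom R script in Supplementary file 10; the toy species tree, clade labels, and function names are hypothetical. Under the single-origin assumption, a family originates at the MRCA of the species that carry it, and every maximal clade below that MRCA that lacks the family counts as one loss.

```python
# Hypothetical toy species tree stored as child -> parent (not the study's tree).
PARENT = {
    "human": "Osteichthyes", "zebrafish": "Osteichthyes",
    "whale_shark": "Chondrichthyes", "elephant_shark": "Chondrichthyes",
    "Osteichthyes": "Gnathostomata", "Chondrichthyes": "Gnathostomata",
    "Gnathostomata": "Vertebrata", "lamprey": "Vertebrata",
    "Vertebrata": None,
}


def children(node):
    return [child for child, parent in PARENT.items() if parent == node]


def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def leaves_under(node):
    kids = children(node)
    if not kids:
        return {node}
    leaves = set()
    for kid in kids:
        leaves |= leaves_under(kid)
    return leaves


def origin(present):
    """MRCA of all species carrying the family (single-origin assumption)."""
    paths = [path_to_root(species) for species in present]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from any one species, the first shared ancestor is the MRCA.
    return next(node for node in paths[0] if node in shared)


def losses(present):
    """Maximal clades below the origin in which no species carries the family."""
    lost, stack = [], [origin(present)]
    while stack:
        node = stack.pop()
        if leaves_under(node).isdisjoint(present):
            lost.append(node)  # whole clade lacks the family: one loss event
        else:
            stack.extend(children(node))
    return sorted(lost)


family = {"human", "zebrafish", "lamprey"}
print(origin(family), losses(family))
```

Counting origins per node and losses per branch over all orthogroups then gives the tallies of novel and lost gene families described in the text.
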
We determined whether our gene families and Singh et al. ohnolog families matched and whether human orthologs were assigned to a single gene family or ohnolog. We replicated this for green anole, spotted gar, zebrafish, and possum ohnologs. To find the common genes between the Ensembl protein IDs we clustered and the Ensembl gene IDs provided by Singh et al., we used the R package biomaRt v2.45.2 to translate identifiers (Durinck et al., 2005).

Based on the representative annotations for each orthogroup determined above, we then determined whether groups of gene families that were gained or lost along branches in the vertebrate phylogeny were enriched for certain functions using a Fisher's Exact Test. Within each comparison, we adjusted the p-value to correct for multiple hypothesis testing by the Benjamini-Hochberg method using the p.adjust function in R (Wright, 1992). Corrected p-values under the BH method can be interpreted at a significance threshold that is equivalent to the false discovery rate. We considered functions enriched at an adjusted p-value threshold of 0.05, corresponding to a false discovery rate of 0.05.

Innate Immunity Analyses.

Homology identification: Sequence similarity searches were performed using BLAST+ v2.6.0 to identify putative homologs of TLRs, NLRs and RLRs (Altschul et al., 1990). An alternative approach using profile hidden Markov models, HMMER [version 3.1] (Eddy, 1998), was also tested for TLRs; the results obtained were identical, except that BLAST returned an additional putative TLR. Because of this, the HMMER results were not used in subsequent analyses, and HMMER was not applied elsewhere (Eddy, 1998). Searches for whale shark TLR and RLR homologs were performed using all other sequences present in the TLR and RLR trees. Retention of sequences for further analyses was reliant on a reciprocal BLAST hit to a TLR or RLR in the Swissprot reviewed database or the NCBI non-redundant protein set (Boeckmann et al., 2003).

For NLRs, detection is more complicated, as some NLRs do not contain computationally detectable NACHT domains (i.e. some family members, even in humans, are false negatives in domain-based search tools and databases), despite the NACHT domain being the defining feature of NLR family members. Further, some of these genes contain other domains and are also included in other gene families where most members do not contain NACHT domains. As such, for the main analysis performed here, those sequences in the predicted proteome and translated transcriptome containing a predicted NACHT domain according to the NCBI CD-search webserver (Marchler-Bauer et al., 2015) are noted as such (and should be considered as the conservative set of whale shark NLR-like sequences). Additional sequences from the predicted protein set with a BLAST hit to known NLRs were also included to permit detection of potential orthologs of NLRs not found in the conservative set with definite/detectable NACHT domains. Proteins containing the closely related NB-ARC domain were also extracted from the whale shark proteome.

In cases where a transcript matches the genomic location of a predicted protein, the predicted protein is the sequence reported. Where multiple predicted proteins refer to the same genomic location, only a single sequence is retained for further analysis.

Phylogenetic Datasets: For NLRs, we performed phylogenetic analyses of the putative whale shark NLRs together with known NLRs from human and zebrafish, both of which are highly phylogenetically relevant and well-studied in this regard.
Proteins containing the closely related NB-ARC domain were used as an outgroup in NLR analyses, along with human APAF-1, which also harbors an NB-ARC domain (Urbach and Ausubel, 2017).

To better understand RLR and MAVS evolution we used two datasets. The first of these was based on the central DEAD-helicase domains (hence, excluding MAVS) to define which of the three RLR proteins could be found in whale shark, and infer the jawed vertebrate RLR repertoire, also following Mukherjee et al. (2014). For the RLR datasets, members of each of the three vertebrate RLR families, some invertebrate RLRs, and a selection of DICER protein sequences as an outgroup (Mukherjee et al., 2014) were gathered to generate a phylogenetically informative dataset (i.e. aiming to include representatives of each of the major vertebrate classes for which genome data were available). Full length proteins were aligned for phylogenetic analysis of DEAD-Helicase domains, and trimmed to the start and end of these domains based on the three human RLR sequences (Mukherjee et al., 2014). The second dataset was based on individual CARD domains, as the presence of two CARD domains in RIG-I and MDA5 is thought to have come about through independent domain duplication in each lineage, which would mislead phylogenetic analyses if ignored (Korithoski et al., 2015). The same process as for DEAD-Helicase domains was performed for the CARD domains (Korithoski et al., 2015).

For the TLR dataset, a large set of TLR nucleotide sequences was taken from a past study that densely sampled vertebrates (Wang et al., 2016). TLR sequence from grey bamboo shark (Chiloscyllium griseum) was also included (Krishnaswamy Gopalan et al., 2014). Following trimming, the alignment consisted almost entirely of sites from the TIR domain, so TIR domains were not specifically extracted for this analysis.

For the NLR analysis, the described set of human NLRs and NACHT domain-containing proteins, as well as the closely related NB-ARCs as an outgroup (Urbach and Ausubel, 2017), were downloaded from the NCBI protein database. Sequences of zebrafish, where NLRs are massively expanded (Howe et al., 2016; Stein et al., 2007), were also included in this analysis, but these were downloaded from the InterPro website (i.e. all Danio rerio proteins containing a NACHT domain) (Hunter et al., 2009). A very large number of zebrafish sequences were obtained, so to reduce the prevalence of pseudo-replicate sequences (that are likely to be uninformative in the context of understanding the whale shark NLR repertoire), CD-HIT (Fu et al., 2012) was used to cluster zebrafish sequences with greater than 75% identity prior to phylogenetic analysis. An additional NLR analysis was performed focusing specifically on NOD1s; this employed NOD2 as an outgroup based on the larger NLR analysis and included NOD1s identified by BLAST in elephant shark. Notably, our NLR analysis relies on poorer taxon sampling compared to that for the RLR and TLR datasets. This is due to a relative paucity of previously characterized NLR repertoires across vertebrate species. Importantly, although this does not lend itself well to understanding the tempo of lineage-specific gene family expansion/contraction, it does not preclude detection of such events along the lineages leading to the species included in the analysis.
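
The redundancy reduction applied to the zebrafish sequences can be pictured as greedy incremental clustering: sequences are visited from longest to shortest, and each one either joins the first representative it matches above the identity threshold or founds a new cluster. The sketch below is a simplified Python illustration in that spirit, not CD-HIT itself; the identity measure (matching blocks from difflib divided by the shorter sequence length) is a crude stand-in for alignment-based identity, and the sequences and names are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Approximate identity: matched residues / length of the shorter sequence.
    This is a stand-in for alignment-based identity, not CD-HIT's definition."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs, threshold=0.75):
    """Cluster sequences greedily, longest first.
    Returns a dict mapping each representative to its cluster members."""
    clusters = {}
    by_length = sorted(seqs.items(), key=lambda item: len(item[1]), reverse=True)
    for name, seq in by_length:
        for representative in clusters:
            if identity(seqs[representative], seq) > threshold:
                clusters[representative].append(name)
                break
        else:
            clusters[name] = [name]  # no match: this sequence founds a cluster
    return clusters


# Hypothetical toy sequences.
toy = {"nlr_a": "MKTLLVAGGG", "nlr_b": "MKTLLVAGG", "nlr_c": "WWWWWWPPPP"}
print(greedy_cluster(toy))
```

Only the representative of each cluster would then be carried forward into the alignment, which is what removes near-identical pseudo-replicates.
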
Multiple sequence alignment and phylogenetic analyses: Multiple sequence alignments were generated with MAFFT (version 7.313) (Katoh and Standley, 2013) using default parameters for the larger TLR and NLR datasets, but using the more intensive L-INS-i method for RLRs and the focused NOD1 NLR dataset. trimAl (version 1.2rev59) (Capella-Gutierrez et al., 2009) was applied to remove gap-rich sites, which are often poorly aligned, from the alignments using the “gappyout” algorithm. BMGE (version 1.12) (Criscuolo and Gribaldo, 2010) was then used to help minimize the number of saturated sites in the remaining alignment (as identified using the BLOSUM30 matrix). The RLR analyses were not subjected to this BMGE analysis, as these were derived from conserved domains (meaning that alignments were based on relatively conserved sequence tracts and were already quite short). The NOD1-focused NLR alignments were judged to contain relatively similar sequences and were not subjected to either trimAl or BMGE analyses. Phylogenetic analyses were performed in IQ-TREE (version: omp-1.5.4) (Nguyen et al., 2015) using 1000 ultrafast bootstrap replicates (Minh et al., 2013) and the best-fitting model of amino acid substitution. Best-fitting substitution models were determined according to the Bayesian information criterion with ModelFinder from the IQ-TREE package (Kalyaanamoorthy et al., 2017), and ultrafast bootstrap support was computed to assess branch support (Hoang et al., 2017). The following (best-fitting) models were applied for each dataset: LG+I+G for the RLR CARD domains dataset, LG+I+F+G for the RLR DEAD-Helicase domains dataset, JTT+I+F+G for the TLR dataset, JTT+F+G for the NLR dataset, and JTT+I+F+G for the NOD1-focused NLR dataset. The trees were rooted by outgroup, except for the TLR tree, which was rooted using the minimal ancestor deviation method (Tria et al., 2017).
This is unlike many other TLR trees produced in previous studies, which are unrooted (Roach et al., 2005; Wang et al., 2016).

Phylogenomics. Orthogroups were filtered to single-copy orthologs for phylogenomic analyses. We determined orthologs from orthogroups by reconstructing orthogroup trees and used tree-based orthology determination using the UPhO pipeline (Ballesteros and Hormiga, 2016). The paMATRAX+ pipeline bundled with UPhO was used to perform alignment (mafft version 7.130b), mask gaps (trimAl version 1.2), remove sequences containing too few unambiguous sites, and check that the minimum number of taxa is present (using the Al2Phylo script, part of UPhO), and then reconstruct phylogenies (IQ-TREE v1.6.10) (Capella-Gutierrez et al., 2009; Katoh and Standley, 2013; Nguyen et al., 2015). Next, we used UPhO to extract orthologs by identifying all maximum inclusive subtrees from orthogroups with at least five species, with the allowance for in-paralogs (paralogs that arose after all species divergences in the phylogeny, and thus do not affect relative relationships in the phylogeny), and retained the longest in-paralogous sequence for each species within each ortholog. For each single-copy ortholog, we aligned, trimmed, and sanitized sequences using the paMATRAX+ pipeline.

To select the most reliable sequences for inferring a phylogenomic time tree, we filtered for the most informative loci using MARE version 0.1.2-rc with default settings except -t (taxon weight) set to 10 to weight retaining taxa in the alignment higher than retaining loci (Misof et al., 2013). Next, orthologs without lamprey, Callorhinchus, whale shark, Branchiostoma, and Ciona were excluded. We also filtered down to loci that supported the monophyly of vertebrate, gnathostome, chondrichthyan, and osteichthyan clades. After our filtering we were left with an alignment comprising 281 loci and 209,275 residues.
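
The taxon-occupancy filter and the resulting summary (number of loci and alignment length) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the taxon labels, the toy alignments, and the reading of the filter as requiring all five listed taxa in every retained locus are assumptions, and the monophyly filter is not reproduced here.

```python
# Taxa assumed to be required in every retained locus (labels are illustrative).
REQUIRED_TAXA = {"lamprey", "Callorhinchus", "whale_shark", "Branchiostoma", "Ciona"}


def has_required_taxa(alignment, required=REQUIRED_TAXA):
    """alignment maps taxon name -> aligned sequence for one locus."""
    return required.issubset(alignment)


def summarize(loci):
    """Return (number of loci, total number of aligned columns)."""
    n_columns = sum(len(next(iter(aln.values()))) for aln in loci)
    return len(loci), n_columns


# Hypothetical toy loci; the second one lacks most required taxa and is dropped.
loci = [
    {"lamprey": "MKV", "Callorhinchus": "MKV", "whale_shark": "MRV",
     "Branchiostoma": "LKV", "Ciona": "LKI", "human": "MKV"},
    {"whale_shark": "AAT", "human": "AAA"},
]

kept = [aln for aln in loci if has_required_taxa(aln)]
n_loci, n_residues = summarize(kept)
print(n_loci, n_residues)
```
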
We concatenated the sequences and selected the best model of amino acid substitution and partitioning scheme and inferred a maximum likelihood phylogeny using IQ-TREE v1.6.10 (Hoang et al., 2017; Kalyaanamoorthy et al., 2017; Nguyen et al., 2015) with the following settings: -bb 1000 -bnni -m MFP+MERGE -rcluster 10. The tree was rooted using the amphioxus Branchiostoma. The phylogeny was largely consistent with the consensus arising from phylogenomic studies. We also inferred a phylogeny accounting for incomplete lineage sorting using ASTRAL v5.7.1 (Zhang et al., 2018) based on gene trees (not shown), which was identical in topology except for the placement of armadillo (Xenarthra) sister to Afrotheria in the ASTRAL tree versus armadillos sister to Boreoeutheria in IQ-TREE. This relationship has historically been difficult to reconstruct and is consistent with prior conflicts between concatenated and coalescent-based analysis on the placement of Xenarthra with far more taxa (Esselstyn et al., 2017). In addition, none of our focal results are reported within mammals, where the relationships of xenarthrans could be relevant.

Numerous fossil-based node calibrations were identified from the literature. Most node ages were derived from age ranges published in the Fossil Calibration Database (Benton et al., 2014) and are listed in Supplementary table 12. While previously the age of crown Chondrichthyes (here, the MRCA of Holocephali + Elasmobranchii) has been suggested to range from 333.56–422.4 Ma, the minimum age was recently pushed further back to 358 Ma based on multiple holocephalan fossils (Coates et al., 2017). To assess the concordance of the fossil calibrations, we used treePL version 1.0 to estimate divergence times from the ML tree with each fossil calibration using penalized likelihood, then performed cross-validation and evaluated the concordance of the fossils to the time tree to identify and exclude outliers (Near et al., 2005). After excluding fossils that were discordant with the others, we estimated divergence times using treePL with the remaining fossil calibrations. The final treePL config file is provided (Supplementary file 10).

Tests for Rates of Substitution. Based on our aligned matrix from single-copy orthologs used for phylogenomics, we tested for differences in rates of molecular substitution between vertebrates by using the two-cluster test implemented in LINTRE (2010 Apr 17 version) (Takezaki et al., 1995), using amino acid p-distances between taxa to estimate branch lengths. The two-cluster test is designed to test if the rates in two clades are significantly different by comparison to an outgroup. We tested rates on the full tree, and also focused on certain cluster pairs by subsetting the dataset to specific clades for comparison. Sequences were converted from fasta format to phylip format using pxs2phy from phyx v1.01 (Brown et al., 2017).

We also compared rates of genomic evolution of four independent instances of vertebrate gigantism (whale shark, elephant, baleen whales, ocean sunfish) relative to the background rate of molecular evolution among vertebrates. To do this, we used PAML 4.9i to compute the likelihoods of the alignment of single-copy orthologs used for phylogenomics under two different models of molecular evolution (Yang, 2007). We computed the likelihood of the data under a strict clock model (single-rate model) and under a local clock model (two-rate model) where the clock rate differed on branches leading to vertebrate giants. We then determined significance using the likelihood ratio test. PAML control files are provided (Supplementary file 10).

Rates of Gene Family Size Evolution. We estimated rates of gene family expansion and contraction across vertebrates among gene families. OrthoFinder output includes counts of the size of each orthogroup (i.e. gene family) for each species. We analyzed the evolution of gene family size under a birth-death process using CAFE version 4.2.1 (Han et al., 2013), with the gene family size evolutionary rate parameter λ. We focused on gene families present in the most recent common ancestor of vertebrates, retained only those present in at least two species, and excluded gene families that exceed 100 copies in any species, as the variance of large gene families is too high for consistent rate parameter estimation, resulting in 10,258 gene families. We used the caferror.py script to estimate species-by-species error rates in the annotation to improve the accuracy of rate estimation (Han et al., 2013). We used a time-calibrated phylogeny of vertebrates for this analysis (see above). We provide scripts used for running CAFE (Supplementary file 10).

We estimated rates of gene duplication and loss across vertebrates under a single-λ model and two multi-λ models: a two-λ model where branches leading to gigantism had a second rate, and a five-λ model where the rate categories were the background and a separate rate for each of the four independent origins of gigantism. However, the five-λ model did not converge. To test for significance of the observed difference in likelihoods between the two-λ model and the single-λ model, we simulated gene family evolution with 500 replicates under these models and estimated the log-likelihood ratios from this null, simulated distribution. The p-value corresponds to the proportion of simulated replicates which had a smaller log-likelihood ratio than observed. When fitting the λ model, CAFE 4 additionally computes rates of duplication and loss along each branch for each gene family, and tests whether significant rate shifts occur along each branch (Supplementary file 10). P-values < 0.05 indicate a significant shift in the rate of gene family size evolution.
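
The simulation-based significance test for the two-λ model can be summarized in a few lines of Python. This is a sketch under stated assumptions rather than the CAFE scripts in Supplementary file 10: the likelihood ratio is assumed to be defined as 2 × (lnL of the single-λ model − lnL of the two-λ model), so that more negative values favor the two-λ model, and all log-likelihood values shown are hypothetical.

```python
def likelihood_ratio(lnl_single, lnl_multi):
    """Assumed convention: 2 * (lnL_single - lnL_multi).
    More negative values indicate stronger support for the multi-lambda model."""
    return 2.0 * (lnl_single - lnl_multi)


def empirical_p(observed, simulated):
    """Proportion of simulated likelihood ratios smaller than the observed one."""
    return sum(1 for ratio in simulated if ratio < observed) / len(simulated)


# Hypothetical values for illustration only.
observed_ratio = likelihood_ratio(lnl_single=-100.0, lnl_multi=-95.0)
simulated_ratios = [-1.0, -2.0, -12.0, -0.5]  # the study used 500 replicates

print(observed_ratio, empirical_p(observed_ratio, simulated_ratios))
```

With 500 simulated replicates, the smallest non-zero p-value this procedure can return is 1/500 = 0.002.
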

We identified gene families that had shifted in rate and tested whether they were enriched for GO and Pfam terms (as above). We also tested for enrichment of gene families including human orthologs related to cancer. Cancer-related gene families were determined by downloading the genes from the COSMIC Cancer Gene Census (Sondka et al., 2018) and determining which orthogroups included the human ortholog based on the Ensembl gene identifier provided by the CGC (database version 91 2020 April 07, accessed 2020 June 3). Ensembl gene ENSG identifiers were matched to the Ensembl protein ENSP identifiers (which we used for orthogroup determination) using biomaRt version 2.45.2, database accessed 2020 Sep 9 (Durinck et al., 2005).

We also tested whether gene families that shifted in expansion and contraction rate along branches leading to giant taxa were enriched for cancer genes. Six branches were tested in this set of focal branches relative to all other branches in the vertebrate phylogeny: the branch leading to whale shark, the branch leading to ocean sunfish, the branch leading to African elephant, and the branches corresponding to the clade of baleen whales (the clade sister to bottlenose dolphin). To confirm whether the results were more extreme than expected, we performed two tests. First, we drew the branches corresponding to the non-giant sister taxa of the vertebrate giants, then tested whether these were enriched for cancer genes. Second, we tested for cancer gene enrichment on 100 permutations of selecting six random branches without replacement from across the vertebrate phylogeny. We then compared the observed odds ratio of enrichment for cancer genes to this null distribution.
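
The enrichment test and the branch permutation can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis scripts: the 2×2 table is assumed to cross "family shifted on any branch in the set" with "family is cancer-related", a continuity correction of 0.5 is added in the permutation so that empty cells do not cause division by zero, and the branches, families, and random seed are hypothetical.

```python
import random
from math import comb


def odds_ratio(a, b, c, d, correction=0.0):
    """a: shifted & cancer, b: shifted & non-cancer,
    c: unshifted & cancer, d: unshifted & non-cancer."""
    return ((a + correction) * (d + correction)) / ((b + correction) * (c + correction))


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), from the hypergeometric null."""
    n_row, n_col, total = a + b, a + c, a + b + c + d
    upper = min(n_row, n_col)
    tail = sum(comb(n_col, k) * comb(total - n_col, n_row - k)
               for k in range(a, upper + 1))
    return tail / comb(total, n_row)


def branch_set_odds_ratio(branches, shifted, cancer, families):
    """Odds ratio of cancer enrichment among families shifted on any given branch."""
    hit = set().union(*(shifted[b] for b in branches))
    a, b = len(hit & cancer), len(hit - cancer)
    c, d = len((families - hit) & cancer), len(families - hit - cancer)
    return odds_ratio(a, b, c, d, correction=0.5)


# Hypothetical toy data: 200 families, 20 cancer-related, 30 branches.
rng = random.Random(1)
families = set(range(200))
cancer = set(range(20))
branches = ["branch%d" % i for i in range(30)]
shifted = {b: set(rng.sample(range(200), 15)) for b in branches}

focal = branches[:6]  # stand-in for the six focal branches
observed = branch_set_odds_ratio(focal, shifted, cancer, families)
null = [branch_set_odds_ratio(rng.sample(branches, 6), shifted, cancer, families)
        for _ in range(100)]
p_perm = sum(x >= observed for x in null) / len(null)
print(observed, p_perm)
```

Comparing the observed odds ratio to odds ratios from random branch sets asks whether enrichment is specific to the focal branches rather than a general property of shifted gene families.
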
We then compared the rate of gene family size evolution for gene families related to cancer to rates of gene families not related to cancer along branches leading to giant vertebrates and the remaining branches in the phylogeny. Branch-wise rates of gene family size evolution were estimated by computing the difference in estimated ancestral and descendant gene family sizes of each branch and dividing by time. We also used the lme4 and lmerTest packages to fit a linear mixed model and test whether rate depended on whether it was estimated for a cancer gene family or not, whether it was estimated on a branch leading to a giant taxon or not, and the interaction of these variables, with gene family as a random effect.
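
The branch-wise rate calculation can be written out explicitly. The following Python sketch is illustrative only: the sign convention (descendant minus ancestral size, so that expansions are positive), the record layout, and all values are assumptions, and the mixed model itself (fitted with lme4 and lmerTest in R) is not reproduced here.

```python
from statistics import mean


def branch_rate(ancestral_size, descendant_size, branch_time):
    """Change in gene family size along a branch per unit time.
    Assumed sign convention: positive values are expansions."""
    return (descendant_size - ancestral_size) / branch_time


# Hypothetical records:
# (gene family, cancer-related?, giant branch?, ancestral size, descendant size, time in Myr)
records = [
    ("OG1", True, True, 4, 6, 50.0),
    ("OG1", True, False, 4, 4, 80.0),
    ("OG2", False, True, 3, 2, 50.0),
    ("OG2", False, False, 3, 3, 80.0),
]

# Group rates by the two fixed effects (cancer status, giant branch).
rates_by_group = {}
for family, is_cancer, is_giant, anc, desc, time in records:
    rates_by_group.setdefault((is_cancer, is_giant), []).append(
        branch_rate(anc, desc, time))

mean_rates = {group: mean(values) for group, values in rates_by_group.items()}
print(mean_rates)
```

Each record corresponds to one row of the table passed to the mixed model, with the two Boolean columns as fixed effects and the gene family identifier as the grouping factor for the random effect.
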

References

Abegglen LM, Caulin AF, Chan A, Lee K, Robinson R, Campbell MS, Kiso WK, Schmitt DL, Waddell PJ, Bhaskara S, Jensen ST, Maley CC, Schiffman JD. 2015. Potential Mechanisms for Cancer Resistance in Elephants and Comparative Cellular Response to DNA Damage in Humans. JAMA 314:1850–1860.
Alexopoulos H, Böttger A, Fischer S, Levin A, Wolf A, Fujisawa T, Hayakawa S, Gojobori T, Davies JA, David CN, Bacon JP. 2004. Evolution of gap junctions: the missing link? Curr Biol 14:R879–80.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410.
Ballesteros JA, Hormiga G. 2016. A New Orthology Assessment Method for Phylogenomic Data: Unrooted Phylogenetic Orthology. Mol Biol Evol 33:2481.
Benton MJ, Donoghue PCJ, Asher RJ, Friedman M, Near TJ, Vinther J. 2014. Constraints on the timescale of animal evolutionary history. Palaeontologia Electronica 18.1.1FC:1–106.
Boeckmann B, Bairoch A, Apweiler R, Blatter M-C, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O’Donovan C, Phan I, Pilbout S, Schneider M. 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31:365–370.
Boehm T. 2012. Evolution of vertebrate immunity. Curr Biol 22:R722–32.
Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120.
Boot-Handford RP, Tuckwell DS. 2003. Fibrillar collagen: the key to vertebrate evolution? A tale of molecular incest. Bioessays 25:142–151.
Boudinot P, Zou J, Ota T, Buonocore F, Scapigliati G, Canapa A, Cannon J, Litman G, Hansen JD. 2014. A -like repertoire of innate immune receptors and effectors for coelacanths. J Exp Zool B Mol Dev Evol 322:415–437.
Braasch I, Gehrke AR, Smith JJ, Kawasaki K, Manousaki T, Pasquier J, Amores A, Desvignes T, Batzel P, Catchen J, Berlin AM, Campbell MS, Barrell D, Martin KJ, Mulley JF, Ravi V, Lee AP, Nakamura T, Chalopin D, Fan S, Wcisel D, Cañestro C, Sydes J, Beaudry FEG, Sun Y, Hertel J, Beam MJ, Fasold M, Ishiyama M, Johnson J, Kehr S, Lara M, Letaw JH, Litman GW, Litman RT, Mikami M, Ota T, Saha NR, Williams L, Stadler PF, Wang H, Taylor JS, Fontenot Q, Ferrara A, Searle SMJ, Aken B, Yandell M, Schneider I, Yoder JA, Volff J-N, Meyer A, Amemiya CT, Venkatesh B, Holland PWH, Guiguen Y, Bobe J, Shubin NH, Di Palma F, Alföldi J, Lindblad-Toh K, Postlethwait JH. 2016. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nat Genet 48:427–437.
Brayer KJ, Segal DJ. 2008. Keep your fingers off my DNA: protein-protein interactions mediated by C2H2 zinc finger domains. Cell Biochem Biophys 50:111–131.
Brown GD, Willment JA, Whitehead L. 2018. C-type lectins in immunity and homeostasis. Nat Rev Immunol 18:374–389.
Brown JW, Walker JF, Smith SA. 2017. Phyx: phylogenetic tools for unix. Bioinformatics 33:1886–1888.
Buckley KM, Rast JP. 2015. Diversity of animal immune receptors and the origins of recognition complexity in the deuterostomes. Dev Comp Immunol 49:179–189.
Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973.
Chen H, Liang Y, Han Y, Liu T, Chen S. 2021. Genome-wide analysis of Toll-like receptors in zebrafish and the effect of rearing temperature on the receptors in response to stimulated pathogen infection. J Fish Dis 44:337–349.
Coates MI, Gess RW, Finarelli JA, Criswell KE, Tietjen K. 2017. A symmoriiform chondrichthyan braincase and the origin of chimaeroid fishes. Nature 541:208–211.
Criscuolo A, Gribaldo S. 2010. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol 10:210.
Dooley H. 2014. Athena and the Evolution of Adaptive Immunity. Immunobiology of the Shark 29.
Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, Huber W. 2005. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21:3439–3440.
Eddy SR. 1998. Profile hidden Markov models. Bioinformatics 14:755–763.
Emms DM, Kelly S. 2015. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol 16:157.
Esselstyn JA, Oliveros CH, Swanson MT, Faircloth BC. 2017. Investigating Difficult Nodes in the Placental Mammal Tree with Expanded Taxon Sampling and Thousands of Ultraconserved Elements. Genome Biol Evol 9:2308–2321.

Flajnik MF, Kasahara M. 2010. Origin and evolution of the adaptive immune system: genetic events and selective pressures. Nat Rev Genet 11:47–59.
Fritz JH, Kufer TA. 2015. Editorial: NLR-Protein Functions in Immunity. Front Immunol 6:306.
Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152.
Garrison E, Kronenberg ZN, Dawson ET, Pedersen BS, Prins P. 2021. Vcflib and tools for processing the VCF variant call format. bioRxiv. doi:10.1101/2021.05.21.445151
Garrison E, Marth G. 2012. Haplotype-based variant detection from short-read sequencing. arXiv [q-bioGN].
Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29:1072–1075.
Han MV, Thomas GWC, Lugo-Martinez J, Hahn MW. 2013. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol Biol Evol 30:1987–1997.
Haq F, Ahmed N, Qasim M. 2019. Comparative genomic analysis of collagen gene diversity. 3 Biotech 9:83.
Hara Y, Tatsumi K, Yoshida M, Kajikawa E, Kiyonari H, Kuraku S. 2015. Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation. BMC Genomics 16:977.
Hara Y, Yamaguchi K, Onimaru K, Kadota M, Koyanagi M, Keeley SD, Tatsumi K, Tanaka K, Motone F, Kageyama Y, Nozu R, Adachi N, Nishimura O, Nakagawa R, Tanegashima C, Kiyatake I, Matsumoto R, Murakumo K, Nishida K, Terakita A, Kuratani S, Sato K, Hyodo S, Kuraku S. 2018. Shark genomes provide insights into elasmobranch evolution and the origin of vertebrates. Nat Ecol Evol 2:1761–1771.
Hoang DT, Chernomor O, von Haeseler A, Minh BQ, Le SV. 2017. UFBoot2: Improving the Ultrafast Bootstrap Approximation. Mol Biol Evol. doi:10.1093/molbev/msx281
Howe K, Schiffer PH, Zielinski J, Wiehe T, Laird GK, Marioni JC, Soylemez O, Kondrashov F, Leptin M. 2016. Structure and evolutionary history of a large family of NLR proteins in the zebrafish. Open Biol 6:160009.
Huang S, Yuan S, Guo L, Yu Y, Li J, Wu T, Liu T, Yang M, Wu K, Liu H, Ge J, Yu Y, Huang H, Dong M, Yu C, Chen S, Xu A. 2008. Genomic analysis of the immune gene repertoire of amphioxus reveals extraordinary innate complexity and diversity. Genome Res 18:1112–1126.
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJA, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. 2009. InterPro: the integrative protein signature database. Nucleic Acids Res 37:D211–5.
Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240.
Kaiser P, Rothwell L, Avery S, Balu S. 2004. Evolution of the interleukins. Dev Comp Immunol 28:375–394.
Kajitani R, Toshimoto K, Noguchi H, Toyoda A, Ogura Y, Okuno M, Yabana M, Harada M, Nagayasu E, Maruyama H, Kohara Y, Fujiyama A, Hayashi T, Itoh T. 2014. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res 24:1384–1395.
Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. 2017. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods 14:587–589.

Kasamatsu J, Oshiumi H, Matsumoto M, Kasahara M, Seya T. 2010. Phylogenetic and expression analysis of lamprey toll-like receptors. Dev Comp Immunol 34:855–865.
Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780.
Kelley J, Walter L, Trowsdale J. 2005. Comparative genomics of natural killer cell receptor gene clusters. PLoS Genet 1:129–139.
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. doi:10.1101/gr.215087.116
Korithoski B, Kolaczkowski O, Mukherjee K, Kola R, Earl C, Kolaczkowski B. 2015. Evolution of a Novel Antiviral Immune-Signaling Interaction by Partial-Gene Duplication. PLoS One 10:e0137276.
Krishnaswamy Gopalan T, Gururaj P, Gupta R, Gopal DR, Rajesh P, Chidambaram B, Kalyanasundaram A, Angamuthu R. 2014. Transcriptome profiling reveals higher vertebrate orthologous of intra-cytoplasmic pattern recognition receptors in grey bamboo shark. PLoS One 9:e100018.
Laetsch DR, Blaxter ML. 2017. KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences. G3 7:3349–3357.
Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997.
Li J, Mahajan A, Tsai M-D. 2006. Ankyrin repeat: a unique motif mediating protein-protein interactions. Biochemistry 45:15168–15178.
Litscher ES, Wassarman PM. 2014. Evolution, structure, and synthesis of vertebrate egg-coat proteins. Trends Dev Biol 8:65–76.
Loo Y-M, Gale M Jr. 2011. Immune signaling by RIG-I-like receptors. Immunity 34:680–692.
Mapleson D, Garcia Accinelli G, Kettleborough G, Wright J, Clavijo BJ. 2016. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics. doi:10.1093/bioinformatics/btw663
Marçais G, Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27:764–770.
Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, Geer RC, He J, Gwadz M, Hurwitz DI, Lanczycki CJ, Lu F, Marchler GH, Song JS, Thanki N, Wang Z, Yamashita RA, Zhang D, Zheng C, Bryant SH. 2015. CDD: NCBI’s conserved domain database. Nucleic Acids Res 43:D222–6.
Margolin JF, Friedman JR, Meyer WK, Vissing H, Thiesen HJ, Rauscher FJ 3rd. 1994. Krüppel-associated boxes are potent transcriptional repression domains. Proc Natl Acad Sci U S A 91:4509–4513.
Marra NJ, Stanhope MJ, Jue NK, Wang M, Sun Q, Bitar PP, Richards VP, Komissarov A, Rayko M, Kliver S, Stanhope BJ, Winkler C, O’Brien SJ, Antunes A, Jorgensen S, Shivji MS. 2019. White shark genome reveals ancient elasmobranch adaptations associated with wound healing and the maintenance of genome stability. Proc Natl Acad Sci U S A 201819778.
Martin AP, Palumbi SR. 1993. Body size, metabolic rate, generation time, and the molecular clock. Proc Natl Acad Sci U S A 90:4087–4091.
McClain CR, Balk MA, Benfield MC, Branch TA, Chen C, Cosgrove J, Dove ADM, Gaskins LC, Helm RR, Hochberg FG, Lee FB, Marshall A, McMurray SE, Schanche C, Stone SN, Thaler AD. 2015. Sizing ocean giants: patterns of intraspecific size variation in marine megafauna. PeerJ 2:e715.
Minh BQ, Nguyen MAT, von Haeseler A. 2013. Ultrafast approximation for phylogenetic bootstrap. Mol Biol Evol 30:1188–1195.
Misof B, Meyer B, von Reumont BM, Kück P, Misof K, Meusemann K. 2013. Selecting informative subsets of sparse supermatrices increases the chance to find correct trees. BMC Bioinformatics 14:348.
Mukherjee K, Korithoski B, Kolaczkowski B. 2014. Ancient origins of vertebrate-specific innate antiviral immunity. Mol Biol Evol 31:140–153.
Near TJ, Meylan PA, Shaffer HB. 2005. Assessing concordance of fossil calibration points in molecular clock studies: an example using turtles. Am Nat 165:137–146.
Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32:268–274.
Nishimura O, Hara Y, Kuraku S. 2017. gVolante for standardizing completeness assessment of genome and transcriptome assemblies. Bioinformatics 33:3635–3637.
O’Connell J, Schulz-Trieglaff O, Carlson E, Hims MM, Gormley NA, Cox AJ. 2015. NxTrim: optimized trimming of Illumina mate pair reads. Bioinformatics 31:2035–2037.
Ohno S, Wolf U, Atkin NB. 1968. Evolution from fish to mammals by gene duplication. Hereditas 59:169–187.
O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao Y, Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali VK, Li W, Maglott D, Masterson P, McGarvey KM, Murphy MR, O’Neill K, Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A, Storz SS, Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR, Wallin C, Webb D, Wu W, Landrum MJ, Kimchi A, Tatusova T, DiCuccio M, Kitts P, Murphy TD, Pruitt KD. 2016. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44:D733–45.
Peto R, Roe FJ, Lee PN, Levy L, Clack J. 1975. Cancer and ageing in mice and men. Br J Cancer 32:411–426.
Pierce KL, Premont RT, Lefkowitz RJ. 2002. Seven-transmembrane receptors. Nat Rev Mol Cell Biol 3:639–650.
Pimiento C, Cantalpiedra JL, Shimada K, Field DJ, Smaers JB. 2019. Evolutionary pathways toward gigantism in sharks and rays. Evolution. doi:10.1111/evo.13680
Popper AN, Platt C, Edds PL. 1992. Evolution of the Vertebrate Inner Ear: An Overview of Ideas. In: Webster DB, Popper AN, Fay RR, editors. The Evolutionary Biology of Hearing. New York, NY: Springer New York. pp. 49–57.
Puttick MN, Thomas GH. 2015. Fossils and living taxa agree on patterns of body mass evolution: a case study with Afrotheria. Proc Biol Sci 282:20152023.
Rabosky DL, Santini F, Eastman J, Smith SA, Sidlauskas B, Chang J, Alfaro ME. 2013. Rates of speciation and morphological evolution are correlated across the largest vertebrate radiation. Nat Commun 4:1958.
Rast JP, Smith LC, Loza-Coll M, Hibino T, Litman GW. 2006. Genomic insights into the immune system of the sea urchin. Science 314:952–956.
Read TD, Petit RA 3rd, Joseph SJ, Alam MT, Weil MR, Ahmad M, Bhimani R, Vuong JS, Haase CP, Webb DH, Tan M, Dove ADM. 2017. Draft sequencing and assembly of the genome of the world’s largest fish, the whale shark: Rhincodon typus Smith 1828. BMC Genomics 18:532.
Redmond AK, Macqueen DJ, Dooley H. 2018. Phylotranscriptomics suggests the jawed vertebrate ancestor could generate diverse helper and regulatory T cell subsets. BMC Evol Biol 18:169.
Rhincodon typus Annotation Report. 2018. https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Rhincodon_typus/100/
Richter DJ, Fozouni P, Eisen MB, King N. 2018. Gene family innovation, conservation and loss on the animal stem lineage. Elife 7. doi:10.7554/eLife.34226
Roach JC, Glusman G, Rowen L, Kaur A, Purcell MK, Smith KD, Hood LE, Aderem A. 2005. The evolution of vertebrate Toll-like receptors. Proc Natl Acad Sci U S A 102:9577–9582.
Schroder K, Tschopp J. 2010. The inflammasomes. Cell 140:821–832.
Shen W, Le S, Li Y, Hu F. 2016. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS One 11:e0163962.
She R, Chu JS-C, Uyar B, Wang J, Wang K, Chen N. 2011. genBlastG: using BLAST searches to build homologous gene models. Bioinformatics 27:2141–2143.
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212.
Singh PP, Arora J, Isambert H. 2015. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes. PLoS Comput Biol 11:e1004394.
Slater GJ, Goldbogen JA, Pyenson ND. 2017. Independent evolution of baleen whale gigantism linked to Plio-Pleistocene ocean dynamics. Proc Biol Sci 284. doi:10.1098/rspb.2017.0546
Smith JJ, Kuraku S, Holt C, Sauka-Spengler T, Jiang N, Campbell MS, Yandell MD, Manousaki T, Meyer A, Bloom OE, Morgan JR, Buxbaum JD, Sachidanandam R, Sims C, Garruss AS, Cook M, Krumlauf R, Wiedemann LM, Sower SA, Decatur WA, Hall JA, Amemiya CT, Saha NR, Buckley KM, Rast JP, Das S, Hirano M, McCurley N, Guo P, Rohner N, Tabin CJ, Piccinelli P, Elgar G, Ruffier M, Aken BL, Searle SMJ, Muffato M, Pignatelli M, Herrero J, Jones M, Brown CT, Chung-Davidson Y-W, Nanlohy KG, Libants SV, Yeh C-Y, McCauley DW, Langeland JA, Pancer Z, Fritzsch B, de Jong PJ, Zhu B, Fulton LL, Theising B, Flicek P, Bronner ME, Warren WC, Clifton SW, Wilson RK, Li W. 2013. Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution. Nat Genet 45:415–21, 421e1–2.
Solbakken MH, Voje KL, Jakobsen KS, Jentoft S. 2017. Linking species habitat and past palaeoclimatic events to evolution of the teleost innate immune system. Proc Biol Sci 284. doi:10.1098/rspb.2016.2810
Sondka Z, Bamford S, Cole CG, Ward SA, Dunham I, Forbes SA. 2018. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat Rev Cancer 18:696–705.
Stein C, Caccamo M, Laird G, Leptin M. 2007. Conservation and divergence of gene families encoding components of innate immune response systems in zebrafish. Genome Biol 8:R251.
Sulak M, Fong L, Mika K, Chigurupati S, Yon L, Mongan NP, Emes RD, Lynch VJ. 2016. TP53 copy number expansion is associated with the evolution of increased body size and an enhanced DNA damage response in elephants. Elife 5. doi:10.7554/eLife.11994
Takei Y, Hasegawa Y, Watanabe TX, Nakajima K, Hazon N. 1993. A novel angiotensin I isolated from an elasmobranch fish. J Endocrinol 139:281–285.
Takezaki N, Rzhetsky A, Nei M. 1995. Phylogenetic test of the molecular clock and linearized trees. Mol Biol Evol 12:823–833.
Tassia MG, Whelan NV, Halanych KM. 2017. Toll-like receptor pathway evolution in deuterostomes. Proc Natl Acad Sci U S A 114:7055–7060.
Tollis M, Robbins J, Webb AE, Kuderna LFK, Caulin AF, Garcia JD, Bèrubè M, Pourmand N, Marques-Bonet T, O’Connell MJ, Palsbøll PJ, Maley CC. 2019. Return to the sea, get huge, beat cancer: an analysis of cetacean genomes including an assembly for the humpback whale (Megaptera novaeangliae). Mol Biol Evol. doi:10.1093/molbev/msz099
Tollis M, Schiffman JD, Boddy AM. 2017. Evolution of cancer suppression as revealed by mammalian comparative genomics. Curr Opin Genet Dev 42:40–47.
Tria FDK, Landan G, Dagan T. 2017. Phylogenetic rooting using minimal ancestor deviation. Nat Ecol Evol 1:193.
Urbach JM, Ausubel FM. 2017. The NBS-LRR architectures of plant R-proteins and metazoan NLRs evolved in independent events. Proc Natl Acad Sci U S A 114:1063–1068.
Venkatesh B, Lee AP, Ravi V, Maurya AK, Lian MM, Swann JB, Ohta Y, Flajnik MF, Sutoh Y, Kasahara M, Hoon S, Gangu V, Roy SW, Irimia M, Korzh V, Kondrychyn I, Lim ZW, Tay B-H, Tohari S, Kong KW, Ho S, Lorente-Galdos B, Quilez J, Marques-Bonet T, Raney BJ, Ingham PW, Tay A, Hillier LW, Minx P, Boehm T, Wilson RK, Brenner S, Warren WC. 2014. Elephant shark genome provides unique insights into gnathostome evolution. Nature 505:174–179.
Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC. 2017. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33:2202–2204.
Wada H, Okuyama M, Satoh N, Zhang S. 2006. Molecular evolution of fibrillar collagen in chordates, with implications for the evolution of vertebrate skeletons and chordate phylogeny. Evol Dev 8:370–377.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. 2014. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9:e112963.
Wang J, Zhang Z, Liu J, Zhao J, Yin D. 2016. Ectodomain Architecture Affects Sequence and Functional Evolution of Vertebrate Toll-like Receptors. Sci Rep 6:26705.
Warren WC, Hillier LW, Tomlinson C, Minx P, Kremitzki M, Graves T, Markovic C, Bouk N, Pruitt KD, Thibaud-Nissen F, Schneider V, Mansour TA, Brown CT, Zimin A, Hawken R, Abrahamsen M, Pyrkosz AB, Morisson M, Fillon V, Vignal A, Chow W, Howe K, Fulton JE, Miller MM, Lovell P, Mello CV, Wirthlin M, Mason AS, Kuo R, Burt DW, Dodgson JB, Cheng HH. 2017. A New Chicken Genome Assembly Provides Insight into Avian Genome Structure. G3 7:109–117.
Weber JA, Park SG, Luria V, Jeon S, Kim H-M, Jeon Y, Bhak Y, Jun JH, Kim SW, Hong WH, Lee S, Cho YS, Karger A, Cain JW, Manica A, Kim S, Kim J-H, Edwards JS, Bhak J, Church GM. 2020. The whale shark genome reveals how genomic and physiological properties scale with body size. Proc Natl Acad Sci U S A 117:20662–20671.
Wolfe SA, Nekludova L, Pabo CO. 2000. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct 29:183–212.
Wright SP. 1992. Adjusted p-values for simultaneous inference. Biometrics 48:1005.
Yang Z. 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol Biol Evol 24:1586–1591.
Yates AD, Achuthan P, Akanni W, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, Azov AG, Bennett R, Bhai J, Billis K, Boddu S, Marugán JC, Cummins C, Davidson C, Dodiya K, Fatima R, Gall A, Giron CG, Gil L, Grego T, Haggerty L, Haskell E, Hourlier T, Izuogu OG, Janacek SH, Juettemann T, Kay M, Lavidas I, Le T, Lemos D, Martinez JG, Maurel T, McDowall M, McMahon A, Mohanan S, Moore B, Nuhn M, Oheh DN, Parker A, Parton A, Patricio M, Sakthivel MP, Abdul Salam AI, Schmitt BM, Schuilenburg H, Sheppard D, Sycheva M, Szuba M, Taylor K, Thormann A, Threadgold G, Vullo A, Walts B, Winterbottom A, Zadissa A, Chakiachvili M, Flint B, Frankish A, Hunt SE, IIsley G, Kostadima M, Langridge N, Loveland JE, Martin FJ, Morales J, Mudge JM, Muffato M, Perry E, Ruffier M, Trevanion SJ, Cunningham F, Howe KL, Zerbino DR, Flicek P. 2020. Ensembl 2020. Nucleic Acids Res 48:D682–D688.
Zhang C, Rabiee M, Sayyari E, Mirarab S. 2018. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics 19:153.
Zhang Y, Gao H, Li H, Guo J, Ouyang B, Wang M, Xu Q, Wang J, Lv M, Guo X, Liu Q, Wei L, Ren H, Xi Y, Guo Y, Ren B, Pan S, Liu C, Ding X, Xiang H, Yu Y, Song Y, Meng L, Liu S, Wang J, Jiang Y, Shi J, Liu S, Sabir JSM, Sabir MJ, Khan M, Hajrah NH, Ming-Yuen Lee S, Xu X, Yang H, Wang J, Fan G, Yang N, Liu X. 2020. The White-Spotted Bamboo Shark Genome Reveals Chromosome Rearrangements and Fast-Evolving Immune Genes of Cartilaginous Fish. iScience 23:101754.

Figure 1. Origins and losses of vertebrate gene families. Above the branch in black is the total number of gene families inferred to be present in the most recent common ancestor at that branch; the number in parentheses indicates the number of gene families conserved in all descendants of that branch. Numbers preceded by + and – indicate the number of gene families inferred to be gained or lost along that branch. Gains and losses are color-coded based on the branch where these gene families originated. Colors indicate gene families present in the most recent common ancestor of chordates, green indicates gene families that originated in the most recent common ancestor of tunicates and vertebrates (Olfactores), purple indicates vertebrate-derived gene families, orange indicates gnathostome-derived gene families, grey indicates chondrichthyan-derived gene families, while dark blue indicates shark-derived gene families. Negative numbers within parentheses indicate gene family losses that are unique to that branch (as opposed to gene families that were also lost along other branches). Positive colored numbers within parentheses indicate novel gene families conserved in all descendants ("core" gene families).

Figure 2. The PRR repertoire of whale shark. Nodes supported ≥95% UFBOOT indicated with a dot. For NOD-like receptors, NLRs in whale shark with a NACHT domain are indicated by a dot at the tip. See also, Figure 2–figure supplement 1–2. For RI [text missing] dot at each tip. See also, Figure 2–figure supplement 3. For Toll-like receptors, each clade represents a separate TLR except families found within TLR13 are also labeled a (TLR13a), b (TLR32), and c (TLR33). TLR families are also labeled by stars indicating whether they were present in the whale shark genome, present in jawed vertebrate ancestor, present in the vertebrate ancestor, and novel to this study. See also Figure 2–supplement 4.

(See Included File for high resolution file)
Figure 2–figure supplement 1. Phylogenetic analysis of NLRs from whale shark, zebrafish and human. Branches leading to human sequences are shown in black, to zebrafish in blue-green, and to whale shark in red. Whale shark sequences with a detectable NACHT domain have ‘_NACHT’ at the end of the sequence identifier (except for transcriptome sequences, which all contain NACHT domains; all NACHT domain-containing sequences are also noted with ‘*’ in table 1).

(See Included File for high resolution file)
Figure 2–figure supplement 2. Detailed analysis of NOD1 evolution. (A) Phylogenetic analysis of NOD1-focused NLR dataset. (B) Domain structure of NOD1 duplicates in whale shark. (C) Synteny analysis of jawed vertebrate NOD1s.

(See Included File for high resolution file)
Figure 2–figure supplement 3. Phylogenetic analyses of whale shark and jawed vertebrate RLRs, DICER and MAVS. (A) Tree of the RLR and DICER DEAD-Helicase domains. (B) Tree of the RLR and MAVS CARD domains. Whale shark sequences are highlighted in blue.

(See Included File for high resolution file)
Figure 2–figure supplement 4. Phylogenetic tree of vertebrate TLRs, including new whale shark sequences. The tree is rooted according to the minimal ancestor deviation method (Kümmel Tria, Landan and Dagan, 2017).

Figure 3. Amino acid substitution rate variation among jawed vertebrates. Branches are colored based on rates quantified by substitutions per site per million years of the maximum likelihood tree compared to a time-calibrated tree. Together, sharks have a slower rate of molecular evolution than Callorhinchus (see text on two-cluster test). However, sharks do not have a significantly slower rate of molecular evolution than spotted gar. Furthermore, vertebrate giants – including the whale shark, ocean sunfish, elephant, and whales – have significantly lower rates of molecular evolution than other vertebrates. Note, color scale is on normalized reciprocal-transformed data, which emphasizes changes between smaller values of substitution per My.
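
The rate coloring described in the Figure 3 caption can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that a branch's rate is its maximum likelihood branch length (substitutions per site) divided by the duration of the same branch in the time-calibrated tree (My), and that "normalized reciprocal-transformed" means min-max scaling of 1/rate. The function names, branch names, and numbers are hypothetical and chosen only to show why the transform spreads out small rates.

```python
def branch_rates(ml_lengths, durations_my):
    """Rate per branch = ML branch length (subs/site) / branch duration (My)."""
    return {b: ml_lengths[b] / durations_my[b] for b in ml_lengths}


def reciprocal_scale(rates):
    """Min-max normalize 1/rate to [0, 1] (an assumed reading of the caption)."""
    recip = {b: 1.0 / r for b, r in rates.items()}
    lo, hi = min(recip.values()), max(recip.values())
    if hi == lo:  # all rates equal: no contrast to display
        return {b: 0.0 for b in recip}
    return {b: (v - lo) / (hi - lo) for b, v in recip.items()}


# Hypothetical branches and values, for illustration only (not from the paper).
ml_lengths = {"slow": 0.05, "medium": 0.10, "fast": 0.50}    # subs/site
durations_my = {"slow": 50.0, "medium": 50.0, "fast": 50.0}  # My

rates = branch_rates(ml_lengths, durations_my)  # 0.001, 0.002, 0.01 subs/site/My
scaled = reciprocal_scale(rates)                # slowest branch maps to 1, fastest to 0
```

Under these assumptions, the step from 0.001 to 0.002 substitutions per site per My occupies more of the color scale than the step from 0.002 to 0.01, which is the emphasis on small values that the caption describes.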

Figure 4. Among 1,387 gene families with a significant rate shift, branch-specific rates of gene family size evolution were significantly higher for branches leading to giant taxa than for branches leading to other taxa. In addition, specifically in branches leading to giant taxa, the rate of gene family size evolution was even greater in cancer-related gene families relative to other gene families.

[Figure image residue: phylogeny annotated with numeric node labels and signed (+/–) branch values, some followed by a second value in parentheses. Tip labels: Lancelet, Sea Squirt, Lamprey, Callorhinchus, White Shark, Cloudy Cat Shark, Brown-Banded Bamboo Shark, Whale Shark. Clade labels: Chordata, Olfactores, Vertebrata, Gnathostomata, Osteichthyes (Bony Vertebrates), Chondrichthyes, Selachii, Orectolobiformes.]

[Figure image residue: three pattern recognition receptor trees, each with a 0.5 scale bar.
NOD-like receptors (sequences from whale shark, zebrafish, and human; outgroup shown). Clade labels: whale shark lineage-specific NLR expansion with NACHT domain, NLRP, NLRX1, CIITA, NLRC5, NOD2, NOD1, NLRC4, BIRC1, NWD1, NLRC3, TEP-1, FISNA domain-containing NLR.
Toll-like receptors. Family labels: TLR15, TLR2/28, TLR24, TLR27, TLR25, TLR14/18, TLR1/6/10, TLR4, TLR5SL, TLR5, TLR13 (a, b, c), TLR21/29-like, TLR29 (TLR35), TLR21, TLR11/12/16/19/20/26, TLR22/23, TLR3, TLR9c (TLR31), TLR9a, TLR9, TLR9b (TLR30), TLR7/8-like (TLR34), TLR7, TLR8. Legend: Mammals, Reptiles (including birds), Amphibians, Coelacanth, Teleosts, Spotted gar, Lampreys, Callorhinchus, Whale shark; Present in whale shark genome; Present in jawed vertebrate ancestor; Present in vertebrate ancestor; Newly identified TLR family.
RIG-like receptors. DEAD domain tree labels: RIG-1/DDX58, LGP2/DHX58, MDA5/IFIH1, Dicer (outgroup). CARD domain tree labels: MDA5/IFIH1 CARD1 and CARD2, RIG-1/DDX58 CARD1 and CARD2, MAVS, Caspases (outgroup). Legend: Plants, Invertebrates, Mammals, Reptiles (including birds), Amphibians, Coelacanth, Teleosts, Spotted gar, Cartilaginous Fishes except Whale Shark, Whale shark.]

[Figure image residue: NLR phylogenetic tree with bootstrap support values and a 0.5 scale bar; tip labels are Homo sapiens, Danio rerio, and Rhincodon typus sequence accessions (omitted here). Outgroup: Homo_sapiens_ABQ59028.1_APAF1 with Rhincodon typus NB-ARC sequences. Clade labels: Whale shark lineage-specific NLR expansion, NLRP, NLRX1, CIITA, NLRC5, NOD2, NOD1, NLRC4, BIRC1, NWD1, NLRC3, TEP-1, FISNA Domain Containing NLR.]

[Figure image residue: NOD1 figure with panels (A), (B), and (C). Tree tip labels: Homo_sapiens_Q9HC29_NOD2, Rhincodon_typus_XP_020371316.1, Homo_sapiens_Q9Y239.1_NOD1, Danio_rerio_XP_002665106.3_NOD1, Rhincodon_typus_XP_020377166.1, Rhincodon_typus_XP_020377167.1, Rhincodon_typus_XP_020372421.1, Callorhinchus_milii_XP_007887856.1, Callorhinchus_milii_XP_007887968.1; clade labels NOD2, NOD1, NOD1-A, NOD1-B, NOD1-C; node legend Speciation, Duplication; scale bar 0.2. Whale shark NOD1 copies with sequence accessions: NOD1-A NW_018042589.1, NOD1-B NW_018036385.1, NOD1-C NW_020514425.1; domain legend CARD/DEATH, NACHT, LRRs; Contig Start, Contig End. Synteny labels: Human NC_000007.14, Zebrafish NC_007127.1, Elephant shark NW_006890066.1, Whale shark NW_018042589.1 and NW_018028461.1; genes NOD1, MTURN, ZNRF2, ZNRF2b, GGCT, GGCTb, OSBPL10, RP9, ZNFX-1, CTH.]

[Figure image residue: A. RLR DEAD Helicase domain tree. B. RLR CARD domain tree. Tip labels are sequence accessions with bootstrap support values (omitted here); scale bars 0.4 and 0.5. Clade labels in A: MDA5/IFIH1, LGP2/DHX58, RIG-1/DDX58, Dicer. Clade labels in B: MDA5/IFIH1 CARD1, MDA5/IFIH1 CARD2, RIG-1/DDX58 CARD1, RIG-1/DDX58 CARD2, MAVS, Outgroup Caspases (CASP9, CASP2). Rhincodon typus tip labels: XP_020377384.1 (IFIH1), XP_020375939.1 (IFIH1), XP_020377839.1_XP_020377833.1 (DHX58), XP_020382456.1 (DDX58), XP_020385790.1 (Dicer), XP_020379351.1 (MAVS), XP_020383398.1 (CASP9).]

[Figure image residue: phylogenetic tree of vertebrate TLRs; tip labels are sequence accessions with bootstrap support values (omitted here, except whale shark tips).
TLR Tree Legend: Mammals, Reptiles (including birds), Amphibians, Coelacanth, Teleosts, Spotted gar, Lampreys, Elephant shark, Whale shark; Present in whale shark genome; Present in jawed vertebrate ancestor; Present in vertebrate ancestor; Newly identified TLR family.
Family labels in this portion of the tree: TLR8, TLR7, TLR7/8-like (TLR34), TLR9b (TLR30), TLR9, TLR9a, TLR9c (TLR31), TLR3, TLR22/23, TLR11/12/16/19/20/26.
Whale shark tip labels in this portion, with the family whose tips surround them in the extracted list: Rhincodon_typus_XP_020376247.1 (TLR8), Rhincodon_typus_XP_020376237.1 (TLR7), Rhincodon_typus_XP_020391373.1 and FGENESH_Rhincodon_typus_tig000316855 (TLR9), Rhincodon_typus_XP_020375139.1 and Rhincodon_typus_XP_020375140.1 (TLR3), Rhincodon_typus_XP_020374252.1 (TLR22/23).]

HQ677722_Ictalurus_punctatus_Otomorpha_TLR19_11_Trans-three-domain.p1

ENSDARG00000070392_Danio_rerio_Otomorpha_TLR19_11_Trans-three-domain.p1

XM_002664847_Danio_rerio_Otomorpha_TLR19_11_Trans-three-domain.p1

HG514146_Salmo_salar_Euteleosteomorpha_TLR19_11_Trans-three-domain.p1

ENSGALG00000000774_Gallus_gallus_Aves_TLR21_13_Single-domain.p1

XM_005431221_Geospiza_fortis_Aves_TLR21_13_Single-domain.p1

9 8 XM_005238572_Falco_peregrinus_Aves_TLR21_13_Single-domain.p1 XM_006038111_Alligator_sinensis_Reptiles_TLR21_13_Single-domain.p1

XM_005314303_Chrysemys_picta_bellii_Reptiles_TLR21_13_Single-domain.p1

XM_007053406_Chelonia_mydas_Reptiles_TLR21_13_Single-domain.p1

ENSPSIG00000014031_Pelodiscus_sinensis_Reptiles_TLR21_13_Single-domain.p1

ENSACAG00000015069_Anolis_carolinensis_Reptiles_TLR21_13_Single-domain.p1

XM_007430452_Python_bivittatus_Reptiles_TLR21_13_Single-domain.p1 9 9

ENSXETG00000032848_Xenopus_tropicalis_Amphibia_TLR21_13_Single-domain.p1 TLR21

ENSLACG00000001078_Latimeria_chalumnae_Coelacanthimorpha_TLR21_13_Single-domain.p1

XM_005950954_Haplochromis_burtoni_Euteleosteomorpha_TLR21_13_Single-domain.p1

XM_006808166_Neolamprologus_brichardi_Euteleosteomorpha_TLR21_13_Single-domain.p1

9 7 ENSONIG00000020525_Oreochromis_niloticus_Euteleosteomorpha_TLR21_13_Single-domain.p1 NM_001032579_Takifugu_rubripes_Euteleosteomorpha_TLR21_13_Single-domain.p1 9 9 9 9 XM_004078227_Oryzias_latipes_Euteleosteomorpha_TLR21_13_Single-domain.p1 9 7 9 9 XM_005798819_Xiphophorus_maculatus_Euteleosteomorpha_TLR21_13_Single-domain.p1

JX074771_Gadus_morhua_Euteleosteomorpha_TLR21_13_Single-domain.p1

9 9 9 2 HG514151_Salmo_salar_Euteleosteomorpha_TLR21_13_Single-domain.p1 ENSAMXG00000025136_Astyanax_mexicanus_Otomorpha_TLR21_13_Single-domain.p1 9 9 NM_001200065_Ictalurus_punctatus_Otomorpha_TLR21_13_Single-domain.p1

ENSDARG00000058045_Danio_rerio_Otomorpha_TLR21_13_Single-domain.p1

Rhincodon_typus_XP_020378687.1

Rhincodon_typus_XP_020388546.1

FGENESH_Rhincodon_typus_tig000303350 9 9 TLR29 ENSLACG00000002657_Latimeria_chalumnae_Coelacanthimorpha_TLR21_13_Single-domain.p1

ENSPMAG00000005013_Petromyzon_marinus_Cyclostomata_TLR21_13_Single-domain.p1 ENSPMAG00000010099_Petromyzon_marinus_Cyclostomata_TLR21_13_Single-domain.p1 TLR21/29-like (TLR35) ENSPMAG00000010181_Petromyzon_marinus_Cyclostomata_TLR21_13_Single-domain.p1

8 5 XM_002720225_Oryctolagus_cuniculus_Glires_TLR13_13_Single-domain.p1 TLR13a 8 5 XM_006148781_Tupaia_chinensis_Scandentia_TLR13_13_Single-domain.p1

6 0 XM_005877135_Myotis_brandtii_Laurasiatheria_TLR13_13_Single-domain.p1

ENSMUSG00000033777_Mus_musculus_Glires_TLR13_13_Single-domain.p1

ENSLAFG00000025903_Loxodonta_africana_Afrotheria_TLR13_13_Single-domain.p1

ENSMODG00000028942_Monodelphis_domestica_Metatheria_TLR13_13_Single-domain.p1 TLR13 8 4 ENSSHAG00000013182_Sarcophilus_harrisii_Metatheria_TLR13_13_Single-domain.p1

ENSLACG00000012864_Latimeria_chalumnae_Coelacanthimorpha_TLR13_13_Single-domain.p1

XM_006015077_Alligator_sinensis_Reptiles_TLR13_13_Single-domain.p1 9 8 XM_006277653_Alligator_mississippiensis_Reptiles_TLR13_13_Single-domain.p1 9 8 TLR13b XM_005312618_Chrysemys_picta_bellii_Reptiles_TLR13_13_Single-domain.p1 ENSACAG00000016646_Anolis_carolinensis_Reptiles_TLR13_13_Single-domain.p1 (TLR32) XM_002935001_Xenopus_tropicalis_Amphibia_TLR13_13_Single-domain.p1 XM_006005093_Latimeria_chalumnae_Coelacanthimorpha_TLR13_13_Single-domain.p1 TLR13c XM_006005094_Latimeria_chalumnae_Coelacanthimorpha_TLR13_13_Single-domain.p1 8 6 AF186107_Mus_musculus_Glires_TLR5_5_Single-domain.p1 (TLR33) 4 5 XM_003494963_Cricetulus_griseus_Glires_TLR5_5_Single-domain.p1

4 0 HQ874605_Oryctolagus_cuniculus_Glires_TLR5_5_Single-domain.p1

3 7 XM_006171975_Tupaia_chinensis_Scandentia_TLR5_5_Single-domain.p1

ENSG00000187554_Homo_sapiens_Primates_TLR5_5_Single-domain.p1 6 9 4 2 XM_007936955_Orycteropus_afer_afer_Afrotheria_TLR5_5_Single-domain.p1

ENSAMEG00000020191_Ailuropoda_melanoleuca_Laurasiatheria_TLR5_5_Single-domain.p1

3 7 NM_001197176_Canis_lupus_familiaris_Laurasiatheria_TLR5_5_Single-domain.p1 ENSBTAG00000000477_Bos_taurus_Laurasiatheria_TLR5_5_Single-domain.p1

XM_005866488_Myotis_brandtii_Laurasiatheria_TLR5_5_Single-domain.p1

ENSSHAG00000012101_Sarcophilus_harrisii_Metatheria_TLR5_5_Single-domain.p1

9 9 XM_001376152_Monodelphis_domestica_Metatheria_TLR5_5_Single-domain.p1 XM_001512183_Ornithorhynchus_anatinus_Prototheria_TLR5_5_Single-domain.p1

8 9 ENSGALG00000009392_Gallus_gallus_Aves_TLR5_5_Single-domain.p1 9 9 ENSTGUG00000002653_Taeniopygia_guttata_Aves_TLR5_5_Single-domain.p1

XM_005241848_Falco_peregrinus_Aves_TLR5_5_Single-domain.p1

ENSAPLG00000001279_Anas_platyrhynchos_Aves_TLR5_5_Single-domain.p1

XM_006029187_Alligator_sinensis_Reptiles_TLR5_5_Single-domain.p1

XM_006270883_Alligator_mississippiensis_Reptiles_TLR5_5_Single-domain.p1

XM_005289597_Chrysemys_picta_bellii_Reptiles_TLR5_5_Single-domain.p1 9 9 XM_007070449_Chelonia_mydas_Reptiles_TLR5_5_Single-domain.p1 TLR5 XM_006115600_Pelodiscus_sinensis_Reptiles_TLR5_5_Single-domain.p1

XM_003216035_Anolis_carolinensis_Reptiles_TLR5_5_Single-domain.p1

XM_007434409_Python_bivittatus_Reptiles_TLR5_5_Single-domain.p1

ENSLACG00000015379_Latimeria_chalumnae_Coelacanthimorpha_TLR5_5_Single-domain.p1 9 9 ENSXETG00000015027_Xenopus_tropicalis_Amphibia_TLR5_5_Single-domain.p1

NM_001094980_Xenopus_laevis_Amphibia_TLR5_5_Single-domain.p1

ENSLOCG00000018000_Lepisosteus_oculatus_Actinopterygii_TLR5_5_Single-domain.p1

XM_005449891_Oreochromis_niloticus_Euteleosteomorpha_TLR5_5_Single-domain.p1

XM_005949079_Haplochromis_burtoni_Euteleosteomorpha_TLR5_5_Single-domain.p1

9 8 XM_006809115_Neolamprologus_brichardi_Euteleosteomorpha_TLR5_5_Single-domain.p1 8 0 ENSXMAG00000001788_Xiphophorus_maculatus_Euteleosteomorpha_TLR5_5_Single-domain.p1 9 9 9 7 ENSTRUG00000004759_Takifugu_rubripes_Euteleosteomorpha_TLR5_5_Single-domain.p1 XM_004083676_Oryzias_latipes_Euteleosteomorpha_TLR5_5_Single-domain.p1

ENSORLG00000017639_Oryzias_latipes_Euteleosteomorpha_TLR5S_5_Single-domain.p1

9 7 XM_005803424_Xiphophorus_maculatus_Euteleosteomorpha_TLR5S_5_Single-domain.p1 XM_003971921_Takifugu_rubripes_Euteleosteomorpha_TLR5S_5_Single-domain.p1 9 9 XM_005931574_Haplochromis_burtoni_Euteleosteomorpha_TLR5S_5_Single-domain.p1

XM_006792730_Neolamprologus_brichardi_Euteleosteomorpha_TLR5S_5_Single-domain.p1

NM_001123691_Salmo_salar_Euteleosteomorpha_TLR5_5_Single-domain.p1

HQ677717_Ictalurus_punctatus_Otomorpha_TLR5_5_Single-domain.p2

NM_001200229_Ictalurus_punctatus_Otomorpha_TLR5_5_Single-domain.p1

ENSAMXG00000026012_Astyanax_mexicanus_Otomorpha_TLR5_5_Single-domain.p1

ENSDARG00000044415_Danio_rerio_Otomorpha_TLR5_5_Single-domain.p1

ENSDARG00000052322_Danio_rerio_Otomorpha_TLR5_5_Single-domain.p1

XM_003229886_Anolis_carolinensis_Reptiles_TLR5SL_5_Single-domain.p1 TLR5SL XM_007439931_Python_bivittatus_Reptiles_TLR5SL_5_Single-domain.p1

XM_005307990_Chrysemys_picta_bellii_Reptiles_TLR5SL_5_Single-domain.p1

XM_006136605_Pelodiscus_sinensis_Reptiles_TLR5SL_5_Single-domain.p1

XM_006265007_Alligator_mississippiensis_Reptiles_TLR5SL_5_Single-domain.p1

ENSXETG00000030446_Xenopus_tropicalis_Amphibia_TLR5SL_5_Single-domain.p1

NM_001095629_Xenopus_laevis_Amphibia_TLR5SL_5_Single-domain.p1

ENSAMEG00000005790_Ailuropoda_melanoleuca_Laurasiatheria_TLR4_4_Three-domain.p1

ENSCAFG00000003518_Canis_lupus_familiaris_Laurasiatheria_TLR4_4_Three-domain.p1

ENSBTAG00000006240_Bos_taurus_Laurasiatheria_TLR4_4_Three-domain.p1

6 3 XM_005880935_Myotis_brandtii_Laurasiatheria_TLR4_4_Three-domain.p1

ENSG00000136869_Homo_sapiens_Primates_TLR4_4_Three-domain.p1 5 2 ENSLAFG00000006775_Loxodonta_africana_Afrotheria_TLR4_4_Three-domain.p1

6 7 8 1 XM_007936671_Orycteropus_afer_afer_Afrotheria_TLR4_4_Three-domain.p1 ENSDNOG00000032753_Dasypus_novemcinctus_Xenarthra_TLR4_4_Three-domain.p1

XM_006142796_Tupaia_chinensis_Scandentia_TLR4_4_Three-domain.p1

ENSOCUG00000002216_Oryctolagus_cuniculus_Glires_TLR4_4_Three-domain.p1

ENSMUSG00000039005_Mus_musculus_Glires_TLR4_4_Three-domain.p1

NM_001246762_Cricetulus_griseus_Glires_TLR4_4_Three-domain.p1

ENSMODG00000019614_Monodelphis_domestica_Metatheria_TLR4_4_Three-domain.p1

ENSSHAG00000017208_Sarcophilus_harrisii_Metatheria_TLR4_4_Three-domain.p1 TLR4 XM_007661855_Ornithorhynchus_anatinus_Prototheria_TLR4_4_Three-domain.p1

NM_001142454_Taeniopygia_guttata_Aves_TLR4_4_Three-domain.p1

9 9 XM_005423810_Geospiza_fortis_Aves_TLR4_4_Three-domain.p1

ENSAPLG00000012625_Anas_platyrhynchos_Aves_TLR4_4_Three-domain.p1

ENSGALG00000007001_Gallus_gallus_Aves_TLR4_4_Three-domain.p1

XM_005231393_Falco_peregrinus_Aves_TLR4_4_Three-domain.p1 9 9 XM_005301387_Chrysemys_picta_bellii_Reptiles_TLR4_4_Three-domain.p1

XM_007057276_Chelonia_mydas_Reptiles_TLR4_4_Three-domain.p1

ENSPSIG00000003827_Pelodiscus_sinensis_Reptiles_TLR4_4_Three-domain.p1

XM_006026272_Alligator_sinensis_Reptiles_TLR4_4_Three-domain.p1

XM_006273411_Alligator_mississippiensis_Reptiles_TLR4_4_Three-domain.p1

ENSACAG00000013911_Anolis_carolinensis_Reptiles_TLR4_4_Three-domain.p1

XM_007433542_Python_bivittatus_Reptiles_TLR4_4_Three-domain.p1

ENSDARG00000019742_Danio_rerio_Otomorpha_TLR4L_4_Three-domain.p1

ENSDARG00000075671_Danio_rerio_Otomorpha_TLR4L_4_Three-domain.p1

ENSDARG00000022048_Danio_rerio_Otomorpha_TLR4L_4_Three-domain.p1

HQ677716_Ictalurus_punctatus_Otomorpha_TLR4L_4_Three-domain.p1

ENSMUSG00000044827_Mus_musculus_Glires_TLR1_1_Three-domain.p1

6 3 XM_003515570_Cricetulus_griseus_Glires_TLR1_1_Three-domain.p1

6 3 XM_005877435_Myotis_brandtii_Laurasiatheria_TLR1_1_Three-domain.p1 ENSAMEG00000018486_Ailuropoda_melanoleuca_Laurasiatheria_TLR1_1_Three-domain.p1 9 8 ENSCAFG00000024010_Canis_lupus_familiaris_Laurasiatheria_TLR1_1_Three-domain.p1 9 9 FJ147090_Bos_taurus_Laurasiatheria_TLR1_1_Three-domain.p1

ENSOCUG00000001824_Oryctolagus_cuniculus_Glires_TLR1_1_Three-domain.p1 9 9 XM_006169244_Tupaia_chinensis_Scandentia_TLR1_1_Three-domain.p1

ENSG00000174125_Homo_sapiens_Primates_TLR1_1_Three-domain.p1

ENSLAFG00000008549_Loxodonta_africana_Afrotheria_TLR1_1_Three-domain.p1

5 4 XM_007953531_Orycteropus_afer_afer_Afrotheria_TLR1_1_Three-domain.p1 ENSDNOG00000000525_Dasypus_novemcinctus_Xenarthra_TLR1_1_Three-domain.p1

ENSCAFG00000016172_Canis_lupus_familiaris_Laurasiatheria_TLR6_1_Three-domain.p1

XM_002928125_Ailuropoda_melanoleuca_Laurasiatheria_TLR6_1_Three-domain.p1

XM_005207846_Bos_taurus_Laurasiatheria_TLR6_1_Three-domain.p1 9 3 5 9 XM_005877436_Myotis_brandtii_Laurasiatheria_TLR6_1_Three-domain.p1

ENSMUSG00000051498_Mus_musculus_Glires_TLR6_1_Three-domain.p1

6 9 XM_003515801_Cricetulus_griseus_Glires_TLR6_1_Three-domain.p1 ENSOCUG00000006031_Oryctolagus_cuniculus_Glires_TLR6_1_Three-domain.p1

5 8 ENSG00000174130_Homo_sapiens_Primates_TLR6_1_Three-domain.p1 9 4 XM_006169243_Tupaia_chinensis_Scandentia_TLR6_1_Three-domain.p1

9 6 XM_003411392_Loxodonta_africana_Afrotheria_TLR6_1_Three-domain.p1

XM_007953530_Orycteropus_afer_afer_Afrotheria_TLR6_1_Three-domain.p1

NM_001281307_Dasypus_novemcinctus_Xenarthra_TLR6_1_Three-domain.p1

ENSMODG00000020532_Monodelphis_domestica_Metatheria_TLR1_6_1_Three-domain.p1

ENSSHAG00000010462_Sarcophilus_harrisii_Metatheria_TLR1_6_1_Three-domain.p1

ENSAMEG00000018488_Ailuropoda_melanoleuca_Laurasiatheria_TLR10_1_Three-domain.p1

9 8 ENSCAFG00000016175_Canis_lupus_familiaris_Laurasiatheria_TLR10_1_Three-domain.p1

9 2 ENSBTAG00000014042_Bos_taurus_Laurasiatheria_TLR10_1_Three-domain.p1 ENSG00000174123_Homo_sapiens_Primates_TLR10_1_Three-domain.p1 9 4 9 2 ENSOCUG00000003701_Oryctolagus_cuniculus_Glires_TLR10_1_Three-domain.p1 8 4 ENSDNOG00000048094_Dasypus_novemcinctus_Xenarthra_TLR10_1_Three-domain.p1

ENSLAFG00000008544_Loxodonta_africana_Afrotheria_TLR10_1_Three-domain.p1

XM_003515569_Cricetulus_griseus_Glires_TLR10_1_Three-domain.p1 TLR1/6/10 ENSSHAG00000010643_Sarcophilus_harrisii_Metatheria_TLR10_1_Three-domain.p1 9 5 ENSOANG00000010318_Ornithorhynchus_anatinus_Prototheria_TLR10_1_Three-domain.p1 9 5 XM_001512923_Ornithorhynchus_anatinus_Prototheria_TLR1_6_1_Three-domain.p1

XM_005431639_Geospiza_fortis_Aves_TLR1LA_1_Three-domain.p1

9 6 XM_005490501_Zonotrichia_albicollis_Aves_TLR1LA_1_Three-domain.p1 8 2 6 4 XM_004176605_Taeniopygia_guttata_Aves_TLR1LA_1_Three-domain.p1

9 0 XM_005243208_Falco_peregrinus_Aves_TLR1LA_1_Three-domain.p1

ENSAPLG00000002049_Anas_platyrhynchos_Aves_TLR1LA_1_Three-domain.p1 4 8 ENSGALG00000017485_Gallus_gallus_Aves_TLR1LA_1_Three-domain.p1

9 5 ENSTGUG00000009041_Taeniopygia_guttata_Aves_TLR1LB_1_Three-domain.p1 9 5 XM_005415453_Geospiza_fortis_Aves_TLR1LB_1_Three-domain.p1

5 2 XM_005243247_Falco_peregrinus_Aves_TLR1LB_1_Three-domain.p1

JN572686_Anas_platyrhynchos_Aves_TLR1LB_1_Three-domain.p1 9 3 ENSGALG00000027093_Gallus_gallus_Aves_TLR1LB_1_Three-domain.p1

XM_006029234_Alligator_sinensis_Reptiles_TLR1LB_1_Three-domain.p1

XM_006258498_Alligator_mississippiensis_Reptiles_TLR1LB_1_Three-domain.p1

XM_006029243_Alligator_sinensis_Reptiles_TLR1LA_1_Three-domain.p1

XM_006258501_Alligator_mississippiensis_Reptiles_TLR1LA_1_Three-domain.p1

XM_007059713_Chelonia_mydas_Reptiles_TLR1LB_1_Three-domain.p1

XM_008169981_Chrysemys_picta_bellii_Reptiles_TLR1LB_1_Three-domain.p1 5 0 ENSPSIG00000012620_Pelodiscus_sinensis_Reptiles_TLR1LB_1_Three-domain.p1

XM_008169980_Chrysemys_picta_bellii_Reptiles_TLR1LA_1_Three-domain.p1

ENSACAG00000007342_Anolis_carolinensis_Reptiles_TLR1L_1_Three-domain.p1 9 1 9 0 XM_007437894_Python_bivittatus_Reptiles_TLR1L_1_Three-domain.p1

XM_003226246_Anolis_carolinensis_Reptiles_TLR1L_1_Three-domain.p1

ENSXETG00000018489_Xenopus_tropicalis_Amphibia_TLR1L_1_Three-domain.p1 8 6 XM_002938656_Xenopus_tropicalis_Amphibia_TLR1L_1_Three-domain.p1

ENSXETG00000018477_Xenopus_tropicalis_Amphibia_TLR1L_1_Three-domain.p1

ENSLACG00000010038_Latimeria_chalumnae_Coelacanthimorpha_TLR1L_1_Three-domain.p1

ENSAMXG00000025563_Astyanax_mexicanus_Otomorpha_TLR1L_1_Three-domain.p1 8 9 HQ677713_Ictalurus_punctatus_Otomorpha_TLR1L_1_Three-domain.p1

ENSDARG00000043032_Danio_rerio_Otomorpha_TLR1L_1_Three-domain.p1

ENSORLG00000004420_Oryzias_latipes_Euteleosteomorpha_TLR1L_1_Three-domain.p1

ENSTRUG00000009990_Takifugu_rubripes_Euteleosteomorpha_TLR1L_1_Three-domain.p1

XM_005937729_Haplochromis_burtoni_Euteleosteomorpha_TLR1L_1_Three-domain.p1

ENSLOCG00000012910_Lepisosteus_oculatus_Actinopterygii_TLR1L_1_Three-domain.p1

Rhincodon_typus_XP_020390742.1

Rhincodon_typus_XP_020390749.1

9 9 HG964646_Chiloscyllium_griseum_Chondrichthyes_TLR1L_1_Three-domain.p1 XM_007889182_Callorhinchus_milii_Chondrichthyes_TLR1L_1_Three-domain.p1

XM_005929488_Haplochromis_burtoni_Euteleosteomorpha_TLR18_1_Three-domain.p1

XM_006791792_Neolamprologus_brichardi_Euteleosteomorpha_TLR18_1_Three-domain.p1

ENSONIG00000006684_Oreochromis_niloticus_Euteleosteomorpha_TLR18_1_Three-domain.p1 9 5 ENSORLG00000015704_Oryzias_latipes_Euteleosteomorpha_TLR18_1_Three-domain.p1 9 6 XM_005800441_Xiphophorus_maculatus_Euteleosteomorpha_TLR18_1_Three-domain.p1

ENSTRUG00000000441_Takifugu_rubripes_Euteleosteomorpha_TLR18_1_Three-domain.p1

ENSGMOG00000003793_Gadus_morhua_Euteleosteomorpha_TLR18_1_Three-domain.p1

HG794501_Salmo_salar_Euteleosteomorpha_TLR18_1_Three-domain.p1

ENSAMXG00000018772_Astyanax_mexicanus_Otomorpha_TLR18_1_Three-domain.p1 TLR14/18

7 7 HQ677721_Ictalurus_punctatus_Otomorpha_TLR18_1_Three-domain.p1 ENSDARG00000040249_Danio_rerio_Otomorpha_TLR18_1_Three-domain.p1 9 2 XM_006641799_Lepisosteus_oculatus_Actinopterygii_TLR18_1_Three-domain.p1 8 3 ENSLACG00000017699_Latimeria_chalumnae_Coelacanthimorpha_TLR14_1_Three-domain.p1

7 9 XM_007910264_Callorhinchus_milii_Chondrichthyes_TLR14_1_Three-domain.p1 XM_005293629_Chrysemys_picta_bellii_Reptiles_TLR14_1_Three-domain.p1 8 0 XM_007064608_Chelonia_mydas_Reptiles_TLR14_1_Three-domain.p1

XM_006033493_Alligator_sinensis_Reptiles_TLR14_1_Three-domain.p1

XM_006266453_Alligator_mississippiensis_Reptiles_TLR14_1_Three-domain.p1

XM_003230485_Anolis_carolinensis_Reptiles_TLR14_1_Three-domain.p1

XM_007436577_Python_bivittatus_Reptiles_TLR14_1_Three-domain.p1

XM_002943051_Xenopus_tropicalis_Amphibia_TLR14_1_Three-domain.p1

9 5 XM_004919740_Xenopus_tropicalis_Amphibia_TLR14_1_Three-domain.p1

XM_002943049_Xenopus_tropicalis_Amphibia_TLR14_1_Three-domain.p1

XM_002943050_Xenopus_tropicalis_Amphibia_TLR14_1_Three-domain.p1

XM_005933812_Haplochromis_burtoni_Euteleosteomorpha_TLR25_1_Three-domain.p1

XM_006799739_Neolamprologus_brichardi_Euteleosteomorpha_TLR25_1_Three-domain.p1 8 7 ENSONIG00000014149_Oreochromis_niloticus_Euteleosteomorpha_TLR25_1_Three-domain.p1

XM_004083114_Oryzias_latipes_Euteleosteomorpha_TLR25_1_Three-domain.p1 7 0

ENSGMOG00000001699_Gadus_morhua_Euteleosteomorpha_TLR25_1_Three-domain.p1 TLR25

9 7 HQ677726_Ictalurus_punctatus_Otomorpha_TLR25_1_Three-domain.p1

XM_006633562_Lepisosteus_oculatus_Actinopterygii_TLR25_1_Three-domain.p1

ENSLOCG00000016891_Lepisosteus_oculatus_Actinopterygii_TLR25_1_Three-domain.p1

XM_007898660_Callorhinchus_milii_Chondrichthyes_TLR25_1_Three-domain.p1 8 1 9 6 AB109403_Lethenteron_camtschaticum_Cyclostomata_TLR25_1_Three-domain.p1 ENSPMAG00000010085_Petromyzon_marinus_Cyclostomata_TLR25_1_Three-domain.p1

AB109402_Lethenteron_camtschaticum_Cyclostomata_TLR25_1_Three-domain.p1

ENSPMAG00000010230_Petromyzon_marinus_Cyclostomata_TLR25_1_Three-domain.p1

XM_007895690_Callorhinchus_milii_Chondrichthyes_TLR27_1_Three-domain.p1 8 9 6 8 Rhincodon_typus_XP_020384134.1 ENSLACG00000015138_Latimeria_chalumnae_Coelacanthimorpha_TLR27_1_Three-domain.p1 TLR27 ENSLOCG00000017732_Lepisosteus_oculatus_Actinopterygii_TLR27_1_Three-domain.p1

ENSPMAG00000010418_Petromyzon_marinus_Cyclostomata_TLR24_1_Three-domain.p1 ENSPMAG00000010419_Petromyzon_marinus_Cyclostomata_TLR24_1_Three-domain.p1 TLR24 XM_002196366_Taeniopygia_guttata_Aves_TLR2B_1_Three-domain.p1

XM_005415709_Geospiza_fortis_Aves_TLR2B_1_Three-domain.p1

6 7 XM_005232498_Falco_peregrinus_Aves_TLR2B_1_Three-domain.p1 AB046533_Gallus_gallus_Aves_TLR2B_1_Three-domain.p1

ENSAPLG00000011399_Anas_platyrhynchos_Aves_TLR2B_1_Three-domain.p1 9 6 ENSTGUG00000005179_Taeniopygia_guttata_Aves_TLR2A_1_Three-domain.p1

9 8 XM_005415589_Geospiza_fortis_Aves_TLR2A_1_Three-domain.p1

XM_005232499_Falco_peregrinus_Aves_TLR2A_1_Three-domain.p1 9 6 ENSAPLG00000011397_Anas_platyrhynchos_Aves_TLR2A_1_Three-domain.p1

9 6 XM_003641158_Gallus_gallus_Aves_TLR2A_1_Three-domain.p1

8 5 XM_005299363_Chrysemys_picta_bellii_Reptiles_TLR2_1_Three-domain.p1 9 9 XM_007071294_Chelonia_mydas_Reptiles_TLR2_1_Three-domain.p1

XM_005299362_Chrysemys_picta_bellii_Reptiles_TLR2_1_Three-domain.p1

ENSPSIG00000000529_Pelodiscus_sinensis_Reptiles_TLR2_1_Three-domain.p1

9 7 ENSPSIG00000000537_Pelodiscus_sinensis_Reptiles_TLR2_1_Three-domain.p1 9 9 XM_006019140_Alligator_sinensis_Reptiles_TLR2_1_Three-domain.p1

XM_006264524_Alligator_mississippiensis_Reptiles_TLR2_1_Three-domain.p1

XM_006019141_Alligator_sinensis_Reptiles_TLR2_1_Three-domain.p1

ENSACAG00000021135_Anolis_carolinensis_Reptiles_TLR2_1_Three-domain.p1

XM_003221712_Anolis_carolinensis_Reptiles_TLR2_1_Three-domain.p1

XM_007425609_Python_bivittatus_Reptiles_TLR2_1_Three-domain.p1

ENSMUSG00000027995_Mus_musculus_Glires_TLR2_1_Three-domain.p1

4 9 XM_007640226_Cricetulus_griseus_Glires_TLR2_1_Three-domain.p1

9 9 ENSG00000137462_Homo_sapiens_Primates_TLR2_1_Three-domain.p1

9 8 ENSOCUG00000003367_Oryctolagus_cuniculus_Glires_TLR2_1_Three-domain.p1 8 6 TLR2/28 XM_006172099_Tupaia_chinensis_Scandentia_TLR2_1_Three-domain.p1 9 7 ENSCAFG00000008351_Canis_lupus_familiaris_Laurasiatheria_TLR2_1_Three-domain.p1

9 7 XM_002913846_Ailuropoda_melanoleuca_Laurasiatheria_TLR2_1_Three-domain.p1 6 9 XM_005871438_Myotis_brandtii_Laurasiatheria_TLR2_1_Three-domain.p1

ENSBTAG00000008008_Bos_taurus_Laurasiatheria_TLR2_1_Three-domain.p1

ENSLAFG00000005069_Loxodonta_africana_Afrotheria_TLR2_1_Three-domain.p1 9 8 7 6 XM_007943988_Orycteropus_afer_afer_Afrotheria_TLR2_1_Three-domain.p1

ENSDNOG00000039179_Dasypus_novemcinctus_Xenarthra_TLR2_1_Three-domain.p1 bioRxiv preprint doi: https://doi.org/10.1101/685743; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. ENSMODG00000024954_Monodelphis_domestica_Metatheria_TLR2_1_Three-domain.p1 ENSSHAG00000007395_Sarcophilus_harrisii_Metatheria_TLR2_1_Three-domain.p1

XM_007671810_Ornithorhynchus_anatinus_Prototheria_TLR2_1_Three-domain.p1

ENSXETG00000001817_Xenopus_tropicalis_Amphibia_TLR2_1_Three-domain.p1

ENSXETG00000033109_Xenopus_tropicalis_Amphibia_TLR2_1_Three-domain.p1

7 6 KF573999_Andrias_davidianus_Amphibia_TLR2_1_Three-domain.p1

Rhincodon_typus_XP_020367827.1

Rhincodon_typus_XP_020367828.1

Rhincodon_typus_XP_020379248.1

XM_007903987_Callorhinchus_milii_Chondrichthyes_TLR2_1_Three-domain.p1 8 9 XM_007904038_Callorhinchus_milii_Chondrichthyes_TLR2_1_Three-domain.p1

8 6 ENSLOCG00000018220_Lepisosteus_oculatus_Actinopterygii_TLR2_1_Three-domain.p1

8 8 ENSLACG00000012590_Latimeria_chalumnae_Coelacanthimorpha_TLR2_1_Three-domain.p1 ENSXMAG00000008258_Xiphophorus_maculatus_Euteleosteomorpha_TLR2_1_Three-domain.p1

ENSXMAG00000008259_Xiphophorus_maculatus_Euteleosteomorpha_TLR2_1_Three-domain.p1

ENSORLG00000002540_Oryzias_latipes_Euteleosteomorpha_TLR2_1_Three-domain.p1

ENSONIG00000014114_Oreochromis_niloticus_Euteleosteomorpha_TLR2_1_Three-domain.p1

XM_006806622_Neolamprologus_brichardi_Euteleosteomorpha_TLR2_1_Three-domain.p1

ENSTRUG00000003176_Takifugu_rubripes_Euteleosteomorpha_TLR2_1_Three-domain.p1

ENSAMXG00000025582_Astyanax_mexicanus_Otomorpha_TLR2_1_Three-domain.p1 9 8 8 4 XM_005170853_Danio_rerio_Otomorpha_TLR2_1_Three-domain.p1 DQ372072_Ictalurus_punctatus_Otomorpha_TLR2_1_Three-domain.p1

AKN63433.1_TLR28_Miichthys_miiuy

XM_006021588_Alligator_sinensis_Reptiles_TLR15_15_Single-domain.p1

8 3 XM_006274445_Alligator_mississippiensis_Reptiles_TLR15_15_Single-domain.p1 TLR15 7 1 XM_007444506_Python_bivittatus_Reptiles_TLR15_15_Single-domain.p1

XM_005018870_Anas_platyrhynchos_Aves_TLR15_15_Single-domain.p1 5 7 XM_002197069_Taeniopygia_guttata_Aves_TLR15_15_Single-domain.p1

XM_005422603_Geospiza_fortis_Aves_TLR15_15_Single-domain.p1 7 4 XM_005235036_Falco_peregrinus_Aves_TLR15_15_Single-domain.p1

NM_001037835_Gallus_gallus_Aves_TLR15_15_Single-domain.p1 0.3 bioRxiv preprint doi: https://doi.org/10.1101/685743; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Callorhinchuscmil storCloudy Catshark ccarWhite Shark Whalertyp Shark Brownbandedcpun Bamboo Shark Spottedlocu Gar Asiansfor Arowana drerZebrafish Northerneluc Pike Atlanticgmor Cod onilNile Tilapia olatJapanese Ricefish Three-spinedgacu Stickleback Oceanmmol Sunfish trubJapanese Puffer dnigGreen Spotted Puffer lchaCoelacanth Clawedxtro Frog acarAnole Chickenggal amisAlligator oanaPlatypus Opossummdom Hyraxpcap Elephantlafr Armadillodnov mmusMouse Humanhsap clupDog sscrPig btauCow ttruDolphin substitutionstrait value per My bmysBowhead Whale 3e−04 5e−04 0.0029 bacuMinke Whale length=0.15 bioRxiv preprint doi: https://doi.org/10.1101/685743; this version posted July 28, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

Figure: rates of gene family size evolution (axis ticks 0.001, 0.002, 0.005, 0.010, 0.020, 0.050, 0.100, 0.200) for four categories: non-cancer gene families along non-giant branches, non-cancer gene families along giant branches, cancer gene families along non-giant branches, and cancer gene families along giant branches.