<<

Downloaded by guest on October 2, 2021 ailA Bartlett A. Daniel Hofmann P. Erich genotype–phenotype and across date variation network. regulatory phenotypic to polygenic mediating a network mechanisms -regulatory charac- key complete identify the most most the simplest, methyla- of provide the and terization of We production accessibility, venom. the chromatin to toxic contribute loss, all gene levels tion that genome this show use and to genotype date to complex most assembly a the -genome or generate contiguous We loss mechanisms. gene result regulatory , through through the mediated rattlesnake is genotype phenotype simple any venom a of simple of venom the we whether toxic possesses determine Here, most which to tractability. and Rattlesnake, Tiger simplest genetic the the in their of system variation genome of the model regulatory sequence because of a impacts traits provide phenotypic complex complex the for Snake studying the for especially challenging. identifying variation, is yet 2020) 13, traits, such ubiquitous, July review is for producing (received regulation 2020 mechanisms 8, gene December approved in and CA, Variation Davis, , of University Lewin, A. Harris by Edited 29634 SC Clemson, University, Clemson Conservation, Environmental 36688; AL Mobile, Alabama, 02138; MA Cambridge, a phenotype Margres J. venom Mark simple a complex underlying a genotype reveals genome Rattlesnake Tiger The l yeAvnm aehg ertxct eutn from resulting PNAS neurotoxicity high have sim- venoms Here, phospholipase (21). A potent dichotomy A–B the across ple as activities known neurotoxic (18) in differences individuals with associated is oms expres- venom in differences (18–20). with toxicity in correlated and be function differences to shown manner; been have tractable venom sion the a alter in directly (10), to phenotype product expression final toxin their in and variation are allowing inter- venoms expressed are the Because processes between posed (15–17). developmental complicating scales toxin no phylogenetic among secretions, divergence all expression at of genes evolutionary levels families toxin-gene high extreme 25 and show to 5 that 12), comprise venoms (11, Snake 14). fitness tractability (13, genetic rates to their regulatory contributions of 10), of because (9, traits consequences polygenic phenotypic in variation and impacts adaptive challenge. a polygenic, remains in traits divergence complex character- phenotypic result, to poorly a change of As regulatory (8). refs. products linking loci complex genes; many involving are several pathways developmental traits to ized most one with 7), (i.e., traits and bases 6 for genetic exist simple populations relatively natural in adaptive variation of examples expression several Although (5). traits to particular networks affect gene-regulatory through transmits variation this understanding phenotypic how requires at however, the variation, Discerning pervasive can regulatory of and (2–4). impacts is levels divergence species regulation phenotypic and affect gene population, organismal, in cellular, the Divergence unique with cells (1). of types functions different produce to genome single this A eateto ilgclSine,CesnUiest,Cesn C29634; SC Clemson, University, Clemson Sciences, Biological of Department h ags xso hntpcvraini atenk ven- rattlesnake in variation phenotypic of axis largest The the studying for system a as emerged have venoms Snake esstesm eoe aito ngn euainallows regulation gene in pos- variation organism genome, eukaryotic same a the within sesses cell every nearly lthough 01Vl 1 o e2014634118 4 No. 118 Vol. 2021 a,b,c,1 a,3 | e c eeregulation gene eateto nertv ilg,Uiest fSuhFoia ap,F 33620; FL Tampa, Florida, South of University Biology, Integrative of Department ioh .Colston J. Timothy , rnStiers Erin , A e ht .Rautsaw M. Rhett , 2 eateto ilgclSine lrd tt nvriy alhse,F 20;and 32306; FL Tallahassee, University, State Florida Science, Biological of Department (PLA 2 ertxn n o levels low and neurotoxins ) | a methylation cye .Ellsworth A. Schyler , e,4 a | chromatin ai .Gilbert M. David , ao .Strickland L. Jason , e unrS Nystrom S. Gunnar , b eateto raimcadEouinr ilg,HradUniversity, Harvard Biology, Evolutionary and Organismic of Department e doi:10.1073/pnas.2014634118/-/DCSupplemental at online information supporting contains article This 4 ulse aur 8 2021. 18, January Published 3 2 1 the under Published Submission.y Direct PNAS a is and article data; This analyzed interest.y paper. y competing the D.R.R. no wrote M.J.M., C.L.P. declare and and authors research; D.M.G., The G.S.N., T.J.C., performed G.S.N., S.A.E., S.A.E., D.M.G. A.J.M., E.S., J.L.S., E.P.H., and R.M.R., T.D.S., M.J.M., T.J.C., A.J.M., D.A.B., J.L.S., M.P.H., J.L.S., R.M.R., R.M.R., G.S.N., M.J.M., research; S.A.E., designed C.L.P. A.J.M., and D.R.R., M.J.M., contributions: Author stersl fasml eoyetruhgn oso com- a or loss mechanisms, gene gene-regulatory unclear. through is through however, genotype mediated simple genotype a plex of result simplicity phenotypic the this is Whether . toxic tens of possessed hundreds techniques to similar by analyzed 1A; venoms viper pro- Fig. most toxic four in that found shown comprised (24) teins al. are et Calvete S1). estimates Table Appendix, [VC] species any complexity of venom toxic, (venom most ( but simplest, Rattlesnake the Tiger refs. possesses the 1A; tigris) , (Fig. Among present 23). also variation, and was 22 this types of venom within all, gene summarized variation not that and is but showed venoms some, and rattlesnake documented explained (18) in presence–absence work been species Recent has 1A. between Fig. neu- and dichotomy in lack (14) A–B within hand, The both other SVMPs. the of on levels venoms, B rotoxic type (SVMPs). complex metalloproteinases More snake-venom tissue-damaging of rsn drs:Dprmn fBooy nvriyo lrd,Gievle L32611. y FL Gainesville, Florida, of University Biology, of Department address: Present rsn drs:SineDprmn,Cp erCmuiyClee imntn NC Wilmington, College, Community Fear Cape 28401. Department, Science address: Ohio Present The Biology, viper@ 43210. y Organismal OH and or Columbus, Ecology University, State Evolution, [email protected] of Department Email: address: Present addressed. be may clemson.edu. correspondence whom To opeiymyb h euto hrdrgltr elements regulatory shared of members. result gene-family among the genomic be of may retention the complexity that toxic, suggest most We but venom. simplest, rattlesnake the of production producing the for proteins were responsible mechanisms venom regulatory genes indicating of phenotype, venom simple number of the number the The of exceeded genotype. product greatly complex the or was simple venom and a determine rattlesnake to sequenced simplest Rattlesnake the We Tiger whether the rates. of evolutionary genome the contributions high assembled tractability, and genetic fitness, their to of regula- this or addressing because for sequence, system question excellent number, an gene are are venoms in differences Snake tion. trait variation whether of is result biology the in question central A Significance a,d ai .Rokyta R. Darin , y nrwJ Mason J. Andrew , PLA y 2 btpsesohrprlg)adepeshigh express and paralogs) other possess (but s NSlicense.y PNAS 3 fthe of ∼93% e ihe .Hogan P. Michael , e d https://doi.org/10.1073/pnas.2014634118 eateto ilg,Uiest fSouth of University Biology, of Department n hitpe .Parkinson L. Christopher and , a,2 rsa .Schramer D. Tristan , .tigris C. f eateto oetyand Forestry of Department . y eo rtoe whereas proteome, venom https://www.pnas.org/lookup/suppl/ e , a , | a,f,1 f12 of 1 SI

EVOLUTION ABCrotalus tigris VC=5.15 Crotalus adamanteus VC=18.19 1 SVMP PLA2 CRISP MYO SVSP PDE VEGF NUC 0 KUN

Crotalus horridus Type A VC=12.24 Crotalus horridus Type B VC=15.42

1

0

Crotalus scutulatus Type A VC=19.68 Type B VC=14.66

1

0 Absorbance (220 nm) relative to maximum

Crotalus atrox VC=18.83 Crotalus viridis VC=11.57 LAAO 1 HYAL VF PLB NGF CRISP Vespryn Gene density 0 Repeat content CG content 20 40 60 80 100 120 20 40 60 80 100 120 Methylation Chromatin Accessibility Time (Minutes)

Fig. 1. A complex genetic architecture underlies the simple venom phenotype of the . (A) Comparative proteomics shows the simplicity of the Tiger Rattlesnake (C. tigris) venom phenotype. RP-HPLC of whole venoms for eight individuals across six species of rattlesnake. Chromatographic peaks represent a particular toxic or set of proteins. Estimates of VC, our proxy for phenotypic complexity, are shown next to each venom profile, and the Tiger Rattlesnake (VC = 5.15) possessed a significantly simpler venom than all other rattlesnakes shown here (see SI Appendix, Table S1 for details). For species polymorphic for the A–B venom dichotomy, where type A venoms are simple and neurotoxic and type B venoms are complex and hemotoxic, venom type is listed next to species name. C. tigris possesses a type A venom, whereas Crotalus adamanteus, C. atrox, and C. viridis possess type B venoms. See SI Appendix, Fig. S4 for transcriptome–proteome relationship in C. tigris. (B) Circos plot of the RaGOO assembly displaying gene density, repeat content, GC content, proportion of methylated cytosines in the venom gland, and chromatin accessibility in the venom gland across 100-kb windows at scale. Venom that were reliably mapped to chromosome scaffolds are shown as colored vertical lines and labeled accordingly. CRISP, cysteine-rich secretory protein; HYAL, hyaluronidase; KUN, Kunitz-type toxin; LAAO, L-amino acid oxidase; MYO, ; NGF, nerve growth factor; NUC, nucleotidase; PDE, phosphodiesterase; PLB, phospholipase B; VEGF, vascular endothelial growth factor; VF, venom factor. Snake image credits: M.P.H. and Travis Fisher (photographer).

To characterize the genetic architecture underlying the sim- Results plest rattlesnake venom, we sequenced and assembled the Tiger De Novo Genome Assembly and Annotation. We sequenced the Rattlesnake genome. We then used this genome, along with Tiger Rattlesnake genome using PacBio (∼ 33× coverage; N50 RNA-sequencing (RNA-seq), Assay for Transposase-Accessible of 25.73 kb; SI Appendix, Fig. S1) and Illumina (∼ 190× cover- Chromatin using sequencing (ATAC-seq), and whole-genome age; 150 paired-end [PE]) sequencing technologies. The hybrid bisulfite sequencing (WGBS), to determine whether 1) the sim- de novo assembly produced 4,228 scaffolds with a N50 scaf- plest venom phenotype was matched by a simple genetic archi- fold/contig length of 2.11 Mb. Although the final assembly was tecture or 2) molecular and evolutionary mechanisms mediated 1.61 Gb in size, similar to other genome assem- the production of a simple phenotype from a more complex blies (26–29), the N50 contig length was order(s)-of-magnitude genotype. Here, an approximate one-to-one mapping of geno- larger in our assembly relative to other assemblies (Table 1). We type to phenotype (i.e., venom gene to venom protein) would assessed the completeness of the assembly using the BUSCO represent a simple genotype. A complex genotype would be char- (version [v] 3) (30) vertebrata gene set (n = 2, 586) and found acterized by a many-to-one mapping of genotype to phenotype, that the assembly was ∼95.8% complete (93.9% single-copy, where many functional, but silenced, venom genes would be 1.9% duplicated, 2.0% fragmented, and 2.2% missing), similar to present in the genome, consistent with definitions from other other recent squamate genomes (e.g., Komodo Dragon 95.7%; systems (reviewed in ref. 25). We found evidence that gene ref. 31). loss, chromatin accessibility, and methylation levels all con- We next used RaGOO (32) to scaffold our de novo assembly tributed to the production of the simplest, most toxic rattlesnake to the Prairie Rattlesnake genome assembly (28). The scaf- venom. Although deletion events reduced genomic complexity folded assembly reduced the total number of scaffolds from in particular toxin-gene tandem arrays, other genomic regions 4,228 to 380 (160 new scaffolds and 220 unassigned de novo maintained complexity similar to other rattlesnakes with more scaffolds) and increased the scaffold N50 from 2.11 to 207.72 complex venom phenotypes. In these regions, chromatin acces- MB (Table 1), producing a chromosomal-level assembly; all 18 sibility and methylation levels were significant predictors of (7 macro-, 10 micro-, Z chromosome) identified toxin- and, therefore, putatively enabled a com- in the Prairie Rattlesnake genome assembly (28) were assem- plex genetic architecture to produce a simple phenotype. Overall, bled here (Fig. 1B). Genomic content in the Tiger Rattlesnake we provided the most complete genomic characterization of genome was comparable to other recent snake genomes. For the the venom-gene regulatory network to date and identified key de novo assembly, CG content, repeat content, and gene density mechanisms generating phenotypic variation across a polygenic were 39.9% (39.8% RaGOO assembly), 42.8% (44.0% RaGOO regulatory network. assembly), and 1.17 genes per 100 kb (1.13 genes per 100 kb

2 of 12 | PNAS Margres et al. https://doi.org/10.1073/pnas.2014634118 The Tiger Rattlesnake genome reveals a complex genotype underlying a simple venom phenotype Downloaded by guest on October 2, 2021 Downloaded by guest on October 2, 2021 SMs nteTgrRtlsaegnm Fg 2). (Fig. genome on Rattlesnake deletions Tiger the (PLA large in found (SVMPs) 7 We microchromosome types. venom both genetic and the species bac- several investigate used colleagues to the and of cloning Dowell architecture chromosome 23); (22, the artificial as al. terial well et as Dowell (28) of genome data Rattlesnake Prairie rattlesnake the com- other using several we to species Rattlesnake, genome Tiger Rattlesnake Tiger the the in pared losses Venoms. Simple toxin-gene of specific Evolution tify the to Contributes Loss Gene toxin putative identified the of 36 not. 51 did to the genes 25 pheno- of that venom 26 therefore, the to and, to type contributed 15 genes that toxin confirmed putative we although identified Overall, S4), of (10). Fig. detection toxins the Appendix, other precluded (SI likely thresholds detection genome proteomic the the in present of be venom to spectrometry mass quantitative by confirmed h ie atenk eoervasacmlxgntp neligasml eo phenotype venom simple a underlying genotype complex a reveals genome Rattlesnake Tiger The al. et Margres different to vas- mapped 1B; and [VEGF]) (Fig. although locations [CTLs] factor chromosomal genomes, lectins growth (28) C-type endothelial (e.g., Rattlesnake cular families Prairie toxin-gene and other Tiger the Rat- Prairie in the in genes genes (e.g., toxin assembly toxin 1B; orthologous Rattlesnake (Fig. scaffolded of tlesnake Tiger the locations of the of locations to results the genome, mapping compared Rattlesnake Tiger and the the in analyzed location genes ven- toxin chromosomal we other putative the from 51 determine the toxins To of putative species. to snake homologous omous 51 were genes, which protein-coding FGENESH+ 18,240 Appendix of annotate (33), (SI to MAKER assessment used individual manual and genome S2) on the Table RNA-seq and from performed tissues We 25 respectively. assembly), RaGOO rncitm;rslswr agl ossetars ee indi- S3 seven venom-gland Fig. across Appendix , the identified (SI consistent viduals in largely genes expressed were toxin appreciably results putative transcriptome; were genome the the of in half TPM approximately with genes that toxin 20 six million additional identified an per and (transcripts We ual expressed Rattlesnake. highly in Tiger were [TPM] to expressed the that actively RNA-seq genes of were used toxin gland genes next 51 con- venom We these directly the phenotype. of and which venom expressed determine the these be the in of to to all role likely tribute Not a were venom. played however, rattlesnake likely genes, simplest loss the gene of that production indicating 1), (Table 11.76, hnfudi h rii atenk eoe(n genome (n Rattlesnake genome Prairie Rattlesnake the Tiger in found the Venom. than in 51) Simplest genes toxin the fewer of cantly Basis Genetic The PLA df ,0 1k)i ohvnmgad ftegnm individ- genome the of glands venom both in k]) [1 >1,000 cfodN0 Mb N50, Scaffold weeaporae.Te10RGOsaflsd o nld h 2 cfod sindt hoooe0(6 aO cfod n 220 and scaffolds RaGOO (160 0 total). chromosome scaffolds to 380 assigned 0; scaffolds chromosome 220 in the scaffolds include novo not de do unassigned scaffolds RaGOO 160 The appropriate). (where contigs of Number scaffolds of Number Gb assemblies size, genome Assembly snake other to comparisons and statistics assembly genome Rattlesnake Tiger 1. Table rti-oiggenes Protein-coding % content, GC Mb N50, Contig uaievnmpoencdn ee 51 genes protein-coding venom Putative 1, = o h ie atenk,d ooasml ttsisaesono h et n aO sebysaitc r hw nteright the on shown are statistics assembly RaGOO and left, the on shown are statistics assembly novo de Rattlesnake, Tiger the For 2 n VP hrdsmlrcrmsmllocations chromosomal similar shared SVMP) and P < .Ms ao oi-eefamilies toxin-gene major Most S1). Dataset PLA .0)adohrvnmu nk species snake venomous other and 0.001) 2 n VPtxngn aiisacross families toxin-gene SVMP and and .Ffentxn were toxins Fifteen S1). Dataset aae S1). Dataset 2 )admcohoooe1 microchromosome and s) eietfidsignifi- identified We ie atenk rii atenk iePc ie igCbaIda Cobra Indian Cobra King Viper Five-Pace Rattlesnake Prairie Rattlesnake Tiger > 2.11/207.72 4,228/160 1.61/1.59 39.9/39.8 0,suggesting 500, 18,240 4,228 2.110 92; = 3) and (34), i.S2 Fig. , oiden- To χ 2 = = eoi ein(2.TeTgrRtlsaepsesdthree this possessed of all Rattlesnake (PLA across history Tiger shared The evolutionary three PLA was (22). (range dynamic region four paralog the was genomic single e.g., region reflecting no species; this species, although in other seven), species in to all found across were genes region deletion this this dele- PLA in other other (although that all events Rattlesnake in Tiger suggesting tion the found to been date, unique was have to event genes three examined these rattlesnakes of two three least of At deletion the in resulted PLA that genome Rattlesnake Tiger lg htwr o xrse nteTgrRtlsaegenome Rattlesnake Tiger the par- in functional expressed of (e.g., not retention were similar the that as found alogs well more We as with pro- phenotypes, species phenotype. in complex the microchromosomes venom both explain on simplest completely events deletion the not could of alone duction the loss to gene relative genome region region, genomic Rattlesnake SVMP Tiger the the in role of larger Tiger a playing The of deletions 30). two gene only to but 4 genes, genes, (range venom these SVMP 13 five was possessed mean rattlesnake date Rattlesnake all the to across data, investigated genes venom species these SVMP Rattlesnake Including functional Diamondback of (35). number Western genome the atrox) identified in 2) (Crotalus Fig. paralogs ven- in included SVMP A (not type work 30 recent with more 2); scutulatus) (Fig. with (Crotalus oms shared Rattlesnake was how- 1 Mojave microchromosome 7, the on microchromosome event the Unlike deletion in the genes. resulted ever, SVMP that 10 genome of Rattlesnake Tiger deletion the in kb) (∼340 (e.g., venoms complex horridus to more Crotalus relative much Rattlesnake with genome, species Tiger rattlesnake the Rattlesnake other in Tiger and different the not complexity was in genomic complexity event average deletion than large less a and found 21) we Although (18, venoms A .Methylation S5). Fig. Appendix, (SI near samples protein-coding methylation all annotated decreased all across other for found genes (TSSs) with sites and CpG consistent start 37), transcription (∼77%), genome-wide (36, tissues high at 3; vertebrates all recovered Fig. milked at (control; across We one pancreas milked methylation S3). “Active,” the (one Table and as glands Appendix, “Resting”) indicated venom as and two indicated d tissues, three 4 on Expression. WGBS Toxin-Gene Predict Levels Methylation phenotype genotype. simple complex a more of a production from the enabling also were anisms 166,667 179.89 17,352 7,043 0.015 1.34 nmcohoooe1 efudalredlto event deletion large a found we 1, microchromosome On nmcohoooe7 efuda 1k eeini the in deletion 11-kb an found we 7, microchromosome On 92 — 2 2 2 PLA 2 -K eo genes: venom uuiso h enn ertxno type of neurotoxin defining the of subunits -A2-MTXA) eo ee:tebsc(PLA basic the genes: venom i.2.Tema ubro functional of number mean The 2). Fig. ; 2 -C2 SVMP-234 and yeB). type 6,5 9,9 1,897 296,399 160,256 732866313,066 816,633 17,322 1141,0 23,248 18,506 21,194 .2 .0 0.300 0.004 0.220 ,idctn htrgltr mech- regulatory that indicating SVMP-233), 2.12 1.47 262 PLA — and 2 PLA -C1, eeepesd Despite expressed. were SVMP-244, 2 PLA hc a o expressed. not was which -C2, https://doi.org/10.1073/pnas.2014634118 .3223.35 1.79 0.23 1.66 0640.5 40.6 232 2 and -A1, 2 n acidic and -B2-MTXB) PLA 139 PLA eperformed We PLA PNAS 2 0dand d >30 -D 2 2 genomic Fg 2). (Fig. | venom PLA f12 of 3 SI 2

EVOLUTION OTUD3 EBC2 K A2C1 AF 1 D MUL1

C. tigris Type A TE B2-MTXB NE A2-MTXA ~11,000 bp deletion

C. scutulatus Type A TE B2-MTXB NF A2-MTXA TE NE

C. horridus Type A TE B2-MTXB NE A2-MTXA B2-MTXA NE NE

C. scutulatus Type B TE B1 PLA2-K NE

C. viridis Type B TE B1 PLA2-K NE

C. atrox Type B TE B1 PLA2-K NE

C. horridus Type B TE ~15,000 bp deletion NE NE

C. adamanteus Type B TE B1-NF ~9,000 bp deletion NE NE

Non-toxin Toxin NF NE Not expressed TE Transposable Element

NEFM 232 233 234 1834235126352 235323 237233 8 240 242244 24 2442 ADAM28 SVMP Type PIII PII PII PIII PIII PIII PIII PII PII PIII PIII PIII PII PIII PIII C. tigris Type A NE NE ~340,000 bp deletion NE C. scutulatus Type A NF ~341,000 bp deletion NE C. horridus Type A NE NENE NE NE NE NE NE NE C. horridus Type B NE NE NE NE C. scutulatus Type B NE C. viridis Type B NE

Fig. 2. Gene losses in two major toxin-gene families partly facilitate the evolution of a simple venom phenotype in the Tiger Rattlesnake (C. tigris). Genomic position for the PLA2 (Upper) and SVMP (Lower) toxin tandem arrays across several rattlesnake species. Toxin genes are shown as black boxes, and toxin- is taken from Dowell et al. (22, 23). Flanking nontoxin genes (e.g., OTUD3) are shown as white boxes. The Tiger Rattlesnake genome has large deletions in both toxin tandem arrays, although deletions do not completely explain the simplest venom phenotype; for example, five functional SVMP genes are retained in the genome, but only two are expressed. PII, type II SVMP; PIII, type III SVMP.

and expression levels were significantly negatively correlated Fig. S7). Given that these genes were differentially expressed for all genes across all tissues (P < 0.001), as were toxin- and differentially methylated across venom glands and pancreas specific methylation and expression levels in the venom glands (Fig. 3B), methylation appeared to also play a role in regulat- (P = 0.012; SI Appendix, Fig. S6). To determine if differences ing the nonvenom components associated with venom transla- in methylation predicted differences in gene expression across tion/secretion and, therefore, in the overall production of the venom glands and pancreas, we performed a linear regres- simple venom. sion and found significant negative correlations in toxin genes (P = 0.001) and differentially expressed nontoxin genes (P < Chromatin Accessibility Predicts Toxin-Gene Expression. We per- 0.001), but not in similarly expressed nontoxins (P = 0.280; formed ATAC-seq on nine tissues, six venom glands (right and Fig. 3B); these results indicated that toxin and nontoxin tran- left venom glands from three individuals) and three control tis- scripts that were differentially expressed across pancreas and sues (Harderian gland, labial gland, and pancreas; SI Appendix, venom glands exhibited significant, correlated changes in methy- Table S4). We found increased chromatin accessibility near lation level, whereas nontoxin transcripts that were similarly TSS for all nontoxin genes across all control tissues with lit- expressed across pancreas and venom glands were also similarly tle to no accessibility near toxin TSS (SI Appendix, Fig. S8). methylated. Highly expressed toxin genes were demethylated in In the venom glands, we found increased chromatin accessibil- the venom gland and heavily methylated in the pancreas (Fig. 3B; ity near toxin TSSs and transcription end sites (TESs) relative SI Appendix, Fig. S6), as expected, suggesting that methyla- to nontoxins in the venom gland; this pattern was even more tion played a role in venom-gene expression. pronounced for the 20 most highly expressed toxins described (GO)-enrichment analysis of the differentially expressed non- above. Highly expressed toxins exhibited much higher levels toxin genes (n = 54) showed that these genes were involved of chromatin accessibility relative to lowly expressed toxins in with protein production, particularly the ribosomal machinery the venom glands, where low-expression toxin-gene accessibil- of the venom-gland cells; roughly a third of the terms were ity was much more similar to nontoxin-gene accessibility (SI directly related to ribosomal activity/components (SI Appendix, Appendix, Figs. S8 and S9). Highly expressed toxin genes were

4 of 12 | PNAS Margres et al. https://doi.org/10.1073/pnas.2014634118 The Tiger Rattlesnake genome reveals a complex genotype underlying a simple venom phenotype Downloaded by guest on October 2, 2021 Downloaded by guest on October 2, 2021 is 8and S8 Figs. Appendix, (SI tissue the any S9 in in did TESs TESs genes accessible nontoxin have near and not genes accessibility toxin increased low-expression gland; having venom in unique also sequenced. VGs both for bins 2-kb across percentage methylation h ie atenk eoervasacmlxgntp neligasml eo phenotype venom simple a underlying genotype complex a reveals genome Rattlesnake Tiger The al. et lev- Margres methylation toxin-gene of co-regulating distributions in the role compared joint we potential methylation, their expression, and as well accessibility Phenotype. as Venom chromatin Complex Simple between a a relationship Allow Produce Jointly to Methylation Genotype and Accessibility Chromatin levels. expression toxin-gene with correlated methylation reduced have VGs the in expressed highly exhibit more that are transcripts that Overall, transcripts pancreas. The (C levels; regions. the versa. genomic in methylation vice (D) methylated in and highly differences VG more exhibit the transcripts in also represent levels VGs transcripts values and represent negative pancreas values whereas negative across VGs, whereas DE the VGs, in the methylated in expressed The highly highly pancreas. TPM more more the (mean transcripts in represent represent expressed pancreas points values highly TPM and Blue positive more across (mean pancreas; transcript. VGs bp) and pancreas individual across VGs (±500 and an between DE TSS represents VGs expression significantly for point across level Each not (DE) methylation tissues. were expressed in same differentially that Difference the significantly (B) across were tests. differences that nonsignificant expression nontoxins predicts represents significantly gray VGs and and tests, pancreas significant represents using green subsampling; by after assessed high- TPM was (Closed). (Open), with significance regions regions transcripts chromatin Statistical 32 inaccessible chromatin-accessible introns and the venom-gland-specific including regions, represent regions intergenic across toxin-genic toxins (Nontoxin), entire VGs introns TSSs, included including toxin-gene resting regions regions TPM) and (Low nontoxin-genic TSS low-expression entire (VG), TSSs, TSSs, toxin-gene glands nontoxin-gene all all TSS), venom (Toxin), Toxin TPM active (High TSSs pancreas, toxin-gene for expression levels methylation of 3. Fig. A High TPM All Low TPM All .Oeal hoai cesblt,smlrt ehlto,was methylation, to similar accessibility, chromatin Overall, ). Open Toxin TSS Toxin TSS Toxin TSS Toxins Nontoxin TSS Nontoxins Intergenic Closed Distributions (A) Rattlesnake. Tiger the of glands venom the in expression gene regulate jointly accessibility chromatin and levels Methylation 0 04 08 100 80 60 40 20 0 paon h siae tr ie ihTMtxn ersn h 0ms ihyepesdtxn (TPM toxins expressed highly most 20 the represent toxins TPM High site. start estimated the around bp ±500 Methylation % y xsrpeet o-rnfre TCsqma oeaeetmtsars -bbn o l he G.The VGs. three all for bins 2-kb across estimates coverage mean ATAC-seq log-transformed represents axis and hoai cesblt n ehlto ee r infiatyngtvl orltdars h PLA the across correlated negatively significantly are level methylation and accessibility Chromatin D) x xssostedfeec nmtyainlvlbtenVsadpnra;pstv ausrpeettranscripts represent values positive pancreas; and VGs between level methylation in difference the shows axis < t et.Pecat ersn h rprino 000bootstrapped 10,000 of proportion the represent charts Pie tests. ,0.Nnoi ee ersn ee cieyepesdi h G htd o rdc oi proteins. toxic produce not do that VGs the in expressed actively genes represent genes Nontoxin 1,000. Resting VG Pancreas Active VG oass the assess To > 0;Padj 300;

log(Mean ATAC Coverage) CD B 1 2 3 4 5 iet h acesfra es n oprsnars the across comparison one least at for rela- pancreas levels the methylation to venom-gland tive in reductions 3A; significant Fig. introns chromatin inaccessible (Closed; including 9) regions regions and nontoxin- regions, nontoxin-genic intergenic all 8) entire (Nontoxin), 6) 7) (Toxin), TSSs, entire introns 5) gene including TSSs, toxin-gene regions TPM) toxin-gene toxin-genic (Low all 3) low-expression TSSs, 4) Toxin) TSSs, TPM (High following gene the toxin expression across see glands chromatin-accessible (Open; venom-gland-specific venom regions 1) both regions: and genomic pancreas for els 55 5100 75 50 25 0 R >

2 Higher Pancreas <−− Difference in log(TPM) −−> Higher Venom Glands −10 =0.28, −5 10 0 5 .) e onsrpeettxn.The toxins. represent points Red 0.1). Mean Methylation % Higher Pancreas <−−Difference inMethylation −−>HigherVenom Glands R R R    p = = =    <0.001  0.11 0.53 0.64  ,    p , ,    p p =    = < 0.280   0.001 0.001 > 0;Padj 300; aeil n Methods and Materials 5 50 0 −50 .W found We S10 ). Fig. Appendix, SI < t .1.Bakpit ersn nontoxins represent points Black 0.01). et htwr infiat(P significant were that tests https://doi.org/10.1073/pnas.2014634118 log(Mean ATAC Coverage) 2 3 4 5 0255075 R y ENnoi otxnToxin Nontoxin DE Nontoxin 2 xssostedfeec in difference the shows axis =0.25,  Mean Methylation % o eal) )high- 2) details), for x p <0.001 xsrpeet mean represents axis > ,0) o TPM Low 1,000). PNAS 2 (C n SVMP and ) | < f12 of 5 0.05) 100

EVOLUTION Downloaded by guest on October 2, 2021 i.4. Fig. f12 of 6 the both to relative (0.440 gland (P TSSs venom gland venom active resting signif- and the was pancreas 3A; methylation in (Fig. TSSs, greater result nontoxin icantly this For in S10). confidence Fig Appendix, less showed analysis ping ( different significantly ( also regions genomic 3D) (Fig. significantly the both were across level correlated negatively methylation Indeed, and Rattlesnake. accessibility regu- Tiger the the chromatin in to expression tandem, toxin-gene negatively of in lation and contributed, significantly putatively both were correlated; methylation and accessibility (0.011≤ Toxin and (0.001≤ TSSs represents shading Toxin red whereas (P (control), Open pancreas be the may in all not bins PLA and 2-kb peak appropriate; the where overlaid, in tracks at under are levels shown coverage are colors methylation in annotations Gene respectively; change represents gland. venom sequenced, fold shading maximum. the represent in individuals gray bins signalValues 2-kb three MACS2-estimated track, in levels the peak; methylation methylation represent the the under methylation plots For area ATAC-seq and the summit. and MACS2, represent by RNA-seq signalValues identified the Genrich-estimated peaks visible. in ATAC-seq transcriptome, colors specific (LVG) venom-gland venom-gland orange left Genrich, ATAC-seq, and RVG by transcriptome, identified (RVG) RNA-seq venom-gland peaks percentage. ATAC-seq right bottom: specific to venom-gland top ATAC-seq, from LVG Rows genes. silenced near methylation PLA of Expression Expression 6888 700 600

Methylation % 100 signalValue signalValue Coverage relative to max. Coverage relative to max. 2 0 0 5 0 0 1 0 1 0 0 euaoylnsaefrtePLA the for landscape Regulatory | oi ee ihicesdacsiiiyadrdcdmtyainna ciegns(.. PLA (i.e., genes active near methylation reduced and accessibility increased with genes toxin https://doi.org/10.1073/pnas.2014634118 PNAS < ≤ .0;sergttiso itiuin) ihTPM High distributions), of tails right see 0.001; P ≤ y P TD PLA OTUD3 TD PLA OTUD3 TD PLA OTUD3 TD PLA OTUD3 TD PLA OTUD3 .1) otxn (0.083 Nontoxins 0.910), xsrpeetvnmgadepeso eessae ytems ihyepesdtasrp nta eoi ein e,blue, Red, region. genomic that in transcript expressed highly most the by scaled levels expression venom-gland represent axes ≤ P ≤ .3)rgos niaigta chromatin that indicating regions, 0.038) .0) l oi Ss(P TSSs Toxin All 0.009), P P < < .0) lhuhorbootstrap- our although 0.001), .0) negncrgoswere regions Intergenic 0.001). 2 2 2 2 2 2 oi-eefml nteTgrRtlsae hoai cesblt n ehlto eesrglt h expression the regulate levels methylation and accessibility Chromatin Rattlesnake. Tiger the in family toxin-gene EPLA -E EPLA -E EPLA -E EPLA -E EPLA -E PLA < .0) o P Toxin TPM Low 0.001). 2 n SVMP and 3C) (Fig. ≤ P ≤ .6) and 0.160), h ie atenk eoervasacmlxgntp neligasml eo phenotype venom simple a underlying genotype complex a reveals genome Rattlesnake Tiger The 0 = . 013), SI 2 2 2 2 2 B PLA -B2 B PLA -B2 B PLA -B2 B PLA -B2 B PLA -B2 ircrmsm ,teTgrRtlsaepsesdthree possessed Rattlesnake Tiger 4 the PLA (Figs. 7, level microchromosome expression of 5; regardless and regions genes, genomic all toxin for containing venom-specific levels expression and putative levels, methylation represent see enhancers; and (which promoters regions accessible 3A; (Fig. level S8). expression Fig. toxin-gene Appendix, to related Toxin methylation directly and TPM were accessibility Low chromatin in that showed TPM result regions venom High nonsignificant TSS in and either result regions significant and TSS The Toxin pancreas expected. as across 3A), (Fig. level gland methylation in ferent (0.120 Closed enx netgtdvnmgadseicchromatin- venom-gland-specific investigated next We 2 eo ee:tebsc(PLA basic the genes: venom .Frthe For S11–S29). Figs. Appendix , SI 2 2 2 2 2 C PLA -C2 C PLA -C2 C PLA -C2 C PLA -C2 C PLA -C2 ≤ P 2 -B2 ≤ 2 2 2 2 2 .5)rgoswr o infiatydif- significantly not were regions 0.950) A PLA -A2 A PLA -A2 A PLA -A2 A PLA -A2 n eue cesblt n increased and accessibility reduced and -A2) and A PLA -A2 aeil n Methods and Materials 2 n - A2 and -C2, -B2, 2 2 2 2 2 FMUL1 -F FMUL1 -F FMUL1 -F FMUL1 -F FMUL1 -F 2 n acidic and -B2-MTXB) r oi ee.Max., genes. toxin are PLA 2 o details), for age tal. et Margres einon region SI Downloaded by guest on October 2, 2021 h ie atenk eoervasacmlxgntp neligasml eo phenotype venom simple a underlying genotype complex a reveals genome Rattlesnake Tiger The al. et Margres methylation reduced and accessibility increased found again We and genes: venom 233, SVMP five (SI possessed tlesnake genes toxin all across consistent S11–29). Figs. largely Appendix, not were but results genes, these active the near identi- fied were regions chromatin-accessible near Venom-gland-specific methylation increased and accessibility methyla- reduced and near accessibility tion increased found We S3). Fig. transcriptome, venom-gland the in and genes toxin expressed highly (PLA represents shading appropriate; red where whereas tracks (control), all under pancreas not shown the and are in annotations overlaid, bins the Gene are 2-kb at gland. colors coverage in venom in levels respectively; the and change methylation in sequenced, 244, fold represents bins individuals represent 2-kb shading signalValues three in gray MACS2-estimated the levels track, peak; methylation represent methylation the the under plots (LVG) ATAC-seq area For venom-gland the and summit. represent left peak RNA-seq signalValues ATAC-seq, the Genrich-estimated RVG visible. in be transcriptome, and colors may MACS2, (RVG) orange by and venom-gland identified blue, right peaks ATAC-seq Red, bottom: venom-gland-specific Genrich, to RNA-seq by (i.e., percentage. top identified genes methylation from peaks active ATAC-seq Rows near venom-gland-specific ATAC-seq, methylation genes. LVG reduced transcriptome, silenced and near accessibility methylation increased with increased genes toxin SVMP of 5. Fig.

o h VPrgo nmcohoooe1 h ie Rat- Tiger the 1, microchromosome on region SVMP the For Expression Expression

Methylation % 100 signalValue signalValue 487 Coverage 190 relative to max. Coverage 350 relative to max. SVMP-244 PLA SVMP-234, 2 0 0 7 0 0 0 1 0 0 1 euaoylnsaefrteSM oi-eefml nteTgrRtlsae hoai cesblt n ehlto eesrglt h expression the regulate levels methylation and accessibility Chromatin Rattlesnake. Tiger the in family toxin-gene SVMP the for landscape Regulatory ertxnsbnt,wihwr h most the were which subunits, neurotoxin -A2-MTXA) SVMP-2442 EMSM-3 VP23SM-3 VP24SM-42ADAM28 SVMP-2442 SVMP-244 SVMP-234 SVMP-233 SVMP-232 NEFM NEFM NEFM NEFM NEFM 2 PLA hc a o xrse Fg 2; (Fig. expressed not was which -C2, 2 eeepesd(i.2; (Fig. expressed were -B2-MTXB and SVMP-244, r oi ee.Mx,maximum. Max., genes. toxin are and y xsrpeetvnmgadepeso eessae ytems ihyepesdtasrp nta eoi region. genomic that in transcript expressed highly most the by scaled levels expression venom-gland represent axes PLA only SVMP-2442; PLA 2 -A2-MTXA VP22SM-3 VP24SM-4 SVMP-2442 SVMP-244 SVMP-234 SVMP-233 SVMP-232 VP22SM-3 VP24SM-4 SVMP-2442 SVMP-244 SVMP-234 SVMP-233 SVMP-232 VP22SM-3 VP24SM-4 SVMP-2442 SVMP-244 SVMP-234 SVMP-233 SVMP-232 VP22SM-3 VP24SM-4 SVMP-2442 SVMP-244 SVMP-234 SVMP-233 SVMP-232 2 i.S3). Fig. Appendix, SI sepce,and expected, as -C2, SVMP-232, PLA 2 IAppendix, SI n reduced and -C2 SVMP-234 Fg 4). (Fig. SVMP- env n 3konmtf htwr infiatyenriched significantly 21 were these that regions; motifs chromatin-accessible known venom-gland-specific 13 in and novo and de (39) (see Genrich the independently ods both datasets on (40) (38) MACS2 HOMER using discov- chromatin- motif analysis a ery performed venom-gland-specific we above, the described regions accessible in enriched significantly the Network. in Regulatory Venom-Gene found were par- regions silenced accessible three of the introns several not although but genes, alogs, active the near 5). mostly identified (Fig. were and paralogs regions accessibility silenced chromatin-accessible three reduced Venom-gland-specific the and near paralogs methylation expressed increased two the near o eal) sn h erc aae,w dnie 8 identified we dataset, Genrich the Using details). for hc a o xrse Fg 5). (Fig. expressed not was which SVMP-2442, SVMP-234 and n eue cesblt and accessibility reduced and SVMP-244) SVMP-232, https://doi.org/10.1073/pnas.2014634118 oietf euaoyelements regulatory identify To SVMP-233, ADAM28 ADAM28 ADAM28 ADAM28 aeil n Meth- and Materials SVMP-234, PNAS | SVMP- f12 of 7

EVOLUTION motifs represented ∼ 17 different transcription factors (TFs; nonallelic homologous recombination, and our data further sup- Datasets S2 and S3). Using the MACS2 dataset, we identified port this hypothesis. Evidence of gene duplications and deletions 24 de novo and 30 known motifs that were significantly enriched was widespread across the genome. For example, with 15 func- in venom-gland-specific chromatin-accessible regions; these 54 tional paralogs, SVSPs were the largest toxin-gene family in motifs represented ∼ 47 different TFs (Datasets S4 and S5). the Tiger Rattlesnake genome (Dataset S1). Similar to previ- Motifs for ∼ 10 TFs were shared among the two datasets: seven ous work in other toxin genes (29, 35), we found evidence of members of the forkhead class (FOX) of DNA-binding proteins, multiple pseudogenes and orphaned exons on microchromo- multiple motifs for the Nuclear Factor I (NFI) family, GRHL2, some 2 (SI Appendix, Figs. S11 and S12), suggesting that past and SMAD3 (although SMAD3 motifs were not identified in any duplications and deletions played a role in generating the cur- toxin promotor/TSS; see below). rent set of SVSP paralogs in the Tiger Rattlesnake genome. We identified a full-length NFIC motif and a half-site NFI Although we found substantial evidence for gene loss, gene dele- motif in the promoter/TSS regions of both PLA2-B2-MTXB and tions alone could not completely explain the simplest venom PLA2-A2-MTXA (Datasets S6 and S7), suggesting that these par- phenotype, as only half of the identified toxins were expressed alogs were co-regulated by the NFI family. Similarly, a FOXA2 at appreciable levels in the venom gland and confirmed in motif was identified in the promoter/TSS region of SVMP- the venom. 234, and a FOXA1 motif was identified in the promoter/TSS To identify regulatory mechanisms governing the expression region of SVMP-244 (although other motifs were identified in (or lack thereof) of these genes, we investigated the roles of the untranslated region). A GRHL2 motif and a half-site NFI methylation and chromatin accessibility in toxin-gene regula- motif were consistently identified across multiple tion. Methylation level and chromatin accessibility were both serine proteinase (SVSP) paralogs. Overall, our motif analysis significant predictors of toxin-gene expression within the venom suggested that 1) paralogs within a toxin-gene family were likely gland (i.e., high- versus low-expression toxin genes) as well as co-regulated by the same (set of) TF(s); and 2) TFs were shared across tissues, suggesting that each mechanism played a role across different toxin-gene families. (perhaps jointly) in the production of the simplest venom pheno- To determine if the venom-gland-specific expression of toxin type. The demethylation of accessible, venom-specific promoter genes occurred through venom-gland-specific TFs, we tested for and enhancer regions was expected; transcription cannot take significant differential expression across venom glands and all place unless a gene is both accessible and free of methylation. other tissues for the 1) ∼10 candidate TFs described above, The redundancy of methylating inaccessible regions, however, and 2) ∼12 candidate TFs previously identified in the Prairie was less clear (44, 45). Previous research suggested that methy- Rattlesnake (28). Schield et al. (28) identified a set of 12 TFs lation was a passive process, by which methyl groups were that were significantly up-regulated in the Prairie Rattlesnake added following the removal of TFs (44). More recent work, venom gland relative to other tissues, but did not identify however, has suggested that methylation assists in maintaining specific binding sites to suggest a role in directly regulating chromatin accessibility by facilitating TF binding (46–48). Here, venom genes (see Discussion). We found 10 TFs that were signif- unmethylated sites would recruit methyl-minus TFs to bind and, icantly differentially expressed across Tiger Rattlesnake venom therefore, prevent chromatin from innately closing. Without the glands and other tissues. NFIA (P < 0.001), GRHL1 (P < removal of methyl groups, transcription factors could not bind, 0.001), and NR4A2 (P = 0.008), all candidates from the previous and chromatin would then rapidly become inaccessible (46–48); study, were significantly overexpressed in the Tiger Rattlesnake this inaccessibility could play a critical role in the prevention venom gland relative to all other tissues. FOXA2 (P = 0.001), of spurious transcription of nearby genes (49), although not all FOXA3 (P < 0.001), FOXD3 (P < 0.001), FOXO3 (P < 0.001), demethylated toxin genes were accessible. For example, toxin NFI (P < 0.001), NFIC (P < 0.001), and SMAD3 (P < 0.001) genes showed minimal expression in the pancreas despite a were significantly underexpressed in the venom gland; all other range of methylation levels (although the correlation between candidates were not differentially expressed across tissue types methylation and expression remained significant; P = 0.047; SI (Dataset S8). Appendix, Fig. S6). Toxin genes, however, were not accessible in the pancreas (SI Appendix, Figs. S8 and S9), indicating that Discussion methylation level and chromatin accessibility were not always A central question in evolutionary biology is whether major phe- redundant. notypic differences are largely the result of variation in gene Conservative estimates suggest that, in vertebrates, tandemly number, gene sequence, or regulation (22, 41, 42). We sequenced arrayed genes represent ∼14% of all genes and ∼25% of all and assembled a high-quality genome for the Tiger Rattlesnake gene-duplication events, yet little is known about their regula- to determine whether gene loss or specific regulatory mech- tion (50). Toxin genes are known to occur in large tandem arrays anisms produced the simplest rattlesnake venom. The most (22, 23, 26, 29, 35), and chromatin accessibility can span thou- contiguous snake-genome assembly to date, in terms of contig sands of base pairs. For example, the two Genrich-identified N50, provided the critical sequence information necessary to PLA2 peaks are ∼2.9 kb and ∼3.6 kb, respectively (Fig. 4). infer gene loss (43) and characterize regulatory landscapes Chromatin-accessible regions containing tandemly arrayed genes across the genome. We used the assembled genome, RNA-seq, could, therefore, contain multiple loci, TSSs, and/or TESs (e.g., ATAC-seq, and WGBS to show that gene deletions, chromatin the broad accessibility of the CTL tandem array; SI Appendix, accessibility, methylation, and specific TFs all contributed to the Fig. S14). High-expression toxin genes were unique in having production of a simple trait from a complex genotype. increased accessibility near both TSSs and TESs. In tandem Our comparative genomic analyses showed significant reduc- arrays, TESs and TSSs will often be adjacent, and 72 to 94% tions in the total number of toxin genes present in the Tiger of tandemly arrayed genes in vertebrate genomes have been Rattlesnake genome relative to other rattlesnake species both shown to be in parallel transcription orientation (i.e., on the same across the entire genome (28) as well as in specific genomic strand; ref. 50). We found similar patterns in the two largest regions (22, 23, 35). On microchromosome 7, we found a 11-kb toxin-gene families here (Fig. 5; SI Appendix, Figs. S11 and deletion in the Tiger Rattlesnake genome that resulted in the S12). Given that toxin paralogs often shared TF binding sites, deletion of three PLA2 venom genes, and this deletion event methylation may have prevented the transcription of nearby, coincided with the location of a known transposable element accessible genes that did not need to be expressed in the same (TE). Dowell et al. (22) proposed that the TE clusters between array; other highly expressed toxin genes not located in tandem genes could facilitate gene duplications and deletions through arrays, however, also exhibited increased chromatin accessibility

8 of 12 | PNAS Margres et al. https://doi.org/10.1073/pnas.2014634118 The Tiger Rattlesnake genome reveals a complex genotype underlying a simple venom phenotype Downloaded by guest on October 2, 2021 Downloaded by guest on October 2, 2021 h ie atenk eoervasacmlxgntp neligasml eo phenotype venom simple a underlying genotype complex a reveals genome Rattlesnake Tiger The al. et Margres did modifications posttranscriptional different that suggesting very observed, show in toxin-protein shown to final As and product. need transcript toxin between would levels expression proteome tran- and the production, scriptome if venom-protein example, modulating For discordant. were be miRNAs to need transcrip- would the proteome venom, and role Rattlesnake tome Tiger a simple play the generating to in miRNAs, including modifications, this adults; transcriptional in also included individuals docu- were (all been study Rattlesnake in not Tiger has the classes in variation age mented ontogenetic to across but expression proposed (57), venom rattlesnakes been mechanisms in similarly such differences for have underlie role (miRNAs) minor more a but microRNAs best, (56), (10). at venoms showed, snake work in recent variation phenotypic 25). major of a ref. as source in proposed (reviewed been have involved posttranscrip- modifications be Posttranscriptional and also may RNAs, modifications, tional noncoding (SNPs), nucleotide single shown including polymorphisms complex mechanisms, of been generation other the yet have in variation, roles trait TFs key play and to elsewhere and methylation, here as such modifications interactions sets by specific TFs. and of predicted level, methylation were structure, chromatin production between venom reg- toxin-gene few and that a showed to ulation we linked Overall, be TFs. to venom-specific appears key components expression, nonvenom toxin-gene than the other production, rather regulating Venom the translation/secretion. on to venom The with gland, linked associated types. venom be the tissue may in across hand, overexpressed expression were differential that of TFs explain- lack tissues, the other also as in ing while such roles genes regulatory identified, venom regulating important we directly maintaining that in role TFs a the play genes. FOXA1, that toxin and propose near expression we sites on binding Therefore, based for enrichment identified significant were et lacked Schield TFs in the candidate motifs whereas (28) expression, al. for toxin-gene enrichment in candi- role significant Our a regulatory suggesting glands. on regions, venom based chromatin-accessible venom-gland-specific the were in TFs candidate overexpressed date Rattlesnake how- were Prairie did, the that We glands from list Rattlesnake. TFs venom Tiger three the in none identify in up-regulated ever, regions, tissues significantly other promoter/enhancer to were relative toxin TFs these putative these for of in enrichment significant TFs identified remod- ∼10 chromatin we through Although Rattlesnake eling. Tiger the results in These expression the of (55). example, activation members for and that, by suggest structure regulated chromatin affect be to also can and the chromatin with act the to belonged FOX identified TFs to date corresponded that signifi- elements regions chromatin-accessible regulatory venom-gland-specific of the set in a enriched identify cantly to ATAC-seq our able Using were (28). we with regions data, toxin-gene associated not in did TFs sites these they binding of regulation, any for venom in suggesting enrichment role significant genome, a identify Rattlesnake play (28) may Prairie TFs al. the these that in et genes Schield venom for of Although evidence (53). found recently evolution highly phenotypic or of and arrays, signature tandem a in are genes TESs genes. expressed venom and highly genes, TSSs S16 ). identified expressed accessible deter- and been to if required S15 only is work mine Figs. have comparative Further , Appendix TESs (51). before accessible (SI once knowledge, TESs our and To TSSs both near tutrlcp-ubrvrat,crmtnsrcue DNA structure, chromatin variants, Structural/copy-number (52) regulation transcriptional in role central a play also TFs NFI N-idn rtishv ensont ietyinter- directly to shown been have proteins DNA-binding aiy(4,and (54), family oee,ti a not was this however, S4 , Fig. Appendix, SI .Fraypost- any For Methods). and Materials NFI NFI NFI 0Ts ih fte1 candi- 10 the of Eight TFs. ∼10 ebr aeas enshown been also have members Fbnigstsnear sites binding TF Ffml a regulate may family TF FOX or NFI Ffamilies. TF ∼ PLA 72% 2 iie apesz (n a size Unfortunately, on sample regulatory networks. of limited regulatory regions, generation complex the promoter phenotype. across to venom and contribute variation the likely enhancer hand, of other the production the the in in particularly role SNPs, key a play not etlnt,5. mttllnt)cletdfo at rzCut,Ari- County, extracted Cruz was Santa (gDNA) from DNA male genomic collected (HMW) length) adult High-molecular-weight total an zona. cm of 56.5 vein Sequencing. length, caudal vent PacBio the and from extracted Isolation DNA High-Molecular-Weight Methods and Materials of evolvability potential geno- map. the complex genotype–phenotype to highlighting the extend traits, may simple modularity and this types that phenotype architec- show complex a We regulatory produce (2). to modular genotype simple that a allowed showed tures polygenic work a in Previous mecha- divergence regulatory trait. phenotypic these to link genes as particular production at well nisms as the the phenotype, to overall assess mechanisms the simultaneously key of to two able of tractability were contributions genetic we relative the system, using venom the By of (8). challenging complex is for especially phenotypes, identifying variation, ubiquitous, such is producing mechanisms expression the gene in variation Although Conclusion of maintenance selec- the role complexity. in what genomic plays and show paralogs networks silenced will gene-regulatory work compar- on Further genomic tion (64). functional genome one and the least ative in at element with on regulatory interact selection that other likely shown purifying regions has of chromatin-accessible work result of recent the elements; be regulatory may shared genes of suggest- inactive, maintenance but (63), the elements that likely regions ing enhancer These of genome. venom-specific Rattlesnake introns Tiger represented several the the expressed, in found in gene not We this regions but Rattlesnake. chromatin-accessible retained, Diamondback venom-specific also Western was the locus in the (mdc- genes; in paralog SVMP paralog this described functional (35) a al. 1 as et retained Giorgianni (SI was genome. sequenced the but transcriptomes S3), venom-gland Fig. seven Appendix, the of any reg- example, in shared/co-opted For of elements. because of retained ulatory some be that in may hypothesis or paralogs additional stages, these an life posit We different tissues. individuals at different other populations), in different expressed in be (perhaps may genes venom functional, the inactive, that in but posited SVMPs and examining when Rattlesnake pattern Diamondback similar Western a toxin found Giorgianni (35) selection. inactive, al. by et maintained but being likely functional, most gene were that tandem genes highlighting in theory genome be further complexity Rattlesnake can arrays, Tiger pseudogenization of the common in cost how array showed to tandem SVSP genotype, contrary The a (62). (61), in could advantageous maps genes genotype–phenotype of be complex number more have the evolvability that and with and suggesting increase robustness expressed that actually not Sim- suggested can clear. were have less studies genes was ulation however, phenotype. venom pseudogenization, venom the the undergone the of not of to half half approximately contributed evolvability Why Here, genes and (58–60). venom robustness, trait identified adaptation, particular a of for rate the influence in aid should identified we goal. enhancers to that and central promoters the is affects networks; putative regulatory species variation complex of and sequence evolution the populations how understanding across determining regulation but venom study, this in ntermnsrp)a h otlkl meit netrof ancestor immediate likely most the as manuscript) their in h rhtcueo h eoyepeoyempcndirectly can map genotype–phenotype the of architecture The 3 = rcue sfo drsigSNPs addressing from us precluded ) SVMP-2442 https://doi.org/10.1073/pnas.2014634118 SVMP-2442 n te functional, other and .tigris C. a o expressed not was PNAS 5 msnout- cm (52 lo was Blood | >95% f12 of 9

EVOLUTION by using a pipette-free protocol (SI Appendix). Single-molecule real-time UniProt database containing all protein-coding genes from A. carolinensis (SMRT) Cell Sequencing (Sequel) was performed at the University of (n = 19, 108) (74), O. hannah (n = 18, 504) (26), and Crotalus (n = 11, 799). Delaware Sequencing and Genotyping Center. HMW DNA was sequenced Because venom-gene families are known to occur in large tandem arrays across six SMRT cells and resulted in 3,540,291 filtered subreads (total and MAKER can underestimate the number of paralogs in particular gene sequence length of 53,096,151,594 bp; ∼ 33× coverage), N50 of 25.73 kb, families (28), we performed additional annotation steps for venom genes and a maximum subread length of 164.84 kb (SI Appendix, Fig. S1). in the C. tigris genome. We used a combination of empirical annotation in FGENESH+ (34) as described (22, 28), as well as manual annotation using DNA Isolation and Illumina Sequencing. Blood was extracted from the RNA-seq alignments in HiSat2 (v2.1.0) (76); the former identified all genes caudal vein of the adult male C. tigris genome animal. gDNA was extracted regardless of expression, whereas the latter was used to explicitly identify for short-read genome-library construction following a standard phenol– expressed toxin genes. chloroform extraction protocol. Approximately 25 µL of whole blood was added to 700 µL of lysis buffer, 35 µL of 10 mg/mL proteinase K, and 1 µL Genomic Alignments. We used published genome sequences to compare the ◦ of 20 mg/mL ribonuclease A; the solution was incubated at 55 C for 2 h. PLA2 and SVMP toxin arrays across all rattlesnake species investigated to One milliliter of phenol–chloroform was added, and the solution was cen- date. For the PLA2 genomic region, we aligned sequences from C. tigris (this trifuged at 13,000 rpm for 5 min. The aqueous phase containing DNA was study), Crotalus viridis (28), and six individuals from Dowell et al. (22): type added to 1 mL of chloroform–isoamyl, and the solution was centrifuged A and type B C. horridus, type A and type B C. scutulatus, C. atrox, and at 13,000 rpm for 3 min. Six hundred microliters of the aqueous phase Crotalus adamanteus. For the SVMP genomic region, we aligned sequences was removed, and 60 µL of 3 M sodium acetate and 1,200 µL of ice-cold from C. tigris (this study), C. viridis (28), and four individuals from Dowell ethanol were added. The solution was placed at −20 ◦C overnight to allow et al. (23): type A and type B C. horridus and type A and type B C. scutu- DNA to precipitate. Precipitate DNA was washed with 70% ethanol and latus. Alignments were conducted by using MAFFT (v7.407) (77). Given the resuspended in 100 µL of 10 mM Tris buffer. The genome library was pre- high rate of gene loss in these regions leading to large differences in length pared by using the TruSeq DNA PCR-Free Library Prep and sequenced 150 across species, we first anchored the alignments by using the flanking non- PE on a NovaSeq 6000 at the Translational Science Laboratory in the Col- toxin genes on either side of the toxin-gene arrays. For the PLA2 region, we lege of Medicine at Florida State University. Sequencing resulted in ∼ 190× used OTUD3 and MUL1. For the SVMP region, we used ADAM28 and NEFM. coverage. We then performed pairwise alignments of each pair of sequences (rather than aligning all at once) to account for the variation in these regions and Genome Assembly. A hybrid genome assembly was performed by using produce a tractable alignment. the Illumina and PacBio data with MaSuRCA (v3.2.8) (65) and default settings on the Clemson University Palmetto Cluster. Genome assembly com- RNA-Seq Analyses. Transcriptomes were assembled by using the “new pleteness was assessed by using the BUSCO (v3) (30) vertebrata gene set Tuxedo” package (78). Trimmed reads were aligned to the de novo- (n = 2, 586). Genome scaffolding was performed by using RaGOO (32); the assembled genome by using HiSat2 (v2.1.0) (76). Transcripts were assembled, de novo-assembled genome and the raw Illumina data were provided as and expression was quantified by using StringTie (v1.3.4) (79). edgeR (80) input. The Prairie Rattlesnake genome (28) served as the reference. All was used to calculate differential expression by comparing tissues with repli- other parameters were default. For both the de novo and the scaffolded cates (i.e., venom gland, vomeronasal organ, muscle, , pancreas, and RaGOO assemblies, we calculated gene density, GC content, and repeat testes) to the expression of all other tissues. Briefly, we used StringTie and content across 100-kb regions using custom python scripts (https://github. edgeR to measure expression in counts per million (CPM). Only genes with a com/masonaj157/Ctigris genome scripts). For the RaGOO scaffolded assem- CPM > 5 in two or more samples were used in downstream analyses. Cover- bly, the proportion of methylated cytosines was calculated from the venom- age data were then normalized in edgeR by using default Trimmed Mean of gland WGBS library from the genome individual. For ATAC-seq coverage, M-values. Genes with a false-discovery rate of <1% and a log-fold change venom-gland libraries for the genome individual were aligned to the of >2 were considered significantly different. Differentially expressed genes RaGOO assembly using Bowtie2 (66) as described below. Summary statis- were visualized by using pheatmap in R across all 36 samples (left and tics for the RaGOO assembly shown in Fig. 1B were visualized in R using the right venom-gland transcriptomes were combined for three individuals circlize package (67). See SI Appendix for details. for plotting) using log2-transformed CPM (SI Appendix, Fig. S2). The 51 putative toxin genes were visualized across all venom-gland samples (SI RNA-Seq. Total RNA was extracted from 39 individual tissues by using a Appendix, Fig. S3). standard TRIzol method (68); these tissues represent 24 different tissue types across seven individuals (SI Appendix, Table S2). To avoid poten- ATAC-Seq, Peak Calling, and Motif Analyses. Nine tissues from three individu- tially confounding ontogenetic differences in venom expression common als (SI Appendix, Table S4) were dissected immediately following euthanasia in rattlesnakes (e.g., ref. 16), all individuals sampled were adults (onto- and individually prepped for ATAC-seq as described (81) with minor modifi- genetic venom variation has also not been documented in C. tigris). To cations. Approximately 50,000 cells per tissue (i.e., ∼50 k intact nuclei) were investigate how dependent venom expression was on the last date of isolated for each library, and nuclei isolation, preparation, and cleanup were venom expulsion, we tested three different stimulation timepoints prior carried out as described (81). The Tn5 transposase (Nextera DNA Library to gland removal: 1) five individuals at 4 d following gland stimulation Preparation Kit, Illumina) was used for transposition. Libraries were then (standard practice; ref. 69), 2) one individual at 1 d (active state), and 3) sequenced 150 PE on a NovaSeq 6000 at the Translational Science Labora- one individual >30 d (resting state). Venom expression has also repeat- tory in the College of Medicine at Florida State University; sequencing depth edly been shown to be under genetic, rather than plastic, control (16, is summarized in SI Appendix, Table S4. Low-quality bases were trimmed 70). Libraries were sequenced 150 PE on a NovaSeq 6000 at the Trans- with Trim Galore! (v0.4.5) (71). Overall alignment rates varied from 81.68 lational Science Laboratory in the College of Medicine at Florida State to 98.79%. Peaks were independently called for each library by using Gen- University or on a NextSeq 550 at the Clemson University Genomics and rich (39) and MACS2 (40). To identify venom-gland-specific ATAC-seq peaks, Bioinformatics Facility. Low-quality bases were trimmed with Trim Galore! BEDTools (82) was used to 1) identify peaks shared across all six venom- (v0.4.5) (71). gland libraries (13,655 Genrich peaks and 27,509 MACS2 peaks) and 2) remove any peak identified in a non-venom-gland tissue (pancreas, labial Genome Annotation. We annotated repeat elements in the de novo- gland, or harderian gland) that overlapped ≥ 50% with a peak from the assembled genome using RepeatModeler (72). We then de novo-assembled shared venom-gland peak set in (1). This filtering step was performed 25 individual transcriptomes (representing 24 distinct tissues) from the adult for the Genrich and MACS2 peak datasets independently and resulted in male C. tigris genome animal as described (73). We then used MAKER 2,446 and 5,919 venom-specific peaks in the Genrich and MACS2 datasets, (v2.31.8) (33) to annotate coding sequences in the de novo-assembled respectively. pyGenomeTracks (83) was used to visualize ATAC-seq data and genome using the 309,105 de novo-assembled transcripts (SI Appendix), peak calls. To identify regulatory elements significantly enriched in the the species-specific repeat library described above, and all protein-coding venom-specific ATAC-seq peaks relative to the genomic background, we per- genes from Anolis carolinensis (n = 19, 108) (74) and Ophiophagus hannah formed a motif-discovery analysis using HOMER (38); analyses were run for (n = 18, 504) (26). After our initial MAKER run, we used BUSCO (v3) (30) the Genrich and MACS2 peak datasets independently. Discovery analyses and our de novo genome assembly to iteratively train AUGUSTUS (75); were run by using “-size 200” and “-mset vertebrates” parameters. Enrich- three iterations of training were performed. Following the final MAKER ment P values ≤ 1e−50 were considered significant, as suggested by the iteration, gene IDs were ascribed based on homology, as described (28). HOMER documentation. Peaks were annotated by using a custom motif file Briefly, we used blastp with an e-value threshold of 1e−6 to search a custom containing all significant de novo and known motifs from either the Genrich

10 of 12 | PNAS Margres et al. https://doi.org/10.1073/pnas.2014634118 The Tiger Rattlesnake genome reveals a complex genotype underlying a simple venom phenotype Downloaded by guest on October 2, 2021 Downloaded by guest on October 2, 2021 1 .Hlig .Bad,H ib,Ceouino eo ucinadvnmresis- venom and Margres function M. venom 12. of Coevolution Gibbs, H. Biardi, J. Holding, to M. little contribute 11. mechanisms Post-transcriptional Calvin, K. Margres, M. Rokyta, D. 10. h ie atenk eoervasacmlxgntp neligasml eo phenotype venom simple a underlying genotype complex a reveals genome Rattlesnake Tiger The al. et Margres in ecluae enAA-e oeaei -bwnosars the across windows 2-kb in methyla- coverage ATAC-seq with correlated mean was calculated accessibility we chromatin the tion, these if using among determine (90) To terms ShinyGO 0.05. GO used enriched we gallus for genes, Gallus search nontoxin esti- To expressed was (84). expression differentially Differential DESeq2 tissues. with gland mated venom and pancreas across (P toxins expressed in differentially toxins, difference (adjusted analysis: this nontoxins the We for TPM genes pancreas. calculated of in and groups glands We three expression venom tested above. between gene expression described estimated gene and as and methylation TSS (78), around StringTie methylation measured regions with we 1-kb expression, of gene level lower increased in if gene determine resulted To ggpubr. between methylation using each relationship the between estimated correlation the we of degree level, assess methylation to and accessibility, Finally, chromatin replicate. expression, each for ment (∼ observations replicated confi- 50 bootstrap provide subsampling 10,000 To tests, performed we tissues. estimates, three significance the in between dence differences and significant ggpubr used for We nontoxin test regions. genes, closed toxin and TSS, open regions, nontoxin intergenic of and genes, methylation toxin mean around calculated regions in 1-kb we types regions, genome, tissue start the between gene of differed of regions downstream methylation and if different upstream determine To kb sites. 3 stop as methyla- well and pancreas. plot as 100%) to genes and used all to was across (0 glands tion (89) deepTools levels the venom regions. methylation genomic across visualize particular the to across windows both used the 2-kb was of in for (83) alignment methylation pyGenomeTracks methylation genome overall mean mean novo-assembled and 1) 2) de (88) and calculate glands to M-bias venom used for Bis- two was check analyses. (82) to downstream BEDTools the used in success. and were used merged, reports were were levels mark a sites at methylation methylation states CpG Non-CpG methylation resulting 10). 10 to (–cutoff of mapped observation site then minimum given were a with reads (–non extracted (87), genome were Bismark novo-assembled Using de reads. the all from 3 reduce To end 20. the of each score at Phred bias minimum a methylation with Science (71) Translational tar- (v0.4.5) a Galore! the 20× Trim with with University at of Libraries State coverage Kit. 6000 Florida Prep sequencing at NovaSeq Medicine Library geted of a Methyl-Seq College on the Pico in Zymo PE Laboratory the 150 using sequenced phenol– by 5 were standard later performed a were d using preps 6 Library by to above. dissection 4 described after at as d procedure, EtOH 14 chloroform 95% extracted in was stored DNA and (85). euthanasia following immediately sected (84) WGBS. DESeq2 used we See tissues, expression. body differential significant other for to test compared to gland up-regulated venom were the TFs candidate in if determine To respectively. run, MACS2 or .O aoann .Lsox .Mrl,Eooia eoiso oa adaptation. local of Margres genomics M. Ecological 9. Merila, J. Lascoux, M. Savolainen, O. 8. .L hn,R ed eoeeiigi utrisrvasthat reveals butterflies in editing Genome Reed, R. Zhang, L. 7. .S Lamichhaney evolution. S. network regulatory 6. gene on Perspectives Halfon, M. population-level 5. shapes adaptation regulatory Sackton T. Genome-wide 4. Reed, R. Lewis, J. 3. clinic. the to Belleghem Van biology S. pure 2. From transdifferentiation: and Metaplasia Slack, J. 1. (2017). differentiation. expression through exclusively ceeded prey. squirrel its and predator (2016). rattlesnake a in tance venoms. snake in variation phenotypic adamanteus ). (Crotalus rattlesnake (2014). diamondback 145–158 eastern the of Genet. Rev. Distal-less ipaeetdrn drought. a during displacement (2017). 436–447 birds. in landscapes genomic genes. pattern wing Biol. Cell Mol. Rev. IAppendix, (SI individuals two from tissues Three Science > .) l otxn etdhda vrg P rae hn300 than greater TPM average an had tested nontoxins All 0.1). erse yso oo patterns. color eyespot represses ovretrgltr vlto n oso ih npaleognathous in flight of loss and evolution regulatory Convergent al., et ikn h rncitm n rtoet hrceietevenom the characterize to proteome and transcriptome the Linking al., et 0–2 (2013). 807–820 14, and uniy o ult:Rpdaatto naplgnctatpro- trait polygenic a in adaptation Rapid quality: not Quantity, al., et 47 (2019). 74–78 364, eksz ou nDri’ nhsfclttdcharacter facilitated finches Darwin’s in locus size beak A al., et atsnorvegicus Rattus ope oua rhtcueaon ipetoktof toolkit simple a around architecture modular Complex al., et 6–7 (2007). 369–378 8, P a.Eo.Evol. Ecol. Nat. Heliconius. [Padj] 0 n 5 and < Science .1,adnnifrnilyepesdnon- expressed nondifferentially and 0.01), o.Bo.Evol. Biol. Mol. 02(2017). 0052 1, 0 G3 eoe n as-icvr aeof rate false-discovery a and genomes 8) o-ult ae eetrimmed were bases Low-quality (86). nso ed,w lpe 0b on bp 10 clipped we reads, of ends 7–7 (2016). 470–474 353, ubro oi ee)wt replace- with genes) toxin of number 3528 (2015). 2375–2382 5, a.Commun. Nat. ietoa) n ehlto calls methylation and directional), 5–7 (2018). 159–173 36, IAppendix SI rc .Sc B Soc. R. Proc. o.Bo.Evol. Biol. Mol. 16 (2016). 11769 7, eedis- were S4) Table spalt rnsGenet. Trends .Proteomics J. o details. for 20152841 283, rmtsand promotes 3099–3110 34, t et to tests Nat. Nat. ◦ 96, 33, C t 3 .Dowell N. 23. canebrake of presence the in Dowell variation N. Regional Wolf, 22. B. T. Straight, C. R. Glenn, L. J. 21. Margres M. 20. Calvete J. 19. significance biological and Trends rattlesnakes: in composition Venom Mackessy, P. S. 18. Margres M. 17. Margres M. 16. W W. Daltry, C. J. 15. of neofunctionalization and evolution Strickland J. Adaptive 14. arsenal: an Inventing Lynch, J. V. 13. aisGopa avr nvriyfravc nAA-e nlss and analyses; ATAC-seq Infor- on assembly. Sciences genome advice and on for Arts discussions for University of Broe Fisher Harvard Faculty Michael Travis at the annotation; Group and gene Sackton matics on Todd Tim discussions data; photography; for Geno- PacBio for Schield & high-quality Drew generating Sequencing and on DNA Castoe assistance Delaware and for of Kingham Center University Bruce Bioinfor-typing the thank at and We Award Shevchenko resources. Genomics Development Olga computational University Clemson for Institutional Clemson National Facility by the Sciences matics NIH thank provided Medical by We funding supported General P20GM109094. startup were of and resources Institute C.L.P.) Additional (to (C.L.P.). University 1822417 DEB and no. (accession PRJNA88989 ACKNOWLEDGMENTS. BioProject at data found RNA-seq be avail- can available study are SRR5270853). Previously this data PRJNA558767. in (SRR11461881–83) used BioProject WGBS under and able SRR11816475–79), and (SRR10189615– 24, SRR11545022– WGS (SRR11524048–77, RNA-seq Illumina (SRR11413274–82), no. ATAC-seq (SRR9915945), 26), (accession PacBio Archive/GenBank Raw Nucleotide VORL00000000). Japan/European of Bank Data Availability. Data University State Florida and (13–17W), (2017- (1529/1836). Florida University Central Clemson of University at the Committees 067), Use and by Care approved Animal Commission were procedures Institutional Conservation All Authority. Wildlife Island Jekyll and and LSSC-13-00004, Fish and Florida Parks SPR-0713-098, SP628489/SP673626/SP622613, Wildlife Fish and Game mits: Protocols. and Permits See the (10). in described present as were individual genome genome details. the the in of identified venom toxins which determine to Proteomics Venom Quantitative peaks of details. number effective the except whether for samples determined all we cal- whether for phenotype, were estimates venom determine CIs peak To 95% of tigris. The number C. peaks. effective of the identi- number for were effective culated peaks were the indices Distinct obtain diversity 1a. and to calculated, Fig. corrected was in (H) index shown diversity Shannon’s samples fied, venom performed eight was the (RP-HPLC) on chromatography liquid high-performance phase Chromatography. Liquid High-Performance Reversed-Phase Venom details PLA and log- the level After for methylation coverage windows. correlated ATAC-seq we methylation coverage, ATAC-seq calculated of previously transformation match to genome (2018). noeatraievnmtpswti atenk species. rattlesnake within types venom alternative encode Biol. Curr. in toxin populations. among expression-driven activities toxic for in evidence divergence and adamanteus) (Crotalus rattlesnake diamondback (2010). 528–544 9, along trend pedomorphic tive American South the 495–510. and pp. 2008), CA, Linda, Loma Press, University Linda (Loma Eds. in venoms. snake in selection adamanteus ). (Crotalus rattlesnake diamondback (1996). 537–540 scutulatus). (Crotalus rattlesnakes Mojave in variation A phospholipase venom snake .tigris C. .K ae,K .Baa,M .Crwl,S .Bush, P. S. Cardwell, D. M. Beaman, R. K. Hayes, K. W. Rattlesnakes, of Biology The rtlshorridus Crotalus h epoii n eetls fvnmtxngnsi rattlesnakes. in genes toxin venom of loss recent and origin deep The al., et nk eoiso h eta mrcnrattlesnake American central the of venomics Snake al., et 4424 (2016). 2434–2445 26, elotiete9%C ftedsrbto.See distribution. the of CI 95% the outside fell xrml iegn altpsi w oi eecomplexes gene toxin two in haplotypes divergent Extremely al., et ucinlcaatrztoso eo hntpsi h eastern the in phenotypes venom of characterizations Functional al., et ipn h cls h irto-eeto aac en toward leans balance migration-selection The scales: the Tipping al., et vdnefrdvretpten flclslcindiiggenome driving selection local of patterns divergent for Evidence al., et hntpcitgaini h edn ytmo h eastern the of system feeding the in integration Phenotypic al., et h env eoeasml a endpstda DNA at deposited been has assembly genome novo de The se,R .Top,De n nk eo evolution. venom snake and Diet Thorpe, S. R. uster, ¨ hswr a upre yNFGat E 1638879 DEB Grants NSF by supported was work This nml eecletdudrtefloigper- following the under collected were rtlsdurissus Crotalus venom. o.Bo.Evol. Biol. Mol. 2 Crotalus genes. 2 op ice.Pyil C Physiol. Biochem. Comp. n VP cfod.See scaffolds. SVMPs and s eue uniaiems spectrometry mass quantitative used We .tigris C. M vl Biol. Evol. BMC ipra nSuhAmerica. South in dispersal https://doi.org/10.1073/pnas.2014634118 ope onst ertxct sa adap- an as neurotoxicity to points complex 7–8 (2019). 271–282 36, osse infiatysimpler significantly a possessed Toxicon o.Ecol. Mol. c.Rep. Sci. (2007) 2 7, 83 (2016). 28–38 119, ur Biol. Curr. 4532 (2015). 3405–3420 24, 3–4 (1994). 337–346 107, 72 (2018). 17622 8, PNAS IAppendix SI IAppendix SI IAppendix SI .Poem Res. Proteome J. rtlssimus Crotalus 1016–1026 28, Nature | Reversed- 1o 12 of 11 379, for for for

EVOLUTION 24. J. J. Calvete, A. Perez,´ B. Lomonte, E. E. Sanchez,´ L. Sanz, Snake venomics of Crotalus 58. M. C. Cowperthwaite, J. J. Bull, L. Ancel Myers, Distributions of beneficial fitness tigris: The minimalist toxin arsenal of the deadliest neartic rattlesnake venom. Evolu- effects in RNA. Genetics 170, 1449–1457 (2005). tionary clues for generating a pan-specific against crotalid type II venoms. 59. G. P. Wagner et al., Pleiotropic scaling of gene effects and the ‘cost of complexity’. J. Proteome Res. 11, 1382–1390 (2012). Nature 452, 470–473 (2008). 25. A. Marian, Molecular genetic studies of complex traits. Transl. Res. 159, 64–79 (2012). 60. J. Aguilar-Rodriguesz, L. Peel, M. Stella, A. Wagner, J. Payne, The architecture of an 26. F. Vonk et al., The king cobra genome reveals dynamic gene evolution and adaptation empirical genotype-phenotype map. Evolution 72, 1242–1260 (2018). in the snake venom system. Proc. Natl. Acad. Sci. U.S.A. 110, 20651–20656 (2013). 61. P. Catalan, A. Wagner, S. Manrubia, J. Cuesta, Addings levels of complexity enhances 27. W. Yin et al., Evolutionary trajectories of snake genes and genomes revealed by robustness and evolvability in a multilevel genotype-phenotype map. J. R. Soc. comparative analyses of five-pacer viper. Nat. Commun. 7, 13107 (2016). Interface 15, 20170516 (2018). 28. D. Schield et al., The origins and evolution of chromosomes, dosage compensation, 62. H. A. Orr, Adaptation and the cost of complexity. Evolution 54, 13–20 (2000). and mechanisms underlying venom regulation in . Genome Res. 29, 590–601 63. F. Yan, D. Powell, D. Curtis, N. Wong, From reads to insight: A hitchhiker’s guide to (2019). ATAC-seq data analysis. Genome Biol. 21, 22 (2020). 29. K. Suryamohan et al., The Indian cobra reference genome and transcriptome enables 64. W. Greenwald et al., Chromatin co-accessibility is highly structured, spans entire comprehensive identification of venom toxins. Nat. Genet. 52–106–117 (2020). chromosomes, and mediates long range regulatory genetic effects. https://doi.org/ 30. F. Simao, R. Waterhouse, P. Ioannidis, E. Kriventseva, E. Zdobnov, BUSCO: Assess- 10.1101/604371 (9 April 2019). ing genome assembly and annotation completeness with single-copy orthologs. 65. A. Zimin et al., The MaSuRCA genome assembler. Bioinformatics 29, 2669–2677 Bioinformatics 31, 3210–3212 (2015). (2013). 31. A. Lind et al., Genome of the Komodo dragon reveals adaptations in the cardio- 66. B. Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat. Methods vascular and chemosensory systems of monitor . Nat. Ecol. Evol. 3, 1241–1252 9, 357–359 (2012). (2019). 67. Z. Gu, L. Gu, R. Eils, M. Schlesner, B. Brors, Circlize implements and enhances circular 32. M. Alonge et al., RaGOO: Fast and accurate reference-guided scaffolding of draft visualization in R. Bioinformatics 30, 2811–2812 (2014). genomes. Genome Biol. 20, 224 (2019). 68. D. R. Rokyta, A. R. Lemmon, M. J. Margres, K. Aronow, The venom-gland transcrip- 33. B. Cantarel et al., Maker: An easy-to-use annotation pipeline designed for emerging tome of the eastern diamondback rattlesnake (Crotalus adamanteus). BMC Genom. model organism genomes. Genome Res. 18, 188–196 (2008). 13, 312 (2012). 34. V. Solovyev, P. Kosarev, I. Seledsov, D. Vorobyev, Automatic annotation of eukaryotic 69. D. Rotenberg, E. S. Bamberger, E. Kochva, Studies on ribonucleic acid synthesis in genes, pseudogenes and promoters. Genome Biol. 7, S10 (2006). the venom glands of Vipera palaestinae (Ophidia, Reptilia). Biochem. J. 121, 609–612 35. M. Giorgianni et al., The origin and diversification of a novel protein family in (1971). venomous snakes. Proc. Natl. Acad. Sci. U.S.A. 117, 10911–10920 (2020). 70. H. L. Gibbs, L. Sanz, J. E. Chiucchi, T. M. Farrell, J. J. Calvete, Proteomic analysis of 36. S. Feng et al., Conservation and divergence of methylation patterning in plants and ontogenetic and diet-related changes in venom composition of juvenile and adult animals. Proc. Natl. Acad. Sci. U.S.A. 107, 8689–8694 (2010). dusky pigmy rattlesnakes (Sistrurus miliarius barbouri). J. Proteomics 74, 2169–2179 37. R. Schmitz, Z. Lewis, M. Goll, DNA methylation: Shared and divergent features across (2011). eukaryotes. Trends Genet. 35, 818–827 (2019). 71. F. Krueger, Trim galore. A wrapper tool around cutadapt and fastQC to consis- 38. S. Heinz et al., Simple combinations of lineage-determining transcription factors tently apply quality and adapter trimming to fastQ files. https://www.bioinformatics. prime cis-regulatory elements required for macrophage and B cell identities. Mol. babraham.ac.uk/projects/trim galore/. Accessed 1 July 2019. Cell 3, 576–589 (2010). 72. A. Smit, R. Hubley, Repeatmodeler. http://www.repeatmasker.org/RepeatModeler/. 39. J. M. Gaspar, Data from “Genrich.” GitHub. https://github.com/jsh58/Genrich.git. Accessed 1 February 2019. Accessed 1 September 2019. 73. M. Holding, M. Margres, A. Mason, C. Parkinson, D. Rokyta, Evaluating the perfor- 40. Y. Zhang et al., Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, 1–9 mance of de novo assembly methods for venom-gland transcriptomics. Toxins 10, 249 (2008). (2018). 41. H. E. Hoekstra, J. A. Coyne, The locus of evolution: Evo devo and the genetics of 74. J. Alfoldi¨ et al., The genome of the green anole and a comparative analysis adaptation. Evolution 61, 995–1016 (2007). with birds and mammals. Nature 477, 587–591 (2011). 42. S. B. Carroll, Evo-devo and an expanding evolutionary synthesis: A genetic theory of 75. M. Stanke, B. Morgenstern, Augustus: A web server for gene prediction in eukaryotes morphological evolution. Cell 134, 25–36 (2008). that allows user-defined constraints. Nucleic Acids Res. 33, W465–W467 (2005). 43. D. Gordon et al., Long-read sequence assembly of the gorilla genome. Science 352, 76. D. Kim, B. Langmead, S. Salzberg, HISAT: A fast spliced aligner with low memory aae0344 (2016). requirements. Nat. Methods 12, 357–360 (2015). 44. T. Rea, The accessible chromatin landscape of the . Nature 489, 75–82 77. K. Katoh, D. Standley, MAFFT multiple sequence alignment software version 7: (2012). Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013). 45. S. Clark et al., scNMT-seq enables joint profiling of chromatin accessibility DNA 78. M. Pertea, D. Kim, G. Perea, J. Leek, S. Salzberg, Transcript-level expression analysis of methylation and transcription in single cells. Nat. Commun. 9, 781 (2018). RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 11, 1650–1667 46. Y. Yin et al., Impact of cytosine methylation on DNA binding specificities of human (2016). transcription factors. Science 356, eaaj2239 (2017). 79. M. Pertea et al., StringTie enables improved reconstruction of a transcriptome from 47. Q. Lin et al., Methmotif: An integrative cell specific database of transcription factor RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015). binding motifs coupled with DNA methylation profiles. Nucleic Acids Res. 47, D145– 80. M. Robinson, D. McCarthy, G. Smyth, edgeR: A Bioconductor package for differen- D154 (2019). tial expression analysis of digital gene expression data. Bioinformatics 26, 139–140 48. R. Spektor, N. Tippens, C. Mimoso, P. Soloway, methyl-ATAC-seq measures DNA (2010). methylation at accessible chromatin. Genome Res. 29, 969–977 (2019). 81. J. Buenrostro, B. Wu, H. Chang, W. Greenleaf, ATAC-seq: A method for assaying 49. F. Neri et al., Intragenic DNA methylation prevents spurious transcription initiation. chromatin accessibility genome-wide. Curr. Protoc Mol. Biol. 109, 21–29 (2015). Nature 543, 72–77 (2017). 82. A. Quinlan, I. Hall, BEDTools: A flexible suite of utilities for comparing genomic 50. D. Pan, L. Zhang, Tandemly arrayed genes in vertebrate genomes. Comp. Funct. features. Bioinformatics 26, 841–842 (2010). Genomics 2008, 545269 (2008). 83. F. Ramirez et al., High-resolution TADs reveal DNA sequences underlying genome 51. J. Mieczkowski et al., MNase titration reveals differences between nucleosome organization in flies. Nat. Commun. 9, 189 (2018). occupancy and chromatin accessibility. Nat. Commun. 7, 11485 (2016). 84. M. Love, W. Huber, S. Anders, Moderated estimation of fold change and dispersion 52. A. de Mendoza et al., Transcription factor evolution in eukaryotes and the assembly for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014). of the regulatory toolkit in multicellular lineages. Proc. Natl. Acad. Sci. U.S.A. 110, 85. C. Schroder, W. Steimer, gDNAa extraction yield and methylation status of blood E4858–E4866 (2013). samples are affected by long-term storage conditions. PloS One 13, e0192414 53. P. Wittkopp, G. Kalay, Cis-regulatory elements: Molecular mechanisms and evolution- (2018). ary processes underlying divergence. Nat. Rev. Genet. 13, 59–69 (2012). 86. M. Ziller, K. Hansen, A. Meissner, M. Aryee, Coverage recommendations for methy- 54. M. Grabowska et al., NFI transcription factors interact with FOXA1 to regulate lation analysis by whole-genome bisulfite sequencing. Nat. Methods 12, 230–232 prostate-specific gene expression. Mol. Endocrinol. 28, 949–964 (2014). (2015). 55. M. Fane, L. Harris, A. Smith, M. Piper, Nuclear factor one transcription factors as 87. F. Krueger, S. Andrews, Bismark: A flexible aligner and methylation caller for bisulfite- epigenetic regulators in . Int. J. Canc. 140, 2634–2641 (2017). seq applications. Bioinformatics 27, 1571–1572 (2011). 56. N. Casewell et al., Medically important differences in snake venom composition are 88. K. Hansen, B. Langmead, R. Irizarry, BSmooth: From whole genome bisulfite dictated by distinct postgenomic mechanisms. Proc. Natl. Acad. Sci. USA 111, 9205– sequencing reads to differentially methylated regions. Genome Biol. 13, R83 (2012). 9210 (2014). 89. F. Ramirez, F. Dundar, S. Diehl, B. Gruning, T. Manke, deepTools: A flexible platform 57. J. Durban et al., Integrated “omics” profiling indicates that miRNAs are modulators for exploring deep-sequencing data. Nucleic Acids Res. 42, W187–W191 (2014). of the ontogenetic venom composition shift in the central American rattlesnake, 90. S. Ge, D. Jung, R. Yao, ShinyGO: A graphical gene-set enrichment tool for animals and simus. BMC Genomics 14, 234 (2013). plants. Bioinformatics 36, 2628–2629 (2019).

12 of 12 | PNAS Margres et al. https://doi.org/10.1073/pnas.2014634118 The Tiger Rattlesnake genome reveals a complex genotype underlying a simple venom phenotype Downloaded by guest on October 2, 2021