Supplementary Materials for Concerted Evolution Reveals Co-Adapted Amino Acid Substitutions in Frogs That Prey on Toxic Toads
Shabnam Mohammadi*, Lu Yang*, Arbel Harpak*, Santiago Herrera Álvarez, María del Pilar Rodríguez-Ordoñez, Julie Peng, Karen Zhang, Jay F. Storz, Susanne Dobler, Andrew J. Crawford**, & Peter Andolfatto**

*Co-first authorship
**Correspondence to: [email protected], [email protected]

This PDF file includes:
Materials and Methods
Tables S1 to S7
Figs. S1 to S6

Table of Contents
Materials and Methods
Sample collection and data sources
RNA-seq based gene discovery of ATP1A paralogs
Targeted sequencing of coding region of ATP1A1 paralogs
De novo genome assembly of Leptodactylus fuscus
Obtaining intron sequences of ATP1A1 via targeted long-read sequencing
Estimation of genealogical relationships
Maximum likelihood analysis of site-by-site support for alternative tree topologies
Theoretical single-site model for the probability of maintaining an adapted substitution
Simulations of ATP1A1 gene family evolution
Inference of evolutionary parameters using Approximate Bayesian Computation
Construction of expression vectors
Generation of recombinant viruses and transfection into Sf9 cells
Preparation of Sf9 membranes
Verification by SDS-PAGE and western blotting
Ouabain inhibition assay (measurement of CG resistance)
ATP hydrolysis assay (measurement of ATPase activity and a proxy for protein activity)
Statistical analyses of biochemical assay results
Supplementary Tables
Table S1. Collection information for samples used in this study
Table S2. Sources of ATP1A1 sequences included in the phylogenetic analysis
Table S3. Summary of the de novo genome assembly of Leptodactylus fuscus
Table S4. Fisher’s exact test result for the likelihood of site-wise support of “ancient origin” and “recent origins” topologies
Table S4. List of primers used for intron sequencing
Table S5. List of engineered ATP1A1 constructs used to test functional effects of amino acid substitutions in Leptodactylus
Table S6. Summary of the ouabain sensitivity and catalytic properties of Na+,K+-ATPase for each ATP1A1 construct
Table S7. Statistical analysis of ouabain sensitivity and ATPase activity
Supplementary Figures
Figure S1. Proportion of ATP1A1, ATP1A2, and ATP1A3 paralogs in brain, muscle, and stomach of seven anuran species
Figure S2. Annotation of ATP1A1S and ATP1A1R paralogs in the Leptodactylus fuscus de novo genome assembly
Figure S3. Variation among sites implicated in CG resistance for ATP1A paralogs of various species
Figure S4. Positions of 12 R-copy-specific amino acid substitutions on the crystal structure of pig ATP1A1 bound to the cardiac glycoside bufalin
Figure S5. Western blot analysis of Na+,K+-ATPase with engineered ATP1A1 (α) subunits produced in this study
Figure S6. Cardiac glycoside (ouabain) inhibition curves for six biological replicates of each recombinant Leptodactylus Na+,K+-ATPase produced in this study

Materials and Methods

Sample collection and data sources
Samples of the five Leptodactylus species, plus Engystomops pustulosus, Lithodytes lineatus, Atelopus zeteki, and Rhinella marina, were collected from different geographic locations in Colombia (Table S1) and stored in RNAlater (Invitrogen) at -80 °C until use. A tissue sample of the toad Atelopus zeteki was donated by the Smithsonian’s National Zoo and came from a necropsied captive animal. The outgroup species Megophrys nasuta, Kaloula pulchra, Rana sphenocephala, Rana catesbeiana, Dendrobates auratus, Melanophryniscus stelzneri, and Duttaphrynus melanostictus were obtained from the pet trade under IACUC Protocol No. 2057-16. Live animals were euthanized under the supervision of a research veterinarian at Princeton University.
To capture all three ATP1A paralogs, we collected tissue samples from brain, skeletal muscle, and stomach, each of which expresses at least one of the three paralogs at high levels (15). Apart from the data generated from these 16 anuran species, all other data used in this study were downloaded from publicly available sources (Table S2).

RNA-seq based gene discovery of ATP1A paralogs
Full-length coding sequences of ATP1A1, ATP1A2, and ATP1A3 were reconstructed for several species using RNA-seq based gene discovery. Total RNA was extracted from multiple tissues of 16 anuran species (Table S2) using TRIzol Reagent (Ambion, Life Technologies) following the manufacturer’s protocol. RNA-seq libraries were prepared with the TruSeq RNA Library Prep Kit v2 (Illumina) and sequenced on an Illumina HiSeq 2500 (Genomics Core Facility, Princeton, NJ, USA) as either 75-bp paired-end or 140-bp single-end reads (Table S2). Reads were trimmed and de novo assembled with Trinity v2.2.0 (38). The ATP1A1 sequence of Xenopus laevis (GenBank NM_001090595) was initially used as a BLAST query against the assembled transcripts of L. latrans to recover ATP1A1S and ATP1A1R, which were then used as queries to reconstruct ATP1A1 sequences from the other species. ATP1A1, ATP1A2, and ATP1A3 sequences of the remaining species used in this study were mined from publicly available datasets (Table S2) following the same pipeline (Figure S3).

Targeted sequencing of coding region of ATP1A1 paralogs
Total RNA was extracted from L. fuscus, L. insularum, and L. colombiensis as described above and reverse-transcribed to single-stranded cDNA using SuperScript III Reverse Transcriptase (Invitrogen). ATP1A1 was amplified using Phusion High-Fidelity DNA Polymerase (Invitrogen) with the forward primer 5’-ATAAGTATGAGCCCGCAGCC-3’ and the reverse primer 5’-CCAGGGCTGCGTCTGATTATG-3’. PCR products were cleaned with the QIAquick PCR Purification Kit (Qiagen) and A-tailed with Taq polymerase (NEB) before cloning into a pTOPO-TA vector (Invitrogen).
The presence of the insert in each plasmid was confirmed by colony PCR. Plasmid libraries were prepared with Tn5 transposase, indexed with custom barcodes (22), and sequenced on an Illumina MiSeq Nano run (Genomics Core Facility, Princeton, NJ, USA). Plasmid sequences were assembled with Velvet v1.2.10 (39) and Oases v0.2.8 (40). ATP1A1 was reconstructed by alignment to the previously obtained ATP1A1 sequences of L. latrans and L. pentadactylus.

De novo genome assembly of Leptodactylus fuscus
High-molecular-weight DNA was isolated from a single Leptodactylus fuscus embryo by HudsonAlpha (Huntsville, AL, USA), which also prepared the Chromium library and sequenced it on an Illumina HiSeq X. Barcodes were removed using the Long Ranger v2.2.2 basic pipeline (https://support.10xgenomics.com/genome-exome/software/downloads/latest). Trimmed reads were used for k-mer counting in Jellyfish v2.2.7 (33). The k-mer (k = 21) frequency distribution was processed in GenomeScope (41) to estimate genome size, heterozygosity, and repeat content. The linked reads were assembled using the Supernova assembler (42) with the --accept-extreme-coverage option, because coverage was lower than recommended. The assembled genome is 2.42 Gb (16,530 scaffolds ≥10 kb; scaffold N50 = 363 kb; Table S3) and was exported in pseudohap2 format (accession no. TBD). Assembly completeness was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) v4.0.5 (43): 72.6% of the BUSCO Tetrapoda (odb10) genes were identified (Table S3).

Obtaining intron sequences of ATP1A1 via targeted long-read sequencing
The intron sequences of L. fuscus were determined by BLAST searches (blast-2.26) of the ATP1A1S and ATP1A1R protein-coding sequences against its genome assembly (Figure S4). For the other four Leptodactylus species (L. pentadactylus, L. latrans, L. insularum, and L. colombiensis) and two outgroup species (Engystomops pustulosus and Lithodytes lineatus), introns were obtained via targeted long-read sequencing on an Oxford Nanopore MinION. Genomic DNA was extracted with the Agencourt DNAdvance Kit (Beckman Coulter, France), and ATP1A1 was amplified using the LongAmp Taq PCR Kit (NEB). Each primer contains a species-specific complementary sequence and a custom barcode (Table S5). PCR products were checked on a gel, excised, and purified using the QIAquick PCR Purification Kit (Qiagen). Libraries were pooled and prepared for sequencing using the Ligation Sequencing Kit SQK-LSK109 (Oxford Nanopore Technologies) following the manufacturer’s protocol. In total, 72,161 reads were generated within six hours, 89% of which passed the quality filter, and the real-time read-length distribution matched the amplicon sizes seen on the gel. Raw trace data were base-called with Albacore v2.3.4 (Oxford Nanopore Technologies) and demultiplexed using LAST v980 (44). Reads that mapped to more than one barcode were discarded. Reads were assigned to each species based on their barcodes using seqtk (45). Only reads within ±200 nt of the expected length were used for downstream analyses. For Leptodactylus species with two copies of ATP1A1, reads were further split according to a perfect match to the 111–122 region of either copy, which differs by 22–25% at the nucleotide level between the two copies. Assembly was carried out using Canu v1.8 (46) on 1,000 randomly selected reads (1,000× coverage) to improve performance; the reconstructed sequences were identical when different sets of 1,000 reads were used. Filtered reads were mapped back to the reconstructed reference with minimap2 (47) and polished with racon v1.3.3 (48). Short-read sequencing data were generated using Tn5 transposase (as described above) to further correct and polish the sequences.
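The length filter, paralog split, and subsampling steps above can be sketched as follows. This is not the authors' code (their demultiplexing used LAST and seqtk); it is a minimal illustration assuming reads have already been demultiplexed into plain strings and that each ATP1A1 copy is represented by a short copy-specific marker sequence taken from the divergent region. All function names and marker sequences are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

# Minimal sketch of the read-filtering steps described above. All function
# names are hypothetical, and the marker sequences in the example are
# placeholders, not real ATP1A1 sequence.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def filter_by_length(reads: List[str], expected_len: int, tolerance: int = 200) -> List[str]:
    """Keep reads whose length is within +/- tolerance nt of the expected amplicon."""
    return [r for r in reads if abs(len(r) - expected_len) <= tolerance]


def split_by_paralog(
    reads: List[str], markers: Dict[str, str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign each read to the single paralog whose marker it contains exactly.

    Both strands are searched because an amplicon read can come from either
    strand. Reads that match no marker, or markers of more than one paralog,
    are returned as unassigned.
    """
    assigned: Dict[str, List[str]] = {name: [] for name in markers}
    unassigned: List[str] = []
    for read in reads:
        hits = [
            name
            for name, marker in markers.items()
            if marker in read or reverse_complement(marker) in read
        ]
        if len(hits) == 1:
            assigned[hits[0]].append(read)
        else:
            unassigned.append(read)
    return assigned, unassigned


def subsample(reads: List[str], n: int = 1000, seed: Optional[int] = None) -> List[str]:
    """Draw up to n reads at random without replacement (e.g. before assembly)."""
    rng = random.Random(seed)
    return rng.sample(reads, min(n, len(reads)))


if __name__ == "__main__":
    markers = {"S": "AAACCCTTTG", "R": "GGGTTTAAAC"}  # placeholder markers
    reads = ["TTAAACCCTTTGTT", "CCGTTTAAACCCCC", "ACGTACGT"]
    by_copy, leftover = split_by_paralog(reads, markers)
    print({name: len(group) for name, group in by_copy.items()}, len(leftover))
```

Requiring an exact marker match trades sensitivity for specificity: long reads carrying a sequencing error inside the marker are dropped rather than risk being assigned to the wrong copy, which is tolerable when coverage is high, as with the 1,000-read subsets used here.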
Final sequences were aligned with the MUSCLE algorithm (49) implemented in SeaView (50). Intron–exon boundaries were manually adjusted to follow the general GT–AG rule, whereby introns begin with GT and end with AG.
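The boundary adjustment above was done manually; a small check like the following could flag candidate boundaries that violate the GT–AG rule before inspection. It is a sketch under stated assumptions: the helper names are hypothetical, exon coordinates are assumed to be 0-based and end-exclusive on the genomic (sense-strand) sequence, and introns are taken to be the gaps between consecutive exons.

```python
from typing import List, Tuple

# Minimal sketch of a GT-AG sanity check on intron-exon boundaries.
# Hypothetical helper names; exon coordinates are 0-based and end-exclusive.

Interval = Tuple[int, int]


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the intron intervals lying between consecutive exons."""
    exons = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start < prev_end:
            raise ValueError(f"overlapping exons at {prev_end}/{next_start}")
        introns.append((prev_end, next_start))
    return introns


def check_gt_ag(genomic: str, exons: List[Interval]) -> List[Tuple[int, int, bool]]:
    """For each intron, report (start, end, True if it starts with GT and ends with AG)."""
    report = []
    for start, end in introns_from_exons(exons):
        intron = genomic[start:end].upper()
        report.append((start, end, intron.startswith("GT") and intron.endswith("AG")))
    return report


if __name__ == "__main__":
    # Toy sequence: exon 1 (0-6), intron (6-18), exon 2 (18-24).
    genomic = "ATGAAA" + "GTAAGTTTTCAG" + "CCCTGA"
    print(check_gt_ag(genomic, [(0, 6), (18, 24)]))
    print(check_gt_ag(genomic, [(0, 7), (18, 24)]))  # boundary shifted by 1 nt
```

A False result only flags a boundary for review: a small minority of genuine introns use non-canonical termini such as GC–AG or AT–AC, which is why the rule is a general guideline rather than a hard constraint.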