<<

bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 How to make a giant: Genomic basis and tradeoffs of

2 gigantism in the , the world’s largest rodent 3 4 Authors: 5 Santiago Herrera-Álvarez*1, Elinor Karlsson2, Oliver A. Ryder3, Kerstin Lindblad-Toh2,4, & 6 Andrew J. Crawford1 7 8 Affiliations: 9 1 Department of Biological Sciences, Universidad de los , Bogotá 111711, 10 2 Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA 11 3 San Diego Institute for Conservation Research, Global, Escondido, CA, 92027, USA 12 4 Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 13 752 36 Uppsala, Sweden. 14 15 *Corresponding author: 16 Address: Universidad de los Andes 17 Calle 19 No. 1-60, Bogotá, código postal 111711, Colombia, . 18 Email: [email protected] 19 Phone: +1(312) 536-1206 20 21

1 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

22 Abstract 23 Gigantism is the result of one lineage within a clade evolving extremely large body size 24 relative to its small-bodied ancestors, a phenomenon observed numerous times in 25 . Theory predicts that the evolution of giants should be constrained by two 26 tradeoffs. First, because body size is negatively correlated with population size, purifying 27 selection is expected to be less efficient in of large body size, leading to a 28 genome-wide elevation of the ratio of non-synonymous to synonymous substitution rates

29 (dN/dS) or mutation load. Second, gigantism is achieved through higher number of cells 30 and higher rates of cell proliferation, thus increasing the likelihood of cancer. However, 31 the incidence of cancer in gigantic animals is lower than the theoretical expectation, a 32 phenomenon referred to as Peto’s Paradox. To explore the genetic basis of gigantism in 33 and uncover genomic signatures of gigantism-related tradeoffs, we sequenced the

34 genome of the capybara, the world’s largest living rodent. We found that dN/dS is 35 elevated genome wide in the capybara, relative to other rodents, implying a higher 36 mutation load. Conversely, a genome-wide scan for adaptive protein evolution in the 37 capybara highlighted several genes involved in growth regulation by the insulin/insulin- 38 like growth factor signaling (IIS) pathway. Capybara-specific gene-family expansions 39 included a putative novel anticancer adaptation that involves T cell-mediated tumor 40 suppression, offering a potential resolution to Peto’s Paradox in this lineage. Gene 41 interaction network analyses also revealed that size regulators function simultaneously as 42 growth factors and oncogenes, creating an evolutionary conflict. Based on our findings, 43 we hypothesize that gigantism in the capybara likely involved three evolutionary steps: 1) 44 Increase in body size by cell proliferation through the ISS pathway, 2) coupled evolution 45 of growth-regulatory and cancer-suppression mechanisms, possibly driven by 46 intragenomic conflict, and 3) establishment of the T cell-mediated tumor suppression 47 pathway as an anticancer adaptation. Interestingly, increased mutation load appears to be 48 an inevitable outcome of an increase in body size. 49

2 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

50 Author Summary 51 The existence of gigantic animals presents an evolutionary puzzle. Larger animals have 52 more cells and undergo exponentially more cell divisions, thus, they should have 53 enormous rates of cancer. Moreover, large animals also have smaller populations making 54 them vulnerable to . So, how do gigantic animals such as elephants and blue 55 whales protect themselves from cancer, and what are the consequences of evolving a 56 large size on the ‘genetic health’ of a species? To address these questions we sequenced 57 the genome of the capybara, the world’s largest rodent, and performed comparative 58 genomic analyses to identify the genes and pathways involved in growth regulation and 59 cancer suppression. We found that the insulin-signaling pathway was involved in the 60 evolution of gigantism in the capybara. We also found a putative novel anticancer 61 mechanism mediated by the detection of tumors by T-cells, offering a potential solution 62 to how mitigated the tradeoff imposed by cancer. Furthermore, we show that 63 capybara genome harbors a higher proportion of slightly deleterious mutations relative to 64 all other rodent genomes. Overall, this study provides insights at the genomic level into 65 the evolution of a complex and extreme phenotype, and offers a detailed picture of how 66 the evolution of a giant body size in the capybara has shaped its genome.

3 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

67 Introduction 68 Body size is arguably the most apparent characteristic when studying multicellular life, 69 and understanding how and why body size evolves has been a major question in biology 70 [1]. Body size is a canonical complex trait, and, consequently, ontogenetic changes 71 underlying body size evolution affect other life-history traits, such as fecundity and 72 longevity [2,3]. Yet, how size is developmentally controlled to generate species-specific 73 differences remains elusive. Although increase in body size has several selective 74 advantages [4], maximum body size in tetrapods appears to be determined mostly by 75 intrinsic biological constraints [5–7]. Two major tradeoffs in the evolution of large bodies 76 are the reduction in population size and the increased risk of developing growth- 77 associated diseases, such as cancer [8,9]. 78 79 Larger species have lower population densities [8], slower metabolic rates, longer 80 generation times, and a lower reproductive output [3]. Together these traits lead to an

81 overall reduction in the genetic effective population size (Ne) relative to small-bodied 82 species [10]. In a finite population, the interplay between selection and genetic drift 83 determines the fate of newly arisen mutations, and the strength of drift relative to

84 selection will largely depend on Ne [11]. In smaller populations, slightly deleterious 85 mutations are expected to accumulate at a higher rate than in larger populations, due to 86 stronger genetic drift relative to purifying selection, thus increasing the mutation load in 87 the population, which may increase the risk of extinction [12]. Furthermore, animals have 88 positive allometric scaling of number of cells in the body with overall body size [13]. 89 Thus, if every cell has an equal probability, however low, of becoming cancerous and 90 related species have equivalent cancer suppression mechanisms, the risk of developing 91 cancer should increase as a function of number of cells. Therefore, large species should 92 face a higher lifetime risk of cancer simply because large bodies contain more cells 93 [9,14]. Moreover, giant body sizes are achieved mainly by an accelerated post-natal 94 growth rate [15], which represents an increment in the rate of cell proliferation [16]. For a 95 given mutation rate, an elevated rate of cell proliferation accelerates the rate of 96 accumulation of mutations and, thus, increases the chances for a cell population of 97 becoming cancerous [17]. 98

4 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

99 The continued existence of enormous animals, such as whales and elephants, implies, 100 however, that the evolution of gigantism must be coupled with the evolution of cancer 101 suppression mechanisms. In fact, the lack of correlation between size and cancer 102 incidence, known as Peto’s Paradox, suggests that large-sized species have mitigated the 103 inherent increase in cancer risk associated with the evolution of large bodies [9,14]. For 104 instance, genomes of the Asian and African elephants (Elephas maximus; Loxodonta 105 africana) and the bowhead whale (Balaena mysticetus), three of the largest living 106 , revealed lineage-specific expansion of genes associated with tumor 107 suppression via enhanced DNA-damage response [18,19] and cell cycle control [20], 108 respectively. Thus, cancer selection, i.e., selection to prevent or postpone deaths due to 109 cancer at least until after attainment of reproductive maturity, is especially important as 110 animals evolve larger bodies and might account for the relative lower incidence of cancer 111 in large animals and the evolution of anticancer mechanisms [21]. 112 113 Among mammals, rodents are the most diverse group in terms of species richness and 114 morphological disparity, particularly in body size. South American rodents of the 115 infraorder represent one of the most spectacular evolutionary radiations 116 among living New World mammals [22], and fossil data suggest the independent 117 evolution of gigantism in at least three lineages [23]. Caviomorphs have the broadest 118 range in body size within rodents, and make Rodentia one of the mammalian orders with 119 the largest range in body size [24]. 120 121 Most notable among the caviomorphs is the largest living rodent, the capybara, 122 hydrochaeris ([Linnaeus, 1776]; Fig. 1C), family . Capybaras 123 have an average adult body weight of 55 kg [25], being 60 times more massive than their 124 closest living relative, the rock cavies ( sp.), and ~2,000 times more massive 125 than the common mouse (Mus musculus). Relative to all rodents, capybaras are three 126 orders of magnitude above the mean body size for the order [25]. Previous studies on 127 physiology and molecular evolution identified that caviomorph insulin, a hormone 128 involved in growth and metabolism, has a higher mitogenic effect relative to other 129 mammalian growth factors [26,27], suggesting that lineage-specific changes in insulin 130 within caviomorphs coincide with important changes in life-history traits. However, the 131 genomic basis of the capybara’s unique size remains unexplored. Additionally, there is 132 evidence that telomerase activity has co-evolved with body mass in rodents, with the

5 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

133 largest species (i.e., capybara and ) displaying low telomerase activity in somatic 134 tissues, suggesting a potential role for replicative senescence as a cancer suppression 135 mechanism in large-bodied rodents [28]. However, it is yet not clear whether capybara 136 has a unique solution to its increased risk of cancer relative to smaller rodents, or how 137 some lineages have coupled growth-regulatory pathways and cancer suppression 138 mechanisms to permit evolution of giant bodies. 139 140 Results 141 Genome landscape and annotation 142 Capybaras have 32 autosomes (2n = 66), consisting of 10 submetacentric chromosome 143 pairs, 10 metacentric chromosome pairs, 2 acrocentric chromosome pairs and 10 144 telocentric chromosome pairs [29]. We generated a whole-genome sequence and de novo 145 assembly of a female capybara, using a combination of a DISCOVAR de novo assembly 146 and Chicago libraries (See Materials and methods). The total length of the assembly was 147 ~2.7 Gb, slightly shorter than the genome size estimated by flow cytometry of 3.14 Gb 148 (Table 1), including 73,920 contigs with an N50 of 12.2 Mb and the longest contig at 149 75.7 Mb. The GC content was estimated at 39.79%. 150 151 The assembled capybara genome has very high coverage of coding regions: we recovered 152 89% of the 3,023 vertebrate benchmarking universal single-copy orthologs (BUSCO 153 pipeline, [30]), and 94.4% of the ultra-conserved core eukaryotic genes (CEGMA 154 pipeline, [31]), which is comparable to the genome despite the latter having 155 two-fold larger scaffold N50 (Fig. S1A). We used guinea pig, rat, and mouse Ensembl 156 protein sets to guide the homology-based annotation with MAKER v2.31.9 [32], resulting 157 in a capybara genome containing 24,941 protein-coding genes with 92.5% of the 158 annotations with an AED bellow 0.5 (See Materials and methods). RepeatMasker 159 annotation and classification of repetitive elements estimated that 37.4% of the capybara 160 genome corresponded to repetitive content, spanning 1,025 Mb of the genome (Fig S1B). 161 Most of the repeat content (72.5%) corresponded to transposable elements, with LINE-1 162 the most abundant (20.46%), similar to the mouse genome [33]. Further comparison 163 across the rodent phylogeny showed that both genome size and repeat content of the 164 capybara are comparable to that of other rodents (Fig. 1B). 165

6 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

166 Body mass evolution 167 To determine whether the capybara could be considered a giant based on a lineage- 168 specific accelerative rate of body mass evolution, we used phylogenetic comparative 169 methods. We found that the rate of body mass evolution among caviomorph rodents has 170 not been constant throughout the their phylogenetic history, as shown by the better fit of a 171 multi-rate model compared to the single-rate-Brownian Motion model (ΔDIC = 78.6; 172 Table S1) with an estimated value of α = 1.35 (95% confidence interval [CI]: 1.0 to 1.7). 173 The ancestral body size of the most recent common ancestor (MRCA) of Caviomorpha 174 was estimated to be 971 g (95% CI, 221 g to 5,135 g) and the size of the MRCA of the 175 capybara and (Kerodon rupestris) was estimated to be 1132 g (95% CI, 437 g 176 to 23,225 g), suggesting that the rate of body size evolution has undergone one or more 177 sudden changes. 178 179 Using the AUTEUR package [34], we detected three shifts in the rate of body-mass 180 evolution along the caviomorph tree (Fig. S5): one decrease in evolutionary rate at the 181 root of the Octodontoidea clade (shift probability: 0.11), and two incidences of rate 182 acceleration. The first rate increase was localized on the branch leading to the 183 (Myocastor ; shift probability: 0.16), the largest species within Octodontoidea, and 184 a second increase, with the highest posterior support, in the MRCA of the capybara, rock 185 cavy and Patagonian (Dolichotis patagonum; shift probability: 0.44). Capybaras and 186 Patagonian maras are the largest living species of caviomorph rodents, yet the capybara is 187 approximately 3 times heavier (Fig S6). These results suggest that the capybara evolved 188 from a moderately small ancestor, comparable to the size of a guinea pig, indicating that 189 capybara’s characteristic large body was achieved by a spurt in the rate of body mass 190 evolution (see also [35]). 191 192 The accelerated rate of body mass evolution observed in capybaras relative to other 193 rodents parallels the rate acceleration observed in the world’s largest animals, e.g., 194 elephants, the largest living terrestrial mammals relative to smaller afrotherian ancestors 195 [36], baleen whales including the blue whale, the largest to have ever lived, 196 relative to ancestral mysticetes [37], and sauropodomorphs, the largest terrestrial animals 197 in Earth history, relative to other dinosaurs [38]. Thus, the tempo and mode of body size 198 evolution demonstrates that the capybara also provides an example of gigantism among 199 rodents.

7 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

200 201 Having characterized the genome and the mode of body mass evolution, we undertook 202 detailed analyses of key genes and gene families to explore the genomic basis of 203 capybara’s large size, identify possible molecular adaptations related to cancer

204 suppression, and detect the genomic signatures associated with a reduced Ne. 205 206 Gene-specific rates of evolution in the capybara genome 207 Because the guinea pig ( porcellus) is the closest rodent with a sequenced genome 208 to the capybara (Fig. 1A) and is much smaller (750 g versus 55,000 g), pairwise 209 comparisons between the capybara and guinea pig could provide insights into the genes 210 underlying the rapid burst in body size evolution in the capybara (see Body size

211 evolution). We calculated the pairwise dN/dS ratios for 11,255 single-copy gene orthologs 212 (SCOs) shared by capybara and guinea pig. Sequence conservation in protein-coding 213 genes between guinea pig and capybara was moderately high (mean = 92.6% with 214 standard deviation [SD] = 4.8%) after an estimated divergence time of ~20 million

215 (Fig. S3). The 20 capybara-guinea pig SCOs that showed a dN/dS > 1, were enriched for 216 11 GO Biological Process, such as response to stress (GO:0006950) and immune system 217 process (GO:0002376), which included a number of genes involved in innate (CCL26, 218 CXCL13, LILRA, XCL1, PILRA and CMC4) and adaptive (MHC-B46) immune 219 responses. Further, in order to identify genes evolving rapidly on the capybara lineage, 220 we compared each of 8,084 SCOs shared by capybara-guinea pig-rat. The top 5% (n =

221 404) of genes with high dN/dS values for capybara-guinea pig relative to the highest dN/dS 222 value for guinea pig-rat orthologs were enriched for 36 GO Biological Process, including 223 cell cycle (GO:0007049), cell death (GO:0008219), immune system process 224 (GO:0002376), mitotic cell cycle (GO:0000278) and aging (GO:0007568).

225 As a complementary analysis of positive selection among genes, we used codon-based 226 models of evolution [39] that take into account heterogeneous selection pressure at 227 different sites within proteins, combined with a protein sequence-divergence approach 228 (see Materials and methods). We identified a set of 4,452 SCOs using five available 229 genomes of Hystricognath rodents: naked mole rat (Heterocephalus glaber), degu 230 (Octodon degus), (Chinchilla lanigera), guinea pig and capybara (Fig. 2). We 231 considered genes in the top 5% of the distribution in both dimensions (i.e., the likelihood 232 ratio tests [LRTs] of branch-site models and the protein distance index [PDI] statistic of

8 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

233 protein p-distance) as positively selected in the capybara lineage, resulting in fifty-six 234 candidate genes (see Materials and methods). The modest number of candidate genes is 235 not surprising given the low levels of protein divergence between capybara and guinea 236 pig (mean: 1%, 95% C.I.: 0.9-1.2). These 56 genes were enriched for 37 GO Biological 237 Process, including cell death (GO:0008219), cell cycle (GO:0007049), cell proliferation 238 (GO:0008283), growth (GO:0048590), mitotic cell cycle (GO:0000278) and immune 239 system process (GO:0002376).

240 Gene family composition analysis 241 Gene family evolution, especially expansion by gene duplication, has been recognized as 242 an important mechanism explaining morphological and physiological differences 243 between species [40]. To characterize the repertoires of gene functions of rodent 244 genomes, we employed a phenetic approach based on principal component analysis 245 (PCA) that seeks to quantify and visualize the similarity in functional composition among 246 species (see Matrials and methods). The PCA largely reflected phylogenetic relationships 247 within Rodentia (Fig. 3A), except for the capybara, which showed a clear signature of 248 genome-wide functional diversification relative to the rest of rodents. To explore which 249 gene families were driving this pattern, we performed an overrepresentation test to 250 determine the gene families that dominated the loadings of each principal component. 251 The main determinants of PC1 were the diversification of gene families related to sensory 252 perception of smell, G-protein coupled receptor signaling and immune response (Fig. 253 3A). Further, based on a genome-wide screen of 14,825 gene families, we identified 39 254 families significantly expanded in capybara, and 1 family significantly contracted. 255 256 Evolution of regulators of growth and size 257 Differences in size between related organisms can reflect differences in cell number, cell 258 size or both (Conlon and Raff, 1999). In animals, six signaling pathways regulate growth: 259 RTK signaling via Ras, IIS via the PI3K pathway, Rheb/Tor, cytokine-JAK/STAT, 260 Warts/Hippo, and the Myc oncogene [41]. We found several genes within the IIS 261 (insulin/insulin-like growth factor signaling) pathway to exhibit either signatures of 262 pairwise divergence in the capybara, such as LEP (leptin), and PDX1 (Pancreatic and 263 duodenal homeobox 1), or to be under positive selection, such as PLAGL2 (Pleiomorphic 264 adenoma-like Protein 2) and RFX6 (Regulatory Factor X6; Fig. 2). Moreover, we found 265 two gene families within the IIS pathway to be significantly expanded in the capybara

9 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

266 relative to other rodents (Fig. 3B), namely PI3KR2 (Phosphatidylinositol 3-kinase 267 Regulatory subunit β), and PRKCI (atypical protein kinase C isoform PRKC iota). 268 269 The IIS pathway is implicated in embryonic and post-natal growth via cell proliferation 270 [42] consistent with the possibility that large body size in the capybara has evolved 271 through an increase in number of cells (Fig 4A). For instance, LEP is an important 272 growth factor that regulates body weight and bone mass through the IIS pathway [43]. 273 PLAGL2 is a member of the PLAG family of zinc finger transcription factors, which 274 interacts with the insulin-like growth factor II (igf2) activating the igf2-mitogenic 275 signaling pathway [44], and disruption can result in decreased body weight and post-natal 276 growth retardation in mice [45,46]. IGF-II provides a constitutive drive for fetal and post- 277 natal growth [47] and it has been shown that IGF-II levels do not drop after birth in 278 caviomorph rodents, but instead remain detectable in adulthood [48], supporting a role 279 for IGF-II regulation on the evolution of gigantism in the capybara. PI3K signalling is 280 implicated in cell proliferation, growth, and survival via the IIS and mTOR signaling 281 pathways [49]. PI3KR2 has been shown to have both oncogenetic [50] and tumor 282 suppressor properties [51], and PRKCI functions downstream of PI3K and is involved in 283 cell survival, differentiation, and proliferation by accelerating G1/S transition [52]. 284 285 PDX1 and RFX6 are two key transcription factors (TFs) that regulate the development of 286 the insulin-producing pancreatic β-cells. Pancreatic islets are composed of 5 different cell 287 types, being the β-cells the most prominent, composing 50-80% of the islets and the ones 288 in charge of producing insulin [53]. PDX1 and RFX6 TFs are crucial to establish and 289 maintain the functional identity of developing and adult β-cells [54,55]. Furthermore, in 290 adult β-cells PDX1 and RFX6 regulate insulin gene transcription [56] and, particularly, 291 RFX6 also regulates insulin content and secretion [57]. Thus, besides the physiological 292 effects of caviomorph insulin [26,27], lineage-specific shifts in insulin transcription and 293 secretion profiles may be an important mechanism in the evolution of extreme body size 294 in the capybara. 295 296 Evolution of cancer resistance 297 In rodents, cancer incidence has been reported to be 46% in a wild-caught population of 298 Mus musculus raised in the laboratory [58], and 14.4% to 30% in guinea pigs older than 299 three years [59]. If capybara’s large body size is caused by an increase in cell

10 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

300 proliferation, this could potentially lead to an increased risk of cancer. However, in 301 capybaras, only three cases of cancer have been reported to date [60–62], suggesting the 302 possibility of a lower incidence of cancer in capybara relative to other rodents. If so, this 303 pattern would be consistent with empiric analyses of Peto’s Paradox [63]. Thus, we 304 searched for genes and gene families related to cancer resistance in the capybara. 305 306 We found three gene families significantly expanded in the capybara relative to other 307 rodents related to tumor reversion and cancer suppression by the immune system (Fig. 308 3B-C), namely TPT1 (Tumor Protein, Translationally-Controlled 1), MAGEB5 309 (Melanoma Antigen Family B5) and GZMB (Granzyme B). TPT1 has a major role in 310 phenotypic reprogramming of cells, inducing cell proliferation and growth via the mTOR 311 signaling pathway, and plays a role in tumor reversion where cancer cells lose their 312 malignant phenotype [64]. Type I MAGE genes (e.g., MAGEB5) are expressed in highly 313 proliferating cells such as placenta, tumors, and germ-line cells [65]. In normal somatic 314 tissues these genes are deactivated but when cells become neoplastic, MAGE genes are 315 reactivated and the resultant proteins may be recognized by cytotoxic T lymphocytes, 316 triggering a T-cell-mediated tumor suppression response [65]. Lastly, GZMB is an 317 important component of the perforin/granzyme-mediated cell-killing response of 318 cytotoxic T lymphocytes via caspase-dependent apoptosis [66]. The fact that the capybara 319 displays functional diversification in the T-cell receptor (TCR) signaling pathway (Fig. 320 3A), and expansion of MAGEB5 and GZMB gene families, suggests that T-cell-mediated 321 tumor suppression response may be a parallel strategy, besides replicative senescence, to 322 reduce the increased cancer risk related to the evolution of a giant body size (Fig. 4B). 323 324 Lineage-specific amino acid replacements are good predictors of change in protein 325 function [67]. We aligned 876 SCOs from 9 rodent species spanning the three main 326 groups within Rodentia and identified sites otherwise conserved in all rodents except for 327 the capybara (See Materials and methods). Among the top 5% genes (n = 44) with the 328 highest concentration of unique-capybara residues we found TERF2-Interacting 329 Telomere Protein 1 (TERF2IP). TERF2IP is part of the shelterin complex that shapes and 330 controls the length of telomeric DNA in mammals [68], particularly the negative 331 regulation of telomere length [69], and it has been shown that many TERF2-interacting 332 proteins are mutated in chromosomal instability syndromes, increasing the susceptibility 333 to cancer [70]. PARP2 (Poly(ADP-Ribose) Polymerase member 2), a candidate protein

11 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

334 showing signatures of lineage-specific positive selection in the capybara (Fig. 2), directly 335 interacts with the TERF2 protein affecting its ability to bind telomeric DNA, and 336 disruption causes chromatid breaks that affect telomere integrity [71]. Together, these 337 results support previous evidence showing that replicative senescence plays an important 338 role in reducing the increased cancer risk in the big rodents [28]. 339 340 Intragenomic conflict during evolution of gigantism 341 How can giants evolve, or even exist? Size is regulated through somatic growth [72] and 342 cancer appears through somatic evolution [73]. Thus, cancer places a major constraint on 343 the evolution of large bodies. To investigate how growth and cancer suppression 344 pathways may be coevolving to allow the evolution of a giant body size in the capybara, 345 we used the genes that showed lineage-specific positive selection and rapid evolution, 346 high concentration of unique-capybara residues, and capybara-expanded gene families 347 (see above) to perform network analysis of 82 interacting genes using String Database 348 ([74]). Network analyses based on GO Biological Process revealed six functional clusters 349 among the 82 genes including immunity, regulation of cell proliferation, cell 350 communication and adhesion, metabolism and homeostasis, mitochondrion organization 351 and metabolism, and ribosome maturation (Fig. 5A). 352 353 We extracted 23 genes deeply involved in growth regulation by the IIS pathway, and the 354 T cell-mediated immunity pathway to explore in more detail the interaction between 355 growth and cancer suppression mechanisms. Remarkably, we found that this smaller 356 network has an enrichment of cancer pathways such as prostate cancer (KEGG ID 357 05215), colorectal cancer (KEGG ID 05210), small cell lung cancer (KEGG ID 05222), 358 and pathways in cancer (KEGG ID 05200). For instance, INS1, PI3KR2, PLAGL2 and 359 LEP have been reported as growth factors, promoting somatic evolution, but also 360 oncogenes, enabling tumor progression (Fig. 5B). Paradoxically, LEP is a major regulator 361 of body weight and bone mass [43] but also promotes angiogenesis through VEGF 362 signaling and acts as an autocrine factor in cancer cells promoting proliferation and 363 inhibition of apoptosis [75]. PLAGL2 contributes to post-natal growth by activating the 364 igf2-mitogenic signaling pathway [44] but also contributes to cancer by generating loss of 365 cell-cell contact inhibition, maintaining an immature differentiation state in 366 glioblastomas, and inducing proliferation of hematopoietic progenitors in leukemia 367 [44,76,77]. Insulin-PI3K signaling is implicated in cell proliferation and growth and

12 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

368 survival via the IIS and mTOR signaling pathways [49], but is frequently mutated in solid 369 tumors [78]. These observations suggest that some cancers could arise as a by-product of 370 growth regulatory evolution. 371 372 The dual nature of growth factors can create a scenario for an intragenomic conflict in 373 large-sized organisms, where somatic evolution leading to gigantism may promote 374 tumorigenic cells [79]. Cancers are selfish cell lineages that evolve through somatic 375 mutation and cell-lineage selection [80]. Thus, selection for large body size achieved 376 through greater cell proliferation can favor mutations that also confer cellular autonomy 377 in growth signals enhancing their own probability of replication and transmission, a 378 hallmark of cancer [79,81]. In fact, within species the risk of cancer scales with body size 379 [9,82,83]. Thus, the opportunity for selfish cell lineages to arise, incidental to selection 380 for cell proliferation, imposes a selection pressure for cancer surveillance mechanisms, 381 generating a coupled evolution between growth-promoting and tumor-suppression 382 pathways, a plausible underlying process causing Peto's Paradox and a possible 383 explanation for how giant bodies can evolve in the first place. Therefore, based on gene 384 network interaction analyses, we propose a model for the evolution of gigantism in the 385 capybara, and the resolution of the conflict, where adaptive evolution and duplication of 386 genes within the IIS pathway promote somatic evolution through cell proliferation, 387 leading to a phyletic increase in size, and the lineage-specific expansion of gene families 388 within the T-cell mediated tumor suppression pathway as a mechanism to counteract the 389 increased cancer risk. 390 391 Genomic signatures of mutational load 392 To quantify the mutation load in capybara we used the ratio of non-synonymous

393 substitutions to synonymous substitutions (dN/dS) as a proxy. Because most non- 394 synonymous substitutions are deleterious or slightly deleterious [84], small populations, 395 for which purifying selection is comparatively inefficient, tend to have elevated genome- 396 wide fraction of fixed non-synonymous substitutions, and thus an overall higher value of

397 dN/dS [85]. Based on the inverse relationship between body mass and population size [8],

398 we used body mass across 16 rodents as a proxy for Ne. Bootstrap comparison of

399 capybara-specific dN/dS values with the rest of the rodents showed that the capybara has a

400 higher genome-wide mean dN/dS (0.15 vs. 0.11, respectively; two-tailed two-sample t-

401 test, t = -103.25, df = 1075, P < 0.00001. Difference of mean dN/dS between capybara and

13 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

402 other rodents: 0.041; permutation test, P < 0.001; Fig. 6), suggesting elevated mutation 403 load in the capybara relative to other rodents. Furthermore, analysis of body mass and

404 nuclear dN/dS across rodents showed that body mass explains nearly 30% of the variance 2 405 in mean dN/dS among species (R = 0.294, df = 14, P < 0.05; Phylogenetic Independent 406 Contrasts (PIC): R2 = 0.332, df = 13, P < 0.05; Fig. S4A), suggesting that capybara may 407 be part of a trend in which larger rodents experience less efficient purifying selection. 408 409 To further corroborate that the increased mutational load was attributable to demographic

410 effects, we estimated dN/dS values for the 13 protein-coding mitochondrial genes in ten 411 rodent species (Table S3). If there is variation in the effective population size of 412 mitochondrial DNA (mtDNA) across species, then diversity levels and substitution 413 patterns in mtDNA should be correlated to those in nuclear DNA given that evolutionary

414 processes that affect Ne should affect both genomes [86–88]. Consistent with this idea,

415 mean mitochondrial and nuclear dN/dS values within rodents were strongly correlated 416 (Spearman’s ρ = 0.67, P < 0.05; PIC: Spearman’s ρ = 0.7, P < 0.05; Fig S4B). Together,

417 these results suggest that lower Ne in larger rodent species has increased the mutation load 418 in both nuclear and mitochondrial genomes, supporting a tradeoff at the population level 419 of evolving a large body size. 420 421 Discussion 422 Despite intrinsic constraints on the evolution of large bodies, mammalian species in 423 multiple lineages have achieved dramatic body size changes through accelerated rates of 424 phenotypic evolution [5,89]. The evolution of gigantism can therefore shed light on how 425 biological systems can break underlying constraints, and mitigate or overcome the 426 evolutionary effects of tradeoffs on whole-organism performance [90,91]. The genome of 427 the capybara, the world’s largest rodent, is therefore crucial to addressing these questions. 428 The genome assembly analyzed here has revealed insights into the capybara’s particular 429 morphology, offering the first genomic scan onto the developmental regulators of its 430 large size, as well as the characterization of a putative novel anti-cancer mechanism, and 431 the collateral effects at the population level of evolving a large body size. For instance,

432 reduced long-term Ne can increase the probability of extinction because of the 433 accumulation of slightly deleterious mutations that reduce the mean fitness of the 434 population [12,92,93]. This may explain macroevolutionary patterns of extinction risk

14 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

435 related to body size [94], reinforcing the effect of body size at the population level and its 436 consequences to genome evolution. 437 438 Changes underlying body size evolution affect other life-history traits, such as fecundity 439 and longevity [2], thus, the extent of pleiotropic effects of each developmental regulator 440 of body size will ultimately determine its availability for evolutionary change [95]. The 441 IIS pathway has also been linked to rapid and extreme body-size evolution in the sunfish 442 (Mola mola), the world’s largest bony fish [96], and to extreme size differences in dog 443 breeds [97]. Given the existence of at least six size-regulating pathways [41], the repeated 444 use of the IIS pathway in fishes, dogs and rodents, suggests that this pathway might be 445 less constrained and could be a ‘hot-spot’ for extreme body size evolution in vertebrates, 446 indicating a common genetic and developmental pathway controlling rapid body size 447 evolution through cell proliferation. 448 449 Finally, cancer imposes a strong constraint on morphological and ontogenetic evolution 450 [90], particularly during the evolution of gigantism. The evolution of compensatory 451 cancer-protective mechanisms, thus, becomes a requirement if animals are to evolve 452 larger bodies. In fact, larger and long-lived mammals have evolved enhanced 453 mechanisms of cancer suppression as revealed by the lower risk of developing cancer 454 relative to smaller species, a phenomenon known as Peto’s Paradox. While some tumor 455 suppression pathways are conserved among animals, some are the result of idiosyncratic 456 evolutionary processes within species, resulting in a high variety of lineage-specific 457 solutions [98]. For instance, elephants, whales and capybaras, which have evolved 458 extremely large sizes, each have a specific solution to the inherent higher risk of cancer: 459 higher sensitivity to DNA damage, more efficient DNA repair, and enhanced immune 460 response against tumors, respectively. Mammalian comparative genomics to study cancer 461 can therefore provide insight into the common processes as well as the specific 462 mechanisms of cancer resistance, and offer potential targets for human cancer prevention 463 and treatment. Thus, given the increasing interest in using genetically engineered T cells 464 for cancer immunotherapy [99], the capybara could be a suitable model to study the 465 evolution and physiology of tumor detection and eradication by T cells. 466

15 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

467 Materials and methods 468 Supplemental materials and data availability 469 Genome assembly: NCBI BioProject: PRJNA399400; NCBI assembly accession number: 470 PVLA00000000. 471 472 Code with statistical analyses and tables with genes: 473 https://github.com/santiagoha18/Capybara_gigantism 474 475 Ethics statement 476 All procedures involving live animals followed protocols approved by the Comité 477 Institucional de Uso y Cuidado de Animales de Laboratorio (CICUAL) of the 478 Universidad de los Andes, approval number C.FUA_14-023, 5 June 2014. Research and 479 field collection of samples was authorized by the Autoridad Nacional de Licencias 480 Ambientales (ANLA) de Colombia under the permiso marco resolución No. 1177 to the 481 Universidad de los Andes. 482 483 Rate of Body Size Evolution and Ancestral State Reconstruction 484 The term ‘gigantism’ is relative to the clade of interest. We used a comparative approach 485 to test whether the capybara’s large size is due to lineage-specific accelerated rates of 486 phenotypic evolution. From the literature we obtained a recent time-calibrated phylogeny 487 of Caviomorpha [100] and for each species obtained mean adult body mass (in grams) as 488 the mean of the two sexes [101,102]. Analyses were conducted on natural log- 489 transformed body mass. 490 491 The Brownian motion (BM) model of continuous trait evolution assumes that trait 492 variation accumulates proportionally through time, rates of trait evolution are finite and 493 constant across the phylogeny, where branch lengths are scaled to time [103]. The 494 disparity in body size among caviomorph rodents (Fig S5), suggests that rate of body size 495 evolution likely varies across time or among lineages. To compare the BM model with a 496 multiple rate model of body size evolution on the caviomorph phylogeny, and to 497 reconstruct ancestral states, we used the software STABLETRAITS [104]. STABLETRAITS 498 samples from a symmetrical distribution of rates with mean zero and defined by its index 499 of stability (α). BM α=2 implies a normal distribution (i.e., single-rate model), whereas

16 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

500 α<2 results in a shallower platykurtic distribution with heavy tails, supporting a multi- 501 rate model. For the analysis, results from a multi-rate model (α<2) were compared with a 502 BM model (α=2) to select the best fitting model based on the Deviance Information 503 Criterion (DIC). Two MCMC chains were run for 2 million generations each until the 504 potential scale reduction factor went bellow 1.01, which implies convergence between 505 the two chains. Burn-in was set to 25% of samples, with the output containing calculated 506 rates, ancestral states, and posterior probabilities. 507 508 Given that the multi-rate model was selected as the best model according to the DIC 509 (Table 1), the R package AUTEUR [34] was used to identify which branches on the

510 phylogeny experienced rate shifts. AUTEUR uses a reversible-jump MCMC (rjMCMC) to 511 assess the fit of models that differ in the number of rate shifts in the tree. The algorithm 512 assumes that evolutionary rates are inherited from successive ancestral nodes, unless the 513 data provide evidence of a rate shift. The rjMCMC approach explores models ranging 514 from a single-rate BM process to a maximally complex process where each branch 515 evolves under a different rate. Two rjMCMC chains were run for 1 million generations 516 each with the following parameters: sample.freq = 100 prop.mergesplit = 0.1 and 517 prop.width = 0.1. 518 519 Genome size estimation 520 Genome size in the capybara was estimated using flow cytometry on blood samples from 521 two juveniles taken from a population in Cabuyaro, Meta, Colombia (04.284025°, - 522 072.725647°; Table S2). Animals where tranquilized with an intramuscular dose of 523 Zoletil® 50 (VirbacTM, Colombia), and then sacrificed with a lethal dose of Euthanex® 524 50 (InvetTM, Colombia) according to manufacturer instructions, and different tissues were 525 collected for future RNA-seq work. Blood samples were stored in 3mL heparinized tubes 526 at 2ºC and transported to the laboratory in Bogotá, Colombia. Nucleated white blood 527 cells were separated from the rest of blood contents by centrifugation. 100µL of white 528 blood cells were washed with 3mL of PBS buffer, and DNA was stained according to the 529 BD CycletestTM Plus DNA kit instructions (BD BiosciencesTM). Chicken red blood cells 530 were used as a genome size standard, assuming a haploid C-value of 1.25 pg. Flow 531 cytometry was conducted on a BD FACSCantoTM II cytometer with a FACs DIVA v6.1 532 software. Average C-value was estimated to be 3.21 pg (ANDES-M 2304 C-value = 3.17

17 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

533 pg, ANDES-M 2303 C-value = 3.25 pg). Converting the mean C-value to numbers of 534 base pairs we obtained an estimated genome size of 3.14 Gb. 535 536 Capybara genome sequencing and assembly 537 As part of the 200 Mammalian Genomes project, the SciLifeLab at Uppsala University 538 and the Broad Institute sequenced and assembled a DISCOVAR de novo genome 539 assembly [105] from a sample of a wild-born female H. hydrochaeris from the San Diego 540 Zoo’s Frozen Zoo (frozen sample KB10393), imported from Paraguay. The genome 541 assembly was performed with the DISCOVAR de novo pipeline, generating an assembly 542 spanning 2.734 gigabases (Gb) with a scaffold N50 length of 0.202 megabases (Mb; 543 Table 1). 544 545 The Dovetail GenomicsTM proprietary scaffolding method based on ChicagoTM libraries 546 [106] was used to upgrade the initial DISCOVAR assembly. Dovetail’s novel approach 547 to increasing the contiguity of genome assemblies combines initial short-read assemblies 548 with long-range information generated by in vitro proximity ligation of DNA in 549 chromatin. From the same individual of H. hydrochaeris used to obtain the initial 550 DISCOVAR de novo assembly, 500 ng of high molecular weight (HMW) gDNA (50 Kb 551 mean fragment size) was used as input for the ChicagoTM libraries. The libraries were 552 sequenced on two lanes of an Illumina HiSeq in Rapid Run Mode to produce 316 million 553 2x100 bp paired-end reads, which provided 67X physical coverage (measured in bins of 554 1-50 kb). 555 556 The reads were assembled by Dovetail GenomicsTM using the HiRiseTM Scaffolder 557 pipeline and the DISCOVAR de novo genome assembly as inputs. Shotgun and 558 ChicagoTM library sequences were aligned to the draft input assembly using a modified 559 SNAP read mapper (http://snap.cs.berkeley.edu) to generate an assembly spanning 2.737 560 Gb, with a contig N50 length of 161.1 kb and an impressive scaffold N50 length of 12.2 561 Mb, as calculated using the QUAST software [107]. 562 563 Assembly quality and annotation 564 Two approaches were employed to evaluate the quality of the assembly: 1) the Core 565 Eukaryotic Genes Mapping Approach [31] was used to determine the number of 566 complete Core Eukaryotic Genes (CEGs) recovered, and 2) an analysis of Benchmarking

18 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

567 Universal Single-Copy Orthologs [30] was applied as an evolutionary measure of 568 genome completeness, using the vertebrate dataset (3023 genes) as query. Gene content 569 of a well-assembled genome should include a high proportion (≥75%) of both CEGs and 570 BUSCOs (Fig S1). 571 572 Putative genes were located in the assembly by homology-based annotation with 573 MAKER v2.31.9 [32] based on protein evidence from mouse (Mus musculus), rat (Rattus 574 norvegicus) and guinea pig (Cavia porcellus; Ensembl release 85). CD-HIT v4.6.1 [108] 575 was used to cluster highly homologous protein sequences across the three protein sets and 576 generate a non-redundant protein database (nrPD), which was used to guide gene 577 predictions on the capybara genome. To evaluate the quality of the annotations, the 578 Annotation Edit Distance (AED) score was used as a measure of congruence between 579 each annotation and the homology-based evidence (in this case, protein sequences). 580 When an annotation perfectly matches the overlapping model protein, the AED value is 0 581 [109]. As a rule of thumb, a genome annotation where 90% of the annotations have an 582 AED less than 0.5 is considered well annotated (Campbell et al., 2014). The capybara 583 genome annotation has 92.5% of its annotations with an AED below 0.5. 584 585 PANTHER [110] was used to characterize the functions of the capybara-annotated 586 proteins (Fig S2). PANTHER is a library of protein families and subfamilies that uses the 587 Gene Ontology (GO) database to assign function to proteins. Following the protocol for 588 large-scale gene function analysis with PANTHER [111], the Scoring Tool was used to 589 assign each protein of the capybara genome to a specific protein family (and subfamily) 590 based on a library of Hidden Markov Models. 591 592 Annotation of repeat elements 593 RepeatMasker open-4.0.5 [112] was used to identify and classify transposable elements 594 (TEs) and short tandem repeats by aligning the capybara genome sequences against a 595 reference library of known repeats for rodents, using default parameters. For comparison, 596 NCBI Annotation releases for 13 additional rodents were used (Fig. 1) 597 598 Phylogenomics and divergence time estimation 599 Available glire (rodent + ) genomes were downloaded from NCBI and Ensembl 600 (Table S3). AlignWise [113] was used to identify protein-coding regions and extract the

19 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

601 longest open reading frame per gene. Single-copy gene orthologs (SCOs) shared across 602 the 17 glire genomes (Table S3) were identified using an 8321-marker dataset of 603 orthologs for glires from OrthoMaM database v8 [114]. This dataset contains orthologous 604 groups for 6 of the 17 genomes; thus, we constructed a dataset combining the 8321 605 OrthoMaM orthologous groups with the 11 remaining genomes and performed an all-vs- 606 all blastn [115]. We used the reciprocal groups function of VESPA v1.0β [116] to 607 recover orthologous groups spanning the 17 genomes and identified 2597 groups, of 608 which 370 were SCOs. For phylogenetic tree inference, each of the 370 SCOs was 609 aligned using MUSCLE [117] and then concatenated. Poorly aligned sites were trimmed 610 using Gblocks v0.91b using the –t=c option to retain the reading frames [118]. RAxML 611 v8.2.10 [119] with rapid bootstrap mode (n = 500) and assuming a GTRGAMMA model 612 was used to infer a phylogenetic tree. We also used a species-tree approach to 613 phylogenomic inference that takes into account the individual history of each gene, which 614 may conflict due to, e.g., incomplete lineage sorting, using the software Neighbor-joining 615 species tree (NJst; [120]). For each individually aligned SCO, 100 bootstrap replicates 616 were run on RAxML v8.2.10. A file with 37000 gene trees (100 bootstraps x 370 SCOs) 617 was used as input for NJst. 618 619 Given that both phylogenomic analyses (concatenated and species-tree approach) resulted 620 in identical topologies, divergence times were estimated based on the topology obtained 621 in the concatenated analysis using the software MCMCTREE [121] in the PAML 622 package v4.8 [39]. Two nodes were calibrated using fossil data [122] as follows: the most 623 recent common ancestor (MRCA) of Rodentia was restricted to the interval 76 – 88 624 million years ago (Ma), and the MRCA of Mus and Rattus was limited to 18 – 23 Ma. 625 626 Genomic signatures of mutation load 627 In order estimate the mutation load, we used the ratio of the rates of non-synonymous

628 substitutions over synonymous substitutions (dN/dS) as a proxy. A set of 229 SCOs shared 629 across the 16 rodent genomes (Table S3) was recovered with the software Proteinortho

630 [123]. Values of dN/dS were estimated for the set of 229 SCOs, using the program

631 CODEML from the PAML package v4.8 [39]. Likewise, dN/dS ratios for the 13 protein- 632 coding genes of the mitochondrial genomes of capybara plus 9 additional rodent 633 mitochondrial genomes were estimated (Table S3). Two evolutionary models were

634 evaluated: model 0 with dN/dS held constant over the time-calibrated concatenated

20 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

635 likelihood tree, and model 1 with free dN/dS estimated separately for each branch. 636 Likelihood-ratio tests were used to compare the two models, and in each case the model 1

637 was the preferred one. Only dN/dS values associated with external branches were used as 638 measures of mutation load. Codon-based alignments were performed with MACSE [124] 639 with default parameters and poorly aligned regions were trimmed with Gblocks v. 0.91b 640 [118]. 641

642 To determine if the capybara has a higher genome-wide dN/dS relative to other rodents, 643 we (a) calculated the mean ratio from each of 1000 bootstrapped samples of the original

644 229 capybara-specific dN/dS values (see above), and then did the same for the combined

645 set of dN/dS values for the other 15 rodent genomes. The two samples of 1,000 means 646 were then compared using a two-sample t-test. (b) A permutation test (1,000 647 permutations) was performed to create a null distribution of the differences of (mean

648 capybara dN/dS – mean rodents dN/dS), and then compared to the observed value: 0.041. 649

650 To test whether variation in dN/dS was caused by demographic effects (i.e. higher genetic 651 drift relative to purifying selection), the correlation between nuclear and mitochondrial

652 dN/dS values across species was estimated, under the hypothesis that a demographic 653 bottleneck should affect both genomes [87]. The correlations between (a) Mean nuclear

654 dN/dS versus body mass and (b) mean nuclear dN/dS versus mean mitochondrial dN/dS, 655 were re-evaluated using phylogenetic independent contrasts (PICs; [125]) to control for 656 the potentially confounding effects of shared evolutionary history. Because for most

657 mammalian species comparable genetic estimates of nuclear Ne were not available, we

658 used body mass as a proxy for Ne. Mean dN/dS values per species and natural log- 659 transformed values of body mass were used to estimate the impact of the reduction of 660 effective population size on the strength of purifying selection. Assessment of linear 661 model assumptions was performed with the gvlma R package [126]. 662 663 Gene family composition 664 To search for expanded gene families in the capybara, we performed gene family 665 assignments for each of the 16 rodent genomes (Table S3) using the protocol for large- 666 scale gene function analysis with PANTHER [111]. Following [127], functional 667 repertoires of the genomes were represented in a multidimensional space, where each 668 dimension corresponds to a particular gene family. A principal component analysis

21 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

669 (PCA) was conducted using the prcomp function in R [126] on the count of the number 670 of genes in each gene family as identified by PANTHER. The coordinate of a species’ 671 genome along each dimension represents the number of genes that it contains with the 672 corresponding gene family normalized by the total number of genes in that particular 673 species. PCA based on content and size of gene families has been shown to reflect 674 important evolutionary splits between groups or clades of animals (non-bilaterian and 675 bilaterian: [127]; deuterostomes and protostomes: [128]). In rodents, PC1 component 676 separated the capybara from all other rodent genomes and explained the 20.14% of 677 variance, while PC2 grouped all genomes and explained 11.34% of the variance. An 678 overrepresentation test was conducted on the gene families that fell on both 2.5%-tails of 679 the loadings of PC1 and PC2, to determine the gene families that dominate the loadings 680 of each principal component. A Fisher’s exact test was then performed iteratively in R (R 681 Development Core Team, 2008) on counts of each PANTHER gene family from the 5% 682 tails of PC1 and PC2 against the complete set of gene families (23,110 families). To be 683 defined as enriched, we assumed a significance level α = 0.05 for each gene family, 684 before a traditional Bonferroni correction for multiple testing. 685 686 To conduct a genome-wide screen for gene family expansions, a table with 14,825 shared 687 PANTHER gene families, present in at least 15 of the 16 genomes, was constructed with 688 the counts for genes annotated to those families in the 16 rodent genomes (see above). A 689 Fisher’s exact test was then performed iteratively in R [126] comparing the number of 690 genes found in the capybara in each gene family against the total number of genes in the 691 remaining 15 rodents for a given gene family. To be defined as enriched, a gene family 692 had to have a significant p-value (α<0.05) after a traditional Bonferroni correction for 693 multiple testing. We calculated a Z-score per gene family per species, where 1) the 694 individual counts of each gene family per species were normalized by the total number of 695 genes in the genome of each species and 2) was standarized within each gene family 696 (across species) with the scale function in R [126] and visualized as a heatmap (Fig. 7B). 697 This Z-score represented the number of standard deviations below or above the mean 698 gene family count for each species. 699 700 701

22 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

702 Pairwise dN/dS analysis 703 Pairwise comparisons of guinea pig and capybara could provide insights into the genes 704 underlying capybara’s large size, as well as insights into other adaptive differences. We

705 calculated pairwise dN/dS ratios of each gene within genomes to look for signs of positive 706 selection. A total of 11,255 single-copy gene orthologs (SCOs) between guinea pig and 707 capybara where obtained with the OrthoFinder software v.1.1.4 [129]. Specific

708 Biopython [130] and PyCogent [131] modules where used to estimate dN/dS values for 709 each SCO. To identify genes that evolved more rapidly in the capybara lineage, we

710 calculated the ratios of the capybara-guinea pig dN/dS values relative to the highest dN/dS 711 value for guinea pig-rat, on a set of 8,084 SCOs, and ranked them by decreasing ratio. 712 Codon-based alignments were performed with MACSE [124] with default parameters. 713 Gaps and ambiguities were excluded from alignments, and only alignments with a 714 trimmed length above 50% of the original alignment length were retained for subsequent 715 analyses. 716 717 Positive selection analysis using codon-based models of evolution

718 While pairwise dN/dS can identify genes with overall faster sequence evolution, amino 719 acid sites within a protein may experience different selective pressures and have different

720 underlying dN/dS ratios [132]. We fitted branch-site models to identify positive selection 721 in a set of 4,452 SCOs obtained using the OrthoVenn software [133] on capybara versus 722 four closely related rodents with published whole genome sequences: naked mole rat, 723 degu, chinchilla and guinea pig. Downstream selective pressure variation analyses were 724 done using the Selection Analysis Preparation functions provided in VESPA v1.0β [116]. 725 We compared the likelihood scores of two selection models implemented in CODEML in 726 the PAML package v4.8 [39] using likelihood-ratio tests (LRT) as a measure of model 727 fit: a branch-site model (model A) in which attempts to detect positive selection acting on

728 a few sites on specified lineages or ‘foreground branches’ (dN/dS>1), versus the null

729 model (A null) in which codons can only evolve neutrally (dN/dS =1) or under purifying

730 selection (0< dN/dS <1; [134]). As maximum likelihood methods to detect positive 731 selection are sensitive to sample size [135], increasing the possibility of false positives 732 [136], we calculated a second measure of rapid evolution of each gene: ((Capybara, 733 guinea pig), degu) protein trios where used to estimate the difference of the raw p- 734 distance in protein sequence between capybara and degu (pdCD), and the raw p-distance 735 between guinea pig and degu (pdGD). These two distances were used to calculate the

23 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

736 ‘protein distance index’ or PDI for each i gene (PDIi = pdCDi – pdGDi) which takes a 737 value of zero if the two p-distances are equal, meaning that the capybara and guinea pig 738 lineages have evolved at equal rates since their divergence from their MRCA. A positive 739 value is indicative of accelerated protein evolution in the capybara lineage. We 740 considered genes in the top 5% of the distribution in both dimensions (i.e., the LRTs of 741 branch-site models and the PDI statistic) as positively selected in the capybara lineage. 742 Codon-based alignments were performed with MACSE [124] with default parameters 743 and poorly aligned regions were trimmed with Gblocks v. 0.91b [118]. 744 745 For genes exhibiting signatures of lineage-specific positive selection, we conducted GO 746 enrichment analysis to identify Biological Process overrepresented in these genes relative 747 to the 4,452 SCO set. The GO Biological Process for each gene was retrieved with the 748 biomaRt R package [137] using the protein stable IDs of guinea pig orthologs. A Fisher’s 749 exact test was then performed iteratively in R [126] on counts of each Biological Process 750 from the positively selected genes against the full set of orthologs (4,452 genes). To be 751 defined as enriched, a GO Biological Process had to have a significant p-value (α<0.05) 752 after a traditional Bonferroni correction for multiple testing. 753 754 Identification of proteins with capybara-unique residues 755 Conservation-based methods used to identify lineage-specific amino acid replacements 756 have been proven to be good predictors of change in protein function [67]. We identified 757 SCOs for 9 rodents (capybara, guinea pig, degu, chinchilla, naked mole rat, Ord’s 758 kangaroo rat, mouse, rat and thirteen-lined ) and used multiple sequence 759 alignments (MSA) to identify positions in each SCO that are conserved in all species 760 except for the capybara. An in-house Python script was used to identify unique capybara 761 amino acid residues where a maximum of one unknown residue was allowed in each 762 position of the alignment. SCOs were identified with OrthoFinder software v.1.1.4 [129] 763 and protein MSAs were performed with PRANK [138] with default parameters. Gaps 764 were excluded from alignments, and only alignments with a trimmed length above 50% 765 of the original alignment length were retained for subsequent analyses. This procedure 766 resulted in a total of 876 SCOs retained. The SCOs were ranked according to the number 767 of unique substitutions normalized by protein length. 768 769

24 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

770 Gene interaction network analyses 771 We generated a list of 195 candidate genes that showed lineage-specific positive 772 selection, pairwise dN/dS > 1, rapid evolution (top 1%), high concentration of capybara- 773 unique residues (top 5%) and gene-family expansions specific to the capybara. Insulin 774 was included as part of the 195 gene set given that previous analyses showed signatures 775 of adaptive evolution among Caviomorph rodents [26]. Of the 195 genes, only 82 genes 776 presented known or predicted interactions with each other according to String Database 777 [74]. We used this new gene set to perform a network analysis of GO Biological Process 778 and KEGG pathways on the 82 interacting genes. Networks were constructed based on 779 String DB interactions and plotted with Cytoscape 3.0 [139]. GO and KEGG enrichment 780 analyses were performed based on String DB information with a false discovery rate of 781 0.05. 782

25 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

783 Acknowledgements 784 This work was supported by Colciencias grant in Science, Technology & Innovation BIO 785 number 1204-659-44334 (A.J.C.); grant from the Facultad de Ciencias, Universidad 786 de los Andes (S.H.A., A.J.C.). Genome sequencing and assembly was supported by the 787 200 Mammals Project at the Broad Institute (NIH Grant 5R01HG008742-02) and 788 Uppsala University (Swedish Research Council). Special thanks to: Diane Genereux, 789 Daniel Cadena and Jeremy Johnson for insightful comments on the final version of the 790 manuscript; Eva Murén and Voichita Marinescu for generating the sequencing libraries; 791 Juanita Herrera and veterinarian Marlly Guarín for help with drafting and implementing 792 animal care and use protocols; Universidad de los Andes’ HPC group for cluster access; 793 John Marío González in the medical school of the Universidad de los Andes for access 794 and help with flow cytometry; Agencia de Licencias Ambientales (ANLA) for collecting 795 permits granted to Universidad de los Andes (permit Nº 1177). 796 797 Author contributions 798 A.J.C., E.K and K.L.T. conceived genome-sequencing project. O.A.R. provided 799 materials. E.K. and K.L.T. provided the DISCOVAR genome. A.J.C. provided financing. 800 S.H.A. conceived the evolutionary questions addressed here, performed all evolutionary 801 and statistical analyses, and wrote the manuscript. All authors revised and approved final 802 version. 803

26 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

804 References

805 1. Haldane JBS. On being the right size. Possible Worlds and Other Essays. 1927. 806 doi:10.1038/ng0905-923 807 2. Shingleton AW. Evolution and the regulation of growth and body size. In: Flatt IT, Heyland A, 808 editors. Mechanisms of life history evolution: The genetics and physiology of life history traits and 809 trade-offs. Oxford University Press; 2011. pp. 43–55.

810 3. Calder WAI. Size, function, and life history. Harvard UnivPress, Cambridge, Mass. 1984; 811 4. Purvis A, Orme DL. Evolutionary trends in body size. In: Carel JC, P.A. K, Christen Y, editors. 812 Deciphering Growth. 2005. pp. 1–18. Available: Heidelberg. Springer 813 5. McNeill Alexander R. All Time Giants: The largest animals and their problems. Palaeontology. 814 1998. p. 1231–1245.

815 6. Sookias RB, Butler RJ, Benson RBJ. Rise of dinosaurs reveals major body-size transitions are 816 driven by passive processes of trait evolution. Proc R Soc B Biol Sci. 2012;279: 2180–2187. 817 doi:10.1098/rspb.2011.2441 818 7. Kozeowski JAN, Gawelczyk AT. Why are species’ body size distributions usually skewed to the 819 right? Funct Ecol. 2014;16: 419–432. 820 8. Damuth J. Population density and body size in mammals. Nature. 1981. pp. 699–700. 821 doi:10.1038/290699a0

822 9. Caulin AF, Maley CC. Peto’s Paradox: evolution’s prescription for cancer prevention. Trends Ecol 823 Evol. 2011;26: 175–182. doi:10.1016/j.tree.2011.01.002

824 10. Leffler EM, Bullaughey K, Matute DR, Meyer WK, Ségurel L, Venkat A, et al. Revisiting an old 825 riddle: what determines genetic diversity levels within species? PLoS Biol. 2012;10: e1001388. 826 doi:10.1371/journal.pbio.1001388 827 11. Charlesworth B. Fundamental concepts in genetics: effective population size and patterns of 828 molecular evolution and variation. Nat Rev Genet. 2009;10: 195–205. doi:10.1038/nrg2526

829 12. Lynch M, Gabriel W. Mutation load and the survival of small populations. Evolution (N Y). 830 1990;44: 1725–1737. doi:10.2307/2409502

831 13. Savage VM, Allen AP, Brown JH, Gillooly JF, Herman AB, Woodruff WH, et al. Scaling of 832 number, size, and metabolic rate of cells with body size in mammals. PNAS. 2007;104: 4718– 833 4723. doi:10.1073/pnas.0611235104 834 14. Tollis M, Boddy AM, Maley CC. Peto’s Paradox: how has evolution solved the problem of cancer 835 prevention? BMC Biol. BMC Biology; 2017;15: 60. doi:10.1186/s12915-017-0401-7

836 15. Erickson GM, Rogers KC, Yerby SA. Dinosaurian growth patterns and rapid avian growth rates. 837 Nature. 2001;412: 429–433. doi:10.1038/35086558

838 16. Lui JC, Baron J. Mechanisms limiting body growth in mammals. Endocr Rev. 2011;32: 422–440. 839 doi:10.1210/er.2011-0001 840 17. Cairns J. Mutation selection and the natural history of cancer. Nature. 1975;255: 197–200. 841 doi:10.1038/255197a0 842 18. Sulak M, Fong L, Mika K, Chigurupati S, Yon L, Mongan NP, et al. TP53 copy number expansion 843 is associated with the evolution of increased body size and an enhanced DNA damage response in

27 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

844 elephants. Elife. 2016;5: 1–30. doi:10.7554/eLife.11994

845 19. Vazquez JM, Sulak M, Chigurupati S, Lynch Correspondence VJ, Manuel Vazquez J, Lynch VJ. A 846 zombie LIF gene in elephants is upregulated by TP53 to induce apoptosis in response to DNA 847 damage. CellReports. 2018;24: 1765–1776. doi:10.1016/j.celrep.2018.07.042 848 20. Keane M, Semeiks J, Webb AE, Li YI, Quesada V, Craig T, et al. Insights into the evolution of 849 longevity from the bowhead whale genome. Cell Rep. 2015;10: 112–122. 850 doi:10.1016/j.celrep.2014.12.008 851 21. Leroi AM, Koufopanou V, Burt A. Cancer selection. Nature. 2002;1326: 226–231.

852 22. Hershkovitz P. The recent mammals of the neotropical region: a zoogeographic and ecological 853 review. In: Keast A, Erk FC, Glass B, editors. Evolution, Mammals and Southern Continents. 854 Albany: State University of New York; 1972. pp. 311–431. doi:10.1086/659883 855 23. Vucetich M, Deschamps CM. Roedores gigantes en el Museo de La Plata. Museo. 2015;27. 856 24. Sánchez-Villagra MR, Aguilera OA, Horovitz I. The anatomy of the world’s largest extinct rodent. 857 Science (80- ). 2003;301: 1708–1710. doi:10.1126/science.1089332 858 25. Nowak RM, Paradiso JL. Walker’s mammals of the world. 4th ed. Walker’s mammals of the world. 859 Baltimore and London: The Johns Hopkins University Press,; 1983. 860 26. Opazo JC, Palma RE, Melo F, Lessa EP. Adaptive evolution of the insulin gene in caviomorph 861 rodents. Mol Biol Evol. 2005;22: 1290–1298. doi:10.1093/molbev/msi117

862 27. King GL, Kahn CR. Non-parallel evolution of metabolic and growth-promoting functions of 863 insulin [Internet]. Nature. 1981. pp. 644–646. doi:10.1038/292644a0

864 28. Seluanov A, Chen Z, Hine C, Sasahara THC, Antonio AC, Ribeiro M, et al. Telomerase activity 865 coevolves with body mass, not lifespan Andrei. Aging Cell. 2007;6: 45–52. doi:10.1111/j.1474- 866 9726.2006.00262.x.Telomerase

867 29. Sánches-Isaza CA, Jiménez-Robayo LM. Estudio citogenético del chigüiro (Hydrochoerus 868 hydrochaeris) de la orinoquía colombiana. In: López-Arévalo HF, Sánchez-Palomino P, 869 Montenegro OL, editors. El chigüiro Hydrochoerus hydrochaeris en la Orinoquía colombiana: 870 Ecología, manejo sostenible y conservación. Universidad Nacional de Colombia; 2014. p. 147.

871 30. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva E V., Zdobnov EM. BUSCO: Assessing 872 genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 873 2015;31: 3210–3212. doi:10.1093/bioinformatics/btv351 874 31. Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic 875 genomes. Bioinformatics. 2007;23: 1061–7. doi:10.1093/bioinformatics/btm071

876 32. Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, et al. MAKER: An easy-to-use 877 annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18: 188– 878 196. doi:10.1101/gr.6743907 879 33. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, et al. Initial sequencing 880 and comparative analysis of the mouse genome. Nature. 2002;420: 520–562. 881 doi:10.1038/nature01262 882 34. Eastman JM, Alfaro ME, Joyce P, Hipp AL, Harmon LJ. A Novel comparative method for 883 identifying shifts in the rate of character evolution on trees. Evolution (N Y). 2011;65: 3578–3589.

28 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

884 doi:10.1111/j.1558-5646.2011.01401.x

885 35. Álvarez A, Arévalo RLM, Verzi DH, Aires B. Diversification patterns and size evolution in 886 caviomorph rodents. Biol J Linn Soc. 2017;20: 1–16. 887 36. Puttick MN, Thomas GH. Fossils and living taxa agree on patterns of body mass evolution: a case 888 study with . Proc R Soc B. 2015;282: 20152023. doi:10.1098/rspb.2015.2023 889 37. Slater GJ, Goldbogen J, Pyenson ND. Independent evolution of baleen whale gigantism linked to 890 Plio- ocean dynamics. Proc R Soc B. 2017;284: 20170546. doi:10.1098/rspb.2017.0546 891 38. Benson RBJ, Campione NE, Carrano MT, Mannion PD, Sullivan C, Upchurch P, et al. Rates of 892 dinosaur body mass evolution indicate 170 million years of sustained ecological innovation on the 893 avian stem lineage. PLoS Biol. 2014;12. doi:10.1371/journal.pbio.1001853 894 39. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24: 1586– 895 91. doi:10.1093/molbev/msm088 896 40. Demuth JP, Bie T De, Stajich JE, Cristianini N, Hahn MW. The evolution of mammalian gene 897 families. PLoS One. 2006;1: e85. doi:10.1371/journal.pone.0000085 898 41. Zhu H, Zhou Z, Wang D, Liu W, Zhu H. Hippo pathway genes developed varied exon numbers and 899 coevolved functional domains in metazoans for species specific growth control. BMC Evol Biol. 900 2013;13: 76. doi:10.1186/1471-2148-13-76 901 42. Baker J, Liu J-P, Robertson EJ, Efstratiadis A. Role of insulin-like growth factors in embryonic and 902 postnatal growth. Cell. 1993;75: 73–82. doi:10.1016/S0092-8674(05)80085-6 903 43. Margetic S, Gazzola C, Pegg G, Hill R. Leptin: a review of its peripheral actions and interactions. 904 Int J Obes. 2002;26: 1407–1433. doi:10.1038=sj.ijo.0802142 905 44. Hensen K, Valckenborgh ICC Van, Kas K, Ven WJM Van De, Voz ML. The tumorigenic diversity 906 of the three PLAG family members is associated with different DNA binding capacities 1. Cancer 907 Res. 2002;62: 1510–1517. 908 45. Hensen K, Braem C, Declercq J, Van Dyck F, Dewerchin M, Fiette L, et al. Targeted disruption of 909 the murine Plag1 proto-oncogene causes growth retardation and reduced fertility. Dev Growth 910 Differ. 2004;46: 459–470. doi:10.1111/j.1440-169x.2004.00762.x

911 46. Blake JA, Eppig JT, Kadin JA, Richardson JE, Smith CL, Bult CJ, et al. Mouse Genome Database 912 (MGD)-2017: Community knowledge resource for the laboratory mouse. Nucleic Acids Res. 2017; 913 doi:10.1093/nar/gkw1040 914 47. Fowden AL. The insulin-like growth factors and feto-placental growth. Placenta. 2003;24: 803– 915 812. doi:10.1016/S0143-4004(03)00080-8 916 48. Levinovitz A, Norstedt G, van den Berg S, Robinson IC, Ekstrom TJ. Isolation of an insulin-like 917 growth factor II cDNA from guinea pig liver: expression and developmental regulation. Mol Cell 918 Endocrinol. 1992;89: 105–110. 919 49. Laplante M, Sabatini DM. mTOR signaling in growth control and disease. Cell. 2012;149: 274– 920 293. doi:10.1016/j.cell.2012.03.017 921 50. Ito Y, Hart JR, Ueno L, Vogt PK. Oncogenic activity of the regulatory subunit p85β of 922 phosphatidylinositol 3-kinase (PI3K). Proc Natl Acad Sci. 2014;111: 16826–16829. 923 doi:10.1073/pnas.1420281111 924 51. Taniguchi CM, Winnay J, Kondo T, Bronson RT, Guimaraes AR, Aleman JO, et al. The PI3K

29 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

925 regulatory subunit p85α can exert tumor suppressor properties through negative regulation of 926 growth factor signalling. Cancer Research. 2007. pp. 5305–5315. doi:10.1158/0008-5472 927 52. Ni S, Chen L, Li M, Zhao W, Shan X, Wu M, et al. PKC iota promotes cellular proliferation by 928 accelerated G1/S transition via interaction with CDK7 in esophageal squamous cell carcinoma. 929 Tumor Biol. 2016;37: 13799–13809. doi:10.1007/s13277-016-5193-9 930 53. Murtaugh LC. Pancreas and beta-cell development: from the actual to the possible. Development. 931 2007;134: 427–38. doi:10.1242/dev.02770 932 54. Gao T, McKenna B, Li C, Reichert M, Nguyen J, Singh T, et al. Pdx1 maintains β cell identity and 933 function by repressing an α cell program. Cell Metab. 2014;19: 259–271. 934 doi:10.1016/j.cmet.2013.12.002

935 55. Piccand J, Strasser P, Hodson DJ, Meunier A, Ye T, Keime C, et al. Rfx6 maintains the functional 936 identity of adult pancreatic β cells. Cell Rep. 2014;9: 2219–2232. doi:10.1016/j.celrep.2014.11.033 937 56. Pagliuca FW, Melton DA. How to make a functional β-cell. Development. 2013;140: 2472–83. 938 doi:10.1242/dev.093187 939 57. Chandra V, Albagli-curiel O, Polak M, Scharfmann R, Chandra V, Albagli-curiel O, et al. RFX6 940 regulates insulin secretion by modulating Ca 2 + homeostasis in human β cells. Cell Rep. 2014; 1– 941 13. doi:10.1016/j.celrep.2014.11.010

942 58. Andervont HB, Dunn TB. Occurrence of tumors in wild house mice. J Natl Cancer Inst. 1962; 943 doi:10.1093/jnci/28.5.1153 944 59. Jelínek F. Spontaneous tumours in guinea pigs. Acta Vet Brno. 2003;72: 221–228.

945 60. Stoffregen DA, Prowten AW, Steinberg H, Anderson WI, Stoffregen A, Prowten W, et al. A 946 fibrosarcoma in the skeletal muscle of a capybara (Hydrochoerus hydrochaeris). J Wildl Dis. 947 1993;29: 345–348. 948 61. Hamano T, Terasawa F, Tachikawa Y, Murai A, Mori T. Squamous cell carcinoma in a capybara 949 (Hydrochoerus hydrochaeris). J Vet Med Sci. 2014;76: 1301–1304. doi:10.1292/jvms.13-0395 950 62. Srivorakul S, Boonsri K, Vechmanus T, Boonthong P. Localized histiocytic sarcoma in a captive 951 capybara (Hydrochoerus hydrochaeris). Thai J Vet Med. 2017;47: 131–135.

952 63. Abegglen LM, Caulin AF, Chan A, Lee K, Robinson R, Campbell MS, et al. Potential mechanisms 953 for cancer resistance in elephants and comparative cellular response to DNA damage in humans. 954 JAMA. 2015;314: 1850–1852. doi:10.1001/jama.2015.13134.Potential 955 64. Amson R, Pece S, Marine J, Paolo P, Fiore D, Telerman A, et al. TPT1 / TCTP-regulated pathways 956 in phenotypic reprogramming. Trends Cell Biol. 2013;23: 37–46. 957 65. van der Bruggen P, Traversari C, Chomez P, Lurquin C, De Plaen E, Van den Eynde B, et al. A 958 gene encoding an antigen recognized by cytolytic T lymphocytes on a human melanoma. Science 959 (80- ). 1991;254: 1643–1647. doi:10.1126/science.1840703 960 66. Russell JH, Ley TJ. Lymphocyte-mediated cytotoxicity. Annu Rev Immunol. 2002;20: 323–370. 961 doi:10.1146/annurev.immunol.20.100201.131730 962 67. Jelier R, Semple JI, Garcia-verdugo R, Lehner B. Predicting phenotypic variation in yeast from 963 individual genome sequences. Nat Genet. 2011;43: 1270–1274. doi:10.1038/ng.1007 964 68. De Lange T. Shelterin: The protein complex that shapes and safeguards human telomeres. Genes 965 Dev. 2005;19: 2100–2110. doi:10.1101/gad.1346005

30 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

966 69. O’Connor MS, Safari A, Liu D, Qin J, Songyang Z. The human Rap1 protein complex and 967 modulation of telomere length. J Biol Chem. 2004;279: 28585–28591. 968 doi:10.1074/jbc.M312913200

969 70. Blasco MA. Telomeres and human disease: ageing, cancer and beyond. Nat Rev Genet. 2005;6: 970 611–622. doi:10.1038/nrg1656 971 71. Dantzer F, Giraud-Panis M-J, Jaco I, Amé JC, Schultz I, Blasco M, et al. Functional interaction 972 between Poly(ADP-Ribose) Polymerase 2 (PARP-2) and TRF2: PARP activity negatively regulates 973 TRF2. Mol Cell Biol. 2004;24: 1595–1607. doi:10.1128/MCB.24.4.1595

974 72. Conlon I, Raff M. Size control in animal development. Cell. 1999;96: 235–244. 975 doi:10.1016/S0092-8674(00)80563-2

976 73. Crespi B, Summers K. Evolutionary biology of cancer. Trends Ecol Evol. 2005;20: 545–552. 977 doi:10.1016/j.tree.2005.07.007 978 74. Mering C Von, Huynen M, Jaeggi D, Schmidt S, Bork P, Snel B. STRING: a database of 979 predicted functional associations between proteins. 2003;31: 258–261. doi:10.1093/nar/gkg034 980 75. Andò S, Catalano S. The multifactorial role of leptin in driving the breast cancer 981 microenvironment. Nat Rev Endocrinol. 2012;8: 263–75. doi:10.1038/nrendo.2011.184 982 76. Zheng H, Ying H, Wiedemeyer R, Yan H, Quayle SN, Ivanova E V., et al. PLAGL2 regulates Wnt 983 signaling to impede differentiation in neural stem cells and gliomas. Cancer Cell. Elsevier Ltd; 984 2010;17: 497–509. doi:10.1016/j.ccr.2010.03.020 985 77. Landrette S, Medera D, He F, Castilla L. The PlagL2 transcription factor activates Mpl 986 transcription and signaling in hematopoietic progenitor and leukemia cells Sean. Leukemia. 987 2011;25: 655–662. doi:10.1146/annurev-immunol-032713-120240.Microglia

988 78. Yuan TL, Cantley LC. PI3K pathway alterations in cancer: variations on a theme. Oncogene. 989 2008;27: 5497–5510. doi:10.1038/onc.2008.245

990 79. Summers K, Silva J, Farwell MA. Intragenomic conflict and cancer. Med Hypotheses. 2002;59: 991 170–179. doi:10.1016/S0306-9877(02)00249-9 992 80. Burt A, Trivers RL. Genes in conflict: The biology of selfish genetic elements. The Quarterly 993 Review of Biology. 2006. doi:10.1086/523135 994 81. Hanahan D, Weinberg R. The hallmarks of cancer. Cell. 2000;100: 57–70. doi:10.1007/s00262- 995 010-0968-0 996 82. Tjalma RA. Canine bone sarcoma: Estimation of relative risk as a function of body size. J Natl 997 Cancer Inst. 1966; doi:10.1093/jnci/36.6.1137 998 83. Fraumeni JF. Stature and malignant tumors of bone in childhood and adolescence. Cancer. 1967; 999 doi:10.1002/1097-0142(196706)20:6<967::AID-CNCR2820200606>3.0.CO;2-P

1000 84. Ohta T. The Nearly Neutral Theory Of Molecular Evolution. Annu Rev Ecol Syst. 1992; 1001 doi:10.2307/2097289

1002 85. Popadin K, Polishchuk L V, Mamirova L, Knorre D, Gunbin K. Accumulation of slightly 1003 deleterious mutations in mitochondrial protein-coding genes of large versus small mammals. Proc 1004 Natl Acad Sci. 2007;104: 13390–13395. doi:10.1073/pnas.0701256104 1005 86. Ballard JWO, Whitlock MC. The incomplete natural history of mitochondria. Mol Ecol. 2004;13: 1006 729–744. doi:10.1046/j.1365-294X.2003.02063.x

31 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1007 87. Piganeau G, Eyre-Walker A. Evidence for variation in the effective population size of animal 1008 mitochondrial DNA. PLoS One. 2009;4: 2–9. doi:10.1371/journal.pone.0004396 1009 88. Mulligan C., Kitchen A, Miyamoto MM. Comment on “Population size does not influence 1010 mitochondrial genetic diversity in animals.” Science (80- ). 2006;312: 570–572. 1011 doi:10.1126/science.1122033 1012 89. Baker J, Meade A, Pagel M, Venditti C. Adaptive evolution toward larger size in mammals. Proc 1013 Natl Acad Sci. 2015;112: 5093–5098. doi:10.1073/pnas.1419823112 1014 90. Kavanagh K. Perspective: Embedded molecular switches, anticancer selection, and effects on 1015 ontogenetic rates: A hypothesis of devlopmental constraint on morphogenesis and evolution. 1016 Evolution (N Y). 2003;57: 939–948. doi: 10.1111/j.0014-3820.2003.tb00306.x

1017 91. Lailvaux SP, Husak JF. The life history of whole-organism performance. Q Rev Biol. 2014;89: 1018 285–318. doi:10.1086/678567 1019 92. Lohr JN, Haag CR. Genetic load, inbreeding depression, and hybrid vigor covary with population 1020 size: An empirical evaluation of theoretical predictions. Evolution (N Y). 2015;69: 3109–3122. 1021 doi:10.1111/evo.12802

1022 93. Bertram J, Gomez K, Masel J. Predicting patterns of long-term adaptation and extinction with 1023 population genetics. Evolution (N Y). 2016;71: 204–214. doi:10.1111/evo.13116

1024 94. Ripple WJ, Wolf C, Newsome TM, Hoffmann M, Wirsing AJ, McCauley DJ. Extinction risk is 1025 most acute for the world’s largest and smallest vertebrates. Proc Nat Acad Sci. 2017; 1–6. 1026 doi:10.1073/pnas.1702078114 1027 95. Hansen TF, Houle D. Evolvability, stabilizing selection, and the problem of stasis. In: In Pigliucci 1028 M, Preston K, editors. Phenotypic Integration: Studying the ecology and evolution fo complex 1029 phenotypes. Oxford: Oxford University Press; 2004. 1030 96. Pan H, Yu H, Ravi V, Li C, Lee AP, Lian MM, et al. The genome of the largest bony fish, ocean 1031 sunfish (Mola mola), provides insights into its fast growth rate. Gigascience. GigaScience; 2016;5: 1032 36. doi:10.1186/s13742-016-0144-3

1033 97. Sutter NB, Bustamante CD, Chase K, Gray MM, Zhao K, Zhu L, et al. A single IGF1 allele is a 1034 major determinant of small size in dogs. Science (80- ). 2007;316: 112–115. 1035 98. Tollis M, Schiffman JD, Boddy AM. Evolution of cancer suppression as revealed by mammalian 1036 comparative genomics. Curr Opin Genet Dev. Elsevier Ltd; 2017;42: 40–47. 1037 doi:10.1016/j.gde.2016.12.004

1038 99. Kershaw MH, Westwood JA, Darcy PK. Gene-engineered T cells for cancer therapy. Nat Rev 1039 Cancer. 2013;13: 525–541. doi:10.1038/nrc3565

1040 100. Upham NS, Patterson BD. Diversification and biogeography of the Neotropical caviomorph lineage 1041 Octodontoidea (Rodentia: ). Mol Phylogenet Evol. 2012;63: 417–429. 1042 doi:10.1016/j.ympev.2012.01.020 1043 101. Tacutu R, Craig T, Budovsky A, Wuttke D, Lehmann G, Taranukha D, et al. Human Ageing 1044 Genomic Resources: Integrated databases and tools for the biology and genetics of ageing. Nucleic 1045 Acids Res. 2013; doi:10.1093/nar/gks1155 1046 102. Freudenthal M, Martín-Suárez E. Estimating body mass of fossil rodents. Scr Geol. 2013;8: 1–130.

1047 103. Felsenstein J. Phylogenies and quantitative characters. Ann Rev Ecol Syst. 1988;19: 445–471.

32 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1048 104. Elliot M, Mooers A. Inferring ancestral states without assuming neutrality or gradualism using a 1049 stable model of continuous character evolution. BMC Evol Biol. 2014;14: 1–39. 1050 doi:10.1186/s12862-014-0226-8

1051 105. Weisenfeld NI, Yin S, Sharpe T, Lau B, Hegarty R, Holmes L, et al. Comprehensive variation 1052 discovery in single human genomes. Nat Genet. 2014; doi:10.1038/ng.3121 1053 106. Putnam NH, Connell BO, Stites JC, Rice BJ, Hartley PD, Sugnet CW, et al. Chromosome-scale 1054 shotgun assembly using an in vitro method for long-range linkage. Genome Res. 2015;26: 342– 1055 350. doi:10.1101/gr.193474.115.Freely

1056 107. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: Quality assessment tool for genome 1057 assemblies. Bioinformatics. 2013;29: 1072–1075. doi:10.1093/bioinformatics/btt086

1058 108. Li W, Jaroszewzki L, Godzik A. Clustering of highly homologous sequences to reduce the size of 1059 large protein databases. Bioinformatics. 2001;17: 282–283. 1060 109. Yandell M, Ence D. A beginner’s guide to eukaryotic genome annotation. Nat Rev Genet. 2012;13: 1061 329–342. doi:10.1038/nrg3174 1062 110. Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, Daverman R, et al. PANTHER: A library 1063 of protein families and subfamilies indexed by function. Genome Res. 2003;13: 2129–2141. 1064 doi:10.1101/gr.772403

1065 111. Mi H, Muruganujan A, Casagrande JT, Thomas PD. Large-scale gene function analysis with the 1066 PANTHER classification system. Nat Protoc. 2013;8: 1551–1566. doi:10.1038/nprot.2013.092 1067 112. Smit A, Hubley R, Green P. RepeatMasker Open-4.0. 2013-2015 . http://www.repeatmasker.org. 1068 2013. 1069 113. Evans T, Loose M. AlignWise: a tool for identifying protein-coding sequence and correcting 1070 frame-shifts. BMC Bioinformatics. BMC Bioinformatics; 2015;16: 376. doi:10.1186/s12859-015- 1071 0813-8

1072 114. Douzery EJP, Scornavacca C, Romiguier J, Belkhir K, Galtier N, Delsuc F, et al. OrthoMaM v8: A 1073 database of orthologous exons and coding sequences for comparative genomics in mammals. Mol 1074 Biol Evol. 2014;31: 1923–1928. doi:10.1093/molbev/msu132 1075 115. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: 1076 architecture and applications. BMC Bioinformatics. 2009;10: 421. doi:10.1186/1471-2105-10-421

1077 116. Webb AE, Walsh TA, O’Connell MJ. VESPA: Very large-scale Evolutionary and Selective 1078 Pressure Analyses. PeerJ Prepr. 2016;4: e1895v1.

1079 117. Edgar RC. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. 1080 Nucleic Acids Res. 2004; doi:10.1093/nar/gkh340 1081 118. Talavera G, Castresana J. Improvement of phylogenies after removing divergent and ambiguously 1082 aligned blocks from protein sequence alignments. Syst Biol. 2007; 1083 doi:10.1080/10635150701472164

1084 119. Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large 1085 phylogenies. Bioinformatics. 2014; doi:10.1093/bioinformatics/btu033

1086 120. Shaw TI, Ruan Z, Glenn TC, Liu L. STRAW: Species TRee Analysis Web server. Nucleic Acids 1087 Res. 2013; doi:10.1093/nar/gkt377 1088 121. Yang Z, Rannala B. Bayesian estimation of species divergence times under a using

33 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1089 multiple fossil calibrations with soft bounds. Mol Biol Evol. 2006; doi:10.1093/molbev/msj024

1090 122. Hedges SB, Dudley J, Kumar S. TimeTree: A public knowledge-base of divergence times among 1091 organisms. Bioinformatics. 2006; doi:10.1093/bioinformatics/btl505 1092 123. Lechner M, Findeiß S, Steiner L, Marz M, Stadler P, Prohaska S. Proteinortho: Detection of (Co- 1093 )orthologs in large-scale analysis. BMC Bioinformatics. 2011;12: 1–9. doi:10.2307/2412448 1094 124. Ranwez V, Harispe S, Delsuc F, Douzery EJP. MACSE: Multiple alignment of coding SEquences 1095 accounting for frameshifts and stop codons. PLoS One. 2011;6. doi:10.1371/journal.pone.0022594 1096 125. Felsenstein J. Phylogenies and the comparative method. Am Nat. 1985; doi:10.1086/284325 1097 126. R Development Core Team R. R: A Language and environment for statistical computing. R 1098 Foundation for Statistical Computing. 2011. doi:10.1007/978-3-540-74686-7 1099 127. Simakov O, Marletaz F, Cho S-J, Edsinger-Gonzales E, Havlak P, Hellsten U, et al. Insights into 1100 bilaterian evolution from three spiralian genomes. Nature. Nature Publishing Group; 2013;493: 1101 526–531. doi:10.1038/nature11696

1102 128. Albertin CB, Simakov O, Mitros T, Wang ZY, Pungor JR, Edsinger-Gonzales E, et al. The octopus 1103 genome and the evolution of cephalopod neural and morphological novelties. Nature. 2015;524: 1104 220–224. doi:10.1038/nature14668 1105 129. Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons 1106 dramatically improves orthogroup inference accuracy. Genome Biol. Genome Biology; 2015;16: 1107 157. doi:10.1186/s13059-015-0721-2 1108 130. Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: Freely available 1109 Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25: 1110 1422–1423. doi:10.1093/bioinformatics/btp163

1111 131. Knight R, Maxwell P, Birmingham A, Carnes J, Caporaso G, Easton BC, et al. PyCogent: a toolkit 1112 for making sense from sequence. Genome Biol. 2007;8. doi:10.1186/gb-2007-8-8-r171 1113 132. Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol Evol. 1114 2000;15: 496–503. 1115 133. Wang Y, Coleman-Derr D, Chen G, Gu YQ. OrthoVenn: A web server for genome wide 1116 comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Res. 1117 2015;43: W78–W84. doi:10.1093/nar/gkv487

1118 134. Zhang J, Nielsen R, Yang Z. Evaluation of an improved branch-site likelihood method for detecting 1119 positive selection at the molecular level. Mol Biol Evol. 2005;22: 2472–9. 1120 doi:10.1093/molbev/msi237 1121 135. Anisimova M, Bielawski JP, Yang Z. Accuracy and power of bayes prediction of amino acid sites 1122 under positive selection. Mol Biol Evol. 2002;19: 950–958. 1123 doi:10.1093/oxfordjournals.molbev.a004152 1124 136. Zhang J. Frequent False Detection of Positive Selection by the likelihood method with branch-site 1125 models. Mol Biol Evol. 2004;21: 1332–1339. doi:10.1093/molbev/msh117 1126 137. Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic 1127 datasets with the R/Bioconductor package biomaRt. Nat Protoc. 2009;4: 1184–1191. 1128 doi:10.1038/nprot.2009.97 1129 138. Löytynoja A, Goldman N. An algorithm for progressive multiple alignment of sequences with

34 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1130 insertions. Proc Natl Acad Sci U S A. 2005;102: 10557–62. doi:10.1073/pnas.0409137102

1131 139. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: A software 1132 environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 1133 doi:10.1101/gr.1239303 1134 140. A high-resolution map of humanevolutionary constraint using 29 mammals. Nature. 2011;478: 1135 476–482.

1136 141. Kim EB, Fang X, Fushan AA, Huang Z, Lobanov A V, Han L, et al. Genome sequencing reveals 1137 insights into physiology and longevity of the naked mole rat. Nature. Nature Publishing Group; 1138 2011;479: 223–227. doi:10.1038/nature10533 1139 142. Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, et al. Genome 1140 sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004; 1141 doi:10.1038/nature02426 1142 143. Fang X, Nevo E, Han L, Levanon EY, Zhao J, Avivi A, et al. Genome-wide adaptive complexes to 1143 underground stresses in blind mole rats Spalax. Nat Commun. 2014;5: 1–9. 1144 doi:10.1038/ncomms4966

1145 144. Lok S, Paton TA, Wang Z, Kaur G, Walker S, Yuen RKC, et al. De novo genome and 1146 transcriptome assembly of the Canadian beaver (Castor canadensis). G3 (Bethesda). 2017; 1147 doi:10.1534/g3.116.038208 1148

1149

1150

1151

35 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1152 Figures and tables 1153 1154 Table 1. Assembly statistics of the capybara genome. 1155 Starting Assembly Final Assembly (Shotgun + DISCOVAR de novo) (Chicago + HiRise) Total length (Mb) 2,735 2,738 Contig N50 (Kb) 148 161 Number of scaffolds 656,018 73,920 Scaffold N50 (Mb) 0.202 12.2 1156 1157

36 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A. B. C. Genome Repeat Number of size (Mb) content (%) genes

Naked?(0),?(0), mole rat 2631.08 28.63 20,063?(0),?(0),?(0),?(0),?(0),

Chinchilla-=$9>=$'')-=$9>=$'') 2390.87 29.57 20,685-=$9>=$'')-=$9>=$'')-=$9>=$'')-=$9>=$'')-=$9>=$'')

Degu4&;#4&;# 2995.89 23.33 20,6104&;#4&;#4&;#4&;#4&;#

Capybara-) : <2)-) : <2)%)%) 2585.13 37.45 24,941-)-)-)-)-)::<2)<2):::<2)<2)<2)%))% )) %)

Guinea8#$9&)5:$;8#$9&)5:$; pig 2723.22 37.89 19,9768#$9&)5:$;8#$9&)5:$;8#$9&)5:$;8#$9&)5:$;8#$9&)5:$;8#$9&)5:$;

Mouse(+#.&(+#.& 2671.82 44.16 22,493(+#.&(+#.&(+#.&(+#.&(+#.&

Rat0), 0), 2743.3 41.98 23,3470),0),0),0),0),

Vole7+'&7+'& 2287.34 23 20,36077+'&+'&777+'&+'&+'&

Hamster6)*.,&%6)*.,&% 2504.93 22.15 20,4956)*.,&%6)*.,&%6)*.,&%6)*.,&%6)*.,&%

Deer4&&%5*+#.&4&&%5*+#.& mouse 2630.54 - 4&&%5*+#.&4&&%5*+#.&-4&&%5*+#.& 4&&%5*+#.&4&&%5*+#.&

Blind3(0),3(0), mole rat 3061.42 32.89 20,9063(0),3(0),3(0),3(0),3(0),

Jerboa1&%2+)1&%2+) 2835.25 33.4 18,8821&%2+)1&%2+)1&%2+)1&%2+)1&%2+)

Kangaroo/0), /0), rat 2236.37 31.19 19,987/0),/0),/0),/0),/0),

Beaver-).,+%-).,+% 2518.31 33.74 21,914-).,+%-).,+%-).,+%-).,+%-).,+%

Marmot() %*+,)() %*+,) 2510.59 - ()()()()()%*+,)-*+,)% %*+,)*+,)*+,)

Squirrel!"#$%%&'!"#$%%&' 2195.88 33.57 20,022!"#$%%&'!"#$%%&'!"#$%%&'!"#$%%&'!"#$%%&'

'!80 '! %!40 %!!0 ! !! "!!!!"!!!! '!'!#!!!!'!'!#!!!!%!%! !! !! $!!!!$!!!!!! !0 %!!!!%!!!!"!!!!"!!!!"!!!!"!!!!"!!!!10,000&!!!!&!!!!#!!!!#!!!!#!!!!20,000 $!!!!$!!!!$!!!!30,000 %!!!!%!!!!%!!!!40,000 &!!!!&!!!!&!!!!50,000 -./MYA-./ !"#$%&'((%)*+,!"#$%&'((%)*+,-./-./-./-./ !"#$%&'((%)*+,!"#$%&'((%)*+,Body mass (g) 1158 !"#$%&'((%)*+,!"#$%&'((%)*+,!"#$%&'((%)*+, 1159 1160 Fig 1. Divergence times, genome statistics and body mass of representative rodents 1161 used for comparative genomics analyses. (A) Divergence times of rodent species using 1162 the topology obtained from the phylogenomic analysis (see Materials and methods). 1163 MYA: Million years ago (B) Basic genome statistics of rodents used for comparative 1164 analyses. Neither Repeat Content nor Gene Content was available for Deer mouse and 1165 Marmot (C) Body mass ranges from 20.5 g, being the mouse the smallest surveyed

1166 rodent, to 55,000 g in the capybara, the largest living rodent. 1167

37 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

● ●

● ●

● ● 0.4 ● ● ●

● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

x (PDI) ● ●● ● ● ● ● e ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● PLAGL2 ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ● ●● ●● ●●● ● ●● PARP2 ● ●● ●● ●● ●●●●●● ● ●● ● ●● ●● ● ● ● ● ● ●● ●●●● ●● ●●● ●● ● ●● ● ● ● ● ● ●●●●●● ●● ● ● ● ●● ● ● ● ●● ● ●●● ● ●●● ● ●●● ●●●●● ●● ● ● ●● ● ● ●●●●●●● ●● ● ● ●●●●●●●●● ●● ●● ●● ● ●● ● ● ● ● ●● ●●●●●●●● ● ●●●● ●●●●● ● ● ●●● ● ● ● ● ● ●● ●●●●●● ●●●●●●●●●●●●●●●●●●● ● ●●● ●●● ●●● ● RFX6 ●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●● ● ●● ● ● ● ●●●●●●●●●●● ●●●●● ●●●●●●●● ●●●●●●● ●●●●●●●●●●●● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●● ●●●● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●● ●● ● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●● ● ● ●● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ● ● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●● ●● ●●●● ● ● ● 0.0 ●●●●●●●●●●●●●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●●●●●●● ● ●● ●● ● ●●●●●●●●●● ●● ●● ● ●●●● ● ● ●● ● ●●●●● ●●● ● ● ●●●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● 51 ●● ● ● ● ● ● otein distance ind ● 37

r 221 ● ●

● P ● ● 384 51 33 ● ● . ●● 88 46 96 133 32 ● ● 229 162 ● 70 94 12,878 174 82 58 733 49 −0.4 1,855 116 298 302 820 116 60 518 566

● 633

0 100 200 300 400 Likelihood−ratio branch−site test

1168 1169 Fig 2. Positively selected genes in the capybara. Red dots are genes that lie in the top 1170 5% of the distribution, above and to the right of both axes and, thus, considered to be 1171 under positive selection (see Materials and methods). Among fifty-six genes under 1172 positive selection, three were associated with growth regulation and tumor suppression 1173 (orange dots and arrows). PLAGL2, Pleiomorphic adenoma-like Protein 2; PARP2, 1174 Poly(ADP-Ribose) Polymerase 2; RFX6, Regulatory Factor X6. Inset: Venn diagram of 1175 the protein orthologous clusters among the 5 hystricognath species: capybara 1176 (Hydrochoerus hydrochaeris), guinea pig (Cavia porcellus), degu (Octodon degus), 1177 chinchilla (Chinchilla lanigera) and naked mole rat (Heterocephalus glaber). Of the 1178 12,878 orthologous groups shared by all 5 species, 4,452 were SCOs. 1179

38 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A. C.

Itri ● 0.0025 Mmar

Ccan ● Dord ●

● ● Jjac Odeg

GuineaPig|XP_005002258 Hgla ● ●

Ngal CAPYBARA_s1_002394-RA ● Cpor Group CAPYBARA_s8_002846-RA Clan ●a HYST 0.0000 ● pathway Chinchilla|XP_013361744 ● ●a MRC Mocr ● ●a SRC Maur

Hamster|XP_005080125

NMRat|XP_012923762 signaling Mouse|XP_006527991 Mouse|NP_001094920

PC2 (11.34%) Mouse|MGI_2684890 Pman CAPYBARA_s8_002845-RA

● CAPYBARA_s2_003063-RA CAPYBARA_s8_002844-RA DeerMouse|XP_006990193 CAPYBARA_s9_001861-RA GPCR CAPYBARA_s8_003045-RA CAPYBARA_s5_001718-RA Rat|XP_003752106 CAPYBARA_s2_003025-RA Vole|XP_005368674 CAPYBARA_s3_002782-RA −0.0025 CAPYBARA_s8_003044-RA Castor|XP_020007989 Jerboa|XP_004672959 CAPYBARA_s2_003058-RA DeerMouse|XP_006998169 CAPYBARA_s9_001846-RA DeerMouse|XP_006998166 CAPYBARA_s1_002393-RA CAPYBARA_s4_002332-RA CAPYBARA_s8_002842-RA ● Vole|XP_005372092DeerMouse|XP_006990184 GuineaPig|XP_003466482 Rnor ● Hhyd GuineaPig|XP_003466479 BMRat|XP_008824753 ● CAPYBARA_s8_002847-RA Mmus |XP_005080128 −0.005 0.000 0.005 0.010 0.015 CAPYBARA_s3_002709-RA PC1 (20.14%) GuineaPig|XP_005002260 Rat|XP_006257076 GuineaPig|XP_005002257 Rat|XP_008771536 GuineaPig|XP_003466480 KangRat|XP_012881930 Sensory perception of smell GuineaPig|XP_003466486 GPCR signaling pathway KangRat|XP_012881929 GuineaPig|XP_003466487 T/B cell receptor signaling pathway KangRat|XP_012881925 GuineaPig|XP_003466485 KangRat|XP_012881928 CAPYBARA_s9_001385-RA KangRat|XP_012881923Castor|XP_020007987 CAPYBARA_s4_002285-RA Raw Z−score B. 0 1 2 3 Rabbit|XP_017205423 GuineaPig|XP_003466484 Rabbit|XP_008252038 CAPYBARA_s8_002850-RA CAPYBARA_s8_002849-RA CAPYBARA_s9_001724-RA CAPYBARA_s8_002843-RA GrSquirrel|XP_005329234Marmota|XP_015343471 PTHR24271:SF38 GZMB Degu|XP_004631477Degu|XP_004631482 T cell-mediated Degu|XP_004631483 Jerboa|XP_012807880 Human|HGNC_23795 tumor Rat|XP_006257074 Hamster|XP_012975977 Degu|XP_004631479 suppression DeerMouse|XP_015863877 NMRat|XP_004872297 Gorilla|ENSGGOG00000009421 Chinchilla|XP_005414797Degu|XP_004631480 PTHR11736:SF103 Chinchilla|XP_005414799 MAGEB5 Chinchilla|XP_005414798 Saimiri|XP_010333224.1 NMRat|XP_012923763

Cell Rat|RGD_1563078

Rat|XP_006257075 PTHR11991:SF6 Rat|NP_001100416 TPT1 RhesusMacaque|ENSMMUG00000044123Mouse|NP_001192197 Mouse|NP_083123 reprogramming Mouse|XP_006528294Mouse|MGI_3712084 Mouse|MGI_2148169

PANTHER gene familes PANTHER PTHR10155:SF7 PI3KR2 PI3K-Akt signaling (Growth) Bootstrap support PTHR24356:SF266 PRKCI ≥95% 0.3 Itri Jjac Clan Hgla Ngal Cpor Rnor Dord Mocr Maur Hhyd Ccan Odeg Mmar Pman Mmus Species

1180 1181 1182 Fig 3. Gene family composition analysis. (A) Clustering of rodent genomes in a 1183 multidimensional space of molecular functions. The first two principal components are 1184 displayed, accounting for 20.14% and 11.34% of the variation, respectively. PC1 1185 separates the capybara from the rest of the rodents based on the enrichment of gene 1186 families related to sensory perception and immune response. HYST, Hystricognathi 1187 clade; MRC, Mouse-related clade; SRC, Squirrel-related clade. Black portion of arrows 1188 indicates diversification of gene families within each PC. (B) Gene families enriched in 1189 the capybara. Displayed, are gene families related to growth control and tumor 1190 suppression that showed significant enrichment in the capybara (p < 0.05). The heatmap 1191 shows normalized gene counts of PANTHER molecular function categories for the 16 1192 rodents. (C) Phylogenetic tree of MAGEB5 gene family of capybara (24 MAGEB5 1193 members) compared with rodents and rabbit; and outgroup primates (Human, gorilla, 1194 Rhesus macaque and squirrel monkey). Yellow clade contains capybara and guinea pig

39 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1195 members; Orange clade shows a monophyletic expansion of MAGEB5 genes in capybara. 1196 Hhyd, Hydrochoerus hydrochaeris (capybara); Cpor, Cavia porcellus (guinea pig); Clan, 1197 Chinchilla lanigera (chinchilla); Odeg, Octodon degus (degu); Hgla, Heterocephalus 1198 glaber (naked mole rat); Mmus, Mus musculus (mouse); Rnor, Rattus norvegicus (rat); 1199 Mocr, Microtus ochrogaster (vole); Maur, Mesocricetus auratus (hamster); Pman, 1200 Peromyscus maniculatus (deer mouse); Ngal, Nannospalax galili (blind mole rat); Jjac, 1201 Jaculus jaculus (jerboa); Dord, Dipodomys ordii (kangaroo rat); Ccan, Castor canadensis 1202 (beaver); Itri, Ictidomys tridecemlineatus (squirrel); Mmar, Marmota marmota (marmot). 1203

40 bioRxiv preprint doi: https://doi.org/10.1101/424606; this version posted September 23, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A. Growth: IIS Pathway

Pancreas

β-cell INS IGFII LEP

RFX6 PDX1 Islet Insulin INSR LEPR

Positive selection PI3K PLAGL2 Positive selection in Caviomorpha PRKCI Accelerated evolution Igf-II Gene family expansion Growth, Proliferation & S urvival

Body cell

B. Cancer resistance: T cell-mediated tumor suppression

Cytotoxic T cell Perforin/granzyme-mediated cell-killing

TCR MAGEB5 GZMB

Perforin

Pore

Cancer cell 1204 1205 Fig 4. Proposed signaling pathways and mechanisms involved in (A) growth and (B) 1206 cancer resistance in the capybara. INSR: Insulin receptor; LEPR: Leptin receptor. 1207

41 A. B.

Immune Cell proliferation and growth Cell communication and adhesion Growth (IIS pathway) Metabolism and homeostasis Growth and cancer Mitochondrion organization and metabolism Immune response and cancer Ribosome maturation T cell mediated immunity Other 1208 1209 Fig 5. (A) Gene interaction network of capybara genes. Molecular interactions among 1210 82 genes based on lineage-specific family expansions, positive selection, rapid protein 1211 evolution and unique amino acid substitutions at sites otherwise fixed in rodents. Cluster 1212 analysis was based on GO Biological Process (B) Intragenomic conflict during 1213 evolution of gigantism. Molecular interactions among 23 major genes involved in 1214 growth regulation by the IIS pathway, and T cell mediated immune response. 4 of the 8 1215 genes within the IIS pathway act also as oncogenes and are involved in known cancer 1216 pathways. 1217

42 1218 1219 1220 Fig 6. Mutation load analysis across 16 rodent genomes. The capybara shows 1221 signatures of mutational load, harboring a higher rate of accumulation of potentially 1222 deleterious mutations across 229 SCOs. (A) Density plot showing the distribution of

1223 dN/dS values for the 229 SCOs for capybara and the rest of rodents. Capybaras have a

1224 slightly higher mean value of dN/dS relative to the rest of rodents (dashed lines): 0.15 vs.

1225 0.11, respectively. (B) Mean dN/dS was significantly higher (T-test, t = -103.25, df = 1226 1075, 95%CI = -0.042 to -0.041, P < 0.00001; permutation test, P < 0.001) for the 1227 capybara relative to the other rodents, based on 1,000 bootstrap replicates from the 1228 distributions in (A). 1229

43 1230 Table S1. Reconstruction of ancestral body mass for the caviomorph clade showed that 1231 capybara likely evolved from a relatively small ancestor. Body mass reconstruction was

1232 done with STABLETRAITS [104], which accounts for rate variation in trait evolution along 1233 a phylogeny. Results from a multi-rate model (α<2) were compared with a BM model 1234 (α=2) to select the best fitting model based on the Deviance Information Criterion (DIC). 1235 Ancestral state DIC multi-rate DIC single-rate Node ΔDIC Best fitting model (gr - 95% CI) model (α<2) model (α=2) Hydrochoerus - 1132 (437, 23225) Multi-rate model Kerodon 224.59 303.2 78.61 (α<2) Caviomorpha 971 (221, 5135) 1236

44 1237 Table S2. Field and collection information of the two specimens analyzed for genome 1238 size estimation. 1239 Museum voucher Collection Locality Country Datum Latitude Longitude number Number ANDES-M 2304 AJC 5199 Cabuyaro, Meta Colombia WGS84 4.284025 -72.725647 ANDES-M 2303 AJC 5198 Cabuyaro, Meta Colombia WGS84 4.284025 -72.725647 1240

45 1241 Table S3. Published genomes used for comparative analyses. a Species used for 1242 mitochondrial genome analyses. Mitochondrial genomes were downloaded from the 1243 Organelle Genome Resources database of NCBI. b Direct submission to NCBI 1244 (Unpublished data). 1245

Clade Species Genome version Database Reference a Hystricognathi Guinea pig (Cavia porcellus) cavPor3 ENSEMBL [140] Long-tailed chinchilla (Chinchilla Hystricognathi ChiLan1.0 NCBI lanigera)a NCBIb a Hystricognathi Degu (Octodon degus) OctDeg1.0 NCBI NCBIb a Hystricognathi Naked mole rat (Heterocephalus glaber) HetGla_female_1.0 NCBI [141] a Mouse-related clade Mouse (Mus musculus) GRCm38.p5 ENSEMBL [33] a Mouse-related clade Rat (Rattus norvegicus) Rnor_6.0 ENSEMBL [142] Mouse-related clade Prairie vole (Microtus ochrogaster) MicOch1.0 NCBI NCBIb a Mouse-related clade Golden hamster (Mesocricetus auratus) MesAur1.0 NCBI NCBIb North American deer mouse (Peromyscus Mouse-related clade Pman1.0 NCBI maniculatus) NCBIb a Mouse-related clade Blind mole rat (Spalax galili) S.galili_v1.0 NCBI [143] a Mouse-related clade Lesser Egyptian jerboa (Jaculus jaculus) JacJac1.0 NCBI NCBIb Mouse-related clade Ord's kangaroo rat (Dipodomys ordii) Dord_2.0 NCBI [140] (Castor Mouse-related clade C.can genome v1.0 NCBI canadensis) [144] Squirrel-related- Thirteen-lined ground squirrel (Ictidomys SpeTri2.0 NCBI clade tridecemlineatus) [140] Squirrel-related- Alpine marmot (Marmota marmota) marMar2.1 NCBI clade NCBIb Outgroup Rabbit (Oryctolagus cuniculus) OryCun2.0 NCBI [140] 1246

46 100 97.58 94.35

88.98 87.72

75

64.92 62.5 A. B.

sp 50 Capybara (Scaffold N50: 12.2Mb) Guinea pig (Scaffold N50: 27.9Mb) 100 97.58 94.35

% Recovered genes % Recovered 88.98 87.72

25

75

64.92 6.25 7.04 62.5

0.86 0.89 0 sp 50 BUSCOs(C) BUSCOs(D) BUSCOs(F) CEGs(C) CEGs(P) Capybara (Scaffold N50: 12.2Mb) Gene Category Guinea pig (Scaffold N50: 27.9Mb) % Recovered genes % Recovered

25

6.25 7.04

0.86 0.89 0

BUSCOs(C) BUSCOs(D) BUSCOs(F) CEGs(C) CEGs(P) 1247 Gene Category 1248 1249 Fig S1. Capybara genome annotation statistics (A) Quality assessment of the capybara 1250 genome compared to the guinea pig genome. CEGMA analysis was conducted for 248 1251 CEGs. BUSCO analysis was conducted with the vertebrate dataset that included 3023 1252 BUSCOs. Note that guinea pig scaffold N50 is twice as high as the capybara scaffold 1253 N50, but both genomes are of similar quality. (C): Complete, (D): Duplicated, (F): 1254 Fragmented, (P): Partial. (B) Estimated proportion of the capybara genome sequence 1255 occupied by repetitive elements. Results are based on identification of known and de 1256 novo repeat elements aligning the capybara genome sequences against a reference library 1257 of known repeats for rodents using RepeatMasker. LINE, long interspersed element; 1258 LRT, long terminal repeat; SINE, short interspersed element.

47 1259 1260 1261 Fig S2. Functional annotation of the capybara genome with PANTHER. 1262

48 Capybara Guinea pig

Chinchilla Degu

Naked mole rat

Kangaroo rat

Beaver Jerboa

Blind mole rat

Hamster Vole Deer mouse

Rat

Mouse Squirrel Marmota Rabbit

80 70 60 50 40 30 20 10 0 1263 Million years ago 1264 1265 Fig S3. Divergence times of representative rodent species using the topology 1266 obtained from phylogenomic analyses. The blue bars on nodes represent the 95% 1267 credibility intervals of divergence time estimates. Two nodes were calibrated using fossil 1268 data as follows: Most recent common ancestor (MRCA) of Rodentia: 76 to 88 million 1269 years ago (Ma); MRCA of Mus and Rattus: 18 to 23 Ma. Black dots on nodes represent 1270 bootstrap values over 98% in both analyses: concatenated and species-tree. Clades from 1271 top to bottom: Hystricognathi, Mouse-related clade and Squirrel-related clade. 1272

49 1273

0.175 A. ● B.

0.15

● 0.150 ● ●

0.10 ● 0.125 ●

● Mean nuclear dN/dS Mean nuclear

● ● ● 2 ● ● 0.100 ● R = 0.294 dN/dS Mean mitochondrial ● ● ● ● ● 0.05 ● ● P = 0.01748 ● ● Spearman's ρ = 0.67 ● P = 0.03938 0.075 3 5 7 9 11 0.10 0.11 0.12 0.13 0.14 0.15 Ln(body mass) Mean nuclear dN/dS 1274 1275 1276 Fig S4. Demographic analyses of mutation load (A) Correlation of log-transformed

1277 body mass values with the mean dN/dS of 229 SCOs for each rodent species. Within

1278 rodents, mean dN/dS is positively correlated with body mass indicating a lower efficiency 1279 of purifying selection relative to genetic drift in bigger rodents (R2 = 0.294, df = 14, P < 2 1280 0.05; PIC: R = 0.332, df = 13, P < 0.05). (B) Correlation between the mean dN/dS values 1281 of 229 SCOs and the mean dN/dS values of the 13-protein-coding mitochondrial genes 1282 for ten rodent species for which both nuclear and mitochondrial genomes were available.

1283 Nuclear and mitochondrial dN/dS values show a significant positive correlation 1284 (Spearman’s ρ = 0.67, P < 0.05; PIC: Spearman’s ρ = 0.7, P < 0.05), which indicates a 1285 reduction of the effective population size caused by the increment in body mass. 1286

50 shift direction 1.00 0.80 0.50 0.20 0.00 -0.20 point.seq -0.50 -0.80 -1.00

Oc Ch Er Ca

Ctenomys steinbachi Ctenomys boliviensis Ctenomys haigi Ctenomys bridgesi Octodon lunatus Octodon degus Octodon fuscus Aconaemys sagei Aconaemys porteri Aconaemys cyanus Spalacopus barrerae Tympanoctomys Octomys mimax gliroides Octodontomys spinosus Euryzygomatomys laticeps Clyomys shift setosus Trinomys eliasi Trinomys dimidiatus Trinomys direction iheringi Trinomys dactylinus Dactylomys boliviensis Dactylomys amblyonyx Kannabateomys Toromys grandis blainvilii Phyllomys chrysurus Echimys macrura Makalata brasiliensis Phyllomys didelphoides Makalata Isothrix sinnamariensis Isothrix bistriata emiliae Lonchothrix hispidus Mesomys apereoides Thrichomys longicaudatus Proechimys roberti Proechimys simonsi Proechimys quadruplicatus Proechimys gymnurus Hoplomys coypus Myocastor pilorides Capromys cinerea Abrocoma bennettii Abrocoma maximus Lagostomus lanigera Chinchilla viscacia branickii Dinomys bicolor Coendou melanura Sphiggurus dorsatum Erethizon taczanowskii Cuniculus paca Cuniculus leporina Dasyprocta acouchy Myoprocta musteloides tschudii Cavia porcellus Cavia aperea Cavia australis patagonum Dolichotis rupestris Kerodon hydrochaeris Hydrochoerus typicus Petromus swinderianus Thryonomys glaber Heterocephalus argenteocinereus Heliophobius hottentotus Cryptomys suillus Bathyergus capensis Georychus Hystrix brachyurus 1.00 0.80 posterior rates 0.50 0.20 rep(-0.5, length(point.seq)) 1.563 0.00 0.738 -0.20 point.seq 0.348 -0.50 -0.80 0.165 -1.00 0.078 0.037 point.seq 0.017 Ctenomys steinbachi Ctenomys boliviensis Ctenomys haigi Ctenomys bridgesi Octodon lunatus Octodon degus Octodon fuscus Aconaemys sagei Aconaemys porteri Aconaemys cyanus Spalacopus barrerae Tympanoctomys Octomys mimax gliroides Octodontomys spinosus Euryzygomatomys laticeps Clyomys setosus Trinomys eliasi Trinomys dimidiatus Trinomys iheringi Trinomys dactylinus Dactylomys boliviensis Dactylomys amblyonyx Kannabateomys Toromys grandis blainvilii Phyllomys chrysurus Echimys macrura Makalata brasiliensis Phyllomys didelphoides Makalata Isothrix sinnamariensis Isothrix bistriata emiliae Lonchothrix hispidus Mesomys apereoides Thrichomys longicaudatus Proechimys roberti Proechimys simonsi Proechimys quadruplicatus Proechimys gymnurus Hoplomys coypus Myocastor pilorides Capromys cinerea Abrocoma bennettii Abrocoma maximus Lagostomus lanigera Chinchilla viscacia Lagidium branickii Dinomys bicolor Coendou melanura Sphiggurus dorsatum Erethizon taczanowskii Cuniculus paca Cuniculus leporina Dasyprocta acouchy Myoprocta musteloides Galea tschudii Cavia porcellus Cavia aperea Cavia australis Microcavia patagonum Dolichotis rupestris Kerodon hydrochaeris Hydrochoerus typicus Petromus swinderianus Thryonomys glaber Heterocephalus argenteocinereus Heliophobius hottentotus Cryptomys suillus Bathyergus capensis Georychus Hystrix brachyurus 0.008 posterior rates shift direction rep(-0.5, length(point.seq)) 1.563 1.00 0.004 0.738 0.80 0.348 0.50 0.165 0.20 0.078 0.00 0.037 -0.20 point.seq point.seq 0.017 -0.50 0.008 -0.80 0.004 -1.00 shift probability rep(-0.5, length(point.seq)) 1.000 0.875

Ctenomys steinbachi Ctenomys boliviensis Ctenomys haigi Ctenomys bridgesi Octodon lunatus Octodon degus Octodon fuscus Aconaemys sagei Aconaemys porteri Aconaemys cyanus Spalacopus barrerae Tympanoctomys Octomys mimax gliroides Octodontomys spinosus Euryzygomatomys laticeps Clyomys setosus Trinomys eliasi Trinomys dimidiatus Trinomys iheringi Trinomys dactylinus Dactylomys boliviensis Dactylomys amblyonyx Kannabateomys Toromys grandis blainvilii Phyllomys chrysurus Echimys macrura Makalata brasiliensis Phyllomys didelphoides Makalata Isothrix sinnamariensis Isothrix bistriata emiliae Lonchothrix hispidus Mesomys apereoides Thrichomys longicaudatus Proechimys roberti Proechimys simonsi Proechimys quadruplicatus Proechimys gymnurus Hoplomys coypus Myocastor pilorides Capromys cinerea Abrocoma bennettii Abrocoma maximus Lagostomus lanigera Chinchilla viscacia Lagidium branickii Dinomys bicolor Coendou melanura Sphiggurus dorsatum Erethizon taczanowskii Cuniculus paca Cuniculus leporina Dasyprocta acouchy Myoprocta musteloides Galea tschudii Cavia porcellus Cavia aperea Cavia australis Microcavia patagonum Dolichotis rupestris Kerodon hydrochaeris Hydrochoerus typicus Petromus swinderianus Thryonomys glaber Heterocephalus argenteocinereus Heliophobius hottentotus Cryptomys suillus Bathyergus capensis Georychus Hystrix brachyurus CV 0.750 shift direction shift probability posterior rates 0.625 1.00 rep(-0.5, length(point.seq)) 1.000 rep(-0.5, length(point.seq)) 1.563 0.80 0.875 0.738 0.500 0.50 0.750 0.348 0.375 point.seq 0.20 0.625 0.165 0.250 0.00 0.500 0.078 0.375 0.037

-0.20point.seq point.seq 0.125 point.seq -0.50 0.250 0.017 0.000 -0.80 0.125 0.008 -1.00 0.000 0.004 1287 Ctenomys steinbachi Ctenomys boliviensis Ctenomys haigi Ctenomys bridgesi Octodon lunatus Octodon degus Octodon fuscus Aconaemys sagei Aconaemys porteri Aconaemys cyanus Spalacopus barrerae Tympanoctomys Octomys mimax gliroides Octodontomys spinosus Euryzygomatomys laticeps Clyomys setosus Trinomys eliasi Trinomys dimidiatus Trinomys iheringi Trinomys dactylinus Dactylomys boliviensis Dactylomys amblyonyx Kannabateomys Toromys grandis blainvilii Phyllomys chrysurus Echimys macrura Makalata brasiliensis Phyllomys didelphoides Makalata Isothrix sinnamariensis Isothrix bistriata emiliae Lonchothrix hispidus Mesomys apereoides Thrichomys longicaudatus Proechimys roberti Proechimys simonsi Proechimys quadruplicatus Proechimys gymnurus Hoplomys coypus Myocastor pilorides Capromys cinerea Abrocoma bennettii Abrocoma maximus Lagostomus lanigera Chinchilla viscacia Lagidium branickii Dinomys bicolor Coendou melanura Sphiggurus dorsatum Erethizon taczanowskii Cuniculus paca Cuniculus leporina Dasyprocta acouchy Myoprocta musteloides Galea tschudii Cavia porcellus Cavia aperea Cavia australis Microcavia patagonum Dolichotis rupestris Kerodon hydrochaeris Hydrochoerus typicus Petromus swinderianus Thryonomys glaber Heterocephalus argenteocinereus Heliophobius hottentotus Cryptomys suillus Bathyergus capensis Georychus Hystrix brachyurus 1288 posterior rates shift probability rep(-0.5, length(point.seq)) 1.563 rep(-0.5, length(point.seq)) 1.000 0.738 0.875 0.348 0.750 1289 Fig S5. Bayesian 0.165 sampling of evolutionary 0.625 rates of body mass evolution. Shown is a 0.078 0.500 0.037 0.375 point.seq point.seq 0.017 0.250 1290 significant upturn 0.008 in the evolutionary 0.125rate of this trait in the most recent common ancestor 0.004 0.000 1291 of the clade containing capybara-rock cavy-mara (arrow). Hue and size of circles denote

shift probability 1292 posteriorrep(-0.5, support length(point.seq)) 1.000 of a rate shift in the branch, with larger and redder circles suggesting 0.875 0.750 0.625 1293 higher posterior 0.500 probability supporting an upturn in the evolutionary rate. Branch color 0.375 point.seq 0.250 1294 denotes posterior 0.125 estimated rates of body mass evolution. CV: Caviomorpha, Oc: 0.000 1295 Octodontoidea (degus and spiny rats), Ch: Chinchilloidea ( and ), Er: 1296 Erethizontoidea (), Ca: Cavioidea (capybaras, maras and guinea pigs). 1297

51 1298 Fig S6.

Hystrix brachyurus Georychus capensis Bathyergus suillus Cryptomys hottentotus Heliophobius argenteocinereus Heterocephalus glaber Thryonomys swinderianus Petromus typicus Hydrochoerus hydrochaeris Kerodon rupestris Dolichotis patagonum Microcavia australis Cavia aperea Cavia porcellus Cavia tschudii Galea musteloides Myoprocta acouchy Dasyprocta leporina Cuniculus paca Cuniculus taczanowskii Erethizon dorsatum Sphiggurus melanura Coendou bicolor Dinomys branickii Lagidium viscacia Chinchilla lanigera Lagostomus maximus Abrocoma bennettii Abrocoma cinerea Capromys pilorides Myocastor coypus Hoplomys gymnurus Proechimys quadruplicatus Proechimys simonsi Proechimys roberti Proechimys longicaudatus Thrichomys apereoides Mesomys hispidus Lonchothrix emiliae Isothrix bistriata Isothrix sinnamariensis Makalata didelphoides Phyllomys brasiliensis Makalata macrura Echimys chrysurus Phyllomys blainvilii Toromys grandis Kannabateomys amblyonyx Dactylomys boliviensis Dactylomys dactylinus Trinomys iheringi Trinomys dimidiatus Trinomys eliasi Trinomys setosus Clyomys laticeps Euryzygomatomys spinosus Octodontomys gliroides Octomys mimax Tympanoctomys barrerae Spalacopus cyanus Aconaemys porteri Aconaemys sagei Aconaemys fuscus Octodon degus Octodon lunatus Octodon bridgesi Ctenomys haigi Ctenomys boliviensis Ctenomys steinbachi

120 100 80 60 40 20 0 0 10000 20000 30000 40000 50000 Million years ago Body mass (g) 1299 1300 1301 Fig S6. Time-calibrated phylogram of caviomorph rodents (from [100]) with values 1302 of body mass indicated to the right of each species. The broad range in body mass 1303 within caviomorph rodents reaches three orders of magnitude, with the smallest species 1304 weighting under 100g (green bars) and the largest species, the capybara, weighting 1305 55,000g (red bar). The red node corresponds to the most recent common ancestor of 1306 caviomorph rodents. 1307

52