1 How to make a giant: Genetic basis and genomic tradeoffs of 2 gigantism in the capybara, the world’s largest rodent 3 4 5 6 Santiago Herrera-Álvarez* 7 Laboratorio Biom|ics 8 Department of Biological Sciences 9 Universidad de los Andes, A.A. 4976, Bogotá, Colombia 10 11 Advisor: 12 Andrew J. Crawford, Ph.D 13 Associate Professor 14 Department of Biological Sciences 15 Universidad de los Andes, A.A. 4976, Bogotá, Colombia 16 17 18 *Communicating author: 19 Address: Carrera 1a No 18A-10, Bogotá, Colombia 20 Email: [email protected] 21

1 22 ABSTRACT 23 Differences in body size are arguably the most noticeable differences between . 24 However, how size is developmentally controlled to generate species-specific 25 differences remains elusive. Moreover, theory predicts that inherent tradeoffs related to 26 the evolution of big bodies, namely, a higher mutation load and an increased risk of 27 cancer, should constraint the evolution of giants. We sequenced the genome of the 28 capybara, the world’s largest living rodent, to explore the genetic basis of gigantism in 29 and the genomic signatures of the gigantism-related tradeoffs. The evolution of 30 big bodies amongst rodents appeared to correlate with a lower efficiency of purifying 31 selection, consistent with a higher mutational load in the capybara relative to the rest of 32 rodents. Moreover, several capybara genes involved in growth regulation by the 33 insulin/insulin-like growth factor signaling (IIS) pathway were found to be under 34 positive selection or accelerated evolution, and capybara-specific gene family 35 expansions revealed a putative novel anticancer adaptation based on T-cell-mediated 36 tumor suppression, consistent with cancer selection theory. Our analysis suggests that 37 gigantism in the capybara likely involved four evolutionary steps: 1) Increase in body 38 size by cell proliferation through the ISS pathway. 2) Coupled evolution of growth- 39 regulatory and cancer-suppression mechanisms, possibly driven by intragenomic 40 conflict. 3) Emergence of a cancer suppression mechanism, and 4) a mutational load, 41 possibly as an inevitable outcome of an increase in body size. 42 43

2 44 INTRODUCTION 45 Body size is arguably the most apparent characteristic when studying multicellular life 46 (Haldane, 1927). Understanding how and why body size evolves has been a major 47 question in biology because this trait affects nearly every aspect of a species’ natural 48 history (Calder, 1984). Evolutionary change in size has major impacts on physiology 49 and morphology (Gould, 1966; LaBarbera, 1989), which in turn affect life-history traits 50 and rates of phenotypic change and diversification (Calder, 1984; Silby and Brown, 51 2007; Liow et al., 2008). Yet, how size is developmentally controlled to generate 52 species-specific differences remains elusive. 53 54 While abiotic and extrinsic biological factors have been proposed as major determinants 55 of body size evolution, maximum body size in tetrapods appears to be determined 56 mostly by intrinsic biological constraints (McNeill Alexander, 1998; Sookias et al., 57 2012). , for instance, display a distribution of body size skewed towards 58 smaller sizes suggesting shared underlying biological constraints on the evolution of 59 large bodies (Kozeowsi and Gawelczyk, 2002). However, in some lineages these 60 constraints seem to have disappeared enabling the evolution of gigantism (Lovegrove 61 and Haines, 2004; Sookias et al., 2012). Although increase in body size has several 62 selective advantages (Purvis and Orme, 2005), two major tradeoffs in the evolution of 63 large bodies are the reduction in population size (Damuth, 1981) and the increased risk 64 of developing size-associated diseases, such as cancer (Caulin and Maley, 2011). Thus, 65 the evolution of an extreme phenotype, such as gigantism, could shed light on 66 understanding how biological systems can be built to minimize or overcome the 67 evolutionary effects of these tradeoffs on whole-organism performance (Lailvaux and 68 Husak, 2014). 69 70 Larger species have lower population densities (Damuth, 1981), slower metabolic rates, 71 longer generation times, and a lower reproductive output (Calder, 1984). Together these

72 traits promote an overall reduction in the genetic effective population size (Ne) relative 73 to small-bodied species (Leffler et al., 2012). In smaller populations, slightly deleterious 74 mutations are expected to accumulate at a higher rate than in larger populations due to 75 stronger genetic drift relative to purifying selection, thus increasing the mutation load in 76 the population, which may increase the risk of extinction (Lynch and Gabriel, 1990). 77 Furthermore, have a positive allometric scaling of number of cells in the body

3 78 with overall body size (Savage et al., 2007). Thus, if every cell has an equal probability, 79 however low, of becoming cancerous and all have equivalent cancer suppression 80 mechanisms, the risk of developing cancer should increase as a function of number of 81 cells and, therefore, large organisms should face a higher lifetime risk of cancer simply 82 due to the fact that large bodies contain more cells. (Caulin and Maley, 2011; Tollis et 83 al., 2017). Moreover, giant body sizes are achieved mainly by an accelerated post-natal 84 growth rate (Erickson et al., 2001), which represents an increment in the rate of cell 85 proliferation (Lui and Baron, 2011). For a given mutation rate, an elevated rate of cell 86 proliferation accelerates the rate of accumulation of mutations and, thus, increases the 87 chances for a cell population of becoming cancerous (Cairns, 1975). 88 89 The continued existence of enormous animals, such as whales and elephants, implies, 90 however, that the evolution of gigantism must be coupled with the evolution of cancer 91 suppression mechanisms. In fact, the lack of correlation between size and cancer 92 occurrence, known as Peto’s paradox, suggests that large-sized species have overcome 93 the inherent increase in cancer risk associated with the evolution of large bodies (Caulin 94 and Maley, 2011; Tollis et al., 2017). For instance, genomes of the Asian elephant 95 (Elephas maximus) and the bowhead whale (Balaena mysticetus), two of the largest 96 living mammals, revealed expansion of genes associated with tumor suppression via 97 enhanced DNA-damage response (Sulak et al., 2015) and cell cycle control (Keane et 98 al., 2015), respectively. Thus, cancer selection, i.e. selection to prevent or postpone 99 deaths due to cancer, is especially important as animals evolve larger bodies and might 100 account for the relative lower incidence of cancer in large animals and the evolution of 101 anticancer mechanisms (Leroi et al., 2002). 102 103 Among mammals, rodents are the most diverse group in terms of species-richness and 104 morphological disparity, particularly in terms of body size. South American rodents 105 () represent the most spectacular evolutionary radiation among living New 106 World mammals (Hershkovitz, 1972), and an extraordinary exception to the rodent 107 bauplan where gigantism evolved at least three different times (Lovegrove, 2001; 108 Vucetich and Deschamps, 2015). This group arrived in South America approximately 109 40 million years ago, during the Middle Eocene after a single event of transatlantic 110 colonization from Africa (Antoine et al., 2012; Hutchon and Douzery, 2001), and 111 diversified extensively in terms of ecology and morphology likely due to niche

4 112 specialization (Mares and Ojeda, 1982; Álvarez et al., 2017). Caviomorphs have the 113 broadest range in body size within rodents, making Rodentia one of the mammalian 114 orders with the largest range in body size (Sánchez-Villagra et al., 2003). Most notably, 115 among the caviomorphs is the largest living rodent, the capybara (Hydrochoerus 116 hydrochaeris, [Linnaeus, 1776]; Fig. 1C). 117 118 Capybaras have an average adult body weight of 55 kg (Nowak and Paradiso, 1983), 119 being 60 times more massive than their closest living relative, the rock cavies (Kerodon 120 sp.), and ~2000 times more massive than the common mouse (Mus musculus). Relative 121 to all rodents, capybaras are three orders of magnitude above the mean body size for the 122 order (Nowak and Paradiso, 1983). Previous studies on physiology and molecular 123 evolution identified that the caviomorph insulin, a hormone involved in growth and 124 metabolism, has a higher mitogenic effect relative to other mammalian growth factors 125 (Opazo et al., 2005; King and Kahn, 1981), however, the genomic basis of the 126 capybara’s unique size remains unexplored. Additionally, there is evidence that 127 telomerase activity in rodents coevolves with body mass, with bigger rodents displaying 128 lower telomerase activity in somatic tissues, suggesting a role for replicative senescence 129 as a cancer suppression mechanism in large-bodied rodents (Seluanov et al., 2007). 130 However, whether capybara has a unique solution to the increased risk of cancer 131 relative to the rest of rodents, and moreover, how size-regulatory and cancer 132 suppression mechanisms are coupled to allow the evolution of giant bodies remain 133 elusive. 134 135 Here, we report the sequencing and analysis of a high-quality genome of the capybara, 136 the world’s largest living rodent. This work provides insights into the mechanisms 137 underlying the evolution of extreme body size in rodents, the evolutionary interplay 138 between size-regulatory and cancer-suppression mechanisms and the effects of body 139 size on genome evolution. 140 141 RESULTS 142 Mode of body mass evolution 143 To determine whether the capybara could be considered a giant based on a lineage- 144 specific accelerative rate of body mass evolution, we used a comparative approach that 145 takes into account rate variation in trait evolution. We found that the rate of body mass

5 146 evolution among caviomorph rodents has not been constant throughout the phylogeny, 147 but is likely heterogeneous among branches, supported by the better fit of a multi-rate 148 model compared to the single-rate-Brownian Motion model (ΔDIC = 78.6; Table S1) 149 with an estimated value of α of 1.35 (95% confidence interval [CI] of 1.0 to 1.7). The 150 ancestral body size of the most recent common ancestor (MRCA) of Caviomorpha was 151 estimated to be 971 g (95% CI, 221 g to 5135 g) and the size of the MRCA of the 152 capybara and rock cavy (Kerodon rupestris) was estimated to be 1132 g (95% CI, 437 g 153 to 23,225 g), suggesting that the rate of body size evolution has undergone one or more 154 sudden changes. 155 156 We detected three shifts in the rate of body-mass evolution along the tree (Fig. 2). One 157 decrease in evolutionary rate at the root of the Octodontoidea clade (shift probability: 158 0.11), and two increases in rate acceleration. The first increase in rate was localized in 159 the branch leading to Myocastor (shift probability: 0.16), the largest species 160 within Octodontoidea, and a second increase, with the highest posterior support, at the 161 base of the clade containing the capybara, rock cavy and mara (shift probability: 0.44). 162 Capybaras and Patagonian maras (Dolichotis patagonum) are the largest living species 163 of caviomorph rodents, however, the capybara is approximately 3 times heavier (Fig 164 S5). Together, these results suggest that the capybara evolved from a moderately small 165 ancestor, comparable to the size of a guinea pig, and its characteristic big size was 166 achieved by a spurt in the rate of body mass evolution (see also Álvarez et al, 2017). 167 168 Sequencing and Annotation of the Capybara Genome 169 To explore the genetic basis of the capybara’s unique size, identify possible molecular 170 adaptations related to cancer suppression and detect the genomic signatures associated

171 to a reduced Ne, we generated a whole-genome sequence and de novo assembly of a 172 female capybara. The genome was assembled using a combination of a DISCOVAR 173 assembly and Chicago libraries. Contig N50 was 161.1 kb and scaffold N50 was 12.2 174 Mb, and the total length of the assembly was ~2.7 Gb, slightly shorter than the 175 estimated genome size (3.14 Gb; Table 1). Completeness of the genome was estimated 176 at 94.35% and 88.98% based on CEGMA (Parra et al., 2007) and BUSCO (Simão et al., 177 2015) pipelines, respectively, which is comparable to the guinea pig genome despite the 178 latter having two-fold larger scaffold N50 (Fig. S1A). According to Yandell and Ence 179 (2012), an assembly can be considered as a ‘high-quality draft assembly’ for annotation

6 180 if it is at least 90% complete. We used guinea pig, rat and mouse Ensembl protein sets 181 to guide the homology-based annotation with MAKER (Cantarel et al., 2008), resulting 182 in a capybara genome containing 24.941 protein-coding genes with 92.5% of the 183 annotations with an AED bellow 0.5 (See Experimental Procedures). 184 185 RepeatMasker annotation and classification of repetitive elements estimated that 37.4% 186 of the capybara genome corresponds to repetitive content, spanning 1025.21 Mb of the 187 genome (Fig S1B). Most of the repeat content (72.5%) corresponds to transposable 188 elements, with LINE-1 the most abundant (20.46%), similar to the mouse genome 189 (Waterston and Pachter, 2002). A further comparison across the rodent phylogeny 190 showed that both genome size and repeat content size of the capybara are comparable to 191 that of other rodents (Fig. 1B). 192 193 Genomic Signatures of Mutational Load 194 To estimate the mutation load we used the ratio of the rates of non-synonymous 195 substitutions over synonymous substitutions (dN/dS) as a proxy. Because selection is 196 mostly purifying and the majority of non-synonymous substitutions are considered 197 deleterious or slightly deleterious (Ohta, 1992), less efficient purifying selection relative 198 to genetic drift, a typical pattern in small populations, can result in a higher genome- 199 wide fraction of fixed non-synonymous substitutions, and thus an overall higher value 200 of dN/dS (Popadin et al., 2007). Based on the inverse relationship between body mass

201 and population size (Damuth, 1981), we used body mass as a proxy for Ne. Comparison 202 of capybara-specific dN/dS values with the rest of rodents showed that the capybara has 203 a higher genome-wide mean dN/dS (0.15 vs. 0.11, respectively, Two-sample t-test, P < 204 0.0001; Fig. 3), suggesting the presence of an elevated mutational load in the capybara 205 relative to other rodents. Furthermore, analysis of body mass and nuclear dN/dS across 206 rodents showed that body mass is positively correlated with mean dN/dS (R2 = 0.294, P 207 < 0.05; PIC: R2 = 0.332, P < 0.05; Fig. S4A), suggesting that larger rodents experience 208 less efficient purifying selection. 209 210 To further corroborate that the increased mutational load was attributable to 211 demographic effects, we estimated the dN/dS values for the 13 protein-coding 212 mitochondrial genes for ten rodent species (Table S3). If there is variation in the 213 effective population size of mitochondrial DNA (mtDNA) across species, then diversity

7 214 levels and substitution patterns in mtDNA should be correlated to those in nuclear DNA

215 given that evolutionary processes that affect Ne should affect both genomes (Ballard and 216 Whitlock, 2004; Piganeau and Eyre-Walker, 2009). Consistently, we found that, within 217 rodents, mean mitochondrial and nuclear dN/dS values were strongly correlated 218 (Spearman’s ρ = 0.67, P < 0.05; PIC: Spearman’s ρ = 0.7, P < 0.05; Fig S4B).

219 Together, these results suggest that lower Ne in larger rodent species has increased the 220 mutational load in nuclear and mitochondrial genomes, consistent with variation in 221 effective population sizes among diverse groups of animals (Piganeau and Eyre-Walker, 222 2009). 223 224 Gene-specific Rates of Evolution in the Capybara Genome 225 To find coding-genes potentially related to capybara’s unique size, we first used 226 capybara-coding sequences to calculate the pairwise dN/dS ratios for 11,255 guinea pig 227 (Cavia porcellus) single-copy gene orthologs (SCOs) and identified genes with overall 228 faster sequence evolution. Because the guinea pig is the closest rodent with a sequenced 229 genome to the capybara (Fig. 1A) and is smaller (~750g versus 55,000g), pairwise 230 comparisons between the capybara and guinea pig could provide insights into the genes 231 underlying the rapid burst in body size evolution in the capybara (see Mode of body size 232 evolution). Not surprisingly, there are moderately high levels of sequence conservation 233 in protein coding genes between guinea pig and capybara (mean = 92.6 with standard 234 deviation [SD] = 4.8%) after an estimated divergence time of ~20 million years (Fig. 235 S3). Among the 20 capybara-guinea pig SCOs that showed a dN/dS exceeding 1, leptin 236 (LEP) is a gene known to have an important role as a growth factor regulating body 237 weight and bone mass through the insulin/insulin-like growth factor signaling (IIS) 238 pathway (Margetic et al., 2002). Remarkably, this gene also showed signatures of 239 adaptive evolution in a pairwise dN/dS analysis between the 100-ton-bowhead whale 240 and the 7-ton-minke whale (Keane et al., 2015), suggesting a role for lep in the 241 evolution of giant body sizes. Among these 20 capybara genes showing dN/dS > 1, 242 there are also a number of genes involved in innate (CCL26, CXCL13, LILRA, XCL1, 243 PILRA and CMC4) and adaptive (MHC-B46) immune responses. In addition, we 244 conducted a comparison between 8084 capybara-guinea pig-rat SCOs to identify genes 245 evolving faster in the capybara lineage. The top 5% of genes with high dN/dS values for 246 capybara-guinea pig relative to the highest value for guinea pig-rat orthologs included 247 genes involved in the development of the insulin-producing pancreatic β-cells and DNA

8 248 damage repair pathway such as Pancreatic and duodenal homeobox 1 (PDX1) and 249 Poly(ADP-Ribose) Polymerase members 3 and 9 (PARP3 and PARP9).

250 251 As a complementary analysis of positive selection among genes that takes into account 252 heterogeneous selection pressure at different sites within proteins, we used codon-based 253 models of evolution (Yang, 2007) to identify candidate target genes evolving faster in 254 the capybara lineage. We identified a set of 4452 SCOs using five available genomes 255 for Hystricognath rodents: naked mole rat (Heterocephalus glaber), degu (Octodon 256 degus), chinchilla (Chinchilla lanigera), guinea pig and capybara (Fig. 4A). Fifty-six 257 candidate genes exhibited signatures of lineage-specific positive selection. The modest 258 number of candidate genes is likely due to the low levels of protein divergence between 259 capybara and guinea pig (mean: 1%, 95% C.I.: 0.9-1.2; See Experimental Procedures). 260 Interestingly, among the 37 enriched GO Biological Process in the positively selected 261 genes, are cell death (GO:0008219), cell cycle (GO:0007049), cell proliferation 262 (GO:0008283), growth (GO:0048590), mitotic cell cycle (GO:0000278) and immune 263 system process (GO:0002376). Four genes directly involved in growth regulation and 264 cancer suppression were identified (Fig. 4B), including Apoptosis Regulator Bcl-2 265 (BCL-2), Pleiomorphic adenoma-like Protein 2 (PLAGL2), Poly(ADP-Ribose) 266 Polymerase 2 (PARP2) and Regulatory Factor X6 (RFX6). PLAGL2 is an important 267 member of the PLAG family of zinc finger transcription factors, which interacts with 268 the insulin-like growth factor II (igf2) activating the igf2-mitogenic signaling pathway 269 (Hensen et al., 2002) and disruption can result in decreased body weight and post-natal 270 growth retardation in mice (Hensen et al., 2004; Blake et al. 2017). IGF-II provides a 271 constitutive drive for fetal and post-natal growth (Fowden, 2003) and it has been shown 272 that IGF-II levels do not drop after birth in caviomorph rodents, but instead maintain 273 detectable levels throughout their adult life (Levinovitz et al., 1992), supporting a role 274 of IGF-II regulation on the evolution of gigantism in the capybara. Remarkably, Plag1, 275 and important homolog of Plagl2, has been associated in controlling stature in cattle 276 (Karim et al., 2011), humans (Pryce et al., 2011) and domestic pig (Rubin et al., 2012). 277 BCL-2 is an important regulator of cell death, and altered regulation of Bcl-2-family 278 genes has been detected in multiple cancers (Yip and Reed, 2008). 279

9 280 We found evidence of rapid evolution and lineage-specific positive selection in two key 281 transcription factors (TFs) that regulate the development of the insulin-producing 282 pancreatic β-cells: PDX1 and RFX6 (see above). Pancreatic islets are composed of 5 283 different cell types, being the β-cells the most prominent, composing 50-80% of the 284 islets and the ones in charge of producing insulin (Murtaugh, 2007). PDX1 and RFX6 285 TFs are crucial to establish and maintain the functional identity of developing and adult 286 β-cells (Gao et al., 2014; Piccand et al., 2014). Furthermore, in adult β-cells PDX1 and 287 RFX6 regulate insulin gene transcription (Pagliuca and Melton, 2013) and, particularly, 288 RFX6 also regulates insulin content and secretion (Chandra et al., 2014), which shows 289 signatures of adaptive evolution (Opazo et al., 2005) and has a stronger mitogenic 290 activity in caviomorph rodents (King and Kahn, 1981). Thus, besides the physiological 291 effects of caviomorph insulin, shifts in insulin expression and secretion profiles may be 292 an important mechanism in the evolution of extreme body size. 293 294 Lineage-specific amino acid replacements are good predictors of change in protein 295 function (Jelier et al., 2011). We aligned 876 SCOs sequences from 9 rodent species 296 spanning the three main groups within Rodentia and identified sites otherwise 297 conserved in all rodents except for the capybara (See Experimental Procedures). Among 298 the top 5% genes (44 genes) with the highest concentration of capybara-unique residues 299 was TERF2-Interacting Telomere Protein 1 (TERF2IP). TERF2IP is part of the 300 shelterin complex that shapes and controls the length of telomeric DNA in mammals 301 (De Lange, 2005), particularly the negative regulation of telomere length (O’Connor et 302 al., 2004), and it has been shown that many TERF2-interacting proteins are mutated in 303 chromosomal instability syndromes, increasing the susceptibility to cancer (Blasco, 304 2005). PARP2, a candidate gene showing signatures of lineage-specific positive 305 selection (see above), directly interacts with the TERF2 protein affecting its ability to 306 bind telomeric DNA, and disruption causes chromatid breaks that affect telomere 307 integrity (Dantzer et al., 2004). Together, these results support previous evidence 308 (Seluanov et al., 2007) showing that replicative senescence plays an important role in 309 reducing the increased cancer risk in the big rodents. 310 311 Gene Family Composition Analysis 312 Gene family evolution (e.g., gene expansions) has been recognized as an important 313 mechanism explaining morphological and physiological differences between species

10 314 (Demuth et al., 2006). To characterize the functional arrangements of the genomes, we 315 employed a phenetic approach based on principal components that seeks to identify 316 genomes whose functional composition resembles each other and, thus, tend to be 317 closer in a multidimensional space of molecular functional repertoires (see 318 Experimental Procedures). This approach has been shown to reflect important 319 evolutionary splits between groups or clades of animals (non-bilaterian versus 320 bilaterian: Simakov et al., 2013; deuterostomes and protostomes: Albertin et al., 2015). 321 In the case of rodents, this multivariate approach largely reflects phylogenetic 322 relationships within rodents despite the much smaller time frame for functional 323 divergence (Fig. 5A). The capybara, however, showed signatures of genome-wide 324 functional diversification relative to the rest of rodents. To explore which gene families 325 were driving this pattern we performed an overrepresentation test to determine the gene 326 families that dominate the loadings of each principal component. Based on this, the 327 main determinants of PC1 (Fig. 5A) were the diversification of genes related to sensory 328 perception of smell, G-protein coupled receptor signaling and immune response. 329 330 Based on the genome-wide screen for gene family expansions, we identified 106 gene 331 families significantly enriched in capybara, five of them that are particularly relevant for 332 growth and cancer suppression (Fig. 5B). Phosphatidylinositol 3-kinase Regulatory 333 subunit β (PI3KR2), atypical protein kinase C isoform PRKC iota (PRKCI), Tumor 334 Protein, Translationally-Controlled 1 (TPT1), Melanoma Antigen Family B5 335 (MAGEB5) and Granzyme B (GZMB) were among the three-fold enriched gene 336 families in the capybara. PI3K signalling is implicated in cell proliferation, growth and 337 survival via the IIS and mTOR signaling pathways (Laplante and Sabatini, 2012), and 338 PI3KR2 has been shown to have both oncogenetic (Ito et al., 2014) and tumor 339 suppressor properties (Taniguchi et al., 2010). PRKCI functions downstream of PI3K 340 and is involved in cell survival, differentiation, and proliferation by accelerating G1/S 341 transition (Ni et al., 2016). TPT1 has a major role in phenotypic reprogramming of 342 cells, inducing cell proliferation and growth via the mTOR signaling pathway, and also 343 tumor reversion where cancer cells lose their malignant phenotype (Amson et al., 2013). 344 Type I MAGE genes (e.g., MAGEB5) are expressed in highly proliferating cells such as 345 placenta, tumors, and germ-line cells (van der Bruggen et al., 1991). Interestingly, in 346 normal somatic tissues these genes are deactivated but when cells become neoplastic, 347 MAGE genes are reactivated and the resultant proteins may be recognized by cytotoxic

11 348 T lymphocytes, triggering a T-cell-mediated tumor suppression response (van der 349 Bruggen et al., 1991). Lastly, GZMB is an important component of the 350 perforin/granzyme-mediated cell-killing response of cytotoxic T lymphocytes via 351 caspase-dependent apoptosis (Russell and Ley, 2002). The fact that the capybara 352 displays functional diversification in the T-cell receptor (TCR) signaling pathway (Fig. 353 5A), expansion of MAGEB5 and GZMB genes, and expansions of the variable 354 segments of the alpha subunits of TCRs (TRAV22 and TRAV12-2; Fig. 5B), suggests 355 that T-cell-mediated tumor suppression response may be a parallel strategy, besides 356 replicative senescence, to reduce the increased cancer risk related to the evolution of a 357 giant body size. 358 359 Interaction of Growth, Cancer and Cancer Suppression Pathways 360 To investigate how growth and cancer suppression pathways might be coevolving in the 361 capybara to allow the evolution of a giant body size, we used the genes that showed 362 lineage-specific positive selection and rapid evolution, high concentration of unique- 363 capybara residues and capybara-expanded gene families (see above) to perform a 364 network analysis of GO Biological Process and KEGG pathways on 100 interacting 365 genes using String Database (Mering et al., 2003). The GO enrichment analysis showed 366 a number of Biological Process (Fig. 6A) involved in growth regulation (Pancreas 367 development [GO:0031016] and response to insulin [GO:0032868]) and cancer 368 suppression (Positive regulation of immune response to tumor cell [GO:0002839], 369 positive regulation of T cell proliferation [GO:0042102], Negative regulation of cell 370 proliferation [GO:0008285] and apoptotic signaling pathway [GO:0097190]). 371 Interestingly, this analysis also suggests a role for T-cell-mediated tumor suppression as 372 an anti-cancer mechanism in the capybara. 373 374 KEGG pathway enrichment (Fig. 6B) showed also a number of pathways involved in 375 growth (Insulin signaling pathway [ID 04910] and PI3K-Akt signaling pathway [ID 376 04151]) and immune response (Antigen processing and presentation [ID 04612] and 377 hematopoietic lineage [ID 04640]) that could be involved in the anticancer mechanism 378 mediated by T cells. Remarkably, some genes are directly involved in known cancer 379 pathways such as melanoma (ID 05218), prostate cancer (ID 05215), colorectal cancer 380 (ID 05210) and transcriptional misregulation of cancer (ID 05202). Genes involved in 381 known cancer pathways directly interact with genes involved in growth regulation and

12 382 immune response, but there is no direct interaction between genes in these two latter 383 categories. Of particular interest among the genes involved in cancer pathways are 384 insulin, PI3KR2 and FGF22 that are both growth factors and oncogenes (Laplante and 385 Sabatini, 2002), suggesting that cancer develops as a by-product of growth regulatory 386 evolution. 387 388 DISCUSSION 389 A new definition of ‘giant’ 390 The study of body mass evolution has been primarily focused on the extremes relative 391 to human body size (Bonner, 2006) however, the term ‘gigantism’ is relative to the 392 clade of interest. Here, we demonstrate that comparative phylogenetic methods can 393 reveal remarkable patterns of body mass evolution in a clade otherwise considered to be 394 composed of small organisms. Capybaras are the last extant remnant of a series of 395 exceptional big rodents that evolved in South America during the last 15 million years 396 (Vucetich and Deschamps, 2015; Álvarez et al., 2017). The accelerated rate of body 397 mass evolution in capybaras relative to other rodents parallels the rate acceleration 398 observed in the world’s largest animals, e.g., elephants, the largest living terrestrial 399 mammals, among afrotherian mammals (Puttick and Thomas, 2015); mysticetes, which 400 include the largest to have ever lived –the blue whale-, among baleen whales 401 (Slater et al., 2017) and sauropodomorphs, the largest terrestrial animals in Earth’s 402 history, relative to other dinosaurs (Benson et al., 2014), indicating that capybaras can 403 be also considered as giants among rodents. Thus, more than just the absolute body size, 404 a significant increase in the rate of body size evolution from relatively small-bodied 405 ancestors, as revealed through comparative analyses, may be more revealing as an 406 evolutionary metric to objectively identify giant organisms. 407 408 Developmental regulators of body size in the capybara 409 This study provides the first genomic insights into the evolution of gigantism in the 410 world’s largest living rodent. Our results suggest that the insulin/insulin-like growth 411 factor signaling (IIS) pathway was involved in the evolution of capybara’s unique size, 412 including signaling by insulin (PDX1 and RFX6), igf2 (PLAGL2) and PI3K (PI3KR2 413 and PRKCI). This pathway has been linked to body-size evolution in diverse animals, 414 such as fishes, dogs, and flies. For instance, the genome of the world’s largest bony fish, 415 the sunfish (Mola mola), also revealed accelerated rates of evolution in genes associated

13 416 with the regulation of growth by the IIS pathway (Pan et al., 2016). Extreme size 417 differences in dog breeds are determined by a single haplotype of the insulin-like 418 growth factor I (igf1) gene (Sutter et al., 2007). Moreover, several genes within the IIS 419 pathway are involved in natural body size variation in latitudinal clines in Drosophila 420 melanogaster (De Jong and Bochdanovitis, 2003). 421 422 Differences in size between related organisms can reflect differences in cell number, 423 cell size or both (Conlon and Raff, 1999). In animals, six signaling pathways regulate 424 growth: RTK signaling via Ras, IIS via the PI3K pathway, Rheb/Tor, cytokine- 425 JAK/STAT, Warts/Hippo, and the Myc oncogene (Zhu et al., 2013). However, we have 426 little understanding regarding which growth-regulatory pathways are targeted more 427 often during evolution of large bodies, or whether there is a predictable pattern, two key 428 questions in the study of the origin of species’ size differences. Body size is a canonical 429 complex trait, and ontogenetic changes underlying body size evolution affect other life- 430 history traits, such as fecundity and longevity (Shingleton, 2011). Thus, the extent of 431 pleiotropic effects of each developmental regulator of body size will ultimately 432 determine its availability for evolutionary change (Hansen and Houle, 2004). Given the 433 existence of at least six size-regulating pathways (Zhu et al., 2013), the repeated use of 434 the IIS pathway in flies, fishes and mammals, suggests that this pathway might be less 435 constrained and could be a ‘hot-spot’ for extreme body size evolution in animals, 436 indicating a common genetic and developmental pathway controlling rapid body size 437 evolution through cell proliferation. More studies on extant gigantic animals, e.g., whale 438 shark, ostrich, Komodo dragon, elephant or goliath beetle, can help to better understand 439 the developmental programs underlying size diversity in animals and elucidate how 440 often is the IIS pathway targeted to generate extreme body size evolution relative to the 441 other pathways. 442

443 Evolutionary tradeoffs of gigantism: Reduced Ne and Cancer 444 Despite the intrinsic constraints to the evolution of giant body sizes (McNeill 445 Alexander, 1998), large bodies have evolved multiple times among mammals (Baker et 446 al., 2015). Here we addressed two important problems that face giant organisms: an 447 increased risk of cancer due to a higher number of cells and higher rates of cell

448 proliferation, plus a reduction in effective population size (Ne) leading to a higher 449 genetic load. For the latter, capybaras harbor a higher rate of accumulation of slightly

14 450 deleterious mutations relative to small-bodied rodents, suggesting a smaller Ne due to its 451 body size. It is interesting to note that within rodents, body mass explains ~30% of the 452 genetic load, resembling the pattern observed for mammals in general (Popadin et al., 453 2007), reinforcing the effect of body size on genome evolution. 454 455 In a finite population, the interplay between selection and genetic drift determines the 456 fate of newly arisen mutations, and the relative strength of each force will largely

457 depend on Ne (Charlesworth, 2009). Given that Ne tends to scale proportionally with the 458 number of individuals in a population (Charlesworth, 2009), and among species body 459 mass and population density are negatively correlated (Damuth, 1981), the evolution of

460 larger body sizes causes a reduction in Ne. With lower Ne, genetic drift becomes more 461 potent and slightly deleterious mutations behave neutrally. As purifying selection 462 becomes relatively less effective, slightly deleterious mutations accumulate and the 463 mean fitness of individuals is reduced (Lynch and Gabriel, 1990; Lohr and Haag, 2015).

464 Reduced long-term Ne can increase the probability of extinction (Bertram et al., 2017) 465 and may explain macroevolutionary patterns of extinction risk related to body size 466 (Ripple et al., 2017). 467 468 In rodents, cancer incidence has been reported to be 46% in a wild-caught population of 469 Mus musculus raised in the laboratory (Andervort and Dunn, 1962), and 14.4% to 30% 470 in guinea pigs older than three years (Jelínek, 2003). In capybaras, only three cases have 471 been reported to date (Stoffregen et al., 1993; Hamano et al., 2014; Srivorakul et al., 472 2017), suggesting the possibility of a lower incidence of cancer in capybara relative to 473 other rodents. This pattern is consistent with Peto’s paradox and with empiric analyses 474 of cancer incidence among different mammalian species (Abegglen et al., 2015). 475 476 Mammalian comparative genomic analyses have revealed several lineage-specific 477 molecular adaptations in respect to cancer suppression (Tollis et al., 2017b). Here, we 478 found evidence for a likely capybara-specific cancer suppression mechanism in 479 response to the evolution of a large body size, namely, T-cell-mediated tumor 480 suppression, revealed by lineage-specific expansions of MAGE tumor antigens and 481 diversification of genes involved in immune detection and eradication of tumors. 482 Furthermore, we corroborate previous evidence supporting a role for replicative 483 senescence in bigger rodents such that nearly complete somatic repression of telomerase

15 484 activity seems to be shared by the two largest rodents, beaver (~20Kg) and capybara 485 (~60Kg; Seluanov et al., 2007). Species-specific evolution in immune response could 486 uncover new model organisms for biomedical research (Webb et al., 2015), and, given 487 the increasing interest in using genetically engineered T cells for cancer immunotherapy 488 (Kershaw et al., 2013), the capybara might be a suitable model to study the evolution 489 and physiology of tumor detection and eradication by T cells. 490 491 Genes in Conflict: Coupled Evolution of Growth-Promoting and Cancer- 492 Suppression Mechanisms 493 How can giants evolve, or even exist? Size is regulated through somatic growth (Conlon 494 and Raff, 1999) and cancer develops through somatic evolution (Crespi and Summers, 495 2005). Thus, cancer is a major constraint on the evolution of large bodies. However, the 496 mere existence of animals such as whales and elephants suggests that the evolution of 497 gigantism must be coupled with the evolution of cancer suppression mechanisms. We 498 showed that growth, cancer, and anticancer pathways are intimately connected through 499 molecular interactions. Paradoxically, genes that regulate size have dual natures: they 500 are growth factors, promoting somatic evolution, but also oncogenes, enabling tumor 501 progression. For example, LEP is a major regulator of body weight and bone mass 502 (Margetic et al., 2002) but also promotes angiogenesis through VEGF signaling and acts 503 as an autocrine factor in cancer cells promoting proliferation and inhibition of apoptosis 504 (Andò et al., 2012). PLAGL2 is involved in post-natal growth by activating the igf2- 505 mitogenic signaling pathway (Hensen et al., 2002) but generates loss of cell-cell contact 506 inhibition, maintains an immature differentiation state in glioblastomas, and induces 507 proliferation of hematopoietic progenitors in leukemia (Hensen et al., 2002; Zheng et 508 al., 2011; Landrette et al., 2011). Finally, insulin-PI3K signaling is implicated in cell 509 proliferation, growth and survival via the IIS and mTOR signaling pathways (Laplante 510 and Sabatini, 2012) but is a major node frequently mutated in solid tumors (Yuan and 511 Cantley, 2008). These observations suggest that some cancers can arise as a by-product 512 of growth regulatory evolution. 513 514 The dual nature of growth factors creates an intragenomic conflict in large-sized 515 organisms, where somatic evolution leading to gigantism may promote tumorigenic 516 cells (Summers et al., 2002). Cancers are selfish cell lineages that evolve through 517 somatic mutation and cell-lineage selection (Burt and Trivers, 2006). Thus, in big-

16 518 bodied organisms, genetic pathways that evolved in the context of natural selection 519 favoring increase in size, through cell proliferation, can be co-opted by cancer cell 520 lineages increasing their selective advantage by allowing self-sufficiency in growth 521 signals, one of the hallmarks of cancer (Hanahan and Weinberg, 2000; Summers et al., 522 2002). In fact, within species the risk of cancer scales with body size (Caulin and 523 Maley, 2011). For example, dog breeds such as Great Danes, New Foundlands and St 524 Bernarnds, which have been selected for rapid growth and large body size, have a 180- 525 fold greater risk of osteosarcoma than smaller breeds (Tjalma, 1966). Likewise, in 526 humans, juvenile osteosarcoma seems to be a by-product of recent evolutionary changes 527 in human growth, such as the pubertal growth spurt, where 50% of tumors occur in 528 children in the top 75th percentile for height for their age (Fraumeni, 1967). 529 530 The opportunity for selfish cell lineages to arise imposes a selection pressure on cancer 531 surveillance mechanisms, generating a coupled evolution between growth-promoting 532 and tumor-suppression pathways, a plausible underlying process causing Peto's paradox 533 and a possible explanation to how giant bodies can evolve in the first place. Therefore, 534 based on gene network interaction analyses, we propose a model for the evolution of 535 gigantism in the capybara where the IIS pathway promotes somatic evolution through 536 cell proliferation, leading to a phyletic increase in size, and the evolution of T-cell 537 mediated tumor suppression pathway as a mechanism to counteract the increased cancer 538 risk. Further analyses of correlated rates of evolution between the interacting genes 539 (Clark and Aquadro, 2010) can help to better understand how protein-protein 540 interactions and intragenomic conflict shape the evolution of this complex trait. 541 542 543 EXPERIMENTAL PROCEDURES 544 Rate of Body Size Evolution and Ancestral State Reconstruction 545 The term ‘gigantism’ is relative to the clade of interest. We used a comparative 546 approach to explore to what extent the capybara’s large size is due to common ancestry 547 versus the result of accelerated rates of phenotypic evolution. A recent caviomorph 548 phylogeny (Upham and Patterson, 2012) was used to determine whether the capybara 549 could be considered a giant based on a lineage-specific accelerative rate of body mass 550 evolution. Mean adult body mass data, averaged across both sexes (in grams), for each

17 551 species was taken from the literature (Tacutu et al., 2013; Freudenthal and Martín- 552 Suárez, 2013). Analyses were conducted on log-transformed measures of body mass. 553 554 The Brownian Motion (BM) model of continuous trait evolution assumes that trait 555 variation accumulates proportionally through time on a phylogeny where branch lengths 556 are scaled to time, and rates of trait evolution are finite and do not change across the 557 phylogeny (Felsentein, 1988). However, the disparity in body size among caviomorph 558 rodents (Fig S5), suggests that rate of body size evolution likely vary across lineages. 559 To test for a multiple rate model of body size evolution on the caviomorph phylogeny 560 and reconstruct ancestral states, the software STABLETRAITS was employed (Elliot and

561 Mooers, 2014). STABLETRAITS samples from a symmetrical distribution of mean zero 562 which is defined by its index of stability (α): for BM α=2 which results in a normal 563 distribution (i.e. single-rate model), but when α<2 this results in a shallower distribution 564 with heavy tails, meaning that can sample different rate values supporting a multi-rate 565 model. For the analysis, results from a multi-rate model (α<2) were compared with a 566 BM model (α=2) to select the best fitting model based on the Deviance Information 567 Criterion (DIC). Two MCMC chains were run for 2 million generations until the 568 potential scale reduction factor went bellow 1.01 to ensure convergence. The burn-in 569 was set to 25%, with the output containing calculated rates, ancestral states and 570 posterior probabilities. 571 572 Given that the multi-rate model was selected as the best model according to DIC value

573 (Table 1), the R package AUTEUR (Eastman et al., 2011) was used to localize the 574 branches in the phylogeny where the rate shifts occurred, without prior knowledge or 575 specification. AUTEUR uses a reversible-jump MCMC (rjMCMC) that allows assessing 576 the fit of different models that differ in the number of rate shifts in the tree under a BM 577 model. The algorithm assumes that evolutionary rates are heritable, that is, the ancestral 578 value for the evolutionary rate is inherited to all descendants unless the data provides 579 evidence of a rate shift. The rjMCMC approach allows the exploration from the 580 simplest BM process (i.e. a global constant rate) to the more complex process where 581 each branch evolves under a different rate. Two rjMCMC chains were run for 1 million 582 generations with the following parameters: sample.freq = 100 prop.mergesplit = 0.1 and 583 prop.width = 0.1. 584

18 585 Capybara Genome Size Estimation 586 Genome size in capybara (H. hydrochaeris) was estimated using flow cytometry on 587 blood samples from two juveniles taken from a population in Cabuyaro, Meta, 588 Colombia (04.2840250°, -072.7256472°; Table S2). Animals where tranquilized with 589 an intramuscular dose of Zoletil® 50 (VirbacTM, Colombia), and then sacrificed with a 590 lethal dose of Euthanex® 50 (InvetTM, Colombia) according to manufacturer 591 instructions. Blood samples were stored in 3mL-heparinized tubes at 2ºC and 592 transported to the laboratory in Bogotá, Colombia. Nucleated white blood cells were 593 separated from the rest of blood contents for genome size estimation. 100µL of white 594 blood cells were washed with 3mL of PBS buffer, and cells’ DNA was stained 595 according to the BD CycletestTM Plus DNA kit instructions (BD BiosciencesTM). 596 Chicken red blood cells were used as a genome size standard, with an accepted haploid 597 DNA content (C-value) of 1.25 pg. Flow cytometry was conducted on a BD 598 FACSCantoTM II cytometer with a FACs DIVA v6.1 software. Average C-value was 599 estimated to be 3.21 pg (capybara-1 C = 3.17 pg, capybara-2 C = 3.25 pg), which 600 corresponds to a mean genome size of 3.14 Gb. 601 602 Capybara Genome Sequencing and Assembly 603 As part of the 200 Mammalian Genomes project, the Broad Institute generated a 604 DISCOVAR genome (Weisenfeld et al., 2014) from a sample of a wild-born female H. 605 hydrochaeris from the San Diego Zoo’s Frozen Zoo (frozen sample KB10393), 606 apparently of Brazilian origin. The genome assembly was performed with the 607 DISCOVAR de novo pipeline, generating an assembly spanning 2.734 Gb with a 608 scaffold N50 length of 0.202 megabases (Mbp). 609 610 Dovetail GenomicsTM’s proprietary ChicagoTM method (Putnam et al., 2015) was used 611 to upgrade the initial DISCOVAR assembly. Dovetail’s novel approach to increasing 612 the contiguity of genome assemblies combines initial short-read assemblies with long- 613 range information generated by in vitro proximity ligation of DNA in chromatin. San 614 Diego Zoo’s Frozen Zoo collection provided 500 ng of high molecular weight (HMW) 615 gDNA (50 Kb mean fragment size) from the same individual of H. hydrochaeris 616 sequenced by the Broad Institute (see above). DNA was reconstituted into chromatin in 617 vitro and performed a DNA-protein crosslinking with formaldehyde. Fixed chromatin 618 was digested with the Mbo1 enzyme, and the 5’ overhangs were filled in with

19 619 biotinylated nucleotides, and then free blunt ends were ligated. After ligation, crosslinks 620 were reversed and the DNA purified from protein and the purified DNA was treated to 621 remove biotin that was not incorporated into ligated fragments. The DNA was sheared 622 to ~350 bp mean fragment size, and libraries were generated using NEBNext Ultra 623 enzymes and Illumina-compatible adapters. Biotin-containing fragments were then 624 isolated using streptavidin beads before PCR enrichment of each library. The libraries 625 were sequenced on two lanes of an Illumina HiSeq in Rapid Run Mode to produce 316 626 million 2x100bp paired end reads, which provided 67x physical coverage (measured in 627 bins of 1-50 kb). 628 629 The reads were assembled by Dovetail GenomicsTM using the HiRiseTM Scaffolder 630 pipeline and the DISCOVAR genome assembly as input. Shotgun and ChicagoTM 631 library sequences were aligned to the draft input assembly using a modified SNAP read 632 mapper (http://snap.cs.berkeley.edu), to generate an assembly spanning 2.737 Gb of an 633 estimated 3.14 Gb genome size (see Capybara genome size estimation, above), with a 634 contig N50 length of 161.1 kb and an impressive scaffold N50 length of 12.2 Mb. 635 636 Assembly Quality, Homology-Based and Functional Annotation 637 Two approaches were employed to evaluate the quality of the assembly: 1) the Core 638 Eukaryotic Genes Mapping Approach (CEGMA, Parra et al., 2007), was used to 639 determine the number of complete Core Eukaryotic Genes (CEGs) recovered, given that 640 a well-assembled genome should recover a high proportion (≥75%) of the 248 ultra- 641 conserved CEGs regardless of the cell type or organism. 2) An analysis of 642 Benchmarking Universal Single-Copy Orthologs (BUSCO, Simão et al., 2015) was 643 applied as an evolutionary measure of genome completeness, using the vertebrate 644 dataset (3023 genes) as query. Gene content of a well-assembled genome should include 645 a high proportion (≥75%) of BUSCOs (Fig S1). Scaffold N50 of the assembly was 646 calculated using the QUAST software (Gurevich et al., 2013). 647 648 Putative genes were located in the assembly by homology-based annotation with 649 MAKER (Cantarel et al., 2008) based on protein evidence from mouse (Mus musculus), 650 rat (Rattus norvegicus) and guinea pig (Cavia porcellus; Ensembl release 85). CD-HIT 651 v4.6.1 (Li et al., 2001) was used to cluster highly homologous protein sequences across 652 the three protein sets and generate a non-redundant protein database (nrPD). The nrPD

20 653 was used to guide gene predictions on the capybara genome. To evaluate the quality of 654 the annotations the Annotation edit distance (AED) score was used as a measure of 655 congruence between each annotation and the homology-based evidence (in this case, 656 protein sequences). When an annotation perfectly matches the overlapping model 657 protein, the AED value is 0 (Yandell and Ence, 2012). As a rule of thumb, a genome 658 annotation where 90% of the annotations have an AED less than 0.5 is considered well 659 annotated (Campbell et al., 2014). The capybara genome annotation has 92.5% of its 660 annotations with an AED below 0.5. 661 662 PANTHER (Thomas et al., 2003) was used to characterize the functions of the 663 capybara-annotated proteins (Fig S2). PANTHER is a library of protein families and 664 subfamilies that uses the Gene Ontology (GO) database to assign function to proteins. 665 Following the protocol for large-scale gene function analysis with PANTHER (Mi et 666 al., 2013), the Scoring Tool was used to assign each protein of the capybara genome to a 667 specific protein family (and subfamily) based on a library of Hidden Markov Models 668 (HMMs). 669 670 Annotation of Repeat Elements 671 RepeatMasker open-4.0.5 (Smit et al., 2015) was used to identify and classify 672 transposable elements (TEs) and short tandem repeats by aligning the capybara genome 673 sequences against a reference library of known repeats for rodents, using default 674 parameters. For comparison, NCBI Annotation releases for 13 additional rodents were 675 used. 676 677 Preparation of Downloaded Genomes for Comparative Analyses 678 Available rodent genomes were downloaded from NCBI and Ensembl (Table S3). 679 AlignWise (Evans and Loose, 2015) was used to identify protein-coding regions and 680 extract the longest open reading frame per gene. The protocol for large-scale gene 681 function analysis with PANTHER (Mi et al., 2013) was employed to characterize 682 molecular functions in each genome. 683 684 Rodent Phylogenomics and Divergence Time Estimation 685 Single-copy gene orthologs (SCOs) across 17 glire (rodents + rabbit) genomes (Table 686 S3) were searched using an 8321-marker dataset of orthologs for glires from OrthoMaM

21 687 database v8 (Douzery et al., 2014). This dataset contains orthologous groups for 6 of the 688 17 genomes; thus, we constructed a dataset combining the 8321 OrthoMaM orthologous 689 groups with the 11 remaining genomes and performed an all-vs-all blastn (Camacho et 690 al., 2009). We used the reciprocal groups function of VESPA v1.0β (Webb et al., 2016) 691 to recover orthologous groups spanning the 17 genomes and identified 2597 692 orthologous groups, of which 370 were SCOs. For inference, the set 693 of 370 SCOs was concatenated and aligned using MUSCLE (Edgar, 2004), and poorly 694 aligned sites were trimmed using Gblocks v0.91b (Talavera and Castresana, 2007). 695 RAxML v8.2.10 (Stamatakis, 2014) with rapid bootstrap mode (n = 500) and assuming 696 a GTRGAMMA model was used to infer the phylogenomic tree. We also used a 697 species-tree approach to phylogenomic inference that takes into account the individual 698 history of each gene, which may conflict due to, e.g., incomplete lineage sorting, using 699 the software Neighbor-joining species tree (NJst; Shaw et al., 2013). For each 700 individually aligned SCO, 100 bootstrap replicates were ran on RAxML v8.2.10. A file 701 with 37000 gene trees (100 bootstraps x 370 SCOs) was used as input for NJst. 702 703 Given that both phylogenomic analyses (concatenated and species-tree approach) 704 resulted in identical topologies, divergence times were estimated based on the topology 705 obtained in the concatenated analysis using the software MCMCTREE (Yang and 706 Rannala, 2006) (PAML package). Two nodes were calibrated using fossil data (Hedges 707 et al., 2006) as follows: ancestral node of Rodentia: 76 to 88 million years ago (Ma); 708 ancestral node of Mus and Rattus: 18 to 23 Ma. 709 710 Genomic Signatures of Mutation Load 711 In order estimate the mutation load, we used the ratio of the rates of non-synonymous 712 substitutions over synonymous substitutions (dN/dS) as a proxy. Values of dN/dS were 713 estimated for the set of 229 SCOs, as recovered with the software Proteinortho (Lechner 714 et al., 2011), from 16 rodent genomes using the program CODEML from the PAML 715 package v4.8 (Yang, 2007). Likewise, dN/dS ratios for the 13 protein-coding genes of 716 the mitochondrial genomes of capybara plus 9 additional rodent mitochondrial genomes 717 were estimated (Table 3). Two evolutionary models were implemented: model 0 with 718 dN/dS held constant over the tree and model 1 with free dN/dS estimated separately for 719 each branch. Likelihood-ratio tests were used to compare the two models, and in every 720 case the model 1 was the preferred one. Only dN/dS values associated with external

22 721 branches were used as measures of mutation load. Codon-based alignments were 722 performed with MACSE (Ranwez et al., 2011) with default parameters and poorly 723 aligned regions were trimmed with Gblocks v. 0.91b (Talavera and Castresana, 2007). 724 725 To determine if the capybara has a higher genome-wide dN/dS relative to other rodents, 726 capybara-specific dN/dS values (see above) were compared with dN/dS values for the 727 rest of rodents using a bootstrap sampling. Because for most mammalian species no

728 direct estimates of Ne are available, we used body mass as a proxy for Ne. Mean dN/dS 729 values per species and log-transformed values of body mass were used to estimate the 730 effect of the reduction of effective population size on the strength of purifying selection. 731 Additionally, to test whether the observed higher values of dN/dS were caused by 732 demographic effects (i.e. higher genetic drift relative to purifying selection), a 733 correlation between nuclear and mitochondrial dN/dS values was performed given that a 734 demographic bottleneck should affect both genomes. In both cases, an additional 735 analysis of phylogenetic independent contrasts (PICs) was applied to eliminate the 736 effect of phylogenetic inertia. Assessment of linear model assumptions (e.g., 737 heteroscedasticity) was performed with the gvlma R package (R Development Core 738 Team, 2008). 739 740 Gene Family Composition 741 To search for expanded gene families in capybara, we performed gene family 742 assignments for each of the 16 rodent genomes using the protocol for large-scale gene 743 function analysis with PANTHER (Mi et al., 2013). Following Simakov et al. (2013), 744 functional repertories of the genomes were represented in a multidimensional space, 745 where each dimension corresponds to a particular gene family. A principal component 746 analysis was conducted using the prcomp function in R (R Development Core Team, 747 2008) on the count of the number of genes in each gene family as identified by 748 PANTHER. The coordinate of a genome along each dimension represents the number 749 of genes that it contains with the corresponding gene family normalized by the total 750 number of genes in that particular species. PC1 component separated the capybara from 751 the rest of rodents and explains the 20.14% of variance, while PC2 formed one big 752 cluster and explained 11.34% of the variance. An overrepresentation test was conducted 753 on the gene families that fell on both 2.5%-tails of the loadings of PC1 and PC2, to 754 determine the gene families that dominate the loadings of each principal component. A

23 755 Fisher’s exact test was then performed iteratively in R (R Development Core Team, 756 2008) on counts of each PANTHER gene family from the 5% tails of each PC against 757 the complete set of shared gene families (23110 families). To be defined as enriched, a 758 gene family had to have a significant p-value (<0.05) after a traditional Bonferroni 759 correction for multiple testing. 760 761 To conduct a genome-wide screen for gene family expansions, a table with 23110 762 shared PANTHER gene families was constructed with the counts for genes annotated to 763 those families in the 16 rodent genomes. A Fisher’s exact test was then performed 764 iteratively in R (R Development Core Team, 2008) on raw counts of each gene family, 765 comparing the number of counts found in the capybara to the background (the 766 remaining 15 rodents). To be defined as enriched, a gene family had to have a 767 significant p-value (<0.05) after a traditional Bonferroni correction for multiple testing. 768 To represent and account for the different numbers of genes across species in the 769 heatmap (Fig. 7B), the individual counts were normalized by the total gene count in 770 each species and then normalized within each gene family (across species) with the 771 scale function in R (R Development Core Team, 2008). This Z-score represented the 772 number of standard deviations below or above the mean gene family count for each 773 species. 774 775 Pairwise dN/dS Analysis 776 Pairwise comparisons between guinea pig and capybara could provide insights into the 777 genes underlying capybara’s size. We calculated pairwise dN/dS ratios to compare 778 among genes within genomes to look for signs of positive selection. A total of 11255 779 SCOs between guinea pig and capybara where obtained with the OrthoFinder software 780 v.1.1.4 (Emms and Kelly, 2015) for downstream analysis of positive selection. Specific 781 Biopython (Cock et al., 2009) and PyCogent (Knight et al., 2007) modules where used 782 to estimate dN/dS values for each SCO. In addition, we calculated the ratios of the 783 capybara-guinea pig dN/dS values to the higher of the dN/dS values between guinea 784 pig-rat, on a set of 8084 SCOs, to identify genes that evolved more rapidly on the 785 capybara lineage. Codon-based alignments were performed with MACSE (Ranwez et 786 al., 2011) with default parameters. Gaps and ambiguities were excluded from 787 alignments, and only alignments with a trimmed length above 50% of the original 788 alignment length were retained for subsequent analyses.

24 789 Positive Selection Analysis Using Codon-Based Models of Evolution 790 While pairwise dN/dS can identify genes with overall faster sequence evolution, amino 791 acid sites within a protein are expected to be under different selective pressures and 792 have different underlying dN/dS ratios (Yang et al., 2000). We used branch-site models 793 to identify positive selection on capybara versus naked mole rat, degu, chinchilla and 794 guinea pig on a set of 4452 SCOs determined with OrthoVenn software (Wang et al., 795 2015). Downstream analyses were done using the Selection Analysis Preparation 796 functions provided in VESPA v1.0β (Webb et al., 2016). We compared the likelihood 797 scores of two selection models implemented in CODEML in the PAML package v4.8 798 (Yang, 2007) using likelihood-ratio tests (LRT) as a measure of model fit: A branch-site 799 model A which attempts to detect positive selection acting on a few sites on particular 800 specified lineages or ‘foreground branches’ (dN/dS>1); and the null model (A null) 801 where codons can only evolve neutrally (dN/dS=1) or under purifying selection 802 (0

808 (pdGD). This ‘protein distance index’ or PDI (PDIi = pdCDi – pdGDi) takes a value of 809 zero if the two p-distances are equal, meaning that there has been low protein 810 divergence in the capybara lineage. A value above zero is an indicative of rapid protein 811 evolution in the capybara lineage. Genes above the top 5% of the distribution in each 812 dimension (i.e. LRT and PDI) where considered as positively selected in the capybara 813 lineage. Codon-based alignments were performed with MACSE (Ranwez et al., 2011) 814 with default parameters and poorly aligned regions were trimmed with Gblocks v. 0.91b 815 (Talavera and Castresana, 2007). 816 817 For genes exhibiting signatures of lineage-specific positive selection, we conducted a 818 GO enrichment analysis to identify Biological Process overrepresented in these genes 819 relative to the 4452 SCO set. GO Biological Process for each gene was retrieved with 820 the biomaRt R package (Durinck et al., 2009) using the protein stable IDs of guinea pig 821 orthologs. A Fisher’s exact test was then performed iteratively in R (R Development 822 Core Team, 2008) on counts of each Biological Process from the positively selected

25 823 genes against the full set of orthologs (4452 genes). To be defined as enriched, a GO 824 Biological Process had to have a significant p-value (<0.05) after a traditional 825 Bonferroni correction for multiple testing. 826 827 Identification of Proteins with Capybara-Unique Residues 828 Conservation-based methods used to identify lineage-specific amino acid replacements 829 have been proven to be good predictors of change in protein function (Jelier et al., 830 2011). We identified SCOs for 9 rodents (capybara, guinea pig, degu, chinchilla, naked 831 mole rat, Ord’s kangaroo rat, mouse, rat and thirteen-lined ground squirrel) and used 832 multiple sequence alignments (MSA) to identify positions in each SCO that are 833 conserved in all species except for the capybara. An in-house Python script was used to 834 identify unique capybara amino acid residues where a maximum of one unknown 835 residue was allowed in each position of the alignment. SCOs were identified with 836 OrthoFinder software v.1.1.4 (Emms and Kelly, 2015) and protein MSAs were 837 performed with PRANK (Löytynoja and Goldman, 2005) with default parameters. Gaps 838 were excluded from alignments, and only alignments with a trimmed length above 50% 839 of the original alignment length were retained for subsequent analyses. This procedure 840 resulted in a total of 876 SCOs retained. The SCOs were ranked according to the 841 number of unique substitutions normalized by protein length. 842 843 Gene Interaction Network Analyses 844 We used the genes that showed lineage-specific positive selection, pairwise dN/dS 845 above 1, rapid evolution (top 1%), high concentration of capybara-unique residues (top 846 5%) and capybara-expanded gene families to generate a list containing a total of 195 847 genes. Insulin was included as part of the 195 gene set given that previous analyses 848 showed signatures of adaptive evolution among Caviomorph rodents (Opazo et al., 849 2005). Of the 195 genes, only 100 genes presented known or predicted interactions 850 according to String Database (Mering et al., 2003). We used this new gene set to 851 perform a network analysis of GO Biological Process and KEGG pathways on the 100 852 interacting genes. The networks were constructed based on String DB interactions 853 (Mering et al., 2003) and plotted with Cytoscape 3.0 (Shannon et al., 2003). GO and 854 KEGG enrichment analyses were performed based on String DB information with a 855 false discovery rate of 0.05. 856

26 857 Acknowledgements 858 On behalf of the authors, I want to thank Colciencias for its incredibly generous support 859 to this project. The whole study, including the high-quality capybara genome, field trips 860 and materials was funded by the Colciencias project “Conservation Genomics of the 861 Capybara: building the foundations for its sustainable use” (identifier 1204-659-44334). 862 I would like to take the opportunity of recognising the great importance of scientific 863 research for societies, and the consequent relevance of government entities dedicated to 864 funding and promoting scientific research, especially in a country with such a huge 865 potential, as is Colombia. Additionally, I want to thank the National Environmental 866 Licensing Authority (ANLA) for the “Permiso Marco de Recolección” awarded to the 867 Universidad de Los Andes (resolution 1177, October 9th of 2014 IDB0359), which 868 made possible the collection of specimens critical for the development of the project. 869 870 I also would like to thank all the Administrative staff of the Biological Sciences 871 department and the Sequencing Lab staff for all their support during various stages of 872 my master's process. 873 874 I want to thank Dr. Mary J. O’Connell, Dr. Daniel Cadena and Dr. Silvia Restrepo for 875 their great insightful and helpful comments on the initial and final steps of the 876 document. The comments were critical at choosing the direction of some analyses and 877 their further interpretation, which made the document scientifically more robust. 878 879 I want to thank the Biom|ics lab for all their support and great moments. They became 880 as part of my family since 2014, and I am very grateful for having had the opportunity 881 of getting to know people with such a human quality. Particularly, I want to give a very 882 special thanks to my advisor and lab PI, Dr. Andrew Crawford. He has always been 883 incredibly supportive, has inspired me to always give my best and has been my mentor 884 in almost everything I know about genetics and evolution. Despite my different interests 885 regarding the lab’s main focus, he has always encouraged my ideas and believed in my 886 passion and abilities to achieve them. This project could not have been a reality if he has 887 not believed in my initial idea and interest about gigantism and cancer. 888 889 I want to thank to my family, because they have been my backbone in all this process. 890 They have always support my passion for biology and evolution, and more importantly,

27 891 have been my main emotional support. To my mom, dad and sister: Thanks for so much 892 for everything and I hope I can give you back so much. Finally, I want to thank some 893 people who have also been a tremendous emotional and intellectual support: Laura 894 Céspedes, Sofía Rojas, Catalina Sánchez and Roberto Márquez. 895

28 896 REFERENCES

897 Abegglen, L. M., Caulin, A. F., Chan, A., Lee, K., Robinson, R., Campbell, M. S., et al. (2015). Potential 898 Mechanisms for Cancer Resistance in Elephants and Comparative Cellular Response to DNA Damage in 899 Humans. JAMA, 314(17), 1850–1852.

900 Albertin, C. B., Simakov, O., Mitros, T., Wang, Z. Y., Pungor, J. R., Edsinger-Gonzales, E., et al. (2015). 901 The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature, 902 524(7564), 220–224.

903 Andervort, H. B. & Dunn, T. B. (1962). Occurrence of tumors in wild house mice. J. Natl Cancer Instit., 904 28, 1153–1163

905 Andò, S., & Catalano, S. (2012). The multifactorial role of leptin in driving the breast cancer 906 microenvironment. Nature Reviews. Endocrinology, 8(5), 263–75.

907 McNeill Alexander, R. (1998). All Time Giants: The largest animals and their problems. Palaeontology, 908 41(6), 1231-1245.

909 Álvarez, A., Arévalo, R. L. M., Verzi, D. H., & Aires, B. (2017). Diversification patterns and size 910 evolution in caviomorph rodents. Biol. J. Linn. Soc., 20, 1–16.

911 Amson, R., Pece, S., Marine, J., Paolo, P., Fiore, D., Telerman, A., & Supe, N. (2013). TPT1 / TCTP- 912 regulated pathways in phenotypic reprogramming. Trends in Cell Biology, 23(1), 37–46.

913 Anisimova, M., Bielawski, J. P., & Yang, Z. (2002). Accuracy and Power of Bayes Prediction of Amino 914 Acid Sites Under Positive Selection. Molecular Biology and Evolution, 19(6), 950–958.

915 Antoine, P.O., Marivaux, L., Croft, D. a., Billet, G., Ganerod, M., Jaramillo, C., et al. (2012). Middle 916 Eocene rodents from Peruvian Amazonia reveal the pattern and timing of caviomorph origins and 917 biogeography. Proc. R. Soc. B, 279(1732), 1319–1326.

918 Baker, J., Meade, A., Pagel, M., & Venditti, C. (2015). Adaptive evolution toward larger size in 919 mammals. Proc. Natl. Acad. Sci, 112(16), 5093–5098.

920 Ballard, J. W. O., & Whitlock, M. C. (2004). The incomplete natural history of mitochondria. Molecular 921 Ecology, 13(4), 729–744.

922 Benson, R. B. J., Campione, N. E., Carrano, M. T., Mannion, P. D., Sullivan, C., Upchurch, P., & Evans, 923 D. C. (2014). Rates of Dinosaur Body Mass Evolution Indicate 170 Million Years of Sustained 924 Ecological Innovation on the Avian Stem Lineage. PLoS Biology, 12(5).

925 Bertram, J., Gomez, K., & Masel, J. (2017). Predicting patterns of long‐term adaptation and extinction 926 with population genetics. Evolution, 71(2), 204-214.

927 Blake, J. A., Eppig, J. T., Kadin, J. A., Richardson, J. E., Smith, C. L., & Bult, C. J. (2017). Mouse 928 Genome Database (MGD)-2017: community knowledge resource for the laboratory mouse. Nucleic acids 929 research, 45(D1), D723-D729.

930 Blasco, M. A. (2005). Telomeres and human disease: ageing, cancer and beyond. Nat Rev Genet, 6(8), 931 611–622.

932 Bonner,J. T. (2006).Why size matters: From bacteria to blue whales. Princeton University Press.

933 Burt, A. & Trivers, R. (2006). Genes in conflict: The biology of selfish genetic elements. Harvard 934 University Press. Cambridge, Massachussetts.

29 935 Cairns, J. (1975). Mutation selection and the natural history of cancer. Nature, 255, 197–200.

936 Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., & Madden, T. L. (2009). 937 BLAST+: architecture and applications. BMC Bioinformatics, 10(1), 421.

938 939 Campbell, M. S., Holt, C., Moore, B., & Yandell, M. (2014). Genome Annotation and Curation Using 940 MAKER and MAKER-P. Curr Protoc Bioninformatics, 48, 4.11.1-4.11.39.

941 Cantarel, B. L., Korf, I., Robb, S. M. C., Parra, G., Ross, E., Moore, B., et al. (2008). MAKER: An easy- 942 to-use annotation pipeline designed for emerging model organism genomes. Genome Research, 18(1), 943 188–196.

944 Calder,W. A., III. 1984. Size, function, and life history. Harvard Univ. Press, Cambridge,MA.

945 Caulin, A. F., & Maley, C. C. (2011). Peto’s Paradox: evolution’s prescription for cancer prevention. 946 Trends in Ecology & Evolution, 26(4), 175–182.

947 Chandra, V., Albagli-curiel, O., Polak, M., Scharfmann, R., Chandra, V., Albagli-curiel, O., et al. (2014). 948 RFX6 Regulates Insulin Secretion by Modulating Ca2+ Homeostasis in Human b Cells. Cell Reports, 1– 949 13.

950 Charlesworth, B. (2009). Effective population size and patterns of molecular evolution and variation. 951 Nature Reviews Genetics, 10(3), 195–205.

952 Clark, N. L., & Aquadro, C. F. (2010). A Novel Method to Detect Proteins Evolving at Correlated Rates: 953 Identifying New Functional Relationships between Coevolving Proteins. Mol. Biol. Evol, 27(5), 1152– 954 1161.

955 Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., et al. (2009). Biopython: 956 Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 957 25(11), 1422–1423.

958 Conlon, I., & Raff, M. (1999). Size control in animal development. Cell, 96(2), 235–244.

959 Crespi, B., & Summers, K. (2005). Evolutionary biology of cancer. Trends in Ecology and Evolution, 960 20(10), 545–552.

961 Damuth, J. (1981). Population density and body size in mammals. Nature, 290, 699-700

962 Dantzer, F., Giraud-Panis, M.-J., Jaco, I., Amé, J. C., Schultz, I., Blasco, M. et al. (2004). Functional 963 Interaction between Poly(ADP-Ribose) Polymerase 2 (PARP-2) and TRF2: PARP Activity Negatively 964 Regulates TRF2. Molecular and Cellular Biology, 24(4), 1595–1607.

965 De Jong, G., & Bochdanovits, Z. (2003). Latitudinal clines in Drosophila melanogaster: Body size, 966 allozyme frequencies, inversion frequencies, and the insulin-signalling pathway. J. Genet., 82, 207–223.

967 De Lange, T. (2005). Shelterin: The protein complex that shapes and safeguards human telomeres. Genes 968 and Development, 19(18), 2100–2110.

969 Demuth, J. P., Bie, T. De, Stajich, J. E., Cristianini, N., & Hahn, M. W. (2006). The Evolution of 970 Mammalian Gene Families. PLoS ONE, 1(1), e85.

971 Douzery, E. J. P., Scornavacca, C., Romiguier, J., Belkhir, K., Galtier, N., Delsuc, F., & Ranwez, V. 972 (2014). OrthoMaM v8: A database of orthologous exons and coding sequences for comparative genomics 973 in mammals. Molecular Biology and Evolution, 31(7), 1923–1928.

30 974 Durinck, S., Spellman, P. T., Birney, E., & Huber, W. (2009). Mapping identifiers for the integration of 975 genomic datasets with the R/Bioconductor package biomaRt. Nature Protocols, 4(8), 1184–1191.

976 Eastman, J. M., Alfaro, M. E., Joyce, P., Hipp, A. L., & Harmon, L. J. (2011). A Novel comparative 977 method for identifying shifts in the rate of character evolution on trees. Evolution, 65(12), 3578–3589.

978 Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. 979 Nucleic acids research, 32(5), 1792-1797.

980 Elliot, M., & Mooers, A. (2014). Inferring ancestral states without assuming neutrality or gradualism 981 using a stable model of continuous character evolution. BMC Evolutionary Biology, 14(226), 1–39.

982 Emms, D. M., & Kelly, S. (2015). OrthoFinder: solving fundamental biases in whole genome 983 comparisons dramatically improves orthogroup inference accuracy. Genome Biology, 16(1), 157.

984 Erickson, G. M., Rogers, K. C., & Yerby, S. A. (2001). Dinosaurian growth patterns and rapid avian 985 growth rates. Nature, 412(6845), 429–433.

986 Evans, T., & Loose, M. (2015). AlignWise: a tool for identifying protein-coding sequence and correcting 987 frame-shifts. BMC Bioinformatics, 16(1), 376.

988 Fang, X., Nevo, E., Han, L., Levanon, E. Y., Zhao, J., Avivi, A. et al. (2014). Genome-wide adaptive 989 complexes to underground stresses in blind mole rats Spalax. Nature Communications, 5, 1-9

990 Felsenstein, J. (1988). Phylogenies and Quantitative Characters. Ann. Rev. Ecol. Syst., 19, 445–471.

991 Fowden, A. L. (2003). The insulin-like growth factors and feto-placental growth. Placenta, 24(8–9), 803– 992 812

993 Fraumeni, J. F. (1967). Stature and malignant tumours of bone in childhood and adolesence. Cancer, 20, 994 967–973

995 Freudenthal, M., & Martín-Suárez, E. (2013). Estimating body mass of fossil rodents. Scripta Geologica, 996 8(145), 1–130.

997 Gao, T., McKenna, B., Li, C., Reichert, M., Nguyen, J., Singh, T. et al. (2014). Pdx1 maintains β cell 998 identity and function by repressing an α cell program. Cell Metabolism, 19(2), 259–271.

999 Gould, S. J. (1966). Allometry and Size in Ontogeny and Phylogeny. Biol. Rev, 41, 587–640.

1000 Gorbunova, V., Seluanov, A., Zhang, Z., Gladyshev, V. N., & Vijg, J. (2014). Comparative genetics of 1001 longevity and cancer: insights from long-lived rodents. Nature Reviews. Genetics, 15(8), 531–40.

1002 Gurevich, A., Saveliev, V., Vyahhi, N., & Tesler, G. (2013). QUAST: Quality assessment tool for 1003 genome assemblies. Bioinformatics, 29(8), 1072–1075. 1004 Haldane, J.B.S. (1927). ‘On being the right size’ Possible Worlds and Other Essays.

1005 Hanahan, D., & Weinberg, R. (2000). The hallmarks of cancer. Cell, 100, 57–70.

1006 Hedges, S. B., Dudley, J., & Kumar, S. (2006). TimeTree: a public knowledge-base of divergence times 1007 among organisms. Bioinformatics, 22(23), 2971-2972.

1008 Hamano, T., Terasawa, F., Tachikawa, Y., Murai, A., & Mori, T. (2014). Squamous Cell Carcinoma in a 1009 Capybara (Hydrochoerus hydrochaeris). J. Vet. Med. Sci., 76(9), 1301–1304.

31 1010 Hensen, K., Valckenborgh, I. C. C. Van, Kas, K., Ven, W. J. M. Van De, & Voz, M. L. (2002). The 1011 Tumorigenic Diversity of the Three PLAG Family Members Is Associated with Different DNA Binding 1012 Capacities 1. Cancer Research, 62, 1510–1517.

1013 Hensen, K., Braem, C., Declercq, J., Van Dyck, F., Dewerchin, M., Fiette, L. et al. (2004). Targeted 1014 disruption of the murine Plag1 proto-oncogene causes growth retardation and reduced fertility. 1015 Development Growth and Differentiation, 46(5), 459–470.

1016 Hansen, T. F. & Houle, D. (2004) Evolvability, stabilizing selection, and the problem of stasis. In 1017 Pigliucci, M. & Preston, K. (Eds.) Phenotypic Integration: Studying the ecology and evolution fo complex 1018 phenotypes. Oxford, Oxford University Press.

1019 Hershkovitz, P. (1972). The recent mammals of the neotropical region: a zoogeographic and ecological 1020 review. In A. Keast, F. C. Erk, & B. Glass (Eds.), Evolution, Mammals and Southern Continents (pp. 1021 311–431). Albany: State University of New York.

1022 Hutchon, D., & Douzery, E. J. . (2001). From the Old World to the New World: A Molecular Chronicle 1023 of the Phylogeny and Biogeography of Hystricognath Rodents. Molecular Phylogenetics and Evolution, 1024 20(2), 238–251.

1025 Ito, Y., Hart, J. R., Ueno, L., & Vogt, P. K. (2014). Oncogenic activity of the regulatory subunit p85β of 1026 phosphatidylinositol 3-kinase (PI3K). Proceedings of the National Academy of Sciences, 111(47), 16826– 1027 16829.

1028 Jelier, R., Semple, J. I., Garcia-verdugo, R., & Lehner, B. (2011). Predicting phenotypic variation in yeast 1029 from individual genome sequences. Nat Genet, 43(12), 1270–1274.

1030 Jelínek, F. (2003). Spontaneous Tumours in Guinea Pigs. Acta Vet. Brno, 72, 221–228.

1031 Karim, L., Takeda, H., Lin, L., Druet, T., Arias, J. a C., Baurain, D., et al. (2011). Variants modulating 1032 the expression of a chromosome domain encompassing PLAG1 influence bovine stature. Nature 1033 Genetics, 43(5), 405–413.

1034 Keane, M., Semeiks, J., Webb, A. E., Li, Y. I., Quesada, V., Craig, T. et al. (2015). Insights into the 1035 evolution of longevity from the bowhead whale genome. Cell Reports, 10(1), 112–122.

1036 Kershaw, M. H., Westwood, J. A., & Darcy, P. K. (2013). Gene-engineered T cells for cancer therapy. 1037 Nature Reviews Cancer, 13(8), 525–541.

1038 Kim, E. B., Fang, X., Fushan, A. A., Huang, Z., Lobanov, A. V, Han, L., et al. (2011). Genome 1039 sequencing reveals insights into physiology and longevity of the naked mole rat. Nature, 479(7372), 223– 1040 227.

1041 King, G. L., & Kahn, C. R. (1981). Non-parallel evolution of metabolic and growth-promoting functions 1042 of insulin. Nature, 292, 644-646

1043 Knight, R., Maxwell, P., Birmingham, A., Carnes, J., Caporaso, G., Easton, B. C., et al. (2007). 1044 PyCogent : a toolkit for making sense from sequence. Genome Biology, 8(8).

1045 Kozeowski, J. A. N., & Gawelczyk, A. T. (2014). Why are species’ body size distributions usually 1046 skewed to the right? Functional Ecology, 16(4), 419–432.

1047 LaBarbera, M. (1989). Analyzing body size as a factor in ecology and evolution. Annu. Rev. Ecol. Evol. 1048 Syst., 20, 97–117.

32 1049 Lailvaux, S. P., & Husak, J. F. (2014). The Life History of Whole-Organism Performance. The Quarterly 1050 Review of Biology, 89(4), 285–318.

1051 Landrette, S., Medera, D., He, F., & Castilla, L. (2011). The PlagL2 transcription factor activates Mpl 1052 transcription and signaling in hematopoietic progenitor and leukemia cells. Leukemia, 25(4), 655–662.

1053 Laplante, M., & Sabatini, D. M. (2012). mTOR Signaling in Growth Control and Disease. Cell, 149(2), 1054 274–293.

1055 Lechner, M., Findeiß, S., Steiner, L., Marz, M., Stadler, P., & Prohaska, S. (2011). Proteinortho: 1056 Detection of (Co-)orthologs in large-scale analysis. BMC Bioinformatics, 12(124), 1–9.

1057 Leroi, A. M., Koufopanou, V., & Burt, A. (2002). Cancer selection. Nature, 1326, 226–231.

1058 Levinovitz, A., Norstedt, G., van den Berg, S., Robinson, I. C., & Ekstrom, T. J. (1992). Isolation of an 1059 insulin-like growth factor II cDNA from guinea pig liver: expression and developmental regulation. 1060 Molecular and Cellular Endocrinology, 89(1–2), 105–110.

1061 Lovegrove, B. G. (2001). The Evolution Of Body Armor In Mammals: Plantigrade Constraints Of Large 1062 Body Size. Evolution, 55(7), 1464–1473.

1063 Lovegrove, B. G., & Haines, L. (2004). The evolution of placental body sizes: Evolutionary 1064 history, form, and function. Oecologia, 138(1), 13–27.

1065 Li, W., Jaroszewzki, L., & Godzik, A. (2001). Clustering of highly homologous sequences to reduce the 1066 size of large protein databases. Bioinformatics, 17(3), 282–283.

1067 Lohr, J. N., & Haag, C. R. (2015). Genetic load, inbreeding depression, and hybrid vigor covary with 1068 population size: An empirical evaluation of theoretical predictions. Evolution, 69(12), 3109–3122.

1069 Löytynoja, A., & Goldman, N. (2005). An algorithm for progressive multiple alignment of sequences 1070 with insertions. Proc. Natl. Acad. Sci, 102(30), 10557–62

1071 Lui, J. C. & Baron, J. (2011). Mechanisms limiting body growth in mammals. Endocrine Reviews, 32(3), 1072 422–440.

1073 Lynch, M., & Gabriel, W. (1990). Mutation Load and the Survival of Small Populations. Evolution, 1074 44(7), 1725–1737.

1075 Leffler, E. M., Bullaughey, K., Matute, D. R., Meyer, W. K., Se, L., Venkat, A., Andolfatto, P. & 1076 Przeworski, M. (2012). Revisiting an Old Riddle : What Determines Genetic Diversity Levels within 1077 Species? PLoS Biology, 10(9).

1078 Liow, L. H., M. Fortelius, E. Bingham, K. Lintulaakso, H. Mannila, L. Flynn, and N. C. Stenseth. (2008). 1079 Higher origination and extinction rates in larger mammals. Proc. Natl. Acad. Sci, 105:6097–6102.

1080 Mares, M. A. & Ojeda, R. A. (1982). Patterns of diversity and adaptation in South American 1081 hystricognath rodents. In M. A. Mares & H. . Genoways (Eds.), Mammalian Biology in South America 1082 (pp. 393–432). Pymatuning Lab Ecol., Univ. Pittsburgh.

1083 Margetic, S., Gazzola, C., Pegg, G., & Hill, R. (2002). Leptin: a review of its peripheral actions and 1084 interactions. International Journal of Obesity, 26, 1407–1433.

1085 Mering, C. Von, Huynen, M., Jaeggi, D., Schmidt, S., Bork, P., & Snel, B. (2003). STRING : a database 1086 of predicted functional associations between proteins, 31(1), 258–261.

33 1087 Mi, H., Muruganujan, A., Casagrande, J. T., & Thomas, P. D. (2013). Large-scale gene function analysis 1088 with the PANTHER classification system. Nature Protocols, 8(8), 1551–1566.

1089 Murtaugh, L. C. (2007). Pancreas and beta-cell development: from the actual to the possible. 1090 Development, 134(3), 427–38.

1091 Nowak, R.M. & Paradiso, J.L. (1983). Walker’s mammals of the world. 4th Ed. The Johns Hopkins 1092 University Press, Baltimore and London.

1093 Ni, S., Chen, L., Li, M., Zhao, W., Shan, X., Wu, M. et al. (2016). PKC iota promotes cellular 1094 proliferation by accelerated G1/S transition via interaction with CDK7 in esophageal squamous cell 1095 carcinoma. Tumor Biology, 37(10), 13799–13809.

1096 O’Connor, M. S., Safari, A., Liu, D., Qin, J., & Songyang, Z. (2004). The human Rap1 protein complex 1097 and modulation of telomere length. Journal of Biological Chemistry, 279(27), 28585–28591.

1098 Ohta, T. (1992). The nearly neutral theory of molecular evolution. Annual Review of Ecology and 1099 Systematics, 23(1), 263-286.

1100 Opazo, J. C., Palma, R. E., Melo, F., & Lessa, E. P. (2005). Adaptive evolution of the insulin gene in 1101 caviomorph rodents. Molecular Biology and Evolution, 22(5), 1290–1298.

1102 Pagliuca, F. W., & Melton, D. A. (2013). How to make a functional β-cell. Development, 140(12), 2472– 1103 83.

1104 Pan, H., Yu, H., Ravi, V., Li, C., Lee, A. P., Lian, M. M. et al. (2016). The genome of the largest bony 1105 fish, ocean sunfish (Mola mola), provides insights into its fast growth rate. GigaScience, 5, 36.

1106 Parra, G., Bradnam, K., & Korf, I. (2007). CEGMA: a pipeline to accurately annotate core genes in 1107 eukaryotic genomes. Bioinformatics, 23(9), 1061–7.

1108 Piccand, J., Strasser, P., Hodson, D. J., Meunier, A., Ye, T., Keime, C., … Gradwohl, G. (2014). Rfx6 1109 Maintains the Functional Identity of Adult Pancreatic β Cells. Cell Reports, 9(6), 2219–2232.

1110 Piganeau, G., & Eyre-Walker, A. (2009). Evidence for variation in the effective population size of animal 1111 mitochondrial DNA. PLoS ONE, 4(2), 2–9.

1112 Popadin, K., Polishchuk, L. V, Mamirova, L., Knorre, D., & Gunbin, K. (2007). Accumulation of slightly 1113 deleterious mutations in mitochondrial protein-coding genes of large versus small mammals. Proc. Nat. 1114 Acad. Sci, 104(33), 13390–13395.

1115 Pryce, J. E., Hayes, B. J., Bolormaa, S., & Goddard, M. E. (2011). Polymorphic regions affecting human 1116 height also control stature in cattle. Genetics, 187(3), 981–984.

1117 Purvis, A. & Orme, D.L. (2005). Evolutionary trends in body size. In J.C. Carel, P.A. Kelly & Y. 1118 Christen (Eds.), Deciphering Growth (pp 1-18). Heidelberg. Springer.

1119 Putnam, N. H., Connell, B. O., Stites, J. C., Rice, B. J., Hartley, P. D., Sugnet, C. W. et al. (2015). 1120 Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Research, 1121 26, 342–350.

1122 Puttick, M. N., & Thomas, G. H. (2015). Fossils and living taxa agree on patterns of body mass 1123 evolution: a case study with Afrotheria. Proc. R. Soc. B, 282, 20152023.

1124 R Development Core Team (2008). R: A language and environment for statistical computing. R 1125 Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R- 1126 project.org.

34 1127 Ranwez, V., Harispe, S., Delsuc, F., & Douzery, E. J. P. (2011). MACSE: Multiple Alignment of Coding 1128 SEquences accounting for frameshifts and stop codons. PLoS ONE, 6(9).

1129 Ripple, W. J., Wolf, C., Newsome, T. M., Hoffmann, M., Wirsing, A. J., & McCauley, D. J. (2017). 1130 Extinction risk is most acute for the world’s largest and smallest vertebrates. Proc. Nat. Acad. Sci, 1–6.

1131 Rubin, C., Megens, H., Martinez, A., Maqbool, K., & Sayyab, S. (2012). Strong signatures of selection in 1132 the domestic pig genome, 109(48), 19529–19536.

1133 Russell, J. H., & Ley, T. J. (2002). Lymphocyte-mediated cytotoxicity. Annual Review of Immunology, 1134 20(1), 323–370.

1135 Sánchez-Villagra, M. R., Aguilera, O. A., & Horovitz, I. (2003). The Anatomy of the World’s Largest 1136 Extinct Rodent. Science, 301(5640), 1708–1710.

1137 Savage, V. M., Allen, A. P., Brown, J. H., Gillooly, J. F., Herman, A. B., Woodruff, W. H., & West, G. 1138 B. (2007). Scaling of number, size, and metabolic rate of cells with body size in mammals. Proc. Natl. 1139 Acad. Sci, USA 104(11), 4718–4723.

1140 Seluanov, A., Chen, Z., Hine, C., Sasahara, T. H. C., Antonio, A. C., Ribeiro, M. et al. (2007). 1141 Telomerase activity coevolves with body mass, not lifespan. Aging Cell, 6(1), 45–52.

1142 Schaffer, L. (1994). A model for insulin binding to the insulin receptor. Eur. J. Biochem, 221, 1127– 1143 1132.

1144 Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., et al. (2003). Cytoscape: a 1145 software environment for integrated models of biomolecular interaction networks. Genome research, 1146 13(11), 2498-2504.

1147 Shaw, T. I., Ruan, Z., Glenn, T. C., & Liu, L. (2013). STRAW: species tree analysis web server. Nucleic 1148 Acids Research, 41(W1), W238-W241.

1149 Shingleton, A.W (2011). Evolution and the regulation of growth and body size. In T. Flatt & A. Heyland 1150 (Eds.), Mechanisms of life history evolution: The genetics and physiology of life history traits and trade- 1151 offs (pp 43-55). Oxford University Press.

1152 Sibly, R. M., & Brown, J. H. (2007). Effects of body size and lifestyle on evolution of mammal life 1153 histories. Proc. Nat. Acad. Sci, 104(45), 17707–17712.

1154 Simakov, O., Marletaz, F., Cho, S.-J., Edsinger-Gonzales, E., Havlak, P., Hellsten, U., et al. (2013). 1155 Insights into bilaterian evolution from three spiralian genomes. Nature, 493(7433), 526–531.

1156 Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., & Zdobnov, E. M. (2015). BUSCO: 1157 Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 1158 31(19), 3210–3212.

1159 Slater, G. J., Goldbogen, J., & Pyenson, N. D. (2017). Independent evolution of baleen whale gigantism 1160 linked to Plio-Pleistocene ocean dynamics. Proc. R. Soc. B, 284, 20170546.

1161 Smit A, Hubley, R & Green, P. 2013-2015. RepeatMasker Open-4.0.

1162 Srivorakul, S., Boonsri, K., Vechmanus, T., & Boonthong, P. (2017). Localized histiocytic sarcoma in a 1163 captive capybara (Hydrochoerus hydrochaeris). Thai J. Vet. Med, 47(1), 131–135.

1164 Stamatakis, A. (2014) RAxML Version 8: A tool for Phylogenetic Analysis and Post-Analysis of Large 1165 Phylogenies". Bioinformatics, 30 (9): 1312-1313.

35 1166 Sookias, R. B., Benson, R. B. J., & Butler, R. J. (2012). Biology, not environment, drives major patterns 1167 in maximum tetrapod body size through time. Biology Letters, 8(4), 674–677.

1168 Stoffregen, D. A., Prowten, A. W., Steinberg, H., Anderson, W. I., Stoffregen, A., Prowten, W., & 1169 Anderson, I. (1993). A Fibrosarcoma in the Skeletal Muscle of a Capybara (Hydrochoerus hydrochaeris). 1170 J. Wildlife Diseases, 29(2), 345–348.

1171 Sulak, M., Fong, L., Mika, K., Chigurupati, S., Yon, L., & Nigel, P. (2015). TP53 copy number 1172 expansion correlates with the evolution of increased body size and an enhanced DNA damage response in 1173 elephants. bioRxiv.

1174 Summers, K., Silva, J., & Farwell, M. A. (2002). Intragenomic conflict and cancer. Medical Hypotheses, 1175 59(2), 170–179.

1176 Sutter, N. B., Bustamante, C. D., Chase, K., Gray, M. M., Zhao, K., Zhu, L., et al. (2007). A Single IGF1 1177 Allele Is a Major Determinant of Small Size in Dogs. Science, 316, 112–115.

1178 Tacutu, R., Craig, T., Budovsky, A., Wuttke, D., Lehmann, G., Taranukha, D., Costa, J., Fraifeld, V. E., 1179 de Magalhaes, J. P. (2013). Human Ageing Genomic Resources: Integrated databases and tools for the 1180 biology and genetics of ageing. Nucleic Acids Research, 41(D1), D1027-D1033

1181 Talavera, G., and Castresana, J. (2007). Improvement of phylogenies after removing divergent and 1182 ambiguously aligned blocks from protein sequence alignments. Systematic Biology ,56, 564-577.

1183 Tjalma, R. A. (1966) Canine bone sarcoma: estimation of relative risk as function of body size. J. Natl 1184 Cancer Instit. 36, 1137–1150.

1185 Taniguchi, C. M., Winnay, J., Kondo, T., Bronson, R. T., Guimaraes, A. R., Aleman, J. O. et al. (2010). 1186 The PI3K regulatory subunit p85α can exert tumor suppressor properties through negative regulation of 1187 growth factor signalling. Cancer Research, 70(13), 5305–5315.

1188 Thomas, P. D., Campbell, M. J., Kejariwal, A., Mi, H., Karlak, B., Daverman, R. et al. (2003). 1189 PANTHER: A library of protein families and subfamilies indexed by function. Genome Research, 13(9), 1190 2129–2141.

1191 Tollis, M., Boddy, A. M., & Maley, C. C. (2017a). Peto’s Paradox: how has evolution solved the problem 1192 of cancer prevention? BMC Biology, 15(1), 60.

1193 Tollis, M., Schiffman, J. D., & Boddy, A. M. (2017b). Evolution of cancer suppression as revealed by 1194 mammalian comparative genomics. Current Opinion in Genetics and Development, 42, 40–47.

1195 Upham, N. S., & Patterson, B. D. (2012). Diversification and biogeography of the Neotropical 1196 caviomorph lineage Octodontoidea (Rodentia: ). Molecular Phylogenetics and Evolution, 1197 63(2), 417–429.

1198 Van der Bruggen, P., Traversari, C., Chomez, P., Lurquin, C., De Plaen, E., Van den Eynde, B., et al. 1199 (1991). A gene encoding an antigen recognized by cytolytic T lymphocytes on a human melanoma. 1200 Science, 254(5038), 1643–1647.

1201 Vucetich, M., & Deschamps, C. M. (2015). Roedores gigantes en el Museo de La Plata. Museo, 27.

1202 Wang, Y., Coleman-Derr, D., Chen, G., & Gu, Y. Q. (2015). OrthoVenn: A web server for genome wide 1203 comparison and annotation of orthologous clusters across multiple species. Nucleic Acids Research, 1204 43(W1), W78–W84

1205 Waterston, R. H., & Pachter, L. (2002). Initial sequencing and comparative analysis of the mouse 1206 genome. Nature, 420(6915), 520-562.

36 1207 Webb, A. E., Walsh, T. A., & O’Connell, M. J. (2016). VESPA: Very large-scale Evolutionary and 1208 Selective Pressure Analyses. PeerJ Preprints, 4, e1895v1.

1209 Weisenfeld, N.I., Yin, S., Sharpe, T., Lau, B., Hegarty, R., Holmes, L., Sogoloff, B., Tabbaa, D., 1210 Williams, L., Russ, C., Nusbaum, C., Lander, E.S., MacCallum, I., Jaffe, D.B. (2014). Comprehensive 1211 variation discovery in single human genomes. Nature Genetics 46, 1350-1355.

1212 Yandell, M., & Ence, D. (2012). A beginner’s guide to eukaryotic genome annotation. Nature Reviews 1213 Genetics, 13(5), 329–342.

1214 Yang, Z., Nielsen, R., Goldman, N., & Pedersen, a M. (2000). Codon-substitution models for 1215 heterogeneous selection pressure at amino acid sites. Genetics, 155(1), 431–49.

1216 Yang Z, Rannala B. (2006) Bayesian estimation of species divergence times under a molecular clock 1217 using multiple fossil calibrations with soft bounds. Mol Biol Evol.,23:212–26.

1218 Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and 1219 Evolution, 24(8), 1586–91.

1220 Yip, K. W., & Reed, J. C. (2008). Bcl-2 family proteins and cancer. Oncogene, 27(50), 6398–6406.

1221 Yuan, T. L., & Cantley, L. C. (2008). PI3K pathway alterations in cancer: variations on a theme. 1222 Oncogene, 27(41), 5497–5510.

1223 Zhang, J. (2004). Frequent False Detection of Positive Selection by the Likelihood Method with Branch- 1224 Site Models. Mol. Biol. Evol, 21(7), 1332–1339.

1225 Zhang, J., Nielsen, R., & Yang, Z. (2005). Evaluation of an Improved Branch-Site Likelihood Method for 1226 Detecting Positive Selection at the Molecular Level. Mol. Biol. Evol., 22(12): 2472–2479

1227 Zheng, H., Ying, H., Wiedemeyer, R., Yan, H., Quayle, S. N., Ivanova, E. V., et al. (2010). PLAGL2 1228 Regulates Wnt Signaling to Impede Differentiation in Neural Stem Cells and Gliomas. Cancer Cell, 1229 17(5), 497–509.

1230 Zhu, H., Zhou, Z., Wang, D., Liu, W., & Zhu, H. (2013). Hippo pathway genes developed varied exon 1231 numbers and coevolved functional domains in metazoans for species specific growth control. BMC 1232 Evolutionary Biology, 13(1), 76.

1233

1234

1235

37 1236 FIGURES AND TABLES 1237 1238 Table 1. Genome assembly statistics of the capybara genome. 1239 Starting Assembly Final Assembly (DISCOVAR, by the Broad Institute ) (HiRise, by Dovetail) Total length (Mb) 2734.8 2737.7 Contig N50 (Kb) 148.4 161.1 Number of scaffolds 656018 625482 Scaffold N50 (Mb) 0.202 12.2 1240 1241

38 1242 Table S1. Reconstruction of ancestral body mass for the caviomorph clade showed that 1243 capybara likely evolved from a relatively small ancestor. Body mass reconstruction was 1244 done with STABLETRAITS (Elliot and Mooers, 2014), which accounts for rate variation 1245 in trait evolution along a phylogeny. Results from a multi-rate model (α<2) were 1246 compared with a BM model (α=2) to select the best fitting model based on the Deviance 1247 Information Criterion (DIC). 1248 Ancestral state DIC multi-rate DIC single-rate Node ΔDIC Best fitting model (gr - 95% CI) model (α<2) model (α=2) Hydrochoerus - 1132 (437, 23225) Kerodon Multi-rate model 224.59 303.2 78.61 (α<2) Caviomorpha 971 (221, 5135) 1249

39 1250 Table S2. Field and collection information of the two specimens analyzed for genome 1251 size estimation. 1252 Collection Family Species Locality Country Datum Latitude Longitude Number Hydrochoerus Caviidae hydrochaeris AJC 5199 Cabuyaro, Meta Colombia WGS84 4.284025 -72.7256472 Hydrochoerus Caviidae hydrochaeris AJC 5198 Cabuyaro, Meta Colombia WGS84 4.284025 -72.7256472 1253

40 1254 Table S3. Published genomes used for comparative analyses. a Species used for 1255 mitochondrial genome analyses. Mitochondrial genomes were downloaded from the 1256 Organelle Genome Resources database of NCBI. 1257 Clade Species Genome version Database Hystricognathi Guinea pig (Cavia porcellus)a cavPor3 ENSEMBL Hystricognathi Long-tailed chinchilla (Chinchilla lanigera)a ChiLan1.0 NCBI Hystricognathi Degu (Octodon degus)a OctDeg1.0 NCBI

Hystricognathi Naked mole rat (Heterocephalus glaber)a HetGla_female_1.0 NCBI Mouse-related clade Mouse (Mus musculus)a GRCm38.p5 ENSEMBL Mouse-related clade Rat (Rattus norvegicus)a Rnor_6.0 ENSEMBL Mouse-related clade Prairie vole (Microtus ochrogaster) MicOch1.0 NCBI Mouse-related clade Golden hamster (Mesocricetus auratus)a MesAur1.0 NCBI North American deer mouse (Peromyscus Mouse-related clade Pman1.0 NCBI maniculatus) Mouse-related clade Blind mole rat (Spalax galili)a S.galili_v1.0 NCBI Mouse-related clade Lesser Egyptian jerboa (Jaculus jaculus)a JacJac1.0 NCBI Mouse-related clade Ord's kangaroo rat (Dipodomys ordii) Dord_2.0 NCBI Mouse-related clade North American beaver (Castor canadensis) C.can genome v1.0 NCBI Thirteen-lined ground squirrel (Ictidomys Squirrel-related-clade SpeTri2.0 NCBI tridecemlineatus) Squirrel-related-clade Alpine marmot (Marmota marmota) marMar2.1 NCBI

Outgroup Rabbit (Oryctolagus cuniculus) OryCun2.0 NCBI 1258

41 1259 Figure 1. Divergence times, genome statistics and body mass of representative 1260 rodents used for comparative genomics analyses. (A) Divergence times of rodent 1261 species using the topology obtained from the phylogenomic analysis (see Experimental 1262 Procedures). MYA: Million years ago (B) Basic genome statistics of rodents used for 1263 comparative analyses. Neither Repeat Content nor Gene Content was available for Deer 1264 mouse and marmot (C) Body mass ranges from 20.5 gr, being the mouse the smallest

1265 surveyed rodent, to 55,000 gr in the capybara, the largest living rodent. 1266 1267 Figure 2. Bayesian sampling of evolutionary rates of body mass evolution, 1268 suggested a significant upturn in the evolutionary rate of this trait in the most recent 1269 common ancestor of the clade containing capybara-rock cavy-mara (arrow). Hue and 1270 size of circles denote posterior support of a rate shift in the branch, with larger and 1271 redder circles suggesting higher posterior probability supporting an upturn in the 1272 evolutionary rate. Branch color denotes posterior estimated rates of body mass 1273 evolution. CV: Caviomorpha, Oc: Octodontoidea (degus and spiny rats), Ch: 1274 Chinchilloidea (chinchillas and viscachas), Er: Erethizontoidea (porcupines), Ca: 1275 Cavioidea (capybaras, maras and guinea pigs). 1276 1277 Figure 3. Analysis of mutational load for 229 single-copy gene orthologs (SCOs) 1278 across 16 rodents. The capybara shows signatures of mutational load, harboring a 1279 higher rate of accumulation of potentially deleterious mutations. (A) Density plot 1280 showing the distribution of dN/dS values for the 229 SCOs for capybara and the rest of 1281 rodents. Capybaras have a slightly higher mean value of dN/dS relative to the rest of 1282 rodents (dashed lines): 0.15 vs. 0.11, respectively. (B) Mean dN/dS was significantly 1283 higher (T-test, t = -103.41, 95%CI = -0.04263944 to -0.04105141, P < 0.0001) for the 1284 capybara relative to the other rodents, based on 1000 bootstrap replicates from the 1285 distributions in (A). 1286 1287 Figure 4. (A) Venn diagram of the protein orthologous clusters among the 5 1288 hystricognath species: 1) capybara (Hydrochoerus hydrochaeris), 2) guinea pig (Cavia 1289 porcellus), 3) degu (Octodon degus), 4) chinchilla (Chinchilla lanigera) and 5) naked 1290 mole rat (Heterocephalus glaber). Of the 12,878 orthologous groups shared by all 5 1291 species, 4452 were SCOs. (B) Positively selected genes in the capybara. Red dots are 1292 genes that lie the top 5% of the distribution, above and to the right of both axes and,

42 1293 thus, considered to be under positive selection (see Experimental Procedures). 1294 Interestingly, among fifty-six genes under positive selection, four were associated with 1295 growth regulation and tumor suppression (orange dots and arrows). BCL2, Apoptosis 1296 Regulator Bcl-2; PLAGL2, Pleiomorphic adenoma-like Protein 2; PARP2, Poly(ADP- 1297 Ribose) Polymerase 2; RFX6, Regulatory Factor X6. 1298 1299 Figure 5. Gene family composition analysis. (A) Clustering of rodent genomes in a 1300 multidimensional space of molecular functions. The first two principal components are 1301 displayed, accounting for 20.14% and 11.34% of the variation, respectively. PC1 1302 separates the capybara from the rest of the rodents based on the presence of expanded 1303 gene families related to sensory perception and immune response. HYST, 1304 Hystricognathi clade; MRC, Mouse-related clade; SRC, Squirrel-related clade. Black 1305 portion of arrows indicates diversification of gene families within each PC. (B) Gene 1306 families enriched in the capybara. Displayed, are gene families related to growth control 1307 and tumor suppression that showed significant enrichment in the capybara based on 1308 Fisher’s exact test p-values after Bonferroni correction for multiple testing (p < 0.05). 1309 The heatmap shows normalized gene counts of PANTHER molecular function 1310 categories for the 16 rodents. Hhyd, Hydrochoerus hydrochaeris (capybara); Cpor, 1311 Cavia porcellus (guinea pig); Clan, Chinchilla lanigera (chinchilla); Odeg, Octodon 1312 degus (degu); Hgla, Heterocephalus glaber (naked mole rat); Mmus, Mus musculus 1313 (mouse); Rnor, Rattus norvegicus (rat); Mocr, Microtus ochrogaster (vole); Maur, 1314 Mesocricetus auratus (hamster); Pman, Peromyscus maniculatus (deer mouse); Ngal, 1315 Nannospalax galili (blind mole rat); Jjac, Jaculus jaculus (jerboa); Dord, Dipodomys 1316 ordii (kangaroo rat); Ccan, Castor canadensis (beaver); Itri, Ictidomys tridecemlineatus 1317 (squirrel); Mmar, Marmota marmota (marmot). 1318 1319 Figure 6. Network analysis of A) GO Biological Process and B) KEGG Pathways of 1320 capybara genes. Molecular interactions among 100 genes based on lineage-specific 1321 family expansions, positive selection and unique residues. Shown are enriched 1322 Biological Process and KEGG Pathways relevant for the evolution of gigantism and 1323 cancer suppression. Nodes indicate genes and edges correspond to molecular 1324 interactions according to String Database (Mering et al., 2003). Blue nodes are genes 1325 involved in growth, immune response, regulation of cell proliferation and cancer

43 1326 pathways. Big arrows denote the effect of growth and immune pathways in promoting 1327 (‘+’) or suppressing (‘-’) the development of cancer, respectively. 1328 1329 Figure S1. (A) Quality assessment of the capybara genome compared to the guinea pig 1330 genome. CEGMA analysis was conducted for 248 CEGs. BUSCO analysis was 1331 conducted with the vertebrate dataset that included 3023 BUSCOs. Note that guinea pig 1332 scaffold N50 is twice as high as the capybara scaffold N50, but both genomes are of 1333 similar quality. (C): Complete, (D): Duplicated, (F): Fragmented, (P): Partial. (B) 1334 Estimated proportion of the capybara genome sequence occupied by different repetitive 1335 elements. Results are based on identification of known and de novo repeat elements 1336 aligning the capybara genome sequences against a reference library of known repeats 1337 for rodents using RepeatMasker. LINE, long interspersed element; LRT, long terminal 1338 repeat; SINE, short interspersed element. 1339 1340 Figure S2. Functional annotation of the capybara genome with PANTHER. 1341 1342 Figure S3. Divergence times of representative rodent species using the topology 1343 obtained from phylogenomic analyses (See Experimental Procedures). The blue bars on 1344 nodes represent the 95% credibility intervals of divergence time estimates. Two nodes 1345 were calibrated using fossil data (Hedges et al., 2006) as follows: ancestral node of 1346 Rodentia: 76 to 88 million years ago (Ma); ancestral node of Mus and Rattus: 18 to 23 1347 Ma. Black dots on nodes represent bootstrap values over 98% in both analyses: 1348 concatenated and species-tree. Clades from top to bottom: Hystricognathi, Mouse- 1349 related clade and Squirrel-related clade. 1350 1351 Figure S4. (A) Correlation of log-transformed body mass values with the mean dN/dS 1352 of 229 SCOs for each rodent species. Within rodents, mean dN/dS is positively 1353 correlated with body mass indicating a lower efficiency of purifying selection relative to 1354 genetic drift in bigger rodents. (B) Correlation between the mean dN/dS values of 229 1355 SCOs and the mean dN/dS values of the 13-protein-coding mitochondrial genes for ten 1356 rodent species for which both nuclear and mitochondrial genomes were available. 1357 Nuclear and mitochondrial dN/dS values show a significant positive correlation, which 1358 indicates a reduction of the effective population size caused by the increment in body 1359 mass.

44 1360 Figure S5. Time-calibrated phylogram of caviomorph rodents (from Upham and 1361 Patterson, 2012) with values of body mass indicated to the right of each species. The 1362 broad range in body mass within caviomorph rodents reaches three orders of magnitude, 1363 with the smallest species weighting under 100g and the largest species, the capybara, 1364 weighting 55,000g. The red node corresponds to the most recent common ancestor of 1365 caviomorph rodents. 1366 1367 1368

45 1369 Figure 1.

A. B. C. Genome Repeat Number of size (Mb) content (%) genes Naked mole rat 2631.08 28.63 21,805

Chinchilla 2390.87 29.57 26,352

Degu 2995.89 23.33 20,696

Capybara 2737.7 37.45 24,941

Guinea pig 2723.22 37.89 23,264

Mouse 2671.82 44.16 36,487

Rat 2743.3 41.98 31,695

Vole 2287.34 23 20,741

Hamster 2504.93 22.15 21,194

Deer mouse 2630.54 - -

Blind mole rat 3061.42 32.89 22,048

Jerboa 2835.25 33.4 18,942

Kangaroo rat 2236.37 31.19 20,696

Beaver 2518.31 33.74 23,503

Marmot 2510.59 - -

Squirrel 2195.88 33.57 20,418

80 40 0 0 10000 20000 30000 40000 50000 Body mass (gr) 1370 MYA 1371 1372

46 shift direction 1.00 0.80 1373 Figure 2. 0.50 0.20 1374 0.00 -0.20 point.seq -0.50 -0.80 -1.00

Oc Ch Er Ca

Ctenomys steinbachi Ctenomys boliviensis Ctenomys haigi Ctenomys bridgesi Octodon lunatus Octodon degus Octodon fuscus Aconaemys sagei Aconaemys porteri Aconaemys cyanus Spalacopus barrerae Tympanoctomys Octomys mimax gliroides Octodontomys spinosus Euryzygomatomys laticeps shift setosus Trinomys eliasi Trinomys dimidiatus Trinomys direction iheringi Trinomys dactylinus boliviensis Dactylomys amblyonyx Kannabateomys Toromys grandis blainvilii Phyllomys chrysurus Echimys macrura brasiliensis Phyllomys didelphoides Makalata sinnamariensis Isothrix bistriata emiliae Lonchothrix hispidus apereoides longicaudatus roberti Proechimys simonsi Proechimys quadruplicatus Proechimys gymnurus Hoplomys coypus Myocastor pilorides Capromys cinerea Abrocoma bennettii Abrocoma maximus Lagostomus lanigera Chinchilla viscacia Lagidium branickii Dinomys bicolor Coendou melanura Sphiggurus dorsatum Erethizon taczanowskii Cuniculus paca Cuniculus leporina Dasyprocta acouchy Myoprocta musteloides Galea tschudii Cavia porcellus Cavia aperea Cavia australis Microcavia patagonum Dolichotis rupestris Kerodon hydrochaeris Hydrochoerus typicus Petromus swinderianus Thryonomys glaber Heterocephalus argenteocinereus Heliophobius hottentotus Cryptomys suillus Bathyergus capensis Georychus Hystrix brachyurus 1.00 0.80 posterior rates 0.50 0.20 rep(-0.5, length(point.seq)) 1.563 0.00 0.738 -0.20 point.seq 0.348 -0.50 -0.80 0.165 -1.00 0.078 0.037 point.seq 0.017 Ctenomys steinbachi Ctenomys boliviensis Ctenomys haigi Ctenomys bridgesi Octodon lunatus Octodon degus Octodon fuscus Aconaemys sagei Aconaemys porteri Aconaemys cyanus Spalacopus barrerae Tympanoctomys Octomys mimax gliroides Octodontomys spinosus Euryzygomatomys laticeps Clyomys setosus Trinomys eliasi Trinomys dimidiatus Trinomys iheringi Trinomys dactylinus Dactylomys boliviensis Dactylomys amblyonyx Kannabateomys Toromys grandis blainvilii Phyllomys chrysurus Echimys macrura Makalata brasiliensis Phyllomys didelphoides Makalata Isothrix sinnamariensis Isothrix bistriata emiliae Lonchothrix hispidus Mesomys apereoides Thrichomys longicaudatus Proechimys roberti Proechimys simonsi Proechimys quadruplicatus Proechimys gymnurus Hoplomys coypus Myocastor pilorides Capromys cinerea Abrocoma bennettii Abrocoma maximus Lagostomus lanigera Chinchilla viscacia Lagidium branickii Dinomys bicolor Coendou melanura Sphiggurus dorsatum Erethizon taczanowskii Cuniculus paca Cuniculus leporina Dasyprocta acouchy Myoprocta musteloides Galea tschudii Cavia porcellus Cavia aperea Cavia australis Microcavia patagonum Dolichotis rupestris Kerodon hydrochaeris Hydrochoerus typicus Petromus swinderianus Thryonomys glaber Heterocephalus argenteocinereus Heliophobius hottentotus Cryptomys suillus Bathyergus capensis Georychus Hystrix brachyurus 0.008 posterior rates shift direction rep(-0.5, length(point.seq)) 1.563 1.00 0.004 0.738 0.80 0.348 0.50 0.165 0.20 0.078 0.00 0.037 -0.20 point.seq point.seq 0.017 -0.50 0.008 -0.80 0.004 -1.00 shift probability rep(-0.5, length(point.seq)) 1.000 0.875

Ctenomys steinbachi Ctenomys boliviensis Ctenomys haigi Ctenomys bridgesi Octodon lunatus Octodon degus Octodon fuscus Aconaemys sagei Aconaemys porteri Aconaemys cyanus Spalacopus barrerae Tympanoctomys Octomys mimax gliroides Octodontomys spinosus Euryzygomatomys laticeps Clyomys setosus Trinomys eliasi Trinomys dimidiatus Trinomys iheringi Trinomys dactylinus Dactylomys boliviensis Dactylomys amblyonyx Kannabateomys Toromys grandis blainvilii Phyllomys chrysurus Echimys macrura Makalata brasiliensis Phyllomys didelphoides Makalata Isothrix sinnamariensis Isothrix bistriata emiliae Lonchothrix hispidus Mesomys apereoides Thrichomys longicaudatus Proechimys roberti Proechimys simonsi Proechimys quadruplicatus Proechimys gymnurus Hoplomys coypus Myocastor pilorides Capromys cinerea Abrocoma bennettii Abrocoma maximus Lagostomus lanigera Chinchilla viscacia Lagidium branickii Dinomys bicolor Coendou melanura Sphiggurus dorsatum Erethizon taczanowskii Cuniculus paca Cuniculus leporina Dasyprocta acouchy Myoprocta musteloides Galea tschudii Cavia porcellus Cavia aperea Cavia australis Microcavia patagonum Dolichotis rupestris Kerodon hydrochaeris Hydrochoerus typicus Petromus swinderianus Thryonomys glaber Heterocephalus argenteocinereus Heliophobius hottentotus Cryptomys suillus Bathyergus capensis Georychus Hystrix brachyurus CV 0.750 shift direction shift probability posterior rates 0.625 1.00 rep(-0.5, length(point.seq)) 1.000 rep(-0.5, length(point.seq)) 1.563 0.80 0.875 0.738 0.500 0.50 0.750 0.348 0.375 point.seq 0.20 0.625 0.165 0.250 0.00 0.500 0.078 0.375 0.037

-0.20point.seq point.seq 0.125 point.seq -0.50 0.250 0.017 0.000 -0.80 0.125 0.008 -1.00 0.000 0.004 1375 Ctenomys steinbachi Ctenomys boliviensis Ctenomys haigi Ctenomys bridgesi Octodon lunatus Octodon degus Octodon fuscus Aconaemys sagei Aconaemys porteri Aconaemys cyanus Spalacopus barrerae Tympanoctomys Octomys mimax gliroides Octodontomys spinosus Euryzygomatomys laticeps Clyomys setosus Trinomys eliasi Trinomys dimidiatus Trinomys iheringi Trinomys dactylinus Dactylomys boliviensis Dactylomys amblyonyx Kannabateomys Toromys grandis blainvilii Phyllomys chrysurus Echimys macrura Makalata brasiliensis Phyllomys didelphoides Makalata Isothrix sinnamariensis Isothrix bistriata emiliae Lonchothrix hispidus Mesomys apereoides Thrichomys longicaudatus Proechimys roberti Proechimys simonsi Proechimys quadruplicatus Proechimys gymnurus Hoplomys coypus Myocastor pilorides Capromys cinerea Abrocoma bennettii Abrocoma maximus Lagostomus lanigera Chinchilla viscacia Lagidium branickii Dinomys bicolor Coendou melanura Sphiggurus dorsatum Erethizon taczanowskii Cuniculus paca Cuniculus leporina Dasyprocta acouchy Myoprocta musteloides Galea tschudii Cavia porcellus Cavia aperea Cavia australis Microcavia patagonum Dolichotis rupestris Kerodon hydrochaeris Hydrochoerus typicus Petromus swinderianus Thryonomys glaber Heterocephalus argenteocinereus Heliophobius hottentotus Cryptomys suillus Bathyergus capensis Georychus Hystrix brachyurus

1376 posterior rates shift probability rep(-0.5, length(point.seq)) 1.563 rep(-0.5, length(point.seq)) 1.000 0.738 0.875 1377 0.348 0.750 0.165 0.625 0.078 0.500 0.037 0.375 point.seq point.seq 1378 0.017 0.250 0.008 0.125 0.004 0.000

shift probability rep(-0.5, length(point.seq)) 1.000 0.875 0.750 0.625 0.500 0.375 point.seq 0.250 0.125 0.000

47 1379 Figure 3.

0.200 A. B.

● P = 2.2e−16 ● ● ● 6 ● ●

0.175 Taxon Capybara Other rodents

4 0.150 Density

2 Mean nuclear dN/dS Mean nuclear 0.125

● ● ● ● ●

● ● 0 0.100

0.0 0.5 1.0 1.5 2.0 Capybara Other Rodents Nuclear dN/dS Taxon 1380 1381 1382

48 1383 Figure 4.

A. B.

● ●

● ● 2 ● ● 0.4 ● ● ●

1 ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● PARP2 3 ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● PLAGL2 ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ● ●● ●● ●●● ● ●● ● ●● ●● ●● ●●●●●● ● ●● ● ●● ●● ● ● ● ● ● ●● ●●●● ●● ●●● ●● ● ●● ● ● ● ● ● ●●●●●● ●● ● ● ● ●● ● ● ● ● ●● ● ●●● ● ●●● ● ●●● ●●●●● ●● ● ● ●● ● ● ●●●●●●● ●● ● ● ●●●●●●●●● ●● ●● ●● ● ●● ● ● ● ● ●● ●●●●●●●● ● ●●●● ●●●●● ● ● ●●● ● ● ● ● ● ●● ●●●●●● ●●●●●●●●●●●●●●●●●●● ● ●●● ●●● ●●● ● RFX6 ●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●● ● ●● ● ● ● ●●●●●●●●●●● ●●●●● ●●●●●●●● ●●●●●●● BCL2●●●●●●●●●●●● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●● ●●●● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●● ●● ● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●● ● ● ●● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ● ● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●● ●● ●●●● ● ● ● 0.0 ●●●●●●●●●●●●●●●●●●●● ●● ● ● ● ● ●●●●●●●●●●●●●●●● ● ●● ●● ● ●●●●●●●●●● ●● ●● ● ●●●● ● ● ●● ● ●●●●● ●●● ● ● ●●●● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● Protein distance index Protein ● ● ● ● ●●

● ● 5 4 ● −0.4

0 100 200 300 400 Likelihood−ratio branch−site test 1384 1385

49 1386 Figure 5.

A. B. Raw Z−score Itri 0 1 2 3 ● 0.0025 Mmar

Ccan ● Dord ● PTHR19339:SF7 TRAV22

● ● Jjac Odeg

Hgla PTHR23267:SF202 ● TRAV12-2 ● Ngal Cpor ● Group T-cell mediated Clan HYST tumor suppression ● ●a

pathway 0.0000 PTHR24271:SF38 ● ●a MRC GZMB Mocr ● ●a SRC Maur

MAGEB5 signaling PTHR11736:SF103

PC2 (11.34%) Pman ● GPCR Cell PTHR11991:SF6 TPT1 reprogramming

−0.0025 gene familes PANTHER

PTHR10155:SF7 PI3KR2

● PI3K-Akt signaling Rnor ● (Growth) Hhyd PTHR24356:SF266 PRKCI

● Mmus

−0.005 0.000 0.005 0.010 0.015 Itri Jjac Clan Hgla Ngal Cpor Rnor Dord Mocr Maur Hhyd Ccan

PC1 (20.14%) Odeg Mmar Pman Mmus Species Sensory perception of smell GPCR signaling pathway 1387 T/B cell receptor signaling pathway 1388

50 1389 Figure 6.

A. B.

Pancreas development and response to insulin Cancer

Apoptotic signaling pathway

Immune response

Growth

Negative regulation of cell proliferation

T-cell proliferation- activation and immune response to tumor cell 1390 1391

51 100 97.58 94.35

88.98 87.72 1392 Figure S1. 75

64.92 62.5 A. B.

sp 50 Capybara (Scaffold N50: 12.2Mb) Guinea pig (Scaffold N50: 27.9Mb) 100 97.58 94.35

% Recovered genes % Recovered 88.98 87.72

25

75

64.92 6.25 7.04 62.5

0.86 0.89 0 sp 50 BUSCOs(C) BUSCOs(D) BUSCOs(F) CEGs(C) CEGs(P) Capybara (Scaffold N50: 12.2Mb) Gene Category Guinea pig (Scaffold N50: 27.9Mb) % Recovered genes % Recovered

25

6.25 7.04

0.86 0.89 0

BUSCOs(C) BUSCOs(D) BUSCOs(F) CEGs(C) CEGs(P) 1393 Gene Category 1394 1395

52 1396 Figure S2.

1397 1398

53 1399 Figure S3.

Capybara Guinea pig

Chinchilla Degu

Naked mole rat

Kangaroo rat

Beaver Jerboa

Blind mole rat

Hamster Vole Deer mouse

Rat

Mouse Squirrel Marmota Rabbit

80 70 60 50 40 30 20 10 0 1400 Million years ago 1401 1402

54 1403 Figure S4. 1404

0.175 A. ● B.

0.15

● 0.150 ● ●

0.10 ● 0.125 ●

● Mean nuclear dN/dS Mean nuclear

● ● ● 2 ● ● 0.100 ● R = 0.294 dN/dS Mean mitochondrial ● ● ● ● ● 0.05 ● ● P = 0.01748 ● ● Spearman's ρ = 0.67 ● P = 0.03938 0.075 3 5 7 9 11 0.10 0.11 0.12 0.13 0.14 0.15 Ln(body mass) Mean nuclear dN/dS 1405 1406

55 1407 Figure S5. 1408

Hystrix brachyurus Georychus capensis Bathyergus suillus Cryptomys hottentotus Heliophobius argenteocinereus Heterocephalus glaber Thryonomys swinderianus Petromus typicus Hydrochoerus hydrochaeris Kerodon rupestris Dolichotis patagonum Microcavia australis Cavia aperea Cavia porcellus Cavia tschudii Galea musteloides Myoprocta acouchy Dasyprocta leporina Cuniculus paca Cuniculus taczanowskii Erethizon dorsatum Sphiggurus melanura Coendou bicolor Dinomys branickii Lagidium viscacia Chinchilla lanigera Lagostomus maximus Abrocoma bennettii Abrocoma cinerea Capromys pilorides Myocastor coypus Hoplomys gymnurus Proechimys quadruplicatus Proechimys simonsi Proechimys roberti Proechimys longicaudatus Thrichomys apereoides Mesomys hispidus Lonchothrix emiliae Isothrix bistriata Isothrix sinnamariensis Makalata didelphoides Phyllomys brasiliensis Makalata macrura Echimys chrysurus Phyllomys blainvilii Toromys grandis Kannabateomys amblyonyx Dactylomys boliviensis Dactylomys dactylinus Trinomys iheringi Trinomys dimidiatus Trinomys eliasi Trinomys setosus Clyomys laticeps Euryzygomatomys spinosus Octodontomys gliroides Octomys mimax Tympanoctomys barrerae Spalacopus cyanus Aconaemys porteri Aconaemys sagei Aconaemys fuscus Octodon degus Octodon lunatus Octodon bridgesi Ctenomys haigi Ctenomys boliviensis Ctenomys steinbachi

120 100 80 60 40 20 0 0 10000 20000 30000 40000 50000 Million years ago Body mass (g) 1409

56