MBE Advance Access published January 11, 2013

Article Discoveries section

Protein insertions and deletions enabled by neutral roaming in sequence space

http://mbe.oxfordjournals.org/

Ágnes Tóth-Petróczy and Dan S. Tawfik

Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, 76100, Israel

Running title: Correlated evolution of InDels and substitutions

Corresponding author: Prof. Dan S. Tawfik, [email protected]


© The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Abstract

Backbone modifications via insertions and deletions (InDels) may exert dramatic effects, for better (mediating new functions) and for worse (causing loss of structure and/or function). However, in contrast to point mutations (substitutions), our knowledge of the evolution and structural-functional effects of InDels is limited, and so is our capability to engineer them. We sought to assess how deleterious InDels are relative to point mutations, and to understand the mechanisms that mediate their acceptance. Analysis of the evolution of InDels in orthologous protein phylogenies indicated that their rate of purging is ≥9-fold higher than for point mutations. In yeast, for example, the substitutions-to-InDels ratio is ~14-fold higher in protein-coding than in non-coding regions. The incorporation of InDels relative to substitutions is not only slow, but also non-linear. On average, ≥50 substitutions accumulate prior to the appearance of the first InDel. We also found that substitutions are enriched in sequential and spatial proximity to InDels, suggesting that certain substitutions are correlated with InDels. As indicated by the lag in InDel accumulation, some of these correlated substitutions may have occurred first, as apparently neutral mutations, and later enabled the accumulation of InDels that would otherwise be purged. Thus, compensatory substitutions may follow InDels in an 'adaptive walk' as traditionally assumed, but might also accumulate first, by 'neutral roaming'. The dynamics of InDel accumulation also depend on their genomic frequencies: InDels in flies are 4-fold more frequent than in yeast, and tend to be compensated rather than enabled.


Introduction

Short insertions and deletions (InDels, in the order of several nucleotides) are frequent events, occurring in some genomes at a frequency that is comparable to that of point mutations (Denver et al. 2004; Lynch et al. 2008). However, whilst the effects of substitutions (nucleotide and amino acid point mutations) have been extensively studied, our understanding of InDels is limited. Likewise, whereas protein engineering by substitutions is a matter of routine, the engineering of backbone modifications remains a challenge (Hu et al. 2007; Murphy et al. 2009; Afriat-Jurnou et al. 2012). InDels can severely perturb the structural integrity of proteins, and may therefore be highly deleterious. At the same time, InDels often comprise the key step in the acquisition of new functions (Tawfik 2006; Akiva et al. 2008; Neuenfeldt et al. 2008; Britten 2010; Cooley et al. 2010; Hashimoto and Panchenko 2010). For example, a deletion in an active-site loop, combined with an adjacent, correlated substitution, led to a >10^6-fold change in enzyme selectivity (Afriat-Jurnou et al. 2012). Such dramatic shifts are not observed with substitutions. Purifying selection (neutral evolution), and adaptive evolution, may therefore act more strongly on InDels than on substitutions, but the relative magnitudes remain unknown. Likewise, compensatory mechanisms are likely to be critical to InDel acceptance, but the nature of these mechanisms has scarcely been addressed (Leushkin et al. 2012).

Here, we have examined the relationship between sequence divergence by amino acid substitutions and InDels. We investigated how InDels evolve in natural proteins and obtained new insights regarding the mechanisms of acceptance of InDels. Specifically, we explored the following questions: What is the magnitude of purifying selection acting on InDels? Are InDels less abundant than substitutions because they occur less frequently than point mutations or because they are more intensely purged (and are therefore also harder to engineer)? Is there a correlation between InDels and certain substitutions? By which mechanism do InDels become tolerated: Are InDels compensated by certain substitutions after their incorporation, i.e., via adaptive walks (Leushkin et al. 2012), as generally assumed for deleterious mutations? Or do, perhaps, substitutions accumulate first as neutral and thereby allow InDels to accumulate at a later stage (a mechanism dubbed here 'neutral roaming')?

To explore the above issues, we studied orthologous proteins along two different phylogenies, yeast and fly, and examined the emergence of substitutions and InDels within their proteins. The relative frequencies of occurrence of InDels in these organisms are substantially different. In flies, where compensatory substitutions have been identified in correlation with InDels (Leushkin et al. 2012), point mutations occur only 4 times more frequently than InDels. In yeast, however, InDels are rarer (1/16 relative to point mutations).


Results

The substitutions to InDels ratio

We assembled a database of common orthologs of 22 fungal species that experienced ~500 million years of evolution, and aligned their sequences. After filtering out low-quality sequences and annotation mistakes, our dataset comprised 1310 orthologous families (Dataset S1). Gapped positions were identified in pairwise alignments. InDels were defined as consecutive gapped regions within domains only (i.e., length variations at the N- and C-termini were excluded). The most frequent InDels (33%) were of a single amino acid residue, and the median length was 2 residues (Fig. S1). We then compared the level of sequence divergence with respect to InDels versus substitutions. Comparison of both proteomes and of individual families indicated that the substitutions/InDels ratio (S/I hereafter) is not constant: at low divergence, InDels accumulated at a slower rate than at high divergence (Fig. 1a, 1b; Fig. S2).
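The counting rule described above can be sketched in a few lines. This is a minimal illustration only, not the scripts used in the study: the function name `count_indels_and_substitutions` is hypothetical, the alignment is assumed to be given as two equal-length gapped strings, and "within domains only" is approximated here by simply trimming gapped columns at both termini.

```python
def count_indels_and_substitutions(seq_a, seq_b, gap="-"):
    """Count internal InDels and substitutions in a pairwise alignment.

    An InDel is a run of consecutive gapped columns; terminal gaps are
    ignored (assumption: this stands in for the 'within domains' rule).
    Returns (list of InDel lengths, number of substitutions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    columns = list(zip(seq_a, seq_b))

    # Trim leading and trailing columns that contain a gap, so that
    # N- and C-terminal length variation is not counted as an InDel.
    start, end = 0, len(columns)
    while start < end and gap in columns[start]:
        start += 1
    while end > start and gap in columns[end - 1]:
        end -= 1

    indel_lengths = []
    substitutions = 0
    run_length = 0    # length of the gap run currently being scanned
    run_side = None   # which sequence carries the gap in that run

    for a, b in columns[start:end]:
        if a == gap and b == gap:
            continue  # column gapped in both sequences carries no information
        if a == gap or b == gap:
            side = 0 if a == gap else 1
            # A gap switching from one sequence to the other starts a new InDel.
            if run_length and side != run_side:
                indel_lengths.append(run_length)
                run_length = 0
            run_length += 1
            run_side = side
        else:
            if run_length:
                indel_lengths.append(run_length)
                run_length = 0
            if a != b:
                substitutions += 1

    if run_length:
        indel_lengths.append(run_length)
    return indel_lengths, substitutions
```

Dividing the substitution count by the number of InDels for a given pair then gives that pair's S/I ratio.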

Handling InDels is a challenge for alignment algorithms (Golubchik et al. 2007). To assess the bias introduced by the alignment algorithm, we tested four algorithms that use different gap-handling strategies (see Methods). The nonlinear lag pattern was observed throughout, although its magnitude varied (Fig. 1c). We subsequently chose MUSCLE, which is the most conservative with gap assignments and exhibited the weakest signal of non-linearity (Fig. 1c). The results presented here therefore represent the most conservative estimate of the correlation between substitutions and InDels.

Correlated emergence of substitutions and InDels

The lag phase in the accumulation of InDels (Fig. 1a, b) suggests that certain substitutions accumulate prior to the acceptance of InDels. A similar lag behaviour was found between the accumulation of surface and core mutations, suggesting a surface-core correlation, and a mechanism of apparently neutral surface substitutions enabling the accommodation of core substitutions that would otherwise be deleterious (Toth-Petroczy and Tawfik 2011).

To quantify the nonlinear relationship we used a second-order polynomial function, $y(x) = ax^2 + bx$, that correlates the number of substitutions ($x$) and InDels ($y$) in evolving proteins (Fig. 1a; $R^2_{\text{second-order}} = 0.96$). We defined the lag intensity as $|b/a|$; the lower the value, the more intense the lag. An alternative measure is lag length, defined as the intercept with the x axis according to a linear fit. Under the null hypothesis of independent accumulation, the number of substitutions at zero InDels would be zero. However, the lag lengths observed are higher than zero (Fig. 1a; lag length = 10% sequence divergence, or on average, ~30-40 substitutions per domain).
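As an illustration of the two lag measures defined above, the following sketch fits both models by ordinary least squares in plain Python. The function names are hypothetical and the fitting procedure is an assumption; the original analysis may have used a different routine.

```python
def fit_quadratic_through_origin(x, y):
    """Least-squares fit of y = a*x**2 + b*x (no constant term).

    Solves the 2x2 normal equations by Cramer's rule.
    """
    s4 = sum(v ** 4 for v in x)
    s3 = sum(v ** 3 for v in x)
    s2 = sum(v ** 2 for v in x)
    sx2y = sum(v ** 2 * w for v, w in zip(x, y))
    sxy = sum(v * w for v, w in zip(x, y))
    det = s4 * s2 - s3 * s3
    a = (sx2y * s2 - sxy * s3) / det
    b = (s4 * sxy - s3 * sx2y) / det
    return a, b


def fit_line(x, y):
    """Least-squares fit of y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((v - mean_x) * (w - mean_y) for v, w in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lag_measures(substitutions, indels):
    """Return (lag intensity |b/a|, lag length = x-intercept of the linear fit)."""
    a, b = fit_quadratic_through_origin(substitutions, indels)
    slope, intercept = fit_line(substitutions, indels)
    return abs(b / a), -intercept / slope
```

For a convex (lagging) curve the linear fit has a negative y-intercept, so its x-intercept, the lag length, is positive; under independent accumulation it would be close to zero.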

In addition to the comparison of proteomes (Fig. 1a), we compared protein families individually. We grouped fungal orthologous pairs by sequence divergence, and calculated the average number of InDels for each bin (Fig. 1b). This representation confirms the nonlinear relationship between the accumulation of InDels and substitutions. Despite the huge variability between proteins, the mean values in each bin are best fitted by the second-order function ($R^2_{\text{second-order}} = 0.99$), and the linear fit ($R^2_{\text{linear}} = 0.89$) gives a lag length of 8% sequence divergence by substitutions. Further, grouping proteins by their evolutionary rates into slow, average and fast categories (Toth-Petroczy and Tawfik 2011) showed that, as expected, slowly evolving proteins show the highest substitutions/InDels ratio, as well as the strongest lag (Fig. S3a).
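The binning step can be sketched as follows. The bin width, the input layout (one `(divergence in %, number of InDels)` tuple per orthologous pair) and the function name are illustrative assumptions; the text does not state the bin width that was used.

```python
from collections import defaultdict


def mean_indels_per_bin(pairs, bin_width=10):
    """Group ortholog pairs by sequence divergence and average their InDel counts.

    pairs: iterable of (divergence_percent, indel_count)
    Returns a list of (bin_start_percent, mean_indel_count), sorted by bin.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for divergence, n_indels in pairs:
        index = int(divergence // bin_width)  # bin covers [index*w, (index+1)*w)
        totals[index] += n_indels
        counts[index] += 1
    return [(index * bin_width, totals[index] / counts[index])
            for index in sorted(totals)]
```

The resulting bin means are the kind of values to which a second-order and a linear function would then be fitted.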

Finally, we examined the substitutions/InDels ratios in clusters of orthologous groups (COGs) that include sequences from all kingdoms of life. The lag trend is clearly observed (Fig. 1d). Despite considerable variability, >90% of COGs show a lag, with the average lag length being 56 substitutions (Fig. 1d inset).

Substitutions and InDels in non-coding regions are not correlated

Is the observed correlation due to the genetic mechanisms that incorporate substitutions together with InDels (Yang et al. 2008; De and Babu 2010; Hicks et al. 2010), or is it the outcome of purifying selection acting on InDels within protein domains? To address this question, we analyzed a dataset of the flanking non-coding 5' regions of the 1310 orthologous yeast proteins (Kellis et al. 2003). Although these regions are partly under selection, they were chosen because, unlike most other non-coding parts of the genome, they could be reliably aligned. Unlike coding regions, 5'-UTRs exhibit a linear relationship between nucleotide substitutions and InDels (Fig. 2). A comparison of a subset of four closely related yeast species, where limited divergence enables even more reliable alignments, gave the same result (Fig. S3b). This suggests that even under selection, the appearance of InDels in non-coding regions is independent of the accumulation of nucleotide substitutions.

The measure of correlation between InDels and substitutions can also be affected by saturation, i.e., multiple hits at the same site. Saturation might result in underestimation of the divergence rate for substitutions, or in underestimation of the InDel rates. In the case of InDels, there is no available model for reversible or multiple InDel events. However, we could estimate from the occurrence of reverting InDels in independent lineages (i.e., insertions and deletions that occurred at the same site) that multiple hits are rare (5% in the fungi phylogenies, and 1% in the fly phylogenies described below). In addition, the fraction of multiple hits is proportional to the overall fraction of InDels (as seen, for example, in the fungi vs. the fly phylogenies, the latter representing lower divergence; see below). Thus, correction for multiple hits would only increase the lag intensity. Correcting for saturation of substitutions would decrease the lag intensity, but models that account for saturation indicated that it only plays a significant role at high divergence (Yang 1998). Foremost, the comparison of coding regions to 5'-UTRs also indicates that the non-linearity observed in coding regions is unlikely to be the outcome of saturation of sites resulting in underestimated divergence rates (Gojobori 1983), as saturation occurs much faster for base than for amino acid exchanges.

Thus, irrespective of the protein sets compared, and even when the most conservative alignment algorithm is applied, the accumulation of InDels shows a non-linear relationship with sequence divergence as measured by substitutions. This non-linearity seems to be the outcome of selection.

InDels in ordered versus disordered regions

InDels appear in loops more frequently than in secondary structure elements (de la Chaux et al. 2007). However, the same trend holds for substitutions (Tokuriki et al. 2008). We therefore examined the relative tolerance of InDels vs. substitutions in secondary structure elements and in loops (regions with no secondary structure). We analyzed all yeast orthologous families where the S. cerevisiae ortholog has a known structure (N=170). The number of substitutions in loops and in secondary structures largely follows a linear relationship with a slope of 1.25. On average, ~2/3 of these proteins' residues reside in secondary structure elements. The 1.25 slope therefore reflects a 2-fold higher tolerance of substitutions in loops than in structured regions (Fig. S4). For InDels, however, the relationship is nonlinear, and loops are ~14 times more tolerant of InDels than secondary structured regions (Fig. 3). In accordance, InDels are more tolerated in disordered regions than in ordered ones; the S/I is on average 3-fold higher for ordered regions, and in disordered regions the relative accumulation rate of InDels and substitutions is nearly linear (Fig. 3, Fig. S5).

Due to the higher occurrence of polymerase slippage, InDels occur at higher frequency in repeat regions (McDonald et al. 2011), and disordered regions are enriched in repeat sequences (Simon and Hancock 2009). Is the increased occurrence of InDels in disordered regions due to the presence of repetitive sequences? Our dataset contains relatively few tandem repeats (0.04% of all InDels; defined as duplications of ≥4 residues). Thus, enrichment of InDels in repeat regions is not a significant phenomenon within conserved orthologs.


Enabling substitutions in InDel neighboring and contacting residues

As demonstrated above, in ordered protein domains, on average, 50 substitutions accumulate prior to the appearance of the first InDel. However, only a few of these substitutions might actually be correlated with the appearance of a particular InDel. In general, strong enrichment of substitutions in contacting residues appears to underlie drifting orthologs (Callahan et al. 2011; Toth-Petroczy and Tawfik 2011). We therefore examined the rate of substitutions in positions that are in the sequential and spatial neighborhood of InDels. We compared them to the rate of substitutions in the remaining parts of the protein, thus deriving a relative measure of local/global rates.

Nearly 70% of the InDel flanking positions (sequential neighbors) and contacting positions (spatial neighbors) evolve faster than the rest of the protein (Fig. 4a). We also calculated the sequence identity of regions that flank InDels relative to the entire protein, and found increased divergence (Fig. 4b). The enriched divergence declines drastically with increasing window size: the strongest signal is at a window size of 5 residues of sequential neighbours, and the signal largely disappears at a window of 20 residues. Around 20% of flanking positions show the opposite behavior: the InDel regions evolve slower than the rest of the protein. However, in these cases, the regions where the InDels occurred show much higher divergence than the rest of the protein (Fig. S6b and c). For non-coding regions, no increased divergence is seen in regions that flank InDels (mean of local identity minus global identity ~0; Fig. S6a). The signals of enriched divergence therefore suggest that correlated mutations occur primarily in proximity to, or even in direct contact with, the "future" InDel. Thus, as indicated by other data (Fig. 2), the increased local divergence around InDels is due to selection rather than to the genetic mechanisms that incorporate InDels (Hicks et al. 2010; Borneman et al. 2011).

The increased divergence rates of InDel-flanking regions (Fig. 4) may also relate to the fact that InDels accumulate in regions that are generally more permissive to mutations, and specifically in loop regions (Fig. 3). To support the hypothesis of InDel-correlated substitutions, we compared close orthologs with and without InDels. The alignments follow ~500 million years of drift, yet only include 22 sequences from the currently available genomes. Consequently, the predicted ancestral nodes and their descendant proteins are separated by numerous sequence exchanges. Nonetheless, we assessed the number of InDel-specific substitutions via comparison of reference and outgroup sequences in a manner similar to that of Leushkin et al. (2012). The alignments were polarized using a reference sequence where the InDel did not occur, and an outgroup (Fig. 5a). As found for fly orthologs (Leushkin et al. 2012), InDel-carrying sequences show an increase in the frequency of substitutions relative to their reference sequences. In both the fly and fungi sets, the increase is most pronounced at a window size of 5 residues and disappears by 20 residues (Fig. 5b, c; N=534 common fungi and fly orthologs). The shift is, however, far more pronounced for the fly phylogenies (Fig. 5c), probably due to the InDel rates in fly genomes being ~4-fold higher than in fungi (see below). Furthermore, the positions where InDel-specific substitutions occur show, on average, slower evolutionary rates than the reference-specific ones (Fig. S8). Namely, the InDel-specific substitutions appear at more conserved sites, which supports their correlated appearance with the InDels.

InDels incorporation in fly versus fungi proteins

Organisms differ significantly in the relative rates of appearance of point mutations versus InDels. C. elegans, for example, represents an extreme case, with a higher InDel rate than substitution rate (Denver et al. 2004). But even between S. cerevisiae and D. melanogaster the S/I ratios differ, being 16 and 4, respectively (Lynch et al. 2008). Do these different genomic rates affect the S/I ratio in proteins? Does the higher rate affect the mechanism by which InDels accumulate?

To examine these questions, we constructed a dataset of 12 different Drosophila species that share in total 10119 orthologs (Dataset S1; Fig. S9). We found that fly proteins accumulated up to 3-fold more InDels relative to substitutions compared to yeast (Fig. S7a). However, in the fly phylogeny, which corresponds to ~50 million years (my) of evolution, ca. 80% of the proteome is assigned as shared orthologs, while in the fungi phylogeny (~500 my), only ~15% are assigned as orthologs. Consequently, the fungi orthologs represent highly conserved and structurally ordered proteins, whereas the fly orthologs include more disordered proteins (Fig. S7b). To eliminate this bias (Fig. 3, Fig. S5), we analyzed orthologs that are common to both phylogenies (N=534 proteins). For this dataset, the S/I ratio in the fly phylogeny was 2-fold lower than for yeast (Fig. 6). Thus, although InDels occur ~4 times more frequently in fly than in yeast genomes, under selection, only a 2-fold higher rate of InDels is observed. Purifying selection therefore eliminated most of the higher proportion of InDels that occurred in fly genomes.

The lag in the appearance of InDels seems less pronounced in the fly relative to the fungi phylogeny (Fig. 6), but the signal for InDels and co-occurring substitutions in their flanking regions is stronger (Fig. 5c versus 5b). Whilst other factors, such as population sizes and differences in the rate of sequencing errors in fly and fungi genomes, may apply, it appears that the higher rate of InDels in fly proteins resulted in a larger fraction of InDels being compensated after their appearance, as previously argued (Leushkin et al. 2012), rather than being enabled by neutral mutations, as seen in the yeast phylogenies.

Discussion

What are the origins of the difference in the rate of accumulation of substitutions vs. InDels? Two main reasons might apply: (i) differences in the intrinsic rates of these two types of genomic modifications (i.e., regardless of selection); (ii) InDels are far more deleterious than point mutations, and purifying selection acts more strongly on them. Our analysis of non-coding regions indicates an S/I (substitutions, including both synonymous and non-synonymous exchanges, to InDels ratio) of ~8 (Table 1). The mutation rates measured for S. cerevisiae indicate an S/I of ~16 (0.33 × 10^-9 base substitution events per germ-line cell division versus 0.02 × 10^-9 for short InDels) (Lynch et al. 2008). The difference between S/I of 16 and 8 may originate from differences between laboratory and naturally occurring strains. Underestimation of the substitution rate due to reversible exchanges (saturation) is also a possibility, although we measured a similar ratio for all 22 species and for 4 closely related ones (Fig. S3b). In coding regions, ~2/3 of the InDels are not accepted due to frame shifts, while the frequency of nonsense substitutions is ~0.05. We would therefore expect substitutions to be ~8 × 3, i.e., ~24 times more frequent than InDels in coding regions. However, the observed S/I ratio in coding regions (including synonymous substitutions) is, on average, ~220 (Table 1). This implies that, regardless of frame shifts, purifying selection acts much more strongly on InDels. Further, the degree of purifying selection acting on InDels, and the concomitant level of correlation between InDels and substitutions, vary depending on the degree of structural packing and cooperativity. Weak selection and no correlation are observed in DNA promoter regions that comprise linear motifs. In proteins, weak correlation is observed in disordered protein regions, whereas the highest level of purifying selection and correlation is seen in secondary structural elements that comprise the highest degree of structural order. Within ordered regions selection acts at least 9-fold more intensely on InDels, and up to 100-fold more intensely in secondary structure elements (Fig. 3). Consequently, InDels are preferentially located in regions with no assigned structure (de la Chaux et al. 2007).
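The arithmetic behind this estimate can be laid out explicitly. The sketch below uses only the figures quoted above (a non-coding S/I of ~8, ~2/3 of InDels lost to frame shifts, and an observed coding S/I of ~220); the symbol names are introduced here for illustration, and the small correction for nonsense substitutions is neglected, as in the ~8 × 3 approximation in the text.

```latex
\begin{align*}
(S/I)_{\text{expected}}
  &\approx (S/I)_{\text{non-coding}} \times \frac{1}{1 - f_{\text{frameshift}}}
   = 8 \times \frac{1}{1 - 2/3} = 24, \\[4pt]
\frac{(S/I)_{\text{observed}}}{(S/I)_{\text{expected}}}
  &\approx \frac{220}{24} \approx 9.2 .
\end{align*}
```

The ratio of ~9 corresponds to the "at least 9-fold" stronger selection on InDels quoted for ordered regions. Including the ~0.05 nonsense fraction would lower the expected value slightly (8 × 0.95 × 3 ≈ 22.8) and raise the ratio to ~9.6.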

The substitutions-to-InDels ratio is not only high due to the intensity of selection, but is also not constant: it decreases with the level of sequence divergence (Fig. 1, Fig. S2). An opposite tendency was shown for nucleotide exchanges and InDels when comparing full genome alignments of closely related mammalian species (Chen et al. 2009). The discrepancy might reflect the different tendencies in non-coding regions (McDonald et al. 2011) (Fig. 2), and especially in repeat regions, and may also relate to ambiguities associated with aligning full genomes. In general, alignment algorithms handle gaps very poorly even in coding regions (Golubchik et al. 2007). The choice of the alignment algorithm strongly influences the S/I ratio. We tested four different algorithms that use different strategies for gap handling. The most InDels are incorporated by PRANK, which uses phylogenetic information for the identification of insertions and deletions, and avoids multiple penalization of InDel events (Loytynoja and Goldman 2008). Nonetheless, the correlated nature of InDels and substitutions was corroborated by all algorithms tested (Fig. 1c).

Accumulating evidence indicates that point mutations and InDels are interdependent (De and Babu 2010; Hicks et al. 2010; Leushkin et al. 2012). At the genomic level, base substitutions and InDels co-occur and positively co-vary both within and between species (Hardison et al. 2003; Longman-Jacobsen et al. 2003). Due to the low fidelity of non-replicative DNA repair polymerases that introduce break-repair-induced single-nucleotide mutations in the vicinity of genomic structural alterations (Rattray et al. 2001; Rattray and Strathern 2003; Hicks et al. 2010), certain regions are prone to be simultaneously altered by both mutation types (Tian et al. 2008; De and Babu 2010). Is the covariance of InDels and substitutions observed here, and by others (Leushkin et al. 2012), driven by the genetic mechanism that incorporates InDels, or is it primarily the outcome of selection? We observed increased substitution rates in the sequential vicinity of InDels in protein-coding regions, and no increase in the proximity of InDels within non-coding regions (up to 60 nucleotides upstream and downstream). The effect of DNA repair may therefore be limited to long InDels, and may not apply to the short InDels analyzed here. Taken together with the increased substitution rates in the spatial vicinity of InDels (which cannot relate to the DNA repair machinery), our data are consistent with the InDel-substitution correlation being the outcome of selection.

The nonlinear accumulation of InDels (Fig. 1), and the covariance of substitutions and InDels (Figs. 4, 5; see also Leushkin et al. 2012), indicate that the occurrence of InDels is highly context dependent, or epistatic: InDels seem to occur in combination with certain substitutions. The primary assumption is that these correlated substitutions compensate for the deleterious effects of InDels, and are therefore fixed by positive selection, i.e., via adaptive walks (Kondrashov and Koonin 2003). Given the level of purifying selection acting on InDels, particularly in ordered regions, let alone in secondary structure elements, many InDels, unless enabled by certain point mutations, may result in complete non-functionalization. The lag behavior also suggests that correlated substitutions may occur beforehand, presumably as neutral, and later enable the accumulation of InDels. A similar phenomenon seems to underlie the accumulation of highly deleterious core substitutions that may be enabled by initially neutral substitutions on the surface (Toth-Petroczy and Tawfik 2011).

Overall, 'compensatory mutations' comprise the ruling paradigm, whilst the notion of neutral mutations coming first and thereby enabling deleterious changes ('enabling mutation(s)') is relatively unrecognized (an ISI topic search for "compensatory mutation(s)" versus "enabling mutation" gives 751:2 hits, and Google fight 3200:114). In both the fly and fungi phylogenies, a high fraction of InDel-specific substitutions might have occurred as compensatory. However, the lag behavior suggests a measurable fraction of enabling mutations that appeared prior to the InDel events. Leushkin et al. propose that InDels trigger adaptive changes, namely that adaptive amino acid substitutions occur after an InDel event, to compensate for its fitness effect (Leushkin et al. 2012). We surmise that both pre- and post-compensation contribute to the acceptance of InDels (Fig. 7). Thus, InDels, and other types of highly deleterious mutations, such as point mutations in core residues (Toth-Petroczy and Tawfik 2011), can be incorporated via neutral roaming. Indeed, several studies identified substitutions that accumulated as neutral (permissive mutations) and subsequently enabled highly deleterious point mutations to accumulate (Ortlund et al. 2007; Lunzer et al. 2010). InDels are, in general, far less studied than point mutations. Nonetheless, they are subject to the same epistatic constraints. Namely, InDels that accumulated within one sequence context are incompatible with others. In yeast, for example, the co-inducer Gal3 diverged by duplication from the enzyme Gal1. Divergence involved a deletion of two amino acids within Gal1's active site, rendering Gal3 enzymatically inactive yet functional as co-inducer. The same deletion, however, causes complete loss of function in the contemporary Gal1, both as an enzyme and as co-inducer. Likewise, the respective insertion causes loss of function in Gal3 (Hittinger and Carroll 2007). Thus, in the course of divergence, certain substitutions appeared that enabled this deletion.

So far we compared only orthologs that largely bear the same function, and the sequence changes relate mostly to genetic drift. However, the role of InDels in the functional adaptation of proteins is well recognized (Grishin 2001; Tawfik 2006; Golubchik et al. 2007; Hittinger and Carroll 2007; Akiva et al. 2008; Jochens et al. 2009; Hashimoto and Panchenko 2010; Afriat-Jurnou et al. 2012; Saab-Rincon et al. 2012), and examples of beneficial InDels are known (Podlaha and Zhang 2003; Podlaha et al. 2005). We found, however, that paralogs exhibit only a slightly higher InDel rate, and slightly lower S/I ratios, relative to the corresponding orthologs (Fig. S2). The weak signal may be due to the fact that adaptation is also manifested in an increased substitution rate, and that, along a given phylogeny, most InDels relate to drift and only a few actually drive neo-functionalization.


The newly identified features described here also suggest ways of facilitating the engineering of InDels, particularly of short InDels within active-site loops. Our own experience has been, for example, that a deletion within an active-site loop became advantageous only when a substitution in a residue upstream of this deletion was incorporated (Afriat-Jurnou et al. 2012). In other loop grafting experiments, the introduction of variability at hinge residues that connect the loops with the scaffold facilitated the acceptance of loop InDels (Ochoa-Leyva et al. 2009; Ochoa-Leyva et al. 2011). We surmise that enabling point mutations, both in sequential and spatial vicinity, comprise the key to InDel incorporation. Methods for predicting such enabling mutations may promote our understanding of natural evolution and facilitate the laboratory engineering of proteins.


Materials and Methods

Datasets. Fungal and fly sequences and orthology annotations were obtained from the Fungal Orthogroups Repository (Wapinski et al. 2007) and FlyBase (Tweedie et al. 2009), respectively (species are listed in Supplementary Figure 9). Due to many gene annotation errors, the sequences were carefully filtered. Orthologs that diverged by more than 70%, or deviated in length by more than 50%, were excluded. Overall, we used 1310 fungi and 10119 fly proteins with orthologs in all species within their phylogeny. The final list of proteins can be found in Dataset S1. Orthologous protein groups (COGs) were downloaded from the eggNOG v2 database (Muller et al. 2009). Only COGs that include >100 sequences and share >40% sequence identity were included, to minimize alignment errors (N=600). S. cerevisiae proteins with known structures were retrieved from the PDB.

Sequence analysis. We tested four different alignment methods: MUSCLE (Edgar 2004), Dialign (Subramanian et al. 2008), ClustalW (Larkin et al. 2007) and PRANK (Loytynoja and Goldman 2008) (Fig. 1c). Gapped positions were counted and InDels were defined as consecutive gapped regions, and only InDels within domains were included. InDel analysis and statistical calculations were performed using Perl scripts. Secondary structure assignments were done by DSSP (Kabsch and Sander 1983) based on high-resolution X-ray structures (≤2.5 Å). Missing regions within structures were assigned as 'loops', since the lack of density usually relates to high flexibility. InDels within secondary structure elements were defined as such if the positions flanking both sides of the InDel were annotated as helix or sheet (i.e., InDels at the ends of helices and sheets were assigned as loop). InDel contacting residues were defined based on spatial proximity (<4 Å) to the InDel flanking residues (one residue upstream and downstream) in the S. cerevisiae structures. We categorized domains as structured and unstructured based on disorder prediction by IUPred (Dosztanyi et al. 2005). Disordered domains were defined as regions ≥30 residues long, with fewer than 3 residues consecutively predicted to be ordered, and an average disorder score >0.5. Ordered domains were defined as regions with average disorder scores <0.5, and <3 consecutive disordered residues.
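The rule for assigning an InDel to a secondary structure element or to a loop can be written as a small function. This is a sketch under stated assumptions, not the original Perl code: the function name is hypothetical, the structure is assumed to be summarized as a per-residue string in which `H` marks helix, `E` marks sheet and any other character (including `-` for missing density) counts as loop, and the two flanking residues are given as 0-based indices into that string.

```python
def classify_indel(secondary_structure, left_flank, right_flank, structured="HE"):
    """Assign an InDel to 'secondary_structure' or 'loop'.

    secondary_structure: per-residue string, e.g. '--HHHH--EEEE--'
                         (assumed encoding: H = helix, E = sheet, else loop)
    left_flank, right_flank: 0-based indices of the residues immediately
                             upstream and downstream of the InDel
    An InDel counts as lying within secondary structure only if BOTH flanking
    residues are helix or sheet; InDels at the ends of elements are loops.
    """
    left_structured = secondary_structure[left_flank] in structured
    right_structured = secondary_structure[right_flank] in structured
    if left_structured and right_structured:
        return "secondary_structure"
    return "loop"
```

Counting InDels per class over all aligned pairs would then give the loop versus secondary-structure comparison.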

Trees and InDel-specific lineage analysis. Maximum likelihood phylogenetic trees were created by PhyML (Guindon and Gascuel 2003) based on the concatenated alignments of all orthologs with known structures (Fig. S9). Evolutionary rates were calculated by Rate4Site (Pupko et al. 2002) using the JTT matrix.

InDel-specific substitutions were determined by polarizing the alignments of three species from the phylogeny: ((InDel,Reference),Outgroup) (see Fig. 5a). For the fungi dataset the following triplets were considered: ((Scer,Spar),Smik), ((Spar,Smik),Sbay), ((Cgla,Scas),Sklu), ((Sklu,Kwal),Agos), ((Klac,Agos),Ylip), ((Ctro,Calb),Dhan), ((Cpar,Lelo),Dhan), ((Dhan,Cgiu),Ylip), ((Anid,Ncra),Sjap), ((Soct,Spom),Sjap). For the fly phylogeny the following triplets were considered: ((Dsim,Dsec),Dmel), ((Dyak,Dere),Dana), ((Dpse,Dper),Dwil), ((Dmoj,Dvir),Dgri). Within each triplet, we counted substitutions that occurred only in the InDel-containing sequence, and in neither the reference nor the outgroup sequence (InDel-specific divergence). Similarly, we counted substitutions that occurred only in the reference sequence (reference-specific substitutions). The sequence divergence was calculated in the vicinity of the InDel (local divergence; window sizes of 5, 10 and 20 residues) and for the whole sequence (global divergence). Relative divergence is defined as the difference between the local and global divergence (Figs. 4 and 5). The difference between the relative InDel-specific and relative reference-specific divergence is plotted in Figure 5 (for the common orthologs, N=534).
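The polarization of substitutions within a triplet, and the local-versus-global comparison, can be sketched as below. This is a hypothetical illustration rather than the authors' code. It assumes three equal-length aligned sequences, that columns with a gap in any sequence are skipped, and that a window of size w covers w alignment columns on each side of the InDel; the source does not specify these details.

```python
from typing import List, Optional

GAP = "-"


def classify_columns(indel_seq: str, ref_seq: str, out_seq: str) -> List[Optional[str]]:
    """Label each column of a ((InDel,Reference),Outgroup) triplet alignment."""
    labels: List[Optional[str]] = []
    for i, r, o in zip(indel_seq, ref_seq, out_seq):
        if GAP in (i, r, o):
            labels.append(None)         # not comparable
        elif i != r and r == o:
            labels.append("indel")      # change only in the InDel-carrying lineage
        elif i != r and i == o:
            labels.append("reference")  # change only in the reference lineage
        else:
            labels.append("other")      # conserved, or cannot be polarized
    return labels


def fraction(labels: List[Optional[str]], kind: str) -> float:
    """Fraction of comparable columns carrying the given label."""
    comparable = [x for x in labels if x is not None]
    if not comparable:
        return 0.0
    return sum(x == kind for x in comparable) / len(comparable)


def relative_divergence(labels: List[Optional[str]], kind: str,
                        indel_start: int, indel_end: int, window: int) -> float:
    """Local divergence (flanking windows) minus global divergence (whole sequence)."""
    left = labels[max(0, indel_start - window):indel_start]
    right = labels[indel_end:indel_end + window]
    return fraction(left + right, kind) - fraction(labels, kind)


def indel_vs_reference(labels: List[Optional[str]],
                       indel_start: int, indel_end: int, window: int) -> float:
    """Relative InDel-specific minus relative reference-specific divergence."""
    return (relative_divergence(labels, "indel", indel_start, indel_end, window)
            - relative_divergence(labels, "reference", indel_start, indel_end, window))
```

Under these assumptions, a positive value from `indel_vs_reference` indicates that lineage-specific substitutions near the InDel are enriched in the InDel-carrying sequence relative to the reference.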

Figures and statistical analyses were produced in Matlab.
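The two fit-derived quantities reported in the figure legends and Table 1, lag length and lag intensity, can be computed as in the following sketch. It uses plain Python rather than Matlab and ordinary least squares, which is an assumption about the fitting procedure. The linear fit is written as y = slope·x + intercept, so the x-axis intercept is -intercept/slope; the second-order fit has the form ax² + bx with no constant term. Function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_second_order(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x**2 + b * x (no constant term).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    a = (sx2y * s2 - s3 * sxy) / det
    b = (s4 * sxy - s3 * sx2y) / det
    return a, b


def lag_length(slope: float, intercept: float) -> float:
    """x-axis intercept of the linear fit."""
    return -intercept / slope


def lag_intensity(a: float, b: float) -> float:
    """Ratio b/a of the second-order fit coefficients."""
    return b / a
```

Here x would be the sequence divergence by substitutions and y the InDel accumulation, as plotted in Fig. 1.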


Acknowledgements

Financial support by the Israel Science Foundation is gratefully acknowledged. D.S.T. is the Nella and Leon Benoziyo Professor of Biochemistry.


Figure legends

Figure 1. Nonlinearity in the relative rates of accumulation of substitutions and InDels. The lag phase suggests that the appearance of InDels is correlated with substitutions that occurred first, as apparently-neutral mutations, and later enabled the accumulation of InDels. Sequence divergence by amino acid substitutions (X-axis) was correlated with InDel accumulation rates (Y-axis) comparing: (a) Proteomes of 22 fungi species (points represent average values of pairwise differences for all orthologs: 22*21/2=231 pairs). (b) Orthologous fungi pairs (points represent averages for all pairs within a given divergence window). (c) To examine the bias introduced by alignment programs, we compared four different programs using the fungi set (1310 proteins from 22 species). The lag intensity (b/a; note that the lower b/a, the more intense is the lag) is 158 for MUSCLE, 128 for ClustalW, 127 for Dialign, and 30 for PRANK. Given these trends, MUSCLE, which provides the most conservative estimate, was applied throughout this study. (d) Individual orthologous families (COGs) also show a non-linear trend. Shown is the correlation for a well-aligned COG (COG0126, phosphoglycerate kinase). Inset: distribution of lag lengths for 600 COGs.

Fitting parameters: The fit follows the second-order term $ax^2+bx$; also provided are the parameters for the linear fit. (a) $R^2_{\text{linear}}$ = 0.94, lag length = 0.08; $R^2_{\text{second-order}}$ = 0.96, b/a = 158. (b) $R^2_{\text{linear}}$ = 0.95, lag length = 0.08; $R^2_{\text{second-order}}$ = 0.99, b/a = 0.16. (d) $R^2_{\text{linear}}$ = 0.58, lag length = 65; $R^2_{\text{second-order}}$ = 0.61, b/a = 31. The error bars represent standard deviations.

Figure 2. Relative divergence rates of InDels and substitutions in coding versus non-coding regions of fungi orthologs. Plotted is the fraction of nucleotide substitutions (nt, including synonymous ones) vs. the fraction of InDels per gene. Points represent average values of pairwise differences binned by the level of divergence. Whereas coding regions exhibit a lag in the appearance of InDels ($R^2_{\text{linear}}$ = 0.86, lag length = 0.08; $R^2_{\text{second-order}}$ = 0.99, b/a = 0.08), non-coding regions show a linear correlation (data derived from promoter regions; $R^2_{\text{linear}}$ = 0.88, lag length = -0.02; $R^2_{\text{second-order}}$ = 0.88, b/a = 2.35).

Figure 3. The lag length increases with structural order. Sequence divergence by amino acid substitutions (X-axis) was correlated with InDel accumulation rates (Y-axis) comparing (from highest to lowest degree of order): secondary structure regions (α-helix or β-sheet configurations; "2° structure", $R^2_{\text{second-order}}$ = 0.84, b/a = 4.3; see also Inset); and ordered regions without regular structure, such as loops, turns and coils ("loop", $R^2_{\text{second-order}}$ = 0.90, b/a = 182), in fungi orthologs with known crystal structures (N=172). Also compared are ordered versus disordered regions, as defined using IUPred (a residue is disordered if its score is ≥0.5; disordered, $R^2_{\text{second-order}}$ = 0.95, b/a = 112; ordered, $R^2_{\text{second-order}}$ = 0.96, b/a = 103).

Figure 4. Increased substitution rates in sequential and spatial proximity to InDels. (a) The evolutionary rates of positions in the vicinity of InDels (local rate) were compared to the global rate (mean rate for the entire protein). The residues neighbouring InDels, both in sequence (black, mean=1.24) and in space (red, mean=1.33), show increased evolutionary rates (local rate / global rate > 1; t-test p-values < $10^{-5}$ for both sets). For the spatial neighbourhood, residues within <4 Å of the InDel-flanking residues were considered (see Methods). (b) Similarly, InDel-neighbouring positions exhibit increased sequence divergence (local identity - global identity < 0). The difference in sequence identities diminishes with increasing window size from 5 to 20 residues (nInDels = 150824; means are -0.17, -0.11 and -0.06, respectively; t-test p-values < $10^{-5}$ for all datasets). Sequence identity was calculated based on pairwise comparison of sequences, as the number of substitutions divided by the aligned length of the sequences (i.e., excluding gaps). (c) Increased sequence divergence of InDel-flanking positions in fly orthologs at window sizes of 5, 10, 20, 30 and 50 residues (nInDels = 167362; means are -0.31, -0.26, -0.20, -0.17 respectively; t-test p-values < $10^{-5}$).

Figure 5. The correlated appearance of InDels and substitutions. (a) InDel-specific substitutions were identified by virtue of appearing in one sequence (an InDel-carrying ortholog, or InDel-specific) and in neither the reference nor the outgroup orthologs. The fractions of InDel-specific and reference-specific substitutions were determined for all positions. (b, c) The distribution of the differences in the occurrence of InDel-specific versus reference-specific substitutions for the fungi (b) and fly (c) phylogenies. In both sets, InDel-specific changes are enriched in the InDels' vicinity, and this enrichment diminishes with increasing window size (from 5 to 20 residues; t-test p-values are $1.8\cdot10^{-5}$, 0.015 and 0.12, respectively; p-values are < $10^{-5}$ for the flies for all window sizes).

Figure 6. Substitutions-to-InDels ratios differ in the fungi and fly phylogenies. The average number of InDels and amino acid substitutions is compared for the same set of proteins (common orthologs, N=534) in the fly phylogeny ($R^2_{\text{second-order}}$ = 0.95; b/a = 166) and in the fungi phylogeny ($R^2_{\text{second-order}}$ = 0.95; b/a = 406).

Figure 7. Schematic representation of the two mechanisms of divergence: adaptive walking and neutral roaming. The x and y axes represent genotypes, and the z-axis represents phenotype, i.e., fitness. The plot describes two lineages, marked with a start and end point, along which a correlated substitution and an InDel occur. The red lineage describes an 'adaptive walk': a deleterious InDel occurs first, followed by a compensatory substitution. The blue lineage describes 'neutral roaming': a nearly-neutral substitution occurs first, and thereby enables the acceptance of the deleterious InDel that follows. Evidence for correlated substitutions and InDels is presented here (Fig. 5) and elsewhere (Leushkin et al. 2012), but correlation in itself does not indicate the mechanism. However, the lag phase in the accumulation of InDels relative to substitutions (Fig. 1) lends support to the role of neutral roaming in the accumulation of InDels.


Table 1. Substitutions-to-InDel ratios in coding and non-coding regions of different genomes.

| Genomes | Subset | Region | S/I¹ (I=1) | S/I¹ (I=10) | Lag length² | Lag intensity³ | Values derived from Figure |
|---|---|---|---|---|---|---|---|
| Yeast | Four Sensu stricto species (N=4192) | Complete proteins | 21 | 7 | 23 | 1.3 | |
| | | Coding regions⁴ | 2194 | 75 | 70 | 61 | Fig. S3c |
| | | Noncoding regions⁴ | 8 | 8 | 0 | - | |
| | S. cerevisiae | Spontaneous mutation rate⁵ | 16.5 | - | - | - | |
| Fungi | Orthologs of 22 fungi species (N=1310) | Complete proteins | 50 | 25 | 56 | 158 | Fig. 1ab |
| | | Secondary structure regions⁶ | 110 | 35 | 20 | 4.3 | Fig. 2a |
| | | Loop regions⁶ | 54 | 28 | 13 | 182 | |
| | | Ordered regions⁷ | 56 | 25 | 55 | 103 | |
| | | Disordered regions⁷ | 18 | 11 | 3.4 | 112 | |
| | Common fungi-fly orthologs (N=534) | | 47 | 24 | 49 | 167 | Fig. 5 |
| Flies | Orthologs of 12 fly (Drosophila) species (N=10119) | Complete proteins | 14 | 8 | 10 | 572 | Fig. S7a |
| | | Coding regions⁴ | 88 | 48 | 45 | 388 | |
| | | Noncoding regions⁴ | 14 | 14 | -24 | - | |
| | Common fungi-fly orthologs (N=534) | | 25 | 18 | 1 | 406 | Fig. 5 |
| | Drosophila melanogaster | Spontaneous mutation rate⁵ | 4 | - | - | - | |
| COGs | Orthologous protein families (N=600) | Complete proteins | 102±50 | 28±9 | 55±50 | 42±117 | Fig. 1c |

¹ Because of the non-linear dependency, the average S/I (the substitutions/InDels ratio) was derived for various levels of divergence, namely, for the first and 10th InDel.
² 'Lag length' relates to the intercept of the x axis of the linear fit (-b/a) in InDel-to-substitution graphs (e.g. Fig. 1).
³ 'Lag intensity' is the b/a value derived from the nonlinear fit $ax^2+bx$ in InDel-to-substitution graphs (e.g. Fig. 1).
⁴ These S/I values are measured for nucleotide substitutions (rather than amino acids, as in the 'Complete proteins' dataset) and InDels based on DNA alignments.
⁵ The spontaneous mutation rates as observed in laboratory strains (Lynch et al. 2008).
⁶ InDels and substitutions were determined in regions that adopt helical or sheet structures (secondary structure regions), or in loop and/or disordered regions, based on the crystal structures of the S. cerevisiae orthologs (N=170).
⁷ Disordered regions were predicted using IUPred (see Methods).


References

Afriat-Jurnou, L., C. J. Jackson, and D. S. Tawfik. 2012. Reconstructing a Missing Link in the Evolution of a Recently Diverged Phosphotriesterase by Active-Site Loop Remodeling. Biochemistry 51:6047-6055.
Akiva, E., Z. Itzhaki, and H. Margalit. 2008. Built-in loops allow versatility in domain-domain interactions: lessons from self-interacting domains. Proc Natl Acad Sci U S A 105:13292-13297.
Borneman, A. R., B. A. Desany, D. Riches, J. P. Affourtit, A. H. Forgan, I. S. Pretorius, M. Egholm, and P. J. Chambers. 2011. Whole-genome comparison reveals novel genetic elements that characterize the genome of industrial strains of Saccharomyces cerevisiae. PLoS Genet 7:e1001287.
Britten, R. J. 2010. Transposable element insertions have strongly affected human evolution. Proc Natl Acad Sci U S A 107:19945-19948.
Callahan, B., R. A. Neher, D. Bachtrog, P. Andolfatto, and B. I. Shraiman. 2011. Correlated evolution of nearby residues in Drosophilid proteins. PLoS Genet 7:e1001315.
Chen, J. Q., Y. Wu, H. Yang, J. Bergelson, M. Kreitman, and D. Tian. 2009. Variation in the ratio of nucleotide substitution and indel rates across genomes in mammals and bacteria. Mol Biol Evol 26:1523-1531.
Cooley, R. B., D. J. Arp, and P. A. Karplus. 2010. Evolutionary origin of a secondary structure: pi-helices as cryptic but widespread insertional variations of alpha-helices that enhance protein functionality. J Mol Biol 404:232-246.
de la Chaux, N., P. W. Messer, and P. F. Arndt. 2007. DNA indels in coding regions reveal selective constraints on protein evolution in the human lineage. BMC Evol Biol 7:191.
De, S., and M. M. Babu. 2010. A time-invariant principle of genome evolution. Proc Natl Acad Sci U S A 107:13004-13009.
Denver, D. R., K. Morris, M. Lynch, and W. K. Thomas. 2004. High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430:679-682.
Dosztanyi, Z., V. Csizmok, P. Tompa, and I. Simon. 2005. The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J Mol Biol 347:827-839.
Edgar, R. C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792-1797.
Gojobori, T. 1983. Codon substitution in evolution and the "saturation" of synonymous changes. Genetics 105:1011-1027.
Golubchik, T., M. J. Wise, S. Easteal, and L. S. Jermiin. 2007. Mind the gaps: evidence of bias in estimates of multiple sequence alignments. Mol Biol Evol 24:2433-2442.
Grishin, N. V. 2001. Fold change in evolution of protein structures. J Struct Biol 134:167-185.
Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696-704.
Hardison, R. C., K. M. Roskin, S. Yang et al. (18 co-authors) 2003. Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution. Genome Res 13:13-26.
Hashimoto, K., and A. R. Panchenko. 2010. Mechanisms of protein oligomerization, the critical role of insertions and deletions in maintaining different oligomeric states. Proc Natl Acad Sci U S A 107:20352-20357.
Hicks, W. M., M. Kim, and J. E. Haber. 2010. Increased mutagenesis and unique mutation signature associated with mitotic gene conversion. Science 329:82-85.
Hittinger, C. T., and S. B. Carroll. 2007. Gene duplication and the adaptive evolution of a classic genetic switch. Nature 449:677-681.
Hu, X., H. Wang, H. Ke, and B. Kuhlman. 2007. High-resolution design of a protein loop. Proc Natl Acad Sci U S A 104:17668-17673.
Jochens, H., K. Stiba, C. Savile, R. Fujii, J. G. Yu, T. Gerassenkov, R. J. Kazlauskas, and U. T. Bornscheuer. 2009. Converting an esterase into an epoxide hydrolase. Angew Chem Int Ed Engl 48:3532-3535.
Kabsch, W., and C. Sander. 1983. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577-2637.
Kellis, M., N. Patterson, M. Endrizzi, B. Birren, and E. S. Lander. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241-254.
Kondrashov, F. A., and E. V. Koonin. 2003. Evolution of alternative splicing: deletions, insertions and origin of functional parts of proteins from intron sequences. Trends Genet 19:115-119.
Larkin, M. A., G. Blackshields, N. P. Brown et al. (13 co-authors) 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23:2947-2948.
Leushkin, E. V., G. A. Bazykin, and A. S. Kondrashov. 2012. Insertions and deletions trigger adaptive walks in Drosophila proteins. Proc Biol Sci 279:3075-3082.
Longman-Jacobsen, N., J. F. Williamson, R. L. Dawkins, and S. Gaudieri. 2003. In polymorphic genomic regions indels cluster with nucleotide polymorphism: Quantum Genomics. Gene 312:257-261.
Loytynoja, A., and N. Goldman. 2008. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science 320:1632-1635.
Lunzer, M., G. B. Golding, and A. M. Dean. 2010. Pervasive cryptic epistasis in molecular evolution. PLoS Genet 6:e1001162.
Lynch, M., W. Sung, K. Morris et al. (11 co-authors) 2008. A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc Natl Acad Sci U S A 105:9272-9277.
McDonald, M. J., W. C. Wang, H. D. Huang, and J. Y. Leu. 2011. Clusters of nucleotide substitutions and insertion/deletion mutations are associated with repeat sequences. PLoS Biol 9:e1000622.
Muller, J., D. Szklarczyk, P. Julien et al. (11 co-authors) 2009. eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res 38:D190-195.
Murphy, P. M., J. M. Bolduc, J. L. Gallaher, B. L. Stoddard, and D. Baker. 2009. Alteration of enzyme specificity by computational loop remodeling and design. Proc Natl Acad Sci U S A 106:9215-9220.
Neuenfeldt, A., A. Just, H. Betat, and M. Morl. 2008. Evolution of tRNA nucleotidyltransferases: a small deletion generated CC-adding enzymes. Proc Natl Acad Sci U S A 105:7953-7958.
Ochoa-Leyva, A., F. Barona-Gomez, G. Saab-Rincon, K. Verdel-Aranda, F. Sanchez, and X. Soberon. 2011. Exploring the Structure-Function Loop Adaptability of a (beta/alpha)(8)-Barrel Enzyme through Loop Swapping and Hinge Variability. J Mol Biol 411:143-157.
Ochoa-Leyva, A., X. Soberon, F. Sanchez, M. Arguello, G. Montero-Moran, and G. Saab-Rincon. 2009. Protein design through systematic catalytic loop exchange in the (beta/alpha)8 fold. J Mol Biol 387:949-964.
Ortlund, E. A., J. T. Bridgham, M. R. Redinbo, and J. W. Thornton. 2007. Crystal structure of an ancient protein: evolution by conformational epistasis. Science 317:1544-1548.
Podlaha, O., D. M. Webb, P. K. Tucker, and J. Zhang. 2005. Positive selection for indel substitutions in the rodent sperm protein catsper1. Mol Biol Evol 22:1845-1852.
Podlaha, O., and J. Zhang. 2003. Positive selection on protein-length in the evolution of a primate sperm ion channel. Proc Natl Acad Sci U S A 100:12241-12246.
Pupko, T., R. E. Bell, I. Mayrose, F. Glaser, and N. Ben-Tal. 2002. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18 Suppl 1:S71-77.
Rattray, A. J., C. B. McGill, B. K. Shafer, and J. N. Strathern. 2001. Fidelity of mitotic double-strand-break repair in Saccharomyces cerevisiae: a role for SAE2/COM1. Genetics 158:109-122.
Rattray, A. J., and J. N. Strathern. 2003. Error-prone DNA polymerases: when making a mistake is the only way to get ahead. Annu Rev Genet 37:31-66.
Saab-Rincon, G., L. Olvera, M. Olvera, E. Rudino-Pinera, E. Benites, X. Soberon, and E. Morett. 2012. Evolutionary Walk between (beta/alpha)(8) Barrels: Catalytic Migration from Triosephosphate Isomerase to Thiamin Phosphate Synthase. J Mol Biol 416:255-270.
Simon, M., and J. M. Hancock. 2009. Tandem and cryptic amino acid repeats accumulate in disordered regions of proteins. Genome Biol 10:R59.
Subramanian, A. R., M. Kaufmann, and B. Morgenstern. 2008. DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol 3:6.
Tawfik, D. S. 2006. Biochemistry. Loop grafting and the origins of enzyme species. Science 311:475-476.
Tian, D., Q. Wang, P. Zhang et al. 2008. Single-nucleotide mutation rate increases close to insertions/deletions in eukaryotes. Nature 455:105-108.
Tokuriki, N., F. Stricher, L. Serrano, and D. S. Tawfik. 2008. How protein stability and new functions trade off. PLoS Comput Biol 4:e1000002.
Toth-Petroczy, A., and D. S. Tawfik. 2011. Slow protein evolutionary rates are dictated by surface-core association. Proc Natl Acad Sci U S A 108:11151-11156.
Tweedie, S., M. Ashburner, K. Falls et al. (11 co-authors) 2009. FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Res 37:D555-559.
Wapinski, I., A. Pfeffer, N. Friedman, and A. Regev. 2007. Natural history and evolutionary principles of gene duplication in fungi. Nature 449:54-61.
Yang, Y., J. Sterling, F. Storici, M. A. Resnick, and D. A. Gordenin. 2008. Hypermutability of damaged single-strand DNA formed at double-strand breaks and uncapped telomeres in yeast Saccharomyces cerevisiae. PLoS Genet 4:e1000264.
Yang, Z. 1998. On the best evolutionary rate for phylogenetic analysis. Syst Biol 47:125-133.

