
fpls-07-01433 September 20, 2016 Time: 13:22 # 1


ORIGINAL RESEARCH published: 22 September 2016 doi: 10.3389/fpls.2016.01433

Gene Evolutionary Trajectories and GC Patterns Driven by Recombination in Zea mays

Anitha Sundararajan1, Stefanie Dukowic-Schulze2, Madeline Kwicklis1, Kayla Engstrom1, Nathan Garcia1, Oliver J. Oviedo1, Thiruvarangan Ramaraj1, Michael D. Gonzales1, Yan He3, Minghui Wang3,4, Qi Sun4, Jaroslaw Pillardy4, Shahryar F. Kianian5, Wojciech P. Pawlowski3, Changbin Chen2 and Joann Mudge1*

1 National Center for Genome Resources, Santa Fe, NM, USA, 2 Department of Horticultural Science, University of Minnesota, St. Paul, MN, USA, 3 Section of Plant Biology, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA, 4 Biotechnology Resource Center Bioinformatics Facility, Cornell University, Ithaca, NY, USA, 5 Cereal Disease Laboratory, United States Department of Agriculture – Agricultural Research Service, St. Paul, MN, USA

Recombination occurring during meiosis is critical for creating genetic variation and plays an essential role in plant . In addition to creating novel gene combinations, recombination can affect genome structure through altering GC patterns. In maize (Zea mays) and other grasses, another intriguing GC pattern exists. Maize genes show a bimodal GC content distribution that has been attributed to nucleotide bias in the third, or wobble, position of the codon. Recombination may be an underlying driving force given that recombination sites are often associated with high GC content. Here we explore the relationship between recombination and genomic GC patterns by comparing GC gene content at each of the three codon positions (GC1, GC2, and GC3, collectively termed GCx) to instances of a variable GC-rich motif that underlies double strand break (DSB) hotspots and to meiocyte-specific gene expression. Surprisingly, GCx bimodality in maize cannot be fully explained by the codon wobble hypothesis. High GCx genes show a strong overlap with the DSB hotspot motif, possibly providing a mechanism for the high evolutionary rates seen in these genes. On the other hand, genes that are turned on in meiosis (early prophase I) are biased against both high GCx genes and genes with the DSB hotspot motif, possibly allowing important meiotic genes to avoid DSBs. Our data suggests a strong link between the GC-rich motif underlying DSB hotspots and high GCx genes.

Keywords: recombination, GC, meiosis, meiocytes, maize, codon usage, wobble, gene expression

Edited by: Jun Yu, Beijing Institute of Genomics, China

Reviewed by: James D. Higgins, University of Leicester, UK; Longjiang Fan, Zhejiang University, China

*Correspondence: Joann Mudge [email protected]

Specialty section: This article was submitted to Plant Genetics and Genomics, a section of the journal Frontiers in Plant Science

Received: 20 July 2016; Accepted: 08 September 2016; Published: 22 September 2016

Citation: Sundararajan A, Dukowic-Schulze S, Kwicklis M, Engstrom K, Garcia N, Oviedo OJ, Ramaraj T, Gonzales MD, He Y, Wang M, Sun Q, Pillardy J, Kianian SF, Pawlowski WP, Chen C and Mudge J (2016) Gene Evolutionary Trajectories and GC Patterns Driven by Recombination in Zea mays. Front. Plant Sci. 7:1433. doi: 10.3389/fpls.2016.01433

INTRODUCTION

In eukaryotes, meiotic exchange of genetic information, or recombination, between homologous chromosomes is a critical step in generating genetic diversity required for adaptation. Recombination is also a crucial tool in plant improvement efforts. Local genome architecture is sculpted by the recombination process, and genome architecture, in turn, drives recombination. This interplay helps to create variability in genomic space, defining relatively stable and plastic genomic regions. This fluctuation in genomic stability is critical for balancing adaptation and stability on the phenotypic level.

Frontiers in Plant Science | www.frontiersin.org 1 September 2016 | Volume 7 | Article 1433 fpls-07-01433 September 20, 2016 Time: 13:22 # 2

Sundararajan et al. Gene Evolutionary Trajectories and GC Patterns

Recombination has direct implications for GC patterns and vice versa. GC content refers to the percentage of guanine and cytosine bases in a DNA sequence, as opposed to adenine and thymine bases. There have been many studies substantiating the positive correlation between recombination and GC content (Ikemura and Wada, 1991; Eyre-Walker, 1993; Fullerton et al., 2001; Galtier et al., 2001; Marais et al., 2001; Duret and Arndt, 2008; Haudry et al., 2008; Escobar et al., 2010; Muyle et al., 2011). Crossovers have been found to be correlated with high GC content in rat, mouse, human, zebrafish, bee, and maize at a broad scale (Jensen-Seaman et al., 2004; Beye et al., 2006; Gore et al., 2009; Backstrom et al., 2010; Giraut et al., 2011), while other studies detected strong correlation only at a fine scale (∼5 kb for yeast, ∼15–128 kb for human) and rather weak correlation at a broad scale (∼30 kb for yeast, ∼1 Mb for human; Gerton et al., 2000; Myers et al., 2006; Marsolier-Kergoat and Yeramian, 2009).

In plants, correlation of recombination and GC content has been demonstrated in multiple species (Haudry et al., 2008; Escobar et al., 2010; Muyle et al., 2011). A study examining three different grasses (rice, maize, and Brachypodium) revealed significant correlation of the local recombination rate with high GC content, especially in the wobble codon position (third position in the codon; Serres-Giardi et al., 2012). This was in contrast to prior studies performed with lower levels of resolution, which found only weak correlation in maize and none in rice or Brachypodium (Gore et al., 2009; Tian et al., 2009; Huo et al., 2011). However, negative correlation between crossovers and high GC content was reported for Arabidopsis (Drouaud et al., 2006) in spite of the fact that a crossover motif has been identified that has high GC content every third nucleotide (Wijnker et al., 2013).

While most studies agreed on a positive correlation between recombination and high GC content, the cause for this has been disputed (Duret and Arndt, 2008; Marsolier-Kergoat and Yeramian, 2009; Giraut et al., 2011). Possible reasons suggested for the high GC/crossover correlation include selection on codon usage, mutational bias, or GC-biased gene conversion (Eyre-Walker and Hurst, 2001; Duret and Galtier, 2009). The latter is seen as the most likely cause for GC enrichment (Figure 1), and has been suggested for diverse organisms such as yeast, mammals, and birds (Webster et al., 2006; Duret and Arndt, 2008; Mancera et al., 2008; Nabholz et al., 2011). A study by Birdsell (2002) presented compelling evidence demonstrating a highly significant positive correlation between GC in the wobble position and recombination within 6,143 ORFs analyzed in the yeast (Saccharomyces cerevisiae) genome. This study also showed a significant correlation between recombination and the mean GC content of the first and second codon positions, but not to the same extent as with the GC content in the third, or wobble, position (Birdsell, 2002).

In biased gene conversion, repair tracts at recombination sites that are not necessarily resolved into crossovers favor changes to GC over AT (Eyre-Walker, 1993; Birdsell, 2002). This can occur simply by using the mismatch repair machinery (Marais, 2003) or by repair of double strand breaks (DSBs) in GC-poor alleles with GC-rich ones, which occurs in but not in yeast (Duret and Arndt, 2008; Marsolier-Kergoat and Yeramian, 2009; Marsolier-Kergoat, 2011). In yeast, GC-biased gene conversion was found specifically at crossovers, not at DSBs that did not involve gene conversion and did not result in a crossover, and was shown to be due to mismatch repair rather than base excision repair or DSB repair (Lesecque et al., 2013). Even within-species variation in GC content is partly related to recombination, being correlated in mammals, birds, and yeast (Birdsell, 2002; Duret and Arndt, 2008; Nabholz et al., 2011). Notably, it has been proposed for angiosperms that recombination and GC-biased gene conversion drive gene GC content patterns, even forming a 5′–3′ gradient along many genes (Glémin et al., 2014). However, GC-biased gene conversion might be attenuated in inbreeding species such as selfing grasses, leading to no apparent correlation (Giraut et al., 2011).

Triplets of the four major nucleotide bases (guanine, cytosine, adenine, and thymine/uracil) can be combined in 64 (4³) different ways but encode for only 20 amino acids and the stop codon. Thus, there is redundancy in the genetic code, resulting in triplets (codons) that differ but are synonymous concerning their matching amino acid. GC content, , and selection pressure are critical for the evolution of heterogeneity in codon usage (Sharp et al., 1988; Sharp and Matassi, 1994; Karlin and Mrazek, 1996; Sueoka and Kawanishi, 2000). By now, it has been well documented that synonymous codon usage varies significantly among and, in addition, among different genes within any given genome (Grantham et al., 1980; Sharp et al., 1988; Wang and Hickey, 2007). One implication of higher GC content from silent codon positions can be a general increased expression due to higher stability of mRNA, as demonstrated in mammals (Kudla et al., 2006).

Intra- and inter-genomic deviations in codon usage can be attributed to many factors. In the case of prokaryotes and unicellular organisms, it is thought to be a result of natural selection during which protein production is optimized (Gouy and Gautier, 1982; Sharp and Li, 1986; Shields et al., 1988; Sharp et al., 2005). Likewise, in some higher eukaryotes, there is also evidence that codon bias may occur due to selection for translational efficiency (Shields et al., 1988; Stenico et al., 1994). Particularly, there is a positive correlation between the complementarity of the codons in highly expressed genes and the anticodons of the most abundant tRNAs (Ikemura, 1981; Moriyama and Powell, 1997; Kanaya et al., 1999).

Numerous studies have been conducted on codon usage in plant species including the model plants Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and a moss (Physcomitrella patens), as well as in poplar, citrus and true grass (Poaceae/Gramineae) species (Chiapello et al., 1998; Liu et al., 2003; Guo et al., 2007; Ingvarsson, 2008; Xu et al., 2013). In addition to the reported factors involved in codon usage bias, higher average GC content in monocot compared to dicot genomes plays an important role in dictating codon usage differences between the two groups of plants (Fennoy and Bailey-Serres, 1993; Carels and Bernardi, 2000). These compositional variations are illustrated by an abundance of genes with high GC levels in the third codon position in monocots (Matassi et al., 1989). The third


FIGURE 1 | Outcomes of meiotic recombination. A double strand break is processed via different pathways, with very few DSBs resulting in actual crossovers while a majority are resolved via gene conversion, frequently inserting a GC bias due to mismatch repair. Thick arrows represent the major routes.
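The redundancy of the genetic code described in the Introduction can be made concrete with a short sketch. The Python below is an illustration only and is not part of the study: it enumerates the 64 codons of the standard genetic code and counts, for each codon position, how many single-base changes between sense codons leave the encoded amino acid unchanged. The names `CODON_TABLE` and `synonymous_changes_by_position` are hypothetical, and stop codons are excluded by assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_changes_by_position(table):
    """Count single-base changes that keep the amino acid, per codon position.

    Returns a list [first, second, third]. Stop codons are skipped
    (an assumption made for this illustration).
    """
    counts = [0, 0, 0]
    for codon, aa in table.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if table[mutant] == aa:
                    counts[pos] += 1
    return counts


first, second, third = synonymous_changes_by_position(CODON_TABLE)
print(f"synonymous changes: pos1={first}, pos2={second}, pos3={third}")
```

Under these assumptions almost all synonymous single-base changes fall at the third position, which is the sense in which the wobble position carries most of the degeneracy of the code.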

position in the codon, also referred to as the wobble position, is less biased for the amino acid than the other two bases and accounts for most of the degeneracy of the genetic code. The wobble position functions as a marker for GC richness, and the frequency of GC nucleotides at the third position is defined as GC3 (Chiapello et al., 1998; Tatarinova et al., 2010; Elhaik et al., 2014).

In some organisms, two classes of genes can be distinguished by their high or low GC3 content (Carels and Bernardi, 2000; Tatarinova et al., 2010). This trait is thought to be ancestral to monocots, with some lineages losing the bimodal GC3 distribution (Clément et al., 2014). These two GC3 classes in monocots experience divergent evolutionary pressure and contain different functional categories of genes. High GC3 genes within monocots were mainly categorized into functions involving electron transport or energy pathways, response to biotic and abiotic stressors and signal transduction (Carels and Bernardi, 2000; Tatarinova et al., 2010). High GC3 genes can be turned on quickly and experience accelerated evolution (Tatarinova et al., 2010).

Kawabe and Miyashita (2003) went beyond GC3 content and described GC content in the first, second, and third codon positions for seven plant species, including monocots and dicots, showing that GC1 and GC2 in addition to GC3 were significantly higher in monocots than dicots. The differences in GC content were the largest in the third codon position, followed by the first and then the second position. Studies by Cruveiller et al. (2000, 2004) report a linear correlation between genic GC2 and GC3 levels, which is found in species as distant as human and bacteria.

As part of a larger project, we have explored the genomic landscape of meiosis, specifically looking at early prophase I in isolated plant meiocytes, the cells which undergo meiosis and recombination. Previous publications have focused on the functional aspects of gene expression (Chen et al., 2010; Dukowic-Schulze et al., 2014a,b) and the landscape of DSB sites and its distribution (He et al., in review). This article focuses primarily on the implications of recombination on genome evolution, as measured by GC patterns, and vice versa.

Recombination occurs during meiosis, specifically during prophase I when DSBs are formed and homologous chromosomes pair and recombine (Padmore et al., 1991; Sheehan and Pawlowski, 2012; Dukowic-Schulze et al., 2014b). While DSBs initiate recombination and are a necessary prerequisite, most DSBs will not be resolved into crossovers. Two Arabidopsis motifs associated with crossovers have been identified, including one that showed high GC content every three nucleotides (Wijnker et al., 2013). In maize, a previously identified variable motif underlying genic DSB hotspots is GC-rich and also shows high GC periodicity every three nucleotides, reminiscent of GC periodicity within the codon (He et al., in review).

In this study, we used maize as a model to better understand the relationship between genome architecture and recombination. We examine the interplay between genome evolution (including divergent evolutionary trajectories within a single genome), GC patterns, and recombination initiation in maize. Specifically, we address whether the GC-rich, three-nucleotide-periodic motif underlying DSB hotspots in maize correlates with GC3 or other codon-driven GC patterns. In addition, we address how meiotic genes fit into the DSB and GC landscapes. Concurrently, we extend present knowledge of GC1, GC2, and GC3, collectively termed GCx, in Zea mays, by examining the relationships among them. In short, we aim to learn whether DSB-associated motifs with high GC content, particularly at every third nucleotide, could be the driving force behind bimodal GC patterns that split the maize genome into labile and stable evolutionary trajectories.

MATERIALS AND METHODS

Reference Genome
All analyses used the B73 maize reference genome version RefGen_v2 for consistency with the previous expression analysis. The filtered


gene set (annotation set 5b, gff format) was used for gene annotation.

Differential Expression Analysis
Samples, sequence, and differential expression analyses are described in Dukowic-Schulze et al. (2014a,b,c). For this analysis, a less stringent dataset was used with a significance cutoff for calling differential expression increased from p = 0.01 to p = 0.05.

Double Strand Break Hotspots
Using ChIP-seq with antibodies against the RAD51 protein as described in He et al. (2013), the DSB hotspot motif was identified with the sequence GVSGRSGNSGRSGVSGRSG (He et al., in review). The motif was identified from ∼900 genic hotspot regions that did not contain transposable elements. Copies of the motif were identified using the rGADEM package (Li, 2009) to re-scan these genic hotspot regions for matches to the position weight matrix of the motif using a stringency of 80%.

GC Calculations
GC, GC1, GC2, and GC3 were calculated using custom Perl scripts. For GC1, GC2, and GC3, calculations for each gene were performed on the sequence that contributes to the protein (coding domain sequences, CDSs), with redundancies removed where CDSs overlapped. The phase of each CDS, defined as the number of nucleotides that need to be removed from the beginning of the CDS to find the first base of the next codon, was taken into account. GC1 represents the GC content of the first nucleotides, GC2 the content of the second nucleotides, and GC3 the content of the third nucleotides of all codons in a gene. Genic GC was calculated for exons only (CDSs) as well as for exons together with introns in the pre-mRNA.

Pathway Enrichment Analysis
agriGO was used to perform gene ontology (GO) enrichment studies (Du et al., 2010) using singular enrichment analysis to identify enrichment compared to the Z. mays reference. Advanced statistical options include Fisher’s exact test and, in order to perform multi-comparison adjustment with the large input dataset, the Benjamini–Hochberg correction method (Benjamini and Hochberg, 1995). A significance value of 0.05 was used to obtain lists of enriched GO terms unless the input gene list was large, in which case we focused on the most significant terms (p = 0.01). This did not alter the nature of the functionalities that were enriched for within the analyses. In order to consolidate the large list of GO terms, REVIGO was used (Supek et al., 2011). REVIGO uses a simple hierarchical clustering procedure to remove redundant terms, summarize related terms, and visualize the final set of GO terms.

Plotting and Statistical Analyses
Plotting was done in R Statistical Package 3.2.0 and two-sided chi-square tests were performed in Microsoft Excel v14.6.4.

RESULTS

GC Patterns in Maize Genes Show Bimodal Peaks with a Strong Bias in the Third Codon Position
We examined the GC content of maize genes and their CDSs (Figure 2). The GC content of maize genes shows a bimodal peak, indicating that there are two classes of genes in the maize genome that are differentiated by GC content. This matches previous observations (Duret et al., 1995; Carels and Bernardi, 2000; Lescot et al., 2008; Paterson et al., 2009) and holds true both when calculated across genes, including introns (Figure 2A), and when calculated just across CDSs (Figure 2B). Including introns and untranslated regions (UTRs; Figure 2A) appears to bias genes toward the lower GC content class compared to the pattern using only CDS, though both classes are clearly visible. This suggests that the high GC content is maintained preferentially in the CDS regions. As discussed above, a strong bias in GC content at the third codon position has been shown to contribute to the bimodality of maize genic GC content but bimodality has been shown for all three codon positions. We took a comprehensive approach and examined GC content in the first (GC1), second (GC2), and third (GC3) codon positions, collectively termed GCx.

GC1, GC2, and GC3 were calculated for all genes and the distribution plotted (Figure 3). A bimodal distribution was easily apparent for GC1 and GC3 (GC content peaks around 50–55% and 90–95%) where both GC content classes had higher GC content than the overall genic GC content (see Figures 2 and 3), regarding both the intron-inclusive (40–45% and around 60%) and the intron-exclusive CDS analysis (∼50% and 70–75%; Figure 3). In contrast, GC2 analysis showed only one discernible peak of far lower GC content (around 45%), with a long tail to the right, with possibly a very shallow second peak showing high GC content in the same range as that seen in GC1 and GC3. As expected, the peak containing high GC content genes was most pronounced in the third position. A cutoff of ≥80% (Tatarinova et al., 2010) was used for all GCx conditions to identify the class of high GCx content genes among the 39,656 genes present in Z. mays (Table 1). Among the high GCx content classes, the high GC3 content class had the most genes at 5,719, followed by the high GC1 content class at 3,647 genes. GC2 had the fewest with 629 genes (Table 1). There was almost no overlap between each of the high GCx classes (Figure 4).

GC Patterns in Genic Double Strand Break Hotspot Motifs Show High GC Every Third Nucleotide
To gain insight on patterns possibly shaping bimodality and GCx classes, we looked at copies of the DSB hotspot motif (hereafter referred to as “motifs”) in genic regions (He et al., in review). Of 3,423 DSB genic hotspot motifs, 2,143 fell into 544 genes. Of these, 1,909 motifs fell into 811 CDSs of 511 genes, and the remaining 234 motifs occurred in introns and 5′ and 3′


FIGURE 2 | GC patterns within maize coding regions shown in kernel density plots. (A) Histogram of GC content of genes (including exons and introns). (B) Histogram of GC content of genes (just CDS sequence). The area under the curve represents the probability of getting a value in a given range of GC content, with the area under the entire curve equal to 1.
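The per-position GC calculation described under GC Calculations can be sketched as follows. The authors used custom Perl scripts that are not shown in the article, so this Python version is a minimal illustration rather than their implementation. It assumes each gene is supplied as a list of `(sequence, phase)` CDS segments on the coding strand in 5′ to 3′ order, with phase defined as in Materials and Methods; the names `gcx`, `high_classes`, and `HIGH_CUTOFF` are hypothetical, and removal of overlapping CDS redundancy is not handled here.

```python
HIGH_CUTOFF = 0.80  # the >=80% cutoff used to define high GCx classes


def gcx(cds_segments):
    """Return (GC1, GC2, GC3) as fractions for one gene.

    cds_segments: list of (sequence, phase) tuples on the coding strand,
    in 5' to 3' order. phase is the number of bases to skip at the start
    of the segment to reach the first base of the next codon (0, 1, or 2).
    """
    gc = [0, 0, 0]
    total = [0, 0, 0]
    for seq, phase in cds_segments:
        for i, base in enumerate(seq.upper()):
            # 0, 1, 2 correspond to the first, second, third codon position.
            pos = (i - phase) % 3
            total[pos] += 1
            if base in "GC":
                gc[pos] += 1
    return tuple(g / t if t else float("nan") for g, t in zip(gc, total))


def high_classes(values):
    """List the GCx classes (GC1, GC2, GC3) at or above the cutoff."""
    return [f"GC{i + 1}" for i, v in enumerate(values) if v >= HIGH_CUTOFF]


# A toy CDS (ATG GCC GCG) given whole, and split across two segments
# so that the second segment starts mid-codon with phase 2.
whole = gcx([("ATGGCCGCG", 0)])
split = gcx([("ATGG", 0), ("CCGCG", 2)])
print(whole, high_classes(whole))
```

Tracking phase per segment keeps the codon position correct when a codon is split across two CDS segments, which is why the whole and split inputs above give the same result.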

UTR regions. Maize genes are, on average, less than 50% coding sequence (1,153.7 nt of intron sequence for every 1 kb of CDS), yet 89% of the motifs occurred in the CDS. This indicates a strong preference for copies of the DSB hotspot motifs to occur in the protein-coding portion of the gene (chi-square p-value = 0). Of the 544 genes containing hotspot motifs, 84% contained more than one motif instance, indicating the propensity for them to cluster (Table 2).

GC content of the hotspot motif (GVSGRSGNSGRSGVSGRSG) demonstrates a 3 nt-based periodicity (He et al., in review). While none of the positions are perfectly conserved, the G’s at every third nucleotide have information content values that are much higher than at any other residue and range from approximately 1.2 to 1.7. Indeed, every third position starting with the first nucleotide is nearly always “G.” We determined GC content across positions 1–19 of all underlying motifs. GC content per nucleotide position, averaged across all motif occurrences, was calculated for all hotspot motifs as well as only those hotspot motifs falling into genes. Every third nucleotide was grouped into a separate periodicity group (i.e., group 1 = nucleotide positions 1, 4, 7, 10, 13, 16, 19; group 2 = nucleotide positions 2, 5, 8, 11, 14, 17; group 3 = nucleotide positions 3, 6, 9, 12, 15, 18). These groups are framed relative to the start position of the motif and not necessarily to the coding frame of the containing gene (Figures 5A,B). Pairwise differences between GC content of these groups all showed significant differences (Table 3).

FIGURE 3 | Kernel density plots of GCx distribution patterns of gene CDSs. (A) GC1. (B) GC2. (C) GC3.

TABLE 1 | Counts of high and low GCx genes.

GCx    High GCx genes (≥80%)    Low GCx genes (<80%)
GC1    3,647                    36,009
GC2    629                      39,027
GC3    5,719                    33,937

Because the periodic groups of the DSB hotspot motifs are not necessarily in frame with a gene’s coding frame, we analyzed GCx content of the motifs by identifying each motif’s coding frame and adjusting it from motif coordinates (relating to the beginning of the motif) to GCx space (relating to the coding frame of the gene that contains the motif). Indeed, motifs fell into all three possible coding frames, with 868 matching the gene’s coding frame (i.e., the first base of the motif is also the first position of a codon), 389 starting on the second position of the codon, and 1,094 starting on the third position of the codon. Nucleotides within motifs were then placed into GCx groups based on their position within


TABLE 2 | Frequencies of double strand break hotspot motifs per gene.

Motifs per gene    Number of genes
1                  85
2                  107
3                  92
4                  75
5                  58
6                  45
7                  32
8                  20
9                  13
10                 5
11                 6
12                 5
15                 1
16                 1
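The chi-square comparisons reported in this section are one-degree-of-freedom tests of observed against expected counts, and can be sketched with the Python standard library. This is an illustration, not the authors' procedure (they report using Microsoft Excel). For one degree of freedom the upper-tail p-value equals erfc(sqrt(x/2)). The function name `chi_square_1df` is hypothetical, and the expected CDS share of motifs is an assumption made here: it is derived only from the stated 1,153.7 nt of intron per 1 kb of CDS and ignores UTRs.

```python
import math


def chi_square_1df(observed, expected):
    """Chi-square statistic and p-value for two categories (1 df).

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Motif placement: 1,909 of 2,143 genic motifs fall in CDS.
motifs_total = 2143
motifs_in_cds = 1909
# Assumption: expected CDS share from intron-to-CDS length only (UTRs ignored).
cds_fraction = 1000.0 / (1000.0 + 1153.7)
expected_cds = motifs_total * cds_fraction
placement_stat, placement_p = chi_square_1df(
    [motifs_in_cds, motifs_total - motifs_in_cds],
    [expected_cds, motifs_total - expected_cds],
)

# High GC1 genes among the 544 motif-containing genes (Table 4, GC1 row);
# the expectation uses the genome-wide share of high GC1 genes (Table 1).
genes_with_motif = 544
expected_high_gc1 = genes_with_motif * 3647 / 39656
gc1_stat, gc1_p = chi_square_1df(
    [130, 414],
    [expected_high_gc1, genes_with_motif - expected_high_gc1],
)
print(placement_stat, placement_p, gc1_stat, gc1_p)
```

Under these assumptions the placement statistic is large enough that its p-value falls below the smallest representable double, which is one plausible reading of the reported "p-value = 0". The GC1 comparison gives a p-value on the order of 10^-32, in line with Table 4.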

the codon rather than in the motif. The GC2 group within motifs showed significantly higher GC content than either the GC1 or GC3 groups and was the only group that overlapped the high GCx range (≥80%; Figure 5C). This is particularly interesting when considering that genes with high GC2 content are far less frequent than those with high GC1 or GC3 (Figure 3). Given these patterns, we tested whether high GCx on a gene level is correlated with high GCx of the contained motif. Genes containing DSB hotspot motifs were indeed significantly overrepresented for high GCx genes (Table 4). Around 70% (380/544) of the genes containing DSB hotspot motifs fall into at least one peak of high GCx content (i.e., GC1, GC2, and/or GC3). Of these, only three had high GCx content in more than one category, and only one had high GCx content in all three. The number of DSBs that fall into each of the high GCx categories mirrors the size of each of the high GCx content peaks. Twenty-four percent of genes containing DSB hotspots have high GC1 content, 5% have high GC2 content, and 42% have high GC3.

When looking at the GCx content of individual DSB hotspot motifs, as opposed to looking at them collectively as done above, a large fraction of the motifs are GC-rich (≥80%), including many with 100% GC content (Figure 6). Indeed, more than a quarter of motifs have 100% GC1 and GC3 content and more than half have 100% GC2 content. It is thus not surprising that genes containing the motifs also tend to have high GCx content.

FIGURE 4 | Comparison of GCx distributions. (A) GC1 vs. GC2. (B) GC1 vs. GC3. (C) GC2 vs. GC3. Blue lines show the 80% cutoff separating the high and low GCx classes.

Genes with High and Low GCx Content Show Functional Differentiation
Among the 39,656 genes annotated in the maize reference genome, 9,861 genes (25%) are in a high GCx content peak. However, there is very little overlap between high GCx genes in the GC1, GC2, and GC3 categories (Figure 4). Indeed, overlap rates are significantly less than expected given random sampling (Table 5). The lack of overlap is especially stark between the high GC content peaks of GC1 and GC3 (Figure 4B) where very few genes simultaneously have high GC1 and high GC3 content (≥80%), while many genes have a strong positive correlation when being of both low GC1 and GC3 content (<80%). Of 526 genes that would be expected to fall into the high GC3 content peak given random sampling of 3,647 high GC1 genes from the maize genome, only 96 (18% of expectation) fall into the high GC3 content peak. Furthermore, most of the genes that do show high GC1 and GC3 content are limited to GCx contents close to the 80% cutoff in one or both GCx categories, indicating they may be part of the upper tail of the lower GCx content peak that merges into and is indistinguishable from the high GCx content


peak (Figure 3). Taken together, there seems to be a selection against high GCx content in more than one codon position.

When genes in the low GC3 class were subjected to agriGO analysis, housekeeping pathways were enriched, such as genes involved in cellular processes, metabolic processes, protein metabolism, protein modifications, biosynthetic pathways, localization, and transport. Genes responsible for transcription and transcriptional regulation processes were also categorized as low GC3. Genes in the high GC3 class fell into functions implicated in the immune system and stress responses including adaptive immunity and response to wounding. Genes involved in sexual reproduction were also enriched in the high GC3 class though this did not appear to translate specifically to enrichment of meiotic recombination genes. In short, our results indicate more generalized or housekeeping functions for low GC3 genes, SOS-type functions for high GC3 genes, and are very similar to those seen in other studies (Carels and Bernardi, 2000; Tatarinova et al., 2010).

In the case of GC content of the first nucleotide of each codon, 36,009 genes were categorized with low GC1 (<80%) and 3,648 genes with high (≥80%) GC1 content. GO analysis on GC1 gene classes showed similar trends to the GC3 classes. Low GC1 genes described functions such as cellular protein modification process, protein localization, macromolecule metabolism (primary and cellular metabolism), including lipid and phosphorus metabolism, gene expression, and cell death. High GC1 genes were implicated in functions such as response to stress, DNA packaging, regulation of biological quality, adaptive immune response, fatty acid metabolism, and response to stimulus.

When all genes in Z. mays were examined for the GC content in the second nucleotide of each codon, even though there were significantly fewer genes with high GC content compared to high GC1 and GC3 content, the general trends observed in GC1 and GC3 were seen in GC2 for the low (39,027 genes) and high (630 genes) GC2 classes with some subtle exceptions. To elaborate, low GC2 genes were implicated in ontologies such as protein metabolism, localization, RNA metabolism, lipid metabolism, and intracellular transport. The 630 high GC2 genes, however, were categorized into different ontologies compared to the high GC1 and high GC3 gene sets. High GC2 genes were involved in cell-wall organization, transmembrane transport, lipid, carbohydrate, and phosphorus metabolism, G-coupled signaling pathways, protein phosphorylation, post-translational modification, sexual reproduction, and localization type functions. The low number of genes in the high GC2 class may have lowered our power to detect some of the classes seen in high GC1 and GC3 genes.

Overall, high GCx genes tend to play a role in stress and adaptive responses as well as sexual reproduction. The low GCx genes tend to be biased toward more generalized functions.

FIGURE 5 | GC content within hotspot motifs was compared between 3 nt-based periodicity groups across all motif instances. (A) All hotspot motifs are plotted in groups of every third nucleotide starting with nt1, nt2, or nt3. The nt1 group includes GC content calculated across all motifs for nucleotide positions 1, 4, 7, etc. Each position was analyzed separately. Likewise, the nt2 and nt3 groups have GC content calculated for every third nucleotide position starting with nt2 and nt3, respectively. (B) All genic hotspot motifs are plotted in groups of every third nucleotide as in (A). (C) The reading frame of the motif was determined in order to place nucleotides into the GC1, GC2, and GC3 categories. Then, GC content was calculated for the GC1, GC2, and GC3 position within the motif, trimming the end nucleotide overhangs that resulted when shifting motifs to line up GC1, GC2, and GC3 positions.

Genes Up-Regulated in Meiocytes Are Underrepresented for DSB Motifs
To study correlations between DSB motif presence, GCx patterns and gene expression, we examined genes that contained DSB hotspot motifs regarding their expression level in meiocytes compared to anthers or seedlings (Table 6). Genes with DSB hotspot motifs were down-regulated in meiocytes at a rate expected given independence of down-regulated genes and motifs. However, meiocyte up-regulated expression of DSB


TABLE 3 | Comparison of GC content between 3 nt-based periodic nucleotide groups of DSB motifs within DSB hotspots and genic DSB hotspots.

| Comparison | F-test for equal variance | Two-tailed t-test (homoscedastic or heteroscedastic, per F-test outcome) |
|---|---|---|
| All motifs (position within motif) | | |
| nt1 vs. nt2 | 0.06315758 | 0.000426348 |
| nt2 vs. nt3 | 0.548540329 | 1.41343E-05 |
| nt1 vs. nt3 | 0.016068299 | 7.88762E-06 |
| Genic motifs (position within motif) | | |
| nt1 vs. nt2 | 0.012207227 | 0.000329802 |
| nt2 vs. nt3 | 0.596027462 | 3.77664E-06 |
| nt1 vs. nt3 | 0.003212561 | 5.4902E-06 |
| Genic motifs (position within codon) | | |
| GC1 vs. GC2 | 0.348471769 | 0.000199458 |
| GC2 vs. GC3 | 0.288858144 | 1.7E-06 |
| GC1 vs. GC3 | 0.058398065 | 0.38307282 |

An F-test was performed to test the hypothesis of equal variances followed by one of two t-tests based on the outcome of the F-test. See the legend for Figure 5 for a detailed description of the groups. All comparisons were done either as a homoscedastic or a heteroscedastic t-test, depending on the outcome of the F-test.
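The nt1, nt2, and nt3 groups compared in Table 3 can be illustrated with a minimal sketch. It assumes equal-length motif instances and computes GC content per motif position across all instances, then pools positions by their three-nucleotide periodicity. The function name `gc_by_periodic_group` and the example sequences are hypothetical and are not taken from the study's data or pipeline.

```python
from collections import defaultdict


def gc_by_periodic_group(motifs):
    """Group per-position GC fractions into nt1, nt2, and nt3 lists.

    Each motif position is analyzed separately across all instances;
    positions 1, 4, 7, ... go to nt1, positions 2, 5, 8, ... to nt2, etc.
    """
    length = min(len(m) for m in motifs)  # guard against ragged input
    groups = defaultdict(list)
    for pos in range(length):
        column = [m[pos].upper() for m in motifs]
        gc_fraction = sum(base in "GC" for base in column) / len(column)
        groups[f"nt{pos % 3 + 1}"].append(gc_fraction)
    return dict(groups)


# Made-up motif instances, used only to show the grouping.
example_motifs = ["GCAGCTGCA", "GGAGCAGCT", "CCTGCAGGA"]
groups = gc_by_periodic_group(example_motifs)
for name in ("nt1", "nt2", "nt3"):
    print(name, groups[name])
```

The three resulting lists are the kind of samples to which a variance test and a two-sample t-test could then be applied.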

hotspot motif-containing genes occurred at a lower frequency than expected randomly. This bias is significant in the meiocytes vs. seedlings comparison but not in the meiocytes vs. anthers comparison, likely due to the large overlap of gene expression patterns between meiocytes and anthers (Dukowic-Schulze et al., 2014b).

We approached our analysis from another angle, and looked at GC3 distribution of all genes with any expression in meiocytes in B73. When considering all genes that are expressed in meiocytes, GC3 distribution does not differ from GC3 distribution across all genes (Table 7). However, when zooming in specifically on genes that are up- or down-regulated in meiocytes compared to seedlings or anthers (which contain meiocytes), GC-based trends become clear. Genes up-regulated in meiocytes tend to have low GC3 content (<80%) while genes down-regulated in meiocytes tend to have high GC3 (≥80%; Table 7). Similar trends are seen with GC1 and GC2 (Table 7).

When considered in totality, up-regulated genes in meiocytes are biased toward low GCx content classes. When we look closely at the smaller number of up-regulated genes with high GC3 content, there are energy-based functional gene classes such as electron carrier activity, ATP binding and mitochondria that are enriched and chromosome-specific gene classes such as DNA binding, DNA packaging, and DNA conformation change that are enriched in the up-regulated high GC3 genes. These classes were also found to be enriched when analyzing all up-regulated genes in meiocytes, regardless of GC content (Dukowic-Schulze et al., 2014b).

TABLE 4 | Chi-square test to determine whether GCx high and low categorization is randomly distributed within the 544 DSB motif-containing genes.

| Codon position | High GCx observed (expected) | Low GCx observed (expected) | p-value |
|---|---|---|---|
| GC1 | 130 (50) | 414 (494) | 1.79998E-32 |
| GC2 | 28 (9) | 516 (535) | 2.97987E-11 |
| GC3 | 227 (78) | 317 (466) | 1.87802E-73 |

DISCUSSION

We explored the correlations of genes with instances of a motif found at meiotic DSB hotspots in maize with GC patterns as well as meiotic expression levels. GC-rich motifs with three-nucleotide periodicity underlie hotspots of DSBs that are precursors to recombination between homologous chromosomes. We looked at the genomic landscape of these motifs, including motifs in regions outside of the DSB hotspots that were identified in our dataset. While most DSB hotspots occurred near the transcription start or stop sites, GCx content is, by definition, within the codons. Nevertheless, genes tend to be either high or low GCx as a whole, though there is a slight increase in GC content in the 5′–3′ direction (Tatarinova et al., 2010), and GCx determination included bases between the transcription start and stop sites. These DSB hotspot motifs tend to occur in genes that frequently have high GC1, GC2, or GC3 content. High GC1, GC2, and GC3 content genes are negatively correlated with each other, indicating that the extremely high GC content (≥80%) that occurs in genes with high GCx almost exclusively derives from only one frame.

To incorporate meiotic gene expression, we sequenced RNA from meiocytes (the cells where meiosis takes place) that were just beginning the meiotic process. Specifically, the cells were in the early stages of prophase I, which itself consists of five sequential stages (leptotene, zygotene, pachytene, diplotene, and diakinesis) and is the platform of DSBs and crossovers. Early prophase I includes the formation of DSBs (leptotene), resection and single-end invasion (zygotene), and resolution of some of the DSBs into crossovers (pachytene) and therefore is highly relevant to the generation of novel genetic combinations. Notably, genes that are up-regulated in early meiosis tend to have low GCx and are biased against containing DSB hotspot motifs. Our data thus indicate that genes up-regulated in early meiosis might be protected against DSBs. Our observation that genes that are up-regulated in meiocytes are significantly biased against genes with DSB hotspot motifs makes sense given the environment in early prophase I. Meiotic genes induced during early prophase I, by necessity, need to be genes that are not simultaneously undergoing DSBs, though


TABLE 5 | Chi-square test to determine whether other GCx classes are randomly distributed within each of the high GCx classes.

| GCx class | High GCx observed (expected) | Low GCx observed (expected) | p-value |
|---|---|---|---|
| Among high GC1 genes | | | |
| GC2 | 26 (58) | 3,621 (3,589) | 2.43E-05 |
| GC3 | 96 (526) | 3,551 (3,121) | 2.57E-91 |
| Among high GC2 genes | | | |
| GC1 | 26 (58) | 603 (571) | 1.11E-05 |
| GC3 | 26 (91) | 603 (538) | 2.06E-13 |
| Among high GC3 genes | | | |
| GC1 | 96 (526) | 5,623 (5,193) | 3.59E-86 |
| GC2 | 26 (91) | 5,693 (5,628) | 7.44E-12 |

Significant p-values allowing rejection of the null hypothesis (p ≤ 0.05) are in bold.
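As a rough check on how such p-values arise, the following sketch runs a chi-square goodness-of-fit calculation on the first row of Table 5 (GC2 among high GC1 genes). It assumes a test with one degree of freedom on the two categories (high, low); the exact procedure used by the authors is not stated here. Because the expected counts in the table are rounded, the result is close to, but not identical with, the tabulated 2.43E-05. The function name `chi_square_1df` is illustrative.

```python
import math


def chi_square_1df(observed, expected):
    """Chi-square goodness-of-fit for two categories (1 degree of freedom)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Observed and (rounded) expected counts from Table 5, row "GC2 among high GC1 genes".
chi2, p = chi_square_1df(observed=[26, 3621], expected=[58, 3589])
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

The closed-form `erfc` expression only holds for one degree of freedom; a larger contingency table would need a general chi-square survival function.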

TABLE 6 | Chi-square test to determine whether up- or down-regulated genes in meiocytes are randomly distributed within the 544 motif-containing genes.

| Meiocyte-specific expression | Comparison tissue | Differential expression in meiocytes: Yes, observed (expected) | Differential expression in meiocytes: No, observed (expected) | p-value |
|---|---|---|---|---|
| Up-regulated | Anther | 2 (3) | 542 (541) | 0.712147988 |
| Up-regulated | Seedling | 18 (44) | 526 (500) | 3.57589E-05 |
| Down-regulated | Anther | 2 (3) | 542 (541) | 0.506157854 |
| Down-regulated | Seedling | 62 (57) | 482 (487) | 0.465276348 |

Significant p-values allowing rejection of the null hypothesis (p ≤ 0.05) are in bold.
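The expected counts in Table 6 are consistent with a simple independence calculation, sketched here under one loudly stated assumption: a background of 39,656 genes, a figure that is not given in this section and is used only because it reproduces the tabulated values. The differentially expressed gene totals are the row sums from Table 7 (for example, 3,002 + 234 genes up-regulated vs. seedlings). The helper name `expected_count` is hypothetical.

```python
N_MOTIF_GENES = 544          # DSB motif-containing genes (Table 6)
N_BACKGROUND_GENES = 39_656  # ASSUMPTION: background gene count, not stated in this section


def expected_count(n_subset, n_category, n_background):
    """Expected overlap of two gene sets if membership is independent."""
    return n_subset * n_category / n_background


up_vs_seedlings = 3_002 + 234    # row sum from Table 7 (GC1, up vs. seedlings)
down_vs_seedlings = 3_560 + 580  # row sum from Table 7 (GC1, down vs. seedlings)

expected_up = expected_count(N_MOTIF_GENES, up_vs_seedlings, N_BACKGROUND_GENES)
expected_down = expected_count(N_MOTIF_GENES, down_vs_seedlings, N_BACKGROUND_GENES)
print(round(expected_up), round(expected_down))
```

Under this assumption the rounded expectations match the 44 and 57 shown for the seedling comparisons, against 18 and 62 observed.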

FIGURE 6 | Density plot of GCx content of individual motifs. (A) GC1, (B) GC2, (C) GC3.

we emphasize that the presence of a motif does not necessarily indicate the presence of a recombination hotspot. As a key point, our analysis suggests that DSB formation could be targeted to or avoided in specific functional classes, regulated by the periodic GC-rich motif underlying DSBs.

In general, the low GCx gene classes in our dataset were underrepresented for the DSB hotspot motif (Table 4; Figure 7) and showed bias toward housekeeping functions. In contrast, we found genes with high GCx that are enriched for more specialized functions, including sexual reproduction, though genes did not appear to be specific to meiotic recombination. Possibly, these latter genes are needed during other stages of development or meiosis, and not around the time when DSBs are generated, but benefit from recombination-driven evolution.

GC bimodality refers to the occurrence of two classes of genes, distinguished by their GC content. It has been reported in monocot but not in dicot plants (Clément et al., 2014). To date, there have been several reports explaining the differences in relative distribution of genes based on GC3 between monocots and dicots. Additionally, pronounced differences were observed in GC3 between close relatives in plant families. For example, a comparative study between A. thaliana, Raphanus sativus, Brassica rapa, and Brassica napus revealed that the GC3 values of R. sativus, B. rapa, and B. napus are on average 5% higher than those of A. thaliana orthologs (Villagomez and Kuleck, 2009; Tatarinova et al., 2010). Further, many plants, including grasses, are prone to genome duplications, which provide redundancy that can relax selection pressure for individual genes, though genome-wide some stabilization must occur after a polyploidy event. Nevertheless, the flexibility provided by redundancy that occurs after polyploidization may enable the evolution of genomic regions that have varied recombination rates and GC contents.

Degeneracy in the genetic code in the third codon position gave rise to the wobble hypothesis (Crick, 1966) which has


TABLE 7 | Chi-square test on B73 genes that are expressed in or up- or down-regulated in meiocytes compared to anthers or seedlings.

| Meiocyte expression | GCx | Low GCx observed (expected) | High GCx observed (expected) | p-value |
|---|---|---|---|---|
| All expressed genes | GC1 | 22,023 (21,981) | 2,184 (2,226) | 0.3477 |
| All expressed genes | GC2 | 23,852 (23,823) | 355 (384) | 0.1363 |
| All expressed genes | GC3 | 20,648 (20,716) | 3,559 (3,491) | 0.2136 |
| Up vs. anthers | GC1 | 180 (172) | 9 (17) | 0.0349 |
| Up vs. seedlings | GC1 | 3,002 (2,938) | 234 (298) | 1.0900E-04 |
| Down vs. anthers | GC1 | 192 (211) | 40 (21) | 2.2300E-05 |
| Down vs. seedlings | GC1 | 3,560 (3,759) | 580 (381) | 8.5100E-27 |
| Up vs. anthers | GC2 | 186 (186) | 4 (3) | 0.5596 |
| Up vs. seedlings | GC2 | 3,200 (3,185) | 36 (51) | 3.1000E-02 |
| Down vs. anthers | GC2 | 224 (228) | 8 (4) | 2.3200E-02 |
| Down vs. seedlings | GC2 | 4,029 (4,074) | 111 (66) | 1.7100E-08 |
| Up vs. anthers | GC3 | 179 (162) | 10 (27) | 0.0004 |
| Up vs. seedlings | GC3 | 2,855 (2,769) | 381 (467) | 1.7800E-05 |
| Down vs. anthers | GC3 | 167 (199) | 65 (33) | 3.7800E-09 |
| Down vs. seedlings | GC3 | 3,117 (3,543) | 1,023 (597) | 3.6400E-79 |

The null hypothesis is that there is no relation between these sets of genes and the high and low GCx categories. Significant p-values allowing rejection of the null hypothesis (p ≤ 0.05) are in bold.
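The high and low GCx categories used in these tables rest on GC content measured separately at each codon position. A minimal sketch of that calculation is given here, assuming an in-frame coding sequence and the 80% cutoff quoted in the text for the high class. The function names and the toy sequence are illustrative only and do not come from the study's pipeline.

```python
def gcx_content(cds):
    """Return (GC1, GC2, GC3) fractions for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")
    fractions = []
    for offset in range(3):
        bases = cds[offset:usable:3]  # every third base, starting at this codon position
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


def classify(fraction, threshold=0.80):
    """Label a GCx fraction as 'high' (>= threshold) or 'low'."""
    return "high" if fraction >= threshold else "low"


toy_cds = "ATGGCCGACGTCAAGTGA"  # made-up sequence: ATG GCC GAC GTC AAG TGA
gc1, gc2, gc3 = gcx_content(toy_cds)
classes = tuple(classify(f) for f in (gc1, gc2, gc3))
print(classes)
```

For this toy sequence only the third codon position exceeds the cutoff, which mirrors the observation that extreme GC content tends to be confined to a single frame.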

FIGURE 7 | Overview of findings. High GCx genes are overrepresented for DSB hotspots and genes down-regulated in meiocytes but underrepresented for genes up-regulated in meiocytes. Individual high GCx classes (GC1, GC2, and GC3) show less overlap than expected given the number of genes in each class. Genes up-regulated in meiocytes are underrepresented for DSB hotspots.

been invoked to explain the bimodal GC3 distribution of genes in maize and other monocots (Tatarinova et al., 2010). In the wobble position, tRNA modifications allow pairing of tRNA with multiple codons, allowing multiple codons to code for a single amino acid (reviewed in Agris et al., 2007). This degeneracy of the genetic code gives freedom, mainly in the third codon position, for GC shifts to take place.

While the wobble hypothesis is often only applied to the third codon position, there is degeneracy in the genetic code in the first and second positions of the codon as well, though much less than is seen in the third position (Supplementary Table 1). Given the much larger amount of wobble in the third codon position compared to the first and second, however, it is surprising that the size of the peak with high GC1 content genes is nearly two-thirds that of the corresponding high GC3 peak in our dataset of DSB hotspot motif-containing genes (Figure 3; Table 1). Further, the amount of wobble in the first and second positions is quite similar (Supplementary Table 1), yet the second position has only 17% of the number of genes with high GC2 content compared to GC1 (Figure 3; Table 1). If GC bimodality patterns in maize were enabled simply by the freedom that codon wobble gives for GC shifts, then nearly all of the high GCx genes would be in the high GC3 category. The fact that nearly half of the high GCx genes fall into the GC1 and GC2 categories clearly indicates that the wobble hypothesis, on its own, is insufficient for explaining the GCx bimodality across all three frames.

The ability in many parts of a protein to interchange amino acids with amino acids of similar biochemical properties (Henikoff and Henikoff, 1992) might help explain the discrepancy between the numbers of genes in each high GCx class and the codon position’s corresponding wobble. Substitution of amino acids of similar biochemical properties may allow a genome to shift GC content at all the codon positions. More work needs to be done to test this hypothesis and discover its effect on the varying sizes of the high GCx classes across all genes, but this is beyond the scope of this work.

The importance of GCx bimodality is underscored by its maintenance through mutational pressure and selective restraint, though the mechanism is as yet unknown (Liu et al., 2012). The propensity for recombination to increase GC content (Figure 1) may be a contributing mechanism of positive reinforcement for maintaining genes in high GCx content and DSB hotspot pools (Galtier et al., 2001). Further, the typically high evolutionary rates associated with high GCx genes (Tatarinova et al., 2010) could be due to their association with the GC-rich DSB motifs, leading


to increased rates of recombination. Genetic recombination is crucial to an organism’s ability to survive, adapt, and evolve. Genetic recombination occurs through crossing over during meiosis and, furthermore, gene conversion occurs with a frequent GC bias in the case of non-crossovers.

DSBs were assayed using ChIP-seq targeting the RAD51 protein (He et al., 2013), which binds to DSBs during zygotene, resulting in the discovery of a maize DSB hotspot motif (He et al., in review). Given the three nucleotide-based GC periodicity found in the DSB hotspot motifs and the documented GC3 bias found in maize, it was natural to look for correlations between the two. High correlation was found between genes containing DSB hotspot motifs and GC3 levels, both of the motif itself and the containing gene. Interestingly, similar correlations were found for GC1 and GC2 levels. Indeed, 70% of all genic DSB hotspots fall into high GCx genes, a remarkable and significant proportion given that only one-quarter of genes overall fall into the high GCx category. Even more intriguingly, genes with high GC2 motifs make the biggest contribution, although high GC2 genes are far less prevalent than GC1 or GC3 genes (Figures 5C and 6B). Further work needs to be done in order to determine whether there is a direct or secondary relationship between copies of the motif, double-strand breaks, and, by extension, recombination. Nevertheless, these data suggest intriguing hypotheses that imply a role for recombination in maintaining bimodal GCx distributions that cannot fully be explained by the wobble hypothesis.

CONCLUSION

Sexual reproduction coupled with meiotic recombination is an important evolutionary strategy for generating genetic diversity crucial to adaptation and evolution of species. As the precursors to crossovers and non-crossovers, DSBs and their locations are critical to understanding recombination processes and biases. Here we showed that DSB hotspot motifs have a strong tendency to occur in genes with high GCx content. These genes have extremely high GC content in one frame, have high evolutionary rates, and are biased toward SOS-functional classes, implicating support for adaptation in these genes. Further, genes up-regulated in meiocytes have a strong bias against high GCx classes and DSB motifs, indicating the importance of these genes and their acquisition of protective strategies against DSBs. Intriguingly, wobble bases and degeneracy in the genetic code are not sufficient for explaining the high GCx classes and their sizes across all three frames. The presence of the three nucleotide-based periodic GC-rich motifs underlying DSB hotspots may provide a first glimpse at additional selection pressure driving the generation of functionally biased high GCx classes in maize and related organisms. More work needs to be done to determine whether the motif provides recognition for the DSB machinery, thus providing a possible mechanism promoting high GCx content.

AUTHOR CONTRIBUTIONS

AS, SD-S, and JM analyzed the data and wrote the manuscript. MK, KE, OO, NG, TR, and MG helped with data analysis. MW, YH, QS, and WP performed experiments to identify DSB hotspot motifs and provided data. WP and CC edited the manuscript. WP, CC, SK, JP, and JM are investigators on the NSF IOS-1025881 grant of which this manuscript is a part.

FUNDING

This research was supported by the United States National Science Foundation grant IOS-1025881 and by an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health under grant number P20GM103451.

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2016.01433

SUPPLEMENTAL TABLE 1 | Degeneracy of the genetic code in the first and second codon positions.

REFERENCES

Agris, P. F., Vendeix, F. A., and Graham, W. D. (2007). tRNA’s wobble decoding of the genome: 40 years of modification. J. Mol. Biol. 366, 1–13. doi: 10.1016/j.jmb.2006.11.046

Backstrom, N., Forstmeier, W., Schielzeth, H., Mellenius, H., Nam, K., Bolund, E., et al. (2010). The recombination landscape of the zebra finch Taeniopygia guttata genome. Genome Res. 20, 485–495. doi: 10.1101/gr.101410.109

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300.

Beye, M., Gattermeier, I., Hasselmann, M., Gempe, T., Schioett, M., Baines, J. F., et al. (2006). Exceptionally high levels of recombination across the honey bee genome. Genome Res. 16, 1339–1344. doi: 10.1101/gr.5680406

Birdsell, J. A. (2002). Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution. Mol. Biol. Evol. 19, 1181–1197. doi: 10.1093/oxfordjournals.molbev.a004176

Carels, N., and Bernardi, G. (2000). Two classes of genes in plants. Genetics 154, 1819–1825.

Chen, C., Farmer, A. D., Langley, R. J., Mudge, J., Crow, J. A., May, G. D., et al. (2010). Meiosis-specific gene discovery in plants: RNA-Seq applied to isolated Arabidopsis male meiocytes. BMC Plant Biol. 10:280. doi: 10.1186/1471-2229-10-280

Chiapello, H., Lisacek, F., Caboche, M., and Henaut, A. (1998). Codon usage and gene function are related in sequences of Arabidopsis thaliana. Gene 209, GC1-GC38. doi: 10.1016/s0378-1119(97)00671-9

Clément, Y., Fustier, M.-A., Nabholz, B., and Glémin, S. (2014). The bimodal distribution of genic GC content is ancestral to Monocot species. Genome Biol. Evol. 7, 336–348. doi: 10.1093/gbe/evu278

Crick, F. H. (1966). Codon–anticodon pairing: the wobble hypothesis. J. Mol. Biol. 19, 548–555. doi: 10.1016/S0022-2836(66)80022-0

Cruveiller, S., D’Onofrio, G., and Bernardi, G. (2000). The compositional transition between the genomes of cold- and warm-blooded vertebrates:


codon frequencies in orthologous genes. Gene 261, 71–83. doi: 10.1016/S0378-1119(00)00520-5

Cruveiller, S., Jabbari, K., Clay, O., and Bernardi, G. (2004). Compositional gene landscapes in vertebrates. Genome Res. 14, 886–892. doi: 10.1101/gr.2246704

Drouaud, J., Camilleri, C., Bourguignon, P. Y., Canaguier, A., Berard, A., Vezon, D., et al. (2006). Variation in crossing-over rates across chromosome 4 of Arabidopsis thaliana reveals the presence of meiotic recombination “hot spots.” Genome Res. 16, 106–114. doi: 10.1101/gr.4319006

Du, Z., Zhou, X., Ling, Y., Zhang, Z., and Su, Z. (2010). agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res. 38, W64–W70. doi: 10.1093/nar/gkq310

Dukowic-Schulze, S., Harris, A., Li, J., Sundararajan, A., Mudge, J., Retzel, E. F., et al. (2014a). Comparative transcriptomics of early meiosis in Arabidopsis and maize. J. Genet. Genomics 41, 139–152. doi: 10.1016/j.jgg.2013.11.007

Dukowic-Schulze, S., Sundararajan, A., Mudge, J., Ramaraj, T., Farmer, A. D., Wang, M., et al. (2014b). The transcriptome landscape of early maize meiosis. BMC Plant Biol. 14:118. doi: 10.1186/1471-2229-14-118

Dukowic-Schulze, S., Sundararajan, A., Ramaraj, T., Mudge, J., and Chen, C. (2014c). Sequencing-based large-scale genomics approaches with small numbers of isolated maize meiocytes. Front. Plant Sci. 5:57. doi: 10.3389/fpls.2014.00057

Duret, L., and Arndt, P. F. (2008). The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet. 4:e1000071. doi: 10.1371/journal.pgen.1000071

Duret, L., and Galtier, N. (2009). Biased gene conversion and the evolution of mammalian genomic landscapes. Annu. Rev. Genomics Hum. Genet. 10, 285–311. doi: 10.1146/annurev-genom-082908-150001

Duret, L., Mouchiroud, D., and Gautier, C. (1995). Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J. Mol. Evol. 40, 308–317. doi: 10.1007/BF00163235

Elhaik, E., Pellegrini, M., and Tatarinova, T. V. (2014). Gene expression and nucleotide composition are associated with genic methylation level in Oryza sativa. BMC Bioinformatics 15:23. doi: 10.1186/1471-2105-15-23

Escobar, J. S., Cenci, A., Bolognini, J., Haudry, A., Laurent, S., David, J., et al. (2010). An integrative test of the dead-end hypothesis of selfing evolution in Triticeae (Poaceae). Evolution 64, 2855–2872. doi: 10.1111/j.1558-5646.2010.01045.x

Eyre-Walker, A. (1993). Recombination and mammalian genome evolution. Proc. Biol. Sci. 252, 237–243. doi: 10.1098/rspb.1993.0071

Eyre-Walker, A., and Hurst, L. D. (2001). The evolution of isochores. Nat. Rev. Genet. 2, 549–555. doi: 10.1038/35080577

Fennoy, S. L., and Bailey-Serres, J. (1993). Synonymous codon usage in Zea mays L. nuclear genes is varied by levels of C and G-ending codons. Nucleic Acids Res. 21, 5294–5300. doi: 10.1093/nar/21.23.5294

Fullerton, S. M., Bernardo Carvalho, A., and Clark, A. G. (2001). Local rates of recombination are positively correlated with GC content in the human genome. Mol. Biol. Evol. 18, 1139–1142. doi: 10.1093/oxfordjournals.molbev.a003886

Galtier, N., Piganeau, G., Mouchiroud, D., and Duret, L. (2001). GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics 159, 907–911.

Gerton, J. L., DeRisi, J., Shroff, R., Lichten, M., Brown, P. O., and Petes, T. D. (2000). Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. U.S.A. 97, 11383–11390. doi: 10.1073/pnas.97.21.11383

Giraut, L., Falque, M., Drouaud, J., Pereira, L., Martin, O. C., and Mezard, C. (2011). Genome-wide crossover distribution in Arabidopsis thaliana meiosis reveals sex-specific patterns along chromosomes. PLoS Genet. 7:e1002354. doi: 10.1371/journal.pgen.1002354

Glémin, S., Clément, Y., David, J., and Ressayre, A. (2014). GC content evolution in coding regions of angiosperm genomes: a unifying hypothesis. Trends Genet. 30, 263–270. doi: 10.1016/j.tig.2014.05.002

Gore, M. A., Chia, J. M., Elshire, R. J., Sun, Q., Ersoz, E. S., Hurwitz, B. L., et al. (2009). A first-generation haplotype map of maize. Science 326, 1115–1117. doi: 10.1126/science.1177837

Gouy, M., and Gautier, C. (1982). Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10, 7055–7074. doi: 10.1093/nar/10.22.7055

Grantham, R., Gautier, C., and Gouy, M. (1980). Codon frequencies in 119 individual genes confirm consistent choices of degenerate bases according to genome type. Nucleic Acids Res. 8, 1893–1912. doi: 10.1093/nar/8.9.1893

Guo, X., Bao, J., and Fan, L. (2007). Evidence of selectively driven codon usage in rice: implications for GC content evolution of Gramineae genes. FEBS Lett. 581, 1015–1021. doi: 10.1016/j.febslet.2007.01.088

Haudry, A., Cenci, A., Guilhaumon, C., Paux, E., Poirier, S., Santoni, S., et al. (2008). Mating system and recombination affect molecular evolution in four Triticeae species. Genet. Res. (Camb.) 90, 97–109. doi: 10.1017/S0016672307009032

He, Y., Sidhu, G., and Pawlowski, W. P. (2013). Chromatin immunoprecipitation for studying chromosomal localization of meiotic proteins in maize. Methods Mol. Biol. 990, 191–201. doi: 10.1007/978-1-62703-333-6_19

Henikoff, S., and Henikoff, J. G. (1992). Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A. 89, 10915–10919. doi: 10.1073/pnas.89.22.10915

Huo, N., Garvin, D. F., You, F. M., McMahon, S., Luo, M. C., Gu, Y. Q., et al. (2011). Comparison of a high-density genetic linkage map to genome features in the model grass Brachypodium distachyon. Theor. Appl. Genet. 123, 455–464. doi: 10.1007/s00122-011-1598-4

Ikemura, T. (1981). Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 146, 1–21. doi: 10.1016/0022-2836(81)90363-6

Ikemura, T., and Wada, K. (1991). Evident diversity of codon usage patterns of human genes with respect to chromosome banding patterns and chromosome numbers; relation between nucleotide sequence data and cytogenetic data. Nucleic Acids Res. 19, 4333–4339. doi: 10.1093/nar/19.16.4333

Ingvarsson, P. K. (2008). Molecular evolution of synonymous codon usage in Populus. BMC Evol. Biol. 8:307. doi: 10.1186/1471-2148-8-307

Jensen-Seaman, M. I., Furey, T. S., Payseur, B. A., Lu, Y., Roskin, K. M., Chen, C. F., et al. (2004). Comparative recombination rates in the rat, mouse, and human genomes. Genome Res. 14, 528–538. doi: 10.1101/gr.1970304

Kanaya, S., Yamada, Y., Kudo, Y., and Ikemura, T. (1999). Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 238, 143–155. doi: 10.1016/S0378-1119(99)00225-5

Karlin, S., and Mrazek, J. (1996). What drives codon choices in human genes? J. Mol. Biol. 262, 459–472. doi: 10.1006/jmbi.1996.0528

Kawabe, A., and Miyashita, N. T. (2003). Patterns of codon usage bias in three dicot and four monocot plant species. Genes Genet. Syst. 78, 343–352. doi: 10.1266/ggs.78.343

Kudla, G., Lipinski, L., Caffin, F., Helwak, A., and Zylicz, M. (2006). High guanine and cytosine content increases mRNA levels in mammalian cells. PLoS Biol. 4:e180. doi: 10.1371/journal.pbio.0040180

Lescot, M., Piffanelli, P., Ciampi, A. Y., Ruiz, M., Blanc, G., Leebens-Mack, J., et al. (2008). Insights into the Musa genome: syntenic relationships to rice and between Musa species. BMC Genomics 9:58. doi: 10.1186/1471-2164-9-58

Lesecque, Y., Mouchiroud, D., and Duret, L. (2013). GC-biased gene conversion in yeast is specifically associated with crossovers: molecular mechanisms and evolutionary significance. Mol. Biol. Evol. 30, 1409–1419. doi: 10.1093/molbev/mst056

Li, L. (2009). GADEM: a genetic algorithm guided formation of spaced dyads coupled with an EM algorithm for motif discovery. J. Comput. Biol. 16, 317–329. doi: 10.1089/cmb.2008.16TT

Liu, H., Huang, Y., Du, X., Chen, Z., Zeng, X., Chen, Y., et al. (2012). Patterns of synonymous codon usage bias in the model grass Brachypodium distachyon. Genet. Mol. Res. 11, 4695–4706. doi: 10.4238/2012.October.17.3

Liu, Q. P., Tan, J., and Xue, Q. Z. (2003). [Synonymous codon usage bias in the rice cultivar 93-11 (Oryza sativa L. ssp. indica)]. Yi Chuan Xue Bao 30, 335–340.

Mancera, E., Bourgon, R., Brozzi, A., Huber, W., and Steinmetz, L. M. (2008). High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454, 479–485. doi: 10.1038/nature07135

Marais, G. (2003). Biased gene conversion: implications for genome and sex evolution. Trends Genet. 19, 330–338. doi: 10.1016/S0168-9525(03)00116-1

Marais, G., Mouchiroud, D., and Duret, L. (2001). Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes. Proc. Natl. Acad. Sci. U.S.A. 98, 5688–5692. doi: 10.1073/pnas.091427698

Marsolier-Kergoat, M. C. (2011). A simple model for the influence of meiotic conversion tracts on GC content. PLoS ONE 6:e16109. doi: 10.1371/journal.pone.0016109


Marsolier-Kergoat, M. C., and Yeramian, E. (2009). GC content and recombination: reassessing the causal effects for the Saccharomyces cerevisiae genome. Genetics 183, 31–38. doi: 10.1534/genetics.109.105049

Matassi, G., Montero, L. M., Salinas, J., and Bernardi, G. (1989). The isochore organization and the compositional distribution of homologous coding sequences in the nuclear genome of plants. Nucleic Acids Res. 17, 5273–5290. doi: 10.1093/nar/17.13.5273

Moriyama, E. N., and Powell, J. R. (1997). Codon usage bias and tRNA abundance in Drosophila. J. Mol. Evol. 45, 514–523. doi: 10.1007/PL00006256

Muyle, A., Serres-Giardi, L., Ressayre, A., Escobar, J., and Glemin, S. (2011). GC-biased gene conversion and selection affect GC content in the Oryza genus (rice). Mol. Biol. Evol. 28, 2695–2706. doi: 10.1093/molbev/msr104

Myers, S., Spencer, C. C., Auton, A., Bottolo, L., Freeman, C., Donnelly, P., et al. (2006). The distribution and causes of meiotic recombination in the human genome. Biochem. Soc. Trans. 34, 526–530. doi: 10.1042/BST0340526

Nabholz, B., Kunstner, A., Wang, R., Jarvis, E. D., and Ellegren, H. (2011). Dynamic evolution of base composition: causes and consequences in avian phylogenomics. Mol. Biol. Evol. 28, 2197–2210. doi: 10.1093/molbev/msr047

Padmore, R., Cao, L., and Kleckner, N. (1991). Temporal comparison of recombination and synaptonemal complex formation during meiosis in S. cerevisiae. Cell 66, 1239–1256. doi: 10.1016/0092-8674(91)90046-2

Paterson, A. H., Bowers, J. E., Feltus, F. A., Tang, H., Lin, L., and Wang, X. (2009). Comparative genomics of grasses promises a bountiful harvest. Plant Physiol. 149, 125–131. doi: 10.1104/pp.108.129262

Serres-Giardi, L., Belkhir, K., David, J., and Glemin, S. (2012). Patterns and evolution of nucleotide landscapes in seed plants. Plant Cell 24, 1379–1397. doi: 10.1105/tpc.111.093674

Sharp, P. M., Bailes, E., Grocock, R. J., Peden, J. F., and Sockett, R. E. (2005). Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res. 33, 1141–1153. doi: 10.1093/nar/gki242

Sharp, P. M., Cowe, E., Higgins, D. G., Shields, D. C., Wolfe, K. H., and Wright, F. (1988). Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res. 16, 8207–8211. doi: 10.1093/nar/16.17.8207

Sharp, P. M., and Li, W. H. (1986). An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24, 28–38. doi: 10.1007/BF02099948

Sharp, P. M., and Matassi, G. (1994). Codon usage and genome evolution. Curr. Opin. Genet. Dev. 4, 851–860. doi: 10.1016/0959-437X(94)90070-1

Sheehan, M. J., and Pawlowski, W. P. (2012). Imaging chromosome dynamics in meiosis in plants. Methods Enzymol. 505, 125–143. doi: 10.1016/B978-0-12-388448-0.00015-2

Shields, D. C., Sharp, P. M., Higgins, D. G., and Wright, F. (1988). “Silent” sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5, 704–716.

Stenico, M., Lloyd, A. T., and Sharp, P. M. (1994). Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res. 22, 2437–2446. doi: 10.1093/nar/22.13.2437

Sueoka, N., and Kawanishi, Y. (2000). DNA G+C content of the third codon position and codon usage biases of human genes. Gene 261, 53–62. doi: 10.1016/S0378-1119(00)00480-7

Supek, F., Bosnjak, M., Skunca, N., and Smuc, T. (2011). REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE 6:e21800. doi: 10.1371/journal.pone.0021800

Tatarinova, T. V., Alexandrov, N. N., Bouck, J. B., and Feldmann, K. A. (2010). GC3 biology in corn, rice, sorghum and other grasses. BMC Genomics 11:308. doi: 10.1186/1471-2164-11-308

Tian, Z., Rizzon, C., Du, J., Zhu, L., Bennetzen, J. L., Jackson, S. A., et al. (2009). Do genetic recombination and gene density shape the pattern of DNA elimination in rice long terminal repeat retrotransposons? Genome Res. 19, 2221–2230. doi: 10.1101/gr.083899.108

Villagomez, L. T. T. V., and Kuleck, G. (2009). Ecological Genomics: Construction of Molecular Pathways Responsible for Gene Regulation, and Adaptation to Heavy Metal Stress in Arabidopsis thaliana, and Raphanus sativus. Stockholm: ISMB/ECCB.

Wang, H. C., and Hickey, D. A. (2007). Rapid divergence of codon usage patterns within the rice genome. BMC Evol. Biol. 7(Suppl 1):S6. doi: 10.1186/1471-2148-7-S1-S6

Webster, M. T., Axelsson, E., and Ellegren, H. (2006). Strong regional biases in nucleotide substitution in the chicken genome. Mol. Biol. Evol. 23, 1203–1216. doi: 10.1093/molbev/msk008

Wijnker, E., Velikkakam James, G., Ding, J., Becker, F., Klasen, J. R., Rawat, V., et al. (2013). The genomic landscape of meiotic crossovers and gene conversions in Arabidopsis thaliana. Elife 2:e01426. doi: 10.7554/eLife.01426

Xu, C., Dong, J., Tong, C., Gong, X., Wen, Q., and Zhuge, Q. (2013). Analysis of synonymous codon usage patterns in seven different citrus species. Evol. Bioinform. 9, 215–228. doi: 10.4137/EBO.S11930

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Sundararajan, Dukowic-Schulze, Kwicklis, Engstrom, Garcia, Oviedo, Ramaraj, Gonzales, He, Wang, Sun, Pillardy, Kianian, Pawlowski, Chen and Mudge. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
