RNA Sequencing Reveals the Complex Regulatory Network in the Maize Kernel

ARTICLE

Received 23 Mar 2013 | Accepted 29 Oct 2013 | Published 17 Dec 2013

DOI: 10.1038/ncomms3832

Junjie Fu1, Yanbing Cheng2, Jingjing Linghu3, Xiaohong Yang3, Lin Kang2, Zuxin Zhang4, Jie Zhang3, Cheng He3, Xuemei Du3, Zhiyu Peng2, Bo Wang3, Lihong Zhai4, Changmin Dai2, Jiabao Xu2, Weidong Wang3, Xiangru Li2, Jun Zheng1, Li Chen2, Longhai Luo2, Junjie Liu2, Xiaoju Qian2, Jianbing Yan4, Jun Wang2 & Guoying Wang1

RNA sequencing can simultaneously identify exonic polymorphisms and quantitate gene expression. Here we report RNA sequencing of developing maize kernels from 368 inbred lines producing 25.8 billion reads and 3.6 million single-nucleotide polymorphisms. Both the MaizeSNP50 BeadChip and the Sequenom MassArray iPLEX platforms confirm a subset of high-quality SNPs. Of these SNPs, we have mapped 931,484 to gene regions with a mean density of 40.3 SNPs per gene. The genome-wide association study identifies 16,408 expression quantitative trait loci. A two-step approach defines 95.1% of the eQTLs to a 10-kb region, and 67.7% of them include a single gene. The establishment of relationships between eQTLs and their targets reveals a large-scale gene regulatory network, which includes the regulation of 31 zein and 16 key kernel genes. These results contribute to our understanding of kernel development and to the improvement of maize yield and nutritional quality.

1 Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
2 Beijing Genomics Institute, Shenzhen 518083, China.
3 National Maize Improvement Center of China, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China.
4 National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China.

Correspondence and requests for materials should be addressed to G.W. (email: [email protected]) or to J.W.
(email: [email protected]) or to J.Y. (email: [email protected]).

NATURE COMMUNICATIONS | 4:2832 | DOI: 10.1038/ncomms3832 | www.nature.com/naturecommunications
© 2013 Macmillan Publishers Limited. All rights reserved.

Maize is both a model organism for genetic studies and an important crop for food, fuel and feed1. Maize kernels accumulate a large amount of storage compounds such as starch, oil and protein. Understanding the genetic regulation of their synthesis and accumulation will be of great value to maize improvement for yield and nutritional quality. In the last decades, many genes that are essential for maize kernel development and nutrient accumulation have been characterized using genetic mutants or map-based cloning methods2,3. Linkage or association analyses have identified more than a hundred loci or candidate genes underlying kernel-related traits4,5. Moreover, the transcriptome profiles of maize kernel have already been analysed in two elite inbred lines6–8, identifying candidate genes and coexpression networks involved in kernel developmental pathways. However, our understanding of the processes and the gene regulatory networks in maize kernels remains limited.

With the development of technology and significant reduction in the cost of next-generation sequencing, RNA-seq technology has been successfully used for both single-nucleotide polymorphism (SNP) detection and expression quantitative trait loci (eQTL) analysis to reveal gene regulatory networks that are active in specific tissues9,10. In this study, we explore the gene expression profiles of the developing maize kernel by RNA sequencing of 368 inbred lines at 15 days after pollination (DAP).
Our purpose is to explore the sequence diversity across the inbred lines, especially in the gene regions, and to discover the gene regulatory networks employed in immature maize kernels. The results show that there is extensive gene expression variation and sequence diversity among the inbred lines and that 931,484 of 1,026,244 high-quality SNPs are mapped to the gene regions. The genome-wide association study (GWAS) identifies 16,408 eQTLs; 95.1% of the eQTLs are within a 10-kb region and 67.7% of them include a single gene. The establishment of relationships between eQTLs and their targets reveals a large-scale gene regulatory network. These results can be used to systematically examine the potential effects of gene variants on kernel-associated traits and biological pathways.

Figure 1 | Gene coverage by reads and the comparison of SNPs with those from NAM and B73/Mo17. (a) Gene coverage was calculated as the percentage of the gene region covered by reads out of the total gene length. (b) Red circle stands for SNPs of the NAM population, blue circle stands for SNPs of B73/Mo17, and the green and yellow circles stand for SNPs of this study before and after filtering sites with a missing rate >0.6, respectively.

Table 1 | Summary of SNPs in 368 maize inbred lines.

SNP data set                  Number of SNPs   Number of SNPs in gene region   Number of genes   Mean number of SNPs per gene
Total                         3,619,762        2,636,164                       32,259            81.7
SNPs with missing rate <0.6   1,026,244        931,484                         23,106            40.3
SNPs with MAF ≥0.05*          525,105          477,797                         22,014            21.7

MAF, minor allele frequency; SNP, single-nucleotide polymorphism.
*The MAF of each SNP was calculated after the imputation.

Results

RNA-seq reveals extensive diversity in maize transcripts. The poly(A)+ transcriptome of immature kernels (15 DAP) from 368 maize inbred lines was sequenced using 90-bp paired-end Illumina sequencing with libraries of 200-bp insert sizes. After filtering out reads with low sequencing quality, an average of 70.1 million reads were retained per sample (Supplementary Data 1). In total, 25.8 billion high-quality reads were obtained. On average, 71.0% of the reads were mapped to the B73 reference genome (AGPv2) and 70.3% of the reads to the maize annotated genes (filtered gene set, release 5b). Among the genes with RNA-seq reads, 71.6% have coverage of >50% of the gene length (Fig. 1a). Of all the reads mapped to the genome, 83.5% were mapped uniquely and these reads were used to build the consensus sequence for each sample (Supplementary Data 1). After quality control, we identified a total of 3,619,762 SNPs using B73 as the reference by a two-step procedure with multiple criteria11,12 (Table 1). Among them, 2,636,164 SNPs were in the exons, which is 5.6 times greater than that previously reported in a group of six elite maize inbred lines (468,900 exonic SNPs)13, 7.5 times higher than that reported in the nested association mapping (NAM) population (352,000 exonic SNPs)14 and 35.7 times higher than that reported between B73 and Mo17 (73,900 exonic SNPs)14. Moreover, 69.7% of SNPs in the NAM population and 87.5% of SNPs in B73/Mo17 were included in our SNP set (Fig. 1b). Overall, our SNP data set included 1.6 million novel SNPs. Compared with the B73 reference genome, the mean number of loci carrying the alternative allele of any given inbred line was 235,651, with a range from 101,020 to 313,630 SNPs (Supplementary Data 1). Missing genotypes (Supplementary Table S1) were imputed using fastPHASE15. By randomly masking ~1% of SNP sites, a simulation was performed to determine the imputation accuracy (Supplementary Fig. S1).
The results indicate that the imputation accuracy was 99.3% when the missing data rate cutoff value was set to 0.6. Therefore, 1,026,244 SNPs with a missing data rate of <0.6 were used for imputation to infer missing genotypes. All these SNPs were named according to their chromosome positions in the B73 reference genome (Methods).

SNP quality control and distribution. To evaluate the reproducibility of genotyping by RNA-seq, we first compared the genotypes of three pairs of biological replicates: SK, Han21 and Ye478. The concordance rates between each pair of replicates were >99% (Supplementary Table S2), indicating that our sequencing and SNP calling methods were reproducible. Second, the genotypes of this study were compared with the genotypes determined by the MaizeSNP50 BeadChip16. For the overlapping genotypes, the concordance rate between RNA-seq and the MaizeSNP50 BeadChip was 98.6% before imputation and 96.7% after imputation (Supplementary Table S3, Supplementary Fig. S2 and Supplementary Data 2). Because the minor allele frequency (MAF) of the overlapping SNPs differed significantly from that of the non-overlapping SNPs (Supplementary Fig. S3), we further compared the concordance rates of SNPs with different MAFs and found that all the SNPs had concordance rates higher than 96% (Supplementary Table S4). Considering that most of the SNPs in the MaizeSNP50 BeadChip are common, 355 SNP sites containing newly identified rare alleles were randomly selected and validated across 96 inbred lines by the Sequenom MassArray iPLEX genotyping system (Supplementary Table S5).

[Figure: Percentage versus MAF for the RNA-Seq specific, NAM specific and Overlap SNP sets; caption not available.]
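The two quality checks described above, the concordance rate between genotype sets and the masking simulation for imputation accuracy, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the use of `None` for a missing call and the toy `impute_most_common` imputer are all assumptions made for illustration, and the toy imputer is only a stand-in for a real tool such as fastPHASE.

```python
import random

MISSING = None  # assumed marker for an uncalled genotype (illustrative only)


def concordance_rate(calls_a, calls_b):
    """Fraction of sites called in both sets that carry the same genotype."""
    compared = 0
    matched = 0
    for a, b in zip(calls_a, calls_b):
        if a is MISSING or b is MISSING:
            continue  # only overlapping, non-missing genotypes are compared
        compared += 1
        matched += (a == b)
    if compared == 0:
        raise ValueError("no overlapping genotype calls to compare")
    return matched / compared


def masked_imputation_accuracy(genotypes, impute, mask_fraction=0.01, seed=0):
    """Hide a random fraction of known calls, impute them, and score the result."""
    rng = random.Random(seed)
    known = [i for i, g in enumerate(genotypes) if g is not MISSING]
    n_mask = max(1, round(len(known) * mask_fraction))
    masked_idx = rng.sample(known, n_mask)
    masked = list(genotypes)
    for i in masked_idx:
        masked[i] = MISSING
    imputed = impute(masked)
    truth = [genotypes[i] for i in masked_idx]
    guess = [imputed[i] for i in masked_idx]
    return concordance_rate(truth, guess)


def impute_most_common(calls):
    """Toy stand-in for a real imputation tool: fill gaps with the most common call."""
    observed = [c for c in calls if c is not MISSING]
    fill = max(set(observed), key=observed.count)
    return [fill if c is MISSING else c for c in calls]


# Hypothetical genotype calls for one line at six SNP sites.
rnaseq = ["A", "A", "G", None, "G", "A"]
chip = ["A", "G", "G", "G", None, "A"]
rate = concordance_rate(rnaseq, chip)  # 3 of the 4 comparable sites agree
```

Sites missing in either set are skipped, which mirrors the comparison of overlapping genotypes only. Scoring the masked sites with the same function keeps the imputation accuracy and the platform concordance on the same definition.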
