Chloroplast Genomes: Diversity, Evolution, and Applications in Genetic Engineering Henry Daniell1*, Choun-Sea Lin2, Ming Yu1 and Wan-Jung Chang2
Total Page:16
File Type:pdf, Size:1020Kb
Daniell et al. Genome Biology (2016) 17:134 DOI 10.1186/s13059-016-1004-2 REVIEW Open Access Chloroplast genomes: diversity, evolution, and applications in genetic engineering Henry Daniell1*, Choun-Sea Lin2, Ming Yu1 and Wan-Jung Chang2 The advent of high-throughput sequencing technolo- Abstract gies has facilitated rapid progress in the field of chloro- Chloroplasts play a crucial role in sustaining life on plast genetics and genomics. Since the first chloroplast earth. The availability of over 800 sequenced chloroplast genome, from tobacco (Nicotiana tabacum), was se- genomes from a variety of land plants has enhanced quenced in 1986 [3], over 800 complete chloroplast gen- our understanding of chloroplast biology, intracellular ome sequences have been made available in the National gene transfer, conservation, diversity, and the genetic Center for Biotechnology Information (NCBI) organelle basis by which chloroplast transgenes can be genome database, including 300 from crop and tree ge- engineered to enhance plant agronomic traits or to nomes. Insights gained from complete chloroplast gen- produce high-value agricultural or biomedical products. ome sequences have enhanced our understanding of In this review, we discuss the impact of chloroplast plant biology and diversity; chloroplast genomes have genome sequences on understanding the origins of made significant contributions to phylogenetic studies of economically important cultivated species and changes several plant families and to resolving evolutionary rela- that have taken place during domestication. We also tionships within phylogenetic clades. In addition, chloro- discuss the potential biotechnological applications of plast genome sequences have revealed considerable chloroplast genomes. variation within and between plant species in terms of both sequence and structural variation. This information has been especially valuable for our understanding of the Introduction climatic adaptation of economically important crops, fa- Chloroplasts are active metabolic centers that sustain life cilitating the breeding of closely related species and the on earth by converting solar energy to carbohydrates identification and conservation of valuable traits [4, 5]. through the process of photosynthesis and oxygen re- Improved understanding of variation among chloroplast lease. Although photosynthesis is often recognized as genomes has also allowed the identification of specific the key function of plastids, they also play vital roles in examples of chloroplast gene transfer to plant nuclear or other aspects of plant physiology and development, in- mitochondrial genomes, which has shed new light on cluding the synthesis of amino acids, nucleotides, fatty the relationship between these three genomes in plants. acids, phytohormones, vitamins and a plethora of metab- In addition to improving our understanding of plant olites, and the assimilation of sulfur and nitrogen. biology and evolution, chloroplast genomics research Metabolites that are synthesized in chloroplasts are im- has important translational applications, such as confer- portant for plant interactions with their environment ring protection against biotic or abiotic stress and the (responses to heat, drought, salt, light, and so on) and development of vaccines and biopharmaceuticals in ed- their defense against invading pathogens. So, chloro- ible crop plants. Indeed, the first commercial-scale pro- plasts serve as metabolic centers in cellular reactions to duction of a human blood protein in a Current Good signals and respond via retrograde signaling [1, 2]. The Manufacturing Processes (cGMP) facility was published chloroplast genome encodes many key proteins that recently [6]. The lack of conservation of intergenic spa- are involved in photosynthesis and other metabolic cer regions, even among chloroplast genomes of closely processes. related plant species, and the species specificity of regu- latory sequences have facilitated the development of * Correspondence: [email protected] 1Department of Biochemistry, School of Dental Medicine, University of highly efficient transformation vectors for the integration Pennsylvania, South 40th St, Philadelphia, PA 19104-6030, USA and expression of foreign genes in chloroplasts. Because Full list of author information is available at the end of the article © 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Daniell et al. Genome Biology (2016) 17:134 Page 2 of 29 the published literature is rarely cross-referenced, this The low accuracy (~85 % of the raw data) of the long review highlights the impact of chloroplast genomes on reads produced by the PacBio platform [45] can be cor- various biotechnology applications. In addition to our rected by combining the latest chemistry with a hier- enhanced understanding of chloroplast biology, we dis- archical genome assembly process algorithm; accuracy cuss in depth the roles of chloroplast genome sequences rates as high as 99.999 % can be achieved after such in improving our understanding of intracellular gene post-error corrections [46]. Accuracy can also be in- transfer, conservation, diversity, and the genetic basis by creased using Illumina short reads [42]. In a study of which chloroplast transgenes are engineered to enhance Potentilla micrantha, sequencing with the Illumina plat- plant agronomic traits or to produce high-value agricul- form produced seven contigs covering only 90.59 % of tural or biomedical products. In addition, we discuss the the chloroplast genome; by contrast, using the PacBio impact of chloroplast genome sequences on increasing platform with error correction, the entire genome was our understanding of the origins of economically im- successfully assembled in a single contig [39]. portant cultivated species and changes that occurred during domestication. Chloroplast genome structure The chloroplast genomes of land plants have highly con- Advances in chloroplast genome sequencing served structures and organization of content; they com- technology prise a single circular molecule with a quadripartite One of the important factors in the rapid advancement structure that includes two copies of an IR region that of the chloroplast genomics field is improvement in se- separate large and small single-copy (LSC and SSC) re- quencing technologies. In studies conducted before the gions (Fig. 1a, b). The chloroplast genome includes 120– availability of high-throughput methods, isolated chloro- 130 genes, primarily participating in photosynthesis, plasts were used for the amplification of the entire chloro- transcription, and translation. Recent studies have iden- plast genome by rolling circle amplification [7–12]. An tified considerable diversity within non-coding intergenic alternative strategy is to screen bacterial artificial chromo- spacer regions, which often include important regulatory some (BAC) or fosmid libraries using chloroplast genome sequences [13]. Despite the overall conservation in sequences as probes [13–20]; however, these methods are structure, chloroplast genome size varies between spe- subject to many challenges, including difficulty in con- cies, ranging from 107 kb (Cathaya argyrophylla)to structing good-quality BAC or fosmid libraries, large 218 kb (Pelargonium), and is independent of nuclear numbers of PCR reactions, and the possibility of contam- genome size (Table 1). Certain lineages of land-plant ination from other organellar DNA [21–32]. The PCR ap- chloroplast genomes also show significant structural re- proach is also difficult to apply to species that have no arrangements, with evidence of the loss of IR regions or relatives whose chloroplast genomes have been sequenced entire gene families. Furthermore, there is also evidence or those with highly rearranged chloroplast genomes. for the existence of linear chloroplast genomes, as illus- The development of next-generation sequencing (NGS) trated in Fig. 1b. The percentage of each form within the methods provided scientists with faster and cheaper cell varies in different reports [47, 48]. methods to sequence chloroplast genomes. Moore and Like the genes, the introns in land-plant chloroplast colleagues [33] first reported using NGS to determine genomes are generally conserved, but the loss of introns chloroplast genome sequences, in Nandina and Platanus. within protein-coding genes has been reported in several Although multiple NGS platforms are available for chloro- plant species [49], including barley (Hordeum vulgare) [8], plast genome sequencing [34], Illumina is currently the bamboo (Bambusa sp.) [28], cassava (Manihot esculenta) major NGS platform used for chloroplast genomes [20], and chickpea (Cicer arietinum) [7]. The proteins [21, 32, 35, 36] because it allows the use of rolling encoded by genes in which intron loss is known to occur circle amplification products [35, 37]. Investigators have diverse functions; they include an ATP synthase can then use bioinformatics platforms to