
GENE EXPRESSION PROFILING IN SINGLE CELL C4 AND RELATED PHOTOSYNTHETIC SPECIES IN SUAEDOIDEAE

By

RICHARD MATTHEW SHARPE

A dissertation submitted in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

WASHINGTON STATE UNIVERSITY Program in Molecular Sciences

DECEMBER 2014

© Copyright by RICHARD MATTHEW SHARPE, 2014 All Rights Reserved


To the Faculty of Washington State University:

The members of the Committee appointed to examine the dissertation of RICHARD MATTHEW SHARPE find it satisfactory and recommend it be accepted.

______Gerald E. Edwards, PhD. Co-Chair

______Amit Dhingra, PhD. Co-Chair

______Thomas W. Okita, PhD.


ACKNOWLEDGMENTS

Gerald E. Edwards, Amit Dhingra, Thomas W. Okita, Sascha Offermann, Helmut Kirchhoff,

Miroslava Herbstova, Robert Yarbrough, Tyson Koepke, Derick Jiwan, Christopher

Hendrickson, Maxim Kapralov, Chuck Cody, Artemus Harper, John Grimes, Marco Galli, Mio

Satoh-Cruz, Ananth Kalyanaraman, Katherine Evans, David Kramer, Scott Schaeffer, Nuria

Koteyeva, Elena Voznesenskaya, National Science Foundation, Washington State University Program in Molecular Plant Sciences


GENE EXPRESSION PROFILING IN SINGLE CELL C4 AND RELATED

PHOTOSYNTHETIC SPECIES IN SUAEDOIDEAE

Abstract

by Richard Matthew Sharpe, Ph.D. Washington State University December 2014

Co-Chair: Gerald E. Edwards Co-Chair: Amit Dhingra

Slightly over a decade ago Suaeda aralocaspica, a higher land plant species that performs C4 photosynthesis in a single cell, was discovered. Subsequent to this discovery, three additional species in the Bienertia genus, a sister clade to the Suaeda genus, were reported to perform the C4 photosynthetic function in a single chlorenchyma cell. Since the discovery of these plants with a novel form of anatomy associated with photosynthesis, the genetic resources required for the advancement of knowledge of this phenomenon have been lacking. The goal of providing the genetic resources required to advance the knowledge of how these species attain the capability to perform C4 photosynthesis in a single cell has been the focus of this research.

The advent and maturing of High Throughput Sequencing (HTS) technologies have allowed for the generation of the massive amount of genomic and transcriptomic sequence information needed to provide the resources required to investigate the unique genetic landscape of these single cell C4 (SCC4) species. The state of current knowledge about the SCC4 species is provided, and the use of HTS technologies to elucidate the transcriptomic landscape of the developing Bienertia sinuspersici leaf, together with a photosynthesis-centric transcriptome comparison between the different structural C4 and C3 type photosynthetic species in the Suaedoideae subfamily, is detailed. The B. sinuspersici developmental profile indicates that the young leaf tissue devotes the majority of its transcriptional energy to cell division, transcription and regulation, whereas the transcriptional energy in the mature tissue is focused on maintenance of the photosynthetic processes. Differential translational and chloroplast import components between the tissues are evident as well. Species level photosynthetic comparisons indicated differential isoform recruitment into the various pathways. The identification and characterization of the induction and regulation of genes required to develop dimorphic chloroplasts in a single cell will enable efforts to instill C4 traits into C3 species.


TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

LIST OF TABLES

LIST OF FIGURES

SUPPLEMENTARY DATA

CHAPTER ONE

INTRODUCTION

Conservation of Energy

Plant Research

Molecular Plant Research

Photosynthesis

Focus of Research

“One decade after the discovery of single-cell C4 species in terrestrial plants - What did we learn about the minimal requirements of C4 photosynthesis?”

Summary

Chapter References

CHAPTER 2

DEVELOPMENTAL TRANSCRIPTOMES OF THE SINGLE CELL C4 PHOTOSYNTHETIC TYPE PLANT BIENERTIA SINUSPERSICI

Abstract

Introduction

Results

Read Quality, Trimming, Mapping and Overall Transcriptome Expression

Read Trimming Parameters

EST Gene Ontology

EST Gene Ontology Classifications

Representation of ccGO

Representation of mfGO

Representation of bpGO

Discussion

Materials and Methods

Plant Material

RNA Extraction

Illumina Sequencing

454 Sequencing

Bioinformatics

Initial Assembler Comparison

Data assembly

Annotation

References

CHAPTER 3

COMPARATIVE TRANSCRIPTOMICS OF SINGLE CELL C4, KRANZ C4 AND C3 PHOTOSYNTHETIC TYPES IN SUAEDOIDEAE

Abstract

Introduction

Results

Post-translational Components

14-3-3 chaperones

HSP 70

Chloroplast Import

Photosynthetic Import TOC components

Housekeeping Import TOC components

C4 pathway components

Alternate C4 Biochemical Enzymes

Discussion

Material and Methods

Plant Material

Genome size estimates

RNA Extraction

Illumina Sequencing

454 Sequencing

Bioinformatics

Data Assembly

Annotation

Proteomics and Transcriptomics Comparison

References

CHAPTER 4

Conclusions and Future Perspectives

References


LIST OF TABLES

CHAPTER 2

Table 1. B. sinuspersici Developmental Stage ESTs & Percentage of ESTs Per Fold Change

Table 2. Gene Ontology Enrichment in young and mature Bienertia sinuspersici

Table 3. GO term enrichment and number of ESTs representing enriched GO term


LIST OF FIGURES

CHAPTER 1

Figure 1. Comparison of C3 and the two structural types of single-cell C4 photosynthesis

Figure 2. Biochemistry of SCC4 species

CHAPTER 2

Figure 1. de novo Transcriptome Build Workflow

Figure 2. B. sinuspersici Developmental Stage ESTs & Percentage of ESTs Per Fold Change

Figure 3. Enriched GO term relationships

Figure 4. Over-represented cellular component gene ontology distribution

CHAPTER 3

Figure 1. 14-3-3 Identification and Expression Values

Figure 2. Heat Shock Protein Identity and RPKM Values

Figure 3. Photosynthetic Pre-protein Import TOC Component Expression

Figure 4. Initial analysis of carbonic anhydrase isoforms

Figure 5. Phosphoenolpyruvate carboxylase (PEPC) expression values

Figure 6. Transaminating and decarboxylation enzymes

Figure 7. Alternate C4 Biochemical Decarboxylases


SUPPLEMENTARY DATA

SUPPLEMENTARY DATA 1

Flow Cytometric Estimation of Nuclear DNA Content of B. sinuspersici Leaf Samples

Supplementary Data 1 Table 1. Flow cytometry results of 16 B. sinuspersici replicates

SUPPLEMENTARY DATA 2

Supplementary Data 2 Figure 1

Supplementary Data 2 Table 1. Bienertia sinuspersici top species blastx to blastn comparisons with e-values of 0.00

Supplementary Data 2 Figure 2. Bienertia sinuspersici Gene Ontology Characterization

References

Supplementary Data 2 Table 2

Supplementary Data 2 Table 3

ASSEMBLY OF DIFFERENT GENE STRUCTURES

Pyruvate, orthophosphate dikinase

Supplementary Data 3 Figure 1. Suaedoideae species PPDK characterization

Supplementary Data 3 Figure 2. Serine-Glyoxylate Transaminase

References


Chapter 1

Introduction

Why would anyone think plant research is cool?

Conservation of energy.

The conservation of energy law, the First Law of Thermodynamics, states that energy in a system cannot be created or destroyed but can change forms. Plants convert solar energy into chemical energy through the process of photosynthesis. This conversion of radiant energy to chemical energy by the photosynthetic process in plants, algae and bacteria has affected human development from the dawn of time. Mankind has been developing physically and culturally ever since, from man’s ability to tame fire, which releases the energy stored in dried plant material as heat and light, to the cultivation and domestication of wild plant species for agricultural purposes. The oxygen that we breathe is one of the byproducts of the photosynthetic process, as are the many unique products plants produce in the form of sugars and oils, used for both nutrition and biofuels, as well as medicinal compounds. The chemotherapy drug Taxol® and the heart therapy drug digoxin are both produced by plants. The chloroplast organelle present in higher plants derives from an endosymbiotic photosynthetic organism engulfed by an ancestor of plants. It is in this organelle that the majority of these processes take place. Ancient relatives of modern day plants, algae and bacteria, as well as present day photosynthetic species, are responsible for generating the oxygen in the atmosphere we breathe. Plants are at the base of the processes responsible for many of the fossil fuels widely used as our primary fuel sources today. Coal and oil are the remnants of vast ancient vegetative tracts subjected to slow pressure and heat. Energy can only be changed, not created or destroyed, so the radiant energy absorbed by plants is converted into chemical energy through the photosynthetic process. This chemical energy is used in the production of the products we find beneficial, from the energy we consume to the atmosphere we breathe to the medicinal compounds we use and the agriculture that sustains us. It is this conservation of energy through the conversion of radiant energy to chemical energy that produces the basis for mankind’s existence, and the photosynthetic process is the catalyst which makes all these phenomena possible. So really the question should be why anyone would not want to pursue research on plants.

Plant research

Plant research runs the gamut from studies of the biosphere to the molecular aspects of genes and metabolites. Major areas in plant research are focused on the topics most important to us as humans, from the quality of our atmosphere, to crop sustainability and the medicinal qualities of plants and plant metabolites, to basic research on plants in general, which forms the building blocks and ideas used in the more applied areas of plant research.

One of the major areas of research is increasing the yield of crops to meet the projected future increase in demand for crop consumption. Crop yields are directly related to photosynthetic efficiency, and increasing the photosynthetic efficiency of crops can increase crop yields as well as plant biomass and carbon assimilation in biofuel feedstocks. The majority of agricultural crops are classified as C3 photosynthetic types, and avenues to increase their yields include increasing land usage, carbon assimilation and water use efficiency. C4 photosynthesis has been shown to have increased carbon assimilation and water use efficiency compared to the C3 photosynthetic types. Regardless of the type of photosynthesis performed, these attributes are bestowed upon the organism by its genetic makeup. Genetic research on C3 and C4 type plants to date indicates that the genes recruited in C4 photosynthesis are still present in C3 plants (Aubry et al., 2011).

Molecular Plant Research

The majority of molecular plant research revolves around model plant systems such as Arabidopsis, maize and tobacco. The prime reason for the focus on these three plants is that their genomes have been sequenced and mapped and that they are amenable to transformation techniques which enable focused genetic research. Genetic sequence is the lynchpin of molecular research. Correct sequence enables laboratory techniques ranging from the simple, such as the polymerase chain reaction (PCR), to the complex, such as single nucleotide polymorphism (SNP) identification. These techniques permit researchers to create the genetic materials necessary to produce plants with gene knockouts and gene overexpression lines, allowing the characterization of a gene’s function. There are modifications of these techniques which permit the characterization of a gene’s function, but the major portion of research time is spent in generating and confirming the gene’s sequence prior to creating the genetic material. The ability to query genetic databases for sequence pertinent to a gene’s function reduces the fiscal, personnel and time requirements of producing the sequence.

The a priori knowledge of genetic sequence information in model plant system research allows for the correlation and transference of sequence knowledge from model systems to non-model systems. Model systems, on the other hand, cannot provide direct insight into novel medicinal, nutritional or bioenergy aspects present in non-model systems. The information that model systems can provide to the study of non-model systems with novel phenotypes is nonetheless invaluable. Predicted non-model genetic sequence annotation from model species can range in scope from gene identification to motif identification and gene ontology. The comparative nature of this technique establishes a baseline for the advancement of knowledge conducive to the investigation of novel biology and molecular aspects of non-model species.

Photosynthesis

Photosynthesis is the conversion of radiant energy to chemical energy by building the carbon backbones necessary for carbohydrate anabolism. Photonic energy is captured by the light harvesting apparatus of the chlorenchyma cells and converted by various cellular processes into the energetic molecules adenosine triphosphate (ATP) and reduced nicotinamide adenine dinucleotide phosphate (NADPH). The proteins and enzymes of the chloroplastic electron transport chain utilize the solar energy captured by the light harvesting apparatus to split H2O into O2, producing available protons and electrons in the process. The electrons are used to reduce NADP+ to NADPH, and the protons form a gradient across the chloroplast thylakoid membrane that drives ATP synthase to produce ATP from ADP and phosphate. These energetic molecules are then utilized in a myriad of processes, including the Calvin-Benson cycle, to reduce and regenerate the carbohydrate precursor intermediates used in the fixation of carbon. In C3 type photosynthesis, carbon atoms from atmospheric CO2 are directly incorporated via ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), while C4 type photosynthesis uses CO2 decarboxylated from four carbon acids. Since Rubisco can react with O2 as well as with CO2, the ability to convert CO2 into mobile four carbon acids, then sequester the C4 acids to concentrate CO2 in an almost O2 depleted environment around Rubisco, imparts an energetic efficiency to C4 plants in relation to C3 type plants. The sequestration is accomplished in an additional cell type, and it is this dual cell synergy of Kranz anatomy that the majority of the C4 land plants employ.

Focus of Research

Approximately 10 years ago it was discovered that Suaeda aralocaspica, a higher land plant species, performs C4 photosynthesis in a single cell. Subsequent to this discovery, three additional species in the Bienertia genus, a sister clade to the Suaeda genus, were reported to perform the C4 photosynthetic function in a single chlorenchyma cell. Since the discovery of these plants with a novel form of anatomy associated with photosynthesis, the genetic resources required for the advancement of knowledge of this phenomenon have been lacking. The goal of providing the genetic resources required to advance the knowledge of how these species attain the capability to perform C4 photosynthesis in a single cell has been the focus of this research. The following review, published in Photosynthesis Research, outlines the previous information related to advancing the anatomical, photosynthetic and molecular knowledge of the single cell C4 phenomenon.

One decade after the discovery of single-cell C4 species in terrestrial plants - What did we learn about the minimal requirements of C4 photosynthesis?

Richard M. Sharpe1,2 and Sascha Offermann3

1 School of Biological Science, Washington State University, Pullman, USA

2 Molecular Plant Sciences Graduate Program, Washington State University, Pullman, USA

3 Institute of Botany, Leibniz University Hannover, Germany

Corresponding author:

Sascha Offermann, [email protected], phone: +49-511-762-4452, fax: +49-511-762-19262

Abstract

Until about 10 years ago the generally accepted textbook knowledge was that terrestrial C4 photosynthesis requires separation of photosynthetic functions into two specialized cell types, the mesophyll and bundle sheath cells, forming the distinctive Kranz anatomy typical for C4 plants.

This paradigm has been broken with the discovery of Suaeda aralocaspica, a chenopod from central Asia, performing C4 photosynthesis within individual chlorenchyma cells. Since then, three more single-cell C4 (SCC4) species have been discovered in the genus Bienertia. They are interesting not only because of their unusual mode of photosynthesis but also because they present a puzzle for cell biologists. In these species, two morphologically and biochemically specialized types of chloroplasts develop within individual chlorenchyma cells, a situation that has never been observed in plants before.

Here we review recent literature concerning the biochemistry, physiology and molecular biology of SCC4 photosynthesis. Particularly, we focus on what has been learned in relation to the following questions: How does the specialized morphology required for the operation of SCC4 develop and is there a C3 intermediate type of photosynthesis during development? What is the degree of specialization between the two chloroplast types and how does this compare to the chloroplasts of Kranz C4 species? How do nucleus-encoded proteins that are targeted to chloroplasts accumulate differentially in the two chloroplast types and how efficient is the CO2 concentrating mechanism in SCC4 species compared to the Kranz C4 forms?

Keywords: photorespiration, single-cell C4 photosynthesis, Kranz, mesophyll, bundle sheath, chloroplast differentiation.

Introduction

In the early 20th century Otto Warburg discovered that oxygen inhibits photosynthesis (Warburg 1920). This “Warburg effect” was later named photorespiration and occurs when ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) reacts with oxygen instead of carbon dioxide (Fig. 1a). While the carboxylation reaction generates phosphoglycerate (P-glycerate), which is subsequently used in the reductive pentose phosphate pathway (RPP) to generate carbohydrates, the reaction with oxygen generates P-glycerate as well as phosphoglycolate (P-glycolate), a molecule with no known metabolic use (Ogren 1984). P-glycolate must be converted through the photorespiratory pathway in a series of biochemical reactions involving mitochondria, peroxisomes and chloroplasts. Previously fixed carbon and nitrogen are lost during this process and additional energy must be invested for their refixation. Stressors, such as elevated temperature, limited water availability and high salinity, trigger a rise in photorespiration and it is estimated that up to 25% of previously fixed carbon can be lost in C3 plants during this process (Sage 2004).

C4 photosynthesis is a natural adaptation to suppress photorespiration and is usually associated with Kranz anatomy (Fig. 1b). In Kranz C4 species, CO2 is initially assimilated in the photosynthetic carbon assimilation compartment (PCA) by phosphoenolpyruvate carboxylase (PEPC), which is insensitive to oxygen. PEPC is located in concentric rings of mesophyll cells (MC), which surround the photosynthetic carbon reduction (PCR) compartment in the central bundle sheath cells (BSC). Only the PCR compartment contains Rubisco. The C4 acids that result from the initial assimilation diffuse from the MC to the BSC where they are decarboxylated and the released CO2 is fixed by Rubisco. This mechanism actively concentrates CO2 in the PCR, thereby suppressing Rubisco’s oxygenase activity. As a consequence, photorespiration is drastically reduced in C4 plants (von Caemmerer 2000). In Kranz C4 species, the BSC density is significantly increased as compared to C3 species (compare Fig. 1a and Fig. 1b) and numerous plasmodesmata connect the MC and BSC, probably allowing for efficient metabolite exchange (Weiner et al. 1988). BSC of Kranz C4 species often have reinforced cell walls and it has been proposed that this has a significant role in preventing CO2 from leaking out of the Rubisco containing compartment (von Caemmerer and Furbank 2003). Therefore, the specific differentiation into MC and BSC was thought to be essential for an efficient CO2 concentrating mechanism in C4 plants (Hatch 1987).

Single-cell C4 photosynthesis

It has been known for some time that several freshwater aquatic monocots can operate an inducible C4-like carbon concentrating mechanism (CCM) without the need for special anatomical adaptations (Bowes et al. 2002; Bowes and Salvucci 1989). However, since all previously discovered plants operating a full C4 cycle have been shown to rely exclusively on Kranz anatomy, the finding of a terrestrial plant capable of performing C4 photosynthesis within individual cells was surprising. Terrestrial single-cell C4 (SCC4) photosynthesis was first shown in Suaeda aralocaspica (formerly Borszczowia aralocaspica), a halophytic species belonging to the family Chenopodiaceae growing in the central Asian and Iranian semi-arid salt deserts (Freitag and Stichler 2000; Voznesenskaya et al. 2001). Since then, three more SCC4 species have been discovered: Bienertia cycloptera (Freitag and Stichler 2002; Voznesenskaya et al. 2002), Bienertia sinuspersici (Akhani et al. 2005) and Bienertia kavirense (Akhani et al. 2012).

Phylogenetic analyses support two separate origins of SCC4: one in the Bienertia clade and the other in S. aralocaspica. It is likely that Bienertia is the sister lineage to the Suaeda clade and the two genera together make up the Suaedoideae subfamily (Kapralov et al. 2006; Sage et al. 2011). Some analyses instead place Bienertia sister to the Salicornioideae, albeit with weak support (Kadereit et al. 2012). It is possible that the Bienertia lineage of three species developed SCC4 prior to S. aralocaspica (Bienertia: 20.8-2.6 My; S. aralocaspica: 7.7 My – present; Christin et al. 2011), but this is not clear given the age estimate overlap.

Leaf and cell morphology of single-cell C4 species

The leaf and cell anatomy of SCC4 species is very different from Kranz type C4 species. Two different structural forms have been reported. The first structural form is found in S. aralocaspica (Fig. 1c). The characteristic, unusually long chlorenchyma cells are arranged in a single layer in the leaf. They are densely packed and devoid of intercellular air space proximal to the water storage cells which surround the vascular tissue, but loosely packed with plenty of intercellular space on the distal end (Freitag and Stichler 2000). Dimorphic chloroplasts are localized at the opposite poles, forming a proximal and a distal (relative to the vascular tissue) compartment. A large vacuole divides the two compartments and cytoplasm is restricted to a thin layer between the vacuole and the plasma membrane.

In contrast, Bienertia species show a completely different and highly unusual subcellular morphology (Fig. 1d). The chlorenchyma cells of this structural type are generally loosely packed in two to three layers with large intercellular spaces surrounding the cells. The most prominent anatomical feature of this type of cell is a cytoplasmic compartment, densely packed with chloroplasts, localized to the center of the cell (Freitag and Stichler 2002; Voznesenskaya et al. 2002). A second chloroplast type is located at the periphery of the cell. Cytoplasmic channels transecting the central single vacuole connect the cytoplasm of the two compartments (Park et al. 2009; Voznesenskaya et al. 2005). The chloroplasts in the two domains differ vastly in their number, size and shape. Chlorenchyma cells contain about six times more chloroplasts in the central compartment than in the periphery, but the peripheral chloroplasts are about twice as large and contain more chlorophyll per chloroplast (Offermann et al. 2011a). The granal index of the peripheral chloroplasts was reported to be about one third smaller compared to the chloroplasts of the central compartment in B. cycloptera (Voznesenskaya et al. 2002) and fewer grana were also observed in the distal compared to the proximal chloroplasts of S. aralocaspica (Voznesenskaya et al. 2003). In summary, the differences in the morphology suggest different functions for the two chloroplast types, similar to the dimorphic chloroplasts of Kranz C4 species.

Terrestrial C4 photosynthesis does not require Kranz anatomy

The first indication for a single-celled C4 mechanism operating in these species came from carbon isotope discrimination data on plant biomass (δ13C). C4 plants discriminate less against the heavier carbon isotope, 13C, than C3 plants, mainly due to different fractionation by the two different carboxylases (PEPC vs. Rubisco) utilized for the initial carboxylation reaction (Farquhar 1983). Carbon isotope composition in C4 plants is therefore usually more positive (higher 13C/12C ratio) than in C3 plants when expressed relative to a standard, and can be used to differentiate between the two photosynthetic types.
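As a minimal illustration of the δ13C notation used here, the sketch below converts a 13C/12C ratio into a δ13C value in per mil using the conventional definition δ13C = (R_sample / R_standard − 1) × 1000, and inverts it again. The function names are hypothetical and the reference ratio is an assumption: 0.0112372 is a commonly quoted 13C/12C ratio for the PDB standard and is not taken from the studies cited in this review; any other standard ratio can be passed in instead.

```python
# Hypothetical helper names for illustration only.
# R_PDB is a commonly quoted 13C/12C ratio for the PDB standard (an assumption,
# not a value taken from the cited studies).
R_PDB = 0.0112372


def delta13c_permil(r_sample: float, r_standard: float = R_PDB) -> float:
    """Return delta-13C in per mil for a sample 13C/12C ratio."""
    if r_standard <= 0:
        raise ValueError("r_standard must be positive")
    return (r_sample / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta_permil: float, r_standard: float = R_PDB) -> float:
    """Invert the definition: recover the 13C/12C ratio from a delta value."""
    return r_standard * (1.0 + delta_permil / 1000.0)


if __name__ == "__main__":
    # Round-trip two delta values to show that a more positive delta-13C
    # corresponds to a higher 13C/12C ratio.
    for delta in (-12.0, -15.2):
        ratio = ratio_from_delta(delta)
        print(delta, ratio, delta13c_permil(ratio))
```

Because the value is expressed relative to a standard, a less negative δ13C simply means a sample ratio closer to that of the standard, which is why the more positive values reported for C4 plants correspond to a higher 13C/12C ratio.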

Generally, δ13C values observed in the SCC4 species were much more positive compared to C3 plants. They ranged from approx. −12‰ in the recently discovered B. kavirense (Akhani et al. 2012) to −13 to −14‰ in S. aralocaspica (Freitag and Stichler 2000; Voznesenskaya et al. 2001) and −14 to −15‰ in mature stems and leaves of B. cycloptera and B. sinuspersici (Akhani et al. 2005; Freitag and Stichler 2002; Voznesenskaya et al. 2002), indicating initial carbon fixation via PEPC in these species. Since isotope composition analysis cannot directly differentiate between C4 and CAM photosynthesis (both use PEPC for initial CO2 fixation and therefore can have similar δ13C values), titratable acids were measured in both structural variants of SCC4. The analyses showed that C4 acids (e.g. malate) do not accumulate during the nighttime (Voznesenskaya et al. 2003; Voznesenskaya et al. 2002). These results exclude the possibility of CAM photosynthesis in these species. Interestingly, B. cycloptera has been observed to have more negative δ13C values (approx. −16 to −21‰) in newly emerging leaves and stems from plants grown in a greenhouse or in growth chambers. It has been speculated that this could be due to an incomplete C4 mechanism in the young tissues causing increased leakage of CO2 from the Rubisco containing compartment (Voznesenskaya et al. 2002). Furthermore, different samples from natural habitats, while exhibiting a C4 signature, showed considerable variation (δ13C of −12 to −16‰), leading to the hypothesis that B. cycloptera might perform C4 or C3 photosynthesis depending on the environmental conditions (Freitag and Stichler 2002; Voznesenskaya et al. 2002). To understand if B. cycloptera functions naturally as a facultative C4 or an obligate C4 plant, a systematic sampling from natural habitats over a complete growing season was performed. Isotope composition analysis suggested that B. cycloptera performs C4 photosynthesis throughout its life cycle in nature with an average δ13C of −15.2‰ (Akhani et al. 2008). The fluctuations in the δ13C compositions reported for the different samples from SCC4 species are therefore likely related to environmental factors including changing light and water availability. Such fluctuations have also been reported before to occur in Kranz C4 species under changing environmental conditions (Buchmann et al. 1996).

Besides evidence from the isotope discrimination data, the measured CO2 compensation points were much lower in S. aralocaspica, B. sinuspersici and B. cycloptera than in related C3 chenopod species and their CO2 fixation rates were near saturation under ambient CO2 concentration. Also, photosynthesis was insensitive to the level of oxygen concentration (Voznesenskaya et al. 2001, 2002; Smith et al. 2009). These parameters indicate the operation of a carbon concentrating mechanism in these species. In the following section, we will summarize the evidence for an NAD-ME type C4 pathway operating in the currently discovered SCC4 plants. To provide the biochemical background, we will first give a brief overview of Kranz C4 biochemistry and then compare Kranz C4 with the situation found in the SCC4 species.

Evidence for an NAD-ME type C4 biochemistry in the currently known SCC4 species

C4 species are generally categorized into three biochemical subtypes according to the different decarboxylases (NADP-malic enzyme [NADP-ME], NAD-ME and phosphoenolpyruvate carboxykinase [PEP-CK]) utilized predominantly in the C4 cycle. However, recent evidence suggests some flexibility in utilization of different decarboxylation pathways at least in some species (Furbank 2011; Pick et al. 2011; Sommer et al. 2012). Common to all three biochemical subtypes are the initial steps for primary CO2 fixation (Fig. 2). CO2 is first hydrated in the photosynthetic carbon assimilation (PCA) compartment, which is equivalent to the MC of Kranz C4 species. Carbonic anhydrase (CA) forms bicarbonate (HCO3−), which is subsequently used by cytosolic PEPC to form oxaloacetate (OAA). Depending on the biochemical subtype, OAA is either predominantly reduced to malate (MA) in the chloroplasts of the PCA compartment (NADP-ME type) or predominantly transaminated by cytosolic aspartate aminotransferase (ASP-AT) to aspartate (ASP) (NAD-ME type). In PEP-CK species, two pathways operate in parallel with most of the OAA being transformed to ASP in the cytosol and some OAA being reduced to MA in the chloroplasts. In all subtypes, the C4 acids (ASP or MA) then diffuse into the photosynthetic carbon reduction (PCR) compartment, which is equivalent to the BSC in Kranz C4 species. In NADP-ME species, decarboxylation of C4 acids occurs directly in the chloroplasts. In NAD-ME species, ASP is first deaminated to OAA by mitochondrial ASP-AT followed by reduction to MA through NAD-malate dehydrogenase (NAD-MDH). MA is then decarboxylated yielding CO2 and pyruvate (PA). In PEP-CK species, ASP is deaminated to OAA by cytosolic ASP-AT and subsequently decarboxylated by cytosolic PEP-CK. This reaction directly regenerates phosphoenolpyruvate (PEP) that can diffuse back into the mesophyll and serve as an acceptor molecule for HCO3−. In parallel, MA is decarboxylated to PA by NAD-ME in the mitochondria. In all three subtypes, PEP is regenerated from PA in the PCA compartment chloroplasts (left part of Fig. 2) by pyruvate, orthophosphate dikinase (PPDK) utilizing ATP provided by the light reactions of photosynthesis.

In the SCC4 species, 14C-labeling experiments in both structural forms showed that the majority of label after a short exposure to 14CO2 was in aspartate and malate (Voznesenskaya et al. 2003; Voznesenskaya et al. 2002). Additionally, highly abundant isoforms of cytoplasmic and mitochondrial ASP-AT, as well as cytoplasmic alanine aminotransferase (ALA-AT), were detected, compatible with an NAD-ME type ASP-ALA shuttle between the two compartments (Lara et al. 2008; Park et al. 2010; Offermann et al. 2011a). Activities of other C4 enzymes such as PPDK, PEPC and NAD-ME were at levels similar to those observed in Kranz C4 species in both S. aralocaspica and B. cycloptera. Although some NADP-ME activity was initially observed in S. aralocaspica, the enzyme was not detectable by in situ immunolocalization or Western blot analysis in either S. aralocaspica (Voznesenskaya et al. 2001) or B. cycloptera (Voznesenskaya et al. 2002). Immunolocalization studies (Voznesenskaya et al. 2001; Voznesenskaya et al. 2002; Voznesenskaya et al. 2005) and Western blot analysis of isolated subcellular fractions (Offermann et al. 2011a) revealed that PEPC is located throughout the cytoplasm, with no indication of a subcellular preference for the distal or proximal compartment in S. aralocaspica or for the peripheral or central compartment in the Bienertia species. In B. sinuspersici, a PEPC was reported with typical C4 features, i.e. high specific activity, light activation and reduced sensitivity to malate inhibition in the light (Lara et al. 2006). It was proposed that PEPC activity in the proximal or central compartments of SCC4 species must be restricted to prevent futile cycling (release of CO2 by the decarboxylase and re-fixation by PEPC), but it remains to be shown how this is accomplished (Lara et al. 2006). In contrast, Rubisco was specifically localized to the proximal chloroplasts of S. aralocaspica (Voznesenskaya et al. 2001) and to the central compartment chloroplasts of Bienertia (Offermann et al. 2011a; Voznesenskaya et al. 2002), together with accumulation of starch in these chloroplasts (Voznesenskaya et al. 2003; Voznesenskaya et al. 2005). PPDK accumulated in the distal chloroplasts of S. aralocaspica (Voznesenskaya et al. 2001) and the peripheral chloroplasts of Bienertia (Offermann et al. 2011a; Voznesenskaya et al. 2002), indicating an enzyme accumulation pattern comparable to the differential accumulation of these enzymes in the MC and BSC of Kranz C4 species.

In NAD-ME C4 species the decarboxylation of C4 acids occurs within the mitochondria. Accordingly, the subcellular localization of mitochondria in both structural variants of SCC4 was investigated. Electron microscopy, immunolocalization with antibodies against mitochondrial NAD-ME and glycine decarboxylase (Voznesenskaya et al. 2001; Voznesenskaya et al. 2002; Voznesenskaya et al. 2005), mitochondria-specific fluorescent probes (Chuong et al. 2006; Park et al. 2009) and in vivo localization studies using a mitochondrial transit peptide fused to green fluorescent protein (Lung et al. 2011; B. sinuspersici only) all indicated that mitochondria are restricted to the PCR compartment in these species. They occur almost exclusively either near the proximal chloroplasts of S. aralocaspica or within the central compartment in the case of B. sinuspersici. The close proximity of mitochondria and PCR chloroplasts in SCC4 species is therefore in accordance with their proposed function of delivering CO2 to Rubisco.

For B. sinuspersici, a protocol was developed for the isolation and purification of the dimorphic chloroplasts from mesophyll protoplasts based on the different densities of the central cytoplasmic compartment and the peripheral chloroplasts. However, separation of the Suaeda type dimorphic chloroplasts has not yet been attempted, mainly because sufficiently distinguishing features are lacking. For B. sinuspersici, Western analysis confirmed the differential localization of Rubisco and PPDK in the two chloroplast types (Offermann et al. 2011a) and, interestingly, the localization of the pyruvate transporter BASS2 in the peripheral chloroplasts; BASS2 has also been observed to accumulate specifically in the MC of Kranz C4 species (Furumoto et al. 2011). Physiological measurements with the isolated chloroplasts showed that only the chloroplasts of the central compartment had a fully operational reductive pentose phosphate pathway (RPP), which is consistent with the observed accumulation of starch in these chloroplasts. In contrast, the peripheral chloroplasts did not fix CO2 but were capable of light-driven formation of PEP (Offermann et al. 2011a). This suggests that pyruvate is transported into the peripheral chloroplasts, where it is converted to PEP by PPDK. As in Kranz C4 mesophyll chloroplasts, PEP is presumably exported from the chloroplast in exchange for inorganic phosphate to maintain the phosphate balance.

Together, the morphological, biochemical and physiological evidence presented in these studies indicates that the SCC4 species perform a fully operational C4 pathway within individual cells. The two SCC4 systems mimic the Kranz system in that they have a compartment analogous to the BSC of Kranz type plants, either at the proximal end of the cell (S. aralocaspica) or in the central compartment (Bienertia). Accordingly, a compartment analogous to the mesophyll of Kranz type plants is found at the distal end of the cell in S. aralocaspica or at the periphery in the Bienertia species. The currently known SCC4 species operate NAD-ME type biochemistry compatible with the model shown in Fig2 and similar to that observed in related Kranz type C4 NAD-ME species (Offermann et al. 2011a). These species therefore demonstrate that there is no absolute requirement for Kranz anatomy in order to perform C4 photosynthesis in terrestrial plants and that the required separation of photosynthetic functions can instead be achieved through subcellular compartmentalization.

Development of the SCC4 morphology and differential accumulation of C4 enzymes

The development of two photosynthetic cell types along with the specialization of chloroplast function in Kranz C4 species has been the subject of much interest. Previous studies suggested a developmental transition from a monomorphic C3 default mode, with Rubisco-containing chloroplasts in both young MC and BSC, to dimorphic chloroplasts and C4 biochemistry (Berry et al. 1997; Sheen 1999). However, recent studies found no evidence for an intermediate stage with functional C3 photosynthesis in developing maize leaves (Pick et al. 2011; Sharpe et al. 2011).

In the SCC4 species, cells from young leaves (approx. 0.1 to 0.3 cm long and harvested directly after emerging) initially lack the characteristic mesophyll cell morphology; the two cytoplasmic compartments then form progressively, along with specialization of the two chloroplast types, until the leaves are mature. For example, young S. aralocaspica cells have only a single chloroplast type containing some Rubisco as well as starch. Also, mitochondria are still randomly distributed, with no clear preference for the proximal compartment as observed later in mature cells (Voznesenskaya et al. 2003). Similarly, cells from young B. sinuspersici leaves lack the prominent structure of the central compartment, and the large central vacuole has not yet formed (Park et al. 2009; Voznesenskaya et al. 2005). Expression of Rubisco and photorespiratory enzymes is already high in cells from young leaves, whereas PPDK and PEPC expression remains low until the compartments develop (Lara et al. 2008). Young cotyledons of B. cycloptera likewise contain only one chloroplast type with Rubisco, but no detectable PPDK and little PEPC (Voznesenskaya et al. 2004). Therefore, C4 enzyme expression clearly lags behind the expression of Rubisco and the photorespiratory enzymes in the SCC4 species.

Isotope analysis indicated incomplete C4 photosynthesis in young leaves of B. sinuspersici, with more negative δ13C values than in mature leaves (Voznesenskaya et al. 2005). In contrast, δ13C values were typical for C4 in young, intermediate and mature samples from S. aralocaspica (Voznesenskaya et al. 2003). However, no gas exchange measurements have been performed on the young leaves of the SCC4 species due to technical limitations (young leaves are less than 3 mm in size). Therefore, it is not clear whether cells that have not developed the full SCC4 morphology perform C3 photosynthesis or rather constitute a non-photosynthetic sink tissue with little function of Rubisco, the C3 cycle and photorespiratory enzymes. When the cells mature, there is an increase in the C4 enzymes PPDK, PEPC and NAD-ME, while levels of Rubisco increase only moderately. The two compartments form progressively until fully developed in mature leaves, accompanied in B. sinuspersici cells by fusion of several smaller vacuoles into a single central vacuole (Park et al. 2009).

Since all currently known SCC4 species are halophytes, the question arose whether salt and the resulting succulent phenotype are required for the development of a functional SCC4 system. Salinity studies with B. sinuspersici indicated optimum growth and photosynthetic performance at moderate salt levels (50-200 mM NaCl), whereas high salt concentrations (400 mM NaCl) had detrimental effects on both parameters (Leisner et al. 2010). However, the two compartments and the dimorphic chloroplasts developed regardless of the presence of salt, indicating that salt is not a requirement for SCC4 photosynthesis per se.

So far, the mechanism that drives the formation of the two compartments in the SCC4 species is unknown. In mature chlorenchyma cells of both structural variants, a highly organized network of actin filaments and microtubules was found to be associated with the chloroplasts of the two compartments. Experiments with cytoskeleton-disrupting drugs indicated that the microtubules are especially critical for maintaining the positioning of the dimorphic chloroplasts and other organelles in the subcellular compartments in mature cells of S. aralocaspica and B. sinuspersici (Chuong et al. 2006; Park et al. 2009). Further experiments are needed to understand whether the cytoskeleton is also implicated in the initial formation of the two compartments in young leaves and which signals are important for this polarization.

The influence of light

Light is an essential stimulus in plants for the determination of leaf identity as well as for the conversion of proplastids to chloroplasts (Kerstetter and Poethig 1998). In C4 plants, light is involved in the regulation of C4 enzymes at the transcriptional and post-transcriptional level (Langdale and Nelson 1991; Long and Berry 1996; Offermann et al. 2006; Sheen and Bogorad 1987) and exerts control over tissue-specific accumulation of Rubisco, although species-specific differences exist (Langdale et al. 1988; Wang et al. 1993). In fully developed tissues, light is an influential factor in the positioning of chloroplasts for the typical light and shade avoidance movements in response to excess or insufficient light (Davis et al. 2011; Kagawa 2001; Mullen et al. 2006; Sakurai et al. 2004).

To determine whether development of the two structural domains and the specific protein accumulation pattern in SCC4 species is light dependent, analyses were made with cotyledons of S. aralocaspica. As mentioned above, young cotyledons of this species have only one structural type of chloroplast containing Rubisco and no PPDK. Also, the cells do not show the typical polarization, which is similar to the situation in young cells of the true leaves. When young seedlings were kept in darkness, only one Rubisco-containing type of plastid was observed in the cotyledons. In contrast, light induced the formation of dimorphic chloroplasts from a single pool of monomorphic chloroplasts, together with the typical polarization of the cell and induction of C4 cycle enzymes (Voznesenskaya et al. 2004). Therefore, at least in the cotyledons of S. aralocaspica, light seems to trigger the complete developmental transition, including morphological as well as biochemical adaptations. In another experiment, branches of B. sinuspersici containing fully developed cells were exposed to complete darkness for one month. This did not result in the loss of the typical SCC4 anatomy, although the packing of the central compartment with chloroplasts was less dense than in leaves from light-grown branches (Lara et al. 2008). Complete darkness caused a reduction in transcript levels for the major photosynthetic proteins, although the corresponding proteins were still highly abundant.

Alternatively, when plants were kept for one month under very low light conditions, cells showed a strong shade avoidance response with a shift of the central compartment towards the periphery of the chlorenchyma cells. The positioning of the peripheral chloroplasts instead remained remarkably stable (Lara et al. 2008; Park et al. 2008). Despite this change in morphology, low-light-grown branches retained physiological C4 features and showed low photorespiration. This indicates that once the developmental program towards the SCC4 phenotype is triggered by light, it is stable and irreversible.

How do the chloroplasts become dimorphic?

How the two different chloroplast types in SCC4 species develop is currently unknown. Obviously, tissue-specific expression of chloroplast-targeted proteins is not possible in the SCC4 systems, since two morphologically and biochemically different types of chloroplasts occur within the same cell and all chloroplast-targeted proteins are under transcriptional control of a single nucleus. Alternative mechanisms allowing for differential protein accumulation must therefore exist. It was hypothesized that, in SCC4 species, control of protein accumulation in the two chloroplast types could be achieved through selective protein import, differential mRNA localization or selective degradation of proteins (Offermann et al. 2011b). Chloroplasts generally seem to be capable of forming selective protein import complexes. This has been shown, for example, for the differential import pathways of photosynthetic and non-photosynthetic proteins through specialized TIC-TOC complexes (Inoue et al. 2010; Ivanova et al. 2004; Kubis et al. 2004; Smith et al. 2004). Accordingly, this could also provide the mechanism for selective accumulation of RPP enzymes in the proximal or central compartment chloroplasts and, in turn, of PPDK in the distal or peripheral chloroplasts of SCC4 species. To test whether the translocation of nuclear-encoded, chloroplast-targeted pre-proteins operates in a manner consistent with known chloroplast import mechanisms, the functional characteristics of TOC159, TOC132 and TOC34 were investigated in the SCC4 species. Lung and Chuong (2012) showed that the genetic composition of these three import proteins in B. sinuspersici exhibits characteristics similar to those of previously identified higher plant chloroplast TOC components. However, initial experiments with protoplasts showed that the Rubisco small subunit transit peptide fused to a reporter does not show the expected differential accumulation pattern between the two chloroplast types, suggesting that selectivity is more likely exerted at the level of protein stability or specific localization of transcripts (Lung et al. 2011). Such targeting of transcripts to specific subcellular domains has been observed in plants before. For example, Michaud et al. (2010) showed that the mRNAs for mitochondrial proteins were targeted to the mitochondrial surface in potato. Also, mRNAs for seed storage proteins were localized to specific subdomains of the endoplasmic reticulum in rice (Doroshenk et al. 2010). Accordingly, in the SCC4 species, mRNAs for Rubisco could be targeted to the proximal or central compartment chloroplasts, whereas PPDK mRNA could be targeted to the peripheral or distal compartment chloroplasts.

Besides differential import or mRNA localization, there is some evidence for a specific protein degradation mechanism operating in the MC chloroplasts of Kranz type C4 species (Oswald et al. 1990). Here, the accumulation of the large subunit of Rubisco (RBCL) has been shown to be partially under transcriptional and translational control; however, the observed differences were not sufficient to explain the exclusive accumulation of RBCL in BSC chloroplasts. Hence, it was proposed that the RBCL protein must be specifically degraded in the mesophyll chloroplasts. Although very little is known about the substrate specificity of chloroplast proteases (Kato and Sakamoto 2010), it is possible that a similar mechanism regulates differential protein accumulation in SCC4 species.

In summary, our understanding of the development of the two cytoplasmic domains and the formation of dimorphic chloroplasts is still very limited, and more research is needed to understand the molecular mechanisms enabling these species to perform C4 photosynthesis within individual cells.

Efficiency of single-cell C4 photosynthesis

Since the discovery of SCC4, a central question has been how efficient these systems are in comparison with the classical Kranz C4 species. For example, an initial comparison of carboxylation rates from the SCC4 species with literature values from Kranz C4 species indicated lower carboxylation efficiency in SCC4 than in Kranz C4 photosynthesis (von Caemmerer 2003). However, subsequent side-by-side comparison with related Kranz C4 and C3 species showed no intrinsically lower carboxylation efficiency in the SCC4 compared to the Kranz C4 species (King et al. 2012; Smith et al. 2009). Interestingly, carboxylation rates calculated per unit of Rubisco were even higher in the SCC4 than in the Kranz C4 species (Smith et al. 2009). Higher rates of carboxylation per Rubisco protein in C4 vs. C3 plants can be expected due to the CCM and the consequently lower investment in Rubisco protein in the C4 species. However, the reason for the apparently higher carboxylation rates per unit Rubisco in the SCC4 compared to the Kranz C4 species is currently unknown and needs to be determined in future studies.

In C4 plants, Rubisco usually cannot fix all of the CO2 that is donated by the decarboxylase. There is some leakage of CO2, which is a function of the relative rates of the C4 and C3 cycles and the magnitude of the resistance to diffusion of CO2 from the PCR compartment. In Kranz C4 plants, this leakiness (defined as the rate of leakage divided by the rate of the C4 cycle, times 100) was estimated to be as high as 20-40% depending on the species and conditions (Henderson et al. 1992; Kubasek et al. 2007). Leakiness could be even higher in SCC4 plants, due to the absence of a cell wall separating the PCA and PCR compartments, thereby reducing the efficiency compared to Kranz C4 species. However, comparison of quantum requirements for carbon fixation between SCC4 and Kranz C4 at 25°C did not show systematic differences between the photosynthetic types, indicating that leakiness in the SCC4 species is not substantially higher than in Kranz C4 species (Smith et al. 2009). King et al. (2012) also studied the efficiency of the SCC4 system utilizing a coupled gas-exchange and membrane inlet mass spectrometry system to measure the rates of CO2 assimilation and carbon isotope discrimination simultaneously while changing the temperature, light and CO2 levels. They used leakiness as a proxy for photosynthetic efficiency since it increases the energy requirement for CO2 fixation. Although leakiness cannot be directly measured, it is possible to estimate it by comparing modeled with measured isotope fractionation, because the loss of concentrated CO2 around Rubisco affects the amount of fractionation by this enzyme (Farquhar 1983; Henderson et al. 1992). Under saturating light conditions, leakage was found to be similar in the two different structural types of SCC4 and in the Kranz C4 species tested. Also, increased temperature and light caused a decrease in leakage in all tested plants, similar to what has been observed previously in Kranz C4 plants (Kubasek et al. 2007). In another study, isotope analysis from dry matter indicated increased leakage when B. sinuspersici plants were exposed to prolonged very low light conditions, and it was speculated that this could be caused by the observed shift of the central compartment from the center of the cell towards the periphery, thereby altering the diffusive resistance of the system (Lara et al. 2008). Interestingly, leakage increased much more in B. sinuspersici than in S. aralocaspica under low light conditions (King et al. 2012). This may be related to the anatomical differences between the two structural types of SCC4. Bienertia species have a central compartment with tightly packed chloroplasts; self-shading of chloroplasts, especially under low illumination, could therefore limit the photosynthetic efficiency of the C3 cycle. This could cause increased leakage due to an imbalance between the function of the C4 and C3 cycles.
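The definition of leakiness used in this section (rate of leakage divided by the rate of the C4 cycle, times 100) can be illustrated with a minimal Python sketch. The function name and the example rates are hypothetical and were chosen only for illustration; they are not measurements from the cited studies.

```python
def leakiness_percent(leak_rate: float, c4_cycle_rate: float) -> float:
    """Return leakiness in percent: leak rate / C4 cycle rate * 100.

    Both rates must be expressed in the same units
    (for example umol CO2 m-2 s-1).
    """
    if c4_cycle_rate <= 0:
        raise ValueError("c4_cycle_rate must be positive")
    if leak_rate < 0:
        raise ValueError("leak_rate must be non-negative")
    return 100.0 * leak_rate / c4_cycle_rate


# Hypothetical example values, not data from the cited studies.
example = leakiness_percent(leak_rate=3.0, c4_cycle_rate=12.0)
print(f"leakiness = {example:.0f}%")
```

With these assumed rates the result is 25%, which falls within the 20-40% range reported for Kranz C4 plants. In practice neither rate is measured directly; as described in this section, leakiness is estimated by comparing modeled with measured isotope fractionation.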

What limits loss of CO2 in the SCC4 system?

The lack of a bundle sheath cell wall in the SCC4 species raises the question of what limits the diffusion of CO2 out of the PCA compartment. In S. aralocaspica, there is a long diffusion path of approximately 60 micrometers through the thin cytoplasmic layer from the distal to the proximal compartment (Voznesenskaya et al. 2001). The length of the transvacuolar strands connecting the peripheral cytoplasm with the central compartment in B. sinuspersici is approx. 20 micrometers. The diffusive resistance in both structural types was found to be in the range of C4 plants, both by calculating the liquid-to-air phase pathway and by gas exchange measurements with inactivation of the C4 cycle by inhibiting PEPC activity (Edwards et al. 2007). Therefore, the cytoplasm might be the primary barrier for CO2 diffusion in the SCC4 systems. However, it is unclear why S. aralocaspica, with the longer diffusive path, shows leakage similar to that of Bienertia, with the shorter path, and it was postulated that additional mechanisms might influence the efficiency of the CO2 concentrating mechanism in the two structural types (King et al. 2012).

Additionally, all known SCC4 plants belong to the NAD-ME subtype, with decarboxylation taking place in the mitochondria located either at the proximal end of the cell, as in S. aralocaspica, or internally in the central compartment, as in B. sinuspersici. Therefore, diffusion of CO2 through chloroplasts and across organellar membranes should provide additional resistance (see von Caemmerer and Furbank (2003) for a discussion of factors providing resistance to leakage in Kranz C4 species, including the membranes).

Conclusion and outlook

The results summarized in this review show that SCC4 plants operate a CCM and can sustain carboxylation rates similar to Kranz C4 species under favorable conditions. However, future studies are needed to determine the efficiency of the SCC4 species relative to other structural and biochemical forms of C4 plants, including the investment in the C4 versus the C3 cycle, water and nitrogen use efficiency, and the efficiency of delivery of CO2 to the C3 cycle (control of leakiness). The two structural forms of SCC4 may represent alternative means of C4 function along with the 25 structural forms of Kranz anatomy that are currently recognized among terrestrial plants (Edwards and Voznesenskaya 2011).

The unique development of the two cytoplasmic compartments and the differentiation of the dimorphic chloroplasts in SCC4 species require investigation into the molecular mechanisms regulating these processes, and enabling technologies for the study of these questions are currently being developed. Transient transformation of B. sinuspersici protoplasts has already been accomplished (Lung and Chuong 2012; Lung et al. 2011). Furthermore, B. sinuspersici has been shown to be amenable to regeneration from stem explants (Rosnow et al. 2011) and from apical and axillary buds (Northmore et al. 2012), while regeneration from undifferentiated callus and stable plant transformation remain to be investigated. Sequence availability for the SCC4 species is currently very limited, although some sequences, mainly used for transcriptional and phylogenetic analyses, have been submitted to public databases (Kadereit et al. 2003; Kapralov et al. 2006; Lara et al. 2006; Lung and Chuong 2012; Park et al. 2010; Schuetze et al. 2003). Given the uniqueness of the SCC4 species in terms of their mode of photosynthesis and their highly unusual subcellular compartmentalization, efforts are ongoing to sequence the first genomes and transcriptomes of SCC4 species (website: http://genomics.wsu.edu/pages/researchbienertia/index.html).

Acknowledgements:

We are grateful to Gerald E. Edwards and Christoph Peterhaensel for critical reading of the manuscript. This work was supported by National Science Foundation Grants IOS 0641232 and MCB 1146928 and by Civilian Research and Development Foundation Grant RUB1-2982-ST-10 in support of RMS, and by the Deutsche Forschungsgemeinschaft (OF106/1-1) to SO.

[Figure 1 panels: A) C3 photosynthesis; B) Kranz C4 photosynthesis; C) SCC4 photosynthesis, Suaeda type; D) SCC4 photosynthesis, Bienertia type. Labels shown: MC, BSC, P-glycerate, P-glycolate, carbohydrates, photorespiration, distal chloroplast, proximal chloroplast, vacuole, central compartment, central compartment chloroplast, peripheral chloroplast, cytoplasmic channel.]

Fig1: Comparison of C3 (A), Kranz C4 (B) and the two structural types of single-cell C4 photosynthesis (C and D). Magenta and cyan colors indicate the PCA and PCR compartments, respectively. The vacuole is additionally indicated in yellow in the single-cell C4 types.

Fig2: Biochemistry of SCC4 species. The NAD-ME C4 pathway of the SCC4 species is highlighted in red and the NADP-ME and PEP-CK subtypes are shown for comparison. General steps involved in carbon fixation are shown on the right side of the figure and correspond to the reactions shown in detail on the left side. Metabolites are shown in white and enzymes are color-coded depending on their subcellular localization. Cytoplasmic and mitochondrial enzymes are colored in brown and yellow, respectively. Chloroplast localization is shown in blue. -R/+R and -ATP/+ATP indicate consumption and production of reductive power and ATP, respectively. Note that there is a net shuttling of reductive power from the PCA to the PCR compartment in the form of malate in NADP-ME and PEP-CK types, but not in the NAD-ME type. Oxidation of malate in the mitochondria of PEP-CK types is thought to drive the ATP generation required for PEP regeneration by the PEP-CK enzyme (red arrow) (Carnal et al. 1993). PEP-CK = phosphoenolpyruvate carboxykinase; NAD-ME = NAD dependent malic enzyme; NADP-ME = NADP dependent malic enzyme; ALA-AT = alanine aminotransferase; PPDK = pyruvate, Pi dikinase; PEPC = phosphoenolpyruvate carboxylase; CA = carbonic anhydrase; NADP-MDH = NADP dependent malate dehydrogenase; ASP-AT = aspartate aminotransferase; NAD-MDH = NAD dependent malate dehydrogenase; Rubisco = ribulose 1,5-bisphosphate carboxylase/oxygenase; PA = pyruvate; PEP = phosphoenolpyruvate; HCO3- = bicarbonate; OAA = oxaloacetate; MA = malate; ASP = aspartate; ALA = alanine.

References

Akhani H, Barroca J, Koteeva N, et al. (2005) Bienertia sinuspersici (Chenopodiaceae): A New Species from Southwest Asia and Discovery of a Third Terrestrial C4 Plant without Kranz Anatomy. System Bot 30:290–301.

Akhani H, Chatrenoor T, Dehghani M, et al. (2012) A new species of Bienertia (Chenopodiaceae) from Iranian salt deserts: A third species of the genus and discovery of a fourth terrestrial C4 plant without Kranz anatomy. Plant Biosys - Intl J Deal Aspect Plant Biol 1–10. doi: 10.1080/11263504.2012.662921

Akhani H, Lara MV, Ghasemkhani M, et al. (2008) Does Bienertia cycloptera with the single-cell system of C4 photosynthesis exhibit a seasonal pattern of δ13C values in nature similar to co-existing C4 Chenopodiaceae having the dual-cell (Kranz) system? Photosynth Res 99:23–36. doi: 10.1007/s11120-008-9376-0

Berry JO, McCormac DJ, Long JJ, et al. (1997) Photosynthetic Gene Expression in Amaranth, an NAD-ME Type C4 Dicot. Funct Plant Biol 24:423–428.

Bowes G, Rao S, Estavillo GM, Reiskind JB (2002) C4 mechanisms in aquatic angiosperms: comparisons with terrestrial C4 systems. Funct Plant Biol 29:379–392.

Bowes G, Salvucci ME (1989) Plasticity in the photosynthetic carbon metabolism of submersed aquatic macrophytes. Aquat Bot 34:233–266. doi: 10.1016/0304-3770(89)90058-2

Buchmann N, Brooks JR, Rapp KD, Ehleringer JR (1996) Carbon isotope composition of C4 grasses is influenced by light and water supply. Plant, Cell Environ 19:392–402. doi: 10.1111/j.1365-3040.1996.tb00331.x

von Caemmerer S (2000) Biochemical Models of Leaf Photosynthesis. Techniques in Plant Science, CSIRO Publishing, Collingwood, VIC, Australia.

von Caemmerer S (2003) C4 photosynthesis in a single C3 cell is theoretically inefficient but may ameliorate internal CO2 diffusion limitations of C3 leaves. Plant, Cell Environ 26:1191–1197.

von Caemmerer S, Furbank R (2003) The C4 pathway: an efficient CO2 pump. Photosynth Res 77:191–207.

Carnal NW, Agostino A, Hatch MD (1993) Photosynthesis in Phosphoenolpyruvate Carboxykinase-type C-4 Plants - Mechanism and Regulation of C-4 Acid Decarboxylation in Bundle-Sheath Cells. Archiv Biochem Biophys 306:360–367.

Christin PA, Osborne CP, Sage RF, et al. (2011) C4 eudicots are not younger than C4 monocots. J Exp Bot 62:3171–3181. doi: 10.1093/jxb/err041

Chuong SDX, Franceschi VR, Edwards GE (2006) The Cytoskeleton Maintains Organelle Partitioning Required for Single-Cell C4 Photosynthesis in Chenopodiaceae Species. Plant Cell 18:2207–2223. doi: 10.1105/tpc.105.036186

Davis PA, Caylor S, Whippo CW, Hangarter RP (2011) Changes in leaf optical properties associated with light-dependent chloroplast movements. Plant Cell Environ 34:2047–2059. doi: 10.1111/j.1365-3040.2011.02402.x

Doroshenk KA, Crofts AJ, Washida H, et al. (2010) Characterization of the rice glup4 mutant suggests a role for the small GTPase Rab5 in the biosynthesis of carbon and nitrogen storage reserves in developing endosperm. Breeding Sci 60:556–567.

Edwards GE, Voznesenskaya EV (2011) C4 photosynthesis: Kranz forms and single-cell C4 in terrestrial plants. In: Raghavendra AS, Sage RF (eds) C4 Photosynthesis and Related CO2 Concentrating Mechanisms. Advances in Photosynthesis and Respiration, Vol 32. Springer, Dordrecht, The Netherlands, pp 29–61.

Edwards GE, Voznesenskaya E, Smith M, et al. (2007) Breaking the Kranz paradigm in

terrestrial C4 plants: does it hold promise for C4 rice? In: Sheehy JE, Mitchell PL, Hardy

B (eds) Charting new pathways to C4 rice. International Rice Research Institute, World

Scientific, Los Banos

Farquhar G (1983) On the Nature of Carbon Isotope Discrimination in C4 Species. Aust J Plant

Physiol 10:205. doi: 10.1071/PP9830205

Freitag H, Stichler W (2000) A Remarkable New Leaf Type With Unusual Photosynthetic Tissue

in a Central Asiatic Genus of Chenopodiaceae. Plant Biol 2:154–160. doi: 10.1055/s-

2000-9462

Freitag H, Stichler W (2002) Bienertia cycloptera Bunge ex Boiss., Chenopodiaceae, another C4

Plant without Kranz Tissues. Plant Biol 4:121–132. doi: 10.1055/s-2002-20444

Furbank RT (2011) Evolution of the C4 photosynthetic mechanism: are there really three C4 acid decarboxylation types? J Exp Bot 62:3103–3108. doi: 10.1093/jxb/err080

Furumoto T, Yamaguchi T, Ohshima-Ichie Y, et al. (2011) A plastidial sodium-dependent

pyruvate transporter. Nature 478:274–274. doi: 10.1038/nature10518

Hatch MD (1987) C-4 Photosynthesis - A Unique Blend of Modified Biochemistry, Anatomy

and Ultrastructure. Biochim Biophys Acta 895:81–106.

Henderson S, Caemmerer S, Farquhar G (1992) Short-Term Measurements of Carbon Isotope

Discrimination in Several C4 Species. Funct Plant Biol 19: 263-285

Inoue H, Rounds C, Schnell DJ (2010) The Molecular Basis for Distinct Pathways for Protein

Import into Arabidopsis Chloroplasts. Plant Cell 22: 1947-1960

Ivanova Y, Smith MD, Chen K, et al. (2004) Members of the Toc159 import receptor family

represent distinct pathways for protein targeting to plastids. Mol Biol Cell 15: 3379-3392

Kadereit G, Borsch T, Weising K, et al. (2003) Phylogeny of Amaranthaceae and

Chenopodiaceae and the evolution of C4 photosynthesis. Int J Plant Sci 164: 959-986

Kadereit G, Ackerly D, Pirie MD (2012) A broader model for C4 photosynthesis evolution in

plants inferred from the goosefoot family (Chenopodiaceae s.s.). Proc Royal Soc B: Biol

Sci. doi: 10.1098/rspb.2012.0440

Kagawa T (2001) Arabidopsis NPL1: A Phototropin Homolog Controlling the Chloroplast High-

Light Avoidance Response. Science 291:2138–2141. doi: 10.1126/science.291.5511.2138

Kapralov MV, Akhani H, Voznesenskaya EV, et al. (2006) Phylogenetic Relationships in the

Salicornioideae / Suaedoideae / Salsoloideae s.l. (Chenopodiaceae) Clade and a

Clarification of the Phylogenetic Position of Bienertia and Alexandra Using Multiple

DNA Sequence Datasets. System Bot 31:571–585. doi: 10.1043/06-01.1

Kato Y, Sakamoto W (2010) New Insights into the Types and Function of Proteases in Plastids. Int Rev Cel Mol Bio 280: 185-218

Kerstetter RA, Poethig RS (1998) The Specification of Leaf Identity During Shoot Development.

Annual Review of Cell and Developmental Biology 14:373–398. doi:

10.1146/annurev.cellbio.14.1.373

King JL, Edwards GE, Cousins AB (2012) The efficiency of the CO2-concentrating mechanism

during single-cell C4 photosynthesis. Plant, Cell & Environment 35:513–523. doi:

10.1111/j.1365-3040.2011.02431.x

Kubasek J, Setlik J, Dwyer S, et al. (2007) Light and growth temperature alter carbon isotope

discrimination and estimated bundle sheath leakiness in C4 grasses and dicots.

Photosynth Res 91: 47-58

Kubis S, Patel R, Combe J, et al. (2004) Functional specialization amongst the Arabidopsis

Toc159 family of chloroplast protein import receptors. Plant Cell 16: 2059-2077

Langdale J, Nelson T (1991) Spatial regulation of photosynthetic development in C4 plants.

Trends in Genetics 7:191–196. doi: 10.1016/0168-9525(91)90124-9

Langdale JA, Zelitch I, Miller E, Nelson T (1988) Cell position and light influence C4 versus C3

patterns of photosynthetic gene expression in maize. EMBO J 7:3643–3651.

Lara MV, Chuong SDX, Akhani H, et al. (2006) Species Having C4 Single-Cell-Type

Photosynthesis in the Chenopodiaceae Family Evolved a Photosynthetic

Phosphoenolpyruvate Carboxylase Like That of Kranz-Type C4 Species. Plant Physiol

142:673–684. doi: 10.1104/pp.106.085829

Lara MV, Offermann S, Smith M, et al. (2008) Leaf Development in the Single-Cell C4 System

in Bienertia sinuspersici: Expression of Genes and Peptide Levels for C4 Metabolism in Relation to Chlorenchyma Structure under Different Light Conditions. Plant Physiol

148:593–610. doi: 10.1104/pp.108.124008

Leisner CP, Cousins AB, Offermann S, et al. (2010) The effects of salinity on photosynthesis

and growth of the single-cell C4 species Bienertia sinuspersici (Chenopodiaceae).

Photosynth Res 106: 201-214

Long JJ, Berry JO (1996) Tissue-Specific and Light-Mediated Expression of the C4

Photosynthetic NAD-Dependent Malic Enzyme of Amaranth Mitochondria. Plant Physiol

112:473–482. doi: 10.1104/pp.112.2.473

Lung SC, Chuong SD (2012) A transit peptide-like sorting signal at the C terminus directs the

Bienertia sinuspersici preprotein receptor Toc159 to the chloroplast outer membrane.

Plant Cell 24: 1560-1578

Lung SC, Yanagisawa M, Chuong SD (2011) Protoplast isolation and transient gene expression

in the single-cell C4 species Bienertia sinuspersici. Plant Cell Rep 30: 473-484

Michaud M, Marechal-Drouard L, Duchene AM (2010) RNA trafficking in plant cells: targeting

of cytosolic mRNAs to the mitochondrial surface. Plant Mol Biol 73: 697-704

Mullen JL, Weinig C, Hangarter RP (2006) Shade avoidance and the regulation of leaf

inclination in Arabidopsis. Plant, Cell Environ 29:1099–1106. doi: 10.1111/j.1365-

3040.2005.01484.x

Northmore J, Zhou V, Chuong S (2012) Multiple shoot induction and plant regeneration of the

single-cell C4 species Bienertia sinuspersici. Plant Cell Tiss Organ Cult 108: 101-109

Offermann S, Danker T, Dreymüller D, et al. (2006) Illumination Is Necessary and Sufficient to

Induce Histone Acetylation Independent of Transcriptional Activity at the C4-Specific

Phosphoenolpyruvate Carboxylase Promoter in Maize. Plant Physiol 141: 1078-1088

Offermann S, Okita TW, Edwards GE (2011a) Resolving the Compartmentation and Function of

C4 Photosynthesis in the Single-Cell C4 Species Bienertia sinuspersici. Plant Physiol

155:1612–1628. doi: 10.1104/pp.110.170381

Offermann S, Okita TW, Edwards GE (2011b) How do single cell C4 species form dimorphic

chloroplasts? Plant Sig Behav 6:762–765. doi: 10.4161/psb.6.5.15426

Ogren WL (1984) Photorespiration: Pathways, Regulation, and Modification. Annu Rev Plant

Physiol 35:415–442. doi: 10.1146/annurev.pp.35.060184.002215

Oswald A, Streubel M, Ljungberg U, et al. (1990) Differential biogenesis of photosystem-II in

mesophyll and bundle-sheath cells of 'malic' enzyme NADP(+)-type C4 plants. A

comparative protein and RNA analysis. Eur J Biochem 190: 185-194.

doi: 10.1111/j.1432-1033.1990.tb15563.x

Park J, Knoblauch M, Okita TW, Edwards GE (2008) Structural changes in the vacuole and

cytoskeleton are key to development of the two cytoplasmic domains supporting single-

cell C4 photosynthesis in Bienertia sinuspersici. Planta 229:369–382. doi:

10.1007/s00425-008-0836-8

Park J, Okita TW, Edwards GE (2009) Salt tolerant mechanisms in single-cell C4 species

Bienertia sinuspersici and Suaeda aralocaspica (Chenopodiaceae). Plant Sci 176:616–

626. doi: 10.1016/j.plantsci.2009.01.014

Park J, Okita TW, Edwards GE (2010) Expression profiling and proteomic analysis of isolated

photosynthetic cells of the non-Kranz C4 species Bienertia sinuspersici. Funct Plant Biol

37:1. doi: 10.1071/FP09074

Pick TR, Brautigam A, Schluter U, et al. (2011) Systems Analysis of a Maize Leaf

Developmental Gradient Redefines the Current C4 Model and Provides Candidates for Regulation. Plant Cell 23:4208–4220. doi: 10.1105/tpc.111.090324

Rosnow J, Offermann S, Park J, et al. (2011) In vitro cultures and regeneration of Bienertia

sinuspersici (Chenopodiaceae) under increasing concentrations of sodium chloride and

carbon dioxide. Plant Cell Rep 30: 1541-1553

Sage RF (2004) The evolution of C4 photosynthesis. New Phytol 161:341–370.

Sage RF, Christin P-A, Edwards EJ (2011) The C4 plant lineages of planet Earth. J Exp Bot

62:3155–3169. doi: 10.1093/jxb/err048

Sakurai N, Domoto K, Takagi S (2004) Blue-light-induced reorganization of the actin

cytoskeleton and the avoidance response of chloroplasts in epidermal cells of Vallisneria

gigantea. Planta 221:66–74. doi: 10.1007/s00425-004-1416-1

Schütze P, Freitag H, Weising K (2003) An integrated molecular and morphological study of the

subfamily Suaedoideae Ulbr. (Chenopodiaceae). Plant Sys Evo 239: 257-286, doi:

10.1007/s00606-003-0013-2

Sharpe RM, Mahajan A, Takacs E, et al. (2011) Developmental and cell type characterization of

bundle sheath and mesophyll chloroplast transcript abundance in maize. Curr Genet

57:89–102.

Sheen J (1999) C4 Gene Expression. Annu Rev Plant Physiol Plant Mol Biol 50:187–217. doi:

10.1146/annurev.arplant.50.1.187

Sheen JY, Bogorad L (1987) Differential expression of C4 pathway genes in mesophyll and

bundle sheath cells of greening maize leaves. J Biol Chem 262:11726–11730.

Smith MD, Rounds CM, Wang F, et al. (2004) AtToc159 is a selective transit peptide receptor

for the import of nucleus-encoded chloroplast proteins. J Cell Biol 165: 323-334

Smith ME, Koteyeva NK, Voznesenskaya EV, et al. (2009) Photosynthetic features of non-Kranz

type C4 versus Kranz type C4 and C3 species in subfamily Suaedoideae

(Chenopodiaceae). Funct Plant Biol 36:770. doi: 10.1071/FP09120

Sommer M, Bräutigam A, Weber APM (2012) The dicotyledonous NAD malic enzyme C4 plant

Cleome gynandra displays age-dependent plasticity of C4 decarboxylation biochemistry.

Plant Biol 14:621–629. doi: 10.1111/j.1438-8677.2011.00539.x

Voznesenskaya EV, Edwards GE, Kiirats O, et al. (2003) Development of biochemical

specialization and organelle partitioning in the single-cell C4 system in leaves of

Borszczowia aralocaspica (Chenopodiaceae). Am J Bot 90:1669–1680. doi:

10.3732/ajb.90.12.1669

Voznesenskaya EV, Franceschi VR, Edwards GE (2004) Light-dependent Development of

Single Cell C4 Photosynthesis in Cotyledons of Borszczowia aralocaspica

(Chenopodiaceae) during Transformation from a Storage to a Photosynthetic Organ.

Annal Bot 93:177–187. doi: 10.1093/aob/mch026

Voznesenskaya EV, Franceschi VR, Kiirats O, et al. (2001) Kranz anatomy is not essential for

terrestrial C4 plant photosynthesis. Nature 414:543–546. doi: 10.1038/35107073

Voznesenskaya EV, Franceschi VR, Kiirats O, et al. (2002) Proof of C4 photosynthesis without

Kranz anatomy in Bienertia cycloptera (Chenopodiaceae). Plant J 31:649–662. doi:

10.1046/j.1365-313X.2002.01385.x

Voznesenskaya EV, Koteyeva NK, Chuong SDX, et al. (2005) Differentiation of cellular and

biochemical features of the single-cell C4 syndrome during leaf development in Bienertia

cycloptera (Chenopodiaceae). Am J Bot 92:1784–1795. doi: 10.3732/ajb.92.11.1784

Wang JL, Long JJ, Hotchkiss T, Berry JO (1993) C4 Photosynthetic Gene Expression in Light-

and Dark-Grown Amaranth Cotyledons. Plant Physiol 102:1085–1093. doi:

10.1104/pp.102.4.1085

Warburg O (1920) Über die Geschwindigkeit der photochemischen Kohlensäurezersetzung in

lebenden Zellen. II. Biochem. Z. 103: 188–217.

Weiner H, Burnell JN, Woodrow IE, et al. (1988) Metabolite Diffusion into Bundle Sheath Cells

from C4 Plants. Plant Physiol 88:815–822. doi: 10.1104/pp.88.3.815

Summary

Plants with novel traits are useful in a number of aspects important to humanity. The conversion of solar energy into chemical energy through photosynthesis is not a novel trait, but the manner in which the SCC4 species complete this process is. Understanding how these SCC4 species accomplish this task can provide the building blocks for increasing crop yields, lessening the demand for available freshwater, increasing salt tolerance, producing biofuel stocks and engineering biopharmaceuticals.

Chapter References

Aubry S, Brown NJ, Hibberd JM (2011) The role of proteins in C3 plants prior to their recruitment into the C4 pathway. J Exp Bot 62: 3049–3059

Chapter 2

Developmental Transcriptomes of the Single Cell C4 Photosynthetic Type Plant

Bienertia sinuspersici

Abstract

The majority of agricultural crops perform C3 type photosynthesis, which lags behind C4 type photosynthesis in the ability to perform photosynthesis at higher temperatures and with less water. A strategy to increase C3 crop productivity is to increase photosynthetic efficiency, whether by adaptations that utilize water more efficiently, thus increasing the land area available for farming, or by increasing the photosynthetic yield, which translates into larger harvests. Bienertia sinuspersici is one of four known land plant species to perform C4 photosynthesis in a single cell, and molecular information about this unique form of photosynthesis is lacking. In an effort to fill in portions of the gap in our knowledge of this phenomenon we performed RNASeq on an emerging leaf stage and a mature leaf stage. 72,820 unique ESTs were analyzed with 72,169

ESTs present in the emergent leaf data and 72,089 ESTs present in the mature leaf stage data.

Annotations and Gene Ontology terms were assigned and Gene Ontology enrichment was performed.

Cellular component distribution, molecular functions and biological processes were shown to be differentially associated in the two developmental tissues. Nuclear component associated ESTs were the most abundant in the young tissue and chloroplast component ESTs were more abundant in the mature tissue. Biological processes associated with cell division, transcriptional regulatory mechanisms and translational components were enriched in the young tissue while translational components and post translational modifications were enriched in the mature tissue.

EST compositions of the translational components were differentially present between the young

and mature tissues. Overall this analysis of Bienertia sinuspersici EST compositions of Gene

Ontologies establishes developmental patterns which occur while producing the cellular dimorphism required for C4 photosynthesis which indicate the existence of a unique expression pattern of regulatory mechanisms to produce the SCC4 phenomenon.

Introduction

The study of plants with novel biochemistry, structure and physiology has advanced our knowledge of the diversity in the plant kingdom. This increased knowledge is leading to beneficial advancements in the bioenergy, medical and agricultural fields. Photosynthesis is the foundation of life on planet earth. Generally, there are three variations of carbon assimilation in terrestrial plants: C3, CAM and C4 photosynthesis. C4 plants evolved a mechanism to capture atmospheric CO2 and concentrate it in the leaf so that it is not limiting for photosynthesis, which is most beneficial in warmer climates. This is accomplished by biochemical and structural

development of two types of photosynthetic cells which are specialized for C4 function

(mesophyll and bundle sheath), called Kranz anatomy (for review of forms see Edwards and

Voznesenskaya, 2011). However, novel plants that defy the norm have recently been discovered. In particular, the genus Bienertia and its sister clade Suaeda in the Chenopodiaceae family display a diverse and intriguing set of anatomical and physiological traits and include the four known land plant species that perform C4 photosynthesis within individual photosynthetic cells in leaves, in the absence of Kranz anatomy. Species in the Suaeda genus employ both C3 and C4 photosynthetic pathways in environments as diverse as the North American continent and the Persian Gulf region (Schütze et al., 2003; Kapralov et al., 2006; Kadereit et al., 2012). Suaeda aralocaspica, Bienertia cycloptera, Bienertia sinuspersici, and Bienertia kavirense are the four

known terrestrial species to perform C4 photosynthesis in individual chlorenchyma cells, termed single cell C4 (SCC4), utilizing dimorphic chloroplasts, as opposed to the more widely evolved dual cell Kranz type C4 photosynthesis (reviewed in Muhaidat et al., 2007; Edwards and

Voznesenskaya, 2011; Langdale, 2011).

With few exceptions, the anatomy, photosynthetic physiology and photochemistry have been elucidated for the SCC4 structural types (reviewed in Sharpe and Offermann, 2013). However, the molecular basis for development of the specialized photosynthesis in these plant species remains elusive.

Anatomically, the majority of the C4 land plants utilize a dual cell synergy between the mesophyll and bundle sheath cells to impart these physiological enhancements of carbon assimilation. This dual cell anatomy also includes cell specific physiologically and morphologically distinct chloroplasts that impart the crucial components required for this synergy (Sage, 2004; Edwards and Voznesenskaya, 2011). In contrast, the SCC4 species have an analogous development of distinct chloroplasts in each of the chlorenchyma cells that perform the C4 photosynthetic process (reviewed in Sharpe and Offermann, 2013). Current literature pertaining to photosynthetic leaf cell development has focused primarily on the development of mesophyll cells in C3 species, and development of mesophyll and bundle sheath cells in some C4 species (Dengler and Nelson, 1999; Li et al., 2010; Koteyeva et al., 2011; Langdale, 2011;

Nelson, 2011; Koteyeva et al., 2014).

These studies have a major focus on the progression of development of chloroplasts during leaf ontogeny as it relates to major functions for the organ and cell type. However, such studies do not provide insight into the unique development of dimorphic chloroplasts within

individual photosynthetic cells. Molecular studies on the SCC4 species can shed light on the novel means by which these species develop intracellular, compartmentalized dimorphic chloroplasts, and on how their functions are regulated.

Understanding the genetics behind the accomplishment of such an intricate intracellular function as the conversion of radiant energy to chemical energy via C4 photosynthesis will increase our knowledge of how these species evolved to be successful in warmer and arid environments (Ainsworth et al., 2006; Gowik et al., 2011). Under the physiological stresses imposed by warmer temperatures and limited water sources, C4 plants show productivity enhancements through an increased ability to capture and assimilate carbon dioxide (Taylor et al., 2014). C3 species overall are better adapted to more temperate climes. With climate change trending towards warmer and more arid cycles, the ability of agricultural C3 species to adopt C4 attributes will play a greater role in which of these species will be more economically viable in the future (Keeley & Rundel, 2003; Von Caemmerer, 2013; Taylor et al., 2014).

There are two forms of chlorenchyma cell morphology in the SCC4 species. In Suaeda aralocaspica, dimorphic chloroplasts are localized to the distal and proximal poles of the cell in relation to the vascular tissue. The Bienertia species have dimorphic chloroplasts separated between a densely packed cytoplasmic compartment localized in the center of the cell and a cytoplasmic layer lying adjacent to the plasma membrane. These two cytoplasmic domains are compartmentalized from each other by a large vacuole and are connected to each other via cytoplasmic strands running through the vacuole (Freitag and Stichler, 2000; Freitag and

Stichler, 2002; Edwards et al., 2004). Both of these structural forms of SCC4 accomplish C4 via an NAD-malic enzyme type C4 cycle with evidence for their function as C4 from analysis of

carbon isotope composition of leaf biomass, photosynthetic CO2 compensation points, activities and compartmentation of photosynthetic enzymes (reviewed in Sharpe and Offermann, 2013).

The genetic orchestration of the novel SCC4 phenomenon can be investigated by unraveling the unique features of its genome. The 1n genome of Bienertia sinuspersici, from analyses of mature leaves, has been estimated to be ~3.6 to 3.8 Gigabases by flow cytometry and

DAPI staining (Supplementary data file 1). This would place the genome size of B. sinuspersici in the intermediate size range (Michael, 2014), and on the same scale as that of the human genome. There is some potential to overestimate the size in mature cells due to genomic duplication as cell size increases and to transposon activity bloating the total genome size via

DNA insertions (Kumar and Bennetzen, 1999; Sablowski and Carnier Dornelas, 2014).

Characterization of the complete genome can be preceded by unraveling the gene space through a comprehensive analysis of the transcriptome using RNASeq (Ozsolak and Milos, 2011). Indeed, this approach has yielded detailed and useful information about differential transcript abundance and regulatory transcriptional differences between the mesophyll and bundle sheath cells during leaf development of Kranz type C4 species, which has enabled elucidation of the specific underlying biology (Brautigam et al., 2011; Gowik et al., 2011; Chang et al., 2012; Takacs et al., 2012;

Aubry et al., 2014).

In this study, we performed a comprehensive transcriptome analysis using the long and short reads from 454 and Illumina sequencing technologies to gain molecular information that can be utilized for understanding the novel physiology and anatomy of the developing leaves of the halophytic SCC4 species Bienertia sinuspersici.

Results

General steps in the construction of the B. sinuspersici developmental transcriptome and annotation can be found in Figure 1. Briefly, the raw reads were processed through a quality check and all low quality and contaminating reads were removed (See Supplementary Data 2). A master transcriptome dataset (BsRef) was generated from young and mature leaf RNASeq data and each of the leaf read datasets were mapped back to the BsRef for expression comparisons.

Read Quality, Trimming, Mapping and Overall Transcriptome Expression

116,257 contigs were assembled from 141,504,502 total trimmed reads with an N50 of

792 bases. To construct a robust and comprehensive transcriptomic assembly, contigs were filtered for a length greater than 200 bases and an average read coverage of five or greater, which yielded 73,486 ESTs in the final B. sinuspersici transcriptome (hereafter referred to as

BsRef) analysis. 72,537 ESTs remained after removal of contaminating ESTs (see below in the

EST Gene Ontology section and Supplementary Data 2). Of these, 72,089 ESTs were mapped with at least a 1 times coverage from the sequencing read dataset from mature leaves and 72,169

ESTs were mapped with at least a 1 times coverage from the sequencing read dataset from young leaves (Table 1).
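The assembly statistics above (N50 and the length and coverage filter) can be illustrated with a minimal sketch. The `Contig` record, the function names and the toy values are assumptions made for illustration only; they are not the assembler output or the pipeline used in this study.

```python
from typing import List, NamedTuple


class Contig(NamedTuple):
    """Hypothetical contig record (not an assembler data type)."""
    name: str
    length: int            # contig length in bases
    mean_coverage: float   # average read coverage across the contig


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def filter_contigs(contigs: List[Contig],
                   min_length: int = 200,
                   min_coverage: float = 5.0) -> List[Contig]:
    """Keep contigs longer than min_length with mean coverage of at least min_coverage."""
    return [c for c in contigs
            if c.length > min_length and c.mean_coverage >= min_coverage]


# Toy data only; these are not values from the B. sinuspersici assembly.
toy = [
    Contig("c1", 1500, 12.0),
    Contig("c2", 800, 4.0),
    Contig("c3", 600, 9.5),
    Contig("c4", 200, 30.0),
    Contig("c5", 150, 50.0),
]
kept = filter_contigs(toy)
print([c.name for c in kept])
print(n50([c.length for c in toy]))
```

In this toy set only c1 and c3 pass both criteria, because c2 falls below the coverage cutoff and c4 and c5 are not longer than 200 bases.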

Read Trimming Parameters: In this analysis it was determined that stringent trimming of read data is required for a quality assembly. However, a stringent trimming of reads, with criteria of a

97% base match over a sliding 40% sequence coverage range, for the initial assembly may have a detrimental effect on the number of ESTs built. When evaluated against a non-stringent trimmed assembly the number of ESTs built increased by less than one percent over the

stringently trimmed assembly. Similar percentages were obtained in mapping: 700 more ESTs, or 0.98%, were mapped when the total assembly was mapped with all of the untrimmed reads from the mature leaf sample than with the trimmed reads. The number of ESTs mapped in the Young dataset increased by 0.85%, or 609 ESTs, when mapped with all the untrimmed reads in the young dataset compared to the trimmed young reads. Further, we evaluated the use of lower quality reads (All dataset) and a looser mapping stringency and its effect on the quality and quantification of the resulting reads per kilobase per million reads (RPKM) values. The number of mature reads and young reads used in the trimmed mapping was reduced by over half compared to the untrimmed read mapping evaluation. The percentage differences in the number of trimmed reads used versus the number of total trimmed reads mapped, 89.08% Mature and

89.97% Young, and the number of all reads used versus the number of all reads mapped, 88.39%

Mature and 89.26% Young, resulted in a 0.69% difference for the Mature data and a 0.71% difference for the Young data. In conclusion, the use of a stringent read quality and mapping strategy produces fewer contigs but a higher quality assembly of the sequences. Use of the original non-trimmed reads for mapping and RPKM calculations increases the population of reads, and thus the statistical power, while having little effect on the differences between the trimmed and non-trimmed read dataset percentages. The corresponding developmental tissue ratios between the untrimmed reads and the trimmed reads datasets were evaluated for similarity. 95% of the ESTs had RPKM ratios within 1 RPKM of untrimmed versus trimmed when the four additive RPKM values were filtered for greater than 4. Averaging for an RPKM of 1 was used to alleviate the bias incurred when low trimmed RPKM ratio values skew the comparison. For example, if the RPKM values for an untrimmed reads Mature EST and untrimmed reads Young

EST are 1.272 and 0.096 respectively, and the RPKM value for the same Trimmed reads Mature

EST and Trimmed reads Young EST are 1.355 and 0.170 respectively, the difference between the All RPKM Mature/Young ratio of 13.25 and the Trimmed RPKM Mature/Young ratio of

7.97, while biologically still relevant, is skewed towards a larger Young influence due to a very low Trimmed reads Young RPKM value.
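The worked example above can be reproduced with a short sketch. The `rpkm` function uses the standard definition (reads per kilobase of transcript per million mapped reads). Reading the filter as "the sum of the four RPKM values must exceed 4" is an assumption about the text, and all names are hypothetical.

```python
def rpkm(read_count: int, length_bases: int, total_mapped_reads: int) -> float:
    """Standard RPKM: reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bases * total_mapped_reads)


def mature_young_ratio(mature_rpkm: float, young_rpkm: float) -> float:
    """Mature/Young expression ratio for one EST."""
    return mature_rpkm / young_rpkm


# RPKM values taken from the example in the text.
all_mature, all_young = 1.272, 0.096      # untrimmed ("All") reads
trim_mature, trim_young = 1.355, 0.170    # trimmed reads

ratio_all = mature_young_ratio(all_mature, all_young)
ratio_trimmed = mature_young_ratio(trim_mature, trim_young)

# Assumed reading of the filter: the four RPKM values must sum to more than 4.
passes_filter = (all_mature + all_young + trim_mature + trim_young) > 4

print(round(ratio_all, 2), round(ratio_trimmed, 2), passes_filter)
```

Under this reading the example EST is excluded by the filter, since its four RPKM values sum to about 2.9. That is consistent with the text using it to illustrate how very low RPKM values distort ratio comparisons.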

Overall expression dynamics as determined from the transcriptome assembly indicated that 22,887 of the ESTs exhibited higher ratios in the mature tissue and 48,980 ESTs exhibited higher ratios in the younger tissues. Only 7,245 (31.66%) of the ESTs with higher ratios in the mature tissue and 20,696 (42.25%) of those with higher ratios in the young tissue had annotations assigned by BLAST2GO. The expression profiles across the datasets were analyzed for 2, 5 and

10-fold greater expression values for each developmental tissue type (Table 1). While the total number of transcripts from each comparative fold category was higher in the young tissue than the mature tissue (Figure 2A), the percentage of ESTs over the cutoff point relative to the total number of ESTs in the dataset indicates higher percentages (0.89% at 5x to 0.66% at 10x) beginning with the

5 times greater than cutoff point in the mature tissue (Figure 2B).
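The fold-category comparison described above can be sketched as follows. The input layout (EST identifier mapped to a pair of mature and young RPKM values), the function names and the toy values are assumptions for illustration and do not reproduce Table 1.

```python
from typing import Dict, Tuple

FOLDS = (2, 5, 10)


def fold_counts(rpkm: Dict[str, Tuple[float, float]]) -> Dict[str, Dict[int, int]]:
    """Count ESTs at least 2-, 5- and 10-fold higher in each tissue.

    rpkm maps an EST id to (mature RPKM, young RPKM). An EST expressed in only
    one tissue counts toward every fold category for that tissue.
    """
    counts = {"mature": {f: 0 for f in FOLDS}, "young": {f: 0 for f in FOLDS}}
    for mature, young in rpkm.values():
        for fold in FOLDS:
            if mature > 0 and mature >= fold * young:
                counts["mature"][fold] += 1
            if young > 0 and young >= fold * mature:
                counts["young"][fold] += 1
    return counts


def as_percent(count: int, total: int) -> float:
    """Share of the dataset that exceeds a given cutoff, in percent."""
    return 100 * count / total


# Toy values only; not data from this study.
toy = {
    "e1": (10.0, 1.0),
    "e2": (6.0, 1.0),
    "e3": (1.0, 2.5),
    "e4": (1.0, 1.2),
    "e5": (0.5, 6.0),
}
counts = fold_counts(toy)
for tissue, by_fold in counts.items():
    for fold, n in by_fold.items():
        print(tissue, f"{fold}x", n, f"{as_percent(n, len(toy)):.1f}%")
```

Reporting both the raw counts and the percentage of the dataset mirrors the distinction drawn between Figure 2A and Figure 2B.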

EST Gene Ontology

Blast2GO’s Gene Ontology (GO) annotation for a sequence relies on the results generated by a blastx alignment of an in silico nucleotide translation (Altschul et al., 1997;

Conesa et al., 2005). The reference species assigned is the best-match homolog returned by the blastx alignment. Of the 73,486 assembled B. sinuspersici ESTs that passed the filters of at least five times average coverage and a length of at least 200 bases,

28,313 ESTs or 38.5%, had GO associations annotated through Blast2GO. A total of 296,017

GO terms were associated with the B. sinuspersici 28,313 ESTs; 47.95% of the reference terms

being automatically assigned, 29.63% of the reference terms derived from computational analysis, 18.07% of the reference terms arrived at from experimental evidence and 4.36% of the reference terms specified by the authors or curators. A total of 26,067 of the EST annotations were derived primarily from 28 plant species while the remaining 2,246 ESTs were categorized as “other” species (sic) or “unknown” (Supplementary Data 1 & Supplementary File 1b). GO associations rely on inter-species sequence homology at the amino acid level (blastx alignments), or sequence homology at the nucleotide level (blastn alignments). Consequently, the species identified from GO associations made at the nucleotide level may or may not correspond to the species identified at the amino acid level. This is due to codon degeneracy, codon usage bias and amino acid differences between species. In the relatively small pool of inter-species highly conserved genes, it is expected there will be instances of 100% similarity at the amino acid level but, even in these highly conserved genes, 100% nucleotide sequence similarity is very rare even between sister species. The e-value associated with an annotation is the number of alignments with an equal or better score that would be expected by chance in a database of the same size; a low e-value therefore supports homology with the reference gene, and by inference the unknown sequence is either a paralogue or ortholog of the reference gene. Annotations and the e-values assigned to them are a useful tool in assessing the confidence level to which the identity of an unknown sequence can be made

(Altschul et al., 1997).

Due to the predictive nature of GO assignments, the 28 species top blastx assignments to the 28,313 B. sinuspersici ESTs were analyzed for assignments with e-values of 0.00 and 100% similarity at the amino acid level. If the EST had a 100% amino acid similarity, the B. sinuspersici EST sequence was extracted from the fasta data generated by the assembly and a reciprocal blastn of the sequence was made against the nt database at NCBI to determine if the

species assigned with 100% similarity at the amino acid level matched the species assigned at the nucleotide level with an e-value of 0.00 (Supplementary File 1 a, b & c). Results from this analysis were as expected, with sequence annotations having a high confidence level made at the amino acid level and few if any ESTs with an e-value of 0.00 and 100% nucleotide identity. ESTs aligned and annotated from the species Vitis vinifera and Solanum lycopersicum were exceptions. Four reciprocal blastn sequence annotations with 100% similarity aligned to

Vitis vinifera were no longer than 33 bases in length and constituted 0.03% of the ESTs associated with the Vitis vinifera species and 0.01% of the 28,313 overall annotated ESTs.

Solanum lycopersicum associated annotations on the other hand totaled 179 ESTs of the 405 total species associations. These 179 ESTs had blastx e-values of 0.00 and amino acid sequence identities of 100%. A reciprocal blastn of the 409 ESTs identified the same 179 blastx ESTs with an e-value of 0.00 and a 100% base identity similarity with an average hit length of 842 bases across the 179 EST dataset. The large disparity between the numbers of ESTs with very high homology to S. lycopersicum as opposed to the group of species from which the B. sinuspersici

ESTs were annotated led to the conclusion of possible cross species contamination and the set of

ESTs corresponding to the S. lycopersicum reciprocal blastn identities of greater than or equal to

95% were removed from the final analysis (Supplementary File 1).
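A contamination screen of this kind can be sketched as a filter over tabular reciprocal blastn results. The sketch assumes the default 12-column BLAST+ tabular layout (output format 6). The subject-to-species lookup, the identifiers and the toy rows are hypothetical, and this is not the procedure actually used in the study.

```python
import csv
import io
from typing import Dict, Set

# Default column order of BLAST+ tabular output (outfmt 6).
BLAST6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def flag_contaminants(tabular_text: str,
                      subject_species: Dict[str, str],
                      contaminant: str,
                      min_identity: float = 95.0) -> Set[str]:
    """Return query ESTs hitting the suspected contaminant at >= min_identity percent."""
    flagged = set()
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=BLAST6_FIELDS, delimiter="\t")
    for row in reader:
        species = subject_species.get(row["sseqid"])
        if species == contaminant and float(row["pident"]) >= min_identity:
            flagged.add(row["qseqid"])
    return flagged


# Hypothetical hits and species lookup, for illustration only.
rows = [
    ("EST_0001", "subj_A", "100.0", "842", "0", "0", "1", "842", "1", "842", "0.0", "1555"),
    ("EST_0002", "subj_B", "88.2", "310", "36", "1", "5", "314", "9", "318", "1e-90", "340"),
    ("EST_0003", "subj_A", "91.0", "400", "36", "0", "1", "400", "1", "400", "1e-150", "520"),
]
toy_hits = "\n".join("\t".join(r) for r in rows)
species_of = {"subj_A": "Solanum lycopersicum", "subj_B": "Vitis vinifera"}

flagged = flag_contaminants(toy_hits, species_of, "Solanum lycopersicum")
print(sorted(flagged))
```

ESTs returned by the function would then be dropped before the Gene Ontology analysis, which matches the 95% identity cutoff stated above.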

EST Gene Ontology Classifications

Assignment of gene ontologies to annotated datasets enables identification and classification of processes. Accordingly, there are three gene ontologies; Cellular Component,

Biological Processes and Molecular Function, with which a gene product can be associated (The

Gene Ontology Consortium, 2000). As such, the gene product may have multiple associated

terms assigned to it within the ontologies as in an ancestor, parent and child type relationship.

GO enrichment analysis was reduced to the most specific annotated terms, or the last GO annotated child term for each annotated EST, to reduce the number of terms due to the redundancy inherent in ancestor/parent/child relationships. GO enrichment was performed for over-represented and under-represented terms between the two developmental datasets with the

Fisher’s Exact test employing a p-value cutoff of 0.01 and a false discovery rate of 0.05.
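The enrichment test described above can be illustrated with a pure-Python sketch of a two-sided Fisher's exact test followed by a Benjamini-Hochberg adjustment. Blast2GO's internal implementation is not reproduced here. Applying both the 0.01 p-value cutoff and the 0.05 false discovery rate is an assumed reading of the text, and the term names and counts are invented.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are as likely as, or less likely than, the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented counts per GO term:
# (test set with term, test set without, reference with term, reference without)
toy_terms = {
    "term_A": (30, 70, 10, 190),
    "term_B": (3, 1, 1, 3),
    "term_C": (12, 88, 25, 175),
}
names = list(toy_terms)
pvals = [fisher_exact_two_sided(*toy_terms[n]) for n in names]
qvals = benjamini_hochberg(pvals)
enriched = [n for n, p, q in zip(names, pvals, qvals) if p <= 0.01 and q <= 0.05]
print(enriched)
```

Whether a significant term is over- or under-represented is decided separately, for instance by comparing the proportion of ESTs carrying the term in the test set with that in the reference set.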

Due to the interconnectivity and hierarchical structure of the ontology system, a well annotated EST can be easily categorized as to its location and function. A gene involved in the regulation of a biological process that produces a metabolite associated with a developmental process would produce at least three first ancestor GO terms, GO:0065007 biological regulation, GO:0008152 metabolic process and GO:0032502 developmental process.

The better a sequence is annotated, from both an assembly and a gene ontology assignment perspective, the better an EST can be characterized from a bioinformatics standpoint. Figure 4 illustrates the contribution of each first ancestor term to the overall set of first ancestor terms. The number of terms under each first ancestor was expressed as a percentage of the overall number of overrepresented terms for that tissue type (Figure 4A & C). For the overrepresented GO terms in only one developmental tissue, seven first ancestor cellular components were represented (Figure 4A). Six of the seven cellular components were present in both developmental tissue types and only the vacuole was overrepresented solely in the mature tissue. There were ten unique Biological

Process ontology first ancestor terms in the significantly higher expressed mature and young datasets (Figure 4B). Only six of the ten were represented in the mature tissue with all ten first ancestor terms present in the young dataset.

There were a total of 328 unique GO terms assigned in the enriched mature or young developmental GO sets (Table 2). A total of 188 terms belonged to the biological processes

(bpGO), 45 terms to the cellular component (ccGO) and 95 terms to the molecular function

(mfGO) ontologies (Table 2 and Figure 3). For each ontology, the terms that are over-represented or under-represented in relation to the total ESTs are presented below.

Representation of ccGO. Overall there were 20 terms belonging to ccGO which were over-represented only in the young tissue (overY) dataset, while there were 11 terms which were over-represented only in the mature tissue (overM). The under-represented only in the young tissue

(underY) followed, with 6 terms, and the under-represented only in the mature tissue (underM) had 3 enriched terms. There were no shared GO terms between the overY and underM or the underY and underM category.

There were, however, 5 terms in ccGO that were shared between the two developmental tissues. Two terms, GO:0009523 photosystem II and GO:0009941 chloroplast envelope components, were overM and underY. Interestingly, three of the ccGO terms were over-represented in both tissues: GO:0022625 cytosolic large ribosomal subunit, GO:0005730 nucleolus and GO:0005886 plasma membrane. Further analysis of these ccGO terms revealed discrete subsets of ESTs, indicating differential activity in these functions. Differential nucleolus, ribosomal and plasma membrane component composition indicates potential involvement in development of dimorphism in the chlorenchyma cell.

While mature leaves had a higher level of expression of chloroplast envelope ESTs relative to the total EST dataset, there were more GO:0009941 chloroplast envelope ESTs (228

Y vs. 105 M) expressed significantly higher (SH) in the young tissue compared to the mature

tissue (Table 3). The enriched GO:0009523 photosystem II term comprised a total of 46 ESTs overall with overM of 18 ESTs and the underY with only 8 ESTs (Table 3). Three of the five cellular component terms; GO:0022625 cytosolic large ribosomal subunit (Table 3),

GO:0005730 nucleolus (Table 3) and GO:0005886 plasma membrane (Table 3), were overM and overY with a total of 56 SH mature ESTs and 79 SH young ESTs for a total of 173 ESTs in the

GO:0022625 cytosolic large ribosomal subunit category, 116 SH mature ESTs and 237 SH young ESTs in the GO:0005730 nucleolus category and 436 SH mature ESTs and 1315 SH young ESTs out of 3820 ESTs for the GO:0005886 plasma membrane category. Mature ESTs enriched in the cellular component ontology were expressed SH and more abundant in the photosystem II term only. The overM EST dataset had lower numbers of SH ESTs in the remaining four cellular component categories than did the young tissue EST dataset.

Among the cellular component GO terms, in the mature leaves 32% were assigned to the chloroplast component, with the other six components splitting the remaining 68% of GO terms, each ranging from 10 to 16 percent (Figure 4A). The mature tissue comprised six discrete first ancestor Biological Process ontology terms, which included 41 discrete overrepresented GO terms and 43 discrete under-represented GO terms. The unique overrepresented first ancestor GO terms for the young tissue were restricted to 6 of the 7 cellular components. The one component not overrepresented in the young tissue as opposed to the mature tissue is the vacuole component. Young tissue transcription energy centered mainly in the nucleus compartment, with 45% of the annotated ESTs assigned this GO term first ancestor. The chloroplast compartment first ancestor comprised 23%.

Representation of mfGO. There were four mfGO terms which were shared between the mature and young developmental tissue samples; GO:0003735 structural constituent of ribosome,

GO:0030247 polysaccharide binding, GO:0003777 microtubule motor activity and GO:0005524

ATP binding. Only the GO:0003735 structural constituent of ribosome term was over-represented in both tissues. The GO:0003777 microtubule motor activity and GO:0005524 ATP binding terms were overY and underM, while the GO:0030247 polysaccharide binding term was overM and underY. GO:0003735 structural constituent of ribosome (Table 3) was over-represented in both mature and young tissue types and, of the 509 total ESTs in the GO term, the young EST dataset contained 221 SH ESTs in relation to the 117 SH ESTs in the mature dataset.

The GO:0030247 polysaccharide binding term (Table 3) had 59 ESTs assigned and was over-represented in the mature dataset with 16 ESTs SH in expression and 10 ESTs SH in expression in the young EST dataset. Both the GO:0003777 microtubule motor activity and GO:0005524 ATP binding terms (Table 3) were underM even though there were 250 SH ESTs in the

GO:0005524 ATP binding term while there were only 2 SH ESTs in the GO:0003777 microtubule motor activity category. These two terms were over-represented in the young tissue, where there were 1080 SH GO:0005524 ATP binding ESTs and 64 SH GO:0003777 microtubule motor activity ESTs.

Molecular Function ontologies are orientated towards gene or sequence predicted functionality and less towards cellular level descriptions (shown above). The mfGO allows for some ontology comparisons, but the relationships can be more ambiguous than those in comparisons between ccGO and bpGO. An example can be given with GO:0005507 copper ion binding. The first ancestors for this GO are GO:0005515 binding and GO:0001071 nucleic acid binding transcription factor activity; the second ancestor GO terms are ion binding and sequence

specific DNA binding transcription. These GO terms have informational use when a specific

GO:0005507 copper ion binding assigned gene is being investigated. However, this is not helpful when attempting to characterize a developmental picture, and GO:0001071 nucleic acid binding transcription factor activity should not be used in manual annotation (EMBL-EBI). Genes as different as CUTA and HMA5 in Arabidopsis will have

GO:0005507 copper ion binding assigned to them but they are spatially and functionally quite different from one another (Fox and Guerinot, 1998; Burkhead et al., 2003; Andrés-Colás et al.,

2006). Overall the Molecular Function ontology is quite different between the developmental tissues and these differences can be found in Supplementary Data 2 Table 2.

Representation of bpGO. The biological process ontology was the most represented of the three ontologies and comprised 17 shared GO terms. GO:0001510 RNA methylation was the only shared term to be over-represented in both tissue types, and GO:0015074 DNA integration and GO:0006508 proteolysis were the only biological process terms to be over-represented in mature tissue and under-represented in young tissue. Twelve of the bpGO terms were over-represented in the young tissue and under-represented in the mature tissue. Interestingly, two of the GO terms, GO:0000956 nuclear-transcribed mRNA catabolic process and GO:0019288 isopentenyl diphosphate biosynthetic process, methylerythritol 4-phosphate pathway, were under-represented in both tissues, which is indicative of these gene sets being enriched overall in comparison to the total transcriptome while the involved genes are equally distributed between the tissue types (Table

3). As noted above, the Biological process ontology constituted the largest shared term group with 17 members. The GO:0001510 RNA methylation term was the only term over-represented in both young and mature tissues and 224 ESTs were assigned this GO term (Table 3). 100 ESTs were SH in the young tissue with 43 SH in the mature tissue. 5 ESTs were SH in the mature

tissue and 37 ESTs were SH in the young tissue for the under-represented in both tissues

GO:0000956 nuclear-transcribed mRNA catabolic process term which had a total of 175 assigned terms (Table 3). The other GO term under-represented in both tissue types, GO:0019288 isopentenyl diphosphate biosynthetic process, methylerythritol 4-phosphate pathway, with 373 total ESTs, had 18 SH ESTs in the mature tissue and 75 SH ESTs in the young tissue (Table 3).

Twelve GO terms were over-represented in the young tissue while under-represented in the mature tissue (Table 3). Six of these terms (DNA replication initiation, microtubule-based movement, reciprocal meiotic recombination, regulation of G2/M transition of mitotic cell cycle, regulation of meristem growth and synapsis) are related to cell duplication, with the great majority of SH ESTs, 492 young versus 21 mature, associated with the young tissue. The DNA replication initiation and regulation of G2/M transition of mitotic cell cycle terms contained zero

SH ESTs in the mature tissue dataset. Four GO terms over-represented in young tissue, 437 SH ESTs, and under-represented in mature tissue, 14 SH ESTs, were involved with transcriptional regulation via methylation of chromatin, chromatin silencing by small RNA or gene silencing by miRNA. Only 2 GO terms, DNA integration and proteolysis, were over-represented in the mature tissue while under-represented in the young tissue (Table 3). The 101 SH mature ESTs as well as the 93 SH young ESTs in the GO:0015074 DNA integration term are comprised mainly of transposons and RNA directed DNA synthesis genes. The GO:0006508 proteolysis term contained 37 SH young ESTs which were orthologous or paralogous to mature SH ESTs and 63 unique SH young ESTs and 4 unique SH mature ESTs. The remaining two GO terms, both of which are over-represented in the young tissue and under-represented in the mature tissue, are involved in differentiation and development of the reproductive tissue. The GO:0009560 embryo sac egg cell differentiation and GO:0009909 regulation of flower development terms SH EST

composition mirrored the top heavy young EST to mature EST ratio of the cell duplication contingent with 85 SH young ESTs and 5 SH mature ESTs assigned a GO:0009560 embryo sac egg cell differentiation and 196 SH young ESTs and 9 SH mature ESTs assigned a GO:0009909 regulation of flower development term (Table 3).

The Biological Process GO terms in the young tissue dataset were more numerous as well as diverse, 72.6% of the overall terms in the young leaf dataset, versus 27.4% in the mature leaf dataset with the exception of the GO:0050896 response to stimulus (Figure 4B & 4C and Supplementary

Data 2 Table 2). The young dataset GO:0050896 response to stimulus had eight discrete GO terms associated while the mature dataset had 18. The largest category by percentage in the mature tissue was the GO:0008152 metabolic process with a 31% share of the mature overall GO annotations. The only Biological Process first ancestor GO term to be congruent between the two developmental datasets was the GO:0009987 cellular process term at 29.8% of the young and 29.3% of the mature sample. The percentage of unique young tissue GO terms, in relation to the total number of discrete GO terms, was higher in the first ancestor GO:0065007 biological regulation, 13.9% young and 6.9% mature, and GO:0071840 cellular component organization or biogenesis terms, 11.3% young and 3.4% mature, where both mature and young tissues had overlapping first ancestor terms. The other first ancestor GO term to be higher by percentage in the young dataset was the GO:0032502 developmental process.

Discussion

The quality trimming of reads decreased the number of assembled contigs with increased base resolution and length. The 73,486 contigs predicted from the final assembly reflect the

RNA landscape in B. sinuspersici during two developmental time points. The increase in number

of contigs relative to previously reported numbers of expressed genes in a typical transcriptome is likely due to a finer resolution of contig base calling between SNP differential alleles, orthologous and/or paralogous gene copies (Brautigam et al., 2011). While this approach to assembly of a transcriptome works very well in a de novo aspect, the stringent read trimming trends toward the removal of legitimate RNA species copies in extracted RNA pools due to the small but inherent inaccuracies attributed to enzymatic and technological processes used to produce a final in silico read sequence. This approach decreases the total number of legitimate RNA read occurrences, thus reducing the statistical power provided by the numerous reads produced by the Illumina sequencing technology. To correct for any bias incorporated by a strict trimming and assembly regime, mapping of the developmental Illumina read datasets to the assembled reference sequences with lower homology parameters better reflects the number of transcripts represented in the overall transcriptome. Under this workflow, reads containing sequencing errors that still produce significant hits will be counted, but any polymorphisms will not be incorporated into the reference ESTs. Polymorphisms generate a greater diversity of assembled contigs, thus generating a more conservative transcript reference with which to incorporate possible sequencing reads, but still stringent enough to filter for possible contaminants. High confidence reference contigs produced from a strict parameter assembly mapped with similar percentages, thus lending statistical credence to the use of the larger population for finer tuned predictions.

GO term assignments relied on homology to high confidence assembled sequences under tight similarity criteria, which further filters the GO term assignments for increased likelihood.

Uncertainties in the correlations at the edges still produce a most likely group due to the varied decrease of additive constraints. This can also be mirrored to some extent by the expression values exhibited by the majority of the contigs that make up the GO terms.

A comprehensive developmental transcriptome from Bienertia can be used to characterize development of leaf tissue and identify areas of biological interest. The overall developmental progression as determined by the transcriptome analysis indicates that in the young tissue transcript profile the majority of the transcriptional energy is used for cellular processes, followed by metabolic processes. The mature transcript profile had almost a third of the transcription energy invested in the metabolic process followed by the cellular processes and then response to stimulus. These three processes encompassed almost 88% of the transcriptional energy. The young tissue on the other hand spent only 53% of its transcriptional energy on these processes. This should be expected as the young leaf tissue is derived from leaf primordia and is in the very early stages of developing photosynthesis. The enrichment of the photosystem II and chloroplast envelope GO terms in the mature tissue, and their under-representation in the young tissue, supports the observation that the mature tissue has already differentiated into a photosynthetic organ and due to its sessile nature must maintain a transcriptomic profile suited to the metabolic and cellular processes of photosynthesis as well as providing the capability to respond to changing environmental cues. This observation aligns with biological functions associated with these two cellular components observed in mature tissues that are actively engaged in photosynthesis such as ROS generation, photosystem II complex D1 subunit repair and increased antenna and OE components (reviewed in Rochaix, 2014).

Three of the five GO enriched cellular components were over-represented in both developmental tissues. Two are related to the cellular translational machinery of the cell and the third cellular component enriched in both tissues involves the plasma membrane. The remaining two enriched GO terms are over-represented in the mature tissue and under-represented in the young tissue and deal with the chloroplast and the photosystem II complex. The two enriched

cellular components, GO:0005730 nucleolus and GO:0022625 cytosolic large ribosomal subunit, encompass the components involved in the production of the critical cytosolic RNA elements of the ribosomes and the protein constituent of the nuclear encoded cytosolic ribosome 80S.

Heterogeneity of ribosomal composition has been implicated in mRNA translational specificity and constitutes a genetic expression regulatory point. Additional investigations are needed to identify mRNA binding factors which may regulate differentiation of the photosynthetic cells.

The GO:0005886 plasma membrane term which is over-represented in both young and mature leaves consists of ESTs which are selectively enriched in young versus mature leaves.

It has been widely recognized that plant cells in a young meristematic stage divide and stop dividing as organs mature. As they mature they begin to enlarge by modifying the cell wall and increase the cellular content as well as increasing the cytoplasmic membranes and total cytoplasmic area. During the enlargement they orchestrate the movement of organelles to their most efficient location as the cells mature. This state is predicted in young leaves of B. sinuspersici based on the largest group of enriched GO terms reported in the young leaves. The majority of the GO terms identified in the complete transcriptome in young leaves were associated with the cell cycle during the cell duplication processes illustrated by GO:0010389 - regulation of G2/M transition of mitotic cell cycle, GO:0006270 - DNA replication initiation,

GO:0007129 – synapsis and GO:0007131 - reciprocal meiotic recombination.

In the mature tissue the components of the chloroplast envelope and photosystem II were over-represented. In young chlorenchyma cells the nucleus is a dominant organelle, while in mature chlorenchyma cells of Bienertia the chloroplasts are dominant in the central and peripheral cytoplasmic compartments (Voznesenskaya et al. 2005, see Fig. 5). The higher level

of expression of some components of photosynthesis is expected for maintenance of the photosynthetic apparatus.

GO terms enriched in the mature tissue mainly deal with transcription processes. GO:0015074 DNA integration comprises 2 classes of transposable elements: DNA transposons (class 2 elements) and retrotransposons (class 1 elements). Class 2 elements, or the DNA transposons, encode a transposase and participate in the process in which a segment of DNA is incorporated into another, usually larger, DNA molecule such as a chromosome. Contig representatives in this term from the transcriptome belong to the long terminal repeat (LTR) transposons subclasses of

Ty1-copia and Ty3-gypsy as well as the gag-pol complex required for transcription of these

LTR-transposon elements and are known to be abundant and widely dispersed in the

Viridiplantae (Kumar and Bennetzen, 1999). These elements have been implicated in affecting genome size, through the insertion of transposon sequence, and the function of a cell, through transposon insertion into or near genes, which affects the transcription (read-through transcription from promoter area insertion, or methylation of the DNA sequence surrounding the insertion) and translation (reading frame disruption) of those genes (Parisod et al., 2009). This could be a fortunate implication for future genome sequencing efforts of B. sinuspersici, as the flow cytometry measurements for the genome size were performed on the mature tissue, and flow cytometry measurements of cells from young tissue samples could possibly provide a more accurate estimation of the B. sinuspersici genome size prior to transposon activity.

With the overview provided from this transcriptome, and the changes in gene ontology between young and mature leaves, an excellent platform has been established to investigate core

C4 genetic developmental cues and dimorphic chloroplast and structural development under the control of a single nucleus in Bienertia sinuspersici.

Materials and Methods

Plant material

Bienertia sinuspersici plants were maintained in 10 gallon citrus pots in growth chambers under a 14 h light/10 h dark photoperiod with a stepwise increasing light regime to 525 PPFD at full light and an 18 °C (dark) to 35 °C (light) temperature regime. Plants were watered once a week and fertilized with Peters 20-21-5 once a week in between watering. Whole fully expanded mature leaves (~2 cm long) and newly emergent leaves (~1 mm long) from three separate B. sinuspersici plants were harvested within two hours after light initiation, combined into one sample for each treatment, and immediately flash frozen in liquid nitrogen. Flash frozen leaf tissue was ground into a fine powder with a liquid nitrogen cooled mortar and pestle.

Approximately 100 mg of frozen powder was transferred to a liquid nitrogen frozen 2 mL

Eppendorf tube and either kept in liquid nitrogen or stored at -80 °C until RNA was extracted.

RNA extraction

Total RNA was extracted using an acid guanidinium thiocyanate phenol chloroform extraction method similar to that described by Chomczynski and Sacchi (1987). 1 mL of 0.8 M guanidinium thiocyanate, 0.4 M ammonium thiocyanate, 0.1 M sodium acetate pH 5.0, 5% w/v glycerol, and 38% v/v water saturated phenol was added to approximately 100 mg powdered tissue, shaken to evenly mix the sample and incubated at room temperature for 5 min. 200 µL chloroform was added and the sample was shaken vigorously until it became uniformly cloudy and

incubated at room temperature for 3 min. Samples were then centrifuged at 17,000 x g at 4 °C for 15 min and the aqueous phase was removed to a clean 1.5 mL Eppendorf tube. 600 µL 2-propanol was added, and the tube was rocked 5 to 6 times and incubated at room temperature for 10 min. Samples were centrifuged at 17,000 x g at 4 °C for 10 minutes and the supernatant poured off. 1 mL 75% DEPC ethanol was added to the pellet, vortexed for 10 seconds and centrifuged at 9,500 x g at 4 °C for 5 minutes. Pellets were then suspended in RNase free water and incubated at 37 °C with RNase free DNaseI for 30 minutes, and the DNaseI was inactivated at 65 °C for 10 minutes. 450 µL buffer RLC from the Qiagen (Valencia, CA) RNeasy Plant Mini Kit was added to the digestion, processed in accordance with the manufacturer’s recommendations and eluted in 50 µL RNase free water.

Extracted RNA was quality checked either with the Bio-Rad (Hercules, CA) Experion system using the Experion RNA High Sens Analysis kit or the Agilent (Santa Clara, CA) 2100

Bioanalyzer system using the RNA Nano Chip and Plant RNA Nano Assay Class.

Illumina Sequencing

The Illumina HiSeq 2000 sequencing platform was used to sequence 2x100 PE reads from the cDNA libraries generated from the above RNA extractions at Michigan State

University’s Research Technology Support Facility. cDNA and final sequencing library molecules were generated with Illumina’s TruSeq RNA Sample Preparation v2 kit and instructions with minor modifications. Modifications to the published protocol included a decrease in the mRNA fragmentation incubation time from 8 minutes to 30 seconds to obtain the proper size range for the final library molecules. Additionally, Aline Biosciences’ (Woburn, MA) DNA

SizeSelector-I bead-based size selection system was utilized to target final library molecules for a mean size of 450 base pairs. All libraries were then quantified on a Life Technologies

(Carlsbad, CA) Qubit Fluorometer and qualified on an Agilent (Santa Clara, CA) 2100

Bioanalyzer (Dr. Jeff Landgraf personal communication).

454 Sequencing

cDNA libraries were constructed from the RNA extractions using the SMARTer™ PCR cDNA Synthesis Kit from Clontech (Mountain View, CA) according to the manufacturer’s instructions. cDNA quality and size distribution were verified via 1% TAE gels and the Bio-Rad

(Hercules, CA) Experion system. cDNA libraries were then processed to attach the Rapid

Library Multiplex Identification (RL MID) Adapters according to the manufacturer’s protocol.

Libraries were then quality checked for size distribution with Agilent's (Santa Clara, CA) 2100

Bioanalyzer and quantified via fluorometry. Libraries were then pooled and sequenced on Roche

Applied Science’s (Indianapolis, IN) Genome Sequencer FLX System with GS FLX Titanium technology.

Bioinformatics

Initial Assembler Comparison

Sequence read information was extracted with Roche’s GS FLX System Data Analysis

(ver 2.3). Standard Flowgram Format (sff) files were then processed with in-house scripts to remove RL MID and SMaRT adapter sequences and assembled with CLC Bio Genomics

Workbench (ver 5 trial version), a Mira (ver 3.2.1 production, Linux x86_64 static)/Cap3

(version date - 10/15/07) combination, and DNAStar NGen (ver 2.1.0) assemblers. Assembled data was then processed through Blast2GO (ver 2.4.5). Test results indicated the CLC Bio

Genomics Workbench produced longer mean contig lengths and higher N50 values with larger

read datasets (data not shown) and the CLC Bio Genomics Workbench ver.6 was used for subsequent read data processing.
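
For reference, the two comparison metrics can be computed as in the following Python sketch. N50 is taken here in its usual sense, the contig length at which contigs of that length or longer contain at least half of the assembled bases; the function names are illustrative, and the assemblers report these values themselves.

```python
def mean_contig_length(contig_lengths):
    """Arithmetic mean of the contig lengths in bases."""
    lengths = list(contig_lengths)
    if not lengths:
        raise ValueError("no contigs supplied")
    return sum(lengths) / len(lengths)


def n50(contig_lengths):
    """Shortest contig length in the set of longest contigs covering >= 50% of bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
```

Because N50 weights long contigs more heavily than the mean does, the two values together give a better picture of an assembly than either one alone.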

Data Assembly

Sequence read information from Roche’s GS FLX Standard Flowgram Format (sff) files and Illumina HiSeq 2000 2x100 PE fastq files were used as input for the CLC Bio Genomic

Workbench ver 6. All developmental read datasets were processed with the CLC Create

Sequencing QC Report tool to assess read quality. The CLC Trim Sequence process was used to trim the 454 read datasets for a Phred value of 15 and the Illumina reads were trimmed of the first 12 bases due to GC ratio variability and for a Phred score of 30. All read datasets were trimmed of ambiguous bases. Illumina reads were then processed through the CLC Merge

Overlapping Pairs tool and all reads were de novo assembled to produce contiguous sequences (contigs). Trimmed reads used for assembly were mapped back to the assembled contigs, mapped reads were used to update the contigs and contigs with no mapped reads were ignored.

Consensus contig sequences were extracted as a multi-fasta file. The individual mature and young read datasets, original non-trimmed reads, were mapped back to the assembled contigs to generate individual developmental sample reads per contig and then normalized with the Reads

Per Kilobase per Million reads (RPKM) method (Mortazavi et al., 2008).
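
The RPKM normalization scales the reads mapped to a contig by the contig length in kilobases and by the total mapped reads in millions. The Python sketch below is illustrative only: the function names and the use of the summed per-contig counts as the total mapped reads are assumptions, and CLC Genomics Workbench performed the calculation in this study.

```python
def rpkm(read_count, contig_length_bp, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads."""
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("contig length and total mapped reads must be positive")
    # reads / (length / 1e3) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)


def rpkm_table(read_counts, contig_lengths_bp):
    """Map each contig id to its RPKM within one developmental sample.

    read_counts and contig_lengths_bp are dicts keyed by contig id.
    """
    total = sum(read_counts.values())
    return {
        contig: rpkm(count, contig_lengths_bp[contig], total)
        for contig, count in read_counts.items()
    }
```

Computing the table separately for the young and the mature read sets gives per-contig values that can be compared between the two samples despite differences in sequencing depth.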

Annotation

Contig sequences were identified by alignment with blastx through Blast2GO (Conesa et al., 2005) as well as local stand-alone blastx alignments against the NCBI nr database (ver.

2.2.27+) (Altschul et al., 1997). Reciprocal blastn alignments were obtained through CLC Main

65

Workbench (ver 6) against a local stand-alone NCBI nt database (ver. 2.2.27+) (Altschul et al.,

1997). Gene ontology (GO) annotation, enzyme code annotation and the EMBL-EBI

InterProScan annotation of predicted protein signatures were all annotated through Blast2GO

(Conesa et al., 2005). The BLAST annotated RNA-Seq datasets from the young and mature leaves of Bienertia sinuspersici were analyzed for GO enrichment with Blast2GO (Conesa et al.,

2005). Due to assembler constraints and lack of genomic reference sequence, unless otherwise specified, expression analysis was restricted to the contig consensus sequence annotation and cannot differentiate between specific alleles, gene family members of highly similar sequence or subunit specificity without subsequent targeted molecular techniques (O’Neil and Emrich, 2013).

References

Ainsworth EA, Rogers A, Vodkin LO, Walter A, Schurr U (2006) The Effects of Elevated CO2

Concentration on Soybean Gene Expression. An Analysis of Growing and Mature

Leaves. Plant Physiol 142: 135–147

Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped

BLAST and PSI-BLAST: a new generation of protein database search programs.

Nucleic Acids Res 25: 3389–3402

Andrés-Colás N, Sancenón V, Rodríguez-Navarro S, Mayo S, Thiele DJ, Ecker JR, Puig S,

Peñarrubia L (2006) The Arabidopsis heavy metal P-type ATPase HMA5 interacts with

metallochaperones and functions in copper detoxification of roots. Plant J 45: 225–236

Aubry S, Kelly S, Kümpers BMC, Smith-Unna RD, Hibberd JM (2014) Deep Evolutionary

Comparison of Gene Expression Identifies Parallel Recruitment of Trans-Factors in Two Independent Origins of C4 Photosynthesis. PLoS Genet 10: e1004365

Brautigam A, Mullick T, Schliesky S, Weber APM (2011) Critical assessment of assembly

strategies for non-model species mRNA-Seq data and application of next-generation

sequencing to the comparison of C3 and C4 species. J Exp Bot 62: 3093–3102

Burkhead JL, Abdel-Ghany SE, Morrill JM, Pilon-Smits EAH, Pilon M (2003) The Arabidopsis

thaliana CUTA gene encodes an evolutionarily conserved copper binding chloroplast

protein. Plant J 34: 856–867

Von Caemmerer S (2013) Steady-state models of photosynthesis. Plant Cell Environ 36: 1617–1630

Chang Y-M, Liu W-Y, Shih AC-C, Shen M-N, Lu C-H, Lu M-YJ, Yang H-W, Wang T-Y, Chen

SC-C, Chen SM, et al (2012) Characterizing Regulatory and Functional Differentiation

between Maize Mesophyll and Bundle Sheath Cells by Transcriptomic Analysis. Plant

Physiol 160: 165–177

Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M (2005) Blast2GO: a universal

tool for annotation, visualization and analysis in functional genomics research.

Bioinformatics 21: 3674–3676

Edwards GE, Franceschi VR, Voznesenskaya EV (2004) Single-cell C4 Photosynthesis Versus

the Dual-cell (Kranz) Paradigm. Annu Rev Plant Biol 55: 173–196

Edwards GE, Voznesenskaya EV (2011) Chapter 4 C4 Photosynthesis: Kranz Forms and

Single-Cell C4 in Terrestrial Plants. C4 Photosynth Relat CO2 Conc Mech 29–61

EMBL-EBI GO:0001071 nucleic acid binding transcription factor activity.

http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0001071

Fox TC, Guerinot ML (1998) Molecular Biology of Cation Transport in Plants. Annu Rev Plant Physiol Plant Mol Biol 49: 669–696

Freitag H, Stichler W (2000) A Remarkable New Leaf Type With Unusual Photosynthetic Tissue

in a Central Asiatic Genus of Chenopodiaceae. Plant Biol 2: 154–160

Freitag H, Stichler W (2002) Bienertia cycloptera Bunge ex Boiss., Chenopodiaceae, another C4

Plant without Kranz Tissues. Plant Biol 4: 121–132

Gowik U, Bräutigam A, Weber KL, Weber APM, Westhoff P (2011) Evolution of C4

Photosynthesis in the Genus Flaveria: How Many and Which Genes Does It Take to

Make C4? Plant Cell Online 23: 2087–2105

Keeley JE, Rundel PW (2003) Evolution of CAM and C4 Carbon-Concentrating

Mechanisms. Int J Plant Sci 164: S55–S77

Kadereit G, Ackerly D, Pirie MD (2012) A broader model for C4 photosynthesis evolution in

plants inferred from the goosefoot family (Chenopodiaceae s.s.). Proc R Soc B Biol Sci.

doi: 10.1098/rspb.2012.0440

Kapralov MV, Akhani H, Voznesenskaya EV, Edwards G, Franceschi V, Roalson EH (2006)

Phylogenetic Relationships in the Salicornioideae / Suaedoideae / Salsoloideae s.l.

(Chenopodiaceae) Clade and a Clarification of the Phylogenetic Position of Bienertia

and Alexandra Using Multiple DNA Sequence Datasets. Syst Bot 31: 571–585

Kumar A, Bennetzen JL (1999) Plant Retrotransposons. Annu Rev Genet 33: 479–532

Langdale JA (2011) C4 Cycles: Past, Present, and Future Research on C4 Photosynthesis. Plant

Cell Online. doi: 10.1105/tpc.111.092098

Michael TP (2014) Plant genome size variation: bloating and purging DNA. Brief Funct

Genomics 13: 308–317

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying

mammalian transcriptomes by RNA-Seq. Nat Methods 5: 621–628

Muhaidat R, Sage RF, Dengler NG (2007) Diversity of Kranz anatomy and biochemistry in C4

eudicots. Am J Bot 94: 362–381

O’Neil ST, Emrich SJ (2013) Assessing De Novo transcriptome assembly metrics for

consistency and utility. BMC Genomics 14: 465

Ozsolak F, Milos PM (2011) RNA sequencing: advances, challenges and opportunities. Nat Rev

Genet 12: 87–98

Parisod C, Salmon A, Zerjal T, Tenaillon M, Grandbastien M-A, Ainouche M (2009) Rapid

structural and epigenetic reorganization near transposable elements in hybrid and

allopolyploid genomes in Spartina. New Phytol 184: 1003–1015

Rochaix J-D (2014) Regulation and Dynamics of the Light-Harvesting System. Annu Rev Plant

Biol 65: 287–309

Sablowski R, Carnier Dornelas M (2014) Interplay between cell growth and cell cycle in plants.

J Exp Bot 65: 2703–2714

Sage RF (2004) The evolution of C4 photosynthesis. New Phytol 161: 341–370

Schütze P, Freitag H, Weising K (2003) An integrated molecular and morphological study of the

subfamily Suaedoideae Ulbr. (Chenopodiaceae). Plant Syst Evol 239: 257–286

Sharpe RM, Offermann S (2013) One decade after the discovery of single-cell C4 species in

terrestrial plants: what did we learn about the minimal requirements of C4

photosynthesis? Photosynth Res. doi: 10.1007/s11120-013-9810-9

Takacs EM, Li J, Du C, Ponnala L, Janick-Buckner D, Yu J, Muehlbauer GJ, Schnable PS,

Timmermans MCP, Sun Q, et al (2012) Ontogeny of the Maize Shoot Apical Meristem.

Plant Cell Online 24: 3219–3234

Taylor SH, Ripley BS, Martin T, De-Wet L-A, Woodward FI, Osborne CP (2014) Physiological

advantages of C4 grasses in the field: a comparative experiment demonstrating the

importance of drought. Glob Change Biol 20: 1992–2003

The Gene Ontology Consortium (2000) Gene Ontology: tool for the unification of biology.

Nat Genet 25: 25–29

Figure 1. de novo Transcriptome Build Workflow. cDNA from mature and young B. sinuspersici total RNA was sequenced on the 454 FLX/Titanium and HiSeq2000 platforms.

Generated read sequences were assembled with CLC Bio’s Genomics Workbench and then annotated against the NCBI nr database (version 2.2.27+). GO, E.C. and InterProScan annotation, visualization and GO statistical calculations were performed through Blast2GO.

Table 1. B. sinuspersici Developmental Stage ESTs & Percentage of ESTs Per Fold Change.

Number of ESTs per developmental tissue type and magnitude of fold level distribution with percentages of the distribution between the developmental tissue types.

Total number of ESTs when 949 S. lycopersicum blastn > 95% identity ESTs removed    72,537
Total annotated ESTs                                                                28,045

                                                Mature      Young
Total ESTs in dataset                           72,089      72,169
Number of ESTs mapped higher per tissue type    22,887      48,980
Annotated ESTs                                  7,245       20,696
ESTs 2 fold difference or above                 8,049       19,291
Annotated ESTs 2 fold or above                  2,765       8,351

Total annotated ESTs: 28,045

Total Mature ESTs Mapped    Mapped Greater in Mature    Mature/Total EST
22,887                      7,245                       31.66%

Total Young ESTs Mapped     Mapped Greater in Young     Young/Total EST
48,980                      20,696                      42.25%

2 Fold Differences & Number of ESTs

                             Number of ESTs    ESTs above level / total ESTs of tissue
Mature RPKM > log10(2)       2,764             12.08%
Mature ESTs > log10(5)       855               3.74%
Mature ESTs > log10(10)      313               1.37%
Young RPKM > log10(2)        8,351             17.05%
Young ESTs > log10(5)        1,398             2.85%
Young ESTs > log10(10)       346               0.71%
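
The percentage columns of Table 1 can be recomputed from the reported counts. The following is a minimal sketch, assuming each percentage is the listed EST count divided by the number of ESTs mapped higher in that tissue (22,887 for mature, 48,980 for young); the variable and function names are illustrative and are not part of the original analysis.

```python
# Counts transcribed from Table 1; all names here are illustrative assumptions.
mapped_higher = {"mature": 22887, "young": 48980}
annotated = {"mature": 7245, "young": 20696}
above_fold = {
    "mature": {2: 2764, 5: 855, 10: 313},
    "young": {2: 8351, 5: 1398, 10: 346},
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Share of the ESTs mapped higher in each tissue that carry an annotation.
annotated_pct = {t: percent(annotated[t], mapped_higher[t]) for t in mapped_higher}

# Share of the ESTs mapped higher in each tissue at or above each fold level.
fold_pct = {
    tissue: {fold: percent(n, mapped_higher[tissue]) for fold, n in counts.items()}
    for tissue, counts in above_fold.items()
}

for tissue in mapped_higher:
    print(tissue, annotated_pct[tissue], fold_pct[tissue])
```

Under this assumption the recomputed values agree with the percentages listed in the table.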

Figure 2. B. sinuspersici Developmental Stage ESTs & Percentage of ESTs Per Fold Change.

A. Fold-wise progression of EST composition indicates that the young ESTs are more diverse and numerous than the mature ESTs at lower RPKM levels until the 5 fold level. B. The percentage of ESTs is 2.3 times higher in the young tissue at the 2 fold level, but at the higher folds the percentages of mature and young tissue RPKM log10 values are within 1.2% and 0.02% of each other at 5 and 10 fold respectively.

Table 2. Gene Ontology Enrichment in young and mature Bienertia sinuspersici.

Gene Ontology Enrichment Annotations (Fisher p-value < 0.01, FDR < 0.05)

                        Young GO    Mature GO    Shared GO    Total Unique
                        Terms       Terms        Terms        GO Terms
Biological Processes    124         81           17           188
Cellular Component      31          19           5            45
Molecular Function      64          35           4            95
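
The "Total Unique GO Terms" column of Table 2 is consistent with counting the union of the young and mature term sets, that is, young plus mature minus shared. The sketch below checks this reading against the reported values; it is an interpretation of the table, not a description of how Blast2GO produced the numbers, and the names are illustrative.

```python
# Counts transcribed from Table 2; dictionary layout and names are assumptions.
go_counts = {
    "Biological Processes": {"young": 124, "mature": 81, "shared": 17, "total_unique": 188},
    "Cellular Component": {"young": 31, "mature": 19, "shared": 5, "total_unique": 45},
    "Molecular Function": {"young": 64, "mature": 35, "shared": 4, "total_unique": 95},
}


def unique_terms(young, mature, shared):
    """Size of the union of two term sets that overlap in `shared` terms."""
    return young + mature - shared


# True for every category whose reported total matches the union count.
consistent = {
    category: unique_terms(c["young"], c["mature"], c["shared"]) == c["total_unique"]
    for category, c in go_counts.items()
}
print(consistent)
```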

Figure 3. Number of over and under-represented GO terms and the relationship between the GO terms and developmental tissue types. Blue circles denote mature tissue and the purple circles denote young tissue GO terms.

Table 3. GO term enrichment and number of ESTs representing enriched GO term.

Molecular Function GO Term                    Mature      Young       Significantly  Significantly  Total in
                                              Under/Over  Under/Over  Higher in      Higher in      GO Term
                                                                      Mature         Young
ATP binding                                   Under       Over        250            1080           2974
microtubule motor activity                    Under       Over        2              64             93
polysaccharide binding                        Over        Under       16             10             59
structural constituent of ribosome            Over        Over        117            221            509

Cellular Component GO Term                    Mature      Young       Significantly  Significantly  Total in
                                              Under/Over  Under/Over  Higher in      Higher in      GO Term
                                                                      Mature         Young
chloroplast envelope                          Over        Under       105            228            828
cytosolic large ribosomal subunit             Over        Over        56             79             173
nucleolus                                     Over        Over        116            237            575
photosystem II                                Over        Under       18             8              46
plasma membrane                               Over        Over        436            1315           3820

Biological Process GO Term                    Mature      Young       Significantly  Significantly  Total in
                                              Under/Over  Under/Over  Higher in      Higher in      GO Term
                                                                      Mature         Young
chromatin silencing by small RNA              Under       Over        3              84             149
DNA integration                               Over        Under       101            93             607
DNA replication initiation                    Under       Over        0              96             123
embryo sac egg cell differentiation           Under       Over        5              85             189
histone H3-K9 methylation                     Under       Over        4              192            258
isopentenyl diphosphate biosynthetic process, Under       Under       18             75             373
  methylerythritol 4-phosphate pathway
methylation-dependent chromatin silencing     Under       Over        3              91             158
microtubule-based movement                    Under       Over        2              70             102
nuclear-transcribed mRNA catabolic process    Under       Under       5              37             175
production of miRNAs involved in gene         Under       Over        4              70             153
  silencing by miRNA
proteolysis                                   Over        Under       92             169            674
reciprocal meiotic recombination              Under       Over        7              85             180
regulation of flower development              Under       Over        9              196            361
regulation of G2/M transition of mitotic      Under       Over        0              62             82
  cell cycle
regulation of meristem growth                 Under       Over        9              121            226
RNA methylation                               Over        Over        43             100            224
synapsis                                      Under       Over        3              58             100

Figure 4. Over-represented cellular component gene ontology distribution comparisons between the developmental tissues. A. EST percentage of Cellular component GO first ancestor distribution in young and mature tissues. B. Number of ESTs comprising the enriched Biological Process GO. C. Percentage of ESTs involved in each of the enriched Biological Processes GO.

Chapter 3

Comparative Transcriptomics of Single Cell C4, Kranz C4 and C3

Photosynthetic Types in Suaedoideae

Abstract

In the Chenopodiaceae, the Suaedoideae subfamily comprises three photosynthetic types, Kranz C4 (KC4), single cell C4 (SCC4) and C3 type species, making it a key taxon for investigating differences in photosynthetic carbon assimilation between closely related species. The genomic resources available to investigate these differences are largely limited to the SCC4 species Bienertia sinuspersici. To address this issue, transcriptomes from mature leaf samples of Suaeda aralocaspica (SCC4), Suaeda eltonica (KC4) and Suaeda maritima (C3) were generated and compared alongside Bienertia sinuspersici. Photosynthetic C4 cycle enzyme sequences were abundantly present in all three C4 species and present at low levels in the C3 species. Bienertia developmental tissue displayed up-regulation of the C4 cycle enzymes in mature tissue compared to young tissue. In plants approximately ninety percent of photosynthetic genes are nuclear encoded and their products require translocation to the chloroplast. In this study, post-translational and chaperone components involved with chloroplast targeted pre-proteins were differentially expressed between species in both copy number and magnitude. Subunit isoforms and copy numbers of the chloroplast import complexes were differentially expressed between the species and in the Bienertia developmental tissues. Transcriptomic comparisons indicate these photosynthetic phenotypes are the result of differential regulation and expression of the affected pathway genes.

Introduction

A large research community has been engaged in the study of plants with traits that confer heat, drought and salt tolerance along with water use efficiency. Many of these studies have indicated photosynthesis to be closely associated with, if not a primary contributor to, these traits. C4 type photosynthesis has been shown to impart these traits to plants that perform it, and there are two major anatomical leaf chlorenchyma cell structures involved in C4 photosynthesis. Kranz anatomy C4 species divide these functions between two cell types, with the initial carbon fixation, C4 acid and phosphoenolpyruvate (PEP) regeneration mechanisms in the mesophyll cells and the final carbon fixation in the bundle sheath cells (Edwards & Walker, 1983). Each of these cells has a discrete nucleus with which to coordinate these cellular and metabolic processes. The single cell C4 (SCC4) species Suaeda aralocaspica, Bienertia cycloptera, Bienertia sinuspersici and Bienertia kavirense are the four known terrestrial species that perform C4 photosynthesis in individual chlorenchyma cells and maintain the cellular and metabolic processes under the coordination of a single nucleus (for review see Edwards & Voznesenskaya, 2011; Langdale, 2011; Sharpe & Offermann, 2014). This leads to the hypothesis that the SCC4 species must coordinate the metabolic and cellular processes in a different manner than the Kranz anatomy types. The Suaedoideae subfamily comprises Kranz C4, SCC4 and C3 type species and thus provides a platform in which to investigate the controlling mechanisms that distinguish the Kranz and SCC4 phenotypes.

The most detailed genetic information to date for Bienertia sinuspersici consists of a single normalized transcriptome, a spatial and developmental soluble proteome (Offermann et al submitted) and a discrete characterization of the core photosynthesis-related components. The experimental design of the Offermann et al (submitted) paper generated a mechanistic view of the cellular and photosynthetic character of Bienertia. The insight from this study highlights interesting questions regarding the regulatory aspects of this unique phenotype. The differential protein dynamics in developmental time and cellular space have raised questions pertaining to the regulatory structure involved in producing the SCC4 phenotype, the dimorphic chloroplasts and the temporally and spatially differential targeting of chloroplast related gene products in these single chlorenchyma cells.

Chloroplasts are the cellular organelles where photosynthesis takes place in the chlorenchyma cell, and approximately 90 percent of the gene products involved are nuclear encoded (Inaba and Schnell, 2008). Once transcribed, mRNAs are translocated from the nucleus to the cytosol where they are translated on the ribosomes. The ribosomes are composed of rRNA and proteins, and the protein composition of the ribosomes is differentially enriched (reviewed in Xue & Barna, 2012), as was found in the maturing chlorenchyma cells, leading to heterogeneity in ribosome composition during development in Bienertia sinuspersici (Sharpe et al in prep). Nascent nuclear encoded peptides destined for the chloroplasts are selectively phosphorylated and bound by members of a family of 14-3-3 proteins and heat shock proteins (HSPs). The precursor proteins are then either selected for degradation or acted upon by the translocon at the outer envelope proteins (TOCs) of the chloroplast, which enable translocation and targeting of nuclear encoded gene products into the different chloroplast compartments (May & Soll, 2000; Lee et al., 2009). 14-3-3 and HSP gene products were shown to be differentially enriched between the young and mature B. sinuspersici developmental tissue types in the proteome as well as in the transcriptome studies (Offermann et al submitted and Sharpe et al in prep).

Nuclear encoded proteins bound for the chloroplast contain an N terminal transit peptide which targets the precursor peptide to the import machinery of the chloroplast. The import machinery consists of the TOC complex, the translocon at the inner envelope (TIC) and various associated proteins which bind, import and fold the nuclear encoded protein at its site of function. Two major TOC complexes have been shown to differentially import nuclear encoded protein subsets through a Class I and a Class II pathway (for review see Inaba & Schnell, 2008; Flores-Pérez & Jarvis, 2013). The TOC159 complex has been implicated in the preferential import of photosynthetic proteins through the Class II import pathway and the nonphotosynthetic, or housekeeping, proteins are thought to be imported via the TOC132 apparatus. There appear to be few shared transit peptide motifs between the nuclear encoded genes (Chotewutmontri et al., 2012) and this appears to hold true for B. sinuspersici (Offermann et al submitted). Functional motifs and physicochemical properties attributed to transit peptides have been hypothesized to be the discerning properties for differential accumulation of these proteins in the chloroplasts, but these motifs and properties do not appear to be implicated in the differential targeting to the dimorphic chloroplasts in the SCC4 anatomy (Offermann et al submitted).

Aspects of this dichotomy can lead to insights into targeted chloroplast products, whether lipid, carbohydrate or energy based, and into photosynthetic-type induced phenotypes such as temperature and drought resistance. Temporal accumulation of the ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit (RbcS) in total cell samples of mature and young leaf tissue, as well as of additional photosynthetic related enzymes, can be attributed to transcriptional regulation of the RbcS gene family (Sharpe et al in prep). The differential accumulation of the more active and abundant rubisco enzyme in the central compartment compared to the peripheral compartment could be attributed to post-transcriptional regulation through mRNA binding proteins, to translational regulatory mechanisms imparted by the heterogeneity of the ribosomes, or to protein turnover (Voznesenskaya et al., 2005; Rosnow et al., 2014b; Offermann et al submitted). Genetic components implicated in gene expression regulatory processes and in the development of the chloroplast were identified as differentially expressed at the detectable protein level in the Offermann et al study, as were the comparatively identified contigs in a follow-up RNA-seq study (Sharpe et al in prep). The information provided by these two studies sets a baseline for a comparative study between the B. sinuspersici, S. aralocaspica, S. eltonica and S. maritima transcriptional databases for elements associated with the differentially targeted and post-transcriptionally regulated photosynthetic gene products.

Results

Post-translational Components

14-3-3 chaperones

Two members of the 14-3-3 chaperone family, ID 3 UN054444 and ID 37 UN048586, were identified as being more abundant in the young tissue compared to mature tissue in the Offermann et al proteomics, as were several in the transcriptomic comparisons. ID 3 UN048586 was identified in the proteomics as being greater than two fold enriched in the young tissue (Offermann et al submitted). Due to the similarity of 14-3-3 members (Denison et al., 2011) and nonspecific annotations, both ID 3 and ID 37 nucleotide sequences were used as the query sequences in a blastn comparison against the Suaedoideae BLAST databases for best hit homology. Best hit homology results were aligned and a phylogenetic tree was built for a concise identification of the transcripts that corresponded to the Offermann et al sequences, and the corresponding Suaedoideae RPKM values were computed (Figure 1). Contigs homologous to UN054444 were present in all four Suaedoideae species and the S. maritima best hit resulted in two full length contigs. RPKM values were highest in the young Bienertia tissue sample, where the contig was 1.75 times more abundant than in the mature tissue. An additional group of contigs was represented by all but the S. eltonica species and was labeled as UN054444-like. In the UN054444-like homologous contigs, the Bienertia developmental tissues as well as the represented Suaedoideae samples followed the same expression pattern, with the exception of the S. maritima species, which exhibited higher values in relation to the pattern expressed in the UN054444 group. The UN048586 and UN048586-like homologous contigs were only represented by the SCC4 species and the UN048586 homologous developmental comparison was just slightly less than 2 fold (1.98).
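
For readers unfamiliar with the expression unit used throughout this chapter, the sketch below computes RPKM (reads per kilobase of transcript per million mapped reads) using the standard definition of Mortazavi et al. (2008), together with a simple fold ratio between two tissues. The read counts, contig length and library sizes in the usage lines are hypothetical and are not taken from the datasets described here; the function names are illustrative, and the exact settings used by the assembly software in this study are not restated.

```python
def rpkm(read_count, contig_length_bp, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads.

    Standard definition: reads * 1e9 / (contig length in bp * total mapped reads).
    """
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)


def fold_change(rpkm_a, rpkm_b):
    """Ratio of expression in sample A relative to sample B."""
    return rpkm_a / rpkm_b


# Hypothetical example: one 1,000 bp contig in two libraries of 10 million mapped reads.
young = rpkm(read_count=700, contig_length_bp=1000, total_mapped_reads=10_000_000)
mature = rpkm(read_count=400, contig_length_bp=1000, total_mapped_reads=10_000_000)
print(young, mature, fold_change(young, mature))
```

With these hypothetical counts the contig would be 1.75 times more abundant in the young sample, the same kind of ratio reported above for the UN054444 homolog.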

HSP 70

Offermann et al proteomics data indicated three HSP70 variants, with two meeting reporting criteria for the developmental tissues. Sequences corresponding to ID 220 UN053889 and ID 220 UN070112 were blastn aligned with the four Suaedoideae BLAST databases. RPKM values were calculated for the reported contigs and the contigs were aligned for concise identification utilizing phylogenetic means. Phylogenetic analysis indicated that best hit contigs with e-values of 0.0 fell into two groups, with one group clustering with ID 220 UN053889, ID 220 UN070112 and the Arabidopsis thaliana HSP 70-4 gene AJ002551.1, and the other group clustering together and identified by best blastn hits against HSP70-1 cognate genes.

HSP70-1 was expressed in all four Suaedoideae species as well as in the Bienertia developmental tissues, with the Suaeda species exhibiting two contigs each and the Bienertia species expressing one major contig. S. eltonica contig 244 exhibited a much higher RPKM value, by over three fold, relative to the rest of the contigs except the young Bienertia tissue (Figure 2). The second S. eltonica contig, 1056, was expressed over two fold higher than all but the Bienertia contigs. Each of the species exhibiting at least two homologs of the HSP70 gene expressed one of the homologs over two fold higher than the other, with the exception of the S. maritima pair 3988/3989, whose values were 87% of each other, and the two S. aralocaspica contigs, which were at 1.86 fold of each other. Expression in the Bienertia young tissue was 3.67 times greater than in the mature tissue, which agrees with the expression pattern in the Offermann proteomics, albeit at a much higher differential than the slightly under two fold expression reported from the proteomics work.

HSP70-2 exhibited a much different profile than HSP70-1 in that Bienertia expressed two homologous contigs and the Suaeda species, with the exception of S. maritima, expressed a single contig. The RPKM expression profiles were also inverted compared to HSP70-1, with the exception of S. aralocaspica, which was in the midrange relative to the other three species. While Bs contig 23320 reported RPKM numbers barely above cutoff numbers, Bs contig 4265 was higher in the mature tissue, with a 2.84 fold increase compared to the young tissue. The C3-type S. maritima Sm contig 5241 was relatively higher than the two Suaeda species, and Sm contig 5242, with an RPKM value of 66.28, was midpoint between S. aralocaspica, with an RPKM value of 28.08, and S. eltonica, with an RPKM value of 95.02.
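
The comparisons in this subsection rest on two simple calculations: whether a ratio clears the two fold level, and where one RPKM value falls between two others. The sketch below applies both to the values quoted in the text. Treating the two fold level as a symmetric cutoff (a ratio of at least 2 or at most 0.5) is an assumption made for illustration, and the names are hypothetical.

```python
# RPKM values and fold ratios quoted in the HSP70 paragraphs; names are illustrative.
HSP70_2_RPKM = {
    "S. aralocaspica": 28.08,
    "S. maritima contig 5242": 66.28,
    "S. eltonica": 95.02,
}
REPORTED_FOLD = {
    "HSP70-1 Bienertia young vs mature": 3.67,
    "HSP70-1 S. aralocaspica contig pair": 1.86,
    "HSP70-2 Bs contig 4265 mature vs young": 2.84,
}


def meets_two_fold(ratio, cutoff=2.0):
    """True when a ratio is at least `cutoff` in either direction (assumed symmetric)."""
    return ratio >= cutoff or ratio <= 1.0 / cutoff


def relative_position(value, low, high):
    """Position of `value` within [low, high]; 0.5 is the exact midpoint."""
    return (value - low) / (high - low)


low = HSP70_2_RPKM["S. aralocaspica"]
high = HSP70_2_RPKM["S. eltonica"]
midpoint = (low + high) / 2
position = relative_position(HSP70_2_RPKM["S. maritima contig 5242"], low, high)

two_fold_calls = {name: meets_two_fold(r) for name, r in REPORTED_FOLD.items()}
print(midpoint, round(position, 2), two_fold_calls)
```

On these numbers the arithmetic midpoint is 61.55, so Sm contig 5242 at 66.28 sits slightly above the exact midpoint of the two Suaeda values.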

Chloroplast Import

Photosynthetic Import TOC components

TOC75, TOC159 and TOC33 constitute the major components involved in the import of photosynthetic pre-proteins through the chloroplast TOC/TIC import complex (Jarvis & López-Juez, 2013). TOC159 and TOC33 identify and bind pre-protein precursor elements and initiate entry to TOC75 and the intermembrane space. TOC33 and TOC34 are similar enough in sequence that, due to assembly constraints and ambiguity in annotation, all isoforms in the datasets carried the TOC34 annotation. Further analysis of sequence identity between these two subunits in the Suaedoideae is needed; they are hereafter referred to as TOC33/34, as they are involved in both of the major import complexes. TOC33/34 expression in the developmental Bienertia samples indicated a greater than two fold expression in the young tissue compared to the mature sample, and expression in the C4 species was higher than in the C3 species with the exception of one mature Bienertia contig (Figure 3). S. maritima exhibited two separate TOC33/34 isoforms compared to the C4 species, as well as two TOC75 contigs in relation to the C4 species TOC75 contigs. TOC75 had higher RPKM expression values in the C4 species than in the C3 species and was again more highly expressed in the young tissue than in the mature Bienertia tissue. All species had two TOC159 isoforms present, which are labeled TOC159a and TOC159b (Figure 3 A and B). The most striking difference between the two isoforms was that the C3 S. maritima had almost two fold higher expression values for TOC159b than for TOC159a, whereas the C4 species had greater than two fold higher expression values for TOC159a than for TOC159b. Contig evaluation in S. maritima revealed two contigs for each of the TOC159a and TOC159b isoforms, compared to a single contig for each isoform in the C4 species.

Housekeeping Import TOC components

The TOC import complex associated with the import of the non-photosynthetic plastid targeted nuclear encoded pre-proteins comprises TOC132, TOC134 and TOC90 (Jarvis & López-Juez, 2013). TOC33/34 and TOC132 identify, bind and initiate movement into TOC90, which completes the translocation of pre-proteins to the TIC complex and intermembrane space. TOC132 was not differentially expressed in the Bienertia developmental tissues, and all C4 species exhibited greater than two fold values compared to the C3 species. In contrast to the number of isoforms reported for the previous TOC components, the Kranz C4 species S. eltonica exhibited two evenly expressed isoforms of TOC132. S. maritima had two isoforms of TOC132 as well. TOC90 was the only TOC component whose two isoforms exhibited an expression profile similar to that of the single isoforms in the C4 species, with the exception of the differential TOC159a/159b profile.

C4 pathway components

The C4 pathway enzymes and proteins detected in the Offermann proteomics study were also detected in the Sharpe et al transcriptomics study. While all proteins implicated in the C4 cycle were present in the transcriptomics, not all were developmentally differentially expressed in either the proteomics or the transcriptomics data. The entry point for atmospheric carbon acquisition in plants is one of the metabolic differences between C3 and C4 photosynthetic types; in the C4 type the initial enzyme responsible is carbonic anhydrase (CA). There are several members in this gene family with distinct metabolic roles which are preferentially expressed between species and tissue types. In C4 plants, beta-CA plays the major role in the initial acquisition of carbon from the atmosphere. There are two distinct isoforms of betaCA2 and both isoforms were detected and identified in the Sharpe et al. (in preparation) and Offermann et al. (submitted) studies. In the Suaedoideae species both isoforms were present in the transcriptomic profile (Figure 4), and while both isoforms were identified as being present in the proteomics profile, neither isoform passed the spectral count criteria in the proteomics study to be included in the developmental series comparison.

After the conversion of CO2 to bicarbonate, phosphoenolpyruvate carboxylase (PEPC) transfers a carboxyl group from bicarbonate to phosphoenolpyruvate (PEP) to form the four carbon oxaloacetic acid (OAA). In the evolution of C4 there has been positive selection on certain amino acid residues to develop a C4 isoform with kinetic properties suited to function in C4 photosynthesis (ppc-1; see Rosnow et al. 2014 for the naming convention of the C4 isoform). In most C4 type PEPC there has been positive selection with substitution of a serine for an alanine at position 780 (e.g. in maize). However, in subfamily Suaedoideae there is variation in the C4 isoform (two forms of ppc-1 in Bienertia, one with an Ala and the other with a Ser residue at position 780; in S. eltonica, one isoform with a Ser residue; and in S. aralocaspica, one C4 form with an Ala residue at position 780). In this study, isoform sequences corresponding to those in the Rosnow et al 2014 study were identified, as well as an additional Ala residue isoform in S. eltonica. The Ser residue isoforms in B. sinuspersici and S. eltonica and the Ala residue isoform in S. aralocaspica were highly expressed, whereas expression of the PEPC transcript was over 10 fold lower in the C3 species S. maritima (Figure 5).

In the NAD-ME biochemical C4 sub-type, OAA is transaminated to aspartate by cytosolic aspartate transaminase (cASP-AT) and translocated to the mitochondria, where a mitochondrial aspartate transaminase (mASP-AT) transaminates aspartate back to OAA. Both variants of ASP-AT were detected across the Suaedoideae species as well as in the proteomics study (Figure 6). In the mitochondria, NAD malate dehydrogenase (NAD-MDH) converts the OAA to malate, which is then decarboxylated by the NAD malic enzyme (NAD-ME). Both NAD-MDH and NAD-ME were present in all datasets of both studies, including isoforms of NAD-ME (Figure 6). The products of the NAD-ME decarboxylation are CO2, which diffuses into the neighboring chloroplast for fixation in the Calvin-Benson cycle by ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), and pyruvate, which is exported to the cytosol. In the cytosol, pyruvate is transaminated to alanine by an alanine transaminase (ALA-AT) using glutamate as the amino donor, which replenishes the 2-oxoglutarate glutamate shuttle used in the OAA to aspartate transamination. The current model for the SCC4 C4 cycle in Bienertia has alanine transaminated back to pyruvate for import into the peripheral compartment chloroplasts, where pyruvate orthophosphate dikinase (PPDK) converts the pyruvate to PEP for incorporation by PEPC into OAA, thus completing the C4 cycle.
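
The sequence of reactions described above can be written down as an ordered list and checked for closure, which makes it easy to see that each product feeds the next enzyme and that PPDK returns the carbon skeleton to PEP. This is a minimal sketch of the carbon-skeleton flow only (CO2 release and the amino group shuttle are noted in comments but not modelled); the data layout and names are illustrative, and labelling the return transamination as ALA-AT acting in reverse is an assumption, since the text does not name the enzyme for that step.

```python
from collections import namedtuple

# One reaction step: the enzyme and the carbon skeleton it consumes and produces.
Step = namedtuple("Step", ["enzyme", "substrate", "product"])

# NAD-ME type C4 cycle as described in the text, starting from PEP carboxylation.
NAD_ME_CYCLE = [
    Step("PEPC", "PEP", "OAA"),                # adds carbon from bicarbonate
    Step("cASP-AT", "OAA", "aspartate"),       # cytosol
    Step("mASP-AT", "aspartate", "OAA"),       # mitochondria
    Step("NAD-MDH", "OAA", "malate"),          # mitochondria
    Step("NAD-ME", "malate", "pyruvate"),      # releases CO2 for Rubisco
    Step("ALA-AT", "pyruvate", "alanine"),     # cytosol
    Step("ALA-AT (reverse, assumed)", "alanine", "pyruvate"),
    Step("PPDK", "pyruvate", "PEP"),           # peripheral compartment chloroplasts
]


def is_closed_cycle(steps):
    """True when every product is the next step's substrate, wrapping to the start."""
    following = steps[1:] + steps[:1]
    return all(a.product == b.substrate for a, b in zip(steps, following))


print(is_closed_cycle(NAD_ME_CYCLE))
```

Dropping the final PPDK step leaves the list open, since pyruvate is not the substrate of PEPC.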

Alternate C4 Biochemical Enzymes

Biochemical C4 photosynthetic types are named for the decarboxylation enzymes involved in releasing CO2 from the C4 acid. The PEPCK type uses phosphoenolpyruvate carboxykinase to release CO2 from OAA, and the NADP-ME type requires NADP as a co-substrate whereas the NAD-ME type requires NAD to release CO2 from malic acid. All species of the Bienertia genus are NAD-ME types, as are S. aralocaspica and S. eltonica. While the genes for PEPCK and NADP-ME were present in the transcriptome, their RPKM expression values were very low compared to the NAD-ME transcripts (Figure 7).

Discussion

Development in Bienertia

All C4 dicarboxylic acid cycle proteins were present in the transcriptomics and most were differentially expressed between young and mature tissue of Bienertia. The C4 cycle enzymatic transcripts were up-regulated in the mature Bienertia tissue with the exception of NAD-MDH and NAD-ME. In a previous study, transcripts of a few C4 cycle enzymes analyzed by quantitative real-time polymerase chain reaction were also found to increase (greater than two fold) from young to mature during leaf development, with the exception of CA isoforms and NAD-MDH (Lara et al., 2008). Expression ratio variations between that and the current study might be due to differences in the count detection strategies of the two technologies. Nevertheless, all information available to date shows that most C4 cycle enzyme genes are up-regulated in the mature tissues compared to the young tissues. Results from the Offermann et al (submitted) proteomics study and developmental protein data for selected C4 cycle enzymes (Voznesenskaya et al., 2005; Lara et al., 2008) showed either a lack of detection of protein in the young tissue or an increase in C4 cycle protein quantities from young to mature. However, the increase in the mature to young transcriptomic ratios for the majority of the C4 cycle enzymes indicates that post-transcriptional regulation is occurring.

In considering transcripts which were enriched in young tissue, two proteins that exhibited a greater abundance, both in the Offermann et al. proteomics and in the transcriptomic data of the current study, were members of the 14-3-3 family and the HSP70 family. The 14-3-3 protein family has been implicated along with serine/threonine kinases in the phosphorylation of precursors of pre-proteins destined for the chloroplasts (May and Soll, 2000; Boer et al., 2012). Instances of this phosphorylation lead to the binding of HSP70 and the subsequent degradation or increased import into the chloroplasts (Martin et al., 2006). HSP70 has been implicated in the binding of RBCS (Ivey et al., 2000), and with this association the hypothesis could be formulated that, due to the increase in HSP70 expression in the young tissue seen in Bienertia and S. eltonica, the degradation of RBCS transcript is more likely than when HSP70 expression decreases in the mature tissue.

Transcripts of chloroplast import complexes were present at the initiation of photosynthetic development in Bienertia and they exhibited a young to mature tissue decrease in transcript abundance. Subunits of the photosynthetic pre-protein import TOC complex, with the exception of TOC159a, were more highly expressed in the young tissue. Components of both the TOC159 and TOC132 complexes were not identified in sufficient quantities for comparative purposes in the Offermann et al study, which would be expected given the extent of the portion of these components occurring in the soluble fractions measured. In the Lung & Chuong (2012) study, a decrease in TOC159 and TOC132 proteins (analyzed by antibodies) was observed from young to mature leaves, which was mirrored in the present transcriptomics study.

Suaedoideae Species Comparisons

Comparative analysis of expression values and presence of C4 cycle enzymes between the different photosynthetic C4 anatomical types and the C3 species showed that expression in the C3 and C4 photosynthetic types mirrored their metabolic functions. In the SCC4 species B. sinuspersici and S. aralocaspica there were high transcript levels of the βCA2 isoform of carbonic anhydrase, suggestive of a role for this form in the C4 cycle; transcripts of this form were undetectable in the C3 S. maritima. B. sinuspersici also had high transcript levels of the βCA isoform. Interestingly, the βCA2 isoform was barely detectable in C4 S. eltonica, while it had high levels of βCA isoform transcripts which were near the expression value for S. maritima. The βCA isoform was barely detectable in S. aralocaspica. This suggests some diversity in the form of CA recruited for function in C4 photosynthesis among these photosynthetic types. Both the carbonic anhydrase and PEPC genes belong to families of genes and these results may be due to the close sequence homology between family members. In general, the results show the two types of single-cell C4 species and the Kranz type C4 species have similar expression patterns for enzymes to support an NAD-ME type C4 cycle. Considering whether other C4 decarboxylases may be involved in C4 photosynthesis in these species, transcripts of PEPCK and NADP-ME were both detected, but at levels more than an order of magnitude lower than the NAD-ME values. This suggests little or no function of these two decarboxylases in these species. Further analysis of the levels of these proteins and their compartmentation would be required to determine whether they may contribute to C4 photosynthesis. Preliminary data revealed little or no detection of NADP-ME and PEPCK in these C4 species by western blot analysis (Koteyeva et al. unpublished).

The extent to which there is convergence in the mechanisms controlling differentiation to form two types of chloroplasts in these SCC4 versus Kranz type NAD-ME type C4 species in the Suaedoideae subfamily is unknown. Differentiation to develop two chloroplast types in single cell C4 species from nuclear encoded genes must be controlled at the post-transcriptional level, while in Kranz type C4 species this may occur by both transcriptional and post-transcriptional processes (Hibberd and Covshoff, 2010).

In the Kranz type S. eltonica, and the two single cell C4 types, post-transcriptional regulation could be envisioned by the selectivity in expression of some proteins in the 14-3-3, HSP and TOC gene families. Some of these proteins may differentially process pre-proteins bound for the two compartments in single cell C4 while being differentially present in the mesophyll and bundle sheath cells of the Kranz anatomy. The differential presence of 14-3-3 and HSP70 isoforms between the SCC4, Kranz C4 and C3 species could indicate that transcriptional control in these gene families is involved in the post-translational regulation of genes involved in the development of these photosynthetic phenotypes.

With the addition of transcriptomic information from representative species in subfamily Suaedoideae, a more diverse pool of gene isoforms is available; this information may lead to a clearer picture of the genetic mechanisms responsible for development of the SCC4 phenotypes. Because of inherent technical difficulties in assembly arising from sequence homology among expressed alleles and gene family members, additional curation at both the bioinformatics and experimental levels will be required in the future in order to identify candidate gene isoforms for regulation of the identified pathways.

Because gene naming protocols have been, and remain, lacking, care must be taken when in silico comparisons by annotation are made between gene homologs and orthologs as well as between species; many of these comparisons cannot be accomplished with automated techniques alone. Correlating dissimilar data sets is challenging. Classic examples of single copy genes, such as pyruvate, orthophosphate dikinase (PPDK) with its differential transcriptional and post-translational regulation, were supported between the transcriptomic and proteomic datasets as well as in the transcriptional data for all four species.

Conversely, difficulties in precise correlation between gene product isoforms were present between the transcriptomic and proteomic data as well as between the species comparison studies, as the annotation nomenclature between closely related gene family members illustrates (Supplementary Data 3). Information provided from the dataset generated in the Sharpe et al. analysis (in preparation) further illuminates the complexity and dynamics of the RNA landscape during leaf development. While comparisons between the Offermann et al. proteomics and the Sharpe et al. transcriptomics studies synergistically support aspects of the physiology, there exist inconsistencies in the data reported between the studies. Whether these inconsistencies arise from biological regulatory mechanisms or from technical difficulties in sequencing and assembly requires further investigation at a more detailed level.


Material and Methods

Plant material

S. aralocaspica and S. eltonica plants were maintained in 2 gallon pots in growth chambers under a 14 hour light/10 hour dark photoperiod with a stepwise increasing light regime to 525 PPFM at full light and an 18°C (dark) to 35°C (light) temperature regime. S. maritima plants were maintained in 2 gallon pots in growth chambers under a 14 hour light/10 hour dark photoperiod with a stepwise increasing light regime to 525 PPFM at full light and an 18°C (dark) to 26°C (light) temperature regime. Plants were watered twice a week and fertilized with Peters 20-21-5 once a week. Whole, fully expanded mature leaves from three separate plants of each species were harvested within two hours after light initiation, combined as one sample and immediately flash frozen in liquid nitrogen. Flash frozen leaf tissue was ground into a fine powder with a liquid nitrogen cooled mortar and pestle, and approximately 100 mg of frozen powder was transferred to a liquid nitrogen frozen 2 mL Eppendorf tube and either kept in liquid nitrogen or stored at -80°C until RNA was extracted.

Genomic size estimates

Flow cytometry was conducted on mature leaf samples of B. sinuspersici, S. aralocaspica, S. eltonica and S. maritima by the Benaroya Research Institute at Virginia Mason.

RNA extraction

Total RNA was extracted using an acid guanidinium thiocyanate phenol chloroform extraction method similar to that described by Chomczynski and Sacchi (1987). One mL of 0.8 M guanidinium thiocyanate, 0.4 M ammonium thiocyanate, 0.1 M sodium acetate pH 5.0, 5% w/v glycerol, and 38% v/v water saturated phenol was added to approximately 100 mg powdered tissue, shaken to evenly mix the sample and incubated at room temperature for 5 minutes. Chloroform (200 µL) was added, and the sample was shaken vigorously until it became uniformly cloudy and then incubated at room temperature for 3 minutes. Samples were then centrifuged at 17,000 x g at 4°C for 15 minutes and the aqueous phase was removed to a clean 1.5 mL Eppendorf tube. 2-propanol (600 µL) was added, and the tube was rocked 5 to 6 times and incubated at room temperature for 10 minutes. Samples were centrifuged at 17,000 x g at 4°C for 10 minutes and the supernatant poured off. One mL of 75% DEPC ethanol was added to the pellet, which was vortexed for 10 seconds and centrifuged at 9,500 x g at 4°C for 5 minutes. Pellets were then suspended in RNase free water and incubated at 37°C with RNase free DNaseI for 30 minutes, and the DNaseI was inactivated at 65°C for 10 minutes. Buffer RLC (450 µL) from the Qiagen (Valencia, CA) RNeasy Plant Mini Kit was added to the digestion, which was processed in accordance with the manufacturer’s recommendations and eluted in 50 µL RNase free water. Extracted RNA was quality checked either with the Bio-Rad (Hercules, CA) Experion system using the Experion RNA High Sens Analysis kit or the Agilent (Santa Clara, CA) 2100 Bioanalyzer system using the RNA Nano Chip and Plant RNA Nano Assay Class.

Illumina Sequencing

The Illumina HiSeq 2000 sequencing platform was used to sequence 2x100 PE reads from the cDNA libraries generated from the above RNA extractions at Michigan State University’s Research Technology Support Facility. cDNA and final sequencing library molecules were generated with Illumina’s TruSeq RNA Sample Preparation v2 kit and instructions with minor modifications. Modifications to the published protocol included a decrease in the mRNA fragmentation incubation time from 8 minutes to 30 seconds to obtain the proper final library molecule size range. Additionally, Aline Biosciences’ (Woburn, MA) DNA SizeSelector-I bead-based size selection system was utilized to target final library molecules for a mean size of 450 base pairs. All libraries were then quantified on a Life Technologies (Carlsbad, CA) Qubit Fluorometer and qualified on an Agilent (Santa Clara, CA) 2100 Bioanalyzer (Dr. Jeff Landgraf personal communication).

454 Sequencing

cDNA libraries were constructed from the RNA extractions using the SMARTer™ PCR cDNA Synthesis Kit from ClonTech (Mountain View, CA) according to the manufacturer’s instructions. cDNA quality and size distribution were verified via 1% TAE gels and the Bio-Rad (Hercules, CA) Experion system. cDNA libraries were then processed to attach the Rapid Library Multiplex Identification (RL MID) Adapters according to the manufacturer’s protocol. Libraries were then quality checked for size distribution with Agilent's (Santa Clara, CA) 2100 Bioanalyzer and quantified via fluorometry. Libraries were then pooled and sequenced on Roche Applied Science’s (Indianapolis, IN) Genome Sequencer FLX System with GS FLX Titanium technology.

Bioinformatics

Data Assembly

Sequence read information from Roche’s GS FLX Standard Flowgram Format (sff) files and Illumina HiSeq 2000 2x100 PE fastq files was used as input for the CLC Bio Genomic Workbench ver 6. All developmental read datasets were processed with the CLC Create Sequencing QC Report tool to assess read quality. The CLC Trim Sequence process was used to trim the 454 read datasets to a Phred value of 15; the Illumina reads were trimmed of the first 12 bases, due to GC ratio variability, and to a Phred score of 30. All read datasets were trimmed of ambiguous bases. Illumina reads were then processed through the CLC Merge Overlapping Pairs tool and all reads were de novo assembled to produce contiguous sequences (contigs). Trimmed reads used for assembly were mapped back to the assembled contigs, mapped reads were used to update the contigs and contigs with no mapped reads were ignored. Consensus contig sequences were extracted as a multi-fasta file. The individual mature and young read datasets (the original non-trimmed reads) were mapped back to the assembled contigs to generate reads per contig for each developmental sample, and these counts were then normalized with the Reads Per Kilobase per Million reads (RPKM) method as in Mortazavi et al. (2008).
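The RPKM normalization referred to above can be illustrated with a minimal sketch. The function name, the contig names and the read counts below are hypothetical and are not taken from the datasets in this study; the calculation follows the general definition of RPKM (reads on a contig, scaled by contig length in kilobases and by the number of mapped reads in millions) and is not intended to reproduce the CLC implementation.

```python
def rpkm(reads_on_contig: int, contig_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of contig per Million mapped reads."""
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("contig length and total mapped reads must be positive")
    # reads / (length / 1e3) / (total / 1e6) simplifies to reads * 1e9 / (length * total)
    return reads_on_contig * 1e9 / (contig_length_bp * total_mapped_reads)


# Hypothetical example: (reads mapped, contig length in bp) for two contigs
counts = {"contig_1": (500, 2000), "contig_2": (120, 800)}
total_mapped = 10_000_000  # hypothetical number of mapped reads in the sample

values = {name: rpkm(n, length, total_mapped) for name, (n, length) in counts.items()}
```

Because both the contig length and the library size appear in the denominator, values computed this way can be compared between contigs of different length and between samples of different sequencing depth.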

Annotation

Contig sequences were identified by alignment with blastx through Blast2GO (Conesa et al., 2005) as well as local stand-alone blastx alignments against the NCBI nr database (ver. 2.2.27+) (Altschul et al., 1997). Reciprocal blastn alignments were obtained through CLC Main Workbench (ver 6) against a local stand-alone NCBI nt database (ver. 2.2.27+) (Altschul et al., 1997). Gene ontology (GO) annotation, enzyme code annotation and the EMBL-EBI InterProScan annotation of predicted protein signatures were all annotated through Blast2GO (Conesa et al., 2005). The BLAST annotated RNA-Seq datasets from the Suaeda genus were analyzed for GO enrichment with Blast2GO (Conesa et al., 2005). Due to assembler constraints and lack of genomic reference sequence, unless otherwise specified, expression analysis was restricted to the contig consensus sequence annotation and cannot differentiate between specific alleles, gene family members of highly similar sequence or subunit specificity without subsequent targeted molecular techniques (O’Neil & Emrich, 2013).


Proteomics and Transcriptomics Comparison

Bienertia sinuspersici proteomics data from Offermann et al (submitted) was mined for proteins identified as having a twofold or greater difference between the corresponding youngest developmental tissue (YY) and mature leaf tissue (M). Gene sequences obtained from the corresponding UN numbers used in the proteomics for assignation of protein ID numbers were blastn aligned to a Blast database constructed from the combined contig fasta sequences from the four Suaedoideae subfamily datasets. The top blastn hit identities from each of the four species corresponding to query sequences were calculated for RPKM values and compared.
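The twofold selection criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the abundance values are hypothetical, and the handling of a zero abundance (treated as a difference whenever the other value is positive) is an assumption rather than the rule applied to the proteomics data.

```python
def twofold_or_greater(young: float, mature: float) -> bool:
    """Return True when abundance differs twofold or more in either direction."""
    if young < 0 or mature < 0:
        raise ValueError("abundances must be non-negative")
    low, high = sorted((young, mature))
    if low == 0:
        # Assumption: presence versus absence counts as a difference
        return high > 0
    return high / low >= 2.0


# Hypothetical abundances: (youngest tissue YY, mature tissue M)
abundances = {"protein_A": (10.0, 4.0), "protein_B": (10.0, 6.0), "protein_C": (3.0, 6.0)}
selected = [name for name, (yy, m) in abundances.items() if twofold_or_greater(yy, m)]
```

Taking the ratio of the larger to the smaller value treats enrichment in young tissue and enrichment in mature tissue symmetrically.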


References

Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389–3402.

Boer AH, Kleeff PJM, Gao J. 2012. Plant 14-3-3 proteins as spiders in a web of phosphorylation. Protoplasma 250: 425–440.

Chotewutmontri P, Reddick LE, McWilliams DR, Campbell IM, Bruce BD. 2012. Differential Transit Peptide Recognition during Preprotein Binding and Translocation into Flowering Plant Plastids. The Plant Cell.

Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674–3676.

Denison FC, Paul A-L, Zupanska AK, Ferl RJ. 2011. 14-3-3 proteins in plant physiology. Seminars in Cell & Developmental Biology 22: 720–727.

Edwards GE, Voznesenskaya EV. 2011. Chapter 4 C4 Photosynthesis: Kranz Forms and Single-Cell C4 in Terrestrial Plants. C4 Photosynthesis and Related CO2 Concentrating Mechanisms: 29–61.

Edwards GE, Walker DA. 1983. C3, C4: mechanisms, and cellular and environmental regulation, of photosynthesis. Oxford, UK: Packard Publishing Limited.

Flores-Pérez Ú, Jarvis P. 2013. Molecular chaperone involvement in chloroplast protein import. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1833: 332–340.

Inaba T, Schnell DJ. 2008. Protein trafficking to plastids: one theme, many variations. Biochem J 413: 15–28.

Ivey RA, Subramanian C, Bruce BD. 2000. Identification of a Hsp70 Recognition Domain within the Rubisco Small Subunit Transit Peptide. Plant Physiology 122: 1289–1300.

Jarvis P, López-Juez E. 2013. Biogenesis and homeostasis of chloroplasts and other plastids. Nature Reviews Molecular Cell Biology 14: 787–802.

Langdale JA. 2011. C4 Cycles: Past, Present, and Future Research on C4 Photosynthesis. The Plant Cell Online.

Lara MV, Offermann S, Smith M, Okita TW, Andreo CS, Edwards GE. 2008. Leaf Development in the Single-Cell C4 System in Bienertia sinuspersici: Expression of Genes and Peptide Levels for C4 Metabolism in Relation to Chlorenchyma Structure under Different Light Conditions. Plant Physiol 148: 593–610.

Lee DW, Lee S, Oh YJ, Hwang I. 2009. Multiple Sequence Motifs in the Rubisco Small Subunit Transit Peptide Independently Contribute to Toc159-Dependent Import of Proteins into Chloroplasts. Plant Physiology 151: 129–141.

Ludwig M. 2012. Carbonic anhydrase and the molecular evolution of C4 photosynthesis: Carbonic anhydrase and C4 photosynthesis. Plant Cell Environ 35: 22–37.

Martin T, Sharma R, Sippel C, Waegemann K, Soll J, Vothknecht UC. 2006. A Protein Kinase Family in Arabidopsis Phosphorylates Chloroplast Precursor Proteins. Journal of Biological Chemistry 281: 40216–40223.

May T, Soll J. 2000. 14-3-3 Proteins Form a Guidance Complex with Chloroplast Precursor Proteins in Plants. The Plant Cell Online 12: 53–63.

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5: 621–628.

O’Neil ST, Emrich SJ. 2013. Assessing De Novo transcriptome assembly metrics for consistency and utility. BMC Genomics 14: 465.

Rosnow JJ, Edwards GE, Roalson EH. 2014. Positive selection of Kranz and non-Kranz C4 phosphoenolpyruvate carboxylase amino acids in Suaedoideae (Chenopodiaceae). Journal of Experimental Botany.

Rosnow J, Yerramsetty P, Berry JO, Okita TW, Edwards GE. 2014b. Exploring mechanisms linked to differentiation and function of dimorphic chloroplasts in the single cell C4 species Bienertia sinuspersici. BMC Plant Biol 14: 34.

Sharpe RM, Offermann S. 2014. One decade after the discovery of single-cell C4 species in terrestrial plants: what did we learn about the minimal requirements of C4 photosynthesis? Photosynthesis Research 119: 169–180.

Voznesenskaya EV, Koteyeva NK, Chuong SDX, Akhani H, Edwards GE, Franceschi VR. 2005. Differentiation of cellular and biochemical features of the single-cell C4 syndrome during leaf development in Bienertia cycloptera (Chenopodiaceae). Am J Bot 92: 1784–1795.

Xue S, Barna M. 2012. Specialized ribosomes: a new frontier in gene regulation and organismal biology. Nature Reviews Molecular Cell Biology 13: 355–369.

Figure 1. 14-3-3 Identification and Expression Values. A. Comparative RPKM expression values across the four Suaedoideae species. B. Phylogenetic analysis of blastn best hit Suaedoideae contigs with proteomics identified 14-3-3 differentially expressed proteins. Hatched RPKM value bars denote an identified additional gene copy or allele. Phylogenetic tree color codes do not represent a species; rather, the different colors are representative of different sub-trees and are provided for ease of identification.

Figure 2. Heat Shock Protein Identity and RPKM Values. A. Heat shock protein 70 RPKM values corresponding to homologous Arabidopsis HSP70. B. Best blastn hits were to HSP70-1 cognate genes. C. Phylogenetic breakout of best blastn hits against proteomic identified UN070112 and UN053889. Red branches indicate blastn reference genes and genes used for outliers to root the tree. Hatched RPKM value bars denote an identified additional gene copy or allele. Phylogenetic tree color codes do not represent a species; rather, the different colors are representative of different sub-trees and are provided for ease of identification.


Figure 3. Photosynthetic Pre-protein Import TOC Component Expression. A. TOC159a isoform denotes a single copy isoform for C4 species and two isoforms for C3 S. maritima. B. TOC159b isoform where two C3 isoforms have duplicate copy number or allelic expression more than twofold greater than the C4 isoforms. C. TOC75 homologs indicate more than twofold greater expression in the Bs young tissue in comparison to the mature tissues of the Suaedoideae species. D. TOC33/34 isoforms indicate the SCC4 B. sinuspersici and the C3 species S. maritima expressed two isoforms and S. aralocaspica as well as S. eltonica expressed one.

Figure 4. Initial analysis of carbonic anhydrase isoforms. Developmental expression profile indicates increasing transcript levels in B. sinuspersici for both βCA2 and βCA. The Kranz C4 S. eltonica exhibited a very low βCA2 expression value compared to the βCA and αCA isoforms and an expression profile much like the C3 S. maritima profile.

*FSW-frame shift prior to FSW area, ^ mapping-Stops before FA/SW area possible beginning to contig 4358, ~ mapping-FAW 5' premature stop codons

Figure 5. Phosphoenolpyruvate carboxylase (PEPC) expression values. Sequence curation indicated two isoforms of PEPC were present in Bienertia and S. eltonica and only one isoform in S. aralocaspica and S. maritima.


Figure 6. Transaminating and decarboxylation enzymes


Figure 7. Alternate C4 Biochemical Decarboxylases.


Chapter 4

Conclusions and Future Perspectives

The plant kingdom is diverse in form and function. Humans have exploited this diversity from the beginning of our evolution, and it has provided food, shelter, fuel and medicines from the moment we developed the capability to exploit these traits. The successful exploitation of these resources has allowed for the expansion and growth of the human population almost to the point of non-sustainability. The availability of, and projected demand for, agricultural products in the next 15 years, according to the Food and Agriculture Organization of the United Nations report of 2009, indicate that demand will outgrow our ability to meet it in the near future (Bruinsma, 2003). Increasing photosynthetic efficiency is one avenue to increase a plant’s efficiency in converting light, CO2 and water into products important to human sustenance (Zhu et al., 2010). The effort to find solutions to meet the increasing demand for agricultural products has been an overarching goal of photosynthesis research.

Work accomplished in generating an initial developmental transcriptome in Bienertia provides the research community with mRNA sequence as well as differential expression of genes on a global scale with which to generate hypotheses to elucidate key regulatory points. Comparison of sequence and expression levels across closely related but structurally different photosynthetic types indicates differential isoform recruitment into the various pathways. Now that these resources are available, greater efforts can be made to generate forward as well as reverse genetics information. The establishment of a B. sinuspersici TILLING population is now feasible with increased greenhouse generated seed stocks. Screening for structural and biochemical phenotypes mutated via ethyl methanesulfonate (EMS) or gamma radiation will enable forward genetics efforts in the identification of genes at the crux of the unique SCC4 phenotype. Reverse genetics techniques are a future aim, as regeneration protocols still need to mature.

Previous SCC4 studies (reviewed in Sharpe and Offermann, 2014) have increased the structural, biochemical and physiological knowledge of these unique species, and the differential protein accumulation they report indicates that differential spatial and temporal chloroplast targeting mechanisms must be present in the SCC4 species. The work presented in this thesis has brought more insight through the analysis of the preceding RNASeq studies. There is little dispute that increased photosynthetic efficiency translates to increases in biomass and yields. What is not yet decided is the method or methods by which to best achieve this goal. Selective breeding or artificial selection has been the mainstay of crop improvement in agriculture. Breeding approaches are time and resource consuming; estimates range from 10 to 20 years for wheat, depending upon the methods used, and 2 to 3 years for maize to reach the advanced testing phase (Heffner et al., 2010). Traits such as C4 photosynthesis have not been observed in the major cereal grains with the exception of maize, and this limitation excludes breeding as a viable option for conferring these energetically efficient traits. While the C4 phenotype has not been observed in the majority of cereal crops, the enzymes involved in the C4 pathway have been shown to be present in C3 plants (Aubry et al., 2011). Understanding the mechanisms underlying the induction and functional translation of these enzymes, as well as their translocation to the functional compartments, is necessary to selectively confer beneficial traits to agricultural crops.


References

Aubry S, Brown NJ, Hibberd JM (2011) The role of proteins in C3 plants prior to their recruitment into the C4 pathway. J Exp Bot 62: 3049–3059

Bruinsma J, ed (2003) World agriculture: towards 2015/2030 An FAO Perspective. Earthscan Publications Ltd, London

Heffner EL, Lorenz AJ, Jannink J-L, Sorrells ME (2010) Plant Breeding with Genomic Selection: Gain per Unit Time and Cost. Crop Sci 50: 1681

Sharpe RM, Offermann S (2014) One decade after the discovery of single-cell C4 species in terrestrial plants: what did we learn about the minimal requirements of C4 photosynthesis? Photosynth Res 119: 169–180

Zhu X-G, Long SP, Ort DR (2010) Improving Photosynthetic Efficiency for Greater Yield. Annu Rev Plant Biol 61: 235–261


Appendix

CHAPTER 2

Supplementary Data 1

Supplementary Data 1

Flow Cytometric Estimation of Nuclear DNA Content of B. sinuspersici Leaf Samples

Flow cytometry was performed at the Flow Cytometry and Imaging Core Laboratory at Virginia Mason Research Center. Four technical replicates of four biological samples were run on two separate occasions by two separate technicians. B. sinuspersici samples #1 and #2 were compared to chicken erythrocyte nuclei and samples #3 and #4 were compared to wheat nuclei. Picogram to nucleotide calculations were made with 1pg = 980Mbp (Supplementary Data 1 Table 1).
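The picogram to base pair conversion can be checked with a short sketch. The 2C values below are those listed in Supplementary Data 1 Table 1; the function name and the choice to leave the conversion factor as a parameter are illustrative assumptions. The text states 1 pg = 980 Mbp, whereas the tabulated Mbp/1C values are reproduced by a factor of 965 Mbp per pg (for example, 6.286 pg/2C gives 3033 Mbp/1C), so the factor is passed explicitly rather than fixed.

```python
from statistics import mean, stdev


def mbp_per_1c(pg_per_2c: float, mbp_per_pg: float) -> float:
    """Convert a 2C DNA content in picograms to a 1C genome size in Mbp."""
    return pg_per_2c / 2.0 * mbp_per_pg


# 2C DNA content (pg) of the 16 replicates in Supplementary Data 1 Table 1
pg_2c = [
    6.286, 5.976, 5.726, 5.749,  # sample #1
    7.736, 7.850, 7.742, 7.694,  # sample #2
    8.260, 8.045, 8.108, 8.074,  # sample #3
    8.148, 7.797, 8.771, 8.422,  # sample #4
]

mean_pg = mean(pg_2c)  # average 2C content across replicates
sd_pg = stdev(pg_2c)   # sample standard deviation (n - 1 denominator)

# Assumption: 965 Mbp per pg, the factor that reproduces the tabulated values
genome_sizes = [round(mbp_per_1c(pg, 965)) for pg in pg_2c]
```

Under this assumption, the mean and the sample standard deviation of the 16 replicates match the summary rows of the table.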


Supplementary Data 1 Table 1. Flow cytometry results of 16 B. sinuspersici replicates

Sample Name                  pg/2C    Mbp/1C
Bienertia sinuspersici #1    6.286    3033
Bienertia sinuspersici #1    5.976    2883
Bienertia sinuspersici #1    5.726    2763
Bienertia sinuspersici #1    5.749    2774
Bienertia sinuspersici #2    7.736    3733
Bienertia sinuspersici #2    7.850    3788
Bienertia sinuspersici #2    7.742    3735
Bienertia sinuspersici #2    7.694    3712
Bienertia sinuspersici #3    8.260    3985
Bienertia sinuspersici #3    8.045    3882
Bienertia sinuspersici #3    8.108    3912
Bienertia sinuspersici #3    8.074    3896
Bienertia sinuspersici #4    8.148    3931
Bienertia sinuspersici #4    7.797    3762
Bienertia sinuspersici #4    8.771    4232
Bienertia sinuspersici #4    8.422    4064
Average of 16 replicates     7.524    3630
Standard Deviation           0.994    480

Supplementary Data 2

Blast2GO’s Gene Ontology (GO) annotation for a sequence relies on the results generated by a blastx alignment from an in silico nucleotide translation. The reference species assigned is the best match from the results generated by the computational algorithms used by the blastx alignment. Of the 73,486 assembled B. sinuspersici ESTs retained after filtering for at least fivefold average coverage and a length of at least 200 bases, 28,313 ESTs had GO associations annotated through Blast2GO. A total of 296,017 GO terms were associated with the 28,313 B. sinuspersici ESTs; 47.95% of the reference terms were automatically assigned, 29.63% were derived from computational analysis, 18.07% were arrived at from experimental evidence and 4.36% were specified by the authors or curators (Conesa et al., 2005). Of the EST annotations, 26,067 were made primarily from 28 species while the remaining 2,246 were categorized as “other” or “unknown” (Supplementary Data 2 Figure 1). GO associations rely on inter-species sequence homology at the amino acid level via blastx alignments or, when compared at the nucleotide level, blastn alignments. Genus and species identity resulting from GO associations made at the nucleotide level may or may not correspond to the genus and species identified at the amino acid level. This is due to the degeneracy of the codons that encode the amino acids and to the amino acid sequence differences between species. In the relatively small pool of inter-species highly conserved genes, it is expected there will be instances of 100% similarity at the amino acid level but, even in these highly conserved genes, 100% nucleotide sequence similarity is very rare even between sister species.

The e-value associated with an annotation is the number of alignments of equal or better score expected to occur by chance in a search of a database of that size; the lower the e-value, the greater the confidence that a sequence is homologous to the reference gene, even if the sequence does not have a 100% similarity match at either the amino acid or nucleotide level. By inference, the unknown sequence is either a paralog or an ortholog of the reference gene. Annotations, and the e-values assigned to individual annotations, are a useful tool in assessing the confidence with which the identity of an unknown sequence can be assigned.
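How an e-value scales with alignment score and search space can be illustrated with the Karlin-Altschul relation used in BLAST statistics, E = m × n × 2^(−S′), where S′ is the bit score, m the query length and n the database length. The sketch below is a simplification under stated assumptions: it uses raw lengths rather than the effective lengths BLAST computes, and the function name and example numbers are hypothetical.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least bit_score.

    Simplified Karlin-Altschul relation E = m * n * 2**(-S'), using raw
    (not effective) query and database lengths.
    """
    if query_len <= 0 or db_len <= 0:
        raise ValueError("lengths must be positive")
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical 300-residue query against a database of 10**9 residues
weak = expect_value(40, 300, 10**9)     # modest bit score
strong = expect_value(400, 300, 10**9)  # high bit score, vanishingly small e-value
```

Each additional bit of score halves the e-value, so high-scoring alignments yield values small enough to be reported as 0.00, the threshold used in the analysis that follows.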

Due to the predictive nature of GO assignments, the 28 species top blastx assignments to the 28,313 B. sinuspersici ESTs were analyzed for assignments with e-values of 0.00. When an inter-species alignment of 100% similarity at the amino acid level was found between the BsRef EST and an assigned top homologous match species, the BsRef EST nucleotide sequence was extracted from the database and a reciprocal blastn of the extracted sequence was made against the nt database at NCBI. Species assigned with a 100% similarity at the amino acid level and at the nucleotide level with an e-value of 0.00 can be found in Supplementary Data 2 Table 1. Results from this analysis were as expected, with amino acid sequence annotations having a high confidence level and very few, if any, blastn e-values of 0.00 (Supplementary Data 2 Table 1). Exceptions were in the annotated ESTs from species Vitis vinifera and Solanum lycopersicum.

An initial assembly, annotation and Gene Ontology enrichment analysis indicated assembled sequences, present in mature tissue GO enrichment of GO:0005840 ribosome and young tissue GO enrichment of GO:0005840 ribosome, had higher than expected homology to Solanum lycopersicum sequence (data not shown). Subsequent to the initial assembly and analysis, a stricter, more constrained and targeted read trimming and assembly was conducted as described in the Materials and Methods. Annotated BsRef ESTs with homology to Solanum lycopersicum sequences outlined in Supplementary File 1 a, b & c were removed from the final dataset. The four BsRef sequences with 100% nucleotide similarity to Vitis vinifera sequences were no longer than 33 bases in length and constituted 0.03% of the Vitis vinifera annotations and 0.01% of the overall annotated ESTs. Solanum lycopersicum associated blastn annotations, on the other hand, were assigned to 179 of the 405 blastx annotated ESTs, with blastn e-values of 0.00, sequence identities of 100% and an average hit length of 842 bases across the 179 EST dataset.

We also posed the question of what effect these small ratios might have on the overall annotation profile if incorporated sequences were missed in the extraction process. To answer this question, the three Gene Ontologies were queried for underrepresented and overrepresented terms with and without the BsRef 100% nucleotide Solanum lycopersicum annotated homologous sequences (Supplementary Data 2 Figure 2). All comparative annotated datasets were within 5% of each other, indicating that if any Solanum lycopersicum influenced BsRef ESTs remained in the database, the effects imparted on the final analysis would be minimal.
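The kind of comparison described above can be sketched as a check on how far the share of each GO term shifts between two annotation sets. Everything in this sketch is an assumption made for illustration: the function names and the term counts are hypothetical, and "within 5%" is interpreted here as a shift of fewer than five percentage points in a term's share of all annotations.

```python
def term_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Express each GO term count as a percentage of all counts in the dataset."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("dataset has no annotations")
    return {term: 100.0 * n / total for term, n in counts.items()}


def max_shift(full: dict[str, int], filtered: dict[str, int]) -> float:
    """Largest absolute change, in percentage points, of any term's share."""
    pct_full = term_percentages(full)
    pct_filtered = term_percentages(filtered)
    terms = set(pct_full) | set(pct_filtered)
    return max(abs(pct_full.get(t, 0.0) - pct_filtered.get(t, 0.0)) for t in terms)


# Hypothetical counts before and after removing suspect sequences
full_set = {"ribosome": 400, "nucleolus": 300, "plastid": 300}
filtered_set = {"ribosome": 380, "nucleolus": 300, "plastid": 300}

largest = max_shift(full_set, filtered_set)
within_five_percent = largest < 5.0
```

Comparing shares rather than raw counts keeps the check independent of the number of sequences removed.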


Supplementary Data 2 Figure 1. Species top blastx hit distribution. Number of sequences associated with the species are to the right of the species.


Supplementary Data 2 Table 1. Bienertia sinuspersici top species blastx to blastn comparisons with e-values of 0.00.

Top blastx Species               Number Top     Number 0.00       Reciprocal 0.00    Number 100%
                                 blastx ESTs    blastx e-value    blastn e-value     Similarity
                                                ESTs              ESTs               Match
Vitis vinifera                   10,308         384               23                 4
Populus trichocarpa              3,489          379               1                  0
Ricinus communis                 3,371          530               0                  0
Glycine max                      2,103          230               0                  0
Beta vulgaris                    1,069          74                0                  0
Medicago truncatula              993            47                0                  0
Arabidopsis thaliana             668            39                0                  0
Oryza sativa                     425            17                0                  0
Arabidopsis lyrata               425            21                0                  0
Solanum lycopersicum             405            179               179                179
Silene latifolia                 379            57                1                  0
Spinacia oleracea                345            67                0                  0
Lotus japonicus                  290            5                 0                  0
Solanum tuberosum                256            27                1                  0
Nicotiana tabacum                194            22                0                  0
Mesembryanthemum crystallinum    172            43                0                  0
Sorghum bicolor                  167            7                 0                  0
Brachypodium distachyon          150            1                 0                  0
Malus x                          148            7                 0                  0
Zea mays                         109            1                 0                  0
Cucumis melo                     102            5                 0                  0
Gossypium hirsutum               90             10                0                  0
Jatropha curcas                  86             8                 0                  0
unknown                          85             38                0                  0
Dianthus caryophyllus            79             13                0                  0
Theobroma cacao                  71             17                0                  0
Solanum demissum                 61             3                 0                  0
Hordeum vulgare                  57             0                 0                  0
Prunus persica                   55             12                0                  0

Supplementary Data 2 Figure 2. Ontology assignment comparison between complete Bienertia sinuspersici assembly and B. sinuspersici assembly dataset with Solanum lycopersicum blastn greater than 95% homologous contigs removed.

References

Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674–3676


Supplementary Data 2 Table 2. Cellular Compartment enriched GO term comparisons between young and mature tissues.

Mature Shared Young

GO:0022625 cytosolic large ribosomal subunit overM GO:0022625 cytosolic large ribosomal subunit overY&M GO:0005871 kinesin complex overY
GO:0005730 nucleolus overM GO:0005730 nucleolus overY&M GO:0042555 MCM complex overY
GO:0022627 cytosolic small ribosomal subunit overM GO:0005886 plasma membrane overY&M GO:0000796 condensin complex overY
GO:0005618 cell wall overM GO:0009523 photosystem II overM & underY GO:0000786 nucleosome overY
GO:0048046 apoplast overM GO:0009941 chloroplast envelope overM & underY GO:0009574 preprophase band overY
GO:0005886 plasma membrane overM GO:0030915 Smc5-Smc6 complex overY
GO:0009523 photosystem II overM GO:0009506 plasmodesma overY
GO:0005774 vacuolar membrane overM GO:0010369 chromocenter overY
GO:0030076 light-harvesting complex overM GO:0009330 DNA topoisomerase complex (ATP-hydrolyzing) overY
GO:0005747 mitochondrial respiratory chain complex I overM GO:0005950 anthranilate synthase complex overY
GO:0030093 chloroplast photosystem I overM GO:0055028 cortical microtubule overY
GO:0000323 lytic vacuole overM GO:0005951 carbamoyl-phosphate synthase complex overY
GO:0009538 photosystem I reaction center overM GO:0000776 kinetochore overY
GO:0045263 proton-transporting ATP synthase complex, coupling factor F(o) overM GO:0045177 apical part of cell overY
GO:0010287 plastoglobule overM GO:0005819 spindle overY
GO:0009941 chloroplast envelope overM GO:0043601 nuclear replisome overY
GO:0005768 endosome underM GO:0000930 gamma-tubulin complex overY
GO:0005802 trans-Golgi network underM GO:0005720 nuclear heterochromatin overY
GO:0044454 nuclear chromosome part underM GO:0009505 plant-type cell wall overY
GO:0042644 chloroplast nucleoid overY
GO:0009535 chloroplast thylakoid membrane underY
GO:0005746 mitochondrial respiratory chain underY
GO:0005759 mitochondrial matrix underY
GO:0042170 plastid membrane underY
GO:0005778 peroxisomal membrane underY
GO:0031977 thylakoid lumen underY
GO:0022625 cytosolic large ribosomal subunit overY
GO:0005730 nucleolus overY
GO:0005886 plasma membrane overY
GO:0009523 photosystem II underY
GO:0009941 chloroplast envelope underY
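The Shared column of this table lists terms that appear in both the Mature and Young columns. A minimal sketch of that relationship is given below, using a handful of entries taken from the table; the function name and the way the two status labels are combined are illustrative assumptions, not the method used to build the table.

```python
def shared_terms(mature, young):
    """Return terms enriched in both tissues, with both statuses combined."""
    return {
        go_id: f"{mature[go_id]} & {young[go_id]}"
        for go_id in mature
        if go_id in young
    }

# A few entries from the Mature and Young columns (GO id -> status).
mature = {
    "GO:0022625": "overM",   # cytosolic large ribosomal subunit
    "GO:0009523": "overM",   # photosystem II
    "GO:0005618": "overM",   # cell wall
}
young = {
    "GO:0022625": "overY",   # cytosolic large ribosomal subunit
    "GO:0009523": "underY",  # photosystem II
    "GO:0005871": "overY",   # kinesin complex
}

shared = shared_terms(mature, young)
for go_id, status in shared.items():
    print(go_id, status)
```

With these inputs, photosystem II is reported as over-represented in mature tissue and under-represented in young tissue, matching its entry in the Shared column.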


Supplementary Data 2 Table 3. Young and mature tissues Molecular Function enriched GO term comparisons.

Mature Shared Young

GO:0003735 structural constituent of ribosome overM GO:0003735 structural constituent of ribosome overY GO:0008017 microtubule binding overY
GO:0020037 heme binding overM GO:0030247 polysaccharide binding underY GO:0003700 sequence-specific DNA binding transcription factor activity overY
GO:0016168 chlorophyll binding overM GO:0003777 microtubule motor activity overY GO:0015189 L-lysine transmembrane transporter activity overY
GO:0015035 protein disulfide oxidoreductase activity overM GO:0005524 ATP binding overY GO:0015181 arginine transmembrane transporter activity overY

GO:0009055 electron carrier activity overM GO:0016165 linoleate 13S-lipoxygenase activity overY

GO:0004601 peroxidase activity overM GO:0043565 sequence-specific DNA binding overY

GO:0005507 copper ion binding overM GO:0004575 sucrose alpha-glucosidase activity overY

GO:0008061 chitin binding overM GO:0033926 glycopeptide alpha-N-acetylgalactosaminidase activity overY

GO:0005509 calcium ion binding overM GO:0005516 calmodulin binding overY

GO:0016705 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen overM GO:0003896 DNA primase activity overY
GO:0004568 chitinase activity overM GO:0004312 fatty acid synthase activity overY

GO:0008794 arsenate reductase (glutaredoxin) activity overM GO:0052659 inositol-1,3,4,5-tetrakisphosphate 5-phosphatase activity overY

GO:0052854 medium-chain-(S)-2-hydroxy-acid oxidase activity overM GO:0035198 miRNA binding overY

GO:0052853 long-chain-(S)-2-hydroxy-long-chain-acid oxidase activity overM GO:0035197 siRNA binding overY

GO:0052852 very-long-chain-(S)-2-hydroxy-acid oxidase activity overM GO:0046982 protein heterodimerization activity overY

GO:0030247 polysaccharide binding overM GO:0008146 sulfotransferase activity overY

GO:0004497 monooxygenase activity overM GO:0035175 histone kinase activity (H3-S10 specific) overY

GO:0046524 sucrose-phosphate synthase activity overM GO:0004650 polygalacturonase activity overY

GO:0005381 iron ion transmembrane transporter activity overM GO:0008810 cellulase activity overY

GO:0004435 phosphatidylinositol phospholipase C activity overM GO:0003918 DNA topoisomerase type II (ATP-hydrolyzing) activity overY

GO:0047760 butyrate-CoA ligase activity overM GO:0052658 inositol-1,4,5-trisphosphate 5-phosphatase activity overY
GO:0000823 inositol-1,4,5-trisphosphate 6-kinase activity overM GO:0016538 cyclin-dependent protein serine/threonine kinase regulator activity overY
GO:0052716 hydroquinone:oxygen oxidoreductase activity overM GO:0004340 glucokinase activity overY
GO:0008568 microtubule-severing ATPase activity overM GO:0004088 carbamoyl-phosphate synthase (glutamine-hydrolyzing) activity overY

GO:0004176 ATP-dependent peptidase activity overM GO:0033729 anthocyanidin reductase activity overY

GO:0005506 iron ion binding overM GO:0004301 epoxide hydrolase activity overY

GO:0052725 inositol-1,3,4-trisphosphate 6-kinase activity overM GO:0003886 DNA (cytosine-5-)-methyltransferase activity overY
GO:0008891 glycolate oxidase activity overM GO:0004579 dolichyl-diphosphooligosaccharide-protein glycotransferase activity overY
GO:0051765 inositol tetrakisphosphate kinase activity overM GO:0047262 polygalacturonate 4-alpha-galacturonosyltransferase activity overY
GO:0016767 geranylgeranyl-diphosphate geranylgeranyltransferase activity overM GO:0051082 unfolded protein binding overY
GO:0004713 protein tyrosine kinase activity underM GO:0008865 fructokinase activity overY

GO:0003777 microtubule motor activity underM GO:0016906 sterol 3-beta-glucosyltransferase activity overY

GO:0005524 ATP binding underM GO:0051219 phosphoprotein binding overY

GO:0004674 protein serine/threonine kinase activity underM GO:0010293 abscisic aldehyde oxidase activity overY

GO:0008094 DNA-dependent ATPase activity underM GO:0004445 inositol-polyphosphate 5-phosphatase activity overY

GO:0005089 Rho guanyl-nucleotide exchange factor activity overY

GO:0010385 double-stranded methylated DNA binding overY

GO:0045486 naringenin 3-dioxygenase activity overY

GO:0018685 alkane 1-monooxygenase activity overY

GO:0004087 carbamoyl-phosphate synthase (ammonia) activity overY
GO:0015180 L-alanine transmembrane transporter activity overY
GO:0080023 3R-hydroxyacyl-CoA dehydratase activity overY
GO:0015416 organic phosphonate transmembrane-transporting ATPase activity overY

GO:0016174 NAD(P)H oxidase activity overY

GO:0004775 succinate-CoA ligase (ADP-forming) activity overY

GO:0004003 ATP-dependent DNA helicase activity overY

GO:0019199 transmembrane receptor protein kinase activity overY

GO:0060590 ATPase regulator activity overY


GO:0046029 mannitol dehydrogenase activity overY

GO:0010279 indole-3-acetic acid amido synthetase activity overY
GO:0003964 RNA-directed DNA polymerase activity underY
GO:0008270 zinc ion binding underY
GO:0004842 ubiquitin-protein ligase activity underY
GO:0008137 NADH dehydrogenase (ubiquinone) activity underY
GO:0050308 sugar-phosphatase activity underY
GO:0004806 triglyceride lipase activity underY
GO:0004652 polynucleotide adenylyltransferase activity underY
GO:0016811 hydrolase activity, acting on carbon-nitrogen (but not peptide) bonds, in linear amides underY
GO:0004222 metalloendopeptidase activity underY
GO:0008948 oxaloacetate decarboxylase activity underY
GO:0003735 structural constituent of ribosome overY
GO:0030247 polysaccharide binding underY

GO:0003777 microtubule motor activity overY

GO:0005524 ATP binding overY

Supplementary Data 2 Table 3. Young and mature tissues Biological Process enriched GO term comparisons.

Mature Shared Young

GO:0046686 response to cadmium ion overM GO:0009909 regulation of flower development overY GO:0051225 spindle assembly overY
GO:0009651 response to salt stress overM GO:0051567 histone H3-K9 methylation overY GO:0016444 somatic cell DNA recombination overY
GO:0009765 photosynthesis, light harvesting overM GO:0006270 DNA replication initiation overY GO:0006268 DNA unwinding involved in DNA replication overY
GO:0009644 response to high light intensity overM GO:0001510 RNA methylation overY GO:0007076 mitotic chromosome condensation overY
GO:0006032 chitin catabolic process overM GO:0009560 embryo sac egg cell differentiation overY GO:0048653 anther development overY
GO:0045454 cell redox homeostasis overM GO:0006346 methylation-dependent chromatin silencing overY GO:0008356 asymmetric cell division overY
GO:0009414 response to water deprivation overM GO:0007131 reciprocal meiotic recombination overY GO:0010267 production of ta-siRNAs involved in RNA interference overY
GO:0018298 protein-chromophore linkage overM GO:0031048 chromatin silencing by small RNA overY GO:0007169 transmembrane receptor protein tyrosine kinase signaling pathway overY
GO:0006007 glucose catabolic process overM GO:0019288 isopentenyl diphosphate biosynthetic process, methylerythritol 4-phosphate pathway underY GO:0048451 petal formation overY
GO:0051707 response to other organism overM GO:0015074 DNA integration underY GO:0000079 regulation of cyclin-dependent protein serine/threonine kinase activity overY
GO:0007568 aging overM GO:0010389 regulation of G2/M transition of mitotic cell cycle overY GO:0007000 nucleolus organization overY
GO:0010045 response to nickel cation overM GO:0000956 nuclear-transcribed mRNA catabolic process underY GO:0048453 sepal formation overY
GO:0001510 RNA methylation overM GO:0010075 regulation of meristem growth overY GO:0009664 plant-type cell wall organization overY
GO:0009733 response to auxin stimulus overM GO:0007129 synapsis overY GO:0043481 anthocyanin accumulation in tissues in response to UV light overY
GO:0006662 glycerol ether metabolic process overM GO:0007018 microtubule-based movement overY GO:0009220 pyrimidine ribonucleotide biosynthetic process overY
GO:0010304 PSII associated light-harvesting complex II catabolic process overM GO:0035196 production of miRNAs involved in gene silencing by miRNA overY GO:0015819 lysine transport overY
GO:0010114 response to red light overM GO:0006508 proteolysis underY GO:0015809 arginine transport overY

GO:0009853 photorespiration overM GO:0007140 male meiosis overY


GO:0010043 response to zinc ion overM GO:0006334 nucleosome assembly overY

GO:0015074 DNA integration overM GO:0008361 regulation of cell size overY

GO:0009408 response to heat overM GO:0051302 regulation of cell division overY

GO:0006200 ATP catabolic process overM GO:0042761 very long-chain fatty acid biosynthetic process overY

GO:0019605 butyrate metabolic process overM GO:0009855 determination of bilateral symmetry overY
GO:0000462 maturation of SSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA) overM GO:0051016 barbed-end actin filament capping overY

GO:0034755 iron ion transmembrane transport overM GO:0009825 multidimensional cell growth overY

GO:0010218 response to far red light overM GO:0006265 DNA topological change overY

GO:0046274 lignin catabolic process overM GO:0007020 microtubule nucleation overY

GO:0010200 response to chitin overM GO:0006085 acetyl-CoA biosynthetic process overY
GO:0009767 photosynthetic electron transport chain overM GO:0009102 biotin biosynthetic process overY

GO:0009723 response to ethylene stimulus overM GO:0009926 auxin polar transport overY

GO:0006414 translational elongation overM GO:0043987 histone H3-S10 phosphorylation overY
GO:0042542 response to hydrogen peroxide overM GO:0009624 response to nematode overY
GO:0009686 gibberellin biosynthetic process overM GO:0010067 procambium histogenesis overY
GO:0034051 negative regulation of plant-type hypersensitive response overM GO:0009640 photomorphogenesis overY
GO:0030308 negative regulation of cell growth overM GO:0010588 cotyledon vascular tissue pattern formation overY
GO:0019805 quinolinate biosynthetic process overM GO:0051607 defense response to virus overY

GO:0006508 proteolysis overM GO:0016126 sterol biosynthetic process overY

GO:0009737 response to abscisic acid stimulus overM GO:0007267 cell-cell signaling overY
GO:0009909 regulation of flower development underM GO:0045010 actin nucleation overY
GO:0006275 regulation of DNA replication underM GO:0009934 regulation of meristem structural organization overY
GO:0010090 trichome morphogenesis underM GO:0009699 phenylpropanoid biosynthetic process overY

GO:0009793 embryo development ending in seed dormancy underM GO:0042335 cuticle development overY

GO:0045010 actin nucleation underM GO:0006269 DNA replication, synthesis of RNA primer overY

GO:0051567 histone H3-K9 methylation underM GO:0016233 telomere capping overY

GO:0006468 protein phosphorylation underM GO:0000914 phragmoplast assembly overY

GO:0006306 DNA methylation underM GO:0009089 lysine biosynthetic process via diaminopimelate overY
GO:0010228 vegetative to reproductive phase transition of meristem underM GO:0010162 seed dormancy process overY

GO:0010016 shoot system morphogenesis underM GO:0009860 pollen tube growth overY

GO:0006270 DNA replication initiation underM GO:0010212 response to ionizing radiation overY

GO:0000911 cytokinesis by cell plate formation underM GO:0016132 brassinosteroid biosynthetic process overY

GO:0007155 cell adhesion underM GO:0009556 microsporogenesis overY

GO:0048449 floral organ formation underM GO:0007143 female meiosis overY

GO:0009555 pollen development underM GO:0007093 mitotic cell cycle checkpoint overY
GO:0000724 double-strand break repair via homologous recombination underM GO:0044030 regulation of DNA methylation overY
GO:0009560 embryo sac egg cell differentiation underM GO:0045003 double-strand break repair via synthesis-dependent strand annealing overY

GO:0006346 methylation-dependent chromatin silencing underM GO:0031408 oxylipin biosynthetic process overY

GO:0007131 reciprocal meiotic recombination underM GO:0010305 leaf vascular tissue pattern formation overY

GO:0031048 chromatin silencing by small RNA underM GO:0048016 inositol phosphate-mediated signaling overY
GO:0019288 isopentenyl diphosphate biosynthetic process, methylerythritol 4-phosphate pathway underM GO:0015808 L-alanine transport overY
GO:0010389 regulation of G2/M transition of mitotic cell cycle underM GO:0048830 adventitious root development overY
GO:0006486 protein glycosylation underM GO:0035019 somatic stem cell maintenance overY
GO:0010374 stomatal complex development underM GO:0009964 negative regulation of flavonoid biosynthetic process overY
GO:0010051 xylem and phloem pattern formation underM GO:0046856 phosphatidylinositol dephosphorylation overY
GO:0030422 production of siRNA involved in RNA interference underM GO:0045786 negative regulation of cell cycle overY
GO:0045595 regulation of cell differentiation underM GO:0010119 regulation of stomatal movement overY
GO:0000956 nuclear-transcribed mRNA catabolic process underM GO:0010025 wax biosynthetic process overY

GO:0010075 regulation of meristem growth underM GO:0048440 carpel development overY

GO:0007129 synapsis underM GO:0019915 lipid storage overY
GO:0070646 protein modification by small protein removal underM GO:0046785 microtubule polymerization overY

GO:0007018 microtubule-based movement underM GO:0090116 C-5 methylation of cytosine overY

GO:0010332 response to gamma radiation underM GO:0010440 stomatal lineage progression overY

GO:0010014 meristem initiation underM GO:0080051 cutin transport overY

GO:0009630 gravitropism underM GO:0009303 rRNA transcription overY

GO:0006312 mitotic recombination underM GO:0006564 L-serine biosynthetic process overY
GO:0035196 production of miRNAs involved in gene silencing by miRNA underM GO:0006013 mannose metabolic process overY

GO:0006694 steroid biosynthetic process underM GO:0009957 epidermal cell fate specification overY

GO:0042127 regulation of cell proliferation underM GO:0007349 cellularization overY

GO:0006397 mRNA processing underM GO:0010029 regulation of seed germination overY


GO:0048366 leaf development underM GO:0006278 RNA-dependent DNA replication underY

GO:0008033 tRNA processing underM GO:0006098 pentose-phosphate shunt underY
GO:0009932 cell tip growth underM GO:0010207 photosystem II assembly underY
GO:0010027 thylakoid membrane organization underY
GO:0016117 carotenoid biosynthetic process underY
GO:0006623 protein targeting to vacuole underY
GO:0016558 protein import into peroxisome matrix underY
GO:0010264 myo-inositol hexakisphosphate biosynthetic process underY
GO:0016197 endosomal transport underY

GO:0045492 xylan biosynthetic process underY

GO:0010413 glucuronoxylan metabolic process underY
GO:0019252 starch biosynthetic process underY

GO:0016556 mRNA modification underY

GO:0042773 ATP synthesis coupled electron transport underY

GO:0043085 positive regulation of catalytic activity underY

GO:0015995 chlorophyll biosynthetic process underY

GO:0035304 regulation of protein dephosphorylation underY

GO:0009637 response to blue light underY

GO:0048585 negative regulation of response to stimulus underY

GO:0048193 Golgi vesicle transport underY

GO:0000303 response to superoxide underY

GO:0006417 regulation of translation underY

GO:0008272 sulfate transport underY

GO:0006655 phosphatidylglycerol biosynthetic process underY

GO:0000023 maltose metabolic process underY
GO:0048573 photoperiodism, flowering underY
GO:0019674 NAD metabolic process underY
GO:0006879 cellular iron ion homeostasis underY
GO:0071281 cellular response to iron ion underY


Assembly of different gene structures

Pyruvate, orthophosphate dikinase (PPDK)

PPDK regenerates phosphoenolpyruvate (PEP) by phosphorylating pyruvate; pyruvate kinase (PK) and phosphoenolpyruvate carboxykinase (PEPCK) also act on PEP (Parsley and Hibberd, 2006; Chastain et al., 2011). In the photosynthetic pathway, initiation of PPDK gene expression differs between the C3 and C4 isozyme versions. C3 versions of PPDK initiate transcription in the first exon; consequently, there is no signal peptide to direct the pre-protein to the chloroplast, and the protein is localized in the cytoplasm (Parsley and Hibberd, 2006).


Supplementary Data 3 Figure 1. Suaedoideae species PPDK characterization. Blue is Bienertia sinuspersici, green is Suaeda aralocaspica, red is Suaeda eltonica and black is Suaeda maritima. Salmon color histograms denote read coverage across the coding region for each EST.

This depiction is from a classic one-copy gene where allelic variation is minimal or non-existent in the expression profile, resulting in a clear single full-length EST sequence.


Supplementary Data 3 Figure 2. Suaedoideae species Serine-Glyoxylate Transaminase (SGAT) characterization. Purple is young Bienertia sinuspersici, blue is mature Bienertia sinuspersici, green is Suaeda aralocaspica, red is Suaeda eltonica and black is Suaeda maritima. Assembled contig length is great enough to identify a full-length B. sinuspersici EST (Bs contig 2308) as well as a truncated version (Bs contig 31130). Whether this depicts allelic variation or a second gene copy as in S. maritima


References

Chastain CJ, Failing CJ, Manandhar L, Zimmerman MA, Lakner MM, Nguyen THT (2011) Functional evolution of C4 pyruvate, orthophosphate dikinase. J Exp Bot 62: 3083–3091

Parsley K, Hibberd J (2006) The Arabidopsis PPDK gene is transcribed from two promoters to produce differentially expressed transcripts responsible for cytosolic and plastidic proteins. Plant Mol Biol 62: 339–349
