Iowa State University Capstones, Theses and Retrospective Theses and Dissertations Dissertations

2008 A genomic study of soybean iron deficiency chlorosis Jamie Aileen O'Rourke Iowa State University

Follow this and additional works at: https://lib.dr.iastate.edu/rtd Part of the Agricultural Science Commons, Agronomy and Crop Sciences Commons, and the Horticulture Commons

Recommended Citation O'Rourke, Jamie Aileen, "A genomic study of soybean iron deficiency chlorosis" (2008). Retrospective Theses and Dissertations. 15824. https://lib.dr.iastate.edu/rtd/15824

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. A genomic study of soybean iron deficiency chlorosis

by

Jamie Aileen O’Rourke

A dissertation submitted to the graduate faculty

in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Major: Genetics

Program of Study Committee: Randy C. Shoemaker, Co-Major Professor Steve Whitham, Co-Major Professor Silvia Cianzio Thomas Baum David Hannapel

Iowa State University

Ames, Iowa

2008

Copyright © Jamie Aileen O’Rourke, 2008. All rights reserved. 3316180

3316180 2008 ii

Dedicated to:

To my parents, Tamia and Dennis, and my sisters, Kelly and Darcy. For always believing

in and supporting my dreams and constantly inspiring me to be my best. iii

TABLE OF CONTENTS

CHAPTER 1: GENERAL INTRODUCTION

Introduction 1

Dissertation Organization 2

Literature Review 5

References 28

CHAPTER 2: MICROARRAY ANALYSIS OF IRON DEFICIENCY CHLOROSIS IN

NEAR-ISOGENIC SOYBEAN LINES

Abstract 36

Background 39

Results 40

Discussion 44

Conclusion 53

Materials and Methods 54

References 61

CHAPTER 3: RECOVERING FROM IRON DEFICIENCY CHLOROSIS IN IRON

DEFICIENCY CHLORISIS IN NEAR-ISOGENIC SOYBEANS: A MICROARRAY

STUDY

Abstract 73

Introduction 74

Results 76

Discussion 78 iv

Materials and Methods 81

References 85

Chapter 4: INTEGRATING MICROARRAY ANALYSIS AND THE SOYBEAN

GENOME TO UNDERSTAND SOYBEANS IRON DEFICIENCY RESPONSE

Abstract 94

Introduction 96

Materials and Methods 98

Results 104

Discussion 110

Conclusion 119

References 120

CHAPTER 5: GENERAL CONCLUSIONS

Conclusion 133

Future Research 134

References 135

APPENDIX: SUPPLEMENTARY TABLES

Supplementary Table 1 137

Supplementary Table 2 138

Supplementary Table 3 173

Supplementary Table 4 187

v

Abstract

Soybeans grown in the farmlands of the upper Midwest of the United States often suffer from iron deficiency chlorosis which results in end of season yield loss. Previous studies in soybean have identified 19 Quantitative Trait Loci (QTL) regions with an association to iron deficiency chlorosis. However, specific genes involved in the process of iron reduction, uptake and transport have not been identified in soybean. Through the use of near isogenic soybean lines (NILs), Expressed Sequence Tag (EST) microarrays, and Affymetrix® GeneChips®, I have identified a suite of candidate genes putatively involved in the soybean iron deficiency response. Single linkage cluster analysis of the candidate genes and genes known to be induced by biotic and abiotic stresses determined that under iron stress conditions, iron efficient soybeans initiate both a general stress response and an iron specific stress response. Iron inefficient soybeans do not initiate a complete complement of either response. We also determined that periods of iron deficiency stress have effects on soybean gene expression that continue after the iron deficiency is alleviated. Candidate genes identified through Affymetrix® GeneChip® analysis confirmed both these results. Candidate genes have been aligned with the 7X soybean genome sequence to identify 56 candidate genes from the iron efficient NIL and

22 candidate genes from the iron inefficient NIL located within the previously identified

QTLs. The hybridization data from the Affymetrix® GeneChips® were also used to identify single feature polymorphisms. The identified sequence differences between the

NILs have the potential to be developed as iron specific molecular markers. The promoter regions of the candidate genes were bioinformatically mined to identify conserved motifs, most likely indicative of transcription factor binding sites involved in vi soybean iron deficiency response. These motifs can be used to query the genome sequence to identify additional genes, which might be involved in soybean iron deficiency chlorosis but were not identified by microarray analysis. Finally, the genome alignment shows genes differentially expressed due to iron deficiency chlorosis are not randomly dispersed throughout the genome, but are instead clustered together. This fact, taken together with the identification of conserved promoter motifs, suggests that the clustered genes are most likely coordinately regulated. These analyses have provided the first suite of candidate genes for iron deficiency chlorosis in soybean. 1

Chapter 1. General Introduction

Introduction

Iron is one of 17 essential micronutrients required for proper nutrition [1].

Though humans only require between 10mg and 15mg of iron per day [1], in developing countries iron deficiency anemia affects approximately 43% of the population [2].

Legumes, rice, wheat, maize, and cassava are staple crops in these countries [3] with soybeans having the potential to provide substantial quantities of iron [4]. Increasing the iron content of soybeans and other staple crops will help reduce the prevalence of this widespread nutritional disorder.

The top soybean producing states in the upper Midwest of the United States produced approximately 1,446 million bushels of soybeans in 2007 [5]. Soybeans grown on calcareous soils often suffer from iron deficiency chlorosis (IDC) [6-13]. Soybean

IDC is characterized by yellowing of interveinal leaf tissue, stunted growth, and end of the season yield loss [8, 14]. In the upper Midwest farmlands, IDC has been implicated in yield losses of up to 25% [15], which translates to substantial monetary loss for affected soybean producers.

Previous studies have identified soybean IDC as the result of both a major gene with minor modifying genes and as a quantitative trait [8, 10, 12]. QTL analysis of IDC has identified 19 QTL regions of significance within the soybean genome [10, 12, 16].

One study identified a genomic region responsible for 25% of the chlorosis in one population but only 17% of the chlorosis in a second study [10]. Molecular marker studies have identified three markers (Satt481, Satt114, and Satt239) that show a consistent association with IDC independent of environmental conditions [11, 17].

2

Genes involved in iron reduction, uptake, and transport have been identified and characterized in model organisms such as rice and Arabidopsis [18-26]. However, these genes have not yet been isolated and identified in soybean. The identification and manipulation of these genes, most likely through breeding programs, is imperative in improving soybean tolerance to iron deficient growth conditions and the iron content of the resulting soybean. The focus of my research project has been to use EST microarrays and GeneChips® to better understand the soybean response to iron stress and to identify candidate genes involved in soybean iron deficiency chlorosis. Coupling the candidate genes with the available genomic sequence has allowed us to better understand co- expression and co-regulation in soybean through clustering and the identification of potential regulatory elements in promoter regions.

Dissertation Organization

This dissertation is organized into five chapters. The first chapter contains a literature review covering the historical and current research in iron deficiency chlorosis in both soybean and model organisms. Chapters 2, 3, and 4 are each in the form of a manuscript. Chapters 2 and 3 have been published in BMC Genomics and Plant

Physiology and Biochemistry respectively, while chapter 4 is in the process of being readied for submission. Chapter 5 is a summary of conclusions reached throughout the course of all the experiments and recommendations for future research in the field.

Chapter 2, Microarray analysis of iron deficiency chlorosis in near-isogenic soybean lines, was published in 2007 volume 8 of BMC Genomics [27]. Candidate genes involved in soybean iron deficiency response were identified through microarray analysis of RNA extracted from root tissue of near isogenic lines grown in iron sufficient and iron

3

deficient hydroponics systems. Clustering analyses determined that iron efficient soybeans induced both an iron specific and a more general stress response. Iron reductase assays confirmed that the iron inefficient plant is unable to induce reductase activity under iron limited growth conditions. The increased reductase capacity of iron efficient plants and iron specific stress response is characteristic of iron efficient soybeans. Contributions from co-authors were invaluable. Michelle A. Graham carried out the bioinformatics analysis, including the single linkage clustering and gene annotation. Mike Grusak assisted in the reductase assays; Lila Vodkin assisted in the microarray analysis, while D. Orlando Gonzolaz produced the microarrays. All other experimentation, analysis and writing was performed by Jamie A. O’Rourke with the guidance and editorial assistance of Randy Shoemaker.

Chapter 3, Recovering from iron deficiency chlorosis in near isogenic soybeans:

A microarray study, was published in Plant Physiology and Biochemistry volume 45 in

2007 [28]. This study asked if soybeans subjected to a brief period of iron deficiency stress, but then returned to optimal growth conditions would still exhibit changes in their gene expression profiles. Four EST transcripts remained differentially expressed after the elimination of iron deficiency stress, suggesting the iron stress conditions result in long lasting changes in gene expression. The microarray results were confirmed by semi quantitative real time PCR analysis. Contributions from co-authors included direction in microarray techniques and analysis from Lila Vodkin and D. Orlando Gonzalez.

Michelle A. Graham performed bioinformatic analyses, including gene annotation. Silvia

Cianzio assisted in the drafting of the manuscript. Jamie A. O’Rourke did all other analyses and writing with editorial assistance from Randy Shoemaker.

4

Chapter 4, Integrating Microarray Analysis and the Soybean Genome to

Understand Soybean's Iron Deficiency Response, is in the process of being prepared for submission. This microarray analysis was performed with the Affymetrix® GeneChip®

Soybean Genome Array using RNA isolated from leaf tissue of near isogenic soybeans grown in iron sufficient and deficient hydroponic systems. Analysis of the candidate genes in the structural context of the soybean genome revealed clustering of co-expressed genes in soybean, a phenomenon often reported in other species. Analysis of promoter sequences of the identified candidate genes has identified two distinct and highly conserved promoter regions across 17 differentially expressed IDC genes. Additional conserved motifs, with high homology to known transcription factor binding sites, were also identified in the promoter regions of differentially expressed genes in the iron efficient genotype, but not in the iron inefficient genotype. Author contributions include genome sequence alignment and manipulation by Steve Cannon, Jeremy Schmutz, and

Jane Grimwald. David Grant performed the genome cluster analysis. Michelle Graham and Rex Nelson did the bioinformatic analyses, with Dr. Graham performing the SFP identification, GO category over representation analyses, and writing custom Perl scripts for other various bioinformatic analyses. Carroll Vance suggested the promoter analysis.

All other work and writing was performed by Jamie A. O’Rourke with editorial assistance from Michelle Graham and Randy Shoemaker.

5

Literature Review

Iron Deficiency Chlorosis

Soybean has emerged as one of the leading crop species in the world today. Like all crop species, the price of soybeans is dependent not only on the quantity but also on the quality and composition of the seeds. This, in turn, is dependent upon the health and composition of the plant. Since the introduction of soybeans into the United States, the

U.S. has become one of the leading soybean producing countries, producing over 2,585 million bushels annually [5]. The top five soybean producing states in the US are Iowa,

Illinois, Minnesota, Indiana, and Ohio. Combined, these five states produced 1,446 million bushels of soybeans in 2007 [5].

Soybeans grown in the upper Midwest states are often susceptible to iron deficiency chlorosis (IDC). IDC was first identified as a problem in 20 soybean varieties introduced from Manchuria in1938 [7], and grown on calcareous soils. Calcareous soils, are characteristically defined by a pH between 7.5 and 8.5, calcium carbonate concentration of 10-14 g kg-1 in the surface 15cm, and DTPA-Fe concentrations between

10 and 20 mg kg -1 [9], are common in the upper Midwestern U.S.. Iron deficient growth conditions induce a characteristic chlorosis pattern in soybeans; interveinal leaf tissue is yellow or necrotic while veins remain green. Severity of chlorosis is rated on a 5 point scale [8, 14]: 1, no yellowing; 2, slight yellowing; 3, moderate yellowing; 4, intense yellowing; and 5, severe yellowing with necrotic legions [8]. Visual symptoms are accompanied by various changes in the plant’s morphology and/or physiology. For plants, iron deficient growth conditions result in low yield at the end of the growing season. Yield losses of up to 25% have been reported from producers, even among

6

chlorosis-resistant soybean varieties [15]. This translates to substantial monetary loss for soybean producers. In 2006, soybean producers made 20.4$ billion dollars [5]. A yield loss of 25% means soybean producers lost 5 billion dollars in 2006. In 2007, when beans approached $15/bu, the magnitude of this loss greatly increased. As the agricultural importance of soybean increases worldwide, it becomes increasingly important to understand the genetic mechanisms controlling iron uptake, transport, and or storage that result in IDC.

Iron Deficiency Importance

Iron is important for both plant and animal health. For plants, two stable redox states allow iron to serve as an electron acceptor in a variety of cellular processes including photosynthesis, respiration, and seed development. For humans, iron deficiency anemia is one of the leading nutritional disorders worldwide, affecting 43% of the population in developing countries [2]. Iron, zinc, iodine, and vitamin A have been identified as the micronutrients most prevalent in human malnutrition [1]. Humans require between 10mg and 15mg of iron a day [1]. In developing countries, legumes, rice, wheat, maize, and cassava are staple crops [3]. In an effort to combat nutritional disorders, the Harvest Plus Challenge aims to produce staple crops with doubled iron concentrations [1].

Soybeans have long been regarded to have the potential to be natural iron supplements [4]. Soybean seeds contain an average iron concentration of 55 mg kg-1, with the majority stored in ferritin [2]. In plants, ferritin is a spherical protein that sequesters thousands of Fe3+ ions, mostly found in the plastids, due to the high iron requirements of photosynthesis. The regulation of transcription of ferritin mRNA in

7

soybean is the result of a unique promoter that binds a trans receptor to repress ferritin mRNA transcription when iron levels are low [29]. The specificity of the ferritin promoter reveals the difficulty in identifying iron homeostasis networks in plants.

Strategy I and Strategy II

Iron represents approximately 5% of the mass of the earth’s crust [30] and is thus present in nearly all soils. Most iron is bound in the soil matrix or forms insoluble ferric hydroxide precipitates [18] making it unavailable to plants. These iron precipitates are made biologically available only through weathering [30]. Plants, however, are only able to utilize Fe2+ or Fe3+ ions. The Fe2+ ion contributes little to the total soluble inorganic iron concentration [30]. Environmental conditions can also limit the Fe2+ availability.

Soils with a high pH have low levels of soluble iron.

Plants have evolved two systems to facilitate iron uptake from the soil, strategy I and II [31] (Figure 1). All plant species, except grasses, utilize strategy I to mobilize Fe2+ from the soil matrix. Strategy I is characterized by a number of biochemical reactions including: I) the release of hydrogen ions from roots, ii) the release of reducing compounds from roots, iii) the reduction of Fe3+ to Fe2+ at the root surface, and iv) an increase in the organic acid (particularly citrate) in the roots [32] (Figure 1). Additional adaptive measures include root morphology changes, root hair and transfer cell development, and an increase in the citrate concentration of phloem [33].

Strategy I plants secrete protons from the H+-ATPases in the plasmalemma to the root apoplast [30] (Figure 1). Dicots have developed transfer cells, which are characterized by an enlargement of the plasmamembrane, which increases the number of plasmalemma H+ pumps [30]. The proton extrusion causes a depression of the pH, which

8

facilitates iron reduction [30]. A root bound membrane reductase reduces the solubilized

Fe3+ to Fe2+ (Figure 1). Reduced iron is then transported across the plasma membrane by an iron transport protein (Figure 1). Membranes of plants grown under iron deficient conditions show higher iron reduction power than those from plants well supplied with iron [30].

Strategy II plants secrete ferric chelators, identified as phytosiderophores, to chelate Fe3+ (Figure 1), which readily forms these organic complexes or chelates called siderophores. Phytosiderophores belong to the mugenic acid (MA) family [34] and are highly soluble and stable over a wide pH range. In plants, phytosiderophores are secreted from the root tip a few millimeters behind the apex, in a diurnal pattern, usually in the mornings. MAs are secreted as a monovalent anion, probably via a K+/MA- symport

[35], produced from three methionine molecules combined to form nicotianamine by nicotianamine synthase [34]. Nicotianamine can be further modified to form various

MA derivatives. Each plant secretes its own specific set of MAs [35]. The entire iron:phytosiderophore complex is transported across the plasma membrane by specific transport proteins such as yellow-stripe1 in maize [34] (Figure 1).

The onset of iron deficient conditions induces the up-regulation of reduction activity [31]. Reductase activity is usually restricted to the young, non-lignified root zones [33]. Because reductase activity is directly proportional to the root surface area, the up-regulation of reductase is often preceeded by the growth of root hairs [33]. In maize, iron deficient growth conditions have been shown to affect the process of photosynthesis [36]. Due to the iron requirement of chlorophyll b, the ratio of chlorophyll a/b increases, but the total chlorophyll content of the plant is reduced [36].

9

The lack of iron affects the prevalence and stoiciochemistry of photosystems I and II, which leads to reduced activity of the electron transport chain [36]. Low level of photosynthesis results in low levels of carbon fixation, which will affect growth and yield.

Yield Loss

Chlorotic soybean plants are common in the farmlands of the upper Midwestern

United States [8-10, 12, 14]. An increase in the chlorosis levels seen in the plant [14] results in a negative correlation with yield. Every one unit increase on the chlorosis scale results in a 20% yield loss [14]. To mitigate this yield loss, researchers have been trying to understand the genetics underpinning this trait.

The genetic regulation of IDC in soybean has been characterized as dependent both from a major gene with minor modifying genes [8, 12] and as a quantitative trait

[10, 12]. Plant breeders have used breeding practices to improve soybeans iron efficiency, but success has been somewhat limited. Complexity of this trait was re- affirmed by the identification of a QTL responsible for 25% of the chlorosis in one population, but only responsible for 17% of the chlorosis in a second population [10].

Additional QTL regions were identified in field studies [12]. These same QTLs were identified in plants grown hydroponically in a USDA greenhouse study using regulated nutrient solutions designed to limit iron availability [16]. The results of this study suggested the hydroponic system was a valid and more easily replicable system to investigate soybean iron deficiency chlorosis [16].

10

Bicarbonate, high moisture, low temperature and high lime content all play a role in iron deficiency chlorosis [37]. Cool wet weather results in poorly aerated soils, allowing carbon dioxide to accumulate [37]. The build up of CO2 results in the alkalization of the soil, which promotes the oxidation and formation of the Fe3+ ion.

Bicarbonate allows the interactions of limestone, bulk density, water content, biodegradable organic matter, and soil drainage to increase soil CO2 levels, thus

- increasing soil HCO3 [38]. In hydroponic solutions, the addition of bicarbonate causes increased chlorosis scores [38] [39]. Bicarbonate neutralizes the protons being released from the root system of the plant, thus blocking the reduction of Fe+3 to Fe+2 [40]. The pH of hydroponic solutions and severity of chlorosis can be adjusted by manipulating the bicarbonate concentration or CO2 levels [41]. High pH levels will often reduce the solubility of other nutrients such as calcium, copper, phosphorus and zinc [42]. However, calcareous soils often have a relatively moderate pH. Soil moisture, electroconductivity,

DTPA-Cr and DTPA-Fe have been identified as significant predictors of soil attributes resulting in chlorotic soybeans [15].

Mapping Soybean Genes Determining IDC

The development of molecular markers [single sequence repeats (SSRs) and restriction fragment length polymorphisms (RFLPs)] in soybean allowed the first mapping of IDC QTLs in 1992 [10]. Markers explaining phenotypic variation were identified in one population, but were not significantly associated with iron efficiency in a second mapping population [10]. The disparity between populations and markers has proven to be a difficult problem to overcome throughout the years. Utilizing 99 RFLPs

11

and SSRs, QTLs on 6 molecular linkage groups (MLGs) were identified, all with minor affects, in a Pride B216 x A15 mapping population [12]. Ninety-seven RFLPs and SSRs identified a QTL on MLG N, accounting for 72% of the phenotypic variation in another mapping population, Anoka x A7. The observations reinforced the idea that IDC may be controlled by two distinct genetic mechanisms [43]. Two QTLs on MLG I and N were identified in both the Pride B216 x A15 and the Anoka x A7 mapping populations, respectively [13]. No common molecular markers were present in both populations, precluding the use of marker-assisted selection (MAS) in breeding programs focusing on improvement of IDC resistance. Due to the high environmental component of IDC, the same populations were grown in the hydroponics system previously described and evaluated using the same molecular markers. Similar, though not entirely identical,

QTLs were identified in both experiments [13], further validating the hydroponics system as an appropriate vehicle to expedite IDC research. The use of hydroponics allows for multiple growth seasons within a short time span, in which the severity of the chlorosis can be controlled, and changes in the environmental conditions are minimal.

SSR markers have also been used to select for IDC resistance. Marker-assisted selection (MAS) found that marker association to IDC expression was dependent on geographic location [6, 11], further emphasizing the environmental impact in IDC symptomatology. Using genotypic data increased the efficiency of selecting chlorosis resistant lines by up to two fold [6]. The study demonstrated the importance of utilizing an environmentally independent method to study IDC. A second study, including 108

SSR markers spanning 8 known IDC QTLs identified only 24 of the markers as polymorphic [11]. The remaining 84 markers were homozygous in the parental lines of

12

the mapping population and thus not useful in screening for iron efficiency. Of the 24 polymorphic SSRs, only 3 showed an association with IDC. Only one of these, Satt481 showed a consistent association with IDC irrespective to environmental conditions. [11].

The marker only accounts for 12% of the IDC variation [11], but it was the first marker identified as useful for breeding purposes.

Recently, two additional markers have been identified that are associated with soybean IDC. Using two populations, each a collection of advanced breeding lines from the upper Midwest, association mapping identified Satt114 and Satt239 to account for

10% and 18.3% of the chlorotic variability respectively [17, 41]. Using traditional QTL mapping with a bi-parental cross, a single locus may show a major effect in one population but not in another. The structure of the population used in association mapping ensures a locus must show an effect in multiple lines to be considered significant. Thus, association mapping identifies markers associated with the trait of interest in a population independent manner. These markers should show broad applicability in screening for IDC susceptibility.

Using Model Organisms to Identify Genes involved in Iron Deficiency

ZIPs & IRT

Most of the information about iron uptake has been obtained from studies in

Arabidopsis thaliana. In Arabidopsis, iron transport has been identified as the rate- limiting step in iron acquisition. Five percent of the Arabidopsis genome is thought to encode transporters, with 150 of those specific to cations [44]. Cation transporters are

13

important in regulating the presence of toxic compounds within the plant. They also serve an important role in signaling [44].

The first gene identified specifically related to iron deficiency was called the iron response transporter gene 1 (IRT1) in Arabidopsis. This gene was the first iron transporter cloned in plants or animals [19]. IRT1 expression is induced by iron deficient conditions in a root specific manner and is co-regulated with the plasma membrane iron reductase [19]. Plants with an irt1 mutation show specific developmental defects, including a reduced stacking of thylakoids into grana, lack of differentiation of palisade parenchyma in leaves, reduced number of vascular bundles in stems, and abnormal endodermal and cortex cells in the roots [20]. IRT1 was the founding member of the zinc response transporter (ZRT)-iron response transporter (IRT) like protein (ZIP) family [19].

ZIP proteins are characterized by the transport of iron, zinc, and manganese [21]. This family has expanded to include 15 ZIP metal transporters (IRT1-3 and ZIP1-12) [45]. All members of the ZIP family have 8 transmembrane domains with a characteristic conserved histidine residue thought to be involved in heavy metal binding [22] and a ZIP signature sequence:

{LIVFA}{GAS}{LIVMD}{LIVSCG}{LIVFAS}{H}{SAN}{LIVFA}{LIVFMAT}{LI

VDE}{G}{LIVF}{SAN}{LIVF}{SAN}. IRT1 is transcriptionally regulated by the availability of iron, zinc, and cadmium; and post-transcriptionally regulated by the presence of iron and zinc [21]. This suggests mineral uptake is a highly regulated process to prevent potentially toxic levels from accumulating within the plant.

In Arabidopsis the IRT2 gene shows high (69% nucleic acid identity) sequence similarity to IRT1 but its protein is unable to transport manganese or cadmium [23]. IRT2

14

is expressed in the external layers of the root. IRT2 gene expression is two-fold lower than that of IRT1, suggesting IRT2 is a housekeeping gene in iron uptake and homeostasis

[23]. Knockout studies of IRT1, IRT2, and IRT1 IRT2 have shown that 35S::IRT2 cannot restore a wild type phenotype to an irt1-1 mutant. However, 35S::IRT1 restores near wildtype phenotype to an irt1-2 mutant [24], suggesting IRT1 is required for normal iron uptake and homeostasis, not IRT2. IRT2 may function as an internal cellular relay for

IRT1 [18]. IRT1 has also been shown to accumulate in flowers before pollination, suggesting IRT1 provides iron to pollen grains [18]. Expression levels of IRT1 in flower tissue is not dependent on iron availability, therefore it is most probably developmentally controlled [18].

ZIP genes are not unique to Arabidopsis. IRT1 homologs have been identified and cloned in a variety of plant species including rice (OsIRT1) [25] and tomato (LeIRT1)

[46] where it is recognized as an iron transporter induced by iron deficient conditions in a root specific manner. In Medicago truncatula, six ZIP proteins have been identified and characterized [47]. The pea homolog of IRT1 (RIT1) preferentially transports Fe2+, but will also transport Zn2+, Cd2+, and other micronutrient metals if they are present in a high concentration [48]. Overlapping functions of ZIP proteins, though not fully redundant, emphasize the importance of proper iron homeostasis in maintaining plant health.

Ferric Reductase Oxidase (FRO)

Ferric chelate reductase is an important component of the strategy I response.

Reductase ability of plant roots is a pre-requisite for iron uptake [49]. In Arabidopsis,

FERRIC REDUCTASE OXIDASE (FRO) encodes the iron deficiency inducible ferric

15

reductase oxidase [50]. The reduced iron can then be transported across the plasma membrane by IRT1 (discussed above). There are eight identified FRO alleles in

Arabidopsis. FRO1-FRO8 are expressed in different, though overlapping, tissues and are regulated by both iron and copper availability [50]. The FRO proteins are a subset of flavocytochromes [51] and are regulated by light in a tissue specific manner in response to developmental cues [52]. FRO genes have been identified in pea, tomato and yeast

[53-55]. All eight Arabidopsis FRO family members contain a FAD , a

NADPH binding site, four invariant histidine residues, and 8 or 9 transmembrane helicies [50]. Despite their similar composition, the different FRO members are specific to different membranes. FRO2 is located on the plasma membrane, FRO3 and FRO8 localize to the mitochondria, FRO7 localizes to the chloroplast, FRO6 localizes to either the chloroplast or the plasma membrane and FROs 1, 4, and 5 contain secretory pathway signal peptides [50].

Either iron reductase or iron transport would be the logical rate-limiting steps resulting in chlorosis in Strategy I plants. Increased reductase capacity has been associated with an increase in the redox potential, providing electrons for reduction [56].

Transforming tobacco with yeast reductase (FRE2) reduced chlorotic symptoms compared to control plants [57] [58]. Transforming rice with a yeast reductase resulted in an 8% yield increase [59]. Expressing the Arabidopsis FRO2 in soybean alleviated chlorotic symptoms in plants grown in iron limited hydroponics environments [60].

Increasing soybean reductase capacity prevented the loss of biomass often seen in soybeans grown in iron deficient hydroponics [60].

16

MATE Genes

Multi antimicrobial efflux (MATE) proteins have also been found to play a valuable role in iron homeostasis. MATE proteins export antimicrobial compounds produced by both pathogens and host plants from roots using proton motive force antiporters [61]. Fifty-six MATE genes have been identified in Arabidopsis [62], making this one of the most numerous gene families involved in the iron process. MATE genes have been identified in many other species including maize and lupine.

The Ferric Reductase Defective 3 gene (FRD3) has become one of the most widely studied MATE genes in Arabidopsis [26, 63, 64]. Plants containing the frd3 mutation constitutively express all three components of the strategy I iron deficiency response, though they may have higher levels of iron than wild type plants [26]. FRD3 has been shown to be involved in root to shoot metal translocation [63, 65]. Recently, it has been shown that FRD3 is a proton coupled citrate exporter on the plasma membrane, extruding citrate into either the xylem or the rhizosphere [64]. This coordinates with other MATE gene activities, specifically, NorM. In bacteria, NorM pumps anti microbial agents out of bacterial cells in exchange for sodium [66] [67]. It is thought that FRD3 transports an iron chelator required for proper iron localization (possibly citrate) [68].

The accumulation of citrate and malate in leaves and roots of iron deficient plants coincides with an increase in proton extrusion. In sugar beets, iron deficiency results in an increase in carbon fixation and oxygen utilization in root tissues [69]. An increase in carbon fixation is accompanied by an increase in phosphoenolpyruvate carboxylase

(PEPC), cytosolic malate dehydrogenase (MDH), and citrate synthase (CS) activity [69].

Under iron deficient conditions, phosphoenolpyruvate is converted to oxaloacetate by

17

phosphoenolpyruvate carboxylase (PEPC). PEPC is converted to malate by cytosolic

MDH, which is converted to citrate by CS. This increase in glycolytic activity has also been seen in Arabidopsis [70] through microarray analysis.

YSL

The use of strategy II allows graminacious plants to utilize both Fe2+ and Fe3+ from the soil. Strategy II plants are less susceptible to iron deficiency chlorosis than strategy I plants, which are limited to utilizing Fe2+. Of particular importance to strategy

II plants is the transporter, which moves iron phytosiderophore complexes into the plant.

First identified in maize, the Yellow Stripe like protein (YS1) has been identified as the key component in iron uptake and transport for strategy II plants [71]. In maize, the YS1 gene is expressed in the vasculature tissues of shoots and reproductive organs as well as in the roots [72]. This suggests maize YS1 serves a role in iron translocation within the plant. Homologs of YS1 have been identified in other species, the closest being barley

YS1with 70% identity [73].

Though a model for strategy I iron uptake, Arabidopsis has eight independent homologs of YS1 termed Yellow Stripe Like, or AtYSL1 through 8 [74]. However, as a strategy I plant, Arabidopsis is unable to synthesize or transport phytosiderophores.

Arabidopsis does contain nicotianamine (NA), a precursor in phytosiderophore production [74]. Though NA is ubiquitous across all plant species, MA synthase has not been identified in any dicot species to date [75]. AtYS1 through 8 proteins have been shown to transport Fe:NA complexes. NA has high binding affinity for many metals

18

including Fe, Cu, Zn and Mg [74]. NA protects cells from oxidative damage by scavenging free Fe.

YSL1 through 8 proteins in Arabidopsis transport specific metals. AtYSL1 specifically transports Fe while AtYS2 transports both Fe and Cu [76]. One of the AtYSL family’s most important roles may be in transporting iron from the vasculature tissue to the seeds [72]. Seeds of ysl1/ysl3 mutants have low iron content, resulting in low germination rates [77]. YSL2 and IRT gene expression are both dependent on the iron status of the plant, but are oppositely expressed. YSL2 shows low expression levels in Fe deficient roots, but high expression in iron sufficient and iron re-supplied roots [74].

Low levels of expression under low iron conditions support the hypothesis that AtYS-like genes function in iron storage or detoxification.

Iron Specific Transcription Factors

Two types of transcription factors have been identified, which specifically interact with the promoter region of genes related to iron deficiency chlorosis. bHLH transcription factors were the first group shown to affect gene expression under iron deficient conditions. The first bHLH transcription factor involved in the iron deficiency response was the Fe-deficiency Induced Transcription Factor 1 (FIT1) [78]. FIT1 transcripts were identified in roots of iron deficient Arabidopsis plants and it was later shown to regulate the expression of IRT1 [78]. The FER protein has also been identified as a bHLH transcription factor involved in the iron deficiency response in tomatoes [79].

In tomatoes, FER proteins are only essential in roots [80]. The gene functions at all iron concentrations thus suggesting it is involved in root physiology and development [80].

19

Upregulated FER expression induces LeIRT, LeNRAMP1, and LeFRO [81], highlighting it’s importance in tomato’s iron deficiency response. In addition to FIT and

FER, there are 160 additional bHLH transcription factors encoded in the Arabidopsis genome [81]. These proteins are identified by their characteristic motifs; a short N terminal and two helicies separated by a variable length loop [81]. Two additional bHLH proteins were highly up-regulated under iron deficient conditions, AtbHLH38 and

AtbHLH39. Sequence alignment identified a bHLH subfamily. Composed of four genes, these transcription factors have an additional three amino acids in the second helix and share a conserved cystine residue in the variable length loop [81]. Over expression of these genes results in the synthesis and excretion of riboflavin from the plant roots.

In rice, a strategy II plant, the iron deficiency-response element factor 1(IDEF1) a transcription factor, which specifically binds to the iron deficiency-responsive element 1

(IDE1) and confers tolerance to iron deficiency [82], was identified. The IDEF1 transcription factor is a member of the AB13/VP1 transcription factor family [82]. This family is a plant specific and known to be involved in abscisic acid signaling [82].

IDEF1 binds to the CATGC sequence of IDE, known iron deficiency responsive cis acting elements [82]. IDEF1 up-regulation improves tolerance to iron deficiency by modifying the utilization of iron already within the plant [82]. This unique approach to iron deficiency may prove a more viable method for conferring tolerance.

20

Microarrays

The field of global gene expression assays (microarrays) emerged in the early

1990’s with cDNAs spotted on nylon membranes. Nearly a decade later, microarrays have evolved into an importaant genomics tool. The ability to monitor the expression level of thousands of genes has exponentially increased the genetic data gathered from a single experiment. Microarray experiments are designed as a method of broad discovery

[83]. This widely adopted platform encourages ‘question driven’ experiments over

‘hypothesis driven’ experiments. However, the parallel assessment of gene expression of hundreds to thousands of genes in a single experiment is an excellent platform from which hypothesis can be based for future experiments. cDNA Arrays

The first modern microarrays were cDNA arrays, first developed in the mid

1990’s. Each glass slide was spotted with a few to many thousands of cDNA clones, usually derived from expressed sequence tag (EST) libraries. Each EST is a single pass sequence read of a cDNA clone. Arabidopsis and rice, which now have fully sequenced genomes, also have the largest EST libraries with approximately 620,000 and 1,100,000 clones respectively. Since most species do not have the luxury of a whole genome sequence, EST libraries represent the largest available resource of sequence data.

Soybean has one of the largest publicly available EST libraries with almost 350,000 clones [84]. The soybean community supported the development of cDNA arrays at the

University of Illinois. Initially, each slide had approximately 9,000 cDNAs represented.

As the technology improved, the number cDNAs increased up to approximately 19,000

21

cDNAs. These microarrays have been used by a number of laboratories to examine traits from CO2 availability[85], somatic embryogenesis[86], pathogen response[87], and iron chlorosis deficiency[27, 28]. Microarrays allow a molecular approach to questions previously only asked in terms of molecular markers, classical genetics, and plant breeding. Where before researchers could identify a region of the genome of importance, microarrays identify a suite of genes involved in or affected by the trait or condition being studied. This provides the basis for future in-depth, targeted molecular research.

Affymetrix Arrays

While cDNA arrays can be produced in a single laboratory or by a core facility, the commercialization of microarrays led to the development of a more sophisticated platform. Affymetrix® GeneChips® have become the benchmark of microarray experimentation. The photolithographic technology used by Affymetrix® allows millions of sequences to be present on a single GeneChip®. The commercialized microarray also substantially decreases the chip-to-chip variation, which is a large source of variability in cDNA arrays. The replicability of Affymetrix® GeneChips® has made them the industry standard against which all microarray experiments are judged. Unlike cDNA microarrays, GeneChip® experiments use biotinylated cRNA as the experimental probe and only one RNA sample is applied to each GeneChip®. Each sequence on a

GeneChip® is composed of 11 independent perfect match and 11 mismatch 25mer probes. The variety of probes is meant to increase sequence specificity and reduce cross hybridization of closely related sequences. The specificity and replicability of

Affymetrix® GeneChips® allows for the detection of changes in expression down to 1.25

22

fold between experimental conditions. To date, there are approximately 80 different

Affymetrix® arrays representing 37 different species. The human genome, sequenced in

2003, has 14 unique GeneChips® available. Of the plants, only Arabidopsis has multiple

GeneChips®, with three currently available.

The sequences on the Affymetrix® GeneChip® Soybean Genome Array represent

35,611 unigenes from soybean. Additonally, due to input from the soybean community, the GeneChip® also contains 15,421 and 7,431 unigenes from Phytopthora sojae and

Heterdoera glycines, two of the major pests in soybean. The inclusion of P. sojae and H. glycine genes allows for more efficient study of infection and disease response. The importance of simultaneously comparing pathogens and soybean gene expression can be seen in the research published using the soybean GeneChip®. The effect of nematode infection on plant health has been the subject of five independent projects [88-93]; the responses to Asian soybean rust the subject of another[94].

Equally important to identifying genes affected by environmental conditions, or pathogen infection is classifying the differentially expressed genes. Affymetrix® has provided preliminary annotation for probes with known information, available through

NetAffx analysis center (http://www.affymetrix.com/analysis/index.affx). Additionally, researchers with the USDA-ARS have compiled annotations for the sequences on the

Affymetrix Soybean GeneChip (unpublished), publicly available at http://soybase.org/GeneChip. Using BLASTX and TBLASTX [95] the sequences from which the Affymetrix probes were derived were compared to the UniProt database and the Arabidopsis genome gene calls (TAIR7, www.arabidopsis.org). The top three

UniProt BLAST hit and Arabidopsis best hit GO ontology, GO slim annotations, and

23

PFAM for the identified coding sequence are available for download or query.

Identifying the closest Arabidopsis homolog to the soybean gene in question allows inference about the function of the soybean gene.

SFPs

Microarrays can simultaneously provide gene expression and genotypic data [96].

Having been used to perform whole genome expression studies for years, Affymetrix and long oligo microarrays are now being used to perform whole genome polymorphism studies [97]. Phenotypic variations, which are often the subject of microarray studies, are products of underlying DNA diversity [98]. Two individuals producing the same amount of RNA but containing a genetic polymorphism within DNA of a specific feature will give rise to differential hybridizations confined to the feature [99]. Variations in nucleotides are key to the development of molecular markers [98]. Researchers have taken advantage of the design of the Affymetrix® GeneChip® probes for purposes other than candidate gene identification. Two emerging uses are the identification of single feature polymorphisms (SFPs) and eQTL analysis.

The most frequent type of polymorphism observed in any organism is a single nucleotide polymorphism (SNP) [98]. However, SNP identification requires prior sequence information [98], which is not available for most species. Similar to SNPs,

SFPs represent sequence differences of more than one nucleotide [97]. The 25base pair design of the probe sets used by Affymetrix® provide an ideal platform to use in screening for SFPs. There is little to no extra expense in mining existing microarray data to identify SFPs [96]. Sequences with base changes within the 25mer probe will show a reduced hybridization signal than those with a perfect match to the probe [98].

24

Traditionally, few SNPs are identified throughout the genome. However, the global analysis used to detect SFPs allows for SFP identification throughout the genome. It is also more likely to identify a causal polymorphism for a trait using SFPs [97] due to the increased number of polymorphisms considered. Regions of high recombination show high numbers of SFPs [97]. Additionally, genes involved in disease resistance show a high number of SFPs [97]. It may be possible to identify genes that are or were under selection by comparison of the SFPs within gene families [97]. Over half of the SFPs

(59%) were encoded in cis-acting regulators and 36% were encoded in structural genes.

The remaining 5% are encoded in trans acting regulators or duplicated genes [99].

Polymorphic mutations preferentially accumulating in regulatory regions over protein encoding regions suggests an evolutionary conservation of structural gene sequence and favored adaptations by transcriptional regulation.

Mapping with Microarrays

In agriculture, the main purpose of QTL discovery is to eventually improve selection efficiency in breeding. A single QTL region may contain hundreds or thousands of candidate genes. Using microarrays to identify genes with expression differences under different conditions can significantly limit the number of candidate genes considered for future experiments [100]. Combining microarray and QTL analysis is one method to understand the genetic control of gene expression [101] and has been termed genetical genomics [102]. The idea to use microarrays to identify QTLs (eQTLs) was first proposed in 2003 [103] but was not successfully reported until 2005 [104]. Using microarrays for QTL mapping can only be successful if some of the gene expression levels being measured by the microarray are under the genetic control of the QTL and if

25

some of the heritable gene expression levels are related to the disease being studied

[103]. If SFPs, as described earlier, and eQTLs can be identified by the same dataset, a physical link between the marker and gene can be established [99], the objective of all

QTL experiments.

Conclusion

The study of iron uptake, transport, and storage in plants is important in improving the nutritional quality of plants for human consumption and reducing yield loss in crop species. The use of mutants in model organisms, such as Arabidopsis, expedited the identification of individual genes involved in iron reduction, transport, and translocation. The complicated genome of soybean has precluded the identification of candidate genes through mutational analysis. The use of marker assisted selection has identified a few markers which can be used to screen germplasms for iron efficiency, but individual candidate genes, which can then be used in breeding programs to improve iron efficiency, must be identified. These candidates can then be assigned to biological and metabolic pathways to determine what biological processes within the plant are altered under iron deficient stress. We must characterize soybeans’ response to iron deficient growth conditions to understand how to begin manipulating the response to improve iron efficiency.

The release of the soybean genome should allow rapid progress in the molecular characterization of soybeans iron deficiency response. In addition to identifying genes within previously identified QTLs, we need to identify SFPs that can be tested as markers in marker assisted selection. We can use the genome to better understand the regulation of genes involved in soybeans iron deficiency response. Are they all located in one place

26

in the genome, or are they scattered throughout the genome? Do they share promoter regions, suggesting they may be co-regulated? Global expression analysis and genomic sequence will significantly alter the way iron deficiency research is conducted in soybean. Laboratory research will allow researchers to design experiments that maximize the potential of yearly field experiments to increase the iron content of seeds and minimize yield loss due to iron deficient growth conditions.

27

Figure1: Two Iron Uptake Strategies

28

References:

1. Welch RM, R.D. Graham: Breeding for micronutrients in staple food crops from a human nutrition perspective. Journal of experimental botany 2004, 55:353-364. 2. Theil EC: Iron, Ferritin, and Nutrition. Annu Rev Nutr 2004, 24:327-343. 3. Ghandilyan A, D. Vreugdenhil, and M.G.M. Aarts: Progress in the genetic understanding of palnt iron and zinc nutrition. Physiologia Plantarum 2006, 126:407-417. 4. Theil EC: Ferritin: at the crossroads of iron and oxygen metabolism. The Journal of nutrition 2003, 133(5 Suppl 1):1549S-1553S. 5. USDA Economic Research Service. US Soybean Industry: Background Statistics and Information April 7, 2008, http://www.ers.usda.gov/News/soybeancoverage.htm. 6. Charlson DV, S.R. Cianzio, and R.C. Shoemaker: Associating SSR Markers with Soybean Resistance to Iron Deficiency Chlorosis. Journal of Plant Nutrition 2003, 26:2267-2276. 7. Weiss M: Inheritance And Physiology Of Efficiency In Iron Utilization In Soybeans. Genetics 1943, 28:253-268. 8. Cianzio SR, Fehr WR, Anderson IC: Genotypic Evaluation for Iron Deficiency Chlorosis in Soybeans by Visual Scores and Chlorophyll Concentration. Crop Science 1979, 19:644-646. 9. Inskeep WP, Bloom, P.R.: Soil Chemical Factors Associated with Soybean Chlorosis in Calciaquolls of Western Minnesota. Agronomy Journal 1987, 79:779-786. 10. Diers BW, S.R. Cianzio, R.C. Shoemaker: Possible Identification of Quantitative Trait Loci Affecting Iron Efficiency in Soybean. Journal of Plant Nutrition 1992, 15(10):2127-2136. 11. Charlson DV, Grant D, Bailey TB, Cianzio SR, Shoemaker RC: Molecular Marker Satt481 is Associated iwth Iron-Deficiency Chlorosis Resistance in a Soybean Breeding Population. Crop Science 2005:2394-2399. 12. Lin S, Cianzio SR, Shoemaker RC: Mapping genetic loci for iron deficiency chlorosis in soybean. Molecular Breeding 1997, 3:219-229. 13. Lin SF, D. Grant, S. Cianzio, R. Shoemaker: Molecular Characterization of Iron Deficiency Chlorosis in Soybean. Journal of Plant Nutrition 2000, 23:1929-1939. 14. Froelich DM, Fehr WR: Agronomic Performance of Soybeans with Differing Levels of Iron Deficiency Chlorosis on Calcareous Soil. Crop Science 1981, 21:438-441. 15. Hansen NC, Schmitt MA, Anderson JE, Strock JS: Soybean: Iron Deficiency of Soybean in the Upper Midwest and Associated Soil Properties. Agronomy Journal 2003, 95:1595-1601. 16. Lin S, Baumer JS, Ivers D, Cianzio SR, Shoemaker RC: Field and Nutrient Solution Tests Measure similar Mechanisms Controlling Iron Deficiency Chlorosis in Soybean. Crop Science 1998(38):254-259.

29

17. Wang J, P.E. McLean, R.Lee, R.J. Goos, T. Helms: Association mapping of iron deficiency chlorosis loci in soybean (Glycine max L. Merr.) advanced breeding lines. . Theor Appl Genet 2008, DOI 10.1007/s00122-008-0710-x. 18. Vert G, Grotz N, Dedaldechamp F, Gaymard F, Guerinot ML, Briat JF, Curie C: IRT1, an Arabidopsis transporter essential for iron uptake from the soil and for plant growth. The Plant cell 2002, 14(6):1223-1233. 19. Eide D, Broderius M, Fett J, Guerinot ML: A novel iron-regulated metal transporter from plants identified by functional expression in yeast. Proceedings of the National Academy of Sciences of the United States of America 1996, 93(11):5624-5628. 20. Henriques R, Jasik J, Klein M, Martinoia E, Feller U, Schell J, Pais MS, Koncz C: Knock-out of Arabidopsis metal transporter gene IRT1 results in iron deficiency accompanied by cell differentiation defects. Plant molecular biology 2002, 50(4-5):587-597. 21. Connolly EL, Fett JP, Guerinot ML: Expression of the IRT1 metal transporter is controlled by metals at the levels of transcript and protein accumulation. The Plant cell 2002, 14(6):1347-1357. 22. Grotz N, Fox T, Connolly E, Park W, Guerinot ML, Eide D: Identification of a family of zinc transporter genes from Arabidopsis that respond to zinc deficiency. Proceedings of the National Academy of Sciences of the United States of America 1998, 95(12):7220-7224. 23. Vert G, Briat JF, Curie C: Arabidopsis IRT2 gene encodes a root-periphery iron transporter. Plant J 2001, 26(2):181-189. 24. Varotto C, D. Maiwald, P. Pesaresi, P. Jahns, F. Salamini, and D. Leister: The metal ion transporter IRT1 is necessary for iron homeostasis and efficient photosynthesis in Arabidopsis thaliana. The Plant Journal 2002, 31(5):589- 599. 25. Bughio N, H. Yamaguchi, N.K. Nishizawa, H. Nakanishi, and S. Mori: Cloning an iron-regulated metal transporter from rice. Journal of experimental botany 2002, 53:1677-1682. 26. Rogers EE, Guerinot ML: FRD3, a Member of the Multidrug and Toxin Efflux Family, controls Iron Deficiency Responses in Arabidopsis. The Plant cell 2002, 14:1787-1799. 27. O'Rourke JA, D.V. Charlson, D. O. Gonzalez, L.O. Vodkin, M.A. Graham, S.R. Cianzio, M.A. Grusak, R.C. Shoemaker: Microarray analysis of iron deficiency chlorosis in near-isogenic soybean lines. BMC genomics 2007, 8:476-489. 28. O'Rourke JA, Graham MA, Vodkin L, Gonzalez DO, Cianzio SR, Shoemaker RC: Recovering from iron deficiency chlorosis in near-isogenic soybeans: a microarray study. Plant Physiol Biochem 2007, 45(5):287-292. 29. Wei J, Theil EC: Identification and characterization of the iron regulatory element in the ferritin gene of a plant (soybean). The Journal of biological chemistry 2000, 275(23):17488-17493. 30. Mengel K, Kirby EA, Kosegarten H, Appel T: Principles of Plant Nutrition 5th Edition, 5th edn. Boston: Kluwer Academic Publishers; 2001. 31. Romheld V: Different Strategies for Iron Acquisition in Higher Plants. Physiol Plant 1987, 70:231-234.

30

32. Brown JC: Mechanisms of Iron Uptake by Plants. Plant, Cell and Environment 1978, 1:249-257. 33. Schmidt W, Tittel J, Schikora A: Role of hormones in the induction of iron deficiency responses in Arabidopsis roots. Plant Physiol 2000, 122(4):1109- 1118. 34. Hell R, Stephan UW: Iron uptake, trafficking and homeostasis in plants. Planta 2003, 216:541-551. 35. Curie C, J.F. Briat: Iron Transport and Signaling in Plants. Annu Rev Plant Biology 2003, 54:183-206. 36. Sharma S: Adaptation of Photosynthesis under Iron Deficiency in Maize. Journal of plant physiology 2007, 164(110):1261-1267. 37. Porter LK, D.W. Thorne: Interrelation of Carbon Dioxide and Bicarbonate Ions in Causing Plant Chlorosis. Soil Science 1954, 79:373-382. 38. Coulombe BA, Chaney RL, Wiebold WJ: Bicarbonate Directly Induces Iron Chlorosis in Susceptible Soybean Cultivars. Soil Science Society American Journal 1984, 48:1297-1301. 39. Flemming A, R. Chaney, B. Coulombe: Bicarbonate inhibits Fe-stress response and Fe uptake - translocation of chlorosis-susceptible soybean cultivars. Plant Nutrition 1984, 7:699-714. 40. Mengel K: Iron Availability in Plant Tissues - Iron chlorosis on calcareous soils. Plant and Soil 1994, 165:275-283. 41. Norvell WA, M.L. Adams: Screening Soybean Cultivars for Resistance to Iron-Deficiency Chlorosis in Culture Solutions Containing Magnesium or Sodium Bicarbonate. Journal of Plant Nutrition 2006, 29:1855-1867. 42. Parker DR, W.A. Norvell: Advances in Solution Culture Methods for Plant Mineral Nutrition Research Advances in Agronomy 1999, 65:151-213. 43. Rodriguez de Cianzio S, W.R. Fehr: Variation in the Inheritance of Resistance to Iron Deficiency Chlorosis in Soybeans. Crop Science 1982, 22:433-434. 44. Maser P, Thomine S, Schroeder JI, Ward JM, Hirschi K, Sze H, Talke IN, Amtmann A, Maathuis FJ, Sanders D et al: Phylogenetic relationships within cation transporter families of Arabidopsis. Plant Physiol 2001, 126(4):1646- 1667. 45. Kim SA, Guerinot ML: Mining iron: iron uptake and transport in plants. FEBS letters 2007, 581(12):2273-2280. 46. Dancis A, D.S. Yuan, D. Haile, C. Askwith, D. Eide, C. Moehle, J. Kaplan, R.D. Klausner: Molecular characterization of a copper transport protein in S. cerrevisiae: an unexpected role for copper in iron transport. Cell 1994, 76:393-402. 47. Lopez-Millan AF, D.R. Ellis, and M.A. Grusak: Identification and characterization of several new members of the ZIP family of metal ion transporters in Medicago truncatula. Plant molecular biology 2004, 54:583- 596. 48. Cohen CK, Garvin DF, Kochian LV: Kinetic properties of a micronutrient transporter from Pisum sativum indicate a primary function in Fe uptake from the soil. Planta 2004, 218(5):784-792.

31

49. Chaney RL, J.C. Brown, L.O. Tiffin: Obligatory reduction of ferric chelates in iron uptake by soybeans Plant Physiol 1972, 50:208-213. 50. Mukherjee I, Campbell NH, Ash J, Connolly EL: Expression profiling of the Arabidopsis ferric chelate reductase (FRO) gene family reveals differential regulation by iron and copper. Planta 2006, 223:1178-1190. 51. Robinson NJ, C.M. Proctor, E.L. Connolly, M.L. Guerinot: A ferric-chelate reductase for iron uptake from soils. Nature 1999, 397:694-697. 52. Feng H, A. Fengying, S. Zhang, Z. Ji, H.Q. Ling, and J. Zou: Light-Regulated, Tissue-Specific, and Cell Differentiation-Specific Expression of the Arabidopsis Fe(III)-Chelate Reductase Gene AtFRO6. Plant Physiol 2006, 140:1345-1354. 53. Waters BMDGB, D.J. Eide: Characterization of FRO1, a pea ferric-chelate reductase involved in root iron acquisition. Plant Physiol 2002, 129:85-94. 54. Li L, X. Cheng, H.Q. Ling: Isolation and characterization of Fe(III)-chelate reduccase gene LeFRO1 in tomato. Plant molecular biology 2004, 54:125-136. 55. Martins LJ, L.T. Jensesn, J.R. Simons, G.L. Keller, D.R. Winge: Metalloregulation of FRE1 and FRE2 homologs in Saccharomyces cerevisiae. Journal of Biological Chemistry 1998, 273:23716-23721. 56. Ric de Vos C, H.J. Lubberding, H.F. Bienfait: Rhizosphere Acidification as a Response to Iron Deficiency in Bean Plants. Plant Physiol 1986, 81:842-846. 57. Moc MC, A.I. Samuelsen, R.C. Martin, D.W.S. Mok: Fe(III) reductase, the FRE genes, and FRE-transformed toacco. Journal of Plant Nutrition 2000, 23(1941-1951). 58. Samuelsen AI, R.C. Martin, D.W.S. Mok, M.C. Mok: Expression of the yeast FRE gnes in transgenic tobacco. Plant Physiol 1998, 118:51-58. 59. Ishimaru Y, S. Kim, T. Tsukamoto, H. Oki, T. Kobayashi, S. Watanabe, S. Matsuhashik, M. Takahashi, H. Nakanishi, S. Mori, and N.K.Nishizawa: Mutational reconstructed ferric chelate reductase confers enhanced tolerance in rice to iron deficiency in calcareous soil. Proceedings of the National Academy of Sciences of the United States of America 2007, 104(18):7373-7378. 60. Vasconcelos M, Eckert H, Arahana V, Graef G, Grusak MA, Clemente T: Molecular and phenotypic characterization of transgenic soybean expressing the Arabidopsis ferric chelate reductase gene, FRO2. Planta 2006, 224(5):1116-1128. 61. Simmons CR, M. Fridlender, P.A. Navarro, N. Yalpani: A maize defense- inducible gene is a major facillitator superfamily member related to bacterial multidrug resistance efflux antiporters. Plant molecular biology 2004, 52:433- 446. 62. Nawrath C, S. Heck, N. Parinthawong, and J.P. Metraux: EDS5, an Essential Component of Salicylic Acid-Dependent Signaling for Disease Resistance in Arabidopsis, Is a Member of the MATE Transporter Family. The Plant cell 2002, 14:275-286. 63. Talke IN, M. Hanikenne, and U. Kramer: Zinc-Dependent Global Transcriptional Control, Transcriptional Deregulation, and Higher Gene

32

Copy Number for Genes in Metal Homeostasis of the Hyperaccumulator Arabidopsis halleri. Plant Physiol 2006, 142:148-167. 64. Durrett TP, W. Gassmann, and E. Rogers: The FRD3-Mediated Efflux of Citrate into the Root Vasculature Is Necessary for Efficient Iron Translocation Plant Physiol 2007, 144:197-205. 65. Green LS, E.E. Rogers: FRD3 Controls Iron Localization in Arabidopsis. Plant Physiol 2004, 136:2523-2531. 66. Morita Y, S. Kodama, S. Shiota, T. Mine, A. Kataoka, T. Mizushima, T. Tsuchiya: NorM, a putative multidrug efflux protein, of Vibrio parahaemolyticus and its homolog in Escherichia coli. Antimicrobial Agents Chemotherapy 1998, 42:1778-1782. 67. Morita Y, A. Kataoka, S. Shiota, T. Mizushima, and T. Tsuchiya NorM of Vibrio parahaemolyticus is an Na+ driven multidrug efflux pump. Journal of bacteriology 2000, 182:6694-6697. 68. Uhde-Stone C, J. Liu, K.E. Zinn, D. L. Allan, C.P. Vance: Transgenic proteoid roots of white lupin: a vehicle for characterizing and silencing root genes involved in adaptation to P stress. The Plant Journal 2005, 44:840-853. 69. Lopez-Millan AF, F. Morales, S. Andaluz, Y. Gogorcena, A. Abadia, J. De Las Rivas, J. Abadia: Responses of sugar beet roots to iron deficiency. Changes in carbon assimilation and oxygen use. Plant Physiol 2000, 124:885-898. 70. Thimm O, Essigmann B, Kloska S, Altmann T, Buckhout TJ: Response of Arabidopsis to Iron Deficiency Stress as Revealed by Microarray Analysis. Plant Physiology 2001, 127:1030-1043. 71. von Wiren N, S. Mori, H. Marschner, V. Romheld: Iron Inefficiency in Maize Mutant ys1 (Zea mays L. cv Yellow-Stripe) Is Caused by a Defect in Uptake of Iron Phytosiderophores. Plant Physiol 1994, 106:71-77. 72. Puig S, N. Andres-Colas, A. Garcia-Molina, and L. Penarrubia: Copper and iron homeostasis in Arabidopsis: responses to metal deficiencies, interactions and biotechnological applications Plant, Cell and Environment 2007, 30:271-290. 73. Murata Y, J. Feng Ma, N. Yamaji, D. Ueno, K. Nomoto, and T. Iwashita: A specific transporter for iron(III)-phytosiderophore in barley roots. The Plant Journal 2006, 46(563-572). 74. Schaaf G, Schikora A, Haberle J, Vert G, Ludewig U, Briat JF, Curie C, von Wiren N: A putative function for the arabidopsis Fe-Phytosiderophore transporter homolog AtYSL2 in Fe and Zn homeostasis. Plant & cell physiology 2005, 46(5):762-774. 75. Takahashi M, Y. Terada, I. Nakai, H. Nakanishi, E. Yoshimura, S. Mori, N.K. Nishizawa: Role of nicotianamine in the intracellular delivery of metals and plant reproductive development. The Plant cell 2003, 15:1263-1280. 76. Waters BM, Chu HH, Didonato RJ, Roberts LA, Eisley RB, Lahner B, Salt DE, Walker EL: Mutations in Arabidopsis yellow stripe-like1 and yellow stripe- like3 reveal their roles in metal ion homeostasis and loading of metal ions in seeds. Plant Physiol 2006, 141(4):1446-1458. 77. Briat JF, C. Curie, F. Gaymard: Iron Utilization and Metabolism in Plants. Current opinion in plant biology 2007, 10 (3):276-282.

33

78. Colangelo EP, Guerinot ML: The essential basic helix-loop-helix protein FIT1 is required for the iron deficiency response. The Plant cell 2004, 16(12):3400- 3412. 79. Brumbarova T, Bauer P: Iron-Mediated Control of the Basic Helix-Loop-Helix Protein FER, a Reulator of Iron Uptake in Tomato Plant Physiology 2005, 137:1018-1026. 80. Ling HQ, P. Bauer, Z. Bereczky, B. Keller, M. Ganal: The tomato fer gene encoding a bHLH protein controls iron-uptake responses in roots. Proceedings of the National Academy of Sciences of the United States of America 2002, 99(21):13938-13943. 81. Vorwieger A, C. Gryczka, A. Czihal, D. Douchov, J. Tiedemann, H.P. Mock, M. Jakoby, B. Weisshaar, I. Saalbach, H. Baumlein: Iron assimilation and transcription factor controlled synthesis of riboflavin in plants. Planta 2007, 226(1):147-158. 82. Kobayashi T, Y. Ogo, R.N. Itai, H. Nakanishi, M. Takahashi, S. Mori, and N.K. Nishizawa: The transcription facyor IDEF1 regulates the response to and tolerance of iron deficiency in plants. Proceedings of the National Academy of Sciences of the United States of America 2007, 104(48):19150-19155. 83. Lockhart DJ, E.A. Winzeler: Genomics, gene expression and DNA arrays. Nature 2000, 405(6788):827-836. 84. Shoemaker RC, P. Keim, L. Vodkin, E. Retzel, S.W. Clifton, R. Waterson, D. Smoller, V. Coryell, A. Khanna, J. Erplding, X. Gai, V. Brendel, C. Raph- Schmidt, E.G. Shoop, C.V. Vielweber, M. Shimatz, D. Pape, Y. Bowers, B. Theising, J. Martin, M. Dante, T. Wylie, and C. Granger: A compilation of soybean ESTs: generation and analysis. Genome / National Research Council Canada = Genome / Conseil national de recherches Canada 2002, 45:329-338. 85. Ainsworth EA, Rogers A, Vodkin LO, Walter A, Schurr U: The effects of elevated CO2 concentration on soybean gene expression. An analysis of growing and mature leaves. Plant physiology 2006, 142(1):135-147. 86. Thibaud-Nissen F, Shealy RT, Khanna A, Vodkin LO: Clustering of microarray data reveals transcript patterns associated with somatic embryogenesis in soybean. Plant physiology 2003, 132(1):118-136. 87. Zou J, Rodriguez-Zas S, Aldea M, Li M, Zhu J, Gonzalez DO, Vodkin LO, DeLucia E, Clough SJ: Expression profiling soybean response to Pseudomonas syringae reveals new defense-related genes and rapid HR- specific downregulation of photosynthesis. Mol Plant Microbe Interact 2005, 18(11):1161-1174. 88. Puthoff DP, M.L. Ehrenfried, B.T. Vinyard, M.L. Tucker: GeneChip profiling of transcriptional responses to soybean cyst nematode, Heterodera glycines, colonization of soybean roots. Journal of Experimental Botany 2007, 58(12):3407-3418. 89. Elling AA, M. Mitreva, J. Recknor, X. Gai, J. Martin, T.R. Maier, J.P. McDermott, T. Hewezi, D. McBird, E.L. Davis, R.S. Hussey, D. Nettleton, J.P. McCarter, T.J. Baum: Divergent evolution of arrested development in teh dauer stage of Caenorhabditis elegans and the inefective stage of Heterodera glycines. Genome Biology 2007, 8(10):R211.

34

90. Tucker ML, A. Burke, C.A. Murphy, V.K. Thai, M.L. Ehrenfried: Gene expression profiles for cell wall-modifying proteins associated with soybean cyst nematode infection, petiole abscission, root tips, flowers, apical buds, and leaves. Journal of Experimental Botany 2007, 58(12):3395-3406. 91. Klink VPCCO, N.W. Alkharouf, M.H. MacDonald, B.F. Matthews: Laser capture microdissection (LCM) and comparative microarray expression analysis of syncytial cells isolated from incompatible and cimpatible soybean (Glycine max) roots infected by the soybean cyst nematode (Heterodera glycines). Planta 2007, 226(6):1389-1409. 92. Klink VPCCO, N.W. Alkharouf, M.H. MacDonald, B.F. Matthews: A time- course compatative microarray analysis of an incompatible and compatile response by Glycine max (soybean) to Heterodera glycines (soybean cyst nematode) infection. . Planta 2007, 226(6):1423-1447. 93. Ithal N, J. Recknor, D. Nettleton, L. Hearne, T. Maier, T.J. Baum, M.G. Mitchum: Parallel genome-wide expression profiling of host and pathogen during soybean cyst nematode infection of soybean. Mol Plant Microbe Interact 2007, 20(3):293-305. 94. Panthee DR, J.S. Yuan, D.L. Wright, J.J. Marois, D. Mailhot, C.N. Stewart: Gene expression analysis in soybean in response to the causal agent of Asian soybean rust (Phakopsora pachyrhizi Sydow) in an early growth stage. Functional Integrated Genomics 2007, 7(4):291-301. 95. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402. 96. West MA, H. van Leeuwen, A. Kozik, D. J. Kliebenstein, R.W. Doerge, D.A. St. Clair, and R.W. Michelmore: High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis. Genome research 2006, 16:787-795. 97. Borevitz JO, S.P. Hazen, T.P. Michael, G.P. Morris, I.R. Baxter, T.T. Hu, H. Chen, J.D. Werner, M. Nordborg, D.E. Salt, S.a. Kay, J. Chory, D. Weigel, J.D.G. Jones, and J.R. Ecker: Genome-wide patterns of single-feature polymorphism in Arabidopsis thaliana. PNAS 2007, 104(29):12057-12062. 98. Kumar R, J. Qiu, T. Joshi, B. Valliyodan, D. Xu, H.T. Nguyen: Single Feature Polymorphism Discovery in Rice. PLoS One 2007, 2(3):e284. 99. Luo ZW, E. Potokina, A. Druka, R. Wise, R. Waugh, M.J. Kearsey: SFP Genotyping From Affymetrix Arrays Is Robust But Largely Detects Cis- acting Expression Regulators. Genetics 2007, 176:789-900. 100. Lai CQ, L.D. Parnell, R.F. Lyman, J.M. Ordovas, T.F.C. Mackay: Candidate genes affecting Drosophila life span identified by integrating microarray gene expression analysis and QTL mapping. Mechanisms of Ageing and Development 2007, 128:237-249. 101. DeCook R, Lall S, Nettleton D, Howell SH: Genetic regulation of gene expression during shoot development in Arabidopsis. Genetics 2006, 172(2):1155-1164. 102. Jansen RC, J.P. Nap: Genetical Genomics: the added value from segregation TRENDS in Genetics 2001, 17:388-391.

35

103. Perez-Enciso M, M.A. Toro, M. Tenenhaus, D. Gianola: Combining Gene expression and Molecular Marker Information for Mapping Complex Trait Genes: A Simulation Study. Genetics 2003, 164:1597-1606. 104. Ronald J, J.M. Akey, J. Whittle, E.N. Smith, G. Yvert, and L. Kruglyak: Simultaneous genotyping, gene-expression measurement, and detection of allele-specific expression with oligonucleotide arrays. Genome research 2005, 15:284-291.

36

Chapter 2. Microarray analysis of iron deficiency chlorosis in near-isogenic

soybean lines

A paper published in BMC Genomics 2007 8: 476-489

Jamie A. O’Rourke1, Dirk V. Charlson2, Delkin O. Gonzalez3, Lila O. Vodkin3, Michelle

A. Graham4,5, Silvia R. Cianzio5, Michael A. Grusak6, Randy C. Shoemaker4,5

Abstract

Background

Iron is one of fourteen mineral elements required for proper plant growth and development of soybean (Glycine max L. Merr.). Soybeans grown on calcareous soils, which are prevalent in the upper Midwest of the United States, often exhibit symptoms indicative of iron deficiency chlorosis (IDC). Yield loss has a positive linear correlation with increasing severity of chlorotic symptoms. As soybean is an important agronomic crop, it is essential to understand the genetics and physiology of traits affecting plant yield. Soybean cultivars vary greatly in their ability to respond successfully to iron

1 Department of Genetics, Developmental and Cellular Biology, Iowa State University, Ames, Iowa 50011 USA 2 Department of Crop, Soil, and Environmental Sciences. University of Arkansas, Fayetteville, Arkansas 72704 USA 3 Department of Crop Sciences, University of Illinois Urbana-Champaign, Urbana, Illinois 61801 USA 4 USDA-ARS, Corn Insect and Crop Genetics Research Unit, Iowa State University, Ames, Iowa 50011 USA 5 Agronomy Department, Iowa State University, Ames, Iowa 50011 USA 6 USDA-ARS Children’s Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, Texas 77030 USA

37

deficiency stress. Microarray analyses permit the identification of genes and physiological processes involved in soybean’s response to iron stress.

Results

RNA isolated from the roots of two near isogenic lines, which differ in iron efficiency, PI 548533 (Clark; iron efficient) and PI 547430 (IsoClark; iron inefficient), were compared on a spotted microarray slide containing 9,728 cDNAs from root specific

EST libraries. A comparison of RNA transcripts isolated from plants grown under iron limiting hydroponic conditions for two weeks revealed 43 genes as differentially expressed. A single linkage clustering analysis of these 43 genes showed 57% of them possessed high sequence similarity to known stress induced genes. A control experiment comparing plants grown under adequate iron hydroponic conditions showed no differences in gene expression between the two near isogenic lines. Expression levels of a subset of the differentially expressed genes were also compared by real time reverse transcriptase PCR (RT-PCR). The RT-PCR experiments confirmed differential expression between the iron efficient and iron inefficient plants for 9 of 10 randomly chosen genes examined. To gain further insight into the iron physiological status of the plants, the root iron reductase activity was measured in both iron efficient and inefficient genotypes for plants grown under iron sufficient and iron limited conditions. Iron inefficient plants failed to respond to decreased iron availability with increased activity of

Fe reductase.

38

Conclusions

These experiments have identified genes involved in the soybean iron deficiency chlorosis response under iron deficient conditions. Single linkage cluster analysis suggests iron limited soybeans mount a general stress response as well as a specialized iron deficiency stress response. Root membrane bound reductase capacity is often correlated with iron efficiency. Under iron-limited conditions, the iron efficient plant had high root bound membrane reductase capacity while the iron inefficient plants reductase levels remained low, further limiting iron uptake through the root. Many of the genes up- regulated in the iron inefficient NIL are involved in known stress induced pathways. The most striking response of the iron inefficient genotype to iron deficiency stress was the induction of a profusion of signaling and regulatory genes, presumably in an attempt to establish and maintain cellular homeostasis. Genes were up-regulated that point toward an increased transport of molecules through membranes. Genes associated with reactive oxidative species and an ROS-defensive were also induced. The up-regulation of genes involved in DNA repair and RNA stability reflect the inhospitable cellular environment resulting from iron deficiency stress. Other genes were induced that are involved in protein and lipid catabolism; perhaps as an effort to maintain carbon flow and scavenge energy. The under-expression of a key glycolitic gene may result in the iron- inefficient genotype being energetically challenged to maintain a stable cellular environment. These experiments have identified candidate genes and processes for further experimentation to increase our understanding of soybeans’ response to iron deficiency stress.

39

Background

The ability of iron (Fe) to serve as an electron acceptor makes Fe a valuable in a variety of plant processes including photosynthesis, respiration, and seed development. In the soil matrix, Fe exists in one of two forms, Fe2+or Fe3+. However, many environmental conditions, including the high pH of calcareous soils, can result in little Fe2+ availability [1] [2] [3] [4]. To survive in iron limiting environments, plants have evolved two iron uptake strategies, Strategy I and II [5]. Dicot species, including soybean, utilize the Strategy I mechanism to take up the Fe2+ ion. Strategy I plants utilize an ATPase to secrete protons from the roots to acidify the rhizosphere [1] [6] [7] [8] [9].

This acidification aids in the release of Fe from chelating agents in the soil. A root membrane reductase reduces the prevalent Fe3+ ion to the biologically usable Fe2+ ion.

This Fe2+ can then be transported into the roots of the plant where it is available for use in various cellular processes. For strategy I plants, the iron reduction by plant roots has been identified as the rate-limiting step in iron deficiency [10]. Strategy II plants, monocot species, release phytosiderophores from the roots that chelate Fe3+ ions. The entire phytosiderophore iron complex is then transported into the root system of the plant.

Complex genetic and environmental interactions have made soybean IDC an extremely difficult trait to study in field trials [11] [12]. Low Fe availability exacerbates chlorosis levels in many cultivars. This is true in the calcareous soils prevalent in the upper U.S. Midwest farmlands [12]. As plants are subjected to Fe deficiency stress, they respond in a characteristic manner. Developing trifoliates exhibit interveinal chlorosis, growth is stunted, and yield is reduced. Yield reduction has a positive linear

40

correlation with increasing chlorosis levels [4]. To minimize the environmental effect on the plant phenotype, visual phenotypic studies have been conducted with plants grown in a nutrient solution hydroponics system. The hydroponics experiments identified the same QTLs identified in field grown trials [13] making this a viable system in which to study the effects of IDC on soybean while minimizing environmental effects. The comparison of expression profiles, via utilization of cDNA microarrays, of RNA from Fe efficient and inefficient soybean near isogenic lines (NILs) grown under Fe limited hydroponic conditions will identify differentially expressed transcripts related to iron stress. This will provide clues to the physiological differences between iron efficient and inefficient cultivars

Results

Transcript levels of near isogenic soybeans, Clark (Fe-efficient) (PI 548533) and

IsoClark (Fe-inefficient) (PI 547430) were compared by microarray analysis. Plants were grown in Fe limited (50uM Fe(NO3)3) hydroponic conditions for two weeks. RNA extracted from root tissue of both Fe efficient and Fe inefficient plants was fluorescently labeled and hybridized to soybean cDNA microarray slides, containing 9,728 cDNAs representing unigene libraries Gm-r1021 and Gm-r1083 [14], in a balanced dye swap design. A comparison of three biological replicates, each with two technical replicates for a total of six hybridizations, identified 43 genes whose expression levels exceeded a two-fold difference (Tables 1 and 2). Forty-two of the forty-three identified genes were over-expressed in the Fe inefficient line in comparison to the Fe efficient genotype, while a single gene was under-expressed.

41

As controls, the NILs were also grown in Fe sufficient hydroponics solutions

(100uM Fe(NO3)3) and analyzed on cDNA arrays containing the original 9,728 genes from root specific cDNA libraries examined plus an additional 9,272 genes from seed coat, seedling, cotyledon, flower, and pod cDNA libraries for a more global transcript analysis. An analysis of three biological replicates, with two technical replicates apiece for a total of six hybridizations, showed no genes with consistent differential expression between the NILs under Fe sufficient conditions. Thus, the differential expression seen under Fe deficient conditions is likely a result of the differential response of the NILs to the Fe limited environment rather than inherent genetic differences between the NILs

[15].

Real Time RT-PCR experiments confirmed the expression patterns observed in the microarray experiments for nine out of ten randomly chosen genes (Table 3 with figures in supplemental data: [16]). These experiments confirmed that, for the genes tested, the Fe inefficient plants had higher levels of gene expression than Fe efficient plants (Table 3 and[16]). For four of the nine genes confirmed, the RT-PCR results showed greater differential expression between the NILs than was identified by microarray analysis. The RT-PCR experiments examined expression patterns of individual genes, as evidenced by the single peak in the melting curve analysis (data not shown), while hybridization-based microarrays do not necessarily distinguish between gene family members. Three of the nine genes examined by RT-PCR clustered with known stress response genes while the other six genes analyzed by RT-PCR appear to be unique to soybean’s iron deficiency response (see below).

42

To determine a probable function of the 43 differentially expressed genes, the

GenBank accession of the Expressed Sequence Tag (EST) for the gene was queried against The Institute for Genomic Research (TIGR) database soybean gene index

(Version 12.0) [17] to identify the tentative consensus (TC) sequence containing the respective EST. The TC sequence was compared to the UniProt protein database

(February 2006) [18] using BLASTX [19] and an E-value cutoff of E<10-4, to assign a putative function (Tables 1 and 2). Eight of the forty-three sequences examined had no homology to the UniProt protein database. Therefore, the eight individual EST sequences (Gm-c1028-8183, Gm-c1028-8336, Gm-c1009-2578, Gm-c1028-4530, Gm- c1028-1850, Gm-c1028-5360, Gm-c1028-963, and Gm-c1028-4123) were queried against a database of available Soybean Whole Genome Shotgun (WGS) using megaBLAST BLASTN with an E-value cutoff of E<10–100 to identify genomic sequence that could extend the EST sequence. Identical sequence reads which were at least 500 nucleotides in length and shared 100% nucleotide identity to the EST were assembled into a multiple sequence alignment with the EST. If any of the identified sequences extended the ends of the EST a new consensus was generated for the EST. The new consensus was then compared to the UniProt database by BLASTX with an E-value cutoff of E<10-4 to assign a putative function.

Genes known to be involved in the Fe deficiency response have been identified and characterized in model organisms such as Arabidopsis thaliana. To determine if homologs of these genes were present on the soybean cDNA array, 33 members of six

Arabidopsis gene families known to be involved in Fe uptake and homeostasis (IRT,

FRO, FRD, FIT, NRAMP and YSL) were compared to the soybean EST database by

43

BLASTN comparison (E<10-4). Soybean EST sequences belonging to the Gm-r1021 and Gm-r1083 libraries, and thus putatively represented on the cDNA array, were identified. The soybean sequences were then compared (BLASTN) back to the

Arabidopsis genome to determine if they were the reciprocal best match to the original

Arabidopsis iron genes and likely functional orthologs. This approach demonstrated that only one soybean ortholog of an Arabidopsis iron uptake gene was represented on the array. Soybean EST Gm-r1083-2131 is the homolog of Arabidopsis Yellow Stripe-Like

6. However, this gene was not differentially expressed in the microarray experiment.

A single linkage cluster analysis [20] was performed to identify any Fe induced genes with sequence homology (E<10-4) to other stress induced genes. Twenty-four of the 43 Fe deficiency induced genes clustered with known stress-induced genes (Table 1).

Most clusters contain only one Fe induced gene and a number of other stress induced genes. However, one cluster was composed of only two genes (Gm-c1009-2900 and

Gm-c1028-6890) which showed homology to each other and were differentially expressed under Fe deficient conditions, but show no significant homology to other stress induced genes. The remaining nineteen Fe deficiency induced genes showed no sequence homology to known stress induced genes, nor to the other Fe deficiency induced genes identified by the microarray experiment (Table 2).

Because iron reductase is a fundamental component of Strategy I plants, but not represented on the cDNA array, we conducted root iron reductase experiments on both iron efficient and inefficient plants grown in hydroponic solutions 50 and 100uM

Fe(NO3)3. This provided us with information on the physiological status of the plants for this enzyme activity. The iron efficient plant showed a statistically significant increase in

44

root reductase activity from 0.2 to 0.7 umol Fe reduced per gram of fresh weight tissue per hour at 50uM Fe(NO3)3 (Figure 1).

Discussion

In calcareous soils iron-inefficient soybean genotypes often display symptoms of iron deficiency stress (interveinal chlorosis and reduced yield). These symptoms are exacerbated by the cool wet conditions prevalent in early spring. Under field conditions, if the young soybean plant survives the initial iron stress, the plant continues to grow, albeit slowly, and eventually, as the plant matures and the environmental conditions change, the phenotypic effects of iron stress disappear [21]. Soybean cultivars differ in their ability to respond successfully to iron stress. Results of this study have provided clues to understand some of the physiological differences between iron-efficient cultivars and iron-inefficient cultivars.

In this study, NILs developed especially for their iron deficiency response by the

USDA [22], were used in an established hydroponics system to compare gene expression profiles between iron efficient (Clark) and iron inefficient (IsoClark) NILs. The NILs are phenotypically identical, except in their chlorotic response under iron stress conditions.

Clark remains a healthy green under iron deficient conditions while IsoClark exhibits severe interveinal chlorosis.

Growing the NILs in an established hydroponics system allowed for a comparison of gene expression profiles of the roots of iron efficient (Clark) and inefficient (IsoClark) plants to identify differentially expressed genes between the NILs to better understand the physiological responses of soybean to iron stress. Most changes in gene expression

45

identified under iron-limited conditions are negated upon the re-supply of iron to the system [15], thus confirming these genes as induced by iron deficiency. This comparison allows us to confidently report these forty-three genes as differentially expressed between the NILs in response to iron deficiency.

Non-Induction of Iron Reductase Activity Under Iron Deficiency Stress in the Iron-

Inefficient Soybean Isoline

Induction of the Fe(III) chelate reductase is a key iron stress response in Strategy I plants [23]. Without reductase activity, the available Fe+2 for uptake of iron into the root is extremely limited. Because the gene encoding iron reductase (FRO2) was not found on the cDNA array used in this study, we conducted an iron reductase assay to assess this uniquely-Strategy I response in both the iron efficient and inefficient genotypes. The iron inefficient genotype used in this study failed to respond to iron deficiency stress by induction of increased ferric reductase activity. However, the iron-efficient genotype responded to reduced iron availability by increasing its ability to reduce Fe+3 to the usable

Fe+2. Reduction of ferric iron by the roots is considered to be a limiting factor in successful response to reduced iron availability [10]. The lack of induction of the iron reductase in the inefficient isoline is likely a major factor contributing to the severe iron deficiency stress symptoms observed, relative to the iron efficient genotype.

46

General and Specialized Stress Response Genes are Involved in Soybean Iron

Deficiency Stress Response

With the advent of microarray technology, researchers can now identify a broad range of genes that work in concert to protect the plant from abiotic and biotic stresses.

While some genes may be specific to a particular pathogen, stress, or plant species, others may be part of a general stress response shared across multiple plant species or multiple stresses. We developed an in-house sequence database that contains genes identified from the literature that are significantly differentially regulated in response to abiotic or biotic stresses. Some of the sequences are differentially expressed in response to pathogen attack [24] [25] while the majority, are differentially expressed in response to a variety of abiotic stresses including oxygen deprivation [26], drought [27], salt stress [28] [29], Fe deficiency [30], oxidative stress [31], phosphate deficiency and others [32] [33] [34].

Included in this database are the 43 genes identified in our experiments as differentially expressed in response to limited Fe.

A single linkage cluster analysis [20] was performed to determine if the genes identified as differentially expressed in our microarray experiment exhibited significant

(10E-4) sequence similarity to genes differentially expressed under other abiotic stress conditions. Twenty-four of the 43 identified genes showed significant sequence similarity to other genes whose expression levels are altered by some form of abiotic stress (Table 1). The remaining 19 genes showed no sequence homology to known stress induced genes (Table 2). These 19 genes may be unique to soybeans’ iron response.

Three sequences show no sequence homology to any of the genes characterized in the

47

UniProt database, or to other genes identified under iron limited conditions. The unique sequence of these three genes suggests they may be unique to legumes. The two groupings of genes (Tables 1 and 2) identified under Fe limited conditions suggest both a universal stress response and an Fe specific stress response are induced upon Fe deficient conditions.

The Soybean Response to Iron Deficiency Stress

Plants respond to iron stress through an impressive number of metabolic adaptations and adjustments. The iron-inefficient soybean isoline used in this study failed to respond to reduced iron availability by increased activity of Fe(III) chelate reductase.

Thus, the reduced availability of the iron in the growth medium created a severe iron stress for the inefficient plants.

The most striking response of the inefficient isoline to iron stress was the dramatic increase in transcripts of genes involved in signaling and hormonal regulation.

Increased signaling is likely an attempt on the part of the stressed plant to maintain metabolic homeostasis in a decreasingly sustainable environment. For example, MAP kinase and a SNARE protein are well known signaling proteins that were induced in the inefficient line. In addition, RNA mediating genes for RNA methyltransferase and an

RNA binding protein were also induced upon iron stress, as were several DNA-binding zinc finger protein genes and ethylene receptors.

Ethylene is a signaling molecule often associated with root hair development, pathogen infection, wounding and other abiotic stresses [35] including iron stress [36]

[37] [38]. In this study, transcripts encoding an ethylene receptor protein and two

48

ethylene responsive transcription factors were up-regulated. These transcription factors are one of the largest families of transcription factors and can be induced by abiotic stresses [39]. A MAP kinase protein was also induced by iron stress in this study and has previously been shown to serve as a relay in the ethylene-signaling pathway [40].

Another up-regulated gene, the response regulator protein, has been shown to be involved in ethylene and other hormone signaling [41]. The identification of so many genes

(>16% of all transcripts identified) encoding ethylene response-protein gene transcripts under our experimental conditions strongly indicates the ethylene signaling pathway is involved in the soybean Fe deficiency stress response, probably serving a myriad of duties [42].

The increase of signaling transcripts in the severely stressed genotype likely accounts for the up-regulation of genes involved in anion transport (endomembrane protein and the putative arsA Homolog hASNA-1) and an aquaporin protein. Aquaporins are known to be induced by iron deficiency and other abiotic stresses [43]. The induction of an aquaporin gene points toward the necessity for the cells to move nutrients and metabolites. Because of the ability of aquaporins to transport small molecules they may also be serving in the movement of additional cellular signals in stress pathways.

UDP- calalyzes the transfer of a glucosyl group from UDP- glucose to an acceptor molecule. UDP-glucosyltransferase has recently been shown to be a key enzyme in the production of isoflavones in Glycine max [44]. Because of the role of isoflavones in the soybean stress response the induction of this gene may simply reflect a generalized response to the iron stress. However, the induction of this gene may be indicative of glucosylation of proteins for export through cellular membranes, or

49

synthesis of oligosaccharides from cellular starch or sugars [45]. Glucosylation of protein-linked oligosaccharides may protect them from degradation [46]. Because the endoplasmic reticulum is the main site at which glucosylation of oligosaccharides takes place, induction of an RER1-like gene (functions in returning membrane proteins to the endoplasmic reticulum (ER)) supports this scenario.

A single aldolase gene, Fructose-bisphosphate aldolase, was under-expressed in the inefficient genotype relative to the efficient genotype. The uniqueness of this response under our experimental conditions warrants discussion. The reduced amount of this catalytic gene product may have several outcomes. Fructose-bisphosphate aldolase is an early step in the glycolysis pathway. The products of this pathway are ATP and pyruvic acid (PVA). It is unlikely that suppression of this gene during severe iron stress and chlorosis means the inefficient isoline has an adequate energy source from photosynthesis and therefore does not require the breakdown of glucose. The possible slowdown of glycolysis could result in an energetically challenged cellular environment, thus contributing further to the iron stress. The lack of evidence for increased glycolysis would also suggest that glucose levels are not depleted, leaving that molecule available for other activities (see above).

Although less supported, under-expression of the aldolase gene may result in failure to induce a critical iron homeostasis response in the inefficient genotype. The reduced amount of aldolase transcript in the inefficient genotype suggests this may not have been adequate to respond to reduced iron in the environment. Therefore, in addition to the lack of ferric reductase induction, maintenance of iron homeostasis within the cell may have been further impaired, compounding the iron stress placed on the plant. This

50

has important implications for understanding genetic variation in soybeans’ response to iron stress.

These two results (lack of induction of ferric chelate reductase and reduced aldolase transcript) probably play major roles in creating the severe stress seen in the inefficient genotype. Most of the other differentially expressed transcripts can be explained by the soybean physiological responses to the stress.

Under adverse environment conditions, such as iron stress, plants are known to produce reactive oxygen species (ROS). In this study, a peroxisomal copper containing oxidase was up-regulated in the inefficient genotype. Peroxisomal copper containing oxidase catalyzes the oxidation of amines to aldehyde, NH3 and H202 [47] . ROS such as

H202 can cause damage to proteins and lipids [48]. The hydroponic conditions maintaining severe iron deficiency stress invoke the oxidative stress response. In a seemingly defensive reaction, the up-regulation of a peroxidase precursor points to the soybean plant responding to the increased ROS (H202) by increasing the amount of ROS- scavenging enzyme(s). This is not unusual. Other Strategy I plants, such as sunflower and sugar beet, also have been shown to respond to iron stress through changes in components of their antioxidative systems [49] [50].

Several of the up-regulated genes in iron stressed roots identified in this study are related to the ubiquitin/ proteosome degradation pathway. These include ubiquitin, ubiquitin conjugating enzyme, and a 26S proteasoeme regulatory subunit. The up- regulation of genes in the ubiquitin/ proteasome pathway plus the up-regulation of a gene for phagocytosis and a cell motility protein suggests a breakdown of cellular membranes and general deterioration of cellular health of root tissue due to iron deficiency.

51

Nutrient deprivation in plants has shown to induce both ubiquitin/proteasome and vacuolar degradation of proteins and lipids [43] [51]. Homologs of these genes in other species have been shown to be involved in recycling non-essential proteins and the utilization of the degraded products to maintain vital cellular function [52]. Ubiquitin conjugating have been shown to be induced under stress conditions [43] including heavy metal stress. The ubiquitin response has also been associated with the regulation and downstream signaling of resistance genes [53]. The by products of this catabolism are thought to be re-mobilized to sustain growth under stress conditions [51].

The remobilization of the byproducts by the iron inefficient plants may provide carbon and nutrients to rapidly expanding leaves. Thimm et al. [54] suggested a similar physiological response to iron stress, to maintain carbon flow. Garbarino et al [55] suggested abiotic stress results in improperly folded proteins, which are targeted for degradation by ubiquitinization. Interestingly, one of the over-expressed genes in the inefficient genotype encoded a chaperonin protein and chaperonins are needed for proper folding of nascent proteins.

The transcript for the ubiquitin conjugating enzyme was shown to be up regulated in iron inefficient plants under iron limiting conditions. In other species this enzyme has been shown to require the interaction of zinc ring finger proteins. In this study, five zinc finger protein genes were induced in the iron-stressed genotype. These zinc finger proteins may be acting as transcription factors in the regulation of the ubiquitin pathway in soybean, or they may be involved in the post translational modification of other genes known to be involved in iron homeostasis [56] [57] [58].

52

It is unlikely that protein and lipids are the only cellular components modified by the physiological conditions created from the iron stress. The increased expression of the

2-oxoglutarate (2OG) Fe(II)-dependent oxygenase suggest that the physiological changes brought about by iron deficiency stress has resulted in damage to DNA or modification of RNA and the soybean plant is responding to those challenges by increasing DNA repair and RNA stability. The 2OGFe(II)-dependent oxygenase has been predicted to detoxify methylated bases of ssDNA and reverse methylase modification of

RNA, thus creating less toxic base derivatives, and enzymes of this family are also known to catalyze the formation of the plant hormone ethylene [59].

Many of the changes in transcript level observed in iron-stressed soybean correspond to general stress responses. For example, a TIR-NBS-LRR-TIR gene, a common motif in known resistance genes, was found to be up-regulated in the stressed iron-inefficient genotype. The over-expression of this gene suggests soybean responds to iron stress in a manner akin to the way it would combat pathogenic infection. Similarly, when iron-stressed, other plants such as Arabidopsis and rice, show expression changes of genes involved in wounding, abiotic and biotic stresses [60], as well as reproduction

[61] [62]. These types of genes may all be members of common cascades involved in physiological stress responses.

It is important to note that four of the genes we identified in this experiment had no BLAST homology to the UniProt protein database (Tables 1 and 2). While this makes it difficult to determine their function, the fact that they are induced 2.9 to 3.5 fold, suggest they have very important roles and are worthy of further functional analyses.

53

Conclusion

The use of cDNA arrays has allowed us to identify transcripts differentially expressed in soybean under Fe stress conditions. Some of the genes identified are similar to general stress response genes while others may be specific to Fe stress response in soybean. It is important to note that the genes found on the cDNA array used in this study represent only a small subset of the total genic component of soybean. As such, the genes identified as differentially expressed in this study represent only a fragmented snapshot of changes occurring in the soybean physiology in response to iron deficiency stress.

However, we have been able to confirm and extend previous knowledge of soybean’s iron stress responses and draw important inferences for genetic and physiological differences between soybean iron-efficient and iron-inefficient genotypes.

Relative to inefficient soybean genotypes, iron-efficient genotypes may have an increased ability to respond to reduced iron availability in the environment through efficient induction of iron reductase. Root membrane bound reductase capacity is often correlated with iron efficiency. In this study, under iron limited conditions, the iron efficient plant had high root membrane reductase capacity while the iron inefficient plants reductase levels remained low, further limiting iron uptake through the root. Additionally, iron- efficient genotypes may have an efficient induction of catalytic enzymes necessary to release ATP and provide a much needed energy source to help maintain homeostasis.

Many of the genes induced in the iron inefficient NIL are involved in known stress induced pathways. The most striking response of the iron inefficient genotype to iron deficiency stress was the induction of a profusion of signaling and regulatory genes

54

in an attempt to establish and maintain cellular homeostasis. Genes were induced that point toward an increased transport of molecules through membranes. A suppression of a key catalytic gene suggests the iron-inefficient genotype may be energetically challenged to maintain a stable cellular environment.

Many of the induced genes were obviously up-regulated in response to decreasing metabolic integrity and cellular damage. Enzymes were induced that point toward production of protein and lipid-damaging reactive oxidative species and a concomitant induction of an ROS-defensive enzyme. Genes involved in DNA repair and RNA stability were induced. Other genes were induced that are involved in protein and lipid catabolism; perhaps as an effort to maintain carbon flow and scavenge energy. These experiments have identified candidate genes and processes for further experimentation to increase our understanding of soybeans’ response to iron deficiency stress.

These transcripts should serve as a starting point for future research to both understand and improve iron uptake and utilization as a step in improving overall plant health. Understanding the role these gene products play in soybean Fe metabolism could help alleviate yield loss for crops grown in calcareous soils. Further manipulation of these genes could lead to higher Fe content or increased Fe bioavailability for soybeans and other Strategy I food crops.

Methods

Near isogenic soybean lines (NILs) were developed by the USDA in 1972 [22] specifically for their response to Fe deficiency. Fe efficient PI 548533 (Clark) was crossed with Fe inefficient PI 54619 (T203). Progeny were selfed and resulting F2 plants

55

were screened for Fe inefficiency. The Fe inefficient progeny were backcrossed to the

Fe efficient PI 548533 for six generations [22], resulting in an Fe inefficient plant with the Clark genetic background. The Fe inefficient isoline was released as PI 547430

(IsoClark) [22]. Both the Fe efficient PI 548533 (Clark) and Fe inefficient PI 547430

(IsoClark) lines were grown in the Ames, Iowa USDA greenhouse under 16hr photoperiods. Plants were germinated in sterile vermiculite with distilled deionized water. After one week they were transplanted into a DTPA nutrient buffered hydroponics system [3] containing all minerals necessary for normal growth.

Experimental 10L systems to induce Fe deficiency stress had 50uM Fe(NO3)3 Fe levels while systems for the control experiment contained 100uM Fe(NO3)3. Additionally, each

10L system contained 2mM MgSO4*7H2O, 3mM Mg(NO3)2*6H2O, 2.5mM KNO3, 1mM

CaCl2*2H2O, 4.0mM Ca(NO3)2*4H2O, 0.020mM KH2PO4, 542.5uM KOH, 217uM

DTPA, 1.52uM MnCl2*4H2O, 4.6uM ZnSO4*7H2O, 2uM CuSO4*5H2O, 0.20uM

NaMoO4*2H20, 1uM CoSO4*7H2O, 1uM NiSO4*6H2O, 10uM H3BO3, and 20mM

HCO3. A pH of 7.8 was maintained by the aeration of a 3% CO2: air mixture. A supplemental nutrient solution containing 16mM potassium phosphate, 0.287mM boric acid and 355mM ammonium nitrate was added daily to maintain proper plant nutrition.

To ensure the chlorosis was due to Fe deficiency stress, A15, an Fe efficient plant, and

T203, Fe inefficient plant, were included with each experimental replication. Plants were grown in the hydroponics system for two weeks, until they reached the V3 stage [63], at which point tissue was harvested for RNA extraction. This experiment was replicated three times, for three independent biological replicates, each with two technical replicates.

56

RNA Extraction and Microarray Hybridizations

Total RNA from Fe deficient plants was extracted from root tissue of three biological replicates, each with two technical replicates, for a total of six slide hybridizations using a modified phenol:chloroform extraction with a lithium chloride precipitation [14]. Total RNA for control samples was extracted from root tissue following the QiagenRNeasy protocol for three biological replicates each with two technical replicates for a total of six slide hybridizations. All samples were composed of root tissue from four individual plants, all grown in the same hydroponic unit. RNA purity was determined by spectroscopic readings at A260 and A280 and by formaldehyde gel visualization. Experimental samples were further purified using the RNeasy kits from

Qiagen. Purified RNA was then re-analyzed to determine purity and final concentration.

Each sample yielded 180 ug of purified RNA, 90ug of purified RNA was used for each of the dye swap pairs of cDNA slides. The cDNA array for experimental samples consisted of 9,728 total cDNAs of unigene sets Gm-r1021 and Gm-r1083 spotted onto amine coated glass slides [14] and entered as platform GPL1013 in NCBIs Gene Expression

Omnibus database [64] [65]. The cDNA array for the control samples consisted of the original 9,728 total cDNAs from unigene sets Gm-r1021 and Gm-r1083 plus an additional 9,272 total cDNAs of unigene set Gm-r1070 and entered as platform GPL3015 in GEO.

Purified RNA samples were split into 90 ug aliquots and concentrated to 10uL in a Savant Speed Vac. The concentrated purified RNA and oligo dT was heated together for 10 min. 20uL of 1X Buffer, 10mM DTT, 500uM low T dNTPs, 100uM Cy3 or Cy5

57

(Amersham Biosciences), and 13u/uL SuperScriptII (Invitrogen) was added to each

RNA/Oligo dT sample then placed at 420C for 2 hours. Remaining RNA was degraded with an RnaseA/H treatment. The three biological replicates of the Fe deficient samples formed six technical replicates, the raw data has been deposited in GEO [64] [65] and is accessible through GEO series accession number GSE7290. The three biological replicates of the control samples formed six technical replicates, again, the raw data has been deposited in GEO [64] [65] and is accessible through GEO series accession number

GSE7325. The labeled Clark (Fe-efficient) and IsoClark (Fe-inefficient) cDNA samples were mixed in a balanced dye swap design. The combined samples were purified with

QIAquick PCR purification kits (Qiagen) labeled with PolyA DNA and hybridized for 18 hours at 42OC. After overnight hybridization, slides were washed (wash 1: 1XSSC,

0.2%SDS, wash 2: 0.2XSSC, 0.2%SDS, wash 3: 0.1XSDS) to remove unbound cDNAs.

Slides were scanned with ScanArray Express (Stratagene) and resulting images were overlaid and spots identified by the ImaGene program. An analysis program developed at the University of Illinois [14] was used to identify differentially expressed cDNAs.

For our purposes, differential expression is defined as a minimum of two fold over or under expression in the cDNA of IsoClark (Fe-inefficient) relative to Clark (Fe-efficient).

Real Time PCR Confirmation

For the RT-Real Time PCR experiments, 200ng of RNA extracted from root tissue of plants collected over a 48-hour time course was added as initial template for each sample with Time 0 representing the time at which tissue was collected for the microarray experiment. Primers (Table 3) were designed to produce a 250bp amplicon

58

based on the sequences available from GenBank. Stratagene’s Brilliant qRT-PCR kit was used with each 25uL reaction assembled as described by the Stratagene instruction manual (Catalog #600532) with 2.5uL of 50mM MgCl2, and 2uL of 50nM Forward and

Reverse primers as determined experimentally to optimize the reactions. Cycling protocols consisted of a 45 min. at 42OC for the reverse transcription, 10min at 95OC to disable any remaining StrataScript, then 40 cycles of 30sec at 95OC, 1 min at proper annealing temperature for each primer pair, 30sec at 72OC. The PCR reactions were run in the Stratagene Mx3000P followed by a dissociation curve, taking a fluorescent reading at every degree between 55OC and 95OC to ensure only one PCR product was amplifying.

The Stratagene analysis system established a threshold fluorescence level where amplicon fluorescence levels were statistically higher than background fluorescence; this threshold level is referred to as the Ct value, the cycle at which the samples fluorescence is above threshold. To be considered differentially expressed, the Fe efficient and Fe inefficient plants at the same time point had to differ in where they crossed the Ct by more than 1 cycle. One cycle difference in the RT-PCR experiments corresponds to the two-fold difference in gene transcripts between the NILs examined by the microarray experiment. The fold change was calculated from the differences in Ct using the 2Ct method [20] [66]. As controls, a passive reference dye was added to each sample, to ensure recorded fluorescence levels were due to SYBR green incorporation.

Additionally, each sample was also run in triplicate and each sample was also normalized against tubulin amplification, see primers in Table 3, to ensure the differential expression was not due to differing amounts of initial RNA template added to each sample. As an

59

additional negative control each sample was also run without reverse transcriptase to ensure amplification was due from RNA template.

Single Linkage Clustering Analysis

The single linkage clustering analysis performed in this work used techniques first reported by Graham et al. [20]. In brief, the nucleotide sequences of the 43 cDNAs identified as differentially regulated under Fe chlorosis conditions were added to a data set containing the nucleotide sequences of plant genes known to be involved in general stress responses. These general stress genes were identified based on micro/macro and bioinformatic analyses from the following published works: [52] [24] [25] [26] [67] [28]

[68] [33] [30] [31] [20] [34]. Of the total 430 sequences used for clustering, 221 were derived from phosphate-starved tissues of Arabidopsis [34], Medicago truncatula, soybean and Phaseolus vulgaris [20]. The remaining 209 sequences came from a variety of plant stresses [52] [24] [25] [26] [27] [28] [68] [33] [30] [31]. Each sequence was given a unique identifier to allow identification of the source treatment. The entire data set was then compared to itself using TBLASTX [19] with a minimum E-value cutoff of

10E-4. The single linkage clustering perl scripts generated by [20] were used to assign homologous sequences to a cluster. Note that sequences with no UniProt hit, can cluster to sequences with known annotation. Thus, clustering can be used to imply annotation.

Root Iron Reductase Analysis

Seeds of iron efficient and inefficient plants were germinated on germination paper for 7 days before being transplanted into the hydroponics system described above,

60

with either 50 or 100uM Fe(NO3)3. Plants were grown for 2 weeks in the hydroponics system. Cotyledons were removed after 7 days in hydroponics to ensure a uniform chlorotic response. Root reductase activity of the plants was measured with intact roots that were submerged for 30 min in an aerated assay solution containing 1.5mM KNO3,

1mM Ca(NO3)2, 3.75mM NH4H2PO4, 0.25mM MgSO4, 25uM CaCl2, 25uM H3BO3, 2uM

MnSO4, 2uM ZnSO4, 0.5uM CuSO4, 0.5uM H2MoO4, 0.1uM NiSO4, 100uM Fe(III)-

HEDTA, 100uM BPDS (bathophenanthroline disulfonic acid), and 1mM MES, pH 6.

Iron reduction was quantified spectrophotometrically, by measuring the formation of the red-colored product, Fe(II)-BPDS3; absorbance was measured at 535 nm. An aliquot of the solution with no roots submerged in it is used as the blank. A molar co-extinction coefficient of 22.14mM-1cm-1 was used with the measured absorbance reading to calculate the rate of reduction. There were two replicates of the experiment, each with three plants per genotype per iron concentration.

Authors’ contributions:

JAO carried out the sample collection and preparation, microarray hybridizations, RT

PCR validation, data analysis and drafted the manuscript. DVC provided advice on experimental design and provided comment and revisions to the manuscript. MAG

(Ames) carried out the bioinformatic analysis, including the single linkage clustering and annotation, and assisted in drafting the manuscript. LOV provided leadership in cDNA microarrays, their analysis, and edited the manuscript. DOG produced cDNA microarrays used in the study and aided with the hybridizations. SRC provided comments and revisions to the manuscript. MAG (Houston) assisted in the reductase

61

assays and provided comments and revisions to the manuscript. RCS conceived the study, coordinated the design of the project, and drafted the manuscript. Authors are grateful for helpful discussions with Professors R. Thornburg and J. Specht. All authors read and approved the final manuscript.

Acknowledgements

This project was supported by the North Central Soybean Research Program (grant no.

58-3625-2-459).

References

1. Mengel K: Iron Availability in Plant Tissues - Iron chlorosis on calcareous soils. Plant and Soil 1994, 165:275-283. 2. Mengel K, Kirby EA, Kosegarten H, Appel T: Principles of Plant Nutrition 5th Edition, 5th edn. Boston: Kluwer Academic Publishers; 2001. 3. Coulombe BA, Chaney RL, Wiebold WJ: Bicarbonate Directly Induces Iron Chlorosis in Susceptible Soybean Cultivars. Soil Science Society American Journal 1984, 48:1297-1301. 4. Froelich DM, Fehr WR: Agronomic Performance of Soybeans with Differing Levels of Iron Deficiency Chlorosis on Calcareous Soil. Crop Science 1981, 21:438-441. 5. Romheld V: Different Strategies for Iron Acquisition in Higher Plants. Physiol Plant 1987, 70:231-234. 6. Rogers EE, Guerinot ML: FRD3, a Member of the Multidrug and Toxin Efflux Family, controls Iron Deficiency Responses in Arabidopsis. The Plant cell 2002, 14:1787-1799. 7. Vert G, Grotz N, Dedaldechamp F, Gaymard F, Guerinot ML, Briat JF, Curie C: IRT1, an Arabidopsis transporter essential for iron uptake from the soil and for plant growth. The Plant cell 2002, 14(6):1223-1233. 8. Eide D, Broderius M, Fett J, Guerinot ML: A novel iron-regulated metal transporter from plants identified by functional expression in yeast. Proceedings of the National Academy of Sciences of the United States of America 1996, 93(11):5624-5628. 9. Vert G, Briat JF, Curie C: Arabidopsis IRT2 gene encodes a root-periphery iron transporter. Plant J 2001, 26(2):181-189. 10. Connolly EL, Campbell NH, Grotz N, Prichard CL, Guerinot ML: Overexpression of the FRO2 ferric chelate reductase confers tolerance to growth

62

on low iron and uncovers posttranscriptional control. Plant Physiol 2003, 133(3):1102- 1110. 11. Cianzio SR, Fehr WR, Anderson IC: Genotypic Evaluation for Iron Deficiency Chlorosis in Soybeans by Visual Scores and Chlorophyll Concentration. Crop Science 1979, 19:644-646. 12. Lin S, Cianzio SR, Shoemaker RC: Mapping genetic loci for iron deficiency chlorosis in soybean. Molecular Breeding 1997, 3:219-229. 13. Lin S, Baumer JS, Ivers D, Cianzio SR, Shoemaker RC: Field and Nutrient Solution Tests Measure similar Mechanisms Controlling Iron Deficiency Chlorosis in Soybean. Crop Science 1998, 1998(38):254-259. 14. Vodkin LO, Khanna A, Shealy R, Clough SJ, Gonzalez DO, Philip R, Zabala G, Thibaud-Nissen F, Sidarous M, Stromvik M, Shoop E, Schmidt C, Retzel E, Erpelding J, Shoemaker RC, Roxigues-Huete AM, Polaco JC, Coryell V, Keim P, Gong G, Liu G, Pardinas J, Schweitzer P: Microarrays for global expression constructed with a low redundancy set of 27,500 sequenced cDNAs representing an array of developmental stages and physiological conditions of the soybean plant. BMC genomics 2004, 5(1):73. 15. O'Rourke JA, Graham MA, Vodkin L, Gonzalez DO, Cianzio SR, Shoemaker RC: Recovering from iron deficiency chlorosis in near-isogenic soybeans: a microarray study. Plant Physiol Biochem 2007, 45(5):287-292. 16. Iron Deficiency Supplemental Data. http://soybaseorg/publication_data/ORourke/irondeficiency/indexhtml. 17. Quackenbush J, Cho J, Lee D, Liang F, Holt I, Karamycheva S, Parvizi B, Pertea G, Sultana R, White J: The TIGR Gene Indicies: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res 2001, 29:159- 164. 18. Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS: UniProt: the universal protein knowledgebase. Nucleic Acids Res 2004:D115-D119. 19. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402. 20. Graham MA, Smith KAT, Cannon SB, VandenBosch KA: Computational Identification and Characterization of Novel Genes from Legumes. Plant Physiology 2004, 135:1179-1197. 21. Hansen NC, Schmitt MA, Anderson JE, Strock JS: Soybean: Iron Deficiency of Soybean in the Upper Midwest and Associated Soil Properties. Agronomy Journal 2003, 95:1595-1601. 22. USDA A, National Genetic Resources Program: Germplasm Resource Information Network - GRIN. In.: National Germplasm Resources Laboratory, Beltsville, Maryland; 2004. 23. Connolly EL, Guerinot M: Iron stress in plants. Genome biology 2002, 3(8):REVIEWS1024. 24. Bae H, Kim MS, Sicher RC, Bae HJ, Bailey BA: Necrosis and ethylene inducing peptide from Fusarium oxysporum induces a complex cascade of transcripts

63

associated with signal transduction and cell death in Arabidopsis. Plant Physiol 2006, 141(3):1056-1067. 25. Branco-Price C, Kawaguchi R, Ferreira RB, Bailey-Serres J: Genome-wide Analysis of Transcript Abundance and Translation in Arabidopsis Seedlings Subjected to Oxygen Deprivation. Annals of botany 2005, 96:647-660. 26. Cominelli E, Galblati M, Vavasseur A, Conti L, Sala T, Vuylsteke M, Leonhardt N, Dellaporta S, Tonelli C: A Guard-Cell-Specific MYB Transcription Factor Regulates Stomatal Movements and Plant Drought Tolerance. Current Biology 2005, 15:1196-1200. 27. Gong Q, Li P, Ma S, Indu Ruspassara S, Bohnert HJ: Salinity Stress Adaptation Competence in the Extremophile Thellungiella halophila in Comparison with its Relative Arabidopsis thaliana. The Plant Journal 2005, 44:903-916. 28. He XJ, Mu RL, Cao WH, Zhang ZG, Zhang JS, Chen SY: AtNAC2, a transcription factor downstream of ethylene and auxin signaling pathways is involved in salt stress response and latereal root development. The Plant Journal 2005, 44:903-916. 29. Kasukabe Y, He L, Nada K, Misawa S, Ihara I, Tachibana S: Overexpression of Spermidine Synthase Enhances Tolerance to multiple Environmental Stresses and Up-Regulates the Expression of Various Stress Regulated Genes in Transgenic Arabidopsis thaliana. Plant Cell Physiology 2004, 45:712-722. 30. Rizhsky L, Davletova H, Liang H, Mittler R: The Zinc Finger Protein Zat12 Is Required for Cytosolic Ascorbate Peroxidase 1 Expression during Oxidative Stress in Arabidopsis. The Journal of biological chemistry 2004, 279:11736- 11743. 31. Wong CE, Li Y, Labbe AS, Guervara D, Nuin P, Whitty B, Xiaz C, Golding GB, Gray GR, Weretilnyk EA, Griffith M, Moffatt BA: Transcriptional Profiling Implicates Novel Interactions between Abiotic Stress and Hormonal Responses in Thellungiella, a Close Relative of Arabidopsis. Plant Physiology 2006, 140:1437- 1450. 32. Graham AM, Ramirez M, Valdes-Lopez O, Lara M, Tesfaye M, Vance CP, Hernandez G: Identification of candidate phosphorus stress induced genes in Paseolus vulgaris L. through clustering analysis across several plant species. Functional Plant Biology 2006, 33:789-797. 33. Negishi T, Nakanishi H, Yazaki J, Kishimoto N, Fujii F, Shimbo K, Yamamoto K, Sakata K, Sasaki T, Kikuchi S, Mori S, Nishizawa NK: cDNA microarray analysis of gene expression during Fe-deficiency stress in barley suggests that polar transport of vesicles is implicated in phytosiderophore secretion in Fe- deficient barley roots. The Plant Journal 2002, 30:83-94. 34. Misson J, Raghothama KG, Jain A, Jouhet J, Block MA, Bligny R, Ortet P, Creff A, Somerille S, Rolland N Doumas P, Narcy P, Herrera-Estrella L, Nussaume L, Thibaud MC: A genome-wide transcriptional analysis using Arabidopsis thaliana Affymetrix gene chips determined plant responses to phosphate deprivation. Proceedings of the National Academy of Sciences of the United States of America 2005, 102:11935-11939. 35. Fujimoto SY, Ohta M, Usui A, Shinshi H, Ohme-Takagi M: Arabidopsis Ethylene-Responsive Element Binding Factors Act as Transcriptional Activators

64

or Repressors of GCC Box-Mediated Gene Expression. The Plant Cell 2000, 12:393- 404. 36. Schikora A, Schmidt W: Iron stress-induced changes in root epidermal cell fate are regulated independently from physiological responses to low iron availability. Plant Physiol 2001, 125(4):1679-1687. 37. Waters BM, Chu HH, Didonato RJ, Roberts LA, Eisley RB, Lahner B, Salt DE, Walker EL: Mutations in Arabidopsis yellow stripe-like1 and yellow stripe-like3 reveal their roles in metal ion homeostasis and loading of metal ions in seeds. Plant Physiol 2006, 141(4):1446-1458. 38. Schmidt W, Tittel J, Schikora A: Role of hormones in the induction of iron deficiency responses in Arabidopsis roots. Plant Physiol 2000, 122(4):1109-1118. 39. Ohme-Takagi M, Suzuki K, Shinshi H: Regulation of Ethylene-Induced Transcription of Defense Genes. Plant Cell Physiology 2000, 11:1187-11192. 40. Wang KLC, Li H, Ecker JR: Ethylene Biosynthesis and Signaling Networks. The Plant Cell 2002, 2002:S131-S151. 41. Haas C, Lohrmann J, Albrecht V, Sweere U, Hummel F, Yoo SD, Hwang I, Zhu T, Schafer E, Kudla J, Harter K: The response regulator 2 mediates ethylene signaling and hormone signal integration in Arabidopsis. The EMBO Journal 2004, 23:3290-3302. 42. Bleecker AB, Kende H: Ethylene: a gaseous signal molecule in plants. Annual review of cell and developmental biology 2000, 16:1-18. 43. Sperotto RA, Riachenevsky FK, Fett JP: Iron deficiency in rice shoots: identification of novel induced genes using RDA and possible relation to leaf senescence. Plant cell reports 2007. 44. Noguchi A, Saito A, Homma Y, Nakao M, Sasaki N, Nishino T, Takahashi S, Nakayama T: A UDP-Glucose:Isoflavone 7-O-Glucosyltransferase from the Roots of Soybean (Glycine max) Seedlings: PURIFICATION, GENE CLONING, PHYLOGENETICS, AND AN IMPLICATION FOR AN ALTERNATIVE STRATEGY OF . The Journal of biological chemistry 2007, 282(32):23581-23590. 45. Plou FJ, Martin MT, Gomez de Segura A, Alcalde M, Ballesteros A: Gluccosyltransferases acting on starch or sucrose for the synthesis of oligosaccharides. Canadian Journal of Chemistry 2002, 80:743-752. 46. Parodi AJ, Mendelzon DH, Lederkremer GZ, Martin-Barrientos J: Evidence that transient glucosylation of protein-linked Man9GlcNAc2, Man8GlcNAc2, and Man7GlcNAc2 occurs in rat liver and Phaseolus vulgaris cells. The Journal of biological chemistry 1984, 259(10):6351-6357. 47. Tipping AJ, McPherson MJ: Cloning and molecular analysis of the pea seedling copper amine oxidase. The Journal of biological chemistry 1995, 270(28):16939- 16946. 48. Sun B, Jing Y, Chen K, Song L, Chen F, Zhang L: Protective effect of nitric oxide on iron deficiency-induced oxidative stress in maize (Zea mays). Journal of plant physiology 2007, 164(5):536-543. 49. Zaharieva TB, Abadia J: Iron deficiency enhances the levels of ascorbate, glutathione, and related enzymes in sugar beet roots. Protoplasma 2003, 221(3- 4):269-275.

65

50. Ranieri A, Castagna A, Baldan B, Soldatini GF: Iron deficiency differently affects peroxidase isoforms in sunflower. Journal of experimental botany 2001, 52(354):25-35. 51. Rose TL, Bonneau L, C. D, Marty-Mazars D: Starvation – induced expression of autophagy-related genes in Arabidopsis. Biol Cell 2006, 98:53-67. 52. Alignan M, Hewezi T, Petitprez M, Dechamp-Guillaume G, Gentzbitten L: A cDNA microarray approach to decipher sunflower (Helianthus annuus) responses to the necrotrophic fungusPhoma macdonaldii. New Phytologist 2006, 170:523- 536. 53. Goritschnig S, Zhang Y, Li X: The ubiquitin pathway is required for innate immunity in Arabidopsis. The Plant Journal 2007, 2007:540-551. 54. Thimm O, Essigmann B, Kloska S, Altmann T, Buckhout TJ: Response of Arabidopsis to Iron Deficiency Stress as Revealed by Microarray Analysis. Plant Physiology 2001, 127:1030-1043. 55. Garbarino JE, Oosumi T, Belknap WR: Isolation of a polyubiquitin promoter and its expression in transgenic potato plants. Plant Physiology 1995, 109:1371-1379. 56. Hudson BP, Martinez-Yamout MA, Dyson HJ, Wright PE: Recognition of the mRNA AU-rich element by the zinc finger domain of TIS11d. Nature structural & molecular biology 2004, 11(3):257-264. 57. Connolly EL, Fett JP, Guerinot ML: Expression of the IRT1 metal transporter is controlled by metals at the levels of transcript and protein accumulation. The Plant cell 2002, 14(6):1347-1357. 58. Brumbarova T, Bauer P: Iron-Mediated Control of the Basic Helix-Loop-Helix Protein FER, a Reulator of Iron Uptake in Tomato Plant Physiology 2004, 137:1018-1026. 59. Aravind L, Koonin EV: The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases. Genome biology 2001, 2(3):RESEARCH0007. 60. Rietz S, Holk A, Scherer GF: Expression of the patatin-related phospholipase A gene AtPLA IIA in Arabidopsis thaliana is up-regulated by salicylic acid, wounding, ethylene, and iron and phosphate deficiency. Planta 2004, 219(5):743- 753. 61. Agarwal P, Arora R, Ray S, Singh AK, Singh VP, Takatsuji H, Kapoor S, Tyagi AK: Genome-wide identification of C(2)H (2) zinc-finger gene family in rice and their phylogeny and expression analysis. Plant molecular biology 2007, 65(4): 467-85. 62. Ray S, Agarwal P, Arora R, Kapoor S, Tyagi AK: Expression analysis of calcium-dependent protein kinase gene family during reproductive development and abiotic stress conditions in rice (Oryza sativa L. ssp. indica). Mol Genet Genomics 2007, 278(5): 493-505. 63. Fehr WR, Caviness CE: Stages of Soybean Development, Special Report 80. In., vol. 80: Cooperative Extension Service, Iowa State University; 1977: 1-12. 64. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R: NCBI GEO: mining tens of millions of expression profiles--database and tools update. Nucleic Acids Res 2007, 35(Database issue):D760-765.

66

65. Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002, 30(1):207-210. 66. Schmittgen TD, Zakrajsek BA, Mills AG, Gorn V, Singer MJ, Reed MW: Quantitative Reverse Transcription-Polymerase Chain Reaction to Study mRNA Decay: Comparison of Endpoint and Real-Time Methods. Analytical Biochemistry 2000, 285:194-204. 67. Gong Q, Li P, Ma S, Indu Rupassara S, Bohnert HJ: Salinity stress adaptation competence in the extremophile Thellungiella halophila in comparison with its relative Arabidopsis thaliana. Plant J 2005, 44(5):826-839. 68. Kasukabe Y, He L, Nada K, Misawa S, Ihara I, Tachibana S: Overexpression of spermidine synthase enhances tolerance to multiple environmental stresses and up-regulates the expression of various stress-regulated genes in transgenic Arabidopsis thaliana. Plant & cell physiology 2004, 45(6):712-722.

67

Figure 1: Whole Root Reductase Assay Results Across Various Iron Concentrations

The iron efficient Clark plant shows a statistically significant increase in reductase activity at 50uM Fe(NO3)3, iron deficient conditions for the microarray experiment. At the same iron concentration, the iron inefficient IsoClark shows low levels of reductase activity.

68

177 103 118 164 55 19 107 24/ 64 63 49 ------

36 0E Value - -

UniRef Blast UniRef E 1.00E 1.00E 1.00E 0 1.0 1.00E 2.00E 1.00E 8e 1.00E 2.00E 2.00E 1.00E

Cluster Members 1Fe,1PS,1ST 118PS, 3Fe, 107ST 1Fe,2PS,1ST 118PS, 3Fe, 107ST 3ST 1Fe, 6PS, 11ST 2Fe, 6PS, 11ST 2Fe, 5PS,6ST/ 1Fe, 5ST 1Fe,9PS, 8PS, 33ST 2Fe, 8PS, 33ST/ 2Fe, 2Fe 2Fe 1PS,7ST 1Fe,

4

-

Bisphosphate Bisphosphate

-

hatase type 2C / type hatase TBLASTX UniRef DB UniRef DB TBLASTX Annotation Fructose Aldoslase Protein Serine/Threonine Kinase Peroxissomal Copper OxidaseContaining Factor eIF Initiation gamma Protein, Finger Zinc Cys3His H2 Protein, Finger Zinc Protein Finger Zinc Phosp Responsive Ethylene Factor Transcription Responsive Ethylene Factor/ Transcription Ubiquitin Ubiquitin Regulator Response

isogenic lines that cluster with other stress induced genes stress that lines other isogenic induced with cluster

- UNIREF 100 Q9SJQ9 Q9M590 Q9XHP 4 Q8H0T8 Q9XEE6 Q9ZT44 Q9C9T6 Q9LDA 7 Q56E95 Q9FE67 Q49976

Associated Associated TC TIGR GmTC206003 GmTC220166 GmTC223013 GmTC225799 GmTC224861 GmTC219139 AW831928 GmTC208403 GmTC225579 GmTC214121 GmTC214518 GmTC228370

P Value 0.2128 0.0012 0.0417 0.0391 0.0366 0.1137 0.0036 0.0262 0.0208 0.0078 0.0163

Federated Federated Ratio 0.296 2.176 2.204 2.409 2.412 2.536 2.701 2.597 2.639 2.936 5.219 2.821

1674 6047 8683 8247 8188 6637 5360 2360 7092 2900 6890 8604 ------

c1004 c1028 c1028 c1028 c1004 c1028 c1028 c1009 c1004 c1009 c1028 c1028 ------

Clone ID Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Table 1: Genes differentially expressed between iron differentially between Genes 1: near expressed Table

69

e of of e 67 136 95 143 47 26 66 115 48 153 ------

0E

in the in iron 1.00E 1.00E 2.00E 1.00E 2.00E 1.00E 2.00E 1.00E 1.0 1.00E N/A 0

two near two isogenic

s depravation (PS), or general general (PS), depravation s or

ce to which the which to clone ID ce

expressed genes genes expressed - 1Fe,1PS 1Fe,1PS 1PS, 1ST 1Fe, 31PS, 15ST 2Fe, 1Fe,1PS 5ST 1Fe,22PS, 4PS,3ST 1Fe, / 1Fe, 1Fe,2ST 2PS,1ST 1ST 1Fe,1PS, 31PS, 15ST 2Fe, 2PS, 2ST 1Fe, 118PS, 3Fe, 107ST

4 -

e

value is the association of the the of valueannotationassociation the is to -

RER1A Protein Chaperonin RuBisCo Activase Protein SNARE Protein Methyltransferase RNA Peroxidase Precursor 4 Dihydroflavonol Reductase Shock Protein Heat Hsp70 Protein Binding RNA PIP1 Protein Aquaporin E<10E Hit No UniRef Kinas Protein Map

Q69IX0 Q75HJ3 Q6J4N8 Q2V2S5 Q9SKM 5 Q9ZNZ6 Q9SPJ5 Q9M6R 1 O80567 Q946J9 No UniRef Q9MA1 7 and fold changes below 0.5 represent under changes fold represent 0.5 and below

205220 218842

. TIGR TC represents the tentative consensus sequen consensus tentative the TC represents TIGR .

expressed expressed - GmTC GmTC228039 GmTC228924 GmTC209508 GmTC230619 AW704123 GmTC GmTC204156 GmTC206397 GmC225028 AW666293 GmTC216364

0.0246 0.0340 0.0083 0.0026 0.0091 0.0361 0.0713 0.0018 0.1557 0.0542 0.027 0.1109

red to the iron efficient plant redefficient iron the to 2.827 2.891 3.01 3.117 3.138 3.156 3.214 3.583 3.593 3.776 5.576 5.174

8161 6556 3137 2333 1706 2326 1633 5349 6630 2676 4123 9215

------

(Continued) c1028 c1028 c1013 c1013 c1028 c1028 c1028 c1028 c1004 c1028 c1028 c1028 ------

Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm

Table 1 Table

The clone ID identifies the specific clone spotted on the microarray. Federated Ratio is the fold change between the Federated the ID between spotted change is Ratio identifies clone clone fold the specific microarray. the on The genes over above represent 2 Fold lines. changes compa plant inefficient showing genes similarity high of TIGR. to annotation to function UNIPROT is according sequenc the belongs identified the The phosphoru iron by (Fe), induced genes of the deficiency indicate number Cluster TC. members TIGR the E grouping. similarity cluster share sequence high that (ST) unique a form to stress abiotic sequence. TC TIGR the

70

11 177 25 133 35 147 18 59 61 51 11 32 179 108 ------

Value - .00E UniProt Blast E 7 1.00E 1.00E 1.00E 3.00E 1.00E 0 3.00E NA NA 5.00E 8.00E NA 3.00E 4.00E NA 3.00E 1.00E 1.00E

1 -

4 -

4 4 ATPase - - -

TIR Type Disease Type TIR Disease - LRR -

glucosyltransferase Fe(II) Oxygenase Oxygenase Fe(II) - - NBS - TBLASTX UniProt DB TBLASTX Annotation DP1 Factor Transcription non 26S Proteasome Subunit Regulatory 2OG Protein Endomembrane Protein Membrane hASNA Homolog Putative arsA Polyprotein 2 Large Protein Membrane Plant Specific E<10 Hit No UniRef E<10 Hit No UniRef Enzyme Ubiquitin Conjugating Ring Finger Protein Zinc TIR Protein Resistance Protein Decay Mediated Nonsense UPF3 Receptor Ethylene E<10E Hit No UniRef Cell Motility and Phagocytosis Protein UDP Ring Finger Protein Zinc

38

UniProt Q7XZ14 Q06364 Q8RWY1 Q940G0 Q6DBF6 Q949M9 Q8JUF1 Q7XYW5 No UniRef No UniRef Q3HVN0 Q9C9T6 Motif Analysis Q1SL19 Q6W5B6 Q9LR39 Q9LY Q8H9B4 Q2TE73

Associated Associated TC TIGR BE021708 GmTC215393 AW831377 GmTC219105 GmTC227948 GmTC227091 GmTC225133 GmTC203969 AW278268 BE021665 GmTC204328 GmTC226909 GmTC229698 AW704680 BE021924 BE021484 GmTC217970 GmTC217285 GmTC225698

isogenic lines that did not cluster with other iron or stress induced genes induced isogenicstress did that lines or iron other with not cluster -

Value - P 0.1037 0.0403 0.0316 0.0311 0.0426 0.0065 0.0241 0.1011 0.0551 0.0004 0.0599 0.1685 0.0179 0.0102 0.0041 0.2337 0.0359 0.1264 0.0057

788 Federated Federated Ratio 2.217 2.247 2.379 2.471 2.471 2.663 2.772 2. 2.892 3.087 3.429 3.435 3.532 3.532 3.564 3.687 3.712 4.181 7.149

8390 6580 4867 7485 2190 720 5020 6717 2578 8336 6231 2943 1850 4530 8658 8183 3740 1088 963 ------

c1028 c1028 c1028 c1028 c1028 c1028 c1004 c1004 c1009 c1028 c1004 c1013 c1028 c1028 c1028 c1028 c1028 c1028 c1028 ------

Clone ID Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm Gm

Table 2: Genes differentially expressed between differentially between near Genes 2: expressed Table

71

This list of genes represents the genes identified as differentially expressed between the NILs under iron deficient conditions which do not have sequence homology to other known stress induced genes. The sequences of these genes also showed no homology to other genes induced by iron deficiency and differentially expressed between the NILs. The clone ID identifies the specific clone spotted on the microarray. The Federated Ratio is the fold change between the two near isogenic lines. Fold changes above 2 represent genes over-expressed in the iron inefficient plant compared to the iron efficient plant while fold changes below 0.5 represent genes under-expressed in the iron inefficient plant compared to the iron efficient plant. The TIGR TC represents the tentative consensus sequence to which the clone ID belongs according to TIGR. The UNIPROT annotation is the identified function of genes showing high similarity to the sequence of the TIGR TCThe E-value is the association of the annotation to the TIGR TC sequence.

72

Fe Inefficient Standard Error 0.255 0.015 0.030 0.040 0.040 0.060 0.010 0.045 0.05 efficient) and efficient)

- an an was Fe Efficient Standard Error 0.345 0.185 0.115 0.020 0.080 0.040 0.055 0.025 0.060

quantitative real time real quantitative

- 7 Fold Change Identified Real by RT Time PCR 7.56 2.9 2.7 2. 2.82 5.12 4.2 3.66 2.8

Federated Ratio 2.379 2.412 2.701 2.772 3.156 3.453 3.564 3.687 7.149

Primer Reverse AAAAGGCCTGGAATGCTC GCAGGAGCAGATGGTAGC AAAAGGCCTGGAATGCTC CGGCTACTCCCTATCCA GGACAGAGGGAGAGATCAGG GATTGTATTTCCCGTGGATT GTGAATGCGCGAAGGAT ACATTGGCTATTTACTTACA GCCACTGCCCTGTCTTACTC

d by microarray analysis confirmed by semi by confirmed analysis microarray by d

G

inefficient) inefficient) identifie - Forward Primer CAGTGGAACTTCGTTGG CCCTGATCTAGAAGTTGG CAGTGGAACTTCGTTGGG GAAGAACAGCGAAACCTAAC CAAGAGCATGATCTACCAGC CGAACCCAAACAAGATACAC TCCAACTCCATCGTCGAG CCAAGCTGGACCATA TGCCATCACTGTTTATCAAG

10) - 4867 8188 5360 5020 2326 2943 8658 8183 963 ------8

Clone ID (Gmc 28 04 28 04 28 13 28 2 28

Ten genes were chosen at random to have their differential expression between Clark (Fe Clark between differential were their expression chosen to atgenes have random Ten (Fe IsoClark ten the of genes of nine chosen. was the four In confirmed for expression PCRDifferential analysis. th differential NILs the expression time between real the PCRgreater genes nine showed analysis. microarray by identified

Table 3: Real Time PCR results confirming differential expression identified by microarray analysis identified Time differential PCR microarray by expression confirming Real 3: results Table

73

Chapter 3. Recovering from Iron Deficiency Chlorosis in Near Isogenic Soybeans:

A Microarray Study

A paper published in Plant Physiology and Biochemistry 2007 45(5): 287-282

Jamie A. O’Rourke7, Michelle A. Graham8, Lila Vodkin9, D. Orlando Gonzalez3, Silvia

R. Cianzio and Randy C. Shoemaker

Abstract

Iron deficiency chlorosis (IDC) in soybeans has proven to be a perennial problem in the calcareous soils of the U.S. upper Midwest. A historically difficult trait to study in fields, the use of hydroponics in a controlled greenhouse environment has provided a mechanism to study genetic variation while limiting environmental complications. IDC susceptible plants growing in calcareous soils and in iron-controlled hydroponic experiments often exhibit a characteristic chlorotic phenotype early in the growing season but are able to re-green later in the season. To examine the changes in gene expression of these plants, near isogenic lines, iron efficient PI548553 (Clark) and iron inefficient

PI547430 (IsoClark), developed for their response to iron deficiency stress (22) were grown in iron-deficient hydroponic conditions for one week, then transferred to iron sufficient conditions for another week. This induced a phenotypic response mimicking

7 Department of Genetics, Developmental and Cellular Biology, Iowa State University, Ames, Iowa 50011 USA 8 USDA-ARS, Corn Insect and Crop Genetics Research Unit, Iowa State University, Ames, Iowa 50011 USA 9 Department of Crop Sciences, University of Illinois Urbana-Champaign, Urbana, Illinois 61801 USA

74

the growth of the plants in the field; initial chlorosis followed by re-greening. RNA was isolated from root tissue and transcript profiles were examined between the two near- isogenic lines using publicly available cDNA microarrays. By alleviating the iron deficiency stress our expectation was that plants would return to baseline expression levels. However, the microarray comparison identified four cDNAs that were under- expressed by a two-fold or greater difference in the iron inefficient plant compared to the iron efficient plant. This differential expression was re-examined and confirmed by real time PCR experimentation. Control experiments showed that these genes are not differentially expressed in plants grown continually under iron rich hydroponic conditions. The expression differences suggest potential residual effects of iron deficiency on plant health.

Key Words

Iron Deficiency Chlorosis (IDC), Microarray, Hydroponics, Soybean, Real Time PCR

(RT PCR)

Introduction

Iron deficiency chlorosis (IDC) results in substantial yield loss throughout the upper Midwestern farmlands each year. The high pH of calcareous soils hinders the formation of the Fe+2 ion required for plant uptake. With little to no biologically available iron, plants develop a characteristic interveinal chlorotic patterning of the leaf tissue. If iron deficient growth conditions are not alleviated, the plant will suffer from stunted growth and reduced yield (9). Often, the chlorotic phenotype appears early in the

75

growing season and disappears as the plants continue to grow and mature throughout the growing season (11). Long lasting effects of periods of iron deficiency stress can be observed in the decreased yield of plants that suffered from iron limiting conditions early in the growing season. This carryover effect of brief periods of iron deficiency stress on yield suggests the genetic changes occurring early in the plant development have a continuing effect later in the plants’ life cycle.

Quantitative trait phenotypes are a result of the combined effect of environmental and genetic factors. IDC is an extremely quantitative trait with a large environmental component that makes traditional field studies problematic (5, 7, 14). However, previous

IDC studies have identified a hydroponics system that allows plants to be grown in a manner whereby the environment is controlled. In replicated trials, the same iron deficiency QTLs are identified in soybean plants grown hydroponically and in field studies (14). Therefore, this hydroponics system has proven to be a reliable method to examine gene expression of soybean plants suffering from iron deficiency chlorosis (14).

In this study we examined the gene expression profiles of iron efficient and iron inefficient near isogenic lines (NILs) grown in the hydroponics system. After one week in hydroponic iron deficient conditions, at which point differential chlorosis was observed between the iron efficient and inefficient NILs, plants were transferred to iron sufficient conditions for one week, after which both iron efficient and inefficient plants appeared green and healthy. Manipulation of the iron content of the hydroponics system allowed us to mirror the phenotypic response of plants in the field. Transcripts with altered expression levels between iron efficient and iron inefficient plants may be indicative of the long residual effects of iron deficiency on soybean plant health.

76

Results

Transcript levels of near isogenic soybeans, iron efficient Clark (PI548533) and iron inefficient IsoClark (PI547430) were compared by microarray analysis. Plants were grown in hydroponics under iron limiting conditions (50uM Fe(NO3)3) for one week, at which time the inefficient line expressed a chlorotic phenotype (Figure 1). Plants were then transferred to iron sufficient conditions (100uM Fe(NO3)3) for one week, at which time both genotypes were equally green. RNA was then extracted from root tissue of both iron efficient and iron inefficient plants. The RNA was fluorescently labeled and hybridized to cDNA microarray slides containing 9,728 cDNAs (25) in a balanced dye swap design. A comparison of four biological replicates, with two technical replicates each, for a total of eight microarray hybridizations, revealed four genes exhibiting directionally consistent differential expression on six arrays. These four genes exceeded a two-fold difference of expression between the two genotypes on six of the eight slides

(Table I). All four of the transcripts showed reduced expression levels in the iron inefficient plants in comparison to the iron efficient plants.

As a control, NILs grown in iron sufficient hydroponic solutions (100uM

Fe(NO3)3) for two weeks were also harvested and analyzed on cDNA slides containing the 9,728 genes above, plus an additional 9,272 genes. A comparison of three biological replicates with two technical replicates apiece, for a total of six total hybridizations, were analyzed. Using the same stringency levels as the iron recovery experiment, directionally consistent differential expression on six arrays, we observed no differential expression between the two near isogenic lines grown continually under iron sufficient conditions for any of the 9,728 genes examined in the iron recovery experiment or any of the 9,272

77

additional genes analyzed. However, if stringency levels are lowered to five of six arrays, a small number of genes differentially expressed between the two lines are observed. The four genes identified in this experiment were not among that group (see supplemental data at http://soybase.org/publication_data/ORourke/IronStressRecovery/index.html). The lack of differential expression between the iron efficient and inefficient plants grown under continual iron sufficient conditions has also been observed in an Affymetrix gene chip experiment (O’Rourke and Shoemaker, unpublished data). Thus, the differential expression seen under the iron recovery conditions are likely a result of the NILs’ differential response to the changing iron environment.

The expression patterns identified in the microarray experiment were confirmed by reverse transcriptase real time PCR (RT PCR) (Table 1). As observed with the microarray data, iron inefficient plants had lower expression levels than iron efficient plants (see example, Figure 2, Table 1). Dissociation curve analysis confirmed that the

RT PCR reaction amplified a single product for each NIL/gene combination. However, a shift in melting temperature between the iron efficient and iron inefficient plants for the asparagine aminohydrolase gene encoded by Gm-c1028-5479 (Figure 3) suggests sequence differences in the coding region for the gene.

The GenBank accessions of the four EST sequences differentially expressed under the iron recovery conditions were queried against The Institute for Genomic Research

(TIGR) soybean gene index (Version 12.0, 17) to identify the tentative consensus (TC) sequence containing the EST of interest. BLASTX (1) was used to compare the TCs to the Uniref100 protein database (February 2006, 2) and assign putative function (Table 1).

78

In addition, the TCs were queried against an in-house database of stress-induced genes collected from the literature. Of the four TCs, only one showed any homology to other stress-induced genes. This dissimilarity reinforces the idea that these genes are involved in the recovery process of iron deficiency. The gene showing similarity to other stress induced genes, Gm-c1028-6182, was part of a TIGR TC with a UNIREF annotation of a transmembrane protein kinase receptor.

Discussion

Iron deficiency stress is known to reduce yield of soybeans even though no visual indication of iron chlorosis can be observed. In this study we identified four genes differentially expressed between the two near isogenic lines after a return to iron sufficient conditions and a return to normal green phenotypes. These four genes were not differentially expressed under iron deficient conditions (O’Rourke et al. Unpublished

Data) nor do they represent constitutive differences between the NILs grown under iron sufficient growth conditions.

The three genes with putative functional annotations can be ascribed hypothetical roles in iron deficiency-related processes. GmTC221258 had a UNIREF annotation of a ribophorin protein (Table I). Ribophorin proteins are a subunit of the greater oligosaccharyltransferase (OST), which is involved in the glycosylation of proteins entering the endoplasmic reticulum (16, 13). The ribophorin proteins I and II are thought to be essential components of the OST (16, 13) serving to crosslink the 60s ribosomal subunit to the endoplasmic reticulum to form the rough ER (18). The under-expression of this gene in iron inefficient plants compared to iron efficient plants may indicate the

79

production of membranes and vesicles in iron inefficient plants have not yet, and may never, return to the levels seen in iron efficient plants subjected to the same environmental conditions.

The transmembrane protein kinase identified by GmTC209369 may be serving as a receptor protein, possibly recognizing iron availability or other environmental cues

(23). The kinase would then phosphorylate a secondary signaling molecule to initiate a cascade of responses, including root to shoot signaling (19), depending on the amount of iron present. Kinase phosphorylation often serves as a signal to initiate a signaling cascade within the cell. In rice, the over expression of a transmembrane protein kinase has been shown to induce aluminum tolerance (19). This mechanism could easily be extended to involve kinase activity in other heavy metal responses. Genes with sequences highly homologous to GmTC209369 have been shown to be differentially expressed under phosphate stress (10) and other forms of abiotic stress (12, 3).

The UNIREF annotation of GmTC217977 suggests this gene encodes an asparagine aminohydrolase. Asparaginease activity has been shown to provide ammonia as a nitrogen source for all nitrogen containing compounds within the cell. The enzymatic activity has been shown to be upregulated during protein synthesis (4, 15).

Activity levels of asparaginase are highest in roots and nodules (20) so it is not surprising the expression levels of asparagine aminohydrolase would be differentially expressed in the root systems of iron efficient and inefficient plants. Differential expression of the asparagine aminohydrolase suggests the protein synthesis and nitrogen metabolism pathways in the iron inefficient plant have not returned to the production levels of the iron efficient plant.

80

The real time PCR confirmation experiments detected greater differential expression between the NILs than was observed with the microarray. This may be because the microarray analysis, through hybridization, reflects the expression level of multiple members of a gene family while the RT PCR is specific to the individual family member represented by the EST sequence.

The dissociation analysis of Gm-c1028-5479 revealed sequence dissimilarity between the iron efficient and iron inefficient lines. Both genotypes amplified only one

PCR product, as shown by the single peak of the dissociation curve (Figure 3). However, the shift in melting temperatures between the two genotypes suggests the sequence of the asparagine aminohydrolase gene is longer in the iron inefficient plant than in the iron efficient plant. Alternatively, the near isogenic lines may have different GC concentrations in the region of the gene being transcribed, or, much less likely, we may be amplifying different members of the same gene family. That the gene encoded by

Gm-c1028-5479 appears to have a genotype-dependent coding region makes this gene an excellent candidate for further examination in its role of iron efficiency.

The four candidate genes identified by microarray analysis in this study are excellent candidate genes to study the long-term effects of iron deficiency on soybean.

The induction of these genes after returning to an iron sufficient environment suggests these genes may be directly involved in utilizing available iron. The sequence differences in the asparagine aminohydrolase gene suggested by the RTPCR dissociation curve suggests different members of the same gene family may be differentially transcribed in different genotypes possibly resulting in different activity levels within the plants. The lower expression levels of these genes in iron inefficient plants suggests they may not

81

reach expression levels achieved by iron efficient plants. This reduced expression pattern may reflect reduced activity levels, which, speculating, could explain the reduced yield often seen from plants experiencing iron deficiency chlorosis early in the growing season. This study has identified a small subset of the genes likely involved in this complicated metabolic process. In the future, experiments using larger arrays such as the

Affymetrix gene chip may help to identify additional candidate genes.

Methods

Plant Growth Conditions:

Near isogenic soybean lines, PI548533 ‘Clark’ and PI547430 ‘IsoClark’, were grown in the Ames, Iowa USDA greenhouse under 16hr photoperiods. Plants were germinated in sterile vermiculite with distilled deionized water. After one week they were transferred to a DTPA nutrient buffered hydroponics system (6) containing all minerals necessary for normal growth with iron being the only limiting component.

Specifically, each 10L system contained 2mM MgSO4*7H2O, 3mM Mg(NO3)2*6H2O,

2.5mM KNO3, 1mM CaCl2*2H2O, 4.0mM Ca (NO3)2*4H2O, 0.020mM KH2PO4,

542.5uM KOH, 217uM DTPA, 1.52uM MnCl2*4H2O, 4.6uM ZnSO4*7H2O, 2uM

CuSO4*5H2O, 0.20uM NaMoO4*2H20, 1uM CoSO4*7H2O, 1uM NiSO4*6H2O, 10uM

H3BO3, and 20mM HCO3. A pH of 7.8 was maintained by the aeration of a 3% CO2: air mixture to each 10L system. To induce iron deficiency, plants were limited with 50uM

Fe(NO3)3 for one week after which the iron level was increased to a non-limiting 100uM

Fe(NO3)3. NILs also were grown constitutively at 100uM Fe(NO3)3, as controls. A supplemental nutrient solution containing 16mM potassium phosphate, 0.287mM boric acid and 355mM ammonium nitrate was added daily to all plants to maintain proper plant

82

nutrition. As further visual controls and to ensure that chlorosis was due to iron deficiency stress, A15, an iron efficient plant and T203, iron inefficient plant, were also included with each experimental replication. At the end of this period plants were at the

V3 stage (8).

RNA Extraction and Microarray Hybridizations

The iron recovery experiment was comprised of four biological replicates, each with two technical replicates for a total of eight hybridizations. Samples were hybridized to a spotted slide containing 9,728 cDNAs from Gm-c libraries 1021 and 1083, which are root specific. Gene expression patterns of plants for the control experiment were compared across three biological replicates, with two technical replicates apiece, for a total of six hybridizations. Samples for the control plants were hybridized to spotted slides containing the 9,728 cDNAs present on the experimental slides plus an additional

9,272 cDNAs. The additional genes allowed for a more global analysis of known root transcripts to determine if there are any inherent genetic differences between the near isogenic lines.

Total RNA was extracted from root tissue of iron recovery plants using a modified phenol chloroform extraction with a lithium chloride precipitation protocol as set forth by the NSF Soybean Microarray Workshop in May of 2000 (24). Samples were composed of root tissue from four individual plants, all grown in the same hydroponic unit. Samples were further purified using RNeasy kits from Qiagen. RNA was extracted from iron sufficient control plants following the RNA extraction protocol of the Qiagen

RNeasy kits. For both methods, each sample yielded 180ug of purified RNA, 90ug of purified RNA for each cDNA array of the dyeswapped slides.

83

Purified RNA samples were split into 90ug aliquots and heated with oligo dT for

10 min. 20uL of 1X Buffer, 10mM DTT, 500uM low T dNTPs, 100uM Cy3 or Cy5

(Amersham Biosciences), and 13u/uL SuperScriptII (Invitrogen) was added to each sample then placed at 420C for 2hours. Remaining RNA was degraded with an

RnaseA/H treatment. Labeled Clark and IsoClark cDNA samples were mixed in a balanced dye swap design labeled with PolyA DNA and hybridized for 18hours at 42oC.

After overnight hybridizations, slides were washed (wash 1: 1XSSC, 0.2%SDS, wash 2:

0.2XSSC, 0.2%SDS, wash 3: 0.1XSDS) to remove unbound cDNAs. Slides were scanned with ScanArray Express (Stratagene) and resulting images were overlaid and spots identified by the ImaGene program. An analysis program developed at the

University of Illinois (21) was used to identify differentially expressed cDNAs. For our purposes, differential expression is defined as a minimum of a two fold over or under expression of the cDNA in IsoClark relative to Clark.

Real Time PCR

For the RT-Real Time PCR experiments 200ng of RNA extracted from root tissue served as initial template for each sample. Primers were designed based on available EST sequences from GenBank to produce 250bp amplicons. Primer sequences were as follows: Gm-c1028-8295 F: GGCCACCATGTAACTTATTC R:

ACTGGCATTGCTGATTGACA Gm-c1028-7992 F: CACTGTTAAATTGCCTGATGC

R: CTCGCACCAACTCTTTAGC Gm-c1028-6182 F:

CAGTGGGAAAGAATCTTGTCAC R: GCCATATTCACTGAGAGGTTAC Gm- c1028-5479 F: GACATTCCAAGGTTGCGTAGGC R:

84

CGCCATTTCTGTTTCGCTTATGG, Tubulin F:CAATTGGAGCGCATCAAT R:

ATACACTCATCAGCATTCTC. Stratagene’s Brilliant qRT-PCR kit was used with each 25uL reaction assembled as described by the Stratagene instruction manual (Catalog

#600532) with 2.5uL of 50mM MgCl2, and 2uL of 50nM F and R primers. Cycling protocols consisted of a 45 min. at 42OC, 10min at 95OC, 40 cycles of 30sec at 95OC, 1 min at 62O, and 30sec at 72OC. The PCRs were run in the Stratagene Mx3000P followed by a dissociation curve, taking a fluorescent reading at every degree between 55OC and

95OC to ensure only one PCR product was amplifying. The Stratagene analysis system established a threshold fluorescence level where amplicon fluorescence levels were statistically higher than background fluorescence; this threshold level is referred to as the

Ct value, the cycle at which the samples fluorescence is above threshold. As controls, a passive reference dye was added to each sample, to ensure recorded fluorescence levels were due to SYBR green incorporation. Additionally, each sample was also run in triplicate and each sample was also normalized against tubulin amplification, see primers above, to ensure the differential expression was not due to differing amounts of initial

RNA template added to each sample. To be considered differentially expressed, the iron efficient and iron inefficient plants at the same time point had to differ in where they crossed the fluorescence threshold by more than 1 cycle.

Dissociation analyses of RT PCR products were conducted by measuring fluorescence levels between 55 C and 95 C on a Stratagene MX3000P.

85

References:

1. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ,

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Nucleic Acids Res 25 (1997) 3389-3402.

2. Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E,

Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh

LS, UniProt: the universal protein knowledgebase. Nucleic Acids Res. (2004) D115-

D119

3. Bae, H., M.S. Kim, R.C. Sicher, H.J. Bae, and B.A. Bailey, Necrosis and Ethylene

Inducing Peptide from Fusarium oxysporum Induces a Complex Cascade of Transcripts

Associated with Signal Transduction and Cell Death in Arabidopsis. Plant Physiology

141 (2006) 1056-1067.

4. Casado, A., J.L. Caballero, A.R. Franco, J.Cardenas, M.R. Grant, and J. Munoz-

Blanco, Molecular Cloning of the Gene Encoding the L-Asparaginase Gene of the

Arabidopsis thaliana . Plant Physiology 108 (1995) 1321-1322.

5. Cianzio, S.R., W.R. Fehr, and I.C. Anderson, Genotypic Evaluation for Iron

Deficiency Chlorosis in Soybeans by Visual Scores and Chlorophyll Concentration.

Crop science 19 (1979) 644-646.

86

6. Coulombe, B.A., R.L. Chaney, and W.J. Wiebold, Bicarbonate Directly Induces Iron

Chlorosis in Susceptible Soybean Cultivars. Soil Science Society American Journal 48

(1984) 1297-1301.

7. Diers, B.W., S.R. Cianzio and R.C. Shoemaker, Possible Identification of Quantitative

Trait Loci Affecting Iron Efficiency in Soybean Journal of Plant Nutrition 15 (1992)

2127-2136.

8. Fehr, W.R. and C.E. Caviness. Stages of Soybean Development. Special Report 80,

Cooperative Extension Service, Iowa State University. CODEN:IWSRBCC(80) 1-12

(1977).

9. Froelich, D.M., W.R. Fehr, Agronomic Performance of Soybeans with Differing

Levels of Iron Deficiency Chlorosis on Calcareous Soil. Crop Science 21 (1981) 438-

441.

10. Graham, M.A., M. Ramirez, O. Valdes-Lopez, M. Lara, M. Tesfaye, C.P. Vance, and

G. Hernandez, Identification of Candidate Phosphorus Stress Induced Genes in Phaseolus vulgaris L. through Clustering Analysis Across Several Plant Species. Functional Plant

Biology, (2006). in press.

87

11. Hansen, N.C., M.A. Schmitt, J.E. Anderson, J.S. Strock (2003) Soybean: Iron

Deficiency of Soybean in the Upper Midwest and Associated Soil Properties. Agronomy

Journal 95: 1595-1601.

12. He, X.J., R.L. Mu, W.H. Cao, Z.G. Zhang, J.S. Zhang, and S.Y. Chen. (2005)

AtNAC2 a transcription factor downstream of ethylene and auxin signaling pathways, is involved in salt stress response and lateral root development. The Plant Journal 44: 903-

916.

13. Helenius, A., and M. Aebi. (2004) Roles of N-Linked Glycans in the Endoplasmic

Reticulum. Annual Review of Biochemistry 73: 1019-1049.

14. Lin, S., J.S. Baumer, D. Ivers, S. Cianzio and R. Shoemaker, Field and Nutrient

Solution Tests Measure similar Mechanisms Controlling Iron Deficiency Chlorosis in

Soybean. Crop Science 38 (1998) 254-259.

15. Michalska, K., G. Bujacz, and M. Jaskolski, Crystal Structure of Plant Asparaginase.

Journal of Molecular Biology 360 (2006) 105-116.

16. Nikonov, A.V., G. Krelbich, Organization of Translocon Complexes in ER

Membranes. Biochemical Society Transactions 31 (2003) 1253-1256

88

17. Quackenbush J, Cho J, Lee D, Liang F, Holt I, Karamycheva S, Parvizi B, Pertea G,

Sultana R, White J, The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res 29 (2001) 159-164

18. Rosenfeld, M. G., E.E. Marcantonio, J. Hakimi, V.M. Ort, P.H. Atkinson, D.

Sabatini, and G. Kreibich, Biosynthesis and Processing of Ribophorins in the

Endoplasmic Reticulum. The Journal of Cell Biology 99 (1984) 1076-1082.

19. Shivaguru, M., B. Ezaki, Z.H. He, H. Tong, H. Osawa, F. Baluska, D. Volkmann, and H. Matsumoto, Aluminum-Induced Gene Expression and Protein Localization of a

Cell Wall-Associated Receptor Kinase in Arabidopsis. Plant Physiology 132 (2003)

2256-2266.

20. Streeter, John, Asparaginase and Asparagine Transaminase in Soybean Leaves and

Root Nodules. Plant Physiology 60 (1977) 235-239.

21. Thibaud-Nissen, F., R.T. Shealy, A.Khanna, and L.O. Vodkin. Clustering of

Microarray Data Reveals Transcript Patterns Associated with Somatic Embryogenesis in

Soybean. Plant Physiology 132 (2003) 118-136.

22. USDA, ARS, National Genetic Resources Program, Germplasm Resources

Information Network – GRIN. (Online Database) National Germplasm Resources

89

Laboratory, Beltsville, Maryland. 2004. Available: http://www.ars.grin.gov/cgi- bin/npgs/html/acc_search.pl?accid=PI+547430.

23. Verica, J.A., L. Chae, J. Tong, P. Ingmire, and Z.H. He, Tissue-Specific and

Developmentally Regulated Expression of a Cluster of Tandemly Arrayed Cell Wall-

Associated Kinase –Like Kinase Genes in Arabidopsis. Plant Physiology 133 (2003)

1732-1746.

24. Vodkin, L., R. Shealy, S. Clough, R. Philip, T. Raph, Libby Shoop, S. Bergman, R.

Staggs, M. Karo (2000) NSF Soybean Microarray Workshop University of Illinois.

25. Vodkin, L., Khannaa, A., Shealy, R., Clough, S.J., Gonzalez, D.O., Philip, R.,

Zabala, G., Thibaud-Nissen, F., Sidarous, M., Stromvik, M.V., Shoop, E., Schmidt, C.,

Retzel, E., Erpelding, J, Shoemaker, R.C., Roxrigues-Huete, A.M., Polacco, J.C., Coryell,

V., Keim., Paul, Gong, G., Liu, L., Pardinas, J., Schweitzer, P., Microarrays for global expression constructed with a low redundancy set of 27,000 sequenced cDNAs representing an array of developmental stages and physiological conditions of the soybean plant. BMC Genomics 5: (2005) 73. doi:10.1186/1471-2164-5-73

90

-

Fold Change Fold Change in Gene Expression 3 4 2 256

ate a two fold or fold two a greater ate TC sequence. The RT The PCR sequence. TC RT PCR Cycle Differences 1.30 1.84 1.02 8.53

25 85 44 61 - - - -

Value change expression change is the gene in

- - E 2e 6e 2e 1e

els. Ratios indic 0.5 below els.

UniProt Annotation Hypothetical Ribophorin Kinase Asparagine Aminohydrolase

droponicexpression is level growth differential of conditions. The

TC TIGR Singleton TC221258 TC209369 TC217977 ression difference of the tartget transcript between the two NILs two the as transcript between the of tartget difference ression

Federated Ratio 0.284 0.299 0.290 0.298

- - - - unctional annotation of the EST sequence, the TIGR the EST max containing TC unctionalannotation Glycine the sequence, EST the of

c1028 c1028 c1028 c1028 - - - -

EST Designation Gm 8295 Gm 7992 Gm 6182 Gm 5479

Table 1: Differentially Expressed Genes Between NILs Grown Under Iron Recovery Conditions Between Expressed Genes Grown IronUnder Differentially 1: Recovery NILs Table

Expression levels and annotations for each of expressed from differentially ESTs the identified each annotations and four for levels Expression hy iron under recovery grown plants together levels calculated is expression inefficient all which adding by ratio, federated the by designated expression lev dividing sum the by efficient of and iron inefficient genotype. difference iron the and all In efficient between level genotype cases expression genotype. levels genotype in efficient the in inefficient the were levels expression than less expression f a provide To The on E the against TC performed and tBLASTx UniProtTIGR the identified database. was a was the UniProt similarity the TIGR the between and annotation represents value the transcript in target the the of crossing of NILs difference the is amplification difference cycle tubulin. with normalization after fold threshold The amplification of the change exp calculated amplification. in difference the by represented

91

Figure 1: Differential Chlorosis of Iron Efficient (Clark) and Iron Inefficient (IsoClark)

Chlorosis patterning of iron efficient and iron inefficient plants grown in hydroponics system under 50uM Fe(NO3)3. Note the severe interveinal chlorotic response of the iron inefficient plant (Isoclark) compared to the iron efficient plant (Clark). The differential chlorosis response suggests that while the plants have been subjected to the same treatments, they have different tolerances or responses to low iron conditions.

92

Figure 2: Amplification plot of Gm-c1-28-5479

Fluorescence levels of SYBR Green covalently bound to the amplicon represented by Gm-c1028-5479.

Confirming the microarray results, the expression level of the gene is lower in the Fe- inefficient line than in the Fe-efficient plant. In this case, the difference in Ct values is 8.53 cycles, which translates to a 256 fold difference. This is a greater change in expression levels between Fe-efficient and Fe-inefficient lines than seen in the microarray. It is possible that the microarray was examining expression levels of multiple members of the same gene family, while the RTPCR is only examining the expression level of one family member.

93

Figure 3: Dissociation Curve of Gm-c1028-5479

Fluorescence levels associated with the melting temperature of the amplicon of Gm- c1028-5479. The shift in melting temperatures between iron efficient and iron inefficient lines suggests the gene product of Gm-c1028-5479 genotype dependent. The single peak generated by the degradation of each amplicon confirm only one gene product was amplified for each genotype, reaffirming the sensitivity of the RTPCR. The different product sizes would suggest that this gene is an excellent candidate for further study in iron efficiency work.

94

Chapter 4. Integrating Microarray Analysis and the Soybean Genome to

Understand the Soybeans Iron Deficiency Response

A Genomic Study of Iron Deficiency Chlorosis in Soybeans

Jamie A. O'Rourke, Rex T. Nelson, David Grant, Jeremy Schmutz, Jane Grimwood,

Steven Cannon, Carroll P. Vance, Michelle A. Graham, and Randy C. Shoemaker

Abstract

Iron is an essential micronutrient for both plants and animals. Iron deficiency chlorosis (IDC) in soybeans, a major source of protein and edible oil for much of the world’s population, results in yield loss. The United States is the world’s largest producer of soybeans, but reports of up to 23% yield loss due to IDC are common in the calcareous soils of the upper Midwest. The use of microarray technologies has allowed

IDC research to move beyond quantitative trait locus (QTL) studies to the identification of specific candidate genes involved in the trait of interest. Near isogenic lines (NILs)

(PI548553 and PI547430), developed for their differential iron response, were grown hydroponically in iron sufficient and iron limited conditions. Transcriptional profiles of the plants were analyzed and compared using the Affymetrix® GeneChip® Soybean

Genome Array, which represents approximately 37,500 soybean EST transcripts. A comparison of iron efficient Clark plants (PI548553) grown under Fe-sufficient and Fe- limited conditions identified 835 candidate genes putatively involved in soybean's iron stress response. An identical comparison of iron inefficient IsoClark plants (PI547430)

95

identified 200 candidate genes. These same microarrays were also used to identify 211 single feature polymorphisms (SFPs) between the NILs. These SFPs represent a potential source of genetic variation involved in the differential iron stress response elicited by the two lines. Many of these SFPs are located in genes with high homology to transcriptional regulators. Semi quantitative real time PCR analysis of the near isogenic lines confirmed the differential expression of candidate genes identified by microarray analyses. Sequences of the differentially expressed genes, SFPs, and sequences of markers known to lie within iron QTLs were aligned against the 7X build of the soybean whole genome sequence to identify regions of transcriptional significance. This analysis identified 58 genes differentially expressed in the microarray experiment with a genetic location within known QTLs in the Clark genotype and 21 in the IsoClark genotype.

Additionally, 11 of the 211 SFPs aligned within the known QTL regions. A sliding window analysis of the microarray data and the 7X genome coupled with an iterative simulation model of the data showed the candidate genes exhibit clustering in the genome. Closely clustered genes in other species have been shown to be co-regulated.

An analysis of promoter regions of differentially expressed genes identified 11 conserved motifs in promoter regions of 248 differentially expressed genes, representing 129 clusters identified earlier and confirming the cluster analysis results. These conserved motifs support the hypothesis that the differentially expressed genes are co-regulated.

Additionally, the combined results of all analyses lead us to believe iron inefficiency in soybean is a result of a mutation in a transcription factor, which controls the expression of genes required in inducing an iron stress response.

96

Introduction

Iron is a critical micronutrient for both plant and animal nutrition, serving as an invaluable co-factor for a variety of cellular processes. Iron deficiency anemia is one of the leading nutritional disorders worldwide, affecting 43% of the population of developing countries [1]. For most of the world’s population, legumes are a major source of dietary iron [1, 2]. Though iron composes 5% of the earth’s crust [3], it is largely unavailable to plants. Additionally, 30% of the worlds’ soils are classified as calcareous

[4], with a pH greater than 7.5. Calcareous soils are especially prevalent in the upper

Midwest of the US [5, 6] and have been shown to have a direct correlation with iron deficiency in soybeans. IDC in soybeans is characterized by interveinal chlorosis of the developing trifoliates [7] and an end of season yield loss in direct proportion to the severity of the chlorosis [8].

Plants have evolved two systems to uptake iron from the soil. These systems are termed strategy I and II [9, 10]. Soybeans and other dicots utilize strategy I, in which the rhizosphere is acidified by the release of protons to produce a favorable environment for the release of iron from chelating agents in the soil. The Fe+3 ion is then reduced by a membrane bound reductase to the usable Fe+2 form and transported across the cell wall and plasma membrane into the cell by a specific transporter for distribution and use within the plant. The transport of the iron ion into the plant has been shown to be the rate-limiting step in IDC [11]. Graminaceous monocots utilize strategy II, whereby the roots release chelators called phytosiderophores to bind Fe+3 ions. Once bound, the entire complex is transported into the root where it is uncoupled. The Fe+3 ion is reduced to

Fe+2 and the phytosiderophores are re-released into the soil.

97

The quantitative nature of IDC makes field studies problematic. Previous studies have identified Quantitative Trait Loci (QTLs) associated with IDC [5, 12]. Many of the same quantitative trait loci (QTLs) have been identified in both field and greenhouse studies, where plants are grown in a hydroponics system designed specifically to induce

IDC [13]. Growing plants in a controlled greenhouse environment with regulated nutritional availability allows for reliable and replicable induction of iron deficiency stress. In addition, the advent of microarray technology now allows for the identification of individual genes whose expression levels are affected by iron availability [14, 15].

The availability of a whole-genome sequence assembly for the soybean genome has, for the first time, allowed us to genetically position differentially expressed genes induced by iron deficiency.

Genomic studies in many organisms have shown genes in close proximity to one another in the genome are often co-expressed. These co-expressed genes create clusters of expression neighborhoods [16]. A study in Arabidopsis showed clusters of up to 20 different genes were coordinately regulated, with a median cluster size of 100kb [17]. In rice, approximately five percent of the genome has been associated with co-expressed gene clusters [18]. These clusters are conserved by natural selection [19]. Initially co- expressed genes were thought to belong to similar biological pathways [17], but further studies have shown co-functionality to be a poor predictor of co-expression [20]. Instead, promoter analysis has found co-regulated genes are often regulated by common transcription factors [16, 20, 21]. The co-expression of clustered genes may be partially regulated by the interaction of promoters and transcription factors [21]. Co-regulated genes often have common transcription factors [20] so an increase in the transcription

98

factor binding site due to a high prevalence of promoter regions would increase the likelihood of the transcription factor binding and aiding in the expression of the gene cluster.

Materials and Methods

Plant Growth and RNA Extractions

NILs developed for their characteristic response to limited iron conditions, were developed by the USDA-ARS [22]. The iron efficient PI548533 (Clark) was crossed with iron inefficient T203 (PI54619). Seven repeated backcrosses to Clark yielded the iron inefficient line PI547430 (IsoClark). Both the iron efficient Clark and the iron inefficient IsoClark were germinated in sterile vermiculite and transferred to a DTPA buffered nutrient hydroponics system 7 days after planting. Each 10L hydroponic unit contained 2 mM MgSO4*7H2O, 3 mM Mg(NO3)2*6H2O, 2.5 mM KNO3, 1 mM

CaCl2*2H2O, 4.0 mM Ca(NO3)2*4H2O, 0.020 mM KH2PO4, 542.5 µM KOH, 217 µM

DTPA, 1.52 µM MnCl2*4H2O, 4.6 µM ZnSO4*7H2O, 2 µM CuSO4*5H2O, 0.20 µM

NaMoO4*2H20, 1 µM CoSO4*7H2O, 1 µM NiSO4*6H2O, 10 µM H3BO3, and 20 mM

HCO3. A pH of 7.8 was maintained by the aeration of a 3% CO2: air mixture. A supplemental nutrient solution containing 16 mM potassium phosphate, 0.287 mM boric acid and 355 mM ammonium nitrate was added daily to maintain proper plant nutrition.

Both iron efficient and iron inefficient plants were grown in iron sufficient (100uM

Fe(NO3)3) and iron limiting (50 uM Fe(NO3)3) hydroponic conditions. Leaf tissue from the 2nd trifoliate was collected 21 days after planting, or after 14 days in the hydroponics system. Tissue was flash frozen in liquid nitrogen and stored at –80 OC until RNA could

99

be extracted. Three independent biological replicates were used as the experimental tissue. RNA extractions were performed using the Qiagen RNeasy Plant Mini Kit

(catalog # 74904). RNA samples were submitted to the Iowa State University

GeneChip® facility to be hybridized and scanned using the Soybean Affymetrix®

GeneChip®. A model based expression index analysis (MBEI) [23] of the raw chip data identified perfect match probes with a two-fold or greater expression difference between the genotype and iron concentrations. An analysis of Clark plants grown in iron sufficient and iron deficient conditions showed 835 transcripts differentially expressed at two-fold or greater. IsoClark plants grown in identical conditions showed 200 transcripts that met the criteria for differential expression.

Candidate Gene Annotation

The candidate genes were queried against the SoyBase Affymetrix® GeneChip®

Soybean Genome Array Annotation page, publicly available at http://www.soybase.org/AffyChip/. Here, researchers with the USDA-ARS have used

BLASTX and TBLASTX [24] to compare the sequences from which all Affymetrix probes were derived to the UniProt database and the Arabidopsis genome gene calls

(TAIR7, http://www.arabidopsis.org/). The top three UniProt BLAST hits and the

Arabidopsis best hit GO annotation is reported for each Affymetrix probe set. To assign a putative function and classification to the differentially expressed genes

(Supplementary Data Tables 1, 2, and 3) the three UniProt annotations were compared.

If all three were identical that annotation was assigned to the gene. If the top three

BLAST hits were not in concordance, that sequence was re-examined to determine if one of the annotations was more likely correct than the others. If no annotation could be

100

confidently identified by BLAST analysis with UniProt, the differentially expressed gene was annotated as an unknown. If the gene sequence for the Affymetrix® probe showed no sequence homology to any of the proteins in the UniProt database, the sequence was annotated as No UniProt Hit.

GO Slim Term Analysis

For expressed genes with homology greater than 10e-6 to an Arabisopsis gene, custom perl scripts were written to parse and tally each transcript GO slim ID for biological process, molecular function, and cellular process. The same scripts were used to tally GO slim IDs for the entire chip. Differences between the expressed genes and the entire chip were compared using a Fisher exact test [25]. This test was performed to identify the GO slim terms within each of the three GO slim classifications that were over-or under-represented in the lists of differentially expressed genes in relation to their presence on the soybean Affymetrix® chip. A Bonferroni correction [26], using the number of identifiers present on the Affymetrix® chip, was applied to the two-tailed probability value (p-value) of each GO slim identifier. GO slim identifications with a p- value of less than or equal to 0.05 after the Bonferroni correction were considered statistically over-or under-represented in our list of differentially expressed genes. This correction is likely to underestimate the number of categories of genes either over-or under-represented on the lists of differentially expressed genes in comparison to their prevalence on the Affymetrix® chip.

Real Time PCR Confirmation

The differential expression observed in the microarray experiment to identify candidate genes was confirmed using semi quantitative Real Time Reverse Transcriptase

101

PCR (sqRT-PCR). Thirteen transcripts identified as differentially expressed in the microarray experiment were tested using sqRT-PCR (Table 3). Genes for sqRT-PCR confirmation were chosen based on differential expression levels in the microarray. We tested genes showing both extreme differential expression and those just exceeding the two-fold criteria. Primers were designed from the EST sequence used to construct the

Affymetrix probe to produce 250 bp amplicons. The sqRT-PCR was conducted as described by the Stratagene protocol (Catalog #600532) using the Stratagene Brilliant qRT-PCR kit with 25uL reactions. For each experimental reaction, 200ng of total RNA was added as initial template along with 125mM MgCl2 and 100nM forward and reverse primers. Cycling parameters were as follows: 45 min at 42OC for reverse transcription,

10 min at 95OC to denature reverse transcriptase StrataScript, 40 cycles of 30 sec at

95OC, 1 min at proper annealing temperature, 30 sec at 72OC. All sqRT-PCR reactions were performed in the Stratagene Mx3000P followed by a dissociation curve, taking a fluorescence reading at every degree between 55OC and 95OC to ensure only one PCR product was amplified. As controls, a passive reference dye was added to each reaction to ensure the increase in fluorescence was due to an increase in amplicon and not an artifact of the PCR. Additionally, each sample was run in triplicate and normalized against tubulin amplification to ensure differential expression was not due to differing amounts of initial template RNA added to each sample.

To be considered differentially expressed, samples had to differ in cycle thresholds (Ct) by more than 1 cycle, which corresponds to the two-fold difference in gene transcripts between the NILs identified by the microarray experiment. The resulting

102

fold change of the sqRT-PCR was calculated from the differences in Ct using the 2 Ct method [27].

SFP Identification and Association with known IDC QTLs on Soybean Genome

Single Feature Polymorphisms (SFPs) were identified following the protocol outlined by West et al. 2006 [28]. In brief, the microarray data from plants grown under iron sufficient conditions was transformed by robust multichip analysis (RMA) [29].

Custom perl scripts were used to examine each of the ten individual probes comprising a single perfect match probe. These perl scripts assigned each perfect mach probe set an

SFPdev score by subtracting the average hybridization signal from the other ten probes from the hybridization signal of the probe in question and dividing that by the hybridization signal of the probe being examined ((hyb signal probe 1 – (hyb signal probe

1+ hyb signal probe 2 + hyb signal probe 3 + hyb signal probe 4 …hyb signal probe

10)/10) / hyb signal probe 1). SFPdev scores with an absolute value greater than or equal to two on all replicates indicated an SFP.

Statistical Modeling and Cluster Analysis

To determine if gene distribution along the assembled genome could be explained by random chance, a simulation program originally reported by Grant [30] was applied to a theoretical genome. A genome of 996,903,313 bp (the combined size of the 7x genome assembly which has been assigned to soybean molecular linkage groups) was partitioned into 1,000,000 bp, 100,000 bp, and 10,000 bp windows resulting in 953 bins, 9,530 bins and 95,300 bins respectively. The program positioned 760 or 200 genes depending on the genotype being simulated on the genome and determined the number of genes within the window. The simulation was repeated 1,000 times. The mean number of bins with 0

103

- 8 genes was calculated for the 1,000 repetitions. A standard deviation for each gene bin size was also calculated. To determine how this compared with our experimental data, the sequences assigned to MLGs were concatenated together and the sliding window analysis was performed to identify clusters. The difference between the microarray data and the simulated data is calculated in terms of the number of simulated data standard deviations (SD). A difference greater than two SD is considered statistically significant. The sign of the difference is indicative of whether there are more or fewer genes than expected.

Promoter Identification and Analysis

The consensus sequence used by Affymetrix® to generate the probes on the

Soybean GeneChip® identified as differentially expressed between Clark plants grown under iron sufficient and iron deficient conditions were queried against the 7X genome gene calls. The top hit for each differentially gene was used as the gene call for the differentially expressed sequence on the Affymetrix® GeneChip®. Custom perl scripts identified the 500 bases upstream of the start codon for each gene from the 7X genome assembly. The reverse complement of each of the 500 bp promoter regions was also identified. The program MEME (Multiple Em for Motif Elicitation [31]) was run against the 500 base promoter regions of all IDC genes to identify short conserved sequences in the promoter regions of the differentially expressed genes using the –dna –mod anr –evt 1 commands. Identified motifs with E-values < 1E-6 were then compared against a modified TRANSFAC database using BLASTN [24] to determine if identified motifs contained any known transcription factor binding sites.

104

RESULTS

Candidate Gene Identification and GO analysis

RNA from both iron efficient Clark and iron inefficient IsoClark grown under iron limiting conditions (50uM Fe(NO3)3) and iron sufficient conditions (100uM

Fe(NO3)3) were submitted to the Iowa State GeneChip Facility for hybridization and scanning using the Affymetrix® GeneChip® Soybean Genome Array. An MBEI analysis [23] of the data revealed only 30 transcripts met or exceed the two fold difference required to be considered differentially expressed between Clark and IsoClark genotypes grown under iron sufficient conditions (Supplementary Table 1). This result confirms the NILs probably differ by only a limited number of genes. In contrast, 835 transcripts were differentially expressed between Clark plants grown under iron sufficient and iron limiting conditions (Supplementary Table 2) and 200 transcripts differentially expressed between IsoClark plants grown under the same conditions (Supplementary

Table 3).

GO slim categories that were either over-or under-represented in our lists of differentially expressed genes were identified for both the Clark and IsoClark comparisons. Transcripts with GO slim classifications that are over or under represented on our list of differentially expressed genes should be representative of the processes and pathways being induced or shut down under iron stress in both the iron efficient and iron inefficient plants. The Clark genotype experiment had 488 out of 835 unique transcripts with GO slim IDs. Of the corresponding GO slim IDs, 42 were either over or under represented in our list of differentially expressed genes (Table 1). These transcripts could be over-represented in our expression data based on comparison with the entire chip

105

(Table 1). The over and under represented GO slim categories could be further divided into 17 biological process IDs, 19 molecular function IDs, and 6 cellular component processes (Table 1). Of the 200 differentially expressed genes in the IsoClark genotype,

49 had corresponding Arabidopsis GO slim IDs. Of these, 11 were over or under represented and fell into 5 molecular function categories, 1 cellular component category, or five biological process categories (Table 2).

Examining the GO terms associated with the candidate genes provides further insight into the disparity of the number of differentially expressed genes between genotypes. The IsoClark (inefficient) genotype does not appear to induce genes in response to the iron depravation stress. The most prevalent GO term in all three classifications for both genotypes was ‘unknown function’ (Tables 1 and 2). However, the Clark (efficient) genotype also had a high proportion of GO terms (and thus, transcripts) specifically related to iron availability and usage, ie: ferric iron binding

(GO:0008199), iron ion transport (GO:0006826), and iron ion homeostasis

(GO:0006879) that were over-represented on our lists of candidate genes. There were also a number of GO terms not specifically related to iron, but which are associated with a more general stress response (GO:0009611 – response to wounding, GO:00099 jasmonic acid biosynthesis, and GO:0009408 – response to heat).

Real Time PCR Confirmation

The differential expression observed through sqRT-PCR analysis mirrored, in direction, the expression differences observed in the microarray study. As with previous microarray confirmation studies, the sqRT-PCR showed greater levels of differential expression than seen in the microarray. This is most likely due to cross hybridization of

106

genes to the microarray, while the RT PCR is designed to be very specific and may be measuring individual members of a gene family. These experiments confirmed the iron deficient plants had lower expression levels of the transcripts than the iron sufficient plants, replicating the results seen in the microarray data.

Positioning Candidate Genes on the 7X Genome Assembly

Sequencing of the soybean genome by the Department of Energy, Joint Genome

Institute currently has produced 7X sequence coverage of the genome, which has been assembled into sequence scaffolds by researchers at Stanford University. In collaboration with researchers at Stanford University, USDA-ARS researchers have assembled the scaffolds into pseudo chromosomes based on marker homology, allowing us to place our candidate genes on specific molecular linkage groups (Figure 1).

The sequences of transcripts identified as differentially expressed by microarray analysis (see above) were obtained from the Affymetrix® website

(www.affymetrix.com). These sequences were then queried against the 7X soybean genome using BLASTN [24] and an e-value cutoff of 10E-50 to ensure a high sequence homology between the aligned sequences. The 835 differentially expressed genes in the

Clark genotype were located on 214 unique scaffolds. The 200 genes identified as differentially expressed in the IsoClark genotype were located on 118 scaffolds. The same parameters were used to compare the sequences of SFPs to the 7X genome.

Markers used in previous iron QTL studies were also identified on the pseudo chromosomes to delineate known iron QTL regions (Figure 1). The iron efficiency QTLs were scaled to the 7X build and used to determine if any of the candidate genes from the microarray experiment were encoded within the iron QTL regions. Fifty- eight genes in

107

the Clark genotype and twenty-one genes in the IsoClark genotype were located within previously identified QTLs (Figure 1).

Cluster Analysis

The gene distribution simulation randomly placed genes across the assembled genome. If the candidate genes were randomly located throughout the genome, we would expect the experimental results to closely mirror the simulated data study.

However our results strongly indicate that the differentially expressed genes exhibit clustering of two or more genes within 1,000,000 bp, 100,000 bp, and 10,000 bp in the genome (Tables 4 and 5).

The candidate genes do not show a high concordance with known QTL regions, but do serve to identify additional genomic regions of IDC transcriptional importance

(Figure 1). The largest cluster contained eight candidate genes located within 1 MB on

MLG C2 (Figure 1). There was also a cluster of seven genes on MLG D1B (Figure 1).

D1B contained six clusters of four or more genes within 1,000,000 bases, as did MLG F

(Figure 1). None of these clusters were located within known iron QTLs. However, another cluster of seven genes falls on MLG H (Figure 1), which has three known iron

QTLs, together spanning 24 cM of the linkage group. MLG M contained the most gene clusters, eight separate clusters, each with four candidate genes (Figure 1). Again, MLG

M has not been previously shown to contain regions of genetic importance to soybean

IDC.

The identification of these gene clusters on MLGs previously not known to be involved in soybean IDC opens new regions of genetic interest to investigate in future studies. Most importantly, the distribution of genes and gene clusters throughout the

108

genome, not solely within the confines of known QTLs, highlights the limitations of using QTL regions to identify candidate genes for the trait of interest. The majority of candidate genes identified were not within the QTL, nor were the largest clusters of differentially expressed genes. Though the QTL must contain sequence of importance to

IDC, genes responsible for the chlorosis and yield loss may not be confined to the QTL region.

Promoter Analysis

Previous research has demonstrated that genes clustered in close proximity in the genome may be coordinately regulated. To determine if clusters if IDC genes were coordinately regulated, we mined 500 bases from the promoters of all IDC genes and used these as input into the MEME software program. As an internal control, sequences were not analyzed as members of clusters; rather, all sequences were analyzed as a single large group. If IDC genes were coordinately regulated, MEME could also be used to independently identify potential gene clusters. In total, the promoters of 835 iron deficiency induced genes from the Clark genotypic comparison and 200 genes from the

IsoClark comparison were analyzed using MEME. There were no motifs found using

MEME in the IsoClark (inefficient) gene promoter regions. All motifs identified by

MEME were found in the promoter regions of genes differentially expressed due to iron deficiency in the Clark (efficient) genotype. Twenty-one motifs with E-values more significant than 10E-6 were identified by MEME analysis. Following visual inspection, this number was reduced to 11. Motifs were eliminated if they contained repetitive sequence or had lower significance E-values. The 11 motifs were identified in 248 IDC genes, representing 129 of the clusters of two or more genes as mentioned above. One

109

mechanism by which genes can be coordinately regulated is through the action of transcription factors that bind to the promoter to regulate gene expression. Therefore, the

11 motifs identified above were compared to known transcription factor binding sites in the TRANSFAC [32] database (Table 6). Three showed significant homology (99% identity) to known transcription factor binding sites (Table 6). These three sites bind a helix-loop-helix transcription factor (USF), an elongation factor (EF2), and a Myb transcription factor. These binding sites were identified in the promoter regions of 42,

40, and 28 genes respectively. Both helix loop helix and Myb transcription factors are known to be involved in regulating the iron stress response and general stress responses in other plant species.

SFP Analysis

Seventy-two candidate SFPs in the Clark genotype and 98 in the IsoClark genotype were identified (Supplementary Table 4). Sequences of the probes containing the SFP were queried against the 7X Soybean genome to determine a putative genetic location (see above). Of the 170 total SFPs, 163 had 100% homology to the genomic sequence. Many of the sequences were present in the genome multiple times, thus an absolute position could not be identified. We chose to place this sequence in all genomic positions where the sequence showed 100% identity to the genome up to 3 locations, yielding 206 potential SFP locations.

Of the 206 SFP locations, 40 showed an association to previously identified iron efficiency QTLs (Figure 2). Only one of the 206 SFPs identified in this study

(GmaAffx.41460.1.S1_at) was encoded within a gene identified as differentially expressed between IsoClark plants grown under iron sufficient and iron deficient

110

conditions. The remaining SFPs were not in differentially expressed genes in either

Clark or IsoClark genotypes. This suggests the SFPs are located within non-protein coding sequences, perhaps involved in regulatory elements, which would not necessarily be differentially expressed. GO slim ID analysis, as previously described, was performed with the gene sequences containing SFPs. Of the 206 SFPs identified, 20% had an unknown biological process annotation. The most prevalent group with known annotations were related to transcriptional regulation. Genes involved with electron transport, ATP binding, , and were also identified as over-represented by their GO IDs (data not shown).

DISCUSSION

Microarray Analysis

The most obvious result of the microarray experiment is the disparity between genotypes in the number of genes differentially expressed in iron replete versus iron deficient conditions. The Clark genotype analysis identified 835 differentially regulated genes when grown under iron sufficient and iron insufficient conditions. The IsoClark analysis identified only 200. The lack of SFPs among all but one of the differentially expressed genes suggests differences in expression level are not due to structural differences in the candidate genes, between the genotypes. However, this does not preclude the possibility of structural differences in the promoter regions of the genes.

The candidate genes identified with the microarray experiment suggest the Clark genotype is capable of recognizing the iron deficiency and eliciting a change in transcription patterns as a response to the stress. To compensate for the lack of iron

111

availability, the iron deficient Clark plant adjusts its physiological processes to ensure survival through the stress. Alternatively, the IsoClark genotype does not appear to initiate an effective response to the iron deficient conditions. The lack of differentially expressed genes in the IsoClark genotype, when comparing iron sufficient and iron deficient conditions, implies the iron deficient IsoClark plant continues to function as if still in iron sufficient conditions. However, the lack of iron as a cofactor in many of the basic biological processes results in a multitude of biological pathway failures, resulting in chlorotic plants.

In Arabidopsis, iron deficiency stress causes an increase in the transcription of electron transport chain components. Specifically, cytochromes are upregulated [33].

Our experiment identified seventeen genes associated with cytochrome P450 in iron stressed Clark plants. All seventeen genes were down-regulated in iron stressed tissue compared to non stressed tissue, the opposite response as seen in Arabidopsis plants [33].

Thimm et al. proposed a correlation between iron deficiency stress and in induction of phosphoenolpyruvate carboxylase activity [33]. Four genes associated with phosphoenolpyruvate activity were identified as differentially expressed in the Clark genotype by microarray analysis. All four of these genes were down regulated in plants grown under iron stress rather than in iron sufficient conditions. Iron deficiency has also been shown to induce glycolytic activity [34]. There are three main enzymes involved in glycolysis; glyceraldehydes 3 phosphate dehydrogenase (G3PD), pyruvate kinase (PK), and fructose 6 phosphate kinase (F6PK). All three have been shown to be up-regulated in

Arabidopsis [33] and cucumber [34] under iron deficiency stress. Microarray analysis comparing Clark plants grown in iron sufficient and iron stressed conditions only

112

identified a single G3PD and a single F6PK, both of which were down-regulated in the iron stressed tissue compared to iron sufficient tissue. Seven genes associated with PK were identified in our microarray, again, all seven were down regulated. The down regulation of the three main components of glycolysis suggests soybean, unlike

Arabidopsis, does not increase non photosynthetic carbon fixation or phosphoenolpyruvate carboxylase activity under iron stressed conditions. The contrasting results support the hypothesis proposed by Zocchi et al. that soybeans do not follow canonical iron deficiency responses [35].

Soybean does follow some of the established responses to iron deficient stress conditions. It has been proposed that under iron deficient conditions citrate provides a carbon skeleton for chlorotic leaves to allow for sustained growth and respiration [36].

Clark iron deficient stressed plants show a down regulation of citrate (GO:

0008815) in comparison to non-stressed plants. The reduced breakdown of citrate in iron stressed plants lends credence to this hypothesis. Additionally, iron deficient conditions cause decreased activity of lipoxygenases [33]. All thirteen lipoxygenases identified by microarray analysis in the Clark genotype showed decreased expression in the iron stressed tissue compared to the iron sufficient tissue.

The discrepancies between previously reported literature and soybeans iron deficiency response highlight the complexity of the iron stress response. However, it is important to remember than transcriptional regulation is only one form of regulation.

Posttranscriptional modification may be an important component to understanding soybean’s iron deficiency response, but that is beyond the scope of this investigation.

113

GO Slim ID Analysis

The Clark (iron efficient) genotype had an over-representation of genes in GO slim categories specific to iron availability/usage and categories associated with a more general stress response. This reinforces the hypothesis that soybean induces both an iron specific and a more general stress response. A similar pattern was observed in a cDNA microarray experiment [14]. Additionally, the Clark genotype showed a statistically significant number of GO slim IDs that were over-represented related to DNA replication and DNA binding activity. The increased expression levels of genes involved in these processes is probably a result of the DNA repair required to prevent lethal mutations from

ROS [37], which are more prevalent under conditions of stress [37]. DNA binding activity suggests the activity of transcription factors, which lead to dramatic expression changes downstream. However, the down regulation of genes related to translation ie:

GO0006412 (translation) and GO: 0006468 (protein amino acid phosphorylation) is indicative that the plant is not synthesizing normal proteins as it would under optimal growth conditions and is instead reducing the expression of genes involved in cellular processes not imperative to survival.

The IsoClark genotype had many fewer GO categories significantly over or under-represented on our lists of candidate genes (Table 2) in comparison to Clark. Only two of the GO classifications were related to iron GO:0008940 (nitrate reductase activity) and GO:0008382 (iron superoxide dismutase activity). The remaining GO categories show little association to either a general or an iron specific stress response. Again, as was previously reported, it appears the IsoClark genotype is unable to recognize or respond to the iron stress. The IsoClark genotype had fewer genes differentially

114

expressed due to iron deficiency and most of the genes that were differentially expressed are not associated with stress related pathways.

Clusters of Co-Expressed Genes

Expression analysis has been used in some model organisms to identify differentially expressed genes that are clustered together within the genome [16-21, 38,

39]. These genomic neighborhoods are thought to be conserved by natural selection [19] but are not entirely explained by co-functionality [20]. The combined use of expression data with known QTL positions and expression clusters should further narrow the list of candidate genes to identify functionally important differences in the soybean genome affecting iron efficiency.

Co-expressed genes show a non random distribution throughout the genome [19,

20]; similarly expressed genes are located in clusters. Localized co-expression of genes has been reported in many different species including (but not limited to Arabidopsis [17,

38, 39], rice [18], human [16, 40], and yeast [19]). Williams et al [17] found genes located nearby in the genome and genes involved in the same pathways are more likely to be co-expressed. The incidence of co-expressed gene clusters has been widely studied

[16, 17, 19, 20, 40]. One proposed explanation is that the co-expressed genes are regulated by a common transcription factor. Grouping these genes creates an increase in the abundance of binding sites specific to that transcription factor [40]. A related hypothesis suggests the co-expressed genes are regulated by similar promoter sequences, so a co-expression ‘neighborhood’ would increase the availability of these promoter sequences [21]. However, genomic studies have, as of yet, been unable to confirm either of the two hypotheses.

115

Cluster analysis, as first reported by Grant et al. [30], was performed to determine if candidate genes identified by the microarray experiment were randomly distributed across the genome. Iterative simulations modeling our data showed our candidate genes were not distributed evenly throughout the genome. Using a sliding window of 1,000,000 bases, we identified more genes in smaller regions of the genome than expected by a random distribution of the differentially expressed genes with 3 – 8 candidate genes per 1,000,000 bases (Tables 4 and 5). The same patterning held true when the sliding window was reduced to 100,000 and 10,000 bases. The statistical significance comes from comparing the experimental data to the simulated data is found in the number of simulated standard deviations (SDs) the experimental data is from the simulated data (Simulation SD column in Tables 4 and 5). When comparing clusters of three or more genes in either Clark or IsoClark, there is only one instance (3 genes per cluster, Table 4) where the difference between the experimental data and the simulation study is not statistically different.

In the Clark genotype, with a window of 1,000,000 bases, there were thirty-six clusters of four genes and thirteen clusters of five genes per window identified in the experimental data. There were only seven clusters of four genes and only a single occurrence of five genes clustering together in the simulation study. The difference in

SDs is 11.32 and 11.81 respectively, highly statistically significant. The million base window allowed larger gene clusters to be identified in the experimental data (two clusters of both six and seven genes and a single cluster containing eight genes). No clusters of these sizes were identified in the simulation study, further supporting the clustering hypothesis. When the window size is decreased to 100,000 or 10,000 bases,

116

three genes in a cluster become significantly over represented in the experimental data compared to the simulation study. The microarray experiment identified 22 clusters of three genes per 100,000 bases and seven clusters per 10,000 bases. No clusters of three or more genes were identified in the simulation study at either window size.

The IsoClark genotype identified fewer candidate genes in the microarray experiment, which reduces the number of gene clusters identified. However, even with a reduced number of candidates, IsoClark still exhibited clustering of co-expressed genes.

With a window of 1,000,000 bases, there were eight clusters of three genes identified in the experimental data, but only two clusters are identified in the simulation study.

Though there are only six clusters different, the experimental data has a SD of 4.35 from the simulation study, well above the two required for statistical significance. There were no clusters greater than three genes identified in the IsoClark simulation, but there were a number of clusters with greater than three genes identified in the experimental data for

IsoClark. The retention of clusters, even among so few candidate genes, lends further support that the soybean genome has conserved genomic regions with co-expressed genes.

Single Feature Polymorphisms (SFPs)

Identifying candidate genes for a trait of interest is the most widely used method of analyzing the data provided by microarray experiments. However, mining the hybridization data to identify single feature polymorphisms (SFPs) provides a high throughput platform for detecting polymorphisms[41]. Single Nucleotide

Polymorphisms (SNPs) are the most commonly recognized polymorphism, but identification is labor intensive and SNP coverage across the genome is fairly sparse [42].

117

It has been suggested that there is a greater probability of identifying a causal polymorphism for the trait of study using SFPs than traditional SNPs [42], perhaps due to better genic coverage.

To date, only 3 molecular markers (Satt 481, Satt114, and Satt239) segregate with the iron efficiency trait in soybean across multiple populations [43, 44]. The 72 SFPs identified in the Clark genotype and the 98 SFPs identified in the IsoClark genotype relative to Williams 82 in this study have the potential to be developed into molecular markers specific to IDC. Of particular interest may be the 72 SFPs from the Clark genotype as these SFP molecular markers may aide in selecting for iron efficiency.

Determining which of the SFPs may potentially be used as molecular markers should begin with the SFPs located within known iron QTLs. Initially, we hypothesized the

SFPs would be in differentially expressed genes within known iron QTLs. However, only one SFP (GmaAffx.41460.1.S1_at) was differentially expressed in IsoClark leaf tissue, and only 11 of the identified SFPs are located within known QTLs. Other studies have shown the majority of SFPs identify sequences in non protein-coding sequences

[45]. This dovetails nicely with our data, where the majority of SFP sequences have an unknown function and the largest class of annotated SFPs is transcription factors

(Supplementary Table 4). These SFP polymorphisms altering transcription and /or translation rates of key genes and proteins, or serving some other regulatory function in soybean iron homeostasis.

Iron QTLs and the Soybean Genome

QTL mapping and marker assisted selection have been utilized by plant breeders for decades in the pursuit of crop improvement. This approach has been especially

118

important for quantitative traits such as iron deficiency chlorosis [5, 12, 44, 46]. Only in recent years have scientists been able to utilize microarray technology to examine gene expression on a global scale to identify candidate genes for their trait of interest. The development of the Affymetrix® GeneChip® Soybean genome array [47] means repeatable precision, providing more confidence to the microarray experiments than cDNA arrays.

The availability of the whole-genome soybean sequence has provided the ability to visualize the placement of candidate gene sequences within the genome. This view will allow further insight into soybeans’ response to iron deficiency stress. Nineteen

QTL regions have been identified for iron deficiency chlorosis. These regions represent approximately 182cM of genetic information. Our initial hypothesis was that the majority of the genes identified in the microarray experiment would map within known iron QTL regions. However, only 58 of the 835 (7%) candidates in the Clark genotype and 21 of the 200 (10%) in the IsoClark genotype mapped within known QTL regions

(Figure 2). Thus, the majority of the candidate genes lie outside the region defined by the

QTL. However, given the evidence of coordinate gene expression, gene clustering and conserved promoter motifs in our data, we have revised our previous hypothesis. We now believe the previously identified QTL regions likely correspond to transcription factors that regulate gene expression during iron stress. While microarray experiments would identify IDC regulated genes whose expression changes in response to a transcription factor, they may not identify the transcription factor itself. In contrast, QTL mapping would identify a mutation in a transcription factor, which is at the top of the signaling pathway. The mutation would effect either the expression of the transcription factor or its

119

ability to bind to target promoters. This hypothesis is supported by our data. The clustering of co-expressed genes suggests they are being coordinately regulated. This is supported by the conserved motifs identified in the promoter regions of candidate genes.

Most often, motifs are conserved throughout a previously identified cluster of genes in

Clark. It is unlikely these motifs are missing or are altered in the promoter regions of the

IsoClark genome. More likely, IsoClark may have a mutation in the transcription factor that controls the expression of these genes. Only by combining QTL analyses, microarray analyses of NILs, and the genome sequence could this conclusion be reached.

Conclusion

The use of near isogenic soybean lines, microarray analysis, SFP identification, and the sequence of the soybean genome has allowed us to identify individual genes lying within known iron efficiency QTLs whose expression levels are affected by iron availability. We have also identified 11 conserved motifs in the promoter sequence of genes differentially expressed due to iron deficiency stress. The 58 differentially expressed genes identified in Clark and 21 in IsoClark, located within known QTL regions, are the first genes identified by microarray analysis within QTL regions specific to iron deficiency stress. The conserved sequences throughout the promoter regions of the differentially expressed genes in the Clark genotype provide compelling evidence that the differential iron response is likely due to the differential expression or binding of a transcription factor. Co-expressed genes clustered either by physical proximity (Tables 4 and 5) or through shared promoter motifs (Table 6) highlight the limitations of the use of

QTL analysis as the majority of the differentially expressed genes lie outside QTL regions. Additionally, both types of clustering suggest the control of soybeans’ iron

120

deficiency response is regulated by the differential expression of a transcription factor or a mutation within the transcription factor, which affects its ability to bind to target promoter regions. This implies the eight transcription factors differentially expressed in

Clark under iron deficiency stress which are located within known iron QTL regions are the most likely candidate genes for the QTL. Future research should focus on identifying additional transcription factors within the IDC QTL regions and functional analysis of the other 50 genes in Clark and 21 genes in IsoClark that map within the QTL regions. Additionally, the conserved motifs identified by MEME in the promoter regions of iron deficiency induced genes can be used to mine the soybean genome for additional genes potentially affected by IDC, but which are not represented on the soybean

Affymetrix® GeneChip®.

References Cited

1. Theil EC: Iron, Ferritin, and Nutrition. Annu Rev Nutr 2004, 24:327-343. 2. Ghandilyan A, D. Vreugdenhil, and M.G.M. Aarts: Progress in the genetic understanding of palnt iron and zinc nutrition. Physiologia Plantarum 2006, 126:407-417. 3. Mengel K, Kirby EA, Kosegarten H, Appel T: Principles of Plant Nutrition 5th Edition, 5th edn. Boston: Kluwer Academic Publishers; 2001. 4. Yi Y, Guerinot ML: Genetic evidence that induction of root Fe(III) chelate reductase activity is necessary for iron uptake under iron deficiency. Plant J 1996, 10(5):835-844. 5. Lin S, Cianzio SR, Shoemaker RC: Mapping genetic loci for iron deficiency chlorosis in soybean. Molecular Breeding 1997, 3:219-229. 6. Hansen NC, Schmitt MA, Anderson JE, Strock JS: Soybean: Iron Deficiency of Soybean in the Upper Midwest and Associated Soil Properties. Agronomy Journal 2003, 95:1595-1601. 7. Inskeep WP, Bloom, P.R.: Soil Chemical Factors Associated with Soybean Chlorosis in Calciaquolls of Western Minnesota. Agronomy Journal 1987, 79:779-786. 8. Froelich DM, Fehr WR: Agronomic Performance of Soybeans with Differing Levels of Iron Deficiency Chlorosis on Calcareous Soil. Crop Science 1981, 21:438-441.

121

9. Romheld V: Different Strategies for Iron Acquisition in Higher Plants. Physiol Plant 1987, 70:231-234. 10. Romheld V, Marschner H: Evidence for a Specific Uptake System for Iron Phytosiderophores in Roots of Grasses. Plant Physiol 1986, 80(1):175-180. 11. Connolly EL, Campbell NH, Grotz N, Prichard CL, Guerinot ML: Overexpression of the FRO2 ferric chelate reductase confers tolerance to growth on low iron and uncovers posttranscriptional control. Plant Physiol 2003, 133(3):1102-1110. 12. Diers BW, S.R. Cianzio, R.C. Shoemaker: Possible Identification of Quantitative Trait Loci Affecting Iron Efficiency in Soybean. Journal of Plant Nutrition 1992, 15(10):2127-2136. 13. Lin S, Baumer JS, Ivers D, Cianzio SR, Shoemaker RC: Field and Nutrient Solution Tests Measure similar Mechanisms Controlling Iron Deficiency Chlorosis in Soybean. Crop Science 1998(38):254-259. 14. O'Rourke JA, D.V. Charlson, D.O. Gonzalez, L.O. Vodkin, M.A. Graham, S.R. Cianzio, M.A. Grusak, R.C. Shoemaker: Microarray analysis of iron deficiency chlorosis in near-isogenic soybean lines. BMC Genomics 2007, 8:476-489. 15. O'Rourke JA, Graham MA, Vodkin L, Gonzalez DO, Cianzio SR, Shoemaker RC: Recovering from iron deficiency chlorosis in near-isogenic soybeans: a microarray study. Plant Physiol Biochem 2007, 45(5):287-292. 16. Kosak ST, D. Scalzo, S.V. Alworth, F. Li, S. Palmer, T. Enver, J.S.J. Lee, M. Groudine: Coordinate Gene Regulation during Hematopoiesis is Related to Genomic Organization. PLoS Biology 2007, 5(11):2602-2613. 17. Williams EJBaDJB: Coexpression of Neighboring Genes in the Genome of Arabidopsis thaliana. Genome research 2004, 14:1060-1067. 18. Ren X-Y, W. Stickema, J-P Nap: Local coexpression domains in the genome of rice show no microsynteny with Arabidopsis domains. Plant Molecular Biology 2007, 65:205-217. 19. Hurst LD, E.J.B. Williams, C. Pal: Natural selection promotes the conservation of linkage of co-expressed genes. TRENDS in Genetics 2002, 18(12):604-606. 20. Michalak P: Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics 2008, 91(3):243-248. 21. Oliver B, M. Parisi, D. Clark: Gene expession neighborhoods. Journal of Biology 2002, 1(1):1-4. 22. USDA A, National Genetic Resources Program: Germplasm Resource Information Network - GRIN. In.: National Germplasm Resources Laboratory, Beltsville, Maryland; 2004. 23. Li C, W.H. Wong: Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application Genome Biology 2001, 2(8):1-11. 24. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402. 25. Fisher R: A Preliminary Linkage Test with Agouti and Undulated Mice; the Fifth Linkage-Group. Heredity 1949, 3:229-241.

122

26. Bonferroni CE: Ill calcolo delle assicurazioni su gruppi di teste. Studi in Onore del Professore Salvatore Ortu Carboni 1935:13-60. 27. Schmittgen TD, Zakrajsek BA, Mills AG, Gorn V, Singer MJ, Reed MW: Quantitative Reverse Transcription-Polymerase Chain Reaction to Study mRNA Decay: Comparison of Endpoint and Real-Time Methods. Analytical Biochemistry 2000, 285:194-204. 28. West MA, van Leeuwen H, Kozik A, Kliebenstein DJ, Doerge RW, St Clair DA, Michelmore RW: High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis. Genome research 2006, 16(6):787-795. 29. Irizarry RA, B. Hobbs, F. Collin, Y.D. Beazer-Barclay, K.J. Antonellis, U. Scherf, T.P. Speed: Exploration, Normalization and Summaries of High Density Oligonucleotide Array Probe Level Data. Biostatistics 2003, 4:249-264. 30. Grant D, Cregan P, Shoemaker RC: Genome organization in dicots: genome duplication in Arabidopsis and synteny between soybean and Arabidopsis. Proceedings of the National Academy of Sciences of the United States of America 2000, 97(8):4168-4173. 31. Bailey TL, N. Wiliams, C. Misleh, W.W. Li: MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 1996, 34(Web Server Issue):W369-W373. 32. Matys V, E. Fricke, R. Geffers, E. Gobling, M. Haubrock, R. Hehl, K. Hornischer, D. Karas, A.E. Kel, O.V. Kel-Margoulis, D.U. Kloos, S. Land, B. Lewicki-Potapov, H. Michael, R. Munch, I. Reuter, S. Rotert, H. Saxel, M. Scheer, S. Thiele, E. Wingender: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003, 31(374-378). 33. Thimm O, Essigmann B, Kloska S, Altmann T, Buckhout TJ: Response of Arabidopsis to Iron Deficiency Stress as Revealed by Microarray Analysis. Plant Physiology 2001, 127:1030-1043. 34. Espen L, M. Dell'Orto, P. De Nissi, G. Zocchi: Metabolic responses incucumber (Cucumis sativus L.) roots under Fe-deficiency: a 31P-nuclear magnetic resonance in-vivo study. Planta 2000, 210:985-992. 35. Zocchi G, De Nisi P, Dell'Orto M, Espen L, Gallina PM: Iron deficiency differently affects metabolic responses in soybean roots. Journal of experimental botany 2007, 58(5):993-1000. 36. Abadia J, A.F. Lopez-Millan, A. Rombola, and A. Abadia: Organic acids and Fe deficiency: a review. Plant and Soil 2002, 241(75-86). 37. Sun B, Jing Y, Chen K, Song L, Chen F, Zhang L: Protective effect of nitric oxide on iron deficiency-induced oxidative stress in maize (Zea mays). Journal of plant physiology 2007, 164(5):536-543. 38. Ren X, W. Stickema, JP Nap: Local coexpression domains of two to four genes in the genome of Arabidopsis. Plant physiology 2005, 138:923-934. 39. Zhan S, J. Horrocks, L.N. Lukens: Islands of co-expressed neighboruring genes in Arabidopsis thaliana suggest higher-order chromosome domains. Plant Journal 2006, 45:347-357.

123

40. Vogel JH, A. von Heydebrek, A. Purmmann, and S. Sperling: Chromosomal clustering of a human transcriptome reveals regulatory background. BMC Bioinformatics 2005, 6:230. 41. Kumar R, J. Qiu, T. Joshi, B. Valliyodan, D. Xu, H.T. Nguyen: Single Feature Polymorphism Discovery in Rice. PLoS One 2007, 2(3):e284. 42. Borevitz JO, S.P. Hazen, T.P. Michael, G.P. Morris, I.R. Baxter, T.T. Hu, H. Chen, J.D. Werner, M. Nordborg, D.E. Salt, S.a. Kay, J. Chory, D. Weigel, J.D.G. Jones, and J.R. Ecker: Genome-wide patterns of single-feature polymorphism in Arabidopsis thaliana. PNAS 2007, 104(29):12057-12062. 43. Charlson DV, Grant D, Bailey TB, Cianzio SR, Shoemaker RC: Molecular Marker Satt481 is Associated with Iron-Deficiency Chlorosis Resistance in a Soybean Breeding Population. Crop Science 2005:2394-2399. 44. Wang J, P.E. McLean, R.Lee, R.J. Goos, T. Helms: Association mapping of iron deficiency chlorosis loci in soybean (Glycine max L. Merr.) advanced breeding lines. . Theor Appl Genet 2008, DOI 10.1007/s00122-008-0710-x. 45. Luo ZW, E. Potokina, A. Druka, R. Wise, R. Waugh, M.J. Kearsey: SFP Genotyping From Affymetrix Arrays Is Robust But Largely Detects Cis- acting Expression Regulators. Genetics 2007, 176:789-900. 46. Charlson DV, Grant D, Bailey TB, Cianzio SR, Shoemaker RC: Molecular Marker Satt481 is Associated iwth Iron-Deficiency Chlorosis Resistance in a Soybean Breeding Population. Crop Science 2005:2394-2399. 47. Affymetrix: www.affymetrix.com.

124

125

126

Table 1: GO Slim Terms Over Represented in Candidate Genes from Clark

GO Slim ID GO Term Description Number of Bonferroni Genes Corrected with GO P-Value ID GO:0000004 BP No GO Term 128 0 GO:0006270 BP DNA Replication Initiation 10 0 GO:0009611 BP Response to Wounding 36 0 GO:0009695 BP Jasmonic Acid Biosynthesis 24 0 GO:0006826 BP Iron Ion Transport 9 1.2E-07 GO:0006879 BP Iron Ion Homeostasis 10 3.6E-07 GO:0010039 BP Response to Iron Ion 9 5.28E-06 GO:0009617 BP Response to Bacterium 8 0.000018 GO:0006275 BP Regulation of DNA Replication 4 0.00303 GO:0006972 BP Hyperosmotic Response 4 0.00303 GO:0030397 BP Membrane Disassembly 8 0.004381 GO:0008299 BP Isoprenoid Biosynthesis 10 0.006706 GO:0009408 BP Response to Heat 20 0.014334 GO:0019373 BP Epoxygenase P450 5 0.045066 GO:0008199 MF Ferric Iron Binding 9 0 GO:0008094 MF DNA-dependent ATPase Activity 10 1.29E-06 GO:0016165 MF Lipoxygenase Activity 12 0.000128 GO:0047763 MF Cafeate O-Methyltransferase 8 0.001014 GO:0030337 MF DNA Polymerase Processivity Factor 4 0.001426 GO:0005544 MF Calcium Dependent Phospholipid Binding 8 0.001965 GO:0009978 MF Allene Oxide Synthase 5 0.017949 GO:0046423 MF Allene Oxide Cyclase 4 0.020157 GO:0008815 MF Citrate (Pro-3S) Lyase 5 0.029946 GO:0009346 CC Citrate Lyase 5 0.002807

GO Slim ID’s statistically over or under represented in the list of differentially expressed genes comparing Clark plants grown in iron sufficient and iron limiting hydroponics conditions in comparison to their presence on the Affymetrix® GeneChip®. The GO Slim Column provides the GO Slim ID number and corresponding category. GO Slim categories are Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). The GO Term Description column contains the best description for the GO Slim ID. The number of genes with GO ID column is the number of genes differentially expressed between Clark plants grown in iron sufficient and iron deficient hydroponics solutions with the corresponding GO Slim ID. The Bonferroni Corrected P- value is the statistical significance of the GO Slim ID. Only GO Slim IDs with P-values equal to or less than 0.5 were considered statistically over or under represented in the list

127

of differentially expressed genes in comparison to the presence of the GO Slim ID on the GeneChip®.

128

Table 2: GO Slim Terms Over Represented in Candidate Genes

GO Slim GO Term Description Number of Bonferroni Genes with Corrected P GO ID Value GO:0006809 BP Nitric Oxide Biosynthesis 4 0.0006102 GO:0010025 BP Wax Biosynthesis 5 0.00736524 GO:0019953 BP Sexual Reproduction 5 0.01223184 GO:0008940 MF Nitrate Reductase 4 0.00006192 GO:0008382 MF Iron Superoxide Dismutase 3 0.01245882

GO Slim Terms Over Represented in Candidate Genes from Comparison Between IsoClark Plants Grown in Iron Sufficient and Iron Deficient Hydroponics Solutions. GO Slim ID’s statistically over or under represented in the list of differentially expressed genes comparing Clark plants grown in iron sufficient and iron limiting hydroponics conditions in comparison to their presence on the Affymetrix® GeneChip®. The GO Slim Column provides the GO Slim ID number and corresponding category. GO Slim categories are Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). The GO Term Description column contains the best description for the GO Slim ID. The number of genes with GO ID column is the number of genes differentially expressed between IsoClark plants grown in iron sufficient and iron deficient hydroponics solutions with the corresponding GO Slim ID. The Bonferroni Corrected P-value is the statistical significance of the GO Slim ID. Only GO Slim IDs with P-values equal to or less than 0.5 were considered statistically over or under represented in the list of differentially expressed genes in comparison to the presence of the GO Slim ID on the GeneChip®.

129

Table 4: Clusters of Candidate Genes on 7X build of Soybean Genome from Clark

1,000,000 base bins

# of Genes Stimulation Stimulation Experimental Sds from Sim Mean SD Data Mean

0 435 9 569 14.87 1 344 14.61 188 -10.69 2 136 8.83 99 -4.23 3 36 5.12 44 1.57 4 7 2.54 36 11.32 5 1 1 13 11.81 6 0 2

7 0 2 8 0 1 100,000 base bins

# of Genes Stimulation Stimulation Experimental Sds from Sim Mean SD Data Mean

0 9228 4.97 8926 -60.74 1 704 9.72 486 -22.44 2 27 4.74 98 14.98 3 0 22 4 0 2 5 0 1 10,000 base bins # of Genes Stimulation Stimulation Experimental Sds from Sim Mean SD Data Mean 0 95243 1.69 94678 -333.56 1 754 3.38 593 -47.51 2 3 1.69 71 40.12 3 0 7 4 0 0 5 0 1

# of Genes: number of genes identified within the respective bin size of the simulation Simulation Mean: average number of times from 1,000 replicates that the number of genes were found within the respective bin size. Simulation SD : standard deviation of the Simulation Mean. Experimental Data: number of bins of the respective size with that number of genes identified by microarray analysis SD from Sim Mean: difference between experimental data and simulated data in terms of the SD calculated from simulated data (significance = 2).

130

Table 5: Clusters of Candidate Genes on 7X build of Soybean Genome from IsoClark

1,000,000 base bins

# of Stimulation Stimulation Experimental SDs from Sim Genes Mean SD Data Mean

0 743 4.36 783 9.23

1 191 8.14 118 -8.94

2 24 4.02 38 3.41

3 2 1.37 8 4.35

4 0 7

5 0 0

6 0 0

100,000 base bins # of Stimulation Stimulation Experimental Sds from Sim Genes Mean SD Data Mean 0 9357 1.8 9325 -17.84 1 240 3.57 178 -17.26 2 3 1.76 28 14.13 3 0 4

10,000 base bins # of Stimulation Stimulation Experimental Sds from Sim Genes Mean SD Data Mean 0 95754 0.57 95128 -1105.22 1 245 1.13 199 -40.88 2 0 22 3 0 1

# of Genes: number of genes identified within the respective bin size of the simulation Simulation Mean: average number of times from 1,000 replicates that the number of genes were found within the respective bin size. Simulation SD : standard deviation of the Simulation Mean. Experimental Data: number of bins of the respective size with that number of genes identified by microarray analysis SD from Sim Mean: difference between experimental data and simulated data in terms of the SD calculated from simulated data (significance = 2).

131

Table 6. Motifs in Promoter Regions of Differentially Expressed Genes

Identified Motif Sequence # E- TRANSFAC TRANSFAC Sequences Value Hit ID Binding Site containing of Sequence Motif Motif

CATCCAACGGC 29 1.2E-1 M00227 TCCAACGGC

CCCGCCACGCGCCAC 48 5.1E-26 M00187 GCCACGTGCC

TGGCGGGA 50 5.8E-13 M00024 TGGCGCGA

CCAAACCC 50 2.7E-5 No Hit

CCACCACCACC 48 3.8E-16 No Hit

ACACAACACAC 45 2.2E-10 No Hit

AAAATAAAAATAAAA 9 2.27E-7 No Hit

AATAAAAAAATAAAA 8 1.51E-7 No Hit

AGCTAGCTAGC 6 1.47E-7 No Hit

AGCGAGCGAGC 4 6.23E-8 No Hit

AGCAAGCTAGC 3 2.47E-7 No Hit

Identified by MEME and their similarity to Transcription Factor Binding Sites in the TRANSFAC Database Identified Motif Sequence: The motif sequence identified by MEME analysis in the promoter region of differentially expressed genes from the Clark genotype. # of Sequences containing Motif: The number of differentially expressed genes whose promoter regions contain the identified motif. Motif E-Value: E-value of the motif assigned by MEME. TRANSFAC Hit ID: The TRANSFAC ID of the transcription factor binding site showing high homology to the identified motifs. Only three identified motifs showed homology to known transcription factor binding sites in the TRANSFAC database.

132

TRANSFAC Binding Site: The transcription factor binding site sequence. Mismatched bases are between the identified motif and the reported binding site are underlined.

133

Chapter 5. Conclusions

Conclusions

Iron deficiency chlorosis has been the subject of QTL studies and soybean breeding programs since the 1980’s [1-4]. Depending on the population, iron efficiency has been identified as the result of a major gene with minor modifying genes or as a more quantitative trait controlled by numerous genes [2-4]. Genes involved in iron reduction, uptake, and transport have been identified in model organisms such as rice and

Arabidopsis [5-11], but have not yet been identified in soybean. The research presented throughout this dissertation was performed to identify similar candidate genes in soybean in an effort to further understand soybeans’ iron efficiency.

Previous studies in soybean have identified regions of the soybean genome,

QTLs, related to iron efficiency. The nineteen QTL regions identified in previous studies account for approximately 182cM of the soybean genome and are each responsible for a certain percentage of the chlorosis exhibited in field trials. The use of microarrays to compare the expression profiles of two near isogenic soybean lines, Clark, iron efficient, and IsoClark, iron inefficient, grown under iron stressed and iron non-stressed conditions allowed us to identify genes differentially expressed between the genotypes and within genotypes under different iron concentrations. These microarray experiments have shown that under iron stress conditions iron efficient soybean plants initiate a generalized stress response and a more specific iron stress response. The iron inefficient plant does not fully initiate either stress response. In fact, the iron inefficient plants’ expression profile under iron stress conditions is remarkable similar to that under non-iron stress

134

conditions. Iron deficiency stress appears to have long-term effects on the expression profiles of iron efficient plants.

A global expression analysis, using the Affymetrix® GeneChip® Soybean

Genome Array and the soybean genome sequence identified transcripts co-expressed under iron stress that are clustered in the genome, suggesting co-regulation. This is supported by promoter sequence analysis which identified 23 conserved motifs, most likely indicating transcription factor binding sites. This analysis supports the hypothesis that the differential iron response between iron efficient and iron inefficient plants is a result of mutation(s) either in the promoter region(s) or the coding sequence(s) of the transcription factor(s) regulating soybeans’ iron deficiency response. The mutated or non-induced transcription factor in iron inefficient soybean lines is unable to affect changes in gene expression required for iron efficiency under iron stress conditions.

Future Research

The results of this research have laid the foundation for a variety of future experiments. One of the most promising, and easily implemented, experiments would be to develop the identified single feature polymorphisms into molecular markers. These represent a pool of polymorphisms between the near isogenic lines. Once developed as markers, these SFPs could possibly be used to screen germplasms to predict their iron efficiency. This pool of readymade iron specific markers should substantially increase the number of markers to be used in IDC marker assisted selection programs.

The identification of conserved motifs in the promoter regions of the candidate genes suggests the differential iron response is due to a mutation in either the promoter

135

region of the transcription factor or the coding region of the transcription factor itself.

To test this, future experiments will need to identify what type of transcription factor binds to the identified conserved motifs. Once the candidate transcription factors are identified, the QTL regions of the soybean genome should be screened to determine if any encoded within known IDC QTLs. These would represent the most likely candidate genes. I would propose these candidates be systematically silenced with virus induced gene silencing in both the iron efficient and inefficient NILs. Plants should then be grown in both iron sufficient and iron inefficient hydroponics systems to determine if the silencing had any effect on soybeans’ iron efficiency.

Narrowing the genomic region of interest in IDC from 19 QTL regions to a few specific genes or small genomic regions, should assist plant breeders in improving soybeans’ iron efficiency. This improved iron efficiency should minimize yield loss at the end of the season.

References

1. Lin S, Baumer JS, Ivers D, Cianzio SR, Shoemaker RC: Field and Nutrient Solution Tests Measure similar Mechanisms Controlling Iron Deficiency Chlorosis in Soybean. Crop Science 1998(38):254-259. 2. Cianzio SR, Fehr WR, Anderson IC: Genotypic Evaluation for Iron Deficiency Chlorosis in Soybeans by Visual Scores and Chlorophyll Concentration. Crop Science 1979, 19:644-646. 3. Diers BW, S.R. Cianzio, R.C. Shoemaker: Possible Identification of Quantitative Trait Loci Affecting Iron Efficiency in Soybean. Journal of Plant Nutrition 1992, 15(10):2127-2136. 4. Lin S, Cianzio SR, Shoemaker RC: Mapping genetic loci for iron deficiency chlorosis in soybean. Molecular Breeding 1997, 3:219-229. 5. Vert G, Grotz N, Dedaldechamp F, Gaymard F, Guerinot ML, Briat JF, Curie C: IRT1, an Arabidopsis transporter essential for iron uptake from the soil and for plant growth. The Plant cell 2002, 14(6):1223-1233. 6. Eide D, Broderius M, Fett J, Guerinot ML: A novel iron-regulated metal transporter from plants identified by functional expression in yeast.

136

Proceedings of the National Academy of Sciences of the United States of America 1996, 93(11):5624-5628. 7. Henriques R, Jasik J, Klein M, Martinoia E, Feller U, Schell J, Pais MS, Koncz C: Knock-out of Arabidopsis metal transporter gene IRT1 results in iron deficiency accompanied by cell differentiation defects. Plant molecular biology 2002, 50(4-5):587-597. 8. Grotz N, Fox T, Connolly E, Park W, Guerinot ML, Eide D: Identification of a family of zinc transporter genes from Arabidopsis that respond to zinc deficiency. Proceedings of the National Academy of Sciences of the United States of America 1998, 95(12):7220-7224. 9. Vert G, Briat JF, Curie C: Arabidopsis IRT2 gene encodes a root-periphery iron transporter. Plant J 2001, 26(2):181-189. 10. Bughio N, H. Yamaguchi, N.K. Nishizawa, H. Nakanishi, and S. Mori: Cloning an iron-regulated metal transporter from rice. Journal of experimental botany 2002, 53:1677-1682. 11. Rogers EE, Guerinot ML: FRD3, a Member of the Multidrug and Toxin Efflux Family, controls Iron Deficiency Responses in Arabidopsis. The Plant cell 2002, 14:1787-1799.

137

Appendix: Supplementary Table 1

protein metabolism protein

protein metabolism protein

response to stress response

biotic stimulus

response to abiotic or or to abiotic response

protein metabolism protein

transcription

transcription

protein metabolism protein

response to stress response

protein metabolism protein

biological process unknown biological

response to stress response

response to stress response

developmental developmental

other metabolic metabolic other

response to stress response

PlantGoSlim

T2G39730

AT5G20620

AT4G28480

AT1G52560

A

AT5G48570

AT2G26150

AT5G62020

AT5G20620

AT4G27670

AT1G56300

AT5G53740

AT4G27670

AT3G25230

AT1G34210

AT4G10490

AT4G10250

AtHomolog

PlantGoSlim

bisphosphate

Fe(II) Fe(II)

-

NILs Iron NILs Conditions Sufficient GrownUnder

shock proteinshock

-

1,5

-

like protein protein like

like protein like

its on UniProt

-

-

Ubiquitin

DnaJ

Small heat shock protein shock protein Small heat

carboxylase / oxygenase" carboxylase

"Ribulose

peptidylprolyl peptidylprolyl

Putative 70 kDa Putative 70 kDa

factor

Heat shock transcription Heat

Heat Shock Factor protein Shock Factor Heat 6

Ubiquitin

Small heat shock protein Small heat

DnaJ

No Hits on UniProtNo Hits

No Hits on UniProtNo Hits

No H

Small heat shock protein Small heat

Peptidylprolyl isomerase Peptidylprolyl

No Hits on UniProtNo Hits

hits on Uniprot)

Hypothetical Protein (only 2 Protein (only Hypothetical

oxygenase

Putative 20G

precursor precursor

class IV heat class IV

No Hits on UniProtNo Hits

Polyprotein

Annotation

Confirmed UniProt Confirmed

Q9S7H2

Q1RY14

Q8L7T2

Q1SVQ0

Q7F1F2

O80982

Q9SCW4

Q9S7H2

Q1T3Y4

Q8H2B1

Q1T3Y4

Q9FJL3

Q1SJ63

Q9ZSA7

Q39819

Q6WE90

ID

UniProt

Best Hit Best

899

2.261

2.267

2.295

2.297

2.32

2.459

2.739

2.884

2.

3.173

3.365

3.469

5.297

5.591

5.724

5.922

7.959

8.776

9.416

9.9

12.383

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

Change Change

Fold Fold

ble 1: Differentially Expressed Genes Between 1: ble Genes Expressed Differentially

GmaAffx.93424.1.S1_s_at

GmaAffx.74022.1.S1_at

GmaAffx.5924.1.S1_at

Gma.1727.1.S1_at

GmaAffx.72322.1.S1_at

Gma.8636.1.S1_at

GmaAffx.56241.2.S1_at

GmaAffx.93424.1.S1_x_at

Gma.10282.2.S1_at

Gma.12660.1.A1_at

GmaAffx.89665.1.A1_s_at

Gma.11793.1.S1_at

GmaAffx.90956.1.S1_s_at

Gma.10282.1.A1_at

Gma.2185.3.S1_at

GmaAffx.62046.1.S1_at

Gma.14554.1.S1_at

Gma.17141.1.S1_at

Gma.18.1.S1_at

Gma.12096.1.A1_at

GmaAffx.93650.1.S1_s_at

CSvICS Fold2MBEI CSvICS Supplementary Ta Supplementary

138

Appendix: Supplementary Table 2

ys

se to stress se

transport

other metabolic metabolic other

biological process unknown biological

transport

respon

protein metabolism protein

response to stress response

other metabolic metabolic other

response to stress response

other metabolic metabolic other

response to stress response

other metabolic metabolic other

pathwa

electron transport or energy transportenergy or electron

other cellular cellular other

response to stress response

response to stress response

response to stress response

biological process unknown biological

stimulus

response to abiotic or biotic or to abiotic response

stimulus

response to abiotic or biotic or to abiotic response

Plant GoSlim

like like

-

t

ot

(EC (EC transferase

-

lipase/acylhydrolase

like protein protein like

motif lipase

motif

-

-

-

transfer protein transfer

-

specific lipid transferspecific

-

in Clark Genotype Induced by Iron by Genotype Clark in Deficiency Induced

protein

Non

GSDL

Cyclin B Cyclin

Lipid

Peroxidase

Lectin precursor Lectin

class I heat shock heat protein class I

GDSL

class II heat shock protein class II

Patatin

No Hits on UniProtNo Hits

No Hits on UniProNo Hits

No Hits on UniProtNo Hits

class II heat shock protein class II

Pphosphatase

Lipoxygenase Lipoxygenase

Glutathione S

class I heat shock heat protein class I

class II heat shock protein class II

class II heat shock protein class II

Hypothetical Protein Hypothetical

No Hits on UniPrNo Hits

2

Inducible nitrate reductase [NADH] nitrate Inducible

1

Inducible nitrate reductase [NADH] nitrate Inducible

Confirmed Annotation Confirmed

Q1KL62

Q6Y0F0

Q9FNI1

Q43019

O23961

Q8L683

Q2HTU2

Q8LB81

P05477

Q29VI1

P05477

Q8GUC2

O24320

P32110

P02519

P05477

P05477

Q9FLI7

Q9SYR0

P54233

ID

UniProt

Best Hit Best

20.282

21.276

21.843

22.873

22.95

23.387

23.45

23.45

23.841

25.096

25.897

30.862

31.975

32.548

33.011

34.439

35.555

38.632

41.131

41.97

47.753

50.729

56.611

58.948

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

Average Fold Fold Average

: Differentially Expressed Genes : Genes Expressed Differentially

ble 2 ble

_at

GmaAffx.89407.1.A1_s_at

Gma.10073.2.S1_at

GmaAffx.30771.1.S1_at

Gma.17974.1.S1_at

GmaAffx.89697.1.S1_s_at

GmaAffx.89246.1.S1_s_at

GmaAffx.89896.1.S1_at

Gma.10073.1.A1_at

GmaAffx.69544.1.S1_at

Gma.3481.1.S1_at

GmaAffx.20332.1.A1_at

GmaAffx.39719.1.S1_at

GmaAffx.82745.1.S1_at

GmaAffx.69544.1.S1_x_at

Gma.10581.1.S1_at

GmaAffx.93591.1.S1_s_at

GmaAffx.88762.1.S1_at

Gma.17947.1.S1_at

GmaAffx.69544.2.S1_at

GmaAffx.69544.1.S1_s_at

Gma.12681.1.S1_at

GmaAffx.87317.1.S1_s_at

Gma.8416.1.S1_at

Gma.1221.1.S1_s

CSvCDwoCD1Fold2MBEI Supplementary Ta Supplementary

139

c

response to stress response cellular other to stress response biotic or to abiotic response stimulus cellular other metabolic other to stress response to stress response to stress response transport process unknown biological to stress response to stress response metabolic other process unknown biological cellular other cellular other process unknown biological cellular other cellular other metabolic other metaboli other transportenergy or electron pathways developmental

phosphate -

hyltransferase

tein homologue met methyltransferase

- -

hock proteinhock xylulose 5 xylulose 1a protein like precursor

- - like protein like D methylglutaryl methylglutaryl A coenzyme

- - like protein protein like like like Lipase/Acylhydrolase - - chain alcohol dehydrogenase alcohol chain transfer protein transfer like protein like - - - doreductase/dehydrogenase deoxy - eductoisomerase 17.5 kDa class I heat shock class protein I 17.5 kDa Apyrase factor, putativeProtodermal Peroxidase Lipoxygenase 1 r GDSL Thaumatin Oxi shock heat protein (HSP class 18.5) I Lipid latexMajor pro low molecular class II Cytosolic heat shock protein weight on UniProtNo Hits heat s class II Short oxidase precursor amine Copper Hydroxy synthase synthase Chalcone MLP O acid Caffeic O acid Caffeic Patatin putative Lipase, Lipoxygenase T2 Ribonuclease

8 P04794 Q84UE0 Q9S72 O23961 P38417 Q6EJD0 Q1SYS9 Q9FSG7 Q9ZRF1 Q2HTU2 Q43019 Q9LEJ8 Q9XGS6 P19242 Q9SQJ3 O65749 Q94ET0 Q2ENC4 Q1T055 Q3SCM5 Q3SCM5 Q29VI1 Q9M9Y7 P24095 Q1S6Z1

19.916 19.137 19.111 18.025 17.952 17.651 17.318 17.002 16.987 16.781 16.404 16.346 16.236 15.369 15.278 15.19 14.991 14.95 14.805 14.714 14.455 14.442 13.969 13.777 13.375 13.356 ------

ble 2 (Continued) 2 ble

GmaAffx.93268.1.S1_at Gma.15856.2.S1_at Gma.3043.1.S1_at Gma.16812.1.S1_s_at GmaAffx.93553.1.S1_s_at Gma.2480.1.S1_at GmaAffx.88546.1.S1_at GmaAffx.90538.1.A1_s_at GmaAffx.15215.1.S1_at GmaAffx.89896.1.S1_s_at Gma.2239.1.S1_at Gma.8522.1.S1_at Gma.10763.1.S1_at GmaAffx.89245.1.S1_s_at Gma.7766.1.S1_at Gma.12746.1.A1_at GmaAffx.766.1.S1_at GmaAffx.73904.1.S1_at GmaAffx.77637.1.S1_at Gma.15610.1.S1_at Gma.6550.2.S1_x_at Gma.6550.2.S1_s_at Gma.10923.2.S1_at Gma.16846.1.A1_at GmaAffx.90263.1.S1_s_at GmaAffx.51733.1.A1_at Supplementary Ta Supplementary

140

e to stress e us biological process unknown biological process unknown biological to stress response process unknown biological metabolic other metabolic other cellular other to stress response metabolic other to stress response and organization cell biogenesis biotic or to abiotic response stimulus cellular other biotic or to abiotic response stimul process unknown biological biotic or to abiotic response stimulus respons cellular other biotic or to abiotic response stimulus to stress response to stress response

galactosyl transferase

- niProt O -

protein rich type trypsin inhibitor KTI1 inhibitor trypsin KTI1 type - - glucosyltransferase - oxophytodienoate reductase oxophytodienoate methyltransferase, 2 family methyltransferase, - - Copper amine oxidase amine Copper Kunitz precursor Glutathione peroxidase shock transcription Heat factor (HSFA) like heavy chain protein Myosin 3 Flavonoid UDP CPRD12 protein shock protein heat precursor class IV on U No Hits Protein Hypothetical 12 Proline Lipoxygenase O Lipoxygenase inhibitor p20 Trypsin 1 Lipoxygenase heat shock class protein I 17.5 kDa phosphatase Acid Lipoxygenase hemoglobin Nonsymbiotic heat shock class protein I 17.5 kDa Protein Hypothetical

Q6A174 P25272 Q6A4W8 O82042 Q1RYE3 Q9ZWS2 Q8LP23 P93697 Q39819 Q1SJ63 Q4ZJ73 Q39887 Q42780 Q2HTB5 Q42780 Q7M1S6 P27480 P04794 Q707M7 Q42780 Q3C1F4 P04794 Q1SJ63

793 13.259 13.213 12.953 12. 12.735 12.569 12.491 12.32 11.904 11.873 11.84 11.628 11.529 11.526 11.507 11.419 11.368 11.363 11.344 11.256 11.185 11.164 10.97 10.796 ------

1.S1_at

ble 2 (Continued) 2 ble ffx.93268.1.S1_s_at

Gma.4659.1.S1_at Gma.10689.1.S1_at GmaAffx.93342.1.S1_s_at GmaAffx.71308.1.S1_at GmaAffx.13179.1.S1_at GmaAffx.28120.1.S1_at GmaAffx.76187. Gma.1594.1.S1_at Gma.18.1.S1_at Gma.5496.1.S1_at Gma.14554.1.S1_at GmaAffx.6142.1.S1_at Gma.15996.1.S1_x_at Gma.10969.1.S1_x_at Gma.10216.3.A1_x_at Gma.10969.1.S1_at Gma.9202.1.S1_at Gma.16500.1.S1_at GmaAffx.69311.1.S1_at Gma.13045.1.S1_at Gma.10969.1.S1_a_at GmaAffx.70981.1.S1_at GmaA Gma.7580.1.S1_at Supplementary Ta Supplementary

141

ss unknown ort or energy or energy ort

biological process unknown biological biotic or to abiotic response stimulus cellular other cellular other process unknown biological to stress response process unknown biological cellular other metabolism protein transp electron pathways cellular other transportenergy or electron pathways process unknown biological process unknown biological cellular other process unknown biological proce biological cellular other metabolic other cellular other metabolic other metabolic other

anchored

-

inhibiting protein

-

transferase - hydroxylase -

2, chloroplast precursor precursor 2, chloroplast

like protein protein like - oxide cyclase precursor oxide cyclase like lipase/acylhydrolaselike - - -

Fe(II) oxygenase Fe(II) - methyltransferase, 2 family methyltransferase, - Hypothetical Protein Hypothetical Polygalacturonase precursor O Ferritin Protein Hypothetical Allene factor, putativeProtodermal Transferase precursor chloroplast Ferritin, Lectin P450 Cytochrome Lipoxygenase GDSL factor, putativeProtodermal Protein Hypothetical P450 82A2 Cytochrome Transferase protein GPI transfer Lipid 3 Flavanone phosphatase Acid inhibitor Trypsin 1 enzyme biosynthetic Rhamnose Putative protein APG 2OG Glutathione S related nodulin protein, gene Early putative Patatin

5

SFJ0 32110 Q1SSI3 Q6WMU Q2HTB5 Q94IC4 Q1S265 Q599T8 Q9S728 O64470 P19976 Q2XV15 Q1RV95 O24320 Q9STM6 Q9S728 Q1 O81972 Q1T1P6 Q700A6 Q53B69 Q6YGT9 Q9LLX2 Q9SYM5 Q67ZI9 Q9SWY6 P Q5J7N0 Q29VI1

10.794 10.735 10.637 10.599 10.578 10.575 10.422 10.353 10.2 10.191 10.172 10.13 10.095 10.03 10.023 9.984 9.627 9.617 9.573 9.552 9.503 9.478 9.45 9.294 9.29 9.195 9.04 ------

ble 2 (Continued) 2 ble ma.7880.1.S1_at

Gma.791.1.S1_at Gma.9956.1.S1_at Gma.10216.1.S1_at Gma.2505.1.S1_at G Gma.12166.1.S1_at Gma.18014.2.S1_x_at Gma.17985.2.S1_at Gma.13352.1.S1_at Gma.15722.1.S1_at Gma.13058.1.A1_s_at GmaAffx.81415.1.S1_at GmaAffx.76826.1.S1_at Gma.18014.1.S1_a_at Gma.4294.3.S1_a_at Gma.153.1.S1_x_at Gma.17985.1.S1_at GmaAffx.59009.1.S1_at Gma.5461.1.S1_at Gma.529.1.S1_a_at Gma.2220.1.S1_s_at GmaAffx.93185.1.S1_s_at Gma.12966.1.S1_at Gma.2316.1.S1_at GmaAffx.88762.1.S1_x_at GmaAffx.85837.1.S1_at Gma.10923.1.A1_at Supplementary Ta Supplementary

142

ess unknowness

lar lar

on transport or energy on transport energy or other biological biological other to stress response biotic or to abiotic response stimulus transportenergy or electron pathways metabolism RNA or DNA proc biological metabolic other cellular other to stress response cellular other metabolism protein metabolic other to stress response process unknown biological cellular other metabolic other cellular other transport cellu other to stress response electr pathways

gated ion channel ion channel gated

- methyltransferase methyltransferase - O

-

motif lipase - superoxide dismutase - Hits on UniProt

Annexin shock heat protein class IV 22.0 kDa precursor Iron No P450 Cytochrome nuclease (Fragment) Bifunctional on UniProtNo Hits dehydrogenase (NAD) Aldehyde Ferritin HEV1.2 diester Glycerophosphoryl phosphodiesterase 2C phosphatase Protein 1 , family Glycoside shock heat protein 101 kDa Protein Hypothetical 3 acid Caffeic GSDL Protein Hypothetical nucleotide Cyclic 8 A synthase Chalcone on UniProtNo Hits heat shock class protein I 17.5 kDa on UniProtNo Hits P450 Cytochrome

5848 9889 O6 Q39820 Q71UA1 Q1RV95 Q9ZR88 Q1SCN9 Q941G7 Q6JYQ9 Q84WI7 Q1SSW0 Q1SWW1 Q3 Q84R94 P28002 Q5J7N0 Q6K237 Q1RWG4 P48399 P04793 O81972

8.906 8.784 8.697 8.691 8.687 8.647 8.611 8.591 8.581 8.473 8.472 8.466 8.462 8.439 8.432 8.404 8.336 8.25 8.191 8.152 8.118 8.031 8.009 7.973 ------

ble 2 (Continued) 2 ble

GmaAffx.93243.1.S1_s_at Gma.17.1.S1_at Gma.16827.1.S1_at Gma.16098.1.A1_at Gma.13058.1.A1_at Gma.3617.1.S1_at GmaAffx.55607.1.S1_at Gma.13140.4.S1_at Gma.1089.1.S1_s_at GmaAffx.92499.1.S1_s_at Gma.6211.2.A1_at Gma.13918.1.A1_at GmaAffx.90617.1.S1_s_at Gma.48.1.S1_at Gma.10342.1.S1_at Gma.6550.3.S1_s_at GmaAffx.54632.1.S1_at Gma.7623.1.A1_at Gma.4305.3.S1_a_at Gma.13327.2.S1_a_at Gma.3026.1.S1_at GmaAffx.92420.1.S1_s_at Gma.12096.1.A1_at GmaAffx.92620.1.S1_s_at Supplementary Ta Supplementary

143

energy energy

ellular ellular ription other metabolic metabolic other transport to stress response to stress response metabolic other to stress response process unknown biological transc transportenergy or electron pathways to stress response c other transport to stress response metabolic other cellular other transport or electron pathways biological other cellular other transport cellular other

L -

S catalytic catalytic - -

associated associated D

- -

like protein cspE protein like -

4, chloroplast precursor precursor 4, chloroplast

- motif lipase/acylhydrolase dase - chain type type chain dependent dehydroascorbate dehydroascorbate dependent

- - Short dehydrogenase/reductase resistance Multidrug 10 protein Peroxi shock protein Small heat GDSL on UniProtNo Hits shock 70 kDa protein Heat non Polygalacturonase precursor subunit AroGP3 Cold shock Amine oxidase, putative isomerase Peptidylprolyl Ferritin precursor Patellin/ glycoprotein GSH 1, putative reductase Peroxidase 2, beta chain synthase Tryptophan precursor chloroplast diphosphate synthase small Geranyl subunit P450 Cytochrome G enzyme, Lipolytic protein like Cellulose synthase protein extrusion Multi antimicrobial MatE CER1 like protein

Q3LHL1 Q9LZJ5 Q43854 Q1T3Y4 Q9M8Y5 P26413 P93218 Q89A90 Q9SKX5 Q9FJL3 Q948P5 Q9LEN5 Q6F4I6 Q9XFI8 Q8GWQ8 Q9SBR4 Q6J540 Q1S8F9 Q3Y6V1 Q1SD91 Q2HW67

7.967 7.935 7.849 7.832 7.823 7.822 7.745 7.737 7.721 7.715 7.64 7.583 7.57 7.536 7.444 7.423 7.413 7.372 7.299 7.286 7.285 7.274 ------

at

ble 2 (Continued) 2 ble

GmaAffx.93030.1.S1_s_at Gma.4457.1.S1_a_at Gma.4305.1.S1_at Gma.10282.1.A1_at Gma.9185.1.S1_at GmaAffx.85017.1.S1_s_at GmaAffx.30428.1.S1_ GmaAffx.57878.2.S1_at GmaAffx.64238.1.S1_at GmaAffx.32845.1.A1_at Gma.2185.3.S1_at Gma.1586.1.S1_at GmaAffx.89772.9.A1_s_at Gma.4366.1.S1_at Gma.6477.1.S1_at Gma.736.1.A1_at Gma.9069.1.S1_at GmaAffx.61184.1.S1_at Gma.4558.1.A1_at Gma.11871.2.S1_at GmaAffx.73749.1.S1_at Gma.15792.1.S1_at

plementary Ta plementary Sup

144

biotic

cal process unknowncal biological process unknown biological transportenergy or electron pathways metabolism RNA or DNA biologi transportenergy or electron pathways metabolism protein metabolic other biological other and organization cell biogenesis to stress response metabolism RNA or DNA transportenergy or electron pathways process unknown biological or to abiotic response stimulus biological other transport transport metabolic other transport process unknown biological

450 associated associated

-

type Fe(Cys)4 protein type -

ch protein ch ri ic peroxidaseascorbate 2 - glucosyl transferase glucosyl - Hypothetical Protein Hypothetical on UniProtNo Hits Cytosol nuclear cell antigen Proliferating on UniProtNo Hits phosphatase Acid on UniProtNo Hits Rubredoxin kinase Serine/threonine UDP protein regulated Gibberellin NAC3 domain protein NAC Proline Peroxidase nuclear cell antigen Proliferating P Putative cytochrome protein binding Calcium oxidase Polyphenol Lipoxygenase protein regulated Gibberellin resistance Multidrug 10 protein intrinsic protein Major 1 hydrolase, family Glycoside transporter 3phosphate glycerol Transferase

Q9ARX2 Q76LA6 O82134 Q764C1 Q1SMR5 Q7XA30 Q9ZWQ5 Q1T1E0 Q52QR3 Q39887 O23961 O82134 Q1RV95 Q8W4L0 Q84YI1 O24320 Q948Z4 Q9LZJ5 Q1SL51 Q1RV93 Q9STY1 Q1SD30

7.258 7.251 7.234 7.234 7.19 7.176 7.146 7.145 7.139 7.132 7.128 7.124 7.118 7.105 7.094 7.08 7.07 7.012 6.957 6.918 6.918 6.901 6.877 6.868 6.861 ------

ble 2 (Continued) 2 ble .1.S1_at fx.34532.1.S1_at

GmaAffx.70494.1.A1_at Gma.17890.1.S1_at GmaAffx.93393.1.S1_s_at Gma.431.2.S1_a_at Gma.13729.1.A1_at Gma.529.2.S1_at Gma.11116.1.S1_at Gma.11803.1.S1_at Gma.13369.1.S1_at Gma.4386 GmaAffx.90343.1.S1_at GmaAffx.93220.1.S1_at Gma.79.1.S1_s_at Gma.16812.1.S1_x_at Gma.431.2.S1_x_at GmaAffx.40080.1.A1_at Gma.1518.2.S1_a_at GmaAffx.12887.2.S1_at Gma.11166.1.S1_x_at Gma.3433.1.S1_at Gma.4457.1.S1_at Gma.15715.1.S1_at GmaAf Gma.3539.2.S1_at Gma.17606.2.S1_at Supplementary Ta Supplementary

145

her cellular cellular her cellular her electron transport or energy transportenergy or electron pathways cellular other to stress response transport cellular other transportenergy or electron pathways metabolic other process unknown biological ot process unknown biological metabolism protein ot transcription metabolic other transport metabolic other process unknown biological metabolic other metabolic other metabolic other cellular other process unknown biological metabolic other

e

like like

-

subunit

-

motif lipase/hydrolase

lucose glucosyltransferase lucuronosyltransferase

-

g g glucose glucosyltransferase - - - methyltransferase, 2 family methyltransferase, - Cytochrome P450 monooxygenase P450 monooxygenase Cytochrome CYP72A1 O Protein Hypothetical UDP Tonoplast intrinsic protein synthase Flavonol P450 monooxygenas Cytochrome Protein Hypothetical b lyase ATP citrate Annexin carboxylase Phosphoenolpyruvate kinase dehydrogenase subunitSuccinate 3 (Fragment) transcription factor bHLH on UniProtNo Hits protein extrusion Multi antimicrobial (MatE) glucronosyltransferase UDP glycoprotein precursor Stem 28 kDa on UniProtNo Hits UDP UDP Lipase Apyrase Protein Hypothetical GDSL protein

AK7 Q2L Q2HTB5 Q1RW87 Q1S2F4 Q39883 Q8LP22 Q2LAK7 Q5N800 Q8L9B2 Q93YH3 Q1SAQ5 Q8GSL0 Q94FP3 Q9T072 Q1T635 Q8L5C7 O49855 Q8L5C7 Q1S2D1 Q941F1 Q9FVC2 Q7F2A8 Q1SQA1

6.856 6.852 6.817 6.815 6.788 6.755 6.747 6.731 6.713 6.668 6.667 6.667 6.651 6.636 6.616 6.598 6.583 6.579 6.569 6.558 6.554 6.549 6.541 6.525 6.514 ------

t

ble 2 (Continued) 2 ble 72198.1.A1_at

Gma.17702.2.S1_a_at Gma.10216.2.S1_at Gma.3813.1.S1_at GmaAffx.82240.1.S1_at GmaAffx.5414.1.S1_s_at GmaAffx.3256.2.S1_at Gma.17702.2.S1_x_at Gma.12160.1.S1_at Gma.1361.1.S1_at GmaAffx.80064.1.S1_at GmaAffx.61222.1.S1_s_at Gma.878.1.S1_at Gma.584.3.S1_at GmaAffx.43239.1.S1_at Gma.5496.1.S1_s_at Gma.10447.2.A1_a_at GmaAffx. Gma.1532.1.S1_at GmaAffx.92964.1.S1_at Gma.8867.2.S1_at Gma.2141.1.S1_at Gma.12487.2.S1_at Gma.15856.1.S1_at Gma.12482.1.A1_a Gma.9508.3.S1_s_at Supplementary Ta Supplementary

146

cellular cellular

other cellular other cellular other transcription process unknown biological to stress response biological other metabolic other transportenergy or electron pathways cellular other process unknown biological transportenergy or electron pathways to stress response transduction signal transcription metabolic other transportenergy or electron pathways to stress response process unknown biological to stress response

-

dependent methyltransferase - - hrome P450 like related proteinrelated 10 ecursor - -

like protein like specific 3 - -

motif lipase/acylhydrolase - No Hits on UniProtNo Hits Ferritin O acid Caffeic Pyrophosphate beta subunit phosphofructokinase Albumin 1 pr Pathogenesis At1g05230/YUP8H12_16 Embryo Osmotin protein regulated Gibberellin GDSL Putative cytoc Protein Hypothetical dioxygenase Leucoanthocyanidin protein like on UniProtNo Hits Protein Hypothetical P450 Cytochrome monooxygenaseCYP93D1 inhibitor Proteinase regulator A response Type nucleosome/chromatin type HMG D factor assembly Oxidoreductase Glutaredoxin Peroxidase protein storage Vegatative P450 monooxygenase Cytochrome

1350 Q941G7 Q3SCM5 Q9ZST3 Q9ZQX0 Q9MB25 Q94C37 Q2HUA9 Q4 Q1S753 Q5J7N0 Q1RV95 Q9FLM1 Q9FFF6 Q9LMA8 Q9XHC6 Q1S278 Q71BZ3 Q6Z1Z2 Q5N800 Q9SGP6 Q43854 O49855 Q2LAJ5

132 6.493 6.454 6.414 6.404 6.388 6.376 6.346 6.319 6.318 6.294 6.265 6.261 6.249 6.242 6.233 6.212 6.212 6.211 6.211 6.207 6.205 6.203 6.186 6.179 6. ------

S1_s_at

ble 2 (Continued) 2 ble S1_at .13796.1.A1_at

GmaAffx.88571.1.A1_s_at Gma.1089.1.S1_at GmaAffx.45805.2.S1_at GmaAffx.60443.1.S1_at Gma.16124.1.S1_at GmaAffx.43575.1.S1_at GmaAffx.59986.2.S1_at Gma.10752.2.S1_at Gma.9839.2.S1_at Gma.1379.3.S1_at Gma.17724.3.S1_at Gma.13058.2.S1_at Gma.6211.1.S1_at Gma.18041.1.A1_at Gma.13905.1.A1_at Gma.3893.3. Gma.5510.2.S1_s_at Gma.17733.1.S1_s_at Gma Gma.15947.1.S1_at Gma.12160.1.S1_a_at GmaAffx.90206.1. Gma.4305.2.S1_a_at GmaAffx.89635.1.A1_s_at Gma.9894.1.S1_at Supplementary Ta Supplementary

147

abiotic or biotic or abiotic

transport or energy transport energy or

metabolic metabolic transport other to stress response process unknown biological process unknown biological metabolic other transport transportenergy or electron pathways process unknown biological electron pathways cellular other to stress response metabolic other transportenergy or electron pathways to response stimulus developmental process unknown biological cellular other metabolic other

L

- S -

D

-

-

1 CoA thiolase - -

ductase e protein e

protein rich - like protein like

transfer protein transfer - lik - - 4 - Polyprotein Lipid S Thiohydroximate glucosyltransferase Peroxidase IDS oxidase Polyphenol IDS4 dehydrogenase (NAD) Aldehyde protein family transporter, 3phosphate glycerol putative re Nitrite transporter ABC Putative phi P450 monooxygenase Cytochrome ofedges protein calyx Adhesion precursor) ACE (HOTHEAD Annexin Protein Hypothetical G enzyme, Lipolytic Amine oxidase protein inhibiting Polygalacturonase precursor Acetoacetyl phosphatase Acid Lipase Proline

Q6WE90 Q1S9L0 Q947K4 Q7XYR7 Q8RY68 Q84YI1 Q8GWZ3 Q1SCN9 Q9STY1 O04840 Q6K4D2 Q9FY71 Q2MJ08 Q9S746 O82090 Q1SHX9 Q1SQA1 Q9SKX5 Q6A171 Q2L8A7 Q6YGT9 Q1SUN0 Q1SAY6

6.067 6.054 6.051 6.038 6.025 6.008 6.006 5.985 5.959 5.952 5.951 5.939 5.921 5.908 5.851 5.849 5.845 5.841 5.813 5.811 5.803 5.798 5.794 ------

ble 2 (Continued) 2 ble .2.S1_at

GmaAffx.93650.1.S1_s_at Gma.4331.1.S1_a_at Gma.4239.2.S1_at GmaAffx.90703.1.A1_at Gma.1008.1.A1_at GmaAffx.12887.1.S1_at Gma.3216.3.A1_at Gma.13140.1.A1_at Gma.3539.1.S1_at Gma.31.1.S1_at GmaAffx.27787.1.A1_at GmaAffx.83880.1.A1_at GmaAffx.73536.1.S1_at Gma.10072.1.S1_at GmaAffx.1301.59.S1_at Gma.17840.1.S1_at Gma.9508.2.S1_at Gma.1578.1.S1_at GmaAffx.91749.1.S1_s_at GmaAffx.90393.1.S1_s_at Gma.529.1.S1_x_at GmaAffx.83607.1.S1_at Gma.5786 Supplementary Ta Supplementary

148

biotic

metabolic other metabolic other cellular other metabolic other metabolism RNA or DNA metabolic other process unknown biological transport transportenergy or electron pathways to stress response process unknown biological cellular other or to abiotic response stimulus cellular other biological other process unknown biological to stress response

se

glucosyltransferase

- oxide lyase 0 -

CoA thiolase tif lipase tif lipase - glutamyltransferase glutamyltransferase

- mo

- amyrin syntha amyrin - ketoacyl - Hypothetical Protein Hypothetical resistance Glyoxalasee/bleomycin protein GDSL 3 Annexin on UniProtNo Hits Beta nuclear cell antigen Proliferating 3 Flavonoid Protein Hypothetical (Fragment) Multicystatin protein precursor transfer Lipid specific protein Tyrosine protein family phosphatase reductase Nitrite Protein Hypothetical Sulfolipid synthase 9/13 hydroper Metallophosphatase (Diphenol oxidase) family Laccase protein Lipoxygenase on UniProtNo Hits Calmodulin on UniProtNo Hits protein transfer Lipid Gamma transcription homeodomain factor

Q1SGA4 Q1S0D0 Q8LB81 Q6TXD0 Q42922 Q8H2B0 O82134 Q9ZWS2 Q8W4Z5 Q6V4X1 Q1KL62 Q9ZVN4 O04840 Q1KSB9 Q69LA5 Q2LAJ5 Q8VXF4 Q84J37 O24320 Q6LEG8 Q6WAT9 Q8W1X8 Q02992

5.787 5.771 5.749 5.727 5.696 5.689 5.683 5.681 5.678 5.659 5.653 5.63 5.625 5.62 5.62 5.598 5.596 5.596 5.594 5.593 5.581 5.568 5.564 5.551 5.509 5.483 ------

_at

ble 2 (Continued) 2 ble 14.1.S1_at

GmaAffx.24879.1.S1_at Gma.7771.1.S1_at GmaAffx.50097.1.S1_at GmaAffx.72944.1.S1_x_at Gma.6613.2.S1_at GmaAffx.66400.1.A1_s_at Gma.16877.1.S1_at Gma.431.2.S1_at Gma.1002.2.S1_at Gma.3048.1.S1_a_at Gma.33 Gma.4630.1.S1_at Gma.6942.1.S1_at GmaAffx.80581.1.S1 GmaAffx.57661.1.S1_at Gma.17388.1.S1_at GmaAffx.79275.1.S1_s_at GmaAffx.16966.1.S1_at Gma.1614.1.S1_at Gma.11166.1.S1_at Gma.1649.1.A1_at GmaAffx.92868.1.S1_s_at Gma.12365.2.S1_at GmaAffx.93212.1.S1_at Gma.14319.1.A1_at GmaAffx.1877.1.A1_at Supplementary Ta Supplementary

149

ic or bioticic or

lular lular

us r cellular r cellular othe transport process unknown biological transport cellular other biotic or to abiotic response stimul process unknown biological cellular other to abiot response stimulus transport to stress response cel other process unknown biological cellular other biotic or to abiotic response stimulus process unknown biological metabolism protein process unknown biological biotic or to abiotic response stimulus process unknown biological developmental

catalytic catalytic

-

like

-

tein

monooxygenase monooxygenase monooxygenase 1 non - - -

phosphosulfate phosphosulfate hydroxylase - - CoA thiolase - like protein like -

like protein like - - tRNA reductase

- 2, chloroplast precursor precursor 2, chloroplast - interacting factor interacting 1 cinnamate 4 cinnamate 4 cinnamate - - -

Adenosine 5'Adenosine reductase TIP3 Aquaporin phosphatase 1 Purple acid Polygalacturonase subunit precursor beta Aquaporin 5'Adenosine reductase Trans reductase homolog 2 Isoflavone Fiddlehead Glutamyl protein extrusion Multi antimicrobial MatE Peroxidase precursor Ferritin Transferase GRF1 3'Flavonoid Kunitz ST1 inhibitor Carboxypeptidase on UniProtNo Hits pro binding Calcium Trans Protein Hypothetical Acetoacetyl

Q8W1A1 Q39883 Q9LL80 Q40161 Q39822 Q8W1A1 P37115 Q9SDZ0 Q8VWP9 Q9ZPK4 Q1SD89 Q6T1C8 Q94IC4 Q2HVN0 Q8L8A5 Q8W3Y5 Q1SMD8 Q67Y83 Q8W4L0 P37115 Q9LMA8 Q2L8A7

5.477 5.475 5.465 5.458 5.455 5.455 5.439 5.435 5.427 5.416 5.399 5.38 5.376 5.375 5.36 5.353 5.347 5.341 5.34 5.333 5.327 5.314 5.308 ------

ble 2 (Continued) 2 ble .8081.1.A1_at ma.11026.1.S1_at

GmaAffx.88681.1.S1_s_at Gma.43.1.A1_at GmaAffx.93430.1.S1_s_at Gma.8368.1.S1_at Gma GmaAffx.88681.1.S1_at Gma.3881.2.S1_x_at GmaAffx.93511.1.S1_s_at GmaAffx.25347.1.S1_at GmaAffx.24762.1.S1_at Gma.9728.1.S1_at Gma.5524.1.S1_at Gma.2505.2.S1_x_at GmaAffx.40203.1.S1_at Gma.8684.1.S1_at Gma.8455.1.S1_at G Gma.1951.2.S1_a_at GmaAffx.3021.1.A1_at Gma.1518.2.S1_x_at Gma.3881.2.S1_at Gma.11035.1.S1_at GmaAffx.89899.1.S1_s_at Supplementary Ta Supplementary

150

unknown

esponse to stress esponse developmental developmental biotic or to abiotic response stimulus to stress response transportenergy or electron pathways to stress response transduction signal transportenergy or electron pathways cellular other process unknown biological cellular other to stress response to stress response r cellular other developmental metabolism protein metabolic other process unknown biological process biological

related

like protein like -

-

like protein like

-

methyltransferase methyltransferase -

O disease disease resistane

like - - -

regulating 1 regulating factor 2, chloroplast precursor precursor ( 2, chloroplast

- - odulin gene related odulin protein, gene dependent mannitoldependent - oxophytodienoate reductase oxophytodienoate - Tropinone reductase Tropinone like Dirigent protein response oxidaseReticuline Glutaredoxin lyase 9/13 hydroperoxide protein regulated Gibberellin NAD dehydrogenase kinase familyAdenylate Ferritin protein binding Calcium 3 acid Caffeic pathogenesis Thaumatin, on UniProtNo Hits Peroxidase 12 on UniProtNo Hits on UniProtNo Hits dUTP pyrophosphatase on UniProtNo Hits Homeobox protein Growth kinase, putative tyrosine Protein n Early putative phosphatase Acid on UniProtNo Hits phosphatase acid Pruple

Q9ZW03 Q27JA0 Q9AYM8 Q9SGP6 Q7X9B3 Q49RB3 O82515 Q38JG0 Q94IC4 Q2QY10 P28002 Q1SSC0 Q58GF4 Q4ZJ73 Q9STG6 Q8LJS8 Q6ZIK5 Q1S4Z1 Q5J7N0 Q6YGT9 Q8VXF4

3 7

5.307 5.264 5.26 5.262 5.26 5.24 5.238 5.229 5.224 5.223 5.209 5.177 5.174 5.173 5.166 5.15 5.141 5.138 5.132 5.13 5.126 5.119 5.111 5.104 5.087 5.084 ------

1_s_at

at

ble 2 (Continued) 2 ble 61.1.S1_at

Gma.2096.2.S1_x_at Gma.957.1.S1_at Gma.17032.1.S1_at Gma.1704.1.S1_at GmaAffx.91313.1.S1_s_at Gma.12438.1.S1_s_at Gma.1545.2.S1_at Gma.15584.2.S1_x_at Gma.2505.1.S1_a_ Gma.1518.1.S1_at Gma.9622.1.S1_at GmaAffx.93041.1.S GmaAffx.29360.1.S1_s_at Gma.12616.1.A1_at GmaAffx.93442.1.S1_at Gma.17967.1.S1_at GmaAffx.90548.1.S1_x_at GmaAffx.90001.1.S1_at GmaAffx.87317.1.S1_at GmaAffx.158 GmaAffx.75820.1.S1_at GmaAffx.8902.1.A1_at GmaAffx.52941.1.S1_at Gma.529.1.S1_at Gma.15564.3.S1_s_at Gma.15233.1.S1_at Supplementary Ta Supplementary

151

ar ar

other metabolic metabolic other cellular other process unknown biological developmental metabolism protein process unknown biological process unknown biological cellular other process unknown biological to stress response process unknown biological process unknown biological cellular other cellular other transport cellular other cellul other to stress response to stress response process unknown biological

like like

-

-

like like protein subunit like proteinlike

- - -

transferase transferase iProt - - hydroxylase

- l Protein

kDa glycoprotein precursor glycoprotein precursor kDa

nger protein nger glucuronosyl/UDP like protein like - - ein homolog ein Phosphatase related Homeodomain 3 Flavanone protein fi Zinc reductase Tropinone protease Subtilisin serine Hypothetica acid elongase Fatty on UniProtNo Hits oxidase precursor Polyphenol protein; resistance Disease AAA Rich Repeat ATPase Leucine Calnexin Glutathione S receptor Gibberellin Glutathione S Ids4 Stem 28 prtein, storage Vegetative lyase a ATP citrate on Un No Hits Aquaporin hydroxymethyltransferase, Serine ( precursor mitochondrial 1 enzyme biosynthetic Rhamnose UDP glucosyltransferase shock transcription Heat factor controlled tumor Translationally prot

Q8GUC2 Q1SLD2 Q1T6K5 Q3S345 Q9ZW03 Q1S0Z4 Q6V8P5 Q8LAF8 O24056 Q1SI90 Q39817 P32110 Q9LYC1 Q9FQE8 O48781 P15490 Q6AVB5 Q93YH4 Q506K1 P34899 Q9SYM5 Q1STD8 Q43457 Q944T2

5.084 5.084 5.08 5.062 5.054 5.043 5.041 5.036 5.027 5.026 5.015 5.003 4.99 4.99 4.98 4.979 4.972 4.957 4.935 4.92 4.914 4.912 4.912 4.908 4.906 4.905

------inued)

ble 2 (Cont 2 ble ffx.91725.1.S1_at

Gma.1359.1.S1_at GmaAffx.29025.1.A1_at Gma.4438.3.S1_x_at Gma.15048.2.S1_at Gma.2096.2.S1_a_at GmaAffx.28861.1.S1_at GmaAffx.26533.1.A1_s_at GmaAffx.45449.1.S1_at GmaAffx.92830.1.S1_s_at Gma.6452.1.A1_at GmaA GmaAffx.90865.1.S1_s_at GmaAffx.19480.1.S1_at GmaAffx.81366.1.S1_at Gma.1917.1.S1_s_at GmaAffx.79990.1.S1_s_at GmaAffx.16212.1.S1_at Gma.4462.1.S1_at GmaAffx.93250.1.S1_at HgAffx.12572.1.S1_at Gma.14471.4.S1_x_at GmaAffx.64197.1.S1_at GmaAffx.82887.1.S1_at GmaAffx.71692.1.S1_at Gma.2342.1.S1_at Gma.5831.3.S1_s_at Supplementary Ta Supplementary

152

electron transport or energy transportenergy or electron pathways metabolism RNA or DNA to stress response and organization cell biogenesis biotic or to abiotic response stimulus biotic or to abiotic response stimulus metabolic other metabolism protein transport metabolic other process unknown biological cellular other metabolic other biotic or to abiotic response stimulus biological other cellular other cellular other transport

e -

L - like protein like

- S

-

D

-

t

like like -

CoA 2

-- diphosphate delta - H3/fasciclin protein rich - hydrolase, putativehydrolase, - dependent mannitoldependent glucose glucose glucosyltransferase glucosyltransferase Ig - - - - Hits on UniProt

coumarate - Adenylate kinase kinase familyAdenylate A1 protein Replication on UniProtNo Hits UDP Laccase Proline 4 inhibitor protein Kunitz trypsin synthetas glutamine Cytosolic nucleoid DNA chloroplast CND41, protein binding resistance protein Pleiotropic drug Lipase/ UDP Multicystatin Isopentenyl 1 type isomerase, 18 hydrolase, family Glycoside NAD dehydrogenase on UniProNo Hits G enzyme, Lipolytic Beta protein neck Crooked No Protein Hypothetical on UniProtNo Hits

Q38JG0 Q9SD82 Q1S2D1 O22917 Q39887 P31687 Q07502 Q9FUK4 Q9LS40 O81016 Q8LD01 Q66PF2 Q6V4X1 Q6EJD1 Q1SNG7 Q9ZRF1 Q1S8F6 Q1SN75 Q75HS3 O64865

4.901 4.901 4.874 4.861 4.854 4.844 4.834 4.828 4.814 4.812 4.806 4.793 4.789 4.785 4.777 4.72 4.707 4.701 4.697 4.693 4.682 4.657 4.653 4.645 ------

ble 2 (Continued) 2 ble 1_at Gm_P450_M_s_at -

Gma.15584.2.S1_at GmaAffx.23381.1.S1_at GmaAffx.89678.1.S1_s_at GmaAffx.87441.1.S1_at Gma.5141.1.S1_at Gma.15996.1.S1_at Gma.8472.1.S Gma.436.1.S1_at GmaAffx.92117.1.S1_s_at GmaAffx.91167.1.S1_s_at GmaAffx.92244.1.S1_at Gma.12567.1.S1_at GmaAffx.277.1.S1_at Gma.3314.1.S1_a_at Gma.6602.2.S1_s_at Gma.9221.1.S1_at GmaAffx.81892.1.S1_at AFFX Gma.17825.1.A1_at Gma.12974.1.S1_at Gma.13402.1.A1_at GmaAffx.69470.1.S1_at Gma.3880.1.S1_at GmaAffx.79567.1.A1_at Supplementary Ta Supplementary

153

other cellular cellular other cellular other process unknown biological to stress response cellular other to stress response metabolic other developmental to stress response cellular other to stress response cellular other transportenergy or electron pathways process unknown biological cellular other metabolic other developmental process unknown biological metabolic other to stress response

-

phosphate - 5

-

sphosphate

xylulose

CoA thiolase CoA thiolase

- - binding protein binding protein precursor

D

- 2, chloroplast precursor precursor 2, chloroplast - xylosidase - CoA acetyltransferase, CoA acetyltransferase,

- - omain protein thetical Protein D - ketoacyl deoxy - - 3 di Mevalonate decarboxylase Annexin, Peroxidase 1 synthase Luminal Protein Hypothetical NAC3 domain protein NAC C4 oxideAllene cyclase Ferritin oxideLatex allene synthase on UniProtNo Hits synthase Chalcone P450 monooxygenase Cytochrome CYP72A1 Hypo on UniProtNo Hits 1 dehydrogenase alpha NADH subunit isolog Cellulose synthase synthases, stilbene N and Chalcone terminal on UniProtNo Hits Acetyl 1 cytosolic d ML Beta dehydrogenase Alcohol lyase 9/13 hydroperoxide

Q6TXD0 Q9M381 Q1SAQ5 Q9XFI6 Q5MJZ4 O22639 Q9ZWS2 Q52QR3 Q68IP6 Q94IC4 Q2PJB7 Q2ENC2 Q2LAK7 Q94EY7 Q1S0T2 Q3Y6V1 Q1S1C2 Q2L8A7 Q1S608 Q2MCJ4 Q8LBR3 Q7X9B3

4.645 4.629 4.621 4.62 4.615 4.606 4.598 4.598 4.594 4.576 4.571 4.567 4.561 4.556 4.542 4.542 4.54 4.53 4.527 4.521 4.521 4.511 4.508 4.481 4.473 ------

t

ble 2 (Continued) 2 ble 5.1.S1_at

GmaAffx.72944.1.S1_at Gma.7599.3.S1_at GmaAffx.89049.1.S1_s_at Gma.338.2.S1_s_at GmaAffx.15794.1.S1_at Gma.13350.1.S1_x_at GmaAffx.23059.1.S1_at GmaAffx.57970.1.S1_at Gma.8020.2.S1_at Gma.2505.2.S1_at GmaAffx.86334.2.S1_at Gma.1121 Gma.13327.1.S1_at GmaAffx.77177.1.S1_at Gma.5089.1.S1_at Gma.10151.1.S1_at Gma.3289.1.S1_at GmaAffx.88982.1.A1_s_at Gma.13994.1.A1_a GmaAffx.25607.1.A1_at Gma.4468.1.S1_at Gma.5632.1.S1_at Gma.838.1.S1_at GmaAffx.84474.1.S1_at Gma.6281.1.S1_at Supplementary Ta Supplementary

154

ss unknown

biological process unknown biological process unknown biological cellular other metabolism protein metabolic other metabolism RNA or DNA cellular other transcription proce biological cellular other developmental transportenergy or electron pathways to stress response transportenergy or electron pathways biotic or to abiotic response stimulus process unknown biological to stress response and organization cell biogenesis cellular other

nce 7

- like like protein

-

CoA reductase -

isomerase -

like protein like like proteinlike - glucuronosyl/UDP chromosome maintena chromosome citrate lyase subunit B citrate - - - - 099 isulfide Hypothetical Protein Hypothetical IDS4 finger protein RING BRH1 synthase Flavonol protein kinase, Serine/threonine site active 1 hydrolase, family Glycoside Mini ATP Homeobox protein reductase homolog Isoflavone small synthase, subunit ATP citrate NAM Protein Hypothetical CT on UniProtNo Hits on UniProtNo Hits C4 oxideAllene cyclase D lipoxygenase Seed Protein Hypothetical on UniProtNo Hits UDP glucosyltransferase Actin Cinnamoyl on UniProtNo Hits

31 Q8LA20 O48781 Q1SE79 Q8LP22 Q1SJ01 Q1RV93 Q6UEJ2 Q9FGX1 Q94C37 P52581 Q1SXK5 Q9FY93 Q6SA75 Q2PER5 Q68IP6 Q38JJ2 P24095 Q60D21 Q1S3G7 Q1SQB5 Q9M6

4.471 4.468 4.465 4.464 4.462 4.46 4.454 4.43 4.429 4.425 4.41 4.41 4.398 4.373 4.372 4.36 4.365 4.358 4.354 4.346 4.34 4.337 4.336 4.33 4.32 ------

ble 2 (Continued) 2 ble S1_at fx.90548.1.S1_s_at

Gma.4739.1. Gma.3216.2.S1_at Gma.1165.1.S1_at GmaAffx.3256.1.S1_at Gma.3189.1.S1_a_at Gma.17768.1.A1_at Gma.10482.2.S1_a_at GmaAffx.55661.1.S1_at GmaAffx.75004.1.S1_at Gma.13342.1.A1_at Gma.5018.1.S1_at GmaAffx.81791.1.S1_at Gma.612.1.A1_at GmaAffx.82936.1.S1_at GmaAffx.73659.1.S1_at GmaAffx.83079.1.S1_at Gma.8020.2.S1_a_at Gma.5649.3.S1_a_at GmaAffx.90937.1.A1_at GmaAffx.87089.1.S1_at GmaAf GmaAffx.70608.1.S1_at GmaAffx.93161.1.S1_s_at GmaAffx.93522.1.S1_s_at Gma.1345.1.S1_at Supplementary Ta Supplementary

155

olism

ar ar

ulus process unknown biological cellular other metab protein cellular other metabolic other transcription to stress response biological other to stress response metabolic other cellul other process unknown biological biotic or to abiotic response stim cellular other process unknown biological cellular other

-

coenzyme A A coenzyme

-

subunit

-

phosphate

-

3

specific 3 -

- dependent dependent enate dehydratase enate -

containing - ascorbate oxidase precursorascorbate - No Hits on UniProtNo Hits Embryo Preph superfamily, dehydrogenase Alcohol zinc on UniProtNo Hits bindingprotein like DNA Nucleoid Hydroxymethylglutaryl reductase on UniProtNo Hits rhamosyl Anthocyanidine transferase domain layer homeo cell Outer putative protein, P450 monooxygenase Cytochrome CYP74C Annexin C Aminopeptidase Glucosyltransferase Glycerol 4 acyltransferase Arginase Protein Hypothetical Protein Hypothetical ammonia Phenylalanine/histidine lyase lyase a ATP citrate oxidase amine Copper L NAD family epimerase/dehydratase

Q1S104 Q6JJ29 Q1S8H7 O65739 Q1RY98 Q66PF2 Q7Y0W0 Q2LAJ7 Q9ZR53 Q9FG66 Q8S995 Q1SCJ7 O49046 Q1SMA6 Q1RW87 Q1RYR2 Q93YH4 Q70EW1 Q8LJU3 Q2XPW6

291 4.318 4.316 4.315 4.313 4.306 4.306 4.305 4.301 4.293 4. 4.291 4.285 4.284 4.284 4.272 4.255 4.243 4.239 4.239 4.239 4.239 4.236 4.236 ------

ble 2 (Continued) 2 ble .S1_a_at

GmaAffx.48718.1.S1_at Gma.14850.1.S1_at Gma.2635.1.A1_at Gma.6540.1.S1_at GmaAffx.70258.1.S1_s_at GmaAffx.72179.1.A1_at GmaAffx.56975.1.S1_at Gma.795.1.A1_at Gma.1110.1.S1_at Gma.13015.1.A1_at GmaAffx.86334.1.S1_at GmaAffx.89971.1.S1_s_at GmaAffx.62364.1.S1_at GmaAffx.60849.1.A1_at GmaAffx.51326.1.S1_at Gma.2086.1.S1_at GmaAffx.80375.1.S1_at Gma.16984.1.A1_at GmaAffx.86773.1.S1_s_at GmaAffx.93250.1.S1_s_at GmaAffx.80073.1.S1_at GmaAffx.47157.1.A1_at Gma.11299.3 Supplementary Ta Supplementary

156

biological process unknown biological to stress response to stress response process unknown biological cellular other transportenergy or electron pathways metabolism protein to stress response metabolic other metabolism protein metabolism RNA or DNA metabolism protein process unknown biological biological other cellular other metabolism protein transcription metabolism protein

like like

- binding binding -

catalytic catalytic -

phosphate

- methyltransferase methyltransferase glucosidase glucosidase - 3 - -

-

protein

beta -

CoA O like protein like like binding protein binding protein precursor rich repeat receptor repeat rich - - - - - motif lipase/hydrolase -

ntenyl pyrophosphatentenyl t shock protein 90 GDSL Hypothetical Protein Hypothetical Osmotin Luminal Protein Hypothetical Protein Hypothetical Isope isomerase kinase Thymidine Caffeoyl Hea Peroxidase precursor on UniProtNo Hits DNA Chloroplast nucleoid cnd41 protein maintenance Minichromosomal factor 1,3 Glucan protein kinase, Serine/threonine site active non Polygalacturonase precursor subunit AroGP2 Annexin Glyceraldehyde dehydrogenase Leucine kinase protein transcription factor bHLH N Anthranilate hydroxycinnamoyl/benzoyltransfera

Q8L9J9 Q41350 O22639 Q1RVA3 Q9FX61 Q6EJD1 Q9S750 Q40313 P35016 O23961 Q9AYM7 Q6Z671 Q1S047 Q8RU51 Q1SJ01 P93217 Q9ZR53 Q2I0H4 O82432 O81348 Q1S6U2

3

4.235 4.233 4.23 4.227 4.215 4.213 4.207 4.192 4.191 4.176 4.176 4.171 4.162 4.161 4.161 4.155 4.153 4.143 4.132 4.129 4.12 4.121 ------

ble 2 (Continued) 2 ble S1_s_at 292.1.S1_at 3189.1.S1_at

Gma.9832.1.S1_at Gma.9839.1.S1_at Gma.13350.1. GmaAffx.42517.1.A1_at Gma.16290.1.S1_s_at Gma.6602.1.S1_x_at GmaAffx.27192.2.S1_s_at GmaAffx.88786.1.A1_s_at GmaAffx.76520.1.S1_at GmaAffx.91168.1.S1_at Gma.1715.1.S1_at GmaAffx.5510.1.S1_s_at GmaAffx.85 GmaAffx.37281.1.S1_at Gma.15113.1.S1_a_at Gma. GmaAffx.60035.1.S1_at Gma.385.1.S1_at GmaAffx.89821.1.A1_at GmaAffx.66617.1.S1_at GmaAffx.39734.2.S1_at GmaAffx.58559.1.A1_at Supplementary Ta Supplementary

157

wn

ular ular

biological process unknown biological process unknown biological transport transport process unknown biological biological other cell other cellular other and organization cell biogenesis transcription process unknown biological cellular other transportenergy or electron pathways metabolism RNA or DNA process unkno biological cellular other metabolic other transportenergy or electron

-

methylglutaryl box binding protein box protein binding - -

3 -

on UniProt

dependent dependent

-

hydroxy

- Annexin, Protein Hypothetical intrinsicPlasma membrane protein Phosphate/phosphoenolpyruvate translocator dehydrogenase Alcohol inhibitor Protease GDSL enzyme, Lipolytic hydrolase fold Alpha/beta reductase Methylenetetrahydrofolate 1 Actin on UniProtNo Hits G Phaseolin PG1 Protein Hypothetical 3 3 A reductase coenzyme P450 Cytochrome factor MCM7 licensing Replication homologue Protein Hypothetical NAD family epimerase/dehydratase protein Lipase on UniProtNo Hits No Hits P450 Cytochrome

G753 1101 Q1SAQ5 Q9LMA8 Q5ZF77 Q58J25 Q7 Q8RVX2 Q94CH6 Q1SED4 Q9SE94 Q1SQB5 Q4 Q8LDY8 Q1RY98 Q6QNI1 Q6UEJ2 Q9LK44 Q2XPW6 Q1SH82 O81972

3

4.113 4.11 4.097 4.084 4.082 4.08 4.074 4.07 4.062 4.062 4.058 4.055 4.052 4.039 4.033 4.024 4.017 4.016 4.012 4.008 4.001 3.996 ------

1_at

ble 2 (Continued) 2 ble

GmaAffx.89049.1.S1_at Gma.11035.1.S1_s_at GmaAffx.72444.1.S1_at GmaAffx.72720.1.S1_at Gma.10666.1.S1_at Gma.6231.1.S1_at GmaAffx.87400.1.S1_at GmaAffx.17924.1.S1_at GmaAffx.92142.1.S1_s_at GmaAffx.75159.1.S1_x_at PsAffx.C45000050_at Gma.4563.2.S1_at Gma.17292.1.A1_s_at Gma.4589.2.S1_at GmaAffx.45122.1.S Gma.10482.1.A1_a_at GmaAffx.25551.1.S1_at Gma.11299.3.S1_x_at GmaAffx.38907.1.S1_at Gma.10234.1.S1_at GmaAffx.34785.7.S1_s_at Gma.153.1.S1_at Supplementary Ta Supplementary

158

olic olic

l transduction biological process unknown biological metab other metabolism protein to stress response metabolism RNA or DNA cellular other cellular other biological other transcription transcription biotic or to abiotic response stimulus signa developmental transport cellular other and organization cell biogenesis process unknown biological metabolism protein

coenzyme A A coenzyme -

rotein like p like hydrogenase 1 alpha like proteinlike kinase - responsive element

- - like protein protein like 1, chloroplast precursor precursor 1, chloroplast glucosidase precursor glucosidase - -

- induced protein induced ic subunit -

D -

like RNAse 28 RNAse like - Germin Beta bindingprotein like DNA Nucleoid C4 oxideAllene cyclase subunit alpha polymerase DNA IV (Primase) Ferritin Hydroxymethylglutaryl reductase Auxin Calnexin precursor homolog on UniProtNo Hits factor WRKY4 Transcription Ethylene homolog protein binding TIR on UniProtNo Hits regulator Response S on UniProtNo Hits protein extrusion Multi antimicrobial (MatE) de NADH subunit Protein Hypothetical Actin B type acetyltransferase Histone catalyt Receptor

Q9SC38 O82074 Q2PEZ2 Q599T8 Q9FJ26 P19976 Q8W2E3 P33083 Q39817 Q2PJR9 Q9ZR83 Q1SWS7 Q1RU46 P42814 Q1T635 Q1S0T2 Q1SD74 Q1SQB5 Q9FJT8 Q9FHM8

3.996 3.994 3.992 3.99 3.99 3.982 3.982 3.977 3.977 3.975 3.972 3.967 3.961 3.96 3.959 3.958 3.954 3.952 3.941 3.931 3.93 3.925 3.924 ------

ble 2 (Continued) 2 ble

Gma.15683.1.S1_at GmaAffx.13543.1.A1_at GmaAffx.62256.1.S1_at Gma.8020.3.S1_at GmaAffx.46567.1.S1_at GmaAffx.93603.1.S1_s_at GmaAffx.89783.1.S1_s_at Gma.9961.1.S1_at Gma.6427.3.S1_a_at GmaAffx.52252.1.A1_at Gma.3730.2.S1_a_at Gma.17785.1.S1_at GmaAffx.92441.1.S1_s_at GmaAffx.21045.1.A1_at Gma.300.1.A1_s_at Gma.12599.1.S1_at Gma.12658.1.A1_at Gma.10447.1.S1_at Gma.5712.1.S1_s_at GmaAffx.90450.1.S1_at GmaAffx.75159.1.S1_at GmaAffx.86609.1.S1_at Gma.7646.2.S1_at Supplementary Ta Supplementary

159

tion and tion and

nsport biological process unknown biological organiza cell biogenesis to stress response metabolism protein tra metabolism RNA or DNA cellular other biological other to stress response process unknown biological biotic or to abiotic response stimulus process unknown biological and organization cell biogenesis process unknown biological cellular other cellular other metabolic other metabolism protein cellular other

L -

S

- phosphate - er D

-

nsferase

transferase

fructose - - like protein like - glucose glucosyltra 3 - - - 3 pothetical Protein - complex protein 1 complex protein adenosylmethionine synthetase adenosylmethionine - - STIG1 protein STIG1 H2AX Histone on UniProtNo Hits Sulfolipid synthase on UniProtNo Hits 14 on UniProtNo Hits T UDP glucosyltransferase Protein Hypothetical maintenance Minichromosomal factor Protein Hypothetical Cellulose synthase G enzyme, Lipolytic A1 protein Replication Protein Hypothetical Ammonium transport Protein Hypothetical histone Centromeric Hy S Glutathione S UDP metallopeptidases Peptidase, Mevalonate kinase kinase/phosphomevalonate

Q9SZ28 O65759 Q9LFB4 P46266 Q8H9B2 Q1RTA0 Q8LEX2 Q8H0G9 Q1S8S7 Q6J8X2 Q1S8F9 Q7XZT4 Q1RVM0 Q1SC78 Q1SMQ2 Q6T367 Q8VY60 Q1SFW5 Q9FQF1 Q8LJC6 Q93Z89 Q2HTL9

3.923 3.922 3.921 3.902 3.901 3.898 3.898 3.891 3.882 3.879 3.877 3.876 3.869 3.866 3.86 3.858 3.855 3.851 3.844 3.843 3.841 3.839 3.836 3.83 3.824 ------

ble 2 (Continued) 2 ble ma.1127.1.S1_at

GmaAffx.62941.1.S1_at Gma.10687.1.S1_at Gma.17607.2.S1_s_at GmaAffx.83981.1.S1_at GmaAffx.25369.1.S1_s_at Gma.16576.2.S1_a_at G Gma.3183.3.S1_at Gma.17130.1.S1_at Gma.6128.1.S1_at GmaAffx.52029.1.S1_at Gma.15818.1.S1_at Gma.6331.1.S1_at Gma.13370.1.A1_at GmaAffx.72251.1.S1_at Gma.12400.1.S1_at Gma.7642.1.A1_at GmaAffx.83638.1.S1_at GmaAffx.42249.1.S1_at GmaAffx.68853.1.S1_at Gma.406.4.S1_s_at Gma.10032.1.S1_at GmaAffx.24470.1.S1_at GmaAffx.93541.2.S1_at GmaAffx.377.1.S1_at Supplementary Ta Supplementary

160

ess unknowness

cal

metabolic other biotic or to abiotic response stimulus metabolism RNA or DNA metabolic other to stress response cellular other metabolic other transcription biologi other process unknown biological process unknown biological transport cellular other process unknown biological proc biological transcription metabolism RNA or DNA biological other

L

-

S - synthase synthase

- D

-

-

gamma binding protein III C III

- - - rotein

glucosyltransferase like p like - - like GTP like - ascorbate oxidase precursorascorbate coumarate:CoA ligase ligase coumarate:CoA isoenzyme

- - No Hits on UniProtNo Hits dehydrogenase NADH Uroporphiryn methyltransferase protein like MATE efflux on UniProtNo Hits Protein Hypothetical Pectinesterase L UDP (lipoxygenase) homolog Loxc Cystathionine precursor G enzyme, Lipolytic on UniProtNo Hits shock transcription Heat factor Rac Protein Hypothetical Protein Hypothetical betaine transporter Proline/glycine 4 2 protein Peroxisomal membrane PMP22 Protein Hypothetical factor WRKY transcription maintenance Minichromosomal factor responsive Auxin and ethylene GH3

1SDJ4 Q9SK66 Q1SWY0 Q9LUH2 Q9M1S9 Q1RXS8 O24093 Q43716 P93698 Q9XFG0 Q Q43458 Q6EP31 Q501D5 O81297 Q1S5G3 Q8S5C1 Q9ZS51 Q45NI7 Q2PJR9 Q6UEJ2 Q6QUQ3

3.822 3.82 3.817 3.817 3.816 3.816 3.812 3.811 3.807 3.797 3.795 3.786 3.781 3.779 3.778 3.776 3.775 3.774 3.77 3.766 3.758 3.751 3.75 3.748 ------

ble 2 (Continued) 2 ble 396.1.S1_s_at fx.91641.1.S1_s_at

GmaAffx.73405.1.S1_at GmaAffx.81688.1.S1_at Gma.8220.1.S1_at GmaAffx.59053.1.S1_at GmaAffx.41 GmaAffx.88024.1.A1_at Gma.5744.2.S1_at GmaAffx.87031.1.S1_at Gma.2450.1.A1_at Gma.702.1.S1_at GmaAffx.93405.1.S1_s_at Gma.1907.1.S1_at GmaAffx.61513.1.A1_at Gma.8466.1.S1_at Gma.16292.2.S1_a_at GmaAffx.86552.2.S1_at Gma.16912.1.S1_at Gma.8556.1.S1_at GmaAf Gma.10709.2.S1_a_at GmaAffx.82528.2.S1_at GmaAffx.91768.1.S1_s_at GmaAffx.32647.1.S1_at Gma.9611.1.A1_at Supplementary Ta Supplementary

161

on signal transduction signal metabolic other process unknown biological transcription transduction signal transcription developmental process unknown biological process unknown biological transduction signal cellular other cellular other transcripti process unknown biological metabolism protein like like

- like like -

carboxylic carboxylic n factor, n factor, -

1 -

specific specific -

box binding protein box protein binding 2 derived factor

- - containing containing

- like protein like -

type trypsin inhibitor KTI1 inhibitor trypsin KTI1 type

-

aminocyclopropane - Phosphoinositide C phospholipase Protein Hypothetical putative Lipase/hydrolase, on UniProtNo Hits Kunitz precursor G Phaseolin PG1 Nodulin on UniProtNo Hits Histidine protein phosphotransfer transcriptio like Histone putative on UniProtNo Hits Stromal cell protein isoform Wax synthase on UniProtNo Hits 1 oxidaseacid on UniProtNo Hits on UniProtNo Hits on UniProtNo Hits pyrophosphate Isopentenyl isomerase protein acid binding Nucleic Glutamine cyclotransferase protein kinase Protein

Q1SDG3 Q9ASV9 Q1SQA1 Q8W3K5 Q41101 Q9SLF1 Q94KS0 Q9FHS0 Q1SXA5 Q2HT39 Q8W3Y3 Q6EJD1 Q9SRM4 O81226 Q1S9D3

3.737 3.734 3.734 3.732 3.73 3.726 3.719 3.717 3.717 3.717 3.711 3.705 3.703 3.702 3.699 3.698 3.697 3.695 3.694 3.691 3.688 3.683 ------

.1.S1_at ble 2 (Continued) 2 ble 16000098_at

Gma.11601.1.A1_at GmaAffx.88173.1.S1_at Gma.16787.1.S1_at GmaAffx.64305.1.S1_s_at Gma.16484.1.S1_at Gma.4563.1.A1_at GmaAffx.58494 GmaAffx.42044.1.A1_at Gma.12547.1.A1_at GmaAffx.63832.1.S1_at Gma.13259.2.S1_a_at Gma.4549.1.S1_s_at Gma.11453.1.S1_at GmaAffx.51983.1.S1_at Gma.2776.2.S1_s_at GmaAffx.34785.8.S1_s_at PsAffx.C GmaAffx.87358.1.S1_at Gma.6602.1.S1_at GmaAffx.91743.1.S1_s_at GmaAffx.30749.1.S1_at Gma.18018.1.S1_at Supplementary Ta Supplementary

162

ular ular

ellular ellular cription protein metabolism protein transcription transport trans metabolism protein transport process unknown biological metabolism protein and organization cell biogenesis cell other process unknown biological process unknown biological transport biotic or to abiotic response stimulus biological other transduction signal process unknown biological transport cellular other process unknown biological c other - -

helix DNA

-

loop - 40 repeat protein 40 repeat other nucleotide) other

-

otein

omponent response regulator omponent response regulator binding protein 2 (WRKY)binding protein Rab6 protein binding - c - -

adenosylmethionine synthetase adenosylmethionine synthetase adenosylmethionine - - Lectin precursor Lectin to helix Similarity protein binding protein 1, ATP carrier ADP, precursor mitocondrial DNA precursor Endoplasmin homolog ATP synthesis Mitochondrial transport proton protein coupled Pr Hypothetical kinase Serine/threonine Actin S Sts14 / WD Transducin like GTP protein resistance Disease on UniProtNo Hits Auxin protein induced Two ARR17 protein membrane Integral inhibitor protease some SAM (And motif binding on UniProtNo Hits protein maturation Seed S

Q8L683 Q700C7 O49875 Q9ZPL6 P35016 Q9XGM1 Q60D21 Q2PEU0 O81221 P49613 Q9FJY1 Q9SAI7 O80501 Q9FW47 Q9C9E1 Q9FPR6 Q8L924 O81872 Q1SNV1 Q9SUR3 Q1SFW5

3.682 3.674 3.674 3.671 3.667 3.661 3.659 3.659 3.658 3.658 3.648 3.648 3.647 3.647 3.646 3.644 3.643 3.641 3.641 3.641 3.637 3.634 3.632 ------

ble 2 (Continued) 2 ble 62.1.S1_at x.82443.1.S1_at Gm_P450_3_s_at -

GmaAffx.89246.1.A1_at Gma.136 GmaAffx.91491.1.S1_s_at Gma.7467.1.A1_at GmaAffx.91200.1.S1_s_at Gma.17372.3.S1_a_at GmaAffx.21782.1.A1_at Gma.17368.1.S1_at Gma.16111.3.S1_at Gma.406.5.S1_s_at GmaAffx.4776.1.A1_at GmaAffx.38155.1.S1_at Gma.16367.2.S1_a_at GmaAff AFFX Gma.4162.1.S1_at Gma.13796.2.S1_at GmaAffx.45957.1.S1_at Gma.10634.1.S1_at GmaAffx.56061.1.S1_at GmaAffx.23620.1.S1_s_at Gma.5214.1.S1_at Gma.406.4.S1_at Supplementary Ta Supplementary

163

energy energy

athways other cellular cellular other cellular other transport or electron pathways metabolism protein cellular other transcription physiological other to stress response to stress response process unknown biological process unknown biological developmental cellular other to stress response metabolism RNA or DNA process unknown biological cellular other transportenergy or electron p process unknown biological metabolic other metabolic other

-

related

like protein like - like protein like carboxylic carboxylic - - -

1

-

ein

in

prolyl isomerase isomerase prolyl

-

like

-

like protein protein like - hyeavy likechain hyeavy protein like transcription like factor 1 - 1 protein - aminocyclopropane - 1 oxidaseacid CTP synthase reductase Nitrite subunit 26S proteasome Fructokinase dioxygenase Leucoanthocyanidin protein like Myb on UniProtNo Hits membrane protein Putative integral on UniProtNo Hits pathogenesis Thaumatin, Elongation factor factor; Translation V and G, III Myosin In2 T2 Ribonuclease Putative snRNP prot peptidyl 70 kDa nuclear cell antigen Proliferating Prote Hypothetical Chitinase on UniProtNo Hits kinase familyAdenylate Protein Hypothetical 1 Phospholipase D alpha oxidase 1.10.3.1) (EC Polyphenol Xyloglucan

Q8W3Y3 Q9SY26 O04840 Q9FK79 Q307Z3 Q1SIC0 Q8H271 Q9SE40 Q1SSC0 Q1S825 Q9M9F9 Q9FQ95 Q1S6Z1 Q9SUN5 Q43207 O82134 Q45NI7 Q6JX03 Q38JG0 Q56WM6 O04865 Q7Y249 Q8LDW9

3.627 3.621 3.616 3.616 3.612 3.61 3.608 3.608 3.603 3.602 3.601 3.597 3.596 3.591 3.589 3.589 3.586 3.578 3.572 3.569 3.562 3.561 3.561 3.547 3.545 3.542 ------

ble 2 (Continued) 2 ble 2.S1_at

GmaAffx.92536.1.S1_s_at GmaAffx.87872.1.S1_at GmaAffx.51085.1.S1_at Gma.16563.2.S1_a_at GmaAffx.89990.1.S1_s_at Gma.15167.1.S1_at GmaAffx.12694.1.S1_at Gma.16052.1.S1_at GmaAffx.92422.1.S1_s_at Gma.16718.1.A1_at Gma.11336.1.S1_at GmaAffx.93348.1.S1_s_at GmaAffx.6250.1.A1_at GmaAffx.92986.1.S1_at GmaAffx.17634.1.S1_at GmaAffx.88242.1.S1_at GmaAffx.36627.1.S1_at Gma.431.1.S1_at GmaAffx.82528.2.S1_s_at Gma.6081.1.S1_at GmaAffx.73193.1.S1_at Gma.15584.2.S1_a_at GmaAffx.71230.1.S1_at GmaAffx.42955.1.S1_at GmaAffx.46214.2.S1_s_at Gma.4220. Supplementary Ta Supplementary

164

DNA or RNA metabolism RNA or DNA cellular other transport transportenergy or electron pathways process unknown biological cellular other to stress response metabolic other developmental cellular other metabolism RNA or DNA metabolism protein cellular other metabolism protein metabolism RNA or DNA process unknown biological process unknown biological

carboxylic carboxylic -

1 -

phate

methyltransferase methyltransferase -

isomerase related protein isomerase CoA O glucanase glucanase - - - induced protein induced -

chromosome maintenance maintenance chromosome 1,3 - - Hits on UniProt

complex protein 1 subunitcomplex protein epsilon aminocyclopropane protein like - - - Mini MCM3 protein disphos Mevalonate decarboxylase on UniProtNo Hits binding Polyphosphoinositide Ssh2p protein Caffeoyl disulfide protein Phosphate induced synthase Chalcone Elongation factor factor; Translation V and G, III Beta Auxin Acyltransferase 1 oxidaseacid maintenance Minichromosomal factor T 4 oxygenase Inositol 60S protein ribosomal L1 No on UniProtNo Hits recognition complexOrigin subunit 6 Protein Hypothetical Peroxiredoxin/thioredocin peroxidase

Q1SA78 Q944G0 Q1SMR6 Q43095 Q67UF5 Q9FHM9 P24826 Q1S825 Q94EN5 Q1SZR5 Q9MBD4 Q8W3Y3 Q2R482 O04450 Q1SEA5 Q2PEV4 Q60D01 Q8LAB7 Q1T4N5

3.535 3.535 3.526 3.524 3.522 3.516 3.511 3.51 3.508 3.504 3.496 3.495 3.491 3.488 3.485 3.483 3.474 3.471 3.463 3.462 3.462 3.461 ------

t

_at

S1_at

ble 2 (Continued) 2 ble

ffx.93348.1.S1_at ementary Ta ementary

GmaAffx.17904.2.S1_at Gma.7599.1.S1_a_at Gma.6613.1.A1_at Gma.4284.2.S1_s_at Gma.1034.4.S1_s_at GmaAffx.71811.1. GmaAffx.84607.1.S1_at Gma.17605.1.S1_at GmaA GmaAffx.71901.1.S1_s_at Gma.15985.1.S1_at GmaAffx.88371.1.S1_a GmaAffx.92536.1.S1_at Gma.2371.1.S1_at GmaAffx.64720.1.S1 Gma.13186.1.S1_at GmaAffx.53477.1.S1_at Gma.12692.1.A1_at Gma.16552.1.A1_s_at GmaAffx.4919.1.S1_at Gma.5979.1.S1_at GmaAffx.78026.1.S1_at Suppl

165

ansport process unknown biological metabolism RNA or DNA metabolism protein transport to stress response metabolism protein metabolism RNA or DNA metabolism protein process unknown biological to stress response tr transport cellular other cellular other developmental to stress response developmental process unknown biological metabolic other metabolism protein

like protein like

-

ia type ia type

chain

-

CoA ligase -

5 DNA methyltransferase

glucanase -

tical Protein - expansin 3 -

its on UniProt 1,3 - coumarate - No H Protein Hypothetical maintenance Minichromosomal factor protein/ Copfinger Zinc polyprotein Protein AuxinCarrier Efflux 4 Chaperonin gamma maintenance Minichromosome 5 protein deficient subunit RPN6a 26S proteasome Hypothe Monogalactosyldiacylglycerol synthase Protein Hypothetical pyrophosphatase Inorganic on UniProtNo Hits protein 1, carrier ADP,ATP precursor mitochondrial precursor peptide Phytosulfokine 7 synthase Chalcone Cytosine shock heat protein class IV 22.0 kDa precursor Alpha Protein Hypothetical Beta tRNA synthetase Alanyl Protein Hypothetical on UniProtNo Hits

Q9AS90 Q1S047 Q9C536 Q1RYA5 Q84P23 Q6K3R8 O80786 Q7X5X9 Q949S3 Q1RX96 Q9S791 Q43798 O49875 Q7PCB1 P30081 O49022 P30236 Q6T5H5 Q8W465 Q1SPW5 Q1T1K0 Q1RV18

3.46 3.459 3.454 3.454 3.453 3.452 3.451 3.45 3.45 3.447 3.443 3.443 3.439 3.439 3.426 3.425 3.416 3.406 3.406 3.399 3.398 3.394 3.391 3.391 3.39 ------

ble 2 (Continued) 2 ble .S1_s_at 682.1.A1_at

Gma.12928.1.A1_at GmaAffx.58057.1.S1_at GmaAffx.50239.1.A1_at GmaAffx.71287.1.S1_at GmaAffx.74855.1.S1_at Gma.6332.1.S1_at GmaAffx.68649.1.S1_at GmaAffx.85323.1.S1_at Gma.2892.2.S1_a_at GmaAffx.83805.1.S1_at Gma.11848.1.S1_at GmaAffx.88678.1.S1_at GmaAffx.55792.1.S1_at PsAffx.C62000070_at GmaAffx.90396.1.S1_s_at GmaAffx.88690.1.S1_at Gma.4300.1 GmaAffx.87379.1.S1_at GmaAffx.39393.1.S1_at Gma.6 GmaAffx.77908.1.S1_at GmaAffx.68770.2.S1_at GmaAffx.83421.2.S1_at GmaAffx.93614.1.S1_s_at Gma.17420.1.S1_at Supplementary Ta Supplementary

166

ion

cellular other metabolism protein transcript process unknown biological cellular other metabolism protein cellular other process unknown biological metabolism protein transport cellular other process unknown biological and organization cell biogenesis process unknown biological cellular other to stress response transcription biotic or to abiotic response stimulus to stress response transcription

.like .like -

like like -

like protein like

-

dependent dependent glutamate

-

tubulin 1

- ribosylation factor ribosylation subunit of K+ channelssubunit of K+ silencing protein silencing - - - complex protein 1, theta subunit 1, theta complex protein - No Hits on UniProtNo Hits on UniProtNo Hits transporter 1 ABC particle regulatory Proteasome subunit ammonium transport Symbiotic (Putative transcription protein factor) Protein Hypothetical Anti ADP domain barrel kinase, Pyruvate 17 hydrolase family Glycosyl protein on UniProtNo Hits binding DNA like domain MYB T Beta protein AIM1 Protein Hypothetical Alpha Protein Hypothetical NADH synthase isozyme decarboxylase Pyruvate Protein Hypothetical on UniProtNo Hits Tubulin beta SWI2/SNF2

Q58IU5 Q7X5X9 Q75GI1 Q9LY88 Q9LS09 Q7F2H4 Q1S8N3 Q9FZ86 Q1SDA6 Q9SQR6 O82064 Q1SWD4 Q1SCG2 Q6VAF9 Q9FRK6 Q93WZ7 P51851 Q9C6M4 P29516 Q7XBI0

8

3.383 3.38 3.375 3.374 3.37 3.368 3.366 3.366 3.362 3.362 3.361 3.35 3.352 3.352 3.332 3.328 3.326 3.321 3.319 3.314 3.311 3.309 3.308 3.307 ------

1_at ble 2 (Continued) 2 ble 93126.1.S1_s_at 92715.1.S1_s_at

Gma.14101.1.A1_at Gma.15947.2.A1_at Gma.12014.1.A Gma.2892.2.S1_at Gma.6638.2.S1_a_at GmaAffx.3215.1.A1_at GmaAffx.64309.2.S1_at Gma.11398.4.S1_at GmaAffx. Gma.10997.1.S1_at GmaAffx.59986.1.A1_at Gma.13864.1.A1_at Gma.10357.2.S1_at Gma.1382.2.S1_at Gma.11220.3.S1_at Gma.5757.1.S1_at GmaAffx.90582.1.S1_s_at GmaAffx.45780.1.S1_at GmaAffx.34646.1.S1_at GmaAffx.83224.1.S1_at GmaAffx.85836.1.S1_at GmaAffx. GmaAffx.92474.1.S1_s_at GmaAffx.45461.1.S1_at Supplementary Ta Supplementary

167

lar lar

ponse to stress ponse protein metabolism protein process unknown biological process unknown biological metabolism protein cellu other transport res biotic or to abiotic response stimulus metabolic other metabolism protein and organization cell biogenesis process unknown biological cellular other metabolism protein cellular other metabolic other developmental transport transcription metabolism protein cellular other metabolism protein

like protein like

-

n

subunit

like proteinlike - - Prot nscription factor nscriptiona factor

tubulin

- ribosylation factor ribosylation - somal protein somal protein L3 phosphogluconate dehydrogenase dehydrogenase phosphogluconate - 26S proteasome subunit 26S proteasome acyltransferase like Anthocyanin protein on Uni No Hits on UniProtNo Hits subunit regulatory 26S proteasome Cellulose synthase OsCslE1 on UniProtNo Hits protein activating RAS GTPase on UniProtNo Hits lyase a ATP citrate inhibitor protein Polygalacturonase cytoplasmic ( hydratase, Aconitate Ribo Alpha Protein Hypothetical tra Phantastica ADP aminotransferase Phosphoserine on UniProtNo Hits reductase Tropinone oxidase precursor Monocopper Knolle protei Homeodomain subunit RPN9b26S proteasome on UniProtNo Hits 6 carboxylase Phosphoenolpyruvate

Q9FK79 Q1SCG8 Q38JG4 Q651X8 Q2PEQ3 Q1S825 Q7FPX7 Q2PEW2 Q6TKR0 Q84UZ5 Q8GWT6 Q4VPE6 Q7F2H4 Q96255 Q9ASX2 Q9SU40 Q9SML5 Q8LJS7 Q94E72 O22111 Q8H1P3

3.299 3.28 3.277 3.276 3.274 3.267 3.267 3.266 3.265 3.263 3.252 3.251 3.247 3.245 3.244 3.238 3.232 3.232 3.226 3.225 3.224 3.22 3.22 3.216 3.216 3.214 3.214 ------

ble 2 (Continued) 2 ble 2096.3.S1_s_at

Gma.16563.2.S1_x_at GmaAffx.86830.1.S1_at Gma.13896.1.A1_at GmaAffx.11155.1.A1_at GmaAffx.65487.1.S1_at Gma.2507.2.S1_at GmaAffx.35147.1.S1_at GmaAffx.60624.1.S1_at GmaAffx.34595.2.A1_at GmaAffx.93348.1.S1_x_at Gma.9637.1.S1_at GmaAffx.93451.1.S1_s_at GmaAffx.93550.1.S1_at GmaAffx.83022.1.S1_at GmaAffx.77203.1.S1_at GmaAffx.8550.1.S1_at Gma.11398.4.S1_a_at GmaAffx.81318.1.S1_at GmaAffx.26732.1.A1_at Gma. GmaAffx.87042.2.S1_at GmaAffx.25145.1.A1_at GmaAffx.42464.1.S1_at Gma.16563.4.S1_x_at HgAffx.11519.2.S1_at GmaAffx.92278.1.S1_s_at Gma.56.1.S1_at Supplementary Ta Supplementary

168

nown bolism

other cellular cellular other process unk biological process unknown biological to stress response to stress response process unknown biological transcription process unknown biological metabolism protein transport meta RNA or DNA cellular other and organization cell biogenesis transcription process unknown biological cellular other process unknown biological transport transportenergy or electron pathways process unknown biological process unknown biological metabolism protein metabolism protein

ase

like protein

-

protein synthase protein synthase dependent RNA dependent -

-

transferase -

glucan - like proteinlike kin ical Protein

- box ATP 1,4 -

hetical Protein - glucose glucose glucosyltransferase - Alpha DEAD 56 helicase Protein Hypothetical transporter superfamilySugar peroxidase, Haem plant/fungal/bacterial Glutathione S Protein Hypothetical protein related Myb Hypot Chaperonin protein activating GTPase Ran Protein Hypothetical UDP tubulin 1 Alpha transcription factor Nuclear proptein like B Cyclin decarboxylase Lysine NPH3 Hypothet resistance protein Pleiotropic drug Dicyanin on UniProtNo Hits related protein Ripening B type acetyltransferase Histone subunit catalytic Receptor subunit P45 26S proteasome

O04300 Q1SD62 Q6GKX1 Q1SNI6 Q1RXM7 Q9FQE8 Q9SMR7 Q8GW75 Q1SP60 Q1KUM7 Q9SBS1 Q9SJW9 Q1S2D1 Q5ME66 Q67XJ2 Q9FNI1 Q1RV75 Q1SI22 O22880 O81016 Q9M510 Q9SWS4 Q9FJT8 Q9FHM8 Q1RSM6

3.201 3.198 3.197 3.188 3.187 3.177 3.173 3.164 3.164 3.163 3.162 3.157 3.145 3.145 3.141 3.133 3.132 3.13 3.126 3.124 3.119 3.117 3.116 3.114 3.098 3.093 ------

ble 2 (Continued) 2 ble S1_at

Gma.2190.1.S1_a_at GmaAffx.54742.1.S1_at Gma.12539.1.S1_at Gma.12483.1.A1_at Gma.17805.1.A1_s_at Gma.1917.1. Gma.2474.2.S1_a_at Gma.15939.1.S1_at GmaAffx.78740.1.S1_at GmaAffx.84803.1.S1_at GmaAffx.59784.1.S1_at Gma.11109.1.A1_at GmaAffx.75645.1.A1_at GmaAffx.90508.1.S1_s_at Gma.1955.3.S1_at GmaAffx.56767.1.A1_at GmaAffx.66195.1.S1_at GmaAffx.6038.1.A1_at Gma.7778.1.S1_at GmaAffx.59591.1.S1_s_at Gma.4434.1.S1_at GmaAffx.36979.1.S1_at Gma.2205.1.S1_at GmaAffx.66280.1.S1_at Gma.7646.1.A1_at GmaAffx.82721.1.S1_at Supplementary Ta Supplementary

169

cellular cellular

ther cellular cellular ther biological process unknown biological cellular other process unknown biological metabolism RNA or DNA other to stress response metabolism protein o metabolism protein transport transport metabolism RNA or DNA cellular other metabolism RNA or DNA process unknown biological metabolism protein

- coat -

e protein e

ATPase -

A1

like proteinlike CslG

-

semialdehyde semialdehyde -

ribosylation factor ribosylation -

Hypothetical Protein Hypothetical D Cyclin Protein Hypothetical maintenance Minichromosome 5 protein deficient glycosylated polypeptide Reversibly Protein Hypothetical Motif Hybrid Single on UniProtNo Hits LOX1 Lipoxygenase kinase protein Receptor Protein Hypothetical protein binding DNA Cellulose synthase AAA 26S proteasome subunit RPT4a subunit (Delta delta Coatomer protein) membran vesicle Coated like putative factor, Replication Aspartate dehydrogenase protein Replication on UniProtNo Hits Protein Hypothetical transporter ABC Iron Inhibitted homolog Phosphate/phosphoenolpyruvate protein translocator ADP

Q1S275 Q2ABE7 Q5XV70 Q6KAJ4 Q9ZR33 Q9AXI4 Q1S125 P93698 Q5NJB1 Q1S7M1 Q1SVI1 Q3Y6V1 Q8H9D4 Q93Y42 Q6YZD2 Q948P2 Q8VYI4 Q9FME0 Q2QM08 Q9MAY4 Q9SUV2 Q677H6

3.083 3.078 3.07 3.063 3.053 3.038 3.028 3.02 3.017 3.015 3.009 3.007 3.004 2.99 2.99 2.986 2.986 2.984 2.981 2.978 2.974 2.966 2.964 2.961 ------

_at

ble 2 (Continued) 2 ble fx.37109.1.S1_at ffx.54698.1.S1_at

Gma.4218.2.S1_a_at GmaAffx.83415.1.S1_at GmaA Gma.11296.1.S1_at Gma.2190.2.S1_x_at Gma.16514.2.S1_a_at Gma.16524.2.S1_a_at GmaAffx.88676.1.A1_at GmaAffx.70690.1.S1_at GmaAf Gma.5205.2.S1_at GmaAffx.47045.1.S1_at Gma.2507.1.S1_at GmaAffx.86770.1.S1_at Gma.2422.2.S1_at GmaAffx.91121.1.S1_at GmaAffx.54397.2.S1_at Gma.6962.2.S1_at GmaAffx.39814.1.S1_at GmaAffx.23647.1.S1_at Gma.2379.1.A1_at GmaAffx.86851.1.S1_at GmaAffx.51889.1.S1 Gma.4707.3.S1_s_at Supplementary Ta Supplementary

170

unknown

DNA or RNA metabolism RNA or DNA process unknown biological process unknown biological and organization cell biogenesis process unknown biological process biological transcription cellular other transport transcription metabolic other metabolism protein cellular other metabolic other process unknown biological transcription metabolic other metabolic other

-

glucosidase glucosidase -

protein synthase protein synthase beta -

-

3 - 1 -

loop helix DNA binding binding loop helix DNA like proteinlike - - glucan

-

bound Ib rich protein rich - dependent kinase kinase dependent B -

1,4

- - ate dehydrogenase ate E1 dehydrogenase binding protein binding its on UniProt glucosidase - - HD protein HD dimerisation region - acyl transferase like protein like transferase acyl - Hypothetical Protein Hypothetical No H Proline proteinATP/GTP binding protein/transporter Transmembrane related tubulin 1 Alpha Protein Hypothetical ZF protein, RINGfinger Zinc type;RINGv Helix basic protein Alpha Protein Hypothetical GTP transcriptionalII plymerase RNA protein coactivating endo Glucan Ubiquitin Cyclin Pyruv subunit, alpha component precursor mitochondrial O finger Zinc on UniProtNo Hits Granule precursor Beta

Q94C32 Q9M6T7 Q4ABY1 Q69V70 Q5ME66 Q1SGU9 Q9M9S0 Q1SJ61 Q1T454 O04300 Q3LVI1 Q8S2Z7 O65154 Q9MBB5 Q9S7H2 Q9FSH5 Q1SAV1 Q9LN83 Q1SRY0 Q84LK2 Q1T2S0

2.953 2.952 2.947 2.941 2.938 2.934 2.932 2.93 2.93 2.924 2.924 2.917 2.916 2.912 2.907 2.902 2.888 2.883 2.875 2.871 2.864 2.863 2.862 ------

at

ble 2 (Continued) 2 ble

GmaAffx.54607.1.S1_at Gma.322.2.S1_at GmaAffx.941.2.S1_s_at Gma.14851.1.S1_at GmaAffx.80532.2.S1_at Gma.16674.1.S1_s_at Gma.6493.1.S1_at Gma.7731.2.S1_at GmaAffx.84030.1.A1_at Gma.12279.1.A1_at Gma.2190.1.S1_x_at GmaAffx.27899.1.A1_at GmaAffx.92555.1.S1_s_at GmaAffx.61002.1.S1_at GmaAffx.64606.1.S1_ Gma.2534.3.S1_x_at GmaAffx.88134.1.S1_at GmaAffx.73010.1.S1_s_at GmaAffx.36148.1.S1_at Gma.8067.1.A1_at Gma.2937.1.S1_at Gma.1394.2.S1_at Gma.2241.2.S1_at Supplementary Ta Supplementary

171

ical

lular lular metabolism

ganization and and ganization on transport or energy on transport energy or other cellular cellular other cellular other electr pathways process unknown biological or cell biogenesis metabolism protein protein transport biological other cellular other metabolism protein biolog other metabolic other cellular other metabolism protein metabolic other cel other process unknown biological

rich rich

-

L L - -

S S

- - D D

- - like like protein

-

CoA synthase - binding protein; binding protein; - -

like) - ketoacyl - pid transfer protein; glossy1 protein;pid transfer glossy1 Cyclin A homolog Cyclin protein activating GTPase Ran on UniProtNo Hits on UniProtNo Hits kinase Thymidine protein membrane Integral on UniProtNo Hits (G.max) hydroproline Soybean protein heme SOUL factor, regulatory effector Bacterial heme SOUL factor, regulatory effector Bacterial protein precursor transfer Lipid G enzyme, Lipolytic on UniProtNo Hits Li homolog Protein Hypothetical on UniProtNo Hits G enzyme, Lipolytic Beta on UniProtNo Hits acid elongase Fatty (Cer2 enzyme Ubiquitin conjugating on UniProtNo Hits Oxidase/dehydrogenase putative Lipase, related protein Ripening

Q9S957 Q9SBS1 Q71F77 Q8L924 Q39887 Q1T2U5 Q1T2U5 Q9M5X6 Q1S8F9 Q8H1Z0 Q1SPF8 Q1S8F9 Q69X62 Q39048 Q677E9 Q2HVU0 Q33B71 Q1SBU7

.7E+9 2.859 2.846 2.844 2.842 2.816 2.794 2.741 ------3.026 3.862 3.888 44.124 63.159 138.75 190.3405908 229.624 312.001 407.404 519.17 936.68 2401.068 2457.383 6232.589 603126.398 7 2.62E+19

at

ble 2 (Continued) 2 ble

Gma.12894.1.S1_at GmaAffx.80651.1.S1_at Gma.13695.1.A1_at HgAffx.134.1.S1_at GmaAffx.27192.1.S1_at GmaAffx.27698.1.S1_at GmaAffx.45999.1.A1_at Gma.16913.1.S1_s_at GmaAffx.38484.1.S1_s_at GmaAffx.38484.1.S1_at Gma.1096.1.S1_at Gma.12562.1.A1_at GmaAffx.63464.1.S1_at Gma.13296.3.S1_at Gma.1720.1.S1_a_at GmaAffx.79522.1.A1_s_at Gma.15677.1.A1_at GmaAffx.76337.1.S1_at Gma.13367.1.A1_at GmaAffx.6533.1.A1_at GmaAffx.18376.1.A1_ Gma.17369.1.S1_at Gma.17707.1.A1_at Gma.12669.1.A1_at Gma.12828.1.A1_at Supplementary Ta Supplementary

172

other cellular cellular other

ts on UniProt

CoA synthetase CoA

-

No Hi No

Acyl

No Hits on UniProt Hits No

Q1SUU0

5.79E+40

3.48E+25

1.85E+23

ued)

ble 2 (Contin 2 ble

GmaAffx.28196.2.A1_s_at

Gma.12584.2.A1_at

Gma.16867.1.A1_at Supplementary Ta Supplementary

173

Appendix: Supplementary Table 3

to abiotic to abiotic

PlantGoSlim cellular other process biological unknown to abiotic response biotic or stimulus process biological unknown biological other metabolic other metabolic other process biological unknown biological other to abiotic response biotic or stimulus response biotic or stimulus to stress response to stress response metabolic other to stress response

PlantGoSli m At Homolog AT2G47240 AT5G28010 AT5G51100 AT5G62360 AT1G75900 AT3G57030 AT5G33370 AT4G15630 AT1G75900 AT5G51100 AT1G77760 AT2G16060 AT2G29500 AT3G57030 AT5G54770

e

L L - -

S S - - D D

- -

2) - motif - CoA synthetase 2 (Hemoglobin - -

sidine synthase, putative sidine synthase, superoxide dismutase superoxide dismutase

- - in IsoClark Genotype Induced by Genotype Deficiency IsoClark in Iron Induced

polytic enzyme, G polytic enzyme, Confirmed Annotation from Annotation Confirmed UniProt Putative acyl on UniProtNo Hits on UniProtNo Hits on UniProtNo Hits related protein Ripening Iron on UniProtNo Hits related protein Ripening G enzyme, Lipolytic putativ strictosidine synthase, Putative GDSL lipase/acylhydrolase Protein Hypothetical Li Iron [NADH] reductase 2 Nitrate (NR 1.7.1.1) (EC Hemoglobin II) shock heat protein class I stricto enzyme biosynthetic Thiamine on UniProtNo Hits

Best Hit Best UniProt ID Q1SUU0 Q9SWS4 P28759 Q8H7E2 Q1S8F6 Q9M1J5 Q5J7N0 O23414 Q1S8F6 Q71UA1 Q43461 Q3C1F4 P02519 Q9M1J5 Q8GVQ3

Average Average Fold Change 66.103 57.793 53.684 45.976 42.565 41.557 37.913 37.627 36.599 36.119 31.571 31.206 29.650 29.552 29.403 28.770 25.004 24.169 24.048 23.913

: Differentially Expressed Genes : Genes Expressed Differentially

ble 3 ble

AffySpotID Gma.12584.2.A1_at GmaAffx.28196.2.A1_s_at Gma.16867.1.A1_at Gma.15718.1.A1_at Gma.10658.1.A1_at Gma.3233.1.S1_s_at GmaAffx.81790.1.S1_at Gma.7557.1.S1_at GmaAffx.61521.1.A1_s_at Gma.17979.1.S1_a_at Gma.17724.3.S1_at Gma.12665.1.S1_at Gma.17825.1.A1_at Gma.16827.1.S1_at Gma.34.1.S1_at GmaAffx.70981.1.S1_at Gma.17947.1.S1_at Gma.17979.1.S1_x_at Gma.1329.1.S1_at GmaAffx.75796.1.S1_at

Supplementary Ta Supplementary

ble 3 (Continued) 3 ble Supplementary Ta Supplementary 174

other cellular cellular other physiological other metabolic other cellular other to abiotic response biotic or stimulus process biological unknown to stress response process biological unknown process biological unknown physiological other metabolic other process biological unknown

60 7800 AT5G5 AT2G46820 AT4G28780 AT5G62790 AT1G77760 AT1G03670 AT4G15630 AT4G10250 AT1G21065 AT1G73260 AT2G40100 AT2G26640 AT1G178

CoA - -

2)

- B binding B - lulose 5 ketoacyl - A

xy

-

D

type trypsin inhibitor trypsin type inhibitor trypsin type - - - t)

transfer protein transfer -

deoxy - lipid membrane Thylakoid 14 kDa, phosphoprotein chloroplast on UniProtNo Hits 1 reductoisomerase phosphate [NADH] reductase 2 Nitrate (NR 1.7.1.1) (EC hit on (only F21B7.27 UniPro on UniProtNo Hits shock protein heat class IV precursor on UniProtNo Hits Protein Hypothetical inhibitor precursor A Trypsin (Kunitz A) Chlorophyll protein Putative beta synthase inhibitor precursor A Trypsin (Kunitz A) putative chitinase shock transcription Heat factor (Fragment) (HSFA)

Q8H1Z0 Q2HTB6 Q6EJD0 P39866 Q9LR59 Q39819 Q2MGQ 1 Q94IA1 Q1SPM2 Q69X62 P01070 P39657 O82042

23.768 23.739 23.730 23.542 23.288 22.962 22.438 21.259 20.814 20.448 20.236 19.253 17.051 16.720 16.100 16.097

ble 3 (Continued) 3 ble

1_a_at

Gma.13296.3.S1_at Gma.11189.1.S1_at Gma.17724.1.A1_at Gma.2480.1.S1_at GmaAffx.63680.1.S1_at Gma.15781.1.A1_at Gma.12665.2.A1_at Gma.18.1.S1_at Gma.13367.1.A1_at Gma.11767.1.A Gma.13341.1.S1_at Gma.2362.1.S1_at GmaAffx.76337.1.S1_at Gma.13161.1.S1_at Gma.1303.1.S1_at GmaAffx.71308.1.S1_at Supplementary Ta Supplementary

175

ogical process ogical other metabolic metabolic other metabolic other to stress response biol unknown process biological unknown metabolic other to stress response transport to abiotic response biotic or stimulus developmental to stress response process biological unknown cellular other transport or electron pathways energy cellular other process biological unknown transport or electron pathways energy metabolic other

4G27700 AT3G04290 AT2G26640 AT3G46230 AT1G76560 AT3G03480 AT1G05340 AT4G35160 AT5G45910 AT3G46230 AT3G18280 AT1G64780 AT AT4G27670 AT3G53950 AT1G19670 AT2G23180 AT5G16120 AT4G15630 AT1G07180 AT5G02540

-

CoA synthase - se 1 (EC 3.1.1.14) 1 (EC se

hetical Protein transfer protein transfer ketoacyl - - methyltransferase, 2 family methyltransferase, - No Hits on UniProtNo Hits Beta shock heat protein class I (Chloroplast CP12 precursor 12) protein A: benzyl coenzyme Benzoyl transferase benzoyl alcohol 3 Protein (only Hypothetical hits on UniProt) O Putative esterase shock heat protein class I lipid on UniProtNo Hits Ammonium transporter Protein Hypothetical shock protein Small heat oxidasePutative Glyoxal Chlorophylla P450 Cytochrome putative Lipase, Hypot on UniProtNo Hits rotenone Putative internal insensitive NADH dehydrogenase putative pod specific

43681 Q41301 Q2HTU2 Q5QET3 Q5I2Q5 Q2HTZ6 Q1T3W1 Q6K4Z3 Q2HTU2 Q Q7Y1B9 Q69NF7 Q1T3Y4 Q9M332 Q8GTM4 O22189 Q33B71 O23414 Q9ST63 Q2HVU0

16.025 15.565 15.320 15.317 15.079 14.507 13.929 13.684 13.440 13.287 12.928 12.439 12.361 12.307 12.267 12.167 11.913 11.871 11.824 11.692 11.476 11.181

t

at 2.S1_at

Gma.17724.2.S1_at Gma.10713. GmaAffx.89896.1.S1_at Gma.11162.1.S1_at Gma.12270.1.S1_at Gma.15091.2.S1_at Gma.4340.2.S1_a_a Gma.1061.1.A1_at GmaAffx.89896.1.S1_s_at Gma.10980.1.S1_at Gma.13823.2.A1_s_at Gma.9265.1.S1_at Gma.2446.2.S1_at Gma.10282.1.A1_at GmaAffx.43122.1.S1_s_at Gma.13454.1.A1_at Gma.16866.1.A1_at Gma.12669.1.A1_at Gma.764.1.S1_s_at Gma.12852.1.A1_at Gma.13196.1.A1_at Gma.17707.1.A1_ Supplementary Table 3 (Continued) 3 Table Supplementary

176

stress abolism

response to stress response transport biological other process biological unknown to stress response process biological unknown process biological unknown to stress response metabolism protein to response transcription met protein transcription metabolic other metabolic other

0 1G75880 AT5G59720 AT2G36830 AT AT3G49190 AT1G0740 AT4G12390 AT3G01680 AT5G25610 AT1G11330 AT5G59720 AT4G35160 AT5G54680 AT1G20810 AT1G76110 AT2G26640 AT1G43670

L -

S -

D

inhibitor

-

E) like proteinlike - - CoA synthase

- bisphosphatase, -

methyltransferase - 1,6

- type trypsin type - type - ketoacyl -

receptor kinase receptor -

class I heat shock heat protein class I inhibitor precursor A Trypsin (Kunitz A) on UniProtNo Hits intrinsic protein Major on UniProtNo Hits G enzyme, Lipolytic Protein Hypothetical on UniProtNo Hits heat shock class I 17.5 kDa (HSP 17.5 protein on UniProtNo Hits related protein Ripening Protein Hypothetical domain BURP S 1 shock heat protein class I O Orcinol factor, transcription BHLH isomerase, Peptidylprolyl FKBP hit on T23E18.4 (only UniProt) Beta Fructose

P04794 Q94IA1 Q1SSD4 Q1S8F9 Q1SSI3 P04794 Q1SBU7 Q1S265 Q1SZC4 Q7XA30 P04794 Q1T3W1 Q1SGR4 Q1T020 Q9SGS2 Q69X62 P46276

10.969 10.820 10.643 10.453 10.370 10.218 9.844 9.825 9.804 9.712 9.461 9.443 9.370 9.181 9.115 9.104 8.981 8.947 8.926 8.876 8.808 8.758

GmaAffx.93268.1.S1_at Gma.13161.2.A1_at GmaAffx.63464.1.S1_at Gma.11024.6.S1_s_at Gma.11530.1.A1_at Gma.12562.1.A1_at Gma.13800.1.A1_at Gma.13317.1.A1_at GmaAffx.69311.1.S1_at GmaAffx.34657.1.A1_s_at Gma.12828.1.A1_at Gma.7880.1.S1_at Gma.2317.1.A1_a_at Gma.13369.1.S1_at GmaAffx.93268.1.S1_s_at GmaAffx.93221.1.S1_s_at Gma.12688.1.A1_at Gma.17712.1.S1_at Gma.12811.1.A1_at Gma.10713.1.S1_a_at GmaAffx.15792.1.A1_at Gma.2522.1.S1_at Supplementary Table 3 (Continued) 3 Table Supplementary

177

ponse to abiotic to abiotic ponse other cellular cellular other transport metabolic other metabolic other metabolism protein to abiotic response biotic or stimulus metabolic other metabolic other transport or electron pathways energy res biotic or stimulus transport cellular other

0 AT5G54190 AT2G38540 AT5G45910 AT1G70820 AT5G18910 AT1G77760 AT3G16370 AT5G4376 AT1G64980 AT1G12570 AT1G77760 AT3G43720 AT1G68530

L - - S -

D

- synthase

- transfer -

rich protein rich -

CoA

-

orophyllide orophyllide transfer protein transfer ketoacyl - - Protochlorophyllide reductase, Protochlorophyllide (EC precursor chloroplast (PCR)1.3.1.33) (NADPH protochl (POR) oxidoreductase) on UniProtNo Hits lipid Nonspecific precursor protein G enzyme, Lipolytic precursor Phosphoglucomutase 5.4.2.2) (EC Putative serine/threonine kinase protein reductase nitrate Inducible (NR) [NADH] 1.7.1.1) 1 (EC responsive element Sucrose protein binding on UniProtNo Hits on UniProtNo Hits Putative proline APG isolog on UniProtNo Hits Beta Protein Hypothetical putative lyase, Mandelonitrile reductase Nitrate lipid condensing enzyme Acid Fatty putative CUT1,

Q9LKH8 O23758 Q1SYC4 Q9SSL0 Q8GXZ5 P54233 Q1XAN1 Q1SAY6 Q84VI8 Q1STK3 Q7XTZ0 Q9SYR0 Q1S9L0 Q2QCW 7

8.700 8.675 8.518 8.495 8.411 8.305 8.295 8.258 8.257 8.148 8.141 7.991 7.990 7.987 7.880 7.786 7.609 7.554

(Continued)

1_s_at 6.1.S1_at

Gma.2542.1.S1_at Gma.14164.1.A1_at Gma.14152.1.S1_at Gma.1061.2.S1_at GmaAffx.24026.1.S1_at GmaAffx.19467.1.S1_at Gma.1221.1.S Gma.4164.3.S1_at Gma.4744.1.S1_x_at Gma.903 Gma.11415.1.A1_at Gma.14152.2.A1_at GmaAffx.54366.1.S1_at Gma.13407.1.A1_at Gma.16191.1.S1_at Gma.8416.1.S1_at Gma.4331.2.S1_at Gma.7832.1.S1_at Supplementary Table 3 Table Supplementary

178

rt other biological biological other metabolic other cellular other process biological unknown transpo process biological unknown transport and organization cell biogenesis cellular other process biological unknown process biological unknown

AT1G75880 AT3G01500 AT4G36220 AT1G35180 AT2G45180 AT2G39705 AT4G26220 AT2G45180 AT5G19780 AT5G51820 AT5G19855 AT1G21460

- L

-

S -

D -

caffeoyl

- - oAOMT)

alpha amylase alpha amylase -

yme, G yme, methyltransferase) methyltransferase) CoA O - phosphate synthase phosphate -

-

O

- Lipolytic enz Lipolytic PsAD2 (EC anhydrase Carbonic 4.2.1.1) P450 Cytochrome Sucrose 2.4.1.14) (EC 1 Protein (only Hypothetical hits on UniProt) Proline Rich Protein 2 Protein (only Hypothetical hits on UniProt) Caffeoyl (EC methyltransferase (Trans 2.1.1.104) CoA 3 (CC (CCoAMT) Plant lipid transfer/seed storage/trypsin inhibitor tubulin Alpha Phosphoglucomutase, (EC precursor chloroplast (Glucose 5.4.2.2) (PGM) phosphomutase) 3 Protein (only Hypothetical hits on UniProt) Putative MtN3 protein on UniProtNo Hits

M59 Q1S8F9 Q1SG42 Q5NE21 Q75W19 Q1RTA0 Q6IDJ6 Q1T4I0 Q1S7H7 Q9C5D7 Q1S2I7 Q38HT4 Q9S Q84M38 Q8L9J7

7.489 7.458 7.451 7.434 7.358 7.325 7.325 7.172 7.100 7.043 6.950 6.875 6.800 6.758 6.740

17130.1.S1_at

Gma.15677.1.A1_at Gma.3766.1.S1_at Gma.15490.2.S1_a_at GmaAffx.40837.1.S1_at Gma. Gma.15776.1.A1_at Gma.2516.2.S1_s_at GmaAffx.51832.1.S1_at Gma.9647.1.S1_at GmaAffx.38010.2.S1_at Gma.15469.1.S1_at Gma.7477.1.A1_at GmaAffx.20432.1.S1_at GmaAffx.76027.1.S1_at Gma.3272.1.S1_at Supplementary Table 3 (Continued) 3 Table Supplementary

179

cellular cellular own process biological unkn and organization cell biogenesis metabolic other metabolic other transport or electron pathways energy metabolic other metabolism protein to abiotic response biotic or stimulus other process biological unknown metabolic other cellular other to stress response

60 T3G47650 AT1G09310 AT3G43270 AT5G33370 AT1G23730 AT2G15620 A AT2G43870 AT4G15040 AT1G65870 AT1G19670 AT3G55240 AT5G02540 AT3G48460 AT1G52560 AT3G116

responsive -

motif like 1, chloroplast induced induced - - - -

chlorophyllido -

/acylhydrolase No Hits on UniProtNo Hits Protein Hypothetical PPE8B Pectinesterase 3.1.1.11) (Pectin (EC precursor (PE) methylesterase) Putative GDSL lipase/acylhydrolase 3 (EC anhydrase Carbonic 4.2.1.1) 1.7.7.1) (EC reductase Nitrite putative chaperon Putative polygalacturonase (Polygalacturonase/pectinase) protease Subtilisin like serine resistance Disease protein family Chlorophyllase 3.1.1.14) (EC precursor (Chlorophyll 1) (Chlase hydrolase on UniProtNo Hits putative pod specific dehydrogenase Putative GDSL lipase on UniProtNo Hits shock protein Small heat Putative harpin

S03 Q9ZPZ4 Q43062 Q8LB81 Q84Y09 O04840 Q1RYZ0 O22817 Q9ZTT3 Q9S Q94LX1 Q2HVU1 Q9STM6 Q8L7T2 Q9SRN0

6.722 6.630 6.534 6.432 6.425 6.310 6.285 6.243 6.209 6.184 6.001 5.896 5.870 5.866 5.843 5.827 5.754

.15490.1.S1_a_at

GmaAffx.36428.1.A1_at Gma.10648.1.S1_at Gma.5447.1.S1_at Gma.12675.1.A1_at Gma GmaAffx.80581.1.S1_at Gma.3310.1.S1_at GmaAffx.32326.1.S1_at GmaAffx.31196.1.S1_s_at Gma.2773.2.S1_at GmaAffx.40675.1.S1_at GmaAffx.11213.1.A1_at GmaAffx.25220.1.S1_at GmaAffx.76826.1.S1_at GmaAffx.73005.1.S1_at GmaAffx.5924.1.S1_at GmaAffx.23295.1.S1_at Supplementary Table 3 (Continued) 3 Table Supplementary

180

other cellular cellular other cellular other cellular other process biological unknown cellular other to stress response process biological unknown to stress response process biological unknown transport metabolic other process biological unknown cellular other metabolic other transport or electron pathways energy

AT2G46240 AT4G24510 AT4G16740 AT2G45990 AT1G62250 AT4G11820 AT4G10250 AT1G76560 AT5G52660 AT1G70230 AT3G18280 AT4G27570 AT3G59780 AT2G27150 AT1G73010 AT1G58260

ly ly 3

like like

-

transfer -

2) (AtAO3)

- iProt like) - Prot)

tical Protein

ve phosphatase ve Hypothetical Protein (only 2 Protein (only Hypothetical hits on UniProt) acid elongase Fatty (Cer2 protein Hypothe synthase Farnesene Protein Hypothetical 3 Protein (only Hypothetical hits on Uni Hydroxymethylglutaryl A synthase coenzyme shock protein heat class IV precursor on Un No Hits CP12 precursor transcription factor Myb M14, Peptidase A carboxypeptidase lipid nonspecific protein Glucosyltransferase UDP Protein (on Hypothetical hits on UniProt) oxidase 2 (EC Aldehyde (AtAO 1.2.3.1) on UniProtNo Hits Putati P450 Cytochrome

Q6K237 Q39048 Q1SFG3 Q1T1D0 Q1SMU6 Q84MP5 Q94ET0 Q39820 O24292 Q8LAP4 Q1T048 Q43681 Q8LP23 Q5N8N4 Q2PHF4 Q8GUC2 Q6J540

5.754 5.726 5.338 5.333 5.318 5.290 5.256 5.243 5.085 5.038 4.928 4.898 4.847 4.843 4.839 4.770 4.712 4.703 4.672

Gma.7623.1.A1_at GmaAffx.6533.1.A1_at GmaAffx.33640.1.S1_at Gma.625.1.S1_at GmaAffx.56801.1.A1_at Gma.1174.1.S1_at GmaAffx.73904.1.S1_at Gma.17.1.S1_at Gma.14094.1.A1_at Gma.8333.1.S1_at GmaAffx.12145.1.S1_at Gma.13789.1.A1_at Gma.3723.1.S1_at GmaAffx.76187.1.S1_at Gma.1102.1.S1_at Gma.14748.1.S1_s_at GmaAffx.6785.1.A1_at Gma.10581.1.S1_at GmaAffx.61184.1.S1_at Supplementary Table 3 (Continued) 3 Table Supplementary

181

n biological process biological unknown transport or electron pathways energy transport or electron pathways energy process biological unknown process biological unknow process biological unknown to stress response to stress response process biological unknown transport cellular other to abiotic response biotic or stimulus process biological unknown process biological

10 AT5G17540 AT4G19380 AT1G19250 AT2G44310 AT4G16146 AT2G27385 AT4G25290 AT1G10760 AT3G06070 AT1G625 AT5G13930 AT5G51100 AT5G23530 AT5G10695

related R1related

- precursor (EC precursor

glucan water dikinase, water glucan

- superoxide dismutase - Benzoyl A coenzyme Benzoyl fatty to long chain Similarity oxidase alcohol (At3g23410/MLM24_23) containing Flavin protein 3 like monooxygenase binding EF putative calcium protein family hand on UniProtNo Hits Protein Hypothetical Protein Hypothetical hydrdolase fold Alpha/beta family Alpha chloroplast (Starch 2.7.9.4) protein) 3 Protein (only Hypothetical hits on UniProt) Extensin like protein synthase Chalcone on UniProtNo Hits Iron Esterase/lipase/thioesterase Protein Hypothetical

Q8GT20 O65709 Q7XWZ 6 O64866 Q8GW58 Q6NLE8 Q652J5 Q8LPT9 Q1RV82 Q9SXE6 Q2ENC4 P28759 Q1SMF6 Q2HVI0

4.634 4.557 4.523 4.519 4.434 4.429 4.419 4.410 4.409 4.374 4.374 4.305 4.273 4.251 4.247 4.230 4.206 ued) ued)

fx.5738.1.S1_at

GmaAffx.49404.1.S1_at GmaAffx.86212.1.S1_at Gma.11944.1.S1_at Gma.1281.2.S1_at GmaAffx.4270.1.S1_s_at Gma.2252.1.S1_at Gma.768.1.S1_at Gma.2554.1.S1_at Gma.2574.1.S1_at Gma.2062.2.S1_a_at Gma.5382.1.S1_at GmaAffx.77637.1.S1_at GmaAffx.48040.1.A1_at Gma.3233.1.S1_at GmaAf GmaAffx.20412.1.A1_at Gma.17341.1.S1_at Supplementary Table 3 (Contin 3 Table Supplementary

182

n biological process biological unknown transport or electron pathways energy process biological unknow transport to stress response and organization cell biogenesis metabolic other metabolic other cellular other and organization cell biogenesis process biological unknown cellular other to stress response developmental

AT3G01680 AT3G22400 AT3G01670 AT1G80830 AT3G47600 AT2G27380 AT4G36250 AT1G44170 AT1G61720 AT3G54580 AT3G01210 AT4G26260 AT1G74310 AT4G33790

1 - -

inositol

-

associated -

cal Protein (only cal2 Protein (only 13.99.1) (Myo 13.99.1) related proteinrelated 306 binding region RNP binding - its on UniProt - othetical Protein Hyp 1.13.11.12) (EC Lipoxygenase Protein Hypothetical resistance Natural ( Root protein macrophage transporter) metal specific MYB No H dehydrogenase Aldehyde protein family (NAD) on UniProtNo Hits on UniProtNo Hits on UniProtNo Hits dehydrogenase Aldehyde protein family (NAD) HC toxinNADPH reductase on UniProtNo Hits RNA recognition motif)(RNA inositolProbable oxygenase 1. (EC oxygenase) shock protein 101 Heat CoA Reductase Acyl Hypotheti hits on UniProt)

Q2HUY0 Q7X9G5 Q5ZF88 Q1SJX4 P81392 Q1SCN9 Q1SCN9 Q1SZ14 Q2HVR6 Q1SEA5 Q39889 Q5QIV3 Q26YQ4

4.201 4.189 4.174 4.158 4.120 4.117 4.099 4.093 4.087 4.068 4.061 4.031 4.027 3.997 3.991 3.964 3.954 3.941

at

maAffx.79522.1.A1_s_at

GmaAffx.9536.1.S1_at GmaAffx.27496.2.S1_at Gma.13004.1.S1_at Gma.6552.1.S1_at GmaAffx.78009.1.S1_at Gma.16947.1.S1_at Gma.13140.4.S1_at Gma.13613.1.A1_at Gma.5173.2.S1_at G Gma.13140.1.A1_at Gma.10095.1.A1_at GmaAffx.6001.1.S1_ Gma.832.1.S1_a_at Gma.13186.1.S1_at Gma.48.1.S1_at Gma.3384.1.S1_at Gma.1937.1.S1_at Supplementary Table 3 (Continued) 3 Table Supplementary

183

and organization cell biogenesis to stress response process biological unknown process biological unknown process biological unknown cellular other to stress response cellular other process biological unknown and organization cell biogenesis cellular other process biological unknown process biological

01670 AT2G27380 AT3G28840 AT3G06035 AT3G AT5G11070 AT5G57800 AT1G16030 AT4G16740 AT5G54585 AT4G27360 AT5G42740 AT1G63310 AT2G26570

e

kDa protein phosphate - cal Protein (only cal2 Protein (only 6 on UniProt - rich protein rich

- farnesene synthasefarnesene - Hypotheti hits on UniProt) on UniProtNo Hits Proline on UniProtNo Hits on UniProtNo Hits on UniProtNo Hits protein anchored GPI Protein Hypothetical on UniProtNo Hits Protein Putative Glossy1 shock 70 Heat Alpha Protein Hypothetical No Hits chain protein light Dynein Glucose 1 (EC cytosolic isomerase, (GPI) 5.3.1.9) isomerase) (Phosphoglucose (Phosphohexos (PGI) (PHI) isomerase) Protein Hypothetical like heavy chain Myosin

Q26YQ4 Q39887 Q1ST72 Q5ZF88 Q67WQ7 P26413 Q1T1D0 Q8VZU6 Q8VZW 5 P54240 Q9C8T5 Q1RYE3

.911 3.941 3.939 3.924 3.923 3 3.892 3.858 3.848 3.806 3.804 3.753 3.742 3.733 3.709 3.591 3.567 3.550 3.429

Gma.1937.1.S1_at GmaAffx.41460.1.S1_at Gma.79.1.S1_s_at GmaAffx.53903.1.A1_at Gma.14039.1.A1_at Gma.10104.1.S1_at GmaAffx.37697.1.S1_at GmaAffx.20374.2.S1_at GmaAffx.11155.1.A1_at GmaAffx.82556.1.S1_at GmaAffx.30428.1.S1_at GmaAffx.87934.1.S1_at Gma.1294.1.S1_at GmaAffx.65393.1.S1_s_at Gma.12832.1.A1_at Gma.13190.1.A1_at GmaAffx.48977.1.S1_at GmaAffx.13179.1.S1_at Supplementary Table 3 (Continued) 3 Table Supplementary

184

biological process biological unknown transport cellular other transport or electron pathways energy metabolic other to stress response metabolic other process biological unknown to stress response to stress response metabolism protein transport or electron pathways energy cellular other transport

T3G51030 AT1G29520 AT4G25640 AT5G08640 AT2G30540 AT2G42990 AT1G71695 AT3G46970 AT5G06270 AT4G34135 AT1G10760 AT5G51750 A AT1G10670 AT3G62700 - -

- -

- subunit

- R1related type 1 (TRX type - -

l synthase/flavanone 3

glucan , glucan dikinase, water glucan - - glucuronosyl/UDP -

tive APG protein tive 1) - Plasma membrane associated associated Plasma membrane putative protein, on UniProtNo Hits protein MATE efflux Flavono 1.14.11.23) (EC hydroxylase (FLS) 1.14.11.9) (EC Putative glutaredoxin (At2g30540/T6B20.11) Puta Peroxidase Alpha 2.4.1.1) (EC H isozyme phosphorylase H) (Starch Protein like B Cyclin UDP glucosyltransferase Alpha (EC precursor chloroplast (Starch 2.7.9.4) protein) Putativeserine subtilisin protease Thioredoxin H H b lyase ATP citrate 4.1.3.8) (EC resistance Multidrug 10 protein (ECassociated S (Glutathione 3.6.3.44)

Q1SNE3 Q1T635 Q8LP22 O04341 Q67ZI9 Q9XFI8 P53537 Q9FNI1 Q1STD8 Q8LPT9 Q1S0Z4 Q9AR82 Q93YH3 Q9LZJ5

.411 3 3.386 3.304 3.299 3.291 3.274 3.259 3.244 3.210 3.145 3.136 3.100 3.095 2.933 2.925

x.30771.1.S1_at

Gma.13204.2.A1_at Gma.6613.1.A1_at Gma.10447.2.A1_a_at GmaAffx.3256.2.S1_at Gma.4512.2.S1_at Gma.12966.1.S1_at Gma.6477.1.S1_at Gma.7010.1.S1_at GmaAff GmaAffx.71692.1.S1_at Gma.2574.1.S1_a_at GmaAffx.28861.1.S1_at Gma.9838.1.S1_at GmaAffx.80064.1.S1_at Gma.4457.1.S1_a_at Supplementary Table 3 (Continued) 3 Table Supplementary

185

metabolic other and organization cell biogenesis to stress response metabolic other transport or electron pathways energy metabolic other transport biological other transport process biological unknown cellular other metabolism protein to abiotic response biotic or stimulus

AT1G22340 AT2G27380 AT5G06720 AT5G17050 AT5G18600 AT3G04290 AT2G38540 AT1G75900 AT3G47420 AT5G20150 AT3G49780 AT1G24430 AT5G61790 AT2G30490

- -

O L - - - S - like like D -

- e

-

- like protein like -

rich protein rich .1.144) (Anthranilate (Anthranilate .1.144) motif lipase motif lipase (Fragment) -

- 4 cinnamate glucuronosyl/UDP 3 galactose:flavonol - - - - No Hits on UniProtNo Hits UDP glucosyltransferase Proline Peroxidase UDP galactosyltransferas Glutaredoxin GDSL protein transfer Lipid protein G enzyme, Lipolytic 3 phosphate Glycerol transporter like protein IDS4 Putative phytosulfokine precursor peptide N Anthranilate protein 1 benzoyltransferase 2.3 (EC N hydroxycinnamoyl/benzoyltran 1) sferase Calnexin precursor homolog on UniProtNo Hits Trans (EC monooxygenase 4 acid (Cinnamic 1.14.13.11)

KL62 Q1RXH4 Q39887 O23961 Q9ZWS2 Q8L8Z8 Q6Y0F0 Q1 Q94CH6 Q9STY1 Q8RY68 Q7PCB1 Q1S6U2 Q39817 P37115

31 2.890 2.858 2.857 2.736 2.623 2.592 2.481 2.446 2.262 2.242 2.2 2.196 2.166 2.157 2.134 2.081

GmaAffx.20374.1.A1_at GmaAffx.45393.1.S1_at Gma.79.1.S1_x_at GmaAffx.89697.1.S1_s_at GmaAffx.28120.1.S1_at Gma.15538.1.S1_at Gma.10073.2.S1_at GmaAffx.89407.1.A1_s_at GmaAffx.87400.1.S1_at Gma.3539.2.S1_at Gma.1008.1.A1_at GmaAffx.88690.1.S1_at GmaAffx.58559.1.A1_at GmaAffx.90865.1.S1_s_at GmaAffx.83079.1.S1_at Gma.3881.2.S1_at Supplementary Table 3 (Continued) 3 Table Supplementary

186

tabolic sponse to stress other me other to stress response process biological unknown re transport or electron pathways energy metabolism protein cellular other cellular other to stress response cellular other physiological other transport or electron pathways energy cellular other metabolism protein

T3G05590

AT5G17040 AT5G05340 AT5G56550 AT4G36990 ATMG0136 0 AT1G80920 AT5G54160 AT5G41080 AT2G38870 AT3G02040 ATCG0110 0 AT3G03170 ATMG0073 0 AT3G47340 A

galactosyl - O -

methyltransferase, 2 family methyltransferase,

- utative DnaJ protein

lycerophosphoryl diesterlycerophosphoryl Flavonoid 3 Flavonoid transferase peroxidase Class III on UniProtNo Hits shock transcription Heat factor factor putative transcription 2 hits on uniprot) (only oxidase subunitCytochrome I P serum 1 alpha protein Larval (Hexamerin precursor 1 chain alpha) O on UniProtNo Hits diester glycerophosphoryl phosphodiesterase inhibitor Proteinase g phosphodiesterase subunit dehydrogenase NADH 1 Protein Hypothetical oxidase subunitCytochrome III synthetase (EC Asparagine 6.3.5.4) protein ribosomal Cytoplasmic L18

Q9ZWS2 Q7XYR7 Q43457 Q6SA75 Q94XE1 Q2L989 P11995 Q2HTB5 Q9FLM1 Q1S278 Q84WI7 Q8HQ04 Q6H5Z0 Q94XD8 Q42792 P42791

263 2.071 2.021 3.410 3.524 4.036 4.183 4.277 4.834 5.284 5.815 8.876 9.063 11.771 14.838 17.020 20. 31.956 117.272

2.1.S1_at

GmaAffx.23059.1.S1_at GmaAffx.90703.1.A1_at GmaAffx.68386.1.S1_at Gma.2342.1.S1_at Gma.612.1.A1_at GmaAffx.23237.1.S1_at GmaAffx.90642.1.S1_s_at GmaAffx.14166.1.S1_at Gma.10216.3.A1_x_at GmaAffx.3056 Gma.6211.1.S1_at Gma.17733.1.S1_s_at Gma.6211.2.A1_at GmaAffx.51208.1.S1_at Gma.7769.1.A1_at GmaAffx.78368.1.S1_at Gma.197.1.S1_at GmaAffx.8505.1.A1_s_at Supplementary Table 3 (Continued) 3 Table Supplementary

187

Appendix: Supplementary Table 4 Supplementary Table 4: Identified SFPs SFP MLG Best Hit GO TERM GO Annotation UniProt ID Gma.15061.1.S1_at A1 P35100 GO:0005524 ATP binding GmaAffx.50907.1.S1_at A1 Q1SMQ9 GO:0005509 Cation transport GmaAffx.61337.1.S1_at A1 Q2HU62 GO:0004553 Hydrolase Activity Gma.8011.2.S1_a_at A1 Q9FV81 GO:0006412 Protein Biosynthesis GmaAffx.93614.1.S1_at A1 Q1RV18 GO:0005515 Protein Folding GmaAffx.82088.1.S1_at A1 Q6ZGX8 GO:0006886 Protein Transport GmaAffx.90942.1.S1_at A1 Q40203 GO:0015031 Protein Transport Gma.4040.1.S1_at A1 Q1RY71 GO:0003700 Transcription Factor Gma.2069.1.S1_at A1 Q6Z918 GO:0004306 Transferase Gma.12885.1.A1_at A1 No UniProt Gma.5163.2.S1_at A1 Q1SBT2 GO:0005554 Unknown Function GmaAffx.11464.1.S1_at A1 Q9LS48 GO:0005554 Unknown Function GmaAffx.85311.1.S1_at A1 Q9FMN3 GO:0005554 Unknown Function Gma.4456.1.S1_at A2 Q9MUK5 GO:0004040 Amidase Activity Gma.16024.1.S1_s_at A2 Q1T4L1 GO:0015144 Carbohydrate Transport GmaAffx.79966.1.S1_at A2 Q2QQ50 GO:0003824 catalytic_activity GmaAffx.50907.1.S1_at A2 Q1SMQ9 GO:0005509 Cation transport GmaAffx.93166.1.S1_s_at A2 Q69F96 GO:0016760 Cellulose Synthase Gma.181.1.S1_at A2 Q93Y50 GO:0006633 fatty_acid_biosynthesis GmaAffx.31381.1.S1_at A2 Q8VYN6 GO:0006096 Glycolysis GmaAffx.61337.1.S1_at A2 Q2HU62 GO:0004553 Hydrolase Activity GmaAffx.79733.1.S1_s_at A2 Q39898 Kunitz trypsin inhibitor Gma.8776.1.S1_at A2 Q1SG75 GO:0016491 Oxidoreductase Gma.8011.2.S1_a_at A2 Q9FV81 GO:0006412 Protein Biosynthesis Gma.8911.1.S1_at A2 Q9SAG7 GO:0003723 RNA Binding GmaAffx.74113.1.S1_at A2 Q58I04 GO:0015770 Sugar Transport Gma.3510.1.S1_at A2 Q39658 GO:0003700 Transcription Factor Gma.2069.1.S1_at A2 Q6Z918 GO:0004306 Transferase GmaAffx.41460.1.S1_at A2 No UniProt GmaAffx.64946.1.S1_at A2 No UniProt GmaAffx.85112.1.S1_at A2 Q7XPH4 GO:0005554 Unknown Function Gma.978.1.S1_at B1 Q6H4P7 GO:0005524 ATP binding Gma.2101.1.S1_at B1 Q9M6N2 GO:0016301 Kinase Activity Gma.2977.2.S1_a_at B1 Q9FKI4 GO:0008080 N-acetyltransferase activity Gma.6914.1.A1_at B1 Q1RW61 GO:0006388 Translation Gma.17504.1.S1_at B1 No UniProt Gma.9816.1.S1_at B1 No UniProt GmaAffx.80241.1.S1_s_at B1 No UniProt Gma.5866.1.S1_x_at B1 Q9XIA7 GO:0005554 Unknown Function GmaAffx.92301.1.S1_s_at B1 Q1SZU3 GO:0005554 Unknown Function GmaAffx.47078.1.A1_at B2 Q6Z1G7 GO:0004739 Dehydrogenase Gma.2977.2.S1_a_at B2 Q9FKI4 GO:0008080 N-acetyltransferase

188

Supplementary Table 4 (Continued)

GmaAffx.80323.2.S1_at B2 No UniProt Gma.1477.1.S1_at B2 Q1SZZ7 GO:0005554 Unknown Function Gma.3265.1.S1_at B2 Q6EP64 GO:0005554 Unknown Function GmaAffx.38910.1.S1_at B2 Q1SS74 GO:0005554 Unknown Function GmaAffx.47461.1.S1_at B2 Q1RWC1 GO:0005554 Unknown Function GmaAffx.72674.1.S1_at B2 Q9SU16 GO:0005554 Unknown Function GmaAffx.88213.1.S1_at B2 Q8VY47 GO:0005554 Unknown Function GmaAffx.3025.1.S1_at C1 Q69ST5 GO:0005524 ATP binding Gma.2977.2.S1_a_at C1 Q9FKI4 GO:0008080 N-acetyltransferase activity Gma.17343.1.A1_at C1 No UniProt GmaAffx.8019.1.S1_at C1 No UniProt Gma.404.1.A1_at C1 Q76DY0 GO:0003700 Transcription Factor Gma.7761.1.S1_a_at C1 Q9SLZ4 GO:0003700 Transcription Factor Gma.3712.1.S1_s_at C1 Q7XYY0 Unknown Function GmaAffx.3025.1.S1_at C2 Q69ST5 GO:0006418 ATP binding GmaAffx.20345.1.S1_at C2 Q1T0B2 GO:0016126 Sterol Biosynthesis Gma.4461.1.S1_at C2 No UniProt GmaAffx.19097.1.S1_at C2 No UniProt Gma.11156.1.S1_s_at C2 Q5JML5 GO:0005554 Unknown Function Gma.3712.1.S1_s_at C2 Q7XYY0 GO:0005554 Unknown Function Gma.5866.1.S1_x_at C2 Q9XIA7 GO:0005554 Unknown Function Gma.8653.1.A1_at C2 Q9LXZ7 GO:0005554 Unknown Function GmaAffx.31836.1.S1_at C2 Q8LCV7 GO:0005554 Unknown Function GmaAffx.84246.1.S1_at C2 Q9LYD7 GO:0005554 Unknown Function Gma.978.1.S1_at D1a Q6H4P7 GO:0006418 ATP binding GmaAffx.83700.1.S1_at D1a Q1S6W0 GO:0016787 Hydrolase Activity Gma.2101.1.S1_at D1a Q9M6N2 GO:0016301 Kinase Activity GmaAffx.78358.2.S1_at D1a Q948H2 GO:0019252 Starch Biosynthesis GmaAffx.28966.1.S1_at D1a No UniProt GmaAffx.61076.1.S1_at D1a No UniProt Gma.4188.3.S1_a_at D1a Q8RWD4 GO:0005554 Unknown Function GmaAffx.52400.1.S1_s_at D1a Q9FHY6 GO:0005554 Unknown Function GmaAffx.92301.1.S1_s_at D1a Q1SZU3 GO:0005554 Unknown Function Gma.2424.2.S1_a_at D1B Q1SSV5 GO:0006914 Autophagy Gma.10827.3.S1_at D1B Q9LJ66 GO:0005506 Cation transport Gma.17592.1.S1_at D1B P49299 GO:0004108 Citrate Synthase GmaAffx.91660.1.S1_at D1B Q6RIB6 GO:0016615 Dehydrogenase GmaAffx.86105.1.S1_at D1B Q71QD5 Protein Metabolism GmaAffx.1301.47.A1_at D1B No UniProt GO:0003700 Transcription Factor Gma.13209.1.S1_at D1B No UniProt Gma.14204.1.A1_at D1B No UniProt Gma.2811.1.S1_at D1B No UniProt Gma.3997.1.S1_at D1B Q1S258 GmaAffx.53416.1.S1_at D1B No UniProt GmaAffx.82973.1.S1_at D1B No UniProt

189

Supplementary Table 4 (Continued)

GmaAffx.83133.1.S1_at D1B No UniProt Gma.3265.1.S1_at D1B Q6EP64 GO:0005554 Unknown Function GmaAffx.72674.1.S1_at D1B Q9SU16 GO:0005554 Unknown Function GmaAffx.15192.1.S1_at D2 Q1T1I6 GO:0006812 Cation transport GmaAffx.89881.1.S1_at D2 Q8GUR9 GO:0006118 Electron Transport Gma.3138.1.S1_at D2 Q9S9W2 GO:0016491 Oxidoreductase Gma.10873.1.S1_s_at D2 No UniProt Gma.17683.1.S1_at D2 No UniProt Gma.10605.1.S1_at D2 Q9LYH6 GO:0005554 Unknown Function Gma.5163.2.S1_at D2 Q1SBT2 GO:0005554 Unknown Function Gma.8881.1.A1_at D2 Q9FJL6 GO:0005554 Unknown Function GmaAffx.24966.1.S1_at D2 Q9SN77 GO:0005554 Unknown Function Gma.3204.1.S1_at E Q6PP98 GO:0006418 ATP binding Gma.8774.1.A1_at E Q6ATY7 GO:0015087 Cation Transport Gma.13363.1.S1_s_at E Q1PCR8 GO:0006118 Electron Transport Gma.3203.1.S1_at E No UniProt GO:0016301 Kinase Activity GmaAffx.24398.1.A1_at E Q1SP65 GO:0005515 Protein Folding GmaAffx.92612.1.S1_s_at E Q2PEV8 GO:0003723 RNA Binding Gma.7761.1.S1_a_at E Q9SLZ4 GO:0003700 Transcription Factor GmaAffx.18381.1.S1_at E Q9FI73 GO:0006810 Transport GmaAffx.54736.1.S1_at E No UniProt Gma.11156.1.S1_s_at E Q5JML5 Unknown Function GmaAffx.13040.1.S1_at E Q6Z1R9 Unknown Function Gma.6764.1.S1_at F Q6YZH8 GO:0009097 Amino Acid Biosynthesis Gma.3904.1.S1_x_at F P49597 GO:0003824 catalytic_activity GmaAffx.15192.1.S1_at F Q1T1I6 GO:0006812 Cation Transport GmaAffx.31689.1.S1_at F Q9SYM7 GO:0005506 Cation Transport Gma.13363.1.S1_s_at F Q1PCR8 GO:0006118 Electron Transport GmaAffx.5496.1.S1_at F Q6S4R9 GO:0016787 Hydrolase Activity Gma.9902.1.A1_at F Q94JU3 GO:0004708 Kinase Activity GmaAffx.40133.1.S1_at F Q1SMG8 GO:0016829 Lyase Activity GmaAffx.7213.1.A1_at F Q8S3S6 GO:0003676 Nucleic Acid Binding Gma.14194.1.A1_at F Q8GYC2 GO:0016925 Protein Metabolism Gma.16526.1.S1_at F Q2R4U5 GO:0006508 proteolysis Gma.8911.1.S1_at F Q9SAG7 GO:0003723 RNA Binding GmaAffx.92612.1.S1_s_at F Q2PEV8 GO:0003723 RNA Binding Gma.4463.1.S1_at F Q6TKQ3 GO:0006355 Transcription Factor Gma.7761.1.S1_a_at F Q9SLZ4 GO:0003700 Transcription Factor GmaAffx.51287.1.S1_at F P56820 GO:0003743 Translation Gma.9401.1.S1_at F No UniProt GmaAffx.1268.1.S1_at F No UniProt Gma.4188.3.S1_a_at F Q8RWD4 GO:0005554 Unknown Function GmaAffx.16959.1.S1_s_at F Q9FN38 GO:0005554 Unknown Function GmaAffx.73763.1.S1_at F Q9LKA5 GO:0005554 Unknown Function GmaAffx.75685.1.S1_at F Q2HWC9 GO:0005554 Unknown Function

190

Supplementary Table 4 (Continued)

GmaAffx.88213.1.S1_at F Q8VY47 GO:0005554 Unknown Function Gma.3137.1.S1_at G Q1T3D6 GO:0006812 Cation Transport GmaAffx.93166.1.S1_s_at G Q69F96 GO:0016760 Cellulose Synthase GmaAffx.31381.1.S1_at G Q8VYN6 GO:0006096 Glycolysis Gma.16673.1.A1_at G No UniProt GmaAffx.87763.1.S1_at G Q2V398 GO:0006730 Protein Metabolism Gma.3510.1.S1_at G Q39658 GO:0003700 Transcription Factor Gma.7768.1.A1_s_at G Q9SH93 GO:0005554 Unknown Function Gma.5734.1.S1_at H Q7XY14 GO:0017004 Cytochrome Gma.6749.1.S1_at H Q2HSH7 GO:0004407 Histone Deaceetylase Gma.14194.1.A1_at H Q8GYC2 GO:0016925 Protein Metabolism Gma.16753.1.S1_at H Q9SAA2 GO:0006508 proteolysis Gma.13456.1.S1_at H Q93YQ0 GO:0003723 RNA Binding Gma.10222.1.A1_at H No UniProt Gma.5866.1.S1_x_at H Q9XIA7 GO:0005554 Unknown Function Gma.2120.1.S1_at I Q330L2 GO:0015385 Cation transport Gma.15264.1.S1_at I Q1SM16 GO:0003676 Nucleic Acid Binding GmaAffx.20686.1.A1_at I No UniProt GO:0006457 Protein Folding GmaAffx.89933.1.S1_at I O65844 GO:0000163 Protein Metabolism Gma.12340.1.S1_at I No UniProt GO:0005529 Sugar Binding GmaAffx.88272.1.S1_at J Q9C5K4 GO:0005524 ATP binding Gma.4374.1.S1_s_at J Q2MJ15 GO:0005506 Cation transport GmaAffx.16523.1.S1_at J Q5N8C4 GO:0016301 Kinase Activity GmaAffx.77832.1.S1_at J Q9FX34 GO:0006499 Protein Metabolism GmaAffx.1301.47.A1_at J No UniProt GO:0003700 Transcription Factor GmaAffx.11179.2.S1_s_at J No UniProt GmaAffx.15811.1.A1_at J No UniProt GmaAffx.77266.1.S1_at J No UniProt GmaAffx.90082.1.S1_s_at J No UniProt Gma.16023.2.S1_at K Q9FMX7 GO:0003824 catalytic_activity Gma.1909.1.S1_at K No UniProt GmaAffx.27319.1.S1_at K P93045 GO:0005515 Protein Folding GmaAffx.77832.1.S1_at K Q9FX34 GO:0006499 Protein Metabolism GmaAffx.18381.1.S1_at K Q9FI73 GO:0006810 Transport Gma.3700.2.S1_at K O22164 GO:0005554 Unknown Function Gma.4653.1.S1_at K Q5YJQ1 GO:0005554 Unknown Function Gma.1955.4.A1_at L Q5PYQ5 GO:0005509 Cation transport GmaAffx.47203.1.S1_at L Q9FV54 GO:0005506 Cation Transport Gma.12704.1.A1_at L Q2MZW1 GO:0019825 Oxidase GmaAffx.75422.1.S1_at L No UniProt GO:0006499 Protein Metabolism Gma.7757.1.S1_at L Q93XA0 GO:0003700 Transcription Factor Gma.5974.1.S1_at L Q1SCD5 GO:0005525 Translation Gma.13798.1.A1_at L No UniProt Gma.602.1.S1_at L No UniProt Gma.9330.1.S1_at L No UniProt GmaAffx.33971.1.A1_at L No UniProt

191

Supplementary Table 4 (Continued)

Gma.4374.1.S1_s_at M Q2MJ15 GO:0005506 Cation transport Gma.5994.1.S1_at M Q1SRH3 GO:0004869 Cysteine Protease GmaAffx.45517.1.S1_at M Q9M3Y4 GO:0045735 Nutrient Reservoir Activity Gma.8776.1.S1_at M Q1SG75 GO:0016491 Oxidoreductase Gma.12223.1.S1_at M Q1SF86 GO:0005515 Protein Folding Gma.8911.1.S1_at M Q9SAG7 GO:0003723 RNA Binding Gma.7058.1.A1_at M Q1T4P7 GO:0003700 Transcription Factor GmaAffx.77266.1.S1_at M No UniProt Gma.1289.1.S1_at M Q05929 GO:0005554 Unknown Function Gma.7768.1.A1_s_at M Q9SH93 GO:0005554 Unknown Function GmaAffx.26598.1.S1_at M Q9LFC2 GO:0005554 Unknown Function Gma.1955.4.A1_at N Q5PYQ5 GO:0005509 Cation transport Gma.7757.1.S1_at N Q93XA0 GO:0003700 Transcription Factor Gma.7757.1.S1_at N Q93XA0 GO:0003700 Transcription Factor GmaAffx.47461.1.S1_at N Q1RWC1 GO:0005554 Unknown Function GmaAffx.72771.1.A1_at O Q1SKU8 GO:0008270 Cation transport Gma.15264.1.S1_at O Q1SM16 GO:0003676 Nucleic Acid Binding GmaAffx.76755.1.S1_at O Q8L960 GO:0003723 RNA Binding Gma.5974.1.S1_at O Q1SCD5 GO:0005525 Translation GmaAffx.92783.1.S1_s_at O Q58I24 GO:0003746 Translation Gma.16925.1.A1_at O No UniProt Gma.2811.1.S1_at O No UniProt Gma.3997.1.S1_at O Q1S258 GmaAffx.20697.1.S1_at O No UniProt GmaAffx.4898.2.S1_at O No UniProt GmaAffx.51828.1.S1_s_at O No UniProt GmaAffx.60948.1.S1_at O No UniProt GmaAffx.67495.1.S1_at O No UniProt Gma.13769.1.S1_at Unkn No UniProt own Gma.602.1.S1_at Unkn No UniProt own GmaAffx.32146.1.S1_at Unkn Q6QXZ8 GO:0009775 Electron Transport own