Investigation of Plant Specialized (Secondary Metabolism) Using Metabolomic and Proteomic Approaches

Item Type text; Electronic Dissertation

Authors Xie, Zhengzhi

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.

Download date 25/09/2021 17:46:43

Link to Item http://hdl.handle.net/10150/195218 INVESTIGATION OF PLANT SPECIALIZED METABOLISM (SECONDARY

METABOLISM) USING METABOLOMIC AND PROTEOMIC APPROACHES

By

Zhengzhi Xie

A Dissertation Submitted to the Faculty of the

DEPARTMENT OF PHARMACEUTICAL SCIENCES

In Partial Fulfillment of the Requirements

For the Degree

DOCTOR OF PHILOSOPHY

In the Graduate College

THE UNIVERSITY OF ARIZONA

2007

2

THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Dissertation Committee, we certify that we have read the dissertation

prepared by Zhengzhi Xie

entitled Investigation of plant specialized metabolism (secondary metabolism) using metabolomic and proteomic approaches

and recommend that it be accepted as fulfilling the dissertation requirement for the

Degree of Doctor of Philosophy

______Date: 5/30/07 David R. Gang

______Date: 5/30/07 Myron K. Jacobson

______Date: 5/30/07 Danzhou Yang

Final approval and acceptance of this dissertation is contingent upon the candidate’s submission of the final copies of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

______Date: 5/30/07 Dissertation Director: David R. Gang

3

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library.

Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.

SIGNED: Zhengzhi Xie _

4

ACKNOWLEDGEMENTS

I would like to thank all the people who have provided their kind help to me. I thank Dr. David R. Gang and Dr. Barbara N. Timmermann for their support and guidance for my research. I am truly grateful for the wonderful opportunity they gave me to work on this interesting project, their outstanding teaching skills stimulating my critical scientific thinking, and for their patience when they helps me improve my communication skills. I would like to thank Dr. Myron K. Jacobson, Dr. Victor J. Hruby, and Dr. Danzhou Yang for their encouragements, and their helpful and constructive suggestions which have been critical for both my Ph.D. study and my future career.

I am thankful to all the members of Dr. Gang’s laboratory for their friendship and the enjoyable working environment they created. I would like to thank Dr. Jeremy Kapteyn for his help, collaboration and instructive discussion on the sample preparation, and data analysis of the proteomic experiments. I would like to thank Dr. Xiao-Qiang Ma and Hyun Jo Koo for their help, collaboration and instructive discussion on the experimental design, sample preparation, and data collection on the metabolomic experiments. I would like to thank Brenda L. Jackson and Eric McDowell for their assistance in instrumental operation and instructive discussion on computer programming. I would like to thank former members of Dr. Gang’s laboratory, Dr. Hongliang Jiang and Dr. Maria del Carmen Ramirez-Ahumada for their kind help and advice on my project.

Finally, I would like to thank my parents Mr. and Mrs. Shuyuan Xie, my wife Hong Gao and my sister Zhenghui Xie for their love, support, strong confidence in my scientific abilities, and giving me the strength and courage to complete the doctoral degree and pursue my scientific career.

This project is supported by the National Institutes of Health NCCAM/ODS Research Training Grant # 3P50AT000474-03S1 to B.N.T., NSF Plant Genome Program Grant DBI-0227618 to D.R.G., and NSF Metabolic Biochemistry Program Grant MCB- 0210170 to D.R.G.

5

TABLE OF CONTENTS

LIST OF FIGURES………………………………………………………………….… 8

LIST OF TABLES………………………………………………………………..…… 10

LIST OF ABBREVIATIONS…………………………………………...…………….. 11

ABSTRACT………………………………………………………………………..….. 15

CHAPTER 1: INTRODUCTION…………………………………………………….. 17 1.1 Brief introduction of plant specialized metabolism...………………………...….. 17 1.1.1 The phenylpropanoid pathway and the terpenoid pathway ………….….. 17 1.1.2 Regulation of plant specialized metabolism and variation of herbal quality………………………………………………… 21 1.2 Review: plant specialized metabolism research using proteomic and metabolomic approaches……………………………………………………. 23 1.2.1 Approaches of proteomics………………………………………………. 23 1.2.2 Proteomic research on herbal plant specialized metabolism…………….. 28 1.2.3 Introduction of metabolomics……………………………………………. 30 1.2.4 Metabolomic research on herbal plant specialized metabolism…………. 34 1.3 Review: Basil glandular trichome ...……………………………………………... 37 1.3.1 Basil produces phenylpropanoids and terpenoids in glandular trichome…………………………………………………………………. 37 1.3.2 Basil glandular trichome as a model system of plant specialized metabolism research…………………………………………. 39 1.4 Review: turmeric………………………………………………………………… 45 1.4.1 History, chemistry, and biological activities of turmeric……..…………. 45 1.4.2 of diarylheptanoids in turmeric……………………………. 54 1.4.3 Effects of agricultural factors on yield and quality of turmeric rhizome……………………………………………………….56

CHAPTER 2: MATERIALS AND METHODS………………………………………. 59 2.1 Materials…………………………………………………………………………. 59 2.1.1 General materials………………………………………………………… 59 2.1.2 Growth of basil and leaf sample collection……………………………… 59 2.1.3 Growth of turmeric and rhizome sample collection……………………... 60 2.2 Equipment and data analysis software…………………………………………… 61 2.2.1 Equipment……………………………………………………………….. 61 2.2.2 Computer languages and software tools for data processing and analysis……………………………………………………………… 62 2.3 Methods………………………………………………………………………….. 63 2.3.1 GC-MS analysis of basil leaves………………………………………….. 63 2.3.2 Isolation of peltate trichome glands from basil young leaves………….… 64

6

TABLE OF CONTENTS – continued

2.3.3 Protein purification and fractionation from the basil glandular trichomes………………………………………………………. 65 2.3.4 Proteomic analysis using GeLC-MS/MS………………………………… 66 2.3.5 Proteomic analysis using MudPIT……………………………………….. 67 2.3.6 Database searching and data processing for protein identification and TSC calculation……………………………………….. 68 2.3.7 Data processing for protein functional categorization…………………… 69 2.3.8 Searching for PTMs using X!tandem……………………………………. 70 2.3.9 Sample preparation for metabolomic analysis of turmeric rhizomes……. 71 2.3.10 GC-MS analysis of turmeric rhizomes…………………………………... 72 2.3.11 LC-MS and LC-MS/MS analysis of turmeric rhizomes…………………. 73 2.3.12 LC-PDA analysis of turmeric rhizomes……………….…………………. 75 2.3.13 Data analysis of the targeted analysis…….……………………………… 76 2.3.14 Data analysis of the non-targeted analysis…….…….…………………… 76

CHAPTER 3: RESULTS………………………………………………………………. 78 3.1 Protein expression and modification in basil glandular trichomes..…..…………. 78 3.1.1 Profiles of volatile metabolites in basil lines ……..…………………...… 79 3.1.2 Protein identification using shotgun proteomic approaches…………...… 79 3.1.3 Protein functional categorization based on gene ontology (GO) controlled vocabulary……………………………………………… 84 3.1.4 Protein and mRNA levels of related to the biosynthesis of phenylpropanoids and terpenoids……………………..… 89 3.1.5 Identification of protein post-translational modifications (PTMs)…………………………………………………………………… 92 3.2 Metabolomic investigation of turmeric rhizome …………………………………101 3.2.1 Targeted analysis of three major curcuminoids .……………..………….. 102 3.2.2 Non-targeted analysis using LC-MS and GC-MS…...………...……...…. 107 3.2.3 Correlation between the abundance patterns of metabolites: co-regulated metabolite modules………………………..…. 110 3.2.4 Distribution of Pearson correlation coefficient between the abundance patterns of metabolites……………………….…. 111 3.3 Identification of diarylheptanoids in turmeric rhizome using Tandem-MS-ASC (Assisted Spectra Comparison)……………………….. 126 3.3.1 Development of Tandem-MS-ASC (Assisted Spectra Comparison)……………………………………………………………… 127 3.3.2 Identification of diarylheptanoids in turmeric rhizome based on their LC-MS/MS……………………………………………….. 133

CHAPTER 4: DISCUSSION…………………………………………………………... 141 4.1 Discussions of proteomic results…………………………………………………... 141

7

TABLE OF CONTENTS – continued

4.1.1 Shotgun approaches as high-throughput tools for plant proteomics research…………………………………………….. 141 4.1.2 Differential expression of specific enzymes controls chemical diversity of basil lines………………………………… 146 4.1.3 Transcriptional and post-transcriptional regulation of phenylpropanoid and terpenoid metabolism in basil glandular trichomes………………………………………………… 153 4.2 Discussion of results of metabolomic investigation and diarylheptanoid identification …………………………………………………… 157 4.2.1 Variation of compound abundances in turmeric rhizomes………………. 157 4.2.2 Applications of co-regulated metabolite module ...……………………… 160 4.2.3 Plant metabolomics for natural product chemistry………………………. 162

APPENDIX A: SCRIPTS AND COMMAND LINES USED IN THE DATA ANALYSIS…………………………………..…………………. 168 A1. Scripts for the proteomic data analysis…………………………..………. 168 A2. Scripts and command lines for metabolomic data analysis……….….….. 204 A3. Scripts of Tandem-MS-ASC…………………………………………….. 207

APPENDIX B: SUPPLEMENTARY DATA OF THE PROTEOMIC RESEARCH………………………………………………………………………243 B1. Proteins present in at least three of the four basil lines…………………... 243 B2. Spectral count and descriptions of protein entries……………………..… 246 B3. Protein post-translational modifications found by X!tandem (less reliable results)……………………………………..…… 266

REFERENCES………………………………………………………………………… 268

8

LIST OF FIGURES

Figure 1-1-1 Core reactions of the phenylpropanoid pathway………….………….... 18 Figure 1-1-2 Pathways leading to the biosynthesis of terpenes……………………… 20 Figure 1-2-1 Publications on ISI containing “metabolom*” from 1999 to 2006 ……………….…………………………………….. 31 Figure 1-2-2 Structures of some specialized metabolites found in basil leaves…………………………………………………………... 38 Figure 1-4-1 Publications on ISI containing “turmeric” or “Curcuma Longa” from 1975 to 2006 ………………………………… 49 Figure 1-4-2 Diarylheptanoids identified or proposed from turmeric………………...49 Figure 1-4-3 Structures of abundant compounds in turmeric volatile oil……………………………………………………………… 53 Figure 1-4-4 Proposed biosynthetic pathway of curcuminoids in turmeric …………………………………………………………….. 55 Figure 2-3-1 Data processing method for identification of protein PTMs using X!tandem ………………………………………………… 70 Figure 3-1-1 The experimental design of the proteomic study of basil glandular trichomes of basil glandular trichome……………...….. 78 Figure 3-1-2 Representative GC-MS chromatograms showing different metabolic profiles of basil cultivars………………………….. 80 Figure 3-1-3 Venn diagrams showing overlap among proteomes of four basil cultivars……………………………...…………………… 85 Figure 3-1-4 Venn diagrams showing overlap of protein entries connected with basil glandular trichome EST entries and plant protein IDs from Uniprot……………………….…………… 86 Figure 3-1-5 Distribution of protein entries within “Biological process”, “Cellular component”, and “Molecular function” GO categories after GO Slim processing…………………..... 87 Figure 3-2-1 Experimental design for the metabolomic investigation of turmeric rhizomes …………………………………………...……… 101 Figure 3-2-2 Effects of cultivar, growth stage, and fertilization treatment on the accumulation of three major curcuminoids…...……… 105 Figure 3-2-3 Effect size (η2) of the factors and error………………………………… 106 Figure 3-2-4 Linear correlation between the levels of three major Curcuminoids…………………………………………………….…….. 106 Figure 3-2-5 A representative LC-MS result of turmeric rhizome………………...... 114 Figure 3-2-6 Representative GC-MS results showing metabolic profiles of turmeric cultivars…………………………………………… 115 Figure 3-2-7 Score plots showing position of LC-MS data collected from systematically preturbed turmeric rhizome samples…………..…. 116 Figure 3-2-8 Score plots showing position of GC-MS data collected from systematically preturbed turmeric rhizome samples……….…….. 117 Figure 3-2-9 The two-way HCA results aligned with a heatmap demonstrating the changes of metabolite levels detected by LC-MS…………………………………………………….. 118

9

LIST OF FIGURES – continued

Figure 3-2-10 The two-way HCA results aligned with a heatmap demonstrating the changes of metabolite levels detected by GC-MS a…………………………………………………... 119 Figure 3-2-11 The results of HCA analysis using Pearson correlation coefficient between abundance patterns of metabolites detected using LC-MS …………….…………………………………... 120 Figure 3-2-12 The results of HCA analysis using Pearson correlation coefficient between abundance patterns of metabolites detected using GC-MS …………….…………………………………... 121 Figure 3-2-13 Distribution of Pearson correlation coefficients between abundance patterns of metabolites detected by LC-MS……………...... 123 Figure 3-2-14 Distribution of Pearson correlation coefficients between abundance patterns of metabolites detected by GC-MS ………………. 123 Figure 3-2-15 Comparison of the distributions of correlation values calculated from real datasets with distributeions from scrambled datasets……………………………………….…………….. 124 Figure 3-2-16 Comparison of the distribution of correlation values from the whole dateset to 5M/7M separate datesets…………………….…… 125 Figure 3-3-1 The experimental approach for identification of diarylheptanoids in turmeric rhizomes …………………………….…...126 Figure 3-3-2 Principle of Component I (comparison with theoretical spectra) ……... 129 Figure 3-3-3 Principle of Component II (searching for ion series)…………………...131 Figure 3-3-4 Principle of Component III (comparison of spectra with suitable mass shifts)………………………………………………. 132 Figure 3-3-5 Combination of the three data analysis approaches.…………………… 133 Figure 3-3-6 The fragmentation pattern of diarylheptanoids………………………… 136 Figure 3-3-7 Structure and MS/MS spectra of compound 23………..………….…… 137 Figure 3-3-8 Structure and MS/MS spectra of compound 24..………………….…… 137 Figure 3-3-9 Structure and MS/MS spectra of compound 25..………………….…… 138 Figure 3-3-10 Structure and MS/MS spectra of compound 26 and 27…..……….…… 138 Figure 3-3-11 Spectral comparison of 15 and 28……………..……………………….. 139 Figure 3-3-12 Spectral comparison of 22 and 29……..……………………………….. 140 Figure 4-1-1 Levels of transcripts and proteins of specialized metabolic enzymes vs. their corresponding metabolites among the 4 basil lines….…….….. 148 Figure 4-2-1 Positions of the outliers in the green house ………………………..….. 159 Figure 4-2-2 The results of HCA analysis using Pearson correlation coefficient between abundance patterns of metabolites detected identified using LC-MS/MS……………………..……..….…. 163 Figure 4-2-3 A A possible pathway of generation of compounds in analogs in module “a” of LC-MS correlation result…………………… 164 Figure 4-2-4 Proposed biosynthetic pathway of diarylheptanoids in turmeric………. 164 Figure 4-2-5 Generic scheme for bioassay-guided fractionation…………………….. 165 Figure 4-2-6 A scheme for the proposed “metabolomics-guided discovery”……….. 167 Figure 4-2-7 A scheme for the proposed “correlation bioassay”……………………. 167

10

LIST OF TABLES

Table 1-3-1 Reported enzymatic activity and mRNA level of enzymes in basil gland………………………………………………. 44 Table 2-1-1 Fertilizer treatments for turmeric ……………………………………… 60 Table 3-1-1 Summary of results of TPP processing………………………………… 80 Table 3-1-2 Normalized protein and mRNA levels for selected highly expressed enzymes from primary/core metabolism …………………… 94 Table 3-1-3 Normalized protein and mRNA levels for enzymes in the shikimate pathway………………………………………………..95 Table 3-1-4 Normalized protein and mRNA levels for enzymes in the phenylpropanoid pathway……………………………………….. 96 Table 3-1-5 Normalized protein and mRNA levels for enzymes in the MEP/DOXP and mevalonate (MVA) pathways………………… 97 Table 3-1-6 Normalized protein and mRNA levels for prenyltransferases and terpene synthases………………………………………………….. 98 Table 3-1-7 Post-translational modifications found in the proteins from basil glandular trichomes………………………………………… 99 Table 3-2-1 The levels of the three major curcuminoids in turmeric rhizomes under systematically perturbation ...………………………… 104 Table 3-2-2 Descriptive statistics of Pearson correlation coefficients between abundance patterns of metabolites………………………….… 122

11

LIST OF ABBREVIATIONS

2D gel two-dimensional gel electrophoresis 2DE two-dimensional gel electrophoresis 4CL 4-coumarate:CoA AMDIS automated mass spectral deconvolution and identification system ANN artificial neural networks ANOVA analysis of variance APEX absolute protein expression measurements C4H cinnamate-4-hydroxylase CAAT p-coumaryl/coniferyl alcohol acetyl CAD alcohol dehydrogenases CCMT p-Coumarate/cinnamate carboxylmethyltransferase CCR cinnamoyl-CoA reductase cDNA complementary DNA CDS γ-cadinene synthase CID collision-induced dissociation COX-2 cyclooxygenase-2 p-coumaroyl-coenzyme A:4-hydroxyphenyllactic acid p-coumaroyl CPLT transferas CS3′H p-coumaroyl shikimate 3′-hydroxylase CST p-coumaroyl-coenzyme A:shikimic acid p-coumaroyl transferase CV coefficient of variation CVOMT chavicol O-methyltransferase DA discriminant analysis,or discriminant function analysis DIGE difference gel electrophoresis DMAPP dimethylallyl diphosphate DNA deoxyribonucleic acid

12

LIST OF ABBREVIATIONS– Continued

DOXP 1-deoxy-D-xylulose-5-phosphate synthase DXR DOXP reductoisomerase DXS 1-deoxy-D-xylulose-5-phosphate synthase EC evolutionary computation EGS EOMT eugenol O-methyltransferase EPA The Environmental Protection Agency ESI electrospray ionization EST expressed sequence tag EXP 2-C-methyl-D-erythritol 4-phosphate FES fenchol synthase FPPS farnesyl diphosphate synthase FT-ICR MS fourier transform-ion cyclotron resonance-mass spectrometry FT-MS fourier transform-ion cyclotron resonance-mass spectrometry two-dimensional gas chromatography with time-of-flight mass GC×GC-TOFMS spectrometry GC-MS gas chromatography-mass spectrometry GDS germacrene D synthase GeLC-MS/MS gel enhanced LC-MS/MS GES geraniol synthase GGPPS geranylgeranyl pyrophosphate synthase GO gene ontology GPPS geranyl diphosphate synthase HCA hierarchical cluster analysis HIV human immunodeficiency virus ICAT isotope coded affinity tag

13

LIST OF ABBREVIATIONS– Continued

IDI isopentenyl-diphosphate delta- iNOS nitric oxide synthase IPP isopentenyl diphosphate LC liquid chromatography LC-ESI-MS liquid chromatography-electrospray ionization-mass spectrometry LC-MS liquid chromatography-mass spectrometry LC-MS/MS high-speed liquid chromatography/tandem mass spectrometry LIS (R)-linalool synthase matrix assisted laser desorption ionization time-of-flight mass MALDI-TOF spectrometry 2-C-methyl-D-erythritol 4-phosphate/1-deoxy-D-xylulose 5- MEP/DOXP phosphate mRNA messenger ribonucleic acid MS mass spectrometry MTBE methyl t-butyl ether MudPIT multidimensional protein identification technology MVA 1-deoxy-D-xylulose-5-phosphate MW molecular weight MYS β-myrcene synthase NIH the National Institutes of Health NIST the National Institute of Standards and Technology NMR nuclear magnetic resonance PAL phenylalanine ammonia- PC principal component PCA principal component analysis PDA photodiode array

14

LIST OF ABBREVIATIONS– Continued pI isoelectric point PMF peptide mass fingerprinting phenylmethanesulphonylfluoride or phenylmethylsulphonyl PMSF fluoride PTFE polytetrafluoroethylene PTGS post-transcriptional gene silencing PTM post-translational modification RAPD Random amplified polymorphic DNA RSD relative standard deviation Rt retention time SD basil line Sweet Dani SDS sodium dodecyl sulfate SES selinene synthase SILAC isotope labeling with amino acids in cell culture SOM self organizing mapping a software tool for compound identification (ASC standing for Tandem-MS-ASC "Assisted Spectra Comparison") TEN total EST number TES terpinolene synthase TLC thin layer chromatography TPP Trans-Proteomic Pipeline TPS terpene synthases TSC total spectral count ultra-performance liquid chromatography/time-of-flight mass UPLC/TOFMS spectrometry ZIS α-zingiberene synthase

15

ABSTRACT

Specialized metabolism (secondary metabolism) in glandular trichomes of sweet basil

(Ocimum basilicum L.) and accumulation of specialized metabolites (secondary metabolites) in rhizomes of turmeric (Curcuma longa L.) was investigated using proteomic and metabolomic approaches, respectively. In an effort to further clarify the regulation of metabolism in the glandular trichomes of sweet basil, we utilized a proteomics-based approach that applied MudPIT (multidimensional protein identification technology) and GeLC-MS/MS (gel enhanced LC-MS/MS) to protein samples from isolated trichomes of four different basil lines: MC, SW, SD, and EMX-1.

Phosphorylation, ubiquitination and methylation of proteins in these samples were detected using X!tandem. Significant differences in distribution of the 755 non-redundant protein entries demonstrated that the proteomes of the glandular trichomes of the four basil lines were quite distinct. Correspondence between proteomic, EST, and metabolic profiling data demonstrated that both transcriptional regulation and post-transcriptional regulation contribute to the chemical diversity. One very interesting finding was that precursors for different classes of terpenoids, including mono- and sesquiterpenoids, appear to be almost exclusively supplied by the MEP (2-C-methyl-D-erythritol 4- phosphate) pathway, but not the mevolonate pathway, in basil glandular trichomes. Our results suggest that carbon flow can be readily redirected between the phenylpropanoid and terpenoid pathways in this specific cell type. To investigate the impact of genetic, developmental and environmental factors on the accumulation of phytochemicals in rhizomes of turmeric, we performed metabolomic analysis in a 2×2×4 full factorial

16

design experiment using GC-MS, LC-MS, and LC-PDA. Our results showed that growth

stage had the largest effect on levels of the three major curcuminoids. Co-regulated

metabolite modules were detected, which provided valuable information for identification

of phytochemicals and investigation of their biosynthesis. Based on LC-MS/MS data, 4 new diarylheptanoids were tentatively identified in turmeric rhizomes using Tandem-MS-

ASC, a home-made software tool that automatically recognizes spectra of unknown compounds using three approaches. Based on our metabolomic results, we proposed two new strategies, “metabolomics-guided discovery” and “correlation bioassay”, to identify bioactive constituents from plant extracts based on information provided by metabolomic investigation.

17

CHAPTER 1: INTRODUCTION

1.1 Brief introduction of plant specialized metabolism

1.1.1 The phenylpropanoid pathway and the terpenoid pathway

Understanding biosynthesis of plant phytochemicals is an essential task of the natural product chemistry research. Traditionally, probably due to their limited distribution across the plant kingdom, plant natural products were considered “secondary” in importance to plant life, and thus were named as “secondary” metabolites. However, in recent years, it is widely accepted that secondary metabolites are essential to plant function (Gang, 2005). Therefore, another term “specialized” metabolites has recently gained popularity to emphasize specialized functions and specific metabolic pathways of these compounds (Gang, 2005). These metabolites, despite their great structural variation, are generated in plants through a very limited number of pathways. This project focuses on two important plant specialized metabolic pathways: the phenylpropanoid pathway and the terpenoid pathway.

The products of the phenylpropanoid pathway include most of plant phenolic natural products, such as phenylpropanes, lignans, flavonoids, coumarins, and stilbenes (Croteau et al., 2002). Phenylpropanoids have been found to have various pharmacological properties, such as antimicrobial, analgesic, anti-inflammatory, immunostimulatory, and expectorant activities (Kurkin, 2003). They also play important roles in plant physiology and defense by serving as pigments, phytoalexins, UV protectants, and insect repellents

(Hahlbrock and Scheel, 1989).

18

In most vascular plants, e.g. basil and turmeric, phenylalanine is the precursor of the phenylpropanoid pathway (Figure 1-1-1) (Croteau et al., 2002; Dixon et al., 2002). In the first committed step, phenylalanine is converted to cinnamic acid via the activity of phenylalanine ammonia-lyase. (PAL) (Croteau et al., 2002). Cinnamic acid is hydroxylated to form 4-courmaric acid by cinnamate-4-hydroxylase (C4H) in the second step (Dixon et al., 2002). Subsequently, 4-coumarate:CoA ligase (4CL) activates p- coumaric acid by generation of its coenzyme A ester in the third step. p-Coumaroyl-CoA is the common precursor of phenylproanes, lignins, lignans, flavonid, and stilbenes

(Croteau et al., 2002). Cinnamic acid derivatives, such as methylcinnamate in basil line

MC, are synthesized from cinnamic acid, the product of the first step (Croteau et al.,

2002).

cinnmic acid derivatives

phenylpropanes COOH COOH COOH COSCoA

NH2 PAL C4H 4CL lignins and lignans

OH OH phenylalanine cinnamic acid p-coumaric acid p-coumaroyl-CoA flavonoids and stilbenes

Figure1-1-1 Core reactions of the phenylpropanoid pathway (Croteau et al., 2002)

The products of the terpenoid pathway have significant structural diversities and perform multiple important biological roles (Croteau et al., 2002). Structurally, they are built up

19

of C5 units, and classified by the number of units. Monoterpenes (C10) and sesquiterpenes

(C15) are major components of plant essential oils. Diterpene (C20) includes phytol, hormones, and phytoalexins. Triterpenes (C30), and tetraterpenes (C40) function as membrane components, feeding deterrents, surface waxes, and photosynthetic pigments

(Croteau et al., 2002). Many important medicinal natural products, such as taxol and artemisinin, are terpenoids. Investigation of their biosynthesis and engineering are very active research areas (Julsing et al., 2006; Withers and Keasling, 2007).

The biosynthesis pathway of terpenoids in plants is outlined in Figure 1-1-2 (Croteau et al., 2002; Iijima et al., 2004a; Bouvier et al., 2005). Briefly, isopentenyl diphosphate (IPP) is synthesized in the cytosol via mevalonic acid (MVA) pathway, or in the plastids via the

2-C-methyl-D-erythritol 4-phosphate (MEP) pathway (or named “MEP/DOXP pathway”.

DOXP stands for 1-deoxy-D-xylulose-5-phosphate) (Rohmer et al., 1996; Rodriguez-

Concepcion, 2006). Then under the catalysis of prenyl and terpene synthase, the IPP in cytosol is incorporated into sesquiterpenes, and triterpenes (Bouvier et al.,

2005). Similarly, IPP in plastids is incorporated into monoterpenes, diterpenes, , and plastoquinones (Bouvier et al., 2005). It is noteworthy that the exchange of IPP between the cytosol and the plastid has been demonstrated by feeding experiments

(Kasahara et al., 2002). This cross-talk was also found in the epidermis of snapdragon flowers, where the MEP pathway can supply IPP for the synthesis of both monoterpenes in the plastid and sesquiterpenes in the cytosol (Dudareva et al., 2005).

20

CYTOSOL PLASTID OP

O H OH Acetyl-CoA H C COSCoA pyruvate glyceraldehyde 3-phosphate 3 COOH (GAP) AACT O

HMGS DXS OH 1-Deoxy-D-xylulose-5-phosphate (DXP) HMGR OPP O OH OH DXR Mevalonic acid OH (MVA) COOH CMS MVK CMK

MCS PMK HDS

PMD HDR IDI IDI OPP OPP IPP DMAPP Dimethylallyl diphosphate Isopentenyl diphosphate (DMAPP) (IPP) GPS GPS Geranyl diphosphate GPP OPP TPS (GPP) FPS Monoterpenes GGPS ?

Farneyl OPP diphosphate TPS 2 TPS OPP Geranylgeranyl diphosphate (FPP) 3 Sesquiterpenes (GGPP) GGPS Triterpenes

GGPP TPS

Polyterpenes, etc. Diterpenes, carotenoids, plastoquinones, etc.

Figure 1-1-2 Pathways leading to the biosynthesis of terpenes (Croteau et al., 2002;

Iijima et al., 2004a; Bouvier et al., 2005)

21

1.1.2 Regulation of plant specialized metabolism and variation of herbal product quality

Plant specialized metabolism is determined by several genetic, developmental, and

environmental factors. Chemically distinct populations exist as “chemodemes” or

chemotypes within a species (Evans, 2002). The genetic differences between those

populations are so small that they have very little or no impact on morphology, but are significant enough to introduce substantial variations on specialized metabolism (Evans,

2002). Also, it is well established that the production of specialized metabolites is affected by growth stage (age and/or season of collection) (Liu et al., 1998), temperature

(Shohael et al., 2006), light (Kunz et al., 2006), salinity (Wahid and Ghazanfar, 2006),

availability of nutrients/fertilizers (Turtola et al., 2002), water (drought or flooding) (Liu et al., 1997; Liu, 2000), interaction with insects (Kliebenstein, 2004; Ralph et al., 2006), herbivores (Zangerl and Berenbaum, 1998), and pathogens (Kliebenstein, 2004; Kunz et al., 2006). The mechanism of regulation of plant specialized metabolism has been intensively investigated.

Transcriptional control of biosynthetic genes is a major regulatory mechanism controlling plant specialized metabolism (Memelink et al., 2001; Endt et al., 2002; Broun, 2005).

Transcription factors, such as bHLH proteins and ORCA proteins, are essential mediators

of this regulation (Borevitz et al., 2000; van der Fits and Memelink, 2000). When plants

are at a specific developmental stage, or under environmental stress, those transcription

factors are activated by external and/or internal signals, which leads to expression of

important enzymes and production of specialized metabolites (van der Fits and Memelink,

22

2000; Endt et al., 2002). Because of their regulatory function, transcription factors are

considered important targets in plant metabolic engineering for production of

pharmacologically active natural products (Gantet and Memelink, 2002).

Traditionally, post-transcriptional regulation of specialized metabolism has been

considered important only in specific cases (Davies and Schwinn, 2003). However, this

view has been challenged by the results of a recent metabolic profiling study, which

suggests that post-transcriptional regulation might be the dominant metabolic regulation

mechanism (Carrari et al., 2006). Post-transcriptional regulation can occur on many

levels. In transgenic petunias, post-transcriptional gene silencing (PTGS) was found to degrade chalcone synthase, a key for the biosynthesis of flavonoids and many specialized metabolites (Jorgensen et al., 1996; O'Dell et al., 1999). Some post- translational modifications (PTM) of specialized metabolic enzymes can affect enzymatic activities, and thus contribute to the regulation. For example, elicitors regulate the phosphorylation of PAL (Cheng et al., 2001; Zhao et al., 2005), which was found to decrease the activity of this key enzyme (Allwood et al., 1999). The availability of substrates also plays an important role (Bryant et al., 1987; de la Rosa et al., 2001).

Knowledge of the regulation of plant specialized metabolism is of importance for producing consistent and high quality herbal products. High content variability is a

significant problem in herbal supplements used in the United States (Wolsko et al., 2005)

and regulation of plant specialized metabolism is at least partly responsible for this

23

problem. The regulatory mechanisms that make specialized metabolism highly sensitive

to various environmental factors are hard to fully control using current agricultural

methods. As a result, large variations in the levels of pharmacologically important

compounds present in herbal products, and lead to unreliable and irreproducible clinical

effects. Considerable efforts have been made to evaluate the quality of herbal products

using advanced analytical instruments (Liang et al., 2004). However, the quality of herbal

products can not be increased merely by measuring it. Instead, the agricultural and

production methods need to be improved. And scientific investigation into the regulation

of plant specialized metabolism can provide useful information for this improvement.

1.2 Review: Plant specialized metabolism research using proteomic and metabolomic

approaches

1.2.1 Approaches of proteomics

Proteomics is defined as “systematic study of the many and diverse properties of proteins

in a parallel manner” (Patterson and Aebersold, 2003). Although gene expression can be

determined using transcriptomic methods, the abundances of proteins can not be

determined by quantification of their corresponding mRNA because the correlation

between the protein abundance and mRNA expression level has been found typically

modest or weak (Gygi et al., 1999b; Washburn et al., 2003; Nie et al., 2006). In addition,

proteomic research can provide information about modifications, locations, and

interactions of proteins which are important for the elucidation of their functions (Pandey

and Mann, 2000). Therefore, proteomics is a valuable tool for the life science research.

24

Proteomic methods were first used in the 1970s, when two-dimensional gel

electrophoresis (2DE or 2D gel) was first applied to separate complex protein mixtures in

biological samples (Pandey and Mann, 2000). However, proteomic approaches did not

gain wide popularity until the 1990s due to the absence of high throughput protein

identification methods (Pandey and Mann, 2000). The rapid progress of proteomics in

recent years is largely attributed to the completion of several genome sequencing projects

and the development of mass spectrometry (MS) (Aebersold and Mann, 2003). A wide array of proteomic approaches are now used in medical and biological research (Domon and Aebersold, 2006).

Currently, the most prevalent proteomic approaches are 2DE and shotgun approaches. In

the 2DE approach, proteins are separated according to their isoelectric point (pI) in the

first dimension, and then by their approximate molecular weight in the second dimension.

Subsequently, the proteins are visualized either by staining, or by fluorescent or radio labeling methods (Gorg et al., 2000). Protein spots of interest can be cut from the gel and in-gel digestion is performed using trypsin or other proteases. Then resulting peptides are extracted and submitted to MS or liquid chromatography-tandem MS (LC-MS/MS) analysis for protein identification (Gorg et al., 2000). As a classic proteomics method, the

2DE approach is still the most common method used in current plant proteomic studies

(Chen and Harmon, 2006). However, this method is often criticized as being labor- intensive and irreproducible (Fey and Larsen, 2001). Some of the shortcomings of the

25

2DE approach can be overcome by using difference gel electrophoresis (DIGE), a

recently developed modification of the traditional approach (Marouga et al., 2005).

Shotgun proteomic techniques, including multidimensional protein identification

technology (MudPIT) and gel enhanced LC-MS/MS (GeLC-MS/MS), have been

developed as alternatives to the 2DE approach (Lee and Cooper, 2006). MudPIT experiments begin with the digestion of a protein sample, and the resulting peptides are

separated and identified by multidimensional-LC-MS/MS (Wolters et al., 2001). In

GeLC-MS/MS experiments, the proteins are in-gel digested after fractionation using

conventional one dimensional SDS-PAGE, and then the resulting peptides are separated

and identified by nanocapillary LC-MS/MS (Simpson et al., 2000; Breci et al., 2005a).

Shotgun proteomic approaches were thought better than the 2D gel approach in aspects of through-put, dynamic range, and labor intensity (Wolters et al., 2001; Lee and Cooper,

2006). And GeLC-MS/MS is considered a better choice for analysis of plant membrane proteins because it can tolerate many detergents incompatible with MudPIT (Lee and

Cooper, 2006).

Based on the simple peptide mixtures generated using the 2DE approach, proteins can be identified by peptide mass fingerprinting (PMF, or named peptide mapping), in which the peptide masses of all protein entries in the database are compared with the experimental peptide mass often acquired by matrix assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF) (Aebersold and Mann, 2003). LC-MS/MS is often

26

used to identify a protein from more complex peptide mixtures generated in the shotgun

approaches. The peptide mixture is separated using nano-HPLC, and then collision-

induced dissociation (CID) spectra are generated by MS/MS (Newton et al., 2004). The

peptide sequences can be deduced using a cross-correlation method, in which the experimental MS/MS spectra are compared with theoretical spectra for the best match

(Yates et al., 1995b). Many software tools are available for protein identification using

MS/MS spectra, such as SEQUEST (Yates et al., 1995b), Mascot (Perkins et al., 1999)

and X!tandem (Craig and Beavis, 2004).

The “protein inference problem” complicates the task that assigns peptides back to

proteins and assesses the statistical significance of protein identification (Nesvizhskii and

Aebersold, 2005). In shotgun proteomic analysis, the connection between the proteins

and peptides is lost because the digestion was performed prior to LC separation. Due to

the sequence similarity among proteins, peptides with the same sequence can be

generated from different proteins. Several programs, such as PROT_PROBE (Sadygov et

al., 2004), PANORAMICS (Lee and Cooper, 2006) and ProteinProphet (Nesvizhskii et

al., 2003), have been developed to solve this problem. Of these, ProteinProphet was

chosen for this research because it has been integrated into Trans-Proteomic Pipeline

(TPP), a freely available uniform proteomics analysis platform (Keller et al., 2005), and

thus the large dataset acquired by shotgun proteomic experiment can be processed

efficiently.

27

In the 2DE approach, proteins can be directly quantified based on gel image after the protein visualization. Protein quantification in the shotgun approach is often performed using chemical isotopic labeling techniques, such as isotope coded affinity tag (ICAT)

(Gygi et al., 1999a). Isotopic labeling requires additional experimental steps and may affect protein identification (Kolker et al., 2006). Therefore, label-free approaches have been developed as alternatives to the chemical labeling methods to avoid these problems

(Kolker et al., 2006). These approaches quantify proteins based on information such as peptide or spectrum number (Liu et al., 2004), or the MS signal intensity of peptides

(Wang et al., 2003). Recently, a new method, absolute protein expression measurements

(APEX) (Lu et al., 2007), has been developed to perform absolute protein quantification based on spectral count in the output of ProteinProphet (Nesvizhskii et al., 2003).

Shotgun proteomic approaches generate a large number of MS/MS spectra, from which

PTMs of proteins can be identified based on the mass shift and neutral loss corresponding to a specific PTM (Kwon et al., 2006). For example, phosphorylation of serine and threonine are characterized by a mass shift of +80 and a neutral loss of -98 (DeGnore and

Qin, 1998), arginine methylation leads to mass shift of +14 (Brame et al., 2004), and lysine ubiquitination can be identified based on the mass shift of +144 (Peng et al., 2003).

These PTMs can be recognized using protein identification software tools, such as

SEQUEST (Yates et al., 1995b) or X!tandem (Craig and Beavis, 2004).

28

1.2.2 Proteomic research on herbal plant specialized metabolism

Proteomic tools have been widely used in plant research. However, only a small

proportion of these studies focus on herbal/medicinal plant specialized metabolism

(Agrawal and Rakwal, 2006; Glinski and Weckwerth, 2006; van Bentem et al., 2006; Ito

et al., 2007). To date, no proteomic study of turmeric or basil has been reported.

In this area, the only herbal plant species that has been systemically investigated using

proteomic approaches is Panax ginseng C.A. Meyer (Korean ginseng). This plant

accumulates ginsenosides in its root, which is often used as a tonic in Eastern Asia (Nam et al., 2005). Proteomic studies have been performed on many types of ginseng samples, such as main roots, rhizome heads, root skins (Lum et al., 2002), leaves (Nam et al.,

2003), cultured hairy roots (Kim et al., 2003), and cultured cells (Lum et al., 2002). A total of 91 proteins have been functionally identified using 2DE or shotgun proteomic approaches (Nam et al., 2005). The impact of high intensity light on the proteome of ginseng leaves has been investigated and 6 high light -responsive proteins were identified

(Nam et al., 2003).

Another intensively investigated system is Catharanthus roseus cell suspension culture

system. C. roseus produces anti-tumor alkaloids, including vinblastine and vincristine

(Jacobs et al., 2000). Since the 1990s, a series of studies has been performed to increase the alkaloid yield of cell cultures of C. roseus. The 2DE approach was used to

demonstrate the change of protein profile after alteration of the medium (Carpin et al.,

29

1997b) and genetic modification of the cell line (Carpin et al., 1997a). A 2-oxoglutarate- dependent dioxygenase was identified using this approach (Decarolis and Deluca, 1993).

A recent proteomic study of this cell suspension culture system using 2DE method identified 58 proteins, including enzymes directly involved in the alkaloid biosynthesis, such as strictosidine synthase and (Jacobs et al., 2005).

Plant proteomic studies of specialized metabolism of other plants are not common. In an

Eschscholtzia californica cell culture system (producing benzophenanthridine alkaloids),

2DE was used to define the change of protein pattern due to addition of yeast extracts elicitor (Park et al., 2006). In Papaver somniferum, a proteomic analysis led to the identification of methyl transferase enzymes involved in alkaloid biosynthesis (Decker et al., 2000; Ounaroon et al., 2003). Proteins contributing to specialized metabolism are sometimes identified in plant proteomic studies, but the major aims of these studies are usually not oriented toward specialized metabolism (Lei et al., 2005; Lippert et al., 2007).

Proteomic research focusing on plant specialized metabolism is relatively rare although the application of proteomic methods for to investigate plant specialized metabolism has been proposed for a long time (Jacobs et al., 2000). Proteomic research in this area has been obstructed by absence of complete plant genome sequences (Jacobs et al., 2000).

Also, membrane-associated proteins, which play important roles in plant specialized metabolism, are difficult to analyze using the 2DE approach (Jacobs et al., 2000).

Fortunately, some of these difficulties can be circumvented by using alternative analytical

30 methods and information sources. In this study, we use a shotgun proteomic approach, which has been reported to be a better choice for analysis of plant membrane proteins

(Lee and Cooper, 2006). We also used an high quality expressed sequence tag (EST) database of basil glandular trichome, containing 24248 ESTs and 7963 non-redundant entries (contigs or singletons), as an alternative to the whole genome sequence.

1.2.3 Introduction of metabolomics

Metabolomics is defined as “the study of the complete metabolic compliment of the cell, organ or organism” (Griffin and Shockcor, 2004). Metabolomic research provides comprehensive, quantitative and qualitative information about the dynamic changes of small-molecule metabolites, which can greatly facilitate understanding metabolic processes (Fiehn, 2002; Sumner et al., 2003). Small changes of enzyme activities can have significant impact on metabolite concentrations (Kell and Westerhoff, 1986).

Therefore, determining concentrations of metabolites is more sensitive than measuring rates of chemical reactions (Griffin and Shockcor, 2004). Therefore, metabolomics is especially useful to elucidate metabolic pathways in a perturbed system (Griffin and

Shockcor, 2004).

Metabolomics is less mature than proteomics. The precursor of metabolomics is metabolite profiling (or named “metabolic profiling”), which started at 1970s, following the development of GC-MS in the 1960s (Sumner et al., 2003). The real starting point of metabolomics is considered to be the end of 1990s, when the concept of “metabolic

31

snapshot'” was proposed by Oliver based on functional genomic research in yeast (Oliver

et al., 1998). Although plant proteomics only accounts for a small portion of all

proteomic publications, plant metabolomics played a very important role in the

development of metabolomics (Sumner et al., 2003). The number of publications regarding metabolomics has grown rapidly in recent years (Figure 1-2-1).

450 400 350 300 250 200 150 100 50 Number of publicaton on ISI publicaton of Number 0 1999 2000 2001 2002 2003 2004 2005 2006 Year

Figure 1-2-1 Publications on ISI containing “metabolom*” from 1999 to 2006

Metabolomic analysis starts with sample preparation. Samples are often immediately

frozen, either by freeze clamping, or liquid nitrogen freezing to stop inherent enzymatic activity (Fiehn, 2002). Alternatively, this activity can be quenched in cold methanol

(Dettmer et al., 2007). After homogenization in a cold ball mill or another type of grinder,

the sample is extracted by adding solvents. Sometimes additional energy, such as heat,

sonication, or microwaves can be applied to facilitate the extraction (Fiehn, 2002). The

32 impact of several factors on sample preparation has been systematically evaluated in a yeast metabolomic study (Villas-Boas et al., 2005).

After sample preparation, the extract is submitted for analysis. Several analytical methods are used in metabolomic research. Among them, nuclear magnetic resonance (NMR) and

MS are the most commonly used detection methods. Generally, NMR is more often used in “metabonomics”, which emphasizes the “quantitative measurement of the multivariate metabolic responses of multicellular systems” to genetic modification or environmental stimuli (Nicholson and Wilson, 2003); MS is more often used in “metabolomics”, which places an emphasis on unbiased identification and quantification of all metabolites (Dunn et al., 2005). Currently no single technology possesses all the desired features: unbiased, sensitive, reproducible, high through-put, and capable of metabolite identification (Dunn et al., 2005). Therefore, combinations of different approaches are often preferred.

Recently, several metabolomic studies using both NMR and MS have been reported

(Atherton et al., 2006; Gu et al., 2007; Hodson et al., 2007; Pan et al., 2007).

The raw data acquired must be processed and converted into data matrixes that can be used for the following data analysis steps. Two general strategies have been used for the data processing. The first one divides the signal into bins, and then recognizes profiles from each of the samples. The second one identifies and quantifies the significant features (for example, a peak in an LC-MS result) (Smith et al., 2006). Because of the large size of the data set, manual processing is not feasible. Therefore, several software

33

tools have been developed, and some of them are freely available, such as MSFACTS

(Duran et al., 2003), xcms (Smith et al., 2006), MET-IDEA (Broeckling et al., 2006),

MZmine (Broeckling et al., 2006), and MathDAMP (Baran et al., 2006).

Normally, data analysis in metabolomic research is performed using multivariate

analytical algorithms, including supervised methods, such as discriminant analysis (DA),

artificial neural networks (ANN) and evolutionary computation (EC), as well as

unsupervised methods, such as principal component analysis (PCA), hierarchical cluster

analysis (HCA), and self organizing mapping (SOM) (Nobeli and Thornton, 2006).

Unsupervised methods are often used in sample classification and supervised-type methods are often used in biomarker discovery and diagnostics studies (Dettmer et al.,

2007). Data normalization and data transformation are generally performed before application of the methods mentioned above. Range scaling and autoscaling were found to be better than other pretreatment methods during metabolomic data analysis (van den

Berg et al., 2006).

As a new research area which is rapidly growing, metabolomics suffers from certain

“growing pains” (Daviss, 2005). For example, hundreds of analytes can be readily detected or even quantified, but identification of them remains a difficult task. This puts metabolomics in a position close to where proteomics was a decade ago. Efforts have been made to standardize the format for reporting metabolomic results (Jenkins et al.,

2004). But there is still no de facto standard for the multiple-step experimental

34

procedures (Bamba and Fukusaki, 2006). Despite these problems, with its unique capability to provide comprehensive information on metabolism, metabolomics is expected to continue to grow rapidly (Sumner et al., 2003). Due to its interdisciplinary nature, the development of metabolomics will largely rely on the progress of analytic technologies and bioinformatics (Sumner et al., 2003; Hall, 2006).

1.2.4 Metabolomic research on herbal plant specialized metabolism

Metabolomic research on herbal plant specialized metabolism is far more common than proteomic research in the same area. Plant metabolomics can be viewed as “large-scale phytochemistry in the functional genomics era” (Sumner et al., 2003). Generally, specialized metabolites comprise a large proportion of the plant metabolome (Seger and

Sturm, 2007). Therefore, with appropriate analytical methods, most plant metabolomic investigations detect many specialized metabolites. This fact, combined with the large number of publications regarding plant metabolomics, makes it difficult to comprehensively review all metabolomic research related to plant specialized metabolism.

Therefore, the following review will only focus on herbal/medicinal plants.

Both of the target plants of this research have been analyzed using metabolomic or

metabolic profiling methods. Two-dimensional gas chromatography with time-of-flight

mass spectrometry (GC×GC-TOFMS) was used in the comparison of metabolite profiles

of three plant spices: basil (Ocimum basilicum), sweet herb stevia (Stevia rebaudiana),

and peppermint (Mentha piperita). The PCA results showed that the major metabolomic

35

differences of the three plants were relative abundances of carboxylic acids, amino acids,

and carbohydrates (Pierce et al., 2006). Metabolic profiling comparison of different lines and tissues of turmeric (Curcuma longa L.) was performed using gas chromatography-

mass spectrometry (GC-MS) and liquid chromatography-electrospray ionization-mass

spectrometry (LC-ESI-MS). No significant differences were found between conventional

greenhouse-grown vs. in vitro propagation derived plants (Ma and Gang, 2006).

Three of the medicinal plant species investigated using proteomics have also been

analyzed using metabolomics or metabolic profiling methods. The leaves of C. roseus

were analyzed using NMR to investigate the impact of infection by different types of

phytoplasmas. The results suggested that the biosynthesis of terpenoid indole alkaloids and phenylpropanoids was stimulated by the infection (Choi et al., 2004). Metabolic

fingerprinting of commercial ginseng preparations led to the establishment of a new

quality control method, which can discriminate ginseng preparations without the

requirement of pre-purification (Yang et al., 2006). Panax notoginseng is a plant in the

same genus with Ginseng, and its root is used in traditional Chinese medicine (Chan et al.,

2002). Ginsenoside biomarkers were found using ultra-performance liquid chromatography/time-of-flight mass spectrometry (UPLC/TOFMS) from raw and steamed roots of P. notoginseng (Chan et al., 2007). Opium poppy (P. somniferum) cell

cultures were investigated to reveal the effect of elicitor induction on sanguinarine

accumulation. Metabolite profiling data collected by Fourier transform-ion cyclotron

resonance-mass spectrometry (FT-ICR MS) showed the response of 992 independent

36 analytes to the elicitor treatment (Zulak et al., 2007). No metabolomic report of E. californica has been published.

Several other herbal or medicinal plants also have been analyzed using metabolomic or metabolic profiling methods. Twenty six chemically distinct germplasm lines of the

Chinese medicinal herb Huang-qin (Scutellaria baicalensis) were characterized using FT-

ICR MS. More than 2,400 compounds were identified in the metabolomic analysis

(Murch et al., 2004a). Extracts of chamomile flowers (Matricaria recutita) from three different geographical regions have been analyzed using NMR. The observed differences resulted from origin, purity and preparation methods (Wang et al., 2004). Differences between the metabolomic profiles of the extract of Cannabis sativa flowers and leaves were found using NMR. The impact of light and darkness on cannabinoid concentrations was also investigated (Wang et al., 2005). Thin layer chromatography (TLC) was used to analyze specialized metabolites in the MeOH extracts of Ginkgo biloba leaves. The result showed that the abundance of specialized metabolites was affected by the time of day at which samples were harvested (Wang et al., 2005). Metabolomic analysis of 9 commercial Ephedra materials has been performed using NMR and a method was established for the authentification of Ephedra species (Kim et al., 2005b). LC-UV was used in the metabolic profiling analysis of Willow (Salix sp.) bark extracts, and weighted

PCA was used as the alternative to ordinary PCA (Hendriks et al., 2005). The quality of green tea was measured by means of metabolomics using GC-TOF-MS. A metabolite

37 fingerprinting method was established to assess the quality of tea with no requirement for standard samples (Pongsuwan et al., 2007).

1.3 Review: Basil glandular trichome

1.3.1 Basil produces phenylpropanoids and terpenoids in glandular trichome

Basil (Ocimum basilicum L.), also known as Sweet Basil, is a member of the Lamiaceae

Family. It has a long history of use as a culinary spice and traditional herbal remedy

(Tapsell et al., 2006). The juice of basil was widely used in Ayurvedic medicine (Ody,

1993). Traditionally, basil leaves were taken as a tonic for nervous exhaustion, and the essential oil was used in aromatherapy (Ody, 1993). Recently, some important pharmaceutical properties of basil, such as anticarcinogenic (Dasgupta et al., 2004), antifungal (Oxenham et al., 2005), antibacterial and anti-thrombotic (Tohti et al., 2006) activities, have been reported.

Metabolic profiles vary among basil lines. Two of the four basil lines used in this project mainly accumulate phenylpropenes: eugenol in line SW, and methylchavicol in line

EMX-1. Line MC, however, produces essentially only methylcinnamate with almost no phenylpropenes (Gang et al., 2001). Line SD (Sweet Dani), in contrast to SW, EMX-1 and MC, produces only small quantities of phenylpropanoids instead of terpenoids as the major compounds (Iijima et al., 2004a). Many minor compounds have been found in each line, for example, 1,8-cineole in line SW and EMX-1 (Iijima et al., 2004a). “External flavones”, such as salvigenin and nevadensin, were also reported to exist on the surface

38

of the leaf (Werker et al., 1993a). Structures of some specialized metabolites found in

basil leaves are shown in Figure 1-2-2.

O OMe

O

OMe

OMe OH OMe OH Eugenol Z-Methylcinnamate E-Methylcinnamate Methylchavicol Chavicol

OH

O O O

1,8-Cineole Linalool Neral Geranial Caryophyllene OMe HO O OMe MeO O OMe

MeO MeO OH O OH O Nevadensin Salvigenin

Figure 1-2-2 Structures of some specialized metabolites found in basil leaves

39

The pleasant aroma of basil comes from volatile compounds produced primarily by

peltate glands on the surface of leaves. This was proven by the following observatons: (1)

the gland structure is highly specialized for volatile oil production and storage (Werker et

al., 1993b), (2) the most abundant ESTs prepared from gland cell transcripts have been

identified as enzymes related to plant specialized metabolism, (3) specific enzyme

activities essential for plant specialized metabolism are much higher in glands than in

whole leaves, and (4) the volatile metabolic profiles of the gland is similar to that of

whole leaf (Gang et al., 2001) (Iijima et al., 2004a).

1.3.2 Basil glandular trichome as a model system of plant specialized metabolism

research

EST databases have been constructed for SD, SW, and EMX-1, which have provided

valuable information regarding specialized metabolism in basil glandular trichome (Gang

et al., 2001; Iijima et al., 2004b). Based on the analysis result of EMX-1 EST database, a global view on both primary and specialized metabolism of the gland has been

established (Gang et al., 2001). Comparison of mRNA expression between glands and whole leaves proved that glands produce most, if not all, volatile metabolites in basil leaves (Gang et al., 2001). In addition, many important enzymes involved in specialized metabolism were identified based on the information provided by EST databases (Gang et al., 2002a; Gang et al., 2002b; Iijima et al., 2004a; Iijima et al., 2004b).

40

Generally, enzyme identification started by searching the EST database for cDNA sequences homologous to the protein sequences of known enzymes. Full length cDNAs can be acquired using 5′-RACE/genome walking (Gang et al., 2002a). Subsequently, candidate sequences were pinpointed based on the expression profiles in different lines with the assumption that the enzyme and its products should coexist. Then the enzymatic assay was carried out using putative enzymes expressed in E. coli and possible substrates of the enzymes. The enzyme was identified when it was found to generate the specific products. The last step was the characterization of the identified enzyme, including investigation on the relatedness to other similar enzymes and the reaction mechanism

(Fridman and Pichersky, 2005).

Six enzymes have been found in the pathway where phenylpropenes are produced. In

EMX-1, two phenylpropene O-methyltransferases, chavicol O-methyltransferase

(CVOMT) and eugenol O-methyltransferase (EOMT), were identified (Gang et al.,

2002b). In EMX-1 and SW, two acyltransferases, p-coumaroyl-coenzyme A:shikimic acid p-coumaroyl transferase (CST) and p-coumaroyl-coenzyme A:4- hydroxyphenyllactic acid p-coumaroyl transferase (CPLT), as well as a P450, p-coumaroyl shikimate 3’-hydroxylase (CS3′H) have been reported (Gang et al., 2002a).

Eugenol synthase 1 (EGS1), the enzyme directly catalyzing the formation of eugenol, was isolated from SW (Koeduka et al., 2006).

41

Nine terpene synthases (TPS) have been identified in EMX-1, SD, and SW (Iijima et al.,

2004a). Five of them are monoterpene synthases, including geraniol synthase (GES)

(Iijima et al., 2004b), (R)-linalool synthase (LIS), terpinolene synthase (TES), fenchol

synthase (FES), and β-myrcene synthase (MYS). The other four enzymes are

sesquiterpene synthases, including γ-cadinene synthase (CDS), selinene synthase (SES),

α-zingiberene synthase (ZIS) and germacrene D synthase (GDS) (Iijima et al., 2004a).

Most of the nine terpene synthases share limited sequence similarity, but two of them

share very close similarity (GES-LIS and MYS-FES) (Iijima et al., 2004a). From EMX-1,

SD, and SW, a NADP+-dependent dehydrogenase, cinnamyl 1

(CAD1), was identified as the enzyme responsible for the reversible oxidization of

geraniol to produce geranial (Iijima et al., 2006).

It is important to note that different basil lines, despite short genetic distances between them (high similarity of the genome) (Satovic et al., 2002; Labra et al., 2004), have distinct metabolomes. According to the evolutionary and breeding history of basil lines, most of the basil lines possess enough genetic potential for the synthesis of a large collection of both phenylpropanoids and terpenoids (Iijima et al., 2004a). However, only a proportion of these compounds are really produced in each line, which suggests differential regulation of specialized metabolism.

The biochemical basis of chemical diversity of basil lines has been investigated at different levels (Table 1-3-1). According to the the literature, both mRNA levels and

42

enzymatic activities of some enzymes were consistent with the levels of their

corresponding metabolites, which suggested that gene expression plays an important role.

For example, PAL, the first committed step of phenylpropanoid pathway, in SD, was

down-regulated at both the gene expression and enzymatic activity level, which was

correlated with the low concentrations of phenylpropenes in SD (Gang et al., 2001; Iijima

et al., 2004a). However, this is not always true. For example, differences in mRNA levels

can not fully explain the variation of terpene profiles of basil lines. Therefore, post-

transcriptional regulation probably also is involved in the regulation of specialized

metabolism in basil glandular trichomes (Iijima et al., 2004a).

The basil glandular trichome (gland) is a good single cell model for investigation of the

phenylpropanoid pathway and terpenoid pathway as well as their regulation. A considerable amount of enzyme-rich glands can be prepared from basil young leaves using a simple isolation procedure (Gang et al., 2001). In addition, the membrane of the

gland is permeable to small molecules but not to proteins. This feature facilitates the

measurement of enzymatic activities in glands because during the gland isolation, most of

the small molecules are washed away, so they will not interfere with the subsequent

experimental steps (Koeduka et al., 2006). More importantly, the value of the chemical

diversity of basil lines has been proven, for both identification of enzymes and

elucidation of the regulation of these pathways (Gang et al., 2002a; Gang et al., 2002b;

Iijima et al., 2004a; Iijima et al., 2004b).

43

Proteomic studies can be helpful to shed light on some important unanswered questions concerning specialized metabolism in basil glandular trichomes. Quantitative data

regarding protein levels are limited, and PTMs in glands has not been demonstrated although its existence has been surmised (Iijima et al., 2004a). Quantification of proteins

is necessary to validate the results of previous studies regarding mRNA and enzymatic

activity levels. For example, mRNA levels for enzymes in the MVA pathway were found

to be very low in the EST database, which suggested that the MVA pathway was not

likely to play a significant role in the production of terpenes in the glands (Iijima et al.,

2004a). However, because protein levels are not always well correlated with transcript

levels (Gygi et al., 1999b; Washburn et al., 2003; Nie et al., 2006), final conclusions

could not be drawn without further evidence of protein levels (Iijima et al., 2004a).

Proteomic approaches have been applied to trichomes of Arabidopsis (Wienkoop et al.,

2004) and tobacco (Amme et al., 2005), and resulted in the identification of 63 and 7

proteins, respectively. Although they revealed some aspects of the metabolism and

function of the trichomes, the limited number of proteins identified in those studies were

not sufficient to provide comprehensive information about the proteome in this cell type.

Also, the integration of transcriptomic, proteomic, and metabolic profiling data, almost

certainly helpful for the elucidation of the regulation and pathway of the metabolism, has

not been accomplished, probably due to lack of information about the corresponding

transcriptomes and metabolomes.

44

Table 1-3-1 Reported Enzymatic activity and mRNA level of enzymes in basil gland

Enzymatic activity* mRNA level Descriptive name of the enzyme SW EMX-1 SD SW EMX-1 SD Reference # Phenylalanine-ammonia lyase 17.4 27 9.8 + + +/- c Cinnamate 4-hydroxylase + + + c 4-Coumarate-CoA ligase 120.3 46 ++ ++ + a, c p-Coumaroyl-CoA:shikimic acid ~350 - b p-coumaroyl transferase Hydroxycinnamoylshikimate 9 : 1 b 3'-hydroxylase Caffeoyl-CoA O- 31.51 42 a methyltransferase Eugenol O-methyltransferase 0.51 67 a Chavicol O-methyltransferase 0.51 72 a 1-Deoxy-D-xylulose-5- + + + c phosphate synthase DXP reductoisomerase +/- +/- +/- c 4-Diphosphocytidyl-2-C-methyl- ++ ++ ++ c D-erythritol synthase CDP-ME kinase + +/- +/- c 2-C-methyl-D-erythritol 2,4- + +/- +/- c cyclodiphosphate synthase 4-Hydroxy-3-methylbut-2-en-1 ++ + + c -yl diphosphate synthase 1-hydroxy-2-methyl-butenyl 4- + +/- +/- c diphosphate reductase Acetoacetyl CoA thiolase +/- +/- +/- c HMG-CoA synthase +/- +/- +/- c HMG-CoA reductase +/- +/- +/- c Geranyl diphosphate synthase 54.9 23.4 68.3 + + + c Geraniol synthase + + ++ c

45

Table 1-3-1 Reported Enzymatic activity and mRNA level of enzymes in basil gland - continued

Enzymatic activity* mRNA level Descriptive name of the enzyme SW EMX-1 SD SW EMX-1 SD Reference # R-linalool synthase ++ + + c Terpinolene synthase ++ + +/- c Fenchol synthase + + + c β-Myrcene synthase + + + c γ-Cadinene synthase ++ + + c Selinene synthase ++ ++ ++ c α-Zingiberene synthase + + + c Germacrene D synthase + + ++ c Cinnamyl alcohol ~260 ~110 ~270 + ++ + d dehydrogenases

*: unit of enzymatic activity: pkat per mg protein #: references: a – (Gang et al., 2001); b –(Gang et al., 2002a); c–(Iijima et al., 2004a); d– (Iijima et al., 2006)

1.4 Review: turmeric

1.4.1 History, chemistry and biological activities of turmeric

Turmeric (Curcuma longa L.), a tropical herb indigenous to southern Asia, is a member

of the ginger (Zingiberaceae) family (Park and Kim, 2002). The powder of dry rhizome

of turmeric is used widely as a food flavoring, pigment, and preservative (Jagetia and

Aggarwal, 2007). More than 2400 metric tons of turmeric are imported into the United

States every year (Sharma et al., 2005).

46

Turmeric has a long history of medicinal use in Asian countries. In Ayurvedic Medicine

(a traditional medical system native to the Indian subcontinent), turmeric is used to

alleviate sprains and swellings caused by injury, stimulate wound healing, and treat

abdominal problems (Shishodia et al., 2005). In Traditional Chinese Medicine, it is used

in the treatment of rheumatism, menstrual disorders, and traumatic diseases (Xia et al.,

2005). The modern scientific investigation of turmeric has a long history. The structure of

curcumin, one of the major chemical principles in turmeric, was first described about a

century ago (Milobedzka et al., 1910; Shishodia et al., 2005). The interest in turmeric was

renewed in the 1970s, when anti-inflammatory properties of turmeric were reported

(Arora et al., 1971; Jayaprakasha et al., 2005), and was further stimulated by the

discovery of lipid peroxidation inhibition properties of curcumin and related compounds

(Reddy and Lokesh, 1992; Jayaprakasha et al., 2005). Because of the important biological

activities and medicinal properties revealed by those studies, turmeric is now a well

known health food, instead of merely a food additive. Publications concerning this herbal

plant have been growing rapidly in recent years (Figure 1-4-1).

The most characteristic compounds isolated from turmeric are curcuminoids (also called diarylheptanoids). They are a group of phenolic compounds with structures similar to curcumin. Three major curcuminoids (curcumin 1, demethoxycurcumin 2, and bisdemethoxycurcumin 3) are present in rhizome of turmeric at high levels (Srinivasan,

1952, 1953; Kosuge et al., 1985; He et al., 1998; Pothitirat and Gritsanapan, 2006; Jagetia and Aggarwal, 2007). Curcumin 1 is the most abundant among the three, with a level of

47

0.6-3.1% of the dry weight (Tayyem et al., 2006). Besides the three major curcuminoids, some diarylheptanoids have been isolated and identified from turmeric as minor constituents using conventional phytochemical approaches (Masuda et al., 1993;

Nakayama et al., 1993; Park and Kim, 2002). The fragmentation pattern of the three major curcuminoids in ion trap MS/MS has been determined (Jiang et al., 2006a). And structures of 6 new diarylheptanoids were proposed from the rhizome of turmeric based on MS/MS spectral data (Jiang et al., 2006b). Diarylheptanoids can be categorized into structurally similar analogs with common chemical modifications, such as reduction, oxidation, hydration, methylation, and methoxylation. Figure 1-4-2 shows the structures of diarylheptanoids which have been identified or proposed from turmeric.

The volatile oil of turmeric has also been intensively investigated. Most of the compounds identified from turmeric volatile oil are sesquiterpenoids. The volatile oil of the turmeric rhizome is especially rich in sesquiterpene ketones and sesquiterpene alcohols (Jayaprakasha et al., 2005). Considerable variation is found in the compositional data of different reports (Jayaprakasha et al., 2005). The most abundant compounds were reported as turmerone, ar-turmerone, β-turmerone, curdione, curlone, zingiberene, germacrone, and β-sesquiphellandrene (Chen, 1983; Kiso et al., 1983; Bansal et al., 2002;

Chane-Ming et al., 2002; Raina et al., 2002; Singh et al., 2002; Manzan et al., 2003;

Raina et al., 2005). Besides sesquiterpenoids, several monoterpenoids and fatty acids were also identified (Jayaprakasha et al., 2005). Monoterpenoids, such as terpinolene, α- phellandrene, p-cymene, 1,8-cineole, cis-sabinol, and β-pinene were reported as major

48 components of the volatile oil from turmeric leaves (Behura et al., 2002; Garg et al., 2002;

Pande and Chanotiya, 2006). Structures of some abundant components of turmeric volatile oil are shown in Figure 1-4-3.

Most pharmaceutical studies focus on curcumin 1 (Jayaprakasha et al., 2005). The results of research on pharmacokinetic properties, safety, biological activity in preclinical models, and clinical trials have been summarized in multiple reviews (Sharma et al., 2005;

Maheshwari et al., 2006; Ramassamy, 2006; Singh and Khar, 2006; Thangapazham et al.,

2006). Anti-inflammatory effects and anti-cancer effects of curcumin in human have been supported by results of clinical trials (Sharma et al., 2005). The anti-inflammatory effects of curcumin has been attributed to inhibition of the induction of cyclooxygenase-2

(COX-2), inducible nitric oxide synthase (iNOS), and production of cytokines (Sharma et al., 2005). Curcumin has been shown to inhibit initiation, progression, and promotion phases of carcinogenesis (Thangapazham et al., 2006). The anti-cancer effects of curcumin have been related to its anti-inflammatory activity and its effects on adhesion molecules, angiogenesis regulators, apoptotic genes, and carcinogen metabolizing enzymes (Maheshwari et al., 2006). In addition, curcumin showed several other pharmaceutical activities, such as anti-oxidant activity, antimicrobial activity, wound healing activity (Jayaprakasha et al., 2005; Maheshwari et al., 2006), antiplatelet activity

(Shah et al., 1999), anti-HIV (Vajragupta et al., 2005), and anti-Alzheimer's disease effects (Yang et al., 2005).

49

160 140 120 100 80 60 40 20

Number on ISI publicaton of 0 1975 1977 1979 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005 Year

Figure 1-4-1 Publications on ISI containing “turmeric” or “Curcuma Longa” from 1975 to 2006

O O R1 R3

HO OH R2 R4

Compound Name R1 R2 R3 R4 1 curcumin OMe H OMe H 2 demethoxycurcumin H H OMe H 3 bisdemethoxycurcumin H H H H 4 4'-hydroxybisdemethoxycurcumin* H H OH H 5 4'-hydroxydemethoxycurcumin OMe H OH H

50

O O R1 R3

HO OH R2 R4

Compound Name R1 R2 R3 R4 6 Dihydrocurcumin OMe H OMe H 7 Dihydrodemethoxycurcumin H H OMe H 8 Dihydrobisdemethoxycurcumin H H H H

O R1 R3

HO OH R2 R4

Compound Name R1 R2 R3 R4 1,7-bis(4-hydroxy-3- 9 methoxyphenyl)-4,6-heptadien-3- OMe H OMe H one 1-(4-hydroxyphenyl)-7-(4- 10 hydroxy-3-methoxyphenyl)-4,6- H H OMe H heptadien-3-one 1-(4-hydroxy-3-methoxyphenyl)-7- 11 (4-hydroxy-3,5-dimethoxyphenyl)- OMe H OMe OMe 4,6-heptadien-3-one*

OH O O R1 R3

HO OH R2 R4

Compound Name R1 R2 R3 R4 1-hydroxy-1-(4-hydroxyphenyl)-7-(4- 12 hydroxy-3-methoxyphenyl)-6-hepten-3,5- dione* H H OMe H 1-(3,4-dihydroxyphenyl)-7-(4-hydroxy-3- 13 methoxyphenyl)-6-hepten-3,5-dione* OH H OMe H

51

O R1 R3

HO OH R2 R4

Compound Name R1 R2 R3 R4 1-(4-hydroxyphenyl)-7-(4-hydroxy-3- 14 methoxyphenyl)-1,4,6-heptatrien-3-one * H H OMe H 1,7-bis(4-hydroxy-3-methoxyphenyl)-1,4,6- 15 heptatrien-3-one OMe H OMe H

OH OH

HO OH

1,7-bis(4-hydroxyphenyl)-3,5-heptanediol, compound 16

O OH

HO OH

5-hydroxy-1,7-bis(4-hydroxyphenyl)-3-heptanone, compound 17

52

OH O MeO O OMe O HO

calebin-A, compound 18

O MeO OMe

HO OH

1,5-bis(4-hydroxy-3-methoxyphenyl)-1,4-pentadien-3-one, compound 19

O OH HO OH

HO OH

5-hydroxy-1,7-bis(3,4-dihydroxyphenyl)-1-hepten-3-one, compound 20*

O O

HO OH

tetrahydrobisdemethoxycurcumin, compound 21

Figure 1-4-2 Diarylheptanoids identified or proposed from turmeric (*: the structure of the compound was proposed based on MS/MS spectra)

53

H H

O O O O

turmerone ar-turmerone ß-turmerone zingiberene

H H O O

O O

curlone ß-sesquiphellandrene germacrone curdione

p-cymene ß-pinene terpinolene α-phellandrene

HO H

O

1,8-cineole cis-sabinol

Figure 1-4-3 Structures of abundant compounds in turmeric volatile oil

54

Generally, demethoxycurcumin 2 and bisdemethoxycurcumin 3 were believed to have biological activities similar to curcumin 1, but probably with different potency (Shishodia et al., 2005). Recent studies showed that demethoxy derivatives of curcumin have lower antioxidant activity (Somparn et al., 2007), bacterial sortase A inhibition activity (Park et al., 2005), and neuroprotective activity against heavy metal-induced neurotoxicity

(Dairam et al., 2007). Pharmacological studies of curcuminoids present in rhizome in low levels are uncommon. A rare example reported protective effects of calebin-A 18 and dihydrobisdemethoxycurcumin 8 on PC12 cells from β-amyloid insult (Park and Kim,

2002).

Considerable pharmacological studies have been performed on turmeric volatile oil.

Similar to curcumin, the oil was also reported to have antioxidant activity (Jayaprakasha et al., 2002), antimicrobial activity (Thongson et al., 2004), antiplatelet activity (Lee,

2006; Tognolini et al., 2006), and anti-inflammatory activity (Chandra and Gupta, 1972).

In addition, volatile oil or its components showed some biological activities different from curcumin, such as antivenom activity (Ferreira et al., 1992) and larvicidal activity

(Pitasawat et al., 2007).

1.4.2 Biosynthesis of diarylheptanoids in turmeric

Labeling studies on plants other than turmeric indicated that diarylheptanoids, such as curcumin 1, are formed from a one-carbon unit and two phenylpropanoids, and the one- carbon unit is probably derived from malonate (Holscher and Schneider, 1995; Kamo et

55

al., 2000; Brand et al., 2006). Based on this, a putative biosynthetic pathway for

curcuminoids in turmeric has been proposed (Figure 1-4-4) (Ramirez-Ahumada et al.,

2006). Activities of some of the important enzymes in the proposed pathway, such as

PAL, CST, curcuminoid synthase, and hydroxycinnamoyl-CoA thioesterase, have been

identified from tissues of turmeric (Ramirez-Ahumada et al., 2006). However, it is not

clear whether 3-methoxyl groups on the aromatic rings are formed before the formation

of the backbone (the middle pathway) or after it (the bottom pathway) (Ramirez-

Ahumada et al., 2006).

O OH O OH O OH

NH2 PAL C4H

OH L-Phe Cinnamic p-Coumaric Acid Acid

4CL

O OH O OH OH OH OMe CST+C3H CCOMT HO O OH OH O OH O OH Shikimic HCT/CST O Acid CoAS CoAS CoAS Caffeoyl 5-O-Shikimate Caffeoyl-CoA p-Coumaroyl-CoA Feruloyl-CoA

2X 2X

Malonyl-CoA Polyketide Malonyl-CoA Polyketide Malonyl-CoA Polyketide Synthase Synthase Synthase

O OH O OH O OH Hydroxylase Hydroxylase OMe MeO OMe + OMT + OMT HO OH HO OH HO OH disdemethoxycurcumin 3 Demethoxycurcumin 2 Curcumin 1

Figure1-4-4 Proposed biosynthetic pathway of curcuminoids in turmeric (Ramirez-

Ahumada et al., 2006)

56

1.4.3 Effects of agricultural factors on yield and quality of turmeric rhizome

Turmeric does not produce seeds, and consequentially, the propagation depends on

cutting or division of rhizomes (Staples and Kristiansen, 1999). These methods are not

capable of rapidly reproducing stock lines. Recently, rapid clonal propagation methods

have been developed to overcome this problem (Sunitibala et al., 2001; Munshi et al.,

2004; Prathanturarug et al., 2005; Ma and Gang, 2006). Random amplified polymorphic

DNA (RAPD) analysis showed that variations occurred at the DNA level in plants

regenerated from leaf base callus cultures (Salvi et al., 2001). However, chemical

differences were not detected between conventional grown plants and plants derived from

in vitro propagation (Ma and Gang, 2006).

As a widely distributed culinary herb that has been cultured for a long time, turmeric has various cultivars. Genetic and morphological variability among different cultivars has been reported (Nandi, 1991; Chandra et al., 1997). The amount of essential oil and curcumin in the rhizome was found to vary greatly among 38 types of turmeric (Mathai,

1976). Similar results were reported in 6 cultivars and two wild relatives of turmeric from

Maharashtra, India (Rakhunde et al., 1998), and in 27 accessions grown at Lucknow,

India (Garg et al., 1999).

Heavy fertilization can increase the yield and quality of turmeric rhizomes (Yamgar et al.,

2001; Senapati et al., 2005; Akamine et al., 2007; Hossain and Ishimine, 2007). Effects of

57

chemical fertilizer on the yield and quality of turmeric rhizomes have been tested

(Senapati et al., 2005; Akamine et al., 2007). Application of nitrogen (N) could

significantly increase vegetative growth and rhizome yield, but did not increase the

content of curcumin. In contrast, application of potassium (K) and phosphorus (P) did not improve vegetative growth, but significantly increased curcumin 1 level in rhizome

(Akamine et al., 2007). This result showed that the vegetative growth might not closely

associate with accumulation of specialized metabolites, which was also supported by

observations on growth and development of turmeric (Mathai, 1978). While dry matter,

starch content, and rhizome yield significantly increased from 4 to 8 month, there was

almost no change on the content of curcumin. The level of volatile oil in the rhizome

even decreased after 6 month (Mathai, 1978).

The yield and quality of turmeric rhizomes are also affected by many other factors, such

as irrigation (Singte et al., 1997), planting distance (Hossain et al., 2005b), planting depth

(Ishimine et al., 2003), type of soil (Hossain and Ishimine, 2005), and seed rhizome size

(Hossain et al., 2005a). The result of those investigations provided valuable information

about the vegetative growth and accumulation of specialized metabolites in turmeric

under different agricultural conditions. However, probably due to limited accessibility of

advanced analytical instrumentation, determinatation of the quality of the turmeric

rhizome was often simplified into the quantification of a single compound (curcumin 1)

or the total amount of volatile oil. This simplified measurement is insufficient to

characterize the over-all health beneficial value, or to understand the biosynthesis of the

58 multiple compounds present. A metabolomic study will hopefully overcome this problem, and benefit both scientific research and agricultural practice of this important medicinal plant.

59

CHAPTER 2: MATERIALS AND METHODS

2.1 Materials

2.1.1 General materials

Acetonitrile and methanol (B&J ACS/HPLC certificated solvent) were purchased from

Burdick & Jackson (Muskegon, MI). Methyl t-butyl ether (MTBE, High Purity Solvent) was purchased from EMD Chemicals Inc (Gibbstown, NJ). Authentic standards of curcumin 1, demethoxycurcumin 2, and bisdemethoxycurcumin 3 were purchased from

ChromaDex, Inc. (Santa Ana, CA). Glass vials used in metabolomic experiments were

purchased from KIMBLE glass Inc. (Vineland, NJ). Glass beads (soda Lime) were

purchased from BioSpec products Inc. (Bartlesville, OK).

2.1.2 Growth of basil and leaf sample collection

Seeds for four lines of sweet basil, EMX-1, MC, SD, and SW, were obtained from the

Newe Ya'ar Research Center, Israel. Plants were grown in parallel under controlled

conditions. All basil plants were grown in a growth chamber at 34 ºC under a photoperiod

of 16 h light/8 h dark with an illumination intensity of 260 µmol m-2 s-1 and watered daily

with 20-20-20 nutrient solution (Tindara, LLC, Georgetown, MA). Young leaves (2 cm

in length or smaller) were harvested from 6 week old plants for gland isolation and

metabolite analysis.

60

2.1.3 Growth of turmeric and rhizome sample collection

We grew and harvested the turmeric plants using a 2×2×4 full factorial design with 6

replicates. The three factors of the experimental design were plant cultivar (HRT and

TMO), growth stage (5 m and 7 m), and fertilizer treatment (1-low, 2-middle, 3-high, and

4-control). Plants were grown in a green house under conditions described previously

(Jiang et al., 2006c; Ma and Gang, 2006). Briefly, seed rhizomes of turmeric line HRT

were derived from in Vitro micropropagation (Ma and Gang, 2006). Seed rhizomes of turmeric line TMO were obtained from commercial sources (Dean Pinner at Pinner Creek

Organics, Hilo, HI). The rhizomes were planted in 20 L pots with Scott’s Metromix soil.

The plants were watered 3 times a week by drip irrigation. Four types of fertilizer treatments were applied to the plants analyzed as Table 2-1-1. Fresh rhizome samples were collected at the 5th and 7th month after planting, and were immediately frozen in

liquid nitrogen after harvest. The frozen samples were stored in -80 ºC until analysis.

Table 2-1-1 Fertilizer treatments for turmeric

Time Fertilization Fertilization Fertilization Fertilization Fertilizer (month) 1 (g) 2 (g) 3 (g) 4 (control)

Ca(H2PO4)2 6 12 18 - 0 K2SO4 4 7 10 -

NaNO3 2.5 5 7.5 -

1.5 Ca(H2PO4)2 2 4 6 -

K2SO4 2 4 8 -

NaNO3 2 4 8 -

2.5 Ca(H2PO4)2 1.5 3.5 5.5 -

K2SO4 1 2 3 -

61

2.2 Equipment and data analysis software

2.2.1 Equipment

GC-MS: Thermo Electron Trace GC Ultra coupled to a DSQ mass spectrometer

(ThermoElectron, San Jose, CA)

LC-MS/MS-PDA: Surveyor HPLC coupled to a LCQ Advantage ion trap and an in-line

PDA detector (ThermoElectron, San Jose, CA)

nano-LC-MS/MS: A modified microbore HPLC system (Surveyor; ThermoFinnigan, San

Jose, CA, USA) using an autosampler. This system was modified by a simple T-piece flow-splitter to operate at capillary flow rates. LCQ-Deca XP Plus ion trap mass spectrometer (ThermoFinnigan) nanospray source was used as the detector.

Bead Beater (model 1107900, Biospec Products, Inc., Bartlesville, OK)

Branson Sonifier 450 (Branson Ultrasonics Corporation, Danbury, CT)

GS-6R centrifuge (Beckman Coulter; Fullerton, CA)

N-EVAP 112 Nitrogen Evaporators (Organomation Associates; Berlin, MA)

Optima TL ultracentrifuge (Beckman, Palo Alto, CA).

62

SpeedVac vacuum centrifuge (Savant, Farmingdale, NY)

2.2.2 Computer languages and software tools for data processing and analysis

ActivePerl (version 5.8.8): ActiveState Software Inc. Vancouver, BC; freely available at http://www.activestate.com/products/activeperl/

AMDIS (version 2.65): Automated Mass Spectral Deconvolution and Identification

System; freely available at http://chemdata.nist.gov/mass-spc/amdis/

Bioconductor (version 2.0): freely available at http://www.bioconductor.org/

BioPerl (version 1.5.2): freely available at http://www.bioperl.org/wiki/Main_Page

map2slim.pl: a component of BioPerl, freely available at http://search.cpan.org/~cmungall/go-perl/

R (version 2.4.1): freely available at http://cran.r-project.org/

SPSS (version: 15.0 for windows): SPSS Inc. Chicago, IL

SEQUEST: A component of BioworkBrowser 3.1; ThermoElectron, San Jose, CA

63

TPP (v2.9.5 GALE): Trans-Proteomic Pipeline; Institute for Systems Biology, Seattle,

WA; freely available at http://tools.proteomecenter.org/wiki/index.php?title=TPP:Windows_Cygwin_Installation

Xcalibur (version 1.4 SR1): ThermoElectron, San Jose, CA

X!tandem (version: win32-06-09-15-3): freely available at http://www.thegpm.org/TANDEM/index.html

NIST Mass Spectral library Version 2.0 (NIST/EPA/NIH, USA)

MET-IDEA (version 1.2.0): Noble Foundation, Inc., freely available at

http://bioinfo.noble.org/download/

xcms: A component of Bioconductor (version 2.0): freely available at http://www.bioconductor.org/

2.3 Methods

2.3.1 GC-MS analysis of basil leaves

Young basil leaves were extracted for metabolite analysis by shaking 0.5 g of pooled young leaves (~2 cm, collected from plants 6-8 weeks old) in 2.0 ml of EtOAc overnight

64

at room temperature. Filtered extracts were used for GC-MS analysis, which was

performed on Thermo Electron Trace DSQ Ultra equipped with an Alltech ECONO-

CAPTM-ECTM-5 (30 m × 0.25 mm ID × 0.25µm) capillary column and a 5 m guard column. Ultrapure helium was used as the carrier gas at a flow rate of 1.2 ml/min. The

injection volume was 2 µl, and the split ratio was 10. The temperatures of the injector,

transfer line, and ion source were set as 220°C, 250°C, and 200°C, respectively. After a 2

min hold at 40°C, the column oven temperature was programmed to increase to 100°C at

8°C/min, then to 300°C at 3°C/min and held for 3.5 min. Electron voltage was set as 70

eV. Eluted compounds were identified by comparison of their fragmentation patterns

with the NIST Mass Spectral library Version 2.0 (NIST/EPA/NIH, USA).

2.3.2 Isolation of peltate trichome glands from basil young leaves

Peltate trichome glands were isolated from young basil leaves using a method adopted from (Gang et al., 2001). Briefly, 16 g of basil young leaves (2 cm in length or smaller) were collected from plants 6-8 weeks old, and then soaked in 250 ml of the soaking buffer (14 mM β-mercaptoethanol, and 5 mM Tris-HCl, pH 7.5) for 15 min. The soaking buffer was decanted and the swelled leaves were transferred into a 300-ml beveled flask of the Bead Beater (model 1107900, Biospec Products, Inc., Bartlesville, OK) with 50 ml of glass beads (0.5 mm in diameter, soda lime, BioSpec products Inc. Bartlesville, OK).

280 ml of isolation buffer (5 mM Succinic acid, 0.5 mM K2PO4, 10 mM KCl, 5 mM

MgCl2, 1 mM EGTA, 14 mM β-mercaptoethanol, 50 mM Tris-HCl, 1% [w/v] polyvinylpyrrolidone, 360,000, 0.6% [w/v] Methylcellulose, pH 7.5) was added into the

65

flask. After the bead beater apparatus was assembled and plugged into a rheostat, the

mixture was beaten for 80 sec at 50 V for 3 times. The resulting mixture was passed

through a 350-mm mesh cloth, and then a 105-mm mesh cloth (Small Parts, Inc., Miami

Lakes, FL). The peltate glands collected on the 105 mesh cloth were washed with and

suspended in the gland wash/storage buffer (5 mM Succinic acid, 0.5 mM K2PO4, 10 mM KCl, 5 mM MgCl2, 1 mM EGTA, 14 mM β-mercaptoethanol, and 50 mM Tris-HCl,

pH 7.5). Samples were kept ice cold during the whole isolation process.

2.3.3 Protein purification and fractionation from the basil glandular trichomes

Basil peltate glandular trichomes (70 µl settled glands) were diluted 10-fold in ice cold

sonication buffer (5 mM Tris-HCl, 1mM PMSF, pH 8), and disrupted by sonication

(Branson Sonifier 450, Branson Ultrasonics Corporation, Danbury, CT) for 15s twice.

The lysate was ultracentrifuged at 10,000 g at 4°C for 15 min. The supernatant was

centrifuged at 100,000 g at 4 °C for 1 h using an Optima TL ultracentrifuge (Beckman,

Palo Alto, CA). The pellet (microsomal proteins) was washed with 250 µl of the

sonication buffer, and again centrifuged at 100,000 g at 4 °C for 1 h. The supernatant of

the first ultracentifugation step (cytosolic proteins) were precipitated by 20% [v/v] TCA,

incubated on ice for 15 min, and centrifuged at 16,000 g at 4°C for 15 min. The pellet

was washed twice with 0.1% β-mercaptoethanol in acetone at -20 °C for 20 min. The

protein concentrations of the cytosolic and microsomal protein fractions were determined

using the Bio-Rad Protein Assay kit (Bio-Rad, Hercules, CA, USA).

66

2.3.4 Proteomic analysis using GeLC-MS/MS

The protein samples were applied to GeLC-MS/MS with a method adopted from (Breci

et al., 2005b). Briefly, the protein pellets were resuspended in 40 µl of protein sample

buffer (8.0% SDS, 30% glycerol, 1.0% β-mercaptoethanol, 0.02% bromophenol blue,

250 mM Tris-HCl, pH 6.8), and then loaded on a Tris-HCl Ready Gel (4–20% linear

gradient, Bio-Rad, Hercules, CA, USA). After electrophoresis, the sample gels were

silver stained using a method adopted from Shevchanko et al. (Shevchenko et al., 1996).

The sample lanes were divided into 32 equally-sized pieces (~2 mm each). Proteins in the

gel pieces were digested with trypsin (Wilm et al., 1996), and extracted with 5% formic

acid/50% CH3CN using a Multiprobe-II liquid handling system (PerkinElmer, Shelton,

CT, USA). Peptide extracts were concentrated to 10 µl using a SpeedVac vacuum

centrifuge (Savant, Farmingdale, NY, USA).

The peptides were introduced into a modified microbore HPLC system (Surveyor;

ThermoFinnigan, San Jose, CA, USA) using an autosampler. This system was modified

by a simple T-piece flow-splitter to operate at capillary flow rates. Sample (5 µl) was

loaded on a nanocapillary RP-LC column (6 cm × 100 µm i.d.) packed with 5 µm Xorbax

C18 resin. Buffers were 0.1% formic acid (A) and 0.1% formic acid in ACN (B), and the flow rate was 400 nl·min-1. After a 10 min initial wash with buffer A, the separation of

peptides was achieved with a linear gradient from 5 – 50% buffer B over 30 min,

followed by 50 – 98% buffer B over 5 min, a 5 min wash at 98% B, 98% – 5% buffer B

for 5 min, and 5% buffer B for 5 min. An LCQ-Deca XP Plus ion trap mass spectrometer

67

(ThermoElectron) with a nanospray source (electrospray voltage 1.8 kV) was used as the

detector. The mass scan range was 400 – 1500 m/z, and data dependent scanning was used to acquire the MS/MS spectra of the top 3 most abundant ions in a precursor ion scan (Andon et al., 2002).

2.3.5 Proteomic analysis using MudPIT

The protein samples were applied to MudPIT with methods adopted from (Breci et al.,

2005a). Briefly, protein pellets containing about 100 µg of proteins were resuspended in

100 µl of a urea buffer (8 M urea, 100 mM ammonium bicarbonate). Prior to trypsin digestion, the protein samples were mixed with reagents and incubated as follows: 2µl of

100 mM dithiothreitol, 15 min; 3 µl of 100 mM iodoacetamide, 15 min in dark; 12 µl of

0.3 µg·µl -1 endoproteinase Lys-C, 37ºC for 3h; then 350 µl of fresh 100 mM ammonium

bicarbonate, 50 µl of acetonitrile, and 4.5 µl of 100 mM calcium chloride, and 6 µl of 0.1

µg/µl Promega trypsin, 37 ºC overnight. Tryptic peptides were purified using Spec PT

C18 solid phase extraction pipette tip (Varian, Lake Forest, CA, USA), concentrated to

near-dryness under vacuum, and dissolved in 50 µl of 0.5% formic acid. Digested protein sample was introduced into the same nano LC-MS/MS system described in the previous section with the following modifications: the column was packed with 6 cm of 100 angstrom, 5 µm Xorbax C18 resin and then 3 cm of 100 angstrom, 5 mm polyhydroxyethyl-A strong cation exchange resin (PolyLC; TheNestGroup,

Southborough, MA, USA). Buffers were 0.1% formic acid (A) and 0.1% formic acid in

ACN (B), and 250 mM ammonium bicarbonate (C), 1.5 M ammoniumbicarbonate (D).

68

Chromatographic separation was performed with the following method—step 1: 0 –

100% buffer B over 60 min; step 2: variable ratio of buffer C for 4 min; step 3: 5 – 50%

buffer B over 60 min; step 4: 50 – 98% buffer B over 5 min; and step 5: 98% buffer B for

5 min. Steps 2 to 5 were repeated with 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100 percents

of buffer C. The column was finally washed with 100% buffer B for 60 min.

2.3.6 Database searching and data processing for protein identification and TSC calculation

MS/MS spectra acquired were analyzed using SEQUEST (Eng et al., 1994; Yates et al.,

1995a) and searched against a custom protein database. This database contained peptide

sequences from 3 sources: (1) translated peptide sequences based on EST sequences from

glandular trichomes of basil, (2) plant protein sequences from the UniProt

Knowledgebase (Release 6.9), and (3) peptide sequences representing common protein

contamination in proteomics experiments. Static modification of Cys with a m/z shift of

57 and differential modification of Met with a m/z shift of 16 were considered during the

searching.

After searching against the database, TPP (version: v2.9.5 GALE, Institute for Systems

Biology, Seattle, WA) (Keller et al., 2002; Nesvizhskii et al., 2003; Nesvizhskii and

Aebersold, 2005) was used to filter the proteomics data sets. Based on the TPP results, a minimal list of proteins that are sufficient to explain all peptides observed during the database searching was generated using MS Access. A representative protein ID was

69

selected for each of the protein groups when the identification of a single protein was not

conclusive. Proteins identified as common contaminants and proteins with a protein

probability of less than 0.95 were discarded.

Sequences of known plant enzymes were obtained from Expasy

(http://www.expasy.org/srs5bin/cgi-bin/wgetz) based on their EC numbers. These

sequences were used as query sequences to blast against a protein sequence database

containing all identified proteins with a protein probability of more than 0.95. The resultant protein list was validated manually, and the spectral count number of each protein was parsed from TPP result files using a Perl script (Find_peptides.pl), available through the Gang lab web page at: http://ag.arizona.edu/research/ganglab/links.htm). TSC was calculated as the sum of spectral count numbers of proteins identified as having the same identity or function. The TSC value for each gene in a specific line was normalized as a proportion of the total TSC value for all proteins from the same line. The same approach was used to obtain normalized TENs from the EST database.

2.3.7 Data processing for protein functional categorization

The GO database with functional assignments (gene_association.goa_uniprot.gz, date:

01/19/2007) (Ashburner et al., 2000; Harris et al., 2006) was downloaded from EBI

(ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/). GO terms of proteins in the minimal list were extracted from the database using a Perl script (find_GO.pl), available through the Gang lab web page at: http://ag.arizona.edu/research/ganglab/links.htm). The

70

plant GO-slim (http://www.geneontology.org/GO_slims/goslim_plant.obo, date:

02/03/2007) and map2slim (http://search.cpan.org/~cmungall/go-perl/) programs were used to provide a broad overview of protein function using high-level GO terms and to determine the number of individual proteins contributing to each GO functional category.

TPP EST database PTM enzymes

Full list of PTM targets LC-MS/MS raw protein identified files

Sequest Protein database Wildcat Toolbox

Combined dta files Mass shifts and X!tandem Neutral losses

PTMs Manual validation

Figure 2-3-1 Data processing method for identification of protein PTMs using X!tandem

2.3.8 Searching for PTMs using X!tandem (Figure 2-3-1)

A Perl script Append.pl (http://proteomics.arizona.edu/toolbox.html) was used to

concatenate all DTA files generated during SEQUEST searching. The combined files

were analyzed using X!tandem (Fenyo and Beavis, 2003) with mass shifts and neutral

losses associated with one of the following three PTMs: (1) phosphorylation (mass shift of +80 on STY and neutral loss of -98 on ST) (DeGnore and Qin, 1998) (2) Arginine methylation (mass shift of +14 on R) (Brame et al., 2004), and (3) lysine ubiquitination

71

(mass shift of +144 on K) (Peng et al., 2003). A downsized database was used for the

X!tandem searching. It contained peptide sequences from three sources: (1) proteins

identified by SEQUEST searching and TPP processing, including all proteins with a

group probability more than 0.10, (2) plant proteins with the specific PTM in the UniProt

database, and (3) the translated EST sequences analogous to (2). The output xml files

were submitted to the GPM server (http://h319.thegpm.org/tandem/thegpm_upview.html),

and the spectral matching results of all peptides with the target PTM were manually

reviewed.

2.3.9 Sample preparation for metabolomic analysis of turmeric rhizomes

The frozen rhizome samples were ground in liquid nitrogen into fine powder with a cold mortar and a pestle. 4g of the rhizome powder were exacted three times by shaken with

16 ml MeOH overnight. The MeOH extract in the 20 ml vial was centrifuged at 2060g

(3000 rpm) for 30 min. The supernatant were combined and dried under nitrogen flow.

The dry extract was resuspended in 20 ml of LC-MS grade MeOH. 100 µl of the

suspension was diluted with 1900 µl LC-MS grade MeOH, filtered through 0.2 um PTFE

membrane, and stored at -20 ºC until analyzed using LC-PDA. The rest of the suspension

was dried under nitrogen flow, and resuspended in 2 ml of MeOH. The suspension was

centrifuged at 2060g (3000 rpm) for 30 min. Then the supernatant was filtered through

0.2 um PTFE membrane, and stored at -20 ºC until analyzed using LC-MS and LC-

MS/MS. 2 g of the rhizome powder were extracted by shaken with 4 ml MTBE overnight.

72

The MTBE extract was filtered through 0.2 µm PTFE membrane, and stored at -20 ºC

until analyzed using GC-MS.

2.3.10 GC-MS analysis of turmeric rhizomes

450 µl of the filtered MTBE extracts of turmeric rhizomes were mixed with 50 µl of

internal standard solution (p-chlorotoluene in MTBE, 0.1 mg/ml) and then submitted for

GC-MS analysis. The GC-MS analysis was performed using a method similar to the

method used in section “2.3.1 GC-MS analysis of basil leaves” with following

modifications: After a 2 min hold at 40°C, the column oven temperature was

programmed to increase to 100°C at 8°C/min, to 280°C at 3°C/min, then to 300°C at

10°C/min and held for 3.5 min.

A target spectral library with retention time information was built up based on the

compound identification using AMDIS (version 2.65) with NIST Mass Spectral library

Version 2.0 (NIST/EPA/NIH, USA) and an essential oil GC-MS mass spectra library from Dr. Robert P. Adams, as well as by referral to the literature (Jolad et al., 2004; Jiang et al., 2006c; Ma and Gang, 2006). A compound was considered identified only when the match sore of its spectrum was larger than 800. Compounds failed to meet this criteria were considered unidentified and artificial names were assigned to them according to the nomenclature rules proposed by Bino’s report (Bino et al., 2004). Then the target library was used to identify eluted compounds from representative GC-MS result files. The parameters of AMDIS were: (1) Deconv.: component width, 32; resolution, low; shape

73

requirement, low; (2) Identif.: use retention time; (3) Instr: scan direction, low to high; (4)

Other: default.

Quantitative analysis of the GC-MS result was performed using MET-IDEA (version

1.2.0). An ion-retention time list was generated using AMDIS and then manually

processed to exclude redundant peaks (R2>0.8 and ∆Rt < 0.2 min) and unreliable peaks

(Rt < 5 min; Rt > 42 min; or peak purity < 50%) after the first round of MET-IDEA

analysis. Then the refined ion-retention time list was used for the second round of MET-

IDEA analysis to collect peak area information. The parameters of MET-IDEA are: (1) chromatography: GC; average peak width, 0.1; minimum peak width, 0.3; maximum

peak width, 6; peak start/stop slope, 1.5; adjusted retention time accuracy, 0.95; peak

overload factor, 0.3; (2) mass spec: quadrupole; mass accuracy, 0.1; mass range, 0.5; (3)

AMDIS: exclude ion list, 73, 147 281 341 415; lower mass limit, 50; ions per component,

1. The peaks of internal standard p-chlorotoluene were used for the retention time

calibration.

2.3.11 LC-MS and LC-MS/MS analysis of turmeric rhizomes

75 µl of the concentrated MeOH extracts of turmeric rhizomes were mixed with 75 µl of

internal standard solution (6-Benzylaminopurine in MeOH, 0.25 mg/ml) and then

introduced into LC-MS/MS-PDA system for LC-MS analysis. The LC-MS analysis were

performed using a ThermoFinnigan Surveyor MS HPLC coupled to a ThermoFinnigan

LCQ Advantage ion trap and an in-line PDA detector (San Jose, CA, USA) fitted with a

74

Discovery® HS C18, 3 µm, 15 cm × 2.1 mm HPLC column and a Discovery® HS C18,

3 µm, 2 cm × 2.1 guard column (Supelco, Bellefonte, PA, USA). All LC-MS data were recorded under following chromatographic conditions: (1) mobile phase: buffer (A),

(5mM ammonium formate, 0.1% formic acid, in ddH2O); buffer (B), acetonitrile; (2) gradient: 0-2 min, 5% B; 2-57 min, 5-100% B; 57-60 min, 100% B; 60-65 min, 100-5%

B; 65-75 min, 5% B; (3) flow rate: 0.25 ml·min-1;(4) column temperature, 40 °C; (5) injection volume, 5 µl. The MS parameter for the ThermoFinnigan LCQ Advantage ion trap were: (1) negative mode, full scan; (2) sheath gas flow 36; aux/sweep gas flow, 45;

(3) source voltage, 4.5 kV; source current, 80 µA; (4) capillary voltage, -32 V; (5) capillary temperature, 270 °C; (6) tube lens offset, -25 V (negative); Q-value, 25; (8) mass range measured: 100-1000 m/z. (9) Data type: centroid.

Representative samples were selected for analysis using LC-MS/MS for compound identification. Different to LC-MS analysis LC-MS/MS analysis used MeOH extract without addition of internal standard. LC-MS/MS analysis was performed using the same equipment under conditions similar to LC-MS analysis. Both positive and negative modes were performed under collision gas pressure ca. 10-5 torr. Mass ranges for positive mode were: 100-307; 282-450; 312-337; 342-365; 370-450; 440-630; 620-820; 810-1000.

Mass ranges for negative mode were: 100-304; 280-450; 310-335; 340-365; 370-450;

440-630; 620-820; 810-1000. Data dependent scanning was used to acquire the MS/MS spectra of the top 1-3 and 3-5 most abundant ions in a parent scan at each of the multiple

75 mass scan ranges in both positive and negative mode. Therefore, four files were generated in each of the mass range.

Each of a retention time-ion pair in LC-MS results was treated as an individual compound. Diarylheptanoids in the rhizome sample were identified based on their

MS/MS spectra and fragmentation rules reported previously (Jiang et al., 2006a; Jiang et al., 2006b). Quantitative analysis of LC-MS was performed using an R package, xcms

(version 1.6.1) with following parameters: snthresh = 6, fwhm = 18, bw = 10, minfrac =

0.4, and span = 0.5. The result of xcms was manually processed to eliminate isotopic peaks (0.5<∆M<1.5, ∆Rt<18s) and unreliable peaks (Rt<600s or Rt> 3300s).

2.3.12 LC-PDA analysis of turmeric rhizomes

75 µl of the diluted MeOH extracts of turmeric rhizomes were mixed with 75 µl of internal standard solution (Syringone in methanol, 0.2 mg/ml) and then introduced into the LC-MS/MS-PDA system for LC-PDA analysis. The equipment and the chromato- graphic conditions were as same as those used in LC-MS analysis. Spectra were collected from 200nm to 600 nm with a filter bandwidth of 1nm. The peak areas of the internal standard and the three major curcuminoids (curcumin 1, demethoxycurcumin 2, and bisdemethoxycurcumin 3) were collected using Qual Browser (version 1.4 SR1) at 299 nm (Syringone), 427 nm (curcumin 1), 422 nm (demethoxycurcumin 2), and 418 nm

(bisdemethoxycurcumin 3). And then the concentrations of the three curcuminoids were obtained based on a calibration curve of the three curcuminoids prepared over

76 concentration ranges of 2.5-60.0 µg/ml for curcumin, and 1.25-30.0 µg/ml for demethoxycurcumin and bisdemethoxycurcumin.

2.3.13 Data analysis of the targeted analysis

3-way ANOVA (analysis of variance) of data from targeted analysis (three major curcuminoids) was performed using SPSS for windows 15.0. The significance level was set at 0.05. Multiple comparisons were carried out using Scheff's Post hoc test on factor fertilizer treatment after a statistically significant difference had been detected among the means of the 4 levels. Eta squared was calculated to show the effect size of the three factors and within group variance.

2.3.14 Data analysis of the non-targeted analysis

PCA analysis of the quantitative data acquired from LC-MS and GC-MS were performed using SPSS for windows 15.0. The quantification results of the three major curcuminoids

(from targeted analysis) were combined into the LC-MS data. All the data were autoscaled before the PCA analysis. The parameters of the PCA were set as following: (1) extraction: number of factors, 3; maximum iterations for convergence, 25; (2) rotation: varimax; maximum iterations for convergence, 25. The scatter plots of the top 3 principal components were created using Microsoft Excel 2003.

HCA analysis and creation of the heatmap were performed using two R packages,

Heatplus, and gplots. All the data were autoscaled. Two-way HCA analysis of LC-MS

77

and GC-MS data was carried out using Euclidean distance and Ward's method. Then the

data were sorted according to cluster membership. Using the sorted data, a compound-

sample heatmap was generated to represent all samples in a single graph. Correlation

heatmaps were generated based on Pearson’s correlation coefficient, which represents the similarity of the abundance patterns of compounds in the rhizome samples. Both

correlation heatmaps and compound-sample heatmaps were created using bluered color

scheme in gplots package (Appendix A2).

78

CHAPTER 3: RESULTS

3.1 Protein expression and modification in basil glandular trichomes

This part of the research had two aims:

(1) To identify proteins and protein post-translational modifications in glandular

trichomes of sweet basil

(2) To investigate the regulation of specialized metabolism in glandular trichome

from four sweet basil lines (SW, MC, EMX-1 and SD) by comparison of their

transcriptome, proteome, and metabolome

Basil young leaves Glandular trichomes

Microsomal proteins Cytosolic proteins

GeLC-MS MudPIT

LC-MS/MS results SEQUEST search X!tandem search TPP validation Manual validation

Protein identification and Protein modifications semi-quantification

EST and GC-MS data

Figure 3-1-1 Experimental design of the proteomic study of basil glandular trichomes

79

3.1.1 Profiles of volatile metabolites in basil lines

Major volatile compounds in the leaves of basil lines SW, MC, EMX-1 and SD were

extracted using ethyl acetate (see chapter 2: materials and methods) and then analyzed

using GC-MS (Figure 3-1-2). The phenylpropenes eugenol and methylchavicol were the

major constituents in SW and EMX-1, respectively. In contrast, line MC contains almost

no phenylpropenes, but primarily accumulates methylcinnamate in its glandular

trichomes. Line SD contains mainly terpenoids, such as the monoterpenoids neral and

geranial and several sesquiterpenoids, such as β-caryophyllene, germacrene D and α-

bisabolene, among others, as its major volatile constituents. The major terpenoid in both

EMX-1 and MC was the monoterpenoid 1,8-cineole. SW produced large amounts of linalool in addition to 1,8-cineole and an array of sesquiterpenoids, whereas MC produced only trace levels of sesquiterpenoids. Line EMX-1 produces only low levels of terpenoids, compared to its major phenylpropanoid pathway-derived volatile constituent, methylchavicol. These results were consistent with previous findings (Gang et al., 2001;

Iijima et al., 2004a).

3.1.2 Protein identification using shotgun proteomic approaches

To improve the coverage of the proteomic analysis, the proteome from the glandular trichomes of each basil line was divided into a microsomal fraction and a cytosolic fraction. Both fractions were analyzed using GeLC-MS/MS, and the cytosolic fraction was also analyzed using MudPIT. A custom peptide sequence database, containing

V

80

SW

MC

EMX-1

SD

Figure 3-1-2 Representative GC-MS chromatograms showing different metabolic profiles of basil cultivars (EtOAc extracts of basil leaves were analyzed using GC-MS. Compounds were identified by comparison of their fragmentation patterns with the NIST Mass Spectral library Version 2.0.)

Table 3-1-1 Summary of results of TPP processing

Minimum protein Estimated false probability positive rate (%) Entry number SW 0.95 0.7 191 MC 0.95 0.7 340 EMX-1 0.95 0.6 331 SD 0.95 0.6 420

81

both translated EST sequences from peltate glands of sweet basil and plant protein

sequences from UniProt, was used in SEQUEST-based searching for protein

identification (Eng et al., 1994). The SEQUEST results were processed using the Trans-

Proteomic Pipeline (TPP) (Keller et al., 2005). Only protein identifications with probabilities greater than 0.95 were used for further analysis (Table 3-1-1), which reduced the false positive rate to less than 0.75%. Due to the nature of this analysis and the stringency of the identification criteria, only the most abundant proteins or those proteins which yielded peptides that were very amenable to ionization in an electrospray ion source were identified. Over 950 additional potential proteins were identified, but with lower confidence, and these were not included in further analysis. Rubisco was not

found in any of our proteomic results, although it has been readily identified in other proteomic investigations (Porubleva et al., 2001; Koller et al., 2002; Hajheidari et al.,

2005). Moreover, as described below, the most abundant proteins identified in this

investigation corresponded well to the most abundant ESTs in the different basil lines.

These results indicate that the trichome preparations used for protein isolation and

proteomic analysis were not contaminated by other cell types from basil leaves and that

the protein samples analyzed do indeed represent the proteins present in the glandular

trichomes themselves.

The proteomes of the four basil lines were found to be highly diverse, as indicated by

great differences in the most abundant proteins and in contrast to what might be expected

considering that the same cell type from the four basil lines was used in this investigation.

82

When potential analogs from different basil lines were considered to be different proteins,

a total of 755 proteins were identified, with only 68 (9 %) proteins being identified from

all four basil lines (Figure 3-1-3a). In contrast, 439 proteins (58 %) were unique to a single basil line. 491 (65%) of the proteins identified had direct EST support; they matched the sequences of ESTs in our database (Figure 3-1-4a). The rest of the identified proteins were identified based on the UniProt database, i.e., no comparable EST was identified in our basil EST database. When analogous proteins from the four different basil lines were considered to be the same protein, a total of 492 proteins were identified, with only 71 (14.4 %) being common to all four lines. And, 245 (49 %) proteins were found in only one line. For this non-redundant protein set, 118 proteins (24 %) did not have a corresponding EST in the basil transcriptome database. Given the low false positive rate of the protein identification, these observations might suggest that the EST database used in this study, although it represents a relatively large number of genes for a single cell type (7963 unigenes total), is still a relatively incomplete picture of all of the genes being actively expressed in the glandular trichomes, with several proteins that are either abundant or have easily ionized peptides not having EST support. In many of these cases, however, a BLAST search back against the basil database using the amino acid sequence of the UniProt protein hit revealed one or more basil EST contigs with strong homology to the UniProt protein hit for the proteomics peptide. Some of these UniProt protein identifications may represent false negative search results against the basil EST database and failure of the data processing methods and search algorithms. This suggests the data processing software may require further refinement. On the other hand, some of

83

these proteins may be very stable proteins that do not require high steady-state transcript

levels to maintain a relatively high protein abundance within the cell. A possible example may be an Arabidopsis protein of unknown function, At1g31870 (UniProt entry

Q9C6S8_ARATH), which generated a total of 19 spectra in two different basil lines, but for which no basil EST homologs could be identified. This is an interesting question that will be the subject of future research.

We looked more closely at the proteins that were common to the glandular trichome proteomes of at least three of the four basil lines, because we surmised that this strong conservation of protein expression could be indicative of an essential role in glandular trichome cell biology or in common metabolic pathways. A total of 119 non-redundant proteins were identified that met these criteria (Appendix B1). The majority of these (72 proteins) are indeed enzymes essential for the metabolic processes involved in specialized metabolism, including many members of the MEP/terpenoid and shikimate/phenylpropanoid pathways. There is also a substantial group of conserved, housekeeping-type proteins, including histones, ribosomal proteins and chaperonins, as might be expected. A very interesting group of highly expressed genes includes 8 hypothetical proteins and proteins with unknown functions. Also interesting are an acyltransferase and 3 cytochrome P450s, whose specific metabolic functions are unknown at this time but which are very strong candidates for roles in production of rosmarinic acid in sweet basil glandular trichomes. A final group of highly expressed genes includes enzymes that may be critical for protection of the gland cells against

84 oxidative stress that most certainly affects this cell type that sits surrounded by air off the surface of the leaf.

3.1.3 Protein functional categorization based on gene ontology (GO) controlled vocabulary

Functional annotations and match IDs of the proteins identified were derived from the

UniProt database, either indirectly via a protein match in our basil EST database, which is itself annotated via blast hits against UniProt proteins, or by direct match of a peptide sequence to a UniProt entry in cases where there is no match to a basil contig. The

UniProt match ID was used to extract Gene Ontology (GO) annotation for each protein hit. To provide a broad overview of the functions of proteins, the GO functional categorization of all proteins was summarized based on the plant GO Slim categories of biological process, molecular function, and cellular component (Pruess et al., 2003)

(Figure 3-1-5).

Functional categorization based on “biological process” and “molecular function” GO

Slim terms included 76.3% and 85.0% of all protein entries, respectively. Consistent with the role of the glandular trichome as a “metabolic factory” for the plant, a large number of protein entries were assigned into categories related to the biosynthesis of phenylpropanoid and terpenoid pathway-derived compounds. In the biological process categories, “metabolic process” was the largest group, accounting for 72.6% of all protein entries. And “biosynthetic process” was the largest subset of this category, accounting for

85

MC SD A 114 163 EMX-1 1659 17 SW

125 37 41 11 41 68 16 16 20 11

MC SD B 60 112 EMX-1 1052 14 SW

58 15 31 9 24 71 8 11 11 6

Figure 3-1-3 Venn diagrams showing overlap among proteomes of four basil cultivars.

The number of protein entries found in all four basil lines is highlighted in bold.

A. When analogous proteins between basil lines were considered to be different proteins, a total of 755 protein IDs were detected.

B. When analogous proteins in different basil lines were considered to be the same protein, a total of 492 protein IDs were detected.

86

A Basil EST Uniprot

387 104 264

B Basil EST Uniprot

222 152 118

Figure 3-1-4 Venn diagrams showing overlap of protein entries connected with basil glandular trichome EST entries and plant protein IDs from Uniprot.

(The number of protein entries not connected with EST entries and identified based on homology to proteins in UniProt is highlighted.)

A. Analogous proteins from different basil lines were considered to be different proteins.

B. Analogous proteins from different basil lines were considered to be the same protein.

87

Precent of protein entries Precent of protein entries 0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 B 0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 A biological process cellular component cellular process cell amino acid and derivative metabolic process intracellular Biological Cellular cell differentiation cytoplasm component cellular component organization and biogenesis process ribosome generation of precursor metabolites and energy membrane electron transport plastid *nucleic acid metabolic proces mitochondrion DNA metabolic process cytosol translation nucleus metabolic process endoplasmic reticulum biosynthetic process Precent of protein entries carbohydrate metabolic process 0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 catabolic process C lipid metabolic process molecular function secondary metabolic process binding response to biotic stimulus nucleic acid binding Molecular response to stress DNA binding function protein metabolic process RNA binding transport #nucleic acid binding nucleotide binding protein binding catalytic activity activity transferase activity kinase activity structural molecule activity translation regulator activity transporter activity

Figure 3-1-5 Distribution of protein entries within “Biological process”, “Cellular component”, and “Molecular function” GO categories after GO Slim processing.

(Only those categories accounting for more than 1% of total protein entries are shown.

Note: * the full name of this category is “nucleobase, nucleoside, nucleotide and nucleic acid metabolic process”. # the full name of this category is “translation factor activity, nucleic acid binding”.)

88

29.1% of all protein entries. Although the category “secondary metabolic process” only accounted for 5.4% of all protein entries, processes directly related to this category made up a much larger fraction. For example, “generation of precursor metabolites and energy”,

“carbohydrate metabolic process”, and “amino acid and derivative metabolic process” are related to the supply of energy and substrates for metabolic conversions for secondary

metabolism (e.g. production of phenylalanine for the phenylpropanoid pathway). Each of

these 3 processes accounted for more than 10% of all protein entries. Similarly, among

the molecular function categories, “catalytic activity” was the largest group, accounting

or 60.5% of all protein entries and containing enzymes necessary for most steps of the

biosynthesis of the compounds known to be produced by basil glandular trichomes.

Categorization based on “cellular component” was not as complete as that based on

biological process or molecular function categorizations. Only 35.1% of the protein

entries were assigned a subcellular location. Most of these were associated with the

cytoplasm. The largest subset was ribosome, accounting for 10.0% of protein entries.

This was consistent with the biological process and molecular function categorizations,

respectively, in which translation and structural molecule activity represented about 11%

of all protein entries. These categorization data suggest active production in the glandular

trichomes of a large number of proteins directly involved in biosynthetic processes.

Membrane proteins accounted for 18.8% of protein entries with cellular component GO

terms. This proportion is not unreasonable and suggests that the approach used provided

good coverage of membrane proteins, because 20.5% of known and predicted proteins

89

are localized to the cellular membranes according to AmiGO

(http://amigo.geneontology.org/cgi-bin/amigo/go.cgi, as of March 3, 2007) (Ashburner et

al., 2000; Harris et al., 2006).

3.1.4 Protein and mRNA levels of enzymes related to the biosynthesis of

phenylpropanoids and terpenoids

To facilitate comparison between lines of relative protein levels for specific genes in

different biosynthetic pathways, we generated a normalized TSC, which was the sum of

total numbers of spectra identified from all proteins with the same enzymatic activity,

normalized as a fraction of 10,000 spectra. The TEN was normalized using the same

approach to enable comparison of relative transcript levels. Comparisons of normalized

TSC and TEN data for 54 enzymes involved in the production of phenylpropanoids and

terpenoids are summarized in Tables 3-1-2 through 3-1-7.

Five important enzymes from core metabolism essential for the production of precursors

for the shikimate/phenylpropanoid and terpenoid pathways, including sucrose synthase,

6-phosphogluconate dehydrogenase, transaldolase, transketolase and glyceraldehyde-3-

phosphate dehydrogenase, were investigated in this study (Table 3-1-2). The TSC values

of all of these enzymes were high in each line, in keeping with their central roles in all

metabolic processes in the glandular trichomes. Variation of the TSCs between the 4 lines

(average RSD 44.1%) was much smaller than that of enzymes in more specialized

pathways such as the phenylpropanoid and terpenoid pathways (average RSD 110.0%).

90

The eight enzymes of the shikimate pathway, leading to production of L-Phe, precursor of

all phenylpropanoid pathway-derived compounds, were evaluated for their relative levels between the four basil lines (Table 3-1-3). The most interesting difference observed in this comparison was the very low level of expression of these genes in line SD, which produces low levels of volatile phenylpropanoids, whereas they were highly expressed in line EMX-1, which produces high levels of methylchavicol. These two basil lines are very similar in morphology and growth habit as well as patterns of probe hybridization to genomic DNA blots (data not shown), suggesting that these two lines are close relatives, although they differ greatly in their relative production of phenylpropanoids versus

terpenoids. This discrepancy can be partially explained by the observed differences in the

expression of genes in the shikimate pathway relative to the terpenoid pathway, as

suggested by data in Tables 3-1-3 to 3-1-5. This is addressed further in the Discussion.

Thirteen enzymes in the general and specialized phenylpropanoid pathway leading to

compounds such as eugenol and methylchavicol were confirmed, by both mRNA levels

and protein levels (high TEN and TSC values, see Table 3-1-4), as being highly

expressed in at least one of the different basil lines. Most enzymes in the

phenylpropanoid pathway were down-regulated at both the gene expression and protein

level in line SD compared to the other lines, especially compared to line EMX-1, which

is thought to be genetically most similar to line SD. The only exception to this was

chavicol O-methyltransferase (CVOMT), which was expressed at about twice the level

91

(for both RNA and proteins) of that observed for line EMX-1, which accumulates high levels of methylchavicol. The only volatile phenylpropanoid compound produced at

detectable levels by line SD is methylchavicol, although it is produced at much lower

levels than are observed for line EMX-1. Eugenol O-methyltransferase (EOMT)

expression appeared to be exclusive to line EMX-1. p-Coumarate/cinnamate

carboxylmethyltransferase (CCMT) was expressed at very high level in line MC and very low level in lines SD and EMX-1, with no transcripts in the EST database for line SW. In contrast, no peptide data for CCMT was observed for lines EMX-1 and SD, whereas high protein levels were observed in line MC, with significant, although somewhat lower levels being observed in line SW.

We also identified 13 enzymes that play roles in production of the precursors of terpenoid

(isoprenoid) biosynthesis—isopentenyl diphosphate (IPP) and dimethylallyl diphosphate

(DMAPP), see Table 3-1-5. However, only 8 of these enzymes were detected by the proteomics approach. All 7 enzymes in the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway were found to be highly expressed in line SD, while the mevalonate pathway appeared to be practically inactive in all basil lines, as indicated by very low TEN values and the absence of peptides for any of the enzymes in the pathway. Peptides for 1-deoxy-

D-xylulose-5-phosphate synthase (DXS) and DOXP reductoisomerase (DXR) were not detected in EMX-1, although low numbers of ESTs for these proteins were found in the same line. These results may also help explain why line EMX-1 produces high levels of phenylpropanoids and low overall levels of terpenoids, compared to the other basil lines.

92

Enzymes downstream from IPP and DMAPP production, including three prenyl transferases—geranyl diphosphate synthase (GPPS), farnesyl diphosphate synthase

(FPPS), and geranylgeranyl pyrophosphate synthase (GGPPS)—were found in SW, SD, and EMX-1 (Table 3-1-6). MC contained only peptides for GPPS, but none for FPPS or

GGPPS. These results may explain why line MC produces appreciable levels of monoterpenoids, but only very low levels of sesquiterpenoids. In addition, ten different terpene synthases (TPSs) were found in the EST database (Table 3-1-6). Peptides for eight of these genes were detected. It is interesting to note that the protein and mRNA levels of TPSs were not particularly high in SD, which was not consistent with previous results regarding their enzymatic activities (Iijima et al., 2004a), and suggesting that post- translational regulation of the enzymatic activities of TPSs may occur in sweet basil.

3.1.5 Identification of protein post-translational modifications (PTMs)

Phosphorylation, ubiquitination, and arginine monomethylation were identified for proteins extracted from basil glandular trichomes using an approach similar to that reported by Peng et al. (Peng et al., 2003). X!tandem was used to search for post translational modifications (PTMs) based on the characteristic mass shifts and neutral losses in MS/MS spectra for specific PTMs. MS/MS spectra of peptides flagged as potentially having these modifications were manually reviewed to exclude false positives.

In the initial screen, 28 peptides with PTMs were identified (Table 3-1-7). Among these,

14 proteins were phosphorylated, five were ubiquitinated, and nine were arginine

93 methylated. Some peptides contained more than one modification site. An additional 25 peptides, including 13 phosphorylated peptides, 8 ubiquitinated peptides, and 4 arginine methylated peptides, were also identified, but at a decreased confidence level (Appendix

B3).

94

Table 3-1-2 Normalized protein and mRNA levels for selected highly expressed enzymes from primary/core metabolism

-4 -4 Descriptive name Abbre- Normalized TSC (×10 ) Normalized TEN (×10 ) EC number viation SW MC EMX-1 SD SW MC EMX-1 SD sucrose synthase 2.4.1.13 SUS 243.4 363.0 191.1 78.8 14.6 50.4 31.9 7.4 6-phosphogluconate 1.1.1.44 6PGD 190.8 213.2 231.1 126.6 25.0 61.6 47.9 24.0 dehydrogenase transketolase 2.2.1.1 TKT 72.4 195.1 284.4 112.5 10.4 16.8 17.4 22.2

transaldolase 2.2.1.2 TAL 243.4 136.1 124.4 45.0 33.4 54.6 36.3 11.1 Glyceraldehyde-3- 1.2.1.12 G3P 65.8 108.9 62.2 81.6 39.6 71.4 33.4 49.9 phosphate dehydrogenase a Total Spectral Count (TSC), the indictor of protein level, was calculated as the sum of spectral count numbers of protein entries with the same enzymatic activity. TSC was normalized to the total TSC numbers of all protein entries from the same cultivar. b Total EST number (TEN), the indictor of mRNA level, was calculated as the sum of EST numbers of EST entries (contigs or singletons) connected with the same enzymatic activity. TEN was normalized to the total TEN numbers of all EST entries from the same cultivar.

94

95

Table 3-1-3 Normalized protein and mRNA levels for enzymes in the shikimate pathway

a -4 b -4 Descriptive name Abbre- Normalized TSC (×10 ) Normalized TEN (×10 ) EC number viation SW MC EMX-1 SD SW MC EMX-1 SD DAHP synthetase 2.5.1.54 DAHPS 296.1 254.1 44.4 22.5 43.8 123.2 43.5 7.4 3-dehydroquinate 4.2.3.4 DHQS 6.6 36.3 31.1 8.4 0.0 9.8 4.4 1.8 synthase 3-dehydroquinate /shikimate 1.1.1.25 DHQSD 6.6 4.5 0.0 0.0 10.4 4.2 2.9 3.7 dehydrogenase shikimate kinase 2.7.1.71 AROK 0.0 0.0 4.4 0.0 EPSP synthase 2.5.1.19 AROA 59.2 36.3 22.2 5.6 6.3 29.4 18.9 3.7 Chorismate synthase 4.2.3.5 AROC 0.0 18.1 13.3 2.8 0.0 15.4 7.3 0.0

Chorismate mutase 5.4.99.5 CHMU 0.0 0.0 26.7 0.0 2.1 5.6 11.6 3.7 Chorismate mutase/prephenate 4.2.1.51 TYRA 13.2 0.0 13.3 0.0 43.8 23.8 11.6 14.8 dehydratase a,b Total Spectral Count (TSC) and Total EST number (TEN) were determined as described for Table 3-1-2.

95

96

Table 3-1-4 Normalized protein and mRNA levels for enzymes in the phenylpropanoid pathway

a -4 b -4 Enzyme name Abbre- Normalized TSC (×10 ) Normalized TEN (×10 ) EC number viation SW MC EMX-1 SD SW MC EMX-1 SD Phenylalanine ammonia 4.3.1.5 PALY 177.6 235.9 120.0 39.4 131.3 154.1 116.0 22.2 lyase Cinnamate 4-hydroxylase 1.14.13.11 C4H 85.5 0.0 22.2 30.9 31.3 81.2 52.2 46.2 p-Coumarate/cinnamate N/A CCMT 85.5 131.6 0.0 0.0 0.0 703.1 2.9 7.4 carboxyl methyltransferase 4-Coumarate-CoA ligase 6.2.1.12 4CL 85.5 86.2 57.8 0.0 62.5 172.3 66.7 16.6 p-Coumaroyl- CoA:shikimate p- N/A CST 0.0 4.5 0.0 0.0 8.3 7.0 0.0 1.8 coumaroyl transferase p-Coumaroylshikimate-3'- N/A C3'H 131.6 81.7 4.4 5.6 10.4 33.6 20.3 7.4 hydroxylase Caffeoyl-CoA O- 2.1.1.104 CCOMT 59.2 72.6 182.2 33.8 81.3 78.4 171.1 35.1 methyltransferase Cinnamoyl-CoA reductase 1.2.1.44 CCR 0.0 0.0 22.2 0.0 41.7 42.0 74.0 27.7 Cinnamyl alcohol 1.1.1.195 CAD 157.9 108.9 186.7 135.0 20.8 29.4 24.7 12.9 dehydrogenase Coniferyl alcohol acetyl N/A CAAT 13.2 0.0 4.4 0.0 2.1 4.2 4.4 1.8 transferase Eugenol/chavicol synthase N/A EUS 111.8 36.3 62.2 30.9 22.9 61.6 68.2 42.5 Eugenol O- 2.1.1.146 EOMT 0.0 0.0 253.3 0.0 0.0 0.0 53.7 0.0 methyltransferase Chavicol O- 2.1.1.146 CVOMT 0.0 0.0 53.3 126.6 0.0 2.8 145.0 262.2

methyltransferase 96 a,b Total Spectral Count (TSC) and Total EST number (TEN) were determined as described for Table 3-1-2.

97

Table 3-1-5 Normalized protein and mRNA levels for enzymes in the MEP/DOXP and mevalonate (MVA) pathways

a -4 b -4 Descriptive name Abbre- Normalized TSC (×10 ) Normalized TEN (×10 ) EC number viation SW MC EMX-1 SD SW MC EMX-1 SD DXP/MEP Pathway 1-deoxy-D-xylulose-5- 2.2.1.7 DXS 6.6 13.6 0.0 25.3 77.1 81.2 46.4 123.7 phosphate (DOXP) synthase DOXP reductoisomerase 1.1.1.267 DXR 46.1 90.7 0.0 87.2 14.6 0.0 4.4 20.3 4-diphosphocytidyl-2-C- 2.7.7.60 CMS 19.7 63.5 115.6 36.6 4.2 0.0 2.9 0.0 methyl-D-erythritol synthase CDP-ME kinase 2.7.1.148 CMK 26.3 49.9 13.3 42.2 14.6 14.0 1.5 20.3 2-C-methyl-D-erythritol 2,4- 4.6.1.12 MCS 26.3 27.2 57.8 33.8 18.8 16.8 11.6 20.3 cyclodiphosphate synthase 4-hydroxy-3-methylbut-2-en- 1.17.4.3 GCPE 197.4 240.5 253.3 303.8 54.2 23.8 40.6 73.9 1-yl diphosphate synthase 1-hydroxy-2-methyl-butenyl 1.17.1.2 HDR 52.6 27.2 26.7 104.1 123.0 47.6 243.7 179.1 4-diphosphate reductase Isopentenyl-diphosphate 5.3.3.2 IDI 0.0 9.1 0.0 14.1 12.5 16.8 2.9 40.6 Delta-isomerase Mevalonate Pathway acetoacetyl CoA thiolase 2.3.1.9 AACT 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 HMG-CoA synthase 2.3.3.10 HMGS 0.0 0.0 0.0 0.0 2.1 8.4 2.9 5.5 HMG-CoA reductase 1.1.1.34 HMGR 0.0 0.0 0.0 0.0 0.0 0.0 5.8 0.0 MVA kinase 2.7.1.36 MVK 0.0 0.0 0.0 0.0 0.0 0.0 2.9 0.0 97 MVPP decarboxylase 4.1.1.33 PMD 0.0 0.0 0.0 0.0 4.2 0.0 0.0 0.0 a,b Total Spectral Count (TSC) and Total EST number (TEN) were determined as described for Table 3-1-2.

98

Table 3-1-6 Normalized protein and mRNA levels for prenyltransferases and terpene synthases

a -4 b -4 Descriptive name Abbre- Normalized TSC (×10 ) Normalized TEN (×10 ) EC number viation SW MC EMX-1 SD SW MC EMX-1 SD Geranyl diphosphate synthase 2.5.1.1 GPPS 13.2 49.9 44.4 53.4 20.8 16.8 10.2 40.6 Farnesyl diphosphate synthase 2.5.1.10 FPPS 6.6 0.0 13.3 16.9 0.0 5.6 7.3 7.4 Geranylgeranyl pyrophosphate 2.5.1.29 GGPPS 78.9 0.0 31.1 61.9 2.1 5.6 7.3 55.4 synthase Geraniol synthase N/A GES 0.0 0.0 0.0 0.0 0.0 0.0 0.0 16.6 1,8-cineole synthase N/A TPSSCS 828.9 149.7 26.7 0.0 60.5 23.8 8.7 0.0 R-linalool synthase 4.2.3.26 LIS 0.0 0.0 0.0 0.0 16.7 8.4 0.0 12.9 Terpinolene synthase N/A TES 85.5 0.0 0.0 0.0 12.5 0.0 0.0 0.0 Fenchol synthase 4.2.3.10 FES 0.0 0.0 8.9 0.0 2.1 0.0 0.0 5.5 Beta-myrcene synthase 4.2.3.15 MYS 0.0 0.0 31.1 0.0 22.9 0.0 14.5 11.1 Gamma-cadinene synthase N/A CDS 157.9 9.1 0.0 0.0 91.7 70.0 0.0 7.4 Selinene synthase N/A SES 72.4 31.8 66.7 84.4 12.5 15.4 7.3 51.7 Alpha-zingiberene synthase N/A ZIS 0.0 0.0 62.2 8.4 20.8 9.8 29.0 16.6

Germacrene D synthase 4.2.3.22 GDS 39.5 0.0 0.0 45.0 33.4 78.4 18.9 77.5 a,b Total Spectral Count (TSC) and Total EST number (TEN) were determined as described for Table 3-1-2.

98

99

Table 3-1-7 Post-translational modifications found in the proteins from basil glandular trichomes

Basil Peptide Sequence a Protein Description line 2,3-bisphosphoglycerate- MIIDAIEQVGGIYV MC PMGI_RICCO independent phosphoglycerate VtADHGNAEDmVK mutase DSFsFPHTSKPTNL Putative glucose-6-phosphate MC Q8H103_ARATH PLtLSSARSVAR isomerase. Hypothetical protein MC VEsLEAFK Q9LD59_ARATH AT4g08710. ATGAFILTAsHNPG Phosphoglucomutase, SD PGMC_POPTN GPNEDFGIK cytoplasmic RuBisCO subunit binding- GVAVIKVGAAtETE SW Basil12_0059_03_2 protein alpha subunit, LEDR chloroplast precursor ACSTCAGKLTSGsV EMX-1 Basil12_2301_01_2 Ferredoxin I DQSDGSFLDEK RYHSTKsGDELTSL MC Basil12_0651_01_3 Heat shock protein 90 alpha KDYVTR LADITAKGVTtIIGG Putative cytosolic MC Q655T1_ORYSA GDSVAAVEK phosphoglycerate kinase 1 SD mVRVsVLNDALK Basil12_0054_03_2 40S ribosomal protein S15A Geranyl diphosphate synthase SD ATAVAVsG Basil12_1770_01_3 small subunit SD mKEKLSyIALDyDQ ACT1_ORYSA Actin-1 Methionine synthase SW LsLLDKIPPVYK Q8L6T4_9GENT (Fragment). VHLFEVDEmADtFL S-adenosyl-L-methionine SD Basil12_0001_078_3 FTSESVNEGHPDK synthetase VmESMGKGtDSYG Phenylalanine ammonia-lyase SW Q940D8_BRARP VTTGFGATSHR (Fragment). EMX-1 kKPNNIKLYVR Q6UJX4_LYCES Molecular chaperone Hsp90-1. Putative 60S ribosomal protein MC kVIEVEGPR Basil12_1598_01_2 L9 UTP--glucose-1-phosphate MC kSDVTSLSQISENEK Basil12_1764_01_2 uridylyltransferase VAAFLNPkmEDNIGP SD Basil12_0762_01_3 SRG1-like protein AASFISGETPAK AIVEAmPTMkCTVLD SD Basil12_0005_01_2 Chavicol O-methyltransferase LPHVVAGLESTDK a Phosphorylation on serine (s), threonine (t), tyrosine (y); ubiquitination on lysine (k) and monomethylation on arginine (r) were detected using X!tandem with corresponding mass shifts and neutral losses associated

100

Table 3-1-7 Post-translational modifications found in the proteins from basil glandular trichomes - continued

Basil Peptide Sequence a Protein Description line EMX-1 SHCNHLrmEGMVPELK Q7XX57_ORYSA OSJNBb0049I21.6 protein. Putative plant disease resistant EMX-1 SLHIrDCK Q6L3H7_SOLDE protein. EMX-1 TIQASPrR Q9LMM9_ARATH F22L4.6 protein MC rPENWALVEK Q5DKU6_TOBAC Adenosine kinase isoform 2T. SD STELLVr H3_GRIJA Histone H3. F1-ATPase alpha subunit SD AVDSLVPVGr Q5VL82_9ASPA (Fragment). Short-chain alcohol SD VALVTGGSr Q7XDQ8_ORYSA dehydrogenase like protein. ATP-dependent Clp protease SD LrETLTR CLPP_TOBAC proteolytic subunit S-adenosylmethionine SW YLDErTIFHLNPSGR METK_CATRO synthetase 1 a Phosphorylation on serine (s), threonine (t), tyrosine (y); ubiquitination on lysine (k) and monomethylation on arginine (r) were detected using X!tandem with corresponding mass shifts and neutral losses associated

101

3.2 Metabolomic investigation of turmeric rhizome

This part of the research had three aims.

(1) To establish high throughput qualitative and quantitative approaches for analysis

of specialized metabolites in turmeric rizhomes.

(2) To investigate the effects of genetic, developmental, and environmental factors

(cultivars, growth stages, and fertilization treatments) on turmeric specialized

metabolism.

(3) To develop hypotheses regarding biosynthetic pathways of specialized

metabolites by investigation of correlations between abundance patterns of

metabolites.

Plants (different cultivars, fertilizer treatments, and growth stages)

Plant sample collection and storage

Sample preparation Sample preparation

GC/MS LC-MS/MS LC-MS LC-PDA

non-targeted targeted

Identification of Quantitative compounds analysis

Data processing

Hypothesis Data mining generation

Figure 3-2-1 Experimental design for the metabolomic investigation of turmeric rhizomes

102

3.2.1 Targeted analysis of three major curcuminoids

Turmeric plants analyzed in this research were systematically perturbed using a 2×2×4

full factorial design experiment instead of a conventional one-factor-at-a-time method

(Table 3-2-1). Plants were grown under conditions that were all possible combinations of

each level across the three factors (genetic factor: cultivar, developmental factor: growth

stage, and environmental factor: fertilizer treatment). This experimental design allowed

for efficient evaluation of the effects of these factors.

The content of three major curcuminoids (curcumin 14, demethoxycurcumin 16, and

bisdemethoxycurcumin 18) in turmeric rhizomes was determined using LC-PDA. The

results of 3-way ANOVA analysis and post hoc tests showed that each of the three factors had a significant effect on the accumulation of these compounds. When the effects of the three factors were evaluated separately, the levels of the curcuminoids tended to be higher for cultivar TMO, for the 7 month growth stage, and for fertilization treatment 1

(low fertilizer concentration treatment) (Figure 3-2-2). These individual results were further validated by the data shown in Table 3-2-1, in which the experimental group with combination of the optimized levels of the three variables had the highest curcuminoid levels.

Among these three factors, growth stage had a large effect size (η2 ~ 0.4). In contrast,

effect sizes of cultivar and fertilization were much smaller (η2 < 0.075). And the effect

size of error was also large (η2 ~ 0.3) (Figure 3-2-3). The large within-group variation

103 was also shown by the large coefficient of variation (CV). Average of CV among experimental groups was 34.6%, much larger than the CV of the analytical measurement

(~6%). This suggested the large within-group variation was probably caused by some unknown factors, instead of merely analytical error.

Interestingly, the levels of these three compounds were found to have similar responses to each of the variables (Figure 3-2-2). During a further investigation, good linear correlation was found between the levels of three major curcuminoids (Figure 3-2-4) for concentration ranges spanning greater than 10-fold differences between the highest and lowest values. The R values (Pearson's correlation coefficients) of curcumin 1 vs. compound 2 and curcumin vs. compound 3 were larger than 0.94. This suggested that the accumulations of the three major curcuminoids were closely associated with each other.

104

Table 3-2-1 The levels of the three major curcuminoids in turmeric rhizomes under systematically perturbation (mg/g fresh weight; the highest mean value among all groups were high-lighted)

Demethoxy- Bisdemeth- Group Curcumin 1 curcumin 2 oxycurcumin 3 Growth Fertili- Standard Standard Standard Cultivar Mean Mean Mean stage zation Deviation Deviation Deviation 1 1.89 0.54 0.69 0.23 0.69 0.27 2 1.67 0.56 0.55 0.21 0.56 0.21 5 3 1.06 0.32 0.30 0.11 0.29 0.06 4 1.59 0.31 0.51 0.12 0.45 0.17 H 1 3.32 0.30 1.36 0.03 1.35 0.11 2 2.90 1.06 1.17 0.61 1.16 0.69 7 3 3.17 0.37 1.30 0.17 1.26 0.15 4 1.84 0.38 0.71 0.24 0.64 0.19 1 2.29 0.73 0.89 0.50 0.77 0.17 2 2.28 0.37 0.86 0.39 0.73 0.19 5 3 1.00 0.73 0.32 0.31 0.32 0.20 4 1.62 0.93 0.67 0.61 0.53 0.33 T 1 3.98 0.93 2.28 0.51 1.64 0.31 2 2.43 1.06 1.26 0.56 0.92 0.38 7 3 3.71 0.61 1.85 0.52 1.45 0.22 4 2.86 0.46 1.27 0.64 1.18 0.37

105

3.5 a 3 **

W) a, b 2.5 * b F * b

g/g 2 m ** 1 1.5 in ( in

m 1 urcu

C 0.5

0 1.6 ** a W)

F 1.4

g/g g/g 1.2 ** b m b 2 1 b in ( in ** m 0.8 ** 0.6

0.4

ethoxycurcu 0.2 m

De 0

1.4

W) ** F 1.2 a

g/g 1 m * b b 3 * 0.8 b in ( in m 0.6 ** 0.4

ethoxycurcu 0.2 m 0

Bisde HRT TMO 5 M 7M 1 2 3 4

Cultivar Growth Fertilizer treatment stage

Figure 3-2-2 Effects of cultivar, growth stage, and fertilization treatment on the accumulation of three major curcuminoids (This figure shows the results of 3-way ANOVA analysis and post hoc tests. Note: The values show the means and error bars show standard error. * (P<0.05) and ** (P<0.01) show the result of 3-way ANOVA analysis. Groups sharing the same letter do not differ at P <0.05 after post-hoc comparison tests)

106

Figure 3-2-3 Effect size (η2) of the factors and error on accumulation of three major curcuminoids

Demethoxycurcumin 2 Bisdemethoxycurcumin 3 Linear (Demethoxycurcumin 2 ) Linear (Bisdemethoxycurcumin 3)

3.5

3 R2 = 0.899 2.5

2

1.5 R2 = 0.9249 (mg/g) 1

0.5

0 0123456 -0.5 Curcumin 1 (mg/g)

Figure 3-2-4 Linear correlation between the levels of three major curcuminoids (The figure shows the linear correlation of curcumin vs. the other two major curcuminoids. R is Pearson coefficient)

107

3.2.2 Non-targeted analysis using LC-MS and GC-MS

A typical LC-MS result is shown in Figure 3-2-5. The peaks detected by LC-MS are

clustered into 4 major groups, based on elution time and m/z ratio. 12 known

diarylheptanoids were identified based on their LC-MS/MS data (see the section 3.3.2).

All of them, including the three major curcuminoids, are located in area 1 in Figure 3-2-5.

Most of the peaks in the LC-MS results represent unidentified metabolites. Because these

compounds were detected in negative mode under acidic conditions (pH of the mobile

phase ~ 3.3), they probably contain carboxyl or phenolic hydroxyl groups. However, a carboxyl group often affords a neutral loss of 44 (CO2) (Bandu et al., 2004; Zeng et al.,

2006), which was not frequently observed in our MS/MS results. Therefore, most of these

unknown peaks are likely to represent phenolic compounds.

Typical GC-MS results are shown in Figure 3-2-6. A majority of the peaks that eluted

before 17 min were monoterpenoids, and most of peaks that eluted with a retention time

of greater than 17 min were sesquiterpenoids. Besides terpenoids, some other compounds, such as eugenol, were also detected. The most abundant volatile compounds detected by

GC-MS were ar-turmerone, curlone, α-phellandrene, cineole, p-mentha-1, 4(8)-diene, α- zingiberene, and β-sesquiphellandrene. This result was consistent with previous reports

(Jayaprakasha et al., 2002; Jayaprakasha et al., 2005; Ma and Gang, 2006).

The score plots of PCA analysis of LC-MS data are shown in Figure 3-2-7. The plot of the two highest ranking principal components (PC), PC1 and PC2, demonstrates the

108

difference between samples collected at 5 months (5M, blue symbols) and 7 months (7M,

red symbols). The PC1 and PC2 scores of 5M were less than those of 7M. However, the

two groups were not distinctly resolved. The plot of PC1 and PC3 showed that the

samples from cultivar HRT (H, hollow symbols) tended to have lower PC3 scores than

samples from cultivar TMO (T, solid symbols). Again, however, the border was not clear

due to the overlap of the two groups. This result was generally consistent with the

targeted analysis, in which the growth stage had the largest effect size, and the effect of

the cultivar was smaller. An effect of different fertilizer treatments was not observed in the PCA result, suggesting that overall compound production profiles are likely to be controlled mostly by genetic instead of environmental factors.

Similarly, the results of the PCA analysis of GC-MS data are shown in Figure 3-2-8. Data points in the plot of PC1 and PC2 can be easily divided into two groups. Group 1, with

higher PC2 score but lower PC1 was mainly composed of HRT samples, and most of

TMO samples belonged to group 2. However, outliers were common, especially in group

2. The difference between 5M samples and 7M samples can be recognized by their PC3

scores: 5M samples tended to have higher PC3 scores. Still, the effect of different

fertilizer treatments was not recognized in the overall PCA results.

The two-way HCA result and the heatmap of LC-MS data are shown in Figure 3-2-9. The

HCA result (the dendrogram at the top of the figure) shows an approximate separation between 7M samples (green text) and 5M samples (red text). This result is consistent

109 with the PCA result (PC1-PC2 score plot) for the same dataset. It is interesting to note that all outliers are from cultivar HRT. Some of the HRT-5M samples, especially those subjected to fertilizer treatment 1 (F1H5M), are mixed with 7M samples. Two HRT-7M samples are mixed with 5M samples. The heatmap shows red areas representing high levels of groups of compounds in certain samples. For example, in area A of the LC-MS results, compounds group A (DRG.LM1.P2_45.38_525, DRG.LM1.P2_47.47_555,

DRG.LM1.P2_48.42_585, DRG.LM1.P2_50.78_509, DRG.LM1.P2_ 48.67_569,

DRG.LM1.P2_53.92_569, DRG.LM1.P2_51.95_567, and DRG.LM1.P2_ 46.67_571) were found to accumulate at high levels in the sample group A (all of these samples from

HRT-5M).

The two-way HCA result and the heatmap of GC-MS data are shown in Figure 3-2-10. In contrast to the LC-MS HCA result, for the GC-MS HCA result (the dendrogram at the top of the figure), 5M and 7M samples are essentially indistinguishable. Instead, an approximate separation was shown between HRT (red text) and TMO (green text). Most of the samples in cluster 1 are from cultivar and most of the TMO samples (green) are in cluster 2. This result is consistent with the GC-MS PCA result (PC1-PC2 score plotting).

Similar to the heatmap of LC-MS data, the heatmap of GC-MS data also shows features indicating elevated accumulation of a small group of metabolites for a subset of samples.

Area A showed high levels of a group of monoterpenoids in 12 HRT-5M samples (as well as a TMO-5M sample). And area B showed high levels of a group of sesquiterpenoids in some HRT samples.

110

3.2.3 Correlation between the abundance patterns of metabolites and co-regulated

metabolite modules

In a metabolomic investigation of tomato fruit volatiles, some structurally related

metabolites were found to cluster together in a HCA analysis using Pearson correlation

coefficient between abundance patterns of metabolites (Tikunov et al., 2005). This

“hierarchical modularity” of metabolites could be rationalized on the basis of the

organization of metabolic networks. Structurally related metabolites probably originate

from a common precursor and shared biosynthetic pathways.

In section 3.2.1, a linear correlation was identified for the levels of the three major

curcuminoids. Similar correlations probably also exist in accumulation patterns of other

compounds. To test this hypothesis, HCA analysis was performed using the Pearson

correlation coefficient between the abundance of compounds under the treatment regimes

applied in this investigation. The HCA results and “correlation heatmaps” (Figure 3-2-11

and Figure 3-2-12) indicate the existence of co-regulated “metabolite modules” in both

the LC-MS and GC-MS data set. Metabolites within the same module had abundance patterns across treatments that were highly correlated to each other. In addition, they had similar relationships to other compounds. For example, in the LC-MS correlation heatmap, almost all compounds in module “b” had a negative correlation with

compounds in “a” and “f”, but positive correlation with compounds in “c” (Figure 3-2-

11). The modularity of metabolites appeared to be hierarchical, i.e., large modules

111 contained smaller sub-modules in which compounds were more closely associated with each other.

3.2.4 Distribution of Pearson correlation coefficient between the abundance patterns of metabolites

A co-regulated metabolite module can be defined as a group of compounds whose abundance patterns are closely correlated to each other. To recognize the module from the correlation heatmap, the value of Pearson correlation coefficient within the module must be significantly higher than background. To objectively define the border of the module, the distribution of the correlation value has to be investigated.

Figure 3-2-13 and Figure 3-2-14 shows the correlation distribution (distribution of

Pearson correlation coefficient between the abundance patterns of metabolites) of LC-MS results and GC-MS results, respectively. Values of “1” correspond to “self correlation”

(located at the diagonal from top-left to bottom-right of the correlation heatmap) and do not provide any biological meaningful information, so these points were deleted prior to preparation of all the correlation distribution figures. The shapes of the two distributions are clearly different. The correlation distribution of the LC-MS results has a symmetrical distribution (peak at 0.255, skewness 0.068) (Figure 3-2-13 and Table 3-2-2). In contrast, the correlation distribution of the GC-MS results has an asymmetry distribution with a positive skew (peak at 0.035, skewness, 0.291) (Figure 3-2-14 and Table 3-2-2). The cut- off values for the top 5% correlation efficient are about 0.695 and 0.725 for the LC-MS

112

results and GC-MS results, respectively, which can be used as the border values to define

co-regulated metabolite modules.

Both of the correlation distributions contain more positive values than negative values

The LC-MS distribution shifts to the right, and the GC-MS distribution skews to the right

(Figure 3-2-13 and Figure 3-2-14). A possible reason of this phenomenon is the “0” values in the data set. If two compounds were absent in many samples, their abundance patterns will contain many “0” values, which will probably increase the correlation efficient. To test this hypothesis, the correlation distributions were plotted based on GC-

MS and LC-MS data sets scrambled using a Perl script “scramble_row.pl” (Appendix

A2). The data sets were scrambled in the “sample” dimension, so the correlations between abundance patterns were lost but the frequency of data values remained unchanged. If the hypothesis was correct, the scrambled correlation distributions would also have a big positive shift or a big positive skew. However, the scrambled correlation distributions (Figure 3-2-15) centered at 0 with a very small positive skew. Therefore, this hypothesis was rejected.

Another explanation for the skewness of the data could be the simultaneous accumulation of metabolites. Our samples were collected at two growth stages. A large increase in metabolite accumulation between the two collection times would result in lower abundances at the 5th month but higher levels at the 7th month, and thus produce a

positive correlation between their abundance patterns. To test this hypothesis, new

113 distributions were plotted based on the correlation data calculated separately from 5M samples and 7M samples. If the hypothesis were to be correct, all new distributions would shift toward 0, where the scrambled distribution centered. As the result showed, the expected shift occurred for LC-MS distribution, but not for GC-MS distribution

(Figure 3-2-15). Therefore, simultaneous accumulation of metabolites due to increasing plant age is at least partially responsible for the positive shift of the LC-MS correlation distribution. However, the positive skew of GC-MS correlation distribution could not be explained in this manner.

114

m/z

4

2

3

1

Retention time (s)

11

12 8

3

15 7 2

4 14 6 1 Figure 3-2-5 A representative LC-MS result of turmeric rhizome (MeOH extracts of turmeric rhizome were analyzed using LC-MS. Compounds were identified by comparison of their LC-MS/MS spectra with spectra reported in the literature: curcumin 1, demethoxycurcumin 2, bisdemethoxy-curcumin 3, 4'- hydroxybisdemethoxycurcumin 4, Dihydrocurcumin 6, Dihydrode-methoxycurcumin 7, Dihydrocurcumin 8, 1-(4-hydroxy-3-methoxyphenyl)-7-(4-hydroxy-3,5- dimethoxyphenyl)-4,6-heptadien-3-one 11, 1-(3,4-dihydroxyphenyl)-7-(4-hydroxy-3- methoxyphenyl)-6-hepten-3,5-dione 12, 1-(4-hydroxyphenyl)-7-(4-hydroxy-3- methoxyphenyl)-1,4,6-heptatrien-3-one 14, and 1,7-bis(4-hydroxy-3-methoxyphenyl)- 1,4,6-heptatrien-3-one,) 15. Curcumin monoacetate 22 is not showen here because it affords intense peak only in positive mode)

115

ar-turmerone 100

90 TMO 80 α-phellandrene curlone 70

60

50

40 α-zingiberene cineole Relative Abundance 30

20

10

0 α-zingiberene 100

90 β-sesquiphel landrene HRT 80 α-phellandrene ar-turmerone 70

60

50 p-mentha- 1,4(8)-diene curlone 40 Relative Abundance 30

20

10

0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 Time (min)

Figure 3-2-6 Representative GC-MS results showing metabolic profiles of turmeric cultivars (MTBE extracts of turmeric rhizomes were analyzed using GC-MS. Compounds were identified by comparison of their fragmentation patterns with the NIST Mass Spectral library Version 2.0 and and an essential oil GC-MS mass spectra library from Dr. Robert P. Adams. Major compounds identified are named.)

116

3.5

3 H51 H52 H53 H54

2.5 H71 H72 H73 H74

2 T51 T52 T53 T54

1.5 T71 T72 T73 T74

1

0.5

0 Principal component 2 2 component Principal -0.5

-1

-1.5

-2 -2-1012345 Principal component 1

4 H51 H52 H53 H54

H71 H72 H73 H74

3 T51 T52 T53 T54

T71 T72 T73 T74

2

1

0 Principal component 3 Principal component

-1

-2 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3 3.5 Principal component 2

Figure 3-2-7 Score plots showing position of LC-MS data collected from systematically preturbed turmeric rhizome samples (Hollow: HRT, solid: TMO; blue: 5 month, red: 7 month; diamond: fertilizer treatment 1, circle: fertilizer treatment 2, square: fertilizer treatment 3, triangle: fertilizer treatment 4)

117

4

1 3

2

H51 H52 H53 H54 1 H71 H72 H73 H74 ` T51 T52 T53 T54 T71 T72 T73 T74

Principal 2 component 0 2

-1

-2 -2-10123456 Principal component 1

6 H51 H52 H53 H54

H71 H72 H73 H74

5 T51 T52 T53 T54 T71 T72 T73 T74 4

3

2

1 Principal component 3

0

-1

-2 -2-10123456 Principal component 1

Figure 3-2-8 Score plots showing position of GC-MS data collected from systematically preturbed turmeric rhizome samples (Hollow: HRT, solid: TMO; blue: 5 month, red: 7 month; diamond: fertilizer treatment 1, circle: fertilizer treatment 2, square: fertilizer treatment 3, triangle: fertilizer treatment 4)

118

331F2H5M182 292F4H5M208 345F4H5M207 295F4H5M209 358F3H5M197 218F3H5M194 200F3H5M193 206F3H5M198 208F3H5M196 258F2H5M185 224F1H5M172 302F2H5M183 123F2H7M310 215F3H5M195 355F4H5M205 307F4H5M206 370F4H5M210 252F2H5M186 125F4H7M342 247F2H5M184 221F1H5M173 234F1H5M169 174F3H7M324 151F1H7M299 231F1H5M170 129F1H7M296 372F4T5M215 338F3T5M204 350F4T5M214 316F4T5M213 357F3T5M200 195F3T5M199 242F1T5M179 300F2T5M188 368F2T5M192 283F1T5M177 239F2T5M187 255F1T5M180 290F4T5M212 333F3T5M203 329F2T5M190 288F1T5M178 278F1T5M176 266F2T5M189 155F4T7M339 127F3T7M328 147F4T7M338 149F4T7M337 110F4H7M332 106F3T7M329 145F3T7M319 143F3H7M321 185F4H7M331 250F2H5M181 236F1H5M171 104F1T7M301 322F4T5M211 55F1H7M295 40F1H7M297 59F2H7M312 93F2H7M307 83F2H7M308 61F4H7M333 73F4H7M336 28F2H7M309 77F3H7M322 75F3H7M323 79F1H7M298 81F3T7M325 30F2T7M318 53F2T7M314 95F3T7M326 38F2T7M313 64F1T7M303 51F1T7M305 26F1T7M304 36F1T7M302 57F2H7M311

A DRG.LM1.P2_29.23_679 DRG.LM1.P2_29.90_693 DRG.LM1.P2_29.00_763 DRG.LM1.P2_29.35_717 DRG.LM1.P2_29.65_747 DRG.LM1.P2_29.02_687 DRG.LM1.P2_28.02_697 DRG.LM1.P2_32.02_615 DRG.LM1.P2_31.30_601 DRG.LM1.P2_31.38_597 DRG.LM1.P2_22.35_643 Compound 14 DRG.LM1.P2_22.93_414 DRG.LM1.P2_22.90_703 DRG.LM1.P2_22.92_396 DRG.LM1.P2_23.08_733 DRG.LM1.P2_29.43_708 DRG.LM1.P2_30.77_884 DRG.LM1.P2_30.67_797 DRG.LM1.P2_31.67_677 DRG.LM1.P2_29.65_769 DRG.LM1.P2_30.67_753 DRG.LM1.P2_30.12_753 DRG.LM1.P2_30.13_768 DRG.LM1.P2_53.92_569 DRG.LM1.P2_45.38_525 DRG.LM1.P2_50.78_509 DRG.LM1.P2_48.42_585 DRG.LM1.P2_51.95_567 DRG.LM1.P2_48.67_569 A A DRG.LM1.P2_36.23_587 DRG.LM1.P2_46.67_571 DRG.LM1.P2_47.47_555 DRG.LM1.P2_45.40_569 DRG.LM1.P2_39.12_589 DRG.LM1.P2_42.42_583 DRG.LM1.P2_41.58_553 DRG.LM1.P2_40.68_523 DRG.LM1.P2_37.97_569 DRG.LM1.P2_49.98_585 DRG.LM1.P2_45.42_601 DRG.LM1.P2_20.07_325 DRG.LM1.P2_47.57_587 DRG.LM1.P2_36.62_541 DRG.LM1.P2_41.55_541 DRG.LM1.P2_40.82_571 DRG.LM1.P2_41.58_601 DRG.LM1.P2_27.70_321 DRG.LM1.P2_39.33_601 DRG.LM1.P2_27.47_293 Compound 21 Compound 1 (Curcumin) Compound 3 Compound 2 DRG.LM1.P2_30.93_143 DRG.LM1.P2_32.45_337 DRG.LM1.P2_31.35_340 DRG.LM1.P2_30.93_187 DRG.LM1.P2_30.92_353 DRG.LM1.P2_31.65_217 DRG.LM1.P2_23.85_353 DRG.LM1.P2_30.73_345 DRG.LM1.P2_42.70_249 DRG.LM1.P2_23.13_763 DRG.LM1.P2_23.13_785 DRG.LM1.P2_23.13_417 Compound 15 DRG.LM1.P2_23.13_444 Compound 28 DRG.LM1.P2_46.48_587 DRG.LM1.P2_16.10_345 DRG.LM1.P2_47.20_511 DRG.LM1.P2_49.05_571 DRG.LM1.P2_48.67_587 DRG.LM1.P2_48.72_527 DRG.LM1.P2_29.28_704 DRG.LM1.P2_28.57_645 DRG.LM1.P2_29.67_725 DRG.LM1.P2_30.02_755 DRG.LM1.P2_30.92_615 DRG.LM1.P2_28.83_681 DRG.LM1.P2_29.62_695 DRG.LM1.P2_29.22_740 DRG.LM1.P2_29.30_710 DRG.LM1.P2_32.37_335 DRG.LM1.P2_32.38_399 DRG.LM1.P2_31.65_405 DRG.LM1.P2_34.43_499 DRG.LM1.P2_29.38_515 DRG.LM1.P2_33.78_469 DRG.LM1.P2_30.15_529 DRG.LM1.P2_35.05_529 Compound 24 Compound 25 DRG.LM1.P2_17.05_405 DRG.LM1.P2_28.82_496 Compound 23 DRG.LM1.P2_28.82_707 DRG.LM1.P2_28.12_466 Compound 4 DRG.LM1.P2_28.12_647 DRG.LM1.P2_28.12_386 Compound 6 DRG.LM1.P2_42.13_566 DRG.LM1.P2_38.78_564 DRG.LM1.P2_38.53_476 DRG.LM1.P2_40.90_540 DRG.LM1.P2_36.08_735 DRG.LM1.P2_31.87_717 DRG.LM1.P2_31.47_687 DRG.LM1.P2_25.05_295 DRG.LM1.P2_32.43_387 DRG.LM1.P2_38.35_277 DRG.LM1.P2_41.75_447 DRG.LM1.P2_41.40_725 DRG.LM1.P2_42.20_561 DRG.LM1.P2_40.20_699 DRG.LM1.P2_38.17_723 DRG.LM1.P2_35.63_721 DRG.LM1.P2_32.45_735 DRG.LM1.P2_32.33_703 DRG.LM1.P2_31.90_673 DRG.LM1.P2_32.80_733 DRG.LM1.P2_32.92_717 DRG.LM1.P2_33.37_747 DRG.LM1.P2_33.68_783 DRG.LM1.P2_32.37_411 DRG.LM1.P2_18.62_419 DRG.LM1.P2_25.42_298 Compound 11 DRG.LM1.P2_21.92_311 DRG.LM1.P2_31.67_373 DRG.LM1.P2_25.03_367 DRG.LM1.P2_30.95_343 DRG.LM1.P2_32.38_217 DRG.LM1.P2_30.93_119 DRG.LM1.P2_32.37_737 DRG.LM1.P2_31.65_173 DRG.LM1.P2_32.48_732

-1.0 -0.5 0.0 0.5 1.0 1.5 2.0

Figure 3-2-9 The two-way HCA results aligned with a heatmap demonstrating the changes of metabolite levels detected by LC-MS (Red: upregulated, blue: downregluated)

119

1 2 302F2H5M183 292F4H5M208 295F4H5M209 358F3H5M197 316F4T5M213 218F3H5M194 231F1H5M170 208F3H5M196 331F2H5M182 345F4H5M207 206F3H5M198 200F3H5M193 79F1H7M298 108F4H7M334 83F2H7M308 55F1H7M295 252F2H5M186 93F2H7M307 355F4H5M205 247F2H5M184 250F2H5M181 307F4H5M206 221F1H5M173 145F3T7M319 129F1H7M296 174F3H7M324 151F1H7M299 110F4H7M332 125F4H7M342 73F4H7M336 123F2H7M310 30F2T7M318 61F4H7M333 38F2T7M313 36F1T7M302 26F1T7M304 28F2H7M309 77F3H7M322 40F1H7M297 32F2T7M317 215F3H5M195 255F1T5M180 283F1T5M177 239F2T5M187 195F3T5M199 242F1T5M179 274F1T5M175 300F2T5M188 338F3T5M204 357F3T5M200 372F4T5M215 370F4H5M210 147F4T7M338 185F4H7M331 149F4T7M337 102F3T7M327 258F2H5M185 236F1H5M171 234F1H5M169 224F1H5M172 75F3H7M323 106F3T7M329 64F1T7M303 51F1T7M305 81F3T7M325 104F1T7M301 53F2T7M314 57F2H7M311 143F3H7M321 59F2H7M312 95F3T7M326 127F3T7M328 155F4T7M339 10F2T7M315 266F2T5M189 333F3T5M203 329F2T5M190 290F4T5M212 353F3T5M201 288F1T5M178 278F1T5M176 322F4T5M211 7F1T7M306 350F4T5M214 368F2T5M192 15F2T7M316 18F3H7M320 21F1H7M300

A B trans-beta-ocimene_C10H16 n-hexadecanoicacid_C16H32O2 (Z)-sabinol_C10H16O D RG.GM 1.N2.16.17.136.93.55 D RG.GM 1.N2.16.63.136.93.55 D RG.GM 1.N2.12.59.152.69.110 trans-beta-terpineol_C10H18O DRG.GM1.N2.11.40.134.91.119 1,3,8-p-menthatriene_C10H14 p-cymen-8-ol_C10H14O cineole_C10H18O sabinen_C10H16 alpha-terpineol_C10H18O A R-pinene_C10H16 phellandrene_C10H16 trans-piperitol acetate_C12H20O2 alpha-terpinene_C10H16 A p-mentha-1,4(8)-diene_C10H16 beta-atlantol_C15H24O (E)-caryophyllene_C15H24 ar-tumerone_C15H20O sylvestrene_C10H16 3-carene_C10H16 beta-pinene_C10H16 gamma-terpinene_C10H16 delta-terpineol_C10H18O myrcene_C10H16 DRG.GM1.N2.33.59.236.109.177 corymbolone_C15H24O2 DRG.GM1.N1.35.60.238.135.109 R-citronelle_C10H18 camphene_C10H16 alpha-humulene_C15H24 acoradiene_C15H24 alpha-oxobisabolene_C15H24O hexadecane.1.2.diol_C16H34O2 terpinen-4-ol_C10H18O dehydro-aromadendrene._C15H24 beta-curcumene_C15H24 beta-selinene_C15H24 ar-curcumene_C15H22 DRG.GM1.N2.21.03.204.119.189 gamma-curcumene_C15H24 DRG.GM1.N2.24.72.202.109.67 D RG.GM 1.N2.14.00.126.87.97 alpha-zingiberene_C15H24 beta-sesquiphellandrene_C15H24 bet-.bisabolene_C15H24 B (E)-gammar-bisabolene_C15H24 B (Z)-beta-farnesene_C15H24 ar-turmerol_C15H22O (Z)-alpha-bergamotene_C15H24 7-epi-sesquithujene_C15H24 D RG.GM 1.N2.39.56.232.83.149 D RG.GM 1.N2.40.20.149.83.109 DRG.GM1.N1.32.59.250.149.83 3,6,6-trimethyl-2-norpinanon_C10H16O DRG.GM1.N2.35.76.218.121.83 D RG.GM 1.N2.15.42.135.93.91 D RG.GM 1.N2.14.13.136.85.121 D RG.GM 1.N2.17.43.139.97.107 D RG.GM 1.N2.16.76.153.70.55 D RG.GM 1.N2.17.66.165.97.55 D RG.GM 1.N2.35.99.216.83.105 m-acetanisol_C9H10O2 cis-4-thujanol_C10H18O cis-beta-terpineol_C10H18O D RG.GM 1.N2.33.08.151.83.93 alpha-atlantone_C15H22O ipsdienol_C10H16O o-cymene_C10H14 DRG.GM1.N2.11.62.122.83.55 thymol_C10H14O DRG.GM1.N2.25.31.152.109.119 curlone_C15H22O D RG.GM 1.N2.26.08.218.83.55 DRG.GM1.N2.25.68.202.119.93 DRG.GM1.N2.15.815.178.94.79 DRG.GM1.N2.14.38.178.109.79 D RG.GM 1.N2.37.77.232.83.149 DRG.GM1.N2.32.42.232.119.83 DRG.GM1.N2.25.09.220.138.96 D RG.GM 1.N2.20.76.202.69.91 DRG.GM1.N2.37.06.136.118.83 DRG.GM1.N2.33.35.234.136.95 D RG.GM 1.N2.33.94.234.83.119 eugenol_C10H12O2 DRG.GM1.N2.38.49.250.109.81 D RG.GM 1.N2.32.33.234.83.55 D RG.GM 1.N2.32.12.234.83.105 D RG.GM 1.N2.27.76.218.79.94 DRG.GM1.N2.24.88.220.138.96 D RG.GM 1.N2.26.22.136.85.93 (Z)-sesquisabinene hydrate_C15H26O DRG.GM1.N2.26.51.162.119.105 DRG.GM1.N2.24.88.146.105.132 DRG.GM1.N2.27.50.222.119.69 DRG.GM1.N2.26.88.222.119.69 D RG.GM 1.N2.24.54.202.91.79 DRG.GM1.N2.37.27.216.114.83 D RG.GM 1.N2.36.75.234.83.137 D RG.GM 1.N2.33.84.192.83.121 DRG.GM1.N2.36.12.232.135.83 D RG.GM 1.N2.26.70.218.94.79 DRG.GM1.N2.25.48.218.105.83 D RG.GM 1.N2.37.19.167.83.109 D RG.GM 1.N2.33.15.218.83.120 DRG.GM1.N2.34.24.263.138.123 DRG.GM1.N2.34.11.218.96.126 DRG.GM1.N2.32.85.238.121.137 thujene_C10H18 D RG.GM 1.N2.19.28.202.93.119 DRG.GM.N2.20.17.95.105

-1.0 -0.5 0.0 0.5 1.0 1.5 2.0

Figure 3-2-10 The two-way HCA results aligned with a heatmap demonstrating the changes of metabolite levels detected by GC-MS. (Red: upregulated, blue: downregluated)

120

Count 0 1000 2500

-0.5 0 0.5 1 Value

OMe OH Me 2H HO2 same 46.67_571 28.82_353 30.12_753 42.42_583 _49.05_571 _45.42_601 _27.92_325 _49.98_585 _35.05_529 _46.48_587 _48.67_587 _36.62_541 _41.58_601 _32.80_733 _30.92_353 und 4 LM1.P2 LM1.P2 DRG.LM1.P2 DRG.LM1.P2_47.20_511 DRG.LM1.P2_36.08_735 DRG.LM1.P2_32.43_387 DRG.LM1.P2_32.37_335 DRG.LM1.P2_40.90_540 DRG.LM1.P2_38.53_476 DRG.LM1.P2_25.05_295 DRG.LM1.P2_38.35_277 DRG.LM1.P2_32.38_399 DRG.LM1.P2_51.95_567 DRG.LM1.P2_ DRG.LM1.P2_53.92_569 DRG.LM1.P2_36.23_587 DRG.LM1.P2_48.42_585 DRG.LM1.P2_50.78_509 DRG.LM1.P2_45.38_525 DRG.LM1.P2_48.67_569 DRG.LM1.P2_47.47_555 DRG.LM1.P2_27.70_321 DRG.LM1.P2_39.33_601 Compound 25 DRG.LM1.P2_30.02_755 DRG.LM1.P2_17.05_405 Compound 24 DRG.LM1.P2_28.82_496 DRG.LM1.P2_28.82_353_dimer DRG.DRG.LM1.P2 DRG.LM1.P2_28.12_466 DRG.LM1.P2_28.12_386 _ Compo Compound 4_dimer DRG.LM1.P2_29.22_740 DRG.LM1.P2_28.83_681 DRG.LM1.P2_29.30_710 DRG.LM1.P2_31.65_405 DRG.LM1.P2_29.62_695 DRG.LM1.P2_29.67_725 DRG.LM1.P2_32.38_217 DRG.LM1.P2_39.12_589 DRG.LM1.P2_45.40_569 DRG.LM1.P2 DRG.LM1.P2_16.10_345 DRG.LM1.P2_30.92_615 DRG.LM1.P2_33.78_469 DRG.LM1.P2 DRG.LM1.P2_30.15_529 DRG.LM1.P2 DRG.LM1.P2 DRG.LM1.P2_48.72_527 DRG.LM1.P2_29.28_704 DRG.LM1.P2_28.57_645 DRG.LM1.P2_31.87_717 DRG.LM1.P2_31.47_687 DRG.LM1.P2_38.78_564 DRG.LM1.P2_42.13_566 DRG.LM1.P2_42.20_561 DRG.LM1.P2_40.20_699 DRG.LM1.P2_35.63_721 DRG.LM1.P2_41.40_725 DRG.LM1.P2_38.17_723 DRG.LM1.P2_41.75_447 DRG.LM1.P2_31.67_373 DRG.LM1.P2_32.37_737 DRG.LM1.P2_32.45_735 DRG.LM1.P2_32.48_732 DRG.LM1.P2_32.37_411 DRG.LM1.P2_33.68_783 DRG.LM1.P2_33.37_747 DRG.LM1.P2_30.95_343 DRG.LM1.P2_29.65_769 DRG.LM1.P2_30.13_768 DRG.LM1.P2_ DRG.LM1.P2_30.67_753 DRG.LM1.P2_31.30_601 DRG.LM1.P2_32.02_615 DRG.LM1.P2_30.77_884 DRG.LM1.P2_22.92_396 DRG.LM1.P2_23.08_733 DRG.LM1.P2_29.02_687 DRG.LM1.P2_22.90_703 DRG.LM1.P2_22.93_414 DRG.LM1.P2_28.02_697 DRG.LM1.P2_31.38_597 DRG.LM1.P2_30.67_797 DRG.LM1.P2_29.43_708 Compound 14_dimer DRG. DRG.LM1.P2_47.57_587 DRG.LM1.P2_41.55_541 DRG.LM1.P2 DRG.LM1.P2_40.82_571 DRG.LM1.P2 DRG.LM1.P2_18.62_419 DRG.LM1.P2_32.92_717 DRG.LM1.P2 DRG.LM1.P2_31.90_673 DRG.LM1.P2_32.33_703 11 Compound DRG.LM1.P2_34.43_499 DRG.LM1.P2_29.38_515 DRG.LM1.P2_25.03_367 DRG.LM1.P2_30.93_119 DRG.LM1.P2_31.65_173 DRG.LM1.P2_42.70_249 DRG.LM1.P2_30.93_143 2 Compound Compound(curcumin) 1 3 Compound DRG.LM1.P2_31.35_340 DRG.LM1.P2_31.65_217 DRG.LM1.P2_30.93_187 DRG.LM1.P2 DRG.LM1.P2_32.45_337 DRG.LM1.P2_23.85_353 DRG.LM1.P2_21.92_311 DRG.LM1.P2_23.13_785 DRG.LM1.P2_23.13_763 DRG.LM1.P2_31.67_677 DRG.LM1.P2_25.42_298 DRG.LM1.P2_30.73_345 DRG.LM1.P2_23.13_417 Compound 15 DRG.LM1.P2_23.13_444 Compound 28 DRG.LM1.P2_29.65_747 DRG.LM1.P2_29.35_717 DRG.LM1.P2_29.00_763 DRG.LM1.P2_29.23_679 DRG.LM1.P2_29.90_693 DRG.LM1.P2_41.58_553 DRG.LM1.P2_ DRG.LM1.P2_37.97_569 DRG.LM1.P2_40.68_523 DRG.LM1.P2_27.47_293 Compound 21 DRG.LM1.P2_20.07_325

DRG.LM1.P2_36.08_735 DRG.LM1.P2_32.43_387 DRG.LM1.P2_32.37_335 DRG.LM1.P2_40.90_540 DRG.LM1.P2_38.53_476 DRG.LM1.P2_25.05_295 DRG.LM1.P2_38.35_277 DRG.LM1.P2_32.38_399 DRG.LM1.P2_51.95_567 DRG.LM1.P2_46.67_571 DRG.LM1.P2_53.92_569 DRG.LM1.P2_36.23_587 a DRG.LM1.P2_48.42_585 DRG.LM1.P2_50.78_509 DRG.LM1.P2_45.38_525 DRG.LM1.P2_48.67_569 DRG.LM1.P2_47.47_555 DRG.LM1.P2_27.70_321 DRG.LM1.P2_39.33_601 Compound 25 DRG.LM1.P2_30.02_755 DRG.LM1.P2_17.05_405 Compound 24 DRG.LM1.P2_28.82_496 DRG.LM1.P2_28.82_353_dimer DRG.LM1.P2 _28.82_353 DRG.LM1.P2_27.92_325 DRG.LM1.P2_28.12_466 DRG.LM1.P2_28.12_386 b Compound 4 Compound 4_dimer DRG.LM1.P2_29.22_740 DRG.LM1.P2_28.83_681 DRG.LM1.P2_29.30_710 DRG.LM1.P2_31.65_405 DRG.LM1.P2_29.62_695 DRG.LM1.P2_29.67_725 DRG.LM1.P2_32.38_217 DRG.LM1.P2_39.12_589 DRG.LM1.P2_45.40_569 DRG.LM1.P2_49.98_585 DRG.LM1.P2_16.10_345 DRG.LM1.P2_30.92_615 DRG.LM1.P2_33.78_469 DRG.LM1.P2_35.05_529 DRG.LM1.P2_30.15_529 c DRG.LM1.P2_46.48_587 DRG.LM1.P2_49.05_571 DRG.LM1.P2_47.20_511 DRG.LM1.P2_48.67_587 d DRG.LM1.P2_48.72_527 DRG.LM1.P2_29.28_704 DRG.LM1.P2_28.57_645 DRG.LM1.P2_31.87_717 DRG.LM1.P2_31.47_687 DRG.LM1.P2_38.78_564 DRG.LM1.P2_42.13_566 DRG.LM1.P2_42.20_561 DRG.LM1.P2_40.20_699 DRG.LM1.P2_35.63_721 DRG.LM1.P2_41.40_725 e DRG.LM1.P2_38.17_723 DRG.LM1.P2_41.75_447 DRG.LM1.P2_31.67_373 DRG.LM1.P2_32.37_737 DRG.LM1.P2_32.45_735 DRG.LM1.P2_32.48_732 DRG.LM1.P2_32.37_411 DRG.LM1.P2_33.68_783 DRG.LM1.P2_33.37_747 OMe OH Me 2H HO2 same DRG.LM1.P2_30.95_343 DRG.LM1.P2_29.65_769 DRG.LM1.P2_30.13_768 DRG.LM1.P2_30.12_753 f1 DRG.LM1.P2_30.67_753 DRG.LM1.P2_31.30_601 DRG.LM1.P2_32.02_615 DRG.LM1.P2_30.77_884 DRG.LM1.P2_22.92_396 DRG.LM1.P2_23.08_733 DRG.LM1.P2_29.02_687 DRG.LM1.P2_22.90_703 f2 DRG.LM1.P2_22.93_414 DRG.LM1.P2_28.02_697 DRG.LM1.P2_31.38_597 DRG.LM1.P2_30.67_797 DRG.LM1.P2_29.43_708 Compound 14_dimer DRG.LM1.P2_29.65_747 DRG.LM1.P2_29.35_717 DRG.LM1.P2_29.00_763 DRG.LM1.P2_29.23_679 f3 DRG.LM1.P2_29.90_693 DRG.LM1.P2_41.58_553 DRG.LM1.P2_42.42_583 DRG.LM1.P2_37.97_569g1 DRG.LM1.P2_40.68_523 DRG.LM1.P2_27.47_293 Compound 21 DRG.LM1.P2_20.07_325 DRG.LM1.P2_45.42_601 DRG.LM1.P2_47.57_587g2 DRG.LM1.P2_41.55_541 DRG.LM1.P2_36.62_541 DRG.LM1.P2_40.82_571g3 DRG.LM1.P2_41.58_601 DRG.LM1.P2_18.62_419 DRG.LM1.P2_32.92_717 DRG.LM1.P2_32.80_733 DRG.LM1.P2_31.90_673 k DRG.LM1.P2_32.33_703 Compound 11 DRG.LM1.P2_34.43_499 DRG.LM1.P2_29.38_515 DRG.LM1.P2_25.03_367 DRG.LM1.P2_30.93_119 DRG.LM1.P2_31.65_173 DRG.LM1.P2_42.70_249 DRG.LM1.P2_30.93_143 Compound 2 Compound 1(curcumin) Compound 3 L1 DRG.LM1.P2_31.35_340 DRG.LM1.P2_31.65_217 DRG.LM1.P2_30.93_187 DRG.LM1.P2_30.92_353 DRG.LM1.P2_32.45_337 DRG.LM1.P2_23.85_353 DRG.LM1.P2_21.92_311 DRG.LM1.P2_23.13_785 DRG.LM1.P2_23.13_763 DRG.LM1.P2_31.67_677 DRG.LM1.P2_25.42_298 DRG.LM1.P2_30.73_345 DRG.LM1.P2_23.13_417 Compound 15 DRG.LM1.P2_23.13_444L2 Compound 28

Figure 3-2-11 The results of HCA analysis using Pearson correlation coefficient between abundance patterns of metabolites detected using LC-MS. (The HCA result is aligned with a heatmap demonstrating correlation between the abundance patterns of the metabolites. Postive Pearson correlation coefficients are red, and negative Pearson correlation coefficients are blue. Compounds in analog series are marked in different color to show the chemical modifications)

121

Count 0 1000 2000

-0.5 0 0.5 1 Value

DRG.GM1.N2.14.00.126.87.97 alpha-zingiberene_C15H24 ar-turmerol_C15H22O beta-sesquiphellandrene_C15H24 beta-bisabolene_C15H24 (Z)-beta-farnesene_C15H24 (E)-gammar-bisabolene_C15H24 a1 (Z)-alpha-bergamotene_C15H24 7-epi-sesquithujene_C15H24 acoradiene_C15H24 beta-curcumene_C15H24 alpha-humulene_C15H24 DRG.GM1.N2.24.72.202.109.67 gamma-curcumene_C15H24 alpha-oxobisabolene_C15H24O DRG.GM1.N2.21.03.204.119.189 dehydro-aromadendrene._C15H24 ar-curcumene_C15H22 DRG.GM1.N1.35.60.238.135.109 corymbolone_C15H24O2 a2 (Z)-sesquisabinenehydrate_C15H26O beta-selinene_C15H24 terpinen-4-ol_C 10H 18O a3 3-carene_C10H16 ar-tumerone_C15H20O beta-pinene_C10H16 sylvestrene_C10H16 gamma-terpinene_C10H16 b myrcene_C10H16 delta-terpineol_C10H18O camphene_C10H16 trans-beta-ocimene_C10H16 DRG.GM1.N2.33.59.236.109.177 R-citronelle_C10H18 DRG.GM1.N2.16.76.153.70.55 DRG.GM1.N2.17.66.165.97.55 DRG.GM1.N2.17.43.139.97.107 c DRG.GM1.N2.15.42.135.93.91 DRG.GM1.N2.14.13.136.85.121 DRG.GM1.N2.11.40.134.91.119

ipsdienol_C10H16O cis.4-thujanol_C10H18O n-hexadecanoic acid_C16H32O2 p-mentha-1,4(8)-diene_C10H16 alpha-terpinene_C10H16 trans-piperitol acetate_C12H20O2 phellandrene_C10H16 d1 R-pinene_C10H16 alpha-terpineol_C10H18O sabinen_C10H16 cineole_C10H18O d2 DRG.GM1.N2.12.59.152.69.110 trans-beta-terpineol_C10H18O DRG.GM1.N2.16.17.136.93.55 DRG.GM1.N2.16.63.136.93.55 p-cymen.8.ol_C10H14O beta-atlantol_C15H24O (E)-caryophyllene_C15H24 DRG.GM1.N2.37.19.167.83.109 DRG.GM1.N2.37.27.216.114.83 DRG.GM1.N2.24.54.202.91.79 curlone_C15H22O DRG.GM1.N2.26.70.218.94.79 DRG.GM1.N2.33.84.192.83.121 e1 DRG.GM1.N2.36.12.232.135.83 DRG.GM1.N2.25.48.218.105.83 DRG.GM1.N2.36.75.234.83.137 DRG.GM1.N2.26.22.136.85.93 cis-beta-terpineol_C10H18O DRG.GM1.N2.24.88.220.138.96 DRG.GM1.N2.27.76.218.79.94 DRG.GM1.N2.32.33.234.83.55 e2 DRG.GM1.N2.32.12.234.83.105 thujene_C10H18 hexadecane-1,2-diol_C16H34O2 DRG.GM.N2.20.17.95.105 DRG.GM1.N2.19.28.202.93.119 DRG.GM1.N2.26.51.162.119.105 DRG.GM1.N2.26.88.222.119.69 DRG.GM1.N2.24.88.146.105.132 e3 DRG.GM1.N2.27.50.222.119.69 DRG.GM1.N2.25.31.152.109.119 DRG.GM1.N2.32.42.232.119.83 DRG.GM1.N2.20.76.202.69.91 DRG.GM1.N2.25.09.220.138.96 DRG.GM1.N2.37.06.136.118.83 DRG.GM1.N2.15.815.178.94.79 DRG.GM1.N2.14.38.178.109.79 DRG.GM1.N2.38.49.250.109.81 f eugenol_C10H12O2 DRG.GM1.N2.37.77.232.83.149 DRG.GM1.N2.25.68.202.119.93 DRG.GM1.N2.33.15.218.83.120 DRG.GM1.N2.34.24.263.138.123 alpha-atlantone_C15H22O DRG.GM1.N2.32.85.238.121.137 DRG.GM1.N2.34.11.218.96.126 thymol_C10H14O DRG.GM1.N2.11.62.122.83.55 DRG.GM1.N2.33.08.151.83.93 DRG.GM1.N2.26.08.218.83.55 m-acetanisol_C9H10O2 DRG.GM1.N2.35.99.216.83.105 DRG.GM1.N2.35.76.218.121.83 DRG.GM1.N2.39.56.232.83.149 3,6,6-trimethyl-2-norpinanon_C10H16O (Z)-sabinol_C10H16O DRG.GM1.N2.40.20.149.83.109 DRG.GM1.N1.32.59.250.149.83 o-cymene_C10H14 DRG.GM1.N2.33.35.234.136.95 DRG.GM1.N2.33.94.234.83.119 g

Figure 3-2-12 The results of HCA analysis using Pearson correlation coefficient between abundance patterns of metabolites detected using GC-MS. (The HCA result is aligned with a heatmap demonstrating correlation between the abundance patterns of the metabolites. Postive Pearson correlation coefficients are red, and negative Pearson correlation coefficients are blue)

122

Table 3-2-2 Descriptive statistics of Pearson correlation coefficients between abundance patterns of metabolites

Statistics GC-MS LC_MS N Valid 12656 19182 N Missing 111 137 Mean 0.148 0.247 Std. Error of Mean 0.003 0.002 Median 0.113 0.246 Std. Deviation 0.305 0.288 Variance 0.093 0.083 Skewness 0.291 0.068 Std. Error of Skewness 0.022 0.018 Kurtosis -0.195 -0.591 Std. Error of Kurtosis 0.044 0.035 Range 1.698 1.565 Minimum -0.708 -0.574 Maximum 0.990 0.991 Sum 1872.182 4740.883

123

7.00%

6.00%

5.00%

4.00%

3.00% 0.05

Frequency 2.00%

1.00%

0.00% -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -1.00% Pearson correlation coefficient 0.725

Figure 3-2-13 Distribution of Pearson correlation coefficients between abundance patterns of metabolites detected by LC-MS (The cutoff value for top 5% is shown)

9.00%

8.00%

7.00%

6.00%

5.00%

4.00%

3.00% Frequency 0.05 2.00%

1.00%

0.00% -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 -1.00% Pearson correlation coefficient 0.695

Figure 3-2-14 Distribution of Pearson correlation coefficients between abundance patterns of metabolites detected by GC-MS (The cutoff value for top 5% is shown)

124

20.00%

LC-MS data LC-MS scramble 15.00%

10.00%

Frequency 5.00%

0.00% -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

-5.00% Pearson correlation coefficient

25.00%

GC-MS data 20.00% GC-MS scramble

15.00%

10.00% Frequency 5.00%

0.00% -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

-5.00% Pearson correlation coefficient

Figure 3-2-15 Comparison of the distributions of correlation values calculated from real datasets with distributeions from scrambled datasets

125

LC-MS all 20.00% LC-scrambled LC-5M LC-7M 15.00%

10.00%

Frequency 5.00%

0.00% -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

-5.00% Pearson correlation coefficient

GC-MS all 25.00% GC-scrambled GC-5M 20.00% GC-7M

15.00%

10.00%

5.00%

0.00% -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

-5.00%

Figure 3-2-16 Comparison of the distribution of correlation values from the whole dateset to 5M/7M separate datesets.

126

3.3 Identification of diarylheptanoids in turmeric rhizomes using Tandem-MS-ASC

(Assisted Spectra Comparison)

This part of the research had two aims:

(1) To develop software tools to assist in the identification of non-peptidic small

molecules based on LC-MS/MS spectral data;

(2) To identify diarylheptanoids in turmeric rhizomes.

Extracts of turmeric rhizome

LC-MS/MS analysis

LC-MS/MS spectra data

Tandem-MS-ASC

LC-MS/MS spectra of poss ible diarylheptanoids

Manual validation

Structures of diarylheptanoids

Figure 3-3-1 The experimental approach for identification of diarylheptanoids in

turmeric rhizomes

127

3.3.1 Development of Tandem-MS-ASC (Assisted Spectra Comparison)

Although LC-MS analysis is able to detect large numbers of metabolites from complex biological samples, identification of these metabolites is a difficult task. It is not practical and probably not even feasible to physically isolate all compounds detected by LC-MS, construct MS/MS spectral databases from the isolated compounds, and then use database searching to identify such compounds in other samples. In the previous sections, we temporarily circumvented the requirement of metabolite identification by assigning artificial names to unidentified compounds in the LC-MS data set. However, it is still necessary to identify the compounds to determine the biological significance of the metabolomic data (Tikunov et al., 2005). In this section, we describe efforts to predict structures of some unknown compounds based on their LC-MS/MS experimental data and previous reports (Lerch et al., 2003; Jiang et al., 2006a; Jiang et al., 2006b).

Due to the large data size of metabolic profiling and metabolomics experiments, manual spectra analysis and interpretation rapidly becomes impractical. In an attempt to overcome this problem, we developed data analysis tools designed to assist in MS/MS spectra interpretation and compound structure prediction. The software tool that we have developed is presently called “Tandem-MS-ASC (Assisted Spectra Comparison)”.

Currently, this software contains three major components, which are described below.

Perl scripts that can accomplish the function of Tandem-MS-ASC are listed in Appendix

A3.

128

Fragmentation rules of diarylheptanoids have been carfully investigated (Jiang et al.,

2006; Jiang et al., 2006), which provided a good basis for identification of this group of compounds using MS/MS information. The structures of diarylheptanoids feature three

“building blocks”: two aromatic groups at both ends and a seven carbon bridge linking them. During CID in an ion trap, the fragmentation normally occurs at the bridge rather than the aromatic groups (Figure 3-3-6) (Jiang et al., 2006; Jiang et al., 2006). This is similar to the CID fragmentation of peptides, which mainly occurs at the amide bonds between amino acid residues, while the side chains of amino acid residues generally remain intact (Steen and Mann, 2004). This similarity suggests the possibility that diarylheptanoids could be identified using a strategy similar to what has been used for peptide identification.

Component I of Tandem-MS-ASC uses the “cross correlation” strategy that has been adopted to identify peptides (Yates et al., 1995b; Craig and Beavis, 2004). This strategy generates theoretical spectra based on the building blocks and fragmentation pattern of hypothetical molecules. Then it compares the theoretical and real spectra to determine whether the hypothetical molecule affords a spectrum matching the real one (Figure 3-3-

2):

Pseudocode of Component I (comparison with theoretical spectra):

129

(1) Read the fragmentation pattern (here fragmentation pattern refers to the

differences between mass of building block and mass of ions in MS/MS

spectrum).

(2) Read the mass of the precursor ion of the target spectrum, and calculate the

molecular weight (MW).

(3) Calculate all possible combinations of the building blocks. Search the all

combinations for possible structures that give the right MW.

(4) Generate the theoretical MS/MS spectra of possible structures.

(5) Read the target MS/MS spectrum. Compare the target spectrum with the

theoretical MS/MS spectrum. Report the comparison result if they match.

(6) Repeat step 2 to step 5 until all target spectra have been processed.

Mass of the precursor ion MS/MS spectrum

MW

Building blocks Combination of building blocks Fragmentation pattern Theoretical Compare and MS/MS spectrum report matches

Figure 3-3-2 Principle of Component I (comparison with theoretical spectra)

130

The usefulness of Component I is limited by its three prerequisites: (1) Changes in certain

parts of the molecule will not change the overall fragmentation pattern; (2) fragmentation patterns are known; (3) building blocks are known. These conditions are quite often not met for a group of compounds. To overcome this problem, Component II was developed

(Figure 3-3-3). Component II identifies the building blocks of the molecule by searching

for ion series, which are ions that contain the same molecular building blocks (Figure 3-

3-6). The masses of these ions are related to, associated with or connected to the mass of the building block. Pseudocode of Component II (Searching for ion series):

(1) Read the ion series in the fragmentation pattern. Calculate the mass shifts between

ions in the same ion series.

(2) Read the mass of the precursor ion of the target spectrum. Calculate the molecular

weight.

(3) Read the peak list of target MS/MS spectrum. Calculate the mass shifts between

the MS/MS peaks.

(4) Compare mass shifts values acquired at step 1 to those acquired at step 3. Report

the identified ion series if the two shift lists match each other.

(5) Use the mass of the ion series to calculate the mass of the building blocks.

(6) Report the combination of building blocks that afford the identified MW.

(7) Repeat step 2 to step 6 until all target spectra have been processed.

Component II does not require known building blocks. Instead, it can be used to find unknown building blocks. But it still needs to know fragmentation rules, which are often

131

not known. Component III has been developed to overcome this problem (Figure 3-3-4).

Component III searches numerous MS/MS spectra for analogs of a known compound.

Compared with the spectrum of the known compound, the spectra of its analogs often

contain peaks at the same mass or with certain mass shifts. If the known compound and

its analogs are only different in one part of the molecule, the mass shift of peaks will be equal to the mass shift of the molecular weight. A similar spectral comparison approach has been used in proteomic research for peptide identification (Liu et al., 2007).

Fragmentation pattern MS/MS spectrum Mass of the precursor ion

Ion series in the pattern Peaks list MW

Combination of building Ion series in the spectrum blocks affording the MW

Figure 3-3-3 Principle of Component II (searching for ion series)

Pseudocode of Component III (comparison of spectra with suitable mass shifts; query refers to the MS/MS spectra of the known compound or the compound used for comparison; target refers to all other spectra in the LC-MS/MS result):

(1) Read the mass of the precursor ion of spectrum of the known compound (massquery)

(2) Read the peak list of the query spectrum (peak list I).

132

(3) Read the mass of the precursor ion of spectrum from the unknown compound

(masstarget), and calculate the mass shift (∆).

∆ = masstarget - massquery

(4) Generate peak list II.

Mass of peak list II = Mass of peak list I - ∆

(5) Read the target spectrum. Compare it with peak list I and peak list II. Report the

target spectrum if it shares many peaks with peak list I and II.

(6) Repeat step3 to step 5 until all target spectra have been processed.

Mass of the precursor ion MS/MS spectrum MS/MS spectrum (Known and unknown) of known of unknown

Peak list

MW difference Compare top 25 peaks Peak list with mass shift

filter out the spectra with ≥ 10 matches

Figure 3-3-4 Principle of Component III (comparison of spectra with suitable mass shifts)

The three components of Tandem-MS-ASC can work together. The third component is actually a mass spectrum filter. Spectra of a series of analogs can be selected for manual review by Component III. Comparison of spectra of these analogs can be helpful to

133

reveal the fragmentation patterns. After the fragmentation patterns have been summarized, the first and second components can be used to automatically provide clues about possible structures of unknown compounds (Figure 3-3-5).

Identify Component III Fragmentation pattern related compounds

Component II Building blocks

Identify more Component I compounds

Figure 3-3-5 Combination of the three data analysis approaches

3.3.2 Identification of diarylheptanoids in turmeric rhizome based on LC-MS/MS

Twelve known diarylheptanoids were identified from turmeric rhizome extract based on their retention times, molecular weights, UV spectra, and MS/MS spectra reported previously (Jiang et al., 2006a; Jiang et al., 2006b): curcumin 1, demethoxycurcumin 2, bisdemethoxycurcumin 3, 4'-hydroxybisdemethoxy-curcumin 4, dihydrocurcumin 6, dihydrodemethoxycurcumin 7, dihydrobisdemethoxycurcumin 8, 1-(4-hydroxy-3- methoxyphenyl)-7-(4-hydroxy-3,5-dimethoxyphenyl)-4,6-heptadien-3-one 11, 1-(3,4- dihydroxyphenyl)-7-(4-hydroxy-3-methoxyphenyl)-6-hepten-3,5-dione 12, 1-(4-hydroxy- phenyl)-7-(4-hydroxy-3-methoxyphenyl)-1,4,6-heptatrien-3-one 14, 1,7-bis(4-hydroxy-3-

134 methoxyphenyl)-1,4,6-heptatrien-3-one 15, and curcumin monoacetate 22. Identification of these known compounds was a good starting point for identification of unknown compounds.

Using Tandem-MS-ASC, MS/MS spectra of more than 20 possible diarylheptanoids were found from the whole LC-MS/MS data set. After manual validation, 7 more diarylheptanoids were predicated based on their MS/MS spectra. Two of them have been isolated from turmeric, but their MS/MS spectra have not been reported. One of them has been chemically synthesized. And the structures of the remainder (4 compounds) were not found in the literature, and tentatively identified as new compounds.

Two methods were used in the manual validation. Structures of 1-(3,4-dihydroxyphenyl)-

7-(4-hydroxy-3-methoxyphenyl)- 1,6-Heptadiene-3,5-dione 23, 1-(3,4-dihydroxy-3- methoxyphenyl)-7-(4-hydroxyphenyl)-1,6-Heptadiene-3,5-dione 24, 1-(3,4-dihydroxy-3- methoxyphenyl)-7-(3,4-dihydroxyphenyl)-1,6-Heptadiene-3,5-dione 25, 1-(3,4,5- trihydroxyphenyl)-7-(4-hydroxy-3-methoxyphenyl)- 1,6-Heptadiene-3,5-dione 26 and 1-

(3-methoxy-4-hydroxyphenyl)-7-phenyl-1,6-Heptadiene-3,5-dione 27 (Figure 3-3-7 to

Figure 3-3-10) were validated using the fragmentation rules established for diarylheptanoids in previous studies (Jiang et al., 2006a; Jiang et al., 2006b) (Figure 3-3-

6). Identities of two other compounds were validated based on spectral comparison.

Spectra of 1-(4-hydroxy-3-methoxyphenyl)-7-(4-hydroxy-3,5-dimethoxyphenyl)-1,4,6- heptatrien-3-one 28 and 1-(3,4-dimethoxyphenyl)-7-(4-hydroxy-3-methoxyphenyl)- 1,6-

Heptadiene-3,5-dione 29 were compared with 15 and 22, respectively (Figure 3-3-11 and

Figure 3-3-12). In the following phragraph, identification of 28 is used as an example to

135

show how to identify unknown compounds using the spectral comparison approach. This

example also demonstrated the importance of identification of co-regulated metabolite

modules (see section 3.2.3) for compound identification.

A comparison of the LC-MS/MS spectra of 15 and 28 is shown in Figure 3-3-11. The

precursor ions of these two compounds did not fragment well under our experimental

conditions, so the MS/MS spectra were dominated by a large peak for the intact precursor

ion, and the intensities of peaks of all product ions were relatively low, which made

compound identification very difficult. However, when the spectrum of the unknown

compound (28) was compared with the known compound (15), we found that the peaks

of product ions, though very weak, were highly reproducible. Most of the peaks in

spectrum of 28 had counterparts in the spectrum of 15, at the same mass or with a mass shift of 30 (equal to the mass difference of the precursor ions). The mass shift suggested that the two molecules differed by a methoxy group. If the methoxy group was located on the bridge, where the fragmentation usually occurs, the fragmentation pattern would change and the two spectra would not be aligned with each other. Therefore, the methoxy group should be on one of the aromatic rings. Normally, the ring substituents of diarylheptanoids are located at the C-3' or C-4' positions (Jiang et al., 2006b). Two possible structures can be drawn based on the above information since the methoxy group can be on either of the two aromatic rings. A selection between the two candidates can be made based on the correlation analysis of non-targeted metabolomic data, in which 28 was found to be co-regulated with 15 (Figure 4-2-2). So 28 and 15 were probably structurally related to each other. Obviously, one of the two candidates was more similar to 15. As the result, 28 was tentatively identified as 1-(4-hydroxy-3-methoxyphenyl)-7-

136

(4-hydroxy-3,5-dimethoxyphenyl)-1,4,6-heptatrien-3-one (a new compound). Using the same approach, the structure of 29 was tentatively identified as 1-(3,4-dimethoxyphenyl)-

7-(4-hydroxy-3-methoxyphenyl)- 1,6-Heptadiene-3,5-dione (Figure 3-3-12), which was previously obtained via chemical synthesis (Berzin et al., 1996).

[M+H]+ [M-H]- L N B O D O O C O O P E

R R R R OH HO OH HO

Negative Positive Precursor ion MW-1 Precursor ion MW+1

Ar ion series Ar ion series B Ar+94 L Ar+122 C Ar+52 N Ar+96 D Ar+50 O Ar+54 E Ar+26 P Ar+52

Neutral loss I MW-18 J MW-80 K MW-84 M MW-110

Figure 3-3-6 The fragmentation pattern of diarylheptanoids (Jiang et al., 2006a)

137

O O HO OMe

HO OH 271.0(K)

216.8(B) 245.0(M,L) 100

-MS2 (352.9) 28.7min +MS2 (355.0) 28.7min

50 173.0(D) 175.0(P)

230.9(L)

Relative abundance 285.0(J) 160.9 (P) 219.0(N) 135.1(E) (O) (C) (O) 337.0(I) 0 100 150 200 250 300 350 100 150 200 250 300 350 m/z m/z Figure 3-3-7 Structure and MS/MS spectra of compound 23 [1-(3,4-dihydroxyphenyl)-7- (4-hydroxy-3-methoxyphenyl)- 1,6-Heptadiene-3,5-dione, previously isolated from turmeric (Nakayama et al., 1993)]

O O OMe

HO OH OH

Figure 3-3-8 Structure and MS/MS spectra of compound 24 [1-(3,4-dihydroxy-3- methoxyphenyl)-7-(4-hydroxyphenyl)-1,6-Heptadiene-3,5-dione, a new compound] (the chromatographic peak of this compound was very close to the peak of 23, so the MS/MS spectra of 24 were contaminated by MS/MS spectra of 23. the contamination peaks in the spectra were marked as dash lines)

138

O O HO OMe

HO OH OMe 216.8(B) 260.9(L) 100

-MS2 (382.8) 28.6min +MS2 (355.0) 28.7min

172.9(D) 244.9(L)

50 175.1(P) 301.0(K) 191.0(P) 164.9(E) (O) elative abundance elative R 367.0(I) (O) 315.1(J) 218.7(N) 189.0(D) 232.7(B) 275.1(M) 0 100 150 200 250 300 350 100 150 200 250 300 350 400 m/z m/z

Figure 3-3-9 Structure and MS/MS spectra of compound 25 [1-(3,4-dihydroxy-3- methoxyphenyl)-7-(3,4-dihydroxyphenyl)-1,6-Heptadiene-3,5-dione, a new compound]

O O O O OMe HO OMe

HO OH OH 176.9(O,P) OH 239.0(K) 100 100

+MS (371.2) 30.9min 2 +MS (323.0) 39.5min 2

50 50 305.0(I) elative abundance elative abundance elative

245.9(L) R R 177.0(O) 244.9(L) 174.9(P) 253.1(J) 310.3(J) 285.9(K)

174.8(P) 213.0(M) 352.9(I)

0 0 100 150 200 250 300 350 80 100 120 140 160 180 200 220 240 260 280 300 320 m/z m/z

Figure 3-3-10 Structure and MS/MS spectra of compound 26 [1-(3,4,5- trihydroxyphenyl)-7-(4-hydroxy-3-methoxyphenyl)- 1,6-Heptadiene-3,5-dione, a new compounds] and 27 [1-(3-methoxy-4-hydroxyphenyl)-7-phenyl-1,6-Heptadiene-3,5- dione, previously obtained via chemical synthesis (Arrieta et al., 1987)] (no high quality MS/MS spectra obtained in negative mode for these two compounds)

139

+ 353.1 6.0 +MS2 (353.1) 22.6min X16 + 269.1

O

MeO OMe

HO OH

+ + Relative abundance Relative * + 321.0 338.0 307.2 + 147.0 * + 232.0* 293.1 * * + + 153.1 190.8 219.2 247.1 + + * * + * + + 176.0 205.1+ + 279.0 0.0 * 100 120 140 160 180 200 220 240 260 280 300 320 340 360 m/z

+ 383.1 +MS (383.2) 23.1min X30 3.0 2 O

MeO OMe

HO OH OMe + O 299.2

MeO OMe

HO OH OMe Relative abundance Relative + 368.0 + + 177.0 + 350.0 + + + + 365.4 * 190.0 * 245.3 275.2* 323.1+ + 175.9 232.1 + 309.8* 333.1 + * 213.3 + * 152.8 ** * 274.2* 295.1 +* 0.0 100 150 200 250 300 350 m/z

Figure 3-3-11 Spectral comparison of 15 (top, a known compound) and 28 (bottom, a new compound) (* : peaks with the same mass; +: peaks with the mass shifted by 30. The mass difference of the two precursor ions is 383-353 = 30)

140

+ * 245.0 100 O O MeO OMe

+MS2 (410.9) 36.8min HO OAc

+ 326.8

50 * 369.0 Relative abundance Relative

175.0

+ 353.0 * + + + + + + * * 300.8 340.9 393.0 117.0 * 258.9 + 0 100 150 200 250 300 350 400 m/z + 100 299.0 +MS2 (383.0) 35.8min

O O MeO OMe

HO OMe

50 + *

elative abundance elative 259.0 R + * 383.0 189.1 + 244.9 230.9 191.0 273.1 + * + 325.4 + + * * 177.1 + + 312.9 115.1 142.9 ** 364.8 218.8 * 0 100 150 200 250 300 350 m/z

Figure 3-3-12 Spectral comparison of 22 (top, a known compound) and 29 (bottom, a previously isolated from turmeric) (* : peaks with the same mass; +: peaks with the mass shifted by 28. The mass difference of the two precursor ions is 410-383 = 28)

141

CHAPTER 4: DISCUSSION

4.1 Discussions of proteomic results

4.1.1 Shotgun approaches as high-throughput tools for plant proteomics research

Shotgun proteomics approaches, including MudPIT and GeLC-MS/MS, are widely used in the proteomic analysis of human, animal and microorganism samples (Dworzanski and

Snyder, 2005; Jung et al., 2005; Hu et al., 2006). However, these approaches are less commonly applied in plant proteomic analyses (Lee and Cooper, 2006). Compared with the 2D-gel approach, which is still dominant in plant proteomics research, shotgun approaches have some disadvantages, such as complex data analysis and the inability to directly provide quantitative information. For this research, we wished to overcome these disadvantages while capitalizing on the advantages of the high-throughput capability of shotgun proteomic approaches. The availability of an EST database from basil glandular trichomes, and our observations of patterns of gene expression at the transcript level combined with patterns of metabolite accumulation raised interesting questions regarding the presence, abundance and modifications of biosynthetic proteins, questions we perceived would be best addressed by applying the shotgun proteomics approach described here.

To achieve high throughput, shotgun proteomics approaches rely on information from sequence databases to assign protein identities (Chen and Harmon, 2006). Therefore, these approaches are especially suitable for taxa having available genome sequence.

However, complete genomic sequence is unavailable for most plant species. In these

142

cases, EST databases have proven to be a good alternative source for sequence

information (Neubauer et al., 1998). However, because such databases consist of only a

subset or sampling of the genes expressed in the plant cells or tissues used for

construction of cDNA libraries and creation of the database and because the EST

sequences themselves are usually incomplete, an EST database itself is not a complete

representation of the entire protein-coding portion of a plant’s genome. This presents a

significant problem when one seeks to identify proteins using a technology based on the availability of a sequence database. In an effort to circumvent this problem and improve our rate of protein identification, we included in the database generated for protein identification not only the genes from the basil EST database, but also all known plant protein sequences entered into the UniProt database. The necessity of this arrangement was validated by our protein identification results, where about one third of the protein identifications would not have been made if the EST database had been used as the only sequence information source. Interestingly, an informal survey of the most prominent and abundant of these proteins, using BLAST searches of the identified protein sequence back to the basil EST database, suggests that close homologs of the identified proteins exist in the basil database but were not identified due to a presumed discrepancy between sequence and mass spectrum data. We surmise that we have nonetheless missed other proteins that are expressed in this tissue because of the problems described above and the fact that exact sequence identity is required for peptide identification, and many peptides from basil proteins may be dissimilar enough from their counterparts in the UniProt database to prevent them from being identified. Thus, protein identification is a

143

conservative and incomplete process, sensitive to and unable to compensate for

artifactual sequence polymorphisms resulting from sequencing errors as well as those that

might arise due to biological process such as RNA editing or alternative transcript

processing yet not be captured in transcriptome-level data. Despite these problems,

however, our results do represent a good picture of the most abundant proteins and genes

expressed in the peltate glandular trichomes from the four different basil lines.

In shotgun proteomics, proteolytic digestion is performed prior to LC separation. Because

of sequence similarity among proteins, peptides with the same sequence can be generated

from different proteins. This “protein inference problem” complicates the task of

assigning peptides back to proteins and requires that a statistical measure of significance be applied to proteins identified (Nesvizhskii and Aebersold, 2005). Several programs,

such as PROT_PROBE (Sadygov et al., 2004), PANORAMICS (Lee and Cooper, 2006)

and ProteinProphet (Nesvizhskii et al., 2003), have been developed to carry out this task.

Of these, ProteinProphet was chosen for this investigation because it has been integrated

into the Trans-Proteomic Pipeline (TPP), a freely available uniform proteomics analysis

platform (Keller et al., 2005), and thus the large dataset produced by shotgun proteomics

experiments can be processed efficiently.

Shotgun proteomics approaches have been thought to be incapable of performing direct quantification based on MS signal intensity (Steen and Pandey, 2002). This problem can be circumvented by using techniques such as Isotope Coded Affinity Tags (ICAT) (Gygi

144

et al., 1999a) or Isotope Labeling with Amino acids in Cell culture (SILAC) (Ong et al.,

2002), both of which require the mixing of stable isotope-labeled samples with non- labeled samples. However, these methods are very expensive and not always feasible for

plant proteomic studies, and thus have not been as widely adopted in plant proteomics as

they have for other research areas (Rose et al., 2004). Alternative methods to quantify

protein levels in shotgun proteomics investigations have been developed. During data-

dependent MS/MS spectra acquisition used in shotgun approaches, highly abundant

proteins are sampled more often than less abundant proteins (Bodnar et al., 2003; Gao et

al., 2003). Therefore, protein abundance can be approximated based on the number of spectra obtained from each protein (Liu et al., 2004; Lu et al., 2007).

Our results showed that this method can be successfully used in plant proteomic

investigations to produce biologically significant semi-quantitative information that

corresponds in many instances to the relative quantitative differences observed for

mRNA expression levels of the corresponding genes or metabolite levels downstream of

the protein. The significant correlation between transcript and protein levels described in

this report provides important validation for this approach when used to analyze

important biological processes in a single cell type such as the glandular trichome and

indicates that in a majority of cases such inferences may not be misleading. Many good

examples of this pattern of gene expression matching protein levels, which itself matches

metabolite production, included glyceraldehyde-3-phosphate dehydrogenase, DAHP

synthase and most of the rest of the enzymes in the shikimate pathway, phenylalanine

145 ammonia lyase, caffeoyl-CoA O-methyltransferase, chavicol O-methyltransferase, eugenol O-methyltransferase, and 1,8-cineole synthase, among many others (Tables 3-1-2 to 3-2-7). In contrast, a second pattern was observed where dramatic differences between protein levels and gene expression levels were found for several proteins in this analysis, which nevertheless led to predictable metabolite levels based on the observed protein levels, presumably due to the fact that these enzymes reside at important regulatory or branch points in their respective pathways. Examples include p-coumaroylshikimate-3´- hydroxylase and cinnamate 4-hydroxylase among others. A third pattern was also observed, which included enzymes for which neither the respective gene expression nor metabolite levels appeared to have any correlation with protein levels observed for the different basil lines. Excellent examples of this included R-linalool synthase and geraniol synthase. Deviations from direct correlation between transcript, peptide and metabolite levels could be the result of technical factors, such as differences in protein extraction efficiencies between the different lines, differential peptide ionization efficiencies due to residue polymorphisms for equivalent peptides from different lines, ion suppression for a peptide in a particular line due to the presence of abundant peptides not present in the other line, etc., or they may reflect real biological differences in protein abundance due to differences in protein stability in particular lines (e.g., due to targeted degradation) or other post-transcriptional or post-translational regulation and modifications. Such factors cannot necessarily be determined from our current results, but lead to interesting new lines of inquiry regarding potential regulatory mechanisms governing metabolism in the different basil lines. High resolution discrimination of real variation in protein abundance

146 versus apparent differences arising due to technical reasons will require analysis of multiple biological and technical replicates, which should become increasingly feasible with improved analytical methods and higher throughput instrumentation leading to reduced analysis costs.

4.1.2 Differential expression of specific enzymes controls chemical diversity of basil lines

Although all four basil lines chosen for this investigation have unique metabolic profiles, the citral-producing line SD is the most chemically distinct of the four basil lines, containing high levels of terpenoids but very low levels of phenylpropanoids. According to the TSC data presented in Tables 2 – 6, it also displayed a distinctive protein expression pattern. Most of enzymes in the phenylpropanoid and shikimate pathways were down-regulated in SD, at both the mRNA and protein levels. This was especially the case for key entry point enzymes into the respective shikimate and phenylpropanoid pathways—DAHP synthetase and phenylalanine ammonia lyase. The lower level of production of methylchavicol in line SD relative to line EMX-1, even though CVOMT is expressed at a higher levels in SD, can easily be explained by the very low level of expression of most of the phenylpropanoid pathway genes and enzymes, especially cinnamoyl-CoA reductase and p-coumaryl/coniferyl alcohol acetyl transferase, which are both required for chavicol production and are expressed at low levels in line SD based on mRNA levels and by the absence of any proteomics support.

147

As can be seen from Figure 4-1-1, the patterns across all four basil lines for both

transcript and protein levels, as indicated by TSC and TEN, matched reasonably well the pattern of total volatile compound output (metabolite levels) for DAHP synthetase alone

or the shikimate pathway as a whole. Slight differences in expression pattern can be

attributed to error in sampling inherent in EST production via the approach used or in

shotgun proteomics peptide identification performed without replication. Nevertheless,

the general patterns matched very well for both DAHPS and the whole shikimate

pathway, suggesting that production of volatile phenylpropanoids is controlled

significantly by levels of gene expression leading to protein levels for the enzymes in this

pathway.

In contrast, many enzymes related to the production of terpenoids were expressed at high

levels in line SD. These included most of the enzymes in MEP pathway (Rohmer et al.,

1996; Rohmer, 1999; Rodriguez-Concepcion and Boronat, 2002), isopentenyl-

diphosphate delta-isomerase (IDI), and all of the prenyltransferases identified. This

protein expression pattern can explain the chemical profile of line SD relative to the other

lines, where reduced production of phenylpropanoids and the increased biosynthesis of

terpenoids appear to have resulted from reduced flux into the phenylpropanoid pathway

and diversion of upstream metabolite intermediates into the MEP and subsequent

terpenoid pathways as a result of differences in levels of expression of pathway entry

point genes and resulting differences in levels of the corresponding proteins.

148

DAHPS vs. total volatile phenylpropanoids CCMT vs. methylcinnamate 1,8-Cineole Synthase 1.2 1.2 1.2 TranscriptProtein Metabolite TranscriptProtein Metabolite TranscriptProtein Metabolite 1 1 1.0

0.8 0.8 0.8

0.6 0.6 0.6

0.4 0.4 0.4

0.2 0.2 0.2

0 0 0.0 SW MC EMX-1SD SW MC EMX-1SD SW MC EMX-1SD SW MC EMX-1SD SW MC EMX-1SD SW MC EMX-1SD SW MC EMX SD SW MCEMX SD SW MC EMX SD Total Shikimate Pathway vs. total volatile phenylpropanoids CAAT vs. volatile phenylpropenes Cadinene Synthase 1.2 1.2 1.2 Protein Metabolite TranscriptProtein Metabolite TranscriptProtein Metabolite 1 1 1.0

0.8 0.8 0.8

0.6 0.6 0.6

0.4 0.4 0.4

0.2 0.2 0.2

0 0 0.0 SW MC EMX-1SD SW MC EMX-1SD SW MC EMX-1SD SW MC EMX-1SD SW MC EMX-1SD SW MC EMX-1SD SW MC EMX SD SW MC EMX SD SW MC EMX SD PAL vs. total volatile phenylpropanoids C3'H vs. 3-hydroxylated phenylpropenes Germacrene D Synthase 1.2 1.2 1.2 Transcript Protein Metabolite TranscriptProtein Metabolite TranscriptProtein Metabolite 1 1 1.0

0.8 0.8 0.8

0.6 0.6 0.6

0.4 0.4 0.4

0.2 0.2 0.2

0 0 SW MC EMX-1SD SW MC EMX-1SD SW MC EMX-1SD 0.0 SW MC EMX-1SD SW MC EMX-1SD SW MC EMX-1SD SW MC EMX SD SW MCEMX SD SW MC EMX SD

Figure 4-1-1 Levels of transcripts and proteins of specialized metabolic enzymes vs. their corresponding metabolites among the 4 basil lines

Our results also suggest that the low level of total terpenoid production observed for line

EMX-1 (Figure 3-1-2) can be attributed to the absence of the first two enzymes in the

MEP pathway. Among the four lines, EMX-1 had the lowest mRNA and undetectable protein levels for DXS and DXR, the first two enzymes in the MEP pathway. This is consistent with reduced production of terpenoids in EMX-1, as previously reported

(Iijima et al., 2004a). The mevalonate pathway, which had higher mRNA expression levels for most enzymes in the pathway in line EMX-1 relative to the other lines, was not detectable at the protein level, and it was presumably not able to compensate for apparent reduced flux through the MEP pathway in this line. These results taken together with the

149

results described above for control of phenylpropanoid pathway-derived compounds and the apparently high level of genetic similarity between basil lines SD and EMX-1 suggests that significant metabolic crosstalk may occur between the shikimate/phenylpropanoid and MEP/terpenoid pathways in basil glandular trichomes, and that efforts to alter production of specific compounds in one pathway may very well

have an affect on the production of metabolites from the other pathway. The different

metabolic phenotypes of these basil lines may be due to a minimal genetic perturbation

that occurred during their breeding and development, and it may be possible to

experimentally test the ability to switch flux between the terpenoid and phenylpropanoid

metabolic branches by altering expression of key differentially expressed genes using

genetically transformed basil, an objective we are working to achieve.

Gene expression and protein levels observed for the MEP and mevalonate pathways in all

four basil lines clearly suggests that both mono- and sesquiterpenoids produced by basil

glandular trichomes are the products of metabolites derived perhaps exclusively from the

MEP pathway, and that the mevalonate pathway has little or no relevance to production

of these compounds in fully differentiated cell types like the glandular trichomes of sweet

basil. Moreover, the MEP pathway is localized to the plastids (Rodriguez-Concepcion

and Boronat, 2002), but farnesyl diphosphate synthase (FPS) and sesquiterpene synthases,

required for the synthesis of sesquiterpenoids, are cytosolic enzymes (Steele et al., 1998;

Dudareva et al., 2005; Szkopinska and Plochocka, 2005) . The production of

sesquiterpenoids in basil glandular trichomes from MEP pathway products suggests

150

transport of terpenoid precursors (IPP or DMAPP) out of the plastids and into the cytosol.

Experiments, yet to be conducted, that evaluate the incorporation of labeled precursors in the MEP vs. MVA pathways into mono- and sesquiterpenoids in basil glandular

trichomes will be able to further support or refute this hypothesis. However, our results clearly are in line with other reports that have proposed near-exclusive involvement of the MEP pathway in production of specialized (“secondary”) metabolites in other plant species (Kasahara et al., 2002; Dudareva et al., 2005). Furthermore, having both the shikimate and MEP pathways, respectively, which are involved in production of precursors for the phenylpropanoids and terpenoids produced by basil, localized to the plastid allows for direct crosstalk within a single subcellular compartment and reciprocal control of these pathways, as is described in this report.

The difference in production of eugenol- vs. chavicol-derived compounds between lines

EMX-1 and SW appears to be controlled by levels of p-coumaroylshikimate-3'- hydroxylase (C3´H) and of the BAHD acyltransferase, p-coumaroyl-CoA:shikimate p- coumaroyl-transferase (CST), which is the committed step on the pathway to 3´- hydroxylated phenylpropanoids in basil glandular trichomes (Gang et al., 2002a) and vascular tissues in other plants as well (Hoffmann et al., 2005). As mentioned above, the metabolic profile of line EMX-1 featured a dominant methylchavicol peak (Figure 3-1-2).

This line expressed high levels of cinnamoyl-CoA reductase (CCR), cinnamyl alcohol dehydrogenase (CAD), p-coumaryl/coniferyl alcohol acetyl transferase (CAAT), eugenol/chavicol synthase, and CVOMT, the enzymes catalyzing the reactions at the end

151

of the biosynthetic pathway to methylchavicol. No peptide data were obtained for CST in

this line (or in SD), due to the very low expression levels and very high catalytic

efficiencies of BAHD acyltransferases (Hoffmann et al., 2003). RNA expression levels

suggest that CST is not expressed at appreciable levels in line EMX-1, whereas it is

expressed at a level comparable to many other genes in the phenylpropanoid pathway in

line SW. These data support the role of CST in controlling (at least partially) the

production of differentially hydroxylated (or methoxylated) phenylpropanoid pathway-

derived compounds in plants. We also observed a large difference between the four basil

lines in protein levels of C3´H, which catalyzes the addition of the hydroxyl group to the

3´position on the aromatic rings of these compounds and which lies directly downstream

from CST in the pathway. There was a very large (30-fold) difference between lines SW

and EMX-1 in TSC for this enzyme. C3´H enzyme activity in tissues comparable to those

used in this investigation was found to be 9-fold higher in line SW relative to activity in

line EMX-1 (Gang et al., 2002a). In contrast, an apparent opposite difference was

observed for mRNA levels of C3´H in these lines (2-fold higher mRNA levels in EMX-1 vs. SW, see Table 4). These results suggest that protein stability/degradation may play a significant role in regulating the levels of this enzyme and that other factors, such as post- translational modification, may play a role in regulating the activity of this enzyme.

These results also suggest that C3´H may play some role along with CST in regulating production of 3´-hydroxylated phenylpropanoid pathway-derived compounds like eugenol. Thus, the fact that C3'H levels were very low in EMX-1 and that no transcripts for CST were observed in this line can explain the absence of production of meta

152

hydroxylated phenylpropanoids such as eugenol or methyleugenol in this line, in contrast to line SW, which has appreciable expression of these two enzymes and as a result produces eugenol. Lack of CVOMT or EOMT protein or gene expression explains why line SW produces eugenol and relatively much smaller amounts of methyleugenol.

Most enzymes in the phenylpropanoid pathway were expressed at high although not identical levels in both SW and MC, which share many morphological similarities.

Exceptions include C4H and p-coumaryl/coniferyl alcohol acetyl transferase (CAAT), which were expressed at very low levels in line MC and which, in addition to the very high expression of p-coumarate/cinnamate carboxyl methyltransferase, may explain why line MC produces only very low levels of eugenol- or chavicol-derived compounds while it produces very high levels of methylcinnamate. Protein levels of CAAT and eugenol synthase, the two enzymes essential to the biosynthesis of eugenol (Koeduka et al., 2006), were highest in SW, which accumulates eugenol at high levels. In addition, enzymes in the MEP/terpenoid pathway are expressed at high levels in both SW and MC. This led to more “balanced” production of phenylpropanoids and terpenoids, when compared to lines

EMX-1 and SD.

As was previously proposed (Iijima et al., 2004a), differences in specific TPS expression levels can explain differential accumulation of specific mono- and sesquiterpenoids in the four basil lines. For example, 1,8-cineole synthase is expressed at very high levels in line

SW, and at 5.5-fold and 31-fold lower levels in lines MC and EMX-1, respectively.

153

Levels of 1,8-cineole in these three lines (and absence from line SD) mirror these differences (see Figure 3-1-2 and Figure 4-1-1). Production of other compounds such as

α-cadinene, germacrene-D, β-myrcene, terpinolene/terpineol, etc., also mirror the expression and especially the protein levels of their respective upstream terpene synthases (see Figure 3-1-2 and Table 3-1-6).

4.1.3 Transcriptional and post-transcriptional regulation of phenylpropanoid and terpenoid metabolism in basil glandular trichomes

For many enzymes observed in this investigation, differential protein levels among the different basil lines appeared to correspond to parallel differences in mRNA levels. This provided us an opportunity to validate the proteomic and transcriptomic results with each other, as discussed above. Although “-omics” level analyses are generally considered to be hypothesis-generating rather than hypothesis-testing approaches, conclusions derived from such analyses are more reliable when consistent results are obtained from investigations at different levels (Ge et al., 2003). This is due to the fact that it is extremely unlikely to accidentally get the same false results multiple times from experiments that are independent of each other. For example, it is noteworthy that the mRNA levels of all five enzymes in the MVA pathway were consistently very low, confirming a previous report (Iijima et al., 2004a), and their proteins were undetectable.

These observations, obtained from eight separate omics-level datasets, substantiate the conclusion that the MVA pathway does not play a significant role in the production of terpenoids in basil glandular trichomes and that terpenoid precursors are (perhaps)

154 exclusively supplied by the MEP pathway in this single cell type. This appears to be true not only for monoterpenoids, but for sesquiterpenoids as well, which are produced in the cytosol and have been traditionally thought to be produced from precursors derived from the MVA pathway. Indeed, our results support the hypothesis that perhaps only steroid- derived terpenoids are produced by the MVA pathway in plants.

As mentioned above, several enzymes investigated reside at important regulation points in their respective pathways. In all such cases, mRNA and protein levels were consistent with the metabolic profiles in the respective basil line. For example, DAHPS, PAL and

DXS, the entry and control points for shikimate, phenylpropanoid and terpenoid pathways (Bate et al., 1994; Carretero-Paulet et al., 2002; Rodriguez-Concepcion, 2006), respectively, had both protein and mRNA expression profiles that were consistent with the production of phenylpropanoids and terpenoids in the glandular trichomes of specific basil lines (see Figure 4-1-1). Similar correlations were also observed for many other important enzymes, such as 4CL, CCOMT, and IDI, as well as others described above.

The good correlation among mRNA, protein, and metabolite levels in these cases suggests that these pathway nodes are regulated at the transcriptional level and that control of gene expression plays an essential role in the control of the direction of carbon flow in metabolic pathways in basil glandular trichomes.

In addition, many enzymes with consistent protein and mRNA levels were found to be co-regulated with each other. This co-regulation was most clearly seen in line SD, where

155 almost all of the enzymes related to the production of phenylpropanoids were down- regulated, and those related to the production of terpenoids were up-regulated at both mRNA and protein levels. In all lines, PAL and 4CL showed remarkable coordination at the both the mRNA and protein levels, which was consistent with previous reports

(Logemann et al., 1995; Reinold and Hahlbrock, 1997). Similar co-regulation was observed for DXS and IDI. This co-regulation implies common mechanisms for transcriptional control of these enzymes, which are yet to be determined.

On the other hand, the protein and mRNA levels of other groups of enzymes differed greatly from each other. In these cases, the proteomic data were more consistent with the metabolic profiling data. An example is C4H in line MC. The major product of MC does not have a 4-hydroxyl group, and thus its biosynthesis requires no C4H activity. This is consistent with the TSC data, which suggested that the C4H protein, although easily detected in the other basil lines, was not present at detectable levels in line MC. This contrasts with the TEN data, which showed C4H mRNA levels in MC to be the highest among all lines. The same pattern in line MC was not observed for the related cytochrome P450, C3´H, demonstrating that the discrepancy in C4H protein and mRNA levels was not due to inefficient membrane protein analysis in line MC. Indeed, the production of eugenol, the major compound in SW, requires C3´H and eugenol synthase activities. And the levels of these proteins, and not of the respective mRNAs, were more consistent with the eugenol levels in this line compared to other lines. Based these results,

156

it appears that post-transcriptional regulation of these enzymes is more important for

controlling their activities than is gene expression itself.

Our results support the existence of post-translational regulation of several proteins (see

Table 3-1-7). For example, line SD displayed high expression levels for CVOMT mRNA and protein (based on TEN and TSC), but it produced only small amounts of methylchavicol. One possible explanation for this observation could be a lack of precursors for chavicol production due to the down-regulation of enzymes in the phenylpropanoid pathway. However, ubiquitinated CVOMT was found in line SD (Table

3-1-7) aand not in the other lines, which provides an alternative explanation, because

ubiquitination would presumably lead to rapid degradation of this enzyme. In addition,

phosphorylated peptides of PAL were identified only in line SD (Table 3-1-7), and the phosphorylation of PAL has been reported to alter its enzymatic activity (Bolwell, 1992;

Allwood et al., 1999). Elucidation of the biological significance of the PTMs mentioned in this report will require further investigation. Nevertheless, these findings further suggest that metabolism in basil glandular trichomes is controlled on many levels.

Finally, our results suggest that carbon flow can be readily redirected between the phenylpropanoid and terpenoid pathways in specific cell types. This poses significant implications for attempts to metabolically engineer plants to produce desired outcomes.

Clearly, much more research must be done to uncover all of the levels of regulation of

157

these pathways in specific plant cell types before rational design of specific metabolite

profiles in plants like sweet basil can be expected to produce desired results.

4.2 Discussion of results of metabolomic investigation and diarylheptanoid identification

4.2.1 Variation of compound abundances in turmeric rhizomes

PCA and HCA results of LC-MS data, which first separated samples by growth stage, were different from results of GC-MS data, which first separated samples by cultivar

(Figures 3-2-7 to 3-2-10). The GC-MS dataset and LC-MS dataset also have different correlation distributions (Figures 3-2-13 to 3-2-16 and Table 3-2-2). This difference between the two data sets probably reflects differences in the biosynthesis of compounds detected by the two methods. Volatile compounds detected by GC-MS are mainly monoterpenoids and sesquiterpenoids, which are produced from the terpenoid pathway.

And compounds detected by LC-MS were probably mainly phenolic compounds that are mainly produced through the phenylpropanoid and acetate pathways. Therefore, the production of these different groups of was probably regulated differently, which led to different abundance profiles and correlation distributions.

The fertilizer treatments had complex effects on accumulation of the three major curcuminoids. Regarding the results of targeted analysis, compared with the control

(fertilization 0), low concentration of fertilizer (fertilization 1) significantly increased

levels of three major curcuminoids, but treatments with higher concentration of fertilizer

(Fertilization 2 and 3) did not lead to further increase of these levels. In addition, the

158

effect of different fertilizer treatments was not recognized in the PCA and HCA result of both GC-MS and LC-MS data. However, during the sample collection, we noticed that vegetative growth of all samples applied with fertilizer was better than that of control.

Therefore, our results confirmed the distinction between vegetative growth and

accumulation of specialized metabolites in turmeric, which was consistent with previous

reports (Akamine et al., 2007; Hossain and Ishimine, 2007).

Considerable within group variation was found in the results of both targeted and non- targeted analysis. ANOVA results of targeted analyses showed that the effect size of within group variance was relatively large (Figure 3-2-3). Also, outliers were common in

PCA and HCA results of both GC-MS and LC-MS data (Figures 3-2-7 to 3-2-10). Those

results suggested that some unknown factors have significant effects on specialized

metabolism of turmeric, and these factors were not controlled in the greenhouse where

the plants were grown. Initially, we suspected that the unknown factors came from

heterogeneity of the greenhouse environment. However, the outliers were not positioned

close to each other in the greenhouse and appear to be randomly distributed (Figure 4-2-

1). Although the effect of environmental factors can not be fully excluded, it is more

likely that the within group variation was caused by other factors, such as genetic

heterogeneity of the seed rhizomes which were originally purchased from commercial

sources.

159

Targeted analysis

LC-MS result

GC-MS result

Figure 4-2-1 Positions of the outliers (the cells marked with gray color) in the green house

Further investigation on these unknown factors may be a proper next step in this research area. As our results showed, the quality of tumeric rhizomes was highly variable even in the green house where variation of environmental factors was minimized. And it would be more difficult to obtain rhizomes with uniform quality under usual agricultural field

160

conditions. If our inspection is correct, i.e., the large within group variation was really

caused by genetic heterogeneity, genetically uniform seed rhizomes should be planted to solve this problem if homogeneous production of specific compounds is desired.

Alternatively, our results suggest that significant genetic variation exists within the germplasm used in commercial turmeric production, which itself might be advantageous for certain applications, such as targeted breeding, or efforts to maintain variation within the germplasm to protect against future pathogen attack. Amplification of a specific stock using the conventional approach was a slow process, but this process can be readily accelerated using in vitro micropropagation techniques (Ma and Gang, 2006). Otherwise, if the reason for the large variation was some environmental factor that is hard to control during agricultural practice, the problem can be resolved by genetic modification (such as by breeding efforts) targeting the regulation mechanisms that are sensitive to these environmental factors.

4.2.2 Applications of co-regulated metabolite modules

Such hierarchical co-regulated metabolite modules, although not called such, were previously reported in a GC-MS data set from tomato fruit (Tikunov et al., 2005). Our results showed that co-regulated metabolite modules exist in both LC-MS and GC-MS data sets from turmeric rhizomes (Figures 3-2-11 and 3-2-12). Therefore, co-regulated metabolite modules probably universally exist in plants since they were detected by different methods (which target different metabolite groups), in different organs (fruit and rhizome), and from two plant species not taxonomically close to each other: turmeric belongs to Liliopsida (monocot) and tomato belongs to eudicotyledons (eudicot).

161

Compounds in a co-regulated metabolite module may be structurally and biosynthetically related to each other. Therefore, identification of co-regulated metabolite modules can provide valuable information for phytochemistry and plant biochemistry research.

For co-regulated metabolite modules detected in LC-MS/MS data sets, series of possible analogs can be recognized based on the m/z value of the ions, which are directly related to the molecular weight of the compounds (under negative mode, molecular weight of the molecule is equal to the m/z of its deprotonated ion plus one). Those possible analogs have mass shifts that represent chemical modifications common in known compounds in turmeric, such as reduction (+2), dehydrogenation (–2) oxidation (+16), hydration (+18), methylation (+14), and methoxylation (+30), etc.. Existence of analog series, marked in different color in Figure 3-2-11, is obviously helpful for structure identification because identification of one compound greatly facilitates the identification of other compounds in the same analogs series. In addition, based on the mass shifts and predicted chemical modifications, we can even propose possible biosynthetic pathways that generate an analog series (Figure 4-2-3).

After compounds are identified, co-regulated metabolite modules can provide valuable information for the elucidation of their biosynthesis pathways. The HCA-heatmap result of identified compounds in LC-MS data set was shown in Figure 4-2-2. The module marked with a black square in the heatmap corresponded to diarylheptanoids with two hydroxyl groups bonded to the same aromatic ring in the ortho positions. The abundance patterns of these diarylheptanoids were similar to each other, but different to abundance

162

patterns of three major curcuminoids, which formed another co-regulated metabolite module in bottom right corner. This implied that these ortho diols probably have a same

precursor which contains a caffeic acid moiety.

This observation provides insight into the biosynthetic pathway of the three curcuminoids

(Figure 4-2-4). It supported that 3-methoxyl groups of curcumin formed before the formation of the backbone (the top of the figure). The common precursor of 23 and 4

with a caffeic acid moiety is likely to be caffeoyl-CoA. Therefore, at least two polyketide

synthases, with different selectivities, are probably responsible for the formation

of the two groups of diarylheptanoids. One polyketide synthase uses caffeoyl-CoA and

catalyzes the formation of the two ortho diols. Another polyketide synthase, not

preferring caffeoyl-CoA, catalyzes the formation of three major curcuminoids. These two

groups of compounds formed two distinct co-regulated metabolite modules, probably due

to differential regulation of the two polyketide synthases. Otherwise, if 3-methoxyl

groups of curcumin formed after the formation of the backbone (the bottom of the figure),

the compounds 23 and 4 should be intermediates in the formation of the of three major

curcuminoids, and it would be hard to explain why levels of the two ortho diols

correlated with each other, but not with levels of three major curcuminoids.

4.2.3 Plant metabolomics for natural product chemistry

As our results showed, metabolomic analysis is capable of revealing variation of turmeric

rhizome based on levels of hundreds of specialized metabolites, which is consistent with

previous reports about other herbs (Kim et al., 2005a; Mattoli et al., 2006; Yang et al.,

163

2006). Therefore, metabolomics is a powerful tool for the authentication and quality

control of herbs and their products. In addition, due to its ability to identify the correlation between metabolites, plant metabolomics will be helpful for research on plant

biosynthesis pathways. Besides the two applications mentioned above, the large amount

of information collected by plant metabolomics might benefit natural product discovery.

N

25 24 23 6 4 14 21 11 15 28 2 1 3

Figure 4-2-2 The results of HCA analysis using Pearson correlation coefficient between abundance patterns of metabolites identified using LC-MS/MS. (The HCA result is aligned with a heatmap demonstrating correlation between the abundance patterns of the metabolites. Positive Pearson correlation coefficients are red, and negative Pearson correlation coefficients are blue. This figure demonstrates how co- regulated metabolite modules useful for identification of 28 and elucidation of biosynthetic pathway of diarylheptanoids.)

164

DRG.LM1.P2_50.78_509 DRG.LM1.P2_51.95_567 DRG.LM1.P2_36.23_587

+CH3 DRG.LM1.P2_45.38_525 -2H +2H

+OCH3 +CH3 DRG.LM1.P2_48.67_569 +OH DRG.LM1.P2_47.47_555 DRG.LM1.P2_48.42_585 DRG.LM1.P2_53.92_569

+CH3

+OCH3 DRG.LM1.P2_46.67_571 DRG.LM1.P2_39.33_601

Figure 4-2-3 A possible pathway of generation of compounds in analogs in module “a” of LC-MS correlation result

O OH O OH O OH O OR OH HO OMe 4 23 OH HO OH HO Polyketide Polyketide OH OH Synthase Synthase OH Malonyl-CoA Malonyl-CoA

4CL O OH O OH OH OH OMe CST+C3H HCT/CST CCOMT HO O OH O OH O OH Shikimic OH O Acid CoAS CoAS CoAS Caffeoyl 5-O-Shikimate Caffeoyl-CoA Feruloyl-CoA p-Coumaroyl-CoA

2X 2X Malonyl-CoA Polyketide Malonyl-CoA Polyketide Malonyl-CoA Polyketide Synthase Synthase Synthase

O OH O OH O OH OMe MeO OMe HO OH OH Curcumin 1 OH Bisdemethoxycurcumin 3 HO Demethoxycurcumin 2 HO

Hydroxylase OMT Hydroxylase OMT

O OH O OH OH OMe

HO 4 OH HO 23 OH

Figure 4-2-4. Proposed biosynthetic pathway of diarylheptanoids in turmeric (This scheme is adopted from a biosynthetic pathway of curcumin in turmeric proposed by Ramirez-Ahumada et al., 2006. Our observation supports the hypothesis that 3- methoxyl groups are formed before the formation of the backbone (the top of the figure)

165

Conventionally, biologically active compounds are discovered from plants using bioassay-guided fractionation (Figure 4-2-5). For a long period of time, this method was

successfully applied in many plant species and led to discovery of large number of

bioactive principles and lead compounds (Houghton, 2002; Newman and Cragg, 2007).

Advanced analytical methods have been used to assist in natural product discovery. For example, FT-MS and LC-MS/MS were used for “dereplication” to avoid isolation of known compounds (Knight et al., 2003; Steinbeck, 2004; Boroczky et al., 2006), because hundreds of compounds can be readily identified from medicinal plants using database

matching approaches (Murch et al., 2004b). However, this method requires isolation of

structurally unknown active principles, and thus the output is unpredictable (O'Neill and

Lewis, 2002). In addition, when the level of bioactive principle is very low in the plant

sample, the supply of the purified compound could be a serious problem during

development and marketing of a drug.

Plant material

Extract LC or Bioassay LC/MS Yes Yes Crude extract Fractions Active? Pure? Structure Fractionate MS, NMR etc. No

Figure 4-2-5 Generic scheme for bioassay-guided fractionation (Koehn and Carter, 2005).

166

Therefore, we proposed a “metabolomics-guided discovery” approach to overcome these

problems. In this approach, possible bioactive principles are identified and isolated based

on the information collected by plant metabolomic experiment, and then their bioactivity

can be validated using purified compounds (Figure 4-2-6). In previous sections, our results showed that plant metabolomics is able to predict the structures of unknown compounds in complex mixtures based on their MS/MS spectral data and the correlation

analysis. After the structure of a compound is predected, the isolation problem can be

alleviated by using an isolation approach specifically designed for this compound. The

most interesting compounds can be selected as the targets of the isolation, probably using in silico screening tools (Rollinger et al., 2006). In addition, because metabolomic methods can quantify the abundance of unknown compounds, the best starting materials can be select for the isolation. Based on quantitative information, the plant with highest levels of compounds of interest can be chosen for plant breeding to overcome supply problems.

In the section 4.2.2, the variation of abundance of metabolites was used to investigate biosynthesis of specialized metabolites using a correlation approach. It is also possible to use this variation to find the bioactive constituents in complex mixtures using a similar

“correlation bioassay” approach (Figure 4-2-7). In this approach, crude extracts, instead of purified compounds, are used for bioassay. The bioassay results should be correlated with the abundance of the active compound. Using appropriate bioinformatics tools, such as rough set theory (Petit and Maggiora, 2005), interactions between constituents can be

167 recognized from the data set. Due to the existence of co-regulation metabolite modules, it is likely that levels of a group of compounds, not a single compound, are correlated with the biological signal. In that case, those compounds can be isolated and applied to bioassays. The results can provide valuable structure-activity relationship data.

Structures Which compound? Isolation method

Metabolomic “Targeted” Validation of Plant materials analysis isolation the activity

Which material? Abundances

Breeding etc. Supply problem

Figure 4-2-6 A scheme for the proposed “metabolomics-guided discovery”

Plant materials

Bioassay Extract Activity

Possible active Crude extracts Correlation compounds

Metabolomic Abundance analysis

Figure 4-2-7 A scheme for the proposed “correlation bioassay”

168

APPENDIX A SCRIPTS AND COMMAND LINES USED IN THE DATA ANALYSIS

A1. Perl scripts for the proteomic data analysis (1) Perl scripts for the combination of protein entries from different basil cultivars find_overlap_entries.pl #!/usr/local/bin/perl -w # this script is to group all entries that share any protein ID into a single entry

#Zhengzhi Xie 3-14-2007

# (1) read an entry # (2) compare all ID in the entries to existed entries # give a new entry number if no overlaps # add the non-share ID into the entries if "shared ID" is found

# use: perl find_overlap_entries.pl [input_file] # example: perl find_overlap_entries.pl protein_entries_4_lines.txt

@entries = ();

$in = $ARGV[0]; $out = "nr_".$ARGV[0]; open (INFILE, "<$in") || die("Cannot read file $in\n"); open (OUTFILE, ">$out") || die("Cannot write file $out\n");

while ($line = ){ chomp $line;

169

print "\n$line"; @item = split(/,/,$line); $judge = 0; $e_number = 2000; foreach $it (@item){ $current_e_number = 0; foreach $e (@entries) { if ($e =~ /$it/) {$judge = 1; $e_number = $current_e_number; print"*";} # judging if there is a shared ID $current_e_number++; } if ($judge == 1){last;} } if($judge == 0){push @entries, $line;} # give a new entry number if no overlaps else{ # add the non-share ID into the entries if "shared ID" is found foreach $it2 (@item){ if ($entries[$e_number] =~ /$it2/){next;} else{$entries[$e_number] = $entries[$e_number].",".$it2;} } } }

#output foreach $x (@entries) { print OUTFILE "$x\n"; } close INFILE; close OUTFILE;

170

assign_single_ID.pl #!/usr/local/bin/perl -w # this script is to assign a representative ID to each of the entries # and to find whether one entry contains IDs from basil12 or uniprot # Rule for select representative ID: # a. Times that this ID appears in the dataset 4>3>2>1 (share by more lines) # b. Basil12 ID > Uniport ID # c. Alphabetic order (ASCII-betical sort)

#Zhengzhi Xie 3-14-2007

# (1) read an entry from nr_entry_file, and split the entry into single IDs # (2) search the line_entry_file to find how many times it occurs # (3) find whether the entry contains Basil12 or uniprot IDs and assign other order

# use: perl assign_single_ID.pl [nr_entry_file] [line_entry_file] # example: perl assign_single_ID.pl !nr_protein_entries_4_lines.txt protein_entries_4_lines.txt

@line_entries = (); %Basil12 =(); %Uniprot = ();

$in = $ARGV[1]; open (INFILE, "<$in") || die("Cannot read file $in\n"); @line_entries = ; close INFILE;

171

$in = $ARGV[0]; $out = "singleID_".$ARGV[0]; open (INFILE, "<$in") || die("Cannot read file $in\n"); open (OUTFILE, ">$out") || die("Cannot write file $out\n");

while ($line = ){ %ID_score = (); @candidate =(); @candidate2=(); $singleID = ""; $highest_score =0; chomp $line; $Basil12{$line}=0; $Uniprot{$line}=0; print "\n$line\n"; @item = split(/,/,$line);

foreach $it (@item){ $count = 0; foreach $e (@line_entries){ # (2) search the line_entry_file to find how many times it occurs if ($e =~ /$it/) { $count++;} } $ID_score{$it} = $count * 100; if ($it =~ /Basil12_/) {$Basil12{$line}=1;$ID_score{$it}++;} else {$Uniprot{$line}=1;} if($highest_score < $ID_score{$it}){$highest_score = $ID_score{$it};}

172

}

foreach $it2 (@item){ if ($ID_score{$it2} ==$highest_score){push(@candidate, $it2); print "$it2,"; } } @candidate2 = sort { lc($a) cmp lc($b) } @candidate; # c. ASCII-betical sort http://www.perlmonks.org/?node_id=128722 $singleID = $candidate2[0]; print "---$singleID---$ID_score{$singleID}"; print OUTFILE "$line\t"; print OUTFILE "$singleID\t"; print OUTFILE "$Basil12{$line}\t"; print OUTFILE "$Uniprot{$line}\n";

}

close INFILE; close OUTFILE; singleID_to_lines.pl #!/usr/local/bin/perl -w # this script is to assign the representative IDs (single IDs) to the protein entries of each line

#Zhengzhi Xie 3-14-2007

# (1) read an entry from entry_file for all lines, and split the entry into single IDs # (2) search the singleID_nr_entry_file for the combined entry containing this ID # (3) output the original entry, the combined entry, and the representative ID

173

# use: perl singleID_to_lines.pl [singleID_nr_entry_file] [line_entry_file] # example: perl singleID_to_lines.pl singleID_!nr_protein_entries_4_lines.txt protein_entries_4_lines.txt

%single_ID = (); %Basil12 =(); %Uniprot = ();

$in = $ARGV[0]; open (INFILE, "<$in") || die("Cannot read file $in\n"); while ($line = ){ @item1 = split(/\t/,$line); $single_ID{$item1[1]} =$item1[0]; # the three hash arrays contains all information from singleID_nr_entry_file $Basil12{$item1[1]} =$item1[2]; $Uniprot{$item1[1]} =$item1[3]; } close INFILE;

$in = $ARGV[1]; $out = "lines_".$ARGV[0]; open (INFILE, "<$in") || die("Cannot read file $in\n"); open (OUTFILE, ">$out") || die("Cannot write file $out\n");

while ($line = ){ chomp $line;

174

print "\n$line\n"; @item = split(/,/,$line); $ID = ""; foreach $it (@item){ foreach $key (keys %single_ID) { if ($single_ID{$key} =~ /$it/) { $ID = $key; last;} # $ID is the combined entry containing ID in $it } } print OUTFILE "$line\t"; print OUTFILE "$ID\t"; print OUTFILE "$single_ID{$ID}\t"; print OUTFILE "$Basil12{$ID}\t"; print OUTFILE "$Uniprot{$ID}";

}

close INFILE; close OUTFILE; all_tfa_description.pl #!/usr/local/bin/perl -w # this script is to extract all description lines from a tfa file #Zhengzhi Xie 3-15-2007

# (1) read a line from a tfa file # (2) if it start with ">", then write it into the output file

# use: perl all_tfa_description.pl [tfa_file]

175

# example: perl all_tfa_description.pl basil_plant_protein3.tfa

$in = $ARGV[0]; $out = $ARGV[0]."_des.txt"; open (INFILE, "<$in") || die("Cannot read file $in\n"); open (OUTFILE, ">$out") || die("Cannot write file $out\n"); while ($line = ){ if ($line =~ /^>/) { @item = split(/\s/,$line); $item [0] =~ s/>//; print OUTFILE "\n"; print OUTFILE $item[0]; print OUTFILE "\t"; splice(@item,0,1); foreach $t (@item){ $t = $t." "; } print OUTFILE @item; } else {next;}

} close OUTFILE; close INFILE;

ID_description.pl #!/usr/local/bin/perl -w

176

# this script is to extract the description of IDs in a list from a description file (output of all_tfa_description.pl)

#Zhengzhi Xie 3-15-2007

# (1) read the ID list # (2) read the description file, and write the line if the ID is in the list

# use: perl ID_description.pl [ID_list_file] [description_file] # example: ID_description.pl all_IDs.txt basil_plant_protein3_des.txt

@ID = (); $in = $ARGV[0]; open (INFILE, "<$in") || die("Cannot read file $in\n"); @ID = ; foreach $x (@ID){ chomp $x; } close INFILE;

$in = $ARGV[1]; $out = "IDs_".$ARGV[1]; open (INFILE, "<$in") || die("Cannot read file $in\n"); open (OUTFILE, ">$out") || die("Cannot write file $out\n"); while ($line = ){ @item = split(/\t/,$line); $judge = 0; foreach $i (@ID){

177

if ($item[0] eq $i){$judge = 1; print "$line"; last;} } if ($judge == 1){print OUTFILE "$line";} }

close OUTFILE; close INFILE;

organize_des.pl #!/usr/local/bin/perl -w # this script is to organize the description of IDs according to their entry groups

#Zhengzhi Xie 3-15-2007

# (1) read an line from a tfa file # (2) if it start with ">", then write it into the output file

# use: perl organize_des.pl [description_file] [entry_group_file] # example: perl organize_des.pl IDs_basil_plant_protein3_des.txt singleID_!nr_protein_entries_4_lines.txt

%des_ID = (); $in = $ARGV[0]; open (INFILE, "<$in") || die("Cannot read file $in\n"); while ($line = ){ @item1 = split(/\t/,$line); $des_ID{$item1[0]}= $item1[1]; } close INFILE;

178

$in = $ARGV[1]; $out = "des_".$ARGV[1]; open (INFILE, "<$in") || die("Cannot read file $in\n"); open (OUTFILE, ">$out") || die("Cannot write file $out\n"); while ($line = ){ @item = split(/\t/,$line); print OUTFILE "\n\n\nRepresentative ID: "; print OUTFILE "$item[1]\t"; print OUTFILE "$des_ID{$item[1]}\n"; @item2 = split(/,/,$item[0]); foreach $it2 (@item2){ print OUTFILE "$it2\t"; print "$it2\n"; print OUTFILE "$des_ID{$it2}"; } } close OUTFILE; close INFILE;

basil_or_uniport.pl #!/usr/local/bin/perl -w # this script is to find whether a entry contain basil12 or uniprot ID.

#Zhengzhi Xie 4-3-2007

# use: perl basil_or_uniport.pl [input_file] # example: perl basil_or_uniport.pl all_protein_entries.txt

179

$in = $ARGV[0]; $out = "Basil_Uniprot".$ARGV[0]; open (INFILE, "<$in") || die("Cannot read file $in\n"); open (OUTFILE, ">$out") || die("Cannot write file $out\n"); while ($line = ){ chomp $line; @item = split(/,/,$line); $Basil_judge = 0; $uniprot_judge = 0; foreach $it (@item){ if ($it =~ /Basil12/) {$Basil_judge = 1;}else {$uniprot_judge = 1;} # judging if this is a Basil12 ID } print OUTFILE "$line\t"; print OUTFILE "$Basil_judge\t"; print OUTFILE "$uniprot_judge\n";

} close INFILE; close OUTFILE;

180

(2) Perl scripts for the calculation of TSC and TEN based on TPP and blast results

Find_peptides.pl #!/usr/local/bin/perl -w #Zhengzhi Xie (Script Created 1-16-2007, Comments added 5-9-2007) #David Gang's Lab, Department of Plant Sciences, University of Arizona

# Aim: # This script is to extract the records of specific protein IDs from # the Excel result of TPP (TPP version: v2.9.5 GALE, Institute for Systems Biology), # summarize the relative information about proteins and peptides, # and calculate the TSC (total spectrum count) value for the input protein ID list.

# Input files: # The Excel result of TPP in fact a txt file. # After the extended name changed from xls to txt, the file is readable by Perl script. # For example: interact-prot_EMX.xls to interact-prot_EMX.txt

# The specific protein IDs are saved at a txt file named XXXX_ID.txt. # One ID for each line. # The IDs will be used to match the "protein" column of Excel result of TPP.

#output files: # This script write 5 output files. The most important one is named !_XXXX_nr_sum.txt # This output file a txt file with Tabs as delimiters, so it can be opened using Excel. # This output file has 3 part. # The TSC value is at the 1st part, the 4th row: "TPP spectrum count (TSC): XXXX"

# Pseudocode: # (1) read the ID list into an array # (2) read the output of TPP and write the records matching the IDs

181

# (3) read the output of last step and write the records fulfilling the condition. # condition: protein_probability >=0.95 and ($nsp_adjusted_probability >= 0.95 or $initial_probability >=0.95) # (4) read the output of last step and write the record fulfilling the condition (eliminate redundant records). # condition: group by peptide_sequence and n_instances # (5) read the output of above steps and write the summery # note: step (3) and (4) are for the calculation of "over all total spectrum count", # which counts in the shared peptides with good probabilities. # Step (3) and (4) have no impact on the calculation of TPP spectrum count (TSC)

# Use: # Command line: perl Find_peptides.pl protein_ID_list TPP_output # Example: perl Find_peptides.pl PALY_ID.txt interact-prot_MC.txt # PALY_ID.txt is the list of protein IDs # interact-prot_MC.txt is the TPP output (for line MC) # Above input files must be in the same folder of this script # This script was used under ActivePerl 5.8.8, but it has not been tested under other perl environments # Example input and output files are in the folder: X:\xie\dissertation\proteomics\perl scripts\Find_peptides_pl

# (1) read the ID list into an array open (INFILE, "<$ARGV[0]") || die("Cannot read file $ARGV[0]\n"); @IDs = ; foreach $a (@IDs) { chomp $a; } $protein_all_lines = @IDs; close INFILE;

182

# (2) read the output of TPP and write the record fulfilling following condition. # condition: ID in the list open (INFILE, "<$ARGV[1]") || die("Cannot read file $ARGV[1]\n");# TPP output @item2 = split(/\./,$ARGV[0]); @item3 = split(/\./,$ARGV[1]); $out1 = $item2[0].$item3[0]; $out =$out1."_all.txt"; # the file name should not have too many "." open (OUTFILE, ">$out") || die("Cannot write file $out\n");

%protein_line = (); # this is a hash to store protein information in a specific line $protein_name = ""; $description = ""; $protein_probability = 0;

while ($line = ){ unless ($line =~ /\t/){next;} # skip the blank lines $judge = 0; @item = split(/\t/,$line); $ID_group = $item[2]; @single_ID = split(/,/,$ID_group); foreach $b (@single_ID) { foreach $c (@IDs) { if ($b eq $c){ $judge = 1; $protein_name = $b; $description = $item[9]; $protein_probability = $item[4]; $tot_num_peps = $item[7]; $entry_no = $item[0];

183

} } } if ($judge == 1) { print OUTFILE $line; unless ($protein_probability <0.95){ $protein_line{$protein_name} = $entry_no."\t".$ID_group."\t".$protein_probability."\t".$description."\t".$tot_num_peps; } } } close OUTFILE; close INFILE;

# (3) read the output of last step and write the record fulfilling the condition. # condition: protein_probability >=0.95 and ($nsp_adjusted_probability >= 0.95 or $initial_probability >=0.95) open (INFILE, "<$out") || die("Cannot read file $out\n");# TPP output $out =$out1."_095.txt"; open (OUTFILE, ">$out") || die("Cannot write file $out\n");

while ($line = ){ $judge = 1; @item = split(/\t/,$line); $protein_probability = $item[4]; $nsp_adjusted_probability = $item[15]; $initial_probability = $item[16];

184

#protein probability must >=0.95 if ($protein_probability < 0.95) {$judge = 0;} #$nsp_adjusted_probability or initial_probability must >= 0.95 if (($nsp_adjusted_probability < 0.95) && ($initial_probability <0.95)) {$judge = 0;} if ($judge == 1) {print OUTFILE $line;} } close OUTFILE; close INFILE;

# (4) read the output of last step and write the record fulfilling the condition. # condition: group by peptide_sequence and n_instances open (INFILE, "<$out") || die("Cannot read file $out\n");# TPP output $out =$out1."_095_q.txt"; open (OUTFILE, ">$out") || die("Cannot write file $out\n");

%peptide_line = (); # this is a hash to store peptide information in a specific line @all_sequence_n = (); # different charge state of peptide will be counted independently while ($line = ){ $judge = 1; @item = split(/\t/,$line); $peptide_sequence = $item[13]; $n_instances = $item[19]; $sequence_n = $peptide_sequence.$n_instances;

foreach $d (@all_sequence_n) { if ($d eq $sequence_n) {$judge = 0;} } if ($judge == 1) { push (@all_sequence_n, $sequence_n); print OUTFILE $line;

185

$peptide_line{$sequence_n}= $peptide_sequence."\t".$n_instances."\t".$item[15]."\t".$item[16]; # $nsp_adjusted_probability = $item[15]; $initial_probability = $item[16]; } } close OUTFILE; close INFILE;

# (5) read the output of above steps and write the summery open (INFILE, "<$out") || die("Cannot read file $out\n");# TPP output $out ="!_".$out1."_095_nr_sum.txt"; open (OUTFILE, ">$out") || die("Cannot write file $out\n");

#caculate the varables useful for the output $sum_n = 0; @protein_list_line =(); while ($line = ){ @item = split(/\t/,$line); $n_instances = $item[19]; $sum_n = $sum_n + $n_instances }

$protein_group_count = scalar(keys %protein_line); $peptide_count = scalar(keys %peptide_line);

$individual_protein_count = 0; $TPP_spectra_count = 0; foreach $g (keys %protein_line) { @item4 = split(/\t/,$protein_line{$g}); $TPP_spectra_count = $TPP_spectra_count + $item4[4];

186

@item5 = split(/,/,$item4[1]); foreach $h (@item5){ push (@protein_list_line, $h); } } $individual_protein_count = @protein_list_line;

#output: the head of the file print OUTFILE "The output of $ARGV[0] and $ARGV[1]\n"; print OUTFILE "protien number in all lines (probability >= 0.95):\t\t$protein_all_lines\n"; print OUTFILE "over all total spectrum count: \t\t$sum_n\n"; # this counts in the shared peptides with good probabilities. print OUTFILE "TPP total spectrum count(TSC): \t\t$TPP_spectra_count\n"; # this is the sum of the original TPP output. print OUTFILE "total peptide count(TPC): \t\t$peptide_count\n"; print OUTFILE "protien (group) number in this lines (probability >= 0.95):\t\t$protein_group_count\n"; print OUTFILE "individual protien number in this lines (probability >= 0.95):\t\t$individual_protein_count\n"; #output: detail of proteins print OUTFILE "------\n"; print OUTFILE "\ndetail of proteins (with group probability >=0.1)\n"; print OUTFILE "marker\tentry_no\tprotein_name\tprotein_probability\tdescription\ttot_num_peps\n"; foreach $e (keys %protein_line) { print OUTFILE ">\t$protein_line{$e}\n"; } #output: detail of peptides print OUTFILE "------\n"; print OUTFILE "\ndetail of peptides\n"; print OUTFILE "peptide_sequence\tn_instances\tnsp_adjusted_probability\tinitial_probability\n"; foreach $f (keys %peptide_line) {

187 print OUTFILE "$peptide_line{$f}\n"; } close INFILE; close OUTFILE;

#output to another file for protein ID in this specific line open (INFILE, "<$out") || die("Cannot read file $out\n");# TPP output $out ="__".$out1."_protein_list.txt"; open (OUTFILE, ">$out") || die("Cannot write file $out\n");

%protein_p_num = (); while ($line = ){ unless ($line =~ /^>/){next;} @item6 = split(/\t/,$line); $ID_group = $item6[2]; $tot_num_peps = $item6[5]; @item7 = split(/,/,$ID_group); $redundancy = @item7; # how many protein in one entry (they are not distinguishable) foreach $m (@item7){ $p = $tot_num_peps/$redundancy; $q = int($p * 100)/100; #round up to two digits after decimal point $protein_p_num{$m} = $q; } } foreach $k (@protein_list_line){ print OUTFILE "$k\t$protein_p_num{$k}\t+\n"; } close INFILE; close OUTFILE;

188

IDtoSeq.pl #!/usr/local/bin/perl -w # get fasta sequence of EST using its ID #Zhengzhi Xie 11-11-2005

#this script read the ID list, find the sequence from a fasta file, and write them into a new file

#use: perl IDtoSeq.pl [list] [fasta.file.input] [fasta.file.output]

use Bio::SeqIO; use Bio::Seq; $in = Bio::SeqIO->new(-file => "$ARGV[1]" , '-format' => 'Fasta');# where the sequence come from $out = Bio::SeqIO->new(-file => ">$ARGV[2]" , '-format' => 'Fasta'); open (INFILE, "<$ARGV[0]"); @NRID = (); @NRID = ; close INFILE; foreach $ID (@NRID) { chomp $ID; $ID =~ s/\|/__/g; # take care the case of gi|XXXX||XXX } while ( my $seq = $in->next_seq() ) { $key = $seq->display_id; chomp $key; $key =~ s/\|/__/g; # take care the case of gi|XXXX||XXX foreach $ID (@NRID) {

189

if ($ID eq $key) { print $key; print "***"; print "$ID\n"; $out->write_seq($seq); } } }

Find_target_EST.pl #!/usr/local/bin/perl -w # this script is to extract the information from Blast result for manual validation # the input file is the result of blast with option -m 9, file name: [gene_short_name].blast # the blast command like: blastall -p blastp -d 095_4lines -i PALY.tfa -o PALY.blast -e 1e-10 -m 9 # the output file is a txt file with Tab as delimiter. The file contains Subject id, Query id, e-value, # and score of a Subject id. if there are multiple hits, only the one with best sore will be included. # output filename: [gene_short_name]_blast_result.txt

# the script works as following steps: # (1) get all the lines with Subject ids from the result of blast--> outfile: blast_no_header.txt # the file contain: Subject id, Query id, e-value, and score # (2) compare the scores on lines for a id, write the line with highest scores into output file: highest_sores.txt # (3) search the Subject id in the data base source file (fasta format) and add description into the result file

#Zhengzhi Xie 1-16-2007

# use: perl Find_target.pl blast_output_file database_source_file

190

# example: perl Find_target.pl PALY.tfa PALY.blast IDs_095_4lines.tfa # PALY.tfa is the file contains the seed peptide sequences from uniprot searching # PALY.blast is the result of blast # IDs_095_4lines.tfa is the source file of the database (fasta format) # above files must be in the same folder of this script

unless(@ARGV == 3 ) {die "please give the filenames of blast output file, and source file of the database\n";}

# (1) get all the lines with Subject ids from the result of blast--> outfile: blast_no_header.txt #print "get all the est ID from the result of tblastn\n"; open (INFILE, "<$ARGV[1]") || die("Cannot read file blast result of $ARGV[1]\n");# the input file that contains blast results open (OUTFILE, ">blast_no_header.txt") || die("Cannot write file blast_no_header.txt\n");# output file

#search each query sequence while ($line = ){ if ($line =~ /^#/) {next; } else { @item = split(/\t/,$line); print OUTFILE $item[1]; print OUTFILE "\t"; print OUTFILE $item[0]; print OUTFILE "\t"; print OUTFILE $item[10]; print OUTFILE "\t"; print OUTFILE $item[11];

} } close OUTFILE;

191

close INFILE;

# (2) compare the scores on lines for a id, write the line with highest scores into outputfile: highest_sores.txt open (INFILE, "highest_sores.txt") || die("Cannot write file highest_sores.txt\n");# output file

#compare scores, and store the lines with highest socre at %id_line

%scores = (); %id_line = (); @subject_id_array = (); while ($line = ){ chomp $line; @item1 = split(/\t/,$line); $subject_id = $item1[0]; $score = $item1[3];

unless ($scores{$subject_id}) { $scores{$subject_id} = 0.001; }# Prevent the program to complain about undefined value

if($scores{$subject_id} < $score){ $scores{$subject_id} = $score; $id_line{$subject_id} = $line; } }

# output

192 foreach $k (keys (%id_line)){ push(@subject_id_array, $k); print OUTFILE $id_line{$k}; print OUTFILE "\n"; } close OUTFILE; close INFILE;

# (3) search the Subject id in the data base source file (fasta format) and add description into the result file use Bio::SeqIO; use Bio::Seq; $in = Bio::SeqIO->new(-file => " $ARGV[2]" , '-format' => 'Fasta');# where the sequence come from @item4 = split(/\./,$ARGV[0]); $out = $item4[0]."_r_result.txt"; open (OUTFILE, ">$out") || die("Cannot write file $out \n");# output file while ( my $seq = $in->next_seq() ) { $key = $seq->display_id; $descrip = $seq->description; chomp $key; foreach $ID (@subject_id_array) { if ($ID eq $key) { @item5 = split(/_/,$ID); $new_ID = $item5[1]."_".$item5[2]; print $key; print "***"; print "$new_ID\n"; $id_line{$ID} =~ s/$ID/$new_ID/; $id_line{$ID} = $id_line{$ID}."\t".$descrip; }

193

} } # output foreach $k (keys (%id_line)){ push(@subject_id_array, $k); print OUTFILE $id_line{$k}; print OUTFILE "\n"; } close OUTFILE;

#(4) if there are multiple ORF, leave only one of them (the one with the best sore) open (INFILE, "<$out") || die("Cannot read file blast result of $out\n"); $out = "!".$item4[0]."_EST_nr_result.txt"; open (OUTFILE, ">$out") || die("Cannot write file $out \n");

%scores = (); %id_line = (); while ($line = ){ chomp $line; @item1 = split(/\t/,$line); $subject_id = $item1[0]; $score = $item1[3];

unless ($scores{$subject_id}) { $scores{$subject_id} = 0.001; }# Prevent the program to complain about undefined value

if($scores{$subject_id} < $score){ $scores{$subject_id} = $score; $id_line{$subject_id} = $line; }

194

}

# output foreach $k (keys (%id_line)){ print OUTFILE $id_line{$k}; print OUTFILE "\n"; }

close OUTFILE; close INFILE;

Contig_number.pl #!/usr/local/bin/perl -w # get contig number from the whole EST ID (with ORF) #Zhengzhi Xie 11-18-2007

#this script read the ID list, find the sequence from a fasta file, and write them into a new file

#use: perl IDtoSeq.pl [list] [fasta.file.input] [fasta.file.output] #(1) input from screen while ($line = <>) { chomp ($line); if($line eq ""){ last; }else{push(@x, $line);} }

195

#(2) trim the whole EST ID (with ORF) foreach $y (@x){ @item5 = split(/_/,$y); $new_ID = $item5[1]."_".$item5[2]; push(@new_x, $new_ID); }

#(3) get ride of redundancy and output

$out = "contig_number.txt"; open (OUTFILE, ">$out") || die("Cannot write file $out \n"); @new_y = (); foreach $z (@new_x){ $judge =0; foreach $f (@new_y){ if ($z eq $f){$judge = 1;} } if($judge == 0){ print OUTFILE "$z\n"; push(@new_y, $z);} } close OUTFILE;

196

(3) Perl scripts to assistant in the functional categorization based on GO controlled vocabulary

find_GO.pl #! C:/Perl/bin -w #Zhengzhi Xie (Script Created 2-17-2006, Comments added 5-9-2007) #David Gang's Lab, Department of Plant Sciences, University of Arizona

# Aim: # To select GO terms of specific Uniprot IDs from a big GO file, # which is too big to handle using MS Access.

# Input files: # The big GO file is a txt file which unzipped from gene_association.goa_uniprot.gz. # The zipped file is downloaded from EBI (ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/)

# The ID_file is txt file with Uniprot IDs of interest. One ID per line.

# Output file: # The file name is input file name.GO. It is a smaller GO file. # It only contains GO terms of IDs of interest.

# Screen output: # number XXXXXX_XXXXX # number: No. of GO term found # XXXXXX_XXXXX: Uniprot ID

# use: # perl find_GO.pl big_GO_file ID_file

197

# example: perl find_GO.pl test_GO.txt test_ID.txt # Above input files must be in the same folder of this script # This script was used under ActivePerl 5.8.8, but it has not been tested under other perl environments

$GOA = $ARGV[0]; @out = split(/\./, $ARGV[1]); $outfile = $out[0].".GO"; $k = 1;

open (INFILE,"<$ARGV[1]") || die "could not open the input file $ARGV[0]"; @ID = ; foreach $i (@ID){chomp $i;} close INFILE;

open (INFILE,"<$GOA") || die "could not open the input file $GOA"; open (OUTFILE,">$outfile") || die "could not open the output file $outfile"; while ($line = ){ foreach $i (@ID){ if($line =~ /$i/){ print OUTFILE "$line"; print"$k\t$i\n"; $k++; } } } close INFILE;

198

close OUTFILE;

Install go-slime at Amadeus and Commands to run go-slime a. Install the program at Amadeus ~ i. The information of the program is available at http://search.cpan.org/~cmungall/go-perl/scripts/map2slim ii. The script is part of the go-perl package, available from CPAN, http://search.cpan.org/~cmungall/go-perl/ iii. Install the perl module according to the instruction at http://search.cpan.org/src/CMUNGALL/go-perl- 0.06/INSTALL.html. Use Protocol III - installing modules in your home directory (no requirement for the administer privilege: IO-String ÆData- StagÆgo-perl 1. the current perl version is 5.8.8, so some of the paths need to be modified. 2. use gunzip before tar –xvf XXXX iv. Before use, setup. [they should be at the startup file, but when I modified the file, vnc stopped working. So it is better to type them in each time before use]

setenv PATH "${PATH}:$HOME/bin" setenv PERL5LIB "${PERL5LIB}:$HOME/lib/perl5/site_perl/5.8.0/" setenv PERL5LIB "${PERL5LIB}:$HOME/lib/perl5/site_perl/5.8.8/"

b. Download the obo files, and upload them at amadeus i. http://www.geneontology.org/GO_slims/goslim_plant.obo [date: 14:12:2006 19:30] upload it at ~/go-perl-0.06/GO/GO_slims ii. http://www.geneontology.org/ontology/gene_ontology.obo [date: date: 14:12:2006 19:30] upload it at ~/go-perl-0.06/GO/ontology iii. Upload final_list_union_all_4lines_GO.txt and change the file name into final_list_union_all_4lines_GO.fb c. Run the script: the first command gives a file with protein-GO association table. The second one gives a file with a table to show how many proteins are found under each GO term cd /home/pharmtox/zhengzhi/go-perl-0.06/GO map2slim GO_slims/goslim_plant.obo ontology/gene_ontology.obo gene- associations/final_list_union_all_4lines_GO.fb -o final_4lines_GOslim.out

* -b bucket slim file

cd /home/pharmtox/zhengzhi/go-perl-0.06/GO

199

map2slim GO_slims/goslim_plant.obo ontology/gene_ontology.obo gene- associations/final_list_union_all_4lines_GO.fb -c -o final_4lines_GOslim_number.out

(4) Perl scripts to process X!tandem result and find the proteins with the specific modification. Find_modification.pl #!/usr/local/bin/perl -w # this script is to extract the information from X!tandem search result for protein modification # the input file is the xml result of X!tandem

# the out put file is a txt file which contains all peptide sequences with specific # modification (mass shift)

# the script works as following steps: # (1) read all peptide lines in the xml file and write them into a txt file if the string is found # (2) get ride of redundancy, and organize the file into a tab limited txt file.

#Zhengzhi Xie 1-31-2007

# usage: perl Find_modification.pl xml_result mass shift # example: perl Find_modification.pl SW_sol_2_phosph2.xml 80 # SW_sol_2_phosph2.xml is the xml result file of X!tandem # 80 is the mass shift for phosphorylation. In the xml file: # xml file must be in the same folder of this script

unless(@ARGV == 2 ) {die "please give enough input parameters\n";}

200

$string = $ARGV[1];

# (1) read all peptide lines in the xml file and write them into a txt file if the string is found open (INFILE, "<$ARGV[0]") || die("Cannot read xml file result $ARGV[0]\n");# the input file that contains blast results @item4 = split(/\./,$ARGV[0]); $out = $item4[0]."_lines_peptide.txt"; open (OUTFILE, ">$out") || die("Cannot write file $out\n");# output file

while ($line = ){ if ($line =~ /

} close OUTFILE; close INFILE;

# (2) get ride of redandency, and organize the file into a tab limited txt file. %expect = (); open (INFILE, "<$out") || die("Cannot read xml file result $ARGV[0]\n");

201

$out = $item4[0]."_nr_peptide.txt"; open (OUTFILE, ">$out") || die("Cannot write file $out\n");# output file while ($line = ){ @item = split(/\"/,$line); $seq = $item[29]; $expect_v = $item[7]; unless (defined $expect{$seq}){$expect{$seq} = 10;} # stop the program complain about " uninitialized value" if ($expect{$seq} > $expect_v){$expect{$seq} = $expect_v;} # store the maxium value } foreach $k (keys (%expect)){ if($k =~ /X/i){next;} #exclude if the peptide has X in its sequence unless ($k =~ /(R|K)$/i) {next;} # exclude if the peptide not end with R or K print OUTFILE "$k\t\t\t"; print OUTFILE "$expect{$k}\n"; } close OUTFILE; close INFILE; open_IE_for_sequence.pl #!/usr/local/bin/perl -w # this script should be used after Find_modification.pl # the input files are XXXX_XXX_XXX_nr_peptide.txt and 1.htm # create 1.htm: upload the X!tandem xml result, and save it as "Web Page, HTML only" file with the name 1.htm

# this script will open new windows of the specfic peptide result for manual review # after one window is closed, it will give another, until all peptide sequences in the XXXX_XXX_XXX_nr_peptide.txt have been reviwed .

# the script works as following steps:

202

# (1) read all lines in the 1.htm, and assign the link to its peptide sequence # (2) read XXXX_XXX_XXX_nr_peptide.txt, open window from thegpm seriver. if the sequence is not found, open google.

#Zhengzhi Xie 2-10-2007

# usage: perl open_IE_for_sequence.pl XXXX_XXX_XXX_nr_peptide.txt # example: perl open_IE_for_sequence.pl SD_mem_ubiq_nr_peptide.txt

#!!!note!!! #------a batch file named "e.bat" in the current folder with one line code: #@start "" /b "C:\Program Files\Internet Explorer\iexplore.exe" %* #------

# input files and the batch file must be in the same folder of this script

# (1) read all lines in the 1.htm, and assign the link to its peptide sequence %address =(); $version = 0; open (INFILE, "<1.htm") || die("Cannot read 1.htm\n"); while ($line = ){ if ($line =~ /href=/) { @item = split(/\"/,$line); $temp_address = $item[1]; # this is the line from the gpm server

$line = ; @item = split(/

if ($temp_sequence=~ /target=/){ @item2 = split(/>/,$temp_sequence);

203

$temp_sequence = $item2[1]; # in some files (from Arua?), the line like "target="sequence 19831">DVNAAVATIK" $version = 1; }

$temp_sequence =~ s/\W//g; # get ride of all blanks

if ($version ==0) {$temp_address = "http://h319.thegpm.org/".$temp_address;} # thegpm update durin data analysis $address{$temp_sequence} = $temp_address; } } close INFILE;

# (2) read XXXX_XXX_XXX_nr_peptide.txt, open window from thegpm seriver. open (INFILE, "<$ARGV[0]") || die("Cannot read file $ARGV[0]\n");# the input file that contains blast results while ($line = ){ @item1 = split(/\t/,$line); $sequence = $item1[0]; print "$sequence\n";

$x = 0; foreach $k (keys (%address)){ if ($k eq $sequence) {`e "$address{$k}"`; $x =1;} # open windows if sequence found } if ($x == 0) {`e "http://www.google.com/"`;} # open google if sequence not found } close INFILE;

204

A2. Scripts and command lines for metabolomic data analysis (1) An example of R commands used to generate heatmaps and carry out HCA analysis based on metabolomic data datafile<-"X:/xie/experiment/metabolomics/4-10- 2007_Metabolomic_result_analysis/LC_auto.txt" data <-read.table(datafile, header = TRUE, sep="\t", row.names = 1) data2<-data.matrix(data) data3 <- t(data2) library(Heatplus) library(gplots) heatmap_2(data2, hclustfun = function(m) hclust(m, method = "ward"), do.dendro = c(TRUE, TRUE), legend = 4,legfrac = 12, col = bluered(128), trim=0.03)

## stop and save the figure

K <- cor(data2[,1:139]) heatmap.2(K, hclustfun = function(m) hclust(m, method = "ward"), col = bluered(16), cexRow = .7, cexCol = .7, symm = TRUE, dend = "row", trace = "none", main = "LC-MS turmeric Rhizome") ## stop and save the figure write.table(K ,file = "X:/xie/experiment/metabolomics/4-10- 2007_Metabolomic_result_analysis/Cor_heatmap/LC_auto_cor.txt", quote = FALSE, sep = "\t") rm(list=ls()) ## using SPSS to count the frequency

205

(2) Perl script to scramble data matrix for correlation distribution analysis scramble_row.pl #! C:/Perl/bin -w # Zhengzhi Xie 4-17-2006

# Aim: # This script is to scramble a numeric matrix in row dimension

# Use: perl scramble_row.pl input_file # input_file is a txt file which contains a numeric matrix, and uses Tabs as delimiters. # Example: perl scramble_row.pl test_matrix.txt

# Pseudocode: # (1) extract a row of data from text files into an array # (2) loop the row for the amount of number in it # (3) in each interation, extract a value based on a random number, remove it from input array, and write it into the output file

# Note: scramble in column dimension is preferred. But it is hard to achieve using perl script. # This problem can be dissolved by transpose the matrix before and after scrambling using R command t()

@out = split(/\./, $ARGV[0]); $outfile = $out[0]."_scrambled.txt"; open (INFILE,"<$ARGV[0]") || die "could not open the input file $ARGV[0]"; open (OUTFILE,">$outfile") || die "could not open the output file $outfile";

$line = ; print OUTFILE "$line"; # the table-head

206

while ($line = ){ @input =(); chomp $line; @input = split(/\t/, $line); #(1) extract a row of data from text files into an array print OUTFILE "$input[0]"; shift (@input); # the frist element of the row # srand(time() ^($$ + ($$ <<15))); # random seed for ($count = @input; $count >= 1; $count--) { # (2) loop the row for the amount of number in it $element = int(rand($count)); $output_number = splice (@input, $element,1); print OUTFILE "\t"; print OUTFILE "$output_number"; # extract a value based on a random number

# delete ($input[$element]);#remove it from input array } print OUTFILE "\n"; }

close INFILE; close OUTFILE;

207

A3. Scripts of Tandem-MS-ASC

0compound_finder.pl #!/usr/bin/perl -w

# this program read the txt file converted from raw file of ESP, # filter out the MS/MS spectra for a compound by # (1) MW # (2) specific MS/MS peaks assigned for a compound

# the aim of this program is to locate known compounds on LC-MS/MS spectra # but the result is not good--all known compound seems already located.

# MW of the compound $MWcom = 399; # MW+1

#peak list @PeakListCom = (381,329,289,315,245,219,177,175,275,249,207,205); # from the excel file

#number of expected matched peaks $expected = 2;

#part 1 #read the text file and write it into a new file with the format like: #scan_time parent_ion_mass daughter_ion_mass daughter_ion_intnesity daughter_ion_precentage open (INFILE,"<$ARGV[0]"); $out = $ARGV[0]."_tandemMS.txt"; open (OUTFILE,">$out");

208

$i=0;

#skip the head of the file $item1 = ; while ($item1 = ){ $item2 = ; if ($item1 eq $item2) {last;} }

#read spectra head and list while ($item1 = ){ $PrecursorMass = 0; $BasePeakIntensity = 0; #read information from the head of a spectra scan, useful information including: #start_time = 0.082833, end_time = 0.000000, packet_type = 0 #uScanCount = 0, PeakIntensity = 2499.000000, PeakMass = 297.304749 #Precursor Mass 401.83 while ($item1 = ){ if ($item1 =~ /^start_time =/) { @line= split(/,/, $item1); @line1item = split(/=/, $line[0]); $startTime = $line1item [1]; } if ($item1 =~ /^uScanCount =/) { @line= split(/,/, $item1); @line2item = split(/=/, $line[1]); $BasePeakIntensity = $line2item [1]; } if ($item1 =~ /^Precursor Mass /) { $item1 =~ s/\015\012//; @line= split(/\s/, $item1);

209

$PrecursorMass = $line [3]; last; } } unless (defined ($PrecursorMass)) {next;} if (($BasePeakIntensity == 0) | ($PrecursorMass ==0)){next;} unless ($i==0) {print OUTFILE "end\n\n";} $i++; print OUTFILE ">"; print OUTFILE sprintf('%.2f',$startTime)."\t"; print OUTFILE sprintf('%.2f',$PrecursorMass)."\n"; # read information peak list from the spectrum #Packet # 0, intensity = 7711.000000, mass/position = 374.853149 while ($item1 = ){ if ($item1 =~ /ScanHeader #/) { last;} # "ScanHeader #" at the beginning of a new spectrum if ($item1 =~ /^Packet #/) { $item1 =~ s/\015\012//; @line= split(/,/, $item1); @line2item = split(/ = /, $line[1]); @line3item = split(/ = /, $line[2]); $intensity = $line2item [1]; $mass = $line3item [1]; # $peaklist{$mass} = $intensity; $precentage = 100*$intensity/$BasePeakIntensity; print OUTFILE sprintf('%.2f',$mass)."\t"; print OUTFILE sprintf('%.0f',$intensity)."\t"; print OUTFILE sprintf('%.1f',$precentage); print OUTFILE "\n"; } } }

210 print OUTFILE "end\n\n"; print OUTFILE "the end of the file"; close (INFILE); close (OUTFILE);

# part 2 # filter out MS/MS spectra and write it into an out file

open (INFILE,"<$out"); $out = $ARGV[0]."_".$MWcom.".txt"; open (OUTFILE,">$out");

#read the input file #>2.47 386.82 #245.06 1945 35.0 #340.92 5553 100.0 #end # $item1 = ""; while ($item1 = ){ %presentage = (); while ($item1 = ){ chomp $item1; if ($item1 =~ /end/) {last;} elsif ($item1 =~ /^>/) { $Precursor = $item1; $Precursor =~ s/>//; @PrecursorLine = split (/\t/,$Precursor); $Pmass = int ($PrecursorLine [1] + 0.5); next;

211

} @SpectraLine = split (/\t/,$item1); $presentage{$SpectraLine [0]} = $SpectraLine [2]; }

#matching unless ( $Pmass == $MWcom) {next;} print "$Pmass,"; $i = 0; @match = (); foreach $Dmass (keys %presentage) { $high = $Dmass + 0.5; $low = $Dmass - 0.5; foreach $item3 (@PeakListCom){ if (($item3>$low) && ($item3<$high)) { $i++; push (@match,$Dmass); } } } print "$i, \t";

#output if ($i > $expected) { print OUTFILE "\n>$Precursor\t"; print OUTFILE "\nexpected peaks: @PeakListCom \n"; print OUTFILE "number of matched peaks: $i\t"; print OUTFILE "matched peaks: @match \n"; foreach $Dmass (sort (keys (%presentage))) { print OUTFILE "$Dmass\t"; print OUTFILE "$presentage{$Dmass}\n";

212

} print OUTFILE "<<\n"; } } close (INFILE); close (OUTFILE);

1st_ztandem/2peak_assign.pl #!/usr/bin/perl -w

#version 0.02: chomp not works well between cygwin and windows, so replace some of them with $XXX =~ s/\015\012//; #input part #read blocks (Ar groups and bridges, file name: blocks.txt) into two arrays #input files format: Ar_or_Bridge mass1,mass2,mass3,... #input files format: Ar 93,123,153 #input files format: Bridge 122,124 open (INFILE,"){ $block =~ s/\015\012//; @item1 = split(/\t/, $block); if($item1[0] eq "Ar") { @Ar=split(/,/, $item1[1]); @Ar_another = @Ar; }

if($item1[0] eq "Bridge") { @Bridge=split(/,/, $item1[1]); } }

213 close (INFILE);

#read MS/MS spectra (spectra.txt) into a hash array #input files format: MW_Rt ion_type_+/- parent_ion_mass daughter_ion_mass1,daughter_ion_mass2,... #input files format: 338_21.3 - 307 307,99,97 #input files format: 338_21.3 + 309 309,103,113,291,273 #Rt: retension time in min.sec from MZmine open (INFILE,"){ $spec=~ s/\015\012//; @item3 = split(/\t/, $spec); $hash_key2 = $item3[0]."_".$item3[1]."_".$item3[2]; $spectra{$hash_key2} = $item3[3]; } close (INFILE);

#processing the spectra

#first read the MW out open (STRCUTRES,">redundant_structures.txt"); $Ar1=0; $Ar2=0; $bri=0; foreach $hashkey3 (keys %spectra) { @item4 = split(/_/, $hashkey3); $MW = $item4[0]; #find possible combination (hypothetic structures) for certain MW foreach $bri (@Bridge){ foreach $Ar1 (@Ar){

214

foreach $Ar2 (@Ar_another){ if ($MW == $bri+$Ar1+$Ar2){ print STRCUTRES ("$Ar1"); print STRCUTRES ("_"); print STRCUTRES ("$bri"); print STRCUTRES ("_"); print STRCUTRES ("$Ar2\n"); } } } }

} close (STRCUTRES);

#get ride of redundant structures (such as Ar1_bri_Ar2 and Ar2_bri_Ar1) open (STRCUTRES,"nr_structures.txt"); @strcutrues = (); @nrstructrues = (); while ($stru = ){ chomp $stru; @strcutrues = split(/_/, $stru); $judge=1; foreach $nrstru (@nrstructrues){ if ($stru eq $nrstru){$judge=0;} @nr_Ar = split(/_/, $nrstru); if ($strcutrues[0] eq $nr_Ar[2]) {$judge=0;} } if ($judge == 1) {push(@nrstructrues, $stru);} }

215

foreach $value1 (@nrstructrues){ print NRSTRCUTRES ("$value1\n"); } close (STRCUTRES); close (NRSTRCUTRES);

#------2generate_spectra.pl @item6 = (); @fragment = (); @strcutrues = ();

#read fragment rules (Ar groups and MW, file name: fragment.txt) into a hash array #input files format: group_name bridge_weight ion_type_+/- mass_change1, mass_change2,... #input files format: curcurminoids 122 + Ar 10,100 #input files format: dihydrocucuminoids 124 - MW -1 open (INFILE,"; close (INFILE);

#generate theoretic spectra for the hypothetic structures # the possible structure in nr_structures.txt is AR_bridge_Ar, such as "93_122_123" and "139_122_77" open (NRSTRCUTRES,"theory_spectra.txt"); while ($struc = ){ chomp $struc; @strcutrues = split(/_/, $struc);

216

$Ar1=$strcutrues[0]; $Ar2=$strcutrues[2]; $bri=$strcutrues[1]; $MW= $Ar1+$Ar2+$bri; # go over the fragment pattern to give possible peak mass foreach $frag (@fragment) { chomp $frag; @item6 = split(/\t/, $frag); $name_frag = $item6[0]; $bri_frag = $item6[1]; $PN_frag = $item6[2]; $type_frag = $item6[3]; $peaks_frag = $item6[4]; unless ($bri_frag == $bri){ next;} #skip following peak generation if bridge is not right #divide the fragment pattern to different types:(+)Ar, (+)MW, (-)Ar, (-)MW print THSPECTRA ("$struc\t"); if ($PN_frag eq "+"){ $parentIon=$MW+1; print THSPECTRA "$name_frag\t"; print THSPECTRA "$PN_frag\t"; print THSPECTRA "$type_frag\t"; print THSPECTRA "$parentIon\t"; if ($type_frag eq "Ar"){ @postiveAr=split(/,/, $peaks_frag); foreach $Arpositive (@postiveAr){ $peak=$Ar1+$Arpositive; print THSPECTRA "$peak,"; $peak=$Ar2+$Arpositive; print THSPECTRA "$peak,"; } } elsif ($type_frag eq "MW") { @positiveMW=split(/,/, $peaks_frag);

217

foreach $MWpositive (@positiveMW){ $peak=$MW+$MWpositive+1; print THSPECTRA "$peak,"; } } print THSPECTRA "\n"; } if ($PN_frag eq "-"){ $parentIon=$MW-1; print THSPECTRA "$name_frag\t"; print THSPECTRA "$PN_frag\t"; print THSPECTRA "$type_frag\t"; print THSPECTRA "$parentIon\t"; if ($type_frag eq "Ar"){ @negativeAr=split(/,/, $peaks_frag); foreach $Arnegative (@negativeAr){ $peak=$Ar1+$Arnegative; print THSPECTRA "$peak,"; $peak=$Ar2+$Arnegative; print THSPECTRA "$peak,"; } } elsif ($type_frag eq "MW") { @negativeMW=split(/,/, $peaks_frag); foreach $MWnegative (@negativeMW){ $peak=$MW+$MWnegative-1; print THSPECTRA "$peak,"; } } print THSPECTRA "\n"; } }

218

} close (NRSTRCUTRES); close (THSPECTRA);

#------4match_spectra.pl #!/usr/bin/perl -w

@sturctureBridge = (); @sturctureBridge2 =(); #read real MS/MS spectra (spectra.txt) into two array (+) and (-) #input files format: MW_Rt ion_type_+/- parent_ion_mass daughter_ion_mass1,daughter_ion_mass2,... #input files format: 338_21.3 - 337 187,145,143,119,217,175,173,149 #input files format: 338_21.3 + 339 319,267,227,215,189,147,145,253,245,219,177,175,121,151 #Rt: retension time in min.sec from MZmine open (INFILE,"; close (INFILE); foreach $realsp (@realSpectra){ $realsp=~ s/\015\012//; @realspec = split(/\t/, $realsp); $ionType=$realspec[1]; #$parentionMass=$realspec[2]; $daughterionMass=$realspec[3]; if ($ionType eq "+"){ @realPositiveMass=split(/,/, $daughterionMass); } if ($ionType eq "-"){ @realnegativeMass=split(/,/, $daughterionMass); }

219

}

#read theoretic MS/MS spectra (theory_spectra.txt) into a array #input files format: structure bridge_type ion_type:+/- ion_source:Ar/MW parent_ion_mass daughter_ion_mass1,daughter_ion_mass2,... #input files format: 93_122_123 curcurminoids - Ar 337 187,217,145,175,143,173,119,149, #input files format: 135_126_77 126O2 + MW 339 321, #Rt: retension time in min.sec from MZmine open (INFILE,"; close (INFILE); #sort theortic structures, read unique structures, and read theortic MS/MS speaks into +/- arrays foreach $theosp (@theoSpectra){ chomp $theosp; @theospec = split(/\t/, $theosp); $structure="$theospec[0]"."\t"."$theospec[1]"."\t"."$theospec[2]"."\t"."$theospec[4]"; #sample: 93_122_123-curcurminoids #sort theortic structures, read unique ones into an array @sturctureBridge $jude1=0; foreach $item7 (@sturctureBridge){ if ($item7 eq $structure){$jude1=1;} } if ($jude1==0){push (@sturctureBridge, $structure);} }

#rewite the theortic MS/MS peak into a new file in a "short" format, which contains (+) in one line and (-) in another. $strBri=""; $structure ="";

220 open (OUTFILE,">short_theory_spectra.txt"); foreach $strBri (@sturctureBridge){ print OUTFILE "$strBri\t"; foreach $theosp2 (@theoSpectra){ chomp $theosp2; @theospec2 = split(/\t/, $theosp2);

$structure="$theospec2[0]"."\t"."$theospec2[1]"."\t"."$theospec2[2]"."\t"."$theospec2[4] "; if ($strBri eq $structure){ print OUTFILE "$theospec2[5]"; } } print OUTFILE "\n"; } close (OUTFILE);

#read the short theortic MS/MS peak file into +/-hash arrays open (INFILE,"; close (INFILE);

#read theortic MS/MS speaks into +/- arrays foreach $theosp (@shorttheoSpectra){ chomp $theosp; @theospec = split(/\t/, $theosp); $hashkey4 = $theospec[0]."\t".$theospec[1]; $jude1=0; foreach $item90(@sturctureBridge2){if ($item90 eq $hashkey4){$jude1=1;}} if ($jude1==0){push (@sturctureBridge2, $hashkey4);} if ($theospec[2] eq "-") {

221

$theonegative{$hashkey4}=$theospec[4]; } if ($theospec[2] eq "+") { $theopositive{$hashkey4}= $theospec[4]; } }

# compare real spectra and theoretical spectra open (OUTFILE,">spectraMatch.txt"); foreach $item8 (@sturctureBridge2){ $peakmatch=0; @theonegativeMass = split(/,/, $theonegative{$item8}); foreach $item9 (@theonegativeMass){ foreach $item10 (@realnegativeMass){ if ($item9 == $item10){$peakmatch++;} } } @theopositiveMass = split(/,/, $theopositive{$item8}); foreach $item11 (@theopositiveMass){ foreach $item12 (@realPositiveMass){ if ($item11 == $item12){$peakmatch++;} } } $precentage = 100*$peakmatch/(@theopositiveMass + @theonegativeMass + 2); #result output, format: #>93_122_123 curcurminoids 20 peaks are found matched 100 precents of theoretic peaks are found in real spectra #theo(-):119 143 145 149 173 175 187 217 #real(-):119 143 145 149 173 175 187 217 #theo(+):145 147 175 177 189 215 219 229 245 255 269 321 #real(+):121 145 147 151 175 177 189 215 219 229 245 255 269 321

222

print OUTFILE ">$item8\t"; print OUTFILE "$peakmatch peaks are found matched\t"; print OUTFILE "$precentage% of theoretic peaks are found in real spectra\n"; @sortedtheonegativeMass = sort {$a <=> $b} @theonegativeMass; print OUTFILE "theo(-):@sortedtheonegativeMass\n"; @sortedrealnegativeMass = sort {$a <=> $b} @realnegativeMass; print OUTFILE "real(-):@sortedrealnegativeMass\n"; @sortedtheopositiveMass = sort {$a <=> $b} @theopositiveMass; print OUTFILE "theo(+):@sortedtheopositiveMass\n"; @sortedrealpositiveMass = sort {$a <=> $b} @realPositiveMass; print OUTFILE "real(+):@sortedrealpositiveMass\n>>\n"; } close (OUTFILE);

#sort the output accroding to their match precetage #read the output file into a hash array open (INFILE,"sortedspectraMatch.txt"); while ($description= ) { $match{$description}=$description; while ($lines = ){ $match{$description} .= $lines; if($lines =~ />>/){print "*";last;} } } close (INFILE); foreach $hashkey4 (keys %match) { @item16 = split(/\t/, $hashkey4); @item17 = split(/%/, $item16[3]); $precent{$hashkey4} = $item17[0];

223

} foreach $key (sort{$precent{$b} <=> $precent{$a}}(keys(%precent))){ #sort the hash by the hash value print OUTFILE "$match{$key}"; } close (OUTFILE);

2nd_pattern_finder/7pattern_identifier.pl #!/usr/bin/perl -w

# the aim of this program is to find ESP MS/MS spectra that represents curcuminoids # psudocode of this program # (1) reads the txt file converted from raw file of ESP, # (2) filters out the MS/MS spectra with expected neutral losses # (3) find the ion serious representing same R group # (4) report the combination of R group which make the right molecule weight #threhold: $i>2: more than 2 neutral lost (3 or 4) #threhold: $k > 3: more than 2 R serious ion (4 or 5)

#read the text file and write it into a new file with the format like: #scan_time parent_ion_mass daughter_ion_mass daughter_ion_intnesity daughter_ion_precentage

@nlose = (-18,-70,-110,-84); # neutral loss mass shift from MW

#R series, mass shift @Rpattern = (122,96,52,54,28); #the first one should be the most intensive one

#Part 1: reads the txt file converted from raw file of ESP,

224 open (INFILE,"<$ARGV[0]"); $out = $ARGV[0]."_tandemMS.txt"; open (OUTFILE,">$out"); $i=0;

#skip the head of the file $item1 = ; while ($item1 = ){ $item2 = ; if ($item1 eq $item2) {last;} }

#read spectra head and list while ($item1 = ){ $PrecursorMass = 0; $BasePeakIntensity = 0; #read information from the head of a spectra scan, useful information including: #start_time = 0.082833, end_time = 0.000000, packet_type = 0 #uScanCount = 0, PeakIntensity = 2499.000000, PeakMass = 297.304749 #Precursor Mass 401.83 while ($item1 = ){ if ($item1 =~ /^start_time =/) { @line= split(/,/, $item1); @line1item = split(/=/, $line[0]); $startTime = $line1item [1]; } if ($item1 =~ /^uScanCount =/) { @line= split(/,/, $item1); @line2item = split(/=/, $line[1]); $BasePeakIntensity = $line2item [1]; }

225

if ($item1 =~ /^Precursor Mass /) { $item1 =~ s/\015\012//; @line= split(/\s/, $item1); $PrecursorMass = $line [3]; last; } } unless (defined ($PrecursorMass)) {next;} if (($BasePeakIntensity == 0) | ($PrecursorMass ==0)){next;} unless ($i==0) {print OUTFILE "end\n\n";} $i++; print OUTFILE ">"; print OUTFILE sprintf('%.2f',$startTime)."\t"; print OUTFILE sprintf('%.2f',$PrecursorMass)."\n"; # read information peak list from the spectrum #Packet # 0, intensity = 7711.000000, mass/position = 374.853149 while ($item1 = ){ if ($item1 =~ /ScanHeader #/) { last;} # "ScanHeader #" at the beginning of a new spectrum if ($item1 =~ /^Packet #/) { $item1 =~ s/\015\012//; @line= split(/,/, $item1); @line2item = split(/ = /, $line[1]); @line3item = split(/ = /, $line[2]); $intensity = $line2item [1]; $mass = $line3item [1]; $peaklist{$mass} = $intensity; $precentage = 100*$intensity/$BasePeakIntensity; print OUTFILE sprintf('%.2f',$mass)."\t"; print OUTFILE sprintf('%.0f',$intensity)."\t"; print OUTFILE sprintf('%.1f',$precentage); print OUTFILE "\n";

226

} } } print OUTFILE "end\n\n"; print OUTFILE "the end of the file"; close (INFILE); close (OUTFILE);

# part 2 select out spectra with expected neutral losses open (INFILE,"<$out"); $out = $ARGV[0]."_n_loss.txt"; open (OUTFILE,">$out");

#read the input file #>2.47 386.82 #245.06 1945 35.0 #340.92 5553 100.0 #end # $item1 = ""; while ($item1 = ){ %presentage = (); @peaks = (); while ($item1 = ){ chomp $item1; if ($item1 =~ /end/) {last;} elsif ($item1 =~ /^>/) { $Precursor = $item1; $Precursor =~ s/>//; @PrecursorLine = split (/\t/,$Precursor); $Pmass = $PrecursorLine [1];

227

next; } @SpectraLine = split (/\t/,$item1); $presentage{$SpectraLine [0]} = $SpectraLine [2]; } foreach $item2 (@nlose){ $ExpectPeak = $Pmass + $item2; push (@peaks,$ExpectPeak); }

#matching $i = 0; @match = (); foreach $Dmass (keys %presentage) { $high = $Dmass + 0.5; $low = $Dmass - 0.5; foreach $item3 (@peaks){ if (($item3>$low) && ($item3<$high)) { $i++; push (@match,$Dmass); } } }

#output if ($i > 2) { print OUTFILE "\n>$Precursor\t"; print OUTFILE "number of matched neutral loss: $i\n"; print OUTFILE "matched peaks: @match \n"; foreach $Dmass (keys %presentage) {

228

print OUTFILE "$Dmass\t"; print OUTFILE "$presentage{$Dmass}\n"; } print OUTFILE "<<\n"; } } close (INFILE); close (OUTFILE);

#part 3 find the ion serious representing same R group #caculate the mass differences of peaks from R pattern @Ppattern = (); foreach $item4 (@Rpattern){ $item5 = $item4 - $Rpattern[0]; push (@Ppattern,$item5); } print "peak different by: "; print "@Ppattern \n";

#read the text file and write it into a new file with the format like: open (INFILE,"<$out"); $out = $ARGV[0]."_R_pattern.txt"; open (OUTFILE,">$out");

while ($item1 = ){ @peaklist =(); while ($item1 = ){ if ($item1 =~ /^>/) {$description = $item1; }elsif($item1 =~ /^matched peaks/){$matchedpeaks = $item1; }elsif($item1 =~ /^<

229

}else { @line= split(/\s/, $item1); $peak = int ($line[0] + 0.5); push (@peaklist,$peak); } } print OUTFILE "\n$description"; print OUTFILE "$matchedpeaks"; print OUTFILE "peak list: "; foreach $item9 (@peaklist){ print OUTFILE "$item9,"; } print OUTFILE "\n";

#find peak pattern for R+X foreach $item8 (@peaklist){ @Pdiffer = (); foreach $item9 (@peaklist){ $diff = $item9-$item8; push (@Pdiffer, $diff); }

@Rseries =(); $k=0; foreach $item10 (@Pdiffer){ foreach $item11 (@Ppattern){ if ( $item10 == $item11){ $k++; $peakRseries = $item8 + $item11; push (@Rseries, $peakRseries);

230

} } } if ($k > 2) { $R = $item8 - $Rpattern[0]; print OUTFILE "$k peaks found as series of R=$R : "; print OUTFILE "@Rseries\n"; } } print OUTFILE "<<\n"; } close (INFILE); close (OUTFILE);

#part 4 Reformate output, list the combination that gives the right MW $bridge = 122; #the MW of bridge

#read the text file and write it into a new file with the format like: open (INFILE,"<$out"); $out = $ARGV[0]."_MW.txt"; open (OUTFILE,">$out");

#input #>23.20 383.16 number of matched neutral loss: 3 #matched peaks: 365.02 299.11 272.94 #peak list: 309,209,332,324,171,365,297,366,153,368,232,301,205,337,333,348,315,185,243,356,33 4,306,303,181,353,298,144,167,195,383,300,217,141,299,279,145,213,352,245,187,354, 183,355,199,177,190,220,327,385,322,276,275,323,340,364,325,214,173,369,280,196,15

231

1,308,273,384,350,178,163,335,351,305,201,150,118,295,153,274,244,202,215,218,149, 231,317,291,157,191,137, #3 peaks found as series of R=87 : 209 141 183 #3 peaks found as series of R=243 : 365 297 295 #3 peaks found as series of R=244 : 366 298 340 #3 peaks found as series of R=246 : 368 298 300 while ($item1 = ){ @Rlist =(); while ($item1 = ){ if ($item1 =~ /^>/) { $description = $item1; print OUTFILE "\n$description"; @line= split(/\s/, $item1); $MW = int ($line[1] - 0.5); print "$MW,"; }elsif($item1 =~ /^matched peaks/){$matchedpeaks = $item1; print OUTFILE "$matchedpeaks"; }elsif($item1 =~ /^peak list/){$PeakList = $item1;print OUTFILE "$PeakList"; }elsif($item1 =~ /^<

print OUTFILE "combination of Rs:\n"; foreach $item9 (@Rlist){ foreach $item10 (@Rlist){ $total = $item9 + $item10 + $bridge;

232

# print "$total,"; if ($total == $MW) { print OUTFILE "$item9-$bridge-$item10\n"; } } # print "\n"; } print OUTFILE "<<\n"; } close (INFILE); close (OUTFILE);

3rd_spectra_aligner\ 8spectra_aligner-124.pl #!/usr/bin/perl -w

# the aim of this program is to find MS/MS (ESP or ESN) spectra that could aligned with given spectra # psudocode of this program # (1) read a text file containing a series of known spectra, and store he peak list of given spectra in hash arrays # (2) reads the txt file converted from raw file of ESP, read the MS/MS spectra out, discard spectra with too low intensity # (3) discard spectra with too few peaks # (4) align the spectrum with the given spectra, find the nubmer of peaks that # a. could be found in the given spectra # b. with a mass differece equal to MW differnce # (5) report the distribution of the matched/aligned peak number and the spectra with high that number

#version 5: statistical result: top 25 vs. top 10-25

233

# MS/MS spectrum candidate threhold: # $BasePeakIntensity > 30000 # peak number >=10, only use top 25 peaks if there are more than 25.

#search top 25 peaks of given spectra, report if >6 found $threholdPeakfound = 6;

# exclude three major curcuminoids, regardless the retion time @exculdeall = (369,309,339,342); #exclude the retion_time-MW pairs that alread found %exclude = ("23.06"=>383, "22.80"=>353, "22.77"=>355, "27.21"=>355, "27.39"=>356, "29.10"=>355, "31.73"=>342, "28.02"=>325, "32.96"=>296, "37.95"=>370, "21.93"=>313, "22.32"=>323, "26.99"=>325, "27.49"=>326, "27.95"=>327, "29.52"=>312, "31.17"=>312, "31.47"=>313, "32.78"=>312, "33.99"=>312, "28.39"=>312, "32.85"=>321, "27.75"=>323, "25.95"=>310, "26.53"=>371, "26.94"=>293, "27.82"=>340, "33.88"=>371, "47.36"=>370, "23.03"=>373, "28.71"=>385, "31.02"=>372, "33.36"=>381, "34.06"=>372, "58.64"=>437, "28.36"=>353, "28.89"=>357, "25.35"=>372, "30.49"=>372, "32.27"=>372, "33.24"=>372, "35.78"=>383, "58.67"=>439, "32.67"=>351, "24.75"=>371, "27.79"=>371, "29.10"=>371, "29.67"=>370, "30.12"=>371, "30.72"=>311, "31.04"=>371, "32.10"=>371, "32.22"=>371, "35.21"=>370, "41.77"=>370, "48.66"=>371, "48.87"=>370, "57.21"=>370, "28.38"=>355, "28.81"=>355, "29.73"=>355, "25.73"=>327, "39.52"=>323, "16.90"=>371, "28.53"=>355, "29.44"=>311, "37.83"=>371, "39.88"=>371, "41.58"=>371, "28.61"=>326);

234

# Part 1 # store he peak list of given spectra in hash arrays #top25 is the 25 most intensive peaks %GivenMW = ( "xie11" => 313);#put the mass of procussor ion, M+H, here @top25_xie11 = (91,107,119,121,125,147,148,149,163,173,175,193,201,207,219,294,295); #@AllPeak_xie11 = (245,285,175,299,177,219,259,151,161,203,351,213,325,217,297,145,227,327,143,283,3 11,160,179,202,323); #random_ESP = (301,177,328,348,130,228,306,342,322,193,316,262,205,322,264,342,231,235,198,170,3 42,214,302,310,351);

# Part 2 # reads the txt file converted from raw file of ESP, # read the MS/MS spectra out, discard spectra with too low intensit

open (INFILE,"<$ARGV[0]"); $out = $ARGV[0]."_tandemMS.txt"; open (OUTFILE,">$out"); $i=0; print "reading spectra file:\n $ARGV[0]\n";

#skip the head of the file $item1 = ; while ($item1 = ){ $item2 = ;

235

if ($item1 eq $item2) {last;} }

#read spectra head and list while ($item1 = ){ $PrecursorMass = 0; $BasePeakIntensity = 0; #read information from the head of a spectra scan, useful information including: #start_time = 0.082833, end_time = 0.000000, packet_type = 0 #uScanCount = 0, PeakIntensity = 2499.000000, PeakMass = 297.304749 #Precursor Mass 401.83 while ($item1 = ){ if ($item1 =~ /^start_time =/) { @line= split(/,/, $item1); @line1item = split(/=/, $line[0]); $startTime = $line1item [1]; } if ($item1 =~ /^uScanCount =/) { @line= split(/,/, $item1); @line2item = split(/=/, $line[1]); $BasePeakIntensity = $line2item [1]; } if ($item1 =~ /^Precursor Mass /) { $item1 =~ s/\015\012//; @line= split(/\s/, $item1); $PrecursorMass = $line [3]; last; } } unless (defined ($PrecursorMass)) {next;} if ( $BasePeakIntensity < 30000) {next;}

236

unless ($i==0) {print OUTFILE "<<\n\n";} $i++; print OUTFILE ">"; print OUTFILE sprintf('%.2f',$startTime)."\t"; print OUTFILE sprintf('%.0f',$PrecursorMass)."\n"; # read information peak list from the spectrum #Packet # 0, intensity = 7711.000000, mass/position = 374.853149 while ($item1 = ){ if ($item1 =~ /ScanHeader #/) { last;} # "ScanHeader #" at the beginning of a new spectrum if ($item1 =~ /^Packet #/) { $item1 =~ s/\015\012//; @line= split(/,/, $item1); @line2item = split(/ = /, $line[1]); @line3item = split(/ = /, $line[2]); $intensity = $line2item [1]; $mass = $line3item [1]; $peaklist{$mass} = $intensity; $precentage = 100*$intensity/$BasePeakIntensity; print OUTFILE sprintf('%.0f',$mass)."\t"; print OUTFILE sprintf('%.0f',$intensity)."\t"; print OUTFILE sprintf('%.3f',$precentage); print OUTFILE "\n"; } } } print OUTFILE "<<\n"; print OUTFILE "the end of the file\n"; close (INFILE); close (OUTFILE);

# part 3 discard spectra with too few peaks, and get top 25 peaks for the rest

237

open (INFILE,"<$out"); $out = $ARGV[0]."_top25.txt"; open (OUTFILE,">$out"); print "discard spectra containing less than 10 MS/MS peaks\n"; print "and got top 25 peaks from the rest of the spectra\n"; #read the input file #>2.47 386.82 #245.06 1945 35.0 #340.92 5553 100.0 #<< # $item1 = ""; while ($item1 = ){ %presentage = (); if ($item1 =~ /the end of the file/){last;} while ($item1 = ){ chomp $item1; if ($item1 =~ ///; next; }else{ @SpectraLine = split (/\t/,$item1); $presentage{$SpectraLine [0]} = $SpectraLine [2]; } } $hashnumber = keys %presentage; # print "$i,"; #print how many peaks in a MS/MS spectrum if ($hashnumber<10) { next;}

238

print OUTFILE "\n>$Precursor\tpeaklist:\t"; $peaknumber = 0; foreach $key (sort{$presentage{$b} <=> $presentage{$a}}(keys(%presentage))){ #sort the hash by the hash value if ($peaknumber >24){last;} print OUTFILE "$key,"; $peaknumber++; }

print OUTFILE "\n"; print OUTFILE "full list:\n"; foreach $key (sort (keys(%presentage))) { print OUTFILE "$key\t"; print OUTFILE "$presentage{$key}\n"; } print OUTFILE "<<\n";

} close (INFILE); close (OUTFILE);

# part 4 # align the spectrum with the given spectra open (INFILE,"<$out"); $out = $ARGV[0]."_alignment_xie11.txt"; open (OUTFILE,">$out"); print "align the spectra\n"; print "find known compounds:";

239

#read the input file, the first line of a block containg useful information: Rt PrecursorIonMass productIonList, #example: >23.72 391 113,149,150,163,167,181,249,250,255,256,257,259,261,278,279,327,345,373, $item1 = ""; @peaklist = (); @line = (); $totalspectranumber = 0; while ($item1 = ){ if ($item1 =~ /^>/) { chomp ($item1); $item1 =~ s/,$//; $item1 =~ s/>//; @line = split (/\s/, $item1); $MW = $line[1]; $Rt = $line[0];

# print "$MW,"; # the mass of spectrrum #should this spectrum be exculde? $e = 0; #three major curcumin? foreach $x (@exculdeall){ if ($x ==$MW) {$e = 1;} } # known Rt-MW pairs foreach $knownRt (keys %exclude) { $Rthigh = $knownRt +1; $Rtlow = $knownRt -1; if (($Rt > $Rtlow) && ($Rt< $Rthigh) && ($MW == $exclude{$knownRt})){ print "*";

240

$e =1; } }

if ( $e == 1) {next;}

$offset = $MW - $GivenMW{xie11}; if ( $offset == 0) {next;} # do not compare the same molecule or isomers $totalspectranumber++; @peaklist = split (/,/, $line[3]); $i = 0; $k =0; @same = (); @AlignedPeaks = (); %match = (); # $size = @peaklist; print "$size,"; # print how many peaks are in a spectrum foreach $top (@top25_xie11){ $ExpectTop = $top + $offset; # the possible mass should be the same or has the same offset with MW $match{$top} = 0; foreach $peak (@peaklist){ if($peak == $top) { $i++; push (@same, $peak); $match{$top} = 1; }elsif ($ExpectTop == $peak) { $k++; push (@AlignedPeaks, $peak); $match{$top} = 1; } }

241

}

# count the peak number than is same or aligned $total = 0; foreach $key (keys %match){ $total = $match{$key} + $total; } push (@alltotal, $total); $ik = $i+$k; push (@addtotal, $ik);

if ($ik > $threholdPeakfound) { print OUTFILE "\n>$item1\n"; print OUTFILE "$total peaks found matched or aligned\n"; print OUTFILE "$i matched peaks: @same\n"; print OUTFILE "$k aligned peaks: \n"; foreach $p (@AlignedPeaks){ $expect = $p - $offset; print OUTFILE "$p\t$expect\n"; } print OUTFILE "offset = $offset\n"; print OUTFILE "<<\n"; } } }

# output the disstribution of the matched/aligned peaks/events number print "\ncalculate disstribution\n";

#matched/aligned peaks number foreach $all (@alltotal){

242

$distributAll{$all}++; } print OUTFILE "disstribution of matching/alignment peaks number at $out\n"; print OUTFILE "total spectra number: $totalspectranumber\n"; for ($count=0; $count<26; $count++){ print OUTFILE "$count\t"; if (defined ($distributAll{$count})) { print OUTFILE "$distributAll{$count}\n"; }else{print OUTFILE "0\n"; } }

#matched/aligned events number foreach $add (@addtotal){ $distributAdd{$add}++; } print OUTFILE "disstribution of matching/alignment events(redundent) number $out\n"; for ($count=0; $count<51; $count++){ print OUTFILE "$count\t"; if (defined ($distributAdd{$count})) { print OUTFILE "$distributAdd{$count}\n"; }else{print OUTFILE "0\n"; } } use warnings;

close (INFILE); close (OUTFILE);

243

APPENDIX B SUPPLEMENTARY DATA OF THE PROTEOMIC RESEARCH

B1. Proteins present in at least three of the four basil lines EC # Description Metabolic enzymes contributing to specialized product biosynthesis EC 4.2.3.- 1,8-cineole synthase EC 1.1.1.267 1-deoxy-D-xylulose-5-phosphate reductoisomerase EC 2.2.1.7 1-deoxyxylulose-5-phosphate synthase EC 5.4.2.1 2,3-bisphosphoglycerate-independent phosphoglycerate mutase EC 4.6.1.12 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase EC 2.7.7.60 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase EC 4.2.3.4 3-dehydroquinate synthase-like protein EC 6.2.1.12 4-coumarate:CoA ligase EC 2.7.1.148 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase EC 1.17.1.2 4-hydroxy-3-methylbut-2-enyl diphosphate reductase EC 1.17.4.3 4-hydroxy-3-methylbut-2-enyl diphosphate synthase EC 2.5.1.19 5-enolpyruvylshikimate-3-phosphate synthase EC 2.7.1.11 6-phosphofructokinase EC 1.1.1.44 6-phosphogluconate dehydrogenase EC 4.2.1.3 aconitate hydratase EC 2.7.1.20 adenosine kinase EC 3.3.1.1 adenosylhomocysteinase EC 2.7.4.3 adenylate kinase EC 2.1.2.10 aminomethyltransferase (glycine cleavage system T protein) EC 2.6.1.1 aspartate aminotransferase EC 3.6.3.14 ATP synthase beta subunit EC 3.6.3.14 ATP synthase gamma subunit EC 2.3.3.8 ATP:citrate lyase. EC 2.1.1.68 caffeic acid 3-O-methyltransferase EC 2.1.1.104 caffeoyl-CoA O-methyltransferase EC 5.5.1.6 chalcone--flavonone isomerase EC 4.2.3.5 chorismate synthase EC 1.14.13.11 cinnamate 4-hydroxylase EC 1.1.1.195 cinnamyl-alcohol dehydrogenase EC 1.8.1.4 dihydrolipoamide dehydrogenase (glycine cleavage system L protein) EC 4.2.1.11 na eugenol synthase EC 2.5.1.10 farnesyl diphosphate synthase EC 1.18.1.2 ferredoxin NADP reductase na flavone synthase na? flavonoid 4'-O-methyltransferase EC 2.1.1.6 flavonoid 7-O-methyltransferase EC 4.1.2.13 fructose-bisphoshpate aldolase na EC 2.5.1.1 geranyl diphosphate synthase small subunit na geranylgeranyl pyrophosphate synthetase EC 1.1.1.49 glucose-6-phosphate 1-dehydrogenase

244

EC 5.3.1.9 glucose-6-phosphate isomerase EC 1.2.1.12 glyceraldehyde-3-phosphate dehydrogenase EC 1.4.4.2 glycine dehydrogenase (glycine cleavage system P protein) EC 1.1.1.42 EC 1.1.1.37 EC 2.1.1.14 methionine synthase EC 1.5.1.20 methylenetetrahydrofolate reductase na monoterpene synthase EC 1.6.99.3 NADH dehydrogenase EC 1.1.1.40 NADP malic enzyme . EC 2.7.4.6 nucleoside diphosphate kinase na p-coumaroyl shikimate 3'-hydroxylase EC 4.3.1.5 phenylalanine ammonia-lyase EC 2.5.1.54 phospho-2-dehydro-3-deoxyheptonate aldolase EC 5.4.2.2 phosphoglucomutase EC 1.1.1.95 phosphoglycerate dehydrogenase-like protein EC 2.7.2.3 phosphoglycerate kinase EC 2.6.1.52 phosphoserine aminotransferase EC 2.7.1.90 pyrophosphate--fructose 6-phosphate 1-phosphotransferase alpha subunit EC 2.7.1.90 pyrophosphate--fructose 6-phosphate 1-phosphotransferase beta subunit EC 2.5.1.6 S-adenosylmethionine synthetase na selinene synthase EC 2.1.2.1 serine hydroxymethyltransferase EC 2.4.1.13 sucrose synthase EC 2.2.1.2 transaldolase EC 2.2.1.1 transketolase EC 5.3.1.1 triosephosphate isomerase EC 5.1.3.2 UDP-glucose 4-epimerase EC 2.7.7.9 UTP-glucose-1-phosphate uridylyltransferase na vestitone reductase

Housekeeping proteins 40S ribosomal protein S5 40S ribosomal protein SA (p40) 60S acidic ribosomal protein P0 60S ribosomal protein L10 60S ribosomal protein L15 60S ribosomal protein L18 70 kDa heat shock protein chaperonin-60 beta subunit precursor chloroplast chaperonin 21 elongation factor EF-2 glycine-rich RNA binding protein 1 heat shock protein 82 heat shock protein 90 histone H3 histone H4 variant TH011 nascent polypeptide associated complex alpha chain EC 3.4.25.1 proteasome subunit alpha type 4 ribosomal protein

245

ribosomal protein S16 small subunit RuBisCO subunit binding-protein alpha subunit, (60 kDa chaperonin alpha

subunit)

Hypothetical proteins F7A19.2 protein (At1g13930/F16A14.27) (Q9XI93_ARATH) Gb|AAC98059.1 (AT3g18420/MYF24_13) (Q9LS48_ARATH) hypothetical protein (Q38JI1_SOLTU) hypothetical protein At4g25650 (Q93WJ7_ARATH) hypothetical protein OSJNBb0081B07.22 (Q852A3_ORYSA) hypothetical protein P0022B05.126 (Q8LHX6_ORYSA) putative dehydration-induced protein (Q7X9S2_GOSBA) SET domain-containing protein SET104 (Q8L820_MAIZE)

Enzymes with unknown functions acyltransferase homolog EC 1.1.1.1 alcohol dehydrogenase class III EC 1.14.-.- cytochrome P450 82A3 EC 1.14.-.- cytochrome b5 isoform 2 EC 1.14.-.- elicitor-inducible cytochrome P450 short-chain alcohol dehydrogenase (TASSELSEED2-like protein)

Oxidative stress proteins EC 1.11.1.11 ascorbate peroxidase EC 1.11.1.6 catalase EC 1.15.1.1 manganese superoxide dismutase EC 1.6.5.4 monodehydroascorbate reductase EC 1.11.1.15 thioredoxin-dependent peroxidase

Signal transduction and transport 36 kDa outer mitochondrial membrane protein porin calreticulin precursor GTP-binding protein phosphate transporter

Fatty acid biosynthesis EC 6.4.1.2 acetyl-CoA carboxylase EC 1.3.1.9 enoyl-ACP reductase EC 3.1.4.4 phospholipase D

Amino acid biosynthesis EC 6.3.1.2 glutamine synthetase

246

B2. Spectral count and descriptions of protein entries

Representative ID SW MC EMX SD Description 5- methyltetrahydropteroyltriglutamate-- homocysteine methyltransferase (EC Basil12_0001_341_3 93 61 118 83 2.1.1.14) (Methionine Synthase) Basil12_0143_01_3 136 6 3 136 Adenosine kinase isoform 2S Basil12_0073_02_1 27 13 84 112 Adenosylhomocysteinase (EC 3.3.1.1) Methylenetetrahydrofolate reductase Basil12_0067_01_1 71 30 78 42 (EC 1.5.1.20) Basil12_0001_016_1 30 48 44 96 GcpE protein Basil12_0898_01_1 37 80 43 28 Sucrose synthase (EC 2.4.1.13) Basil12_0001_279_2 19 48 65 53 Aldolase S-adenosylmethionine synthetase 3 Basil12_0001_326_1 56 48 55 22 (EC 2.5.1.6) 6-phosphogluconate dehydrogenase, putative; 13029-14489 (6- phosphogluconate dehydrogenase, Basil12_0001_028_1 29 47 51 43 putative) Cinnamyl-alcohol dehydrogenase 1 Basil12_0001_382_1 24 24 47 64 (EC 1.1.1.195) Basil12_2459_01_1 12 38 47 61 Enolase (EC 4.2.1.11) Basil12_1457_01_3 11 43 64 38 Transketolase (EC 2.2.1.1) S-adenosylmethionine synthetase 1 Basil12_0001_090_2 9 20 70 21 (EC 2.5.1.6) Phospho-2-dehydro-3-deoxyheptonate aldolase 2, chloroplast precursor (EC Basil12_0001_230_3 45 54 10 8 2.5.1.54) (DAHP Synthase) Allergenic isoflavone reductase-like protein Bet v 6.0102 (PCBER Basil12_0678_01_2 11 32 31 42 Homolog) Basil12_0001_179_3 37 30 28 19 Transaldolase (EC 2.2.1.2) Basil12_1846_01_2 16 48 14 34 70 kDa heat shock protein Phenylalanine ammonia-lyase (EC Basil12_0053_03_2 24 46 26 14 4.3.1.5) Serine hydroxymethyltransferase (EC Basil12_0001_161_2 14 22 25 35 2.1.2.1) Basil12_0068_02_3 8 35 25 22 Ascorbate peroxidase

247

S-adenosylmethionine synthetase 2 Basil12_3819_01_2 23 16 15 33 (EC 2.5.1.6) Basil12_1275_01_2 3 39 13 24 Thioredoxin-dependent peroxidase Caffeoyl-CoA O-methyltransferase Basil12_0001_067_1 9 16 41 12 (EC 2.1.1.104) Basil12_0052_02_2 13 28 13 23 Chaperonin-60 beta subunit precursor Glyceraldehyde-3-phosphate Basil12_0001_190_3 10 24 14 29 dehydrogenase (EC 1.2.1.12) phosphoserine aminotransferase Basil12_1405_01_1 9 19 26 16 (At2g17630) Basil12_0249_01_1 7 22 7 32 Ferredoxin-NADP+ reductase enzyme Basil12_0716_01_1 9 14 26 18 aspartate aminotransferase Triosephosphate isomerase, cytosolic Basil12_0108_02_3 2 20 30 15 (EC 5.3.1.1) Basil12_3635_01_3 6 16 25 19 fructokinase 2 Basil12_0001_073_1 16 23 11 13 Flavonoid 4'-O-methyltransferase Basil12_3297_01_4 11 7 15 30 Selinene synthase Pyrophosphate--fructose 6-phosphate 1-phosphotransferase beta subunit (EC Basil12_2606_01_1 9 4 28 20 2.7.1.90) 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase, chloroplast Basil12_4222_01_2 3 14 26 13 precursor (EC 2.7.7.60) Phosphoglycerate kinase, cytosolic Basil12_0234_01_2 4 20 18 14 (EC 2.7.2.3) Basil12_2999_01_6 16 10 8 16 40S ribosomal protein SA (p40) RuBisCO subunit binding-protein alpha subunit, chloroplast precursor Basil12_0059_03_2 18 6 7 18 (60 kDa chaperonin alpha subunit) Basil12_0001_417_3 5 18 14 11 Calreticulin precursor Basil12_2803_01_1 1 2 1 43 Phosphate transporter Aminomethyltransferase, mitochondrial precursor (EC 2.1.2.10) Basil12_2254_01_2 3 21 20 2 (Glycine Cleavage System) Basil12_0012_01_2 2 22 6 15 Flavonoid 7-O-methyltransferase Eugenol Synthase (Isoflavone Basil12_0001_036_3 17 3 9 16 reductase-like protein) Chalcone--flavonone isomerase (EC Basil12_0014_01_3 6 19 17 2 5.5.1.6)

248

Nucleoside diphosphate kinase (EC Basil12_1011_01_3 4 4 22 14 2.7.4.6) Basil12_1272_02_2 8 17 6 11 heat shock protein 90 Geranyl diphosphate synthase small Basil12_1770_01_3 2 11 10 19 subunit (Gpp synthase small subunit) P-coumaroyl shikimate 3'-hydroxylase Basil12_0001_334_3 20 18 1 2 isoform 2 Basil12_0010_04_3 11 8 8 11 Glycine dehydrogenase P protein Basil12_0010_05_1 1 5 14 18 Malate dehydrogenase (EC 1.1.1.37) DUF domain, unknown function F25P12.97 protein Basil12_0555_01_2 2 12 11 11 (At1g56580/F25P12_18) 2-C-methyl-D-erythritol 2,4- cyclodiphosphate synthase, chloroplast Basil12_0089_01_1 4 6 13 12 precursor (EC 4.6.1.12) 4-diphosphocytidyl-2-C-methyl-D- erythritol kinase, chloroplast precursor Basil12_0006_04_3 4 11 3 15 (EC 2.7.1.148) 2,3-bisphosphoglycerate-independent Basil12_0001_402_2 2 12 11 4 phosphoglycerate mutase (EC 5.4.2.1) Basil12_0001_352_3 3 3 9 14 Isocitrate dehydrogenase (EC 1.1.1.42) Basil12_1379_01_1 6 7 10 5 geraniol dehydrogenase Phosphoglucomutase, cytoplasmic (EC Basil12_3944_01_2 1 9 7 7 5.4.2.2) Triosephosphate isomerase, Basil12_0739_01_1 1 6 14 3 chloroplast precursor (EC 5.3.1.1) Monodehydroascorbate reductase (EC Basil12_3217_01_2 2 12 4 5 1.6.5.4) Basil12_1503_01_4 4 7 4 7 Peroxiredoxin precursor Basil12_3287_01_3 3 5 4 9 Lipoamide dehydrogenase mitochondrial NAD-dependent malate Basil12_1775_01_1 2 5 8 6 dehydrogenase Basil12_1177_01_3 1 8 7 3 3-dehydroquinate synthase-like protein 5-enolpyruvylshikimate-3-phosphate Basil12_0039_01_2 8 2 4 2 synthase (EPSP Synthase) Basil12_0391_01_3 1 5 5 4 NADP malic enzyme (EC 1.1.1.40). Basil12_4900_01_3 2 3 2 4 40S ribosomal protein S5 Basil12_2215_02_3 2 2 5 2 Aconitate hydratase.

249

Gb|AAC98059.1 Basil12_0505_01_2 2 4 3 2 (AT3g18420/MYF24_13) Basil12_0327_01_1 1 5 2 2 Vestitone reductase Alcohol dehydrogenase class III (EC Basil12_0128_02_3 2 2 2 3 1.1.1.1) Q9AXR6_CAPAN 1 3 1 2 ATP:citrate lyase. Basil12_1626_01_1 1 2 2 2 glycine hydroxymethyltransferase short-chain alcohol dehydrogenase Basil12_0001_077_2 36 5 69 (TASSELSEED2-like protein) 1,8-cineole synthase, chloroplast Basil12_0036_01_3 58 31 4 precursor (EC 4.2.3.-) Basil12_0005_03_1 18 38 18 Flavonoid 4'-O-methyltransferase 1-deoxy-D-xylulose-5-phosphate Basil12_1351_01_3 7 20 31 reductoisomerase AT4g25650/L73G19_30 (Hypothetical Basil12_3764_01_3 4 7 41 protein At4g25650) Q4ABW1_BRARP 5 3 41 Histone H3 4D11_26. Basil12_0023_02_3 26 7 15 Elicitor-inducible cytochrome P450 Basil12_0036_02_5 44 2 2 Monoterpene synthase H41_WHEAT 11 1 32 Histone H4 variant TH011. SET domain-containing protein Q8L820_MAIZE 27 11 5 SET104. Basil12_0001_186_3 13 19 10 4-coumarate: CoA ligase Basil12_0567_01_2 15 14 12 Chloroplast chaperonin 21 Basil12_0025_04_1 1 4 36 ATP synthase beta subunit Basil12_0001_407_2 12 7 22 GGPP synthase Basil12_5727_01_2 9 1 30 60S acidic ribosomal protein P0 Basil12_1228_01_3 6 1 28 60S ribosomal protein L18 Basil12_0010_03_2 14 4 15 P protein-like Cinnamate 4-hydroxylase (EC Basil12_0007_04_2 13 5 11 1.14.13.11) Basil12_2191_01_6 5 4 18 Hypothetical protein At4g34350 Caffeic acid 3-O-methyltransferase 1 Basil12_0144_01_2 7 11 8 (EC 2.1.1.68) Pyrophosphate--fructose 6-phosphate Basil12_0177_02_1 4 17 4 1-phosphotransferase alpha subunit

250

(EC 2.7.1.90) Hypothetical protein Basil12_4244_01_3 17 4 3 OSJNBb0081B07.22 Basil12_0001_210_1 9 3 11 cytochrome b5 isoform 2 Basil12_0004_04_2 9 12 2 Cytochrome P450 82A3 (EC 1.14.-.-) Basil12_0040_01_2 3 7 13 Glucose-6-phosphate isomerase UTP--glucose-1-phosphate Basil12_0694_01_4 6 3 12 uridylyltransferase (EC 2.7.7.9) Basil12_2967_01_2 4 6 9 Flavone synthase II Mitochondrial F1-ATPase gamma Basil12_1686_01_4 2 1 16 subunit Hypothetical protein Basil12_0003_05_2 5 2 11 OSJNBb0081B07.22 Basil12_3953_01_3 7 6 5 EFG2 OSJNBa0020P07.3 protein Basil12_2214_01_2 2 2 13 60S ribosomal protein L10 (EQM) Small subunit ribosomal protein S16 Q6RK71_CAPAN 2 2 13 (Fragment). Glucose-6-phosphate 1- dehydrogenase, cytoplasmic isoform Basil12_0171_01_2 1 7 8 (EC 1.1.1.49) Basil12_2556_01_4 8 6 2 GTP-binding protein Phospholipase D Basil12_0001_140_3 9 1 5 At3g15730/MSJ11_13 Basil12_0170_02_1 4 2 9 ribosomal protein Basil12_0001_049_1 1 3 9 1-deoxyxylulose-5-phosphate synthase Adenylate kinase, chloroplast (EC Basil12_0030_02_3 2 4 7 2.7.4.3) Basil12_0173_01_1 3 9 1 glycine-rich RNA binding protein 1 Basil12_0001_045_3 2 2 8 Catalase (EC 1.11.1.6) Basil12_0120_01_1 4 6 2 Manganese superoxide dismutase Phosphoglycerate dehydrogenase-like Basil12_4533_01_6 2 7 2 protein Basil12_3100_01_3 5 1 5 Acyltransferase homolog Basil12_5899_01_2 2 6 3 Adenylate kinase b (Ec 2.7.4.3) Basil12_0585_01_2 1 3 6 Farnesyl diphosphate synthase Basil12_3696_01_1 2 1 7 NADH dehydrogenase (EC 1.6.99.3)

251

Basil12_0003_08_3 3 6 1 Phenylalanine ammonia-lyase 2 Basil12_2498_01_5 1 2 7 At2g05990 Glutamine synthetase, cytosolic Basil12_0001_246_1 2 2 6 isozyme (EC 6.3.1.2) nascent polypeptide associated Basil12_1127_01_2 1 8 1 complex alpha chain Basil12_1272_01_4 2 6 1 heat shock protein 82 Basil12_1892_01_2 1 3 5 60S ribosomal protein L15 Q38970_ARATH 3 5 1 Acetyl-CoA carboxylase. F7A19.2 protein Basil12_0326_01_2 4 3 2 (At1g13930/F16A14.27) 36 kDa outer mitochondrial membrane protein porin (Voltage-dependent Basil12_3409_01_2 1 1 6 anion-selective channel protein) 3-phosphoshikimate 1- Basil12_0133_01_1 1 6 1 carboxyvinyltransferase Chorismate synthase 1, chloroplast Basil12_0001_126_1 4 3 1 precursor (EC 4.2.3.5) Q38JI1_SOLTU 2 1 4 Hypothetical protein. NAD-malate dehydrogenase precursor Basil12_1572_01_5 1 4 2 (EC 1.1.1.37) Basil12_0001_089_2 2 2 2 Dehydration-induced protein RD22 UDP-glucose 4-epimerase GEPI48 Basil12_1535_01_3 2 2 2 (EC 5.1.3.2) Proteasome subunit alpha type 4 (EC Basil12_2908_01_6 2 1 1 3.4.25.1) Basil12_0016_04_2 21 93 Alcohol dehydrogenase (EC 1.1.1.1) Basil12_0001_268_2 9 52 Elongation factor 1-alpha Chavicol O-methyltransferase (EC Basil12_0001_351_1 12 45 2.1.1.146) ADP,ATP carrier protein 1, mitochondrial precursor (ADP/ATP Basil12_0001_392_3 4 40 1) Q34691_HELAN 4 39 F1 ATPase alpha subunit. Basil12_3182_01_3 20 21 Cytokinin binding protein CBP57 Basil12_0001_306_3 23 13 Hypothetical protein Basil12_4490_01_1 11 25 Tonoplast intrinsic protein bobTIP26-1

252

Basil12_0647_01_3 1 34 14-3-3 protein Basil12_0001_020_3 13 21 CCMT1 Basil12_1487_01_3 5 27 60S ribosomal protein L6 (YL16-like) Mitochondrial 2-oxoglutarate/malate Basil12_0033_03_2 9 22 carrier protein Basil12_0762_01_3 10 20 SRG1-like protein (At4g25310) Basil12_1895_01_2 7 22 60S ribosomal protein L23a-2 Basil12_0001_259_1 24 2 Gamma-cadinene synthase Basil12_3630_01_3 2 24 60S acidic ribosomal protein P0 Basil12_2877_01_6 9 16 Ribosomal protein S26 Basil12_2338_01_2 4 20 Adenine nucleotide translocase Q9M516_CAPAN 17 6 Translation elongation factor 1a. Basil12_4908_01_3 3 20 40S ribosomal protein S8 Basil12_0895_01_5 3 20 60S ribosomal protein L7-1 60S ribosomal protein L4-2 (L1). Basil12_5789_01_3 9 13 AT5g02870/F9G14_180 Basil12_0009_02_2 6 16 Germacrene D synthase 40S ribosomal protein S3a (CYC07 Basil12_2085_01_1 3 19 protein) Basil12_2265_01_2 12 10 ribosomal protein S14 Hypothetical protein F5M6.12 Q9C6S8_ARATH 7 12 (At1g31870). Hydroxymethylbutenyl 4-diphosphate Q672R6_MAIZE 5 13 synthase. Ribosomal protein L14-like protein Basil12_1010_01_2 8 10 (AT4g27090/T24A18_40) Basil12_0001_031_1 14 3 Alpha-zingiberene synthase Q8H9B0_9CARY 8 9 Eukaryotic elongation factor 1A. Basil12_0001_097_1 5 12 Flavonoid 4'-O-methyltransferase Basil12_0029_04_1 2 15 Hypothetical protein Basil12_1309_01_2 1 16 ribosomal protein L19 Translationally-controlled tumor TCTP_CUCME 2 14 protein homolog (TCTP). Basil12_0624_01_1 4 11 ribosomal protein

253

60S ribosomal protein L11 Q676Z7_HYAOR 3 11 (Fragment). Basil12_4792_01_3 7 7 mitochondrial processing peptidase Basil12_3265_01_2 3 11 Ribosomal protein L36 Succinyl-CoA ligase alpha subunit Basil12_2137_01_3 1 12 (EC 6.2.1.5) Basil12_5062_01_1 5 7 mitochondrial ATP synthase Basil12_0152_02_1 7 5 Ferredoxin, root R-B1 Basil12_0359_01_2 3 9 Ribosomal protein L17 Basil12_4393_02_2 3 9 Ribosomal protein S6 Basil12_0937_01_3 9 2 Malate like protein Basil12_4059_01_1 4 7 Stachyose synthase (EC 2.4.1.67) 60S ribosomal protein L13 (BBC1 Basil12_4637_01_1 1 10 protein homolog) Basil12_0001_478_6 1 10 60S ribosomal protein L2 (L8) Basil12_1318_01_3 4 7 ribosomal protein L28 Basil12_0114_02_2 5 5 beta-ketoacyl-CoA synthase Basil12_2003_01_4 6 4 NADP-isocitrate dehydrogenase ACT11_ARATH 4 6 Actin-11. H11_ARATH 3 7 Histone H1.1. Basil12_0001_215_2 3 7 Histone H2B NADH-ubiquinone oxidoreductase 49 NUCM_ARATH 8 2 kDa subunit (EC 1.6.5.3) Basil12_1984_01_1 1 8 Biotin carboxylase subunit Basil12_1598_01_2 3 6 60S ribosomal protein L9 EF2_BETVU 4 5 Elongation factor 2 (EF-2). Glycine decarboxylase complex H- Basil12_1191_01_3 5 4 protein 3-hydroxyisobutyryl-coenzyme A Basil12_0111_01_3 7 1 hydrolase (At1g06550) Chloroplast 1-hydroxy-2-methyl- butenyl 4-diphosphate reductase Q5EGH0_ARATH 1 6 precursor. Q5WMY3_ORYSA 1 6 Cytoplasmic ribosomal protein L18. Basil12_4357_01_1 6 1 fatty acid elongase (At5g04530)

254

Basil12_5595_01_4 4 2 Patatin-like protein Basil12_1624_01_1 1 5 40S ribosomal protein S24 Basil12_3477_01_1 2 4 60S ribosomal protein L18a-1 Basil12_0910_01_1 4 2 60S ribosomal protein L22-3 Basil12_4384_01_1 3 3 Alanine aminotransferase Basil12_1790_01_2 3 3 Hypothetical protein orf109b Q56XD5_ARATH 2 4 Pyruvate kinase. Basil12_0875_01_2 2 4 Ribosomal protein L27 ATP-dependent Clp protease subunit Basil12_0034_03_2 4 1 ClpP (NClpP1) Q84LJ5_AVESA 3 2 1,3-beta glucanase. Basil12_0054_03_2 2 3 40S ribosomal protein S15A Basil12_1342_01_1 2 3 40S ribosomal protein S25 Basil12_3405_01_3 1 4 acyl-CoA synthetase Cytochrome c oxidase subunit 2 (EC COX2_BETVU 2 3 1.9.3.1) Basil12_2880_01_2 3 2 Flavonoid 7-O-methyltransferase Phosphoglycerate kinase, chloroplast Basil12_1805_01_3 3 2 precursor (EC 2.7.2.3) Basil12_0224_01_6 3 2 Phospholipase PLDb1 Basil12_3167_01_3 2 3 Prephenate dehydratase mitochondrial aldehyde Basil12_2096_01_3 2 2 dehydrogenase ALDH2a Ubiquinol-cytochrome c1 reductase Basil12_3646_01_6 3 1 cytochrome c1-like protein Basil12_0001_048_1 2 2 12-oxophytodienoate reductase Chloroplast inner envelope protein, Basil12_0398_02_2 3 1 110 kD (IEP110) Enoyl-ACP reductase precursor (EC O24258_PETHY 2 2 1.3.1.9). Expressed protein (Hypothetical Basil12_4480_02_1 1 3 protein At2g33220) Q308B0_SOLTU 2 2 Hypothetical protein. Basil12_0916_01_1 2 2 Isopentenyl diphosphate isomerase 2 Proteasome subunit alpha type 5 (EC PSA5_ORYSA 2 2 3.4.25.1)

255

Basil12_0712_01_3 2 2 Tubulin beta-1 (Beta-tubulin) Ketol-acid reductoisomerase, Basil12_0195_02_1 1 2 chloroplast precursor (EC 1.1.1.86) Basil12_0001_333_2 1 2 60S ribosomal protein L24 Acidic ribosomal protein P2 (Acidic Q7X729_WHEAT 1 2 ribosomal protein P2a-2) Basil12_0001_155_5 1 2 Ankyrin-repeat protein HBP1 Basil12_2495_02_3 2 1 AT3g48870/T21J18_140 Basil12_0408_01_2 2 1 Beta-ketoacyl-ACP synthase I cell division control protein 48 CD48D_ARATH 2 1 homolog D (AtCDC48d) Basil12_1979_01_3 2 1 F27F5.25 Glucose-6-phosphate 1- dehydrogenase, chloroplast precursor G6PDC_SOLTU 1 2 (EC 1.1.1.49) Glutamate synthase [NADH], GLSN_MEDSA 2 1 chloroplast precursor (EC 1.4.1.14) Basil12_0120_02_3 2 1 Highly similar to YUP8H12R.39 Basil12_0001_311_2 2 1 movement protein Q2V4B7_ARATH 1 2 Protein At1g79930. Succinyl CoA ligase beta subunit (EC Basil12_0616_02_2 1 2 6.2.1.5) Q6W2J3_LOTJA 1 2 VDAC1.3. 3-ketoacyl-CoA thiolase 2, Basil12_0035_01_2 1 1 peroxisomal precursor (EC 2.3.1.16) Basil12_0001_009_2 1 1 glutathione S-transferase NADH dehydrogenase subunit F Q716T7_9POAL 1 1 (Fragment). Cytosolic aldehyde dehydrogenase Basil12_2664_01_3 1 1 RF2C Dehydroquinate dehydratase/shikimate:NADP Basil12_0087_02_1 1 1 oxidoreductase (EC 4.2.1.10) Basil12_2452_01_6 1 1 Hypothetical protein Basil12_5343_01_2 1 1 Hypothetical protein At1g56070 Basil12_4105_01_3 1 1 NADH dehydrogenase subunit 9 Basil12_2394_01_6 1 1 Protease Do-like 1, chloroplast

256

precursor (EC 3.4.21.-) Eugenol O-methyltransferase (EC Basil12_2577_01_3 57 2.1.1.146) Q75G91_ORYSA 28 ribosomal protein. Basil12_0681_01_1 25 Dihydrolipoamide S-acetyltransferase Basil12_1237_01_1 25 Ribosomal protein Basil12_2371_01_1 24 40S ribosomal protein S21 Q9FEP0_9MAGN 19 ISPH (LYTB-like protein precursor) Basil12_5216_01_1 14 Histone H2A Q5VKY5_9LILI 13 F1-ATPase alpha subunit (Fragment). Basil12_2116_01_1 13 Terpinolene synthase Basil12_1012_01_3 12 60S ribosomal protein L2 Basil12_5711_01_1 11 L24 ribosomal protein Basil12_0001_323_4 10 Flavonoid 7-O-methyltransferase Basil12_0057_01_2 10 Phosphate translocator precursor Basil12_2649_01_1 9 LeArcA2 protein Basil12_1032_01_2 9 senescence-associated protein Basil12_1285_01_5 8 101 kDa heat shock protein Ribosomal protein S3a-like protein Q9M339_ARATH 8 (AT3g53870/F5K20_170). Basil12_0559_01_3 7 ADP-ribosylation factor Basil12_0020_04_1 7 Beta-myrcene synthase Basil12_3568_01_1 7 Cytoplasmic ribosomal protein S13 Hypothetical protein T16L1.170 Basil12_2890_01_3 7 (Hypothetical protein AT4g33680) (3R)-hydroxymyristoyl-[acyl carrier protein] dehydratase-like protein Basil12_1108_01_1 6 (At5g10160) O81396_CITLI 6 -iron regulated protein 1. ATP synthase 24 kDa subunit, Basil12_4319_01_2 6 mitochondrial precursor (EC 3.6.3.14) ATP synthase delta' chain, Basil12_2120_01_5 6 mitochondrial precursor (EC 3.6.3.14) Basil12_2376_01_2 6 chorismate mutase

257

Basil12_0549_01_1 6 Hypothetical protein At1g56070 Hypothetical protein B1099D03.27 Q8L595_ORYSA 6 (Hypothetical protein P0434C04.47). Basil12_1305_01_1 6 PHB2 Basil12_1234_01_2 6 Ribosomal protein L3A Superoxide dismutase [Cu-Zn] (EC Basil12_0001_201_3 6 1.15.1.1) Basil12_1111_01_2 5 G-box binding protein-like Q93X25_PEA 5 2-Cys peroxiredoxin. Basil12_2230_01_2 5 60S ribosomal protein L21 Basil12_1717_01_1 5 60S ribosomal protein L30 Basil12_1611_01_2 5 60S ribosomal protein L7A AC009176 putative heat-shock Q2QPX2_ORYSA 5 protein. Q65XR7_ORYSA 5 adenylate translocator (Brittle-1) Cinnamoyl CoA reductase (EC Basil12_0132_01_2 5 1.2.1.44) Luminal binding protein, BiP Q94IK4_SCHDU 5 precursor (Fragment). Basil12_0091_02_3 5 Major allergen Pru av 1 (Pru a 1) Q9FR04_PERFR 5 Malonyl-CoA:ACP transacylase. 2-hydroxyacid dehydrogenase Q7XMP6_ORYSA 5 OSJNBb0059K02.15 protein. Basil12_1976_01_2 5 ribose-5-phosphate isomerase Q2QX68_ORYSA 5 Ribosomal protein L3, putative. Q2VCJ2_SOLTU 5 Ribosomal protein L3-like. Basil12_0641_01_3 4 40S ribosomal protein S19-2 Basil12_0253_01_3 4 40S ribosomal protein S7 Basil12_3859_01_1 4 60S ribosomal protein L35 Basil12_4533_02_5 4 60S ribosomal protein L38 Basil12_2431_01_1 4 6-phosphogluconolactonase Basil12_2125_01_2 4 Cytochrome b5 isoform 1 cytochrome P450-dependent fatty acid Basil12_1210_01_3 4 hydroxylase

258

Disease resistance protein, putative; Q9C699_ARATH 4 3954-7013. Hypothetical protein (Putative transthyretin, having alternative Q7XXQ9_ORYSA 4 splicing products). Q38HU5_SOLTU 4 Hypothetical protein. Methylthioadenosine/S-adenosyl homocysteine nucleosidase Basil12_0855_01_2 4 (Hypothetical protein) Basil12_0001_359_3 4 CCMT2 Ubiquitin carboxyl-terminal hydrolase UBP2_ARATH 4 2 (EC 3.1.2.15) Basil12_3106_01_5 4 26S proteasome regulatory complex subunit (26S proteasome subunit Basil12_5070_01_2 3 RPN12b) Basil12_0758_01_6 3 40S ribosomal protein S10 Basil12_1592_01_3 3 40S ribosomal protein S11 Basil12_0147_01_3 3 4-coumarate:coenzyme A ligase 2 Basil12_6172_01_2 3 60S ribosomal protein L22-2 Q69LD4_ORYSA 3 short-chain dehydrogenase/reductase. AT4g25650/L73G19_30 (Hypothetical Basil12_0001_236_2 3 protein At4g25650) ATP-dependent Clp protease, ATP- Basil12_0001_425_3 3 binding subunit Q2R3R2_ORYSA 3 BAG domain, putative. Q56YW3_ARATH 3 Disease resistence like-protein. E1 alpha subunit of pyruvate Basil12_1461_01_3 3 dehydrogenase Basil12_0459_01_1 3 Ferredoxin Hypothetical protein Basil12_3830_01_2 3 At3g58090/T10K17_300 Basil12_0001_357_2 3 Hypothetical protein At4g34350 Basil12_3170_01_2 3 Hypothetical protein OJ1656_A11.19 Q5XWM1_SOLTU 3 Hypothetical protein. Mitochondrial import inner membrane Q8L927_ARATH 3 translocase subunit Tim9.

259

Basil12_0963_01_2 3 Nuclear RNA binding protein A-like Basil12_0001_165_1 3 O-methyltransferase Ketol-acid reductoisomerase, chloroplast [Precursor] Basil12_0001_370_1 3 OSJNBa0027P08.14 protein Basil12_2334_01_5 3 OSJNBb0066J23.7 protein UDP glucosyltransferase Q7XWV5_ORYSA 3 OSJNBb0067G11.12 protein. Basil12_2221_01_1 3 phosphoglucomutase, chloroplast pyruvate dehydrogenase E1 beta Basil12_2435_01_6 3 subunit isoform 1 protein Q2V506_9ROSI 3 UDP-glucose pyrophosphorylase. Unknown mitochondrial protein Basil12_0299_01_3 3 At2g27730 Basil12_1097_01_3 3 20S proteasome beta subunit PBG1 Basil12_1964_01_3 2 (EC 3.4.99.46) Q2VCJ6_SOLTU 2 40S ribosomal protein S19-like. Q8H189_ARATH 2 40S ribosomal protein S20. Basil12_0284_01_2 2 40S ribosomal protein S23 40S ribosomal protein S3a (CYC07 Basil12_5555_01_2 2 protein) Basil12_2300_01_3 2 60s acidic ribosomal protein Basil12_1476_01_2 2 60S ribosomal protein L37a HSP81_ARATH 2 Heat shock protein 81-1 (HSP81-1) Basil12_1036_02_3 2 Heat shock protein, putative Basil12_0001_516_1 2 Acyltransferase homolog Q5DKU6_TOBAC 2 Adenosine kinase isoform 2T. Basil12_0294_01_3 2 Allyl alcohol dehydrogenase Basil12_4982_01_3 2 anthocyanidine rhamnosyl-transferase Arabidopsis thaliana genomic DNA, chromosome 5, P1 clone:MIO24 Basil12_0847_01_4 2 (AT5g51720/MIO24_14) Basil12_1195_01_6 2 At1g12050/F12F1_8 Basil12_2836_01_1 2 AT3g62580/T12C14_280

260

Q85FQ8_CYAME 2 ATP synthase CF1 alpha chain. Q8HFC9_9POAL 2 ATPase F1 alpha subunit (Fragment). Q5ZBH8_ORYSA 2 auxin-induced protein. CALX_HELTU 2 Calnexin homolog precursor. Citrate Synthase Domain 2 F19J9.9 Basil12_0652_01_1 2 protein (At1g09430/F19J9_9) Coated vesicle membrane protein-like Basil12_2012_01_2 2 (AT3g22845/MWI23_22) Basil12_0001_317_2 2 Cyclophilin Q60CZ3_SOLDE 2 D-ribulose-5-phosphate 3-epimerase. Basil12_0020_06_3 2 Fenchol synthase FKBP (GLP_440_93577_93248) Basil12_0887_01_1 2 peptidyl prolyl isomerase Glutamate-1-semialdehyde 2,1- aminomutase, chloroplast precursor GSA_SOYBN 2 (EC 5.4.3.8) Basil12_6380_01_1 2 Glyoxalase I Basil12_2183_01_1 2 Hypothetical protein Q9ZUL9_ARATH 2 Hypothetical protein At2g02150. Basil12_2863_01_1 2 Hypothetical protein At2g39070 Hypothetical protein F14L2_70 Basil12_0001_498_3 2 (At3g44520) Q9SD39_ARATH 2 Hypothetical protein F24M12.110. Hypothetical protein Basil12_0001_295_1 2 OSJNAb0008A05.23 Hypothetical protein P0676G05.2 Q75M12_ORYSA 2 (Hypothetical protein P0431G05.11). Basil12_0891_01_3 2 IPP isomerase Basil12_0001_388_3 2 L-asparaginase (EC 3.5.1.1) Basil12_0457_01_1 2 LeArcA1 protein Light-independent protochlorophyllide CHLN_CHLRE 2 reductase subunit N (EC 1.18.-.-) Basil12_2762_01_1 2 Light-inducible protein ATLS1 Basil12_0027_04_3 2 lipase; 20450-21648 (At1g52760) Mus musculus adult male testis cDNA, Basil12_4348_01_1 2 RIKEN full-length enriched library,

261

clone:1700030F05 product:fatty acid Coenzyme A ligase, long chain 5, full insert sequence Non-specific lipid transfer protein Basil12_0001_087_1 2 precursor Basil12_2114_01_3 2 Nuclear RNA binding protein A-like Basil12_6388_01_3 2 OJ000315_02.15 protein TLC1_SOLTU 2 Plastidic ATP/ADP-transporter. Proteasome subunit beta type 1 (EC Basil12_4353_01_2 2 3.4.25.1) Pyruvate dehydrogenase E1 alpha Basil12_1029_01_1 2 subunit (EC 1.2.4.1) Basil12_3663_01_1 2 ribosomal protein Q9ATF4_CASSA 2 Ribosomal protein L33. RuBisCO subunit binding-protein beta subunit, chloroplast precursor (60 kDa RUBB_PEA 2 chaperonin beta subunit) CCMT like OSJNBa0087O24.14 Basil12_0001_129_3 2 protein Basil12_0001_262_3 2 CCMT3 Basil12_5150_01_2 2 adenylosuccinate synthase (ADSS), Basil12_0536_01_2 2 transcription factor APFI Transposon protein, putative, CACTA, Q2QP23_ORYSA 2 En/Spm sub-class. Triose-phosphate isomerase (EC Q5JZZ3_PHAVU 2 5.3.1.1). Basil12_0422_01_2 2 Ubiquitin Basil12_4587_02_6 2 26S proteasome regulatory particle Basil12_5078_01_1 1 non-ATPase subunit12 Basil12_2902_01_2 1 26S proteasome subunit RPN7 3-deoxy-D-arabino-heptulosonate 7- Q30D04_FAGSY 1 phosphate synthase 1 (Fragment). 4,5-DOPA dioxygenase extradiol-like Basil12_0001_214_1 1 protein Basil12_3669_01_4 1 60S ribosomal protein L13a-2 ACT1_MAIZE 1 Actin-1.

262

adenosine kinase OSJNBa0073E02.13 Q7X905_ORYSA 1 protein. Q8L3R8_ARATH 1 AT3g08530/T8G24_1. Q84VZ9_ARATH 1 At4g05590. Q851W4_ORYSA 1 auxin response factor. Q93X31_TOBAC 1 beta5 proteasome subunit (Fragment). Q8S4R0_CRYCO 1 BiP (Fragment). BTF3b-like factor (Transcription Basil12_0956_01_4 1 factor, putative) Q42267_ARATH 1 Carrier protein (Fragment). Q69AB3_HELAN 1 CC-NBS-LRR. Clp-like energy-dependent protease Basil12_3109_01_2 1 precursor Cytochrome b5 domain-containing Basil12_2944_01_2 1 protein-like Basil12_0216_01_2 1 cytochrome P450 Cytosolic 6-phosphogluconate Q7FRX8_ORYSA 1 dehydrogenase. TRAF-like Dbj|BAA87936.1 Q9LHA6_ARATH 1 (AT3g28220/T19D11_3). disease resistance protein Q8RXQ6_ARATH 1 (Hypothetical protein At4g13820). Q7Y204_ARATH 1 DnaJ protein. Basil12_2287_01_1 1 Elicitor-inducible protein EIG-J7 Q8S3W1_SOYBN 1 Elongation factor 1-gamma. Expressed protein O82279_ARATH 1 (At2g31040/T16B12.15). Expressed protein; supported by full length cDNA: Ceres: 122798 Basil12_3602_01_1 1 (At5g57655) Q5IBJ2_GOOOV 1 F1-ATPase alpha subunit (Fragment). O64597_ARATH 1 F-Box Protein F17O7.7 protein. Basil12_0001_115_3 1 Flavonoid 7-O-methyltransferase Basil12_2154_01_3 1 folylpolyglutamate synthetase Glucose 6 phosphate/phosphate Basil12_0714_01_3 1 translocator 1, chloroplast precursor

263

Basil12_4377_02_3 1 glucosyltransferase-3 Basil12_0090_01_2 1 Hydroxycinnamoyl transferase Basil12_2613_01_1 1 Hypothetical protein Q38957_ARATH 1 Hypothetical protein (Orf 05 protein) Q8H0W0_ARATH 1 Hypothetical protein At1g73060. Basil12_0987_01_1 1 Hypothetical protein At2g31670 Basil12_3307_01_3 1 Hypothetical protein At2g43630 Basil12_4250_01_3 1 Hypothetical protein At4g25370 Q8W496_ARATH 1 Hypothetical protein At4g25650. Basil12_2344_01_3 1 Hypothetical protein At4g34350 Hypothetical protein B1066D09.10 Basil12_5303_01_1 1 (Hypothetical protein B1026E06.24) Basil12_0484_01_4 1 Hypothetical protein B1206D04.24 Hypothetical protein F28K20.15 Basil12_0140_01_3 1 (At1g31190) Q6ZL93_ORYSA 1 Hypothetical protein OJ1065_B06.21. Hypothetical protein Basil12_0001_419_3 1 OSJNAb0008A05.23 Hypothetical protein Q5TKG2_ORYSA 1 OSJNBa0030I14.4. Hypothetical protein Q84R32_ORYSA 1 OSJNBb0016H12.19. Q654G1_ORYSA 1 Hypothetical protein P0412C04.25. Hypothetical protein P0516G10.35 Q6Z7R4_ORYSA 1 (Hypothetical protein P0585G03.13). Q5Z977_ORYSA 1 Hypothetical protein P0548E04.17. Q6ZBE7_ORYSA 1 Hypothetical protein P0671F11.26. Q2QNX5_ORYSA 1 Hypothetical protein. Q84WI5_ARATH 1 Hypothetical protein. Q8LEU2_ARATH 1 Hypothetical protein. Isopentenyl pyrophosphate isomerase Basil12_2064_01_2 1 IDI1 Basil12_3923_01_1 1 JPR ORF1 protein Basil12_1985_01_3 1 L24 ribosomal protein

264

Basil12_3077_01_2 1 lactoylglutathione lyase Q40870_PICGL 1 Legumin-like storage protein. Basil12_1615_01_3 1 L- (EC 1.1.1.27) Mechanosensitive ion channel domain- Q6ET89_ORYSA 1 containing protein-like. Methylmalonate semi-aldehyde Basil12_0080_01_3 1 dehydrogenase Q59IV1_MESCR 1 mitochondrial adenylate transporter. Mitochondrial processing peptidase alpha subunit, mitochondrial precursor Basil12_3079_01_3 1 (EC 3.4.24.64) Multicatalytic endopeptidase complex, Basil12_4200_01_2 1 proteasome, beta subunit NADH pyrophosphatase NUDT19, NUD19_ARATH 1 chloroplast precursor (EC 3.6.1.22) NADH-cytochrome b5 reductase-like Basil12_3973_01_3 1 protein (EC 1.6.2.2) NADH-ubiquinone oxidoreductase 18 kDa subunit, mitochondrial precursor Basil12_0109_02_1 1 (EC 1.6.5.3) NADH-ubiquinone oxidoreductase NU1M_ARATH 1 chain 1 (EC 1.6.5.3) Nascent polypeptide-associated complex alpha subunit-like protein 1 NACA1_ARATH 1 (NAC-alpha-like protein 1) Q5VS56_ORYSA 1 Nodulin MtN21-like. Basil12_2295_01_1 1 Nudix hydrolase 1 (EC 3.6.1.-) Q9M4Y8_CUCSA 1 OP1. Q39498_CYLFU 1 ORF218. Q36935_BETVU 1 ORFB (ATPase subunit 8). Chlorophyll a/b binding Basil12_1865_01_3 1 OSJNBa0036B21.6 protein Calycin_like OSJNBa0044M19.16 Q7XLP0_ORYSA 1 protein (OSJNBa0053B21.2 protein). Initiation factor IF_elF4G Basil12_0514_01_5 1 OSJNBb0065J09.6 protein Basil12_2959_01_3 1 Oxidoreductase Q75LR2_ORYSA 1 phosphate synthase.

265

Phosphate/phosphoenolpyruvate Basil12_0026_04_2 1 translocator Phosphoenolpyruvate carboxylase 1 CAPP1_ARATH 1 (EC 4.1.1.31) Basil12_4569_01_4 1 phosphoinositide phosphatase Q5G7K5_9STRA 1 PPAT5. O04504_ARATH 1 Prenyltransferase F21M12.21 protein. Retrotransposon protein, putative, Q53JT1_ORYSA 1 Ty3-gypsy sub-class. Ribosomal protein S27-like protein- Q2VCJ1_SOLTU 1 like. Basil12_5128_01_3 1 hemolysin Sodium/calcium exchanger protein, Q2RAK6_ORYSA 1 putative. Q6ZK48_ORYSA 1 step II splicing factor SLU7. , chloroplast Basil12_0001_518_1 1 precursor (EC 4.2.3.1) Thylakoid-bound ascorbate O04873_9ROSI 1 peroxidase. Basil12_5674_01_3 1 TRA1 protein Q7SIC9_MAIZE 1 Transferase. Q3HRV9_SOLTU 1 Triosphosphate isomerase-like protein. Tubulin alpha-6 chain (Alpha-6 Basil12_0969_01_4 1 tubulin) Basil12_1969_01_5 1 ZINC FINGER PROTEIN Basil12_6266_01_5 1

266

B3. Protein post-translational modifications found by X!tandem (less reliable results)

Cultivar Peptide Sequence a Protein Description Phenylalanine ammonia-lyase SW SKmIEREINsVNDNPLIDVSR Q58ZF0_LOTCO (Fragment). MC MGLtDREIVALSGGHTLGR Q5QIA9_VIGUN Peroxisomal ascorbate peroxidase. GGVLPGIKVDKGtVELAGTDGE Cytosolic aldolase SD Basil12_0001_279_2 TTTQGLDGLAQR MC sEHLKKPENWALVEK Q6K1R5_ORYSA Putative adenosine kinase SMSmPGFASLtRTPGGSCPGT OSJNBb0017I01.22 protein. MC Q7XKD2_ORYSA GAGTPTR SW IDFAAHtK Basil12_5451_01_5 Hypothetical protein F2P16.14 EMX RDtCRsIGFISDDVGLDADKCK METL_ARATH S-adenosylmethionine synthetase 2 sFFILLVDmDtFLFTSESVNEG S-adenosylmethionine synthetase 3 EMX Basil12_0001_350_1 HPDK MC VFSMKAsNPVIMVQAYR Basil12_0159_02_3 GcpE protein SD yVSIVLR HAK9_ORYSA Probable potassium transporter 9 QPAVNGAKsDsPTVGWGGPG Methylenetetrahydrofolate reductase SW Basil12_0001_177_1 GYVYQK SW YLKtAAYGHFGR METK_MESCR S-adenosylmethionine synthetase LsssNQVDMDTFLFTSESVNE S-adenosylmethionine synthetase 3 SD Basil12_0001_327_3 GHPDK SW HQAkLGGYDIPAESK Q9XEH8_PINTA Trans-cinnamate 4-hydroxylase 266 SW mkDRLVGVSEETTTGVK Basil12_2337_01_3 Adenosylhomocysteinase

267

Protein post-translational modifications found by X!tandem (less reliable results) - continued

Cultivar Peptide Sequence a Protein Description EMX IGTPGkGILAADESTGTIGK Basil12_0001_279_2 Cytosolic aldolase ELSGGAQkLLPEGTRLVV (E)-4-hydroxy-3-methylbut-2-enyl EMX Q6RI20_NICBE SVR diphosphate synthase (Fragment). 5-methyltetrahydropteroyl- IVEVNALAkALSGHKDEAFFS MC Basil12_0001_231_1 triglutamate--homocysteine ANAAAQASR methyltransferase MC EGkLTSLDQYISR ENPL_CATRO Endoplasmin homolog precursor MC mATITVVkAR Q6Q4Z3_CAPBU LOS2 SD GSSkAGLQFPVGR Q5YJR4_HYAOR Histone H2A (Fragment). EMX VFNEAMACHAr Basil12_2880_01_2 Flavonoid 7-O-methyltransferase EMX QLTPAmrESGSSSR Q7XZU1_ARATH SAC domain protein 4. SD SVrAGLQFPVGR H2A1_PEA Histone H2A. SW rGTSNLLGLQ Q762D3_MARPA Manganese-superoxide dismutase a Phosphorylation on serine (s), threonine (t), tyrosine (y); ubiquitination on lysine (k) and monomethylation on arginine (r) were detected using X!tandem with corresponding mass shifts and neutral losses associated

267

268

REFERENCES

Aebersold R, Mann M (2003) Mass spectrometry-based proteomics. Nature 422: 198-207

Agrawal GK, Rakwal R (2006) Rice proteomics: A cornerstone for cereal food crop proteomes. Mass Spectrometry Reviews 25: 1-53

Akamine H, Hossain A, Ishimine Y, Yogi K, Hokama K, Iraha Y, Aniya Y (2007) Effects of application of N, P and K alone or in combination on growth, yield and curcumin content of turmeric (Curcuma longa L.). Plant Production Science 10: 151-154

Allwood EG, Davies DR, Gerrish C, Ellis BE, Bolwell GP (1999) Phosphorylation of phenylalanine ammonia-lyase: evidence for a novel protein kinase and identification of the phosphorylated residue. Febs Letters 457: 47-52

Amme S, Rutten T, Melzer M, Sonsmann G, Vissers JPC, Schlesier B, Mock HP (2005) A proteome approach defines protective functions of tobacco leaf trichomes. Proteomics 5: 2508-2518

Andon NL, Hollingworth S, Koller A, Greenland AJ, Yates JR, Haynes PA (2002) Proteomic characterization of wheat amyloplasts using identification of proteins by tandem mass spectrometry. Proteomics 2: 1156-1168

Arora RB, Kapoor V, Basu N, Jain AP (1971) Anti-inflammatory studies on Curcuma longa (turmeric). Indian J Med Res 59: 1289-1295

Arrieta A, Mann G, Beyer L, Wilde H (1987) Synthesis of curcuminoid compounds. II. The synthesis of new nonsymmetrical curcuminoids. Boletin de la Sociedad Quimica del Peru 53: 85-102

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene Ontology: tool for the unification of biology. Nature Genetics 25: 25-29

Atherton HJ, Bailey NJ, Zhang W, Taylor J, Major H, Shockcor J, Clarke K, Griffin JL (2006) A combined H-1-NMR spectroscopy- and mass spectrometry-based metabolomic study of the PPAR-alpha null mutant mouse defines profound systemic changes in metabolism linked to the metabolic syndrome. Physiological Genomics 27: 178-186

269

Bamba T, Fukusaki E (2006) Technical problems and practical operations in plant metabolomics. Journal of Pesticide Science 31: 300-304

Bandu ML, Watkins KR, Bretthauer ML, Moore CA, Desaire H (2004) Prediction of MS/MS data. 1. A focus on pharmaceuticals containing carboxylic acids. Analytical Chemistry 76: 1746-1753

Bansal RP, Bahl JR, Garg SN, Naqvi AA, Kumar S (2002) Differential chemical compositions of the essential oils of the shoot organs, rhizomes and rhizoids in the turmeric Curcuma longa grown in indo-grangetic plains. Pharmaceutical Biology 40: 384-389

Baran R, Kochi H, Saito N, Suematsu M, Soga T, Nishioka T, Robert M, Tomita M (2006) MathDAMP: a package for differential analysis of metabolite profiles. Bmc Bioinformatics 7 Bate NJ, Orr J, Ni WT, Meromi A, Nadlerhassar T, Doerner PW, Dixon RA, Lamb CJ, Elkind Y (1994) Quantitative relationship between phenylalanine ammonia-lyase levels and phenylpropanoid accumulation in transgenic tobacco identifies a rate- determining step in natural product synthesis. Proceedings of the National Academy of Sciences of the United States of America 91: 7608-7612

Behura S, Sahoo S, Srivastava VK (2002) Major constituents in leaf essential oils of Curcuma longa L. and Curcuma aromatica Salisb. Current Science 83: 1312- 1313

Berzin VB, Katsitadze LG, Pilipenko TV, Ovcharenko VV, Miroshnikov AI (1996) Identification of natural curcumins. Bioorganicheskaya Khimiya 22: 823-831

Bino RJ, Hall RD, Fiehn O, Kopka J, Saito K, Draper J, Nikolau BJ, Mendes P, Roessner-Tunali U, Beale MH, Trethewey RN, Lange BM, Wurtele ES, Sumner LW (2004) Potential of metabolomics as a functional genomics tool. Trends in Plant Science 9: 418-425

Bodnar WM, Blackburn RK, Krise JM, Moseley MA (2003) Exploiting the complementary nature of LC/MALDI/MS/MS and LC/ESI/MS/MS for increased proteome coverage. Journal of the American Society for Mass Spectrometry 14: 971-979

Bolwell GP (1992) A role for phosphorylation in the down-regulation of phenylalanine ammonia-lyase in suspension-cultured cells of french bean. Phytochemistry 31: 4081-4086

270

Borevitz JO, Xia YJ, Blount J, Dixon RA, Lamb C (2000) Activation tagging identifies a conserved MYB regulator of phenylpropanoid biosynthesis. Plant Cell 12: 2383- 2393

Boroczky K, Laatsch H, Wagner-Dobler I, Stritzke K, Schulz S (2006) Cluster analysis as selection and dereplication tool for the identification of new natural compounds from large sample sets. Chemistry & Biodiversity 3: 622-634

Bouvier F, Rahier A, Camara B (2005) Biogenesis, molecular regulation and function of plant isoprenoids. Progress in Lipid Research 44: 357-429

Brame CJ, Moran MF, McBroom-Cerajewski LDB (2004) A mass spectrometry based method for distinguishing between symmetrically and asymmetrically dimethylated arginine residues. Rapid Communications in Mass Spectrometry 18: 877-881

Brand S, Holscher D, Schierhorn A, Svatos A, Schroder J, Schneider B (2006) A type III polyketide synthase from Wachendorfia thyrsiflora and its role in diarylheptanoid and phenylphenalenone biosynthesis. Planta 224: 413-428

Breci L, Hattrup E, Keeler M, Letarte J, Johnson R, Haynes PA (2005a) Comprehensive proteomics in yeast using chromatographic fractionation, gas phase fractionation, protein gel electrophoresis, and isoelectric focusing. Proteomics 5: 2018-2028

Breci L, Hattrup E, Keeler M, Letarte J, Johnson R, Haynes PA (2005b) Comprehensive proteornics in yeast using chromatographic fractionation, gas phase fractionation, protein gel electrophoresis, and isoelectric focusing. Proteomics 5: 2018-2028

Broeckling CD, Reddy IR, Duran AL, Zhao XC, Sumner LW (2006) MET-IDEA: Data extraction tool for mass spectrometry-based metabolomics. Analytical Chemistry 78: 4334-4341

Broun P (2005) Transcriptional control of flavonoid biosynthesis: a complex network of conserved regulators involved in multiple aspects of differentiation in Arabidopsis. Current Opinion in Plant Biology 8: 272-279

Bryant JP, Chapin FS, Reichardt PB, Clausen TP (1987) Response of winter chemical defense in alaska paper birch and green alder to manipulation of plant carbon nutrient balance. Oecologia 72: 510-514

Carpin S, Garnier F, Andreu F, Chenieux JC, Rideau M, Hamdi S (1997a) Changes of both polypeptide pattern and sensitivity to cytokinin following transformation of periwinkle tissues with the isopentenyl transferase gene. Plant Physiology and Biochemistry 35: 603-609

271

Carpin S, Ouelhazi L, Filali M, Chenieux JC, Rideau M, Hamdi S (1997b) The relationship between the accumulation of a 28 kD polypeptide and that of indole alkaloids in Catharanthus roseus cell suspension cultures. Journal of Plant Physiology 150: 452-457

Carrari F, Baxter C, Usadel B, Urbanczyk-Wochniak E, Zanor MI, Nunes-Nesi A, Nikiforova V, Centero D, Ratzka A, Pauly M, Sweetlove LJ, Fernie AR (2006) Integrated analysis of metabolite and transcript levels reveals the metabolic shifts that underlie tomato fruit development and highlight regulatory aspects of metabolic network behavior. Plant Physiology 142: 1380-1396

Carretero-Paulet L, Ahumada I, Cunillera N, Rodriguez-Concepcion M, Ferrer A, Boronat A, Campos N (2002) Expression and molecular analysis of the Arabidopsis DXR gene encoding 1-deoxy-D-xylulose 5-phosphate reductoisomerase, the first committed enzyme of the 2-C-methyl-D-erythritol 4- phosphate pathway. Plant Physiology 129: 1581-1591

Chan ECY, Yap SL, Lau AJ, Leow PC, Toh DF, Koh HL (2007) Ultra-performance liquid chromatography/time-of-flight mass spectrometry based metabolomics of raw and steamed Panax notoginseng. Rapid Communications in Mass Spectrometry 21: 519-528

Chan P, Thomas GN, Tomlinson B (2002) Protective effects of trilinolein extracted from panax notoginseng against cardiovascular disease. Acta Pharmacol Sin 23: 1157- 1162

Chandra D, Gupta SS (1972) Anti-inflammatory and anti-arthritic activity of volatile oil of Curcuma longa (Haldi). Indian J Med Res 60: 138-142

Chandra R, Desai AR, Govind S, Gupta PN (1997) Metroglyph analysis in turmeric (Curcuma longa L.) germplasm in India. Scientia Horticulturae 70: 211-222

Chane-Ming J, Vera R, Chalchat JC, Cabassu P (2002) Chemical composition of essential oils from rhizomes, leaves and flowers of Curcuma longa L. from Reunion Island. Journal of Essential Oil Research 14: 249-251

Chen SX, Harmon AC (2006) Advances in plant proteomics. Proteomics 6: 5504-5516

Chen YH (1983) Studies on Chinese Curcuma. III. Comparison of the volatile oil and phenolic constituents from the rhizoma and the tuber of Curcuma longa. Zhong Yao Tong Bao 8: 27-29

272

Cheng SH, Sheen J, Gerrish C, Bolwell GP (2001) Molecular identification of phenylalanine ammonia-lyase as a substrate of a specific constitutively active Arabidopsis CDPK expressed in maize protoplasts. Febs Letters 503: 185-188

Choi YH, Tapias EC, Kim HK, Lefeber AWM, Erkelens C, Verhoeven JTJ, Brzin J, Zel J, Verpoorte R (2004) Metabolic discrimination of Catharanthus roseus leaves infected by phytoplasma using H-1-NMR spectroscopy and multivariate data analysis. Plant Physiology 135: 2398-2410

Craig R, Beavis RC (2004) TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20: 1466-1467

Croteau R, Kutchan T, Lewis N (2002) Natural products (secondary metabolites). In B Buchanan, W Gruissem, RL Jones, eds, Biochemistry & Molecular Biology of Plants. Wiley Rockville, Md., pp 1250-1316

Dairam A, Limson JL, Watkins GM, Antunes E, Daya S (2007) Curcuminoids, curcumin, and demethoxycurcumin reduce lead-induced memory deficits in male Wistar rats. J Agric Food Chem 55: 1039-1044

Dasgupta T, Rao AR, Yadava PK (2004) Chemomodulatory efficacy of Basil leaf (Ocimum basilicum) on drug metabolizing and antioxidant enzymes, and on carcinogen-induced skin and forestomach papillomagenesis. Phytomedicine 11: 139-151

Davies KM, Schwinn KE (2003) Transcriptional regulation of secondary metabolism. Functional Plant Biology 30: 913-925

Daviss B (2005) Growing pains for metabolomics. Scientist 19: 25-28 de la Rosa TM, Julkunen-Tiitto R, Lehto T, Aphalo PJ (2001) Secondary metabolites and nutrient concentrations in silver birch seedlings under five levels of daily UV-B exposure and two relative nutrient addition rates. New Phytologist 150: 121-131

Decarolis E, Deluca V (1993) Purification, characterization, and kinetic-analysis of a 2- oxoglutarate-dependent dioxygenase involved in vindoline biosynthesis from Catharanthus-Roseus. Journal of Biological Chemistry 268: 5504-5511

Decker G, Wanner G, Zenk MH, Lottspeich F (2000) Characterization of proteins in latex of the opium poppy (Papaver somniferum) using two-dimensional gel electrophoresis and microsequencing. Electrophoresis 21: 3500-3516

273

DeGnore JP, Qin J (1998) Fragmentation of phosphopeptides in an ion trap mass spectrometer. Journal of the American Society for Mass Spectrometry 9: 1175- 1188

Dettmer K, Aronov PA, Hammock BD (2007) Mass spectrometry-based metabolomics. Mass Spectrometry Reviews 26: 51-78

Dixon RA, Achnine L, Kota P, Liu CJ, Reddy MSS, Wang LJ (2002) The phenylpropanoid pathway and plant defence - a genomics perspective. Molecular Plant Pathology 3: 371-390

Domon B, Aebersold R (2006) Review - Mass spectrometry and protein analysis. Science 312: 212-217

Dudareva N, Andersson S, Orlova I, Gatto N, Reichelt M, Rhodes D, Boland W, Gershenzon J (2005) The nonmevalonate pathway supports both monoterpene and sesquiterpene formation in snapdragon flowers. Proceedings of the National Academy of Sciences of the United States of America 102: 933-938

Dunn WB, Bailey NJC, Johnson HE (2005) Measuring the metabolome: current analytical technologies. Analyst 130: 606-625

Duran AL, Yang J, Wang LJ, Sumner LW (2003) Metabolomics spectral formatting, alignment and conversion tools (MSFACTs). Bioinformatics 19: 2283-2293

Dworzanski JP, Snyder AP (2005) Classification and identification of bacteria using mass spectrometry-based proteomics. Expert Review of Proteomics 2: 863-878

Endt DV, Kijne JW, Memelink J (2002) Transcription factors controlling plant secondary metabolism: what regulates the regulators? Phytochemistry 61: 107-114

Eng JK, McCormack AL, Yates JR (1994) An approach to correlate tandem mass- spectral data of peptides with amino-acid-sequences in a protein database. Journal of the American Society for Mass Spectrometry 5: 976-989

Evans WC (2002) Pytochemical variation within a species. In Pharmacognosy Ed 15th ed. Edinburgh, New York, pp 80-90

Fenyo D, Beavis RC (2003) A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Analytical Chemistry 75: 768-774

274

Ferreira LA, Henriques OB, Andreoni AA, Vital GR, Campos MM, Habermehl GG, de Moraes VL (1992) Antivenom and biological effects of ar-turmerone isolated from Curcuma longa (Zingiberaceae). Toxicon 30: 1211-1218

Fey SJ, Larsen PM (2001) 2D or not 2D. Current Opinion in Chemical Biology 5: 26-33

Fiehn O (2002) Metabolomics - the link between genotypes and phenotypes. Plant Molecular Biology 48: 155-171

Fridman E, Pichersky E (2005) Metabolomics, genomics, proteomics, and the identification of enzymes and their substrates and products. Current Opinion in Plant Biology 8: 242-248

Gang DR (2005) Evolution of flavors and scents. Annual Review of Plant Biology 56: 301-325

Gang DR, Beuerle T, Ullmann P, Werck-Reichhart D, Pichersky E (2002a) Differential production of meta hydroxylated phenylpropanoids in sweet basil peltate glandular trichomes and leaves is controlled by the activities of specific acyltransferases and hydroxylases. Plant Physiology 130: 1536-1544

Gang DR, Lavid N, Zubieta C, Chen F, Beuerle T, Lewinsohn E, Noel JP, Pichersky E (2002b) Characterization of phenylpropene O-methyltransferases from sweet basil: Facile change of substrate specificity and convergent evolution within a plant O- methyltransferase family. Plant Cell 14: 505-519

Gang DR, Wang JH, Dudareva N, Nam KH, Simon JE, Lewinsohn E, Pichersky E (2001) An investigation of the storage and biosynthesis of phenylpropenes in sweet basil. Plant Physiology 125: 539-555

Gantet P, Memelink J (2002) Transcription factors: tools to engineer the production of pharmacologically active plant metabolites. Trends in Pharmacological Sciences 23: 563-569

Gao J, Opiteck GJ, Friedrichs MS, Dongre AR, Hefta SA (2003) Changes in the protein expression of yeast as a function of carbon source. Journal of Proteome Research 2: 643-649

Garg SN, Bansal RP, Gupta MM, Kumar S (1999) Variation in the rhizome essential oil and curcumin contents and oil quality in the land races of turmeric Curcuma longa of North Indian plains. Flavour and Fragrance Journal 14: 315-318

275

Garg SN, Mengi N, Patra NK, Charles R, Kumar S (2002) Chemical examination of the leaf essential oil of Curcuma longa L. from the North Indian plains. Flavour and Fragrance Journal 17: 103-104

Ge H, Walhout AJM, Vidal M (2003) Integrating 'omic' information: a bridge between genomics and systems biology. Trends in Genetics 19: 551-560

Glinski M, Weckwerth W (2006) The role of mass spectrometry in plant systems biology. Mass Spectrometry Reviews 25: 173-214

Gorg A, Obermaier C, Boguth G, Harder A, Scheibe B, Wildgruber R, Weiss W (2000) The current state of two-dimensional electrophoresis with immobilized pH gradients. Electrophoresis 21: 1037-1053

Griffin JL, Shockcor JP (2004) Metabolic profiles of cancer cells. Nature Reviews Cancer 4: 551-561

Gu HW, Chen HW, Pan ZZ, Jackson AU, Talaty N, Xi BW, Kissinger C, Duda C, Mann D, Raftery D, Cooks RG (2007) Monitoring diet effects via biofluids and their implications for metabolomics studies. Analytical Chemistry 79: 89-97

Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R (1999a) Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nature Biotechnology 17: 994-999

Gygi SP, Rochon Y, Franza BR, Aebersold R (1999b) Correlation between protein and mRNA abundance in yeast. Molecular and Cellular Biology 19: 1720-1730

Hahlbrock K, Scheel D (1989) Physiology and molecular-biology of phenylpropanoid metabolism. Annual Review of Plant Physiology and Plant Molecular Biology 40: 347-369

Hajheidari M, Abdollahian-Noghabi M, Askari H, Heidari M, Sadeghian SY, Ober ES, Salekdeh GH (2005) Proteome analysis of sugar beet leaves under drought stress. Proteomics 5: 950-960

Hall RD (2006) Plant metabolomics: from holistic hope, to hype, to hot topic. New Phytologist 169: 453-468

Harris MA, Clark JI, Ireland A, Lomax J, Ashburner M, Collins R, Eilbeck K, Lewis S, Mungall C, Richter J, Rubin GM, Shu SQ, Blake JA, Bult CJ, Diehl AD, Dolan ME, Drabkin HJ, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Binkley G, Cherry JM, Christie KR, Costanzo MC, Dong Q, Engel SR, Fisk DG, Hirschman JE, Hitz BC, Hong EL, Lane C, Miyasato S, Nash R, Sethuraman A,

276

Skrzypek M, Theesfeld CL, Weng SA, Botstein D, Dolinski K, Oughtred R, Berardini T, Mundodi S, Rhee SY, Apweiler R, Barrell D, Camon E, Dimmer E, Mulder N, Chisholm R, Fey P, Gaudet P, Kibbe W, Pilcher K, Bastiani CA, Kishore R, Schwarz EM, Sternberg P, Van Auken K, Gwinn M, Hannick L, Wortman J, Aslett M, Berriman M, Wood V, Bromberg S, Foote C, Jacob H, Pasko D, Petri V, Reilly D, Seiler K, Shimoyama M, Smith J, Twigger S, Jaiswal P, Seigfried T, Collmer C, Howe D, Westerfield M (2006) The Gene Ontology (GO) project in 2006. Nucleic Acids Research 34: D322-D326

He XG, Lin LZ, Lian LZ, Lindenmaier M (1998) Liquid chromatography electrospray mass spectrometric analysis of curcuminoids and sesquiterpenoids in turmeric (Curcuma longa). Journal of Chromatography A 818: 127-132

Hendriks M, Cruz-Juarez L, De Bont D, Hall RD (2005) Preprocessing and exploratory analysis of chromatographic profiles of plant extracts. Analytica Chimica Acta 545: 53-64

Hodson MP, Dear GJ, Roberts AD, Haylock CL, Ball RJ, Plumb RS, Stumpf CL, Griffin JL, Haselden JN (2007) A gender-specific discriminator in Sprague-Dawley rat urine: The deployment of a metabolic profiling strategy for biomarker discovery and identification. Analytical Biochemistry 362: 182-192

Hoffmann L, Besseau S, Geoffroy P, Ritzenthaler C, Meyer D, Lapierre C, Pollet B, Legrand M (2005) Acyltransferase-catalysed p-coumarate ester formation is a committed step of lignin biosynthesis. Plant Biosystems 139: 50-53

Hoffmann L, Maury S, Martz F, Geoffroy P, Legrand M (2003) Purification, cloning, and properties of an acyltransferase controlling shikimate and quinate ester intermediates in phenylpropanoid metabolism. Journal of Biological Chemistry 278: 95-103

Holscher D, Schneider B (1995) A Diarylheptanoid Intermediate in the Biosynthesis of Phenylphenalenones in Anigozanthos Preissii. Journal of the Chemical Society- Chemical Communications: 525-526

Hossain A, Ishimine Y (2007) Effects of farmyard manure on growth and yield of turmeric (Curcuma longa L.) cultivated in dark-red soil, red soil and gray soil in Okinawa, Japan. Plant Production Science 10: 146-150

Hossain MA, Ishimine Y (2005) Growth, yield and quality of turmeric (Curcuma longa L.) cultivated on dark-red soil, gray soil and red soil in Okinawa, Japan. Plant Production Science 8: 482-486

277

Hossain MA, Ishimine Y, Akamine H, Motomura K (2005a) Effects of seed rhizome size on growth and yield of turmeric (Curcuma longa L.). Plant Production Science 8: 86-94

Hossain MA, Ishimine Y, Motomura K, Akamine H (2005b) Effects of planting pattern and planting distance on growth and yield of turmeric (Curcuma longa L.). Plant Production Science 8: 95-105

Houghton PJ (2002) Tranditional plant medicines as a source of new drugs. In WC Evans, GE Trease, D Evans, eds, Trease and Evans pharmacognosy. Edinburgh, New York Hu S, Loo JA, Wong DT (2006) Human body fluid proteome analysis. Proteomics 6: 6326-6353

Iijima Y, Davidovich-Rikanati R, Fridman E, Gang DR, Bar E, Lewinsohn E, Pichersky E (2004a) The biochemical and molecular basis for the divergent patterns in the biosynthesis of terpenes and phenylpropenes in the peltate glands of three cultivars of basil. Plant Physiology 136: 3724-3736

Iijima Y, Gang DR, Fridman E, Lewinsohn E, Pichersky E (2004b) Characterization of geraniol synthase from the peltate glands of sweet basil. Plant Physiology 134: 370-379

Iijima Y, Wang GD, Fridman E, Pichersky E (2006) Analysis of the enzymatic formation of citral in the glands of sweet basil. Archives of Biochemistry and Biophysics 448: 141-149

Ishimine Y, Hossain A, Ishimine Y, Murayama S (2003) Optimal planting depth for turmeric (Curcuma longa L.) cultivation in dark red soil in Okinawa Island, Southern Japan. Plant Production Science 6: 83-89

Ito J, Heazlewood JL, Millar AH (2007) The plant mitochondrial proteome and the challenge of defining the posttranslational modifications responsible for signalling and stress effects on respiratory functions. Physiologia Plantarum 129: 207-224

Jacobs DI, Gaspari M, van der Greef J, van der Heijden R, Verpoorte R (2005) Proteome analysis of the medicinal plant Catharanthus roseus. Planta 221: 690-704

Jacobs DI, van der Heijden R, Verpoorte R (2000) Proteomics in plant biotechnology and secondary metabolism research. Phytochemical Analysis 11: 277-287

Jagetia GC, Aggarwal BB (2007) "Spicing up" of the immune system by curcumin. Journal of Clinical Immunology 27: 19-35

278

Jayaprakasha GK, Jagan L, Rao M, Sakariah KK (2005) Chemistry and biological activities of C. longa. Trends in Food Science & Technology 16: 533-548

Jayaprakasha GK, Jena BS, Negi PS, Sakariah KK (2002) Evaluation of antioxidant activities and antimutagenicity of turmeric oil: a byproduct from curcumin production. Z Naturforsch [C] 57: 828-835

Jenkins H, Hardy N, Beckmann M, Draper J, Smith AR, Taylor J, Fiehn O, Goodacre R, Bino RJ, Hall R, Kopka J, Lane GA, Lange BM, Liu JR, Mendes P, Nikolau BJ, Oliver SG, Paton NW, Rhee S, Roessner-Tunali U, Saito K, Smedsgaard J, Sumner LW, Wang T, Walsh S, Wurtele ES, Kell DB (2004) A proposed framework for the description of plant metabolomics experiments and their results. Nature Biotechnology 22: 1601-1606

Jiang HL, Somogyi A, Jacobsen NE, Timmermann BN, Gang DR (2006a) Analysis of curcuminoids by positive and negative electrospray ionization and tandem mass spectrometry. Rapid Communications in Mass Spectrometry 20: 1001-1012

Jiang HL, Timmermann BN, Gang DR (2006b) Use of liquid chromatography- electrospray ionization tandem mass spectrometry to identify diarylheptanoids in turmeric (Curcuma longa L.) rhizome. Journal of Chromatography A 1111: 21-31

Jiang HL, Xie ZZ, Koo HJ, McLaughlin SP, Timmermann BN, Gang DR (2006c) Metabolic profiling and phylogenetic analysis of medicinal Zingiber species: Tools for authentication of ginger (Zingiber officinale Rosc.). Phytochemistry 67: 1673-1685

Jolad SD, Lantz RC, Solyom AM, Chen GJ, Bates RB, Timmermann BN (2004) Fresh organically grown ginger (Zingiber officinale): composition and effects on LPS- induced PGE(2) production. Phytochemistry 65: 1937-1954

Jorgensen RA, Cluster PD, English J, Que QD, Napoli CA (1996) Chalcone synthase cosuppression phenotypes in petunia flowers: Comparison of sense vs antisense constructs and single-copy vs complex T-DNA sequences. Plant Molecular Biology 31: 957-973

Julsing MK, Koulman A, Woerdenbag HJ, Quax WJ, Kayser O (2006) Combinatorial biosynthesis of medicinal plant secondary metabolites. Biomolecular Engineering 23: 265-279

Jung WW, Huh Y, Ryu JC, Lee E, Sul D (2005) Development of proteomics and applications of proteomics in toxicology. Molecular & Cellular Toxicology 1: 7- 12

279

Kamo T, Hirai N, Tsuda M, Fujioka D, Ohigashi H (2000) Changes in the content and biosynthesis of phytoalexins in banana fruit. Bioscience Biotechnology and Biochemistry 64: 2089-2098

Kasahara H, Hanada A, Kuzuyama T, Takagi M, Kamiya Y, Yamaguchi S (2002) Contribution of the mevalonate and methylerythritol phosphate pathways to the biosynthesis of gibberellins in Arabidopsis. Journal of Biological Chemistry 277: 45188-45194

Kell DB, Westerhoff HV (1986) Towards a Rational Approach to the Optimization of Flux in Microbial Biotransformations. Trends in Biotechnology 4: 137-142

Keller A, Eng J, Zhang N, Li XJ, Aebersold R (2005) A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Molecular Systems Biology Keller A, Nesvizhskii AI, Kolker E, Aebersold R (2002) Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Analytical Chemistry 74: 5383-5392

Kim HK, Choi YH, Erkelens C, Lefeber AW, Verpoorte R (2005a) Metabolic fingerprinting of Ephedra species using 1H-NMR spectroscopy and principal component analysis. Chem Pharm Bull (Tokyo) 53: 105-109

Kim HK, Choi YH, Erkelens C, Lefeber AWM, Verpoorte R (2005b) Metabolic fingerprinting of Ephedra species using H-1-NMR spectroscopy and principal component analysis. Chemical & Pharmaceutical Bulletin 53: 105-109

Kim SI, Kim JY, Kim EA, Kwon KH, Kim KW, Cho K, Lee JH, Nam MH, Yang DC, Yoo JS, Park YM (2003) Protoome analysis of hairy root from Panax ginseng C.A. Meyer using peptide fingerprinting, internal sequencing and expressed sequence tag data. Proteomics 3: 2379-2392

Kiso Y, Suzuki Y, Oshima Y, Hikino H (1983) Sesquiterpenoids .59. Stereostructure of curlone, a sesquiterpenoid of Curcuma longa rhizomes. Phytochemistry 22: 596- 597

Kliebenstein DJ (2004) Secondary metabolites and plant/environment interactions: a view through Arabidopsis thaliana tinged glasses. Plant Cell and Environment 27: 675-684

Knight V, Sanglier JJ, DiTullio D, Braccili S, Bonner P, Waters J, Hughes D, Zhang L (2003) Diversifying microbial natural products for drug discovery. Applied Microbiology and Biotechnology 62: 446-458

280

Koeduka T, Fridman E, Gang DR, Vassao DG, Jackson BL, Kish CM, Orlova I, Spassova SM, Lewis NG, Noel JP, Baiga TJ, Dudareva N, Pichersky E (2006) Eugenol and isoeugenol, characteristic aromatic constituents of spices, are biosynthesized via reduction of a coniferyl alcohol ester. Proceedings of the National Academy of Sciences of the United States of America 103: 10128-10133

Koehn FE, Carter GT (2005) The evolving role of natural products in drug discovery. Nat Rev Drug Discov 4: 206-220

Kolker E, Higdon R, Hogan JM (2006) Protein identification and expression analysis using mass spectrometry. Trends in Microbiology 14: 229-235

Koller A, Washburn MP, Lange BM, Andon NL, Deciu C, Haynes PA, Hays L, Schieltz D, Ulaszek R, Wei J, Wolters D, Yates JR (2002) Proteomic survey of metabolic pathways in rice. Proceedings of the National Academy of Sciences of the United States of America 99: 11969-11974

Kosuge T, Ishida H, Yamazaki H (1985) Studies on active substances in the herbs used for Oketsu (\"stagnant blood\") in Chinese medicine. III. On the anticoagulative principles in curcumae rhizoma. Chemical & Pharmaceutical Bulletin 33: 1499- 1502

Kunz BA, Cahill DM, Mohr PG, Osmond MJ, Vonarx EJ (2006) Plant responses to UV radiation and links to pathogen resistance. In International Review of Cytology - a Survey of Cell Biology, Vol 255, Vol 255, pp 1-40

Kurkin VA (2003) Phenylpropanoids from medicinal plants: Distribution, classification, structural analysis, and biological activity. Chemistry of Natural Compounds 39: 123-153

Kwon SJ, Choi EY, Choi YJ, Ahn JH, Park OK (2006) Proteomics studies of post- translational modifications in plants. Journal of Experimental Botany 57: 1547- 1551

Labra M, Miele M, Ledda B, Grassi F, Mazzei M, Sala F (2004) Morphological characterization, essential oil composition and DNA genotyping of Ocimum basilicum L. cultivars. Plant Science 167: 725-731

Lee HS (2006) Antiplatelet property of Curcuma longa L. rhizome-derived ar-turmerone. Bioresource Technology 97: 1372-1376

Lee J, Cooper B (2006) Alternative workflows for plant proteomic analysis. Molecular Biosystems 2: 621-626

281

Lei Z, Elmer AM, Watson BS, Dixon RA, Mendes PJ, Sumner LW (2005) A two- dimensional electrophoresis proteomic reference map and systematic identification of 1367 proteins from a cell suspension culture of the model legume Medicago truncatula. Molecular & Cellular Proteomics 4: 1812-1825

Lerch RN, Ferrer I, Thurman EM, Zablotowicz RM (2003) Identification of trifluralin metabolites in soil using ion-trap LC/MS/MS. In Liquid Chromatography/Mass Spectrometry, Ms/Ms and Time-of-Flight Ms, Vol 850, pp 291-310

Liang YZ, Xie PS, Chan K (2004) Quality control of herbal medicines. Journal of Chromatography B-Analytical Technologies in the Biomedical and Life Sciences 812: 53-70

Lippert D, Chowrira S, Ralph SG, Zhuang J, Aeschliman D, Ritland C, Ritland K, Bohlmann J (2007) Conifer defense against insects: Proteome analysis of Sitka spruce (Picea sitchensis) bark induced by mechanical wounding or feeding by white pine weevils (Pissodes strobi). Proteomics 7: 248-270

Liu HB, Sadygov RG, Yates JR (2004) A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Analytical Chemistry 76: 4193- 4201

Liu J, Bell AW, Bergeron JJ, Yanofsky CM, Carrillo B, Beaudrie CE, Kearney RE (2007) Methods for peptide identification by spectral comparison. Proteome Sci 5: 3

Liu ZJ (2000) Drought-induced in vivo synthesis of camptothecin in Camptotheca acuminata seedlings. Physiologia Plantarum 110: 483-488

Liu ZJ, Carpenter SB, Bourgeois WJ, Yu Y, Constantin RJ, Falcon MJ, Adams JC (1998) Variations in the secondary metabolite camptothecin in relation to tissue age and season in Camptotheca acuminata. Tree Physiology 18: 265-270

Liu ZJ, Carpenter SB, Constantin RJ (1997) Camptothecin production in Camptotheca acuminata seedlings in response to shading and flooding. Canadian Journal of Botany-Revue Canadienne De Botanique 75: 368-373

Logemann E, Parniske M, Hahlbrock K (1995) Modes of expression and common structural features of the complete phenylalanine ammonia-lyase gene family in parsley. Proceedings of the National Academy of Sciences of the United States of America 92: 5905-5909

Lu P, Vogel C, Wang R, Yao X, Marcotte EM (2007) Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nature Biotechnology 25: 117-124

282

Lum JHK, Fung KL, Cheung PY, Wong MS, Lee CH, Kwok FSL, Leung MCP, Hui PK, Lo SCL (2002) Proteome of Oriental ginseng Panax ginseng C. A. Meyer and the potential to use it as an identification tool. Proteomics 2: 1123-1130

Ma XQ, Gang DR (2006) Metabolic profiling of turmeric (Curcuma longa L.) plants derived from in vitro micropropagation and conventional greenhouse cultivation. Journal of Agricultural and Food Chemistry 54: 9573-9583

Maheshwari RK, Singh AK, Gaddipati J, Srimal RC (2006) Multiple biological activities of curcumin: a short review. Life Sci 78: 2081-2087

Manzan A, Toniolo FS, Bredow E, Povh NP (2003) Extraction of essential oil and pigments from Curcuma longa [L.] by steam distillation and extraction with volatile solvents. Journal of Agricultural and Food Chemistry 51: 6802-6807

Marouga R, David S, Hawkins E (2005) The development of the DIGE system: 2D fluorescence difference gel analysis technology. Analytical and Bioanalytical Chemistry 382: 669-678

Masuda T, Jitoe A, Isobe J, Nakatani N, Yonemori S (1993) Anti-oxidative and anti- inflammatory curcumin-related phenolics from rhizomes of Curcuma domestica. Phytochemistry 32: 1557-1560

Mathai CK (1976) Variability in turmeric (Curcuma Species) germplasm for essential oil and curcumin. Qualitas Plantarum-Plant Foods for Human Nutrition 25: 227-230

Mathai CK (1978) Pattern of rhizome yield and their accumulation of commercially important chemical constituents in turmeric (curcuma species), during growth and development. Qualitas Plantarum-Plant Foods for Human Nutrition 28: 219-225

Mattoli L, Cangi F, Maidecchi A, Ghiara C, Ragazzi E, Tubaro M, Stella L, Tisato F, Traldi P (2006) Metabolomic fingerprinting of plant extracts. J Mass Spectrom 41: 1534-1545

Memelink J, Verpoorte R, Kijne JW (2001) ORCAnization of jasmonate - responsive gene expression in alkaloid metabolism. Trends in Plant Science 6: 212-219

Milobedzka J, v. Kostanecki S, Lampe V (1910) Curcumin. Berichte der Deutschen Chemischen Gesellschaft 43: 2163-2170

Munshi MK, Hossain SN, Islam MR, Hakim L, Ahmed G (2004) In vitro mass propagation of turmeric (Curcuma longa L.) using buds from rhizomes. Bangladesh Journal of Botany 33: 125-128

283

Murch SJ, Rupasinghe HP, Goodenowe D, Saxena PK (2004a) A metabolomic analysis of medicinal diversity in Huang-qin (Scutellaria baicalensis Georgi) genotypes: discovery of novel compounds. Plant Cell Reports 23: 419-425

Murch SJ, Rupasinghe HP, Goodenowe D, Saxena PK (2004b) A metabolomic analysis of medicinal diversity in Huang-qin (Scutellaria baicalensis Georgi) genotypes: discovery of novel compounds. Plant Cell Rep 23: 419-425

Nakayama R, Tamura Y, Yamanaka H, Kikuzaki H, Nakatani N (1993) Two curcuminoid pigments from Curcuma domestica. Phytochemistry 33: 501-502

Nam MH, Heo EJ, Kim JY, Kim SI, Kwon KH, Seo JB, Kwon O, Jong SY, Park YM (2003) Proteome analysis of the responses of Panax ginseng C.A. Meyer leaves to high light: Use of electrospray ionization quadrupole-time of flight mass spectrometry and expressed sequence tag data. Proteomics 3: 2351-2367

Nam MH, Kim SI, Liu JR, Yang DC, Lim YP, Kwon KH, Yoo JS, Park YM (2005) Proteomic analysis of Korean ginseng (Panax ginseng C.A. Meyer). Journal of Chromatography B-Analytical Technologies in the Biomedical and Life Sciences 815: 147-155

Nandi A (1991) Genetic-variability in turmeric (Curcuma longa). Indian Journal of Agricultural Sciences 61: 941-942

Nesvizhskii AI, Aebersold R (2005) Interpretation of shotgun proteomic data - The protein inference problem. Molecular & Cellular Proteomics 4: 1419-1440

Nesvizhskii AI, Keller A, Kolker E, Aebersold R (2003) A statistical model for identifying proteins by tandem mass spectrometry. Analytical Chemistry 75: 4646-4658

Neubauer G, King A, Rappsilber J, Calvio C, Watson M, Ajuh P, Sleeman J, Lamond A, Mann M (1998) Mass spectrometry and EST-database searching allows characterization of the multi-protein spliceosome complex. Nature Genetics 20: 46-50

Newman DJ, Cragg GM (2007) Natural products as sources of new drugs over the last 25 years. Journal of Natural Products 70: 461-477

Newton RP, Brenton AG, Smith CJ, Dudley E (2004) Plant proteome analysis by mass spectrometry: principles, problems, pitfalls and recent developments. Phytochemistry 65: 1449-1485

284

Nicholson JK, Wilson ID (2003) Understanding 'global' systems biology: Metabonomics and the continuum of metabolism. Nature Reviews Drug Discovery 2: 668-676

Nie L, Wu G, Zhang WW (2006) Correlation of mRNA expression and protein abundance affected by multiple sequence features related to translational efficiency in Desulfovibrio vulgaris: A quantitative analysis. Genetics 174: 2229- 2243

Nobeli I, Thornton JM (2006) A bioinformatician's view of the metabolome. Bioessays 28: 534-545

O'Dell M, Metzlaff M, Flavell RB (1999) Post-transcriptional gene silencing of chalcone synthase in transgenic petunias, cytosine methylation and epigenetic variation. Plant Journal 18: 33-42

O'Neill MJ, Lewis JA (2002) Plant products and high throughput screening. In WC Evans, GE Trease, D Evans, eds, Trease and Evans pharmacognosy. Edinburgh New York, pp 109-114

Ody P (1993) Ocimum basilicum. In P Ody, ed, The complete medicinal herbal, Ed 1st American ed. . Dorling Kindersley, London : New York, p 82

Oliver SG, Winson MK, Kell DB, Baganz F (1998) Systematic functional analysis of the yeast genome. Trends in Biotechnology 16: 373-378

Ong SE, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, Mann M (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Molecular & Cellular Proteomics 1: 376-386

Ounaroon A, Decker G, Schmidt J, Lottspeich F, Kutchan TM (2003) (R,S)-Reticuline 7- O-methyltransferase and (R,S)-norcoclaurine 6-O-methyltransferase of Papaver somniferum - cDNA cloning and characterization of methyl transfer enzymes of alkaloid biosynthesis in opium poppy. Plant Journal 36: 808-819

Oxenham SK, Svoboda KP, Walters DR (2005) Antifungal activity of the essential oil of basil (Ocimum basilicum). Journal of Phytopathology 153: 174-180

Pan ZZ, Gu HW, Talaty N, Chen HW, Shanaiah N, Hainline BE, Cooks RG, Raftery D (2007) Principal component analysis of urine metabolites detected by NMR and DESI-MS in patients with inborn errors of metabolism. Analytical and Bioanalytical Chemistry 387: 539-549

285

Pande C, Chanotiya CS (2006) Constituents of the leaf oil of Curcuma longa L. from Uttaranchal. Journal of Essential Oil Research 18: 166-167

Pandey A, Mann M (2000) Proteomics to study genes and genomes. Nature 405: 837-846

Park BS, Kim JG, Kim MR, Lee SE, Takeoka GR, Oh KB, Kim JH (2005) Curcuma longa L. constituents inhibit sortase A and Staphylococcus aureus cell adhesion to fibronectin. J Agric Food Chem 53: 9005-9009

Park JJ, Yoon SYH, Cho HY, Son SY, Rhee HS, Park JM (2006) Patterns of protein expression upon adding sugar and elicitor to the cell culture of Eschscholtzia californica. Plant Cell Tissue and Organ Culture 86: 257-269

Park SY, Kim D (2002) Discovery of natural products from Curcuma longa that protect cells from beta-amyloid insult: A drug discovery effort against Alzheimer's disease. Journal of Natural Products 65: 1227-1231

Patterson SD, Aebersold RH (2003) Proteomics: the first decade and beyond. Nature Genetics 33: 311-323

Peng JM, Schwartz D, Elias JE, Thoreen CC, Cheng DM, Marsischky G, Roelofs J, Finley D, Gygi SP (2003) A proteomics approach to understanding protein ubiquitination. Nature Biotechnology 21: 921-926

Perkins DN, Pappin DJC, Creasy DM, Cottrell JS (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20: 3551-3567

Petit J, Maggiora GM (2005) Application of rough set theory to structure-activity relationships. Abstracts of Papers of the American Chemical Society 229: U604- U604

Pierce KM, Hope JL, Hoggard JC, Synovec RE (2006) Principal component analysis based method to discover chemical differences in comprehensive two- dimensional gas chromatography with time-of-flight mass spectrometry (GC x GC-TOFMS) separations of metabolites in plant samples. Talanta 70: 797-804

Pitasawat B, Champakaew D, Choochote W, Jitpakdi A, Chaithong U, Kanjanapothi D, Rattanachanpichai E, Tippawangkosol P, Riyong D, Tuetun B, Chaiyasit D (2007) Aromatic plant-derived essential oil: An alternative larvicide for mosquito control. Fitoterapia 78: 205-210

Pongsuwan W, Fukusaki E, Bamba T, Yonetani T, Yamahara T, Kobayashi A (2007) Prediction of Japanese green tea ranking by gas chromatography/mass

286

spectrometry-based hydrophilic metabolite fingerprinting. Journal of Agricultural and Food Chemistry 55: 231-236

Porubleva L, Vander Velden K, Kothari S, Oliver DJ, Chitnis PR (2001) The proteome of maize leaves: Use of gene sequences and expressed sequence tag data for identification of proteins with peptide mass fingerprints. Electrophoresis 22: 1724-1738

Pothitirat W, Gritsanapan W (2006) Variation of bioactive components in Curcuma longa in Thailand. Current Science 91: 1397-1400

Prathanturarug S, Soonthornchareonnon N, Chuakul W, Phaidee Y, Saralamp P (2005) Rapid micropropagation of Curcuma longa using bud explants pre-cultured in thidiazuron-supplemented liquid medium. Plant Cell Tissue and Organ Culture 80: 347-351

Pruess M, Fleischmann W, Kanapin A, Karavidopoulou Y, Kersey P, Kriventseva E, Mittard V, Mulder N, Phan I, Servant F, Apweiler R (2003) The Proteome Analysis database: a tool for the in silico analysis of whole proteomes. Nucleic Acids Research 31: 414-417

Raina VK, Srivastava SK, Jain N, Ahmad A, Syamasundar KV, Aggarwal KK (2002) Essential oil composition of Curcuma longa L. cv. Roma from the plains of northern India. Flavour and Fragrance Journal 17: 99-102

Raina VK, Srivastava SK, Syamsundar KV (2005) Rhizome and leaf oil composition of Curcuma longa from the lower Himalayan region of northern India. Journal of Essential Oil Research 17: 556-559

Rakhunde SD, Munjal SV, Patil SR (1998) Curcumin and essential oil contents of some commonly grown turmeric (Curcuma longa L.) cultivars in Maharashtra. Journal of Food Science and Technology-Mysore 35: 352-354

Ralph SG, Yueh H, Friedmann M, Aeschliman D, Zeznik JA, Nelson CC, Butterfield YSN, Kirkpatrick R, Liu J, Jones SJM, Marra MA, Douglas CJ, Ritland K, Bohlmann J (2006) Conifer defence against insects: microarray gene expression profiling of Sitka spruce (Picea sitchensis) induced by mechanical wounding or feeding by spruce budworms (Choristoneura occidentalis) or white pine weevils (Pissodes strobi) reveals large-scale changes of the host transcriptome. Plant Cell and Environment 29: 1545-1570

Ramassamy C (2006) Emerging role of polyphenolic compounds in the treatment of neurodegenerative diseases: a review of their intracellular targets. Eur J Pharmacol 545: 51-64

287

Ramirez-Ahumada MD, Timmermann BN, Gang DR (2006) Biosynthesis of curcuminoids and gingerols in turmeric (Curcuma longa) and ginger (Zingiber officinale): Identification of curcuminoid synthase and hydroxycinnamoyl-CoA thioesterases. Phytochemistry 67: 2017-2029

Reddy ACP, Lokesh BR (1992) Studies on Spice principles as antioxidants in the inhibition of lipid-peroxidation of rat-liver microsomes. Molecular and Cellular Biochemistry 111: 117-124

Reinold S, Hahlbrock K (1997) In situ localization of phenylpropanoid biosynthetic mRNAs and proteins in parsley (Petroselinum crispum). Botanica Acta 110: 431- 443

Rodriguez-Concepcion M (2006) Early steps in isoprenoid biosynthesis: multilevel regulation of the supply of common precursors Phytochemistry Reviews 5: 1-15

Rodriguez-Concepcion M, Boronat A (2002) Elucidation of the methylerythritol phosphate pathway for isoprenoid biosynthesis in bacteria and plastids. A metabolic milestone achieved through genomics. Plant Physiology 130: 1079- 1089

Rohmer M (1999) The discovery of a mevalonate-independent pathway for isoprenoid biosynthesis in bacteria, algae and higher plants. Nat Prod Rep 16: 565-574

Rohmer M, Seemann M, Horbach S, BringerMeyer S, Sahm H (1996) Glyceraldehyde 3- phosphate and pyruvate as precursors of isoprenic units in an alternative non- mevalonate pathway for terpenoid biosynthesis. Journal of the American Chemical Society 118: 2564-2566

Rollinger JM, Langer T, Stuppner H (2006) Integrated in silico tools for exploiting the natural products' bioactivity. Planta Medica 72: 671-678

Rose JKC, Bashir S, Giovannoni JJ, Jahn MM, Saravanan RS (2004) Tackling the plant proteome: practical approaches, hurdles and experimental tools. Plant Journal 39: 715-733

Sadygov RG, Liu HB, Yates JR (2004) Statistical models for protein validation using tandem mass spectral data and protein amino acid sequence databases. Analytical Chemistry 76: 1664-1671

Salvi ND, George L, Eapen S (2001) Plant regeneration from leaf base callus of turmeric and random amplified polymorphic DNA analysis of regenerated plants. Plant Cell Tissue and Organ Culture 66: 113-119

288

Satovic Z, Liber Z, Karlovic K, Kolak I (2002) Genetic relatedness among basil (Ocimum spp.) accessions using RAPD markers. Acta Biologica Cracoviensia Series Botanica 44: 155-160

Seger C, Sturm S (2007) Analytical aspects of plant metabolite profiling platforms: Current standings and future aims. Journal of Proteome Research 6: 480-497

Senapati HK, Pal AK, Samant PK (2005) Effect of chemical fertilizer, organic manure, lime and biofertilizer on yield of turmeric (Curcuma longa). Indian Journal of Agricultural Sciences 75: 593-595

Shah BH, Nawaz Z, Pertani SA, Roomi A, Mahmood H, Saeed SA, Gilani AH (1999) Inhibitory effect of curcumin, a food spice from turmeric, on platelet-activating factor- and arachidonic acid-mediated platelet aggregation through inhibition of thromboxane formation and Ca2+ signaling. Biochem Pharmacol 58: 1167-1172

Sharma RA, Gescher AJ, Steward WP (2005) Curcumin: The story so far. European Journal of Cancer 41: 1955-1968

Shevchenko A, Wilm M, Vorm O, Mann M (1996) Mass spectrometric sequencing of proteins from silver stained polyacrylamide gels. Analytical Chemistry 68: 850- 858

Shishodia S, Sethi G, Aggarwal BB (2005) Curcumin: Getting back to the roots. In Natural Products and Molecular Therapy, Vol 1056, pp 206-217

Shohael AM, Ali MB, Yu KW, Hahn EJ, Paek KY (2006) Effect of temperature on secondary metabolites production and antioxidant enzyme activities in Eleutherococcus senticosus somatic embryos. Plant Cell Tissue and Organ Culture 85: 219-228

Simpson RJ, Connolly LM, Eddes JS, Pereira JJ, Moritz RL, Reid GE (2000) Proteomic analysis of the human colon carcinoma cell line (LIM 1215): development of a membrane protein database. Electrophoresis 21: 1707-1732

Singh G, Singh OP, Maurya S (2002) Chemical and biocidal investigations on essential oils of some Indian Curcuma species. Progress in Crystal Growth and Characterization of Materials 45: 75-81

Singh S, Khar A (2006) Biological effects of curcumin and its role in cancer chemoprevention and therapy. Anticancer Agents Med Chem 6: 259-270

289

Singte MB, Yamger VT, Kathmale DK, Gaikwad DT (1997) Growth, productivity and water use of turmeric (Curcuma longa) under drip irrigation. Indian Journal of Agronomy 42: 547-549

Smith CA, Want EJ, O'Maille G, Abagyan R, Siuzdak G (2006) XCMS: Processing mass spectrometry data for metabolite profiling using Nonlinear peak alignment, matching, and identification. Analytical Chemistry 78: 779-787

Somparn P, Phisalaphong C, Nakornchai S, Unchern S, Morales NP (2007) Comparative antioxidant activities of curcumin and its demethoxy and hydrogenated derivatives. Biol Pharm Bull 30: 74-78

Srinivasan KR (1952) The coloring matter in turmeric. Current Science 21: 311-312

Srinivasan KR (1953) Chromatographic study of the curcuminoids in Curcuma longa. Journal of Pharmacy and Pharmacology 5: 448-457

Staples G, Kristiansen MS (1999) Ethnic culinary herbs : a guide to identification and cultivation in Hawai'i. University of Hawai'i Press, Honolulu : HI

Steele CL, Crock J, Bohlmann J, Croteau R (1998) Sesquiterpene synthases from grand fir (Abies grandis) - Comparison of constitutive and wound-induced activities, and cDNA isolation, characterization and bacterial expression of delta-selinene synthase and gamma-humulene synthase. Journal of Biological Chemistry 273: 2078-2089

Steen H, Mann M (2004) The ABC's (and XYZ's) of peptide sequencing. Nature Reviews Molecular Cell Biology 5: 699-711

Steen H, Pandey A (2002) Proteomics goes quantitative: measuring protein abundance. Trends in Biotechnology 20: 361-364

Steinbeck C (2004) Recent developments in automated structure elucidation of natural products. Natural Product Reports 21: 512-518

Sumner LW, Mendes P, Dixon RA (2003) Plant metabolomics: large-scale phytochemistry in the functional genomics era. Phytochemistry 62: 817-836

Sunitibala H, Damayanti M, Sharma GJ (2001) In vitro propagation and rhizome formation in Curcuma longa Linn. Cytobios 105: 71-82

Szkopinska A, Plochocka D (2005) Farnesyl diphosphate synthase; regulation of product specificity. Acta Biochimica Polonica 52: 45-55

290

Tapsell LC, Hemphill I, Cobiac L, Patch CS, Sullivan DR, Fenech M, Roodenrys S, Keogh JB, Clifton PM, Williams PG, Fazio VA, Inge KE (2006) Health benefits of herbs and spices: the past, the present, the future. Med J Aust 185: S4-24

Tayyem RF, Heath DD, Al-Delaimy WK, Rock CL (2006) Curcumin content of turmeric and curry powders. Nutr Cancer 55: 126-131

Thangapazham RL, Sharma A, Maheshwari RK (2006) Multiple molecular targets in cancer chemoprevention by curcumin. Aaps J 8: E443-449

Thongson C, Davidson PM, Mahakarnchanakul W, Weiss J (2004) Antimicrobial activity of ultrasound-assisted solvent-extracted spices. Lett Appl Microbiol 39: 401-406

Tikunov Y, Lommen A, de Vos CHR, Verhoeven HA, Bino RJ, Hall RD, Bovy AG (2005) A novel approach for nontargeted data analysis for metabolomics. Large- scale profiling of tomato fruit volatiles. Plant Physiology 139: 1125-1137

Tognolini M, Barocelli E, Ballabeni V, Bruni R, Bianchi A, Chiavarini M, Impicciatore M (2006) Comparative screening of plant essential oils: phenylpropanoid moiety as basic core for antiplatelet activity. Life Sci 78: 1419-1432

Tohti I, Tursun M, Umar A, Turdi S, Imin H, Moore N (2006) Aqueous extracts of Ocimum basilicum L. (sweet basil) decrease platelet aggregation induced by ADP and thrombin in vitro and rats arterio - venous shunt thrombosis in vivo. Thrombosis Research 118: 733-739

Turtola S, Manninen AM, Holopainen JK, Levula T, Raitio H, Kainulainen P (2002) Secondary metabolite concentrations and terpene emissions of Scots pine xylem after long-term forest fertilization. Journal of Environmental Quality 31: 1694- 1701

Vajragupta O, Boonchoong P, Morris GM, Olson AJ (2005) binding modes of curcumin in HIV-1 protease and integrase. Bioorg Med Chem Lett 15: 3364-3368 van Bentem SD, Roitinger E, Anrather D, Csaszar E, Hirt H (2006) Phosphoproteomics as a tool to unravel plant regulatory mechanisms. Physiologia Plantarum 126: 110-119 van den Berg RA, Hoefsloot HCJ, Westerhuis JA, Smilde AK, van der Werf MJ (2006) Centering, scaling, and transformations: improving the biological information content of metabolomics data. Bmc Genomics 7 van der Fits L, Memelink J (2000) ORCA3, a jasmonate-responsive transcriptional regulator of plant primary and secondary metabolism. Science 289: 295-297

291

Villas-Boas SG, Hojer-Pedersen J, Akesson M, Smedsgaard J, Nielsen J (2005) Global metabolite analysis of yeast: evaluation of sample preparation methods. Yeast 22: 1155-1169

Wahid A, Ghazanfar A (2006) Possible involvement of some secondary metabolites in salt tolerance of sugarcane. Journal of Plant Physiology 163: 723-730

Wang M, Lamers R, Korthout H, van Nesselrooij JHJ, Witkamp RF, van der Heijden R, Voshol PJ, Havekes LM, Verpoorte R, van der Greef J (2005) Metabolomics in the context of systems biology: Bridging traditional Chinese medicine and molecular pharmacology. Phytotherapy Research 19: 173-182

Wang WX, Zhou HH, Lin H, Roy S, Shaler TA, Hill LR, Norton S, Kumar P, Anderle M, Becker CH (2003) Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Analytical Chemistry 75: 4818-4826

Wang YL, Tang HR, Nicholson JK, Hylands PJ, Sampson J, Whitcombe I, Stewart CG, Caiger S, Oru I, Holmes E (2004) Metabolomic strategy for the classification and quality control of phytomedicine: A case study of chamomile flower (Matricaria recutita L.). Planta Medica 70: 250-255

Washburn MP, Koller A, Oshiro G, Ulaszek RR, Plouffe D, Deciu C, Winzeler E, Yates JR (2003) Protein pathway and complex clustering of correlated mRNA and protein expression analyses in Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences of the United States of America 100: 3107-3112

Werker E, Putievsky E, Ravid U, Dudai N, Katzir I (1993a) Glandular Hairs and Essential Oil in Developing Leaves of Ocimum-Basilicum L (Lamiaceae). Annals of Botany 71: 43-50

Werker E, Putievsky E, Ravid U, Dudai N, Katzir I (1993b) Glandular hairs and essential oil in developing leaves of Ocimum Basilicum L (Lamiaceae). Annals of Botany 71: 43-50

Wienkoop S, Zoeller D, Ebert B, Simon-Rosin U, Fisahn J, Glinski M, Weckwerth W (2004) Cell-specific protein profiling in Arabidopsis thaliana trichomes: identification of trichome-located proteins involved in sulfur metabolism and detoxification. Phytochemistry 65: 1641-1649

Wilm M, Shevchenko A, Houthaeve T, Breit S, Schweigerer L, Fotsis T, Mann M (1996) Femtomole sequencing of proteins from polyacrylamide gels by nano- electrospray mass spectrometry. Nature 379: 466-469

292

Withers ST, Keasling JD (2007) Biosynthesis and engineering of isoprenoid small molecules. Applied Microbiology and Biotechnology 73: 980-990

Wolsko PM, Solondz DK, Phillips RS, Schachter SC, Eisenberg DM (2005) Lack of herbal supplement characterization in published randomized controlled trials. American Journal of Medicine 118: 1087-1093

Wolters DA, Washburn MP, Yates JR (2001) An automated multidimensional protein identification technology for shotgun proteomics. Analytical Chemistry 73: 5683- 5690

Xia Q, Zhao KJ, Huang ZG, Zhang P, Dong TTX, Li SP, Tsim KWK (2005) Molecular genetic and chemical assessment of rhizoma curcumae in China. Journal of Agricultural and Food Chemistry 53: 6019-6026

Yamgar VT, Kathmale DK, Belhekar PS, Patil RC, Patil PS (2001) Effect of different levels of nitrogen, phosphorus and potassium and split applicaton of N on growth and yield of turmeric (Curcuma longa). Indian Journal of Agronomy 46: 372-374

Yang F, Lim GP, Begum AN, Ubeda OJ, Simmons MR, Ambegaokar SS, Chen PP, Kayed R, Glabe CG, Frautschy SA, Cole GM (2005) Curcumin inhibits formation of amyloid beta oligomers and fibrils, binds plaques, and reduces amyloid in vivo. J Biol Chem 280: 5892-5901

Yang SY, Kim HK, Lefeber AW, Erkelens C, Angelova N, Choi YH, Verpoorte R (2006) Application of two-dimensional nuclear magnetic resonance spectroscopy to quality control of ginseng commercial products. Planta Med 72: 364-369

Yates JR, Eng JK, McCormack AL (1995a) Mining genomes - correlating tandem mass- spectra of modified and unmodified peptides to sequences in nucleotide databases. Analytical Chemistry 67: 3202-3210

Yates JR, Eng JK, McCormack AL, Schieltz D (1995b) Method to correlate tandem mass-spectra of modified peptides to amino-acid-sequences in the protein database. Analytical Chemistry 67: 1426-1436

Zangerl AR, Berenbaum MR (1998) Damage-inducibility of primary and secondary metabolites in the wild parsnip (Pastinaca sativa). Chemoecology 8: 187-193

Zeng GF, Xiao HB, Liu JX, Liang XM (2006) Identification of phenolic constituents in Radix Salvia miltiorrhizae by liquid chromatography/electrospray ionization mass spectrometry. Rapid Communications in Mass Spectrometry 20: 499-506

293

Zhao J, Davis LC, Verpoorte R (2005) Elicitor signal transduction leading to production of plant secondary metabolites. Biotechnology Advances 23: 283-333

Zulak KG, Cornish A, Daskalchuk TE, Deyholos MK, Goodenowe DB, Gordon PM, Klassen D, Pelcher LE, Sensen CW, Facchini PJ (2007) Gene transcript and metabolite profiling of elicitor-induced opium poppy cell cultures reveals the coordinate regulation of primary and secondary metabolism. Planta 225: 1085- 1106