UNIVERSIDADE FEDERAL DO CEARÁ CENTRO DE CIÊNCIAS DEPARTAMENTO DE BIOQUÍMICA E BIOLOGIA MOLECULAR PROGRAMA DE PÓS-GRADUAÇÃO EM BIOQUÍMICA

MOHIBULLAH SHAH

PROTEOME ANALYSIS OF DEVELOPING SEEDS OF Jatropha curcas L.

FORTALEZA 2014

MOHIBULLAH SHAH

PROTEOME ANALYSIS OF DEVELOPING SEEDS OF Jatropha curcas L.

Thesis submitted to the Doctoral Program in Biochemistry of the Departamento de Bioquímica e Biologia Molecular, Universidade Federal do Ceará, in partial fulfillment of the requirements for the degree of Doctor in Biochemistry. Area of concentration: Plant Biochemistry.

Supervisor: Prof. Francisco A. P. Campos. Co-supervisor: Prof. Fabio C. S. Nogueira.

FORTALEZA 2014

International Cataloging-in-Publication Data, Universidade Federal do Ceará, Biblioteca de Ciências e Tecnologia

S537p Shah, Mohibullah. Proteome analysis of developing seeds of Jatropha curcas L. / Mohibullah Shah. – 2014. 168 leaves: ill. (color), bound; 30 cm.

Thesis (doctorate) – Universidade Federal do Ceará, Centro de Ciências, Departamento de Bioquímica e Biologia Molecular, Programa de Pós-Graduação em Bioquímica, Fortaleza, 2014. Area of concentration: Plant Biochemistry. Supervisor: Prof. Dr. Francisco de Assis de Paiva Campos. Co-supervisor: Prof. Dr. Fabio César Sousa Nogueira.

1. Oilseed plants. 2. Biodiesel. 3. Physic nut (pinhão-manso). 4. Mass spectrometry. I. Title.

CDD 574-192

ACKNOWLEDGEMENTS

First of all, I would like to thank ALLAH (SWT) for blessing me with good health, courage and strength during difficult moments, and for enabling me to reach this point. I would also like to thank and acknowledge the people who directly or indirectly supported me in the successful completion of this work.

First and foremost, I express my wholehearted gratitude to my respected supervisor, Prof. Francisco A. P. Campos (Prof. Chico), for his trust, unfading interest, consistent encouragement, stimulating suggestions, expert advice, guidance and, above all, his friendly relationship throughout this period. I am also thankful to his family, especially his wife, Dr. Ursula Wille, for her respect and loving attitude.

Thanks to Prof. Gilberto B. Domont (UFRJ) for keeping the doors of his lab (Proteomic Unit) open to me whenever I visited, for his friendship, guidance and encouragement, and for being a source of inspiration. My gratitude also goes to his wife, Prof. Solange Guimaraes (UFRJ), for her loving and friendly attitude and for inviting me to her home a number of times during my stay in Rio de Janeiro.

Special respect to Prof. Fabio C. S. Nogueira (UFRJ), my friend and co-supervisor and one of the few competent young scientists I have met so far, for his guidance and for sharing his knowledge with me. In each of our discussions I always learned something new from him. I am also very thankful for his continuous guidance and help in the development of this project.

Warm thanks to Dr. Paulo C. Carvalho (FIOCRUZ-PR), a competent young scientist and a great example of simplicity, for his help and collaboration in the data analysis. I greatly appreciate his prompt replies whenever I contacted him for assistance with data analysis during my entire PhD course.

I am thankful to Prof. Arlete A. Soares (Department of Biology, UFC), for her help in histological analysis of the seeds.

My sincere compliments to Dr. Jonas E. A. Perales, together with Prof. Gilberto, Prof. Fabio and Dr. Paulo, for accepting the invitation to be members of the evaluation committee of my PhD thesis.

I appreciate Emanoella Lima Soares (Manu) and Camila B. Pinheiro (my longest-standing friends) for their great company, friendship, fruitful discussions and continuous support since my first day in the lab. I especially want to thank Manu for her help in the data analysis and in the preparation of this thesis.

My appreciation to Dr. Fabiano M. Texeira (my longest-standing friend), his wife Alice and his mother Antonia, for their friendship, for accepting me as a member of their family, for inviting me to share all their moments of happiness and for the help I needed during difficult moments. I am also thankful to Fabiano for sharing his apartment with me after my arrival in Brazil and for his great company.

Thanks to all the former and new members of the Bioplant lab, including Veronica, Magda, Carlos, Isabelle, Washington, Antonio, Tiago and Muciana, for their pleasant company in the lab.

I would like to extend my special thanks to all the members of the Proteomic Unit (UFRJ), especially Gabriel and Prof. Magno, for their help in conducting mass spectrometry experiments.

I am also thankful to my friends Anwar and Rizwan (UFRJ) for giving me space in their apartment, for their moral support during my stay in Rio de Janeiro and for conducting the mass spectrometry experiments.

Special regards to my close friends, Mr. Javeed (South Korea), Dr. Umar (Unicamp), Dr. Asif (USP) and Dr. Asifullah (China) for their great moral support during completion of this work.

Very special respect to my loving parents, who have always been close to me and were there whenever I needed them, for their unconditional love, dedication and support throughout my career; they are a continuous source of motivation to set higher goals in life.

Sincere regards to my late grandfather, Dr. Mohammad Shah, for his love and his efforts in providing me with the quality education that enabled me to reach this point. Without his support, life would have been very difficult. I am thankful to my brothers and sisters for their love and support.

I would like to give very special thanks to my life partner for her loving company, which greatly helped me understand the true meaning of life; without her moral support, the completion of my PhD would not have been possible. Thank you for everything.

I thank my parents-in-law for their affection, dedication and love. I am also thankful to my brother-in-law, Mr. Noman, for solving all my problems, especially the bureaucratic matters I needed to handle in Pakistan.

Special thanks to Third World Academy of Sciences (TWAS) and Brazilian National Council for Scientific and Technological Development (CNPq), for their financial support.

Kind regards to the Department of Biochemistry and Molecular Biology of Plants, Federal University of Ceará, and to the Department of Biochemistry, Institute of Chemistry, Federal University of Rio de Janeiro, for providing the infrastructure.

ABSTRACT

Physic nut (Jatropha curcas L.) is an important crop due to its ability to store a high oil content in its seeds, which can serve as raw material for biodiesel production. Because of the presence of toxic constituents such as phorbol esters (PEs) and curcins, the seed cake produced by oil extraction cannot be used as animal feed. The development of genotypes better suited for industrial applications and biodiesel production, and with lower levels of toxic constituents, is hampered by a lack of understanding of a) proteins related to the biosynthesis and degradation of fatty acids (FAs) and triacylglycerides (TAGs), b) the role of proteins deposited during seed development and c) proteins related to the synthesis and storage of toxic compounds during seed development. To address this, we performed an anatomical analysis of developing J. curcas seeds, followed by a proteome analysis of the endosperm isolated from J. curcas seeds at five developmental stages, which resulted in the identification of 1517, 1256, 1033, 752 and 307 proteins from Stages 6, 7, 8, 9 and 10, respectively, totaling 1760 proteins. Proteins with similar expression patterns were grouped into five clusters and quantified based on spectral counts. Besides proteins involved in the biosynthesis and degradation of FAs and TAGs, we also identified a large number of proteins involved in carbohydrate metabolism, which are important for supplying energy and carbon for TAG synthesis in heterotrophic seeds. Among the members of different classes of seed storage proteins (SSPs), we identified four SSPs named nutrient reservoir, which, in contrast to the other SSPs, showed a decreasing deposition pattern during seed development and were revealed to have a special role during this process.
In addition, peptidases belonging to different mechanistic classes were identified; these have a range of functions, highlighting their role in reserve mobilization during germination. Isoforms of curcin were also identified in this proteome analysis; they were absent from our previous proteome analyses of other tissues of these seeds, suggesting that these toxic proteins are deposited only in the endosperm. Similarly, several enzymes involved in the biosynthesis of diterpenoid precursors were identified, but, as in our previous proteome analyses of other tissues of J. curcas seeds, we were unable to identify any terpene synthase/cyclase, the enzymes responsible for PE synthesis, which collectively suggests that PEs may not be synthesized in the seeds of this plant. In conclusion, the strategy used here enabled us to provide the first in-depth proteome analysis of the endosperm of developing J. curcas seeds, which, besides providing information on important aspects of seed development, also sets the foundation for a proteomic approach to studying biotechnologically important plant species.

Key words: Endosperm, proteome, oilseeds, biodiesel, mass spectrometry, seed development, seed proteins

RESUMO

Physic nut (Jatropha curcas L.) is an important crop due to its ability to store a high oil content in its seeds, which can serve as raw material for biodiesel production. Because of the presence of toxic constituents such as phorbol esters and curcin, the seed cake produced by oil extraction cannot be used in animal feed. The development of genotypes better suited to industrial applications and biodiesel production, and with low levels of toxic constituents, is being hampered by a lack of understanding of a) proteins related to the biosynthesis and degradation of fatty acids and triacylglycerols, b) the role of proteins deposited during seed development and c) proteins related to the synthesis and storage of toxic compounds during seed development. In view of this, we carried out an anatomical analysis of developing J. curcas seeds, followed by a proteomic analysis of the endosperm isolated from seeds of this species at five different developmental stages, which resulted in the identification of 1517, 1256, 1033, 752 and 307 proteins from Stages 6, 7, 8, 9 and 10, respectively, totaling 1760 proteins. Proteins with similar expression patterns were grouped into five different clusters, and protein quantification based on spectral counting was determined. Besides identifying the proteins involved in the biosynthesis and degradation of FAs and TAGs, we identified a large number of proteins involved in carbohydrate metabolism, which are important for supplying energy and carbon sources for TAG synthesis in heterotrophic seeds.
Among the members of different classes of seed storage proteins (SSPs), we identified four SSPs named nutrient reservoir, which, in contrast to the other SSPs, showed a decreasing deposition pattern and were revealed to have a special role during seed development. In addition, peptidases belonging to different mechanistic classes were identified, highlighting the role of reserve mobilization during germination. Curcin isoforms absent from our previous proteomic analyses of other seed tissues were identified, suggesting that these toxic proteins are deposited only in the endosperm. Similarly, several enzymes involved in the biosynthesis of diterpenoid precursors were identified in this proteomic analysis, but, as in our previous proteomic analyses of other tissues of J. curcas seeds, we were unable to identify terpene synthases/cyclases, the enzymes responsible for PE synthesis, which collectively suggests that the synthesis of these compounds may not occur in the seeds of this plant. In conclusion, the strategy used here provides the first in-depth proteomic analysis of the endosperm of developing J. curcas seeds, which, besides providing information on important aspects of seed development, also establishes the basis for proteomic research aimed at studying biotechnologically important plant species.

Keywords: Endosperm, proteome, oilseeds, biodiesel, mass spectrometry, seed development, seed proteins

LIST OF FIGURES

Figure 1 Components of Jatropha curcas fruit that could be utilized for the production of different kinds of biofuels. Figure adapted from Singh et al. (2008).

Figure 2 Structure of the PE backbone (tigliane and phorbol) and the various types of PEs identified from Jatropha curcas (C1 to C6) (Evans 1986 apud Goel et al., 2007; Haas et al., 2002).

Figure 3 Jatropha curcas ovule and developing seed structures (Singh, 1970).

Figure 4 Nuclear endosperm development (Olsen, 2001).

Figure 5 Morphology of a Jatropha curcas seed collected 40 days after pollination.

Figure 6 Fatty acid (FA) and triacylglyceride (TAG) biosynthesis in oilseeds (Weselake et al., 2009).

Figure 7 General workflow summarizing the process from seed collection to proteomic analysis of the extracted proteins.

Figure 8 Workflow showing the steps used for data analysis.

Figure 9 Jatropha curcas seed development. Seed external morphology (A.1-A.9), internal part of the seeds (B.1-B.9) and histological analysis of the seeds (C.1-C.9) at different developmental stages S-I to S-IX.

Figure 10 SDS-PAGE of proteins extracted from the endosperm of Jatropha curcas seeds at five different developmental stages.

Figure 11 Distribution of the 1760 proteins that appeared in at least two biological replicates among the five developmental stages used for proteomic analysis.

Figure 12 Gene Ontology annotation of the proteins identified from Jatropha curcas endosperm at five developmental stages.

Figure 13 Gene Ontology annotation of the unique proteins present in each developmental stage of Jatropha curcas seed.

Figure 14 Functional classification of the identified proteins according to KEGG functional classification.

Figure 15 Functional classification of the identified proteins from the Jatropha curcas endosperm based on MapMan classification.

Figure 16 Spectral count-based expression comparison of the proteins identified in Stages 7, 8, 9 and 10 with those identified in Stage 6, using a t-test.

Figure 17 Cluster analysis of the proteins identified in Stages 6, 7, 8 and 9. Proteins with similar expression profiles were grouped into five different clusters.

Figure 18 Distribution of the identified peptidases and peptidase inhibitors into different mechanistic classes.

Figure 19 Venn diagram showing proteins common to the endosperm proteome and three other proteomes analyzed in our lab (A). Venn diagram comparing the endosperm proteins with proteins identified in other proteomic analyses of Jatropha curcas (B). Venn diagram comparing the four proteomes obtained in our lab with proteins identified in other proteomic studies of Jatropha curcas (C).

LIST OF TABLES

Table 1 Composition of polyacrylamide mini gels.

Table 2 Proteins identified from the endosperm of Jatropha curcas seeds at five different developmental stages.

Table 3 Proteins related to carbohydrate metabolism, identified from the endosperm of developing Jatropha curcas seeds.

Table 4 Proteins related to lipid metabolism, identified from the endosperm of developing Jatropha curcas seeds.

Table 5 Seed storage proteins (SSPs) and proteins related to seed maturation, identified from the endosperm of developing Jatropha curcas seeds.

Table 6 Peptidases and peptidase inhibitors identified from the endosperm of developing Jatropha curcas seeds.

Table 7 Proteins related to terpenoid biosynthesis identified from the endosperm of developing Jatropha curcas seeds.

LIST OF SUPPLEMENTARY TABLES

Supplementary Table I Total number of proteins, including common contaminants and decoys, identified from the endosperm of Jatropha curcas developing seeds at five different developmental stages.

Supplementary Table II This table shows total number of proteins and their , identified from the endosperm of Jatropha curcas developing seeds at five different developmental stages.

Supplementary Table III This table contains the proteins that appeared in at least two biological replicates, identified from the endosperm of Jatropha curcas developing seeds at five different developmental stages. These proteins were considered true representatives of each developmental stage used here.

Supplementary Table IV This Supplementary Table summarizes the grouping of the maximum parsimony groups into different clusters and the spectral count-based pairwise comparison of the proteins identified in Stages 8, 9 and 10 with those in Stage 6.

Supplementary Tables, along with the raw data, were deposited in Chorus (https://chorusproject.org/pages/index.html), a publicly available MS data repository, and are currently available only to the members of the evaluation committee of this thesis.

LIST OF ABBREVIATIONS

2-DE Two-dimensional electrophoresis
AACT Acetoacetyl-CoA thiolase
AAPVD Approximately Area-Proportional Venn Diagram
ACBP Acyl-CoA binding protein
ACN Acetonitrile
ACP Acyl carrier protein
ACS Acetyl-CoA synthetase
ADP-glucose Adenosine diphosphate glucose
AP Ammonium persulfate
ATP Adenosine triphosphate
BH Benjamini-Hochberg
BSA Bovine serum albumin
cDNA Complementary DNA
CID Collision-induced dissociation
CoA Coenzyme A
CS Casbene synthase
DAG Diacylglycerol
DDA Data-dependent analysis
DGAT Diacylglycerol acyltransferase
DNA Deoxyribonucleic acid
DOXP 1-Deoxy-D-xylulose-5-phosphate
ER Endoplasmic reticulum
ESI Electrospray ionization
FAs Fatty acids
FAD2 Oleate desaturase
FAD3 Linoleate desaturase
FDR False discovery rate
FPP Farnesyl diphosphate
FT-ICR Fourier transform ion cyclotron resonance
GGPP Geranylgeranyl diphosphate
GO Gene ontology
GOEx Gene ontology explorer
GPAT Glycerol-3-phosphate acyltransferase
GPP Geranyl diphosphate
GUI Graphical user interface
HCl Hydrochloric acid
HIV Human immunodeficiency virus
HMGCS Hydroxymethylglutaryl-CoA synthase
IAA Iodoacetamide
IPP Isopentenyl diphosphate
KAAS KEGG automatic annotation server
KAS Ketoacyl-ACP synthase
LC Liquid chromatography
LCACS Long-chain acyl-CoA synthetase
LEA Late embryogenesis abundant
LPAAT Lysophosphatidic acid acyltransferase
LTQ Linear trap quadrupole
M Molar
MALDI Matrix-assisted laser desorption ionization
MDD Mevalonate diphosphate decarboxylase
mM Millimolar
mRNA Messenger ribonucleic acid
MS Mass spectrometry
MVA Mevalonate
NADH Nicotinamide adenine dinucleotide
NADPH Nicotinamide adenine dinucleotide phosphate
NaHCO3 Sodium bicarbonate
NCBI National Center for Biotechnology Information
NH4HCO3 Ammonium bicarbonate
OB Oil body
OPP Oxidative pentose phosphate
PAGE Polyacrylamide gel electrophoresis
PCD Programmed cell death
PDHC Pyruvate dehydrogenase complex
PE Phorbol ester
PKC Protein kinase C
PL Phospholipids
PTMs Post-translational modifications
PVPP Polyvinylpolypyrrolidone
RIP Ribosome-inactivating proteins
RNA Ribonucleic acid
ROS Reactive oxygen species
rRNA Ribosomal ribonucleic acid
SAD Stearoyl-ACP desaturase
SDS Sodium dodecyl sulfate
SEPro Search engine processor
SSPs Seed storage proteins
TAG Triacylglycerides
TCA Tricarboxylic acid pathway
TEMED N,N,N′,N′-tetramethylethylenediamine
TOF Time of flight
tRNA Transfer ribonucleic acid
VPE Vacuolar processing enzyme
WHO World Health Organization

TABLE OF CONTENTS

1 INTRODUCTION

1.1 Biotechnological importance of Jatropha curcas

1.2 Toxicity of Jatropha curcas seeds

1.3 Ovule structure and endosperm development

1.4 Role of endosperm in reserves deposition

1.5 Endosperm role in seed development

1.6 Triacylglycerides metabolism and accumulation in seeds

1.7 Plants proteomics

1.8 Mass spectrometry

1.8.1 ESI LTQ-Orbitrap

1.9 Post-genomic era and Jatropha curcas

2 OBJECTIVES

2.1 General objective

2.2 Specific objectives

3 MATERIALS AND METHODS

3.1 Plant material

3.2 Histological analysis and selection of developmental stages for the endosperm isolation

3.3 Endosperm isolation and protein extraction

3.4 1D-SDS-PAGE and samples preparation for LC-MS/MS

3.5 Protocol used for peptides cleaning through spin columns

3.6 NanoLC-MS/MS analysis

3.7 Data analysis

3.7.1 Protein identification

3.7.2 Functional classification

3.7.3 Cluster analysis and quantification

3.8 Data deposition

4 RESULTS AND DISCUSSIONS

4.1 Anatomical analysis of developing seeds of Jatropha curcas

4.2 Proteins identification and data analysis

4.3 Functional classification of the identified proteins

4.4 Proteins quantification and cluster analysis

4.5 Major functional classes

4.5.1 Proteins related to the metabolism of carbohydrates

4.5.2 Proteins related to lipids metabolism

4.5.3 Seed Storage and Desiccation related proteins

4.5.4 Peptidases and peptidase inhibitors

4.5.4.1 Serine peptidases

4.5.4.2 Metallo peptidases

4.5.4.3 Aspartic peptidases

4.5.4.4 Cysteine peptidases

4.5.4.5 Threonine peptidases

4.5.5 Proteins related to toxic components

4.6 Contribution of this study to the establishment of the deep proteome of developing Jatropha curcas seeds

5 GENERAL CONCLUSIONS

6 FUTURE PERSPECTIVES

7 REFERENCES

8 PUBLICATIONS, WORKSHOPS AND CONFERENCES

9 ATTACHMENTS


1 INTRODUCTION

Jatropha curcas L. belongs to the family Euphorbiaceae and is a multipurpose small tree with substantial economic and pharmacological potential. It is commonly known as Physic nut, Purging nut, Black vomit nut, Barbados nut or big purge nut (Makkar et al., 1998a). The plant is native to tropical America but was later introduced to various tropical and subtropical parts of the world (Openshaw, 2000; Pramanik, 2003). J. curcas has few pests and diseases and is adapted to a wide range of climatic conditions, including low rainfall, under which it sheds its leaves as a response to drought. The plant can be used for soil erosion control and land reclamation, and as a hedge that protects other plants from animals. It is also planted as a commercial crop, but its most important feature is the accumulation of viscous oil in its seeds. Besides other industrial uses, such as soap making and the cosmetics industry, this oil has the potential to be used as raw material for biodiesel production (Openshaw, 2000).

The genus name Jatropha is derived from the Greek words iatrós (doctor) and trophé (food), indicating that the plant was initially known for its medicinal uses. Almost all parts of the plant have been found to have medicinal importance, and their use against various health problems has been extensively reviewed in the literature (Villegas et al., 1997; Debnath and Bisen, 2008).

Different parts of J. curcas have also been evaluated for their chemical constituents and biological activities, underscoring the plant's pharmacological importance. For example, skin-irritant compounds were isolated and partially characterized from J. curcas seed oil (Adolf et al., 1984) and were later shown to have tumor-promoting activity on mouse skin (Hirota et al., 1988). The anti-human immunodeficiency virus (HIV) activity of extracts of J. curcas branches and leaves was also tested (Matsuse et al., 1999): the water extract of the branches and the methanolic extract of the leaves showed potent inhibitory activity against the HIV-induced cytopathic effect in cultured cells.

Phytochemical investigation of stem bark extracts of J. curcas revealed the presence of various secondary metabolites, such as saponins, steroids, tannins, glycosides, alkaloids and flavonoids (Igbinosa, 2009; Nayak and Patel, 2010). The presence of these active compounds was found to be related to the antibacterial, antifungal and other bioactivities of the plant (Igbinosa, 2009).

In summary, an extensive literature highlights the pharmacological importance of this plant, but it is currently attracting attention mainly for its biotechnological potential, namely the use of its oil for biodiesel production.

1.1 Biotechnological importance of Jatropha curcas

Economic development and increasing industrialization worldwide have resulted in a high demand for energy, which is mainly derived from fossil fuel reserves such as petroleum, coal and natural gas. The limited reserves of these fossil fuels have drawn the attention of the scientific community to alternative fuels that can be obtained from renewable feedstocks (Openshaw, 2000; Pramanik, 2003; Demirbas, 2005). Biodiesel is one such option that could be used as an alternative to diesel fuel (Pinto et al., 2005; Gui et al., 2008). Chemically, biodiesel consists of alkyl esters of fatty acids (FAs) produced by transesterification of animal or plant oils or fats with short-chain alcohols such as methanol and ethanol (Pinto et al., 2005). At present, 84% of world biodiesel production comes from rapeseed oil, while sunflower oil accounts for 13%, palm oil for 1%, and soybean oil and other sources for around 2% (Gui et al., 2008). Because more than 95% of biodiesel is produced from edible oil sources, food is effectively being converted into fuel, and large-scale production of biodiesel from such sources could cause a global food imbalance in the future. Concerns about the use of edible oils for biodiesel production have therefore shifted attention towards non-edible oils (Gui et al., 2008; Carels, 2009). Many non-edible oleaginous plants could be used as biodiesel sources without compromising the food industry, such as neem (Azadirachta indica A.), karanja (Millettia pinnata L.), castor (Ricinus communis L.), simarouba (Simarouba glauca), wild apricot (Prunus armeniaca), jojoba (Simmondsia chinensis), kokum (Garcinia indica), mahua (Madhuca indica), Calophyllum inophyllum and physic nut (Jatropha curcas L.) (Sudhakar Johnson et al., 2011). Among these, J. curcas has gained importance primarily because it is drought resistant and can be grown on abandoned land under diverse environmental conditions (Sudhakar Johnson et al., 2011), although high oil yields require soils with adequate nitrate, phosphate and potassium (Carels, 2009).
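As a worked illustration of the transesterification mentioned above, its overall stoichiometry can be written as follows, taking methanol as the alcohol. This is a general textbook scheme rather than a result from the cited studies: R1, R2 and R3 stand for generic acyl chains, and in practice the reaction is catalysed (commonly by a base) and run with excess alcohol to push it towards the esters.

```latex
\[
\underbrace{\mathrm{C_3H_5(OCOR_1)(OCOR_2)(OCOR_3)}}_{\text{TAG}}
+ 3\,\mathrm{CH_3OH}
\;\xrightarrow{\text{catalyst}}\;
\underbrace{\mathrm{R_1COOCH_3} + \mathrm{R_2COOCH_3} + \mathrm{R_3COOCH_3}}_{\text{FA methyl esters (biodiesel)}}
+ \underbrace{\mathrm{C_3H_5(OH)_3}}_{\text{glycerol}}
\]
```

Three moles of alcohol are consumed per mole of TAG, and glycerol is released as a by-product. Replacing CH3OH with ethanol gives the corresponding FA ethyl esters.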

The J. curcas seed kernel contains about 60% oil, which can replace fossil diesel in the form of biodiesel (Makkar et al., 1997; Pramanik, 2003). The oil can be combusted without any refining and has been tested successfully as a fuel for diesel engines (Jain and Sharma, 2010).

The oil of this plant was used as a substitute for petroleum diesel in Madagascar, Cape Verde and Benin during the Second World War (Foidl et al., 1996; Gubitz et al., 1999). Engine tests with J. curcas oil in Thailand showed satisfactory engine performance. The feasibility of producing FA ethyl esters from J. curcas oil was studied for African countries, and the economic evaluation showed that biodiesel production from this plant is profitable, provided that its by-products can be sold as valuable materials (reviewed in Kumar and Sharma, 2008). A preliminary investigation of the suitability of various non-edible oilseeds as biodiesel sources in Cuba revealed that, based on oil yield and FA composition, J. curcas was the most promising candidate for biodiesel production (Martin et al., 2010). In countries such as India, J. curcas is becoming a future source of biodiesel, and the Government of India has already initiated programs for growing J. curcas on wasteland for biodiesel production (Jain and Sharma, 2010). In different states of India, 1.72 million hectares have been allocated for J. curcas cultivation, and small quantities of J. curcas biodiesel have already been sold to local oil companies (Gui et al., 2008).

J. curcas oil has some valuable properties, such as low acidity and good stability compared with soybean oil, low viscosity compared with castor oil, and better cold properties compared with palm oil. Furthermore, owing to its high cetane value compared with petroleum diesel, J. curcas oil acts as a good alternative fuel, requiring modification of the existing engine (Om Tapanes, 2008; Koh and Ghazi, 2011). The FA composition of the oil varies from region to region, but it mainly contains linoleic, oleic and palmitic acids, with minor amounts of stearic and palmitoleic acids (Gubitz et al., 1999; Abdulla et al., 2011). The fresh oil is odorless and colorless but may turn yellow upon standing. Biodiesel produced from J. curcas oil releases less carbon dioxide than petroleum-based diesel, which makes J. curcas biodiesel superior to ordinary diesel in this respect (Abdulla et al., 2011).

The fuel characteristics of J. curcas biodiesel are very similar to those of diesel and comply with the American (ASTM D6751) and European (EN 14214) standards (Kumar Tiwari et al., 2007). The ability of a biodiesel to meet these standards is mainly determined by its FA composition, which influences properties such as cetane number (CN), cold flow, cloud point, viscosity and stability. Petroleum diesel consists of hydrocarbons with chains of 8-10 carbon atoms, whereas J. curcas oil mainly contains FAs with chains of 16-18 carbon atoms, which confers a high CN on J. curcas biodiesel (King et al., 2009). One factor that makes biodiesel production less efficient is the high level of free FAs (14%) in crude J. curcas oil; however, pretreatment methods to overcome this problem are well developed (Koh and Ghazi, 2011). Another problem is the high viscosity of J. curcas oil compared with petroleum diesel, so its direct use in engines requires further treatment. This issue has been addressed in several ways, such as preheating the oil, blending or dilution with other fuels, transesterification and thermal cracking/pyrolysis. These additional treatments make the oil more expensive than diesel oil, so proper use of the plant's by-products is of great importance for reducing the cost of the process (Pramanik, 2003; Gui et al., 2008).

Current biotechnological approaches focus especially on increasing the oil content, achieving an oil composition suitable for industrial applications, and eliminating phorbol esters (PEs) so that the seed cake can be used in animal feed. Biodiesel containing a high level of monounsaturated FAs is desirable, because high levels of polyunsaturated FAs negatively affect biodiesel quality; for example, unsaturation reduces biodiesel stability and affects the cetane number (Qu et al., 2012). Oleic acid is the main FA of J. curcas oil, forming 34.3% to 45.8% of the total FA content, followed by the polyunsaturated linoleic acid, which constitutes 29% to 44.2% of the total FAs (Gubitz et al., 1999). Thus, increasing the oleic acid content and reducing the linoleic acid content is desirable to improve the quality of biodiesel produced from J. curcas seeds. Genes that regulate FA chain length and saturation in J. curcas were identified by Ye et al. (2009). Among them, JcFAD2-1 was found to be the most important, as it mediates the conversion of oleic acid to linoleic acid, making it a good candidate for genetic manipulation. Silencing this gene by RNA interference caused a dramatic increase in oleic acid, from 34% to more than 78%, and a corresponding reduction in polyunsaturated FAs, from 41% to less than 3% (Qu et al., 2012).

J. curcas oil is a renewable resource with an energy potential close to that of petroleum diesel, but, as with other vegetable oils, its main drawback is its production cost, which makes it less economically feasible than diesel fuel. To improve the economic viability of the overall J. curcas biodiesel production process, methods for the proper use of its byproducts, such as seed cake, glycerin and fruit husks, are of great importance (Achten et al., 2007). The seed cake obtained after oil extraction is an important byproduct that could be used in multiple ways to make biodiesel production cost-effective (King et al., 2009). J. curcas produces 1 ton of seed cake per hectare; in India, for instance, the plant is expected to be grown on 20 million hectares in the coming years, which would produce 20 million tons of seed cake each year (Carels, 2009).

One possible use of the seed cake is as an organic fertilizer (Openshaw, 2000; King et al., 2009; Sharma et al., 2009), which would not only increase agricultural productivity but also save foreign exchange by replacing mineral fertilizers. However, the impact of the toxic constituents of this seed meal on soil ecology must be investigated before it can be considered as an organic fertilizer (King et al., 2009). After detoxification, the seed meal could be an efficient animal feed because of its high protein content, which is another way to add value to biodiesel production from this plant (Achten et al., 2007; Debnath and Bisen, 2008; King et al., 2009). A study on the utilization of various parts of J. curcas fruits and seeds showed that the seed cake is rich in organic matter and has excellent potential for biogas production (Singh et al., 2008). This study revealed that the biogas generation capacity of J. curcas seed cake is higher than that of cattle dung, and that the resulting biogas has a higher calorific value than biogas from cattle dung.

Like biodiesel, bioethanol is a renewable energy source; it is mainly obtained from the fermentation of sugar sources such as molasses, cereals and fruits (Mishra et al., 2011). J. curcas seed cake is rich in carbohydrates and could serve as a raw material for bioethanol production (Mishra et al., 2011; Wever et al., 2012). Using the seed cake for bioethanol production would not only solve the problem of the safe disposal of this byproduct but also provide the opportunity to obtain two important renewable fuels from one starting material (Mishra et al., 2011). Besides its potential for energy generation, the seed cake was also found to be an efficient substrate for the production of industrial enzymes, such as lipases, owing to its composition suitable for microbial growth (Mahanta et al., 2008).

According to Singh et al. (2008), the J. curcas fruit consists of about 37.5% shell and 62.5% seed, and the seed in turn contains 42% hull (husk) and 58% kernel (Figure 1). On a whole-fruit weight basis, the fruit yields 17-18% oil, while the rest of the material is obtained as byproducts, which constitute about 53.7% of the whole fruit. These various parts of the fruit are rich sources of lignocellulose and can therefore be used as raw material for bioethanol production (Visser et al., 2011; Singh et al., 2008). Singh et al. (2008) further suggested a holistic approach that uses all parts of the J. curcas fruit for energy production in order to obtain the maximum benefit from it.
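The nested percentages above (seed as a fraction of fruit, kernel as a fraction of seed) can be converted into whole-fruit fractions with simple arithmetic. The Python sketch below is illustrative only: it uses the Singh et al. (2008) figures quoted in the text and adds one assumption of our own, not stated in the source, that all extracted oil comes from the kernel, so that the 17-18% oil yield on a fruit basis can be re-expressed per unit of kernel weight.

```python
# Whole-fruit mass balance for J. curcas, using the percentages quoted
# from Singh et al. (2008). All values are fractions by weight.
SHELL_OF_FRUIT = 0.375
SEED_OF_FRUIT = 0.625
HULL_OF_SEED = 0.42
KERNEL_OF_SEED = 0.58


def fruit_breakdown() -> dict:
    """Express shell, hull and kernel as fractions of whole-fruit weight."""
    return {
        "shell": SHELL_OF_FRUIT,
        "hull": SEED_OF_FRUIT * HULL_OF_SEED,
        "kernel": SEED_OF_FRUIT * KERNEL_OF_SEED,
    }


def oil_per_kernel(oil_of_fruit: float) -> float:
    """Oil as a fraction of kernel weight.

    ASSUMPTION (not from the source): all extracted oil is located in the kernel.
    """
    return oil_of_fruit / fruit_breakdown()["kernel"]


for part, fraction in fruit_breakdown().items():
    print(f"{part:>6}: {fraction:.2%} of fruit weight")
for oil in (0.17, 0.18):
    print(f"oil = {oil:.0%} of fruit -> {oil_per_kernel(oil):.1%} of kernel weight")
```

With these figures the kernel accounts for 36.25% of the fruit weight (0.625 × 0.58), and the shell, hull and kernel fractions add up to the whole fruit.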

1.2 Toxicity of Jatropha curcas seeds

Although the protein-rich seed meal obtained after oil extraction could be used as animal feed, this use is currently hampered by the presence of toxic and anti-nutritional constituents in the meal. J. curcas seeds have been found to be toxic to mice, rats, calves, sheep, goats, chickens and humans (Aderibigbe et al., 1997). They contain a variety of toxic or anti-nutritional compounds, but their toxicity is mainly due to a protein called curcin and to diterpenoids called PE (Makkar et al., 1997; King et al., 2009). PE are polycyclic compounds in which the hydroxyl groups of two neighboring carbon atoms are esterified with FA. They are tetracyclic diterpenoids belonging to the tigliane family of diterpenes (Figure 2). Tigliane, which contains four rings, is the basic unit of PE; hydroxylation of this basic structure and further esterification with different FA moieties at different positions result in a variety of PE (Goel et al., 2007). So far, six PE have been isolated and characterized from J. curcas oil (Figure 2) (Haas et al., 2002). PE are distributed in different parts of the plant but are found in high quantities in the seed kernel, where 70% is in the oil and the rest in the deoiled seed cake (Makkar and Becker, 2009).

Figure 1: Components of Jatropha curcas fruit, figure adapted from Singh et al. (2008).

PE are known for their tumor-promoting activity. They mimic the action of diacylglycerol (DAG), an activator of protein kinase C (PKC), which regulates different cellular metabolic pathways, including signal transduction pathways. Acting as DAG analogs, these esters strongly activate PKC, which in turn triggers cell proliferation and amplifies the efficacy of carcinogens (Goel et al., 2007). Curcin is another toxin commonly found in J. curcas seeds and represents another barrier to the use of the seed cake as animal feed (King et al., 2009). It belongs to the family of ribosome-inactivating proteins (RIPs), which are classified as type I RIPs when they consist of a single RNA N-glycosidase chain (A chain) or as type II RIPs when they additionally possess a domain with carbohydrate-binding activity (B chain) (Peumans et al., 2001). Curcin is a type I RIP that halts protein synthesis by damaging the ribosome: its rRNA N-glycosidase activity cleaves the glycosidic bond of a specific adenine residue (Juan et al., 2003; Lin et al., 2003; Luo et al., 2006). Ricin, a type II RIP found in castor bean (R. communis) seeds, is extremely toxic because its B chain binds to sugar-containing receptors on the cell surface (Peumans et al., 2001). Lacking a B chain, curcin cannot bind to these receptors, and its toxicity is therefore much lower than that of ricin. Both recombinant curcin and crude curcin isolated from J. curcas seeds have been reported to have anti-tumor activity, and further research is needed for its application as an anti-tumor medicine (Lin et al., 2003; Luo et al., 2006). Besides PE and curcin, J. curcas seeds contain other toxic factors, such as trypsin inhibitors and phytates; these may not contribute to short-term toxicity, but they can aggravate the toxic effects (Makkar et al., 1997).
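The RIP classification rule cited above (Peumans et al., 2001) depends only on which domains a protein carries, and the same domain composition accounts for the difference in cell-surface binding between curcin and ricin. The Python sketch below encodes that rule as a small data structure; the class and function names are hypothetical, and reducing each protein to two boolean domain flags is a deliberate simplification of real protein architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RIP:
    name: str
    has_a_chain: bool  # RNA N-glycosidase (catalytic) chain
    has_b_chain: bool  # carbohydrate-binding (lectin) chain


def rip_type(protein: RIP) -> str:
    """Classify by domain composition: A chain only -> type I; A + B chains -> type II."""
    if not protein.has_a_chain:
        raise ValueError(f"{protein.name} lacks the catalytic A chain; not a RIP")
    return "type II" if protein.has_b_chain else "type I"


def binds_cell_surface_sugars(protein: RIP) -> bool:
    """Only the B chain confers binding to sugar-containing cell-surface receptors."""
    return protein.has_b_chain


curcin = RIP("curcin", has_a_chain=True, has_b_chain=False)
ricin = RIP("ricin", has_a_chain=True, has_b_chain=True)
for rip in (curcin, ricin):
    print(rip.name, rip_type(rip), "| cell-surface binding:", binds_cell_surface_sugars(rip))
```

Modelling the B chain as a separate flag mirrors the argument in the text: the catalytic activity is shared, and the cell-binding capacity that makes ricin far more toxic is tied to the extra domain.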
The main difference between the toxic varieties and the non-toxic Mexican varieties is the presence of PE in the seeds: non-toxic varieties have low to undetectable amounts of PE, while the rest of the seed composition of the two genotypes is the same (Makkar et al., 1997; Makkar et al., 1998a; Makkar et al., 1998b; He et al., 2011).

Numerous attempts have been made to detoxify the seed meal in order to make it suitable for animal consumption. Trypsin inhibitors are heat-labile, and trypsin inhibitory activity can be inactivated by heating (Makkar et al., 1998b); however, heating did not affect other anti-nutritional components (Aderibigbe et al., 1997). Double solvent extraction with hexane and ethanol, combined with moist heat treatment, completely inactivated both trypsin inhibitory and lectin activities (Chivandi et al., 2004). Martínez-Herrera et al. (2006) tested the effects of various treatments on the toxic factors in J. curcas seed cake and found that trypsin inhibitors could easily be inactivated by moist heating at 121 °C for at least 25 minutes. They also reported that ethanol extraction coupled with 0.07% sodium bicarbonate (NaHCO3) not only decreased lectin activity but also reduced the PE content by 97.9%. They further showed that, along with other treatments, irradiation could reduce the phytate and saponin contents. In another study, alkali and heat treatment of the seed meal reduced the PE concentration by up to 89%; when treated and untreated meals were fed to rats, mortality was lower among the rats fed the treated meal (Rakshit et al., 2008). Much further effort has been devoted to detoxifying the seed cake for animal consumption (Oskoueian et al., 2011; Joshi et al., 2011; Kumar et al., 2011; Xiao et al., 2011), but these procedures have not yet been exploited commercially.

1.3 Ovule structure and endosperm development

The ovule is the precursor of the seed and plays a central role in the sexual reproduction of both angiosperms and gymnosperms. In angiosperms, the ovule harbors the female gametophyte (embryo sac), which is surrounded by the nucellus and one or two integuments (Figure 3) (Gasser et al., 1998). After fertilization, the female gametophyte forms the embryo and the triploid endosperm, while the integuments differentiate into the maternally derived seed coat (Ingram, 2010).

Endosperm originates from double fertilization, a unique characteristic of flowering plants. During this process, the pollen tube delivers two male gametes to the embryo sac: one fertilizes the egg cell (female gamete), forming the embryo of the daughter plant, while the other fuses with the diploid central cell. The triploid product of this second fertilization eventually develops into the endosperm (Lopes and Larkins, 1993).

Figure 3: Jatropha curcas ovule and developing seed structure (Singh, 1970). An, antipodal cells; C, caruncle; E, egg cell; EM, embryo; EN, endosperm; H, hypostase; II, inner integument; N, nucellus; NB, nucellar beak; NR, nucellar remains; OB, obturator; OI, outer integument; PN, polar nuclei; RB, raphe bundle; Sy, synergids; VS, vascular supply.

The extent to which the endosperm persists during development depends on the pattern of seed formation and on the fate of the endosperm in a particular species. For example, in cereal species the endosperm stores reserves and remains as a storage tissue in the mature seed, while in many dicotyledonous species the endosperm is formed but partly degraded as the embryo matures. In peas, which have a non-persistent endosperm, the endosperm is absorbed at the free-nuclear stage (Lopes and Larkins, 1993; Olsen, 2001).

Endosperm development can be classified into three types: nuclear, cellular and helobial. The nuclear type is the most common: the primary endosperm nucleus undergoes several divisions without cellularization, forming a large number of free nuclei arranged at the periphery of the central cavity. Cytokinesis then proceeds from the periphery towards the center until the whole endosperm is cellularized. In the cellular type, nuclear division and cytokinesis take place together from the first nuclear division onwards. In the less common helobial type, the first division of the primary endosperm cell generates two cells of different sizes; the larger cell follows nuclear-type development, while the smaller one remains undivided (Lopes and Larkins, 1993).

The mature ovule of J. curcas contains a nucellus surrounded by two integuments. Endosperm development in J. curcas is of the nuclear type (Singh, 1970). In nuclear endosperm development, the fertilized triploid nucleus of the central cell undergoes mitotic divisions without cell wall formation, resulting in a multinucleate cell called the endosperm coenocyte (Figure 4). Cellularization starts with the formation of radial microtubules around the free nuclei, and interactions between these microtubules eventually result in the formation of special structures called alveoli. Each alveolus is a circular structure that harbors a nucleus and is open towards the central vacuole. Continued growth of the alveoli towards the central vacuole and their periclinal division result in a peripheral layer of cells and a new layer of alveoli facing the central vacuole. This process is repeated until the whole central vacuole is closed and replaced by new cells (Olsen, 2001).

Figure 4: Nuclear endosperm development (Olsen, 2001). (a) Fertilized triploid nucleus. (b) Endosperm coenocyte. (c) Formation of radial microtubular systems. (d) Cell wall formation. (e) Periclinal cell divisions. (f) Central vacuole occupied by the endosperm cells.

1.4 Role of endosperm in reserves deposition

Seeds comprise three basic parts: the embryo proper; a tissue with stored reserves that sustain the embryo until it becomes autotrophic; and a seed coat that acts as a protective covering. Depending on the species, the nutrient storage tissue may be the cotyledons, the endosperm or, in gymnosperms, the megagametophyte (Miernyk and Hajduch, 2011).

The endosperm is a unique feature of flowering plants that acts primarily as a storage tissue for nutrients such as carbohydrates, proteins, lipids, nucleic acids and minerals, which nourish the embryo during development and germination (Lopes and Larkins, 1993; Wang et al., 2009). The amount of endosperm in mature seeds varies with the species. For example, in cereals such as rice (Oryza sativa), maize (Zea mays) and wheat (Triticum aestivum), it constitutes a large portion of the seed, and it is also prominent in the seeds of some dicotyledons, such as castor (R. communis). In species such as Arabidopsis thaliana, on the other hand, the endosperm is almost absent from the mature seed. Mature seeds may thus be endosperm-dominant, with the endosperm as their major part, or embryo-dominant, with no endosperm remaining at maturity. In both cases, however, the endosperm is essential for embryo development, in the same way that the placenta is important for the mammalian embryo (Wang et al., 2009; Miernyk and Hajduch, 2011). The composition of storage compounds differs among plant species, reflecting physiological differences (Ekman et al., 2008). For example, cereal seeds store carbon mostly in the form of starch and proteins (Ekman et al., 2008). Cereal seeds also accumulate a small amount of oil, deposited in the embryo, which constitutes only a small part of the seed; however, some maize varieties accumulate large quantities of oil owing to their enlarged embryos (Alexander and Seif, 1963). This stored oil serves as the main source of energy for the developing seedling during germination (Ekman et al., 2008). Oil palm is a monocot that stores reserves in the form of oil, but its storage tissue is the endosperm rather than the cotyledons (Alang et al., 1988). Besides monocots, there are dicot species, such as castor (R. communis), that retain the endosperm as the main storage tissue and accumulate reserves in it mainly in the form of oil (Marriott and Northcote, 1975). Mature J. curcas seeds also accumulate oil as the main storage constituent in the endosperm. Cytohistological analysis showed that all parts of the mature seed except the seed coat contribute to reserve accumulation, but the main site of reserve deposition is the endosperm (Reale et al., 2012).

Along with other reserves, the endosperm accumulates a variety of proteins, the most abundant of which are called storage or reserve proteins. The main purpose of these storage proteins is to store nitrogen and sulfur for the seedling during germination, as they are rich in amide and sulfur-containing amino acids (Lopes and Larkins, 1993). Based on their extraction and solubility, reserve proteins are traditionally classified into albumins (soluble in water), globulins (soluble in dilute saline solutions), prolamins (soluble in alcoholic solutions) and glutelins (soluble in dilute acid or base) (Shewry et al., 1995). These proteins share some common characteristics. First, they are synthesized in large quantities in specific tissues at specific stages of development. Second, they are polymorphic, owing to their being encoded by multigene families, to proteolytic processing or to glycosylation (Shewry et al., 1995). Third, they are stored in special structures called protein bodies. Sequestering reserve proteins in particular cellular bodies could be an adaptation that protects them from the enzymes responsible for the turnover of metabolic proteins. Another advantage of packaging these proteins in membrane-bound organelles is that they are deposited under comparatively non-aqueous conditions, which facilitates seed desiccation (Lopes and Larkins, 1993).
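The solubility-based classification above is usually applied as a sequential extraction, in which each solvent removes the proteins it dissolves before the next solvent is used, so a protein ends up in the class of the first solvent that dissolves it. The Python sketch below assumes this classic order (water, dilute saline, aqueous alcohol, dilute acid or base); the function name, the protein names and their solubility profiles are hypothetical, chosen only to show the assignment rule.

```python
from typing import Dict, List, Set

# Sequential extraction order and the class each solvent defines.
EXTRACTION_ORDER = ["water", "dilute saline", "aqueous alcohol", "dilute acid or base"]
CLASS_FOR_SOLVENT = {
    "water": "albumins",
    "dilute saline": "globulins",
    "aqueous alcohol": "prolamins",
    "dilute acid or base": "glutelins",
}


def sequential_fractionation(solubility: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Assign each protein to the class of the first solvent (in extraction order)
    that dissolves it; proteins dissolved by none remain in the insoluble residue."""
    fractions: Dict[str, List[str]] = {cls: [] for cls in CLASS_FOR_SOLVENT.values()}
    fractions["insoluble residue"] = []
    for protein, solvents in solubility.items():
        for solvent in EXTRACTION_ORDER:
            if solvent in solvents:
                fractions[CLASS_FOR_SOLVENT[solvent]].append(protein)
                break
        else:  # no solvent in the series dissolved this protein
            fractions["insoluble residue"].append(protein)
    return fractions


# Hypothetical proteins with made-up solubility profiles.
example = {
    "protein_A": {"water", "dilute saline"},  # removed by water before saline is used
    "protein_B": {"dilute saline"},
    "protein_C": {"aqueous alcohol"},
    "protein_D": {"dilute acid or base"},
    "protein_E": set(),
}
for cls, members in sequential_fractionation(example).items():
    print(f"{cls}: {members}")
```

The order matters: a protein soluble in both water and saline (protein_A) is classed as an albumin because water is applied first.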

Most agriculturally important storage proteins are albumins, globulins and prolamins (Gibbs et al., 1989; Shewry et al., 1995; Xu and Messing, 2009). Albumins are common to all seeds, while prolamins and glutelins are most abundant in monocotyledon seeds and globulins predominate in dicotyledon seeds (Xu and Messing, 2009; Weber et al., 2005).

Storage proteins have been named in different ways. Globulins, for example, have been grouped according to their sedimentation coefficients, such as 7S and 11S (Shewry et al., 1995). Because this nomenclature is simple and straightforward, it is common to find references to 3S or 12S globulins (Miernyk and Hajduch, 2011) or, at the extreme, to 2.2S and 11.3S globulins (Templeman et al., 1987). In addition, various proteins have been given trivial or informal names. For example, cereal prolamins were named after the Latin names of their species, e.g., zeins from maize (Zea) and hordeins from barley (Hordeum), while wheat prolamins are called gliadins. The 7S globulins are also called vicilins and the 11S globulins legumins, while 2S albumins were named cactin (reviewed in Miernyk and Hajduch, 2011).

Besides the storage proteins, many other proteins accumulate in the endosperm in high quantities, such as protease inhibitors, α-amylase inhibitors, lectins, thionins, RIPs and certain enzymes, e.g., sucrose synthase and urease. These proteins may serve as a secondary source of nitrogen and sulfur, but their main role is to protect the seed against pathogens and predators (Lopes and Larkins, 1993).

1.5 Endosperm role in seed development

During seed growth, coordination among the endosperm, the embryo and the maternally derived nucellus and integuments is important (Berger et al., 2006). The endosperm separates the embryo from the integument and from the source of nutrients near the chalazal region, and hence occupies a central position in the seed (Figure 5). Because of this position, the endosperm plays an essential role during seed development (Berger, 1999). There is close communication between the endosperm and the integument, two distinct seed components: in Arabidopsis, a mutation that reduced the size of the endosperm was accompanied by reduced cell elongation in the integument (Garcia et al., 2005).

Figure 5: Morphology of Jatropha curcas seed, collected at 40 days after pollination. Em, embryo; En, endosperm; Ts, testa; Ts-Ch, testa on the chalazal region.

Embryo development depends on proper endosperm development: if the endosperm fails to develop, the embryo ultimately ceases its development as well (Lafon-Placette and Köhler, 2014). Hirner et al. (1998) showed that the amino acid supply to the embryo depends entirely on the endosperm. They suggested that amino acids are first transported to the endosperm, which acts as a transient storage tissue, and are transferred to the embryo via maternal tissue at later stages of development. The early expression of a specific gene in the region surrounding the embryo suggested the existence of interactions between the embryo and the endosperm (Opsahl-Ferstad et al., 1997). This novel endosperm-specific gene is expressed in a specific endosperm region around the embryo and was suggested to have a role in the interaction between the two tissues. It is thus assumed that the endosperm has a critical role in the development of the neighboring embryo, although endosperm development itself may be independent of interactions with the embryo. This hypothesis was supported by in vitro fertilization of the central cell (Kranz et al., 1998): the in vitro fertilized endosperm followed the same developmental pattern as endosperm formed in vivo. However, because the culture medium contained various nutrients and hormones, a role for the embryo cannot be excluded.

In cereals, the endosperm may affect embryo development in ways other than nourishment. Various mutations affecting endosperm development were found to restrict the development and size of the embryo, with both positive and negative effects on it: some mutants have a larger embryo owing to a reduced endosperm, while in others an enlarged endosperm occupies the space normally taken by the embryo (Hong et al., 1996).

1.6 Triacylglycerol metabolism and accumulation in seeds

In many plant species, TAG stored in seeds represent an important form of reduced carbon that can be used to produce energy. TAG are esters of glycerol with FA. They accumulate in seeds during development and are stored until germination, after which they are catabolized to fuel the growth of the seedling until it becomes autotrophic (Voelker and Kinney, 2001; Graham, 2008). TAG biosynthesis during seed maturation occurs in the endoplasmic reticulum (ER), whereas FA biosynthesis in plants takes place exclusively in plastids (Figure 6). Saturated FA are synthesized de novo by the stepwise addition of two-carbon units from malonyl-acyl carrier protein (ACP) to the growing acyl chain, with stearoyl-ACP (18:0-ACP) as the predominant final product. Each elongation cycle, which adds a two-carbon moiety to the growing acyl chain, requires four reactions catalyzed by four separate enzymatic activities of the fatty acid synthase complex (Voelker and Kinney, 2001). The final FA composition of a plant cell is determined mainly by the activities of several enzymes that act on acyl-ACPs at the chain-termination step of FA synthesis, so the relative activities of these enzymes regulate the final products (Ohlrogge and Jaworski, 1997). In the usual pathway, most 18:0-ACP is desaturated at the C9 position to form oleoyl-ACP (18:1-ACP), a reaction catalyzed by a Δ9 desaturase (Voelker and Kinney, 2001). FA synthesis is terminated by the release of the acyl chain from ACP, performed by enzymes called acyl-ACP thioesterases. There are two main types of these enzymes: one relatively specific for the hydrolysis of saturated acyl-ACPs and the other for unsaturated ones (Ohlrogge and Jaworski, 1997). Once released, FA leave the plastids and reach the cytoplasm, where they are esterified to coenzyme A (CoA) and become substrates for TAG biosynthesis in the ER (Voelker and Kinney, 2001). TAG are synthesized by the sequential transfer of acyl groups from acyl thioesters to the glycerol backbone, catalyzed by acyltransferases. The process is initiated by glycerol-3-phosphate acyltransferase (GPAT), which transfers the first acyl group to the sn-1 position of glycerol-3-phosphate, forming lysophosphatidic acid.
The second step is catalyzed by another acyltransferase, 1-acylglycerol-3-phosphate acyltransferase, also called lysophosphatidic acid acyltransferase (LPAAT), which acylates the sn-2 position to form 1,2-diacylglycerol-3-phosphate (phosphatidic acid, PA). PA is the central intermediate in the biosynthesis of various types of glycerolipids. For TAG synthesis, PA is dephosphorylated to DAG by phosphatidic acid phosphatase (PAP), and in the final step, catalyzed by diacylglycerol acyltransferase (DGAT), a third acyl group is transferred to DAG, forming TAG (Frentzen, 1998; Voelker and Kinney, 2001). The nature of the three acyl groups incorporated into the glycerol backbone determines the specific properties of the TAG. FA structures in TAG are quite diverse among plant species and, unlike those of membrane lipids, are not necessarily restricted to saturated and unsaturated acyl groups of 16 and 18 carbons (Frentzen, 1998).
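The two stages described above lend themselves to simple bookkeeping: chain elongation in the plastid adds two carbons per FAS cycle, and the glycerol-3-phosphate pathway in the ER fills the three sn positions of the glycerol backbone in turn. The Python sketch below assumes an acetyl (C2) starter unit, so reaching 16:0 or 18:0 takes 7 or 8 cycles of four reactions each, and it includes the dephosphorylation of phosphatidic acid by PAP (listed in Figure 6) that yields DAG. It deliberately ignores desaturation, acyl editing through PC and the PDAT route; the function names and the acyl groups in the example are hypothetical.

```python
from typing import List, Optional, Tuple

REACTIONS_PER_CYCLE = 4  # condensation, reduction, dehydration, reduction

Positions = Tuple[Optional[str], ...]  # (sn-1, sn-2, sn-3)


def elongation_cycles(chain_length: int, starter_carbons: int = 2) -> int:
    """FAS cycles needed for a saturated acyl chain of `chain_length` carbons,
    assuming an acetyl (C2) starter and +2 carbons from malonyl-ACP per cycle."""
    extra = chain_length - starter_carbons
    if extra < 0 or extra % 2:
        raise ValueError("chain length must be even and not shorter than the starter")
    return extra // 2


def kennedy_pathway(acyl_sn1: str, acyl_sn2: str, acyl_sn3: str) -> List[Tuple[str, str, Positions]]:
    """Trace glycerol-3-phosphate -> TAG as (enzyme, product, (sn-1, sn-2, sn-3))."""
    sn: List[Optional[str]] = [None, None, "phosphate"]  # glycerol-3-phosphate
    trace: List[Tuple[str, str, Positions]] = []
    sn[0] = acyl_sn1  # GPAT acylates sn-1
    trace.append(("GPAT", "lysophosphatidic acid", tuple(sn)))
    sn[1] = acyl_sn2  # LPAAT acylates sn-2
    trace.append(("LPAAT", "phosphatidic acid", tuple(sn)))
    sn[2] = None  # PAP removes the sn-3 phosphate
    trace.append(("PAP", "diacylglycerol", tuple(sn)))
    sn[2] = acyl_sn3  # DGAT acylates sn-3
    trace.append(("DGAT", "triacylglycerol", tuple(sn)))
    return trace


for length in (16, 18):
    n = elongation_cycles(length)
    print(f"{length}:0-ACP: {n} cycles, {n * REACTIONS_PER_CYCLE} reactions")
for enzyme, product, positions in kennedy_pathway("16:0", "18:2", "18:1"):
    print(f"{enzyme:>5} -> {product}: {positions}")
```

Tracking the sn positions explicitly makes it visible that the final acyl composition of a TAG is fixed one position at a time, which is why the specificities of GPAT, LPAAT and DGAT matter for oil quality.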

Figure 6: Fatty acid (FA) and triacylglycerol (TAG) biosynthesis in oilseeds (Weselake et al., 2009). Enzymes involved are represented in boxes. ACCase, acetyl-CoA carboxylase; CoA, coenzyme A; DAG, sn-1,2-diacylglycerol; DGAT, diacylglycerol acyltransferase; DGTA, diacylglycerol transacylase; DHAP, dihydroxyacetone phosphate; ER, endoplasmic reticulum; FA-ACP, fatty acyl-ACP; FA-CoA, fatty acyl-coenzyme A; Glu6PDH, sn-glucose-6-phosphate dehydrogenase; G3P, sn-glycerol-3-phosphate; GPAT, sn-glycerol-3-phosphate acyltransferase; LPA, lysophosphatidic acid; LPAAT, lysophosphatidic acid acyltransferase; LPC, lysophosphatidylcholine; LPCAT, lysophosphatidylcholine acyltransferase; MAG, monoacylglycerol; PA, phosphatidic acid; PAP, phosphatidic acid phosphatase; PC, phosphatidylcholine; PDAT, phospholipid:diacylglycerol acyltransferase; PLA2, phospholipase A2; PUFA, polyunsaturated fatty acid; 16:0, palmitic acid; 18:1, oleic acid.

In seeds, oil accumulates in small, spherical subcellular structures called oil bodies (OBs). OBs consist of a TAG matrix surrounded by a layer of phospholipids (PLs) and special structural proteins, the most abundant of which are oleosins. By providing a large surface area per unit of TAG, the small size of OBs facilitates lipase binding and lipolysis during seed germination. OBs are quite stable inside the cell and do not aggregate or coalesce (Hsieh and Huang, 2004).

The FA composition of vegetable oils varies with the species. For example, R. communis oil contains 90% ricinoleate (12-hydroxy-oleate) (Chen et al., 2007). In J. curcas, whose seed oil content reaches 60%, oleic (18:1) and linoleic (18:2) acids are present at the highest levels, followed by palmitic (16:0) and stearic (18:0) acids (Yang et al., 2009). Among the 200 different types of FA, the most abundant in the main commercial oilseeds (e.g., soybean, palm, canola and sunflower) are linoleic, palmitic, lauric and oleic acids (Thelen and Ohlrogge, 2002).
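The C:D shorthand used above (e.g., 18:1 for oleic acid) records only the number of carbons (C) and double bonds (D), which is enough to sort FA into the saturated, monounsaturated and polyunsaturated classes that matter for biodiesel quality. The Python sketch below parses that notation and sums a profile by class; it deliberately ignores double-bond positions and functional groups, so it could not distinguish ricinoleate from oleate. The function names are hypothetical, and the example profile is made up for illustration; it is not measured J. curcas data.

```python
import re
from typing import Dict, Tuple

_SHORTHAND = re.compile(r"(\d+):(\d+)")


def parse_fa(notation: str) -> Tuple[int, int]:
    """Parse 'C:D' shorthand into (carbon count, double-bond count)."""
    match = _SHORTHAND.fullmatch(notation.strip())
    if match is None:
        raise ValueError(f"not C:D shorthand: {notation!r}")
    return int(match.group(1)), int(match.group(2))


def saturation_class(notation: str) -> str:
    """Classify a FA by its number of double bonds."""
    _, double_bonds = parse_fa(notation)
    if double_bonds == 0:
        return "saturated"
    return "monounsaturated" if double_bonds == 1 else "polyunsaturated"


def class_totals(profile: Dict[str, float]) -> Dict[str, float]:
    """Sum a FA profile (percent of total FA) by saturation class."""
    totals = {"saturated": 0.0, "monounsaturated": 0.0, "polyunsaturated": 0.0}
    for notation, percent in profile.items():
        totals[saturation_class(notation)] += percent
    return totals


# Made-up percentages for illustration only (not measured data).
example_profile = {"16:0": 15.0, "18:0": 7.0, "18:1": 40.0, "18:2": 38.0}
for fa in example_profile:
    print(fa, saturation_class(fa))
print(class_totals(example_profile))
```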

In oilseeds, central metabolism is not devoted entirely to oil production; rather, the metabolic network links nutrients (e.g., sugars and amino acids) to the precursors (e.g., acetyl-CoA and glycerol-3-phosphate) of FA and TAG synthesis (Baud and Lepiniec, 2010). The main carbon source for plastidial FA synthesis is not acetate; instead, carbon is imported from the cytoplasm in the form of glucose-6-phosphate, phosphoenolpyruvate and malate (Rawsthorne, 2002). Oilseed plastids also need energy-rich molecules, namely ATP, reduced nicotinamide adenine dinucleotide (NADH) and reduced nicotinamide adenine dinucleotide phosphate (NADPH), for FA synthesis; these molecules are imported through specific transporters or synthesized inside the plastids (Baud and Lepiniec, 2010). A global analysis of gene expression profiles in developing J. curcas seeds was recently reported (Jiang et al., 2012). According to this study, the expression of genes related to the import of various sugars and of ATP into plastids remained high throughout seed development. The study suggested that, in addition to the enzymes directly involved in FA and TAG biosynthesis, the high expression of central metabolic pathways further enhanced the oil accumulation capacity of the developing seeds.


1.7 Plant proteomics

The term proteome was introduced to describe all the proteins encoded by a genome (Wilkins et al., 1996a). However, this definition does not convey that, unlike the genome, the proteome is not a static entity and can change with both intracellular and environmental conditions. The proteome is therefore the protein content of a particular sample at a specific time and under specific conditions, including all protein isoforms and modifications (Wilkins et al., 1996b; De Hoog and Mann, 2004). Proteomics emerged from advances in genomics, transcriptomics and computational biology (Graves and Haystead, 2002; Zhu et al., 2003; De Hoog and Mann, 2004). One aim of proteomics is to identify all expressed proteins to help genome annotation, since it is difficult to predict all genes from genomic data. mRNA-based expression studies using various techniques are increasingly popular, but mRNA levels do not reflect the exact concentration of a protein in a particular sample. Other contributions of proteomics include the assignment of protein functions and the characterization of protein modifications, protein localization and compartmentalization, and protein-protein interactions (see Graves and Haystead, 2002, for a review). In plant science, large-scale proteomic studies have been possible only for some model species whose genomes have been sequenced. Proteomic analysis remains quite challenging for many economically important species whose genomes have not been sequenced, because mass spectrometry (MS)-based proteomic analysis depends on the availability of sequence databases. With rapid advances in molecular biology techniques, which have made possible the production of large-scale genomic and transcriptomic data, plant proteomic studies have gained momentum in the last few years (Champagne and Boutry, 2013).
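The dependence of MS-based identification on sequence databases can be illustrated with a step that database search strategies commonly perform: digesting each database protein in silico and computing the masses of the resulting peptides, which are then compared with measured masses. The Python sketch below assumes the simplified trypsin rule (cleavage after K or R unless the next residue is P), no missed cleavages and no modifications, and uses standard monoisotopic residue masses; the input sequence is hypothetical. Without a sequence for the organism there is nothing to digest, which is one reason why species with unsequenced genomes are difficult to analyze.

```python
from typing import List

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for its free termini


def tryptic_digest(sequence: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides: List[str] = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without K/R
        peptides.append(sequence[start:])
    return peptides


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


protein = "MASKPLLGTRAEYKVDR"  # hypothetical sequence
for peptide in tryptic_digest(protein):
    print(f"{peptide:<12} {monoisotopic_mass(peptide):.4f} Da")
```

In this example the K followed by P is not cleaved, so the first peptide is MASKPLLGTR rather than two shorter fragments.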

Recent improvements in the extraction, separation, quantification and identification of plant proteins have made it feasible to perform high-throughput proteomic analyses of plants to unravel the molecular mechanisms underlying plant growth, development and interaction with the environment (Chen and Harmon, 2006).

A particular concern in agriculture is plant performance under biotic and abiotic stress conditions, which can severely affect plant development, growth and productivity and can cause huge economic losses. In this regard, proteomics, and especially quantitative proteomics, is emerging as a powerful approach that allows the rapid identification and quantification of stress- and tolerance-related proteins (Agrawal et al., 2012; Salekdeh and Komatsu, 2007). Direct insight into the function, expression pattern and post-translational modifications (PTMs) of these stress-related proteins can provide information that could be used to engineer stress-tolerant plants using molecular biology strategies. Hashiguchi et al. (2010) presented a comprehensive overview of the application of proteomics to different kinds of stress-related studies.

In foods, proteins are important from both the nutritional and the technological points of view and are closely related to food quality, safety and nutrition (Agrawal et al., 2012). Proteomics can be a powerful tool in the food industry for quality, safety and nutritional assessment (Pedreschi et al., 2010), and food composition and quality can be analyzed through proteomics. Knowledge of the protein composition of various crops, obtained through proteomics, has been successfully used for industrial improvements (Agrawal et al., 2012). For example, a proteomic study evaluated the effect of heat treatment on the protein content of peach fruit. This study showed that most of the differentially expressed proteins were related to development and ripening, suggesting that this information could be used to improve peach fruit quality (Zhang et al., 2011).

Proteomics has also been applied in the cereal industry. For example, wheat flour proteins were analyzed using 2-DE to identify cultivar-specific proteins. Flour quality was found to correlate with protein composition, and the protein markers identified for different wheat cultivars could be used efficiently to select suitable cultivars for flour production (Yahata et al., 2005). Another study revealed that 2-DE-based proteome analysis, together with amylose content, of different wheat cultivars could be a useful tool for selecting appropriate cultivars for pasta making (De Angelis et al., 2008). Proteome maps were constructed for eleven beer samples differing in barley cultivar and level of malt modification (Iimure et al., 2010). The authors found that constructing a beer proteome map could be a useful tool for detecting and manipulating proteins related to beer quality.

Food allergy is another field where proteomics has been applied at different levels. Combining 2-DE with immunoblotting using sera from allergic patients has been the common approach to characterize the allergenicity of certain food proteins (Akagawa et al., 2007; Salekdeh and Komatsu, 2007). Besides 2-DE-based approaches, shotgun proteomics has also been applied to detect and characterize several food allergens (Chassaigne et al., 2007; Heick et al., 2011).

Moreover, many other applications of proteomics to crops have been extensively reported in the literature, such as food authentication through specific protein markers, assessment of nutritional value and subcellular proteomics (Salekdeh and Komatsu, 2007; Jorrin-Novo et al., 2009; Pedreschi et al., 2010; Agrawal et al., 2012).

In many plants, seeds are an important source of food, feed and biotechnologically important products; consequently, one of the main goals of agricultural research is to improve seed quality and traits to meet human needs (Moïse et al., 2005). Seed proteomics usually refers to the proteomic analysis of the storage tissues, either endosperm or cotyledons, as these make up the main part of the mature seed (Miernyk and Hajduch, 2011).

Proteomics has been used to understand the expression pattern and regulatory mechanisms of the enzymes related to reserve deposition in the seeds of different plants. Considerable progress has been made in characterizing the proteomes of developing and mature seeds of oilseed plants, which mainly accumulate oil and proteins in their seeds (Hajduch et al., 2011).

1.8 Mass spectrometry

The most important breakthrough in the field of proteomics was the identification of gel-separated proteins by mass spectrometry (MS). This not only extended protein analysis far beyond the mere display of proteins but also replaced Edman degradation, even in the common practice of protein chemistry. Mass spectrometry is a very sensitive technique that can deal with protein mixtures and offers much higher throughput (Pandey and Mann, 2000).

A mass spectrometer consists of an ion source, a mass analyzer and an ion detector. Molecules are ionized in the ion source, separated according to their mass-to-charge ratios in the mass analyzer, and the separated ions are detected by the detector (Nyman, 2001; Aebersold and Mann, 2003). The use of MS in protein analysis increased greatly after the invention of two ionization techniques, MALDI (matrix-assisted laser desorption/ionization) (Karas and Hillenkamp, 1988) and ESI (electrospray ionization) (Fenn et al., 1989), also called soft ionization techniques. In MALDI-MS, the matrix is a key component: an organic compound containing an aromatic conjugated ring that absorbs ultraviolet light of particular wavelengths (Muddiman et al., 1997). In this technique, the sample is co-crystallized with a suitable matrix on a sample plate, and a laser beam of appropriate wavelength is fired at this mixture, resulting in desorption of the sample-matrix mixture. The matrix absorbs energy upon irradiation, transfers it to the sample and eventually ionizes it (Cotte-Rodriguez et al., 2011). In ESI, the sample is dissolved in a solvent that is pumped through a needle held at high voltage, producing charged droplets. These droplets rapidly evaporate and transfer their charge to the analyte (Mann et al., 2001; Cotte-Rodriguez et al., 2011).
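Because ESI usually yields multiply protonated peptide ions, the same peptide can appear at several m/z values. The short Python sketch below, assuming positive-mode [M + zH]z+ ions and a hypothetical neutral peptide mass, illustrates how m/z relates to neutral mass and charge; the function names are illustrative only.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton (Da)


def mz_of_protonated_ion(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion formed by adding z protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS_DA) / charge


def neutral_mass_from_mz(mz: float, charge: int) -> float:
    """Invert the relation above to recover the neutral mass from an observed ion."""
    return mz * charge - charge * PROTON_MASS_DA


if __name__ == "__main__":
    peptide_mass = 1500.0  # hypothetical neutral monoisotopic mass (Da)
    for z in (1, 2, 3):
        print(f"z={z}: m/z = {mz_of_protonated_ion(peptide_mass, z):.4f}")
```

Raising the charge from 1 to 2 to 3 moves this hypothetical peptide from about m/z 1501 to 751 to 501, which is why the charge state must be known before a neutral mass can be inferred.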

The mass analyzer is an essential part of every mass spectrometer because it separates ions based on their mass-to-charge (m/z) ratios. The important parameters of a mass analyzer are sensitivity, resolution and mass accuracy (Aebersold and Mann, 2003; Yates et al., 2009). Mass analyzers can be broadly categorized into two basic types: scanning and ion-beam analyzers, such as time-of-flight (TOF) and quadrupole (Q) analyzers; and trapping analyzers, such as the ion trap (IT), Orbitrap and Fourier transform ion cyclotron resonance (FT-ICR) analyzers (Yates et al., 2009). In MS, mass analyzers are used either alone or in combination with other analyzers in order to increase versatility and take advantage of the strengths of each (Aebersold and Mann, 2003; Yates et al., 2009).
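Resolution and mass accuracy can be made concrete with two standard definitions: resolution as m/z divided by the peak width (here taken as the full width at half maximum, FWHM) and mass accuracy as the relative error in parts per million (ppm). The Python sketch below uses hypothetical values only to illustrate the arithmetic.

```python
def resolution(mz: float, fwhm: float) -> float:
    """Resolution R = (m/z) / delta(m/z), with delta(m/z) taken as the peak FWHM."""
    return mz / fwhm


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


if __name__ == "__main__":
    # Hypothetical peak at m/z 400 that is 0.004 wide at half height
    print(f"R = {resolution(400.0, 0.004):,.0f}")
    # Hypothetical measurement 0.0015 above a theoretical m/z of 751.0073
    print(f"error = {ppm_error(751.0088, 751.0073):.2f} ppm")
```

Because ppm is a relative measure, the same absolute error in m/z corresponds to a larger ppm error for lighter ions than for heavier ones.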

1.8.1 ESI LTQ-Orbitrap

ESI LTQ-Orbitrap is a hybrid MS instrument that uses ESI as the ion source and two different ion trap analyzers, the linear trap quadrupole (LTQ) and the Orbitrap. The LTQ, also called a two-dimensional (2D) ion trap, consists of hyperbolic quadrupole rods, each of which is divided into three successive axial sections. Three kinds of voltage are necessary for the LTQ to operate as a mass spectrometer: three discrete direct current (DC) voltages applied to the three axial sections to produce an axial trapping field; a radio frequency (RF) voltage applied to the two pairs of rods to produce a radial trapping field; and an alternating current (AC) voltage applied to the X rods for isolation, activation and ejection of ions (Schwartz et al., 2002). Compared with earlier three-dimensional (3D) quadrupole ion traps, the LTQ has increased sensitivity, resolution and mass accuracy (Schwartz et al., 2002; Aebersold and Mann, 2003; Yates et al., 2009).

A new type of mass analyzer, the Orbitrap, was introduced by Alexander Makarov in 2000 (Makarov, 2000). It consists of two electrodes: an outer barrel-like electrode and a central spindle-like electrode. Ions are injected into the electric field between the two electrodes, where they are trapped by the electrostatic field and follow orbits around the central electrode, beneath the surface of the outer electrode. The electrostatic attraction between the ions and the central electrode is balanced by the centrifugal force arising from the initial tangential velocity with which the ions were injected. Inside the Orbitrap, ions also oscillate along the axial (z) direction. These axial oscillations are detected as an image current, which is transformed into a mass spectrum by Fourier transformation (Makarov, 2000; Hu et al., 2005; Scigelova and Makarov, 2006). Characteristics of the Orbitrap include high mass resolution (up to 150 000), high mass accuracy (2-5 parts per million), an m/z range of at least 6000 and a dynamic range greater than 10³ (Hu et al., 2005; Scigelova and Makarov, 2006; Yates et al., 2009). The LTQ-Orbitrap consists of three major parts. The first is a linear ion trap capable of acquiring MS and MSⁿ spectra. The second is an RF-only quadrupole called the C-trap, into which ions are transferred from the LTQ and in which they are accumulated and stored. The third and last part is the Orbitrap itself. This hybrid LTQ-Orbitrap instrument thus contains two mass analyzers, both capable of detecting ions and recording mass spectra (Scigelova and Makarov, 2006). Coupling the Orbitrap with the LTQ combines the high resolution and mass accuracy of the Orbitrap with the high speed and sensitivity of the LTQ (Yates et al., 2009).
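The axial oscillation described above follows the relation ω = sqrt(k / (m/z)), where k is a constant set by the field geometry and voltage (Makarov, 2000), so heavier ions (higher m/z) oscillate more slowly. Because k cancels when an ion of known m/z is used as a reference, a measured frequency can be converted to m/z as in the Python sketch below. This is a simplified single-point illustration (actual instruments apply more elaborate calibration), and the value of k and the ion values are hypothetical.

```python
import math


def axial_frequency(mz: float, k: float) -> float:
    """Angular frequency of axial oscillation: omega = sqrt(k / (m/z))."""
    return math.sqrt(k / mz)


def mz_from_frequency(omega: float, omega_ref: float, mz_ref: float) -> float:
    """Convert a measured frequency to m/z using one reference ion.

    From omega**2 = k / (m/z), the unknown constant k cancels:
    (m/z) = (m/z)_ref * (omega_ref / omega) ** 2
    """
    return mz_ref * (omega_ref / omega) ** 2


if __name__ == "__main__":
    k = 1.0e13      # hypothetical field constant (arbitrary units)
    ref_mz = 500.0  # hypothetical calibrant ion
    omega_ref = axial_frequency(ref_mz, k)
    omega_x = axial_frequency(1000.0, k)  # "measured" frequency of an unknown ion
    print(f"{mz_from_frequency(omega_x, omega_ref, ref_mz):.4f}")
```

Because m/z scales with 1/ω², doubling m/z lowers the axial frequency by a factor of sqrt(2), not 2.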

1.9 Post-genomic era and Jatropha curcas

With techniques now available for producing large amounts of genomic data in a comparatively short time, the focus has shifted to using these data for transcriptomic and proteomic studies. J. curcas has also been the subject of different studies aimed at understanding the pattern of TAG and toxic compound accumulation during plant development. Here we describe the progress made in transcriptomic and proteomic studies of J. curcas, before and after the sequencing of its genome.

The J. curcas genome was sequenced in 2010 (Sato et al., 2010) using the conventional Sanger method and a next-generation multiplex sequencing method, covering ~95% of the gene-coding regions, with 40929 putative protein-encoding genes, including 9870 complete genes. The authors reported 73 genes involved in TAG metabolism and 3 genes encoding curcins. They also reported different genes related to the biosynthesis of PEs, the main toxic constituents of this plant. Later, the genome assembly was extended (Hirakawa et al., 2012) with deoxyribonucleic acid (DNA) sequences generated on the Illumina sequencing platform and the transcriptome data available in public databases at that time, which considerably increased the number of genes with complete structures.

The first large-scale transcriptomic analysis of J. curcas was performed by Costa et al. (2010), who prepared two cDNA libraries, from developing and germinating seeds, respectively. The authors found that transcripts related to FA biosynthesis were concentrated in the developing-seed library, while those related to FA degradation dominated the germinating-seed library. The 20 most highly expressed transcripts in the developing-seed library included transcripts for reserve proteins of the 11S globulin family and an aspartyl peptidase known to be involved in processing reserve proteins. Similarly, the most highly expressed transcripts in the germinating-seed library included a cysteine peptidase known for its role in reserve protein mobilization and acetyl-CoA C-acyltransferase, related to oil and carbohydrate breakdown. The authors also reported different proteins related to terpenoid biosynthesis but were unable to detect casbene synthase, the main enzyme of PE biosynthesis. Within two months of the appearance of this transcriptomic study, two more reports of cDNA libraries from developing J. curcas seeds were published. The first, by Natarajan et al. (2010), described a cDNA library prepared from developing seeds at 8 developmental stages and showed that the library was rich in genes related to stress response, disease resistance and plant development. They also identified an array of ESTs corresponding to genes related to FA metabolism, PL biosynthesis, carbohydrate metabolism and many other important genes involved in diverse metabolic activities. Similarly, Gomes et al. (2010) constructed a cDNA library from developing J. curcas seeds at three developmental stages and identified expressed sequence tags (ESTs) for genes of the FA, terpene, quinine and hormone biosynthetic pathways. They also compared the expression profiles of four genes, i.e. palmitoyl-ACP thioesterase, 3-ketoacyl-CoA thiolase B, lysophosphatidic acid acyltransferase (LPAAT) and geranyl diphosphate (GPP) synthase, between leaves and seeds using real-time PCR (polymerase chain reaction), and found that the expression of these genes was higher in seeds than in leaves.

To the best of our knowledge, three transcriptomic studies involving the construction of cDNA libraries were published for J. curcas in 2011. The first was published by King et al., who sequenced the transcriptome of developing J. curcas seeds. Among the most abundant transcripts were those of seed storage proteins (SSPs), oleosins, ribosomal proteins, metallothioneins and late embryogenesis abundant (LEA) proteins. A complete set of enzymes involved in the conversion of sucrose to TAG was also represented in their cDNA library. Along with the curcins, they also identified proteins related to diterpenoid biosynthesis, with the exception of casbene synthase (CS). For a better understanding of lipid metabolism in J. curcas, Natarajan et al. (2011) prepared a normalized cDNA library by pooling RNA extracted from five different parts of the plant, i.e. roots, mature leaves, flowers, developing seeds and mature embryos. They identified proteins of all the pathways related to lipid metabolism and discussed them in detail. Chen et al. (2011) analyzed, for the first time, the transcriptome of the J. curcas embryo at three developmental stages and constructed three cDNA libraries, from which they identified 2295, 1646 and 1512 unigenes, respectively. They concluded that the proteins of lipid metabolism are involved not only in oil accumulation but also in embryogenesis, morphogenesis, defense responses and adaptive mechanisms in plants.

Eswaran et al. (2012) constructed a cDNA library from salt-stressed roots of J. curcas and compared it with that of untreated plants to identify proteins involved in abiotic stress responses. They reported 1240 ESTs generated from the salt-stressed root cDNA library, which represent a diverse repository of stress-related genes for this plant. This stress-responsive transcriptome should help in understanding the adaptability of J. curcas to harsh environments.

In addition to the construction of these cDNA libraries, various transcriptomic studies analyzed the expression profiles either of selected genes involved in lipid metabolism (Gu et al., 2011; Xu et al., 2011; Gu et al., 2012) or, globally, of genes expressed during seed development (Jiang et al., 2012), providing important information on lipid metabolism during J. curcas seed development.

With the genome sequence and a wealth of transcriptomic data available, J. curcas became a suitable candidate for large-scale proteomic analysis to study, at the protein level, the genes responsible for lipid metabolism, reserve deposition and the metabolism of toxic compounds, knowledge that could be used to produce J. curcas varieties best suited for industrial applications. However, J. curcas has received little attention in this regard, and few proteomic studies have been reported so far. The first proteomic study was published by Liang et al. in 2007, who studied proteomic changes in seedlings using MS and identified 8 photosynthesis-related proteins that changed significantly during cold stress. Yang et al. (2009) analyzed the endosperm proteome of germinating J. curcas seeds using a 2-DE approach. A total of 138 protein spots were found to be differentially expressed; these were excised from the gels and analyzed by LC-MS/MS. This study resulted in the identification of 50 proteins, divided into five functional classes: signal-related proteins, oil mobilization-related proteins, ATP synthases, oxidative stress-related proteins and others. The authors also showed that pathways such as oxidation, the glyoxylate cycle, glycolysis, the citric acid cycle, gluconeogenesis and the pentose phosphate pathway were involved in oil mobilization during seed germination. A comparative proteome analysis of the embryo and endosperm of J. curcas seeds was performed by Liu et al. (2009) using a 2-DE approach, in which 380 and 533 major protein spots were observed for the embryo and endosperm, respectively. Fourteen protein spots differentially expressed between the two tissues were identified by LC-MS/MS, and the authors concluded that the endosperm proteins were catabolism-related while the embryo proteins were anabolism-related. Oil body proteome analysis of J. curcas identified 10 different proteins, with three oleosins as the major constituents (Popluechai et al., 2011). These oleosins were characterized at the gene, transcript and protein levels. The authors discovered two alleles of one of the oleosins, one with and the other without introns. Single nucleotide polymorphisms (SNPs) were identified in the introns among different Jatropha accessions and were suggested as markers for phylogenetic and breeding studies. In another study, the differential proteome of the endosperm and embryo of mature J. curcas seeds was analyzed using a 2-DE approach (Liu et al., 2011). Sixty-six protein spots were analyzed by LC-MS/MS, resulting in the identification of 28 proteins classified into 9 functional categories. The authors showed that proteins required for seed germination, such as those related to oil mobilization, signal transduction, transcription, protein synthesis and the cell cycle, were already present in the endosperm and embryo of dry mature seeds. More recently, the proteome of the whole seed at six developmental stages was analyzed using 2-DE (Liu et al., 2013). Analysis of the differentially expressed proteins by MALDI-TOF/TOF resulted in the identification of 104 proteins classified into 10 functional categories. The authors demonstrated that proteins related to energy and metabolism were involved in the carbon flux toward lipid accumulation in the seeds. Booranasrisak et al. (2013) investigated the seed kernel proteome at eight developmental stages using a shotgun proteomic approach, which resulted in the identification of 22 proteins related to FA metabolism.

Taken together, these proteomic studies produced limited information, and the number of proteins identified in each individual study is very small. Using the J. curcas genome sequence, we recently carried out the first in-depth proteome analysis of plastids isolated from the endosperm of developing seeds (Pinheiro et al., 2013), in which we identified 923 proteins related to diverse metabolic activities, especially FA and secondary metabolism. We also investigated the proteome of the inner integument and of plastids isolated from the inner integument (data not shown), which considerably expanded the protein repertoire of J. curcas and provided information on some important aspects of this maternal tissue. These three proteomic studies are milestones towards understanding the biological processes of J. curcas seeds; however, some questions, such as the pattern of lipid and protein deposition during seed development and the importance of carbohydrate metabolism for lipid accumulation, cannot be answered by these analyses. To address these questions and to understand the biological processes of the main storage tissue of J. curcas seeds, the endosperm, we carried out a proteomic analysis of endosperm isolated from J. curcas seeds at five developmental stages. This study is the first proteomic analysis of the endosperm of this plant and resulted in the largest number of proteins identified from this tissue of J. curcas seeds. These results not only provide information on various biological events occurring in this tissue during seed development but also provide proteomic data that could support gene manipulation studies aimed at producing J. curcas varieties best suited to industrial demand.


2 OBJECTIVES

2.1 General objective

To analyze the proteome of the endosperm of developing J. curcas L. seeds using a shotgun proteomics approach.

2.2 Specific objectives

• Classification and histological analysis of J. curcas seeds at different developmental stages.
• Identification of the proteins extracted from developing J. curcas seeds using LC-MS/MS.
• Functional classification of the identified proteins.
• Classification of the proteins involved in the synthesis and degradation of FAs.
• Classification of the proteins involved in carbohydrate metabolism and their role in lipid deposition.
• Classification and deposition pattern of J. curcas SSPs.
• Classification of the different types of peptidases involved in reserve mobilization and PCD of the developing endosperm.
• Classification of the proteins involved in the synthesis of toxic constituents.
• Interpretation of the results and selection of possible targets for the biotechnological improvement of J. curcas.


3 MATERIALS AND METHODS

3.1 Plant material

The seed collection strategy for the proteomic analysis used in this thesis is summarized as a workflow in Figure 7. Seeds of J. curcas were collected at the experimental farm of the Federal University of Ceará (UFC), located in Pentecoste, a municipality of the state of Ceará, Brazil. Based on their morphological characteristics, seeds were divided into nine developmental stages, and histological analysis was performed to characterize the seeds at each stage. The mature stage was called Stage 10. At Stages 1 and 2, the seeds contained nuclear endosperm, nucellus and integuments. At Stages 3 and 4, cellular endosperm was initiating at the micropylar side. Seeds at Stage 5 contained integuments and a thin layer of cellular endosperm around the central cavity. From Stage 6 to Stage 9, seeds contained integuments and a thicker endosperm but differed in their morphological characteristics. Seeds from Stage 1 to Stage 5 had light yellow integuments. Seeds at Stage 6 had an orange inner integument, while the outer integument was still light yellow. Seeds at Stage 7 had an inner integument with 70 to 90% of its area burgundy in color, while the outer integument was still light yellow. Seeds at Stage 8 had a black inner integument and a light yellow outer integument. Seeds at Stage 9 had a black inner integument and an outer integument with brownish ends. The morphological characteristics, along with the histological results, are presented in the Results and Discussion.

3.2 Histological analysis and selection of developmental stages for the endosperm isolation

Histological analysis of the seeds was performed following a previously described method (Baba et al., 2008; Rocha et al., 2013). Seeds were collected and fixed in Karnovsky solution (Karnovsky, 1965) under vacuum for 24 h at room temperature, dehydrated in an ethanol series (10, 20, 30, 40, 50, 60, 70, 80, 90 and 100%, 1 h each step) and slowly embedded in historesin (Leica Microsystems Nussloch GmbH, Germany) for at least 20 days to guarantee complete penetration of the resin. Serial sections of 5 µm were prepared on a LEICA 2065 rotary microtome equipped with a steel knife. Sections were stained with toluidine blue (0.05% in 0.12% borax) followed by basic fuchsin (0.05%) (Junqueira, 1990) and mounted in Tissue Mount for light microscopy. Slides were examined under bright-field optics, and the results were recorded with a photographic camera (HP Photosmart R967) coupled to an Olympus CX40 microscope.

Figure 7: General workflow summarizing the process from seed collection to proteomic analysis of the extracted proteins.

3.3 Endosperm isolation and protein extraction

Seeds were harvested, and the endosperm was isolated from the seeds under a binocular microscope using a sharp spatula. To remove lipids, the endosperm was cut into small pieces and kept in acetone under mild shaking for at least 30 hours, changing the acetone every five to six hours. The endosperm material was then freeze-dried and ground to a fine powder, which was passed through a 100-mesh sieve and either used directly for protein extraction or kept at -20 °C. This process was repeated for three independent biological replicates. The endosperm powder was subjected to protein extraction according to Vasconcelos et al. (2005). For this, 100 µg of endosperm material was homogenized in pyridine buffer (50 mM pyridine, 10 mM thiourea and 1% SDS, pH 5.0) with polyvinylpolypyrrolidone (PVPP) in a proportion of 1:40:2 (w/v/w), respectively. This mixture was stirred for 2 h at 4 °C and centrifuged at 10000 g for 30 min. Proteins were precipitated from the supernatant using cold 10% trichloroacetic acid in acetone. Pellets were washed three times with cold acetone, dried under vacuum and dissolved in 7 M urea/2 M thiourea. Protein concentration was determined by the Bradford assay (Bradford, 1976), using bovine serum albumin (BSA) as the standard.
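The Bradford determination mentioned above relies on a BSA standard curve. The following minimal Python sketch, assuming a linear response within the calibrated range and using invented absorbance readings, fits the curve by ordinary least squares and interpolates an unknown sample; it illustrates the calculation and does not reproduce the data or software used in this work.

```python
def fit_standard_curve(concentrations, absorbances):
    """Ordinary least-squares fit of absorbance = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired standards")
    mean_x = sum(concentrations) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, absorbances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sample_concentration(absorbance, slope, intercept, dilution_factor=1.0):
    """Interpolate an unknown from the curve and correct for any dilution."""
    return (absorbance - intercept) / slope * dilution_factor


if __name__ == "__main__":
    bsa_ug_per_ml = [0.0, 5.0, 10.0, 15.0, 20.0]  # hypothetical BSA standards
    a595 = [0.00, 0.10, 0.20, 0.30, 0.40]         # hypothetical readings at 595 nm
    slope, intercept = fit_standard_curve(bsa_ug_per_ml, a595)
    # Hypothetical unknown reading 0.25 after a 10-fold dilution
    print(f"{sample_concentration(0.25, slope, intercept, 10.0):.1f} µg/ml")
```

Readings that fall outside the range of the standards should be re-assayed at a different dilution rather than extrapolated, because the Bradford response is only approximately linear over a limited range.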

3.4 1D SDS-PAGE and sample preparation for LC-MS/MS

Polyacrylamide mini gels were prepared using the reagent composition shown in Table 1. For each stage and biological replicate, 40 µg of protein were subjected to 1-D gel electrophoresis using a Mini-PROTEAN® Tetra Cell vertical electrophoresis system (Bio-Rad) and an EPS 3501 XL power supply (GE Healthcare). For this, volumes containing 40 µg of protein from each sample were separately mixed with sample diluting buffer [Tris-HCl (0.125 M, pH 6.8), SDS (4%), glycerol (20%), bromophenol blue, DTT (0.02 M)] to a final volume of 20 µl. This solution was kept in boiling water for 5-10 minutes, brought to room temperature and loaded on the gels. Gels were placed in the gel tank containing running buffer (25 mM Tris, 192 mM glycine, 0.1% SDS). Finally, the Mini-PROTEAN Tetra Cell (Bio-Rad) assembly was connected to the EPS 3501 XL power supply (GE Healthcare), and appropriate voltages (75 V for the stacking gel and 135 V for the resolving gel) were applied to run the mini gels. Once the run was completed, gels were stained with Coomassie Brilliant Blue R-250.

Table 1: Composition of polyacrylamide mini gels for SDS-PAGE.

Reagent                                          Volume

Resolving gel (15%)
Acrylamide/Bisacrylamide (29.2%/0.8%)            2.5 ml
Tris-HCl 1.5 M (pH 8.8)                          1.25 ml
Sodium dodecyl sulfate (SDS) 10%                 100.0 µl
H2O (Milli-Q)                                    1.2 ml
Ammonium persulfate (AP) 10%                     100.0 µl
N,N,N′,N′-Tetramethylethylenediamine (TEMED)     5.0 µl

Stacking gel (5.0%)
Acrylamide/Bisacrylamide (29.2%/0.8%)            1.33 ml
Tris-HCl 0.5 M (pH 6.8)                          2.5 ml
SDS 10%                                          50.0 µl
H2O (Milli-Q)                                    6.0 µl
AP 10%                                           50 µl
TEMED                                            5.0 µl
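Two small calculations underlie this step: the final acrylamide concentration (%T) produced by a gel recipe and the volume needed to load a fixed amount of protein. The Python sketch below applies them to the resolving gel volumes listed in Table 1, assuming the 29.2%/0.8% stock counts as 30% total monomer, and to a hypothetical extract concentration of 4 µg/µl; the function names are illustrative.

```python
STOCK_PERCENT_T = 30.0  # 29.2% acrylamide + 0.8% bis-acrylamide

# Resolving gel volumes from Table 1, converted to ml
RESOLVING_GEL_ML = {
    "acrylamide_bis_stock": 2.5,
    "tris_hcl_1_5M_ph8_8": 1.25,
    "sds_10pct": 0.100,
    "water": 1.2,
    "aps_10pct": 0.100,
    "temed": 0.005,
}


def final_percent_t(recipe_ml, stock_key="acrylamide_bis_stock",
                    stock_percent=STOCK_PERCENT_T):
    """Final %T = stock volume * stock %T / total gel volume."""
    total_ml = sum(recipe_ml.values())
    return recipe_ml[stock_key] * stock_percent / total_ml


def loading_volumes(protein_ug, conc_ug_per_ul, final_volume_ul=20.0):
    """Volume of extract holding protein_ug, and buffer volume to reach final_volume_ul."""
    sample_ul = protein_ug / conc_ug_per_ul
    if sample_ul > final_volume_ul:
        raise ValueError("extract too dilute for the chosen final volume")
    return sample_ul, final_volume_ul - sample_ul


if __name__ == "__main__":
    print(f"Resolving gel: {final_percent_t(RESOLVING_GEL_ML):.1f} %T")
    sample_ul, buffer_ul = loading_volumes(40.0, 4.0)  # 4 µg/µl is hypothetical
    print(f"Load {sample_ul:.1f} µl extract + {buffer_ul:.1f} µl sample buffer")
```

With these volumes the resolving gel works out to roughly 14.5% T, close to its nominal 15%; an extract more dilute than 2 µg/µl could not deliver 40 µg within the 20 µl final volume.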

In-gel trypsin digestion and sample preparation for MS analysis were performed using a previously described method (Shevchenko et al., 2006) with slight modifications. The procedure was divided into the following steps:

• Gels were rinsed with Milli-Q water, and each lane corresponding to an endosperm protein sample was divided into six slices, which were stored in separate Eppendorf tubes.
• 500 µl of acetonitrile (ACN) were added to each tube containing gel pieces and left for 10 minutes at room temperature to completely dehydrate the gel pieces.
• To de-stain the gel pieces, 100 µl of 100 mM ammonium bicarbonate (NH₄HCO₃)/50% ACN (1:1 v/v) were added to each tube, with occasional vortexing for 15 minutes. The solution was then removed, and the process was repeated at least 3 times for complete de-staining.
• After removing the de-staining solution, 500 µl of ACN were added to each tube and left for 5 minutes.
• After 5 minutes, the ACN was removed, and enough 10 mM DTT solution to completely cover the gel pieces was added to each tube, which was then kept at 56 °C for 30 minutes.
• After 30 minutes, the DTT solution was removed from all tubes and replaced with 500 µl of ACN for 5 minutes.
• The ACN was removed from all tubes, and enough 55 mM iodoacetamide (IAA) solution to cover the gel pieces was added to each tube. Tubes were left for 30 minutes at room temperature in the dark.
• After 30 minutes, the IAA solution was discarded from all tubes and replaced with 500 µl of ACN to dehydrate the gel pieces.
• After 5 minutes, the ACN was discarded from all tubes, and 20 µl of trypsin solution containing 0.4 µg of trypsin (Promega) were added to each tube, which was incubated overnight (about 16 hours) at 37 °C.
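The trypsin amount in the last step can be related to the protein present in each slice. The rough Python estimate below rests on the simplifying (and admittedly unrealistic) assumption that the 40 µg loaded per lane are spread evenly over the six slices; in practice the protein content differs from slice to slice, so the real ratio varies across slices.

```python
def enzyme_substrate_ratio(lane_protein_ug, n_slices, trypsin_ug):
    """Return protein per slice (µg) and µg of protein per µg of trypsin.

    Assumes protein is evenly distributed over the slices, which is only a
    rough approximation for a real gel lane.
    """
    protein_per_slice = lane_protein_ug / n_slices
    return protein_per_slice, protein_per_slice / trypsin_ug


def enzyme_concentration_ng_per_ul(trypsin_ug, volume_ul):
    """Trypsin concentration of the digestion solution in ng/µl."""
    return trypsin_ug * 1000.0 / volume_ul


if __name__ == "__main__":
    per_slice, ratio = enzyme_substrate_ratio(40.0, 6, 0.4)
    print(f"~{per_slice:.1f} µg protein per slice, trypsin:protein ≈ 1:{ratio:.0f}")
    print(f"{enzyme_concentration_ng_per_ul(0.4, 20.0):.0f} ng/µl trypsin")
```

Under this even-distribution assumption each slice holds about 6.7 µg of protein, giving a trypsin:protein ratio of roughly 1:17 in a 20 ng/µl digestion solution.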

Peptides were extracted from the gel pieces into new Eppendorf tubes using 50 µl of 50% ACN/5% formic acid solution, in duplicate. The peptides were then either cleaned on spin columns (Harvard Apparatus) or stored at -20 °C for future use.

3.5 Protocol used for peptide clean-up with spin columns

Before loading onto the EASY-nano LC system (Proxeon Biosystem) coupled online to an ESI-LTQ-Orbitrap Velos mass spectrometer (Thermo Fisher Scientific), peptides were cleaned by passing them through spin columns filled with C18 resin, using the following stepwise protocol:

• Activation of the resin: 200 µl of 100% ACN were loaded on the spin columns, which were centrifuged at 1000 rpm for 1-3 minutes.
• Equilibration of the resin: 200 µl of 0.1% formic acid were loaded on the columns, which were centrifuged at 1000 rpm for 1-3 minutes; this step was repeated three times.
• Sample loading: each sample was dissolved in 200 µl of 0.1% formic acid, loaded onto a separate pretreated spin column and centrifuged at 500-1000 rpm for 3-5 minutes.
• Washing: the spin columns containing the samples were washed three times with 200 µl of 0.1% formic acid.
• Peptide elution: finally, peptides were eluted from the spin columns into new Eppendorf tubes with 100 µl of 60% ACN/0.1% formic acid, in duplicate.

Eluted peptides were dried in a SpeedVac and stored at -20 °C for future use.
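The spin-column steps above are specified in rpm, whereas the relative centrifugal force (RCF, × g) also depends on the rotor radius. A short Python helper using the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm) is given below; the 7 cm radius is a hypothetical microcentrifuge value, so the actual g-force in the centrifuge used here may differ.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard constant when the radius is given in cm


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a target RCF at the given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 7.0  # hypothetical rotor radius
    for rpm in (500, 1000):
        print(f"{rpm} rpm ≈ {rpm_to_rcf(rpm, radius_cm):.0f} x g")
```

Because RCF grows with the square of the speed, halving the speed from 1000 to 500 rpm reduces the force four-fold, which is worth keeping in mind when transferring the protocol to a centrifuge with a different rotor.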

3.6 NanoLC-MS/MS analysis

Samples were analyzed using an EASY-nano LC system (Proxeon Biosystem) coupled online to an ESI-LTQ-Orbitrap Velos mass spectrometer (Thermo Fisher Scientific). Peptides were eluted through a trap column (150 µm × 2 cm) packed in-house with C18 ReproSil 3 µm resin (Dr. Maisch, Germany) and an analytical column (100 µm × 15 cm) packed with the same material, using gradients of phase A (0.1% formic acid, 5% ACN) and phase B (0.1% formic acid, 95% ACN) in the following three steps, for a total of 2 h:

i. A gradient of 5% to 35% phase B over 107 min.

ii. 35% to 90% phase B for 8 min.

iii. 90% phase B for 5 min.
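The three gradient steps can be expressed as a piecewise-linear program. The Python sketch below assumes the gradient starts at 5% B at time zero (ignoring sample loading time and system dwell volume, which are not stated in the text) and also converts %B into the actual acetonitrile content, since phase A already contains 5% ACN and phase B 95% ACN.

```python
# (time in min, %B) breakpoints of the gradient described above
GRADIENT = [(0.0, 5.0), (107.0, 35.0), (115.0, 90.0), (120.0, 90.0)]


def percent_b(t_min, program=GRADIENT):
    """Linearly interpolate %B at time t_min within the gradient program."""
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return program[-1][1]  # after the program ends, hold the final composition


def percent_acn(b_percent, acn_in_a=5.0, acn_in_b=95.0):
    """Actual ACN content of the mobile phase for a given %B."""
    return acn_in_a + (acn_in_b - acn_in_a) * b_percent / 100.0


if __name__ == "__main__":
    for t in (0, 53.5, 107, 111, 118):
        b = percent_b(t)
        print(f"t={t:>5} min: {b:5.1f}% B = {percent_acn(b):5.1f}% ACN")
```

At the end of the main separation segment (35% B) the column therefore sees about 36.5% ACN, not 35%, because both phases contain acetonitrile.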

After each run, the column was washed with 90% phase B and re-equilibrated with phase A. Mass spectra were acquired in positive mode using data-dependent acquisition (DDA), in which a survey MS scan is followed by the acquisition of tandem mass spectra (MS/MS). Each DDA cycle consisted of a survey scan over the m/z range 300-2000 at a resolution of 60 000 with a target value of 1 × 10⁶ ions. The survey scan was followed by MS/MS of the 10 most intense ions in the LTQ using collision-induced dissociation (CID), and previously fragmented ions were dynamically excluded for 60 s. Raw data were viewed in Xcalibur v.2.1 (Thermo Scientific). Three raw files were generated for each biological replicate, representing three technical replicates.
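The top-10 data-dependent logic with 60 s dynamic exclusion can be illustrated with a simplified Python model. It is not the instrument firmware: real DDA also applies intensity thresholds, charge-state screening and limits on the exclusion-list size, and both the ±10 ppm exclusion window and the DDASelector class used here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DDASelector:
    top_n: int = 10            # MS/MS events per survey scan
    exclusion_s: float = 60.0  # dynamic exclusion duration (s)
    window_ppm: float = 10.0   # hypothetical m/z matching window for exclusion
    excluded: list = field(default_factory=list)  # (m/z, expiry time in s)

    def _is_excluded(self, mz: float) -> bool:
        return any(abs(mz - m) / m * 1e6 <= self.window_ppm for m, _ in self.excluded)

    def select(self, peaks, now_s):
        """Pick up to top_n of the most intense, non-excluded precursors.

        peaks: iterable of (m/z, intensity) from one survey scan.
        """
        # Forget precursors whose exclusion period has elapsed.
        self.excluded = [(m, t) for m, t in self.excluded if t > now_s]
        chosen = []
        for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            if len(chosen) == self.top_n:
                break
            if self._is_excluded(mz):
                continue
            chosen.append(mz)
            self.excluded.append((mz, now_s + self.exclusion_s))
        return chosen


if __name__ == "__main__":
    dda = DDASelector(top_n=2)
    scan = [(500.0, 1e6), (600.0, 5e5), (700.0, 2e6)]  # hypothetical peaks
    print(dda.select(scan, now_s=0.0))
    print(dda.select(scan, now_s=1.0))
    print(dda.select(scan, now_s=61.0))
```

With top_n=2, a second survey scan one second later fragments the next most intense ion, because the first two precursors are still excluded; once 60 s have passed they become eligible again. This is how dynamic exclusion lets less abundant peptides be sampled.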

3.7 Data analysis

The stepwise data analysis is summarized as a workflow in Figure 8.

3.7.1 Protein identification

Figure 8: Workflow showing the steps taken in data analysis.

Raw files were converted into MS2 files using RawXtract software. The MS2 files were loaded into ProLuCID v.1.3 (Xu et al., 2006), and an MS/MS ion search was performed against the J. curcas protein database (Hirakawa et al., 2012), downloaded from http://www.kazusa.or.jp/jatropha/ in September 2012 and combined with the proteins encoded by the J. curcas chloroplast genome (Asif et al., 2010), downloaded from the National Center for Biotechnology Information (NCBI) in September 2012. The search parameters were: full tryptic hydrolysis, up to two missed cleavages, methionine oxidation as a variable modification, carbamidomethylation as a fixed modification and a mass tolerance of 50 ppm. The peptide-spectrum matches (PSMs) obtained from ProLuCID were then filtered and organized with the Search Engine Processor (SEPro) (Carvalho et al., 2012) using a 1% false discovery rate (FDR) at the protein level for identification quality control. The user-friendly graphical user interface (GUI) of SEPro provides information on the total identified proteins, protein groups, proteins selected by maximum parsimony and the peptides related to each protein. In addition, users can download the FASTA sequences of the identified proteins for downstream analysis.
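The idea behind a 1% protein-level FDR filter can be shown with the generic target-decoy approach: rank hits by score and keep the deepest cutoff at which the estimated FDR stays at or below 1%. This is a minimal sketch, not SEPro's algorithm; SEPro's own scoring and filtering are more elaborate and are not reproduced here. The decoy/target FDR estimator, the `ProteinHit` type and `filter_at_fdr` are assumptions introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinHit:
    accession: str
    score: float      # higher = better (search-engine specific)
    is_decoy: bool    # True for reversed/shuffled decoy sequences


def filter_at_fdr(hits: list[ProteinHit], max_fdr: float = 0.01) -> list[ProteinHit]:
    """Return target hits above the deepest score cutoff whose estimated FDR <= max_fdr."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    targets = decoys = 0
    keep = 0  # number of top-ranked hits inside the accepted cutoff
    for rank, hit in enumerate(ranked, start=1):
        if hit.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Simple decoy/target estimate of the FDR among accepted hits (assumption).
        if targets and decoys / targets <= max_fdr:
            keep = rank
    return [h for h in ranked[:keep] if not h.is_decoy]
```

Because the cutoff is taken at the deepest rank that still satisfies the threshold, one high-scoring decoy does not by itself truncate the list if enough targets follow it.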

3.7.2 Functional classification

Proteins identified in each of the five developmental stages were searched with BLAST against the NCBI non-redundant database (NCBInr) using the Blast2GO annotation tool (Conesa and Gotz, 2008). Blast2GO is a functional annotation tool for DNA or protein sequences that, besides providing annotations such as Gene Ontology (GO) terms and KEGG metabolic pathways, also adds descriptions to the sequences through BLAST searches against the NCBInr database. To determine the Ricinus and Arabidopsis orthologs of the identified proteins, a local BLAST of the identified proteins was performed against the Ricinus and Arabidopsis protein databases, downloaded in October 2012 from UniProt (http://www.uniprot.org) and TAIR (http://www.arabidopsis.org/), respectively, using an e-value cutoff of 1 × 10⁻²⁰. GO annotation was performed using the AgBase tools and database (http://agbase.msstate.edu/index.html). Goanna (McCarthy et al., 2006) was used to retrieve GO annotations assigned on the basis of sequence similarity, and the Plant GO Slim was used to summarize the subcategories of the identified proteins. To obtain more in-depth information, the identified proteins were functionally classified using two different approaches. First, proteins were mapped to KEGG metabolic pathways using the KEGG Automatic Annotation Server (KAAS) (Moriya et al., 2007). Second, proteins were classified into the functional categories of the MapMan BIN ontology (Thimm et al., 2004) using the Mercator pipeline (Lohse et al., 2013).
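The local BLAST step can be sketched as picking, for each J. curcas protein, its best hit among alignments that pass the 1 × 10⁻²⁰ e-value cutoff. This sketch assumes BLAST+ tabular output (`-outfmt 6`, whose default 12 columns end with evalue and bitscore) and takes "best" to mean the highest bit score; both are assumptions about how the best hit was chosen, and `best_hits` is a hypothetical helper.

```python
from __future__ import annotations

import csv
import io

EVALUE_CUTOFF = 1e-20  # 1 x 10^-20, as in the text


def best_hits(tabular: str, cutoff: float = EVALUE_CUTOFF) -> dict[str, tuple[str, float]]:
    """Map each query ID to (subject ID, e-value) of its highest-bit-score passing hit.

    Expects BLAST+ -outfmt 6 rows: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    best: dict[str, tuple[str, float, float]] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # not similar enough to count as a homologue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return {q: (s, e) for q, (s, e, _) in best.items()}
```

Queries with no hit under the cutoff are simply absent from the result, which corresponds to proteins without a Ricinus or Arabidopsis homologue.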

3.7.3 Cluster analysis and quantification

The PatternLab for proteomics tool (Carvalho et al., 2010) was used to determine the proteins with similar expression profiles and those with differential expression profiles during seed development. PatternLab is a user-friendly computational environment for analyzing shotgun proteomics data. It provides several modules, such as the Approximately Area Proportional Venn Diagram (AAPVD) for pinpointing proteins uniquely identified in a particular state or shared among three different states, TrendQuest for clustering proteins with similar expression profiles, TFold and ACFold for determining proteins differentially expressed between two states, and Gene Ontology Explorer (GOEx) for GO annotation. PatternLab is coupled to SEPro through another module, Regrouper (Carvalho et al., 2012), which parses the experimental data into PatternLab's data format (index and sparse matrix files). The index and sparse matrix files were created as follows:

• One SEPro file was generated for each biological replicate of each of the five samples, resulting in 15 SEPro files (3 per sample).
• Five directories were created and named 1, 2, 3, 4 and 5.
• The SEPro files of Stage 6 were placed in directory 1, Stage 7 in directory 2, Stage 8 in directory 3, Stage 9 in directory 4 and Stage 10 in directory 5.
• Each of the five directories was loaded into PatternLab's Regrouper module, and the index and sparse matrix files were generated using the maximum parsimony option.

To simplify the results of the clustering and differential expression analyses, only proteins selected by maximum parsimony were used. To determine the proteins with similar expression profiles during endosperm development, proteins were clustered using the TrendQuest module of the PatternLab computational environment (Carvalho et al., 2010). For this purpose, the TrendQuest module was opened in PatternLab, the index and sparse matrix files were loaded, and the following parameters were selected: minimum average signal, 10; minimum data points, 6; minimum items per cluster, 4; and number of clusters, 5. TrendQuest clusters proteins using the k-means algorithm and provides options for discarding proteins whose average spectral counts fall below a given value or that do not appear in a given number of replicates.
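The filtering and clustering step can be illustrated with plain k-means. This is a sketch under several assumptions, not TrendQuest's implementation: each protein's profile is taken to be its 12 spectral counts (4 stages × 3 replicates), "minimum data points" is interpreted as the number of non-zero values, distances are Euclidean on raw counts, and the deterministic farthest-point seeding is our own choice. All function names are hypothetical.

```python
from __future__ import annotations

import math


def filter_profiles(
    profiles: dict[str, list[float]], min_avg_signal: float = 10.0, min_points: int = 6
) -> dict[str, list[float]]:
    """Drop proteins with a low mean spectral count or too few non-zero data points."""
    return {
        name: values
        for name, values in profiles.items()
        if sum(values) / len(values) >= min_avg_signal
        and sum(1 for v in values if v > 0) >= min_points
    }


def kmeans(profiles: dict[str, list[float]], k: int, max_iter: int = 100) -> dict[str, int]:
    """Plain Euclidean k-means with deterministic farthest-point seeding."""
    names = sorted(profiles)
    points = [profiles[n] for n in names]
    # Seed with the first profile, then repeatedly add the profile farthest
    # from all centroids chosen so far (a seeding assumption of this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: list[int] | None = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return dict(zip(names, labels))
```

Proteins with rising counts end up together and apart from proteins with falling counts, while a protein averaging below 10 spectra is removed before clustering.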

To determine the spectral-count-based differential expression of proteins across developmental stages, the TFold module of PatternLab for proteomics was used (Carvalho et al., 2010). The index and sparse matrix files were loaded and the following parameters were selected: a minimum of 2 replicates per class and a Benjamini-Hochberg (BH) q-value of 0.05. Stage 6 was considered the reference stage, and the differential expression of proteins in the other four stages was determined relative to Stage 6. The TFold module sums the spectral counts of proteins across biological replicates and combines a fold-change cutoff with Student's t-test and the BH theoretical false-positive rate estimator (Benjamini and Hochberg, 1995). TFold results are presented as a fold change versus probability plot, in which each protein is represented as a dot mapped according to its log2(fold change) on the y-axis and its -log2(t-test p-value) on the x-axis. Proteins are shown in four colors: red dots satisfy neither the variable fold-change cutoff nor the p-value cutoff; green dots satisfy the fold-change cutoff but not the p-value cutoff; orange dots satisfy both the fold-change cutoff and the p-value cutoff but are low-abundance proteins; and blue dots satisfy all statistical filters. Only the blue dots were considered significantly differentially expressed proteins.
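Two ingredients of this analysis, the log2 fold change of summed spectral counts against Stage 6 and the Benjamini-Hochberg adjustment of t-test p-values, can be sketched with standard-library Python. This is not TFold: it does not reproduce TFold's own count processing or its variable fold-change cutoff, the t-test p-values are assumed to be computed beforehand, and the pseudocount that avoids division by zero is our assumption. Function names are hypothetical.

```python
from __future__ import annotations

import math


def log2_fold_change(reference: list[float], test: list[float], pseudocount: float = 1.0) -> float:
    """log2 of summed spectral counts, test stage over reference stage (e.g. Stage 6)."""
    return math.log2((sum(test) + pseudocount) / (sum(reference) + pseudocount))


def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p down, keeping the q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For p-values of 0.01, 0.04, 0.03 and 0.20, the q-values are 0.04, about 0.053, about 0.053 and 0.20, so only the first protein passes a q-value threshold of 0.05.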

3.8 Data deposition

Raw data and Supplementary Tables were deposited in Chorus (https://chorusproject.org/pages/index.html), a publicly available MS data repository. Currently, the data are available only to the thesis examiners; they will be made publicly available after publication.

4 RESULTS AND DISCUSSIONS

4.1 Anatomical analysis of developing seeds of Jatropha curcas

Previous transcriptomic (Jiang et al., 2012) and proteomic (Liu et al., 2013) analyses of developing J. curcas seeds generally used whole seeds rather than a specific tissue (e.g., integument, embryo or endosperm) isolated from the developing seeds. Here, we performed a proteomic analysis of the storage tissue, the endosperm, isolated from seeds at different developmental stages. To select the appropriate developmental stages for endosperm isolation, seeds were classified based on their morphological characteristics and subjected to anatomical analysis. This anatomical study (Figure 9) shows that in seeds from Stage 1 through Stage 3 only nuclear endosperm is observed, while seeds at Stage 4 are undergoing a transition from nuclear to cellular endosperm at the micropylar end. Seeds at Stage 5 are still undergoing cellularization, whereas in seeds at Stage 6 only cellular endosperm can be observed. From Stage 6 onward, the endosperm increases in size, and by Stage 9 it almost fills the whole seed.

Based on this morphological and anatomical information, four developmental stages (6, 7, 8 and 9) were selected for endosperm isolation and proteomic analysis. In addition to these four stages, endosperm was also isolated from mature seeds, referred to as Stage 10 (not shown in Figure 9).

4.2 Protein identification and data analysis

Figure 9: Jatropha curcas seed development. External seed morphology (A.1-A.9), internal part of the seeds (B.1-B.9) and histological analysis of the seeds (C.1-C.9) at developmental stages S-I to S-IX. CE, cellular endosperm; Em, embryo; M, micropyle; SC, seed coat. Arrows indicate cellular endosperm; arrowheads indicate nuclear endosperm. All seeds are oriented with the micropyle at the top. Scale bars: A.1 to C.3, 1 mm; C.4 to C.9, 2 mm.

Figure 10: SDS-PAGE of proteins extracted from the endosperm of Jatropha curcas seeds at five developmental stages. The three gels represent the three independent biological replicates. The six rectangles on each lane indicate the fractions analyzed by LC-MS/MS.

Table 2: Proteins identified in the endosperm of Jatropha curcas seeds at five developmental stages. Row 2 gives the total number of proteins identified at each stage; row 3 gives the proteins that appeared in at least two biological replicates.

Identifications                 Stage 6   Stage 7   Stage 8   Stage 9   Stage 10
All proteins                    1901      1680      1302      1103      518
At least 2 biological reps.     1517      1256      1033      752       307

Three independent SDS-PAGE experiments were performed, one for each biological replicate of the proteins extracted from the endosperm at the five developmental stages (Figure 10). After in-gel digestion and LC-MS/MS analysis, database searches were performed using a 1% FDR at the protein level. Identification results for each stage are summarized in Table 2. The searches identified 1901, 1680, 1302, 1103 and 518 proteins from Stages 6, 7, 8, 9 and 10, representing 1304, 1151, 878, 737 and 324 protein groups, respectively (Supplementary Table I). These identifications were based on the peptides listed in Supplementary Table II. Proteins that appeared in at least two biological replicates were considered representative of that biological sample and used in downstream analysis, which resulted in 1517, 1256, 1033, 752 and 307 proteins from Stages 6, 7, 8, 9 and 10, respectively, for a total of 1760 proteins (Supplementary Table III). Of these 1760 proteins, 14% appeared in all five stages, 22% in exactly four stages, 19% in three, 18% in two and 27% in only one of the five developmental stages (Figure 11).
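The replicate filter and the stage-overlap percentages described above can be sketched as set operations. The sketch assumes each replicate is represented as a set of protein accessions; the percentages are computed for "exactly n stages", which is why they sum to 100%. `representative` and `stage_presence` are hypothetical helper names.

```python
from __future__ import annotations

from collections import Counter


def representative(replicates: list[set[str]], min_replicates: int = 2) -> set[str]:
    """Proteins identified in at least `min_replicates` biological replicates of one stage."""
    seen = Counter(pid for rep in replicates for pid in set(rep))
    return {pid for pid, n in seen.items() if n >= min_replicates}


def stage_presence(stages: dict[str, set[str]]) -> dict[int, float]:
    """Percentage of all proteins found in exactly 1, 2, ..., n stages."""
    per_protein = Counter(pid for proteins in stages.values() for pid in proteins)
    total = len(per_protein)
    dist = Counter(per_protein.values())  # number of stages -> number of proteins
    return {n: 100.0 * dist[n] / total for n in range(1, len(stages) + 1)}
```

Applied to the five stage sets, `stage_presence` would yield one percentage per overlap class, corresponding to the distribution shown in Figure 11.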

4.3 Functional classification of the identified proteins

Although the J. curcas genome has been sequenced (Sato et al., 2010; Hirakawa et al., 2012), it is still not fully annotated. Protein sequences in the genomic database of this species lack descriptions; hence, the first step in the functional analysis was to add proper descriptions to all of our identifications. The Blast2GO functional annotation tool (Conesa and Gotz, 2008) was applied to add descriptions to the identified proteins, and manual corrections were made when needed. For example, Jcr4S28232.10 is an oleosin that Blast2GO did not annotate as such; the descriptions of many other proteins were corrected in the same way. GO annotation was performed separately for the proteins identified in each stage and for the proteins unique to each stage. In addition to the heterogeneity in the total number of proteins identified in each developmental stage and of those unique to each stage (Supplementary Table III), some notable differences in the distribution of these proteins among GO categories were observed (Figures 12 and 13), the latter showing more variation across the groups of the three GO categories. For example, protein classes related to the subcategories Response to Stress, Response to Abiotic Stimulus and Response to Endogenous Stimulus of the GO Biological Process category increased from the initial to the mature stage (Figures 12 and 13). Among the proteins in these subcategories, we found in particular enzymes known to be involved in defense against oxidative stress, such as catalase, peroxidase, superoxide dismutase, quinone and thioredoxins, reflecting the importance of these enzymes during the intense metabolic activity of developing seeds. Similarly, the higher deposition of these proteins in mature seeds reflects their importance in scavenging reactive oxygen species (ROS) during germination. In line with these results, the Peroxisome subcategory of the GO Cellular Component category shows a higher percentage of unique proteins in Stage 10 than in the other stages (Figure 13). Besides its involvement in other plant processes, this organelle participates in various biotic and abiotic stress responses (reviewed in Hu et al., 2012). Similarly, the Post-embryonic Development subcategory of the GO Biological Process category shows a higher percentage of unique proteins in the mature stage than in the initial stages (Figure 13). This subcategory is dominated by proteins such as LEA proteins and dehydrins, which are related to the desiccation phase of seed development. Additionally, some subcategories of the GO Biological Process category of the unique proteins, such as Metabolic Process, Biosynthetic Process and Transport, showed a lower percentage of proteins in Stage 10 than in the other stages (Figure 13), a common phenomenon related to the quiescent state of the seed.

Figure 11: Distribution of the 1760 proteins along the five developmental stages of our proteomic analysis.

Figure 12: Gene Ontology annotation of the proteins identified in Jatropha curcas endosperm at five developmental stages. Plant GO Slim was used to summarize the subcategories of the identified protein groups. Y-axes represent the percentage of protein groups.

Figure 13: Gene Ontology annotation of the unique proteins present in each developmental stage of Jatropha curcas seed. Plant GO Slim was used to summarize the subcategories of the identified protein groups. Y-axes represent the percentage of protein groups.

For a more detailed functional analysis, the proteins identified in each stage were classified using two different classification systems. First, proteins were mapped to KEGG metabolic pathways, which showed proteins related to Carbohydrate Metabolism to be the major functional class, followed by Amino Acid, Energy and Lipid Metabolism (Figure 14). Second, proteins were classified into MapMan functional categories using the online MapMan protein classifier (http://mapman.gabipd.org/web/guest/app/mercator), which assigned the identified proteins to diverse metabolic and non-metabolic functional classes (Figure 15). Protein, Amino Acid Metabolism, Glycolysis, Lipid Metabolism, Stress and Development are the most prominent functional classes. Carbohydrate metabolism pathways such as glycolysis and the tricarboxylic acid (TCA) cycle have a higher number of proteins in the mature stage than in the early developmental stages. Glycolysis and the TCA cycle, among other pathways, produce carbon sources and energy for fatty acid (FA) biosynthesis (Pinheiro et al., 2013).

Figure 14: Functional classification of the identified proteins according to the KEGG functional classification. The X-axis represents functional classes and the Y-axis represents the percentage of each functional class.

Figure 15: Functional classification of the proteins identified in the Jatropha curcas endosperm based on the MapMan classification. The X-axis represents functional classes and the Y-axis represents the percentage of each functional class.

4.4 Protein quantification and cluster analysis

To pinpoint proteins that are differentially expressed during seed development, we used the PatternLab TFold module (Carvalho et al., 2010) with a q-value of 0.05 (Figure 16). The TFold module uses a theoretical FDR estimator to maximize the identifications that satisfy both a fold-change cutoff, which varies with the t-test p-value as a power law, and a stringency criterion that aims to filter out low-abundance proteins whose quantitation is likely to be compromised. Stage 6 was considered the reference stage, and proteins identified in the other developmental stages were compared with this stage. Significantly differentially expressed proteins were determined using log2 values of the spectral count ratios of Stages 7, 8, 9 and 10 versus Stage 6. A total of 99, 82 and 81 proteins were significantly differentially expressed in Stages 8, 9 and 10, respectively, whereas no protein was significantly differentially expressed in Stage 7 (Supplementary Table IV).

To determine the proteins with similar expression profiles during seed development, a cluster analysis was performed on the identified proteins. The mature stage was excluded from the cluster analysis because of the lower number of proteins identified at this stage. The TrendQuest module of PatternLab for proteomics (Carvalho et al., 2010) was applied to group proteins with similar expression profiles using the k-means clustering algorithm. Only proteins with a minimum average spectral count of 10 were used for clustering. Proteins sharing common trend patterns were grouped into five clusters (Figure 17). Proteins involved in primary metabolism, such as ribosomal proteins, elongation factors, aminoacyl-tRNA synthetases, proteasome components, chaperones and proteins involved in metabolism, are grouped in Clusters 1, 2 and 3 (Supplementary Table IV). Some proteins involved in carbohydrate and lipid metabolism are also grouped in these three clusters. A similar pattern for these proteins has already been observed in developing seeds of Ricinus communis (Nogueira et al., 2013). Interestingly, four seed storage proteins (SSPs) were also grouped in Cluster 1, showing lower deposition at the advanced stages compared with Stage 6. We have also identified these SSPs in the inner integument of J. curcas seeds (data not shown), suggesting that these reserve proteins may have a special function during seed development. Cluster 5 is dominated by other SSPs (11S globulins, 2S albumins, legumins and vicilins) and by proteins related to the desiccation phase of seed development, such as LEA proteins and oleosins (Supplementary Table IV). LEA proteins are usually expressed under desiccation conditions (Hundertmark and Hincha, 2008), while oleosins are important for controlling oil body structure and lipid accumulation (Siloto et al., 2006).

Figure 16: Label-free quantification based on spectral counts, comparing the proteins identified in Stages 7, 8, 9 and 10 with those identified in Stage 6 using a t-test. Proteins in Stage 7 (16A), 8 (16B), 9 (16C) and 10 (16D) are mapped as dots according to their -log2(p-value) (x-axis) and log2(fold change) (y-axis). Red dots are proteins that satisfy neither the variable fold-change cutoff nor the FDR cutoff α = 0.05. Green dots satisfy the fold-change cutoff but not α. Orange dots satisfy both the fold-change cutoff and α but are low-abundance proteins and therefore most likely have compromised quantitation. Finally, blue dots satisfy all statistical filters.

Figure 17: Cluster analysis of the proteins identified in Stages 6, 7, 8 and 9. Proteins with similar expression profiles were grouped into five clusters. X-axes represent developmental stages and Y-axes represent signal intensity.

4.5 Major functional classes

4.5.1 Proteins related to the metabolism of carbohydrates

Carbohydrate metabolism is of particular importance for lipid biosynthesis in heterotrophic seeds. It supplies the seed with adenosine triphosphate (ATP), reducing power in the form of NADH and NADPH, and carbon compounds, such as pyruvate, that are required for biosynthetic processes. We identified a large number of proteins involved in carbohydrate metabolism (Table 3). Besides being the most represented functional class in this study, carbohydrate metabolism showed an increase in the percentage of proteins identified during development, with the highest level at Stage 10 (Figure 14). Fatty acid (FA) synthesis in plastids starts with the breakdown of sucrose in multistep, compartmentalized reactions. First, sucrose must cross the plasma membrane via a sucrose transporter or, after cleavage, as glucose and fructose via sugar and polyol transporters (Buttner, 2007). We did not identify sucrose transporters in this study, but the phosphate/phosphoenolpyruvate translocator was identified. In another study, we identified the phosphate/phosphoenolpyruvate translocator, along with the triose phosphate translocator, in plastids from the endosperm (Pinheiro et al., 2013). The enzymes sucrose synthase and invertase, which cleave sucrose into glucose and fructose, generating the initial precursors of the glycolytic pathway, were

Table 3: Proteins related to carbohydrate metabolism identified in the endosperm of developing Jatropha curcas seeds. Column 1: developmental stages in which the protein appeared. Column 2: protein accession number in the genome. Column 3: Blast2GO description of the protein. Column 4: best Ricinus communis homologue of the Jatropha curcas protein. Column 5: best Arabidopsis thaliana homologue of the Jatropha curcas protein. Column 6: protein cluster containing the protein. Columns 7, 8 and 9: log2 of the ratios of the spectral counts of the protein in Stages 8, 9 and 10, respectively, to those in Stage 6.

Developmental Best Ricinus Best Arabidopsis Log2 Log2 Log2 stage ProteinID Description Hit UNIPROT Hit TAIR Cluster E8/E6 E9/E6 E10/E6 6: 7: 8: 9:10 Jcr4S00101.20 2,3-bisphosphoglycerate-independent phosphoglycerate B9S1V6 AT3G08590.1 I -1.7965544 -3.084131 - mutase 6: 7: 8: 9 Jcr4S01086.110 2-isopropylmalate synthase chloroplastic -like B9SX30 AT1G74040.1 III - - - 6:7 Jcr4S01488.20 2-oxoglutarate mitochondrial-like B9SR46 AT3G55410.1 - - - - 6: 7: 8: 9 Jcr4S007 81.30 3-isopropylmalate dehydratase small subunit -like B9SXE3 AT2G43090.1 - - - - 6: 7: 8: 9 Jcr4S00256.20 3-isopropylmalate dehydratase-like B9SQS5 AT4G13430.1 III - - - 6:7 Jcr4S00534.60 6-phosphofructokinase 3 -like B9RRX6 AT4G26270.1 - - - - 6 Jcr4S00758.60 6-phosphofructokinase 3-like B9RKE5 AT4G26270.1 - - - - 6 Jcr4S01697.80 6-phosphofructokinase 3 -like B9RQC7 AT4G26270.1 - - - - 6: 7: 8: 9 Jcr4S00085.60 6-phosphogluconate decarboxylating 2 B9RVA7 AT5G41670.1 II - - - 6: 7: 8: 9 Jcr4S00056.9 0 6-phosphogluconate decarboxylating 3 B9SXT4 AT3G02360.2 II - - - 6: 7: 8: 9 Jcr4S06785.30 6-phosphogluconate dehydrogenase family protein B9SXT4 AT3G02360.2 II - - - isoform 1 6: 7: 8: 9:10 Jcr4S16847.20 6-phosphogluconolactonase chloroplastic-like B9RWU5 AT5G24400.1 III - - - 6: 7: 8: 9 Jcr4S01522.20 Acetolactate synthase, putative B9SV00 AT2G31810.1 II - - - 6: 7: 8: 9 Jcr4S05012.10 Acetyl- c-acetyltransferase B9SA57 AT5G48230.2 II - - - 6: 7: 8: 9 gi|225544133 Acetyl- carboxylase carboxyltransferase beta subunit G1D767 ATCG00500.1 III - - - 6:7 Jcr4S08822.30 Acetyl- synthetase B9RGS6 AT5G36880.2 - - - - 6: 7: 8: 9 Jcr4S00416.90 Acetyl-CoA carboxylase carboxyl transferase subunit B9SPE5 AT2G38040.1 III - - - alpha 6: 7: 8: 9:10 Jcr4S00736.30 Aconitate cytoplasmic-like B9T2U5 AT2G05710.1 II - - - 78

6: 7: 8: 9:10 Jcr4S09697.10 Aconitate hydratase 1-like B9SXB6 AT4G35830.1 II - - - 6: 7: 8: 9:10 Jcr4S02965.20 Alcohol dehydrogenase, putative B9SJJ8 AT1G77120.1 II - - -2.8781077 6: 7: 8: 9:10 Jcr4S02682.20 Alcohol dehydrogenase B9SJK2 AT1G77120.1 II - - -3.292409 8:9 Jcr4S02532.80 Aldehyde dehydrogenase 7b4 B9T896 AT1G54100.1 - - - - 6: 7: 8: 9:10 Jcr4S03546.10 Aldehyde dehydrogenase family 2 member B9RB49 AT3G48000.1 III - - - 6: 7: 8: 9 Jcr4S26962.10 Aldehyde dehydrogenase family 3 member B9S2Y3 AT1G44170.1 - - - - 6:7 Jcr4S00225.80 Aldolase-type tim barrel family protein isoform 1 B9S0Y9 AT3G14420.1 I - - - 6: 7: 8 Jcr4S03178.30 Aldose 1-epimerase-like B9SWV6 AT3G17940.1 - - - - 6: 7: 8 Jcr4S00816.20 Alpha-1,4 glucan phosphorylase l chloroplastic B9SJB6 AT3G29320.1 IV - - - amyloplastic-like 6:7 Jcr4S01247.10 Alpha -1,4 glucan phosphorylase l chloroplastic B9RCW0 AT3G29320.1 IV - - - amyloplastic-like 6 Jcr4S03116.30 Alpha-galactosidase 1 B9RGK6 AT5G08380.1 - - - - 6:7 Jcr4U36897.20 Alpha-l-arabinofuranosidase 1-like B9SCF3 AT3G10740.1 I - - - 8 Jcr4S01150.20 Alpha-trehalose-phosphate synthase B9SNT9 AT1G68020.2 - - - - 6: 7: 8: 9 Jcr4S00918.60 Ascorbate peroxidase B9T852 AT1G07890.3 I - -3.2984691 - 6: 7: 8 Jcr4S00265.80 Atp-citrate A-3 B9RFP8 AT1G09430.1 IV - - - 6: 7: 8 Jcr4S01988.10 ATP-citrate synthase alpha chain protein 1-like B9SHC9 AT1G60810.1 IV - - - 6: 7: 8: 9 Jcr4S00572.60 ATP-citrate synthase beta chain protein 1-like B9RZR0 AT5G49460.1 I - - - 6: 7: 8: 9:10 Jcr4S03968.30 Beta-xylosidase alpha-l-arabinofuranosidase 2-like B9RIY8 AT5G64570.1 II -0.8883546 - -2.4550163 6: 7: 8: 9 Jcr4S01365.70 Bifunctional polymyxin resistance B9SN65 AT1G08200.1 III - - - 6: 7: 8: 9 Jcr4S03449.40 Biotin caboxylase subunit of accase B9S1E2 AT5G35360.3 I - - - 6 Jcr4S01222.30 Biotin carboxyl carrier B9SJD0 AT5G15530.1 III - - - 6: 7: 8: 9:10 Jcr4S01023.80 Catalase B9S6U0 AT4G35090.1 III - - - 6: 7: 8: 9:10 Jcr4S01159.40 Catalase Q01297 AT4G35090.1 III - - - 6:7 
Jcr4S01165.10 Cell wall invertase B9SWG8 AT3G52600.1 - - - - 6:7 Jcr4S01165.20 Cell wall invertase B9SWG8 AT3G52600.1 - - - - 6:7 Jcr4S01165.30 Cell wall invertase B9SWG8 AT3G52600.1 - - - - 79

6: 7: 8: 9 Jcr4S01232.50 Chloroplast acetyl- carboxylase biotin-containing B9RM56 AT5G16390.1 II - - - subunit 6: 7: 8: 9 Jcr4S00215.140 Citrate synthase B9REL6 AT2G44350.2 I - - - 6: 7: 8: 9 Jcr4S03925.10 Cytosolic enolase 3-like B9S376 AT2G29560.1 IV - - - 6: 7: 8: 9:10 Jcr4S00445.90 Cytosolic phosphoglucomutase B9SP64 AT1G70730.3 II - - - 6: 7: 8: 9:10 Jcr4S00112.160 Dihydrolipoyl dehydrogenase B9RZN2 AT3G16950.2 IV -1.6904556 -2.3625701 -6.2854022 6: 7: 8: 9:10 Jcr4S00014.100 Dihyd rolipoyl dehydrogenase mitochondrial -like B9RZW7 AT1G48030.1 II - - - 6: 7: 8: 9 Jcr4S02278.30 Dihydrolipoyllysine-residue acetyltransferase B9SLH2 AT1G34430.1 II - - - component of PDC 6: 7: 8: 9 Jcr4S15391.10 Dihydrolipoyllysine-residue acetyltransferase B9SLH2 AT1G34430.1 III - - - component of PDC 6: 7: 8: 9:10 Jcr4S00306.100 Dihydrolipoyllysine -residue acetyltransferase B9S5V2 AT1G54220.1 I - - - component PCD 6: 7: 8: 9 Jcr4S04485.30 Dihydrolipoyllysine-residue succinyltransferase B9SVA1 AT4G26910.1 II - - - component of 2-oxoglutarate dehydrogenase complex mitochondrial 6:7 Jcr4S01892.10 Enolase chloroplastic -like B9RE72 AT1G74030.1 I - - - 6: 7: 8: 9:10 Jcr4S00171.20 Enolase-like B9R9N6 AT2G36530.1 II -1.2456772 -2.7853496 -5.6314369 6:7 Jcr4S01926.10 Ferredoxin -dependent glutamate synthase 1 B9SLP5 AT5G04140.2 - - - - 6: 7: 8: 9 Jcr4S08446.20 Formate dehydrogenase B9RUT7 AT5G14780.1 III - - - 6: 7: 8: 9 Jcr4S00363.120 Fructokinase 2 B9T544 AT3G59480.1 II - - - 6: 7: 8 Jcr4S02906.10 Fructose-1,6-bisphosphatase, cytosolic B9T2G5 AT1G43670.1 I - - - 6: 7: 8: 9 Jcr4S01786.40 Fructose -bisphosphate aldolase chloroplastic -like B9SJY9 AT2G01140.1 I - - - 6: 7: 8: 9:10 Jcr4S02610.20 Fructose-bisphosphate aldolase cytoplasmic B9T5T6 AT2G36460.1 II - - - 6: 7: 8: 9 Jcr4S00484.40 Fructose -bisphosphate cytoplasmic isozyme 1 -like B9S0W4 AT4G26530.1 I -2.6788626 -2.1454304 - 6: 7: 8: 9:10 Jcr4S14120.10 Fructose-bisphosphate cytoplasmic isozyme-like B9SRH4 AT2G36460.1 II -1.2846907 
-2.2335863 -4.5256988 6: 7: 8: 9 Jcr4S07026.30 Fumarate hydratase mitochondrial -like B9SAW4 AT2G47510.1 II - - - 6: 7: 8 Jcr4S00851.70 Galactose kinase B9RZT4 AT3G06580.1 - - - - 6: 7: 8: 9:10 Jcr4S01086.70 Gdp -d-mannose -3 -epimerase B9SZ78 AT5G28840.1 - - - - 80

6 Jcr4S00402.20 Gdp-mannose dehydratase 2-like B9SZV5 AT3G51160.1 - - - - 6 Jcr4S01686.40 Glucose-6-phosphate 1-epimerase-like B9RGR6 AT5G57330.1 - - - - 6: 7: 8: 9:10 Jcr4S09225.20 Glucose-6-phosphate B9RJU9 AT4G24620.1 II - - - 6:7 Jcr4S00182.130 Glucose-6-phosphate isomerase B9R7S8 AT5G42740.1 - - - - 6: 7: 8: 9 Jcr4S00507.10 Glutamine synthetase B9SMC0 AT5G37600.1 II - - - 6: 7: 8: 9 Jcr4S01376.50 Glutamine synthetase B9SMC0 AT5G16570.1 II - - - 6: 7: 8: 9 Jcr4S06509.10 Glutamine synthetase B9SMC0 AT5G37600.1 II - - - 6: 7: 8: 9:10 Jcr4S00209.170 Glutathione-dependent formaldehyde dehydrogenase B9T5W1 AT5G43940.2 III - - - 6: 7: 8: 9:10 Jcr4S00205.140 Glyceraldehyde 3-phosphate dehydrogenase B9RAL0 AT1G13440.1 II -1.21919 -1.6742298 -2.6992952 6: 7: 8: 9:10 Jcr4S00273.150 Glyceraldehyde-3-phosphate dehydrogenase B9RAL0 AT3G04120.1 II -1.5819098 -2.2147728 -3.3406906 6: 7: 8: 9:10 Jcr4S00049.40 Glyceraldehyde-3-phosphate dehydrogenase cytosolic- B9RHV9 AT1G16300.1 II -1.0135148 -1.5353805 -2.9374789 like 7 Jcr4S19620.40 Glycosyl family 1 protein B9REG9 AT5G42260.1 - - - - 6: 7: 8: 9:10 Jcr4S00295.140 Glyoxalase i homolog B9RXK1 AT1G11840.6 I - - - 6: 7: 9 Jcr4S04434.20 Hexokinase 3 B9RQD9 AT4G29130.1 II - - - 6: 7: 8: 9 Jcr4S00679.30 Hexokinase- chloroplastic-like B9R883 AT1G47840.1 II - - - 9 Jcr4S13928.20 Hydroxyacylglutathione hydrolase B9S4H6 AT3G10850.1 - - - - 6 Jcr4S00523.60 Hydroxymethylglutaryl- synthase B9RC08 AT4G11820.2 - - - - 6:7 Jcr4S03980.3 0 Inositol -3-phosphate synthase -like B9T7K3 AT2G22240.1 - - - - 6: 7: 8: 9:10 Jcr4S01952.20 Isocitrate dehydrogenase B9SR98 AT1G65930.1 I - - - 7 Jcr4S19720.20 Isocitrate dehydrogenase B9SRZ2 AT4G35260.1 - - - - 8:9:10 Jcr4S02563.50 Isocitrate lyase B9SUS2 AT3G21720.1 V - - - 6: 7: 8 Jcr4S02228.20 L-galactose -1-phosphate phosphatase B9T037 AT3G02870.1 I - - - 6: 7: 8: 9:10 Jcr4S05113.10 Lysosomal beta glucosidase-like B9SD66 AT5G20950.1 II - - - 6: 7: 8: 9 Jcr4S04564.20 Lysosomal beta glucosidase -like 
B9SIA5 AT5G20950.1 II - - - 6: 7: 8: 9:10 Jcr4S00279.20 Malate dehydrogenase B9T5E4 AT5G43330.1 III - - -0.9691681 6: 7: 8: 9:10 Jcr4U31193.10 Malate dehydrogenase chloroplastic -like B9RLY1 AT3G47520.1 I -1.6920503 -2.7287621 -4.1587499 6: 7: 8: 9:10 Jcr4S06502.10 Malate dehydrogenase cytoplasmic-like B9RD45 AT1G04410.1 III - - -0.8855413 81

6:7:8:9:10; Jcr4S00056.120; Malate dehydrogenase glyoxysomal-like; B9S7S1; AT2G22780.1; II; -; -; -
8:9:10; Jcr4S00100.200; Malate synthase glyoxysomal-like; B9RAK0; AT5G03860.1; -; -; -; -
6:7:8; Jcr4S00073.80; Melibiase family protein; B9S4D3; AT3G56310.1; I; -; -; -
7:8; Jcr4S04459.20; Methylmalonate-semialdehyde dehydrogenase; B9RX74; AT2G14170.1; -; -; -; -
6:7:8:9:10; Jcr4S03295.10; Mitochondrial nad-dependent malate dehydrogenase; B9SE47; AT1G53240.1; IV; -1.9484418; -3.0567744; -3.8137315
6:7:8:9; Jcr4S10295.10; Monodehydroascorbate reductase; B9S635; AT3G52880.2; II; -; -; -
6:7; Jcr4S00535.90; Monodehydroascorbate reductase; B9RCH2; AT1G63940.2; -; -; -; -
8; Jcr4S01662.90; NADH-cytochrome b5 reductase-like protein; B9SEP3; AT5G20080.1; -; -; -; -
6; Jcr4S00093.150; Pectinesterase; B9RR22; AT4G33220.1; -; -; -; -
6; Jcr4S00730.100; Carbohydrate kinase family protein; B9RDH7; AT5G51830.1; -; -; -; -
6:8; Jcr4S02004.20; Phosphatidylinositol-trisphosphate 3-phosphatase and dual-specificity protein phosphatase pten-like; B9RYC0; AT3G19420.1; -; -; -; -
6:7:8:9:10; Jcr4S08285.10; Phosphoenolpyruvate carboxykinase; B9R6Q4; AT4G37870.1; -; -; -; -
7:8:9:10; Jcr4S02543.20; Phosphoenolpyruvate carboxykinase; B9SSD5; AT4G37870.1; -; -; -; -
6:7:8; Jcr4S01008.10; Phosphoenolpyruvate carboxylase; B9SWL2; AT3G14940.1; -; -; -; -
6:7:8; Jcr4S01942.20; Phosphoenolpyruvate carboxylase; B9RWB8; AT1G53310.1; -; -; -; -
6:7:8; Jcr4S00675.30; Phosphoenolpyruvate carboxylase 4-like; B9SEG3; AT1G68750.1; -; -; -; -
6:7:8:9; Jcr4S26247.20; Phosphoglucomutase; B9R9J6; AT5G51820.1; III; -; -; -
6:7:8:9:10; Jcr4S00043.140; Phosphoglycerate kinase chloroplastic-like; B9RHY4; AT3G12780.1; II; -; -; -
7; Jcr4S00075.70; Phosphomannomutase a1; B9RDN6; AT2G45790.1; -; -; -; -
6; Jcr4S06157.30; Plastidic aldolase; B9RHD4; AT4G38970.1; -; -; -; -
6:7; Jcr4S01910.40; Probable galactinol--sucrose galactosyltransferase 1-like; B9SXA4; AT1G55740.1; -; -; -; -
6; Jcr4S00034.190; Probable lactoylglutathione chloroplast-like; B9RKL0; AT1G67280.1; I; -; -; -
6:7:8:9; Jcr4S04084.30; Probable rhamnose biosynthetic enzyme 1-like; B9SZ19; AT1G78570.1; III; -; -; -
6:7:8:9; Jcr4S07028.20; Probable rhamnose biosynthetic enzyme 1-like; B9RC03; AT1G63000.1; IV; -; -; -
6:7:8:9:10; Jcr4S03703.10; Pyridoxal phosphate (PLP)-dependent; B9S4Y5; AT3G22200.2; II; -; -; -
6:7:8:9:10; Jcr4S00002.120; Pyrophosphate--fructose 6-phosphate 1-phosphotransferase; B9R8U5; AT1G76550.1; III; -; -; -
6:7:8:9; Jcr4S00433.130; Pyrophosphate--fructose 6-phosphate 1-phosphotransferase subunit beta; B9RXE7; AT1G12000.1; III; -; -; -
6:7:8:9:10; Jcr4S02762.40; Pyruvate cytosolic isozyme-like; B9SRM0; AT3G52990.1; III; -; -; -4.0255351
6:7:8:9; Jcr4S03783.80; Pyruvate cytosolic isozyme-like; B9ST42; AT3G52990.1; III; -; -; -
6:7:8:9:10; Jcr4S01891.30; Pyruvate decarboxylase; B9SWY1; AT4G33070.1; I; -2.3257702; -3; -4.1622714
6:7:8:9:10; Jcr4S00504.20; Pyruvate decarboxylase isozyme 1-like; B9S976; AT4G33070.1; II; -; -; -
6:7:8; Jcr4S00575.60; Pyruvate dehydrogenase e1 alpha subunit; B9S2H9; AT1G59900.1; -; -; -; -
6:7:8:9; Jcr4S00312.60; Pyruvate dehydrogenase e1 component subunit alpha-like; B9RNK3; AT1G01090.1; II; -; -; -
6:7:8:9; Jcr4S00225.120; Pyruvate dehydrogenase e1 component subunit beta; B9S0Z5; AT2G34590.1; III; -; -; -
6:7:8:9; Jcr4S00168.90; Pyruvate dehydrogenase e1 component subunit mitochondrial; B9RFW4; AT5G50850.1; -; -; -; -
6:7:8:9:10; Jcr4S03116.20; Pyruvate kinase; B9RGK5; AT5G08570.1; III; -; -; -2.553691
8; Jcr4S00467.100; Pyruvate kinase; B9SBM7; AT5G56350.1; -; -; -; -
6:7:8:9; Jcr4S00001.200; Pyruvate kinase isozyme chloroplastic-like; B9RIP4; AT5G52920.1; III; -; -; -
6:7:8:9; Jcr4S07085.10; Pyruvate kinase isozyme chloroplastic-like; B9S7Y4; AT3G22960.1; -; -; -; -
6:7:8:9; Jcr4S07494.70; Pyruvate kinase isozyme chloroplastic-like; B9RTH5; AT3G22960.1; I; -; -; -
6; Jcr4S04160.60; Pyruvate kinase isozyme chloroplastic-like; B9S7Y5; AT3G22960.1; I; -; -; -
6; Jcr4S19776.10; Ribose-phosphate pyrophosphokinase 1; B9SNS3; AT2G35390.2; -; -; -; -
6; Jcr4S06264.30; Ribose-phosphate pyrophosphokinase chloroplastic-like; B9SKI9; AT2G44530.1; -; -; -; -
6:7:8:9:10; Jcr4S00559.30; Ribulose-bisphosphate carboxylase oxygenase large subunit; G1D766; ATCG00490.1; V; 3.8126299; -; -
6:7:8:9:10; gi|225544132; Ribulose-1,5-bisphosphate carboxylase oxygenase large subunit; G1D766; ATCG00490.1; -; -; -; -
6; Jcr4S05146.30; Sal1 phosphatase-like; B9T8E1; AT5G63980.1; -; -; -; -
6:7:8:9:10; Jcr4S03304.50; Serine hydroxymethyltransferase; B9S9Y7; AT4G13930.1; II; -; -1.493332; -
6:7:8:9; Jcr4S05073.10; Serine hydroxymethyltransferase; B9SMK7; AT4G37930.1; IV; -; -; -
6:7:8:9:10; Jcr4S01523.120; Sorbitol dehydrogenase; B9R9I0; AT5G51970.1; III; -; -; -
6:7:8:9:10; Jcr4S02872.90; Sorbitol dehydrogenase; B9R9I0; AT5G51970.1; III; 1.1886616; -; -
6:7:8:9; Jcr4S28276.10; Starch branching enzyme ii; B9T792; AT2G36390.1; III; -; -; -
6:7:8:9; Jcr4S09082.10; Succinate dehydrogenase; B9SWW3; AT5G66760.1; II; -; -; -
6:7:9; Jcr4S11801.30; Succinate dehydrogenase; B9SLS5; AT5G40650.1; -; -; -; -
8:9; Jcr4S01405.50; Succinate-semialdehyde mitochondrial-like; B9SUZ1; AT1G79440.1; V; -; -; -
6:7:8; Jcr4S00125.80; Succinyl-; B9RL91; AT5G23250.1; -; -; -; -
7; Jcr4S11430.10; Sucrose phosphate synthase; B9T123; AT5G20280.1; -; -; -; -
6:7:8; Jcr4S00093.110; Sucrose synthase 1; B9RR41; AT3G43190.1; I; -; -; -
6:7:8:9; Jcr4S02947.70; Sucrose synthase 2; B9RT94; AT5G49190.1; III; -; -; -
6:7:8:9; Jcr4S02562.30; Sucrose synthase 3; B9SAU6; AT4G02280.1; III; -; -; -
6:7:8:9; Jcr4S03359.20; Sucrose synthase 6; B9SJX1; AT1G73370.1; III; -; -; -
6:7:8:9; Jcr4S01020.50; Transaldolase-like protein; B9RG09; AT5G13420.1; I; -; -; -
6:7:8; Jcr4S00903.50; Transaldolase-like protein; B9T2V8; AT1G12230.2; I; -; -; -
6:7:8:9:10; Jcr4S00057.90; Transketolase; B9RDA1; AT2G45290.1; II; -; -; -2.8161923
6:7:8:9:10; Jcr4S06428.10; Triosephosphate cytosolic; B9T4H8; AT3G55440.1; II; -; -; -
6:7:8:9:10; Jcr4S02839.20; Triosephosphate isomerase; B9STC9; AT2G21170.1; II; -; -; -
6:7:8:9:10; Jcr4S19239.10; Triosephosphate isomerase-like protein type I; B9T4H8; AT3G55440.1; II; -; -; -3.0122783
6:7; Jcr4S00150.220; Udp-glucose 4-epimerase; B9RCT8; AT1G12780.1; -; -; -; -
6:7:8:9; Jcr4S00282.100; Udp-glucose 6-dehydrogenase; B9REG0; AT3G29360.1; II; -; -; -
6:7:8:9:10; Jcr4S00684.10; Udp-glucose pyrophosphorylase; B9SKS5; AT5G17310.2; II; -; -; -
6; Jcr4S19521.10; Udp-sugar pyrophosphorylase-like; B9S2T0; AT5G52560.1; -; -; -; -
6; Jcr4S00335.30; Udp-sulfoquinovose chloroplastic-like; B9RRR0; AT4G33030.1; -; -; -; -
6:7:8:9; Jcr4S05019.30; Uridine diphosphate glucose dehydrogenase; B9RLR7; AT3G29360.1; II; -; -; -
6:7:8:9; Jcr4S11428.50; Uridine diphosphate glucose dehydrogenase; B9REG0; AT5G15490.1; II; -; -; -
6:8:9; Jcr4S01995.30; Xylose isomerase-like; B9T2E0; AT5G57655.2; -; -; -; -

identified. The former was identified in Stages 6-9 and the latter in Stages 6-7, indicating an important role for sucrose synthase in the breakdown of sucrose during development. In transcriptomic studies of J. curcas, transcripts for sucrose synthase and neutral invertases were expressed at higher levels during the early development and filling stages, while cell wall invertase genes were expressed at high levels at the early filling stage (Jiang et al., 2012).
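For reference, sucrose synthase and invertase cleave sucrose by different reactions. The textbook equations below (standard biochemistry, not measured in this study) show that sucrose synthase yields UDP-glucose directly, whereas invertase releases free hexoses that must be phosphorylated before they enter glycolysis.

```latex
\begin{aligned}
\text{Sucrose synthase:}\quad & \text{Sucrose} + \text{UDP} \rightleftharpoons \text{UDP-glucose} + \text{Fructose}\\
\text{Invertase:}\quad & \text{Sucrose} + \text{H}_2\text{O} \longrightarrow \text{Glucose} + \text{Fructose}
\end{aligned}
```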

All glycolytic pathway enzymes were identified, including phosphoglyceromutase, which was not identified in the proteome of J. curcas endosperm plastids (Pinheiro et al., 2013). In the endosperm plastids of J. curcas, all proteins of the glycolytic pathway except phosphoglyceromutase were identified, indicating that this organelle may be able to form pyruvate through a complete glycolytic pathway and also from phosphoenolpyruvate imported from the cytosol through the phosphate/phosphoenolpyruvate translocator, which was also found (Pinheiro et al., 2013). Here, the phosphate/phosphoenolpyruvate translocator precursor was also identified. In integument plastids of J. curcas seeds, all the enzymes of the glycolytic pathway were found, indicating that plastids from both lipid-storing (endosperm) and non-lipid-storing (integument) tissues are able to form phosphoenolpyruvate via a complete glycolytic pathway. The glycolytic pathway supplies the seed with ATP, carbon compounds and reducing power in the form of NADH necessary for FA synthesis. The fundamental role of glycolysis in lipid biosynthesis can be seen, for example, in studies showing a positive correlation between increased oil content and increased expression of glycolytic enzymes (Loei et al., 2013; Troncoso-Ponce et al., 2009). In conclusion, complete cytosolic and plastidic glycolytic pathways must be present in J. curcas seeds, as indicated by transcriptomic studies (Jiang et al., 2012), contributing to the availability of carbon compounds such as pyruvate, as well as ATP and NADH, for FA synthesis.
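The supply of pyruvate, ATP and NADH attributed to glycolysis above can be summarised by the textbook net equation for one molecule of glucose. This is a standard stoichiometry, not a flux measured in this study, and the ATP balance differs when the pathway starts from hexose phosphates rather than free glucose.

```latex
\text{Glucose} + 2\,\text{NAD}^{+} + 2\,\text{ADP} + 2\,\text{P}_i \longrightarrow 2\,\text{Pyruvate} + 2\,\text{NADH} + 2\,\text{H}^{+} + 2\,\text{ATP} + 2\,\text{H}_2\text{O}
```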

Most of the enzymes involved in the oxidative pentose phosphate (OPP) pathway were found in this study, including glucose-6-phosphate isomerase, transketolase, transaldolase, fructose-bisphosphate aldolase, fructose-1,6-bisphosphatase and 6-phosphofructokinase. Transcripts for all enzymes of the OPP pathway were identified in developing J. curcas seeds (Jiang et al., 2012; King et al., 2011). The pentose phosphate pathway can supply the seed with carbon compounds for the synthesis of pyruvate and, in addition, with reducing power in the form of NADPH necessary for FA biosynthesis. Increased production of pyruvate and NADPH, mediated by the up-regulation of some OPP enzymes, could be one of the reasons for the high oil content of oil palm fruit (Loei et al., 2013). In Helianthus annuus, the plastidic OPP pathway was found to provide all the NADPH needed for FA synthesis without flux through NADP-dependent malic enzyme (Alonso et al., 2007). In contrast, in developing maize embryos NADP-dependent malic enzyme was shown to produce one-third of the carbon and NADPH required for FA synthesis (Alonso et al., 2010). Here, NADP-dependent malic enzyme was not identified, although its transcript was present in developing seeds (Jiang et al., 2012), suggesting that in J. curcas the NADPH necessary for FA synthesis may be supplied entirely by the OPP pathway. Thus, the OPP and glycolytic pathways together can ensure the production of the pyruvate, ATP and reducing power required for FA biosynthesis.
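The NADPH supply invoked above comes from the oxidative phase of the pathway (glucose-6-phosphate dehydrogenase, 6-phosphogluconolactonase and 6-phosphogluconate dehydrogenase). Its textbook net reaction, given here for reference and not measured in this study, yields two NADPH per glucose-6-phosphate:

```latex
\text{Glucose-6-phosphate} + 2\,\text{NADP}^{+} + \text{H}_2\text{O} \longrightarrow \text{Ribulose-5-phosphate} + 2\,\text{NADPH} + 2\,\text{H}^{+} + \text{CO}_2
```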

Pyruvate, which is generated in the final step of glycolysis and from intermediates of the OPP pathway, is the precursor of acetyl-CoA, the substrate for de novo FA synthesis. Acetyl-CoA cannot readily cross membranes (Oliver et al., 2011) and is therefore synthesized by two pathways: mainly by the pyruvate dehydrogenase complex (PDHC) and also by acetyl-CoA synthetase. The PDHC is composed of three enzymes (pyruvate dehydrogenase, dihydrolipoyllysine-residue acetyltransferase and dihydrolipoyl dehydrogenase), all of which were identified here; one of them, dihydrolipoyl dehydrogenase, was detected up to Stage 10 and was down-regulated during endosperm development. This indicates that the PDHC is important for acetyl-CoA production throughout endosperm development, especially in the initial stages. Acetyl-CoA synthetase, which produces acetyl-CoA from acetate as an alternative to the PDHC, was also identified but, unlike the PDHC, it was detected only up to Stage 7, indicating that its main role is in the stages in which the demand for acetyl-CoA for FA synthesis is highest. In conclusion, acetyl-CoA is supplied by the PDHC throughout endosperm development, with an additional contribution from acetyl-CoA synthetase in the initial stages, when the demand for carbon compounds used in FA synthesis is higher.
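The two acetyl-CoA-forming routes contrasted above can be written out with textbook equations (standard biochemistry, not measured here). The acetate consumed by acetyl-CoA synthetase is assumed to come from pyruvate decarboxylase and aldehyde dehydrogenase, the enzymes of the PDHC bypass discussed in Section 4.5.2.

```latex
\begin{aligned}
\text{PDHC:}\quad & \text{Pyruvate} + \text{CoA} + \text{NAD}^{+} \longrightarrow \text{Acetyl-CoA} + \text{CO}_2 + \text{NADH}\\
\text{Pyruvate decarboxylase:}\quad & \text{Pyruvate} + \text{H}^{+} \longrightarrow \text{Acetaldehyde} + \text{CO}_2\\
\text{Aldehyde dehydrogenase:}\quad & \text{Acetaldehyde} + \text{NAD(P)}^{+} + \text{H}_2\text{O} \longrightarrow \text{Acetate} + \text{NAD(P)H} + 2\,\text{H}^{+}\\
\text{Acetyl-CoA synthetase:}\quad & \text{Acetate} + \text{ATP} + \text{CoA} \longrightarrow \text{Acetyl-CoA} + \text{AMP} + \text{PP}_i
\end{aligned}
```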

Starch synthesis in plants begins with the formation of adenosine diphosphate glucose (ADP-glucose) by ADP-glucose pyrophosphorylase, an enzyme composed of two large and two small subunits, each encoded by distinct genes (Jiang et al., 2012). Transcripts for these subunits were found in earlier stages of developing seeds (Jiang et al., 2012), but none of the subunits were identified here, possibly because they are required mainly at more advanced stages. The only enzyme related to starch synthesis identified was 1,4-alpha-glucan branching enzyme. In addition, enzymes related to starch degradation, such as starch phosphorylases, were identified. These results support the observation of Jiang et al. (2012) that the net starch content of J. curcas seeds declines during development.
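For reference, the committed step of starch synthesis and the phosphorolytic step of starch degradation mentioned above correspond to the following textbook reactions (not measured in this study):

```latex
\begin{aligned}
\text{ADP-glucose pyrophosphorylase:}\quad & \text{Glucose-1-phosphate} + \text{ATP} \longrightarrow \text{ADP-glucose} + \text{PP}_i\\
\text{Starch phosphorylase:}\quad & (\alpha\text{-1,4-glucan})_{n} + \text{P}_i \longrightarrow (\alpha\text{-1,4-glucan})_{n-1} + \text{Glucose-1-phosphate}
\end{aligned}
```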

4.5.2 Proteins related to lipid metabolism

Most of the interest in J. curcas stems from its potential as a source of raw material for biodiesel production (Openshaw, 2000). However, understanding the mechanisms of lipid metabolism, and identifying and genetically manipulating the genes responsible for oil biosynthesis, are prerequisites for producing varieties with increased oil yield and modified oil composition (Natarajan and Parani, 2011). In plants, triacylglyceride biosynthesis occurs in three main steps: (a) synthesis of FAs in plastids, (b) transport of FAs to the ER and (c) synthesis of triacylglycerides. In this study, we identified various enzymes involved in these biosynthetic pathways (Table 4).

Table 4: Proteins related to lipid metabolism identified in the endosperm of developing Jatropha curcas seeds. Column 1 lists the developmental stages in which each protein was detected. Column 2 gives the protein accession number in the genome. Column 3 gives the Blast2GO description of the protein. Columns 4 and 5 give the best Ricinus communis (UniProt) and Arabidopsis thaliana (TAIR) homologues of the Jatropha curcas protein, respectively. Column 6 gives the protein cluster to which the protein was assigned. Columns 7, 8 and 9 give the log2 ratios of the spectral counts of the protein at Stages 8, 9 and 10, respectively, relative to Stage 6.

Developmental stages; Protein ID; Description; Best Ricinus hit (UniProt); Best Arabidopsis hit (TAIR); Cluster; Log2 E8/E6; Log2 E9/E6; Log2 E10/E6
6:7:8; Jcr4S00575.60; Pyruvate dehydrogenase α; B9S2H9; AT1G59900.1; -; -; -; -
6:7:8:9; Jcr4S00312.60; Pyruvate dehydrogenase α; B9RNK3; AT1G01090.1; III; -; -; -
6:7; Jcr4S01594.10; Pyruvate dehydrogenase; B9RFW4; AT5G50850.1; -; -; -; -
6:7:8:9; Jcr4S00168.90; Pyruvate dehydrogenase; B9RFW4; AT5G50850.1; -; -; -; -
6:7:8:9; Jcr4S00225.120; Pyruvate dehydrogenase β; B9S0Z5; AT2G34590.1; IV; -; -; -
6:7:8:9; Jcr4S01954.30; Dihydrolipoyllysine-residue acetyltransferase; B9ST02; AT3G25860.1; III; -; -; -
6:7:8:9:10; Jcr4S00306.100; Dihydrolipoyllysine-residue acetyltransferase; B9S5V2; AT1G54220.1; I; -; -; -
6:7:8:9; Jcr4S02278.30; Dihydrolipoyllysine-residue acetyltransferase; B9SLH2; AT1G34430.1; III; -; -; -
6:7:8:9; Jcr4S15391.10; Dihydrolipoyllysine-residue acetyltransferase; B9SLH2; AT1G34430.1; IV; -; -; -
6:7; Jcr4U32796.10; Dihydrolipoyllysine-residue acetyltransferase; B9S5V2; AT1G54220.1; I; -; -; -
6:7:8:9:10; Jcr4S00112.160; Dihydrolipoyl dehydrogenase; B9RZN2; AT3G16950.2; II; -1.69046; -2.36257; -6.2854
6:7:8:9:10; Jcr4S00014.100; Dihydrolipoyl dehydrogenase; B9RZW7; AT1G48030.1; III; -; -; -
6:7; Jcr4S08822.30; Acetyl-coa synthetase; B9RGS6; AT5G36880.2; -; -; -; -
6:7:8:9:10; Jcr4S00504.20; Pyruvate decarboxylase isozyme; B9S976; AT4G33070.1; III; -; -; -
6:7:8:9:10; Jcr4S01891.30; Pyruvate decarboxylase; B9SWY1; AT4G33070.1; I; -2.32577; -3; -4.16227
6:7:8:9:10; Jcr4S00529.80; Alcohol dehydrogenase; B9T7D8; AT1G77120.1; III; -; -; -3.34239
6:7:8:9:10; Jcr4S02682.10; Alcohol dehydrogenase; B9T7D8; AT1G77120.1; III; -; -; -2.87811
6:7:8:9:10; Jcr4S02682.20; Alcohol dehydrogenase; B9SJK2; AT1G77120.1; III; -; -; -3.29241
6:7:8:9:10; Jcr4S02965.20; Alcohol dehydrogenase; B9SJJ8; AT1G77120.1; III; -; -; -2.87811
6:7:8:9; Jcr4S03028.20; Alcohol dehydrogenase; B9SJK2; AT1G77120.1; III; -; -; -2.87811
6:7:8; Jcr4S11865.10; Alcohol dehydrogenase; B9SJK2; AT1G77120.1; III; -; -; -2.87811
6:7; Jcr4S01206.10; Alcohol dehydrogenase; B9SUZ4; AT3G03080.1; -; -; -; -
6:7:8:9:10; Jcr4S03546.10; Aldehyde dehydrogenase; B9RB49; AT3G48000.1; IV; -; -; -
6:7:8:9:10; Jcr4S08423.20; Aldehyde dehydrogenase; B9RB49; AT1G23800.1; III; -; -; -2.37681
6:7:8:9:10; Jcr4S02497.10; Aldehyde dehydrogenase; B9RB49; AT3G48000.1; III; -; -; -2.37681
6:7:8:9; Jcr4S26962.10; Aldehyde dehydrogenase; B9S2Y3; AT1G44170.1; -; -; -; -
8:9; Jcr4S02532.80; Aldehyde dehydrogenase; B9T896; AT1G54100.1; -; -; -; -
6; Jcr4S01936.100; Aldehyde dehydrogenase; -; -; -; -; -; -
6:7:8:9; Jcr4S03449.40; Biotin carboxylase subunit of accase; B9S1E2; AT5G35360.3; I; -; -; -
6:7; Jcr4S25260.10; Biotin carboxylase subunit of accase; B9S1E2; AT5G35360.3; I; -; -; -
6:7:8:9; Jcr4S01232.50; Biotin carboxyl carrier protein accase; B9RM56; AT5G16390.1; III; -; -; -
6:7:9; Jcr4S02376.30; Biotin carboxyl carrier protein accase; B9RV21; AT3G15690.2; -; -; -; -
6; Jcr4S02867.30; Biotin carboxyl carrier protein accase; B9S484; AT3G56130.1; -; -; -; -
6; Jcr4S01222.30; Biotin carboxyl carrier protein accase; B9SJD0; AT5G15530.1; IV; -; -; -
6:7:8:9; Jcr4U29862.20; Carboxyltransferase subunit beta; G1D767; ATCG00500.1; IV; -; -; -
6:7:8:9; gi|225544133; Carboxyltransferase subunit beta; G1D767; ATCG00500.1; IV; -; -; -
6; Jcr4U32160.10; Carboxyltransferase subunit beta; G1D767; ATCG00500.1; IV; -; -; -
6:7:8:9; Jcr4S00416.90; Carboxyltransferase subunit alpha; B9SPE5; AT2G38040.1; IV; -; -; -
6:7:8:9; Jcr4S04735.20; Malonyl-coa: ACP acyltransferase; B9RRJ0; AT2G30200.1; -; -; -; -
6:7:8:9; Jcr4S00903.20; Ketoacyl-ACP synthase III (KAS III); A6N6J4; AT1G62640.1; IV; -; -; -
6:7:8; Jcr4S04655.30; Ketoacyl-ACP synthase II (KAS II); B9RR59; AT1G74960.2; -; -; -; -
8:9; Jcr4S08397.10; Ketoacyl-ACP synthase I (KAS I); B9RYN0; AT5G46290.3; -; -; -; -
6:7:8:9; Jcr4S02541.50; Ketoacyl-ACP synthase I (KAS I); Q41135; AT5G46290.3; IV; -; -; -
6:7:8:9:10; Jcr4S00273.200; Ketoacyl-acp reductase (kar); B9RMM7; AT1G24360.1; IV; -; -; -1.69072
6:7:8:9:10; Jcr4U30974.10; Ketoacyl-acp reductase (kar); B9RMM7; AT1G24360.1; IV; -; -; -1.70044
6:7:8:9:10; Jcr4S03253.10; Enoyl-ACP reductase; B9T5D7; AT2G05990.1; III; -; -; -
6:7:8:9; Jcr4S01960.30; Enoyl-ACP reductase; B9SVD7; AT2G05990.1; III; -; -; -
6:7:8:9:10; Jcr4S03305.10; Hydroxyacyl-ACP dehydrase; B9SDU7; AT5G10160.1; III; -; -; -
6; Jcr4S04843.20; Enoyl-coa hydratase; B9RPB0; AT4G13360.1; -; -; -; -
6:8; Jcr4S00100.170; Short chain type dehydrogenase; B9RAK4; AT1G24360.1; IV; -; -; -
6:7; Jcr4S03522.10; Acyl-[acyl-carrier-protein] desaturase; B5AY36; AT1G43800.1; -; -; -; -
6:7; Jcr4S03070.30; Acyl-[acyl-carrier-protein] desaturase; B9RM11; AT3G02630.1; I; -; -; -
6:7:8:9; Jcr4S01370.20; Acyl-[acyl-carrier-protein] desaturase; B9T0X0; AT2G43710.2; I; -; -; -
6:7; Jcr4S03070.40; Acyl-[acyl-carrier-protein] desaturase; B9RM11; AT3G02630.1; -; -; -; -
6:7:8:9; Jcr4S00539.70; Acyl-ACP thioesterase (fata); A9XK92; AT3G25110.1; III; -; -; -
6:7; Jcr4S01650.50; Acyl-ACP thioesterase; B9RDR2; AT1G01710.1; I; -; -; -
6:7:8:9; Jcr4S01110.50; Long-chain-fatty-acid coa ligase; B9RJ88; AT1G77590.1; III; -; -; -
7; Jcr4U29918.10; Acyl-coa binding protein; B9RF93; AT3G05420.2; -; -; -; -
6:7; Jcr4S02092.50; Acyl-coa binding protein; B9RF93; AT3G05420.2; -; -; -; -
6:7:8:9:10; Jcr4S00616.80; 3-hydroxyacyl-coa dehydrogenase; B9RT76; AT3G06860.1; V; -; -; -
6:7:8:9; Jcr4S00021.320; 3-hydroxyacyl-coa dehydrogenase; B9RKN5; AT4G29010.1; III; -; -; -
7:8; Jcr4S00783.20; Acyl coa oxidase; B9SH72; AT4G16760.1; -; -; -; -
6:7:8:9:10; Jcr4S01018.60; 3-ketoacyl- thiolase; B9RWL7; AT2G33150.1; III; -; -; -
6:7:8:9; Jcr4S15816.20; 3-ketoacyl- thiolase peroxisomal; B9S554; AT2G33150.1; III; -; -; -
6:7:8:9; Jcr4S00086.110; Phospholipase D; B9RV56; AT1G52570.1; -; -; -; -
6:7; Jcr4S00229.40; Phospholipase C; B9REB9; AT1G13680.1; III; -; -; -
8:9:10; Jcr4S02563.50; Isocitrate lyase; B9SUS2; AT3G21720.1; V; -; -; -
8:9:10; Jcr4S00100.200; Malate synthase glyoxysomal-like; B9RAK0; AT5G03860.1; -; -; -; -
6:7:8:9:10; Jcr4S01081.40; Glycerophosphoryl diester phosphodiesterase; B9SSQ8; AT5G55480.1; I; -3.39742; -4.030879; -
6:7; Jcr4S00546.40; Acyl carrier protein; B9REW6; AT1G54630.1; -; -; -; -
6; Jcr4S00106.80; Acyl carrier protein; B9RSZ1; AT1G54630.1; -; -; -; -
6:7:8:9:10; Jcr4S00132.120; Lipid-transfer protein; B9RMD9; AT1G27950.1; V; -; -; -
6; Jcr4S00472.40; Glycolipid transfer protein; B9SV19; AT2G33470.1; -; -; -; -
6; Jcr4S02256.40; Glycolipid transfer protein; B9SS59; AT2G33470.1; -; -; -; -
6:7:8:9; Jcr4S01299.10; Lipid binding; B9RIH2; -; -; -; -; -
6:7:8:9; Jcr4S03664.20; Lipid binding; B9RIH2; -; -; -; -; -
6:7:8:9; Jcr4U31158.20; Lipid binding; B9RIH3; -; -; -; -; -
6:7:9; Jcr4S16168.20; Lipid binding; B9RIH2; -; I; -; -; -

The first committed step in FA biosynthesis depends on the supply of acetyl-CoA, which is provided by the action of plastidial PDHC or acetyl-CoA synthetase (ACS) (Tovar-Mendez et al., 2003; Lin and Oliver, 2008). In this study, we identified isoforms of all the subunits of the PDHC (E1α and E1β pyruvate dehydrogenase, E2 dihydrolipoamide acetyltransferase and E3 dihydrolipoyl dehydrogenase) (Table 4). A PDHC bypass pathway, which uses ACS to convert the fermentation products ethanol and acetaldehyde into FAs, exists in Arabidopsis and other plants (Lin and Oliver, 2008; Oliver et al., 2009). It is well established that acetyl-CoA provided by the ACS bypass is not the major source for bulk FA synthesis; however, ACS has an important function in plant growth and development (Lin and Oliver, 2008). Besides ACS, we also identified isoforms of the other enzymes of this bypass pathway, i.e. pyruvate decarboxylase, alcohol dehydrogenase and aldehyde dehydrogenase (Table 4), revealing the existence of this bypass in developing J. curcas seeds.

In addition to these enzymes related to acetyl-CoA production, we identified a full set of proteins involved in de novo FA biosynthesis, including the different KAS isoforms, i.e. KAS I, KAS II and KAS III, among others (Table 4). KAS I is essential for seed oil content (Wu and Xue, 2010), while KAS II is useful for manipulating FA chain length (Allen et al., 1999). J. curcas oil has a high viscosity (Gübitz et al., 1999), which can be reduced by decreasing the amount of 18-carbon FAs in the oil (Natarajan and Parani, 2011). Previous reports showed that this can be achieved by silencing KAS II, which converts palmitoyl-ACP to stearoyl-ACP (Aghoram et al., 2006; Nguyen and Shanklin, 2009). Alternatively, the activity of palmitoyl-ACP thioesterase (encoded by FatA) (Table 4) could be increased to accelerate the cleavage of palmitic acid from palmitoyl-ACP before its conversion to stearoyl-ACP (Dormann et al., 2000). Although transcriptomic studies (Costa et al., 2010; Jiang et al., 2010) have detected transcripts for different desaturases, including stearate desaturase, oleate desaturase (FAD2) and linoleate desaturase (FAD3), we identified only homologues of the Arabidopsis stearoyl-ACP desaturase (SAD), the enzyme that catalyzes the first desaturation step in FA metabolism (Table 4). A high degree of unsaturation can impair the oxidative stability and ignition quality of biodiesel (Knothe, 2005); hence a high content of unsaturated FAs in J. curcas oil is undesirable for biodiesel production. Reduced unsaturation through antisense expression of the SAD gene has already been achieved in transgenic Brassica napus (Knutzon et al., 1992), and SAD gene silencing in J. curcas seeds could be an efficient strategy to reduce the unsaturation of its oil (Natarajan and Parani, 2011).
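The division of labour among the KAS isoforms described above can be made explicit with a short sketch. It assumes the textbook substrate ranges (KAS III for the first condensation of the acetyl primer with malonyl-ACP, KAS I for elongation from C4 up to C16, KAS II for C16 to C18); the function names are hypothetical, and the model ignores isoform overlap and the cofactors of the reductases.

```python
"""Hypothetical sketch of plastidial de novo FA elongation.

Assumed textbook KAS specificities (not measured in this study):
  KAS III: acetyl primer (C2) + malonyl-ACP      -> C4
  KAS I:   C4 ... C14 acyl-ACP + malonyl-ACP     -> up to C16 (palmitoyl-ACP)
  KAS II:  C16 palmitoyl-ACP + malonyl-ACP       -> C18 (stearoyl-ACP)
"""


def condensing_enzyme(chain_length: int) -> str:
    """Return the KAS isoform assumed to elongate an acyl chain of this length."""
    if chain_length == 2:
        return "KAS III"
    if 4 <= chain_length <= 14:
        return "KAS I"
    if chain_length == 16:
        return "KAS II"
    raise ValueError(f"no KAS is assumed to elongate a C{chain_length} chain")


def elongation_steps(final_length: int) -> list[str]:
    """KAS isoforms used, in order, to build an acyl-ACP of final_length carbons.

    Each two-carbon cycle starts with a KAS condensation and is completed by
    ketoacyl-ACP reductase, hydroxyacyl-ACP dehydrase and enoyl-ACP reductase.
    """
    if final_length % 2 or not 4 <= final_length <= 18:
        raise ValueError("expected an even chain length between 4 and 18")
    steps = []
    length = 2  # the acetyl primer
    while length < final_length:
        steps.append(condensing_enzyme(length))
        length += 2
    return steps


if __name__ == "__main__":
    print("16:0-ACP:", elongation_steps(16))
    print("18:0-ACP:", elongation_steps(18))
```

Under these assumptions, building 16:0-ACP takes seven condensations (one by KAS III and six by KAS I) and 18:0-ACP needs one more, by KAS II. Only that last step separates the C16 and C18 products, which is why silencing KAS II, or releasing 16:0 from ACP with a thioesterase before that step, targets the C18 fraction.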

Once synthesized and released from ACP, free FAs are esterified to CoA by long-chain acyl-CoA synthetases (LCACS), producing the corresponding acyl-CoAs, which are thereby activated for transport to the ER (Natarajan and Parani, 2011). Besides an LCACS, we also identified two acyl-CoA binding proteins (ACBPs) (Table 4), which are necessary to protect the acyl-CoAs during their transport to the ER (Natarajan and Parani, 2011). Previously, transcriptomic studies of J. curcas reported multiple transcripts for both LCACS and ACBP (Natarajan and Parani, 2011; King et al., 2011). In the ER, the FAs are added to the glycerol backbone to complete triacylglyceride (oil) biosynthesis by enzymes such as glycerol-3-phosphate acyltransferase (GPAT), lysophosphatidic acid acyltransferase (LPAAT) and diacylglycerol acyltransferase (DGAT). So far, none of the proteomic analyses of J. curcas seeds (Liu et al., 2009; Yang et al., 2009; Liu et al., 2011; Liu et al., 2013; Booranasrisak et al., 2013), including this study, has identified these enzymes, although their transcripts were detected in several transcriptomic studies (Costa et al., 2010; Natarajan et al., 2011; Jiang et al., 2012). Similarly, proteomic analyses of developing Ricinus communis seeds either failed to identify these enzymes (Houston et al., 2009) or identified only GPAT (Nogueira et al., 2013).
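The activation and assembly steps described above follow the acyl-CoA-dependent (Kennedy) pathway, summarised below with textbook reactions for reference. LPAAT is lysophosphatidic acid acyltransferase; the phosphatidic acid phosphatase (PAP) step is included only to complete the sequence and is not discussed or identified in this study.

```latex
\begin{aligned}
\text{LCACS:}\quad & \text{Fatty acid} + \text{CoA} + \text{ATP} \longrightarrow \text{Acyl-CoA} + \text{AMP} + \text{PP}_i\\
\text{GPAT:}\quad & \text{Glycerol-3-phosphate} + \text{Acyl-CoA} \longrightarrow \text{Lysophosphatidic acid} + \text{CoA}\\
\text{LPAAT:}\quad & \text{Lysophosphatidic acid} + \text{Acyl-CoA} \longrightarrow \text{Phosphatidic acid} + \text{CoA}\\
\text{PAP:}\quad & \text{Phosphatidic acid} + \text{H}_2\text{O} \longrightarrow \text{Diacylglycerol} + \text{P}_i\\
\text{DGAT:}\quad & \text{Diacylglycerol} + \text{Acyl-CoA} \longrightarrow \text{Triacylglyceride} + \text{CoA}
\end{aligned}
```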

Besides the proteins related to lipid biosynthesis, we also identified proteins involved in lipid degradation (Table 4), including enzymes of the β-oxidation pathway, the glyoxylate cycle enzymes isocitrate lyase and malate synthase, and lipases such as phospholipase C and phospholipase D. In Brassica napus, where the embryo is the main site of oil deposition, at least 10% of the triacylglycerides was found to be lost during the desiccation phase of seed development (Chia et al., 2005). The authors showed that developing B. napus embryos degrade FAs and that a complete FA catabolic pathway is active in the embryos during the main period of lipid deposition. Similar observations were made in the proteomic analysis of the endosperm of developing R. communis seeds (Nogueira et al., 2013), where the authors identified an array of proteins involved in oil mobilization. In a recent study of developing J. curcas seeds (Booranasrisak et al., 2013), the up-regulation of a transcript related to lipid catabolism, 4-coumarate:CoA ligase, was associated with a decrease in FA content in the seed kernel. The authors suggested that during development the oil in the endosperm is mobilized and that β-oxidation intermediates may be used in other metabolic processes. In a proteomic analysis of the inner integument (data not shown), besides proteins related to lipid catabolism, we also identified an array of peptidases of different mechanistic classes involved in developmental programmed cell death (PCD), which channels carbon and nitrogen sources to the developing endosperm, a process previously described for the nucellus of developing castor seeds (Nogueira et al., 2012). The identification of proteins related to lipid degradation (Table 4) and of peptidases of different mechanistic classes (Table 6) in the present study and in the inner integument of J. curcas seeds suggests that reserve mobilization during seed development could be a common feature of J. curcas and other oilseeds, providing energy for the various metabolic activities occurring during seed development.
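The carbon that the degradation pathways above could recover can be estimated with textbook stoichiometry. The sketch below uses hypothetical helper names and assumes a saturated, even-chain acyl-CoA fully oxidised in the peroxisome. It counts the β-oxidation cycles, the acetyl-CoA released, the NADH formed by 3-hydroxyacyl-CoA dehydrogenase and the H2O2 formed by acyl-CoA oxidase, and then the succinate that the glyoxylate cycle (isocitrate lyase plus malate synthase) can make from two acetyl-CoA at a time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaOxidationYield:
    cycles: int      # rounds of oxidation, hydration, dehydrogenation, thiolysis
    acetyl_coa: int  # acetyl-CoA released in total
    nadh: int        # one per cycle, from 3-hydroxyacyl-CoA dehydrogenase
    h2o2: int        # one per cycle: peroxisomal acyl-CoA oxidase reduces O2


def peroxisomal_beta_oxidation(carbons: int) -> BetaOxidationYield:
    """Textbook yield for a saturated, even-chain acyl-CoA (assumption)."""
    if carbons < 4 or carbons % 2:
        raise ValueError("expected an even chain length of at least 4")
    cycles = carbons // 2 - 1  # the final thiolysis releases two acetyl-CoA
    return BetaOxidationYield(cycles=cycles, acetyl_coa=carbons // 2,
                              nadh=cycles, h2o2=cycles)


def glyoxylate_cycle_succinate(acetyl_coa: int) -> tuple[int, int]:
    """Succinate formed (two acetyl-CoA each) and acetyl-CoA left over."""
    return divmod(acetyl_coa, 2)


if __name__ == "__main__":
    palmitoyl = peroxisomal_beta_oxidation(16)
    succinate, leftover = glyoxylate_cycle_succinate(palmitoyl.acetyl_coa)
    print(palmitoyl, "succinate:", succinate, "leftover acetyl-CoA:", leftover)
```

For palmitoyl-CoA this gives seven cycles and eight acetyl-CoA, which the glyoxylate cycle can convert into four succinate; real fluxes in the seed are not implied.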

Among the 80 proteins related to lipid metabolism identified in this study, 55 were present in three or more developmental stages, while 25 were found in only one or two stages (Table 4), showing that lipid metabolism occurs throughout seed development. Of the 80 proteins, 52 were assigned to one of the five protein clusters (Supplementary Table IV), while only 13 were significantly differentially expressed relative to Stage 6. Proteins grouped in Cluster 4 (the cluster with constant expression) include components of ACCase, KAS III and KAS I. ACCase catalyzes the first committed step in long-chain FA biosynthesis, while KAS III and KAS I are important for chain elongation. These results suggest that FA synthesis up to 16:0-ACP could be a constant process throughout the development of J. curcas seeds.
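The split of Table 4 into proteins detected in three or more stages versus one or two stages can be reproduced mechanically from the stage column. The sketch below is illustrative only: the helper names are hypothetical and `SAMPLE` holds just five rows copied from Table 4, not the full table.

```python
# Hypothetical helper: classify proteins by the number of developmental stages
# (6 to 10) in which they were detected, as summarised for Table 4.
# SAMPLE holds five rows copied from Table 4; it is not the full table.


def parse_stages(column: str) -> set[int]:
    """Turn a stage column such as '6: 7: 8: 9:10' into {6, 7, 8, 9, 10}."""
    return {int(token) for token in column.split(":") if token.strip()}


def split_by_persistence(table: dict[str, str], min_stages: int = 3):
    """Return (proteins seen in >= min_stages stages, proteins seen in fewer)."""
    persistent = sorted(p for p, col in table.items()
                        if len(parse_stages(col)) >= min_stages)
    transient = sorted(p for p, col in table.items()
                       if len(parse_stages(col)) < min_stages)
    return persistent, transient


SAMPLE = {
    "Jcr4S00112.160": "6: 7: 8: 9:10",  # dihydrolipoyl dehydrogenase
    "Jcr4S08822.30": "6:7",             # acetyl-CoA synthetase
    "Jcr4S00903.20": "6: 7: 8: 9",      # KAS III
    "Jcr4S08397.10": "8:9",             # KAS I
    "Jcr4S02563.50": "8:9:10",          # isocitrate lyase
}

if __name__ == "__main__":
    persistent, transient = split_by_persistence(SAMPLE)
    print(f"{len(persistent)} in >= 3 stages, {len(transient)} in 1-2 stages")
```

Applied to all rows of Table 4, the same rule yields the stage-based counts reported in the text.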

4.5.3 Seed Storage and Desiccation-related proteins

SSPs are an important source of amino acids for animal feed. Analyses of J. curcas seed meal have shown that its protein composition is comparable to that of soybean meal, with a balanced content of essential amino acids except for lysine (Makkar et al., 1998a; Makkar et al., 1998b). Here we identified 16 storage proteins (Table 5) belonging to different classes of SSPs, such as 11S globulins (legumins), 7S globulins (vicilins), 2S albumins and glutelins. Transcriptomic analyses of developing J. curcas seeds (Costa et al., 2010; King et al., 2011; Jiang et al., 2012) also identified different classes of SSPs, the globulins being the most prominent. Protein fractionation of J. curcas seeds (Peralta-Flores et al., 2012) revealed that globulins and glutelins are the predominant SSPs in the seeds of this plant, followed by albumins and prolamins. Consistent with this report, among the SSPs identified in this study, a globulin (Jcr4S03723.20) and a glutelin (Jcr4S00279.80) showed the largest increase in deposition in mature seeds relative to Stage 6 (Table 5).

Table 5: Seed storage proteins (SSPs) and proteins related to seed maturation identified in the endosperm of developing Jatropha curcas seeds. Column 1 lists the developmental stages in which each protein was detected. Column 2 gives the protein accession number in the genome. Column 3 gives the Blast2GO description of the protein. Columns 4 and 5 give the best Ricinus communis (UniProt) and Arabidopsis thaliana (TAIR) homologues of the Jatropha curcas protein, respectively. Column 6 gives the protein cluster to which the protein was assigned. Columns 7, 8 and 9 give the log2 ratios of the spectral counts of the protein at Stages 8, 9 and 10, respectively, relative to Stage 6.

Developmental stages; Protein ID; Description; Best Ricinus hit (UniProt); Best Arabidopsis hit (TAIR); Cluster; Log2 E8/E6; Log2 E9/E6; Log2 E10/E6
6:7:8:9:10; Jcr4S01636.40; 11S globulin seed storage protein 2-like; B9SW16; AT1G03890.1; V; 3.041152; 3.360806; 4.11798
6:7:8:9:10; Jcr4S01636.60; Legumin a; Q9M4Q8; AT5G44120.3; V; -; 1.949687; 2.728928
6:7:8:9:10; Jcr4S01636.70; Legumin a; B9SDX6; AT5G44120.3; V; -; -; -
6:9:10; Jcr4U29577.10; Legumin a; B9SDX6; AT5G44120.3; V; 3.041152; 3.360806; 4.11798
9:10; Jcr4S15668.10; Legumin a; B9SDX6; AT5G44120.3; V; 3.041152; 3.360806; 4.11798
6:7:8:9:10; Jcr4S03723.20; Vicilin-like antimicrobial peptides 2-1; B9SFI3; AT3G22640.1; V; 3.519374; -; 6.202069
6:7:8:9:10; Jcr4S00603.20; Vicilin-like antimicrobial peptides 2-2; B9SK34; AT2G18540.1; V; 1.314381; -; 1.442222
6:7:8:9:10; Jcr4S03153.60; Vicilin-like antimicrobial peptides 2-3; B9SK34; AT4G36700.1; V; -; -; -
6:7:8:9:10; Jcr4S15278.20; Vicilin-like antimicrobial peptides 2-3; B9SK34; AT4G36700.1; V; -; -; -
8:9:10; Jcr4S00619.70; 2S albumin; B9SA35; -; V; -; -; -
6:7:8:9:10; Jcr4S00279.60; Glutelin type-A; B9T5E7; AT5G44120.3; V; 2.221236; 2.416626; 2.877723
6:7:8:9:10; Jcr4S00279.80; Glutelin type-A; B9T5E6; AT5G44120.3; V; -; 4.129283; 6.194757
6:7:8:9; Jcr4S00423.10; Nutrient reservoir; B9SYN7; AT2G28680.1; I; -2.483083; -; -
6:7:8:9; Jcr4S01617.40; Nutrient reservoir; B9SYN7; AT2G28680.1; I; -2.724366; -; -
6:7:8:9; Jcr4S08024.20; Nutrient reservoir; B9SYN7; AT1G07750.1; I; -2.308285; -2.910322; -
6:7:8:9; Jcr4S03933.20; Nutrient reservoir; B9SYN7; AT1G07750.1; I; -2.308285; -2.910322; -
9:10; Jcr4S00706.60; Late embryogenesis abundant domain-containing; B9RBC1; AT4G21020.1; -; -; -; -
10; Jcr4S00772.40; Late embryogenesis abundant domain-containing; B9S750; AT1G72100.1; -; -; -; -
6:7; Jcr4S01793.30; Late embryogenesis abundant group 2 isoform 1; B9T526; AT2G44060.1; -; -; -; -
6:7:8:9:10; Jcr4S00300.40; Late embryogenesis abundant protein; B9SRL2; AT2G36640.1; V; 2.632382; -; -
8; Jcr4S06898.40; Late embryogenesis abundant protein; B9RDY8; AT2G46140.1; -; -; -; -
8:9:10; Jcr4S05404.40; Late embryogenesis abundant protein 2-like; B9RV15; -; -; -; -; -
9:10; Jcr4S00025.50; Late embryogenesis abundant protein d-29-like; B9S010; -; -; -; -; -
8:9:10; Jcr4S02308.90; Late embryogenesis abundant protein d-34; B9S3Z7; AT3G22490.1; V; -; -; -
10; Jcr4S02308.80; Late embryogenesis abundant protein d-34-like; B9S3Z6; AT3G22490.1; -; -; -; -
7:8:9:10; Jcr4S01102.20; Late embryogenesis abundant; -; AT2G42560.1; V; -; -; -
7:8:10; Jcr4S16138.20; Seed maturation protein lea 4; -; AT5G06760.1; -; -; -; -
6:7:8:9:10; Jcr4S01519.40; Dehydrin; B9R8J3; AT1G20440.1; IV; -; -; -
10; Jcr4S02525.30; Dehydrin; B9S697; AT3G50980.1; -; -; -; -
8:9:10; Jcr4S00349.40; Dehydrin; B9RH90; AT2G21490.1; IV; -; -; -
10; Jcr4S28232.10; Oleosin; B9RRX4; AT2G25890.1; -; -; -; -
6:7:8:9:10; Jcr4S00009.70; Oleosin 1; B9RAW7; AT3G01570.1; V; -; -; -
8:10; Jcr4S06252.20; Oleosin 16 kda-like; B9RE45; AT3G18570.1; -; -; -; -
7:8:9:10; Jcr4S05992.20; Oleosin 2; Q5VKJ9; AT3G01570.1; V; -; -; -
6:7:8:9:10; Jcr4S01276.90; Oleosin low molecular weight isoform; Q5VKJ8; AT4G25140.1; V; 3.476438; -; 5.023333

Globulins are the most widely distributed SSPs, present both in the monocots and dicots, with two major classes, named 11s (Legumins) and 7s (Vicilins) globulins respectively (Shewry et al ., 1995). 7s globulin proteins are single chain without disulfide bonds, usually aggregate to form trimers while in contrast the 11s globulins are double chain proteins and aggregate to form hexamers. Globulins are usually deficient in sulfur containing amino acids and hence most strategies to improve the globulins are directed at increasing their sulfur content (Casey, 1999), however, according to the World Health Organization (WHO) standards, the globulin fraction of J. curcas seeds was found to have a low level of lysine and tryptophan amino acids (Peralta-Flores et al ., 2012). 7s globulins are distinguished from the 11s globulins due the high level of sulfur containing amino acids, that make them specialized reserve for sulfur containing amino acids to be mobilized for the newly form plantlet during germination (Tan-Wilson and Wilson, 2012). Besides their role as reserve proteins, 7s globulins (Vicilins) are widely recognized for their allergenic properties ( Teuber et al ., 1999; Wang et al ., 2002; Sanchez-Monge et al ., 2004). Among the total 16 reserve proteins identified in this study, 5 belong to the group of 11s globulins and 4 to the 7s globulins (Table 5). Consistent to their role as nutrient reservoirs, quantitative analysis of the MS data led us to group these proteins to Cluster 5, where they showed sharp quantitative differences in the deposition pattern, suggesting a genetic control in the expression of these different globulin proteins. 
For example, the 7S globulin Jcr4S03723.20 had more than 12 times more spectral counts at the mature stage than at Stage 6, whereas the spectral counts of the 7S globulin Jcr4S00603.20 were only 2.8 times higher at the mature stage. This suggests that each globulin may be mobilized differently during germination, as previously proposed for the differentially expressed globulins identified in developing R. communis seeds (Nogueira et al., 2013). Costa et al. (2010) reported that the transcripts of the 11S globulins and of an aspartyl peptidase, an enzyme known to be involved in processing storage protein precursors into mature proteins (Voigt et al., 1997), were among the 20 most highly expressed transcripts in developing J. curcas seeds. Consistent with this report, besides these 11S globulins we also identified an aspartyl peptidase (Table 6, Jcr4S00223.70) with a higher expression level than the other aspartic peptidases identified here. This peptidase was detected in all five developmental stages but, in contrast to most SSPs, its deposition decreased during seed maturation.
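The fold changes above, and the last three columns of Tables 6 and 7, are expressed as log2 ratios of spectral counts relative to Stage 6. As a minimal sketch, assuming plain spectral counts with no pseudocount (the exact normalization used in this analysis is not restated here), the conversion between fold change and log2 ratio can be written as follows; the function name log2_ratio is hypothetical.

```python
import math


def log2_ratio(count_stage: float, count_stage6: float) -> float:
    """Return log2(count at a given stage / count at Stage 6).

    Assumes both spectral counts are positive; how zero counts were
    handled in the original analysis is not stated, so they are rejected.
    """
    if count_stage <= 0 or count_stage6 <= 0:
        raise ValueError("spectral counts must be positive")
    return math.log2(count_stage / count_stage6)


# Fold changes quoted in the text (mature stage vs. Stage 6),
# expressed here with the Stage 6 count normalized to 1.
for protein_id, fold in [("Jcr4S03723.20", 12.0), ("Jcr4S00603.20", 2.8)]:
    print(f"{protein_id}: {fold}-fold -> log2 ratio {log2_ratio(fold, 1.0):.2f}")

# The reverse direction: a log2 ratio of -4.18293 (Jcr4S00223.70, E10/E6)
# corresponds to roughly an 18-fold decrease relative to Stage 6.
print(f"fold decrease: {2 ** 4.18293:.1f}")
```

Negative values in the tables therefore indicate fewer spectral counts than at Stage 6, and each unit on the log2 scale corresponds to a two-fold change.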

We also identified one member of the 2S albumin family of SSPs. 2S albumins represent another important class of SSPs, the first member of which to be sequenced was isolated from R. communis seeds (Sharief and Li, 1982). These proteins are important not only as nutrient reservoirs for seed germination but also as powerful allergens (Thorpe et al., 1988; Sirvent et al., 2012). 2S albumins are cysteine- and methionine-rich storage proteins and are considered specialized sulfur reserves to be used during germination (Costa et al., 2010).

Glutelins are another class of SSPs, of which we identified two members in this study (Table 5). Glutelins are the major storage proteins of the rice endosperm; in their mature form they consist of acidic and basic subunits, and at the primary sequence level they are highly homologous to the 11S globulins (Takaiwa et al., 1999). Like the globulins, J. curcas glutelins were found to contain low levels of lysine and tryptophan relative to the WHO standard (Peralta-Flores et al., 2012).

We identified four proteins (Jcr4S00423.10, Jcr4S01617.40, Jcr4S08024.20 and Jcr4S03933.20) that are classified as nutrient reservoirs and belong to the Cupin_2 superfamily. This diverse family of proteins shares a conserved barrel domain, which gave the family its name (cupa is the Latin term for a small barrel), and includes a wide variety of enzymes as well as non-enzymatic SSPs (Dunwell, 1998). Although similar proteins have been found in other species (Nogueira et al., 2013), their role as reserve proteins has not yet been validated experimentally. Because the deposition pattern of these proteins (grouped in Cluster I) differs from that of the other seed reserve proteins (all grouped in Cluster V), we suggest that they are not classical seed reserve proteins and may have specific roles during seed formation.

A number of proteins other than SSPs that are abundantly expressed during late embryogenesis, including LEA proteins and oleosins, were also identified in this study (Table 5). LEA proteins are hydrophilic and are involved in plant stress responses, including desiccation tolerance. Their expression is not restricted to embryogenesis: they are also expressed in pollen and in vegetative tissues in response to different kinds of stress (Amara et al., 2012). Of the LEA proteins identified in this study, only five were assigned to a cluster, three to Cluster V and two to Cluster IV, which is consistent with their role during desiccation. Three LEA proteins (Jcr4S00772.40, Jcr4S02308.80 and Jcr4S02525.30) were detected only in mature seeds, indicating a special role when seeds are in the full desiccation phase.

Besides the LEA proteins, we also identified five oleosin isoforms (Table 5). Oleosins are relatively small hydrophobic proteins associated with OBs, the organelles that store triacylglycerols. They are synthesized preferentially in seeds and are considered important for OB stability during desiccation and as binding sites for lipid-degrading enzymes during germination (Siloto et al., 2006). A previous study of the OB proteome of J. curcas seeds (Popluechai et al., 2011) identified three different oleosins, one of which was characterized at both the transcript and protein levels and proposed as a candidate marker for phylogenetic and breeding studies. Besides these three oleosins (Jcr4S00009.70, Jcr4S05992.20 and Jcr4S01276.90), we identified two further isoforms, one of which is specific to the mature seed stage, suggesting a special role in mature seeds. Proteomic analysis of developing R. communis endosperm and transcriptomic analysis of J. curcas seeds also showed the presence of five oleosin isoforms in seeds (King et al., 2011). The accumulation of oleosin transcripts in developing J. curcas seeds was previously shown to be related to TAG biosynthesis and desiccation tolerance (Xu et al., 2011). Consistent with these observations, three of the five oleosins identified here were grouped in Cluster V. Oleosin 3 (Jcr4S01276.90), previously proposed as a potential marker for breeding studies (Popluechai et al., 2011), was significantly differentially expressed in our study, with a higher deposition level at the mature seed stage than at the other developmental stages (Table 5).

4.5.4 Peptidases and peptidase inhibitors

Peptidases hydrolyze peptide bonds between internal or terminal amino acids of proteins (Tam et al., 2004). Through the ubiquitin/proteasome pathway they are involved in many aspects of plant development, including embryogenesis, photomorphogenesis, circadian rhythms, and flower, seed, fruit and trichome development (Vierstra, 2003). They also take part in degradative processes that remove unwanted, misfolded and damaged proteins (reviewed by Ingvardsen and Veierskov, 2001), and in the activation and/or maturation of proteins through specific proteolytic cleavages. Another important role of peptidases in seeds is the degradation of seed reserve proteins during germination to provide carbon and nitrogen to the growing embryo.

Hartley (1960) proposed that proteolytic enzymes could be classified by catalytic type. The peptidase database MEROPS currently extends this classification to seven mechanistic classes: serine, cysteine, aspartic, metallo, threonine, glutamic and asparagine. In this study we identified 122 different peptidases and peptidase inhibitors, corresponding to almost 7% of all identified proteins, which were mainly classified into five mechanistic classes, i.e. serine, cysteine, aspartic, metallo and threonine (Table 6). The most representative class was serine (24.4%), followed by metallo (22%), aspartic (19.5%), cysteine (16.3%) and threonine (12.2%) (Figure 18). Inhibitors were identified only for serine and cysteine peptidases.
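The percentages in Figure 18 are the share of each mechanistic class among all identified peptidases and peptidase inhibitors. A minimal sketch of such a tally follows, using a short hypothetical list of class labels rather than the full Table 6; the function name class_percentages is an assumption for illustration only.

```python
from collections import Counter


def class_percentages(classes: list[str]) -> dict[str, float]:
    """Percentage of entries in each mechanistic class, largest first."""
    counts = Counter(classes)
    total = sum(counts.values())
    return {cls: round(100 * n / total, 1) for cls, n in counts.most_common()}


# Hypothetical input: one mechanistic-class label per identified protein.
labels = ["Serine", "Serine", "Metallo", "Aspartic",
          "Serine", "Threonine", "Metallo", "Cysteine"]
print(class_percentages(labels))
```

Because each share is rounded to one decimal place, the values need not sum exactly to 100% for real data.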

The presence of peptidases in mature seeds suggests that these enzymes may be responsible for the initial cleavage of storage proteins during germination. In Vicia sativa, for example, globulin mobilization in germinating seeds was found to be initiated by cysteine peptidases stored in the protein bodies of embryo axes and cotyledons (Schlereth et al., 2000).

Table 6: Peptidases and peptidase inhibitors identified in the endosperm of developing Jatropha curcas seeds. Column 1 lists the developmental stages in which a particular protein was detected. Column 2 gives the protein accession number in the genome. Column 3 gives the Blast2GO description of the protein. Column 4 gives the mechanistic class to which the peptidase or peptidase inhibitor belongs. Column 5 gives the best MEROPS homologue of the Jatropha curcas protein. Column 6 gives the best Ricinus communis (UniProt) homologue of the Jatropha curcas protein. Column 7 shows the protein cluster to which the protein was assigned. Columns 8, 9 and 10 show the log2 ratios of the spectral counts of the protein at Stages 8, 9 and 10, respectively, to those at Stage 6.

Developmental stage | ProteinID | Description | Mechanistic class | Best MEROPS hit | Best Ricinus hit (UniProt) | Cluster | Log2 E8/E6 | Log2 E9/E6 | Log2 E10/E6
6:7 | Jcr4S00012.40 | Protein | Aspartic | MER171245 | B9RE23 | I | - | - | -
8:9 | Jcr4S00053.140 | Eukaryotic aspartyl protease | Aspartic | MER155849 | B9RTU6 | - | - | - | -
6:7:8:9:10 | Jcr4S00063.130 | Aspartic proteinase | Aspartic | MER134814 | B9RXH6 | III | - | - | -
6:7:8:9:10 | Jcr4S00223.70 | Aspartyl protease | Aspartic | MER119875 | B9RJV7 | I | -1.678543 | -2.38323 | -4.18293
6:7:8:9 | Jcr4S00547.20 | Aspartic proteinase | Aspartic | MER204669 | B9SVA7 | II | - | - | -
6 | Jcr4S00637.70 | complex | Aspartic | - | B9RF33 | - | - | - | -
6:7:8:9:10 | Jcr4S00739.60 | Aspartic proteinase -1 | Aspartic | MER170824 | B9RG92 | III | - | -2.03037 | -
6:7:8:9 | Jcr4S00806.10 | Peptidase aspartic | Aspartic | MER201398 | B9SCE5 | I | - | - | -
6:7:8 | Jcr4S01232.10 | Aspartic proteinase-like | Aspartic | MER169780 | B9RM62 | II | - | - | -
6:7:8:9 | Jcr4S01808.60 | Aspartic proteinase | Aspartic | MER171171 | B9RFR2 | III | - | - | -
6:7:8 | Jcr4S01901.50 | Aspartic proteinase-like | Aspartic | MER169780 | B9RM62 | II | - | - | -
6:7 | Jcr4S02642.50 | Aspartic proteinase nepenthesin-1 | Aspartic | MER135834 | B9SA53 | - | - | - | -
6:8 | Jcr4S02785.20 | Basic 7S globulin 2 precursor | Aspartic | MER177277 | B9SEI2 | - | - | - | -
7:9 | Jcr4S03578.20 | Dna damage-inducible protein 1 | Aspartic | MER242394 | B9SX98 | - | - | - | -
6:7:8 | Jcr4S05131.10 | Eukaryotic aspartyl protease | Aspartic | MER155847 | B9RTU6 | II | - | - | -
6:7:8 | Jcr4S05382.30 | Aspartic proteinase nepenthesin-2 | Aspartic | MER167471 | B9T7L5 | I | - | - | -
6 | Jcr4S05775.30 | Aspartic proteinase nepenthesin-2 | Aspartic | MER169623 | B9RNR9 | - | - | - | -
6:7:8:9:10 | Jcr4S06063.10 | Aspartic proteinase | Aspartic | MER184448 | B9SVA7 | IV | - | - | -
6 | Jcr4S06123.40 | Signal peptidase complex subunit 3b | Aspartic | - | B9RF33 | - | - | - | -
6:7:8:9:10 | Jcr4S07849.40 | Protein aspartic protease | Aspartic | MER178226 | B9SBG8 | - | - | - | -
6:7 | Jcr4S08435.20 | Proteasome subunit beta | Aspartic | MER004344 | B9RTN1 | - | - | - | -
6:7:8:9 | Jcr4S09253.20 | Aspartic proteinase | Aspartic | MER019996 | - | III | - | - | -
6:7:8:9:10 | Jcr4S24165.20 | Aspartic proteinase | Aspartic | MER184448 | B9SFR8 | III | - | - | -
7:9 | Jcr4U30986.20 | Dna damage-inducible protein 1 | Aspartic | MER242394 | B9SX98 | - | - | - | -
6:7:8:9 | Jcr4S00024.130 | Cysteine proteinase inhibitor | Cysteine | MER176362 | B9SYV2 | I | - | - | -
8 | Jcr4S00049.280 | Vacuolar-processing enzyme | Cysteine | MER135370 | B9RRV3 | IV | - | - | -
6 | Jcr4S00051.120 | Cysteine proteinases superfamily | Cysteine | MER158867 | B9RRA4 | - | - | - | -
6:7:8 | Jcr4S00066.90 | Cysteine proteinase | Cysteine | MER135672 | B9RMS9 | I | - | - | -
6 | Jcr4S00125.100 | Ubiquitin carboxyl-terminal hydrolase isozyme | Cysteine | MER119853 | B9S046 | - | - | - | -
6:7:8:9 | Jcr4S00157.60 | Ubiquitin carboxyl-terminal hydrolase | Cysteine | MER135798 | B9SIG8 | - | - | - | -
6 | Jcr4S00260.130 | OTU domain-containing protein | Cysteine | MER275761 | B9SLP0 | - | - | - | -
6:7:8:9 | Jcr4S00409.70 | Vacuolar-processing enzyme | Cysteine | MER000846 | B9RBP3 | IV | - | - | -
8 | Jcr4S00448.10 | Gamma-glutamyl hydrolase 2 | Cysteine | MER177165 | B9SEV4 | - | - | - | -
6:7:8:9 | Jcr4S01104.40 | Cysteine proteinase | Cysteine | MER004595 | B9RAQ2 | I | -2.308122 | -2.7697 | -
6:7:8:9 | Jcr4S01597.40 | Cysteine proteinase | Cysteine | MER158871 | B9RHA4 | I | - | - | -
6:7:8 | Jcr4S01609.40 | Cysteine proteinase | Cysteine | MER158866 | B9RYC1 | I | - | - | -
6:7:9 | Jcr4S01924.10 | Cysteine proteinase inhibitor | Cysteine | MER176362 | B9SYV2 | I | - | - | -
6:8:9:10 | Jcr4S02989.80 | Cysteine proteinase inhibitor | Cysteine | MER020315 | B9SHT3 | - | - | - | -
6:7:8:9 | Jcr4S03342.10 | Cysteine proteinase | Cysteine | MER050296 | B9R777 | I | -2.4667 | - | -
6:7:8:9 | Jcr4S05114.20 | Ubiquitin carboxyl-terminal hydrolase | Cysteine | MER167559 | B9T7A3 | V | - | - | -
8:9 | Jcr4S06367.20 | Cysteine proteinase | Cysteine | MER158854 | B9T558 | - | - | - | -
6:7:8:9 | Jcr4S16229.10 | Cysteine protease | Cysteine | MER134872 | B9RYC1 | II | -2.569615 | -3.37697 | -
6 | Jcr4S27020.10 | Cysteine proteinases superfamily | Cysteine | MER158867 | B9RRA4 | - | - | - | -
6:7:8 | Jcr4S00131.160 | 26s protease regulatory subunit | Metallo | MER278187 | B9RR13 | - | - | - | -
6:7:8:9 | Jcr4S00287.130 | Iaa-amino acid hydrolase ilr1 | Metallo | MER134840 | B9RQ74 | - | - | - | -
6:7:8:9 | Jcr4S00296.90 | 26s protease regulatory subunit s10b | Metallo | MER267816 | B9R7Z7 | I | - | - | -
6:7:8:9:10 | Jcr4S00343.100 | Leucine aminopeptidase chloroplastic | Metallo | MER172798 | B9STR1 | IV | - | - | -
6:7:8:9 | Jcr4S00385.110 | 26s protease regulatory subunit 6a homolog | Metallo | MER278187 | B9RFB5 | I | - | - | -
8 | Jcr4S00494.30 | Zn-dependent exopeptidases superfamily | Metallo | - | B9T119 | - | - | - | -
6:7 | Jcr4S00664.40 | Endoplasmic reticulum metallopeptidase | Metallo | MER179553 | B9RMF8 | - | - | - | -
6:7 | Jcr4S00721.10 | 26s proteasome non-atpase regulatory subunit | Metallo | MER125631 | B9R7G0 | - | - | - | -
7 | Jcr4S00865.10 | Probable xaa-pro aminopeptidase | Metallo | MER134696 | B9SGI3 | - | - | - | -
6:7:9 | Jcr4S00922.80 | 26s protease regulatory subunit 8 homolog a | Metallo | MER278187 | B9STQ0 | I | - | - | -
6:7:8 | Jcr4S01168.90 | Oligopeptidase a | Metallo | MER172955 | B9SMK4 | - | - | - | -
6:7:8:9 | Jcr4S01199.20 | 26s protease regulatory subunit | Metallo | MER278187 | B9SCE5 | I | - | - | -
6:7:8 | Jcr4S01222.40 | Probable mitochondrial-processing peptidase | Metallo | MER015251 | B9SJC9 | I | - | - | -
6 | Jcr4S02294.40 | Puromycin-sensitive aminopeptidase | Metallo | MER278149 | B9RQT2 | - | - | - | -
6:7:8:9 | Jcr4S02641.20 | Aminopeptidase | Metallo | MER171524 | B9RCE0 | I | - | - | -
6 | Jcr4S03273.20 | Nicalin precursor | Metallo | MER134807 | B9SCX6 | - | - | - | -
6:7:8 | Jcr4S03327.10 | Metalloendoproteinase | Metallo | MER168520 | B9RUG6 | I | - | - | -
6:7:8:9 | Jcr4S03552.30 | Iaa-amino acid hydrolase ilr1 | Metallo | MER189349 | B9SWZ5 | I | - | - | -
6 | Jcr4S03953.30 | Probable 26s proteasome non-atpase regulatory | Metallo | MER134701 | B9SLJ3 | - | - | - | -
6 | Jcr4S04439.20 | N-carbamoyl-l-amino acid hydrolase | Metallo | MER135015 | B9RTE0 | - | - | - | -
8:9 | Jcr4S07215.10 | Peptidase m20 m25 m40 family protein | Metallo | MER134650 | B9S1F8 | IV | - | - | -
6:7:8:9 | Jcr4S07307.20 | Probable xaa-pro aminopeptidase p-like | Metallo | MER134696 | B9SGI3 | - | - | - | -
6:7:8:9 | Jcr4S07799.10 | Aspartyl aminopeptidase | Metallo | MER134827 | B9RAJ0 | III | - | - | -
6:7:8:9:10 | Jcr4S08055.20 | Iaa-amino acid hydrolase ilr1-like 4-like | Metallo | MER122043 | B9RQ74 | I | - | - | -
6 | Jcr4S08174.20 | 26s proteasome regulatory subunit 4 homolog | Metallo | MER278187 | B9SJQ0 | - | - | - | -
6:7:8 | Jcr4S11084.40 | 26s protease regulatory subunit 7-like | Metallo | MER278187 | B9RR13 | - | - | - | -
6 | Jcr4S12171.10 | 26s proteasome regulatory subunit 4 homolog | Metallo | MER278187 | B9SJQ0 | - | - | - | -
6 | Jcr4S25727.20 | 26s proteasome regulatory subunit 4 homolog | Metallo | MER278187 | B9SJQ0 | - | - | - | -
6:7 | Jcr4S01671.20 | Proteasome subunit beta type | Non-peptidase | MER004348 | B9SK45 | I | - | - | -
6:7 | Jcr4S01697.60 | Mitochondrial-processing peptidase subunit alpha | Non-peptidase | MER169474 | B9RQC8 | - | - | - | -
6:7:8:9 | Jcr4S02821.10 | Proliferation-associated protein | Non-peptidase | MER176444 | B9SWY5 | IV | - | - | -
6 | Jcr4S17767.10 | Proteasome subunit beta type | Non-peptidase | MER168793 | B9RTN1 | - | - | - | -
6 | Jcr4S00056.140 | Proline iminopeptidase | Serine | MER178463 | B9S7S3 | - | - | - | -
7 | Jcr4S00079.140 | Serpin family protein | Serine | MER180116 | B9R7I8 | - | - | - | -
8 | Jcr4S00232.100 | Atp-dependent clp protease proteolytic subunit | Serine | MER169676 | B9RND4 | - | - | - | -
6:7:9 | Jcr4S00232.180 | Probable glutamyl chloroplastic | Serine | MER229658 | B9RNE4 | - | - | - | -
6:7:8:9 | Jcr4S00420.50 | Subtilisin-like protease-like | Serine | MER134925 | B9T6Y9 | I | - | - | -
6:7:8 | Jcr4S00620.10 | Serine carboxypeptidase ii-2 | Serine | MER172834 | B9SQY4 | - | - | - | -
6:7:8 | Jcr4S00627.40 | Serine carboxypeptidase-like | Serine | MER134949 | B9S819 | I | - | - | -
6 | Jcr4S00750.70 | Serine carboxypeptidase-like | Serine | MER172909 | B9SMP4 | - | - | - | -
10 | Jcr4S00794.30 | Serine carboxypeptidase-like | Serine | MER177028 | B9SH24 | - | - | - | -
6 | Jcr4S00836.10 | Serine carboxypeptidase-like | Serine | MER135405 | B9SJJ0 | - | - | - | -
6:7:8:9 | Jcr4S01228.20 | Subtilisin-like protease-like | Serine | MER134738 | B9R9K9 | III | - | - | -
6:7 | Jcr4S01323.10 | Subtilisin-like protease-like | Serine | MER135050 | B9R7A2 | - | - | - | -
6:7:8:9:10 | Jcr4S01500.30 | Tripeptidyl-peptidase 2-like | Serine | MER134728 | B9RIX4 | - | - | - | -
6:7:8:9 | Jcr4S01651.50 | Subtilisin-like protease-like | Serine | MER134925 | B9SAV8 | I | -3.736966 | - | -
6:7 | Jcr4S01708.20 | Alpha-amylase/subtilisin inhibitor | Serine | MER173219 | B9SIQ2 | I | - | - | -
6 | Jcr4S01752.130 | Subtilisin-like protease-like | Serine | MER134745 | B9RR97 | - | - | - | -
6:7 | Jcr4S02283.20 | Lon protease mitochondrial | Serine | MER179947 | B9RFI8 | - | - | - | -
6:7 | Jcr4S02309.10 | Xylem serine proteinase 1 | Serine | MER171392 | B9RBY1 | - | - | - | -
7:8:9 | Jcr4S02309.30 | Xylem serine proteinase 1 | Serine | MER171392 | B9RBX7 | V | - | - | -
8 | Jcr4S02309.40 | Xylem serine proteinase 1 | Serine | MER171392 | B9RBX7 | - | - | - | -
6 | Jcr4S02532.50 | Subtilisin-like protease-like | Serine | MER135566 | B9T4J8 | - | - | - | -
6:7:8 | Jcr4S02986.60 | Proline iminopeptidase-like | Serine | MER209321 | B9RDY5 | - | - | - | -
6:7 | Jcr4S03009.10 | Subtilisin-like protease-like | Serine | MER134766 | B9SV70 | - | - | - | -
6:7:8:9 | Jcr4S03167.30 | Lysosomal pro-x carboxypeptidase | Serine | MER176402 | B9SX01 | - | - | - | -
6 | Jcr4S03522.50 | Serine carboxypeptidase-like | Serine | MER135405 | B9SJJ0 | - | - | - | -
6 | Jcr4S03773.40 | Serine carboxypeptidase-like | Serine | MER179375 | B9RU70 | - | - | - | -
6:7:8 | Jcr4S03912.10 | Subtilisin-like protease-like | Serine | MER167685 | B9T6I8 | I | - | - | -
6:7:8:9 | Jcr4S05665.40 | Subtilisin-like protease-like | Serine | MER080370 | B9T4J8 | I | -4.308508 | - | -
6:7:8:9:10 | Jcr4S06861.10 | Subtilisin-like protease | Serine | MER002779 | B9R7A1 | IV | - | - | -3.08278
6:7 | Jcr4S06861.20 | Subtilisin-like protease-like | Serine | MER135050 | B9R7A2 | - | - | - | -
6:7:8:9:10 | Jcr4S19411.10 | Subtilisin-like protease | Serine | MER002779 | B9R7A1 | III | - | - | -
6:7:8:9:10 | Jcr4S19411.20 | Subtilisin-like protease | Serine | MER063464 | B9R7A1 | IV | - | - | -
6:7:8 | Jcr4S00057.260 | Proteasome subunit beta type-1 | Threonine | MER180095 | B9RDD5 | - | - | - | -
6:7 | Jcr4S00603.30 | 20s proteasome beta subunit g1 | Threonine | MER004348 | B9SK45 | I | - | - | -
6:7:9 | Jcr4S00632.50 | 20s proteasome beta subunit c2 | Threonine | - | B9S2W9 | II | - | - | -
6:7:8:9 | Jcr4S00843.120 | Proteasome subunit alpha | Threonine | MER169383 | B9T3A4 | II | - | - | -
6:7:9 | Jcr4S00852.50 | 20s proteasome beta subunit c2 | Threonine | MER167938 | B9S2W9 | II | - | - | -
6:7:8:9 | Jcr4S00995.10 | Proteasome subunit alpha | Threonine | MER180218 | B9R7X1 | IV | - | - | -
6:7:8:9 | Jcr4S01070.40 | 20s proteasome subunit pba1 | Threonine | MER172841 | B9SPS6 | I | - | - | -
6 | Jcr4S01594.40 | Proteasome subunit alpha type-4-like | Threonine | MER171897 | B9RA94 | - | - | - | -
6:7:8:9 | Jcr4S01752.50 | Proteasome subunit alpha | Threonine | MER169383 | B9T3A4 | I | - | - | -
6:7:8:9:10 | Jcr4S02302.40 | Proteasome subunit beta type | Threonine | MER173164 | B9SJ80 | IV | - | - | -
6:7:8:9 | Jcr4S02802.40 | Proteasome subunit alpha type-2-b | Threonine | MER134729 | B9RI05 | I | - | - | -
6:7:8:9:10 | Jcr4S03336.20 | Proteasome subunit alpha type-3-like | Threonine | MER135395 | B9T3P0 | IV | - | - | -
6:7:8:9 | Jcr4S05570.10 | Proteasome subunit alpha type-5-like | Threonine | MER167864 | B9T3X8 | III | - | - | -
6:7:9 | Jcr4S07728.10 | Proteasome subunit alpha type-7-like | Threonine | MER172557 | B9SXV7 | II | - | - | -
6:7:9 | Jcr4S11633.20 | Proteasome subunit beta type-5-like | Threonine | MER126199 | B9RMH4 | III | - | - | -

Figure 18: Distribution of the identified peptidases and peptidase inhibitors among the different mechanistic classes. The x-axis shows the mechanistic classes and the y-axis the percentage of peptidases and peptidase inhibitors in each class.


Only a small fraction (four) of the identified peptidases are known to have a role in protein processing/turnover in plastids or mitochondria. Most of the approximately 3000 proteins that make up the plastid proteome (Leister, 2003) are encoded by nuclear genes and imported from the cytosol into the plastid, a process that requires the action of a whole gamut of peptidases; protein turnover inside plastids is also very active. The identification of only four peptidases of plastid/mitochondrial origin is probably due to the highly abundant reserve proteins deposited in the endosperm during seed development, which limit the detection of less abundant proteins.

4.5.4.1 Serine peptidases

Serine peptidases were the most representative mechanistic class identified here. Among the identified serine peptidases, 43% are subtilisin-like peptidases. Carboxypeptidases form the other main group, and together with the subtilisin-like peptidases they constitute around 67% of all identified serine peptidases. In addition, two inhibitors of this mechanistic class, belonging to the serpin family, were also identified.

Subtilisin-like proteins, also known as subtilases, are expressed in different plant tissues, suggesting distinct roles in plant development (Berger and Altmann, 2000; Fontanini and Jones, 2002). In J. curcas seeds, subtilases could be involved in cell wall dynamics, either by cleaving structural proteins or by regulating the cell wall remodeling enzymes responsible for loosening the outer primary cell wall. For example, in seeds of an A. thaliana mutant that does not express a subtilisin peptidase, the outer cell wall remained largely intact upon imbibition in EDTA, whereas only remnants of the cell wall were observed in wild-type seeds, suggesting that this subtilisin peptidase affects the consistency of the outer cell wall during seed maturation (Rautengarten et al., 2008). The authors showed that the subtilase triggered the accumulation and/or activation of pectin methylesterase (Rautengarten et al., 2008), an enzyme that weakens the cell wall by increasing the susceptibility of pectins to pectin-degrading enzymes (Jolie et al., 2010). Cell remnants were observed in the endosperm of J. curcas seeds close to the developing embryo (Rocha et al., 2013), suggesting that changes in cell wall dynamics mediated by subtilases could be important for the breakdown of cell walls in that region. Subtilases have other roles in seeds as well: a subtilase gene specifically expressed in the endosperm of developing Medicago truncatula and Pisum sativum seeds may be involved in the regulation of seed size (D'Erfurth et al., 2012), and subtilases are involved in mucilage release from the seed coat (Rautengarten et al., 2008) and in seed coat development (Batchelor et al., 2000). In addition, these peptidases have been implicated in protein mobilization during germination; they accumulate during seed development but are not active in the mature seed (Liu et al., 2001).

4.5.4.2 Metallo peptidases

Metallopeptidases form the second most representative mechanistic class among the identified peptidases. Peptidases identified in this class include 26S proteasome regulatory subunits and amino acid hydrolases such as leucine aminopeptidase and IAA-amino acid hydrolases. We suggest that the 26S proteasome has an important role in the developing endosperm of J. curcas seeds, first because a large number of its subunits were identified, and second because we also identified aminopeptidases that act downstream of the 26S proteasome to complete proteolysis. The 26S proteasome generates diverse peptides, most of which range in length from 2 to 24 residues (Tenzer and Schild, 2005). These peptides must be rapidly degraded to amino acids, which are in turn used for the synthesis of new proteins. This is done by a set of endo- and aminopeptidases that act sequentially downstream of the proteasome to release free amino acids (Saric et al., 2004). Degradation of peptides of 2-6 residues requires leucine aminopeptidases, and degradation of peptides of 9-17 residues in cell extracts requires a metallopeptidase known as thimet oligopeptidase (Saric et al., 2004). Leucine aminopeptidases were also found to be upregulated in response to oxidative stress, possibly to remove inactivated and misfolded proteins (Boulila-Zoghlami et al., 2011).
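The division of labour described by Saric et al. (2004) can be summarized as a length-based rule. The sketch below is a deliberately simplified, hypothetical model, not a description of the actual enzymology: it assigns a proteasome product to the downstream enzyme named in the text by length alone, and models an aminopeptidase as removing one N-terminal residue at a time. Lengths outside the two quoted ranges are left unassigned because the text does not cover them; the function names and example peptides are illustrative assumptions.

```python
def downstream_enzyme(peptide: str) -> str:
    """Assign a proteasome product to a downstream peptidase by length only."""
    n = len(peptide)
    if 2 <= n <= 6:
        return "leucine aminopeptidase"
    if 9 <= n <= 17:
        return "thimet oligopeptidase"
    return "unassigned"  # lengths not covered by the cited data


def aminopeptidase_trim(peptide: str) -> list[str]:
    """Release residues one by one from the N-terminus (simplified model)."""
    released = []
    remaining = peptide
    while len(remaining) > 1:
        released.append(remaining[0])  # cleave the N-terminal peptide bond
        remaining = remaining[1:]
    released.append(remaining)  # the last residue is already a free amino acid
    return released


# Hypothetical peptides of 3, 8 and 15 residues (one-letter code).
for pep in ["LAG", "SIINFEKL", "ALDKLEKSLKEALGA"]:
    print(pep, len(pep), downstream_enzyme(pep))
print(aminopeptidase_trim("LAG"))
```

In reality peptide fate also depends on sequence and on other endo- and aminopeptidases, so the rule only restates the two length ranges reported in the text.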


4.5.4.3 Aspartic peptidases

Aspartic peptidases form the third most representative mechanistic class among the identified peptidases. Two signal peptidase complex subunits were identified only in the earliest stage analysed (Stage 6); these proteins are involved in the proteolytic removal of signal peptides from plastid protein precursors (Tsiatsiani et al., 2012). Peptidases of this type belong to the class of intramembrane peptidases, which hydrolyze peptide bonds within the hydrophobic core of a membrane (reviewed by Adam, 2013). In plants, such peptidases have been localized to the endoplasmic reticulum (Tamura et al., 2008; Tamura et al., 2009), where they appear to cleave target proteins within the membrane to release active peptides that function in signal transduction pathways (Hoshi et al., 2013). Four nepenthesins, a type of aspartic protease, were also identified here. These aspartic peptidases have unusually high thermal and pH stability compared with other aspartic peptidases (Takahashi, 2013). Our quantification results show that one of them is downregulated during seed development. The physiological role of nepenthesin in plants is still unclear, but some insight can be gained from studies in A. thaliana, where nepenthesin is expressed in various tissues, including leaves, stems, seeds and pods, suggesting a ubiquitous occurrence and multiple functions for this aspartic protease in plants (Takahashi et al., 2008). In A. thaliana, nepenthesin was shown to release an endogenous peptide elicitor of an inducible resistance mechanism acting through the salicylic acid signaling pathway (Xia et al., 2004), and mutants expressing nepenthesin showed increased expression of defense genes, including glutathione S-transferase. In our results, these enzymes were found in the GO subcategories Response to stress and Response to abiotic stimulus (Figure 9). A possible role for nepenthesin in J. curcas seeds could therefore be related to inducible resistance mechanisms, such as the expression of stress-response genes. Aspartic peptidases have also been shown to be involved in processing storage proteins, such as 2S albumins, into their mature form (Hiraiwa et al., 1997). These authors showed that an aspartic endopeptidase could act together with vacuolar processing enzyme (VPE), a cysteine peptidase, by trimming the C-terminal propeptides from the subunits produced by VPE. We also identified six phytepsins, a type of aspartic peptidase homologous to mammalian lysosomal cathepsin D and yeast vacuolar proteinase A (Kervien et al., 1999). Peptidases of this type were found to be involved in the developmentally regulated PCD of nucellar cells in barley (Simões and Faro, 2004), and it was further suggested that such aspartic peptidase-like proteins may function either as apoptotic proteases triggering nucellar cell death or as hydrolytic proteases that convert proteins into nutrients for embryo and endosperm development. Since these peptidases are active in PCD events, we suggest that they may be involved in the PCD occurring in the endosperm of J. curcas seeds close to the embryo, possibly contributing to cell lysis together with other peptidases that trigger PCD (Rocha et al., 2013).

4.5.4.4 Cysteine peptidases

Enzymes belonging to the cysteine peptidase mechanistic class represented 16.3% of all identified peptidases and inhibitors. A KDEL-tailed cysteine peptidase was identified that was downregulated during endosperm development and detected until Stage 9. The gene encoding this peptidase was shown to be expressed in the integument of developing J. curcas seeds but was not detected in the developing endosperm (Rocha et al., 2013), suggesting a lack of correlation between transcript and protein levels for this peptidase. Conversely, another KDEL-tailed cysteine peptidase was highly expressed at the transcript level in the endosperm of J. curcas seeds (Rocha et al., 2013) but was not detected here at the protein level. KDEL-tailed cysteine peptidases have been described as a hallmark of PCD in various plant tissues (Nogueira et al., 2012; Helm et al., 2008; Greenwood et al., 2005), and this may also be their role in the endosperm of J. curcas seeds, as proposed by Rocha et al. (2013) in their transcriptomic study.

β-VPE and α-VPE type cysteine peptidases were also identified. The former was detected until Stage 9 but not at the mature stage, possibly because the high levels of storage proteins in mature seeds decrease the dynamic range of protein identification. β-VPE transcripts in the developing endosperm of J. curcas seeds were shown to be upregulated, reaching their highest level in mature seeds (Rocha et al., 2013). The accumulation of β-VPE transcripts and protein in middle to late stages is consistent with the role of β-VPEs as processing enzymes in the maturation of storage proteins (Shimada et al., 2003), a role that could be complemented by the vegetative α-VPE type (Shimada et al., 2003) and by aspartic peptidases (Hiraiwa et al., 1997). As in A. thaliana seeds, β-VPE seems to be the main enzyme processing proproteins in J. curcas seeds, given that β-VPE was detected at Stages 6 to 9 whereas α-VPE was detected only at Stage 8.
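Statements such as "detected until Stage 9" or "detected only at Stage 8" are read from the first column of Tables 6 and 7, which lists stages as colon-separated numbers (e.g. "6: 7: 8: 9:10"). The following is a small sketch of how such a column could be parsed and queried, using four rows copied from Table 6; the helper names are hypothetical, and this is not presented as the procedure used in this work.

```python
def parse_stages(field: str) -> set[int]:
    """Turn a stage field such as '6: 7: 8: 9:10' into {6, 7, 8, 9, 10}."""
    return {int(token) for token in field.replace(" ", "").split(":") if token}


def detected_only_at(rows: list[tuple[str, str]], stage: int) -> list[str]:
    """Protein IDs whose set of detection stages is exactly {stage}."""
    return [pid for pid, field in rows if parse_stages(field) == {stage}]


def last_stage(field: str) -> int:
    """Latest stage in which a protein was detected."""
    return max(parse_stages(field))


# Rows copied from Table 6: (protein ID, developmental stages).
rows = [
    ("Jcr4S00409.70", "6: 7: 8: 9"),   # vacuolar-processing enzyme
    ("Jcr4S00049.280", "8"),           # vacuolar-processing enzyme
    ("Jcr4S00637.70", "6"),
    ("Jcr4S06123.40", "6"),            # signal peptidase complex subunit 3b
]

print(detected_only_at(rows, 6))
print(detected_only_at(rows, 8))
print(last_stage("6: 7: 8: 9"))
```

Representing each entry as a set of stages makes "only at" and "until" queries simple set comparisons.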

4.5.4.5 Threonine peptidases

Threonine peptidases represent 12.2% of all peptidases and inhibitors identified here. The threonine peptidases identified in this work are components of the proteasome, the major proteolytic complex responsible for the degradation of intracellular proteins in eukaryotes (Callis and Vierstra, 2000; Hershko and Ciechanover, 1998). In this degradation pathway, proteins are tagged with the small protein ubiquitin in an ATP-dependent reaction cascade, and the ubiquitinated proteins are then targeted for degradation by the proteasome (reviewed by Sadanandom et al., 2012). The proteasome is composed of at least 33 subunits (Beckwith, 2013), one of which belongs to the metallo mechanistic class and is responsible for deubiquitinating the substrate (Verma et al., 2002). The ubiquitin-proteasome system removes most abnormal peptides and short-lived cellular regulators, allowing cells to respond rapidly to intracellular signals and changing environmental conditions (Sadanandom et al., 2012). Given the very high level of protein synthesis in developing endosperm cells, proper functioning of the proteasome is likely important to ensure that truncated and/or misfolded proteins are rapidly degraded and that the levels of certain regulatory proteins are controlled. In addition, the proteasome has been reported to be involved in regulated PCD in plants (Vacca et al., 2007; Kim et al., 2003), so this complex may also play a role in the PCD that possibly occurs in the endosperm region close to the embryo, as observed by Rocha et al. (2013).


4.5.5 Proteins related to toxic components

Presence of the high quantity of PEs in the seeds of J .curcas , hindered the use of its proteins rich seed cake obtained after the oil extraction for the use of animal feed (Makkar et al ., 1997; Makkar et al ., 1998a). PEs are the derivatives of a tetraclyclic diterpene backbone, called tigliane, esterified with FAs (Goel et al ., 2007). In plants, diterpenoids biosynthesis occurs in three main steps (Vranova et al ., 2012). During the first step, the 5-C unit isopentenyl diphosphate (IPP), the basic precursor of diterpenoids in plants, is synthesized via either the cytoplasmic MVA pathway or plastidic nonmavalonate DOXP pathway. From cytoplasmic MVA pathway we identified three (Table 7) out of six enzymes, including acetoacetyl-CoA thiolase (AACT), hydroxymethylglutaryl-CoA synthase (HMGCS), mevalonate diphosphate decarboxylase (MDD). Similarly, from the plastidic DOXP pathway we identified two out of seven enzymes, including, 4-(cytidine 5-diphospho)-2-C-methyl-D-erythritol kinase, and (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate synthase (Table 7), however, in our previous analysis of the proteome of the plastids isolated from the endosperm of developing J. curcas seeds (Pinheiro et al ., 2013), we identified four out of seven enzymes of this pathway. These results indicates that both of these pathways are fully functional in the endosperm of developing J. curcas seeds, as previously proposed by a transcriptomic analysis of the developing seeds (King et al ., 2011). In a second step, condensation of the IPP units result in synthesis of the intermediate diphosphates, namely, GPP, farnesyl diphosphate (FPP) and geranylgeranyl diphosphate (GGPP), that will act as direct precursors for the synthesis of various diterpenoids. 
We did not identify any of the enzymes responsible for the synthesis of these diphosphate precursors here; however, in the proteome analysis of the endosperm plastids (Pinheiro et al., 2013) we identified GGPP synthase, the enzyme responsible for the synthesis of the GGPP precursor. A previous transcriptomic analysis of developing J. curcas seeds also identified transcripts of this enzyme (Costa et al., 2010). In the last step, enzymes called terpene synthases/cyclases convert these precursors into the diverse diterpenoids; we did not identify any of these enzymes. Although the biosynthetic pathway of PEs is not fully understood, the only step proposed so far is the conversion of GGPP to the macrocyclic diterpene casbene, catalyzed by a terpene synthase called casbene synthase (CS) (Nakano et al., 2012).
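The pathway coverage figures given above (three of the six MVA enzymes and two of the seven DOXP enzymes detected in the endosperm) can be tallied with a short script. The sketch below is illustrative only: apart from AACT, HMGCS and MDD, the enzyme abbreviations are assumed standard nomenclature rather than names used in this thesis, and the function `pathway_coverage` is hypothetical.

```python
# Canonical enzyme steps of the two IPP-forming pathways in plants.
# Abbreviations other than AACT, HMGCS and MDD are assumed standard names.
MVA_PATHWAY = ["AACT", "HMGCS", "HMGR", "MVK", "PMK", "MDD"]
DOXP_PATHWAY = ["DXS", "DXR", "MCT", "CMK", "MDS", "HDS", "HDR"]


def pathway_coverage(pathway, identified):
    """Return (number found, pathway length, missing steps) for one pathway."""
    found = [step for step in pathway if step in identified]
    missing = [step for step in pathway if step not in identified]
    return len(found), len(pathway), missing


# Enzymes detected in the endosperm proteome (Table 7), written as abbreviations.
endosperm_hits = {"AACT", "HMGCS", "MDD", "CMK", "HDS"}

for name, steps in (("MVA", MVA_PATHWAY), ("DOXP", DOXP_PATHWAY)):
    n_found, n_total, missing = pathway_coverage(steps, endosperm_hits)
    print(f"{name}: {n_found}/{n_total} steps detected; missing: {', '.join(missing)}")
```

Combining these hits with the plastid proteome identifications (Pinheiro et al., 2013) would only require taking the union of the two sets before calling the function.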

Table 7: Proteins related to terpenoid biosynthesis identified in the endosperm of developing Jatropha curcas seeds. Column 1 indicates the developmental stages in which each protein was detected. Column 2 gives the protein accession number in the genome. Column 3 gives the Blast2GO description of the protein. Column 4 indicates the pathway used for the synthesis of the isopentenyl diphosphate backbone. Column 5 gives the best Ricinus communis homologue of each Jatropha curcas protein. Column 6 gives the best Arabidopsis thaliana homologue. Column 7 indicates the protein cluster to which the protein was assigned. Columns 8, 9 and 10 show the log2 of the ratio of the spectral counts of the protein in Stage 8, 9 and 10, respectively, to Stage 6.

| Developmental stage | Protein ID | Description | Pathway | Best Ricinus hit (UniProt) | Best Arabidopsis hit (TAIR) | Cluster | Log2 E8/E6 | Log2 E9/E6 | Log2 E10/E6 |
|---|---|---|---|---|---|---|---|---|---|
| 6, 7, 8, 9 | Jcr4S00742.70 | Acetyl-CoA acetyltransferase | MVA | B9SA57 | AT5G48230.2 | IV | - | - | - |
| 6, 7, 8, 9 | Jcr4S05012.10 | Acetyl-CoA acetyltransferase | MVA | B9SA57 | AT5G48230.2 | III | - | - | - |
| 6 | Jcr4S00523.60 | Hydroxymethylglutaryl-CoA synthase | MVA | B9RC08 | AT4G11820.2 | - | - | - | - |
| 6, 8 | Jcr4S01592.30 | Diphosphomevalonate decarboxylase-like | MVA | B9S5A3 | AT3G54250.1 | - | - | - | - |
| 8 | Jcr4S05483.10 | 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase | DOXP | B9SB47 | AT2G26930.1 | - | - | - | - |
| 6, 7, 8, 9 | Jcr4S01187.50 | 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase | DOXP | B9SWB7 | AT5G60600.1 | III | - | - | - |

MVA: mevalonate; DOXP: 1-deoxy-D-xylulose-5-phosphate.
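Columns 8 to 10 of Table 7 report log2 ratios of spectral counts relative to Stage 6. A minimal sketch of that calculation follows. The counts are invented for illustration, no normalization of spectral counts is applied, and returning `None` when a protein is absent from either stage is an assumption of this sketch, not necessarily how the table was built; the helper names are hypothetical.

```python
import math


def log2_ratio(count_stage, count_ref):
    """log2 of a spectral-count ratio; None if either count is zero (assumption)."""
    if count_stage <= 0 or count_ref <= 0:
        return None
    return math.log2(count_stage / count_ref)


def stage_ratios(counts, ref="E6", stages=("E8", "E9", "E10")):
    """Ratios of each later stage to the reference stage, keyed like Table 7."""
    return {f"{s}/{ref}": log2_ratio(counts.get(s, 0), counts.get(ref, 0))
            for s in stages}


# Hypothetical spectral counts for one protein (not data from this study)
example = {"E6": 20, "E8": 10, "E9": 5, "E10": 0}
print(stage_ratios(example))
```

A negative value means fewer spectra than at Stage 6; a protein whose ratios fall steadily across stages would show the kind of decreasing profile described for the Cluster 3 curcin below.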

Other diterpenoids, including tigliane, are then thought to be synthesized from casbene through a series of conversions (Nakano et al., 2012). We were unable to identify CS in this study or in the proteome analyses of plastids isolated from the developing endosperm (Pinheiro et al., 2013) and inner integument (data not shown). Previous transcriptomic analyses of developing J. curcas seeds also failed to detect this enzyme (Costa et al., 2010; King et al., 2011). Together, these results suggest that PEs are not synthesized in the seeds, as also proposed by a detailed study of CS expression in various parts of the J. curcas plant (Nakano et al., 2012). In that study, the authors isolated the CS gene from leaves and showed that it is expressed in seedlings, fruit flesh and leaves but not in seeds. They further proposed that PE precursors may be synthesized in the leaves or fruit and then transported to the seeds or, alternatively, that seed-specific CS enzymes may exist. Because we did not identify any CS enzyme in the proteomes of plastids isolated from the endosperm (Pinheiro et al., 2013) and inner integument (data not shown), our results do not support the latter hypothesis and are consistent with the former, i.e., that PEs may be synthesized in the fruit or leaves and later transported to the seeds. Another possibility is that PEs are synthesized in the roots, from where they may be transported to the seeds for accumulation.

Although PEs are the major contributors to the toxicity of J. curcas, other anti-nutritional agents, such as curcins and trypsin inhibitors, may also contribute, to a lesser extent, to the toxicity of its seed cake (Makkar et al., 1997; Makkar et al., 1998a). Curcin is a type-I RIP present in J. curcas seeds, but its toxicity is 1000-fold lower than that of ricin, a type-II RIP from R. communis (Juan et al., 2003). Here we identified three curcin proteins (Supplementary Table III), of which one (Jcr4S12813.10) was identified only in Stage 6, while the other two were present in four or in all five stages. Only one of the three curcins (Jcr4S01069.20) was significantly differentially expressed during seed development; it was grouped in Cluster 3 (Supplementary Table IV), showing a constant decrease in spectral counts as the seed develops. In a previous transcriptomic analysis of developing J. curcas seeds, a single curcin (GenBank: AAL58089) was identified and ranked among the fifty most abundant transcripts (King et al., 2011). A local BLAST search of this curcin against our results showed that it is the same curcin (Jcr4S01069.20) that we identified with the highest deposition at Stage 6, which suggests that this curcin may have a special role in developing seeds.

4.6 Contribution of this study to the establishment of the deep proteome of developing Jatropha curcas seeds

We are working on the proteome analysis of the seeds of two biotechnologically important species of the family Euphorbiaceae, J. curcas and R. communis. This study is part of a project to create a biotechnology program for J. curcas and R. communis in order to maximize their use in biodiesel production and other industrial areas. We recently published the first in-depth proteomic analysis of plastids isolated from the endosperm of developing J. curcas seeds (Pinheiro et al., 2013). In that study we used a gel-free approach and performed an LC-MS/MS analysis of the plastid proteome, taking advantage of the availability of the J. curcas genomic protein database for MS/MS ion searches, which resulted in the identification of 923 proteins; at the time, this was the highest number of proteins identified for J. curcas. That study considerably expanded the repertoire of identified proteins in J. curcas seeds and provided insights, at the whole-proteome level, into the major biosynthetic pathways of the plastids of the developing endosperm. Besides the endosperm plastids, we also performed an in-depth proteomic analysis of plastids isolated from the inner integument of developing J. curcas seeds using a combination of gel-based and gel-free approaches, which resulted in the identification of 2116 proteins (data not shown). The use of these two approaches increased the number of protein identifications, and a functional comparison with the endosperm plastid proteome indicated a striking functional specialization of the two types of plastids: the endosperm plastids are clearly geared to the synthesis of FAs and amino acids, whereas the inner integument plastids are geared to the synthesis of secondary metabolites.

In another study, we analyzed the proteome of two different regions of the inner integument of developing J. curcas seeds using a gel-free approach, which resulted in the identification of 1770 proteins (data not shown). Many of the proteins identified in that study are involved in major metabolic pathways, indicating that the inner integument is metabolically very active. Besides peptidases of different mechanistic classes, we also identified several other hydrolases that may be involved in the mobilization of cell wall components, as well as of proteins, carbohydrates and lipids available within the integument cells, in order to nourish the developing embryo and endosperm. In that proteomic analysis we identified, for the first time, reserve proteins in the inner integument. Previously, we performed a proteome analysis of the nucellus isolated from developing seeds of Ricinus communis but did not identify any reserve proteins (Nogueira et al., 2012), which shows that these two distinct maternal tissues, although they have similar functions, contribute differently to seed development.

A comparison of the results of the current study with those of our three other proteomic analyses, i.e., of the endosperm plastids (Pinheiro et al., 2013), of the inner integument plastids (data not shown) and of the inner integument (data not shown), shows that this study added 354 unique proteins to the 3390 proteins identified in the four proteomic studies we have made so far (Figure 19A).

To compare the proteins identified in our studies with those identified by other groups in proteomic studies of this species (Liang et al., 2007; Liu et al., 2009; Yang et al., 2009; Popluechai et al., 2011; Liu et al., 2011; Liu et al., 2013; Booranasrisak et al., 2013), FASTA sequences were downloaded for all the identifications of these studies, which collectively gave 209 non-redundant proteins. These 209 proteins were searched by local BLAST against the J. curcas protein database using an e-value cutoff of 1 × 10⁻²⁰, which resulted in 156 non-redundant J. curcas proteins. Comparison of these 156 proteins with those identified in this endosperm study showed that 110 were also identified in this analysis, while 46 were absent from the endosperm proteome (Figure 19B).
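The mapping step described above (external sequences searched against the J. curcas protein database with an e-value cutoff of 1 × 10⁻²⁰, then collapsed to non-redundant J. curcas proteins) can be sketched as a filter over BLAST+ tabular output (`-outfmt 6`, in which column 11 is the e-value and column 12 the bit score). Keeping only the best-scoring hit per query is an assumption of this sketch; the text does not state how multiple hits were handled. The function name and all accessions in the demo are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-20


def best_hits(blast_tab, cutoff=EVALUE_CUTOFF):
    """Map each query to its best subject (highest bit score) passing the cutoff.

    Expects BLAST+ tabular output (-outfmt 6): e-value in column 11, bit score in 12.
    """
    best = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue  # hit not significant enough
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


# Hypothetical hits: two external proteins map to the same J. curcas protein,
# and a third falls below the significance cutoff.
demo = io.StringIO(
    "extA\tjc_prot_1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t600\n"
    "extB\tjc_prot_1\t95.0\t280\t14\t0\t1\t280\t5\t284\t1e-120\t480\n"
    "extC\tjc_prot_2\t40.0\t80\t48\t2\t1\t80\t10\t90\t1e-5\t45\n"
)
hits = best_hits(demo)
non_redundant = set(hits.values())
print(len(hits), len(non_redundant))
```

Collapsing the mapped subjects into a set is what turns many external identifications into fewer non-redundant J. curcas proteins, as with the 209 to 156 reduction reported above.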

However, comparing these 156 proteins with all the identifications made in our proteomic studies of this plant showed that only 26 proteins were absent from our identifications (Figure 19C), which could be due to the different tissue samples and experimental conditions used in each study.
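Each panel of Figure 19 reduces to set operations on protein identifiers once all identifications are expressed as J. curcas accessions. In the sketch below the identifiers are hypothetical placeholders and `venn_counts` is an illustrative helper. With the real data, `only_a` in the first comparison would correspond to the 46 proteins absent from the endosperm (panel B), `only_a` in the second to the 26 proteins absent from all four of our proteomes (panel C), and `added_by_this_study` to the unique contribution shown in panel A.

```python
def venn_counts(a, b):
    """Sizes of the three regions of a two-set Venn diagram."""
    return {"only_a": len(a - b), "both": len(a & b), "only_b": len(b - a)}


# Hypothetical identifiers, for illustration only
external = {"p1", "p2", "p3", "p4", "p5"}  # other groups' proteins, mapped to J. curcas
endosperm = {"p1", "p2", "p6", "p8"}        # this study
other_lab_proteomes = [{"p1", "p3"}, {"p2", "p7"}, {"p3", "p9"}]  # our three other studies

all_our_proteomes = endosperm.union(*other_lab_proteomes)
added_by_this_study = endosperm - set().union(*other_lab_proteomes)

print(venn_counts(external, endosperm))          # panel B analogue
print(venn_counts(external, all_our_proteomes))  # panel C analogue
print(sorted(added_by_this_study))               # panel A analogue
```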

Figure 19: Venn diagram showing proteins common to the endosperm proteome and three other proteomes analyzed in our lab (A). Venn diagram comparing the endosperm proteins with proteins identified in other proteomic analyses of Jatropha curcas (B). Venn diagram comparing the four proteome results obtained in our lab with proteins identified in other proteomic studies of Jatropha curcas (C).


In summary, besides contributing to the establishment of the full proteome of developing J. curcas seeds, we also added a considerable number of proteins to the current protein repertoire of J. curcas, which could be used in future studies to design experiments aimed at producing J. curcas varieties best suited to the biodiesel industry.


5 GENERAL CONCLUSIONS

Proteome analysis of the endosperm at five developmental stages resulted in the identification of 1760 proteins, the highest number identified so far from the endosperm of J. curcas seeds. Functional classification of the proteins involved in various metabolic processes showed that proteins involved in the metabolism of carbohydrates, amino acids, energy and lipids represent the major functional classes. The number of protein identifications was highest in Stage 6 and decreased in the more advanced developmental stages, possibly due to the accumulation of high concentrations of SSPs in the endosperm of maturing seeds. However, we cannot rule out experimental factors, such as problems in sample preparation or variation in mass spectrometer performance during the analysis of some samples, as another possible reason for the low number of proteins identified in the more advanced developmental stages, especially in the mature seed samples.

We identified a range of proteins involved in carbohydrate metabolism that provide energy and carbon sources for FA biosynthesis. Although we identified a complete set of proteins involved in FA biosynthesis, we were unable to detect any of the proteins involved in triacylglyceride biosynthesis. Proteins related to FA and triacylglyceride degradation, and various classes of peptidases involved in diverse biochemical functions, were also identified, highlighting the importance of these degradation-related enzymes for seed development.

Different classes of SSPs, together with proteins related to seed maturation such as oleosins and LEA proteins, were also identified in this study. Four SSPs annotated as nutrient reservoirs showed decreasing deposition with seed maturation, in contrast to other SSPs whose deposition increased during maturation, suggesting a special role for these four SSPs in seed development.

Although we identified proteins involved in terpenoid backbone biosynthesis, we were unable to identify CS, the key enzyme proposed for PE biosynthesis. Based on these results and those obtained from the other sub-proteomes of developing J. curcas seeds, we suggest that PE synthesis may take place in the leaves, fruit or roots, with PEs then transported to the seeds for accumulation. Among the other toxic constituents, we identified isoforms of curcin, which were absent from the proteome of the inner integument; this suggests that, unlike ricin in castor, which was detected in both maternal and filial tissues of the seed, this toxic protein is specific to the seed endosperm.

In summary, this proteome analysis increased the number of proteins identified so far from the seeds of J. curcas and contributes to the understanding of the biological processes occurring in the seeds during development.


6 FUTURE PERSPECTIVES

This work is part of a project that aims to provide knowledge to increase the biotechnological uses of J. curcas and to design experiments for producing varieties that best meet its industrial demand. The main focus of this project was to use a shotgun proteomic approach to understand the protein machinery responsible for the deposition of the reserves, especially lipids, and of the toxic constituents accumulated in J. curcas seeds. Here we took advantage of the availability of the J. curcas genome sequence for the protein identification search, which considerably increased the number of proteins identified in comparison with proteomic studies performed without its genome sequence. Given the limited dynamic range of the identifications and the absence of some important proteins related to triacylglyceride biosynthesis in this study, we suggest applying methods for depleting SSPs based on their physiological and biochemical properties. SSPs can be reduced by methods such as Ca²⁺ precipitation (Krishnan et al., 2009) to increase the dynamic range. Similarly, dynamic range can also be improved by fractionating a proteome into smaller sub-proteomes and by applying multidimensional chromatographic steps before injecting the tryptic peptides into the mass spectrometer. Together, these procedures could allow the identification of important proteins, such as those involved in the last step of triacylglyceride biosynthesis, which are usually masked by the highly abundant SSPs.

The main hurdle to the profitable use of J. curcas oil is the presence of toxic PEs in the seeds, and it is therefore very important to understand their biosynthetic mechanism and accumulation in J. curcas seeds. Given the absence of CS from this proteome and from the other sub-proteome analyses of J. curcas seeds, we suggest analyzing the proteome of plastids isolated from leaves, fruit flesh and roots. If CS is detected in these tissues, it will also be important to apply targeted quantitative proteomic approaches such as SRM/MRM (selected reaction monitoring/multiple reaction monitoring) for the absolute quantification of CS in the different tissues. These studies could help determine the site of PE synthesis and the contribution of different tissues to PE deposition. Once the biosynthetic pathway is better understood, the next important step should be to silence the genes involved in PE synthesis and study the effect of their absence on seed development.
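For the SRM/MRM quantification of CS proposed above, a common strategy is to spike a known amount of a stable-isotope-labelled synthetic version of a proteotypic peptide into the sample and compare the transition peak areas of the endogenous (light) and labelled (heavy) forms. The sketch below assumes equal ionization response for the light and heavy peptides and sums the transition areas (using per-transition ratios is another common choice). The function name, peak areas and spike amount are hypothetical, and no CS peptide has been selected here.

```python
def absolute_amount(light_areas, heavy_areas, heavy_spike_fmol):
    """Estimate the endogenous peptide amount (fmol) from SRM peak areas.

    light_areas / heavy_areas: integrated areas of matching transitions for the
    endogenous and isotope-labelled peptide. Assumes equal response factors.
    """
    if not light_areas or len(light_areas) != len(heavy_areas):
        raise ValueError("need matching, non-empty lists of transition areas")
    light_to_heavy = sum(light_areas) / sum(heavy_areas)
    return light_to_heavy * heavy_spike_fmol


# Hypothetical three-transition measurement with 50 fmol of heavy standard spiked in
light = [2.0e5, 1.0e5, 5.0e4]
heavy = [4.0e5, 2.0e5, 1.0e5]
print(f"{absolute_amount(light, heavy, 50.0):.1f} fmol")
```

Comparing such estimates across leaf, fruit and root samples, normalized to the total protein loaded, is one way to ask where CS is most abundant; an external calibration curve is an alternative when a labelled standard is unavailable.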

It will also be interesting to design proteomic studies to determine the post-translational modifications (PTMs) of J. curcas plastid and endosperm proteins and ultimately to evaluate the effects of PTMs on the activity of enzymes involved in the metabolism of FAs, TAGs, carbohydrates and toxic components.

Finally, another important aspect of the biotechnological improvement of this plant is the manipulation of genes involved in FA biosynthesis and degradation. The high viscosity of J. curcas oil could be reduced by manipulating genes such as KASII and palmitoyl-ACP thioesterase, while its high degree of unsaturation could be reduced by manipulating desaturase enzymes. Our proteomic study provides information that may help in selecting appropriate genes for manipulation studies aimed at improving the quality of J. curcas seed oil.


7 REFERENCES

ABDULLA, R.; CHAN, E.S.; RAVINDRA, P. Biodiesel production from Jatropha curcas: a critical review. Critical Reviews in Biotechnology, 31, 53-64, 2011.

ACHTEN, W.M.J.; MATHIJS, E.; VERCHOT, L.; SINGH, V.P.; AERTS, R.; MUYS, B. Jatropha biodiesel fueling sustainability? Biofuels, Bioproducts and Biorefining, 1, 283-91, 2007.

ADAM, Z. Emerging roles for diverse intramembrane proteases in plant biology. Biochimica et Biophysica Acta (BBA) – Biomembranes, 1828, 2933-6, 2013.

ADERIBIGBE, A.O.; JOHNSON, C.O.L.E.; MAKKAR, H.P.S.; BECKER, K.; FOIDL, N. Chemical composition and effect of heat on organic matter- and nitrogen-degradability and some antinutritional components of Jatropha meal. Animal Feed Science and Technology, 67, 223-43, 1997.

ADOLF, W.; OPFERKUCH, H.J.; HECKER, E. Irritant phorbol derivatives from four Jatropha species. Phytochemistry, 23, 129-32, 1984.

AEBERSOLD, R.; MANN, M. Mass spectrometry-based proteomics. Nature, 422, 198-207, 2003.

AGHORAM, K.; WILSON, R.F.; BURTON, J.W.; DEWEY, R.E. A Mutation in a 3-Keto-Acyl-ACP Synthase II Gene is Associated with Elevated Palmitic Acid Levels in Soybean Seeds. Crop Science, 46, 2453-9, 2006.

AGRAWAL, G.K.; PEDRESCHI, R.; BARKLA, B.J.; BINDSCHEDLER, L.V.; CRAMER, R.; SARKAR, A. et al. Translational plant proteomics: a perspective. Journal of Proteomics, 75, 4588-601, 2012.

AKAGAWA, M.; HANDOYO, T.; ISHII, T.; KUMAZAWA, S.; MORITA, N.; SUYAMA, K. Proteomic analysis of wheat flour allergens. J Agric Food Chem, 55, 6863-70, 2007.

ALANG, Z.C.; MOIR, G.F.J.; JONES, L.H. Composition, Degradation and Utilization of Endosperm During Germination in the Oil Palm (Elaeis guineensis Jacq.). Annals of Botany, 61, 261-8, 1988.

ALEXANDER, D.E.; SEIF, R.D. Relation of Kernel Oil Content to Some Agronomic Traits in Maize. Crop Science, 3, 354-5, 1963.


ALLEN, C.A.W.; WATTS, K.C.; ACKMAN, R.G.; PEGG, M.J. Predicting the viscosity of biodiesel fuels from their fatty acid ester composition. Fuel, 78, 1319-26, 1999.

ALONSO, A.P.; GOFFMAN, F.D.; OHLROGGE, J.B.; SHACHAR-HILL, Y. Carbon conversion efficiency and central metabolic fluxes in developing sunflower (Helianthus annuus L.) embryos. The Plant Journal, 52, 296-308, 2007.

AMARA, I.; ODENA, A.; OLIVEIRA, E.; MORENO, A.; MASMOUDI, K.; PAGES, M. et al. Insights into Maize LEA proteins: from proteomics to functional approaches. Plant and Cell Physiology, 53, 312-29, 2012.

BABA, A.I.; NOGUEIRA, F.C.S.; PINHEIRO, C.B.; BRASIL, J.N.; JEREISSATI, E.S.; JUCÁ, T.L. et al. Proteome analysis of secondary somatic embryogenesis in cassava (Manihot esculenta). Plant Science, 175, 717-23, 2008.

BATCHELOR, A.K.; BOUTILIER, K.; MILLER, S.S.; LABBE, H.; BOWMAN, L.; HU, M. et al. The seed coat-specific expression of a subtilisin-like gene, SCS1, from soybean. Planta, 211, 484-92, 2000.

BAUD, S.; LEPINIEC, L. Physiological and developmental regulation of seed oil production. Prog Lipid Res, 49, 235-49, 2010.

BECKWITH, R.; ESTRIN, E.; WORDEN, E.J.; MARTIN, A. Reconstitution of the 26S proteasome reveals functional asymmetries in its AAA+ unfoldase. Nature Structural & Molecular Biology, 20, 1164-1172, 2013.

BENJAMINI, Y.; HOCHBERG, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological), 57, 289-300, 1995.

BERGER, D.; ALTMANN, T. A subtilisin-like serine protease involved in the regulation of stomatal density and distribution in Arabidopsis thaliana. Genes Dev, 14, 1119-31, 2000.

BERGER, F.; GRINI, P.E.; SCHNITTGER, A. Endosperm: an integrator of seed growth and development. Curr Opin Plant Biol, 9, 664-70, 2006.

BERGER, F. Endosperm development. Current Opinion in Plant Biology, 2, 28-32, 1999.

BOORANASRISAK, T.; PHAONAKROP, N.; JARESITTHIKUNCHAI, J.; VIRUNANON, C.; ROYTRAKUL, S.; CHULALAKSANANUKUL, W. Proteomic evaluation of free fatty acid biosynthesis in Jatropha curcas L. (physic nut) kernel development. African Journal of Biotechnology, 12, 3132-42, 2013.

BOULILA-ZOGHLAMI, L.; GALLUSCI, P.; HOLZER, F.M.; BASSET, G.J.; DJEBALI, W.; CHAÏBI, W.; WALLING, L.L.; BROUQUISSE, R. Up-regulation of leucine aminopeptidase-A in cadmium-treated tomato roots. Planta, 234, 857-863, 2011.

BUTTNER, M. The monosaccharide transporter-like gene family in Arabidopsis. FEBS Lett, 581, 2318-2324, 2007.

CALLIS, J.; VIERSTRA, R.D. Protein degradation in signaling. Current Opinion in Plant Biology, 3, 381-6, 2000.

CARELS, N. Chapter 2 Jatropha curcas: A Review. In: KADER, J.-C.; DELSENY, M. (Eds.). Advances in Botanical Research. Academic Press, p. 39-86, 2009.

CARVALHO, P.C.; FISCHER, J.S.; XU, T.; COCIORVA, D.; BALBUENA, T.S.; VALENTE, R.H. et al. Search engine processor: Filtering and organizing peptide spectrum matches. Proteomics, 12, 944-9, 2012.

CARVALHO, P.C.; YATES III, J.R.; BARBOSA, V.C. Analyzing shotgun proteomic data with PatternLab for proteomics. Curr Protoc Bioinformatics, Chapter 13: Unit 13, 1-5, 2010.

CASEY, R. Distribution and Some Properties of Seed Globulins. In: Shewry, P.; Casey, R. editors. Seed Proteins. Springer Netherlands, 159-69, 1999.

CHAMPAGNE, A.; BOUTRY, M. Proteomics of nonmodel plant species. Proteomics, 13, 663-73, 2013.

CHASSAIGNE, H.; NØRGAARD, J.V.; VAN HENGEL, A.J. Proteomics-Based Approach To Detect and Identify Major Allergens in Processed Peanuts by Capillary LC-Q-TOF (MS/MS). Journal of Agricultural and Food Chemistry, 55, 4461-73, 2007.

CHEN, G.Q.; TURNER, C.; HE, X.; NGUYEN, T.; MCKEON, T.A.; LAUDENCIA-CHINGCUANCO, D. Expression profiles of genes involved in fatty acid and triacylglycerol synthesis in castor bean (Ricinus communis L.). Lipids, 42, 263-74, 2007.


CHEN, M.S.; WANG, G.J.; WANG, R.L.; WANG, J.; SONG, S.Q.; XU, Z.F. Analysis of expressed sequence tags from biodiesel plant Jatropha curcas embryos at different developmental stages. Plant Science, 181, 696-700, 2011.

CHEN, S.; HARMON, A.C. Advances in plant proteomics. Proteomics, 6, 5504-16, 2006.

CHIA, T.Y.; PIKE, M.J.; RAWSTHORNE, S. Storage oil breakdown during embryo development of Brassica napus (L.). J Exp Bot, 56, 1285-96, 2005.

CHIVANDI, E.; MTIMUNI, J.P.; READ, J.S.; MAKUZA, S.M. Effect of processing method on phorbol esters concentration, total phenolics, trypsin inhibitor activity and the proximate composition of the Zimbabwean Jatropha curcas provenance: a potential livestock feed. Pakistan Journal of Biological Sciences, 7, 1001-5, 2004.

CONESA, A.; GÖTZ, S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics, 619832, 2008.

COSTA, G.G.; CARDOSO, K.C.; DEL BEM, L.E.; LIMA, A.C.; CUNHA, M.A.; DE CAMPOS-LEITE, L. et al. Transcriptome analysis of the oil-rich seed of the bioenergy crop Jatropha curcas L. BMC Genomics, 11, 462, 2010.

COTTE-RODRIGUEZ, I.; ZHANG, Y.; MIAO, Z.; CHEN, H. Ionization Methods in Protein Mass Spectrometry. In: Protein and Peptide Mass Spectrometry in Drug Discovery. John Wiley & Sons, Inc., 1-42, 2011.

DE ANGELIS, M.; MINERVINI, F.; CAPUTO, L.; CASSONE, A.; CODA, R.; CALASSO, M.P. et al. Proteomic Analysis by Two-Dimensional Gel Electrophoresis and Starch Characterization of Triticum turgidum L. var. durum Cultivars for Pasta Making. Journal of Agricultural and Food Chemistry, 56, 8619-28, 2008.

DE HOOG, C.L.; MANN, M. Proteomics. Annu Rev Genomics Hum Genet, 5, 267-93, 2004.

DEBNATH, M.; BISEN, P.S. Jatropha curcas L. A multipurpose stress resistant plant with a potential for ethnomedicine and renewable energy. Curr Pharm Biotechnol, 9, 288-306, 2008.

DEMIRBAS, A. Biodiesel production from vegetable oils via catalytic and non-catalytic supercritical methanol transesterification methods. Progress in Energy and Combustion Science, 31, 466-87, 2005.


D'ERFURTH, I.; LE SIGNOR, C.; AUBERT, G.; SANCHEZ, M.; VERNOUD, V.; DARCHY, B. et al. A role for an endosperm-localized subtilase in the control of seed size in legumes. New Phytol, 196, 738-51, 2012.

DORMANN, P.; VOELKER, T.A.; OHLROGGE, J.B. Accumulation of palmitate in Arabidopsis mediated by the acyl-acyl carrier protein thioesterase FATB1. Plant Physiology, 123, 637-44, 2000.

DUNWELL, J.M. Cupins: a new superfamily of functionally diverse proteins that include germins and plant storage proteins. Biotechnol Genet Eng Rev, 15, 1-32, 1998.

EKMAN, A.; HAYDEN, D.M.; DEHESH, K.; BULOW, L.; STYMNE, S. Carbon partitioning between oil and carbohydrates in developing oat (Avena sativa L.) seeds. J Exp Bot, 59, 4247-57, 2008.

ESWARAN, N.; PARAMESWARAN, S.; ANANTHARAMAN, B.; KUMAR, G.R.; SATHRAM, B.; JOHNSON, T.S. Generation of an expressed sequence tag (EST) library from salt-stressed roots of Jatropha curcas for identification of abiotic stress-responsive genes. Plant Biol (Stuttg), 14, 428-37, 2012.

FENN, J.B.; MANN, M.; MENG, C.K.; WONG, S.F.; WHITEHOUSE, C.M. Electrospray ionization for mass spectrometry of large biomolecules. Science, 246, 64-71, 1989.

FOIDL, N.; FOIDL, G.; SANCHEZ, M.; MITTELBACH, M.; HACKEL, S. Jatropha curcas L. as a source for the production of biofuel in Nicaragua. Bioresource Technology, 58, 77-82, 1996.

FONTANINI, D.; JONES, B.L. SEP-1 - a subtilisin-like serine endopeptidase from germinated seeds of Hordeum vulgare L. cv. Morex. Planta, 215, 885-93, 2002.

FRENTZEN, M. Acyltransferases from basic science to modified seed oils. Lipid/Fett, 100, 161-6, 1998.

GARCIA, D.; FITZ GERALD, J.N.; BERGER, F. Maternal control of integument cell elongation and zygotic control of endosperm growth are coordinated to determine seed size in Arabidopsis. Plant Cell, 17, 52-60, 2005.

GASSER, C.S.; BROADHVEST, J.; HAUSER, B.A. Genetic analysis of ovule development. Annu Rev Plant Physiol Plant Mol Biol, 49, 1-24, 1998.


GIBBS, P.E.; STRONGIN, K.B.; MCPHERSON, A. Evolution of legume seed storage proteins--a domain common to legumins and vicilins is duplicated in vicilins. Molecular Biology and Evolution, 6, 614-23, 1989.

GOEL, G.; MAKKAR, H.P.; FRANCIS, G.; BECKER, K. Phorbol esters: structure, biological activity, and toxicity in animals. Int J Toxicol, 26, 279-88, 2007.

GOMES, K.A.; ALMEIDA, T.C.; GESTEIRA, A.S.; LÔBO, I.P.; GUIMARÃES, A.C.R.; MIRANDA, A.B.D. et al. ESTs from Seeds to Assist the Selective Breeding of Jatropha curcas L. for Oil and Active Compounds. Genomics Insights, 3, 29-55, 2010.

GRAHAM, I.A. Seed storage oil mobilization. Annu Rev Plant Biol, 59, 115-42, 2008.

GRAVES, P.R.; HAYSTEAD, T.A. Molecular biologist's guide to proteomics. Microbiol Mol Biol Rev, 66, 39-63, 2002.

GREENWOOD, J.S.; HELM, M.; GIETL, C. Ricinosomes and endosperm transfer cell structure in programmed cell death of the nucellus during Ricinus seed development. Proc Natl Acad Sci U S A, 102, 2238-43, 2005.

GU, K.; CHIAM, H.; TIAN, D.; YIN, Z. Molecular cloning and expression of heteromeric ACCase subunit genes from Jatropha curcas. Plant Science, 180, 642-9, 2011.

GU, K.; YI, C.; TIAN, D.; SANGHA, J.S.; HONG, Y.; YIN, Z. Expression of fatty acid and lipid biosynthetic genes in developing endosperm of Jatropha curcas. Biotechnol Biofuels, 5, 47, 2012.

GÜBITZ, G.M.; MITTELBACH, M.; TRABI, M. Exploitation of the tropical oil seed plant Jatropha curcas L. Bioresource Technology, 67, 73-82, 1999.

GUI, M.M.; LEE, K.T.; BHATIA, S. Feasibility of edible oil vs. non-edible oil vs. waste edible oil as biodiesel feedstock. Energy, 33, 1646-53, 2008.

HAAS, W.; STERK, H.; MITTELBACH, M. Novel 12-deoxy-16-hydroxyphorbol diesters isolated from the seed oil of Jatropha curcas. J Nat Prod, 65, 1434-40, 2002.

HAJDUCH, M.; MATUSOVA, R.; HOUSTON, N.L.; THELEN, J.J. Comparative proteomics of seed maturation in oilseeds reveals differences in intermediary metabolism. Proteomics, 11, 1619-29, 2011.

HARTLEY, B.S. Proteolytic enzymes. Annu Rev Biochem, 29, 45-72, 1960.

HASHIGUCHI, A.; AHSAN, N.; KOMATSU, S. Proteomics application of crops in the context of climatic changes. Food Research International, 43, 1803-13, 2010.

HE, W.; KING, A.J.; KHAN, M.A.; CUEVAS, J.A.; RAMIARAMANANA, D.; GRAHAM, I.A. Analysis of seed phorbol-ester and curcin content together with genetic diversity in multiple provenances of Jatropha curcas L. from Madagascar and Mexico. Plant Physiology and Biochemistry, 49, 1183-90, 2011.

HELM, M.; SCHMID, M.; HIERL, G.; TERNEUS, K.; TAN, L.; LOTTSPEICH, F. et al. KDEL-tailed cysteine endopeptidases involved in programmed cell death, intercalation of new cells, and dismantling of extensin scaffolds. Am J Bot, 95, 1049-62, 2008.

HERSHKO, A.; CIECHANOVER, A. The ubiquitin system. Annu Rev Biochem, 67, 425-79, 1998.

HIRAIWA, N.; KONDO, M.; NISHIMURA, M.; HARA-NISHIMURA, I. An aspartic endopeptidase is involved in the breakdown of propeptides of storage proteins in protein-storage vacuoles of plants. Eur J Biochem, 246, 133-141, 1997.

HIRAKAWA, H.; TSUCHIMOTO, S.; SAKAI, H.; NAKAYAMA, S.; FUJISHIRO, T.; KISHIDA, Y. et al. Upgraded genomic information of Jatropha curcas L. Plant Biotechnology, 29, 123-30, 2012.

HIRNER, B.; FISCHER, W.N.; RENTSCH, D.; KWART, M.; FROMMER, W.B. Developmental control of H+/amino acid permease gene expression during seed development of Arabidopsis. Plant Journal, 14, 535-44, 1998.

HIROTA, M.; SUTTAJIT, M.; SUGURI, H.; ENDO, Y.; SHUDO, K.; WONGCHAI, V. et al. A new tumor promoter from the seed oil of Jatropha curcas L., an intramolecular diester of 12-deoxy-16-hydroxyphorbol. Cancer Research, 48, 5800-4, 1988.

HONG, S.K.; KITANO, H.; SATOH, H.; NAGATO, Y. How is embryo size genetically regulated in rice? Development, 122, 2051-8, 1996.

HOSHI, M.; OHKI, Y.; ITO, K.; TOMITA, T.; IWATSUBO, T.; ISHIMARU, Y. et al. Experimental detection of proteolytic activity in a signal peptide peptidase of Arabidopsis thaliana. BMC Biochem, 14, 16, 2013.

HSIEH, K.; HUANG, A.H. Endoplasmic reticulum, oleosins, and oils in seeds and tapetum cells. Plant Physiology, 136, 3427-34, 2004.


HU, Q.; NOLL, R. J.; LI, H.; MAKAROV, A.; HARDMAN, M.; GRAHAM, C. R. The Orbitrap: a new mass spectrometer. J Mass Spectrometry , 40, 430-43, 2005.

HU, J.; BAKER, A.; BARTEL, B.;LINKA, B.; MULLEN, R.T.; REUMANN, S.; ZOLMAN, B.K. Plant Peroxisomes: Biogenesis and Function. The Plant Cell Preview , 1-25, 2012.

HUNDERTMARK, M.; HINCHA, D.K. LEA (late embryogenesis abundant) proteins and their encoding genes in Arabidopsis thaliana . BMC Genomics , 9, 118, 2008.

IGBINOSA, O.O.; IGBINOSA, E.O.; AIYEGORO, O.A. Antimicrobial activity and phytochemical screening of stem bark extracts from Jatropha curcas (Linn). African Journal of Pharmacy and Pharmacology, 3, 58-62, 2009.

IIMURE, T.; NANKAKU, N.; HIROTA, N.; TIANSU, Z.; HOKI, T.; KIHARA, M.; et al. Construction of a novel beer proteome map and its use in beer quality control. Food Chemistry, 118, 566-74, 2010.

INGRAM, G.C. Family life at close quarters: communication and constraint in angiosperm seed development. Protoplasma, 247, 195-214, 2010.

INGVARDSEN, C.; VEIERSKOV, B. Ubiquitin- and proteasome-dependent proteolysis in plants. Physiol Plant, 112, 451-9, 2001.

JAIN, S.; SHARMA, M.P. Prospects of biodiesel from Jatropha in India: A review. Renewable and Sustainable Energy Reviews, 14, 763-71, 2010.

JIANG, H.; WU, P.; ZHANG, S.; SONG, C.; CHEN, Y.; LI, M.; et al. Global analysis of gene expression profiles in developing physic nut (Jatropha curcas L.) seeds. PLoS One, 7, e36522, 2012.

JOLIE, R.P.; DUVETTER, T.; VAN LOEY, A.M.; HENDRICKX, M.E. Pectin methylesterase and its proteinaceous inhibitor: a review. Carbohydrate Research, 345, 2583-95, 2010.

JORRIN-NOVO, J.V.; MALDONADO, A.M.; ECHEVARRIA-ZOMENO, S.; VALLEDOR, L.; CASTILLEJO, M.A.; CURTO, M.; et al. Plant proteomics update (2007-2008): Second-generation proteomic techniques, an appropriate experimental design, and data analysis to fulfill MIAPE standards, increase plant proteome coverage and expand biological knowledge. J Proteomics, 72, 285-314, 2009.

JOSHI, C.; MATHUR, P.; KHARE, S.K. Degradation of phorbol esters by Pseudomonas aeruginosa PseA during solid-state fermentation of deoiled Jatropha curcas seed cake. Bioresource Technology, 102, 4815-9, 2011.

JUAN, L.; YU, C.; YING, X.; FANG, Y.; LIN, T.; FANG, C. Cloning and Expression of Curcin, a Ribosome-Inactivating Protein from the Seeds of Jatropha curcas. Acta Botanica Sinica, 45, 858-63, 2003.

KARAS, M.; HILLENKAMP, F. Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem, 60, 2299-301, 1988.

KERVINEN, J.; TOBIN, G.J.; COSTA, J.; WAUGH, D.S.; WLODAWER, A.; ZDANOV, A. Crystal structure of plant aspartic proteinase prophytepsin: inactivation and vacuolar targeting. EMBO J., 18, 3947-55, 1999.

KIM, M.; AHN, J.W.; JIN, U.H.; CHOI, D.; PAEK, K.H.; PAI, H.S. Activation of the programmed cell death pathway by inhibition of proteasome function in plants. J Biol Chem, 278, 19406-15, 2003.

KING, A.; LI, Y.; GRAHAM, I. Profiling the Developing Jatropha curcas L. Seed Transcriptome by Pyrosequencing. BioEnergy Research, 4, 211-21, 2011.

KING, A.J.; HE, W.; CUEVAS, J.A.; FREUDENBERGER, M.; RAMIARAMANANA, D.; GRAHAM, I.A. Potential of Jatropha curcas as a source of renewable oil and animal feed. Journal of Experimental Botany, 60, 2897-905, 2009.

KNOTHE, G. Dependence of biodiesel fuel properties on the structure of fatty acid alkyl esters. Fuel Processing Technology, 86, 1059-70, 2005.

KNUTZON, D. S.; THOMPSON, G. A.; RADKE, S. E.; JOHNSON, W. B.; KNAUF, V.C.; KRIDL, J. C. Modification of Brassica seed oil by antisense expression of a stearoyl-acyl carrier protein desaturase gene. Proceedings of the National Academy of Sciences, 89, 2624-8, 1992.

KOH, M. Y.; MOHD, G. T. I. A review of biodiesel production from Jatropha curcas L. oil. Renewable and Sustainable Energy Reviews, 15, 2240-51, 2011.

KRANZ, E.; VON WIEGEN, P.; QUADER, H.; LORZ, H. Endosperm development after fusion of isolated, single maize sperm and central cells in vitro. Plant Cell, 10, 511-24, 1998.

KUMAR, A.; SHARMA, S. An evaluation of multipurpose oil seed crop for industrial uses (Jatropha curcas L.): A review. Industrial Crops and Products, 28, 1-10, 2008.

KUMAR TIWARI, A.; KUMAR, A.; RAHEMAN, H. Biodiesel production from Jatropha oil (Jatropha curcas) with high free fatty acids: An optimized process. Biomass and Bioenergy, 31, 569-75, 2007.

KUMAR, V.; MAKKAR, H.P.S.; BECKER, K. Detoxified Jatropha curcas kernel meal as a dietary protein source: growth performance, nutrient utilization and digestive enzymes in common carp (Cyprinus carpio L.) fingerlings. Aquaculture Nutrition, 17, 313-26, 2011.

LAFON-PLACETTE, C.; KÖHLER, C. Embryo and endosperm, partners in seed development. Current Opinion in Plant Biology, 17, 64-9, 2014.

LEISTER, D. Chloroplast research in the genomic age. Trends Genet, 19, 47-56, 2003.

LIANG, Y.; CHEN, H.; TANG, M.J.; YANG, P.F.; SHEN, S.H. Responses of Jatropha curcas seedlings to cold stress: photosynthesis-related proteins and chlorophyll fluorescence characteristics. Physiol Plant, 131, 508-17, 2007.

LIN, J.; YAN, F.; TANG, L.; CHEN, F. Antitumor effects of curcin from seeds of Jatropha curcas. Acta Pharmacol Sin., 24, 241-6, 2003.

LIN, M.; OLIVER, D. J. The role of acetyl-coenzyme a synthetase in Arabidopsis. Plant Physiology, 147, 1822-9, 2008.

LIU, H.; LIU, Y. J.; YANG, M. F.; SHEN, S. H. A comparative analysis of embryo and endosperm proteome from seeds of Jatropha curcas. J Integr Plant Biol, 51, 850-7, 2009.

LIU, H.; WANG, C.; KOMATSU, S.; HE, M.; LIU, G.; SHEN, S. Proteomic analysis of the seed development in Jatropha curcas: from carbon flux to the lipid accumulation. J Proteomics, 91, 23-40, 2013.

LIU, H.; YANG, Z.; YANG, M.; SHEN, S. The differential proteome of endosperm and embryo from mature seed of Jatropha curcas. Plant Science, 181, 660-6, 2011.

LOEI, H.; LIM, J.; TAN, M.; LIM, T. K.; LIN, Q. S.; CHEW, F. T.; et al. Proteomic analysis of the oil palm fruit mesocarp reveals elevated oxidative phosphorylation activity is critical for increased storage oil production. J Proteome Res., 12, 5096-109, 2013.

LOPES, M.A.; LARKINS, B.A. Endosperm origin, development, and function. The Plant Cell Online, 5, 1383-99, 1993.

LUO, M. J.; YANG, X. Y.; LIU, W. X.; XU, Y.; HUANG, P.; YAN, F.; et al. Expression, purification and anti-tumor activity of curcin. Acta Biochim Biophys Sin (Shanghai), 38, 663-8, 2006.

MAHANTA, N.; GUPTA, A.; KHARE, S. K. Production of protease and lipase by solvent tolerant Pseudomonas aeruginosa PseA in solid-state fermentation using Jatropha curcas seed cake as substrate. Bioresource Technology, 99, 1729-35, 2008.

MAKAROV, A. Electrostatic axially harmonic orbital trapping: a high-performance technique of mass analysis. Anal Chem, 72, 1156-62, 2000.

MAKKAR, H. P. S.; ADERIBIGBE, A. O.; BECKER, K. Comparative evaluation of non-toxic and toxic varieties of Jatropha curcas for chemical composition, digestibility, protein degradability and toxic factors. Food Chemistry, 62, 207-15, 1998a.

MAKKAR, H. P.; BECKER, K.; SCHMOOK, B. Edible provenances of Jatropha curcas from Quintana Roo state of Mexico and effect of roasting on antinutrient and toxic factors in seeds. Plant Foods Hum Nutr., 52, 31-6, 1998b.

MAKKAR, H. P. S.; BECKER, K.; SPORER, F.; WINK, M. Studies on Nutritive Potential and Toxic Constituents of Different Provenances of Jatropha curcas. Journal of Agricultural and Food Chemistry, 45, 3152-7, 1997.

MAKKAR, H. P. S.; BECKER, K. Jatropha curcas, a promising crop for the generation of biodiesel and value-added coproducts. European Journal of Lipid Science and Technology, 111, 773-87, 2009.

MANN, M.; HENDRICKSON, R. C.; PANDEY, A. Analysis of proteins and proteomes by mass spectrometry. Annu Rev Biochem, 70, 437-73, 2001.

MARRIOTT, K. M.; NORTHCOTE, D. H. The breakdown of lipid reserves in the endosperm of germinating castor beans. Biochem J., 148, 139-44, 1975.

MARTÍN, C.; MOURE, A.; MARTÍN, G.; CARRILLO, E.; DOMÍNGUEZ, H.; PARAJÓ, J. C. Fractional characterisation of jatropha, neem, moringa, trisperma, castor and candlenut seeds as potential feedstocks for biodiesel production in Cuba. Biomass and Bioenergy, 34, 533-8, 2010.

MARTÍNEZ-HERRERA, J.; SIDDHURAJU, P.; FRANCIS, G.; DÁVILA-ORTÍZ, G.; BECKER, K. Chemical composition, toxic/antimetabolic constituents, and effects of different treatments on their levels, in four provenances of Jatropha curcas L. from Mexico. Food Chemistry, 96, 80-9, 2006.

MATSUSE, I. T.; LIM, Y. A.; HATTORI, M.; CORRE, M.; GUPTA, M. P. A search for anti-viral properties in Panamanian medicinal plants: The effects on HIV and its essential enzymes. Journal of Ethnopharmacology, 64, 15-22, 1999.

MCCARTHY, F. M.; WANG, N.; MAGEE, G. B.; NANDURI, B.; LAWRENCE, M. L.; CAMON, E. B.; et al. AgBase: a functional genomics resource for agriculture. BMC Genomics, 7, 229, 2006.

MIERNYK, J. A.; HAJDUCH, M. Seed proteomics. Journal of Proteomics, 74, 389-400, 2011.

MISHRA, M. S.; CHANDRASHEKHAR, B.; CHATTERJEE, T.; SINGH, K. Production of bio-ethanol from jatropha oilseed cakes via dilute acid hydrolysis and fermentation by Saccharomyces cerevisiae. International Journal of Biotechnology Applications, 3, 41-7, 2011.

MOÏSE, J.; HAN, S.; GUDYNAITĖ-SAVITCH, L.; JOHNSON, D.; MIKI, B. A. Seed coats: Structure, development, composition, and biotechnology. In Vitro Cellular & Developmental Biology, 41, 620-44, 2005.

MUDDIMAN, D. C.; BAKHTIAR, R.; HOFSTADLER, S. A.; SMITH, R.D. Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry. Instrumentation and Applications. Journal of Chemical Education, 74, 1288, 1997.

NAKANO, Y.; OHTANI, M.; POLSRI, W.; USAMI, T.; SAMBONGI, K.; DEMURA, T. Characterization of the casbene synthase homolog from Jatropha (Jatropha curcas L.). Plant Biotechnology, 29, 185-9, 2012.

NATARAJAN, P.; KANAGASABAPATHY, D.; GUNADAYALAN, G.; PANCHALINGAM, J.; SHREE, N.; SUGANTHAM, P. A.; et al. Gene discovery from Jatropha curcas by sequencing of ESTs from normalized and full-length enriched cDNA library from developing seeds. BMC Genomics, 11, 606, 2010.

NATARAJAN, P.; PARANI, M. De novo assembly and transcriptome analysis of five major tissues of Jatropha curcas L. using GS FLX titanium platform of 454 pyrosequencing. BMC Genomics, 12, 191, 2011.

NAYAK, B. S.; PATEL, K. N. Pharmacognostic studies of the Jatropha curcas leaves. International Journal of PharmTech Research, 2, 140-3, 2010.

NGUYEN, T.; SHANKLIN, J. Altering Arabidopsis Oilseed Composition by a Combined Antisense-Hairpin RNAi Gene Suppression Approach. Journal of the American Oil Chemists Society, 86, 41-9, 2009.

NOGUEIRA, F.C.; PALMISANO, G.; SCHWAMMLE, V.; SOARES, E. L.; SOARES, A. A.; ROEPSTORFF, P.; et al. Isotope labeling-based quantitative proteomics of developing seeds of castor oil seed (Ricinus communis L.). Journal of Proteome Research, 12, 5012-24, 2013.

NOGUEIRA, F. C.; PALMISANO, G.; SOARES, E. L.; SHAH, M.; SOARES, A. A.; ROEPSTORFF, P.; et al. Proteomic profile of the nucellus of castor bean (Ricinus communis L.) seeds during development. Journal of Proteomics, 75, 1933-9, 2012.

NYMAN, T. A. The role of mass spectrometry in proteome studies. Biomol Eng., 18, 221-7, 2001.

OHLROGGE, J. B.; JAWORSKI, J. G. Regulation of fatty acid synthesis. Annu Rev Plant Physiol Plant Mol Biol., 48, 109-36, 1997.

OLIVER, D. J.; NIKOLAU, B. J.; WURTELE, E. S. Acetyl-CoA: Life at the metabolic nexus. Plant Science, 176, 597-601, 2009.

OLSEN, O. A. ENDOSPERM DEVELOPMENT: Cellularization and Cell Fate Specification. Annu Rev Plant Physiol Plant Mol Biol., 52, 233-67, 2001.

OM TAPANES, N. C.; GOMES ARANDA, D. A.; DE MESQUITA CARNEIRO, J. W.; CEVA ANTUNES, O. A. Transesterification of Jatropha curcas oil glycerides: Theoretical and experimental studies of biodiesel reaction. Fuel, 87, 2286-95, 2008.

OPENSHAW, K. A review of Jatropha curcas: an oil plant of unfulfilled promise. Biomass and Bioenergy, 19, 1-15, 2000.

OPSAHL-FERSTAD, H. G.; LE DEUNFF, E.; DUMAS, C.; ROGOWSKY, P. M. ZmEsr, a novel endosperm-specific gene expressed in a restricted region around the maize embryo. Plant J., 12, 235-46, 1997.

OSKOUEIAN, E.; ABDULLAH, N.; SAAD, W. Z.; OMAR, A. R.; PUTEH, M. B.; HO, Y. W. Anti-Nutritional Metabolites and Effect of Treated Jatropha curcas Kernel Meal on Rumen Fermentation in vitro. Journal of Animal and Veterinary Advances, 10, 214-20, 2011.

PANDEY, A.; MANN, M. Proteomics to study genes and genomes. Nature, 405, 837-46, 2000.

PEDRESCHI, R.; HERTOG, M.; LILLEY, K. S.; NICOLAI, B. Proteomics for the food industry: opportunities and challenges. Crit Rev Food Sci Nutr., 50, 680-92, 2010.

PERALTA-FLORES, L.; GALLEGOS-TINTORÉ, S.; SOLORZA-FERIA, J.; DÁVILA-ORTÍZ, G.; CHEL-GUERRERO, L.; MARTÍNEZ-AYALA, A. Biochemical evaluation of protein fractions from physic nut (Jatropha curcas L.). Grasas y Aceites, 63, 253-9, 2012.

PEUMANS, W. J.; HAO, Q.; VAN DAMME, E. J. Ribosome-inactivating proteins from plants: more than RNA N-glycosidases. FASEB J., 15, 1493-506, 2001.

PINHEIRO, C. B.; SHAH, M.; SOARES, E. L.; NOGUEIRA, F. C.; CARVALHO, P. C.; JUNQUEIRA, M.; et al. Proteome analysis of plastids from developing seeds of Jatropha curcas L. Journal of Proteome Research, 12, 5137-45, 2013.

PINTO, A. C.; GUARIEIRO, L. L. N.; REZENDE, M. J. C.; RIBEIRO, N. M.; TORRES, E. A.; LOPES, W. A.; et al. Biodiesel: an overview. Journal of the Brazilian Chemical Society, 16, 1313-30, 2005.

POPLUECHAI, S.; FROISSARD, M.; JOLIVET, P.; BREVIARIO, D.; GATEHOUSE, A. M.; O'DONNELL, A. G.; et al. Jatropha curcas oil body proteome and oleosins: L-form JcOle3 as a potential phylogenetic marker. Plant Physiol Biochem., 49, 352-6, 2011.

PRAMANIK, K. Properties and use of Jatropha curcas oil and diesel fuel blends in compression ignition engine. Renewable Energy, 28, 239-48, 2003.

QU, J.; MAO, H. Z.; CHEN, W.; GAO, S. Q.; BAI, Y. N.; SUN, Y. W.; et al. Development of marker-free transgenic Jatropha plants with increased levels of seed oleic acid. Biotechnol Biofuels, 5, 10, 2012.

RAKSHIT, K. D.; DARUKESHWARA, J.; RATHINA RAJ, K.; NARASIMHAMURTHY, K.; SAIBABA, P.; BHAGYA, S. Toxicity studies of detoxified Jatropha meal (Jatropha curcas) in rats. Food and Chemical Toxicology, 46, 3621-5, 2008.

RAUTENGARTEN, C.; USADEL, B.; NEUMETZLER, L.; HARTMANN, J.; BUSSIS, D.; ALTMANN, T. A subtilisin-like serine protease essential for mucilage release from Arabidopsis seed coats. Plant J., 54, 466-80, 2008.

RAWSTHORNE, S. Carbon flux and fatty acid synthesis in plants. Prog Lipid Res., 41, 182-96, 2002.

REALE, L.; RICCI, A.; FERRANTI, F.; TORRICELLI, R.; VENANZONI, R.; FALCINELLI, M. Cytohistological Analysis and Mobilization of Reserves in Jatropha curcas L. Seed. Crop Sci., 52, 830-5, 2012.

ROCHA, A. J.; SOARES, E. L.; COSTA, J. H.; COSTA, W. L.; SOARES, A. A.; NOGUEIRA, F. C.; et al. Differential expression of cysteine peptidase genes in the inner integument and endosperm of developing seeds of Jatropha curcas L. (Euphorbiaceae). Plant Sci., 213, 30-7, 2013.

SADANANDOM, A.; BAILEY, M.; EWAN, R.; LEE, J.; NELIS, S. The ubiquitin-proteasome system: central modifier of plant signalling. New Phytol., 196, 13-28, 2012.

SALEKDEH, G. H.; KOMATSU, S. Crop proteomics: aim at sustainable agriculture of tomorrow. Proteomics, 7, 2976-96, 2007.

SANCHEZ-MONGE, R.; LOPEZ-TORREJON, G.; PASCUAL, C. Y.; VARELA, J.; MARTIN-ESTEBAN, M.; SALCEDO, G. Vicilin and convicilin are potential major allergens from pea. Clin Exp Allergy, 34, 1747-53, 2004.

SARIC, T.; GRAEF, C. I.; GOLDBERG, A. L. Pathway for degradation of peptides generated by proteasomes: a key role for thimet oligopeptidase and other metallopeptidases. J Biol Chem., 279, 46723-32, 2004.

SATO, S.; HIRAKAWA, H.; ISOBE, S.; FUKAI, E.; WATANABE, A.; KATO, M.; et al. Sequence analysis of the genome of an oil-bearing tree, Jatropha curcas L. DNA Res., 18, 65-76, 2010.

SCHLERETH, A.; STANDHARDT, D.; MOCK, H. P.; MUNTZ, K. Stored cysteine proteinases start globulin mobilization in protein bodies of embryonic axes and cotyledons during vetch (Vicia sativa L.) seed germination. Planta, 212, 718-27, 2001.

SCHWARTZ, J. C.; SENKO, M. W.; SYKA, J. E. A two-dimensional quadrupole ion trap mass spectrometer. J Am Soc Mass Spectrom., 13, 659-69, 2002.

SCIGELOVA, M.; MAKAROV, A. Orbitrap mass analyzer: overview and applications in proteomics. Proteomics, 6, 16-21, 2006.

SHARIEF, F. S.; LI, S. S. Amino acid sequence of small and large subunits of seed storage protein from Ricinus communis. J Biol Chem., 257, 14753-9, 1982.

SHARMA, D. K.; PANDEY, A. K. Use of Jatropha curcas hull biomass for bioactive compost production. Biomass and Bioenergy, 33, 159-62, 2009.

SHEVCHENKO, A.; TOMAS, H.; HAVLIS, J.; OLSEN, J. V.; MANN, M. In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat Protoc., 1, 2856-60, 2006.

SHEWRY, P. R.; NAPIER, J. A.; TATHAM, A. S. Seed storage proteins: structures and biosynthesis. Plant Cell, 7, 945-56, 1995.

SHIMADA, T.; YAMADA, K.; KATAOKA, M.; NAKAUNE, S.; KOUMOTO, Y.; KUROYANAGI, M.; et al. Vacuolar processing enzymes are essential for proper processing of seed storage proteins in Arabidopsis thaliana. J Biol Chem., 278, 32292-9, 2003.

SILOTO, R. M.; FINDLAY, K.; LOPEZ-VILLALOBOS, A.; YEUNG, E. C.; NYKIFORUK, C. L.; MOLONEY, M. M. The accumulation of oleosins determines the size of seed oilbodies in Arabidopsis. Plant Cell, 18, 1961-74, 2006.

SIMOES, I.; FARO, C. Structure and function of plant aspartic proteinases. Eur J Biochem., 271, 2067-75, 2004.

SINGH, R. N.; VYAS, D. K.; SRIVASTAVA, N. S. L.; NARRA, M. SPRERI experience on holistic approach to utilize all parts of Jatropha curcas fruit for energy. Renewable Energy, 33, 1868-73, 2008.

SINGH, R. P. Structure and development of seeds in Euphorbiaceae. Beitr Biol Pflanzen, 47, 79-90, 1970.

SIRVENT, S.; PALOMARES, O.; CUESTA-HERRANZ, J.; VILLALBA, M.; RODRÍGUEZ, R. Analysis of the Structural and Immunological Stability of 2S Albumin, Nonspecific Lipid Transfer Protein, and Profilin Allergens from Mustard Seeds. Journal of Agricultural and Food Chemistry, 60, 6011-8, 2012.

SUDHAKAR JOHNSON, T.; ESWARAN, N.; SUJATHA, M. Molecular approaches to improvement of Jatropha curcas Linn. as a sustainable energy crop. Plant Cell Reports, 30, 1573-91, 2011.

TAKAHASHI, K.; ATHAUDA, S. B.; MATSUMOTO, K.; RAJAPAKSHE, S.; KURIBAYASHI, M.; KOJIMA, M.; KUBOMURA-YOSHIDA, N.; IWAMATSU, A.; SHIBATA, C.; INOUE, H. Nepenthesin, a unique member of a novel subfamily of aspartic proteinases: enzymatic and structural characteristics. Curr Protein Pept Sci., 6, 513-25, 2005.

TAKAHASHI, K.; NIWA, H.; YOKOTA, N.; KUBOTA, K.; INOUE, H. Widespread tissue expression of nepenthesin-like aspartic protease genes in Arabidopsis thaliana. Plant Physiol Biochem., 46, 724-9, 2008.

TAKAHASHI, T. Structure and function studies on enzymes with a catalytic carboxyl group(s): from ribonuclease T1 to carboxyl peptidases. Proceedings of the Japan Academy, Series B, Physical and Biological Sciences, 89, 201-25, 2013.

TAKAIWA, F.; OGAWA, M.; OKITA, T. Rice glutelins. In: SHEWRY, P.; CASEY, R. (Eds.). Seed Proteins. Springer Netherlands, 401-25, 1999.

TAM, E. M.; MORRISON, C. J.; WU, Y. I.; STACK, M. S.; OVERALL, C. M. Membrane protease proteomics: Isotope-coded affinity tag MS identification of undescribed MT1-matrix metalloproteinase substrates. Proc Natl Acad Sci U S A, 101, 6917-22, 2004.

TAMURA, T.; ASAKURA, T.; UEMURA, T.; UEDA, T.; TERAUCHI, K.; MISAKA, T.; et al. Signal peptide peptidase and its homologs in Arabidopsis thaliana: plant tissue-specific expression and distinct subcellular localization. FEBS J., 275, 34-43, 2008.

TAMURA, T.; KURODA, M.; OIKAWA, T.; KYOZUKA, J.; TERAUCHI, K.; ISHIMARU, Y.; et al. Signal peptide peptidases are expressed in the shoot apex of rice, localized to the endoplasmic reticulum. Plant Cell Rep., 28, 1615-21, 2009.

TAN-WILSON, A. L.; WILSON, K. A. Mobilization of seed protein reserves. Physiol Plant., 145, 140-53, 2012.

TEMPLEMAN, T. S.; DEMAGGIO, A. E.; STETLER, D. A. Biochemistry of Fern Spore Germination: Globulin Storage Proteins in Matteuccia struthiopteris L. Plant Physiol., 85, 343-9, 1987.

TENZER, S.; SCHILD, H. Assays of proteasome-dependent cleavage products. Methods Mol Biol., 301, 97-115, 2005.

TEUBER, S. S.; JARVIS, K. C.; DANDEKAR, A. M.; PETERSON, W. R.; ANSARI, A. A. Identification and cloning of a complementary DNA encoding a vicilin-like proprotein, jug r 2, from English walnut kernel (Juglans regia), a major food allergen. J Allergy Clin Immunol., 104, 1311-20, 1999.

THELEN, J. J.; OHLROGGE, J. B. Metabolic Engineering of Fatty Acid Biosynthesis in Plants. Metabolic Engineering, 4, 12-21, 2002.

THIMM, O.; BLASING, O.; GIBON, Y.; NAGEL, A.; MEYER, S.; KRUGER, P.; et al. MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. Plant Journal, 37, 914-39, 2004.

THORPE, S. C.; KEMENY, D. M.; PANZANI, R. C.; MCGURL, B.; LORD, M. Allergy to castor bean. II. Identification of the major allergens in castor bean seeds. J Allergy Clin Immunol., 82, 67-72, 1988.

TOVAR-MENDEZ, A.; MIERNYK, J. A.; RANDALL, D. D. Regulation of pyruvate dehydrogenase complex activity in plant cells. Eur J Biochem., 270, 1043-9, 2003.

TRONCOSO-PONCE, M. A.; KRUGER, N. J.; RATCLIFFE, G.; GARCÉS, R.; MARTÍNEZ-FORCE, E. Characterization of glycolytic initial metabolites and enzyme activities in developing sunflower (Helianthus annuus L.) seeds. Phytochemistry, 70, 1117-22, 2009.

TSIATSIANI, L.; GEVAERT, K.; VAN BREUSEGEM, F. Natural substrates of plant proteases: how can protease degradomics extend our knowledge? Physiol Plant., 145, 28-40, 2012.

VACCA, R. A.; VALENTI, D.; BOBBA, A.; DE PINTO, M. C.; MERAFINA, R. S.; DE GARA, L.; et al. Proteasome function is required for activation of programmed cell death in heat shocked tobacco Bright-Yellow 2 cells. FEBS Lett., 581, 917-22, 2007.

VASCONCELOS, É. A. R.; NOGUEIRA, F. C. S.; ABREU, E. F. M.; GONÇALVES, E. F.; SOUZA, P. A. S.; CAMPOS, F. A. P. Protein Extraction From Cowpea Tissues for 2-D Gel Electrophoresis and MS Analysis. Chromatographia., 62, 447-50, 2005.

VERMA, R.; ARAVIND, L.; OANIA, R.; MCDONALD, W. H.; YATES, J. R.; KOONIN, E. V.; DESHAIES, R. J. Role of Rpn11 metalloprotease in deubiquitination and degradation by the 26S proteasome. Science, 298, 611-15, 2002.

VIERSTRA, R. D. The ubiquitin/26S proteasome pathway, the complex last chapter in the life of many plant proteins. Trends Plant Sci., 8, 135-42, 2003.

VILLEGAS, L. F.; FERNANDEZ, I. D.; MALDONADO, H.; TORRES, R.; ZAVALETA, A.; VAISBERG, A. J.; et al. Evaluation of the wound-healing activity of selected traditional medicinal plants from Peru. Journal of Ethnopharmacology, 55, 193-200, 1997.

VISSER, E. M.; FILHO, D. O.; MARTINS, M. A.; STEWARD, B. L. Bioethanol production potential from Brazilian biodiesel co-products. Biomass and Bioenergy, 35, 489-94, 2011.

VOELKER, T.; KINNEY, A. J. VARIATIONS IN THE BIOSYNTHESIS OF SEED-STORAGE LIPIDS. Annu Rev Plant Physiol Plant Mol Biol., 52, 335-61, 2001.

VOIGT, G.; BIEHL, B.; HEINRICHS, H.; VOIGT, J. Aspartic proteinase levels in seeds of different angiosperms. Phytochemistry, 44, 389-92, 1997.

VRANOVA, E.; COMAN, D.; GRUISSEM, W. Structure and dynamics of the isoprenoid pathway network. Mol Plant., 5, 318-33, 2012.

WANG, F.; ROBOTHAM, J. M.; TEUBER, S. S.; TAWDE, P.; SATHE, S. K.; ROUX, K. H. Ana o 1, a cashew (Anacardium occidental) allergen of the vicilin seed storage protein family. J Allergy Clin Immunol., 110, 160-6, 2002.

WANG, X.; XU, C.; WU, R.; LARKINS, B. A. Genetic dissection of complex endosperm traits. Trends in Plant Science, 14, 391-8, 2009.

WEBER, H.; BORISJUK, L.; WOBUS, U. Molecular physiology of legume seed development. Annu Rev Plant Biol., 56, 253-79, 2005.

WESELAKE, R. J.; TAYLOR, D. C.; RAHMAN, M. H.; SHAH, S.; LAROCHE, A.; MCVETTY, P. B.; HARWOOD, J. L. Increasing the flow of carbon into seed oil. Biotechnol Adv, 27, 866-78, 2009.

WEVER, D. A. Z.; HEERES, H. J.; BROEKHUIS, A. A. Characterization of Physic nut (Jatropha curcas L.) shells. Biomass and Bioenergy, 37, 177-87, 2012.

WILKINS, M. R.; PASQUALI, C.; APPEL, R.; OU, K.; GOLAZ, O.; SANCHEZ, J. C.; et al. From proteins to proteomes: large scale protein identification by two-dimensional electrophoresis and amino acid analysis. Biotechnology, 14, 61-5, 1996.

WILKINS, M. R.; SANCHEZ, J. C.; GOOLEY, A. A.; APPEL, R. D.; HUMPHERY-SMITH, I.; HOCHSTRASSER, D. F.; et al. Progress with proteome projects: why all proteins expressed by a genome should be identified and how to do it. Biotechnol Genet Eng Rev., 13, 19-50, 1996.

WU, G. Z.; XUE, H. W. Arabidopsis beta-ketoacyl-[acyl carrier protein] synthase I is crucial for fatty acid synthesis and plays a role in chloroplast division and embryo development. Plant Cell, 22, 3726-44, 2010.

XIA, Y.; SUZUKI, H.; BOREVITZ, J.; BLOUNT, J.; GUO, Z.; PATEL, K.; et al. An extracellular aspartic protease functions in Arabidopsis disease resistance signaling. EMBO Journal, 23, 980-8, 2004.

XIAO, J.; ZHANG, H.; NIU, L.; WANG, X.; LU, X. Evaluation of Detoxification Methods on Toxic and Antinutritional Composition and Nutritional Quality of Proteins in Jatropha curcas Meal. Journal of Agricultural and Food Chemistry, 59, 4040-4, 2011.

XU, J. H.; MESSING, J. Amplification of prolamin storage protein genes in different subfamilies of the Poaceae. Theor Appl Genet., 119, 1397-412, 2009.

XU, R.; WANG, R.; LIU, A. Expression profiles of genes involved in fatty acid and triacylglycerol synthesis in developing seeds of Jatropha (Jatropha curcas L.). Biomass and Bioenergy, 35, 1683-92, 2011.

XU, T.; VENABLE, J. D.; PARK, S. K.; COCIORVA, D.; LU, B.; LIAO, L.; et al. ProLuCID, a fast and sensitive tandem mass spectra-based protein identification program. Molecular & Cellular Proteomics, 5, S174, 2006.

YAHATA, E.; MARUYAMA-FUNATSUKI, W.; NISHIO, Z.; TABIKI, T.; TAKATA, K.; YAMAMOTO, Y.; et al. Wheat cultivar-specific proteins in grain revealed by 2-DE and their application to cultivar identification of flour. Proteomics, 5, 3942-53, 2005.

YANG, M. F.; LIU, Y. J.; LIU, Y.; CHEN, H.; CHEN, F.; SHEN, S. H. Proteomic analysis of oil mobilization in seed germination and postgermination development of Jatropha curcas. Journal of Proteome Research, 8, 1441-51, 2009.

YATES, J. R.; RUSE, C. I.; NAKORCHEVSKY, A. Proteomics by mass spectrometry: approaches, advances, and applications. Annu Rev Biomed Eng., 11, 49-79, 2009.

YE, J.; QU, J.; BUI, H. T.; CHUA, N. H. Rapid analysis of Jatropha curcas gene functions by virus-induced gene silencing. Plant Biotechnology Journal, 7, 964-76, 2009.

ZHANG, L.; YU, Z.; JIANG, L.; JIANG, J.; LUO, H.; FU, L. Effect of post-harvest heat treatment on proteome change of peach fruit during ripening. Journal of Proteomics, 74, 1135-49, 2011.

ZHU, H.; BILGIN, M.; SNYDER, M. Proteomics. Annu Rev Biochem., 72, 783-812, 2003.

8 PUBLICATIONS, WORKSHOPS AND CONFERENCES

1. Pinheiro, C. B.; Shah, M.; Soares, E. L.; Nogueira, F. C.; Carvalho, P. C.; Junqueira, M.; Araujo, G. D.; Soares, A. A.; Domont, G. B.; Campos, F. A. Proteome analysis of plastids from developing seeds of Jatropha curcas L. Journal of Proteome Research, 12, (11), 5137-45, 2013.

2. Nogueira, F. C.; Palmisano, G.; Soares, E. L.; Shah, M.; Soares, A. A.; Roepstorff, P.; Campos, F. A.; Domont, G. B. Proteomic profile of the nucellus of castor bean (Ricinus communis L.) seeds during development. Journal of Proteomics, 75, (6), 1933-9, 2012.

3. Soares, E. L.; Shah, M.; Nogueira, F. C.; Carvalho, P. C.; Soares, A. A.; Domont, G. B.; Campos, F. A. Proteome analysis of the inner integument from developing seeds of Jatropha curcas L. (submitted to Journal of Proteome Research).

4. Shah, M.; Pinheiro, C. B.; Soares, E. L.; Nogueira, F. C.; Carvalho, P. C.; Soares, A. A.; Domont, G. B.; Campos, F. A. Proteome analysis of plastids isolated from inner integument of developing seeds of Jatropha curcas L. (in preparation).

5. Shah, M.; Pinheiro, C. B.; Soares, E. L.; Nogueira, F. C.; Carvalho, P. C.; Soares, A. A.; Domont, G. B.; Campos, F. A. Proteome analysis of endosperm from developing seeds of Jatropha curcas L. (in preparation).

6. Participation in the "Joint Wellcome Trust/EBI Workshop on Proteomics and Bioinformatics", 11-15 November 2013, at the Wellcome Trust Genome Campus, Hinxton, Cambridge.

7. Shah, M.; Soares, E. L.; Nogueira, F. C. S.; Soares, A. A.; Domont, G.; Campos, F. A. P. Programmed Cell Death in the Inner Integument of Seeds of the Bioenergy Crop Jatropha curcas. XL Annual Meeting of the Brazilian Biochemistry and Molecular Biology Society. Foz do Iguaçu, PR, Brazil, April 30th to May 3rd, 2011.

8. Mohib U. Shah; Camila B. Pinheiro; Fábio C. S. Nogueira; Emanoella L. Soares; Paulo C. Carvalho; Francisco A. P. Campos; Gilberto B. Domont. Subproteome Analysis of Developing Seeds from Jatropha curcas. The 11th Human Proteome Organization World Congress (HUPO 2012).

9. Mohibullah Shah; Camila B. Pinheiro; Fábio C. S. Nogueira; Emanoella L. Soares; Paulo C. Carvalho; Francisco A. P. Campos; Gilberto B. Domont. Subproteome Analysis of Developing Seeds from Jatropha curcas. 1st Meeting of the Brazilian Proteomics Society. Rio de Janeiro, December 10th to 12th, 2012.

10. Mohib U. Shah; Camila B. Pinheiro; Fábio C. S. Nogueira; Gabriel D. T. Araujo; Emanoella L. Soares; Paulo C. Carvalho; Francisco A. P. Campos; Gilberto B. Domont. In-Depth Proteome Analysis of Developing Seeds of Jatropha curcas. The 12th Human Proteome Organization World Congress (HUPO 2013).

9 ATTACHMENTS

Journal of Proteome Research

This document is confidential and is proprietary to the American Chemical Society and its authors. Do not copy or disclose without written permission. If you have received this item in error, notify the sender and delete all copies.

Proteome analysis of the inner integument from developing seeds of Jatropha curcas L.

Journal: Journal of Proteome Research

Manuscript ID: pr-2013-01266q

Manuscript Type: Article

Date Submitted by the Author: 18-Dec-2013

Complete List of Authors:
Soares, Emanoella; Federal University of Ceará, Biochemistry and Molecular Biology
Shah, Mohibullah; Federal University of Ceará, Biochemistry and Molecular Biology
Soares, Arlete; Federal University of Ceará, Biology
Carvalho, Paulo; Instituto Carlos Chagas, Fiocruz, Proteomics and Protein Engineering
Domont, Gilberto; Federal University of Rio de Janeiro, Proteomics Unit, Department of Biochemistry, Institute of Chemistry
Nogueira, Fábio; Universidade Federal do Rio de Janeiro, Institute of Chemistry, Department of Biochemistry
Campos, Francisco; Federal University of Ceará, Biochemistry and Molecular Biology

ACS Paragon Plus Environment Page 1 of 24 Journal of Proteome Research

Proteome analysis of the inner integument from developing seeds of Jatropha curcas L.

Emanoella L. Soares 1#, Mohibullah Shah 1#, Arlete A. Soares 2, Paulo Carvalho 3, Gilberto B. Domont 4, Fábio C.S. Nogueira 4* and Francisco A.P. Campos 1*

1 Department of Biochemistry and Molecular Biology, Federal University of Ceará, Fortaleza, Brazil

2 Department of Biology, Federal University of Ceará, Fortaleza, Brazil

3 Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil

4 Proteomic Unit, Institute of Chemistry, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil

# Equal contribution

*Correspondence:

Prof. Francisco A.P. Campos, Department of Biochemistry and Molecular Biology, Federal University of Ceará, 60455900 Fortaleza, Brazil

Email: [email protected]

Fax: +558533669829

Prof. Fábio C. S. Nogueira, Proteomic Unit, Institute of Chemistry, Federal University of Rio de Janeiro, 21941909 Rio de Janeiro, Brazil

Email: [email protected]

Phone: +552125628862

Fax: +552125627266

Abstract

In this study, we performed a systematic proteomic analysis of the inner integument from developing seeds of Jatropha curcas and further explored the protein machinery responsible for generating the carbon and nitrogen sources that feed the growing embryo and endosperm. To this end, the inner integument was dissected into two regions, viz., one internal to the vascular bundle (proximal region), facing the central cavity, and the other external to it (distal region), facing the exotegmen. Proteins were extracted from these sections and from the whole integument, trypsinized, and analyzed using an EASY-nanoLC system coupled to an ESI-LTQ-Orbitrap Velos mass spectrometer. Our results disclose the identification of 1526, 1192, and 1062 proteins from the proximal, distal, and whole inner integument, respectively. The identifications comprise peptidases that play a key role in developmental programmed cell death (PCD) and proteins associated with cell wall architecture and modification. As many of these proteins are differentially expressed within the integument cell layers, these findings suggest that the cells mobilize an array of hydrolases to produce carbon and nitrogen sources from the proteins, carbohydrates, and lipids available within the cells. Not least, the identification of several classes of seed storage proteins in the inner integument provides additional evidence for the role of the seed coat as a transient source of reserves for the growing embryo and endosperm.

1. Introduction

The seeds of J. curcas are considered a potential source of raw material for the production of biodiesel,1 and two by-products of oil extraction, the protein-rich press cake and the seed coat, also represent opportunities for adding value to the crop. The former can potentially be used as animal feed, and the latter, representing almost 40% of the seed weight, constitutes a potential source of biomass for the production of second-generation ethanol.2 The full exploitation of this potential is hindered by the lack of knowledge about the chemical identity and biosynthesis of the cell wall components of the seed coat, and by the presence in the seeds of high concentrations of phorbol esters, tetracyclic diterpenoids known for their tumor-promoting activity, which represent serious risks for humans, animals, and the environment.3 In mature seeds, the highest concentration of phorbol esters is found in the inner layer of the seed coat, called the tegmen, from where they diffuse to the endosperm.4,5 However, the biosynthetic pathway of these compounds is largely unknown.

The seeds of angiosperms are derived from the ovule, an organ that harbors the female gametophyte, which in turn is surrounded by the nucellus and one or two integuments.6 Following fertilization of the egg cell and the central cell, the female gametophyte gives rise to the diploid embryo and the triploid endosperm, respectively, while differentiation of the integuments forms the maternally derived diploid seed coat.7 While knowledge of the developmental biology of the embryo and endosperm is expanding rapidly,8,9 the cellular events underlying seed coat differentiation have received comparatively little attention.10,11
However, it is well established that at early to mid stages of development the seed coat acts as a maternal conduit conveying nutrients to the developing embryo and endosperm,12-14 and that at maturity it provides protection, regulates germination, and promotes seed dispersal.15

Details about the protein machinery responsible for producing the nitrogen and carbon sources that feed the embryo and endosperm during seed development remain elusive. Although differentiation of the integuments into the seed coat is known to involve developmentally regulated programmed cell death (PCD) triggered by vacuole collapse,16,17 very little is known about the synthesis, subcellular location, and site of action of the peptidases, lipases, carbohydrases, and nucleases needed to digest cell components, including the cell wall, so that their products may be used as energy sources and/or building blocks for biosynthetic reactions within the embryo and endosperm. Most studies of the seed coat are restricted either to individual enzymes18-20 or to high-throughput transcriptomic analyses,9 and only recently did Miernyk and Johnston10 publish the first in-depth proteomic analysis of the seed coat, providing a glimpse of the dynamic changes in the proteome of the testa of developing soybean seeds. In this work, we capitalized on a detailed histological description of seed coat development in Jatropha curcas21 to guide our proteomic study of the layers of the inner integument during its mid-stage of development; this allowed us to report in unprecedented detail the proteome of the enzymatic machinery responsible for executing developmentally controlled PCD within this tissue.

2. Materials and Methods

2.1. Plant materials and histological analysis

Histological analyses were performed on the inner integument isolated from seeds at 25 days after pollination (DAP), as described by Rocha et al.22 Briefly, this tissue was isolated with a sharp spatula and dissected into two regions: one internal to the vascular bundle (proximal region), facing the central cavity, and the other external to it (distal region), facing the exotegmen (Figure 1). The middle section between the two regions was discarded in order to avoid contamination. Besides the proximal and distal regions, we also collected the whole intact inner integument (total integument).

2.2. Sample preparation and protein determination

After separation, all three samples were immediately frozen in liquid nitrogen, ground to a powder, and separately subjected to protein extraction according to Vasconcelos et al.23 Protein concentrations were determined by the Bradford assay.24

2.3. LC-MS/MS and data analysis

Trypsin digestion was performed on 100 µg of protein from each of the three samples as described by Pinheiro et al.25 Briefly, peptides dissolved in 0.5% formic acid were injected into an EASY-nanoLC system (Proxeon Biosystem) coupled online to an ESI-LTQ-Orbitrap Velos mass spectrometer (Thermo Fisher Scientific). Peptides were loaded onto a trap column (150 µm × 2 cm) packed in-house with C18 ReproSil 3 µm resin (Dr. Maisch) and eluted through an analytical column (100 µm × 15 cm) packed with the same material. Peptide separation was performed using a gradient from 100% A (0.1% formic acid) to 35% B (0.1% formic acid, 95% acetonitrile) over 150 min, followed by 35% to 90% B over 15 min and 90% B for 5 min. MS1 spectra were acquired in positive mode using data-dependent acquisition (DDA). Each DDA cycle consisted of a survey scan over the m/z range 300-2000 at a resolution of 60,000 with a target value of 1 × 10^6 ions. The ten most intense ions were subjected to MS2 acquisition in the LTQ using collision-induced dissociation (CID) at a normalized collision energy of 35; previously fragmented ions were dynamically excluded for 60 s.

Six biological replicates were performed for each sample, with each biological sample analyzed in triplicate for the distal and proximal regions; the total integument was analyzed twice. Raw files were converted into MS2 files using PatternLab's RawReader.
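The data-dependent acquisition described in this section (a survey scan over m/z 300-2000, MS2 on the ten most intense ions, and 60 s dynamic exclusion of previously fragmented ions) can be illustrated with a simplified Python sketch. It is not the instrument's control software: the names Precursor and select_precursors are hypothetical, the ppm window used to match a peak against the exclusion list (tol_ppm) is an assumed value, and charge-state screening and isotope handling are ignored.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    mz: float         # precursor m/z observed in the survey (MS1) scan
    intensity: float  # survey-scan intensity used for ranking


def select_precursors(survey, now, exclusion, top_n=10, exclusion_s=60.0,
                      tol_ppm=10.0, mz_window=(300.0, 2000.0)):
    """Return up to top_n precursors for MS2 from one survey scan.

    `exclusion` maps an excluded m/z to the time (s) it was last fragmented;
    it is updated in place so that it carries over between DDA cycles.
    tol_ppm is a hypothetical matching tolerance, not taken from the text.
    """
    # Forget precursors whose exclusion window has expired.
    for mz in [m for m, t in exclusion.items() if now - t >= exclusion_s]:
        del exclusion[mz]

    def excluded(mz):
        # A peak counts as excluded if it lies within tol_ppm of a listed m/z.
        return any(abs(mz - ex) / ex * 1e6 <= tol_ppm for ex in exclusion)

    lo, hi = mz_window
    candidates = [p for p in survey if lo <= p.mz <= hi and not excluded(p.mz)]
    # Rank the remaining peaks by intensity and keep the top N.
    chosen = sorted(candidates, key=lambda p: p.intensity, reverse=True)[:top_n]
    for p in chosen:
        exclusion[p.mz] = now  # start (or restart) this precursor's exclusion
    return chosen


# Hypothetical survey scan: 12 peaks at m/z 410-520 with intensities 1-12.
survey = [Precursor(400.0 + 10 * i, float(i)) for i in range(1, 13)]
exclusion = {}
cycle1 = select_precursors(survey, 0.0, exclusion)   # the 10 most intense peaks
cycle2 = select_precursors(survey, 30.0, exclusion)  # only the 2 not yet excluded
cycle3 = select_precursors(survey, 61.0, exclusion)  # first 10 released after 60 s
print(len(cycle1), len(cycle2), len(cycle3))
```

Running the sketch prints `10 2 10`: repeated survey scans do not re-fragment the same abundant ions until their 60 s exclusion expires, which is what lets lower-intensity precursors reach MS2.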
ProLuCID,26 version 1.3, was used to perform peptide-spectrum matching against the J. curcas protein database27 downloaded from http://www.kazusa.or.jp/jatropha/ in September 2012, combined with the J. curcas chloroplast genome-encoded proteins downloaded from NCBI in September 2012. Search parameters were full tryptic hydrolysis, up to two missed cleavages, oxidation of methionine as a variable modification, carbamidomethylation as a fixed modification, and a peptide tolerance of 50 ppm. Search results were subsequently filtered and processed with the Search Engine Processor28 to achieve a 1% false discovery rate (FDR). Proteins appearing in at least two biological replicates were considered representative of that biological sample; these were BLASTed against the NCBI non-redundant database using the Blast2GO annotation tool29 and used for downstream analysis. Spectral counts of the identified proteins were obtained using the PatternLab computational environment.30 Gene Ontology (GO) annotation was done using AgBase tools and database (http://agbase.msstate.edu/index.html). GOanna31 was used to retrieve the GO annotations assigned on the basis of sequence similarity, and Plant GO Slim was used to summarize the subcategories of the identified proteins. The revised TFold module of PatternLab was used to pinpoint differentially expressed proteins.32

3. Results and Discussion

The mature ovule of J. curcas is anatropous, crassinucellar, and bitegmic, and the vascular bundles present in the outer integument extend by post-chalazal branching from the chalaza through the inner integument.21,33 Both integuments are composed of three layers: endotegmen, mesotegmen, and exotegmen in the inner integument, and endotesta, mesotesta, and exotesta in the outer integument. During development, the integuments differentiate to form the seed coat.
At the mid stage of seed development, 25 days after pollination (DAP), the embryo is at the heart stage and cellularization of the endosperm is well underway (Figure 1A). The cell layers closer to the exotegmen are small and become progressively larger and more vacuolated the closer they are to the endosperm, and those in the immediate vicinity of the endosperm collapse, forming a layer of cell debris (Figure 1B). In later stages, when cellularization of the nuclear endosperm is complete and the embryo is fully developed, the mesotegmen cells internal to the vascular bundle are completely consumed, the cells external to the vascular bundle are crushed, and those of the exotegmen become sclerified, forming the main mechanical barrier of the mature seed coat. At this stage, the three layers of the outer integument are differentiated. Figure 1B indicates that cells closer to the exotegmen and those closer to the endosperm are in different metabolic states and are undergoing PCD to different degrees, suggesting the possible existence within the inner integument of a concentration gradient of the proteins involved in the execution of PCD (e.g., peptidases, lipases, and carbohydrases). To investigate this further, we took advantage of the large size of the developing seeds to isolate the whole inner integument, to obtain from it sections proximal and distal to the developing endosperm (Figure 1A), and to perform an in-depth proteomic analysis of each of the three parts.

Our results disclose the identification of 1752, 1405, and 1251 proteins from the distal, proximal, and total integument, representing 1169, 912, and 823 protein groups, respectively, at a 1% FDR. Proteins identified in at least two biological replicates were considered highly confident; these identifications comprise 1526 (distal), 1192 (proximal), and 1062 (total integument) proteins, for a total of 1770 (Supplementary Table I). The identification of sucrose synthases and of cell wall and apoplastic invertases, which are key players in the control of nutrient supply to the embryo and endosperm,34 indicates that the inner integument of developing J. curcas seeds is metabolically very active.
This is further supported by the identification of proteins belonging to major biochemical pathways, such as those related to amino acid and protein biosynthesis, and of a number of transporters (Supplementary Table I). Additionally, the role of the inner integument as a transient storage source of nutrients for the developing embryo and endosperm is supported by the identification of several classes of seed storage proteins, such as nutrient reservoir proteins, 11S globulins, legumins, and glutelins (Table 1). Although a number of proteins involved in secondary metabolism were identified, we did not identify any casbene synthase or other protein that may be involved in the biosynthesis of phorbol esters, the major toxic components of J. curcas seeds. This result is in line with recent findings by Nakano et al.35 showing that in J. curcas, casbene synthase is expressed in seedlings, mature leaves, and the flesh of developing fruits, but not in developing seeds. Another recent proteome analysis, of plastids from developing J. curcas seeds,25 likewise found no evidence for the presence of casbene synthase among more than 1100 proteins.

Despite the heterogeneity in the total number of proteins identified in each of the three samples analyzed (Supplementary Table I), we did not observe significant differences in the distribution of these proteins among Gene Ontology categories (Supplementary Table II). However, Gene Ontology annotation of the proteins unique to the distal and proximal regions of the integument revealed a clear difference in their distribution among some subcategories of GO Molecular Function and Cellular Component (Supplementary Table III and Figure 2). Proteins related to hydrolase activity (GO Molecular Function, Figure 2A) comprised the largest functional class among the proteins unique to the proximal region, whereas in the distal region proteins related to protein binding (Figure 2A) made up the largest functional class.
This indicates that most of the metabolic activity in the proximal region is geared towards reallocating to the embryo and endosperm the catabolic products generated by the digestion of proteins, carbohydrates, lipids, and nucleic acids within the cells undergoing PCD. Moreover, the percentage of proteins in the cell wall subcategory of GO Cellular Component (Figure 2B) was greater in the proximal region than in the distal region; the proteins mapping to this category are mostly related to cell wall degradation, which further supports a metabolism geared towards the digestion of the carbohydrate polymers and proteins that make up the cell wall.

We used PatternLab's Approximately Area-Proportional Venn Diagram module to pinpoint proteins uniquely identified in the distal and proximal regions of the inner integument; the analysis only considered proteins found in two or more biological replicates of a given sample. Differentially expressed proteins found in two or more biological replicates of the distal and proximal regions were discriminated by PatternLab's TFold module using a q-value of 0.05 (Figure 3). Only proteins satisfying all statistical tests (blue dots in Figure 3) were considered; dots in the upper section of the plot correspond to proteins upregulated in the distal sample, and those in the lower section to proteins upregulated in the proximal sample.

We recall that the TFold module uses a theoretical FDR estimator to maximize identifications satisfying both a fold-change cutoff that varies with the t-test p-value as a power law and a stringency criterion that aims to filter out low-abundance proteins whose quantitation is likely to have been compromised. Differentially expressed proteins between the distal and proximal regions are presented in Supplementary Table IV.

Of all identified proteins, almost 10% belong to one of the four major mechanistic classes of peptidases or their inhibitors, or are catalytic subunits of the proteasome or ubiquitination proteins (Supplementary Table V). The identification of proteasome components and ubiquitination proteins indicates proper functioning of the proteasome, a conditio sine qua non for the occurrence of PCD in plant cells.36 Many of the identified peptidases are known to have a role in PCD. For example, γ-vacuolar processing enzyme (γVPE), identified only in the distal region, is a homologue of a VPE also identified in castor seeds during the proteomic analysis of nucellus tissue undergoing PCD.37
The VPEs are a subclass of cysteine peptidases (CPs) that possess caspase-1-like activity and are used by the vacuolar-collapse system to promote developmentally controlled PCD in plant tissues.16 The KDEL-tailed CP is another type of CP that has been found to be a hallmark of plant tissues undergoing PCD.22,37-39 Here we identified a KDEL-tailed CP with statistically higher expression in the proximal region (Figure 3, Supplementary Table IV); this evidence supports the hypothesis that these proteinases are released into the cytoplasm following the vacuole collapse triggered by the action of the VPEs.20 The spatial and temporal differences in the expression of the VPEs and the KDEL-tailed CP in the inner integument of J. curcas seeds underline the different roles of these CPs in PCD and are in line with recent findings by Rocha et al.22 that, during seed development, transcripts for γVPE are detected earlier than those for the KDEL-tailed CP.

Our results disclose the identification of several serine peptidases (SPs) and serine peptidase inhibitors (SPIs) (Supplementary Table V), of which 10 are differentially expressed (Supplementary Table IV, Figure 3). Notably, three subtilisin-like serine peptidases showed spectral-count fold changes greater than 10, being more abundant in the proximal region. These subtilisin-like serine peptidases are characterized by a catalytic triad of aspartate, histidine, and serine; they have been shown to display caspase-like activity and to be associated with cell death in several plant species.40 The identification of this remarkable number and diversity of peptidases belonging to the four mechanistic classes in tissues undergoing PCD underlines the importance of proteolysis for providing amino acids for protein synthesis in the filial tissues, as suggested by Gallardo et al.41 Moreover, such diversity highlights the role of the seed integuments in the developmental biology of J. curcas seeds. Miernyk and Johnston10 and Gallardo et al.41 identified an abundance of peptidases at several stages of seed coat development in soybean and Medicago truncatula, respectively, but the possibility that the abundance and variety of these peptidases in the developing seed coat may be related to PCD was not discussed.
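The spectral-count comparisons discussed above (fold changes between the proximal and distal regions, accepted at a q-value of 0.05) can be illustrated with a minimal Python sketch. It is not PatternLab's TFold algorithm, whose fold-change cutoff varies with the t-test p-value as a power law; instead, it assumes normalization of each run by its total spectral count, a ratio of mean normalized counts with a small pseudocount, and Benjamini-Hochberg q-values computed from p-values supplied by the caller (for example, from a t-test). All function names, protein labels, and counts are hypothetical.

```python
from statistics import mean


def normalize(runs):
    """Divide each run's spectral counts by that run's total (assumed scheme)."""
    normalized = []
    for run in runs:
        total = sum(run.values())
        normalized.append({prot: n / total for prot, n in run.items()})
    return normalized


def fold_change(protein, group_a, group_b, pseudocount=1e-6):
    """Mean normalized count in group_a divided by that in group_b.

    A protein absent from a run counts as 0; the pseudocount avoids
    division by zero for proteins seen in only one group.
    """
    a = mean(run.get(protein, 0.0) for run in group_a)
    b = mean(run.get(protein, 0.0) for run in group_b)
    return (a + pseudocount) / (b + pseudocount)


def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the same order as pvalues."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical spectral counts from two proximal and two distal runs.
proximal = normalize([{"SBT_a": 40, "KDEL_CP": 20, "other": 140},
                      {"SBT_a": 36, "KDEL_CP": 24, "other": 140}])
distal = normalize([{"SBT_a": 2, "KDEL_CP": 8, "other": 190},
                    {"SBT_a": 4, "KDEL_CP": 6, "other": 190}])
print(round(fold_change("SBT_a", proximal, distal), 1))  # 12.7
q = bh_qvalues([0.01, 0.04, 0.03, 0.20])  # 0.04, 0.0533..., 0.0533..., 0.2
print([qv <= 0.05 for qv in q])
```

Normalizing by total spectral count keeps runs with different overall numbers of spectra comparable; with these made-up counts, only the first of the four p-values survives the 0.05 q-value threshold.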
Besides the peptidases referred to above, we identified proteins associated with cell wall architecture, such as several expansins and extensins, and several other proteins associated with cell wall modification and degradation, such as alpha-L-arabinofuranosidases, lysosomal alpha-mannosidases, pectinesterase, alpha- and beta-glucosidases, beta-xylosidases, alpha-galactosidases, glucanases, and polygalacturonases, among others (Supplementary Table I). Both beta-glucosidase and xylosidase are known to be involved in the breakdown or modification of cell wall hemicelluloses and pectins.42,43 Alpha-D-galactosidase is associated with galactomannan mobilization, while beta-galactosidase is associated with the degradation of galactan side chains.43 Pectinesterases, also called pectin methylesterases (PMEs), catalyze the demethylesterification of pectin domains, making them susceptible to degradation by polygalacturonases, another pectin-degrading enzyme.44 Additionally, we identified pectinesterase inhibitors (Supplementary Table I), which modulate pectinesterase activity.42 Similarly, the alpha-arabinofuranosidases identified here, acting in combination with beta-galactosidase and beta-glucuronidase (not identified here), have been found to degrade the carbohydrate moieties of arabinogalactan proteins (AGPs).43 Our results show the expression of many of these proteins to be differentially regulated, as is the case for alpha-L-arabinofuranosidases, pectinesterase inhibitor, and polygalacturonase, which are more highly expressed in the proximal section than in the distal section of the inner integument (Supplementary Table IV, Figure 3). Additionally, several other proteins, such as glucosidases, xylanases, mannosidases, and pectinesterase, among others, were identified only in the proximal region of the inner integument. Together, these results indicate that, among other possible roles, these polysaccharide-acting enzymes, together with the different kinds of peptidases described above, act to liberate all available carbon and nitrogen sources in the cell for the filial tissues, including those in the cell wall.

It is noteworthy that our results disclose several proteins related to lipid degradation, viz.: phospholipase D, lipoxygenases, acyl-CoA oxidases, and 3-ketoacyl thiolase, among others (Supplementary Table I).
Phospholipase D is an enzyme that catalyzes the hydrolysis of structural phospholipids to phosphatidic acid, leading to the loss of membrane integrity, an event associated with PCD.45 This enzyme is known to be involved in large-scale lipid degradation associated with cell death46 and sometimes contributes to caspase-dependent cell death signaling.47 Fatty acids released from membrane lipids by the action of acyl hydrolases are oxidized by lipoxygenases.48 The presence of these lipid-degrading enzymes, along with other hydrolases, reinforces the evidence for PCD in this maternal tissue.

4. Conclusion

The data presented here demonstrate the role of the inner integument of developing J. curcas seeds in providing carbon and nitrogen sources to the growing embryo and endosperm. The identification of several classes of peptidases, particularly γVPE and the KDEL-tailed CP, highlights the role of developmental PCD in the development of J. curcas seeds. Additionally, the differential expression of these peptidases and other hydrolases within the cell layers of the inner integument after the triggering of PCD suggests that the cells mobilize a whole array of hydrolases to produce carbon and nitrogen sources from the proteins, carbohydrates, and lipids available within the cells. Not least, the identification of several classes of seed storage proteins in the inner integument provides additional evidence for the role of the seed coat as a transient source of reserves for the growing embryo and endosperm.

Acknowledgements

This work was supported by grants from PETROBRAS, Banco do Nordeste, CNPq, CAPES, FUNCAP and TWAS.

5. References

(1) Contran, N.; Chessa, L.; Lubino, M.; Bellavite, D.; Roggero, P. P.; Enne, G. State-of-the-art of the Jatropha curcas productive chain: from sowing to biodiesel and by-products. Ind. Crop Prod. 2013, 42, 202-215.

(2) Wever, D. A. Z.; Heeres, H. J.; Broekhuis, A. A. Characterization of Physic nut (Jatropha curcas L.) shells. Biomass Bioenergy 2012, 37, 177-187.

(3) Gressel, J. Transgenics are imperative for biofuel crops. Plant Sci. 2008, 174, (3), 246-263.

(4) King, A. J.; Montes, L. R.; Clarke, J. G.; Affleck, J.; Li, Y.; Witsenboer, H.; van der Vossen, E.; van der Linde, P.; Tripathi, Y.; Tavares, E.; Shukla, P.; Rajasekaran, T.; van Loo, E. N.; Graham, I. A. Linkage mapping in the oilseed crop Jatropha curcas L. reveals a locus controlling the biosynthesis of phorbol esters which cause seed toxicity. Plant Biotechnol. J. 2013, 11, (8), 986-96.

(5) He, W.; King, A. J.; Khan, M. A.; Cuevas, J. A.; Ramiaramanana, D.; Graham, I. A. Analysis of seed phorbol-ester and curcin content together with genetic diversity in multiple provenances of Jatropha curcas L. from Madagascar and Mexico. Plant Physiol. Biochem. 2011, 49, (10), 1183-1190.

(6) Boesewinkel, F. D.; Bouman, F. The seed: structure. In Embryology of Angiosperms; Johri, B., Ed.; Springer: Berlin Heidelberg, 1984; pp 567-610.

(7) Ingram, G. C. Family life at close quarters: communication and constraint in angiosperm seed development. Protoplasma 2010, 247, (3-4), 195-214.

(8) Harada, J. J.; Pelletier, J. Genome-wide analyses of gene activity during seed development. Seed Sci. Res. 2012, 22, (Supplement S1), S15-S22.

(9) Belmonte, M. F.; Kirkbride, R. C.; Stone, S. L.; Pelletier, J. M.; Bui, A. Q.; Yeung, E. C.; Hashimoto, M.; Fei, J.; Harada, C. M.; Munoz, M. D.; Le, B. H.; Drews, G. N.; Brady, S. M.; Goldberg, R. B.; Harada, J. J. Comprehensive developmental profiles of gene activity in regions and subregions of the Arabidopsis seed. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, (5), 435-44.

(10) Miernyk, J. A.; Johnston, M. L. Proteomic analysis of the testa from developing soybean seeds. J. Proteomics 2013, 89, 265-72.

(11) Verdier, J.; Dessaint, F.; Schneider, C.; Abirached-Darmency, M. A combined histology and transcriptome analysis unravels novel questions on Medicago truncatula seed coat. J. Exp. Bot. 2013, 64, (2), 459-70.

(12) Heim, U.; Weber, H.; Baumlein, H.; Wobus, U. A sucrose-synthase gene of Vicia faba L.: expression pattern in developing seeds in relation to starch synthesis and metabolic regulation. Planta 1993, 191, (3), 394-401.

(13) Weber, H.; Borisjuk, L.; Heim, U.; Buchner, P.; Wobus, U. Seed coat-associated invertases of fava bean control both unloading and storage functions: cloning of cDNAs and cell type-specific expression. Plant Cell 1995, 7, (11), 1835-1846.

(14) Rochat, C.; Boutin, J.-P. Metabolism of phloem-borne amino acids in maternal tissues of Pisum sativum L. J. Exp. Bot. 1991, 42, (2), 207-214.

(15) Dean, G.; Cao, Y.; Xiang, D.; Provart, N. J.; Ramsay, L.; Ahad, A.; White, R.; Selvaraj, G.; Datla, R.; Haughn, G. Analysis of gene expression patterns during seed coat development in Arabidopsis. Mol. Plant 2011, 4, (6), 1074-91.

(16) Hara-Nishimura, I.; Hatsugai, N. The role of vacuole in plant cell death. Cell Death Differ. 2011, 18, (8), 1298-304.

(17) Haughn, G.; Chaudhury, A. Genetic analysis of seed coat development in Arabidopsis. Trends Plant Sci. 2005, 10, (10), 472-7.

(18) Nakaune, S.; Yamada, K.; Kondo, M.; Kato, T.; Tabata, S.; Nishimura, M.; Hara-Nishimura, I. A Vacuolar Processing Enzyme, δVPE, is involved in seed coat formation at the early stage of seed development. Plant Cell 2005, 17, (3), 876-887.

(19) Wan, L.; Xia, Q.; Qiu, X.; Selvaraj, G. Early stages of seed development in Brassica napus: a seed coat-specific cysteine proteinase associated with programmed cell death of the inner integument. Plant J. 2002, 30, (1), 1-10.

(20) Trobacher, C. P.; Senatore, A.; Holley, C.; Greenwood, J. S. Induction of a ricinosomal protease and programmed cell death in tomato endosperm by gibberellic acid. Planta 2013, 237, (3), 665-79.

(21) Singh, R. P. Structure and development of seeds in Euphorbiaceae: Jatropha species. Beitr. Biol. Pflanzen 1970, 47, 79-90.

(22) Rocha, A. J.; Soares, E. L.; Costa, J. H.; Costa, W. L.; Soares, A. A.; Nogueira, F. C.; Domont, G. B.; Campos, F. A. Differential expression of cysteine peptidase genes in the inner integument and endosperm of developing seeds of Jatropha curcas L. (Euphorbiaceae). Plant Sci. 2013, 213, 30-7.

(23) Vasconcelos, É. A. R.; Nogueira, F. C. S.; Abreu, E. F. M.; Gonçalves, E. F.; Souza, P. A. S.; Campos, F. A. P. Protein extraction from cowpea tissues for 2D gel electrophoresis and MS analysis. Chromatographia 2005, 62, (7-8), 447-450.

(24) Bradford, M. M. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 1976, 72, 248-54.

(25) Pinheiro, C. B.; Shah, M.; Soares, E. L.; Nogueira, F. C.; Carvalho, P. C.; Junqueira, M.; Araujo, G. D.; Soares, A. A.; Domont, G. B.; Campos, F. A. Proteome analysis of plastids from developing seeds of Jatropha curcas L. J. Proteome Res. 2013, 12, (11), 5137-45.

(26) Xu, T.; Venable, J. D.; Park, S. K.; Cociorva, D.; Lu, B.; Liao, L.; Wohlschlegel, J.; Hewel, J.; Yates, J. R., 3rd. ProLuCID, a fast and sensitive tandem mass spectra-based protein identification program. Mol. Cell. Proteomics 2006, 5, S174.

(27) Hirakawa, H.; Tsuchimoto, S.; Sakai, H.; Nakayama, S.; Fujishiro, T.; Kishida, Y.; Kohara, M.; Watanabe, A.; Yamada, M.; Aizu, T.; Toyoda, A.; Fujiyama, A.; Tabata, S.; Fukui, K.; Sato, S. Upgraded genomic information of Jatropha curcas L. Plant Biotechnol. 2012, 29, (2), 123-130.

(28) Carvalho, P. C.; Fischer, J. S.; Xu, T.; Cociorva, D.; Balbuena, T. S.; Valente, R. H.; Perales, J.; Yates, J. R., 3rd; Barbosa, V. C. Search engine processor: filtering and organizing peptide spectrum matches. Proteomics 2012, 12, (7), 944-9.

(29) Conesa, A.; Gotz, S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int. J. Plant Genomics 2008, 2008, 619832.

(30) Carvalho, P. C.; Yates, J. R., 3rd; Barbosa, V. C. Analyzing shotgun proteomic data with PatternLab for proteomics. Curr. Protoc. Bioinformatics 2010, Chapter 13, Unit 13.13, 1-15.

(31) McCarthy, F. M.; Wang, N.; Magee, G. B.; Nanduri, B.; Lawrence, M. L.; Camon, E. B.; Barrell, D. G.; Hill, D. P.; Dolan, M. E.; Williams, W. P.; Luthe, D. S.; Bridges, S. M.; Burgess, S. C. AgBase: a functional genomics resource for agriculture. BMC Genomics 2006, 7, 229.

(32) Carvalho, P. C.; Yates, J. R., 3rd; Barbosa, V. C. Improving the TFold test for differential shotgun proteomics. Bioinformatics 2012, 28, (12), 1652-4.

(33) Tokuoka, T.; Tobe, H. Ovules and seeds in Crotonoideae (Euphorbiaceae): structure and systematic implications. Bot. Jahrb. Syst. 1998, 120, (2), 165-86.

(34) Weber, H.; Borisjuk, L.; Wobus, U. Molecular physiology of legume seed development. Annu. Rev. Plant Biol. 2005, 56, 253-79.

(35) Nakano, Y.; Ohtani, M.; Polsri, W.; Usami, T.; Sambongi, K.; Demura, T. Characterization of the casbene synthase homolog from Jatropha (Jatropha curcas L.). Plant Biotechnol. 2012, 29, (2), 185-189.
36 37 (36) Hatsugai, N.; Iwasaki, S.; Tamura, K.; Kondo, M.; Fuji, K.; Ogasawara, K.; Nishimura, M.; 38 HaraNishimura, I. A novel membrane fusionmediated plant immunity against bacterial 39 40 pathogens. Genes Dev. 2009 , 23, (21), 2496506. 41 42 (37) Nogueira, F. C.; Palmisano, G.; Soares, E. L.; Shah, M.; Soares, A. A.; Roepstorff, P.; 43 Campos, F. A.; Domont, G. B. Proteomic profile of the nucellus of castor bean ( Ricinus 44 communis L.) seeds during development. J. Proteomics 2012 , 75, (6), 19339. 45 46 (38) Helm, M.; Schmid, M.; Hierl, G.; Terneus, K.; Tan, L.; Lottspeich, F.; Kieliszewski, M. J.; 47 Gietl, C. KDELtailed cysteine endopeptidases involved in programmed cell death, intercalation 48 of new cells, and dismantling of extensin scaffolds. Am. J. Bot. 2008 , 95, (9), 104962. 49 50 51 (39) Greenwood, J. S.; Helm, M.; Gietl, C. Ricinosomes and endosperm transfer cell structure in 52 programmed cell death of the nucellus during Ricinus seed development. Proc. Natl. Acad. Sci. 53 U. S. A. 2005 , 102, (6), 223843. 54 55 56 57 58 16 59 60 ACS Paragon Plus Environment Page 17 of 24 Journal of Proteome Research

(40) Vartapetian, A. B.; Tuzhikov, A. I.; Chichkova, N. V.; Taliansky, M.; Wolpert, T. J. A plant alternative to animal caspases: subtilisin-like proteases. Cell Death Differ. 2011, 18, (8), 1289–97.

(41) Gallardo, K.; Firnhaber, C.; Zuber, H.; Hericher, D.; Belghazi, M.; Henry, C.; Kuster, H.; Thompson, R. A combined proteome and transcriptome analysis of developing Medicago truncatula seeds: evidence for metabolic specialization of maternal and filial tissues. Mol. Cell. Proteomics 2007, 6, (12), 2165–79.

(42) Day, A.; Fenart, S.; Neutelings, G.; Hawkins, S.; Rolando, C.; Tokarski, C. Identification of cell wall proteins in the flax (Linum usitatissimum) stem. Proteomics 2013, 13, (5), 812–25.

(43) Minic, Z. Physiological roles of plant glycoside hydrolases. Planta 2008, 227, (4), 723–40.

(44) Pelloux, J.; Rustérucci, C.; Mellerowicz, E. J. New insights into pectin methylesterase structure and function. Trends Plant Sci. 2007, 12, (6), 267–277.

(45) Rubinstein, B. Regulation of cell death in flower petals. Plant Mol. Biol. 2000, 44, (3), 303–18.

(46) Laxalt, A. M.; ter Riet, B.; Verdonk, J. C.; Parigi, L.; Tameling, W. I.; Vossen, J.; Haring, M.; Musgrave, A.; Munnik, T. Characterization of five tomato phospholipase D cDNAs: rapid and specific expression of LePLDbeta1 on elicitation with xylanase. Plant J. 2001, 26, (3), 237–47.

(47) Iakimova, E. T.; Michaeli, R.; Woltering, E. J. Involvement of phospholipase D-related signal transduction in chemical-induced programmed cell death in tomato cell cultures. Protoplasma 2013.

(48) Siedow, J. N. Plant Lipoxygenase: Structure and Function. Annu. Rev. Plant Physiol. Plant Mol. Biol. 1991, 42, (1), 145–188.

Table

Table 1: Classes of seed storage proteins identified in the inner integument of Jatropha curcas seeds.

Figures

Figure 1: Anatomical structure of the J. curcas seed. A. Seed structure at 25 DAP. B. Magnified view of the boxed area in A, showing cell features of the distal and proximal regions. Key: D, distal region; En, endosperm; ExTg, exotegmen; II, inner integument; OI, outer integument; P, proximal region; Ts, testa; arrowheads indicate cell remnants; arrows indicate vascular bundles.

Figure 2: Gene ontology (GO) categories of the proteins unique to the distal and proximal regions of the integument. Plant GO Slim was used to summarize the subcategories of the identified proteins. Y-axes show the percentage of proteins. A: GO molecular function. B: GO cellular component. C: GO biological process.

Figure 3: TFold analysis identifying differentially expressed proteins between the proteomic profiles of the distal and proximal integuments. Each protein is plotted as a dot according to its -Log2(p-value) (x-axis) and Log2(fold change) (y-axis). Red dots are proteins that satisfy neither the variable fold-change cutoff nor the FDR cutoff α = 0.05. Green dots satisfy the fold-change cutoff but not α. Orange dots satisfy both the fold-change cutoff and α but are low-abundance proteins, so their quantitation is most likely compromised. Blue dots satisfy all statistical filters. Dots in the upper section of the plot are upregulated in the distal sample; dots in the lower section are upregulated in the proximal sample.

Supplementary Material

Supplementary Table 1: Proteins identified in at least two biological replicates in the distal and proximal regions and in the intact inner integument of developing J. curcas seeds.

Supplementary Table 2: Gene ontology (GO) annotation of the proteins identified in the distal and proximal regions and in the intact inner integument of developing J. curcas seeds. Plant GO Slim was used to summarize the subcategories of the identified proteins.

Supplementary Table 3: Gene ontology (GO) annotation of the proteins unique to the distal and proximal regions of the inner integument of developing J. curcas seeds. Plant GO Slim was used to summarize the subcategories of the proteins.

Supplementary Table 4: Differentially expressed proteins identified in the distal and proximal regions of the inner integument of developing J. curcas seeds.

Supplementary Table 5: Peptidases and peptidase inhibitors identified in the distal and proximal regions and in the intact inner integument of developing J. curcas seeds.
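Read literally, the captions of Supplementary Tables 1 and 3 describe two set operations. First, a protein is kept only if it is detected in at least two biological replicates. Second, the distal and proximal sets are compared to find proteins unique to each region. The Python sketch below is only an illustration of that reading. It assumes that "detected" means a nonzero spectrum count in a replicate and that "unique" means passing the replicate filter in one region but not the other. The function names, protein IDs and counts are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, List, Set, Tuple


def reproducible(counts: Dict[str, List[int]], min_reps: int = 2) -> Set[str]:
    """Protein IDs with a nonzero spectrum count in at least `min_reps` replicates."""
    return {pid for pid, reps in counts.items()
            if sum(1 for c in reps if c > 0) >= min_reps}


def unique_to_each(a: Set[str], b: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Proteins found only in region a, and proteins found only in region b."""
    return a - b, b - a


if __name__ == "__main__":
    # Hypothetical spectrum counts per biological replicate (three replicates each).
    distal = {"P1": [3, 0, 2], "P2": [1, 0, 0], "P3": [4, 5, 6]}
    proximal = {"P1": [0, 2, 1], "P3": [0, 0, 1], "P4": [2, 2, 0]}
    d, p = reproducible(distal), reproducible(proximal)
    print(sorted(d), sorted(p), unique_to_each(d, p))
```

In this toy data, P2 (seen in one distal replicate) and P3 in the proximal region (seen once) are dropped by the replicate filter. As a result, P3 ends up counted as unique to the distal region.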

Table 1: Classes of seed storage proteins identified in the inner integument of Jatropha curcas seeds.

Protein ID | Description | R. communis (UNIPROT) | A. thaliana (TAIR) | Spectrum count, distal region | Spectrum count, proximal region | Spectrum count, total integument
Jcr4S01636.40 | 11S globulin seed storage protein 2-like | B9SW16 | AT1G03890.1 | 3 | 13 | 5
Jcr4S00279.60 | glutelin type-A precursor | B9T5E7 | AT5G44120.3 | 18 | 28 | 46
Jcr4S01636.60 | legumin B precursor | Q9M4Q8 | AT5G44120.3 | 13 | 25 | 6
Jcr4S01617.40 | nutrient reservoir | B9SYN7 | AT2G28680.1 | 246 | 45 | 66
Jcr4S08024.20 | nutrient reservoir | B9SYN7 | AT1G07750.1 | 165 | 33 | 43
Jcr4S00423.10 | nutrient reservoir | B9SYN7 | AT2G28680.1 | 187 | 42 | 42
Jcr4S03933.20 | nutrient reservoir | B9SYN7 | AT1G07750.1 | 193 | 52 | 45
Jcr4S01793.30 | late embryogenesis abundant protein | B9T526 | AT2G44060.1 | 60 | 41 | 25

[Figure 1: Anatomical structure of J. curcas seed. A, seed structure at 25 DAP; B, magnified view of the distal and proximal regions.]

[Figure 2: GO categories of the proteins unique to the distal and proximal regions of the integument. A, molecular function; B, cellular component; C, biological process; y-axes, percent of proteins.]

[Figure 3: TFold analysis of the distal vs. proximal integument proteomes. x-axis, -Log2(p-value); y-axis, Log2(fold change); dots colored red, green, orange or blue according to the statistical filters they satisfy.]
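The color scheme in the Figure 3 legend amounts to a small decision procedure over three tests: a fold-change cutoff, an FDR cutoff (α = 0.05) and an abundance check. The Python sketch below is illustrative only and is not TFold or PatternLab code. Several parts are assumptions:

- Benjamini-Hochberg adjustment stands in for TFold's own FDR estimation.
- The "variable" fold-change cutoff is a hypothetical function of mean spectral count.
- The abundance threshold `min_count` is hypothetical.
- The field names are invented.
- The legend does not describe proteins that pass α but not the fold-change cutoff, so the sketch also colors them red.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProteinStat:
    protein_id: str
    log2_fc: float     # log2(distal / proximal); > 0 means higher in the distal sample
    p_value: float     # raw p-value from the distal-vs-proximal test
    mean_count: float  # mean spectral count, used here as the abundance measure


def bh_adjust(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjustment (a stand-in for TFold's FDR estimate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def plot_xy(s: ProteinStat) -> tuple:
    """Figure 3 coordinates: x = -log2(p-value), y = log2(fold change)."""
    return -math.log2(s.p_value), s.log2_fc


def dot_colour(s: ProteinStat, q_value: float,
               fc_cutoff: Callable[[float], float],
               alpha: float = 0.05, min_count: float = 5.0) -> str:
    """Assign a color following the Figure 3 legend."""
    passes_fc = abs(s.log2_fc) >= fc_cutoff(s.mean_count)
    passes_alpha = q_value <= alpha
    if passes_fc and passes_alpha:
        # Both cutoffs met: blue unless the protein is too sparse to quantify reliably.
        return "blue" if s.mean_count >= min_count else "orange"
    if passes_fc:
        return "green"
    # Red in the legend fails both cutoffs; proteins passing alpha only are not
    # described there and are also marked red in this sketch (assumption).
    return "red"


def upregulated_in(s: ProteinStat) -> str:
    """Upper half of the plot (log2 FC > 0) is distal; lower half is proximal."""
    return "distal" if s.log2_fc > 0 else "proximal"


def variable_cutoff(mean_count: float) -> float:
    """Hypothetical variable cutoff: require a larger change for sparse proteins."""
    return 1.0 if mean_count >= 10 else 1.5


if __name__ == "__main__":
    proteins = [
        ProteinStat("A", 2.0, 0.001, 20),
        ProteinStat("B", -2.0, 0.002, 3),
        ProteinStat("C", 1.2, 0.5, 12),
        ProteinStat("D", 0.1, 0.4, 30),
    ]
    q_values = bh_adjust([s.p_value for s in proteins])
    for s, q in zip(proteins, q_values):
        print(s.protein_id, dot_colour(s, q, variable_cutoff), upregulated_in(s))
    # A blue distal / B orange proximal / C green distal / D red distal
```

The four toy proteins land in the four legend classes. Protein B passes both cutoffs but is flagged orange because of its low mean count. This mirrors the legend's caveat that quantitation of low-abundance proteins is likely compromised.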
